From: Andy Lutomirski
Date: Thu, 10 Jan 2019 16:44:36 -0800
Subject: Re: [RFC PATCH v7 00/16] Add support for eXclusive Page Frame Ownership
To: Kees Cook
Cc: Khalid Aziz, Andy Lutomirski, Dave Hansen, Ingo Molnar, Juerg Haefliger,
    Tycho Andersen, jsteckli@amazon.de, Andi Kleen, Linus Torvalds,
    liran.alon@oracle.com, Konrad Rzeszutek Wilk, deepa.srinivasan@oracle.com,
    Chris Hyser, Tyler Hicks, "Woodhouse, David", Andrew Cooper, Jon Masters,
    Boris Ostrovsky, kanth.ghatraju@oracle.com, Joao Martins, Jim Mattson,
    pradeep.vincent@oracle.com, John Haxby, "Kirill A. Shutemov",
    Christoph Hellwig, steven.sistare@oracle.com, Kernel Hardening, Linux-MM,
    LKML, Thomas Gleixner
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 10, 2019 at 3:07 PM Kees Cook wrote:
>
> On Thu, Jan 10, 2019 at 1:10 PM Khalid Aziz wrote:
> > I implemented a solution to reduce the performance penalty, and
> > it has had a large impact. When the XPFO code flushes stale TLB
> > entries, it does so for all CPUs on the system, which may include
> > CPUs that have no matching TLB entries or that may never be
> > scheduled to run the userspace task causing the TLB flush. The
> > problem is made worse by the fact that if the number of entries
> > being flushed exceeds tlb_single_page_flush_ceiling, the result is
> > a full TLB flush on every CPU. A rogue process can launch a
> > ret2dir attack only from a CPU whose TLB holds the duplicate
> > physmap mapping for its pages. We can therefore defer the TLB
> > flush on a CPU until a process that would have caused a TLB flush
> > is scheduled on that CPU. I have added a cpumask to task_struct
> > which is then used to post a pending TLB flush on CPUs other than
> > the one the process is running on.
> > This cpumask is checked when a process migrates to a new CPU, and
> > the TLB is flushed at that time. I measured system time for a
> > parallel make with the unmodified 4.20 kernel, with 4.20 plus the
> > XPFO patches before this optimization, and again after applying
> > this optimization. Here are the results:

I wasn't cc'd on the patch, so I don't know the exact details. I'm
assuming that "ret2dir" means that you corrupt the kernel into using a
direct-map page as its stack. If so, then I don't see why the task in
whose context the attack is launched needs to be the same process as
the one that has the page mapped for user access.

My advice would be to attempt an entirely different optimization: try
to avoid putting pages *back* into the direct map when they're freed
until there is an actual need to use them for kernel purposes.

How are you handling page cache? Presumably MAP_SHARED PROT_WRITE
pages are still in the direct map so that IO works.