Date: Wed, 20 Sep 2017 18:09:01 -0600
From: Tycho Andersen
To: Dave Hansen
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kernel-hardening@lists.openwall.com, Marco Benatto, Juerg Haefliger,
	x86@kernel.org
Subject: Re: [PATCH v6 03/11] mm, x86: Add support for eXclusive Page Frame Ownership (XPFO)
Message-ID: <20170921000901.v7zo4g5edhqqfabm@docker>
In-Reply-To: <97475308-1f3d-ea91-5647-39231f3b40e5@intel.com>
References: <20170907173609.22696-1-tycho@docker.com>
 <20170907173609.22696-4-tycho@docker.com>
 <34454a32-72c2-c62e-546c-1837e05327e1@intel.com>
 <20170920223452.vam3egenc533rcta@smitten>
 <97475308-1f3d-ea91-5647-39231f3b40e5@intel.com>

On Wed, Sep 20, 2017 at 04:21:15PM -0700, Dave Hansen wrote:
> On 09/20/2017 03:34 PM, Tycho Andersen wrote:
> >> I really have to wonder whether there are better ret2dir defenses than
> >> this. The allocator just seems like the *wrong* place to be doing this
> >> because it's such a hot path.
> >
> > This might be crazy, but what if we defer flushing of the kernel
> > ranges until just before we return to userspace? We'd still manipulate
> > the prot/xpfo bits for the pages, but then just keep a list of which
> > ranges need to be flushed, and do the right thing before we return.
> > This leaves a little window between the actual allocation and the
> > flush, but userspace would need another thread in its threadgroup to
> > predict the next allocation, write the bad stuff there, and do the
> > exploit all in that window.
>
> I think the common case is still that you enter the kernel, allocate a
> single page (or very few) and then exit. So, you don't really reduce
> the total number of flushes.
>
> Just think of this in terms of IPIs to do the remote TLB flushes. A CPU
> can do roughly 1 million page faults and allocations a second. Say you
> have a 2-socket x 28-core x 2-hyperthread system = 112 CPU threads.
> That's 111M IPI interrupts/second, just for the TLB flushes, *ON* *EACH*
> *CPU*.

Since we only need to flush when something switches from a userspace to
a kernel page or back, hopefully it's not this bad, but point taken.

> I think the only thing that will really help here is if you batch the
> allocations. For instance, you could make sure that the per-cpu-pageset
> lists always contain either all kernel or all user data. Then remap the
> entire list at once and do a single flush after the entire list is
> consumed.

Just so I understand, the idea would be that we only flush when the
type of allocation alternates, so:

kmalloc(..., GFP_KERNEL);
kmalloc(..., GFP_KERNEL);
/* remap+flush here */
kmalloc(..., GFP_HIGHUSER);
/* remap+flush here */
kmalloc(..., GFP_KERNEL);

?

Tycho
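
P.S. to make the deferred-flush idea above a bit more concrete, here is a
very rough sketch of the bookkeeping I was imagining. None of this code
exists in the series; xpfo_defer_flush() and xpfo_flush_deferred() are
made-up names, the list is just a fixed-size per-cpu array, and the
preemption/locking story is completely hand-waved:

/*
 * Sketch only: accumulate kernel ranges whose TLB entries are stale and
 * flush them in one batch just before returning to userspace, instead of
 * flushing on every xpfo_kunmap().
 */
#include <linux/percpu.h>
#include <asm/tlbflush.h>

#define XPFO_MAX_DEFERRED 16

struct xpfo_deferred_flush {
	unsigned long start[XPFO_MAX_DEFERRED];
	unsigned long end[XPFO_MAX_DEFERRED];
	int nr;
};
static DEFINE_PER_CPU(struct xpfo_deferred_flush, xpfo_deferred);

/* Called where xpfo_kunmap() would otherwise flush immediately. */
static void xpfo_defer_flush(unsigned long start, unsigned long end)
{
	struct xpfo_deferred_flush *df = this_cpu_ptr(&xpfo_deferred);

	if (df->nr == XPFO_MAX_DEFERRED) {
		/* Out of slots: fall back to flushing right away. */
		flush_tlb_kernel_range(start, end);
		return;
	}
	df->start[df->nr] = start;
	df->end[df->nr] = end;
	df->nr++;
}

/* Hooked into the return-to-userspace path (e.g. exit_to_usermode_loop). */
void xpfo_flush_deferred(void)
{
	struct xpfo_deferred_flush *df = this_cpu_ptr(&xpfo_deferred);
	int i;

	for (i = 0; i < df->nr; i++)
		flush_tlb_kernel_range(df->start[i], df->end[i]);
	df->nr = 0;
}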