Subject: Re: [PATCH RFC 3/4] x86/pti: don't mark the user PGD with _PAGE_NX.
From: Andy Lutomirski
To: Dave Hansen, Willy Tarreau, linux-kernel@vger.kernel.org, x86@kernel.org
Cc: tglx@linutronix.de, gnomes@lxorguk.ukuu.org.uk, torvalds@linux-foundation.org, Kees Cook
Date: Mon, 8 Jan 2018 15:05:48 -0800
In-Reply-To: <57039ac1-efe2-2f97-386f-dab0b90f64a5@intel.com>
References: <1515427939-10999-1-git-send-email-w@1wt.eu> <1515427939-10999-4-git-send-email-w@1wt.eu> <57039ac1-efe2-2f97-386f-dab0b90f64a5@intel.com>

On 01/08/2018 09:03 AM, Dave Hansen wrote:
> On 01/08/2018 08:12 AM, Willy Tarreau wrote:
>> Since we're going to keep running on the same PGD when returning to
>> userspace for certain performance-critical tasks, we'll need the user
>> pages to be executable. So this code disables the extra protection
>> that was added consisting in marking user pages _PAGE_NX so that this
>> pgd remains usable for userspace.
>>
>> Note: it isn't necessarily the best approach, but one way or another
>> if we want to be able to return to userspace from the kernel,
>> we'll have to have this executable anyway. Another approach
>> might consist in using another pgd for userland+kernel but
>> the current core really looks like an extra careful measure
>> to catch early bugs if any.
>
> I don't like this.
>
> I think the prctl() should apply to an entire process, not to a thread.
> If it applies to a process, you can unpoison the PGD. I even had code
> to do this in an earlier version of the (whole system) runtime PTI
> on/off stuff.
>
> Why are you even posting half-baked hacks like this now? Is there
> something super-pressing about this set that we need to lock in a new
> ABI now?
>

I vote per-thread.

Anyway, we can easily sync the NX-clearing: just catch the spurious page
fault and clear the bit. Avoiding infinite loops will need a bit of
thought, but it's surely doable. Or we set a per-mm flag saying "no NX",
then do synchronize_sched() or similar if we were the first to set it
(or take the pagetable lock), then clear all the NX bits. Again, needs
some care, but doable.

FWIW, the NX trick quite nicely emulates SMEP on non-SMEP hardware,
which is fantastic for Spectre resistance and general hardening. Turning
it off totally defeats that, which hurts a bit.

Also, Kees should be CC'd here.
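
To make the two NX-clearing options above concrete, here is a rough userspace toy model, not kernel code. Every name in it (toy_mm, toy_fix_nx_fault, toy_clear_all_nx, the no_nx field) is hypothetical and made up for illustration; it only assumes that NX is bit 63 of an x86-64 paging entry and that the user half of a 4-level PGD has 256 entries. Locking and synchronize_sched() are deliberately left out and only marked by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names exist in the kernel. */
#define TOY_PAGE_NX   (UINT64_C(1) << 63) /* NX is bit 63 of an x86-64 entry */
#define TOY_USER_PGDS 256                 /* user half of a 512-entry PGD */

struct toy_mm {
	bool     no_nx;                     /* hypothetical per-mm "no NX" flag */
	uint64_t kernel_pgd[TOY_USER_PGDS]; /* user entries seen via the kernel PGD */
};

/* Start with every user entry poisoned with NX, as in the current code. */
void toy_mm_init(struct toy_mm *mm)
{
	mm->no_nx = false;
	for (size_t i = 0; i < TOY_USER_PGDS; i++)
		mm->kernel_pgd[i] = (UINT64_C(0x1000) * (i + 1)) | TOY_PAGE_NX;
}

/*
 * Option 1 (lazy): called from a hypothetical instruction-fetch fault path.
 * Returns true if the fault was spurious and has been fixed, so the caller
 * may retry. Returns false for a real fault. Because the bit is only ever
 * cleared once per entry, a second fault on the same entry is reported as
 * real, which is what keeps the retry from looping forever.
 */
bool toy_fix_nx_fault(struct toy_mm *mm, size_t idx)
{
	assert(idx < TOY_USER_PGDS);
	if (!mm->no_nx)
		return false; /* NX is intentional for this mm */
	if (!(mm->kernel_pgd[idx] & TOY_PAGE_NX))
		return false; /* already cleared: not our fault to fix */
	mm->kernel_pgd[idx] &= ~TOY_PAGE_NX;
	return true;
}

/*
 * Option 2 (eager): the first caller to set the flag sweeps all entries.
 * Returns the number of entries whose NX bit was cleared. A real
 * implementation would need synchronize_sched() or the pagetable lock
 * between setting the flag and sweeping; this toy is single-threaded.
 */
size_t toy_clear_all_nx(struct toy_mm *mm)
{
	size_t cleared = 0;

	if (mm->no_nx)
		return 0; /* someone else was first */
	mm->no_nx = true;
	for (size_t i = 0; i < TOY_USER_PGDS; i++) {
		if (mm->kernel_pgd[i] & TOY_PAGE_NX) {
			mm->kernel_pgd[i] &= ~TOY_PAGE_NX;
			cleared++;
		}
	}
	return cleared;
}
```

In this sketch the two functions are alternatives, not meant to be combined: the lazy variant expects the caller to set no_nx itself and then fixes entries one fault at a time, while the eager variant sets the flag and clears everything in one pass.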