From: Kees Cook
Date: Mon, 8 Jan 2018 15:09:00 -0800
Subject: Re: [PATCH RFC 3/4] x86/pti: don't mark the user PGD with _PAGE_NX.
To: Andy Lutomirski
Cc: Dave Hansen, Willy Tarreau, LKML, X86 ML, Thomas Gleixner, Alan Cox, Linus Torvalds

On Mon, Jan 8, 2018 at 3:05 PM, Andy Lutomirski wrote:
> On 01/08/2018 09:03 AM, Dave Hansen wrote:
>>
>> On 01/08/2018 08:12 AM, Willy Tarreau wrote:
>>>
>>> Since we're going to keep running on the same PGD when returning to
>>> userspace for certain performance-critical tasks, we'll need the user
>>> pages to be executable. So this code disables the extra protection
>>> that was added consisting in marking user pages _PAGE_NX so that this
>>> pgd remains usable for userspace.
>>>
>>> Note: it isn't necessarily the best approach, but one way or another
>>> if we want to be able to return to userspace from the kernel,
>>> we'll have to have this executable anyway. Another approach
>>> might consist in using another pgd for userland+kernel but
>>> the current core really looks like an extra careful measure
>>> to catch early bugs if any.
>>
>>
>> I don't like this.
>>
>> I think the prctl() should apply to an entire process, not to a thread.
>> If it applies to a process, you can unpoison the PGD. I even had code
>> to do this in an earlier version of the (whole system) runtime PTI
>> on/off stuff.
>>
>> Why are you even posting half-baked hacks like this now? Is there
>> something super-pressing about this set that we need to lock in a new
>> ABI now?
>>
>
> I vote per-thread.
>
> Anyway, we can easily sync the NX-clearing: just catch the spurious page
> fault and clear the bit. Avoiding infinite loops will need a bit of
> thought, but it's surely doable.
>
> Or we set a per-mm flag saying "no NX", then do synchronize_sched() or
> similar if we were the first to set it (or take the pagetable lock), then
> clear all the NX bits. Again, needs some care, but doable.
>
> FWIW, the NX trick quite nicely emulates SMEP on non-SMEP hardware, which is
> fantastic for Spectre resistance and general hardening. Turning it off
> totally defeats that, which hurts a bit.
>
> Also, Kees should be CC'd here.

Please please keep the NX. As mentioned, this gets us emulated SMEP
for "free" with PTI, which Linux has needed for years.

-Kees

-- 
Kees Cook
Pixel Security