From: Dave Hansen
To: Willy Tarreau, Linus Torvalds
Cc: Andy Lutomirski, Peter Zijlstra, LKML, X86 ML, Borislav Petkov, Brian Gerst, Ingo Molnar, Thomas Gleixner, Josh Poimboeuf, "H. Peter Anvin", Greg Kroah-Hartman, Kees Cook
Subject: Re: [RFC PATCH v2 6/6] x86/entry/pti: don't switch PGD on when pti_disable is set
Date: Thu, 11 Jan 2018 07:29:30 -0800
Message-ID: <0f08d89e-61e1-20e3-5c59-0b2f7b32bf0c@linux.intel.com>
In-Reply-To: <20180111064259.GC14920@1wt.eu>

On 01/10/2018 10:42 PM, Willy Tarreau wrote:
> On Wed, Jan 10, 2018 at 11:50:46AM -0800, Linus Torvalds wrote:
>> And the whole "NOW" vs "NEXT" is complete garbage. The obvious sane
>> no-PTI interface is that it
>>
>> (a) inherits on fork/exec, so that you don't have to worry about how
>> something is implemented (think "I want to run this kernel build
>> without the PTI overhead", but also "I want to run this system daemon
>> without PTI").
>>
>> (b) actual domain changes clear it (ie suid, whatever).
>>
>> that make it useful for random uses of "I trust service XYZ".
> OK. Do you want to see something *only* based on a wrapper (i.e. works
> only after execve) or can we let the application apply the change to
> itself ? I would also like to let applications re-enable the protection
> for processes they're going to exec and not necessarily trust.

I don't think we need a "NOW" and "NEXT" mode, at least initially. The
"NEXT" semantics are going to be tricky and I think "NOW" is good enough.

Whatever we do, we'll need this PTI-disable flag to be able to cross
execve() so that a wrapper a la nice(1) works. Initially, I think the
default should be that it survives fork(). There are just too many
things out there that "start up" by doing a shell script that calls a
python script, that calls a...

Without the wrapper support, we're _basically_ stuck using this only in
newly-compiled binaries. That's going to make it much less likely to
get used.

The inheritance also gives an app a way to re-enable protections for
children, just from a _second_ wrapper. That's nice because it means we
don't initially need a "NEXT" ABI.

So, I'd do this:

1. Do the arch_prctl() (but ask the ARM guys what they want too)
2. Enabled for an entire process (not thread)
3. Inherited across fork/exec
4. Cleared on setuid() and friends
5. I'm sure the security folks have/want a way to force it on forever

Next, if we decide that we have things that both don't want PTI's
protections and are forking things not covered by #4, we can add some
"child opt out" in the prctl(), plus maybe marking binaries somehow.

Please don't forget to add ways to tell if this feature is on/off in
/proc or whatever.

I think we also need to be able to dump the actual CR3 value that we
entered the kernel with before we start doing too much other funky
stuff with the entry code.
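The five rules proposed in this mail boil down to a small state machine driven by fork(), execve() and setuid(). What follows is a hypothetical userspace model of those rules only, not kernel code and not a real arch_prctl() ABI: struct pti_model and every model_* function are invented names. It assumes, following Linus's point (b), that exec of a setuid/setgid binary counts as a domain change, and that "force it on forever" is a one-way lock inherited by children.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the proposed per-process PTI opt-out
 * rules. struct pti_model and every model_* function are invented for
 * illustration; none of this is a kernel or libc API.
 */
struct pti_model {
	bool pti_disabled;  /* rule 2: one flag per process, not per thread */
	bool pti_locked_on; /* rule 5 (assumed): PTI forced on for good */
};

/* Rule 1: stands in for the arch_prctl() request; refused once locked. */
bool model_set_no_pti(struct pti_model *p, bool disable)
{
	if (disable && p->pti_locked_on)
		return false;
	p->pti_disabled = disable;
	return true;
}

/* Rule 5 (assumed shape): a one-way lock that also turns PTI back on. */
void model_lock_pti_on(struct pti_model *p)
{
	p->pti_locked_on = true;
	p->pti_disabled = false;
}

/* Rule 3: fork() copies both flags into the child. */
struct pti_model model_fork(const struct pti_model *parent)
{
	return *parent;
}

/*
 * Rules 3 and 4: execve() keeps the flag unless it changes the security
 * domain, assumed here to mean exec of a setuid/setgid binary.
 */
void model_execve(struct pti_model *p, bool setid_binary)
{
	if (setid_binary)
		p->pti_disabled = false;
}

/* Rule 4: setuid() and friends clear the flag. */
void model_setuid(struct pti_model *p)
{
	p->pti_disabled = false;
}

/* Walks through the nice(1)-style wrapper chain from the mail. */
void model_demo(void)
{
	struct pti_model shell = { .pti_disabled = false, .pti_locked_on = false };
	bool ok;

	/* First wrapper opts out, then execs a build that forks a compiler. */
	struct pti_model wrapper = model_fork(&shell);
	ok = model_set_no_pti(&wrapper, true);
	assert(ok);
	model_execve(&wrapper, false);
	struct pti_model cc = model_fork(&wrapper);
	assert(cc.pti_disabled);

	/* A second wrapper re-enables PTI for a child it doesn't trust. */
	struct pti_model guard = model_fork(&cc);
	ok = model_set_no_pti(&guard, false);
	assert(ok);
	struct pti_model untrusted = model_fork(&guard);
	assert(!untrusted.pti_disabled);

	/* Exec of a setuid binary is a domain change: the flag is cleared. */
	struct pti_model su = model_fork(&cc);
	model_execve(&su, true);
	assert(!su.pti_disabled);

	/* Once locked on, opting out is refused, in children too. */
	model_lock_pti_on(&shell);
	struct pti_model child = model_fork(&shell);
	ok = model_set_no_pti(&child, true);
	assert(!ok && !child.pti_disabled);
	(void)ok;
}
```

model_demo() shows why inheritance alone removes the need for a "NEXT" mode in this model: the opt-out survives fork/exec into the whole build, and a second wrapper that simply re-enables PTI before exec is enough to protect an untrusted child.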