In-Reply-To: <20180111174025.GB15344@1wt.eu>
References: <1515502580-12261-1-git-send-email-w@1wt.eu> <1515502580-12261-7-git-send-email-w@1wt.eu>
 <20180110082207.GX29822@worktop.programming.kicks-ass.net>
 <20180110091102.GH14066@1wt.eu> <CALCETrXQFL2sJPJe_Z8XLEo11V_tLyZR0y4PTRxJzrhmpdsuJg@mail.gmail.com>
 <CA+55aFwfSKNAYEFWCwCe2vSxejA3X85FnX9t1EdnCRUwC1ou1Q@mail.gmail.com>
 <20180111064259.GC14920@1wt.eu> <0f08d89e-61e1-20e3-5c59-0b2f7b32bf0c@linux.intel.com>
 <20180111154412.GA15296@1wt.eu> <CALCETrVcQg_1opnvOP4ksOAC07K4O_LTSxy2czwtObwR3YL+-w@mail.gmail.com>
 <20180111174025.GB15344@1wt.eu>
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu, 11 Jan 2018 10:25:29 -0800
Message-ID: <CA+55aFzOAAJ+WwA6BGAqW_h-r3_dYNpvgsmHcbpugwu4emcS6g@mail.gmail.com>
Subject: Re: [RFC PATCH v2 6/6] x86/entry/pti: don't switch PGD on when
 pti_disable is set
To: Willy Tarreau <w@1wt.eu>
Cc: Andy Lutomirski <luto@kernel.org>,
        Dave Hansen <dave.hansen@linux.intel.com>,
        Peter Zijlstra <peterz@infradead.org>,
        LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
        Borislav Petkov <bp@alien8.de>,
        Brian Gerst <brgerst@gmail.com>,
        Ingo Molnar <mingo@kernel.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        Josh Poimboeuf <jpoimboe@redhat.com>,
        "H. Peter Anvin" <hpa@zytor.com>,
        Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
        Kees Cook <keescook@chromium.org>

On Thu, Jan 11, 2018 at 9:40 AM, Willy Tarreau <w@1wt.eu> wrote:
>> As for per-mm vs per-thread, let's make it only switchable in
>> single-threaded processes for now and inherited when threads are
>> created.
>
> That's exactly what it does for now, but Linus doesn't like it at all.

Just to clarify: I definitely want the part where it is only
switchable in single-threaded mode, and I actually do want it
"inherited" by threads when they do get created.

It's just that my mental model for threads is not that they "inherit"
the PTI flag, it's that they simply share it. But that's more of a
difference in "views" than in actual behavior.
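The "share rather than inherit" view can be made concrete with a small userspace model in which the opt-out lives in the address-space object and each thread merely points at it. This is a hedged sketch, not kernel code: `struct mm_model`, `struct thread_model`, `mm_set_nopti()`, `thread_create()` and `thread_nopti()` are hypothetical names, and returning `-EBUSY` for a VM with more than one user is an assumed error choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of an address space: the PTI opt-out lives here,
 * not in the individual threads. */
struct mm_model {
	int users;   /* number of tasks sharing this VM */
	bool nopti;  /* true if PTI is disabled for this VM */
};

/* Hypothetical model of a thread: it only points at a shared mm. */
struct thread_model {
	struct mm_model *mm;
};

/* Toggling is only allowed while the VM has a single user. */
static int mm_set_nopti(struct mm_model *mm, bool nopti)
{
	assert(mm->users >= 1);
	if (mm->users > 1)
		return -EBUSY;
	mm->nopti = nopti;
	return 0;
}

/* A new thread does not copy the flag; it shares the same mm. */
static struct thread_model thread_create(struct thread_model *parent)
{
	parent->mm->users++;
	return (struct thread_model){ .mm = parent->mm };
}

/* What a thread "sees" is always read through the shared mm. */
static bool thread_nopti(const struct thread_model *t)
{
	return t->mm->nopti;
}
```

Since `worker.mm == main_thread.mm` in a run like the one in the test, nothing is copied at thread creation: the new thread sees the setting only because it uses the same VM, and the single-user check is what makes the flag impossible to flip once a second thread exists.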

>> (Another reason for per-thread instead of per-mm: as a per-mm thing,
>> you can't set it up for your descendants using vfork(); prctl();
>> exec(), and the latter is how your average language runtime that
>> spawns subprocesses would want to do it.)
>
> That's indeed the benefit it provides for now since I actually had
> to *add* code to execve() to disable it then.

So the "vfork()" case is indeed interesting, but I don't think it's
all that relevant.

Why?

If you do the PTI on/off operation *before* the vfork(), nothing is
different. The vfork() child by definition ends up having the same PTI
state, since it shares the same VM. But that's actually 100% expected,
and it matches the fork() behavior too: the PTI state should just be
copied by a fork(), since fork() isn't a protection-domain boundary.
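The fork()/vfork() difference described above can be sketched with a toy model that stores the PTI state per VM (an assumption of the sketch, not a statement about the real implementation). The names `struct mm_model`, `model_fork()` and `model_vfork()` are hypothetical, and allocation is simplified to a bare `malloc()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical address-space model: PTI opt-out stored per VM. */
struct mm_model {
	int users;   /* tasks currently using this VM */
	bool nopti;  /* PTI disabled for this VM? */
};

/* fork(): the child gets its own VM, copied from the parent,
 * so the PTI state is copied along with everything else. */
static struct mm_model *model_fork(const struct mm_model *parent)
{
	struct mm_model *child = malloc(sizeof(*child));
	if (!child)
		return NULL;
	child->users = 1;
	child->nopti = parent->nopti;
	return child;
}

/* vfork(): the child borrows the parent's VM, so the PTI state is
 * not copied at all; it is the very same object. */
static struct mm_model *model_vfork(struct mm_model *parent)
{
	assert(parent->users >= 1);
	parent->users++;
	return parent;
}
```

The forked child starts with a copy of the parent's setting but owns its VM, so it could later change the setting independently; the vforked child is the same VM, so there is nothing to copy.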

And *after* you've done a vfork(), you can't do a PTI on/off, because
now the VM has multiple users, which is 100% equivalent to the thread
case that we already all agreed should be disallowed. So no, you can't
do "vfork -> setnopti -> exec", but that is in no way different from
any of the *other* things you cannot do in between vfork and execve.
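Why "vfork -> setnopti -> exec" cannot work can be shown with a toy model that stores the PTI state per VM and allows changes only while the VM has a single user, the same rule as for threads. The names `struct mm_model`, `mm_set_nopti()`, `model_vfork()` and `model_exec_release()` are hypothetical, and the model ignores everything except user counting.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical address-space model: PTI opt-out stored per VM. */
struct mm_model {
	int users;   /* tasks currently using this VM */
	bool nopti;  /* PTI disabled for this VM? */
};

/* Only a VM with exactly one user may change its PTI state. */
static int mm_set_nopti(struct mm_model *mm, bool nopti)
{
	if (mm->users > 1)
		return -EBUSY; /* same rule as for a threaded process */
	mm->nopti = nopti;
	return 0;
}

/* vfork(): the child borrows the parent's VM. */
static struct mm_model *model_vfork(struct mm_model *parent)
{
	parent->users++;
	return parent;
}

/* execve() in the vfork child: the child moves to a fresh VM and
 * releases the borrowed one (in the real vfork(), this is also when
 * the suspended parent resumes). */
static void model_exec_release(struct mm_model *borrowed)
{
	assert(borrowed->users > 1);
	borrowed->users--;
}
```

Once the vfork child has exec'd, the parent is back to a single-user VM and the check passes again, so vfork() needs no special casing.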

And in a wrapper that sets nopti, you wouldn't want to use vfork
anyway. You wouldn't even want to use *fork*. You'd just do "set
nopti" and then execve(). That's the whole point of the wrapper.
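A wrapper of the kind described above needs no fork() at all: it changes its own setting and then replaces itself with the target program. This sketch does not assume any real kernel interface for disabling PTI: `hypothetical_set_nopti()` is a placeholder stub for whatever interface a patch series like this one might provide, and `wrapper_main()` is likewise an illustrative name. `execvp()`, `perror()` and `fprintf()` are standard POSIX/C calls.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* HYPOTHETICAL: stands in for whatever "disable PTI for this
 * process" interface might exist; it does nothing here. */
static int hypothetical_set_nopti(void)
{
	return 0; /* pretend the request succeeded */
}

/* Body of a tiny "nopti" wrapper: opt out of PTI, then exec the real
 * program. No fork() or vfork() is involved: the wrapper process
 * itself becomes the target program. */
static int wrapper_main(int argc, char *argv[])
{
	assert(argc >= 1); /* argv[0] is used in the usage message */
	if (argc < 2) {
		fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
		return 2;
	}
	if (hypothetical_set_nopti() != 0) {
		perror("set nopti");
		return 1;
	}
	/* execvp() searches PATH and only returns on failure. */
	execvp(argv[1], &argv[1]);
	perror("execvp");
	return 127;
}
```

A real wrapper's `main()` would simply `return wrapper_main(argc, argv);`. On success `execvp()` never returns, so the target program runs in the same process, with the setting applied before the exec.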

So vfork() is worth _mentioning_, but I don't think there is any
actual issue there. Quite the reverse - it acts exactly as expected.

The main thing that should be special for PTI on/off is "execve()".
That's the one that may force PTI on again, because an execve() can
cross a security boundary.

The other case may be the CLONE_NEW* operations. I *think* they are
no-ops as far as the PTI setting is concerned, but people should still
think them through.

               Linus