Date: Thu, 11 Jan 2018 22:59:54 +0100
From: Willy Tarreau <w@1wt.eu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>,
        Dave Hansen <dave.hansen@linux.intel.com>,
        Peter Zijlstra <peterz@infradead.org>,
        LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
        Borislav Petkov <bp@alien8.de>,
        Brian Gerst <brgerst@gmail.com>,
        Ingo Molnar <mingo@kernel.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        Josh Poimboeuf <jpoimboe@redhat.com>,
        "H. Peter Anvin" <hpa@zytor.com>,
        Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
        Kees Cook <keescook@chromium.org>
Subject: Re: [RFC PATCH v2 6/6] x86/entry/pti: don't switch PGD on when
 pti_disable is set
Message-ID: <20180111215954.GC15528@1wt.eu>
References: <20180110082207.GX29822@worktop.programming.kicks-ass.net>
 <20180110091102.GH14066@1wt.eu>
 <CALCETrXQFL2sJPJe_Z8XLEo11V_tLyZR0y4PTRxJzrhmpdsuJg@mail.gmail.com>
 <CA+55aFwfSKNAYEFWCwCe2vSxejA3X85FnX9t1EdnCRUwC1ou1Q@mail.gmail.com>
 <20180111064259.GC14920@1wt.eu>
 <0f08d89e-61e1-20e3-5c59-0b2f7b32bf0c@linux.intel.com>
 <20180111154412.GA15296@1wt.eu>
 <CALCETrVcQg_1opnvOP4ksOAC07K4O_LTSxy2czwtObwR3YL+-w@mail.gmail.com>
 <20180111174025.GB15344@1wt.eu>
 <CA+55aFzOAAJ+WwA6BGAqW_h-r3_dYNpvgsmHcbpugwu4emcS6g@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CA+55aFzOAAJ+WwA6BGAqW_h-r3_dYNpvgsmHcbpugwu4emcS6g@mail.gmail.com>
User-Agent: Mutt/1.6.1 (2016-04-27)
Sender: linux-kernel-owner@vger.kernel.org

On Thu, Jan 11, 2018 at 10:25:29AM -0800, Linus Torvalds wrote:
> Just to clarify: I definitely want the part where it is only
> switchable in single-threaded mode, and I actually do want it
> "inherited" by threads when they do get created.

OK, this is what series v3 currently does, because the TIF_* flags
are copied as-is to threads (unless I missed something). Even
re-enabling is currently refused if mm_users > 1.

> It's just that my mental model for threads is not that they "inherit"
> the PTI flag, it's that they simply share it. But that's more of a
> difference in "views" than in actual behavior.

I see, thanks for explaining this point; I understand your concern
better now. Well, if we document that the current process' flag is
replicated as-is to any threads so that it is consistent across all
threads, and that it may only be modified on all threads atomically,
which currently we can only achieve by doing it when there's a single
thread on an mm, I suspect it could match your mental model.
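
To make that rule concrete, here is a rough userspace model of it. It
is only a sketch and not kernel code: struct mm_model, struct
task_model, clone_thread() and set_pti_disabled() are names made up
for the illustration, and returning -EPERM on refusal is an
assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of the rule, not kernel code. */
struct mm_model {
	int mm_users;		/* number of tasks sharing this address space */
};

struct task_model {
	struct mm_model *mm;
	bool pti_disabled;	/* stands in for the per-task TIF_* flag */
};

/* Creating a thread copies the flag as-is and adds one user to the mm. */
static struct task_model clone_thread(struct task_model *parent)
{
	struct task_model child = *parent;

	parent->mm->mm_users++;
	return child;
}

/*
 * Changing the flag in either direction is refused once the mm is
 * shared, so all tasks on one mm always agree on its value.
 * The -EPERM error code is an assumption made for this sketch.
 */
static int set_pti_disabled(struct task_model *t, bool disable)
{
	if (t->mm->mm_users > 1)
		return -EPERM;
	t->pti_disabled = disable;
	return 0;
}
```

In this model a single-threaded task can flip the flag, a thread
created afterwards starts with the same value, and from then on every
attempt to change it fails on both tasks.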

> If you do the PTI on/off operation *before* the vfork(), nothing is
> different. The vfork() by definition ends up having the same PTI
> state, since it has the same VM. But that's actually 100% expected,
> and it matches the fork() behavior too: the PTI state should be just
> copied by a fork(), since fork isn't any protection domain.
> 
> And *after* you've done a vfork(), you can't do a PTI on/off, because
> now the VM has multiple users, which is 100% equivalent to the thread
> case that we already all agreed should be disallowed. So no, you can't
> do "vfork -> setnopti -> exec', but that is in no way different from
> any of the *other* things you cannot do in between vfork and execve.

That's where I like the principle of the NEXT ctl, which can be
per-thread. The thread about to do an execve() cannot change its own
flag because it's entangled with the other ones sharing the same mm,
but it can change its own NEXT flag so that execve() starts with the
specified mode (typically PTI on in the example of log rotation for a
server).
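
Roughly, the NEXT semantics could be modelled like this in userspace.
Again this is only a sketch with made-up names (struct next_mm,
struct next_task, set_pti_now(), set_pti_next(), model_execve()); the
-EPERM return value and the choice to carry the current mode over
execve() when no NEXT request is pending are assumptions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of NOW vs NEXT, not kernel code. */
struct next_mm {
	int mm_users;		/* tasks sharing this address space */
};

struct next_task {
	struct next_mm *mm;
	bool pti_on;		/* mode currently in effect */
	bool next_valid;	/* a NEXT request is pending */
	bool next_pti_on;	/* mode requested for the next execve() */
};

/* NOW: refused while the mm is shared (error code is an assumption). */
static int set_pti_now(struct next_task *t, bool on)
{
	if (t->mm->mm_users > 1)
		return -EPERM;
	t->pti_on = on;
	return 0;
}

/* NEXT: per-thread, always allowed, has no effect until execve(). */
static void set_pti_next(struct next_task *t, bool on)
{
	t->next_valid = true;
	t->next_pti_on = on;
}

/*
 * execve(): the task leaves the shared mm, gets a fresh one, and
 * starts with the NEXT mode if one was requested. Keeping the
 * current mode otherwise is an assumption made for this sketch.
 */
static void model_execve(struct next_task *t, struct next_mm *fresh)
{
	if (t->next_valid)
		t->pti_on = t->next_pti_on;
	t->next_valid = false;
	t->mm->mm_users--;
	fresh->mm_users = 1;
	t->mm = fresh;
}
```

With this, a vfork() child sharing the parent's mm gets -EPERM from
set_pti_now(), but can call set_pti_next() and come out of
model_execve() with PTI on and an mm of its own.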

Quite honestly, for NOW vs NEXT, I find NOW convenient to avoid
a wrapper, but a program could also self-exec after setting the flag
(I did this a long time ago to change thread stack sizes on certain
processes, and it's no real hassle). And given that NOW cannot really
re-adjust the PGD that was already assigned, maybe in the end we
should stick to this NEXT thing and wait for the next execve() to
apply the operation.

Willy