Date: Tue, 9 Jan 2018 22:41:51 +0100
From: Willy Tarreau
To: Andy Lutomirski
Cc: Borislav Petkov, LKML, X86 ML, Brian Gerst, Dave Hansen, Ingo Molnar,
    Linus Torvalds, Peter Zijlstra, Thomas Gleixner, Josh Poimboeuf,
    "H. Peter Anvin", Kees Cook
Subject: Re: [RFC PATCH v2 2/6] x86/arch_prctl: add ARCH_GET_NOPTI and
    ARCH_SET_NOPTI to enable/disable PTI
Message-ID: <20180109214151.GB13282@1wt.eu>

On Tue, Jan 09, 2018 at 01:26:57PM -0800, Andy Lutomirski wrote:
> On Tue, Jan 9, 2018 at 6:54 AM, Willy Tarreau wrote:
> > On Tue, Jan 09, 2018 at 03:51:57PM +0100, Borislav Petkov wrote:
> >> On Tue, Jan 09, 2018 at 03:36:53PM +0100, Willy Tarreau wrote:
> >> > I see and am not particularly against this, but what use case do you
> >> > have in mind precisely ? I doubt it's just saving a few tens of bytes,
> >> > so probably you're more concerned about the potential risks this opens ?
> >> > But given we only allow this for CAP_SYS_RAWIO and these ones already
> >> > have access to /dev/mem and many other things, don't you think there
> >> > are much easier ways to dump kernel memory in this case than trying to
> >> > inject some meltdown code into the victim process ? Or maybe you have
> >> > other cases in mind that I'm not seeing.
> >>
> >> I'd like this to be config-controllable so that distros can make the
> >> decision whether/if they want to support the whole per-mm thing.
> >
> > OK.
> >
> >> Also, if CAP_SYS_RAWIO is going to protect, please make the
> >> ARCH_GET_NOPTI variant check it too.
> >
> > Interestingly I removed the check consecutive to the discussions. But
> > I think I'll simply remove the whole ARCH_GET_NOPTI as it has no real
> > value beyond initial development.
> >
>
> I've thought about this a bit more. Here are my thoughts:
>
> 1. I don't like it being per-mm. I think it should be a per-thread
> control so that a program can have a thread with PTI that runs
> less-trusted JavaScript and other network threads with PTI off.

Ingo suggested such a use case as well. While I'm quite inclined to agree
with it, I'm just wondering: do we really have processes that are both
I/O bound and executing Javascript or similar in a thread? Well, thinking
about it, we have Lua in haproxy, and we could imagine having Javascript
later when admins don't want to learn Lua. So that could make sense
(/me takes a sickness bag to throw up).

> Obviously we lose NX protection mm-wide if any threads have PTI off.
> I think the way to implement this is:
>
> Have this in struct mm_context:
>
> bool has_non_pti_thread;
>
> To turn PTI off on a thread:
>
> Take pagetable_lock.
> if (!has_non_pti_thread) {
>     context.has_non_pti_thread = true;
>     clear the NX bits;
> }
> drop pagetable_lock;
> set the TI flag;

Linus suggested that we refuse to turn off PTI if any thread was already
created and I really agree with this, and it's not incompatible with what
you have above. We could just turn it on again for certain threads.

> Fork clears the per-mm flag in the new mm. Exec clears it, too. I
> think that's all that's needed. Newly created threads always have PTI
> on.

Fork doesn't clear it (exec indeed does). Fork clearing it would be
problematic as it would mean you can't do it on a daemon during startup.

> To turn PTI back on, just clear the TI flag.
>
> 2. Turning off PTI is, in general, a terrible idea. It totally breaks
> any semblance of a security model on a Meltdown-affected CPU.

Absolutely, but it recovers what matters more in *certain* workloads,
which is performance.

> So I
> think we should require CAP_SYS_RAWIO *and* that the system is booted
> with pti=allow_optout or something like that.

I'm really not a fan of this. 1) It would require a reboot during peak
hours to try to fix the problem. 2) The flag will end up being deployed
everywhere by default in environments flirting with performance "just in
case", so it will be rendered useless.

I'm fine with Boris' requirement that the kernel should be built with the
appropriate option to support this. If you're doing your own builds, you
can well take care of having the appropriate options (PTI + the right to
turn it off) and deploy such kernels where relevant.

Willy
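
For readers following the thread, the per-thread scheme quoted above (a
sticky per-mm flag set under the page table lock, plus a per-thread flag)
can be modelled in plain userspace C roughly as below. This is only a
sketch under stated assumptions: struct mm_model, struct thread_model and
every function name are hypothetical, nx_cleared merely stands in for
"clear the NX bits", nopti stands in for the TI flag, and none of it is
kernel code or the actual patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of the per-mm state discussed above. */
struct mm_model {
    pthread_mutex_t pagetable_lock; /* stands in for pagetable_lock */
    bool has_non_pti_thread;        /* sticky once any thread opts out */
    bool nx_cleared;                /* stands in for "NX bits cleared" */
};

/* Hypothetical per-thread state; nopti models the per-thread TI flag. */
struct thread_model {
    struct mm_model *mm;
    bool nopti;
};

static void mm_model_init(struct mm_model *mm)
{
    pthread_mutex_init(&mm->pagetable_lock, NULL);
    mm->has_non_pti_thread = false;
    mm->nx_cleared = false;
}

/* Newly created threads always start with PTI on. */
static void thread_model_attach(struct thread_model *t, struct mm_model *mm)
{
    t->mm = mm;
    t->nopti = false;
}

/* Turn PTI off for one thread: the mm-wide work happens only once. */
static void thread_disable_pti(struct thread_model *t)
{
    pthread_mutex_lock(&t->mm->pagetable_lock);
    if (!t->mm->has_non_pti_thread) {
        t->mm->has_non_pti_thread = true;
        t->mm->nx_cleared = true;
    }
    pthread_mutex_unlock(&t->mm->pagetable_lock);
    t->nopti = true;
}

/* Turn PTI back on: only the per-thread flag changes. */
static void thread_enable_pti(struct thread_model *t)
{
    t->nopti = false;
}
```

In this model, calling thread_disable_pti() on one thread sets both the
per-thread flag and the mm-wide one, a thread attached afterwards still
starts with PTI on, and thread_enable_pti() leaves the mm-wide flag set,
which matches the "we lose NX protection mm-wide" remark above. The
privilege checks (CAP_SYS_RAWIO, a build or boot option) and the fork/exec
behaviour debated in the thread are deliberately left out.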