Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935368AbeAKRlX (ORCPT + 1 other); Thu, 11 Jan 2018 12:41:23 -0500 Received: from wtarreau.pck.nerim.net ([62.212.114.60]:39606 "EHLO 1wt.eu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932429AbeAKRlW (ORCPT ); Thu, 11 Jan 2018 12:41:22 -0500 Date: Thu, 11 Jan 2018 18:40:25 +0100 From: Willy Tarreau To: Andy Lutomirski Cc: Dave Hansen , Linus Torvalds , Peter Zijlstra , LKML , X86 ML , Borislav Petkov , Brian Gerst , Ingo Molnar , Thomas Gleixner , Josh Poimboeuf , "H. Peter Anvin" , Greg Kroah-Hartman , Kees Cook Subject: Re: [RFC PATCH v2 6/6] x86/entry/pti: don't switch PGD on when pti_disable is set Message-ID: <20180111174025.GB15344@1wt.eu> References: <1515502580-12261-1-git-send-email-w@1wt.eu> <1515502580-12261-7-git-send-email-w@1wt.eu> <20180110082207.GX29822@worktop.programming.kicks-ass.net> <20180110091102.GH14066@1wt.eu> <20180111064259.GC14920@1wt.eu> <0f08d89e-61e1-20e3-5c59-0b2f7b32bf0c@linux.intel.com> <20180111154412.GA15296@1wt.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.6.1 (2016-04-27) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: Hi Andy! On Thu, Jan 11, 2018 at 09:09:14AM -0800, Andy Lutomirski wrote: > On Thu, Jan 11, 2018 at 7:44 AM, Willy Tarreau wrote: > >> So, I'd do this: > >> 1. Do the arch_prctl() (but ask the ARM guys what they want too) > >> 2. Enabled for an entire process (not thread) > >> 3. Inherited across fork/exec > >> 4. Cleared on setuid() and friends > > > > This one causes me a problem : some daemons already take care of dropping > > privileges after the initial fork() for the sake of security. Haproxy > > typically does this at boot : > > > > - parse config > > - chroot to /var/empty > > - setuid(dedicated_uid) > > - fork() > > > > This ensures the process is properly isolated and hard enough to break out > > of. So I'd really like this setuid() not to anihilate all we've done. > > Probably that we want to drop it on suid binaries however, though I'm > > having doubts about the benefits, because if the binary already allows > > an intruder to inject its own meltdown code, you're quite screwed anyway. > > > >> 5. I'm sure the security folks have/want a way to force it on forever > > > > Sure! That's what I implemented using the sysctl. > > > > All of these proposals have serious issues. For example, suppose I > have a setuid program called nopti that works like this: > > $ nopti some_program > > nopti verifies that some_program is trustworthy and runs it (as the > real uid of nopti's user) with PTI off. Now we have all the usual > problems: you can easily break out using ptrace(), for example. And > LD_PRELOAD gets this wrong. Et. Fine, we can drop it on suid binaries as I mentionned. > So I think that no-pti mode is a privilege as opposed to a mode per > se. If you can turn off PTI, then you have the ability to read all of > kernel memory So maybe we should treat it as such. Add a capability > CAP_DISABLE_PTI. If you have that capability (globally), then you can > use the arch_prctl() or regular prctl() or whatever to turn PTI on. > If you lose the cap, you lose no-pti mode as well. I disagree on this, because the only alternative I have is to decide to keep my process running as root, which is even worse, as root can much more easily escape from a chroot jail for example, or access /dev/mem and read all the memory as well. Also tell Linus that he'll have to build his kernels as root ;-) The arch_prctl() calls I proposed only allow to turn PTI off for privileged users but any user can turn it back on. For me it's important. A process might trust itself for its own use, but such processes will rarely trust external processes in case they need to perform an occasional fork+exec. Imagine for example a static web server requiring to achieve the highest possible performance and having to occasionally call logrotate to rotate+compress the logs. It's better if the process knows how to turn PTI back on before calling this. > If an LSM wants to block it, it can use existing mechanisms. > > As for per-mm vs per-thread, let's make it only switchable in > single-threaded processes for now and inherited when threads are > created. That's exactly what it does for now, but Linus doesn't like it at all. So I'll switch it back to per-mm + per-CPU variable. Well he has a valid point regarding the pgd and _PAGE_NX setting. point Now at least we know the change is minimal if we have a good reason for doing differently later. > We can change that if and when demand for the ability to > change it shows up. > > (Another reason for per-thread instead of per-mm: as a per-mm thing, > you can't set it up for your descendents using vfork(); prctl(); > exec(), and the latter is how your average language runtime that > spawns subprocesses would want to do it. That's indeed the benefit it provides for now since I actually had to *add* code to execve() to disable it then. Cheers, Willy