From: Andy Lutomirski
Date: Thu, 11 Jan 2018 09:09:14 -0800
Subject: Re: [RFC PATCH v2 6/6] x86/entry/pti: don't switch PGD on when pti_disable is set
To: Willy Tarreau
Cc: Dave Hansen, Linus Torvalds, Andy Lutomirski, Peter Zijlstra, LKML,
    X86 ML, Borislav Petkov, Brian Gerst, Ingo Molnar, Thomas Gleixner,
    Josh Poimboeuf, "H. Peter Anvin", Greg Kroah-Hartman, Kees Cook

On Thu, Jan 11, 2018 at 7:44 AM, Willy Tarreau wrote:
> Hi Dave,
>
> On Thu, Jan 11, 2018 at 07:29:30AM -0800, Dave Hansen wrote:
>> I don't think we need a "NOW" and "NEXT" mode, at least initially.
>> The "NEXT" semantics are going to be tricky and I think "NOW" is good enough
>
> In fact I thought the NEXT one would bring us a nice benefit which is that
> we start the new process knowing the flag's value so we can decide whether
> or not to apply _PAGE_NX on the pgd from the start, and never touch it
> anymore.
>
>> Whatever we do, we'll need this PTI-disable flag to be able to cross
>> execve() so that a wrapper a la nice(1) works.
>
> Absolutely!
>
>> Initially, I think the
>> default should be that it survives fork(). There are just too many
>> things out there that "start up" by doing a shell script that calls a
>> python script, that calls a...
>
> Not only that, simply daemons, like most services are!
>
>> Without the wrapper support, we're _basically_ stuck using this only in
>> newly-compiled binaries. That's going to make it much less likely to
>> get used.
>
> I know, that's why I kept considering that option despite not really
> needing it for my own use case.
>
>> The inheritance also gives an app a way to re-enable protections for
>> children, just from a _second_ wrapper. That's nice because it means we
>> don't initially need a "NEXT" ABI.
>>
>> So, I'd do this:
>> 1. Do the arch_prctl() (but ask the ARM guys what they want too)
>> 2. Enabled for an entire process (not thread)
>> 3. Inherited across fork/exec
>> 4. Cleared on setuid() and friends
>
> This one causes me a problem: some daemons already take care of dropping
> privileges after the initial fork() for the sake of security. Haproxy
> typically does this at boot:
>
> - parse config
> - chroot to /var/empty
> - setuid(dedicated_uid)
> - fork()
>
> This ensures the process is properly isolated and hard enough to break out
> of. So I'd really like this setuid() not to annihilate all we've done.
> Probably that we want to drop it on suid binaries however, though I'm
> having doubts about the benefits, because if the binary already allows
> an intruder to inject its own meltdown code, you're quite screwed anyway.
>
>> 5. I'm sure the security folks have/want a way to force it on forever
>
> Sure! That's what I implemented using the sysctl.
>

All of these proposals have serious issues. For example, suppose I
have a setuid program called nopti that works like this:

$ nopti some_program

nopti verifies that some_program is trustworthy and runs it (as the
real uid of nopti's user) with PTI off. Now we have all the usual
problems: you can easily break out using ptrace(), for example. And
LD_PRELOAD gets this wrong. Etc.

So I think that no-pti mode is a privilege as opposed to a mode per
se. If you can turn off PTI, then you have the ability to read all of
kernel memory. So maybe we should treat it as such. Add a capability
CAP_DISABLE_PTI. If you have that capability (globally), then you can
use the arch_prctl() or regular prctl() or whatever to turn PTI off.
If you lose the cap, you lose no-pti mode as well. If an LSM wants to
block it, it can use existing mechanisms.

As for per-mm vs per-thread, let's make it only switchable in
single-threaded processes for now and inherited when threads are
created. We can change that if and when demand for the ability to
change it shows up.

(Another reason for per-thread instead of per-mm: as a per-mm thing,
you can't set it up for your descendants using vfork(); prctl();
exec(), and the latter is how your average language runtime that
spawns subprocesses would want to do it.)
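
To make the proposed rules concrete, here is a rough userspace model of
them in C. It is only a sketch: CAP_DISABLE_PTI is the name suggested
above, but the struct, the function names, and the choice of -EPERM and
-EBUSY as error codes are hypothetical and do not correspond to any
existing kernel interface. It also assumes that turning PTI back on
never requires the capability, which the proposal does not spell out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model only: none of these names exist in the kernel. */
struct task_model {
	bool has_cap_disable_pti;	/* stands in for the proposed CAP_DISABLE_PTI */
	bool pti_disabled;		/* true while the task runs in no-pti mode */
	int nr_threads;			/* threads currently in the process */
};

/*
 * Model of the prctl()/arch_prctl() request.
 * Disabling PTI needs the capability; any change needs a
 * single-threaded process (assumed error codes: -EPERM, -EBUSY).
 */
static int model_set_pti_disabled(struct task_model *t, bool disable)
{
	if (disable && !t->has_cap_disable_pti)
		return -EPERM;
	if (t->nr_threads != 1)
		return -EBUSY;
	t->pti_disabled = disable;
	return 0;
}

/* Losing the capability also ends no-pti mode. */
static void model_drop_cap(struct task_model *t)
{
	t->has_cap_disable_pti = false;
	t->pti_disabled = false;
}

/* A newly created thread inherits the creator's current setting. */
static struct task_model model_clone_thread(struct task_model *parent)
{
	parent->nr_threads++;
	return *parent;
}
```

In this model a task without the capability gets -EPERM when it asks for
no-pti mode, a task with it succeeds while single-threaded, threads
created afterwards start with the same setting, and dropping the
capability switches PTI back on.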