From: ebiederm@xmission.com (Eric W. Biederman)
To: Willy Tarreau
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, tglx@linutronix.de, gnomes@lxorguk.ukuu.org.uk, torvalds@linux-foundation.org
Subject: Re: [PATCH RFC 0/4] Per-task PTI activation
Date: Tue, 09 Jan 2018 09:31:27 -0600
Message-ID: <87a7xnkq0g.fsf@xmission.com>
In-Reply-To: <1515427939-10999-1-git-send-email-w@1wt.eu> (Willy Tarreau's message of "Mon, 8 Jan 2018 17:12:15 +0100")
References: <1515427939-10999-1-git-send-email-w@1wt.eu>
X-Mailing-List: linux-kernel@vger.kernel.org

Willy Tarreau writes:

> Hi!
>
> I could experiment a bit with the possibility to enable/disable PTI per
> task. Please keep in mind that it's not my area of experitise at all, but
> doing so I could recover the initial performance without disabling PTI on
> the whole system.
>
> So what I did in this series consists in the following :
>   - addition of a new per-task TIF_NOPTI flag. Please note that I'm not
>     proud of the way I did it, as 32 flags were already taken. The flags
>     are declared as "long" so there are 32 more flags available on x86_64
>     but C and asm disagree on the type of 1<<32 so I had to declare the
>     hex value by hand... By the way I even suspect that _TIF_FSCHECK is
>     wrong once cast to a long, I think it causes sign extension into the
>     32 upper bits since it's supposed to be signed.
>
>   - addition of a set of arch_prctl() calls (ARCH_GET_NOPTI and
>     ARCH_SET_NOPTI), to check and change the activation of the
>     protection. The change requires CAP_SYS_RAWIO and can be done in
>     a wrapper (that's how I tested)
>
>   - the user PGD was marked with _PAGE_NX to prevent an accidental leak
>     of CR3 from not being detected. I obviously had to disable this since
>     in this case we do want such a user task to run without switching the
>     PGD. I think this could be performed per-task maybe. Another approach
>     might consist in dealing with 3 PGDs and using a different one for
>     unprotected tasks but that really starts to sound overkill.
>
>   - upon return to userspace, I check if the task's flags contain the
>     new TIF_NOPTI or not. If it does contain it, then we don't switch
>     the CR3.
>
>   - upon entry into the kernel from userspace, we can't access the task's
>     flags but we can already check if CR3 points to the kernel or user PGD,
>     and we refrain from switching if it's already the system one.
>
> By doing so I could recover the initial performance of haproxy in a VM,
> going from 12400 connections per second to 21000 once started with this
> trivial wrapper :
>
>   #include
>   #include
>
>   #ifndef ARCH_SET_NOPTI
>   #define ARCH_SET_NOPTI 0x1022
>   #endif
>
>   int main(int argc, char **argv)
>   {
>         arch_prctl(ARCH_SET_NOPTI, 1);
>         argv++;
>         return execvp(argv[0], argv);
>   }
>
> I have not yet run it on real hardware. Before trying to go a bit further
> I'd like to know if such an approach is acceptable or if I'm doing anything
> stupid and looking in the wrong direction.

Before this goes much farther I want to point something out.

When I have kpti protecting me, it is the applications that connect to
the network I worry about. Until I get to a system with local users that
don't trust each other, I don't have a reason to worry about these
attacks from local applications.

The dangerous scenario is someone exploiting a buffer overflow, or
otherwise getting a network-facing application to misbehave, and then
using these new attacks to assist in gaining privilege escalation.

Googling seems to indicate that there is about one issue a year found in
haproxy. So this is not an unrealistic concern for the case you mention.

So unless I am seeing things wrong, this is a patchset designed to drop
your defenses on the most vulnerable applications. Disabling protection
on the most vulnerable applications is not behavior I would encourage.
It seems better than disabling protection system-wide, but only slightly.

I definitely don't think this is something we want applications
disabling themselves. Certainly this should look at no-new-privs and, if
no-new-privs is set, not allow disabling this protection.

Eric