From: Willy Tarreau
To: linux-kernel@vger.kernel.org, x86@kernel.org
Cc: tglx@linutronix.de, gnomes@lxorguk.ukuu.org.uk, torvalds@linux-foundation.org, Willy Tarreau
Subject: [PATCH RFC 0/4] Per-task PTI activation
Date: Mon, 8 Jan 2018 17:12:15 +0100
Message-Id: <1515427939-10999-1-git-send-email-w@1wt.eu>

Hi!

I experimented a bit with enabling/disabling PTI per task. Please keep in mind that this is not my area of expertise at all, but doing so I could recover the initial performance without disabling PTI on the whole system.

So what this series does is the following:

  - addition of a new per-task TIF_NOPTI flag. Please note that I'm not proud of the way I did it, as 32 flags were already taken. The flags are declared as "long", so there are 32 more flags available on x86_64, but C and asm disagree on the type of 1<<32 (C needs 1UL<<32 because a plain 1 is a 32-bit int, while asm does not accept the UL suffix), so I had to write the hex value by hand... By the way, I even suspect that _TIF_FSCHECK is wrong once cast to a long: since it is a signed int, I think it gets sign-extended into the upper 32 bits.

  - addition of a pair of arch_prctl() calls (ARCH_GET_NOPTI and ARCH_SET_NOPTI) to check and change whether the protection is active. Changing it requires CAP_SYS_RAWIO and can be done from a wrapper (that's how I tested it).

  - the user PGD was marked with _PAGE_NX so that an accidental leak of the kernel CR3 into userspace would not go undetected. I obviously had to disable this, since in this case we precisely want such a user task to run without switching the PGD.
    I think this could perhaps be done per task. Another approach might be to deal with 3 PGDs and use a dedicated one for unprotected tasks, but that really starts to sound like overkill.

  - upon return to userspace, I check whether the task's flags contain the new TIF_NOPTI. If they do, CR3 is not switched.

  - upon entry into the kernel from userspace, we can't access the task's flags, but we can check whether CR3 points to the kernel or the user PGD, and we refrain from switching if it already points to the kernel one.

By doing so I could recover the initial performance of haproxy in a VM, going from 12400 to 21000 connections per second once started with this trivial wrapper:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/prctl.h>

#ifndef ARCH_SET_NOPTI
#define ARCH_SET_NOPTI 0x1022
#endif

int main(int argc, char **argv)
{
	/* go through syscall(), as the arch_prctl(2) man page recommends */
	if (syscall(SYS_arch_prctl, ARCH_SET_NOPTI, 1) != 0)
		perror("arch_prctl(ARCH_SET_NOPTI)");

	if (argc < 2)
		return 1;
	argv++;
	execvp(argv[0], argv);
	perror("execvp");
	return 1;
}

I have not run it on real hardware yet. Before trying to go further, I'd like to know whether such an approach is acceptable or whether I'm doing anything stupid and looking in the wrong direction.

Thanks!
Willy

Willy Tarreau (4):
  x86/thread_info: add TIF_NOPTI to disable PTI per task
  x86/arch_prctl: add ARCH_GET_NOPTI and ARCH_SET_NOPTI to enable/disable PTI
  x86/pti: don't mark the user PGD with _PAGE_NX.
  x86/entry/pti: don't switch PGD on tasks holding flag TIF_NOPTI

 arch/x86/entry/calling.h           | 23 +++++++++++++++++++++++
 arch/x86/include/asm/thread_info.h |  8 ++++++++
 arch/x86/include/uapi/asm/prctl.h  |  3 +++
 arch/x86/kernel/process_64.c       | 24 ++++++++++++++++++++++++
 arch/x86/mm/pti.c                  |  2 ++
 5 files changed, 60 insertions(+)