Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S934623AbeAHQN0 (ORCPT + 1 other);
        Mon, 8 Jan 2018 11:13:26 -0500
Received: from wtarreau.pck.nerim.net ([62.212.114.60]:38907 "EHLO 1wt.eu"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S933126AbeAHQNR (ORCPT ); Mon, 8 Jan 2018 11:13:17 -0500
From: Willy Tarreau
To: linux-kernel@vger.kernel.org, x86@kernel.org
Cc: tglx@linutronix.de, gnomes@lxorguk.ukuu.org.uk,
        torvalds@linux-foundation.org, Willy Tarreau
Subject: [PATCH RFC 4/4] x86/entry/pti: don't switch PGD on tasks holding flag TIF_NOPTI
Date: Mon, 8 Jan 2018 17:12:19 +0100
Message-Id: <1515427939-10999-5-git-send-email-w@1wt.eu>
X-Mailer: git-send-email 2.8.0.rc2.1.gbe9624a
In-Reply-To: <1515427939-10999-1-git-send-email-w@1wt.eu>
References: <1515427939-10999-1-git-send-email-w@1wt.eu>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

If a task has the TIF_NOPTI flag set, it doesn't want to experience
page table isolation. In this case, returns from kernel to user will
not switch CR3, leaving it pointing to the kernel PGD, which already
maps both user and kernel pages. Upon entry into the kernel, we can't
check this flag, so we simply check whether CR3 was pointing to the
kernel's PGD, indicating that no switch happened earlier, and in this
case we don't change it.

Thanks to these changes, haproxy running under KVM went back from
12400 conn/s to 21000 once loaded after calling prctl().

Signed-off-by: Willy Tarreau
---
 arch/x86/entry/calling.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 45a63e0..054b8b7 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include
+#include
 #include
 #include
 #include
@@ -214,6 +215,11 @@
 .macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
 	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
 	mov %cr3, \scratch_reg
+
+	/* if we're already on the kernel PGD, we don't switch */
+	testq $(PTI_SWITCH_PGTABLES_MASK), \scratch_reg
+	jz .Lend_\@
+
 	ADJUST_KERNEL_CR3 \scratch_reg
 	mov \scratch_reg, %cr3
 .Lend_\@:
@@ -224,6 +230,12 @@

 .macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
 	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+
+	/* "NOPTI" taskflag avoids the switch */
+	movq PER_CPU_VAR(current_task), \scratch_reg
+	btq $TIF_NOPTI, TASK_TI_flags(\scratch_reg)
+	jc .Lend_\@
+
 	mov %cr3, \scratch_reg

 	ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
@@ -262,6 +274,13 @@
 	ALTERNATIVE "jmp .Ldone_\@", "", X86_FEATURE_PTI
 	movq %cr3, \scratch_reg
 	movq \scratch_reg, \save_reg
+
+	/* if we're already on the kernel PGD, we don't switch,
+	 * we just save the current cr3.
+	 */
+	testq $(PTI_SWITCH_PGTABLES_MASK), \scratch_reg
+	jz .Ldone_\@
+
 	/*
 	 * Is the "switch mask" all zero? That means that both of
 	 * these are zero:
@@ -284,6 +303,10 @@
 .macro RESTORE_CR3 scratch_reg:req save_reg:req
 	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI

+	/* if we saved a kernel context, we didn't switch so we don't switch */
+	testq $(PTI_SWITCH_PGTABLES_MASK), \save_reg
+	jz .Lend_\@
+
 	ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID

 	/*
-- 
1.7.12.1
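
For readers who prefer C to the assembler macros, the decision logic of
the hunks above can be modelled roughly as follows. This is only an
illustrative userspace sketch, not kernel code: the struct and function
names are hypothetical, PCID handling and the actual CR3 writes are left
out, and it assumes that PTI_SWITCH_PGTABLES_MASK is the single bit
(1 << PAGE_SHIFT, i.e. bit 12) that distinguishes the user PGD from the
kernel PGD.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the user PGD differs from the kernel PGD only by bit 12. */
#define PTI_SWITCH_PGTABLES_MASK (1ULL << 12)

/* Hypothetical stand-in for the TIF_NOPTI bit in the task's flags. */
struct task {
	bool tif_nopti;
};

/* Model of SWITCH_TO_USER_CR3_NOSTACK: a NOPTI task keeps the kernel PGD. */
uint64_t switch_to_user_cr3(const struct task *t, uint64_t cr3)
{
	if (t->tif_nopti)
		return cr3;			/* jc .Lend: no switch */
	return cr3 | PTI_SWITCH_PGTABLES_MASK;	/* select the user PGD */
}

/* Model of SWITCH_TO_KERNEL_CR3: the flag is not checked, only CR3 is. */
uint64_t switch_to_kernel_cr3(uint64_t cr3)
{
	if ((cr3 & PTI_SWITCH_PGTABLES_MASK) == 0)
		return cr3;			/* jz .Lend: already kernel PGD */
	return cr3 & ~PTI_SWITCH_PGTABLES_MASK;	/* select the kernel PGD */
}

/* Model of RESTORE_CR3: a saved kernel CR3 means no switch was ever done. */
uint64_t restore_cr3(uint64_t current_cr3, uint64_t saved_cr3)
{
	if ((saved_cr3 & PTI_SWITCH_PGTABLES_MASK) == 0)
		return current_cr3;		/* jz .Lend: nothing to restore */
	return saved_cr3;
}
```

In this model a task with tif_nopti set leaves the kernel with CR3
unchanged, so the next entry finds the mask bit clear and skips the
switch as well, which is the round trip the commit message describes.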