From: Nadav Amit <namit@vmware.com>
To: Ingo Molnar
CC: Thomas Gleixner, Andy Lutomirski, Peter Zijlstra, Dave Hansen, Willy Tarreau, Nadav Amit, linux-kernel@vger.kernel.org
Subject: [PATCH RFC v2 1/6] x86: Skip PTI when disable indication is set
Date: Thu, 15 Feb 2018 08:35:57 -0800
Message-ID: <20180215163602.61162-2-namit@vmware.com>
In-Reply-To: <20180215163602.61162-1-namit@vmware.com>
References: <20180215163602.61162-1-namit@vmware.com>
X-Mailing-List: linux-kernel@vger.kernel.org

If PTI is disabled, we do not want to switch
page-tables. On entry to the kernel, this decision is made based on the
CR3 value; on return to userspace, based on a per-CPU indication.

To be on the safe side, avoid speculatively skipping the page-table
switch when returning to userspace. This precaution would be
unnecessary if the CPU could not speculatively execute code without the
proper permissions. When switching to the kernel page-tables, this is
not an issue anyway: if PTI is enabled and the page-tables were not
switched, the kernel part of the user page-tables would not be set.

Signed-off-by: Nadav Amit
---
 arch/x86/entry/calling.h        | 33 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/tlbflush.h | 17 +++++++++++++++--
 arch/x86/kernel/asm-offsets.c   |  1 +
 3 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 3f48f695d5e6..5e9895f44d11 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -216,7 +216,14 @@ For 32-bit we have the following conventions - kernel is built with

 .macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
 	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+
+	/*
+	 * Do not switch on compatibility mode.
+	 */
 	mov	%cr3, \scratch_reg
+	testq	$PTI_USER_PGTABLE_MASK, \scratch_reg
+	jz	.Lend_\@
+
 	ADJUST_KERNEL_CR3 \scratch_reg
 	mov	\scratch_reg, %cr3
 .Lend_\@:
@@ -225,8 +232,20 @@ For 32-bit we have the following conventions - kernel is built with

 #define THIS_CPU_user_pcid_flush_mask   \
 	PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_user_pcid_flush_mask

+#define THIS_CPU_pti_disable \
+	PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_pti_disable
+
 .macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
 	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+
+	/*
+	 * Do not switch on compatibility mode. If there is no need for a
+	 * flush, run lfence to avoid speculative execution returning to user
+	 * with the wrong CR3.
+	 */
+	cmpw	$(0), THIS_CPU_pti_disable
+	jnz	.Lno_spec_\@
+
 	mov	%cr3, \scratch_reg
 	ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
@@ -244,6 +263,10 @@ For 32-bit we have the following conventions - kernel is built with
 	movq	\scratch_reg2, \scratch_reg
 	jmp	.Lwrcr3_pcid_\@

+.Lno_spec_\@:
+	lfence
+	jmp	.Lend_\@
+
 .Lnoflush_\@:
 	movq	\scratch_reg2, \scratch_reg
 	SET_NOFLUSH_BIT	\scratch_reg
@@ -288,6 +311,12 @@ For 32-bit we have the following conventions - kernel is built with

 	ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID

+	/*
+	 * Do not restore if PTI is disabled.
+	 */
+	cmpw	$(0), THIS_CPU_pti_disable
+	jnz	.Lno_spec_\@
+
 	/*
 	 * KERNEL pages can always resume with NOFLUSH as we do
 	 * explicit flushes.
@@ -307,6 +336,10 @@ For 32-bit we have the following conventions - kernel is built with

 	btr	\scratch_reg, THIS_CPU_user_pcid_flush_mask
 	jmp	.Lwrcr3_\@

+.Lno_spec_\@:
+	lfence
+	jmp	.Lend_\@
+
 .Lnoflush_\@:
 	SET_NOFLUSH_BIT	\save_reg
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index d33e4a26dc7e..cf91a484bb41 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -216,6 +216,12 @@ struct tlb_state {
 	 */
 	unsigned long cr4;

+	/*
+	 * Cached value of mm.pti_enable to simplify and speed up kernel entry
+	 * code.
+	 */
+	unsigned short pti_disable;
+
 	/*
 	 * This is a list of all contexts that might exist in the TLB.
 	 * There is one per ASID that we use, and the ASID (what the
@@ -298,6 +304,12 @@ static inline void invalidate_other_asid(void)
 	this_cpu_write(cpu_tlbstate.invalidate_other, true);
 }

+/* Return whether page-table isolation is disabled on this CPU */
+static inline unsigned short cpu_pti_disable(void)
+{
+	return this_cpu_read(cpu_tlbstate.pti_disable);
+}
+
 /*
  * Save some of cr4 feature set we're using (e.g.  Pentium 4MB
  * enable and PPro Global page enable), so that any CPU's that boot
@@ -355,7 +367,8 @@ static inline void __native_flush_tlb(void)
 	 */
 	WARN_ON_ONCE(preemptible());

-	invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));
+	if (!cpu_pti_disable())
+		invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));

 	/* If current->mm == NULL then the read_cr3() "borrows" an mm */
 	native_write_cr3(__native_read_cr3());
@@ -404,7 +417,7 @@ static inline void __native_flush_tlb_single(unsigned long addr)

 	asm volatile("invlpg (%0)" ::"r" (addr) : "memory");

-	if (!static_cpu_has(X86_FEATURE_PTI))
+	if (!static_cpu_has(X86_FEATURE_PTI) || cpu_pti_disable())
 		return;

 	/*
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 76417a9aab73..435bb5cdfd66 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -97,6 +97,7 @@ void common(void) {

 	/* TLB state for the entry code */
 	OFFSET(TLB_STATE_user_pcid_flush_mask, tlb_state, user_pcid_flush_mask);
+	OFFSET(TLB_STATE_pti_disable, tlb_state, pti_disable);

 	/* Layout info for cpu_entry_area */
 	OFFSET(CPU_ENTRY_AREA_tss, cpu_entry_area, tss);
-- 
2.14.1