From: Andy Lutomirski
Date: Thu, 15 Feb 2018 19:51:50 +0000
Subject: Re: [PATCH RFC v2 1/6] x86: Skip PTI when disable indication is set
To: Nadav Amit
Cc: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, Peter Zijlstra,
    Dave Hansen, Willy Tarreau, Nadav Amit, X86 ML, LKML
In-Reply-To: <20180215163602.61162-2-namit@vmware.com>
References: <20180215163602.61162-1-namit@vmware.com>
            <20180215163602.61162-2-namit@vmware.com>

On Thu, Feb 15, 2018 at 4:35 PM, Nadav Amit wrote:
> If PTI is disabled, we do not want to switch page-tables. On entry to
> the kernel, this is done based on CR3 value. On return, do it according
> to per core indication.
>
> To be on the safe side, avoid speculative skipping of page-tables
> switching when returning the userspace. This can be avoided if the CPU
> cannot execute speculatively code without the proper permissions. When
> switching to the kernel page-tables, this is anyhow not an issue: if PTI
> is enabled and page-tables were not switched, the kernel part of the
> user page-tables would not be set.
>
> Signed-off-by: Nadav Amit
> ---
>  arch/x86/entry/calling.h        | 33 +++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/tlbflush.h | 17 +++++++++++++++--
>  arch/x86/kernel/asm-offsets.c   |  1 +
>  3 files changed, 49 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> index 3f48f695d5e6..5e9895f44d11 100644
> --- a/arch/x86/entry/calling.h
> +++ b/arch/x86/entry/calling.h
> @@ -216,7 +216,14 @@ For 32-bit we have the following conventions - kernel is built with
>
>  .macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
>         ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
> +
> +       /*
> +        * Do not switch on compatibility mode.
> +        */

That comment should just say "if we're already using kernel CR3,
don't switch" or something like that.

>         mov     %cr3, \scratch_reg
> +       testq   $PTI_USER_PGTABLE_MASK, \scratch_reg
> +       jz      .Lend_\@
> +
>         ADJUST_KERNEL_CR3 \scratch_reg
>         mov     \scratch_reg, %cr3
> .Lend_\@:
> @@ -225,8 +232,20 @@ For 32-bit we have the following conventions - kernel is built with
>  #define THIS_CPU_user_pcid_flush_mask \
>         PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_user_pcid_flush_mask
>
> +#define THIS_CPU_pti_disable \
> +       PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_pti_disable
> +
>  .macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
>         ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
> +
> +       /*
> +        * Do not switch on compatibility mode. If there is no need for a
> +        * flush, run lfence to avoid speculative execution returning to user
> +        * with the wrong CR3.
> +        */

Nix the "compatibility mode" stuff please.  Also, can someone confirm
whether the affected CPUs actually speculate through SYSRET?  Because
your LFENCE might be so expensive that it negates a decent chunk of
the benefit.

> +       /*
> +        * Cached value of mm.pti_enable to simplify and speed up kernel entry
> +        * code.
> +        */
> +       unsigned short pti_disable;

Why unsigned short?  IIRC a lot of CPUs use a slow path when decoding
instructions with 16-bit operands like cmpw, so u8 or u32 could be
waaaay faster than u16.

> +/* Return whether page-table isolation is disabled on this CPU */
> +static inline unsigned short cpu_pti_disable(void)
> +{
> +       return this_cpu_read(cpu_tlbstate.pti_disable);
> +}

This should return bool regardless of what type lives in the struct.
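Something along these lines would cover both points (an untested sketch,
just to illustrate; the field and accessor names are the ones from your
patch, the u8/bool types are the suggested change):

        /* In struct tlb_state: a byte-sized flag avoids 16-bit operands. */
        u8 pti_disable;

        /* Return whether page-table isolation is disabled on this CPU. */
        static inline bool cpu_pti_disable(void)
        {
                return this_cpu_read(cpu_tlbstate.pti_disable);
        }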
> -       invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));
> +       if (!cpu_pti_disable())
> +               invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));

This will go badly wrong if pti_disable becomes dynamic.  Can you
just leave the code as it was?

>
>         /* If current->mm == NULL then the read_cr3() "borrows" an mm */
>         native_write_cr3(__native_read_cr3());
> @@ -404,7 +417,7 @@ static inline void __native_flush_tlb_single(unsigned long addr)
>
>         asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
>
> -       if (!static_cpu_has(X86_FEATURE_PTI))
> +       if (!static_cpu_has(X86_FEATURE_PTI) || cpu_pti_disable())
>                 return;

Ditto.
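In other words, keep the unconditional form from the '-' line above
(a sketch with the concern spelled out; the runtime-toggle scenario is
hypothetical, not something the patch does today):

        /*
         * Keep this unconditional: if pti_disable were ever toggled at
         * runtime, a flush that skipped this while it was set would leave
         * stale translations in the user ASID once PTI is used again on
         * this CPU.
         */
        invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));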