Received: by 10.223.185.116 with SMTP id b49csp991705wrg; Fri, 16 Feb 2018 10:25:51 -0800 (PST) X-Google-Smtp-Source: AH8x224Sb+hRnKTFK6ehxeFhTUS1mBITmAB6A5OvhxTRXR06Oj2xMgCOUBYeyuhxKGZCiynV0ymi X-Received: by 10.98.141.208 with SMTP id p77mr6843605pfk.5.1518805551313; Fri, 16 Feb 2018 10:25:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1518805551; cv=none; d=google.com; s=arc-20160816; b=LydwFoYYw3m/ceDbKRdKHjh1oZedZhlVtuw3d6ZtTpc1AvnzHRwMgkM3siPLAde61D SQEceEkCZpoSg0FxtnAYK8hSoJslbHXbNmAsNXfQ0d695OMt4UrlvQ29G14UZU3SXwEK tQzNGvo0qU83acJpTN5yx+Fv+Xo8UA+NSy6WrcrCh2559Hmq8CxZ4QIsrkgeAm7KRm9c qb+KKsFLTY7CnOQ4zCZ80KLTfhral3c8S9hVctOIOHH0wI6GbpgZSyBSadCedGWB0rPF R6Jfy9I8/24ddHQ8k53JeFc8/WUGJ2+JaSq4e6Bx3WQDPqcZRmw3u/8/EO5UdRwgwwRA W/Bw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:references:in-reply-to:mime-version :dmarc-filter:arc-authentication-results; bh=KS6lqL9xKOMBX+LuakSkeMjtoYed+GFzmGnLij3pN4U=; b=iU19F2aKjFvQ4d+2P7kBPmH4B5xd8EykTKAT/OpgjVHP0RkwzN6ecZauObCqMmmVfS L2yL2HQCwzcJ8I1drL6GuFyxDfsV90YrVEHJqq8cwblOWkeljecq3/hzaJwFqvUecFbx 4XB3Uqnmy2cOO5VIuwL4xUPLYekjxclR8xJnOwVEZrwjIZKQY2BRx+hjFqqxBTN8CTHg +6whqCEfYJelYimVRxEiI1HisQGF+RzBmica5mSaacahJc9GLEKpe/2hlCDc76Ps9SAw udWnj6GZQLyLQa6/XmrKv9jHHEK3lRh1gtZq+sbd/kbbbJo3QN/aIuxJJXj9PCEz5vjg IlMg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g4-v6si3482976plp.818.2018.02.16.10.25.37; Fri, 16 Feb 2018 10:25:51 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1164696AbeBOXgE convert rfc822-to-8bit (ORCPT + 99 others); Thu, 15 Feb 2018 18:36:04 -0500 Received: from mail.kernel.org ([198.145.29.99]:58392 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1163785AbeBOXgD (ORCPT ); Thu, 15 Feb 2018 18:36:03 -0500 Received: from mail-it0-f53.google.com (mail-it0-f53.google.com [209.85.214.53]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id DEFA521775 for ; Thu, 15 Feb 2018 23:36:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DEFA521775 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=luto@kernel.org Received: by mail-it0-f53.google.com with SMTP id v186so132926itc.5 for ; Thu, 15 Feb 2018 15:36:02 -0800 (PST) X-Gm-Message-State: APf1xPB8xPrPjIGxipZ0i/xpCrfGygoQwkIBTqNI+X2YbjnDCLB0PSd1 +zCi9jthkug0be7nTjU+udZtEut4hK43Vnl7mDt3MQ== X-Received: by 10.36.145.139 with SMTP id i133mr6062833ite.69.1518737762328; Thu, 15 Feb 2018 15:36:02 -0800 (PST) MIME-Version: 1.0 Received: by 10.2.137.84 with HTTP; Thu, 15 Feb 2018 15:35:41 -0800 (PST) In-Reply-To: References: <20180215163602.61162-1-namit@vmware.com> <20180215163602.61162-2-namit@vmware.com> From: Andy Lutomirski Date: Thu, 15 Feb 2018 23:35:41 +0000 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH RFC v2 1/6] x86: Skip PTI when disable indication is set To: Nadav Amit Cc: Andy Lutomirski , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Dave Hansen , Willy Tarreau , X86 ML , LKML Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Feb 15, 2018 at 8:51 PM, Nadav Amit wrote: > Andy Lutomirski wrote: > >> On Thu, Feb 15, 2018 at 4:35 PM, Nadav Amit wrote: >>> If PTI is disabled, we do not want to switch page-tables. On entry to >>> the kernel, this is done based on CR3 value. On return, do it according >>> to per core indication. >>> >>> To be on the safe side, avoid speculative skipping of page-tables >>> switching when returning the userspace. This can be avoided if the CPU >>> cannot execute speculatively code without the proper permissions. When >>> switching to the kernel page-tables, this is anyhow not an issue: if PTI >>> is enabled and page-tables were not switched, the kernel part of the >>> user page-tables would not be set. >>> >>> Signed-off-by: Nadav Amit >>> --- >>> arch/x86/entry/calling.h | 33 +++++++++++++++++++++++++++++++++ >>> arch/x86/include/asm/tlbflush.h | 17 +++++++++++++++-- >>> arch/x86/kernel/asm-offsets.c | 1 + >>> 3 files changed, 49 insertions(+), 2 deletions(-) >>> >>> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h >>> index 3f48f695d5e6..5e9895f44d11 100644 >>> --- a/arch/x86/entry/calling.h >>> +++ b/arch/x86/entry/calling.h >>> @@ -216,7 +216,14 @@ For 32-bit we have the following conventions - kernel is built with >>> >>> .macro SWITCH_TO_KERNEL_CR3 scratch_reg:req >>> ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI >>> + >>> + /* >>> + * Do not switch on compatibility mode. >>> + */ >> >> That comment should just say "if we're already using kernel CR3, don't >> switch" or something like that. > > ok. > >> >>> mov %cr3, \scratch_reg >>> + testq $PTI_USER_PGTABLE_MASK, \scratch_reg >>> + jz .Lend_\@ >>> + >>> ADJUST_KERNEL_CR3 \scratch_reg >>> mov \scratch_reg, %cr3 >>> .Lend_\@: >>> @@ -225,8 +232,20 @@ For 32-bit we have the following conventions - kernel is built with >>> #define THIS_CPU_user_pcid_flush_mask \ >>> PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_user_pcid_flush_mask >>> >>> +#define THIS_CPU_pti_disable \ >>> + PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_pti_disable >>> + >>> .macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req >>> ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI >>> + >>> + /* >>> + * Do not switch on compatibility mode. If there is no need for a >>> + * flush, run lfence to avoid speculative execution returning to user >>> + * with the wrong CR3. >>> + */ >> >> Nix the "compatibility mode" stuff please. Also, can someone confirm >> whether the affected CPUs actually speculate through SYSRET? Because >> your LFENCE might be so expensive that it negates a decent chunk of >> the benefit. > > I will send performance numbers with in the next iteration. The LFENCE did > not introduce high overheads. Anyhow, it would surely be nice to remove it. > >>> + /* >>> + * Cached value of mm.pti_enable to simplify and speed up kernel entry >>> + * code. >>> + */ >>> + unsigned short pti_disable; >> >> Why unsigned short? >> >> IIRC a lot of CPUs use a slow path when decoding instructions with >> 16-bit operands like cmpw, so u8 or u32 could be waaaay faster than >> u16. > > Will do. > >> >>> +/* Return whether page-table isolation is disabled on this CPU */ >>> +static inline unsigned short cpu_pti_disable(void) >>> +{ >>> + return this_cpu_read(cpu_tlbstate.pti_disable); >>> +} >> >> This should return bool regardless of what type lives in the struct. > > Ok. I think that it was so because I tried to support both CS64 and CS32. > >> >>> - invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid)); >>> + if (!cpu_pti_disable()) >>> + invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid)); >> >> This will go badly wrong if pti_disable becomes dynamic. Can you just >> leave the code as it was? > > Will do. > >> >>> /* If current->mm == NULL then the read_cr3() "borrows" an mm */ >>> native_write_cr3(__native_read_cr3()); >>> @@ -404,7 +417,7 @@ static inline void __native_flush_tlb_single(unsigned long addr) >>> >>> asm volatile("invlpg (%0)" ::"r" (addr) : "memory"); >>> >>> - if (!static_cpu_has(X86_FEATURE_PTI)) >>> + if (!static_cpu_has(X86_FEATURE_PTI) || cpu_pti_disable()) >>> return; >> >> Ditto. > > As for this last one - I don’t see why. Can you please explain? If you are > only worried about enabling/disabling PTI dynamically, I can address this > specific issue by flushing the TLB when it happens. > Simplicity. But if the code is careful to get all the flushing right when the mode switches, I'm okay with keeping the optimization. --Andy