Received: by 10.223.185.116 with SMTP id b49csp908114wrg; Fri, 16 Feb 2018 09:05:00 -0800 (PST) X-Google-Smtp-Source: AH8x227MY2eAmlcFwpuTzlszRl0s16s+0muZVCqkwpOUaTvy6OUMg0dJkXbgberPEHaH9hj1FOHp X-Received: by 2002:a17:902:42e:: with SMTP id 43-v6mr6632201ple.186.1518800700336; Fri, 16 Feb 2018 09:05:00 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1518800700; cv=none; d=google.com; s=arc-20160816; b=y5xbpFppSGC/98ktz/9bfOGNahLdIBotOSCnn7MIsstJtGEv4VFXOxoJj8IOFQS8uM hzQozIUUjT/LMWVB4OzBPbdMcJtFZvlg0YZuHNYPDGzgqY3KwlomLp0/dnVBUwwZPaFJ ecN8cCYfYp3QVNbfX7tNz93JH4IRhd6JB6XKs/br4SvOPTvoNoC1nw8G3YH/Ufi7zNOG rf/1puxq1GpUb3RzbA5NiaCegpxPdN0G12fE18zHjxtO5kJ+IsyJU4Ghfnpy4f+J774V 8/EsnKioWTrVE31aWsnmK06A6dFdFMlVIxeEhQ9pM2nOxLVCZuqedETL7zHbyx0kg8+V Fchg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:to:references:message-id :content-transfer-encoding:cc:date:in-reply-to:from:subject :mime-version:dkim-signature:arc-authentication-results; bh=NF2LvTPLy+2kfnC5+u8wLZOFYPZXnlQLR5yoo0NKKcM=; b=IydHgJhBHJ3xU/IH2QbO1TY8ZAIRP0PHmfHfUDSuySqUwEmVkMWzMKZw5VUBHAPvRH 15p3nwX0YG1tEIL6+SxJwM2pDSKbqvUd0SxPFwev36DfkVTJ7SGLO0sIR09VPGvuwx0Y VUN/9UWwBMglPA1OF9bkBtAWOvtjzo5hrDbxT9imWJcYcZ6pHVSdfYmj/b7FXnRPg2O3 3/TOrSjTRDGPfmcer3a1HFRCEWJg26CoBi960cKrx8jCzx1I0+/vaS/+Q+mgOxJumfFb 4kiZvZuxz/w6Aet2tirWLFNX0J3ZiezoqA5CLbEFx8300SIqyBDzza2MoWn72jgq54D/ vFpg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=svRKIZJ8; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f8si709328pgr.324.2018.02.16.09.04.42; Fri, 16 Feb 2018 09:05:00 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=svRKIZJ8; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1167034AbeBOUvM (ORCPT + 99 others); Thu, 15 Feb 2018 15:51:12 -0500 Received: from mail-pg0-f66.google.com ([74.125.83.66]:40966 "EHLO mail-pg0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161064AbeBOUvL (ORCPT ); Thu, 15 Feb 2018 15:51:11 -0500 Received: by mail-pg0-f66.google.com with SMTP id t4so715817pgp.8 for ; Thu, 15 Feb 2018 12:51:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=NF2LvTPLy+2kfnC5+u8wLZOFYPZXnlQLR5yoo0NKKcM=; b=svRKIZJ89D0xjYXQTTJuXv0Z7Xk0T9d1GqtjT6Am7MOPd4cz6aEQZv8vEDdPNM2eVE Ub9gGc6dyfIHXRy7mKjmIp+Z5s3BS6lt0RxFKaZW/C4uQSnX3D2/8hfyPymix3txDiSO VbkuxlLwui78tWAnrD2oNyzig6q3sG2epxg1tXRdYbn0nRp3k/S3VfeMbNpWeNBgaMYh XbYLuskC2XC+hV0gF8idKd6fDbdTbn/h3+7EEdsoevrffaAH7fpwP16gzY7SVIFyAqb9 3pE7863uu25uP7oH5RmLYCYVHV2r6vt0vbH6XCFMDHksUZEMiSpLpR+PwrfhPMHgRJSJ 1ypg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=NF2LvTPLy+2kfnC5+u8wLZOFYPZXnlQLR5yoo0NKKcM=; b=ss/XSDbZn6/i++8fVQhTUK0PyTP/ywv3HpcPwUdl18CR7yEBr9trzKKoIZYz+EXVif oE/6oZG07NN0iqBwizliPPzMMQvytog7kS0tigYv93Sd9Bc4AHKlYO/zshFU27YtrvgY telTR9g/wA14R8wgvpzStlMOH8R19dcWPcbkD6EGpch33SRLMRRAoPLIqcqx/9/eqbFi o9rDZaS2HtvZhIDA7nW/g5+JBGz7ZVSZD9VnJ/ebqzW3bYJph6VlwaQ8MDBbP9UZ2yDv 8JRqwYN6Tux4MlIf2xU30ClopiL1z9ec4iSHz73F66dcLdMgmv5gw15ETvQ2AL9eLn9T 1Scg== X-Gm-Message-State: APf1xPCx4h3aciFme11l8hIndIo5KeDFkdesnw/Uv9W9DJ6yfRYOM0ku vdUKqgc2sMHrXdkZcfU510A= X-Received: by 10.99.4.131 with SMTP id 125mr3135980pge.375.1518727870379; Thu, 15 Feb 2018 12:51:10 -0800 (PST) Received: from [10.2.101.129] ([208.91.2.2]) by smtp.gmail.com with ESMTPSA id h13sm50437247pfd.14.2018.02.15.12.51.07 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 15 Feb 2018 12:51:08 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: [PATCH RFC v2 1/6] x86: Skip PTI when disable indication is set From: Nadav Amit In-Reply-To: Date: Thu, 15 Feb 2018 12:51:05 -0800 Cc: Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Dave Hansen , Willy Tarreau , X86 ML , LKML Content-Transfer-Encoding: quoted-printable Message-Id: References: <20180215163602.61162-1-namit@vmware.com> <20180215163602.61162-2-namit@vmware.com> To: Andy Lutomirski X-Mailer: Apple Mail (2.3273) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Andy Lutomirski wrote: > On Thu, Feb 15, 2018 at 4:35 PM, Nadav Amit wrote: >> If PTI is disabled, we do not want to switch page-tables. On entry to >> the kernel, this is done based on CR3 value. On return, do it = according >> to per core indication. >>=20 >> To be on the safe side, avoid speculative skipping of page-tables >> switching when returning the userspace. This can be avoided if the = CPU >> cannot execute speculatively code without the proper permissions. = When >> switching to the kernel page-tables, this is anyhow not an issue: if = PTI >> is enabled and page-tables were not switched, the kernel part of the >> user page-tables would not be set. >>=20 >> Signed-off-by: Nadav Amit >> --- >> arch/x86/entry/calling.h | 33 = +++++++++++++++++++++++++++++++++ >> arch/x86/include/asm/tlbflush.h | 17 +++++++++++++++-- >> arch/x86/kernel/asm-offsets.c | 1 + >> 3 files changed, 49 insertions(+), 2 deletions(-) >>=20 >> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h >> index 3f48f695d5e6..5e9895f44d11 100644 >> --- a/arch/x86/entry/calling.h >> +++ b/arch/x86/entry/calling.h >> @@ -216,7 +216,14 @@ For 32-bit we have the following conventions - = kernel is built with >>=20 >> .macro SWITCH_TO_KERNEL_CR3 scratch_reg:req >> ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI >> + >> + /* >> + * Do not switch on compatibility mode. >> + */ >=20 > That comment should just say "if we're already using kernel CR3, don't > switch" or something like that. ok. >=20 >> mov %cr3, \scratch_reg >> + testq $PTI_USER_PGTABLE_MASK, \scratch_reg >> + jz .Lend_\@ >> + >> ADJUST_KERNEL_CR3 \scratch_reg >> mov \scratch_reg, %cr3 >> .Lend_\@: >> @@ -225,8 +232,20 @@ For 32-bit we have the following conventions - = kernel is built with >> #define THIS_CPU_user_pcid_flush_mask \ >> PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_user_pcid_flush_mask >>=20 >> +#define THIS_CPU_pti_disable \ >> + PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_pti_disable >> + >> .macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req >> ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI >> + >> + /* >> + * Do not switch on compatibility mode. If there is no need = for a >> + * flush, run lfence to avoid speculative execution returning = to user >> + * with the wrong CR3. >> + */ >=20 > Nix the "compatibility mode" stuff please. Also, can someone confirm > whether the affected CPUs actually speculate through SYSRET? Because > your LFENCE might be so expensive that it negates a decent chunk of > the benefit. I will send performance numbers with in the next iteration. The LFENCE = did not introduce high overheads. Anyhow, it would surely be nice to remove = it. >> + /* >> + * Cached value of mm.pti_enable to simplify and speed up = kernel entry >> + * code. >> + */ >> + unsigned short pti_disable; >=20 > Why unsigned short? >=20 > IIRC a lot of CPUs use a slow path when decoding instructions with > 16-bit operands like cmpw, so u8 or u32 could be waaaay faster than > u16. Will do. >=20 >> +/* Return whether page-table isolation is disabled on this CPU */ >> +static inline unsigned short cpu_pti_disable(void) >> +{ >> + return this_cpu_read(cpu_tlbstate.pti_disable); >> +} >=20 > This should return bool regardless of what type lives in the struct. Ok. I think that it was so because I tried to support both CS64 and = CS32. >=20 >> - = invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid)); >> + if (!cpu_pti_disable()) >> + = invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid)); >=20 > This will go badly wrong if pti_disable becomes dynamic. Can you just > leave the code as it was? Will do. >=20 >> /* If current->mm =3D=3D NULL then the read_cr3() "borrows" an = mm */ >> native_write_cr3(__native_read_cr3()); >> @@ -404,7 +417,7 @@ static inline void = __native_flush_tlb_single(unsigned long addr) >>=20 >> asm volatile("invlpg (%0)" ::"r" (addr) : "memory"); >>=20 >> - if (!static_cpu_has(X86_FEATURE_PTI)) >> + if (!static_cpu_has(X86_FEATURE_PTI) || cpu_pti_disable()) >> return; >=20 > Ditto. As for this last one - I don=E2=80=99t see why. Can you please explain? = If you are only worried about enabling/disabling PTI dynamically, I can address = this specific issue by flushing the TLB when it happens. Thanks, Nadav