Date: Fri, 2 Feb 2018 11:03:18 +0000
From: Darren Kenny
To: KarimAllah Ahmed
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org,
    Asit Mallick, Arjan Van De Ven, Dave Hansen, Andi Kleen,
    Andrea Arcangeli, Linus Torvalds, Tim Chen, Thomas Gleixner,
    Dan Williams, Jun Nakajima, Paolo Bonzini, David Woodhouse,
    Greg KH, Andy Lutomirski, Ashok Raj
Subject: Re: [PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
Message-ID: <20180202110318.mxacgr52zya2dqk5@starbug-vm.ie.oracle.com>
References: <1517522386-18410-1-git-send-email-karahmed@amazon.de>
 <1517522386-18410-5-git-send-email-karahmed@amazon.de>
In-Reply-To: <1517522386-18410-5-git-send-email-karahmed@amazon.de>
User-Agent: NeoMutt/20171215

On Thu, Feb 01, 2018 at 10:59:45PM +0100, KarimAllah Ahmed wrote:
>[ Based on a patch from Ashok Raj ]
>
>Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
>guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
>be using a retpoline+IBPB based approach.
>
>To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
>guests that do not actually use the MSR, only start saving and restoring
>when a non-zero is written to it.
>
>No attempt is made to handle STIBP here, intentionally. Filtering STIBP
>may be added in a future patch, which may require trapping all writes
>if we don't want to pass it through directly to the guest.
>
>[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]
>
>Cc: Asit Mallick
>Cc: Arjan Van De Ven
>Cc: Dave Hansen
>Cc: Andi Kleen
>Cc: Andrea Arcangeli
>Cc: Linus Torvalds
>Cc: Tim Chen
>Cc: Thomas Gleixner
>Cc: Dan Williams
>Cc: Jun Nakajima
>Cc: Paolo Bonzini
>Cc: David Woodhouse
>Cc: Greg KH
>Cc: Andy Lutomirski
>Cc: Ashok Raj
>Signed-off-by: KarimAllah Ahmed
>Signed-off-by: David Woodhouse

Reviewed-by: Darren Kenny

>---
>v6:
>- got rid of save_spec_ctrl_on_exit
>- introduce msr_write_intercepted
>v5:
>- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
>v4:
>- Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features
>- Handling nested guests
>v3:
>- Save/restore manually
>- Fix CPUID handling
>- Fix a copy & paste error in the name of SPEC_CTRL MSR in
>  disable_intercept.
>- support !cpu_has_vmx_msr_bitmap()
>v2:
>- remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
>- special case writing '0' in SPEC_CTRL to avoid confusing live-migration
>  when the instance never used the MSR (dwmw@).
>- depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
>- add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).
>---
> arch/x86/kvm/cpuid.c |   9 +++--
> arch/x86/kvm/vmx.c   | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> arch/x86/kvm/x86.c   |   2 +-
> 3 files changed, 110 insertions(+), 6 deletions(-)
>
>diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>index 1909635..13f5d42 100644
>--- a/arch/x86/kvm/cpuid.c
>+++ b/arch/x86/kvm/cpuid.c
>@@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>
> /* cpuid 0x80000008.ebx */
> const u32 kvm_cpuid_8000_0008_ebx_x86_features =
>- F(IBPB);
>+ F(IBPB) | F(IBRS);
>
> /* cpuid 0xC0000001.edx */
> const u32 kvm_cpuid_C000_0001_edx_x86_features =
>@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>
> /* cpuid 7.0.edx*/
> const u32 kvm_cpuid_7_0_edx_x86_features =
>- F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
>+ F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
>+ F(ARCH_CAPABILITIES);
>
> /* all calls to cpuid_count() should be made on the same cpu */
> get_cpu();
>@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> g_phys_as = phys_as;
> entry->eax = g_phys_as | (virt_as << 8);
> entry->edx = 0;
>- /* IBPB isn't necessarily present in hardware cpuid */
>+ /* IBRS and IBPB aren't necessarily present in hardware cpuid */
> if (boot_cpu_has(X86_FEATURE_IBPB))
> entry->ebx |= F(IBPB);
>+ if (boot_cpu_has(X86_FEATURE_IBRS))
>+ entry->ebx |= F(IBRS);
> entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
> cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
> break;
>diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>index b13314a..5d8a6a91 100644
>--- a/arch/x86/kvm/vmx.c
>+++ b/arch/x86/kvm/vmx.c
>@@ -594,6 +594,7 @@ struct vcpu_vmx {
> #endif
>
> u64 arch_capabilities;
>+ u64 spec_ctrl;
>
> u32 vm_entry_controls_shadow;
> u32 vm_exit_controls_shadow;
>@@ -1913,6 +1914,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
> }
>
> /*
>+ * Check if MSR is intercepted for currently loaded MSR bitmap.
>+ */
>+static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
>+{
>+ unsigned long *msr_bitmap;
>+ int f = sizeof(unsigned long);
>+
>+ if (!cpu_has_vmx_msr_bitmap())
>+ return true;
>+
>+ msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
>+
>+ if (msr <= 0x1fff) {
>+ return !!test_bit(msr, msr_bitmap + 0x800 / f);
>+ } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
>+ msr &= 0x1fff;
>+ return !!test_bit(msr, msr_bitmap + 0xc00 / f);
>+ }
>+
>+ return true;
>+}
>+
>+/*
> * Check if MSR is intercepted for L01 MSR bitmap.
> */
> static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
>@@ -3264,6 +3288,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_IA32_TSC:
> msr_info->data = guest_read_tsc(vcpu);
> break;
>+ case MSR_IA32_SPEC_CTRL:
>+ if (!msr_info->host_initiated &&
>+ !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
>+ !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
>+ return 1;
>+
>+ msr_info->data = to_vmx(vcpu)->spec_ctrl;
>+ break;
> case MSR_IA32_ARCH_CAPABILITIES:
> if (!msr_info->host_initiated &&
> !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
>@@ -3377,6 +3409,37 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_IA32_TSC:
> kvm_write_tsc(vcpu, msr_info);
> break;
>+ case MSR_IA32_SPEC_CTRL:
>+ if (!msr_info->host_initiated &&
>+ !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
>+ !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
>+ return 1;
>+
>+ /* The STIBP bit doesn't fault even if it's not advertised */
>+ if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
>+ return 1;
>+
>+ vmx->spec_ctrl = data;
>+
>+ if (!data)
>+ break;
>+
>+ /*
>+ * For non-nested:
>+ * When it's written (to non-zero) for the first time, pass
>+ * it through.
>+ *
>+ * For nested:
>+ * The handling of the MSR bitmap for L2 guests is done in
>+ * nested_vmx_merge_msr_bitmap. We should not touch the
>+ * vmcs02.msr_bitmap here since it gets completely overwritten
>+ * in the merging. We update the vmcs01 here for L1 as well
>+ * since it will end up touching the MSR anyway now.
>+ */
>+ vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
>+ MSR_IA32_SPEC_CTRL,
>+ MSR_TYPE_RW);
>+ break;
> case MSR_IA32_PRED_CMD:
> if (!msr_info->host_initiated &&
> !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
>@@ -5702,6 +5765,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> u64 cr0;
>
> vmx->rmode.vm86_active = 0;
>+ vmx->spec_ctrl = 0;
>
> vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
> kvm_set_cr8(vcpu, 0);
>@@ -9373,6 +9437,15 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>
> vmx_arm_hv_timer(vcpu);
>
>+ /*
>+ * If this vCPU has touched SPEC_CTRL, restore the guest's value if
>+ * it's non-zero. Since vmentry is serialising on affected CPUs, there
>+ * is no need to worry about the conditional branch over the wrmsr
>+ * being speculatively taken.
>+ */
>+ if (vmx->spec_ctrl)
>+ wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
>+
> vmx->__launched = vmx->loaded_vmcs->launched;
> asm(
> /* Store host registers */
>@@ -9491,6 +9564,27 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
> #endif
> );
>
>+ /*
>+ * We do not use IBRS in the kernel. If this vCPU has used the
>+ * SPEC_CTRL MSR it may have left it on; save the value and
>+ * turn it off. This is much more efficient than blindly adding
>+ * it to the atomic save/restore list. Especially as the former
>+ * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
>+ *
>+ * For non-nested case:
>+ * If the L01 MSR bitmap does not intercept the MSR, then we need to
>+ * save it.
>+ *
>+ * For nested case:
>+ * If the L02 MSR bitmap does not intercept the MSR, then we need to
>+ * save it.
>+ */
>+ if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
>+ rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
>+
>+ if (vmx->spec_ctrl)
>+ wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>+
> /* Eliminate branch target predictions from guest mode */
> vmexit_fill_RSB();
>
>@@ -10115,7 +10209,7 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
> unsigned long *msr_bitmap_l1;
> unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
> /*
>- * pred_cmd is trying to verify two things:
>+ * pred_cmd & spec_ctrl are trying to verify two things:
> *
> * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
> * ensures that we do not accidentally generate an L02 MSR bitmap
>@@ -10128,9 +10222,10 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
> * the MSR.
> */
> bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
>+ bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
>
> if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
>- !pred_cmd)
>+ !pred_cmd && !spec_ctrl)
> return false;
>
> page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
>@@ -10164,6 +10259,12 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
> }
> }
>
>+ if (spec_ctrl)
>+ nested_vmx_disable_intercept_for_msr(
>+ msr_bitmap_l1, msr_bitmap_l0,
>+ MSR_IA32_SPEC_CTRL,
>+ MSR_TYPE_R | MSR_TYPE_W);
>+
> if (pred_cmd)
> nested_vmx_disable_intercept_for_msr(
> msr_bitmap_l1, msr_bitmap_l0,
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index 4ec142e..ac38143 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -1009,7 +1009,7 @@ static u32 msrs_to_save[] = {
> #endif
> MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
> MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
>- MSR_IA32_ARCH_CAPABILITIES
>+ MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES
> };
>
> static unsigned num_msrs_to_save;
>--
>2.7.4
>