From: Jim Mattson
Date: Mon, 29 Jan 2018 10:43:22 -0800
Subject: Re: [PATCH] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL
To: KarimAllah Ahmed
Cc: kvm list, LKML, Asit Mallick, Arjan Van De Ven, Dave Hansen, Andi Kleen, Andrea Arcangeli, Linus Torvalds, Tim Chen, Thomas Gleixner, Dan Williams, Jun Nakajima, Paolo Bonzini, David Woodhouse, Greg KH, Andy Lutomirski, Ashok Raj
In-Reply-To: <1517167750-23485-1-git-send-email-karahmed@amazon.de>
References: <1517167750-23485-1-git-send-email-karahmed@amazon.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Jan 28, 2018 at 11:29 AM, KarimAllah Ahmed wrote:
> Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests
> that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a
> retpoline+IBPB based approach.
>
> To avoid the overhead of atomically saving and restoring the MSR_IA32_SPEC_CTRL
> for guests that do not actually use the MSR, only add_atomic_switch_msr when a
> non-zero is written to it.
>
> Cc: Asit Mallick
> Cc: Arjan Van De Ven
> Cc: Dave Hansen
> Cc: Andi Kleen
> Cc: Andrea Arcangeli
> Cc: Linus Torvalds
> Cc: Tim Chen
> Cc: Thomas Gleixner
> Cc: Dan Williams
> Cc: Jun Nakajima
> Cc: Paolo Bonzini
> Cc: David Woodhouse
> Cc: Greg KH
> Cc: Andy Lutomirski
> Signed-off-by: KarimAllah Ahmed
> Signed-off-by: Ashok Raj
> ---
>  arch/x86/kvm/cpuid.c |  4 +++-
>  arch/x86/kvm/cpuid.h |  1 +
>  arch/x86/kvm/vmx.c   | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 67 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0099e10..dc78095 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void)
>  /* These are scattered features in cpufeatures.h. */
>  #define KVM_CPUID_BIT_AVX512_4VNNIW     2
>  #define KVM_CPUID_BIT_AVX512_4FMAPS     3
> +#define KVM_CPUID_BIT_SPEC_CTRL         26
>  #define KF(x) bit(KVM_CPUID_BIT_##x)
>
>  int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> @@ -392,7 +393,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>
>  	/* cpuid 7.0.edx*/
>  	const u32 kvm_cpuid_7_0_edx_x86_features =
> -		KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
> +		KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \
> +		(boot_cpu_has(X86_FEATURE_SPEC_CTRL) ? KF(SPEC_CTRL) : 0);

Isn't 'boot_cpu_has()' superfluous here?
And aren't there two bits to pass through for existing CPUs (26 and 27)?
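
To make the two questions above concrete, here is a rough sketch in plain C rather than kernel code. It assumes that the supported-feature mask is ANDed with what the host's CPUID actually reports, which is why an extra boot_cpu_has() test in the mask would add nothing. The macro names and sketch_guest_cpuid_7_0_edx() are hypothetical and exist only for this illustration; the bit positions are the architectural ones for CPUID.(EAX=07H, ECX=0):EDX.

```c
#include <stdint.h>

/* CPUID.(EAX=07H, ECX=0):EDX feature bits. */
#define EDX_AVX512_4VNNIW (1u << 2)
#define EDX_AVX512_4FMAPS (1u << 3)
#define EDX_SPEC_CTRL     (1u << 26)  /* IBRS and IBPB */
#define EDX_STIBP         (1u << 27)

/*
 * Hypothetical helper, not a KVM function. The mask lists every bit the
 * hypervisor is willing to expose; ANDing it with the host's EDX drops
 * any bit the host lacks, so no per-bit host check is needed (assumption:
 * the real code applies the host CPUID value in the same way).
 */
static uint32_t sketch_guest_cpuid_7_0_edx(uint32_t host_edx)
{
	const uint32_t kvm_supported = EDX_AVX512_4VNNIW | EDX_AVX512_4FMAPS |
				       EDX_SPEC_CTRL | EDX_STIBP;

	return host_edx & kvm_supported;
}
```

With a mask of this shape, a host that reports only bit 26 yields a guest value with only bit 26 set, and a host that reports neither bit yields neither.
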
>
>  	/* all calls to cpuid_count() should be made on the same cpu */
>  	get_cpu();
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index cdc70a3..dcfe227 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
>  	[CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX},
>  	[CPUID_7_ECX]         = {         7, 0, CPUID_ECX},
>  	[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
> +	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
>  };
>
>  static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index aa8638a..1b743a0 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -920,6 +920,9 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
>  static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
>  					    u16 error_code);
>  static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
> +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
> +							  u32 msr, int type);
> +
>
>  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
>  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> @@ -2007,6 +2010,28 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
>  	m->host[i].value = host_val;
>  }
>
> +/* do not touch guest_val and host_val if the msr is not found */
> +static int read_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
> +				  u64 *guest_val, u64 *host_val)
> +{
> +	unsigned i;
> +	struct msr_autoload *m = &vmx->msr_autoload;
> +
> +	for (i = 0; i < m->nr; ++i)
> +		if (m->guest[i].index == msr)
> +			break;
> +
> +	if (i == m->nr)
> +		return 1;
> +
> +	if (guest_val)
> +		*guest_val = m->guest[i].value;
> +	if (host_val)
> +		*host_val = m->host[i].value;
> +
> +	return 0;
> +}
> +
>  static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>  {
>  	u64 guest_efer = vmx->vcpu.arch.efer;
> @@ -3203,7 +3228,9 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
>   */
>  static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  {
> +	u64 spec_ctrl = 0;
>  	struct shared_msr_entry *msr;
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>
>  	switch (msr_info->index) {
>  #ifdef CONFIG_X86_64
> @@ -3223,6 +3250,19 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_IA32_TSC:
>  		msr_info->data = guest_read_tsc(vcpu);
>  		break;
> +	case MSR_IA32_SPEC_CTRL:
> +		if (!msr_info->host_initiated &&
> +		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))

Shouldn't this conjunct be:
!(guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
  guest_cpuid_has(vcpu, X86_FEATURE_STIBP))?

> +			return 1;

What if !boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
!boot_cpu_has(X86_FEATURE_STIBP)? That should also return 1, I think.

> +
> +		/*
> +		 * If the MSR is not in the atomic list yet, then it was never
> +		 * written to. So the MSR value will be '0'.
> +		 */
> +		read_atomic_switch_msr(vmx, MSR_IA32_SPEC_CTRL, &spec_ctrl, NULL);

Why not just add msr_ia32_spec_ctrl to struct vcpu_vmx, so that you
don't have to search the atomic switch list?

> +
> +		msr_info->data = spec_ctrl;
> +		break;
>  	case MSR_IA32_SYSENTER_CS:
>  		msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
>  		break;
> @@ -3289,6 +3329,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	int ret = 0;
>  	u32 msr_index = msr_info->index;
>  	u64 data = msr_info->data;
> +	unsigned long *msr_bitmap;
> +
> +	/*
> +	 * IBRS is not used (yet) to protect the host. Once it does, this
> +	 * variable needs to be a bit smarter.
> +	 */
> +	u64 host_spec_ctrl = 0;
>
>  	switch (msr_index) {
>  	case MSR_EFER:
> @@ -3330,6 +3377,22 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_IA32_TSC:
>  		kvm_write_tsc(vcpu, msr_info);
>  		break;
> +	case MSR_IA32_SPEC_CTRL:
> +		if (!msr_info->host_initiated &&
> +		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> +			return 1;

This looks incomplete.
As above, what if !boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
!boot_cpu_has(X86_FEATURE_STIBP)?
If the host doesn't support MSR_IA32_SPEC_CTRL, you'll get a VMX-abort
on loading the host MSRs from the VM-exit MSR load list.

Also, what if the value being written is illegal?

/*
 * Processors that support IBRS but not STIBP
 * (CPUID.(EAX=07H, ECX=0):EDX[27:26] = 01b) will
 * ignore attempts to set STIBP instead of causing an
 * exception due to setting that reserved bit.
 */
if ((data & ~(u64)(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) ||
    ((data & SPEC_CTRL_IBRS) &&
     !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)))
	return 1;

> +
> +		/*
> +		 * Now we know that the guest is actually using the MSR, so
> +		 * atomically load and save the SPEC_CTRL MSR and pass it
> +		 * through to the guest.
> +		 */
> +		add_atomic_switch_msr(vmx, MSR_IA32_SPEC_CTRL, msr_info->data,
> +				      host_spec_ctrl);
> +		msr_bitmap = vmx->vmcs01.msr_bitmap;
> +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW);

I assume you mean MSR_IA32_SPEC_CTRL rather than MSR_FS_BASE.

Also, what if the host and the guest support a different set of bits
in MSR_IA32_SPEC_CTRL, due to a userspace modification of the guest's
CPUID info?

> +
> +		break;
>  	case MSR_IA32_CR_PAT:
>  		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
>  			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
> --
> 2.7.4
>

Where do you preserve the guest's MSR_IA32_SPEC_CTRL value on VM-exit,
if the guest has been given permission to write the MSR?

You also have to clear the guest's MSR_IA32_SPEC_CTRL on
vmx_vcpu_reset, don't you?
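
The points raised in this review (a host-support check, the combined SPEC_CTRL/STIBP guest check, rejecting illegal values, a cached per-vCPU value, and clearing on reset) can be sketched together in a small user-space C model. This is only an illustration under stated assumptions, not a proposed patch: struct vcpu_sketch and the sketch_* functions are hypothetical stand-ins for struct vcpu_vmx and the vmx_get_msr/vmx_set_msr/vmx_vcpu_reset paths, and the boolean fields stand in for the guest_cpuid_has() and boot_cpu_has() queries. Applying the host-support check to host-initiated accesses as well is also an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPEC_CTRL_IBRS  (1ULL << 0)
#define SPEC_CTRL_STIBP (1ULL << 1)

/* Hypothetical stand-in for the per-vCPU state; not the real struct vcpu_vmx. */
struct vcpu_sketch {
	bool guest_has_spec_ctrl; /* models guest CPUID.7.0:EDX[26] */
	bool guest_has_stibp;     /* models guest CPUID.7.0:EDX[27] */
	bool host_has_msr;        /* models host support for MSR_IA32_SPEC_CTRL */
	uint64_t spec_ctrl;       /* cached guest value, no list search needed */
};

/* Common access check; returns true if the access must be rejected. */
static bool sketch_access_denied(const struct vcpu_sketch *v, bool host_initiated)
{
	if (!v->host_has_msr)
		return true;
	return !host_initiated &&
	       !(v->guest_has_spec_ctrl || v->guest_has_stibp);
}

/* Returns 0 on success, 1 if the read should fault. */
static int sketch_get_spec_ctrl(const struct vcpu_sketch *v, uint64_t *data,
				bool host_initiated)
{
	if (sketch_access_denied(v, host_initiated))
		return 1;
	*data = v->spec_ctrl;
	return 0;
}

/* Returns 0 on success, 1 if the write should fault. */
static int sketch_set_spec_ctrl(struct vcpu_sketch *v, uint64_t data,
				bool host_initiated)
{
	if (sketch_access_denied(v, host_initiated))
		return 1;
	/* Reserved bits are illegal; STIBP alone is tolerated, as noted above. */
	if ((data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) ||
	    ((data & SPEC_CTRL_IBRS) && !v->guest_has_spec_ctrl))
		return 1;
	v->spec_ctrl = data;
	return 0;
}

/* Models the reset path: the guest-visible value goes back to zero. */
static void sketch_vcpu_reset(struct vcpu_sketch *v)
{
	v->spec_ctrl = 0;
}
```

The sketch deliberately leaves out the part that only real hardware state can answer, namely how the guest's value is saved on VM-exit once the MSR is passed through.
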