Date: Mon, 29 Jan 2018 14:16:08 -0500
From: Konrad Rzeszutek Wilk
To: Jim Mattson
Cc: KarimAllah Ahmed, kvm list, LKML, Asit Mallick, Arjan Van De Ven,
    Dave Hansen, Andi Kleen, Andrea Arcangeli, Linus Torvalds, Tim Chen,
    Thomas Gleixner, Dan Williams, Jun Nakajima, Paolo Bonzini,
    David Woodhouse, Greg KH, Andy Lutomirski, Ashok Raj
Subject: Re: [PATCH] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL
Message-ID: <20180129191608.GS22045@char.us.oracle.com>
References: <1517167750-23485-1-git-send-email-karahmed@amazon.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jan 29, 2018 at 10:43:22AM -0800, Jim Mattson wrote:
> On Sun, Jan 28, 2018 at 11:29 AM, KarimAllah Ahmed wrote:
> > Add direct access to
> > MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will
> > only mitigate Spectre V2 through IBRS+IBPB and will not be using a
> > retpoline+IBPB based approach.
> >
> > To avoid the overhead of atomically saving and restoring
> > MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only
> > call add_atomic_switch_msr when a non-zero value is written to it.
> >
> > Cc: Asit Mallick
> > Cc: Arjan Van De Ven
> > Cc: Dave Hansen
> > Cc: Andi Kleen
> > Cc: Andrea Arcangeli
> > Cc: Linus Torvalds
> > Cc: Tim Chen
> > Cc: Thomas Gleixner
> > Cc: Dan Williams
> > Cc: Jun Nakajima
> > Cc: Paolo Bonzini
> > Cc: David Woodhouse
> > Cc: Greg KH
> > Cc: Andy Lutomirski
> > Signed-off-by: KarimAllah Ahmed
> > Signed-off-by: Ashok Raj
> > ---
> >  arch/x86/kvm/cpuid.c |  4 +++-
> >  arch/x86/kvm/cpuid.h |  1 +
> >  arch/x86/kvm/vmx.c   | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 67 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 0099e10..dc78095 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void)
> >  /* These are scattered features in cpufeatures.h. */
> >  #define KVM_CPUID_BIT_AVX512_4VNNIW     2
> >  #define KVM_CPUID_BIT_AVX512_4FMAPS     3
> > +#define KVM_CPUID_BIT_SPEC_CTRL         26
> >  #define KF(x) bit(KVM_CPUID_BIT_##x)
> >
> >  int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> > @@ -392,7 +393,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> >
> >         /* cpuid 7.0.edx */
> >         const u32 kvm_cpuid_7_0_edx_x86_features =
> > -               KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
> > +               KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \
> > +               (boot_cpu_has(X86_FEATURE_SPEC_CTRL) ? KF(SPEC_CTRL) : 0);
>
> Isn't 'boot_cpu_has()' superfluous here? And aren't there two bits to
> pass through for existing CPUs (26 and 27)?
> > > > > /* all calls to cpuid_count() should be made on the same cpu */ > > get_cpu(); > > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h > > index cdc70a3..dcfe227 100644 > > --- a/arch/x86/kvm/cpuid.h > > +++ b/arch/x86/kvm/cpuid.h > > @@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = { > > [CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX}, > > [CPUID_7_ECX] = { 7, 0, CPUID_ECX}, > > [CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX}, > > + [CPUID_7_EDX] = { 7, 0, CPUID_EDX}, > > }; > > > > static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > > index aa8638a..1b743a0 100644 > > --- a/arch/x86/kvm/vmx.c > > +++ b/arch/x86/kvm/vmx.c > > @@ -920,6 +920,9 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked); > > static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, > > u16 error_code); > > static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); > > +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, > > + u32 msr, int type); > > + > > > > static DEFINE_PER_CPU(struct vmcs *, vmxarea); > > static DEFINE_PER_CPU(struct vmcs *, current_vmcs); > > @@ -2007,6 +2010,28 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, > > m->host[i].value = host_val; > > } > > > > +/* do not touch guest_val and host_val if the msr is not found */ > > +static int read_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, > > + u64 *guest_val, u64 *host_val) > > +{ > > + unsigned i; > > + struct msr_autoload *m = &vmx->msr_autoload; > > + > > + for (i = 0; i < m->nr; ++i) > > + if (m->guest[i].index == msr) > > + break; > > + > > + if (i == m->nr) > > + return 1; > > + > > + if (guest_val) > > + *guest_val = m->guest[i].value; > > + if (host_val) > > + *host_val = m->host[i].value; > > + > > + return 0; > > +} > > + > > static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset) 
> >  {
> >         u64 guest_efer = vmx->vcpu.arch.efer;
> >
> > @@ -3203,7 +3228,9 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> >   */
> >  static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  {
> > +       u64 spec_ctrl = 0;
> >         struct shared_msr_entry *msr;
> > +       struct vcpu_vmx *vmx = to_vmx(vcpu);
> >
> >         switch (msr_info->index) {
> >  #ifdef CONFIG_X86_64
> > @@ -3223,6 +3250,19 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >         case MSR_IA32_TSC:
> >                 msr_info->data = guest_read_tsc(vcpu);
> >                 break;
> > +       case MSR_IA32_SPEC_CTRL:
> > +               if (!msr_info->host_initiated &&
> > +                   !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
>
> Shouldn't this conjunct be:
> !(guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
>   guest_cpuid_has(vcpu, X86_FEATURE_STIBP))?
>
> > +                       return 1;
>
> What if !boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
> !boot_cpu_has(X86_FEATURE_STIBP)? That should also return 1, I think.
>
> > +
> > +               /*
> > +                * If the MSR is not in the atomic list yet, then it was never
> > +                * written to. So the MSR value will be '0'.
> > +                */
> > +               read_atomic_switch_msr(vmx, MSR_IA32_SPEC_CTRL, &spec_ctrl, NULL);
>
> Why not just add msr_ia32_spec_ctrl to struct vcpu_vmx, so that you
> don't have to search the atomic switch list?
>
> > +
> > +               msr_info->data = spec_ctrl;
> > +               break;
> >         case MSR_IA32_SYSENTER_CS:
> >                 msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
> >                 break;
> > @@ -3289,6 +3329,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >         int ret = 0;
> >         u32 msr_index = msr_info->index;
> >         u64 data = msr_info->data;
> > +       unsigned long *msr_bitmap;
> > +
> > +       /*
> > +        * IBRS is not used (yet) to protect the host. Once it does, this
> > +        * variable needs to be a bit smarter.
> > +        */
> > +       u64 host_spec_ctrl = 0;
> >
> >         switch (msr_index) {
> >         case MSR_EFER:
> > @@ -3330,6 +3377,22 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >         case MSR_IA32_TSC:
> >                 kvm_write_tsc(vcpu, msr_info);
> >                 break;
> > +       case MSR_IA32_SPEC_CTRL:
> > +               if (!msr_info->host_initiated &&
> > +                   !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> > +                       return 1;
>
> This looks incomplete. As above, what if
> !boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
> !boot_cpu_has(X86_FEATURE_STIBP)?
> If the host doesn't support MSR_IA32_SPEC_CTRL, you'll get a VMX-abort
> on loading the host MSRs from the VM-exit MSR load list.

Yikes, right, it will #GP.

> Also, what if the value being written is illegal?

You can write garbage and it won't #GP. Granted, the MSR should only
hold correct values (0, 1, 2, or 3). The spec says nothing beyond
calling the other regions reserved, which would imply doing an rdmsr
first and then 'or'-ing it with what you are about to wrmsr. That of
course would not be the best choice :-(