Date: Wed, 31 Jan 2018 23:26:50 -0500
From: Konrad Rzeszutek Wilk
To: KarimAllah Ahmed
Cc: Jim Mattson, KarimAllah Ahmed, kvm list, LKML, the arch/x86 maintainers,
    Asit Mallick, Arjan Van De Ven, Dave Hansen, Andi Kleen, Andrea Arcangeli,
    Linus Torvalds, Tim Chen, Thomas Gleixner, Dan Williams, Jun Nakajima,
    Paolo Bonzini, David Woodhouse, Greg KH, Andy Lutomirski, Ashok Raj
Subject: Re: [PATCH v5 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
Message-ID: <20180201042650.GD21751@char.us.oracle.com>
References: <1517427467-28567-1-git-send-email-karahmed@amazon.de>
    <1517427467-28567-5-git-send-email-karahmed@amazon.de>
    <06cb88da-f355-41ed-380f-7daa8ddf6159@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> From 9c19a8ac3f021efba6f70ad7e28f7ad06bb97e43 Mon Sep 17 00:00:00 2001
> From: KarimAllah Ahmed
> Date: Mon, 29 Jan 2018 19:58:10 +0000
> Subject: [PATCH] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
>
> [ Based on a patch from Ashok Raj ]
>
> Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
> guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
> be using a retpoline+IBPB based approach.
>
> To avoid the overhead of atomically saving and restoring the
> MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only
> add_atomic_switch_msr when a non-zero is written to it.
  ^^^^^^^^^^^^^^^^^^^^^

That part of the comment does not seem to be in sync with the code.

>
> No attempt is made to handle STIBP here, intentionally. Filtering STIBP
> may be added in a future patch, which may require trapping all writes
> if we don't want to pass it through directly to the guest.
>
> [dwmw2: Clean up CPUID bits, save/restore manually, handle reset]
>
> Cc: Asit Mallick
> Cc: Arjan Van De Ven
> Cc: Dave Hansen
> Cc: Andi Kleen
> Cc: Andrea Arcangeli
> Cc: Linus Torvalds
> Cc: Tim Chen
> Cc: Thomas Gleixner
> Cc: Dan Williams
> Cc: Jun Nakajima
> Cc: Paolo Bonzini
> Cc: David Woodhouse
> Cc: Greg KH
> Cc: Andy Lutomirski
> Cc: Ashok Raj
> Signed-off-by: KarimAllah Ahmed
> Signed-off-by: David Woodhouse
> ---
> v6:
> - got rid of save_spec_ctrl_on_exit
> - introduce spec_ctrl_intercepted
> - introduce spec_ctrl_used
> v5:
> - Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
> v4:
> - Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features
> - Handling nested guests
> v3:
> - Save/restore manually
> - Fix CPUID handling
> - Fix a copy & paste error in the name of SPEC_CTRL MSR in
>   disable_intercept.
> - support !cpu_has_vmx_msr_bitmap()
> v2:
> - remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
> - special case writing '0' in SPEC_CTRL to avoid confusing live-migration
>   when the instance never used the MSR (dwmw@).
> - depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
> - add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).
> ---
>  arch/x86/kvm/cpuid.c |  9 +++--
>  arch/x86/kvm/vmx.c   | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/x86.c   |  2 +-
>  3 files changed, 100 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 1909635..13f5d42 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>
>  	/* cpuid 0x80000008.ebx */
>  	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
> -		F(IBPB);
> +		F(IBPB) | F(IBRS);
>
>  	/* cpuid 0xC0000001.edx */
>  	const u32 kvm_cpuid_C000_0001_edx_x86_features =
> @@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>
>  	/* cpuid 7.0.edx*/
>  	const u32 kvm_cpuid_7_0_edx_x86_features =
> -		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
> +		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
> +		F(ARCH_CAPABILITIES);
>
>  	/* all calls to cpuid_count() should be made on the same cpu */
>  	get_cpu();
> @@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  			g_phys_as = phys_as;
>  		entry->eax = g_phys_as | (virt_as << 8);
>  		entry->edx = 0;
> -		/* IBPB isn't necessarily present in hardware cpuid */
> +		/* IBRS and IBPB aren't necessarily present in hardware cpuid */
>  		if (boot_cpu_has(X86_FEATURE_IBPB))
>  			entry->ebx |= F(IBPB);
> +		if (boot_cpu_has(X86_FEATURE_IBRS))
> +			entry->ebx |= F(IBRS);
>  		entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
>  		cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
>  		break;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 6a9f4ec..bfc80ff 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -594,6 +594,14 @@ struct vcpu_vmx {
>  #endif
>
>  	u64 arch_capabilities;
> +	u64 spec_ctrl;
> +
> +	/*
> +	 * This indicates that:
> +	 * 1) guest_cpuid_has(X86_FEATURE_IBRS) == true &&
> +	 * 2) The guest has actually initiated a write against the MSR.
> +	 */
> +	bool spec_ctrl_used;
>
>  	/*
>  	 * This indicates that:
> @@ -946,6 +954,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
>  static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
>  					    u16 error_code);
>  static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
> +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
> +							  u32 msr, int type);
>
>  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
>  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> @@ -1917,6 +1927,22 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
>  	vmcs_write32(EXCEPTION_BITMAP, eb);
>  }
>
> +/* Is SPEC_CTRL intercepted for the currently running vCPU? */
> +static bool spec_ctrl_intercepted(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long *msr_bitmap;
> +	int f = sizeof(unsigned long);
> +
> +	if (!cpu_has_vmx_msr_bitmap())
> +		return true;
> +
> +	msr_bitmap = is_guest_mode(vcpu) ?
> +			to_vmx(vcpu)->nested.vmcs02.msr_bitmap :
> +			to_vmx(vcpu)->vmcs01.msr_bitmap;
> +
> +	return !!test_bit(MSR_IA32_SPEC_CTRL, msr_bitmap + 0x800 / f);
> +}
> +
>  static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
>  		unsigned long entry, unsigned long exit)
>  {
> @@ -3246,6 +3272,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_IA32_TSC:
>  		msr_info->data = guest_read_tsc(vcpu);
>  		break;
> +	case MSR_IA32_SPEC_CTRL:
> +		if (!msr_info->host_initiated &&
> +		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
> +		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> +			return 1;
> +
> +		msr_info->data = to_vmx(vcpu)->spec_ctrl;
> +		break;
>  	case MSR_IA32_ARCH_CAPABILITIES:
>  		if (!msr_info->host_initiated &&
>  		    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
> @@ -3359,6 +3393,34 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_IA32_TSC:
>  		kvm_write_tsc(vcpu, msr_info);
>  		break;
> +	case MSR_IA32_SPEC_CTRL:
> +		if (!msr_info->host_initiated &&
> +		    !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
> +		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> +			return 1;
> +
> +		vmx->spec_ctrl_used = true;
> +
> +		/* The STIBP bit doesn't fault even if it's not advertised */
> +		if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
> +			return 1;
> +
> +		vmx->spec_ctrl = data;
> +
> +		/*
> +		 * When it's written (to non-zero) for the first time, pass
> +		 * it through. This means we don't have to take the perf

.. But only if it is a nested guest (as you have && is_guest_mode).

Do you want to update the comment a bit?

> +		 * hit of saving it on vmexit for the common case of guests
> +		 * that don't use it.
> +		 */
> +		if (cpu_has_vmx_msr_bitmap() && data &&
> +		    spec_ctrl_intercepted(vcpu) &&
> +		    is_guest_mode(vcpu))
		    ^^^^^^^^^^^^^^^^^^ <=== here
> +			vmx_disable_intercept_for_msr(
> +					vmx->vmcs01.msr_bitmap,
> +					MSR_IA32_SPEC_CTRL,
> +					MSR_TYPE_RW);
> +		break;
>  	case MSR_IA32_PRED_CMD:
>  		if (!msr_info->host_initiated &&
>  		    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&