Date: Tue, 19 Mar 2019 17:09:42 -0700
From: Sean Christopherson
To: Xiaoyao Li
Cc: kvm@vger.kernel.org, Paolo Bonzini, Radim Krčmář, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, "H.
 Peter Anvin", linux-kernel@vger.kernel.org, chao.gao@intel.com
Subject: Re: [PATCH v2 2/2] kvm/vmx: Using hardware cpuid faulting to avoid emulation overhead
Message-ID: <20190320000942.GN25575@linux.intel.com>
In-Reply-To: <73fefa02cafb0e94de0c2928185560e092fe137e.camel@linux.intel.com>

On Wed, Mar 20, 2019 at 01:51:28AM +0800, Xiaoyao Li wrote:
> On Tue, 2019-03-19 at 07:28 -0700, Sean Christopherson wrote:
> > On Tue, Mar 19, 2019 at 12:37:23PM +0800, Xiaoyao Li wrote:
> > > On Mon, 2019-03-18 at 09:38 -0700, Sean Christopherson wrote:
> > > > On Mon, Mar 18, 2019 at 07:43:24PM +0800, Xiaoyao Li wrote:
> > > > > Current cpuid faulting of guest is purely emulated in kvm, which exploits
> > > > > CPUID vm exit to inject #GP to guest. However, if host hardware cpu has
> > > > > X86_FEATURE_CPUID_FAULT, we can just use the hardware cpuid faulting for
> > > > > guest to avoid the vm exit overhead.
> > > >
> > > > Heh, I obviously didn't look at this patch before responding to patch 1/2.
> > > >
> > > > > Note: cpuid faulting takes higher priority over CPUID instruction vm
> > > > > exit (Intel SDM vol3.25.1.1).
> > > > >
> > > > > Since cpuid faulting only exists on some Intel's cpu, just apply this
> > > > > optimization to vmx.
> > > > >
> > > > > Signed-off-by: Xiaoyao Li
> > > > > ---
> > > > >  arch/x86/include/asm/kvm_host.h |  2 ++
> > > > >  arch/x86/kvm/vmx/vmx.c          | 19 +++++++++++++++----
> > > > >  arch/x86/kvm/x86.c              | 15 ++++++++++++---
> > > > >  3 files changed, 29 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > > index ce79d7bfe1fd..14cad587b804 100644
> > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > @@ -1339,6 +1339,8 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
> > > > >  void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
> > > > >  int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
> > > > >
> > > > > +int kvm_supported_msr_misc_features_enables(struct kvm_vcpu *vcpu, u64 data);
> > > > > +
> > > > >  int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> > > > >  int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> > > > >
> > > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > > > index 2c59e0209e36..6b413e471dca 100644
> > > > > --- a/arch/x86/kvm/vmx/vmx.c
> > > > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > > > @@ -1037,7 +1037,7 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
> > > > >
> > > > >  static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx)
> > > > >  {
> > > > > -	u64 host_val;
> > > > > +	u64 host_val, guest_val;
> > > > >
> > > > >  	if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
> > > > >  		return;
> > > > > @@ -1045,10 +1045,12 @@ static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx)
> > > > >  	rdmsrl(MSR_MISC_FEATURES_ENABLES, host_val);
> > > > >  	vmx->host_msr_misc_features_enables = host_val;
> > > > >
> > > > > -	/* clear cpuid fault bit to avoid it leak to guest */
> > > > > -	if (host_val & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> > > > > +	guest_val = vmx->vcpu.arch.msr_misc_features_enables;
> > > > > +
> > > > > +	/* we can use the hardware cpuid faulting to avoid emulation overhead */
> > > > > +	if ((host_val ^ guest_val) & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> > > > >  		wrmsrl(MSR_MISC_FEATURES_ENABLES,
> > > > > -		       host_val & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> > > > > +		       host_val ^ MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> > > > >  	}
> > > > >  }
> > > > >
> > > > > @@ -2057,6 +2059,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > > > >  		else
> > > > >  			vmx->pt_desc.guest.addr_a[index / 2] = data;
> > > > >  		break;
> > > > > +	case MSR_MISC_FEATURES_ENABLES:
> > > > > +		if (!kvm_supported_msr_misc_features_enables(vcpu, data))
> > > > > +			return 1;
> > > > > +		if (boot_cpu_has(X86_FEATURE_CPUID_FAULT)) {
> > > > > +			if (vmx->loaded_cpu_state)
> > > >
> > > > No need for two separate if statements.  And assuming we're checking the
> > > > existing shadow value when loading guest/host state, the WRMSR should
> > > > only be done if the host's value is non-zero.
> > >
> > > I'll combine these two if statements into one.
> > >
> > > I cannot understand why the WRMSR should only be done if the host's value is
> > > non-zero. I think there is no dependency with host's value, if using the
> > > hardware cpuid faulting. We just need to set the value to real hardware MSR.
> >
> > What I was trying to say in patch 1/2 regarding save/restore, is that I
> > don't think it is worthwhile to voluntarily switch hardware's value.  In
> > other words, do the WRMSR if and only if it's absolutely necessary.  And
> > that means only installing the guest's value if the host's value is
> > non-zero and host_val != guest_val.  If the host's value is zero, then
> > the guest's value is irrelevant as CPUID faulting behavior will naturally
> > be taken care of when intercepting CPUID.  And for obvious reasons the
> > WRMSR can be skipped if host and guest want the same value.
>
> The purpose of this patch is always using hardware cpuid fault if hardware cpu
> has this feature. Because emulated cpuid faulting needs entire vmexit process,
> but using hardware cpuid faulting just adds two WRMSR in save/restore cycle only
> when host_val != guest_val.
>
> Also I conclude the handling you said as below:
> When host's value is zero, we do nothing to this MSR, and let guest use emulated
> cpuid faulting through CPUID intercepting.
> When host's value is non-zero, we load the guest's value into hardware MSR, which
> means we use hardware cpuid faulting.
>
> So the difference is when host value is zero, you choose to use emulated cpuid
> faulting. What's meaning of choosing emulation or hardware feature based on
> host's value?

To save cycles by avoiding WRMSR whenever possible.  WRMSR is expensive,
even if it's limited to vcpu_{load,put}(), e.g. a workload that triggers
a lot of exits to userspace isn't going to be thrilled about the extra
250-300 cycles added to each round trip.

On the other hand, emulating CPUID faulting only adds a VM-Exit to
userspace CPUID instructions that will fault anyways, and faults aren't
exactly fast paths.  Most uses of CPUID in userspace are in application
startup, e.g. a handful of CPUIDs to determine what features can be used.
So even if the ~1000 cycles added by the VM-Exit are somehow meaningful to
the fault path, the impact is limited to a tiny number of instructions
executed in the guest, whereas doing WRMSR affects every vcpu_{load,put}.
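
To make the trade-off concrete, here is a rough sketch in plain C (not
kernel code, and not a proposed patch).  The helper names are made up for
illustration, and the cycle counts are only the ballpark figures quoted
above, not measurements.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ballpark costs from the discussion above; assumptions, not measurements. */
#define WRMSR_ROUND_TRIP_CYCLES 300ULL   /* upper end of "250-300" per load/put */
#define CPUID_VMEXIT_CYCLES     1000ULL  /* "~1000" per intercepted CPUID */

/*
 * Hypothetical helper: should the guest's MSR_MISC_FEATURES_ENABLES value
 * be written to hardware when switching to the guest?
 *
 * - Host value zero: leave hardware alone, CPUID faulting for the guest is
 *   emulated when CPUID is intercepted.
 * - Host and guest values equal: nothing to switch.
 * - Otherwise: the guest's value must be installed.
 */
bool guest_value_needs_wrmsr(uint64_t host_val, uint64_t guest_val)
{
	if (!host_val)
		return false;

	return host_val != guest_val;
}

/* Extra cycles if the MSR is switched on every vcpu_{load,put}() round trip. */
uint64_t hw_switch_cycles(uint64_t load_put_round_trips)
{
	return load_put_round_trips * WRMSR_ROUND_TRIP_CYCLES;
}

/* Extra cycles if faulting CPUIDs are emulated via VM-Exit instead. */
uint64_t emulation_cycles(uint64_t faulting_cpuids)
{
	return faulting_cpuids * CPUID_VMEXIT_CYCLES;
}
```

With made-up workload numbers, 10 faulting CPUIDs at application startup
give emulation_cycles(10) == 10000, paid once, while 100000 exits to
userspace give hw_switch_cycles(100000) == 30000000 if the MSR is switched
unconditionally.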