2019-03-18 11:44:43

by Xiaoyao Li

Subject: [PATCH v2 0/2] Avoid cpuid faulting leaking and one optimization

This series avoids the host's CPUID faulting setting leaking into the guest,
which can cause guest boot failure, and uses hardware CPUID faulting to
remove the emulation overhead.

Patch 1 avoids CPUID faulting leaking to the guest by clearing the CPUID
faulting bit before entering the guest and restoring the host's CPUID
faulting bit when switching back to the host.

Patch 2 uses hardware CPUID faulting for the guest, when the hardware
supports it, to avoid the emulation overhead.

==changelog==
v2:
- move the save/restore of the cpuid faulting bit to
vmx_prepare_switch_to_guest/vmx_prepare_switch_to_host to avoid an RDMSR on
every vmentry, based on Paolo's comment.

==previous version==
v1: https://patchwork.kernel.org/patch/10852253/

Xiaoyao Li (2):
kvm/vmx: avoid CPUID faulting leaking to guest
kvm/vmx: Using hardware cpuid faulting to avoid emulation overhead

arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/vmx/vmx.c | 45 +++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 2 ++
arch/x86/kvm/x86.c | 15 ++++++++---
4 files changed, 61 insertions(+), 3 deletions(-)

--
2.19.1



2019-03-18 11:44:55

by Xiaoyao Li

Subject: [PATCH v2 1/2] kvm/vmx: avoid CPUID faulting leaking to guest

CPUID faulting is a feature related to the CPUID instruction. When CPUID
faulting is enabled, any execution of the CPUID instruction outside
system-management mode (SMM) causes a general-protection fault (#GP) if the CPL > 0.

Detailed information about this feature can be found at
https://www.intel.com/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf

KVM currently provides software emulation of this feature for the guest.
However, because CPUID faulting takes priority over the CPUID VM exit (Intel
SDM vol. 3, 25.1.1), there is a risk of the host's CPUID faulting setting
leaking to the guest. If the host enables CPUID faulting by setting bit 0 of
MSR_MISC_FEATURES_ENABLES, it leaks to the guest because there is no handling
of MSR_MISC_FEATURES_ENABLES yet. As a result, when the guest executes CPUID
at CPL > 0, it gets a #GP instead of a CPUID VM exit.

This issue causes guest boot failure when the guest uses *modprobe* to load
modules: *modprobe* executes the CPUID instruction, which raises a #GP in the
guest, and since the guest's #GP handler has no handling for CPUID faulting,
the guest fails to boot.

To fix this issue, the CPUID faulting bit must be switched between host and
guest. Since MSR_MISC_FEATURES_ENABLES is Intel-specific, this patch
implements the switching only in vmx: it saves the host's value and clears
the CPUID faulting bit before switching to the guest, and restores the host's
CPUID faulting setting before switching back to the host.

Because KVM already provides software emulation of CPUID faulting, it is
enough to clear the CPUID faulting bit in the hardware MSR when switching to
the guest.

Signed-off-by: Xiaoyao Li <[email protected]>
---
Changes in v2:
- move the save/restore of the cpuid faulting bit to
vmx_prepare_switch_to_guest/vmx_prepare_switch_to_host to avoid an RDMSR on
every vmentry, based on Paolo's comment.

---
arch/x86/kvm/vmx/vmx.c | 34 ++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 2 ++
2 files changed, 36 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 541d442edd4e..2c59e0209e36 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1035,6 +1035,23 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
}

+static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx)
+{
+ u64 host_val;
+
+ if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
+ return;
+
+ rdmsrl(MSR_MISC_FEATURES_ENABLES, host_val);
+ vmx->host_msr_misc_features_enables = host_val;
+
+ /* clear cpuid fault bit to avoid it leak to guest */
+ if (host_val & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
+ wrmsrl(MSR_MISC_FEATURES_ENABLES,
+ host_val & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
+ }
+}
+
void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -1068,6 +1085,8 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
vmx->loaded_cpu_state = vmx->loaded_vmcs;
host_state = &vmx->loaded_cpu_state->host_state;

+ vmx_save_host_cpuid_fault(vmx);
+
/*
* Set host fs and gs selectors. Unfortunately, 22.2.3 does not
* allow segment selectors with cpl > 0 or ti == 1.
@@ -1124,6 +1143,19 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
}
}

+static void vmx_restore_host_cpuid_fault(struct vcpu_vmx *vmx)
+{
+ u64 msrval;
+
+ if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
+ return;
+
+ rdmsrl(MSR_MISC_FEATURES_ENABLES, msrval);
+ msrval |= vmx->host_msr_misc_features_enables &
+ MSR_MISC_FEATURES_ENABLES_CPUID_FAULT;
+ wrmsrl(MSR_MISC_FEATURES_ENABLES, msrval);
+}
+
static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
{
struct vmcs_host_state *host_state;
@@ -1137,6 +1169,8 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
++vmx->vcpu.stat.host_state_reload;
vmx->loaded_cpu_state = NULL;

+ vmx_restore_host_cpuid_fault(vmx);
+
#ifdef CONFIG_X86_64
rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
#endif
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 5df73b36fa49..ba867bbc5676 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -268,6 +268,8 @@ struct vcpu_vmx {
u64 msr_ia32_feature_control_valid_bits;
u64 ept_pointer;

+ u64 host_msr_misc_features_enables;
+
struct pt_desc pt_desc;
};

--
2.19.1


2019-03-18 11:45:24

by Xiaoyao Li

Subject: [PATCH v2 2/2] kvm/vmx: Using hardware cpuid faulting to avoid emulation overhead

Guest CPUID faulting is currently emulated purely in KVM, which uses the
CPUID VM exit to inject a #GP into the guest. However, if the host CPU has
X86_FEATURE_CPUID_FAULT, we can use hardware CPUID faulting for the guest and
avoid the VM exit overhead.

Note: CPUID faulting takes priority over the CPUID instruction VM exit
(Intel SDM vol. 3, 25.1.1).

Since CPUID faulting only exists on some Intel CPUs, apply this optimization
only to vmx.

Signed-off-by: Xiaoyao Li <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++----
arch/x86/kvm/x86.c | 15 ++++++++++++---
3 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ce79d7bfe1fd..14cad587b804 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1339,6 +1339,8 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);

+int kvm_supported_msr_misc_features_enables(struct kvm_vcpu *vcpu, u64 data);
+
int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2c59e0209e36..6b413e471dca 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1037,7 +1037,7 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)

static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx)
{
- u64 host_val;
+ u64 host_val, guest_val;

if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
return;
@@ -1045,10 +1045,12 @@ static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx)
rdmsrl(MSR_MISC_FEATURES_ENABLES, host_val);
vmx->host_msr_misc_features_enables = host_val;

- /* clear cpuid fault bit to avoid it leak to guest */
- if (host_val & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
+ guest_val = vmx->vcpu.arch.msr_misc_features_enables;
+
+ /* we can use the hardware cpuid faulting to avoid emulation overhead */
+ if ((host_val ^ guest_val) & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
wrmsrl(MSR_MISC_FEATURES_ENABLES,
- host_val & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
+ host_val ^ MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
}
}

@@ -2057,6 +2059,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
vmx->pt_desc.guest.addr_a[index / 2] = data;
break;
+ case MSR_MISC_FEATURES_ENABLES:
+ if (!kvm_supported_msr_misc_features_enables(vcpu, data))
+ return 1;
+ if (boot_cpu_has(X86_FEATURE_CPUID_FAULT)) {
+ if (vmx->loaded_cpu_state)
+ wrmsrl(MSR_MISC_FEATURES_ENABLES, data);
+ }
+ vcpu->arch.msr_misc_features_enables = data;
+ break;
case MSR_TSC_AUX:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 434ec113cc79..33a8c95b2f2e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2449,6 +2449,17 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
}

+int kvm_supported_msr_misc_features_enables(struct kvm_vcpu *vcpu, u64 data)
+{
+ if (data & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT ||
+ (data & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT &&
+ !supports_cpuid_fault(vcpu)))
+ return 0;
+ else
+ return 1;
+}
+EXPORT_SYMBOL_GPL(kvm_supported_msr_misc_features_enables);
+
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
bool pr = false;
@@ -2669,9 +2680,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vcpu->arch.msr_platform_info = data;
break;
case MSR_MISC_FEATURES_ENABLES:
- if (data & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT ||
- (data & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT &&
- !supports_cpuid_fault(vcpu)))
+ if (!kvm_supported_msr_misc_features_enables(vcpu, data))
return 1;
vcpu->arch.msr_misc_features_enables = data;
break;
--
2.19.1


2019-03-18 16:26:00

by Sean Christopherson

Subject: Re: [PATCH v2 1/2] kvm/vmx: avoid CPUID faulting leaking to guest

On Mon, Mar 18, 2019 at 07:43:23PM +0800, Xiaoyao Li wrote:
> cpuid faulting is a feature about CPUID instruction. When cpuif faulting
^^^^^
cpuid
> is enabled, all execution of the CPUID instruction outside system-management
> mode (SMM) cause a general-protection (#GP) if the CPL > 0.
>
> About this feature, detailed information can be found at
> https://www.intel.com/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf
>
> Current KVM provides software emulation of this feature for guest.
> However, because cpuid faulting takes higher priority over CPUID vm exit (Intel
> SDM vol3.25.1.1), there is a risk of leaking cpuid faulting to guest when host
> enables it. If host enables cpuid faulting by setting the bit 0 of
> MSR_MISC_FEATURES_ENABLES, it will pass to guest since there is no handling of
> MSR_MISC_FEATURES_ENABLES yet. As a result, when guest calls CPUID instruction
> in CPL > 0, it will generate a #GP instead of CPUID vm eixt.
>
> This issue will cause guest boot failure when guest uses *modprobe*
> to load modules. *modprobe* calls CPUID instruction, thus causing #GP in
> guest. Since there is no handling of cpuid faulting in #GP handler, guest
> fails boot.
>
> To fix this issue, we should switch cpuid faulting bit between host and guest.
> Since MSR_MISC_FEATURES_ENABLES is intel-specific, this patch implement the
> switching only in vmx. It clears the cpuid faulting bit and save host's
> value before switching to guest, and restores the cpuid faulting settings of
> host before switching to host.
>
> Because kvm provides the software emulation of cpuid faulting, we can
> just clear the cpuid faulting bit in hardware MSR when switching to
> guest.
>
> Signed-off-by: Xiaoyao Li <[email protected]>
> ---
> Changes in v2:
> - move the save/restore of cpuid faulting bit to
> vmx_prepare_swich_to_guest/vmx_prepare_swich_to_host to avoid every
> vmentry RDMSR, based on Paolo's comment.
>
> ---
> arch/x86/kvm/vmx/vmx.c | 34 ++++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/vmx.h | 2 ++
> 2 files changed, 36 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 541d442edd4e..2c59e0209e36 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1035,6 +1035,23 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
> wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
> }
>
> +static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx)

Maybe vmx_load_guest_misc_features_enables()? See below for reasoning.
Alternatively you can drop the helpers altogether.

> +{
> + u64 host_val;
> +
> + if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
> + return;
> +
> + rdmsrl(MSR_MISC_FEATURES_ENABLES, host_val);
> + vmx->host_msr_misc_features_enables = host_val;

There's no need to cache the host value in struct vcpu_vmx, just use the
kernel's shadow value.

> +
> + /* clear cpuid fault bit to avoid it leak to guest */

Personally I find the comment unnecessary and somewhat misleading.

> + if (host_val & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> + wrmsrl(MSR_MISC_FEATURES_ENABLES,
> + host_val & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);

I think we can also skip WRMSR if CPUID faulting is also enabled in the
guest. It probably doesn't make sense to install the guest's value if
CPUID faulting is enabled in the guest and not host since intercepting
CPUID likely provides better overall performance than switching the MSR
on entry and exit. And last but not least, since there are other bits
in the MSR that we don't want to expose to the guest, e.g. RING3MWAIT,
checking for a non-zero host value is probably better than checking for
individual feature bits. Same reasoning for writing the guest's value
instead of clearing just the CPUID faulting bit.

So something like:

u64 msrval = this_cpu_read(msr_misc_features_shadow);

if (!msrval || msrval == vcpu->arch.msr_misc_features_enables)
return;

wrmsrl(MSR_MISC_FEATURES_ENABLES, vcpu->arch.msr_misc_features_enables);


or if you drop the helpers:

msrval = this_cpu_read(msr_misc_features_shadow);
if (msrval && msrval != vcpu->arch.msr_misc_features_enables)
wrmsrl(MSR_MISC_FEATURES_ENABLES,
vcpu->arch.msr_misc_features_enables);

> + }
> +}
> +
> void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> {
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -1068,6 +1085,8 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> vmx->loaded_cpu_state = vmx->loaded_vmcs;
> host_state = &vmx->loaded_cpu_state->host_state;
>
> + vmx_save_host_cpuid_fault(vmx);
> +
> /*
> * Set host fs and gs selectors. Unfortunately, 22.2.3 does not
> * allow segment selectors with cpl > 0 or ti == 1.
> @@ -1124,6 +1143,19 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> }
> }
>
> +static void vmx_restore_host_cpuid_fault(struct vcpu_vmx *vmx)

If you keep the helpers, maybe vmx_load_host_misc_features_enables()
to pair with the new function name for loading guest state?

> +{
> + u64 msrval;
> +
> + if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
> + return;
> +
> + rdmsrl(MSR_MISC_FEATURES_ENABLES, msrval);

> + msrval |= vmx->host_msr_misc_features_enables &
> + MSR_MISC_FEATURES_ENABLES_CPUID_FAULT;

Again, there's no need for RDMSR, the host's value can be pulled from
msr_misc_features_shadow, and the WRMSR can be skipped if the host and
guest have the same value, i.e. we didn't install the guest's value.

u64 msrval = this_cpu_read(msr_misc_features_shadow);

if (!msrval || msrval == vcpu->arch.msr_misc_features_enables)
return;

wrmsrl(MSR_MISC_FEATURES_ENABLES, msrval);

> + wrmsrl(MSR_MISC_FEATURES_ENABLES, msrval);
> +}
> +
> static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
> {
> struct vmcs_host_state *host_state;
> @@ -1137,6 +1169,8 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
> ++vmx->vcpu.stat.host_state_reload;
> vmx->loaded_cpu_state = NULL;
>
> + vmx_restore_host_cpuid_fault(vmx);
> +
> #ifdef CONFIG_X86_64
> rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
> #endif
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 5df73b36fa49..ba867bbc5676 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -268,6 +268,8 @@ struct vcpu_vmx {
> u64 msr_ia32_feature_control_valid_bits;
> u64 ept_pointer;
>
> + u64 host_msr_misc_features_enables;

As mentioned above, this isn't needed.

> +
> struct pt_desc pt_desc;
> };
>
> --
> 2.19.1
>

2019-03-18 16:39:52

by Sean Christopherson

Subject: Re: [PATCH v2 2/2] kvm/vmx: Using hardware cpuid faulting to avoid emulation overhead

On Mon, Mar 18, 2019 at 07:43:24PM +0800, Xiaoyao Li wrote:
> Current cpuid faulting of guest is purely emulated in kvm, which exploits
> CPUID vm exit to inject #GP to guest. However, if host hardware cpu has
> X86_FEATURE_CPUID_FAULT, we can just use the hardware cpuid faulting for
> guest to avoid the vm exit overhead.

Heh, I obviously didn't look at this patch before responding to patch 1/2.

> Note: cpuid faulting takes higher priority over CPUID instruction vm
> exit (Intel SDM vol3.25.1.1).
>
> Since cpuid faulting only exists on some Intel's cpu, just apply this
> optimization to vmx.
>
> Signed-off-by: Xiaoyao Li <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 2 ++
> arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++----
> arch/x86/kvm/x86.c | 15 ++++++++++++---
> 3 files changed, 29 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ce79d7bfe1fd..14cad587b804 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1339,6 +1339,8 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
> void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
> int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
>
> +int kvm_supported_msr_misc_features_enables(struct kvm_vcpu *vcpu, u64 data);
> +
> int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 2c59e0209e36..6b413e471dca 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1037,7 +1037,7 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
>
> static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx)
> {
> - u64 host_val;
> + u64 host_val, guest_val;
>
> if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
> return;
> @@ -1045,10 +1045,12 @@ static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx)
> rdmsrl(MSR_MISC_FEATURES_ENABLES, host_val);
> vmx->host_msr_misc_features_enables = host_val;
>
> - /* clear cpuid fault bit to avoid it leak to guest */
> - if (host_val & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> + guest_val = vmx->vcpu.arch.msr_misc_features_enables;
> +
> + /* we can use the hardware cpuid faulting to avoid emulation overhead */
> + if ((host_val ^ guest_val) & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> wrmsrl(MSR_MISC_FEATURES_ENABLES,
> - host_val & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> + host_val ^ MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> }
> }
>
> @@ -2057,6 +2059,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> else
> vmx->pt_desc.guest.addr_a[index / 2] = data;
> break;
> + case MSR_MISC_FEATURES_ENABLES:
> + if (!kvm_supported_msr_misc_features_enables(vcpu, data))
> + return 1;
> + if (boot_cpu_has(X86_FEATURE_CPUID_FAULT)) {
> + if (vmx->loaded_cpu_state)

No need for two separate if statements. And assuming we're checking the
existing shadow value when loading guest/host state, the WRMSR should
only be done if the host's value is non-zero.

> + wrmsrl(MSR_MISC_FEATURES_ENABLES, data);
> + }
> + vcpu->arch.msr_misc_features_enables = data;
> + break;
> case MSR_TSC_AUX:
> if (!msr_info->host_initiated &&
> !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 434ec113cc79..33a8c95b2f2e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2449,6 +2449,17 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
> &vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
> }
>
> +int kvm_supported_msr_misc_features_enables(struct kvm_vcpu *vcpu, u64 data)
> +{
> + if (data & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT ||
> + (data & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT &&
> + !supports_cpuid_fault(vcpu)))
> + return 0;
> + else
> + return 1;
> +}
> +EXPORT_SYMBOL_GPL(kvm_supported_msr_misc_features_enables);
> +
> int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> {
> bool pr = false;
> @@ -2669,9 +2680,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> vcpu->arch.msr_platform_info = data;
> break;
> case MSR_MISC_FEATURES_ENABLES:
> - if (data & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT ||
> - (data & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT &&
> - !supports_cpuid_fault(vcpu)))
> + if (!kvm_supported_msr_misc_features_enables(vcpu, data))
> return 1;
> vcpu->arch.msr_misc_features_enables = data;
> break;
> --
> 2.19.1
>

2019-03-19 04:44:35

by Xiaoyao Li

Subject: Re: [PATCH v2 2/2] kvm/vmx: Using hardware cpuid faulting to avoid emulation overhead

On Mon, 2019-03-18 at 09:38 -0700, Sean Christopherson wrote:
> On Mon, Mar 18, 2019 at 07:43:24PM +0800, Xiaoyao Li wrote:
> > Current cpuid faulting of guest is purely emulated in kvm, which exploits
> > CPUID vm exit to inject #GP to guest. However, if host hardware cpu has
> > X86_FEATURE_CPUID_FAULT, we can just use the hardware cpuid faulting for
> > guest to avoid the vm exit overhead.
>
> Heh, I obviously didn't look at this patch before responding to patch 1/2.
>
> > Note: cpuid faulting takes higher priority over CPUID instruction vm
> > exit (Intel SDM vol3.25.1.1).
> >
> > Since cpuid faulting only exists on some Intel's cpu, just apply this
> > optimization to vmx.
> >
> > Signed-off-by: Xiaoyao Li <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 2 ++
> > arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++----
> > arch/x86/kvm/x86.c | 15 ++++++++++++---
> > 3 files changed, 29 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h
> > b/arch/x86/include/asm/kvm_host.h
> > index ce79d7bfe1fd..14cad587b804 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1339,6 +1339,8 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long
> > msw);
> > void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
> > int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
> >
> > +int kvm_supported_msr_misc_features_enables(struct kvm_vcpu *vcpu, u64
> > data);
> > +
> > int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> > int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 2c59e0209e36..6b413e471dca 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1037,7 +1037,7 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
> >
> > static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx)
> > {
> > - u64 host_val;
> > + u64 host_val, guest_val;
> >
> > if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
> > return;
> > @@ -1045,10 +1045,12 @@ static void vmx_save_host_cpuid_fault(struct
> > vcpu_vmx *vmx)
> > rdmsrl(MSR_MISC_FEATURES_ENABLES, host_val);
> > vmx->host_msr_misc_features_enables = host_val;
> >
> > - /* clear cpuid fault bit to avoid it leak to guest */
> > - if (host_val & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> > + guest_val = vmx->vcpu.arch.msr_misc_features_enables;
> > +
> > + /* we can use the hardware cpuid faulting to avoid emulation overhead */
> > + if ((host_val ^ guest_val) & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> > wrmsrl(MSR_MISC_FEATURES_ENABLES,
> > - host_val & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> > + host_val ^ MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> > }
> > }
> >
> > @@ -2057,6 +2059,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct
> > msr_data *msr_info)
> > else
> > vmx->pt_desc.guest.addr_a[index / 2] = data;
> > break;
> > + case MSR_MISC_FEATURES_ENABLES:
> > + if (!kvm_supported_msr_misc_features_enables(vcpu, data))
> > + return 1;
> > + if (boot_cpu_has(X86_FEATURE_CPUID_FAULT)) {
> > + if (vmx->loaded_cpu_state)
>
> No need for two separate if statements. And assuming we're checking the
> existing shadow value when loading guest/host state, the WRMSR should
> only be done if the host's value is non-zero.

I'll combine these two if statements into one.

I don't understand why the WRMSR should only be done if the host's value is
non-zero. I think there is no dependency on the host's value when using
hardware cpuid faulting. We just need to write the value to the real hardware
MSR.

> > + wrmsrl(MSR_MISC_FEATURES_ENABLES, data);
> > + }
> > + vcpu->arch.msr_misc_features_enables = data;
> > + break;
> > case MSR_TSC_AUX:
> > if (!msr_info->host_initiated &&
> > !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 434ec113cc79..33a8c95b2f2e 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2449,6 +2449,17 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
> > &vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
> > }
> >
> > +int kvm_supported_msr_misc_features_enables(struct kvm_vcpu *vcpu, u64
> > data)
> > +{
> > + if (data & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT ||
> > + (data & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT &&
> > + !supports_cpuid_fault(vcpu)))
> > + return 0;
> > + else
> > + return 1;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_supported_msr_misc_features_enables);
> > +
> > int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > {
> > bool pr = false;
> > @@ -2669,9 +2680,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct
> > msr_data *msr_info)
> > vcpu->arch.msr_platform_info = data;
> > break;
> > case MSR_MISC_FEATURES_ENABLES:
> > - if (data & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT ||
> > - (data & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT &&
> > - !supports_cpuid_fault(vcpu)))
> > + if (!kvm_supported_msr_misc_features_enables(vcpu, data))
> > return 1;
> > vcpu->arch.msr_misc_features_enables = data;
> > break;
> > --
> > 2.19.1
> >


2019-03-19 14:20:56

by Sean Christopherson

Subject: Re: [PATCH v2 1/2] kvm/vmx: avoid CPUID faulting leaking to guest

On Tue, Mar 19, 2019 at 12:26:55PM +0800, Xiaoyao Li wrote:
> Hi, Sean,
>
> On Mon, 2019-03-18 at 09:23 -0700, Sean Christopherson wrote:
> > On Mon, Mar 18, 2019 at 07:43:23PM +0800, Xiaoyao Li wrote:
> > > cpuid faulting is a feature about CPUID instruction. When cpuif faulting
> >
> > ^^^^^
> > cpuid
> > > is enabled, all execution of the CPUID instruction outside system-management
> > > mode (SMM) cause a general-protection (#GP) if the CPL > 0.
> > >
> > > About this feature, detailed information can be found at
> > >
> https://www.intel.com/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf
> > >
> > > Current KVM provides software emulation of this feature for guest.
> > > However, because cpuid faulting takes higher priority over CPUID vm exit
> > > (Intel
> > > SDM vol3.25.1.1), there is a risk of leaking cpuid faulting to guest when
> > > host
> > > enables it. If host enables cpuid faulting by setting the bit 0 of
> > > MSR_MISC_FEATURES_ENABLES, it will pass to guest since there is no handling
> > > of
> > > MSR_MISC_FEATURES_ENABLES yet. As a result, when guest calls CPUID
> > > instruction
> > > in CPL > 0, it will generate a #GP instead of CPUID vm eixt.
> > >
> > > This issue will cause guest boot failure when guest uses *modprobe*
> > > to load modules. *modprobe* calls CPUID instruction, thus causing #GP in
> > > guest. Since there is no handling of cpuid faulting in #GP handler, guest
> > > fails boot.
> > >
> > > To fix this issue, we should switch cpuid faulting bit between host and
> > > guest.
> > > Since MSR_MISC_FEATURES_ENABLES is intel-specific, this patch implement the
> > > switching only in vmx. It clears the cpuid faulting bit and save host's
> > > value before switching to guest, and restores the cpuid faulting settings of
> > > host before switching to host.
> > >
> > > Because kvm provides the software emulation of cpuid faulting, we can
> > > just clear the cpuid faulting bit in hardware MSR when switching to
> > > guest.
> > >
> > > Signed-off-by: Xiaoyao Li <[email protected]>
> > > ---
> > > Changes in v2:
> > > - move the save/restore of cpuid faulting bit to
> > > vmx_prepare_swich_to_guest/vmx_prepare_swich_to_host to avoid every
> > > vmentry RDMSR, based on Paolo's comment.
> > >
> > > ---
> > > arch/x86/kvm/vmx/vmx.c | 34 ++++++++++++++++++++++++++++++++++
> > > arch/x86/kvm/vmx/vmx.h | 2 ++
> > > 2 files changed, 36 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 541d442edd4e..2c59e0209e36 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -1035,6 +1035,23 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
> > > wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
> > > }
> > >
> > > +static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx)
> >
> > Maybe vmx_load_guest_misc_features_enables()? See below for reasoning.
> > Alternatively you can drop the helpers altogether.
> >
> > > +{
> > > + u64 host_val;
> > > +
> > > + if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
> > > + return;
> > > +
> > > + rdmsrl(MSR_MISC_FEATURES_ENABLES, host_val);
> > > + vmx->host_msr_misc_features_enables = host_val;
> >
> > There's no need to cache the host value in struct vcpu_vmx, just use the
> > kernel's shadow value.
>
> you're right, thanks for pointing out this.
>
> > > +
> > > + /* clear cpuid fault bit to avoid it leak to guest */
> >
> > Personally I find the comment unnecessary and somewhat misleading.
> >
> > > + if (host_val & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> > > + wrmsrl(MSR_MISC_FEATURES_ENABLES,
> > > + host_val & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> >
> > I think we can also skip WRMSR if CPUID faulting is also enabled in the
>
> Right, the initial purpose of this patch is to clear hardware cpuid faulting bit
> for guest and just let guest use the kvm emulation of cpuid faulting.
> Using hardware cpuid faulting instead of kvm emulation is what the next patch
> doing.
>
> It's Paolo's suggestion that splits it into 2 patches.

Agreed, two patches is better.

> > guest. It probably doesn't make sense to install the guest's value if
> > CPUID faulting is enabled in the guest and not host since intercepting
> > CPUID likely provides better overall performance than switching the MSR
> > on entry and exit. And last but not least, since there are other bits
>
> Actually, this MSR switching is not happening on every entry and exit, it clears
> host cpuid faulting only when vmx->loaded_cpu_state == NULL, and load host cpuid
> faulting only when vmx->loaded_cpu_state != NULL. Usually, in the vcpu_run loop,
> it switchs for once.

Whoops, yeah, every save/restore cycle.

> However, there indeed is a tradeoff between switching the MSR and intercepting
> CPUID. Using hardware cpuid faulting for guest, it needs to switch related MSR,
> which incurs wrmsr overhead. But using emulation, it has to suffer the overhead
> of vmexit with intercepting CPUID instruction.
> And I don't know which is better.
>
> At this point, I think it makes sense to use 2 patches. One for fixing potential
> leaking issue that just clears cpuid faulting bit for guest, and the other
> writes guest cpuid faulting value to hardware bit to optimize performance (maybe
> it doesn't).
>
> > in the MSR that we don't want to expose to the guest, e.g. RING3MWAIT,
> > checking for a non-zero host value is probably better than checking for
> > individual feature bits. Same reasoning for writing the guest's value
> > instead of clearing just the CPUID faulting bit.
>
> About RING3MWAIT, I just let it go as without this patch.
> Since you pointed out *we* don't want to expose other bits in the MSR to the
> guest, I will clear the both bits in next version.

Clear all bits, i.e. write 0. That way KVM doesn't need to be updated
if/when hardware introduces another bit.
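
E.g. for patch 1 on its own, where the value installed for the guest is
always zero, a minimal sketch of the guest-load path (assuming the
msr_misc_features_shadow approach discussed above, not a final
implementation) would be:

	u64 msrval = this_cpu_read(msr_misc_features_shadow);

	/* The guest value is 0 here, so hardware only needs to be
	 * touched if the host has any bits set. */
	if (msrval)
		wrmsrl(MSR_MISC_FEATURES_ENABLES, 0);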

> >
> > So something like:
> >
> > u64 msrval = this_cpu_read(msr_misc_features_shadow);
> >
> > if (!msrval || msrval == vcpu->arch.msr_misc_features_enables)
> > return;
> >
> > wrmsrl(MSR_MISC_FEATURES_ENABLES, vcpu->arch.msr_misc_features_enables);
> >
> >
> > or if you drop the helpers:
> >
> > msrval = this_cpu_read(msr_misc_features_shadow);
> > if (msrval && msrval != vcpu->arch.msr_misc_features_enables)
> > wrmsrl(MSR_MISC_FEATURES_ENABLES,
> > vcpu->arch.msr_misc_features_enables);
> >
> > > + }
> > > +}
> > > +
> > > void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> > > {
> > > struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > @@ -1068,6 +1085,8 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu
> > > *vcpu)
> > > vmx->loaded_cpu_state = vmx->loaded_vmcs;
> > > host_state = &vmx->loaded_cpu_state->host_state;
> > >
> > > + vmx_save_host_cpuid_fault(vmx);
> > > +
> > > /*
> > > * Set host fs and gs selectors. Unfortunately, 22.2.3 does not
> > > * allow segment selectors with cpl > 0 or ti == 1.
> > > @@ -1124,6 +1143,19 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu
> > > *vcpu)
> > > }
> > > }
> > >
> > > +static void vmx_restore_host_cpuid_fault(struct vcpu_vmx *vmx)
> >
> > If you keep the helpers, maybe vmx_load_host_misc_features_enables()
> > to pair with the new function name for loading guest state?
> >
> > > +{
> > > + u64 msrval;
> > > +
> > > + if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
> > > + return;
> > > +
> > > + rdmsrl(MSR_MISC_FEATURES_ENABLES, msrval);
> > > + msrval |= vmx->host_msr_misc_features_enables &
> > > + MSR_MISC_FEATURES_ENABLES_CPUID_FAULT;
> >
> > Again, there's no need for RDMSR, the host's value can be pulled from
> > msr_misc_features_shadow, and the WRMSR can be skipped if the host and
> > guest have the same value, i.e. we didn't install the guest's value.
> >
> > u64 msrval = this_cpu_read(msr_misc_features_shadow);
> >
> > if (!msrval || msrval == vcpu->arch.msr_misc_features_enables)
> > return;
> >
> > wrmsrl(MSR_MISC_FEATURES_ENABLES, msrval);
> >
> > > + wrmsrl(MSR_MISC_FEATURES_ENABLES, msrval);
> > > +}
> > > +
> > > static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
> > > {
> > > struct vmcs_host_state *host_state;
> > > @@ -1137,6 +1169,8 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx
> > > *vmx)
> > > ++vmx->vcpu.stat.host_state_reload;
> > > vmx->loaded_cpu_state = NULL;
> > >
> > > + vmx_restore_host_cpuid_fault(vmx);
> > > +
> > > #ifdef CONFIG_X86_64
> > > rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
> > > #endif
> > > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> > > index 5df73b36fa49..ba867bbc5676 100644
> > > --- a/arch/x86/kvm/vmx/vmx.h
> > > +++ b/arch/x86/kvm/vmx/vmx.h
> > > @@ -268,6 +268,8 @@ struct vcpu_vmx {
> > > u64 msr_ia32_feature_control_valid_bits;
> > > u64 ept_pointer;
> > >
> > > + u64 host_msr_misc_features_enables;
> >
> > As mentioned above, this isn't needed.
>
> Sure, I will remove this and use msr_misc_features_shadow.
>
> > > +
> > > struct pt_desc pt_desc;
> > > };
> > >
> > > --
> > > 2.19.1
> > >
>

2019-03-19 14:29:54

by Sean Christopherson

Subject: Re: [PATCH v2 2/2] kvm/vmx: Using hardware cpuid faulting to avoid emulation overhead

On Tue, Mar 19, 2019 at 12:37:23PM +0800, Xiaoyao Li wrote:
> On Mon, 2019-03-18 at 09:38 -0700, Sean Christopherson wrote:
> > On Mon, Mar 18, 2019 at 07:43:24PM +0800, Xiaoyao Li wrote:
> > > Current cpuid faulting of guest is purely emulated in kvm, which exploits
> > > CPUID vm exit to inject #GP to guest. However, if host hardware cpu has
> > > X86_FEATURE_CPUID_FAULT, we can just use the hardware cpuid faulting for
> > > guest to avoid the vm exit overhead.
> >
> > Heh, I obviously didn't look at this patch before responding to patch 1/2.
> >
> > > Note: cpuid faulting takes higher priority over CPUID instruction vm
> > > exit (Intel SDM vol3.25.1.1).
> > >
> > > Since cpuid faulting only exists on some Intel's cpu, just apply this
> > > optimization to vmx.
> > >
> > > Signed-off-by: Xiaoyao Li <[email protected]>
> > > ---
> > > arch/x86/include/asm/kvm_host.h | 2 ++
> > > arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++----
> > > arch/x86/kvm/x86.c | 15 ++++++++++++---
> > > 3 files changed, 29 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h
> > > b/arch/x86/include/asm/kvm_host.h
> > > index ce79d7bfe1fd..14cad587b804 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1339,6 +1339,8 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long
> > > msw);
> > > void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
> > > int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
> > >
> > > +int kvm_supported_msr_misc_features_enables(struct kvm_vcpu *vcpu, u64
> > > data);
> > > +
> > > int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> > > int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> > >
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 2c59e0209e36..6b413e471dca 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -1037,7 +1037,7 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
> > >
> > > static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx)
> > > {
> > > - u64 host_val;
> > > + u64 host_val, guest_val;
> > >
> > > if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
> > > return;
> > > @@ -1045,10 +1045,12 @@ static void vmx_save_host_cpuid_fault(struct
> > > vcpu_vmx *vmx)
> > > rdmsrl(MSR_MISC_FEATURES_ENABLES, host_val);
> > > vmx->host_msr_misc_features_enables = host_val;
> > >
> > > - /* clear cpuid fault bit to avoid it leak to guest */
> > > - if (host_val & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> > > + guest_val = vmx->vcpu.arch.msr_misc_features_enables;
> > > +
> > > + /* we can use the hardware cpuid faulting to avoid emulation overhead */
> > > + if ((host_val ^ guest_val) & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> > > wrmsrl(MSR_MISC_FEATURES_ENABLES,
> > > - host_val & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> > > + host_val ^ MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> > > }
> > > }
> > >
> > > @@ -2057,6 +2059,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct
> > > msr_data *msr_info)
> > > else
> > > vmx->pt_desc.guest.addr_a[index / 2] = data;
> > > break;
> > > + case MSR_MISC_FEATURES_ENABLES:
> > > + if (!kvm_supported_msr_misc_features_enables(vcpu, data))
> > > + return 1;
> > > + if (boot_cpu_has(X86_FEATURE_CPUID_FAULT)) {
> > > + if (vmx->loaded_cpu_state)
> >
> > No need for two separate if statements. And assuming we're checking the
> > existing shadow value when loading guest/host state, the WRMSR should
> > only be done if the host's value is non-zero.
>
> I'll combine these two if statements into one.
>
> I cannot understand why the WRMSR should only be done if the host's value is
> non-zero. I think there is no depedency with host's value, if using the hardware
> cpuid faulting. We just need to set the value to real hardware MSR.

What I was trying to say in patch 1/2 regarding save/restore, is that I
don't think it is worthwhile to voluntarily switch hardware's value. In
other words, do the WRMSR if and only if it's absolutely necessary. And
that means only installing the guest's value if the host's value is
non-zero and host_val != guest_val. If the host's value is zero, then
the guest's value is irrelevant as CPUID faulting behavior will naturally
be taken care of when intercepting CPUID. And for obvious reasons the
WRMSR can be skipped if host and guest want the same value.

As it pertains to this code, we should not modify any hardware value that
isn't saved/restored. If we go with the above approach, then we're only
switching the hardware MSR when the host's value is non-zero. Ergo this
code should not write the actual MSR if the host value is zero as the MSR
will not be restored to the host's value when putting the vCPU.
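
For the MSR_MISC_FEATURES_ENABLES case in vmx_set_msr() that policy would
translate into something like the following sketch (again assuming the
msr_misc_features_shadow approach, not a final implementation):

	case MSR_MISC_FEATURES_ENABLES:
		if (!kvm_supported_msr_misc_features_enables(vcpu, data))
			return 1;
		/* Touch hardware only if guest state is loaded and the
		 * MSR is actually being switched, i.e. the host's value
		 * is non-zero. */
		if (vmx->loaded_cpu_state &&
		    this_cpu_read(msr_misc_features_shadow))
			wrmsrl(MSR_MISC_FEATURES_ENABLES, data);
		vcpu->arch.msr_misc_features_enables = data;
		break;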

> > > + wrmsrl(MSR_MISC_FEATURES_ENABLES, data);
> > > + }
> > > + vcpu->arch.msr_misc_features_enables = data;
> > > + break;
> > > case MSR_TSC_AUX:
> > > if (!msr_info->host_initiated &&
> > > !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 434ec113cc79..33a8c95b2f2e 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -2449,6 +2449,17 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
> > > &vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
> > > }
> > >
> > > +int kvm_supported_msr_misc_features_enables(struct kvm_vcpu *vcpu, u64
> > > data)
> > > +{
> > > + if (data & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT ||
> > > + (data & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT &&
> > > + !supports_cpuid_fault(vcpu)))
> > > + return 0;
> > > + else
> > > + return 1;
> > > +}
> > > +EXPORT_SYMBOL_GPL(kvm_supported_msr_misc_features_enables);
> > > +
> > > int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > > {
> > > bool pr = false;
> > > @@ -2669,9 +2680,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct
> > > msr_data *msr_info)
> > > vcpu->arch.msr_platform_info = data;
> > > break;
> > > case MSR_MISC_FEATURES_ENABLES:
> > > - if (data & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT ||
> > > - (data & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT &&
> > > - !supports_cpuid_fault(vcpu)))
> > > + if (!kvm_supported_msr_misc_features_enables(vcpu, data))
> > > return 1;
> > > vcpu->arch.msr_misc_features_enables = data;
> > > break;
> > > --
> > > 2.19.1
> > >
>

2019-03-19 17:56:24

by Xiaoyao Li

Subject: Re: [PATCH v2 2/2] kvm/vmx: Using hardware cpuid faulting to avoid emulation overhead

On Tue, 2019-03-19 at 07:28 -0700, Sean Christopherson wrote:
> On Tue, Mar 19, 2019 at 12:37:23PM +0800, Xiaoyao Li wrote:
> > On Mon, 2019-03-18 at 09:38 -0700, Sean Christopherson wrote:
> > > On Mon, Mar 18, 2019 at 07:43:24PM +0800, Xiaoyao Li wrote:
> > > > Current cpuid faulting of guest is purely emulated in kvm, which
> > > > exploits
> > > > CPUID vm exit to inject #GP to guest. However, if host hardware cpu has
> > > > X86_FEATURE_CPUID_FAULT, we can just use the hardware cpuid faulting for
> > > > guest to avoid the vm exit overhead.
> > >
> > > Heh, I obviously didn't look at this patch before responding to patch 1/2.
> > >
> > > > Note: cpuid faulting takes higher priority over CPUID instruction vm
> > > > exit (Intel SDM vol3.25.1.1).
> > > >
> > > > Since cpuid faulting only exists on some Intel's cpu, just apply this
> > > > optimization to vmx.
> > > >
> > > > Signed-off-by: Xiaoyao Li <[email protected]>
> > > > ---
> > > > arch/x86/include/asm/kvm_host.h | 2 ++
> > > > arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++----
> > > > arch/x86/kvm/x86.c | 15 ++++++++++++---
> > > > 3 files changed, 29 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/arch/x86/include/asm/kvm_host.h
> > > > b/arch/x86/include/asm/kvm_host.h
> > > > index ce79d7bfe1fd..14cad587b804 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -1339,6 +1339,8 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long
> > > > msw);
> > > > void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
> > > > int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
> > > >
> > > > +int kvm_supported_msr_misc_features_enables(struct kvm_vcpu *vcpu, u64
> > > > data);
> > > > +
> > > > int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> > > > int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> > > >
> > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > > index 2c59e0209e36..6b413e471dca 100644
> > > > --- a/arch/x86/kvm/vmx/vmx.c
> > > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > > @@ -1037,7 +1037,7 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
> > > >
> > > > static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx)
> > > > {
> > > > - u64 host_val;
> > > > + u64 host_val, guest_val;
> > > >
> > > > if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
> > > > return;
> > > > @@ -1045,10 +1045,12 @@ static void vmx_save_host_cpuid_fault(struct
> > > > vcpu_vmx *vmx)
> > > > rdmsrl(MSR_MISC_FEATURES_ENABLES, host_val);
> > > > vmx->host_msr_misc_features_enables = host_val;
> > > >
> > > > - /* clear cpuid fault bit to avoid it leak to guest */
> > > > - if (host_val & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> > > > + guest_val = vmx->vcpu.arch.msr_misc_features_enables;
> > > > +
> > > > + /* we can use the hardware cpuid faulting to avoid emulation
> > > > overhead */
> > > > + if ((host_val ^ guest_val) &
> > > > MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> > > > wrmsrl(MSR_MISC_FEATURES_ENABLES,
> > > > - host_val &
> > > > ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> > > > + host_val ^
> > > > MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> > > > }
> > > > }
> > > >
> > > > @@ -2057,6 +2059,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu,
> > > > struct
> > > > msr_data *msr_info)
> > > > else
> > > > vmx->pt_desc.guest.addr_a[index / 2] = data;
> > > > break;
> > > > + case MSR_MISC_FEATURES_ENABLES:
> > > > + if (!kvm_supported_msr_misc_features_enables(vcpu,
> > > > data))
> > > > + return 1;
> > > > + if (boot_cpu_has(X86_FEATURE_CPUID_FAULT)) {
> > > > + if (vmx->loaded_cpu_state)
> > >
> > > No need for two separate if statements. And assuming we're checking the
> > > existing shadow value when loading guest/host state, the WRMSR should
> > > only be done if the host's value is non-zero.
> >
> > I'll combine these two if statements into one.
> >
> > I cannot understand why the WRMSR should only be done if the host's value is
> > non-zero. I think there is no depedency with host's value, if using the
> > hardware
> > cpuid faulting. We just need to set the value to real hardware MSR.
>
> What I was trying to say in patch 1/2 regarding save/restore, is that I
> don't think it is worthwhile to voluntarily switch hardware's value. In
> other words, do the WRMSR if and only if it's absolutely necessary. And
> that means only installing the guest's value if the host's value is
> non-zero and host_val != guest_val. If the host's value is zero, then
> the guest's value is irrelevant as CPUID faulting behavior will naturally
> be taken care of when intercepting CPUID. And for obvious reasons the
> WRMSR can be skipped if host and guest want the same value.

The purpose of this patch is to always use hardware cpuid faulting if the
hardware cpu has this feature. Emulated cpuid faulting needs an entire vmexit
round trip, while using hardware cpuid faulting only adds two WRMSRs per
save/restore cycle, and only when host_val != guest_val.

Also, to summarize the handling you described:
When the host's value is zero, we do nothing to this MSR and let the guest
use emulated cpuid faulting through CPUID interception.
When the host's value is non-zero, we load the guest's value into the
hardware MSR, which means we use hardware cpuid faulting.

So the difference is that when the host's value is zero, you choose to use
emulated cpuid faulting. What is the point of choosing between emulation and
the hardware feature based on the host's value?

> As it pertains to this code, we should not modify any hardware value that
> isn't saved/restored. If we go with the above approach, then we're only
> switching the hardware MSR when the host's value is non-zero. Ergo this
> code should not write the actual MSR if the host value is zero as the MSR
> will not be restored to the host's value when putting the vCPU.
>
> > > > + wrmsrl(MSR_MISC_FEATURES_ENABLES, data);
> > > > + }
> > > > + vcpu->arch.msr_misc_features_enables = data;
> > > > + break;
> > > > case MSR_TSC_AUX:
> > > > if (!msr_info->host_initiated &&
> > > > !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index 434ec113cc79..33a8c95b2f2e 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -2449,6 +2449,17 @@ static void record_steal_time(struct kvm_vcpu
> > > > *vcpu)
> > > > &vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
> > > > }
> > > >
> > > > +int kvm_supported_msr_misc_features_enables(struct kvm_vcpu *vcpu, u64
> > > > data)
> > > > +{
> > > > + if (data & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT ||
> > > > + (data & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT &&
> > > > + !supports_cpuid_fault(vcpu)))
> > > > + return 0;
> > > > + else
> > > > + return 1;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(kvm_supported_msr_misc_features_enables);
> > > > +
> > > > int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data
> > > > *msr_info)
> > > > {
> > > > bool pr = false;
> > > > @@ -2669,9 +2680,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu,
> > > > struct
> > > > msr_data *msr_info)
> > > > vcpu->arch.msr_platform_info = data;
> > > > break;
> > > > case MSR_MISC_FEATURES_ENABLES:
> > > > - if (data & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT ||
> > > > - (data & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT &&
> > > > - !supports_cpuid_fault(vcpu)))
> > > > + if (!kvm_supported_msr_misc_features_enables(vcpu,
> > > > data))
> > > > return 1;
> > > > vcpu->arch.msr_misc_features_enables = data;
> > > > break;
> > > > --
> > > > 2.19.1
> > > >


2019-03-20 00:11:03

by Sean Christopherson

Subject: Re: [PATCH v2 2/2] kvm/vmx: Using hardware cpuid faulting to avoid emulation overhead

On Wed, Mar 20, 2019 at 01:51:28AM +0800, Xiaoyao Li wrote:
> On Tue, 2019-03-19 at 07:28 -0700, Sean Christopherson wrote:
> > On Tue, Mar 19, 2019 at 12:37:23PM +0800, Xiaoyao Li wrote:
> > > On Mon, 2019-03-18 at 09:38 -0700, Sean Christopherson wrote:
> > > > On Mon, Mar 18, 2019 at 07:43:24PM +0800, Xiaoyao Li wrote:
> > > > > Current cpuid faulting of guest is purely emulated in kvm, which
> > > > > exploits
> > > > > CPUID vm exit to inject #GP to guest. However, if host hardware cpu has
> > > > > X86_FEATURE_CPUID_FAULT, we can just use the hardware cpuid faulting for
> > > > > guest to avoid the vm exit overhead.
> > > >
> > > > Heh, I obviously didn't look at this patch before responding to patch 1/2.
> > > >
> > > > > Note: cpuid faulting takes higher priority over CPUID instruction vm
> > > > > exit (Intel SDM vol3.25.1.1).
> > > > >
> > > > > Since cpuid faulting only exists on some Intel's cpu, just apply this
> > > > > optimization to vmx.
> > > > >
> > > > > Signed-off-by: Xiaoyao Li <[email protected]>
> > > > > ---
> > > > > arch/x86/include/asm/kvm_host.h | 2 ++
> > > > > arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++----
> > > > > arch/x86/kvm/x86.c | 15 ++++++++++++---
> > > > > 3 files changed, 29 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/arch/x86/include/asm/kvm_host.h
> > > > > b/arch/x86/include/asm/kvm_host.h
> > > > > index ce79d7bfe1fd..14cad587b804 100644
> > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > @@ -1339,6 +1339,8 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long
> > > > > msw);
> > > > > void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
> > > > > int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
> > > > >
> > > > > +int kvm_supported_msr_misc_features_enables(struct kvm_vcpu *vcpu, u64
> > > > > data);
> > > > > +
> > > > > int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> > > > > int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> > > > >
> > > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > > > index 2c59e0209e36..6b413e471dca 100644
> > > > > --- a/arch/x86/kvm/vmx/vmx.c
> > > > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > > > @@ -1037,7 +1037,7 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
> > > > >
> > > > > static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx)
> > > > > {
> > > > > - u64 host_val;
> > > > > + u64 host_val, guest_val;
> > > > >
> > > > > if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
> > > > > return;
> > > > > @@ -1045,10 +1045,12 @@ static void vmx_save_host_cpuid_fault(struct
> > > > > vcpu_vmx *vmx)
> > > > > rdmsrl(MSR_MISC_FEATURES_ENABLES, host_val);
> > > > > vmx->host_msr_misc_features_enables = host_val;
> > > > >
> > > > > - /* clear cpuid fault bit to avoid it leak to guest */
> > > > > - if (host_val & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> > > > > + guest_val = vmx->vcpu.arch.msr_misc_features_enables;
> > > > > +
> > > > > + /* we can use the hardware cpuid faulting to avoid emulation
> > > > > overhead */
> > > > > + if ((host_val ^ guest_val) &
> > > > > MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> > > > > wrmsrl(MSR_MISC_FEATURES_ENABLES,
> > > > > - host_val &
> > > > > ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> > > > > + host_val ^
> > > > > MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> > > > > }
> > > > > }
> > > > >
> > > > > @@ -2057,6 +2059,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu,
> > > > > struct
> > > > > msr_data *msr_info)
> > > > > else
> > > > > vmx->pt_desc.guest.addr_a[index / 2] = data;
> > > > > break;
> > > > > + case MSR_MISC_FEATURES_ENABLES:
> > > > > + if (!kvm_supported_msr_misc_features_enables(vcpu,
> > > > > data))
> > > > > + return 1;
> > > > > + if (boot_cpu_has(X86_FEATURE_CPUID_FAULT)) {
> > > > > + if (vmx->loaded_cpu_state)
> > > >
> > > > No need for two separate if statements. And assuming we're checking the
> > > > existing shadow value when loading guest/host state, the WRMSR should
> > > > only be done if the host's value is non-zero.
> > >
> > > I'll combine these two if statements into one.
> > >
> > > I cannot understand why the WRMSR should only be done if the host's value is
> > > non-zero. I think there is no depedency with host's value, if using the
> > > hardware
> > > cpuid faulting. We just need to set the value to real hardware MSR.
> >
> > What I was trying to say in patch 1/2 regarding save/restore, is that I
> > don't think it is worthwhile to voluntarily switch hardware's value. In
> > other words, do the WRMSR if and only if it's absolutely necessary. And
> > that means only installing the guest's value if the host's value is
> > non-zero and host_val != guest_val. If the host's value is zero, then
> > the guest's value is irrelevant as CPUID faulting behavior will naturally
> > be taken care of when intercepting CPUID. And for obvious reasons the
> > WRMSR can be skipped if host and guest want the same value.
>
> The purpose of this patch is always using hardware cpuid fault if hardware cpu
> has this feature. Because emuated cpuid faulting needs entire vmexit process,
> but using hardware cpuid faulting just adds two WRMSR in save/restore cycle only
> when host_val != guest_val.
>
> Also I conclude the handling ou said as below:
> When host's value is zero, we do nothing to this MSR, and let guest ues emulated
> cpuid faulting through CPUID intercepting.
> When host's value is non-zero, we load the guest'value into hardware MSR, which
> means we use hardware cpuid faulting.
>
> So the difference is when host value is zero, you choose to use emulated cpuid
> faulting. What's meaning of chooseing emualtion or hardware feature based on
> host's value?

To save cycles by avoiding WRMSR whenever possible. WRMSR is expensive, even
if it's limited to vcpu_{load,put}(), e.g. a workload that triggers a lot of
exits to userspace isn't going to be thrilled about the extra 250-300 cycles
added to each round trip.

On the other hand, emulating CPUID faulting only adds a VM-Exit to userspace
CPUID instructions that will fault anyway, and faults aren't exactly fast
paths. Most uses of CPUID in userspace are in application startup, e.g. a
handful of CPUIDs to determine what features can be used. So even if the ~1000
cycles added by the VM-Exit is somehow meaningful to the fault path, its
impact is limited to a tiny number of instructions executed in the guest,
whereas doing WRMSR affects every vcpu_{load,put}.

2019-03-20 13:33:05

by Xiaoyao Li

Subject: Re: [PATCH v2 2/2] kvm/vmx: Using hardware cpuid faulting to avoid emulation overhead

On Tue, 2019-03-19 at 17:09 -0700, Sean Christopherson wrote:
> On Wed, Mar 20, 2019 at 01:51:28AM +0800, Xiaoyao Li wrote:
> > On Tue, 2019-03-19 at 07:28 -0700, Sean Christopherson wrote:
> > > On Tue, Mar 19, 2019 at 12:37:23PM +0800, Xiaoyao Li wrote:
> > > > On Mon, 2019-03-18 at 09:38 -0700, Sean Christopherson wrote:
> > > > > On Mon, Mar 18, 2019 at 07:43:24PM +0800, Xiaoyao Li wrote:
> > > > > > Current cpuid faulting of guest is purely emulated in kvm, which
> > > > > > exploits
> > > > > > CPUID vm exit to inject #GP to guest. However, if host hardware cpu
> > > > > > has
> > > > > > X86_FEATURE_CPUID_FAULT, we can just use the hardware cpuid faulting
> > > > > > for
> > > > > > guest to avoid the vm exit overhead.
> > > > >
> > > > > Heh, I obviously didn't look at this patch before responding to patch
> > > > > 1/2.
> > > > >
> > > > > > Note: cpuid faulting takes higher priority over CPUID instruction vm
> > > > > > exit (Intel SDM vol3.25.1.1).
> > > > > >
> > > > > > Since cpuid faulting only exists on some Intel's cpu, just apply
> > > > > > this
> > > > > > optimization to vmx.
> > > > > >
> > > > > > Signed-off-by: Xiaoyao Li <[email protected]>
> > > > > > ---
> > > > > > arch/x86/include/asm/kvm_host.h | 2 ++
> > > > > > arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++----
> > > > > > arch/x86/kvm/x86.c | 15 ++++++++++++---
> > > > > > 3 files changed, 29 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/x86/include/asm/kvm_host.h
> > > > > > b/arch/x86/include/asm/kvm_host.h
> > > > > > index ce79d7bfe1fd..14cad587b804 100644
> > > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > > @@ -1339,6 +1339,8 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned
> > > > > > long
> > > > > > msw);
> > > > > > void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
> > > > > > int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
> > > > > >
> > > > > > +int kvm_supported_msr_misc_features_enables(struct kvm_vcpu *vcpu,
> > > > > > u64
> > > > > > data);
> > > > > > +
> > > > > > int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data
> > > > > > *msr);
> > > > > > int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data
> > > > > > *msr);
> > > > > >
> > > > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > > > > index 2c59e0209e36..6b413e471dca 100644
> > > > > > --- a/arch/x86/kvm/vmx/vmx.c
> > > > > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > > > > @@ -1037,7 +1037,7 @@ static void pt_guest_exit(struct vcpu_vmx
> > > > > > *vmx)
> > > > > >
> > > > > > static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx)
> > > > > > {
> > > > > > - u64 host_val;
> > > > > > + u64 host_val, guest_val;
> > > > > >
> > > > > > if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
> > > > > > return;
> > > > > > @@ -1045,10 +1045,12 @@ static void vmx_save_host_cpuid_fault(struct
> > > > > > vcpu_vmx *vmx)
> > > > > > rdmsrl(MSR_MISC_FEATURES_ENABLES, host_val);
> > > > > > vmx->host_msr_misc_features_enables = host_val;
> > > > > >
> > > > > > - /* clear cpuid fault bit to avoid it leak to guest */
> > > > > > - if (host_val & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> > > > > > + guest_val = vmx->vcpu.arch.msr_misc_features_enables;
> > > > > > +
> > > > > > + /* we can use the hardware cpuid faulting to avoid emulation
> > > > > > overhead */
> > > > > > + if ((host_val ^ guest_val) &
> > > > > > MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) {
> > > > > > wrmsrl(MSR_MISC_FEATURES_ENABLES,
> > > > > > - host_val &
> > > > > > ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> > > > > > + host_val ^
> > > > > > MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > @@ -2057,6 +2059,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu,
> > > > > > struct
> > > > > > msr_data *msr_info)
> > > > > > else
> > > > > > vmx->pt_desc.guest.addr_a[index / 2] = data;
> > > > > > break;
> > > > > > + case MSR_MISC_FEATURES_ENABLES:
> > > > > > + if (!kvm_supported_msr_misc_features_enables(vcpu,
> > > > > > data))
> > > > > > + return 1;
> > > > > > + if (boot_cpu_has(X86_FEATURE_CPUID_FAULT)) {
> > > > > > + if (vmx->loaded_cpu_state)
> > > > >
> > > > > No need for two separate if statements. And assuming we're checking
> > > > > the
> > > > > existing shadow value when loading guest/host state, the WRMSR should
> > > > > only be done if the host's value is non-zero.
> > > >
> > > > I'll combine these two if statements into one.
> > > >
> > > > I cannot understand why the WRMSR should only be done if the host's
> > > > value is
> > > > non-zero. I think there is no depedency with host's value, if using the
> > > > hardware
> > > > cpuid faulting. We just need to set the value to real hardware MSR.
> > >
> > > What I was trying to say in patch 1/2 regarding save/restore, is that I
> > > don't think it is worthwhile to voluntarily switch hardware's value. In
> > > other words, do the WRMSR if and only if it's absolutely necessary. And
> > > that means only installing the guest's value if the host's value is
> > > non-zero and host_val != guest_val. If the host's value is zero, then
> > > the guest's value is irrelevant as CPUID faulting behavior will naturally
> > > be taken care of when intercepting CPUID. And for obvious reasons the
> > > WRMSR can be skipped if host and guest want the same value.
> >
> > The purpose of this patch is always using hardware cpuid fault if hardware
> > cpu
> > has this feature. Because emuated cpuid faulting needs entire vmexit
> > process,
> > but using hardware cpuid faulting just adds two WRMSR in save/restore cycle
> > only
> > when host_val != guest_val.
> >
> > Also I conclude the handling ou said as below:
> > When host's value is zero, we do nothing to this MSR, and let guest ues
> > emulated
> > cpuid faulting through CPUID intercepting.
> > When host's value is non-zero, we load the guest'value into hardware MSR,
> > which
> > means we use hardware cpuid faulting.
> >
> > So the difference is when host value is zero, you choose to use emulated
> > cpuid
> > faulting. What's meaning of chooseing emualtion or hardware feature based on
> > host's value?
>
> To save cycles by avoiding WRMSR whenever possible. WRMSR is expensive, even
> if it's limited to vcpu_{load,put}(), e.g. a workload that triggers a lot of
> exits to userspace isn't going to be thrilled about the extra 250-300 cycles
> added to each round trip.
>
> On the other hand, emulating CPUID faulting only adds a VM-Exit to userspace
> CPUID instructions that will fault anyways, and faults aren't exactly fast
> paths. Most uses of CPUID in userspace are in application startup, e.g. a
> hanful of CPUIDs to determine what features can be used. So even if the ~1000
> cycles added by the VM-Exit is somehow meaningful to the fault path, its
> impact is limited to a tiny number of instructions executed in the guest,
> whereas doing WRMSR affects every vcpu_{load,put}.

Understood, that makes sense.
So the real optimization is to avoid the WRMSR whenever possible.
Thanks for your patience and explanation, Sean. I'll adjust the patches
following your comments in the next version.