2021-04-23 22:35:20

by Sean Christopherson

Subject: [PATCH v3 0/4] KVM: x86: MSR_TSC_AUX fixes and improvements

Fix potential cross-vendor landmines due to Intel and AMD having different
behavior for MSR_TSC_AUX[63:32], unify the logic across SVM and VMX, and
switch MSR_TSC_AUX via user return MSRs on SVM (the original goal).

v3:
- Fix a comment typo. [Reiji]
- Add patches to add missing guest_cpuid_has() check, drop bits 63:32 on
AMD, and unify VMX and SVM emulation.
- Rebase to kvm/next, commit c4f71901d53b ("Merge tag 'kvmarm-5.13' ...")

v2:
- Rebase to kvm/queue (ish), commit 0e91d1992235 ("KVM: SVM: Allocate SEV
command structures on local stack")
- https://lkml.kernel.org/r/[email protected]

v1: https://lkml.kernel.org/r/[email protected]


Sean Christopherson (4):
KVM: SVM: Inject #GP on guest MSR_TSC_AUX accesses if RDTSCP
unsupported
KVM: SVM: Clear MSR_TSC_AUX[63:32] on write
KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU
model
KVM: SVM: Delay restoration of host MSR_TSC_AUX until return to
userspace

arch/x86/kvm/svm/svm.c | 58 +++++++++++++++++-------------------------
arch/x86/kvm/svm/svm.h | 7 -----
arch/x86/kvm/vmx/vmx.c | 13 ----------
arch/x86/kvm/x86.c | 34 +++++++++++++++++++++++++
4 files changed, 58 insertions(+), 54 deletions(-)

--
2.31.1.498.g6c1eba8ee3d-goog


2021-04-23 22:35:25

by Sean Christopherson

Subject: [PATCH v3 1/4] KVM: SVM: Inject #GP on guest MSR_TSC_AUX accesses if RDTSCP unsupported

Inject #GP on guest accesses to MSR_TSC_AUX if RDTSCP is unsupported in
the guest's CPUID model.

Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cd8c333ed2dc..9ed9c7bd7cfd 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2674,6 +2674,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_TSC_AUX:
if (!boot_cpu_has(X86_FEATURE_RDTSCP))
return 1;
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ return 1;
msr_info->data = svm->tsc_aux;
break;
/*
@@ -2892,6 +2895,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
if (!boot_cpu_has(X86_FEATURE_RDTSCP))
return 1;

+ if (!msr->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ return 1;
+
/*
* This is rare, so we update the MSR here instead of using
* direct_access_msrs. Doing that would require a rdmsr in
--
2.31.1.498.g6c1eba8ee3d-goog

2021-04-23 22:35:29

by Sean Christopherson

Subject: [PATCH v3 4/4] KVM: SVM: Delay restoration of host MSR_TSC_AUX until return to userspace

Use KVM's "user return MSRs" framework to defer restoring the host's
MSR_TSC_AUX until the CPU returns to userspace. Add/improve comments to
clarify why MSR_TSC_AUX is intercepted on both RDMSR and WRMSR, and why
it's safe for KVM to keep the guest's value loaded even if KVM is
scheduled out.

Cc: Reiji Watanabe <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 52 +++++++++++++++++++-----------------------
arch/x86/kvm/svm/svm.h | 7 ------
2 files changed, 23 insertions(+), 36 deletions(-)
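
For context, the "user return MSRs" pattern this patch adopts boils down to
three pieces: define a slot at hardware setup, load the guest value when
guest state is loaded, and let the common-x86 user-return notifier restore
the host value lazily on the next return to userspace. A condensed sketch
(names are taken from the diff below; the restore side lives in the
user-return notifier in x86.c, kvm_on_user_return(), which is not part of
this patch):

	/* Fixed slot for MSR_TSC_AUX in the user-return MSR array. */
	#define TSC_AUX_URET_SLOT 0

	/* 1) svm_hardware_setup(): register the MSR once, at module init. */
	if (boot_cpu_has(X86_FEATURE_RDTSCP))
		kvm_define_user_return_msr(TSC_AUX_URET_SLOT, MSR_TSC_AUX);

	/*
	 * 2) svm_prepare_guest_switch(): load the guest's value before
	 *    VM-entry; the framework registers a user-return notifier so the
	 *    host value can be restored later.
	 */
	if (static_cpu_has(X86_FEATURE_RDTSCP))
		kvm_set_user_return_msr(TSC_AUX_URET_SLOT, svm->tsc_aux, -1ull);

	/* 3) Host restore is deferred to the user-return notifier in x86.c. */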

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4c7604fca009..a4bd7cb19755 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -213,6 +213,15 @@ struct kvm_ldttss_desc {

DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);

+/*
+ * Only MSR_TSC_AUX is switched via the user return hook. EFER is switched via
+ * the VMCB, and the SYSCALL/SYSENTER MSRs are handled by VMLOAD/VMSAVE.
+ *
+ * RDTSCP and RDPID are not used in the kernel, specifically to allow KVM to
+ * defer the restoration of TSC_AUX until the CPU returns to userspace.
+ */
+#define TSC_AUX_URET_SLOT 0
+
static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};

#define NUM_MSR_MAPS ARRAY_SIZE(msrpm_ranges)
@@ -958,6 +967,9 @@ static __init int svm_hardware_setup(void)
kvm_tsc_scaling_ratio_frac_bits = 32;
}

+ if (boot_cpu_has(X86_FEATURE_RDTSCP))
+ kvm_define_user_return_msr(TSC_AUX_URET_SLOT, MSR_TSC_AUX);
+
/* Check for pause filtering support */
if (!boot_cpu_has(X86_FEATURE_PAUSEFILTER)) {
pause_filter_count = 0;
@@ -1423,19 +1435,10 @@ static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
struct svm_cpu_data *sd = per_cpu(svm_data, vcpu->cpu);
- unsigned int i;

if (svm->guest_state_loaded)
return;

- /*
- * Certain MSRs are restored on VMEXIT (sev-es), or vmload of host save
- * area (non-sev-es). Save ones that aren't so we can restore them
- * individually later.
- */
- for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
- rdmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
-
/*
* Save additional host state that will be restored on VMEXIT (sev-es)
* or subsequent vmload of host save area.
@@ -1454,29 +1457,15 @@ static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
}
}

- /* This assumes that the kernel never uses MSR_TSC_AUX */
if (static_cpu_has(X86_FEATURE_RDTSCP))
- wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
+ kvm_set_user_return_msr(TSC_AUX_URET_SLOT, svm->tsc_aux, -1ull);

svm->guest_state_loaded = true;
}

static void svm_prepare_host_switch(struct kvm_vcpu *vcpu)
{
- struct vcpu_svm *svm = to_svm(vcpu);
- unsigned int i;
-
- if (!svm->guest_state_loaded)
- return;
-
- /*
- * Certain MSRs are restored on VMEXIT (sev-es), or vmload of host save
- * area (non-sev-es). Restore the ones that weren't.
- */
- for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
- wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
-
- svm->guest_state_loaded = false;
+ to_svm(vcpu)->guest_state_loaded = false;
}

static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
@@ -2785,6 +2774,7 @@ static int svm_set_vm_cr(struct kvm_vcpu *vcpu, u64 data)
static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ int r;

u32 ecx = msr->index;
u64 data = msr->data;
@@ -2888,11 +2878,15 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
break;
case MSR_TSC_AUX:
/*
- * This is rare, so we update the MSR here instead of using
- * direct_access_msrs. Doing that would require a rdmsr in
- * svm_vcpu_put.
+ * TSC_AUX is usually changed only during boot and never read
+ * directly. Intercept TSC_AUX instead of exposing it to the
+ * guest via direct_access_msrs, and switch it via user return.
*/
- wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
+ preempt_disable();
+ r = kvm_set_user_return_msr(TSC_AUX_URET_SLOT, data, -1ull);
+ preempt_enable();
+ if (r)
+ return 1;

svm->tsc_aux = data;
break;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d6206196eec1..5d8027e9c1c5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -23,11 +23,6 @@

#define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)

-static const u32 host_save_user_msrs[] = {
- MSR_TSC_AUX,
-};
-#define NR_HOST_SAVE_USER_MSRS ARRAY_SIZE(host_save_user_msrs)
-
#define IOPM_SIZE PAGE_SIZE * 3
#define MSRPM_SIZE PAGE_SIZE * 2

@@ -129,8 +124,6 @@ struct vcpu_svm {

u64 next_rip;

- u64 host_user_msrs[NR_HOST_SAVE_USER_MSRS];
-
u64 spec_ctrl;
/*
* Contains guest-controlled bits of VIRT_SPEC_CTRL, which will be
--
2.31.1.498.g6c1eba8ee3d-goog

2021-04-23 22:37:07

by Sean Christopherson

Subject: [PATCH v3 2/4] KVM: SVM: Clear MSR_TSC_AUX[63:32] on write

Force clear bits 63:32 of MSR_TSC_AUX on write to emulate current AMD
CPUs, which completely ignore the upper 32 bits, including dropping them
on write. Emulating AMD hardware will also allow migrating a vCPU from
AMD hardware to Intel hardware without requiring userspace to manually
clear the upper bits, which are reserved on Intel hardware.

Presumably, MSR_TSC_AUX[63:32] are intended to be reserved on AMD, but
sadly the APM doesn't say _anything_ about those bits in the context of
MSR access. The RDTSCP entry simply states that RCX contains bits 31:0
of the MSR, zero extended. And even worse is that the RDPID description
implies that it can consume all 64 bits of the MSR:

RDPID reads the value of TSC_AUX MSR used by the RDTSCP instruction
into the specified destination register. Normal operand size prefixes
do not apply and the update is either 32 bit or 64 bit based on the
current mode.

Emulate current hardware behavior to give KVM the best odds of playing
nice with whatever the behavior of future AMD CPUs happens to be.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9ed9c7bd7cfd..71d704f8d569 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2904,8 +2904,17 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
* direct_access_msrs. Doing that would require a rdmsr in
* svm_vcpu_put.
*/
- svm->tsc_aux = data;
wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
+
+ /*
+ * Per Intel's SDM, bits 63:32 are reserved, but AMD's APM has
+ * incomplete and conflicting architectural behavior. Current
+ * AMD CPUs completely ignore bits 63:32, i.e. they aren't
+ * reserved and always read as zeros. Emulate AMD CPU behavior
+ * to avoid explosions if the vCPU is migrated from an AMD host
+ * to an Intel host.
+ */
+ svm->tsc_aux = (u32)data;
break;
case MSR_IA32_DEBUGCTLMSR:
if (!boot_cpu_has(X86_FEATURE_LBRV)) {
--
2.31.1.498.g6c1eba8ee3d-goog

2021-04-23 22:37:18

by Sean Christopherson

Subject: [PATCH v3 3/4] KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model

Squish the Intel and AMD emulation of MSR_TSC_AUX together and tie it to
the guest CPU model instead of the host CPU behavior. While not strictly
necessary to avoid guest breakage, emulating cross-vendor "architecture"
will provide consistent behavior for the guest, e.g. WRMSR fault behavior
won't change if the vCPU is migrated to a host with divergent behavior.

Note, this also adds a kvm_cpu_cap_has() check on RDTSCP for VMX. For all
practical purposes, the extra check is a nop as VMX's use of user return
MSRs indirectly performs the same check by checking the result of WRMSR.
Technically RDTSCP support can exist in bare metal but not in the VMCS,
but no known Intel CPUs behave this way. In practice, the only scenario
where adding the kvm_cpu_cap_has() check isn't a nop is a nested
virtualization scenario where L0 hides the VMCS control from L1 (KVM in
this case), and the L1 userspace VMM has decided to advertise RDTSCP
_against_ the recommendations of KVM_GET_SUPPORTED_CPUID.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 22 +---------------------
arch/x86/kvm/vmx/vmx.c | 13 -------------
arch/x86/kvm/x86.c | 34 ++++++++++++++++++++++++++++++++++
3 files changed, 35 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 71d704f8d569..4c7604fca009 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2672,11 +2672,6 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data |= (u64)svm->sysenter_esp_hi << 32;
break;
case MSR_TSC_AUX:
- if (!boot_cpu_has(X86_FEATURE_RDTSCP))
- return 1;
- if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
- return 1;
msr_info->data = svm->tsc_aux;
break;
/*
@@ -2892,13 +2887,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
svm->sysenter_esp_hi = guest_cpuid_is_intel(vcpu) ? (data >> 32) : 0;
break;
case MSR_TSC_AUX:
- if (!boot_cpu_has(X86_FEATURE_RDTSCP))
- return 1;
-
- if (!msr->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
- return 1;
-
/*
* This is rare, so we update the MSR here instead of using
* direct_access_msrs. Doing that would require a rdmsr in
@@ -2906,15 +2894,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
*/
wrmsrl(MSR_TSC_AUX, svm->tsc_aux);

- /*
- * Per Intel's SDM, bits 63:32 are reserved, but AMD's APM has
- * incomplete and conflicting architectural behavior. Current
- * AMD CPUs completely ignore bits 63:32, i.e. they aren't
- * reserved and always read as zeros. Emulate AMD CPU behavior
- * to avoid explosions if the vCPU is migrated from an AMD host
- * to an Intel host.
- */
- svm->tsc_aux = (u32)data;
+ svm->tsc_aux = data;
break;
case MSR_IA32_DEBUGCTLMSR:
if (!boot_cpu_has(X86_FEATURE_LBRV)) {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6501d66167b8..d3fce53d89ac 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1992,11 +1992,6 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
break;
- case MSR_TSC_AUX:
- if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
- return 1;
- goto find_uret_msr;
case MSR_IA32_DEBUGCTLMSR:
msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
break;
@@ -2312,14 +2307,6 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
vmx->pt_desc.guest.addr_a[index / 2] = data;
break;
- case MSR_TSC_AUX:
- if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
- return 1;
- /* Check reserved bit, higher 32 bits should be zero */
- if ((data >> 32) != 0)
- return 1;
- goto find_uret_msr;
case MSR_IA32_PERF_CAPABILITIES:
if (data && !vcpu_to_pmu(vcpu)->version)
return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f0d0b6e043ae..95da9b1cabdb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1610,6 +1610,29 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
* invokes 64-bit SYSENTER.
*/
data = get_canonical(data, vcpu_virt_addr_bits(vcpu));
+ break;
+ case MSR_TSC_AUX:
+ if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
+ return 1;
+
+ if (!host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ return 1;
+
+ /*
+ * Per Intel's SDM, bits 63:32 are reserved, but AMD's APM has
+ * incomplete and conflicting architectural behavior. Current
+ * AMD CPUs completely ignore bits 63:32, i.e. they aren't
+ * reserved and always read as zeros. Enforce Intel's reserved
+ * bits check if and only if the guest CPU is Intel, and clear
+ * the bits in all other cases. This ensures cross-vendor
+ * migration will provide consistent behavior for the guest.
+ */
+ if (guest_cpuid_is_intel(vcpu) && (data >> 32) != 0)
+ return 1;
+
+ data = (u32)data;
+ break;
}

msr.data = data;
@@ -1646,6 +1669,17 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
if (!host_initiated && !kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
return KVM_MSR_RET_FILTERED;

+ switch (index) {
+ case MSR_TSC_AUX:
+ if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
+ return 1;
+
+ if (!host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ return 1;
+ break;
+ }
+
msr.index = index;
msr.host_initiated = host_initiated;

--
2.31.1.498.g6c1eba8ee3d-goog

2021-04-24 07:23:16

by Reiji Watanabe

Subject: Re: [PATCH v3 3/4] KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model

> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1610,6 +1610,29 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
> * invokes 64-bit SYSENTER.
> */
> data = get_canonical(data, vcpu_virt_addr_bits(vcpu));
> + break;
> + case MSR_TSC_AUX:
> + if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
> + return 1;
> +
> + if (!host_initiated &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> + return 1;
> +
> + /*
> + * Per Intel's SDM, bits 63:32 are reserved, but AMD's APM has
> + * incomplete and conflicting architectural behavior. Current
> + * AMD CPUs completely ignore bits 63:32, i.e. they aren't
> + * reserved and always read as zeros. Enforce Intel's reserved
> + * bits check if and only if the guest CPU is Intel, and clear
> + * the bits in all other cases. This ensures cross-vendor
> + * migration will provide consistent behavior for the guest.
> + */
> + if (guest_cpuid_is_intel(vcpu) && (data >> 32) != 0)
> + return 1;
> +
> + data = (u32)data;
> + break;
> }
>
> msr.data = data;
> @@ -1646,6 +1669,17 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
> if (!host_initiated && !kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
> return KVM_MSR_RET_FILTERED;
>
> + switch (index) {
> + case MSR_TSC_AUX:
> + if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
> + return 1;
> +
> + if (!host_initiated &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> + return 1;


It looks Table 2-2 of the Intel SDM Vol4 (April 2021) says
TSC_AUX is supported:

If CPUID.80000001H:EDX[27] = 1 or CPUID.(EAX=7,ECX=0):ECX[22] = 1

Should we also check X86_FEATURE_RDPID before returning 1
due to no RDTSCP support ?
There doesn't seem to be a similar description in the APM though.
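
In code, the relaxed check would presumably look something like this
(untested sketch against the new common-x86 checks added by this patch,
accepting RDPID as an alternative to RDTSCP):

	if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP) &&
	    !kvm_cpu_cap_has(X86_FEATURE_RDPID))
		return 1;

	if (!host_initiated &&
	    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
	    !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
		return 1;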

Thanks,
Reiji

2021-04-26 08:50:36

by Vitaly Kuznetsov

Subject: Re: [PATCH v3 1/4] KVM: SVM: Inject #GP on guest MSR_TSC_AUX accesses if RDTSCP unsupported

Sean Christopherson <[email protected]> writes:

> Inject #GP on guest accesses to MSR_TSC_AUX if RDTSCP is unsupported in
> the guest's CPUID model.
>
> Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index cd8c333ed2dc..9ed9c7bd7cfd 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2674,6 +2674,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_TSC_AUX:
> if (!boot_cpu_has(X86_FEATURE_RDTSCP))
> return 1;
> + if (!msr_info->host_initiated &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> + return 1;

Super-nit: a blank like before and after the condition would make this
consistent with the hunk you add to svm_set_msr() below.

> msr_info->data = svm->tsc_aux;
> break;
> /*
> @@ -2892,6 +2895,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> if (!boot_cpu_has(X86_FEATURE_RDTSCP))
> return 1;
>
> + if (!msr->host_initiated &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> + return 1;
> +
> /*
> * This is rare, so we update the MSR here instead of using
> * direct_access_msrs. Doing that would require a rdmsr in

Reviewed-by: Vitaly Kuznetsov <[email protected]>

--
Vitaly

2021-04-26 08:59:13

by Vitaly Kuznetsov

Subject: Re: [PATCH v3 2/4] KVM: SVM: Clear MSR_TSC_AUX[63:32] on write

Sean Christopherson <[email protected]> writes:

> Force clear bits 63:32 of MSR_TSC_AUX on write to emulate current AMD
> CPUs, which completely ignore the upper 32 bits, including dropping them
> on write. Emulating AMD hardware will also allow migrating a vCPU from
> AMD hardware to Intel hardware without requiring userspace to manually
> clear the upper bits, which are reserved on Intel hardware.
>
> Presumably, MSR_TSC_AUX[63:32] are intended to be reserved on AMD, but
> sadly the APM doesn't say _anything_ about those bits in the context of
> MSR access. The RDTSCP entry simply states that RCX contains bits 31:0
> of the MSR, zero extended. And even worse is that the RDPID description
> implies that it can consume all 64 bits of the MSR:
>
> RDPID reads the value of TSC_AUX MSR used by the RDTSCP instruction
> into the specified destination register. Normal operand size prefixes
> do not apply and the update is either 32 bit or 64 bit based on the
> current mode.
>
> Emulate current hardware behavior to give KVM the best odds of playing
> nice with whatever the behavior of future AMD CPUs happens to be.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 9ed9c7bd7cfd..71d704f8d569 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2904,8 +2904,17 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> * direct_access_msrs. Doing that would require a rdmsr in
> * svm_vcpu_put.
> */
> - svm->tsc_aux = data;
> wrmsrl(MSR_TSC_AUX, svm->tsc_aux);

Hm, shouldn't this be

wrmsrl(MSR_TSC_AUX, data) or wrmsrl(MSR_TSC_AUX, (u32)data)

then? As svm->tsc_aux wasn't updated yet.

> +
> + /*
> + * Per Intel's SDM, bits 63:32 are reserved, but AMD's APM has
> + * incomplete and conflicting architectural behavior. Current
> + * AMD CPUs completely ignore bits 63:32, i.e. they aren't
> + * reserved and always read as zeros. Emulate AMD CPU behavior
> + * to avoid explosions if the vCPU is migrated from an AMD host
> + * to an Intel host.
> + */
> + svm->tsc_aux = (u32)data;

Actually, shouldn't we just move wrmsrl() here? Assuming we're not sure
how (and if) upper 32 bits are going to be used, it would probably make
sense to not write them to the actual MSR...
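
i.e., presumably something along the lines of (sketch only, truncate first
and then write the hardware MSR):

	svm->tsc_aux = (u32)data;
	wrmsrl(MSR_TSC_AUX, svm->tsc_aux);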

> break;
> case MSR_IA32_DEBUGCTLMSR:
> if (!boot_cpu_has(X86_FEATURE_LBRV)) {

--
Vitaly

2021-04-26 11:11:40

by Paolo Bonzini

Subject: Re: [PATCH v3 0/4] KVM: x86: MSR_TSC_AUX fixes and improvements

On 24/04/21 00:34, Sean Christopherson wrote:
> Fix potential cross-vendor landmines due to Intel and AMD having different
> behavior for MSR_TSC_AUX[63:32], unify the logic across SVM and VMX, and
> switch MSR_TSC_AUX via user return MSRs on SVM (the original goal).
>
> v3:
> - Fix a comment typo. [Reiji]
> - Add patches to add missing guest_cpuid_has() check, drop bits 63:32 on
> AMD, and unify VMX and SVM emulation.
> - Rebase to kvm/next, commit c4f71901d53b ("Merge tag 'kvmarm-5.13' ...")
>
> v2:
> - Rebase to kvm/queue (ish), commit 0e91d1992235 ("KVM: SVM: Allocate SEV
> command structures on local stack")
> - https://lkml.kernel.org/r/[email protected]
>
> v1: https://lkml.kernel.org/r/[email protected]
>
>
> Sean Christopherson (4):
> KVM: SVM: Inject #GP on guest MSR_TSC_AUX accesses if RDTSCP
> unsupported
> KVM: SVM: Clear MSR_TSC_AUX[63:32] on write
> KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU
> model
> KVM: SVM: Delay restoration of host MSR_TSC_AUX until return to
> userspace
>
> arch/x86/kvm/svm/svm.c | 58 +++++++++++++++++-------------------------
> arch/x86/kvm/svm/svm.h | 7 -----
> arch/x86/kvm/vmx/vmx.c | 13 ----------
> arch/x86/kvm/x86.c | 34 +++++++++++++++++++++++++
> 4 files changed, 58 insertions(+), 54 deletions(-)
>

Queued 1-2-4 (with fix for patch 2).

Paolo

2021-04-26 19:39:50

by Sean Christopherson

Subject: Re: [PATCH v3 2/4] KVM: SVM: Clear MSR_TSC_AUX[63:32] on write

On Mon, Apr 26, 2021, Vitaly Kuznetsov wrote:
> Sean Christopherson <[email protected]> writes:
>
> > Force clear bits 63:32 of MSR_TSC_AUX on write to emulate current AMD
> > CPUs, which completely ignore the upper 32 bits, including dropping them
> > on write. Emulating AMD hardware will also allow migrating a vCPU from
> > AMD hardware to Intel hardware without requiring userspace to manually
> > clear the upper bits, which are reserved on Intel hardware.
> >
> > Presumably, MSR_TSC_AUX[63:32] are intended to be reserved on AMD, but
> > sadly the APM doesn't say _anything_ about those bits in the context of
> > MSR access. The RDTSCP entry simply states that RCX contains bits 31:0
> > of the MSR, zero extended. And even worse is that the RDPID description
> > implies that it can consume all 64 bits of the MSR:
> >
> > RDPID reads the value of TSC_AUX MSR used by the RDTSCP instruction
> > into the specified destination register. Normal operand size prefixes
> > do not apply and the update is either 32 bit or 64 bit based on the
> > current mode.
> >
> > Emulate current hardware behavior to give KVM the best odds of playing
> > nice with whatever the behavior of future AMD CPUs happens to be.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/kvm/svm/svm.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 9ed9c7bd7cfd..71d704f8d569 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -2904,8 +2904,17 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > * direct_access_msrs. Doing that would require a rdmsr in
> > * svm_vcpu_put.
> > */
> > - svm->tsc_aux = data;
> > wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>
> Hm, shouldn't this be
>
> wrmsrl(MSR_TSC_AUX, data) or wrmsrl(MSR_TSC_AUX, (u32)data)
>
> then? As svm->tsc_aux wasn't updated yet.
>
> > +
> > + /*
> > + * Per Intel's SDM, bits 63:32 are reserved, but AMD's APM has
> > + * incomplete and conflicting architectural behavior. Current
> > + * AMD CPUs completely ignore bits 63:32, i.e. they aren't
> > + * reserved and always read as zeros. Emulate AMD CPU behavior
> > + * to avoid explosions if the vCPU is migrated from an AMD host
> > + * to an Intel host.
> > + */
> > + svm->tsc_aux = (u32)data;
>
> Actually, shouldn't we just move wrmsrl() here? Assuming we're not sure
> how (and if) upper 32 bits are going to be used, it would probably make
> sense to not write them to the actual MSR...

Argh. I got too clever in trying to minimize the patch deltas and "broke" this
intermediate patch. Once the user return framework is used, the result of
setting the MSR is checked before setting svm->tsc_aux, i.e. don't set the
virtual state if the real state could not be set.

I'm very tempted to add yet another patch to this mess to do:

if (wrmsrl_safe(MSR_TSC_AUX, data))
return 1;

svm->tsc_aux = data;

And then this patch becomes:

data = (u32)data;

if (wrmsrl_safe(MSR_TSC_AUX, data))
return 1;

svm->tsc_aux = data;

The above will also make patch 3 cleaner as it will preserve the ordering of
truncating data and the wrmsr.

> > break;
> > case MSR_IA32_DEBUGCTLMSR:
> > if (!boot_cpu_has(X86_FEATURE_LBRV)) {
>
> --
> Vitaly
>

2021-04-26 19:42:26

by Sean Christopherson

Subject: Re: [PATCH v3 3/4] KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model

On Sat, Apr 24, 2021, Reiji Watanabe wrote:
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1610,6 +1610,29 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
> > * invokes 64-bit SYSENTER.
> > */
> > data = get_canonical(data, vcpu_virt_addr_bits(vcpu));
> > + break;
> > + case MSR_TSC_AUX:
> > + if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
> > + return 1;
> > +
> > + if (!host_initiated &&
> > + !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> > + return 1;
> > +
> > + /*
> > + * Per Intel's SDM, bits 63:32 are reserved, but AMD's APM has
> > + * incomplete and conflicting architectural behavior. Current
> > + * AMD CPUs completely ignore bits 63:32, i.e. they aren't
> > + * reserved and always read as zeros. Enforce Intel's reserved
> > + * bits check if and only if the guest CPU is Intel, and clear
> > + * the bits in all other cases. This ensures cross-vendor
> > + * migration will provide consistent behavior for the guest.
> > + */
> > + if (guest_cpuid_is_intel(vcpu) && (data >> 32) != 0)
> > + return 1;
> > +
> > + data = (u32)data;
> > + break;
> > }
> >
> > msr.data = data;
> > @@ -1646,6 +1669,17 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
> > if (!host_initiated && !kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
> > return KVM_MSR_RET_FILTERED;
> >
> > + switch (index) {
> > + case MSR_TSC_AUX:
> > + if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
> > + return 1;
> > +
> > + if (!host_initiated &&
> > + !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> > + return 1;
>
>
> It looks Table 2-2 of the Intel SDM Vol4 (April 2021) says
> TSC_AUX is supported:
>
> If CPUID.80000001H:EDX[27] = 1 or CPUID.(EAX=7,ECX=0):ECX[22] = 1
>
> Should we also check X86_FEATURE_RDPID before returning 1
> due to no RDTSCP support ?

Yep. VMX should also clear RDPID if the ENABLE_RDTSCP control isn't supported.
That bug isn't fatal because KVM emulates RDPID on #UD, but it would be a
noticeable performance hit for the guest.
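
A rough sketch of that VMX-side change (untested; it assumes
vmx_set_cpu_caps(), cpu_has_vmx_rdtscp() and kvm_cpu_cap_clear() are the
relevant helpers):

	if (!cpu_has_vmx_rdtscp()) {
		/* Hide RDPID as well when the ENABLE_RDTSCP control is missing. */
		kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);
		kvm_cpu_cap_clear(X86_FEATURE_RDPID);
	}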

There is also a kernel bug lurking; vgetcpu_cpu_init() doesn't check
X86_FEATURE_RDPID and will fail to initialize MSR_TSC_AUX if RDPID is supported
but RDTSCP is not, and __getcpu() uses RDPID. I'll verify that's broken and
send a patch for that one too.

> There doesn't seem to be a similar description in the APM though.

AMD also documents this in Appendix E:

CPUID Fn0000_0007_EBX_x0 Structured Extended Feature Identifiers (ECX=0)
Bits Field Name
...
22 RDPID RDPID instruction and TSC_AUX MSR support.


2021-04-26 19:45:05

by Sean Christopherson

Subject: Re: [PATCH v3 2/4] KVM: SVM: Clear MSR_TSC_AUX[63:32] on write

On Mon, Apr 26, 2021, Sean Christopherson wrote:
> On Mon, Apr 26, 2021, Vitaly Kuznetsov wrote:
> > Actually, shouldn't we just move wrmsrl() here? Assuming we're not sure
> > how (and if) upper 32 bits are going to be used, it would probably make
> > sense to not write them to the actual MSR...
>
> Argh. I got too clever in trying to minimize the patch deltas and "broke" this
> intermediate patch. Once the user return framework is used, the result of
> setting the MSR is checked before setting svm->tsc_aux, i.e. don't set the
> virtual state if the real state could not be set.
>
> I'm very tempted to add yet another patch to this mess to do:
>
> if (wrmsrl_safe(MSR_TSC_AUX, data))
> return 1;
>
> svm->tsc_aux = data;
>
> And then this patch becomes:
>
> data = (u32)data;
>
> if (wrmsrl_safe(MSR_TSC_AUX, data))
> return 1;
>
> svm->tsc_aux = data;
>
> The above will also make patch 3 cleaner as it will preserve the ordering of
> truncating data and the wrmsr.

Ah, never mind, Paolo already pushed this to kvm/next with your above fix.

2021-04-27 04:44:30

by Reiji Watanabe

Subject: Re: [PATCH v3 3/4] KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model

> > It looks Table 2-2 of the Intel SDM Vol4 (April 2021) says
> > TSC_AUX is supported:
> >
> > If CPUID.80000001H:EDX[27] = 1 or CPUID.(EAX=7,ECX=0):ECX[22] = 1
> >
> > Should we also check X86_FEATURE_RDPID before returning 1
> > due to no RDTSCP support ?
>
> Yep. VMX should also clear RDPID if the ENABLE_RDTSCP control isn't supported.
> That bug isn't fatal because KVM emulates RDPID on #UD, but it would be a
> noticeable performance hit for the guest.

Thank you so much for the confirmation and the information.
Understood.


> There is also a kernel bug lurking; vgetcpu_cpu_init() doesn't check
> X86_FEATURE_RDPID and will fail to initialize MSR_TSC_AUX if RDPID is supported
> but RDTSCP is not, and __getcpu() uses RDPID. I'll verify that's broken and
> send a patch for that one too.

I don't find vgetcpu_cpu_init() or __getcpu() in
https://github.com/torvalds/linux.
I would assume you meant setup_getcpu() and vdso_read_cpunode() instead (?).


> AMD also documents this in Appendix E:
>
> CPUID Fn0000_0007_EBX_x0 Structured Extended Feature Identifiers (ECX=0)
> Bits Field Name
> ...
> 22 RDPID RDPID instruction and TSC_AUX MSR support.

Thank you. I overlooked that...


Regards,
Reiji

2021-04-27 22:01:10

by Sean Christopherson

Subject: Re: [PATCH v3 3/4] KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model

On Mon, Apr 26, 2021, Reiji Watanabe wrote:

> > There is also a kernel bug lurking; vgetcpu_cpu_init() doesn't check
> > X86_FEATURE_RDPID and will fail to initialize MSR_TSC_AUX if RDPID is supported
> > but RDTSCP is not, and __getcpu() uses RDPID. I'll verify that's broken and
> > send a patch for that one too.
>
> I don't find vgetcpu_cpu_init() or __getcpu() in
> https://github.com/torvalds/linux.
> I would assume you meant setup_getcpu() and vdso_read_cpunode() instead (?).

Ya, I was looking at an old kernel when I typed that up. Bug is still there
though :-)