2021-05-04 17:19:16

by Sean Christopherson

Subject: [PATCH 00/15] KVM: x86: RDPID/RDTSCP fixes and uret MSR cleanups

This is a continuation of a less ambitious effort to unify MSR_TSC_AUX
handling across SVM and VMX. Reiji pointed out that MSR_TSC_AUX exists if
RDTSCP *or* RDPID is supported, and things went downhill from there.

The first half of this series fixes a variety of RDTSCP and RDPID related
bugs.

The second half of the series cleans up VMX's user return MSR framework
and consolidates more of the uret logic into common x86.

The last two patches leverage the uret MSR cleanups to move MSR_TSC_AUX
handling to common x86 and add sanity checks to guard against misreporting
of RDPID and/or RDTSCP support.

This will conflict with my vCPU RESET/INIT cleanup series. Feel free to
punt the conflicts to me.

Other "fun" things to tackle:

- The kernel proper also botches RDPID vs. RDTSCP, as MSR_TSC_AUX is
configured if RDTSCP is supported, but is consumed if RDPID is
supported. I'll send this fix separately.

- Commit 844d69c26d83 ("KVM: SVM: Delay restoration of host MSR_TSC_AUX
until return to userspace") unwittingly fixed a bug where KVM would
write MSR_TSC_AUX with the guest's value when svm->guest_state_loaded
is false, which could lead to running the host with the guest's value.
The bug only exists in 5.12 (maybe 5.11 too?), so crafting a fix for
stable won't be too awful.

Sean Christopherson (15):
KVM: VMX: Do not adverise RDPID if ENABLE_RDTSCP control is
unsupported
KVM: x86: Emulate RDPID only if RDTSCP is supported
KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest
KVM: x86: Move RDPID emulation intercept to its own enum
KVM: VMX: Disable preemption when probing user return MSRs
KVM: SVM: Probe and load MSR_TSC_AUX regardless of RDTSCP support in
host
KVM: x86: Add support for RDPID without RDTSCP
KVM: VMX: Configure list of user return MSRs at module init
KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting
list
KVM: VMX: Use common x86's uret MSR list as the one true list
KVM: VMX: Disable loading of TSX_CTRL MSR the more conventional way
KVM: x86: Export the number of uret MSRs to vendor modules
KVM: x86: Move uret MSR slot management to common x86
KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU
model
KVM: x86: Hide RDTSCP and RDPID if MSR_TSC_AUX probing failed

arch/x86/include/asm/kvm_host.h | 9 +-
arch/x86/kvm/cpuid.c | 18 ++-
arch/x86/kvm/emulate.c | 2 +-
arch/x86/kvm/kvm_emulate.h | 1 +
arch/x86/kvm/svm/svm.c | 50 +++-----
arch/x86/kvm/vmx/vmx.c | 217 ++++++++++++++++----------------
arch/x86/kvm/vmx/vmx.h | 12 +-
arch/x86/kvm/x86.c | 101 ++++++++++++---
8 files changed, 245 insertions(+), 165 deletions(-)

--
2.31.1.527.g47e6f16901-goog
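
Background for the second half of the series: a "uret" (user return) MSR is one that KVM switches to the guest's value before VM-Enter, deferring restoration of the host's value until the CPU returns to userspace rather than paying a WRMSR on every VM-Exit. A condensed sketch of the mechanism, simplified from kvm_set_user_return_msr() and kvm_on_user_return() as they look by the end of this series (per-CPU plumbing abbreviated, error paths trimmed):

    int kvm_set_user_return_msr(unsigned int slot, u64 value, u64 mask)
    {
            struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);

            /* Fold the guest value into the host value per the caller's mask. */
            value = (value & mask) | (msrs->values[slot].host & ~mask);
            if (value == msrs->values[slot].curr)
                    return 0;
            if (wrmsrl_safe(kvm_uret_msrs_list[slot], value))
                    return 1;

            msrs->values[slot].curr = value;
            if (!msrs->registered) {
                    /* Restore host values when this CPU returns to userspace. */
                    msrs->urn.on_user_return = kvm_on_user_return;
                    user_return_notifier_register(&msrs->urn);
                    msrs->registered = true;
            }
            return 0;
    }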


2021-05-04 17:20:06

by Sean Christopherson

Subject: [PATCH 06/15] KVM: SVM: Probe and load MSR_TSC_AUX regardless of RDTSCP support in host

Probe MSR_TSC_AUX whether or not RDTSCP is supported in the host, and
if probing succeeds, load the guest's MSR_TSC_AUX into hardware prior to
VMRUN. Because SVM doesn't support interception of RDPID, RDPID cannot
be disallowed in the guest (without resorting to binary translation).
Leaving the host's MSR_TSC_AUX in hardware would leak the host's value to
the guest if RDTSCP is not supported.

Note, there is also a kernel bug that prevents leaking the host's value.
The host kernel initializes MSR_TSC_AUX if and only if RDTSCP is
supported, even though the vDSO usage consumes MSR_TSC_AUX via RDPID.
I.e. if RDTSCP is not supported, there is no host value to leak. But,
if/when the host kernel bug is fixed, KVM would start leaking MSR_TSC_AUX
in the case where hardware supports RDPID but RDTSCP is unavailable for
whatever reason.

Probing MSR_TSC_AUX will also allow consolidating the probe and define
logic in common x86, and will make it simpler to condition the existence
of MSR_TSC_AUX (from the guest's perspective) on RDTSCP *or* RDPID.

Fixes: AMD CPUs
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f2b184270c0..b3153d40cc4d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -212,7 +212,7 @@ DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
* RDTSCP and RDPID are not used in the kernel, specifically to allow KVM to
* defer the restoration of TSC_AUX until the CPU returns to userspace.
*/
-#define TSC_AUX_URET_SLOT 0
+static int tsc_aux_uret_slot __read_mostly = -1;

static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};

@@ -959,8 +959,10 @@ static __init int svm_hardware_setup(void)
kvm_tsc_scaling_ratio_frac_bits = 32;
}

- if (boot_cpu_has(X86_FEATURE_RDTSCP))
- kvm_define_user_return_msr(TSC_AUX_URET_SLOT, MSR_TSC_AUX);
+ if (!kvm_probe_user_return_msr(MSR_TSC_AUX)) {
+ tsc_aux_uret_slot = 0;
+ kvm_define_user_return_msr(tsc_aux_uret_slot, MSR_TSC_AUX);
+ }

/* Check for pause filtering support */
if (!boot_cpu_has(X86_FEATURE_PAUSEFILTER)) {
@@ -1454,8 +1456,8 @@ static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
}
}

- if (static_cpu_has(X86_FEATURE_RDTSCP))
- kvm_set_user_return_msr(TSC_AUX_URET_SLOT, svm->tsc_aux, -1ull);
+ if (likely(tsc_aux_uret_slot >= 0))
+ kvm_set_user_return_msr(tsc_aux_uret_slot, svm->tsc_aux, -1ull);

svm->guest_state_loaded = true;
}
@@ -2664,7 +2666,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data |= (u64)svm->sysenter_esp_hi << 32;
break;
case MSR_TSC_AUX:
- if (!boot_cpu_has(X86_FEATURE_RDTSCP))
+ if (tsc_aux_uret_slot < 0)
return 1;
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
@@ -2885,7 +2887,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
svm->sysenter_esp_hi = guest_cpuid_is_intel(vcpu) ? (data >> 32) : 0;
break;
case MSR_TSC_AUX:
- if (!boot_cpu_has(X86_FEATURE_RDTSCP))
+ if (tsc_aux_uret_slot < 0)
return 1;

if (!msr->host_initiated &&
@@ -2908,7 +2910,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
* guest via direct_access_msrs, and switch it via user return.
*/
preempt_disable();
- r = kvm_set_user_return_msr(TSC_AUX_URET_SLOT, data, -1ull);
+ r = kvm_set_user_return_msr(tsc_aux_uret_slot, data, -1ull);
preempt_enable();
if (r)
return 1;
--
2.31.1.527.g47e6f16901-goog

2021-05-04 17:20:10

by Sean Christopherson

Subject: [PATCH 07/15] KVM: x86: Add support for RDPID without RDTSCP

Allow userspace to enable RDPID for a guest without also enabling RDTSCP.
Aside from checking for RDPID support in the obvious flows, VMX also needs
to set ENABLE_RDTSCP=1 when RDPID is exposed.

For the record, there is no known scenario where enabling RDPID without
RDTSCP is desirable. But, both AMD and Intel architectures allow for the
condition, i.e. this is purely to make KVM more architecturally accurate.

Fixes: 41cd02c6f7f6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
Cc: [email protected]
Reported-by: Reiji Watanabe <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 6 ++++--
arch/x86/kvm/vmx/vmx.c | 27 +++++++++++++++++++++++----
arch/x86/kvm/x86.c | 3 ++-
3 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b3153d40cc4d..231b9650d864 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2669,7 +2669,8 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (tsc_aux_uret_slot < 0)
return 1;
if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
return 1;
msr_info->data = svm->tsc_aux;
break;
@@ -2891,7 +2892,8 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
return 1;

if (!msr->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
return 1;

/*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 990ee339a05f..42e4bbaa299a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1788,7 +1788,8 @@ static void setup_msrs(struct vcpu_vmx *vmx)
if (update_transition_efer(vmx))
vmx_setup_uret_msr(vmx, MSR_EFER);

- if (guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP))
+ if (guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
+ guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDPID))
vmx_setup_uret_msr(vmx, MSR_TSC_AUX);

vmx_setup_uret_msr(vmx, MSR_IA32_TSX_CTRL);
@@ -1994,7 +1995,8 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
case MSR_TSC_AUX:
if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
return 1;
goto find_uret_msr;
case MSR_IA32_DEBUGCTLMSR:
@@ -2314,7 +2316,8 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
case MSR_TSC_AUX:
if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
return 1;
/* Check reserved bit, higher 32 bits should be zero */
if ((data >> 32) != 0)
@@ -4368,7 +4371,23 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
xsaves_enabled, false);
}

- vmx_adjust_sec_exec_feature(vmx, &exec_control, rdtscp, RDTSCP);
+ /*
+ * RDPID is also gated by ENABLE_RDTSCP, turn on the control if either
+ * feature is exposed to the guest. This creates a virtualization hole
+ * if both are supported in hardware but only one is exposed to the
+ * guest, but letting the guest execute RDTSCP or RDPID when either one
+ * is advertised is preferable to emulating the advertised instruction
+ * in KVM on #UD, and obviously better than incorrectly injecting #UD.
+ */
+ if (cpu_has_vmx_rdtscp()) {
+ bool rdpid_or_rdtscp_enabled =
+ guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) ||
+ guest_cpuid_has(vcpu, X86_FEATURE_RDPID);
+
+ vmx_adjust_secondary_exec_control(vmx, &exec_control,
+ SECONDARY_EXEC_ENABLE_RDTSCP,
+ rdpid_or_rdtscp_enabled, false);
+ }
vmx_adjust_sec_exec_feature(vmx, &exec_control, invpcid, INVPCID);

vmx_adjust_sec_exec_exiting(vmx, &exec_control, rdrand, RDRAND);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e304447be42d..b4516d303413 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5978,7 +5978,8 @@ static void kvm_init_msr_list(void)
continue;
break;
case MSR_TSC_AUX:
- if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
+ if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP) &&
+ !kvm_cpu_cap_has(X86_FEATURE_RDPID))
continue;
break;
case MSR_IA32_UMWAIT_CONTROL:
--
2.31.1.527.g47e6f16901-goog
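
To spell out the virtualization hole described in the vmx.c comment above: ENABLE_RDTSCP gates both instructions in VMX non-root, so when hardware supports both features the possible outcomes are (summary of the patch's own comment, not new behavior):

    /*
     * guest CPUID             ENABLE_RDTSCP    result
     * RDTSCP=1, RDPID=1       1                both execute, as advertised
     * RDTSCP=1, RDPID=0       1                RDPID also executes (hole)
     * RDTSCP=0, RDPID=1       1                RDTSCP also executes (hole)
     * RDTSCP=0, RDPID=0       0                both #UD, as advertised
     */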

2021-05-04 17:20:31

by Sean Christopherson

Subject: [PATCH 01/15] KVM: VMX: Do not adverise RDPID if ENABLE_RDTSCP control is unsupported

Clear KVM's RDPID capability if the ENABLE_RDTSCP secondary exec control is
unsupported. Despite being enumerated in a separate CPUID flag, RDPID is
bundled under the same VMCS control as RDTSCP and will #UD in VMX non-root
if ENABLE_RDTSCP is not enabled.

Fixes: 41cd02c6f7f6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 10b610fc7bbc..82404ee2520e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7377,9 +7377,11 @@ static __init void vmx_set_cpu_caps(void)
if (!cpu_has_vmx_xsaves())
kvm_cpu_cap_clear(X86_FEATURE_XSAVES);

- /* CPUID 0x80000001 */
- if (!cpu_has_vmx_rdtscp())
+ /* CPUID 0x80000001 and 0x7 (RDPID) */
+ if (!cpu_has_vmx_rdtscp()) {
kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);
+ kvm_cpu_cap_clear(X86_FEATURE_RDPID);
+ }

if (cpu_has_vmx_waitpkg())
kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
--
2.31.1.527.g47e6f16901-goog
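
For background, RDPID does nothing more than read IA32_TSC_AUX into a register; the kernel stashes the CPU/node ID there for the vDSO's getcpu(). A minimal user-space probe, purely illustrative and not part of the series (assumes an assembler that knows the rdpid mnemonic):

    #include <cpuid.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            unsigned int eax, ebx, ecx, edx;
            uint64_t pid;

            /* RDPID support is enumerated by CPUID.(EAX=7,ECX=0):ECX[22]. */
            if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) ||
                !(ecx & (1u << 22))) {
                    puts("RDPID not supported");
                    return 1;
            }

            asm volatile("rdpid %0" : "=r"(pid));
            printf("IA32_TSC_AUX = %llu\n", (unsigned long long)pid);
            return 0;
    }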

2021-05-04 17:20:33

by Sean Christopherson

Subject: [PATCH 14/15] KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model

Squish the Intel and AMD emulation of MSR_TSC_AUX together and tie it to
the guest CPU model instead of the host CPU behavior. While not strictly
necessary to avoid guest breakage, emulating cross-vendor "architecture"
will provide consistent behavior for the guest, e.g. WRMSR fault behavior
won't change if the vCPU is migrated to a host with divergent behavior.

Note, the "new" kvm_is_supported_user_return_msr() checks do not add new
functionality on either SVM or VMX. On SVM, the equivalent was
"tsc_aux_uret_slot < 0", and on VMX the check was buried in the
vmx_find_uret_msr() call at the find_uret_msr label.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 5 +++++
arch/x86/kvm/svm/svm.c | 24 ----------------------
arch/x86/kvm/vmx/vmx.c | 15 --------------
arch/x86/kvm/x86.c | 36 +++++++++++++++++++++++++++++++++
4 files changed, 41 insertions(+), 39 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a4b912f7e427..00fb9efb9984 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1782,6 +1782,11 @@ int kvm_add_user_return_msr(u32 msr);
int kvm_find_user_return_msr(u32 msr);
int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);

+static inline bool kvm_is_supported_user_return_msr(u32 msr)
+{
+ return kvm_find_user_return_msr(msr) >= 0;
+}
+
u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index de921935e8de..6c7c6a303cc5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2663,12 +2663,6 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data |= (u64)svm->sysenter_esp_hi << 32;
break;
case MSR_TSC_AUX:
- if (tsc_aux_uret_slot < 0)
- return 1;
- if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
- return 1;
msr_info->data = svm->tsc_aux;
break;
/*
@@ -2885,24 +2879,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
svm->sysenter_esp_hi = guest_cpuid_is_intel(vcpu) ? (data >> 32) : 0;
break;
case MSR_TSC_AUX:
- if (tsc_aux_uret_slot < 0)
- return 1;
-
- if (!msr->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
- return 1;
-
- /*
- * Per Intel's SDM, bits 63:32 are reserved, but AMD's APM has
- * incomplete and conflicting architectural behavior. Current
- * AMD CPUs completely ignore bits 63:32, i.e. they aren't
- * reserved and always read as zeros. Emulate AMD CPU behavior
- * to avoid explosions if the vCPU is migrated from an AMD host
- * to an Intel host.
- */
- data = (u32)data;
-
/*
* TSC_AUX is usually changed only during boot and never read
* directly. Intercept TSC_AUX instead of exposing it to the
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 26f82f302391..d85ac5876982 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1981,12 +1981,6 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
break;
- case MSR_TSC_AUX:
- if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
- return 1;
- goto find_uret_msr;
case MSR_IA32_DEBUGCTLMSR:
msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
break;
@@ -2302,15 +2296,6 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
vmx->pt_desc.guest.addr_a[index / 2] = data;
break;
- case MSR_TSC_AUX:
- if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
- return 1;
- /* Check reserved bit, higher 32 bits should be zero */
- if ((data >> 32) != 0)
- return 1;
- goto find_uret_msr;
case MSR_IA32_PERF_CAPABILITIES:
if (data && !vcpu_to_pmu(vcpu)->version)
return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index adca491d3b4b..896127ea4d4f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1642,6 +1642,30 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
* invokes 64-bit SYSENTER.
*/
data = get_canonical(data, vcpu_virt_addr_bits(vcpu));
+ break;
+ case MSR_TSC_AUX:
+ if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
+ return 1;
+
+ if (!host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
+ return 1;
+
+ /*
+ * Per Intel's SDM, bits 63:32 are reserved, but AMD's APM has
+ * incomplete and conflicting architectural behavior. Current
+ * AMD CPUs completely ignore bits 63:32, i.e. they aren't
+ * reserved and always read as zeros. Enforce Intel's reserved
+ * bits check if and only if the guest CPU is Intel, and clear
+ * the bits in all other cases. This ensures cross-vendor
+ * migration will provide consistent behavior for the guest.
+ */
+ if (guest_cpuid_is_intel(vcpu) && (data >> 32) != 0)
+ return 1;
+
+ data = (u32)data;
+ break;
}

msr.data = data;
@@ -1678,6 +1702,18 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
if (!host_initiated && !kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
return KVM_MSR_RET_FILTERED;

+ switch (index) {
+ case MSR_TSC_AUX:
+ if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
+ return 1;
+
+ if (!host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
+ return 1;
+ break;
+ }
+
msr.index = index;
msr.host_initiated = host_initiated;

--
2.31.1.527.g47e6f16901-goog
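
A concrete example of the post-patch semantics, taken directly from the __kvm_set_msr() hunk above: for a guest WRMSR(MSR_TSC_AUX) with data = 0x100000001,

    if (guest_cpuid_is_intel(vcpu) && (data >> 32) != 0)
            return 1;       /* Intel guest CPU model: #GP, bits 63:32 are reserved. */

    data = (u32)data;       /* AMD guest CPU model: bits 63:32 dropped, TSC_AUX = 1. */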

2021-05-04 17:20:35

by Sean Christopherson

Subject: [PATCH 02/15] KVM: x86: Emulate RDPID only if RDTSCP is supported

Do not advertise emulation support for RDPID if RDTSCP is unsupported.
RDPID emulation subtly relies on MSR_TSC_AUX to exist in hardware, as
both vmx_get_msr() and svm_get_msr() will return an error if the MSR is
unsupported, i.e. ctxt->ops->get_msr() will fail and the emulator will
inject a #UD.

Note, RDPID emulation also relies on RDTSCP being enabled in the guest,
but this is a KVM bug and will eventually be fixed.

Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID")
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f765bf7a529c..c96f79c9fff2 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -637,7 +637,8 @@ static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
case 7:
entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
entry->eax = 0;
- entry->ecx = F(RDPID);
+ if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
+ entry->ecx = F(RDPID);
++array->nent;
default:
break;
--
2.31.1.527.g47e6f16901-goog

2021-05-04 17:20:56

by Sean Christopherson

Subject: [PATCH 10/15] KVM: VMX: Use common x86's uret MSR list as the one true list

Drop VMX's global list of user return MSRs now that VMX doesn't re-sort said
list to isolate "active" MSRs, i.e. now that VMX's list and x86's list have
the same MSRs in the same order.

In addition to eliminating the redundant list, this will also allow moving
more of the list management into common x86.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx/vmx.c | 97 ++++++++++++++-------------------
arch/x86/kvm/x86.c | 12 ++++
3 files changed, 53 insertions(+), 57 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a02c9bf3f7f1..c9452472ed55 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1778,6 +1778,7 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
unsigned long icr, int op_64_bit);

void kvm_define_user_return_msr(unsigned index, u32 msr);
+int kvm_find_user_return_msr(u32 msr);
int kvm_probe_user_return_msr(u32 msr);
int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6caabcd5037e..4b432d2bbd06 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -454,26 +454,7 @@ static inline void vmx_segment_cache_clear(struct vcpu_vmx *vmx)

static unsigned long host_idt_base;

-/*
- * Though SYSCALL is only supported in 64-bit mode on Intel CPUs, kvm
- * will emulate SYSCALL in legacy mode if the vendor string in guest
- * CPUID.0:{EBX,ECX,EDX} is "AuthenticAMD" or "AMDisbetter!" To
- * support this emulation, MSR_STAR is included in the list for i386,
- * but is never loaded into hardware. MSR_CSTAR is also never loaded
- * into hardware and is here purely for emulation purposes.
- */
-static u32 vmx_uret_msrs_list[] = {
-#ifdef CONFIG_X86_64
- MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR,
-#endif
- MSR_EFER, MSR_TSC_AUX, MSR_STAR,
- MSR_IA32_TSX_CTRL,
-};
-
-/*
- * Number of user return MSRs that are actually supported in hardware.
- * vmx_uret_msrs_list is modified when KVM is loaded to drop unsupported MSRs.
- */
+/* Number of user return MSRs that are actually supported in hardware. */
static int vmx_nr_uret_msrs;

#if IS_ENABLED(CONFIG_HYPERV)
@@ -703,22 +684,11 @@ static bool is_valid_passthrough_msr(u32 msr)
return r;
}

-static inline int __vmx_find_uret_msr(u32 msr)
-{
- int i;
-
- for (i = 0; i < vmx_nr_uret_msrs; ++i) {
- if (vmx_uret_msrs_list[i] == msr)
- return i;
- }
- return -1;
-}
-
struct vmx_uret_msr *vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr)
{
int i;

- i = __vmx_find_uret_msr(msr);
+ i = kvm_find_user_return_msr(msr);
if (i >= 0)
return &vmx->guest_uret_msrs[i];
return NULL;
@@ -1086,7 +1056,7 @@ static bool update_transition_efer(struct vcpu_vmx *vmx)
return false;
}

- i = __vmx_find_uret_msr(MSR_EFER);
+ i = kvm_find_user_return_msr(MSR_EFER);
if (i < 0)
return false;

@@ -6922,6 +6892,7 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)

static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
{
+ struct vmx_uret_msr *tsx_ctrl;
struct vcpu_vmx *vmx;
int i, cpu, err;

@@ -6946,29 +6917,25 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)

for (i = 0; i < vmx_nr_uret_msrs; ++i) {
vmx->guest_uret_msrs[i].data = 0;
-
- switch (vmx_uret_msrs_list[i]) {
- case MSR_IA32_TSX_CTRL:
- /*
- * TSX_CTRL_CPUID_CLEAR is handled in the CPUID
- * interception. Keep the host value unchanged to avoid
- * changing CPUID bits under the host kernel's feet.
- *
- * hle=0, rtm=0, tsx_ctrl=1 can be found with some
- * combinations of new kernel and old userspace. If
- * those guests run on a tsx=off host, do allow guests
- * to use TSX_CTRL, but do not change the value on the
- * host so that TSX remains always disabled.
- */
- if (boot_cpu_has(X86_FEATURE_RTM))
- vmx->guest_uret_msrs[i].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
- else
- vmx->guest_uret_msrs[i].mask = 0;
- break;
- default:
- vmx->guest_uret_msrs[i].mask = -1ull;
- break;
- }
+ vmx->guest_uret_msrs[i].mask = -1ull;
+ }
+ tsx_ctrl = vmx_find_uret_msr(vmx, MSR_IA32_TSX_CTRL);
+ if (tsx_ctrl) {
+ /*
+ * TSX_CTRL_CPUID_CLEAR is handled in the CPUID interception.
+ * Keep the host value unchanged to avoid changing CPUID bits
+ * under the host kernel's feet.
+ *
+ * hle=0, rtm=0, tsx_ctrl=1 can be found with some combinations
+ * of new kernel and old userspace. If those guests run on a
+ * tsx=off host, do allow guests to use TSX_CTRL, but do not
+ * change the value on the host so that TSX remains always
+ * disabled.
+ */
+ if (boot_cpu_has(X86_FEATURE_RTM))
+ tsx_ctrl->mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
+ else
+ tsx_ctrl->mask = 0;
}

err = alloc_loaded_vmcs(&vmx->vmcs01);
@@ -7829,6 +7796,22 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {

static __init void vmx_setup_user_return_msrs(void)
{
+
+ /*
+ * Though SYSCALL is only supported in 64-bit mode on Intel CPUs, kvm
+ * will emulate SYSCALL in legacy mode if the vendor string in guest
+ * CPUID.0:{EBX,ECX,EDX} is "AuthenticAMD" or "AMDisbetter!" To
+ * support this emulation, MSR_STAR is included in the list for i386,
+ * but is never loaded into hardware. MSR_CSTAR is also never loaded
+ * into hardware and is here purely for emulation purposes.
+ */
+ const u32 vmx_uret_msrs_list[] = {
+ #ifdef CONFIG_X86_64
+ MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR,
+ #endif
+ MSR_EFER, MSR_TSC_AUX, MSR_STAR,
+ MSR_IA32_TSX_CTRL,
+ };
u32 msr;
int i;

@@ -7841,7 +7824,7 @@ static __init void vmx_setup_user_return_msrs(void)
continue;

kvm_define_user_return_msr(vmx_nr_uret_msrs, msr);
- vmx_uret_msrs_list[vmx_nr_uret_msrs++] = msr;
+ vmx_nr_uret_msrs++;
}
}

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b4516d303413..90ef340565a4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -364,6 +364,18 @@ void kvm_define_user_return_msr(unsigned slot, u32 msr)
}
EXPORT_SYMBOL_GPL(kvm_define_user_return_msr);

+int kvm_find_user_return_msr(u32 msr)
+{
+ int i;
+
+ for (i = 0; i < user_return_msrs_global.nr; ++i) {
+ if (user_return_msrs_global.msrs[i] == msr)
+ return i;
+ }
+ return -1;
+}
+EXPORT_SYMBOL_GPL(kvm_find_user_return_msr);
+
static void kvm_user_return_msr_cpu_online(void)
{
unsigned int cpu = smp_processor_id();
--
2.31.1.527.g47e6f16901-goog

2021-05-04 17:21:11

by Sean Christopherson

Subject: [PATCH 09/15] KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list

Explicitly flag a uret MSR as needing to be loaded into hardware instead of
re-sorting the list of "active" MSRs and tracking how many MSRs in total
need to be loaded. The only benefit to sorting the list is that the loop
to load MSRs during vmx_prepare_switch_to_guest() doesn't need to iterate
over all supported uret MSRs, only those that are active. But that is a
pointless optimization, as the most common case, running a 64-bit guest,
will load the vast majority of MSRs. Not to mention that a single WRMSR is
far more expensive than iterating over the list.

Providing a stable list order obviates the need to track a given MSR's
"slot" in the per-CPU list of user return MSRs; all lists simply use the
same ordering. Future patches will take advantage of the stable order to
further simplify the related code.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 80 ++++++++++++++++++++++--------------------
arch/x86/kvm/vmx/vmx.h | 2 +-
2 files changed, 42 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 68454b0de2b1..6caabcd5037e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -458,8 +458,9 @@ static unsigned long host_idt_base;
* Though SYSCALL is only supported in 64-bit mode on Intel CPUs, kvm
* will emulate SYSCALL in legacy mode if the vendor string in guest
* CPUID.0:{EBX,ECX,EDX} is "AuthenticAMD" or "AMDisbetter!" To
- * support this emulation, IA32_STAR must always be included in
- * vmx_uret_msrs_list[], even in i386 builds.
+ * support this emulation, MSR_STAR is included in the list for i386,
+ * but is never loaded into hardware. MSR_CSTAR is also never loaded
+ * into hardware and is here purely for emulation purposes.
*/
static u32 vmx_uret_msrs_list[] = {
#ifdef CONFIG_X86_64
@@ -702,18 +703,12 @@ static bool is_valid_passthrough_msr(u32 msr)
return r;
}

-static inline int __vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr)
+static inline int __vmx_find_uret_msr(u32 msr)
{
int i;

- /*
- * Note, vmx->guest_uret_msrs is the same size as vmx_uret_msrs_list,
- * but is ordered differently. The MSR is matched against the list of
- * supported uret MSRs using "slot", but the index that is returned is
- * the index into guest_uret_msrs.
- */
for (i = 0; i < vmx_nr_uret_msrs; ++i) {
- if (vmx_uret_msrs_list[vmx->guest_uret_msrs[i].slot] == msr)
+ if (vmx_uret_msrs_list[i] == msr)
return i;
}
return -1;
@@ -723,7 +718,7 @@ struct vmx_uret_msr *vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr)
{
int i;

- i = __vmx_find_uret_msr(vmx, msr);
+ i = __vmx_find_uret_msr(msr);
if (i >= 0)
return &vmx->guest_uret_msrs[i];
return NULL;
@@ -732,13 +727,14 @@ struct vmx_uret_msr *vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr)
static int vmx_set_guest_uret_msr(struct vcpu_vmx *vmx,
struct vmx_uret_msr *msr, u64 data)
{
+ unsigned int slot = msr - vmx->guest_uret_msrs;
int ret = 0;

u64 old_msr_data = msr->data;
msr->data = data;
- if (msr - vmx->guest_uret_msrs < vmx->nr_active_uret_msrs) {
+ if (msr->load_into_hardware) {
preempt_disable();
- ret = kvm_set_user_return_msr(msr->slot, msr->data, msr->mask);
+ ret = kvm_set_user_return_msr(slot, msr->data, msr->mask);
preempt_enable();
if (ret)
msr->data = old_msr_data;
@@ -1090,7 +1086,7 @@ static bool update_transition_efer(struct vcpu_vmx *vmx)
return false;
}

- i = __vmx_find_uret_msr(vmx, MSR_EFER);
+ i = __vmx_find_uret_msr(MSR_EFER);
if (i < 0)
return false;

@@ -1252,11 +1248,14 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
*/
if (!vmx->guest_uret_msrs_loaded) {
vmx->guest_uret_msrs_loaded = true;
- for (i = 0; i < vmx->nr_active_uret_msrs; ++i)
- kvm_set_user_return_msr(vmx->guest_uret_msrs[i].slot,
+ for (i = 0; i < vmx_nr_uret_msrs; ++i) {
+ if (!vmx->guest_uret_msrs[i].load_into_hardware)
+ continue;
+
+ kvm_set_user_return_msr(i,
vmx->guest_uret_msrs[i].data,
vmx->guest_uret_msrs[i].mask);
-
+ }
}

if (vmx->nested.need_vmcs12_to_shadow_sync)
@@ -1763,19 +1762,16 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
vmx_clear_hlt(vcpu);
}

-static void vmx_setup_uret_msr(struct vcpu_vmx *vmx, unsigned int msr)
+static void vmx_setup_uret_msr(struct vcpu_vmx *vmx, unsigned int msr,
+ bool load_into_hardware)
{
- struct vmx_uret_msr tmp;
- int from, to;
+ struct vmx_uret_msr *uret_msr;

- from = __vmx_find_uret_msr(vmx, msr);
- if (from < 0)
+ uret_msr = vmx_find_uret_msr(vmx, msr);
+ if (!uret_msr)
return;
- to = vmx->nr_active_uret_msrs++;

- tmp = vmx->guest_uret_msrs[to];
- vmx->guest_uret_msrs[to] = vmx->guest_uret_msrs[from];
- vmx->guest_uret_msrs[from] = tmp;
+ uret_msr->load_into_hardware = load_into_hardware;
}

/*
@@ -1785,30 +1781,36 @@ static void vmx_setup_uret_msr(struct vcpu_vmx *vmx, unsigned int msr)
*/
static void setup_msrs(struct vcpu_vmx *vmx)
{
- vmx->guest_uret_msrs_loaded = false;
- vmx->nr_active_uret_msrs = 0;
#ifdef CONFIG_X86_64
+ bool load_syscall_msrs;
+
/*
* The SYSCALL MSRs are only needed on long mode guests, and only
* when EFER.SCE is set.
*/
- if (is_long_mode(&vmx->vcpu) && (vmx->vcpu.arch.efer & EFER_SCE)) {
- vmx_setup_uret_msr(vmx, MSR_STAR);
- vmx_setup_uret_msr(vmx, MSR_LSTAR);
- vmx_setup_uret_msr(vmx, MSR_SYSCALL_MASK);
- }
+ load_syscall_msrs = is_long_mode(&vmx->vcpu) &&
+ (vmx->vcpu.arch.efer & EFER_SCE);
+
+ vmx_setup_uret_msr(vmx, MSR_STAR, load_syscall_msrs);
+ vmx_setup_uret_msr(vmx, MSR_LSTAR, load_syscall_msrs);
+ vmx_setup_uret_msr(vmx, MSR_SYSCALL_MASK, load_syscall_msrs);
#endif
- if (update_transition_efer(vmx))
- vmx_setup_uret_msr(vmx, MSR_EFER);
+ vmx_setup_uret_msr(vmx, MSR_EFER, update_transition_efer(vmx));

- if (guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
- guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDPID))
- vmx_setup_uret_msr(vmx, MSR_TSC_AUX);
+ vmx_setup_uret_msr(vmx, MSR_TSC_AUX,
+ guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
+ guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDPID));

- vmx_setup_uret_msr(vmx, MSR_IA32_TSX_CTRL);
+ vmx_setup_uret_msr(vmx, MSR_IA32_TSX_CTRL, true);

if (cpu_has_vmx_msr_bitmap())
vmx_update_msr_bitmap(&vmx->vcpu);
+
+ /*
+ * The set of MSRs to load may have changed, reload MSRs before the
+ * next VM-Enter.
+ */
+ vmx->guest_uret_msrs_loaded = false;
}

static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index d71ed8b425c5..16e4e457ba23 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -36,7 +36,7 @@ struct vmx_msrs {
};

struct vmx_uret_msr {
- unsigned int slot; /* The MSR's slot in kvm_user_return_msrs. */
+ bool load_into_hardware;
u64 data;
u64 mask;
};
--
2.31.1.527.g47e6f16901-goog

2021-05-04 17:21:26

by Sean Christopherson

Subject: [PATCH 12/15] KVM: x86: Export the number of uret MSRs to vendor modules

Split out and export the number of configured user return MSRs so that
VMX can iterate over the set of MSRs without having to do its own tracking.
Keep the list itself internal to x86 so that vendor code still has to go
through the "official" APIs to add/modify entries.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 29 +++++++++++++----------------
2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c9452472ed55..10663610f105 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1419,6 +1419,7 @@ struct kvm_arch_async_pf {
bool direct_map;
};

+extern u32 __read_mostly kvm_nr_uret_msrs;
extern u64 __read_mostly host_efer;
extern bool __read_mostly allow_smaller_maxphyaddr;
extern struct kvm_x86_ops kvm_x86_ops;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 90ef340565a4..2fd46e917666 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -184,11 +184,6 @@ module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
*/
#define KVM_MAX_NR_USER_RETURN_MSRS 16

-struct kvm_user_return_msrs_global {
- int nr;
- u32 msrs[KVM_MAX_NR_USER_RETURN_MSRS];
-};
-
struct kvm_user_return_msrs {
struct user_return_notifier urn;
bool registered;
@@ -198,7 +193,9 @@ struct kvm_user_return_msrs {
} values[KVM_MAX_NR_USER_RETURN_MSRS];
};

-static struct kvm_user_return_msrs_global __read_mostly user_return_msrs_global;
+u32 __read_mostly kvm_nr_uret_msrs;
+EXPORT_SYMBOL_GPL(kvm_nr_uret_msrs);
+static u32 __read_mostly kvm_uret_msrs_list[KVM_MAX_NR_USER_RETURN_MSRS];
static struct kvm_user_return_msrs __percpu *user_return_msrs;

#define KVM_SUPPORTED_XCR0 (XFEATURE_MASK_FP | XFEATURE_MASK_SSE \
@@ -330,10 +327,10 @@ static void kvm_on_user_return(struct user_return_notifier *urn)
user_return_notifier_unregister(urn);
}
local_irq_restore(flags);
- for (slot = 0; slot < user_return_msrs_global.nr; ++slot) {
+ for (slot = 0; slot < kvm_nr_uret_msrs; ++slot) {
values = &msrs->values[slot];
if (values->host != values->curr) {
- wrmsrl(user_return_msrs_global.msrs[slot], values->host);
+ wrmsrl(kvm_uret_msrs_list[slot], values->host);
values->curr = values->host;
}
}
@@ -358,9 +355,9 @@ EXPORT_SYMBOL_GPL(kvm_probe_user_return_msr);
void kvm_define_user_return_msr(unsigned slot, u32 msr)
{
BUG_ON(slot >= KVM_MAX_NR_USER_RETURN_MSRS);
- user_return_msrs_global.msrs[slot] = msr;
- if (slot >= user_return_msrs_global.nr)
- user_return_msrs_global.nr = slot + 1;
+ kvm_uret_msrs_list[slot] = msr;
+ if (slot >= kvm_nr_uret_msrs)
+ kvm_nr_uret_msrs = slot + 1;
}
EXPORT_SYMBOL_GPL(kvm_define_user_return_msr);

@@ -368,8 +365,8 @@ int kvm_find_user_return_msr(u32 msr)
{
int i;

- for (i = 0; i < user_return_msrs_global.nr; ++i) {
- if (user_return_msrs_global.msrs[i] == msr)
+ for (i = 0; i < kvm_nr_uret_msrs; ++i) {
+ if (kvm_uret_msrs_list[i] == msr)
return i;
}
return -1;
@@ -383,8 +380,8 @@ static void kvm_user_return_msr_cpu_online(void)
u64 value;
int i;

- for (i = 0; i < user_return_msrs_global.nr; ++i) {
- rdmsrl_safe(user_return_msrs_global.msrs[i], &value);
+ for (i = 0; i < kvm_nr_uret_msrs; ++i) {
+ rdmsrl_safe(kvm_uret_msrs_list[i], &value);
msrs->values[i].host = value;
msrs->values[i].curr = value;
}
@@ -399,7 +396,7 @@ int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
value = (value & mask) | (msrs->values[slot].host & ~mask);
if (value == msrs->values[slot].curr)
return 0;
- err = wrmsrl_safe(user_return_msrs_global.msrs[slot], value);
+ err = wrmsrl_safe(kvm_uret_msrs_list[slot], value);
if (err)
return 1;

--
2.31.1.527.g47e6f16901-goog

2021-05-04 17:21:57

by Sean Christopherson

Subject: [PATCH 13/15] KVM: x86: Move uret MSR slot management to common x86

Now that SVM and VMX both probe MSRs before "defining" user return slots
for them, consolidate the code for probe+define into common x86 and
eliminate the odd behavior of having the vendor code define the slot for
a given MSR.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 +--
arch/x86/kvm/svm/svm.c | 5 +----
arch/x86/kvm/vmx/vmx.c | 19 ++++---------------
arch/x86/kvm/x86.c | 19 +++++++++++--------
4 files changed, 17 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 10663610f105..a4b912f7e427 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1778,9 +1778,8 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
unsigned long ipi_bitmap_high, u32 min,
unsigned long icr, int op_64_bit);

-void kvm_define_user_return_msr(unsigned index, u32 msr);
+int kvm_add_user_return_msr(u32 msr);
int kvm_find_user_return_msr(u32 msr);
-int kvm_probe_user_return_msr(u32 msr);
int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);

u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 231b9650d864..de921935e8de 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -959,10 +959,7 @@ static __init int svm_hardware_setup(void)
kvm_tsc_scaling_ratio_frac_bits = 32;
}

- if (!kvm_probe_user_return_msr(MSR_TSC_AUX)) {
- tsc_aux_uret_slot = 0;
- kvm_define_user_return_msr(tsc_aux_uret_slot, MSR_TSC_AUX);
- }
+ tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);

/* Check for pause filtering support */
if (!boot_cpu_has(X86_FEATURE_PAUSEFILTER)) {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7a53568b34fc..26f82f302391 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -454,9 +454,6 @@ static inline void vmx_segment_cache_clear(struct vcpu_vmx *vmx)

static unsigned long host_idt_base;

-/* Number of user return MSRs that are actually supported in hardware. */
-static int vmx_nr_uret_msrs;
-
#if IS_ENABLED(CONFIG_HYPERV)
static bool __read_mostly enlightened_vmcs = true;
module_param(enlightened_vmcs, bool, 0444);
@@ -1218,7 +1215,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
*/
if (!vmx->guest_uret_msrs_loaded) {
vmx->guest_uret_msrs_loaded = true;
- for (i = 0; i < vmx_nr_uret_msrs; ++i) {
+ for (i = 0; i < kvm_nr_uret_msrs; ++i) {
if (!vmx->guest_uret_msrs[i].load_into_hardware)
continue;

@@ -6921,7 +6918,7 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
goto free_vpid;
}

- for (i = 0; i < vmx_nr_uret_msrs; ++i) {
+ for (i = 0; i < kvm_nr_uret_msrs; ++i) {
vmx->guest_uret_msrs[i].data = 0;
vmx->guest_uret_msrs[i].mask = -1ull;
}
@@ -7810,20 +7807,12 @@ static __init void vmx_setup_user_return_msrs(void)
MSR_EFER, MSR_TSC_AUX, MSR_STAR,
MSR_IA32_TSX_CTRL,
};
- u32 msr;
int i;

BUILD_BUG_ON(ARRAY_SIZE(vmx_uret_msrs_list) != MAX_NR_USER_RETURN_MSRS);

- for (i = 0; i < ARRAY_SIZE(vmx_uret_msrs_list); ++i) {
- msr = vmx_uret_msrs_list[i];
-
- if (kvm_probe_user_return_msr(msr))
- continue;
-
- kvm_define_user_return_msr(vmx_nr_uret_msrs, msr);
- vmx_nr_uret_msrs++;
- }
+ for (i = 0; i < ARRAY_SIZE(vmx_uret_msrs_list); ++i)
+ kvm_add_user_return_msr(vmx_uret_msrs_list[i]);
}

static __init int hardware_setup(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2fd46e917666..adca491d3b4b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -336,7 +336,7 @@ static void kvm_on_user_return(struct user_return_notifier *urn)
}
}

-int kvm_probe_user_return_msr(u32 msr)
+static int kvm_probe_user_return_msr(u32 msr)
{
u64 val;
int ret;
@@ -350,16 +350,18 @@ int kvm_probe_user_return_msr(u32 msr)
preempt_enable();
return ret;
}
-EXPORT_SYMBOL_GPL(kvm_probe_user_return_msr);

-void kvm_define_user_return_msr(unsigned slot, u32 msr)
+int kvm_add_user_return_msr(u32 msr)
{
- BUG_ON(slot >= KVM_MAX_NR_USER_RETURN_MSRS);
- kvm_uret_msrs_list[slot] = msr;
- if (slot >= kvm_nr_uret_msrs)
- kvm_nr_uret_msrs = slot + 1;
+ BUG_ON(kvm_nr_uret_msrs >= KVM_MAX_NR_USER_RETURN_MSRS);
+
+ if (kvm_probe_user_return_msr(msr))
+ return -1;
+
+ kvm_uret_msrs_list[kvm_nr_uret_msrs] = msr;
+ return kvm_nr_uret_msrs++;
}
-EXPORT_SYMBOL_GPL(kvm_define_user_return_msr);
+EXPORT_SYMBOL_GPL(kvm_add_user_return_msr);

int kvm_find_user_return_msr(u32 msr)
{
@@ -8169,6 +8171,7 @@ int kvm_arch_init(void *opaque)
printk(KERN_ERR "kvm: failed to allocate percpu kvm_user_return_msrs\n");
goto out_free_x86_emulator_cache;
}
+ kvm_nr_uret_msrs = 0;

r = kvm_mmu_module_init();
if (r)
--
2.31.1.527.g47e6f16901-goog
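
The resulting vendor-facing contract, in brief (usage sketch mirroring the SVM hunk above):

    /*
     * kvm_add_user_return_msr() probes the MSR on the current CPU and, if
     * the probe succeeds, appends it to the common list, returning the
     * allocated slot; -1 means the MSR is unusable on this system.
     */
    tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);
    /* tsc_aux_uret_slot < 0 => MSR_TSC_AUX accesses are rejected (patch 06). */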

2021-05-04 17:21:57

by Sean Christopherson

Subject: [PATCH 11/15] KVM: VMX: Disable loading of TSX_CTRL MSR the more conventional way

Tag TSX_CTRL as not needing to be loaded when RTM isn't supported in the
host. Crushing the write mask to '0' has the same effect, but requires
more mental gymnastics to understand.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4b432d2bbd06..7a53568b34fc 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1771,7 +1771,13 @@ static void setup_msrs(struct vcpu_vmx *vmx)
guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDPID));

- vmx_setup_uret_msr(vmx, MSR_IA32_TSX_CTRL, true);
+ /*
+ * hle=0, rtm=0, tsx_ctrl=1 can be found with some combinations of new
+ * kernel and old userspace. If those guests run on a tsx=off host, do
+ * allow guests to use TSX_CTRL, but don't change the value in hardware
+ * so that TSX remains always disabled.
+ */
+ vmx_setup_uret_msr(vmx, MSR_IA32_TSX_CTRL, boot_cpu_has(X86_FEATURE_RTM));

if (cpu_has_vmx_msr_bitmap())
vmx_update_msr_bitmap(&vmx->vcpu);
@@ -6919,23 +6925,15 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
vmx->guest_uret_msrs[i].data = 0;
vmx->guest_uret_msrs[i].mask = -1ull;
}
- tsx_ctrl = vmx_find_uret_msr(vmx, MSR_IA32_TSX_CTRL);
- if (tsx_ctrl) {
+ if (boot_cpu_has(X86_FEATURE_RTM)) {
/*
* TSX_CTRL_CPUID_CLEAR is handled in the CPUID interception.
* Keep the host value unchanged to avoid changing CPUID bits
* under the host kernel's feet.
- *
- * hle=0, rtm=0, tsx_ctrl=1 can be found with some combinations
- * of new kernel and old userspace. If those guests run on a
- * tsx=off host, do allow guests to use TSX_CTRL, but do not
- * change the value on the host so that TSX remains always
- * disabled.
*/
- if (boot_cpu_has(X86_FEATURE_RTM))
+ tsx_ctrl = vmx_find_uret_msr(vmx, MSR_IA32_TSX_CTRL);
+ if (tsx_ctrl)
tsx_ctrl->mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
- else
- tsx_ctrl->mask = 0;
}

err = alloc_loaded_vmcs(&vmx->vmcs01);
--
2.31.1.527.g47e6f16901-goog

2021-05-04 17:22:11

by Sean Christopherson

Subject: [PATCH 15/15] KVM: x86: Hide RDTSCP and RDPID if MSR_TSC_AUX probing failed

If probing MSR_TSC_AUX failed, hide RDTSCP and RDPID, and WARN if either
feature was reported as supported. In theory, such a scenario should
never happen as both Intel and AMD state that MSR_TSC_AUX is available if
RDTSCP or RDPID is supported. But, KVM injects #GP on MSR_TSC_AUX
accesses if probing failed, faults on WRMSR(MSR_TSC_AUX) may be fatal to
the guest (because they happen during early CPU bringup), and KVM itself
has effectively misreported RDPID support in the past.

Note, this also has the happy side effect of omitting MSR_TSC_AUX from
the list of MSRs that are exposed to userspace if probing the MSR fails.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c96f79c9fff2..bf0f74ce4974 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -567,6 +567,21 @@ void kvm_set_cpu_caps(void)
F(ACE2) | F(ACE2_EN) | F(PHE) | F(PHE_EN) |
F(PMM) | F(PMM_EN)
);
+
+ /*
+ * Hide RDTSCP and RDPID if either feature is reported as supported but
+ * probing MSR_TSC_AUX failed. This is purely a sanity check and
+ * should never happen, but the guest will likely crash if RDTSCP or
+ * RDPID is misreported, and KVM has botched MSR_TSC_AUX emulation in
+ * the past, e.g. the sanity check may fire if this instance of KVM is
+ * running as L1 on top of an older, broken KVM.
+ */
+ if (WARN_ON((kvm_cpu_cap_has(X86_FEATURE_RDTSCP) ||
+ kvm_cpu_cap_has(X86_FEATURE_RDPID)) &&
+ !kvm_is_supported_user_return_msr(MSR_TSC_AUX))) {
+ kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);
+ kvm_cpu_cap_clear(X86_FEATURE_RDPID);
+ }
}
EXPORT_SYMBOL_GPL(kvm_set_cpu_caps);

--
2.31.1.527.g47e6f16901-goog
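
One detail worth noting: WARN_ON() evaluates to its condition, so the capability clearing still runs on production kernels; the splat merely flags the misconfiguration. The check behaves as (equivalent sketch):

    if ((kvm_cpu_cap_has(X86_FEATURE_RDTSCP) ||
         kvm_cpu_cap_has(X86_FEATURE_RDPID)) &&
        !kvm_is_supported_user_return_msr(MSR_TSC_AUX)) {
            /* print a warning with a backtrace, then hide the features */
            kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);
            kvm_cpu_cap_clear(X86_FEATURE_RDPID);
    }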

2021-05-04 17:39:02

by Jim Mattson

Subject: Re: [PATCH 01/15] KVM: VMX: Do not adverise RDPID if ENABLE_RDTSCP control is unsupported

On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <[email protected]> wrote:
>
> Clear KVM's RDPID capability if the ENABLE_RDTSCP secondary exec control is
> unsupported. Despite being enumerated in a separate CPUID flag, RDPID is
> bundled under the same VMCS control as RDTSCP and will #UD in VMX non-root
> if ENABLE_RDTSCP is not enabled.
>
> Fixes: 41cd02c6f7f6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>

But KVM will happily emulate RDPID if the instruction causes a #UD
VM-exit, won't it? See commit fb6d4d340e05 (KVM: x86: emulate RDPID).

2021-05-04 17:54:54

by Jim Mattson

Subject: Re: [PATCH 01/15] KVM: VMX: Do not adverise RDPID if ENABLE_RDTSCP control is unsupported

On Tue, May 4, 2021 at 10:37 AM Jim Mattson <[email protected]> wrote:
>
> On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <[email protected]> wrote:
> >
> > Clear KVM's RDPID capability if the ENABLE_RDTSCP secondary exec control is
> > unsupported. Despite being enumerated in a separate CPUID flag, RDPID is
> > bundled under the same VMCS control as RDTSCP and will #UD in VMX non-root
> > if ENABLE_RDTSCP is not enabled.
> >
> > Fixes: 41cd02c6f7f6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
> > Cc: [email protected]
> > Signed-off-by: Sean Christopherson <[email protected]>
>
> But KVM will happily emulate RDPID if the instruction causes a #UD
> VM-exit, won't it? See commit fb6d4d340e05 (KVM: x86: emulate RDPID).

Oh, after reading the second patch, I now see why this is needed.

You mispelled 'advertise' in the summary line.

Reviewed-by: Jim Mattson <[email protected]>

2021-05-04 18:16:49

by Sean Christopherson

Subject: Re: [PATCH 01/15] KVM: VMX: Do not adverise RDPID if ENABLE_RDTSCP control is unsupported

On Tue, May 04, 2021, Jim Mattson wrote:
> On Tue, May 4, 2021 at 10:37 AM Jim Mattson <[email protected]> wrote:
> >
> > On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <[email protected]> wrote:
> > >
> > > Clear KVM's RDPID capability if the ENABLE_RDTSCP secondary exec control is
> > > unsupported. Despite being enumerated in a separate CPUID flag, RDPID is
> > > bundled under the same VMCS control as RDTSCP and will #UD in VMX non-root
> > > if ENABLE_RDTSCP is not enabled.
> > >
> > > Fixes: 41cd02c6f7f6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
> > > Cc: [email protected]
> > > Signed-off-by: Sean Christopherson <[email protected]>
> >
> > But KVM will happily emulate RDPID if the instruction causes a #UD
> > VM-exit, won't it? See commit fb6d4d340e05 (KVM: x86: emulate RDPID).
>
> Oh, after reading the second patch, I now see why this is needed.

Yeah. Technically, once common x86 can query MSR_TSC_AUX support directly at
the end of the series, the emulation enumeration could be:

        if (kvm_is_supported_user_return_msr(MSR_TSC_AUX))
                entry->ecx = F(RDPID);

I think I actually meant to do that, then lost track of that TODO item when
reworking the series for the umpteenth time.

Practically speaking, the only way for kvm_is_supported_user_return_msr() to be
meaningful vs. kvm_cpu_cap_has() is if RDTSCP is supported in hardware but the
VMCS control is not available. And I suppose there's also the case where
X86_FEATURE_RDTSCP was cleared by the kernel, but I feel like KVM should respect
the kernel's avoidance of RDTSCP/MSR_TSC_AUX in that case. Regarding the silly
VMCS case, I have no objection to making the change, but I also don't care if we
sweep it under the rug.

2021-05-04 18:31:22

by Sean Christopherson

Subject: [PATCH 03/15] KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest

Intercept RDTSCP to inject #UD if RDTSCP is disabled in the guest.

Note, SVM does not support intercepting RDPID. Unlike VMX's
ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID. This
is a benign virtualization hole as the host kernel (incorrectly) sets
MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
the host's MSR_TSC_AUX to the guest.

But, when the kernel bug is fixed, KVM will start leaking the host's
MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
for whatever reason. This leak will be remedied in a future commit.

Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a7271f31df47..8f2b184270c0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1100,7 +1100,9 @@ static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
return svm->vmcb->control.tsc_offset;
}

-static void svm_check_invpcid(struct vcpu_svm *svm)
+/* Evaluate instruction intercepts that depend on guest CPUID features. */
+static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
+ struct vcpu_svm *svm)
{
/*
* Intercept INVPCID if shadow paging is enabled to sync/free shadow
@@ -1113,6 +1115,13 @@ static void svm_check_invpcid(struct vcpu_svm *svm)
else
svm_clr_intercept(svm, INTERCEPT_INVPCID);
}
+
+ if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP)) {
+ if (guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ svm_clr_intercept(svm, INTERCEPT_RDTSCP);
+ else
+ svm_set_intercept(svm, INTERCEPT_RDTSCP);
+ }
}

static void init_vmcb(struct kvm_vcpu *vcpu)
@@ -1248,7 +1257,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
svm_clr_intercept(svm, INTERCEPT_PAUSE);
}

- svm_check_invpcid(svm);
+ svm_recalc_instruction_intercepts(vcpu, svm);

/*
* If the host supports V_SPEC_CTRL then disable the interception
@@ -3084,6 +3093,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
[SVM_EXIT_STGI] = stgi_interception,
[SVM_EXIT_CLGI] = clgi_interception,
[SVM_EXIT_SKINIT] = skinit_interception,
+ [SVM_EXIT_RDTSCP] = kvm_handle_invalid_op,
[SVM_EXIT_WBINVD] = kvm_emulate_wbinvd,
[SVM_EXIT_MONITOR] = kvm_emulate_monitor,
[SVM_EXIT_MWAIT] = kvm_emulate_mwait,
@@ -4007,8 +4017,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);

- /* Check again if INVPCID interception if required */
- svm_check_invpcid(svm);
+ svm_recalc_instruction_intercepts(vcpu, svm);

/* For sev guests, the memory encryption bit is not reserved in CR3. */
if (sev_guest(vcpu->kvm)) {
--
2.31.1.527.g47e6f16901-goog

2021-05-04 18:31:27

by Sean Christopherson

Subject: [PATCH 05/15] KVM: VMX: Disable preemption when probing user return MSRs

Disable preemption when probing a user return MSR via RDMSR/WRMSR. If
the MSR holds a different value per logical CPU, the WRMSR could corrupt
the host's value if KVM is preempted between the RDMSR and WRMSR, and
then rescheduled on a different CPU.

Opportunistically land the helper in common x86, SVM will use the helper
in a future commit.

Fixes: 4be534102624 ("KVM: VMX: Initialize vmx->guest_msrs[] right after allocation")
Cc: [email protected]
Cc: Xiaoyao Li <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx/vmx.c | 5 +----
arch/x86/kvm/x86.c | 16 ++++++++++++++++
3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3e5fc80a35c8..a02c9bf3f7f1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1778,6 +1778,7 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
unsigned long icr, int op_64_bit);

void kvm_define_user_return_msr(unsigned index, u32 msr);
+int kvm_probe_user_return_msr(u32 msr);
int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);

u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 99591e523b47..990ee339a05f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6914,12 +6914,9 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)

for (i = 0; i < ARRAY_SIZE(vmx_uret_msrs_list); ++i) {
u32 index = vmx_uret_msrs_list[i];
- u32 data_low, data_high;
int j = vmx->nr_uret_msrs;

- if (rdmsr_safe(index, &data_low, &data_high) < 0)
- continue;
- if (wrmsr_safe(index, data_low, data_high) < 0)
+ if (kvm_probe_user_return_msr(index))
continue;

vmx->guest_uret_msrs[j].slot = i;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3bf52ba5f2bb..e304447be42d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -339,6 +339,22 @@ static void kvm_on_user_return(struct user_return_notifier *urn)
}
}

+int kvm_probe_user_return_msr(u32 msr)
+{
+ u64 val;
+ int ret;
+
+ preempt_disable();
+ ret = rdmsrl_safe(msr, &val);
+ if (ret)
+ goto out;
+ ret = wrmsrl_safe(msr, val);
+out:
+ preempt_enable();
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_probe_user_return_msr);
+
void kvm_define_user_return_msr(unsigned slot, u32 msr)
{
BUG_ON(slot >= KVM_MAX_NR_USER_RETURN_MSRS);
--
2.31.1.527.g47e6f16901-goog
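
To make the hazard concrete, the interleaving that preempt_disable() closes (illustrative timeline for an MSR whose value differs per logical CPU, e.g. MSR_TSC_AUX holding the CPU number):

    /*
     * CPU0:  rdmsrl_safe(msr, &val);    // reads CPU0's per-CPU value
     *        <task preempted, migrated to CPU1>
     * CPU1:  wrmsrl_safe(msr, val);     // clobbers CPU1's value with CPU0's
     *
     * With preemption disabled across the pair, the read and the write are
     * guaranteed to target the same logical CPU.
     */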

2021-05-04 18:32:06

by Jim Mattson

Subject: Re: [PATCH 02/15] KVM: x86: Emulate RDPID only if RDTSCP is supported

On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <[email protected]> wrote:
>
> Do not advertise emulation support for RDPID if RDTSCP is unsupported.
> RDPID emulation subtly relies on MSR_TSC_AUX to exist in hardware, as
> both vmx_get_msr() and svm_get_msr() will return an error if the MSR is
> unsupported, i.e. ctxt->ops->get_msr() will fail and the emulator will
> inject a #UD.
>
> Note, RDPID emulation also relies on RDTSCP being enabled in the guest,
> but this is a KVM bug and will eventually be fixed.
>
> Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID")
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>

2021-05-04 18:48:07

by Sean Christopherson

Subject: [PATCH 04/15] KVM: x86: Move RDPID emulation intercept to its own enum

Add a dedicated intercept enum for RDPID instead of piggybacking RDTSCP.
Unlike VMX's ENABLE_RDTSCP, RDPID is not bound to SVM's RDTSCP intercept.

Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID")
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/emulate.c | 2 +-
arch/x86/kvm/kvm_emulate.h | 1 +
arch/x86/kvm/vmx/vmx.c | 3 ++-
3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index abd9a4db11a8..8fc71e70857d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4502,7 +4502,7 @@ static const struct opcode group8[] = {
* from the register case of group9.
*/
static const struct gprefix pfx_0f_c7_7 = {
- N, N, N, II(DstMem | ModRM | Op3264 | EmulateOnUD, em_rdpid, rdtscp),
+ N, N, N, II(DstMem | ModRM | Op3264 | EmulateOnUD, em_rdpid, rdpid),
};


diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 0d359115429a..f016838faedd 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -468,6 +468,7 @@ enum x86_intercept {
x86_intercept_clgi,
x86_intercept_skinit,
x86_intercept_rdtscp,
+ x86_intercept_rdpid,
x86_intercept_icebp,
x86_intercept_wbinvd,
x86_intercept_monitor,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 82404ee2520e..99591e523b47 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7437,8 +7437,9 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu,
/*
* RDPID causes #UD if disabled through secondary execution controls.
* Because it is marked as EmulateOnUD, we need to intercept it here.
+ * Note, RDPID is hidden behind ENABLE_RDTSCP.
*/
- case x86_intercept_rdtscp:
+ case x86_intercept_rdpid:
if (!nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_RDTSCP)) {
exception->vector = UD_VECTOR;
exception->error_code_valid = false;
--
2.31.1.527.g47e6f16901-goog

2021-05-04 18:48:14

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 08/15] KVM: VMX: Configure list of user return MSRs at module init

Configure the list of user return MSRs that are actually supported at
module init instead of reprobing the list of possible MSRs every time a
vCPU is created. Curating the list on a per-vCPU basis is pointless; KVM
is completely hosed if the set of supported MSRs changes after module init,
or if the set of MSRs differs per physical CPU.

The per-vCPU lists also increase complexity (see __vmx_find_uret_msr()) and
create corner cases that _should_ be impossible, but theoretically exist
in KVM, e.g. advertising RDTSCP to userspace without actually being able to
virtualize RDTSCP if probing MSR_TSC_AUX fails.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 61 ++++++++++++++++++++++++++++--------------
arch/x86/kvm/vmx/vmx.h | 10 ++++++-
2 files changed, 50 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 42e4bbaa299a..68454b0de2b1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -461,7 +461,7 @@ static unsigned long host_idt_base;
* support this emulation, IA32_STAR must always be included in
* vmx_uret_msrs_list[], even in i386 builds.
*/
-static const u32 vmx_uret_msrs_list[] = {
+static u32 vmx_uret_msrs_list[] = {
#ifdef CONFIG_X86_64
MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR,
#endif
@@ -469,6 +469,12 @@ static const u32 vmx_uret_msrs_list[] = {
MSR_IA32_TSX_CTRL,
};

+/*
+ * Number of user return MSRs that are actually supported in hardware.
+ * vmx_uret_msrs_list is modified when KVM is loaded to drop unsupported MSRs.
+ */
+static int vmx_nr_uret_msrs;
+
#if IS_ENABLED(CONFIG_HYPERV)
static bool __read_mostly enlightened_vmcs = true;
module_param(enlightened_vmcs, bool, 0444);
@@ -700,9 +706,16 @@ static inline int __vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr)
{
int i;

- for (i = 0; i < vmx->nr_uret_msrs; ++i)
+ /*
+ * Note, vmx->guest_uret_msrs is the same size as vmx_uret_msrs_list,
+ * but is ordered differently. The MSR is matched against the list of
+ * supported uret MSRs using "slot", but the index that is returned is
+ * the index into guest_uret_msrs.
+ */
+ for (i = 0; i < vmx_nr_uret_msrs; ++i) {
if (vmx_uret_msrs_list[vmx->guest_uret_msrs[i].slot] == msr)
return i;
+ }
return -1;
}

@@ -6929,18 +6942,10 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
goto free_vpid;
}

- BUILD_BUG_ON(ARRAY_SIZE(vmx_uret_msrs_list) != MAX_NR_USER_RETURN_MSRS);
+ for (i = 0; i < vmx_nr_uret_msrs; ++i) {
+ vmx->guest_uret_msrs[i].data = 0;

- for (i = 0; i < ARRAY_SIZE(vmx_uret_msrs_list); ++i) {
- u32 index = vmx_uret_msrs_list[i];
- int j = vmx->nr_uret_msrs;
-
- if (kvm_probe_user_return_msr(index))
- continue;
-
- vmx->guest_uret_msrs[j].slot = i;
- vmx->guest_uret_msrs[j].data = 0;
- switch (index) {
+ switch (vmx_uret_msrs_list[i]) {
case MSR_IA32_TSX_CTRL:
/*
* TSX_CTRL_CPUID_CLEAR is handled in the CPUID
@@ -6954,15 +6959,14 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
* host so that TSX remains always disabled.
*/
if (boot_cpu_has(X86_FEATURE_RTM))
- vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
+ vmx->guest_uret_msrs[i].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
else
- vmx->guest_uret_msrs[j].mask = 0;
+ vmx->guest_uret_msrs[i].mask = 0;
break;
default:
- vmx->guest_uret_msrs[j].mask = -1ull;
+ vmx->guest_uret_msrs[i].mask = -1ull;
break;
}
- ++vmx->nr_uret_msrs;
}

err = alloc_loaded_vmcs(&vmx->vmcs01);
@@ -7821,17 +7825,34 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
};

+static __init void vmx_setup_user_return_msrs(void)
+{
+ u32 msr;
+ int i;
+
+ BUILD_BUG_ON(ARRAY_SIZE(vmx_uret_msrs_list) != MAX_NR_USER_RETURN_MSRS);
+
+ for (i = 0; i < ARRAY_SIZE(vmx_uret_msrs_list); ++i) {
+ msr = vmx_uret_msrs_list[i];
+
+ if (kvm_probe_user_return_msr(msr))
+ continue;
+
+ kvm_define_user_return_msr(vmx_nr_uret_msrs, msr);
+ vmx_uret_msrs_list[vmx_nr_uret_msrs++] = msr;
+ }
+}
+
static __init int hardware_setup(void)
{
unsigned long host_bndcfgs;
struct desc_ptr dt;
- int r, i, ept_lpage_level;
+ int r, ept_lpage_level;

store_idt(&dt);
host_idt_base = dt.address;

- for (i = 0; i < ARRAY_SIZE(vmx_uret_msrs_list); ++i)
- kvm_define_user_return_msr(i, vmx_uret_msrs_list[i]);
+ vmx_setup_user_return_msrs();

if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
return -EIO;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 008cb87ff088..d71ed8b425c5 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -245,8 +245,16 @@ struct vcpu_vmx {
u32 idt_vectoring_info;
ulong rflags;

+ /*
+ * User return MSRs are always emulated when enabled in the guest, but
+ * only loaded into hardware when necessary, e.g. SYSCALL #UDs outside
+ * of 64-bit mode or if EFER.SCE=0, thus the SYSCALL MSRs don't need to
+ * be loaded into hardware unless the guest can actually use SYSCALL.
+ * nr_active_uret_msrs tracks the number of MSRs that need to be loaded
+ * into hardware when running the guest. guest_uret_msrs[] is resorted
+ * whenever the number of "active" uret MSRs is modified.
+ */
struct vmx_uret_msr guest_uret_msrs[MAX_NR_USER_RETURN_MSRS];
- int nr_uret_msrs;
int nr_active_uret_msrs;
bool guest_uret_msrs_loaded;
#ifdef CONFIG_X86_64
--
2.31.1.527.g47e6f16901-goog

2021-05-04 22:18:13

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 03/15] KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest

On Tue, May 4, 2021 at 2:53 PM Sean Christopherson <[email protected]> wrote:
>
> On Tue, May 04, 2021, Jim Mattson wrote:
> > On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <[email protected]> wrote:
> > >
> > > Intercept RDTSCP to inject #UD if RDTSCP is disabled in the guest.
> > >
> > > Note, SVM does not support intercepting RDPID. Unlike VMX's
> > > ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID. This
> > > is a benign virtualization hole as the host kernel (incorrectly) sets
> > > MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
> > > into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
> > > the host's MSR_TSC_AUX to the guest.
> > >
> > > But, when the kernel bug is fixed, KVM will start leaking the host's
> > > MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
> > > for whatever reason. This leak will be remedied in a future commit.
> > >
> > > Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> > > Cc: [email protected]
> > > Signed-off-by: Sean Christopherson <[email protected]>
> > > ---
> > ...
> > > @@ -4007,8 +4017,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
> > > guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);
> > >
> > > - /* Check again if INVPCID interception if required */
> > > - svm_check_invpcid(svm);
> > > + svm_recalc_instruction_intercepts(vcpu, svm);
> >
> > Does the right thing happen here if the vCPU is in guest mode when
> > userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> > off?
>
> I hate our terminology. By "guest mode", do you mean running the vCPU, or do
> you specifically mean running in L2?

I mean is_guest_mode(vcpu) is true (i.e. running L2).

2021-05-04 22:28:02

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 03/15] KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest

On 04/05/21 23:53, Sean Christopherson wrote:
>> Does the right thing happen here if the vCPU is in guest mode when
>> userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
>> off?
> I hate our terminology. By "guest mode", do you mean running the vCPU, or do
> you specifically mean running in L2?
>

Guest mode should mean L2.

(I wonder if we should have a capability that says "KVM_SET_CPUID2 can
only be called prior to KVM_RUN").
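
For illustration, a purely hypothetical sketch of such a check (the
"cpuid_frozen_after_run" flag is made up for this example; last_vmentry_cpu
is an existing field that stays -1 until the vCPU first enters the guest):

	static int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
					     struct kvm_cpuid2 __user *cpuid_arg)
	{
		/*
		 * Hypothetical: once userspace opts into the capability,
		 * refuse CPUID updates after the vCPU has run even once.
		 */
		if (vcpu->kvm->arch.cpuid_frozen_after_run &&
		    vcpu->arch.last_vmentry_cpu != -1)
			return -EBUSY;

		/* ... existing KVM_SET_CPUID2 handling ... */
		return 0;
	}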

Paolo

2021-05-04 22:29:34

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 03/15] KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest

On Tue, May 4, 2021 at 2:57 PM Paolo Bonzini <[email protected]> wrote:
>
> On 04/05/21 23:53, Sean Christopherson wrote:
> >> Does the right thing happen here if the vCPU is in guest mode when
> >> userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> >> off?
> > I hate our terminology. By "guest mode", do you mean running the vCPU, or do
> > you specifically mean running in L2?
> >
>
> Guest mode should mean L2.
>
> (I wonder if we should have a capability that says "KVM_SET_CPUID2 can
> only be called prior to KVM_RUN").

It would certainly make it easier to reason about potential security issues.

2021-05-04 22:31:13

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 03/15] KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest

On Tue, May 4, 2021 at 3:10 PM Sean Christopherson <[email protected]> wrote:
>
> On Tue, May 04, 2021, Jim Mattson wrote:
> > On Tue, May 4, 2021 at 2:53 PM Sean Christopherson <[email protected]> wrote:
> > >
> > > On Tue, May 04, 2021, Jim Mattson wrote:
> > > > On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <[email protected]> wrote:
> > > > >
> > > > > Intercept RDTSCP to inject #UD if RDTSCP is disabled in the guest.
> > > > >
> > > > > Note, SVM does not support intercepting RDPID. Unlike VMX's
> > > > > ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID. This
> > > > > is a benign virtualization hole as the host kernel (incorrectly) sets
> > > > > MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
> > > > > into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
> > > > > the host's MSR_TSC_AUX to the guest.
> > > > >
> > > > > But, when the kernel bug is fixed, KVM will start leaking the host's
> > > > > MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
> > > > > for whatever reason. This leak will be remedied in a future commit.
> > > > >
> > > > > Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> > > > > Cc: [email protected]
> > > > > Signed-off-by: Sean Christopherson <[email protected]>
> > > > > ---
> > > > ...
> > > > > @@ -4007,8 +4017,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > > > svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
> > > > > guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);
> > > > >
> > > > > - /* Check again if INVPCID interception if required */
> > > > > - svm_check_invpcid(svm);
> > > > > + svm_recalc_instruction_intercepts(vcpu, svm);
> > > >
> > > > Does the right thing happen here if the vCPU is in guest mode when
> > > > userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> > > > off?
> > >
> > > I hate our terminology. By "guest mode", do you mean running the vCPU, or do
> > > you specifically mean running in L2?
> >
> > I mean is_guest_mode(vcpu) is true (i.e. running L2).
>
> No, it will not do the right thing, whatever "right thing" even means in this
> context. That's a pre-existing issue, e.g. INVPCID handling is also wrong.
> I highly doubt VMX does, or even can, do the right thing either.
>
> I'm pretty sure I lobbied in the past to disallow KVM_SET_CPUID* if the vCPU is
> in guest mode since it's impossible to do the right thing without forcing an
> exit to L1, e.g. changing MAXPHYSADDR will allow running L2 with an illegal
> CR3, ditto for various CR4 bits.

With that caveat understood,

Reviewed-by: Jim Mattson <[email protected]>

2021-05-04 23:38:33

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 03/15] KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest

On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <[email protected]> wrote:
>
> Intercept RDTSCP to inject #UD if RDTSCP is disabled in the guest.
>
> Note, SVM does not support intercepting RDPID. Unlike VMX's
> ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID. This
> is a benign virtualization hole as the host kernel (incorrectly) sets
> MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
> into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
> the host's MSR_TSC_AUX to the guest.
>
> But, when the kernel bug is fixed, KVM will start leaking the host's
> MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
> for whatever reason. This leak will be remedied in a future commit.
>
> Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
...
> @@ -4007,8 +4017,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
> guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);
>
> - /* Check again if INVPCID interception if required */
> - svm_check_invpcid(svm);
> + svm_recalc_instruction_intercepts(vcpu, svm);

Does the right thing happen here if the vCPU is in guest mode when
userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
off?

2021-05-04 23:38:46

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 03/15] KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest

On Tue, May 04, 2021, Jim Mattson wrote:
> On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <[email protected]> wrote:
> >
> > Intercept RDTSCP to inject #UD if RDTSCP is disabled in the guest.
> >
> > Note, SVM does not support intercepting RDPID. Unlike VMX's
> > ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID. This
> > is a benign virtualization hole as the host kernel (incorrectly) sets
> > MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
> > into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
> > the host's MSR_TSC_AUX to the guest.
> >
> > But, when the kernel bug is fixed, KVM will start leaking the host's
> > MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
> > for whatever reason. This leak will be remedied in a future commit.
> >
> > Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> > Cc: [email protected]
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> ...
> > @@ -4007,8 +4017,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
> > guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);
> >
> > - /* Check again if INVPCID interception if required */
> > - svm_check_invpcid(svm);
> > + svm_recalc_instruction_intercepts(vcpu, svm);
>
> Does the right thing happen here if the vCPU is in guest mode when
> userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> off?

I hate our terminology. By "guest mode", do you mean running the vCPU, or do
you specifically mean running in L2?

2021-05-04 23:38:52

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 03/15] KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest

On Tue, May 04, 2021, Jim Mattson wrote:
> On Tue, May 4, 2021 at 2:53 PM Sean Christopherson <[email protected]> wrote:
> >
> > On Tue, May 04, 2021, Jim Mattson wrote:
> > > On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <[email protected]> wrote:
> > > >
> > > > Intercept RDTSCP to inject #UD if RDTSCP is disabled in the guest.
> > > >
> > > > Note, SVM does not support intercepting RDPID. Unlike VMX's
> > > > ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID. This
> > > > is a benign virtualization hole as the host kernel (incorrectly) sets
> > > > MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
> > > > into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
> > > > the host's MSR_TSC_AUX to the guest.
> > > >
> > > > But, when the kernel bug is fixed, KVM will start leaking the host's
> > > > MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
> > > > for whatever reason. This leak will be remedied in a future commit.
> > > >
> > > > Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> > > > Cc: [email protected]
> > > > Signed-off-by: Sean Christopherson <[email protected]>
> > > > ---
> > > ...
> > > > @@ -4007,8 +4017,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > > svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
> > > > guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);
> > > >
> > > > - /* Check again if INVPCID interception if required */
> > > > - svm_check_invpcid(svm);
> > > > + svm_recalc_instruction_intercepts(vcpu, svm);
> > >
> > > Does the right thing happen here if the vCPU is in guest mode when
> > > userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> > > off?
> >
> > I hate our terminology. By "guest mode", do you mean running the vCPU, or do
> > you specifically mean running in L2?
>
> I mean is_guest_mode(vcpu) is true (i.e. running L2).

No, it will not do the right thing, whatever "right thing" even means in this
context. That's a pre-existing issue, e.g. INVPCID handling is also wrong.
I highly doubt VMX does, or even can, do the right thing either.

I'm pretty sure I lobbied in the past to disallow KVM_SET_CPUID* if the vCPU is
in guest mode since it's impossible to do the right thing without forcing an
exit to L1, e.g. changing MAXPHYSADDR will allow running L2 with an illegal
CR3, ditto for various CR4 bits.

2021-05-04 23:41:06

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 04/15] KVM: x86: Move RDPID emulation intercept to its own enum

On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <[email protected]> wrote:
>
> Add a dedicated intercept enum for RDPID instead of piggybacking RDTSCP.
> Unlike VMX's ENABLE_RDTSCP, RDPID is not bound to SVM's RDTSCP intercept.
>
> Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID")
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>

2021-05-05 00:13:41

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 05/15] KVM: VMX: Disable preemption when probing user return MSRs

On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <[email protected]> wrote:
>
> Disable preemption when probing a user return MSR via RDMSR/WRMSR. If
> the MSR holds a different value per logical CPU, the WRMSR could corrupt
> the host's value if KVM is preempted between the RDMSR and WRMSR, and
> then rescheduled on a different CPU.
>
> Opportunistically land the helper in common x86, SVM will use the helper
> in a future commit.
>
> Fixes: 4be534102624 ("KVM: VMX: Initialize vmx->guest_msrs[] right after allocation")
> Cc: [email protected]
> Cc: Xiaoyao Li <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>

2021-05-05 03:20:31

by Reiji Watanabe

[permalink] [raw]
Subject: Re: [PATCH 01/15] KVM: VMX: Do not advertise RDPID if ENABLE_RDTSCP control is unsupported

On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <[email protected]> wrote:
>
> Clear KVM's RDPID capability if the ENABLE_RDTSCP secondary exec control is
> unsupported. Despite being enumerated in a separate CPUID flag, RDPID is
> bundled under the same VMCS control as RDTSCP and will #UD in VMX non-root
> if ENABLE_RDTSCP is not enabled.
>
> Fixes: 41cd02c6f7f6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>

Reviewed-by: Reiji Watanabe <[email protected]>

2021-05-05 03:52:59

by Reiji Watanabe

[permalink] [raw]
Subject: Re: [PATCH 02/15] KVM: x86: Emulate RDPID only if RDTSCP is supported

> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -637,7 +637,8 @@ static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
> case 7:
> entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> entry->eax = 0;
> - entry->ecx = F(RDPID);
> + if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
> + entry->ecx = F(RDPID);
> ++array->nent;
> default:
> break;

I'm wondering if entry->ecx should be set to F(RDPID) here
even if the CPU supports RDPID natively.
(i.e. kvm_cpu_cap_has(X86_FEATURE_RDPID) is true)

The document "Documentation/virt/kvm/api.rst" says:
---
4.88 KVM_GET_EMULATED_CPUID
---------------------------
<...>
Userspace can use the information returned by this ioctl to query
which features are emulated by kvm instead of being present natively.
---

Thanks,
Reiji

2021-05-05 04:33:22

by Reiji Watanabe

[permalink] [raw]
Subject: Re: [PATCH 03/15] KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest

On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <[email protected]> wrote:
>
> Intercept RDTSCP to inject #UD if RDTSCP is disabled in the guest.
>
> Note, SVM does not support intercepting RDPID. Unlike VMX's
> ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID. This
> is a benign virtualization hole as the host kernel (incorrectly) sets
> MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
> into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
> the host's MSR_TSC_AUX to the guest.
>
> But, when the kernel bug is fixed, KVM will start leaking the host's
> MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
> for whatever reason. This leak will be remedied in a future commit.
>
> Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>

Reviewed-by: Reiji Watanabe <[email protected]>

2021-05-05 08:03:20

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 02/15] KVM: x86: Emulate RDPID only if RDTSCP is supported

On 05/05/21 05:51, Reiji Watanabe wrote:
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -637,7 +637,8 @@ static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
>> case 7:
>> entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
>> entry->eax = 0;
>> - entry->ecx = F(RDPID);
>> + if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
>> + entry->ecx = F(RDPID);
>> ++array->nent;
>> default:
>> break;
> I'm wondering if entry->ecx should be set to F(RDPID) here
> even if the CPU supports RDPID natively.
> (i.e. kvm_cpu_cap_has(X86_FEATURE_RDPID) is true)
>
> The document "Documentation/virt/kvm/api.rst" says:
> ---
> 4.88 KVM_GET_EMULATED_CPUID
> ---------------------------
> <...>
> Userspace can use the information returned by this ioctl to query
> which features are emulated by kvm instead of being present natively.
> ---

Setting it always is consistent with the treatment of MOVBE above.
Either way is okay but it should be done for both bits.
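
I.e., the "always set" option is simply the pre-patch code (sketch; the
other option would apply an equivalent condition to MOVBE as well):

	case 7:
		entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
		entry->eax = 0;
		entry->ecx = F(RDPID);	/* unconditional, like F(MOVBE) in case 1 */
		++array->nent;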

Paolo

2021-05-05 08:53:51

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 00/15] KVM: x86: RDPID/RDTSCP fixes and uret MSR cleanups

On 04/05/21 19:17, Sean Christopherson wrote:
> This is a continuation of a less ambitious effort to unify MSR_TSC_AUX
> handling across SVM and VMX. Reiji pointed out that MSR_TSC_AUX exists if
> RDTSCP *or* RDPID is supported, and things went downhill from there.
>
> The first half of this series fixes a variety of RDTSCP and RDPID related
> bugs.
>
> The second half of the series cleans up VMX's user return MSR framework
> and consolidates more of the uret logic into common x86.
>
> The last two patches leverage the uret MSR cleanups to move MSR_TSC_AUX
> handling to common x86 and add sanity checks to guard against misreporting
> of RDPID and/or RDTSCP support.
>
> This will conflict with my vCPU RESET/INIT cleanup series. Feel free to
> punt the conflicts to me.
>
> Other "fun" things to tackle:
>
> - The kernel proper also botches RDPID vs. RDTSCP, as MSR_TSC_AUX is
> configured if RDTSCP is supported, but is consumed if RDPID is
> supported. I'll send this fix separately.
>
> - Commit 844d69c26d83 ("KVM: SVM: Delay restoration of host MSR_TSC_AUX
> until return to userspace") unwittingly fixed a bug where KVM would
> write MSR_TSC_AUX with the guest's value when svm->guest_state_loaded
> is false, which could lead to running the host with the guest's value.
> The bug only exists in 5.12 (maybe 5.11 too?), so crafting a fix for
> stable won't be too awful.
>
> Sean Christopherson (15):
> KVM: VMX: Do not advertise RDPID if ENABLE_RDTSCP control is
> unsupported
> KVM: x86: Emulate RDPID only if RDTSCP is supported
> KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest
> KVM: x86: Move RDPID emulation intercept to its own enum
> KVM: VMX: Disable preemption when probing user return MSRs
> KVM: SVM: Probe and load MSR_TSC_AUX regardless of RDTSCP support in
> host
> KVM: x86: Add support for RDPID without RDTSCP
> KVM: VMX: Configure list of user return MSRs at module init
> KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting
> list
> KVM: VMX: Use common x86's uret MSR list as the one true list
> KVM: VMX: Disable loading of TSX_CTRL MSR the more conventional way
> KVM: x86: Export the number of uret MSRs to vendor modules
> KVM: x86: Move uret MSR slot management to common x86
> KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU
> model
> KVM: x86: Hide RDTSCP and RDPID if MSR_TSC_AUX probing failed
>
> arch/x86/include/asm/kvm_host.h | 9 +-
> arch/x86/kvm/cpuid.c | 18 ++-
> arch/x86/kvm/emulate.c | 2 +-
> arch/x86/kvm/kvm_emulate.h | 1 +
> arch/x86/kvm/svm/svm.c | 50 +++-----
> arch/x86/kvm/vmx/vmx.c | 217 ++++++++++++++++----------------
> arch/x86/kvm/vmx/vmx.h | 12 +-
> arch/x86/kvm/x86.c | 101 ++++++++++++---
> 8 files changed, 245 insertions(+), 165 deletions(-)
>

Queued, thanks.

Paolo

2021-05-05 09:20:28

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 11/15] KVM: VMX: Disable loading of TSX_CTRL MSR the more conventional way

On 04/05/21 19:17, Sean Christopherson wrote:
> Tag TSX_CTRL as not needing to be loaded when RTM isn't supported in the
> host. Crushing the write mask to '0' has the same effect, but requires
> more mental gymnastics to understand.

This doesn't explain _why_ this is now possible. What about:

Now that user return MSRs is always present in the list, we don't have
the problem that the TSX_CTRL MSR needs a slot vmx->guest_uret_msrs even
if RTM is not supported in the host (and therefore there is nothing to
enable). Thus we can simply tag TSX_CTRL as not needing to be loaded
instead of crushing the write mask to '0'. That has the same effect,
but requires less mental gymnastics to understand.

Paolo

> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 22 ++++++++++------------
> 1 file changed, 10 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4b432d2bbd06..7a53568b34fc 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1771,7 +1771,13 @@ static void setup_msrs(struct vcpu_vmx *vmx)
> guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
> guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDPID));
>
> - vmx_setup_uret_msr(vmx, MSR_IA32_TSX_CTRL, true);
> + /*
> + * hle=0, rtm=0, tsx_ctrl=1 can be found with some combinations of new
> + * kernel and old userspace. If those guests run on a tsx=off host, do
> + * allow guests to use TSX_CTRL, but don't change the value in hardware
> + * so that TSX remains always disabled.
> + */
> + vmx_setup_uret_msr(vmx, MSR_IA32_TSX_CTRL, boot_cpu_has(X86_FEATURE_RTM));
>
> if (cpu_has_vmx_msr_bitmap())
> vmx_update_msr_bitmap(&vmx->vcpu);
> @@ -6919,23 +6925,15 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
> vmx->guest_uret_msrs[i].data = 0;
> vmx->guest_uret_msrs[i].mask = -1ull;
> }
> - tsx_ctrl = vmx_find_uret_msr(vmx, MSR_IA32_TSX_CTRL);
> - if (tsx_ctrl) {
> + if (boot_cpu_has(X86_FEATURE_RTM)) {
> /*
> * TSX_CTRL_CPUID_CLEAR is handled in the CPUID interception.
> * Keep the host value unchanged to avoid changing CPUID bits
> * under the host kernel's feet.
> - *
> - * hle=0, rtm=0, tsx_ctrl=1 can be found with some combinations
> - * of new kernel and old userspace. If those guests run on a
> - * tsx=off host, do allow guests to use TSX_CTRL, but do not
> - * change the value on the host so that TSX remains always
> - * disabled.
> */
> - if (boot_cpu_has(X86_FEATURE_RTM))
> + tsx_ctrl = vmx_find_uret_msr(vmx, MSR_IA32_TSX_CTRL);
> + if (tsx_ctrl)
> vmx->guest_uret_msrs[i].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
> - else
> - vmx->guest_uret_msrs[i].mask = 0;
> }
>
> err = alloc_loaded_vmcs(&vmx->vmcs01);
>

2021-05-05 15:40:52

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 11/15] KVM: VMX: Disable loading of TSX_CTRL MSR the more conventional way

On Wed, May 05, 2021, Paolo Bonzini wrote:
> On 04/05/21 19:17, Sean Christopherson wrote:
> > Tag TSX_CTRL as not needing to be loaded when RTM isn't supported in the
> > host. Crushing the write mask to '0' has the same effect, but requires
> > more mental gymnastics to understand.
>
> This doesn't explain _why_ this is now possible. What about:
>
> Now that user return MSRs are always present in the list, we don't have

User return MSRs aren't always present in the list; this series doesn't change
that behavior at all.

> the problem that the TSX_CTRL MSR needs a slot in vmx->guest_uret_msrs even
> if RTM is not supported in the host (and therefore there is nothing to
> enable). Thus we can simply tag TSX_CTRL as not needing to be loaded
> instead of crushing the write mask to '0'.

Unless I'm missing something, it would have been possible to give TSX_CTRL a
slot but not load it even before this refactoring; we just missed that approach
when handling the TSX_CTRL without HLE/RTM case. Several other MSRs rely on
this behavior, notably the SYSCALL MSRs, which are present in the list so that
the guest can read/write the MSRs, but are loaded into hardware iff the guest
has enabled SYSCALL.
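
Concretely, that's the pattern setup_msrs() ends up with after the uret
rework later in this series: the SYSCALL MSRs always keep their slots, and
only the load-into-hardware decision tracks whether SYSCALL is usable,
roughly (fragment from inside setup_msrs(), CONFIG_X86_64):

	/* Loaded into hardware only when the guest can execute SYSCALL. */
	load_syscall_msrs = is_long_mode(&vmx->vcpu) &&
			    (vmx->vcpu.arch.efer & EFER_SCE);

	vmx_setup_uret_msr(vmx, MSR_STAR, load_syscall_msrs);
	vmx_setup_uret_msr(vmx, MSR_LSTAR, load_syscall_msrs);
	vmx_setup_uret_msr(vmx, MSR_SYSCALL_MASK, load_syscall_msrs);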

All that said, I certainly have no objection to writing a longer changelog.

2021-05-05 15:54:10

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 11/15] KVM: VMX: Disable loading of TSX_CTRL MSR the more conventional way

On 05/05/21 17:36, Sean Christopherson wrote:
> On Wed, May 05, 2021, Paolo Bonzini wrote:
>> On 04/05/21 19:17, Sean Christopherson wrote:
>>> Tag TSX_CTRL as not needing to be loaded when RTM isn't supported in the
>>> host. Crushing the write mask to '0' has the same effect, but requires
>>> more mental gymnastics to understand.
>>
>> This doesn't explain _why_ this is now possible. What about:
>>
>> Now that user return MSRs are always present in the list, we don't have
>
> User return MSRs aren't always present in the list; this series doesn't change
> that behavior at all.
>
>> the problem that the TSX_CTRL MSR needs a slot in vmx->guest_uret_msrs even
>> if RTM is not supported in the host (and therefore there is nothing to
>> enable). Thus we can simply tag TSX_CTRL as not needing to be loaded
>> instead of crushing the write mask to '0'.
>
> Unless I'm missing something, it would have been possible to give TSX_CTRL a
> slot but not load it even before this refactoring; we just missed that approach
> when handling the TSX_CTRL without HLE/RTM case. Several other MSRs rely on
> this behavior, notably the SYSCALL MSRs, which are present in the list so that
> the guest can read/write the MSRs, but are loaded into hardware iff the guest
> has enabled SYSCALL.

You're right, it used to be done with vmx->nr_active_uret_msrs.

Paolo

> All that said, I certainly have no objection to writing a longer changelog.

2021-05-08 03:33:32

by Reiji Watanabe

[permalink] [raw]
Subject: Re: [PATCH 09/15] KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list

> -static void vmx_setup_uret_msr(struct vcpu_vmx *vmx, unsigned int msr)
> +static void vmx_setup_uret_msr(struct vcpu_vmx *vmx, unsigned int msr,
> + bool load_into_hardware)
> {
> - struct vmx_uret_msr tmp;
> - int from, to;
> + struct vmx_uret_msr *uret_msr;
>
> - from = __vmx_find_uret_msr(vmx, msr);
> - if (from < 0)
> + uret_msr = vmx_find_uret_msr(vmx, msr);
> + if (!uret_msr)
> return;
> - to = vmx->nr_active_uret_msrs++;
>
> - tmp = vmx->guest_uret_msrs[to];
> - vmx->guest_uret_msrs[to] = vmx->guest_uret_msrs[from];
> - vmx->guest_uret_msrs[from] = tmp;
> + uret_msr->load_into_hardware = load_into_hardware;
> }
>
> /*
> @@ -1785,30 +1781,36 @@ static void vmx_setup_uret_msr(struct vcpu_vmx *vmx, unsigned int msr)
> */
> static void setup_msrs(struct vcpu_vmx *vmx)
> {
> - vmx->guest_uret_msrs_loaded = false;
> - vmx->nr_active_uret_msrs = 0;
> #ifdef CONFIG_X86_64
> + bool load_syscall_msrs;
> +
> /*
> * The SYSCALL MSRs are only needed on long mode guests, and only
> * when EFER.SCE is set.
> */
> - if (is_long_mode(&vmx->vcpu) && (vmx->vcpu.arch.efer & EFER_SCE)) {
> - vmx_setup_uret_msr(vmx, MSR_STAR);
> - vmx_setup_uret_msr(vmx, MSR_LSTAR);
> - vmx_setup_uret_msr(vmx, MSR_SYSCALL_MASK);
> - }
> + load_syscall_msrs = is_long_mode(&vmx->vcpu) &&
> + (vmx->vcpu.arch.efer & EFER_SCE);
> +
> + vmx_setup_uret_msr(vmx, MSR_STAR, load_syscall_msrs);
> + vmx_setup_uret_msr(vmx, MSR_LSTAR, load_syscall_msrs);
> + vmx_setup_uret_msr(vmx, MSR_SYSCALL_MASK, load_syscall_msrs);
> #endif
> - if (update_transition_efer(vmx))
> - vmx_setup_uret_msr(vmx, MSR_EFER);
> + vmx_setup_uret_msr(vmx, MSR_EFER, update_transition_efer(vmx));
>
> - if (guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
> - guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDPID))
> - vmx_setup_uret_msr(vmx, MSR_TSC_AUX);
> + vmx_setup_uret_msr(vmx, MSR_TSC_AUX,
> + guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
> + guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDPID));

Shouldn't vmx_setup_uret_msr(,MSR_TSC_AUX,) be called to update
the new load_into_hardware flag for MSR_TSC_AUX when the vCPU's CPUID
(X86_FEATURE_RDTSCP/X86_FEATURE_RDPID) is updated?
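
E.g., a rough, untested sketch (assuming setup_msrs() is safe to call from
the after-set-CPUID path):

	static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
	{
		struct vcpu_vmx *vmx = to_vmx(vcpu);

		/* ... existing CPUID-dependent updates ... */

		/* Recompute load_into_hardware for the affected uret MSRs. */
		setup_msrs(vmx);
	}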


Thanks,
Reiji

2021-05-10 08:04:26

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 01/15] KVM: VMX: Do not advertise RDPID if ENABLE_RDTSCP control is unsupported

On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> Clear KVM's RDPID capability if the ENABLE_RDTSCP secondary exec control is
> unsupported. Despite being enumerated in a separate CPUID flag, RDPID is
> bundled under the same VMCS control as RDTSCP and will #UD in VMX non-root
> if ENABLE_RDTSCP is not enabled.
>
> Fixes: 41cd02c6f7f6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 10b610fc7bbc..82404ee2520e 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7377,9 +7377,11 @@ static __init void vmx_set_cpu_caps(void)
> if (!cpu_has_vmx_xsaves())
> kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
>
> - /* CPUID 0x80000001 */
> - if (!cpu_has_vmx_rdtscp())
> + /* CPUID 0x80000001 and 0x7 (RDPID) */
> + if (!cpu_has_vmx_rdtscp()) {
> kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);
> + kvm_cpu_cap_clear(X86_FEATURE_RDPID);
> + }
>
> if (cpu_has_vmx_waitpkg())
> kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2021-05-10 08:10:12

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 02/15] KVM: x86: Emulate RDPID only if RDTSCP is supported

On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> Do not advertise emulation support for RDPID if RDTSCP is unsupported.
> RDPID emulation subtly relies on MSR_TSC_AUX to exist in hardware, as
> both vmx_get_msr() and svm_get_msr() will return an error if the MSR is
> unsupported, i.e. ctxt->ops->get_msr() will fail and the emulator will
> inject a #UD.
>
> Note, RDPID emulation also relies on RDTSCP being enabled in the guest,
> but this is a KVM bug and will eventually be fixed.
>
> Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID")
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/cpuid.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index f765bf7a529c..c96f79c9fff2 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -637,7 +637,8 @@ static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
> case 7:
> entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> entry->eax = 0;
> - entry->ecx = F(RDPID);
> + if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
> + entry->ecx = F(RDPID);
> ++array->nent;
> default:
> break;

Just to make sure that I understand this correctly:

This is what I know:

Both RDTSCP and RDPID are instructions that read IA32_TSC_AUX
(and RDTSCP also reads the TSC).

Both instructions have their own CPUID bits (X86_FEATURE_RDPID, X86_FEATURE_RDTSCP)
If either of these CPUID bits is present, IA32_TSC_AUX should be supported.


RDPID is a newer feature, thus for sanity's sake I can at least assume that
usually a CPU will either have neither of the features, have only RDTSCP
and IA32_TSC_AUX, or have both RDTSCP and RDPID.

If RDPID is not supported in hardware, KVM only emulates it, as far as I can see.

Why, btw? Performance-wise, a guest that only wants IA32_TSC_AUX in userspace
is better off using RDTSCP and paying the penalty of saving/restoring the
unwanted registers than using RDPID with a vmexit.
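
For reference, a minimal 64-bit userspace sketch of the difference
(illustrative only):

	static inline unsigned int tsc_aux_via_rdtscp(void)
	{
		unsigned int lo, hi, aux;

		/* RDTSCP returns the TSC in EDX:EAX and IA32_TSC_AUX in ECX. */
		asm volatile("rdtscp" : "=a" (lo), "=d" (hi), "=c" (aux));
		return aux;	/* the TSC halves in lo/hi are unwanted here */
	}

	static inline unsigned long tsc_aux_via_rdpid(void)
	{
		unsigned long aux;

		/* RDPID reads only IA32_TSC_AUX and clobbers nothing else. */
		asm volatile("rdpid %0" : "=r" (aux));
		return aux;
	}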

My own guess for an answer to this question is that RDPID emulation is there
to aid migration from a host that does support RDPID to a host that doesn't.

Having said all that, assuming that we don't want to emulate RDTSCP too
when it is not supported, this patch does make sense.

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky


2021-05-10 08:11:07

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 03/15] KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest

On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> Intercept RDTSCP to inject #UD if RDTSCP is disabled in the guest.
>
> Note, SVM does not support intercepting RDPID. Unlike VMX's
> ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID. This
> is a benign virtualization hole as the host kernel (incorrectly) sets
> MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
> into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
> the host's MSR_TSC_AUX to the guest.
>
> But, when the kernel bug is fixed, KVM will start leaking the host's
> MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
> for whatever reason. This leak will be remedied in a future commit.
>
> Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 17 +++++++++++++----
> 1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index a7271f31df47..8f2b184270c0 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1100,7 +1100,9 @@ static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> return svm->vmcb->control.tsc_offset;
> }
>
> -static void svm_check_invpcid(struct vcpu_svm *svm)
> +/* Evaluate instruction intercepts that depend on guest CPUID features. */
> +static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
> + struct vcpu_svm *svm)
> {
> /*
> * Intercept INVPCID if shadow paging is enabled to sync/free shadow
> @@ -1113,6 +1115,13 @@ static void svm_check_invpcid(struct vcpu_svm *svm)
> else
> svm_clr_intercept(svm, INTERCEPT_INVPCID);
> }
> +
> + if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP)) {
> + if (guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> + svm_clr_intercept(svm, INTERCEPT_RDTSCP);
> + else
> + svm_set_intercept(svm, INTERCEPT_RDTSCP);
> + }
> }
>
> static void init_vmcb(struct kvm_vcpu *vcpu)
> @@ -1248,7 +1257,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
> svm_clr_intercept(svm, INTERCEPT_PAUSE);
> }
>
> - svm_check_invpcid(svm);
> + svm_recalc_instruction_intercepts(vcpu, svm);
>
> /*
> * If the host supports V_SPEC_CTRL then disable the interception
> @@ -3084,6 +3093,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
> [SVM_EXIT_STGI] = stgi_interception,
> [SVM_EXIT_CLGI] = clgi_interception,
> [SVM_EXIT_SKINIT] = skinit_interception,
> + [SVM_EXIT_RDTSCP] = kvm_handle_invalid_op,
> [SVM_EXIT_WBINVD] = kvm_emulate_wbinvd,
> [SVM_EXIT_MONITOR] = kvm_emulate_monitor,
> [SVM_EXIT_MWAIT] = kvm_emulate_mwait,
> @@ -4007,8 +4017,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
> guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);
>
> - /* Check again if INVPCID interception if required */
> - svm_check_invpcid(svm);
> + svm_recalc_instruction_intercepts(vcpu, svm);
>
> /* For sev guests, the memory encryption bit is not reserved in CR3. */
> if (sev_guest(vcpu->kvm)) {
Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2021-05-10 08:11:24

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 03/15] KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest

On Tue, 2021-05-04 at 14:58 -0700, Jim Mattson wrote:
> On Tue, May 4, 2021 at 2:57 PM Paolo Bonzini <[email protected]> wrote:
> > On 04/05/21 23:53, Sean Christopherson wrote:
> > > > Does the right thing happen here if the vCPU is in guest mode when
> > > > userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> > > > off?
> > > I hate our terminology. By "guest mode", do you mean running the vCPU, or do
> > > you specifically mean running in L2?
> > >
> >
> > Guest mode should mean L2.
> >
> > (I wonder if we should have a capability that says "KVM_SET_CPUID2 can
> > only be called prior to KVM_RUN").
>
> It would certainly make it easier to reason about potential security issues.
>
I vote for this too.
Best regards,
Maxim Levitsky

2021-05-10 08:17:24

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 04/15] KVM: x86: Move RDPID emulation intercept to its own enum

On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> Add a dedicated intercept enum for RDPID instead of piggybacking RDTSCP.
> Unlike VMX's ENABLE_RDTSCP, RDPID is not bound to SVM's RDTSCP intercept.
>
> Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID")
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/emulate.c | 2 +-
> arch/x86/kvm/kvm_emulate.h | 1 +
> arch/x86/kvm/vmx/vmx.c | 3 ++-
> 3 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index abd9a4db11a8..8fc71e70857d 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -4502,7 +4502,7 @@ static const struct opcode group8[] = {
> * from the register case of group9.
> */
> static const struct gprefix pfx_0f_c7_7 = {
> - N, N, N, II(DstMem | ModRM | Op3264 | EmulateOnUD, em_rdpid, rdtscp),
> + N, N, N, II(DstMem | ModRM | Op3264 | EmulateOnUD, em_rdpid, rdpid),
> };
>
>
> diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
> index 0d359115429a..f016838faedd 100644
> --- a/arch/x86/kvm/kvm_emulate.h
> +++ b/arch/x86/kvm/kvm_emulate.h
> @@ -468,6 +468,7 @@ enum x86_intercept {
> x86_intercept_clgi,
> x86_intercept_skinit,
> x86_intercept_rdtscp,
> + x86_intercept_rdpid,
> x86_intercept_icebp,
> x86_intercept_wbinvd,
> x86_intercept_monitor,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 82404ee2520e..99591e523b47 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7437,8 +7437,9 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu,
> /*
> * RDPID causes #UD if disabled through secondary execution controls.
> * Because it is marked as EmulateOnUD, we need to intercept it here.
> + * Note, RDPID is hidden behind ENABLE_RDTSCP.
> */
> - case x86_intercept_rdtscp:
> + case x86_intercept_rdpid:
Shouldn't this path still handle the x86_intercept_rdtscp as I described below,
or should we remove it from the SVM side as well?

> if (!nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_RDTSCP)) {
> exception->vector = UD_VECTOR;
> exception->error_code_valid = false;

I have a possibly unrelated question about something that caught my eye:
I see this:

DIP(SrcNone, rdtscp, check_rdtsc),

As far as I can see, this means that if a nested guest executes rdtscp and L1
intercepts it, we will emulate the rdtscp by doing a nested VM exit, but if we
emulate a rdtscp for L1, we will fail since there is no .execute callback.

Is this intentional? As I understand it, at least in theory the emulator can be called
on any instruction due to things like lack of unrestricted guest, and/or emulating an
instruction on page fault (although the latter is usually done by re-executing the instruction).

I know that the x86 emulator is far from complete for such cases, but I do
wonder why rdtscp has different behavior in the nested and non-nested cases.

So this patch (since it removes the x86_intercept_rdtscp handling from VMX)
should break rdtscp emulation for a nested guest on VMX, although it is
probably not used anyway and should be removed.
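
FWIW, a purely hypothetical .execute callback, modeled on em_rdpid()
(untested; reading MSR_IA32_TSC through the emulator ops is an assumption):

	static int em_rdtscp(struct x86_emulate_ctxt *ctxt)
	{
		u64 tsc, tsc_aux;

		if (ctxt->ops->get_msr(ctxt, MSR_TSC_AUX, &tsc_aux))
			return emulate_ud(ctxt);
		if (ctxt->ops->get_msr(ctxt, MSR_IA32_TSC, &tsc))
			return emulate_ud(ctxt);

		/* RDTSCP = RDTSC (EDX:EAX) plus ECX := IA32_TSC_AUX. */
		*reg_write(ctxt, VCPU_REGS_RAX) = (u32)tsc;
		*reg_write(ctxt, VCPU_REGS_RDX) = tsc >> 32;
		*reg_write(ctxt, VCPU_REGS_RCX) = (u32)tsc_aux;
		return X86EMUL_CONTINUE;
	}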

Best regards,
Maxim Levitsky


2021-05-10 08:19:53

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 05/15] KVM: VMX: Disable preemption when probing user return MSRs

On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> Disable preemption when probing a user return MSR via RDMSR/WRMSR. If
> the MSR holds a different value per logical CPU, the WRMSR could corrupt
> the host's value if KVM is preempted between the RDMSR and WRMSR, and
> then rescheduled on a different CPU.
>
> Opportunistically land the helper in common x86, SVM will use the helper
> in a future commit.
>
> Fixes: 4be534102624 ("KVM: VMX: Initialize vmx->guest_msrs[] right after allocation")
> Cc: [email protected]
> Cc: Xiaoyao Li <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/vmx/vmx.c | 5 +----
> arch/x86/kvm/x86.c | 16 ++++++++++++++++
> 3 files changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3e5fc80a35c8..a02c9bf3f7f1 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1778,6 +1778,7 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
> unsigned long icr, int op_64_bit);
>
> void kvm_define_user_return_msr(unsigned index, u32 msr);
> +int kvm_probe_user_return_msr(u32 msr);
> int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
>
> u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 99591e523b47..990ee339a05f 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6914,12 +6914,9 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
>
> for (i = 0; i < ARRAY_SIZE(vmx_uret_msrs_list); ++i) {
> u32 index = vmx_uret_msrs_list[i];
> - u32 data_low, data_high;
> int j = vmx->nr_uret_msrs;
>
> - if (rdmsr_safe(index, &data_low, &data_high) < 0)
> - continue;
> - if (wrmsr_safe(index, data_low, data_high) < 0)
> + if (kvm_probe_user_return_msr(index))
> continue;
>
> vmx->guest_uret_msrs[j].slot = i;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3bf52ba5f2bb..e304447be42d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -339,6 +339,22 @@ static void kvm_on_user_return(struct user_return_notifier *urn)
> }
> }
>
> +int kvm_probe_user_return_msr(u32 msr)
> +{
> + u64 val;
> + int ret;
> +
> + preempt_disable();
> + ret = rdmsrl_safe(msr, &val);
> + if (ret)
> + goto out;
> + ret = wrmsrl_safe(msr, val);
> +out:
> + preempt_enable();
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(kvm_probe_user_return_msr);
> +
> void kvm_define_user_return_msr(unsigned slot, u32 msr)
> {
> BUG_ON(slot >= KVM_MAX_NR_USER_RETURN_MSRS);

Reviewed-by: Maxim Levitsky <[email protected]>

One note though: every time we probe for an MSR's existence via
checking for a #GP on write, we risk getting nonsense results
if L1 has ignore_msrs=1.

Thus, if it is possible to use CPUID instead,
that would be preferred.
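
E.g., for this particular MSR, a CPUID-based check could look like
(sketch only):

	static bool tsc_aux_should_exist(void)
	{
		/*
		 * Architecturally, IA32_TSC_AUX exists if RDTSCP or RDPID is
		 * enumerated, and CPUID is not distorted by ignore_msrs=1 in
		 * L1 the way a #GP probe is.
		 */
		return boot_cpu_has(X86_FEATURE_RDTSCP) ||
		       boot_cpu_has(X86_FEATURE_RDPID);
	}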

Best regards,
Maxim Levitsky


2021-05-10 08:22:10

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 07/15] KVM: x86: Add support for RDPID without RDTSCP

On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> Allow userspace to enable RDPID for a guest without also enabling RDTSCP.
> Aside from checking for RDPID support in the obvious flows, VMX also needs
> to set ENABLE_RDTSCP=1 when RDPID is exposed.
>
> For the record, there is no known scenario where enabling RDPID without
> RDTSCP is desirable. But, both AMD and Intel architectures allow for the
> condition, i.e. this is purely to make KVM more architecturally accurate.
>
> Fixes: 41cd02c6f7f6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
> Cc: [email protected]
> Reported-by: Reiji Watanabe <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 6 ++++--
> arch/x86/kvm/vmx/vmx.c | 27 +++++++++++++++++++++++----
> arch/x86/kvm/x86.c | 3 ++-
> 3 files changed, 29 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index b3153d40cc4d..231b9650d864 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2669,7 +2669,8 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> if (tsc_aux_uret_slot < 0)
> return 1;
> if (!msr_info->host_initiated &&
> - !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> + !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> return 1;
> msr_info->data = svm->tsc_aux;
> break;
> @@ -2891,7 +2892,8 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> return 1;
>
> if (!msr->host_initiated &&
> - !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> + !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> return 1;
>
> /*
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 990ee339a05f..42e4bbaa299a 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1788,7 +1788,8 @@ static void setup_msrs(struct vcpu_vmx *vmx)
> if (update_transition_efer(vmx))
> vmx_setup_uret_msr(vmx, MSR_EFER);
>
> - if (guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP))
> + if (guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
> + guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDPID))
> vmx_setup_uret_msr(vmx, MSR_TSC_AUX);
>
> vmx_setup_uret_msr(vmx, MSR_IA32_TSX_CTRL);
> @@ -1994,7 +1995,8 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> break;
> case MSR_TSC_AUX:
> if (!msr_info->host_initiated &&
> - !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> + !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> return 1;
> goto find_uret_msr;
> case MSR_IA32_DEBUGCTLMSR:
> @@ -2314,7 +2316,8 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> break;
> case MSR_TSC_AUX:
> if (!msr_info->host_initiated &&
> - !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> + !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> return 1;
> /* Check reserved bit, higher 32 bits should be zero */
> if ((data >> 32) != 0)
> @@ -4368,7 +4371,23 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
> xsaves_enabled, false);
> }
>
> - vmx_adjust_sec_exec_feature(vmx, &exec_control, rdtscp, RDTSCP);
> + /*
> + * RDPID is also gated by ENABLE_RDTSCP, turn on the control if either
> + * feature is exposed to the guest. This creates a virtualization hole
> + * if both are supported in hardware but only one is exposed to the
> + * guest, but letting the guest execute RDTSCP or RDPID when either one
> + * is advertised is preferable to emulating the advertised instruction
> + * in KVM on #UD, and obviously better than incorrectly injecting #UD.
> + */
> + if (cpu_has_vmx_rdtscp()) {
> + bool rdpid_or_rdtscp_enabled =
> + guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) ||
> + guest_cpuid_has(vcpu, X86_FEATURE_RDPID);
> +
> + vmx_adjust_secondary_exec_control(vmx, &exec_control,
> + SECONDARY_EXEC_ENABLE_RDTSCP,
> + rdpid_or_rdtscp_enabled, false);
> + }
> vmx_adjust_sec_exec_feature(vmx, &exec_control, invpcid, INVPCID);
>
> vmx_adjust_sec_exec_exiting(vmx, &exec_control, rdrand, RDRAND);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e304447be42d..b4516d303413 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5978,7 +5978,8 @@ static void kvm_init_msr_list(void)
> continue;
> break;
> case MSR_TSC_AUX:
> - if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
> + if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP) &&
> + !kvm_cpu_cap_has(X86_FEATURE_RDPID))
> continue;
> break;
> case MSR_IA32_UMWAIT_CONTROL:

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky


2021-05-10 08:23:01

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 06/15] KVM: SVM: Probe and load MSR_TSC_AUX regardless of RDTSCP support in host

On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> Probe MSR_TSC_AUX whether or not RDTSCP is supported in the host, and
> if probing succeeds, load the guest's MSR_TSC_AUX into hardware prior to
> VMRUN. Because SVM doesn't support interception of RDPID, RDPID cannot
> be disallowed in the guest (without resorting to binary translation).
> Leaving the host's MSR_TSC_AUX in hardware would leak the host's value to
> the guest if RDTSCP is not supported.
>
> Note, there is also a kernel bug that prevents leaking the host's value.
> The host kernel initializes MSR_TSC_AUX if and only if RDTSCP is
> supported, even though the vDSO usage consumes MSR_TSC_AUX via RDPID.
> I.e. if RDTSCP is not supported, there is no host value to leak. But,
> if/when the host kernel bug is fixed, KVM would start leaking MSR_TSC_AUX
> in the case where hardware supports RDPID but RDTSCP is unavailable for
> whatever reason.
>
> Probing MSR_TSC_AUX will also allow consolidating the probe and define
> logic in common x86, and will make it simpler to condition the existence
> of MSR_TSC_AUX (from the guest's perspective) on RDTSCP *or* RDPID.
>
> Fixes: AMD CPUs
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 18 ++++++++++--------
> 1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 8f2b184270c0..b3153d40cc4d 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -212,7 +212,7 @@ DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
> * RDTSCP and RDPID are not used in the kernel, specifically to allow KVM to
> * defer the restoration of TSC_AUX until the CPU returns to userspace.
> */
> -#define TSC_AUX_URET_SLOT 0
> +static int tsc_aux_uret_slot __read_mostly = -1;
>
> static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
>
> @@ -959,8 +959,10 @@ static __init int svm_hardware_setup(void)
> kvm_tsc_scaling_ratio_frac_bits = 32;
> }
>
> - if (boot_cpu_has(X86_FEATURE_RDTSCP))
> - kvm_define_user_return_msr(TSC_AUX_URET_SLOT, MSR_TSC_AUX);
> + if (!kvm_probe_user_return_msr(MSR_TSC_AUX)) {
> + tsc_aux_uret_slot = 0;
> + kvm_define_user_return_msr(tsc_aux_uret_slot, MSR_TSC_AUX);
> + }
>
> /* Check for pause filtering support */
> if (!boot_cpu_has(X86_FEATURE_PAUSEFILTER)) {
> @@ -1454,8 +1456,8 @@ static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
> }
> }
>
> - if (static_cpu_has(X86_FEATURE_RDTSCP))
> - kvm_set_user_return_msr(TSC_AUX_URET_SLOT, svm->tsc_aux, -1ull);
> + if (likely(tsc_aux_uret_slot >= 0))
> + kvm_set_user_return_msr(tsc_aux_uret_slot, svm->tsc_aux, -1ull);
>
> svm->guest_state_loaded = true;
> }
> @@ -2664,7 +2666,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> msr_info->data |= (u64)svm->sysenter_esp_hi << 32;
> break;
> case MSR_TSC_AUX:
> - if (!boot_cpu_has(X86_FEATURE_RDTSCP))
> + if (tsc_aux_uret_slot < 0)
> return 1;
> if (!msr_info->host_initiated &&
> !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> @@ -2885,7 +2887,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> svm->sysenter_esp_hi = guest_cpuid_is_intel(vcpu) ? (data >> 32) : 0;
> break;
> case MSR_TSC_AUX:
> - if (!boot_cpu_has(X86_FEATURE_RDTSCP))
> + if (tsc_aux_uret_slot < 0)
> return 1;
>
> if (!msr->host_initiated &&
> @@ -2908,7 +2910,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> * guest via direct_access_msrs, and switch it via user return.
> */
> preempt_disable();
> - r = kvm_set_user_return_msr(TSC_AUX_URET_SLOT, data, -1ull);
> + r = kvm_set_user_return_msr(tsc_aux_uret_slot, data, -1ull);
> preempt_enable();
> if (r)
> return 1;

If L1 has ignore_msrs=1, then we will end up writing IA32_TSC_AUX for
nothing, but this shouldn't be that big of a deal.
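
(For reference, such a false positive is possible because both probe
accesses "succeed" when an ignore_msrs L1 swallows the #GP; a rough
sketch of the sequence, mirroring kvm_probe_user_return_msr():

	preempt_disable();
	ret = rdmsrl_safe(msr, &val);         /* normally #GPs -> ret != 0 */
	if (!ret)
		ret = wrmsrl_safe(msr, val);  /* ignore_msrs=1 eats the #GP */
	preempt_enable();

so the uret slot gets defined even though the MSR does nothing.)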

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky


2021-05-10 08:25:16

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 08/15] KVM: VMX: Configure list of user return MSRs at module init

On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> Configure the list of user return MSRs that are actually supported at
> module init instead of reprobing the list of possible MSRs every time a
> vCPU is created. Curating the list on a per-vCPU basis is pointless; KVM
> is completely hosed if the set of supported MSRs changes after module init,
> or if the set of MSRs differs per physical CPU.
>
> The per-vCPU lists also increase complexity (see __vmx_find_uret_msr()) and
> create corner cases that _should_ be impossible, but theoretically exist
> in KVM, e.g. advertising RDTSCP to userspace without actually being able to
> virtualize RDTSCP if probing MSR_TSC_AUX fails.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 61 ++++++++++++++++++++++++++++--------------
> arch/x86/kvm/vmx/vmx.h | 10 ++++++-
> 2 files changed, 50 insertions(+), 21 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 42e4bbaa299a..68454b0de2b1 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -461,7 +461,7 @@ static unsigned long host_idt_base;
> * support this emulation, IA32_STAR must always be included in
> * vmx_uret_msrs_list[], even in i386 builds.
> */
> -static const u32 vmx_uret_msrs_list[] = {
> +static u32 vmx_uret_msrs_list[] = {
> #ifdef CONFIG_X86_64
> MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR,
> #endif
> @@ -469,6 +469,12 @@ static const u32 vmx_uret_msrs_list[] = {
> MSR_IA32_TSX_CTRL,
> };
>
> +/*
> + * Number of user return MSRs that are actually supported in hardware.
> + * vmx_uret_msrs_list is modified when KVM is loaded to drop unsupported MSRs.
> + */
> +static int vmx_nr_uret_msrs;
> +
> #if IS_ENABLED(CONFIG_HYPERV)
> static bool __read_mostly enlightened_vmcs = true;
> module_param(enlightened_vmcs, bool, 0444);
> @@ -700,9 +706,16 @@ static inline int __vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr)
> {
> int i;
>
> - for (i = 0; i < vmx->nr_uret_msrs; ++i)
> + /*
> + * Note, vmx->guest_uret_msrs is the same size as vmx_uret_msrs_list,
> + * but is ordered differently. The MSR is matched against the list of
> + * supported uret MSRs using "slot", but the index that is returned is
> + * the index into guest_uret_msrs.
> + */
> + for (i = 0; i < vmx_nr_uret_msrs; ++i) {
> if (vmx_uret_msrs_list[vmx->guest_uret_msrs[i].slot] == msr)
> return i;
> + }
> return -1;
> }
>
> @@ -6929,18 +6942,10 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
> goto free_vpid;
> }
>
> - BUILD_BUG_ON(ARRAY_SIZE(vmx_uret_msrs_list) != MAX_NR_USER_RETURN_MSRS);
> + for (i = 0; i < vmx_nr_uret_msrs; ++i) {
> + vmx->guest_uret_msrs[i].data = 0;
>
> - for (i = 0; i < ARRAY_SIZE(vmx_uret_msrs_list); ++i) {
> - u32 index = vmx_uret_msrs_list[i];
> - int j = vmx->nr_uret_msrs;
> -
> - if (kvm_probe_user_return_msr(index))
> - continue;
> -
> - vmx->guest_uret_msrs[j].slot = i;
I don't see anything initializing the .slot after this patch.
That code is removed later, which masks this bug, but for
bisection's sake, I think this patch should still be fixed.
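
A minimal fix for bisectability might be to keep assigning the slot in
the per-vCPU loop (sketch, untested); after this patch the common and
VMX lists share the same compacted order, so the slot is simply the
index:

	for (i = 0; i < vmx_nr_uret_msrs; ++i) {
		vmx->guest_uret_msrs[i].slot = i;
		vmx->guest_uret_msrs[i].data = 0;
		...
	}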

Other than this minor bug:

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky



> - vmx->guest_uret_msrs[j].data = 0;
> - switch (index) {
> + switch (vmx_uret_msrs_list[i]) {
> case MSR_IA32_TSX_CTRL:
> /*
> * TSX_CTRL_CPUID_CLEAR is handled in the CPUID
> @@ -6954,15 +6959,14 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
> * host so that TSX remains always disabled.
> */
> if (boot_cpu_has(X86_FEATURE_RTM))
> - vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
> + vmx->guest_uret_msrs[i].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
> else
> - vmx->guest_uret_msrs[j].mask = 0;
> + vmx->guest_uret_msrs[i].mask = 0;
> break;
> default:
> - vmx->guest_uret_msrs[j].mask = -1ull;
> + vmx->guest_uret_msrs[i].mask = -1ull;
> break;
> }
> - ++vmx->nr_uret_msrs;
> }
>
> err = alloc_loaded_vmcs(&vmx->vmcs01);
> @@ -7821,17 +7825,34 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
> .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
> };
>
> +static __init void vmx_setup_user_return_msrs(void)
> +{
> + u32 msr;
> + int i;
> +
> + BUILD_BUG_ON(ARRAY_SIZE(vmx_uret_msrs_list) != MAX_NR_USER_RETURN_MSRS);
> +
> + for (i = 0; i < ARRAY_SIZE(vmx_uret_msrs_list); ++i) {
> + msr = vmx_uret_msrs_list[i];
> +
> + if (kvm_probe_user_return_msr(msr))
> + continue;
> +
> + kvm_define_user_return_msr(vmx_nr_uret_msrs, msr);
> + vmx_uret_msrs_list[vmx_nr_uret_msrs++] = msr;
> + }
> +}
> +
> static __init int hardware_setup(void)
> {
> unsigned long host_bndcfgs;
> struct desc_ptr dt;
> - int r, i, ept_lpage_level;
> + int r, ept_lpage_level;
>
> store_idt(&dt);
> host_idt_base = dt.address;
>
> - for (i = 0; i < ARRAY_SIZE(vmx_uret_msrs_list); ++i)
> - kvm_define_user_return_msr(i, vmx_uret_msrs_list[i]);
> + vmx_setup_user_return_msrs();
>
> if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
> return -EIO;
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 008cb87ff088..d71ed8b425c5 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -245,8 +245,16 @@ struct vcpu_vmx {
> u32 idt_vectoring_info;
> ulong rflags;
>
> + /*
> + * User return MSRs are always emulated when enabled in the guest, but
> + * only loaded into hardware when necessary, e.g. SYSCALL #UDs outside
> + * of 64-bit mode or if EFER.SCE=0, thus the SYSCALL MSRs don't need to
> + * be loaded into hardware if those conditions aren't met.
> + * nr_active_uret_msrs tracks the number of MSRs that need to be loaded
> + * into hardware when running the guest. guest_uret_msrs[] is resorted
> + * whenever the number of "active" uret MSRs is modified.
> + */
> struct vmx_uret_msr guest_uret_msrs[MAX_NR_USER_RETURN_MSRS];
> - int nr_uret_msrs;
> int nr_active_uret_msrs;
> bool guest_uret_msrs_loaded;
> #ifdef CONFIG_X86_64






2021-05-10 08:27:29

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 11/15] KVM: VMX: Disable loading of TSX_CTRL MSR the more conventional way

On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> Tag TSX_CTRL as not needing to be loaded when RTM isn't supported in the
> host. Crushing the write mask to '0' has the same effect, but requires
> more mental gymnastics to understand.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 22 ++++++++++------------
> 1 file changed, 10 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4b432d2bbd06..7a53568b34fc 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1771,7 +1771,13 @@ static void setup_msrs(struct vcpu_vmx *vmx)
> guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
> guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDPID));
>
> - vmx_setup_uret_msr(vmx, MSR_IA32_TSX_CTRL, true);
> + /*
> + * hle=0, rtm=0, tsx_ctrl=1 can be found with some combinations of new
> + * kernel and old userspace. If those guests run on a tsx=off host, do
> + * allow guests to use TSX_CTRL, but don't change the value in hardware
> + * so that TSX remains always disabled.
> + */
> + vmx_setup_uret_msr(vmx, MSR_IA32_TSX_CTRL, boot_cpu_has(X86_FEATURE_RTM));
>
> if (cpu_has_vmx_msr_bitmap())
> vmx_update_msr_bitmap(&vmx->vcpu);
> @@ -6919,23 +6925,15 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
> vmx->guest_uret_msrs[i].data = 0;
> vmx->guest_uret_msrs[i].mask = -1ull;
> }
> - tsx_ctrl = vmx_find_uret_msr(vmx, MSR_IA32_TSX_CTRL);
> - if (tsx_ctrl) {
> + if (boot_cpu_has(X86_FEATURE_RTM)) {
> /*
> * TSX_CTRL_CPUID_CLEAR is handled in the CPUID interception.
> * Keep the host value unchanged to avoid changing CPUID bits
> * under the host kernel's feet.
> - *
> - * hle=0, rtm=0, tsx_ctrl=1 can be found with some combinations
> - * of new kernel and old userspace. If those guests run on a
> - * tsx=off host, do allow guests to use TSX_CTRL, but do not
> - * change the value on the host so that TSX remains always
> - * disabled.
> */
> - if (boot_cpu_has(X86_FEATURE_RTM))
> + tsx_ctrl = vmx_find_uret_msr(vmx, MSR_IA32_TSX_CTRL);
> + if (tsx_ctrl)
> vmx->guest_uret_msrs[i].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
> - else
> - vmx->guest_uret_msrs[i].mask = 0;
> }
>
> err = alloc_loaded_vmcs(&vmx->vmcs01);

I also agree that the commit message should be updated as Paolo said,
but other than that:

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky <[email protected]>



2021-05-10 08:27:29

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 09/15] KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list

On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> Explicitly flag a uret MSR as needing to be loaded into hardware instead of
> resorting the list of "active" MSRs and tracking how many MSRs in total
> need to be loaded. The only benefit to sorting the list is that the loop
> to load MSRs during vmx_prepare_switch_to_guest() doesn't need to iterate
> over all supported uret MRS, only those that are active. But that is a
> pointless optimization, as the most common case, running a 64-bit guest,
> will load the vast majority of MSRs. Not to mention that a single WRMSR is
> far more expensive than iterating over the list.
>
> Providing a stable list order obviates the need to track a given MSR's
> "slot" in the per-CPU list of user return MSRs; all lists simply use the
> same ordering. Future patches will take advantage of the stable order to
> further simplify the related code.

>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 80 ++++++++++++++++++++++--------------------
> arch/x86/kvm/vmx/vmx.h | 2 +-
> 2 files changed, 42 insertions(+), 40 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 68454b0de2b1..6caabcd5037e 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -458,8 +458,9 @@ static unsigned long host_idt_base;
> * Though SYSCALL is only supported in 64-bit mode on Intel CPUs, kvm
> * will emulate SYSCALL in legacy mode if the vendor string in guest
> * CPUID.0:{EBX,ECX,EDX} is "AuthenticAMD" or "AMDisbetter!" To
> - * support this emulation, IA32_STAR must always be included in
> - * vmx_uret_msrs_list[], even in i386 builds.
> + * support this emulation, MSR_STAR is included in the list for i386,
> + * but is never loaded into hardware. MSR_CSTAR is also never loaded
> + * into hardware and is here purely for emulation purposes.
> */
> static u32 vmx_uret_msrs_list[] = {
> #ifdef CONFIG_X86_64
> @@ -702,18 +703,12 @@ static bool is_valid_passthrough_msr(u32 msr)
> return r;
> }
>
> -static inline int __vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr)
> +static inline int __vmx_find_uret_msr(u32 msr)
> {
> int i;
>
> - /*
> - * Note, vmx->guest_uret_msrs is the same size as vmx_uret_msrs_list,
> - * but is ordered differently. The MSR is matched against the list of
> - * supported uret MSRs using "slot", but the index that is returned is
> - * the index into guest_uret_msrs.
> - */
> for (i = 0; i < vmx_nr_uret_msrs; ++i) {
> - if (vmx_uret_msrs_list[vmx->guest_uret_msrs[i].slot] == msr)
> + if (vmx_uret_msrs_list[i] == msr)
> return i;
> }
> return -1;
> @@ -723,7 +718,7 @@ struct vmx_uret_msr *vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr)
> {
> int i;
>
> - i = __vmx_find_uret_msr(vmx, msr);
> + i = __vmx_find_uret_msr(msr);
> if (i >= 0)
> return &vmx->guest_uret_msrs[i];
> return NULL;
> @@ -732,13 +727,14 @@ struct vmx_uret_msr *vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr)
> static int vmx_set_guest_uret_msr(struct vcpu_vmx *vmx,
> struct vmx_uret_msr *msr, u64 data)
> {
> + unsigned int slot = msr - vmx->guest_uret_msrs;
> int ret = 0;
>
> u64 old_msr_data = msr->data;
> msr->data = data;
> - if (msr - vmx->guest_uret_msrs < vmx->nr_active_uret_msrs) {
> + if (msr->load_into_hardware) {
> preempt_disable();
> - ret = kvm_set_user_return_msr(msr->slot, msr->data, msr->mask);
> + ret = kvm_set_user_return_msr(slot, msr->data, msr->mask);
> preempt_enable();
> if (ret)
> msr->data = old_msr_data;
> @@ -1090,7 +1086,7 @@ static bool update_transition_efer(struct vcpu_vmx *vmx)
> return false;
> }
>
> - i = __vmx_find_uret_msr(vmx, MSR_EFER);
> + i = __vmx_find_uret_msr(MSR_EFER);
> if (i < 0)
> return false;
>
> @@ -1252,11 +1248,14 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> */
> if (!vmx->guest_uret_msrs_loaded) {
> vmx->guest_uret_msrs_loaded = true;
> - for (i = 0; i < vmx->nr_active_uret_msrs; ++i)
> - kvm_set_user_return_msr(vmx->guest_uret_msrs[i].slot,
> + for (i = 0; i < vmx_nr_uret_msrs; ++i) {
> + if (!vmx->guest_uret_msrs[i].load_into_hardware)
> + continue;
> +
> + kvm_set_user_return_msr(i,
> vmx->guest_uret_msrs[i].data,
> vmx->guest_uret_msrs[i].mask);
> -
> + }
> }
>
> if (vmx->nested.need_vmcs12_to_shadow_sync)
> @@ -1763,19 +1762,16 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
> vmx_clear_hlt(vcpu);
> }
>
> -static void vmx_setup_uret_msr(struct vcpu_vmx *vmx, unsigned int msr)
> +static void vmx_setup_uret_msr(struct vcpu_vmx *vmx, unsigned int msr,
> + bool load_into_hardware)
> {
> - struct vmx_uret_msr tmp;
> - int from, to;
> + struct vmx_uret_msr *uret_msr;
>
> - from = __vmx_find_uret_msr(vmx, msr);
> - if (from < 0)
> + uret_msr = vmx_find_uret_msr(vmx, msr);
> + if (!uret_msr)
> return;
> - to = vmx->nr_active_uret_msrs++;
>
> - tmp = vmx->guest_uret_msrs[to];
> - vmx->guest_uret_msrs[to] = vmx->guest_uret_msrs[from];
> - vmx->guest_uret_msrs[from] = tmp;
> + uret_msr->load_into_hardware = load_into_hardware;
> }
>
> /*
> @@ -1785,30 +1781,36 @@ static void vmx_setup_uret_msr(struct vcpu_vmx *vmx, unsigned int msr)
> */
> static void setup_msrs(struct vcpu_vmx *vmx)
> {
> - vmx->guest_uret_msrs_loaded = false;
> - vmx->nr_active_uret_msrs = 0;
> #ifdef CONFIG_X86_64
> + bool load_syscall_msrs;
> +
> /*
> * The SYSCALL MSRs are only needed on long mode guests, and only
> * when EFER.SCE is set.
> */
> - if (is_long_mode(&vmx->vcpu) && (vmx->vcpu.arch.efer & EFER_SCE)) {
> - vmx_setup_uret_msr(vmx, MSR_STAR);
> - vmx_setup_uret_msr(vmx, MSR_LSTAR);
> - vmx_setup_uret_msr(vmx, MSR_SYSCALL_MASK);
> - }
> + load_syscall_msrs = is_long_mode(&vmx->vcpu) &&
> + (vmx->vcpu.arch.efer & EFER_SCE);
> +
> + vmx_setup_uret_msr(vmx, MSR_STAR, load_syscall_msrs);
> + vmx_setup_uret_msr(vmx, MSR_LSTAR, load_syscall_msrs);
> + vmx_setup_uret_msr(vmx, MSR_SYSCALL_MASK, load_syscall_msrs);
> #endif
> - if (update_transition_efer(vmx))
> - vmx_setup_uret_msr(vmx, MSR_EFER);
> + vmx_setup_uret_msr(vmx, MSR_EFER, update_transition_efer(vmx));
>
> - if (guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
> - guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDPID))
> - vmx_setup_uret_msr(vmx, MSR_TSC_AUX);
> + vmx_setup_uret_msr(vmx, MSR_TSC_AUX,
> + guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
> + guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDPID));
>
> - vmx_setup_uret_msr(vmx, MSR_IA32_TSX_CTRL);
> + vmx_setup_uret_msr(vmx, MSR_IA32_TSX_CTRL, true);
>
> if (cpu_has_vmx_msr_bitmap())
> vmx_update_msr_bitmap(&vmx->vcpu);
> +
> + /*
> + * The set of MSRs to load may have changed, reload MSRs before the
> + * next VM-Enter.
> + */
> + vmx->guest_uret_msrs_loaded = false;
> }
>
> static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index d71ed8b425c5..16e4e457ba23 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -36,7 +36,7 @@ struct vmx_msrs {
> };
>
> struct vmx_uret_msr {
> - unsigned int slot; /* The MSR's slot in kvm_user_return_msrs. */
> + bool load_into_hardware;
> u64 data;
> u64 mask;
> };

This is a very welcome change; the old code was very complicated
to understand.

However, I still feel that it would be nice to have a comment explaining
that vmx->guest_uret_msrs now follows the same order as the (eventually
common) uret MSR list, and is basically a parallel array that extends the
per-CPU 'user_return_msrs' and the global kvm_uret_msrs_list arrays, as
sketched below.

In fact, why not fold vmx->guest_uret_msrs into the common x86 uret MSR
list? There is nothing VMX-specific in this list IMHO, and SVM could use
it as well; it already has 'svm->tsc_aux', which is effectively the 'data'
field of a 'struct vmx_uret_msr'.
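
Roughly, the relationship is that index i refers to the same MSR in each
of the parallel arrays (a sketch of the layout using the names from later
in the series, not exact code):

	kvm_uret_msrs_list[i]          /* MSR index; global, common x86      */
	user_return_msrs->values[i]    /* host/curr values; per physical CPU */
	vmx->guest_uret_msrs[i]        /* guest value + mask; per vCPU       */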


Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky





2021-05-10 08:28:29

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 12/15] KVM: x86: Export the number of uret MSRs to vendor modules

On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> Split out and export the number of configured user return MSRs so that
> VMX can iterate over the set of MSRs without having to do its own tracking.
> Keep the list itself internal to x86 so that vendor code still has to go
> through the "official" APIs to add/modify entries.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/x86.c | 29 +++++++++++++----------------
> 2 files changed, 14 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c9452472ed55..10663610f105 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1419,6 +1419,7 @@ struct kvm_arch_async_pf {
> bool direct_map;
> };
>
> +extern u32 __read_mostly kvm_nr_uret_msrs;
> extern u64 __read_mostly host_efer;
> extern bool __read_mostly allow_smaller_maxphyaddr;
> extern struct kvm_x86_ops kvm_x86_ops;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 90ef340565a4..2fd46e917666 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -184,11 +184,6 @@ module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
> */
> #define KVM_MAX_NR_USER_RETURN_MSRS 16
>
> -struct kvm_user_return_msrs_global {
> - int nr;
> - u32 msrs[KVM_MAX_NR_USER_RETURN_MSRS];
> -};
> -
> struct kvm_user_return_msrs {
> struct user_return_notifier urn;
> bool registered;
> @@ -198,7 +193,9 @@ struct kvm_user_return_msrs {
> } values[KVM_MAX_NR_USER_RETURN_MSRS];
> };
>
> -static struct kvm_user_return_msrs_global __read_mostly user_return_msrs_global;
> +u32 __read_mostly kvm_nr_uret_msrs;
> +EXPORT_SYMBOL_GPL(kvm_nr_uret_msrs);
> +static u32 __read_mostly kvm_uret_msrs_list[KVM_MAX_NR_USER_RETURN_MSRS];
> static struct kvm_user_return_msrs __percpu *user_return_msrs;
>
> #define KVM_SUPPORTED_XCR0 (XFEATURE_MASK_FP | XFEATURE_MASK_SSE \
> @@ -330,10 +327,10 @@ static void kvm_on_user_return(struct user_return_notifier *urn)
> user_return_notifier_unregister(urn);
> }
> local_irq_restore(flags);
> - for (slot = 0; slot < user_return_msrs_global.nr; ++slot) {
> + for (slot = 0; slot < kvm_nr_uret_msrs; ++slot) {
> values = &msrs->values[slot];
> if (values->host != values->curr) {
> - wrmsrl(user_return_msrs_global.msrs[slot], values->host);
> + wrmsrl(kvm_uret_msrs_list[slot], values->host);
> values->curr = values->host;
> }
> }
> @@ -358,9 +355,9 @@ EXPORT_SYMBOL_GPL(kvm_probe_user_return_msr);
> void kvm_define_user_return_msr(unsigned slot, u32 msr)
> {
> BUG_ON(slot >= KVM_MAX_NR_USER_RETURN_MSRS);
> - user_return_msrs_global.msrs[slot] = msr;
> - if (slot >= user_return_msrs_global.nr)
> - user_return_msrs_global.nr = slot + 1;
> + kvm_uret_msrs_list[slot] = msr;
> + if (slot >= kvm_nr_uret_msrs)
> + kvm_nr_uret_msrs = slot + 1;
> }
> EXPORT_SYMBOL_GPL(kvm_define_user_return_msr);
>
> @@ -368,8 +365,8 @@ int kvm_find_user_return_msr(u32 msr)
> {
> int i;
>
> - for (i = 0; i < user_return_msrs_global.nr; ++i) {
> - if (user_return_msrs_global.msrs[i] == msr)
> + for (i = 0; i < kvm_nr_uret_msrs; ++i) {
> + if (kvm_uret_msrs_list[i] == msr)
> return i;
> }
> return -1;
> @@ -383,8 +380,8 @@ static void kvm_user_return_msr_cpu_online(void)
> u64 value;
> int i;
>
> - for (i = 0; i < user_return_msrs_global.nr; ++i) {
> - rdmsrl_safe(user_return_msrs_global.msrs[i], &value);
> + for (i = 0; i < kvm_nr_uret_msrs; ++i) {
> + rdmsrl_safe(kvm_uret_msrs_list[i], &value);
> msrs->values[i].host = value;
> msrs->values[i].curr = value;
> }
> @@ -399,7 +396,7 @@ int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
> value = (value & mask) | (msrs->values[slot].host & ~mask);
> if (value == msrs->values[slot].curr)
> return 0;
> - err = wrmsrl_safe(user_return_msrs_global.msrs[slot], value);
> + err = wrmsrl_safe(kvm_uret_msrs_list[slot], value);
> if (err)
> return 1;
>

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky


2021-05-10 08:28:39

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 10/15] KVM: VMX: Use common x86's uret MSR list as the one true list

On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> Drop VMX's global list of user return MSRs now that VMX doesn't resort said
> list to isolate "active" MSRs, i.e. now that VMX's list and x86's list have
> the same MSRs in the same order.
>
> In addition to eliminating the redundant list, this will also allow moving
> more of the list management into common x86.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/vmx/vmx.c | 97 ++++++++++++++-------------------
> arch/x86/kvm/x86.c | 12 ++++
> 3 files changed, 53 insertions(+), 57 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index a02c9bf3f7f1..c9452472ed55 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1778,6 +1778,7 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
> unsigned long icr, int op_64_bit);
>
> void kvm_define_user_return_msr(unsigned index, u32 msr);
> +int kvm_find_user_return_msr(u32 msr);
> int kvm_probe_user_return_msr(u32 msr);
> int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 6caabcd5037e..4b432d2bbd06 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -454,26 +454,7 @@ static inline void vmx_segment_cache_clear(struct vcpu_vmx *vmx)
>
> static unsigned long host_idt_base;
>
> -/*
> - * Though SYSCALL is only supported in 64-bit mode on Intel CPUs, kvm
> - * will emulate SYSCALL in legacy mode if the vendor string in guest
> - * CPUID.0:{EBX,ECX,EDX} is "AuthenticAMD" or "AMDisbetter!" To
> - * support this emulation, MSR_STAR is included in the list for i386,
> - * but is never loaded into hardware. MSR_CSTAR is also never loaded
> - * into hardware and is here purely for emulation purposes.
> - */
> -static u32 vmx_uret_msrs_list[] = {
> -#ifdef CONFIG_X86_64
> - MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR,
> -#endif
> - MSR_EFER, MSR_TSC_AUX, MSR_STAR,
> - MSR_IA32_TSX_CTRL,
> -};
> -
> -/*
> - * Number of user return MSRs that are actually supported in hardware.
> - * vmx_uret_msrs_list is modified when KVM is loaded to drop unsupported MSRs.
> - */
> +/* Number of user return MSRs that are actually supported in hardware. */
> static int vmx_nr_uret_msrs;
>
> #if IS_ENABLED(CONFIG_HYPERV)
> @@ -703,22 +684,11 @@ static bool is_valid_passthrough_msr(u32 msr)
> return r;
> }
>
> -static inline int __vmx_find_uret_msr(u32 msr)
> -{
> - int i;
> -
> - for (i = 0; i < vmx_nr_uret_msrs; ++i) {
> - if (vmx_uret_msrs_list[i] == msr)
> - return i;
> - }
> - return -1;
> -}
> -
> struct vmx_uret_msr *vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr)
> {
> int i;
>
> - i = __vmx_find_uret_msr(msr);
> + i = kvm_find_user_return_msr(msr);
> if (i >= 0)
> return &vmx->guest_uret_msrs[i];
> return NULL;
> @@ -1086,7 +1056,7 @@ static bool update_transition_efer(struct vcpu_vmx *vmx)
> return false;
> }
>
> - i = __vmx_find_uret_msr(MSR_EFER);
> + i = kvm_find_user_return_msr(MSR_EFER);
> if (i < 0)
> return false;
>
> @@ -6922,6 +6892,7 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
>
> static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
> {
> + struct vmx_uret_msr *tsx_ctrl;
> struct vcpu_vmx *vmx;
> int i, cpu, err;
>
> @@ -6946,29 +6917,25 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
>
> for (i = 0; i < vmx_nr_uret_msrs; ++i) {
> vmx->guest_uret_msrs[i].data = 0;
> -
> - switch (vmx_uret_msrs_list[i]) {
> - case MSR_IA32_TSX_CTRL:
> - /*
> - * TSX_CTRL_CPUID_CLEAR is handled in the CPUID
> - * interception. Keep the host value unchanged to avoid
> - * changing CPUID bits under the host kernel's feet.
> - *
> - * hle=0, rtm=0, tsx_ctrl=1 can be found with some
> - * combinations of new kernel and old userspace. If
> - * those guests run on a tsx=off host, do allow guests
> - * to use TSX_CTRL, but do not change the value on the
> - * host so that TSX remains always disabled.
> - */
> - if (boot_cpu_has(X86_FEATURE_RTM))
> - vmx->guest_uret_msrs[i].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
> - else
> - vmx->guest_uret_msrs[i].mask = 0;
> - break;
> - default:
> - vmx->guest_uret_msrs[i].mask = -1ull;
> - break;
> - }
> + vmx->guest_uret_msrs[i].mask = -1ull;
> + }
> + tsx_ctrl = vmx_find_uret_msr(vmx, MSR_IA32_TSX_CTRL);
> + if (tsx_ctrl) {
> + /*
> + * TSX_CTRL_CPUID_CLEAR is handled in the CPUID interception.
> + * Keep the host value unchanged to avoid changing CPUID bits
> + * under the host kernel's feet.
> + *
> + * hle=0, rtm=0, tsx_ctrl=1 can be found with some combinations
> + * of new kernel and old userspace. If those guests run on a
> + * tsx=off host, do allow guests to use TSX_CTRL, but do not
> + * change the value on the host so that TSX remains always
> + * disabled.
> + */
> + if (boot_cpu_has(X86_FEATURE_RTM))
> + vmx->guest_uret_msrs[i].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
> + else
> + vmx->guest_uret_msrs[i].mask = 0;
> }
>
> err = alloc_loaded_vmcs(&vmx->vmcs01);
> @@ -7829,6 +7796,22 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
>
> static __init void vmx_setup_user_return_msrs(void)
> {
> +
> + /*
> + * Though SYSCALL is only supported in 64-bit mode on Intel CPUs, kvm
> + * will emulate SYSCALL in legacy mode if the vendor string in guest
> + * CPUID.0:{EBX,ECX,EDX} is "AuthenticAMD" or "AMDisbetter!" To
> + * support this emulation, MSR_STAR is included in the list for i386,
> + * but is never loaded into hardware. MSR_CSTAR is also never loaded
> + * into hardware and is here purely for emulation purposes.
> + */
> + const u32 vmx_uret_msrs_list[] = {
> + #ifdef CONFIG_X86_64
> + MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR,
> + #endif
> + MSR_EFER, MSR_TSC_AUX, MSR_STAR,
> + MSR_IA32_TSX_CTRL,
> + };
> u32 msr;
> int i;
>
> @@ -7841,7 +7824,7 @@ static __init void vmx_setup_user_return_msrs(void)
> continue;
>
> kvm_define_user_return_msr(vmx_nr_uret_msrs, msr);
> - vmx_uret_msrs_list[vmx_nr_uret_msrs++] = msr;
> + vmx_nr_uret_msrs++;
> }
> }
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b4516d303413..90ef340565a4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -364,6 +364,18 @@ void kvm_define_user_return_msr(unsigned slot, u32 msr)
> }
> EXPORT_SYMBOL_GPL(kvm_define_user_return_msr);
>
> +int kvm_find_user_return_msr(u32 msr)
> +{
> + int i;
> +
> + for (i = 0; i < user_return_msrs_global.nr; ++i) {
> + if (user_return_msrs_global.msrs[i] == msr)
> + return i;
> + }
> + return -1;
> +}
> +EXPORT_SYMBOL_GPL(kvm_find_user_return_msr);
> +
> static void kvm_user_return_msr_cpu_online(void)
> {
> unsigned int cpu = smp_processor_id();


Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky <[email protected]>


2021-05-10 08:29:52

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 13/15] KVM: x86: Move uret MSR slot management to common x86

On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> Now that SVM and VMX both probe MSRs before "defining" user return slots
> for them, consolidate the code for probe+define into common x86 and
> eliminate the odd behavior of having the vendor code define the slot for
> a given MSR.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 3 +--
> arch/x86/kvm/svm/svm.c | 5 +----
> arch/x86/kvm/vmx/vmx.c | 19 ++++---------------
> arch/x86/kvm/x86.c | 19 +++++++++++--------
> 4 files changed, 17 insertions(+), 29 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 10663610f105..a4b912f7e427 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1778,9 +1778,8 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
> unsigned long ipi_bitmap_high, u32 min,
> unsigned long icr, int op_64_bit);
>
> -void kvm_define_user_return_msr(unsigned index, u32 msr);
> +int kvm_add_user_return_msr(u32 msr);
> int kvm_find_user_return_msr(u32 msr);
> -int kvm_probe_user_return_msr(u32 msr);
> int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
>
> u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 231b9650d864..de921935e8de 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -959,10 +959,7 @@ static __init int svm_hardware_setup(void)
> kvm_tsc_scaling_ratio_frac_bits = 32;
> }
>
> - if (!kvm_probe_user_return_msr(MSR_TSC_AUX)) {
> - tsc_aux_uret_slot = 0;
> - kvm_define_user_return_msr(tsc_aux_uret_slot, MSR_TSC_AUX);
> - }
> + tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);
>
> /* Check for pause filtering support */
> if (!boot_cpu_has(X86_FEATURE_PAUSEFILTER)) {
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 7a53568b34fc..26f82f302391 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -454,9 +454,6 @@ static inline void vmx_segment_cache_clear(struct vcpu_vmx *vmx)
>
> static unsigned long host_idt_base;
>
> -/* Number of user return MSRs that are actually supported in hardware. */
> -static int vmx_nr_uret_msrs;
> -
> #if IS_ENABLED(CONFIG_HYPERV)
> static bool __read_mostly enlightened_vmcs = true;
> module_param(enlightened_vmcs, bool, 0444);
> @@ -1218,7 +1215,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> */
> if (!vmx->guest_uret_msrs_loaded) {
> vmx->guest_uret_msrs_loaded = true;
> - for (i = 0; i < vmx_nr_uret_msrs; ++i) {
> + for (i = 0; i < kvm_nr_uret_msrs; ++i) {
> if (!vmx->guest_uret_msrs[i].load_into_hardware)
> continue;
>
> @@ -6921,7 +6918,7 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
> goto free_vpid;
> }
>
> - for (i = 0; i < vmx_nr_uret_msrs; ++i) {
> + for (i = 0; i < kvm_nr_uret_msrs; ++i) {
> vmx->guest_uret_msrs[i].data = 0;
> vmx->guest_uret_msrs[i].mask = -1ull;
> }
> @@ -7810,20 +7807,12 @@ static __init void vmx_setup_user_return_msrs(void)
> MSR_EFER, MSR_TSC_AUX, MSR_STAR,
> MSR_IA32_TSX_CTRL,
> };
> - u32 msr;
> int i;
>
> BUILD_BUG_ON(ARRAY_SIZE(vmx_uret_msrs_list) != MAX_NR_USER_RETURN_MSRS);
>
> - for (i = 0; i < ARRAY_SIZE(vmx_uret_msrs_list); ++i) {
> - msr = vmx_uret_msrs_list[i];
> -
> - if (kvm_probe_user_return_msr(msr))
> - continue;
> -
> - kvm_define_user_return_msr(vmx_nr_uret_msrs, msr);
> - vmx_nr_uret_msrs++;
> - }
> + for (i = 0; i < ARRAY_SIZE(vmx_uret_msrs_list); ++i)
> + kvm_add_user_return_msr(vmx_uret_msrs_list[i]);
> }
>
> static __init int hardware_setup(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2fd46e917666..adca491d3b4b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -336,7 +336,7 @@ static void kvm_on_user_return(struct user_return_notifier *urn)
> }
> }
>
> -int kvm_probe_user_return_msr(u32 msr)
> +static int kvm_probe_user_return_msr(u32 msr)
> {
> u64 val;
> int ret;
> @@ -350,16 +350,18 @@ int kvm_probe_user_return_msr(u32 msr)
> preempt_enable();
> return ret;
> }
> -EXPORT_SYMBOL_GPL(kvm_probe_user_return_msr);
>
> -void kvm_define_user_return_msr(unsigned slot, u32 msr)
> +int kvm_add_user_return_msr(u32 msr)
> {
> - BUG_ON(slot >= KVM_MAX_NR_USER_RETURN_MSRS);
> - kvm_uret_msrs_list[slot] = msr;
> - if (slot >= kvm_nr_uret_msrs)
> - kvm_nr_uret_msrs = slot + 1;
> + BUG_ON(kvm_nr_uret_msrs >= KVM_MAX_NR_USER_RETURN_MSRS);
> +
> + if (kvm_probe_user_return_msr(msr))
> + return -1;
> +
> + kvm_uret_msrs_list[kvm_nr_uret_msrs] = msr;
> + return kvm_nr_uret_msrs++;
> }
> -EXPORT_SYMBOL_GPL(kvm_define_user_return_msr);
> +EXPORT_SYMBOL_GPL(kvm_add_user_return_msr);
>
> int kvm_find_user_return_msr(u32 msr)
> {
> @@ -8169,6 +8171,7 @@ int kvm_arch_init(void *opaque)
> printk(KERN_ERR "kvm: failed to allocate percpu kvm_user_return_msrs\n");
> goto out_free_x86_emulator_cache;
> }
> + kvm_nr_uret_msrs = 0;
>
> r = kvm_mmu_module_init();
> if (r)
Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky




2021-05-10 08:30:56

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 15/15] KVM: x86: Hide RDTSCP and RDPID if MSR_TSC_AUX probing failed

On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> If probing MSR_TSC_AUX failed, hide RDTSCP and RDPID, and WARN if either
> feature was reported as supported. In theory, such a scenario should
> never happen as both Intel and AMD state that MSR_TSC_AUX is available if
> RDTSCP or RDPID is supported. But, KVM injects #GP on MSR_TSC_AUX
> accesses if probing failed, faults on WRMSR(MSR_TSC_AUX) may be fatal to
> the guest (because they happen during early CPU bringup), and KVM itself
> has effectively misreported RDPID support in the past.
>
> Note, this also has the happy side effect of omitting MSR_TSC_AUX from
> the list of MSRs that are exposed to userspace if probing the MSR fails.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/cpuid.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index c96f79c9fff2..bf0f74ce4974 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -567,6 +567,21 @@ void kvm_set_cpu_caps(void)
> F(ACE2) | F(ACE2_EN) | F(PHE) | F(PHE_EN) |
> F(PMM) | F(PMM_EN)
> );
> +
> + /*
> + * Hide RDTSCP and RDPID if either feature is reported as supported but
> + * probing MSR_TSC_AUX failed. This is purely a sanity check and
> + * should never happen, but the guest will likely crash if RDTSCP or
> + * RDPID is misreported, and KVM has botched MSR_TSC_AUX emulation in
> + * the past, e.g. the sanity check may fire if this instance of KVM is
> + * running as L1 on top of an older, broken KVM.
> + */
> + if (WARN_ON((kvm_cpu_cap_has(X86_FEATURE_RDTSCP) ||
> + kvm_cpu_cap_has(X86_FEATURE_RDPID)) &&
> + !kvm_is_supported_user_return_msr(MSR_TSC_AUX))) {
> + kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);
> + kvm_cpu_cap_clear(X86_FEATURE_RDPID);
> + }
> }
> EXPORT_SYMBOL_GPL(kvm_set_cpu_caps);
>
Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2021-05-10 08:31:17

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 14/15] KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model

On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> Squish the Intel and AMD emulation of MSR_TSC_AUX together and tie it to
> the guest CPU model instead of the host CPU behavior. While not strictly
> necessary to avoid guest breakage, emulating cross-vendor "architecture"
> will provide consistent behavior for the guest, e.g. WRMSR fault behavior
> won't change if the vCPU is migrated to a host with divergent behavior.
>
> Note, the "new" kvm_is_supported_user_return_msr() checks do not add new
> functionality on either SVM or VMX. On SVM, the equivalent was
> "tsc_aux_uret_slot < 0", and on VMX the check was buried in the
> vmx_find_uret_msr() call at the find_uret_msr label.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 5 +++++
> arch/x86/kvm/svm/svm.c | 24 ----------------------
> arch/x86/kvm/vmx/vmx.c | 15 --------------
> arch/x86/kvm/x86.c | 36 +++++++++++++++++++++++++++++++++
> 4 files changed, 41 insertions(+), 39 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index a4b912f7e427..00fb9efb9984 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1782,6 +1782,11 @@ int kvm_add_user_return_msr(u32 msr);
> int kvm_find_user_return_msr(u32 msr);
> int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
>
> +static inline bool kvm_is_supported_user_return_msr(u32 msr)
> +{
> + return kvm_find_user_return_msr(msr) >= 0;
> +}
> +
> u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
> u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index de921935e8de..6c7c6a303cc5 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2663,12 +2663,6 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> msr_info->data |= (u64)svm->sysenter_esp_hi << 32;
> break;
> case MSR_TSC_AUX:
> - if (tsc_aux_uret_slot < 0)
> - return 1;
> - if (!msr_info->host_initiated &&
Not related to this patch, but I do wonder why we need
to always allow writing this MSR if done by the host,
since if neither RDTSCP nor RDPID is supported, the guest
won't be able to read this MSR at all.


> - !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> - !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> - return 1;
> msr_info->data = svm->tsc_aux;
> break;
> /*
> @@ -2885,24 +2879,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> svm->sysenter_esp_hi = guest_cpuid_is_intel(vcpu) ? (data >> 32) : 0;
> break;
> case MSR_TSC_AUX:
> - if (tsc_aux_uret_slot < 0)
> - return 1;
> -
> - if (!msr->host_initiated &&
> - !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> - !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> - return 1;
> -
> - /*
> - * Per Intel's SDM, bits 63:32 are reserved, but AMD's APM has
> - * incomplete and conflicting architectural behavior. Current
> - * AMD CPUs completely ignore bits 63:32, i.e. they aren't
> - * reserved and always read as zeros. Emulate AMD CPU behavior
> - * to avoid explosions if the vCPU is migrated from an AMD host
> - * to an Intel host.
> - */
> - data = (u32)data;
> -
> /*
> * TSC_AUX is usually changed only during boot and never read
> * directly. Intercept TSC_AUX instead of exposing it to the
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 26f82f302391..d85ac5876982 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1981,12 +1981,6 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> else
> msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
> break;
> - case MSR_TSC_AUX:
> - if (!msr_info->host_initiated &&
> - !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> - !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> - return 1;
> - goto find_uret_msr;
> case MSR_IA32_DEBUGCTLMSR:
> msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
> break;
> @@ -2302,15 +2296,6 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> else
> vmx->pt_desc.guest.addr_a[index / 2] = data;
> break;
> - case MSR_TSC_AUX:
> - if (!msr_info->host_initiated &&
> - !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> - !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> - return 1;
> - /* Check reserved bit, higher 32 bits should be zero */
> - if ((data >> 32) != 0)
> - return 1;
> - goto find_uret_msr;
> case MSR_IA32_PERF_CAPABILITIES:
> if (data && !vcpu_to_pmu(vcpu)->version)
> return 1;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index adca491d3b4b..896127ea4d4f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1642,6 +1642,30 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
> * invokes 64-bit SYSENTER.
> */
> data = get_canonical(data, vcpu_virt_addr_bits(vcpu));
> + break;
> + case MSR_TSC_AUX:
> + if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
> + return 1;
> +
> + if (!host_initiated &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> + return 1;
> +
> + /*
> + * Per Intel's SDM, bits 63:32 are reserved, but AMD's APM has
> + * incomplete and conflicting architectural behavior. Current
> + * AMD CPUs completely ignore bits 63:32, i.e. they aren't
> + * reserved and always read as zeros. Enforce Intel's reserved
> + * bits check if and only if the guest CPU is Intel, and clear
> + * the bits in all other cases. This ensures cross-vendor
> + * migration will provide consistent behavior for the guest.
> + */
> + if (guest_cpuid_is_intel(vcpu) && (data >> 32) != 0)
> + return 1;
> +
> + data = (u32)data;
> + break;
> }
>
> msr.data = data;
> @@ -1678,6 +1702,18 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
> if (!host_initiated && !kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
> return KVM_MSR_RET_FILTERED;
>
> + switch (index) {
> + case MSR_TSC_AUX:
> + if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
> + return 1;
> +
> + if (!host_initiated &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> + return 1;
> + break;
> + }
> +
> msr.index = index;
> msr.host_initiated = host_initiated;
>

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2021-05-10 15:30:47

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 08/15] KVM: VMX: Configure list of user return MSRs at module init

On Mon, May 10, 2021, Maxim Levitsky wrote:
> On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> > @@ -6929,18 +6942,10 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
> > goto free_vpid;
> > }
> >
> > - BUILD_BUG_ON(ARRAY_SIZE(vmx_uret_msrs_list) != MAX_NR_USER_RETURN_MSRS);
> > + for (i = 0; i < vmx_nr_uret_msrs; ++i) {
> > + vmx->guest_uret_msrs[i].data = 0;
> >
> > - for (i = 0; i < ARRAY_SIZE(vmx_uret_msrs_list); ++i) {
> > - u32 index = vmx_uret_msrs_list[i];
> > - int j = vmx->nr_uret_msrs;
> > -
> > - if (kvm_probe_user_return_msr(index))
> > - continue;
> > -
> > - vmx->guest_uret_msrs[j].slot = i;
> I don't see anything initializing the .slot after this patch.
> That code is removed later, which masks this bug, but for
> bisection's sake, I think this patch should still be fixed.

Egad, indeed it's broken. I'll retest the whole series to verify the other
patches will bisect cleanly.

Nice catch!

2021-05-10 16:45:49

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 09/15] KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list

On Fri, May 07, 2021, Reiji Watanabe wrote:
> > -static void vmx_setup_uret_msr(struct vcpu_vmx *vmx, unsigned int msr)
> > +static void vmx_setup_uret_msr(struct vcpu_vmx *vmx, unsigned int msr,
> > + bool load_into_hardware)
> > {
> > - struct vmx_uret_msr tmp;
> > - int from, to;
> > + struct vmx_uret_msr *uret_msr;
> >
> > - from = __vmx_find_uret_msr(vmx, msr);
> > - if (from < 0)
> > + uret_msr = vmx_find_uret_msr(vmx, msr);
> > + if (!uret_msr)
> > return;
> > - to = vmx->nr_active_uret_msrs++;
> >
> > - tmp = vmx->guest_uret_msrs[to];
> > - vmx->guest_uret_msrs[to] = vmx->guest_uret_msrs[from];
> > - vmx->guest_uret_msrs[from] = tmp;
> > + uret_msr->load_into_hardware = load_into_hardware;
> > }
> >
> > /*
> > @@ -1785,30 +1781,36 @@ static void vmx_setup_uret_msr(struct vcpu_vmx *vmx, unsigned int msr)
> > */
> > static void setup_msrs(struct vcpu_vmx *vmx)
> > {
> > - vmx->guest_uret_msrs_loaded = false;
> > - vmx->nr_active_uret_msrs = 0;
> > #ifdef CONFIG_X86_64
> > + bool load_syscall_msrs;
> > +
> > /*
> > * The SYSCALL MSRs are only needed on long mode guests, and only
> > * when EFER.SCE is set.
> > */
> > - if (is_long_mode(&vmx->vcpu) && (vmx->vcpu.arch.efer & EFER_SCE)) {
> > - vmx_setup_uret_msr(vmx, MSR_STAR);
> > - vmx_setup_uret_msr(vmx, MSR_LSTAR);
> > - vmx_setup_uret_msr(vmx, MSR_SYSCALL_MASK);
> > - }
> > + load_syscall_msrs = is_long_mode(&vmx->vcpu) &&
> > + (vmx->vcpu.arch.efer & EFER_SCE);
> > +
> > + vmx_setup_uret_msr(vmx, MSR_STAR, load_syscall_msrs);
> > + vmx_setup_uret_msr(vmx, MSR_LSTAR, load_syscall_msrs);
> > + vmx_setup_uret_msr(vmx, MSR_SYSCALL_MASK, load_syscall_msrs);
> > #endif
> > - if (update_transition_efer(vmx))
> > - vmx_setup_uret_msr(vmx, MSR_EFER);
> > + vmx_setup_uret_msr(vmx, MSR_EFER, update_transition_efer(vmx));
> >
> > - if (guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
> > - guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDPID))
> > - vmx_setup_uret_msr(vmx, MSR_TSC_AUX);
> > + vmx_setup_uret_msr(vmx, MSR_TSC_AUX,
> > + guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
> > + guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDPID));
>
> Shouldn't vmx_setup_uret_msr(,MSR_TSC_AUX,) be called to update
> the new flag load_into_hardware for MSR_TSC_AUX when CPUID
> (X86_FEATURE_RDTSCP/X86_FEATURE_RDPID) of the vCPU is updated?

Yes. I have a patch in the massive vCPU RESET/INIT series to do exactly that.
I honestly can't remember if there was a dependency that "required" the fix to
be buried in the middle of that series. I suspect not; I'm guessing I just
didn't think it was worth backporting to stable kernels and so didn't prioritize
hoisting the patch out of that mess.

https://lkml.kernel.org/r/[email protected]

2021-05-10 16:57:36

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 14/15] KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model

On Mon, May 10, 2021, Maxim Levitsky wrote:
> On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index de921935e8de..6c7c6a303cc5 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -2663,12 +2663,6 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > msr_info->data |= (u64)svm->sysenter_esp_hi << 32;
> > break;
> > case MSR_TSC_AUX:
> > - if (tsc_aux_uret_slot < 0)
> > - return 1;
> > - if (!msr_info->host_initiated &&
> Not related to this patch, but I do wonder why we need
> to always allow writing this MSR if done by the host,
> since if neither RDTSCP nor RDPID is supported, the guest
> won't be able to read this MSR at all.

It's an ordering thing and not specific to MSR_TSC_AUX. Exempting host userspace
from guest CPUID checks allows userspace to set MSR state, e.g. during migration,
before setting the guest CPUID model.
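
E.g. a restore flow on the destination might look like this (purely
illustrative, not any particular VMM's actual sequence):

	ioctl(vcpu_fd, KVM_SET_MSRS, &msrs);     /* may include MSR_TSC_AUX */
	ioctl(vcpu_fd, KVM_SET_CPUID2, &cpuid);  /* guest CPUID set later   */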

> > - !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> > - !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> > - return 1;
> > msr_info->data = svm->tsc_aux;
> > break;
> > /*

2021-05-10 16:59:10

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 03/15] KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest

On Mon, May 10, 2021, Maxim Levitsky wrote:
> On Tue, 2021-05-04 at 14:58 -0700, Jim Mattson wrote:
> > On Tue, May 4, 2021 at 2:57 PM Paolo Bonzini <[email protected]> wrote:
> > > On 04/05/21 23:53, Sean Christopherson wrote:
> > > > > Does the right thing happen here if the vCPU is in guest mode when
> > > > > userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> > > > > off?
> > > > I hate our terminology. By "guest mode", do you mean running the vCPU, or do
> > > > you specifically mean running in L2?
> > > >
> > >
> > > Guest mode should mean L2.
> > >
> > > (I wonder if we should have a capability that says "KVM_SET_CPUID2 can
> > > only be called prior to KVM_RUN").
> >
> > It would certainly make it easier to reason about potential security issues.
> >
> I vote too for this.

Alternatively, what about adding KVM_VCPU_RESET to let userspace explicitly
pull RESET#, and defining that ioctl() to freeze the vCPU model? I.e. after
userspace resets the vCPU, KVM_SET_CPUID (and any other relevant ioctls) is
disallowed.

Lack of proper RESET emulation is annoying, e.g. userspace has to manually stuff
EDX after vCPU creation to get the right value at RESET. A dedicated ioctl()
would kill two birds with one stone, without having to add yet another "2"
ioctl().
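
Hypothetical usage, purely to illustrate the proposal (KVM_VCPU_RESET
doesn't exist today):

	ioctl(vcpu_fd, KVM_SET_CPUID2, &cpuid);  /* configure the vCPU model */
	ioctl(vcpu_fd, KVM_VCPU_RESET, 0);       /* pull RESET#, freeze the model */
	ioctl(vcpu_fd, KVM_SET_CPUID2, &cpuid);  /* now rejected, e.g. -EBUSY */
	ioctl(vcpu_fd, KVM_RUN, 0);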

2021-05-10 17:13:25

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 14/15] KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model

On Mon, May 10, 2021 at 9:50 AM Sean Christopherson <[email protected]> wrote:
>
> On Mon, May 10, 2021, Maxim Levitsky wrote:
> > On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index de921935e8de..6c7c6a303cc5 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -2663,12 +2663,6 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > > msr_info->data |= (u64)svm->sysenter_esp_hi << 32;
> > > break;
> > > case MSR_TSC_AUX:
> > > - if (tsc_aux_uret_slot < 0)
> > > - return 1;
> > > - if (!msr_info->host_initiated &&
> > Not related to this patch, but I do wonder why we need
> > to always allow writing this MSR if done by the host,
> > since if neither RDTSCP nor RDPID is supported, the guest
> > won't be able to read this MSR at all.
>
> It's an ordering thing and not specific to MSR_TSC_AUX. Exempting host userspace
> from guest CPUID checks allows userspace to set MSR state, e.g. during migration,
> before setting the guest CPUID model.

I thought the rule was that if an MSR was enumerated by
KVM_GET_MSR_INDEX_LIST, then KVM had to accept legal writes from the
host. The only "ordering thing" is that KVM_GET_MSR_INDEX_LIST is a
device ioctl, so it can't take guest CPUID information into account.

> > > - !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> > > - !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> > > - return 1;
> > > msr_info->data = svm->tsc_aux;
> > > break;
> > > /*

2021-05-10 17:22:16

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 02/15] KVM: x86: Emulate RDPID only if RDTSCP is supported

On Mon, May 10, 2021, Maxim Levitsky wrote:
> On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> > Do not advertise emulation support for RDPID if RDTSCP is unsupported.
> > RDPID emulation subtly relies on MSR_TSC_AUX to exist in hardware, as
> > both vmx_get_msr() and svm_get_msr() will return an error if the MSR is
> > unsupported, i.e. ctxt->ops->get_msr() will fail and the emulator will
> > inject a #UD.
> >
> > Note, RDPID emulation also relies on RDTSCP being enabled in the guest,
> > but this is a KVM bug and will eventually be fixed.
> >
> > Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID")
> > Cc: [email protected]
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/kvm/cpuid.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index f765bf7a529c..c96f79c9fff2 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -637,7 +637,8 @@ static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
> > case 7:
> > entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> > entry->eax = 0;
> > - entry->ecx = F(RDPID);
> > + if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
> > + entry->ecx = F(RDPID);
> > ++array->nent;
> > default:
> > break;
>
> Just to make sure that I understand this correctly:
>
> This is what I know:
>
> Both RDTSCP and RDPID are instructions that read IA32_TSC_AUX
> (and RDTSCP also reads the TSC).
>
> Both instructions have their own CPUID bits (X86_FEATURE_RDPID, X86_FEATURE_RDTSCP)
> If either of these CPUID bits are present, IA32_TSC_AUX should be supported.

Yep.

> RDPID is a newer feature, thus I can at least for sanity's sake assume that
> usually a CPU will either have neither of the features, have only RDTSCP
> and IA32_TSC_AUX, or have both RDTSCP and RDPID.

Yep.

> If not supported in hardware KVM only emulates RDPID as I see.

Yep.

> Why, btw? Performance-wise, a guest that only wants IA32_TSC_AUX in
> userspace is better off using RDTSCP and paying the penalty of
> saving/restoring the unwanted registers than using RDPID with a vmexit.

FWIW, Linux doesn't even fall back to RDTSCP. If RDPID isn't supported, Linux
throws the info into the limit of a dummy segment in the GDT and uses LSL to get
at the data. Turns out that RDTSCP is too slow for its intended use case :-)
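
Roughly, a sketch of the idea (not the exact kernel code; the real thing
is vdso_read_cpunode(), which uses the __CPUNODE_SEG segment whose limit
encodes the CPU and node numbers):

	unsigned int p;

	if (static_cpu_has(X86_FEATURE_RDPID))
		asm volatile ("rdpid %0" : "=r" (p));    /* reads IA32_TSC_AUX */
	else
		asm volatile ("lsl %1, %0" : "=r" (p)    /* load segment limit */
			      : "r" ((u32)__CPUNODE_SEG));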

> My own guess for an answer to this question is that RDPID emulation is there
> to aid migration from a host that does support RDPID to a host that doesn't.

That's my assumption as well. Paolo's commit is a bit light on why emulation
was added in the first place, but emulating to allow migrating to old hardware
is the only motivation I can come up with.

commit fb6d4d340e0532032c808a9933eaaa7b8de435ab
Author: Paolo Bonzini <[email protected]>
Date: Tue Jul 12 11:04:26 2016 +0200

KVM: x86: emulate RDPID

This is encoded as F3 0F C7 /7 with a register argument. The register
argument is the second array in the group9 GroupDual, while F3 is the
fourth element of a Prefix.

> Having said all that, assuming that we don't want to emulate the RDTSCP too,
> when it is not supported, then this patch does make sense.
>
> Reviewed-by: Maxim Levitsky <[email protected]>
>
> Best regards,
> Maxim Levitsky
>
>

2021-05-10 17:57:50

by Reiji Watanabe

[permalink] [raw]
Subject: Re: [PATCH 09/15] KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list

> > Shouldn't vmx_setup_uret_msr(,MSR_TSC_AUX,) be called to update
> > the new flag load_into_hardware for MSR_TSC_AUX when CPUID
> > (X86_FEATURE_RDTSCP/X86_FEATURE_RDPID) of the vCPU is updated?
>
> Yes. I have a patch in the massive vCPU RESET/INIT series to do exactly that.
> I honestly can't remember if there was a dependency that "required" the fix to
> be buried in the middle of that series. I suspect not; I'm guessing I just
> didn't think it was worth backporting to stable kernels and so didn't prioritize
> hoisting the patch out of that mess.
>
> https://lkml.kernel.org/r/[email protected]

I see. I hadn't checked the patch.
Thank you for your answer!

Regards,
Reiji

2021-05-11 12:33:45

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 02/15] KVM: x86: Emulate RDPID only if RDTSCP is supported

On Mon, 2021-05-10 at 17:20 +0000, Sean Christopherson wrote:
> On Mon, May 10, 2021, Maxim Levitsky wrote:
> > On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> > > Do not advertise emulation support for RDPID if RDTSCP is unsupported.
> > > RDPID emulation subtly relies on MSR_TSC_AUX to exist in hardware, as
> > > both vmx_get_msr() and svm_get_msr() will return an error if the MSR is
> > > unsupported, i.e. ctxt->ops->get_msr() will fail and the emulator will
> > > inject a #UD.
> > >
> > > Note, RDPID emulation also relies on RDTSCP being enabled in the guest,
> > > but this is a KVM bug and will eventually be fixed.
> > >
> > > Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID")
> > > Cc: [email protected]
> > > Signed-off-by: Sean Christopherson <[email protected]>
> > > ---
> > > arch/x86/kvm/cpuid.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index f765bf7a529c..c96f79c9fff2 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -637,7 +637,8 @@ static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
> > > case 7:
> > > entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> > > entry->eax = 0;
> > > - entry->ecx = F(RDPID);
> > > + if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
> > > + entry->ecx = F(RDPID);
> > > ++array->nent;
> > > default:
> > > break;
> >
> > Just to make sure that I understand this correctly:
> >
> > This is what I know:
> >
> > Both RDTSCP and RDPID are instructions that read IA32_TSC_AUX
> > (and RDTSCP also reads the TSC).
> >
> > Both instructions have their own CPUID bits (X86_FEATURE_RDPID, X86_FEATURE_RDTSCP).
> > If either of these CPUID bits is present, IA32_TSC_AUX should be supported.
>
> Yep.
>
> > RDPID is a newer feature, thus I can, at least for sanity's sake, assume that
> > usually a CPU will have neither of the features, have only RDTSCP and
> > IA32_TSC_AUX, or have both RDTSCP and RDPID.
>
> Yep.
>
> > If not supported in hardware, KVM only emulates RDPID, as far as I can see.
>
> Yep.
>
> > Why btw? Performance-wise, a guest that only wants IA32_TSC_AUX in userspace
> > is better off using RDTSCP and paying the penalty of saving/restoring the
> > unwanted registers than using RDPID with a vmexit.
>
> FWIW, Linux doesn't even fall back to RDTSCP. If RDPID isn't supported, Linux
> throws the info into the limit of a dummy segment in the GDT and uses LSL to get
> at the data. Turns out that RDTSCP is too slow for its intended use case :-)
>
> > My own guess for an answer to this question is that RDPID emulation is there
> > to aid migration from a host that does support RDPID to a host that doesn't.
>
> That's my assumption as well. Paolo's commit is a bit light on why emulation
> was added in the first place, but emulating to allow migrating to old hardware
> is the only motivation I can come up with.

Cool, thanks!
Best regards,
Maxim Levitsky

>
> commit fb6d4d340e0532032c808a9933eaaa7b8de435ab
> Author: Paolo Bonzini <[email protected]>
> Date: Tue Jul 12 11:04:26 2016 +0200
>
> KVM: x86: emulate RDPID
>
> This is encoded as F3 0F C7 /7 with a register argument. The register
> argument is the second array in the group9 GroupDual, while F3 is the
> fourth element of a Prefix.
>
> > Having said all that, assuming we don't want to also emulate RDTSCP when it
> > is not supported, this patch does make sense.
> >
> > Reviewed-by: Maxim Levitsky <[email protected]>
> >
> > Best regards,
> > Maxim Levitsky
> >
> >


2021-05-11 12:35:42

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 03/15] KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest

On Mon, 2021-05-10 at 16:56 +0000, Sean Christopherson wrote:
> On Mon, May 10, 2021, Maxim Levitsky wrote:
> > On Tue, 2021-05-04 at 14:58 -0700, Jim Mattson wrote:
> > > On Tue, May 4, 2021 at 2:57 PM Paolo Bonzini <[email protected]> wrote:
> > > > On 04/05/21 23:53, Sean Christopherson wrote:
> > > > > > Does the right thing happen here if the vCPU is in guest mode when
> > > > > > userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> > > > > > off?
> > > > > I hate our terminology. By "guest mode", do you mean running the vCPU, or do
> > > > > you specifically mean running in L2?
> > > > >
> > > >
> > > > Guest mode should mean L2.
> > > >
> > > > (I wonder if we should have a capability that says "KVM_SET_CPUID2 can
> > > > only be called prior to KVM_RUN").
> > >
> > > It would certainly make it easier to reason about potential security issues.
> > >
> > I vote too for this.
>
> Alternatively, what about adding KVM_VCPU_RESET to let userspace explicitly
> pull RESET#, and defining that ioctl() to freeze the vCPU model? I.e. after
> userspace resets the vCPU, KVM_SET_CPUID (and any other relevant ioctls) is
> disallowed.

I vote even more strongly for this!

I recently created what Paolo called an 'evil' test to stress KVM-related
bugs in nested migration by simulating a nested migration with a vCPU reset,
followed by a reload of all of its state, including nested state.

It is ugly since, as you say, I have to manually load the reset state, and
thus using a 'KVM_VCPU_RESET' ioctl instead would make this test much
cleaner and even more 'evil'.

Best regards,
Maxim Levitsky


>
> Lack of proper RESET emulation is annoying, e.g. userspace has to manually stuff
> EDX after vCPU creation to get the right value at RESET. A dedicated ioctl()
> would kill two birds with one stone, without having to add yet another "2"
> ioctl().
>
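
To make the proposal concrete, a hypothetical userspace flow; note that
KVM_VCPU_RESET does not exist today, and its name and semantics here are
purely the suggestion quoted above:

	/* Hypothetical; KVM_VCPU_RESET is the proposed, not-yet-existing ioctl. */
	ioctl(vcpu_fd, KVM_SET_CPUID2, &cpuid);  /* configure the vCPU model */
	ioctl(vcpu_fd, KVM_VCPU_RESET, 0);       /* "pull RESET#", freeze model */
	/* From here on, KVM_SET_CPUID2 (and friends) would be rejected. */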


2021-05-11 12:36:07

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 08/15] KVM: VMX: Configure list of user return MSRs at module init

On Mon, 2021-05-10 at 15:13 +0000, Sean Christopherson wrote:
> On Mon, May 10, 2021, Maxim Levitsky wrote:
> > On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> > > @@ -6929,18 +6942,10 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
> > > goto free_vpid;
> > > }
> > >
> > > - BUILD_BUG_ON(ARRAY_SIZE(vmx_uret_msrs_list) != MAX_NR_USER_RETURN_MSRS);
> > > + for (i = 0; i < vmx_nr_uret_msrs; ++i) {
> > > + vmx->guest_uret_msrs[i].data = 0;
> > >
> > > - for (i = 0; i < ARRAY_SIZE(vmx_uret_msrs_list); ++i) {
> > > - u32 index = vmx_uret_msrs_list[i];
> > > - int j = vmx->nr_uret_msrs;
> > > -
> > > - if (kvm_probe_user_return_msr(index))
> > > - continue;
> > > -
> > > - vmx->guest_uret_msrs[j].slot = i;
> > I don't see anything initializing .slot after this patch.
> > This code is removed later, which masks the bug, but for
> > bisection's sake I think this patch should still be fixed.
>
> Egad, indeed it's broken. I'll retest the whole series to verify the other
> patches will bisect cleanly.
>
> Nice catch!
>
Thanks!

Best regards,
Maxim Levitsky
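
For the record, the bisection-safe fix under discussion is roughly the
following, a sketch assuming that once the list is configured at module init,
the per-vCPU entries map 1:1 to the common list:

	for (i = 0; i < vmx_nr_uret_msrs; ++i) {
		vmx->guest_uret_msrs[i].data = 0;
		/* The per-vCPU list now mirrors the module-level list 1:1. */
		vmx->guest_uret_msrs[i].slot = i;
	}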

2021-05-11 12:37:07

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 14/15] KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model

On Mon, 2021-05-10 at 10:11 -0700, Jim Mattson wrote:
> On Mon, May 10, 2021 at 9:50 AM Sean Christopherson <[email protected]> wrote:
> > On Mon, May 10, 2021, Maxim Levitsky wrote:
> > > On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > > index de921935e8de..6c7c6a303cc5 100644
> > > > --- a/arch/x86/kvm/svm/svm.c
> > > > +++ b/arch/x86/kvm/svm/svm.c
> > > > @@ -2663,12 +2663,6 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > > > msr_info->data |= (u64)svm->sysenter_esp_hi << 32;
> > > > break;
> > > > case MSR_TSC_AUX:
> > > > - if (tsc_aux_uret_slot < 0)
> > > > - return 1;
> > > > - if (!msr_info->host_initiated &&
> > > Not related to this patch, but I do wonder why we need
> > > to always allow writing this MSR if done by the host,
> > > since if neither RDTSCP nor RDPID is supported, the guest
> > > won't be able to read this MSR at all.
> >
> > It's an ordering thing and not specific to MSR_TSC_AUX. Exempting host userspace
> > from guest CPUID checks allows userspace to set MSR state, e.g. during migration,
> > before setting the guest CPUID model.
>
> I thought the rule was that if an MSR was enumerated by
> KVM_GET_MSR_INDEX_LIST, then KVM had to accept legal writes from the
> host. The only "ordering thing" is that KVM_GET_MSR_INDEX_LIST is a
> device ioctl, so it can't take guest CPUID information into account.

This makes sense.

Thanks!
Best regards,
Maxim Levitsky
>
> > > > - !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> > > > - !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> > > > - return 1;
> > > > msr_info->data = svm->tsc_aux;
> > > > break;
> > > > /*


2021-05-11 20:11:27

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 08/15] KVM: VMX: Configure list of user return MSRs at module init

On Tue, May 11, 2021, Maxim Levitsky wrote:
> On Mon, 2021-05-10 at 15:13 +0000, Sean Christopherson wrote:
> > On Mon, May 10, 2021, Maxim Levitsky wrote:
> > > On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> > > > @@ -6929,18 +6942,10 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
> > > > goto free_vpid;
> > > > }
> > > >
> > > > - BUILD_BUG_ON(ARRAY_SIZE(vmx_uret_msrs_list) != MAX_NR_USER_RETURN_MSRS);
> > > > + for (i = 0; i < vmx_nr_uret_msrs; ++i) {
> > > > + vmx->guest_uret_msrs[i].data = 0;
> > > >
> > > > - for (i = 0; i < ARRAY_SIZE(vmx_uret_msrs_list); ++i) {
> > > > - u32 index = vmx_uret_msrs_list[i];
> > > > - int j = vmx->nr_uret_msrs;
> > > > -
> > > > - if (kvm_probe_user_return_msr(index))
> > > > - continue;
> > > > -
> > > > - vmx->guest_uret_msrs[j].slot = i;
> > > I don't see anything initializing .slot after this patch.
> > > This code is removed later, which masks the bug, but for
> > > bisection's sake I think this patch should still be fixed.
> >
> > Egad, indeed it's broken. I'll retest the whole series to verify the other
> > patches will bisect cleanly.
> >
> > Nice catch!
> >
> Thanks!

Unfortunately this made its way to Linus before I could fix my goof. Fingers
crossed no one is unfortunate enough to land on this while bisecting...

2021-05-19 17:53:03

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 03/15] KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest

On 10/05/21 18:56, Sean Christopherson wrote:
> On Mon, May 10, 2021, Maxim Levitsky wrote:
>> On Tue, 2021-05-04 at 14:58 -0700, Jim Mattson wrote:
>>> On Tue, May 4, 2021 at 2:57 PM Paolo Bonzini <[email protected]> wrote:
>>>> On 04/05/21 23:53, Sean Christopherson wrote:
>>>>>> Does the right thing happen here if the vCPU is in guest mode when
>>>>>> userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
>>>>>> off?
>>>>> I hate our terminology. By "guest mode", do you mean running the vCPU, or do
>>>>> you specifically mean running in L2?
>>>>>
>>>>
>>>> Guest mode should mean L2.
>>>>
>>>> (I wonder if we should have a capability that says "KVM_SET_CPUID2 can
>>>> only be called prior to KVM_RUN").
>>>
>>> It would certainly make it easier to reason about potential security issues.
>>>
>> I vote too for this.
>
> Alternatively, what about adding KVM_VCPU_RESET to let userspace explicitly
> pull RESET#, and defining that ioctl() to freeze the vCPU model? I.e. after
> userspace resets the vCPU, KVM_SET_CPUID (and any other relevant ioctls) is
> disallowed.
>
> Lack of proper RESET emulation is annoying, e.g. userspace has to manually stuff
> EDX after vCPU creation to get the right value at RESET. A dedicated ioctl()
> would kill two birds with one stone, without having to add yet another "2"
> ioctl().

That has the disadvantage of making the more secure behavior opt-in, but we
can do both (forbidding KVM_SET_CPUID2 after both KVM_RUN and KVM_RESET).

Paolo


2021-05-19 21:05:10

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 03/15] KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest

On Tue, May 18, 2021, Paolo Bonzini wrote:
> On 10/05/21 18:56, Sean Christopherson wrote:
> > On Mon, May 10, 2021, Maxim Levitsky wrote:
> > > On Tue, 2021-05-04 at 14:58 -0700, Jim Mattson wrote:
> > > > On Tue, May 4, 2021 at 2:57 PM Paolo Bonzini <[email protected]> wrote:
> > > > > On 04/05/21 23:53, Sean Christopherson wrote:
> > > > > > > Does the right thing happen here if the vCPU is in guest mode when
> > > > > > > userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> > > > > > > off?
> > > > > > I hate our terminology. By "guest mode", do you mean running the vCPU, or do
> > > > > > you specifically mean running in L2?
> > > > > >
> > > > >
> > > > > Guest mode should mean L2.
> > > > >
> > > > > (I wonder if we should have a capability that says "KVM_SET_CPUID2 can
> > > > > only be called prior to KVM_RUN").
> > > >
> > > > It would certainly make it easier to reason about potential security issues.
> > > >
> > > I vote too for this.
> >
> > Alternatively, what about adding KVM_VCPU_RESET to let userspace explicitly
> > pull RESET#, and defining that ioctl() to freeze the vCPU model? I.e. after
> > userspace resets the vCPU, KVM_SET_CPUID (and any other relevant ioctls) is
> > disallowed.
> >
> > Lack of proper RESET emulation is annoying, e.g. userspace has to manually stuff
> > EDX after vCPU creation to get the right value at RESET. A dedicated ioctl()
> > would kill two birds with one stone, without having to add yet another "2"
> > ioctl().
>
> That has the disadvantage of making the more secure behavior opt-in, but we
> can do both (forbidding KVM_SET_CPUID2 after both KVM_RUN and KVM_RESET).

Doesn't changing KVM_SET_CPUID2 need to be opt-in as well, e.g. if the strict
behavior is activated via a capability?