2023-03-22 01:18:03

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 0/6] KVM: x86: Unhost the *_CMD MSR mess

Revert the FLUSH_L1D enabling, which has multiple fatal bugs, clean up
the existing PRED_CMD handling, and reintroduce FLUSH_L1D virtualization
without inheriting the mistakes made by PRED_CMD.

The last patch hardens SVM against one of the bugs introduced in the
FLUSH_L1D enabling.

I'll post KUT patches tomorrow. I have the tests written (and they found
bugs in my code, :shocked-pikachu:), just need to write the changelogs.
Wanted to get this out sooner than later as I'm guessing I'm not the only
one whose VMs won't boot on Intel CPUs...

Sean Christopherson (6):
KVM: x86: Revert MSR_IA32_FLUSH_CMD.FLUSH_L1D enabling
KVM: VMX: Passthrough MSR_IA32_PRED_CMD based purely on host+guest
CPUID
KVM: SVM: Passthrough MSR_IA32_PRED_CMD based purely on host+guest
CPUID
KVM: x86: Move MSR_IA32_PRED_CMD WRMSR emulation to common code
KVM: x86: Virtualize FLUSH_L1D and passthrough MSR_IA32_FLUSH_CMD
KVM: SVM: Return the local "r" variable from svm_set_msr()

arch/x86/kvm/svm/svm.c | 51 +++++++++++-----------------------------
arch/x86/kvm/vmx/vmx.c | 53 +++++++-----------------------------------
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.c | 23 ++++++++++++++++++
4 files changed, 46 insertions(+), 83 deletions(-)


base-commit: d8708b80fa0e6e21bc0c9e7276ad0bccef73b6e7
--
2.40.0.rc2.332.ga46443480c-goog


2023-03-22 01:18:11

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 2/6] KVM: VMX: Passthrough MSR_IA32_PRED_CMD based purely on host+guest CPUID

Passthrough MSR_IA32_PRED_CMD based purely on whether or not the MSR is
supported and enabled, i.e. don't wait until the first write. There's no
benefit to deferred passthrough, and the extra logic only adds complexity.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 16 +++-------------
1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f777509ecf17..5c01c76c0d45 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2298,19 +2298,6 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;

wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
-
- /*
- * For non-nested:
- * When it's written (to non-zero) for the first time, pass
- * it through.
- *
- * For nested:
- * The handling of the MSR bitmap for L2 guests is done in
- * nested_vmx_prepare_msr_bitmap. We should not touch the
- * vmcs02.msr_bitmap here since it gets completely overwritten
- * in the merging.
- */
- vmx_disable_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
if (!kvm_pat_valid(data))
@@ -7743,6 +7730,9 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
!guest_cpuid_has(vcpu, X86_FEATURE_XFD));

+ if (boot_cpu_has(X86_FEATURE_IBPB))
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
+ !guest_has_pred_cmd_msr(vcpu));

set_cr4_guest_host_mask(vmx);

--
2.40.0.rc2.332.ga46443480c-goog

2023-03-22 01:18:15

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 1/6] KVM: x86: Revert MSR_IA32_FLUSH_CMD.FLUSH_L1D enabling

Revert the recently added virtualizing of MSR_IA32_FLUSH_CMD, as both
the VMX and SVM are fatally buggy to guests that use MSR_IA32_FLUSH_CMD or
MSR_IA32_PRED_CMD, and because the entire foundation of the logic is
flawed.

The most immediate problem is an inverted check on @cmd that results in
rejecting legal values. SVM doubles down on bugs and drops the error,
i.e. silently breaks all guest mitigations based on the command MSRs.

The next issue is that neither VMX nor SVM was updated to mark
MSR_IA32_FLUSH_CMD as being a possible passthrough MSR,
which isn't hugely problematic, but does break MSR filtering and triggers
a WARN on VMX designed to catch this exact bug.

The foundational issues stem from the MSR_IA32_FLUSH_CMD code reusing
logic from MSR_IA32_PRED_CMD, which in turn was likely copied from KVM's
support for MSR_IA32_SPEC_CTRL. The copy+paste from MSR_IA32_SPEC_CTRL
was misguided as MSR_IA32_PRED_CMD (and MSR_IA32_FLUSH_CMD) is a
write-only MSR, i.e. doesn't need the same "deferred passthrough"
shenanigans as MSR_IA32_SPEC_CTRL.

Revert all MSR_IA32_FLUSH_CMD enabling in one fell swoop so that there is
no point where KVM advertises, but does not support, L1D_FLUSH.

This reverts commits 45cf86f26148e549c5ba4a8ab32a390e4bde216e,
723d5fb0ffe4c02bd4edf47ea02c02e454719f28, and
a807b78ad04b2eaa348f52f5cc7702385b6de1ee.

Reported-by: Nathan Chancellor <[email protected]>
Link: https://lkml.kernel.org/r/20230317190432.GA863767%40dev-arch.thelio-3990X
Cc: Emanuele Giuseppe Esposito <[email protected]>
Cc: Pawan Gupta <[email protected]>
Cc: Jim Mattson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/svm/svm.c | 43 ++++++++-----------------
arch/x86/kvm/vmx/nested.c | 3 --
arch/x86/kvm/vmx/vmx.c | 68 ++++++++++++++-------------------------
4 files changed, 39 insertions(+), 77 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 9583a110cf5f..599aebec2d52 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -653,7 +653,7 @@ void kvm_set_cpu_caps(void)
F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) |
- F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D)
+ F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16)
);

/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 70183d2271b5..252e7f37e4e2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2869,28 +2869,6 @@ static int svm_set_vm_cr(struct kvm_vcpu *vcpu, u64 data)
return 0;
}

-static int svm_set_msr_ia32_cmd(struct kvm_vcpu *vcpu, struct msr_data *msr,
- bool guest_has_feat, u64 cmd,
- int x86_feature_bit)
-{
- struct vcpu_svm *svm = to_svm(vcpu);
-
- if (!msr->host_initiated && !guest_has_feat)
- return 1;
-
- if (!(msr->data & ~cmd))
- return 1;
- if (!boot_cpu_has(x86_feature_bit))
- return 1;
- if (!msr->data)
- return 0;
-
- wrmsrl(msr->index, cmd);
- set_msr_interception(vcpu, svm->msrpm, msr->index, 0, 1);
-
- return 0;
-}
-
static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -2965,14 +2943,19 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
break;
case MSR_IA32_PRED_CMD:
- r = svm_set_msr_ia32_cmd(vcpu, msr,
- guest_has_pred_cmd_msr(vcpu),
- PRED_CMD_IBPB, X86_FEATURE_IBPB);
- break;
- case MSR_IA32_FLUSH_CMD:
- r = svm_set_msr_ia32_cmd(vcpu, msr,
- guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D),
- L1D_FLUSH, X86_FEATURE_FLUSH_L1D);
+ if (!msr->host_initiated &&
+ !guest_has_pred_cmd_msr(vcpu))
+ return 1;
+
+ if (data & ~PRED_CMD_IBPB)
+ return 1;
+ if (!boot_cpu_has(X86_FEATURE_IBPB))
+ return 1;
+ if (!data)
+ break;
+
+ wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
break;
case MSR_AMD64_VIRT_SPEC_CTRL:
if (!msr->host_initiated &&
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f63b28f46a71..1bc2b80273c9 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -654,9 +654,6 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
MSR_IA32_PRED_CMD, MSR_TYPE_W);

- nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
- MSR_IA32_FLUSH_CMD, MSR_TYPE_W);
-
kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);

vmx->nested.force_msr_bitmap_recalc = false;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d7bf14abdba1..f777509ecf17 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2133,39 +2133,6 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
return debugctl;
}

-static int vmx_set_msr_ia32_cmd(struct kvm_vcpu *vcpu,
- struct msr_data *msr_info,
- bool guest_has_feat, u64 cmd,
- int x86_feature_bit)
-{
- if (!msr_info->host_initiated && !guest_has_feat)
- return 1;
-
- if (!(msr_info->data & ~cmd))
- return 1;
- if (!boot_cpu_has(x86_feature_bit))
- return 1;
- if (!msr_info->data)
- return 0;
-
- wrmsrl(msr_info->index, cmd);
-
- /*
- * For non-nested:
- * When it's written (to non-zero) for the first time, pass
- * it through.
- *
- * For nested:
- * The handling of the MSR bitmap for L2 guests is done in
- * nested_vmx_prepare_msr_bitmap. We should not touch the
- * vmcs02.msr_bitmap here since it gets completely overwritten
- * in the merging.
- */
- vmx_disable_intercept_for_msr(vcpu, msr_info->index, MSR_TYPE_W);
-
- return 0;
-}
-
/*
* Writes msr value into the appropriate "register".
* Returns 0 on success, non-0 otherwise.
@@ -2319,16 +2286,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
goto find_uret_msr;
case MSR_IA32_PRED_CMD:
- ret = vmx_set_msr_ia32_cmd(vcpu, msr_info,
- guest_has_pred_cmd_msr(vcpu),
- PRED_CMD_IBPB,
- X86_FEATURE_IBPB);
- break;
- case MSR_IA32_FLUSH_CMD:
- ret = vmx_set_msr_ia32_cmd(vcpu, msr_info,
- guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D),
- L1D_FLUSH,
- X86_FEATURE_FLUSH_L1D);
+ if (!msr_info->host_initiated &&
+ !guest_has_pred_cmd_msr(vcpu))
+ return 1;
+
+ if (data & ~PRED_CMD_IBPB)
+ return 1;
+ if (!boot_cpu_has(X86_FEATURE_IBPB))
+ return 1;
+ if (!data)
+ break;
+
+ wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+
+ /*
+ * For non-nested:
+ * When it's written (to non-zero) for the first time, pass
+ * it through.
+ *
+ * For nested:
+ * The handling of the MSR bitmap for L2 guests is done in
+ * nested_vmx_prepare_msr_bitmap. We should not touch the
+ * vmcs02.msr_bitmap here since it gets completely overwritten
+ * in the merging.
+ */
+ vmx_disable_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
if (!kvm_pat_valid(data))
--
2.40.0.rc2.332.ga46443480c-goog

2023-03-22 01:18:53

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 4/6] KVM: x86: Move MSR_IA32_PRED_CMD WRMSR emulation to common code

Dedup the handling of MSR_IA32_PRED_CMD across VMX and SVM by moving the
logic to kvm_set_msr_common(). Now that the MSR interception toggling is
handled as part of setting guest CPUID, the VMX and SVM paths are
identical.

Opportunistically massage the code to make it a wee bit denser.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 14 --------------
arch/x86/kvm/vmx/vmx.c | 14 --------------
arch/x86/kvm/x86.c | 11 +++++++++++
3 files changed, 11 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f757b436ffae..85bb535fc321 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2942,20 +2942,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
*/
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
break;
- case MSR_IA32_PRED_CMD:
- if (!msr->host_initiated &&
- !guest_has_pred_cmd_msr(vcpu))
- return 1;
-
- if (data & ~PRED_CMD_IBPB)
- return 1;
- if (!boot_cpu_has(X86_FEATURE_IBPB))
- return 1;
- if (!data)
- break;
-
- wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
- break;
case MSR_AMD64_VIRT_SPEC_CTRL:
if (!msr->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_VIRT_SSBD))
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5c01c76c0d45..29807be219b9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2285,20 +2285,6 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (data & ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR))
return 1;
goto find_uret_msr;
- case MSR_IA32_PRED_CMD:
- if (!msr_info->host_initiated &&
- !guest_has_pred_cmd_msr(vcpu))
- return 1;
-
- if (data & ~PRED_CMD_IBPB)
- return 1;
- if (!boot_cpu_has(X86_FEATURE_IBPB))
- return 1;
- if (!data)
- break;
-
- wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
- break;
case MSR_IA32_CR_PAT:
if (!kvm_pat_valid(data))
return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 237c483b1230..c83ec88da043 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3617,6 +3617,17 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vcpu->arch.perf_capabilities = data;
kvm_pmu_refresh(vcpu);
return 0;
+ case MSR_IA32_PRED_CMD:
+ if (!msr_info->host_initiated && !guest_has_pred_cmd_msr(vcpu))
+ return 1;
+
+ if (!boot_cpu_has(X86_FEATURE_IBPB) || (data & ~PRED_CMD_IBPB))
+ return 1;
+ if (!data)
+ break;
+
+ wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+ break;
case MSR_EFER:
return set_efer(vcpu, msr_info);
case MSR_K7_HWCR:
--
2.40.0.rc2.332.ga46443480c-goog

2023-03-22 01:19:12

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 3/6] KVM: SVM: Passthrough MSR_IA32_PRED_CMD based purely on host+guest CPUID

Passthrough MSR_IA32_PRED_CMD based purely on whether or not the MSR is
supported and enabled, i.e. don't wait until the first write. There's no
benefit to deferred passthrough, and the extra logic only adds complexity.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 252e7f37e4e2..f757b436ffae 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2955,7 +2955,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
break;

wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
- set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
break;
case MSR_AMD64_VIRT_SPEC_CTRL:
if (!msr->host_initiated &&
@@ -4151,6 +4150,10 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)

svm_recalc_instruction_intercepts(vcpu, svm);

+ if (boot_cpu_has(X86_FEATURE_IBPB))
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PRED_CMD, 0,
+ !!guest_has_pred_cmd_msr(vcpu));
+
/* For sev guests, the memory encryption bit is not reserved in CR3. */
if (sev_guest(vcpu->kvm)) {
best = kvm_find_cpuid_entry(vcpu, 0x8000001F);
--
2.40.0.rc2.332.ga46443480c-goog

2023-03-22 01:19:14

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 5/6] KVM: x86: Virtualize FLUSH_L1D and passthrough MSR_IA32_FLUSH_CMD

Virtualize FLUSH_L1D so that the guest can use the performant L1D flush
if one of the many mitigations might require a flush in the guest, e.g.
Linux provides an option to flush the L1D when switching mms.

Passthrough MSR_IA32_FLUSH_CMD for write when it's supported in hardware
and exposed to the guest, i.e. always let the guest write it directly if
FLUSH_L1D is fully supported.

Forward writes to hardware in host context on the off chance that KVM
ends up emulating a WRMSR, or in the really unlikely scenario where
userspace wants to force a flush. Restrict these forwarded WRMSRs to
the known command out of an abundance of caution. Passing through the
MSR means the guest can throw any and all values at hardware, but doing
so in host context is arguably a bit more dangerous.

Link: https://lkml.kernel.org/r/CALMp9eTt3xzAEoQ038bJQ9LN0ZOXrSWsN7xnNUD%2B0SS%3DWwF7Pg%40mail.gmail.com
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/svm/svm.c | 5 +++++
arch/x86/kvm/vmx/nested.c | 3 +++
arch/x86/kvm/vmx/vmx.c | 5 +++++
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.c | 12 ++++++++++++
6 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 599aebec2d52..9583a110cf5f 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -653,7 +653,7 @@ void kvm_set_cpu_caps(void)
F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) |
- F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16)
+ F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D)
);

/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 85bb535fc321..b32edaf5a74b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -95,6 +95,7 @@ static const struct svm_direct_access_msrs {
#endif
{ .index = MSR_IA32_SPEC_CTRL, .always = false },
{ .index = MSR_IA32_PRED_CMD, .always = false },
+ { .index = MSR_IA32_FLUSH_CMD, .always = false },
{ .index = MSR_IA32_LASTBRANCHFROMIP, .always = false },
{ .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
{ .index = MSR_IA32_LASTINTFROMIP, .always = false },
@@ -4140,6 +4141,10 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PRED_CMD, 0,
!!guest_has_pred_cmd_msr(vcpu));

+ if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0,
+ !!guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
+
/* For sev guests, the memory encryption bit is not reserved in CR3. */
if (sev_guest(vcpu->kvm)) {
best = kvm_find_cpuid_entry(vcpu, 0x8000001F);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1bc2b80273c9..f63b28f46a71 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -654,6 +654,9 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
MSR_IA32_PRED_CMD, MSR_TYPE_W);

+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_FLUSH_CMD, MSR_TYPE_W);
+
kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);

vmx->nested.force_msr_bitmap_recalc = false;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 29807be219b9..56e0c7ae961d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -164,6 +164,7 @@ module_param(allow_smaller_maxphyaddr, bool, S_IRUGO);
static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = {
MSR_IA32_SPEC_CTRL,
MSR_IA32_PRED_CMD,
+ MSR_IA32_FLUSH_CMD,
MSR_IA32_TSC,
#ifdef CONFIG_X86_64
MSR_FS_BASE,
@@ -7720,6 +7721,10 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
!guest_has_pred_cmd_msr(vcpu));

+ if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
+ !guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
+
set_cr4_guest_host_mask(vmx);

vmx_write_encls_bitmap(vcpu, NULL);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 2acdc54bc34b..cb766f65a3eb 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -369,7 +369,7 @@ struct vcpu_vmx {
struct lbr_desc lbr_desc;

/* Save desired MSR intercept (read: pass-through) state */
-#define MAX_POSSIBLE_PASSTHROUGH_MSRS 15
+#define MAX_POSSIBLE_PASSTHROUGH_MSRS 16
struct {
DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c83ec88da043..3c58dbae7b4c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3628,6 +3628,18 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)

wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
break;
+ case MSR_IA32_FLUSH_CMD:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D))
+ return 1;
+
+ if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D) || (data & ~L1D_FLUSH))
+ return 1;
+ if (!data)
+ break;
+
+ wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
+ break;
case MSR_EFER:
return set_efer(vcpu, msr_info);
case MSR_K7_HWCR:
--
2.40.0.rc2.332.ga46443480c-goog

2023-03-22 01:19:17

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 6/6] KVM: SVM: Return the local "r" variable from svm_set_msr()

Rename "r" to "ret" and actually return it from svm_set_msr() to reduce
the probability of repeating the mistake of commit 723d5fb0ffe4 ("kvm:
svm: Add IA32_FLUSH_CMD guest support"), which set "r" thinking that it
would be propagated to the caller.

Alternatively, the declaration of "r" could be moved into the handling of
MSR_TSC_AUX, but that risks variable shadowing in the future. A wrapper
for kvm_set_user_return_msr() would allow eliding a local variable, but
that feels like delaying the inevitable.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b32edaf5a74b..57f241c5a371 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2873,7 +2873,7 @@ static int svm_set_vm_cr(struct kvm_vcpu *vcpu, u64 data)
static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
{
struct vcpu_svm *svm = to_svm(vcpu);
- int r;
+ int ret = 0;

u32 ecx = msr->index;
u64 data = msr->data;
@@ -2995,10 +2995,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
* guest via direct_access_msrs, and switch it via user return.
*/
preempt_disable();
- r = kvm_set_user_return_msr(tsc_aux_uret_slot, data, -1ull);
+ ret = kvm_set_user_return_msr(tsc_aux_uret_slot, data, -1ull);
preempt_enable();
- if (r)
- return 1;
+ if (ret)
+ break;

svm->tsc_aux = data;
break;
@@ -3056,7 +3056,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
default:
return kvm_set_msr_common(vcpu, msr);
}
- return 0;
+ return ret;
}

static int msr_interception(struct kvm_vcpu *vcpu)
--
2.40.0.rc2.332.ga46443480c-goog

2023-03-22 08:24:44

by Mathias Krause

[permalink] [raw]
Subject: Re: [PATCH 1/6] KVM: x86: Revert MSR_IA32_FLUSH_CMD.FLUSH_L1D enabling

On 22.03.23 02:14, Sean Christopherson wrote:
> Revert the recently added virtualizing of MSR_IA32_FLUSH_CMD, as both
> the VMX and SVM are fatally buggy to guests that use MSR_IA32_FLUSH_CMD or
> MSR_IA32_PRED_CMD, and because the entire foundation of the logic is
> flawed.
>
> The most immediate problem is an inverted check on @cmd that results in
> rejecting legal values. SVM doubles down on bugs and drops the error,
> i.e. silently breaks all guest mitigations based on the command MSRs.
>
> The next issue is that neither VMX nor SVM was updated to mark
> MSR_IA32_FLUSH_CMD as being a possible passthrough MSR,
> which isn't hugely problematic, but does break MSR filtering and triggers
> a WARN on VMX designed to catch this exact bug.
>
> The foundational issues stem from the MSR_IA32_FLUSH_CMD code reusing
> logic from MSR_IA32_PRED_CMD, which in turn was likely copied from KVM's
> support for MSR_IA32_SPEC_CTRL. The copy+paste from MSR_IA32_SPEC_CTRL
> was misguided as MSR_IA32_PRED_CMD (and MSR_IA32_FLUSH_CMD) is a
> write-only MSR, i.e. doesn't need the same "deferred passthrough"
> shenanigans as MSR_IA32_SPEC_CTRL.
>
> Revert all MSR_IA32_FLUSH_CMD enabling in one fell swoop so that there is
> no point where KVM advertises, but does not support, L1D_FLUSH.
>
> This reverts commits 45cf86f26148e549c5ba4a8ab32a390e4bde216e,
> 723d5fb0ffe4c02bd4edf47ea02c02e454719f28, and
> a807b78ad04b2eaa348f52f5cc7702385b6de1ee.
>
> Reported-by: Nathan Chancellor <[email protected]>
> Link: https://lkml.kernel.org/r/20230317190432.GA863767%40dev-arch.thelio-3990X
> Cc: Emanuele Giuseppe Esposito <[email protected]>
> Cc: Pawan Gupta <[email protected]>
> Cc: Jim Mattson <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/cpuid.c | 2 +-
> arch/x86/kvm/svm/svm.c | 43 ++++++++-----------------
> arch/x86/kvm/vmx/nested.c | 3 --
> arch/x86/kvm/vmx/vmx.c | 68 ++++++++++++++-------------------------
> 4 files changed, 39 insertions(+), 77 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 9583a110cf5f..599aebec2d52 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -653,7 +653,7 @@ void kvm_set_cpu_caps(void)
> F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
> F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
> F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) |
> - F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D)
> + F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16)
> );
>
> /* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 70183d2271b5..252e7f37e4e2 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2869,28 +2869,6 @@ static int svm_set_vm_cr(struct kvm_vcpu *vcpu, u64 data)
> return 0;
> }
>
> -static int svm_set_msr_ia32_cmd(struct kvm_vcpu *vcpu, struct msr_data *msr,
> - bool guest_has_feat, u64 cmd,
> - int x86_feature_bit)
> -{
> - struct vcpu_svm *svm = to_svm(vcpu);
> -
> - if (!msr->host_initiated && !guest_has_feat)
> - return 1;
> -
> - if (!(msr->data & ~cmd))
> - return 1;
> - if (!boot_cpu_has(x86_feature_bit))
> - return 1;
> - if (!msr->data)
> - return 0;
> -
> - wrmsrl(msr->index, cmd);
> - set_msr_interception(vcpu, svm->msrpm, msr->index, 0, 1);
> -
> - return 0;
> -}
> -
> static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> @@ -2965,14 +2943,19 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
> break;
> case MSR_IA32_PRED_CMD:
> - r = svm_set_msr_ia32_cmd(vcpu, msr,
> - guest_has_pred_cmd_msr(vcpu),
> - PRED_CMD_IBPB, X86_FEATURE_IBPB);
> - break;
> - case MSR_IA32_FLUSH_CMD:
> - r = svm_set_msr_ia32_cmd(vcpu, msr,
> - guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D),
> - L1D_FLUSH, X86_FEATURE_FLUSH_L1D);
> + if (!msr->host_initiated &&
> + !guest_has_pred_cmd_msr(vcpu))
> + return 1;
> +
> + if (data & ~PRED_CMD_IBPB)
> + return 1;
> + if (!boot_cpu_has(X86_FEATURE_IBPB))
> + return 1;
> + if (!data)
> + break;
> +
> + wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
> break;
> case MSR_AMD64_VIRT_SPEC_CTRL:
> if (!msr->host_initiated &&
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index f63b28f46a71..1bc2b80273c9 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -654,9 +654,6 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
> nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
> MSR_IA32_PRED_CMD, MSR_TYPE_W);
>
> - nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
> - MSR_IA32_FLUSH_CMD, MSR_TYPE_W);
> -
> kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);
>
> vmx->nested.force_msr_bitmap_recalc = false;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index d7bf14abdba1..f777509ecf17 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2133,39 +2133,6 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
> return debugctl;
> }
>
> -static int vmx_set_msr_ia32_cmd(struct kvm_vcpu *vcpu,
> - struct msr_data *msr_info,
> - bool guest_has_feat, u64 cmd,
> - int x86_feature_bit)
> -{
> - if (!msr_info->host_initiated && !guest_has_feat)
> - return 1;
> -
> - if (!(msr_info->data & ~cmd))
> - return 1;
> - if (!boot_cpu_has(x86_feature_bit))
> - return 1;
> - if (!msr_info->data)
> - return 0;
> -
> - wrmsrl(msr_info->index, cmd);
> -
> - /*
> - * For non-nested:
> - * When it's written (to non-zero) for the first time, pass
> - * it through.
> - *
> - * For nested:
> - * The handling of the MSR bitmap for L2 guests is done in
> - * nested_vmx_prepare_msr_bitmap. We should not touch the
> - * vmcs02.msr_bitmap here since it gets completely overwritten
> - * in the merging.
> - */
> - vmx_disable_intercept_for_msr(vcpu, msr_info->index, MSR_TYPE_W);
> -
> - return 0;
> -}
> -
> /*
> * Writes msr value into the appropriate "register".
> * Returns 0 on success, non-0 otherwise.
> @@ -2319,16 +2286,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> return 1;
> goto find_uret_msr;
> case MSR_IA32_PRED_CMD:
> - ret = vmx_set_msr_ia32_cmd(vcpu, msr_info,
> - guest_has_pred_cmd_msr(vcpu),
> - PRED_CMD_IBPB,
> - X86_FEATURE_IBPB);
> - break;
> - case MSR_IA32_FLUSH_CMD:
> - ret = vmx_set_msr_ia32_cmd(vcpu, msr_info,
> - guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D),
> - L1D_FLUSH,
> - X86_FEATURE_FLUSH_L1D);
> + if (!msr_info->host_initiated &&
> + !guest_has_pred_cmd_msr(vcpu))
> + return 1;
> +
> + if (data & ~PRED_CMD_IBPB)
> + return 1;
> + if (!boot_cpu_has(X86_FEATURE_IBPB))
> + return 1;
> + if (!data)
> + break;
> +
> + wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> +
> + /*
> + * For non-nested:
> + * When it's written (to non-zero) for the first time, pass
> + * it through.
> + *
> + * For nested:
> + * The handling of the MSR bitmap for L2 guests is done in
> + * nested_vmx_prepare_msr_bitmap. We should not touch the
> + * vmcs02.msr_bitmap here since it gets completely overwritten
> + * in the merging.
> + */
> + vmx_disable_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W);
> break;
> case MSR_IA32_CR_PAT:
> if (!kvm_pat_valid(data))

This fixes the boot crash others an me ran into. Thanks a lot, Sean!

Tested-by: Mathias Krause <[email protected]>

2023-03-23 05:10:30

by Pawan Gupta

[permalink] [raw]
Subject: Re: [PATCH 5/6] KVM: x86: Virtualize FLUSH_L1D and passthrough MSR_IA32_FLUSH_CMD

On Tue, Mar 21, 2023 at 06:14:39PM -0700, Sean Christopherson wrote:
> Virtualize FLUSH_L1D so that the guest can use the performant L1D flush
> if one of the many mitigations might require a flush in the guest, e.g.
> Linux provides an option to flush the L1D when switching mms.
>
> Passthrough MSR_IA32_FLUSH_CMD for write when it's supported in hardware
> and exposed to the guest, i.e. always let the guest write it directly if
> FLUSH_L1D is fully supported.
>
> Forward writes to hardware in host context on the off chance that KVM
> ends up emulating a WRMSR, or in the really unlikely scenario where
> userspace wants to force a flush. Restrict these forwarded WRMSRs to
> the known command out of an abundance of caution. Passing through the
> MSR means the guest can throw any and all values at hardware, but doing
> so in host context is arguably a bit more dangerous.
>
> Link: https://lkml.kernel.org/r/CALMp9eTt3xzAEoQ038bJQ9LN0ZOXrSWsN7xnNUD%2B0SS%3DWwF7Pg%40mail.gmail.com
> Link: https://lore.kernel.org/all/[email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/cpuid.c | 2 +-
> arch/x86/kvm/svm/svm.c | 5 +++++
> arch/x86/kvm/vmx/nested.c | 3 +++
> arch/x86/kvm/vmx/vmx.c | 5 +++++
> arch/x86/kvm/vmx/vmx.h | 2 +-
> arch/x86/kvm/x86.c | 12 ++++++++++++
> 6 files changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 599aebec2d52..9583a110cf5f 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -653,7 +653,7 @@ void kvm_set_cpu_caps(void)
> F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
> F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
> F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) |
> - F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16)
> + F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D)
> );
>
> /* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 85bb535fc321..b32edaf5a74b 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -95,6 +95,7 @@ static const struct svm_direct_access_msrs {
> #endif
> { .index = MSR_IA32_SPEC_CTRL, .always = false },
> { .index = MSR_IA32_PRED_CMD, .always = false },
> + { .index = MSR_IA32_FLUSH_CMD, .always = false },
> { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false },
> { .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
> { .index = MSR_IA32_LASTINTFROMIP, .always = false },
> @@ -4140,6 +4141,10 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PRED_CMD, 0,
> !!guest_has_pred_cmd_msr(vcpu));
>
> + if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))

Just curious, will this ever be true on AMD hardware? AFAIK,
X86_FEATURE_FLUSH_L1D is Intel-defined CPU feature.

> + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0,
> + !!guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
> +
> /* For sev guests, the memory encryption bit is not reserved in CR3. */
> if (sev_guest(vcpu->kvm)) {
> best = kvm_find_cpuid_entry(vcpu, 0x8000001F);

2023-03-23 22:23:17

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 5/6] KVM: x86: Virtualize FLUSH_L1D and passthrough MSR_IA32_FLUSH_CMD

On Wed, Mar 22, 2023, Pawan Gupta wrote:
> On Tue, Mar 21, 2023 at 06:14:39PM -0700, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 85bb535fc321..b32edaf5a74b 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -95,6 +95,7 @@ static const struct svm_direct_access_msrs {
> > #endif
> > { .index = MSR_IA32_SPEC_CTRL, .always = false },
> > { .index = MSR_IA32_PRED_CMD, .always = false },
> > + { .index = MSR_IA32_FLUSH_CMD, .always = false },
> > { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false },
> > { .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
> > { .index = MSR_IA32_LASTINTFROMIP, .always = false },
> > @@ -4140,6 +4141,10 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PRED_CMD, 0,
> > !!guest_has_pred_cmd_msr(vcpu));
> >
> > + if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
>
> Just curious, will this ever be true on AMD hardware? AFAIK,
> X86_FEATURE_FLUSH_L1D is Intel-defined CPU feature.

Don't know myself, but I assume/home there was actual motivation behind adding
support for AMD.

2023-03-23 22:52:47

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 0/6] KVM: x86: Unhost the *_CMD MSR mess

On Tue, 21 Mar 2023 18:14:34 -0700, Sean Christopherson wrote:
> Revert the FLUSH_L1D enabling, which has multiple fatal bugs, clean up
> the existing PRED_CMD handling, and reintroduce FLUSH_L1D virtualization
> without inheriting the mistakes made by PRED_CMD.
>
> The last patch hardens SVM against one of the bugs introduced in the
> FLUSH_L1D enabling.
>
> [...]

Applied to a one-off branch, kvm-x86 cmd_msrs, so that I can get this into
kvm-x86 next and onto linux-next asap. I'll drop the branch if Paolo wants
to do something else, or if there are issues with the series.

[1/6] KVM: x86: Revert MSR_IA32_FLUSH_CMD.FLUSH_L1D enabling
https://github.com/kvm-x86/linux/commit/e9c126917c09
[2/6] KVM: VMX: Passthrough MSR_IA32_PRED_CMD based purely on host+guest CPUID
https://github.com/kvm-x86/linux/commit/4f9babd37df0
[3/6] KVM: SVM: Passthrough MSR_IA32_PRED_CMD based purely on host+guest CPUID
https://github.com/kvm-x86/linux/commit/5ac641dff621
[4/6] KVM: x86: Move MSR_IA32_PRED_CMD WRMSR emulation to common code
https://github.com/kvm-x86/linux/commit/584aeda90bd9
[5/6] KVM: x86: Virtualize FLUSH_L1D and passthrough MSR_IA32_FLUSH_CMD
https://github.com/kvm-x86/linux/commit/5bdebd246db5
[6/6] KVM: SVM: Return the local "r" variable from svm_set_msr()
https://github.com/kvm-x86/linux/commit/8a16ed8c673c

--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes

2023-03-24 08:52:51

by Zhi Wang

[permalink] [raw]
Subject: Re: [PATCH 5/6] KVM: x86: Virtualize FLUSH_L1D and passthrough MSR_IA32_FLUSH_CMD

On Thu, 23 Mar 2023 15:17:50 -0700
Sean Christopherson <[email protected]> wrote:

> On Wed, Mar 22, 2023, Pawan Gupta wrote:
> > On Tue, Mar 21, 2023 at 06:14:39PM -0700, Sean Christopherson wrote:
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index 85bb535fc321..b32edaf5a74b 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -95,6 +95,7 @@ static const struct svm_direct_access_msrs {
> > > #endif
> > > { .index = MSR_IA32_SPEC_CTRL, .always = false },
> > > { .index = MSR_IA32_PRED_CMD, .always = false },
> > > + { .index = MSR_IA32_FLUSH_CMD, .always = false },
> > > { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false },
> > > { .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
> > > { .index = MSR_IA32_LASTINTFROMIP, .always = false },
> > > @@ -4140,6 +4141,10 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PRED_CMD, 0,
> > > !!guest_has_pred_cmd_msr(vcpu));
> > >
> > > + if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
> >
> > Just curious, will this ever be true on AMD hardware? AFAIK,
> > X86_FEATURE_FLUSH_L1D is Intel-defined CPU feature.
>
> Don't know myself, but I assume/home there was actual motivation behind adding
> support for AMD.

Hmm. I took a look on the APM[1] published on Jan 2023, 3.2.9 Speculation
control MSRs. It only has SPEC_CTL/PRED_SMD so far. Also, the information
here [2] shows this is a mitigation only for Intel CPUs. Looks like AMD
does not require this so far.

[1] https://www.amd.com/system/files/TechDocs/40332.pdf
[2] https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html

2023-03-27 03:37:47

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [PATCH 5/6] KVM: x86: Virtualize FLUSH_L1D and passthrough MSR_IA32_FLUSH_CMD

On 3/22/2023 9:14 AM, Sean Christopherson wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c83ec88da043..3c58dbae7b4c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3628,6 +3628,18 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>
> wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> break;
> + case MSR_IA32_FLUSH_CMD:
> + if (!msr_info->host_initiated &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D))
> + return 1;
> +
> + if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D) || (data & ~L1D_FLUSH))
> + return 1;
> + if (!data)
> + break;
> +
> + wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
> + break;

Then KVM provides the ability to flush the L1 data cache of host to
userspace. Can it be exploited to degrade the host performance if
userspace VMM keeps flushing the L1 data cache?

2023-03-27 04:01:02

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [PATCH 2/6] KVM: VMX: Passthrough MSR_IA32_PRED_CMD based purely on host+guest CPUID

On 3/22/2023 9:14 AM, Sean Christopherson wrote:
> Passthrough MSR_IA32_PRED_CMD based purely on whether or not the MSR is
> supported and enabled, i.e. don't wait until the first write. There's no
> benefit to deferred passthrough, and the extra logic only adds complexity.

Reviewed-by: Xiaoyao Li <[email protected]>

> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 16 +++-------------
> 1 file changed, 3 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index f777509ecf17..5c01c76c0d45 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2298,19 +2298,6 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> break;
>
> wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> -
> - /*
> - * For non-nested:
> - * When it's written (to non-zero) for the first time, pass
> - * it through.
> - *
> - * For nested:
> - * The handling of the MSR bitmap for L2 guests is done in
> - * nested_vmx_prepare_msr_bitmap. We should not touch the
> - * vmcs02.msr_bitmap here since it gets completely overwritten
> - * in the merging.
> - */
> - vmx_disable_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W);
> break;
> case MSR_IA32_CR_PAT:
> if (!kvm_pat_valid(data))
> @@ -7743,6 +7730,9 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
> !guest_cpuid_has(vcpu, X86_FEATURE_XFD));
>
> + if (boot_cpu_has(X86_FEATURE_IBPB))
> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
> + !guest_has_pred_cmd_msr(vcpu));
>
> set_cr4_guest_host_mask(vmx);
>

2023-03-27 06:48:31

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [PATCH 4/6] KVM: x86: Move MSR_IA32_PRED_CMD WRMSR emulation to common code

On 3/22/2023 9:14 AM, Sean Christopherson wrote:
> Dedup the handling of MSR_IA32_PRED_CMD across VMX and SVM by moving the
> logic to kvm_set_msr_common(). Now that the MSR interception toggling is
> handled as part of setting guest CPUID, the VMX and SVM paths are
> identical.
>
> Opportunistically massage the code to make it a wee bit denser.
>

Reviewed-by: Xiaoyao Li <[email protected]>

> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 14 --------------
> arch/x86/kvm/vmx/vmx.c | 14 --------------
> arch/x86/kvm/x86.c | 11 +++++++++++
> 3 files changed, 11 insertions(+), 28 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index f757b436ffae..85bb535fc321 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2942,20 +2942,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> */
> set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
> break;
> - case MSR_IA32_PRED_CMD:
> - if (!msr->host_initiated &&
> - !guest_has_pred_cmd_msr(vcpu))
> - return 1;
> -
> - if (data & ~PRED_CMD_IBPB)
> - return 1;
> - if (!boot_cpu_has(X86_FEATURE_IBPB))
> - return 1;
> - if (!data)
> - break;
> -
> - wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> - break;
> case MSR_AMD64_VIRT_SPEC_CTRL:
> if (!msr->host_initiated &&
> !guest_cpuid_has(vcpu, X86_FEATURE_VIRT_SSBD))
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 5c01c76c0d45..29807be219b9 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2285,20 +2285,6 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> if (data & ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR))
> return 1;
> goto find_uret_msr;
> - case MSR_IA32_PRED_CMD:
> - if (!msr_info->host_initiated &&
> - !guest_has_pred_cmd_msr(vcpu))
> - return 1;
> -
> - if (data & ~PRED_CMD_IBPB)
> - return 1;
> - if (!boot_cpu_has(X86_FEATURE_IBPB))
> - return 1;
> - if (!data)
> - break;
> -
> - wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> - break;
> case MSR_IA32_CR_PAT:
> if (!kvm_pat_valid(data))
> return 1;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 237c483b1230..c83ec88da043 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3617,6 +3617,17 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> vcpu->arch.perf_capabilities = data;
> kvm_pmu_refresh(vcpu);
> return 0;
> + case MSR_IA32_PRED_CMD:
> + if (!msr_info->host_initiated && !guest_has_pred_cmd_msr(vcpu))
> + return 1;
> +
> + if (!boot_cpu_has(X86_FEATURE_IBPB) || (data & ~PRED_CMD_IBPB))
> + return 1;
> + if (!data)
> + break;
> +
> + wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> + break;
> case MSR_EFER:
> return set_efer(vcpu, msr_info);
> case MSR_K7_HWCR:

2023-03-27 15:29:09

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 0/6] KVM: x86: Unhost the *_CMD MSR mess

On 3/22/23 02:14, Sean Christopherson wrote:
> Revert the FLUSH_L1D enabling, which has multiple fatal bugs, clean up
> the existing PRED_CMD handling, and reintroduce FLUSH_L1D virtualization
> without inheriting the mistakes made by PRED_CMD.
>
> The last patch hardens SVM against one of the bugs introduced in the
> FLUSH_L1D enabling.
>
> I'll post KUT patches tomorrow. I have the tests written (and they found
> bugs in my code, :shocked-pikachu:), just need to write the changelogs.
> Wanted to get this out sooner than later as I'm guessing I'm not the only
> one whose VMs won't boot on Intel CPUs...

Hi Sean,

did you post them?

Paolo

2023-03-27 15:30:13

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 0/6] KVM: x86: Unhost the *_CMD MSR mess

On Mon, Mar 27, 2023, Paolo Bonzini wrote:
> On 3/22/23 02:14, Sean Christopherson wrote:
> > Revert the FLUSH_L1D enabling, which has multiple fatal bugs, clean up
> > the existing PRED_CMD handling, and reintroduce FLUSH_L1D virtualization
> > without inheriting the mistakes made by PRED_CMD.
> >
> > The last patch hardens SVM against one of the bugs introduced in the
> > FLUSH_L1D enabling.
> >
> > I'll post KUT patches tomorrow. I have the tests written (and they found
> > bugs in my code, :shocked-pikachu:), just need to write the changelogs.
> > Wanted to get this out sooner than later as I'm guessing I'm not the only
> > one whose VMs won't boot on Intel CPUs...
>
> Hi Sean,
>
> did you post them?

No, I'll get that done today (I pinky swear this time).

2023-03-27 15:43:54

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 5/6] KVM: x86: Virtualize FLUSH_L1D and passthrough MSR_IA32_FLUSH_CMD

On Sun, Mar 26, 2023 at 8:33 PM Xiaoyao Li <[email protected]> wrote:
>
> On 3/22/2023 9:14 AM, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index c83ec88da043..3c58dbae7b4c 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3628,6 +3628,18 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >
> > wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> > break;
> > + case MSR_IA32_FLUSH_CMD:
> > + if (!msr_info->host_initiated &&
> > + !guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D))
> > + return 1;
> > +
> > + if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D) || (data & ~L1D_FLUSH))
> > + return 1;
> > + if (!data)
> > + break;
> > +
> > + wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
> > + break;
>
> Then KVM provides the ability to flush the L1 data cache of host to
> userspace. Can it be exploited to degrade the host performance if
> userspace VMM keeps flushing the L1 data cache?

The L1D$ isn't very big. A guest could always flush out any previously
cached data simply by referencing its own data. Is the ability to
flush the L1D$ by WRMSR that egregious?

2023-03-27 15:48:58

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 0/6] KVM: x86: Unhost the *_CMD MSR mess

On 3/27/23 17:28, Sean Christopherson wrote:
>> On 3/22/23 02:14, Sean Christopherson wrote:
>>> Revert the FLUSH_L1D enabling, which has multiple fatal bugs, clean up
>>> the existing PRED_CMD handling, and reintroduce FLUSH_L1D virtualization
>>> without inheriting the mistakes made by PRED_CMD.
>>>
>>> The last patch hardens SVM against one of the bugs introduced in the
>>> FLUSH_L1D enabling.
>>>
>>> I'll post KUT patches tomorrow. I have the tests written (and they found
>>> bugs in my code, :shocked-pikachu:), just need to write the changelogs.
>>> Wanted to get this out sooner than later as I'm guessing I'm not the only
>>> one whose VMs won't boot on Intel CPUs...
>> Hi Sean,
>>
>> did you post them?
> No, I'll get that done today (I pinky swear this time).

Ok, you can also send me a pull request if you prefer (or I can apply
the patches to kvm/next myself of course).

Paolo

2023-03-27 16:02:13

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 5/6] KVM: x86: Virtualize FLUSH_L1D and passthrough MSR_IA32_FLUSH_CMD

On Mon, Mar 27, 2023, Jim Mattson wrote:
> On Sun, Mar 26, 2023 at 8:33 PM Xiaoyao Li <[email protected]> wrote:
> >
> > On 3/22/2023 9:14 AM, Sean Christopherson wrote:
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index c83ec88da043..3c58dbae7b4c 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -3628,6 +3628,18 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > >
> > > wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> > > break;
> > > + case MSR_IA32_FLUSH_CMD:
> > > + if (!msr_info->host_initiated &&
> > > + !guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D))
> > > + return 1;
> > > +
> > > + if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D) || (data & ~L1D_FLUSH))
> > > + return 1;
> > > + if (!data)
> > > + break;
> > > +
> > > + wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
> > > + break;
> >
> > Then KVM provides the ability to flush the L1 data cache of host to
> > userspace. Can it be exploited to degrade the host performance if
> > userspace VMM keeps flushing the L1 data cache?
>
> The L1D$ isn't very big. A guest could always flush out any previously
> cached data simply by referencing its own data. Is the ability to
> flush the L1D$ by WRMSR that egregious?

Yeah, AFAIK RDT and the like only provide QoS controls for L3, so L1 is fair game.

2023-04-12 20:01:28

by Paolo Bonzini

[permalink] [raw]

2023-04-12 20:10:28

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 0/6] KVM: x86: Unhost the *_CMD MSR mess

On Wed, Apr 12, 2023, Paolo Bonzini wrote:
> Queued, thanks.

Roger that, I'll drop kvm-x86/cmd_msrs.