2022-05-14 00:04:34

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 0/3] KVM: x86: WRMSR MCi_CTL/STATUS bug fix and cleanups

Fix bugs in set_msr_mce() where KVM returns '-1' instead of '1' when
rejecting a WRMSR, which results in -EPERM getting returned to userspace.

Patches 2 and 3 are cleanups to make {g,s}et_msr_mce a bit less wierd
(especially with respect to Jue's CMCI series) and a bit less magical.

Sean Christopherson (3):
KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS)
KVM: x86: Use explicit case-statements for MCx banks in
{g,s}et_msr_mce()
KVM: x86: Add helpers to identify CTL and STATUS MCi MSRs

arch/x86/kvm/x86.c | 84 +++++++++++++++++++++++++++-------------------
1 file changed, 50 insertions(+), 34 deletions(-)


base-commit: 2764011106d0436cb44702cfb0981339d68c3509
--
2.36.0.550.gb090851708-goog



2022-05-14 00:16:01

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 2/3] KVM: x86: Use explicit case-statements for MCx banks in {g,s}et_msr_mce()

Use an explicit case statement to grab the full range of MCx bank MSRs
in {g,s}et_msr_mce(), and manually check only the "end" (the number of
banks configured by userspace may be less than the max). The "default"
trick works, but is a bit odd now, and will be quite odd if/when support
for accessing MCx_CTL2 MSRs is added, which has near identical logic.

Hoist "offset" to function scope so as to avoid curly braces for the case
statement, and because MCx_CTL2 support will need the same variables.

Opportunstically clean up the comment about allowing bit 10 to be cleared
from bank 4.

No functional change intended.

Cc: Jue Wang <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 71 ++++++++++++++++++++++++----------------------
1 file changed, 37 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bc6db58975dc..7e685ea9882b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3206,6 +3206,7 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
unsigned bank_num = mcg_cap & 0xff;
u32 msr = msr_info->index;
u64 data = msr_info->data;
+ u32 offset, last_msr;

switch (msr) {
case MSR_IA32_MCG_STATUS:
@@ -3219,32 +3220,33 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
vcpu->arch.mcg_ctl = data;
break;
+ case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+ last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
+ if (msr > last_msr)
+ return 1;
+
+ offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
+ last_msr + 1 - MSR_IA32_MC0_CTL);
+
+ /*
+ * Only 0 or all 1s can be written to IA32_MCi_CTL, all other
+ * values are architecturally undefined. But, some Linux
+ * kernels clear bit 10 in bank 4 to workaround a BIOS/GART TLB
+ * issue on AMD K8s, allow bit 10 to be clear when setting all
+ * other bits in order to avoid an uncaught #GP in the guest.
+ */
+ if ((offset & 0x3) == 0 &&
+ data != 0 && (data | (1 << 10)) != ~(u64)0)
+ return 1;
+
+ /* MCi_STATUS */
+ if (!msr_info->host_initiated && (offset & 0x3) == 1 &&
+ data != 0 && !can_set_mci_status(vcpu))
+ return 1;
+
+ vcpu->arch.mce_banks[offset] = data;
+ break;
default:
- if (msr >= MSR_IA32_MC0_CTL &&
- msr < MSR_IA32_MCx_CTL(bank_num)) {
- u32 offset = array_index_nospec(
- msr - MSR_IA32_MC0_CTL,
- MSR_IA32_MCx_CTL(bank_num) - MSR_IA32_MC0_CTL);
-
- /* only 0 or all 1s can be written to IA32_MCi_CTL
- * some Linux kernels though clear bit 10 in bank 4 to
- * workaround a BIOS/GART TBL issue on AMD K8s, ignore
- * this to avoid an uncatched #GP in the guest
- */
- if ((offset & 0x3) == 0 &&
- data != 0 && (data | (1 << 10)) != ~(u64)0)
- return 1;
-
- /* MCi_STATUS */
- if (!msr_info->host_initiated &&
- (offset & 0x3) == 1 && data != 0) {
- if (!can_set_mci_status(vcpu))
- return 1;
- }
-
- vcpu->arch.mce_banks[offset] = data;
- break;
- }
return 1;
}
return 0;
@@ -3789,6 +3791,7 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
u64 data;
u64 mcg_cap = vcpu->arch.mcg_cap;
unsigned bank_num = mcg_cap & 0xff;
+ u32 offset, last_msr;

switch (msr) {
case MSR_IA32_P5_MC_ADDR:
@@ -3806,16 +3809,16 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
case MSR_IA32_MCG_STATUS:
data = vcpu->arch.mcg_status;
break;
- default:
- if (msr >= MSR_IA32_MC0_CTL &&
- msr < MSR_IA32_MCx_CTL(bank_num)) {
- u32 offset = array_index_nospec(
- msr - MSR_IA32_MC0_CTL,
- MSR_IA32_MCx_CTL(bank_num) - MSR_IA32_MC0_CTL);
+ case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+ last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
+ if (msr > last_msr)
+ return 1;

- data = vcpu->arch.mce_banks[offset];
- break;
- }
+ offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
+ last_msr + 1 - MSR_IA32_MC0_CTL);
+ data = vcpu->arch.mce_banks[offset];
+ break;
+ default:
return 1;
}
*pdata = data;
--
2.36.0.550.gb090851708-goog


2022-05-14 03:38:59

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 3/3] KVM: x86: Add helpers to identify CTL and STATUS MCi MSRs

Add helpers to identify CTL (control) and STATUS MCi MSR types instead of
open coding the checks using the offset. Using the offset is perfectly
safe, but unintuitive, as understanding what the code does requires
knowing that the offset calcuation will not affect the lower three bits.

Opportunistically comment the STATUS logic to save readers a trip to
Intel's SDM or AMD's APM to understand the "data != 0" check.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7e685ea9882b..b61c5def0bfc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3188,6 +3188,16 @@ static void kvmclock_sync_fn(struct work_struct *work)
KVMCLOCK_SYNC_PERIOD);
}

+/* These helpers are safe iff @msr is known to be an MCx bank MSR. */
+static bool is_mci_control_msr(u32 msr)
+{
+ return (msr & 3) == 0;
+}
+static bool is_mci_status_msr(u32 msr)
+{
+ return (msr & 3) == 1;
+}
+
/*
* On AMD, HWCR[McStatusWrEn] controls whether setting MCi_STATUS results in #GP.
*/
@@ -3225,9 +3235,6 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (msr > last_msr)
return 1;

- offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
- last_msr + 1 - MSR_IA32_MC0_CTL);
-
/*
* Only 0 or all 1s can be written to IA32_MCi_CTL, all other
* values are architecturally undefined. But, some Linux
@@ -3235,15 +3242,21 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
* issue on AMD K8s, allow bit 10 to be clear when setting all
* other bits in order to avoid an uncaught #GP in the guest.
*/
- if ((offset & 0x3) == 0 &&
+ if (is_mci_control_msr(msr) &&
data != 0 && (data | (1 << 10)) != ~(u64)0)
return 1;

- /* MCi_STATUS */
- if (!msr_info->host_initiated && (offset & 0x3) == 1 &&
+ /*
+ * All CPUs allow writing 0 to MCi_STATUS MSRs to clear the MSR.
+ * AMD-based CPUs allow non-zero values, but if and only if
+ * HWCR[McStatusWrEn] is set.
+ */
+ if (!msr_info->host_initiated && is_mci_status_msr(msr) &&
data != 0 && !can_set_mci_status(vcpu))
return 1;

+ offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
+ last_msr + 1 - MSR_IA32_MC0_CTL);
vcpu->arch.mce_banks[offset] = data;
break;
default:
--
2.36.0.550.gb090851708-goog


2022-05-14 03:47:30

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 2/3] KVM: x86: Use explicit case-statements for MCx banks in {g,s}et_msr_mce()

On Thu, May 12, 2022 at 3:27 PM Sean Christopherson <[email protected]> wrote:
>
> Use an explicit case statement to grab the full range of MCx bank MSRs
> in {g,s}et_msr_mce(), and manually check only the "end" (the number of
> banks configured by userspace may be less than the max). The "default"
> trick works, but is a bit odd now, and will be quite odd if/when support
> for accessing MCx_CTL2 MSRs is added, which has near identical logic.
>
> Hoist "offset" to function scope so as to avoid curly braces for the case
> statement, and because MCx_CTL2 support will need the same variables.
>
> Opportunstically clean up the comment about allowing bit 10 to be cleared
> from bank 4.
>
> No functional change intended.
>
> Cc: Jue Wang <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
Reviewed-by: Jim Mattson <[email protected]>