2024-04-25 18:16:09

by Sean Christopherson

Subject: [PATCH 00/10] KVM: x86: Clean up MSR access/failure handling

Rework KVM's MSR access handling, and more specifically the handling of failures,
to begin the march towards removing host_initiated exemptions for CPUID
checks, e.g. to eventually turn code like this:

        if (!msr_info->host_initiated &&
            !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
                return 1;

into

        if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
                return KVM_MSR_RET_UNSUPPORTED;

For all intents and purposes, KVM already requires setting guest CPUID before
setting MSRs, as there are multiple MSR flows that simply cannot work if CPUID
isn't in place.

But because KVM's ABI is that userspace is allowed to save/restore MSRs that
are advertised to userspace regardless of the vCPU CPUID model, KVM has ended
up with code like the above where KVM unconditionally allows host accesses.

The idea here is to funnel all MSR accesses through a single helper so that
KVM can make the "host_initiated" exception in a single location based on
KVM_MSR_RET_UNSUPPORTED, i.e. so that KVM doesn't need one-off checks for every
MSR, which is especially problematic for CET where a Venn diagram is needed to
map CET MSR existence to CPUID feature bits.
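
As a concrete sketch of the end state, using the helpers from patch 07 (the
snippets below mirror that patch, trimmed for brevity): leaf APIs simply
report failures, and a single trampoline applies the waiver and logging:

        static int _kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
                                bool host_initiated)
        {
                return __kvm_set_msr(vcpu, index, *data, host_initiated);
        }

        static int kvm_set_msr_ignored_check(struct kvm_vcpu *vcpu,
                                             u32 index, u64 data, bool host_initiated)
        {
                return kvm_do_msr_access(vcpu, index, &data, host_initiated,
                                         MSR_TYPE_W, _kvm_set_msr);
        }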

This series doesn't actually remove the existing host_initiated checks. I
*really* wanted to do that here, but removing all the existing checks is
non-trivial and has a high chance of subtly breaking userspace. I still want
to eventually get there, but it needs to be a slower, more thoughtful process.

For now, the goal is to allow new features to omit the host_initiated checks
without creating a weird userspace ABI, e.g. to simplify the aforementioned CET
support.

Sean Christopherson (10):
KVM: SVM: Disallow guest from changing userspace's MSR_AMD64_DE_CFG
value
KVM: x86: Move MSR_TYPE_{R,W,RW} values from VMX to x86, as enums
KVM: x86: Rename KVM_MSR_RET_INVALID to KVM_MSR_RET_UNSUPPORTED
KVM: x86: Refactor kvm_x86_ops.get_msr_feature() to avoid
kvm_msr_entry
KVM: x86: Rename get_msr_feature() APIs to get_feature_msr()
KVM: x86: Refactor kvm_get_feature_msr() to avoid struct kvm_msr_entry
KVM: x86: Funnel all fancy MSR return value handling into a common
helper
KVM: x86: Hoist x86.c's global msr_* variables up above
kvm_do_msr_access()
KVM: x86: Suppress failures on userspace access to advertised,
unsupported MSRs
KVM: x86: Suppress userspace access failures on unsupported,
"emulated" MSRs

arch/x86/include/asm/kvm-x86-ops.h | 2 +-
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/svm.c | 29 +-
arch/x86/kvm/vmx/main.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 8 +-
arch/x86/kvm/vmx/vmx.h | 4 -
arch/x86/kvm/vmx/x86_ops.h | 2 +-
arch/x86/kvm/x86.c | 513 ++++++++++++++---------------
arch/x86/kvm/x86.h | 21 +-
9 files changed, 294 insertions(+), 289 deletions(-)


base-commit: 7b076c6a308ec5bce9fc96e2935443ed228b9148
--
2.44.0.769.g3c40516874-goog



2024-04-25 18:16:28

by Sean Christopherson

Subject: [PATCH 05/10] KVM: x86: Rename get_msr_feature() APIs to get_feature_msr()

Rename all APIs related to feature MSRs from get_msr_feature() to
get_feature_msr(). The APIs get "feature MSRs", not "MSR features".
And unlike kvm_{g,s}et_msr_common(), the "feature" adjective doesn't
describe the helper itself.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 2 +-
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/svm.c | 6 +++---
arch/x86/kvm/vmx/main.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/kvm/vmx/x86_ops.h | 2 +-
arch/x86/kvm/x86.c | 12 ++++++------
7 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 5187fcf4b610..9f25b4a49d6b 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -128,7 +128,7 @@ KVM_X86_OP_OPTIONAL(mem_enc_unregister_region)
KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from)
KVM_X86_OP_OPTIONAL(vm_move_enc_context_from)
KVM_X86_OP_OPTIONAL(guest_memory_reclaimed)
-KVM_X86_OP(get_msr_feature)
+KVM_X86_OP(get_feature_msr)
KVM_X86_OP(check_emulate_instruction)
KVM_X86_OP(apic_init_signal_blocked)
KVM_X86_OP_OPTIONAL(enable_l2_tlb_flush)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7d56e5a52ae3..cc04ab0c234e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1785,7 +1785,7 @@ struct kvm_x86_ops {
int (*vm_move_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
void (*guest_memory_reclaimed)(struct kvm *kvm);

- int (*get_msr_feature)(u32 msr, u64 *data);
+ int (*get_feature_msr)(u32 msr, u64 *data);

int (*check_emulate_instruction)(struct kvm_vcpu *vcpu, int emul_type,
void *insn, int insn_len);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 15422b7d9149..d95cd230540d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2796,7 +2796,7 @@ static int efer_trap(struct kvm_vcpu *vcpu)
return kvm_complete_insn_gp(vcpu, ret);
}

-static int svm_get_msr_feature(u32 msr, u64 *data)
+static int svm_get_feature_msr(u32 msr, u64 *data)
{
*data = 0;

@@ -3134,7 +3134,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
case MSR_AMD64_DE_CFG: {
u64 supported_de_cfg;

- if (svm_get_msr_feature(ecx, &supported_de_cfg))
+ if (svm_get_feature_msr(ecx, &supported_de_cfg))
return 1;

if (data & ~supported_de_cfg)
@@ -4944,7 +4944,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_unblocking = avic_vcpu_unblocking,

.update_exception_bitmap = svm_update_exception_bitmap,
- .get_msr_feature = svm_get_msr_feature,
+ .get_feature_msr = svm_get_feature_msr,
.get_msr = svm_get_msr,
.set_msr = svm_set_msr,
.get_segment_base = svm_get_segment_base,
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 7c546ad3e4c9..c670f4cf6d94 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -40,7 +40,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.vcpu_put = vmx_vcpu_put,

.update_exception_bitmap = vmx_update_exception_bitmap,
- .get_msr_feature = vmx_get_msr_feature,
+ .get_feature_msr = vmx_get_feature_msr,
.get_msr = vmx_get_msr,
.set_msr = vmx_set_msr,
.get_segment_base = vmx_get_segment_base,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 25b0a838abd6..fe2bf8f31d7c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1955,7 +1955,7 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
return !(msr->data & ~valid_bits);
}

-int vmx_get_msr_feature(u32 msr, u64 *data)
+int vmx_get_feature_msr(u32 msr, u64 *data)
{
switch (msr) {
case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 504d56d6837d..4b81c85e9357 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -58,7 +58,7 @@ bool vmx_has_emulated_msr(struct kvm *kvm, u32 index);
void vmx_msr_filter_changed(struct kvm_vcpu *vcpu);
void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu);
-int vmx_get_msr_feature(u32 msr, u64 *data);
+int vmx_get_feature_msr(u32 msr, u64 *data);
int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg);
void vmx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 03e50812ab33..8f58181f2b6d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1682,7 +1682,7 @@ static u64 kvm_get_arch_capabilities(void)
return data;
}

-static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
+static int kvm_get_feature_msr(struct kvm_msr_entry *msr)
{
switch (msr->index) {
case MSR_IA32_ARCH_CAPABILITIES:
@@ -1695,12 +1695,12 @@ static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
rdmsrl_safe(msr->index, &msr->data);
break;
default:
- return static_call(kvm_x86_get_msr_feature)(msr->index, &msr->data);
+ return static_call(kvm_x86_get_feature_msr)(msr->index, &msr->data);
}
return 0;
}

-static int do_get_msr_feature(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
+static int do_get_feature_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
{
struct kvm_msr_entry msr;
int r;
@@ -1708,7 +1708,7 @@ static int do_get_msr_feature(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
/* Unconditionally clear the output for simplicity */
msr.data = 0;
msr.index = index;
- r = kvm_get_msr_feature(&msr);
+ r = kvm_get_feature_msr(&msr);

if (r == KVM_MSR_RET_UNSUPPORTED && kvm_msr_ignored_check(index, 0, false))
r = 0;
@@ -4962,7 +4962,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
break;
}
case KVM_GET_MSRS:
- r = msr_io(NULL, argp, do_get_msr_feature, 1);
+ r = msr_io(NULL, argp, do_get_feature_msr, 1);
break;
#ifdef CONFIG_KVM_HYPERV
case KVM_GET_SUPPORTED_HV_CPUID:
@@ -7367,7 +7367,7 @@ static void kvm_probe_feature_msr(u32 msr_index)
.index = msr_index,
};

- if (kvm_get_msr_feature(&msr))
+ if (kvm_get_feature_msr(&msr))
return;

msr_based_features[num_msr_based_features++] = msr_index;
--
2.44.0.769.g3c40516874-goog


2024-04-25 18:16:32

by Sean Christopherson

Subject: [PATCH 06/10] KVM: x86: Refactor kvm_get_feature_msr() to avoid struct kvm_msr_entry

Refactor kvm_get_feature_msr() to take the components of kvm_msr_entry as
separate parameters, along with a vCPU pointer, i.e. to give it the same
prototype as kvm_{g,s}et_msr_ignored_check(). This will allow using a
common inner helper for handling accesses to "regular" and feature MSRs.
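
For reference, the common prototype that the MSR helpers converge on is the
one patch 07 formalizes as a function pointer type:

        typedef int (*msr_access_t)(struct kvm_vcpu *vcpu, u32 index, u64 *data,
                                    bool host_initiated);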

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 29 +++++++++++++----------------
1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f58181f2b6d..c0727df18e92 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1682,39 +1682,38 @@ static u64 kvm_get_arch_capabilities(void)
return data;
}

-static int kvm_get_feature_msr(struct kvm_msr_entry *msr)
+static int kvm_get_feature_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+ bool host_initiated)
{
- switch (msr->index) {
+ WARN_ON_ONCE(!host_initiated);
+
+ switch (index) {
case MSR_IA32_ARCH_CAPABILITIES:
- msr->data = kvm_get_arch_capabilities();
+ *data = kvm_get_arch_capabilities();
break;
case MSR_IA32_PERF_CAPABILITIES:
- msr->data = kvm_caps.supported_perf_cap;
+ *data = kvm_caps.supported_perf_cap;
break;
case MSR_IA32_UCODE_REV:
- rdmsrl_safe(msr->index, &msr->data);
+ rdmsrl_safe(index, data);
break;
default:
- return static_call(kvm_x86_get_feature_msr)(msr->index, &msr->data);
+ return static_call(kvm_x86_get_feature_msr)(index, data);
}
return 0;
}

static int do_get_feature_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
{
- struct kvm_msr_entry msr;
int r;

/* Unconditionally clear the output for simplicity */
- msr.data = 0;
- msr.index = index;
- r = kvm_get_feature_msr(&msr);
+ *data = 0;
+ r = kvm_get_feature_msr(vcpu, index, data, true);

if (r == KVM_MSR_RET_UNSUPPORTED && kvm_msr_ignored_check(index, 0, false))
r = 0;

- *data = msr.data;
-
return r;
}

@@ -7363,11 +7362,9 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)

static void kvm_probe_feature_msr(u32 msr_index)
{
- struct kvm_msr_entry msr = {
- .index = msr_index,
- };
+ u64 data;

- if (kvm_get_feature_msr(&msr))
+ if (kvm_get_feature_msr(NULL, msr_index, &data, true))
return;

msr_based_features[num_msr_based_features++] = msr_index;
--
2.44.0.769.g3c40516874-goog


2024-04-25 18:16:51

by Sean Christopherson

Subject: [PATCH 02/10] KVM: x86: Move MSR_TYPE_{R,W,RW} values from VMX to x86, as enums

Move VMX's MSR_TYPE_{R,W,RW} #defines to x86.h, as enums, so that they can
be used by common x86 code, e.g. instead of doing "bool write".

Opportunistically tweak the definitions to make it more obvious that the
values are bitmasks, not arbitrary ascending values.
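
E.g. VMX's MSR bitmap code already treats the values as masks, a la the
existing (lightly paraphrased) intercept helpers:

        if (type & MSR_TYPE_R)
                vmx_clear_msr_bitmap_read(msr_bitmap, msr);
        if (type & MSR_TYPE_W)
                vmx_clear_msr_bitmap_write(msr_bitmap, msr);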

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.h | 4 ----
arch/x86/kvm/x86.h | 6 ++++++
2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 90f9e4434646..243d2ab8f325 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -17,10 +17,6 @@
#include "run_flags.h"
#include "../mmu.h"

-#define MSR_TYPE_R 1
-#define MSR_TYPE_W 2
-#define MSR_TYPE_RW 3
-
#define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4))

#ifdef CONFIG_X86_64
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index d80a4c6b5a38..a03829e9c6ac 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -497,6 +497,12 @@ int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva);
bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);

+enum kvm_msr_access {
+ MSR_TYPE_R = BIT(0),
+ MSR_TYPE_W = BIT(1),
+ MSR_TYPE_RW = MSR_TYPE_R | MSR_TYPE_W,
+};
+
/*
* Internal error codes that are used to indicate that MSR emulation encountered
* an error that should result in #GP in the guest, unless userspace
--
2.44.0.769.g3c40516874-goog


2024-04-25 18:17:05

by Sean Christopherson

Subject: [PATCH 07/10] KVM: x86: Funnel all fancy MSR return value handling into a common helper

Add a common helper, kvm_do_msr_access(), to invoke the "leaf" APIs that
are type and access specific, and more importantly to handle errors that
are returned from the leaf APIs. I.e. turn kvm_msr_ignored_check() from a
a helper that is called on an error, into a trampoline that detects errors
*and* applies relevant side effects, e.g. logging unimplemented accesses.

Because the leaf APIs are used for guest accesses, userspace accesses, and
KVM accesses, and because KVM supports restricting access to MSRs from
userspace via filters, the error handling is subtly non-trivial. E.g. KVM
has had at least one bug escape due to making each "outer" function handle
errors. See commit 3376ca3f1a20 ("KVM: x86: Fix KVM_GET_MSRS stack info
leak").

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 86 +++++++++++++++++++++++-----------------------
1 file changed, 43 insertions(+), 43 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c0727df18e92..a0506878d58e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -319,25 +319,40 @@ u64 __read_mostly host_xcr0;

static struct kmem_cache *x86_emulator_cache;

-/*
- * When called, it means the previous get/set msr reached an invalid msr.
- * Return true if we want to ignore/silent this failed msr access.
- */
-static bool kvm_msr_ignored_check(u32 msr, u64 data, bool write)
+typedef int (*msr_access_t)(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+ bool host_initiated);
+
+static __always_inline int kvm_do_msr_access(struct kvm_vcpu *vcpu, u32 msr,
+ u64 *data, bool host_initiated,
+ enum kvm_msr_access rw,
+ msr_access_t msr_access_fn)
{
- const char *op = write ? "wrmsr" : "rdmsr";
-
- if (ignore_msrs) {
- if (report_ignored_msrs)
- kvm_pr_unimpl("ignored %s: 0x%x data 0x%llx\n",
- op, msr, data);
- /* Mask the error */
- return true;
- } else {
+ const char *op = rw == MSR_TYPE_W ? "wrmsr" : "rdmsr";
+ int ret;
+
+ BUILD_BUG_ON(rw != MSR_TYPE_R && rw != MSR_TYPE_W);
+
+ /*
+ * Zero the data on read failures to avoid leaking stack data to the
+ * guest and/or userspace, e.g. if the failure is ignored below.
+ */
+ ret = msr_access_fn(vcpu, msr, data, host_initiated);
+ if (ret && rw == MSR_TYPE_R)
+ *data = 0;
+
+ if (ret != KVM_MSR_RET_UNSUPPORTED)
+ return ret;
+
+ if (!ignore_msrs) {
kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
- op, msr, data);
- return false;
+ op, msr, *data);
+ return ret;
}
+
+ if (report_ignored_msrs)
+ kvm_pr_unimpl("ignored %s: 0x%x data 0x%llx\n", op, msr, *data);
+
+ return 0;
}

static struct kmem_cache *kvm_alloc_emulator_cache(void)
@@ -1705,16 +1720,8 @@ static int kvm_get_feature_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,

static int do_get_feature_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
{
- int r;
-
- /* Unconditionally clear the output for simplicity */
- *data = 0;
- r = kvm_get_feature_msr(vcpu, index, data, true);
-
- if (r == KVM_MSR_RET_UNSUPPORTED && kvm_msr_ignored_check(index, 0, false))
- r = 0;
-
- return r;
+ return kvm_do_msr_access(vcpu, index, data, true, MSR_TYPE_R,
+ kvm_get_feature_msr);
}

static bool __kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
@@ -1901,16 +1908,17 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
return static_call(kvm_x86_set_msr)(vcpu, &msr);
}

+static int _kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+ bool host_initiated)
+{
+ return __kvm_set_msr(vcpu, index, *data, host_initiated);
+}
+
static int kvm_set_msr_ignored_check(struct kvm_vcpu *vcpu,
u32 index, u64 data, bool host_initiated)
{
- int ret = __kvm_set_msr(vcpu, index, data, host_initiated);
-
- if (ret == KVM_MSR_RET_UNSUPPORTED)
- if (kvm_msr_ignored_check(index, data, true))
- ret = 0;
-
- return ret;
+ return kvm_do_msr_access(vcpu, index, &data, host_initiated, MSR_TYPE_W,
+ _kvm_set_msr);
}

/*
@@ -1949,16 +1957,8 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
u32 index, u64 *data, bool host_initiated)
{
- int ret = __kvm_get_msr(vcpu, index, data, host_initiated);
-
- if (ret == KVM_MSR_RET_UNSUPPORTED) {
- /* Unconditionally clear *data for simplicity */
- *data = 0;
- if (kvm_msr_ignored_check(index, 0, false))
- ret = 0;
- }
-
- return ret;
+ return kvm_do_msr_access(vcpu, index, data, host_initiated, MSR_TYPE_R,
+ __kvm_get_msr);
}

static int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data)
--
2.44.0.769.g3c40516874-goog


2024-04-25 18:17:22

by Sean Christopherson

Subject: [PATCH 08/10] KVM: x86: Hoist x86.c's global msr_* variables up above kvm_do_msr_access()

Move the definitions of the various MSR arrays above kvm_do_msr_access()
so that kvm_do_msr_access() can query the arrays when handling failures,
e.g. to squash errors if userspace tries to read an MSR that isn't fully
supported, but that KVM advertised as being an MSR-to-save.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 368 ++++++++++++++++++++++-----------------------
1 file changed, 184 insertions(+), 184 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a0506878d58e..04a5ae853774 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -319,6 +319,190 @@ u64 __read_mostly host_xcr0;

static struct kmem_cache *x86_emulator_cache;

+/*
+ * The three MSR lists(msrs_to_save, emulated_msrs, msr_based_features) track
+ * the set of MSRs that KVM exposes to userspace through KVM_GET_MSRS,
+ * KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST. msrs_to_save holds MSRs that
+ * require host support, i.e. should be probed via RDMSR. emulated_msrs holds
+ * MSRs that KVM emulates without strictly requiring host support.
+ * msr_based_features holds MSRs that enumerate features, i.e. are effectively
+ * CPUID leafs. Note, msr_based_features isn't mutually exclusive with
+ * msrs_to_save and emulated_msrs.
+ */
+
+static const u32 msrs_to_save_base[] = {
+ MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
+ MSR_STAR,
+#ifdef CONFIG_X86_64
+ MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
+#endif
+ MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
+ MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+ MSR_IA32_SPEC_CTRL, MSR_IA32_TSX_CTRL,
+ MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
+ MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
+ MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
+ MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
+ MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
+ MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
+ MSR_IA32_UMWAIT_CONTROL,
+
+ MSR_IA32_XFD, MSR_IA32_XFD_ERR,
+};
+
+static const u32 msrs_to_save_pmu[] = {
+ MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
+ MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
+ MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
+ MSR_CORE_PERF_GLOBAL_CTRL, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
+ MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
+
+ /* This part of MSRs should match KVM_INTEL_PMC_MAX_GENERIC. */
+ MSR_ARCH_PERFMON_PERFCTR0, MSR_ARCH_PERFMON_PERFCTR1,
+ MSR_ARCH_PERFMON_PERFCTR0 + 2, MSR_ARCH_PERFMON_PERFCTR0 + 3,
+ MSR_ARCH_PERFMON_PERFCTR0 + 4, MSR_ARCH_PERFMON_PERFCTR0 + 5,
+ MSR_ARCH_PERFMON_PERFCTR0 + 6, MSR_ARCH_PERFMON_PERFCTR0 + 7,
+ MSR_ARCH_PERFMON_EVENTSEL0, MSR_ARCH_PERFMON_EVENTSEL1,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 2, MSR_ARCH_PERFMON_EVENTSEL0 + 3,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 4, MSR_ARCH_PERFMON_EVENTSEL0 + 5,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 6, MSR_ARCH_PERFMON_EVENTSEL0 + 7,
+
+ MSR_K7_EVNTSEL0, MSR_K7_EVNTSEL1, MSR_K7_EVNTSEL2, MSR_K7_EVNTSEL3,
+ MSR_K7_PERFCTR0, MSR_K7_PERFCTR1, MSR_K7_PERFCTR2, MSR_K7_PERFCTR3,
+
+ /* This part of MSRs should match KVM_AMD_PMC_MAX_GENERIC. */
+ MSR_F15H_PERF_CTL0, MSR_F15H_PERF_CTL1, MSR_F15H_PERF_CTL2,
+ MSR_F15H_PERF_CTL3, MSR_F15H_PERF_CTL4, MSR_F15H_PERF_CTL5,
+ MSR_F15H_PERF_CTR0, MSR_F15H_PERF_CTR1, MSR_F15H_PERF_CTR2,
+ MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5,
+
+ MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+ MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+ MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+};
+
+static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_base) +
+ ARRAY_SIZE(msrs_to_save_pmu)];
+static unsigned num_msrs_to_save;
+
+static const u32 emulated_msrs_all[] = {
+ MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
+ MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
+
+#ifdef CONFIG_KVM_HYPERV
+ HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
+ HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_REFERENCE_TSC,
+ HV_X64_MSR_TSC_FREQUENCY, HV_X64_MSR_APIC_FREQUENCY,
+ HV_X64_MSR_CRASH_P0, HV_X64_MSR_CRASH_P1, HV_X64_MSR_CRASH_P2,
+ HV_X64_MSR_CRASH_P3, HV_X64_MSR_CRASH_P4, HV_X64_MSR_CRASH_CTL,
+ HV_X64_MSR_RESET,
+ HV_X64_MSR_VP_INDEX,
+ HV_X64_MSR_VP_RUNTIME,
+ HV_X64_MSR_SCONTROL,
+ HV_X64_MSR_STIMER0_CONFIG,
+ HV_X64_MSR_VP_ASSIST_PAGE,
+ HV_X64_MSR_REENLIGHTENMENT_CONTROL, HV_X64_MSR_TSC_EMULATION_CONTROL,
+ HV_X64_MSR_TSC_EMULATION_STATUS, HV_X64_MSR_TSC_INVARIANT_CONTROL,
+ HV_X64_MSR_SYNDBG_OPTIONS,
+ HV_X64_MSR_SYNDBG_CONTROL, HV_X64_MSR_SYNDBG_STATUS,
+ HV_X64_MSR_SYNDBG_SEND_BUFFER, HV_X64_MSR_SYNDBG_RECV_BUFFER,
+ HV_X64_MSR_SYNDBG_PENDING_BUFFER,
+#endif
+
+ MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
+ MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF_INT, MSR_KVM_ASYNC_PF_ACK,
+
+ MSR_IA32_TSC_ADJUST,
+ MSR_IA32_TSC_DEADLINE,
+ MSR_IA32_ARCH_CAPABILITIES,
+ MSR_IA32_PERF_CAPABILITIES,
+ MSR_IA32_MISC_ENABLE,
+ MSR_IA32_MCG_STATUS,
+ MSR_IA32_MCG_CTL,
+ MSR_IA32_MCG_EXT_CTL,
+ MSR_IA32_SMBASE,
+ MSR_SMI_COUNT,
+ MSR_PLATFORM_INFO,
+ MSR_MISC_FEATURES_ENABLES,
+ MSR_AMD64_VIRT_SPEC_CTRL,
+ MSR_AMD64_TSC_RATIO,
+ MSR_IA32_POWER_CTL,
+ MSR_IA32_UCODE_REV,
+
+ /*
+ * KVM always supports the "true" VMX control MSRs, even if the host
+ * does not. The VMX MSRs as a whole are considered "emulated" as KVM
+ * doesn't strictly require them to exist in the host (ignoring that
+ * KVM would refuse to load in the first place if the core set of MSRs
+ * aren't supported).
+ */
+ MSR_IA32_VMX_BASIC,
+ MSR_IA32_VMX_TRUE_PINBASED_CTLS,
+ MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
+ MSR_IA32_VMX_TRUE_EXIT_CTLS,
+ MSR_IA32_VMX_TRUE_ENTRY_CTLS,
+ MSR_IA32_VMX_MISC,
+ MSR_IA32_VMX_CR0_FIXED0,
+ MSR_IA32_VMX_CR4_FIXED0,
+ MSR_IA32_VMX_VMCS_ENUM,
+ MSR_IA32_VMX_PROCBASED_CTLS2,
+ MSR_IA32_VMX_EPT_VPID_CAP,
+ MSR_IA32_VMX_VMFUNC,
+
+ MSR_K7_HWCR,
+ MSR_KVM_POLL_CONTROL,
+};
+
+static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
+static unsigned num_emulated_msrs;
+
+/*
+ * List of MSRs that control the existence of MSR-based features, i.e. MSRs
+ * that are effectively CPUID leafs. VMX MSRs are also included in the set of
+ * feature MSRs, but are handled separately to allow expedited lookups.
+ */
+static const u32 msr_based_features_all_except_vmx[] = {
+ MSR_AMD64_DE_CFG,
+ MSR_IA32_UCODE_REV,
+ MSR_IA32_ARCH_CAPABILITIES,
+ MSR_IA32_PERF_CAPABILITIES,
+};
+
+static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all_except_vmx) +
+ (KVM_LAST_EMULATED_VMX_MSR - KVM_FIRST_EMULATED_VMX_MSR + 1)];
+static unsigned int num_msr_based_features;
+
+/*
+ * All feature MSRs except uCode revID, which tracks the currently loaded uCode
+ * patch, are immutable once the vCPU model is defined.
+ */
+static bool kvm_is_immutable_feature_msr(u32 msr)
+{
+ int i;
+
+ if (msr >= KVM_FIRST_EMULATED_VMX_MSR && msr <= KVM_LAST_EMULATED_VMX_MSR)
+ return true;
+
+ for (i = 0; i < ARRAY_SIZE(msr_based_features_all_except_vmx); i++) {
+ if (msr == msr_based_features_all_except_vmx[i])
+ return msr != MSR_IA32_UCODE_REV;
+ }
+
+ return false;
+}
+
+static bool kvm_is_msr_to_save(u32 msr_index)
+{
+ unsigned int i;
+
+ for (i = 0; i < num_msrs_to_save; i++) {
+ if (msrs_to_save[i] == msr_index)
+ return true;
+ }
+
+ return false;
+}
+
typedef int (*msr_access_t)(struct kvm_vcpu *vcpu, u32 index, u64 *data,
bool host_initiated);

@@ -1448,178 +1632,6 @@ int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_emulate_rdpmc);

-/*
- * The three MSR lists(msrs_to_save, emulated_msrs, msr_based_features) track
- * the set of MSRs that KVM exposes to userspace through KVM_GET_MSRS,
- * KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST. msrs_to_save holds MSRs that
- * require host support, i.e. should be probed via RDMSR. emulated_msrs holds
- * MSRs that KVM emulates without strictly requiring host support.
- * msr_based_features holds MSRs that enumerate features, i.e. are effectively
- * CPUID leafs. Note, msr_based_features isn't mutually exclusive with
- * msrs_to_save and emulated_msrs.
- */
-
-static const u32 msrs_to_save_base[] = {
- MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
- MSR_STAR,
-#ifdef CONFIG_X86_64
- MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
-#endif
- MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
- MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
- MSR_IA32_SPEC_CTRL, MSR_IA32_TSX_CTRL,
- MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
- MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
- MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
- MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
- MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
- MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
- MSR_IA32_UMWAIT_CONTROL,
-
- MSR_IA32_XFD, MSR_IA32_XFD_ERR,
-};
-
-static const u32 msrs_to_save_pmu[] = {
- MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
- MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
- MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
- MSR_CORE_PERF_GLOBAL_CTRL, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
- MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
-
- /* This part of MSRs should match KVM_INTEL_PMC_MAX_GENERIC. */
- MSR_ARCH_PERFMON_PERFCTR0, MSR_ARCH_PERFMON_PERFCTR1,
- MSR_ARCH_PERFMON_PERFCTR0 + 2, MSR_ARCH_PERFMON_PERFCTR0 + 3,
- MSR_ARCH_PERFMON_PERFCTR0 + 4, MSR_ARCH_PERFMON_PERFCTR0 + 5,
- MSR_ARCH_PERFMON_PERFCTR0 + 6, MSR_ARCH_PERFMON_PERFCTR0 + 7,
- MSR_ARCH_PERFMON_EVENTSEL0, MSR_ARCH_PERFMON_EVENTSEL1,
- MSR_ARCH_PERFMON_EVENTSEL0 + 2, MSR_ARCH_PERFMON_EVENTSEL0 + 3,
- MSR_ARCH_PERFMON_EVENTSEL0 + 4, MSR_ARCH_PERFMON_EVENTSEL0 + 5,
- MSR_ARCH_PERFMON_EVENTSEL0 + 6, MSR_ARCH_PERFMON_EVENTSEL0 + 7,
-
- MSR_K7_EVNTSEL0, MSR_K7_EVNTSEL1, MSR_K7_EVNTSEL2, MSR_K7_EVNTSEL3,
- MSR_K7_PERFCTR0, MSR_K7_PERFCTR1, MSR_K7_PERFCTR2, MSR_K7_PERFCTR3,
-
- /* This part of MSRs should match KVM_AMD_PMC_MAX_GENERIC. */
- MSR_F15H_PERF_CTL0, MSR_F15H_PERF_CTL1, MSR_F15H_PERF_CTL2,
- MSR_F15H_PERF_CTL3, MSR_F15H_PERF_CTL4, MSR_F15H_PERF_CTL5,
- MSR_F15H_PERF_CTR0, MSR_F15H_PERF_CTR1, MSR_F15H_PERF_CTR2,
- MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5,
-
- MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
- MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
- MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
-};
-
-static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_base) +
- ARRAY_SIZE(msrs_to_save_pmu)];
-static unsigned num_msrs_to_save;
-
-static const u32 emulated_msrs_all[] = {
- MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
- MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
-
-#ifdef CONFIG_KVM_HYPERV
- HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
- HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_REFERENCE_TSC,
- HV_X64_MSR_TSC_FREQUENCY, HV_X64_MSR_APIC_FREQUENCY,
- HV_X64_MSR_CRASH_P0, HV_X64_MSR_CRASH_P1, HV_X64_MSR_CRASH_P2,
- HV_X64_MSR_CRASH_P3, HV_X64_MSR_CRASH_P4, HV_X64_MSR_CRASH_CTL,
- HV_X64_MSR_RESET,
- HV_X64_MSR_VP_INDEX,
- HV_X64_MSR_VP_RUNTIME,
- HV_X64_MSR_SCONTROL,
- HV_X64_MSR_STIMER0_CONFIG,
- HV_X64_MSR_VP_ASSIST_PAGE,
- HV_X64_MSR_REENLIGHTENMENT_CONTROL, HV_X64_MSR_TSC_EMULATION_CONTROL,
- HV_X64_MSR_TSC_EMULATION_STATUS, HV_X64_MSR_TSC_INVARIANT_CONTROL,
- HV_X64_MSR_SYNDBG_OPTIONS,
- HV_X64_MSR_SYNDBG_CONTROL, HV_X64_MSR_SYNDBG_STATUS,
- HV_X64_MSR_SYNDBG_SEND_BUFFER, HV_X64_MSR_SYNDBG_RECV_BUFFER,
- HV_X64_MSR_SYNDBG_PENDING_BUFFER,
-#endif
-
- MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
- MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF_INT, MSR_KVM_ASYNC_PF_ACK,
-
- MSR_IA32_TSC_ADJUST,
- MSR_IA32_TSC_DEADLINE,
- MSR_IA32_ARCH_CAPABILITIES,
- MSR_IA32_PERF_CAPABILITIES,
- MSR_IA32_MISC_ENABLE,
- MSR_IA32_MCG_STATUS,
- MSR_IA32_MCG_CTL,
- MSR_IA32_MCG_EXT_CTL,
- MSR_IA32_SMBASE,
- MSR_SMI_COUNT,
- MSR_PLATFORM_INFO,
- MSR_MISC_FEATURES_ENABLES,
- MSR_AMD64_VIRT_SPEC_CTRL,
- MSR_AMD64_TSC_RATIO,
- MSR_IA32_POWER_CTL,
- MSR_IA32_UCODE_REV,
-
- /*
- * KVM always supports the "true" VMX control MSRs, even if the host
- * does not. The VMX MSRs as a whole are considered "emulated" as KVM
- * doesn't strictly require them to exist in the host (ignoring that
- * KVM would refuse to load in the first place if the core set of MSRs
- * aren't supported).
- */
- MSR_IA32_VMX_BASIC,
- MSR_IA32_VMX_TRUE_PINBASED_CTLS,
- MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
- MSR_IA32_VMX_TRUE_EXIT_CTLS,
- MSR_IA32_VMX_TRUE_ENTRY_CTLS,
- MSR_IA32_VMX_MISC,
- MSR_IA32_VMX_CR0_FIXED0,
- MSR_IA32_VMX_CR4_FIXED0,
- MSR_IA32_VMX_VMCS_ENUM,
- MSR_IA32_VMX_PROCBASED_CTLS2,
- MSR_IA32_VMX_EPT_VPID_CAP,
- MSR_IA32_VMX_VMFUNC,
-
- MSR_K7_HWCR,
- MSR_KVM_POLL_CONTROL,
-};
-
-static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
-static unsigned num_emulated_msrs;
-
-/*
- * List of MSRs that control the existence of MSR-based features, i.e. MSRs
- * that are effectively CPUID leafs. VMX MSRs are also included in the set of
- * feature MSRs, but are handled separately to allow expedited lookups.
- */
-static const u32 msr_based_features_all_except_vmx[] = {
- MSR_AMD64_DE_CFG,
- MSR_IA32_UCODE_REV,
- MSR_IA32_ARCH_CAPABILITIES,
- MSR_IA32_PERF_CAPABILITIES,
-};
-
-static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all_except_vmx) +
- (KVM_LAST_EMULATED_VMX_MSR - KVM_FIRST_EMULATED_VMX_MSR + 1)];
-static unsigned int num_msr_based_features;
-
-/*
- * All feature MSRs except uCode revID, which tracks the currently loaded uCode
- * patch, are immutable once the vCPU model is defined.
- */
-static bool kvm_is_immutable_feature_msr(u32 msr)
-{
- int i;
-
- if (msr >= KVM_FIRST_EMULATED_VMX_MSR && msr <= KVM_LAST_EMULATED_VMX_MSR)
- return true;
-
- for (i = 0; i < ARRAY_SIZE(msr_based_features_all_except_vmx); i++) {
- if (msr == msr_based_features_all_except_vmx[i])
- return msr != MSR_IA32_UCODE_REV;
- }
-
- return false;
-}
-
/*
* Some IA32_ARCH_CAPABILITIES bits have dependencies on MSRs that KVM
* does not yet virtualize. These include:
@@ -3770,18 +3782,6 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
}

-static bool kvm_is_msr_to_save(u32 msr_index)
-{
- unsigned int i;
-
- for (i = 0; i < num_msrs_to_save; i++) {
- if (msrs_to_save[i] == msr_index)
- return true;
- }
-
- return false;
-}
-
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
u32 msr = msr_info->index;
--
2.44.0.769.g3c40516874-goog


2024-04-25 18:17:36

by Sean Christopherson

Subject: [PATCH 09/10] KVM: x86: Suppress failures on userspace access to advertised, unsupported MSRs

Extend KVM's suppression of failures due to a userspace access to an
unsupported, but advertised as a "to save", MSR to all MSRs, not just those
that happen to reach the default case statements in kvm_get_msr_common()
and kvm_set_msr_common(). KVM's soon-to-be-established ABI is that if an
MSR is advertised to userspace, then userspace is allowed to read the MSR,
and write back the value that was read, i.e. why an MSR is unsupported
doesn't change KVM's ABI.

Practically speaking, this is very nearly a nop, as the only other paths
that return KVM_MSR_RET_UNSUPPORTED are {svm,vmx}_get_feature_msr(), and
it's unlikely, though not impossible, that userspace is using KVM_GET_MSRS
on unsupported MSRs.

The primary goal of moving the suppression to common code is to allow
returning KVM_MSR_RET_UNSUPPORTED as appropriate throughout KVM, without
having to manually handle the "is userspace accessing an advertised MSR"
waiver. I.e. this will allow formalizing KVM's ABI without incurring a
high maintenance cost.
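
From userspace's perspective, the preserved contract is that reading an
advertised MSR and writing back the value that was read always succeeds. A
minimal, hedged sketch of such a save/restore (illustrative, error handling
trimmed):

        #include <linux/kvm.h>
        #include <stdint.h>
        #include <string.h>
        #include <sys/ioctl.h>

        /* Returns 0 if @index can be read and its value written back. */
        static int save_restore_msr(int vcpu_fd, uint32_t index)
        {
                struct {
                        struct kvm_msrs hdr;
                        struct kvm_msr_entry entry;
                } msrs;

                memset(&msrs, 0, sizeof(msrs));
                msrs.hdr.nmsrs = 1;
                msrs.entry.index = index;

                /* Unsupported-but-advertised MSRs read back as '0'. */
                if (ioctl(vcpu_fd, KVM_GET_MSRS, &msrs.hdr) != 1)
                        return -1;

                return ioctl(vcpu_fd, KVM_SET_MSRS, &msrs.hdr) == 1 ? 0 : -1;
        }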

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 27 +++++++++------------------
1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 04a5ae853774..4c91189342ff 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -527,6 +527,15 @@ static __always_inline int kvm_do_msr_access(struct kvm_vcpu *vcpu, u32 msr,
if (ret != KVM_MSR_RET_UNSUPPORTED)
return ret;

+ /*
+ * Userspace is allowed to read MSRs, and write '0' to MSRs, that KVM
+ * reports as to-be-saved, even if an MSR isn't fully supported.
+ * Simply check that @data is '0', which covers both the write '0' case
+ * and all reads (in which case @data is zeroed on failure; see above).
+ */
+ if (host_initiated && !*data && kvm_is_msr_to_save(msr))
+ return 0;
+
if (!ignore_msrs) {
kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
op, msr, *data);
@@ -4163,14 +4172,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (kvm_pmu_is_valid_msr(vcpu, msr))
return kvm_pmu_set_msr(vcpu, msr_info);

- /*
- * Userspace is allowed to write '0' to MSRs that KVM reports
- * as to-be-saved, even if an MSRs isn't fully supported.
- */
- if (msr_info->host_initiated && !data &&
- kvm_is_msr_to_save(msr))
- break;
-
return KVM_MSR_RET_UNSUPPORTED;
}
return 0;
@@ -4522,16 +4523,6 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
return kvm_pmu_get_msr(vcpu, msr_info);

- /*
- * Userspace is allowed to read MSRs that KVM reports as
- * to-be-saved, even if an MSR isn't fully supported.
- */
- if (msr_info->host_initiated &&
- kvm_is_msr_to_save(msr_info->index)) {
- msr_info->data = 0;
- break;
- }
-
return KVM_MSR_RET_UNSUPPORTED;
}
return 0;
--
2.44.0.769.g3c40516874-goog


2024-04-25 18:18:03

by Sean Christopherson

Subject: [PATCH 10/10] KVM: x86: Suppress userspace access failures on unsupported, "emulated" MSRs

Extend KVM's suppression of userspace MSR access failures to MSRs that KVM
reports as emulated, but are ultimately unsupported, e.g. if the VMX MSRs
are emulated by KVM, but are unsupported given the vCPU model.

Suggested-by: Weijiang Yang <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4c91189342ff..14cfa25ef0e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -491,7 +491,7 @@ static bool kvm_is_immutable_feature_msr(u32 msr)
return false;
}

-static bool kvm_is_msr_to_save(u32 msr_index)
+static bool kvm_is_advertised_msr(u32 msr_index)
{
unsigned int i;

@@ -500,6 +500,11 @@ static bool kvm_is_msr_to_save(u32 msr_index)
return true;
}

+ for (i = 0; i < num_emulated_msrs; i++) {
+ if (emulated_msrs[i] == msr_index)
+ return true;
+ }
+
return false;
}

@@ -529,11 +534,11 @@ static __always_inline int kvm_do_msr_access(struct kvm_vcpu *vcpu, u32 msr,

/*
* Userspace is allowed to read MSRs, and write '0' to MSRs, that KVM
- * reports as to-be-saved, even if an MSR isn't fully supported.
+ * advertises to userspace, even if an MSR isn't fully supported.
* Simply check that @data is '0', which covers both the write '0' case
* and all reads (in which case @data is zeroed on failure; see above).
*/
- if (host_initiated && !*data && kvm_is_msr_to_save(msr))
+ if (host_initiated && !*data && kvm_is_advertised_msr(msr))
return 0;

if (!ignore_msrs) {
--
2.44.0.769.g3c40516874-goog


2024-04-26 07:02:12

by Yang, Weijiang

Subject: Re: [PATCH 05/10] KVM: x86: Rename get_msr_feature() APIs to get_feature_msr()

On 4/26/2024 2:14 AM, Sean Christopherson wrote:
> Rename all APIs related to feature MSRs from get_feature_msr() to

s /get_feature_msr()/get_msr_feature()
> get_feature_msr(). The APIs get "feature MSRs", not "MSR features".
> And unlike kvm_{g,s}et_msr_common(), the "feature" adjective doesn't
> describe the helper itself.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 2 +-
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/svm/svm.c | 6 +++---
> arch/x86/kvm/vmx/main.c | 2 +-
> arch/x86/kvm/vmx/vmx.c | 2 +-
> arch/x86/kvm/vmx/x86_ops.h | 2 +-
> arch/x86/kvm/x86.c | 12 ++++++------
> 7 files changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 5187fcf4b610..9f25b4a49d6b 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -128,7 +128,7 @@ KVM_X86_OP_OPTIONAL(mem_enc_unregister_region)
> KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from)
> KVM_X86_OP_OPTIONAL(vm_move_enc_context_from)
> KVM_X86_OP_OPTIONAL(guest_memory_reclaimed)
> -KVM_X86_OP(get_msr_feature)
> +KVM_X86_OP(get_feature_msr)
> KVM_X86_OP(check_emulate_instruction)
> KVM_X86_OP(apic_init_signal_blocked)
> KVM_X86_OP_OPTIONAL(enable_l2_tlb_flush)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 7d56e5a52ae3..cc04ab0c234e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1785,7 +1785,7 @@ struct kvm_x86_ops {
> int (*vm_move_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
> void (*guest_memory_reclaimed)(struct kvm *kvm);
>
> - int (*get_msr_feature)(u32 msr, u64 *data);
> + int (*get_feature_msr)(u32 msr, u64 *data);
>
> int (*check_emulate_instruction)(struct kvm_vcpu *vcpu, int emul_type,
> void *insn, int insn_len);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 15422b7d9149..d95cd230540d 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2796,7 +2796,7 @@ static int efer_trap(struct kvm_vcpu *vcpu)
> return kvm_complete_insn_gp(vcpu, ret);
> }
>
> -static int svm_get_msr_feature(u32 msr, u64 *data)
> +static int svm_get_feature_msr(u32 msr, u64 *data)
> {
> *data = 0;
>
> @@ -3134,7 +3134,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> case MSR_AMD64_DE_CFG: {
> u64 supported_de_cfg;
>
> - if (svm_get_msr_feature(ecx, &supported_de_cfg))
> + if (svm_get_feature_msr(ecx, &supported_de_cfg))
> return 1;
>
> if (data & ~supported_de_cfg)
> @@ -4944,7 +4944,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> .vcpu_unblocking = avic_vcpu_unblocking,
>
> .update_exception_bitmap = svm_update_exception_bitmap,
> - .get_msr_feature = svm_get_msr_feature,
> + .get_feature_msr = svm_get_feature_msr,
> .get_msr = svm_get_msr,
> .set_msr = svm_set_msr,
> .get_segment_base = svm_get_segment_base,
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index 7c546ad3e4c9..c670f4cf6d94 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -40,7 +40,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
> .vcpu_put = vmx_vcpu_put,
>
> .update_exception_bitmap = vmx_update_exception_bitmap,
> - .get_msr_feature = vmx_get_msr_feature,
> + .get_feature_msr = vmx_get_feature_msr,
> .get_msr = vmx_get_msr,
> .set_msr = vmx_set_msr,
> .get_segment_base = vmx_get_segment_base,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 25b0a838abd6..fe2bf8f31d7c 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1955,7 +1955,7 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
> return !(msr->data & ~valid_bits);
> }
>
> -int vmx_get_msr_feature(u32 msr, u64 *data)
> +int vmx_get_feature_msr(u32 msr, u64 *data)
> {
> switch (msr) {
> case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index 504d56d6837d..4b81c85e9357 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -58,7 +58,7 @@ bool vmx_has_emulated_msr(struct kvm *kvm, u32 index);
> void vmx_msr_filter_changed(struct kvm_vcpu *vcpu);
> void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
> void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu);
> -int vmx_get_msr_feature(u32 msr, u64 *data);
> +int vmx_get_feature_msr(u32 msr, u64 *data);
> int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
> u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg);
> void vmx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 03e50812ab33..8f58181f2b6d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1682,7 +1682,7 @@ static u64 kvm_get_arch_capabilities(void)
> return data;
> }
>
> -static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
> +static int kvm_get_feature_msr(struct kvm_msr_entry *msr)
> {
> switch (msr->index) {
> case MSR_IA32_ARCH_CAPABILITIES:
> @@ -1695,12 +1695,12 @@ static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
> rdmsrl_safe(msr->index, &msr->data);
> break;
> default:
> - return static_call(kvm_x86_get_msr_feature)(msr->index, &msr->data);
> + return static_call(kvm_x86_get_feature_msr)(msr->index, &msr->data);
> }
> return 0;
> }
>
> -static int do_get_msr_feature(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
> +static int do_get_feature_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
> {
> struct kvm_msr_entry msr;
> int r;
> @@ -1708,7 +1708,7 @@ static int do_get_msr_feature(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
> /* Unconditionally clear the output for simplicity */
> msr.data = 0;
> msr.index = index;
> - r = kvm_get_msr_feature(&msr);
> + r = kvm_get_feature_msr(&msr);
>
> if (r == KVM_MSR_RET_UNSUPPORTED && kvm_msr_ignored_check(index, 0, false))
> r = 0;
> @@ -4962,7 +4962,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
> break;
> }
> case KVM_GET_MSRS:
> - r = msr_io(NULL, argp, do_get_msr_feature, 1);
> + r = msr_io(NULL, argp, do_get_feature_msr, 1);
> break;
> #ifdef CONFIG_KVM_HYPERV
> case KVM_GET_SUPPORTED_HV_CPUID:
> @@ -7367,7 +7367,7 @@ static void kvm_probe_feature_msr(u32 msr_index)
> .index = msr_index,
> };
>
> - if (kvm_get_msr_feature(&msr))
> + if (kvm_get_feature_msr(&msr))
> return;
>
> msr_based_features[num_msr_based_features++] = msr_index;


2024-04-26 07:53:00

by Yang, Weijiang

Subject: Re: [PATCH 10/10] KVM: x86: Suppress userspace access failures on unsupported, "emulated" MSRs

On 4/26/2024 2:14 AM, Sean Christopherson wrote:
> Extend KVM's suppression of userspace MSR access failures to MSRs that KVM
> reports as emulated, but are ultimately unsupported, e.g. if the VMX MSRs
> are emulated by KVM, but are unsupported given the vCPU model.
>
> Suggested-by: Weijiang Yang<[email protected]>
> Signed-off-by: Sean Christopherson<[email protected]>
> ---
> arch/x86/kvm/x86.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4c91189342ff..14cfa25ef0e7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -491,7 +491,7 @@ static bool kvm_is_immutable_feature_msr(u32 msr)
> return false;
> }
>
> -static bool kvm_is_msr_to_save(u32 msr_index)
> +static bool kvm_is_advertised_msr(u32 msr_index)
> {
> unsigned int i;
>
> @@ -500,6 +500,11 @@ static bool kvm_is_msr_to_save(u32 msr_index)
> return true;
> }
>
> + for (i = 0; i < num_emulated_msrs; i++) {
> + if (emulated_msrs[i] == msr_index)
> + return true;
> + }
> +
> return false;
> }
>
> @@ -529,11 +534,11 @@ static __always_inline int kvm_do_msr_access(struct kvm_vcpu *vcpu, u32 msr,
>
> /*
> * Userspace is allowed to read MSRs, and write '0' to MSRs, that KVM
> - * reports as to-be-saved, even if an MSR isn't fully supported.
> + * advertises to userspace, even if an MSR isn't fully supported.
> * Simply check that @data is '0', which covers both the write '0' case
> * and all reads (in which case @data is zeroed on failure; see above).
> */
> - if (host_initiated && !*data && kvm_is_msr_to_save(msr))
> + if (host_initiated && !*data && kvm_is_advertised_msr(msr))
> return 0;
>
> if (!ignore_msrs) {

Reviewed-by: Weijiang Yang <[email protected]>


2024-04-26 12:36:53

by Yang, Weijiang

Subject: Re: [PATCH 09/10] KVM: x86: Suppress failures on userspace access to advertised, unsupported MSRs

On 4/26/2024 2:14 AM, Sean Christopherson wrote:
> Extend KVM's suppression of failures due to a userspace access to an
> unsupported, but advertised as a "to save" MSR to all MSRs, not just those
> that happen to reach the default case statements in kvm_get_msr_common()
> and kvm_set_msr_common(). KVM's soon-to-be-established ABI is that if an
> MSR is advertised to userspace, then userspace is allowed to read the MSR,
> and write back the value that was read, i.e. why an MSR is unsupported
> doesn't change KVM's ABI.
>
> Practically speaking, this is very nearly a nop, as the only other paths
> that return KVM_MSR_RET_UNSUPPORTED are {svm,vmx}_get_feature_msr(), and
> it's unlikely, though not impossible, that userspace is using KVM_GET_MSRS
> on unsupported MSRs.
>
> The primary goal of moving the suppression to common code is to allow
> returning KVM_MSR_RET_UNSUPPORTED as appropriate throughout KVM, without
> having to manually handle the "is userspace accessing an advertised"
> waiver. I.e. this will allow formalizing KVM's ABI without incurring a
> high maintenance cost.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/x86.c | 27 +++++++++------------------
> 1 file changed, 9 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 04a5ae853774..4c91189342ff 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -527,6 +527,15 @@ static __always_inline int kvm_do_msr_access(struct kvm_vcpu *vcpu, u32 msr,
> if (ret != KVM_MSR_RET_UNSUPPORTED)
> return ret;
>
> + /*
> + * Userspace is allowed to read MSRs, and write '0' to MSRs, that KVM
> + * reports as to-be-saved, even if an MSR isn't fully supported.
> + * Simply check that @data is '0', which covers both the write '0' case
> + * and all reads (in which case @data is zeroed on failure; see above).
> + */
> + if (host_initiated && !*data && kvm_is_msr_to_save(msr))
> + return 0;
> +

IMHO, it's worth documenting the above behavior as a note in the
KVM_{GET,SET}_MSRS sections of virt/kvm/api.rst, because when userspace
reads/writes MSRs successfully, it doesn't necessarily mean the operation
really took effect. Maybe it succeeds only because the MSR is exposed in
the "to-be-saved" list.

> if (!ignore_msrs) {
> kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
> op, msr, *data);
> @@ -4163,14 +4172,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> if (kvm_pmu_is_valid_msr(vcpu, msr))
> return kvm_pmu_set_msr(vcpu, msr_info);
>
> - /*
> - * Userspace is allowed to write '0' to MSRs that KVM reports
> - * as to-be-saved, even if an MSRs isn't fully supported.
> - */
> - if (msr_info->host_initiated && !data &&
> - kvm_is_msr_to_save(msr))
> - break;
> -
> return KVM_MSR_RET_UNSUPPORTED;
> }
> return 0;
> @@ -4522,16 +4523,6 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
> return kvm_pmu_get_msr(vcpu, msr_info);
>
> - /*
> - * Userspace is allowed to read MSRs that KVM reports as
> - * to-be-saved, even if an MSR isn't fully supported.
> - */
> - if (msr_info->host_initiated &&
> - kvm_is_msr_to_save(msr_info->index)) {
> - msr_info->data = 0;
> - break;
> - }
> -
> return KVM_MSR_RET_UNSUPPORTED;
> }
> return 0;


2024-04-26 17:18:25

by Sean Christopherson

Subject: Re: [PATCH 09/10] KVM: x86: Suppress failures on userspace access to advertised, unsupported MSRs

On Fri, Apr 26, 2024, Weijiang Yang wrote:
> On 4/26/2024 2:14 AM, Sean Christopherson wrote:
> > Extend KVM's suppression of failures due to a userspace access to an
> > unsupported, but advertised as a "to save" MSR to all MSRs, not just those
> > that happen to reach the default case statements in kvm_get_msr_common()
> > and kvm_set_msr_common(). KVM's soon-to-be-established ABI is that if an
> > MSR is advertised to userspace, then userspace is allowed to read the MSR,
> > and write back the value that was read, i.e. why an MSR is unsupported
> > doesn't change KVM's ABI.
> >
> > Practically speaking, this is very nearly a nop, as the only other paths
> > that return KVM_MSR_RET_UNSUPPORTED are {svm,vmx}_get_feature_msr(), and
> > it's unlikely, though not impossible, that userspace is using KVM_GET_MSRS
> > on unsupported MSRs.
> >
> > The primary goal of moving the suppression to common code is to allow
> > returning KVM_MSR_RET_UNSUPPORTED as appropriate throughout KVM, without
> > having to manually handle the "is userspace accessing an advertised"
> > waiver. I.e. this will allow formalizing KVM's ABI without incurring a
> > high maintenance cost.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/kvm/x86.c | 27 +++++++++------------------
> > 1 file changed, 9 insertions(+), 18 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 04a5ae853774..4c91189342ff 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -527,6 +527,15 @@ static __always_inline int kvm_do_msr_access(struct kvm_vcpu *vcpu, u32 msr,
> > if (ret != KVM_MSR_RET_UNSUPPORTED)
> > return ret;
> > + /*
> > + * Userspace is allowed to read MSRs, and write '0' to MSRs, that KVM
> > + * reports as to-be-saved, even if an MSR isn't fully supported.
> > + * Simply check that @data is '0', which covers both the write '0' case
> > + * and all reads (in which case @data is zeroed on failure; see above).
> > + */
> > + if (host_initiated && !*data && kvm_is_msr_to_save(msr))
> > + return 0;
> > +
>
> IMHO, it's worth documenting the above behavior as a note in the
> KVM_{GET,SET}_MSRS sections of virt/kvm/api.rst, because when userspace
> reads/writes MSRs successfully, it doesn't necessarily mean the operation
> really took effect. Maybe it succeeds only because the MSR is exposed in
> the "to-be-saved" list.

Agreed, though I think I'd prefer to wait to officially document the behavior
until we have fully converted KVM's internals to KVM_MSR_RET_UNSUPPORTED. I'm
99% certain the behavior won't actually be as simple as "userspace can write '0'",
e.g. I know of at least one case where KVM allows '0' _or_ the KVM's non-zero
default.

In the case that I'm aware of (MSR_AMD64_TSC_RATIO), the "default" is a hardcoded
KVM constant, e.g. won't change based on underlying hardware, but there may be
other special cases lurking, and we won't know until we complete the conversion :-(
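
For the curious, a from-memory (i.e. not authoritative) sketch of that
special case in svm_set_msr():

        case MSR_AMD64_TSC_RATIO:
                if (!svm->tsc_scaling_enabled) {
                        if (!msr->host_initiated)
                                return 1;
                        /* Tolerate '0' and KVM's hardcoded default value. */
                        if (data != 0 && data != svm->tsc_ratio_msr)
                                return 1;
                        break;
                }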