2024-04-23 22:15:36

by Sean Christopherson

Subject: [PATCH 0/4] KVM: x86: Collect host state snapshots into a struct

Add a global "kvm_host" structure to hold various host values, e.g. for
EFER, XCR0, raw MAXPHYADDR etc., instead of having a bunch of one-off
variables that inevitably need to be exported, or in the case of
shadow_phys_bits, are buried in a random location and are awkward to use,
leading to duplicate code.
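
For reference, a sketch of the consolidated structure as it stands at the end
of the series (fields collected from the diffs in patches 1 and 4):

	struct kvm_host_values {
		/*
		 * The host's raw MAXPHYADDR, i.e. the number of non-reserved
		 * physical address bits, as reported by CPUID (patch 4).
		 */
		u8 maxphyaddr;

		u64 efer;
		u64 xcr0;
		u64 xss;
		u64 arch_capabilities;
	};

	extern struct kvm_host_values kvm_host;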

Sean Christopherson (4):
KVM: x86: Add a struct to consolidate host values, e.g. EFER, XCR0,
etc...
KVM: SVM: Use KVM's snapshot of the host's XCR0 for SEV-ES host state
KVM: x86/mmu: Snapshot shadow_phys_bits when kvm.ko is loaded
KVM: x86: Move shadow_phys_bits into "kvm_host", as "maxphyaddr"

arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/mmu.h | 27 +----------------------
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/mmu/spte.c | 26 ++++++++++++++++++----
arch/x86/kvm/svm/sev.c | 4 ++--
arch/x86/kvm/vmx/nested.c | 8 +++----
arch/x86/kvm/vmx/vmx.c | 28 +++++++++++-------------
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.c | 38 +++++++++++++--------------------
arch/x86/kvm/x86.h | 19 +++++++++++++----
10 files changed, 74 insertions(+), 81 deletions(-)


base-commit: 7b076c6a308ec5bce9fc96e2935443ed228b9148
--
2.44.0.769.g3c40516874-goog



2024-04-23 22:15:55

by Sean Christopherson

Subject: [PATCH 1/4] KVM: x86: Add a struct to consolidate host values, e.g. EFER, XCR0, etc...

Add "struct kvm_host_values kvm_host" to hold the various host values
that KVM snapshots during initialization. Bundling the host values into
a single struct simplifies adding new MSRs and other features with host
state/values that KVM cares about, and provides a one-stop shop. E.g.
adding a new value requires one line, whereas tracking each value
individually often requires three: declaration, definition, and export.
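
Concretely, the pattern being removed (using host_efer from this patch's diff
as the example) is:

	/* arch/x86/include/asm/kvm_host.h */
	extern u64 __read_mostly host_efer;

	/* arch/x86/kvm/x86.c */
	u64 __read_mostly host_efer;
	EXPORT_SYMBOL_GPL(host_efer);

versus adding a single "u64 efer;" field to the new struct.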

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/svm/sev.c | 2 +-
arch/x86/kvm/vmx/nested.c | 8 +++----
arch/x86/kvm/vmx/vmx.c | 14 ++++++------
arch/x86/kvm/x86.c | 38 +++++++++++++--------------------
arch/x86/kvm/x86.h | 12 +++++++----
6 files changed, 35 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1d13e3cd1dc5..8d3940a59894 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1846,7 +1846,6 @@ struct kvm_arch_async_pf {
};

extern u32 __read_mostly kvm_nr_uret_msrs;
-extern u64 __read_mostly host_efer;
extern bool __read_mostly allow_smaller_maxphyaddr;
extern bool __read_mostly enable_apicv;
extern struct kvm_x86_ops kvm_x86_ops;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 598d78b4107f..71f1518f0ca1 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3251,7 +3251,7 @@ void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_are
*/
hostsa->xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
hostsa->pkru = read_pkru();
- hostsa->xss = host_xss;
+ hostsa->xss = kvm_host.xss;

/*
* If DebugSwap is enabled, debug registers are loaded but NOT saved by
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d5b832126e34..a896df59eaad 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2422,7 +2422,7 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
if (cpu_has_load_ia32_efer()) {
if (guest_efer & EFER_LMA)
exec_control |= VM_ENTRY_IA32E_MODE;
- if (guest_efer != host_efer)
+ if (guest_efer != kvm_host.efer)
exec_control |= VM_ENTRY_LOAD_IA32_EFER;
}
vm_entry_controls_set(vmx, exec_control);
@@ -2435,7 +2435,7 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
* bits may be modified by vmx_set_efer() in prepare_vmcs02().
*/
exec_control = __vm_exit_controls_get(vmcs01);
- if (cpu_has_load_ia32_efer() && guest_efer != host_efer)
+ if (cpu_has_load_ia32_efer() && guest_efer != kvm_host.efer)
exec_control |= VM_EXIT_LOAD_IA32_EFER;
else
exec_control &= ~VM_EXIT_LOAD_IA32_EFER;
@@ -4662,7 +4662,7 @@ static inline u64 nested_vmx_get_vmcs01_guest_efer(struct vcpu_vmx *vmx)
return vmcs_read64(GUEST_IA32_EFER);

if (cpu_has_load_ia32_efer())
- return host_efer;
+ return kvm_host.efer;

for (i = 0; i < vmx->msr_autoload.guest.nr; ++i) {
if (vmx->msr_autoload.guest.val[i].index == MSR_EFER)
@@ -4673,7 +4673,7 @@ static inline u64 nested_vmx_get_vmcs01_guest_efer(struct vcpu_vmx *vmx)
if (efer_msr)
return efer_msr->data;

- return host_efer;
+ return kvm_host.efer;
}

static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f10b5f8f364b..cb1bd9aebac4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -258,7 +258,7 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
return 0;
}

- if (host_arch_capabilities & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) {
+ if (kvm_host.arch_capabilities & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) {
l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
return 0;
}
@@ -403,7 +403,7 @@ static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
* and VM-Exit.
*/
vmx->disable_fb_clear = !cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF) &&
- (host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
+ (kvm_host.arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
!boot_cpu_has_bug(X86_BUG_MDS) &&
!boot_cpu_has_bug(X86_BUG_TAA);

@@ -1116,12 +1116,12 @@ static bool update_transition_efer(struct vcpu_vmx *vmx)
* atomically, since it's faster than switching it manually.
*/
if (cpu_has_load_ia32_efer() ||
- (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
+ (enable_ept && ((vmx->vcpu.arch.efer ^ kvm_host.efer) & EFER_NX))) {
if (!(guest_efer & EFER_LMA))
guest_efer &= ~EFER_LME;
- if (guest_efer != host_efer)
+ if (guest_efer != kvm_host.efer)
add_atomic_switch_msr(vmx, MSR_EFER,
- guest_efer, host_efer, false);
+ guest_efer, kvm_host.efer, false);
else
clear_atomic_switch_msr(vmx, MSR_EFER);
return false;
@@ -1134,7 +1134,7 @@ static bool update_transition_efer(struct vcpu_vmx *vmx)
clear_atomic_switch_msr(vmx, MSR_EFER);

guest_efer &= ~ignore_bits;
- guest_efer |= host_efer & ignore_bits;
+ guest_efer |= kvm_host.efer & ignore_bits;

vmx->guest_uret_msrs[i].data = guest_efer;
vmx->guest_uret_msrs[i].mask = ~ignore_bits;
@@ -4346,7 +4346,7 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
}

if (cpu_has_load_ia32_efer())
- vmcs_write64(HOST_IA32_EFER, host_efer);
+ vmcs_write64(HOST_IA32_EFER, kvm_host.efer);
}

void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e9ef1fa4b90b..1b664385461d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -98,6 +98,9 @@ struct kvm_caps kvm_caps __read_mostly = {
};
EXPORT_SYMBOL_GPL(kvm_caps);

+struct kvm_host_values kvm_host __read_mostly;
+EXPORT_SYMBOL_GPL(kvm_host);
+
#define ERR_PTR_USR(e) ((void __user *)ERR_PTR(e))

#define emul_to_vcpu(ctxt) \
@@ -227,21 +230,12 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)

-u64 __read_mostly host_efer;
-EXPORT_SYMBOL_GPL(host_efer);
-
bool __read_mostly allow_smaller_maxphyaddr = 0;
EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);

bool __read_mostly enable_apicv = true;
EXPORT_SYMBOL_GPL(enable_apicv);

-u64 __read_mostly host_xss;
-EXPORT_SYMBOL_GPL(host_xss);
-
-u64 __read_mostly host_arch_capabilities;
-EXPORT_SYMBOL_GPL(host_arch_capabilities);
-
const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS(),
STATS_DESC_COUNTER(VM, mmu_shadow_zapped),
@@ -315,8 +309,6 @@ const struct kvm_stats_header kvm_vcpu_stats_header = {
sizeof(kvm_vcpu_stats_desc),
};

-u64 __read_mostly host_xcr0;
-
static struct kmem_cache *x86_emulator_cache;

/*
@@ -1023,11 +1015,11 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)

if (kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE)) {

- if (vcpu->arch.xcr0 != host_xcr0)
+ if (vcpu->arch.xcr0 != kvm_host.xcr0)
xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);

if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
- vcpu->arch.ia32_xss != host_xss)
+ vcpu->arch.ia32_xss != kvm_host.xss)
wrmsrl(MSR_IA32_XSS, vcpu->arch.ia32_xss);
}

@@ -1054,12 +1046,12 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)

if (kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE)) {

- if (vcpu->arch.xcr0 != host_xcr0)
- xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0);
+ if (vcpu->arch.xcr0 != kvm_host.xcr0)
+ xsetbv(XCR_XFEATURE_ENABLED_MASK, kvm_host.xcr0);

if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
- vcpu->arch.ia32_xss != host_xss)
- wrmsrl(MSR_IA32_XSS, host_xss);
+ vcpu->arch.ia32_xss != kvm_host.xss)
+ wrmsrl(MSR_IA32_XSS, kvm_host.xss);
}

}
@@ -1626,7 +1618,7 @@ static bool kvm_is_immutable_feature_msr(u32 msr)

static u64 kvm_get_arch_capabilities(void)
{
- u64 data = host_arch_capabilities & KVM_SUPPORTED_ARCH_CAP;
+ u64 data = kvm_host.arch_capabilities & KVM_SUPPORTED_ARCH_CAP;

/*
* If nx_huge_pages is enabled, KVM's shadow paging will ensure that
@@ -9777,19 +9769,19 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
goto out_free_percpu;

if (boot_cpu_has(X86_FEATURE_XSAVE)) {
- host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
- kvm_caps.supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
+ kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
+ kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
}

- rdmsrl_safe(MSR_EFER, &host_efer);
+ rdmsrl_safe(MSR_EFER, &kvm_host.efer);

if (boot_cpu_has(X86_FEATURE_XSAVES))
- rdmsrl(MSR_IA32_XSS, host_xss);
+ rdmsrl(MSR_IA32_XSS, kvm_host.xss);

kvm_init_pmu_capability(ops->pmu_ops);

if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
- rdmsrl(MSR_IA32_ARCH_CAPABILITIES, host_arch_capabilities);
+ rdmsrl(MSR_IA32_ARCH_CAPABILITIES, kvm_host.arch_capabilities);

r = ops->hardware_setup();
if (r != 0)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index d80a4c6b5a38..e69fff7d1f21 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -33,6 +33,13 @@ struct kvm_caps {
u64 supported_perf_cap;
};

+struct kvm_host_values {
+ u64 efer;
+ u64 xcr0;
+ u64 xss;
+ u64 arch_capabilities;
+};
+
void kvm_spurious_fault(void);

#define KVM_NESTED_VMENTER_CONSISTENCY_CHECK(consistency_check) \
@@ -325,11 +332,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
int emulation_type, void *insn, int insn_len);
fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);

-extern u64 host_xcr0;
-extern u64 host_xss;
-extern u64 host_arch_capabilities;
-
extern struct kvm_caps kvm_caps;
+extern struct kvm_host_values kvm_host;

extern bool enable_pmu;

--
2.44.0.769.g3c40516874-goog


2024-04-23 22:16:03

by Sean Christopherson

Subject: [PATCH 2/4] KVM: SVM: Use KVM's snapshot of the host's XCR0 for SEV-ES host state

Use KVM's snapshot of the host's XCR0 when stuffing SEV-ES host state
instead of reading XCR0 from hardware. XCR0 is only written during
boot, i.e. won't change while KVM is running (and KVM at large is hosed
if that doesn't hold true).

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/sev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 71f1518f0ca1..c56070991a58 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3249,7 +3249,7 @@ void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_are
* isn't saved by VMRUN, that isn't already saved by VMSAVE (performed
* by common SVM code).
*/
- hostsa->xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
+ hostsa->xcr0 = kvm_host.xcr0;
hostsa->pkru = read_pkru();
hostsa->xss = kvm_host.xss;

--
2.44.0.769.g3c40516874-goog


2024-04-23 22:16:22

by Sean Christopherson

Subject: [PATCH 3/4] KVM: x86/mmu: Snapshot shadow_phys_bits when kvm.ko is loaded

Snapshot shadow_phys_bits when kvm.ko is loaded, not when a vendor module
is loaded, to guard against usage of shadow_phys_bits before it is
initialized. The computation isn't vendor-specific in any way, i.e. there
is no reason to wait to snapshot the value until a vendor module is
loaded, nor is there any reason to recompute the value every time a vendor
module is loaded.

Opportunistically convert it from "read mostly" to "read-only after init".
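
A note on __ro_after_init semantics (general kernel behavior, not specific to
this patch): the variable is placed in a section that the kernel write-protects
once (module) init completes, so a late write, e.g. an errant recomputation at
vendor module load, would fault instead of silently changing the value:

	/* Writable during kvm.ko init, read-only (hardware-enforced) after. */
	u8 __ro_after_init shadow_phys_bits;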

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu.h | 2 +-
arch/x86/kvm/mmu/spte.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index b410a227c601..ef970aea26e7 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -61,7 +61,7 @@ static __always_inline u64 rsvd_bits(int s, int e)
* The number of non-reserved physical address bits irrespective of features
* that repurpose legal bits, e.g. MKTME.
*/
-extern u8 __read_mostly shadow_phys_bits;
+extern u8 __ro_after_init shadow_phys_bits;

static inline gfn_t kvm_mmu_max_gfn(void)
{
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 6c7ab3aa6aa7..927f4abbe973 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -43,7 +43,7 @@ u64 __read_mostly shadow_acc_track_mask;
u64 __read_mostly shadow_nonpresent_or_rsvd_mask;
u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask;

-u8 __read_mostly shadow_phys_bits;
+u8 __ro_after_init shadow_phys_bits;

void __init kvm_mmu_spte_module_init(void)
{
@@ -55,6 +55,8 @@ void __init kvm_mmu_spte_module_init(void)
* will change when the vendor module is (re)loaded.
*/
allow_mmio_caching = enable_mmio_caching;
+
+ shadow_phys_bits = kvm_get_shadow_phys_bits();
}

static u64 generation_mmio_spte_mask(u64 gen)
@@ -439,8 +441,6 @@ void kvm_mmu_reset_all_pte_masks(void)
u8 low_phys_bits;
u64 mask;

- shadow_phys_bits = kvm_get_shadow_phys_bits();
-
/*
* If the CPU has 46 or less physical address bits, then set an
* appropriate mask to guard against L1TF attacks. Otherwise, it is
--
2.44.0.769.g3c40516874-goog


2024-04-23 22:16:51

by Sean Christopherson

Subject: [PATCH 4/4] KVM: x86: Move shadow_phys_bits into "kvm_host", as "maxphyaddr"

Move shadow_phys_bits into "struct kvm_host_values", i.e. into KVM's
global "kvm_host" variable, so that it is automatically exported for use
in vendor modules. Rename the variable/field to maxphyaddr to more
clearly capture what value it holds, now that it's used outside of the
MMU (and because the "shadow" part is more than a bit misleading as the
variable is not at all unique to shadow paging).

Recomputing the raw/true host.MAXPHYADDR on every use can be subtly
expensive, e.g. it will incur a VM-Exit on the CPUID if KVM is running as
a nested hypervisor. Vendor code already has access to the information,
e.g. by directly doing CPUID or by invoking kvm_get_shadow_phys_bits(), so
there's no tangible benefit to making it MMU-only.
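
For context, the raw MAXPHYADDR in question is bits 7:0 of CPUID.0x80000008.EAX
(the kernel-side lookup lives in kvm_get_host_maxphyaddr() in the diff below).
A minimal user-space sketch of the same query, for illustration only:

	#include <cpuid.h>
	#include <stdio.h>

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx;

		/* CPUID.0x80000008.EAX[7:0] = physical address width (MAXPHYADDR). */
		if (__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
			printf("raw MAXPHYADDR: %u bits\n", eax & 0xff);

		return 0;
	}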

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu.h | 27 +--------------------------
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/mmu/spte.c | 24 +++++++++++++++++++++---
arch/x86/kvm/vmx/vmx.c | 14 ++++++--------
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.h | 7 +++++++
6 files changed, 37 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index ef970aea26e7..0d63637f46d7 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -57,12 +57,6 @@ static __always_inline u64 rsvd_bits(int s, int e)
return ((2ULL << (e - s)) - 1) << s;
}

-/*
- * The number of non-reserved physical address bits irrespective of features
- * that repurpose legal bits, e.g. MKTME.
- */
-extern u8 __ro_after_init shadow_phys_bits;
-
static inline gfn_t kvm_mmu_max_gfn(void)
{
/*
@@ -76,30 +70,11 @@ static inline gfn_t kvm_mmu_max_gfn(void)
* than hardware's real MAXPHYADDR. Using the host MAXPHYADDR
* disallows such SPTEs entirely and simplifies the TDP MMU.
*/
- int max_gpa_bits = likely(tdp_enabled) ? shadow_phys_bits : 52;
+ int max_gpa_bits = likely(tdp_enabled) ? kvm_host.maxphyaddr : 52;

return (1ULL << (max_gpa_bits - PAGE_SHIFT)) - 1;
}

-static inline u8 kvm_get_shadow_phys_bits(void)
-{
- /*
- * boot_cpu_data.x86_phys_bits is reduced when MKTME or SME are detected
- * in CPU detection code, but the processor treats those reduced bits as
- * 'keyID' thus they are not reserved bits. Therefore KVM needs to look at
- * the physical address bits reported by CPUID.
- */
- if (likely(boot_cpu_data.extended_cpuid_level >= 0x80000008))
- return cpuid_eax(0x80000008) & 0xff;
-
- /*
- * Quite weird to have VMX or SVM but not MAXPHYADDR; probably a VM with
- * custom CPUID. Proceed with whatever the kernel found since these features
- * aren't virtualizable (SME/SEV also require CPUIDs higher than 0x80000008).
- */
- return boot_cpu_data.x86_phys_bits;
-}
-
u8 kvm_mmu_get_max_tdp_level(void);

void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 12ad01929dce..c30bffa441cf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4933,7 +4933,7 @@ static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,

static inline u64 reserved_hpa_bits(void)
{
- return rsvd_bits(shadow_phys_bits, 63);
+ return rsvd_bits(kvm_host.maxphyaddr, 63);
}

/*
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 927f4abbe973..d49a3f928b0b 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -43,7 +43,25 @@ u64 __read_mostly shadow_acc_track_mask;
u64 __read_mostly shadow_nonpresent_or_rsvd_mask;
u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask;

-u8 __ro_after_init shadow_phys_bits;
+static u8 __init kvm_get_host_maxphyaddr(void)
+{
+ /*
+ * boot_cpu_data.x86_phys_bits is reduced when MKTME or SME are detected
+ * in CPU detection code, but the processor treats those reduced bits as
+ * 'keyID' thus they are not reserved bits. Therefore KVM needs to look at
+ * the physical address bits reported by CPUID, i.e. the raw MAXPHYADDR,
+ * when reasoning about CPU behavior with respect to MAXPHYADDR.
+ */
+ if (likely(boot_cpu_data.extended_cpuid_level >= 0x80000008))
+ return cpuid_eax(0x80000008) & 0xff;
+
+ /*
+ * Quite weird to have VMX or SVM but not MAXPHYADDR; probably a VM with
+ * custom CPUID. Proceed with whatever the kernel found since these features
+ * aren't virtualizable (SME/SEV also require CPUIDs higher than 0x80000008).
+ */
+ return boot_cpu_data.x86_phys_bits;
+}

void __init kvm_mmu_spte_module_init(void)
{
@@ -56,7 +74,7 @@ void __init kvm_mmu_spte_module_init(void)
*/
allow_mmio_caching = enable_mmio_caching;

- shadow_phys_bits = kvm_get_shadow_phys_bits();
+ kvm_host.maxphyaddr = kvm_get_host_maxphyaddr();
}

static u64 generation_mmio_spte_mask(u64 gen)
@@ -492,7 +510,7 @@ void kvm_mmu_reset_all_pte_masks(void)
* 52-bit physical addresses then there are no reserved PA bits in the
* PTEs and so the reserved PA approach must be disabled.
*/
- if (shadow_phys_bits < 52)
+ if (kvm_host.maxphyaddr < 52)
mask = BIT_ULL(51) | PT_PRESENT_MASK;
else
mask = 0;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cb1bd9aebac4..185b07bbbc16 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8337,18 +8337,16 @@ static void __init vmx_setup_me_spte_mask(void)
u64 me_mask = 0;

/*
- * kvm_get_shadow_phys_bits() returns shadow_phys_bits. Use
- * the former to avoid exposing shadow_phys_bits.
- *
* On pre-MKTME system, boot_cpu_data.x86_phys_bits equals to
- * shadow_phys_bits. On MKTME and/or TDX capable systems,
+ * kvm_host.maxphyaddr. On MKTME and/or TDX capable systems,
* boot_cpu_data.x86_phys_bits holds the actual physical address
- * w/o the KeyID bits, and shadow_phys_bits equals to MAXPHYADDR
- * reported by CPUID. Those bits between are KeyID bits.
+ * w/o the KeyID bits, and kvm_host.maxphyaddr equals to
+ * MAXPHYADDR reported by CPUID. Those bits between are KeyID bits.
*/
- if (boot_cpu_data.x86_phys_bits != kvm_get_shadow_phys_bits())
+ if (boot_cpu_data.x86_phys_bits != kvm_host.maxphyaddr)
me_mask = rsvd_bits(boot_cpu_data.x86_phys_bits,
- kvm_get_shadow_phys_bits() - 1);
+ kvm_host.maxphyaddr - 1);
+
/*
* Unlike SME, host kernel doesn't support setting up any
* MKTME KeyID on Intel platforms. No memory encryption
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 90f9e4434646..e7343023fbce 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -723,7 +723,7 @@ static inline bool vmx_need_pf_intercept(struct kvm_vcpu *vcpu)
return true;

return allow_smaller_maxphyaddr &&
- cpuid_maxphyaddr(vcpu) < kvm_get_shadow_phys_bits();
+ cpuid_maxphyaddr(vcpu) < kvm_host.maxphyaddr;
}

static inline bool is_unrestricted_guest(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index e69fff7d1f21..a88c65d3ea26 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -34,6 +34,13 @@ struct kvm_caps {
};

struct kvm_host_values {
+ /*
+ * The host's raw MAXPHYADDR, i.e. the number of non-reserved physical
+ * address bits irrespective of features that repurpose legal bits,
+ * e.g. MKTME.
+ */
+ u8 maxphyaddr;
+
u64 efer;
u64 xcr0;
u64 xss;
--
2.44.0.769.g3c40516874-goog


2024-04-25 11:17:29

by Wei Wang

Subject: RE: [PATCH 0/4] KVM: x86: Collect host state snapshots into a struct

On Wednesday, April 24, 2024 6:15 AM, Sean Christopherson wrote:
> Add a global "kvm_host" structure to hold various host values, e.g. for EFER,
> XCR0, raw MAXPHYADDR etc., instead of having a bunch of one-off variables
> that inevitably need to be exported, or in the case of shadow_phys_bits, are
> buried in a random location and are awkward to use, leading to duplicate
> code.

Looks good. How about applying similar improvements to the module
parameters as well? I've changed the "enable_pmu" parameter as an example below:

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 77352a4abd87..a221ba7b546f 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1013,7 +1013,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
union cpuid10_eax eax;
union cpuid10_edx edx;

- if (!enable_pmu || !static_cpu_has(X86_FEATURE_ARCH_PERFMON)) {
+ if (!kvm_caps.enable_pmu || !static_cpu_has(X86_FEATURE_ARCH_PERFMON)) {
entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
break;
}
@@ -1306,7 +1306,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
union cpuid_0x80000022_ebx ebx;

entry->ecx = entry->edx = 0;
- if (!enable_pmu || !kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2)) {
+ if (!kvm_caps.enable_pmu || !kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2)) {
entry->eax = entry->ebx;
break;
}
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 4d52b0b539ba..7e359db64dbd 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -190,9 +190,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
* for hybrid PMUs until KVM gains a way to let userspace opt-in.
*/
if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
- enable_pmu = false;
+ kvm_caps.enable_pmu = false;

- if (enable_pmu) {
+ if (kvm_caps.enable_pmu) {
perf_get_x86_pmu_capability(&kvm_pmu_cap);

/*
@@ -203,12 +203,12 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
*/
if (!kvm_pmu_cap.num_counters_gp ||
WARN_ON_ONCE(kvm_pmu_cap.num_counters_gp < min_nr_gp_ctrs))
- enable_pmu = false;
+ kvm_caps.enable_pmu = false;
else if (is_intel && !kvm_pmu_cap.version)
- enable_pmu = false;
+ kvm_caps.enable_pmu = false;
}

- if (!enable_pmu) {
+ if (!kvm_caps.enable_pmu) {
memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
return;
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7ea0e7f13da4..4ed8c73f88e4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7869,7 +7869,7 @@ static __init u64 vmx_get_perf_capabilities(void)
u64 perf_cap = PMU_CAP_FW_WRITES;
u64 host_perf_cap = 0;

- if (!enable_pmu)
+ if (!kvm_caps.enable_pmu)
return 0;

if (boot_cpu_has(X86_FEATURE_PDCM))
@@ -7938,7 +7938,7 @@ static __init void vmx_set_cpu_caps(void)
kvm_cpu_cap_check_and_set(X86_FEATURE_DTES64);
}

- if (!enable_pmu)
+ if (!kvm_caps.enable_pmu)
kvm_cpu_cap_clear(X86_FEATURE_PDCM);
kvm_caps.supported_perf_cap = vmx_get_perf_capabilities();

@@ -8683,7 +8683,7 @@ static __init int hardware_setup(void)

if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
return -EINVAL;
- if (!enable_ept || !enable_pmu || !cpu_has_vmx_intel_pt())
+ if (!enable_ept || !kvm_caps.enable_pmu || !cpu_has_vmx_intel_pt())
pt_mode = PT_MODE_SYSTEM;
if (pt_mode == PT_MODE_HOST_GUEST)
vmx_init_ops.handle_intel_pt_intr = vmx_handle_intel_pt_intr;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cdcda1bbf5a3..36d471a7af87 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -94,6 +94,7 @@

struct kvm_caps kvm_caps __read_mostly = {
.supported_mce_cap = MCG_CTL_P | MCG_SER_P,
+ .enable_pmu = true,
};
EXPORT_SYMBOL_GPL(kvm_caps);

@@ -192,9 +193,7 @@ int __read_mostly pi_inject_timer = -1;
module_param(pi_inject_timer, bint, 0644);

/* Enable/disable PMU virtualization */
-bool __read_mostly enable_pmu = true;
-EXPORT_SYMBOL_GPL(enable_pmu);
-module_param(enable_pmu, bool, 0444);
+module_param_named(enable_pmu, kvm_caps.enable_pmu, bool, 0444);

bool __read_mostly eager_page_split = true;
module_param(eager_page_split, bool, 0644);
@@ -4815,7 +4814,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
break;
}
case KVM_CAP_PMU_CAPABILITY:
- r = enable_pmu ? KVM_CAP_PMU_VALID_MASK : 0;
+ r = kvm_caps.enable_pmu ? KVM_CAP_PMU_VALID_MASK : 0;
break;
case KVM_CAP_DISABLE_QUIRKS2:
r = KVM_X86_VALID_QUIRKS;
@@ -6652,7 +6651,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
break;
case KVM_CAP_PMU_CAPABILITY:
r = -EINVAL;
- if (!enable_pmu || (cap->args[0] & ~KVM_CAP_PMU_VALID_MASK))
+ if (!kvm_caps.enable_pmu || (cap->args[0] & ~KVM_CAP_PMU_VALID_MASK))
break;

mutex_lock(&kvm->lock);
@@ -7438,7 +7437,7 @@ static void kvm_init_msr_lists(void)
for (i = 0; i < ARRAY_SIZE(msrs_to_save_base); i++)
kvm_probe_msr_to_save(msrs_to_save_base[i]);

- if (enable_pmu) {
+ if (kvm_caps.enable_pmu) {
for (i = 0; i < ARRAY_SIZE(msrs_to_save_pmu); i++)
kvm_probe_msr_to_save(msrs_to_save_pmu[i]);
}
@@ -12555,7 +12554,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)

kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
kvm->arch.guest_can_read_msr_platform_info = true;
- kvm->arch.enable_pmu = enable_pmu;
+ kvm->arch.enable_pmu = kvm_caps.enable_pmu;

#if IS_ENABLED(CONFIG_HYPERV)
spin_lock_init(&kvm->arch.hv_root_tdp_lock);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 102754dc85bc..c4d99338aaa1 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -29,6 +29,9 @@ struct kvm_caps {
u64 supported_xcr0;
u64 supported_xss;
u64 supported_perf_cap;
+
+ /* KVM module parameters */
+ bool enable_pmu;
};

struct kvm_host_values {
@@ -340,8 +343,6 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
extern struct kvm_caps kvm_caps;
extern struct kvm_host_values kvm_host;

-extern bool enable_pmu;

2024-04-25 11:30:07

by Wei Wang

Subject: RE: [PATCH 1/4] KVM: x86: Add a struct to consolidate host values, e.g. EFER, XCR0, etc...

On Wednesday, April 24, 2024 6:15 AM, Sean Christopherson wrote:
> @@ -403,7 +403,7 @@ static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
>  	 * and VM-Exit.
>  	 */
>  	vmx->disable_fb_clear = !cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF) &&
> -				(host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
> +				(kvm_host.arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&

The line of code appears to be lengthy. It would be preferable to limit it to under
80 columns per line.

>  				!boot_cpu_has_bug(X86_BUG_MDS) &&
>  				!boot_cpu_has_bug(X86_BUG_TAA);
>
> @@ -1116,12 +1116,12 @@ static bool update_transition_efer(struct vcpu_vmx *vmx)
>  	 * atomically, since it's faster than switching it manually.
>  	 */
>  	if (cpu_has_load_ia32_efer() ||
> -	    (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
> +	    (enable_ept && ((vmx->vcpu.arch.efer ^ kvm_host.efer) & EFER_NX))) {
> if (!(guest_efer & EFER_LMA))
> guest_efer &= ~EFER_LME;
> - if (guest_efer != host_efer)
> + if (guest_efer != kvm_host.efer)
> add_atomic_switch_msr(vmx, MSR_EFER,
> - guest_efer, host_efer, false);
> + guest_efer, kvm_host.efer, false);
> else
> clear_atomic_switch_msr(vmx, MSR_EFER);
> return false;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index d80a4c6b5a38..e69fff7d1f21 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -33,6 +33,13 @@ struct kvm_caps {
> u64 supported_perf_cap;
> };
>
> +struct kvm_host_values {
> + u64 efer;
> + u64 xcr0;
> + u64 xss;
> + u64 arch_capabilities;
> +};
> +
> void kvm_spurious_fault(void);
>
>  #define KVM_NESTED_VMENTER_CONSISTENCY_CHECK(consistency_check)	\
> @@ -325,11 +332,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  			    int emulation_type, void *insn, int insn_len);
>  fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
>
> -extern u64 host_xcr0;
> -extern u64 host_xss;
> -extern u64 host_arch_capabilities;
> -
> extern struct kvm_caps kvm_caps;
> +extern struct kvm_host_values kvm_host;

Have you considered merging the kvm_host_values and kvm_caps into one unified
structure?
(If the concern is about naming, we could brainstorm a more encompassing term
for them)

2024-04-25 14:10:41

by Sean Christopherson

Subject: Re: [PATCH 1/4] KVM: x86: Add a struct to consolidate host values, e.g. EFER, XCR0, etc...

On Thu, Apr 25, 2024, Wei W Wang wrote:
> On Wednesday, April 24, 2024 6:15 AM, Sean Christopherson wrote:
> > @@ -403,7 +403,7 @@ static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
> >  	 * and VM-Exit.
> >  	 */
> >  	vmx->disable_fb_clear = !cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF) &&
> > -				(host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
> > +				(kvm_host.arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
>
> The line of code appears to be lengthy. It would be preferable to limit it to under
> 80 columns per line.

I agree that staying under 80 is generally preferred, but I find this

	vmx->disable_fb_clear = (kvm_host.arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
				!boot_cpu_has_bug(X86_BUG_MDS) &&
				!boot_cpu_has_bug(X86_BUG_TAA);

much more readable than this

	vmx->disable_fb_clear = (kvm_host.arch_capabilities &
				 ARCH_CAP_FB_CLEAR_CTRL) &&
				!boot_cpu_has_bug(X86_BUG_MDS) &&
				!boot_cpu_has_bug(X86_BUG_TAA);

We could shorten the name to arch_caps, but I don't think that's a net positive,
e.g. unless we do a bulk rename, it'd diverge from several other functions/variables,
and IMO it would be less obvious that the field holds MSR_IA32_ARCH_CAPABILITIES.

> > !boot_cpu_has_bug(X86_BUG_MDS) &&
> > !boot_cpu_has_bug(X86_BUG_TAA);
> >

> > @@ -325,11 +332,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> >  			    int emulation_type, void *insn, int insn_len);
> >  fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
> >
> > -extern u64 host_xcr0;
> > -extern u64 host_xss;
> > -extern u64 host_arch_capabilities;
> > -
> > extern struct kvm_caps kvm_caps;
> > +extern struct kvm_host_values kvm_host;
>
> Have you considered merging the kvm_host_values and kvm_caps into one unified
> structure?

Not really. I don't see any benefit, only the downside of having to come up with
a name that is intuitive when reading code related to both.

> (If the concern is about naming, we could brainstorm a more encompassing term
> for them)

2024-04-25 14:26:31

by Sean Christopherson

Subject: Re: [PATCH 0/4] KVM: x86: Collect host state snapshots into a struct

On Thu, Apr 25, 2024, Wei W Wang wrote:
> On Wednesday, April 24, 2024 6:15 AM, Sean Christopherson wrote:
> > Add a global "kvm_host" structure to hold various host values, e.g. for EFER,
> > XCR0, raw MAXPHYADDR etc., instead of having a bunch of one-off variables
> > that inevitably need to be exported, or in the case of shadow_phys_bits, are
> > buried in a random location and are awkward to use, leading to duplicate
> > code.
>
> Looks good. How about applying similar improvements to the module
> parameters as well? I've changed the "enable_pmu" parameter as an example below:

Hmm, I don't hate the idea, but I don't think it would work as well in practice.

For kvm_host, all of the fields it contains were namespaced with "host_<asset>",
and the globals that became kvm_caps all had some variant of kvm_ or kvm_has_ as
a prefix.

For module params and other knobs, the thing being controlled is usually unique
to KVM, and often fairly self-descriptive, so we haven't had to namespace them
much. And we have params across kvm.ko, kvm-amd.ko, and kvm-intel.ko, which
sometimes leads to weird splits in responsibilities, e.g. enable_apicv is defined
by common x86, but the module params themselves are defined by SVM and VMX, and
for SVM it's an alias of avic.

> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 77352a4abd87..a221ba7b546f 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1013,7 +1013,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
> union cpuid10_eax eax;
> union cpuid10_edx edx;
>
> - if (!enable_pmu || !static_cpu_has(X86_FEATURE_ARCH_PERFMON)) {
> + if (!kvm_caps.enable_pmu || !static_cpu_has(X86_FEATURE_ARCH_PERFMON)) {

If we did try to collect module params in a struct, it should be a unique struct,
because they aren't pure capabilities or host values.

> entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
> break;
> }
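
For illustration, a minimal sketch of what such a dedicated struct might look
like (hypothetical; nothing in this thread proposes this exact form):

	/* Hypothetical arch/x86/kvm/x86.h addition. */
	struct kvm_params {
		bool enable_pmu;
	};
	extern struct kvm_params kvm_params;

	/*
	 * Hypothetical arch/x86/kvm/x86.c definition, reusing the
	 * module_param_named() hookup from Wei's diff above.
	 */
	struct kvm_params kvm_params __read_mostly = {
		.enable_pmu = true,
	};
	EXPORT_SYMBOL_GPL(kvm_params);
	module_param_named(enable_pmu, kvm_params.enable_pmu, bool, 0444);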

2024-04-25 14:37:46

by Wei Wang

Subject: RE: [PATCH 1/4] KVM: x86: Add a struct to consolidate host values, e.g. EFER, XCR0, etc...

On Thursday, April 25, 2024 10:10 PM, Sean Christopherson wrote:
> On Thu, Apr 25, 2024, Wei W Wang wrote:
> > On Wednesday, April 24, 2024 6:15 AM, Sean Christopherson wrote:
> > > @@ -403,7 +403,7 @@ static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
> > >  	 * and VM-Exit.
> > >  	 */
> > >  	vmx->disable_fb_clear = !cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF) &&
> > > -				(host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
> > > +				(kvm_host.arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
> >
> > The line of code appears to be lengthy. It would be preferable to limit it
> > to under 80 columns per line.
>
> I agree that staying under 80 is generally preferred, but I find this
>
> 	vmx->disable_fb_clear = (kvm_host.arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
> 				!boot_cpu_has_bug(X86_BUG_MDS) &&
> 				!boot_cpu_has_bug(X86_BUG_TAA);
>
> much more readable than this
>
> 	vmx->disable_fb_clear = (kvm_host.arch_capabilities &
> 				 ARCH_CAP_FB_CLEAR_CTRL) &&
> 				!boot_cpu_has_bug(X86_BUG_MDS) &&
> 				!boot_cpu_has_bug(X86_BUG_TAA);
>
> We could shorten the name to arch_caps, but I don't think that's a net
> positive, e.g. unless we do a bulk rename, it'd diverge from several other
> functions/variables, and IMO it would be less obvious that the field holds
> MSR_IA32_ARCH_CAPABILITIES.

Yeah, the above isn't nice, and there's no need to do a bulk rename.
We could just shorten it here, e.g.:

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4ed8c73f88e4..8d0ab5a6a515 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -393,6 +393,9 @@ static __always_inline void vmx_enable_fb_clear(struct vcpu_vmx *vmx)

static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
{
+	u64 arch_cap = kvm_host.arch_capabilities;
+
/*
* Disable VERW's behavior of clearing CPU buffers for the guest if the
* CPU isn't affected by MDS/TAA, and the host hasn't forcefully enabled
@@ -402,7 +405,7 @@ static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
* and VM-Exit.
*/
vmx->disable_fb_clear = !cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF) &&
- (kvm_host.arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
+ (arch_cap & ARCH_CAP_FB_CLEAR_CTRL) &&
!boot_cpu_has_bug(X86_BUG_MDS) &&
!boot_cpu_has_bug(X86_BUG_TAA);


>
> > > !boot_cpu_has_bug(X86_BUG_MDS) &&
> > > !boot_cpu_has_bug(X86_BUG_TAA);
> > >
>
> > > @@ -325,11 +332,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> > >  			    int emulation_type, void *insn, int insn_len);
> > >  fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
> > >
> > > -extern u64 host_xcr0;
> > > -extern u64 host_xss;
> > > -extern u64 host_arch_capabilities;
> > > -
> > > extern struct kvm_caps kvm_caps;
> > > +extern struct kvm_host_values kvm_host;
> >
> > Have you considered merging the kvm_host_values and kvm_caps into one
> > unified structure?
>
> Not really. I don't see any benefit, only the downside of having to come up
> with a name that is intuitive when reading code related to both.

I thought the two structures perform quite similar jobs and most of the fields in
kvm_caps, e.g. has_tsc_control, supported_perf_cap, could also be interpreted
as host values?

2024-04-25 14:47:46

by Sean Christopherson

Subject: Re: [PATCH 1/4] KVM: x86: Add a struct to consolidate host values, e.g. EFER, XCR0, etc...

On Thu, Apr 25, 2024, Wei W Wang wrote:
> On Thursday, April 25, 2024 10:10 PM, Sean Christopherson wrote:
> > We could shorten the name to arch_caps, but I don't think that's a net
> > positive, e.g. unless we do a bulk rename, it'd diverge from several other
> > functions/variables, and IMO it would be less obvious that the field holds
> > MSR_IA32_ARCH_CAPABILITIES.
>
> Yeah, the above isn't nice and no need to do bulk rename.
> We could just shorten it here, e.g.:

Works for me.

> > > > @@ -325,11 +332,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> > > >  			    int emulation_type, void *insn, int insn_len);
> > > >  fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
> > > >
> > > > -extern u64 host_xcr0;
> > > > -extern u64 host_xss;
> > > > -extern u64 host_arch_capabilities;
> > > > -
> > > > extern struct kvm_caps kvm_caps;
> > > > +extern struct kvm_host_values kvm_host;
> > >
> > > Have you considered merging the kvm_host_values and kvm_caps into one
> > > unified structure?
> >
> > Not really. I don't see any benefit, only the downside of having to come up
> > with a name that is intuitive when reading code related to both.
>
> I thought the two structures perform quite similar jobs and most of the fields in
> kvm_caps, e.g. has_tsc_control, supported_perf_cap, could also be interpreted
> as host values?

No, kvm_caps is filtered and/or generated information, e.g. supported_perf_cap
and supported_xss incorporate host/hardware support, but they also incorporate
KVM's own capabilities.

kvm_host holds pure, unadulterated host values.

XSS is a perfect example. If we shoved the host value in kvm_caps, then we'd
have kvm_caps.supported_xss and kvm_caps.xss, which would be incredibly confusing.
So then we'd need to rename it to kvm_caps.host_xss, which is also confusing,
just less so, and also results in a longer name with no added value.
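
The XCR0 handling in patch 1 already shows that split concretely: the raw host
value and the KVM-filtered capability are captured side by side:

	/* From patch 1: raw host value vs. KVM-filtered capability. */
	kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
	kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;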