2024-05-17 17:40:03

by Sean Christopherson

Subject: [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching

This is technically v2 of "Replace governed features with guest cpu_caps",
but it obviously snowballed just a bit. This series wanders all over the
place, and ideally would be 3-4 distinct series, but the interactions and
dependencies between patches make splitting it up impractical.

The super short TL;DR: snapshot all X86_FEATURE_* flags that KVM cares
about so that all queries against guest capabilities are "fast", e.g. don't
require manual enabling or judgment calls as to where a feature needs to be
fast.

The guest_cpu_cap_* nomenclature follows the existing kvm_cpu_cap_*
except for a few (maybe just one?) cases where guest cpu_caps need APIs
that kvm_cpu_caps don't. In theory, the similar names will make this
approach more intuitive.
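
Roughly, the end state turns every guest capability query into a single
bitmap test against a per-vCPU snapshot instead of a CPUID entry walk.
A minimal sketch (the exact helper signature may differ from the final
patches):

  static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
                                                unsigned int x86_feature)
  {
          unsigned int x86_leaf = __feature_leaf(x86_feature);

          return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature);
  }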

Maxim's suggestion to incorporate KVM's capabilities into the guest's cpu_caps
grew on me, to the point where I decided to just go for it. Through macro
shenanigans (see the last DO NOT APPLY patch) and manually verifying that
vcpu->arch.cpu_caps is always a superset of guest CPUID, I was able to gain
sufficient confidence that KVM won't silently change guest behavior. Many, but
not all, of the new patches are related in some way to that approach.

There are *multiple* potentially breaking changes in this series (in for a
penny, in for a pound). However, I don't expect any fallout for real world
VMMs because the ABI changes either disallow things that couldn't possibly
have worked in the first place, or are following in the footsteps of other
behaviors, e.g. KVM advertises x2APIC, which is 100% dependent on an in-kernel
local APIC.

* Disallow stuffing CPUID-dependent guest CR4 features before setting guest
CPUID
* Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation
* Reject disabling of MWAIT/HLT interception when not allowed
* Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID
* Advertise HYPERVISOR in KVM_GET_SUPPORTED_CPUID

Lastly, regarding the PoC DO NOT APPLY patch, I hope to turn that into an actual
patch in the future. E.g. I think we can shove feature usage information into
a .note or something, and then do post-processing a la objtool during the build.
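
As a strawman, the usage tracking could be as simple as emitting a marker
into a dedicated section every time a guest cap is consumed, e.g. (purely
illustrative; the macro, section name, and linker plumbing are all made up):

  /* Record the consumed feature so a post-processing tool can verify that
   * every feature KVM consumes is also initialized in kvm_set_cpu_caps(). */
  #define KVM_TRACK_FEATURE_USAGE(name)                                 \
  do {                                                                  \
          static const u32 __used __section(".kvm_feature_usage")      \
                  __tracked = X86_FEATURE_##name;                       \
  } while (0)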

v2:
- Collect a few reviews (though I dropped several due to the patches changing
significantly).
- Incorporate KVM's support into the vCPU's cpu_caps. [Maxim]
- A massive pile of new patches.

v1: https://lore.kernel.org/all/[email protected]

Sean Christopherson (49):
KVM: x86: Do all post-set CPUID processing during vCPU creation
KVM: x86: Explicitly do runtime CPUID updates "after" initial setup
KVM: x86: Account for KVM-reserved CR4 bits when passing through CR4
on VMX
KVM: selftests: Update x86's set_sregs_test to match KVM's CPUID
enforcement
KVM: selftests: Assert that the @cpuid passed to get_cpuid_entry() is
non-NULL
KVM: selftests: Refresh vCPU CPUID cache in __vcpu_get_cpuid_entry()
KVM: selftests: Verify KVM stuffs runtime CPUID OS bits on CR4 writes
KVM: x86: Move __kvm_is_valid_cr4() definition to x86.h
KVM: x86/pmu: Drop now-redundant refresh() during init()
KVM: x86: Drop now-redundant MAXPHYADDR and GPA rsvd bits from vCPU
creation
KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation
KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed
KVM: selftests: Fix a bad TEST_REQUIRE() in x86's KVM PV test
KVM: selftests: Update x86's KVM PV test to match KVM's disabling
exits behavior
KVM: x86: Zero out PV features cache when the CPUID leaf is not
present
KVM: x86: Don't update PV features caches when enabling enforcement
capability
KVM: x86: Do reverse CPUID sanity checks in __feature_leaf()
KVM: x86: Account for max supported CPUID leaf when getting raw host
CPUID
KVM: x86: Add a macro to init CPUID features that ignore host kernel
support
KVM: x86: Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init()
KVM: x86: Add a macro to init CPUID features that are 64-bit only
KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID
features
KVM: x86: Handle kernel- and KVM-defined CPUID words in a single
helper
KVM: x86: #undef SPEC_CTRL_SSBD in cpuid.c to avoid macro collisions
KVM: x86: Harden CPU capabilities processing against out-of-scope
features
KVM: x86: Add a macro to init CPUID features that KVM emulates in
software
KVM: x86: Swap incoming guest CPUID into vCPU before massaging in
KVM_SET_CPUID2
KVM: x86: Clear PV_UNHALT for !HLT-exiting only when userspace sets
CPUID
KVM: x86: Remove unnecessary caching of KVM's PV CPUID base
KVM: x86: Always operate on kvm_vcpu data in cpuid_entry2_find()
KVM: x86: Move kvm_find_cpuid_entry{,_index}() up near
cpuid_entry2_find()
KVM: x86: Remove all direct usage of cpuid_entry2_find()
KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID
KVM: x86: Advertise HYPERVISOR in KVM_GET_SUPPORTED_CPUID
KVM: x86: Add a macro to handle features that are fully VMM controlled
KVM: x86: Rename "governed features" helpers to use "guest_cpu_cap"
KVM: x86: Replace guts of "governed" features with comprehensive
cpu_caps
KVM: x86: Initialize guest cpu_caps based on guest CPUID
KVM: x86: Extract code for generating per-entry emulated CPUID
information
KVM: x86: Initialize guest cpu_caps based on KVM support
KVM: x86: Avoid double CPUID lookup when updating MWAIT at runtime
KVM: x86: Drop unnecessary check that cpuid_entry2_find() returns
right leaf
KVM: x86: Update OS{XSAVE,PKE} bits in guest CPUID irrespective of
host support
KVM: x86: Update guest cpu_caps at runtime for dynamic CPUID-based
features
KVM: x86: Shuffle code to prepare for dropping guest_cpuid_has()
KVM: x86: Replace (almost) all guest CPUID feature queries with
cpu_caps
KVM: x86: Drop superfluous host XSAVE check when adjusting guest
XSAVES caps
KVM: x86: Add a macro for features that are synthesized into
boot_cpu_data
*** DO NOT APPLY *** KVM: x86: Verify KVM initializes all consumed
guest caps

Documentation/virt/kvm/api.rst | 10 +-
arch/x86/include/asm/kvm_host.h | 46 +-
arch/x86/kvm/cpuid.c | 660 +++++++++++-------
arch/x86/kvm/cpuid.h | 141 ++--
arch/x86/kvm/governed_features.h | 22 -
arch/x86/kvm/hyperv.c | 2 +-
arch/x86/kvm/lapic.c | 2 +-
arch/x86/kvm/mmu.h | 2 +-
arch/x86/kvm/mmu/mmu.c | 4 +-
arch/x86/kvm/mtrr.c | 2 +-
arch/x86/kvm/pmu.c | 1 -
arch/x86/kvm/reverse_cpuid.h | 22 +-
arch/x86/kvm/smm.c | 10 +-
arch/x86/kvm/svm/nested.c | 22 +-
arch/x86/kvm/svm/pmu.c | 8 +-
arch/x86/kvm/svm/sev.c | 21 +-
arch/x86/kvm/svm/svm.c | 46 +-
arch/x86/kvm/svm/svm.h | 4 +-
arch/x86/kvm/vmx/hyperv.h | 2 +-
arch/x86/kvm/vmx/nested.c | 18 +-
arch/x86/kvm/vmx/pmu_intel.c | 4 +-
arch/x86/kvm/vmx/sgx.c | 14 +-
arch/x86/kvm/vmx/vmx.c | 61 +-
arch/x86/kvm/x86.c | 153 ++--
arch/x86/kvm/x86.h | 6 +-
include/asm-generic/vmlinux.lds.h | 4 +
.../selftests/kvm/include/x86_64/processor.h | 11 +-
.../selftests/kvm/lib/x86_64/processor.c | 2 +
.../selftests/kvm/x86_64/kvm_pv_test.c | 38 +-
.../selftests/kvm/x86_64/set_sregs_test.c | 63 +-
30 files changed, 791 insertions(+), 610 deletions(-)
delete mode 100644 arch/x86/kvm/governed_features.h


base-commit: 4aad0b1893a141f114ba40ed509066f3c9bc24b0
--
2.45.0.215.g3402c0e53f-goog



2024-05-17 17:40:18

by Sean Christopherson

Subject: [PATCH v2 01/49] KVM: x86: Do all post-set CPUID processing during vCPU creation

During vCPU creation, process KVM's default, empty CPUID as if userspace
set an empty CPUID to ensure consistent and correct behavior with respect
to guest CPUID. E.g. if userspace never sets guest CPUID, KVM will never
configure cr4_guest_rsvd_bits, and thus create divergent, incorrect, guest-
visible behavior due to letting the guest set any KVM-supported CR4 bits
despite the features not being allowed per guest CPUID.

Note! This changes KVM's ABI, as lack of full CPUID processing allowed
userspace to stuff garbage vCPU state, e.g. userspace could set CR4 to a
guest-unsupported value via KVM_SET_SREGS. But it's extremely unlikely
that this is a breaking change, as KVM already has many flows that require
userspace to set guest CPUID before loading vCPU state. E.g. multiple MSR
flows consult guest CPUID on host writes, and KVM_SET_SREGS itself already
relies on guest CPUID being up-to-date, as KVM's validity check on CR3
consumes CPUID.0x7.1 (for LAM) and CPUID.0x80000008 (for MAXPHYADDR).

Furthermore, the plan is to commit to enforcing guest CPUID for userspace
writes to MSRs, at which point bypassing sregs CPUID checks is even more
nonsensical.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/cpuid.h | 1 +
arch/x86/kvm/x86.c | 1 +
3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f2f2be5d1141..2b19ff991ceb 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -335,7 +335,7 @@ static bool kvm_cpuid_has_hyperv(struct kvm_cpuid_entry2 *entries, int nent)
#endif
}

-static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
+void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *apic = vcpu->arch.apic;
struct kvm_cpuid_entry2 *best;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 23dbb9eb277c..0a8b561b5434 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -11,6 +11,7 @@
extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
void kvm_set_cpu_caps(void);

+void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
void kvm_update_pv_runtime(struct kvm_vcpu *vcpu);
struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d750546ec934..7adcf56bd45d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12234,6 +12234,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
kvm_xen_init_vcpu(vcpu);
kvm_vcpu_mtrr_init(vcpu);
vcpu_load(vcpu);
+ kvm_vcpu_after_set_cpuid(vcpu);
kvm_set_tsc_khz(vcpu, vcpu->kvm->arch.default_tsc_khz);
kvm_vcpu_reset(vcpu, false);
kvm_init_mmu(vcpu);
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:40:49

by Sean Christopherson

Subject: [PATCH v2 02/49] KVM: x86: Explicitly do runtime CPUID updates "after" initial setup

Explicitly perform runtime CPUID adjustments as part of the "after set
CPUID" flow to guard against bugs where KVM consumes stale vCPU/CPUID
state during kvm_update_cpuid_runtime(). E.g. see commit 4736d85f0d18
("KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT").

Whacking each mole individually is not sustainable or robust, e.g. while
the aforementioned commit fixed KVM's PV features, the same issue lurks for
Xen and Hyper-V features; Xen and Hyper-V simply don't have any runtime
features (though spoiler alert, neither should KVM).

Updating runtime features in the "full" path will also simplify adding a
snapshot of the guest's capabilities, i.e. of caching the intersection of
guest CPUID and kvm_cpu_caps (modulo a few edge cases).

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 2b19ff991ceb..e60ffb421e4b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -345,6 +345,8 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
bitmap_zero(vcpu->arch.governed_features.enabled,
KVM_MAX_NR_GOVERNED_FEATURES);

+ kvm_update_cpuid_runtime(vcpu);
+
/*
* If TDP is enabled, let the guest use GBPAGES if they're supported in
* hardware. The hardware page walker doesn't let KVM disable GBPAGES,
@@ -426,8 +428,6 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
{
int r;

- __kvm_update_cpuid_runtime(vcpu, e2, nent);
-
/*
* KVM does not correctly handle changing guest CPUID after KVM_RUN, as
* MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
@@ -440,6 +440,15 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
* whether the supplied CPUID data is equal to what's already set.
*/
if (kvm_vcpu_has_run(vcpu)) {
+ /*
+ * Note, runtime CPUID updates may consume other CPUID-driven
+ * vCPU state, e.g. KVM or Xen CPUID bases. Updating runtime
+ * state before full CPUID processing is functionally correct
+ * only because any change in CPUID is disallowed, i.e. using
+ * stale data is ok because KVM will reject the change.
+ */
+ __kvm_update_cpuid_runtime(vcpu, e2, nent);
+
r = kvm_cpuid_check_equal(vcpu, e2, nent);
if (r)
return r;
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:41:14

by Sean Christopherson

Subject: [PATCH v2 04/49] KVM: selftests: Update x86's set_sregs_test to match KVM's CPUID enforcement

Rework x86's set sregs test to verify that KVM enforces CPUID vs. CR4
features even if userspace hasn't explicitly set guest CPUID. KVM used to
allow userspace to set any KVM-supported CR4 value prior to KVM_SET_CPUID2,
and the test verified that behavior.

However, the testcase was written purely to verify KVM's existing behavior,
i.e. was NOT written to match the needs of real world VMMs.

Opportunistically verify that KVM continues to reject unsupported features
after KVM_SET_CPUID2 (using KVM_GET_SUPPORTED_CPUID).

Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/x86_64/set_sregs_test.c | 53 +++++++++++--------
1 file changed, 30 insertions(+), 23 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/set_sregs_test.c b/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
index c021c0795a96..96fd690d479a 100644
--- a/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
@@ -41,13 +41,15 @@ do { \
TEST_ASSERT(!memcmp(&new, &orig, sizeof(new)), "KVM modified sregs"); \
} while (0)

+#define KVM_ALWAYS_ALLOWED_CR4 (X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | \
+ X86_CR4_DE | X86_CR4_PSE | X86_CR4_PAE | \
+ X86_CR4_MCE | X86_CR4_PGE | X86_CR4_PCE | \
+ X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT)
+
static uint64_t calc_supported_cr4_feature_bits(void)
{
- uint64_t cr4;
+ uint64_t cr4 = KVM_ALWAYS_ALLOWED_CR4;

- cr4 = X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE |
- X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE | X86_CR4_PGE |
- X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT;
if (kvm_cpu_has(X86_FEATURE_UMIP))
cr4 |= X86_CR4_UMIP;
if (kvm_cpu_has(X86_FEATURE_LA57))
@@ -72,28 +74,14 @@ static uint64_t calc_supported_cr4_feature_bits(void)
return cr4;
}

-int main(int argc, char *argv[])
+static void test_cr_bits(struct kvm_vcpu *vcpu, uint64_t cr4)
{
struct kvm_sregs sregs;
- struct kvm_vcpu *vcpu;
- struct kvm_vm *vm;
- uint64_t cr4;
int rc, i;

- /*
- * Create a dummy VM, specifically to avoid doing KVM_SET_CPUID2, and
- * use it to verify all supported CR4 bits can be set prior to defining
- * the vCPU model, i.e. without doing KVM_SET_CPUID2.
- */
- vm = vm_create_barebones();
- vcpu = __vm_vcpu_add(vm, 0);
-
vcpu_sregs_get(vcpu, &sregs);
-
- sregs.cr0 = 0;
- sregs.cr4 |= calc_supported_cr4_feature_bits();
- cr4 = sregs.cr4;
-
+ sregs.cr0 &= ~(X86_CR0_CD | X86_CR0_NW);
+ sregs.cr4 |= cr4;
rc = _vcpu_sregs_set(vcpu, &sregs);
TEST_ASSERT(!rc, "Failed to set supported CR4 bits (0x%lx)", cr4);

@@ -101,7 +89,6 @@ int main(int argc, char *argv[])
TEST_ASSERT(sregs.cr4 == cr4, "sregs.CR4 (0x%llx) != CR4 (0x%lx)",
sregs.cr4, cr4);

- /* Verify all unsupported features are rejected by KVM. */
TEST_INVALID_CR_BIT(vcpu, cr4, sregs, X86_CR4_UMIP);
TEST_INVALID_CR_BIT(vcpu, cr4, sregs, X86_CR4_LA57);
TEST_INVALID_CR_BIT(vcpu, cr4, sregs, X86_CR4_VMXE);
@@ -119,10 +106,28 @@ int main(int argc, char *argv[])
/* NW without CD is illegal, as is PG without PE. */
TEST_INVALID_CR_BIT(vcpu, cr0, sregs, X86_CR0_NW);
TEST_INVALID_CR_BIT(vcpu, cr0, sregs, X86_CR0_PG);
+}

+int main(int argc, char *argv[])
+{
+ struct kvm_sregs sregs;
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ int rc;
+
+ /*
+ * Create a dummy VM, specifically to avoid doing KVM_SET_CPUID2, and
+ * use it to verify KVM enforces guest CPUID even if *userspace* never
+ * sets CPUID.
+ */
+ vm = vm_create_barebones();
+ vcpu = __vm_vcpu_add(vm, 0);
+ test_cr_bits(vcpu, KVM_ALWAYS_ALLOWED_CR4);
kvm_vm_free(vm);

- /* Create a "real" VM and verify APIC_BASE can be set. */
+ /* Create a "real" VM with a fully populated guest CPUID and verify
+ * APIC_BASE and all supported CR4 can be set.
+ */
vm = vm_create_with_one_vcpu(&vcpu, NULL);

vcpu_sregs_get(vcpu, &sregs);
@@ -135,6 +140,8 @@ int main(int argc, char *argv[])
TEST_ASSERT(!rc, "Couldn't set IA32_APIC_BASE to %llx (valid)",
sregs.apic_base);

+ test_cr_bits(vcpu, calc_supported_cr4_feature_bits());
+
kvm_vm_free(vm);

return 0;
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:41:34

by Sean Christopherson

Subject: [PATCH v2 03/49] KVM: x86: Account for KVM-reserved CR4 bits when passing through CR4 on VMX

Drop x86.c's local pre-computed cr4_reserved_bits and instead fold KVM's
reserved bits into the guest's reserved bits. This fixes a bug where VMX's
set_cr4_guest_host_mask() fails to account for KVM-reserved bits when
deciding which bits can be passed through to the guest. In most cases,
letting the guest directly write reserved CR4 bits is ok, i.e. attempting
to set the bit(s) will still #GP, but not if a feature is available in
hardware but explicitly disabled by the host, e.g. if FSGSBASE support is
disabled via "nofsgsbase".

Note, the extra overhead of computing host reserved bits every time
userspace sets guest CPUID is negligible. The feature bits that are
queried are packed nicely into a handful of words, and so checking and
setting each reserved bit costs in the neighborhood of ~5 cycles, i.e. the
total cost will be in the noise even if the number of checked CR4 bits
doubles over the next few years. In other words, x86 will run out of CR4
bits long before the overhead becomes problematic.

Note #2, __cr4_reserved_bits() starts from CR4_RESERVED_BITS, which is
why the existing __kvm_cpu_cap_has() processing doesn't explicitly OR in
CR4_RESERVED_BITS (and why the new code doesn't do so either).
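
For reference, __cr4_reserved_bits() takes the feature-query helper as a
macro parameter, which is what makes the kvm_cpu_cap_has() aliasing trick
work; abridged from x86.h (only the first two of the many per-feature
checks are shown):

  #define __cr4_reserved_bits(__cpu_has, __c)             \
  ({                                                      \
          u64 __reserved_bits = CR4_RESERVED_BITS;        \
                                                          \
          if (!__cpu_has(__c, X86_FEATURE_XSAVE))         \
                  __reserved_bits |= X86_CR4_OSXSAVE;     \
          if (!__cpu_has(__c, X86_FEATURE_SMEP))          \
                  __reserved_bits |= X86_CR4_SMEP;        \
          /* ... one check per CPUID-gated CR4 bit ... */ \
          __reserved_bits;                                \
  })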

Fixes: 2ed41aa631fc ("KVM: VMX: Intercept guest reserved CR4 bits to inject #GP fault")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 7 +++++--
arch/x86/kvm/x86.c | 9 ---------
2 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index e60ffb421e4b..f756a91a3f2f 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -383,8 +383,11 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);

kvm_pmu_refresh(vcpu);
- vcpu->arch.cr4_guest_rsvd_bits =
- __cr4_reserved_bits(guest_cpuid_has, vcpu);
+
+#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
+ vcpu->arch.cr4_guest_rsvd_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_) |
+ __cr4_reserved_bits(guest_cpuid_has, vcpu);
+#undef __kvm_cpu_cap_has

kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu->arch.cpuid_entries,
vcpu->arch.cpuid_nent));
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7adcf56bd45d..3f20de4368a6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -116,8 +116,6 @@ u64 __read_mostly efer_reserved_bits = ~((u64)(EFER_SCE | EFER_LME | EFER_LMA));
static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
#endif

-static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;
-
#define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)

#define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
@@ -1134,9 +1132,6 @@ EXPORT_SYMBOL_GPL(kvm_emulate_xsetbv);

bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
- if (cr4 & cr4_reserved_bits)
- return false;
-
if (cr4 & vcpu->arch.cr4_guest_rsvd_bits)
return false;

@@ -9831,10 +9826,6 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
kvm_caps.supported_xss = 0;

-#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
- cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
-#undef __kvm_cpu_cap_has
-
if (kvm_caps.has_tsc_control) {
/*
* Make sure the user can only configure tsc_khz values that
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:43:08

by Sean Christopherson

Subject: [PATCH v2 10/49] KVM: x86: Drop now-redundant MAXPHYADDR and GPA rsvd bits from vCPU creation

Drop the manual initialization of maxphyaddr and reserved_gpa_bits during
vCPU creation now that kvm_arch_vcpu_create() unconditionally invokes
kvm_vcpu_after_set_cpuid(), which handles all such CPUID caching.

None of the helpers between the existing code in kvm_arch_vcpu_create()
and the call to kvm_vcpu_after_set_cpuid() consume maxphyaddr or
reserved_gpa_bits (though auditing vmx_vcpu_create() and svm_vcpu_create()
isn't exactly easy). And even if that weren't the case, KVM _must_
refresh any affected state during kvm_vcpu_after_set_cpuid(), e.g. to
correctly handle KVM_SET_CPUID2. In other words, this can't introduce a
new bug, only expose an existing bug (of which there don't appear to be
any).

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2f6dda723005..bb34891d2f0a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12190,9 +12190,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
goto free_emulate_ctxt;
}

- vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
- vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
-
vcpu->arch.pat = MSR_IA32_CR_PAT_DEFAULT;

kvm_async_pf_hash_reset(vcpu);
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:45:00

by Sean Christopherson

Subject: [PATCH v2 16/49] KVM: x86: Don't update PV features caches when enabling enforcement capability

Revert the chunk of commit 01b4f510b9f4 ("kvm: x86: ensure pv_cpuid.features
is initialized when enabling cap") that forced a PV features cache refresh
during KVM_CAP_ENFORCE_PV_FEATURE_CPUID, as whatever ioctl() ordering
issue it allegedly fixed never existed upstream, and likely never
existed in any kernel.

At the time of the commit, there was a tangentially related ioctl()
ordering issue, as toggling KVM_X86_DISABLE_EXITS_HLT after KVM_SET_CPUID2
would have resulted in KVM potentially leaving KVM_FEATURE_PV_UNHALT set.
But (a) that bug affected the entire guest CPUID, not just the cache, (b)
commit 01b4f510b9f4 didn't address that bug, it only refreshed the cache
(with the bad CPUID), and (c) setting KVM_X86_DISABLE_EXITS_HLT after vCPU
creation is completely broken as KVM configures HLT-exiting only during
vCPU creation, which is why KVM_CAP_X86_DISABLE_EXITS is now disallowed if
vCPUs have been created.

Another tangentially related bug was KVM's failure to clear the cache when
handling KVM_SET_CPUID2, but again commit 01b4f510b9f4 did nothing to fix
that bug.

The most plausible explanation for what commit 01b4f510b9f4 was trying
to fix is a bug that existed in Google's internal kernel, which was the
source of commit 01b4f510b9f4. At the time, Google's internal kernel had
not yet picked up commit 0d3b2ba16ba68 ("KVM: X86: Go on updating other
CPUID leaves when leaf 1 is absent"), i.e. KVM would not initialize the
PV features cache if KVM_SET_CPUID2 was called without a CPUID.0x1 entry.

Of course, no sane real world VMM would omit CPUID.0x1, including the KVM
selftest added by commit ac4a4d6de22e ("selftests: kvm: test enforcement
of paravirtual cpuid features"). And the test didn't actually try to
verify multiple orderings, nor did the selftest enter the guest without
doing KVM_SET_CPUID2, so who knows what motivated the change.

Regardless of why commit 01b4f510b9f4 ("kvm: x86: ensure pv_cpuid.features
is initialized when enabling cap") was added, refreshing the cache during
KVM_CAP_ENFORCE_PV_FEATURE_CPUID isn't necessary.

Cc: Oliver Upton <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/cpuid.h | 1 -
arch/x86/kvm/x86.c | 3 ---
3 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index be1c8f43e090..a51e48663f53 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -242,7 +242,7 @@ static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcp
vcpu->arch.cpuid_nent, base);
}

-void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
+static void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best = kvm_find_kvm_cpuid_features(vcpu);

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 0a8b561b5434..7eb3d7318fc4 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -13,7 +13,6 @@ void kvm_set_cpu_caps(void);

void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
-void kvm_update_pv_runtime(struct kvm_vcpu *vcpu);
struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
u32 function, u32 index);
struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c729227c6501..7160c5ab8e3e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5849,9 +5849,6 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,

case KVM_CAP_ENFORCE_PV_FEATURE_CPUID:
vcpu->arch.pv_cpuid.enforce = cap->args[0];
- if (vcpu->arch.pv_cpuid.enforce)
- kvm_update_pv_runtime(vcpu);
-
return 0;
default:
return -EINVAL;
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:45:17

by Sean Christopherson

Subject: [PATCH v2 08/49] KVM: x86: Move __kvm_is_valid_cr4() definition to x86.h

Let vendor code inline __kvm_is_valid_cr4() now that x86.c's cr4_reserved_bits
no longer exists, as keeping cr4_reserved_bits local to x86.c was the only
reason for "hiding" the definition of __kvm_is_valid_cr4().

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 9 ---------
arch/x86/kvm/x86.h | 6 +++++-
2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3f20de4368a6..2f6dda723005 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1130,15 +1130,6 @@ int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_emulate_xsetbv);

-bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
-{
- if (cr4 & vcpu->arch.cr4_guest_rsvd_bits)
- return false;
-
- return true;
-}
-EXPORT_SYMBOL_GPL(__kvm_is_valid_cr4);
-
static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
return __kvm_is_valid_cr4(vcpu, cr4) &&
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index d80a4c6b5a38..4a723705a139 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -491,7 +491,6 @@ static inline void kvm_machine_check(void)
void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
int kvm_spec_ctrl_test_value(u64 value);
-bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
struct x86_exception *e);
int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva);
@@ -505,6 +504,11 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
#define KVM_MSR_RET_INVALID 2 /* in-kernel MSR emulation #GP condition */
#define KVM_MSR_RET_FILTERED 3 /* #GP due to userspace MSR filter */

+static inline bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+{
+ return !(cr4 & vcpu->arch.cr4_guest_rsvd_bits);
+}
+
#define __cr4_reserved_bits(__cpu_has, __c) \
({ \
u64 __reserved_bits = CR4_RESERVED_BITS; \
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:46:31

by Sean Christopherson

Subject: [PATCH v2 21/49] KVM: x86: Add a macro to init CPUID features that are 64-bit only

Add a macro to mask-in feature flags that are supported only on 64-bit
kernels/KVM. In addition to reducing overall #ifdeffery, using a macro
will allow hardening the kvm_cpu_cap initialization sequences to assert
that the features being advertised are indeed included in the word being
initialized. And arguably, using *F() macros throughout is more readable.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 5a4d6138c4f1..5e3b97d06374 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -70,6 +70,12 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
(boot_cpu_has(X86_FEATURE_##name) ? F(name) : 0); \
})

+/* Features that KVM supports only on 64-bit kernels. */
+#define X86_64_F(name) \
+({ \
+ (IS_ENABLED(CONFIG_X86_64) ? F(name) : 0); \
+})
+
/*
* Raw Feature - For features that KVM supports based purely on raw host CPUID,
* i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
@@ -639,15 +645,6 @@ static __always_inline void kvm_cpu_cap_init(enum cpuid_leafs leaf, u32 mask)

void kvm_set_cpu_caps(void)
{
-#ifdef CONFIG_X86_64
- unsigned int f_gbpages = F(GBPAGES);
- unsigned int f_lm = F(LM);
- unsigned int f_xfd = F(XFD);
-#else
- unsigned int f_gbpages = 0;
- unsigned int f_lm = 0;
- unsigned int f_xfd = 0;
-#endif
memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));

BUILD_BUG_ON(sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)) >
@@ -744,7 +741,8 @@ void kvm_set_cpu_caps(void)
);

kvm_cpu_cap_init(CPUID_D_1_EAX,
- F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES) | f_xfd
+ F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES) |
+ X86_64_F(XFD)
);

kvm_cpu_cap_init_kvm_defined(CPUID_12_EAX,
@@ -766,8 +764,8 @@ void kvm_set_cpu_caps(void)
F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
F(PAT) | F(PSE36) | 0 /* Reserved */ |
F(NX) | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
- F(FXSR) | F(FXSR_OPT) | f_gbpages | F(RDTSCP) |
- 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW)
+ F(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
+ 0 /* Reserved */ | X86_64_F(LM) | F(3DNOWEXT) | F(3DNOW)
);

if (!tdp_enabled && IS_ENABLED(CONFIG_X86_64))
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:47:16

by Sean Christopherson

Subject: [PATCH v2 06/49] KVM: selftests: Refresh vCPU CPUID cache in __vcpu_get_cpuid_entry()

Refresh selftests' CPUID cache in the vCPU structure when querying a CPUID
entry so that tests don't consume stale data when KVM modifies CPUID as a
side effect of a completely unrelated change. E.g. KVM adjusts OSXSAVE in
response to CR4.OSXSAVE changes.

Unnecessarily invoking KVM_GET_CPUID2 is suboptimal, but vcpu->cpuid exists
to simplify selftests development, not for performance reasons. And,
unfortunately, trying to handle the side effects in tests or other flows
is unpleasant, e.g. selftests could manually refresh if KVM_SET_SREGS is
successful, but that would still leave a gap with respect to guest CR4
changes.
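
E.g. a sketch of the interaction this guards against (hypothetical test
snippet):

  vcpu_sregs_get(vcpu, &sregs);
  sregs.cr4 |= X86_CR4_OSXSAVE;
  vcpu_sregs_set(vcpu, &sregs);

  /* KVM flips CPUID.0x1.ECX.OSXSAVE as a side effect of the CR4 write;
   * without the refresh, this lookup would return the stale, cached bit. */
  entry = vcpu_get_cpuid_entry(vcpu, 0x1);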

Signed-off-by: Sean Christopherson <[email protected]>
---
.../testing/selftests/kvm/include/x86_64/processor.h | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 8eb57de0b587..99aa3dfca16c 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -992,10 +992,17 @@ static inline struct kvm_cpuid2 *allocate_kvm_cpuid2(int nr_entries)
void vcpu_init_cpuid(struct kvm_vcpu *vcpu, const struct kvm_cpuid2 *cpuid);
void vcpu_set_hv_cpuid(struct kvm_vcpu *vcpu);

+static inline void vcpu_get_cpuid(struct kvm_vcpu *vcpu)
+{
+ vcpu_ioctl(vcpu, KVM_GET_CPUID2, vcpu->cpuid);
+}
+
static inline struct kvm_cpuid_entry2 *__vcpu_get_cpuid_entry(struct kvm_vcpu *vcpu,
uint32_t function,
uint32_t index)
{
+ vcpu_get_cpuid(vcpu);
+
return (struct kvm_cpuid_entry2 *)get_cpuid_entry(vcpu->cpuid,
function, index);
}
@@ -1016,7 +1023,7 @@ static inline int __vcpu_set_cpuid(struct kvm_vcpu *vcpu)
return r;

/* On success, refresh the cache to pick up adjustments made by KVM. */
- vcpu_ioctl(vcpu, KVM_GET_CPUID2, vcpu->cpuid);
+ vcpu_get_cpuid(vcpu);
return 0;
}

@@ -1026,7 +1033,7 @@ static inline void vcpu_set_cpuid(struct kvm_vcpu *vcpu)
vcpu_ioctl(vcpu, KVM_SET_CPUID2, vcpu->cpuid);

/* Refresh the cache to pick up adjustments made by KVM. */
- vcpu_ioctl(vcpu, KVM_GET_CPUID2, vcpu->cpuid);
+ vcpu_get_cpuid(vcpu);
}

void vcpu_set_cpuid_property(struct kvm_vcpu *vcpu,
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:47:29

by Sean Christopherson

Subject: [PATCH v2 24/49] KVM: x86: #undef SPEC_CTRL_SSBD in cpuid.c to avoid macro collisions

Undefine SPEC_CTRL_SSBD, which is #defined by msr-index.h to represent the
enable flag in MSR_IA32_SPEC_CTRL, to avoid issues with the macro being
unpacked into its raw value when passed to KVM's F() macro. This will
allow using multiple layers of macros in F() and friends, e.g. to harden
against incorrect usage of F().

No functional change intended (cpuid.c doesn't consume SPEC_CTRL_SSBD).
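
Concretely, a sketch of the failure mode (SPEC_CTRL_SSBD_SHIFT is 2 in
msr-index.h):

  /* msr-index.h */
  #define SPEC_CTRL_SSBD          BIT(SPEC_CTRL_SSBD_SHIFT)

  /* With a multi-layer F(), the argument is macro-expanded *before* it
   * reaches the '##' paste in feature_bit(), and so: */
  F(SPEC_CTRL_SSBD)
        => feature_bit(BIT(2))                  /* arg pre-expanded */
        => __feature_bit(X86_FEATURE_BIT(2))    /* bogus token paste */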

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 8efffd48cdf1..a16d6e070c11 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -639,6 +639,12 @@ static __always_inline void kvm_cpu_cap_init(u32 leaf, u32 mask)
kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);
}

+/*
+ * Undefine the MSR bit macro to avoid token concatenation issues when
+ * processing X86_FEATURE_SPEC_CTRL_SSBD.
+ */
+#undef SPEC_CTRL_SSBD
+
void kvm_set_cpu_caps(void)
{
memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:47:38

by Sean Christopherson

Subject: [PATCH v2 05/49] KVM: selftests: Assert that the @cpuid passed to get_cpuid_entry() is non-NULL

Add a sanity check in get_cpuid_entry() to provide a friendlier error than
a segfault when a test developer tries to use a vCPU CPUID helper on a
barebones vCPU.

Signed-off-by: Sean Christopherson <[email protected]>
---
tools/testing/selftests/kvm/lib/x86_64/processor.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index c664e446136b..f0f3434d767e 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -1141,6 +1141,8 @@ const struct kvm_cpuid_entry2 *get_cpuid_entry(const struct kvm_cpuid2 *cpuid,
{
int i;

+ TEST_ASSERT(cpuid, "Must do vcpu_init_cpuid() first (or equivalent)");
+
for (i = 0; i < cpuid->nent; i++) {
if (cpuid->entries[i].function == function &&
cpuid->entries[i].index == index)
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:47:50

by Sean Christopherson

Subject: [PATCH v2 25/49] KVM: x86: Harden CPU capabilities processing against out-of-scope features

Add compile-time assertions to verify that usage of F() and friends in
kvm_set_cpu_caps() is scoped to the correct CPUID word, e.g. to detect
bugs where KVM passes a feature bit from word X into word Y.

Add a one-off assertion in the aliased feature macro to ensure that only
word 0x8000_0001.EDX aliases the features defined for 0x1.EDX.

To do so, convert kvm_cpu_cap_init() to a macro and have it define a
local variable to track which CPUID word is being initialized, which is
then used to validate usage of F() (all of the inputs are compile-time
constants and thus can be fed into BUILD_BUG_ON()).

Redefine KVM_VALIDATE_CPU_CAP_USAGE after kvm_set_cpu_caps() to be a nop
so that F() can be used in other flows that aren't as easily hardened,
e.g. __do_cpuid_func_emulated() and __do_cpuid_func().

Invoke KVM_VALIDATE_CPU_CAP_USAGE() in SF() and X86_64_F() to ensure the
validation occurs even if the usage of F() is completely compiled out
(which shouldn't happen for boot_cpu_has(), but could happen in the future,
e.g. if KVM were to use cpu_feature_enabled()).
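
E.g. a sketch of the class of bug that now yields a build error instead of
a silently bogus mask (hypothetical misuse; SPEC_CTRL belongs to word
CPUID_7_EDX, not CPUID_7_0_EBX):

  kvm_cpu_cap_init(CPUID_7_0_EBX,
        F(FSGSBASE) | F(SPEC_CTRL)      /* BUILD_BUG_ON() fires */
  );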

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 55 +++++++++++++++++++++++++++++++-------------
1 file changed, 39 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a16d6e070c11..1064e4d68718 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -61,18 +61,24 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
return ret;
}

-#define F feature_bit
+#define F(name) \
+({ \
+ KVM_VALIDATE_CPU_CAP_USAGE(name); \
+ feature_bit(name); \
+})

/* Scattered Flag - For features that are scattered by cpufeatures.h. */
#define SF(name) \
({ \
BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES); \
+ KVM_VALIDATE_CPU_CAP_USAGE(name); \
(boot_cpu_has(X86_FEATURE_##name) ? F(name) : 0); \
})

/* Features that KVM supports only on 64-bit kernels. */
#define X86_64_F(name) \
({ \
+ KVM_VALIDATE_CPU_CAP_USAGE(name); \
(IS_ENABLED(CONFIG_X86_64) ? F(name) : 0); \
})

@@ -95,6 +101,7 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
#define AF(name) \
({ \
BUILD_BUG_ON(__feature_leaf(X86_FEATURE_##name) != CPUID_1_EDX); \
+ BUILD_BUG_ON(kvm_cpu_cap_init_in_progress != CPUID_8000_0001_EDX); \
feature_bit(name); \
})

@@ -622,22 +629,34 @@ static __always_inline u32 raw_cpuid_get(struct cpuid_reg cpuid)
return *__cpuid_entry_get_reg(&entry, cpuid.reg);
}

-static __always_inline void kvm_cpu_cap_init(u32 leaf, u32 mask)
-{
- const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
+/*
+ * Assert that the feature bit being declared, e.g. via F(), is in the CPUID
+ * word that's being initialized. Exempt 0x8000_0001.EDX usage of 0x1.EDX
+ * features, as AMD duplicated many 0x1.EDX features into 0x8000_0001.EDX.
+ */
+#define KVM_VALIDATE_CPU_CAP_USAGE(name) \
+do { \
+ u32 __leaf = __feature_leaf(X86_FEATURE_##name); \
+ \
+ BUILD_BUG_ON(__leaf != kvm_cpu_cap_init_in_progress); \
+} while (0)

- /*
- * For kernel-defined leafs, mask the boot CPU's pre-populated value.
- * For KVM-defined leafs, explicitly set the leaf, as KVM is the one
- * and only authority.
- */
- if (leaf < NCAPINTS)
- kvm_cpu_caps[leaf] &= mask;
- else
- kvm_cpu_caps[leaf] = mask;
-
- kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);
-}
+/*
+ * For kernel-defined leafs, mask the boot CPU's pre-populated value. For KVM-
+ * defined leafs, explicitly set the leaf, as KVM is the one and only authority.
+ */
+#define kvm_cpu_cap_init(leaf, mask) \
+do { \
+ const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32); \
+ const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf; \
+ \
+ if (leaf < NCAPINTS) \
+ kvm_cpu_caps[leaf] &= (mask); \
+ else \
+ kvm_cpu_caps[leaf] = (mask); \
+ \
+ kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid); \
+} while (0)

/*
* Undefine the MSR bit macro to avoid token concatenation issues when
@@ -870,6 +889,10 @@ void kvm_set_cpu_caps(void)
}
EXPORT_SYMBOL_GPL(kvm_set_cpu_caps);

+#undef kvm_cpu_cap_init
+#undef KVM_VALIDATE_CPU_CAP_USAGE
+#define KVM_VALIDATE_CPU_CAP_USAGE(name)
+
struct kvm_cpuid_array {
struct kvm_cpuid_entry2 *entries;
int maxnent;
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:48:23

by Sean Christopherson

Subject: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software

Now that kvm_cpu_cap_init() is a macro with its own scope, add EMUL_F() to
OR-in features that KVM emulates in software, i.e. that don't depend on
the feature being available in hardware. The contained scope of
kvm_cpu_cap_init() allows using a local variable to track the set of
emulated features, which, in addition to avoiding confusing and/or
unnecessary variables, helps prevent misuse of EMUL_F().

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 36 +++++++++++++++++++++---------------
1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1064e4d68718..33e3e77de1b7 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -94,6 +94,16 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
F(name); \
})

+/*
+ * Emulated Feature - For features that KVM emulates in software irrespective
+ * of host CPU/kernel support.
+ */
+#define EMUL_F(name) \
+({ \
+ kvm_cpu_cap_emulated |= F(name); \
+ F(name); \
+})
+
/*
* Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
* identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
@@ -649,6 +659,7 @@ do { \
do { \
const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32); \
const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf; \
+ u32 kvm_cpu_cap_emulated = 0; \
\
if (leaf < NCAPINTS) \
kvm_cpu_caps[leaf] &= (mask); \
@@ -656,6 +667,7 @@ do { \
kvm_cpu_caps[leaf] = (mask); \
\
kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid); \
+ kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated; \
} while (0)

/*
@@ -684,12 +696,10 @@ void kvm_set_cpu_caps(void)
0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |
F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
- F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
+ F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
F(F16C) | F(RDRAND)
);
- /* KVM emulates x2apic in software irrespective of host support. */
- kvm_cpu_cap_set(X86_FEATURE_X2APIC);

kvm_cpu_cap_init(CPUID_1_EDX,
F(FPU) | F(VME) | F(DE) | F(PSE) |
@@ -703,13 +713,13 @@ void kvm_set_cpu_caps(void)
);

kvm_cpu_cap_init(CPUID_7_0_EBX,
- F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) |
- F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) | F(INVPCID) |
- F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ | F(AVX512F) |
- F(AVX512DQ) | F(RDSEED) | F(ADX) | F(SMAP) | F(AVX512IFMA) |
- F(CLFLUSHOPT) | F(CLWB) | 0 /*INTEL_PT*/ | F(AVX512PF) |
- F(AVX512ER) | F(AVX512CD) | F(SHA_NI) | F(AVX512BW) |
- F(AVX512VL));
+ F(FSGSBASE) | EMUL_F(TSC_ADJUST) | F(SGX) | F(BMI1) | F(HLE) |
+ F(AVX2) | F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) |
+ F(INVPCID) | F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ |
+ F(AVX512F) | F(AVX512DQ) | F(RDSEED) | F(ADX) | F(SMAP) |
+ F(AVX512IFMA) | F(CLFLUSHOPT) | F(CLWB) | 0 /*INTEL_PT*/ |
+ F(AVX512PF) | F(AVX512ER) | F(AVX512CD) | F(SHA_NI) |
+ F(AVX512BW) | F(AVX512VL));

kvm_cpu_cap_init(CPUID_7_ECX,
F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
@@ -728,16 +738,12 @@ void kvm_set_cpu_caps(void)

kvm_cpu_cap_init(CPUID_7_EDX,
F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
- F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
+ F(SPEC_CTRL_SSBD) | EMUL_F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) |
F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D)
);

- /* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
- kvm_cpu_cap_set(X86_FEATURE_TSC_ADJUST);
- kvm_cpu_cap_set(X86_FEATURE_ARCH_CAPABILITIES);
-
if (boot_cpu_has(X86_FEATURE_IBPB) && boot_cpu_has(X86_FEATURE_IBRS))
kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL);
if (boot_cpu_has(X86_FEATURE_STIBP))
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:48:41

by Sean Christopherson

Subject: [PATCH v2 15/49] KVM: x86: Zero out PV features cache when the CPUID leaf is not present

Clear KVM's PV feature cache when processing a new guest CPUID so
that KVM doesn't keep a stale cache entry if userspace does KVM_SET_CPUID2
multiple times, once with a PV features entry, and a second time without.
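
E.g. a sketch of the problematic sequence (hypothetical userspace
pseudocode, where cpuid_a contains a KVM_CPUID_FEATURES entry and cpuid_b
does not):

  ioctl(vcpu_fd, KVM_SET_CPUID2, &cpuid_a);  /* cache = entry->eax */
  ioctl(vcpu_fd, KVM_SET_CPUID2, &cpuid_b);  /* cache left stale */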

Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Cc: Oliver Upton <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f756a91a3f2f..be1c8f43e090 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -246,6 +246,8 @@ void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best = kvm_find_kvm_cpuid_features(vcpu);

+ vcpu->arch.pv_cpuid.features = 0;
+
/*
* save the feature bitmap to avoid cpuid lookup for every PV
* operation
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:48:45

by Sean Christopherson

Subject: [PATCH v2 28/49] KVM: x86: Clear PV_UNHALT for !HLT-exiting only when userspace sets CPUID

Now that KVM disallows disabling HLT-exiting after vCPUs have been created,
i.e. now that it's impossible for kvm_hlt_in_guest() to change while vCPUs
are running, apply KVM's PV_UNHALT quirk only when userspace is setting
guest CPUID.

Opportunistically rename the helper to make it clear that KVM's behavior
is a quirk that should never have been added. KVM's documentation
explicitly states that userspace should not advertise PV_UNHALT if
HLT-exiting is disabled, but for unknown reasons, commit caa057a2cad6
("KVM: X86: Provide a capability to disable HLT intercepts") didn't stop
at documenting the requirement and also massaged the incoming guest CPUID.

Unfortunately, it's quite likely that userspace has come to rely on KVM's
behavior, i.e. the code can't simply be deleted. The only reason KVM
doesn't have an "official" quirk is that there is no known use case where
disabling the quirk would make sense, i.e. letting userspace disable the
quirk would further increase KVM's burden without any benefit.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 26 +++++++++-----------------
1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 4ad01867cb8d..93a7399dc0db 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -287,18 +287,17 @@ static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcp
vcpu->arch.cpuid_nent, base);
}

-static void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
+static u32 kvm_apply_cpuid_pv_features_quirk(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best = kvm_find_kvm_cpuid_features(vcpu);

- vcpu->arch.pv_cpuid.features = 0;
+ if (!best)
+ return 0;

- /*
- * save the feature bitmap to avoid cpuid lookup for every PV
- * operation
- */
- if (best)
- vcpu->arch.pv_cpuid.features = best->eax;
+ if (kvm_hlt_in_guest(vcpu->kvm))
+ best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
+
+ return best->eax;
}

/*
@@ -320,7 +319,6 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
int nent)
{
struct kvm_cpuid_entry2 *best;
- struct kvm_hypervisor_cpuid kvm_cpuid;

best = cpuid_entry2_find(entries, nent, 1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
if (best) {
@@ -347,13 +345,6 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
best->ebx = xstate_required_size(vcpu->arch.xcr0, true);

- kvm_cpuid = __kvm_get_hypervisor_cpuid(entries, nent, KVM_SIGNATURE);
- if (kvm_cpuid.base) {
- best = __kvm_find_kvm_cpuid_features(entries, nent, kvm_cpuid.base);
- if (kvm_hlt_in_guest(vcpu->kvm) && best)
- best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
- }
-
if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
best = cpuid_entry2_find(entries, nent, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
if (best)
@@ -425,7 +416,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
vcpu->arch.guest_supported_xcr0 =
cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);

- kvm_update_pv_runtime(vcpu);
+ vcpu->arch.pv_cpuid.features = kvm_apply_cpuid_pv_features_quirk(vcpu);

vcpu->arch.is_amd_compatible = guest_cpuid_is_amd_or_hygon(vcpu);
vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
@@ -508,6 +499,7 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
* stale data is ok because KVM will reject the change.
*/
kvm_update_cpuid_runtime(vcpu);
+ kvm_apply_cpuid_pv_features_quirk(vcpu);

r = kvm_cpuid_check_equal(vcpu, e2, nent);
if (r)
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:49:05

by Sean Christopherson

Subject: [PATCH v2 29/49] KVM: x86: Remove unnecessary caching of KVM's PV CPUID base

Now that KVM only searches for KVM's PV CPUID base when userspace sets
guest CPUID, drop the cache and simply do the search every time.

Practically speaking, this is a nop except for situations where userspace
sets CPUID _after_ running the vCPU, which is anything but a hot path,
e.g. QEMU does so only when hotplugging a vCPU. And on the flip side,
caching guest CPUID information, especially information that is used to
query/modify _other_ CPUID state, is inherently dangerous as it's all too
easy to use stale information, i.e. KVM should only cache CPUID state when
the performance and/or programming benefits justify it.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/cpuid.c | 34 +++++++--------------------------
2 files changed, 7 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aabf1648a56a..3003e99155e7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -858,7 +858,6 @@ struct kvm_vcpu_arch {

int cpuid_nent;
struct kvm_cpuid_entry2 *cpuid_entries;
- struct kvm_hypervisor_cpuid kvm_cpuid;
bool is_amd_compatible;

/*
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 93a7399dc0db..7290f91c422c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -269,28 +269,16 @@ static struct kvm_hypervisor_cpuid kvm_get_hypervisor_cpuid(struct kvm_vcpu *vcp
vcpu->arch.cpuid_nent, sig);
}

-static struct kvm_cpuid_entry2 *__kvm_find_kvm_cpuid_features(struct kvm_cpuid_entry2 *entries,
- int nent, u32 kvm_cpuid_base)
-{
- return cpuid_entry2_find(entries, nent, kvm_cpuid_base | KVM_CPUID_FEATURES,
- KVM_CPUID_INDEX_NOT_SIGNIFICANT);
-}
-
-static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcpu)
-{
- u32 base = vcpu->arch.kvm_cpuid.base;
-
- if (!base)
- return NULL;
-
- return __kvm_find_kvm_cpuid_features(vcpu->arch.cpuid_entries,
- vcpu->arch.cpuid_nent, base);
-}
-
static u32 kvm_apply_cpuid_pv_features_quirk(struct kvm_vcpu *vcpu)
{
- struct kvm_cpuid_entry2 *best = kvm_find_kvm_cpuid_features(vcpu);
+ struct kvm_hypervisor_cpuid kvm_cpuid;
+ struct kvm_cpuid_entry2 *best;

+ kvm_cpuid = kvm_get_hypervisor_cpuid(vcpu, KVM_SIGNATURE);
+ if (!kvm_cpuid.base)
+ return 0;
+
+ best = kvm_find_cpuid_entry(vcpu, kvm_cpuid.base | KVM_CPUID_FEATURES);
if (!best)
return 0;

@@ -491,13 +479,6 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
* whether the supplied CPUID data is equal to what's already set.
*/
if (kvm_vcpu_has_run(vcpu)) {
- /*
- * Note, runtime CPUID updates may consume other CPUID-driven
- * vCPU state, e.g. KVM or Xen CPUID bases. Updating runtime
- * state before full CPUID processing is functionally correct
- * only because any change in CPUID is disallowed, i.e. using
- * stale data is ok because KVM will reject the change.
- */
kvm_update_cpuid_runtime(vcpu);
kvm_apply_cpuid_pv_features_quirk(vcpu);

@@ -519,7 +500,6 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
if (r)
goto err;

- vcpu->arch.kvm_cpuid = kvm_get_hypervisor_cpuid(vcpu, KVM_SIGNATURE);
#ifdef CONFIG_KVM_XEN
vcpu->arch.xen.cpuid = kvm_get_hypervisor_cpuid(vcpu, XEN_SIGNATURE);
#endif
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:49:35

by Sean Christopherson

Subject: [PATCH v2 30/49] KVM: x86: Always operate on kvm_vcpu data in cpuid_entry2_find()

Now that KVM sets vcpu->arch.cpuid_{entries,nent} before processing the
incoming CPUID entries during KVM_SET_CPUID{,2}, drop the @entries and
@nent params from cpuid_entry2_find() and unconditionally operate on the
vCPU state.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 62 +++++++++++++++-----------------------------
1 file changed, 21 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7290f91c422c..0526f25a7c80 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -124,8 +124,8 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
*/
#define KVM_CPUID_INDEX_NOT_SIGNIFICANT -1ull

-static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
- struct kvm_cpuid_entry2 *entries, int nent, u32 function, u64 index)
+static struct kvm_cpuid_entry2 *cpuid_entry2_find(struct kvm_vcpu *vcpu,
+ u32 function, u64 index)
{
struct kvm_cpuid_entry2 *e;
int i;
@@ -142,8 +142,8 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
*/
lockdep_assert_irqs_enabled();

- for (i = 0; i < nent; i++) {
- e = &entries[i];
+ for (i = 0; i < vcpu->arch.cpuid_nent; i++) {
+ e = &vcpu->arch.cpuid_entries[i];

if (e->function != function)
continue;
@@ -177,8 +177,6 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(

static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
{
- struct kvm_cpuid_entry2 *entries = vcpu->arch.cpuid_entries;
- int nent = vcpu->arch.cpuid_nent;
struct kvm_cpuid_entry2 *best;
u64 xfeatures;

@@ -186,7 +184,7 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
* The existing code assumes virtual address is 48-bit or 57-bit in the
* canonical address checks; exit if it is ever changed.
*/
- best = cpuid_entry2_find(entries, nent, 0x80000008,
+ best = cpuid_entry2_find(vcpu, 0x80000008,
KVM_CPUID_INDEX_NOT_SIGNIFICANT);
if (best) {
int vaddr_bits = (best->eax & 0xff00) >> 8;
@@ -199,7 +197,7 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
* Exposing dynamic xfeatures to the guest requires additional
* enabling in the FPU, e.g. to expand the guest XSAVE state size.
*/
- best = cpuid_entry2_find(entries, nent, 0xd, 0);
+ best = cpuid_entry2_find(vcpu, 0xd, 0);
if (!best)
return 0;

@@ -234,15 +232,15 @@ static int kvm_cpuid_check_equal(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2
return 0;
}

-static struct kvm_hypervisor_cpuid __kvm_get_hypervisor_cpuid(struct kvm_cpuid_entry2 *entries,
- int nent, const char *sig)
+static struct kvm_hypervisor_cpuid kvm_get_hypervisor_cpuid(struct kvm_vcpu *vcpu,
+ const char *sig)
{
struct kvm_hypervisor_cpuid cpuid = {};
struct kvm_cpuid_entry2 *entry;
u32 base;

for_each_possible_hypervisor_cpuid_base(base) {
- entry = cpuid_entry2_find(entries, nent, base, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+ entry = cpuid_entry2_find(vcpu, base, KVM_CPUID_INDEX_NOT_SIGNIFICANT);

if (entry) {
u32 signature[3];
@@ -262,13 +260,6 @@ static struct kvm_hypervisor_cpuid __kvm_get_hypervisor_cpuid(struct kvm_cpuid_e
return cpuid;
}

-static struct kvm_hypervisor_cpuid kvm_get_hypervisor_cpuid(struct kvm_vcpu *vcpu,
- const char *sig)
-{
- return __kvm_get_hypervisor_cpuid(vcpu->arch.cpuid_entries,
- vcpu->arch.cpuid_nent, sig);
-}
-
static u32 kvm_apply_cpuid_pv_features_quirk(struct kvm_vcpu *vcpu)
{
struct kvm_hypervisor_cpuid kvm_cpuid;
@@ -292,23 +283,22 @@ static u32 kvm_apply_cpuid_pv_features_quirk(struct kvm_vcpu *vcpu)
* Calculate guest's supported XCR0 taking into account guest CPUID data and
* KVM's supported XCR0 (comprised of host's XCR0 and KVM_SUPPORTED_XCR0).
*/
-static u64 cpuid_get_supported_xcr0(struct kvm_cpuid_entry2 *entries, int nent)
+static u64 cpuid_get_supported_xcr0(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;

- best = cpuid_entry2_find(entries, nent, 0xd, 0);
+ best = cpuid_entry2_find(vcpu, 0xd, 0);
if (!best)
return 0;

return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
}

-static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
- int nent)
+void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;

- best = cpuid_entry2_find(entries, nent, 1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+ best = cpuid_entry2_find(vcpu, 1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
if (best) {
/* Update OSXSAVE bit */
if (boot_cpu_has(X86_FEATURE_XSAVE))
@@ -319,43 +309,36 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
}

- best = cpuid_entry2_find(entries, nent, 7, 0);
+ best = cpuid_entry2_find(vcpu, 7, 0);
if (best && boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7)
cpuid_entry_change(best, X86_FEATURE_OSPKE,
kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));

- best = cpuid_entry2_find(entries, nent, 0xD, 0);
+ best = cpuid_entry2_find(vcpu, 0xD, 0);
if (best)
best->ebx = xstate_required_size(vcpu->arch.xcr0, false);

- best = cpuid_entry2_find(entries, nent, 0xD, 1);
+ best = cpuid_entry2_find(vcpu, 0xD, 1);
if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
best->ebx = xstate_required_size(vcpu->arch.xcr0, true);

if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
- best = cpuid_entry2_find(entries, nent, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+ best = cpuid_entry2_find(vcpu, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
if (best)
cpuid_entry_change(best, X86_FEATURE_MWAIT,
vcpu->arch.ia32_misc_enable_msr &
MSR_IA32_MISC_ENABLE_MWAIT);
}
}
-
-void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
-{
- __kvm_update_cpuid_runtime(vcpu, vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
-}
EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);

static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
{
#ifdef CONFIG_KVM_HYPERV
- struct kvm_cpuid_entry2 *entries = vcpu->arch.cpuid_entries;
- int nent = vcpu->arch.cpuid_nent;
struct kvm_cpuid_entry2 *entry;

- entry = cpuid_entry2_find(entries, nent, HYPERV_CPUID_INTERFACE,
+ entry = cpuid_entry2_find(vcpu, HYPERV_CPUID_INTERFACE,
KVM_CPUID_INDEX_NOT_SIGNIFICANT);
return entry && entry->eax == HYPERV_CPUID_SIGNATURE_EAX;
#else
@@ -401,8 +384,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
kvm_apic_set_version(vcpu);
}

- vcpu->arch.guest_supported_xcr0 =
- cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
+ vcpu->arch.guest_supported_xcr0 = cpuid_get_supported_xcr0(vcpu);

vcpu->arch.pv_cpuid.features = kvm_apply_cpuid_pv_features_quirk(vcpu);

@@ -1532,16 +1514,14 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
u32 function, u32 index)
{
- return cpuid_entry2_find(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent,
- function, index);
+ return cpuid_entry2_find(vcpu, function, index);
}
EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry_index);

struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
u32 function)
{
- return cpuid_entry2_find(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent,
- function, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+ return cpuid_entry2_find(vcpu, function, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
}
EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);

--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:49:37

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 31/49] KVM: x86: Move kvm_find_cpuid_entry{,_index}() up near cpuid_entry2_find()

Move kvm_find_cpuid_entry{,_index}() "up" in cpuid.c so that they are
colocated with cpuid_entry2_find(), e.g. to make it easier to see the
effective guts of the helpers without having to bounce around cpuid.c.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0526f25a7c80..d7390ade1c29 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -175,6 +175,20 @@ static struct kvm_cpuid_entry2 *cpuid_entry2_find(struct kvm_vcpu *vcpu,
return NULL;
}

+struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
+ u32 function, u32 index)
+{
+ return cpuid_entry2_find(vcpu, function, index);
+}
+EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry_index);
+
+struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
+ u32 function)
+{
+ return cpuid_entry2_find(vcpu, function, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+}
+EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);
+
static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
@@ -1511,20 +1525,6 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
return r;
}

-struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
- u32 function, u32 index)
-{
- return cpuid_entry2_find(vcpu, function, index);
-}
-EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry_index);
-
-struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
- u32 function)
-{
- return cpuid_entry2_find(vcpu, function, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
-}
-EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);
-
/*
* Intel CPUID semantics treats any query for an out-of-range leaf as if the
* highest basic leaf (i.e. CPUID.0H:EAX) were requested. AMD CPUID semantics
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:49:38

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 17/49] KVM: x86: Do reverse CPUID sanity checks in __feature_leaf()

Do the compile-time sanity checks on reverse_cpuid in __feature_leaf() so
that higher level APIs don't need to "manually" perform the sanity checks.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.h | 3 ---
arch/x86/kvm/reverse_cpuid.h | 6 ++++--
2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 7eb3d7318fc4..d68b7d879820 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -198,7 +198,6 @@ static __always_inline void kvm_cpu_cap_clear(unsigned int x86_feature)
{
unsigned int x86_leaf = __feature_leaf(x86_feature);

- reverse_cpuid_check(x86_leaf);
kvm_cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
}

@@ -206,7 +205,6 @@ static __always_inline void kvm_cpu_cap_set(unsigned int x86_feature)
{
unsigned int x86_leaf = __feature_leaf(x86_feature);

- reverse_cpuid_check(x86_leaf);
kvm_cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
}

@@ -214,7 +212,6 @@ static __always_inline u32 kvm_cpu_cap_get(unsigned int x86_feature)
{
unsigned int x86_leaf = __feature_leaf(x86_feature);

- reverse_cpuid_check(x86_leaf);
return kvm_cpu_caps[x86_leaf] & __feature_bit(x86_feature);
}

diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
index 2f4e155080ba..245f71c16272 100644
--- a/arch/x86/kvm/reverse_cpuid.h
+++ b/arch/x86/kvm/reverse_cpuid.h
@@ -136,7 +136,10 @@ static __always_inline u32 __feature_translate(int x86_feature)

static __always_inline u32 __feature_leaf(int x86_feature)
{
- return __feature_translate(x86_feature) / 32;
+ u32 x86_leaf = __feature_translate(x86_feature) / 32;
+
+ reverse_cpuid_check(x86_leaf);
+ return x86_leaf;
}

/*
@@ -159,7 +162,6 @@ static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned int x86_featu
{
unsigned int x86_leaf = __feature_leaf(x86_feature);

- reverse_cpuid_check(x86_leaf);
return reverse_cpuid[x86_leaf];
}

--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:49:56

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 32/49] KVM: x86: Remove all direct usage of cpuid_entry2_find()

Convert all uses of cpuid_entry2_find() to kvm_find_cpuid_entry{,_index}()
now that cpuid_entry2_find() operates on the vCPU state, i.e. now that
there is no need to use cpuid_entry2_find() directly in order to pass in
non-vCPU state.

To help prevent unwanted usage of cpuid_entry2_find(), #undef
KVM_CPUID_INDEX_NOT_SIGNIFICANT, i.e. force KVM to use
kvm_find_cpuid_entry().

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index d7390ade1c29..699ce4261e9c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -189,6 +189,12 @@ struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
}
EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);

+/*
+ * cpuid_entry2_find() and KVM_CPUID_INDEX_NOT_SIGNIFICANT should never be used
+ * directly outside of kvm_find_cpuid_entry() and kvm_find_cpuid_entry_index().
+ */
+#undef KVM_CPUID_INDEX_NOT_SIGNIFICANT
+
static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
@@ -198,8 +204,7 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
* The existing code assumes virtual address is 48-bit or 57-bit in the
* canonical address checks; exit if it is ever changed.
*/
- best = cpuid_entry2_find(vcpu, 0x80000008,
- KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+ best = kvm_find_cpuid_entry(vcpu, 0x80000008);
if (best) {
int vaddr_bits = (best->eax & 0xff00) >> 8;

@@ -211,7 +216,7 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
* Exposing dynamic xfeatures to the guest requires additional
* enabling in the FPU, e.g. to expand the guest XSAVE state size.
*/
- best = cpuid_entry2_find(vcpu, 0xd, 0);
+ best = kvm_find_cpuid_entry_index(vcpu, 0xd, 0);
if (!best)
return 0;

@@ -254,7 +259,7 @@ static struct kvm_hypervisor_cpuid kvm_get_hypervisor_cpuid(struct kvm_vcpu *vcp
u32 base;

for_each_possible_hypervisor_cpuid_base(base) {
- entry = cpuid_entry2_find(vcpu, base, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+ entry = kvm_find_cpuid_entry(vcpu, base);

if (entry) {
u32 signature[3];
@@ -301,7 +306,7 @@ static u64 cpuid_get_supported_xcr0(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;

- best = cpuid_entry2_find(vcpu, 0xd, 0);
+ best = kvm_find_cpuid_entry_index(vcpu, 0xd, 0);
if (!best)
return 0;

@@ -312,7 +317,7 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;

- best = cpuid_entry2_find(vcpu, 1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+ best = kvm_find_cpuid_entry(vcpu, 1);
if (best) {
/* Update OSXSAVE bit */
if (boot_cpu_has(X86_FEATURE_XSAVE))
@@ -323,22 +328,22 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
}

- best = cpuid_entry2_find(vcpu, 7, 0);
+ best = kvm_find_cpuid_entry_index(vcpu, 7, 0);
if (best && boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7)
cpuid_entry_change(best, X86_FEATURE_OSPKE,
kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));

- best = cpuid_entry2_find(vcpu, 0xD, 0);
+ best = kvm_find_cpuid_entry_index(vcpu, 0xD, 0);
if (best)
best->ebx = xstate_required_size(vcpu->arch.xcr0, false);

- best = cpuid_entry2_find(vcpu, 0xD, 1);
+ best = kvm_find_cpuid_entry_index(vcpu, 0xD, 1);
if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
best->ebx = xstate_required_size(vcpu->arch.xcr0, true);

if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
- best = cpuid_entry2_find(vcpu, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+ best = kvm_find_cpuid_entry(vcpu, 0x1);
if (best)
cpuid_entry_change(best, X86_FEATURE_MWAIT,
vcpu->arch.ia32_misc_enable_msr &
@@ -352,8 +357,7 @@ static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
#ifdef CONFIG_KVM_HYPERV
struct kvm_cpuid_entry2 *entry;

- entry = cpuid_entry2_find(vcpu, HYPERV_CPUID_INTERFACE,
- KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+ entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_INTERFACE);
return entry && entry->eax == HYPERV_CPUID_SIGNATURE_EAX;
#else
return false;
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:50:15

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 33/49] KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID

Advertise TSC_DEADLINE_TIMER via KVM_GET_SUPPORTED_CPUID when it's
supported in hardware, as the odds of a VMM emulating the local APIC in
userspace, not emulating the TSC deadline timer, _and_ reflecting
KVM_GET_SUPPORTED_CPUID back into KVM_SET_CPUID2 are extremely low.

KVM has _unconditionally_ advertised X2APIC via CPUID since commit
0d1de2d901f4 ("KVM: Always report x2apic as supported feature"), and it
is completely impossible for userspace to emulate X2APIC as KVM doesn't
support forwarding the MSR accesses to userspace. I.e. KVM has relied on
userspace VMMs to not misreport local APIC capabilities for nearly 13
years.
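
E.g. a minimal userspace sketch of the resulting contract, assuming the
VMM tracks whether it created an in-kernel APIC (hypothetical helper name,
error handling elided):

  #include <stdbool.h>
  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  /* Reflecting CPUID.1:ECX[24] into KVM_SET_CPUID2 is safe only if the
   * local APIC is in-kernel (or the VMM emulates the deadline timer). */
  static bool tsc_deadline_usable(int kvm_fd, bool in_kernel_apic)
  {
          return in_kernel_apic &&
                 ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER) > 0;
  }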

Signed-off-by: Sean Christopherson <[email protected]>
---
Documentation/virt/kvm/api.rst | 9 ++++++---
arch/x86/kvm/cpuid.c | 4 ++--
2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 884846282d06..cb744a646de6 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1804,15 +1804,18 @@ emulate them efficiently. The fields in each entry are defined as follows:
the values returned by the cpuid instruction for
this function/index combination

-The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned
-as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC
-support. Instead it is reported via::
+x2APIC (CPUID leaf 1, ecx[21]) and TSC deadline timer (CPUID leaf 1, ecx[24])
+may be returned as true, but they depend on KVM_CREATE_IRQCHIP for in-kernel
+emulation of the local APIC. TSC deadline timer support is also reported via::

ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)

if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
feature in userspace, then you can enable the feature for KVM_SET_CPUID2.

+Enabling x2APIC in KVM_SET_CPUID2 requires KVM_CREATE_IRQCHIP as KVM doesn't
+support forwarding x2APIC MSR accesses to userspace, i.e. KVM does not support
+emulating x2APIC in userspace.

4.47 KVM_PPC_GET_PVINFO
-----------------------
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 699ce4261e9c..d1f427284ccc 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -680,8 +680,8 @@ void kvm_set_cpu_caps(void)
F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |
F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
- 0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
- F(F16C) | F(RDRAND)
+ EMUL_F(TSC_DEADLINE_TIMER) | F(AES) | F(XSAVE) |
+ 0 /* OSXSAVE */ | F(AVX) | F(F16C) | F(RDRAND)
);

kvm_cpu_cap_init(CPUID_1_EDX,
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:50:32

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 34/49] KVM: x86: Advertise HYPERVISOR in KVM_GET_SUPPORTED_CPUID

Unconditionally advertise "support" for the HYPERVISOR feature in CPUID,
as the flag simply communicates to the guest that it's running under a
hypervisor.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index d1f427284ccc..de898d571faa 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -681,7 +681,8 @@ void kvm_set_cpu_caps(void)
F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
EMUL_F(TSC_DEADLINE_TIMER) | F(AES) | F(XSAVE) |
- 0 /* OSXSAVE */ | F(AVX) | F(F16C) | F(RDRAND)
+ 0 /* OSXSAVE */ | F(AVX) | F(F16C) | F(RDRAND) |
+ EMUL_F(HYPERVISOR)
);

kvm_cpu_cap_init(CPUID_1_EDX,
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:50:32

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 11/49] KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation

Reject KVM_CAP_X86_DISABLE_EXITS if vCPUs have been created, as disabling
PAUSE/MWAIT/HLT exits after vCPUs have been created is broken and useless,
e.g. except for PAUSE on SVM, the relevant intercepts aren't updated after
vCPU creation. vCPUs may also end up with an inconsistent configuration
if exits are disabled between creation of multiple vCPUs.
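
E.g. the ordering the new rule enforces, as a minimal sketch (hypothetical
helper, error handling elided):

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  static void create_vm_with_hlt_exits_disabled(int kvm_fd)
  {
          struct kvm_enable_cap cap = {
                  .cap = KVM_CAP_X86_DISABLE_EXITS,
                  .args[0] = KVM_X86_DISABLE_EXITS_HLT,
          };
          int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);

          /* Must happen before any KVM_CREATE_VCPU; -EINVAL otherwise. */
          ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

          ioctl(vm_fd, KVM_CREATE_VCPU, 0);
  }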

Cc: Hou Wenlong <[email protected]>
Link: https://lore.kernel.org/all/9227068821b275ac547eb2ede09ec65d2281fe07.1680179693.git.houwenlong.hwl@antgroup.com
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
Documentation/virt/kvm/api.rst | 1 +
arch/x86/kvm/x86.c | 6 ++++++
2 files changed, 7 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 6ab8b5b7c64e..884846282d06 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7645,6 +7645,7 @@ branch to guests' 0x200 interrupt vector.
:Architectures: x86
:Parameters: args[0] defines which exits are disabled
:Returns: 0 on success, -EINVAL when args[0] contains invalid exits
+ or if any vCPUs have already been created

Valid bits in args[0] are::

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bb34891d2f0a..4cb0c150a2f8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6568,6 +6568,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
break;

+ mutex_lock(&kvm->lock);
+ if (kvm->created_vcpus)
+ goto disable_exits_unlock;
+
if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
kvm->arch.pause_in_guest = true;

@@ -6589,6 +6593,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}

r = 0;
+disable_exits_unlock:
+ mutex_unlock(&kvm->lock);
break;
case KVM_CAP_MSR_PLATFORM_INFO:
kvm->arch.guest_can_read_msr_platform_info = cap->args[0];
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:50:36

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 19/49] KVM: x86: Add a macro to init CPUID features that ignore host kernel support

Add a macro for use in kvm_set_cpu_caps() to automagically initialize
features that KVM wants to support based solely on the CPU's capabilities,
e.g. KVM advertises LA57 support if it's available in hardware, even if
the host kernel isn't utilizing 57-bit virtual addresses.

Take advantage of the fact that kvm_cpu_cap_mask() adjusts kvm_cpu_caps
based on raw CPUID, i.e. will clear features bits that aren't supported in
hardware, and simply force-set the capability before applying the mask.

Abusing kvm_cpu_cap_set() is a borderline evil shenanigan, but doing so
avoids extra CPUID lookups, and a future commit will harden the entire
family of *F() macros to assert (at compile time) that every feature being
allowed is part of the capability word being processed, i.e. using a macro
will bring more advantages in the future.

Avoiding CPUID also fixes a largely benign bug where KVM could incorrectly
report LA57 support on Intel CPUs whose max supported CPUID is less than 7,
i.e. if the max supported leaf (<7) happened to have bit 16 set. In
practice, barring a funky virtual machine setup, the bug is benign as all
known CPUs that support VMX also support leaf 7.
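
The effective flow for LA57 after this change, sketched (raw_cpuid_7_ecx is
a stand-in for the masking that kvm_cpu_cap_mask() performs):

  kvm_cpu_cap_set(X86_FEATURE_LA57);            /* RAW_F() force-sets... */
  kvm_cpu_caps[CPUID_7_ECX] &= raw_cpuid_7_ecx; /* ...then mask by raw CPUID */

i.e. LA57 survives if and only if hardware reports it, regardless of
whether the host kernel uses 5-level paging.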

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 77625a5477b1..a802c09b50ab 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -70,6 +70,18 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
(boot_cpu_has(X86_FEATURE_##name) ? F(name) : 0); \
})

+/*
+ * Raw Feature - For features that KVM supports based purely on raw host CPUID,
+ * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
+ * Simply force set the feature in KVM's capabilities, raw CPUID support will
+ * be factored in by kvm_cpu_cap_mask().
+ */
+#define RAW_F(name) \
+({ \
+ kvm_cpu_cap_set(X86_FEATURE_##name); \
+ F(name); \
+})
+
/*
* Magic value used by KVM when querying userspace-provided CPUID entries and
* doesn't care about the CPIUD index because the index of the function in
@@ -682,15 +694,12 @@ void kvm_set_cpu_caps(void)
F(AVX512VL));

kvm_cpu_cap_mask(CPUID_7_ECX,
- F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
+ F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
F(SGX_LC) | F(BUS_LOCK_DETECT)
);
- /* Set LA57 based on hardware capability. */
- if (cpuid_ecx(7) & F(LA57))
- kvm_cpu_cap_set(X86_FEATURE_LA57);

/*
* PKU not yet implemented for shadow paging and requires OSPKE
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:51:02

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 07/49] KVM: selftests: Verify KVM stuffs runtime CPUID OS bits on CR4 writes

Extend x86's set sregs test to verify that KVM sets/clears OSXSAVE and
OSPKE according to CR4.OSXSAVE and CR4.PKE, respectively. For performance
reasons, KVM is responsible for emulating the architectural behavior of
the OS CPUID bits tracking CR4.
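
The architectural behavior being verified, in short:

  CPUID.01H:ECX.OSXSAVE[bit 27] == CR4.OSXSAVE[bit 18]
  CPUID.07H(0):ECX.OSPKE[bit 4] == CR4.PKE[bit 22]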

Signed-off-by: Sean Christopherson <[email protected]>
---
tools/testing/selftests/kvm/x86_64/set_sregs_test.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86_64/set_sregs_test.c b/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
index 96fd690d479a..f4095a3d1278 100644
--- a/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
@@ -85,6 +85,16 @@ static void test_cr_bits(struct kvm_vcpu *vcpu, uint64_t cr4)
rc = _vcpu_sregs_set(vcpu, &sregs);
TEST_ASSERT(!rc, "Failed to set supported CR4 bits (0x%lx)", cr4);

+ TEST_ASSERT(!!(sregs.cr4 & X86_CR4_OSXSAVE) ==
+ (vcpu->cpuid && vcpu_cpuid_has(vcpu, X86_FEATURE_OSXSAVE)),
+ "KVM didn't %s OSXSAVE in CPUID as expected",
+ (sregs.cr4 & X86_CR4_OSXSAVE) ? "set" : "clear");
+
+ TEST_ASSERT(!!(sregs.cr4 & X86_CR4_PKE) ==
+ (vcpu->cpuid && vcpu_cpuid_has(vcpu, X86_FEATURE_OSPKE)),
+ "KVM didn't %s OSPKE in CPUID as expected",
+ (sregs.cr4 & X86_CR4_PKE) ? "set" : "clear");
+
vcpu_sregs_get(vcpu, &sregs);
TEST_ASSERT(sregs.cr4 == cr4, "sregs.CR4 (0x%llx) != CR4 (0x%lx)",
sregs.cr4, cr4);
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:51:28

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 38/49] KVM: x86: Initialize guest cpu_caps based on guest CPUID

Initialize a vCPU's capabilities based on the guest CPUID provided by
userspace instead of simply zeroing the entire array. This is the first
step toward using cpu_caps to query *all* CPUID-based guest capabilities,
i.e. will allow converting all usage of guest_cpuid_has() to
guest_cpu_cap_has().

Zeroing the array was the logical choice when using cpu_caps was opt-in,
e.g. "unsupported" was generally a safer default, and the whole point of
governed features is that KVM would need to check host and guest support,
i.e. making everything unsupported by default didn't require more code.

But requiring KVM to manually "enable" every CPUID-based feature in
cpu_caps would require an absurd amount of boilerplate code.

Follow existing CPUID/kvm_cpu_caps nomenclature where possible, e.g. for
the change() and clear() APIs. Replace check_and_set() with constrain()
to try and capture that KVM is constraining userspace's desired guest
feature set based on KVM's capabilities.

This is intended to be a gigantic nop, i.e. should not have any impact on
guest or KVM functionality.

This is also an intermediate step; a future commit will also incorporate
KVM support into the vCPU's cpu_caps before converting guest_cpuid_has()
to guest_cpu_cap_has().
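
For reference, the guest cpu_caps accessors after this patch (signatures
abbreviated):

  guest_cpu_cap_set(vcpu, f);          /* force a capability on */
  guest_cpu_cap_clear(vcpu, f);        /* force a capability off */
  guest_cpu_cap_change(vcpu, f, cond); /* set or clear based on 'cond' */
  guest_cpu_cap_constrain(vcpu, f);    /* clear 'f' unless KVM supports it */
  guest_cpu_cap_has(vcpu, f);          /* query */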

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 46 ++++++++++++++++++++++++++++++++++++++++--
arch/x86/kvm/cpuid.h | 25 ++++++++++++++++++++---
arch/x86/kvm/svm/svm.c | 28 +++++++++++++------------
arch/x86/kvm/vmx/vmx.c | 8 +++++---
4 files changed, 86 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 89c506cf649b..fd725cbbcce5 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -381,13 +381,56 @@ static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
#endif
}

+/*
+ * This isn't truly "unsafe", but except for the cpu_caps initialization code,
+ * all register lookups should use __cpuid_entry_get_reg(), which provides
+ * compile-time validation of the input.
+ */
+static u32 cpuid_get_reg_unsafe(struct kvm_cpuid_entry2 *entry, u32 reg)
+{
+ switch (reg) {
+ case CPUID_EAX:
+ return entry->eax;
+ case CPUID_EBX:
+ return entry->ebx;
+ case CPUID_ECX:
+ return entry->ecx;
+ case CPUID_EDX:
+ return entry->edx;
+ default:
+ WARN_ON_ONCE(1);
+ return 0;
+ }
+}
+
void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *apic = vcpu->arch.apic;
struct kvm_cpuid_entry2 *best;
+ struct kvm_cpuid_entry2 *entry;
bool allow_gbpages;
+ int i;

memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps));
+ BUILD_BUG_ON(ARRAY_SIZE(reverse_cpuid) != NR_KVM_CPU_CAPS);
+
+ /*
+ * Reset guest capabilities to userspace's guest CPUID definition, i.e.
+ * honor userspace's definition for features that don't require KVM or
+ * hardware management/support (or that KVM simply doesn't care about).
+ */
+ for (i = 0; i < NR_KVM_CPU_CAPS; i++) {
+ const struct cpuid_reg cpuid = reverse_cpuid[i];
+
+ if (!cpuid.function)
+ continue;
+
+ entry = kvm_find_cpuid_entry_index(vcpu, cpuid.function, cpuid.index);
+ if (!entry)
+ continue;
+
+ vcpu->arch.cpu_caps[i] = cpuid_get_reg_unsafe(entry, cpuid.reg);
+ }

kvm_update_cpuid_runtime(vcpu);

@@ -404,8 +447,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
*/
allow_gbpages = tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) :
guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES);
- if (allow_gbpages)
- guest_cpu_cap_set(vcpu, X86_FEATURE_GBPAGES);
+ guest_cpu_cap_change(vcpu, X86_FEATURE_GBPAGES, allow_gbpages);

best = kvm_find_cpuid_entry(vcpu, 1);
if (best && apic) {
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index ad0168d3aec5..c2c2b8aa347b 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -265,11 +265,30 @@ static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
}

-static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
- unsigned int x86_feature)
+static __always_inline void guest_cpu_cap_clear(struct kvm_vcpu *vcpu,
+ unsigned int x86_feature)
{
- if (kvm_cpu_cap_has(x86_feature) && guest_cpuid_has(vcpu, x86_feature))
+ unsigned int x86_leaf = __feature_leaf(x86_feature);
+
+ reverse_cpuid_check(x86_leaf);
+ vcpu->arch.cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
+}
+
+static __always_inline void guest_cpu_cap_change(struct kvm_vcpu *vcpu,
+ unsigned int x86_feature,
+ bool guest_has_cap)
+{
+ if (guest_has_cap)
guest_cpu_cap_set(vcpu, x86_feature);
+ else
+ guest_cpu_cap_clear(vcpu, x86_feature);
+}
+
+static __always_inline void guest_cpu_cap_constrain(struct kvm_vcpu *vcpu,
+ unsigned int x86_feature)
+{
+ if (!kvm_cpu_cap_has(x86_feature))
+ guest_cpu_cap_clear(vcpu, x86_feature);
}

static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2acd2e3bb1b0..1bc431a7e862 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4339,27 +4339,29 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
* XSS on VM-Enter/VM-Exit. Failure to do so would effectively give
* the guest read/write access to the host's XSS.
*/
- if (boot_cpu_has(X86_FEATURE_XSAVE) &&
- boot_cpu_has(X86_FEATURE_XSAVES) &&
- guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
- guest_cpu_cap_set(vcpu, X86_FEATURE_XSAVES);
+ guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES,
+ boot_cpu_has(X86_FEATURE_XSAVE) &&
+ boot_cpu_has(X86_FEATURE_XSAVES) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVE));

- guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_NRIPS);
- guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR);
- guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LBRV);
+ guest_cpu_cap_constrain(vcpu, X86_FEATURE_NRIPS);
+ guest_cpu_cap_constrain(vcpu, X86_FEATURE_TSCRATEMSR);
+ guest_cpu_cap_constrain(vcpu, X86_FEATURE_LBRV);

/*
* Intercept VMLOAD if the vCPU mode is Intel in order to emulate that
* VMLOAD drops bits 63:32 of SYSENTER (ignoring the fact that exposing
* SVM on Intel is bonkers and extremely unlikely to work).
*/
- if (!guest_cpuid_is_intel(vcpu))
- guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
+ if (guest_cpuid_is_intel(vcpu))
+ guest_cpu_cap_clear(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
+ else
+ guest_cpu_cap_constrain(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);

- guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PAUSEFILTER);
- guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
- guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VGIF);
- guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VNMI);
+ guest_cpu_cap_constrain(vcpu, X86_FEATURE_PAUSEFILTER);
+ guest_cpu_cap_constrain(vcpu, X86_FEATURE_PFTHRESHOLD);
+ guest_cpu_cap_constrain(vcpu, X86_FEATURE_VGIF);
+ guest_cpu_cap_constrain(vcpu, X86_FEATURE_VNMI);

svm_recalc_instruction_intercepts(vcpu, svm);

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1bc56596d653..d873386e1473 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7838,10 +7838,12 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
*/
if (boot_cpu_has(X86_FEATURE_XSAVE) &&
guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
- guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_XSAVES);
+ guest_cpu_cap_constrain(vcpu, X86_FEATURE_XSAVES);
+ else
+ guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);

- guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VMX);
- guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LAM);
+ guest_cpu_cap_constrain(vcpu, X86_FEATURE_VMX);
+ guest_cpu_cap_constrain(vcpu, X86_FEATURE_LAM);

vmx_setup_uret_msrs(vmx);

--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:52:15

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 13/49] KVM: selftests: Fix a bad TEST_REQUIRE() in x86's KVM PV test

Actually check for KVM support for disabling HLT-exiting instead of
effectively checking that KVM_CAP_X86_DISABLE_EXITS is #defined to a
non-zero value, and convert the TEST_REQUIRE() to a simple return so
that only the sub-test is skipped if HLT-exiting is mandatory.

The goof has likely gone unnoticed because all x86 CPUs support disabling
HLT-exiting; only systems with the opt-in mitigate_smt_rsb KVM module
param disallow HLT-exiting.
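
The gist of the fix:

  /* Before: always passes, as the #define itself is a non-zero constant. */
  TEST_REQUIRE(KVM_CAP_X86_DISABLE_EXITS);

  /* After: ask KVM whether HLT-exiting can actually be disabled, and
   * skip only this sub-test if it can't. */
  if (!(kvm_check_cap(KVM_CAP_X86_DISABLE_EXITS) & KVM_X86_DISABLE_EXITS_HLT))
          return;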

Signed-off-by: Sean Christopherson <[email protected]>
---
tools/testing/selftests/kvm/x86_64/kvm_pv_test.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c b/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
index 78878b3a2725..2aee93108a54 100644
--- a/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
+++ b/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
@@ -140,10 +140,11 @@ static void test_pv_unhalt(void)
struct kvm_cpuid_entry2 *ent;
u32 kvm_sig_old;

+ if (!(kvm_check_cap(KVM_CAP_X86_DISABLE_EXITS) & KVM_X86_DISABLE_EXITS_HLT))
+ return;
+
pr_info("testing KVM_FEATURE_PV_UNHALT\n");

- TEST_REQUIRE(KVM_CAP_X86_DISABLE_EXITS);
-
/* KVM_PV_UNHALT test */
vm = vm_create_with_one_vcpu(&vcpu, guest_main);
vcpu_set_cpuid_feature(vcpu, X86_FEATURE_KVM_PV_UNHALT);
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:52:41

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 14/49] KVM: selftests: Update x86's KVM PV test to match KVM's disabling exits behavior

Rework x86's KVM PV features test to align with KVM's new, fixed behavior
of not allowing userspace to disable HLT-exiting after vCPUs have been
created. Rework the core testcase to disable HLT-exiting before creating
a vCPU, and opportunistically keep the paired VM+vCPU creation to
verify that KVM rejects KVM_CAP_X86_DISABLE_EXITS as expected.

Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/x86_64/kvm_pv_test.c | 33 +++++++++++++++++--
1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c b/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
index 2aee93108a54..1b805cbdb47b 100644
--- a/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
+++ b/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
@@ -139,6 +139,7 @@ static void test_pv_unhalt(void)
struct kvm_vm *vm;
struct kvm_cpuid_entry2 *ent;
u32 kvm_sig_old;
+ int r;

if (!(kvm_check_cap(KVM_CAP_X86_DISABLE_EXITS) & KVM_X86_DISABLE_EXITS_HLT))
return;
@@ -152,19 +153,45 @@ static void test_pv_unhalt(void)
TEST_ASSERT(vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT),
"Enabling X86_FEATURE_KVM_PV_UNHALT had no effect");

- /* Make sure KVM clears vcpu->arch.kvm_cpuid */
+ /* Verify KVM disallows disabling exits after vCPU creation. */
+ r = __vm_enable_cap(vm, KVM_CAP_X86_DISABLE_EXITS, KVM_X86_DISABLE_EXITS_HLT);
+ TEST_ASSERT(r && errno == EINVAL,
+ "Disabling exits after vCPU creation didn't fail as expected");
+
+ kvm_vm_free(vm);
+
+ /* Verify that KVM clears PV_UNHALT from guest CPUID. */
+ vm = vm_create(1);
+ vm_enable_cap(vm, KVM_CAP_X86_DISABLE_EXITS, KVM_X86_DISABLE_EXITS_HLT);
+
+ vcpu = vm_vcpu_add(vm, 0, NULL);
+ TEST_ASSERT(!vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT),
+ "vCPU created with PV_UNHALT set by default");
+
+ vcpu_set_cpuid_feature(vcpu, X86_FEATURE_KVM_PV_UNHALT);
+ TEST_ASSERT(!vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT),
+ "PV_UNHALT set in guest CPUID when HLT-exiting is disabled");
+
+ /*
+ * Clobber the KVM PV signature and verify KVM does NOT clear PV_UNHALT
+ * when KVM PV is not present, and DOES clear PV_UNHALT when switching
+ * back to the correct signature.
+ */
ent = vcpu_get_cpuid_entry(vcpu, KVM_CPUID_SIGNATURE);
kvm_sig_old = ent->ebx;
ent->ebx = 0xdeadbeef;
vcpu_set_cpuid(vcpu);

- vm_enable_cap(vm, KVM_CAP_X86_DISABLE_EXITS, KVM_X86_DISABLE_EXITS_HLT);
+ vcpu_set_cpuid_feature(vcpu, X86_FEATURE_KVM_PV_UNHALT);
+ TEST_ASSERT(vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT),
+ "PV_UNHALT cleared when using bogus KVM PV signature");
+
ent = vcpu_get_cpuid_entry(vcpu, KVM_CPUID_SIGNATURE);
ent->ebx = kvm_sig_old;
vcpu_set_cpuid(vcpu);

TEST_ASSERT(!vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT),
- "KVM_FEATURE_PV_UNHALT is set with KVM_CAP_X86_DISABLE_EXITS");
+ "PV_UNHALT set in guest CPUID when HLT-exiting is disabled");

/* FIXME: actually test KVM_FEATURE_PV_UNHALT feature */

--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:52:47

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 40/49] KVM: x86: Initialize guest cpu_caps based on KVM support

Constrain all guest cpu_caps based on KVM support instead of constraining
only the few features that KVM _currently_ needs to verify are actually
supported by KVM. The intent of cpu_caps is to track what the guest is
actually capable of using, not the raw, unfiltered CPUID values that the
guest sees.

I.e. KVM should always consult its own support when making decisions
based on guest CPUID, and the only reason KVM has historically made the
checks opt-in was due to lack of centralized tracking.
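
The resulting invariant, roughly (emulated[] and guest_cpuid[] are
shorthand for the per-leaf lookups in the loop):

  cpu_caps[i] = (kvm_cpu_caps[i] | kvm_vmm_cpu_caps[i] | emulated[i]) &
                guest_cpuid[i];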

Suggested-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 14 +++++++++++++-
arch/x86/kvm/cpuid.h | 7 -------
arch/x86/kvm/svm/svm.c | 11 -----------
arch/x86/kvm/vmx/vmx.c | 9 ++-------
4 files changed, 15 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index d1849fe874ab..8ada1cac8fcb 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -403,6 +403,8 @@ static u32 cpuid_get_reg_unsafe(struct kvm_cpuid_entry2 *entry, u32 reg)
}
}

+static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func);
+
void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *apic = vcpu->arch.apic;
@@ -421,6 +423,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
*/
for (i = 0; i < NR_KVM_CPU_CAPS; i++) {
const struct cpuid_reg cpuid = reverse_cpuid[i];
+ struct kvm_cpuid_entry2 emulated;

if (!cpuid.function)
continue;
@@ -429,7 +432,16 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
if (!entry)
continue;

- vcpu->arch.cpu_caps[i] = cpuid_get_reg_unsafe(entry, cpuid.reg);
+ cpuid_func_emulated(&emulated, cpuid.function);
+
+ /*
+ * A vCPU has a feature if it's supported by KVM and is enabled
+ * in guest CPUID. Note, this includes features that are
+ * supported by KVM but aren't advertised to userspace!
+ */
+ vcpu->arch.cpu_caps[i] = kvm_cpu_caps[i] | kvm_vmm_cpu_caps[i] |
+ cpuid_get_reg_unsafe(&emulated, cpuid.reg);
+ vcpu->arch.cpu_caps[i] &= cpuid_get_reg_unsafe(entry, cpuid.reg);
}

kvm_update_cpuid_runtime(vcpu);
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index c2c2b8aa347b..60da304db4e4 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -284,13 +284,6 @@ static __always_inline void guest_cpu_cap_change(struct kvm_vcpu *vcpu,
guest_cpu_cap_clear(vcpu, x86_feature);
}

-static __always_inline void guest_cpu_cap_constrain(struct kvm_vcpu *vcpu,
- unsigned int x86_feature)
-{
- if (!kvm_cpu_cap_has(x86_feature))
- guest_cpu_cap_clear(vcpu, x86_feature);
-}
-
static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
unsigned int x86_feature)
{
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1bc431a7e862..946a75771946 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4344,10 +4344,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
boot_cpu_has(X86_FEATURE_XSAVES) &&
guest_cpuid_has(vcpu, X86_FEATURE_XSAVE));

- guest_cpu_cap_constrain(vcpu, X86_FEATURE_NRIPS);
- guest_cpu_cap_constrain(vcpu, X86_FEATURE_TSCRATEMSR);
- guest_cpu_cap_constrain(vcpu, X86_FEATURE_LBRV);
-
/*
* Intercept VMLOAD if the vCPU mode is Intel in order to emulate that
* VMLOAD drops bits 63:32 of SYSENTER (ignoring the fact that exposing
@@ -4355,13 +4351,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
*/
if (guest_cpuid_is_intel(vcpu))
guest_cpu_cap_clear(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
- else
- guest_cpu_cap_constrain(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
-
- guest_cpu_cap_constrain(vcpu, X86_FEATURE_PAUSEFILTER);
- guest_cpu_cap_constrain(vcpu, X86_FEATURE_PFTHRESHOLD);
- guest_cpu_cap_constrain(vcpu, X86_FEATURE_VGIF);
- guest_cpu_cap_constrain(vcpu, X86_FEATURE_VNMI);

svm_recalc_instruction_intercepts(vcpu, svm);

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d873386e1473..653c4b68ec7f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7836,15 +7836,10 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
* to the guest. XSAVES depends on CR4.OSXSAVE, and CR4.OSXSAVE can be
* set if and only if XSAVE is supported.
*/
- if (boot_cpu_has(X86_FEATURE_XSAVE) &&
- guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
- guest_cpu_cap_constrain(vcpu, X86_FEATURE_XSAVES);
- else
+ if (!boot_cpu_has(X86_FEATURE_XSAVE) ||
+ !guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);

- guest_cpu_cap_constrain(vcpu, X86_FEATURE_VMX);
- guest_cpu_cap_constrain(vcpu, X86_FEATURE_LAM);
-
vmx_setup_uret_msrs(vmx);

if (cpu_has_secondary_exec_ctrls())
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:52:57

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed

Reject KVM_CAP_X86_DISABLE_EXITS if userspace attempts to disable MWAIT or
HLT exits and KVM previously reported (via KVM_CHECK_EXTENSION) that
disabling the exit(s) is not allowed. E.g. because MWAIT isn't supported
or the CPU doesn't have an always-running APIC timer, or because KVM is
configured to mitigate cross-thread vulnerabilities.
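
E.g. the negotiation userspace must now perform, as a minimal sketch
(hypothetical helper, error handling elided):

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  static int disable_supported_exits(int vm_fd, __u64 wanted)
  {
          struct kvm_enable_cap cap = { .cap = KVM_CAP_X86_DISABLE_EXITS };
          int allowed = ioctl(vm_fd, KVM_CHECK_EXTENSION,
                              KVM_CAP_X86_DISABLE_EXITS);

          /* KVM now rejects anything beyond what CHECK_EXTENSION reports. */
          cap.args[0] = wanted & allowed;
          return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }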

Cc: Kechen Lu <[email protected]>
Fixes: 4d5422cea3b6 ("KVM: X86: Provide a capability to disable MWAIT intercepts")
Fixes: 6f0f2d5ef895 ("KVM: x86: Mitigate the cross-thread return address predictions bug")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 54 ++++++++++++++++++++++++----------------------
1 file changed, 28 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4cb0c150a2f8..c729227c6501 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4590,6 +4590,20 @@ static inline bool kvm_can_mwait_in_guest(void)
boot_cpu_has(X86_FEATURE_ARAT);
}

+static u64 kvm_get_allowed_disable_exits(void)
+{
+ u64 r = KVM_X86_DISABLE_EXITS_PAUSE;
+
+ if (!mitigate_smt_rsb) {
+ r |= KVM_X86_DISABLE_EXITS_HLT |
+ KVM_X86_DISABLE_EXITS_CSTATE;
+
+ if (kvm_can_mwait_in_guest())
+ r |= KVM_X86_DISABLE_EXITS_MWAIT;
+ }
+ return r;
+}
+
#ifdef CONFIG_KVM_HYPERV
static int kvm_ioctl_get_supported_hv_cpuid(struct kvm_vcpu *vcpu,
struct kvm_cpuid2 __user *cpuid_arg)
@@ -4726,15 +4740,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = KVM_CLOCK_VALID_FLAGS;
break;
case KVM_CAP_X86_DISABLE_EXITS:
- r = KVM_X86_DISABLE_EXITS_PAUSE;
-
- if (!mitigate_smt_rsb) {
- r |= KVM_X86_DISABLE_EXITS_HLT |
- KVM_X86_DISABLE_EXITS_CSTATE;
-
- if (kvm_can_mwait_in_guest())
- r |= KVM_X86_DISABLE_EXITS_MWAIT;
- }
+ r |= kvm_get_allowed_disable_exits();
break;
case KVM_CAP_X86_SMM:
if (!IS_ENABLED(CONFIG_KVM_SMM))
@@ -6565,33 +6571,29 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
break;
case KVM_CAP_X86_DISABLE_EXITS:
r = -EINVAL;
- if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
+ if (cap->args[0] & ~kvm_get_allowed_disable_exits())
break;

mutex_lock(&kvm->lock);
if (kvm->created_vcpus)
goto disable_exits_unlock;

- if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
- kvm->arch.pause_in_guest = true;
-
#define SMT_RSB_MSG "This processor is affected by the Cross-Thread Return Predictions vulnerability. " \
"KVM_CAP_X86_DISABLE_EXITS should only be used with SMT disabled or trusted guests."

- if (!mitigate_smt_rsb) {
- if (boot_cpu_has_bug(X86_BUG_SMT_RSB) && cpu_smt_possible() &&
- (cap->args[0] & ~KVM_X86_DISABLE_EXITS_PAUSE))
- pr_warn_once(SMT_RSB_MSG);
-
- if ((cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT) &&
- kvm_can_mwait_in_guest())
- kvm->arch.mwait_in_guest = true;
- if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT)
- kvm->arch.hlt_in_guest = true;
- if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
- kvm->arch.cstate_in_guest = true;
- }
+ if (!mitigate_smt_rsb && boot_cpu_has_bug(X86_BUG_SMT_RSB) &&
+ cpu_smt_possible() &&
+ (cap->args[0] & ~KVM_X86_DISABLE_EXITS_PAUSE))
+ pr_warn_once(SMT_RSB_MSG);

+ if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
+ kvm->arch.pause_in_guest = true;
+ if (cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT)
+ kvm->arch.mwait_in_guest = true;
+ if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT)
+ kvm->arch.hlt_in_guest = true;
+ if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
+ kvm->arch.cstate_in_guest = true;
r = 0;
disable_exits_unlock:
mutex_unlock(&kvm->lock);
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:53:06

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 18/49] KVM: x86: Account for max supported CPUID leaf when getting raw host CPUID

Explicitly zero out the feature word in kvm_cpu_caps if the word's
associated CPUID function is greater than the max leaf supported by the
CPU. For such unsupported functions, Intel CPUs return the output from
the last supported leaf, not all zeros.

Practically speaking, this is likely a benign bug, as KVM uses the raw
host CPUID to mask the kernel's computed capabilities, and the kernel does
perform max leaf checks when populating boot_cpu_data. The only way KVM's
goof could be problematic is if the kernel force-set a feature in a leaf
that is completely unsupported, _and_ the max supported leaf happened to
return a value with '1' in the same bit position. Which is theoretically
possible, but extremely unlikely. And even if that did happen, it's
entirely possible that KVM would still provide the correct functionality;
the kernel did set the capability after all.
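
A small userspace illustration of the underlying CPUID quirk, assuming
GCC's <cpuid.h>:

  #include <stdio.h>
  #include <cpuid.h>

  int main(void)
  {
          unsigned int max_leaf, eax, ebx, ecx, edx;

          __cpuid(0, max_leaf, ebx, ecx, edx);
          __cpuid_count(7, 0, eax, ebx, ecx, edx);

          /* On Intel, if max_leaf < 7 the outputs above come from the
           * last supported leaf, not all zeros. */
          printf("max leaf %u, leaf 7 ecx = %#x%s\n", max_leaf, ecx,
                 max_leaf >= 7 ? "" : " (stale!)");
          return 0;
  }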

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a51e48663f53..77625a5477b1 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -571,18 +571,37 @@ int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
return 0;
}

+static __always_inline u32 raw_cpuid_get(struct cpuid_reg cpuid)
+{
+ struct kvm_cpuid_entry2 entry;
+ u32 base;
+
+ /*
+ * KVM only supports features defined by Intel (0x0), AMD (0x80000000),
+ * and Centaur (0xc0000000). WARN if a feature for a new vendor base is
+ * defined, as this and other code would need to be updated.
+ */
+ base = cpuid.function & 0xffff0000;
+ if (WARN_ON_ONCE(base && base != 0x80000000 && base != 0xc0000000))
+ return 0;
+
+ if (cpuid_eax(base) < cpuid.function)
+ return 0;
+
+ cpuid_count(cpuid.function, cpuid.index,
+ &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);
+
+ return *__cpuid_entry_get_reg(&entry, cpuid.reg);
+}
+
/* Mask kvm_cpu_caps for @leaf with the raw CPUID capabilities of this CPU. */
static __always_inline void __kvm_cpu_cap_mask(unsigned int leaf)
{
const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
- struct kvm_cpuid_entry2 entry;

reverse_cpuid_check(leaf);

- cpuid_count(cpuid.function, cpuid.index,
- &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);
-
- kvm_cpu_caps[leaf] &= *__cpuid_entry_get_reg(&entry, cpuid.reg);
+ kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);
}

static __always_inline
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:53:18

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 09/49] KVM: x86/pmu: Drop now-redundant refresh() during init()

Drop the manual kvm_pmu_refresh() from kvm_pmu_init() now that
kvm_arch_vcpu_create() performs the refresh via kvm_vcpu_after_set_cpuid().

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/pmu.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index a593b03c9aed..31920dd1aa83 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -797,7 +797,6 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)

memset(pmu, 0, sizeof(*pmu));
static_call(kvm_x86_pmu_init)(vcpu);
- kvm_pmu_refresh(vcpu);
}

/* Release perf_events for vPMCs that have been unused for a full time slice. */
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:53:40

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 35/49] KVM: x86: Add a macro to handle features that are fully VMM controlled

Add a macro to track CPUID features for which KVM fully defers to
userspace, i.e. that KVM honors if they are enumerated to the guest, even
if KVM itself doesn't advertise them to userspace.

Somewhat unfortunately, this behavior only applies to MWAIT (largely
because of KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS), and it's not all that
likely future features will be handled in a similar way. I.e. very
arguably, potentially tracking every feature in kvm_vmm_cpu_caps is a
waste of memory.

However, adding one-off handling for individual features is quite painful,
especially when considering future hardening. It's very doable to verify,
at compile time, that every CPUID-based feature that KVM queries when
emulating guest behavior is actually known to KVM, e.g. to prevent KVM
bugs where KVM emulates some feature but fails to advertise support to
userspace. In other words, any features that are special cased, i.e. not
handled generically in the CPUID framework, would also need to be special
cased for any hardening efforts that build on said framework.
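
E.g. the VMM-side view of MWAIT, sketched with a hypothetical helper:

  #include <stdbool.h>
  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  /* MWAIT never shows up in KVM_GET_SUPPORTED_CPUID, but userspace may
   * still enumerate it to the guest, typically after disabling MWAIT
   * exits via KVM_CAP_X86_DISABLE_EXITS. */
  static bool vmm_may_enumerate_mwait(int vm_fd)
  {
          return ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_X86_DISABLE_EXITS) &
                 KVM_X86_DISABLE_EXITS_MWAIT;
  }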

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index de898d571faa..16bb873188d6 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -36,6 +36,8 @@
u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
EXPORT_SYMBOL_GPL(kvm_cpu_caps);

+static u32 kvm_vmm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
+
u32 xstate_required_size(u64 xstate_bv, bool compacted)
{
int feature_bit = 0;
@@ -115,6 +117,21 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
feature_bit(name); \
})

+/*
+ * VMM Features - For features that KVM "supports" in some capacity, i.e. that
+ * KVM may query, but that are never advertised to userspace. E.g. KVM allows
+ * userspace to enumerate MONITOR+MWAIT support to the guest, but the MWAIT
+ * feature flag is never advertised to userspace because MONITOR+MWAIT aren't
+ * virtualized by hardware, can't be faithfully emulated in software (KVM
+ * emulates them as NOPs), and allowing the guest to execute them natively
+ * requires enabling a per-VM capability.
+ */
+#define VMM_F(name) \
+({ \
+ kvm_vmm_cpu_caps[__feature_leaf(X86_FEATURE_##name)] |= F(name); \
+ 0; \
+})
+
/*
* Magic value used by KVM when querying userspace-provided CPUID entries and
* doesn't care about the CPIUD index because the index of the function in
@@ -674,7 +691,7 @@ void kvm_set_cpu_caps(void)
* NOTE: MONITOR (and MWAIT) are emulated as NOP, but *not*
* advertised to guests via CPUID!
*/
- F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
+ F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64 */ | VMM_F(MWAIT) |
0 /* DS-CPL, VMX, SMX, EST */ |
0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:53:48

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 45/49] KVM: x86: Shuffle code to prepare for dropping guest_cpuid_has()

Move the implementations of guest_has_{spec_ctrl,pred_cmd}_msr() down
below guest_cpu_cap_has() so that their use of guest_cpuid_has() can be
replaced with calls to guest_cpu_cap_has().

No functional change intended.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.h | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 60da304db4e4..7be56fa62342 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -168,21 +168,6 @@ static inline int guest_cpuid_stepping(struct kvm_vcpu *vcpu)
return x86_stepping(best->eax);
}

-static inline bool guest_has_spec_ctrl_msr(struct kvm_vcpu *vcpu)
-{
- return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
- guest_cpuid_has(vcpu, X86_FEATURE_AMD_STIBP) ||
- guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS) ||
- guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD));
-}
-
-static inline bool guest_has_pred_cmd_msr(struct kvm_vcpu *vcpu)
-{
- return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
- guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBPB) ||
- guest_cpuid_has(vcpu, X86_FEATURE_SBPB));
-}
-
static inline bool supports_cpuid_fault(struct kvm_vcpu *vcpu)
{
return vcpu->arch.msr_platform_info & MSR_PLATFORM_INFO_CPUID_FAULT;
@@ -301,4 +286,19 @@ static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr
return kvm_vcpu_is_legal_gpa(vcpu, cr3);
}

+static inline bool guest_has_spec_ctrl_msr(struct kvm_vcpu *vcpu)
+{
+ return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
+ guest_cpuid_has(vcpu, X86_FEATURE_AMD_STIBP) ||
+ guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS) ||
+ guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD));
+}
+
+static inline bool guest_has_pred_cmd_msr(struct kvm_vcpu *vcpu)
+{
+ return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
+ guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBPB) ||
+ guest_cpuid_has(vcpu, X86_FEATURE_SBPB));
+}
+
#endif
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:53:56

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 27/49] KVM: x86: Swap incoming guest CPUID into vCPU before massaging in KVM_SET_CPUID2

When handling KVM_SET_CPUID{,2}, swap the old and new CPUID arrays and
lengths before processing the new CPUID, and simply undo the swap if
setting the new CPUID fails for whatever reason.

To keep the diff reasonable, continue passing the entry array and length
to most helpers, and defer the more complete cleanup to future commits.

For any sane VMM, setting "bad" CPUID state is not a hot path (or even
something that is survivable), and setting guest CPUID before it's known
good will allow removing all of KVM's infrastructure for processing CPUID
entries directly (as opposed to operating on vcpu->arch.cpuid_entries).
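
The transactional pattern, sketched with hypothetical types and helpers:

  static int set_entries(struct foo *f, struct entry *new, int n_new)
  {
          int r;

          /* Commit the new state up front so all helpers see it. */
          swap(f->entries, new);          /* 'new' now holds the old array */
          swap(f->nent, n_new);

          r = validate(f);
          if (r)
                  goto unwind;

          kvfree(new);                    /* success: free the old array */
          return 0;

  unwind:
          /* Swap back on failure to restore the old state untouched. */
          swap(f->entries, new);
          swap(f->nent, n_new);
          return r;
  }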

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 49 +++++++++++++++++++++++++++-----------------
1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 33e3e77de1b7..4ad01867cb8d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -175,10 +175,10 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
return NULL;
}

-static int kvm_check_cpuid(struct kvm_vcpu *vcpu,
- struct kvm_cpuid_entry2 *entries,
- int nent)
+static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
{
+ struct kvm_cpuid_entry2 *entries = vcpu->arch.cpuid_entries;
+ int nent = vcpu->arch.cpuid_nent;
struct kvm_cpuid_entry2 *best;
u64 xfeatures;

@@ -369,9 +369,11 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);

-static bool kvm_cpuid_has_hyperv(struct kvm_cpuid_entry2 *entries, int nent)
+static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
{
#ifdef CONFIG_KVM_HYPERV
+ struct kvm_cpuid_entry2 *entries = vcpu->arch.cpuid_entries;
+ int nent = vcpu->arch.cpuid_nent;
struct kvm_cpuid_entry2 *entry;

entry = cpuid_entry2_find(entries, nent, HYPERV_CPUID_INTERFACE,
@@ -436,8 +438,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
__cr4_reserved_bits(guest_cpuid_has, vcpu);
#undef __kvm_cpu_cap_has

- kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu->arch.cpuid_entries,
- vcpu->arch.cpuid_nent));
+ kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu));

/* Invoke the vendor callback only after the above state is updated. */
static_call(kvm_x86_vcpu_after_set_cpuid)(vcpu);
@@ -478,6 +479,15 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
{
int r;

+ /*
+ * Swap the existing (old) entries with the incoming (new) entries in
+ * order to massage the new entries, e.g. to account for dynamic bits
+ * that KVM controls, without clobbering the current guest CPUID, which
+ * KVM needs to preserve in order to unwind on failure.
+ */
+ swap(vcpu->arch.cpuid_entries, e2);
+ swap(vcpu->arch.cpuid_nent, nent);
+
/*
* KVM does not correctly handle changing guest CPUID after KVM_RUN, as
* MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
@@ -497,31 +507,25 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
* only because any change in CPUID is disallowed, i.e. using
* stale data is ok because KVM will reject the change.
*/
- __kvm_update_cpuid_runtime(vcpu, e2, nent);
+ kvm_update_cpuid_runtime(vcpu);

r = kvm_cpuid_check_equal(vcpu, e2, nent);
if (r)
- return r;
-
- kvfree(e2);
- return 0;
+ goto err;
+ goto success;
}

#ifdef CONFIG_KVM_HYPERV
- if (kvm_cpuid_has_hyperv(e2, nent)) {
+ if (kvm_cpuid_has_hyperv(vcpu)) {
r = kvm_hv_vcpu_init(vcpu);
if (r)
- return r;
+ goto err;
}
#endif

- r = kvm_check_cpuid(vcpu, e2, nent);
+ r = kvm_check_cpuid(vcpu);
if (r)
- return r;
-
- kvfree(vcpu->arch.cpuid_entries);
- vcpu->arch.cpuid_entries = e2;
- vcpu->arch.cpuid_nent = nent;
+ goto err;

vcpu->arch.kvm_cpuid = kvm_get_hypervisor_cpuid(vcpu, KVM_SIGNATURE);
#ifdef CONFIG_KVM_XEN
@@ -529,7 +533,14 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
#endif
kvm_vcpu_after_set_cpuid(vcpu);

+success:
+ kvfree(e2);
return 0;
+
+err:
+ swap(vcpu->arch.cpuid_entries, e2);
+ swap(vcpu->arch.cpuid_nent, nent);
+ return r;
}

/* when an old userspace process fills a new kernel module */
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:54:07

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 36/49] KVM: x86: Rename "governed features" helpers to use "guest_cpu_cap"

As the first step toward replacing KVM's so-called "governed features"
framework with a more comprehensive, less poorly named implementation,
replace the "kvm_governed_feature" function prefix with "guest_cpu_cap"
and rename guest_can_use() to guest_cpu_cap_has().

The "guest_cpu_cap" naming scheme mirrors that of "kvm_cpu_cap", and
provides a more clear distinction between guest capabilities, which are
KVM controlled (heh, or one might say "governed"), and guest CPUID, which
with few exceptions is fully userspace controlled.

Opportunistically rewrite the comment about XSS passthrough for SEV-ES
guests to avoid referencing so many functions, as such comments are prone
to becoming stale (case in point...).

No functional change intended.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/cpuid.h | 16 ++++++++--------
arch/x86/kvm/mmu.h | 2 +-
arch/x86/kvm/mmu/mmu.c | 4 ++--
arch/x86/kvm/svm/nested.c | 22 +++++++++++-----------
arch/x86/kvm/svm/sev.c | 17 ++++++++---------
arch/x86/kvm/svm/svm.c | 26 +++++++++++++-------------
arch/x86/kvm/svm/svm.h | 4 ++--
arch/x86/kvm/vmx/nested.c | 6 +++---
arch/x86/kvm/vmx/vmx.c | 16 ++++++++--------
arch/x86/kvm/x86.c | 4 ++--
11 files changed, 59 insertions(+), 60 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 16bb873188d6..286abefc93d5 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -407,7 +407,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
allow_gbpages = tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) :
guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES);
if (allow_gbpages)
- kvm_governed_feature_set(vcpu, X86_FEATURE_GBPAGES);
+ guest_cpu_cap_set(vcpu, X86_FEATURE_GBPAGES);

best = kvm_find_cpuid_entry(vcpu, 1);
if (best && apic) {
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index d68b7d879820..e021681f34ac 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -256,8 +256,8 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature)
return kvm_governed_feature_index(x86_feature) >= 0;
}

-static __always_inline void kvm_governed_feature_set(struct kvm_vcpu *vcpu,
- unsigned int x86_feature)
+static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
+ unsigned int x86_feature)
{
BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));

@@ -265,15 +265,15 @@ static __always_inline void kvm_governed_feature_set(struct kvm_vcpu *vcpu,
vcpu->arch.governed_features.enabled);
}

-static __always_inline void kvm_governed_feature_check_and_set(struct kvm_vcpu *vcpu,
- unsigned int x86_feature)
+static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
+ unsigned int x86_feature)
{
if (kvm_cpu_cap_has(x86_feature) && guest_cpuid_has(vcpu, x86_feature))
- kvm_governed_feature_set(vcpu, x86_feature);
+ guest_cpu_cap_set(vcpu, x86_feature);
}

-static __always_inline bool guest_can_use(struct kvm_vcpu *vcpu,
- unsigned int x86_feature)
+static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
+ unsigned int x86_feature)
{
BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));

@@ -283,7 +283,7 @@ static __always_inline bool guest_can_use(struct kvm_vcpu *vcpu,

static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
{
- if (guest_can_use(vcpu, X86_FEATURE_LAM))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_LAM))
cr3 &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);

return kvm_vcpu_is_legal_gpa(vcpu, cr3);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index dc80e72e4848..cf95ea5fe29d 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -150,7 +150,7 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)

static inline unsigned long kvm_get_active_cr3_lam_bits(struct kvm_vcpu *vcpu)
{
- if (!guest_can_use(vcpu, X86_FEATURE_LAM))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_LAM))
return 0;

return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5095fb46713e..e18a10c59431 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4966,7 +4966,7 @@ static void reset_guest_rsvds_bits_mask(struct kvm_vcpu *vcpu,
__reset_rsvds_bits_mask(&context->guest_rsvd_check,
vcpu->arch.reserved_gpa_bits,
context->cpu_role.base.level, is_efer_nx(context),
- guest_can_use(vcpu, X86_FEATURE_GBPAGES),
+ guest_cpu_cap_has(vcpu, X86_FEATURE_GBPAGES),
is_cr4_pse(context),
guest_cpuid_is_amd_compatible(vcpu));
}
@@ -5043,7 +5043,7 @@ static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
__reset_rsvds_bits_mask(shadow_zero_check, reserved_hpa_bits(),
context->root_role.level,
context->root_role.efer_nx,
- guest_can_use(vcpu, X86_FEATURE_GBPAGES),
+ guest_cpu_cap_has(vcpu, X86_FEATURE_GBPAGES),
is_pse, is_amd);

if (!shadow_me_mask)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 55b9a6d96bcf..2900a8e21257 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -107,7 +107,7 @@ static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)

static bool nested_vmcb_needs_vls_intercept(struct vcpu_svm *svm)
{
- if (!guest_can_use(&svm->vcpu, X86_FEATURE_V_VMSAVE_VMLOAD))
+ if (!guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_V_VMSAVE_VMLOAD))
return true;

if (!nested_npt_enabled(svm))
@@ -590,7 +590,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
vmcb_mark_dirty(vmcb02, VMCB_DR);
}

- if (unlikely(guest_can_use(vcpu, X86_FEATURE_LBRV) &&
+ if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
/*
* Reserved bits of DEBUGCTL are ignored. Be consistent with
@@ -647,7 +647,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
* exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
*/

- if (guest_can_use(vcpu, X86_FEATURE_VGIF) &&
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_VGIF) &&
(svm->nested.ctl.int_ctl & V_GIF_ENABLE_MASK))
int_ctl_vmcb12_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
else
@@ -685,7 +685,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,

vmcb02->control.tsc_offset = vcpu->arch.tsc_offset;

- if (guest_can_use(vcpu, X86_FEATURE_TSCRATEMSR) &&
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_TSCRATEMSR) &&
svm->tsc_ratio_msr != kvm_caps.default_tsc_scaling_ratio)
nested_svm_update_tsc_ratio_msr(vcpu);

@@ -706,7 +706,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
* what a nrips=0 CPU would do (L1 is responsible for advancing RIP
* prior to injecting the event).
*/
- if (guest_can_use(vcpu, X86_FEATURE_NRIPS))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
vmcb02->control.next_rip = svm->nested.ctl.next_rip;
else if (boot_cpu_has(X86_FEATURE_NRIPS))
vmcb02->control.next_rip = vmcb12_rip;
@@ -716,7 +716,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
svm->soft_int_injected = true;
svm->soft_int_csbase = vmcb12_csbase;
svm->soft_int_old_rip = vmcb12_rip;
- if (guest_can_use(vcpu, X86_FEATURE_NRIPS))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
svm->soft_int_next_rip = svm->nested.ctl.next_rip;
else
svm->soft_int_next_rip = vmcb12_rip;
@@ -724,18 +724,18 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,

vmcb02->control.virt_ext = vmcb01->control.virt_ext &
LBR_CTL_ENABLE_MASK;
- if (guest_can_use(vcpu, X86_FEATURE_LBRV))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV))
vmcb02->control.virt_ext |=
(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK);

if (!nested_vmcb_needs_vls_intercept(svm))
vmcb02->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;

- if (guest_can_use(vcpu, X86_FEATURE_PAUSEFILTER))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_PAUSEFILTER))
pause_count12 = svm->nested.ctl.pause_filter_count;
else
pause_count12 = 0;
- if (guest_can_use(vcpu, X86_FEATURE_PFTHRESHOLD))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_PFTHRESHOLD))
pause_thresh12 = svm->nested.ctl.pause_filter_thresh;
else
pause_thresh12 = 0;
@@ -1022,7 +1022,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
if (vmcb12->control.exit_code != SVM_EXIT_ERR)
nested_save_pending_event_to_vmcb12(svm, vmcb12);

- if (guest_can_use(vcpu, X86_FEATURE_NRIPS))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
vmcb12->control.next_rip = vmcb02->control.next_rip;

vmcb12->control.int_ctl = svm->nested.ctl.int_ctl;
@@ -1061,7 +1061,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
if (!nested_exit_on_intr(svm))
kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);

- if (unlikely(guest_can_use(vcpu, X86_FEATURE_LBRV) &&
+ if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
svm_copy_lbrs(vmcb12, vmcb02);
svm_update_lbrv(vcpu);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 57c2c8025547..7640dedc2ddc 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4409,16 +4409,15 @@ static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
* For SEV-ES, accesses to MSR_IA32_XSS should not be intercepted if
* the host/guest supports its use.
*
- * guest_can_use() checks a number of requirements on the host/guest to
- * ensure that MSR_IA32_XSS is available, but it might report true even
- * if X86_FEATURE_XSAVES isn't configured in the guest to ensure host
- * MSR_IA32_XSS is always properly restored. For SEV-ES, it is better
- * to further check that the guest CPUID actually supports
- * X86_FEATURE_XSAVES so that accesses to MSR_IA32_XSS by misbehaved
- * guests will still get intercepted and caught in the normal
- * kvm_emulate_rdmsr()/kvm_emulated_wrmsr() paths.
+ * KVM treats the guest as being capable of using XSAVES even if XSAVES
+ * isn't enabled in guest CPUID as there is no intercept for XSAVES,
+ * i.e. the guest can use XSAVES/XRSTOR to read/write XSS if XSAVE is
+ * exposed to the guest and XSAVES is supported in hardware. Condition
+ * full XSS passthrough on the guest being able to use XSAVES *and*
+ * XSAVES being exposed to the guest so that KVM can at least honor
+ * guest CPUID for RDMSR and WRMSR.
*/
- if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
else
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3d0549ca246f..2acd2e3bb1b0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1039,7 +1039,7 @@ void svm_update_lbrv(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
bool current_enable_lbrv = svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK;
bool enable_lbrv = (svm_get_lbr_vmcb(svm)->save.dbgctl & DEBUGCTLMSR_LBR) ||
- (is_guest_mode(vcpu) && guest_can_use(vcpu, X86_FEATURE_LBRV) &&
+ (is_guest_mode(vcpu) && guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK));

if (enable_lbrv == current_enable_lbrv)
@@ -2841,7 +2841,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
switch (msr_info->index) {
case MSR_AMD64_TSC_RATIO:
if (!msr_info->host_initiated &&
- !guest_can_use(vcpu, X86_FEATURE_TSCRATEMSR))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_TSCRATEMSR))
return 1;
msr_info->data = svm->tsc_ratio_msr;
break;
@@ -2991,7 +2991,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
switch (ecx) {
case MSR_AMD64_TSC_RATIO:

- if (!guest_can_use(vcpu, X86_FEATURE_TSCRATEMSR)) {
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_TSCRATEMSR)) {

if (!msr->host_initiated)
return 1;
@@ -3013,7 +3013,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)

svm->tsc_ratio_msr = data;

- if (guest_can_use(vcpu, X86_FEATURE_TSCRATEMSR) &&
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_TSCRATEMSR) &&
is_guest_mode(vcpu))
nested_svm_update_tsc_ratio_msr(vcpu);

@@ -4342,11 +4342,11 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
if (boot_cpu_has(X86_FEATURE_XSAVE) &&
boot_cpu_has(X86_FEATURE_XSAVES) &&
guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
- kvm_governed_feature_set(vcpu, X86_FEATURE_XSAVES);
+ guest_cpu_cap_set(vcpu, X86_FEATURE_XSAVES);

- kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_NRIPS);
- kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR);
- kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_LBRV);
+ guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_NRIPS);
+ guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR);
+ guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LBRV);

/*
* Intercept VMLOAD if the vCPU mode is Intel in order to emulate that
@@ -4354,12 +4354,12 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
* SVM on Intel is bonkers and extremely unlikely to work).
*/
if (!guest_cpuid_is_intel(vcpu))
- kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
+ guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);

- kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_PAUSEFILTER);
- kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
- kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VGIF);
- kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VNMI);
+ guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PAUSEFILTER);
+ guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
+ guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VGIF);
+ guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VNMI);

svm_recalc_instruction_intercepts(vcpu, svm);

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 97b3683ea324..08fd788d08df 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -487,7 +487,7 @@ static inline bool svm_is_intercept(struct vcpu_svm *svm, int bit)

static inline bool nested_vgif_enabled(struct vcpu_svm *svm)
{
- return guest_can_use(&svm->vcpu, X86_FEATURE_VGIF) &&
+ return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_VGIF) &&
(svm->nested.ctl.int_ctl & V_GIF_ENABLE_MASK);
}

@@ -539,7 +539,7 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)

static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
{
- return guest_can_use(&svm->vcpu, X86_FEATURE_VNMI) &&
+ return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_VNMI) &&
(svm->nested.ctl.int_ctl & V_NMI_ENABLE_MASK);
}

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d5b832126e34..fb7eec29681d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6488,7 +6488,7 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu,
vmx = to_vmx(vcpu);
vmcs12 = get_vmcs12(vcpu);

- if (guest_can_use(vcpu, X86_FEATURE_VMX) &&
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_VMX) &&
(vmx->nested.vmxon || vmx->nested.smm.vmxon)) {
kvm_state.hdr.vmx.vmxon_pa = vmx->nested.vmxon_ptr;
kvm_state.hdr.vmx.vmcs12_pa = vmx->nested.current_vmptr;
@@ -6629,7 +6629,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
if (kvm_state->flags & ~KVM_STATE_NESTED_EVMCS)
return -EINVAL;
} else {
- if (!guest_can_use(vcpu, X86_FEATURE_VMX))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
return -EINVAL;

if (!page_address_valid(vcpu, kvm_state->hdr.vmx.vmxon_pa))
@@ -6663,7 +6663,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
return -EINVAL;

if ((kvm_state->flags & KVM_STATE_NESTED_EVMCS) &&
- (!guest_can_use(vcpu, X86_FEATURE_VMX) ||
+ (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX) ||
!vmx->nested.enlightened_vmcs_enabled))
return -EINVAL;

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 51b2cd13250a..1bc56596d653 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2050,7 +2050,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
[msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
break;
case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
- if (!guest_can_use(vcpu, X86_FEATURE_VMX))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
return 1;
if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
&msr_info->data))
@@ -2360,7 +2360,7 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
if (!msr_info->host_initiated)
return 1; /* they are read-only */
- if (!guest_can_use(vcpu, X86_FEATURE_VMX))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
return 1;
return vmx_set_vmx_msr(vcpu, msr_index, data);
case MSR_IA32_RTIT_CTL:
@@ -4571,7 +4571,7 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
\
if (cpu_has_vmx_##name()) { \
if (kvm_is_governed_feature(X86_FEATURE_##feat_name)) \
- __enabled = guest_can_use(__vcpu, X86_FEATURE_##feat_name); \
+ __enabled = guest_cpu_cap_has(__vcpu, X86_FEATURE_##feat_name); \
else \
__enabled = guest_cpuid_has(__vcpu, X86_FEATURE_##feat_name); \
vmx_adjust_secondary_exec_control(vmx, exec_control, SECONDARY_EXEC_##ctrl_name,\
@@ -7838,10 +7838,10 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
*/
if (boot_cpu_has(X86_FEATURE_XSAVE) &&
guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
- kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_XSAVES);
+ guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_XSAVES);

- kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VMX);
- kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_LAM);
+ guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VMX);
+ guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LAM);

vmx_setup_uret_msrs(vmx);

@@ -7849,7 +7849,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
vmcs_set_secondary_exec_control(vmx,
vmx_secondary_exec_control(vmx));

- if (guest_can_use(vcpu, X86_FEATURE_VMX))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
vmx->msr_ia32_feature_control_valid_bits |=
FEAT_CTL_VMX_ENABLED_INSIDE_SMX |
FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;
@@ -7858,7 +7858,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
~(FEAT_CTL_VMX_ENABLED_INSIDE_SMX |
FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX);

- if (guest_can_use(vcpu, X86_FEATURE_VMX))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
nested_vmx_cr_fixed1_bits_update(vcpu);

if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7160c5ab8e3e..4ca9651b3f43 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1026,7 +1026,7 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
if (vcpu->arch.xcr0 != host_xcr0)
xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);

- if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
vcpu->arch.ia32_xss != host_xss)
wrmsrl(MSR_IA32_XSS, vcpu->arch.ia32_xss);
}
@@ -1057,7 +1057,7 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
if (vcpu->arch.xcr0 != host_xcr0)
xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0);

- if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
vcpu->arch.ia32_xss != host_xss)
wrmsrl(MSR_IA32_XSS, host_xss);
}
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:54:07

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 37/49] KVM: x86: Replace guts of "governed" features with comprehensive cpu_caps

Replace the internals of the governed features framework with a more
comprehensive "guest CPU capabilities" implementation, i.e. with a guest
version of kvm_cpu_caps. Keep the skeleton of governed features around
for now as vmx_adjust_sec_exec_control() relies on detecting governed
features to do the right thing for XSAVES, and switching all guest feature
queries to guest_cpu_cap_has() requires subtle and non-trivial changes,
i.e. is best done as a standalone change.

Tracking *all* guest capabilities that KVM cares about will allow excising the
poorly named "governed features" framework, and effectively optimizes all
KVM queries of guest capabilities, i.e. doesn't require making a
subjective decision as to whether or not a feature is worth "governing",
and doesn't require adding the code to do so.

The cost of tracking all features is currently 92 bytes per vCPU on 64-bit
kernels: 100 bytes for cpu_caps versus 8 bytes for governed_features.
That cost is well worth paying even if the only benefit was eliminating
the "governed features" terminology. And practically speaking, the real
cost is zero unless those 92 bytes push the size of vcpu_vmx or vcpu_svm
into a new order-N allocation, and if that happens there are better ways
to reduce the footprint of kvm_vcpu_arch, e.g. making the PMU and/or MTRR
state separate allocations.
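
For the curious, the arithmetic behind those numbers (a back-of-the-envelope
check; it assumes NR_KVM_CPU_CAPS works out to 25 words in this tree):

	sizeof(vcpu->arch.cpu_caps)       = 25 * sizeof(u32)  = 100 bytes
	sizeof(governed_features.enabled) = BITS_PER_LONG / 8 =   8 bytes
	net cost                          = 100 - 8           =  92 bytes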

Suggested-by: Maxim Levitsky <[email protected]>
Reviewed-by: Binbin Wu <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 45 +++++++++++++++++++++------------
arch/x86/kvm/cpuid.c | 14 +++++++---
arch/x86/kvm/cpuid.h | 12 ++++-----
arch/x86/kvm/reverse_cpuid.h | 16 ------------
4 files changed, 46 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3003e99155e7..8840d21ee0b5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -743,6 +743,22 @@ struct kvm_queued_exception {
bool has_payload;
};

+/*
+ * Hardware-defined CPUID leafs that are either scattered by the kernel or are
+ * unknown to the kernel, but need to be directly used by KVM. Note, these
+ * word values conflict with the kernel's "bug" caps, but KVM doesn't use those.
+ */
+enum kvm_only_cpuid_leafs {
+ CPUID_12_EAX = NCAPINTS,
+ CPUID_7_1_EDX,
+ CPUID_8000_0007_EDX,
+ CPUID_8000_0022_EAX,
+ CPUID_7_2_EDX,
+ NR_KVM_CPU_CAPS,
+
+ NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
+};
+
struct kvm_vcpu_arch {
/*
* rip and regs accesses must go through
@@ -861,23 +877,20 @@ struct kvm_vcpu_arch {
bool is_amd_compatible;

/*
- * FIXME: Drop this macro and use KVM_NR_GOVERNED_FEATURES directly
- * when "struct kvm_vcpu_arch" is no longer defined in an
- * arch/x86/include/asm header. The max is mostly arbitrary, i.e.
- * can be increased as necessary.
+ * cpu_caps holds the effective guest capabilities, i.e. the features
+ * the vCPU is allowed to use. Typically, but not always, features can
+ * be used by the guest if and only if both KVM and userspace want to
+ * expose the feature to the guest.
+ *
+ * A common exception is for virtualization holes, i.e. when KVM can't
+ * prevent the guest from using a feature, in which case the vCPU "has"
+ * the feature regardless of what KVM or userspace desires.
+ *
+ * Note, features that don't require KVM involvement in any way are
+ * NOT enforced/sanitized by KVM, i.e. are taken verbatim from the
+ * guest CPUID provided by userspace.
*/
-#define KVM_MAX_NR_GOVERNED_FEATURES BITS_PER_LONG
-
- /*
- * Track whether or not the guest is allowed to use features that are
- * governed by KVM, where "governed" means KVM needs to manage state
- * and/or explicitly enable the feature in hardware. Typically, but
- * not always, governed features can be used by the guest if and only
- * if both KVM and userspace want to expose the feature to the guest.
- */
- struct {
- DECLARE_BITMAP(enabled, KVM_MAX_NR_GOVERNED_FEATURES);
- } governed_features;
+ u32 cpu_caps[NR_KVM_CPU_CAPS];

u64 reserved_gpa_bits;
int maxphyaddr;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 286abefc93d5..89c506cf649b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -387,9 +387,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
struct kvm_cpuid_entry2 *best;
bool allow_gbpages;

- BUILD_BUG_ON(KVM_NR_GOVERNED_FEATURES > KVM_MAX_NR_GOVERNED_FEATURES);
- bitmap_zero(vcpu->arch.governed_features.enabled,
- KVM_MAX_NR_GOVERNED_FEATURES);
+ memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps));

kvm_update_cpuid_runtime(vcpu);

@@ -473,6 +471,7 @@ u64 kvm_vcpu_reserved_gpa_bits_raw(struct kvm_vcpu *vcpu)
static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
int nent)
{
+ u32 vcpu_caps[NR_KVM_CPU_CAPS];
int r;

/*
@@ -480,10 +479,18 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
* order to massage the new entries, e.g. to account for dynamic bits
* that KVM controls, without clobbering the current guest CPUID, which
* KVM needs to preserve in order to unwind on failure.
+ *
+ * Similarly, save the vCPU's current cpu_caps so that the capabilities
+ * can be updated alongside the CPUID entries when performing runtime
+ * updates. Full initialization is done if and only if the vCPU hasn't
+ * run, i.e. only if userspace is potentially changing CPUID features.
*/
swap(vcpu->arch.cpuid_entries, e2);
swap(vcpu->arch.cpuid_nent, nent);

+ memcpy(vcpu_caps, vcpu->arch.cpu_caps, sizeof(vcpu_caps));
+ BUILD_BUG_ON(sizeof(vcpu_caps) != sizeof(vcpu->arch.cpu_caps));
+
/*
* KVM does not correctly handle changing guest CPUID after KVM_RUN, as
* MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
@@ -527,6 +534,7 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
return 0;

err:
+ memcpy(vcpu->arch.cpu_caps, vcpu_caps, sizeof(vcpu_caps));
swap(vcpu->arch.cpuid_entries, e2);
swap(vcpu->arch.cpuid_nent, nent);
return r;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index e021681f34ac..ad0168d3aec5 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -259,10 +259,10 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature)
static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
unsigned int x86_feature)
{
- BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
+ unsigned int x86_leaf = __feature_leaf(x86_feature);

- __set_bit(kvm_governed_feature_index(x86_feature),
- vcpu->arch.governed_features.enabled);
+ reverse_cpuid_check(x86_leaf);
+ vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
}

static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
@@ -275,10 +275,10 @@ static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
unsigned int x86_feature)
{
- BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
+ unsigned int x86_leaf = __feature_leaf(x86_feature);

- return test_bit(kvm_governed_feature_index(x86_feature),
- vcpu->arch.governed_features.enabled);
+ reverse_cpuid_check(x86_leaf);
+ return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature);
}

static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
index 245f71c16272..63d5735fbc8a 100644
--- a/arch/x86/kvm/reverse_cpuid.h
+++ b/arch/x86/kvm/reverse_cpuid.h
@@ -6,22 +6,6 @@
#include <asm/cpufeature.h>
#include <asm/cpufeatures.h>

-/*
- * Hardware-defined CPUID leafs that are either scattered by the kernel or are
- * unknown to the kernel, but need to be directly used by KVM. Note, these
- * word values conflict with the kernel's "bug" caps, but KVM doesn't use those.
- */
-enum kvm_only_cpuid_leafs {
- CPUID_12_EAX = NCAPINTS,
- CPUID_7_1_EDX,
- CPUID_8000_0007_EDX,
- CPUID_8000_0022_EAX,
- CPUID_7_2_EDX,
- NR_KVM_CPU_CAPS,
-
- NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
-};
-
/*
* Define a KVM-only feature flag.
*
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:54:47

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 49/49] *** DO NOT APPLY *** KVM: x86: Verify KVM initializes all consumed guest caps

Assert that all features queried via guest_cpu_cap_has() are known to KVM,
i.e. that KVM doesn't check for a feature that can never actually be set.

This is for demonstration purposes only, as the proper way to enforce this
is to do post-processing at build time (and there are other shortcomings
of this PoC, e.g. it requires all KVM modules to be built-in).
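
Concretely, each guest_cpu_cap_has() call site expands to roughly the
following (a sketch of the wrapper macro below after preprocessing; the
.long operand is whatever the X86_FEATURE_* macro evaluates to, LAM is
just an example):

	asm volatile(".pushsection \"__kvm_features\",\"a\"\n"
		     ".balign 4\n"
		     ".long (12*32+26)\n"	/* e.g. X86_FEATURE_LAM */
		     ".popsection\n");
	__guest_cpu_cap_has(vcpu, X86_FEATURE_LAM);

i.e. every query leaves behind a 4-byte record in the __kvm_features
section, which kvm_validate_cpu_caps() walks at init time to flag
queries for features that were never marked as known during
kvm_cpu_cap_init().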

Not-signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 81 +++++++++++++++++++++++--------
arch/x86/kvm/cpuid.h | 16 +++++-
arch/x86/kvm/x86.c | 2 +
include/asm-generic/vmlinux.lds.h | 4 ++
4 files changed, 81 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0e64a6332052..18ded0e682f2 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -37,6 +37,7 @@ u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
EXPORT_SYMBOL_GPL(kvm_cpu_caps);

static u32 kvm_vmm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
+static u32 kvm_known_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;

u32 xstate_required_size(u64 xstate_bv, bool compacted)
{
@@ -143,6 +144,26 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
0; \
})

+/*
+ * Vendor Features - For features that KVM supports, but are added in later
+ * because they require additional vendor enabling.
+ */
+#define VEND_F(name) \
+({ \
+ KVM_VALIDATE_CPU_CAP_USAGE(name); \
+ 0; \
+})
+
+/*
+ * Operating System Features - For features that KVM dynamically sets/clears at
+ * runtime, e.g. when CR4 changes, but are never advertised to userspace.
+ */
+#define OS_F(name) \
+({ \
+ KVM_VALIDATE_CPU_CAP_USAGE(name); \
+ 0; \
+})
+
/*
* Magic value used by KVM when querying userspace-provided CPUID entries and
* doesn't care about the CPUID index because the index of the function in
@@ -727,6 +748,7 @@ do { \
u32 __leaf = __feature_leaf(X86_FEATURE_##name); \
\
BUILD_BUG_ON(__leaf != kvm_cpu_cap_init_in_progress); \
+ kvm_known_cpu_caps[__leaf] |= feature_bit(name); \
} while (0)

/*
@@ -771,14 +793,14 @@ void kvm_set_cpu_caps(void)
* NOTE: MONITOR (and MWAIT) are emulated as NOP, but *not*
* advertised to guests via CPUID!
*/
- F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64 */ | VMM_F(MWAIT) |
- 0 /* DS-CPL, VMX, SMX, EST */ |
+ F(XMM3) | F(PCLMULQDQ) | VEND_F(DTES64) | VMM_F(MWAIT) |
+ VEND_F(VMX) | 0 /* DS-CPL, SMX, EST */ |
0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |
F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
EMUL_F(TSC_DEADLINE_TIMER) | F(AES) | F(XSAVE) |
- 0 /* OSXSAVE */ | F(AVX) | F(F16C) | F(RDRAND) |
+ OS_F(OSXSAVE) | F(AVX) | F(F16C) | F(RDRAND) |
EMUL_F(HYPERVISOR)
);

@@ -788,7 +810,7 @@ void kvm_set_cpu_caps(void)
F(CX8) | F(APIC) | 0 /* Reserved */ | F(SEP) |
F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
F(PAT) | F(PSE36) | 0 /* PSN */ | F(CLFLUSH) |
- 0 /* Reserved, DS, ACPI */ | F(MMX) |
+ 0 /* Reserved */ | F(DS) | 0 /* ACPI */ | F(MMX) |
F(FXSR) | F(XMM) | F(XMM2) | F(SELFSNOOP) |
0 /* HTT, TM, Reserved, PBE */
);
@@ -796,17 +818,17 @@ void kvm_set_cpu_caps(void)
kvm_cpu_cap_init(CPUID_7_0_EBX,
F(FSGSBASE) | EMUL_F(TSC_ADJUST) | F(SGX) | F(BMI1) | F(HLE) |
F(AVX2) | F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) |
- F(INVPCID) | F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ |
+ F(INVPCID) | F(RTM) | F(ZERO_FCS_FDS) | VEND_F(MPX) |
F(AVX512F) | F(AVX512DQ) | F(RDSEED) | F(ADX) | F(SMAP) |
- F(AVX512IFMA) | F(CLFLUSHOPT) | F(CLWB) | 0 /*INTEL_PT*/ |
+ F(AVX512IFMA) | F(CLFLUSHOPT) | F(CLWB) | VEND_F(INTEL_PT) |
F(AVX512PF) | F(AVX512ER) | F(AVX512CD) | F(SHA_NI) |
F(AVX512BW) | F(AVX512VL));

kvm_cpu_cap_init(CPUID_7_ECX,
- F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
+ F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | OS_F(OSPKE) | F(RDPID) |
F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
- F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
+ F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | VEND_F(WAITPKG) |
F(SGX_LC) | F(BUS_LOCK_DETECT)
);

@@ -858,11 +880,11 @@ void kvm_set_cpu_caps(void)
);

kvm_cpu_cap_init(CPUID_8000_0001_ECX,
- F(LAHF_LM) | F(CMP_LEGACY) | 0 /*SVM*/ | 0 /* ExtApicSpace */ |
+ F(LAHF_LM) | F(CMP_LEGACY) | VEND_F(SVM) | 0 /* ExtApicSpace */ |
F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) |
F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM) |
- F(TOPOEXT) | 0 /* PERFCTR_CORE */
+ F(TOPOEXT) | VEND_F(PERFCTR_CORE)
);

kvm_cpu_cap_init(CPUID_8000_0001_EDX,
@@ -905,23 +927,22 @@ void kvm_set_cpu_caps(void)
kvm_cpu_cap_set(X86_FEATURE_AMD_SSBD);
if (!boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS))
kvm_cpu_cap_set(X86_FEATURE_AMD_SSB_NO);
- /*
- * The preference is to use SPEC CTRL MSR instead of the
- * VIRT_SPEC MSR.
- */
- if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) &&
- !boot_cpu_has(X86_FEATURE_AMD_SSBD))
- kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);

/*
* Hide all SVM features by default, SVM will set the cap bits for
* features it emulates and/or exposes for L1.
*/
- kvm_cpu_cap_init(CPUID_8000_000A_EDX, 0);
+ kvm_cpu_cap_init(CPUID_8000_000A_EDX,
+ VEND_F(VMCBCLEAN) | VEND_F(FLUSHBYASID) | VEND_F(NRIPS) |
+ VEND_F(TSCRATEMSR) | VEND_F(V_VMSAVE_VMLOAD) | VEND_F(LBRV) |
+ VEND_F(PAUSEFILTER) | VEND_F(PFTHRESHOLD) | VEND_F(VGIF) |
+ VEND_F(VNMI) | VEND_F(SVME_ADDR_CHK)
+ );

kvm_cpu_cap_init(CPUID_8000_001F_EAX,
- 0 /* SME */ | 0 /* SEV */ | 0 /* VM_PAGE_FLUSH */ | 0 /* SEV_ES */ |
- F(SME_COHERENT));
+ VEND_F(SME) | VEND_F(SEV) | 0 /* VM_PAGE_FLUSH */ | VEND_F(SEV_ES) |
+ F(SME_COHERENT)
+ );

kvm_cpu_cap_init(CPUID_8000_0021_EAX,
F(NO_NESTED_DATA_BP) | F(LFENCE_RDTSC) | 0 /* SmmPgCfgLock */ |
@@ -977,6 +998,26 @@ EXPORT_SYMBOL_GPL(kvm_set_cpu_caps);
#undef KVM_VALIDATE_CPU_CAP_USAGE
#define KVM_VALIDATE_CPU_CAP_USAGE(name)

+
+extern unsigned int __start___kvm_features[];
+extern unsigned int __stop___kvm_features[];
+
+void kvm_validate_cpu_caps(void)
+{
+ int i;
+
+ for (i = 0; i < __stop___kvm_features - __start___kvm_features; i++) {
+ u32 feature = __feature_translate(__start___kvm_features[i]);
+ u32 leaf = feature / 32;
+
+ if (kvm_known_cpu_caps[leaf] & BIT(feature & 31))
+ continue;
+
+ pr_warn("Word %u, bit %u (%lx) checked but not supported\n",
+ leaf, feature & 31, BIT(feature & 31));
+ }
+
+}
struct kvm_cpuid_array {
struct kvm_cpuid_entry2 *entries;
int maxnent;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 0bf3bddd0e29..32a86de980c7 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -10,6 +10,7 @@

extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
void kvm_set_cpu_caps(void);
+void kvm_validate_cpu_caps(void);

void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
@@ -245,8 +246,8 @@ static __always_inline void guest_cpu_cap_change(struct kvm_vcpu *vcpu,
guest_cpu_cap_clear(vcpu, x86_feature);
}

-static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
- unsigned int x86_feature)
+static __always_inline bool __guest_cpu_cap_has(struct kvm_vcpu *vcpu,
+ unsigned int x86_feature)
{
unsigned int x86_leaf = __feature_leaf(x86_feature);

@@ -254,6 +255,17 @@ static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature);
}

+#define guest_cpu_cap_has(vcpu, x86_feature) \
+({ \
+ asm volatile( \
+ " .pushsection \"__kvm_features\",\"a\"\n" \
+ " .balign 4\n" \
+ " .long " __stringify(x86_feature) " \n" \
+ " .popsection\n" \
+ ); \
+ __guest_cpu_cap_has(vcpu, x86_feature); \
+})
+
static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
{
if (guest_cpu_cap_has(vcpu, X86_FEATURE_LAM))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5aa7581802f7..f6b7c5c862fb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9790,6 +9790,8 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
if (r != 0)
goto out_mmu_exit;

+ kvm_validate_cpu_caps();
+
kvm_ops_update(ops);

for_each_online_cpu(cpu) {
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index f7749d0f2562..102fc2a39083 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -533,6 +533,10 @@
BOUNDED_SECTION_BY(__modver, ___modver) \
} \
\
+ __kvm_features : AT(ADDR(__kvm_features) - LOAD_OFFSET) { \
+ BOUNDED_SECTION_BY(__kvm_features, ___kvm_features) \
+ } \
+ \
KCFI_TRAPS \
\
RO_EXCEPTION_TABLE \
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:55:00

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 42/49] KVM: x86: Drop unnecessary check that cpuid_entry2_find() returns right leaf

Drop an unnecessary check that kvm_find_cpuid_entry_index(), i.e.
cpuid_entry2_find(), returns the correct leaf when getting CPUID.0x7.0x0
to update X86_FEATURE_OSPKE. cpuid_entry2_find() never returns an entry
for the wrong function. And not that it matters, but cpuid_entry2_find()
will always return a precise match for CPUID.0x7.0x0 since the index is
significant.

No functional change intended.
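
For reference, a sketch of why the check was dead code, given the lookup
semantics described above:

	best = kvm_find_cpuid_entry_index(vcpu, 7, 0);
	/* If best is non-NULL, then best->function == 7 and
	 * best->index == 0 by construction, i.e. the additional
	 * "best->function == 0x7" check could never fail. */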

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 258c5fce87fc..8256fc657c6b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -351,7 +351,7 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
}

best = kvm_find_cpuid_entry_index(vcpu, 7, 0);
- if (best && boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7)
+ if (best && boot_cpu_has(X86_FEATURE_PKU))
cpuid_entry_change(best, X86_FEATURE_OSPKE,
kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));

--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:55:10

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 43/49] KVM: x86: Update OS{XSAVE,PKE} bits in guest CPUID irrespective of host support

When making runtime CPUID updates, change OSXSAVE and OSPKE even if their
respective base features (XSAVE, PKU) are not supported by the host. KVM
already incorporates host support in the vCPU's effective reserved CR4 bits.
I.e. OSXSAVE and OSPKE can be set if and only if the host supports them.

And conversely, since KVM's ABI is that KVM owns the dynamic OS feature
flags, clearing them when they obviously aren't supported and thus can't
be enabled is arguably a fix.
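
A sketch of the invariant being relied on, using the existing CR4
validity check (any guest attempt to set CR4.OSXSAVE or CR4.PKE is
rejected before the runtime CPUID update can observe it):

	/* __cr4_reserved_bits() marks CR4.OSXSAVE/CR4.PKE reserved when
	 * XSAVE/PKU is unsupported, so kvm_is_cr4_bit_set() can return
	 * true for those bits only if the host and KVM support the
	 * underlying feature. */
	if (cr4 & vcpu->arch.cr4_guest_rsvd_bits)
		return false;	/* CR4 write fails, bit stays clear */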

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 8256fc657c6b..552e65ba5efa 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -336,10 +336,8 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)

best = kvm_find_cpuid_entry(vcpu, 1);
if (best) {
- /* Update OSXSAVE bit */
- if (boot_cpu_has(X86_FEATURE_XSAVE))
- cpuid_entry_change(best, X86_FEATURE_OSXSAVE,
- kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE));
+ cpuid_entry_change(best, X86_FEATURE_OSXSAVE,
+ kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE));

cpuid_entry_change(best, X86_FEATURE_APIC,
vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
@@ -351,7 +349,7 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
}

best = kvm_find_cpuid_entry_index(vcpu, 7, 0);
- if (best && boot_cpu_has(X86_FEATURE_PKU))
+ if (best)
cpuid_entry_change(best, X86_FEATURE_OSPKE,
kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));

--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:55:14

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 41/49] KVM: x86: Avoid double CPUID lookup when updating MWAIT at runtime

Move the handling of X86_FEATURE_MWAIT during CPUID runtime updates to
utilize the lookup done for other CPUID.0x1 features.

No functional change intended.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 8ada1cac8fcb..258c5fce87fc 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -343,6 +343,11 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)

cpuid_entry_change(best, X86_FEATURE_APIC,
vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
+
+ if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT))
+ cpuid_entry_change(best, X86_FEATURE_MWAIT,
+ vcpu->arch.ia32_misc_enable_msr &
+ MSR_IA32_MISC_ENABLE_MWAIT);
}

best = kvm_find_cpuid_entry_index(vcpu, 7, 0);
@@ -358,14 +363,6 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
-
- if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
- best = kvm_find_cpuid_entry(vcpu, 0x1);
- if (best)
- cpuid_entry_change(best, X86_FEATURE_MWAIT,
- vcpu->arch.ia32_misc_enable_msr &
- MSR_IA32_MISC_ENABLE_MWAIT);
- }
}
EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);

--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:56:05

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 44/49] KVM: x86: Update guest cpu_caps at runtime for dynamic CPUID-based features

When updating guest CPUID entries to emulate runtime behavior, e.g. when
the guest enables a CR4-based feature that is tied to a CPUID flag, also
update the vCPU's cpu_caps accordingly. This will allow replacing all
usage of guest_cpuid_has() with guest_cpu_cap_has().

Note, this relies on kvm_set_cpuid() taking a snapshot of cpu_caps before
invoking kvm_update_cpuid_runtime(), i.e. when KVM is updating CPUID
entries that *may* become the vCPU's CPUID, so that unwinding to the old
cpu_caps is possible if userspace tries to set bogus CPUID information.

Note #2, none of the features in question use guest_cpu_cap_has() at this
time, i.e. aside from setting bits in cpu_caps, this is a glorified nop.
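
For reference, the resulting flow in kvm_set_cpuid(), condensed from the
diffs earlier in the series (error handling abbreviated):

	u32 vcpu_caps[NR_KVM_CPU_CAPS];

	swap(vcpu->arch.cpuid_entries, e2);	/* stage the new entries */
	swap(vcpu->arch.cpuid_nent, nent);
	memcpy(vcpu_caps, vcpu->arch.cpu_caps, sizeof(vcpu_caps));

	kvm_update_cpuid_runtime(vcpu);		/* may touch entries + caps */

	r = kvm_check_cpuid(vcpu);
	if (r) {				/* unwind on failure */
		memcpy(vcpu->arch.cpu_caps, vcpu_caps, sizeof(vcpu_caps));
		swap(vcpu->arch.cpuid_entries, e2);
		swap(vcpu->arch.cpuid_nent, nent);
	}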

Cc: Yang Weijiang <[email protected]>
Cc: Robert Hoo <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 552e65ba5efa..1424a9d4eb17 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -330,28 +330,38 @@ static u64 cpuid_get_supported_xcr0(struct kvm_vcpu *vcpu)
return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
}

+static __always_inline void kvm_update_feature_runtime(struct kvm_vcpu *vcpu,
+ struct kvm_cpuid_entry2 *entry,
+ unsigned int x86_feature,
+ bool has_feature)
+{
+ cpuid_entry_change(entry, x86_feature, has_feature);
+ guest_cpu_cap_change(vcpu, x86_feature, has_feature);
+}
+
void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;

best = kvm_find_cpuid_entry(vcpu, 1);
if (best) {
- cpuid_entry_change(best, X86_FEATURE_OSXSAVE,
- kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE));
+ kvm_update_feature_runtime(vcpu, best, X86_FEATURE_OSXSAVE,
+ kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE));

- cpuid_entry_change(best, X86_FEATURE_APIC,
- vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
+ kvm_update_feature_runtime(vcpu, best, X86_FEATURE_APIC,
+ vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);

if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT))
- cpuid_entry_change(best, X86_FEATURE_MWAIT,
- vcpu->arch.ia32_misc_enable_msr &
- MSR_IA32_MISC_ENABLE_MWAIT);
+ kvm_update_feature_runtime(vcpu, best, X86_FEATURE_MWAIT,
+ vcpu->arch.ia32_misc_enable_msr &
+ MSR_IA32_MISC_ENABLE_MWAIT);
}

best = kvm_find_cpuid_entry_index(vcpu, 7, 0);
if (best)
- cpuid_entry_change(best, X86_FEATURE_OSPKE,
- kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
+ kvm_update_feature_runtime(vcpu, best, X86_FEATURE_OSPKE,
+ kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
+

best = kvm_find_cpuid_entry_index(vcpu, 0xD, 0);
if (best)
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:57:10

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 47/49] KVM: x86: Drop superfluous host XSAVE check when adjusting guest XSAVES caps

Drop the manual boot_cpu_has() checks on XSAVE when adjusting the guest's
XSAVES capabilities now that guest cpu_caps incorporates KVM's support.
The guest's cpu_caps are initialized from kvm_cpu_caps, which are in turn
initialized from boot_cpu_data, i.e. checking guest_cpu_cap_has() also
checks host/KVM capabilities (which is the entire point of cpu_caps).
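
Roughly, the derivation chain that makes the explicit XSAVE check
redundant:

	boot_cpu_data               /* hardware + kernel features */
	     |  masked by kvm_set_cpu_caps() at module load
	     v
	kvm_cpu_caps[]              /* what KVM can virtualize */
	     |  combined with guest CPUID in kvm_vcpu_after_set_cpuid()
	     v
	vcpu->arch.cpu_caps[]       /* what this vCPU may actually use */

Each level is (modulo emulated and virtualization-hole features) a
subset of the level above it, so guest_cpu_cap_has(vcpu,
X86_FEATURE_XSAVE) already implies boot_cpu_has(X86_FEATURE_XSAVE).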

Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 1 -
arch/x86/kvm/vmx/vmx.c | 3 +--
2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 06770b60c0ba..4aaffbf22531 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4340,7 +4340,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
* the guest read/write access to the host's XSS.
*/
guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES,
- boot_cpu_has(X86_FEATURE_XSAVE) &&
boot_cpu_has(X86_FEATURE_XSAVES) &&
guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE));

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 741961a1edcc..6fbdf520c58b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7833,8 +7833,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
* to the guest. XSAVES depends on CR4.OSXSAVE, and CR4.OSXSAVE can be
* set if and only if XSAVE is supported.
*/
- if (!boot_cpu_has(X86_FEATURE_XSAVE) ||
- !guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE))
guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);

vmx_setup_uret_msrs(vmx);
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:57:19

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 46/49] KVM: x86: Replace (almost) all guest CPUID feature queries with cpu_caps

Switch all queries (except XSAVES) of guest features from guest CPUID to
guest capabilities, i.e. replace all calls to guest_cpuid_has() with calls
to guest_cpu_cap_has().

Keep guest_cpuid_has() around for XSAVES, but subsume its helper
guest_cpuid_get_register() and add a compile-time assertion to prevent
using guest_cpuid_has() for any other feature. Add yet another comment
for XSAVES to explain why KVM is allowed to query its raw guest CPUID.
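
As a usage note, the assertion turns any other query into a build
failure, e.g. (illustrative, not from the patch):

	/* BUILD_BUG_ON() fires at compile time: guest_cpuid_has() is
	 * now restricted to X86_FEATURE_XSAVES. */
	bool bad = guest_cpuid_has(vcpu, X86_FEATURE_LAM);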

Opportunistically drop the unused guest_cpuid_clear(), as there should be
no circumstance in which KVM needs to _clear_ a guest CPUID feature now
that everything is tracked via cpu_caps. E.g. KVM may need to _change_
a feature to emulate dynamic CPUID flags, but KVM should never need to
clear a feature in guest CPUID to prevent it from being used by the guest.

Delete the last remnants of the governed features framework, as the lone
holdout was vmx_adjust_secondary_exec_control()'s divergent behavior for
governed vs. ungoverned features.

Note, replacing guest_cpuid_has() checks with guest_cpu_cap_has() when
computing reserved CR4 bits is a nop when viewed as a whole, as KVM's
capabilities are already incorporated into the calculation, i.e. if a
feature is present in guest CPUID but unsupported by KVM, its CR4 bit
was already being marked as reserved; checking guest_cpu_cap_has() simply
double-stamps that it's a reserved bit.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 4 +-
arch/x86/kvm/cpuid.h | 74 +++++++++++---------------------
arch/x86/kvm/governed_features.h | 22 ----------
arch/x86/kvm/hyperv.c | 2 +-
arch/x86/kvm/lapic.c | 2 +-
arch/x86/kvm/mtrr.c | 2 +-
arch/x86/kvm/smm.c | 10 ++---
arch/x86/kvm/svm/pmu.c | 8 ++--
arch/x86/kvm/svm/sev.c | 4 +-
arch/x86/kvm/svm/svm.c | 20 ++++-----
arch/x86/kvm/vmx/hyperv.h | 2 +-
arch/x86/kvm/vmx/nested.c | 12 +++---
arch/x86/kvm/vmx/pmu_intel.c | 4 +-
arch/x86/kvm/vmx/sgx.c | 14 +++---
arch/x86/kvm/vmx/vmx.c | 47 ++++++++++----------
arch/x86/kvm/x86.c | 64 +++++++++++++--------------
16 files changed, 121 insertions(+), 170 deletions(-)
delete mode 100644 arch/x86/kvm/governed_features.h

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1424a9d4eb17..0130e0677387 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -463,7 +463,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
* and can install smaller shadow pages if the host lacks 1GiB support.
*/
allow_gbpages = tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) :
- guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES);
+ guest_cpu_cap_has(vcpu, X86_FEATURE_GBPAGES);
guest_cpu_cap_change(vcpu, X86_FEATURE_GBPAGES, allow_gbpages);

best = kvm_find_cpuid_entry(vcpu, 1);
@@ -488,7 +488,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)

#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
vcpu->arch.cr4_guest_rsvd_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_) |
- __cr4_reserved_bits(guest_cpuid_has, vcpu);
+ __cr4_reserved_bits(guest_cpu_cap_has, vcpu);
#undef __kvm_cpu_cap_has

kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu));
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 7be56fa62342..0bf3bddd0e29 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -67,41 +67,38 @@ static __always_inline void cpuid_entry_override(struct kvm_cpuid_entry2 *entry,
*reg = kvm_cpu_caps[leaf];
}

-static __always_inline u32 *guest_cpuid_get_register(struct kvm_vcpu *vcpu,
- unsigned int x86_feature)
+static __always_inline bool guest_cpuid_has(struct kvm_vcpu *vcpu,
+ unsigned int x86_feature)
{
const struct cpuid_reg cpuid = x86_feature_cpuid(x86_feature);
struct kvm_cpuid_entry2 *entry;
+ u32 *reg;
+
+ /*
+ * XSAVES is a special snowflake. Due to lack of a dedicated intercept
+ * on SVM, KVM must assume that XSAVES (and thus XRSTORS) is usable by
+ * the guest if the host supports XSAVES and *XSAVE* is exposed to the
+ * guest. Although the guest can read/write XSS via XSAVES/XRSTORS, to
+ * minimize the virtualization hole, KVM rejects attempts to read/write
+ * XSS via RDMSR/WRMSR. To make that work, KVM needs to check the raw
+ * guest CPUID, not KVM's view of guest capabilities.
+ *
+ * For all other features, guest capabilities are accurate. Expand
+ * this allowlist with extreme vigilance.
+ */
+ BUILD_BUG_ON(x86_feature != X86_FEATURE_XSAVES);

entry = kvm_find_cpuid_entry_index(vcpu, cpuid.function, cpuid.index);
if (!entry)
return NULL;

- return __cpuid_entry_get_reg(entry, cpuid.reg);
-}
-
-static __always_inline bool guest_cpuid_has(struct kvm_vcpu *vcpu,
- unsigned int x86_feature)
-{
- u32 *reg;
-
- reg = guest_cpuid_get_register(vcpu, x86_feature);
+ reg = __cpuid_entry_get_reg(entry, cpuid.reg);
if (!reg)
return false;

return *reg & __feature_bit(x86_feature);
}

-static __always_inline void guest_cpuid_clear(struct kvm_vcpu *vcpu,
- unsigned int x86_feature)
-{
- u32 *reg;
-
- reg = guest_cpuid_get_register(vcpu, x86_feature);
- if (reg)
- *reg &= ~__feature_bit(x86_feature);
-}
-
static inline bool guest_cpuid_is_amd_or_hygon(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
@@ -220,27 +217,6 @@ static __always_inline bool guest_pv_has(struct kvm_vcpu *vcpu,
return vcpu->arch.pv_cpuid.features & (1u << kvm_feature);
}

-enum kvm_governed_features {
-#define KVM_GOVERNED_FEATURE(x) KVM_GOVERNED_##x,
-#include "governed_features.h"
- KVM_NR_GOVERNED_FEATURES
-};
-
-static __always_inline int kvm_governed_feature_index(unsigned int x86_feature)
-{
- switch (x86_feature) {
-#define KVM_GOVERNED_FEATURE(x) case x: return KVM_GOVERNED_##x;
-#include "governed_features.h"
- default:
- return -1;
- }
-}
-
-static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature)
-{
- return kvm_governed_feature_index(x86_feature) >= 0;
-}
-
static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
unsigned int x86_feature)
{
@@ -288,17 +264,17 @@ static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr

static inline bool guest_has_spec_ctrl_msr(struct kvm_vcpu *vcpu)
{
- return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
- guest_cpuid_has(vcpu, X86_FEATURE_AMD_STIBP) ||
- guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS) ||
- guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD));
+ return (guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
+ guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_STIBP) ||
+ guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBRS) ||
+ guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_SSBD));
}

static inline bool guest_has_pred_cmd_msr(struct kvm_vcpu *vcpu)
{
- return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
- guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBPB) ||
- guest_cpuid_has(vcpu, X86_FEATURE_SBPB));
+ return (guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
+ guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBPB) ||
+ guest_cpu_cap_has(vcpu, X86_FEATURE_SBPB));
}

#endif
diff --git a/arch/x86/kvm/governed_features.h b/arch/x86/kvm/governed_features.h
deleted file mode 100644
index ad463b1ed4e4..000000000000
--- a/arch/x86/kvm/governed_features.h
+++ /dev/null
@@ -1,22 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#if !defined(KVM_GOVERNED_FEATURE) || defined(KVM_GOVERNED_X86_FEATURE)
-BUILD_BUG()
-#endif
-
-#define KVM_GOVERNED_X86_FEATURE(x) KVM_GOVERNED_FEATURE(X86_FEATURE_##x)
-
-KVM_GOVERNED_X86_FEATURE(GBPAGES)
-KVM_GOVERNED_X86_FEATURE(XSAVES)
-KVM_GOVERNED_X86_FEATURE(VMX)
-KVM_GOVERNED_X86_FEATURE(NRIPS)
-KVM_GOVERNED_X86_FEATURE(TSCRATEMSR)
-KVM_GOVERNED_X86_FEATURE(V_VMSAVE_VMLOAD)
-KVM_GOVERNED_X86_FEATURE(LBRV)
-KVM_GOVERNED_X86_FEATURE(PAUSEFILTER)
-KVM_GOVERNED_X86_FEATURE(PFTHRESHOLD)
-KVM_GOVERNED_X86_FEATURE(VGIF)
-KVM_GOVERNED_X86_FEATURE(VNMI)
-KVM_GOVERNED_X86_FEATURE(LAM)
-
-#undef KVM_GOVERNED_X86_FEATURE
-#undef KVM_GOVERNED_FEATURE
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 8a47f8541eab..4971b60a1882 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1352,7 +1352,7 @@ static void __kvm_hv_xsaves_xsavec_maybe_warn(struct kvm_vcpu *vcpu)
return;

if (guest_cpuid_has(vcpu, X86_FEATURE_XSAVES) ||
- !guest_cpuid_has(vcpu, X86_FEATURE_XSAVEC))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVEC))
return;

pr_notice_ratelimited("Booting SMP Windows KVM VM with !XSAVES && XSAVEC. "
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index ebf41023be38..37a2ecee3d75 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -590,7 +590,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
* version first and level-triggered interrupts never get EOIed in
* IOAPIC.
*/
- if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC) &&
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) &&
!ioapic_in_kernel(vcpu->kvm))
v |= APIC_LVR_DIRECTED_EOI;
kvm_lapic_set_reg(apic, APIC_LVR, v);
diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index a67c28a56417..9e8cb38ae1db 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -128,7 +128,7 @@ static u8 mtrr_disabled_type(struct kvm_vcpu *vcpu)
* enable MTRRs and it is obviously undesirable to run the
* guest entirely with UC memory and we use WB.
*/
- if (guest_cpuid_has(vcpu, X86_FEATURE_MTRR))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_MTRR))
return MTRR_TYPE_UNCACHABLE;
else
return MTRR_TYPE_WRBACK;
diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index d06d43d8d2aa..9144b28789df 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -283,7 +283,7 @@ void enter_smm(struct kvm_vcpu *vcpu)
memset(smram.bytes, 0, sizeof(smram.bytes));

#ifdef CONFIG_X86_64
- if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
enter_smm_save_state_64(vcpu, &smram.smram64);
else
#endif
@@ -353,7 +353,7 @@ void enter_smm(struct kvm_vcpu *vcpu)
kvm_set_segment(vcpu, &ds, VCPU_SREG_SS);

#ifdef CONFIG_X86_64
- if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
if (static_call(kvm_x86_set_efer)(vcpu, 0))
goto error;
#endif
@@ -586,7 +586,7 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
* supports long mode.
*/
#ifdef CONFIG_X86_64
- if (guest_cpuid_has(vcpu, X86_FEATURE_LM)) {
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_LM)) {
struct kvm_segment cs_desc;
unsigned long cr4;

@@ -609,7 +609,7 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
kvm_set_cr0(vcpu, cr0 & ~(X86_CR0_PG | X86_CR0_PE));

#ifdef CONFIG_X86_64
- if (guest_cpuid_has(vcpu, X86_FEATURE_LM)) {
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_LM)) {
unsigned long cr4, efer;

/* Clear CR4.PAE before clearing EFER.LME. */
@@ -632,7 +632,7 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
return X86EMUL_UNHANDLEABLE;

#ifdef CONFIG_X86_64
- if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
return rsm_load_state_64(ctxt, &smram.smram64);
else
#endif
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index dfcc38bd97d3..4a4be2da1345 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -46,7 +46,7 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,

switch (msr) {
case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
- if (!guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PERFCTR_CORE))
return NULL;
/*
* Each PMU counter has a pair of CTL and CTR MSRs. CTLn
@@ -109,7 +109,7 @@ static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
case MSR_K7_EVNTSEL0 ... MSR_K7_PERFCTR3:
return pmu->version > 0;
case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
- return guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE);
+ return guest_cpu_cap_has(vcpu, X86_FEATURE_PERFCTR_CORE);
case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
@@ -179,7 +179,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
union cpuid_0x80000022_ebx ebx;

pmu->version = 1;
- if (guest_cpuid_has(vcpu, X86_FEATURE_PERFMON_V2)) {
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_PERFMON_V2)) {
pmu->version = 2;
/*
* Note, PERFMON_V2 is also in 0x80000022.0x0, i.e. the guest
@@ -189,7 +189,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
x86_feature_cpuid(X86_FEATURE_PERFMON_V2).index);
ebx.full = kvm_find_cpuid_entry_index(vcpu, 0x80000022, 0)->ebx;
pmu->nr_arch_gp_counters = ebx.split.num_core_pmc;
- } else if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE)) {
+ } else if (guest_cpu_cap_has(vcpu, X86_FEATURE_PERFCTR_CORE)) {
pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;
} else {
pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7640dedc2ddc..1004280599b4 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4399,8 +4399,8 @@ static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
struct kvm_vcpu *vcpu = &svm->vcpu;

if (boot_cpu_has(X86_FEATURE_V_TSC_AUX)) {
- bool v_tsc_aux = guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) ||
- guest_cpuid_has(vcpu, X86_FEATURE_RDPID);
+ bool v_tsc_aux = guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) ||
+ guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID);

set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, v_tsc_aux, v_tsc_aux);
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 946a75771946..06770b60c0ba 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1178,14 +1178,14 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
*/
if (kvm_cpu_cap_has(X86_FEATURE_INVPCID)) {
if (!npt_enabled ||
- !guest_cpuid_has(&svm->vcpu, X86_FEATURE_INVPCID))
+ !guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_INVPCID))
svm_set_intercept(svm, INTERCEPT_INVPCID);
else
svm_clr_intercept(svm, INTERCEPT_INVPCID);
}

if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP)) {
- if (guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP))
svm_clr_intercept(svm, INTERCEPT_RDTSCP);
else
svm_set_intercept(svm, INTERCEPT_RDTSCP);
@@ -2911,7 +2911,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
case MSR_AMD64_VIRT_SPEC_CTRL:
if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_VIRT_SSBD))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_VIRT_SSBD))
return 1;

msr_info->data = svm->virt_spec_ctrl;
@@ -3058,7 +3058,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
break;
case MSR_AMD64_VIRT_SPEC_CTRL:
if (!msr->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_VIRT_SSBD))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_VIRT_SSBD))
return 1;

if (data & ~SPEC_CTRL_SSBD)
@@ -3230,7 +3230,7 @@ static int invpcid_interception(struct kvm_vcpu *vcpu)
unsigned long type;
gva_t gva;

- if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_INVPCID)) {
kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
}
@@ -4342,7 +4342,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES,
boot_cpu_has(X86_FEATURE_XSAVE) &&
boot_cpu_has(X86_FEATURE_XSAVES) &&
- guest_cpuid_has(vcpu, X86_FEATURE_XSAVE));
+ guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE));

/*
* Intercept VMLOAD if the vCPU mode is Intel in order to emulate that
@@ -4360,7 +4360,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)

if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0,
- !!guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
+ !!guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));

if (sev_guest(vcpu->kvm))
sev_vcpu_after_set_cpuid(svm);
@@ -4617,7 +4617,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram)
* responsible for ensuring nested SVM and SMIs are mutually exclusive.
*/

- if (!guest_cpuid_has(vcpu, X86_FEATURE_LM))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
return 1;

smram->smram64.svm_guest_flag = 1;
@@ -4664,14 +4664,14 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)

const struct kvm_smram_state_64 *smram64 = &smram->smram64;

- if (!guest_cpuid_has(vcpu, X86_FEATURE_LM))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
return 0;

/* Non-zero if SMI arrived while vCPU was in guest mode. */
if (!smram64->svm_guest_flag)
return 0;

- if (!guest_cpuid_has(vcpu, X86_FEATURE_SVM))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SVM))
return 1;

if (!(smram64->efer & EFER_SVME))
diff --git a/arch/x86/kvm/vmx/hyperv.h b/arch/x86/kvm/vmx/hyperv.h
index a87407412615..11a339009781 100644
--- a/arch/x86/kvm/vmx/hyperv.h
+++ b/arch/x86/kvm/vmx/hyperv.h
@@ -42,7 +42,7 @@ static inline struct hv_enlightened_vmcs *nested_vmx_evmcs(struct vcpu_vmx *vmx)
return vmx->nested.hv_evmcs;
}

-static inline bool guest_cpuid_has_evmcs(struct kvm_vcpu *vcpu)
+static inline bool guest_cpu_cap_has_evmcs(struct kvm_vcpu *vcpu)
{
/*
* eVMCS is exposed to the guest if Hyper-V is enabled in CPUID and
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index fb7eec29681d..fcba0061083d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -259,7 +259,7 @@ static bool nested_evmcs_handle_vmclear(struct kvm_vcpu *vcpu, gpa_t vmptr)
* state. It is possible that the area will stay mapped as
* vmx->nested.hv_evmcs but this shouldn't be a problem.
*/
- if (!guest_cpuid_has_evmcs(vcpu) ||
+ if (!guest_cpu_cap_has_evmcs(vcpu) ||
!evmptr_is_valid(nested_get_evmptr(vcpu)))
return false;

@@ -2061,7 +2061,7 @@ static enum nested_evmptrld_status nested_vmx_handle_enlightened_vmptrld(
bool evmcs_gpa_changed = false;
u64 evmcs_gpa;

- if (likely(!guest_cpuid_has_evmcs(vcpu)))
+ if (likely(!guest_cpu_cap_has_evmcs(vcpu)))
return EVMPTRLD_DISABLED;

evmcs_gpa = nested_get_evmptr(vcpu);
@@ -2947,7 +2947,7 @@ static int nested_vmx_check_controls(struct kvm_vcpu *vcpu,
return -EINVAL;

#ifdef CONFIG_KVM_HYPERV
- if (guest_cpuid_has_evmcs(vcpu))
+ if (guest_cpu_cap_has_evmcs(vcpu))
return nested_evmcs_check_controls(vmcs12);
#endif

@@ -3231,7 +3231,7 @@ static bool nested_get_evmcs_page(struct kvm_vcpu *vcpu)
* L2 was running), map it here to make sure vmcs12 changes are
* properly reflected.
*/
- if (guest_cpuid_has_evmcs(vcpu) &&
+ if (guest_cpu_cap_has_evmcs(vcpu) &&
vmx->nested.hv_evmcs_vmptr == EVMPTR_MAP_PENDING) {
enum nested_evmptrld_status evmptrld_status =
nested_vmx_handle_enlightened_vmptrld(vcpu, false);
@@ -4882,7 +4882,7 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
* doesn't isolate different VMCSs, i.e. in this case, doesn't provide
* separate modes for L2 vs L1.
*/
- if (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL))
indirect_branch_prediction_barrier();

/* Update any VMCS fields that might have changed while L2 ran */
@@ -6152,7 +6152,7 @@ static bool nested_vmx_exit_handled_encls(struct kvm_vcpu *vcpu,
{
u32 encls_leaf;

- if (!guest_cpuid_has(vcpu, X86_FEATURE_SGX) ||
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SGX) ||
!nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENCLS_EXITING))
return false;

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index be40474de6e4..a739defa6796 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -110,7 +110,7 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,

static inline u64 vcpu_get_perf_capabilities(struct kvm_vcpu *vcpu)
{
- if (!guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
return 0;

return vcpu->arch.perf_capabilities;
@@ -160,7 +160,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
ret = vcpu_get_perf_capabilities(vcpu) & PERF_CAP_PEBS_FORMAT;
break;
case MSR_IA32_DS_AREA:
- ret = guest_cpuid_has(vcpu, X86_FEATURE_DS);
+ ret = guest_cpu_cap_has(vcpu, X86_FEATURE_DS);
break;
case MSR_PEBS_DATA_CFG:
perf_capabilities = vcpu_get_perf_capabilities(vcpu);
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 6fef01e0536e..f57f072a16f6 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -123,7 +123,7 @@ static int sgx_inject_fault(struct kvm_vcpu *vcpu, gva_t gva, int trapnr)
* likely than a bad userspace address.
*/
if ((trapnr == PF_VECTOR || !boot_cpu_has(X86_FEATURE_SGX2)) &&
- guest_cpuid_has(vcpu, X86_FEATURE_SGX2)) {
+ guest_cpu_cap_has(vcpu, X86_FEATURE_SGX2)) {
memset(&ex, 0, sizeof(ex));
ex.vector = PF_VECTOR;
ex.error_code = PFERR_PRESENT_MASK | PFERR_WRITE_MASK |
@@ -366,7 +366,7 @@ static inline bool encls_leaf_enabled_in_guest(struct kvm_vcpu *vcpu, u32 leaf)
return true;

if (leaf >= EAUG && leaf <= EMODT)
- return guest_cpuid_has(vcpu, X86_FEATURE_SGX2);
+ return guest_cpu_cap_has(vcpu, X86_FEATURE_SGX2);

return false;
}
@@ -382,8 +382,8 @@ int handle_encls(struct kvm_vcpu *vcpu)
{
u32 leaf = (u32)kvm_rax_read(vcpu);

- if (!enable_sgx || !guest_cpuid_has(vcpu, X86_FEATURE_SGX) ||
- !guest_cpuid_has(vcpu, X86_FEATURE_SGX1)) {
+ if (!enable_sgx || !guest_cpu_cap_has(vcpu, X86_FEATURE_SGX) ||
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_SGX1)) {
kvm_queue_exception(vcpu, UD_VECTOR);
} else if (!encls_leaf_enabled_in_guest(vcpu, leaf) ||
!sgx_enabled_in_guest_bios(vcpu) || !is_paging(vcpu)) {
@@ -480,15 +480,15 @@ void vmx_write_encls_bitmap(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
if (!cpu_has_vmx_encls_vmexit())
return;

- if (guest_cpuid_has(vcpu, X86_FEATURE_SGX) &&
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX) &&
sgx_enabled_in_guest_bios(vcpu)) {
- if (guest_cpuid_has(vcpu, X86_FEATURE_SGX1)) {
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX1)) {
bitmap &= ~GENMASK_ULL(ETRACK, ECREATE);
if (sgx_intercept_encls_ecreate(vcpu))
bitmap |= (1 << ECREATE);
}

- if (guest_cpuid_has(vcpu, X86_FEATURE_SGX2))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX2))
bitmap &= ~GENMASK_ULL(EMODT, EAUG);

/*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 653c4b68ec7f..741961a1edcc 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1874,8 +1874,8 @@ static void vmx_setup_uret_msrs(struct vcpu_vmx *vmx)
vmx_setup_uret_msr(vmx, MSR_EFER, update_transition_efer(vmx));

vmx_setup_uret_msr(vmx, MSR_TSC_AUX,
- guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
- guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDPID));
+ guest_cpu_cap_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
+ guest_cpu_cap_has(&vmx->vcpu, X86_FEATURE_RDPID));

/*
* hle=0, rtm=0, tsx_ctrl=1 can be found with some combinations of new
@@ -2028,7 +2028,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_BNDCFGS:
if (!kvm_mpx_supported() ||
(!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_MPX)))
return 1;
msr_info->data = vmcs_read64(GUEST_BNDCFGS);
break;
@@ -2044,7 +2044,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_SGX_LC))
return 1;
msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash
[msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
@@ -2063,7 +2063,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
* sanity checking and refuse to boot. Filter all unsupported
* features out.
*/
- if (!msr_info->host_initiated && guest_cpuid_has_evmcs(vcpu))
+ if (!msr_info->host_initiated && guest_cpu_cap_has_evmcs(vcpu))
nested_evmcs_filter_control_msr(vcpu, msr_info->index,
&msr_info->data);
#endif
@@ -2133,7 +2133,7 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
u64 data)
{
#ifdef CONFIG_X86_64
- if (!guest_cpuid_has(vcpu, X86_FEATURE_LM))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
return (u32)data;
#endif
return (unsigned long)data;
@@ -2144,7 +2144,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
u64 debugctl = 0;

if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) &&
- (host_initiated || guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT)))
+ (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT)))
debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT;

if ((kvm_caps.supported_perf_cap & PMU_CAP_LBR_FMT) &&
@@ -2248,7 +2248,7 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_BNDCFGS:
if (!kvm_mpx_supported() ||
(!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_MPX)))
return 1;
if (is_noncanonical_address(data & PAGE_MASK, vcpu) ||
(data & MSR_IA32_BNDCFGS_RSVD))
@@ -2350,7 +2350,7 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
* behavior, but it's close enough.
*/
if (!msr_info->host_initiated &&
- (!guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC) ||
+ (!guest_cpu_cap_has(vcpu, X86_FEATURE_SGX_LC) ||
((vmx->msr_ia32_feature_control & FEAT_CTL_LOCKED) &&
!(vmx->msr_ia32_feature_control & FEAT_CTL_SGX_LC_ENABLED))))
return 1;
@@ -2436,9 +2436,9 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if ((data & PERF_CAP_PEBS_MASK) !=
(kvm_caps.supported_perf_cap & PERF_CAP_PEBS_MASK))
return 1;
- if (!guest_cpuid_has(vcpu, X86_FEATURE_DS))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_DS))
return 1;
- if (!guest_cpuid_has(vcpu, X86_FEATURE_DTES64))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_DTES64))
return 1;
if (!cpuid_model_is_consistent(vcpu))
return 1;
@@ -4570,10 +4570,7 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
bool __enabled; \
\
if (cpu_has_vmx_##name()) { \
- if (kvm_is_governed_feature(X86_FEATURE_##feat_name)) \
- __enabled = guest_cpu_cap_has(__vcpu, X86_FEATURE_##feat_name); \
- else \
- __enabled = guest_cpuid_has(__vcpu, X86_FEATURE_##feat_name); \
+ __enabled = guest_cpu_cap_has(__vcpu, X86_FEATURE_##feat_name); \
vmx_adjust_secondary_exec_control(vmx, exec_control, SECONDARY_EXEC_##ctrl_name,\
__enabled, exiting); \
} \
@@ -4649,8 +4646,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
*/
if (cpu_has_vmx_rdtscp()) {
bool rdpid_or_rdtscp_enabled =
- guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) ||
- guest_cpuid_has(vcpu, X86_FEATURE_RDPID);
+ guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) ||
+ guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID);

vmx_adjust_secondary_exec_control(vmx, &exec_control,
SECONDARY_EXEC_ENABLE_RDTSCP,
@@ -5956,7 +5953,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
} operand;
int gpr_index;

- if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_INVPCID)) {
kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
}
@@ -7837,7 +7834,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
* set if and only if XSAVE is supported.
*/
if (!boot_cpu_has(X86_FEATURE_XSAVE) ||
- !guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE))
guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);

vmx_setup_uret_msrs(vmx);
@@ -7859,21 +7856,21 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
nested_vmx_cr_fixed1_bits_update(vcpu);

if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
- guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
+ guest_cpu_cap_has(vcpu, X86_FEATURE_INTEL_PT))
update_intel_pt_cfg(vcpu);

if (boot_cpu_has(X86_FEATURE_RTM)) {
struct vmx_uret_msr *msr;
msr = vmx_find_uret_msr(vmx, MSR_IA32_TSX_CTRL);
if (msr) {
- bool enabled = guest_cpuid_has(vcpu, X86_FEATURE_RTM);
+ bool enabled = guest_cpu_cap_has(vcpu, X86_FEATURE_RTM);
vmx_set_guest_uret_msr(vmx, msr, enabled ? 0 : TSX_CTRL_RTM_DISABLE);
}
}

if (kvm_cpu_cap_has(X86_FEATURE_XFD))
vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
- !guest_cpuid_has(vcpu, X86_FEATURE_XFD));
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD));

if (boot_cpu_has(X86_FEATURE_IBPB))
vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
@@ -7881,17 +7878,17 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)

if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
- !guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));

set_cr4_guest_host_mask(vmx);

vmx_write_encls_bitmap(vcpu, NULL);
- if (guest_cpuid_has(vcpu, X86_FEATURE_SGX))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX))
vmx->msr_ia32_feature_control_valid_bits |= FEAT_CTL_SGX_ENABLED;
else
vmx->msr_ia32_feature_control_valid_bits &= ~FEAT_CTL_SGX_ENABLED;

- if (guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX_LC))
vmx->msr_ia32_feature_control_valid_bits |=
FEAT_CTL_SGX_LC_ENABLED;
else
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4ca9651b3f43..5aa7581802f7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -488,7 +488,7 @@ int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
enum lapic_mode old_mode = kvm_get_apic_mode(vcpu);
enum lapic_mode new_mode = kvm_apic_mode(msr_info->data);
u64 reserved_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu) | 0x2ff |
- (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC) ? 0 : X2APIC_ENABLE);
+ (guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) ? 0 : X2APIC_ENABLE);

if ((msr_info->data & reserved_bits) != 0 || new_mode == LAPIC_MODE_INVALID)
return 1;
@@ -1351,10 +1351,10 @@ static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu)
{
u64 fixed = DR6_FIXED_1;

- if (!guest_cpuid_has(vcpu, X86_FEATURE_RTM))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_RTM))
fixed |= DR6_RTM;

- if (!guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
fixed |= DR6_BUS_LOCK;
return fixed;
}
@@ -1708,20 +1708,20 @@ static int do_get_msr_feature(struct kvm_vcpu *vcpu, unsigned index, u64 *data)

static bool __kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
{
- if (efer & EFER_AUTOIBRS && !guest_cpuid_has(vcpu, X86_FEATURE_AUTOIBRS))
+ if (efer & EFER_AUTOIBRS && !guest_cpu_cap_has(vcpu, X86_FEATURE_AUTOIBRS))
return false;

- if (efer & EFER_FFXSR && !guest_cpuid_has(vcpu, X86_FEATURE_FXSR_OPT))
+ if (efer & EFER_FFXSR && !guest_cpu_cap_has(vcpu, X86_FEATURE_FXSR_OPT))
return false;

- if (efer & EFER_SVME && !guest_cpuid_has(vcpu, X86_FEATURE_SVM))
+ if (efer & EFER_SVME && !guest_cpu_cap_has(vcpu, X86_FEATURE_SVM))
return false;

if (efer & (EFER_LME | EFER_LMA) &&
- !guest_cpuid_has(vcpu, X86_FEATURE_LM))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
return false;

- if (efer & EFER_NX && !guest_cpuid_has(vcpu, X86_FEATURE_NX))
+ if (efer & EFER_NX && !guest_cpu_cap_has(vcpu, X86_FEATURE_NX))
return false;

return true;
@@ -1863,8 +1863,8 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
return 1;

if (!host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
return 1;

/*
@@ -1920,8 +1920,8 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
return 1;

if (!host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
- !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
return 1;
break;
}
@@ -2113,7 +2113,7 @@ EXPORT_SYMBOL_GPL(kvm_handle_invalid_op);
static int kvm_emulate_monitor_mwait(struct kvm_vcpu *vcpu, const char *insn)
{
if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS) &&
- !guest_cpuid_has(vcpu, X86_FEATURE_MWAIT))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_MWAIT))
return kvm_handle_invalid_op(vcpu);

pr_warn_once("%s instruction emulated as NOP!\n", insn);
@@ -3820,11 +3820,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if ((!guest_has_pred_cmd_msr(vcpu)))
return 1;

- if (!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) &&
- !guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBPB))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBPB))
reserved_bits |= PRED_CMD_IBPB;

- if (!guest_cpuid_has(vcpu, X86_FEATURE_SBPB))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SBPB))
reserved_bits |= PRED_CMD_SBPB;
}

@@ -3845,7 +3845,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
case MSR_IA32_FLUSH_CMD:
if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D))
return 1;

if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D) || (data & ~L1D_FLUSH))
@@ -3896,7 +3896,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
kvm_set_lapic_tscdeadline_msr(vcpu, data);
break;
case MSR_IA32_TSC_ADJUST:
- if (guest_cpuid_has(vcpu, X86_FEATURE_TSC_ADJUST)) {
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_TSC_ADJUST)) {
if (!msr_info->host_initiated) {
s64 adj = data - vcpu->arch.ia32_tsc_adjust_msr;
adjust_tsc_offset_guest(vcpu, adj);
@@ -3923,7 +3923,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)

if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
((old_val ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
- if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_XMM3))
return 1;
vcpu->arch.ia32_misc_enable_msr = data;
kvm_update_cpuid_runtime(vcpu);
@@ -4100,12 +4100,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
kvm_pr_unimpl_wrmsr(vcpu, msr, data);
break;
case MSR_AMD64_OSVW_ID_LENGTH:
- if (!guest_cpuid_has(vcpu, X86_FEATURE_OSVW))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
return 1;
vcpu->arch.osvw.length = data;
break;
case MSR_AMD64_OSVW_STATUS:
- if (!guest_cpuid_has(vcpu, X86_FEATURE_OSVW))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
return 1;
vcpu->arch.osvw.status = data;
break;
@@ -4126,7 +4126,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
#ifdef CONFIG_X86_64
case MSR_IA32_XFD:
if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_XFD))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
return 1;

if (data & ~kvm_guest_supported_xfd(vcpu))
@@ -4136,7 +4136,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
case MSR_IA32_XFD_ERR:
if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_XFD))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
return 1;

if (data & ~kvm_guest_supported_xfd(vcpu))
@@ -4260,13 +4260,13 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
case MSR_IA32_ARCH_CAPABILITIES:
if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
return 1;
msr_info->data = vcpu->arch.arch_capabilities;
break;
case MSR_IA32_PERF_CAPABILITIES:
if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
return 1;
msr_info->data = vcpu->arch.perf_capabilities;
break;
@@ -4467,12 +4467,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data = 0xbe702111;
break;
case MSR_AMD64_OSVW_ID_LENGTH:
- if (!guest_cpuid_has(vcpu, X86_FEATURE_OSVW))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
return 1;
msr_info->data = vcpu->arch.osvw.length;
break;
case MSR_AMD64_OSVW_STATUS:
- if (!guest_cpuid_has(vcpu, X86_FEATURE_OSVW))
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
return 1;
msr_info->data = vcpu->arch.osvw.status;
break;
@@ -4491,14 +4491,14 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
#ifdef CONFIG_X86_64
case MSR_IA32_XFD:
if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_XFD))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
return 1;

msr_info->data = vcpu->arch.guest_fpu.fpstate->xfd;
break;
case MSR_IA32_XFD_ERR:
if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_XFD))
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
return 1;

msr_info->data = vcpu->arch.guest_fpu.xfd_err;
@@ -8508,17 +8508,17 @@ static bool emulator_get_cpuid(struct x86_emulate_ctxt *ctxt,

static bool emulator_guest_has_movbe(struct x86_emulate_ctxt *ctxt)
{
- return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_MOVBE);
+ return guest_cpu_cap_has(emul_to_vcpu(ctxt), X86_FEATURE_MOVBE);
}

static bool emulator_guest_has_fxsr(struct x86_emulate_ctxt *ctxt)
{
- return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_FXSR);
+ return guest_cpu_cap_has(emul_to_vcpu(ctxt), X86_FEATURE_FXSR);
}

static bool emulator_guest_has_rdpid(struct x86_emulate_ctxt *ctxt)
{
- return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_RDPID);
+ return guest_cpu_cap_has(emul_to_vcpu(ctxt), X86_FEATURE_RDPID);
}

static ulong emulator_read_gpr(struct x86_emulate_ctxt *ctxt, unsigned reg)
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:57:24

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 48/49] KVM: x86: Add a macro for features that are synthesized into boot_cpu_data

Add yet another CPUID macro, this time for features that the host kernel
synthesizes into boot_cpu_data, i.e. that the kernel force-sets even in
situations where the feature isn't reported by CPUID. Thanks to the
macro shenanigans of kvm_cpu_cap_init(), such features can now be handled
in the core CPUID framework, i.e. no longer need to be handled out-of-band,
where there are far fewer guardrails.

Adding a dedicated macro also helps document what's going on, e.g. the
calls to kvm_cpu_cap_check_and_set() are very confusing unless the reader
knows exactly how kvm_cpu_cap_init() generates kvm_cpu_caps (and even
then, it's far from obvious).
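
For illustration (this sketch is mine, not part of the patch), each
SYN_F() inside a kvm_cpu_cap_init() invocation both records the bit in
the local kvm_cpu_cap_synthesized and yields the bit for the mask, so
that the raw host CPUID filter can't strip a kernel-synthesized flag:

    u32 kvm_cpu_cap_synthesized = 0;

    /* SYN_F(SBPB) expands (roughly) to a statement expression: */
    mask |= ({ kvm_cpu_cap_synthesized |= F(SBPB); F(SBPB); });

    /* ...and the framework exempts synthesized bits from the filter: */
    kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) | kvm_cpu_cap_synthesized);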

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0130e0677387..0e64a6332052 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -106,6 +106,17 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
F(name); \
})

+/*
+ * Synthesized Feature - For features that are synthesized into boot_cpu_data,
+ * i.e. may not be present in the raw CPUID, but can still be advertised to
+ * userspace. Primarily used for mitigation-related feature flags.
+ */
+#define SYN_F(name) \
+({ \
+ kvm_cpu_cap_synthesized |= F(name); \
+ F(name); \
+})
+
/*
* Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
* identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
@@ -727,13 +738,15 @@ do { \
const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32); \
const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf; \
u32 kvm_cpu_cap_emulated = 0; \
+ u32 kvm_cpu_cap_synthesized = 0; \
\
if (leaf < NCAPINTS) \
kvm_cpu_caps[leaf] &= (mask); \
else \
kvm_cpu_caps[leaf] = (mask); \
\
- kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid); \
+ kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) | \
+ kvm_cpu_cap_synthesized); \
kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated; \
} while (0)

@@ -913,13 +926,10 @@ void kvm_set_cpu_caps(void)
kvm_cpu_cap_init(CPUID_8000_0021_EAX,
F(NO_NESTED_DATA_BP) | F(LFENCE_RDTSC) | 0 /* SmmPgCfgLock */ |
F(NULL_SEL_CLR_BASE) | F(AUTOIBRS) | 0 /* PrefetchCtlMsr */ |
- F(WRMSR_XX_BASE_NS)
+ F(WRMSR_XX_BASE_NS) | SYN_F(SBPB) | SYN_F(IBPB_BRTYPE) |
+ SYN_F(SRSO_NO)
);

- kvm_cpu_cap_check_and_set(X86_FEATURE_SBPB);
- kvm_cpu_cap_check_and_set(X86_FEATURE_IBPB_BRTYPE);
- kvm_cpu_cap_check_and_set(X86_FEATURE_SRSO_NO);
-
kvm_cpu_cap_init(CPUID_8000_0022_EAX,
F(PERFMON_V2)
);
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:57:47

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 39/49] KVM: x86: Extract code for generating per-entry emulated CPUID information

Extract the meat of __do_cpuid_func_emulated() into a separate helper,
cpuid_func_emulated(), so that the emulated-feature logic can be applied to a
single CPUID entry. This will allow marking emulated features as fully
supported in the guest cpu_caps without needing to hardcode the set of
emulated features in multiple locations.
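
As a rough sketch of the intended follow-on use (the consumer below is
hypothetical, not part of this patch), the helper can now populate a
single on-stack entry without any array bookkeeping:

    struct kvm_cpuid_entry2 entry;

    /* Returns 1 and fills @entry iff @func has emulated features. */
    if (cpuid_func_emulated(&entry, 0x7))
        mark_emulated_caps(vcpu, entry.ecx);  /* hypothetical consumer */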

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index fd725cbbcce5..d1849fe874ab 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1007,14 +1007,10 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
return entry;
}

-static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
+static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func)
{
- struct kvm_cpuid_entry2 *entry;
+ memset(entry, 0, sizeof(*entry));

- if (array->nent >= array->maxnent)
- return -E2BIG;
-
- entry = &array->entries[array->nent];
entry->function = func;
entry->index = 0;
entry->flags = 0;
@@ -1022,23 +1018,27 @@ static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
switch (func) {
case 0:
entry->eax = 7;
- ++array->nent;
- break;
+ return 1;
case 1:
entry->ecx = F(MOVBE);
- ++array->nent;
- break;
+ return 1;
case 7:
entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
entry->eax = 0;
if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
entry->ecx = F(RDPID);
- ++array->nent;
- break;
+ return 1;
default:
- break;
+ return 0;
}
+}

+static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
+{
+ if (array->nent >= array->maxnent)
+ return -E2BIG;
+
+ array->nent += cpuid_func_emulated(&array->entries[array->nent], func);
return 0;
}

--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:58:15

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 22/49] KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features

Add a macro to precisely handle CPUID features that AMD duplicated from
CPUID.0x1.EDX into CPUID.0x8000_0001.EDX. This will allow adding an
assert that all features passed to kvm_cpu_cap_init() match the word being
processed, e.g. to prevent passing a feature from CPUID 0x7 to CPUID 0x1.

Because the kernel simply reuses the X86_FEATURE_* definitions from
CPUID.0x1.EDX, KVM's use of the aliased features would result in false
positives from such an assert.
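
To illustrate the guardrail this enables (my sketch; the per-word assert
itself lands later in the series):

    AF(FPU);     /* ok: __feature_leaf(X86_FEATURE_FPU) == CPUID_1_EDX */
    AF(SYSCALL); /* build error: SYSCALL lives in CPUID_8000_0001_EDX,
                  * i.e. is not an alias of a 0x1.EDX feature */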

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 5e3b97d06374..f2bd2f5c4ea3 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -88,6 +88,16 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
F(name); \
})

+/*
+ * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
+ * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
+ */
+#define AF(name) \
+({ \
+ BUILD_BUG_ON(__feature_leaf(X86_FEATURE_##name) != CPUID_1_EDX); \
+ feature_bit(name); \
+})
+
/*
* Magic value used by KVM when querying userspace-provided CPUID entries and
 * doesn't care about the CPUID index because the index of the function in
@@ -758,13 +768,13 @@ void kvm_set_cpu_caps(void)
);

kvm_cpu_cap_init(CPUID_8000_0001_EDX,
- F(FPU) | F(VME) | F(DE) | F(PSE) |
- F(TSC) | F(MSR) | F(PAE) | F(MCE) |
- F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) |
- F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
- F(PAT) | F(PSE36) | 0 /* Reserved */ |
- F(NX) | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
- F(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
+ AF(FPU) | AF(VME) | AF(DE) | AF(PSE) |
+ AF(TSC) | AF(MSR) | AF(PAE) | AF(MCE) |
+ AF(CX8) | AF(APIC) | 0 /* Reserved */ | F(SYSCALL) |
+ AF(MTRR) | AF(PGE) | AF(MCA) | AF(CMOV) |
+ AF(PAT) | AF(PSE36) | 0 /* Reserved */ |
+ F(NX) | 0 /* Reserved */ | F(MMXEXT) | AF(MMX) |
+ AF(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
0 /* Reserved */ | X86_64_F(LM) | F(3DNOWEXT) | F(3DNOW)
);

--
2.45.0.215.g3402c0e53f-goog


2024-05-17 17:59:42

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 23/49] KVM: x86: Handle kernel- and KVM-defined CPUID words in a single helper

Merge kvm_cpu_cap_init() and kvm_cpu_cap_init_kvm_defined() into a single
helper. The only advantage of separating the two was to make it somewhat
obvious that KVM directly initializes the KVM-defined words; using a
common helper will allow hardening both kernel- and KVM-defined CPUID
words without copy+paste.
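
Sketched call-site semantics after the merge (restating the diff, no new
behavior):

    kvm_cpu_cap_init(CPUID_1_EDX, mask);    /* kernel word: AND with the
                                             * boot CPU's populated value */
    kvm_cpu_cap_init(CPUID_7_1_EDX, mask);  /* KVM-only word: direct set */

    /* Both paths then AND with raw host CPUID via raw_cpuid_get(). */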

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 44 +++++++++++++++-----------------------------
1 file changed, 15 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f2bd2f5c4ea3..8efffd48cdf1 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -622,37 +622,23 @@ static __always_inline u32 raw_cpuid_get(struct cpuid_reg cpuid)
return *__cpuid_entry_get_reg(&entry, cpuid.reg);
}

-/* Mask kvm_cpu_caps for @leaf with the raw CPUID capabilities of this CPU. */
-static __always_inline void __kvm_cpu_cap_mask(unsigned int leaf)
+static __always_inline void kvm_cpu_cap_init(u32 leaf, u32 mask)
{
const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);

- reverse_cpuid_check(leaf);
+ /*
+ * For kernel-defined leafs, mask the boot CPU's pre-populated value.
+ * For KVM-defined leafs, explicitly set the leaf, as KVM is the one
+ * and only authority.
+ */
+ if (leaf < NCAPINTS)
+ kvm_cpu_caps[leaf] &= mask;
+ else
+ kvm_cpu_caps[leaf] = mask;

kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);
}

-static __always_inline
-void kvm_cpu_cap_init_kvm_defined(enum kvm_only_cpuid_leafs leaf, u32 mask)
-{
- /* Use kvm_cpu_cap_init for leafs that aren't KVM-only. */
- BUILD_BUG_ON(leaf < NCAPINTS);
-
- kvm_cpu_caps[leaf] = mask;
-
- __kvm_cpu_cap_mask(leaf);
-}
-
-static __always_inline void kvm_cpu_cap_init(enum cpuid_leafs leaf, u32 mask)
-{
- /* Use kvm_cpu_cap_init_kvm_defined for KVM-only leafs. */
- BUILD_BUG_ON(leaf >= NCAPINTS);
-
- kvm_cpu_caps[leaf] &= mask;
-
- __kvm_cpu_cap_mask(leaf);
-}
-
void kvm_set_cpu_caps(void)
{
memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
@@ -740,12 +726,12 @@ void kvm_set_cpu_caps(void)
F(AMX_FP16) | F(AVX_IFMA) | F(LAM)
);

- kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
+ kvm_cpu_cap_init(CPUID_7_1_EDX,
F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI) |
F(AMX_COMPLEX)
);

- kvm_cpu_cap_init_kvm_defined(CPUID_7_2_EDX,
+ kvm_cpu_cap_init(CPUID_7_2_EDX,
F(INTEL_PSFD) | F(IPRED_CTRL) | F(RRSBA_CTRL) | F(DDPD_U) |
F(BHI_CTRL) | F(MCDT_NO)
);
@@ -755,7 +741,7 @@ void kvm_set_cpu_caps(void)
X86_64_F(XFD)
);

- kvm_cpu_cap_init_kvm_defined(CPUID_12_EAX,
+ kvm_cpu_cap_init(CPUID_12_EAX,
SF(SGX1) | SF(SGX2) | SF(SGX_EDECCSSA)
);

@@ -781,7 +767,7 @@ void kvm_set_cpu_caps(void)
if (!tdp_enabled && IS_ENABLED(CONFIG_X86_64))
kvm_cpu_cap_set(X86_FEATURE_GBPAGES);

- kvm_cpu_cap_init_kvm_defined(CPUID_8000_0007_EDX,
+ kvm_cpu_cap_init(CPUID_8000_0007_EDX,
SF(CONSTANT_TSC)
);

@@ -835,7 +821,7 @@ void kvm_set_cpu_caps(void)
kvm_cpu_cap_check_and_set(X86_FEATURE_IBPB_BRTYPE);
kvm_cpu_cap_check_and_set(X86_FEATURE_SRSO_NO);

- kvm_cpu_cap_init_kvm_defined(CPUID_8000_0022_EAX,
+ kvm_cpu_cap_init(CPUID_8000_0022_EAX,
F(PERFMON_V2)
);

--
2.45.0.215.g3402c0e53f-goog


2024-05-17 18:00:05

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 20/49] KVM: x86: Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init()

Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init() in anticipation of merging
it with kvm_cpu_cap_init_kvm_defined(), and in anticipation of _setting_
bits in the helper (a future commit will play macro games to set emulated
feature flags via kvm_cpu_cap_init()).

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 36 ++++++++++++++++++------------------
1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a802c09b50ab..5a4d6138c4f1 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -74,7 +74,7 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
* Raw Feature - For features that KVM supports based purely on raw host CPUID,
* i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
* Simply force set the feature in KVM's capabilities, raw CPUID support will
- * be factored in by kvm_cpu_cap_mask().
+ * be factored in by __kvm_cpu_cap_mask().
*/
#define RAW_F(name) \
({ \
@@ -619,7 +619,7 @@ static __always_inline void __kvm_cpu_cap_mask(unsigned int leaf)
static __always_inline
void kvm_cpu_cap_init_kvm_defined(enum kvm_only_cpuid_leafs leaf, u32 mask)
{
- /* Use kvm_cpu_cap_mask for leafs that aren't KVM-only. */
+ /* Use kvm_cpu_cap_init for leafs that aren't KVM-only. */
BUILD_BUG_ON(leaf < NCAPINTS);

kvm_cpu_caps[leaf] = mask;
@@ -627,7 +627,7 @@ void kvm_cpu_cap_init_kvm_defined(enum kvm_only_cpuid_leafs leaf, u32 mask)
__kvm_cpu_cap_mask(leaf);
}

-static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
+static __always_inline void kvm_cpu_cap_init(enum cpuid_leafs leaf, u32 mask)
{
/* Use kvm_cpu_cap_init_kvm_defined for KVM-only leafs. */
BUILD_BUG_ON(leaf >= NCAPINTS);
@@ -656,7 +656,7 @@ void kvm_set_cpu_caps(void)
memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)));

- kvm_cpu_cap_mask(CPUID_1_ECX,
+ kvm_cpu_cap_init(CPUID_1_ECX,
/*
* NOTE: MONITOR (and MWAIT) are emulated as NOP, but *not*
* advertised to guests via CPUID!
@@ -673,7 +673,7 @@ void kvm_set_cpu_caps(void)
/* KVM emulates x2apic in software irrespective of host support. */
kvm_cpu_cap_set(X86_FEATURE_X2APIC);

- kvm_cpu_cap_mask(CPUID_1_EDX,
+ kvm_cpu_cap_init(CPUID_1_EDX,
F(FPU) | F(VME) | F(DE) | F(PSE) |
F(TSC) | F(MSR) | F(PAE) | F(MCE) |
F(CX8) | F(APIC) | 0 /* Reserved */ | F(SEP) |
@@ -684,7 +684,7 @@ void kvm_set_cpu_caps(void)
0 /* HTT, TM, Reserved, PBE */
);

- kvm_cpu_cap_mask(CPUID_7_0_EBX,
+ kvm_cpu_cap_init(CPUID_7_0_EBX,
F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) |
F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) | F(INVPCID) |
F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ | F(AVX512F) |
@@ -693,7 +693,7 @@ void kvm_set_cpu_caps(void)
F(AVX512ER) | F(AVX512CD) | F(SHA_NI) | F(AVX512BW) |
F(AVX512VL));

- kvm_cpu_cap_mask(CPUID_7_ECX,
+ kvm_cpu_cap_init(CPUID_7_ECX,
F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
@@ -708,7 +708,7 @@ void kvm_set_cpu_caps(void)
if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
kvm_cpu_cap_clear(X86_FEATURE_PKU);

- kvm_cpu_cap_mask(CPUID_7_EDX,
+ kvm_cpu_cap_init(CPUID_7_EDX,
F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
@@ -727,7 +727,7 @@ void kvm_set_cpu_caps(void)
if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD);

- kvm_cpu_cap_mask(CPUID_7_1_EAX,
+ kvm_cpu_cap_init(CPUID_7_1_EAX,
F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) |
F(FZRM) | F(FSRS) | F(FSRC) |
F(AMX_FP16) | F(AVX_IFMA) | F(LAM)
@@ -743,7 +743,7 @@ void kvm_set_cpu_caps(void)
F(BHI_CTRL) | F(MCDT_NO)
);

- kvm_cpu_cap_mask(CPUID_D_1_EAX,
+ kvm_cpu_cap_init(CPUID_D_1_EAX,
F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES) | f_xfd
);

@@ -751,7 +751,7 @@ void kvm_set_cpu_caps(void)
SF(SGX1) | SF(SGX2) | SF(SGX_EDECCSSA)
);

- kvm_cpu_cap_mask(CPUID_8000_0001_ECX,
+ kvm_cpu_cap_init(CPUID_8000_0001_ECX,
F(LAHF_LM) | F(CMP_LEGACY) | 0 /*SVM*/ | 0 /* ExtApicSpace */ |
F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) |
F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
@@ -759,7 +759,7 @@ void kvm_set_cpu_caps(void)
F(TOPOEXT) | 0 /* PERFCTR_CORE */
);

- kvm_cpu_cap_mask(CPUID_8000_0001_EDX,
+ kvm_cpu_cap_init(CPUID_8000_0001_EDX,
F(FPU) | F(VME) | F(DE) | F(PSE) |
F(TSC) | F(MSR) | F(PAE) | F(MCE) |
F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) |
@@ -777,7 +777,7 @@ void kvm_set_cpu_caps(void)
SF(CONSTANT_TSC)
);

- kvm_cpu_cap_mask(CPUID_8000_0008_EBX,
+ kvm_cpu_cap_init(CPUID_8000_0008_EBX,
F(CLZERO) | F(XSAVEERPTR) |
F(WBNOINVD) | F(AMD_IBPB) | F(AMD_IBRS) | F(AMD_SSBD) | F(VIRT_SSBD) |
F(AMD_SSB_NO) | F(AMD_STIBP) | F(AMD_STIBP_ALWAYS_ON) |
@@ -811,13 +811,13 @@ void kvm_set_cpu_caps(void)
* Hide all SVM features by default, SVM will set the cap bits for
* features it emulates and/or exposes for L1.
*/
- kvm_cpu_cap_mask(CPUID_8000_000A_EDX, 0);
+ kvm_cpu_cap_init(CPUID_8000_000A_EDX, 0);

- kvm_cpu_cap_mask(CPUID_8000_001F_EAX,
+ kvm_cpu_cap_init(CPUID_8000_001F_EAX,
0 /* SME */ | 0 /* SEV */ | 0 /* VM_PAGE_FLUSH */ | 0 /* SEV_ES */ |
F(SME_COHERENT));

- kvm_cpu_cap_mask(CPUID_8000_0021_EAX,
+ kvm_cpu_cap_init(CPUID_8000_0021_EAX,
F(NO_NESTED_DATA_BP) | F(LFENCE_RDTSC) | 0 /* SmmPgCfgLock */ |
F(NULL_SEL_CLR_BASE) | F(AUTOIBRS) | 0 /* PrefetchCtlMsr */ |
F(WRMSR_XX_BASE_NS)
@@ -837,7 +837,7 @@ void kvm_set_cpu_caps(void)
* kernel. LFENCE_RDTSC was a Linux-defined synthetic feature long
* before AMD joined the bandwagon, e.g. LFENCE is serializing on most
* CPUs that support SSE2. On CPUs that don't support AMD's leaf,
- * kvm_cpu_cap_mask() will unfortunately drop the flag due to ANDing
+ * kvm_cpu_cap_init() will unfortunately drop the flag due to ANDing
* the mask with the raw host CPUID, and reporting support in AMD's
* leaf can make it easier for userspace to detect the feature.
*/
@@ -847,7 +847,7 @@ void kvm_set_cpu_caps(void)
kvm_cpu_cap_set(X86_FEATURE_NULL_SEL_CLR_BASE);
kvm_cpu_cap_set(X86_FEATURE_NO_SMM_CTL_MSR);

- kvm_cpu_cap_mask(CPUID_C000_0001_EDX,
+ kvm_cpu_cap_init(CPUID_C000_0001_EDX,
F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
F(ACE2) | F(ACE2_EN) | F(PHE) | F(PHE_EN) |
F(PMM) | F(PMM_EN)
--
2.45.0.215.g3402c0e53f-goog


2024-05-17 18:07:59

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching

On Fri, May 17, 2024 at 7:39 PM Sean Christopherson <[email protected]> wrote:
> * Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation
> * Reject disabling of MWAIT/HLT interception when not allowed
> * Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID.

This is technically a breaking change, and it's even documented in
api.rst under "KVM_GET_SUPPORTED_CPUID issues":

---
CPU[EAX=1]:ECX[21] (X2APIC) is reported by
``KVM_GET_SUPPORTED_CPUID``, but it can only be enabled if
``KVM_CREATE_IRQCHIP`` or ``KVM_ENABLE_CAP(KVM_CAP_IRQCHIP_SPLIT)``
are used to enable in-kernel emulation of the local APIC.

The same is true for the ``KVM_FEATURE_PV_UNHALT`` paravirtualized feature.

CPU[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by
``KVM_GET_SUPPORTED_CPUID``. It can be enabled if
``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel has enabled
in-kernel emulation of the local APIC.
---

However I think we can get away with it. QEMU source code on one hand does

    /* tsc-deadline flag is not returned by GET_SUPPORTED_CPUID, but it
     * can be enabled if the kernel has KVM_CAP_TSC_DEADLINE_TIMER,
     * and the irqchip is in the kernel.
     */
    if (kvm_irqchip_in_kernel() &&
        kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
        ret |= CPUID_EXT_TSC_DEADLINE_TIMER;
    }

    /* x2apic is reported by GET_SUPPORTED_CPUID, but it can't be enabled
     * without the in-kernel irqchip
     */
    if (!kvm_irqchip_in_kernel()) {
        ret &= ~CPUID_EXT_X2APIC;
    }

so it has to cope with the existing mess, but it's not expecting the
opposite mess (understandably).

However, in practice the userspace APIC has always been utterly broken,
and even deprecated in QEMU, so we might get away with it. I don't see
why one would forgo the in-kernel APIC unless the guest has no APIC
whatsoever.

And no guest that doesn't find an APIC is going to use the TSC
deadline timer (sure the MSR is outside x2APIC space but how in the
world would you configure LVTT), likewise for X2APIC since you need to
turn it on at 0xFEE0_0000 first.
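
(Concretely, a sketch of the guest-side sequence, using Linux's APIC
identifiers: arming the deadline timer means programming LVTT first,
which requires an APIC.)

    /* Useless without an APIC to accept the LVT Timer programming. */
    apic_write(APIC_LVTT, LOCAL_TIMER_VECTOR | APIC_LVT_TIMER_TSCDEADLINE);
    wrmsrl(MSR_IA32_TSC_DEADLINE, rdtsc() + delta);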

Paolo


2024-05-22 07:35:44

by Binbin Wu

[permalink] [raw]
Subject: Re: [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed



On 5/18/2024 1:38 AM, Sean Christopherson wrote:
> Reject KVM_CAP_X86_DISABLE_EXITS if userspace attempts to disable MWAIT or
> HLT exits and KVM previously reported (via KVM_CHECK_EXTENSION) that
> disabling the exit(s) is not allowed. E.g. because MWAIT isn't supported
> or the CPU doesn't have an aways-running APIC timer, or because KVM is

aways-running -> always-running

> configured to mitigate cross-thread vulnerabilities.
>
> Cc: Kechen Lu <[email protected]>
> Fixes: 4d5422cea3b6 ("KVM: X86: Provide a capability to disable MWAIT intercepts")
> Fixes: 6f0f2d5ef895 ("KVM: x86: Mitigate the cross-thread return address predictions bug")
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/x86.c | 54 ++++++++++++++++++++++++----------------------
> 1 file changed, 28 insertions(+), 26 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4cb0c150a2f8..c729227c6501 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4590,6 +4590,20 @@ static inline bool kvm_can_mwait_in_guest(void)
> boot_cpu_has(X86_FEATURE_ARAT);
> }
>
> +static u64 kvm_get_allowed_disable_exits(void)
> +{
> + u64 r = KVM_X86_DISABLE_EXITS_PAUSE;
> +
> + if (!mitigate_smt_rsb) {
> + r |= KVM_X86_DISABLE_EXITS_HLT |
> + KVM_X86_DISABLE_EXITS_CSTATE;
> +
> + if (kvm_can_mwait_in_guest())
> + r |= KVM_X86_DISABLE_EXITS_MWAIT;
> + }
> + return r;
> +}
> +
> #ifdef CONFIG_KVM_HYPERV
> static int kvm_ioctl_get_supported_hv_cpuid(struct kvm_vcpu *vcpu,
> struct kvm_cpuid2 __user *cpuid_arg)
> @@ -4726,15 +4740,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> r = KVM_CLOCK_VALID_FLAGS;
> break;
> case KVM_CAP_X86_DISABLE_EXITS:
> - r = KVM_X86_DISABLE_EXITS_PAUSE;
> -
> - if (!mitigate_smt_rsb) {
> - r |= KVM_X86_DISABLE_EXITS_HLT |
> - KVM_X86_DISABLE_EXITS_CSTATE;
> -
> - if (kvm_can_mwait_in_guest())
> - r |= KVM_X86_DISABLE_EXITS_MWAIT;
> - }
> + r |= kvm_get_allowed_disable_exits();

Nit: Just use "=".

> break;
> case KVM_CAP_X86_SMM:
> if (!IS_ENABLED(CONFIG_KVM_SMM))
> @@ -6565,33 +6571,29 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> break;
> case KVM_CAP_X86_DISABLE_EXITS:
> r = -EINVAL;
> - if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
> + if (cap->args[0] & ~kvm_get_allowed_disable_exits())
> break;
>
> mutex_lock(&kvm->lock);
> if (kvm->created_vcpus)
> goto disable_exits_unlock;
>
> - if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
> - kvm->arch.pause_in_guest = true;
> -
> #define SMT_RSB_MSG "This processor is affected by the Cross-Thread Return Predictions vulnerability. " \
> "KVM_CAP_X86_DISABLE_EXITS should only be used with SMT disabled or trusted guests."
>
> - if (!mitigate_smt_rsb) {
> - if (boot_cpu_has_bug(X86_BUG_SMT_RSB) && cpu_smt_possible() &&
> - (cap->args[0] & ~KVM_X86_DISABLE_EXITS_PAUSE))
> - pr_warn_once(SMT_RSB_MSG);
> -
> - if ((cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT) &&
> - kvm_can_mwait_in_guest())
> - kvm->arch.mwait_in_guest = true;
> - if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT)
> - kvm->arch.hlt_in_guest = true;
> - if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
> - kvm->arch.cstate_in_guest = true;
> - }
> + if (!mitigate_smt_rsb && boot_cpu_has_bug(X86_BUG_SMT_RSB) &&
> + cpu_smt_possible() &&
> + (cap->args[0] & ~KVM_X86_DISABLE_EXITS_PAUSE))
> + pr_warn_once(SMT_RSB_MSG);
>
> + if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
> + kvm->arch.pause_in_guest = true;
> + if (cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT)
> + kvm->arch.mwait_in_guest = true;
> + if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT)
> + kvm->arch.hlt_in_guest = true;
> + if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
> + kvm->arch.cstate_in_guest = true;
> r = 0;
> disable_exits_unlock:
> mutex_unlock(&kvm->lock);


2024-05-22 10:02:23

by Binbin Wu

[permalink] [raw]
Subject: Re: [PATCH v2 20/49] KVM: x86: Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init()



On 5/18/2024 1:38 AM, Sean Christopherson wrote:
> Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init() in anticipation of merging
> it with kvm_cpu_cap_init_kvm_defined(), and in anticipation of _setting_
> bits in the helper (a future commit will play macro games to set emulated
> feature flags via kvm_cpu_cap_init()).
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/cpuid.c | 36 ++++++++++++++++++------------------
> 1 file changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index a802c09b50ab..5a4d6138c4f1 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -74,7 +74,7 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
> * Raw Feature - For features that KVM supports based purely on raw host CPUID,
> * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
> * Simply force set the feature in KVM's capabilities, raw CPUID support will
> - * be factored in by kvm_cpu_cap_mask().
> + * be factored in by __kvm_cpu_cap_mask().

kvm_cpu_cap_init()?


> [...]


2024-05-22 16:20:47

by Binbin Wu

[permalink] [raw]
Subject: Re: [PATCH v2 33/49] KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID



On 5/18/2024 1:39 AM, Sean Christopherson wrote:
> Advertise TSC_DEADLINE_TIMER via KVM_GET_SUPPORTED_CPUID when it's
> supported in hardware,

But it's using EMUL_F(TSC_DEADLINE_TIMER) below?

> as the odds of a VMM emulating the local APIC in
> userspace, not emulating the TSC deadline timer, _and_ reflecting
> KVM_GET_SUPPORTED_CPUID back into KVM_SET_CPUID2 are extremely low.
>
> KVM has _unconditionally_ advertised X2APIC via CPUID since commit
> 0d1de2d901f4 ("KVM: Always report x2apic as supported feature"), and it
> is completely impossible for userspace to emulate X2APIC as KVM doesn't
> support forwarding the MSR accesses to userspace. I.e. KVM has relied on
> userspace VMMs to not misreport local APIC capabilities for nearly 13
> years.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 9 ++++++---
> arch/x86/kvm/cpuid.c | 4 ++--
> 2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 884846282d06..cb744a646de6 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1804,15 +1804,18 @@ emulate them efficiently. The fields in each entry are defined as follows:
> the values returned by the cpuid instruction for
> this function/index combination
>
> -The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned
> -as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC
> -support. Instead it is reported via::
> +x2APIC (CPUID leaf 1, ecx[21]) and TSC deadline timer (CPUID leaf 1, ecx[24])
> +may be returned as true, but they depend on KVM_CREATE_IRQCHIP for in-kernel
> +emulation of the local APIC. TSC deadline timer support is also reported via::
>
> ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)
>
> if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
> feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
>
> +Enabling x2APIC in KVM_SET_CPUID2 requires KVM_CREATE_IRQCHIP as KVM doesn't
> +support forwarding x2APIC MSR accesses to userspace, i.e. KVM does not support
> +emulating x2APIC in userspace.
>
> 4.47 KVM_PPC_GET_PVINFO
> -----------------------
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 699ce4261e9c..d1f427284ccc 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -680,8 +680,8 @@ void kvm_set_cpu_caps(void)
> F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |
> F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
> F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
> - 0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
> - F(F16C) | F(RDRAND)
> + EMUL_F(TSC_DEADLINE_TIMER) | F(AES) | F(XSAVE) |
> + 0 /* OSXSAVE */ | F(AVX) | F(F16C) | F(RDRAND)
> );
>
> kvm_cpu_cap_init(CPUID_1_EDX,


2024-05-22 19:12:25

by Binbin Wu

[permalink] [raw]
Subject: Re: [PATCH v2 36/49] KVM: x86: Rename "governed features" helpers to use "guest_cpu_cap"



On 5/18/2024 1:39 AM, Sean Christopherson wrote:
> As the first step toward replacing KVM's so-called "governed features"
> framework with a more comprehensive, less poorly named implementation,
> replace the "kvm_governed_feature" function prefix with "guest_cpu_cap"
> and rename guest_can_use() to guest_cpu_cap_has().
>
> The "guest_cpu_cap" naming scheme mirrors that of "kvm_cpu_cap", and
> provides a more clear distinction between guest capabilities, which are
> KVM controlled (heh, or one might say "governed"), and guest CPUID, which
> with few exceptions is fully userspace controlled.
>
> Opportunistically rewrite the comment about XSS passthrough for SEV-ES
> guests to avoid referencing so many functions, as such comments are prone
> to becoming stale (case in point...).
>
> No functional change intended.

Reviewed-by: Binbin Wu <[email protected]>

>
> Reviewed-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/cpuid.c | 2 +-
> arch/x86/kvm/cpuid.h | 16 ++++++++--------
> arch/x86/kvm/mmu.h | 2 +-
> arch/x86/kvm/mmu/mmu.c | 4 ++--
> arch/x86/kvm/svm/nested.c | 22 +++++++++++-----------
> arch/x86/kvm/svm/sev.c | 17 ++++++++---------
> arch/x86/kvm/svm/svm.c | 26 +++++++++++++-------------
> arch/x86/kvm/svm/svm.h | 4 ++--
> arch/x86/kvm/vmx/nested.c | 6 +++---
> arch/x86/kvm/vmx/vmx.c | 16 ++++++++--------
> arch/x86/kvm/x86.c | 4 ++--
> 11 files changed, 59 insertions(+), 60 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 16bb873188d6..286abefc93d5 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -407,7 +407,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> allow_gbpages = tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) :
> guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES);
> if (allow_gbpages)
> - kvm_governed_feature_set(vcpu, X86_FEATURE_GBPAGES);
> + guest_cpu_cap_set(vcpu, X86_FEATURE_GBPAGES);
>
> best = kvm_find_cpuid_entry(vcpu, 1);
> if (best && apic) {
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index d68b7d879820..e021681f34ac 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -256,8 +256,8 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature)
> return kvm_governed_feature_index(x86_feature) >= 0;
> }
>
> -static __always_inline void kvm_governed_feature_set(struct kvm_vcpu *vcpu,
> - unsigned int x86_feature)
> +static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
> + unsigned int x86_feature)
> {
> BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
>
> @@ -265,15 +265,15 @@ static __always_inline void kvm_governed_feature_set(struct kvm_vcpu *vcpu,
> vcpu->arch.governed_features.enabled);
> }
>
> -static __always_inline void kvm_governed_feature_check_and_set(struct kvm_vcpu *vcpu,
> - unsigned int x86_feature)
> +static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
> + unsigned int x86_feature)
> {
> if (kvm_cpu_cap_has(x86_feature) && guest_cpuid_has(vcpu, x86_feature))
> - kvm_governed_feature_set(vcpu, x86_feature);
> + guest_cpu_cap_set(vcpu, x86_feature);
> }
>
> -static __always_inline bool guest_can_use(struct kvm_vcpu *vcpu,
> - unsigned int x86_feature)
> +static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
> + unsigned int x86_feature)
> {
> BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
>
> @@ -283,7 +283,7 @@ static __always_inline bool guest_can_use(struct kvm_vcpu *vcpu,
>
> static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> {
> - if (guest_can_use(vcpu, X86_FEATURE_LAM))
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_LAM))
> cr3 &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
>
> return kvm_vcpu_is_legal_gpa(vcpu, cr3);
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index dc80e72e4848..cf95ea5fe29d 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -150,7 +150,7 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
>
> static inline unsigned long kvm_get_active_cr3_lam_bits(struct kvm_vcpu *vcpu)
> {
> - if (!guest_can_use(vcpu, X86_FEATURE_LAM))
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_LAM))
> return 0;
>
> return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5095fb46713e..e18a10c59431 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4966,7 +4966,7 @@ static void reset_guest_rsvds_bits_mask(struct kvm_vcpu *vcpu,
> __reset_rsvds_bits_mask(&context->guest_rsvd_check,
> vcpu->arch.reserved_gpa_bits,
> context->cpu_role.base.level, is_efer_nx(context),
> - guest_can_use(vcpu, X86_FEATURE_GBPAGES),
> + guest_cpu_cap_has(vcpu, X86_FEATURE_GBPAGES),
> is_cr4_pse(context),
> guest_cpuid_is_amd_compatible(vcpu));
> }
> @@ -5043,7 +5043,7 @@ static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
> __reset_rsvds_bits_mask(shadow_zero_check, reserved_hpa_bits(),
> context->root_role.level,
> context->root_role.efer_nx,
> - guest_can_use(vcpu, X86_FEATURE_GBPAGES),
> + guest_cpu_cap_has(vcpu, X86_FEATURE_GBPAGES),
> is_pse, is_amd);
>
> if (!shadow_me_mask)
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 55b9a6d96bcf..2900a8e21257 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -107,7 +107,7 @@ static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)
>
> static bool nested_vmcb_needs_vls_intercept(struct vcpu_svm *svm)
> {
> - if (!guest_can_use(&svm->vcpu, X86_FEATURE_V_VMSAVE_VMLOAD))
> + if (!guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_V_VMSAVE_VMLOAD))
> return true;
>
> if (!nested_npt_enabled(svm))
> @@ -590,7 +590,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
> vmcb_mark_dirty(vmcb02, VMCB_DR);
> }
>
> - if (unlikely(guest_can_use(vcpu, X86_FEATURE_LBRV) &&
> + if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
> (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
> /*
> * Reserved bits of DEBUGCTL are ignored. Be consistent with
> @@ -647,7 +647,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> * exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
> */
>
> - if (guest_can_use(vcpu, X86_FEATURE_VGIF) &&
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_VGIF) &&
> (svm->nested.ctl.int_ctl & V_GIF_ENABLE_MASK))
> int_ctl_vmcb12_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
> else
> @@ -685,7 +685,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>
> vmcb02->control.tsc_offset = vcpu->arch.tsc_offset;
>
> - if (guest_can_use(vcpu, X86_FEATURE_TSCRATEMSR) &&
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_TSCRATEMSR) &&
> svm->tsc_ratio_msr != kvm_caps.default_tsc_scaling_ratio)
> nested_svm_update_tsc_ratio_msr(vcpu);
>
> @@ -706,7 +706,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> * what a nrips=0 CPU would do (L1 is responsible for advancing RIP
> * prior to injecting the event).
> */
> - if (guest_can_use(vcpu, X86_FEATURE_NRIPS))
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> vmcb02->control.next_rip = svm->nested.ctl.next_rip;
> else if (boot_cpu_has(X86_FEATURE_NRIPS))
> vmcb02->control.next_rip = vmcb12_rip;
> @@ -716,7 +716,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> svm->soft_int_injected = true;
> svm->soft_int_csbase = vmcb12_csbase;
> svm->soft_int_old_rip = vmcb12_rip;
> - if (guest_can_use(vcpu, X86_FEATURE_NRIPS))
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> svm->soft_int_next_rip = svm->nested.ctl.next_rip;
> else
> svm->soft_int_next_rip = vmcb12_rip;
> @@ -724,18 +724,18 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>
> vmcb02->control.virt_ext = vmcb01->control.virt_ext &
> LBR_CTL_ENABLE_MASK;
> - if (guest_can_use(vcpu, X86_FEATURE_LBRV))
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV))
> vmcb02->control.virt_ext |=
> (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK);
>
> if (!nested_vmcb_needs_vls_intercept(svm))
> vmcb02->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
>
> - if (guest_can_use(vcpu, X86_FEATURE_PAUSEFILTER))
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_PAUSEFILTER))
> pause_count12 = svm->nested.ctl.pause_filter_count;
> else
> pause_count12 = 0;
> - if (guest_can_use(vcpu, X86_FEATURE_PFTHRESHOLD))
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_PFTHRESHOLD))
> pause_thresh12 = svm->nested.ctl.pause_filter_thresh;
> else
> pause_thresh12 = 0;
> @@ -1022,7 +1022,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
> if (vmcb12->control.exit_code != SVM_EXIT_ERR)
> nested_save_pending_event_to_vmcb12(svm, vmcb12);
>
> - if (guest_can_use(vcpu, X86_FEATURE_NRIPS))
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> vmcb12->control.next_rip = vmcb02->control.next_rip;
>
> vmcb12->control.int_ctl = svm->nested.ctl.int_ctl;
> @@ -1061,7 +1061,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
> if (!nested_exit_on_intr(svm))
> kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
>
> - if (unlikely(guest_can_use(vcpu, X86_FEATURE_LBRV) &&
> + if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
> (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
> svm_copy_lbrs(vmcb12, vmcb02);
> svm_update_lbrv(vcpu);
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 57c2c8025547..7640dedc2ddc 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4409,16 +4409,15 @@ static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
> * For SEV-ES, accesses to MSR_IA32_XSS should not be intercepted if
> * the host/guest supports its use.
> *
> - * guest_can_use() checks a number of requirements on the host/guest to
> - * ensure that MSR_IA32_XSS is available, but it might report true even
> - * if X86_FEATURE_XSAVES isn't configured in the guest to ensure host
> - * MSR_IA32_XSS is always properly restored. For SEV-ES, it is better
> - * to further check that the guest CPUID actually supports
> - * X86_FEATURE_XSAVES so that accesses to MSR_IA32_XSS by misbehaved
> - * guests will still get intercepted and caught in the normal
> - * kvm_emulate_rdmsr()/kvm_emulated_wrmsr() paths.
> + * KVM treats the guest as being capable of using XSAVES even if XSAVES
> + * isn't enabled in guest CPUID as there is no intercept for XSAVES,
> + * i.e. the guest can use XSAVES/XRSTOR to read/write XSS if XSAVE is
> + * exposed to the guest and XSAVES is supported in hardware. Condition
> + * full XSS passthrough on the guest being able to use XSAVES *and*
> + * XSAVES being exposed to the guest so that KVM can at least honor
> + * guest CPUID for RDMSR and WRMSR.
> */
> - if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
> guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
> set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
> else
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 3d0549ca246f..2acd2e3bb1b0 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1039,7 +1039,7 @@ void svm_update_lbrv(struct kvm_vcpu *vcpu)
> struct vcpu_svm *svm = to_svm(vcpu);
> bool current_enable_lbrv = svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK;
> bool enable_lbrv = (svm_get_lbr_vmcb(svm)->save.dbgctl & DEBUGCTLMSR_LBR) ||
> - (is_guest_mode(vcpu) && guest_can_use(vcpu, X86_FEATURE_LBRV) &&
> + (is_guest_mode(vcpu) && guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
> (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK));
>
> if (enable_lbrv == current_enable_lbrv)
> @@ -2841,7 +2841,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> switch (msr_info->index) {
> case MSR_AMD64_TSC_RATIO:
> if (!msr_info->host_initiated &&
> - !guest_can_use(vcpu, X86_FEATURE_TSCRATEMSR))
> + !guest_cpu_cap_has(vcpu, X86_FEATURE_TSCRATEMSR))
> return 1;
> msr_info->data = svm->tsc_ratio_msr;
> break;
> @@ -2991,7 +2991,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> switch (ecx) {
> case MSR_AMD64_TSC_RATIO:
>
> - if (!guest_can_use(vcpu, X86_FEATURE_TSCRATEMSR)) {
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_TSCRATEMSR)) {
>
> if (!msr->host_initiated)
> return 1;
> @@ -3013,7 +3013,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>
> svm->tsc_ratio_msr = data;
>
> - if (guest_can_use(vcpu, X86_FEATURE_TSCRATEMSR) &&
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_TSCRATEMSR) &&
> is_guest_mode(vcpu))
> nested_svm_update_tsc_ratio_msr(vcpu);
>
> @@ -4342,11 +4342,11 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> if (boot_cpu_has(X86_FEATURE_XSAVE) &&
> boot_cpu_has(X86_FEATURE_XSAVES) &&
> guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
> - kvm_governed_feature_set(vcpu, X86_FEATURE_XSAVES);
> + guest_cpu_cap_set(vcpu, X86_FEATURE_XSAVES);
>
> - kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_NRIPS);
> - kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR);
> - kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_LBRV);
> + guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_NRIPS);
> + guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR);
> + guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LBRV);
>
> /*
> * Intercept VMLOAD if the vCPU mode is Intel in order to emulate that
> @@ -4354,12 +4354,12 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> * SVM on Intel is bonkers and extremely unlikely to work).
> */
> if (!guest_cpuid_is_intel(vcpu))
> - kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
> + guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
>
> - kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_PAUSEFILTER);
> - kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
> - kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VGIF);
> - kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VNMI);
> + guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PAUSEFILTER);
> + guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
> + guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VGIF);
> + guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VNMI);
>
> svm_recalc_instruction_intercepts(vcpu, svm);
>
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 97b3683ea324..08fd788d08df 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -487,7 +487,7 @@ static inline bool svm_is_intercept(struct vcpu_svm *svm, int bit)
>
> static inline bool nested_vgif_enabled(struct vcpu_svm *svm)
> {
> - return guest_can_use(&svm->vcpu, X86_FEATURE_VGIF) &&
> + return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_VGIF) &&
> (svm->nested.ctl.int_ctl & V_GIF_ENABLE_MASK);
> }
>
> @@ -539,7 +539,7 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
>
> static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
> {
> - return guest_can_use(&svm->vcpu, X86_FEATURE_VNMI) &&
> + return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_VNMI) &&
> (svm->nested.ctl.int_ctl & V_NMI_ENABLE_MASK);
> }
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index d5b832126e34..fb7eec29681d 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -6488,7 +6488,7 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu,
> vmx = to_vmx(vcpu);
> vmcs12 = get_vmcs12(vcpu);
>
> - if (guest_can_use(vcpu, X86_FEATURE_VMX) &&
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_VMX) &&
> (vmx->nested.vmxon || vmx->nested.smm.vmxon)) {
> kvm_state.hdr.vmx.vmxon_pa = vmx->nested.vmxon_ptr;
> kvm_state.hdr.vmx.vmcs12_pa = vmx->nested.current_vmptr;
> @@ -6629,7 +6629,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
> if (kvm_state->flags & ~KVM_STATE_NESTED_EVMCS)
> return -EINVAL;
> } else {
> - if (!guest_can_use(vcpu, X86_FEATURE_VMX))
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
> return -EINVAL;
>
> if (!page_address_valid(vcpu, kvm_state->hdr.vmx.vmxon_pa))
> @@ -6663,7 +6663,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
> return -EINVAL;
>
> if ((kvm_state->flags & KVM_STATE_NESTED_EVMCS) &&
> - (!guest_can_use(vcpu, X86_FEATURE_VMX) ||
> + (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX) ||
> !vmx->nested.enlightened_vmcs_enabled))
> return -EINVAL;
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 51b2cd13250a..1bc56596d653 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2050,7 +2050,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> [msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
> break;
> case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
> - if (!guest_can_use(vcpu, X86_FEATURE_VMX))
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
> return 1;
> if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
> &msr_info->data))
> @@ -2360,7 +2360,7 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
> if (!msr_info->host_initiated)
> return 1; /* they are read-only */
> - if (!guest_can_use(vcpu, X86_FEATURE_VMX))
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
> return 1;
> return vmx_set_vmx_msr(vcpu, msr_index, data);
> case MSR_IA32_RTIT_CTL:
> @@ -4571,7 +4571,7 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
> \
> if (cpu_has_vmx_##name()) { \
> if (kvm_is_governed_feature(X86_FEATURE_##feat_name)) \
> - __enabled = guest_can_use(__vcpu, X86_FEATURE_##feat_name); \
> + __enabled = guest_cpu_cap_has(__vcpu, X86_FEATURE_##feat_name); \
> else \
> __enabled = guest_cpuid_has(__vcpu, X86_FEATURE_##feat_name); \
> vmx_adjust_secondary_exec_control(vmx, exec_control, SECONDARY_EXEC_##ctrl_name,\
> @@ -7838,10 +7838,10 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> */
> if (boot_cpu_has(X86_FEATURE_XSAVE) &&
> guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
> - kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_XSAVES);
> + guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_XSAVES);
>
> - kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VMX);
> - kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_LAM);
> + guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VMX);
> + guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LAM);
>
> vmx_setup_uret_msrs(vmx);
>
> @@ -7849,7 +7849,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> vmcs_set_secondary_exec_control(vmx,
> vmx_secondary_exec_control(vmx));
>
> - if (guest_can_use(vcpu, X86_FEATURE_VMX))
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
> vmx->msr_ia32_feature_control_valid_bits |=
> FEAT_CTL_VMX_ENABLED_INSIDE_SMX |
> FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;
> @@ -7858,7 +7858,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> ~(FEAT_CTL_VMX_ENABLED_INSIDE_SMX |
> FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX);
>
> - if (guest_can_use(vcpu, X86_FEATURE_VMX))
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
> nested_vmx_cr_fixed1_bits_update(vcpu);
>
> if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7160c5ab8e3e..4ca9651b3f43 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1026,7 +1026,7 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
> if (vcpu->arch.xcr0 != host_xcr0)
> xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);
>
> - if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
> vcpu->arch.ia32_xss != host_xss)
> wrmsrl(MSR_IA32_XSS, vcpu->arch.ia32_xss);
> }
> @@ -1057,7 +1057,7 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
> if (vcpu->arch.xcr0 != host_xcr0)
> xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0);
>
> - if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
> vcpu->arch.ia32_xss != host_xss)
> wrmsrl(MSR_IA32_XSS, host_xss);
> }


2024-05-28 15:21:54

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 33/49] KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID

On Wed, May 22, 2024, Binbin Wu wrote:
>
>
> On 5/18/2024 1:39 AM, Sean Christopherson wrote:
> > Advertise TSC_DEADLINE_TIMER via KVM_GET_SUPPORTED_CPUID when it's
> > supported in hardware,
>
> But it's using EMUL_F(TSC_DEADLINE_TIMER) below?

Doh, yeah, the changelog is wrong. KVM always emulates TSC_DEADLINE_TIMER.
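
I.e. the distinction the changelog needs to capture is roughly (illustrative
only; the real F()/EMUL_F() definitions in this series are built up via macro
games in kvm_cpu_cap_init(), so don't take these bodies literally):

  /*
   * F() advertises a feature only if the host actually supports it,
   * whereas EMUL_F() force-sets the bit because KVM emulates the
   * feature entirely in software, regardless of hardware support.
   */
  #define F(name)      (boot_cpu_has(X86_FEATURE_##name) ? feature_bit(name) : 0)
  #define EMUL_F(name) feature_bit(name)

I'll reword the changelog to say KVM now advertises TSC_DEADLINE_TIMER
unconditionally.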

Thanks!

2024-05-28 18:54:34

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 20/49] KVM: x86: Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init()

On Wed, May 22, 2024, Binbin Wu wrote:
> On 5/18/2024 1:38 AM, Sean Christopherson wrote:
> > Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init() in anticipation of merging
> > it with kvm_cpu_cap_init_kvm_defined(), and in anticipation of _setting_
> > bits in the helper (a future commit will play macro games to set emulated
> > feature flags via kvm_cpu_cap_init()).
> >
> > No functional change intended.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/kvm/cpuid.c | 36 ++++++++++++++++++------------------
> > 1 file changed, 18 insertions(+), 18 deletions(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index a802c09b50ab..5a4d6138c4f1 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -74,7 +74,7 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
> > * Raw Feature - For features that KVM supports based purely on raw host CPUID,
> > * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
> > * Simply force set the feature in KVM's capabilities, raw CPUID support will
> > - * be factored in by kvm_cpu_cap_mask().
> > + * be factored in by __kvm_cpu_cap_mask().
>
> kvm_cpu_cap_init()?

Drat, yes. IIRC, I tried to get clever to avoid having to update this comment a
second time, but then I ended up removing __kvm_cpu_cap_mask() entirely.
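
I.e. the comment should simply end up as:

 * Raw Feature - For features that KVM supports based purely on raw host CPUID,
 * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
 * Simply force set the feature in KVM's capabilities, raw CPUID support will
 * be factored in by kvm_cpu_cap_init().

Will fix.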

2024-05-28 18:57:10

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed

On Wed, May 22, 2024, Binbin Wu wrote:
> On 5/18/2024 1:38 AM, Sean Christopherson wrote:
> > @@ -4726,15 +4740,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > r = KVM_CLOCK_VALID_FLAGS;
> > break;
> > case KVM_CAP_X86_DISABLE_EXITS:
> > - r = KVM_X86_DISABLE_EXITS_PAUSE;
> > -
> > - if (!mitigate_smt_rsb) {
> > - r |= KVM_X86_DISABLE_EXITS_HLT |
> > - KVM_X86_DISABLE_EXITS_CSTATE;
> > -
> > - if (kvm_can_mwait_in_guest())
> > - r |= KVM_X86_DISABLE_EXITS_MWAIT;
> > - }
> > + r |= kvm_get_allowed_disable_exits();
>
> Nit: Just use "=".

Yowsers, that's more than a nit; that's downright bad code that just happens to be
functionally ok. Thanks again for the reviews!
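
For completeness, the squashed-in fix is simply (assuming nothing else
accumulates into 'r' for this case, which is why the '|=' happened to work):

	case KVM_CAP_X86_DISABLE_EXITS:
		r = kvm_get_allowed_disable_exits();
		break;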