Yet another version of fixes and tests for PMU counters. New in v9 is a
fix for priority inversion of #GP vs. VM-Exit on Intel when KVM fully
emulates a RDPMC with a "bad" index. For all intents and purposes, this
priority inversion triggered the WARN in v8[*], which is why I included
the fix in this series (and because the fix would conflict horribly).
Dapeng, I dropped one of your reviews (patch "Disallow "fast" RDPMC for
architectural Intel PMUs") as the patch ended up being quite different after
eliminating the early RDPMC index check.
[*] https://lore.kernel.org/all/[email protected]
v9:
- Collect reviews. [Dapeng, Kan]
- Fix a 63:31 => 63:32 typo in a changelog. [Dapeng]
- Actually check that forced emulation is enabled before trying to force
emulation on RDPMC. [Jinrong]
- Fix the aforementioned priority inversion issue.
- Completely drop "support" for fast RDPMC, in quotes because KVM doesn't
actually support RDPMC for non-architectural PMUs. I had left the code
in v8 because I didn't fully grok what the early emulator check was
doing, i.e. wasn't 100% confident it was dead code.
v8:
- https://lore.kernel.org/all/[email protected]
- Collect reviews. [Jim, Dapeng, Kan]
- Tweak names for the RDPMC flags in the selftests #defines.
- Get the event selectors used to virtualize fixed counters straight from
perf instead of hardcoding the (wrong) selectors in KVM. [Kan]
- Rename an "eventsel" field to "event" for a patch that gets blasted
away in the end anyways. [Jim]
- Add patches to fix RDPMC emulation and to test the behavior on Intel.
I spot tested on AMD and spent ~30 minutes trying to squeeze in the
bare minimum AMD support, but the PMU implementations between Intel
and AMD are juuuust different enough to make adding AMD support non-
trivial, and this series is already way too big.
v7:
- https://lore.kernel.org/all/[email protected]
- Drop patches that unnecessarily sanitized supported CPUID. [Jim]
- Purge the array of architectural event encodings. [Jim, Dapeng]
- Clean up pmu.h to remove useless macros, and make it easier to use the
new macros. [Jim]
- Port more of pmu_event_filter_test.c to pmu.h macros. [Jim, Jinrong]
- Clean up test comments and error messages. [Jim]
- Sanity check the value provided to vcpu_set_cpuid_property(). [Jim]
v6:
- https://lore.kernel.org/all/[email protected]
- Test LLC references/misses with CLFLUSH{,OPT}. [Jim]
- Make the tests play nice without PERF_CAPABILITIES. [Mingwei]
- Don't squash eventsels that happen to match an unsupported arch event. [Kan]
- Test PMC counters with forced emulation (don't ask how long it took me to
figure out how to read integer module params).
v5: https://lore.kernel.org/all/[email protected]
v4: https://lore.kernel.org/all/[email protected]
v3: https://lore.kernel.org/kvm/[email protected]
Jinrong Liang (7):
KVM: selftests: Add vcpu_set_cpuid_property() to set properties
KVM: selftests: Add pmu.h and lib/pmu.c for common PMU assets
KVM: selftests: Test Intel PMU architectural events on gp counters
KVM: selftests: Test Intel PMU architectural events on fixed counters
KVM: selftests: Test consistency of CPUID with num of gp counters
KVM: selftests: Test consistency of CPUID with num of fixed counters
KVM: selftests: Add functional test for Intel's fixed PMU counters
Sean Christopherson (21):
KVM: x86/pmu: Always treat Fixed counters as available when supported
KVM: x86/pmu: Allow programming events that match unsupported arch
events
KVM: x86/pmu: Remove KVM's enumeration of Intel's architectural
encodings
KVM: x86/pmu: Setup fixed counters' eventsel during PMU initialization
KVM: x86/pmu: Get eventsel for fixed counters from perf
KVM: x86/pmu: Don't ignore bits 31:30 for RDPMC index on AMD
KVM: x86/pmu: Prioritize VMX interception over #GP on RDPMC due to bad
index
KVM: x86/pmu: Apply "fast" RDPMC only to Intel PMUs
KVM: x86/pmu: Disallow "fast" RDPMC for architectural Intel PMUs
KVM: x86/pmu: Explicitly check for RDPMC of unsupported Intel PMC
types
KVM: selftests: Drop the "name" param from KVM_X86_PMU_FEATURE()
KVM: selftests: Extend {kvm,this}_pmu_has() to support fixed counters
KVM: selftests: Expand PMU counters test to verify LLC events
KVM: selftests: Add a helper to query if the PMU module param is
enabled
KVM: selftests: Add helpers to read integer module params
KVM: selftests: Query module param to detect FEP in MSR filtering test
KVM: selftests: Move KVM_FEP macro into common library header
KVM: selftests: Test PMC virtualization with forced emulation
KVM: selftests: Add a forced emulation variation of KVM_ASM_SAFE()
KVM: selftests: Add helpers for safe and safe+forced RDMSR, RDPMC, and
XGETBV
KVM: selftests: Extend PMU counters test to validate RDPMC after WRMSR
arch/x86/include/asm/kvm-x86-pmu-ops.h | 3 +-
arch/x86/kvm/emulate.c | 2 +-
arch/x86/kvm/kvm_emulate.h | 2 +-
arch/x86/kvm/pmu.c | 20 +-
arch/x86/kvm/pmu.h | 5 +-
arch/x86/kvm/svm/pmu.c | 17 +-
arch/x86/kvm/vmx/pmu_intel.c | 157 ++---
arch/x86/kvm/x86.c | 9 +-
tools/testing/selftests/kvm/Makefile | 2 +
.../selftests/kvm/include/kvm_util_base.h | 4 +
tools/testing/selftests/kvm/include/pmu.h | 97 +++
.../selftests/kvm/include/x86_64/processor.h | 148 ++++-
tools/testing/selftests/kvm/lib/kvm_util.c | 62 +-
tools/testing/selftests/kvm/lib/pmu.c | 31 +
.../selftests/kvm/lib/x86_64/processor.c | 15 +-
.../selftests/kvm/x86_64/pmu_counters_test.c | 617 ++++++++++++++++++
.../kvm/x86_64/pmu_event_filter_test.c | 143 ++--
.../smaller_maxphyaddr_emulation_test.c | 2 +-
.../kvm/x86_64/userspace_msr_exit_test.c | 29 +-
.../selftests/kvm/x86_64/vmx_pmu_caps_test.c | 2 +-
20 files changed, 1079 insertions(+), 288 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/pmu.h
create mode 100644 tools/testing/selftests/kvm/lib/pmu.c
create mode 100644 tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
base-commit: 9f018b692c0114a484cea4faf9758a224774866a
--
2.43.0.rc2.451.g8631bc7472-goog
Remove KVM's bogus restriction that the guest can't program an event whose
encoding matches an unsupported architectural event. The enumeration of
an architectural event only says that if a CPU supports an architectural
event, then the event can be programmed using the architectural encoding.
The enumeration does NOT say anything about the encoding when the CPU
doesn't report support for the architectural event.
Preventing the guest from counting events whose encoding happens to match
an architectural event breaks existing functionality whenever Intel adds
an architectural encoding that was *ever* used for a CPU that doesn't
enumerate support for the architectural event, even if the encoding is for
the exact same event!
E.g. the architectural encoding for Top-Down Slots is 0x01a4. On Broadwell
CPUs, which do not support the Top-Down Slots architectural event, 0x01a4
is a valid, model-specific event. Denying guest usage of 0x01a4 if/when
KVM adds support for Top-Down Slots would break any Broadwell-based guest.
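As an illustrative guest-side sketch (not part of the patch; assumes the
selftests' wrmsr() helper and the usual ARCH_PERFMON_EVENTSEL_* defines),
programming the encoding directly now counts whenever the encoding is
valid for the underlying CPU, regardless of what CPUID.0xA enumerates:
  wrmsr(MSR_P6_EVNTSEL0, ARCH_PERFMON_EVENTSEL_ENABLE |
			 ARCH_PERFMON_EVENTSEL_OS |
			 ARCH_PERFMON_EVENTSEL_USR |
			 0x01a4);	/* event 0xa4, umask 0x01 */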
Reported-by: Kan Liang <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]
Fixes: a21864486f7e ("KVM: x86/pmu: Fix available_event_types check for REF_CPU_CYCLES event")
Reviewed-by: Dapeng Mi <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm-x86-pmu-ops.h | 1 -
arch/x86/kvm/pmu.c | 1 -
arch/x86/kvm/pmu.h | 1 -
arch/x86/kvm/svm/pmu.c | 6 ----
arch/x86/kvm/vmx/pmu_intel.c | 38 --------------------------
5 files changed, 47 deletions(-)
diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h b/arch/x86/include/asm/kvm-x86-pmu-ops.h
index 058bc636356a..d7eebee4450c 100644
--- a/arch/x86/include/asm/kvm-x86-pmu-ops.h
+++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h
@@ -12,7 +12,6 @@ BUILD_BUG_ON(1)
* a NULL definition, for example if "static_call_cond()" will be used
* at the call sites.
*/
-KVM_X86_PMU_OP(hw_event_available)
KVM_X86_PMU_OP(pmc_idx_to_pmc)
KVM_X86_PMU_OP(rdpmc_ecx_to_pmc)
KVM_X86_PMU_OP(msr_idx_to_pmc)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 87cc6c8809ad..30945fea6988 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -441,7 +441,6 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
static bool pmc_event_is_allowed(struct kvm_pmc *pmc)
{
return pmc_is_globally_enabled(pmc) && pmc_speculative_in_use(pmc) &&
- static_call(kvm_x86_pmu_hw_event_available)(pmc) &&
check_pmu_event_filter(pmc);
}
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 7caeb3d8d4fd..87ecf22f5b25 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -19,7 +19,6 @@
#define VMWARE_BACKDOOR_PMC_APPARENT_TIME 0x10002
struct kvm_pmu_ops {
- bool (*hw_event_available)(struct kvm_pmc *pmc);
struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
unsigned int idx, u64 *mask);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index b6a7ad4d6914..1475d47c821c 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -73,11 +73,6 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
return amd_pmc_idx_to_pmc(pmu, idx);
}
-static bool amd_hw_event_available(struct kvm_pmc *pmc)
-{
- return true;
-}
-
static bool amd_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -233,7 +228,6 @@ static void amd_pmu_init(struct kvm_vcpu *vcpu)
}
struct kvm_pmu_ops amd_pmu_ops __initdata = {
- .hw_event_available = amd_hw_event_available,
.pmc_idx_to_pmc = amd_pmc_idx_to_pmc,
.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
.msr_idx_to_pmc = amd_msr_idx_to_pmc,
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 8207f8c03585..1a7d021a6c7b 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -101,43 +101,6 @@ static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
}
}
-static bool intel_hw_event_available(struct kvm_pmc *pmc)
-{
- struct kvm_pmu *pmu = pmc_to_pmu(pmc);
- u8 event_select = pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
- u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
- int i;
-
- /*
- * Fixed counters are always available if KVM reaches this point. If a
- * fixed counter is unsupported in hardware or guest CPUID, KVM doesn't
- * allow the counter's corresponding MSR to be written. KVM does use
- * architectural events to program fixed counters, as the interface to
- * perf doesn't allow requesting a specific fixed counter, e.g. perf
- * may (sadly) back a guest fixed PMC with a general purposed counter.
- * But if _hardware_ doesn't support the associated event, KVM simply
- * doesn't enumerate support for the fixed counter.
- */
- if (pmc_is_fixed(pmc))
- return true;
-
- BUILD_BUG_ON(ARRAY_SIZE(intel_arch_events) != NR_INTEL_ARCH_EVENTS);
-
- /*
- * Disallow events reported as unavailable in guest CPUID. Note, this
- * doesn't apply to pseudo-architectural events (see above).
- */
- for (i = 0; i < NR_REAL_INTEL_ARCH_EVENTS; i++) {
- if (intel_arch_events[i].eventsel != event_select ||
- intel_arch_events[i].unit_mask != unit_mask)
- continue;
-
- return pmu->available_event_types & BIT(i);
- }
-
- return true;
-}
-
static bool intel_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -780,7 +743,6 @@ void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu)
}
struct kvm_pmu_ops intel_pmu_ops __initdata = {
- .hw_event_available = intel_hw_event_available,
.pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
.rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
.msr_idx_to_pmc = intel_msr_idx_to_pmc,
--
2.43.0.rc2.451.g8631bc7472-goog
Drop KVM's enumeration of Intel's architectural event encodings, and
instead open code the three encodings (of which only two are real) that
KVM uses to emulate fixed counters. Now that KVM doesn't incorrectly
enforce the availability of architectural encodings, there is no reason
for KVM to ever care about the encodings themselves, at least not in the
current format of an array indexed by the encoding's position in CPUID.
Opportunistically add a comment to explain why KVM cares about eventsel
values for fixed counters.
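For reference, the arithmetic the open-coded table feeds into is
eventsel = (unit_mask << 8) | event, e.g. fixed counter 0 (instructions
retired) yields (0x00 << 8) | 0xc0 = 0x00c0, and the reference cycles
pseudo-encoding yields (0x03 << 8) | 0x00 = 0x0300.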
Suggested-by: Jim Mattson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 72 ++++++++++++------------------------
1 file changed, 23 insertions(+), 49 deletions(-)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 1a7d021a6c7b..f3c44ddc09f8 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -22,52 +22,6 @@
#define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0)
-enum intel_pmu_architectural_events {
- /*
- * The order of the architectural events matters as support for each
- * event is enumerated via CPUID using the index of the event.
- */
- INTEL_ARCH_CPU_CYCLES,
- INTEL_ARCH_INSTRUCTIONS_RETIRED,
- INTEL_ARCH_REFERENCE_CYCLES,
- INTEL_ARCH_LLC_REFERENCES,
- INTEL_ARCH_LLC_MISSES,
- INTEL_ARCH_BRANCHES_RETIRED,
- INTEL_ARCH_BRANCHES_MISPREDICTED,
-
- NR_REAL_INTEL_ARCH_EVENTS,
-
- /*
- * Pseudo-architectural event used to implement IA32_FIXED_CTR2, a.k.a.
- * TSC reference cycles. The architectural reference cycles event may
- * or may not actually use the TSC as the reference, e.g. might use the
- * core crystal clock or the bus clock (yeah, "architectural").
- */
- PSEUDO_ARCH_REFERENCE_CYCLES = NR_REAL_INTEL_ARCH_EVENTS,
- NR_INTEL_ARCH_EVENTS,
-};
-
-static struct {
- u8 eventsel;
- u8 unit_mask;
-} const intel_arch_events[] = {
- [INTEL_ARCH_CPU_CYCLES] = { 0x3c, 0x00 },
- [INTEL_ARCH_INSTRUCTIONS_RETIRED] = { 0xc0, 0x00 },
- [INTEL_ARCH_REFERENCE_CYCLES] = { 0x3c, 0x01 },
- [INTEL_ARCH_LLC_REFERENCES] = { 0x2e, 0x4f },
- [INTEL_ARCH_LLC_MISSES] = { 0x2e, 0x41 },
- [INTEL_ARCH_BRANCHES_RETIRED] = { 0xc4, 0x00 },
- [INTEL_ARCH_BRANCHES_MISPREDICTED] = { 0xc5, 0x00 },
- [PSEUDO_ARCH_REFERENCE_CYCLES] = { 0x00, 0x03 },
-};
-
-/* mapping between fixed pmc index and intel_arch_events array */
-static int fixed_pmc_events[] = {
- [0] = INTEL_ARCH_INSTRUCTIONS_RETIRED,
- [1] = INTEL_ARCH_CPU_CYCLES,
- [2] = PSEUDO_ARCH_REFERENCE_CYCLES,
-};
-
static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
{
struct kvm_pmc *pmc;
@@ -440,8 +394,29 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 0;
}
+/*
+ * Map fixed counter events to architectural general purpose event encodings.
+ * Perf doesn't provide APIs to allow KVM to directly program a fixed counter,
+ * and so KVM instead programs the architectural event to effectively request
+ * the fixed counter. Perf isn't guaranteed to use a fixed counter and may
+ * instead program the encoding into a general purpose counter, e.g. if a
+ * different perf_event is already utilizing the requested counter, but the end
+ * result is the same (ignoring the fact that using a general purpose counter
+ * will likely exacerbate counter contention).
+ *
+ * Note, reference cycles is counted using a perf-defined "pseudo-encoding",
+ * as there is no architectural general purpose encoding for reference cycles.
+ */
static void setup_fixed_pmc_eventsel(struct kvm_pmu *pmu)
{
+ const struct {
+ u8 eventsel;
+ u8 unit_mask;
+ } fixed_pmc_events[] = {
+ [0] = { 0xc0, 0x00 }, /* Instruction Retired / PERF_COUNT_HW_INSTRUCTIONS. */
+ [1] = { 0x3c, 0x00 }, /* CPU Cycles/ PERF_COUNT_HW_CPU_CYCLES. */
+ [2] = { 0x00, 0x03 }, /* Reference Cycles / PERF_COUNT_HW_REF_CPU_CYCLES*/
+ };
int i;
BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_events) != KVM_PMC_MAX_FIXED);
@@ -449,10 +424,9 @@ static void setup_fixed_pmc_eventsel(struct kvm_pmu *pmu)
for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
int index = array_index_nospec(i, KVM_PMC_MAX_FIXED);
struct kvm_pmc *pmc = &pmu->fixed_counters[index];
- u32 event = fixed_pmc_events[index];
- pmc->eventsel = (intel_arch_events[event].unit_mask << 8) |
- intel_arch_events[event].eventsel;
+ pmc->eventsel = (fixed_pmc_events[index].unit_mask << 8) |
+ fixed_pmc_events[index].eventsel;
}
}
--
2.43.0.rc2.451.g8631bc7472-goog
Set the eventsel for all fixed counters during PMU initialization; the
eventsel is hardcoded and consumed if and only if the counter is supported,
i.e. there is no reason to redo the setup every time the PMU is refreshed.
Configuring all KVM-supported fixed counters also eliminates a potential
pitfall if/when KVM supports discontiguous fixed counters, in which case
configuring only nr_arch_fixed_counters will be insufficient (ignoring the
fact that KVM will need many other changes to support discontiguous fixed
counters).
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index f3c44ddc09f8..98e92b9ece09 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -407,27 +407,21 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
* Note, reference cycles is counted using a perf-defined "pseudo-encoding",
* as there is no architectural general purpose encoding for reference cycles.
*/
-static void setup_fixed_pmc_eventsel(struct kvm_pmu *pmu)
+static u64 intel_get_fixed_pmc_eventsel(int index)
{
const struct {
- u8 eventsel;
+ u8 event;
u8 unit_mask;
} fixed_pmc_events[] = {
[0] = { 0xc0, 0x00 }, /* Instruction Retired / PERF_COUNT_HW_INSTRUCTIONS. */
[1] = { 0x3c, 0x00 }, /* CPU Cycles/ PERF_COUNT_HW_CPU_CYCLES. */
[2] = { 0x00, 0x03 }, /* Reference Cycles / PERF_COUNT_HW_REF_CPU_CYCLES*/
};
- int i;
BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_events) != KVM_PMC_MAX_FIXED);
- for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
- int index = array_index_nospec(i, KVM_PMC_MAX_FIXED);
- struct kvm_pmc *pmc = &pmu->fixed_counters[index];
-
- pmc->eventsel = (fixed_pmc_events[index].unit_mask << 8) |
- fixed_pmc_events[index].eventsel;
- }
+ return (fixed_pmc_events[index].unit_mask << 8) |
+ fixed_pmc_events[index].event;
}
static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
@@ -493,7 +487,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
kvm_pmu_cap.bit_width_fixed);
pmu->counter_bitmask[KVM_PMC_FIXED] =
((u64)1 << edx.split.bit_width_fixed) - 1;
- setup_fixed_pmc_eventsel(pmu);
}
for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
@@ -571,6 +564,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
pmu->fixed_counters[i].vcpu = vcpu;
pmu->fixed_counters[i].idx = i + INTEL_PMC_IDX_FIXED;
pmu->fixed_counters[i].current_config = 0;
+ pmu->fixed_counters[i].eventsel = intel_get_fixed_pmc_eventsel(i);
}
lbr_desc->records.nr = 0;
--
2.43.0.rc2.451.g8631bc7472-goog
Get the event selectors used to effectively request fixed counters for
perf events from perf itself instead of hardcoding them in KVM and hoping
that they match the underlying hardware. While fixed counters 0 and 1 use
architectural events, as of ffbe4ab0beda ("perf/x86/intel: Extend the
ref-cycles event to GP counters") fixed counter 2 (reference TSC cycles)
may use a software-defined pseudo-encoding or a real hardware-defined
encoding.
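As a sketch of the new flow (perf_get_hw_event_config() is the perf API
used in the diff below; the surrounding variable is illustrative), KVM now
simply asks perf for the encoding and uses whatever comes back:
  u64 eventsel = perf_get_hw_event_config(PERF_COUNT_HW_REF_CPU_CYCLES);
On CPUs where perf extends ref-cycles to GP counters, this can return the
real hardware encoding instead of the 0x0300 pseudo-encoding.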
Reported-by: Kan Liang <[email protected]>
Closes: https://lkml.kernel.org/r/4281eee7-6423-4ec8-bb18-c6aeee1faf2c%40linux.intel.com
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 98e92b9ece09..ec4feaef3d55 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -404,24 +404,28 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
* result is the same (ignoring the fact that using a general purpose counter
* will likely exacerbate counter contention).
*
- * Note, reference cycles is counted using a perf-defined "pseudo-encoding",
- * as there is no architectural general purpose encoding for reference cycles.
+ * Forcibly inlined to allow asserting on @index at build time, and there should
+ * never be more than one user.
*/
-static u64 intel_get_fixed_pmc_eventsel(int index)
+static __always_inline u64 intel_get_fixed_pmc_eventsel(unsigned int index)
{
- const struct {
- u8 event;
- u8 unit_mask;
- } fixed_pmc_events[] = {
- [0] = { 0xc0, 0x00 }, /* Instruction Retired / PERF_COUNT_HW_INSTRUCTIONS. */
- [1] = { 0x3c, 0x00 }, /* CPU Cycles/ PERF_COUNT_HW_CPU_CYCLES. */
- [2] = { 0x00, 0x03 }, /* Reference Cycles / PERF_COUNT_HW_REF_CPU_CYCLES*/
+ const enum perf_hw_id fixed_pmc_perf_ids[] = {
+ [0] = PERF_COUNT_HW_INSTRUCTIONS,
+ [1] = PERF_COUNT_HW_CPU_CYCLES,
+ [2] = PERF_COUNT_HW_REF_CPU_CYCLES,
};
+ u64 eventsel;
- BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_events) != KVM_PMC_MAX_FIXED);
+ BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_perf_ids) != KVM_PMC_MAX_FIXED);
+ BUILD_BUG_ON(index >= KVM_PMC_MAX_FIXED);
- return (fixed_pmc_events[index].unit_mask << 8) |
- fixed_pmc_events[index].event;
+ /*
+ * Yell if perf reports support for a fixed counter but perf doesn't
+ * have a known encoding for the associated general purpose event.
+ */
+ eventsel = perf_get_hw_event_config(fixed_pmc_perf_ids[index]);
+ WARN_ON_ONCE(!eventsel && index < kvm_pmu_cap.num_counters_fixed);
+ return eventsel;
}
static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
--
2.43.0.rc2.451.g8631bc7472-goog
Stop stripping bits 31:30 prior to validating/consuming the RDPMC index on
AMD. Per the APM's documentation of RDPMC, *values* greater than 27 are
reserved. The behavior of upper bits being flags is firmly Intel-only.
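Behavioral example as a guest-side sketch (rdpmc_safe() is the selftest
helper added later in this series; the exact signature is assumed):
  uint64_t val;
  GUEST_ASSERT_EQ(rdpmc_safe(BIT(30), &val), GP_VECTOR);
Previously KVM/AMD silently aliased ECX = BIT(30) to PMC0; with this fix
the reserved value #GPs, matching the #GP the APM specifies for
unimplemented/reserved counters.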
Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/pmu.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 1475d47c821c..1fafc46f61c9 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -77,8 +77,6 @@ static bool amd_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
- idx &= ~(3u << 30);
-
return idx < pmu->nr_arch_gp_counters;
}
@@ -86,7 +84,7 @@ static bool amd_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
static struct kvm_pmc *amd_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
unsigned int idx, u64 *mask)
{
- return amd_pmc_idx_to_pmc(vcpu_to_pmu(vcpu), idx & ~(3u << 30));
+ return amd_pmc_idx_to_pmc(vcpu_to_pmu(vcpu), idx);
}
static struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
--
2.43.0.rc2.451.g8631bc7472-goog
Apply the pre-intercept RDPMC validity check only to AMD, and rename all
relevant functions to make it as clear as possible that the check is not a
standard PMC index check. On Intel, the basic rule is that only invalid
opcodes and privilege/permission/mode checks have priority over VM-Exit,
i.e. RDPMC with an invalid index should VM-Exit, not #GP. While the SDM
doesn't explicitly call out RDPMC, it _does_ explicitly use RDMSR of a
non-existent MSR as an example where VM-Exit has priority over #GP, and
RDPMC is effectively just a variation of RDMSR.
Manually testing on various Intel CPUs confirms this behavior, and the
inverted priority was introduced for SVM compatibility, i.e. was not an
intentional change for Intel PMUs. On AMD, *all* exceptions on RDPMC have
priority over VM-Exit.
Check for a NULL kvm_pmu_ops.check_rdpmc_early instead of using a RET0
static call so as to provide a convenient location to document the
difference between Intel and AMD, and to again try to make it as obvious
as possible that the early check is a one-off thing, not a generic "is
this PMC valid?" helper.
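Concretely, if L1 intercepts RDPMC and L2 executes RDPMC with an
out-of-range index, KVM now synthesizes a VM-Exit to L1 when emulating an
Intel vCPU, whereas for an AMD vCPU the guest takes a #GP and no #VMEXIT
is reflected.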
Fixes: 8061252ee0d2 ("KVM: SVM: Add intercept checks for remaining twobyte instructions")
Cc: Jim Mattson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm-x86-pmu-ops.h | 2 +-
arch/x86/kvm/emulate.c | 2 +-
arch/x86/kvm/kvm_emulate.h | 2 +-
arch/x86/kvm/pmu.c | 16 +++++++++++++---
arch/x86/kvm/pmu.h | 4 ++--
arch/x86/kvm/svm/pmu.c | 9 ++++++---
arch/x86/kvm/vmx/pmu_intel.c | 12 ------------
arch/x86/kvm/x86.c | 9 +++------
8 files changed, 27 insertions(+), 29 deletions(-)
diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h b/arch/x86/include/asm/kvm-x86-pmu-ops.h
index d7eebee4450c..f0cd48222133 100644
--- a/arch/x86/include/asm/kvm-x86-pmu-ops.h
+++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h
@@ -15,7 +15,7 @@ BUILD_BUG_ON(1)
KVM_X86_PMU_OP(pmc_idx_to_pmc)
KVM_X86_PMU_OP(rdpmc_ecx_to_pmc)
KVM_X86_PMU_OP(msr_idx_to_pmc)
-KVM_X86_PMU_OP(is_valid_rdpmc_ecx)
+KVM_X86_PMU_OP_OPTIONAL(check_rdpmc_early)
KVM_X86_PMU_OP(is_valid_msr)
KVM_X86_PMU_OP(get_msr)
KVM_X86_PMU_OP(set_msr)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e223043ef5b2..695ab5b6055c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3962,7 +3962,7 @@ static int check_rdpmc(struct x86_emulate_ctxt *ctxt)
* protected mode.
*/
if ((!(cr4 & X86_CR4_PCE) && ctxt->ops->cpl(ctxt)) ||
- ctxt->ops->check_pmc(ctxt, rcx))
+ ctxt->ops->check_rdpmc_early(ctxt, rcx))
return emulate_gp(ctxt, 0);
return X86EMUL_CONTINUE;
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index e6d149825169..4351149484fb 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -208,7 +208,7 @@ struct x86_emulate_ops {
int (*set_msr_with_filter)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 data);
int (*get_msr_with_filter)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 *pdata);
int (*get_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 *pdata);
- int (*check_pmc)(struct x86_emulate_ctxt *ctxt, u32 pmc);
+ int (*check_rdpmc_early)(struct x86_emulate_ctxt *ctxt, u32 pmc);
int (*read_pmc)(struct x86_emulate_ctxt *ctxt, u32 pmc, u64 *pdata);
void (*halt)(struct x86_emulate_ctxt *ctxt);
void (*wbinvd)(struct x86_emulate_ctxt *ctxt);
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 30945fea6988..0b0d804ee239 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -524,10 +524,20 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
kvm_pmu_cleanup(vcpu);
}
-/* check if idx is a valid index to access PMU */
-bool kvm_pmu_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
+int kvm_pmu_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx)
{
- return static_call(kvm_x86_pmu_is_valid_rdpmc_ecx)(vcpu, idx);
+ /*
+ * On Intel, VMX interception has priority over RDPMC exceptions that
+ * aren't already handled by the emulator, i.e. there are no additional
+ * checks needed for Intel PMUs.
+ *
+ * On AMD, _all_ exceptions on RDPMC have priority over SVM intercepts,
+ * i.e. an invalid PMC results in a #GP, not #VMEXIT.
+ */
+ if (!kvm_pmu_ops.check_rdpmc_early)
+ return 0;
+
+ return static_call(kvm_x86_pmu_check_rdpmc_early)(vcpu, idx);
}
bool is_vmware_backdoor_pmc(u32 pmc_idx)
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 87ecf22f5b25..51bbb01b21c8 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -23,7 +23,7 @@ struct kvm_pmu_ops {
struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
unsigned int idx, u64 *mask);
struct kvm_pmc *(*msr_idx_to_pmc)(struct kvm_vcpu *vcpu, u32 msr);
- bool (*is_valid_rdpmc_ecx)(struct kvm_vcpu *vcpu, unsigned int idx);
+ int (*check_rdpmc_early)(struct kvm_vcpu *vcpu, unsigned int idx);
bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr);
int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
@@ -215,7 +215,7 @@ static inline bool pmc_is_globally_enabled(struct kvm_pmc *pmc)
void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
-bool kvm_pmu_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx);
+int kvm_pmu_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx);
bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr);
int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 1fafc46f61c9..e886300f0f97 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -73,11 +73,14 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
return amd_pmc_idx_to_pmc(pmu, idx);
}
-static bool amd_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
+static int amd_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
- return idx < pmu->nr_arch_gp_counters;
+ if (idx >= pmu->nr_arch_gp_counters)
+ return -EINVAL;
+
+ return 0;
}
/* idx is the ECX register of RDPMC instruction */
@@ -229,7 +232,7 @@ struct kvm_pmu_ops amd_pmu_ops __initdata = {
.pmc_idx_to_pmc = amd_pmc_idx_to_pmc,
.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
.msr_idx_to_pmc = amd_msr_idx_to_pmc,
- .is_valid_rdpmc_ecx = amd_is_valid_rdpmc_ecx,
+ .check_rdpmc_early = amd_check_rdpmc_early,
.is_valid_msr = amd_is_valid_msr,
.get_msr = amd_pmu_get_msr,
.set_msr = amd_pmu_set_msr,
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index ec4feaef3d55..1b1f888ad32b 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -55,17 +55,6 @@ static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
}
}
-static bool intel_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
-{
- struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
- bool fixed = idx & (1u << 30);
-
- idx &= ~(3u << 30);
-
- return fixed ? idx < pmu->nr_arch_fixed_counters
- : idx < pmu->nr_arch_gp_counters;
-}
-
static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
unsigned int idx, u64 *mask)
{
@@ -718,7 +707,6 @@ struct kvm_pmu_ops intel_pmu_ops __initdata = {
.pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
.rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
.msr_idx_to_pmc = intel_msr_idx_to_pmc,
- .is_valid_rdpmc_ecx = intel_is_valid_rdpmc_ecx,
.is_valid_msr = intel_is_valid_msr,
.get_msr = intel_pmu_get_msr,
.set_msr = intel_pmu_set_msr,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d19bbca6a44c..17ffadc61a45 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8348,12 +8348,9 @@ static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,
return kvm_get_msr(emul_to_vcpu(ctxt), msr_index, pdata);
}
-static int emulator_check_pmc(struct x86_emulate_ctxt *ctxt,
- u32 pmc)
+static int emulator_check_rdpmc_early(struct x86_emulate_ctxt *ctxt, u32 pmc)
{
- if (kvm_pmu_is_valid_rdpmc_ecx(emul_to_vcpu(ctxt), pmc))
- return 0;
- return -EINVAL;
+ return kvm_pmu_check_rdpmc_early(emul_to_vcpu(ctxt), pmc);
}
static int emulator_read_pmc(struct x86_emulate_ctxt *ctxt,
@@ -8485,7 +8482,7 @@ static const struct x86_emulate_ops emulate_ops = {
.set_msr_with_filter = emulator_set_msr_with_filter,
.get_msr_with_filter = emulator_get_msr_with_filter,
.get_msr = emulator_get_msr,
- .check_pmc = emulator_check_pmc,
+ .check_rdpmc_early = emulator_check_rdpmc_early,
.read_pmc = emulator_read_pmc,
.halt = emulator_halt,
.wbinvd = emulator_wbinvd,
--
2.43.0.rc2.451.g8631bc7472-goog
Move the handling of "fast" RDPMC instructions, which drop bits 63:32 of
the count, to Intel. The "fast" flag, and all flags for that matter, are
Intel-only and aren't supported by AMD.
Opportunistically replace open coded bit crud with proper #defines.
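For example, with the "fast" flag set a counter reading of 0x1_0000_0001
is returned to the guest as 0x0000_0001, i.e. the result is masked with
GENMASK_ULL(31, 0), which is exactly what the Intel-only code below does.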
Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
Reviewed-by: Dapeng Mi <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/pmu.c | 3 +--
arch/x86/kvm/vmx/pmu_intel.c | 11 +++++++++--
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 0b0d804ee239..09b0feb975c3 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -576,10 +576,9 @@ static int kvm_pmu_rdpmc_vmware(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
{
- bool fast_mode = idx & (1u << 31);
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
struct kvm_pmc *pmc;
- u64 mask = fast_mode ? ~0u : ~0ull;
+ u64 mask = ~0ull;
if (!pmu->version)
return 1;
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 1b1f888ad32b..6903dd9b71ad 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -20,6 +20,10 @@
#include "nested.h"
#include "pmu.h"
+/* Perf's "BASE" is wildly misleading, this is a single-bit flag, not a base. */
+#define INTEL_RDPMC_FIXED INTEL_PMC_FIXED_RDPMC_BASE
+#define INTEL_RDPMC_FAST BIT(31)
+
#define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0)
static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
@@ -59,11 +63,14 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
unsigned int idx, u64 *mask)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
- bool fixed = idx & (1u << 30);
+ bool fixed = idx & INTEL_RDPMC_FIXED;
struct kvm_pmc *counters;
unsigned int num_counters;
- idx &= ~(3u << 30);
+ if (idx & INTEL_RDPMC_FAST)
+ *mask &= GENMASK_ULL(31, 0);
+
+ idx &= ~(INTEL_RDPMC_FIXED | INTEL_RDPMC_FAST);
if (fixed) {
counters = pmu->fixed_counters;
num_counters = pmu->nr_arch_fixed_counters;
--
2.43.0.rc2.451.g8631bc7472-goog
Explicitly check for attempts to read unsupported PMC types instead of
letting the bounds check fail. Functionally, letting the check fail is
ok, but it's unnecessarily subtle and does a poor job of documenting the
architectural behavior that KVM is emulating.
Opportunistically add macros for the type vs. index to further document
what is going on.
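Worked example of the decode the new macros document (type/index split
per the SDM): ECX = 0x40000001 is type 0x4000 (fixed) with index 1, while
ECX = 0x20000000 is type 0x2000 (performance metrics), which KVM doesn't
support and now explicitly rejects instead of relying on the bounds check
to fail.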
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 644de27bd48a..bd4f4bdf5419 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -23,6 +23,9 @@
/* Perf's "BASE" is wildly misleading, this is a single-bit flag, not a base. */
#define INTEL_RDPMC_FIXED INTEL_PMC_FIXED_RDPMC_BASE
+#define INTEL_RDPMC_TYPE_MASK GENMASK(31, 16)
+#define INTEL_RDPMC_INDEX_MASK GENMASK(15, 0)
+
#define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0)
static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
@@ -82,9 +85,13 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
/*
* Fixed PMCs are supported on all architectural PMUs. Note, KVM only
* emulates fixed PMCs for PMU v2+, but the flag itself is still valid,
- * i.e. let RDPMC fail due to accessing a non-existent counter.
+ * i.e. let RDPMC fail due to accessing a non-existent counter. Reject
+ * attempts to read all other types, which are unknown/unsupported.
*/
- idx &= ~INTEL_RDPMC_FIXED;
+ if (idx & INTEL_RDPMC_TYPE_MASK & ~INTEL_RDPMC_FIXED)
+ return NULL;
+
+ idx &= INTEL_RDPMC_INDEX_MASK;
if (fixed) {
counters = pmu->fixed_counters;
num_counters = pmu->nr_arch_fixed_counters;
--
2.43.0.rc2.451.g8631bc7472-goog
From: Jinrong Liang <[email protected]>
Add vcpu_set_cpuid_property() helper function for setting properties, and
use it instead of open coding an equivalent for MAX_PHY_ADDR. Future vPMU
testcases will also need to stuff various CPUID properties.
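Example usage (a sketch; the MAXPHYADDR conversion is in the diff below,
and X86_PROPERTY_PMU_VERSION is how the upcoming vPMU tests use the
helper):
  vcpu_set_cpuid_property(vcpu, X86_PROPERTY_MAX_PHY_ADDR, MAXPHYADDR);
  vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_VERSION, pmu_version);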
Reviewed-by: Jim Mattson <[email protected]>
Signed-off-by: Jinrong Liang <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/include/x86_64/processor.h | 4 +++-
.../testing/selftests/kvm/lib/x86_64/processor.c | 15 ++++++++++++---
.../x86_64/smaller_maxphyaddr_emulation_test.c | 2 +-
3 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index a84863503fcb..932944c4ea01 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -995,7 +995,9 @@ static inline void vcpu_set_cpuid(struct kvm_vcpu *vcpu)
vcpu_ioctl(vcpu, KVM_GET_CPUID2, vcpu->cpuid);
}
-void vcpu_set_cpuid_maxphyaddr(struct kvm_vcpu *vcpu, uint8_t maxphyaddr);
+void vcpu_set_cpuid_property(struct kvm_vcpu *vcpu,
+ struct kvm_x86_cpu_property property,
+ uint32_t value);
void vcpu_clear_cpuid_entry(struct kvm_vcpu *vcpu, uint32_t function);
void vcpu_set_or_clear_cpuid_feature(struct kvm_vcpu *vcpu,
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index d8288374078e..67eb82a6c754 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -752,12 +752,21 @@ void vcpu_init_cpuid(struct kvm_vcpu *vcpu, const struct kvm_cpuid2 *cpuid)
vcpu_set_cpuid(vcpu);
}
-void vcpu_set_cpuid_maxphyaddr(struct kvm_vcpu *vcpu, uint8_t maxphyaddr)
+void vcpu_set_cpuid_property(struct kvm_vcpu *vcpu,
+ struct kvm_x86_cpu_property property,
+ uint32_t value)
{
- struct kvm_cpuid_entry2 *entry = vcpu_get_cpuid_entry(vcpu, 0x80000008);
+ struct kvm_cpuid_entry2 *entry;
+
+ entry = __vcpu_get_cpuid_entry(vcpu, property.function, property.index);
+
+ (&entry->eax)[property.reg] &= ~GENMASK(property.hi_bit, property.lo_bit);
+ (&entry->eax)[property.reg] |= value << property.lo_bit;
- entry->eax = (entry->eax & ~0xff) | maxphyaddr;
vcpu_set_cpuid(vcpu);
+
+ /* Sanity check that @value doesn't exceed the bounds in any way. */
+ TEST_ASSERT_EQ(kvm_cpuid_property(vcpu->cpuid, property), value);
}
void vcpu_clear_cpuid_entry(struct kvm_vcpu *vcpu, uint32_t function)
diff --git a/tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c b/tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c
index 06edf00a97d6..9b89440dff19 100644
--- a/tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c
+++ b/tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c
@@ -63,7 +63,7 @@ int main(int argc, char *argv[])
vm_init_descriptor_tables(vm);
vcpu_init_descriptor_tables(vcpu);
- vcpu_set_cpuid_maxphyaddr(vcpu, MAXPHYADDR);
+ vcpu_set_cpuid_property(vcpu, X86_PROPERTY_MAX_PHY_ADDR, MAXPHYADDR);
rc = kvm_check_cap(KVM_CAP_EXIT_ON_EMULATION_FAILURE);
TEST_ASSERT(rc, "KVM_CAP_EXIT_ON_EMULATION_FAILURE is unavailable");
--
2.43.0.rc2.451.g8631bc7472-goog
Inject #GP on RDPMC if the "fast" flag is set for architectural Intel
PMUs, i.e. if the PMU version is non-zero. Per Intel's SDM, and confirmed
on bare metal, the "fast" flag is supported only for non-architectural
PMUs, and is reserved for architectural PMUs.
If the processor does not support architectural performance monitoring
(CPUID.0AH:EAX[7:0]=0), ECX[30:0] specifies the index of the PMC to be
read. Setting ECX[31] selects “fast” read mode if supported. In this mode,
RDPMC returns bits 31:0 of the PMC in EAX while clearing EDX to zero.
If the processor does support architectural performance monitoring
(CPUID.0AH:EAX[7:0] ≠ 0), ECX[31:16] specifies type of PMC while ECX[15:0]
specifies the index of the PMC to be read within that type. The following
PMC types are currently defined:
— General-purpose counters use type 0. The index x (to read IA32_PMCx)
must be less than the value enumerated by CPUID.0AH.EAX[15:8] (thus
ECX[15:8] must be zero).
— Fixed-function counters use type 4000H. The index x (to read
IA32_FIXED_CTRx) can be used if either CPUID.0AH.EDX[4:0] > x or
CPUID.0AH.ECX[x] = 1 (thus ECX[15:5] must be 0).
— Performance metrics use type 2000H. This type can be used only if
IA32_PERF_CAPABILITIES.PERF_METRICS_AVAILABLE[bit 15]=1. For this type,
the index in ECX[15:0] is implementation specific.
Opportunistically WARN if KVM ever actually tries to complete RDPMC for a
non-architectural PMU, and drop the non-existent "support" for fast RDPMC,
as KVM doesn't support such PMUs, i.e. kvm_pmu_rdpmc() should reject the
RDPMC before getting to the Intel code.
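E.g. on an architectural vPMU, RDPMC with ECX = BIT(31) previously did a
32-bit "fast" read of IA32_PMC0; after this change it #GPs, consistent
with the flag being reserved on architectural hardware.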
Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
Fixes: 67f4d4288c35 ("KVM: x86: rdpmc emulation checks the counter incorrectly")
Cc: Dapeng Mi <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 6903dd9b71ad..644de27bd48a 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -22,7 +22,6 @@
/* Perf's "BASE" is wildly misleading, this is a single-bit flag, not a base. */
#define INTEL_RDPMC_FIXED INTEL_PMC_FIXED_RDPMC_BASE
-#define INTEL_RDPMC_FAST BIT(31)
#define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0)
@@ -67,10 +66,25 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
struct kvm_pmc *counters;
unsigned int num_counters;
- if (idx & INTEL_RDPMC_FAST)
- *mask &= GENMASK_ULL(31, 0);
+ /*
+ * The encoding of ECX for RDPMC is different for architectural versus
+ * non-architectural PMUs (PMUs with version '0'). For architectural
+ * PMUs, bits 31:16 specify the PMC type and bits 15:0 specify the PMC
+ * index. For non-architectural PMUs, bit 31 is a "fast" flag, and
+ * bits 30:0 specify the PMC index.
+ *
+ * Yell and reject attempts to read PMCs for a non-architectural PMU,
+ * as KVM doesn't support such PMUs.
+ */
+ if (WARN_ON_ONCE(!pmu->version))
+ return NULL;
- idx &= ~(INTEL_RDPMC_FIXED | INTEL_RDPMC_FAST);
+ /*
+ * Fixed PMCs are supported on all architectural PMUs. Note, KVM only
+ * emulates fixed PMCs for PMU v2+, but the flag itself is still valid,
+ * i.e. let RDPMC fail due to accessing a non-existent counter.
+ */
+ idx &= ~INTEL_RDPMC_FIXED;
if (fixed) {
counters = pmu->fixed_counters;
num_counters = pmu->nr_arch_fixed_counters;
--
2.43.0.rc2.451.g8631bc7472-goog
From: Jinrong Liang <[email protected]>
Add test cases to verify that Intel's Architectural PMU events work as
expected when they are available according to guest CPUID. Iterate over a
range of sane PMU versions, with and without full-width writes enabled,
and over interesting combinations of lengths/masks for the bit vector that
enumerates unavailable events.
Test up to vPMU version 5, i.e. the current architectural max. KVM only
officially supports up to version 2, but the behavior of the counters is
backwards compatible, i.e. KVM shouldn't do something completely different
for a higher, architecturally-defined vPMU version. Verify KVM behavior
against the effective vPMU version, e.g. advertising vPMU 5 when KVM only
supports vPMU 2 shouldn't magically unlock vPMU 5 features.
According to the Intel SDM, the number of architectural events is reported
through CPUID.0AH:EAX[31:24] and the architectural event x is supported
if EBX[x]=0 && EAX[31:24]>x.
Handcode the entirety of the measured section so that the test can
precisely assert on the number of instructions and branches retired.
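As a sketch of the support check in code (eax and ebx hold the CPUID.0AH
output, idx is the architectural event under test; names are
illustrative):
  bool is_supported = (idx < (eax >> 24)) && !(ebx & BIT(idx));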
Co-developed-by: Like Xu <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Jinrong Liang <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/x86_64/pmu_counters_test.c | 321 ++++++++++++++++++
2 files changed, 322 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index ccc354882d1a..65d9b7c7ff54 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -90,6 +90,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test
TEST_GEN_PROGS_x86_64 += x86_64/monitor_mwait_test
TEST_GEN_PROGS_x86_64 += x86_64/nested_exceptions_test
TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test
+TEST_GEN_PROGS_x86_64 += x86_64/pmu_counters_test
TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
TEST_GEN_PROGS_x86_64 += x86_64/private_mem_conversions_test
TEST_GEN_PROGS_x86_64 += x86_64/private_mem_kvm_exits_test
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
new file mode 100644
index 000000000000..5b8687bb4639
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -0,0 +1,321 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023, Tencent, Inc.
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <x86intrin.h>
+
+#include "pmu.h"
+#include "processor.h"
+
+/* Number of LOOP instructions for the guest measurement payload. */
+#define NUM_BRANCHES 10
+/*
+ * Number of "extra" instructions that will be counted, i.e. the number of
+ * instructions that are needed to set up the loop and then disable the
+ * counter. 2 MOV, 2 XOR, 1 WRMSR.
+ */
+#define NUM_EXTRA_INSNS 5
+#define NUM_INSNS_RETIRED (NUM_BRANCHES + NUM_EXTRA_INSNS)
+
+static uint8_t kvm_pmu_version;
+static bool kvm_has_perf_caps;
+
+static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
+ void *guest_code,
+ uint8_t pmu_version,
+ uint64_t perf_capabilities)
+{
+ struct kvm_vm *vm;
+
+ vm = vm_create_with_one_vcpu(vcpu, guest_code);
+ vm_init_descriptor_tables(vm);
+ vcpu_init_descriptor_tables(*vcpu);
+
+ sync_global_to_guest(vm, kvm_pmu_version);
+
+ /*
+ * Set PERF_CAPABILITIES before PMU version as KVM disallows enabling
+ * features via PERF_CAPABILITIES if the guest doesn't have a vPMU.
+ */
+ if (kvm_has_perf_caps)
+ vcpu_set_msr(*vcpu, MSR_IA32_PERF_CAPABILITIES, perf_capabilities);
+
+ vcpu_set_cpuid_property(*vcpu, X86_PROPERTY_PMU_VERSION, pmu_version);
+ return vm;
+}
+
+static void run_vcpu(struct kvm_vcpu *vcpu)
+{
+ struct ucall uc;
+
+ do {
+ vcpu_run(vcpu);
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_SYNC:
+ break;
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ break;
+ case UCALL_PRINTF:
+ pr_info("%s", uc.buffer);
+ break;
+ case UCALL_DONE:
+ break;
+ default:
+ TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
+ }
+ } while (uc.cmd != UCALL_DONE);
+}
+
+static uint8_t guest_get_pmu_version(void)
+{
+ /*
+ * Return the effective PMU version, i.e. the minimum between what KVM
+ * supports and what is enumerated to the guest. The host deliberately
+ * advertises a PMU version to the guest beyond what is actually
+ * supported by KVM to verify KVM doesn't freak out and do something
+ * bizarre with an architecturally valid, but unsupported, version.
+ */
+ return min_t(uint8_t, kvm_pmu_version, this_cpu_property(X86_PROPERTY_PMU_VERSION));
+}
+
+/*
+ * If an architectural event is supported and guaranteed to generate at least
+ * one "hit, assert that its count is non-zero. If an event isn't supported or
+ * the test can't guarantee the associated action will occur, then all bets are
+ * off regarding the count, i.e. no checks can be done.
+ *
+ * Sanity check that in all cases, the event doesn't count when it's disabled,
+ * and that KVM correctly emulates the write of an arbitrary value.
+ */
+static void guest_assert_event_count(uint8_t idx,
+ struct kvm_x86_pmu_feature event,
+ uint32_t pmc, uint32_t pmc_msr)
+{
+ uint64_t count;
+
+ count = _rdpmc(pmc);
+ if (!this_pmu_has(event))
+ goto sanity_checks;
+
+ switch (idx) {
+ case INTEL_ARCH_INSTRUCTIONS_RETIRED_INDEX:
+ GUEST_ASSERT_EQ(count, NUM_INSNS_RETIRED);
+ break;
+ case INTEL_ARCH_BRANCHES_RETIRED_INDEX:
+ GUEST_ASSERT_EQ(count, NUM_BRANCHES);
+ break;
+ case INTEL_ARCH_CPU_CYCLES_INDEX:
+ case INTEL_ARCH_REFERENCE_CYCLES_INDEX:
+ GUEST_ASSERT_NE(count, 0);
+ break;
+ default:
+ break;
+ }
+
+sanity_checks:
+ __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
+ GUEST_ASSERT_EQ(_rdpmc(pmc), count);
+
+ wrmsr(pmc_msr, 0xdead);
+ GUEST_ASSERT_EQ(_rdpmc(pmc), 0xdead);
+}
+
+static void __guest_test_arch_event(uint8_t idx, struct kvm_x86_pmu_feature event,
+ uint32_t pmc, uint32_t pmc_msr,
+ uint32_t ctrl_msr, uint64_t ctrl_msr_value)
+{
+ wrmsr(pmc_msr, 0);
+
+ /*
+ * Enable and disable the PMC in a monolithic asm blob to ensure that
+ * the compiler can't insert _any_ code into the measured sequence.
+ * Note, ECX doesn't need to be clobbered as the input value, @pmc_msr,
+ * is restored before the end of the sequence.
+ */
+ __asm__ __volatile__("wrmsr\n\t"
+ "mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t"
+ "loop .\n\t"
+ "mov %%edi, %%ecx\n\t"
+ "xor %%eax, %%eax\n\t"
+ "xor %%edx, %%edx\n\t"
+ "wrmsr\n\t"
+ :: "a"((uint32_t)ctrl_msr_value),
+ "d"(ctrl_msr_value >> 32),
+ "c"(ctrl_msr), "D"(ctrl_msr)
+ );
+
+ guest_assert_event_count(idx, event, pmc, pmc_msr);
+}
+
+static void guest_test_arch_event(uint8_t idx)
+{
+ const struct {
+ struct kvm_x86_pmu_feature gp_event;
+ } intel_event_to_feature[] = {
+ [INTEL_ARCH_CPU_CYCLES_INDEX] = { X86_PMU_FEATURE_CPU_CYCLES },
+ [INTEL_ARCH_INSTRUCTIONS_RETIRED_INDEX] = { X86_PMU_FEATURE_INSNS_RETIRED },
+ [INTEL_ARCH_REFERENCE_CYCLES_INDEX] = { X86_PMU_FEATURE_REFERENCE_CYCLES },
+ [INTEL_ARCH_LLC_REFERENCES_INDEX] = { X86_PMU_FEATURE_LLC_REFERENCES },
+ [INTEL_ARCH_LLC_MISSES_INDEX] = { X86_PMU_FEATURE_LLC_MISSES },
+ [INTEL_ARCH_BRANCHES_RETIRED_INDEX] = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED },
+ [INTEL_ARCH_BRANCHES_MISPREDICTED_INDEX] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED },
+ [INTEL_ARCH_TOPDOWN_SLOTS_INDEX] = { X86_PMU_FEATURE_TOPDOWN_SLOTS },
+ };
+
+ uint32_t nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
+ uint32_t pmu_version = guest_get_pmu_version();
+ /* PERF_GLOBAL_CTRL exists only for Architectural PMU Version 2+. */
+ bool guest_has_perf_global_ctrl = pmu_version >= 2;
+ struct kvm_x86_pmu_feature gp_event;
+ uint32_t base_pmc_msr;
+ unsigned int i;
+
+ /* The host side shouldn't invoke this without a guest PMU. */
+ GUEST_ASSERT(pmu_version);
+
+ if (this_cpu_has(X86_FEATURE_PDCM) &&
+ rdmsr(MSR_IA32_PERF_CAPABILITIES) & PMU_CAP_FW_WRITES)
+ base_pmc_msr = MSR_IA32_PMC0;
+ else
+ base_pmc_msr = MSR_IA32_PERFCTR0;
+
+ gp_event = intel_event_to_feature[idx].gp_event;
+ GUEST_ASSERT_EQ(idx, gp_event.f.bit);
+
+ GUEST_ASSERT(nr_gp_counters);
+
+ for (i = 0; i < nr_gp_counters; i++) {
+ uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
+ ARCH_PERFMON_EVENTSEL_ENABLE |
+ intel_pmu_arch_events[idx];
+
+ wrmsr(MSR_P6_EVNTSEL0 + i, 0);
+ if (guest_has_perf_global_ctrl)
+ wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, BIT_ULL(i));
+
+ __guest_test_arch_event(idx, gp_event, i, base_pmc_msr + i,
+ MSR_P6_EVNTSEL0 + i, eventsel);
+ }
+}
+
+static void guest_test_arch_events(void)
+{
+ uint8_t i;
+
+ for (i = 0; i < NR_INTEL_ARCH_EVENTS; i++)
+ guest_test_arch_event(i);
+
+ GUEST_DONE();
+}
+
+static void test_arch_events(uint8_t pmu_version, uint64_t perf_capabilities,
+ uint8_t length, uint8_t unavailable_mask)
+{
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+
+ /* Testing arch events requires a vPMU (there are no negative tests). */
+ if (!pmu_version)
+ return;
+
+ vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_test_arch_events,
+ pmu_version, perf_capabilities);
+
+ vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH,
+ length);
+ vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EVENTS_MASK,
+ unavailable_mask);
+
+ run_vcpu(vcpu);
+
+ kvm_vm_free(vm);
+}
+
+static void test_intel_counters(void)
+{
+ uint8_t nr_arch_events = kvm_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
+ uint8_t pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
+ unsigned int i;
+ uint8_t v, j;
+ uint32_t k;
+
+ const uint64_t perf_caps[] = {
+ 0,
+ PMU_CAP_FW_WRITES,
+ };
+
+ /*
+ * Test up to PMU v5, which is the current maximum version defined by
+ * Intel, i.e. is the last version that is guaranteed to be backwards
+ * compatible with KVM's existing behavior.
+ */
+ uint8_t max_pmu_version = max_t(typeof(pmu_version), pmu_version, 5);
+
+ /*
+ * Detect the existence of events that aren't supported by selftests.
+ * This will (obviously) fail any time the kernel adds support for a
+ * new event, but it's worth paying that price to keep the test fresh.
+ */
+ TEST_ASSERT(nr_arch_events <= NR_INTEL_ARCH_EVENTS,
+ "New architectural event(s) detected; please update this test (length = %u, mask = %x)",
+ nr_arch_events, kvm_cpu_property(X86_PROPERTY_PMU_EVENTS_MASK));
+
+ /*
+ * Force iterating over known arch events regardless of whether or not
+ * KVM/hardware supports a given event.
+ */
+ nr_arch_events = max_t(typeof(nr_arch_events), nr_arch_events, NR_INTEL_ARCH_EVENTS);
+
+ for (v = 0; v <= max_pmu_version; v++) {
+ for (i = 0; i < ARRAY_SIZE(perf_caps); i++) {
+ if (!kvm_has_perf_caps && perf_caps[i])
+ continue;
+
+ pr_info("Testing arch events, PMU version %u, perf_caps = %lx\n",
+ v, perf_caps[i]);
+ /*
+ * To keep the total runtime reasonable, test every
+ * possible non-zero, non-reserved bitmap combination
+ * only with the native PMU version and the full bit
+ * vector length.
+ */
+ if (v == pmu_version) {
+ for (k = 1; k < (BIT(nr_arch_events) - 1); k++)
+ test_arch_events(v, perf_caps[i], nr_arch_events, k);
+ }
+ /*
+ * Test single bits for all PMU versions and lengths up
+ * to the number of events +1 (to verify KVM doesn't do
+ * weird things if the guest length is greater than the
+ * host length). Explicitly test a mask of '0' and all
+ * ones, i.e. all events being available and unavailable.
+ */
+ for (j = 0; j <= nr_arch_events + 1; j++) {
+ test_arch_events(v, perf_caps[i], j, 0);
+ test_arch_events(v, perf_caps[i], j, 0xff);
+
+ for (k = 0; k < nr_arch_events; k++)
+ test_arch_events(v, perf_caps[i], j, BIT(k));
+ }
+ }
+ }
+}
+
+int main(int argc, char *argv[])
+{
+ TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
+
+ TEST_REQUIRE(host_cpu_is_intel);
+ TEST_REQUIRE(kvm_cpu_has_p(X86_PROPERTY_PMU_VERSION));
+ TEST_REQUIRE(kvm_cpu_property(X86_PROPERTY_PMU_VERSION) > 0);
+
+ kvm_pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
+ kvm_has_perf_caps = kvm_cpu_has(X86_FEATURE_PDCM);
+
+ test_intel_counters();
+
+ return 0;
+}
--
2.43.0.rc2.451.g8631bc7472-goog
From: Jinrong Liang <[email protected]>
Add a test to verify that KVM correctly emulates MSR-based accesses to
general purpose counters based on guest CPUID, e.g. that accesses to
non-existent counters #GP and accesses to existent counters succeed.
Note, for compatibility reasons, KVM does not emulate #GP when
MSR_P6_PERFCTR[0|1] is not present (writes should be dropped).
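As a concrete guest-side sketch of that quirk (wrmsr_safe() and rdmsr()
are existing selftest helpers): with zero GP counters enumerated, the
following must not fault, and the read must return '0' because the write
is dropped:
  GUEST_ASSERT_EQ(wrmsr_safe(MSR_P6_PERFCTR0, 0xffff), 0);
  GUEST_ASSERT_EQ(rdmsr(MSR_P6_PERFCTR0), 0);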
Co-developed-by: Like Xu <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Jinrong Liang <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 99 +++++++++++++++++++
1 file changed, 99 insertions(+)
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index 663e8fbe7ff8..863418842ef8 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -270,9 +270,103 @@ static void test_arch_events(uint8_t pmu_version, uint64_t perf_capabilities,
kvm_vm_free(vm);
}
+/*
+ * Limit testing to MSRs that are actually defined by Intel (in the SDM). MSRs
+ * that aren't defined counter MSRs *probably* don't exist, but there's no
+ * guarantee that currently undefined MSR indices won't be used for something
+ * other than PMCs in the future.
+ */
+#define MAX_NR_GP_COUNTERS 8
+#define MAX_NR_FIXED_COUNTERS 3
+
+#define GUEST_ASSERT_PMC_MSR_ACCESS(insn, msr, expect_gp, vector) \
+__GUEST_ASSERT(expect_gp ? vector == GP_VECTOR : !vector, \
+ "Expected %s on " #insn "(0x%x), got vector %u", \
+ expect_gp ? "#GP" : "no fault", msr, vector) \
+
+#define GUEST_ASSERT_PMC_VALUE(insn, msr, val, expected_val) \
+ __GUEST_ASSERT(val == expected_val, \
+ "Expected " #insn "(0x%x) to yield 0x%lx, got 0x%lx", \
+ msr, expected_val, val);
+
+static void guest_rd_wr_counters(uint32_t base_msr, uint8_t nr_possible_counters,
+ uint8_t nr_counters)
+{
+ uint8_t i;
+
+ for (i = 0; i < nr_possible_counters; i++) {
+ /*
+ * TODO: Test a value that validates full-width writes and the
+ * width of the counters.
+ */
+ const uint64_t test_val = 0xffff;
+ const uint32_t msr = base_msr + i;
+ const bool expect_success = i < nr_counters;
+
+ /*
+ * KVM drops writes to MSR_P6_PERFCTR[0|1] if the counters are
+ * unsupported, i.e. doesn't #GP and reads back '0'.
+ */
+ const uint64_t expected_val = expect_success ? test_val : 0;
+ const bool expect_gp = !expect_success && msr != MSR_P6_PERFCTR0 &&
+ msr != MSR_P6_PERFCTR1;
+ uint8_t vector;
+ uint64_t val;
+
+ vector = wrmsr_safe(msr, test_val);
+ GUEST_ASSERT_PMC_MSR_ACCESS(WRMSR, msr, expect_gp, vector);
+
+ vector = rdmsr_safe(msr, &val);
+ GUEST_ASSERT_PMC_MSR_ACCESS(RDMSR, msr, expect_gp, vector);
+
+ /* On #GP, the result of RDMSR is undefined. */
+ if (!expect_gp)
+ GUEST_ASSERT_PMC_VALUE(RDMSR, msr, val, expected_val);
+
+ vector = wrmsr_safe(msr, 0);
+ GUEST_ASSERT_PMC_MSR_ACCESS(WRMSR, msr, expect_gp, vector);
+ }
+ GUEST_DONE();
+}
+
+static void guest_test_gp_counters(void)
+{
+ uint8_t nr_gp_counters = 0;
+ uint32_t base_msr;
+
+ if (guest_get_pmu_version())
+ nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
+
+ if (this_cpu_has(X86_FEATURE_PDCM) &&
+ rdmsr(MSR_IA32_PERF_CAPABILITIES) & PMU_CAP_FW_WRITES)
+ base_msr = MSR_IA32_PMC0;
+ else
+ base_msr = MSR_IA32_PERFCTR0;
+
+ guest_rd_wr_counters(base_msr, MAX_NR_GP_COUNTERS, nr_gp_counters);
+}
+
+static void test_gp_counters(uint8_t pmu_version, uint64_t perf_capabilities,
+ uint8_t nr_gp_counters)
+{
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+
+ vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_test_gp_counters,
+ pmu_version, perf_capabilities);
+
+ vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_NR_GP_COUNTERS,
+ nr_gp_counters);
+
+ run_vcpu(vcpu);
+
+ kvm_vm_free(vm);
+}
+
static void test_intel_counters(void)
{
uint8_t nr_arch_events = kvm_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
+ uint8_t nr_gp_counters = kvm_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
uint8_t pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
unsigned int i;
uint8_t v, j;
@@ -336,6 +430,11 @@ static void test_intel_counters(void)
for (k = 0; k < nr_arch_events; k++)
test_arch_events(v, perf_caps[i], j, BIT(k));
}
+
+ pr_info("Testing GP counters, PMU version %u, perf_caps = %lx\n",
+ v, perf_caps[i]);
+ for (j = 0; j <= nr_gp_counters; j++)
+ test_gp_counters(v, perf_caps[i], j);
}
}
}
--
2.43.0.rc2.451.g8631bc7472-goog
From: Jinrong Liang <[email protected]>
Extend the PMU counters test to validate architectural events using fixed
counters. The core logic is largely the same, the biggest difference
being that if a fixed counter exists, its associated event is available
(the SDM doesn't explicitly state this to be true, but it's KVM's ABI and
letting software program a fixed counter that doesn't actually count would
be quite bizarre).
Note, fixed counters rely on PERF_GLOBAL_CTRL.
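As a rough sketch of what "rely on PERF_GLOBAL_CTRL" means in practice (this
mirrors the FIXED_PMC_* macros the test uses; 'i' is the fixed counter index
and 'count' is a local):

  /* Enable ring-0 counting for fixed counter 'i'... */
  wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, FIXED_PMC_CTRL(i, FIXED_PMC_KERNEL));
  /* ...and set the counter's bit in the global enable (PMU version 2+). */
  wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, FIXED_PMC_GLOBAL_CTRL_ENABLE(i));

  /* <measured code> */

  wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
  count = rdmsr(MSR_CORE_PERF_FIXED_CTR0 + i);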
Reviewed-by: Jim Mattson <[email protected]>
Reviewed-by: Dapeng Mi <[email protected]>
Co-developed-by: Like Xu <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Jinrong Liang <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 54 +++++++++++++++----
1 file changed, 45 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index 5b8687bb4639..663e8fbe7ff8 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -150,26 +150,46 @@ static void __guest_test_arch_event(uint8_t idx, struct kvm_x86_pmu_feature even
guest_assert_event_count(idx, event, pmc, pmc_msr);
}
+#define X86_PMU_FEATURE_NULL \
+({ \
+ struct kvm_x86_pmu_feature feature = {}; \
+ \
+ feature; \
+})
+
+static bool pmu_is_null_feature(struct kvm_x86_pmu_feature event)
+{
+ return !(*(u64 *)&event);
+}
+
static void guest_test_arch_event(uint8_t idx)
{
const struct {
struct kvm_x86_pmu_feature gp_event;
+ struct kvm_x86_pmu_feature fixed_event;
} intel_event_to_feature[] = {
- [INTEL_ARCH_CPU_CYCLES_INDEX] = { X86_PMU_FEATURE_CPU_CYCLES },
- [INTEL_ARCH_INSTRUCTIONS_RETIRED_INDEX] = { X86_PMU_FEATURE_INSNS_RETIRED },
- [INTEL_ARCH_REFERENCE_CYCLES_INDEX] = { X86_PMU_FEATURE_REFERENCE_CYCLES },
- [INTEL_ARCH_LLC_REFERENCES_INDEX] = { X86_PMU_FEATURE_LLC_REFERENCES },
- [INTEL_ARCH_LLC_MISSES_INDEX] = { X86_PMU_FEATURE_LLC_MISSES },
- [INTEL_ARCH_BRANCHES_RETIRED_INDEX] = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED },
- [INTEL_ARCH_BRANCHES_MISPREDICTED_INDEX] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED },
- [INTEL_ARCH_TOPDOWN_SLOTS_INDEX] = { X86_PMU_FEATURE_TOPDOWN_SLOTS },
+ [INTEL_ARCH_CPU_CYCLES_INDEX] = { X86_PMU_FEATURE_CPU_CYCLES, X86_PMU_FEATURE_CPU_CYCLES_FIXED },
+ [INTEL_ARCH_INSTRUCTIONS_RETIRED_INDEX] = { X86_PMU_FEATURE_INSNS_RETIRED, X86_PMU_FEATURE_INSNS_RETIRED_FIXED },
+ /*
+ * Note, the fixed counter for reference cycles is NOT the same
+ * as the general purpose architectural event. The fixed counter
+ * explicitly counts at the same frequency as the TSC, whereas
+ * the GP event counts at a fixed, but uarch specific, frequency.
+ * Bundle them here for simplicity.
+ */
+ [INTEL_ARCH_REFERENCE_CYCLES_INDEX] = { X86_PMU_FEATURE_REFERENCE_CYCLES, X86_PMU_FEATURE_REFERENCE_TSC_CYCLES_FIXED },
+ [INTEL_ARCH_LLC_REFERENCES_INDEX] = { X86_PMU_FEATURE_LLC_REFERENCES, X86_PMU_FEATURE_NULL },
+ [INTEL_ARCH_LLC_MISSES_INDEX] = { X86_PMU_FEATURE_LLC_MISSES, X86_PMU_FEATURE_NULL },
+ [INTEL_ARCH_BRANCHES_RETIRED_INDEX] = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED, X86_PMU_FEATURE_NULL },
+ [INTEL_ARCH_BRANCHES_MISPREDICTED_INDEX] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED, X86_PMU_FEATURE_NULL },
+ [INTEL_ARCH_TOPDOWN_SLOTS_INDEX] = { X86_PMU_FEATURE_TOPDOWN_SLOTS, X86_PMU_FEATURE_TOPDOWN_SLOTS_FIXED },
};
uint32_t nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
uint32_t pmu_version = guest_get_pmu_version();
/* PERF_GLOBAL_CTRL exists only for Architectural PMU Version 2+. */
bool guest_has_perf_global_ctrl = pmu_version >= 2;
- struct kvm_x86_pmu_feature gp_event;
+ struct kvm_x86_pmu_feature gp_event, fixed_event;
uint32_t base_pmc_msr;
unsigned int i;
@@ -199,6 +219,22 @@ static void guest_test_arch_event(uint8_t idx)
__guest_test_arch_event(idx, gp_event, i, base_pmc_msr + i,
MSR_P6_EVNTSEL0 + i, eventsel);
}
+
+ if (!guest_has_perf_global_ctrl)
+ return;
+
+ fixed_event = intel_event_to_feature[idx].fixed_event;
+ if (pmu_is_null_feature(fixed_event) || !this_pmu_has(fixed_event))
+ return;
+
+ i = fixed_event.f.bit;
+
+ wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, FIXED_PMC_CTRL(i, FIXED_PMC_KERNEL));
+
+ __guest_test_arch_event(idx, fixed_event, i | INTEL_RDPMC_FIXED,
+ MSR_CORE_PERF_FIXED_CTR0 + i,
+ MSR_CORE_PERF_GLOBAL_CTRL,
+ FIXED_PMC_GLOBAL_CTRL_ENABLE(i));
}
static void guest_test_arch_events(void)
--
2.43.0.rc2.451.g8631bc7472-goog
From: Jinrong Liang <[email protected]>
Add a PMU library for x86 selftests to help eliminate open-coded event
encodings, and to reduce the amount of copy+paste between PMU selftests.
Use the new common macro definitions in the existing PMU event filter test.
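E.g. with the new definitions, programming GP counter 0 to count retired
branches in ring-0 becomes (usage sketch, guest context):

  wrmsr(MSR_P6_EVNTSEL0, ARCH_PERFMON_EVENTSEL_ENABLE |
                         ARCH_PERFMON_EVENTSEL_OS |
                         INTEL_ARCH_BRANCHES_RETIRED);

and one-off encodings that don't warrant a named #define can use RAW_EVENT()
directly, e.g. RAW_EVENT(0xc2, 0) for AMD's "retired branch instructions".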
Cc: Aaron Lewis <[email protected]>
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Jinrong Liang <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
tools/testing/selftests/kvm/Makefile | 1 +
tools/testing/selftests/kvm/include/pmu.h | 97 ++++++++++++
tools/testing/selftests/kvm/lib/pmu.c | 31 ++++
.../kvm/x86_64/pmu_event_filter_test.c | 141 ++++++------------
4 files changed, 173 insertions(+), 97 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/pmu.h
create mode 100644 tools/testing/selftests/kvm/lib/pmu.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 4412b42d95de..ccc354882d1a 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -32,6 +32,7 @@ LIBKVM += lib/guest_modes.c
LIBKVM += lib/io.c
LIBKVM += lib/kvm_util.c
LIBKVM += lib/memstress.c
+LIBKVM += lib/pmu.c
LIBKVM += lib/guest_sprintf.c
LIBKVM += lib/rbtree.c
LIBKVM += lib/sparsebit.c
diff --git a/tools/testing/selftests/kvm/include/pmu.h b/tools/testing/selftests/kvm/include/pmu.h
new file mode 100644
index 000000000000..3c10c4dc0ae8
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/pmu.h
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023, Tencent, Inc.
+ */
+#ifndef SELFTEST_KVM_PMU_H
+#define SELFTEST_KVM_PMU_H
+
+#include <stdint.h>
+
+#define KVM_PMU_EVENT_FILTER_MAX_EVENTS 300
+
+/*
+ * Encode an eventsel+umask pair into event-select MSR format. Note, this is
+ * technically AMD's format, as Intel's format only supports 8 bits for the
+ * event selector, i.e. doesn't use bits 35:32 for the selector. But, OR-ing
+ * in '0' is a nop and won't clobber the CMASK.
+ */
+#define RAW_EVENT(eventsel, umask) ((((eventsel) & 0xf00UL) << 24) | \
+ ((eventsel) & 0xff) | \
+ ((umask) & 0xff) << 8)
+
+/*
+ * These are technically Intel's definitions, but except for CMASK (see above),
+ * AMD's layout is compatible with Intel's.
+ */
+#define ARCH_PERFMON_EVENTSEL_EVENT GENMASK_ULL(7, 0)
+#define ARCH_PERFMON_EVENTSEL_UMASK GENMASK_ULL(15, 8)
+#define ARCH_PERFMON_EVENTSEL_USR BIT_ULL(16)
+#define ARCH_PERFMON_EVENTSEL_OS BIT_ULL(17)
+#define ARCH_PERFMON_EVENTSEL_EDGE BIT_ULL(18)
+#define ARCH_PERFMON_EVENTSEL_PIN_CONTROL BIT_ULL(19)
+#define ARCH_PERFMON_EVENTSEL_INT BIT_ULL(20)
+#define ARCH_PERFMON_EVENTSEL_ANY BIT_ULL(21)
+#define ARCH_PERFMON_EVENTSEL_ENABLE BIT_ULL(22)
+#define ARCH_PERFMON_EVENTSEL_INV BIT_ULL(23)
+#define ARCH_PERFMON_EVENTSEL_CMASK GENMASK_ULL(31, 24)
+
+/* RDPMC control flags, Intel only. */
+#define INTEL_RDPMC_METRICS BIT_ULL(29)
+#define INTEL_RDPMC_FIXED BIT_ULL(30)
+#define INTEL_RDPMC_FAST BIT_ULL(31)
+
+/* Fixed PMC controls, Intel only. */
+#define FIXED_PMC_GLOBAL_CTRL_ENABLE(_idx) BIT_ULL((32 + (_idx)))
+
+#define FIXED_PMC_KERNEL BIT_ULL(0)
+#define FIXED_PMC_USER BIT_ULL(1)
+#define FIXED_PMC_ANYTHREAD BIT_ULL(2)
+#define FIXED_PMC_ENABLE_PMI BIT_ULL(3)
+#define FIXED_PMC_NR_BITS 4
+#define FIXED_PMC_CTRL(_idx, _val) ((_val) << ((_idx) * FIXED_PMC_NR_BITS))
+
+#define PMU_CAP_FW_WRITES BIT_ULL(13)
+#define PMU_CAP_LBR_FMT 0x3f
+
+#define INTEL_ARCH_CPU_CYCLES RAW_EVENT(0x3c, 0x00)
+#define INTEL_ARCH_INSTRUCTIONS_RETIRED RAW_EVENT(0xc0, 0x00)
+#define INTEL_ARCH_REFERENCE_CYCLES RAW_EVENT(0x3c, 0x01)
+#define INTEL_ARCH_LLC_REFERENCES RAW_EVENT(0x2e, 0x4f)
+#define INTEL_ARCH_LLC_MISSES RAW_EVENT(0x2e, 0x41)
+#define INTEL_ARCH_BRANCHES_RETIRED RAW_EVENT(0xc4, 0x00)
+#define INTEL_ARCH_BRANCHES_MISPREDICTED RAW_EVENT(0xc5, 0x00)
+#define INTEL_ARCH_TOPDOWN_SLOTS RAW_EVENT(0xa4, 0x01)
+
+#define AMD_ZEN_CORE_CYCLES RAW_EVENT(0x76, 0x00)
+#define AMD_ZEN_INSTRUCTIONS_RETIRED RAW_EVENT(0xc0, 0x00)
+#define AMD_ZEN_BRANCHES_RETIRED RAW_EVENT(0xc2, 0x00)
+#define AMD_ZEN_BRANCHES_MISPREDICTED RAW_EVENT(0xc3, 0x00)
+
+/*
+ * Note! The order and thus the index of the architectural events matters as
+ * support for each event is enumerated via CPUID using the index of the event.
+ */
+enum intel_pmu_architectural_events {
+ INTEL_ARCH_CPU_CYCLES_INDEX,
+ INTEL_ARCH_INSTRUCTIONS_RETIRED_INDEX,
+ INTEL_ARCH_REFERENCE_CYCLES_INDEX,
+ INTEL_ARCH_LLC_REFERENCES_INDEX,
+ INTEL_ARCH_LLC_MISSES_INDEX,
+ INTEL_ARCH_BRANCHES_RETIRED_INDEX,
+ INTEL_ARCH_BRANCHES_MISPREDICTED_INDEX,
+ INTEL_ARCH_TOPDOWN_SLOTS_INDEX,
+ NR_INTEL_ARCH_EVENTS,
+};
+
+enum amd_pmu_zen_events {
+ AMD_ZEN_CORE_CYCLES_INDEX,
+ AMD_ZEN_INSTRUCTIONS_INDEX,
+ AMD_ZEN_BRANCHES_INDEX,
+ AMD_ZEN_BRANCH_MISSES_INDEX,
+ NR_AMD_ZEN_EVENTS,
+};
+
+extern const uint64_t intel_pmu_arch_events[];
+extern const uint64_t amd_pmu_zen_events[];
+
+#endif /* SELFTEST_KVM_PMU_H */
diff --git a/tools/testing/selftests/kvm/lib/pmu.c b/tools/testing/selftests/kvm/lib/pmu.c
new file mode 100644
index 000000000000..f31f0427c17c
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/pmu.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023, Tencent, Inc.
+ */
+
+#include <stdint.h>
+
+#include <linux/kernel.h>
+
+#include "kvm_util.h"
+#include "pmu.h"
+
+const uint64_t intel_pmu_arch_events[] = {
+ INTEL_ARCH_CPU_CYCLES,
+ INTEL_ARCH_INSTRUCTIONS_RETIRED,
+ INTEL_ARCH_REFERENCE_CYCLES,
+ INTEL_ARCH_LLC_REFERENCES,
+ INTEL_ARCH_LLC_MISSES,
+ INTEL_ARCH_BRANCHES_RETIRED,
+ INTEL_ARCH_BRANCHES_MISPREDICTED,
+ INTEL_ARCH_TOPDOWN_SLOTS,
+};
+kvm_static_assert(ARRAY_SIZE(intel_pmu_arch_events) == NR_INTEL_ARCH_EVENTS);
+
+const uint64_t amd_pmu_zen_events[] = {
+ AMD_ZEN_CORE_CYCLES,
+ AMD_ZEN_INSTRUCTIONS_RETIRED,
+ AMD_ZEN_BRANCHES_RETIRED,
+ AMD_ZEN_BRANCHES_MISPREDICTED,
+};
+kvm_static_assert(ARRAY_SIZE(amd_pmu_zen_events) == NR_AMD_ZEN_EVENTS);
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
index 283cc55597a4..7ec9fbed92e0 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
@@ -11,72 +11,18 @@
*/
#define _GNU_SOURCE /* for program_invocation_short_name */
-#include "test_util.h"
+
#include "kvm_util.h"
+#include "pmu.h"
#include "processor.h"
-
-/*
- * In lieu of copying perf_event.h into tools...
- */
-#define ARCH_PERFMON_EVENTSEL_OS (1ULL << 17)
-#define ARCH_PERFMON_EVENTSEL_ENABLE (1ULL << 22)
-
-/* End of stuff taken from perf_event.h. */
-
-/* Oddly, this isn't in perf_event.h. */
-#define ARCH_PERFMON_BRANCHES_RETIRED 5
+#include "test_util.h"
#define NUM_BRANCHES 42
-#define INTEL_PMC_IDX_FIXED 32
-
-/* Matches KVM_PMU_EVENT_FILTER_MAX_EVENTS in pmu.c */
-#define MAX_FILTER_EVENTS 300
#define MAX_TEST_EVENTS 10
#define PMU_EVENT_FILTER_INVALID_ACTION (KVM_PMU_EVENT_DENY + 1)
#define PMU_EVENT_FILTER_INVALID_FLAGS (KVM_PMU_EVENT_FLAGS_VALID_MASK << 1)
-#define PMU_EVENT_FILTER_INVALID_NEVENTS (MAX_FILTER_EVENTS + 1)
-
-/*
- * This is how the event selector and unit mask are stored in an AMD
- * core performance event-select register. Intel's format is similar,
- * but the event selector is only 8 bits.
- */
-#define EVENT(select, umask) ((select & 0xf00UL) << 24 | (select & 0xff) | \
- (umask & 0xff) << 8)
-
-/*
- * "Branch instructions retired", from the Intel SDM, volume 3,
- * "Pre-defined Architectural Performance Events."
- */
-
-#define INTEL_BR_RETIRED EVENT(0xc4, 0)
-
-/*
- * "Retired branch instructions", from Processor Programming Reference
- * (PPR) for AMD Family 17h Model 01h, Revision B1 Processors,
- * Preliminary Processor Programming Reference (PPR) for AMD Family
- * 17h Model 31h, Revision B0 Processors, and Preliminary Processor
- * Programming Reference (PPR) for AMD Family 19h Model 01h, Revision
- * B1 Processors Volume 1 of 2.
- */
-
-#define AMD_ZEN_BR_RETIRED EVENT(0xc2, 0)
-
-
-/*
- * "Retired instructions", from Processor Programming Reference
- * (PPR) for AMD Family 17h Model 01h, Revision B1 Processors,
- * Preliminary Processor Programming Reference (PPR) for AMD Family
- * 17h Model 31h, Revision B0 Processors, and Preliminary Processor
- * Programming Reference (PPR) for AMD Family 19h Model 01h, Revision
- * B1 Processors Volume 1 of 2.
- * --- and ---
- * "Instructions retired", from the Intel SDM, volume 3,
- * "Pre-defined Architectural Performance Events."
- */
-
-#define INST_RETIRED EVENT(0xc0, 0)
+#define PMU_EVENT_FILTER_INVALID_NEVENTS (KVM_PMU_EVENT_FILTER_MAX_EVENTS + 1)
struct __kvm_pmu_event_filter {
__u32 action;
@@ -84,26 +30,28 @@ struct __kvm_pmu_event_filter {
__u32 fixed_counter_bitmap;
__u32 flags;
__u32 pad[4];
- __u64 events[MAX_FILTER_EVENTS];
+ __u64 events[KVM_PMU_EVENT_FILTER_MAX_EVENTS];
};
/*
- * This event list comprises Intel's eight architectural events plus
- * AMD's "retired branch instructions" for Zen[123] (and possibly
- * other AMD CPUs).
+ * This event list comprises Intel's known architectural events, plus AMD's
+ * "retired branch instructions" for Zen1-Zen3 (and* possibly other AMD CPUs).
+ * Note, AMD and Intel use the same encoding for instructions retired.
*/
+kvm_static_assert(INTEL_ARCH_INSTRUCTIONS_RETIRED == AMD_ZEN_INSTRUCTIONS_RETIRED);
+
static const struct __kvm_pmu_event_filter base_event_filter = {
.nevents = ARRAY_SIZE(base_event_filter.events),
.events = {
- EVENT(0x3c, 0),
- INST_RETIRED,
- EVENT(0x3c, 1),
- EVENT(0x2e, 0x4f),
- EVENT(0x2e, 0x41),
- EVENT(0xc4, 0),
- EVENT(0xc5, 0),
- EVENT(0xa4, 1),
- AMD_ZEN_BR_RETIRED,
+ INTEL_ARCH_CPU_CYCLES,
+ INTEL_ARCH_INSTRUCTIONS_RETIRED,
+ INTEL_ARCH_REFERENCE_CYCLES,
+ INTEL_ARCH_LLC_REFERENCES,
+ INTEL_ARCH_LLC_MISSES,
+ INTEL_ARCH_BRANCHES_RETIRED,
+ INTEL_ARCH_BRANCHES_MISPREDICTED,
+ INTEL_ARCH_TOPDOWN_SLOTS,
+ AMD_ZEN_BRANCHES_RETIRED,
},
};
@@ -165,9 +113,9 @@ static void intel_guest_code(void)
for (;;) {
wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
wrmsr(MSR_P6_EVNTSEL0, ARCH_PERFMON_EVENTSEL_ENABLE |
- ARCH_PERFMON_EVENTSEL_OS | INTEL_BR_RETIRED);
+ ARCH_PERFMON_EVENTSEL_OS | INTEL_ARCH_BRANCHES_RETIRED);
wrmsr(MSR_P6_EVNTSEL1, ARCH_PERFMON_EVENTSEL_ENABLE |
- ARCH_PERFMON_EVENTSEL_OS | INST_RETIRED);
+ ARCH_PERFMON_EVENTSEL_OS | INTEL_ARCH_INSTRUCTIONS_RETIRED);
wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
run_and_measure_loop(MSR_IA32_PMC0);
@@ -189,9 +137,9 @@ static void amd_guest_code(void)
for (;;) {
wrmsr(MSR_K7_EVNTSEL0, 0);
wrmsr(MSR_K7_EVNTSEL0, ARCH_PERFMON_EVENTSEL_ENABLE |
- ARCH_PERFMON_EVENTSEL_OS | AMD_ZEN_BR_RETIRED);
+ ARCH_PERFMON_EVENTSEL_OS | AMD_ZEN_BRANCHES_RETIRED);
wrmsr(MSR_K7_EVNTSEL1, ARCH_PERFMON_EVENTSEL_ENABLE |
- ARCH_PERFMON_EVENTSEL_OS | INST_RETIRED);
+ ARCH_PERFMON_EVENTSEL_OS | AMD_ZEN_INSTRUCTIONS_RETIRED);
run_and_measure_loop(MSR_K7_PERFCTR0);
GUEST_SYNC(0);
@@ -312,7 +260,7 @@ static void test_amd_deny_list(struct kvm_vcpu *vcpu)
.action = KVM_PMU_EVENT_DENY,
.nevents = 1,
.events = {
- EVENT(0x1C2, 0),
+ RAW_EVENT(0x1C2, 0),
},
};
@@ -347,9 +295,9 @@ static void test_not_member_deny_list(struct kvm_vcpu *vcpu)
f.action = KVM_PMU_EVENT_DENY;
- remove_event(&f, INST_RETIRED);
- remove_event(&f, INTEL_BR_RETIRED);
- remove_event(&f, AMD_ZEN_BR_RETIRED);
+ remove_event(&f, INTEL_ARCH_INSTRUCTIONS_RETIRED);
+ remove_event(&f, INTEL_ARCH_BRANCHES_RETIRED);
+ remove_event(&f, AMD_ZEN_BRANCHES_RETIRED);
test_with_filter(vcpu, &f);
ASSERT_PMC_COUNTING_INSTRUCTIONS();
@@ -361,9 +309,9 @@ static void test_not_member_allow_list(struct kvm_vcpu *vcpu)
f.action = KVM_PMU_EVENT_ALLOW;
- remove_event(&f, INST_RETIRED);
- remove_event(&f, INTEL_BR_RETIRED);
- remove_event(&f, AMD_ZEN_BR_RETIRED);
+ remove_event(&f, INTEL_ARCH_INSTRUCTIONS_RETIRED);
+ remove_event(&f, INTEL_ARCH_BRANCHES_RETIRED);
+ remove_event(&f, AMD_ZEN_BRANCHES_RETIRED);
test_with_filter(vcpu, &f);
ASSERT_PMC_NOT_COUNTING_INSTRUCTIONS();
@@ -452,9 +400,9 @@ static bool use_amd_pmu(void)
* - Sapphire Rapids, Ice Lake, Cascade Lake, Skylake.
*/
#define MEM_INST_RETIRED 0xD0
-#define MEM_INST_RETIRED_LOAD EVENT(MEM_INST_RETIRED, 0x81)
-#define MEM_INST_RETIRED_STORE EVENT(MEM_INST_RETIRED, 0x82)
-#define MEM_INST_RETIRED_LOAD_STORE EVENT(MEM_INST_RETIRED, 0x83)
+#define MEM_INST_RETIRED_LOAD RAW_EVENT(MEM_INST_RETIRED, 0x81)
+#define MEM_INST_RETIRED_STORE RAW_EVENT(MEM_INST_RETIRED, 0x82)
+#define MEM_INST_RETIRED_LOAD_STORE RAW_EVENT(MEM_INST_RETIRED, 0x83)
static bool supports_event_mem_inst_retired(void)
{
@@ -486,9 +434,9 @@ static bool supports_event_mem_inst_retired(void)
* B1 Processors Volume 1 of 2.
*/
#define LS_DISPATCH 0x29
-#define LS_DISPATCH_LOAD EVENT(LS_DISPATCH, BIT(0))
-#define LS_DISPATCH_STORE EVENT(LS_DISPATCH, BIT(1))
-#define LS_DISPATCH_LOAD_STORE EVENT(LS_DISPATCH, BIT(2))
+#define LS_DISPATCH_LOAD RAW_EVENT(LS_DISPATCH, BIT(0))
+#define LS_DISPATCH_STORE RAW_EVENT(LS_DISPATCH, BIT(1))
+#define LS_DISPATCH_LOAD_STORE RAW_EVENT(LS_DISPATCH, BIT(2))
#define INCLUDE_MASKED_ENTRY(event_select, mask, match) \
KVM_PMU_ENCODE_MASKED_ENTRY(event_select, mask, match, false)
@@ -729,14 +677,14 @@ static void add_dummy_events(uint64_t *events, int nevents)
static void test_masked_events(struct kvm_vcpu *vcpu)
{
- int nevents = MAX_FILTER_EVENTS - MAX_TEST_EVENTS;
- uint64_t events[MAX_FILTER_EVENTS];
+ int nevents = KVM_PMU_EVENT_FILTER_MAX_EVENTS - MAX_TEST_EVENTS;
+ uint64_t events[KVM_PMU_EVENT_FILTER_MAX_EVENTS];
/* Run the test cases against a sparse PMU event filter. */
run_masked_events_tests(vcpu, events, 0);
/* Run the test cases against a dense PMU event filter. */
- add_dummy_events(events, MAX_FILTER_EVENTS);
+ add_dummy_events(events, KVM_PMU_EVENT_FILTER_MAX_EVENTS);
run_masked_events_tests(vcpu, events, nevents);
}
@@ -809,20 +757,19 @@ static void test_filter_ioctl(struct kvm_vcpu *vcpu)
TEST_ASSERT(!r, "Masking non-existent fixed counters should be allowed");
}
-static void intel_run_fixed_counter_guest_code(uint8_t fixed_ctr_idx)
+static void intel_run_fixed_counter_guest_code(uint8_t idx)
{
for (;;) {
wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
- wrmsr(MSR_CORE_PERF_FIXED_CTR0 + fixed_ctr_idx, 0);
+ wrmsr(MSR_CORE_PERF_FIXED_CTR0 + idx, 0);
/* Only OS_EN bit is enabled for fixed counter[idx]. */
- wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, BIT_ULL(4 * fixed_ctr_idx));
- wrmsr(MSR_CORE_PERF_GLOBAL_CTRL,
- BIT_ULL(INTEL_PMC_IDX_FIXED + fixed_ctr_idx));
+ wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, FIXED_PMC_CTRL(idx, FIXED_PMC_KERNEL));
+ wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, FIXED_PMC_GLOBAL_CTRL_ENABLE(idx));
__asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
- GUEST_SYNC(rdmsr(MSR_CORE_PERF_FIXED_CTR0 + fixed_ctr_idx));
+ GUEST_SYNC(rdmsr(MSR_CORE_PERF_FIXED_CTR0 + idx));
}
}
--
2.43.0.rc2.451.g8631bc7472-goog
From: Jinrong Liang <[email protected]>
Extend the PMU counters test to verify KVM emulation of fixed counters in
addition to general purpose counters. Fixed counters add an extra wrinkle
in the form of an extra supported bitmask. Thus quoth the SDM:
fixed-function performance counter 'i' is supported if ECX[i] || (EDX[4:0] > i)
Test that KVM handles a counter being available through either method.
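In C terms, the guest-side check reduces to (sketch; 'i' is the fixed counter
index, 'nr_fixed' is CPUID.0xA.EDX[4:0] and 'bitmask' is CPUID.0xA.ECX):

  const bool fixed_counter_supported = i < nr_fixed || (bitmask & BIT(i));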
Reviewed-by: Dapeng Mi <[email protected]>
Co-developed-by: Like Xu <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Jinrong Liang <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 60 ++++++++++++++++++-
1 file changed, 57 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index 863418842ef8..b07294af71a3 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -290,7 +290,7 @@ __GUEST_ASSERT(expect_gp ? vector == GP_VECTOR : !vector, \
msr, expected_val, val);
static void guest_rd_wr_counters(uint32_t base_msr, uint8_t nr_possible_counters,
- uint8_t nr_counters)
+ uint8_t nr_counters, uint32_t or_mask)
{
uint8_t i;
@@ -301,7 +301,13 @@ static void guest_rd_wr_counters(uint32_t base_msr, uint8_t nr_possible_counters
*/
const uint64_t test_val = 0xffff;
const uint32_t msr = base_msr + i;
- const bool expect_success = i < nr_counters;
+
+ /*
+ * Fixed counters are supported if the counter's index is less than
+ * the number of enumerated contiguous counters *or* the counter is
+ * explicitly enumerated in the supported counters mask.
+ */
+ const bool expect_success = i < nr_counters || (or_mask & BIT(i));
/*
* KVM drops writes to MSR_P6_PERFCTR[0|1] if the counters are
@@ -343,7 +349,7 @@ static void guest_test_gp_counters(void)
else
base_msr = MSR_IA32_PERFCTR0;
- guest_rd_wr_counters(base_msr, MAX_NR_GP_COUNTERS, nr_gp_counters);
+ guest_rd_wr_counters(base_msr, MAX_NR_GP_COUNTERS, nr_gp_counters, 0);
}
static void test_gp_counters(uint8_t pmu_version, uint64_t perf_capabilities,
@@ -363,9 +369,50 @@ static void test_gp_counters(uint8_t pmu_version, uint64_t perf_capabilities,
kvm_vm_free(vm);
}
+static void guest_test_fixed_counters(void)
+{
+ uint64_t supported_bitmask = 0;
+ uint8_t nr_fixed_counters = 0;
+
+ /* Fixed counters require Architectural vPMU Version 2+. */
+ if (guest_get_pmu_version() >= 2)
+ nr_fixed_counters = this_cpu_property(X86_PROPERTY_PMU_NR_FIXED_COUNTERS);
+
+ /*
+ * The supported bitmask for fixed counters was introduced in PMU
+ * version 5.
+ */
+ if (guest_get_pmu_version() >= 5)
+ supported_bitmask = this_cpu_property(X86_PROPERTY_PMU_FIXED_COUNTERS_BITMASK);
+
+ guest_rd_wr_counters(MSR_CORE_PERF_FIXED_CTR0, MAX_NR_FIXED_COUNTERS,
+ nr_fixed_counters, supported_bitmask);
+}
+
+static void test_fixed_counters(uint8_t pmu_version, uint64_t perf_capabilities,
+ uint8_t nr_fixed_counters,
+ uint32_t supported_bitmask)
+{
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+
+ vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_test_fixed_counters,
+ pmu_version, perf_capabilities);
+
+ vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_FIXED_COUNTERS_BITMASK,
+ supported_bitmask);
+ vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_NR_FIXED_COUNTERS,
+ nr_fixed_counters);
+
+ run_vcpu(vcpu);
+
+ kvm_vm_free(vm);
+}
+
static void test_intel_counters(void)
{
uint8_t nr_arch_events = kvm_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
+ uint8_t nr_fixed_counters = kvm_cpu_property(X86_PROPERTY_PMU_NR_FIXED_COUNTERS);
uint8_t nr_gp_counters = kvm_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
uint8_t pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
unsigned int i;
@@ -435,6 +482,13 @@ static void test_intel_counters(void)
v, perf_caps[i]);
for (j = 0; j <= nr_gp_counters; j++)
test_gp_counters(v, perf_caps[i], j);
+
+ pr_info("Testing fixed counters, PMU version %u, perf_caps = %lx\n",
+ v, perf_caps[i]);
+ for (j = 0; j <= nr_fixed_counters; j++) {
+ for (k = 0; k <= (BIT(nr_fixed_counters) - 1); k++)
+ test_fixed_counters(v, perf_caps[i], j, k);
+ }
}
}
}
--
2.43.0.rc2.451.g8631bc7472-goog
Expand the PMU counters test to verify that LLC references and misses have
non-zero counts when the code being executed while the LLC event(s) is
active is evicted via CLFLUSH{,OPT}. Note, CLFLUSH{,OPT} requires a fence
of some kind to ensure the cache lines are flushed before execution
continues. Use MFENCE for simplicity (performance is not a concern).
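Conceptually, the measured sequence gains a flush+fence prologue, e.g. (sketch;
'line' stands for the address of the first measured instruction, which the
real test references via a local asm label):

  /* Evict the cache line holding the measured code, then fence. */
  asm volatile("clflushopt %0\n\t"
               "mfence\n\t"
               : "+m" (*(volatile char *)line));
  /* The measured loop now has to refetch its code from memory. */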
Suggested-by: Jim Mattson <[email protected]>
Reviewed-by: Dapeng Mi <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 59 +++++++++++++------
1 file changed, 40 insertions(+), 19 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index f5dedd112471..4c7133ddcda8 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -14,9 +14,9 @@
/*
* Number of "extra" instructions that will be counted, i.e. the number of
* instructions that are needed to set up the loop and then disable the
- * counter. 2 MOV, 2 XOR, 1 WRMSR.
+ * counter. 1 CLFLUSH/CLFLUSHOPT/NOP, 1 MFENCE, 2 MOV, 2 XOR, 1 WRMSR.
*/
-#define NUM_EXTRA_INSNS 5
+#define NUM_EXTRA_INSNS 7
#define NUM_INSNS_RETIRED (NUM_BRANCHES + NUM_EXTRA_INSNS)
static uint8_t kvm_pmu_version;
@@ -107,6 +107,12 @@ static void guest_assert_event_count(uint8_t idx,
case INTEL_ARCH_BRANCHES_RETIRED_INDEX:
GUEST_ASSERT_EQ(count, NUM_BRANCHES);
break;
+ case INTEL_ARCH_LLC_REFERENCES_INDEX:
+ case INTEL_ARCH_LLC_MISSES_INDEX:
+ if (!this_cpu_has(X86_FEATURE_CLFLUSHOPT) &&
+ !this_cpu_has(X86_FEATURE_CLFLUSH))
+ break;
+ fallthrough;
case INTEL_ARCH_CPU_CYCLES_INDEX:
case INTEL_ARCH_REFERENCE_CYCLES_INDEX:
GUEST_ASSERT_NE(count, 0);
@@ -123,29 +129,44 @@ static void guest_assert_event_count(uint8_t idx,
GUEST_ASSERT_EQ(_rdpmc(pmc), 0xdead);
}
+/*
+ * Enable and disable the PMC in a monolithic asm blob to ensure that the
+ * compiler can't insert _any_ code into the measured sequence. Note, ECX
+ * doesn't need to be clobbered as the input value, @pmc_msr, is restored
+ * before the end of the sequence.
+ *
+ * If CLFLUSH{,OPT} is supported, flush the cacheline containing (at least) the
+ * start of the loop to force LLC references and misses, i.e. to allow testing
+ * that those events actually count.
+ */
+#define GUEST_MEASURE_EVENT(_msr, _value, clflush) \
+do { \
+ __asm__ __volatile__("wrmsr\n\t" \
+ clflush "\n\t" \
+ "mfence\n\t" \
+ "1: mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t" \
+ "loop .\n\t" \
+ "mov %%edi, %%ecx\n\t" \
+ "xor %%eax, %%eax\n\t" \
+ "xor %%edx, %%edx\n\t" \
+ "wrmsr\n\t" \
+ :: "a"((uint32_t)_value), "d"(_value >> 32), \
+ "c"(_msr), "D"(_msr) \
+ ); \
+} while (0)
+
static void __guest_test_arch_event(uint8_t idx, struct kvm_x86_pmu_feature event,
uint32_t pmc, uint32_t pmc_msr,
uint32_t ctrl_msr, uint64_t ctrl_msr_value)
{
wrmsr(pmc_msr, 0);
- /*
- * Enable and disable the PMC in a monolithic asm blob to ensure that
- * the compiler can't insert _any_ code into the measured sequence.
- * Note, ECX doesn't need to be clobbered as the input value, @pmc_msr,
- * is restored before the end of the sequence.
- */
- __asm__ __volatile__("wrmsr\n\t"
- "mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t"
- "loop .\n\t"
- "mov %%edi, %%ecx\n\t"
- "xor %%eax, %%eax\n\t"
- "xor %%edx, %%edx\n\t"
- "wrmsr\n\t"
- :: "a"((uint32_t)ctrl_msr_value),
- "d"(ctrl_msr_value >> 32),
- "c"(ctrl_msr), "D"(ctrl_msr)
- );
+ if (this_cpu_has(X86_FEATURE_CLFLUSHOPT))
+ GUEST_MEASURE_EVENT(ctrl_msr, ctrl_msr_value, "clflushopt 1f");
+ else if (this_cpu_has(X86_FEATURE_CLFLUSH))
+ GUEST_MEASURE_EVENT(ctrl_msr, ctrl_msr_value, "clflush 1f");
+ else
+ GUEST_MEASURE_EVENT(ctrl_msr, ctrl_msr_value, "nop");
guest_assert_event_count(idx, event, pmc, pmc_msr);
}
--
2.43.0.rc2.451.g8631bc7472-goog
Extend the kvm_x86_pmu_feature framework to allow querying for fixed
counters via {kvm,this}_pmu_has(). Like architectural events, checking
for a fixed counter annoyingly requires checking multiple CPUID fields, as
a fixed counter exists if:
FxCtr[i]_is_supported := ECX[i] || (EDX[4:0] > i);
Note, KVM currently doesn't actually support exposing fixed counters via
the bitmask, but that will hopefully change sooner than later, and Intel's
SDM explicitly "recommends" checking both the number of counters and the
mask.
Rename the intermediate "anti_feature" field to simply 'f' since the fixed
counter bitmask (thankfully) doesn't have reversed polarity like the
architectural events bitmask.
Note, ideally the helpers would use BUILD_BUG_ON() to assert on the
incoming register, but the expected usage in PMU tests can't guarantee the
inputs are compile-time constants.
Opportunistically define macros for all of the known architectural events
and fixed counters.
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/include/x86_64/processor.h | 65 ++++++++++++++-----
1 file changed, 47 insertions(+), 18 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 4f737d3b893c..92d4f8ecc730 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -282,24 +282,41 @@ struct kvm_x86_cpu_property {
* that indicates the feature is _not_ supported, and a property that states
* the length of the bit mask of unsupported features. A feature is supported
* if the size of the bit mask is larger than the "unavailable" bit, and said
- * bit is not set.
+ * bit is not set. Fixed counters also have bizarre enumeration, but inverted
+ * from arch events for general purpose counters. Fixed counters are supported
+ * if a feature flag is set **OR** the total number of fixed counters is
+ * greater than the index of the counter.
*
- * Wrap the "unavailable" feature to simplify checking whether or not a given
- * architectural event is supported.
+ * Wrap the events for general purpose and fixed counters to simplify checking
+ * whether or not a given architectural event is supported.
*/
struct kvm_x86_pmu_feature {
- struct kvm_x86_cpu_feature anti_feature;
+ struct kvm_x86_cpu_feature f;
};
-#define KVM_X86_PMU_FEATURE(__bit) \
-({ \
- struct kvm_x86_pmu_feature feature = { \
- .anti_feature = KVM_X86_CPU_FEATURE(0xa, 0, EBX, __bit), \
- }; \
- \
- feature; \
+#define KVM_X86_PMU_FEATURE(__reg, __bit) \
+({ \
+ struct kvm_x86_pmu_feature feature = { \
+ .f = KVM_X86_CPU_FEATURE(0xa, 0, __reg, __bit), \
+ }; \
+ \
+ kvm_static_assert(KVM_CPUID_##__reg == KVM_CPUID_EBX || \
+ KVM_CPUID_##__reg == KVM_CPUID_ECX); \
+ feature; \
})
-#define X86_PMU_FEATURE_BRANCH_INSNS_RETIRED KVM_X86_PMU_FEATURE(5)
+#define X86_PMU_FEATURE_CPU_CYCLES KVM_X86_PMU_FEATURE(EBX, 0)
+#define X86_PMU_FEATURE_INSNS_RETIRED KVM_X86_PMU_FEATURE(EBX, 1)
+#define X86_PMU_FEATURE_REFERENCE_CYCLES KVM_X86_PMU_FEATURE(EBX, 2)
+#define X86_PMU_FEATURE_LLC_REFERENCES KVM_X86_PMU_FEATURE(EBX, 3)
+#define X86_PMU_FEATURE_LLC_MISSES KVM_X86_PMU_FEATURE(EBX, 4)
+#define X86_PMU_FEATURE_BRANCH_INSNS_RETIRED KVM_X86_PMU_FEATURE(EBX, 5)
+#define X86_PMU_FEATURE_BRANCHES_MISPREDICTED KVM_X86_PMU_FEATURE(EBX, 6)
+#define X86_PMU_FEATURE_TOPDOWN_SLOTS KVM_X86_PMU_FEATURE(EBX, 7)
+
+#define X86_PMU_FEATURE_INSNS_RETIRED_FIXED KVM_X86_PMU_FEATURE(ECX, 0)
+#define X86_PMU_FEATURE_CPU_CYCLES_FIXED KVM_X86_PMU_FEATURE(ECX, 1)
+#define X86_PMU_FEATURE_REFERENCE_TSC_CYCLES_FIXED KVM_X86_PMU_FEATURE(ECX, 2)
+#define X86_PMU_FEATURE_TOPDOWN_SLOTS_FIXED KVM_X86_PMU_FEATURE(ECX, 3)
static inline unsigned int x86_family(unsigned int eax)
{
@@ -698,10 +715,16 @@ static __always_inline bool this_cpu_has_p(struct kvm_x86_cpu_property property)
static inline bool this_pmu_has(struct kvm_x86_pmu_feature feature)
{
- uint32_t nr_bits = this_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
+ uint32_t nr_bits;
- return nr_bits > feature.anti_feature.bit &&
- !this_cpu_has(feature.anti_feature);
+ if (feature.f.reg == KVM_CPUID_EBX) {
+ nr_bits = this_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
+ return nr_bits > feature.f.bit && !this_cpu_has(feature.f);
+ }
+
+ GUEST_ASSERT(feature.f.reg == KVM_CPUID_ECX);
+ nr_bits = this_cpu_property(X86_PROPERTY_PMU_NR_FIXED_COUNTERS);
+ return nr_bits > feature.f.bit || this_cpu_has(feature.f);
}
static __always_inline uint64_t this_cpu_supported_xcr0(void)
@@ -917,10 +940,16 @@ static __always_inline bool kvm_cpu_has_p(struct kvm_x86_cpu_property property)
static inline bool kvm_pmu_has(struct kvm_x86_pmu_feature feature)
{
- uint32_t nr_bits = kvm_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
+ uint32_t nr_bits;
- return nr_bits > feature.anti_feature.bit &&
- !kvm_cpu_has(feature.anti_feature);
+ if (feature.f.reg == KVM_CPUID_EBX) {
+ nr_bits = kvm_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
+ return nr_bits > feature.f.bit && !kvm_cpu_has(feature.f);
+ }
+
+ TEST_ASSERT_EQ(feature.f.reg, KVM_CPUID_ECX);
+ nr_bits = kvm_cpu_property(X86_PROPERTY_PMU_NR_FIXED_COUNTERS);
+ return nr_bits > feature.f.bit || kvm_cpu_has(feature.f);
}
static __always_inline uint64_t kvm_cpu_supported_xcr0(void)
--
2.43.0.rc2.451.g8631bc7472-goog
Add a helper to probe KVM's "enable_pmu" param; open coding strings in
multiple places is just asking for false negatives and/or runtime errors
due to typos.
Reviewed-by: Dapeng Mi <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
tools/testing/selftests/kvm/include/x86_64/processor.h | 5 +++++
tools/testing/selftests/kvm/x86_64/pmu_counters_test.c | 2 +-
tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c | 2 +-
tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c | 2 +-
4 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 92d4f8ecc730..ee082ae58f40 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -1217,6 +1217,11 @@ static inline uint8_t xsetbv_safe(uint32_t index, uint64_t value)
bool kvm_is_tdp_enabled(void);
+static inline bool kvm_is_pmu_enabled(void)
+{
+ return get_kvm_param_bool("enable_pmu");
+}
+
uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
int *level);
uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr);
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index 4c7133ddcda8..9e9dc4084c0d 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -545,7 +545,7 @@ static void test_intel_counters(void)
int main(int argc, char *argv[])
{
- TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
+ TEST_REQUIRE(kvm_is_pmu_enabled());
TEST_REQUIRE(host_cpu_is_intel);
TEST_REQUIRE(kvm_cpu_has_p(X86_PROPERTY_PMU_VERSION));
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
index 7ec9fbed92e0..fa407e2ccb2f 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
@@ -867,7 +867,7 @@ int main(int argc, char *argv[])
struct kvm_vcpu *vcpu, *vcpu2 = NULL;
struct kvm_vm *vm;
- TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
+ TEST_REQUIRE(kvm_is_pmu_enabled());
TEST_REQUIRE(kvm_has_cap(KVM_CAP_PMU_EVENT_FILTER));
TEST_REQUIRE(kvm_has_cap(KVM_CAP_PMU_EVENT_MASKED_EVENTS));
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c b/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c
index ebbcb0a3f743..562b0152a122 100644
--- a/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c
+++ b/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c
@@ -237,7 +237,7 @@ int main(int argc, char *argv[])
{
union perf_capabilities host_cap;
- TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
+ TEST_REQUIRE(kvm_is_pmu_enabled());
TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_PDCM));
TEST_REQUIRE(kvm_cpu_has_p(X86_PROPERTY_PMU_VERSION));
--
2.43.0.rc2.451.g8631bc7472-goog
Add helpers for safe and safe-with-forced-emulations versions of RDMSR,
RDPMC, and XGETBV. Use macro shenanigans to eliminate the rather large
amount of boilerplate needed to get values in and out of registers.
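Usage then looks like (sketch):

  uint64_t val;
  uint8_t vector;

  /* Returns the exception vector, or '0' if the read succeeded. */
  vector = rdpmc_safe(0, &val);

  /* Same, but with the forced emulation prefix prepended. */
  vector = rdpmc_safe_fep(0, &val);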
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/include/x86_64/processor.h | 40 +++++++++++++------
1 file changed, 27 insertions(+), 13 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index fe891424ff55..abac816f6594 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -1216,21 +1216,35 @@ void vm_install_exception_handler(struct kvm_vm *vm, int vector,
vector; \
})
-static inline uint8_t rdmsr_safe(uint32_t msr, uint64_t *val)
-{
- uint64_t error_code;
- uint8_t vector;
- uint32_t a, d;
-
- asm volatile(KVM_ASM_SAFE("rdmsr")
- : "=a"(a), "=d"(d), KVM_ASM_SAFE_OUTPUTS(vector, error_code)
- : "c"(msr)
- : KVM_ASM_SAFE_CLOBBERS);
-
- *val = (uint64_t)a | ((uint64_t)d << 32);
- return vector;
+#define BUILD_READ_U64_SAFE_HELPER(insn, _fep, _FEP) \
+static inline uint8_t insn##_safe ##_fep(uint32_t idx, uint64_t *val) \
+{ \
+ uint64_t error_code; \
+ uint8_t vector; \
+ uint32_t a, d; \
+ \
+ asm volatile(KVM_ASM_SAFE##_FEP(#insn) \
+ : "=a"(a), "=d"(d), \
+ KVM_ASM_SAFE_OUTPUTS(vector, error_code) \
+ : "c"(idx) \
+ : KVM_ASM_SAFE_CLOBBERS); \
+ \
+ *val = (uint64_t)a | ((uint64_t)d << 32); \
+ return vector; \
}
+/*
+ * Generate {insn}_safe() and {insn}_safe_fep() helpers for instructions that
+ * use ECX as an input index, and EDX:EAX as a 64-bit output.
+ */
+#define BUILD_READ_U64_SAFE_HELPERS(insn) \
+ BUILD_READ_U64_SAFE_HELPER(insn, , ) \
+ BUILD_READ_U64_SAFE_HELPER(insn, _fep, _FEP) \
+
+BUILD_READ_U64_SAFE_HELPERS(rdmsr)
+BUILD_READ_U64_SAFE_HELPERS(rdpmc)
+BUILD_READ_U64_SAFE_HELPERS(xgetbv)
+
static inline uint8_t wrmsr_safe(uint32_t msr, uint64_t val)
{
return kvm_asm_safe("wrmsr", "a"(val & -1u), "d"(val >> 32), "c"(msr));
--
2.43.0.rc2.451.g8631bc7472-goog
Move the KVM_FEP definition, a.k.a. the KVM force emulation prefix, into
processor.h so that it can be used for other tests besides the MSR filter
test.
Signed-off-by: Sean Christopherson <[email protected]>
---
tools/testing/selftests/kvm/include/x86_64/processor.h | 3 +++
tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c | 2 --
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index d211cea188be..6be365ac2a85 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -23,6 +23,9 @@
extern bool host_cpu_is_intel;
extern bool host_cpu_is_amd;
+/* Forced emulation prefix, used to invoke the emulator unconditionally. */
+#define KVM_FEP "ud2; .byte 'k', 'v', 'm';"
+
#define NMI_VECTOR 0x02
#define X86_EFLAGS_FIXED (1u << 1)
diff --git a/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
index 9e12dbc47a72..ab3a8c4f0b86 100644
--- a/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
+++ b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
@@ -12,8 +12,6 @@
#include "kvm_util.h"
#include "vmx.h"
-/* Forced emulation prefix, used to invoke the emulator unconditionally. */
-#define KVM_FEP "ud2; .byte 'k', 'v', 'm';"
static bool fep_available;
#define MSR_NON_EXISTENT 0x474f4f00
--
2.43.0.rc2.451.g8631bc7472-goog
Extend the read/write PMU counters subtest to verify that RDPMC also reads
back the written value. Opportunistically verify that attempting to use
the "fast" mode of RDPMC fails, as the "fast" flag is only supported by
non-architectural PMUs, which KVM doesn't virtualize.
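For reference, the ECX value handed to RDPMC is built roughly as follows
(sketch using the pmu.h flags added earlier in the series; 'i',
'testing_fixed_counters', 'vector' and 'val' are stand-ins):

  uint32_t rdpmc_idx = i;

  /* INTEL_RDPMC_FIXED (bit 30) selects the fixed counter space. */
  if (testing_fixed_counters)
          rdpmc_idx |= INTEL_RDPMC_FIXED;

  /*
   * INTEL_RDPMC_FAST (bit 31) requests the legacy "fast" read of only the
   * low 32 bits; KVM must reject it with #GP.
   */
  vector = rdpmc_safe(rdpmc_idx | INTEL_RDPMC_FAST, &val);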
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 41 +++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index cb808ac827ba..ae5f6042f1e8 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -325,9 +325,30 @@ __GUEST_ASSERT(expect_gp ? vector == GP_VECTOR : !vector, \
"Expected " #insn "(0x%x) to yield 0x%lx, got 0x%lx", \
msr, expected_val, val);
+static void guest_test_rdpmc(uint32_t rdpmc_idx, bool expect_success,
+ uint64_t expected_val)
+{
+ uint8_t vector;
+ uint64_t val;
+
+ vector = rdpmc_safe(rdpmc_idx, &val);
+ GUEST_ASSERT_PMC_MSR_ACCESS(RDPMC, rdpmc_idx, !expect_success, vector);
+ if (expect_success)
+ GUEST_ASSERT_PMC_VALUE(RDPMC, rdpmc_idx, val, expected_val);
+
+ if (!is_forced_emulation_enabled)
+ return;
+
+ vector = rdpmc_safe_fep(rdpmc_idx, &val);
+ GUEST_ASSERT_PMC_MSR_ACCESS(RDPMC, rdpmc_idx, !expect_success, vector);
+ if (expect_success)
+ GUEST_ASSERT_PMC_VALUE(RDPMC, rdpmc_idx, val, expected_val);
+}
+
static void guest_rd_wr_counters(uint32_t base_msr, uint8_t nr_possible_counters,
uint8_t nr_counters, uint32_t or_mask)
{
+ const bool pmu_has_fast_mode = !guest_get_pmu_version();
uint8_t i;
for (i = 0; i < nr_possible_counters; i++) {
@@ -352,6 +373,7 @@ static void guest_rd_wr_counters(uint32_t base_msr, uint8_t nr_possible_counters
const uint64_t expected_val = expect_success ? test_val : 0;
const bool expect_gp = !expect_success && msr != MSR_P6_PERFCTR0 &&
msr != MSR_P6_PERFCTR1;
+ uint32_t rdpmc_idx;
uint8_t vector;
uint64_t val;
@@ -365,6 +387,25 @@ static void guest_rd_wr_counters(uint32_t base_msr, uint8_t nr_possible_counters
if (!expect_gp)
GUEST_ASSERT_PMC_VALUE(RDMSR, msr, val, expected_val);
+ /*
+ * Redo the read tests with RDPMC, which has different indexing
+ * semantics and additional capabilities.
+ */
+ rdpmc_idx = i;
+ if (base_msr == MSR_CORE_PERF_FIXED_CTR0)
+ rdpmc_idx |= INTEL_RDPMC_FIXED;
+
+ guest_test_rdpmc(rdpmc_idx, expect_success, expected_val);
+
+ /*
+ * KVM doesn't support non-architectural PMUs, i.e. it should be
+ * impossible to have fast mode RDPMC. Verify that attempting
+ * to use fast RDPMC always #GPs.
+ */
+ GUEST_ASSERT(!expect_success || !pmu_has_fast_mode);
+ rdpmc_idx |= INTEL_RDPMC_FAST;
+ guest_test_rdpmc(rdpmc_idx, false, -1ull);
+
vector = wrmsr_safe(msr, 0);
GUEST_ASSERT_PMC_MSR_ACCESS(WRMSR, msr, expect_gp, vector);
}
--
2.43.0.rc2.451.g8631bc7472-goog
Add a helper to detect KVM support for forced emulation by querying the
module param, and use the helper to detect support for the MSR filtering
test instead of throwing a noodle/NOP at KVM to see if it sticks.
Cc: Aaron Lewis <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/include/x86_64/processor.h | 5 ++++
.../kvm/x86_64/userspace_msr_exit_test.c | 27 +++++++------------
2 files changed, 14 insertions(+), 18 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index ee082ae58f40..d211cea188be 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -1222,6 +1222,11 @@ static inline bool kvm_is_pmu_enabled(void)
return get_kvm_param_bool("enable_pmu");
}
+static inline bool kvm_is_forced_emulation_enabled(void)
+{
+ return !!get_kvm_param_integer("force_emulation_prefix");
+}
+
uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
int *level);
uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr);
diff --git a/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
index 3533dc2fbfee..9e12dbc47a72 100644
--- a/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
+++ b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
@@ -14,8 +14,7 @@
/* Forced emulation prefix, used to invoke the emulator unconditionally. */
#define KVM_FEP "ud2; .byte 'k', 'v', 'm';"
-#define KVM_FEP_LENGTH 5
-static int fep_available = 1;
+static bool fep_available;
#define MSR_NON_EXISTENT 0x474f4f00
@@ -260,13 +259,6 @@ static void guest_code_filter_allow(void)
GUEST_ASSERT(data == 2);
GUEST_ASSERT(guest_exception_count == 0);
- /*
- * Test to see if the instruction emulator is available (ie: the module
- * parameter 'kvm.force_emulation_prefix=1' is set). This instruction
- * will #UD if it isn't available.
- */
- __asm__ __volatile__(KVM_FEP "nop");
-
if (fep_available) {
/* Let userspace know we aren't done. */
GUEST_SYNC(0);
@@ -388,12 +380,6 @@ static void guest_fep_gp_handler(struct ex_regs *regs)
&em_wrmsr_start, &em_wrmsr_end);
}
-static void guest_ud_handler(struct ex_regs *regs)
-{
- fep_available = 0;
- regs->rip += KVM_FEP_LENGTH;
-}
-
static void check_for_guest_assert(struct kvm_vcpu *vcpu)
{
struct ucall uc;
@@ -531,9 +517,11 @@ static void test_msr_filter_allow(void)
{
struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
+ uint64_t cmd;
int rc;
vm = vm_create_with_one_vcpu(&vcpu, guest_code_filter_allow);
+ sync_global_to_guest(vm, fep_available);
rc = kvm_check_cap(KVM_CAP_X86_USER_SPACE_MSR);
TEST_ASSERT(rc, "KVM_CAP_X86_USER_SPACE_MSR is available");
@@ -561,11 +549,11 @@ static void test_msr_filter_allow(void)
run_guest_then_process_wrmsr(vcpu, MSR_NON_EXISTENT);
run_guest_then_process_rdmsr(vcpu, MSR_NON_EXISTENT);
- vm_install_exception_handler(vm, UD_VECTOR, guest_ud_handler);
vcpu_run(vcpu);
- vm_install_exception_handler(vm, UD_VECTOR, NULL);
+ cmd = process_ucall(vcpu);
- if (process_ucall(vcpu) != UCALL_DONE) {
+ if (fep_available) {
+ TEST_ASSERT_EQ(cmd, UCALL_SYNC);
vm_install_exception_handler(vm, GP_VECTOR, guest_fep_gp_handler);
/* Process emulated rdmsr and wrmsr instructions. */
@@ -583,6 +571,7 @@ static void test_msr_filter_allow(void)
/* Confirm the guest completed without issues. */
run_guest_then_process_ucall_done(vcpu);
} else {
+ TEST_ASSERT_EQ(cmd, UCALL_DONE);
printf("To run the instruction emulated tests set the module parameter 'kvm.force_emulation_prefix=1'\n");
}
@@ -804,6 +793,8 @@ static void test_user_exit_msr_flags(void)
int main(int argc, char *argv[])
{
+ fep_available = kvm_is_forced_emulation_enabled();
+
test_msr_filter_allow();
test_msr_filter_deny();
--
2.43.0.rc2.451.g8631bc7472-goog
Drop the "name" parameter from KVM_X86_PMU_FEATURE(), it's unused and
the name is redundant with the macro, i.e. it's truly useless.
Reviewed-by: Jim Mattson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
tools/testing/selftests/kvm/include/x86_64/processor.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 932944c4ea01..4f737d3b893c 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -290,7 +290,7 @@ struct kvm_x86_cpu_property {
struct kvm_x86_pmu_feature {
struct kvm_x86_cpu_feature anti_feature;
};
-#define KVM_X86_PMU_FEATURE(name, __bit) \
+#define KVM_X86_PMU_FEATURE(__bit) \
({ \
struct kvm_x86_pmu_feature feature = { \
.anti_feature = KVM_X86_CPU_FEATURE(0xa, 0, EBX, __bit), \
@@ -299,7 +299,7 @@ struct kvm_x86_pmu_feature {
feature; \
})
-#define X86_PMU_FEATURE_BRANCH_INSNS_RETIRED KVM_X86_PMU_FEATURE(BRANCH_INSNS_RETIRED, 5)
+#define X86_PMU_FEATURE_BRANCH_INSNS_RETIRED KVM_X86_PMU_FEATURE(5)
static inline unsigned int x86_family(unsigned int eax)
{
--
2.43.0.rc2.451.g8631bc7472-goog
Add KVM_ASM_SAFE_FEP() to allow forcing emulation on an instruction that
might fault. Note, KVM skips RIP past the FEP prefix before injecting an
exception, i.e. the fixup needs to be on the instruction itself. Do not
check for FEP support; that is firmly the responsibility of whatever code
wants to use KVM_ASM_SAFE_FEP().
Sadly, chaining variadic arguments that contain commas doesn't work, thus
the unfortunate amount of copy+paste.
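E.g. a forced-emulation WRMSR that may fault can then be issued as (sketch;
'msr' and 'val' are stand-ins):

  uint8_t vector = kvm_asm_safe_fep("wrmsr", "a"(val & -1u),
                                    "d"(val >> 32), "c"(msr));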
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/include/x86_64/processor.h | 30 +++++++++++++++++--
1 file changed, 28 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 6be365ac2a85..fe891424ff55 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -1154,16 +1154,19 @@ void vm_install_exception_handler(struct kvm_vm *vm, int vector,
* r9 = exception vector (non-zero)
* r10 = error code
*/
-#define KVM_ASM_SAFE(insn) \
+#define __KVM_ASM_SAFE(insn, fep) \
"mov $" __stringify(KVM_EXCEPTION_MAGIC) ", %%r9\n\t" \
"lea 1f(%%rip), %%r10\n\t" \
"lea 2f(%%rip), %%r11\n\t" \
- "1: " insn "\n\t" \
+ fep "1: " insn "\n\t" \
"xor %%r9, %%r9\n\t" \
"2:\n\t" \
"mov %%r9b, %[vector]\n\t" \
"mov %%r10, %[error_code]\n\t"
+#define KVM_ASM_SAFE(insn) __KVM_ASM_SAFE(insn, "")
+#define KVM_ASM_SAFE_FEP(insn) __KVM_ASM_SAFE(insn, KVM_FEP)
+
#define KVM_ASM_SAFE_OUTPUTS(v, ec) [vector] "=qm"(v), [error_code] "=rm"(ec)
#define KVM_ASM_SAFE_CLOBBERS "r9", "r10", "r11"
@@ -1190,6 +1193,29 @@ void vm_install_exception_handler(struct kvm_vm *vm, int vector,
vector; \
})
+#define kvm_asm_safe_fep(insn, inputs...) \
+({ \
+ uint64_t ign_error_code; \
+ uint8_t vector; \
+ \
+ asm volatile(KVM_ASM_SAFE_FEP(insn) \
+ : KVM_ASM_SAFE_OUTPUTS(vector, ign_error_code) \
+ : inputs \
+ : KVM_ASM_SAFE_CLOBBERS); \
+ vector; \
+})
+
+#define kvm_asm_safe_ec_fep(insn, error_code, inputs...) \
+({ \
+ uint8_t vector; \
+ \
+ asm volatile(KVM_ASM_SAFE_FEP(insn) \
+ : KVM_ASM_SAFE_OUTPUTS(vector, error_code) \
+ : inputs \
+ : KVM_ASM_SAFE_CLOBBERS); \
+ vector; \
+})
+
static inline uint8_t rdmsr_safe(uint32_t msr, uint64_t *val)
{
uint64_t error_code;
--
2.43.0.rc2.451.g8631bc7472-goog
Extend the PMC counters test to use forced emulation to verify that KVM
emulates counter events for instructions retired and branches retired.
Force emulation for only a subset of the measured code to test that KVM
does the right thing when mixing perf events with emulated events.
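(For reference, prefixing an instruction with KVM_FEP is all it takes to route
that one instruction through KVM's emulator, e.g.:

  __asm__ __volatile__(KVM_FEP "nop");

which is what the macro plumbing below does for the selected instructions in
the measured blob.)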
Reviewed-by: Dapeng Mi <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 44 +++++++++++++------
1 file changed, 30 insertions(+), 14 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index 9e9dc4084c0d..cb808ac827ba 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -21,6 +21,7 @@
static uint8_t kvm_pmu_version;
static bool kvm_has_perf_caps;
+static bool is_forced_emulation_enabled;
static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
void *guest_code,
@@ -34,6 +35,7 @@ static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
vcpu_init_descriptor_tables(*vcpu);
sync_global_to_guest(vm, kvm_pmu_version);
+ sync_global_to_guest(vm, is_forced_emulation_enabled);
/*
* Set PERF_CAPABILITIES before PMU version as KVM disallows enabling
@@ -138,37 +140,50 @@ static void guest_assert_event_count(uint8_t idx,
* If CLFLUSH{,OPT} is supported, flush the cacheline containing (at least) the
* start of the loop to force LLC references and misses, i.e. to allow testing
* that those events actually count.
+ *
+ * If forced emulation is enabled (and specified), force emulation on a subset
+ * of the measured code to verify that KVM correctly emulates the
+ * instructions retired and branches retired events in conjunction with
+ * hardware also counting said events.
*/
-#define GUEST_MEASURE_EVENT(_msr, _value, clflush) \
+#define GUEST_MEASURE_EVENT(_msr, _value, clflush, FEP) \
do { \
__asm__ __volatile__("wrmsr\n\t" \
clflush "\n\t" \
"mfence\n\t" \
"1: mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t" \
- "loop .\n\t" \
- "mov %%edi, %%ecx\n\t" \
- "xor %%eax, %%eax\n\t" \
- "xor %%edx, %%edx\n\t" \
+ FEP "loop .\n\t" \
+ FEP "mov %%edi, %%ecx\n\t" \
+ FEP "xor %%eax, %%eax\n\t" \
+ FEP "xor %%edx, %%edx\n\t" \
"wrmsr\n\t" \
:: "a"((uint32_t)_value), "d"(_value >> 32), \
"c"(_msr), "D"(_msr) \
); \
} while (0)
+#define GUEST_TEST_EVENT(_idx, _event, _pmc, _pmc_msr, _ctrl_msr, _value, FEP) \
+do { \
+ wrmsr(pmc_msr, 0); \
+ \
+ if (this_cpu_has(X86_FEATURE_CLFLUSHOPT)) \
+ GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt 1f", FEP); \
+ else if (this_cpu_has(X86_FEATURE_CLFLUSH)) \
+ GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflush 1f", FEP); \
+ else \
+ GUEST_MEASURE_EVENT(_ctrl_msr, _value, "nop", FEP); \
+ \
+ guest_assert_event_count(_idx, _event, _pmc, _pmc_msr); \
+} while (0)
+
static void __guest_test_arch_event(uint8_t idx, struct kvm_x86_pmu_feature event,
uint32_t pmc, uint32_t pmc_msr,
uint32_t ctrl_msr, uint64_t ctrl_msr_value)
{
- wrmsr(pmc_msr, 0);
+ GUEST_TEST_EVENT(idx, event, pmc, pmc_msr, ctrl_msr, ctrl_msr_value, "");
- if (this_cpu_has(X86_FEATURE_CLFLUSHOPT))
- GUEST_MEASURE_EVENT(ctrl_msr, ctrl_msr_value, "clflushopt 1f");
- else if (this_cpu_has(X86_FEATURE_CLFLUSH))
- GUEST_MEASURE_EVENT(ctrl_msr, ctrl_msr_value, "clflush 1f");
- else
- GUEST_MEASURE_EVENT(ctrl_msr, ctrl_msr_value, "nop");
-
- guest_assert_event_count(idx, event, pmc, pmc_msr);
+ if (is_forced_emulation_enabled)
+ GUEST_TEST_EVENT(idx, event, pmc, pmc_msr, ctrl_msr, ctrl_msr_value, KVM_FEP);
}
#define X86_PMU_FEATURE_NULL \
@@ -553,6 +568,7 @@ int main(int argc, char *argv[])
kvm_pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
kvm_has_perf_caps = kvm_cpu_has(X86_FEATURE_PDCM);
+ is_forced_emulation_enabled = kvm_is_forced_emulation_enabled();
test_intel_counters();
--
2.43.0.rc2.451.g8631bc7472-goog
Add helpers to read integer module params, which is painfully non-trivial
because dealing with strings in C is bad enough on its own, and the kernel
inserting a trailing newline makes it worse.
Don't bother differentiating between int, uint, short, etc. They all fit
in an int, and KVM (thankfully) doesn't have any integer params larger
than an int.
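Usage is then a one-liner, e.g. (sketch):

  /* Non-zero iff kvm.force_emulation_prefix is enabled. */
  bool fep = !!get_kvm_param_integer("force_emulation_prefix");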
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/include/kvm_util_base.h | 4 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 62 +++++++++++++++++--
2 files changed, 60 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index e0da036a13ae..4293af2cb28c 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -257,6 +257,10 @@ bool get_kvm_param_bool(const char *param);
bool get_kvm_intel_param_bool(const char *param);
bool get_kvm_amd_param_bool(const char *param);
+int get_kvm_param_integer(const char *param);
+int get_kvm_intel_param_integer(const char *param);
+int get_kvm_amd_param_integer(const char *param);
+
unsigned int kvm_check_cap(long cap);
static inline bool kvm_has_cap(long cap)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 17a978b8a2c4..878bdbc8e618 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -51,13 +51,13 @@ int open_kvm_dev_path_or_exit(void)
return _open_kvm_dev_path_or_exit(O_RDONLY);
}
-static bool get_module_param_bool(const char *module_name, const char *param)
+static ssize_t get_module_param(const char *module_name, const char *param,
+ void *buffer, size_t buffer_size)
{
const int path_size = 128;
char path[path_size];
- char value;
- ssize_t r;
- int fd;
+ ssize_t bytes_read;
+ int fd, r;
r = snprintf(path, path_size, "/sys/module/%s/parameters/%s",
module_name, param);
@@ -66,11 +66,46 @@ static bool get_module_param_bool(const char *module_name, const char *param)
fd = open_path_or_exit(path, O_RDONLY);
- r = read(fd, &value, 1);
- TEST_ASSERT(r == 1, "read(%s) failed", path);
+ bytes_read = read(fd, buffer, buffer_size);
+ TEST_ASSERT(bytes_read > 0, "read(%s) returned %ld, wanted %ld bytes",
+ path, bytes_read, buffer_size);
r = close(fd);
TEST_ASSERT(!r, "close(%s) failed", path);
+ return bytes_read;
+}
+
+static int get_module_param_integer(const char *module_name, const char *param)
+{
+ /*
+ * 16 bytes to hold a 64-bit value (1 byte per char), 1 byte for the
+ * NUL char, and 1 byte because the kernel sucks and inserts a newline
+ * at the end.
+ */
+ char value[16 + 1 + 1];
+ ssize_t r;
+
+ memset(value, '\0', sizeof(value));
+
+ r = get_module_param(module_name, param, value, sizeof(value));
+ TEST_ASSERT(value[r - 1] == '\n',
+ "Expected trailing newline, got char '%c'", value[r - 1]);
+
+ /*
+ * Squash the newline, otherwise atoi_paranoid() will complain about
+ * trailing non-NUL characters in the string.
+ */
+ value[r - 1] = '\0';
+ return atoi_paranoid(value);
+}
+
+static bool get_module_param_bool(const char *module_name, const char *param)
+{
+ char value;
+ ssize_t r;
+
+ r = get_module_param(module_name, param, &value, sizeof(value));
+ TEST_ASSERT_EQ(r, 1);
if (value == 'Y')
return true;
@@ -95,6 +130,21 @@ bool get_kvm_amd_param_bool(const char *param)
return get_module_param_bool("kvm_amd", param);
}
+int get_kvm_param_integer(const char *param)
+{
+ return get_module_param_integer("kvm", param);
+}
+
+int get_kvm_intel_param_integer(const char *param)
+{
+ return get_module_param_integer("kvm_intel", param);
+}
+
+int get_kvm_amd_param_integer(const char *param)
+{
+ return get_module_param_integer("kvm_amd", param);
+}
+
/*
* Capability
*
--
2.43.0.rc2.451.g8631bc7472-goog
From: Jinrong Liang <[email protected]>
Extend the fixed counters test to verify that supported counters can
actually be enabled in the control MSRs, that unsupported counters cannot,
and that enabled counters actually count.
Co-developed-by: Like Xu <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Jinrong Liang <[email protected]>
[sean: fold into the rd/wr access test, massage changelog]
Reviewed-by: Dapeng Mi <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 31 ++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index b07294af71a3..f5dedd112471 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -332,7 +332,6 @@ static void guest_rd_wr_counters(uint32_t base_msr, uint8_t nr_possible_counters
vector = wrmsr_safe(msr, 0);
GUEST_ASSERT_PMC_MSR_ACCESS(WRMSR, msr, expect_gp, vector);
}
- GUEST_DONE();
}
static void guest_test_gp_counters(void)
@@ -350,6 +349,7 @@ static void guest_test_gp_counters(void)
base_msr = MSR_IA32_PERFCTR0;
guest_rd_wr_counters(base_msr, MAX_NR_GP_COUNTERS, nr_gp_counters, 0);
+ GUEST_DONE();
}
static void test_gp_counters(uint8_t pmu_version, uint64_t perf_capabilities,
@@ -373,6 +373,7 @@ static void guest_test_fixed_counters(void)
{
uint64_t supported_bitmask = 0;
uint8_t nr_fixed_counters = 0;
+ uint8_t i;
/* Fixed counters require Architectural vPMU Version 2+. */
if (guest_get_pmu_version() >= 2)
@@ -387,6 +388,34 @@ static void guest_test_fixed_counters(void)
guest_rd_wr_counters(MSR_CORE_PERF_FIXED_CTR0, MAX_NR_FIXED_COUNTERS,
nr_fixed_counters, supported_bitmask);
+
+ for (i = 0; i < MAX_NR_FIXED_COUNTERS; i++) {
+ uint8_t vector;
+ uint64_t val;
+
+ if (i >= nr_fixed_counters && !(supported_bitmask & BIT_ULL(i))) {
+ vector = wrmsr_safe(MSR_CORE_PERF_FIXED_CTR_CTRL,
+ FIXED_PMC_CTRL(i, FIXED_PMC_KERNEL));
+ __GUEST_ASSERT(vector == GP_VECTOR,
+ "Expected #GP for counter %u in FIXED_CTR_CTRL", i);
+
+ vector = wrmsr_safe(MSR_CORE_PERF_GLOBAL_CTRL,
+ FIXED_PMC_GLOBAL_CTRL_ENABLE(i));
+ __GUEST_ASSERT(vector == GP_VECTOR,
+ "Expected #GP for counter %u in PERF_GLOBAL_CTRL", i);
+ continue;
+ }
+
+ wrmsr(MSR_CORE_PERF_FIXED_CTR0 + i, 0);
+ wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, FIXED_PMC_CTRL(i, FIXED_PMC_KERNEL));
+ wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, FIXED_PMC_GLOBAL_CTRL_ENABLE(i));
+ __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
+ wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+ val = rdmsr(MSR_CORE_PERF_FIXED_CTR0 + i);
+
+ GUEST_ASSERT_NE(val, 0);
+ }
+ GUEST_DONE();
}
static void test_fixed_counters(uint8_t pmu_version, uint64_t perf_capabilities,
--
2.43.0.rc2.451.g8631bc7472-goog
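For reference, a minimal sketch of what the FIXED_PMC_* helpers used above are
assumed to encode (the real definitions live in the selftests' pmu.h added
earlier in the series; the expansions below are illustrative, based on the
SDM layout of the fixed-counter controls):

	/* Illustrative: IA32_FIXED_CTR_CTRL has a 4-bit control field per fixed counter. */
	#define FIXED_PMC_KERNEL			BIT(0)		/* count in ring 0 */
	#define FIXED_PMC_CTRL(idx, val)		((val) << ((idx) * 4))
	/* Illustrative: fixed counters occupy bits 32+ of IA32_PERF_GLOBAL_CTRL. */
	#define FIXED_PMC_GLOBAL_CTRL_ENABLE(idx)	BIT_ULL(32 + (idx))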
On 12/2/2023 8:03 AM, Sean Christopherson wrote:
> Inject #GP on RDPMC if the "fast" flag is set for architectural Intel
> PMUs, i.e. if the PMU version is non-zero. Per Intel's SDM, and confirmed
> on bare metal, the "fast" flag is supported only for non-architectural
> PMUs, and is reserved for architectural PMUs.
>
> If the processor does not support architectural performance monitoring
> (CPUID.0AH:EAX[7:0]=0), ECX[30:0] specifies the index of the PMC to be
> read. Setting ECX[31] selects “fast” read mode if supported. In this mode,
> RDPMC returns bits 31:0 of the PMC in EAX while clearing EDX to zero.
>
> If the processor does support architectural performance monitoring
> (CPUID.0AH:EAX[7:0] ≠ 0), ECX[31:16] specifies type of PMC while ECX[15:0]
> specifies the index of the PMC to be read within that type. The following
> PMC types are currently defined:
> — General-purpose counters use type 0. The index x (to read IA32_PMCx)
> must be less than the value enumerated by CPUID.0AH.EAX[15:8] (thus
> ECX[15:8] must be zero).
> — Fixed-function counters use type 4000H. The index x (to read
> IA32_FIXED_CTRx) can be used if either CPUID.0AH.EDX[4:0] > x or
> CPUID.0AH.ECX[x] = 1 (thus ECX[15:5] must be 0).
> — Performance metrics use type 2000H. This type can be used only if
> IA32_PERF_CAPABILITIES.PERF_METRICS_AVAILABLE[bit 15]=1. For this type,
> the index in ECX[15:0] is implementation specific.
>
> Opportunistically WARN if KVM ever actually tries to complete RDPMC for a
> non-architectural PMU, and drop the non-existent "support" for fast RDPMC,
> as KVM doesn't support such PMUs, i.e. kvm_pmu_rdpmc() should reject the
> RDPMC before getting to the Intel code.
>
> Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
> Fixes: 67f4d4288c35 ("KVM: x86: rdpmc emulation checks the counter incorrectly")
> Cc: Dapeng Mi <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/pmu_intel.c | 22 ++++++++++++++++++----
> 1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 6903dd9b71ad..644de27bd48a 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -22,7 +22,6 @@
>
> /* Perf's "BASE" is wildly misleading, this is a single-bit flag, not a base. */
> #define INTEL_RDPMC_FIXED INTEL_PMC_FIXED_RDPMC_BASE
> -#define INTEL_RDPMC_FAST BIT(31)
>
> #define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0)
>
> @@ -67,10 +66,25 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
> struct kvm_pmc *counters;
> unsigned int num_counters;
>
> - if (idx & INTEL_RDPMC_FAST)
> - *mask &= GENMASK_ULL(31, 0);
> + /*
> + * The encoding of ECX for RDPMC is different for architectural versus
> + * non-architecturals PMUs (PMUs with version '0'). For architectural
> + * PMUs, bits 31:16 specify the PMC type and bits 15:0 specify the PMC
> + * index. For non-architectural PMUs, bit 31 is a "fast" flag, and
> + * bits 30:0 specify the PMC index.
> + *
> + * Yell and reject attempts to read PMCs for a non-architectural PMU,
> + * as KVM doesn't support such PMUs.
> + */
> + if (WARN_ON_ONCE(!pmu->version))
> + return NULL;
>
> - idx &= ~(INTEL_RDPMC_FIXED | INTEL_RDPMC_FAST);
> + /*
> + * Fixed PMCs are supported on all architectural PMUs. Note, KVM only
> + * emulates fixed PMCs for PMU v2+, but the flag itself is still valid,
> + * i.e. let RDPMC fail due to accessing a non-existent counter.
> + */
> + idx &= ~INTEL_RDPMC_FIXED;
> if (fixed) {
> counters = pmu->fixed_counters;
> num_counters = pmu->nr_arch_fixed_counters;
Reviewed-by: Dapeng Mi <[email protected]>
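For readers following along, a minimal guest-side sketch of the architectural
RDPMC encoding described in the SDM excerpt above (illustrative, not part of
the patch): the type goes in ECX[31:16] and the index in ECX[15:0], so reading
a fixed-function counter uses type 4000H:

	/* Illustrative: read fixed-function counter 'idx' via RDPMC (type 4000H). */
	static inline uint64_t rdpmc_fixed_counter(uint32_t idx)
	{
		uint32_t ecx = (0x4000u << 16) | (idx & 0xffff);
		uint32_t eax, edx;

		__asm__ __volatile__("rdpmc" : "=a"(eax), "=d"(edx) : "c"(ecx));
		return ((uint64_t)edx << 32) | eax;
	}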
On 12/2/2023 8:03 AM, Sean Christopherson wrote:
> Explicitly check for attempts to read unsupported PMC types instead of
> letting the bounds check fail. Functionally, letting the check fail is
> ok, but it's unnecessarily subtle and does a poor job of documenting the
> architectural behavior that KVM is emulating.
>
> Opportunistically add macros for the type vs. index to further document
> what is going on.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/pmu_intel.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 644de27bd48a..bd4f4bdf5419 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -23,6 +23,9 @@
> /* Perf's "BASE" is wildly misleading, this is a single-bit flag, not a base. */
> #define INTEL_RDPMC_FIXED INTEL_PMC_FIXED_RDPMC_BASE
>
> +#define INTEL_RDPMC_TYPE_MASK GENMASK(31, 16)
> +#define INTEL_RDPMC_INDEX_MASK GENMASK(15, 0)
> +
> #define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0)
>
> static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
> @@ -82,9 +85,13 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
> /*
> * Fixed PMCs are supported on all architectural PMUs. Note, KVM only
> * emulates fixed PMCs for PMU v2+, but the flag itself is still valid,
> - * i.e. let RDPMC fail due to accessing a non-existent counter.
> + * i.e. let RDPMC fail due to accessing a non-existent counter. Reject
> + * attempts to read all other types, which are unknown/unsupported.
> */
> - idx &= ~INTEL_RDPMC_FIXED;
> + if (idx & INTEL_RDPMC_TYPE_MASK & ~INTEL_RDPMC_FIXED)
> + return NULL;
> +
> + idx &= INTEL_RDPMC_INDEX_MASK;
> if (fixed) {
> counters = pmu->fixed_counters;
> num_counters = pmu->nr_arch_fixed_counters;
Reviewed-by: Dapeng Mi <[email protected]>
On Sun, Dec 10, 2023 at 10:26 PM Mi, Dapeng <[email protected]> wrote:
>
>
> On 12/2/2023 8:03 AM, Sean Christopherson wrote:
> > Explicitly check for attempts to read unsupported PMC types instead of
> > letting the bounds check fail. Functionally, letting the check fail is
> > ok, but it's unnecessarily subtle and does a poor job of documenting the
> > architectural behavior that KVM is emulating.
> >
> > Opportunistically add macros for the type vs. index to further document
> > what is going on.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/kvm/vmx/pmu_intel.c | 11 +++++++++--
> > 1 file changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > index 644de27bd48a..bd4f4bdf5419 100644
> > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > @@ -23,6 +23,9 @@
> > /* Perf's "BASE" is wildly misleading, this is a single-bit flag, not a base. */
> > #define INTEL_RDPMC_FIXED INTEL_PMC_FIXED_RDPMC_BASE
> >
> > +#define INTEL_RDPMC_TYPE_MASK GENMASK(31, 16)
> > +#define INTEL_RDPMC_INDEX_MASK GENMASK(15, 0)
> > +
> > #define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0)
> >
> > static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
> > @@ -82,9 +85,13 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
> > /*
> > * Fixed PMCs are supported on all architectural PMUs. Note, KVM only
> > * emulates fixed PMCs for PMU v2+, but the flag itself is still valid,
> > - * i.e. let RDPMC fail due to accessing a non-existent counter.
> > + * i.e. let RDPMC fail due to accessing a non-existent counter. Reject
> > + * attempts to read all other types, which are unknown/unsupported.
> > */
> > - idx &= ~INTEL_RDPMC_FIXED;
> > + if (idx & INTEL_RDPMC_TYPE_MASK & ~INTEL_RDPMC_FIXED)
You know how I hate to be pedantic (ROFL), but the SDM only says:
If the processor does support architectural performance monitoring
(CPUID.0AH:EAX[7:0] ≠ 0), ECX[31:16] specifies type of PMC while
ECX[15:0] specifies the index of the PMC to be read within that type.
It does not say that the types are bitwise-exclusive.
Yes, the types defined thus far are bitwise-exclusive, but who knows
what tomorrow may bring?
> > + return NULL;
> > +
> > + idx &= INTEL_RDPMC_INDEX_MASK;
> > if (fixed) {
> > counters = pmu->fixed_counters;
> > num_counters = pmu->nr_arch_fixed_counters;
> Reviewed-by: Dapeng Mi <[email protected]>
On Mon, Dec 11, 2023, Jim Mattson wrote:
> On Sun, Dec 10, 2023 at 10:26 PM Mi, Dapeng <[email protected]> wrote:
> >
> >
> > On 12/2/2023 8:03 AM, Sean Christopherson wrote:
> > > Explicitly check for attempts to read unsupported PMC types instead of
> > > letting the bounds check fail. Functionally, letting the check fail is
> > > ok, but it's unnecessarily subtle and does a poor job of documenting the
> > > architectural behavior that KVM is emulating.
> > >
> > > Opportunistically add macros for the type vs. index to further document
> > > what is going on.
> > >
> > > Signed-off-by: Sean Christopherson <[email protected]>
> > > ---
> > > arch/x86/kvm/vmx/pmu_intel.c | 11 +++++++++--
> > > 1 file changed, 9 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > > index 644de27bd48a..bd4f4bdf5419 100644
> > > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > > @@ -23,6 +23,9 @@
> > > /* Perf's "BASE" is wildly misleading, this is a single-bit flag, not a base. */
> > > #define INTEL_RDPMC_FIXED INTEL_PMC_FIXED_RDPMC_BASE
> > >
> > > +#define INTEL_RDPMC_TYPE_MASK GENMASK(31, 16)
> > > +#define INTEL_RDPMC_INDEX_MASK GENMASK(15, 0)
> > > +
> > > #define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0)
> > >
> > > static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
> > > @@ -82,9 +85,13 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
> > > /*
> > > * Fixed PMCs are supported on all architectural PMUs. Note, KVM only
> > > * emulates fixed PMCs for PMU v2+, but the flag itself is still valid,
> > > - * i.e. let RDPMC fail due to accessing a non-existent counter.
> > > + * i.e. let RDPMC fail due to accessing a non-existent counter. Reject
> > > + * attempts to read all other types, which are unknown/unsupported.
> > > */
> > > - idx &= ~INTEL_RDPMC_FIXED;
> > > + if (idx & INTEL_RDPMC_TYPE_MASK & ~INTEL_RDPMC_FIXED)
>
> You know how I hate to be pedantic (ROFL), but the SDM only says:
>
> If the processor does support architectural performance monitoring
> (CPUID.0AH:EAX[7:0] ≠ 0), ECX[31:16] specifies type of PMC while
> ECX[15:0] specifies the index of the PMC to be read within that type.
>
> It does not say that the types are bitwise-exclusive.
>
> Yes, the types defined thus far are bitwise-exclusive, but who knows
> what tomorrow may bring?
The goal isn't to make the types exclusive, the goal is to reject types that
aren't supported by KVM. The above accomplishes that, no? I don't see how KVM
could get a false negative or false positive, the above allows exactly FIXED and
"none" types. Or are you objecting to the comment?
On Mon, Dec 11, 2023 at 3:43 PM Sean Christopherson <[email protected]> wrote:
>
> On Mon, Dec 11, 2023, Jim Mattson wrote:
> > On Sun, Dec 10, 2023 at 10:26 PM Mi, Dapeng <[email protected]> wrote:
> > >
> > >
> > > On 12/2/2023 8:03 AM, Sean Christopherson wrote:
> > > > Explicitly check for attempts to read unsupported PMC types instead of
> > > > letting the bounds check fail. Functionally, letting the check fail is
> > > > ok, but it's unnecessarily subtle and does a poor job of documenting the
> > > > architectural behavior that KVM is emulating.
> > > >
> > > > Opportunistically add macros for the type vs. index to further document
> > > > what is going on.
> > > >
> > > > Signed-off-by: Sean Christopherson <[email protected]>
> > > > ---
> > > > arch/x86/kvm/vmx/pmu_intel.c | 11 +++++++++--
> > > > 1 file changed, 9 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > > > index 644de27bd48a..bd4f4bdf5419 100644
> > > > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > > > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > > > @@ -23,6 +23,9 @@
> > > > /* Perf's "BASE" is wildly misleading, this is a single-bit flag, not a base. */
> > > > #define INTEL_RDPMC_FIXED INTEL_PMC_FIXED_RDPMC_BASE
> > > >
> > > > +#define INTEL_RDPMC_TYPE_MASK GENMASK(31, 16)
> > > > +#define INTEL_RDPMC_INDEX_MASK GENMASK(15, 0)
> > > > +
> > > > #define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0)
> > > >
> > > > static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
> > > > @@ -82,9 +85,13 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
> > > > /*
> > > > * Fixed PMCs are supported on all architectural PMUs. Note, KVM only
> > > > * emulates fixed PMCs for PMU v2+, but the flag itself is still valid,
> > > > - * i.e. let RDPMC fail due to accessing a non-existent counter.
> > > > + * i.e. let RDPMC fail due to accessing a non-existent counter. Reject
> > > > + * attempts to read all other types, which are unknown/unsupported.
> > > > */
> > > > - idx &= ~INTEL_RDPMC_FIXED;
> > > > + if (idx & INTEL_RDPMC_TYPE_MASK & ~INTEL_RDPMC_FIXED)
> >
> > You know how I hate to be pedantic (ROFL), but the SDM only says:
> >
> > If the processor does support architectural performance monitoring
> > (CPUID.0AH:EAX[7:0] ≠ 0), ECX[31:16] specifies type of PMC while
> > ECX[15:0] specifies the index of the PMC to be read within that type.
> >
> > It does not say that the types are bitwise-exclusive.
> >
> > Yes, the types defined thus far are bitwise-exclusive, but who knows
> > what tomorrow may bring?
>
> The goal isn't to make the types exclusive, the goal is to reject types that
> aren't supported by KVM. The above accomplishes that, no? I don't see how KVM
> could get a false negative or false positive, the above allows exactly FIXED and
> "none" types. Or are you objecting to the comment?
You're right. The code is fine. My brain is not.
But what's wrong with something like:
type = idx & INTEL_RDPMC_TYPE_MASK;
if (type != INTEL_RDPMC_GP && type != INTEL_RDPMC_FIXED) ...
This makes it more clear what kvm accepts and what it doesn't accept,
regardless of the actual values of the macros.
On Mon, Dec 11, 2023, Jim Mattson wrote:
> On Mon, Dec 11, 2023 at 3:43 PM Sean Christopherson <[email protected]> wrote:
> > > > > @@ -82,9 +85,13 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
> > > > > /*
> > > > > * Fixed PMCs are supported on all architectural PMUs. Note, KVM only
> > > > > * emulates fixed PMCs for PMU v2+, but the flag itself is still valid,
> > > > > - * i.e. let RDPMC fail due to accessing a non-existent counter.
> > > > > + * i.e. let RDPMC fail due to accessing a non-existent counter. Reject
> > > > > + * attempts to read all other types, which are unknown/unsupported.
> > > > > */
> > > > > - idx &= ~INTEL_RDPMC_FIXED;
> > > > > + if (idx & INTEL_RDPMC_TYPE_MASK & ~INTEL_RDPMC_FIXED)
> > >
> > > You know how I hate to be pedantic (ROFL), but the SDM only says:
> > >
> > > If the processor does support architectural performance monitoring
> > > (CPUID.0AH:EAX[7:0] ≠ 0), ECX[31:16] specifies type of PMC while
> > > ECX[15:0] specifies the index of the PMC to be read within that type.
> > >
> > > It does not say that the types are bitwise-exclusive.
> > >
> > > Yes, the types defined thus far are bitwise-exclusive, but who knows
> > > what tomorrow may bring?
> >
> > The goal isn't to make the types exclusive, the goal is to reject types that
> > aren't supported by KVM. The above accomplishes that, no? I don't see how KVM
> > could get a false negative or false positive, the above allows exactly FIXED and
> > "none" types. Or are you objecting to the comment?
>
> You're right. The code is fine. My brain is not.
>
> But what's wrong with something like:
>
> type = idx & INTEL_RDPMC_TYPE_MASK;
> if (type != INTEL_RDPMC_GP && type != INTEL_RDPMC_FIXED) ...
>
> This makes it more clear what kvm accepts and what it doesn't accept,
> regardless of the actual values of the macros.
Because when I read the SDM, my reading was heavily colored by KVM's existing
implementation. And the SDM using 4000H and 2000H for the non-zero types doesn't
help (those scream "flags" to me). But rereading things, the SDM clearly states
they are explicit, distinct types. I'll massage this to have KVM treat them as
such.
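A minimal sketch of what treating the RDPMC types as distinct, enumerated
values (rather than flags) could look like, per the discussion above. This is
illustrative only; the actual rework is left to the next revision, and the
INTEL_RDPMC_GP name for type '0' is hypothetical, taken from Jim's suggestion:

	/* Illustrative: dispatch on the type field as a value, not a flag. */
	switch (idx & INTEL_RDPMC_TYPE_MASK) {
	case 0:					/* general-purpose counters (a named INTEL_RDPMC_GP would be clearer) */
		counters = pmu->gp_counters;
		num_counters = pmu->nr_arch_gp_counters;
		break;
	case INTEL_RDPMC_FIXED:			/* fixed-function counters */
		counters = pmu->fixed_counters;
		num_counters = pmu->nr_arch_fixed_counters;
		break;
	default:				/* unknown/unsupported type */
		return NULL;
	}
	idx &= INTEL_RDPMC_INDEX_MASK;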