2023-11-04 00:02:57

by Sean Christopherson

Subject: [PATCH v6 00/20] KVM: x86/pmu: selftests: Fixes and new tests

The series that just keeps on growing. This started out as a smallish
series from Jinrong to add a PMU counters test, but has now ballooned into
fixes and tests (that to some extent do actually validate the fixes).

Except for the first patch, the fixes aren't tagged for stable as I don't
*think* there's anything particularly nasty, and it's not like KVM's vPMU
is bulletproof even with the fixes.

v6:
- Test LLC references/misses with CLFLUSH{,OPT}. [Jim]
- Make the tests play nice without PERF_CAPABILITIES. [Mingwei]
- Don't squash eventsels that happen to match an unsupported arch event. [Kan]
- Test PMC counters with forced emulation (don't ask how long it took me to
figure out how to read integer module params).

v5: https://lore.kernel.org/all/[email protected]
v4: https://lore.kernel.org/all/[email protected]
v3: https://lore.kernel.org/kvm/[email protected]

Jinrong Liang (7):
KVM: selftests: Add vcpu_set_cpuid_property() to set properties
KVM: selftests: Add pmu.h and lib/pmu.c for common PMU assets
KVM: selftests: Test Intel PMU architectural events on gp counters
KVM: selftests: Test Intel PMU architectural events on fixed counters
KVM: selftests: Test consistency of CPUID with num of gp counters
KVM: selftests: Test consistency of CPUID with num of fixed counters
KVM: selftests: Add functional test for Intel's fixed PMU counters

Sean Christopherson (13):
KVM: x86/pmu: Don't allow exposing unsupported architectural events
KVM: x86/pmu: Don't enumerate support for fixed counters KVM can't
virtualize
KVM: x86/pmu: Don't enumerate arch events KVM doesn't support
KVM: x86/pmu: Always treat Fixed counters as available when supported
KVM: x86/pmu: Allow programming events that match unsupported arch
events
KVM: selftests: Drop the "name" param from KVM_X86_PMU_FEATURE()
KVM: selftests: Extend {kvm,this}_pmu_has() to support fixed counters
KVM: selftests: Expand PMU counters test to verify LLC events
KVM: selftests: Add a helper to query if the PMU module param is
enabled
KVM: selftests: Add helpers to read integer module params
KVM: selftests: Query module param to detect FEP in MSR filtering test
KVM: selftests: Move KVM_FEP macro into common library header
KVM: selftests: Test PMC virtualization with forced emulation

arch/x86/include/asm/kvm-x86-pmu-ops.h | 1 -
arch/x86/kvm/pmu.c | 1 -
arch/x86/kvm/pmu.h | 5 +-
arch/x86/kvm/svm/pmu.c | 6 -
arch/x86/kvm/vmx/pmu_intel.c | 67 ++-
tools/testing/selftests/kvm/Makefile | 2 +
.../selftests/kvm/include/kvm_util_base.h | 4 +
tools/testing/selftests/kvm/include/pmu.h | 84 +++
.../selftests/kvm/include/x86_64/processor.h | 80 ++-
tools/testing/selftests/kvm/lib/kvm_util.c | 62 +-
tools/testing/selftests/kvm/lib/pmu.c | 28 +
.../selftests/kvm/lib/x86_64/processor.c | 12 +-
.../selftests/kvm/x86_64/pmu_counters_test.c | 567 ++++++++++++++++++
.../kvm/x86_64/pmu_event_filter_test.c | 34 +-
.../smaller_maxphyaddr_emulation_test.c | 2 +-
.../kvm/x86_64/userspace_msr_exit_test.c | 29 +-
.../selftests/kvm/x86_64/vmx_pmu_caps_test.c | 2 +-
17 files changed, 877 insertions(+), 109 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/pmu.h
create mode 100644 tools/testing/selftests/kvm/lib/pmu.c
create mode 100644 tools/testing/selftests/kvm/x86_64/pmu_counters_test.c


base-commit: 45b890f7689eb0aba454fc5831d2d79763781677
--
2.42.0.869.gea05f2083d-goog


2023-11-04 00:03:05

by Sean Christopherson

Subject: [PATCH v6 04/20] KVM: x86/pmu: Always treat Fixed counters as available when supported

Now that KVM hides fixed counters that can't be virtualized, treat fixed
counters as available when they are supported, i.e. don't silently ignore
an enabled fixed counter just because guest CPUID says the associated
general purpose architectural event is unavailable.

KVM originally treated fixed counters as always available, but that got
changed as part of a fix to avoid confusing REF_CPU_CYCLES, which does NOT
map to an architectural event, with the actual architectural event
associated with bit 7, TOPDOWN_SLOTS.

The commit justified the change with:

If the event is marked as unavailable in the Intel guest CPUID
0AH.EBX leaf, we need to avoid any perf_event creation, whether
it's a gp or fixed counter.

but that justification doesn't mesh with reality. The Intel SDM uses
"architectural events" to refer to both general purpose events (the ones
with the reverse polarity mask in CPUID.0xA.EBX) and the events for fixed
counters, e.g. the SDM makes statements like:

Each of the fixed-function PMC can count only one architectural
performance event.

but the fact that fixed counter 2 (TSC reference cycles) doesn't have an
associated general purpose architectural event makes trying to apply the
mask from CPUID.0xA.EBX impossible. Furthermore, the SDM never explicitly
says that an architectural event that's marked unavailable in EBX affects
the fixed counters.

Note, at the time of the change, KVM didn't enforce hardware support, i.e.
didn't prevent userspace from enumerating support in guest CPUID.0xA.EBX
for architectural events that aren't supported in hardware. I.e. silently
dropping the fixed counter didn't somehow protect against counting the
wrong event, it just enforced guest CPUID.

Arguably, userspace is creating a bogus vCPU model by advertising a fixed
counter but saying the associated general purpose architectural event is
unavailable. But regardless of the validity of the vCPU model, letting
the guest enable a fixed counter and then not actually having it count
anything is completely nonsensical. I.e. even if all of the above is
wrong and it's illegal for a fixed counter to exist when the architectural
event is unavailable, silently doing nothing is still the wrong behavior
and KVM should instead disallow enabling the fixed counter in the first
place.

Fixes: a21864486f7e ("KVM: x86/pmu: Fix available_event_types check for REF_CPU_CYCLES event")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 8d545f84dc4a..b239e7dbdc9b 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -147,11 +147,24 @@ static bool intel_hw_event_available(struct kvm_pmc *pmc)
u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
int i;

+ /*
+ * Fixed counters are always available if KVM reaches this point. If a
+ * fixed counter is unsupported in hardware or guest CPUID, KVM doesn't
+ * allow the counter's corresponding MSR to be written. KVM does use
+ * architectural events to program fixed counters, as the interface to
+ * perf doesn't allow requesting a specific fixed counter, e.g. perf
+ * may (sadly) back a guest fixed PMC with a general purpose counter.
+ * But if _hardware_ doesn't support the associated event, KVM simply
+ * doesn't enumerate support for the fixed counter.
+ */
+ if (pmc_is_fixed(pmc))
+ return true;
+
BUILD_BUG_ON(ARRAY_SIZE(intel_arch_events) != NR_INTEL_ARCH_EVENTS);

/*
* Disallow events reported as unavailable in guest CPUID. Note, this
- * doesn't apply to pseudo-architectural events.
+ * doesn't apply to pseudo-architectural events (see above).
*/
for (i = 0; i < NR_REAL_INTEL_ARCH_EVENTS; i++) {
if (intel_arch_events[i].eventsel != event_select ||
--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:03:12

by Sean Christopherson

Subject: [PATCH v6 03/20] KVM: x86/pmu: Don't enumerate arch events KVM doesn't support

Don't advertise support to userspace for architectural events that KVM
doesn't support, i.e. for "real" events that aren't listed in
intel_pmu_architectural_events. On current hardware, this effectively
means "don't advertise support for Top Down Slots".

Mask off the associated "unavailable" bits, as said bits for undefined
events are reserved to zero. Arguably the events _are_ defined, but from
a KVM perspective they might as well not exist, and there's absolutely no
reason to leave useless unavailable bits set.

Fixes: a6c06ed1a60a ("KVM: Expose the architectural performance monitoring CPUID leaf")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 3316fdea212a..8d545f84dc4a 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -73,6 +73,15 @@ static void intel_init_pmu_capability(void)
int i;

/*
+ * Do not enumerate support for architectural events that KVM doesn't
+ * support. Clear the unsupported events' "unavailable" bits as well, as
+ * architecturally such bits are reserved to zero.
+ */
+ kvm_pmu_cap.events_mask_len = min(kvm_pmu_cap.events_mask_len,
+ NR_REAL_INTEL_ARCH_EVENTS);
+ kvm_pmu_cap.events_mask &= GENMASK(kvm_pmu_cap.events_mask_len - 1, 0);
+
+ /*
* Perf may (sadly) back a guest fixed counter with a general purpose
* counter, and so KVM must hide fixed counters whose associated
* architectural event are unsupported. On real hardware, this should
--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:03:14

by Sean Christopherson

Subject: [PATCH v6 07/20] KVM: selftests: Drop the "name" param from KVM_X86_PMU_FEATURE()

Drop the "name" parameter from KVM_X86_PMU_FEATURE(), it's unused and
the name is redundant with the macro, i.e. it's truly useless.

Signed-off-by: Sean Christopherson <[email protected]>
---
tools/testing/selftests/kvm/include/x86_64/processor.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index a01931f7d954..2d9771151dd9 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -289,7 +289,7 @@ struct kvm_x86_cpu_property {
struct kvm_x86_pmu_feature {
struct kvm_x86_cpu_feature anti_feature;
};
-#define KVM_X86_PMU_FEATURE(name, __bit) \
+#define KVM_X86_PMU_FEATURE(__bit) \
({ \
struct kvm_x86_pmu_feature feature = { \
.anti_feature = KVM_X86_CPU_FEATURE(0xa, 0, EBX, __bit), \
@@ -298,7 +298,7 @@ struct kvm_x86_pmu_feature {
feature; \
})

-#define X86_PMU_FEATURE_BRANCH_INSNS_RETIRED KVM_X86_PMU_FEATURE(BRANCH_INSNS_RETIRED, 5)
+#define X86_PMU_FEATURE_BRANCH_INSNS_RETIRED KVM_X86_PMU_FEATURE(5)

static inline unsigned int x86_family(unsigned int eax)
{
--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:03:16

by Sean Christopherson

Subject: [PATCH v6 02/20] KVM: x86/pmu: Don't enumerate support for fixed counters KVM can't virtualize

Hide fixed counters for which perf is incapable of creating the associated
architectural event. Except for the so-called pseudo-architectural event
for counting TSC reference cycles, KVM virtualizes fixed counters by
creating a perf event for the associated general purpose architectural
event. If the associated event isn't supported in hardware, KVM can't
actually virtualize the fixed counter because perf will likely not program
up the correct event.

Note, this issue is almost certainly limited to running KVM on a funky
virtual CPU model; no known real hardware has an asymmetric PMU where a
fixed counter is supported but the associated architectural event is not.

Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/pmu.h | 4 ++++
arch/x86/kvm/vmx/pmu_intel.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 35 insertions(+)

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 1d64113de488..5341e8f69a22 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -19,6 +19,7 @@
#define VMWARE_BACKDOOR_PMC_APPARENT_TIME 0x10002

struct kvm_pmu_ops {
+ void (*init_pmu_capability)(void);
bool (*hw_event_available)(struct kvm_pmc *pmc);
struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
@@ -218,6 +219,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
pmu_ops->MAX_NR_GP_COUNTERS);
kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed,
KVM_PMC_MAX_FIXED);
+
+ if (pmu_ops->init_pmu_capability)
+ pmu_ops->init_pmu_capability();
}

static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 1b13a472e3f2..3316fdea212a 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -68,6 +68,36 @@ static int fixed_pmc_events[] = {
[2] = PSEUDO_ARCH_REFERENCE_CYCLES,
};

+static void intel_init_pmu_capability(void)
+{
+ int i;
+
+ /*
+ * Perf may (sadly) back a guest fixed counter with a general purpose
+ * counter, and so KVM must hide fixed counters whose associated
+ * architectural event are unsupported. On real hardware, this should
+ * never happen, but if KVM is running on a funky virtual CPU model...
+ *
+ * TODO: Drop this horror if/when KVM stops using perf events for
+ * guest fixed counters, or can explicitly request fixed counters.
+ */
+ for (i = 0; i < kvm_pmu_cap.num_counters_fixed; i++) {
+ int event = fixed_pmc_events[i];
+
+ /*
+ * Ignore pseudo-architectural events, they're a bizarre way of
+ * requesting events from perf that _can't_ be backed with a
+ * general purpose architectural event, i.e. they're guaranteed
+ * to be backed by the real fixed counter.
+ */
+ if (event < NR_REAL_INTEL_ARCH_EVENTS &&
+ (kvm_pmu_cap.events_mask & BIT(event)))
+ break;
+ }
+
+ kvm_pmu_cap.num_counters_fixed = i;
+}
+
static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
{
struct kvm_pmc *pmc;
@@ -789,6 +819,7 @@ void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu)
}

struct kvm_pmu_ops intel_pmu_ops __initdata = {
+ .init_pmu_capability = intel_init_pmu_capability,
.hw_event_available = intel_hw_event_available,
.pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
.rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:03:20

by Sean Christopherson

Subject: [PATCH v6 06/20] KVM: selftests: Add vcpu_set_cpuid_property() to set properties

From: Jinrong Liang <[email protected]>

Add vcpu_set_cpuid_property() helper function for setting properties, and
use it instead of open coding an equivalent for MAX_PHY_ADDR. Future vPMU
testcases will also need to stuff various CPUID properties.
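
As a usage sketch (the property names below are ones used later in this
series; the values are arbitrary examples, not part of this patch):

	/* Hypothetical example: shrink the guest's vPMU before running it. */
	vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_VERSION, 2);
	vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_NR_GP_COUNTERS, 4);
	vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_NR_FIXED_COUNTERS, 3);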

Signed-off-by: Jinrong Liang <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
.../testing/selftests/kvm/include/x86_64/processor.h | 4 +++-
tools/testing/selftests/kvm/lib/x86_64/processor.c | 12 +++++++++---
.../kvm/x86_64/smaller_maxphyaddr_emulation_test.c | 2 +-
3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 25bc61dac5fb..a01931f7d954 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -994,7 +994,9 @@ static inline void vcpu_set_cpuid(struct kvm_vcpu *vcpu)
vcpu_ioctl(vcpu, KVM_GET_CPUID2, vcpu->cpuid);
}

-void vcpu_set_cpuid_maxphyaddr(struct kvm_vcpu *vcpu, uint8_t maxphyaddr);
+void vcpu_set_cpuid_property(struct kvm_vcpu *vcpu,
+ struct kvm_x86_cpu_property property,
+ uint32_t value);

void vcpu_clear_cpuid_entry(struct kvm_vcpu *vcpu, uint32_t function);
void vcpu_set_or_clear_cpuid_feature(struct kvm_vcpu *vcpu,
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index d8288374078e..9e717bc6bd6d 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -752,11 +752,17 @@ void vcpu_init_cpuid(struct kvm_vcpu *vcpu, const struct kvm_cpuid2 *cpuid)
vcpu_set_cpuid(vcpu);
}

-void vcpu_set_cpuid_maxphyaddr(struct kvm_vcpu *vcpu, uint8_t maxphyaddr)
+void vcpu_set_cpuid_property(struct kvm_vcpu *vcpu,
+ struct kvm_x86_cpu_property property,
+ uint32_t value)
{
- struct kvm_cpuid_entry2 *entry = vcpu_get_cpuid_entry(vcpu, 0x80000008);
+ struct kvm_cpuid_entry2 *entry;
+
+ entry = __vcpu_get_cpuid_entry(vcpu, property.function, property.index);
+
+ (&entry->eax)[property.reg] &= ~GENMASK(property.hi_bit, property.lo_bit);
+ (&entry->eax)[property.reg] |= value << (property.lo_bit);

- entry->eax = (entry->eax & ~0xff) | maxphyaddr;
vcpu_set_cpuid(vcpu);
}

diff --git a/tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c b/tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c
index 06edf00a97d6..9b89440dff19 100644
--- a/tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c
+++ b/tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c
@@ -63,7 +63,7 @@ int main(int argc, char *argv[])
vm_init_descriptor_tables(vm);
vcpu_init_descriptor_tables(vcpu);

- vcpu_set_cpuid_maxphyaddr(vcpu, MAXPHYADDR);
+ vcpu_set_cpuid_property(vcpu, X86_PROPERTY_MAX_PHY_ADDR, MAXPHYADDR);

rc = kvm_check_cap(KVM_CAP_EXIT_ON_EMULATION_FAILURE);
TEST_ASSERT(rc, "KVM_CAP_EXIT_ON_EMULATION_FAILURE is unavailable");
--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:03:26

by Sean Christopherson

Subject: [PATCH v6 08/20] KVM: selftests: Extend {kvm,this}_pmu_has() to support fixed counters

Extend the kvm_x86_pmu_feature framework to allow querying for fixed
counters via {kvm,this}_pmu_has(). Like architectural events, checking
for a fixed counter annoyingly requires checking multiple CPUID fields, as
a fixed counter exists if:

FxCtr[i]_is_supported := ECX[i] || (EDX[4:0] > i);

Note, KVM currently doesn't actually support exposing fixed counters via
the bitmask, but that will hopefully change sooner than later, and Intel's
SDM explicitly "recommends" checking both the number of counters and the
mask.
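
For reference, a raw-CPUID sketch of the check above (independent of the
selftests helpers, shown only to illustrate the ECX/EDX semantics):

	static bool fixed_counter_is_supported(unsigned int i)
	{
		uint32_t eax, ebx, ecx, edx;

		/* CPUID.0xA: ECX = fixed counter bitmask, EDX[4:0] = count. */
		__asm__ __volatile__("cpuid"
				     : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
				     : "a"(0xa), "c"(0));

		return (ecx & (1u << i)) || ((edx & 0x1f) > i);
	}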

Rename the intermediate "anti_feature" field to simply 'f' since the fixed
counter bitmask (thankfully) doesn't have reversed polarity like the
architectural events bitmask.

Note, ideally the helpers would use BUILD_BUG_ON() to assert on the
incoming register, but the expected usage in PMU tests can't guarantee the
inputs are compile-time constants.

Opportunistically define macros for all of the architectural events and
fixed counters that KVM currently supports.

Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/include/x86_64/processor.h | 63 +++++++++++++------
1 file changed, 45 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 2d9771151dd9..b103c462701b 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -281,24 +281,39 @@ struct kvm_x86_cpu_property {
* that indicates the feature is _not_ supported, and a property that states
* the length of the bit mask of unsupported features. A feature is supported
* if the size of the bit mask is larger than the "unavailable" bit, and said
- * bit is not set.
+ * bit is not set. Fixed counters also have bizarre enumeration, but inverted
+ * from arch events for general purpose counters. Fixed counters are supported
+ * if a feature flag is set **OR** the total number of fixed counters is
+ * greater than the index of the counter.
*
- * Wrap the "unavailable" feature to simplify checking whether or not a given
- * architectural event is supported.
+ * Wrap the events for general purpose and fixed counters to simplify checking
+ * whether or not a given architectural event is supported.
*/
struct kvm_x86_pmu_feature {
- struct kvm_x86_cpu_feature anti_feature;
+ struct kvm_x86_cpu_feature f;
};
-#define KVM_X86_PMU_FEATURE(__bit) \
-({ \
- struct kvm_x86_pmu_feature feature = { \
- .anti_feature = KVM_X86_CPU_FEATURE(0xa, 0, EBX, __bit), \
- }; \
- \
- feature; \
+#define KVM_X86_PMU_FEATURE(__reg, __bit) \
+({ \
+ struct kvm_x86_pmu_feature feature = { \
+ .f = KVM_X86_CPU_FEATURE(0xa, 0, __reg, __bit), \
+ }; \
+ \
+ kvm_static_assert(KVM_CPUID_##__reg == KVM_CPUID_EBX || \
+ KVM_CPUID_##__reg == KVM_CPUID_ECX); \
+ feature; \
})

-#define X86_PMU_FEATURE_BRANCH_INSNS_RETIRED KVM_X86_PMU_FEATURE(5)
+#define X86_PMU_FEATURE_CPU_CYCLES KVM_X86_PMU_FEATURE(EBX, 0)
+#define X86_PMU_FEATURE_INSNS_RETIRED KVM_X86_PMU_FEATURE(EBX, 1)
+#define X86_PMU_FEATURE_REFERENCE_CYCLES KVM_X86_PMU_FEATURE(EBX, 2)
+#define X86_PMU_FEATURE_LLC_REFERENCES KVM_X86_PMU_FEATURE(EBX, 3)
+#define X86_PMU_FEATURE_LLC_MISSES KVM_X86_PMU_FEATURE(EBX, 4)
+#define X86_PMU_FEATURE_BRANCH_INSNS_RETIRED KVM_X86_PMU_FEATURE(EBX, 5)
+#define X86_PMU_FEATURE_BRANCHES_MISPREDICTED KVM_X86_PMU_FEATURE(EBX, 6)
+
+#define X86_PMU_FEATURE_INSNS_RETIRED_FIXED KVM_X86_PMU_FEATURE(ECX, 0)
+#define X86_PMU_FEATURE_CPU_CYCLES_FIXED KVM_X86_PMU_FEATURE(ECX, 1)
+#define X86_PMU_FEATURE_REFERENCE_CYCLES_FIXED KVM_X86_PMU_FEATURE(ECX, 2)

static inline unsigned int x86_family(unsigned int eax)
{
@@ -697,10 +712,16 @@ static __always_inline bool this_cpu_has_p(struct kvm_x86_cpu_property property)

static inline bool this_pmu_has(struct kvm_x86_pmu_feature feature)
{
- uint32_t nr_bits = this_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
+ uint32_t nr_bits;

- return nr_bits > feature.anti_feature.bit &&
- !this_cpu_has(feature.anti_feature);
+ if (feature.f.reg == KVM_CPUID_EBX) {
+ nr_bits = this_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
+ return nr_bits > feature.f.bit && !this_cpu_has(feature.f);
+ }
+
+ GUEST_ASSERT(feature.f.reg == KVM_CPUID_ECX);
+ nr_bits = this_cpu_property(X86_PROPERTY_PMU_NR_FIXED_COUNTERS);
+ return nr_bits > feature.f.bit || this_cpu_has(feature.f);
}

static __always_inline uint64_t this_cpu_supported_xcr0(void)
@@ -916,10 +937,16 @@ static __always_inline bool kvm_cpu_has_p(struct kvm_x86_cpu_property property)

static inline bool kvm_pmu_has(struct kvm_x86_pmu_feature feature)
{
- uint32_t nr_bits = kvm_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
+ uint32_t nr_bits;

- return nr_bits > feature.anti_feature.bit &&
- !kvm_cpu_has(feature.anti_feature);
+ if (feature.f.reg == KVM_CPUID_EBX) {
+ nr_bits = kvm_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
+ return nr_bits > feature.f.bit && !kvm_cpu_has(feature.f);
+ }
+
+ TEST_ASSERT_EQ(feature.f.reg, KVM_CPUID_ECX);
+ nr_bits = kvm_cpu_property(X86_PROPERTY_PMU_NR_FIXED_COUNTERS);
+ return nr_bits > feature.f.bit || kvm_cpu_has(feature.f);
}

static __always_inline uint64_t kvm_cpu_supported_xcr0(void)
--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:03:34

by Sean Christopherson

Subject: [PATCH v6 09/20] KVM: selftests: Add pmu.h and lib/pmu.c for common PMU assets

From: Jinrong Liang <[email protected]>

By defining the PMU performance events and masks relevant for x86 in
the new pmu.h and pmu.c, it becomes easier to reference them, minimizing
potential errors in code that handles these values.

Clean up pmu_event_filter_test.c by including pmu.h and removing
unnecessary macros.
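
As a usage sketch (assuming the definitions added below), programming a GP
counter to count retired branches could look like:

	/* Sketch only: count retired branches in ring 0 on GP counter 0. */
	uint64_t eventsel = intel_pmu_arch_events[INTEL_ARCH_BRANCHES_RETIRED] |
			    ARCH_PERFMON_EVENTSEL_OS |
			    ARCH_PERFMON_EVENTSEL_ENABLE;

	wrmsr(MSR_P6_EVNTSEL0, eventsel);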

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Jinrong Liang <[email protected]>
[sean: drop PSEUDO_ARCH_REFERENCE_CYCLES]
Signed-off-by: Sean Christopherson <[email protected]>
---
tools/testing/selftests/kvm/Makefile | 1 +
tools/testing/selftests/kvm/include/pmu.h | 84 +++++++++++++++++++
tools/testing/selftests/kvm/lib/pmu.c | 28 +++++++
.../kvm/x86_64/pmu_event_filter_test.c | 32 ++-----
4 files changed, 122 insertions(+), 23 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/pmu.h
create mode 100644 tools/testing/selftests/kvm/lib/pmu.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index a5963ab9215b..44d8d022b023 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -32,6 +32,7 @@ LIBKVM += lib/guest_modes.c
LIBKVM += lib/io.c
LIBKVM += lib/kvm_util.c
LIBKVM += lib/memstress.c
+LIBKVM += lib/pmu.c
LIBKVM += lib/guest_sprintf.c
LIBKVM += lib/rbtree.c
LIBKVM += lib/sparsebit.c
diff --git a/tools/testing/selftests/kvm/include/pmu.h b/tools/testing/selftests/kvm/include/pmu.h
new file mode 100644
index 000000000000..987602c62b51
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/pmu.h
@@ -0,0 +1,84 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023, Tencent, Inc.
+ */
+#ifndef SELFTEST_KVM_PMU_H
+#define SELFTEST_KVM_PMU_H
+
+#include <stdint.h>
+
+#define X86_PMC_IDX_MAX 64
+#define INTEL_PMC_MAX_GENERIC 32
+#define KVM_PMU_EVENT_FILTER_MAX_EVENTS 300
+
+#define GP_COUNTER_NR_OFS_BIT 8
+#define EVENT_LENGTH_OFS_BIT 24
+
+#define PMU_VERSION_MASK GENMASK_ULL(7, 0)
+#define EVENT_LENGTH_MASK GENMASK_ULL(31, EVENT_LENGTH_OFS_BIT)
+#define GP_COUNTER_NR_MASK GENMASK_ULL(15, GP_COUNTER_NR_OFS_BIT)
+#define FIXED_COUNTER_NR_MASK GENMASK_ULL(4, 0)
+
+#define ARCH_PERFMON_EVENTSEL_EVENT GENMASK_ULL(7, 0)
+#define ARCH_PERFMON_EVENTSEL_UMASK GENMASK_ULL(15, 8)
+#define ARCH_PERFMON_EVENTSEL_USR BIT_ULL(16)
+#define ARCH_PERFMON_EVENTSEL_OS BIT_ULL(17)
+#define ARCH_PERFMON_EVENTSEL_EDGE BIT_ULL(18)
+#define ARCH_PERFMON_EVENTSEL_PIN_CONTROL BIT_ULL(19)
+#define ARCH_PERFMON_EVENTSEL_INT BIT_ULL(20)
+#define ARCH_PERFMON_EVENTSEL_ANY BIT_ULL(21)
+#define ARCH_PERFMON_EVENTSEL_ENABLE BIT_ULL(22)
+#define ARCH_PERFMON_EVENTSEL_INV BIT_ULL(23)
+#define ARCH_PERFMON_EVENTSEL_CMASK GENMASK_ULL(31, 24)
+
+#define PMC_MAX_FIXED 16
+#define PMC_IDX_FIXED 32
+
+/* RDPMC offset for Fixed PMCs */
+#define PMC_FIXED_RDPMC_BASE BIT_ULL(30)
+#define PMC_FIXED_RDPMC_METRICS BIT_ULL(29)
+
+#define FIXED_BITS_MASK 0xFULL
+#define FIXED_BITS_STRIDE 4
+#define FIXED_0_KERNEL BIT_ULL(0)
+#define FIXED_0_USER BIT_ULL(1)
+#define FIXED_0_ANYTHREAD BIT_ULL(2)
+#define FIXED_0_ENABLE_PMI BIT_ULL(3)
+
+#define fixed_bits_by_idx(_idx, _bits) \
+ ((_bits) << ((_idx) * FIXED_BITS_STRIDE))
+
+#define AMD64_NR_COUNTERS 4
+#define AMD64_NR_COUNTERS_CORE 6
+
+#define PMU_CAP_FW_WRITES BIT_ULL(13)
+#define PMU_CAP_LBR_FMT 0x3f
+
+enum intel_pmu_architectural_events {
+ /*
+ * The order of the architectural events matters as support for each
+ * event is enumerated via CPUID using the index of the event.
+ */
+ INTEL_ARCH_CPU_CYCLES,
+ INTEL_ARCH_INSTRUCTIONS_RETIRED,
+ INTEL_ARCH_REFERENCE_CYCLES,
+ INTEL_ARCH_LLC_REFERENCES,
+ INTEL_ARCH_LLC_MISSES,
+ INTEL_ARCH_BRANCHES_RETIRED,
+ INTEL_ARCH_BRANCHES_MISPREDICTED,
+ NR_INTEL_ARCH_EVENTS,
+};
+
+enum amd_pmu_k7_events {
+ AMD_ZEN_CORE_CYCLES,
+ AMD_ZEN_INSTRUCTIONS,
+ AMD_ZEN_BRANCHES,
+ AMD_ZEN_BRANCH_MISSES,
+ NR_AMD_ARCH_EVENTS,
+};
+
+extern const uint64_t intel_pmu_arch_events[];
+extern const uint64_t amd_pmu_arch_events[];
+extern const int intel_pmu_fixed_pmc_events[];
+
+#endif /* SELFTEST_KVM_PMU_H */
diff --git a/tools/testing/selftests/kvm/lib/pmu.c b/tools/testing/selftests/kvm/lib/pmu.c
new file mode 100644
index 000000000000..27a6c35f98a1
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/pmu.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023, Tencent, Inc.
+ */
+
+#include <stdint.h>
+
+#include "pmu.h"
+
+/* Definitions for Architectural Performance Events */
+#define ARCH_EVENT(select, umask) (((select) & 0xff) | ((umask) & 0xff) << 8)
+
+const uint64_t intel_pmu_arch_events[] = {
+ [INTEL_ARCH_CPU_CYCLES] = ARCH_EVENT(0x3c, 0x0),
+ [INTEL_ARCH_INSTRUCTIONS_RETIRED] = ARCH_EVENT(0xc0, 0x0),
+ [INTEL_ARCH_REFERENCE_CYCLES] = ARCH_EVENT(0x3c, 0x1),
+ [INTEL_ARCH_LLC_REFERENCES] = ARCH_EVENT(0x2e, 0x4f),
+ [INTEL_ARCH_LLC_MISSES] = ARCH_EVENT(0x2e, 0x41),
+ [INTEL_ARCH_BRANCHES_RETIRED] = ARCH_EVENT(0xc4, 0x0),
+ [INTEL_ARCH_BRANCHES_MISPREDICTED] = ARCH_EVENT(0xc5, 0x0),
+};
+
+const uint64_t amd_pmu_arch_events[] = {
+ [AMD_ZEN_CORE_CYCLES] = ARCH_EVENT(0x76, 0x00),
+ [AMD_ZEN_INSTRUCTIONS] = ARCH_EVENT(0xc0, 0x00),
+ [AMD_ZEN_BRANCHES] = ARCH_EVENT(0xc2, 0x00),
+ [AMD_ZEN_BRANCH_MISSES] = ARCH_EVENT(0xc3, 0x00),
+};
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
index 283cc55597a4..b6e4f57a8651 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
@@ -11,31 +11,18 @@
*/

#define _GNU_SOURCE /* for program_invocation_short_name */
-#include "test_util.h"
+
#include "kvm_util.h"
+#include "pmu.h"
#include "processor.h"
-
-/*
- * In lieu of copying perf_event.h into tools...
- */
-#define ARCH_PERFMON_EVENTSEL_OS (1ULL << 17)
-#define ARCH_PERFMON_EVENTSEL_ENABLE (1ULL << 22)
-
-/* End of stuff taken from perf_event.h. */
-
-/* Oddly, this isn't in perf_event.h. */
-#define ARCH_PERFMON_BRANCHES_RETIRED 5
+#include "test_util.h"

#define NUM_BRANCHES 42
-#define INTEL_PMC_IDX_FIXED 32
-
-/* Matches KVM_PMU_EVENT_FILTER_MAX_EVENTS in pmu.c */
-#define MAX_FILTER_EVENTS 300
#define MAX_TEST_EVENTS 10

#define PMU_EVENT_FILTER_INVALID_ACTION (KVM_PMU_EVENT_DENY + 1)
#define PMU_EVENT_FILTER_INVALID_FLAGS (KVM_PMU_EVENT_FLAGS_VALID_MASK << 1)
-#define PMU_EVENT_FILTER_INVALID_NEVENTS (MAX_FILTER_EVENTS + 1)
+#define PMU_EVENT_FILTER_INVALID_NEVENTS (KVM_PMU_EVENT_FILTER_MAX_EVENTS + 1)

/*
* This is how the event selector and unit mask are stored in an AMD
@@ -63,7 +50,6 @@

#define AMD_ZEN_BR_RETIRED EVENT(0xc2, 0)

-
/*
* "Retired instructions", from Processor Programming Reference
* (PPR) for AMD Family 17h Model 01h, Revision B1 Processors,
@@ -84,7 +70,7 @@ struct __kvm_pmu_event_filter {
__u32 fixed_counter_bitmap;
__u32 flags;
__u32 pad[4];
- __u64 events[MAX_FILTER_EVENTS];
+ __u64 events[KVM_PMU_EVENT_FILTER_MAX_EVENTS];
};

/*
@@ -729,14 +715,14 @@ static void add_dummy_events(uint64_t *events, int nevents)

static void test_masked_events(struct kvm_vcpu *vcpu)
{
- int nevents = MAX_FILTER_EVENTS - MAX_TEST_EVENTS;
- uint64_t events[MAX_FILTER_EVENTS];
+ int nevents = KVM_PMU_EVENT_FILTER_MAX_EVENTS - MAX_TEST_EVENTS;
+ uint64_t events[KVM_PMU_EVENT_FILTER_MAX_EVENTS];

/* Run the test cases against a sparse PMU event filter. */
run_masked_events_tests(vcpu, events, 0);

/* Run the test cases against a dense PMU event filter. */
- add_dummy_events(events, MAX_FILTER_EVENTS);
+ add_dummy_events(events, KVM_PMU_EVENT_FILTER_MAX_EVENTS);
run_masked_events_tests(vcpu, events, nevents);
}

@@ -818,7 +804,7 @@ static void intel_run_fixed_counter_guest_code(uint8_t fixed_ctr_idx)
/* Only OS_EN bit is enabled for fixed counter[idx]. */
wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, BIT_ULL(4 * fixed_ctr_idx));
wrmsr(MSR_CORE_PERF_GLOBAL_CTRL,
- BIT_ULL(INTEL_PMC_IDX_FIXED + fixed_ctr_idx));
+ BIT_ULL(PMC_IDX_FIXED + fixed_ctr_idx));
__asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);

--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:03:41

by Sean Christopherson

Subject: [PATCH v6 12/20] KVM: selftests: Test consistency of CPUID with num of gp counters

From: Jinrong Liang <[email protected]>

Add a test to verify that KVM correctly emulates MSR-based accesses to
general purpose counters based on guest CPUID, e.g. that accesses to
non-existent counters #GP and accesses to existent counters succeed.

Note, for compatibility reasons, KVM does not emulate #GP when
MSR_P6_PERFCTR[0|1] is not present (writes should be dropped).
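
Concretely, when MSR_P6_PERFCTR0 doesn't correspond to an existing counter,
the expected guest-visible behavior is (sketch, using the selftests' *_safe
MSR helpers):

	uint64_t val;

	/* The write is silently dropped rather than faulting... */
	GUEST_ASSERT(!wrmsr_safe(MSR_P6_PERFCTR0, 0xffff));

	/* ...and the counter reads back as '0'. */
	GUEST_ASSERT(!rdmsr_safe(MSR_P6_PERFCTR0, &val));
	GUEST_ASSERT_EQ(val, 0);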

Co-developed-by: Like Xu <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Jinrong Liang <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 91 +++++++++++++++++++
1 file changed, 91 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index 4d3a5c94b8ba..232b9a80a9db 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -270,9 +270,95 @@ static void test_arch_events(uint8_t pmu_version, uint64_t perf_capabilities,
kvm_vm_free(vm);
}

+/*
+ * Limit testing to MSRs that are actually defined by Intel (in the SDM). MSRs
+ * that aren't defined counter MSRs *probably* don't exist, but there's no
+ * guarantee that currently undefined MSR indices won't be used for something
+ * other than PMCs in the future.
+ */
+#define MAX_NR_GP_COUNTERS 8
+#define MAX_NR_FIXED_COUNTERS 3
+
+#define GUEST_ASSERT_PMC_MSR_ACCESS(insn, msr, expect_gp, vector) \
+__GUEST_ASSERT(expect_gp ? vector == GP_VECTOR : !vector, \
+ "Expected %s on " #insn "(0x%x), got vector %u", \
+ expect_gp ? "#GP" : "no fault", msr, vector) \
+
+static void guest_rd_wr_counters(uint32_t base_msr, uint8_t nr_possible_counters,
+ uint8_t nr_counters)
+{
+ uint8_t i;
+
+ for (i = 0; i < nr_possible_counters; i++) {
+ const uint32_t msr = base_msr + i;
+ const bool expect_success = i < nr_counters;
+
+ /*
+ * KVM drops writes to MSR_P6_PERFCTR[0|1] if the counters are
+ * unsupported, i.e. doesn't #GP and reads back '0'.
+ */
+ const uint64_t expected_val = expect_success ? 0xffff : 0;
+ const bool expect_gp = !expect_success && msr != MSR_P6_PERFCTR0 &&
+ msr != MSR_P6_PERFCTR1;
+ uint8_t vector;
+ uint64_t val;
+
+ vector = wrmsr_safe(msr, 0xffff);
+ GUEST_ASSERT_PMC_MSR_ACCESS(WRMSR, msr, expect_gp, vector);
+
+ vector = rdmsr_safe(msr, &val);
+ GUEST_ASSERT_PMC_MSR_ACCESS(RDMSR, msr, expect_gp, vector);
+
+ /* On #GP, the result of RDMSR is undefined. */
+ if (!expect_gp)
+ __GUEST_ASSERT(val == expected_val,
+ "Expected RDMSR(0x%x) to yield 0x%lx, got 0x%lx",
+ msr, expected_val, val);
+
+ vector = wrmsr_safe(msr, 0);
+ GUEST_ASSERT_PMC_MSR_ACCESS(WRMSR, msr, expect_gp, vector);
+ }
+ GUEST_DONE();
+}
+
+static void guest_test_gp_counters(void)
+{
+ uint8_t nr_gp_counters = 0;
+ uint32_t base_msr;
+
+ if (guest_get_pmu_version())
+ nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
+
+ if (this_cpu_has(X86_FEATURE_PDCM) &&
+ rdmsr(MSR_IA32_PERF_CAPABILITIES) & PMU_CAP_FW_WRITES)
+ base_msr = MSR_IA32_PMC0;
+ else
+ base_msr = MSR_IA32_PERFCTR0;
+
+ guest_rd_wr_counters(base_msr, MAX_NR_GP_COUNTERS, nr_gp_counters);
+}
+
+static void test_gp_counters(uint8_t pmu_version, uint64_t perf_capabilities,
+ uint8_t nr_gp_counters)
+{
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+
+ vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_test_gp_counters,
+ pmu_version, perf_capabilities);
+
+ vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_NR_GP_COUNTERS,
+ nr_gp_counters);
+
+ run_vcpu(vcpu);
+
+ kvm_vm_free(vm);
+}
+
static void test_intel_counters(void)
{
uint8_t nr_arch_events = kvm_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
+ uint8_t nr_gp_counters = kvm_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
uint8_t pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
unsigned int i;
uint8_t v, j;
@@ -337,6 +423,11 @@ static void test_intel_counters(void)
for (k = 0; k < nr_arch_events; k++)
test_arch_events(v, perf_caps[i], j, BIT(k));
}
+
+ pr_info("Testing GP counters, PMU version %u, perf_caps = %lx\n",
+ v, perf_caps[i]);
+ for (j = 0; j <= nr_gp_counters; j++)
+ test_gp_counters(v, perf_caps[i], j);
}
}
}
--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:03:42

by Sean Christopherson

Subject: [PATCH v6 10/20] KVM: selftests: Test Intel PMU architectural events on gp counters

From: Jinrong Liang <[email protected]>

Add test cases to verify that Intel's Architectural PMU events work as
expected when they are (un)available according to guest CPUID. Iterate
over a range of sane PMU versions, with and without full-width writes
enabled, and over interesting combinations of lengths/masks for the bit
vector that enumerates unavailable events.

Test up to vPMU version 5, i.e. the current architectural max. KVM only
officially supports up to version 2, but the behavior of the counters is
backwards compatible, i.e. KVM shouldn't do something completely different
for a higher, architecturally-defined vPMU version. Verify KVM behavior
against the effective vPMU version, e.g. advertising vPMU 5 when KVM only
supports vPMU 2 shouldn't magically unlock vPMU 5 features.

According to the Intel SDM, the number of architectural events is reported
through CPUID.0AH:EAX[31:24] and the architectural event x is supported
if EBX[x]=0 && EAX[31:24]>x. Note, KVM's ABI is that unavailable events
do not count, even though strictly speaking that's not required by the
SDM (the behavior is effectively undefined).
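
Expressed with the selftests' property helpers (which the test below uses),
the availability check for a given event index 'x' is roughly:

	/*
	 * Sketch: event 'x' is available iff the bit vector covers it AND its
	 * "unavailable" bit in CPUID.0xA.EBX is clear.
	 */
	bool available =
		this_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH) > x &&
		!(this_cpu_property(X86_PROPERTY_PMU_EVENTS_MASK) & BIT(x));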

Handcode the entirety of the measured section so that the test can
precisely assert on the number of instructions and branches retired.

Co-developed-by: Like Xu <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Jinrong Liang <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/x86_64/pmu_counters_test.c | 321 ++++++++++++++++++
2 files changed, 322 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86_64/pmu_counters_test.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 44d8d022b023..09f5d6fe84de 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -91,6 +91,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
TEST_GEN_PROGS_x86_64 += x86_64/monitor_mwait_test
TEST_GEN_PROGS_x86_64 += x86_64/nested_exceptions_test
TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test
+TEST_GEN_PROGS_x86_64 += x86_64/pmu_counters_test
TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id
TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
new file mode 100644
index 000000000000..dd9a7864410c
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -0,0 +1,321 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023, Tencent, Inc.
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <x86intrin.h>
+
+#include "pmu.h"
+#include "processor.h"
+
+/* Number of LOOP instructions for the guest measurement payload. */
+#define NUM_BRANCHES 10
+/*
+ * Number of "extra" instructions that will be counted, i.e. the number of
+ * instructions that are needed to set up the loop and then disable the
+ * counter. 2 MOV, 2 XOR, 1 WRMSR.
+ */
+#define NUM_EXTRA_INSNS 5
+#define NUM_INSNS_RETIRED (NUM_BRANCHES + NUM_EXTRA_INSNS)
+
+static uint8_t kvm_pmu_version;
+static bool kvm_has_perf_caps;
+
+static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
+ void *guest_code,
+ uint8_t pmu_version,
+ uint64_t perf_capabilities)
+{
+ struct kvm_vm *vm;
+
+ vm = vm_create_with_one_vcpu(vcpu, guest_code);
+ vm_init_descriptor_tables(vm);
+ vcpu_init_descriptor_tables(*vcpu);
+
+ sync_global_to_guest(vm, kvm_pmu_version);
+
+ /*
+ * Set PERF_CAPABILITIES before PMU version as KVM disallows enabling
+ * features via PERF_CAPABILITIES if the guest doesn't have a vPMU.
+ */
+ if (kvm_has_perf_caps)
+ vcpu_set_msr(*vcpu, MSR_IA32_PERF_CAPABILITIES, perf_capabilities);
+
+ vcpu_set_cpuid_property(*vcpu, X86_PROPERTY_PMU_VERSION, pmu_version);
+ return vm;
+}
+
+static void run_vcpu(struct kvm_vcpu *vcpu)
+{
+ struct ucall uc;
+
+ do {
+ vcpu_run(vcpu);
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_SYNC:
+ break;
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ break;
+ case UCALL_PRINTF:
+ pr_info("%s", uc.buffer);
+ break;
+ case UCALL_DONE:
+ break;
+ default:
+ TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
+ }
+ } while (uc.cmd != UCALL_DONE);
+}
+
+static uint8_t guest_get_pmu_version(void)
+{
+ /*
+ * Return the effective PMU version, i.e. the minimum between what KVM
+ * supports and what is enumerated to the guest. The host deliberately
+ * advertises a PMU version to the guest beyond what is actually
+ * supported by KVM to verify KVM doesn't freak out and do something
+ * bizarre with an architecturally valid, but unsupported, version.
+ */
+ return min_t(uint8_t, kvm_pmu_version, this_cpu_property(X86_PROPERTY_PMU_VERSION));
+}
+
+/*
+ * If an architectural event is supported and guaranteed to generate at least
+ * one "hit, assert that its count is non-zero. If an event isn't supported or
+ * the test can't guarantee the associated action will occur, then all bets are
+ * off regarding the count, i.e. no checks can be done.
+ *
+ * Sanity check that in all cases, the event doesn't count when it's disabled,
+ * and that KVM correctly emulates the write of an arbitrary value.
+ */
+static void guest_assert_event_count(uint8_t idx,
+ struct kvm_x86_pmu_feature event,
+ uint32_t pmc, uint32_t pmc_msr)
+{
+ uint64_t count;
+
+ count = _rdpmc(pmc);
+ if (!this_pmu_has(event))
+ goto sanity_checks;
+
+ switch (idx) {
+ case INTEL_ARCH_INSTRUCTIONS_RETIRED:
+ GUEST_ASSERT_EQ(count, NUM_INSNS_RETIRED);
+ break;
+ case INTEL_ARCH_BRANCHES_RETIRED:
+ GUEST_ASSERT_EQ(count, NUM_BRANCHES);
+ break;
+ case INTEL_ARCH_CPU_CYCLES:
+ case INTEL_ARCH_REFERENCE_CYCLES:
+ GUEST_ASSERT_NE(count, 0);
+ break;
+ default:
+ break;
+ }
+
+sanity_checks:
+ __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
+ GUEST_ASSERT_EQ(_rdpmc(pmc), count);
+
+ wrmsr(pmc_msr, 0xdead);
+ GUEST_ASSERT_EQ(_rdpmc(pmc), 0xdead);
+}
+
+static void __guest_test_arch_event(uint8_t idx, struct kvm_x86_pmu_feature event,
+ uint32_t pmc, uint32_t pmc_msr,
+ uint32_t ctrl_msr, uint64_t ctrl_msr_value)
+{
+ wrmsr(pmc_msr, 0);
+
+ /*
+ * Enable and disable the PMC in a monolithic asm blob to ensure that
+ * the compiler can't insert _any_ code into the measured sequence.
+ * Note, ECX doesn't need to be clobbered as the input value, @pmc_msr,
+ * is restored before the end of the sequence.
+ */
+ __asm__ __volatile__("wrmsr\n\t"
+ "mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t"
+ "loop .\n\t"
+ "mov %%edi, %%ecx\n\t"
+ "xor %%eax, %%eax\n\t"
+ "xor %%edx, %%edx\n\t"
+ "wrmsr\n\t"
+ :: "a"((uint32_t)ctrl_msr_value),
+ "d"(ctrl_msr_value >> 32),
+ "c"(ctrl_msr), "D"(ctrl_msr)
+ );
+
+ guest_assert_event_count(idx, event, pmc, pmc_msr);
+}
+
+static void guest_test_arch_event(uint8_t idx)
+{
+ const struct {
+ struct kvm_x86_pmu_feature gp_event;
+ } intel_event_to_feature[] = {
+ [INTEL_ARCH_CPU_CYCLES] = { X86_PMU_FEATURE_CPU_CYCLES },
+ [INTEL_ARCH_INSTRUCTIONS_RETIRED] = { X86_PMU_FEATURE_INSNS_RETIRED },
+ [INTEL_ARCH_REFERENCE_CYCLES] = { X86_PMU_FEATURE_REFERENCE_CYCLES },
+ [INTEL_ARCH_LLC_REFERENCES] = { X86_PMU_FEATURE_LLC_REFERENCES },
+ [INTEL_ARCH_LLC_MISSES] = { X86_PMU_FEATURE_LLC_MISSES },
+ [INTEL_ARCH_BRANCHES_RETIRED] = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED },
+ [INTEL_ARCH_BRANCHES_MISPREDICTED] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED },
+ };
+
+ uint32_t nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
+ uint32_t pmu_version = guest_get_pmu_version();
+ /* PERF_GLOBAL_CTRL exists only for Architectural PMU Version 2+. */
+ bool guest_has_perf_global_ctrl = pmu_version >= 2;
+ struct kvm_x86_pmu_feature gp_event;
+ uint32_t base_pmc_msr;
+ unsigned int i;
+
+ /* The host side shouldn't invoke this without a guest PMU. */
+ GUEST_ASSERT(pmu_version);
+
+ if (this_cpu_has(X86_FEATURE_PDCM) &&
+ rdmsr(MSR_IA32_PERF_CAPABILITIES) & PMU_CAP_FW_WRITES)
+ base_pmc_msr = MSR_IA32_PMC0;
+ else
+ base_pmc_msr = MSR_IA32_PERFCTR0;
+
+ gp_event = intel_event_to_feature[idx].gp_event;
+ GUEST_ASSERT_EQ(idx, gp_event.f.bit);
+
+ GUEST_ASSERT(nr_gp_counters);
+
+ for (i = 0; i < nr_gp_counters; i++) {
+ uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
+ ARCH_PERFMON_EVENTSEL_ENABLE |
+ intel_pmu_arch_events[idx];
+
+ wrmsr(MSR_P6_EVNTSEL0 + i, 0);
+ if (guest_has_perf_global_ctrl)
+ wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, BIT_ULL(i));
+
+ __guest_test_arch_event(idx, gp_event, i, base_pmc_msr + i,
+ MSR_P6_EVNTSEL0 + i, eventsel);
+ }
+}
+
+static void guest_test_arch_events(void)
+{
+ uint8_t i;
+
+ for (i = 0; i < NR_INTEL_ARCH_EVENTS; i++)
+ guest_test_arch_event(i);
+
+ GUEST_DONE();
+}
+
+static void test_arch_events(uint8_t pmu_version, uint64_t perf_capabilities,
+ uint8_t length, uint32_t unavailable_mask)
+{
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+
+ /* Testing arch events requires a vPMU (there are no negative tests). */
+ if (!pmu_version)
+ return;
+
+ vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_test_arch_events,
+ pmu_version, perf_capabilities);
+
+ vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH,
+ length);
+ vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EVENTS_MASK,
+ unavailable_mask);
+
+ run_vcpu(vcpu);
+
+ kvm_vm_free(vm);
+}
+
+static void test_intel_counters(void)
+{
+ uint8_t nr_arch_events = kvm_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
+ uint8_t pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
+ unsigned int i;
+ uint8_t v, j;
+ uint32_t k;
+
+ const uint64_t perf_caps[] = {
+ 0,
+ PMU_CAP_FW_WRITES,
+ };
+
+ /*
+ * Test up to PMU v5, which is the current maximum version defined by
+ * Intel, i.e. is the last version that is guaranteed to be backwards
+ * compatible with KVM's existing behavior.
+ */
+ uint8_t max_pmu_version = max_t(typeof(pmu_version), pmu_version, 5);
+
+ /*
+ * Verify that KVM is sanitizing the architectural events, i.e. hiding
+ * events that KVM doesn't support. This will fail any time KVM adds
+ * support for a new event, but it's worth paying that price to be able
+ * to detect KVM bugs.
+ */
+ TEST_ASSERT(nr_arch_events <= NR_INTEL_ARCH_EVENTS,
+ "KVM is either buggy, or has learned new tricks (length = %u, mask = %x)",
+ nr_arch_events, kvm_cpu_property(X86_PROPERTY_PMU_EVENTS_MASK));
+
+ /*
+ * Force iterating over known arch events regardless of whether or not
+ * KVM/hardware supports a given event.
+ */
+ nr_arch_events = max_t(typeof(nr_arch_events), nr_arch_events, NR_INTEL_ARCH_EVENTS);
+
+ for (v = 0; v <= max_pmu_version; v++) {
+ for (i = 0; i < ARRAY_SIZE(perf_caps); i++) {
+ if (!kvm_has_perf_caps && perf_caps[i])
+ continue;
+
+ pr_info("Testing arch events, PMU version %u, perf_caps = %lx\n",
+ v, perf_caps[i]);
+ /*
+ * To keep the total runtime reasonable, test every
+ * possible non-zero, non-reserved bitmap combination
+ * only with the native PMU version and the full bit
+ * vector length.
+ */
+ if (v == pmu_version) {
+ for (k = 1; k < (BIT(nr_arch_events) - 1); k++)
+ test_arch_events(v, perf_caps[i], nr_arch_events, k);
+ }
+ /*
+ * Test single bits for all PMU versions and lengths up
+ * to the number of events +1 (to verify KVM doesn't do
+ * weird things if the guest length is greater than the
+ * host length). Explicitly test a mask of '0' and all
+ * ones, i.e. all events being available and unavailable.
+ */
+ for (j = 0; j <= nr_arch_events + 1; j++) {
+ test_arch_events(v, perf_caps[i], j, 0);
+ test_arch_events(v, perf_caps[i], j, -1u);
+
+ for (k = 0; k < nr_arch_events; k++)
+ test_arch_events(v, perf_caps[i], j, BIT(k));
+ }
+ }
+ }
+}
+
+int main(int argc, char *argv[])
+{
+ TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
+
+ TEST_REQUIRE(host_cpu_is_intel);
+ TEST_REQUIRE(kvm_cpu_has_p(X86_PROPERTY_PMU_VERSION));
+ TEST_REQUIRE(kvm_cpu_property(X86_PROPERTY_PMU_VERSION) > 0);
+
+ kvm_pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
+ kvm_has_perf_caps = kvm_cpu_has(X86_FEATURE_PDCM);
+
+ test_intel_counters();
+
+ return 0;
+}
--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:03:50

by Sean Christopherson

Subject: [PATCH v6 11/20] KVM: selftests: Test Intel PMU architectural events on fixed counters

From: Jinrong Liang <[email protected]>

Extend the PMU counters test to validate architectural events using fixed
counters. The core logic is largely the same, the biggest difference
being that if a fixed counter exists, its associated event is available
(the SDM doesn't explicitly state this to be true, but it's KVM's ABI and
letting software program a fixed counter that doesn't actually count would
be quite bizarre).

Note, fixed counters rely on PERF_GLOBAL_CTRL.
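
That is, enabling fixed counter 'i' takes two MSR writes (a sketch using the
pmu.h definitions from earlier in the series; the 4-bits-per-counter layout
of FIXED_CTR_CTRL is per the SDM):

	/*
	 * Enable ring-0 counting for fixed counter i via its 4-bit control
	 * field, then set the counter's global enable bit, which lives at
	 * bit 32+i in PERF_GLOBAL_CTRL.
	 */
	wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, fixed_bits_by_idx(i, FIXED_0_KERNEL));
	wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, BIT_ULL(PMC_IDX_FIXED + i));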

Co-developed-by: Like Xu <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Jinrong Liang <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 53 ++++++++++++++++---
1 file changed, 45 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index dd9a7864410c..4d3a5c94b8ba 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -150,25 +150,46 @@ static void __guest_test_arch_event(uint8_t idx, struct kvm_x86_pmu_feature even
guest_assert_event_count(idx, event, pmc, pmc_msr);
}

+#define X86_PMU_FEATURE_NULL \
+({ \
+ struct kvm_x86_pmu_feature feature = {}; \
+ \
+ feature; \
+})
+
+static bool pmu_is_null_feature(struct kvm_x86_pmu_feature event)
+{
+ return !(*(u64 *)&event);
+}
+
static void guest_test_arch_event(uint8_t idx)
{
const struct {
struct kvm_x86_pmu_feature gp_event;
+ struct kvm_x86_pmu_feature fixed_event;
} intel_event_to_feature[] = {
- [INTEL_ARCH_CPU_CYCLES] = { X86_PMU_FEATURE_CPU_CYCLES },
- [INTEL_ARCH_INSTRUCTIONS_RETIRED] = { X86_PMU_FEATURE_INSNS_RETIRED },
- [INTEL_ARCH_REFERENCE_CYCLES] = { X86_PMU_FEATURE_REFERENCE_CYCLES },
- [INTEL_ARCH_LLC_REFERENCES] = { X86_PMU_FEATURE_LLC_REFERENCES },
- [INTEL_ARCH_LLC_MISSES] = { X86_PMU_FEATURE_LLC_MISSES },
- [INTEL_ARCH_BRANCHES_RETIRED] = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED },
- [INTEL_ARCH_BRANCHES_MISPREDICTED] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED },
+ [INTEL_ARCH_CPU_CYCLES] = { X86_PMU_FEATURE_CPU_CYCLES, X86_PMU_FEATURE_CPU_CYCLES_FIXED },
+ [INTEL_ARCH_INSTRUCTIONS_RETIRED] = { X86_PMU_FEATURE_INSNS_RETIRED, X86_PMU_FEATURE_INSNS_RETIRED_FIXED },
+ /*
+ * Note, the fixed counter for reference cycles is NOT the same
+ * as the general purpose architectural event (because the GP
+ * event is garbage). The fixed counter explicitly counts at
+ * the same frequency as the TSC, whereas the GP event counts
+ * at a fixed, but uarch specific, frequency. Bundle them here
+ * for simplicity.
+ */
+ [INTEL_ARCH_REFERENCE_CYCLES] = { X86_PMU_FEATURE_REFERENCE_CYCLES, X86_PMU_FEATURE_REFERENCE_CYCLES_FIXED },
+ [INTEL_ARCH_LLC_REFERENCES] = { X86_PMU_FEATURE_LLC_REFERENCES, X86_PMU_FEATURE_NULL },
+ [INTEL_ARCH_LLC_MISSES] = { X86_PMU_FEATURE_LLC_MISSES, X86_PMU_FEATURE_NULL },
+ [INTEL_ARCH_BRANCHES_RETIRED] = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED, X86_PMU_FEATURE_NULL },
+ [INTEL_ARCH_BRANCHES_MISPREDICTED] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED, X86_PMU_FEATURE_NULL },
};

uint32_t nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
uint32_t pmu_version = guest_get_pmu_version();
/* PERF_GLOBAL_CTRL exists only for Architectural PMU Version 2+. */
bool guest_has_perf_global_ctrl = pmu_version >= 2;
- struct kvm_x86_pmu_feature gp_event;
+ struct kvm_x86_pmu_feature gp_event, fixed_event;
uint32_t base_pmc_msr;
unsigned int i;

@@ -198,6 +219,22 @@ static void guest_test_arch_event(uint8_t idx)
__guest_test_arch_event(idx, gp_event, i, base_pmc_msr + i,
MSR_P6_EVNTSEL0 + i, eventsel);
}
+
+ if (!guest_has_perf_global_ctrl)
+ return;
+
+ fixed_event = intel_event_to_feature[idx].fixed_event;
+ if (pmu_is_null_feature(fixed_event) || !this_pmu_has(fixed_event))
+ return;
+
+ i = fixed_event.f.bit;
+
+ wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, BIT_ULL(4 * i));
+
+ __guest_test_arch_event(idx, fixed_event, PMC_FIXED_RDPMC_BASE | i,
+ MSR_CORE_PERF_FIXED_CTR0 + i,
+ MSR_CORE_PERF_GLOBAL_CTRL,
+ BIT_ULL(PMC_IDX_FIXED + i));
}

static void guest_test_arch_events(void)
--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:03:50

by Sean Christopherson

Subject: [PATCH v6 14/20] KVM: selftests: Add functional test for Intel's fixed PMU counters

From: Jinrong Liang <[email protected]>

Extend the fixed counters test to verify that supported counters can
actually be enabled in the control MSRs, that unsupported counters cannot,
and that enabled counters actually count.

Co-developed-by: Like Xu <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Jinrong Liang <[email protected]>
[sean: fold into the rd/wr access test, massage changelog]
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 29 ++++++++++++++++++-
1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index 52b9d9f615eb..5e3a1575bffc 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -324,7 +324,6 @@ static void guest_rd_wr_counters(uint32_t base_msr, uint8_t nr_possible_counters
vector = wrmsr_safe(msr, 0);
GUEST_ASSERT_PMC_MSR_ACCESS(WRMSR, msr, expect_gp, vector);
}
- GUEST_DONE();
}

static void guest_test_gp_counters(void)
@@ -342,6 +341,7 @@ static void guest_test_gp_counters(void)
base_msr = MSR_IA32_PERFCTR0;

guest_rd_wr_counters(base_msr, MAX_NR_GP_COUNTERS, nr_gp_counters, 0);
+ GUEST_DONE();
}

static void test_gp_counters(uint8_t pmu_version, uint64_t perf_capabilities,
@@ -365,6 +365,7 @@ static void guest_test_fixed_counters(void)
{
uint64_t supported_bitmask = 0;
uint8_t nr_fixed_counters = 0;
+ uint8_t i;

/* Fixed counters require Architectural vPMU Version 2+. */
if (guest_get_pmu_version() >= 2)
@@ -379,6 +380,32 @@ static void guest_test_fixed_counters(void)

guest_rd_wr_counters(MSR_CORE_PERF_FIXED_CTR0, MAX_NR_FIXED_COUNTERS,
nr_fixed_counters, supported_bitmask);
+
+ for (i = 0; i < MAX_NR_FIXED_COUNTERS; i++) {
+ uint8_t vector;
+ uint64_t val;
+
+ if (i >= nr_fixed_counters && !(supported_bitmask & BIT_ULL(i))) {
+ vector = wrmsr_safe(MSR_CORE_PERF_FIXED_CTR_CTRL, BIT_ULL(4 * i));
+ __GUEST_ASSERT(vector == GP_VECTOR,
+ "Expected #GP for counter %u in FIXED_CTRL_CTRL", i);
+
+ vector = wrmsr_safe(MSR_CORE_PERF_GLOBAL_CTRL, BIT_ULL(PMC_IDX_FIXED + i));
+ __GUEST_ASSERT(vector == GP_VECTOR,
+ "Expected #GP for counter %u in PERF_GLOBAL_CTRL", i);
+ continue;
+ }
+
+ wrmsr(MSR_CORE_PERF_FIXED_CTR0 + i, 0);
+ wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, BIT_ULL(4 * i));
+ wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, BIT_ULL(PMC_IDX_FIXED + i));
+ __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
+ wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+ val = rdmsr(MSR_CORE_PERF_FIXED_CTR0 + i);
+
+ GUEST_ASSERT_NE(val, 0);
+ }
+ GUEST_DONE();
}

static void test_fixed_counters(uint8_t pmu_version, uint64_t perf_capabilities,
--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:04:01

by Sean Christopherson

Subject: [PATCH v6 15/20] KVM: selftests: Expand PMU counters test to verify LLC events

Expand the PMU counters test to verify that LLC references and misses have
non-zero counts when the code being executed while the LLC event(s) is
active is evicted via CLFLUSH{,OPT}. Note, CLFLUSH{,OPT} requires a fence
of some kind to ensure the cache lines are flushed before execution
continues. Use MFENCE for simplicity (performance is not a concern).
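
The gist, in isolation (a sketch; the real test folds this into the single
measured asm blob below so the compiler can't perturb the sequence):

	/*
	 * Evict the cache line holding the code at label 1, then serialize
	 * with MFENCE so the flush completes before that code executes.
	 */
	__asm__ __volatile__("clflush 1f\n\t"
			     "mfence\n\t"
			     "1: nop"
			     ::: "memory");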

Suggested-by: Jim Mattson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 59 +++++++++++++------
1 file changed, 40 insertions(+), 19 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index 5e3a1575bffc..780f62e6a0f2 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -14,9 +14,9 @@
/*
* Number of "extra" instructions that will be counted, i.e. the number of
 * instructions that are needed to set up the loop and then disable the
- * counter. 2 MOV, 2 XOR, 1 WRMSR.
+ * counter. 1 CLFLUSH/CLFLUSHOPT/NOP, 1 MFENCE, 2 MOV, 2 XOR, 1 WRMSR.
*/
-#define NUM_EXTRA_INSNS 5
+#define NUM_EXTRA_INSNS 7
#define NUM_INSNS_RETIRED (NUM_BRANCHES + NUM_EXTRA_INSNS)

static uint8_t kvm_pmu_version;
@@ -107,6 +107,12 @@ static void guest_assert_event_count(uint8_t idx,
case INTEL_ARCH_BRANCHES_RETIRED:
GUEST_ASSERT_EQ(count, NUM_BRANCHES);
break;
+ case INTEL_ARCH_LLC_REFERENCES:
+ case INTEL_ARCH_LLC_MISSES:
+ if (!this_cpu_has(X86_FEATURE_CLFLUSHOPT) &&
+ !this_cpu_has(X86_FEATURE_CLFLUSH))
+ break;
+ fallthrough;
case INTEL_ARCH_CPU_CYCLES:
case INTEL_ARCH_REFERENCE_CYCLES:
GUEST_ASSERT_NE(count, 0);
@@ -123,29 +129,44 @@ static void guest_assert_event_count(uint8_t idx,
GUEST_ASSERT_EQ(_rdpmc(pmc), 0xdead);
}

+/*
+ * Enable and disable the PMC in a monolithic asm blob to ensure that the
+ * compiler can't insert _any_ code into the measured sequence. Note, ECX
+ * doesn't need to be clobbered as the input value, @pmc_msr, is restored
+ * before the end of the sequence.
+ *
+ * If CLFUSH{,OPT} is supported, flush the cacheline containing (at least) the
+ * start of the loop to force LLC references and misses, i.e. to allow testing
+ * that those events actually count.
+ */
+#define GUEST_MEASURE_EVENT(_msr, _value, clflush) \
+do { \
+ __asm__ __volatile__("wrmsr\n\t" \
+ clflush "\n\t" \
+ "mfence\n\t" \
+ "1: mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t" \
+ "loop .\n\t" \
+ "mov %%edi, %%ecx\n\t" \
+ "xor %%eax, %%eax\n\t" \
+ "xor %%edx, %%edx\n\t" \
+ "wrmsr\n\t" \
+ :: "a"((uint32_t)_value), "d"(_value >> 32), \
+ "c"(_msr), "D"(_msr) \
+ ); \
+} while (0)
+
static void __guest_test_arch_event(uint8_t idx, struct kvm_x86_pmu_feature event,
uint32_t pmc, uint32_t pmc_msr,
uint32_t ctrl_msr, uint64_t ctrl_msr_value)
{
wrmsr(pmc_msr, 0);

- /*
- * Enable and disable the PMC in a monolithic asm blob to ensure that
- * the compiler can't insert _any_ code into the measured sequence.
- * Note, ECX doesn't need to be clobbered as the input value, @pmc_msr,
- * is restored before the end of the sequence.
- */
- __asm__ __volatile__("wrmsr\n\t"
- "mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t"
- "loop .\n\t"
- "mov %%edi, %%ecx\n\t"
- "xor %%eax, %%eax\n\t"
- "xor %%edx, %%edx\n\t"
- "wrmsr\n\t"
- :: "a"((uint32_t)ctrl_msr_value),
- "d"(ctrl_msr_value >> 32),
- "c"(ctrl_msr), "D"(ctrl_msr)
- );
+ if (this_cpu_has(X86_FEATURE_CLFLUSHOPT))
+ GUEST_MEASURE_EVENT(ctrl_msr, ctrl_msr_value, "clflushopt 1f");
+ else if (this_cpu_has(X86_FEATURE_CLFLUSH))
+ GUEST_MEASURE_EVENT(ctrl_msr, ctrl_msr_value, "clflush 1f");
+ else
+ GUEST_MEASURE_EVENT(ctrl_msr, ctrl_msr_value, "nop");

guest_assert_event_count(idx, event, pmc, pmc_msr);
}
--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:04:02

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v6 13/20] KVM: selftests: Test consistency of CPUID with num of fixed counters

From: Jinrong Liang <[email protected]>

Extend the PMU counters test to verify KVM emulation of fixed counters in
addition to general purpose counters. Fixed counters add an extra wrinkle
in the form of an extra supported bitmask. Thus quoth the SDM:

fixed-function performance counter 'i' is supported if ECX[i] || (EDX[4:0] > i)

Test that KVM handles a counter being available through either method.
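
For illustration, the check boils down to this minimal sketch (helper name
and raw CPUID inputs are hypothetical, not part of the patch; assumes the
usual BIT() helper):

static bool fixed_counter_is_supported(uint32_t cpuid_0xa_ecx,
                                       uint32_t cpuid_0xa_edx, uint8_t i)
{
        /* Contiguously enumerated fixed counters: CPUID.0xA.EDX[4:0]. */
        uint8_t nr_contiguous = cpuid_0xa_edx & 0x1f;

        /* Supported if enumerated in ECX *or* below the contiguous count. */
        return (cpuid_0xa_ecx & BIT(i)) || i < nr_contiguous;
}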

Co-developed-by: Like Xu <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Jinrong Liang <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 60 ++++++++++++++++++-
1 file changed, 57 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index 232b9a80a9db..52b9d9f615eb 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -285,13 +285,19 @@ __GUEST_ASSERT(expect_gp ? vector == GP_VECTOR : !vector, \
expect_gp ? "#GP" : "no fault", msr, vector) \

static void guest_rd_wr_counters(uint32_t base_msr, uint8_t nr_possible_counters,
- uint8_t nr_counters)
+ uint8_t nr_counters, uint32_t or_mask)
{
uint8_t i;

for (i = 0; i < nr_possible_counters; i++) {
const uint32_t msr = base_msr + i;
- const bool expect_success = i < nr_counters;
+
+ /*
+ * Fixed counters are supported if the counter is less than the
+ * number of enumerated contiguous counters *or* the counter is
+ * explicitly enumerated in the supported counters mask.
+ */
+ const bool expect_success = i < nr_counters || (or_mask & BIT(i));

/*
* KVM drops writes to MSR_P6_PERFCTR[0|1] if the counters are
@@ -335,7 +341,7 @@ static void guest_test_gp_counters(void)
else
base_msr = MSR_IA32_PERFCTR0;

- guest_rd_wr_counters(base_msr, MAX_NR_GP_COUNTERS, nr_gp_counters);
+ guest_rd_wr_counters(base_msr, MAX_NR_GP_COUNTERS, nr_gp_counters, 0);
}

static void test_gp_counters(uint8_t pmu_version, uint64_t perf_capabilities,
@@ -355,9 +361,50 @@ static void test_gp_counters(uint8_t pmu_version, uint64_t perf_capabilities,
kvm_vm_free(vm);
}

+static void guest_test_fixed_counters(void)
+{
+ uint64_t supported_bitmask = 0;
+ uint8_t nr_fixed_counters = 0;
+
+ /* Fixed counters require Architectural vPMU Version 2+. */
+ if (guest_get_pmu_version() >= 2)
+ nr_fixed_counters = this_cpu_property(X86_PROPERTY_PMU_NR_FIXED_COUNTERS);
+
+ /*
+ * The supported bitmask for fixed counters was introduced in PMU
+ * version 5.
+ */
+ if (guest_get_pmu_version() >= 5)
+ supported_bitmask = this_cpu_property(X86_PROPERTY_PMU_FIXED_COUNTERS_BITMASK);
+
+ guest_rd_wr_counters(MSR_CORE_PERF_FIXED_CTR0, MAX_NR_FIXED_COUNTERS,
+ nr_fixed_counters, supported_bitmask);
+}
+
+static void test_fixed_counters(uint8_t pmu_version, uint64_t perf_capabilities,
+ uint8_t nr_fixed_counters,
+ uint32_t supported_bitmask)
+{
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+
+ vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_test_fixed_counters,
+ pmu_version, perf_capabilities);
+
+ vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_FIXED_COUNTERS_BITMASK,
+ supported_bitmask);
+ vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_NR_FIXED_COUNTERS,
+ nr_fixed_counters);
+
+ run_vcpu(vcpu);
+
+ kvm_vm_free(vm);
+}
+
static void test_intel_counters(void)
{
uint8_t nr_arch_events = kvm_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
+ uint8_t nr_fixed_counters = kvm_cpu_property(X86_PROPERTY_PMU_NR_FIXED_COUNTERS);
uint8_t nr_gp_counters = kvm_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
uint8_t pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
unsigned int i;
@@ -428,6 +475,13 @@ static void test_intel_counters(void)
v, perf_caps[i]);
for (j = 0; j <= nr_gp_counters; j++)
test_gp_counters(v, perf_caps[i], j);
+
+ pr_info("Testing fixed counters, PMU version %u, perf_caps = %lx\n",
+ v, perf_caps[i]);
+ for (j = 0; j <= nr_fixed_counters; j++) {
+ for (k = 0; k <= (BIT(nr_fixed_counters) - 1); k++)
+ test_fixed_counters(v, perf_caps[i], j, k);
+ }
}
}
}
--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:04:14

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v6 17/20] KVM: selftests: Add helpers to read integer module params

Add helpers to read integer module params, which is painfully non-trivial
because the pain of dealing with strings in C is exacerbated by the kernel
inserting a newline.

Don't bother differentiating between int, uint, short, etc. They all fit
in an int, and KVM (thankfully) doesn't have any integer params larger
than an int.
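
A usage sketch (the parameter name here is purely an example):

        int fep = get_kvm_param_integer("force_emulation_prefix");

        if (fep)
                printf("kvm.force_emulation_prefix = %d\n", fep);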

Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/include/kvm_util_base.h | 4 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 62 +++++++++++++++++--
2 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index a18db6a7b3cf..46b71241216e 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -238,6 +238,10 @@ bool get_kvm_param_bool(const char *param);
bool get_kvm_intel_param_bool(const char *param);
bool get_kvm_amd_param_bool(const char *param);

+int get_kvm_param_integer(const char *param);
+int get_kvm_intel_param_integer(const char *param);
+int get_kvm_amd_param_integer(const char *param);
+
unsigned int kvm_check_cap(long cap);

static inline bool kvm_has_cap(long cap)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 7a8af1821f5d..65101c7d1a1a 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -51,13 +51,13 @@ int open_kvm_dev_path_or_exit(void)
return _open_kvm_dev_path_or_exit(O_RDONLY);
}

-static bool get_module_param_bool(const char *module_name, const char *param)
+static ssize_t get_module_param(const char *module_name, const char *param,
+ void *buffer, size_t buffer_size)
{
const int path_size = 128;
char path[path_size];
- char value;
- ssize_t r;
- int fd;
+ ssize_t bytes_read;
+ int fd, r;

r = snprintf(path, path_size, "/sys/module/%s/parameters/%s",
module_name, param);
@@ -66,11 +66,46 @@ static bool get_module_param_bool(const char *module_name, const char *param)

fd = open_path_or_exit(path, O_RDONLY);

- r = read(fd, &value, 1);
- TEST_ASSERT(r == 1, "read(%s) failed", path);
+ bytes_read = read(fd, buffer, buffer_size);
+ TEST_ASSERT(bytes_read > 0, "read(%s) returned %ld, wanted %ld bytes",
+ path, bytes_read, buffer_size);

r = close(fd);
TEST_ASSERT(!r, "close(%s) failed", path);
+ return bytes_read;
+}
+
+static int get_module_param_integer(const char *module_name, const char *param)
+{
+ /*
+ * 16 bytes to hold a 64-bit value (1 byte per char), 1 byte for the
+ * NUL char, and 1 byte because the kernel sucks and inserts a newline
+ * at the end.
+ */
+ char value[16 + 1 + 1];
+ ssize_t r;
+
+ memset(value, '\0', sizeof(value));
+
+ r = get_module_param(module_name, param, value, sizeof(value));
+ TEST_ASSERT(value[r - 1] == '\n',
+ "Expected trailing newline, got char '%c'", value[r - 1]);
+
+ /*
+ * Squash the newline, otherwise atoi_paranoid() will complain about
+ * trailing non-NUL characters in the string.
+ */
+ value[r - 1] = '\0';
+ return atoi_paranoid(value);
+}
+
+static bool get_module_param_bool(const char *module_name, const char *param)
+{
+ char value;
+ ssize_t r;
+
+ r = get_module_param(module_name, param, &value, sizeof(value));
+ TEST_ASSERT_EQ(r, 1);

if (value == 'Y')
return true;
@@ -95,6 +130,21 @@ bool get_kvm_amd_param_bool(const char *param)
return get_module_param_bool("kvm_amd", param);
}

+int get_kvm_param_integer(const char *param)
+{
+ return get_module_param_integer("kvm", param);
+}
+
+int get_kvm_intel_param_integer(const char *param)
+{
+ return get_module_param_integer("kvm_intel", param);
+}
+
+int get_kvm_amd_param_integer(const char *param)
+{
+ return get_module_param_integer("kvm_amd", param);
+}
+
/*
* Capability
*
--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:04:24

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v6 18/20] KVM: selftests: Query module param to detect FEP in MSR filtering test

Add a helper to detect KVM support for forced emulation by querying the
module param, and use the helper to detect support for the MSR filtering
test instead of throwing a noodle/NOP at KVM to see if it sticks.
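
Roughly, the detection flow in the test becomes (sketch only; the actual
change is in the diff below):

        /* Query the module param once, before creating any VMs. */
        fep_available = kvm_is_forced_emulation_enabled();

        if (!fep_available)
                printf("Set kvm.force_emulation_prefix=1 to run the emulated tests\n");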

Cc: Aaron Lewis <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/include/x86_64/processor.h | 5 ++++
.../kvm/x86_64/userspace_msr_exit_test.c | 27 +++++++------------
2 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 1885e758eb4d..47612742968d 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -1219,6 +1219,11 @@ static inline bool kvm_is_pmu_enabled(void)
return get_kvm_param_bool("enable_pmu");
}

+static inline bool kvm_is_forced_emulation_enabled(void)
+{
+ return !!get_kvm_param_integer("force_emulation_prefix");
+}
+
uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
int *level);
uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr);
diff --git a/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
index 3533dc2fbfee..9e12dbc47a72 100644
--- a/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
+++ b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
@@ -14,8 +14,7 @@

/* Forced emulation prefix, used to invoke the emulator unconditionally. */
#define KVM_FEP "ud2; .byte 'k', 'v', 'm';"
-#define KVM_FEP_LENGTH 5
-static int fep_available = 1;
+static bool fep_available;

#define MSR_NON_EXISTENT 0x474f4f00

@@ -260,13 +259,6 @@ static void guest_code_filter_allow(void)
GUEST_ASSERT(data == 2);
GUEST_ASSERT(guest_exception_count == 0);

- /*
- * Test to see if the instruction emulator is available (ie: the module
- * parameter 'kvm.force_emulation_prefix=1' is set). This instruction
- * will #UD if it isn't available.
- */
- __asm__ __volatile__(KVM_FEP "nop");
-
if (fep_available) {
/* Let userspace know we aren't done. */
GUEST_SYNC(0);
@@ -388,12 +380,6 @@ static void guest_fep_gp_handler(struct ex_regs *regs)
&em_wrmsr_start, &em_wrmsr_end);
}

-static void guest_ud_handler(struct ex_regs *regs)
-{
- fep_available = 0;
- regs->rip += KVM_FEP_LENGTH;
-}
-
static void check_for_guest_assert(struct kvm_vcpu *vcpu)
{
struct ucall uc;
@@ -531,9 +517,11 @@ static void test_msr_filter_allow(void)
{
struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
+ uint64_t cmd;
int rc;

vm = vm_create_with_one_vcpu(&vcpu, guest_code_filter_allow);
+ sync_global_to_guest(vm, fep_available);

rc = kvm_check_cap(KVM_CAP_X86_USER_SPACE_MSR);
TEST_ASSERT(rc, "KVM_CAP_X86_USER_SPACE_MSR is available");
@@ -561,11 +549,11 @@ static void test_msr_filter_allow(void)
run_guest_then_process_wrmsr(vcpu, MSR_NON_EXISTENT);
run_guest_then_process_rdmsr(vcpu, MSR_NON_EXISTENT);

- vm_install_exception_handler(vm, UD_VECTOR, guest_ud_handler);
vcpu_run(vcpu);
- vm_install_exception_handler(vm, UD_VECTOR, NULL);
+ cmd = process_ucall(vcpu);

- if (process_ucall(vcpu) != UCALL_DONE) {
+ if (fep_available) {
+ TEST_ASSERT_EQ(cmd, UCALL_SYNC);
vm_install_exception_handler(vm, GP_VECTOR, guest_fep_gp_handler);

/* Process emulated rdmsr and wrmsr instructions. */
@@ -583,6 +571,7 @@ static void test_msr_filter_allow(void)
/* Confirm the guest completed without issues. */
run_guest_then_process_ucall_done(vcpu);
} else {
+ TEST_ASSERT_EQ(cmd, UCALL_DONE);
printf("To run the instruction emulated tests set the module parameter 'kvm.force_emulation_prefix=1'\n");
}

@@ -804,6 +793,8 @@ static void test_user_exit_msr_flags(void)

int main(int argc, char *argv[])
{
+ fep_available = kvm_is_forced_emulation_enabled();
+
test_msr_filter_allow();

test_msr_filter_deny();
--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:04:40

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v6 20/20] KVM: selftests: Test PMC virtualization with forced emulation

Extend the PMU counters test to use forced emulation to verify that KVM
emulates counter events for instructions retired and branches retired.
Force emulation for only a subset of the measured code to test that KVM
does the right thing when mixing perf events with emulated events.
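
For reference, forcing emulation of a single instruction is simply a matter
of prefixing it with KVM_FEP, e.g. (illustrative, and only legal when
kvm.force_emulation_prefix is enabled):

        /* KVM decodes the ud2 + "kvm" signature and emulates the trailing NOP. */
        __asm__ __volatile__(KVM_FEP "nop");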

Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 44 +++++++++++++------
1 file changed, 30 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index e6cf76d3499b..c66cf92cc9cc 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -21,6 +21,7 @@

static uint8_t kvm_pmu_version;
static bool kvm_has_perf_caps;
+static bool is_forced_emulation_enabled;

static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
void *guest_code,
@@ -34,6 +35,7 @@ static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
vcpu_init_descriptor_tables(*vcpu);

sync_global_to_guest(vm, kvm_pmu_version);
+ sync_global_to_guest(vm, is_forced_emulation_enabled);

/*
* Set PERF_CAPABILITIES before PMU version as KVM disallows enabling
@@ -138,37 +140,50 @@ static void guest_assert_event_count(uint8_t idx,
* If CLFUSH{,OPT} is supported, flush the cacheline containing (at least) the
* start of the loop to force LLC references and misses, i.e. to allow testing
* that those events actually count.
+ *
+ * If forced emulation is enabled (and specified), force emulation on a subset
+ * of the measured code to verify that KVM correctly emulates instructions and
+ * branches retired events in conjunction with hardware also counting said
+ * events.
*/
-#define GUEST_MEASURE_EVENT(_msr, _value, clflush) \
+#define GUEST_MEASURE_EVENT(_msr, _value, clflush, FEP) \
do { \
__asm__ __volatile__("wrmsr\n\t" \
clflush "\n\t" \
"mfence\n\t" \
"1: mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t" \
- "loop .\n\t" \
- "mov %%edi, %%ecx\n\t" \
- "xor %%eax, %%eax\n\t" \
- "xor %%edx, %%edx\n\t" \
+ FEP "loop .\n\t" \
+ FEP "mov %%edi, %%ecx\n\t" \
+ FEP "xor %%eax, %%eax\n\t" \
+ FEP "xor %%edx, %%edx\n\t" \
"wrmsr\n\t" \
:: "a"((uint32_t)_value), "d"(_value >> 32), \
"c"(_msr), "D"(_msr) \
); \
} while (0)

+#define GUEST_TEST_EVENT(_idx, _event, _pmc, _pmc_msr, _ctrl_msr, _value, FEP) \
+do { \
+ wrmsr(pmc_msr, 0); \
+ \
+ if (this_cpu_has(X86_FEATURE_CLFLUSHOPT)) \
+ GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt 1f", FEP); \
+ else if (this_cpu_has(X86_FEATURE_CLFLUSH)) \
+ GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflush 1f", FEP); \
+ else \
+ GUEST_MEASURE_EVENT(_ctrl_msr, _value, "nop", FEP); \
+ \
+ guest_assert_event_count(_idx, _event, _pmc, _pmc_msr); \
+} while (0)
+
static void __guest_test_arch_event(uint8_t idx, struct kvm_x86_pmu_feature event,
uint32_t pmc, uint32_t pmc_msr,
uint32_t ctrl_msr, uint64_t ctrl_msr_value)
{
- wrmsr(pmc_msr, 0);
+ GUEST_TEST_EVENT(idx, event, pmc, pmc_msr, ctrl_msr, ctrl_msr_value, "");

- if (this_cpu_has(X86_FEATURE_CLFLUSHOPT))
- GUEST_MEASURE_EVENT(ctrl_msr, ctrl_msr_value, "clflushopt 1f");
- else if (this_cpu_has(X86_FEATURE_CLFLUSH))
- GUEST_MEASURE_EVENT(ctrl_msr, ctrl_msr_value, "clflush 1f");
- else
- GUEST_MEASURE_EVENT(ctrl_msr, ctrl_msr_value, "nop");
-
- guest_assert_event_count(idx, event, pmc, pmc_msr);
+ if (is_forced_emulation_enabled)
+ GUEST_TEST_EVENT(idx, event, pmc, pmc_msr, ctrl_msr, ctrl_msr_value, KVM_FEP);
}

#define X86_PMU_FEATURE_NULL \
@@ -544,6 +559,7 @@ int main(int argc, char *argv[])

kvm_pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
kvm_has_perf_caps = kvm_cpu_has(X86_FEATURE_PDCM);
+ is_forced_emulation_enabled = kvm_is_forced_emulation_enabled();

test_intel_counters();

--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:04:42

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v6 19/20] KVM: selftests: Move KVM_FEP macro into common library header

Move the KVM_FEP definition, a.k.a. the KVM force emulation prefix, into
processor.h so that it can be used for other tests besides the MSR filter
test.

Signed-off-by: Sean Christopherson <[email protected]>
---
tools/testing/selftests/kvm/include/x86_64/processor.h | 3 +++
tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c | 2 --
2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 47612742968d..764e7c58a518 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -22,6 +22,9 @@
extern bool host_cpu_is_intel;
extern bool host_cpu_is_amd;

+/* Forced emulation prefix, used to invoke the emulator unconditionally. */
+#define KVM_FEP "ud2; .byte 'k', 'v', 'm';"
+
#define NMI_VECTOR 0x02

#define X86_EFLAGS_FIXED (1u << 1)
diff --git a/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
index 9e12dbc47a72..ab3a8c4f0b86 100644
--- a/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
+++ b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
@@ -12,8 +12,6 @@
#include "kvm_util.h"
#include "vmx.h"

-/* Forced emulation prefix, used to invoke the emulator unconditionally. */
-#define KVM_FEP "ud2; .byte 'k', 'v', 'm';"
static bool fep_available;

#define MSR_NON_EXISTENT 0x474f4f00
--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:04:42

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v6 16/20] KVM: selftests: Add a helper to query if the PMU module param is enabled

Add a helper to probe KVM's "enable_pmu" param; open coding strings in
multiple places is just asking for false negatives and/or runtime errors
due to typos.

Signed-off-by: Sean Christopherson <[email protected]>
---
tools/testing/selftests/kvm/include/x86_64/processor.h | 5 +++++
tools/testing/selftests/kvm/x86_64/pmu_counters_test.c | 2 +-
tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c | 2 +-
tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c | 2 +-
4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index b103c462701b..1885e758eb4d 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -1214,6 +1214,11 @@ static inline uint8_t xsetbv_safe(uint32_t index, uint64_t value)

bool kvm_is_tdp_enabled(void);

+static inline bool kvm_is_pmu_enabled(void)
+{
+ return get_kvm_param_bool("enable_pmu");
+}
+
uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
int *level);
uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr);
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index 780f62e6a0f2..e6cf76d3499b 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -536,7 +536,7 @@ static void test_intel_counters(void)

int main(int argc, char *argv[])
{
- TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
+ TEST_REQUIRE(kvm_is_pmu_enabled());

TEST_REQUIRE(host_cpu_is_intel);
TEST_REQUIRE(kvm_cpu_has_p(X86_PROPERTY_PMU_VERSION));
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
index b6e4f57a8651..95bdb6d5af50 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
@@ -906,7 +906,7 @@ int main(int argc, char *argv[])
struct kvm_vcpu *vcpu, *vcpu2 = NULL;
struct kvm_vm *vm;

- TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
+ TEST_REQUIRE(kvm_is_pmu_enabled());
TEST_REQUIRE(kvm_has_cap(KVM_CAP_PMU_EVENT_FILTER));
TEST_REQUIRE(kvm_has_cap(KVM_CAP_PMU_EVENT_MASKED_EVENTS));

diff --git a/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c b/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c
index ebbcb0a3f743..562b0152a122 100644
--- a/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c
+++ b/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c
@@ -237,7 +237,7 @@ int main(int argc, char *argv[])
{
union perf_capabilities host_cap;

- TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
+ TEST_REQUIRE(kvm_is_pmu_enabled());
TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_PDCM));

TEST_REQUIRE(kvm_cpu_has_p(X86_PROPERTY_PMU_VERSION));
--
2.42.0.869.gea05f2083d-goog

2023-11-04 00:16:56

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v6 05/20] KVM: x86/pmu: Allow programming events that match unsupported arch events

Remove KVM's bogus restriction that the guest can't program an event whose
encoding matches an unsupported architectural event. The enumeration of
an architectural event only says that if a CPU supports an architectural
event, then the event can be programmed using the architectural encoding.
The enumeration does NOT say anything about the encoding when the CPU
doesn't report support for the architectural event.

Preventing the guest from counting events whose encoding happens to match
an architectural event breaks existing functionality whenever Intel adds
an architectural encoding that was *ever* used for a CPU that doesn't
enumerate support for the architectural event, even if the encoding is for
the exact same event!

E.g. the architectural encoding for Top-Down Slots is 0x01a4. On Broadwell
CPUs, which do not support the Top-Down Slots architectural event, 0x10a4
is a valid, model-specific event. Denying guest usage of 0x01a4 if/when
KVM adds support for Top-Down Slots would break any Broadwell-based guest.
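
To make the encodings concrete (macro name illustrative; the selftests
define a similar helper for packing event select and unit mask):

#define RAW_EVENT(select, umask) (((select) & 0xff) | (((umask) & 0xff) << 8))

/* 0x01a4: the architectural Top-Down Slots encoding (select 0xa4, umask 0x1). */
uint64_t arch_slots = RAW_EVENT(0xa4, 0x01);

/* 0x10a4: the Broadwell model-specific event cited above (same select 0xa4). */
uint64_t bdw_event = RAW_EVENT(0xa4, 0x10);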

Reported-by: Kan Liang <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]
Cc: Dapeng Mi <[email protected]>
Fixes: a21864486f7e ("KVM: x86/pmu: Fix available_event_types check for REF_CPU_CYCLES event")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm-x86-pmu-ops.h | 1 -
arch/x86/kvm/pmu.c | 1 -
arch/x86/kvm/pmu.h | 1 -
arch/x86/kvm/svm/pmu.c | 6 ----
arch/x86/kvm/vmx/pmu_intel.c | 38 --------------------------
5 files changed, 47 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h b/arch/x86/include/asm/kvm-x86-pmu-ops.h
index 6c98f4bb4228..884af8ef7657 100644
--- a/arch/x86/include/asm/kvm-x86-pmu-ops.h
+++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h
@@ -12,7 +12,6 @@ BUILD_BUG_ON(1)
* a NULL definition, for example if "static_call_cond()" will be used
* at the call sites.
*/
-KVM_X86_PMU_OP(hw_event_available)
KVM_X86_PMU_OP(pmc_idx_to_pmc)
KVM_X86_PMU_OP(rdpmc_ecx_to_pmc)
KVM_X86_PMU_OP(msr_idx_to_pmc)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 9ae07db6f0f6..99ed72966528 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -374,7 +374,6 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
static bool pmc_event_is_allowed(struct kvm_pmc *pmc)
{
return pmc_is_globally_enabled(pmc) && pmc_speculative_in_use(pmc) &&
- static_call(kvm_x86_pmu_hw_event_available)(pmc) &&
check_pmu_event_filter(pmc);
}

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 5341e8f69a22..f3e7a356fd81 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -20,7 +20,6 @@

struct kvm_pmu_ops {
void (*init_pmu_capability)(void);
- bool (*hw_event_available)(struct kvm_pmc *pmc);
struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
unsigned int idx, u64 *mask);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 373ff6a6687b..5596fe816ea8 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -73,11 +73,6 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
return amd_pmc_idx_to_pmc(pmu, idx);
}

-static bool amd_hw_event_available(struct kvm_pmc *pmc)
-{
- return true;
-}
-
static bool amd_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -249,7 +244,6 @@ static void amd_pmu_reset(struct kvm_vcpu *vcpu)
}

struct kvm_pmu_ops amd_pmu_ops __initdata = {
- .hw_event_available = amd_hw_event_available,
.pmc_idx_to_pmc = amd_pmc_idx_to_pmc,
.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
.msr_idx_to_pmc = amd_msr_idx_to_pmc,
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index b239e7dbdc9b..9bf700da1e17 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -140,43 +140,6 @@ static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
}
}

-static bool intel_hw_event_available(struct kvm_pmc *pmc)
-{
- struct kvm_pmu *pmu = pmc_to_pmu(pmc);
- u8 event_select = pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
- u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
- int i;
-
- /*
- * Fixed counters are always available if KVM reaches this point. If a
- * fixed counter is unsupported in hardware or guest CPUID, KVM doesn't
- * allow the counter's corresponding MSR to be written. KVM does use
- * architectural events to program fixed counters, as the interface to
- * perf doesn't allow requesting a specific fixed counter, e.g. perf
- * may (sadly) back a guest fixed PMC with a general purposed counter.
- * But if _hardware_ doesn't support the associated event, KVM simply
- * doesn't enumerate support for the fixed counter.
- */
- if (pmc_is_fixed(pmc))
- return true;
-
- BUILD_BUG_ON(ARRAY_SIZE(intel_arch_events) != NR_INTEL_ARCH_EVENTS);
-
- /*
- * Disallow events reported as unavailable in guest CPUID. Note, this
- * doesn't apply to pseudo-architectural events (see above).
- */
- for (i = 0; i < NR_REAL_INTEL_ARCH_EVENTS; i++) {
- if (intel_arch_events[i].eventsel != event_select ||
- intel_arch_events[i].unit_mask != unit_mask)
- continue;
-
- return pmu->available_event_types & BIT(i);
- }
-
- return true;
-}
-
static bool intel_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -842,7 +805,6 @@ void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu)

struct kvm_pmu_ops intel_pmu_ops __initdata = {
.init_pmu_capability = intel_init_pmu_capability,
- .hw_event_available = intel_hw_event_available,
.pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
.rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
.msr_idx_to_pmc = intel_msr_idx_to_pmc,
--
2.42.0.869.gea05f2083d-goog

2023-11-04 12:26:59

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v6 02/20] KVM: x86/pmu: Don't enumerate support for fixed counters KVM can't virtualize

On Fri, Nov 3, 2023 at 5:02 PM Sean Christopherson <[email protected]> wrote:
>
> Hide fixed counters for which perf is incapable of creating the associated
> architectural event. Except for the so called pseudo-architectural event
> for counting TSC reference cycle, KVM virtualizes fixed counters by
> creating a perf event for the associated general purpose architectural
> event. If the associated event isn't supported in hardware, KVM can't
> actually virtualize the fixed counter because perf will likely not program
> up the correct event.

Won't it? My understanding was that perf preferred to use a fixed
counter when there was a choice of fixed or general purpose counter.
Unless the fixed counter is already assigned to a perf_event, KVM's
request should be satisfied by assigning the fixed counter.

> Note, this issue is almost certainly limited to running KVM on a funky
> virtual CPU model, no known real hardware has an asymmetric PMU where a
> fixed counter is supported but the associated architectural event is not.

This seems like a fix looking for a problem. Has the "problem"
actually been encountered?

> Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/pmu.h | 4 ++++
> arch/x86/kvm/vmx/pmu_intel.c | 31 +++++++++++++++++++++++++++++++
> 2 files changed, 35 insertions(+)
>
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index 1d64113de488..5341e8f69a22 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -19,6 +19,7 @@
> #define VMWARE_BACKDOOR_PMC_APPARENT_TIME 0x10002
>
> struct kvm_pmu_ops {
> + void (*init_pmu_capability)(void);
> bool (*hw_event_available)(struct kvm_pmc *pmc);
> struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
> struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
> @@ -218,6 +219,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
> pmu_ops->MAX_NR_GP_COUNTERS);
> kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed,
> KVM_PMC_MAX_FIXED);
> +
> + if (pmu_ops->init_pmu_capability)
> + pmu_ops->init_pmu_capability();
> }
>
> static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc)
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 1b13a472e3f2..3316fdea212a 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -68,6 +68,36 @@ static int fixed_pmc_events[] = {
> [2] = PSEUDO_ARCH_REFERENCE_CYCLES,
> };
>
> +static void intel_init_pmu_capability(void)
> +{
> + int i;
> +
> + /*
> + * Perf may (sadly) back a guest fixed counter with a general purpose
> + * counter, and so KVM must hide fixed counters whose associated
> + * architectural event are unsupported. On real hardware, this should
> + * never happen, but if KVM is running on a funky virtual CPU model...
> + *
> + * TODO: Drop this horror if/when KVM stops using perf events for
> + * guest fixed counters, or can explicitly request fixed counters.
> + */
> + for (i = 0; i < kvm_pmu_cap.num_counters_fixed; i++) {
> + int event = fixed_pmc_events[i];
> +
> + /*
> + * Ignore pseudo-architectural events, they're a bizarre way of
> + * requesting events from perf that _can't_ be backed with a
> + * general purpose architectural event, i.e. they're guaranteed
> + * to be backed by the real fixed counter.
> + */
> + if (event < NR_REAL_INTEL_ARCH_EVENTS &&
> + (kvm_pmu_cap.events_mask & BIT(event)))
> + break;
> + }
> +
> + kvm_pmu_cap.num_counters_fixed = i;
> +}
> +
> static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
> {
> struct kvm_pmc *pmc;
> @@ -789,6 +819,7 @@ void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu)
> }
>
> struct kvm_pmu_ops intel_pmu_ops __initdata = {
> + .init_pmu_capability = intel_init_pmu_capability,
> .hw_event_available = intel_hw_event_available,
> .pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
> .rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
> --
> 2.42.0.869.gea05f2083d-goog
>

2023-11-04 12:41:57

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v6 03/20] KVM: x86/pmu: Don't enumerate arch events KVM doesn't support

On Fri, Nov 3, 2023 at 5:02 PM Sean Christopherson <[email protected]> wrote:
>
> Don't advertise support to userspace for architectural events that KVM
> doesn't support, i.e. for "real" events that aren't listed in
> intel_pmu_architectural_events. On current hardware, this effectively
> means "don't advertise support for Top Down Slots".

NR_REAL_INTEL_ARCH_EVENTS is only used in intel_hw_event_available().
As discussed (https://lore.kernel.org/kvm/[email protected]/),
intel_hw_event_available() should go away.

Aside from mapping fixed counters to event selector and unit mask
(fixed_pmc_events[]), KVM has no reason to know when a new
architectural event is defined.

The variable that this change "fixes" is only used to feed
CPUID.0AH:EBX in KVM_GET_SUPPORTED_CPUID, and kvm_pmu_cap.events_mask
is already constructed from what host perf advertises support for.

> Mask off the associated "unavailable" bits, as said bits for undefined
> events are reserved to zero. Arguably the events _are_ defined, but from
> a KVM perspective they might as well not exist, and there's absolutely no
> reason to leave useless unavailable bits set.
>
> Fixes: a6c06ed1a60a ("KVM: Expose the architectural performance monitoring CPUID leaf")
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/pmu_intel.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 3316fdea212a..8d545f84dc4a 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -73,6 +73,15 @@ static void intel_init_pmu_capability(void)
> int i;
>
> /*
> + * Do not enumerate support for architectural events that KVM doesn't
> + * support. Clear unsupported events "unavailable" bit as well, as
> + * architecturally such bits are reserved to zero.
> + */
> + kvm_pmu_cap.events_mask_len = min(kvm_pmu_cap.events_mask_len,
> + NR_REAL_INTEL_ARCH_EVENTS);
> + kvm_pmu_cap.events_mask &= GENMASK(kvm_pmu_cap.events_mask_len - 1, 0);
> +
> + /*
> * Perf may (sadly) back a guest fixed counter with a general purpose
> * counter, and so KVM must hide fixed counters whose associated
> * architectural event are unsupported. On real hardware, this should
> --
> 2.42.0.869.gea05f2083d-goog
>

2023-11-04 12:43:57

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v6 04/20] KVM: x86/pmu: Always treat Fixed counters as available when supported

On Fri, Nov 3, 2023 at 5:02 PM Sean Christopherson <[email protected]> wrote:
>
> Now that KVM hides fixed counters that can't be virtualized, treat fixed
> counters as available when they are supported, i.e. don't silently ignore
> an enabled fixed counter just because guest CPUID says the associated
> general purpose architectural event is unavailable.
>
> KVM originally treated fixed counters as always available, but that got
> changed as part of a fix to avoid confusing REF_CPU_CYCLES, which does NOT
> map to an architectural event, with the actual architectural event
> associated with bit 7, TOPDOWN_SLOTS.
>
> The commit justified the change with:
>
> If the event is marked as unavailable in the Intel guest CPUID
> 0AH.EBX leaf, we need to avoid any perf_event creation, whether
> it's a gp or fixed counter.
>
> but that justification doesn't mesh with reality. The Intel SDM uses
> "architectural events" to refer to both general purpose events (the ones
> with the reverse polarity mask in CPUID.0xA.EBX) and the events for fixed
> counters, e.g. the SDM makes statements like:
>
> Each of the fixed-function PMC can count only one architectural
> performance event.
>
> but the fact that fixed counter 2 (TSC reference cycles) doesn't have an
> associated general purpose architectural makes trying to apply the mask
> from CPUID.0xA.EBX impossible. Furthermore, the SDM never explicitly
> says that an architectural events that's marked unavailable in EBX affects
> the fixed counters.
>
> Note, at the time of the change, KVM didn't enforce hardware support, i.e.
> didn't prevent userspace from enumerating support in guest CPUID.0xA.EBX
> for architectural events that aren't supported in hardware. I.e. silently
> dropping the fixed counter didn't somehow protect against counting the
> wrong event, it just enforced guest CPUID.
>
> Arguably, userspace is creating a bogus vCPU model by advertising a fixed
> counter but saying the associated general purpose architectural event is
> unavailable. But regardless of the validity of the vCPU model, letting
> the guest enable a fixed counter and then not actually having it count
> anything is completely nonsensical. I.e. even if all of the above is
> wrong and it's illegal for a fixed counter to exist when the architectural
> event is unavailable, silently doing nothing is still the wrong behavior
> and KVM should instead disallow enabling the fixed counter in the first
> place.
>
> Fixes: a21864486f7e ("KVM: x86/pmu: Fix available_event_types check for REF_CPU_CYCLES event")
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/pmu_intel.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 8d545f84dc4a..b239e7dbdc9b 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -147,11 +147,24 @@ static bool intel_hw_event_available(struct kvm_pmc *pmc)

As discussed (https://lore.kernel.org/kvm/[email protected]/),
this function should go away.

> u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
> int i;
>
> + /*
> + * Fixed counters are always available if KVM reaches this point. If a
> + * fixed counter is unsupported in hardware or guest CPUID, KVM doesn't
> + * allow the counter's corresponding MSR to be written. KVM does use
> + * architectural events to program fixed counters, as the interface to
> + * perf doesn't allow requesting a specific fixed counter, e.g. perf
> + * may (sadly) back a guest fixed PMC with a general purposed counter.
> + * But if _hardware_ doesn't support the associated event, KVM simply
> + * doesn't enumerate support for the fixed counter.
> + */
> + if (pmc_is_fixed(pmc))
> + return true;
> +
> BUILD_BUG_ON(ARRAY_SIZE(intel_arch_events) != NR_INTEL_ARCH_EVENTS);
>
> /*
> * Disallow events reported as unavailable in guest CPUID. Note, this
> - * doesn't apply to pseudo-architectural events.
> + * doesn't apply to pseudo-architectural events (see above).
> */
> for (i = 0; i < NR_REAL_INTEL_ARCH_EVENTS; i++) {
> if (intel_arch_events[i].eventsel != event_select ||
> --
> 2.42.0.869.gea05f2083d-goog
>

2023-11-04 12:46:43

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v6 05/20] KVM: x86/pmu: Allow programming events that match unsupported arch events

On Fri, Nov 3, 2023 at 5:02 PM Sean Christopherson <[email protected]> wrote:
>
> Remove KVM's bogus restriction that the guest can't program an event whose
> encoding matches an unsupported architectural event. The enumeration of
> an architectural event only says that if a CPU supports an architectural
> event, then the event can be programmed using the architectural encoding.
> The enumeration does NOT say anything about the encoding when the CPU
> doesn't report support for the architectural event.
>
> Preventing the guest from counting events whose encoding happens to match
> an architectural event breaks existing functionality whenever Intel adds
> an architectural encoding that was *ever* used for a CPU that doesn't
> enumerate support for the architectural event, even if the encoding is for
> the exact same event!
>
> E.g. the architectural encoding for Top-Down Slots is 0x01a4. On Broadwell
> CPUs, which do not support the Top-Down Slots architectural event, 0x10a4
> is a valid, model-specific event. Denying guest usage of 0x01a4 if/when
> KVM adds support for Top-Down Slots would break any Broadwell-based guest.
>
> Reported-by: Kan Liang <[email protected]>
> Closes: https://lore.kernel.org/all/[email protected]
> Cc: Dapeng Mi <[email protected]>
> Fixes: a21864486f7e ("KVM: x86/pmu: Fix available_event_types check for REF_CPU_CYCLES event")
> Signed-off-by: Sean Christopherson <[email protected]>

Yes! Finally!

Reviewed-by: Jim Mattson <[email protected]>

2023-11-04 12:52:15

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v6 06/20] KVM: selftests: Add vcpu_set_cpuid_property() to set properties

On Fri, Nov 3, 2023 at 5:02 PM Sean Christopherson <[email protected]> wrote:
>
> From: Jinrong Liang <[email protected]>
>
> Add vcpu_set_cpuid_property() helper function for setting properties, and
> use it instead of open coding an equivalent for MAX_PHY_ADDR. Future vPMU
> testcases will also need to stuff various CPUID properties.
>
> Signed-off-by: Jinrong Liang <[email protected]>
> Co-developed-by: Sean Christopherson <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> .../testing/selftests/kvm/include/x86_64/processor.h | 4 +++-
> tools/testing/selftests/kvm/lib/x86_64/processor.c | 12 +++++++++---
> .../kvm/x86_64/smaller_maxphyaddr_emulation_test.c | 2 +-
> 3 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index 25bc61dac5fb..a01931f7d954 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -994,7 +994,9 @@ static inline void vcpu_set_cpuid(struct kvm_vcpu *vcpu)
> vcpu_ioctl(vcpu, KVM_GET_CPUID2, vcpu->cpuid);
> }
>
> -void vcpu_set_cpuid_maxphyaddr(struct kvm_vcpu *vcpu, uint8_t maxphyaddr);
> +void vcpu_set_cpuid_property(struct kvm_vcpu *vcpu,
> + struct kvm_x86_cpu_property property,
> + uint32_t value);
>
> void vcpu_clear_cpuid_entry(struct kvm_vcpu *vcpu, uint32_t function);
> void vcpu_set_or_clear_cpuid_feature(struct kvm_vcpu *vcpu,
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> index d8288374078e..9e717bc6bd6d 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> @@ -752,11 +752,17 @@ void vcpu_init_cpuid(struct kvm_vcpu *vcpu, const struct kvm_cpuid2 *cpuid)
> vcpu_set_cpuid(vcpu);
> }
>
> -void vcpu_set_cpuid_maxphyaddr(struct kvm_vcpu *vcpu, uint8_t maxphyaddr)
> +void vcpu_set_cpuid_property(struct kvm_vcpu *vcpu,
> + struct kvm_x86_cpu_property property,
> + uint32_t value)
> {
> - struct kvm_cpuid_entry2 *entry = vcpu_get_cpuid_entry(vcpu, 0x80000008);
> + struct kvm_cpuid_entry2 *entry;
> +
> + entry = __vcpu_get_cpuid_entry(vcpu, property.function, property.index);
> +
> + (&entry->eax)[property.reg] &= ~GENMASK(property.hi_bit, property.lo_bit);
> + (&entry->eax)[property.reg] |= value << (property.lo_bit);

What if 'value' is too large?

Perhaps:

value <<= property.lo_bit;
TEST_ASSERT(!(value & ~GENMASK(property.hi_bit, property.lo_bit)),
            "value is too large");
(&entry->eax)[property.reg] |= value;

> - entry->eax = (entry->eax & ~0xff) | maxphyaddr;
> vcpu_set_cpuid(vcpu);
> }
>
> diff --git a/tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c b/tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c
> index 06edf00a97d6..9b89440dff19 100644
> --- a/tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c
> @@ -63,7 +63,7 @@ int main(int argc, char *argv[])
> vm_init_descriptor_tables(vm);
> vcpu_init_descriptor_tables(vcpu);
>
> - vcpu_set_cpuid_maxphyaddr(vcpu, MAXPHYADDR);
> + vcpu_set_cpuid_property(vcpu, X86_PROPERTY_MAX_PHY_ADDR, MAXPHYADDR);
>
> rc = kvm_check_cap(KVM_CAP_EXIT_ON_EMULATION_FAILURE);
> TEST_ASSERT(rc, "KVM_CAP_EXIT_ON_EMULATION_FAILURE is unavailable");
> --
> 2.42.0.869.gea05f2083d-goog
>

2023-11-04 12:52:53

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v6 07/20] KVM: selftests: Drop the "name" param from KVM_X86_PMU_FEATURE()

On Fri, Nov 3, 2023 at 5:02 PM Sean Christopherson <[email protected]> wrote:
>
> Drop the "name" parameter from KVM_X86_PMU_FEATURE(), it's unused and
> the name is redundant with the macro, i.e. it's truly useless.
>
> Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>

2023-11-04 13:01:33

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v6 08/20] KVM: selftests: Extend {kvm,this}_pmu_has() to support fixed counters

On Fri, Nov 3, 2023 at 5:02 PM Sean Christopherson <[email protected]> wrote:
>
> Extend the kvm_x86_pmu_feature framework to allow querying for fixed
> counters via {kvm,this}_pmu_has(). Like architectural events, checking
> for a fixed counter annoyingly requires checking multiple CPUID fields, as
> a fixed counter exists if:
>
> FxCtr[i]_is_supported := ECX[i] || (EDX[4:0] > i);
>
> Note, KVM currently doesn't actually support exposing fixed counters via
> the bitmask, but that will hopefully change sooner than later, and Intel's
> SDM explicitly "recommends" checking both the number of counters and the
> mask.
>
> Rename the intermediate "anti_feature" field to simply 'f' since the fixed
> counter bitmask (thankfully) doesn't have reversed polarity like the
> architectural events bitmask.
>
> Note, ideally the helpers would use BUILD_BUG_ON() to assert on the
> incoming register, but the expected usage in PMU tests can't guarantee the
> inputs are compile-time constants.
>
> Opportunistically define macros for all of the architectural events and
> fixed counters that KVM currently supports.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> .../selftests/kvm/include/x86_64/processor.h | 63 +++++++++++++------
> 1 file changed, 45 insertions(+), 18 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index 2d9771151dd9..b103c462701b 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -281,24 +281,39 @@ struct kvm_x86_cpu_property {
> * that indicates the feature is _not_ supported, and a property that states
> * the length of the bit mask of unsupported features. A feature is supported
> * if the size of the bit mask is larger than the "unavailable" bit, and said
> - * bit is not set.
> + * bit is not set. Fixed counters also bizarre enumeration, but inverted from
> + * arch events for general purpose counters. Fixed counters are supported if a
> + * feature flag is set **OR** the total number of fixed counters is greater
> + * than index of the counter.
> *
> - * Wrap the "unavailable" feature to simplify checking whether or not a given
> - * architectural event is supported.
> + * Wrap the events for general purpose and fixed counters to simplify checking
> + * whether or not a given architectural event is supported.
> */
> struct kvm_x86_pmu_feature {
> - struct kvm_x86_cpu_feature anti_feature;
> + struct kvm_x86_cpu_feature f;
> };
> -#define KVM_X86_PMU_FEATURE(__bit) \
> -({ \
> - struct kvm_x86_pmu_feature feature = { \
> - .anti_feature = KVM_X86_CPU_FEATURE(0xa, 0, EBX, __bit), \
> - }; \
> - \
> - feature; \
> +#define KVM_X86_PMU_FEATURE(__reg, __bit) \
> +({ \
> + struct kvm_x86_pmu_feature feature = { \
> + .f = KVM_X86_CPU_FEATURE(0xa, 0, __reg, __bit), \
> + }; \
> + \
> + kvm_static_assert(KVM_CPUID_##__reg == KVM_CPUID_EBX || \
> + KVM_CPUID_##__reg == KVM_CPUID_ECX); \
> + feature; \
> })
>
> -#define X86_PMU_FEATURE_BRANCH_INSNS_RETIRED KVM_X86_PMU_FEATURE(5)
> +#define X86_PMU_FEATURE_CPU_CYCLES KVM_X86_PMU_FEATURE(EBX, 0)
> +#define X86_PMU_FEATURE_INSNS_RETIRED KVM_X86_PMU_FEATURE(EBX, 1)
> +#define X86_PMU_FEATURE_REFERENCE_CYCLES KVM_X86_PMU_FEATURE(EBX, 2)
> +#define X86_PMU_FEATURE_LLC_REFERENCES KVM_X86_PMU_FEATURE(EBX, 3)
> +#define X86_PMU_FEATURE_LLC_MISSES KVM_X86_PMU_FEATURE(EBX, 4)
> +#define X86_PMU_FEATURE_BRANCH_INSNS_RETIRED KVM_X86_PMU_FEATURE(EBX, 5)
> +#define X86_PMU_FEATURE_BRANCHES_MISPREDICTED KVM_X86_PMU_FEATURE(EBX, 6)

Why not add top down slots now?

> +
> +#define X86_PMU_FEATURE_INSNS_RETIRED_FIXED KVM_X86_PMU_FEATURE(ECX, 0)
> +#define X86_PMU_FEATURE_CPU_CYCLES_FIXED KVM_X86_PMU_FEATURE(ECX, 1)
> +#define X86_PMU_FEATURE_REFERENCE_CYCLES_FIXED KVM_X86_PMU_FEATURE(ECX, 2)

Perhaps toss 'TSC' between CYCLES and FIXED?

And add top down slots now?

>
> static inline unsigned int x86_family(unsigned int eax)
> {
> @@ -697,10 +712,16 @@ static __always_inline bool this_cpu_has_p(struct kvm_x86_cpu_property property)
>
> static inline bool this_pmu_has(struct kvm_x86_pmu_feature feature)
> {
> - uint32_t nr_bits = this_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
> + uint32_t nr_bits;
>
> - return nr_bits > feature.anti_feature.bit &&
> - !this_cpu_has(feature.anti_feature);
> + if (feature.f.reg == KVM_CPUID_EBX) {
> + nr_bits = this_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
> + return nr_bits > feature.f.bit && !this_cpu_has(feature.f);

Ouch! Reverse polarity bits make 'this_cpu_has' non-intuitive.

> + }
> +
> + GUEST_ASSERT(feature.f.reg == KVM_CPUID_ECX);
> + nr_bits = this_cpu_property(X86_PROPERTY_PMU_NR_FIXED_COUNTERS);
> + return nr_bits > feature.f.bit || this_cpu_has(feature.f);
> }
>
> static __always_inline uint64_t this_cpu_supported_xcr0(void)
> @@ -916,10 +937,16 @@ static __always_inline bool kvm_cpu_has_p(struct kvm_x86_cpu_property property)
>
> static inline bool kvm_pmu_has(struct kvm_x86_pmu_feature feature)
> {
> - uint32_t nr_bits = kvm_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
> + uint32_t nr_bits;
>
> - return nr_bits > feature.anti_feature.bit &&
> - !kvm_cpu_has(feature.anti_feature);
> + if (feature.f.reg == KVM_CPUID_EBX) {
> + nr_bits = kvm_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
> + return nr_bits > feature.f.bit && !kvm_cpu_has(feature.f);
> + }
> +
> + TEST_ASSERT_EQ(feature.f.reg, KVM_CPUID_ECX);
> + nr_bits = kvm_cpu_property(X86_PROPERTY_PMU_NR_FIXED_COUNTERS);
> + return nr_bits > feature.f.bit || kvm_cpu_has(feature.f);
> }
>
> static __always_inline uint64_t kvm_cpu_supported_xcr0(void)
> --
> 2.42.0.869.gea05f2083d-goog
>

2023-11-04 13:20:40

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v6 09/20] KVM: selftests: Add pmu.h and lib/pmu.c for common PMU assets

On Fri, Nov 3, 2023 at 5:02 PM Sean Christopherson <[email protected]> wrote:
>
> From: Jinrong Liang <[email protected]>
>
> By defining the PMU performance events and masks relevant for x86 in
> the new pmu.h and pmu.c, it becomes easier to reference them, minimizing
> potential errors in code that handles these values.
>
> Clean up pmu_event_filter_test.c by including pmu.h and removing
> unnecessary macros.
>
> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Jinrong Liang <[email protected]>
> [sean: drop PSEUDO_ARCH_REFERENCE_CYCLES]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> tools/testing/selftests/kvm/Makefile | 1 +
> tools/testing/selftests/kvm/include/pmu.h | 84 +++++++++++++++++++
> tools/testing/selftests/kvm/lib/pmu.c | 28 +++++++
> .../kvm/x86_64/pmu_event_filter_test.c | 32 ++-----
> 4 files changed, 122 insertions(+), 23 deletions(-)
> create mode 100644 tools/testing/selftests/kvm/include/pmu.h
> create mode 100644 tools/testing/selftests/kvm/lib/pmu.c
>
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index a5963ab9215b..44d8d022b023 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -32,6 +32,7 @@ LIBKVM += lib/guest_modes.c
> LIBKVM += lib/io.c
> LIBKVM += lib/kvm_util.c
> LIBKVM += lib/memstress.c
> +LIBKVM += lib/pmu.c
> LIBKVM += lib/guest_sprintf.c
> LIBKVM += lib/rbtree.c
> LIBKVM += lib/sparsebit.c
> diff --git a/tools/testing/selftests/kvm/include/pmu.h b/tools/testing/selftests/kvm/include/pmu.h
> new file mode 100644
> index 000000000000..987602c62b51
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/include/pmu.h
> @@ -0,0 +1,84 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2023, Tencent, Inc.
> + */
> +#ifndef SELFTEST_KVM_PMU_H
> +#define SELFTEST_KVM_PMU_H
> +
> +#include <stdint.h>
> +
> +#define X86_PMC_IDX_MAX 64
> +#define INTEL_PMC_MAX_GENERIC 32

I think this is actually 15. Note that IA32_PMC0 through IA32_PMC7
have MSR indices from 0xc1 through 0xc8, and MSR 0xcf is
IA32_CORE_CAPABILITIES. At the very least, we have to handle
non-contiguous MSR indices if we ever go beyond IA32_PMC14.

> +#define KVM_PMU_EVENT_FILTER_MAX_EVENTS 300
> +
> +#define GP_COUNTER_NR_OFS_BIT 8
> +#define EVENT_LENGTH_OFS_BIT 24
> +
> +#define PMU_VERSION_MASK GENMASK_ULL(7, 0)
> +#define EVENT_LENGTH_MASK GENMASK_ULL(31, EVENT_LENGTH_OFS_BIT)
> +#define GP_COUNTER_NR_MASK GENMASK_ULL(15, GP_COUNTER_NR_OFS_BIT)
> +#define FIXED_COUNTER_NR_MASK GENMASK_ULL(4, 0)
> +
> +#define ARCH_PERFMON_EVENTSEL_EVENT GENMASK_ULL(7, 0)
> +#define ARCH_PERFMON_EVENTSEL_UMASK GENMASK_ULL(15, 8)
> +#define ARCH_PERFMON_EVENTSEL_USR BIT_ULL(16)
> +#define ARCH_PERFMON_EVENTSEL_OS BIT_ULL(17)
> +#define ARCH_PERFMON_EVENTSEL_EDGE BIT_ULL(18)
> +#define ARCH_PERFMON_EVENTSEL_PIN_CONTROL BIT_ULL(19)
> +#define ARCH_PERFMON_EVENTSEL_INT BIT_ULL(20)
> +#define ARCH_PERFMON_EVENTSEL_ANY BIT_ULL(21)
> +#define ARCH_PERFMON_EVENTSEL_ENABLE BIT_ULL(22)
> +#define ARCH_PERFMON_EVENTSEL_INV BIT_ULL(23)
> +#define ARCH_PERFMON_EVENTSEL_CMASK GENMASK_ULL(31, 24)
> +
> +#define PMC_MAX_FIXED 16
> +#define PMC_IDX_FIXED 32
> +
> +/* RDPMC offset for Fixed PMCs */
> +#define PMC_FIXED_RDPMC_BASE BIT_ULL(30)
> +#define PMC_FIXED_RDPMC_METRICS BIT_ULL(29)
> +
> +#define FIXED_BITS_MASK 0xFULL
> +#define FIXED_BITS_STRIDE 4
> +#define FIXED_0_KERNEL BIT_ULL(0)
> +#define FIXED_0_USER BIT_ULL(1)
> +#define FIXED_0_ANYTHREAD BIT_ULL(2)
> +#define FIXED_0_ENABLE_PMI BIT_ULL(3)
> +
> +#define fixed_bits_by_idx(_idx, _bits) \
> + ((_bits) << ((_idx) * FIXED_BITS_STRIDE))
> +
> +#define AMD64_NR_COUNTERS 4
> +#define AMD64_NR_COUNTERS_CORE 6
> +
> +#define PMU_CAP_FW_WRITES BIT_ULL(13)
> +#define PMU_CAP_LBR_FMT 0x3f
> +
> +enum intel_pmu_architectural_events {
> + /*
> + * The order of the architectural events matters as support for each
> + * event is enumerated via CPUID using the index of the event.
> + */
> + INTEL_ARCH_CPU_CYCLES,
> + INTEL_ARCH_INSTRUCTIONS_RETIRED,
> + INTEL_ARCH_REFERENCE_CYCLES,
> + INTEL_ARCH_LLC_REFERENCES,
> + INTEL_ARCH_LLC_MISSES,
> + INTEL_ARCH_BRANCHES_RETIRED,
> + INTEL_ARCH_BRANCHES_MISPREDICTED,
> + NR_INTEL_ARCH_EVENTS,
> +};
> +
> +enum amd_pmu_k7_events {
> + AMD_ZEN_CORE_CYCLES,
> + AMD_ZEN_INSTRUCTIONS,
> + AMD_ZEN_BRANCHES,
> + AMD_ZEN_BRANCH_MISSES,
> + NR_AMD_ARCH_EVENTS,
> +};
> +
> +extern const uint64_t intel_pmu_arch_events[];
> +extern const uint64_t amd_pmu_arch_events[];

AMD doesn't define *any* architectural events. Perhaps
amd_pmu_zen_events[], though who knows what Zen5 and beyond will
bring?

> +extern const int intel_pmu_fixed_pmc_events[];
> +
> +#endif /* SELFTEST_KVM_PMU_H */
> diff --git a/tools/testing/selftests/kvm/lib/pmu.c b/tools/testing/selftests/kvm/lib/pmu.c
> new file mode 100644
> index 000000000000..27a6c35f98a1
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/lib/pmu.c
> @@ -0,0 +1,28 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2023, Tencent, Inc.
> + */
> +
> +#include <stdint.h>
> +
> +#include "pmu.h"
> +
> +/* Definitions for Architectural Performance Events */
> +#define ARCH_EVENT(select, umask) (((select) & 0xff) | ((umask) & 0xff) << 8)

There's nothing architectural about this. Perhaps RAW_EVENT() for
consistency with perf?
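
(For illustration, the rename would be a straight substitution; the encoding
itself is unchanged, e.g. a minimal sketch:

#define RAW_EVENT(select, umask) (((select) & 0xff) | (((umask) & 0xff) << 8))

with callers spelling ARCH_EVENT(0x3c, 0x0) as RAW_EVENT(0x3c, 0x0), etc.)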

> +
> +const uint64_t intel_pmu_arch_events[] = {
> + [INTEL_ARCH_CPU_CYCLES] = ARCH_EVENT(0x3c, 0x0),
> + [INTEL_ARCH_INSTRUCTIONS_RETIRED] = ARCH_EVENT(0xc0, 0x0),
> + [INTEL_ARCH_REFERENCE_CYCLES] = ARCH_EVENT(0x3c, 0x1),
> + [INTEL_ARCH_LLC_REFERENCES] = ARCH_EVENT(0x2e, 0x4f),
> + [INTEL_ARCH_LLC_MISSES] = ARCH_EVENT(0x2e, 0x41),
> + [INTEL_ARCH_BRANCHES_RETIRED] = ARCH_EVENT(0xc4, 0x0),
> + [INTEL_ARCH_BRANCHES_MISPREDICTED] = ARCH_EVENT(0xc5, 0x0),

[INTEL_ARCH_TOPDOWN_SLOTS] = ARCH_EVENT(0xa4, 1),

> +};
> +
> +const uint64_t amd_pmu_arch_events[] = {
> + [AMD_ZEN_CORE_CYCLES] = ARCH_EVENT(0x76, 0x00),
> + [AMD_ZEN_INSTRUCTIONS] = ARCH_EVENT(0xc0, 0x00),
> + [AMD_ZEN_BRANCHES] = ARCH_EVENT(0xc2, 0x00),
> + [AMD_ZEN_BRANCH_MISSES] = ARCH_EVENT(0xc3, 0x00),
> +};
> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> index 283cc55597a4..b6e4f57a8651 100644
> --- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> @@ -11,31 +11,18 @@
> */
>
> #define _GNU_SOURCE /* for program_invocation_short_name */
> -#include "test_util.h"
> +
> #include "kvm_util.h"
> +#include "pmu.h"
> #include "processor.h"
> -
> -/*
> - * In lieu of copying perf_event.h into tools...
> - */
> -#define ARCH_PERFMON_EVENTSEL_OS (1ULL << 17)
> -#define ARCH_PERFMON_EVENTSEL_ENABLE (1ULL << 22)
> -
> -/* End of stuff taken from perf_event.h. */
> -
> -/* Oddly, this isn't in perf_event.h. */
> -#define ARCH_PERFMON_BRANCHES_RETIRED 5
> +#include "test_util.h"
>
> #define NUM_BRANCHES 42
> -#define INTEL_PMC_IDX_FIXED 32
> -
> -/* Matches KVM_PMU_EVENT_FILTER_MAX_EVENTS in pmu.c */
> -#define MAX_FILTER_EVENTS 300
> #define MAX_TEST_EVENTS 10
>
> #define PMU_EVENT_FILTER_INVALID_ACTION (KVM_PMU_EVENT_DENY + 1)
> #define PMU_EVENT_FILTER_INVALID_FLAGS (KVM_PMU_EVENT_FLAGS_VALID_MASK << 1)
> -#define PMU_EVENT_FILTER_INVALID_NEVENTS (MAX_FILTER_EVENTS + 1)
> +#define PMU_EVENT_FILTER_INVALID_NEVENTS (KVM_PMU_EVENT_FILTER_MAX_EVENTS + 1)
>
> /*
> * This is how the event selector and unit mask are stored in an AMD
> @@ -63,7 +50,6 @@
>
> #define AMD_ZEN_BR_RETIRED EVENT(0xc2, 0)

Now AMD_ZEN_BRANCHES, above?

>
> -
> /*
> * "Retired instructions", from Processor Programming Reference
> * (PPR) for AMD Family 17h Model 01h, Revision B1 Processors,
> @@ -84,7 +70,7 @@ struct __kvm_pmu_event_filter {
> __u32 fixed_counter_bitmap;
> __u32 flags;
> __u32 pad[4];
> - __u64 events[MAX_FILTER_EVENTS];
> + __u64 events[KVM_PMU_EVENT_FILTER_MAX_EVENTS];
> };
>
> /*
> @@ -729,14 +715,14 @@ static void add_dummy_events(uint64_t *events, int nevents)
>
> static void test_masked_events(struct kvm_vcpu *vcpu)
> {
> - int nevents = MAX_FILTER_EVENTS - MAX_TEST_EVENTS;
> - uint64_t events[MAX_FILTER_EVENTS];
> + int nevents = KVM_PMU_EVENT_FILTER_MAX_EVENTS - MAX_TEST_EVENTS;
> + uint64_t events[KVM_PMU_EVENT_FILTER_MAX_EVENTS];
>
> /* Run the test cases against a sparse PMU event filter. */
> run_masked_events_tests(vcpu, events, 0);
>
> /* Run the test cases against a dense PMU event filter. */
> - add_dummy_events(events, MAX_FILTER_EVENTS);
> + add_dummy_events(events, KVM_PMU_EVENT_FILTER_MAX_EVENTS);
> run_masked_events_tests(vcpu, events, nevents);
> }
>
> @@ -818,7 +804,7 @@ static void intel_run_fixed_counter_guest_code(uint8_t fixed_ctr_idx)
> /* Only OS_EN bit is enabled for fixed counter[idx]. */
> wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, BIT_ULL(4 * fixed_ctr_idx));
> wrmsr(MSR_CORE_PERF_GLOBAL_CTRL,
> - BIT_ULL(INTEL_PMC_IDX_FIXED + fixed_ctr_idx));
> + BIT_ULL(PMC_IDX_FIXED + fixed_ctr_idx));
> __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
> wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
>
> --
> 2.42.0.869.gea05f2083d-goog
>

2023-11-04 13:29:59

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v6 10/20] KVM: selftests: Test Intel PMU architectural events on gp counters

On Fri, Nov 3, 2023 at 5:03 PM Sean Christopherson <[email protected]> wrote:
>
> From: Jinrong Liang <[email protected]>
>
> Add test cases to verify that Intel's Architectural PMU events work as
> expected when they are (un)available according to guest CPUID. Iterate
> over a range of sane PMU versions, with and without full-width writes
> enabled, and over interesting combinations of lengths/masks for the bit
> vector that enumerates unavailable events.
>
> Test up to vPMU version 5, i.e. the current architectural max. KVM only
> officially supports up to version 2, but the behavior of the counters is
> backwards compatible, i.e. KVM shouldn't do something completely different
> for a higher, architecturally-defined vPMU version. Verify KVM behavior
> against the effective vPMU version, e.g. advertising vPMU 5 when KVM only
> supports vPMU 2 shouldn't magically unlock vPMU 5 features.
>
> According to Intel SDM, the number of architectural events is reported
> through CPUID.0AH:EAX[31:24] and the architectural event x is supported
> if EBX[x]=0 && EAX[31:24]>x. Note, KVM's ABI is that unavailable events
> do not count, even though strictly speaking that's not required by the
> SDM (the behavior is effectively undefined).
>
> Handcode the entirety of the measured section so that the test can
> precisely assert on the number of instructions and branches retired.
>
> Co-developed-by: Like Xu <[email protected]>
> Signed-off-by: Like Xu <[email protected]>
> Signed-off-by: Jinrong Liang <[email protected]>
> Co-developed-by: Sean Christopherson <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> tools/testing/selftests/kvm/Makefile | 1 +
> .../selftests/kvm/x86_64/pmu_counters_test.c | 321 ++++++++++++++++++
> 2 files changed, 322 insertions(+)
> create mode 100644 tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
>
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 44d8d022b023..09f5d6fe84de 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -91,6 +91,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
> TEST_GEN_PROGS_x86_64 += x86_64/monitor_mwait_test
> TEST_GEN_PROGS_x86_64 += x86_64/nested_exceptions_test
> TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test
> +TEST_GEN_PROGS_x86_64 += x86_64/pmu_counters_test
> TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
> TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id
> TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test
> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> new file mode 100644
> index 000000000000..dd9a7864410c
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> @@ -0,0 +1,321 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2023, Tencent, Inc.
> + */
> +
> +#define _GNU_SOURCE /* for program_invocation_short_name */
> +#include <x86intrin.h>
> +
> +#include "pmu.h"
> +#include "processor.h"
> +
> +/* Number of LOOP instructions for the guest measurement payload. */
> +#define NUM_BRANCHES 10
> +/*
> + * Number of "extra" instructions that will be counted, i.e. the number of
> + * instructions that are needed to set up the loop and then disable the
> + * counter. 2 MOV, 2 XOR, 1 WRMSR.
> + */
> +#define NUM_EXTRA_INSNS 5
> +#define NUM_INSNS_RETIRED (NUM_BRANCHES + NUM_EXTRA_INSNS)
> +
> +static uint8_t kvm_pmu_version;
> +static bool kvm_has_perf_caps;
> +
> +static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
> + void *guest_code,
> + uint8_t pmu_version,
> + uint64_t perf_capabilities)
> +{
> + struct kvm_vm *vm;
> +
> + vm = vm_create_with_one_vcpu(vcpu, guest_code);
> + vm_init_descriptor_tables(vm);
> + vcpu_init_descriptor_tables(*vcpu);
> +
> + sync_global_to_guest(vm, kvm_pmu_version);
> +
> + /*
> + * Set PERF_CAPABILITIES before PMU version as KVM disallows enabling
> + * features via PERF_CAPABILITIES if the guest doesn't have a vPMU.
> + */
> + if (kvm_has_perf_caps)
> + vcpu_set_msr(*vcpu, MSR_IA32_PERF_CAPABILITIES, perf_capabilities);
> +
> + vcpu_set_cpuid_property(*vcpu, X86_PROPERTY_PMU_VERSION, pmu_version);
> + return vm;
> +}
> +
> +static void run_vcpu(struct kvm_vcpu *vcpu)
> +{
> + struct ucall uc;
> +
> + do {
> + vcpu_run(vcpu);
> + switch (get_ucall(vcpu, &uc)) {
> + case UCALL_SYNC:
> + break;
> + case UCALL_ABORT:
> + REPORT_GUEST_ASSERT(uc);
> + break;
> + case UCALL_PRINTF:
> + pr_info("%s", uc.buffer);
> + break;
> + case UCALL_DONE:
> + break;
> + default:
> + TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
> + }
> + } while (uc.cmd != UCALL_DONE);
> +}
> +
> +static uint8_t guest_get_pmu_version(void)
> +{
> + /*
> + * Return the effective PMU version, i.e. the minimum between what KVM
> + * supports and what is enumerated to the guest. The host deliberately
> + * advertises a PMU version to the guest beyond what is actually
> + * supported by KVM to verify KVM doesn't freak out and do something
> + * bizarre with an architecturally valid, but unsupported, version.
> + */
> + return min_t(uint8_t, kvm_pmu_version, this_cpu_property(X86_PROPERTY_PMU_VERSION));
> +}
> +
> +/*
> + * If an architectural event is supported and guaranteed to generate at least
> + * one "hit, assert that its count is non-zero. If an event isn't supported or
> + * the test can't guarantee the associated action will occur, then all bets are
> + * off regarding the count, i.e. no checks can be done.
> + *
> + * Sanity check that in all cases, the event doesn't count when it's disabled,
> + * and that KVM correctly emulates the write of an arbitrary value.
> + */
> +static void guest_assert_event_count(uint8_t idx,
> + struct kvm_x86_pmu_feature event,
> + uint32_t pmc, uint32_t pmc_msr)
> +{
> + uint64_t count;
> +
> + count = _rdpmc(pmc);
> + if (!this_pmu_has(event))
> + goto sanity_checks;
> +
> + switch (idx) {
> + case INTEL_ARCH_INSTRUCTIONS_RETIRED:
> + GUEST_ASSERT_EQ(count, NUM_INSNS_RETIRED);
> + break;
> + case INTEL_ARCH_BRANCHES_RETIRED:
> + GUEST_ASSERT_EQ(count, NUM_BRANCHES);
> + break;
> + case INTEL_ARCH_CPU_CYCLES:
> + case INTEL_ARCH_REFERENCE_CYCLES:
> + GUEST_ASSERT_NE(count, 0);
> + break;
> + default:
> + break;
> + }
> +
> +sanity_checks:
> + __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
> + GUEST_ASSERT_EQ(_rdpmc(pmc), count);
> +
> + wrmsr(pmc_msr, 0xdead);
> + GUEST_ASSERT_EQ(_rdpmc(pmc), 0xdead);
> +}
> +
> +static void __guest_test_arch_event(uint8_t idx, struct kvm_x86_pmu_feature event,
> + uint32_t pmc, uint32_t pmc_msr,
> + uint32_t ctrl_msr, uint64_t ctrl_msr_value)
> +{
> + wrmsr(pmc_msr, 0);
> +
> + /*
> + * Enable and disable the PMC in a monolithic asm blob to ensure that
> + * the compiler can't insert _any_ code into the measured sequence.
> + * Note, ECX doesn't need to be clobbered as the input value, @pmc_msr,
> + * is restored before the end of the sequence.
> + */
> + __asm__ __volatile__("wrmsr\n\t"
> + "mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t"
> + "loop .\n\t"
> + "mov %%edi, %%ecx\n\t"
> + "xor %%eax, %%eax\n\t"
> + "xor %%edx, %%edx\n\t"
> + "wrmsr\n\t"
> + :: "a"((uint32_t)ctrl_msr_value),
> + "d"(ctrl_msr_value >> 32),
> + "c"(ctrl_msr), "D"(ctrl_msr)
> + );
> +
> + guest_assert_event_count(idx, event, pmc, pmc_msr);
> +}
> +
> +static void guest_test_arch_event(uint8_t idx)
> +{
> + const struct {
> + struct kvm_x86_pmu_feature gp_event;
> + } intel_event_to_feature[] = {
> + [INTEL_ARCH_CPU_CYCLES] = { X86_PMU_FEATURE_CPU_CYCLES },
> + [INTEL_ARCH_INSTRUCTIONS_RETIRED] = { X86_PMU_FEATURE_INSNS_RETIRED },
> + [INTEL_ARCH_REFERENCE_CYCLES] = { X86_PMU_FEATURE_REFERENCE_CYCLES },
> + [INTEL_ARCH_LLC_REFERENCES] = { X86_PMU_FEATURE_LLC_REFERENCES },
> + [INTEL_ARCH_LLC_MISSES] = { X86_PMU_FEATURE_LLC_MISSES },
> + [INTEL_ARCH_BRANCHES_RETIRED] = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED },
> + [INTEL_ARCH_BRANCHES_MISPREDICTED] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED },
> + };
> +
> + uint32_t nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
> + uint32_t pmu_version = guest_get_pmu_version();
> + /* PERF_GLOBAL_CTRL exists only for Architectural PMU Version 2+. */
> + bool guest_has_perf_global_ctrl = pmu_version >= 2;
> + struct kvm_x86_pmu_feature gp_event;
> + uint32_t base_pmc_msr;
> + unsigned int i;
> +
> + /* The host side shouldn't invoke this without a guest PMU. */
> + GUEST_ASSERT(pmu_version);
> +
> + if (this_cpu_has(X86_FEATURE_PDCM) &&
> + rdmsr(MSR_IA32_PERF_CAPABILITIES) & PMU_CAP_FW_WRITES)
> + base_pmc_msr = MSR_IA32_PMC0;
> + else
> + base_pmc_msr = MSR_IA32_PERFCTR0;
> +
> + gp_event = intel_event_to_feature[idx].gp_event;
> + GUEST_ASSERT_EQ(idx, gp_event.f.bit);
> +
> + GUEST_ASSERT(nr_gp_counters);
> +
> + for (i = 0; i < nr_gp_counters; i++) {
> + uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
> + ARCH_PERFMON_EVENTSEL_ENABLE |
> + intel_pmu_arch_events[idx];
> +
> + wrmsr(MSR_P6_EVNTSEL0 + i, 0);
> + if (guest_has_perf_global_ctrl)
> + wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, BIT_ULL(i));
> +
> + __guest_test_arch_event(idx, gp_event, i, base_pmc_msr + i,
> + MSR_P6_EVNTSEL0 + i, eventsel);
> + }
> +}
> +
> +static void guest_test_arch_events(void)
> +{
> + uint8_t i;
> +
> + for (i = 0; i < NR_INTEL_ARCH_EVENTS; i++)
> + guest_test_arch_event(i);
> +
> + GUEST_DONE();
> +}
> +
> +static void test_arch_events(uint8_t pmu_version, uint64_t perf_capabilities,
> + uint8_t length, uint32_t unavailable_mask)
> +{
> + struct kvm_vcpu *vcpu;
> + struct kvm_vm *vm;
> +
> + /* Testing arch events requires a vPMU (there are no negative tests). */
> + if (!pmu_version)
> + return;
> +
> + vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_test_arch_events,
> + pmu_version, perf_capabilities);
> +
> + vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH,
> + length);
> + vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EVENTS_MASK,
> + unavailable_mask);
> +
> + run_vcpu(vcpu);
> +
> + kvm_vm_free(vm);
> +}
> +
> +static void test_intel_counters(void)
> +{
> + uint8_t nr_arch_events = kvm_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
> + uint8_t pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
> + unsigned int i;
> + uint8_t v, j;
> + uint32_t k;
> +
> + const uint64_t perf_caps[] = {
> + 0,
> + PMU_CAP_FW_WRITES,
> + };
> +
> + /*
> + * Test up to PMU v5, which is the current maximum version defined by
> + * Intel, i.e. is the last version that is guaranteed to be backwards
> + * compatible with KVM's existing behavior.
> + */
> + uint8_t max_pmu_version = max_t(typeof(pmu_version), pmu_version, 5);
> +
> + /*
> + * Verify that KVM is sanitizing the architectural events, i.e. hiding
> + * events that KVM doesn't support. This will fail any time KVM adds
> + * support for a new event, but it's worth paying that price to be able
> + * to detect KVM bugs.
> + */
> + TEST_ASSERT(nr_arch_events <= NR_INTEL_ARCH_EVENTS,
> + "KVM is either buggy, or has learned new tricks (length = %u, mask = %x)",
> + nr_arch_events, kvm_cpu_property(X86_PROPERTY_PMU_EVENTS_MASK));

As stated earlier in this series, KVM doesn't have to do anything when
a new architectural event is defined, so this should just say
something like, "New architectural event(s); please update this
test."

> + /*
> + * Force iterating over known arch events regardless of whether or not
> + * KVM/hardware supports a given event.
> + */
> + nr_arch_events = max_t(typeof(nr_arch_events), nr_arch_events, NR_INTEL_ARCH_EVENTS);
> +
> + for (v = 0; v <= max_pmu_version; v++) {
> + for (i = 0; i < ARRAY_SIZE(perf_caps); i++) {
> + if (!kvm_has_perf_caps && perf_caps[i])
> + continue;
> +
> + pr_info("Testing arch events, PMU version %u, perf_caps = %lx\n",
> + v, perf_caps[i]);
> + /*
> + * To keep the total runtime reasonable, test every
> + * possible non-zero, non-reserved bitmap combination
> + * only with the native PMU version and the full bit
> + * vector length.
> + */
> + if (v == pmu_version) {
> + for (k = 1; k < (BIT(nr_arch_events) - 1); k++)
> + test_arch_events(v, perf_caps[i], nr_arch_events, k);
> + }
> + /*
> +			 * Test single bits for all PMU versions and lengths up
> +			 * to the number of events +1 (to verify KVM doesn't do
> +			 * weird things if the guest length is greater than the
> +			 * host length). Explicitly test a mask of '0' and all
> +			 * ones, i.e. all events being available and unavailable.
> + */
> + for (j = 0; j <= nr_arch_events + 1; j++) {
> + test_arch_events(v, perf_caps[i], j, 0);
> + test_arch_events(v, perf_caps[i], j, -1u);
> +
> + for (k = 0; k < nr_arch_events; k++)
> + test_arch_events(v, perf_caps[i], j, BIT(k));
> + }
> + }
> + }
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
> +
> + TEST_REQUIRE(host_cpu_is_intel);
> + TEST_REQUIRE(kvm_cpu_has_p(X86_PROPERTY_PMU_VERSION));
> + TEST_REQUIRE(kvm_cpu_property(X86_PROPERTY_PMU_VERSION) > 0);
> +
> + kvm_pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
> + kvm_has_perf_caps = kvm_cpu_has(X86_FEATURE_PDCM);
> +
> + test_intel_counters();
> +
> + return 0;
> +}
> --
> 2.42.0.869.gea05f2083d-goog
>

2023-11-04 13:48:06

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v6 11/20] KVM: selftests: Test Intel PMU architectural events on fixed counters

On Fri, Nov 3, 2023 at 5:03 PM Sean Christopherson <[email protected]> wrote:
>
> From: Jinrong Liang <[email protected]>
>
> Extend the PMU counters test to validate architectural events using fixed
> counters. The core logic is largely the same, the biggest difference
> being that if a fixed counter exists, its associated event is available
> (the SDM doesn't explicitly state this to be true, but it's KVM's ABI and
> letting software program a fixed counter that doesn't actually count would
> be quite bizarre).
>
> Note, fixed counters rely on PERF_GLOBAL_CTRL.
>
> Co-developed-by: Like Xu <[email protected]>
> Signed-off-by: Like Xu <[email protected]>
> Signed-off-by: Jinrong Liang <[email protected]>
> Co-developed-by: Sean Christopherson <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>

Reviewed-by: Jim Mattson <[email protected]>

> ---
> .../selftests/kvm/x86_64/pmu_counters_test.c | 53 ++++++++++++++++---
> 1 file changed, 45 insertions(+), 8 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> index dd9a7864410c..4d3a5c94b8ba 100644
> --- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> @@ -150,25 +150,46 @@ static void __guest_test_arch_event(uint8_t idx, struct kvm_x86_pmu_feature even
> guest_assert_event_count(idx, event, pmc, pmc_msr);
> }
>
> +#define X86_PMU_FEATURE_NULL \
> +({ \
> + struct kvm_x86_pmu_feature feature = {}; \
> + \
> + feature; \
> +})
> +
> +static bool pmu_is_null_feature(struct kvm_x86_pmu_feature event)
> +{
> + return !(*(u64 *)&event);
> +}
> +
> static void guest_test_arch_event(uint8_t idx)
> {
> const struct {
> struct kvm_x86_pmu_feature gp_event;
> + struct kvm_x86_pmu_feature fixed_event;
> } intel_event_to_feature[] = {
> - [INTEL_ARCH_CPU_CYCLES] = { X86_PMU_FEATURE_CPU_CYCLES },
> - [INTEL_ARCH_INSTRUCTIONS_RETIRED] = { X86_PMU_FEATURE_INSNS_RETIRED },
> - [INTEL_ARCH_REFERENCE_CYCLES] = { X86_PMU_FEATURE_REFERENCE_CYCLES },
> - [INTEL_ARCH_LLC_REFERENCES] = { X86_PMU_FEATURE_LLC_REFERENCES },
> - [INTEL_ARCH_LLC_MISSES] = { X86_PMU_FEATURE_LLC_MISSES },
> - [INTEL_ARCH_BRANCHES_RETIRED] = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED },
> - [INTEL_ARCH_BRANCHES_MISPREDICTED] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED },
> + [INTEL_ARCH_CPU_CYCLES] = { X86_PMU_FEATURE_CPU_CYCLES, X86_PMU_FEATURE_CPU_CYCLES_FIXED },
> + [INTEL_ARCH_INSTRUCTIONS_RETIRED] = { X86_PMU_FEATURE_INSNS_RETIRED, X86_PMU_FEATURE_INSNS_RETIRED_FIXED },
> + /*
> + * Note, the fixed counter for reference cycles is NOT the same
> + * as the general purpose architectural event (because the GP
> + * event is garbage). The fixed counter explicitly counts at
> + * the same frequency as the TSC, whereas the GP event counts
> + * at a fixed, but uarch specific, frequency. Bundle them here
> + * for simplicity.
> + */

Implementation-specific is not necessarily garbage, though it would be
nice if there was a way to query the frequency rather than calibrating
against another clock.
Note that tools/perf/pmu-events/arch/x86/*/pipeline.json does
typically indicate the {0x3c, 1} frequency for the CPU in question.

> + [INTEL_ARCH_REFERENCE_CYCLES] = { X86_PMU_FEATURE_REFERENCE_CYCLES, X86_PMU_FEATURE_REFERENCE_CYCLES_FIXED },
> + [INTEL_ARCH_LLC_REFERENCES] = { X86_PMU_FEATURE_LLC_REFERENCES, X86_PMU_FEATURE_NULL },
> + [INTEL_ARCH_LLC_MISSES] = { X86_PMU_FEATURE_LLC_MISSES, X86_PMU_FEATURE_NULL },
> + [INTEL_ARCH_BRANCHES_RETIRED] = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED, X86_PMU_FEATURE_NULL },
> + [INTEL_ARCH_BRANCHES_MISPREDICTED] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED, X86_PMU_FEATURE_NULL },
> };
>
> uint32_t nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
> uint32_t pmu_version = guest_get_pmu_version();
> /* PERF_GLOBAL_CTRL exists only for Architectural PMU Version 2+. */
> bool guest_has_perf_global_ctrl = pmu_version >= 2;
> - struct kvm_x86_pmu_feature gp_event;
> + struct kvm_x86_pmu_feature gp_event, fixed_event;
> uint32_t base_pmc_msr;
> unsigned int i;
>
> @@ -198,6 +219,22 @@ static void guest_test_arch_event(uint8_t idx)
> __guest_test_arch_event(idx, gp_event, i, base_pmc_msr + i,
> MSR_P6_EVNTSEL0 + i, eventsel);
> }
> +
> + if (!guest_has_perf_global_ctrl)
> + return;
> +
> + fixed_event = intel_event_to_feature[idx].fixed_event;
> + if (pmu_is_null_feature(fixed_event) || !this_pmu_has(fixed_event))
> + return;
> +
> + i = fixed_event.f.bit;
> +
> + wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, BIT_ULL(4 * i));
> +
> + __guest_test_arch_event(idx, fixed_event, PMC_FIXED_RDPMC_BASE | i,
> + MSR_CORE_PERF_FIXED_CTR0 + i,
> + MSR_CORE_PERF_GLOBAL_CTRL,
> + BIT_ULL(PMC_IDX_FIXED + i));
> }
>
> static void guest_test_arch_events(void)
> --
> 2.42.0.869.gea05f2083d-goog
>

2023-11-06 07:21:03

by Jinrong Liang

[permalink] [raw]
Subject: Re: [PATCH v6 09/20] KVM: selftests: Add pmu.h and lib/pmu.c for common PMU assets

On 2023/11/4 21:20, Jim Mattson wrote:
> On Fri, Nov 3, 2023 at 5:02 PM Sean Christopherson <[email protected]> wrote:
>>
>> From: Jinrong Liang <[email protected]>
>>
>> By defining the PMU performance events and masks relevant for x86 in
>> the new pmu.h and pmu.c, it becomes easier to reference them, minimizing
>> potential errors in code that handles these values.
>>
>> Clean up pmu_event_filter_test.c by including pmu.h and removing
>> unnecessary macros.
>>
>> Suggested-by: Sean Christopherson <[email protected]>
>> Signed-off-by: Jinrong Liang <[email protected]>
>> [sean: drop PSEUDO_ARCH_REFERENCE_CYCLES]
>> Signed-off-by: Sean Christopherson <[email protected]>
>> ---
>> tools/testing/selftests/kvm/Makefile | 1 +
>> tools/testing/selftests/kvm/include/pmu.h | 84 +++++++++++++++++++
>> tools/testing/selftests/kvm/lib/pmu.c | 28 +++++++
>> .../kvm/x86_64/pmu_event_filter_test.c | 32 ++-----
>> 4 files changed, 122 insertions(+), 23 deletions(-)
>> create mode 100644 tools/testing/selftests/kvm/include/pmu.h
>> create mode 100644 tools/testing/selftests/kvm/lib/pmu.c
>>
>> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
>> index a5963ab9215b..44d8d022b023 100644
>> --- a/tools/testing/selftests/kvm/Makefile
>> +++ b/tools/testing/selftests/kvm/Makefile
>> @@ -32,6 +32,7 @@ LIBKVM += lib/guest_modes.c
>> LIBKVM += lib/io.c
>> LIBKVM += lib/kvm_util.c
>> LIBKVM += lib/memstress.c
>> +LIBKVM += lib/pmu.c
>> LIBKVM += lib/guest_sprintf.c
>> LIBKVM += lib/rbtree.c
>> LIBKVM += lib/sparsebit.c
>> diff --git a/tools/testing/selftests/kvm/include/pmu.h b/tools/testing/selftests/kvm/include/pmu.h
>> new file mode 100644
>> index 000000000000..987602c62b51
>> --- /dev/null
>> +++ b/tools/testing/selftests/kvm/include/pmu.h
>> @@ -0,0 +1,84 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Copyright (C) 2023, Tencent, Inc.
>> + */
>> +#ifndef SELFTEST_KVM_PMU_H
>> +#define SELFTEST_KVM_PMU_H
>> +
>> +#include <stdint.h>
>> +
>> +#define X86_PMC_IDX_MAX 64
>> +#define INTEL_PMC_MAX_GENERIC 32
>
> I think this is actually 15. Note that IA32_PMC0 through IA32_PMC7
> have MSR indices from 0xc1 through 0xc8, and MSR 0xcf is
> IA32_CORE_CAPABILITIES. At the very least, we have to handle
> non-contiguous MSR indices if we ever go beyond IA32_PMC14.
>
>> +#define KVM_PMU_EVENT_FILTER_MAX_EVENTS 300
>> +
>> +#define GP_COUNTER_NR_OFS_BIT 8
>> +#define EVENT_LENGTH_OFS_BIT 24
>> +
>> +#define PMU_VERSION_MASK GENMASK_ULL(7, 0)
>> +#define EVENT_LENGTH_MASK GENMASK_ULL(31, EVENT_LENGTH_OFS_BIT)
>> +#define GP_COUNTER_NR_MASK GENMASK_ULL(15, GP_COUNTER_NR_OFS_BIT)
>> +#define FIXED_COUNTER_NR_MASK GENMASK_ULL(4, 0)
>> +
>> +#define ARCH_PERFMON_EVENTSEL_EVENT GENMASK_ULL(7, 0)
>> +#define ARCH_PERFMON_EVENTSEL_UMASK GENMASK_ULL(15, 8)
>> +#define ARCH_PERFMON_EVENTSEL_USR BIT_ULL(16)
>> +#define ARCH_PERFMON_EVENTSEL_OS BIT_ULL(17)
>> +#define ARCH_PERFMON_EVENTSEL_EDGE BIT_ULL(18)
>> +#define ARCH_PERFMON_EVENTSEL_PIN_CONTROL BIT_ULL(19)
>> +#define ARCH_PERFMON_EVENTSEL_INT BIT_ULL(20)
>> +#define ARCH_PERFMON_EVENTSEL_ANY BIT_ULL(21)
>> +#define ARCH_PERFMON_EVENTSEL_ENABLE BIT_ULL(22)
>> +#define ARCH_PERFMON_EVENTSEL_INV BIT_ULL(23)
>> +#define ARCH_PERFMON_EVENTSEL_CMASK GENMASK_ULL(31, 24)
>> +
>> +#define PMC_MAX_FIXED 16
>> +#define PMC_IDX_FIXED 32
>> +
>> +/* RDPMC offset for Fixed PMCs */
>> +#define PMC_FIXED_RDPMC_BASE BIT_ULL(30)
>> +#define PMC_FIXED_RDPMC_METRICS BIT_ULL(29)
>> +
>> +#define FIXED_BITS_MASK 0xFULL
>> +#define FIXED_BITS_STRIDE 4
>> +#define FIXED_0_KERNEL BIT_ULL(0)
>> +#define FIXED_0_USER BIT_ULL(1)
>> +#define FIXED_0_ANYTHREAD BIT_ULL(2)
>> +#define FIXED_0_ENABLE_PMI BIT_ULL(3)
>> +
>> +#define fixed_bits_by_idx(_idx, _bits) \
>> + ((_bits) << ((_idx) * FIXED_BITS_STRIDE))
>> +
>> +#define AMD64_NR_COUNTERS 4
>> +#define AMD64_NR_COUNTERS_CORE 6
>> +
>> +#define PMU_CAP_FW_WRITES BIT_ULL(13)
>> +#define PMU_CAP_LBR_FMT 0x3f
>> +
>> +enum intel_pmu_architectural_events {
>> + /*
>> + * The order of the architectural events matters as support for each
>> + * event is enumerated via CPUID using the index of the event.
>> + */
>> + INTEL_ARCH_CPU_CYCLES,
>> + INTEL_ARCH_INSTRUCTIONS_RETIRED,
>> + INTEL_ARCH_REFERENCE_CYCLES,
>> + INTEL_ARCH_LLC_REFERENCES,
>> + INTEL_ARCH_LLC_MISSES,
>> + INTEL_ARCH_BRANCHES_RETIRED,
>> + INTEL_ARCH_BRANCHES_MISPREDICTED,
>> + NR_INTEL_ARCH_EVENTS,
>> +};
>> +
>> +enum amd_pmu_k7_events {
>> + AMD_ZEN_CORE_CYCLES,
>> + AMD_ZEN_INSTRUCTIONS,
>> + AMD_ZEN_BRANCHES,
>> + AMD_ZEN_BRANCH_MISSES,
>> + NR_AMD_ARCH_EVENTS,
>> +};
>> +
>> +extern const uint64_t intel_pmu_arch_events[];
>> +extern const uint64_t amd_pmu_arch_events[];
>
> AMD doesn't define *any* architectural events. Perhaps
> amd_pmu_zen_events[], though who knows what Zen5 and beyond will
> bring?
>
>> +extern const int intel_pmu_fixed_pmc_events[];
>> +
>> +#endif /* SELFTEST_KVM_PMU_H */
>> diff --git a/tools/testing/selftests/kvm/lib/pmu.c b/tools/testing/selftests/kvm/lib/pmu.c
>> new file mode 100644
>> index 000000000000..27a6c35f98a1
>> --- /dev/null
>> +++ b/tools/testing/selftests/kvm/lib/pmu.c
>> @@ -0,0 +1,28 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright (C) 2023, Tencent, Inc.
>> + */
>> +
>> +#include <stdint.h>
>> +
>> +#include "pmu.h"
>> +
>> +/* Definitions for Architectural Performance Events */
>> +#define ARCH_EVENT(select, umask) (((select) & 0xff) | ((umask) & 0xff) << 8)
>
> There's nothing architectural about this. Perhaps RAW_EVENT() for
> consistency with perf?
>
>> +
>> +const uint64_t intel_pmu_arch_events[] = {
>> + [INTEL_ARCH_CPU_CYCLES] = ARCH_EVENT(0x3c, 0x0),
>> + [INTEL_ARCH_INSTRUCTIONS_RETIRED] = ARCH_EVENT(0xc0, 0x0),
>> + [INTEL_ARCH_REFERENCE_CYCLES] = ARCH_EVENT(0x3c, 0x1),
>> + [INTEL_ARCH_LLC_REFERENCES] = ARCH_EVENT(0x2e, 0x4f),
>> + [INTEL_ARCH_LLC_MISSES] = ARCH_EVENT(0x2e, 0x41),
>> + [INTEL_ARCH_BRANCHES_RETIRED] = ARCH_EVENT(0xc4, 0x0),
>> + [INTEL_ARCH_BRANCHES_MISPREDICTED] = ARCH_EVENT(0xc5, 0x0),
>
> [INTEL_ARCH_TOPDOWN_SLOTS] = ARCH_EVENT(0xa4, 1),
>
>> +};
>> +
>> +const uint64_t amd_pmu_arch_events[] = {
>> + [AMD_ZEN_CORE_CYCLES] = ARCH_EVENT(0x76, 0x00),
>> + [AMD_ZEN_INSTRUCTIONS] = ARCH_EVENT(0xc0, 0x00),
>> + [AMD_ZEN_BRANCHES] = ARCH_EVENT(0xc2, 0x00),
>> + [AMD_ZEN_BRANCH_MISSES] = ARCH_EVENT(0xc3, 0x00),
>> +};
>> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
>> index 283cc55597a4..b6e4f57a8651 100644
>> --- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
>> +++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
>> @@ -11,31 +11,18 @@
>> */
>>
>> #define _GNU_SOURCE /* for program_invocation_short_name */
>> -#include "test_util.h"
>> +
>> #include "kvm_util.h"
>> +#include "pmu.h"
>> #include "processor.h"
>> -
>> -/*
>> - * In lieu of copying perf_event.h into tools...
>> - */
>> -#define ARCH_PERFMON_EVENTSEL_OS (1ULL << 17)
>> -#define ARCH_PERFMON_EVENTSEL_ENABLE (1ULL << 22)
>> -
>> -/* End of stuff taken from perf_event.h. */
>> -
>> -/* Oddly, this isn't in perf_event.h. */
>> -#define ARCH_PERFMON_BRANCHES_RETIRED 5
>> +#include "test_util.h"
>>
>> #define NUM_BRANCHES 42
>> -#define INTEL_PMC_IDX_FIXED 32
>> -
>> -/* Matches KVM_PMU_EVENT_FILTER_MAX_EVENTS in pmu.c */
>> -#define MAX_FILTER_EVENTS 300
>> #define MAX_TEST_EVENTS 10
>>
>> #define PMU_EVENT_FILTER_INVALID_ACTION (KVM_PMU_EVENT_DENY + 1)
>> #define PMU_EVENT_FILTER_INVALID_FLAGS (KVM_PMU_EVENT_FLAGS_VALID_MASK << 1)
>> -#define PMU_EVENT_FILTER_INVALID_NEVENTS (MAX_FILTER_EVENTS + 1)
>> +#define PMU_EVENT_FILTER_INVALID_NEVENTS (KVM_PMU_EVENT_FILTER_MAX_EVENTS + 1)
>>
>> /*
>> * This is how the event selector and unit mask are stored in an AMD
>> @@ -63,7 +50,6 @@
>>
>> #define AMD_ZEN_BR_RETIRED EVENT(0xc2, 0)
>
> Now AMD_ZEN_BRANCHES, above?

Yes, I forgot to replace INTEL_BR_RETIRED, AMD_ZEN_BR_RETIRED and
INST_RETIRED in pmu_event_filter_test.c and remove their macro definitions.
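
A rough sketch of that cleanup for the Intel side (call sites elided, names
taken from pmu.h above):

	/* was INTEL_BR_RETIRED and INST_RETIRED, respectively */
	uint64_t branches = intel_pmu_arch_events[INTEL_ARCH_BRANCHES_RETIRED];
	uint64_t instructions = intel_pmu_arch_events[INTEL_ARCH_INSTRUCTIONS_RETIRED];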

Thanks,

Jinrong

>
>>
>> -
>> /*
>> * "Retired instructions", from Processor Programming Reference
>> * (PPR) for AMD Family 17h Model 01h, Revision B1 Processors,
>> @@ -84,7 +70,7 @@ struct __kvm_pmu_event_filter {
>> __u32 fixed_counter_bitmap;
>> __u32 flags;
>> __u32 pad[4];
>> - __u64 events[MAX_FILTER_EVENTS];
>> + __u64 events[KVM_PMU_EVENT_FILTER_MAX_EVENTS];
>> };
>>
>> /*
>> @@ -729,14 +715,14 @@ static void add_dummy_events(uint64_t *events, int nevents)
>>
>> static void test_masked_events(struct kvm_vcpu *vcpu)
>> {
>> - int nevents = MAX_FILTER_EVENTS - MAX_TEST_EVENTS;
>> - uint64_t events[MAX_FILTER_EVENTS];
>> + int nevents = KVM_PMU_EVENT_FILTER_MAX_EVENTS - MAX_TEST_EVENTS;
>> + uint64_t events[KVM_PMU_EVENT_FILTER_MAX_EVENTS];
>>
>> /* Run the test cases against a sparse PMU event filter. */
>> run_masked_events_tests(vcpu, events, 0);
>>
>> /* Run the test cases against a dense PMU event filter. */
>> - add_dummy_events(events, MAX_FILTER_EVENTS);
>> + add_dummy_events(events, KVM_PMU_EVENT_FILTER_MAX_EVENTS);
>> run_masked_events_tests(vcpu, events, nevents);
>> }
>>
>> @@ -818,7 +804,7 @@ static void intel_run_fixed_counter_guest_code(uint8_t fixed_ctr_idx)
>> /* Only OS_EN bit is enabled for fixed counter[idx]. */
>> wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, BIT_ULL(4 * fixed_ctr_idx));
>> wrmsr(MSR_CORE_PERF_GLOBAL_CTRL,
>> - BIT_ULL(INTEL_PMC_IDX_FIXED + fixed_ctr_idx));
>> + BIT_ULL(PMC_IDX_FIXED + fixed_ctr_idx));
>> __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
>> wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
>>
>> --
>> 2.42.0.869.gea05f2083d-goog
>>

2023-11-06 15:31:28

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v6 02/20] KVM: x86/pmu: Don't enumerate support for fixed counters KVM can't virtualize

On Sat, Nov 04, 2023, Jim Mattson wrote:
> On Fri, Nov 3, 2023 at 5:02 PM Sean Christopherson <[email protected]> wrote:
> >
> > Hide fixed counters for which perf is incapable of creating the associated
> > architectural event. Except for the so called pseudo-architectural event
> > for counting TSC reference cycle, KVM virtualizes fixed counters by
> > creating a perf event for the associated general purpose architectural
> > event. If the associated event isn't supported in hardware, KVM can't
> > actually virtualize the fixed counter because perf will likely not program
> > up the correct event.
>
> Won't it? My understanding was that perf preferred to use a fixed
> counter when there was a choice of fixed or general purpose counter.
> Unless the fixed counter is already assigned to a perf_event, KVM's
> request should be satisfied by assigning the fixed counter.
>
> > Note, this issue is almost certainly limited to running KVM on a funky
> > virtual CPU model, no known real hardware has an asymmetric PMU where a
> > fixed counter is supported but the associated architectural event is not.
>
> This seems like a fix looking for a problem. Has the "problem"
> actually been encountered?

Heh, yes, I "encountered" the problem in a curated VM I created. But I completely
agree that this is unnecessary, especially since odds are very, very good that
requesting the architectural general purpose encoding will still work. E.g. in
my goofy setup, the underlying hardware does support the architectural event and
so even if perf doesn't use the fixed counter for whatever reason, the GP counter
will still count the right event.

2023-11-06 16:39:57

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v6 11/20] KVM: selftests: Test Intel PMU architectural events on fixed counters

On Sat, Nov 04, 2023, Jim Mattson wrote:
> On Fri, Nov 3, 2023 at 5:03 PM Sean Christopherson <[email protected]> wrote:
> > static void guest_test_arch_event(uint8_t idx)
> > {
> > const struct {
> > struct kvm_x86_pmu_feature gp_event;
> > + struct kvm_x86_pmu_feature fixed_event;
> > } intel_event_to_feature[] = {
> > - [INTEL_ARCH_CPU_CYCLES] = { X86_PMU_FEATURE_CPU_CYCLES },
> > - [INTEL_ARCH_INSTRUCTIONS_RETIRED] = { X86_PMU_FEATURE_INSNS_RETIRED },
> > - [INTEL_ARCH_REFERENCE_CYCLES] = { X86_PMU_FEATURE_REFERENCE_CYCLES },
> > - [INTEL_ARCH_LLC_REFERENCES] = { X86_PMU_FEATURE_LLC_REFERENCES },
> > - [INTEL_ARCH_LLC_MISSES] = { X86_PMU_FEATURE_LLC_MISSES },
> > - [INTEL_ARCH_BRANCHES_RETIRED] = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED },
> > - [INTEL_ARCH_BRANCHES_MISPREDICTED] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED },
> > + [INTEL_ARCH_CPU_CYCLES] = { X86_PMU_FEATURE_CPU_CYCLES, X86_PMU_FEATURE_CPU_CYCLES_FIXED },
> > + [INTEL_ARCH_INSTRUCTIONS_RETIRED] = { X86_PMU_FEATURE_INSNS_RETIRED, X86_PMU_FEATURE_INSNS_RETIRED_FIXED },
> > + /*
> > + * Note, the fixed counter for reference cycles is NOT the same
> > + * as the general purpose architectural event (because the GP
> > + * event is garbage). The fixed counter explicitly counts at
> > + * the same frequency as the TSC, whereas the GP event counts
> > + * at a fixed, but uarch specific, frequency. Bundle them here
> > + * for simplicity.
> > + */
>
> Implementation-specific is not necessarily garbage, though it would be
> nice if there was a way to query the frequency rather than calibrating
> against another clock.

Heh, I'll drop the editorial commentry, though I still think an architectural event
with implementation-specific behavior is garbage :-)

2023-11-06 19:01:30

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v6 06/20] KVM: selftests: Add vcpu_set_cpuid_property() to set properties

On Sat, Nov 04, 2023, Jim Mattson wrote:
> On Fri, Nov 3, 2023 at 5:02 PM Sean Christopherson <[email protected]> wrote:
> >
> > From: Jinrong Liang <[email protected]>
> >
> > Add vcpu_set_cpuid_property() helper function for setting properties, and
> > use it instead of open coding an equivalent for MAX_PHY_ADDR. Future vPMU
> > testcases will also need to stuff various CPUID properties.
> >
> > Signed-off-by: Jinrong Liang <[email protected]>
> > Co-developed-by: Sean Christopherson <[email protected]>
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > .../testing/selftests/kvm/include/x86_64/processor.h | 4 +++-
> > tools/testing/selftests/kvm/lib/x86_64/processor.c | 12 +++++++++---
> > .../kvm/x86_64/smaller_maxphyaddr_emulation_test.c | 2 +-
> > 3 files changed, 13 insertions(+), 5 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> > index 25bc61dac5fb..a01931f7d954 100644
> > --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> > +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> > @@ -994,7 +994,9 @@ static inline void vcpu_set_cpuid(struct kvm_vcpu *vcpu)
> > vcpu_ioctl(vcpu, KVM_GET_CPUID2, vcpu->cpuid);
> > }
> >
> > -void vcpu_set_cpuid_maxphyaddr(struct kvm_vcpu *vcpu, uint8_t maxphyaddr);
> > +void vcpu_set_cpuid_property(struct kvm_vcpu *vcpu,
> > + struct kvm_x86_cpu_property property,
> > + uint32_t value);
> >
> > void vcpu_clear_cpuid_entry(struct kvm_vcpu *vcpu, uint32_t function);
> > void vcpu_set_or_clear_cpuid_feature(struct kvm_vcpu *vcpu,
> > diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > index d8288374078e..9e717bc6bd6d 100644
> > --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > @@ -752,11 +752,17 @@ void vcpu_init_cpuid(struct kvm_vcpu *vcpu, const struct kvm_cpuid2 *cpuid)
> > vcpu_set_cpuid(vcpu);
> > }
> >
> > -void vcpu_set_cpuid_maxphyaddr(struct kvm_vcpu *vcpu, uint8_t maxphyaddr)
> > +void vcpu_set_cpuid_property(struct kvm_vcpu *vcpu,
> > + struct kvm_x86_cpu_property property,
> > + uint32_t value)
> > {
> > - struct kvm_cpuid_entry2 *entry = vcpu_get_cpuid_entry(vcpu, 0x80000008);
> > + struct kvm_cpuid_entry2 *entry;
> > +
> > + entry = __vcpu_get_cpuid_entry(vcpu, property.function, property.index);
> > +
> > + (&entry->eax)[property.reg] &= ~GENMASK(property.hi_bit, property.lo_bit);
> > + (&entry->eax)[property.reg] |= value << (property.lo_bit);
>
> What if 'value' is too large?
>
> Perhaps:
> value <<= property.lo_bit;
> TEST_ASSERT(!(value & ~GENMASK(property.hi_bit,
> property.lo_bit)), "value is too large");

Heh, if the mask is something like bits 31:24, this would miss the case where
shifting value would drop bits.

Rather than explicitly detecting edge cases, I think the simplest approach is to
assert that kvm_cpuid_property() reads back @value, e.g.

struct kvm_cpuid_entry2 *entry;

entry = __vcpu_get_cpuid_entry(vcpu, property.function, property.index);

(&entry->eax)[property.reg] &= ~GENMASK(property.hi_bit, property.lo_bit);
(&entry->eax)[property.reg] |= value << property.lo_bit;

vcpu_set_cpuid(vcpu);

/* Sanity check that @value doesn't exceed the bounds in any way. */
TEST_ASSERT_EQ(kvm_cpuid_property(vcpu->cpuid, property), value);

2023-11-06 19:50:44

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v6 08/20] KVM: selftests: Extend {kvm,this}_pmu_has() to support fixed counters

On Sat, Nov 04, 2023, Jim Mattson wrote:
> On Fri, Nov 3, 2023 at 5:02 PM Sean Christopherson <[email protected]> wrote:
> > +#define KVM_X86_PMU_FEATURE(__reg, __bit) \
> > +({ \
> > + struct kvm_x86_pmu_feature feature = { \
> > + .f = KVM_X86_CPU_FEATURE(0xa, 0, __reg, __bit), \
> > + }; \
> > + \
> > + kvm_static_assert(KVM_CPUID_##__reg == KVM_CPUID_EBX || \
> > + KVM_CPUID_##__reg == KVM_CPUID_ECX); \
> > + feature; \
> > })
> >
> > -#define X86_PMU_FEATURE_BRANCH_INSNS_RETIRED KVM_X86_PMU_FEATURE(5)
> > +#define X86_PMU_FEATURE_CPU_CYCLES KVM_X86_PMU_FEATURE(EBX, 0)
> > +#define X86_PMU_FEATURE_INSNS_RETIRED KVM_X86_PMU_FEATURE(EBX, 1)
> > +#define X86_PMU_FEATURE_REFERENCE_CYCLES KVM_X86_PMU_FEATURE(EBX, 2)
> > +#define X86_PMU_FEATURE_LLC_REFERENCES KVM_X86_PMU_FEATURE(EBX, 3)
> > +#define X86_PMU_FEATURE_LLC_MISSES KVM_X86_PMU_FEATURE(EBX, 4)
> > +#define X86_PMU_FEATURE_BRANCH_INSNS_RETIRED KVM_X86_PMU_FEATURE(EBX, 5)
> > +#define X86_PMU_FEATURE_BRANCHES_MISPREDICTED KVM_X86_PMU_FEATURE(EBX, 6)
>
> Why not add top down slots now?

Laziness?

> > +#define X86_PMU_FEATURE_INSNS_RETIRED_FIXED KVM_X86_PMU_FEATURE(ECX, 0)
> > +#define X86_PMU_FEATURE_CPU_CYCLES_FIXED KVM_X86_PMU_FEATURE(ECX, 1)
> > +#define X86_PMU_FEATURE_REFERENCE_CYCLES_FIXED KVM_X86_PMU_FEATURE(ECX, 2)
>
> Perhaps toss 'TSC' between CYCLES and FIXED?

I think X86_PMU_FEATURE_REFERENCE_TSC_CYCLES_FIXED is more aligned with how the
SDM (and English in general) talks about reference cycles.

> And add top down slots now>

Ya.
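
For reference, a combined sketch of both suggestions (the new names are
assumptions; the bit positions follow the architectural enumeration, i.e.
EBX bit 7 for the GP Top-Down Slots event and fixed counter 3 for the fixed
variant):

#define X86_PMU_FEATURE_TOPDOWN_SLOTS			KVM_X86_PMU_FEATURE(EBX, 7)
#define X86_PMU_FEATURE_REFERENCE_TSC_CYCLES_FIXED	KVM_X86_PMU_FEATURE(ECX, 2)
#define X86_PMU_FEATURE_TOPDOWN_SLOTS_FIXED		KVM_X86_PMU_FEATURE(ECX, 3)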

2023-11-06 20:41:15

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v6 09/20] KVM: selftests: Add pmu.h and lib/pmu.c for common PMU assets

On Mon, Nov 06, 2023, JinrongLiang wrote:
> On 2023/11/4 21:20, Jim Mattson wrote:
> > > diff --git a/tools/testing/selftests/kvm/include/pmu.h b/tools/testing/selftests/kvm/include/pmu.h
> > > new file mode 100644
> > > index 000000000000..987602c62b51
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/kvm/include/pmu.h
> > > @@ -0,0 +1,84 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +/*
> > > + * Copyright (C) 2023, Tencent, Inc.
> > > + */
> > > +#ifndef SELFTEST_KVM_PMU_H
> > > +#define SELFTEST_KVM_PMU_H
> > > +
> > > +#include <stdint.h>
> > > +
> > > +#define X86_PMC_IDX_MAX 64
> > > +#define INTEL_PMC_MAX_GENERIC 32
> >
> > I think this is actually 15. Note that IA32_PMC0 through IA32_PMC7
> > have MSR indices from 0xc1 through 0xc8, and MSR 0xcf is
> > IA32_CORE_CAPABILITIES. At the very least, we have to handle
> > non-contiguous MSR indices if we ever go beyond IA32_PMC14.

There's no reason to define this, it's not used in selftests.

> > > +#define KVM_PMU_EVENT_FILTER_MAX_EVENTS 300
> > > +
> > > +#define GP_COUNTER_NR_OFS_BIT 8
> > > +#define EVENT_LENGTH_OFS_BIT 24
> > > +
> > > +#define PMU_VERSION_MASK GENMASK_ULL(7, 0)
> > > +#define EVENT_LENGTH_MASK GENMASK_ULL(31, EVENT_LENGTH_OFS_BIT)
> > > +#define GP_COUNTER_NR_MASK GENMASK_ULL(15, GP_COUNTER_NR_OFS_BIT)
> > > +#define FIXED_COUNTER_NR_MASK GENMASK_ULL(4, 0)

These are also unneeded, they're superseded by CPUID properties.
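
E.g., the tests already pull this information straight from the CPUID
properties rather than masking CPUID.0xA output by hand:

	uint8_t pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
	uint8_t nr_arch_events = kvm_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
	uint32_t nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);	/* guest side */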

> > > +#define ARCH_PERFMON_EVENTSEL_EVENT GENMASK_ULL(7, 0)
> > > +#define ARCH_PERFMON_EVENTSEL_UMASK GENMASK_ULL(15, 8)
> > > +#define ARCH_PERFMON_EVENTSEL_USR BIT_ULL(16)
> > > +#define ARCH_PERFMON_EVENTSEL_OS BIT_ULL(17)
> > > +#define ARCH_PERFMON_EVENTSEL_EDGE BIT_ULL(18)
> > > +#define ARCH_PERFMON_EVENTSEL_PIN_CONTROL BIT_ULL(19)
> > > +#define ARCH_PERFMON_EVENTSEL_INT BIT_ULL(20)
> > > +#define ARCH_PERFMON_EVENTSEL_ANY BIT_ULL(21)
> > > +#define ARCH_PERFMON_EVENTSEL_ENABLE BIT_ULL(22)
> > > +#define ARCH_PERFMON_EVENTSEL_INV BIT_ULL(23)
> > > +#define ARCH_PERFMON_EVENTSEL_CMASK GENMASK_ULL(31, 24)
> > > +
> > > +#define PMC_MAX_FIXED 16

Also unneeded.

> > > +#define PMC_IDX_FIXED 32

This one is absolutely ridiculous. It's the shift for the enable bit in global
control, which is super obvious from the name. /s

> > > +
> > > +/* RDPMC offset for Fixed PMCs */
> > > +#define PMC_FIXED_RDPMC_BASE BIT_ULL(30)
> > > +#define PMC_FIXED_RDPMC_METRICS BIT_ULL(29)
> > > +
> > > +#define FIXED_BITS_MASK 0xFULL
> > > +#define FIXED_BITS_STRIDE 4
> > > +#define FIXED_0_KERNEL BIT_ULL(0)
> > > +#define FIXED_0_USER BIT_ULL(1)
> > > +#define FIXED_0_ANYTHREAD BIT_ULL(2)
> > > +#define FIXED_0_ENABLE_PMI BIT_ULL(3)
> > > +
> > > +#define fixed_bits_by_idx(_idx, _bits) \
> > > + ((_bits) << ((_idx) * FIXED_BITS_STRIDE))

*sigh* And now I see where the "i * 4" stuff in the new test comes from. My
plan is to redo the above as:

/* RDPMC offset for Fixed PMCs */
#define FIXED_PMC_RDPMC_METRICS BIT_ULL(29)
#define FIXED_PMC_RDPMC_BASE BIT_ULL(30)

#define FIXED_PMC_GLOBAL_CTRL_ENABLE(_idx) BIT_ULL((32 + (_idx)))

#define FIXED_PMC_KERNEL BIT_ULL(0)
#define FIXED_PMC_USER BIT_ULL(1)
#define FIXED_PMC_ANYTHREAD BIT_ULL(2)
#define FIXED_PMC_ENABLE_PMI BIT_ULL(3)
#define FIXED_PMC_NR_BITS 4
#define FIXED_PMC_CTRL(_idx, _val) ((_val) << ((_idx) * FIXED_PMC_NR_BITS))

> > > +#define AMD64_NR_COUNTERS 4
> > > +#define AMD64_NR_COUNTERS_CORE 6

These too can be dropped for now.

> > > +#define PMU_CAP_FW_WRITES BIT_ULL(13)
> > > +#define PMU_CAP_LBR_FMT 0x3f
> > > +
> > > +enum intel_pmu_architectural_events {
> > > + /*
> > > + * The order of the architectural events matters as support for each
> > > + * event is enumerated via CPUID using the index of the event.
> > > + */
> > > + INTEL_ARCH_CPU_CYCLES,
> > > + INTEL_ARCH_INSTRUCTIONS_RETIRED,
> > > + INTEL_ARCH_REFERENCE_CYCLES,
> > > + INTEL_ARCH_LLC_REFERENCES,
> > > + INTEL_ARCH_LLC_MISSES,
> > > + INTEL_ARCH_BRANCHES_RETIRED,
> > > + INTEL_ARCH_BRANCHES_MISPREDICTED,
> > > + NR_INTEL_ARCH_EVENTS,
> > > +};
> > > +
> > > +enum amd_pmu_k7_events {
> > > + AMD_ZEN_CORE_CYCLES,
> > > + AMD_ZEN_INSTRUCTIONS,
> > > + AMD_ZEN_BRANCHES,
> > > + AMD_ZEN_BRANCH_MISSES,
> > > + NR_AMD_ARCH_EVENTS,
> > > +};
> > > +
> > > +extern const uint64_t intel_pmu_arch_events[];
> > > +extern const uint64_t amd_pmu_arch_events[];
> >
> > AMD doesn't define *any* architectural events. Perhaps
> > amd_pmu_zen_events[], though who knows what Zen5 and beyond will
> > bring?
> >
> > > +extern const int intel_pmu_fixed_pmc_events[];
> > > +
> > > +#endif /* SELFTEST_KVM_PMU_H */
> > > diff --git a/tools/testing/selftests/kvm/lib/pmu.c b/tools/testing/selftests/kvm/lib/pmu.c
> > > new file mode 100644
> > > index 000000000000..27a6c35f98a1
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/kvm/lib/pmu.c
> > > @@ -0,0 +1,28 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * Copyright (C) 2023, Tencent, Inc.
> > > + */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +#include "pmu.h"
> > > +
> > > +/* Definitions for Architectural Performance Events */
> > > +#define ARCH_EVENT(select, umask) (((select) & 0xff) | ((umask) & 0xff) << 8)
> >
> > There's nothing architectural about this. Perhaps RAW_EVENT() for
> > consistency with perf?

Works for me.

> > > +const uint64_t intel_pmu_arch_events[] = {
> > > + [INTEL_ARCH_CPU_CYCLES] = ARCH_EVENT(0x3c, 0x0),
> > > + [INTEL_ARCH_INSTRUCTIONS_RETIRED] = ARCH_EVENT(0xc0, 0x0),
> > > + [INTEL_ARCH_REFERENCE_CYCLES] = ARCH_EVENT(0x3c, 0x1),
> > > + [INTEL_ARCH_LLC_REFERENCES] = ARCH_EVENT(0x2e, 0x4f),
> > > + [INTEL_ARCH_LLC_MISSES] = ARCH_EVENT(0x2e, 0x41),
> > > + [INTEL_ARCH_BRANCHES_RETIRED] = ARCH_EVENT(0xc4, 0x0),
> > > + [INTEL_ARCH_BRANCHES_MISPREDICTED] = ARCH_EVENT(0xc5, 0x0),
> >
> > [INTEL_ARCH_TOPDOWN_SLOTS] = ARCH_EVENT(0xa4, 1),

...

> > > @@ -63,7 +50,6 @@
> > >
> > > #define AMD_ZEN_BR_RETIRED EVENT(0xc2, 0)
> >
> > Now AMD_ZEN_BRANCHES, above?
>
> Yes, I forgot to replace INTEL_BR_RETIRED, AMD_ZEN_BR_RETIRED and
> INST_RETIRED in pmu_event_filter_test.c and remove their macro definitions.

Having to go through an array to get a hardcoded value is silly, e.g. it makes
it unnecessarily difficult to reference the encodings because they aren't simple
literals.

My vote is this:

#define INTEL_ARCH_CPU_CYCLES RAW_EVENT(0x3c, 0x00)
#define INTEL_ARCH_INSTRUCTIONS_RETIRED RAW_EVENT(0xc0, 0x00)
#define INTEL_ARCH_REFERENCE_CYCLES RAW_EVENT(0x3c, 0x01)
#define INTEL_ARCH_LLC_REFERENCES RAW_EVENT(0x2e, 0x4f)
#define INTEL_ARCH_LLC_MISSES RAW_EVENT(0x2e, 0x41)
#define INTEL_ARCH_BRANCHES_RETIRED RAW_EVENT(0xc4, 0x00)
#define INTEL_ARCH_BRANCHES_MISPREDICTED RAW_EVENT(0xc5, 0x00)
#define INTEL_ARCH_TOPDOWN_SLOTS RAW_EVENT(0xa4, 0x01)

#define AMD_ZEN_CORE_CYCLES RAW_EVENT(0x76, 0x00)
#define AMD_ZEN_INSTRUCTIONS_RETIRED RAW_EVENT(0xc0, 0x00)
#define AMD_ZEN_BRANCHES_RETIRED RAW_EVENT(0xc2, 0x00)
#define AMD_ZEN_BRANCHES_MISPREDICTED RAW_EVENT(0xc3, 0x00)

/*
* Note! The order and thus the index of the architectural events matters as
* support for each event is enumerated via CPUID using the index of the event.
*/
enum intel_pmu_architectural_events {
INTEL_ARCH_CPU_CYCLES_INDEX,
INTEL_ARCH_INSTRUCTIONS_RETIRED_INDEX,
INTEL_ARCH_REFERENCE_CYCLES_INDEX,
INTEL_ARCH_LLC_REFERENCES_INDEX,
INTEL_ARCH_LLC_MISSES_INDEX,
INTEL_ARCH_BRANCHES_RETIRED_INDEX,
INTEL_ARCH_BRANCHES_MISPREDICTED_INDEX,
INTEL_ARCH_TOPDOWN_SLOTS_INDEX,
NR_INTEL_ARCH_EVENTS,
};

enum amd_pmu_zen_events {
AMD_ZEN_CORE_CYCLES_INDEX,
AMD_ZEN_INSTRUCTIONS_INDEX,
AMD_ZEN_BRANCHES_INDEX,
AMD_ZEN_BRANCH_MISSES_INDEX,
NR_AMD_ZEN_EVENTS,
};

extern const uint64_t intel_pmu_arch_events[];
extern const uint64_t amd_pmu_zen_events[];

...


const uint64_t intel_pmu_arch_events[] = {
INTEL_ARCH_CPU_CYCLES,
INTEL_ARCH_INSTRUCTIONS_RETIRED,
INTEL_ARCH_REFERENCE_CYCLES,
INTEL_ARCH_LLC_REFERENCES,
INTEL_ARCH_LLC_MISSES,
INTEL_ARCH_BRANCHES_RETIRED,
INTEL_ARCH_BRANCHES_MISPREDICTED,
INTEL_ARCH_TOPDOWN_SLOTS,
};
kvm_static_assert(ARRAY_SIZE(intel_pmu_arch_events) == NR_INTEL_ARCH_EVENTS);

const uint64_t amd_pmu_zen_events[] = {
AMD_ZEN_CORE_CYCLES,
AMD_ZEN_INSTRUCTIONS_RETIRED,
AMD_ZEN_BRANCHES_RETIRED,
AMD_ZEN_BRANCHES_MISPREDICTED,
};
kvm_static_assert(ARRAY_SIZE(amd_pmu_zen_events) == NR_AMD_ZEN_EVENTS);

2023-11-07 07:14:40

by Dapeng Mi

[permalink] [raw]
Subject: Re: [PATCH v6 03/20] KVM: x86/pmu: Don't enumerate arch events KVM doesn't support


On 11/4/2023 8:41 PM, Jim Mattson wrote:
> On Fri, Nov 3, 2023 at 5:02 PM Sean Christopherson <[email protected]> wrote:
>> Don't advertise support to userspace for architectural events that KVM
>> doesn't support, i.e. for "real" events that aren't listed in
>> intel_pmu_architectural_events. On current hardware, this effectively
>> means "don't advertise support for Top Down Slots".
> NR_REAL_INTEL_ARCH_EVENTS is only used in intel_hw_event_available().
> As discussed (https://lore.kernel.org/kvm/[email protected]/),
> intel_hw_event_available() should go away.
>
> Aside from mapping fixed counters to event selector and unit mask
> (fixed_pmc_events[]), KVM has no reason to know when a new
> architectural event is defined.


Since intel_hw_event_available() would be removed, it looks like the enum
intel_pmu_architectural_events and the intel_arch_events[] array become
useless. We can simply modify the current fixed_pmc_events[] array and
use it to store the fixed counter event codes and unit masks.

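Something along these lines, perhaps (a sketch only; the exact encodings,
especially the pseudo-encoding for the ref-TSC fixed counter, are assumed):

	static const struct {
		u8 eventsel;
		u8 unit_mask;
	} fixed_pmc_events[] = {
		[0] = { 0xc0, 0x00 },	/* FIXED_CTR0: instructions retired */
		[1] = { 0x3c, 0x00 },	/* FIXED_CTR1: unhalted core cycles */
		[2] = { 0x00, 0x03 },	/* FIXED_CTR2: ref TSC cycles (pseudo-encoding, assumed) */
	};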

>
> The variable that this change "fixes" is only used to feed
> CPUID.0AH:EBX in KVM_GET_SUPPORTED_CPUID, and kvm_pmu_cap.events_mask
> is already constructed from what host perf advertises support for.
>
>> Mask off the associated "unavailable" bits, as said bits for undefined
>> events are reserved to zero. Arguably the events _are_ defined, but from
>> a KVM perspective they might as well not exist, and there's absolutely no
>> reason to leave useless unavailable bits set.
>>
>> Fixes: a6c06ed1a60a ("KVM: Expose the architectural performance monitoring CPUID leaf")
>> Signed-off-by: Sean Christopherson <[email protected]>
>> ---
>> arch/x86/kvm/vmx/pmu_intel.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index 3316fdea212a..8d545f84dc4a 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> @@ -73,6 +73,15 @@ static void intel_init_pmu_capability(void)
>> int i;
>>
>> /*
>> + * Do not enumerate support for architectural events that KVM doesn't
>> + * support. Clear unsupported events "unavailable" bit as well, as
>> + * architecturally such bits are reserved to zero.
>> + */
>> + kvm_pmu_cap.events_mask_len = min(kvm_pmu_cap.events_mask_len,
>> + NR_REAL_INTEL_ARCH_EVENTS);
>> + kvm_pmu_cap.events_mask &= GENMASK(kvm_pmu_cap.events_mask_len - 1, 0);
>> +
>> + /*
>> * Perf may (sadly) back a guest fixed counter with a general purpose
>> * counter, and so KVM must hide fixed counters whose associated
>> * architectural event are unsupported. On real hardware, this should
>> --
>> 2.42.0.869.gea05f2083d-goog
>>

2023-11-07 07:16:39

by Dapeng Mi

[permalink] [raw]
Subject: Re: [PATCH v6 05/20] KVM: x86/pmu: Allow programming events that match unsupported arch events

On 11/4/2023 8:02 AM, Sean Christopherson wrote:
> Remove KVM's bogus restriction that the guest can't program an event whose
> encoding matches an unsupported architectural event. The enumeration of
> an architectural event only says that if a CPU supports an architectural
> event, then the event can be programmed using the architectural encoding.
> The enumeration does NOT say anything about the encoding when the CPU
> doesn't report support for the architectural event.
>
> Preventing the guest from counting events whose encoding happens to match
> an architectural event breaks existing functionality whenever Intel adds
> an architectural encoding that was *ever* used for a CPU that doesn't
> enumerate support for the architectural event, even if the encoding is for
> the exact same event!
>
> E.g. the architectural encoding for Top-Down Slots is 0x01a4. On Broadwell
> CPUs, which do not support the Top-Down Slots architectural event, 0x10a4
> is a valid, model-specific event. Denying guest usage of 0x01a4 if/when
> KVM adds support for Top-Down slots would break any Broadwell-based guest.
>
> Reported-by: Kan Liang <[email protected]>
> Closes: https://lore.kernel.org/all/[email protected]
> Cc: Dapeng Mi <[email protected]>
> Fixes: a21864486f7e ("KVM: x86/pmu: Fix available_event_types check for REF_CPU_CYCLES event")
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm-x86-pmu-ops.h | 1 -
> arch/x86/kvm/pmu.c | 1 -
> arch/x86/kvm/pmu.h | 1 -
> arch/x86/kvm/svm/pmu.c | 6 ----
> arch/x86/kvm/vmx/pmu_intel.c | 38 --------------------------
> 5 files changed, 47 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h b/arch/x86/include/asm/kvm-x86-pmu-ops.h
> index 6c98f4bb4228..884af8ef7657 100644
> --- a/arch/x86/include/asm/kvm-x86-pmu-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h
> @@ -12,7 +12,6 @@ BUILD_BUG_ON(1)
> * a NULL definition, for example if "static_call_cond()" will be used
> * at the call sites.
> */
> -KVM_X86_PMU_OP(hw_event_available)
> KVM_X86_PMU_OP(pmc_idx_to_pmc)
> KVM_X86_PMU_OP(rdpmc_ecx_to_pmc)
> KVM_X86_PMU_OP(msr_idx_to_pmc)
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 9ae07db6f0f6..99ed72966528 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -374,7 +374,6 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
> static bool pmc_event_is_allowed(struct kvm_pmc *pmc)
> {
> return pmc_is_globally_enabled(pmc) && pmc_speculative_in_use(pmc) &&
> - static_call(kvm_x86_pmu_hw_event_available)(pmc) &&
> check_pmu_event_filter(pmc);
> }
>
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index 5341e8f69a22..f3e7a356fd81 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -20,7 +20,6 @@
>
> struct kvm_pmu_ops {
> void (*init_pmu_capability)(void);
> - bool (*hw_event_available)(struct kvm_pmc *pmc);
> struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
> struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
> unsigned int idx, u64 *mask);
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index 373ff6a6687b..5596fe816ea8 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -73,11 +73,6 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
> return amd_pmc_idx_to_pmc(pmu, idx);
> }
>
> -static bool amd_hw_event_available(struct kvm_pmc *pmc)
> -{
> - return true;
> -}
> -
> static bool amd_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
> {
> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> @@ -249,7 +244,6 @@ static void amd_pmu_reset(struct kvm_vcpu *vcpu)
> }
>
> struct kvm_pmu_ops amd_pmu_ops __initdata = {
> - .hw_event_available = amd_hw_event_available,
> .pmc_idx_to_pmc = amd_pmc_idx_to_pmc,
> .rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
> .msr_idx_to_pmc = amd_msr_idx_to_pmc,
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index b239e7dbdc9b..9bf700da1e17 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -140,43 +140,6 @@ static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
> }
> }
>
> -static bool intel_hw_event_available(struct kvm_pmc *pmc)
> -{
> - struct kvm_pmu *pmu = pmc_to_pmu(pmc);
> - u8 event_select = pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
> - u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
> - int i;
> -
> - /*
> - * Fixed counters are always available if KVM reaches this point. If a
> - * fixed counter is unsupported in hardware or guest CPUID, KVM doesn't
> - * allow the counter's corresponding MSR to be written. KVM does use
> - * architectural events to program fixed counters, as the interface to
> - * perf doesn't allow requesting a specific fixed counter, e.g. perf
> - * may (sadly) back a guest fixed PMC with a general purposed counter.
> - * But if _hardware_ doesn't support the associated event, KVM simply
> - * doesn't enumerate support for the fixed counter.
> - */
> - if (pmc_is_fixed(pmc))
> - return true;
> -
> - BUILD_BUG_ON(ARRAY_SIZE(intel_arch_events) != NR_INTEL_ARCH_EVENTS);
> -
> - /*
> - * Disallow events reported as unavailable in guest CPUID. Note, this
> - * doesn't apply to pseudo-architectural events (see above).
> - */
> - for (i = 0; i < NR_REAL_INTEL_ARCH_EVENTS; i++) {
> - if (intel_arch_events[i].eventsel != event_select ||
> - intel_arch_events[i].unit_mask != unit_mask)
> - continue;
> -
> - return pmu->available_event_types & BIT(i);
> - }
> -
> - return true;
> -}
> -
> static bool intel_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
> {
> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> @@ -842,7 +805,6 @@ void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu)
>
> struct kvm_pmu_ops intel_pmu_ops __initdata = {
> .init_pmu_capability = intel_init_pmu_capability,
> - .hw_event_available = intel_hw_event_available,
> .pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
> .rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
> .msr_idx_to_pmc = intel_msr_idx_to_pmc,


Reviewed-by:  Dapeng Mi <[email protected]>