2022-07-21 10:47:34

by Like Xu

Subject: [PATCH v2 0/7] KVM: x86/pmu: Fix some corner cases including Intel PEBS

Good, well-designed tests can help us find more bugs, especially when
the test's steps differ from the Linux kernel's behaviour in the
timing of accesses to virtualized hardware resources.

A new guest PEBS kvm-unit-test constructs a number of typical and
corner use cases to demonstrate how fragile the earlier PEBS
enabling patch set is. I prefer to reveal these flaws and fix them
myself before we receive complaints from projects that rely on it.

This patch series contains one small optimization (6/7), one hardware
surprise (2/7), and a set of fixes for issues I tripped over myself.

Please feel free to run tests, add more or share comments.

Previous:
https://lore.kernel.org/kvm/[email protected]/

V1 -> V2 Changelog:
- For 3/7, use "hw_idx > -1" and add comment; (Sean)
- For 4/7, refine commit message and add comment; (Sean)
- For 6/7, inline reprogram_counter() and restrict pmc->current_config;

Like Xu (7):
perf/x86/core: Update x86_pmu.pebs_capable for ICELAKE_{X,D}
perf/x86/core: Completely disable guest PEBS via guest's global_ctrl
KVM: x86/pmu: Avoid setting BIT_ULL(-1) to pmu->host_cross_mapped_mask
KVM: x86/pmu: Don't generate PEBS records for emulated instructions
KVM: x86/pmu: Avoid using PEBS perf_events for normal counters
KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event()
KVM: x86/pmu: Defer counter emulated overflow via pmc->stale_counter

arch/x86/events/intel/core.c | 4 ++-
arch/x86/include/asm/kvm_host.h | 6 +++--
arch/x86/kvm/pmu.c | 47 +++++++++++++++++++++------------
arch/x86/kvm/pmu.h | 6 ++++-
arch/x86/kvm/svm/pmu.c | 2 +-
arch/x86/kvm/vmx/pmu_intel.c | 30 ++++++++++-----------
6 files changed, 58 insertions(+), 37 deletions(-)

--
2.37.1


2022-07-21 10:48:54

by Like Xu

Subject: [kvm-unit-tests PATCH] x86: Add tests for Guest Processor Event Based Sampling (PEBS)

This unit test exercises KVM's support for Processor Event Based
Sampling (PEBS), a PMU feature available on Intel processors
(starting with Ice Lake Server).

If a bit in PEBS_ENABLE is set to 1, the corresponding counter writes
at least one PEBS record (containing a partial snapshot of the vCPU
state at the time of the hardware event) to guest memory on counter
overflow, and an interrupt is raised once the DS area fills up to its
interrupt threshold. The format of a PEBS record can be configured
via another register (MSR_PEBS_DATA_CFG).

These tests cover most usage scenarios, including some specially
constructed ones that do not match the typical behaviour of the Linux
PEBS driver. They lower the barrier for others to understand this
feature and open the door to further exploration of the KVM
implementation and of the hardware feature itself.
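
As a rough sketch (not part of the patch itself; the MSR and bit names
mirror pmu_pebs.c below), once the debug store's pebs_buffer_base,
pebs_index and pebs_interrupt_threshold fields point at a PEBS buffer,
enabling PEBS on GP counter 0 boils down to:

	wrmsr(MSR_IA32_DS_AREA, (unsigned long)ds_buffer);	/* publish the DS management area */
	wrmsr(MSR_P6_EVNTSEL0, EVNTSEL_EN | EVNTSEL_OS | EVNTSEL_USR | 0x00c0 /* insns retired */);
	wrmsr(MSR_IA32_PERFCTR0, 0xfffffffffffe);		/* start close to overflow */
	wrmsr(MSR_IA32_PEBS_ENABLE, BIT_ULL(0));		/* arm PEBS on counter 0 */
	wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, BIT_ULL(0));		/* start counting */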

Signed-off-by: Like Xu <[email protected]>
---
lib/x86/msr.h | 1 +
x86/Makefile.x86_64 | 1 +
x86/pmu_pebs.c | 511 ++++++++++++++++++++++++++++++++++++++++++++
x86/unittests.cfg | 7 +
4 files changed, 520 insertions(+)
create mode 100644 x86/pmu_pebs.c

diff --git a/lib/x86/msr.h b/lib/x86/msr.h
index fa1c0c8..252e041 100644
--- a/lib/x86/msr.h
+++ b/lib/x86/msr.h
@@ -52,6 +52,7 @@
#define MSR_IA32_MCG_CTL 0x0000017b

#define MSR_IA32_PEBS_ENABLE 0x000003f1
+#define MSR_PEBS_DATA_CFG 0x000003f2
#define MSR_IA32_DS_AREA 0x00000600
#define MSR_IA32_PERF_CAPABILITIES 0x00000345

diff --git a/x86/Makefile.x86_64 b/x86/Makefile.x86_64
index e19284a..c82c274 100644
--- a/x86/Makefile.x86_64
+++ b/x86/Makefile.x86_64
@@ -33,6 +33,7 @@ tests += $(TEST_DIR)/vmware_backdoors.$(exe)
tests += $(TEST_DIR)/rdpru.$(exe)
tests += $(TEST_DIR)/pks.$(exe)
tests += $(TEST_DIR)/pmu_lbr.$(exe)
+tests += $(TEST_DIR)/pmu_pebs.$(exe)

ifeq ($(CONFIG_EFI),y)
tests += $(TEST_DIR)/amd_sev.$(exe)
diff --git a/x86/pmu_pebs.c b/x86/pmu_pebs.c
new file mode 100644
index 0000000..5498bb0
--- /dev/null
+++ b/x86/pmu_pebs.c
@@ -0,0 +1,511 @@
+#include "x86/msr.h"
+#include "x86/processor.h"
+#include "x86/isr.h"
+#include "x86/apic.h"
+#include "x86/apic-defs.h"
+#include "x86/desc.h"
+#include "alloc.h"
+
+#include "vm.h"
+#include "types.h"
+#include "processor.h"
+#include "vmalloc.h"
+#include "alloc_page.h"
+
+#define PC_VECTOR 32
+
+#define X86_FEATURE_PDCM (CPUID(0x1, 0, ECX, 15))
+
+#define PERF_CAP_PEBS_FORMAT 0xf00
+#define PMU_CAP_FW_WRITES (1ULL << 13)
+
+#define INTEL_PMC_IDX_FIXED 32
+
+#define GLOBAL_STATUS_BUFFER_OVF_BIT 62
+#define GLOBAL_STATUS_BUFFER_OVF BIT_ULL(GLOBAL_STATUS_BUFFER_OVF_BIT)
+
+#define EVNTSEL_USR_SHIFT 16
+#define EVNTSEL_OS_SHIFT 17
+#define EVNTSEL_EN_SHIFT 22
+
+#define EVNTSEL_EN (1 << EVNTSEL_EN_SHIFT)
+#define EVNTSEL_USR (1 << EVNTSEL_USR_SHIFT)
+#define EVNTSEL_OS (1 << EVNTSEL_OS_SHIFT)
+
+#define PEBS_DATACFG_MEMINFO BIT_ULL(0)
+#define PEBS_DATACFG_GP BIT_ULL(1)
+#define PEBS_DATACFG_XMMS BIT_ULL(2)
+#define PEBS_DATACFG_LBRS BIT_ULL(3)
+
+#define ICL_EVENTSEL_ADAPTIVE (1ULL << 34)
+#define PEBS_DATACFG_LBR_SHIFT 24
+#define MAX_NUM_LBR_ENTRY 32
+
+union perf_capabilities {
+ struct {
+ u64 lbr_format:6;
+ u64 pebs_trap:1;
+ u64 pebs_arch_reg:1;
+ u64 pebs_format:4;
+ u64 smm_freeze:1;
+ /*
+ * PMU supports separate counter range for writing
+ * values > 32bit.
+ */
+ u64 full_width_write:1;
+ u64 pebs_baseline:1;
+ u64 perf_metrics:1;
+ u64 pebs_output_pt_available:1;
+ u64 anythread_deprecated:1;
+ };
+ u64 capabilities;
+};
+
+union cpuid10_eax {
+ struct {
+ unsigned int version_id:8;
+ unsigned int num_counters:8;
+ unsigned int bit_width:8;
+ unsigned int mask_length:8;
+ } split;
+ unsigned int full;
+} pmu_eax;
+
+union cpuid10_edx {
+ struct {
+ unsigned int num_counters_fixed:5;
+ unsigned int bit_width_fixed:8;
+ unsigned int reserved:19;
+ } split;
+ unsigned int full;
+} pmu_edx;
+
+static u64 gp_counter_base = MSR_IA32_PERFCTR0;
+static union perf_capabilities perf;
+static unsigned int max_nr_gp_events;
+static unsigned long *ds_buffer;
+static unsigned long *pebs_buffer;
+static u64 ctr_start_val;
+
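+/*
+ * Layout of the Debug Store (DS) management area that MSR_IA32_DS_AREA
+ * points at; this test only uses the pebs_* fields.
+ */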
+struct debug_store {
+ u64 bts_buffer_base;
+ u64 bts_index;
+ u64 bts_absolute_maximum;
+ u64 bts_interrupt_threshold;
+ u64 pebs_buffer_base;
+ u64 pebs_index;
+ u64 pebs_absolute_maximum;
+ u64 pebs_interrupt_threshold;
+ u64 pebs_event_reset[64];
+};
+
+struct pebs_basic {
+ u64 format_size;
+ u64 ip;
+ u64 applicable_counters;
+ u64 tsc;
+};
+
+struct pebs_meminfo {
+ u64 address;
+ u64 aux;
+ u64 latency;
+ u64 tsx_tuning;
+};
+
+struct pebs_gprs {
+ u64 flags, ip, ax, cx, dx, bx, sp, bp, si, di;
+ u64 r8, r9, r10, r11, r12, r13, r14, r15;
+};
+
+struct pebs_xmm {
+ u64 xmm[16*2]; /* two entries for each register */
+};
+
+struct lbr_entry {
+ u64 from;
+ u64 to;
+ u64 info;
+};
+
+enum pmc_type {
+ GP = 0,
+ FIXED,
+};
+
+static uint32_t intel_arch_events[] = {
+ 0x00c4, /* PERF_COUNT_HW_BRANCH_INSTRUCTIONS */
+ 0x00c5, /* PERF_COUNT_HW_BRANCH_MISSES */
+ 0x0300, /* PERF_COUNT_HW_REF_CPU_CYCLES */
+ 0x003c, /* PERF_COUNT_HW_CPU_CYCLES */
+ 0x00c0, /* PERF_COUNT_HW_INSTRUCTIONS */
+ 0x013c, /* PERF_COUNT_HW_BUS_CYCLES */
+ 0x4f2e, /* PERF_COUNT_HW_CACHE_REFERENCES */
+ 0x412e, /* PERF_COUNT_HW_CACHE_MISSES */
+};
+
+static u64 pebs_data_cfgs[] = {
+ PEBS_DATACFG_MEMINFO,
+ PEBS_DATACFG_GP,
+ PEBS_DATACFG_XMMS,
+ PEBS_DATACFG_LBRS | ((MAX_NUM_LBR_ENTRY - 1) << PEBS_DATACFG_LBR_SHIFT),
+};
+
+/* Iterating each counter value is a waste of time, pick a few typical values. */
+static u64 counter_start_values[] = {
+ /* if PEBS counter doesn't overflow at all */
+ 0,
+ 0xfffffffffff0,
+ /* normal counter overflow to have PEBS records */
+ 0xfffffffffffe,
+ /* test whether emulated instructions should trigger PEBS */
+ 0xffffffffffff,
+};
+
+static unsigned int get_adaptive_pebs_record_size(u64 pebs_data_cfg)
+{
+ unsigned int sz = sizeof(struct pebs_basic);
+
+ if (!perf.pebs_baseline)
+ return sz;
+
+ if (pebs_data_cfg & PEBS_DATACFG_MEMINFO)
+ sz += sizeof(struct pebs_meminfo);
+ if (pebs_data_cfg & PEBS_DATACFG_GP)
+ sz += sizeof(struct pebs_gprs);
+ if (pebs_data_cfg & PEBS_DATACFG_XMMS)
+ sz += sizeof(struct pebs_xmm);
+ if (pebs_data_cfg & PEBS_DATACFG_LBRS)
+ sz += MAX_NUM_LBR_ENTRY * sizeof(struct lbr_entry);
+
+ return sz;
+}
+
+static void cnt_overflow(isr_regs_t *regs)
+{
+ apic_write(APIC_EOI, 0);
+}
+
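+/*
+ * Branchy code interleaved with CPUID instructions. CPUID always exits to
+ * KVM, so those instructions retire via KVM's emulation path, which is what
+ * the "emulated instructions should not trigger PEBS" cases rely on.
+ */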
+static inline void workload(void)
+{
+ asm volatile(
+ "mov $0x0, %%eax\n"
+ "cmp $0x0, %%eax\n"
+ "jne label2\n"
+ "jne label2\n"
+ "jne label2\n"
+ "jne label2\n"
+ "mov $0x0, %%eax\n"
+ "cmp $0x0, %%eax\n"
+ "jne label2\n"
+ "jne label2\n"
+ "jne label2\n"
+ "jne label2\n"
+ "mov $0xa, %%eax\n"
+ "cpuid\n"
+ "mov $0xa, %%eax\n"
+ "cpuid\n"
+ "mov $0xa, %%eax\n"
+ "cpuid\n"
+ "mov $0xa, %%eax\n"
+ "cpuid\n"
+ "mov $0xa, %%eax\n"
+ "cpuid\n"
+ "mov $0xa, %%eax\n"
+ "cpuid\n"
+ "label2:\n"
+ :
+ :
+ : "eax", "ebx", "ecx", "edx");
+}
+
+static inline void workload2(void)
+{
+ asm volatile(
+ "mov $0x0, %%eax\n"
+ "cmp $0x0, %%eax\n"
+ "jne label3\n"
+ "jne label3\n"
+ "jne label3\n"
+ "jne label3\n"
+ "mov $0x0, %%eax\n"
+ "cmp $0x0, %%eax\n"
+ "jne label3\n"
+ "jne label3\n"
+ "jne label3\n"
+ "jne label3\n"
+ "mov $0xa, %%eax\n"
+ "cpuid\n"
+ "mov $0xa, %%eax\n"
+ "cpuid\n"
+ "mov $0xa, %%eax\n"
+ "cpuid\n"
+ "mov $0xa, %%eax\n"
+ "cpuid\n"
+ "mov $0xa, %%eax\n"
+ "cpuid\n"
+ "mov $0xa, %%eax\n"
+ "cpuid\n"
+ "label3:\n"
+ :
+ :
+ : "eax", "ebx", "ecx", "edx");
+}
+
+static void alloc_buffers(void)
+{
+ ds_buffer = alloc_page();
+ force_4k_page(ds_buffer);
+ memset(ds_buffer, 0x0, PAGE_SIZE);
+
+ pebs_buffer = alloc_page();
+ force_4k_page(pebs_buffer);
+ memset(pebs_buffer, 0x0, PAGE_SIZE);
+}
+
+static void free_buffers(void)
+{
+ if (ds_buffer)
+ free_page(ds_buffer);
+
+ if (pebs_buffer)
+ free_page(pebs_buffer);
+}
+
+static void pebs_enable(u64 bitmask, u64 pebs_data_cfg)
+{
+ static struct debug_store *ds;
+ u64 baseline_extra_ctrl, fixed_ctr_ctrl = 0;
+ unsigned int idx;
+
+ if (perf.pebs_baseline)
+ wrmsr(MSR_PEBS_DATA_CFG, pebs_data_cfg);
+
+ ds = (struct debug_store *)ds_buffer;
+ ds->pebs_index = ds->pebs_buffer_base = (unsigned long)pebs_buffer;
+ ds->pebs_absolute_maximum = (unsigned long)pebs_buffer + PAGE_SIZE;
+ ds->pebs_interrupt_threshold = ds->pebs_buffer_base +
+ get_adaptive_pebs_record_size(pebs_data_cfg);
+
+ for (idx = 0; idx < pmu_edx.split.num_counters_fixed; idx++) {
+ if (!(BIT_ULL(INTEL_PMC_IDX_FIXED + idx) & bitmask))
+ continue;
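+ /* With PEBS baseline, also set this fixed counter's Adaptive_Record bit (bit 32 + 4 * idx) in FIXED_CTR_CTRL. */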
+ baseline_extra_ctrl = perf.pebs_baseline ?
+ (1ULL << (INTEL_PMC_IDX_FIXED + idx * 4)) : 0;
+ wrmsr(MSR_CORE_PERF_FIXED_CTR0 + idx, ctr_start_val);
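+ /* 0xb enables this fixed counter for OS and USR rings and enables PMI on overflow. */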
+ fixed_ctr_ctrl |= (0xbULL << (idx * 4) | baseline_extra_ctrl);
+ }
+ if (fixed_ctr_ctrl)
+ wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, fixed_ctr_ctrl);
+
+ for (idx = 0; idx < max_nr_gp_events; idx++) {
+ if (!(BIT_ULL(idx) & bitmask))
+ continue;
+ baseline_extra_ctrl = perf.pebs_baseline ?
+ ICL_EVENTSEL_ADAPTIVE : 0;
+ wrmsr(MSR_P6_EVNTSEL0 + idx,
+ EVNTSEL_EN | EVNTSEL_OS | EVNTSEL_USR |
+ intel_arch_events[idx] | baseline_extra_ctrl);
+ wrmsr(gp_counter_base + idx, ctr_start_val);
+ }
+
+ wrmsr(MSR_IA32_DS_AREA, (unsigned long)ds_buffer);
+ wrmsr(MSR_IA32_PEBS_ENABLE, bitmask);
+ wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, bitmask);
+}
+
+static void pmu_env_cleanup(void)
+{
+ unsigned int idx;
+
+ memset(ds_buffer, 0x0, PAGE_SIZE);
+ memset(pebs_buffer, 0x0, PAGE_SIZE);
+ wrmsr(MSR_IA32_PEBS_ENABLE, 0);
+ wrmsr(MSR_IA32_DS_AREA, 0);
+ if (perf.pebs_baseline)
+ wrmsr(MSR_PEBS_DATA_CFG, 0);
+
+ wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+
+ wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
+ for (idx = 0; idx < pmu_edx.split.num_counters_fixed; idx++) {
+ wrmsr(MSR_CORE_PERF_FIXED_CTR0 + idx, 0);
+ }
+
+ for (idx = 0; idx < pmu_eax.split.num_counters; idx++) {
+ wrmsr(MSR_P6_EVNTSEL0 + idx, 0);
+ wrmsr(MSR_IA32_PERFCTR0 + idx, 0);
+ }
+
+ wrmsr(MSR_CORE_PERF_GLOBAL_OVF_CTRL, rdmsr(MSR_CORE_PERF_GLOBAL_STATUS));
+}
+
+static inline void pebs_disable_1(void)
+{
+ wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+}
+
+static inline void pebs_disable_2(void)
+{
+ wrmsr(MSR_IA32_PEBS_ENABLE, 0);
+ wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+}
+
+static void pebs_disable(unsigned int idx)
+{
+ if (idx % 2) {
+ pebs_disable_1();
+ } else {
+ pebs_disable_2();
+ }
+}
+
+static void check_pebs_records(u64 bitmask, u64 pebs_data_cfg)
+{
+ struct pebs_basic *pebs_rec = (struct pebs_basic *)pebs_buffer;
+ struct debug_store *ds = (struct debug_store *)ds_buffer;
+ unsigned int pebs_record_size = get_adaptive_pebs_record_size(pebs_data_cfg);
+ unsigned int count = 0;
+ bool expected, pebs_idx_match, pebs_size_match, data_cfg_match;
+ void *vernier;
+
+ expected = (ds->pebs_index == ds->pebs_buffer_base) && !pebs_rec->format_size;
+ if (!(rdmsr(MSR_CORE_PERF_GLOBAL_STATUS) & GLOBAL_STATUS_BUFFER_OVF)) {
+ report(expected, "No OVF irq, none PEBS records.");
+ return;
+ }
+
+ if (expected) {
+ report(!expected, "A OVF irq, but none PEBS records.");
+ return;
+ }
+
+ expected = ds->pebs_index >= ds->pebs_interrupt_threshold;
+ vernier = (void *)pebs_buffer;
+ do {
+ pebs_rec = (struct pebs_basic *)vernier;
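+ /* Bits 63:48 of format_size hold the record size; bits 47:0 echo the PEBS_DATA_CFG that was in effect. */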
+ pebs_record_size = pebs_rec->format_size >> 48;
+ pebs_idx_match =
+ pebs_rec->applicable_counters & bitmask;
+ pebs_size_match =
+ pebs_record_size == get_adaptive_pebs_record_size(pebs_data_cfg);
+ data_cfg_match =
+ (pebs_rec->format_size & 0xffffffffffff) == pebs_data_cfg;
+ expected = pebs_idx_match && pebs_size_match && data_cfg_match;
+ report(expected,
+ "PEBS record (written seq %d) is verified (inclduing size, counters and cfg).", count);
+ vernier = vernier + pebs_record_size;
+ count++;
+ } while (expected && (void *)vernier < (void *)ds->pebs_index);
+
+ if (!expected) {
+ if (!pebs_idx_match)
+ printf("FAIL: The applicable_counters (0x%lx) doesn't match with pmc_bitmask (0x%lx).\n",
+ pebs_rec->applicable_counters, bitmask);
+ if (!pebs_size_match)
+ printf("FAIL: The pebs_record_size (%d) doesn't match with MSR_PEBS_DATA_CFG (%d).\n",
+ pebs_record_size, get_adaptive_pebs_record_size(pebs_data_cfg));
+ if (!data_cfg_match)
+ printf("FAIL: The pebs_data_cfg (0x%lx) doesn't match with MSR_PEBS_DATA_CFG (0x%lx).\n",
+ pebs_rec->format_size & 0xffffffffffff, pebs_data_cfg);
+ }
+}
+
+static void check_one_counter(enum pmc_type type,
+ unsigned int idx, u64 pebs_data_cfg)
+{
+ report_prefix_pushf("%s counter %d (0x%lx)",
+ type == FIXED ? "Extended Fixed" : "GP", idx, ctr_start_val);
+ pmu_env_cleanup();
+ pebs_enable(BIT_ULL(type == FIXED ? INTEL_PMC_IDX_FIXED + idx : idx), pebs_data_cfg);
+ workload();
+ pebs_disable(idx);
+ check_pebs_records(BIT_ULL(type == FIXED ? INTEL_PMC_IDX_FIXED + idx : idx), pebs_data_cfg);
+ report_prefix_pop();
+}
+
+static void check_multiple_counters(u64 bitmask, u64 pebs_data_cfg)
+{
+ pmu_env_cleanup();
+ pebs_enable(bitmask, pebs_data_cfg);
+ workload2();
+ pebs_disable(0);
+ check_pebs_records(bitmask, pebs_data_cfg);
+}
+
+static void check_pebs_counters(u64 pebs_data_cfg)
+{
+ unsigned int idx;
+ u64 bitmask = 0;
+
+ for (idx = 0; idx < pmu_edx.split.num_counters_fixed; idx++)
+ check_one_counter(FIXED, idx, pebs_data_cfg);
+
+ for (idx = 0; idx < max_nr_gp_events; idx++)
+ check_one_counter(GP, idx, pebs_data_cfg);
+
+ for (idx = 0; idx < pmu_edx.split.num_counters_fixed; idx++)
+ bitmask |= BIT_ULL(INTEL_PMC_IDX_FIXED + idx);
+ for (idx = 0; idx < max_nr_gp_events; idx += 2)
+ bitmask |= BIT_ULL(idx);
+ report_prefix_pushf("Multiple (0x%lx)", bitmask);
+ check_multiple_counters(bitmask, pebs_data_cfg);
+ report_prefix_pop();
+}
+
+int main(int ac, char **av)
+{
+ struct cpuid id;
+ unsigned int i, j;
+
+ setup_vm();
+ id = cpuid(10);
+
+ pmu_eax.full = id.a;
+ pmu_edx.full = id.d;
+ max_nr_gp_events = MIN(pmu_eax.split.num_counters, ARRAY_SIZE(intel_arch_events));
+
+ printf("PMU version: %d\n", pmu_eax.split.version_id);
+ if (this_cpu_has(X86_FEATURE_PDCM))
+ perf.capabilities = rdmsr(MSR_IA32_PERF_CAPABILITIES);
+
+ if (perf.capabilities & PMU_CAP_FW_WRITES)
+ gp_counter_base = MSR_IA32_PMC0;
+
+ if (!is_intel() || (pmu_eax.split.version_id < 2) ||
+ !(perf.capabilities & PERF_CAP_PEBS_FORMAT) ||
+ (rdmsr(MSR_IA32_MISC_ENABLE) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL)) {
+ report_skip("This platform doesn't support guest PEBS.");
+ return 0;
+ }
+
+ printf("PEBS format: %d\n", perf.pebs_format);
+ printf("PEBS GP counters: %d\n", pmu_eax.split.num_counters);
+ printf("PEBS Fixed counters: %d\n", pmu_edx.split.num_counters_fixed);
+ printf("PEBS baseline (Adaptive PEBS): %d\n", perf.pebs_baseline);
+
+ printf("Known reasons for none PEBS records:\n");
+ printf("1. The selected event does not support PEBS;\n");
+ printf("2. From a core pmu perspective, the vCPU and pCPU models are not same;\n");
+ printf("3. Guest counter has not yet overflowed or been cross-mapped by the host;\n");
+
+ handle_irq(PC_VECTOR, cnt_overflow);
+ alloc_buffers();
+
+ for (i = 0; i < ARRAY_SIZE(counter_start_values); i++) {
+ ctr_start_val = counter_start_values[i];
+ check_pebs_counters(0);
+ if (!perf.pebs_baseline)
+ continue;
+
+ for (j = 0; j < ARRAY_SIZE(pebs_data_cfgs); j++) {
+ report_prefix_pushf("Adaptive (0x%lx)", pebs_data_cfgs[j]);
+ check_pebs_counters(pebs_data_cfgs[j]);
+ report_prefix_pop();
+ }
+ }
+
+ free_buffers();
+
+ return report_summary();
+}
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index d6dc19f..5731454 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -198,6 +198,13 @@ check = /sys/module/kvm/parameters/ignore_msrs=N
check = /proc/sys/kernel/nmi_watchdog=0
accel = kvm

+[pmu_pebs]
+arch = x86_64
+file = pmu_pebs.flat
+extra_params = -cpu host,migratable=no
+check = /proc/sys/kernel/nmi_watchdog=0
+accel = kvm
+
[pmu_emulation]
file = pmu.flat
arch = x86_64
--
2.37.1

2022-07-21 11:01:10

by Like Xu

Subject: [PATCH v2 1/7] perf/x86/core: Update x86_pmu.pebs_capable for ICELAKE_{X,D}

From: Like Xu <[email protected]>

The Ice Lake microarchitecture with the EPT-Friendly PEBS capability
also supports the Extended PEBS feature, which means that all counters
(both fixed-function and general-purpose) can be used for PEBS events.

Update x86_pmu.pebs_capable, as SPR does, to apply PEBS_ALL semantics.

Cc: Kan Liang <[email protected]>
Fixes: fb358e0b811e ("perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server")
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/events/intel/core.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 4e9b7af9cc45..e46fd496187b 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6239,6 +6239,7 @@ __init int intel_pmu_init(void)
case INTEL_FAM6_ICELAKE_X:
case INTEL_FAM6_ICELAKE_D:
x86_pmu.pebs_ept = 1;
+ x86_pmu.pebs_capable = ~0ULL;
pmem = true;
fallthrough;
case INTEL_FAM6_ICELAKE_L:
--
2.37.1

2022-07-27 23:15:09

by Sean Christopherson

Subject: Re: [kvm-unit-tests PATCH] x86: Add tests for Guest Processor Event Based Sampling (PEBS)

On Thu, Jul 21, 2022, Like Xu wrote:
> +union perf_capabilities {
> + struct {
> + u64 lbr_format:6;
> + u64 pebs_trap:1;
> + u64 pebs_arch_reg:1;
> + u64 pebs_format:4;
> + u64 smm_freeze:1;
> + /*
> + * PMU supports separate counter range for writing
> + * values > 32bit.
> + */
> + u64 full_width_write:1;
> + u64 pebs_baseline:1;
> + u64 perf_metrics:1;
> + u64 pebs_output_pt_available:1;
> + u64 anythread_deprecated:1;
> + };
> + u64 capabilities;
> +};
> +
> +union cpuid10_eax {
> + struct {
> + unsigned int version_id:8;
> + unsigned int num_counters:8;
> + unsigned int bit_width:8;
> + unsigned int mask_length:8;
> + } split;
> + unsigned int full;
> +} pmu_eax;
> +
> +union cpuid10_edx {
> + struct {
> + unsigned int num_counters_fixed:5;
> + unsigned int bit_width_fixed:8;
> + unsigned int reserved:19;
> + } split;
> + unsigned int full;
> +} pmu_edx;

The generic unions are hopefully unnecessary now that helpers are provided by
lib/x86/processor.h, e.g. for pmu_version().

I would prefer to have similar helpers instead of "union perf_capabilities",
but it's not a sticking point if helpers are significantly more painful to use.

> + if (!is_intel() || (pmu_eax.split.version_id < 2) ||
> + !(perf.capabilities & PERF_CAP_PEBS_FORMAT) ||
> + (rdmsr(MSR_IA32_MISC_ENABLE) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL)) {

Split these up, it's really, really annoying to have to guess which one of the
four checks failed.

> + report_skip("This platform doesn't support guest PEBS.");
> + return 0;

This needs to be "return report_summary()", otherwise the test says pass when it
didn't do anything:

TESTNAME=pmu_pebs TIMEOUT=90s ACCEL=kvm ./x86/run x86/pmu_pebs.flat -smp 1 -cpu host,migratable=no
PASS pmu_pebs

wait a second...

SKIP: This platform doesn't support guest PEBS.

E.g. (though if KUT can provide more information on why PERF_CAP_PEBS_FORMAT
may not be advertised, e.g. requires ICX+?, that would be nice to have)

if (!is_intel()) {
report_skip("PEBS is only supported on Intel CPUs");
return report_summary();
}
if (pmu_version() < 2) {
report_skip("Architectural PMU not available");
return report_summary();
}
if (!(perf.capabilities & PERF_CAP_PEBS_FORMAT)) {
report_skip("PEBS not enumerated in PERF_CAPABILITIES");
return report_summary();
}
if (rdmsr(MSR_IA32_MISC_ENABLE) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL) {
report_skip("PEBS unavailable according to MISC_ENABLE");
return report_summary();
}

2022-07-28 11:50:30

by Like Xu

Subject: Re: [kvm-unit-tests PATCH] x86: Add tests for Guest Processor Event Based Sampling (PEBS)

On 28/7/2022 6:42 am, Sean Christopherson wrote:
> On Thu, Jul 21, 2022, Like Xu wrote:
>> +union perf_capabilities {
>> + struct {
>> + u64 lbr_format:6;
>> + u64 pebs_trap:1;
>> + u64 pebs_arch_reg:1;
>> + u64 pebs_format:4;
>> + u64 smm_freeze:1;
>> + /*
>> + * PMU supports separate counter range for writing
>> + * values > 32bit.
>> + */
>> + u64 full_width_write:1;
>> + u64 pebs_baseline:1;
>> + u64 perf_metrics:1;
>> + u64 pebs_output_pt_available:1;
>> + u64 anythread_deprecated:1;
>> + };
>> + u64 capabilities;
>> +};
>> +
>> +union cpuid10_eax {
>> + struct {
>> + unsigned int version_id:8;
>> + unsigned int num_counters:8;
>> + unsigned int bit_width:8;
>> + unsigned int mask_length:8;
>> + } split;
>> + unsigned int full;
>> +} pmu_eax;
>> +
>> +union cpuid10_edx {
>> + struct {
>> + unsigned int num_counters_fixed:5;
>> + unsigned int bit_width_fixed:8;
>> + unsigned int reserved:19;
>> + } split;
>> + unsigned int full;
>> +} pmu_edx;
>
> The generic unions are hopefully unnecessary now that helpers are provided by
> lib/x86/processor.h, e.g. for pmu_version().
>
> I would prefer to have similar helpers instead of "union perf_capabilities",
> but it's not a sticking point if helpers a signifiantly more painful to use.
>
>> + if (!is_intel() || (pmu_eax.split.version_id < 2) ||
>> + !(perf.capabilities & PERF_CAP_PEBS_FORMAT) ||
>> + (rdmsr(MSR_IA32_MISC_ENABLE) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL)) {
>
> Split these up, it's really, really annoying to have to guess which one of the
> four checks failed.
>
>> + report_skip("This platform doesn't support guest PEBS.");
>> + return 0;
>
> This needs be be "return report_summary()", otherwise the test says pass when it
> didn't do anyting:
>
> TESTNAME=pmu_pebs TIMEOUT=90s ACCEL=kvm ./x86/run x86/pmu_pebs.flat -smp 1 -cpu host,migratable=no
> PASS pmu_pebs
>
> wait a second...
>
> SKIP: This platform doesn't support guest PEBS.
>
> E.g. (though if KUT can provide more information on why PERF_CAP_PEBS_FORMAT
> may not be advertised, e.g. requires ICX+?, that would be nice to have)
>
> if (!is_intel()) {
> report_skip("PEBS is only supported on Intel CPUs");
> return report_summary();
> }
> if (pmu_version() < 2) {
> report_skip("Architectural PMU not available");
> return report_summary();
> }
> if (!(perf.capabilities & PERF_CAP_PEBS_FORMAT)) {
> report_skip("PEBS not enumerated in PERF_CAPABILITIES");
> return report_summary();
> }
> if (rdmsr(MSR_IA32_MISC_ENABLE) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL) {
> report_skip("PEBS unavailable according to MISC_ENABLE");
> return report_summary();
> }

Thanks, all applied. Please review this new version as a separate thread.
https://lore.kernel.org/kvm/[email protected]/

2022-08-02 11:19:03

by Like Xu

Subject: Re: [PATCH v2 0/7] KVM: x86/pmu: Fix some corner cases including Intel PEBS

To catch up with the first -rc: a humble and apologetic ping.

On 21/7/2022 6:35 pm, Like Xu wrote:
> Good well-designed tests can help us find more bugs, especially when
> the test steps differ from the Linux kernel behaviour in terms of the
> timing of access to virtualized hw resources.
>
> A new guest PEBS kvm-unit-test constructs a number of typical and
> corner use cases to demonstrate how fragile the earlier PEBS
> enabling patch set is. I prefer to reveal these flaws and fix them
> myself before we receive complaints from projects that rely on it.
>
> In this patch series, there is one small optimization (006), one hardware
> surprise (002), and most of these fixes have stepped on my little toes.
>
> Please feel free to run tests, add more or share comments.
>
> Previous:
> https://lore.kernel.org/kvm/[email protected]/
>
> V1 -> V2 Changelog:
> - For 3/7, use "hw_idx > -1" and add comment; (Sean)
> - For 4/7, refine commit message and add comment; (Sean)
> - For 6/7, inline reprogram_counter() and restrict pmc->current_config;
>
> Like Xu (7):
> perf/x86/core: Update x86_pmu.pebs_capable for ICELAKE_{X,D}
> perf/x86/core: Completely disable guest PEBS via guest's global_ctrl
> KVM: x86/pmu: Avoid setting BIT_ULL(-1) to pmu->host_cross_mapped_mask
> KVM: x86/pmu: Don't generate PEBS records for emulated instructions
> KVM: x86/pmu: Avoid using PEBS perf_events for normal counters
> KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event()
> KVM: x86/pmu: Defer counter emulated overflow via pmc->stale_counter
>
> arch/x86/events/intel/core.c | 4 ++-
> arch/x86/include/asm/kvm_host.h | 6 +++--
> arch/x86/kvm/pmu.c | 47 +++++++++++++++++++++------------
> arch/x86/kvm/pmu.h | 6 ++++-
> arch/x86/kvm/svm/pmu.c | 2 +-
> arch/x86/kvm/vmx/pmu_intel.c | 30 ++++++++++-----------
> 6 files changed, 58 insertions(+), 37 deletions(-)
>

2022-08-11 07:24:06

by Wanpeng Li

Subject: Re: [PATCH v2 0/7] KVM: x86/pmu: Fix some corner cases including Intel PEBS

kindly ping, :)
On Tue, 2 Aug 2022 at 19:16, Like Xu <[email protected]> wrote:
>
> To catch up on the first -rc, ping beggarly and guiltily.
>
> On 21/7/2022 6:35 pm, Like Xu wrote:
> > Good well-designed tests can help us find more bugs, especially when
> > the test steps differ from the Linux kernel behaviour in terms of the
> > timing of access to virtualized hw resources.
> >
> > A new guest PEBS kvm-unit-test constructs a number of typical and
> > corner use cases to demonstrate how fragile the earlier PEBS
> > enabling patch set is. I prefer to reveal these flaws and fix them
> > myself before we receive complaints from projects that rely on it.
> >
> > In this patch series, there is one small optimization (006), one hardware
> > surprise (002), and most of these fixes have stepped on my little toes.
> >
> > Please feel free to run tests, add more or share comments.
> >
> > Previous:
> > https://lore.kernel.org/kvm/[email protected]/
> >
> > V1 -> V2 Changelog:
> > - For 3/7, use "hw_idx > -1" and add comment; (Sean)
> > - For 4/7, refine commit message and add comment; (Sean)
> > - For 6/7, inline reprogram_counter() and restrict pmc->current_config;
> >
> > Like Xu (7):
> > perf/x86/core: Update x86_pmu.pebs_capable for ICELAKE_{X,D}
> > perf/x86/core: Completely disable guest PEBS via guest's global_ctrl
> > KVM: x86/pmu: Avoid setting BIT_ULL(-1) to pmu->host_cross_mapped_mask
> > KVM: x86/pmu: Don't generate PEBS records for emulated instructions
> > KVM: x86/pmu: Avoid using PEBS perf_events for normal counters
> > KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event()
> > KVM: x86/pmu: Defer counter emulated overflow via pmc->stale_counter
> >
> > arch/x86/events/intel/core.c | 4 ++-
> > arch/x86/include/asm/kvm_host.h | 6 +++--
> > arch/x86/kvm/pmu.c | 47 +++++++++++++++++++++------------
> > arch/x86/kvm/pmu.h | 6 ++++-
> > arch/x86/kvm/svm/pmu.c | 2 +-
> > arch/x86/kvm/vmx/pmu_intel.c | 30 ++++++++++-----------
> > 6 files changed, 58 insertions(+), 37 deletions(-)
> >

2022-08-12 07:57:36

by Paolo Bonzini

Subject: Re: [PATCH v2 1/7] perf/x86/core: Update x86_pmu.pebs_capable for ICELAKE_{X,D}

On 7/21/22 12:35, Like Xu wrote:
> From: Like Xu <[email protected]>
>
> Ice Lake microarchitecture with EPT-Friendly PEBS capability also support
> the Extended feature, which means that all counters (both fixed function
> and general purpose counters) can be used for PEBS events.
>
> Update x86_pmu.pebs_capable like SPR to apply PEBS_ALL semantics.
>
> Cc: Kan Liang <[email protected]>
> Fixes: fb358e0b811e ("perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server")
> Signed-off-by: Like Xu <[email protected]>
> ---
> arch/x86/events/intel/core.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 4e9b7af9cc45..e46fd496187b 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -6239,6 +6239,7 @@ __init int intel_pmu_init(void)
> case INTEL_FAM6_ICELAKE_X:
> case INTEL_FAM6_ICELAKE_D:
> x86_pmu.pebs_ept = 1;
> + x86_pmu.pebs_capable = ~0ULL;
> pmem = true;
> fallthrough;
> case INTEL_FAM6_ICELAKE_L:

Peter, can you please ack this (you were not CCed on this KVM series but
this patch is really perf core)?

Thanks,

Paolo

2022-08-15 10:13:30

by Like Xu

Subject: Re: [PATCH v2 1/7] perf/x86/core: Update x86_pmu.pebs_capable for ICELAKE_{X,D}

On 15/8/2022 5:31 pm, Peter Zijlstra wrote:
> On Fri, Aug 12, 2022 at 09:52:13AM +0200, Paolo Bonzini wrote:
>> On 7/21/22 12:35, Like Xu wrote:
>>> From: Like Xu <[email protected]>
>>>
>>> Ice Lake microarchitecture with EPT-Friendly PEBS capability also support
>>> the Extended feature, which means that all counters (both fixed function
>>> and general purpose counters) can be used for PEBS events.
>>>
>>> Update x86_pmu.pebs_capable like SPR to apply PEBS_ALL semantics.
>>>
>>> Cc: Kan Liang <[email protected]>
>>> Fixes: fb358e0b811e ("perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server")
>>> Signed-off-by: Like Xu <[email protected]>
>>> ---
>>> arch/x86/events/intel/core.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>>> index 4e9b7af9cc45..e46fd496187b 100644
>>> --- a/arch/x86/events/intel/core.c
>>> +++ b/arch/x86/events/intel/core.c
>>> @@ -6239,6 +6239,7 @@ __init int intel_pmu_init(void)
>>> case INTEL_FAM6_ICELAKE_X:
>>> case INTEL_FAM6_ICELAKE_D:
>>> x86_pmu.pebs_ept = 1;
>>> + x86_pmu.pebs_capable = ~0ULL;
>>> pmem = true;
>>> fallthrough;
>>> case INTEL_FAM6_ICELAKE_L:
>>
>> Peter, can you please ack this (you were not CCed on this KVM series but
>> this patch is really perf core)?
>
> I would much rather see something like this; except I don't know if it
> is fully correct. I can never find what models support what... Kan do
> you know?

For guest PEBS, this is a minor optimization on top of 0d23dc34a7ce to reduce
branch instructions:
https://lore.kernel.org/kvm/[email protected]/

>
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 2db93498ff71..b42c1beb9924 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -5933,7 +5933,6 @@ __init int intel_pmu_init(void)
> x86_pmu.pebs_aliases = NULL;
> x86_pmu.pebs_prec_dist = true;
> x86_pmu.lbr_pt_coexist = true;
> - x86_pmu.pebs_capable = ~0ULL;
> x86_pmu.flags |= PMU_FL_HAS_RSP_1;
> x86_pmu.flags |= PMU_FL_PEBS_ALL;
> x86_pmu.get_event_constraints = glp_get_event_constraints;
> @@ -6291,7 +6290,6 @@ __init int intel_pmu_init(void)
> x86_pmu.pebs_aliases = NULL;
> x86_pmu.pebs_prec_dist = true;
> x86_pmu.pebs_block = true;
> - x86_pmu.pebs_capable = ~0ULL;
> x86_pmu.flags |= PMU_FL_HAS_RSP_1;
> x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
> x86_pmu.flags |= PMU_FL_PEBS_ALL;
> @@ -6337,7 +6335,6 @@ __init int intel_pmu_init(void)
> x86_pmu.pebs_aliases = NULL;
> x86_pmu.pebs_prec_dist = true;
> x86_pmu.pebs_block = true;
> - x86_pmu.pebs_capable = ~0ULL;
> x86_pmu.flags |= PMU_FL_HAS_RSP_1;
> x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
> x86_pmu.flags |= PMU_FL_PEBS_ALL;
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index ba60427caa6d..e2da643632b9 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -2258,6 +2258,7 @@ void __init intel_ds_init(void)
> x86_pmu.drain_pebs = intel_pmu_drain_pebs_icl;
> x86_pmu.pebs_record_size = sizeof(struct pebs_basic);
> if (x86_pmu.intel_cap.pebs_baseline) {
> + x86_pmu.pebs_capable = ~0ULL;

The two features, "Extended PEBS" (pebs_capable) and "Adaptive PEBS"
(pebs_baseline), are orthogonal, although the two are often supported
together.

> x86_pmu.large_pebs_flags |=
> PERF_SAMPLE_BRANCH_STACK |
> PERF_SAMPLE_TIME;

2022-08-15 10:39:35

by Peter Zijlstra

Subject: Re: [PATCH v2 1/7] perf/x86/core: Update x86_pmu.pebs_capable for ICELAKE_{X,D}

On Fri, Aug 12, 2022 at 09:52:13AM +0200, Paolo Bonzini wrote:
> On 7/21/22 12:35, Like Xu wrote:
> > From: Like Xu <[email protected]>
> >
> > Ice Lake microarchitecture with EPT-Friendly PEBS capability also support
> > the Extended feature, which means that all counters (both fixed function
> > and general purpose counters) can be used for PEBS events.
> >
> > Update x86_pmu.pebs_capable like SPR to apply PEBS_ALL semantics.
> >
> > Cc: Kan Liang <[email protected]>
> > Fixes: fb358e0b811e ("perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server")
> > Signed-off-by: Like Xu <[email protected]>
> > ---
> > arch/x86/events/intel/core.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > index 4e9b7af9cc45..e46fd496187b 100644
> > --- a/arch/x86/events/intel/core.c
> > +++ b/arch/x86/events/intel/core.c
> > @@ -6239,6 +6239,7 @@ __init int intel_pmu_init(void)
> > case INTEL_FAM6_ICELAKE_X:
> > case INTEL_FAM6_ICELAKE_D:
> > x86_pmu.pebs_ept = 1;
> > + x86_pmu.pebs_capable = ~0ULL;
> > pmem = true;
> > fallthrough;
> > case INTEL_FAM6_ICELAKE_L:
>
> Peter, can you please ack this (you were not CCed on this KVM series but
> this patch is really perf core)?

I would much rather see something like this; except I don't know if it
is fully correct. I can never find what models support what... Kan do
you know?


diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 2db93498ff71..b42c1beb9924 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5933,7 +5933,6 @@ __init int intel_pmu_init(void)
x86_pmu.pebs_aliases = NULL;
x86_pmu.pebs_prec_dist = true;
x86_pmu.lbr_pt_coexist = true;
- x86_pmu.pebs_capable = ~0ULL;
x86_pmu.flags |= PMU_FL_HAS_RSP_1;
x86_pmu.flags |= PMU_FL_PEBS_ALL;
x86_pmu.get_event_constraints = glp_get_event_constraints;
@@ -6291,7 +6290,6 @@ __init int intel_pmu_init(void)
x86_pmu.pebs_aliases = NULL;
x86_pmu.pebs_prec_dist = true;
x86_pmu.pebs_block = true;
- x86_pmu.pebs_capable = ~0ULL;
x86_pmu.flags |= PMU_FL_HAS_RSP_1;
x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
x86_pmu.flags |= PMU_FL_PEBS_ALL;
@@ -6337,7 +6335,6 @@ __init int intel_pmu_init(void)
x86_pmu.pebs_aliases = NULL;
x86_pmu.pebs_prec_dist = true;
x86_pmu.pebs_block = true;
- x86_pmu.pebs_capable = ~0ULL;
x86_pmu.flags |= PMU_FL_HAS_RSP_1;
x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
x86_pmu.flags |= PMU_FL_PEBS_ALL;
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index ba60427caa6d..e2da643632b9 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2258,6 +2258,7 @@ void __init intel_ds_init(void)
x86_pmu.drain_pebs = intel_pmu_drain_pebs_icl;
x86_pmu.pebs_record_size = sizeof(struct pebs_basic);
if (x86_pmu.intel_cap.pebs_baseline) {
+ x86_pmu.pebs_capable = ~0ULL;
x86_pmu.large_pebs_flags |=
PERF_SAMPLE_BRANCH_STACK |
PERF_SAMPLE_TIME;

2022-08-15 11:54:23

by Peter Zijlstra

Subject: Re: [PATCH v2 1/7] perf/x86/core: Update x86_pmu.pebs_capable for ICELAKE_{X,D}

On Mon, Aug 15, 2022 at 05:43:34PM +0800, Like Xu wrote:

> > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > index 2db93498ff71..b42c1beb9924 100644
> > --- a/arch/x86/events/intel/core.c
> > +++ b/arch/x86/events/intel/core.c
> > @@ -5933,7 +5933,6 @@ __init int intel_pmu_init(void)
> > x86_pmu.pebs_aliases = NULL;
> > x86_pmu.pebs_prec_dist = true;
> > x86_pmu.lbr_pt_coexist = true;
> > - x86_pmu.pebs_capable = ~0ULL;
> > x86_pmu.flags |= PMU_FL_HAS_RSP_1;
> > x86_pmu.flags |= PMU_FL_PEBS_ALL;
> > x86_pmu.get_event_constraints = glp_get_event_constraints;
> > @@ -6291,7 +6290,6 @@ __init int intel_pmu_init(void)
> > x86_pmu.pebs_aliases = NULL;
> > x86_pmu.pebs_prec_dist = true;
> > x86_pmu.pebs_block = true;
> > - x86_pmu.pebs_capable = ~0ULL;
> > x86_pmu.flags |= PMU_FL_HAS_RSP_1;
> > x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
> > x86_pmu.flags |= PMU_FL_PEBS_ALL;
> > @@ -6337,7 +6335,6 @@ __init int intel_pmu_init(void)
> > x86_pmu.pebs_aliases = NULL;
> > x86_pmu.pebs_prec_dist = true;
> > x86_pmu.pebs_block = true;
> > - x86_pmu.pebs_capable = ~0ULL;
> > x86_pmu.flags |= PMU_FL_HAS_RSP_1;
> > x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
> > x86_pmu.flags |= PMU_FL_PEBS_ALL;
> > diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> > index ba60427caa6d..e2da643632b9 100644
> > --- a/arch/x86/events/intel/ds.c
> > +++ b/arch/x86/events/intel/ds.c
> > @@ -2258,6 +2258,7 @@ void __init intel_ds_init(void)
> > x86_pmu.drain_pebs = intel_pmu_drain_pebs_icl;
> > x86_pmu.pebs_record_size = sizeof(struct pebs_basic);
> > if (x86_pmu.intel_cap.pebs_baseline) {
> > + x86_pmu.pebs_capable = ~0ULL;
>
> The two features of "Extended PEBS (about pebs_capable)" and "Adaptive PEBS
> (about pebs_baseline)"
> are orthogonal, although the two are often supported together.

The SDM explicitly states that PEBS Baseline implies Extended PEBS. See
3-19.8 (April 22 edition).

The question is if there is hardware that has Extended PEBS but doesn't
have Baseline; and I simply don't know and was hoping Kan could find
out.

That said, the above patch can be further improved by also removing the
PMU_FL_PEBS_ALL lines, since that flag is already set by intel_ds_init().

In general though, the point is that we shouldn't be doing the FMS table
thing for discoverable features. Having pebs_capable = ~0 and
PMU_FL_PEBS_ALL on something with BASELINE set is just wrong.

2022-08-15 13:24:10

by Like Xu

Subject: Re: [PATCH v2 1/7] perf/x86/core: Update x86_pmu.pebs_capable for ICELAKE_{X,D}

On 15/8/2022 7:51 pm, Peter Zijlstra wrote:
> On Mon, Aug 15, 2022 at 05:43:34PM +0800, Like Xu wrote:
>
>>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>>> index 2db93498ff71..b42c1beb9924 100644
>>> --- a/arch/x86/events/intel/core.c
>>> +++ b/arch/x86/events/intel/core.c
>>> @@ -5933,7 +5933,6 @@ __init int intel_pmu_init(void)
>>> x86_pmu.pebs_aliases = NULL;
>>> x86_pmu.pebs_prec_dist = true;
>>> x86_pmu.lbr_pt_coexist = true;
>>> - x86_pmu.pebs_capable = ~0ULL;
>>> x86_pmu.flags |= PMU_FL_HAS_RSP_1;
>>> x86_pmu.flags |= PMU_FL_PEBS_ALL;
>>> x86_pmu.get_event_constraints = glp_get_event_constraints;
>>> @@ -6291,7 +6290,6 @@ __init int intel_pmu_init(void)
>>> x86_pmu.pebs_aliases = NULL;
>>> x86_pmu.pebs_prec_dist = true;
>>> x86_pmu.pebs_block = true;
>>> - x86_pmu.pebs_capable = ~0ULL;
>>> x86_pmu.flags |= PMU_FL_HAS_RSP_1;
>>> x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
>>> x86_pmu.flags |= PMU_FL_PEBS_ALL;
>>> @@ -6337,7 +6335,6 @@ __init int intel_pmu_init(void)
>>> x86_pmu.pebs_aliases = NULL;
>>> x86_pmu.pebs_prec_dist = true;
>>> x86_pmu.pebs_block = true;
>>> - x86_pmu.pebs_capable = ~0ULL;
>>> x86_pmu.flags |= PMU_FL_HAS_RSP_1;
>>> x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
>>> x86_pmu.flags |= PMU_FL_PEBS_ALL;
>>> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
>>> index ba60427caa6d..e2da643632b9 100644
>>> --- a/arch/x86/events/intel/ds.c
>>> +++ b/arch/x86/events/intel/ds.c
>>> @@ -2258,6 +2258,7 @@ void __init intel_ds_init(void)
>>> x86_pmu.drain_pebs = intel_pmu_drain_pebs_icl;
>>> x86_pmu.pebs_record_size = sizeof(struct pebs_basic);
>>> if (x86_pmu.intel_cap.pebs_baseline) {
>>> + x86_pmu.pebs_capable = ~0ULL;
>>
>> The two features of "Extended PEBS (about pebs_capable)" and "Adaptive PEBS
>> (about pebs_baseline)"
>> are orthogonal, although the two are often supported together.
>
> The SDM explicitly states that PEBS Baseline implies Extended PEBS. See
> 3-19.8 (April 22 edition).

Indeed, "IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14]: If set, the following is
true
- Extended PEBS is supported;
- Adaptive PEBS is supported;"

>
> The question is if there is hardware that has Extended PEBS but doesn't
> have Baseline; and I simply don't know and was hoping Kan could find
> out.

In the SDM world, we actually have three PEBS sub-features:

[July 2017] 18.5.4.1 Extended PEBS (first on Goldmont Plus)
[January 2019] 18.9.2 Adaptive PEBS (first on Tremont)
[April 2021] IA32_PERF_CAPABILITIES.PEBS_BASELINE in Table 2-2 "IA-32 Architectural MSRs"

Waiting for Kan to confirm what real hardware actually does.

>
> That said; the above patch can be further improved by also removing the
> PMU_FL_PEBS_ALL lines, which is already set by intel_ds_init().
>
> In general though; the point is, we shouldn't be doing the FMS table
> thing for discoverable features. Having pebs_capable = ~0 and
> PMU_FL_PEBS_ALL on something with BASELINE set is just wrong.

2022-08-15 13:29:00

by Liang, Kan

Subject: Re: [PATCH v2 1/7] perf/x86/core: Update x86_pmu.pebs_capable for ICELAKE_{X,D}



On 2022-08-15 7:51 a.m., Peter Zijlstra wrote:
> On Mon, Aug 15, 2022 at 05:43:34PM +0800, Like Xu wrote:
>
>>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>>> index 2db93498ff71..b42c1beb9924 100644
>>> --- a/arch/x86/events/intel/core.c
>>> +++ b/arch/x86/events/intel/core.c
>>> @@ -5933,7 +5933,6 @@ __init int intel_pmu_init(void)
>>> x86_pmu.pebs_aliases = NULL;
>>> x86_pmu.pebs_prec_dist = true;
>>> x86_pmu.lbr_pt_coexist = true;
>>> - x86_pmu.pebs_capable = ~0ULL;
>>> x86_pmu.flags |= PMU_FL_HAS_RSP_1;
>>> x86_pmu.flags |= PMU_FL_PEBS_ALL;
>>> x86_pmu.get_event_constraints = glp_get_event_constraints;
>>> @@ -6291,7 +6290,6 @@ __init int intel_pmu_init(void)
>>> x86_pmu.pebs_aliases = NULL;
>>> x86_pmu.pebs_prec_dist = true;
>>> x86_pmu.pebs_block = true;
>>> - x86_pmu.pebs_capable = ~0ULL;
>>> x86_pmu.flags |= PMU_FL_HAS_RSP_1;
>>> x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
>>> x86_pmu.flags |= PMU_FL_PEBS_ALL;
>>> @@ -6337,7 +6335,6 @@ __init int intel_pmu_init(void)
>>> x86_pmu.pebs_aliases = NULL;
>>> x86_pmu.pebs_prec_dist = true;
>>> x86_pmu.pebs_block = true;
>>> - x86_pmu.pebs_capable = ~0ULL;
>>> x86_pmu.flags |= PMU_FL_HAS_RSP_1;
>>> x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
>>> x86_pmu.flags |= PMU_FL_PEBS_ALL;
>>> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
>>> index ba60427caa6d..e2da643632b9 100644
>>> --- a/arch/x86/events/intel/ds.c
>>> +++ b/arch/x86/events/intel/ds.c
>>> @@ -2258,6 +2258,7 @@ void __init intel_ds_init(void)
>>> x86_pmu.drain_pebs = intel_pmu_drain_pebs_icl;
>>> x86_pmu.pebs_record_size = sizeof(struct pebs_basic);
>>> if (x86_pmu.intel_cap.pebs_baseline) {
>>> + x86_pmu.pebs_capable = ~0ULL;
>>
>> The two features of "Extended PEBS (about pebs_capable)" and "Adaptive PEBS
>> (about pebs_baseline)"
>> are orthogonal, although the two are often supported together.
>
> The SDM explicitly states that PEBS Baseline implies Extended PEBS. See
> 3-19.8 (April 22 edition).
>
> The question is if there is hardware that has Extended PEBS but doesn't
> have Baseline; and I simply don't know and was hoping Kan could find
> out.

Goldmont Plus should be the only platform which supports extended PEBS
but doesn't have Baseline.

>
> That said; the above patch can be further improved by also removing the
> PMU_FL_PEBS_ALL lines, which is already set by intel_ds_init().

I think we have to keep PMU_FL_PEBS_ALL for the Goldmont Plus. But we
can remove it for SPR and ADL in intel_pmu_init(), since it's already
set in the intel_ds_init() for the Baseline.

Thanks,
Kan
>
> In general though; the point is, we shouldn't be doing the FMS table
> thing for discoverable features. Having pebs_capable = ~0 and
> PMU_FL_PEBS_ALL on something with BASELINE set is just wrong.

2022-08-15 13:30:21

by Peter Zijlstra

Subject: Re: [PATCH v2 1/7] perf/x86/core: Update x86_pmu.pebs_capable for ICELAKE_{X,D}

On Mon, Aug 15, 2022 at 08:48:36PM +0800, Like Xu wrote:

> In the SDM world, we actually have three PEBS sub-features:
>
> [July 2017] 18.5.4.1 Extended PEBS (first on Goldmont Plus)
> [January 2019] 18.9.2 Adaptive PEBS (first on Tremont)
> [April 2021] IA32_PERF_CAPABILITIES.PEBS_BASELINE in the Table 2-2. IA-32
> Architectural MSRs
>
> Waiting Kan to confirm the real world.

Right, so it's possible GLM would need the FMS thing, but AFAIK ICX
has Baseline and therefore shouldn't need it.

2022-08-15 14:35:46

by Peter Zijlstra

Subject: Re: [PATCH v2 1/7] perf/x86/core: Update x86_pmu.pebs_capable for ICELAKE_{X,D}

On Mon, Aug 15, 2022 at 09:06:01AM -0400, Liang, Kan wrote:

> Goldmont Plus should be the only platform which supports extended PEBS
> but doesn't have Baseline.

Like so then...

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 2db93498ff71..cb98a05ee743 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6291,10 +6291,8 @@ __init int intel_pmu_init(void)
x86_pmu.pebs_aliases = NULL;
x86_pmu.pebs_prec_dist = true;
x86_pmu.pebs_block = true;
- x86_pmu.pebs_capable = ~0ULL;
x86_pmu.flags |= PMU_FL_HAS_RSP_1;
x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
- x86_pmu.flags |= PMU_FL_PEBS_ALL;
x86_pmu.flags |= PMU_FL_INSTR_LATENCY;
x86_pmu.flags |= PMU_FL_MEM_LOADS_AUX;

@@ -6337,10 +6335,8 @@ __init int intel_pmu_init(void)
x86_pmu.pebs_aliases = NULL;
x86_pmu.pebs_prec_dist = true;
x86_pmu.pebs_block = true;
- x86_pmu.pebs_capable = ~0ULL;
x86_pmu.flags |= PMU_FL_HAS_RSP_1;
x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
- x86_pmu.flags |= PMU_FL_PEBS_ALL;
x86_pmu.flags |= PMU_FL_INSTR_LATENCY;
x86_pmu.flags |= PMU_FL_MEM_LOADS_AUX;
x86_pmu.lbr_pt_coexist = true;
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index ba60427caa6d..ac6dd4c96dbc 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2262,6 +2262,7 @@ void __init intel_ds_init(void)
PERF_SAMPLE_BRANCH_STACK |
PERF_SAMPLE_TIME;
x86_pmu.flags |= PMU_FL_PEBS_ALL;
+ x86_pmu.pebs_capable = ~0ULL;
pebs_qual = "-baseline";
x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
} else {

2022-08-16 12:13:20

by Like Xu

Subject: Re: [PATCH v2 1/7] perf/x86/core: Update x86_pmu.pebs_capable for ICELAKE_{X,D}

On 15/8/2022 10:30 pm, Peter Zijlstra wrote:
> On Mon, Aug 15, 2022 at 09:06:01AM -0400, Liang, Kan wrote:
>
>> Goldmont Plus should be the only platform which supports extended PEBS
>> but doesn't have Baseline.
>
> Like so then...
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 2db93498ff71..cb98a05ee743 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -6291,10 +6291,8 @@ __init int intel_pmu_init(void)
> x86_pmu.pebs_aliases = NULL;
> x86_pmu.pebs_prec_dist = true;
> x86_pmu.pebs_block = true;
> - x86_pmu.pebs_capable = ~0ULL;
> x86_pmu.flags |= PMU_FL_HAS_RSP_1;
> x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
> - x86_pmu.flags |= PMU_FL_PEBS_ALL;
> x86_pmu.flags |= PMU_FL_INSTR_LATENCY;
> x86_pmu.flags |= PMU_FL_MEM_LOADS_AUX;
>
> @@ -6337,10 +6335,8 @@ __init int intel_pmu_init(void)
> x86_pmu.pebs_aliases = NULL;
> x86_pmu.pebs_prec_dist = true;
> x86_pmu.pebs_block = true;
> - x86_pmu.pebs_capable = ~0ULL;
> x86_pmu.flags |= PMU_FL_HAS_RSP_1;
> x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
> - x86_pmu.flags |= PMU_FL_PEBS_ALL;
> x86_pmu.flags |= PMU_FL_INSTR_LATENCY;
> x86_pmu.flags |= PMU_FL_MEM_LOADS_AUX;
> x86_pmu.lbr_pt_coexist = true;
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index ba60427caa6d..ac6dd4c96dbc 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -2262,6 +2262,7 @@ void __init intel_ds_init(void)
> PERF_SAMPLE_BRANCH_STACK |
> PERF_SAMPLE_TIME;
> x86_pmu.flags |= PMU_FL_PEBS_ALL;
> + x86_pmu.pebs_capable = ~0ULL;
> pebs_qual = "-baseline";
> x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
> } else {

Looks good to me.

This diff also touches PMU_FL_PEBS_ALL, which is not strictly needed for the
guest PEBS fixes, so it should be sent out separately to the linux-perf-users
list for further review.