2022-03-02 14:10:25

by Like Xu

Subject: [PATCH v2 00/12] KVM: x86/pmu: Get rid of PERF_TYPE_HARDWARE and other minor fixes

Hi,

This is a follow-up to [0]. By keeping the same eventsel semantics
for gp and fixed counters, the reprogram code can be made more
symmetrical, simpler, and even faster [1], and it also fixes the issue
of amd_event_mapping[] becoming obsolete [2].

One of the notable changes is that we ended up removing the
reprogram_{gp, fixed}_counter() functions and replacing them with a
merged reprogram_counter(), where KVM programs pmc->perf_event
with only the PERF_TYPE_RAW type for any type of counter
(suggested by Jim as well). PeterZ confirmed the idea: "think so;
the HARDWARE is just a convenience wrapper over RAW IIRC".
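
For reference, the merged flow ends up looking roughly like the sketch
below (simplified, assuming the helpers already present in pmu.c; see
patch 0007 for the actual code):

	/* reprogram_counter(): one path for both gp and fixed counters */
	u64 eventsel = pmc->eventsel;

	if (pmc_is_fixed(pmc)) {
		u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl,
					   pmc->idx - INTEL_PMC_IDX_FIXED);
		/* translate the fixed control field into eventsel bits */
		if (ctrl & 0x1)
			eventsel |= ARCH_PERFMON_EVENTSEL_OS;
		if (ctrl & 0x2)
			eventsel |= ARCH_PERFMON_EVENTSEL_USR;
		if (ctrl & 0x8)
			eventsel |= ARCH_PERFMON_EVENTSEL_INT;
	}

	pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
			      eventsel & AMD64_RAW_EVENT_MASK,
			      !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
			      !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
			      eventsel & ARCH_PERFMON_EVENTSEL_INT,
			      eventsel & HSW_IN_TX,
			      eventsel & HSW_IN_TX_CHECKPOINTED);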

Practically, this change drops guest PMU support on hosts without
X86_FEATURE_ARCH_PERFMON (i.e., the oldest Pentium 4 machines), for
which PERF_TYPE_HARDWARE was intentionally introduced so that such
hosts could map the architectural guest PMU events to their own.

Some code refactoring is included to make the key changes easier to review.
Patches are based on top of kvm/master (ec756e40e271).

The 11th patch drops the call trace from its commit message, while we still
think that kvm->arch.pmu_event_filter requires SRCU protection in terms
of pmu_event_filter functionality, similar to "kvm->arch.msr_filter".
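
The read side then follows the usual SRCU pattern; a minimal sketch of
that protection (assuming the caller is not already inside a kvm->srcu
read-side critical section; patch 0011 has the real change):

	struct kvm_pmu_event_filter *filter;
	int idx;

	idx = srcu_read_lock(&kvm->srcu);
	filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
	if (filter) {
		/* consult filter->events / filter->fixed_counter_bitmap */
	}
	srcu_read_unlock(&kvm->srcu, idx);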

Patch 0012 exposes some reserved bits for Milan use only; the related
discussion can be found at [3], thanks to Jim and Ravikumar.

Please see each commit for more details, and feel free to comment.

[0] https://lore.kernel.org/kvm/[email protected]/
[1] https://lore.kernel.org/kvm/[email protected]/
[2] https://lore.kernel.org/kvm/[email protected]/
[3] https://lore.kernel.org/kvm/[email protected]/

Thanks,

v1: https://lore.kernel.org/kvm/[email protected]/
v1 -> v2 Changelog:
- Get all the vPMC tests I have on hand to pass;
- Update obsolete AMD PMU comments; (Jim)
- Fix a careless [-Wconstant-logical-operand] warning in 0009; (0day)
- Add "u64 new_config" to reuse perf_event for fixed CTRs; (0day)
- Add patch 0012 to attract review attention;

Like Xu (12):
KVM: x86/pmu: Update comments for AMD gp counters
KVM: x86/pmu: Extract check_pmu_event_filter() from the same semantics
KVM: x86/pmu: Pass only "struct kvm_pmc *pmc" to reprogram_counter()
KVM: x86/pmu: Drop "u64 eventsel" for reprogram_gp_counter()
KVM: x86/pmu: Drop "u8 ctrl, int idx" for reprogram_fixed_counter()
KVM: x86/pmu: Use only the uniformly exported interface
reprogram_counter()
KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp,
fixed}counter()
perf: x86/core: Add interface to query perfmon_event_map[] directly
KVM: x86/pmu: Replace pmc_perf_hw_id() with perf_get_hw_event_config()
KVM: x86/pmu: Drop amd_event_mapping[] in the KVM context
KVM: x86/pmu: Protect kvm->arch.pmu_event_filter with SRCU
KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292

arch/x86/events/core.c | 11 ++
arch/x86/include/asm/perf_event.h | 6 +
arch/x86/kvm/pmu.c | 182 ++++++++++++------------------
arch/x86/kvm/pmu.h | 6 +-
arch/x86/kvm/svm/pmu.c | 57 ++++------
arch/x86/kvm/vmx/pmu_intel.c | 64 ++++++-----
6 files changed, 149 insertions(+), 177 deletions(-)

--
2.35.1


2022-03-02 15:51:33

by Like Xu

Subject: [PATCH v2 05/12] KVM: x86/pmu: Drop "u8 ctrl, int idx" for reprogram_fixed_counter()

From: Like Xu <[email protected]>

Since reprogram_fixed_counter() is bound to assign the requested
fixed_ctr_ctrl to pmu->fixed_ctr_ctrl once it is called, this assignment
step can be moved forward (the stale value needed for the diff is saved
a little earlier), thus simplifying the parameter passing.

No functional change intended.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/pmu.c | 13 ++++++-------
arch/x86/kvm/pmu.h | 2 +-
arch/x86/kvm/vmx/pmu_intel.c | 16 ++++++++--------
3 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 7b8a5f973a63..282e6e859c46 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -260,8 +260,11 @@ void reprogram_gp_counter(struct kvm_pmc *pmc)
}
EXPORT_SYMBOL_GPL(reprogram_gp_counter);

-void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
+void reprogram_fixed_counter(struct kvm_pmc *pmc)
{
+ struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+ int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
+ u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, idx);
unsigned en_field = ctrl & 0x3;
bool pmi = ctrl & 0x8;

@@ -291,12 +294,8 @@ void reprogram_counter(struct kvm_pmc *pmc)
{
if (pmc_is_gp(pmc))
reprogram_gp_counter(pmc);
- else {
- int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
- u8 ctrl = fixed_ctrl_field(pmc_to_pmu(pmc)->fixed_ctr_ctrl, idx);
-
- reprogram_fixed_counter(pmc, ctrl, idx);
- }
+ else
+ reprogram_fixed_counter(pmc);
}
EXPORT_SYMBOL_GPL(reprogram_counter);

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 4db50c290c62..70a982c3cdad 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -141,7 +141,7 @@ static inline u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
}

void reprogram_gp_counter(struct kvm_pmc *pmc);
-void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx);
+void reprogram_fixed_counter(struct kvm_pmc *pmc);
void reprogram_counter(struct kvm_pmc *pmc);

void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 2eefde7e4b1a..3ddbfdd16cd0 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -37,23 +37,23 @@ static int fixed_pmc_events[] = {1, 0, 7};

static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
{
+ struct kvm_pmc *pmc;
+ u8 old_fixed_ctr_ctrl = pmu->fixed_ctr_ctrl;
int i;

+ pmu->fixed_ctr_ctrl = data;
for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
u8 new_ctrl = fixed_ctrl_field(data, i);
- u8 old_ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, i);
- struct kvm_pmc *pmc;
-
- pmc = get_fixed_pmc(pmu, MSR_CORE_PERF_FIXED_CTR0 + i);
+ u8 old_ctrl = fixed_ctrl_field(old_fixed_ctr_ctrl, i);

if (old_ctrl == new_ctrl)
continue;

- __set_bit(INTEL_PMC_IDX_FIXED + i, pmu->pmc_in_use);
- reprogram_fixed_counter(pmc, new_ctrl, i);
- }
+ pmc = get_fixed_pmc(pmu, MSR_CORE_PERF_FIXED_CTR0 + i);

- pmu->fixed_ctr_ctrl = data;
+ __set_bit(INTEL_PMC_IDX_FIXED + i, pmu->pmc_in_use);
+ reprogram_fixed_counter(pmc);
+ }
}

static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
--
2.35.1

2022-03-02 16:13:49

by Like Xu

Subject: [PATCH v2 04/12] KVM: x86/pmu: Drop "u64 eventsel" for reprogram_gp_counter()

From: Like Xu <[email protected]>

Because reprogram_gp_counter() is bound to assign the requested eventsel
to pmc->eventsel, this assignment step can be moved forward, thus
reducing the parameters passed to just "struct kvm_pmc *pmc".

No functional change intended.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/pmu.c | 7 +++----
arch/x86/kvm/pmu.h | 2 +-
arch/x86/kvm/svm/pmu.c | 3 ++-
arch/x86/kvm/vmx/pmu_intel.c | 3 ++-
4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 0ce33a2798cd..7b8a5f973a63 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -215,16 +215,15 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
return allow_event;
}

-void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
+void reprogram_gp_counter(struct kvm_pmc *pmc)
{
u64 config;
u32 type = PERF_TYPE_RAW;
+ u64 eventsel = pmc->eventsel;

if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
printk_once("kvm pmu: pin control bit is ignored\n");

- pmc->eventsel = eventsel;
-
pmc_pause_counter(pmc);

if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
@@ -291,7 +290,7 @@ EXPORT_SYMBOL_GPL(reprogram_fixed_counter);
void reprogram_counter(struct kvm_pmc *pmc)
{
if (pmc_is_gp(pmc))
- reprogram_gp_counter(pmc, pmc->eventsel);
+ reprogram_gp_counter(pmc);
else {
int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
u8 ctrl = fixed_ctrl_field(pmc_to_pmu(pmc)->fixed_ctr_ctrl, idx);
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index b529c54dc309..4db50c290c62 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -140,7 +140,7 @@ static inline u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
return sample_period;
}

-void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel);
+void reprogram_gp_counter(struct kvm_pmc *pmc);
void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx);
void reprogram_counter(struct kvm_pmc *pmc);

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index d4de52409335..7ff9ccaca0a4 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -265,7 +265,8 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (data == pmc->eventsel)
return 0;
if (!(data & pmu->reserved_bits)) {
- reprogram_gp_counter(pmc, data);
+ pmc->eventsel = data;
+ reprogram_gp_counter(pmc);
return 0;
}
}
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 20f2b5f5102b..2eefde7e4b1a 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -448,7 +448,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (data == pmc->eventsel)
return 0;
if (!(data & pmu->reserved_bits)) {
- reprogram_gp_counter(pmc, data);
+ pmc->eventsel = data;
+ reprogram_gp_counter(pmc);
return 0;
}
} else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, false))
--
2.35.1

2022-03-02 19:07:04

by Like Xu

Subject: [PATCH v2 03/12] KVM: x86/pmu: Pass only "struct kvm_pmc *pmc" to reprogram_counter()

From: Like Xu <[email protected]>

Passing only the reference "struct kvm_pmc *pmc" when creating
pmc->perf_event is sufficient. This change simplifies the calling
convention, allowing reprogram_{gp, fixed}_counter() to be replaced
by reprogram_counter() seamlessly.

No functional change intended.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/pmu.c | 17 +++++------------
arch/x86/kvm/pmu.h | 2 +-
arch/x86/kvm/vmx/pmu_intel.c | 32 ++++++++++++++++++--------------
3 files changed, 24 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index fda963161951..0ce33a2798cd 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -288,18 +288,13 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
}
EXPORT_SYMBOL_GPL(reprogram_fixed_counter);

-void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx)
+void reprogram_counter(struct kvm_pmc *pmc)
{
- struct kvm_pmc *pmc = kvm_x86_ops.pmu_ops->pmc_idx_to_pmc(pmu, pmc_idx);
-
- if (!pmc)
- return;
-
if (pmc_is_gp(pmc))
reprogram_gp_counter(pmc, pmc->eventsel);
else {
- int idx = pmc_idx - INTEL_PMC_IDX_FIXED;
- u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, idx);
+ int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
+ u8 ctrl = fixed_ctrl_field(pmc_to_pmu(pmc)->fixed_ctr_ctrl, idx);

reprogram_fixed_counter(pmc, ctrl, idx);
}
@@ -318,8 +313,7 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
clear_bit(bit, pmu->reprogram_pmi);
continue;
}
-
- reprogram_counter(pmu, bit);
+ reprogram_counter(pmc);
}

/*
@@ -505,13 +499,12 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)

static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
{
- struct kvm_pmu *pmu = pmc_to_pmu(pmc);
u64 prev_count;

prev_count = pmc->counter;
pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);

- reprogram_counter(pmu, pmc->idx);
+ reprogram_counter(pmc);
if (pmc->counter < prev_count)
__kvm_perf_overflow(pmc, false);
}
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 7a7b8d5b775e..b529c54dc309 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -142,7 +142,7 @@ static inline u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)

void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel);
void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx);
-void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx);
+void reprogram_counter(struct kvm_pmc *pmc);

void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 4e5b1eeeb77c..20f2b5f5102b 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -56,16 +56,32 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
pmu->fixed_ctr_ctrl = data;
}

+static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
+{
+ if (pmc_idx < INTEL_PMC_IDX_FIXED)
+ return get_gp_pmc(pmu, MSR_P6_EVNTSEL0 + pmc_idx,
+ MSR_P6_EVNTSEL0);
+ else {
+ u32 idx = pmc_idx - INTEL_PMC_IDX_FIXED;
+
+ return get_fixed_pmc(pmu, idx + MSR_CORE_PERF_FIXED_CTR0);
+ }
+}
+
/* function is called when global control register has been updated. */
static void global_ctrl_changed(struct kvm_pmu *pmu, u64 data)
{
int bit;
u64 diff = pmu->global_ctrl ^ data;
+ struct kvm_pmc *pmc;

pmu->global_ctrl = data;

- for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX)
- reprogram_counter(pmu, bit);
+ for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX) {
+ pmc = intel_pmc_idx_to_pmc(pmu, bit);
+ if (pmc)
+ reprogram_counter(pmc);
+ }
}

static unsigned int intel_pmc_perf_hw_id(struct kvm_pmc *pmc)
@@ -101,18 +117,6 @@ static bool intel_pmc_is_enabled(struct kvm_pmc *pmc)
return test_bit(pmc->idx, (unsigned long *)&pmu->global_ctrl);
}

-static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
-{
- if (pmc_idx < INTEL_PMC_IDX_FIXED)
- return get_gp_pmc(pmu, MSR_P6_EVNTSEL0 + pmc_idx,
- MSR_P6_EVNTSEL0);
- else {
- u32 idx = pmc_idx - INTEL_PMC_IDX_FIXED;
-
- return get_fixed_pmc(pmu, idx + MSR_CORE_PERF_FIXED_CTR0);
- }
-}
-
static bool intel_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
--
2.35.1

2022-03-02 19:57:06

by Like Xu

Subject: [PATCH v2 07/12] KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp, fixed}counter()

From: Like Xu <[email protected]>

The code sketches for reprogram_{gp, fixed}_counter() are similar: the
fixed counter uses the PERF_TYPE_HARDWARE type, while the gp counter
can use either PERF_TYPE_HARDWARE or PERF_TYPE_RAW depending on the
pmc->eventsel value.

After commit 761875634a5e ("KVM: x86/pmu: Setup pmc->eventsel
for fixed PMCs"), the pmc->eventsel of a fixed counter is also set up
with the same semantic value and will not change during guest runtime.
But essentially, "the HARDWARE is just a convenience wrapper over RAW
IIRC", to quote PeterZ. So it should be safe to use only the
PERF_TYPE_RAW type to program both gp and fixed counters naturally
in reprogram_counter().

To make the gp and fixed counters more semantically symmetrical,
the selection of EVENTSEL_{USER, OS, INT} bits is temporarily translated
via fixed_ctr_ctrl before the pmc_reprogram_counter() call.

Cc: Peter Zijlstra <[email protected]>
Suggested-by: Jim Mattson <[email protected]>
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/pmu.c | 128 +++++++++++++----------------------
arch/x86/kvm/vmx/pmu_intel.c | 2 +-
2 files changed, 47 insertions(+), 83 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 5299488b002c..00e1660c10ca 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -215,85 +215,60 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
return allow_event;
}

-static void reprogram_gp_counter(struct kvm_pmc *pmc)
-{
- u64 config;
- u32 type = PERF_TYPE_RAW;
- u64 eventsel = pmc->eventsel;
-
- if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
- printk_once("kvm pmu: pin control bit is ignored\n");
-
- pmc_pause_counter(pmc);
-
- if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
- return;
-
- if (!check_pmu_event_filter(pmc))
- return;
-
- if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
- ARCH_PERFMON_EVENTSEL_INV |
- ARCH_PERFMON_EVENTSEL_CMASK |
- HSW_IN_TX |
- HSW_IN_TX_CHECKPOINTED))) {
- config = kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc);
- if (config != PERF_COUNT_HW_MAX)
- type = PERF_TYPE_HARDWARE;
- }
-
- if (type == PERF_TYPE_RAW)
- config = eventsel & AMD64_RAW_EVENT_MASK;
-
- if (pmc->current_config == eventsel && pmc_resume_counter(pmc))
- return;
-
- pmc_release_perf_event(pmc);
-
- pmc->current_config = eventsel;
- pmc_reprogram_counter(pmc, type, config,
- !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
- !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
- eventsel & ARCH_PERFMON_EVENTSEL_INT,
- (eventsel & HSW_IN_TX),
- (eventsel & HSW_IN_TX_CHECKPOINTED));
-}
-
-static void reprogram_fixed_counter(struct kvm_pmc *pmc)
+static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
{
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
- int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
- u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, idx);
- unsigned en_field = ctrl & 0x3;
- bool pmi = ctrl & 0x8;

- pmc_pause_counter(pmc);
+ if (pmc_is_fixed(pmc))
+ return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
+ pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;

- if (!en_field || !pmc_is_enabled(pmc))
- return;
-
- if (!check_pmu_event_filter(pmc))
- return;
-
- if (pmc->current_config == (u64)ctrl && pmc_resume_counter(pmc))
- return;
-
- pmc_release_perf_event(pmc);
-
- pmc->current_config = (u64)ctrl;
- pmc_reprogram_counter(pmc, PERF_TYPE_HARDWARE,
- kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc),
- !(en_field & 0x2), /* exclude user */
- !(en_field & 0x1), /* exclude kernel */
- pmi, false, false);
+ return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
}

void reprogram_counter(struct kvm_pmc *pmc)
{
- if (pmc_is_gp(pmc))
- reprogram_gp_counter(pmc);
- else
- reprogram_fixed_counter(pmc);
+ struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+ u64 eventsel = pmc->eventsel;
+ u64 new_config = eventsel;
+ u8 fixed_ctr_ctrl;
+
+ pmc_pause_counter(pmc);
+
+ if (!pmc_speculative_in_use(pmc) || !pmc_is_enabled(pmc))
+ return;
+
+ if (!check_pmu_event_filter(pmc))
+ return;
+
+ if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
+ printk_once("kvm pmu: pin control bit is ignored\n");
+
+ if (pmc_is_fixed(pmc)) {
+ fixed_ctr_ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl,
+ pmc->idx - INTEL_PMC_IDX_FIXED);
+ if (fixed_ctr_ctrl & 0x1)
+ eventsel |= ARCH_PERFMON_EVENTSEL_OS;
+ if (fixed_ctr_ctrl & 0x2)
+ eventsel |= ARCH_PERFMON_EVENTSEL_USR;
+ if (fixed_ctr_ctrl & 0x8)
+ eventsel |= ARCH_PERFMON_EVENTSEL_INT;
+ new_config = (u64)fixed_ctr_ctrl;
+ }
+
+ if (pmc->current_config == new_config && pmc_resume_counter(pmc))
+ return;
+
+ pmc_release_perf_event(pmc);
+
+ pmc->current_config = new_config;
+ pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
+ (eventsel & AMD64_RAW_EVENT_MASK),
+ !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
+ !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
+ eventsel & ARCH_PERFMON_EVENTSEL_INT,
+ (eventsel & HSW_IN_TX),
+ (eventsel & HSW_IN_TX_CHECKPOINTED));
}
EXPORT_SYMBOL_GPL(reprogram_counter);

@@ -451,17 +426,6 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
kvm_pmu_refresh(vcpu);
}

-static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
-{
- struct kvm_pmu *pmu = pmc_to_pmu(pmc);
-
- if (pmc_is_fixed(pmc))
- return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
- pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
-
- return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
-}
-
/* Release perf_events for vPMCs that have been unused for a full time slice. */
void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
{
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 19b78a9d9d47..d823fbe4e155 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -492,7 +492,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->reserved_bits = 0xffffffff00200000ull;

entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
- if (!entry || !vcpu->kvm->arch.enable_pmu)
+ if (!entry || !vcpu->kvm->arch.enable_pmu || !boot_cpu_has(X86_FEATURE_ARCH_PERFMON))
return;
eax.full = entry->eax;
edx.full = entry->edx;
--
2.35.1

2022-03-02 22:26:08

by Like Xu

Subject: [PATCH v2 08/12] perf: x86/core: Add interface to query perfmon_event_map[] directly

From: Like Xu <[email protected]>

Currently, we have [intel|knc|p4|p6]_perfmon_event_map on the Intel
platforms and amd_[f17h]_perfmon_event_map on the AMD platforms.

Early, clumsy KVM code or other potential perf_event users may have
hard-coded copies of these perfmon maps (e.g., arch/x86/kvm/svm/pmu.c),
and it makes no sense to program a common hardware event based on the
generic "enum perf_hw_id" once the two tables diverge.

Let's provide an interface for callers outside the perf subsystem to get
the counter config based on the perfmon_event_map currently in use;
it also helps to save a few bytes.
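
A minimal usage sketch from a caller's point of view (illustrative only;
the actual KVM conversion is in patch 0009):

	/* query the host's own encoding for a generic hardware event */
	u64 config = perf_get_hw_event_config(PERF_COUNT_HW_INSTRUCTIONS);

	/* 0 means the event is not present in the map currently in use */
	if (config)
		pr_info("instructions event encodes as %#llx\n", config);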

Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/events/core.c | 11 +++++++++++
arch/x86/include/asm/perf_event.h | 6 ++++++
2 files changed, 17 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e686c5e0537b..e760a1348c62 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2996,3 +2996,14 @@ void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
cap->events_mask_len = x86_pmu.events_mask_len;
}
EXPORT_SYMBOL_GPL(perf_get_x86_pmu_capability);
+
+u64 perf_get_hw_event_config(int hw_event)
+{
+ int max = x86_pmu.max_events;
+
+ if (hw_event < max)
+ return x86_pmu.event_map(array_index_nospec(hw_event, max));
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(perf_get_hw_event_config);
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 8fc1b5003713..822927045406 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -477,6 +477,7 @@ struct x86_pmu_lbr {
};

extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
+extern u64 perf_get_hw_event_config(int hw_event);
extern void perf_check_microcode(void);
extern void perf_clear_dirty_counters(void);
extern int x86_perf_rdpmc_index(struct perf_event *event);
@@ -486,6 +487,11 @@ static inline void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
memset(cap, 0, sizeof(*cap));
}

+static inline u64 perf_get_hw_event_config(int hw_event)
+{
+ return 0;
+}
+
static inline void perf_events_lapic_init(void) { }
static inline void perf_check_microcode(void) { }
#endif
--
2.35.1

2022-03-02 22:32:28

by Like Xu

Subject: [PATCH v2 09/12] KVM: x86/pmu: Replace pmc_perf_hw_id() with perf_get_hw_event_config()

From: Like Xu <[email protected]>

With the help of perf_get_hw_event_config(), KVM can query the correct
EVENTSEL_{EVENT, UMASK} pair of a kernel-generic hw event directly from
the applicable *_perfmon_event_map[] via the kernel's pre-defined perf_hw_id.

Also extend the bit range of the comparison field to
AMD64_RAW_EVENT_MASK_NB in case AMD defines
EventSelect[11:8] in perfmon_event_map[] one day.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/pmu.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 00e1660c10ca..9fb7d29e5fdd 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -472,13 +472,8 @@ static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
static inline bool eventsel_match_perf_hw_id(struct kvm_pmc *pmc,
unsigned int perf_hw_id)
{
- u64 old_eventsel = pmc->eventsel;
- unsigned int config;
-
- pmc->eventsel &= (ARCH_PERFMON_EVENTSEL_EVENT | ARCH_PERFMON_EVENTSEL_UMASK);
- config = kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc);
- pmc->eventsel = old_eventsel;
- return config == perf_hw_id;
+ return !((pmc->eventsel ^ perf_get_hw_event_config(perf_hw_id)) &
+ AMD64_RAW_EVENT_MASK_NB);
}

static inline bool cpl_is_matched(struct kvm_pmc *pmc)
--
2.35.1

2022-03-02 23:45:58

by Like Xu

Subject: [PATCH v2 12/12] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292

From: Like Xu <[email protected]>

The AMD Family 19h Models 00h-0Fh processors may experience sampling
inaccuracies that cause some performance counters to overcount
retire-based events. To count the affected non-FP PMC events correctly,
a patched guest with a matching vCPU model would:

- Use Core::X86::Msr::PERF_CTL2 to count the events, and
- Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
- Program Core::X86::Msr::PERF_CTL2[20] to 0b.

To support this usage in AMD guests, KVM should stop treating bit 43 as
reserved, but only for counter #2. Treatment of all other cases remains
unchanged.
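
From the guest's point of view, the workaround boils down to something
like the following sketch (hypothetical guest-kernel snippet, not part of
this patch):

	u64 ctl;

	rdmsrl(MSR_F15H_PERF_CTL2, ctl);	/* Core::X86::Msr::PERF_CTL2 */
	ctl |= BIT_ULL(43);			/* erratum 1292 workaround bit */
	ctl &= ~BIT_ULL(20);
	wrmsrl(MSR_F15H_PERF_CTL2, ctl);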

The AMD hardware team has clarified that the conditions under which the
overcounting can happen are quite rare. This change may make PMU driver
developers who have read erratum #1292 a little less disappointed.

Reported-by: Jim Mattson <[email protected]>
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/svm/pmu.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 41c9b9e2aec2..05b4e4f2bb66 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -18,6 +18,20 @@
#include "pmu.h"
#include "svm.h"

+/*
+ * As a workaround of "Retire Based Events May Overcount" for erratum 1292,
+ * some patched guests may set PERF_CTL2[43] to 1b and PERF_CTL2[20] to 0b
+ * to count the non-FP affected PMC events correctly.
+ *
+ * Note, tests show that the counter difference before and after using the
+ * workaround is not significant. Host will be scheduling CTR2 indiscriminately.
+ */
+static inline bool vcpu_overcount_retire_events(struct kvm_vcpu *vcpu)
+{
+ return guest_cpuid_family(vcpu) == 0x19 &&
+ guest_cpuid_model(vcpu) < 0x10;
+}
+
enum pmu_type {
PMU_TYPE_COUNTER = 0,
PMU_TYPE_EVNTSEL,
@@ -224,6 +238,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
struct kvm_pmc *pmc;
u32 msr = msr_info->index;
u64 data = msr_info->data;
+ u64 reserved_bits;

/* MSR_PERFCTRn */
pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
@@ -236,7 +251,10 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (pmc) {
if (data == pmc->eventsel)
return 0;
- if (!(data & pmu->reserved_bits)) {
+ reserved_bits = pmu->reserved_bits;
+ if (pmc->idx == 2 && vcpu_overcount_retire_events(vcpu))
+ reserved_bits &= ~BIT_ULL(43);
+ if (!(data & reserved_bits)) {
pmc->eventsel = data;
reprogram_counter(pmc);
return 0;
--
2.35.1

2022-03-03 00:11:12

by Like Xu

Subject: [PATCH v2 02/12] KVM: x86/pmu: Extract check_pmu_event_filter() from the same semantics

From: Like Xu <[email protected]>

Checking the kvm->arch.pmu_event_filter policy in both gp and fixed
code paths was somewhat redundant, so common parts can be extracted,
which reduces code footprint and improves readability.

Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/pmu.c | 61 +++++++++++++++++++++++++++-------------------
1 file changed, 36 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 3f09af678b2c..fda963161951 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -182,13 +182,43 @@ static int cmp_u64(const void *a, const void *b)
return *(__u64 *)a - *(__u64 *)b;
}

+static bool check_pmu_event_filter(struct kvm_pmc *pmc)
+{
+ struct kvm_pmu_event_filter *filter;
+ struct kvm *kvm = pmc->vcpu->kvm;
+ bool allow_event = true;
+ __u64 key;
+ int idx;
+
+ filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
+ if (!filter)
+ goto out;
+
+ if (pmc_is_gp(pmc)) {
+ key = pmc->eventsel & AMD64_RAW_EVENT_MASK_NB;
+ if (bsearch(&key, filter->events, filter->nevents,
+ sizeof(__u64), cmp_u64))
+ allow_event = filter->action == KVM_PMU_EVENT_ALLOW;
+ else
+ allow_event = filter->action == KVM_PMU_EVENT_DENY;
+ } else {
+ idx = pmc->idx - INTEL_PMC_IDX_FIXED;
+ if (filter->action == KVM_PMU_EVENT_DENY &&
+ test_bit(idx, (ulong *)&filter->fixed_counter_bitmap))
+ allow_event = false;
+ if (filter->action == KVM_PMU_EVENT_ALLOW &&
+ !test_bit(idx, (ulong *)&filter->fixed_counter_bitmap))
+ allow_event = false;
+ }
+
+out:
+ return allow_event;
+}
+
void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
{
u64 config;
u32 type = PERF_TYPE_RAW;
- struct kvm *kvm = pmc->vcpu->kvm;
- struct kvm_pmu_event_filter *filter;
- bool allow_event = true;

if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
printk_once("kvm pmu: pin control bit is ignored\n");
@@ -200,17 +230,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
return;

- filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
- if (filter) {
- __u64 key = eventsel & AMD64_RAW_EVENT_MASK_NB;
-
- if (bsearch(&key, filter->events, filter->nevents,
- sizeof(__u64), cmp_u64))
- allow_event = filter->action == KVM_PMU_EVENT_ALLOW;
- else
- allow_event = filter->action == KVM_PMU_EVENT_DENY;
- }
- if (!allow_event)
+ if (!check_pmu_event_filter(pmc))
return;

if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
@@ -245,23 +265,14 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
{
unsigned en_field = ctrl & 0x3;
bool pmi = ctrl & 0x8;
- struct kvm_pmu_event_filter *filter;
- struct kvm *kvm = pmc->vcpu->kvm;

pmc_pause_counter(pmc);

if (!en_field || !pmc_is_enabled(pmc))
return;

- filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
- if (filter) {
- if (filter->action == KVM_PMU_EVENT_DENY &&
- test_bit(idx, (ulong *)&filter->fixed_counter_bitmap))
- return;
- if (filter->action == KVM_PMU_EVENT_ALLOW &&
- !test_bit(idx, (ulong *)&filter->fixed_counter_bitmap))
- return;
- }
+ if (!check_pmu_event_filter(pmc))
+ return;

if (pmc->current_config == (u64)ctrl && pmc_resume_counter(pmc))
return;
--
2.35.1

2022-03-08 13:12:02

by Like Xu

Subject: Re: [PATCH v2 07/12] KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp, fixed}counter()

On 8/3/2022 8:36 am, Jim Mattson wrote:
> On Wed, Mar 2, 2022 at 3:14 AM Like Xu <[email protected]> wrote:
>>
>> From: Like Xu <[email protected]>
>>
>> The code sketch for reprogram_{gp, fixed}_counter() is similar, while the
>> fixed counter using the PERF_TYPE_HARDWAR type and the gp being
>> able to use either PERF_TYPE_HARDWAR or PERF_TYPE_RAW type
>> depending on the pmc->eventsel value.
>>
>> After 'commit 761875634a5e ("KVM: x86/pmu: Setup pmc->eventsel
>> for fixed PMCs")', the pmc->eventsel of the fixed counter will also have
>> been setup with the same semantic value and will not be changed during
>> the guest runtime. But essentially, "the HARDWARE is just a convenience
>> wrapper over RAW IIRC", quoated from Peterz. So it could be pretty safe
>> to use the PERF_TYPE_RAW type only to program both gp and fixed
>> counters naturally in the reprogram_counter().
>>
>> To make the gp and fixed counters more semantically symmetrical,
>> the selection of EVENTSEL_{USER, OS, INT} bits is temporarily translated
>> via fixed_ctr_ctrl before the pmc_reprogram_counter() call.
>>
>> Cc: Peter Zijlstra <[email protected]>
>> Suggested-by: Jim Mattson <[email protected]>
>> Signed-off-by: Like Xu <[email protected]>
>> ---
>> arch/x86/kvm/pmu.c | 128 +++++++++++++----------------------
>> arch/x86/kvm/vmx/pmu_intel.c | 2 +-
>> 2 files changed, 47 insertions(+), 83 deletions(-)
>>
>> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
>> index 5299488b002c..00e1660c10ca 100644
>> --- a/arch/x86/kvm/pmu.c
>> +++ b/arch/x86/kvm/pmu.c
>> @@ -215,85 +215,60 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
>> return allow_event;
>> }
>>
>> -static void reprogram_gp_counter(struct kvm_pmc *pmc)
>> -{
>> - u64 config;
>> - u32 type = PERF_TYPE_RAW;
>> - u64 eventsel = pmc->eventsel;
>> -
>> - if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
>> - printk_once("kvm pmu: pin control bit is ignored\n");
>> -
>> - pmc_pause_counter(pmc);
>> -
>> - if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
>> - return;
>> -
>> - if (!check_pmu_event_filter(pmc))
>> - return;
>> -
>> - if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
>> - ARCH_PERFMON_EVENTSEL_INV |
>> - ARCH_PERFMON_EVENTSEL_CMASK |
>> - HSW_IN_TX |
>> - HSW_IN_TX_CHECKPOINTED))) {
>> - config = kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc);
>> - if (config != PERF_COUNT_HW_MAX)
>> - type = PERF_TYPE_HARDWARE;
>> - }
>> -
>> - if (type == PERF_TYPE_RAW)
>> - config = eventsel & AMD64_RAW_EVENT_MASK;
>> -
>> - if (pmc->current_config == eventsel && pmc_resume_counter(pmc))
>> - return;
>> -
>> - pmc_release_perf_event(pmc);
>> -
>> - pmc->current_config = eventsel;
>> - pmc_reprogram_counter(pmc, type, config,
>> - !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
>> - !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
>> - eventsel & ARCH_PERFMON_EVENTSEL_INT,
>> - (eventsel & HSW_IN_TX),
>> - (eventsel & HSW_IN_TX_CHECKPOINTED));
>> -}
>> -
>> -static void reprogram_fixed_counter(struct kvm_pmc *pmc)
>> +static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
>> {
>> struct kvm_pmu *pmu = pmc_to_pmu(pmc);
>> - int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
>> - u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, idx);
>> - unsigned en_field = ctrl & 0x3;
>> - bool pmi = ctrl & 0x8;
>>
>> - pmc_pause_counter(pmc);
>> + if (pmc_is_fixed(pmc))
>> + return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
>> + pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
>>
>> - if (!en_field || !pmc_is_enabled(pmc))
>> - return;
>> -
>> - if (!check_pmu_event_filter(pmc))
>> - return;
>> -
>> - if (pmc->current_config == (u64)ctrl && pmc_resume_counter(pmc))
>> - return;
>> -
>> - pmc_release_perf_event(pmc);
>> -
>> - pmc->current_config = (u64)ctrl;
>> - pmc_reprogram_counter(pmc, PERF_TYPE_HARDWARE,
>> - kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc),
>> - !(en_field & 0x2), /* exclude user */
>> - !(en_field & 0x1), /* exclude kernel */
>> - pmi, false, false);
>> + return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
>> }
>>
>> void reprogram_counter(struct kvm_pmc *pmc)
>> {
>> - if (pmc_is_gp(pmc))
>> - reprogram_gp_counter(pmc);
>> - else
>> - reprogram_fixed_counter(pmc);
>> + struct kvm_pmu *pmu = pmc_to_pmu(pmc);
>> + u64 eventsel = pmc->eventsel;
>> + u64 new_config = eventsel;
>> + u8 fixed_ctr_ctrl;
>> +
>> + pmc_pause_counter(pmc);
>> +
>> + if (!pmc_speculative_in_use(pmc) || !pmc_is_enabled(pmc))
>> + return;
>> +
>> + if (!check_pmu_event_filter(pmc))
>> + return;
>> +
>> + if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
>> + printk_once("kvm pmu: pin control bit is ignored\n");
>> +
>> + if (pmc_is_fixed(pmc)) {
>> + fixed_ctr_ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl,
>> + pmc->idx - INTEL_PMC_IDX_FIXED);
>> + if (fixed_ctr_ctrl & 0x1)
>> + eventsel |= ARCH_PERFMON_EVENTSEL_OS;
>> + if (fixed_ctr_ctrl & 0x2)
>> + eventsel |= ARCH_PERFMON_EVENTSEL_USR;
>> + if (fixed_ctr_ctrl & 0x8)
>> + eventsel |= ARCH_PERFMON_EVENTSEL_INT;
>> + new_config = (u64)fixed_ctr_ctrl;
>> + }
>> +
>> + if (pmc->current_config == new_config && pmc_resume_counter(pmc))
>> + return;
>> +
>> + pmc_release_perf_event(pmc);
>> +
>> + pmc->current_config = new_config;
>> + pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
>> + (eventsel & AMD64_RAW_EVENT_MASK),
>> + !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
>> + !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
>> + eventsel & ARCH_PERFMON_EVENTSEL_INT,
>> + (eventsel & HSW_IN_TX),
>> + (eventsel & HSW_IN_TX_CHECKPOINTED));
>
> It seems that this extremely long argument list was motivated by the
> differences between the two original call sites. Now that you have
> mocked up a full eventsel (with USR, OS, INT, IN_TX, and IN_TXCP bits)
> for the fixed counters, why not pass the entire eventsel as the third

I've thought about it.

I'm trying to pass in generic bits (EVENT_MASK, USER, OS, INT) to
pmc_reprogram_counter() and let the latter handle the implementation details
of assembling the "struct perf_event_attr" to talk carefully with the perf core.

I suppose the fixed counters don't support the IN_TX* bits, and I try to
clean those two away in another patch, as you know.

In terms of code readability, the current version achieves a good balance.

> argument and drop all of the rest? Then, pmc_reprogram_counter() can
> extract/check the bits of interest.
>
>> }
>> EXPORT_SYMBOL_GPL(reprogram_counter);
>>
>> @@ -451,17 +426,6 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
>> kvm_pmu_refresh(vcpu);
>> }
>>
>> -static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
>> -{
>> - struct kvm_pmu *pmu = pmc_to_pmu(pmc);
>> -
>> - if (pmc_is_fixed(pmc))
>> - return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
>> - pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
>> -
>> - return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
>> -}
>> -
>> /* Release perf_events for vPMCs that have been unused for a full time slice. */
>> void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
>> {
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index 19b78a9d9d47..d823fbe4e155 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> @@ -492,7 +492,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>> pmu->reserved_bits = 0xffffffff00200000ull;
>>
>> entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
>> - if (!entry || !vcpu->kvm->arch.enable_pmu)
>> + if (!entry || !vcpu->kvm->arch.enable_pmu || !boot_cpu_has(X86_FEATURE_ARCH_PERFMON))
>
> This change seems unrelated.

The intention of using the PERF_TYPE_HARDWARE type was to emulate the
guest architectural PMU on a host without an architectural PMU (the oldest
Pentium 4), for which the guest vPMC needs to be reprogrammed using the
kernel-generic perf_hw_id.

That is the original story behind PERF_TYPE_HARDWARE, thanks to history
teacher Paolo, who also suggested this way of dropping that kind of support.

>
>> return;
>> eax.full = entry->eax;
>> edx.full = entry->edx;
>> --
>> 2.35.1
>>

2022-03-08 13:38:01

by Jim Mattson

Subject: Re: [PATCH v2 07/12] KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp, fixed}counter()

On Wed, Mar 2, 2022 at 3:14 AM Like Xu <[email protected]> wrote:
>
> From: Like Xu <[email protected]>
>
> The code sketch for reprogram_{gp, fixed}_counter() is similar, while the
> fixed counter using the PERF_TYPE_HARDWAR type and the gp being
> able to use either PERF_TYPE_HARDWAR or PERF_TYPE_RAW type
> depending on the pmc->eventsel value.
>
> After 'commit 761875634a5e ("KVM: x86/pmu: Setup pmc->eventsel
> for fixed PMCs")', the pmc->eventsel of the fixed counter will also have
> been setup with the same semantic value and will not be changed during
> the guest runtime. But essentially, "the HARDWARE is just a convenience
> wrapper over RAW IIRC", quoated from Peterz. So it could be pretty safe
> to use the PERF_TYPE_RAW type only to program both gp and fixed
> counters naturally in the reprogram_counter().
>
> To make the gp and fixed counters more semantically symmetrical,
> the selection of EVENTSEL_{USER, OS, INT} bits is temporarily translated
> via fixed_ctr_ctrl before the pmc_reprogram_counter() call.
>
> Cc: Peter Zijlstra <[email protected]>
> Suggested-by: Jim Mattson <[email protected]>
> Signed-off-by: Like Xu <[email protected]>
> ---
> arch/x86/kvm/pmu.c | 128 +++++++++++++----------------------
> arch/x86/kvm/vmx/pmu_intel.c | 2 +-
> 2 files changed, 47 insertions(+), 83 deletions(-)
>
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 5299488b002c..00e1660c10ca 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -215,85 +215,60 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
> return allow_event;
> }
>
> -static void reprogram_gp_counter(struct kvm_pmc *pmc)
> -{
> - u64 config;
> - u32 type = PERF_TYPE_RAW;
> - u64 eventsel = pmc->eventsel;
> -
> - if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
> - printk_once("kvm pmu: pin control bit is ignored\n");
> -
> - pmc_pause_counter(pmc);
> -
> - if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
> - return;
> -
> - if (!check_pmu_event_filter(pmc))
> - return;
> -
> - if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
> - ARCH_PERFMON_EVENTSEL_INV |
> - ARCH_PERFMON_EVENTSEL_CMASK |
> - HSW_IN_TX |
> - HSW_IN_TX_CHECKPOINTED))) {
> - config = kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc);
> - if (config != PERF_COUNT_HW_MAX)
> - type = PERF_TYPE_HARDWARE;
> - }
> -
> - if (type == PERF_TYPE_RAW)
> - config = eventsel & AMD64_RAW_EVENT_MASK;
> -
> - if (pmc->current_config == eventsel && pmc_resume_counter(pmc))
> - return;
> -
> - pmc_release_perf_event(pmc);
> -
> - pmc->current_config = eventsel;
> - pmc_reprogram_counter(pmc, type, config,
> - !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
> - !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
> - eventsel & ARCH_PERFMON_EVENTSEL_INT,
> - (eventsel & HSW_IN_TX),
> - (eventsel & HSW_IN_TX_CHECKPOINTED));
> -}
> -
> -static void reprogram_fixed_counter(struct kvm_pmc *pmc)
> +static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
> {
> struct kvm_pmu *pmu = pmc_to_pmu(pmc);
> - int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
> - u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, idx);
> - unsigned en_field = ctrl & 0x3;
> - bool pmi = ctrl & 0x8;
>
> - pmc_pause_counter(pmc);
> + if (pmc_is_fixed(pmc))
> + return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
> + pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
>
> - if (!en_field || !pmc_is_enabled(pmc))
> - return;
> -
> - if (!check_pmu_event_filter(pmc))
> - return;
> -
> - if (pmc->current_config == (u64)ctrl && pmc_resume_counter(pmc))
> - return;
> -
> - pmc_release_perf_event(pmc);
> -
> - pmc->current_config = (u64)ctrl;
> - pmc_reprogram_counter(pmc, PERF_TYPE_HARDWARE,
> - kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc),
> - !(en_field & 0x2), /* exclude user */
> - !(en_field & 0x1), /* exclude kernel */
> - pmi, false, false);
> + return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
> }
>
> void reprogram_counter(struct kvm_pmc *pmc)
> {
> - if (pmc_is_gp(pmc))
> - reprogram_gp_counter(pmc);
> - else
> - reprogram_fixed_counter(pmc);
> + struct kvm_pmu *pmu = pmc_to_pmu(pmc);
> + u64 eventsel = pmc->eventsel;
> + u64 new_config = eventsel;
> + u8 fixed_ctr_ctrl;
> +
> + pmc_pause_counter(pmc);
> +
> + if (!pmc_speculative_in_use(pmc) || !pmc_is_enabled(pmc))
> + return;
> +
> + if (!check_pmu_event_filter(pmc))
> + return;
> +
> + if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
> + printk_once("kvm pmu: pin control bit is ignored\n");
> +
> + if (pmc_is_fixed(pmc)) {
> + fixed_ctr_ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl,
> + pmc->idx - INTEL_PMC_IDX_FIXED);
> + if (fixed_ctr_ctrl & 0x1)
> + eventsel |= ARCH_PERFMON_EVENTSEL_OS;
> + if (fixed_ctr_ctrl & 0x2)
> + eventsel |= ARCH_PERFMON_EVENTSEL_USR;
> + if (fixed_ctr_ctrl & 0x8)
> + eventsel |= ARCH_PERFMON_EVENTSEL_INT;
> + new_config = (u64)fixed_ctr_ctrl;
> + }
> +
> + if (pmc->current_config == new_config && pmc_resume_counter(pmc))
> + return;
> +
> + pmc_release_perf_event(pmc);
> +
> + pmc->current_config = new_config;
> + pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
> + (eventsel & AMD64_RAW_EVENT_MASK),
> + !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
> + !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
> + eventsel & ARCH_PERFMON_EVENTSEL_INT,
> + (eventsel & HSW_IN_TX),
> + (eventsel & HSW_IN_TX_CHECKPOINTED));

It seems that this extremely long argument list was motivated by the
differences between the two original call sites. Now that you have
mocked up a full eventsel (with USR, OS, INT, IN_TX, and IN_TXCP bits)
for the fixed counters, why not pass the entire eventsel as the third
argument and drop all of the rest? Then, pmc_reprogram_counter() can
extract/check the bits of interest.

> }
> EXPORT_SYMBOL_GPL(reprogram_counter);
>
> @@ -451,17 +426,6 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
> kvm_pmu_refresh(vcpu);
> }
>
> -static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
> -{
> - struct kvm_pmu *pmu = pmc_to_pmu(pmc);
> -
> - if (pmc_is_fixed(pmc))
> - return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
> - pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
> -
> - return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
> -}
> -
> /* Release perf_events for vPMCs that have been unused for a full time slice. */
> void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
> {
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 19b78a9d9d47..d823fbe4e155 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -492,7 +492,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
> pmu->reserved_bits = 0xffffffff00200000ull;
>
> entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
> - if (!entry || !vcpu->kvm->arch.enable_pmu)
> + if (!entry || !vcpu->kvm->arch.enable_pmu || !boot_cpu_has(X86_FEATURE_ARCH_PERFMON))

This change seems unrelated.

> return;
> eax.full = entry->eax;
> edx.full = entry->edx;
> --
> 2.35.1
>