2022-05-18 13:26:50

by Like Xu

[permalink] [raw]
Subject: [PATCH RESEND v3 00/11] KVM: x86/pmu: More refactoring to get rid of PERF_TYPE_HARDWARE

Hi,

This is a follow-up to [0]. By keeping the same eventsel semantics for
gp and fixed counters, the reprogram code can be made more symmetrical,
simpler and even faster [1], and on top of that, it fixes the obsolete
amd_event_mapping[] issue [2].

Most of the changes are code refactoring, which helps to review the key
changes more easily. The patches are rebased on top of kvm/queue.

One of the notable changes is that we ended up removing the
reprogram_{gp, fixed}_counter() functions and replacing them with the
merged reprogram_counter(), where KVM programs pmc->perf_event
with only the PERF_TYPE_RAW type for any type of counter
(suggested by Jim as well). PeterZ confirmed the idea: "think so;
the HARDWARE is just a convenience wrapper over RAW IIRC".
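
For illustration only (not part of this series): the equivalence Peter
refers to can be sketched with perf_event_attr, where a PERF_TYPE_HARDWARE
event is resolved through the vendor's perfmon_event_map[] into the same
raw encoding that PERF_TYPE_RAW takes directly, e.g. for retired
instructions:

	struct perf_event_attr attr = { };

	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_INSTRUCTIONS; /* mapped via perfmon_event_map[] */

	/* ... ends up programming the same event as the raw encoding: */
	attr.type = PERF_TYPE_RAW;
	attr.config = 0x00c0; /* EventSelect 0xc0, UnitMask 0x00 */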

Practically, this change drops guest pmu support on hosts without
X86_FEATURE_ARCH_PERFMON (the oldest Pentium 4), for which
PERF_TYPE_HARDWARE was intentionally introduced so that such hosts
could map the architectural guest PMU events to their own. (thanks to Paolo)

The 3rd patch removes the call trace from the commit message, while we
still think that kvm->arch.pmu_event_filter requires SRCU protection for
the pmu_event_filter functionality, similar to "kvm->arch.msr_filter".

Please check more details in each commit and feel free to comment.

[0] https://lore.kernel.org/kvm/[email protected]/
[1] https://lore.kernel.org/kvm/[email protected]/
[2] https://lore.kernel.org/kvm/[email protected]/

Thanks,
Like Xu

---
v3: https://lore.kernel.org/kvm/[email protected]/
v3 -> v3 RESEND Changelog:
- Rebase on top of the kvm/queue tree;

v2 -> v3 Changelog:
- Use static_call stuff for .hw_event_is_unavail hook;
- Drop unrelated 0012 patch since we have addressed the issue;
- Drop passing HSW_IN_TX* bits for pmc_reprogram_counter();

v1: https://lore.kernel.org/kvm/[email protected]/
v1 -> v2 Changelog:
- Get all the vPMC tests I have on hand to pass;
- Update obsolete AMD PMU comments; (Jim)
- Fix my carelessness for [-Wconstant-logical-operand] in 0009; (0day)
- Add "u64 new_config" to reuse perf_event for fixed CTRs; (0day)
- Add patch 0012 to attract attention to review;

Like Xu (11):
KVM: x86/pmu: Update comments for AMD gp counters
KVM: x86/pmu: Extract check_pmu_event_filter() from the same semantics
KVM: x86/pmu: Protect kvm->arch.pmu_event_filter with SRCU
KVM: x86/pmu: Pass only "struct kvm_pmc *pmc" to reprogram_counter()
KVM: x86/pmu: Drop "u64 eventsel" for reprogram_gp_counter()
KVM: x86/pmu: Drop "u8 ctrl, int idx" for reprogram_fixed_counter()
KVM: x86/pmu: Use only the uniform interface reprogram_counter()
KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp,fixed}counter()
perf: x86/core: Add interface to query perfmon_event_map[] directly
KVM: x86/pmu: Replace pmc_perf_hw_id() with perf_get_hw_event_config()
KVM: x86/pmu: Drop amd_event_mapping[] in the KVM context

arch/x86/events/core.c | 11 ++
arch/x86/include/asm/kvm-x86-pmu-ops.h | 2 +-
arch/x86/include/asm/perf_event.h | 6 +
arch/x86/kvm/pmu.c | 157 ++++++++++---------------
arch/x86/kvm/pmu.h | 8 +-
arch/x86/kvm/svm/pmu.c | 62 ++--------
arch/x86/kvm/vmx/pmu_intel.c | 62 +++++-----
7 files changed, 121 insertions(+), 187 deletions(-)

--
2.36.1



2022-05-18 13:28:30

by Like Xu

[permalink] [raw]
Subject: [PATCH RESEND v3 01/11] KVM: x86/pmu: Update comments for AMD gp counters

From: Like Xu <[email protected]>

Update the obsolete comment to state more accurately that AMD platforms
have two base MSR addresses and two different maximum numbers of
gp counters, depending on the X86_FEATURE_PERFCTR_CORE feature.
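
For reference, a minimal sketch of the two layouts (illustrative only,
using the usual msr-index.h constants; not part of this patch):

	static u32 amd_gp_counter_msr(unsigned int idx, bool perfctr_core)
	{
		/* F15h and later: 6 counters, CTL/CTR pairs interleaved. */
		if (perfctr_core)
			return MSR_F15H_PERF_CTR0 + 2 * idx;	/* 0xc0010201 + 2 * idx */

		/* Legacy K7 layout: 4 counters. */
		return MSR_K7_PERFCTR0 + idx;			/* 0xc0010004 + idx */
	}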

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/pmu.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index b5d0c36b869b..3e200b9610f9 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -37,7 +37,9 @@ EXPORT_SYMBOL_GPL(kvm_pmu_cap);
* However AMD doesn't support fixed-counters;
* - There are three types of index to access perf counters (PMC):
* 1. MSR (named msr): For example Intel has MSR_IA32_PERFCTRn and AMD
- * has MSR_K7_PERFCTRn.
+ * has MSR_K7_PERFCTRn and, for families 15H and later,
+ * MSR_F15H_PERF_CTRn, where MSR_F15H_PERF_CTR[0-3] are
+ * aliased to MSR_K7_PERFCTRn.
* 2. MSR Index (named idx): This normally is used by RDPMC instruction.
* For instance AMD RDPMC instruction uses 0000_0003h in ECX to access
* C001_0007h (MSR_K7_PERCTR3). Intel has a similar mechanism, except
@@ -49,7 +51,8 @@ EXPORT_SYMBOL_GPL(kvm_pmu_cap);
* between pmc and perf counters is as the following:
* * Intel: [0 .. INTEL_PMC_MAX_GENERIC-1] <=> gp counters
* [INTEL_PMC_IDX_FIXED .. INTEL_PMC_IDX_FIXED + 2] <=> fixed
- * * AMD: [0 .. AMD64_NUM_COUNTERS-1] <=> gp counters
+ * * AMD: [0 .. AMD64_NUM_COUNTERS-1] and, for families 15H
+ * and later, [0 .. AMD64_NUM_COUNTERS_CORE-1] <=> gp counters
*/

static struct kvm_pmu_ops kvm_pmu_ops __read_mostly;
--
2.36.1


2022-05-18 13:31:18

by Like Xu

[permalink] [raw]
Subject: [PATCH RESEND v3 03/11] KVM: x86/pmu: Protect kvm->arch.pmu_event_filter with SRCU

From: Like Xu <[email protected]>

Similar to "kvm->arch.msr_filter", KVM should guarantee that vCPUs will
see either the previous filter or the new filter when user space calls
KVM_SET_PMU_EVENT_FILTER ioctl with the vCPU running so that guest
pmu events with identical settings in both the old and new filter have
deterministic behavior.
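
For context, the read-side protection added below is meant to pair with
the update side of the KVM_SET_PMU_EVENT_FILTER path, roughly as follows
(a sketch, not taken verbatim from the kernel):

	/* Publish the new filter, then wait for any vCPU still inside an
	 * SRCU read-side critical section before freeing the old one. */
	mutex_lock(&kvm->lock);
	filter = rcu_replace_pointer(kvm->arch.pmu_event_filter, filter,
				     mutex_is_locked(&kvm->lock));
	mutex_unlock(&kvm->lock);

	synchronize_srcu_expedited(&kvm->srcu);
	kfree(filter);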

Fixes: 66bb8a065f5a ("KVM: x86: PMU Event Filter")
Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/pmu.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index f189512207db..24624654e476 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -246,8 +246,9 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
struct kvm *kvm = pmc->vcpu->kvm;
bool allow_event = true;
__u64 key;
- int idx;
+ int idx, srcu_idx;

+ srcu_idx = srcu_read_lock(&kvm->srcu);
filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
if (!filter)
goto out;
@@ -270,6 +271,7 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
}

out:
+ srcu_read_unlock(&kvm->srcu, srcu_idx);
return allow_event;
}

--
2.36.1


2022-05-18 13:31:38

by Like Xu

[permalink] [raw]
Subject: [PATCH RESEND v3 02/11] KVM: x86/pmu: Extract check_pmu_event_filter() from the same semantics

From: Like Xu <[email protected]>

Checking the kvm->arch.pmu_event_filter policy in both the gp and fixed
code paths was somewhat redundant, so the common parts can be extracted,
which reduces the code footprint and improves readability.

Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/pmu.c | 63 +++++++++++++++++++++++++++-------------------
1 file changed, 37 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 3e200b9610f9..f189512207db 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -240,14 +240,44 @@ static int cmp_u64(const void *a, const void *b)
return *(__u64 *)a - *(__u64 *)b;
}

+static bool check_pmu_event_filter(struct kvm_pmc *pmc)
+{
+ struct kvm_pmu_event_filter *filter;
+ struct kvm *kvm = pmc->vcpu->kvm;
+ bool allow_event = true;
+ __u64 key;
+ int idx;
+
+ filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
+ if (!filter)
+ goto out;
+
+ if (pmc_is_gp(pmc)) {
+ key = pmc->eventsel & AMD64_RAW_EVENT_MASK_NB;
+ if (bsearch(&key, filter->events, filter->nevents,
+ sizeof(__u64), cmp_u64))
+ allow_event = filter->action == KVM_PMU_EVENT_ALLOW;
+ else
+ allow_event = filter->action == KVM_PMU_EVENT_DENY;
+ } else {
+ idx = pmc->idx - INTEL_PMC_IDX_FIXED;
+ if (filter->action == KVM_PMU_EVENT_DENY &&
+ test_bit(idx, (ulong *)&filter->fixed_counter_bitmap))
+ allow_event = false;
+ if (filter->action == KVM_PMU_EVENT_ALLOW &&
+ !test_bit(idx, (ulong *)&filter->fixed_counter_bitmap))
+ allow_event = false;
+ }
+
+out:
+ return allow_event;
+}
+
void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
{
u64 config;
u32 type = PERF_TYPE_RAW;
- struct kvm *kvm = pmc->vcpu->kvm;
- struct kvm_pmu_event_filter *filter;
- struct kvm_pmu *pmu = vcpu_to_pmu(pmc->vcpu);
- bool allow_event = true;
+ struct kvm_pmu *pmu = pmc_to_pmu(pmc);

if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
printk_once("kvm pmu: pin control bit is ignored\n");
@@ -259,17 +289,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
return;

- filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
- if (filter) {
- __u64 key = eventsel & AMD64_RAW_EVENT_MASK_NB;
-
- if (bsearch(&key, filter->events, filter->nevents,
- sizeof(__u64), cmp_u64))
- allow_event = filter->action == KVM_PMU_EVENT_ALLOW;
- else
- allow_event = filter->action == KVM_PMU_EVENT_DENY;
- }
- if (!allow_event)
+ if (!check_pmu_event_filter(pmc))
return;

if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
@@ -302,23 +322,14 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
{
unsigned en_field = ctrl & 0x3;
bool pmi = ctrl & 0x8;
- struct kvm_pmu_event_filter *filter;
- struct kvm *kvm = pmc->vcpu->kvm;

pmc_pause_counter(pmc);

if (!en_field || !pmc_is_enabled(pmc))
return;

- filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
- if (filter) {
- if (filter->action == KVM_PMU_EVENT_DENY &&
- test_bit(idx, (ulong *)&filter->fixed_counter_bitmap))
- return;
- if (filter->action == KVM_PMU_EVENT_ALLOW &&
- !test_bit(idx, (ulong *)&filter->fixed_counter_bitmap))
- return;
- }
+ if (!check_pmu_event_filter(pmc))
+ return;

if (pmc->current_config == (u64)ctrl && pmc_resume_counter(pmc))
return;
--
2.36.1


2022-05-18 13:31:48

by Like Xu

[permalink] [raw]
Subject: [PATCH RESEND v3 04/11] KVM: x86/pmu: Pass only "struct kvm_pmc *pmc" to reprogram_counter()

From: Like Xu <[email protected]>

Passing the reference "struct kvm_pmc *pmc" when creating
pmc->perf_event is sufficient. This change helps to simplify the
calling convention by replacing reprogram_{gp, fixed}_counter()
with reprogram_counter() seamlessly.

No functional change intended.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/pmu.c | 17 +++++------------
arch/x86/kvm/pmu.h | 2 +-
arch/x86/kvm/vmx/pmu_intel.c | 32 ++++++++++++++++++--------------
3 files changed, 24 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 24624654e476..ba767b4921e3 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -347,18 +347,13 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
}
EXPORT_SYMBOL_GPL(reprogram_fixed_counter);

-void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx)
+void reprogram_counter(struct kvm_pmc *pmc)
{
- struct kvm_pmc *pmc = static_call(kvm_x86_pmu_pmc_idx_to_pmc)(pmu, pmc_idx);
-
- if (!pmc)
- return;
-
if (pmc_is_gp(pmc))
reprogram_gp_counter(pmc, pmc->eventsel);
else {
- int idx = pmc_idx - INTEL_PMC_IDX_FIXED;
- u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, idx);
+ int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
+ u8 ctrl = fixed_ctrl_field(pmc_to_pmu(pmc)->fixed_ctr_ctrl, idx);

reprogram_fixed_counter(pmc, ctrl, idx);
}
@@ -377,8 +372,7 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
clear_bit(bit, pmu->reprogram_pmi);
continue;
}
-
- reprogram_counter(pmu, bit);
+ reprogram_counter(pmc);
}

/*
@@ -551,13 +545,12 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)

static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
{
- struct kvm_pmu *pmu = pmc_to_pmu(pmc);
u64 prev_count;

prev_count = pmc->counter;
pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);

- reprogram_counter(pmu, pmc->idx);
+ reprogram_counter(pmc);
if (pmc->counter < prev_count)
__kvm_perf_overflow(pmc, false);
}
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index dbf4c83519a4..0fd2518227f7 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -174,7 +174,7 @@ static inline void kvm_init_pmu_capability(void)

void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel);
void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx);
-void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx);
+void reprogram_counter(struct kvm_pmc *pmc);

void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 84b326c4dce9..33448482db50 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -56,16 +56,32 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
pmu->fixed_ctr_ctrl = data;
}

+static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
+{
+ if (pmc_idx < INTEL_PMC_IDX_FIXED) {
+ return get_gp_pmc(pmu, MSR_P6_EVNTSEL0 + pmc_idx,
+ MSR_P6_EVNTSEL0);
+ } else {
+ u32 idx = pmc_idx - INTEL_PMC_IDX_FIXED;
+
+ return get_fixed_pmc(pmu, idx + MSR_CORE_PERF_FIXED_CTR0);
+ }
+}
+
/* function is called when global control register has been updated. */
static void global_ctrl_changed(struct kvm_pmu *pmu, u64 data)
{
int bit;
u64 diff = pmu->global_ctrl ^ data;
+ struct kvm_pmc *pmc;

pmu->global_ctrl = data;

- for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX)
- reprogram_counter(pmu, bit);
+ for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX) {
+ pmc = intel_pmc_idx_to_pmc(pmu, bit);
+ if (pmc)
+ reprogram_counter(pmc);
+ }
}

static unsigned int intel_pmc_perf_hw_id(struct kvm_pmc *pmc)
@@ -101,18 +117,6 @@ static bool intel_pmc_is_enabled(struct kvm_pmc *pmc)
return test_bit(pmc->idx, (unsigned long *)&pmu->global_ctrl);
}

-static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
-{
- if (pmc_idx < INTEL_PMC_IDX_FIXED)
- return get_gp_pmc(pmu, MSR_P6_EVNTSEL0 + pmc_idx,
- MSR_P6_EVNTSEL0);
- else {
- u32 idx = pmc_idx - INTEL_PMC_IDX_FIXED;
-
- return get_fixed_pmc(pmu, idx + MSR_CORE_PERF_FIXED_CTR0);
- }
-}
-
static bool intel_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
--
2.36.1


2022-05-18 13:31:48

by Like Xu

[permalink] [raw]
Subject: [PATCH RESEND v3 06/11] KVM: x86/pmu: Drop "u8 ctrl, int idx" for reprogram_fixed_counter()

From: Like Xu <[email protected]>

Since the caller is bound to assign the requested fixed_ctr_ctrl to
pmu->fixed_ctr_ctrl after reprogram_fixed_counter() is called, this
assignment step can be moved forward (the stale value needed to compute
the diff is saved just beforehand), thus simplifying the passing of
parameters.

No functional change intended.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/pmu.c | 13 ++++++-------
arch/x86/kvm/pmu.h | 2 +-
arch/x86/kvm/vmx/pmu_intel.c | 16 ++++++++--------
3 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index cbffa060976e..131fbab612ca 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -319,8 +319,11 @@ void reprogram_gp_counter(struct kvm_pmc *pmc)
}
EXPORT_SYMBOL_GPL(reprogram_gp_counter);

-void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
+void reprogram_fixed_counter(struct kvm_pmc *pmc)
{
+ struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+ int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
+ u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, idx);
unsigned en_field = ctrl & 0x3;
bool pmi = ctrl & 0x8;

@@ -350,12 +353,8 @@ void reprogram_counter(struct kvm_pmc *pmc)
{
if (pmc_is_gp(pmc))
reprogram_gp_counter(pmc);
- else {
- int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
- u8 ctrl = fixed_ctrl_field(pmc_to_pmu(pmc)->fixed_ctr_ctrl, idx);
-
- reprogram_fixed_counter(pmc, ctrl, idx);
- }
+ else
+ reprogram_fixed_counter(pmc);
}
EXPORT_SYMBOL_GPL(reprogram_counter);

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 56204f5a545d..8d7912978249 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -173,7 +173,7 @@ static inline void kvm_init_pmu_capability(void)
}

void reprogram_gp_counter(struct kvm_pmc *pmc);
-void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx);
+void reprogram_fixed_counter(struct kvm_pmc *pmc);
void reprogram_counter(struct kvm_pmc *pmc);

void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 2bfca470d5fd..5e10a1ef435d 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -37,23 +37,23 @@ static int fixed_pmc_events[] = {1, 0, 7};

static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
{
+ struct kvm_pmc *pmc;
+ u8 old_fixed_ctr_ctrl = pmu->fixed_ctr_ctrl;
int i;

+ pmu->fixed_ctr_ctrl = data;
for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
u8 new_ctrl = fixed_ctrl_field(data, i);
- u8 old_ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, i);
- struct kvm_pmc *pmc;
-
- pmc = get_fixed_pmc(pmu, MSR_CORE_PERF_FIXED_CTR0 + i);
+ u8 old_ctrl = fixed_ctrl_field(old_fixed_ctr_ctrl, i);

if (old_ctrl == new_ctrl)
continue;

- __set_bit(INTEL_PMC_IDX_FIXED + i, pmu->pmc_in_use);
- reprogram_fixed_counter(pmc, new_ctrl, i);
- }
+ pmc = get_fixed_pmc(pmu, MSR_CORE_PERF_FIXED_CTR0 + i);

- pmu->fixed_ctr_ctrl = data;
+ __set_bit(INTEL_PMC_IDX_FIXED + i, pmu->pmc_in_use);
+ reprogram_fixed_counter(pmc);
+ }
}

static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
--
2.36.1


2022-05-18 13:31:49

by Like Xu

[permalink] [raw]
Subject: [PATCH RESEND v3 08/11] KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp,fixed}counter()

From: Like Xu <[email protected]>

The code sketches for reprogram_{gp, fixed}_counter() are similar: the
fixed counter uses the PERF_TYPE_HARDWARE type, while the gp counter
can use either the PERF_TYPE_HARDWARE or the PERF_TYPE_RAW type
depending on the pmc->eventsel value.

After commit 761875634a5e ("KVM: x86/pmu: Setup pmc->eventsel
for fixed PMCs"), the pmc->eventsel of a fixed counter will also have
been set up with the same semantic value and will not be changed during
guest runtime.

The original reason for using the PERF_TYPE_HARDWARE type was to emulate
the guest architectural PMU on a host without an architectural PMU (the
Pentium 4), for which the guest vPMC needs to be reprogrammed using the
kernel-generic perf_hw_id. But essentially, "the HARDWARE is just a
convenience wrapper over RAW IIRC", to quote PeterZ. So in practice it
should be safe to use only the PERF_TYPE_RAW type to program both gp and
fixed counters naturally in reprogram_counter().

To make the gp and fixed counters more semantically symmetrical,
the selection of EVENTSEL_{USER, OS, INT} bits is temporarily translated
via fixed_ctr_ctrl before the pmc_reprogram_counter() call.
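
For reference, the 4-bit per-counter field in IA32_FIXED_CTR_CTRL that
this translation decodes has the following architectural layout (sketch
only):

	/*
	 * IA32_FIXED_CTR_CTRL, bits [4*i+3 : 4*i] for fixed counter i:
	 *   bit 0 - enable at CPL 0   -> ARCH_PERFMON_EVENTSEL_OS
	 *   bit 1 - enable at CPL > 0 -> ARCH_PERFMON_EVENTSEL_USR
	 *   bit 2 - AnyThread (not translated here)
	 *   bit 3 - PMI on overflow   -> ARCH_PERFMON_EVENTSEL_INT
	 */
	u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl,
				   pmc->idx - INTEL_PMC_IDX_FIXED);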

Cc: Peter Zijlstra <[email protected]>
Suggested-by: Jim Mattson <[email protected]>
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/pmu.c | 87 +++++++++++++---------------------------------
1 file changed, 25 insertions(+), 62 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index c2f00f07fbd7..33bf08fc0282 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -275,85 +275,48 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
return allow_event;
}

-static void reprogram_gp_counter(struct kvm_pmc *pmc)
+void reprogram_counter(struct kvm_pmc *pmc)
{
- u64 config;
- u32 type = PERF_TYPE_RAW;
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
u64 eventsel = pmc->eventsel;
+ u64 new_config = eventsel;
+ u8 fixed_ctr_ctrl;
+
+ pmc_pause_counter(pmc);
+
+ if (!pmc_speculative_in_use(pmc) || !pmc_is_enabled(pmc))
+ return;
+
+ if (!check_pmu_event_filter(pmc))
+ return;

if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
printk_once("kvm pmu: pin control bit is ignored\n");

- pmc_pause_counter(pmc);
-
- if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
- return;
-
- if (!check_pmu_event_filter(pmc))
- return;
-
- if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
- ARCH_PERFMON_EVENTSEL_INV |
- ARCH_PERFMON_EVENTSEL_CMASK |
- HSW_IN_TX |
- HSW_IN_TX_CHECKPOINTED))) {
- config = static_call(kvm_x86_pmu_pmc_perf_hw_id)(pmc);
- if (config != PERF_COUNT_HW_MAX)
- type = PERF_TYPE_HARDWARE;
+ if (pmc_is_fixed(pmc)) {
+ fixed_ctr_ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl,
+ pmc->idx - INTEL_PMC_IDX_FIXED);
+ if (fixed_ctr_ctrl & 0x1)
+ eventsel |= ARCH_PERFMON_EVENTSEL_OS;
+ if (fixed_ctr_ctrl & 0x2)
+ eventsel |= ARCH_PERFMON_EVENTSEL_USR;
+ if (fixed_ctr_ctrl & 0x8)
+ eventsel |= ARCH_PERFMON_EVENTSEL_INT;
+ new_config = (u64)fixed_ctr_ctrl;
}

- if (type == PERF_TYPE_RAW)
- config = eventsel & pmu->raw_event_mask;
-
- if (pmc->current_config == eventsel && pmc_resume_counter(pmc))
+ if (pmc->current_config == new_config && pmc_resume_counter(pmc))
return;

pmc_release_perf_event(pmc);

- pmc->current_config = eventsel;
- pmc_reprogram_counter(pmc, type, config,
+ pmc->current_config = new_config;
+ pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
+ (eventsel & pmu->raw_event_mask),
!(eventsel & ARCH_PERFMON_EVENTSEL_USR),
!(eventsel & ARCH_PERFMON_EVENTSEL_OS),
eventsel & ARCH_PERFMON_EVENTSEL_INT);
}
-
-static void reprogram_fixed_counter(struct kvm_pmc *pmc)
-{
- struct kvm_pmu *pmu = pmc_to_pmu(pmc);
- int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
- u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, idx);
- unsigned en_field = ctrl & 0x3;
- bool pmi = ctrl & 0x8;
-
- pmc_pause_counter(pmc);
-
- if (!en_field || !pmc_is_enabled(pmc))
- return;
-
- if (!check_pmu_event_filter(pmc))
- return;
-
- if (pmc->current_config == (u64)ctrl && pmc_resume_counter(pmc))
- return;
-
- pmc_release_perf_event(pmc);
-
- pmc->current_config = (u64)ctrl;
- pmc_reprogram_counter(pmc, PERF_TYPE_HARDWARE,
- static_call(kvm_x86_pmu_pmc_perf_hw_id)(pmc),
- !(en_field & 0x2), /* exclude user */
- !(en_field & 0x1), /* exclude kernel */
- pmi);
-}
-
-void reprogram_counter(struct kvm_pmc *pmc)
-{
- if (pmc_is_gp(pmc))
- reprogram_gp_counter(pmc);
- else
- reprogram_fixed_counter(pmc);
-}
EXPORT_SYMBOL_GPL(reprogram_counter);

void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
--
2.36.1


2022-05-18 13:31:54

by Like Xu

[permalink] [raw]
Subject: [PATCH RESEND v3 05/11] KVM: x86/pmu: Drop "u64 eventsel" for reprogram_gp_counter()

From: Like Xu <[email protected]>

Because reprogram_gp_counter() is bound to assign the requested eventsel
to pmc->eventsel, this assignment step can be moved forward into the
callers, thus simplifying the passing of parameters down to
"struct kvm_pmc *pmc" only.

No functional change intended.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/pmu.c | 7 +++----
arch/x86/kvm/pmu.h | 2 +-
arch/x86/kvm/svm/pmu.c | 6 ++++--
arch/x86/kvm/vmx/pmu_intel.c | 3 ++-
4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index ba767b4921e3..cbffa060976e 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -275,17 +275,16 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
return allow_event;
}

-void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
+void reprogram_gp_counter(struct kvm_pmc *pmc)
{
u64 config;
u32 type = PERF_TYPE_RAW;
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+ u64 eventsel = pmc->eventsel;

if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
printk_once("kvm pmu: pin control bit is ignored\n");

- pmc->eventsel = eventsel;
-
pmc_pause_counter(pmc);

if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
@@ -350,7 +349,7 @@ EXPORT_SYMBOL_GPL(reprogram_fixed_counter);
void reprogram_counter(struct kvm_pmc *pmc)
{
if (pmc_is_gp(pmc))
- reprogram_gp_counter(pmc, pmc->eventsel);
+ reprogram_gp_counter(pmc);
else {
int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
u8 ctrl = fixed_ctrl_field(pmc_to_pmu(pmc)->fixed_ctr_ctrl, idx);
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 0fd2518227f7..56204f5a545d 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -172,7 +172,7 @@ static inline void kvm_init_pmu_capability(void)
KVM_PMC_MAX_FIXED);
}

-void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel);
+void reprogram_gp_counter(struct kvm_pmc *pmc);
void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx);
void reprogram_counter(struct kvm_pmc *pmc);

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 47e8eaca1e90..fa4539e470b3 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -285,8 +285,10 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL);
if (pmc) {
data &= ~pmu->reserved_bits;
- if (data != pmc->eventsel)
- reprogram_gp_counter(pmc, data);
+ if (data != pmc->eventsel) {
+ pmc->eventsel = data;
+ reprogram_gp_counter(pmc);
+ }
return 0;
}

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 33448482db50..2bfca470d5fd 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -484,7 +484,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
(pmu->raw_event_mask & HSW_IN_TX_CHECKPOINTED))
reserved_bits ^= HSW_IN_TX_CHECKPOINTED;
if (!(data & reserved_bits)) {
- reprogram_gp_counter(pmc, data);
+ pmc->eventsel = data;
+ reprogram_gp_counter(pmc);
return 0;
}
} else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, false))
--
2.36.1


2022-05-18 13:32:13

by Like Xu

[permalink] [raw]
Subject: [PATCH RESEND v3 07/11] KVM: x86/pmu: Use only the uniform interface reprogram_counter()

From: Like Xu <[email protected]>

Since reprogram_counter() and reprogram_{gp, fixed}_counter() currently
take the same parameter "struct kvm_pmc *pmc", the callers can be
simplified by using the uniformly exported interface, which allows
reprogram_{gp, fixed}_counter() to be made static and their
EXPORT_SYMBOL_GPL to be dropped.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/pmu.c | 6 ++----
arch/x86/kvm/svm/pmu.c | 2 +-
arch/x86/kvm/vmx/pmu_intel.c | 4 ++--
3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 131fbab612ca..c2f00f07fbd7 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -275,7 +275,7 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
return allow_event;
}

-void reprogram_gp_counter(struct kvm_pmc *pmc)
+static void reprogram_gp_counter(struct kvm_pmc *pmc)
{
u64 config;
u32 type = PERF_TYPE_RAW;
@@ -317,9 +317,8 @@ void reprogram_gp_counter(struct kvm_pmc *pmc)
!(eventsel & ARCH_PERFMON_EVENTSEL_OS),
eventsel & ARCH_PERFMON_EVENTSEL_INT);
}
-EXPORT_SYMBOL_GPL(reprogram_gp_counter);

-void reprogram_fixed_counter(struct kvm_pmc *pmc)
+static void reprogram_fixed_counter(struct kvm_pmc *pmc)
{
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
@@ -347,7 +346,6 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc)
!(en_field & 0x1), /* exclude kernel */
pmi);
}
-EXPORT_SYMBOL_GPL(reprogram_fixed_counter);

void reprogram_counter(struct kvm_pmc *pmc)
{
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index fa4539e470b3..b5ba846fee88 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -287,7 +287,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
data &= ~pmu->reserved_bits;
if (data != pmc->eventsel) {
pmc->eventsel = data;
- reprogram_gp_counter(pmc);
+ reprogram_counter(pmc);
}
return 0;
}
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 5e10a1ef435d..75aa2282ae93 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -52,7 +52,7 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
pmc = get_fixed_pmc(pmu, MSR_CORE_PERF_FIXED_CTR0 + i);

__set_bit(INTEL_PMC_IDX_FIXED + i, pmu->pmc_in_use);
- reprogram_fixed_counter(pmc);
+ reprogram_counter(pmc);
}
}

@@ -485,7 +485,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
reserved_bits ^= HSW_IN_TX_CHECKPOINTED;
if (!(data & reserved_bits)) {
pmc->eventsel = data;
- reprogram_gp_counter(pmc);
+ reprogram_counter(pmc);
return 0;
}
} else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, false))
--
2.36.1


2022-05-18 13:32:26

by Like Xu

[permalink] [raw]
Subject: [PATCH RESEND v3 11/11] KVM: x86/pmu: Drop amd_event_mapping[] in the KVM context

From: Like Xu <[email protected]>

All gp and fixed counters are now reprogrammed using PERF_TYPE_RAW,
which means that the table that maps perf_hw_id to event select values is
no longer useful, at least for AMD.

For Intel, the logic to check whether the pmu event reported by Intel CPUID
is unavailable is still required, so pmc_perf_hw_id() is renamed to
hw_event_is_unavail() and returns a bool to replace the semantics of
"PERF_COUNT_HW_MAX+1".

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/include/asm/kvm-x86-pmu-ops.h | 2 +-
arch/x86/kvm/pmu.c | 6 +--
arch/x86/kvm/pmu.h | 2 +-
arch/x86/kvm/svm/pmu.c | 56 ++------------------------
arch/x86/kvm/vmx/pmu_intel.c | 11 ++---
5 files changed, 12 insertions(+), 65 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h b/arch/x86/include/asm/kvm-x86-pmu-ops.h
index fdfd8e06fee6..227317bafb22 100644
--- a/arch/x86/include/asm/kvm-x86-pmu-ops.h
+++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h
@@ -12,7 +12,7 @@ BUILD_BUG_ON(1)
* a NULL definition, for example if "static_call_cond()" will be used
* at the call sites.
*/
-KVM_X86_PMU_OP(pmc_perf_hw_id)
+KVM_X86_PMU_OP(hw_event_is_unavail)
KVM_X86_PMU_OP(pmc_is_enabled)
KVM_X86_PMU_OP(pmc_idx_to_pmc)
KVM_X86_PMU_OP(rdpmc_ecx_to_pmc)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 7dc949f6a92c..c01d66d237bb 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -151,9 +151,6 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
};
bool pebs = test_bit(pmc->idx, (unsigned long *)&pmu->pebs_enable);

- if (type == PERF_TYPE_HARDWARE && config >= PERF_COUNT_HW_MAX)
- return;
-
attr.sample_period = get_sample_period(pmc, pmc->counter);

if ((attr.config & HSW_IN_TX_CHECKPOINTED) &&
@@ -248,6 +245,9 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
__u64 key;
int idx, srcu_idx;

+ if (static_call(kvm_x86_pmu_hw_event_is_unavail)(pmc))
+ return false;
+
srcu_idx = srcu_read_lock(&kvm->srcu);
filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
if (!filter)
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 8d7912978249..1ad19c1949ad 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -30,7 +30,7 @@ struct kvm_event_hw_type_mapping {
};

struct kvm_pmu_ops {
- unsigned int (*pmc_perf_hw_id)(struct kvm_pmc *pmc);
+ bool (*hw_event_is_unavail)(struct kvm_pmc *pmc);
bool (*pmc_is_enabled)(struct kvm_pmc *pmc);
struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index b5ba846fee88..0c9f2e4b7b6b 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -33,34 +33,6 @@ enum index {
INDEX_ERROR,
};

-/* duplicated from amd_perfmon_event_map, K7 and above should work. */
-static struct kvm_event_hw_type_mapping amd_event_mapping[] = {
- [0] = { 0x76, 0x00, PERF_COUNT_HW_CPU_CYCLES },
- [1] = { 0xc0, 0x00, PERF_COUNT_HW_INSTRUCTIONS },
- [2] = { 0x7d, 0x07, PERF_COUNT_HW_CACHE_REFERENCES },
- [3] = { 0x7e, 0x07, PERF_COUNT_HW_CACHE_MISSES },
- [4] = { 0xc2, 0x00, PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
- [5] = { 0xc3, 0x00, PERF_COUNT_HW_BRANCH_MISSES },
- [6] = { 0xd0, 0x00, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },
- [7] = { 0xd1, 0x00, PERF_COUNT_HW_STALLED_CYCLES_BACKEND },
-};
-
-/* duplicated from amd_f17h_perfmon_event_map. */
-static struct kvm_event_hw_type_mapping amd_f17h_event_mapping[] = {
- [0] = { 0x76, 0x00, PERF_COUNT_HW_CPU_CYCLES },
- [1] = { 0xc0, 0x00, PERF_COUNT_HW_INSTRUCTIONS },
- [2] = { 0x60, 0xff, PERF_COUNT_HW_CACHE_REFERENCES },
- [3] = { 0x64, 0x09, PERF_COUNT_HW_CACHE_MISSES },
- [4] = { 0xc2, 0x00, PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
- [5] = { 0xc3, 0x00, PERF_COUNT_HW_BRANCH_MISSES },
- [6] = { 0x87, 0x02, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },
- [7] = { 0x87, 0x01, PERF_COUNT_HW_STALLED_CYCLES_BACKEND },
-};
-
-/* amd_pmc_perf_hw_id depends on these being the same size */
-static_assert(ARRAY_SIZE(amd_event_mapping) ==
- ARRAY_SIZE(amd_f17h_event_mapping));
-
static unsigned int get_msr_base(struct kvm_pmu *pmu, enum pmu_type type)
{
struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu);
@@ -154,31 +126,9 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
return &pmu->gp_counters[msr_to_index(msr)];
}

-static unsigned int amd_pmc_perf_hw_id(struct kvm_pmc *pmc)
+static bool amd_hw_event_is_unavail(struct kvm_pmc *pmc)
{
- struct kvm_event_hw_type_mapping *event_mapping;
- u8 event_select = pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
- u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
- int i;
-
- /* return PERF_COUNT_HW_MAX as AMD doesn't have fixed events */
- if (WARN_ON(pmc_is_fixed(pmc)))
- return PERF_COUNT_HW_MAX;
-
- if (guest_cpuid_family(pmc->vcpu) >= 0x17)
- event_mapping = amd_f17h_event_mapping;
- else
- event_mapping = amd_event_mapping;
-
- for (i = 0; i < ARRAY_SIZE(amd_event_mapping); i++)
- if (event_mapping[i].eventsel == event_select
- && event_mapping[i].unit_mask == unit_mask)
- break;
-
- if (i == ARRAY_SIZE(amd_event_mapping))
- return PERF_COUNT_HW_MAX;
-
- return event_mapping[i].event_type;
+ return false;
}

/* check if a PMC is enabled by comparing it against global_ctrl bits. Because
@@ -344,7 +294,7 @@ static void amd_pmu_reset(struct kvm_vcpu *vcpu)
}

struct kvm_pmu_ops amd_pmu_ops __initdata = {
- .pmc_perf_hw_id = amd_pmc_perf_hw_id,
+ .hw_event_is_unavail = amd_hw_event_is_unavail,
.pmc_is_enabled = amd_pmc_is_enabled,
.pmc_idx_to_pmc = amd_pmc_idx_to_pmc,
.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 75aa2282ae93..6d24db41d8e0 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -84,7 +84,7 @@ static void global_ctrl_changed(struct kvm_pmu *pmu, u64 data)
}
}

-static unsigned int intel_pmc_perf_hw_id(struct kvm_pmc *pmc)
+static bool intel_hw_event_is_unavail(struct kvm_pmc *pmc)
{
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
u8 event_select = pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
@@ -98,15 +98,12 @@ static unsigned int intel_pmc_perf_hw_id(struct kvm_pmc *pmc)

/* disable event that reported as not present by cpuid */
if ((i < 7) && !(pmu->available_event_types & (1 << i)))
- return PERF_COUNT_HW_MAX + 1;
+ return true;

break;
}

- if (i == ARRAY_SIZE(intel_arch_events))
- return PERF_COUNT_HW_MAX;
-
- return intel_arch_events[i].event_type;
+ return false;
}

/* check if a PMC is enabled by comparing it with globl_ctrl bits. */
@@ -805,7 +802,7 @@ void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu)
}

struct kvm_pmu_ops intel_pmu_ops __initdata = {
- .pmc_perf_hw_id = intel_pmc_perf_hw_id,
+ .hw_event_is_unavail = intel_hw_event_is_unavail,
.pmc_is_enabled = intel_pmc_is_enabled,
.pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
.rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
--
2.36.1


2022-05-18 13:32:28

by Like Xu

[permalink] [raw]
Subject: [PATCH RESEND v3 10/11] KVM: x86/pmu: Replace pmc_perf_hw_id() with perf_get_hw_event_config()

From: Like Xu <[email protected]>

With the help of perf_get_hw_event_config(), KVM can query the correct
EVENTSEL_{EVENT, UMASK} pair of a kernel-generic hw event directly from
the vendor-specific *_perfmon_event_map[] via the kernel's pre-defined
perf_hw_id.

Also extend the bit range of the comparison field to
AMD64_RAW_EVENT_MASK_NB to guard against AMD one day defining
EventSelect[11:8] in perfmon_event_map[].
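
A worked example of the new comparison (the values are illustrative):

	/*
	 * perf_get_hw_event_config(PERF_COUNT_HW_INSTRUCTIONS) == 0x00c0
	 * pmc->eventsel == 0x5300c0 (USR|OS|INT|EN, event 0xc0, umask 0x00)
	 *
	 * (0x5300c0 ^ 0x00c0) & AMD64_RAW_EVENT_MASK_NB == 0, i.e. a match:
	 * only EventSelect and UnitMask take part in the comparison, the
	 * enable/privilege/interrupt bits do not.
	 */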

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/pmu.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 33bf08fc0282..7dc949f6a92c 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -517,13 +517,8 @@ static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
static inline bool eventsel_match_perf_hw_id(struct kvm_pmc *pmc,
unsigned int perf_hw_id)
{
- u64 old_eventsel = pmc->eventsel;
- unsigned int config;
-
- pmc->eventsel &= (ARCH_PERFMON_EVENTSEL_EVENT | ARCH_PERFMON_EVENTSEL_UMASK);
- config = static_call(kvm_x86_pmu_pmc_perf_hw_id)(pmc);
- pmc->eventsel = old_eventsel;
- return config == perf_hw_id;
+ return !((pmc->eventsel ^ perf_get_hw_event_config(perf_hw_id)) &
+ AMD64_RAW_EVENT_MASK_NB);
}

static inline bool cpl_is_matched(struct kvm_pmc *pmc)
--
2.36.1


2022-05-18 13:32:42

by Like Xu

[permalink] [raw]
Subject: [PATCH RESEND v3 09/11] perf: x86/core: Add interface to query perfmon_event_map[] directly

From: Like Xu <[email protected]>

Currently, we have [intel|knc|p4|p6]_perfmon_event_map on the Intel
platforms and amd_[f17h]_perfmon_event_map on the AMD platforms.

Early clumsy KVM code or other potential perf_event users may have
hard-coded these perfmon_maps (e.g., arch/x86/kvm/svm/pmu.c), so
it would not make sense to program a common hardware event based
on the generic "enum perf_hw_id" once the two tables do not match.

Let's provide an interface for callers outside the perf subsystem to get
the counter config based on the perfmon_event_map currently in use,
which also helps to save bytes.
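
A usage sketch from a KVM-side caller (illustrative):

	/* Returns the EventSelect/UnitMask encoding from whichever
	 * perfmon_event_map[] is in use, or 0 if hw_event is out of range. */
	u64 config = perf_get_hw_event_config(PERF_COUNT_HW_INSTRUCTIONS);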

Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/events/core.c | 11 +++++++++++
arch/x86/include/asm/perf_event.h | 6 ++++++
2 files changed, 17 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 7f1d10dbabc0..99cf67d63cf3 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2997,3 +2997,14 @@ void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
cap->pebs_ept = x86_pmu.pebs_ept;
}
EXPORT_SYMBOL_GPL(perf_get_x86_pmu_capability);
+
+u64 perf_get_hw_event_config(int hw_event)
+{
+ int max = x86_pmu.max_events;
+
+ if (hw_event < max)
+ return x86_pmu.event_map(array_index_nospec(hw_event, max));
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(perf_get_hw_event_config);
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index dc295b8c8def..396f0ce7a0f4 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -478,6 +478,7 @@ struct x86_pmu_lbr {
};

extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
+extern u64 perf_get_hw_event_config(int hw_event);
extern void perf_check_microcode(void);
extern void perf_clear_dirty_counters(void);
extern int x86_perf_rdpmc_index(struct perf_event *event);
@@ -487,6 +488,11 @@ static inline void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
memset(cap, 0, sizeof(*cap));
}

+static inline u64 perf_get_hw_event_config(int hw_event)
+{
+ return 0;
+}
+
static inline void perf_events_lapic_init(void) { }
static inline void perf_check_microcode(void) { }
#endif
--
2.36.1


2022-05-20 20:07:42

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH RESEND v3 11/11] KVM: x86/pmu: Drop amd_event_mapping[] in the KVM context

On 5/20/22 16:27, Like Xu wrote:
>
>>
>> Apart from patch 3, the series looks good. I'll probably delay it
>> to 5.20 so that you can confirm the SRCU issue, but it's queued.
>
> I have checked it's protected under srcu_read_lock/unlock() for
> existing usages, so did JimM.
>
> TBH, patch 3 is only inspired by the fact why the protection against
> kvm->arch.msr_filter does not appear for kvm->arch.pmu_event_filter,
> and my limited searching scope has not yet confirmed whether it
> prevents the same spider.

Ok, I'll drop it.

Paolo

2022-05-22 18:13:00

by Like Xu

[permalink] [raw]
Subject: Re: [PATCH RESEND v3 11/11] KVM: x86/pmu: Drop amd_event_mapping[] in the KVM context

On 20/5/2022 8:59 pm, Paolo Bonzini wrote:
> On 5/18/22 15:25, Like Xu wrote:
>> +    if (static_call(kvm_x86_pmu_hw_event_is_unavail)(pmc))
>> +        return false;
>> +
>
> I think it's clearer to make this positive and also not abbreviate the name;
> that is, hw_event_available.

Indeed.

>
> Apart from patch 3, the series looks good.  I'll probably delay it to 5.20 so
> that you can confirm the SRCU issue, but it's queued.

I have checked it's protected under srcu_read_lock/unlock() for existing usages,
so did JimM.

TBH, patch 3 is only inspired by the fact why the protection against
kvm->arch.msr_filter does not appear for kvm->arch.pmu_event_filter, and
my limited searching scope has not yet confirmed whether it prevents the
same spider.

No comments on the target kernel cycle, Capt.

>
> Thanks,
>
> Paolo

2022-05-23 01:11:27

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH RESEND v3 03/11] KVM: x86/pmu: Protect kvm->arch.pmu_event_filter with SRCU

On 5/18/22 15:25, Like Xu wrote:
> From: Like Xu <[email protected]>
>
> Similar to "kvm->arch.msr_filter", KVM should guarantee that vCPUs will
> see either the previous filter or the new filter when user space calls
> KVM_SET_PMU_EVENT_FILTER ioctl with the vCPU running so that guest
> pmu events with identical settings in both the old and new filter have
> deterministic behavior.
>
> Fixes: 66bb8a065f5a ("KVM: x86: PMU Event Filter")
> Signed-off-by: Like Xu <[email protected]>
> Reviewed-by: Wanpeng Li <[email protected]>

Please always include the call trace where SRCU is not taken. The ones
I reconstructed always end up at a place inside srcu_read_lock/unlock:

reprogram_gp_counter/reprogram_fixed_counter
amd_pmu_set_msr
kvm_set_msr_common
svm_set_msr
__kvm_set_msr
kvm_set_msr_ignored_check
kvm_set_msr_with_filter
kvm_emulate_wrmsr**
emulator_set_msr_with_filter**
kvm_set_msr
emulator_set_msr**
do_set_msr
__msr_io
msr_io
ioctl(KVM_SET_MSRS)**
intel_pmu_set_msr
kvm_set_msr_common
vmx_set_msr (see svm_set_msr)
reprogram_counter
global_ctrl_changed
intel_pmu_set_msr (see above)
kvm_pmu_handle_event
vcpu_enter_guest**
kvm_pmu_incr_counter
kvm_pmu_trigger_event
nested_vmx_run**
kvm_skip_emulated_instruction**
x86_emulate_instruction**
reprogram_fixed_counters
intel_pmu_set_msr (see above)

Paolo

> arch/x86/kvm/pmu.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index f189512207db..24624654e476 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -246,8 +246,9 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
> struct kvm *kvm = pmc->vcpu->kvm;
> bool allow_event = true;
> __u64 key;
> - int idx;
> + int idx, srcu_idx;
>
> + srcu_idx = srcu_read_lock(&kvm->srcu);
> filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
> if (!filter)
> goto out;
> @@ -270,6 +271,7 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
> }
>
> out:
> + srcu_read_unlock(&kvm->srcu, srcu_idx);
> return allow_event;
> }
>


2022-05-23 05:53:07

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH RESEND v3 11/11] KVM: x86/pmu: Drop amd_event_mapping[] in the KVM context

On 5/18/22 15:25, Like Xu wrote:
> + if (static_call(kvm_x86_pmu_hw_event_is_unavail)(pmc))
> + return false;
> +

I think it's clearer to make this positive and also not abbreviate the
name; that is, hw_event_available.

Apart from patch 3, the series looks good. I'll probably delay it to
5.20 so that you can confirm the SRCU issue, but it's queued.

Thanks,

Paolo

2022-05-23 07:22:44

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH RESEND v3 03/11] KVM: x86/pmu: Protect kvm->arch.pmu_event_filter with SRCU

On Fri, May 20, 2022 at 5:51 AM Paolo Bonzini <[email protected]> wrote:
>
> On 5/18/22 15:25, Like Xu wrote:
> > From: Like Xu <[email protected]>
> >
> > Similar to "kvm->arch.msr_filter", KVM should guarantee that vCPUs will
> > see either the previous filter or the new filter when user space calls
> > KVM_SET_PMU_EVENT_FILTER ioctl with the vCPU running so that guest
> > pmu events with identical settings in both the old and new filter have
> > deterministic behavior.
> >
> > Fixes: 66bb8a065f5a ("KVM: x86: PMU Event Filter")
> > Signed-off-by: Like Xu <[email protected]>
> > Reviewed-by: Wanpeng Li <[email protected]>
>
> Please always include the call trace where SRCU is not taken. The ones
> I reconstructed always end up at a place inside srcu_read_lock/unlock:
>
> reprogram_gp_counter/reprogram_fixed_counter
> amd_pmu_set_msr
> kvm_set_msr_common
> svm_set_msr
> __kvm_set_msr
> kvm_set_msr_ignored_check
> kvm_set_msr_with_filter
> kvm_emulate_wrmsr**
> emulator_set_msr_with_filter**
> kvm_set_msr
> emulator_set_msr**
> do_set_msr
> __msr_io
> msr_io
> ioctl(KVM_SET_MSRS)**
> intel_pmu_set_msr
> kvm_set_msr_common
> vmx_set_msr (see svm_set_msr)
> reprogram_counter
> global_ctrl_changed
> intel_pmu_set_msr (see above)
> kvm_pmu_handle_event
> vcpu_enter_guest**
> kvm_pmu_incr_counter
> kvm_pmu_trigger_event
> nested_vmx_run**
> kvm_skip_emulated_instruction**
> x86_emulate_instruction**
> reprogram_fixed_counters
> intel_pmu_set_msr (see above)
>
> Paolo

I agree with Paolo that existing usage is covered by
srcu_read_lock/unlock, but (a) it's not easy to confirm this, and (b)
this is very fragile.

Whichever way we decide to go, the userspace MSR filter and the PMU
event filter should adopt the same approach.