2024-06-07 20:32:10

by Rob Herring (Arm)

[permalink] [raw]
Subject: [PATCH 0/9] arm64: Add support for Armv9.4 PMU fixed instruction counter

This series adds support for the optional fixed instruction counter
added in Armv9.4 PMU. Most of the series is a refactoring to remove the
index to counter number conversion which dates back to the Armv7 PMU
driver. Removing it is necessary in order to support more than 32
counters without a bunch of conditional code further complicating the
conversion.

Patches 1-2 move the 32-bit Arm PMU drivers into drivers/perf/ and drop
non-DT probe support. These can be taken first if there's no comments on
them.

Patch 3 changes struct arm_pmu.num_events to a bitmap of events, and
updates all the users. This removes the index to counter conversion
on the PMUv3 and Armv7 drivers.

Patch 4 updates various register accessors to use 64-bit values matching
the register size.

Patches 5-6 update KVM PMU register accesses to use shared accessors
from asm/arm_pmuv3.h.

Patches 7-8 rework KVM and perf PMU defines for counter indexes and
number of counters.

Patch 9 finally adds support for the fixed instruction counter.

I tested this on FVP with VHE host and a guest. I tested the Armv7 PMU
changes with QEMU.

Signed-off-by: Rob Herring (Arm) <[email protected]>
---
Rob Herring (Arm) (9):
perf/arm: Move 32-bit PMU drivers to drivers/perf/
perf: arm_v6/7_pmu: Drop non-DT probe support
perf: arm_pmu: Remove event index to counter remapping
perf: arm_pmuv3: Prepare for more than 32 counters
KVM: arm64: pmu: Use arm_pmuv3.h register accessors
KVM: arm64: pmu: Use generated define for PMSELR_EL0.SEL access
arm64: perf/kvm: Use a common PMU cycle counter define
KVM: arm64: Refine PMU defines for number of counters
perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counter

arch/arm/include/asm/arm_pmuv3.h | 23 ++++
arch/arm/kernel/Makefile | 2 -
arch/arm64/include/asm/arm_pmuv3.h | 56 ++++++++-
arch/arm64/include/asm/kvm_host.h | 8 +-
arch/arm64/include/asm/sysreg.h | 1 -
arch/arm64/kvm/pmu-emul.c | 14 +--
arch/arm64/kvm/pmu.c | 88 ++++---------
arch/arm64/kvm/sys_regs.c | 11 +-
arch/arm64/tools/sysreg | 30 +++++
drivers/perf/Kconfig | 12 ++
drivers/perf/Makefile | 3 +
drivers/perf/arm_pmu.c | 11 +-
drivers/perf/arm_pmuv3.c | 138 ++++++++++-----------
.../perf_event_v6.c => drivers/perf/arm_v6_pmu.c | 26 +---
.../perf_event_v7.c => drivers/perf/arm_v7_pmu.c | 90 +++++---------
.../perf/arm_xscale_pmu.c | 15 +--
include/kvm/arm_pmu.h | 8 +-
include/linux/perf/arm_pmu.h | 10 +-
include/linux/perf/arm_pmuv3.h | 7 +-
19 files changed, 289 insertions(+), 264 deletions(-)
---
base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
change-id: 20240607-arm-pmu-3-9-icntr-04375ddd0082

Best regards,
--
Rob Herring (Arm) <[email protected]>



2024-06-07 20:32:19

by Rob Herring (Arm)

[permalink] [raw]
Subject: [PATCH 1/9] perf/arm: Move 32-bit PMU drivers to drivers/perf/

It is preferred to put drivers under drivers/ rather than under arch/.
The PMU drivers also depend on arm_pmu.c, so it's better to place them
all together.

Signed-off-by: Rob Herring (Arm) <[email protected]>
---
arch/arm/kernel/Makefile | 2 --
drivers/perf/Kconfig | 12 ++++++++++++
drivers/perf/Makefile | 3 +++
arch/arm/kernel/perf_event_v6.c => drivers/perf/arm_v6_pmu.c | 3 ---
arch/arm/kernel/perf_event_v7.c => drivers/perf/arm_v7_pmu.c | 3 ---
.../perf_event_xscale.c => drivers/perf/arm_xscale_pmu.c | 3 ---
6 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index 89a77e3f51d2..aaae31b8c4a5 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -78,8 +78,6 @@ obj-$(CONFIG_CPU_XSC3) += xscale-cp0.o
obj-$(CONFIG_CPU_MOHAWK) += xscale-cp0.o
obj-$(CONFIG_IWMMXT) += iwmmxt.o
obj-$(CONFIG_PERF_EVENTS) += perf_regs.o perf_callchain.o
-obj-$(CONFIG_HW_PERF_EVENTS) += perf_event_xscale.o perf_event_v6.o \
- perf_event_v7.o
AFLAGS_iwmmxt.o := -Wa,-mcpu=iwmmxt
obj-$(CONFIG_ARM_CPU_TOPOLOGY) += topology.o
obj-$(CONFIG_VDSO) += vdso.o
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 7526a9e714fa..aa9530b4064f 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -56,6 +56,18 @@ config ARM_PMU
Say y if you want to use CPU performance monitors on ARM-based
systems.

+config ARM_V6_PMU
+ depends on ARM_PMU && (CPU_V6 || CPU_V6K)
+ def_bool y
+
+config ARM_V7_PMU
+ depends on ARM_PMU && CPU_V7
+ def_bool y
+
+config ARM_XSCALE_PMU
+ depends on ARM_PMU && CPU_XSCALE
+ def_bool y
+
config RISCV_PMU
depends on RISCV
bool "RISC-V PMU framework"
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index 29b1c28203ef..d43df81d52f7 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -6,6 +6,9 @@ obj-$(CONFIG_ARM_DSU_PMU) += arm_dsu_pmu.o
obj-$(CONFIG_ARM_PMU) += arm_pmu.o arm_pmu_platform.o
obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
obj-$(CONFIG_ARM_PMUV3) += arm_pmuv3.o
+obj-$(CONFIG_ARM_V6_PMU) += arm_v6_pmu.o
+obj-$(CONFIG_ARM_V7_PMU) += arm_v7_pmu.o
+obj-$(CONFIG_ARM_XSCALE_PMU) += arm_xscale_pmu.o
obj-$(CONFIG_ARM_SMMU_V3_PMU) += arm_smmuv3_pmu.o
obj-$(CONFIG_FSL_IMX8_DDR_PMU) += fsl_imx8_ddr_perf.o
obj-$(CONFIG_FSL_IMX9_DDR_PMU) += fsl_imx9_ddr_perf.o
diff --git a/arch/arm/kernel/perf_event_v6.c b/drivers/perf/arm_v6_pmu.c
similarity index 99%
rename from arch/arm/kernel/perf_event_v6.c
rename to drivers/perf/arm_v6_pmu.c
index d9fd53841591..f7593843bb85 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/drivers/perf/arm_v6_pmu.c
@@ -31,8 +31,6 @@
* enable the interrupt.
*/

-#if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K)
-
#include <asm/cputype.h>
#include <asm/irq_regs.h>

@@ -445,4 +443,3 @@ static struct platform_driver armv6_pmu_driver = {
};

builtin_platform_driver(armv6_pmu_driver);
-#endif /* CONFIG_CPU_V6 || CONFIG_CPU_V6K */
diff --git a/arch/arm/kernel/perf_event_v7.c b/drivers/perf/arm_v7_pmu.c
similarity index 99%
rename from arch/arm/kernel/perf_event_v7.c
rename to drivers/perf/arm_v7_pmu.c
index a3322e2b3ea4..fdd936fbd188 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/drivers/perf/arm_v7_pmu.c
@@ -17,8 +17,6 @@
* counter and all 4 performance counters together can be reset separately.
*/

-#ifdef CONFIG_CPU_V7
-
#include <asm/cp15.h>
#include <asm/cputype.h>
#include <asm/irq_regs.h>
@@ -2002,4 +2000,3 @@ static struct platform_driver armv7_pmu_driver = {
};

builtin_platform_driver(armv7_pmu_driver);
-#endif /* CONFIG_CPU_V7 */
diff --git a/arch/arm/kernel/perf_event_xscale.c b/drivers/perf/arm_xscale_pmu.c
similarity index 99%
rename from arch/arm/kernel/perf_event_xscale.c
rename to drivers/perf/arm_xscale_pmu.c
index 7a2ba1c689a7..3d8b72d6b37f 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/drivers/perf/arm_xscale_pmu.c
@@ -13,8 +13,6 @@
* PMU structures.
*/

-#ifdef CONFIG_CPU_XSCALE
-
#include <asm/cputype.h>
#include <asm/irq_regs.h>

@@ -745,4 +743,3 @@ static struct platform_driver xscale_pmu_driver = {
};

builtin_platform_driver(xscale_pmu_driver);
-#endif /* CONFIG_CPU_XSCALE */

--
2.43.0


2024-06-07 20:33:25

by Rob Herring (Arm)

[permalink] [raw]
Subject: [PATCH 6/9] KVM: arm64: pmu: Use generated define for PMSELR_EL0.SEL access

ARMV8_PMU_COUNTER_MASK is really a mask for the PMSELR_EL0.SEL register
field. Make that clear by adding a standard sysreg definition for the
register, and using it instead.

Signed-off-by: Rob Herring (Arm) <[email protected]>
---
arch/arm64/include/asm/sysreg.h | 1 -
arch/arm64/kvm/sys_regs.c | 10 +++++-----
arch/arm64/tools/sysreg | 5 +++++
include/linux/perf/arm_pmuv3.h | 1 -
4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index af3b206fa423..b0d6c33f9ecc 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -403,7 +403,6 @@
#define SYS_PMCNTENCLR_EL0 sys_reg(3, 3, 9, 12, 2)
#define SYS_PMOVSCLR_EL0 sys_reg(3, 3, 9, 12, 3)
#define SYS_PMSWINC_EL0 sys_reg(3, 3, 9, 12, 4)
-#define SYS_PMSELR_EL0 sys_reg(3, 3, 9, 12, 5)
#define SYS_PMCEID0_EL0 sys_reg(3, 3, 9, 12, 6)
#define SYS_PMCEID1_EL0 sys_reg(3, 3, 9, 12, 7)
#define SYS_PMCCNTR_EL0 sys_reg(3, 3, 9, 13, 0)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 22b45a15d068..f8b5db48ea8a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -880,7 +880,7 @@ static u64 reset_pmevtyper(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
static u64 reset_pmselr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
{
reset_unknown(vcpu, r);
- __vcpu_sys_reg(vcpu, r->reg) &= ARMV8_PMU_COUNTER_MASK;
+ __vcpu_sys_reg(vcpu, r->reg) &= PMSELR_EL0_SEL_MASK;

return __vcpu_sys_reg(vcpu, r->reg);
}
@@ -972,7 +972,7 @@ static bool access_pmselr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
else
/* return PMSELR.SEL field */
p->regval = __vcpu_sys_reg(vcpu, PMSELR_EL0)
- & ARMV8_PMU_COUNTER_MASK;
+ & PMSELR_EL0_SEL_MASK;

return true;
}
@@ -1040,8 +1040,8 @@ static bool access_pmu_evcntr(struct kvm_vcpu *vcpu,
if (pmu_access_event_counter_el0_disabled(vcpu))
return false;

- idx = __vcpu_sys_reg(vcpu, PMSELR_EL0)
- & ARMV8_PMU_COUNTER_MASK;
+ idx = SYS_FIELD_GET(PMSELR_EL0, SEL,
+ __vcpu_sys_reg(vcpu, PMSELR_EL0));
} else if (r->Op2 == 0) {
/* PMCCNTR_EL0 */
if (pmu_access_cycle_counter_el0_disabled(vcpu))
@@ -1091,7 +1091,7 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p,

if (r->CRn == 9 && r->CRm == 13 && r->Op2 == 1) {
/* PMXEVTYPER_EL0 */
- idx = __vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_PMU_COUNTER_MASK;
+ idx = SYS_FIELD_GET(PMSELR_EL0, SEL, __vcpu_sys_reg(vcpu, PMSELR_EL0));
reg = PMEVTYPER0_EL0 + idx;
} else if (r->CRn == 14 && (r->CRm & 12) == 12) {
idx = ((r->CRm & 3) << 3) | (r->Op2 & 7);
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index a4c1dd4741a4..231817a379b5 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -2153,6 +2153,11 @@ Field 4 P
Field 3:0 ALIGN
EndSysreg

+Sysreg PMSELR_EL0 3 3 9 12 5
+Res0 63:5
+Field 4:0 SEL
+EndSysreg
+
SysregFields CONTEXTIDR_ELx
Res0 63:32
Field 31:0 PROCID
diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
index 46377e134d67..caa09241ad4f 100644
--- a/include/linux/perf/arm_pmuv3.h
+++ b/include/linux/perf/arm_pmuv3.h
@@ -7,7 +7,6 @@
#define __PERF_ARM_PMUV3_H

#define ARMV8_PMU_MAX_COUNTERS 32
-#define ARMV8_PMU_COUNTER_MASK (ARMV8_PMU_MAX_COUNTERS - 1)

/*
* Common architectural and microarchitectural event numbers.

--
2.43.0


2024-06-07 20:33:34

by Rob Herring (Arm)

[permalink] [raw]
Subject: [PATCH 2/9] perf: arm_v6/7_pmu: Drop non-DT probe support

There are no non-DT based PMU users for v6 or v7, so drop the custom
non-DT probe table.

Note that this drops support for arm1156 PMU, but there are no arm1156
based systems supported in the kernel.

Signed-off-by: Rob Herring (Arm) <[email protected]>
---
drivers/perf/arm_v6_pmu.c | 17 +----------------
drivers/perf/arm_v7_pmu.c | 10 +---------
2 files changed, 2 insertions(+), 25 deletions(-)

diff --git a/drivers/perf/arm_v6_pmu.c b/drivers/perf/arm_v6_pmu.c
index f7593843bb85..0bb685b4bac5 100644
--- a/drivers/perf/arm_v6_pmu.c
+++ b/drivers/perf/arm_v6_pmu.c
@@ -401,13 +401,6 @@ static int armv6_1136_pmu_init(struct arm_pmu *cpu_pmu)
return 0;
}

-static int armv6_1156_pmu_init(struct arm_pmu *cpu_pmu)
-{
- armv6pmu_init(cpu_pmu);
- cpu_pmu->name = "armv6_1156";
- return 0;
-}
-
static int armv6_1176_pmu_init(struct arm_pmu *cpu_pmu)
{
armv6pmu_init(cpu_pmu);
@@ -421,17 +414,9 @@ static const struct of_device_id armv6_pmu_of_device_ids[] = {
{ /* sentinel value */ }
};

-static const struct pmu_probe_info armv6_pmu_probe_table[] = {
- ARM_PMU_PROBE(ARM_CPU_PART_ARM1136, armv6_1136_pmu_init),
- ARM_PMU_PROBE(ARM_CPU_PART_ARM1156, armv6_1156_pmu_init),
- ARM_PMU_PROBE(ARM_CPU_PART_ARM1176, armv6_1176_pmu_init),
- { /* sentinel value */ }
-};
-
static int armv6_pmu_device_probe(struct platform_device *pdev)
{
- return arm_pmu_device_probe(pdev, armv6_pmu_of_device_ids,
- armv6_pmu_probe_table);
+ return arm_pmu_device_probe(pdev, armv6_pmu_of_device_ids, NULL);
}

static struct platform_driver armv6_pmu_driver = {
diff --git a/drivers/perf/arm_v7_pmu.c b/drivers/perf/arm_v7_pmu.c
index fdd936fbd188..928ac3d626ed 100644
--- a/drivers/perf/arm_v7_pmu.c
+++ b/drivers/perf/arm_v7_pmu.c
@@ -1977,17 +1977,9 @@ static const struct of_device_id armv7_pmu_of_device_ids[] = {
{},
};

-static const struct pmu_probe_info armv7_pmu_probe_table[] = {
- ARM_PMU_PROBE(ARM_CPU_PART_CORTEX_A8, armv7_a8_pmu_init),
- ARM_PMU_PROBE(ARM_CPU_PART_CORTEX_A9, armv7_a9_pmu_init),
- { /* sentinel value */ }
-};
-
-
static int armv7_pmu_device_probe(struct platform_device *pdev)
{
- return arm_pmu_device_probe(pdev, armv7_pmu_of_device_ids,
- armv7_pmu_probe_table);
+ return arm_pmu_device_probe(pdev, armv7_pmu_of_device_ids, NULL);
}

static struct platform_driver armv7_pmu_driver = {

--
2.43.0


2024-06-07 20:33:46

by Rob Herring (Arm)

[permalink] [raw]
Subject: [PATCH 7/9] arm64: perf/kvm: Use a common PMU cycle counter define

The PMUv3 and KVM code each have a define for the PMU cycle counter
index. Move KVM's define to a shared location and use it for PMUv3
driver.

Signed-off-by: Rob Herring (Arm) <[email protected]>
---
arch/arm/include/asm/arm_pmuv3.h | 2 ++
arch/arm64/include/asm/arm_pmuv3.h | 2 ++
arch/arm64/kvm/sys_regs.c | 1 +
drivers/perf/arm_pmuv3.c | 23 +++++++++--------------
include/kvm/arm_pmu.h | 1 -
5 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
index a41b503b7dcd..ac2cf37b57e3 100644
--- a/arch/arm/include/asm/arm_pmuv3.h
+++ b/arch/arm/include/asm/arm_pmuv3.h
@@ -9,6 +9,8 @@
#include <asm/cp15.h>
#include <asm/cputype.h>

+#define ARMV8_PMU_CYCLE_IDX 31
+
#define PMCCNTR __ACCESS_CP15_64(0, c9)

#define PMCR __ACCESS_CP15(c9, 0, c12, 0)
diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h
index 1ed91334fede..46930729fb3f 100644
--- a/arch/arm64/include/asm/arm_pmuv3.h
+++ b/arch/arm64/include/asm/arm_pmuv3.h
@@ -11,6 +11,8 @@
#include <asm/cpufeature.h>
#include <asm/sysreg.h>

+#define ARMV8_PMU_CYCLE_IDX 31
+
#define RETURN_READ_PMEVCNTRN(n) \
return read_sysreg(pmevcntr##n##_el0)
static inline unsigned long read_pmevcntrn(int n)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f8b5db48ea8a..22393ae7ce14 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -18,6 +18,7 @@
#include <linux/printk.h>
#include <linux/uaccess.h>

+#include <asm/arm_pmuv3.h>
#include <asm/cacheflush.h>
#include <asm/cputype.h>
#include <asm/debug-monitors.h>
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 3b49144f3a58..468a0a3bbd5a 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -448,11 +448,6 @@ static const struct attribute_group armv8_pmuv3_caps_attr_group = {
.attrs = armv8_pmuv3_caps_attrs,
};

-/*
- * Perf Events' indices
- */
-#define ARMV8_IDX_CYCLE_COUNTER 31
-
/*
* We unconditionally enable ARMv8.5-PMU long event counter support
* (64-bit events) where supported. Indicate if this arm_pmu has long
@@ -484,7 +479,7 @@ static bool armv8pmu_event_is_chained(struct perf_event *event)
return !armv8pmu_event_has_user_read(event) &&
armv8pmu_event_is_64bit(event) &&
!armv8pmu_has_long_event(cpu_pmu) &&
- (idx != ARMV8_IDX_CYCLE_COUNTER);
+ (idx != ARMV8_PMU_CYCLE_IDX);
}

/*
@@ -543,7 +538,7 @@ static bool armv8pmu_event_needs_bias(struct perf_event *event)
return false;

if (armv8pmu_has_long_event(cpu_pmu) ||
- idx == ARMV8_IDX_CYCLE_COUNTER)
+ idx == ARMV8_PMU_CYCLE_IDX)
return true;

return false;
@@ -571,7 +566,7 @@ static u64 armv8pmu_read_counter(struct perf_event *event)
int idx = hwc->idx;
u64 value;

- if (idx == ARMV8_IDX_CYCLE_COUNTER)
+ if (idx == ARMV8_PMU_CYCLE_IDX)
value = read_pmccntr();
else
value = armv8pmu_read_hw_counter(event);
@@ -604,7 +599,7 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)

value = armv8pmu_bias_long_counter(event, value);

- if (idx == ARMV8_IDX_CYCLE_COUNTER)
+ if (idx == ARMV8_PMU_CYCLE_IDX)
write_pmccntr(value);
else
armv8pmu_write_hw_counter(event, value);
@@ -641,7 +636,7 @@ static void armv8pmu_write_event_type(struct perf_event *event)
armv8pmu_write_evtype(idx - 1, hwc->config_base);
armv8pmu_write_evtype(idx, chain_evt);
} else {
- if (idx == ARMV8_IDX_CYCLE_COUNTER)
+ if (idx == ARMV8_PMU_CYCLE_IDX)
write_pmccfiltr(hwc->config_base);
else
armv8pmu_write_evtype(idx, hwc->config_base);
@@ -768,7 +763,7 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)

/* Clear any unused counters to avoid leaking their contents */
for_each_clear_bit(i, cpuc->used_mask, ARMPMU_MAX_HWEVENTS) {
- if (i == ARMV8_IDX_CYCLE_COUNTER)
+ if (i == ARMV8_PMU_CYCLE_IDX)
write_pmccntr(0);
else
armv8pmu_write_evcntr(i, 0);
@@ -928,8 +923,8 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,

/* Always prefer to place a cycle counter into the cycle counter. */
if (evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
- if (!test_and_set_bit(ARMV8_IDX_CYCLE_COUNTER, cpuc->used_mask))
- return ARMV8_IDX_CYCLE_COUNTER;
+ if (!test_and_set_bit(ARMV8_PMU_CYCLE_IDX, cpuc->used_mask))
+ return ARMV8_PMU_CYCLE_IDX;
else if (armv8pmu_event_is_64bit(event) &&
armv8pmu_event_want_user_access(event) &&
!armv8pmu_has_long_event(cpu_pmu))
@@ -1191,7 +1186,7 @@ static void __armv8pmu_probe_pmu(void *info)
0, FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read()));

/* Add the CPU cycles counter */
- bitmap_set(cpu_pmu->cntr_mask, ARMV8_IDX_CYCLE_COUNTER, 1);
+ bitmap_set(cpu_pmu->cntr_mask, ARMV8_PMU_CYCLE_IDX, 1);

pmceid[0] = pmceid_raw[0] = read_pmceid0();
pmceid[1] = pmceid_raw[1] = read_pmceid1();
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 334d7c5503cf..871067fb2616 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -10,7 +10,6 @@
#include <linux/perf_event.h>
#include <linux/perf/arm_pmuv3.h>

-#define ARMV8_PMU_CYCLE_IDX (ARMV8_PMU_MAX_COUNTERS - 1)

#if IS_ENABLED(CONFIG_HW_PERF_EVENTS) && IS_ENABLED(CONFIG_KVM)
struct kvm_pmc {

--
2.43.0


2024-06-07 20:33:58

by Rob Herring (Arm)

[permalink] [raw]
Subject: [PATCH 4/9] perf: arm_pmuv3: Prepare for more than 32 counters

Various PMUv3 registers which are a mask of counters are 64-bit
registers, but the accessor functions take a u32. This has been fine as
the upper 32-bits have been RES0 as there has been a maximum of 32
counters prior to Armv9.4/8.9. With Armv9.4/8.9, a 33rd counter is
added. Update the accessor functions to use a u64 instead.

Signed-off-by: Rob Herring (Arm) <[email protected]>
---
arch/arm64/include/asm/arm_pmuv3.h | 12 ++++++------
arch/arm64/include/asm/kvm_host.h | 8 ++++----
arch/arm64/kvm/pmu.c | 8 ++++----
drivers/perf/arm_pmuv3.c | 40 ++++++++++++++++++++------------------
include/kvm/arm_pmu.h | 4 ++--
5 files changed, 37 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h
index c27404fa4418..e96ce7900fc7 100644
--- a/arch/arm64/include/asm/arm_pmuv3.h
+++ b/arch/arm64/include/asm/arm_pmuv3.h
@@ -71,22 +71,22 @@ static inline u64 read_pmccntr(void)
return read_sysreg(pmccntr_el0);
}

-static inline void write_pmcntenset(u32 val)
+static inline void write_pmcntenset(u64 val)
{
write_sysreg(val, pmcntenset_el0);
}

-static inline void write_pmcntenclr(u32 val)
+static inline void write_pmcntenclr(u64 val)
{
write_sysreg(val, pmcntenclr_el0);
}

-static inline void write_pmintenset(u32 val)
+static inline void write_pmintenset(u64 val)
{
write_sysreg(val, pmintenset_el1);
}

-static inline void write_pmintenclr(u32 val)
+static inline void write_pmintenclr(u64 val)
{
write_sysreg(val, pmintenclr_el1);
}
@@ -96,12 +96,12 @@ static inline void write_pmccfiltr(u64 val)
write_sysreg(val, pmccfiltr_el0);
}

-static inline void write_pmovsclr(u32 val)
+static inline void write_pmovsclr(u64 val)
{
write_sysreg(val, pmovsclr_el0);
}

-static inline u32 read_pmovsclr(void)
+static inline u64 read_pmovsclr(void)
{
return read_sysreg(pmovsclr_el0);
}
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 8170c04fde91..6243a01d9d26 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1267,12 +1267,12 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu);

#ifdef CONFIG_KVM
-void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
-void kvm_clr_pmu_events(u32 clr);
+void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr);
+void kvm_clr_pmu_events(u64 clr);
bool kvm_set_pmuserenr(u64 val);
#else
-static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
-static inline void kvm_clr_pmu_events(u32 clr) {}
+static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {}
+static inline void kvm_clr_pmu_events(u64 clr) {}
static inline bool kvm_set_pmuserenr(u64 val)
{
return false;
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index 329819806096..e633b4434c6a 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -35,7 +35,7 @@ struct kvm_pmu_events *kvm_get_pmu_events(void)
* Add events to track that we may want to switch at guest entry/exit
* time.
*/
-void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr)
+void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr)
{
struct kvm_pmu_events *pmu = kvm_get_pmu_events();

@@ -51,7 +51,7 @@ void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr)
/*
* Stop tracking events
*/
-void kvm_clr_pmu_events(u32 clr)
+void kvm_clr_pmu_events(u64 clr)
{
struct kvm_pmu_events *pmu = kvm_get_pmu_events();

@@ -176,7 +176,7 @@ static void kvm_vcpu_pmu_disable_el0(unsigned long events)
void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu)
{
struct kvm_pmu_events *pmu;
- u32 events_guest, events_host;
+ u64 events_guest, events_host;

if (!kvm_arm_support_pmu_v3() || !has_vhe())
return;
@@ -197,7 +197,7 @@ void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu)
void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu)
{
struct kvm_pmu_events *pmu;
- u32 events_guest, events_host;
+ u64 events_guest, events_host;

if (!kvm_arm_support_pmu_v3() || !has_vhe())
return;
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 80202346fc7a..3b49144f3a58 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -502,14 +502,14 @@ static void armv8pmu_pmcr_write(u64 val)
write_pmcr(val);
}

-static int armv8pmu_has_overflowed(u32 pmovsr)
+static int armv8pmu_has_overflowed(u64 pmovsr)
{
- return pmovsr & ARMV8_PMU_OVERFLOWED_MASK;
+ return !!(pmovsr & ARMV8_PMU_OVERFLOWED_MASK);
}

-static int armv8pmu_counter_has_overflowed(u32 pmnc, int idx)
+static int armv8pmu_counter_has_overflowed(u64 pmnc, int idx)
{
- return pmnc & BIT(idx);
+ return !!(pmnc & BIT(idx));
}

static u64 armv8pmu_read_evcntr(int idx)
@@ -648,17 +648,17 @@ static void armv8pmu_write_event_type(struct perf_event *event)
}
}

-static u32 armv8pmu_event_cnten_mask(struct perf_event *event)
+static u64 armv8pmu_event_cnten_mask(struct perf_event *event)
{
int counter = event->hw.idx;
- u32 mask = BIT(counter);
+ u64 mask = BIT(counter);

if (armv8pmu_event_is_chained(event))
mask |= BIT(counter - 1);
return mask;
}

-static void armv8pmu_enable_counter(u32 mask)
+static void armv8pmu_enable_counter(u64 mask)
{
/*
* Make sure event configuration register writes are visible before we
@@ -671,7 +671,7 @@ static void armv8pmu_enable_counter(u32 mask)
static void armv8pmu_enable_event_counter(struct perf_event *event)
{
struct perf_event_attr *attr = &event->attr;
- u32 mask = armv8pmu_event_cnten_mask(event);
+ u64 mask = armv8pmu_event_cnten_mask(event);

kvm_set_pmu_events(mask, attr);

@@ -680,7 +680,7 @@ static void armv8pmu_enable_event_counter(struct perf_event *event)
armv8pmu_enable_counter(mask);
}

-static void armv8pmu_disable_counter(u32 mask)
+static void armv8pmu_disable_counter(u64 mask)
{
write_pmcntenclr(mask);
/*
@@ -693,7 +693,7 @@ static void armv8pmu_disable_counter(u32 mask)
static void armv8pmu_disable_event_counter(struct perf_event *event)
{
struct perf_event_attr *attr = &event->attr;
- u32 mask = armv8pmu_event_cnten_mask(event);
+ u64 mask = armv8pmu_event_cnten_mask(event);

kvm_clr_pmu_events(mask);

@@ -702,7 +702,7 @@ static void armv8pmu_disable_event_counter(struct perf_event *event)
armv8pmu_disable_counter(mask);
}

-static void armv8pmu_enable_intens(u32 mask)
+static void armv8pmu_enable_intens(u64 mask)
{
write_pmintenset(mask);
}
@@ -712,7 +712,7 @@ static void armv8pmu_enable_event_irq(struct perf_event *event)
armv8pmu_enable_intens(BIT(event->hw.idx));
}

-static void armv8pmu_disable_intens(u32 mask)
+static void armv8pmu_disable_intens(u64 mask)
{
write_pmintenclr(mask);
isb();
@@ -726,9 +726,9 @@ static void armv8pmu_disable_event_irq(struct perf_event *event)
armv8pmu_disable_intens(BIT(event->hw.idx));
}

-static u32 armv8pmu_getreset_flags(void)
+static u64 armv8pmu_getreset_flags(void)
{
- u32 value;
+ u64 value;

/* Read */
value = read_pmovsclr();
@@ -823,7 +823,7 @@ static void armv8pmu_stop(struct arm_pmu *cpu_pmu)

static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
{
- u32 pmovsr;
+ u64 pmovsr;
struct perf_sample_data data;
struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
struct pt_regs *regs;
@@ -1035,14 +1035,16 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
static void armv8pmu_reset(void *info)
{
struct arm_pmu *cpu_pmu = (struct arm_pmu *)info;
- u64 pmcr;
+ u64 pmcr, mask;
+
+ bitmap_to_arr64(&mask, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS);

/* The counter and interrupt enable registers are unknown at reset. */
- armv8pmu_disable_counter(U32_MAX);
- armv8pmu_disable_intens(U32_MAX);
+ armv8pmu_disable_counter(mask);
+ armv8pmu_disable_intens(mask);

/* Clear the counters we flip at guest entry/exit */
- kvm_clr_pmu_events(U32_MAX);
+ kvm_clr_pmu_events(mask);

/*
* Initialize & Reset PMNC. Request overflow interrupt for
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 35d4ca4f6122..334d7c5503cf 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -19,8 +19,8 @@ struct kvm_pmc {
};

struct kvm_pmu_events {
- u32 events_host;
- u32 events_guest;
+ u64 events_host;
+ u64 events_guest;
};

struct kvm_pmu {

--
2.43.0


2024-06-07 20:34:07

by Rob Herring (Arm)

[permalink] [raw]
Subject: [PATCH 5/9] KVM: arm64: pmu: Use arm_pmuv3.h register accessors

Commit df29ddf4f04b ("arm64: perf: Abstract system register accesses
away") split off PMU register accessor functions to a standalone header.
Let's use it for KVM PMU code and get rid one copy of the ugly switch
macro.

Signed-off-by: Rob Herring (Arm) <[email protected]>
---
arch/arm64/include/asm/arm_pmuv3.h | 13 ++++++++
arch/arm64/kvm/pmu.c | 67 +++++---------------------------------
2 files changed, 22 insertions(+), 58 deletions(-)

diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h
index e96ce7900fc7..1ed91334fede 100644
--- a/arch/arm64/include/asm/arm_pmuv3.h
+++ b/arch/arm64/include/asm/arm_pmuv3.h
@@ -33,6 +33,14 @@ static inline void write_pmevtypern(int n, unsigned long val)
PMEVN_SWITCH(n, WRITE_PMEVTYPERN);
}

+#define RETURN_READ_PMEVTYPERN(n) \
+ return read_sysreg(pmevtyper##n##_el0)
+static inline unsigned long read_pmevtypern(int n)
+{
+ PMEVN_SWITCH(n, RETURN_READ_PMEVTYPERN);
+ return 0;
+}
+
static inline unsigned long read_pmmir(void)
{
return read_cpuid(PMMIR_EL1);
@@ -96,6 +104,11 @@ static inline void write_pmccfiltr(u64 val)
write_sysreg(val, pmccfiltr_el0);
}

+static inline u64 read_pmccfiltr(void)
+{
+ return read_sysreg(pmccfiltr_el0);
+}
+
static inline void write_pmovsclr(u64 val)
{
write_sysreg(val, pmovsclr_el0);
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index e633b4434c6a..01c9a9efdd1c 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -6,6 +6,8 @@
#include <linux/kvm_host.h>
#include <linux/perf_event.h>

+#include <asm/arm_pmuv3.h>
+
static DEFINE_PER_CPU(struct kvm_pmu_events, kvm_pmu_events);

/*
@@ -62,63 +64,16 @@ void kvm_clr_pmu_events(u64 clr)
pmu->events_guest &= ~clr;
}

-#define PMEVTYPER_READ_CASE(idx) \
- case idx: \
- return read_sysreg(pmevtyper##idx##_el0)
-
-#define PMEVTYPER_WRITE_CASE(idx) \
- case idx: \
- write_sysreg(val, pmevtyper##idx##_el0); \
- break
-
-#define PMEVTYPER_CASES(readwrite) \
- PMEVTYPER_##readwrite##_CASE(0); \
- PMEVTYPER_##readwrite##_CASE(1); \
- PMEVTYPER_##readwrite##_CASE(2); \
- PMEVTYPER_##readwrite##_CASE(3); \
- PMEVTYPER_##readwrite##_CASE(4); \
- PMEVTYPER_##readwrite##_CASE(5); \
- PMEVTYPER_##readwrite##_CASE(6); \
- PMEVTYPER_##readwrite##_CASE(7); \
- PMEVTYPER_##readwrite##_CASE(8); \
- PMEVTYPER_##readwrite##_CASE(9); \
- PMEVTYPER_##readwrite##_CASE(10); \
- PMEVTYPER_##readwrite##_CASE(11); \
- PMEVTYPER_##readwrite##_CASE(12); \
- PMEVTYPER_##readwrite##_CASE(13); \
- PMEVTYPER_##readwrite##_CASE(14); \
- PMEVTYPER_##readwrite##_CASE(15); \
- PMEVTYPER_##readwrite##_CASE(16); \
- PMEVTYPER_##readwrite##_CASE(17); \
- PMEVTYPER_##readwrite##_CASE(18); \
- PMEVTYPER_##readwrite##_CASE(19); \
- PMEVTYPER_##readwrite##_CASE(20); \
- PMEVTYPER_##readwrite##_CASE(21); \
- PMEVTYPER_##readwrite##_CASE(22); \
- PMEVTYPER_##readwrite##_CASE(23); \
- PMEVTYPER_##readwrite##_CASE(24); \
- PMEVTYPER_##readwrite##_CASE(25); \
- PMEVTYPER_##readwrite##_CASE(26); \
- PMEVTYPER_##readwrite##_CASE(27); \
- PMEVTYPER_##readwrite##_CASE(28); \
- PMEVTYPER_##readwrite##_CASE(29); \
- PMEVTYPER_##readwrite##_CASE(30)
-
/*
* Read a value direct from PMEVTYPER<idx> where idx is 0-30
* or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31).
*/
static u64 kvm_vcpu_pmu_read_evtype_direct(int idx)
{
- switch (idx) {
- PMEVTYPER_CASES(READ);
- case ARMV8_PMU_CYCLE_IDX:
- return read_sysreg(pmccfiltr_el0);
- default:
- WARN_ON(1);
- }
+ if (idx == ARMV8_PMU_CYCLE_IDX)
+ return read_pmccfiltr();

- return 0;
+ return read_pmevtypern(idx);
}

/*
@@ -127,14 +82,10 @@ static u64 kvm_vcpu_pmu_read_evtype_direct(int idx)
*/
static void kvm_vcpu_pmu_write_evtype_direct(int idx, u32 val)
{
- switch (idx) {
- PMEVTYPER_CASES(WRITE);
- case ARMV8_PMU_CYCLE_IDX:
- write_sysreg(val, pmccfiltr_el0);
- break;
- default:
- WARN_ON(1);
- }
+ if (idx == ARMV8_PMU_CYCLE_IDX)
+ write_pmccfiltr(val);
+ else
+ write_pmevtypern(idx, val);
}

/*

--
2.43.0


2024-06-07 20:34:20

by Rob Herring (Arm)

[permalink] [raw]
Subject: [PATCH 9/9] perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counter

Armv9.4/8.9 PMU adds optional support for a fixed instruction counter
similar to the fixed cycle counter. Support for the feature is indicated
in the ID_AA64DFR1_EL1 register PMICNTR field. The counter is not
accessible in AArch32.

Existing userspace using direct counter access won't know how to handle
the fixed instruction counter, so we have to avoid using the counter
when user access is requested.

Signed-off-by: Rob Herring (Arm) <[email protected]>
---
arch/arm/include/asm/arm_pmuv3.h | 21 +++++++++++++++++++++
arch/arm64/include/asm/arm_pmuv3.h | 29 +++++++++++++++++++++++++++++
arch/arm64/kvm/pmu.c | 8 ++++++--
arch/arm64/tools/sysreg | 25 +++++++++++++++++++++++++
drivers/perf/arm_pmuv3.c | 28 ++++++++++++++++++++++++++--
include/linux/perf/arm_pmu.h | 8 ++++++--
include/linux/perf/arm_pmuv3.h | 4 +++-
7 files changed, 116 insertions(+), 7 deletions(-)

diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
index ac2cf37b57e3..b836537ddfbf 100644
--- a/arch/arm/include/asm/arm_pmuv3.h
+++ b/arch/arm/include/asm/arm_pmuv3.h
@@ -10,6 +10,7 @@
#include <asm/cputype.h>

#define ARMV8_PMU_CYCLE_IDX 31
+#define ARMV8_PMU_INSTR_IDX 32 /* Not accessible from AArch32 */

#define PMCCNTR __ACCESS_CP15_64(0, c9)

@@ -129,6 +130,12 @@ static inline u32 read_pmuver(void)
return (dfr0 >> 24) & 0xf;
}

+static inline bool pmuv3_has_icntr(void)
+{
+ /* FEAT_PMUv3_ICNTR not accessible for 32-bit */
+ return false;
+}
+
static inline void write_pmcr(u32 val)
{
write_sysreg(val, PMCR);
@@ -154,6 +161,13 @@ static inline u64 read_pmccntr(void)
return read_sysreg(PMCCNTR);
}

+static inline void write_pmicntr(u64 val) {}
+
+static inline u64 read_pmicntr(void)
+{
+ return 0;
+}
+
static inline void write_pmcntenset(u32 val)
{
write_sysreg(val, PMCNTENSET);
@@ -179,6 +193,13 @@ static inline void write_pmccfiltr(u32 val)
write_sysreg(val, PMCCFILTR);
}

+static inline void write_pmicfiltr(u64 val) {}
+
+static inline u64 read_pmicfiltr(void)
+{
+ return 0;
+}
+
static inline void write_pmovsclr(u32 val)
{
write_sysreg(val, PMOVSR);
diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h
index 46930729fb3f..a13a10e97c01 100644
--- a/arch/arm64/include/asm/arm_pmuv3.h
+++ b/arch/arm64/include/asm/arm_pmuv3.h
@@ -12,6 +12,7 @@
#include <asm/sysreg.h>

#define ARMV8_PMU_CYCLE_IDX 31
+#define ARMV8_PMU_INSTR_IDX 32

#define RETURN_READ_PMEVCNTRN(n) \
return read_sysreg(pmevcntr##n##_el0)
@@ -56,6 +57,14 @@ static inline u32 read_pmuver(void)
ID_AA64DFR0_EL1_PMUVer_SHIFT);
}

+static inline bool pmuv3_has_icntr(void)
+{
+ u64 dfr1 = read_sysreg(id_aa64dfr1_el1);
+
+ return !!cpuid_feature_extract_unsigned_field(dfr1,
+ ID_AA64DFR1_EL1_PMICNTR_SHIFT);
+}
+
static inline void write_pmcr(u64 val)
{
write_sysreg(val, pmcr_el0);
@@ -81,6 +90,16 @@ static inline u64 read_pmccntr(void)
return read_sysreg(pmccntr_el0);
}

+static inline void write_pmicntr(u64 val)
+{
+ write_sysreg_s(val, SYS_PMICNTR_EL0);
+}
+
+static inline u64 read_pmicntr(void)
+{
+ return read_sysreg_s(SYS_PMICNTR_EL0);
+}
+
static inline void write_pmcntenset(u64 val)
{
write_sysreg(val, pmcntenset_el0);
@@ -111,6 +130,16 @@ static inline u64 read_pmccfiltr(void)
return read_sysreg(pmccfiltr_el0);
}

+static inline void write_pmicfiltr(u64 val)
+{
+ write_sysreg_s(val, SYS_PMICFILTR_EL0);
+}
+
+static inline u64 read_pmicfiltr(void)
+{
+ return read_sysreg_s(SYS_PMICFILTR_EL0);
+}
+
static inline void write_pmovsclr(u64 val)
{
write_sysreg(val, pmovsclr_el0);
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index 7eaf5f7aeae9..9420835cce91 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -67,24 +67,28 @@ void kvm_clr_pmu_events(u64 clr)

/*
* Read a value direct from PMEVTYPER<idx> where idx is 0-30
- * or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31).
+ * or PMxCFILTR_EL0 where idx is 31-32.
*/
static u64 kvm_vcpu_pmu_read_evtype_direct(int idx)
{
if (idx == ARMV8_PMU_CYCLE_IDX)
return read_pmccfiltr();
+ else if (idx == ARMV8_PMU_INSTR_IDX)
+ return read_pmicfiltr();

return read_pmevtypern(idx);
}

/*
* Write a value direct to PMEVTYPER<idx> where idx is 0-30
- * or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31).
+ * or PMxCFILTR_EL0 where idx is 31-32.
*/
static void kvm_vcpu_pmu_write_evtype_direct(int idx, u32 val)
{
if (idx == ARMV8_PMU_CYCLE_IDX)
write_pmccfiltr(val);
+ else if (idx == ARMV8_PMU_INSTR_IDX)
+ write_pmicfiltr(val);
else
write_pmevtypern(idx, val);
}
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 231817a379b5..8ab6e09871de 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -2029,6 +2029,31 @@ Sysreg FAR_EL1 3 0 6 0 0
Field 63:0 ADDR
EndSysreg

+Sysreg PMICNTR_EL0 3 3 9 4 0
+Field 63:0 ICNT
+EndSysreg
+
+Sysreg PMICFILTR_EL0 3 3 9 6 0
+Res0 63:59
+Field 58 SYNC
+Field 57:56 VS
+Res0 55:32
+Field 31 P
+Field 30 U
+Field 29 NSK
+Field 28 NSU
+Field 27 NSH
+Field 26 M
+Res0 25
+Field 24 SH
+Field 23 T
+Field 22 RLK
+Field 21 RLU
+Field 20 RLH
+Res0 19:16
+Field 15:0 evtCount
+EndSysreg
+
Sysreg PMSCR_EL1 3 0 9 9 0
Res0 63:8
Field 7:6 PCT
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 468a0a3bbd5a..890efc686e11 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -479,7 +479,7 @@ static bool armv8pmu_event_is_chained(struct perf_event *event)
return !armv8pmu_event_has_user_read(event) &&
armv8pmu_event_is_64bit(event) &&
!armv8pmu_has_long_event(cpu_pmu) &&
- (idx != ARMV8_PMU_CYCLE_IDX);
+ (idx < ARMV8_PMU_CYCLE_IDX);
}

/*
@@ -538,7 +538,7 @@ static bool armv8pmu_event_needs_bias(struct perf_event *event)
return false;

if (armv8pmu_has_long_event(cpu_pmu) ||
- idx == ARMV8_PMU_CYCLE_IDX)
+ idx >= ARMV8_PMU_CYCLE_IDX)
return true;

return false;
@@ -568,6 +568,8 @@ static u64 armv8pmu_read_counter(struct perf_event *event)

if (idx == ARMV8_PMU_CYCLE_IDX)
value = read_pmccntr();
+ else if (idx == ARMV8_PMU_INSTR_IDX)
+ value = read_pmicntr();
else
value = armv8pmu_read_hw_counter(event);

@@ -601,6 +603,8 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)

if (idx == ARMV8_PMU_CYCLE_IDX)
write_pmccntr(value);
+ else if (idx == ARMV8_PMU_INSTR_IDX)
+ write_pmicntr(value);
else
armv8pmu_write_hw_counter(event, value);
}
@@ -638,6 +642,8 @@ static void armv8pmu_write_event_type(struct perf_event *event)
} else {
if (idx == ARMV8_PMU_CYCLE_IDX)
write_pmccfiltr(hwc->config_base);
+ else if (idx == ARMV8_PMU_INSTR_IDX)
+ write_pmicfiltr(hwc->config_base);
else
armv8pmu_write_evtype(idx, hwc->config_base);
}
@@ -765,6 +771,8 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
for_each_clear_bit(i, cpuc->used_mask, ARMPMU_MAX_HWEVENTS) {
if (i == ARMV8_PMU_CYCLE_IDX)
write_pmccntr(0);
+ else if (i == ARMV8_PMU_INSTR_IDX)
+ write_pmicntr(0);
else
armv8pmu_write_evcntr(i, 0);
}
@@ -931,6 +939,18 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
return -EAGAIN;
}

+ /*
+ * Always prefer to place a instruction counter into the instruction counter,
+ * but don't expose the instruction counter to userspace access as userspace
+ * may not know how to handle it.
+ */
+ if (test_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->cntr_mask) &&
+ (evtype == ARMV8_PMUV3_PERFCTR_INST_RETIRED) &&
+ !armv8pmu_event_want_user_access(event)) {
+ if (!test_and_set_bit(ARMV8_PMU_INSTR_IDX, cpuc->used_mask))
+ return ARMV8_PMU_INSTR_IDX;
+ }
+
/*
* Otherwise use events counters
*/
@@ -1188,6 +1208,10 @@ static void __armv8pmu_probe_pmu(void *info)
/* Add the CPU cycles counter */
bitmap_set(cpu_pmu->cntr_mask, ARMV8_PMU_CYCLE_IDX, 1);

+ /* Add the CPU instructions counter */
+ if (pmuv3_has_icntr())
+ bitmap_set(cpu_pmu->cntr_mask, ARMV8_PMU_INSTR_IDX, 1);
+
pmceid[0] = pmceid_raw[0] = read_pmceid0();
pmceid[1] = pmceid_raw[1] = read_pmceid1();

diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index e5d6d204beab..4b5b83677e3f 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -17,10 +17,14 @@
#ifdef CONFIG_ARM_PMU

/*
- * The ARMv7 CPU PMU supports up to 32 event counters.
+ * The Armv7 and Armv8.8 or less CPU PMU supports up to 32 event counters.
+ * The Armv8.9/9.4 CPU PMU supports up to 33 event counters.
*/
+#ifdef CONFIG_ARM
#define ARMPMU_MAX_HWEVENTS 32
-
+#else
+#define ARMPMU_MAX_HWEVENTS 33
+#endif
/*
* ARM PMU hw_event flags
*/
diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
index c902fe64f070..0472c4270d66 100644
--- a/include/linux/perf/arm_pmuv3.h
+++ b/include/linux/perf/arm_pmuv3.h
@@ -224,8 +224,10 @@
*/
#define ARMV8_PMU_OVSR_P GENMASK(30, 0)
#define ARMV8_PMU_OVSR_C BIT(31)
+#define ARMV8_PMU_OVSR_F BIT_ULL(32) /* arm64 only */
/* Mask for writable bits is both P and C fields */
-#define ARMV8_PMU_OVERFLOWED_MASK (ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C)
+#define ARMV8_PMU_OVERFLOWED_MASK (ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C | \
+ ARMV8_PMU_OVSR_F)

/*
* PMXEVTYPER: Event selection reg

--
2.43.0


2024-06-07 20:34:53

by Rob Herring (Arm)

[permalink] [raw]
Subject: [PATCH 8/9] KVM: arm64: Refine PMU defines for number of counters

There are 2 defines for the number of PMU counters:
ARMV8_PMU_MAX_COUNTERS and ARMPMU_MAX_HWEVENTS. Both are the same
currently, but Armv9.4/8.9 increases the number of possible counters
from 32 to 33. With this change, the maximum number of counters will
differ for KVM's PMU emulation which is PMUv3.4. Give KVM PMU emulation
its own define to decouple it from the rest of the kernel's number PMU
counters.

The VHE PMU code needs to match the PMU driver, so switch it to use
ARMPMU_MAX_HWEVENTS instead.

Signed-off-by: Rob Herring (Arm) <[email protected]>
---
arch/arm64/kvm/pmu-emul.c | 8 ++++----
arch/arm64/kvm/pmu.c | 5 +++--
include/kvm/arm_pmu.h | 3 ++-
include/linux/perf/arm_pmuv3.h | 2 --
4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index da5ba9d061e8..77fe79b2ba04 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -234,7 +234,7 @@ void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu)
int i;
struct kvm_pmu *pmu = &vcpu->arch.pmu;

- for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++)
+ for (i = 0; i < KVM_ARMV8_PMU_MAX_COUNTERS; i++)
pmu->pmc[i].idx = i;
}

@@ -261,7 +261,7 @@ void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu)
{
int i;

- for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++)
+ for (i = 0; i < KVM_ARMV8_PMU_MAX_COUNTERS; i++)
kvm_pmu_release_perf_event(kvm_vcpu_idx_to_pmc(vcpu, i));
irq_work_sync(&vcpu->arch.pmu.overflow_work);
}
@@ -292,7 +292,7 @@ void kvm_pmu_enable_counter_mask(struct kvm_vcpu *vcpu, u64 val)
if (!(kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E) || !val)
return;

- for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) {
+ for (i = 0; i < KVM_ARMV8_PMU_MAX_COUNTERS; i++) {
struct kvm_pmc *pmc;

if (!(val & BIT(i)))
@@ -324,7 +324,7 @@ void kvm_pmu_disable_counter_mask(struct kvm_vcpu *vcpu, u64 val)
if (!kvm_vcpu_has_pmu(vcpu) || !val)
return;

- for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) {
+ for (i = 0; i < KVM_ARMV8_PMU_MAX_COUNTERS; i++) {
struct kvm_pmc *pmc;

if (!(val & BIT(i)))
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index 01c9a9efdd1c..7eaf5f7aeae9 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -5,6 +5,7 @@
*/
#include <linux/kvm_host.h>
#include <linux/perf_event.h>
+#include <linux/perf/arm_pmu.h>

#include <asm/arm_pmuv3.h>

@@ -96,7 +97,7 @@ static void kvm_vcpu_pmu_enable_el0(unsigned long events)
u64 typer;
u32 counter;

- for_each_set_bit(counter, &events, 32) {
+ for_each_set_bit(counter, &events, ARMPMU_MAX_HWEVENTS) {
typer = kvm_vcpu_pmu_read_evtype_direct(counter);
typer &= ~ARMV8_PMU_EXCLUDE_EL0;
kvm_vcpu_pmu_write_evtype_direct(counter, typer);
@@ -111,7 +112,7 @@ static void kvm_vcpu_pmu_disable_el0(unsigned long events)
u64 typer;
u32 counter;

- for_each_set_bit(counter, &events, 32) {
+ for_each_set_bit(counter, &events, ARMPMU_MAX_HWEVENTS) {
typer = kvm_vcpu_pmu_read_evtype_direct(counter);
typer |= ARMV8_PMU_EXCLUDE_EL0;
kvm_vcpu_pmu_write_evtype_direct(counter, typer);
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 871067fb2616..e08aeec5d936 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -10,6 +10,7 @@
#include <linux/perf_event.h>
#include <linux/perf/arm_pmuv3.h>

+#define KVM_ARMV8_PMU_MAX_COUNTERS 32

#if IS_ENABLED(CONFIG_HW_PERF_EVENTS) && IS_ENABLED(CONFIG_KVM)
struct kvm_pmc {
@@ -25,7 +26,7 @@ struct kvm_pmu_events {
struct kvm_pmu {
struct irq_work overflow_work;
struct kvm_pmu_events events;
- struct kvm_pmc pmc[ARMV8_PMU_MAX_COUNTERS];
+ struct kvm_pmc pmc[KVM_ARMV8_PMU_MAX_COUNTERS];
int irq_num;
bool created;
bool irq_level;
diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
index caa09241ad4f..c902fe64f070 100644
--- a/include/linux/perf/arm_pmuv3.h
+++ b/include/linux/perf/arm_pmuv3.h
@@ -6,8 +6,6 @@
#ifndef __PERF_ARM_PMUV3_H
#define __PERF_ARM_PMUV3_H

-#define ARMV8_PMU_MAX_COUNTERS 32
-
/*
* Common architectural and microarchitectural event numbers.
*/

--
2.43.0


2024-06-07 20:57:55

by Rob Herring (Arm)

[permalink] [raw]
Subject: [PATCH 3/9] perf: arm_pmu: Remove event index to counter remapping

Xscale and Armv6 PMUs defined the cycle counter at 0 and event counters
starting at 1 and had 1:1 event index to counter numbering. On Armv7 and
later, this changed the cycle counter to 31 and event counters start at
0. The drivers for Armv7 and PMUv3 kept the old event index numbering
and introduced an event index to counter conversion. The conversion uses
masking to convert from event index to a counter number. This operation
relies on having at most 32 counters so that the cycle counter index 0
can be transformed to counter number 31.

Armv9.4 adds support for an additional fixed function counter
(instructions) which increases possible counters to more than 32, and
the conversion won't work anymore as a simple subtract and mask. The
primary reason for the translation (other than history) seems to be to
have a contiguous mask of counters 0-N. Keeping that would result in
more complicated index to counter conversions. Instead, store a mask of
available counters rather than just number of events. That provides more
information in addition to the number of events.

No (intended) functional changes.

Signed-off-by: Rob Herring (Arm) <[email protected]>
---
arch/arm64/kvm/pmu-emul.c | 6 ++--
drivers/perf/arm_pmu.c | 11 ++++---
drivers/perf/arm_pmuv3.c | 57 ++++++++++----------------------
drivers/perf/arm_v6_pmu.c | 6 ++--
drivers/perf/arm_v7_pmu.c | 77 ++++++++++++++++---------------------------
drivers/perf/arm_xscale_pmu.c | 12 ++++---
include/linux/perf/arm_pmu.h | 2 +-
7 files changed, 69 insertions(+), 102 deletions(-)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index a35ce10e0a9f..da5ba9d061e8 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -911,10 +911,10 @@ u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm)
struct arm_pmu *arm_pmu = kvm->arch.arm_pmu;

/*
- * The arm_pmu->num_events considers the cycle counter as well.
- * Ignore that and return only the general-purpose counters.
+ * The arm_pmu->cntr_mask considers the fixed counter(s) as well.
+ * Ignore those and return only the general-purpose counters.
*/
- return arm_pmu->num_events - 1;
+ return bitmap_weight(arm_pmu->cntr_mask, 31);
}

static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu)
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 8458fe2cebb4..398cce3d76fc 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -522,7 +522,7 @@ static void armpmu_enable(struct pmu *pmu)
{
struct arm_pmu *armpmu = to_arm_pmu(pmu);
struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
- bool enabled = !bitmap_empty(hw_events->used_mask, armpmu->num_events);
+ bool enabled = !bitmap_empty(hw_events->used_mask, ARMPMU_MAX_HWEVENTS);

/* For task-bound events we may be called on other CPUs */
if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
@@ -742,7 +742,7 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
struct perf_event *event;
int idx;

- for (idx = 0; idx < armpmu->num_events; idx++) {
+ for_each_set_bit(idx, armpmu->cntr_mask, ARMPMU_MAX_HWEVENTS) {
event = hw_events->events[idx];
if (!event)
continue;
@@ -772,7 +772,7 @@ static int cpu_pm_pmu_notify(struct notifier_block *b, unsigned long cmd,
{
struct arm_pmu *armpmu = container_of(b, struct arm_pmu, cpu_pm_nb);
struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
- bool enabled = !bitmap_empty(hw_events->used_mask, armpmu->num_events);
+ bool enabled = !bitmap_empty(hw_events->used_mask, ARMPMU_MAX_HWEVENTS);

if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
return NOTIFY_DONE;
@@ -924,8 +924,9 @@ int armpmu_register(struct arm_pmu *pmu)
if (ret)
goto out_destroy;

- pr_info("enabled with %s PMU driver, %d counters available%s\n",
- pmu->name, pmu->num_events,
+ pr_info("enabled with %s PMU driver, %d (%*pb) counters available%s\n",
+ pmu->name, bitmap_weight(pmu->cntr_mask, ARMPMU_MAX_HWEVENTS),
+ ARMPMU_MAX_HWEVENTS, &pmu->cntr_mask,
has_nmi ? ", using NMIs" : "");

kvm_host_pmu_init(pmu);
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 23fa6c5da82c..80202346fc7a 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -451,9 +451,7 @@ static const struct attribute_group armv8_pmuv3_caps_attr_group = {
/*
* Perf Events' indices
*/
-#define ARMV8_IDX_CYCLE_COUNTER 0
-#define ARMV8_IDX_COUNTER0 1
-#define ARMV8_IDX_CYCLE_COUNTER_USER 32
+#define ARMV8_IDX_CYCLE_COUNTER 31

/*
* We unconditionally enable ARMv8.5-PMU long event counter support
@@ -492,13 +490,6 @@ static bool armv8pmu_event_is_chained(struct perf_event *event)
/*
* ARMv8 low level PMU access
*/
-
-/*
- * Perf Event to low level counters mapping
- */
-#define ARMV8_IDX_TO_COUNTER(x) \
- (((x) - ARMV8_IDX_COUNTER0) & ARMV8_PMU_COUNTER_MASK)
-
static u64 armv8pmu_pmcr_read(void)
{
return read_pmcr();
@@ -518,14 +509,12 @@ static int armv8pmu_has_overflowed(u32 pmovsr)

static int armv8pmu_counter_has_overflowed(u32 pmnc, int idx)
{
- return pmnc & BIT(ARMV8_IDX_TO_COUNTER(idx));
+ return pmnc & BIT(idx);
}

static u64 armv8pmu_read_evcntr(int idx)
{
- u32 counter = ARMV8_IDX_TO_COUNTER(idx);
-
- return read_pmevcntrn(counter);
+ return read_pmevcntrn(idx);
}

static u64 armv8pmu_read_hw_counter(struct perf_event *event)
@@ -592,9 +581,7 @@ static u64 armv8pmu_read_counter(struct perf_event *event)

static void armv8pmu_write_evcntr(int idx, u64 value)
{
- u32 counter = ARMV8_IDX_TO_COUNTER(idx);
-
- write_pmevcntrn(counter, value);
+ write_pmevcntrn(idx, value);
}

static void armv8pmu_write_hw_counter(struct perf_event *event,
@@ -625,7 +612,6 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)

static void armv8pmu_write_evtype(int idx, unsigned long val)
{
- u32 counter = ARMV8_IDX_TO_COUNTER(idx);
unsigned long mask = ARMV8_PMU_EVTYPE_EVENT |
ARMV8_PMU_INCLUDE_EL2 |
ARMV8_PMU_EXCLUDE_EL0 |
@@ -635,7 +621,7 @@ static void armv8pmu_write_evtype(int idx, unsigned long val)
mask |= ARMV8_PMU_EVTYPE_TC | ARMV8_PMU_EVTYPE_TH;

val &= mask;
- write_pmevtypern(counter, val);
+ write_pmevtypern(idx, val);
}

static void armv8pmu_write_event_type(struct perf_event *event)
@@ -664,7 +650,7 @@ static void armv8pmu_write_event_type(struct perf_event *event)

static u32 armv8pmu_event_cnten_mask(struct perf_event *event)
{
- int counter = ARMV8_IDX_TO_COUNTER(event->hw.idx);
+ int counter = event->hw.idx;
u32 mask = BIT(counter);

if (armv8pmu_event_is_chained(event))
@@ -723,8 +709,7 @@ static void armv8pmu_enable_intens(u32 mask)

static void armv8pmu_enable_event_irq(struct perf_event *event)
{
- u32 counter = ARMV8_IDX_TO_COUNTER(event->hw.idx);
- armv8pmu_enable_intens(BIT(counter));
+ armv8pmu_enable_intens(BIT(event->hw.idx));
}

static void armv8pmu_disable_intens(u32 mask)
@@ -738,8 +723,7 @@ static void armv8pmu_disable_intens(u32 mask)

static void armv8pmu_disable_event_irq(struct perf_event *event)
{
- u32 counter = ARMV8_IDX_TO_COUNTER(event->hw.idx);
- armv8pmu_disable_intens(BIT(counter));
+ armv8pmu_disable_intens(BIT(event->hw.idx));
}

static u32 armv8pmu_getreset_flags(void)
@@ -783,7 +767,7 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);

/* Clear any unused counters to avoid leaking their contents */
- for_each_clear_bit(i, cpuc->used_mask, cpu_pmu->num_events) {
+ for_each_clear_bit(i, cpuc->used_mask, ARMPMU_MAX_HWEVENTS) {
if (i == ARMV8_IDX_CYCLE_COUNTER)
write_pmccntr(0);
else
@@ -866,7 +850,7 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
* to prevent skews in group events.
*/
armv8pmu_stop(cpu_pmu);
- for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS) {
struct perf_event *event = cpuc->events[idx];
struct hw_perf_event *hwc;

@@ -905,7 +889,7 @@ static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
{
int idx;

- for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; idx++) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, 31) {
if (!test_and_set_bit(idx, cpuc->used_mask))
return idx;
}
@@ -921,7 +905,9 @@ static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
* Chaining requires two consecutive event counters, where
* the lower idx must be even.
*/
- for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, 31) {
+ if (!(idx & 0x1))
+ continue;
if (!test_and_set_bit(idx, cpuc->used_mask)) {
/* Check if the preceding even counter is available */
if (!test_and_set_bit(idx - 1, cpuc->used_mask))
@@ -974,15 +960,7 @@ static int armv8pmu_user_event_idx(struct perf_event *event)
if (!sysctl_perf_user_access || !armv8pmu_event_has_user_read(event))
return 0;

- /*
- * We remap the cycle counter index to 32 to
- * match the offset applied to the rest of
- * the counter indices.
- */
- if (event->hw.idx == ARMV8_IDX_CYCLE_COUNTER)
- return ARMV8_IDX_CYCLE_COUNTER_USER;
-
- return event->hw.idx;
+ return event->hw.idx + 1;
}

/*
@@ -1207,10 +1185,11 @@ static void __armv8pmu_probe_pmu(void *info)
probe->present = true;

/* Read the nb of CNTx counters supported from PMNC */
- cpu_pmu->num_events = FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read());
+ bitmap_set(cpu_pmu->cntr_mask,
+ 0, FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read()));

/* Add the CPU cycles counter */
- cpu_pmu->num_events += 1;
+ bitmap_set(cpu_pmu->cntr_mask, ARMV8_IDX_CYCLE_COUNTER, 1);

pmceid[0] = pmceid_raw[0] = read_pmceid0();
pmceid[1] = pmceid_raw[1] = read_pmceid1();
diff --git a/drivers/perf/arm_v6_pmu.c b/drivers/perf/arm_v6_pmu.c
index 0bb685b4bac5..b09615bb2bb2 100644
--- a/drivers/perf/arm_v6_pmu.c
+++ b/drivers/perf/arm_v6_pmu.c
@@ -64,6 +64,7 @@ enum armv6_counters {
ARMV6_CYCLE_COUNTER = 0,
ARMV6_COUNTER0,
ARMV6_COUNTER1,
+ ARMV6_NUM_COUNTERS
};

/*
@@ -254,7 +255,7 @@ armv6pmu_handle_irq(struct arm_pmu *cpu_pmu)
*/
armv6_pmcr_write(pmcr);

- for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV6_NUM_COUNTERS) {
struct perf_event *event = cpuc->events[idx];
struct hw_perf_event *hwc;

@@ -391,7 +392,8 @@ static void armv6pmu_init(struct arm_pmu *cpu_pmu)
cpu_pmu->start = armv6pmu_start;
cpu_pmu->stop = armv6pmu_stop;
cpu_pmu->map_event = armv6_map_event;
- cpu_pmu->num_events = 3;
+
+ bitmap_set(cpu_pmu->cntr_mask, 0, ARMV6_NUM_COUNTERS);
}

static int armv6_1136_pmu_init(struct arm_pmu *cpu_pmu)
diff --git a/drivers/perf/arm_v7_pmu.c b/drivers/perf/arm_v7_pmu.c
index 928ac3d626ed..d815fcde7d84 100644
--- a/drivers/perf/arm_v7_pmu.c
+++ b/drivers/perf/arm_v7_pmu.c
@@ -649,24 +649,12 @@ static struct attribute_group armv7_pmuv2_events_attr_group = {
/*
* Perf Events' indices
*/
-#define ARMV7_IDX_CYCLE_COUNTER 0
-#define ARMV7_IDX_COUNTER0 1
-#define ARMV7_IDX_COUNTER_LAST(cpu_pmu) \
- (ARMV7_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
-
-#define ARMV7_MAX_COUNTERS 32
-#define ARMV7_COUNTER_MASK (ARMV7_MAX_COUNTERS - 1)
-
+#define ARMV7_IDX_CYCLE_COUNTER 31
+#define ARMV7_IDX_COUNTER_MAX 31
/*
* ARMv7 low level PMNC access
*/

-/*
- * Perf Event to low level counters mapping
- */
-#define ARMV7_IDX_TO_COUNTER(x) \
- (((x) - ARMV7_IDX_COUNTER0) & ARMV7_COUNTER_MASK)
-
/*
* Per-CPU PMNC: config reg
*/
@@ -725,19 +713,17 @@ static inline int armv7_pmnc_has_overflowed(u32 pmnc)

static inline int armv7_pmnc_counter_valid(struct arm_pmu *cpu_pmu, int idx)
{
- return idx >= ARMV7_IDX_CYCLE_COUNTER &&
- idx <= ARMV7_IDX_COUNTER_LAST(cpu_pmu);
+ return test_bit(idx, cpu_pmu->cntr_mask);
}

static inline int armv7_pmnc_counter_has_overflowed(u32 pmnc, int idx)
{
- return pmnc & BIT(ARMV7_IDX_TO_COUNTER(idx));
+ return pmnc & BIT(idx);
}

static inline void armv7_pmnc_select_counter(int idx)
{
- u32 counter = ARMV7_IDX_TO_COUNTER(idx);
- asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (counter));
+ asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (idx));
isb();
}

@@ -787,29 +773,25 @@ static inline void armv7_pmnc_write_evtsel(int idx, u32 val)

static inline void armv7_pmnc_enable_counter(int idx)
{
- u32 counter = ARMV7_IDX_TO_COUNTER(idx);
- asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (BIT(counter)));
+ asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (BIT(idx)));
}

static inline void armv7_pmnc_disable_counter(int idx)
{
- u32 counter = ARMV7_IDX_TO_COUNTER(idx);
- asm volatile("mcr p15, 0, %0, c9, c12, 2" : : "r" (BIT(counter)));
+ asm volatile("mcr p15, 0, %0, c9, c12, 2" : : "r" (BIT(idx)));
}

static inline void armv7_pmnc_enable_intens(int idx)
{
- u32 counter = ARMV7_IDX_TO_COUNTER(idx);
- asm volatile("mcr p15, 0, %0, c9, c14, 1" : : "r" (BIT(counter)));
+ asm volatile("mcr p15, 0, %0, c9, c14, 1" : : "r" (BIT(idx)));
}

static inline void armv7_pmnc_disable_intens(int idx)
{
- u32 counter = ARMV7_IDX_TO_COUNTER(idx);
- asm volatile("mcr p15, 0, %0, c9, c14, 2" : : "r" (BIT(counter)));
+ asm volatile("mcr p15, 0, %0, c9, c14, 2" : : "r" (BIT(idx)));
isb();
/* Clear the overflow flag in case an interrupt is pending. */
- asm volatile("mcr p15, 0, %0, c9, c12, 3" : : "r" (BIT(counter)));
+ asm volatile("mcr p15, 0, %0, c9, c12, 3" : : "r" (BIT(idx)));
isb();
}

@@ -853,15 +835,12 @@ static void armv7_pmnc_dump_regs(struct arm_pmu *cpu_pmu)
asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (val));
pr_info("CCNT =0x%08x\n", val);

- for (cnt = ARMV7_IDX_COUNTER0;
- cnt <= ARMV7_IDX_COUNTER_LAST(cpu_pmu); cnt++) {
+ for_each_set_bit(cnt, cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX) {
armv7_pmnc_select_counter(cnt);
asm volatile("mrc p15, 0, %0, c9, c13, 2" : "=r" (val));
- pr_info("CNT[%d] count =0x%08x\n",
- ARMV7_IDX_TO_COUNTER(cnt), val);
+ pr_info("CNT[%d] count =0x%08x\n", cnt, val);
asm volatile("mrc p15, 0, %0, c9, c13, 1" : "=r" (val));
- pr_info("CNT[%d] evtsel=0x%08x\n",
- ARMV7_IDX_TO_COUNTER(cnt), val);
+ pr_info("CNT[%d] evtsel=0x%08x\n", cnt, val);
}
}
#endif
@@ -958,7 +937,7 @@ static irqreturn_t armv7pmu_handle_irq(struct arm_pmu *cpu_pmu)
*/
regs = get_irq_regs();

- for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS) {
struct perf_event *event = cpuc->events[idx];
struct hw_perf_event *hwc;

@@ -1027,7 +1006,7 @@ static int armv7pmu_get_event_idx(struct pmu_hw_events *cpuc,
* For anything other than a cycle counter, try and use
* the events counters
*/
- for (idx = ARMV7_IDX_COUNTER0; idx < cpu_pmu->num_events; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX) {
if (!test_and_set_bit(idx, cpuc->used_mask))
return idx;
}
@@ -1073,7 +1052,7 @@ static int armv7pmu_set_event_filter(struct hw_perf_event *event,
static void armv7pmu_reset(void *info)
{
struct arm_pmu *cpu_pmu = (struct arm_pmu *)info;
- u32 idx, nb_cnt = cpu_pmu->num_events, val;
+ u32 idx, val;

if (cpu_pmu->secure_access) {
asm volatile("mrc p15, 0, %0, c1, c1, 1" : "=r" (val));
@@ -1082,7 +1061,7 @@ static void armv7pmu_reset(void *info)
}

/* The counter and interrupt enable registers are unknown at reset. */
- for (idx = ARMV7_IDX_CYCLE_COUNTER; idx < nb_cnt; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS) {
armv7_pmnc_disable_counter(idx);
armv7_pmnc_disable_intens(idx);
}
@@ -1161,20 +1140,22 @@ static void armv7pmu_init(struct arm_pmu *cpu_pmu)

static void armv7_read_num_pmnc_events(void *info)
{
- int *nb_cnt = info;
+ int nb_cnt;
+ struct arm_pmu *cpu_pmu = info;

/* Read the nb of CNTx counters supported from PMNC */
- *nb_cnt = (armv7_pmnc_read() >> ARMV7_PMNC_N_SHIFT) & ARMV7_PMNC_N_MASK;
+ nb_cnt = (armv7_pmnc_read() >> ARMV7_PMNC_N_SHIFT) & ARMV7_PMNC_N_MASK;
+ bitmap_set(cpu_pmu->cntr_mask, 0, nb_cnt);

/* Add the CPU cycles counter */
- *nb_cnt += 1;
+ bitmap_set(cpu_pmu->cntr_mask, ARMV7_IDX_CYCLE_COUNTER, 1);
}

static int armv7_probe_num_events(struct arm_pmu *arm_pmu)
{
return smp_call_function_any(&arm_pmu->supported_cpus,
armv7_read_num_pmnc_events,
- &arm_pmu->num_events, 1);
+ arm_pmu, 1);
}

static int armv7_a8_pmu_init(struct arm_pmu *cpu_pmu)
@@ -1524,7 +1505,7 @@ static void krait_pmu_reset(void *info)
{
u32 vval, fval;
struct arm_pmu *cpu_pmu = info;
- u32 idx, nb_cnt = cpu_pmu->num_events;
+ u32 idx;

armv7pmu_reset(info);

@@ -1538,7 +1519,7 @@ static void krait_pmu_reset(void *info)
venum_post_pmresr(vval, fval);

/* Reset PMxEVNCTCR to sane default */
- for (idx = ARMV7_IDX_CYCLE_COUNTER; idx < nb_cnt; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX) {
armv7_pmnc_select_counter(idx);
asm volatile("mcr p15, 0, %0, c9, c15, 0" : : "r" (0));
}
@@ -1562,7 +1543,7 @@ static int krait_event_to_bit(struct perf_event *event, unsigned int region,
* Lower bits are reserved for use by the counters (see
* armv7pmu_get_event_idx() for more info)
*/
- bit += ARMV7_IDX_COUNTER_LAST(cpu_pmu) + 1;
+ bit += bitmap_weight(cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX);

return bit;
}
@@ -1845,7 +1826,7 @@ static void scorpion_pmu_reset(void *info)
{
u32 vval, fval;
struct arm_pmu *cpu_pmu = info;
- u32 idx, nb_cnt = cpu_pmu->num_events;
+ u32 idx;

armv7pmu_reset(info);

@@ -1860,7 +1841,7 @@ static void scorpion_pmu_reset(void *info)
venum_post_pmresr(vval, fval);

/* Reset PMxEVNCTCR to sane default */
- for (idx = ARMV7_IDX_CYCLE_COUNTER; idx < nb_cnt; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX) {
armv7_pmnc_select_counter(idx);
asm volatile("mcr p15, 0, %0, c9, c15, 0" : : "r" (0));
}
@@ -1883,7 +1864,7 @@ static int scorpion_event_to_bit(struct perf_event *event, unsigned int region,
* Lower bits are reserved for use by the counters (see
* armv7pmu_get_event_idx() for more info)
*/
- bit += ARMV7_IDX_COUNTER_LAST(cpu_pmu) + 1;
+ bit += bitmap_weight(cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX);

return bit;
}
diff --git a/drivers/perf/arm_xscale_pmu.c b/drivers/perf/arm_xscale_pmu.c
index 3d8b72d6b37f..e075df521350 100644
--- a/drivers/perf/arm_xscale_pmu.c
+++ b/drivers/perf/arm_xscale_pmu.c
@@ -52,6 +52,8 @@ enum xscale_counters {
XSCALE_COUNTER1,
XSCALE_COUNTER2,
XSCALE_COUNTER3,
+ XSCALE2_NUM_COUNTERS,
+ XSCALE_NUM_COUNTERS = 3,
};

static const unsigned xscale_perf_map[PERF_COUNT_HW_MAX] = {
@@ -168,7 +170,7 @@ xscale1pmu_handle_irq(struct arm_pmu *cpu_pmu)

regs = get_irq_regs();

- for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, XSCALE_NUM_COUNTERS) {
struct perf_event *event = cpuc->events[idx];
struct hw_perf_event *hwc;

@@ -364,7 +366,8 @@ static int xscale1pmu_init(struct arm_pmu *cpu_pmu)
cpu_pmu->start = xscale1pmu_start;
cpu_pmu->stop = xscale1pmu_stop;
cpu_pmu->map_event = xscale_map_event;
- cpu_pmu->num_events = 3;
+
+ bitmap_set(cpu_pmu->cntr_mask, 0, XSCALE_NUM_COUNTERS);

return 0;
}
@@ -500,7 +503,7 @@ xscale2pmu_handle_irq(struct arm_pmu *cpu_pmu)

regs = get_irq_regs();

- for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
+ for_each_set_bit(idx, cpu_pmu->cntr_mask, XSCALE2_NUM_COUNTERS) {
struct perf_event *event = cpuc->events[idx];
struct hw_perf_event *hwc;

@@ -719,7 +722,8 @@ static int xscale2pmu_init(struct arm_pmu *cpu_pmu)
cpu_pmu->start = xscale2pmu_start;
cpu_pmu->stop = xscale2pmu_stop;
cpu_pmu->map_event = xscale_map_event;
- cpu_pmu->num_events = 5;
+
+ bitmap_set(cpu_pmu->cntr_mask, 0, XSCALE2_NUM_COUNTERS);

return 0;
}
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index b3b34f6670cf..e5d6d204beab 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -96,7 +96,7 @@ struct arm_pmu {
void (*stop)(struct arm_pmu *);
void (*reset)(void *);
int (*map_event)(struct perf_event *event);
- int num_events;
+ DECLARE_BITMAP(cntr_mask, ARMPMU_MAX_HWEVENTS);
bool secure_access; /* 32-bit ARM only */
#define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);

--
2.43.0


2024-06-08 19:38:36

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 3/9] perf: arm_pmu: Remove event index to counter remapping

Hi Rob,

kernel test robot noticed the following build errors:

[auto build test ERROR on 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0]

url: https://github.com/intel-lab-lkp/linux/commits/Rob-Herring-Arm/perf-arm-Move-32-bit-PMU-drivers-to-drivers-perf/20240608-043509
base: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
patch link: https://lore.kernel.org/r/20240607-arm-pmu-3-9-icntr-v1-3-c7bd2dceff3b%40kernel.org
patch subject: [PATCH 3/9] perf: arm_pmu: Remove event index to counter remapping
config: arm64-allyesconfig (https://download.01.org/0day-ci/archive/20240609/[email protected]/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project d7d2d4f53fc79b4b58e8d8d08151b577c3699d4a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240609/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

| ^~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:27:22: note: expanded from macro 'ONLY_2_4_6'
27 | #define ONLY_2_4_6 (BIT(2) | BIT(4) | BIT(6))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:99:32: note: previous initialization is here
99 | [0 ... M1_PMU_PERFCTR_LAST] = ANY_BUT_0_1,
| ^~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:25:23: note: expanded from macro 'ANY_BUT_0_1'
25 | #define ANY_BUT_0_1 GENMASK(9, 2)
| ^~~~~~~~~~~~~
include/linux/bits.h:35:2: note: expanded from macro 'GENMASK'
35 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:128:32: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
128 | [M1_PMU_PERFCTR_UNKNOWN_f6] = ONLY_2_4_6,
| ^~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:27:22: note: expanded from macro 'ONLY_2_4_6'
27 | #define ONLY_2_4_6 (BIT(2) | BIT(4) | BIT(6))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:99:32: note: previous initialization is here
99 | [0 ... M1_PMU_PERFCTR_LAST] = ANY_BUT_0_1,
| ^~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:25:23: note: expanded from macro 'ANY_BUT_0_1'
25 | #define ANY_BUT_0_1 GENMASK(9, 2)
| ^~~~~~~~~~~~~
include/linux/bits.h:35:2: note: expanded from macro 'GENMASK'
35 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:129:32: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
129 | [M1_PMU_PERFCTR_UNKNOWN_f7] = ONLY_2_4_6,
| ^~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:27:22: note: expanded from macro 'ONLY_2_4_6'
27 | #define ONLY_2_4_6 (BIT(2) | BIT(4) | BIT(6))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:99:32: note: previous initialization is here
99 | [0 ... M1_PMU_PERFCTR_LAST] = ANY_BUT_0_1,
| ^~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:25:23: note: expanded from macro 'ANY_BUT_0_1'
25 | #define ANY_BUT_0_1 GENMASK(9, 2)
| ^~~~~~~~~~~~~
include/linux/bits.h:35:2: note: expanded from macro 'GENMASK'
35 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:130:32: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
130 | [M1_PMU_PERFCTR_UNKNOWN_f8] = ONLY_2_TO_7,
| ^~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:26:23: note: expanded from macro 'ONLY_2_TO_7'
26 | #define ONLY_2_TO_7 GENMASK(7, 2)
| ^~~~~~~~~~~~~
include/linux/bits.h:35:2: note: expanded from macro 'GENMASK'
35 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:99:32: note: previous initialization is here
99 | [0 ... M1_PMU_PERFCTR_LAST] = ANY_BUT_0_1,
| ^~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:25:23: note: expanded from macro 'ANY_BUT_0_1'
25 | #define ANY_BUT_0_1 GENMASK(9, 2)
| ^~~~~~~~~~~~~
include/linux/bits.h:35:2: note: expanded from macro 'GENMASK'
35 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:131:32: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
131 | [M1_PMU_PERFCTR_UNKNOWN_fd] = ONLY_2_4_6,
| ^~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:27:22: note: expanded from macro 'ONLY_2_4_6'
27 | #define ONLY_2_4_6 (BIT(2) | BIT(4) | BIT(6))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:99:32: note: previous initialization is here
99 | [0 ... M1_PMU_PERFCTR_LAST] = ANY_BUT_0_1,
| ^~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:25:23: note: expanded from macro 'ANY_BUT_0_1'
25 | #define ANY_BUT_0_1 GENMASK(9, 2)
| ^~~~~~~~~~~~~
include/linux/bits.h:35:2: note: expanded from macro 'GENMASK'
35 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:136:31: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
136 | [PERF_COUNT_HW_CPU_CYCLES] = M1_PMU_PERFCTR_CPU_CYCLES,
| ^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:135:2: note: previous initialization is here
135 | PERF_MAP_ALL_UNSUPPORTED,
| ^~~~~~~~~~~~~~~~~~~~~~~~
include/linux/perf/arm_pmu.h:40:34: note: expanded from macro 'PERF_MAP_ALL_UNSUPPORTED'
40 | [0 ... PERF_COUNT_HW_MAX - 1] = HW_OP_UNSUPPORTED
| ^~~~~~~~~~~~~~~~~
include/linux/perf/arm_pmu.h:35:28: note: expanded from macro 'HW_OP_UNSUPPORTED'
35 | #define HW_OP_UNSUPPORTED 0xFFFF
| ^~~~~~
drivers/perf/apple_m1_cpu_pmu.c:137:33: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
137 | [PERF_COUNT_HW_INSTRUCTIONS] = M1_PMU_PERFCTR_INSTRUCTIONS,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/perf/apple_m1_cpu_pmu.c:135:2: note: previous initialization is here
135 | PERF_MAP_ALL_UNSUPPORTED,
| ^~~~~~~~~~~~~~~~~~~~~~~~
include/linux/perf/arm_pmu.h:40:34: note: expanded from macro 'PERF_MAP_ALL_UNSUPPORTED'
40 | [0 ... PERF_COUNT_HW_MAX - 1] = HW_OP_UNSUPPORTED
| ^~~~~~~~~~~~~~~~~
include/linux/perf/arm_pmu.h:35:28: note: expanded from macro 'HW_OP_UNSUPPORTED'
35 | #define HW_OP_UNSUPPORTED 0xFFFF
| ^~~~~~
>> drivers/perf/apple_m1_cpu_pmu.c:403:31: error: no member named 'num_events' in 'struct arm_pmu'; did you mean 'hw_events'?
403 | for (idx = 0; idx < cpu_pmu->num_events; idx++) {
| ^~~~~~~~~~
| hw_events
include/linux/perf/arm_pmu.h:106:33: note: 'hw_events' declared here
106 | struct pmu_hw_events __percpu *hw_events;
| ^
drivers/perf/apple_m1_cpu_pmu.c:563:11: error: no member named 'num_events' in 'struct arm_pmu'; did you mean 'hw_events'?
563 | cpu_pmu->num_events = M1_PMU_NR_COUNTERS;
| ^~~~~~~~~~
| hw_events
include/linux/perf/arm_pmu.h:106:33: note: 'hw_events' declared here
106 | struct pmu_hw_events __percpu *hw_events;
| ^
39 warnings and 2 errors generated.


vim +403 drivers/perf/apple_m1_cpu_pmu.c

a639027a1be1d6 Marc Zyngier 2022-02-08 381
a639027a1be1d6 Marc Zyngier 2022-02-08 382 static irqreturn_t m1_pmu_handle_irq(struct arm_pmu *cpu_pmu)
a639027a1be1d6 Marc Zyngier 2022-02-08 383 {
a639027a1be1d6 Marc Zyngier 2022-02-08 384 struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
a639027a1be1d6 Marc Zyngier 2022-02-08 385 struct pt_regs *regs;
a639027a1be1d6 Marc Zyngier 2022-02-08 386 u64 overflow, state;
a639027a1be1d6 Marc Zyngier 2022-02-08 387 int idx;
a639027a1be1d6 Marc Zyngier 2022-02-08 388
a639027a1be1d6 Marc Zyngier 2022-02-08 389 overflow = read_sysreg_s(SYS_IMP_APL_PMSR_EL1);
a639027a1be1d6 Marc Zyngier 2022-02-08 390 if (!overflow) {
a639027a1be1d6 Marc Zyngier 2022-02-08 391 /* Spurious interrupt? */
a639027a1be1d6 Marc Zyngier 2022-02-08 392 state = read_sysreg_s(SYS_IMP_APL_PMCR0_EL1);
a639027a1be1d6 Marc Zyngier 2022-02-08 393 state &= ~PMCR0_IACT;
a639027a1be1d6 Marc Zyngier 2022-02-08 394 write_sysreg_s(state, SYS_IMP_APL_PMCR0_EL1);
a639027a1be1d6 Marc Zyngier 2022-02-08 395 isb();
a639027a1be1d6 Marc Zyngier 2022-02-08 396 return IRQ_NONE;
a639027a1be1d6 Marc Zyngier 2022-02-08 397 }
a639027a1be1d6 Marc Zyngier 2022-02-08 398
a639027a1be1d6 Marc Zyngier 2022-02-08 399 cpu_pmu->stop(cpu_pmu);
a639027a1be1d6 Marc Zyngier 2022-02-08 400
a639027a1be1d6 Marc Zyngier 2022-02-08 401 regs = get_irq_regs();
a639027a1be1d6 Marc Zyngier 2022-02-08 402
a639027a1be1d6 Marc Zyngier 2022-02-08 @403 for (idx = 0; idx < cpu_pmu->num_events; idx++) {
a639027a1be1d6 Marc Zyngier 2022-02-08 404 struct perf_event *event = cpuc->events[idx];
a639027a1be1d6 Marc Zyngier 2022-02-08 405 struct perf_sample_data data;
a639027a1be1d6 Marc Zyngier 2022-02-08 406
a639027a1be1d6 Marc Zyngier 2022-02-08 407 if (!event)
a639027a1be1d6 Marc Zyngier 2022-02-08 408 continue;
a639027a1be1d6 Marc Zyngier 2022-02-08 409
a639027a1be1d6 Marc Zyngier 2022-02-08 410 armpmu_event_update(event);
a639027a1be1d6 Marc Zyngier 2022-02-08 411 perf_sample_data_init(&data, 0, event->hw.last_period);
a639027a1be1d6 Marc Zyngier 2022-02-08 412 if (!armpmu_event_set_period(event))
a639027a1be1d6 Marc Zyngier 2022-02-08 413 continue;
a639027a1be1d6 Marc Zyngier 2022-02-08 414
a639027a1be1d6 Marc Zyngier 2022-02-08 415 if (perf_event_overflow(event, &data, regs))
a639027a1be1d6 Marc Zyngier 2022-02-08 416 m1_pmu_disable_event(event);
a639027a1be1d6 Marc Zyngier 2022-02-08 417 }
a639027a1be1d6 Marc Zyngier 2022-02-08 418
a639027a1be1d6 Marc Zyngier 2022-02-08 419 cpu_pmu->start(cpu_pmu);
a639027a1be1d6 Marc Zyngier 2022-02-08 420
a639027a1be1d6 Marc Zyngier 2022-02-08 421 return IRQ_HANDLED;
a639027a1be1d6 Marc Zyngier 2022-02-08 422 }
a639027a1be1d6 Marc Zyngier 2022-02-08 423

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2024-06-10 09:24:16

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 1/9] perf/arm: Move 32-bit PMU drivers to drivers/perf/

On Fri, Jun 07, 2024 at 02:31:26PM -0600, Rob Herring (Arm) wrote:
> It is preferred to put drivers under drivers/ rather than under arch/.
> The PMU drivers also depend on arm_pmu.c, so it's better to place them
> all together.
>
> Signed-off-by: Rob Herring (Arm) <[email protected]>

Acked-by: Mark Rutland <[email protected]>

Mark.

> ---
> arch/arm/kernel/Makefile | 2 --
> drivers/perf/Kconfig | 12 ++++++++++++
> drivers/perf/Makefile | 3 +++
> arch/arm/kernel/perf_event_v6.c => drivers/perf/arm_v6_pmu.c | 3 ---
> arch/arm/kernel/perf_event_v7.c => drivers/perf/arm_v7_pmu.c | 3 ---
> .../perf_event_xscale.c => drivers/perf/arm_xscale_pmu.c | 3 ---
> 6 files changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
> index 89a77e3f51d2..aaae31b8c4a5 100644
> --- a/arch/arm/kernel/Makefile
> +++ b/arch/arm/kernel/Makefile
> @@ -78,8 +78,6 @@ obj-$(CONFIG_CPU_XSC3) += xscale-cp0.o
> obj-$(CONFIG_CPU_MOHAWK) += xscale-cp0.o
> obj-$(CONFIG_IWMMXT) += iwmmxt.o
> obj-$(CONFIG_PERF_EVENTS) += perf_regs.o perf_callchain.o
> -obj-$(CONFIG_HW_PERF_EVENTS) += perf_event_xscale.o perf_event_v6.o \
> - perf_event_v7.o
> AFLAGS_iwmmxt.o := -Wa,-mcpu=iwmmxt
> obj-$(CONFIG_ARM_CPU_TOPOLOGY) += topology.o
> obj-$(CONFIG_VDSO) += vdso.o
> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> index 7526a9e714fa..aa9530b4064f 100644
> --- a/drivers/perf/Kconfig
> +++ b/drivers/perf/Kconfig
> @@ -56,6 +56,18 @@ config ARM_PMU
> Say y if you want to use CPU performance monitors on ARM-based
> systems.
>
> +config ARM_V6_PMU
> + depends on ARM_PMU && (CPU_V6 || CPU_V6K)
> + def_bool y
> +
> +config ARM_V7_PMU
> + depends on ARM_PMU && CPU_V7
> + def_bool y
> +
> +config ARM_XSCALE_PMU
> + depends on ARM_PMU && CPU_XSCALE
> + def_bool y
> +
> config RISCV_PMU
> depends on RISCV
> bool "RISC-V PMU framework"
> diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> index 29b1c28203ef..d43df81d52f7 100644
> --- a/drivers/perf/Makefile
> +++ b/drivers/perf/Makefile
> @@ -6,6 +6,9 @@ obj-$(CONFIG_ARM_DSU_PMU) += arm_dsu_pmu.o
> obj-$(CONFIG_ARM_PMU) += arm_pmu.o arm_pmu_platform.o
> obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
> obj-$(CONFIG_ARM_PMUV3) += arm_pmuv3.o
> +obj-$(CONFIG_ARM_V6_PMU) += arm_v6_pmu.o
> +obj-$(CONFIG_ARM_V7_PMU) += arm_v7_pmu.o
> +obj-$(CONFIG_ARM_XSCALE_PMU) += arm_xscale_pmu.o
> obj-$(CONFIG_ARM_SMMU_V3_PMU) += arm_smmuv3_pmu.o
> obj-$(CONFIG_FSL_IMX8_DDR_PMU) += fsl_imx8_ddr_perf.o
> obj-$(CONFIG_FSL_IMX9_DDR_PMU) += fsl_imx9_ddr_perf.o
> diff --git a/arch/arm/kernel/perf_event_v6.c b/drivers/perf/arm_v6_pmu.c
> similarity index 99%
> rename from arch/arm/kernel/perf_event_v6.c
> rename to drivers/perf/arm_v6_pmu.c
> index d9fd53841591..f7593843bb85 100644
> --- a/arch/arm/kernel/perf_event_v6.c
> +++ b/drivers/perf/arm_v6_pmu.c
> @@ -31,8 +31,6 @@
> * enable the interrupt.
> */
>
> -#if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K)
> -
> #include <asm/cputype.h>
> #include <asm/irq_regs.h>
>
> @@ -445,4 +443,3 @@ static struct platform_driver armv6_pmu_driver = {
> };
>
> builtin_platform_driver(armv6_pmu_driver);
> -#endif /* CONFIG_CPU_V6 || CONFIG_CPU_V6K */
> diff --git a/arch/arm/kernel/perf_event_v7.c b/drivers/perf/arm_v7_pmu.c
> similarity index 99%
> rename from arch/arm/kernel/perf_event_v7.c
> rename to drivers/perf/arm_v7_pmu.c
> index a3322e2b3ea4..fdd936fbd188 100644
> --- a/arch/arm/kernel/perf_event_v7.c
> +++ b/drivers/perf/arm_v7_pmu.c
> @@ -17,8 +17,6 @@
> * counter and all 4 performance counters together can be reset separately.
> */
>
> -#ifdef CONFIG_CPU_V7
> -
> #include <asm/cp15.h>
> #include <asm/cputype.h>
> #include <asm/irq_regs.h>
> @@ -2002,4 +2000,3 @@ static struct platform_driver armv7_pmu_driver = {
> };
>
> builtin_platform_driver(armv7_pmu_driver);
> -#endif /* CONFIG_CPU_V7 */
> diff --git a/arch/arm/kernel/perf_event_xscale.c b/drivers/perf/arm_xscale_pmu.c
> similarity index 99%
> rename from arch/arm/kernel/perf_event_xscale.c
> rename to drivers/perf/arm_xscale_pmu.c
> index 7a2ba1c689a7..3d8b72d6b37f 100644
> --- a/arch/arm/kernel/perf_event_xscale.c
> +++ b/drivers/perf/arm_xscale_pmu.c
> @@ -13,8 +13,6 @@
> * PMU structures.
> */
>
> -#ifdef CONFIG_CPU_XSCALE
> -
> #include <asm/cputype.h>
> #include <asm/irq_regs.h>
>
> @@ -745,4 +743,3 @@ static struct platform_driver xscale_pmu_driver = {
> };
>
> builtin_platform_driver(xscale_pmu_driver);
> -#endif /* CONFIG_CPU_XSCALE */
>
> --
> 2.43.0
>
>

2024-06-10 09:30:33

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 2/9] perf: arm_v6/7_pmu: Drop non-DT probe support

On Fri, Jun 07, 2024 at 02:31:27PM -0600, Rob Herring (Arm) wrote:
> There are no non-DT based PMU users for v6 or v7, so drop the custom
> non-DT probe table.
>
> Note that this drops support for arm1156 PMU, but there are no arm1156
> based systems supported in the kernel.
>
> Signed-off-by: Rob Herring (Arm) <[email protected]>

It might be worth adding an explciit note that xscale still has non-DT
support, and hence we need to retain the infrastructure for that.

Regardless:

Acked-by: Mark Rutland <[email protected]>

Mark.

> ---
> drivers/perf/arm_v6_pmu.c | 17 +----------------
> drivers/perf/arm_v7_pmu.c | 10 +---------
> 2 files changed, 2 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/perf/arm_v6_pmu.c b/drivers/perf/arm_v6_pmu.c
> index f7593843bb85..0bb685b4bac5 100644
> --- a/drivers/perf/arm_v6_pmu.c
> +++ b/drivers/perf/arm_v6_pmu.c
> @@ -401,13 +401,6 @@ static int armv6_1136_pmu_init(struct arm_pmu *cpu_pmu)
> return 0;
> }
>
> -static int armv6_1156_pmu_init(struct arm_pmu *cpu_pmu)
> -{
> - armv6pmu_init(cpu_pmu);
> - cpu_pmu->name = "armv6_1156";
> - return 0;
> -}
> -
> static int armv6_1176_pmu_init(struct arm_pmu *cpu_pmu)
> {
> armv6pmu_init(cpu_pmu);
> @@ -421,17 +414,9 @@ static const struct of_device_id armv6_pmu_of_device_ids[] = {
> { /* sentinel value */ }
> };
>
> -static const struct pmu_probe_info armv6_pmu_probe_table[] = {
> - ARM_PMU_PROBE(ARM_CPU_PART_ARM1136, armv6_1136_pmu_init),
> - ARM_PMU_PROBE(ARM_CPU_PART_ARM1156, armv6_1156_pmu_init),
> - ARM_PMU_PROBE(ARM_CPU_PART_ARM1176, armv6_1176_pmu_init),
> - { /* sentinel value */ }
> -};
> -
> static int armv6_pmu_device_probe(struct platform_device *pdev)
> {
> - return arm_pmu_device_probe(pdev, armv6_pmu_of_device_ids,
> - armv6_pmu_probe_table);
> + return arm_pmu_device_probe(pdev, armv6_pmu_of_device_ids, NULL);
> }
>
> static struct platform_driver armv6_pmu_driver = {
> diff --git a/drivers/perf/arm_v7_pmu.c b/drivers/perf/arm_v7_pmu.c
> index fdd936fbd188..928ac3d626ed 100644
> --- a/drivers/perf/arm_v7_pmu.c
> +++ b/drivers/perf/arm_v7_pmu.c
> @@ -1977,17 +1977,9 @@ static const struct of_device_id armv7_pmu_of_device_ids[] = {
> {},
> };
>
> -static const struct pmu_probe_info armv7_pmu_probe_table[] = {
> - ARM_PMU_PROBE(ARM_CPU_PART_CORTEX_A8, armv7_a8_pmu_init),
> - ARM_PMU_PROBE(ARM_CPU_PART_CORTEX_A9, armv7_a9_pmu_init),
> - { /* sentinel value */ }
> -};
> -
> -
> static int armv7_pmu_device_probe(struct platform_device *pdev)
> {
> - return arm_pmu_device_probe(pdev, armv7_pmu_of_device_ids,
> - armv7_pmu_probe_table);
> + return arm_pmu_device_probe(pdev, armv7_pmu_of_device_ids, NULL);
> }
>
> static struct platform_driver armv7_pmu_driver = {
>
> --
> 2.43.0
>

2024-06-10 10:48:57

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 3/9] perf: arm_pmu: Remove event index to counter remapping

On Fri, Jun 07, 2024 at 02:31:28PM -0600, Rob Herring (Arm) wrote:
> Xscale and Armv6 PMUs defined the cycle counter at 0 and event counters
> starting at 1 and had 1:1 event index to counter numbering. On Armv7 and
> later, this changed the cycle counter to 31 and event counters start at
> 0. The drivers for Armv7 and PMUv3 kept the old event index numbering
> and introduced an event index to counter conversion. The conversion uses
> masking to convert from event index to a counter number. This operation
> relies on having at most 32 counters so that the cycle counter index 0
> can be transformed to counter number 31.
>
> Armv9.4 adds support for an additional fixed function counter
> (instructions) which increases possible counters to more than 32, and
> the conversion won't work anymore as a simple subtract and mask. The
> primary reason for the translation (other than history) seems to be to
> have a contiguous mask of counters 0-N. Keeping that would result in
> more complicated index to counter conversions. Instead, store a mask of
> available counters rather than just number of events. That provides more
> information in addition to the number of events.
>
> No (intended) functional changes.
>
> Signed-off-by: Rob Herring (Arm) <[email protected]>
> ---
> arch/arm64/kvm/pmu-emul.c | 6 ++--
> drivers/perf/arm_pmu.c | 11 ++++---
> drivers/perf/arm_pmuv3.c | 57 ++++++++++----------------------
> drivers/perf/arm_v6_pmu.c | 6 ++--
> drivers/perf/arm_v7_pmu.c | 77 ++++++++++++++++---------------------------
> drivers/perf/arm_xscale_pmu.c | 12 ++++---
> include/linux/perf/arm_pmu.h | 2 +-
> 7 files changed, 69 insertions(+), 102 deletions(-)

This looks like a nice cleanup!

As the test robot reports, it looks like this missed
drivers/perf/apple_m1_cpu_pmu.c, but IIUC that's simple enough to fix
up.

Otherwise, I have a few minor comments below,

> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index 23fa6c5da82c..80202346fc7a 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -451,9 +451,7 @@ static const struct attribute_group armv8_pmuv3_caps_attr_group = {
> /*
> * Perf Events' indices
> */
> -#define ARMV8_IDX_CYCLE_COUNTER 0
> -#define ARMV8_IDX_COUNTER0 1
> -#define ARMV8_IDX_CYCLE_COUNTER_USER 32
> +#define ARMV8_IDX_CYCLE_COUNTER 31

I was going to ask whether this affected the ABI, but I see from below
that armv8pmu_user_event_idx() will now always offset the counter by
one rather than special-casing the cycle counter, and this gives us the
same behavior as before.

[...]

> @@ -783,7 +767,7 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
> struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
>
> /* Clear any unused counters to avoid leaking their contents */
> - for_each_clear_bit(i, cpuc->used_mask, cpu_pmu->num_events) {
> + for_each_clear_bit(i, cpuc->used_mask, ARMPMU_MAX_HWEVENTS) {
> if (i == ARMV8_IDX_CYCLE_COUNTER)
> write_pmccntr(0);
> else

IIUC this will now hit all unimplemented counters; e.g. for N counters the body
will run for counters N..31, and the else case has:

armv8pmu_write_evcntr(i, 0);

... where the resulting write to PMEVCNTR<n>_EL0 for unimplemented
counters is CONSTRAINED UNPREDICTABLE and might be UNDEFINED.

We can fix that with for_each_andnot_bit(), e.g.

for_each_andnot_bit(i, cpu_pmu->cntr_mask, cpuc->used_mask,
ARMPMU_MAX_HWEVENTS) {
if (i == ARMV8_IDX_CYCLE_COUNTER)
write_pmccntr(0);
else
armv8pmu_write_evcntr(i, 0);
}

[...]

> @@ -905,7 +889,7 @@ static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
> {
> int idx;
>
> - for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; idx++) {
> + for_each_set_bit(idx, cpu_pmu->cntr_mask, 31) {
> if (!test_and_set_bit(idx, cpuc->used_mask))
> return idx;
> }
> @@ -921,7 +905,9 @@ static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
> * Chaining requires two consecutive event counters, where
> * the lower idx must be even.
> */
> - for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2) {
> + for_each_set_bit(idx, cpu_pmu->cntr_mask, 31) {
> + if (!(idx & 0x1))
> + continue;
> if (!test_and_set_bit(idx, cpuc->used_mask)) {
> /* Check if the preceding even counter is available */
> if (!test_and_set_bit(idx - 1, cpuc->used_mask))

It would be nice to replace those instances of '31' with something
indicating that this was only covering the generic/programmable
counters, but I wasn't able to come up with a nice mnemonic for that.
The best I could think of was:

#define ARMV8_MAX_NR_GENERIC_COUNTERS 31

Maybe it makes sense to define that along with ARMV8_IDX_CYCLE_COUNTER.

> @@ -974,15 +960,7 @@ static int armv8pmu_user_event_idx(struct perf_event *event)
> if (!sysctl_perf_user_access || !armv8pmu_event_has_user_read(event))
> return 0;
>
> - /*
> - * We remap the cycle counter index to 32 to
> - * match the offset applied to the rest of
> - * the counter indices.
> - */
> - if (event->hw.idx == ARMV8_IDX_CYCLE_COUNTER)
> - return ARMV8_IDX_CYCLE_COUNTER_USER;
> -
> - return event->hw.idx;
> + return event->hw.idx + 1;
> }

[...]

> static void armv7_read_num_pmnc_events(void *info)
> {
> - int *nb_cnt = info;
> + int nb_cnt;
> + struct arm_pmu *cpu_pmu = info;
>
> /* Read the nb of CNTx counters supported from PMNC */
> - *nb_cnt = (armv7_pmnc_read() >> ARMV7_PMNC_N_SHIFT) & ARMV7_PMNC_N_MASK;
> + nb_cnt = (armv7_pmnc_read() >> ARMV7_PMNC_N_SHIFT) & ARMV7_PMNC_N_MASK;
> + bitmap_set(cpu_pmu->cntr_mask, 0, nb_cnt);
>
> /* Add the CPU cycles counter */
> - *nb_cnt += 1;
> + bitmap_set(cpu_pmu->cntr_mask, ARMV7_IDX_CYCLE_COUNTER, 1);

This can be:

set_bit(cpu_pmu->cntr_mask, ARMV7_IDX_CYCLE_COUNTER);

... and likewise for the PMUv3 version.

[...]

> diff --git a/drivers/perf/arm_xscale_pmu.c b/drivers/perf/arm_xscale_pmu.c
> index 3d8b72d6b37f..e075df521350 100644
> --- a/drivers/perf/arm_xscale_pmu.c
> +++ b/drivers/perf/arm_xscale_pmu.c
> @@ -52,6 +52,8 @@ enum xscale_counters {
> XSCALE_COUNTER1,
> XSCALE_COUNTER2,
> XSCALE_COUNTER3,
> + XSCALE2_NUM_COUNTERS,
> + XSCALE_NUM_COUNTERS = 3,
> };

Minor nit, but for consistency with other xscale1-only definitions, it'd
be good to s/XSCALE_NUM_COUNTERS/XSCALE1_NUM_COUNTERS/.

While it'd be different fro mthe other PMU drivers, I reckon it's
clearer to pull those out as:

#define XSCALE1_NUM_COUNTERS 3
#define XSCALE2_NUM_COUNTERS 5

Mark.

2024-06-10 11:02:38

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 5/9] KVM: arm64: pmu: Use arm_pmuv3.h register accessors

On Fri, Jun 07, 2024 at 02:31:30PM -0600, Rob Herring (Arm) wrote:
> Commit df29ddf4f04b ("arm64: perf: Abstract system register accesses
> away") split off PMU register accessor functions to a standalone header.
> Let's use it for KVM PMU code and get rid one copy of the ugly switch
> macro.
>
> Signed-off-by: Rob Herring (Arm) <[email protected]>
> ---
> arch/arm64/include/asm/arm_pmuv3.h | 13 ++++++++
> arch/arm64/kvm/pmu.c | 67 +++++---------------------------------
> 2 files changed, 22 insertions(+), 58 deletions(-)

Acked-by: Mark Rutland <[email protected]>

Mark.

>
> diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h
> index e96ce7900fc7..1ed91334fede 100644
> --- a/arch/arm64/include/asm/arm_pmuv3.h
> +++ b/arch/arm64/include/asm/arm_pmuv3.h
> @@ -33,6 +33,14 @@ static inline void write_pmevtypern(int n, unsigned long val)
> PMEVN_SWITCH(n, WRITE_PMEVTYPERN);
> }
>
> +#define RETURN_READ_PMEVTYPERN(n) \
> + return read_sysreg(pmevtyper##n##_el0)
> +static inline unsigned long read_pmevtypern(int n)
> +{
> + PMEVN_SWITCH(n, RETURN_READ_PMEVTYPERN);
> + return 0;
> +}
> +
> static inline unsigned long read_pmmir(void)
> {
> return read_cpuid(PMMIR_EL1);
> @@ -96,6 +104,11 @@ static inline void write_pmccfiltr(u64 val)
> write_sysreg(val, pmccfiltr_el0);
> }
>
> +static inline u64 read_pmccfiltr(void)
> +{
> + return read_sysreg(pmccfiltr_el0);
> +}
> +
> static inline void write_pmovsclr(u64 val)
> {
> write_sysreg(val, pmovsclr_el0);
> diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
> index e633b4434c6a..01c9a9efdd1c 100644
> --- a/arch/arm64/kvm/pmu.c
> +++ b/arch/arm64/kvm/pmu.c
> @@ -6,6 +6,8 @@
> #include <linux/kvm_host.h>
> #include <linux/perf_event.h>
>
> +#include <asm/arm_pmuv3.h>
> +
> static DEFINE_PER_CPU(struct kvm_pmu_events, kvm_pmu_events);
>
> /*
> @@ -62,63 +64,16 @@ void kvm_clr_pmu_events(u64 clr)
> pmu->events_guest &= ~clr;
> }
>
> -#define PMEVTYPER_READ_CASE(idx) \
> - case idx: \
> - return read_sysreg(pmevtyper##idx##_el0)
> -
> -#define PMEVTYPER_WRITE_CASE(idx) \
> - case idx: \
> - write_sysreg(val, pmevtyper##idx##_el0); \
> - break
> -
> -#define PMEVTYPER_CASES(readwrite) \
> - PMEVTYPER_##readwrite##_CASE(0); \
> - PMEVTYPER_##readwrite##_CASE(1); \
> - PMEVTYPER_##readwrite##_CASE(2); \
> - PMEVTYPER_##readwrite##_CASE(3); \
> - PMEVTYPER_##readwrite##_CASE(4); \
> - PMEVTYPER_##readwrite##_CASE(5); \
> - PMEVTYPER_##readwrite##_CASE(6); \
> - PMEVTYPER_##readwrite##_CASE(7); \
> - PMEVTYPER_##readwrite##_CASE(8); \
> - PMEVTYPER_##readwrite##_CASE(9); \
> - PMEVTYPER_##readwrite##_CASE(10); \
> - PMEVTYPER_##readwrite##_CASE(11); \
> - PMEVTYPER_##readwrite##_CASE(12); \
> - PMEVTYPER_##readwrite##_CASE(13); \
> - PMEVTYPER_##readwrite##_CASE(14); \
> - PMEVTYPER_##readwrite##_CASE(15); \
> - PMEVTYPER_##readwrite##_CASE(16); \
> - PMEVTYPER_##readwrite##_CASE(17); \
> - PMEVTYPER_##readwrite##_CASE(18); \
> - PMEVTYPER_##readwrite##_CASE(19); \
> - PMEVTYPER_##readwrite##_CASE(20); \
> - PMEVTYPER_##readwrite##_CASE(21); \
> - PMEVTYPER_##readwrite##_CASE(22); \
> - PMEVTYPER_##readwrite##_CASE(23); \
> - PMEVTYPER_##readwrite##_CASE(24); \
> - PMEVTYPER_##readwrite##_CASE(25); \
> - PMEVTYPER_##readwrite##_CASE(26); \
> - PMEVTYPER_##readwrite##_CASE(27); \
> - PMEVTYPER_##readwrite##_CASE(28); \
> - PMEVTYPER_##readwrite##_CASE(29); \
> - PMEVTYPER_##readwrite##_CASE(30)
> -
> /*
> * Read a value direct from PMEVTYPER<idx> where idx is 0-30
> * or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31).
> */
> static u64 kvm_vcpu_pmu_read_evtype_direct(int idx)
> {
> - switch (idx) {
> - PMEVTYPER_CASES(READ);
> - case ARMV8_PMU_CYCLE_IDX:
> - return read_sysreg(pmccfiltr_el0);
> - default:
> - WARN_ON(1);
> - }
> + if (idx == ARMV8_PMU_CYCLE_IDX)
> + return read_pmccfiltr();
>
> - return 0;
> + return read_pmevtypern(idx);
> }
>
> /*
> @@ -127,14 +82,10 @@ static u64 kvm_vcpu_pmu_read_evtype_direct(int idx)
> */
> static void kvm_vcpu_pmu_write_evtype_direct(int idx, u32 val)
> {
> - switch (idx) {
> - PMEVTYPER_CASES(WRITE);
> - case ARMV8_PMU_CYCLE_IDX:
> - write_sysreg(val, pmccfiltr_el0);
> - break;
> - default:
> - WARN_ON(1);
> - }
> + if (idx == ARMV8_PMU_CYCLE_IDX)
> + write_pmccfiltr(val);
> + else
> + write_pmevtypern(idx, val);
> }
>
> /*
>
> --
> 2.43.0
>

2024-06-10 11:10:31

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 6/9] KVM: arm64: pmu: Use generated define for PMSELR_EL0.SEL access

On Fri, Jun 07, 2024 at 02:31:31PM -0600, Rob Herring (Arm) wrote:
> ARMV8_PMU_COUNTER_MASK is really a mask for the PMSELR_EL0.SEL register
> field. Make that clear by adding a standard sysreg definition for the
> register, and using it instead.
>
> Signed-off-by: Rob Herring (Arm) <[email protected]>
> ---
> arch/arm64/include/asm/sysreg.h | 1 -
> arch/arm64/kvm/sys_regs.c | 10 +++++-----
> arch/arm64/tools/sysreg | 5 +++++
> include/linux/perf/arm_pmuv3.h | 1 -
> 4 files changed, 10 insertions(+), 7 deletions(-)

This looks good to me; I checked the reg values match those in the
latest ARM ARM (ARM DDI 0487K.a pages 9016 to 9018), and they also match
the eixsting values in the kernel. The changes to use the new mask name
and to use SYS_FIELD_GET() all look good to me.

Reviewed-by: Mark Rutland <[email protected]>
Acked-by: Mark Rutland <[email protected]>

Mark.

>
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index af3b206fa423..b0d6c33f9ecc 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -403,7 +403,6 @@
> #define SYS_PMCNTENCLR_EL0 sys_reg(3, 3, 9, 12, 2)
> #define SYS_PMOVSCLR_EL0 sys_reg(3, 3, 9, 12, 3)
> #define SYS_PMSWINC_EL0 sys_reg(3, 3, 9, 12, 4)
> -#define SYS_PMSELR_EL0 sys_reg(3, 3, 9, 12, 5)
> #define SYS_PMCEID0_EL0 sys_reg(3, 3, 9, 12, 6)
> #define SYS_PMCEID1_EL0 sys_reg(3, 3, 9, 12, 7)
> #define SYS_PMCCNTR_EL0 sys_reg(3, 3, 9, 13, 0)
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 22b45a15d068..f8b5db48ea8a 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -880,7 +880,7 @@ static u64 reset_pmevtyper(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> static u64 reset_pmselr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> {
> reset_unknown(vcpu, r);
> - __vcpu_sys_reg(vcpu, r->reg) &= ARMV8_PMU_COUNTER_MASK;
> + __vcpu_sys_reg(vcpu, r->reg) &= PMSELR_EL0_SEL_MASK;
>
> return __vcpu_sys_reg(vcpu, r->reg);
> }
> @@ -972,7 +972,7 @@ static bool access_pmselr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> else
> /* return PMSELR.SEL field */
> p->regval = __vcpu_sys_reg(vcpu, PMSELR_EL0)
> - & ARMV8_PMU_COUNTER_MASK;
> + & PMSELR_EL0_SEL_MASK;
>
> return true;
> }
> @@ -1040,8 +1040,8 @@ static bool access_pmu_evcntr(struct kvm_vcpu *vcpu,
> if (pmu_access_event_counter_el0_disabled(vcpu))
> return false;
>
> - idx = __vcpu_sys_reg(vcpu, PMSELR_EL0)
> - & ARMV8_PMU_COUNTER_MASK;
> + idx = SYS_FIELD_GET(PMSELR_EL0, SEL,
> + __vcpu_sys_reg(vcpu, PMSELR_EL0));
> } else if (r->Op2 == 0) {
> /* PMCCNTR_EL0 */
> if (pmu_access_cycle_counter_el0_disabled(vcpu))
> @@ -1091,7 +1091,7 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>
> if (r->CRn == 9 && r->CRm == 13 && r->Op2 == 1) {
> /* PMXEVTYPER_EL0 */
> - idx = __vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_PMU_COUNTER_MASK;
> + idx = SYS_FIELD_GET(PMSELR_EL0, SEL, __vcpu_sys_reg(vcpu, PMSELR_EL0));
> reg = PMEVTYPER0_EL0 + idx;
> } else if (r->CRn == 14 && (r->CRm & 12) == 12) {
> idx = ((r->CRm & 3) << 3) | (r->Op2 & 7);
> diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
> index a4c1dd4741a4..231817a379b5 100644
> --- a/arch/arm64/tools/sysreg
> +++ b/arch/arm64/tools/sysreg
> @@ -2153,6 +2153,11 @@ Field 4 P
> Field 3:0 ALIGN
> EndSysreg
>
> +Sysreg PMSELR_EL0 3 3 9 12 5
> +Res0 63:5
> +Field 4:0 SEL
> +EndSysreg
> +
> SysregFields CONTEXTIDR_ELx
> Res0 63:32
> Field 31:0 PROCID
> diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
> index 46377e134d67..caa09241ad4f 100644
> --- a/include/linux/perf/arm_pmuv3.h
> +++ b/include/linux/perf/arm_pmuv3.h
> @@ -7,7 +7,6 @@
> #define __PERF_ARM_PMUV3_H
>
> #define ARMV8_PMU_MAX_COUNTERS 32
> -#define ARMV8_PMU_COUNTER_MASK (ARMV8_PMU_MAX_COUNTERS - 1)
>
> /*
> * Common architectural and microarchitectural event numbers.
>
> --
> 2.43.0
>

2024-06-10 11:22:57

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 4/9] perf: arm_pmuv3: Prepare for more than 32 counters

On Fri, Jun 07, 2024 at 02:31:29PM -0600, Rob Herring (Arm) wrote:
> Various PMUv3 registers which are a mask of counters are 64-bit
> registers, but the accessor functions take a u32. This has been fine as
> the upper 32-bits have been RES0 as there has been a maximum of 32
> counters prior to Armv9.4/8.9. With Armv9.4/8.9, a 33rd counter is
> added. Update the accessor functions to use a u64 instead.
>
> Signed-off-by: Rob Herring (Arm) <[email protected]>

This looks sound to me; it'd be goot to have Marc and/or Oliver comment
on the KVM bits.

Acked-by: Mark Rutland <[email protected]>

Mark.

> ---
> arch/arm64/include/asm/arm_pmuv3.h | 12 ++++++------
> arch/arm64/include/asm/kvm_host.h | 8 ++++----
> arch/arm64/kvm/pmu.c | 8 ++++----
> drivers/perf/arm_pmuv3.c | 40 ++++++++++++++++++++------------------
> include/kvm/arm_pmu.h | 4 ++--
> 5 files changed, 37 insertions(+), 35 deletions(-)
>
> diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h
> index c27404fa4418..e96ce7900fc7 100644
> --- a/arch/arm64/include/asm/arm_pmuv3.h
> +++ b/arch/arm64/include/asm/arm_pmuv3.h
> @@ -71,22 +71,22 @@ static inline u64 read_pmccntr(void)
> return read_sysreg(pmccntr_el0);
> }
>
> -static inline void write_pmcntenset(u32 val)
> +static inline void write_pmcntenset(u64 val)
> {
> write_sysreg(val, pmcntenset_el0);
> }
>
> -static inline void write_pmcntenclr(u32 val)
> +static inline void write_pmcntenclr(u64 val)
> {
> write_sysreg(val, pmcntenclr_el0);
> }
>
> -static inline void write_pmintenset(u32 val)
> +static inline void write_pmintenset(u64 val)
> {
> write_sysreg(val, pmintenset_el1);
> }
>
> -static inline void write_pmintenclr(u32 val)
> +static inline void write_pmintenclr(u64 val)
> {
> write_sysreg(val, pmintenclr_el1);
> }
> @@ -96,12 +96,12 @@ static inline void write_pmccfiltr(u64 val)
> write_sysreg(val, pmccfiltr_el0);
> }
>
> -static inline void write_pmovsclr(u32 val)
> +static inline void write_pmovsclr(u64 val)
> {
> write_sysreg(val, pmovsclr_el0);
> }
>
> -static inline u32 read_pmovsclr(void)
> +static inline u64 read_pmovsclr(void)
> {
> return read_sysreg(pmovsclr_el0);
> }
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 8170c04fde91..6243a01d9d26 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1267,12 +1267,12 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu);
> void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu);
>
> #ifdef CONFIG_KVM
> -void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
> -void kvm_clr_pmu_events(u32 clr);
> +void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr);
> +void kvm_clr_pmu_events(u64 clr);
> bool kvm_set_pmuserenr(u64 val);
> #else
> -static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
> -static inline void kvm_clr_pmu_events(u32 clr) {}
> +static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {}
> +static inline void kvm_clr_pmu_events(u64 clr) {}
> static inline bool kvm_set_pmuserenr(u64 val)
> {
> return false;
> diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
> index 329819806096..e633b4434c6a 100644
> --- a/arch/arm64/kvm/pmu.c
> +++ b/arch/arm64/kvm/pmu.c
> @@ -35,7 +35,7 @@ struct kvm_pmu_events *kvm_get_pmu_events(void)
> * Add events to track that we may want to switch at guest entry/exit
> * time.
> */
> -void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr)
> +void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr)
> {
> struct kvm_pmu_events *pmu = kvm_get_pmu_events();
>
> @@ -51,7 +51,7 @@ void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr)
> /*
> * Stop tracking events
> */
> -void kvm_clr_pmu_events(u32 clr)
> +void kvm_clr_pmu_events(u64 clr)
> {
> struct kvm_pmu_events *pmu = kvm_get_pmu_events();
>
> @@ -176,7 +176,7 @@ static void kvm_vcpu_pmu_disable_el0(unsigned long events)
> void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu)
> {
> struct kvm_pmu_events *pmu;
> - u32 events_guest, events_host;
> + u64 events_guest, events_host;
>
> if (!kvm_arm_support_pmu_v3() || !has_vhe())
> return;
> @@ -197,7 +197,7 @@ void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu)
> void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu)
> {
> struct kvm_pmu_events *pmu;
> - u32 events_guest, events_host;
> + u64 events_guest, events_host;
>
> if (!kvm_arm_support_pmu_v3() || !has_vhe())
> return;
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index 80202346fc7a..3b49144f3a58 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -502,14 +502,14 @@ static void armv8pmu_pmcr_write(u64 val)
> write_pmcr(val);
> }
>
> -static int armv8pmu_has_overflowed(u32 pmovsr)
> +static int armv8pmu_has_overflowed(u64 pmovsr)
> {
> - return pmovsr & ARMV8_PMU_OVERFLOWED_MASK;
> + return !!(pmovsr & ARMV8_PMU_OVERFLOWED_MASK);
> }
>
> -static int armv8pmu_counter_has_overflowed(u32 pmnc, int idx)
> +static int armv8pmu_counter_has_overflowed(u64 pmnc, int idx)
> {
> - return pmnc & BIT(idx);
> + return !!(pmnc & BIT(idx));
> }
>
> static u64 armv8pmu_read_evcntr(int idx)
> @@ -648,17 +648,17 @@ static void armv8pmu_write_event_type(struct perf_event *event)
> }
> }
>
> -static u32 armv8pmu_event_cnten_mask(struct perf_event *event)
> +static u64 armv8pmu_event_cnten_mask(struct perf_event *event)
> {
> int counter = event->hw.idx;
> - u32 mask = BIT(counter);
> + u64 mask = BIT(counter);
>
> if (armv8pmu_event_is_chained(event))
> mask |= BIT(counter - 1);
> return mask;
> }
>
> -static void armv8pmu_enable_counter(u32 mask)
> +static void armv8pmu_enable_counter(u64 mask)
> {
> /*
> * Make sure event configuration register writes are visible before we
> @@ -671,7 +671,7 @@ static void armv8pmu_enable_counter(u32 mask)
> static void armv8pmu_enable_event_counter(struct perf_event *event)
> {
> struct perf_event_attr *attr = &event->attr;
> - u32 mask = armv8pmu_event_cnten_mask(event);
> + u64 mask = armv8pmu_event_cnten_mask(event);
>
> kvm_set_pmu_events(mask, attr);
>
> @@ -680,7 +680,7 @@ static void armv8pmu_enable_event_counter(struct perf_event *event)
> armv8pmu_enable_counter(mask);
> }
>
> -static void armv8pmu_disable_counter(u32 mask)
> +static void armv8pmu_disable_counter(u64 mask)
> {
> write_pmcntenclr(mask);
> /*
> @@ -693,7 +693,7 @@ static void armv8pmu_disable_counter(u32 mask)
> static void armv8pmu_disable_event_counter(struct perf_event *event)
> {
> struct perf_event_attr *attr = &event->attr;
> - u32 mask = armv8pmu_event_cnten_mask(event);
> + u64 mask = armv8pmu_event_cnten_mask(event);
>
> kvm_clr_pmu_events(mask);
>
> @@ -702,7 +702,7 @@ static void armv8pmu_disable_event_counter(struct perf_event *event)
> armv8pmu_disable_counter(mask);
> }
>
> -static void armv8pmu_enable_intens(u32 mask)
> +static void armv8pmu_enable_intens(u64 mask)
> {
> write_pmintenset(mask);
> }
> @@ -712,7 +712,7 @@ static void armv8pmu_enable_event_irq(struct perf_event *event)
> armv8pmu_enable_intens(BIT(event->hw.idx));
> }
>
> -static void armv8pmu_disable_intens(u32 mask)
> +static void armv8pmu_disable_intens(u64 mask)
> {
> write_pmintenclr(mask);
> isb();
> @@ -726,9 +726,9 @@ static void armv8pmu_disable_event_irq(struct perf_event *event)
> armv8pmu_disable_intens(BIT(event->hw.idx));
> }
>
> -static u32 armv8pmu_getreset_flags(void)
> +static u64 armv8pmu_getreset_flags(void)
> {
> - u32 value;
> + u64 value;
>
> /* Read */
> value = read_pmovsclr();
> @@ -823,7 +823,7 @@ static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
>
> static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
> {
> - u32 pmovsr;
> + u64 pmovsr;
> struct perf_sample_data data;
> struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
> struct pt_regs *regs;
> @@ -1035,14 +1035,16 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
> static void armv8pmu_reset(void *info)
> {
> struct arm_pmu *cpu_pmu = (struct arm_pmu *)info;
> - u64 pmcr;
> + u64 pmcr, mask;
> +
> + bitmap_to_arr64(&mask, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS);
>
> /* The counter and interrupt enable registers are unknown at reset. */
> - armv8pmu_disable_counter(U32_MAX);
> - armv8pmu_disable_intens(U32_MAX);
> + armv8pmu_disable_counter(mask);
> + armv8pmu_disable_intens(mask);
>
> /* Clear the counters we flip at guest entry/exit */
> - kvm_clr_pmu_events(U32_MAX);
> + kvm_clr_pmu_events(mask);
>
> /*
> * Initialize & Reset PMNC. Request overflow interrupt for
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index 35d4ca4f6122..334d7c5503cf 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
> @@ -19,8 +19,8 @@ struct kvm_pmc {
> };
>
> struct kvm_pmu_events {
> - u32 events_host;
> - u32 events_guest;
> + u64 events_host;
> + u64 events_guest;
> };
>
> struct kvm_pmu {
>
> --
> 2.43.0
>

2024-06-10 11:28:42

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 8/9] KVM: arm64: Refine PMU defines for number of counters

On Fri, Jun 07, 2024 at 02:31:33PM -0600, Rob Herring (Arm) wrote:
> There are 2 defines for the number of PMU counters:
> ARMV8_PMU_MAX_COUNTERS and ARMPMU_MAX_HWEVENTS. Both are the same
> currently, but Armv9.4/8.9 increases the number of possible counters
> from 32 to 33. With this change, the maximum number of counters will
> differ for KVM's PMU emulation which is PMUv3.4. Give KVM PMU emulation
> its own define to decouple it from the rest of the kernel's number PMU
> counters.
>
> The VHE PMU code needs to match the PMU driver, so switch it to use
> ARMPMU_MAX_HWEVENTS instead.
>
> Signed-off-by: Rob Herring (Arm) <[email protected]>

Acked-by: Mark Rutland <[email protected]>

Mark.

> ---
> arch/arm64/kvm/pmu-emul.c | 8 ++++----
> arch/arm64/kvm/pmu.c | 5 +++--
> include/kvm/arm_pmu.h | 3 ++-
> include/linux/perf/arm_pmuv3.h | 2 --
> 4 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index da5ba9d061e8..77fe79b2ba04 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -234,7 +234,7 @@ void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu)
> int i;
> struct kvm_pmu *pmu = &vcpu->arch.pmu;
>
> - for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++)
> + for (i = 0; i < KVM_ARMV8_PMU_MAX_COUNTERS; i++)
> pmu->pmc[i].idx = i;
> }
>
> @@ -261,7 +261,7 @@ void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu)
> {
> int i;
>
> - for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++)
> + for (i = 0; i < KVM_ARMV8_PMU_MAX_COUNTERS; i++)
> kvm_pmu_release_perf_event(kvm_vcpu_idx_to_pmc(vcpu, i));
> irq_work_sync(&vcpu->arch.pmu.overflow_work);
> }
> @@ -292,7 +292,7 @@ void kvm_pmu_enable_counter_mask(struct kvm_vcpu *vcpu, u64 val)
> if (!(kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E) || !val)
> return;
>
> - for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) {
> + for (i = 0; i < KVM_ARMV8_PMU_MAX_COUNTERS; i++) {
> struct kvm_pmc *pmc;
>
> if (!(val & BIT(i)))
> @@ -324,7 +324,7 @@ void kvm_pmu_disable_counter_mask(struct kvm_vcpu *vcpu, u64 val)
> if (!kvm_vcpu_has_pmu(vcpu) || !val)
> return;
>
> - for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) {
> + for (i = 0; i < KVM_ARMV8_PMU_MAX_COUNTERS; i++) {
> struct kvm_pmc *pmc;
>
> if (!(val & BIT(i)))
> diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
> index 01c9a9efdd1c..7eaf5f7aeae9 100644
> --- a/arch/arm64/kvm/pmu.c
> +++ b/arch/arm64/kvm/pmu.c
> @@ -5,6 +5,7 @@
> */
> #include <linux/kvm_host.h>
> #include <linux/perf_event.h>
> +#include <linux/perf/arm_pmu.h>
>
> #include <asm/arm_pmuv3.h>
>
> @@ -96,7 +97,7 @@ static void kvm_vcpu_pmu_enable_el0(unsigned long events)
> u64 typer;
> u32 counter;
>
> - for_each_set_bit(counter, &events, 32) {
> + for_each_set_bit(counter, &events, ARMPMU_MAX_HWEVENTS) {
> typer = kvm_vcpu_pmu_read_evtype_direct(counter);
> typer &= ~ARMV8_PMU_EXCLUDE_EL0;
> kvm_vcpu_pmu_write_evtype_direct(counter, typer);
> @@ -111,7 +112,7 @@ static void kvm_vcpu_pmu_disable_el0(unsigned long events)
> u64 typer;
> u32 counter;
>
> - for_each_set_bit(counter, &events, 32) {
> + for_each_set_bit(counter, &events, ARMPMU_MAX_HWEVENTS) {
> typer = kvm_vcpu_pmu_read_evtype_direct(counter);
> typer |= ARMV8_PMU_EXCLUDE_EL0;
> kvm_vcpu_pmu_write_evtype_direct(counter, typer);
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index 871067fb2616..e08aeec5d936 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
> @@ -10,6 +10,7 @@
> #include <linux/perf_event.h>
> #include <linux/perf/arm_pmuv3.h>
>
> +#define KVM_ARMV8_PMU_MAX_COUNTERS 32
>
> #if IS_ENABLED(CONFIG_HW_PERF_EVENTS) && IS_ENABLED(CONFIG_KVM)
> struct kvm_pmc {
> @@ -25,7 +26,7 @@ struct kvm_pmu_events {
> struct kvm_pmu {
> struct irq_work overflow_work;
> struct kvm_pmu_events events;
> - struct kvm_pmc pmc[ARMV8_PMU_MAX_COUNTERS];
> + struct kvm_pmc pmc[KVM_ARMV8_PMU_MAX_COUNTERS];
> int irq_num;
> bool created;
> bool irq_level;
> diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
> index caa09241ad4f..c902fe64f070 100644
> --- a/include/linux/perf/arm_pmuv3.h
> +++ b/include/linux/perf/arm_pmuv3.h
> @@ -6,8 +6,6 @@
> #ifndef __PERF_ARM_PMUV3_H
> #define __PERF_ARM_PMUV3_H
>
> -#define ARMV8_PMU_MAX_COUNTERS 32
> -
> /*
> * Common architectural and microarchitectural event numbers.
> */
>
> --
> 2.43.0
>

2024-06-10 11:40:56

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 7/9] arm64: perf/kvm: Use a common PMU cycle counter define

On Fri, Jun 07, 2024 at 02:31:32PM -0600, Rob Herring (Arm) wrote:
> The PMUv3 and KVM code each have a define for the PMU cycle counter
> index. Move KVM's define to a shared location and use it for PMUv3
> driver.
>
> Signed-off-by: Rob Herring (Arm) <[email protected]>
> ---
> arch/arm/include/asm/arm_pmuv3.h | 2 ++
> arch/arm64/include/asm/arm_pmuv3.h | 2 ++
> arch/arm64/kvm/sys_regs.c | 1 +
> drivers/perf/arm_pmuv3.c | 23 +++++++++--------------
> include/kvm/arm_pmu.h | 1 -
> 5 files changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
> index a41b503b7dcd..ac2cf37b57e3 100644
> --- a/arch/arm/include/asm/arm_pmuv3.h
> +++ b/arch/arm/include/asm/arm_pmuv3.h
> @@ -9,6 +9,8 @@
> #include <asm/cp15.h>
> #include <asm/cputype.h>
>
> +#define ARMV8_PMU_CYCLE_IDX 31
> +
> #define PMCCNTR __ACCESS_CP15_64(0, c9)
>
> #define PMCR __ACCESS_CP15(c9, 0, c12, 0)
> diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h
> index 1ed91334fede..46930729fb3f 100644
> --- a/arch/arm64/include/asm/arm_pmuv3.h
> +++ b/arch/arm64/include/asm/arm_pmuv3.h
> @@ -11,6 +11,8 @@
> #include <asm/cpufeature.h>
> #include <asm/sysreg.h>
>
> +#define ARMV8_PMU_CYCLE_IDX 31

I think we can define this in <linux/perf/arm_pmuv3.h>, rather than
needing separate definitions for arm/arm64.

> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index f8b5db48ea8a..22393ae7ce14 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -18,6 +18,7 @@
> #include <linux/printk.h>
> #include <linux/uaccess.h>
>
> +#include <asm/arm_pmuv3.h>
> #include <asm/cacheflush.h>
> #include <asm/cputype.h>
> #include <asm/debug-monitors.h>

... so we'd need to change the include here, but the rest of this patch
looks good as-is.

Mark.

2024-06-10 11:57:10

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 9/9] perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counter

On Fri, Jun 07, 2024 at 02:31:34PM -0600, Rob Herring (Arm) wrote:
> Armv9.4/8.9 PMU adds optional support for a fixed instruction counter
> similar to the fixed cycle counter. Support for the feature is indicated
> in the ID_AA64DFR1_EL1 register PMICNTR field. The counter is not
> accessible in AArch32.
>
> Existing userspace using direct counter access won't know how to handle
> the fixed instruction counter, so we have to avoid using the counter
> when user access is requested.
>
> Signed-off-by: Rob Herring (Arm) <[email protected]>
> ---
> arch/arm/include/asm/arm_pmuv3.h | 21 +++++++++++++++++++++
> arch/arm64/include/asm/arm_pmuv3.h | 29 +++++++++++++++++++++++++++++
> arch/arm64/kvm/pmu.c | 8 ++++++--
> arch/arm64/tools/sysreg | 25 +++++++++++++++++++++++++
> drivers/perf/arm_pmuv3.c | 28 ++++++++++++++++++++++++++--
> include/linux/perf/arm_pmu.h | 8 ++++++--
> include/linux/perf/arm_pmuv3.h | 4 +++-
> 7 files changed, 116 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
> index ac2cf37b57e3..b836537ddfbf 100644
> --- a/arch/arm/include/asm/arm_pmuv3.h
> +++ b/arch/arm/include/asm/arm_pmuv3.h
> @@ -10,6 +10,7 @@
> #include <asm/cputype.h>
>
> #define ARMV8_PMU_CYCLE_IDX 31
> +#define ARMV8_PMU_INSTR_IDX 32 /* Not accessible from AArch32 */

As with ARMV8_PMU_CYCLE_IDX, I reckon this should live in
<linux/perf/arm_pmuv3.h> (with the comment above) so that we don't need
separate definitions for arm & arm64.

[...]

> diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
> index 231817a379b5..8ab6e09871de 100644
> --- a/arch/arm64/tools/sysreg
> +++ b/arch/arm64/tools/sysreg
> @@ -2029,6 +2029,31 @@ Sysreg FAR_EL1 3 0 6 0 0
> Field 63:0 ADDR
> EndSysreg
>
> +Sysreg PMICNTR_EL0 3 3 9 4 0
> +Field 63:0 ICNT
> +EndSysreg

LGTM per ARM DDI 0487K.a, section D23.5.15, pages 8989 to 8992.

> +
> +Sysreg PMICFILTR_EL0 3 3 9 6 0
> +Res0 63:59
> +Field 58 SYNC
> +Field 57:56 VS

The 'VS' field doesn't seem to be in the ARM ARM (ARM DDI 0487K.a); is
that defined in a supplement?

> +Res0 55:32
> +Field 31 P
> +Field 30 U
> +Field 29 NSK
> +Field 28 NSU
> +Field 27 NSH
> +Field 26 M
> +Res0 25
> +Field 24 SH
> +Field 23 T
> +Field 22 RLK
> +Field 21 RLU
> +Field 20 RLH
> +Res0 19:16
> +Field 15:0 evtCount
> +EndSysreg

Aside from 'VS', this LGTM per ARM DDI 0487K.a, section D23.5.14, pages
8981 to 8988.

One important thing to note is that this doesn't have the threshold
controls (TC, TE, TH); so if threshold events make sense for instruction
events, we cannot place those in the dedicated isntruction counter.

[...]

> @@ -931,6 +939,18 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
> return -EAGAIN;
> }
>
> + /*
> + * Always prefer to place a instruction counter into the instruction counter,
> + * but don't expose the instruction counter to userspace access as userspace
> + * may not know how to handle it.
> + */
> + if (test_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->cntr_mask) &&
> + (evtype == ARMV8_PMUV3_PERFCTR_INST_RETIRED) &&
> + !armv8pmu_event_want_user_access(event)) {
> + if (!test_and_set_bit(ARMV8_PMU_INSTR_IDX, cpuc->used_mask))
> + return ARMV8_PMU_INSTR_IDX;
> + }


I reckon this'd be a bit clearer if we check the evtype first, as with
cycles, e.g.

/*
* Always prefer to place a instruction counter into the instruction counter,
* but don't expose the instruction counter to userspace access as userspace
* may not know how to handle it.
*/
if (evtype == ARMV8_PMUV3_PERFCTR_INST_RETIRED) {
if (test_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->cntr_mask) &&
!armv8pmu_event_want_user_access(event) &&
!test_and_set_bit(ARMV8_PMU_INSTR_IDX, cpuc->used_mask))
return ARMV8_PMU_INSTR_IDX;
}

As above, we might need to check for threshold controls, but I'll need
to go page that in.

Mark.

2024-06-10 14:16:08

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH 9/9] perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counter

On Mon, Jun 10, 2024 at 5:55 AM Mark Rutland <[email protected]> wrote:
>
> On Fri, Jun 07, 2024 at 02:31:34PM -0600, Rob Herring (Arm) wrote:
> > Armv9.4/8.9 PMU adds optional support for a fixed instruction counter
> > similar to the fixed cycle counter. Support for the feature is indicated
> > in the ID_AA64DFR1_EL1 register PMICNTR field. The counter is not
> > accessible in AArch32.
> >
> > Existing userspace using direct counter access won't know how to handle
> > the fixed instruction counter, so we have to avoid using the counter
> > when user access is requested.
> >
> > Signed-off-by: Rob Herring (Arm) <[email protected]>
> > ---
> > arch/arm/include/asm/arm_pmuv3.h | 21 +++++++++++++++++++++
> > arch/arm64/include/asm/arm_pmuv3.h | 29 +++++++++++++++++++++++++++++
> > arch/arm64/kvm/pmu.c | 8 ++++++--
> > arch/arm64/tools/sysreg | 25 +++++++++++++++++++++++++
> > drivers/perf/arm_pmuv3.c | 28 ++++++++++++++++++++++++++--
> > include/linux/perf/arm_pmu.h | 8 ++++++--
> > include/linux/perf/arm_pmuv3.h | 4 +++-
> > 7 files changed, 116 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
> > index ac2cf37b57e3..b836537ddfbf 100644
> > --- a/arch/arm/include/asm/arm_pmuv3.h
> > +++ b/arch/arm/include/asm/arm_pmuv3.h
> > @@ -10,6 +10,7 @@
> > #include <asm/cputype.h>
> >
> > #define ARMV8_PMU_CYCLE_IDX 31
> > +#define ARMV8_PMU_INSTR_IDX 32 /* Not accessible from AArch32 */
>
> As with ARMV8_PMU_CYCLE_IDX, I reckon this should live in
> <linux/perf/arm_pmuv3.h> (with the comment above) so that we don't need
> separate definitions for arm & arm64.

Sure. My only reasoning was to not pull in everything defined in
<linux/perf/arm_pmuv3.h> where it is not needed (KVM).

Of course, the normal pattern here is that linux/foo.h includes
asm/foo.h. We should probably do that here because the arm64 version
of arm_pmuv3.h depends on defines (e.g. PMEVN_SWITCH) in
perf/arm_pmuv3.h, but there is no explicit include. I'll add doing
that into the series.

> [...]
>
> > diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
> > index 231817a379b5..8ab6e09871de 100644
> > --- a/arch/arm64/tools/sysreg
> > +++ b/arch/arm64/tools/sysreg
> > @@ -2029,6 +2029,31 @@ Sysreg FAR_EL1 3 0 6 0 0
> > Field 63:0 ADDR
> > EndSysreg
> >
> > +Sysreg PMICNTR_EL0 3 3 9 4 0
> > +Field 63:0 ICNT
> > +EndSysreg
>
> LGTM per ARM DDI 0487K.a, section D23.5.15, pages 8989 to 8992.
>
> > +
> > +Sysreg PMICFILTR_EL0 3 3 9 6 0
> > +Res0 63:59
> > +Field 58 SYNC
> > +Field 57:56 VS
>
> The 'VS' field doesn't seem to be in the ARM ARM (ARM DDI 0487K.a); is
> that defined in a supplement?

It is in the A-profile Arch Registers doc:
https://developer.arm.com/documentation/ddi0601/2024-03/AArch64-Registers/PMICFILTR-EL0--Performance-Monitors-Instruction-Counter-Filter-Register?lang=en

> > +Res0 55:32
> > +Field 31 P
> > +Field 30 U
> > +Field 29 NSK
> > +Field 28 NSU
> > +Field 27 NSH
> > +Field 26 M
> > +Res0 25
> > +Field 24 SH
> > +Field 23 T
> > +Field 22 RLK
> > +Field 21 RLU
> > +Field 20 RLH
> > +Res0 19:16
> > +Field 15:0 evtCount
> > +EndSysreg
>
> Aside from 'VS', this LGTM per ARM DDI 0487K.a, section D23.5.14, pages
> 8981 to 8988.
>
> One important thing to note is that this doesn't have the threshold
> controls (TC, TE, TH); so if threshold events make sense for instruction
> events, we cannot place those in the dedicated isntruction counter.

Talking to James C, he thinks there could be. Counting cycles which
retired more than 1 instruction was what he came up with.

Also, looks like we need to handle thresholds for the cycle counter as
well. Either reject thresholds for cycles events or don't place cycles
on the fixed cycle counter if thresholds are used. While thresholds
with cycles doesn't make much sense, James prefers the latter as he
did do that for testing.

Rob

2024-06-10 16:54:56

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH 3/9] perf: arm_pmu: Remove event index to counter remapping

On Mon, Jun 10, 2024 at 4:44 AM Mark Rutland <[email protected]> wrote:
>
> On Fri, Jun 07, 2024 at 02:31:28PM -0600, Rob Herring (Arm) wrote:
> > Xscale and Armv6 PMUs defined the cycle counter at 0 and event counters
> > starting at 1 and had 1:1 event index to counter numbering. On Armv7 and
> > later, this changed the cycle counter to 31 and event counters start at
> > 0. The drivers for Armv7 and PMUv3 kept the old event index numbering
> > and introduced an event index to counter conversion. The conversion uses
> > masking to convert from event index to a counter number. This operation
> > relies on having at most 32 counters so that the cycle counter index 0
> > can be transformed to counter number 31.

[...]

> > @@ -783,7 +767,7 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
> > struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
> >
> > /* Clear any unused counters to avoid leaking their contents */
> > - for_each_clear_bit(i, cpuc->used_mask, cpu_pmu->num_events) {
> > + for_each_clear_bit(i, cpuc->used_mask, ARMPMU_MAX_HWEVENTS) {
> > if (i == ARMV8_IDX_CYCLE_COUNTER)
> > write_pmccntr(0);
> > else
>
> IIUC this will now hit all unimplemented counters; e.g. for N counters the body
> will run for counters N..31, and the else case has:
>
> armv8pmu_write_evcntr(i, 0);
>
> ... where the resulting write to PMEVCNTR<n>_EL0 for unimplemented
> counters is CONSTRAINED UNPREDICTABLE and might be UNDEFINED.
>
> We can fix that with for_each_andnot_bit(), e.g.

Good catch. Fixed.

>
> for_each_andnot_bit(i, cpu_pmu->cntr_mask, cpuc->used_mask,
> ARMPMU_MAX_HWEVENTS) {
> if (i == ARMV8_IDX_CYCLE_COUNTER)
> write_pmccntr(0);
> else
> armv8pmu_write_evcntr(i, 0);
> }
>
> [...]
>
> > @@ -905,7 +889,7 @@ static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
> > {
> > int idx;
> >
> > - for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; idx++) {
> > + for_each_set_bit(idx, cpu_pmu->cntr_mask, 31) {
> > if (!test_and_set_bit(idx, cpuc->used_mask))
> > return idx;
> > }
> > @@ -921,7 +905,9 @@ static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
> > * Chaining requires two consecutive event counters, where
> > * the lower idx must be even.
> > */
> > - for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2) {
> > + for_each_set_bit(idx, cpu_pmu->cntr_mask, 31) {
> > + if (!(idx & 0x1))
> > + continue;
> > if (!test_and_set_bit(idx, cpuc->used_mask)) {
> > /* Check if the preceding even counter is available */
> > if (!test_and_set_bit(idx - 1, cpuc->used_mask))
>
> It would be nice to replace those instances of '31' with something
> indicating that this was only covering the generic/programmable
> counters, but I wasn't able to come up with a nice mnemonic for that.
> The best I could think of was:
>
> #define ARMV8_MAX_NR_GENERIC_COUNTERS 31
>
> Maybe it makes sense to define that along with ARMV8_IDX_CYCLE_COUNTER.

I've got nothing better. :) I think there's a few other spots that can use this.

[...]

> > /* Read the nb of CNTx counters supported from PMNC */
> > - *nb_cnt = (armv7_pmnc_read() >> ARMV7_PMNC_N_SHIFT) & ARMV7_PMNC_N_MASK;
> > + nb_cnt = (armv7_pmnc_read() >> ARMV7_PMNC_N_SHIFT) & ARMV7_PMNC_N_MASK;
> > + bitmap_set(cpu_pmu->cntr_mask, 0, nb_cnt);
> >
> > /* Add the CPU cycles counter */
> > - *nb_cnt += 1;
> > + bitmap_set(cpu_pmu->cntr_mask, ARMV7_IDX_CYCLE_COUNTER, 1);
>
> This can be:
>
> set_bit(cpu_pmu->cntr_mask, ARMV7_IDX_CYCLE_COUNTER);
>
> ... and likewise for the PMUv3 version.

Indeed. The documentation in bitmap.h is not clear that greater than 1
unsigned long # of bits works given it says there set_bit() is just
"*addr |= bit". I guess I don't use bitops enough...

Rob