2022-08-16 13:17:33

by Peter Newman

[permalink] [raw]
Subject: [PATCH v3] perf/arm: adjust hwevents mappings on boot

From: Stephane Eranian <[email protected]>

The mapping of perf_events generic hardware events to actual PMU events on
ARM PMUv3 may not always be correct. This is in particular true for the
PERF_COUNT_HW_BRANCH_INSTRUCTIONS event. Although the mapping points to an
architected event, it may not always be available. This can be seen with a
simple:

$ perf stat -e branches sleep 0
Performance counter stats for 'sleep 0':

<not supported> branches

0.001401081 seconds time elapsed

Yet the hardware does have an event that could be used for branches. This
patch fixes the problem by dynamically validating the generic hardware
events against the supported architected events. If a mapping is wrong it
can be replaced it with another. This is done for the event above at boot
time.

And with that:

$ perf stat -e branches sleep 0

Performance counter stats for 'sleep 0':

166,739 branches

0.000832163 seconds time elapsed

Signed-off-by: Stephane Eranian <[email protected]>
Co-developed-by: Peter Newman <[email protected]>
Signed-off-by: Peter Newman <[email protected]>
---

v2: https://lore.kernel.org/lkml/[email protected]/

since v2, removed prints per Will's suggestion

arch/arm64/kernel/perf_event.c | 36 +++++++++++++++++++++++++++++++++-
1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index cb69ff1e6138..945c31e3f3e3 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -39,7 +39,7 @@
* be supported on any given implementation. Unsupported events will
* be disabled at run-time based on the PMCEID registers.
*/
-static const unsigned armv8_pmuv3_perf_map[PERF_COUNT_HW_MAX] = {
+static unsigned armv8_pmuv3_perf_map[PERF_COUNT_HW_MAX] = {
PERF_MAP_ALL_UNSUPPORTED,
[PERF_COUNT_HW_CPU_CYCLES] = ARMV8_PMUV3_PERFCTR_CPU_CYCLES,
[PERF_COUNT_HW_INSTRUCTIONS] = ARMV8_PMUV3_PERFCTR_INST_RETIRED,
@@ -1232,6 +1232,37 @@ static void armv8_pmu_register_sysctl_table(void)
register_sysctl("kernel", armv8_pmu_sysctl_table);
}

+static void armv8pmu_fixup_perf_map(struct arm_pmu *cpu_pmu)
+{
+ int i, code;
+ unsigned *map = armv8_pmuv3_perf_map;
+
+ for (i = 0; i < PERF_COUNT_HW_MAX; i++) {
+retry:
+ code = map[i];
+ if (code == HW_OP_UNSUPPORTED)
+ continue;
+
+ if (test_bit(map[i], cpu_pmu->pmceid_bitmap))
+ continue;
+ /*
+ * mapping does not exist,
+ * let's see if we can fix it
+ */
+ switch (i) {
+ case PERF_COUNT_HW_BRANCH_INSTRUCTIONS:
+ if (code == ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED) {
+ map[i] = ARMV8_PMUV3_PERFCTR_BR_RETIRED;
+ goto retry;
+ }
+ break;
+ default:
+ map[i] = HW_OP_UNSUPPORTED;
+ break;
+ }
+ }
+}
+
static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
int (*map_event)(struct perf_event *event),
const struct attribute_group *events,
@@ -1259,6 +1290,9 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,

cpu_pmu->name = name;
cpu_pmu->map_event = map_event;
+
+ armv8pmu_fixup_perf_map(cpu_pmu);
+
cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_EVENTS] = events ?
events : &armv8_pmuv3_events_attr_group;
cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_FORMATS] = format ?

base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
--
2.37.1.595.g718a3a8f04-goog


2022-08-16 15:26:08

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v3] perf/arm: adjust hwevents mappings on boot

On Tue, Aug 16, 2022 at 03:02:21PM +0200, Peter Newman wrote:
> From: Stephane Eranian <[email protected]>
>
> The mapping of perf_events generic hardware events to actual PMU events on
> ARM PMUv3 may not always be correct. This is in particular true for the
> PERF_COUNT_HW_BRANCH_INSTRUCTIONS event. Although the mapping points to an
> architected event, it may not always be available. This can be seen with a
> simple:
>
> $ perf stat -e branches sleep 0
> Performance counter stats for 'sleep 0':
>
> <not supported> branches
>
> 0.001401081 seconds time elapsed
>
> Yet the hardware does have an event that could be used for branches. This
> patch fixes the problem by dynamically validating the generic hardware
> events against the supported architected events. If a mapping is wrong it
> can be replaced it with another. This is done for the event above at boot
> time.
>
> And with that:
>
> $ perf stat -e branches sleep 0
>
> Performance counter stats for 'sleep 0':
>
> 166,739 branches
>
> 0.000832163 seconds time elapsed
>
> Signed-off-by: Stephane Eranian <[email protected]>
> Co-developed-by: Peter Newman <[email protected]>
> Signed-off-by: Peter Newman <[email protected]>
> ---
>
> v2: https://lore.kernel.org/lkml/[email protected]/
>
> since v2, removed prints per Will's suggestion
>
> arch/arm64/kernel/perf_event.c | 36 +++++++++++++++++++++++++++++++++-
> 1 file changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index cb69ff1e6138..945c31e3f3e3 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -39,7 +39,7 @@
> * be supported on any given implementation. Unsupported events will
> * be disabled at run-time based on the PMCEID registers.

> */
> -static const unsigned armv8_pmuv3_perf_map[PERF_COUNT_HW_MAX] = {
> +static unsigned armv8_pmuv3_perf_map[PERF_COUNT_HW_MAX] = {
> PERF_MAP_ALL_UNSUPPORTED,
> [PERF_COUNT_HW_CPU_CYCLES] = ARMV8_PMUV3_PERFCTR_CPU_CYCLES,
> [PERF_COUNT_HW_INSTRUCTIONS] = ARMV8_PMUV3_PERFCTR_INST_RETIRED,

On big.LITTLE systems this is array is shared by multiple PMUs, so this cannot
be altered based on a single PMU.

Rather than applying a fixup, could we special-case this at mapping time?

Does the following work for you?

Thanks
Mark.

---->8----
From 6cf78ec72194296e8d0c87572b81f2b02a097918 Mon Sep 17 00:00:00 2001
From: Mark Rutland <[email protected]>
Date: Tue, 16 Aug 2022 15:16:23 +0100
Subject: [PATCH] arm64: pmuv3: dynamically map
PERF_COUNT_HW_BRANCH_INSTRUCTIONS

Signed-off-by: Mark Rutland <[email protected]>
---
arch/arm64/kernel/perf_event.c | 27 +++++++++++++++++++++++----
1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index cb69ff1e61380..0435adee5cab8 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -45,7 +45,6 @@ static const unsigned armv8_pmuv3_perf_map[PERF_COUNT_HW_MAX] = {
[PERF_COUNT_HW_INSTRUCTIONS] = ARMV8_PMUV3_PERFCTR_INST_RETIRED,
[PERF_COUNT_HW_CACHE_REFERENCES] = ARMV8_PMUV3_PERFCTR_L1D_CACHE,
[PERF_COUNT_HW_CACHE_MISSES] = ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL,
- [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED,
[PERF_COUNT_HW_BRANCH_MISSES] = ARMV8_PMUV3_PERFCTR_BR_MIS_PRED,
[PERF_COUNT_HW_BUS_CYCLES] = ARMV8_PMUV3_PERFCTR_BUS_CYCLES,
[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = ARMV8_PMUV3_PERFCTR_STALL_FRONTEND,
@@ -1050,6 +1049,28 @@ static void armv8pmu_reset(void *info)
armv8pmu_pmcr_write(pmcr);
}

+static int __armv8_pmuv3_map_event_id(struct arm_pmu *armpmu,
+ struct perf_event *event)
+{
+ if (event->attr.type == PERF_TYPE_HARDWARE &&
+ event->attr.config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS) {
+
+ if (test_bit(ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED,
+ armpmu->pmceid_bitmap))
+ return ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED;
+
+ if (test_bit(ARMV8_PMUV3_PERFCTR_BR_RETIRED,
+ armpmu->pmceid_bitmap))
+ return ARMV8_PMUV3_PERFCTR_BR_RETIRED;
+
+ return HW_OP_UNSUPPORTED;
+ }
+
+ return armpmu_map_event(event, &armv8_pmuv3_perf_map,
+ &armv8_pmuv3_perf_cache_map,
+ ARMV8_PMU_EVTYPE_EVENT);
+}
+
static int __armv8_pmuv3_map_event(struct perf_event *event,
const unsigned (*extra_event_map)
[PERF_COUNT_HW_MAX],
@@ -1061,9 +1082,7 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
int hw_event_id;
struct arm_pmu *armpmu = to_arm_pmu(event->pmu);

- hw_event_id = armpmu_map_event(event, &armv8_pmuv3_perf_map,
- &armv8_pmuv3_perf_cache_map,
- ARMV8_PMU_EVTYPE_EVENT);
+ hw_event_id = __armv8_pmuv3_map_event_id(armpmu, event);

if (armv8pmu_event_is_64bit(event))
event->hw.flags |= ARMPMU_EVT_64BIT;
--
2.30.2

2022-08-16 17:22:27

by Peter Newman

[permalink] [raw]
Subject: Re: [PATCH v3] perf/arm: adjust hwevents mappings on boot

On Tue, Aug 16, 2022 at 4:18 PM Mark Rutland <[email protected]> wrote:
> On big.LITTLE systems this is array is shared by multiple PMUs, so this cannot
> be altered based on a single PMU.
>
> Rather than applying a fixup, could we special-case this at mapping time?
>
> Does the following work for you?
>
> Thanks
> Mark.
>
> ---->8----
> From 6cf78ec72194296e8d0c87572b81f2b02a097918 Mon Sep 17 00:00:00 2001
> From: Mark Rutland <[email protected]>
> Date: Tue, 16 Aug 2022 15:16:23 +0100
> Subject: [PATCH] arm64: pmuv3: dynamically map
> PERF_COUNT_HW_BRANCH_INSTRUCTIONS
>
> Signed-off-by: Mark Rutland <[email protected]>


Reviewed-by: Peter Newman <[email protected]>
Tested-by: Peter Newman <[email protected]>


> ---
> arch/arm64/kernel/perf_event.c | 27 +++++++++++++++++++++++----
> 1 file changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index cb69ff1e61380..0435adee5cab8 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -45,7 +45,6 @@ static const unsigned armv8_pmuv3_perf_map[PERF_COUNT_HW_MAX] = {
> [PERF_COUNT_HW_INSTRUCTIONS] = ARMV8_PMUV3_PERFCTR_INST_RETIRED,
> [PERF_COUNT_HW_CACHE_REFERENCES] = ARMV8_PMUV3_PERFCTR_L1D_CACHE,
> [PERF_COUNT_HW_CACHE_MISSES] = ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL,
> - [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED,
> [PERF_COUNT_HW_BRANCH_MISSES] = ARMV8_PMUV3_PERFCTR_BR_MIS_PRED,
> [PERF_COUNT_HW_BUS_CYCLES] = ARMV8_PMUV3_PERFCTR_BUS_CYCLES,
> [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = ARMV8_PMUV3_PERFCTR_STALL_FRONTEND,
> @@ -1050,6 +1049,28 @@ static void armv8pmu_reset(void *info)
> armv8pmu_pmcr_write(pmcr);
> }
>
> +static int __armv8_pmuv3_map_event_id(struct arm_pmu *armpmu,
> + struct perf_event *event)
> +{
> + if (event->attr.type == PERF_TYPE_HARDWARE &&
> + event->attr.config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS) {
> +
> + if (test_bit(ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED,
> + armpmu->pmceid_bitmap))
> + return ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED;
> +
> + if (test_bit(ARMV8_PMUV3_PERFCTR_BR_RETIRED,
> + armpmu->pmceid_bitmap))
> + return ARMV8_PMUV3_PERFCTR_BR_RETIRED;
> +
> + return HW_OP_UNSUPPORTED;
> + }
> +
> + return armpmu_map_event(event, &armv8_pmuv3_perf_map,
> + &armv8_pmuv3_perf_cache_map,
> + ARMV8_PMU_EVTYPE_EVENT);
> +}
> +
> static int __armv8_pmuv3_map_event(struct perf_event *event,
> const unsigned (*extra_event_map)
> [PERF_COUNT_HW_MAX],
> @@ -1061,9 +1082,7 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
> int hw_event_id;
> struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
>
> - hw_event_id = armpmu_map_event(event, &armv8_pmuv3_perf_map,
> - &armv8_pmuv3_perf_cache_map,
> - ARMV8_PMU_EVTYPE_EVENT);
> + hw_event_id = __armv8_pmuv3_map_event_id(armpmu, event);
>
> if (armv8pmu_event_is_64bit(event))
> event->hw.flags |= ARMPMU_EVT_64BIT;
> --
> 2.30.2
>

Hi Mark,

I can confirm that your patch fixes the issue we saw.

Thank you for coming back with this quickly, it looks like a better
solution to me.

Which branch were you going to apply it to?

Thanks!
-Peter

2022-08-31 13:53:54

by Peter Newman

[permalink] [raw]
Subject: Re: [PATCH v3] perf/arm: adjust hwevents mappings on boot

Hi Mark,

On Tue, Aug 16, 2022 at 7:00 PM Peter Newman <[email protected]> wrote:
>
> On Tue, Aug 16, 2022 at 4:18 PM Mark Rutland <[email protected]> wrote:

...

>
> I can confirm that your patch fixes the issue we saw.
>
> Thank you for coming back with this quickly, it looks like a better
> solution to me.
>

Just following up, please let me know what plans you have for your patch.

-Peter