2021-04-07 20:50:43

by John Garry

Subject: [PATCH v3 0/6] perf arm64 metricgroup support

This series contains support to get basic metricgroups working for
arm64 CPUs.

Initial support is added for the HiSilicon hip08 platform.

Some sample usage on a Huawei D06 board:

$ ./perf list metric

List of pre-defined events (to be used in -e):

Metrics:

bp_misp_flush
[BP misp flush L3 topdown metric]
branch_mispredicts
[Branch mispredicts L2 topdown metric]
core_bound
[Core bound L2 topdown metric]
divider
[Divider L3 topdown metric]
exe_ports_util
[EXE ports util L3 topdown metric]
fetch_bandwidth_bound
[Fetch bandwidth bound L2 topdown metric]
fetch_latency_bound
[Fetch latency bound L2 topdown metric]
fsu_stall
[FSU stall L3 topdown metric]
idle_by_icache_miss

$ sudo ./perf stat -v -M core_bound sleep 1
Using CPUID 0x00000000480fd010
metric expr (exe_stall_cycle - (mem_stall_anyload + armv8_pmuv3_0@event\=0x7005@)) / cpu_cycles for core_bound
found event cpu_cycles
found event armv8_pmuv3_0/event=0x7005/
found event exe_stall_cycle
found event mem_stall_anyload
adding {cpu_cycles -> armv8_pmuv3_0/event=0x7001/
mem_stall_anyload -> armv8_pmuv3_0/event=0x7004/
Control descriptor is not initialized
cpu_cycles: 989433 385050 385050
armv8_pmuv3_0/event=0x7005/: 19207 385050 385050
exe_stall_cycle: 900825 385050 385050
mem_stall_anyload: 253516 385050 385050

Performance counter stats for 'sleep':

989,433 cpu_cycles # 0.63 core_bound
19,207 armv8_pmuv3_0/event=0x7005/
900,825 exe_stall_cycle
253,516 mem_stall_anyload

0.000805809 seconds time elapsed

0.000875000 seconds user
0.000000000 seconds sys
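
As a sanity check, the core_bound value reported above can be reproduced by hand from the raw counts. The snippet below is only a sketch: core_bound_ratio() and its parameter names are made up for illustration and are not part of perf. The expression and the counts are the ones printed in the verbose output above.

```c
#include <stdint.h>

/*
 * Illustrative helper, not perf code. Evaluates the core_bound expression
 * printed by "perf stat -v":
 *   (exe_stall_cycle - (mem_stall_anyload + event 0x7005)) / cpu_cycles
 */
static double core_bound_ratio(uint64_t exe_stall_cycle,
			       uint64_t mem_stall_anyload,
			       uint64_t event_0x7005, /* armv8_pmuv3_0/event=0x7005/ */
			       uint64_t cpu_cycles)
{
	if (cpu_cycles == 0)
		return 0.0;	/* avoid dividing by zero when nothing was counted */

	/* Convert before subtracting so a negative intermediate does not wrap */
	return ((double)exe_stall_cycle -
		((double)mem_stall_anyload + (double)event_0x7005)) /
	       (double)cpu_cycles;
}
```

With the counts above, core_bound_ratio(900825, 253516, 19207, 989433) evaluates 628102 / 989433, which is roughly 0.63 and agrees with the value shown next to cpu_cycles.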

perf stat --topdown is not supported, as this requires the CPU PMU to
expose (alias) events for the TopDown L1 metrics from sysfs, which arm
does not do. To get that to work, we probably need to make perf use the
pmu-events cpumap to learn about those alias events.

Metric reuse support is added for the pmu-events parse metric testcase.
This had been broken on power9 recently:
https://lore.kernel.org/lkml/20210324015418.GC8931@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com/

Differences to v2:
- Add TB and RB tags (Thanks!)
- Rename metricgroup__find_metric() from metricgroup_find_metric()
- Change resolve_metric_simple() to rescan after any insert
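
To illustrate the "rescan after any insert" idea for metric reuse, here is a minimal, self-contained sketch. It is not the code from tools/perf/tests/pmu-events.c: the toy_metric table, the id_set type and all helper names are hypothetical, and names such as event_0x7002 are just shorthand for the raw armv8_pmuv3_0 events. The point is only that once an identifier turns out to be another metric and its own identifiers are inserted, the set has changed, so the simplest safe choice is to restart the scan.

```c
#include <stddef.h>
#include <string.h>

#define MAX_IDS 16

/* Hypothetical, simplified metric: a name plus the ids its expression uses */
struct toy_metric {
	const char *name;
	const char *ids[8];	/* NULL-terminated */
};

/* A few hip08 metrics from this series, reduced to the ids they reference */
static const struct toy_metric toy_metrics[] = {
	{ "core_bound", { "exe_stall_cycle", "mem_stall_anyload",
			  "event_0x7005", "cpu_cycles", NULL } },
	{ "divider", { "event_0x7002", "cpu_cycles", NULL } },
	{ "fsu_stall", { "event_0x7003", "cpu_cycles", NULL } },
	{ "exe_ports_util", { "core_bound", "divider", "fsu_stall", NULL } },
};

struct id_set {
	const char *ids[MAX_IDS];
	size_t nr;
};

static const struct toy_metric *find_metric(const char *name)
{
	for (size_t i = 0; i < sizeof(toy_metrics) / sizeof(toy_metrics[0]); i++)
		if (!strcmp(toy_metrics[i].name, name))
			return &toy_metrics[i];
	return NULL;
}

static int set_add(struct id_set *s, const char *id)
{
	for (size_t i = 0; i < s->nr; i++)
		if (!strcmp(s->ids[i], id))
			return 0;	/* already present, nothing to do */
	if (s->nr == MAX_IDS)
		return -1;
	s->ids[s->nr++] = id;
	return 0;
}

static void set_del(struct id_set *s, size_t idx)
{
	s->ids[idx] = s->ids[--s->nr];	/* order does not matter in a set */
}

/* Replace every metric name in the set by the ids of its expression */
static int resolve_ids(struct id_set *s)
{
	int budget = 64;	/* crude guard against cyclic metric definitions */
	int rescan;

	do {
		rescan = 0;
		for (size_t i = 0; i < s->nr; i++) {
			const struct toy_metric *m = find_metric(s->ids[i]);

			if (!m)
				continue;	/* plain event, nothing to expand */
			if (--budget < 0)
				return -1;

			set_del(s, i);
			for (size_t j = 0; m->ids[j]; j++)
				if (set_add(s, m->ids[j]))
					return -1;
			rescan = 1;	/* the set changed, so start over */
			break;
		}
	} while (rescan);

	return 0;
}
```

Starting from a set that contains only "exe_ports_util", resolve_ids() leaves six plain event ids (cpu_cycles appears once even though three metrics use it) and no metric names.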

Differences to v1:
- Add pmu_events_map__find() as arm64-specific function
- Fix metric reuse for pmu-events parse metric testcase

John Garry (6):
perf metricgroup: Make find_metric() public with name change
perf test: Handle metric reuse in pmu-events parsing test
perf pmu: Add pmu_events_map__find()
perf vendor events arm64: Add Hisi hip08 L1 metrics
perf vendor events arm64: Add Hisi hip08 L2 metrics
perf vendor events arm64: Add Hisi hip08 L3 metrics

tools/perf/arch/arm64/util/Build | 1 +
tools/perf/arch/arm64/util/pmu.c | 25 ++
.../arch/arm64/hisilicon/hip08/metrics.json | 233 ++++++++++++++++++
tools/perf/tests/pmu-events.c | 83 ++++++-
tools/perf/util/metricgroup.c | 12 +-
tools/perf/util/metricgroup.h | 3 +-
tools/perf/util/pmu.c | 5 +
tools/perf/util/pmu.h | 1 +
tools/perf/util/s390-sample-raw.c | 4 +-
9 files changed, 356 insertions(+), 11 deletions(-)
create mode 100644 tools/perf/arch/arm64/util/pmu.c
create mode 100644 tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json

--
2.26.2


2021-04-07 20:50:46

by John Garry

Subject: [PATCH v3 6/6] perf vendor events arm64: Add Hisi hip08 L3 metrics

Add L3 metrics.
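
For readers unfamiliar with the MetricExpr syntax, the sketch below spells out two of the expressions from this patch as plain C. The function names and the guards are illustrative only. The weights 15 and 100 come straight from the JSON; what they represent (presumably a per-access cycle cost for a hit versus a refill) is an assumption, as the patch does not say.

```c
#include <stdint.h>

/*
 * Mirrors the idle_by_itlb_miss MetricExpr:
 *   (((L2I_TLB - L2I_TLB_REFILL) * 15) + (L2I_TLB_REFILL * 100)) / CPU_CYCLES
 */
static double idle_by_itlb_miss(uint64_t l2i_tlb, uint64_t l2i_tlb_refill,
				uint64_t cpu_cycles)
{
	if (cpu_cycles == 0 || l2i_tlb < l2i_tlb_refill)
		return 0.0;	/* sketch-only guard against unusable input */

	return ((double)(l2i_tlb - l2i_tlb_refill) * 15.0 +
		(double)l2i_tlb_refill * 100.0) / (double)cpu_cycles;
}

/*
 * Mirrors the exe_ports_util MetricExpr, which reuses other metrics:
 *   core_bound - divider - fsu_stall
 */
static double exe_ports_util(double core_bound, double divider,
			     double fsu_stall)
{
	return core_bound - divider - fsu_stall;
}
```

With made-up counts of 1000 L2I_TLB accesses, 100 refills and 100000 cycles, idle_by_itlb_miss() evaluates (900 * 15 + 100 * 100) / 100000 = 0.235.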

Signed-off-by: John Garry <[email protected]>
Reviewed-by: Kajol Jain <[email protected]>
---
.../arch/arm64/hisilicon/hip08/metrics.json | 161 ++++++++++++++++++
1 file changed, 161 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json b/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
index dda898d23c2d..dda8e59149d2 100644
--- a/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
@@ -69,4 +69,165 @@
"MetricGroup": "TopDownL2",
"MetricName": "memory_bound"
},
+ {
+ "MetricExpr": "(((L2I_TLB - L2I_TLB_REFILL) * 15) + (L2I_TLB_REFILL * 100)) / CPU_CYCLES",
+ "PublicDescription": "Idle by itlb miss L3 topdown metric",
+ "BriefDescription": "Idle by itlb miss L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "idle_by_itlb_miss"
+ },
+ {
+ "MetricExpr": "(((L2I_CACHE - L2I_CACHE_REFILL) * 15) + (L2I_CACHE_REFILL * 100)) / CPU_CYCLES",
+ "PublicDescription": "Idle by icache miss L3 topdown metric",
+ "BriefDescription": "Idle by icache miss L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "idle_by_icache_miss"
+ },
+ {
+ "MetricExpr": "(BR_MIS_PRED * 5) / CPU_CYCLES",
+ "PublicDescription": "BP misp flush L3 topdown metric",
+ "BriefDescription": "BP misp flush L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "bp_misp_flush"
+ },
+ {
+ "MetricExpr": "(armv8_pmuv3_0@event\\=0x2013@ * 5) / CPU_CYCLES",
+ "PublicDescription": "OOO flush L3 topdown metric",
+ "BriefDescription": "OOO flush L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "ooo_flush"
+ },
+ {
+ "MetricExpr": "(armv8_pmuv3_0@event\\=0x1001@ * 5) / CPU_CYCLES",
+ "PublicDescription": "Static predictor flush L3 topdown metric",
+ "BriefDescription": "Static predictor flush L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "sp_flush"
+ },
+ {
+ "MetricExpr": "armv8_pmuv3_0@event\\=0x1010@ / BR_MIS_PRED",
+ "PublicDescription": "Indirect branch L3 topdown metric",
+ "BriefDescription": "Indirect branch L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "indirect_branch"
+ },
+ {
+ "MetricExpr": "(armv8_pmuv3_0@event\\=0x1014@ + armv8_pmuv3_0@event\\=0x1018@) / BR_MIS_PRED",
+ "PublicDescription": "Push branch L3 topdown metric",
+ "BriefDescription": "Push branch L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "push_branch"
+ },
+ {
+ "MetricExpr": "armv8_pmuv3_0@event\\=0x100c@ / BR_MIS_PRED",
+ "PublicDescription": "Pop branch L3 topdown metric",
+ "BriefDescription": "Pop branch L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "pop_branch"
+ },
+ {
+ "MetricExpr": "(BR_MIS_PRED - armv8_pmuv3_0@event\\=0x1010@ - armv8_pmuv3_0@event\\=0x1014@ - armv8_pmuv3_0@event\\=0x1018@ - armv8_pmuv3_0@event\\=0x100c@) / BR_MIS_PRED",
+ "PublicDescription": "Other branch L3 topdown metric",
+ "BriefDescription": "Other branch L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "other_branch"
+ },
+ {
+ "MetricExpr": "armv8_pmuv3_0@event\\=0x2012@ / armv8_pmuv3_0@event\\=0x2013@",
+ "PublicDescription": "Nuke flush L3 topdown metric",
+ "BriefDescription": "Nuke flush L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "nuke_flush"
+ },
+ {
+ "MetricExpr": "1 - nuke_flush",
+ "PublicDescription": "Other flush L3 topdown metric",
+ "BriefDescription": "Other flush L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "other_flush"
+ },
+ {
+ "MetricExpr": "armv8_pmuv3_0@event\\=0x2010@ / CPU_CYCLES",
+ "PublicDescription": "Sync stall L3 topdown metric",
+ "BriefDescription": "Sync stall L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "sync_stall"
+ },
+ {
+ "MetricExpr": "armv8_pmuv3_0@event\\=0x2004@ / CPU_CYCLES",
+ "PublicDescription": "Rob stall L3 topdown metric",
+ "BriefDescription": "Rob stall L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "rob_stall"
+ },
+ {
+ "MetricExpr": "(armv8_pmuv3_0@event\\=0x2006@ + armv8_pmuv3_0@event\\=0x2007@ + armv8_pmuv3_0@event\\=0x2008@) / CPU_CYCLES",
+ "PublicDescription": "Ptag stall L3 topdown metric",
+ "BriefDescription": "Ptag stall L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "ptag_stall"
+ },
+ {
+ "MetricExpr": "armv8_pmuv3_0@event\\=0x201e@ / CPU_CYCLES",
+ "PublicDescription": "SaveOpQ stall L3 topdown metric",
+ "BriefDescription": "SaveOpQ stall L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "saveopq_stall"
+ },
+ {
+ "MetricExpr": "armv8_pmuv3_0@event\\=0x2005@ / CPU_CYCLES",
+ "PublicDescription": "PC buffer stall L3 topdown metric",
+ "BriefDescription": "PC buffer stall L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "pc_buffer_stall"
+ },
+ {
+ "MetricExpr": "armv8_pmuv3_0@event\\=0x7002@ / CPU_CYCLES",
+ "PublicDescription": "Divider L3 topdown metric",
+ "BriefDescription": "Divider L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "divider"
+ },
+ {
+ "MetricExpr": "armv8_pmuv3_0@event\\=0x7003@ / CPU_CYCLES",
+ "PublicDescription": "FSU stall L3 topdown metric",
+ "BriefDescription": "FSU stall L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "fsu_stall"
+ },
+ {
+ "MetricExpr": "core_bound - divider - fsu_stall",
+ "PublicDescription": "EXE ports util L3 topdown metric",
+ "BriefDescription": "EXE ports util L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "exe_ports_util"
+ },
+ {
+ "MetricExpr": "(MEM_STALL_ANYLOAD - MEM_STALL_L1MISS) / CPU_CYCLES",
+ "PublicDescription": "L1 bound L3 topdown metric",
+ "BriefDescription": "L1 bound L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "l1_bound"
+ },
+ {
+ "MetricExpr": "(MEM_STALL_L1MISS - MEM_STALL_L2MISS) / CPU_CYCLES",
+ "PublicDescription": "L2 bound L3 topdown metric",
+ "BriefDescription": "L2 bound L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "l2_bound"
+ },
+ {
+ "MetricExpr": "MEM_STALL_L2MISS / CPU_CYCLES",
+ "PublicDescription": "Mem bound L3 topdown metric",
+ "BriefDescription": "Mem bound L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "mem_bound"
+ },
+ {
+ "MetricExpr": "armv8_pmuv3_0@event\\=0x7005@ / CPU_CYCLES",
+ "PublicDescription": "Store bound L3 topdown metric",
+ "BriefDescription": "Store bound L3 topdown metric",
+ "MetricGroup": "TopDownL3",
+ "MetricName": "store_bound"
+ },
]
--
2.26.2

2021-04-07 20:52:27

by John Garry

Subject: [PATCH v3 3/6] perf pmu: Add pmu_events_map__find()

Add a function to find the common PMU map for the system.

For arm64, a special variant is added. This is because arm64 supports
heterogeneous CPU systems. As such, it cannot be guaranteed that the cpumap
is the same for all CPUs. So in the case of heterogeneous systems, don't
return a cpumap.
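
The decision described above can be sketched in isolation as follows. This is a stand-alone mock, not the perf implementation: struct toy_pmu and toy_events_map_find() are hypothetical stand-ins for struct perf_pmu, is_pmu_core(), pmu->cpus->nr and cpu__max_cpu(), and the map is represented by a plain string.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct perf_pmu; only the fields used here */
struct toy_pmu {
	const char *name;
	int is_core;		/* stands in for is_pmu_core(pmu->name) */
	int nr_cpus;		/* stands in for pmu->cpus->nr */
	const char *map;	/* stands in for this PMU's pmu_events_map */
};

/*
 * Return the events map of the first core PMU, but only if that PMU covers
 * every CPU in the system. On a heterogeneous system the first core PMU
 * covers a subset of the CPUs, so no common map is returned.
 */
static const char *toy_events_map_find(const struct toy_pmu *pmus,
				       size_t nr_pmus, int max_cpus)
{
	for (size_t i = 0; i < nr_pmus; i++) {
		if (!pmus[i].is_core)
			continue;	/* skip uncore PMUs */

		if (pmus[i].nr_cpus != max_cpus)
			return NULL;	/* heterogeneous: no common map */

		return pmus[i].map;
	}

	return NULL;	/* no core PMU found */
}
```

For example, with a single core PMU spanning all 8 CPUs of an imaginary system the function returns that PMU's map, while two core PMUs of 4 CPUs each make it return NULL.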

Tested-by: Paul A. Clarke <[email protected]>
Signed-off-by: John Garry <[email protected]>
Reviewed-by: Kajol Jain <[email protected]>
---
tools/perf/arch/arm64/util/Build | 1 +
tools/perf/arch/arm64/util/pmu.c | 25 +++++++++++++++++++++++++
tools/perf/tests/pmu-events.c | 2 +-
tools/perf/util/metricgroup.c | 7 +++----
tools/perf/util/pmu.c | 5 +++++
tools/perf/util/pmu.h | 1 +
tools/perf/util/s390-sample-raw.c | 4 +---
7 files changed, 37 insertions(+), 8 deletions(-)
create mode 100644 tools/perf/arch/arm64/util/pmu.c

diff --git a/tools/perf/arch/arm64/util/Build b/tools/perf/arch/arm64/util/Build
index ead2f2275eee..9fcb4e68add9 100644
--- a/tools/perf/arch/arm64/util/Build
+++ b/tools/perf/arch/arm64/util/Build
@@ -2,6 +2,7 @@ perf-y += header.o
perf-y += machine.o
perf-y += perf_regs.o
perf-y += tsc.o
+perf-y += pmu.o
perf-y += kvm-stat.o
perf-$(CONFIG_DWARF) += dwarf-regs.o
perf-$(CONFIG_LOCAL_LIBUNWIND) += unwind-libunwind.o
diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c
new file mode 100644
index 000000000000..d3259d61ca75
--- /dev/null
+++ b/tools/perf/arch/arm64/util/pmu.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "../../util/cpumap.h"
+#include "../../util/pmu.h"
+
+struct pmu_events_map *pmu_events_map__find(void)
+{
+	struct perf_pmu *pmu = NULL;
+
+	while ((pmu = perf_pmu__scan(pmu))) {
+		if (!is_pmu_core(pmu->name))
+			continue;
+
+		/*
+		 * The cpumap should cover all CPUs. Otherwise, some CPUs may
+		 * not support some events or have different event IDs.
+		 */
+		if (pmu->cpus->nr != cpu__max_cpu())
+			return NULL;
+
+		return perf_pmu__find_map(pmu);
+	}
+
+	return NULL;
+}
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index cb5b25d2fb27..b8aff8fb50d8 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -539,7 +539,7 @@ static int resolve_metric_simple(struct expr_parse_ctx *pctx,

static int test_parsing(void)
{
- struct pmu_events_map *cpus_map = perf_pmu__find_map(NULL);
+ struct pmu_events_map *cpus_map = pmu_events_map__find();
struct pmu_events_map *map;
struct pmu_event *pe;
int i, j, k;
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 37fe34a5d93d..8336dd8e8098 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -618,7 +618,7 @@ static int metricgroup__print_sys_event_iter(struct pmu_event *pe, void *data)
void metricgroup__print(bool metrics, bool metricgroups, char *filter,
bool raw, bool details)
{
- struct pmu_events_map *map = perf_pmu__find_map(NULL);
+ struct pmu_events_map *map = pmu_events_map__find();
struct pmu_event *pe;
int i;
struct rblist groups;
@@ -1254,8 +1254,7 @@ int metricgroup__parse_groups(const struct option *opt,
struct rblist *metric_events)
{
struct evlist *perf_evlist = *(struct evlist **)opt->value;
- struct pmu_events_map *map = perf_pmu__find_map(NULL);
-
+ struct pmu_events_map *map = pmu_events_map__find();

return parse_groups(perf_evlist, str, metric_no_group,
metric_no_merge, NULL, metric_events, map);
@@ -1274,7 +1273,7 @@ int metricgroup__parse_groups_test(struct evlist *evlist,

bool metricgroup__has_metric(const char *metric)
{
- struct pmu_events_map *map = perf_pmu__find_map(NULL);
+ struct pmu_events_map *map = pmu_events_map__find();
struct pmu_event *pe;
int i;

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 88da5cf6aee8..419ef6c4fbc0 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -717,6 +717,11 @@ struct pmu_events_map *perf_pmu__find_map(struct perf_pmu *pmu)
return map;
}

+struct pmu_events_map *__weak pmu_events_map__find(void)
+{
+	return perf_pmu__find_map(NULL);
+}
+
bool pmu_uncore_alias_match(const char *pmu_name, const char *name)
{
char *tmp = NULL, *tok, *str;
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 8164388478c6..012317229488 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -114,6 +114,7 @@ void pmu_add_cpu_aliases_map(struct list_head *head, struct perf_pmu *pmu,
struct pmu_events_map *map);

struct pmu_events_map *perf_pmu__find_map(struct perf_pmu *pmu);
+struct pmu_events_map *pmu_events_map__find(void);
bool pmu_uncore_alias_match(const char *pmu_name, const char *name);
void perf_pmu_free_alias(struct perf_pmu_alias *alias);

diff --git a/tools/perf/util/s390-sample-raw.c b/tools/perf/util/s390-sample-raw.c
index cfcf8d534d76..08ec3c3ae0ee 100644
--- a/tools/perf/util/s390-sample-raw.c
+++ b/tools/perf/util/s390-sample-raw.c
@@ -160,11 +160,9 @@ static void s390_cpumcfdg_dump(struct perf_sample *sample)
const char *color = PERF_COLOR_BLUE;
struct cf_ctrset_entry *cep, ce;
struct pmu_events_map *map;
- struct perf_pmu pmu;
u64 *p;

- memset(&pmu, 0, sizeof(pmu));
- map = perf_pmu__find_map(&pmu);
+ map = pmu_events_map__find();
while (offset < len) {
cep = (struct cf_ctrset_entry *)(buf + offset);

--
2.26.2

2021-04-07 20:59:51

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH v3 0/6] perf arm64 metricgroup support

Em Wed, Apr 07, 2021 at 06:32:44PM +0800, John Garry escreveu:
> This series contains support to get basic metricgroups working for
> arm64 CPUs.
>
> Initial support is added for HiSilicon hip08 platform.
>
> Some sample usage on Huawei D06 board:
>
> $ ./perf list metric

Thanks, applied.

- Arnaldo


> List of pre-defined events (to be used in -e):
>
> Metrics:
>
> bp_misp_flush
> [BP misp flush L3 topdown metric]
> branch_mispredicts
> [Branch mispredicts L2 topdown metric]
> core_bound
> [Core bound L2 topdown metric]
> divider
> [Divider L3 topdown metric]
> exe_ports_util
> [EXE ports util L3 topdown metric]
> fetch_bandwidth_bound
> [Fetch bandwidth bound L2 topdown metric]
> fetch_latency_bound
> [Fetch latency bound L2 topdown metric]
> fsu_stall
> [FSU stall L3 topdown metric]
> idle_by_icache_miss
>
> $ sudo ./perf stat -v -M core_bound sleep 1
> Using CPUID 0x00000000480fd010
> metric expr (exe_stall_cycle - (mem_stall_anyload + armv8_pmuv3_0@event\=0x7005@)) / cpu_cycles for core_bound
> found event cpu_cycles
> found event armv8_pmuv3_0/event=0x7005/
> found event exe_stall_cycle
> found event mem_stall_anyload
> adding {cpu_cycles -> armv8_pmuv3_0/event=0x7001/
> mem_stall_anyload -> armv8_pmuv3_0/event=0x7004/
> Control descriptor is not initialized
> cpu_cycles: 989433 385050 385050
> armv8_pmuv3_0/event=0x7005/: 19207 385050 385050
> exe_stall_cycle: 900825 385050 385050
> mem_stall_anyload: 253516 385050 385050
>
> Performance counter stats for 'sleep':
>
> 989,433 cpu_cycles # 0.63 core_bound
> 19,207 armv8_pmuv3_0/event=0x7005/
> 900,825 exe_stall_cycle
> 253,516 mem_stall_anyload
>
> 0.000805809 seconds time elapsed
>
> 0.000875000 seconds user
> 0.000000000 seconds sys
>
> perf stat --topdown is not supported, as this requires the CPU PMU to
> expose (alias) events for the TopDown L1 metrics from sysfs, which arm
> does not do. To get that to work, we probably need to make perf use the
> pmu-events cpumap to learn about those alias events.
>
> Metric reuse support is added for pmu-events parse metric testcase.
> This had been broken on power9 recently:
> https://lore.kernel.org/lkml/20210324015418.GC8931@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com/
>
> Differences to v2:
> - Add TB and RB tags (Thanks!)
> - Rename metricgroup__find_metric() from metricgroup_find_metric()
> - Change resolve_metric_simple() to rescan after any insert
>
> Differences to v1:
> - Add pmu_events_map__find() as arm64-specific function
> - Fix metric reuse for pmu-events parse metric testcase
>
> John Garry (6):
> perf metricgroup: Make find_metric() public with name change
> perf test: Handle metric reuse in pmu-events parsing test
> perf pmu: Add pmu_events_map__find()
> perf vendor events arm64: Add Hisi hip08 L1 metrics
> perf vendor events arm64: Add Hisi hip08 L2 metrics
> perf vendor events arm64: Add Hisi hip08 L3 metrics
>
> tools/perf/arch/arm64/util/Build | 1 +
> tools/perf/arch/arm64/util/pmu.c | 25 ++
> .../arch/arm64/hisilicon/hip08/metrics.json | 233 ++++++++++++++++++
> tools/perf/tests/pmu-events.c | 83 ++++++-
> tools/perf/util/metricgroup.c | 12 +-
> tools/perf/util/metricgroup.h | 3 +-
> tools/perf/util/pmu.c | 5 +
> tools/perf/util/pmu.h | 1 +
> tools/perf/util/s390-sample-raw.c | 4 +-
> 9 files changed, 356 insertions(+), 11 deletions(-)
> create mode 100644 tools/perf/arch/arm64/util/pmu.c
> create mode 100644 tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
>
> --
> 2.26.2
>

--

- Arnaldo

2021-04-08 12:09:43

by Jiri Olsa

Subject: Re: [PATCH v3 0/6] perf arm64 metricgroup support

On Wed, Apr 07, 2021 at 06:32:44PM +0800, John Garry wrote:
> This series contains support to get basic metricgroups working for
> arm64 CPUs.
>
> Initial support is added for HiSilicon hip08 platform.
>
> Some sample usage on Huawei D06 board:
>
> $ ./perf list metric
>
> List of pre-defined events (to be used in -e):
>
> Metrics:
>
> bp_misp_flush
> [BP misp flush L3 topdown metric]
> branch_mispredicts
> [Branch mispredicts L2 topdown metric]
> core_bound
> [Core bound L2 topdown metric]
> divider
> [Divider L3 topdown metric]
> exe_ports_util
> [EXE ports util L3 topdown metric]
> fetch_bandwidth_bound
> [Fetch bandwidth bound L2 topdown metric]
> fetch_latency_bound
> [Fetch latency bound L2 topdown metric]
> fsu_stall
> [FSU stall L3 topdown metric]
> idle_by_icache_miss
>
> $ sudo ./perf stat -v -M core_bound sleep 1
> Using CPUID 0x00000000480fd010
> metric expr (exe_stall_cycle - (mem_stall_anyload + armv8_pmuv3_0@event\=0x7005@)) / cpu_cycles for core_bound
> found event cpu_cycles
> found event armv8_pmuv3_0/event=0x7005/
> found event exe_stall_cycle
> found event mem_stall_anyload
> adding {cpu_cycles -> armv8_pmuv3_0/event=0x7001/
> mem_stall_anyload -> armv8_pmuv3_0/event=0x7004/
> Control descriptor is not initialized
> cpu_cycles: 989433 385050 385050
> armv8_pmuv3_0/event=0x7005/: 19207 385050 385050
> exe_stall_cycle: 900825 385050 385050
> mem_stall_anyload: 253516 385050 385050
>
> Performance counter stats for 'sleep':
>
> 989,433 cpu_cycles # 0.63 core_bound
> 19,207 armv8_pmuv3_0/event=0x7005/
> 900,825 exe_stall_cycle
> 253,516 mem_stall_anyload
>
> 0.000805809 seconds time elapsed
>
> 0.000875000 seconds user
> 0.000000000 seconds sys
>
> perf stat --topdown is not supported, as this requires the CPU PMU to
> expose (alias) events for the TopDown L1 metrics from sysfs, which arm
> does not do. To get that to work, we probably need to make perf use the
> pmu-events cpumap to learn about those alias events.
>
> Metric reuse support is added for pmu-events parse metric testcase.
> This had been broken on power9 recently:
> https://lore.kernel.org/lkml/20210324015418.GC8931@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com/
>
> Differences to v2:
> - Add TB and RB tags (Thanks!)
> - Rename metricgroup__find_metric() from metricgroup_find_metric()
> - Change resolve_metric_simple() to rescan after any insert

Acked-by: Jiri Olsa <[email protected]>

thanks,
jirka

2021-04-13 16:16:21

by John Garry

Subject: perf arm64 --topdown support (was "perf arm64 metricgroup support")

On 08/04/2021 13:06, Jiri Olsa wrote:
> perf stat --topdown is not supported, as this requires the CPU PMU to
> expose (alias) events for the TopDown L1 metrics from sysfs, which arm
> does not do. To get that to work, we probably need to make perf use the
> pmu-events cpumap to learn about those alias events.

Hi guys,

As for supporting the --topdown option on archs other than x86, it does
not seem possible today. Support there relies on the kernel exposing the
"topdown" CPU events used in the metric calculations. However, arm64,
for example, does not expose these "topdown" events.

It seems to me that we can change to use pmu-events framework +
metricgroup support here, rather than hardcoded events - has anyone
considered this approach previously? Seems a pretty big job, so thought
I'd ask first ...

Thanks,
John