LinuxLists.cc - [PATCH v7 0/9] Add metrics for neoverse-n2-v2

2023-01-13 09:44:27

Subject: [PATCH v7 0/9] Add metrics for neoverse-n2-v2

Changes since v6:
- Split patch #1 into 3 smaller patches as suggested by Ian.
- Change perf_pmu__get_slots into perf_pmu__cpu_slots_per_cycle,
per John's suggestion;
- Return NAN instead of 0 in perf_pmu__cpu_slots_per_cycle weak
function, per John's suggestion;
- Factor out pmu_core__find_same function, per John's suggestion.
- Link: https://lore.kernel.org/all/[email protected]/

Changes since v5:
- Add common topdownL1 metrics in sbsa.json as suggested by John;
- Correct PKI/MPKI ScaleUnit to 1PKI/1MPKI;
- Link: https://lore.kernel.org/all/[email protected]/

Changes since v4:
- Add MPKI/PKI “ScaleUnit”;
- Add acked-by from Ian Rogers;
- Link: https://lore.kernel.org/all/[email protected]/

Changes since v3:
- Add ipc_rate metric;
- Drop the PublicDescription;
- Describe PEutilization metrics in more detail;
- Link: https://lore.kernel.org/all/[email protected]/

Changes since v2:
- Correct the formula of Branch metrics;
- Add more PE utilization metrics;
- Add more TLB metrics;
- Add “ScaleUnit” for some metrics;
- Add a newline at the end of the file;
- Link: https://lore.kernel.org/all/[email protected]/

Changes since v1:
- Corrected formula for topdown L1 due to wrong counts for stall_slot and
stall_slot_frontend;
- Link: https://lore.kernel.org/all/[email protected]/

This series does the following things:

The number of slots may differ between architectures, so add a #slots
literal to obtain it; #slots can then be used in the topdown metrics.
Currently, #slots is only supported on arm64; other architectures
return NAN.

The topdown L1 metrics come from the ARM SBSA 7.0 platform design doc [0],
D37-38, and are standard. So put them in the common arm64 file sbsa.json
and add general metric support, so that cores other than n2/v2 can reuse
them.

Then add topdownL1 metrics for neoverse-n2-v2. Because stall_slot and
stall_slot_frontend are miscounted on neoverse-n2, cpu_cycles must be
subtracted from each to get the real values, so override the
"MetricExpr" for neoverse-n2.
See the ARM neoverse-n2 errata notice [1], D117.
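
As a rough illustration of the corrected topdown L1 formulas, here is a
minimal Python sketch. The function names are hypothetical and are not part
of perf; the expressions mirror the "MetricExpr" strings in this series, and
the slot count of 5 for neoverse-n2 is taken from the patch descriptions.

```python
N2_SLOTS = 5  # per the patch text: neoverse-n2 has 5 slots and needs the fix


def _real_stall(stall, cpu_cycles, slots):
    """Mirror 'X if (#slots - 5) else (X - cpu_cycles)' from the MetricExpr."""
    return stall - cpu_cycles if slots == N2_SLOTS else stall


def frontend_bound(stall_slot_frontend, cpu_cycles, slots):
    real = _real_stall(stall_slot_frontend, cpu_cycles, slots)
    return real / (slots * cpu_cycles)


def backend_bound(stall_slot_backend, cpu_cycles, slots):
    # No correction here: the series keeps the common expression for this one.
    return stall_slot_backend / (slots * cpu_cycles)


def retiring(op_retired, op_spec, stall_slot, cpu_cycles, slots):
    not_stalled = 1 - _real_stall(stall_slot, cpu_cycles, slots) / (slots * cpu_cycles)
    return (op_retired / op_spec) * not_stalled


def bad_speculation(op_retired, op_spec, stall_slot, cpu_cycles, slots):
    not_stalled = 1 - _real_stall(stall_slot, cpu_cycles, slots) / (slots * cpu_cycles)
    return (1 - op_retired / op_spec) * not_stalled


if __name__ == "__main__":
    # Counter values copied from the TopDownL1 test run in this cover letter.
    print(f"retiring       {retiring(876_324_270, 877_673_452, 19_576_079_610, 3_408_960_257, 5):.1%}")
    print(f"frontend_bound {frontend_bound(7_961_814_801, 3_406_548_064, 5):.1%}")
    print(f"backend_bound  {backend_bound(11_746_647_747, 3_415_528_440, 5):.1%}")
```

Fed with the counters from the TopDownL1 test run in this cover letter, the
sketch gives the same rounded percentages that perf reports there (5.1 %
retiring, 26.7 % frontend_bound, 68.8 % backend_bound).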

Since neoverse-n2/neoverse-v2 does not yet support topdown L2, metric
groups such as Cache, TLB, Branch, InstructionsMix, and PEutilization are
added in the following patches to help analyze performance bottlenecks
further. See the ARM PMU guides [2][3].

[0] https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
[1] https://documentation-service.arm.com/static/636a66a64e6cf12278ad89cb?token=
[2] https://documentation-service.arm.com/static/628f8fa3dfaf015c2b76eae8?token=
[3] https://documentation-service.arm.com/static/62cfe21e31ea212bb6627393?token=

Tested on neoverse-n2:

$./perf list
...
Metric Groups:

Branch:
branch_miss_pred_rate
[The rate of branches mis-predicted to the overall branches]
branch_mpki
[The rate of branches mis-predicted per kilo instructions]
branch_pki
[The rate of branches retired per kilo instructions]
Cache:
l1d_cache_miss_rate
[The rate of L1 D-Cache misses to the overall L1 D-Cache]
l1d_cache_mpki
[The rate of L1 D-Cache misses per kilo instructions]
...

$sudo ./perf stat -M TLB false_sharing 2

Performance counter stats for 'false_sharing 2':

29,940 L2D_TLB # 20.0 % l2_tlb_miss_rate (42.36%)
5,998 L2D_TLB_REFILL (42.36%)
1,753 L1I_TLB_REFILL # 0.1 % l1i_tlb_miss_rate (43.17%)
2,173,957 L1I_TLB (43.17%)
327,944,763 L1D_TLB # 0.0 % l1d_tlb_miss_rate (43.98%)
22,485 L1D_TLB_REFILL (43.98%)
497,210 L1I_TLB # 0.0 % itlb_walk_rate (44.83%)
28 ITLB_WALK (44.83%)
821,488,762 INST_RETIRED # 0.0 MPKI itlb_mpki (43.97%)
122 ITLB_WALK (43.97%)
744 DTLB_WALK # 0.0 % dtlb_walk_rate (43.01%)
263,913,146 L1D_TLB (43.01%)
779,073,875 INST_RETIRED # 0.0 MPKI dtlb_mpki (42.07%)
1,050 DTLB_WALK (42.07%)

0.435864901 seconds time elapsed

1.201384000 seconds user
0.000000000 seconds sys

$sudo ./perf stat -M TopDownL1 false_sharing 2

Performance counter stats for 'false_sharing 2':

3,408,960,257 cpu_cycles # 0.0 % bad_speculation
# 5.1 % retiring (66.79%)
19,576,079,610 stall_slot (66.79%)
877,673,452 op_spec (66.79%)
876,324,270 op_retired (66.79%)
3,406,548,064 cpu_cycles # 26.7 % frontend_bound (67.08%)
7,961,814,801 stall_slot_frontend (67.08%)
3,415,528,440 cpu_cycles # 68.8 % backend_bound (66.43%)
11,746,647,747 stall_slot_backend (66.43%)

0.455229807 seconds time elapsed

1.243216000 seconds user
0.000000000 seconds sys

$sudo ./perf stat -M branch sleep 1

Performance counter stats for 'sleep 1':

901,495 INST_RETIRED # 223.6 PKI branch_pki
201,603 BR_RETIRED
901,495 INST_RETIRED # 10.0 MPKI branch_mpki
9,004 BR_MIS_PRED_RETIRED
9,004 BR_MIS_PRED_RETIRED # 4.5 % branch_miss_pred_rate
201,603 BR_RETIRED

1.000794467 seconds time elapsed

0.000905000 seconds user
0.000000000 seconds sys

Jing Zhang (9):
perf pmu: Add #slots literal support for arm64
perf jevent: Add general metrics support
perf vendor events arm64: Add common topdown L1 metrics
perf vendor events arm64: Add topdown L1 metrics for neoverse-n2-v2
perf vendor events arm64: Add TLB metrics for neoverse-n2-v2
perf vendor events arm64: Add cache metrics for neoverse-n2-v2
perf vendor events arm64: Add branch metrics for neoverse-n2-v2
perf vendor events arm64: Add PE utilization metrics for
neoverse-n2-v2
perf vendor events arm64: Add instruction mix metrics for
neoverse-n2-v2

tools/perf/arch/arm64/util/pmu.c | 34 ++-
.../arch/arm64/arm/neoverse-n2-v2/metrics.json | 273 +++++++++++++++++++++
tools/perf/pmu-events/arch/arm64/sbsa.json | 30 +++
tools/perf/pmu-events/jevents.py | 2 +
tools/perf/util/expr.c | 5 +
tools/perf/util/pmu.c | 6 +
tools/perf/util/pmu.h | 1 +
7 files changed, 349 insertions(+), 2 deletions(-)
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
create mode 100644 tools/perf/pmu-events/arch/arm64/sbsa.json

--
1.8.3.1

2023-01-13 09:45:02

by Jing Zhang

Subject: [PATCH v7 1/9] perf pmu: Add #slots literal support for arm64

The number of slots may differ between architectures, so add a #slots
literal to obtain it; #slots can then be used in metric expressions.
Currently, #slots is only supported on arm64; other architectures
return NAN.

On arm64, the slot count comes from the register PMMIR_EL1.SLOT, which
can be read from /sys/bus/event_source/devices/armv8_pmuv3_*/caps/slots.
PMMIR_EL1.SLOT might read as zero if the PMU version is lower than
ID_AA64DFR0_EL1_PMUVer_V3P4 or the STALL_SLOT event is not implemented.
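
The lookup described above can be sketched in Python as follows. This is
only an illustration, not the perf implementation: the function name is
hypothetical, and unlike the C code in this patch it folds the "zero means
unknown" rule into the reader itself, whereas the patch returns 0 and lets
expr.c map that to NAN.

```python
import math
from pathlib import Path


def cpu_slots_per_cycle(caps_slots_path):
    """Return the slot count from a caps/slots file, or NaN if unavailable.

    The file content may carry a 0x prefix (as noted in the patch comment),
    so parse with base 0 to accept both hex and decimal text.
    """
    try:
        text = Path(caps_slots_path).read_text().strip()
        slots = int(text, 0)
    except (OSError, ValueError):
        # Missing PMU, unreadable file, or unparsable content.
        return math.nan
    # PMMIR_EL1.SLOT may read as zero; treat that as "not known".
    return float(slots) if slots else math.nan
```

Returning NaN rather than 0 means any metric that uses #slots evaluates to
NaN instead of dividing by zero or silently reporting a wrong ratio.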

Signed-off-by: Jing Zhang <[email protected]>
---
tools/perf/arch/arm64/util/pmu.c | 34 ++++++++++++++++++++++++++++++++--
tools/perf/util/expr.c | 5 +++++
tools/perf/util/pmu.c | 6 ++++++
tools/perf/util/pmu.h | 1 +
4 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c
index 477e513..5f8667b 100644
--- a/tools/perf/arch/arm64/util/pmu.c
+++ b/tools/perf/arch/arm64/util/pmu.c
@@ -3,8 +3,9 @@
#include <internal/cpumap.h>
#include "../../../util/cpumap.h"
#include "../../../util/pmu.h"
+#include <api/fs/fs.h>

-const struct pmu_events_table *pmu_events_table__find(void)
+static struct perf_pmu *pmu_core__find_same(void)
{
struct perf_pmu *pmu = NULL;

@@ -19,8 +20,37 @@ const struct pmu_events_table *pmu_events_table__find(void)
if (pmu->cpus->nr != cpu__max_cpu().cpu)
return NULL;

- return perf_pmu__find_table(pmu);
+ return pmu;
}

return NULL;
}
+
+const struct pmu_events_table *pmu_events_table__find(void)
+{
+ struct perf_pmu *pmu = pmu_core__find_same();
+
+ if (pmu)
+ return perf_pmu__find_table(pmu);
+
+ return NULL;
+}
+
+double perf_pmu__cpu_slots_per_cycle(void)
+{
+ char path[PATH_MAX];
+ unsigned long long slots = 0;
+ struct perf_pmu *pmu = pmu_core__find_same();
+
+ if (pmu) {
+ scnprintf(path, PATH_MAX,
+ EVENT_SOURCE_DEVICE_PATH "%s/caps/slots", pmu->name);
+ /*
+ * The value of slots is not greater than 32 bits, but sysfs__read_int
+ * can't read value with 0x prefix, so use sysfs__read_ull instead.
+ */
+ sysfs__read_ull(path, &slots);
+ }
+
+ return (double)slots;
+}
diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
index 00dcde3..9d3076a 100644
--- a/tools/perf/util/expr.c
+++ b/tools/perf/util/expr.c
@@ -19,6 +19,7 @@
#include <linux/zalloc.h>
#include <ctype.h>
#include <math.h>
+#include "pmu.h"

#ifdef PARSER_DEBUG
extern int expr_debug;
@@ -448,6 +449,10 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
result = topology->core_cpus_lists;
goto out;
}
+ if (!strcmp("#slots", literal)) {
+ result = perf_pmu__cpu_slots_per_cycle() ?: NAN;
+ goto out;
+ }

pr_err("Unrecognized literal '%s'", literal);
out:
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 2bdeb89..cbb4fbf 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -19,6 +19,7 @@
#include <regex.h>
#include <perf/cpumap.h>
#include <fnmatch.h>
+#include <math.h>
#include "debug.h"
#include "evsel.h"
#include "pmu.h"
@@ -1993,3 +1994,8 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
*ucpus_ptr = unmatched_cpus;
return 0;
}
+
+double __weak perf_pmu__cpu_slots_per_cycle(void)
+{
+ return NAN;
+}
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 69ca000..fd414ba 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -259,4 +259,5 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,

char *pmu_find_real_name(const char *name);
char *pmu_find_alias_name(const char *name);
+double perf_pmu__cpu_slots_per_cycle(void);
#endif /* __PMU_H */
--
1.8.3.1

2023-01-13 09:45:05

by Jing Zhang

Subject: [PATCH v7 6/9] perf vendor events arm64: Add cache metrics for neoverse-n2-v2

Add cache related metrics.

Signed-off-by: Jing Zhang <[email protected]>
---
.../arch/arm64/arm/neoverse-n2-v2/metrics.json | 77 ++++++++++++++++++++++
1 file changed, 77 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
index 60bbd8f..08c6aaa 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
@@ -62,5 +62,82 @@
"MetricGroup": "TLB",
"MetricName": "itlb_walk_rate",
"ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "L1I_CACHE_REFILL / INST_RETIRED * 1000",
+ "BriefDescription": "The rate of L1 I-Cache misses per kilo instructions",
+ "MetricGroup": "Cache",
+ "MetricName": "l1i_cache_mpki",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
+ "BriefDescription": "The rate of L1 I-Cache misses to the overall L1 I-Cache",
+ "MetricGroup": "Cache",
+ "MetricName": "l1i_cache_miss_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "L1D_CACHE_REFILL / INST_RETIRED * 1000",
+ "BriefDescription": "The rate of L1 D-Cache misses per kilo instructions",
+ "MetricGroup": "Cache",
+ "MetricName": "l1d_cache_mpki",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
+ "BriefDescription": "The rate of L1 D-Cache misses to the overall L1 D-Cache",
+ "MetricGroup": "Cache",
+ "MetricName": "l1d_cache_miss_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "L2D_CACHE_REFILL / INST_RETIRED * 1000",
+ "BriefDescription": "The rate of L2 D-Cache misses per kilo instructions",
+ "MetricGroup": "Cache",
+ "MetricName": "l2d_cache_mpki",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE",
+ "BriefDescription": "The rate of L2 D-Cache misses to the overall L2 D-Cache",
+ "MetricGroup": "Cache",
+ "MetricName": "l2d_cache_miss_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "L3D_CACHE_REFILL / INST_RETIRED * 1000",
+ "BriefDescription": "The rate of L3 D-Cache misses per kilo instructions",
+ "MetricGroup": "Cache",
+ "MetricName": "l3d_cache_mpki",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricExpr": "L3D_CACHE_REFILL / L3D_CACHE",
+ "BriefDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
+ "MetricGroup": "Cache",
+ "MetricName": "l3d_cache_miss_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "LL_CACHE_MISS_RD / INST_RETIRED * 1000",
+ "BriefDescription": "The rate of LL Cache read misses per kilo instructions",
+ "MetricGroup": "Cache",
+ "MetricName": "ll_cache_read_mpki",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD",
+ "BriefDescription": "The rate of LL Cache read misses to the overall LL Cache read",
+ "MetricGroup": "Cache",
+ "MetricName": "ll_cache_read_miss_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD",
+ "BriefDescription": "The rate of LL Cache read hit to the overall LL Cache read",
+ "MetricGroup": "Cache",
+ "MetricName": "ll_cache_read_hit_rate",
+ "ScaleUnit": "100%"
}
]
--
1.8.3.1

2023-01-13 09:45:07

by Jing Zhang

Subject: [PATCH v7 7/9] perf vendor events arm64: Add branch metrics for neoverse-n2-v2

Add branch related metrics.
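
For illustration only, the three branch expressions added below can be
written as a short Python sketch. The helper names are hypothetical; the
arithmetic follows the "MetricExpr" strings in this patch.

```python
def per_kilo_instructions(event_count, inst_retired):
    """PKI/MPKI style metric: events per 1000 retired instructions."""
    return event_count / inst_retired * 1000


def rate(part, whole):
    """Plain ratio; perf multiplies by 100 via ScaleUnit "100%"."""
    return part / whole


if __name__ == "__main__":
    # Counter values from the 'perf stat -M branch sleep 1' run in the
    # cover letter of this series.
    inst_retired = 901_495
    br_retired = 201_603
    br_mis_pred_retired = 9_004

    print(f"branch_pki  {per_kilo_instructions(br_retired, inst_retired):.1f}")
    print(f"branch_mpki {per_kilo_instructions(br_mis_pred_retired, inst_retired):.1f}")
    print(f"branch_miss_pred_rate {rate(br_mis_pred_retired, br_retired):.1%}")
```

With the counters from the cover letter's test run, these expressions give
the values perf printed there: 223.6 PKI, 10.0 MPKI and a 4.5 % miss rate.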

Signed-off-by: Jing Zhang <[email protected]>
---
.../arch/arm64/arm/neoverse-n2-v2/metrics.json | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
index 08c6aaa..afcdb17 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
@@ -139,5 +139,26 @@
"MetricGroup": "Cache",
"MetricName": "ll_cache_read_hit_rate",
"ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "BR_MIS_PRED_RETIRED / INST_RETIRED * 1000",
+ "BriefDescription": "The rate of branches mis-predicted per kilo instructions",
+ "MetricGroup": "Branch",
+ "MetricName": "branch_mpki",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricExpr": "BR_RETIRED / INST_RETIRED * 1000",
+ "BriefDescription": "The rate of branches retired per kilo instructions",
+ "MetricGroup": "Branch",
+ "MetricName": "branch_pki",
+ "ScaleUnit": "1PKI"
+ },
+ {
+ "MetricExpr": "BR_MIS_PRED_RETIRED / BR_RETIRED",
+ "BriefDescription": "The rate of branches mis-predicted to the overall branches",
+ "MetricGroup": "Branch",
+ "MetricName": "branch_miss_pred_rate",
+ "ScaleUnit": "100%"
}
]
--
1.8.3.1

2023-01-13 09:45:31

by Jing Zhang

Subject: [PATCH v7 4/9] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2-v2

Add general topdown L1 metrics for neoverse-n2-v2. Because stall_slot and
stall_slot_frontend are miscounted on neoverse-n2, cpu_cycles must be
subtracted from each to get the real values, so override the
"MetricExpr" for neoverse-n2, which has 5 slots.
See the ARM neoverse-n2 errata notice [0], D117.

Since neoverse-n2/neoverse-v2 does not yet support topdown L2, metric
groups such as Cache, TLB, Branch, InstructionsMix and PEutilization
are added in the following patches to help analyze performance
bottlenecks further. See the ARM PMU guides [1][2].

[0] https://documentation-service.arm.com/static/636a66a64e6cf12278ad89cb?token=
[1] https://documentation-service.arm.com/static/628f8fa3dfaf015c2b76eae8?token=
[2] https://documentation-service.arm.com/static/62cfe21e31ea212bb6627393?token=

Signed-off-by: Jing Zhang <[email protected]>
---
.../arch/arm64/arm/neoverse-n2-v2/metrics.json | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
new file mode 100644
index 0000000..4e7417f
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
@@ -0,0 +1,17 @@
+[
+ {
+ "ArchStdEvent": "FRONTEND_BOUND",
+ "MetricExpr": "((stall_slot_frontend) if (#slots - 5) else (stall_slot_frontend - cpu_cycles)) / (#slots * cpu_cycles)"
+ },
+ {
+ "ArchStdEvent": "BAD_SPECULATION",
+ "MetricExpr": "(1 - op_retired / op_spec) * (1 - (stall_slot if (#slots - 5) else (stall_slot - cpu_cycles)) / (#slots * cpu_cycles))"
+ },
+ {
+ "ArchStdEvent": "RETIRING",
+ "MetricExpr": "(op_retired / op_spec) * (1 - (stall_slot if (#slots - 5) else (stall_slot - cpu_cycles)) / (#slots * cpu_cycles))"
+ },
+ {
+ "ArchStdEvent": "BACKEND_BOUND"
+ }
+]
--
1.8.3.1

2023-01-13 09:46:35

by Jing Zhang

Subject: [PATCH v7 5/9] perf vendor events arm64: Add TLB metrics for neoverse-n2-v2

Add TLB related metrics.

Signed-off-by: Jing Zhang <[email protected]>
---
.../arch/arm64/arm/neoverse-n2-v2/metrics.json | 49 ++++++++++++++++++++++
1 file changed, 49 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
index 4e7417f..60bbd8f 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
@@ -13,5 +13,54 @@
},
{
"ArchStdEvent": "BACKEND_BOUND"
+ },
+ {
+ "MetricExpr": "L1D_TLB_REFILL / L1D_TLB",
+ "BriefDescription": "The rate of L1D TLB refill to the overall L1D TLB lookups",
+ "MetricGroup": "TLB",
+ "MetricName": "l1d_tlb_miss_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "L1I_TLB_REFILL / L1I_TLB",
+ "BriefDescription": "The rate of L1I TLB refill to the overall L1I TLB lookups",
+ "MetricGroup": "TLB",
+ "MetricName": "l1i_tlb_miss_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "L2D_TLB_REFILL / L2D_TLB",
+ "BriefDescription": "The rate of L2D TLB refill to the overall L2D TLB lookups",
+ "MetricGroup": "TLB",
+ "MetricName": "l2_tlb_miss_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "DTLB_WALK / INST_RETIRED * 1000",
+ "BriefDescription": "The rate of TLB Walks per kilo instructions for data accesses",
+ "MetricGroup": "TLB",
+ "MetricName": "dtlb_mpki",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricExpr": "DTLB_WALK / L1D_TLB",
+ "BriefDescription": "The rate of DTLB Walks to the overall L1D TLB lookups",
+ "MetricGroup": "TLB",
+ "MetricName": "dtlb_walk_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "ITLB_WALK / INST_RETIRED * 1000",
+ "BriefDescription": "The rate of TLB Walks per kilo instructions for instruction accesses",
+ "MetricGroup": "TLB",
+ "MetricName": "itlb_mpki",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricExpr": "ITLB_WALK / L1I_TLB",
+ "BriefDescription": "The rate of ITLB Walks to the overall L1I TLB lookups",
+ "MetricGroup": "TLB",
+ "MetricName": "itlb_walk_rate",
+ "ScaleUnit": "100%"
}
]
--
1.8.3.1

2023-01-13 09:47:36

by Jing Zhang

Subject: [PATCH v7 9/9] perf vendor events arm64: Add instruction mix metrics for neoverse-n2-v2

Add instruction mix related metrics.

Signed-off-by: Jing Zhang <[email protected]>
---
.../arch/arm64/arm/neoverse-n2-v2/metrics.json | 63 ++++++++++++++++++++++
1 file changed, 63 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
index 3d6ac0c..8ad15b7 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
@@ -206,5 +206,68 @@
"MetricGroup": "PEutilization",
"MetricName": "cpu_utilization",
"ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "LD_SPEC / INST_SPEC",
+ "BriefDescription": "The rate of load instructions speculatively executed to overall instructions speculatively executed",
+ "MetricGroup": "InstructionMix",
+ "MetricName": "load_spec_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "ST_SPEC / INST_SPEC",
+ "BriefDescription": "The rate of store instructions speculatively executed to overall instructions speculatively executed",
+ "MetricGroup": "InstructionMix",
+ "MetricName": "store_spec_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "DP_SPEC / INST_SPEC",
+ "BriefDescription": "The rate of integer data-processing instructions speculatively executed to overall instructions speculatively executed",
+ "MetricGroup": "InstructionMix",
+ "MetricName": "data_process_spec_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "ASE_SPEC / INST_SPEC",
+ "BriefDescription": "The rate of advanced SIMD instructions speculatively executed to overall instructions speculatively executed",
+ "MetricGroup": "InstructionMix",
+ "MetricName": "advanced_simd_spec_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "VFP_SPEC / INST_SPEC",
+ "BriefDescription": "The rate of floating point instructions speculatively executed to overall instructions speculatively executed",
+ "MetricGroup": "InstructionMix",
+ "MetricName": "float_point_spec_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "CRYPTO_SPEC / INST_SPEC",
+ "BriefDescription": "The rate of crypto instructions speculatively executed to overall instructions speculatively executed",
+ "MetricGroup": "InstructionMix",
+ "MetricName": "crypto_spec_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "BR_IMMED_SPEC / INST_SPEC",
+ "BriefDescription": "The rate of branch immediate instructions speculatively executed to overall instructions speculatively executed",
+ "MetricGroup": "InstructionMix",
+ "MetricName": "branch_immed_spec_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "BR_RETURN_SPEC / INST_SPEC",
+ "BriefDescription": "The rate of procedure return instructions speculatively executed to overall instructions speculatively executed",
+ "MetricGroup": "InstructionMix",
+ "MetricName": "branch_return_spec_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "BR_INDIRECT_SPEC / INST_SPEC",
+ "BriefDescription": "The rate of indirect branch instructions speculatively executed to overall instructions speculatively executed",
+ "MetricGroup": "InstructionMix",
+ "MetricName": "branch_indirect_spec_rate",
+ "ScaleUnit": "100%"
}
]
--
1.8.3.1

2023-01-13 09:58:01

by Jing Zhang

Subject: [PATCH v7 3/9] perf vendor events arm64: Add common topdown L1 metrics

The topdown L1 metrics come from the ARM SBSA 7.0 platform design doc [0],
D37-38, and are standard. So put them in the common arm64 file sbsa.json,
so that cores other than n2/v2 can reuse them.

[0] https://documentation-service.arm.com/static/60250c7395978b529036da86?token=

Signed-off-by: Jing Zhang <[email protected]>
---
tools/perf/pmu-events/arch/arm64/sbsa.json | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
create mode 100644 tools/perf/pmu-events/arch/arm64/sbsa.json

diff --git a/tools/perf/pmu-events/arch/arm64/sbsa.json b/tools/perf/pmu-events/arch/arm64/sbsa.json
new file mode 100644
index 0000000..f678c37e
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
@@ -0,0 +1,30 @@
+[
+ {
+ "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)",
+ "BriefDescription": "Frontend bound L1 topdown metric",
+ "MetricGroup": "TopdownL1",
+ "MetricName": "frontend_bound",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "(1 - op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles))",
+ "BriefDescription": "Bad speculation L1 topdown metric",
+ "MetricGroup": "TopdownL1",
+ "MetricName": "bad_speculation",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "(op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles))",
+ "BriefDescription": "Retiring L1 topdown metric",
+ "MetricGroup": "TopdownL1",
+ "MetricName": "retiring",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "stall_slot_backend / (#slots * cpu_cycles)",
+ "BriefDescription": "Backend Bound L1 topdown metric",
+ "MetricGroup": "TopdownL1",
+ "MetricName": "backend_bound",
+ "ScaleUnit": "100%"
+ }
+]
--
1.8.3.1

2023-01-13 09:58:25

by Jing Zhang

Subject: [PATCH v7 2/9] perf jevent: Add general metrics support

Add general metrics support, so that metrics applicable to multiple
architectures can be defined once in a common json file, like general
events, and then pulled in through "ArchStdEvent" in the json file of
each architecture.
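
The idea can be modelled with a small Python sketch. This is a simplified
stand-in, not the jevents.py code: it works on plain dicts, and the function
names are hypothetical. It shows a registry keyed by lower-cased event or
metric name, and a per-core entry that references a common metric and
overrides only the fields it sets itself.

```python
def build_arch_std(entries):
    """Index common entries by lower-cased event name and metric name."""
    table = {}
    for entry in entries:
        if entry.get("EventName"):
            table[entry["EventName"].lower()] = entry
        if entry.get("MetricName"):
            table[entry["MetricName"].lower()] = entry
    return table


def resolve(entry, table):
    """Fill fields an entry leaves unset from the common entry it references."""
    ref = entry.get("ArchStdEvent")
    if ref is None:
        return dict(entry)
    merged = dict(table[ref.lower()])  # start from the common definition
    # Fields set in the per-core entry take precedence over the common ones.
    merged.update({k: v for k, v in entry.items() if k != "ArchStdEvent"})
    return merged


if __name__ == "__main__":
    common = build_arch_std([{
        "MetricExpr": "stall_slot_backend / (#slots * cpu_cycles)",
        "BriefDescription": "Backend Bound L1 topdown metric",
        "MetricGroup": "TopdownL1",
        "MetricName": "backend_bound",
        "ScaleUnit": "100%",
    }])
    print(resolve({"ArchStdEvent": "BACKEND_BOUND"}, common))
```

Keying the table by lower-cased name is what lets a core file write
"BACKEND_BOUND" while the common file defines "backend_bound".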

Signed-off-by: Jing Zhang <[email protected]>
---
tools/perf/pmu-events/jevents.py | 2 ++
1 file changed, 2 insertions(+)

diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 4c398e0..0416b74 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -358,6 +358,8 @@ def preprocess_arch_std_files(archpath: str) -> None:
for event in read_json_events(item.path, topic=''):
if event.name:
_arch_std_events[event.name.lower()] = event
+ if event.metric_name:
+ _arch_std_events[event.metric_name.lower()] = event

def print_events_table_prefix(tblname: str) -> None:
--
1.8.3.1

2023-01-13 10:09:24

by Jing Zhang

Subject: [PATCH v7 8/9] perf vendor events arm64: Add PE utilization metrics for neoverse-n2-v2

Add PE utilization related metrics. In the cpu_utilization metric, on
neoverse-n2 (which has 5 slots), cpu_cycles must be subtracted from
stall_slot to get the real value, according to the neoverse-n2 errata [0].

[0] https://documentation-service.arm.com/static/636a66a64e6cf12278ad89cb?token=

Signed-off-by: Jing Zhang <[email protected]>
---
.../arch/arm64/arm/neoverse-n2-v2/metrics.json | 46 ++++++++++++++++++++++
1 file changed, 46 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
index afcdb17..3d6ac0c 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
@@ -160,5 +160,51 @@
"MetricGroup": "Branch",
"MetricName": "branch_miss_pred_rate",
"ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "instructions / CPU_CYCLES",
+ "BriefDescription": "The average number of instructions executed for each cycle.",
+ "MetricGroup": "PEutilization",
+ "MetricName": "ipc"
+ },
+ {
+ "MetricExpr": "ipc / 5",
+ "BriefDescription": "IPC percentage of peak. The peak of IPC is 5.",
+ "MetricGroup": "PEutilization",
+ "MetricName": "ipc_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "INST_RETIRED / CPU_CYCLES",
+ "BriefDescription": "Architecturally executed Instructions Per Cycle (IPC)",
+ "MetricGroup": "PEutilization",
+ "MetricName": "retired_ipc"
+ },
+ {
+ "MetricExpr": "INST_SPEC / CPU_CYCLES",
+ "BriefDescription": "Speculatively executed Instructions Per Cycle (IPC)",
+ "MetricGroup": "PEutilization",
+ "MetricName": "spec_ipc"
+ },
+ {
+ "MetricExpr": "OP_RETIRED / OP_SPEC",
+ "BriefDescription": "Of all the micro-operations issued, what percentage are retired(committed)",
+ "MetricGroup": "PEutilization",
+ "MetricName": "retired_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "1 - OP_RETIRED / OP_SPEC",
+ "BriefDescription": "Of all the micro-operations issued, what percentage are not retired(committed)",
+ "MetricGroup": "PEutilization",
+ "MetricName": "wasted_rate",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricExpr": "OP_RETIRED / OP_SPEC * (1 - (STALL_SLOT if (#slots - 5) else (STALL_SLOT - CPU_CYCLES)) / (#slots * CPU_CYCLES))",
+ "BriefDescription": "The truly effective ratio of micro-operations executed by the CPU, which means that misprediction and stall are not included",
+ "MetricGroup": "PEutilization",
+ "MetricName": "cpu_utilization",
+ "ScaleUnit": "100%"
}
]
--
1.8.3.1

2023-01-13 10:24:16

by Jing Zhang

Subject: Re: [PATCH v7 1/9] perf pmu: Add #slots literal support for arm64

On 2023/1/13 5:22 PM, Jing Zhang wrote:
> The number of slots may differ between architectures, so add a #slots
> literal to obtain it; #slots can then be used in metric expressions.
> Currently, #slots is only supported on arm64; other architectures
> return NAN.
>
> On arm64, the slot count comes from the register PMMIR_EL1.SLOT, which
> can be read from /sys/bus/event_source/devices/armv8_pmuv3_*/caps/slots.
> PMMIR_EL1.SLOT might read as zero if the PMU version is lower than
> ID_AA64DFR0_EL1_PMUVer_V3P4 or the STALL_SLOT event is not implemented.
>
> Signed-off-by: Jing Zhang <[email protected]>
> ---

Hi Ian,

I have made significant changes compared to the previous two versions, so
I have not carried over your Acked-by tags in this version. I look forward
to your review and hope you can give the tags again. Thank you very much.

Thanks,
Jing

> tools/perf/arch/arm64/util/pmu.c | 34 ++++++++++++++++++++++++++++++++--
> tools/perf/util/expr.c | 5 +++++
> tools/perf/util/pmu.c | 6 ++++++
> tools/perf/util/pmu.h | 1 +
> 4 files changed, 44 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c
> index 477e513..5f8667b 100644
> --- a/tools/perf/arch/arm64/util/pmu.c
> +++ b/tools/perf/arch/arm64/util/pmu.c
> @@ -3,8 +3,9 @@
> #include <internal/cpumap.h>
> #include "../../../util/cpumap.h"
> #include "../../../util/pmu.h"
> +#include <api/fs/fs.h>
>
> -const struct pmu_events_table *pmu_events_table__find(void)
> +static struct perf_pmu *pmu_core__find_same(void)
> {
> struct perf_pmu *pmu = NULL;
>
> @@ -19,8 +20,37 @@ const struct pmu_events_table *pmu_events_table__find(void)
> if (pmu->cpus->nr != cpu__max_cpu().cpu)
> return NULL;
>
> - return perf_pmu__find_table(pmu);
> + return pmu;
> }
>
> return NULL;
> }
> +
> +const struct pmu_events_table *pmu_events_table__find(void)
> +{
> + struct perf_pmu *pmu = pmu_core__find_same();
> +
> + if (pmu)
> + return perf_pmu__find_table(pmu);
> +
> + return NULL;
> +}
> +
> +double perf_pmu__cpu_slots_per_cycle(void)
> +{
> + char path[PATH_MAX];
> + unsigned long long slots = 0;
> + struct perf_pmu *pmu = pmu_core__find_same();
> +
> + if (pmu) {
> + scnprintf(path, PATH_MAX,
> + EVENT_SOURCE_DEVICE_PATH "%s/caps/slots", pmu->name);
> + /*
> + * The value of slots is not greater than 32 bits, but sysfs__read_int
> + * can't read value with 0x prefix, so use sysfs__read_ull instead.
> + */
> + sysfs__read_ull(path, &slots);
> + }
> +
> + return (double)slots;
> +}
> diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
> index 00dcde3..9d3076a 100644
> --- a/tools/perf/util/expr.c
> +++ b/tools/perf/util/expr.c
> @@ -19,6 +19,7 @@
> #include <linux/zalloc.h>
> #include <ctype.h>
> #include <math.h>
> +#include "pmu.h"
>
> #ifdef PARSER_DEBUG
> extern int expr_debug;
> @@ -448,6 +449,10 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
> result = topology->core_cpus_lists;
> goto out;
> }
> + if (!strcmp("#slots", literal)) {
> + result = perf_pmu__cpu_slots_per_cycle() ?: NAN;
> + goto out;
> + }
>
> pr_err("Unrecognized literal '%s'", literal);
> out:
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index 2bdeb89..cbb4fbf 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -19,6 +19,7 @@
> #include <regex.h>
> #include <perf/cpumap.h>
> #include <fnmatch.h>
> +#include <math.h>
> #include "debug.h"
> #include "evsel.h"
> #include "pmu.h"
> @@ -1993,3 +1994,8 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
> *ucpus_ptr = unmatched_cpus;
> return 0;
> }
> +
> +double __weak perf_pmu__cpu_slots_per_cycle(void)
> +{
> + return NAN;
> +}
> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
> index 69ca000..fd414ba 100644
> --- a/tools/perf/util/pmu.h
> +++ b/tools/perf/util/pmu.h
> @@ -259,4 +259,5 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
>
> char *pmu_find_real_name(const char *name);
> char *pmu_find_alias_name(const char *name);
> +double perf_pmu__cpu_slots_per_cycle(void);
> #endif /* __PMU_H */

2023-01-13 10:25:31

by John Garry

Subject: Re: [PATCH v7 0/9] Add metrics for neoverse-n2-v2

On 13/01/2023 09:22, Jing Zhang wrote:
> Changes since v6:
> - Split patch #1 into 3 smaller patches as suggested by Ian.
> - Change perf_pmu__get_slots into perf_pmu__cpu_slots_per_cycle,
> per John's suggestion;
> - Return NAN instead of 0 in perf_pmu__cpu_slots_per_cycle weak
> function, per John's suggestion;
> - Factor out pmu_core__find_same function, per John's suggestion.
> - Link: https://lore.kernel.org/all/[email protected]/

This looks fine. But for this code:

On 13/01/2023 09:22, Jing Zhang wrote:
> +double perf_pmu__cpu_slots_per_cycle(void)
> +{
> + char path[PATH_MAX];
> + unsigned long long slots = 0;

I would prefer if this returned NAN (and not 0) for when we can't find a
pmu or the value from ./caps/slots is zero, but I am not going to get
too hung up on that.
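
As a rough illustration of the behaviour requested above (a sketch only, not code from the series): the helper below, whose name slots_from_caps_text is hypothetical, takes the text read from a caps/slots file and returns NAN both when nothing usable was read and when the value is zero. It assumes the text is a plain number that may carry a 0x prefix, as the comment in the patch describes.

```c
#include <errno.h>
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of perf: convert the text read from
 * .../caps/slots into a slots-per-cycle value.
 * Returns NAN when the text is missing, unparsable, or zero.
 */
double slots_from_caps_text(const char *text)
{
	char *end;
	unsigned long long slots;

	if (!text)
		return NAN;	/* no PMU found or file not readable */

	errno = 0;
	/* Base 0 lets strtoull() accept "0x8" as well as "8". */
	slots = strtoull(text, &end, 0);
	if (errno || end == text)
		return NAN;	/* overflow or no digits at all */

	/* A zero SLOT field means "not implemented", so report NAN. */
	return slots ? (double)slots : NAN;
}
```

With a helper shaped like this, zero never reaches the caller, so the "?: NAN" fallback in expr.c would arguably become redundant.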

For series:

Reviewed-by: John Garry <[email protected]>

2023-01-13 16:45:17

by Jing Zhang

Subject: Re: [PATCH v7 0/9] Add metrics for neoverse-n2-v2

On 2023/1/13 5:59 PM, John Garry wrote:
> On 13/01/2023 09:22, Jing Zhang wrote:
>> Changes since v6:
>> - Split patch #1 into 3 smaller patches as suggested by Ian.
>> - Change perf_pmu__get_slots into perf_pmu__cpu_slots_per_cycle,
>>    per John's suggestion;
>> - Return NAN instead of 0 in perf_pmu__cpu_slots_per_cycle weak
>>    function, per John's suggestion;
>> - Factor out pmu_core__find_same function, per John's suggestion.
>> - Link: https://lore.kernel.org/all/[email protected]/
>
> This looks fine. But for this code:
>
> On 13/01/2023 09:22, Jing Zhang wrote:
>> +double perf_pmu__cpu_slots_per_cycle(void)
>> +{
>> +    char path[PATH_MAX];
>> +    unsigned long long slots = 0;
>
> I would prefer if this returned NAN (and not 0) for when we can't find a pmu or the value from ./caps/slots is zero, but I am not going to get too hung up on that.
>

OK, I prefer that approach too.

> For series:
>
> Reviewed-by: John Garry <[email protected]>

Thank you very much indeed!

2023-01-14 22:42:56

by Ian Rogers

Subject: Re: [PATCH v7 1/9] perf pmu: Add #slots literal support for arm64

On Fri, Jan 13, 2023 at 1:22 AM Jing Zhang <[email protected]> wrote:
>
> The number of slots may differ between architectures, so add a #slots
> literal that returns the slots of the current architecture and can be
> used in metrics. Currently #slots is only supported on arm64; other
> architectures return NAN.
>
> On arm64, the slots value comes from the register PMMIR_EL1.SLOT, which
> can be read from /sys/bus/event_source/devices/armv8_pmuv3_*/caps/slots.
> PMMIR_EL1.SLOT might read as zero if the PMU version is lower than
> ID_AA64DFR0_EL1_PMUVer_V3P4 or the STALL_SLOT event is not implemented.
>
> Signed-off-by: Jing Zhang <[email protected]>
> ---
> tools/perf/arch/arm64/util/pmu.c | 34 ++++++++++++++++++++++++++++++++--
> tools/perf/util/expr.c | 5 +++++
> tools/perf/util/pmu.c | 6 ++++++
> tools/perf/util/pmu.h | 1 +
> 4 files changed, 44 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c
> index 477e513..5f8667b 100644
> --- a/tools/perf/arch/arm64/util/pmu.c
> +++ b/tools/perf/arch/arm64/util/pmu.c
> @@ -3,8 +3,9 @@
> #include <internal/cpumap.h>
> #include "../../../util/cpumap.h"
> #include "../../../util/pmu.h"
> +#include <api/fs/fs.h>
>
> -const struct pmu_events_table *pmu_events_table__find(void)
> +static struct perf_pmu *pmu_core__find_same(void)

I'm not sure "find_same" is the best name here. I suspect it should be
"find_core_pmu", which would agree with is_arm_pmu_core. Unfortunately
"core" has become an overloaded term: it is sometimes used interchangeably
with CPU, hyperthread or SMT thread, it was a model name for Intel, and
it is used to distinguish a set of SMT threads running together from a
single one. Anyway, for consistency I think perf_pmu__find_core_pmu is
the most appropriate name (or pmu__find_core_pmu; I'm not sure why we
get the extra perf_ prefix sometimes, since in general that indicates the
functionality is in libperf).

Aside from that, lgtm. Thanks,
Ian

> {
> struct perf_pmu *pmu = NULL;
>
> @@ -19,8 +20,37 @@ const struct pmu_events_table *pmu_events_table__find(void)
> if (pmu->cpus->nr != cpu__max_cpu().cpu)
> return NULL;
>
> - return perf_pmu__find_table(pmu);
> + return pmu;
> }
>
> return NULL;
> }
> +
> +const struct pmu_events_table *pmu_events_table__find(void)
> +{
> + struct perf_pmu *pmu = pmu_core__find_same();
> +
> + if (pmu)
> + return perf_pmu__find_table(pmu);
> +
> + return NULL;
> +}
> +
> +double perf_pmu__cpu_slots_per_cycle(void)
> +{
> + char path[PATH_MAX];
> + unsigned long long slots = 0;
> + struct perf_pmu *pmu = pmu_core__find_same();
> +
> + if (pmu) {
> + scnprintf(path, PATH_MAX,
> + EVENT_SOURCE_DEVICE_PATH "%s/caps/slots", pmu->name);
> + /*
> + * The value of slots is not greater than 32 bits, but sysfs__read_int
> + * can't read value with 0x prefix, so use sysfs__read_ull instead.
> + */
> + sysfs__read_ull(path, &slots);
> + }
> +
> + return (double)slots;
> +}
> diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
> index 00dcde3..9d3076a 100644
> --- a/tools/perf/util/expr.c
> +++ b/tools/perf/util/expr.c
> @@ -19,6 +19,7 @@
> #include <linux/zalloc.h>
> #include <ctype.h>
> #include <math.h>
> +#include "pmu.h"
>
> #ifdef PARSER_DEBUG
> extern int expr_debug;
> @@ -448,6 +449,10 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
> result = topology->core_cpus_lists;
> goto out;
> }
> + if (!strcmp("#slots", literal)) {
> + result = perf_pmu__cpu_slots_per_cycle() ?: NAN;
> + goto out;
> + }
>
> pr_err("Unrecognized literal '%s'", literal);
> out:
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index 2bdeb89..cbb4fbf 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -19,6 +19,7 @@
> #include <regex.h>
> #include <perf/cpumap.h>
> #include <fnmatch.h>
> +#include <math.h>
> #include "debug.h"
> #include "evsel.h"
> #include "pmu.h"
> @@ -1993,3 +1994,8 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
> *ucpus_ptr = unmatched_cpus;
> return 0;
> }
> +
> +double __weak perf_pmu__cpu_slots_per_cycle(void)
> +{
> + return NAN;
> +}
> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
> index 69ca000..fd414ba 100644
> --- a/tools/perf/util/pmu.h
> +++ b/tools/perf/util/pmu.h
> @@ -259,4 +259,5 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
>
> char *pmu_find_real_name(const char *name);
> char *pmu_find_alias_name(const char *name);
> +double perf_pmu__cpu_slots_per_cycle(void);
> #endif /* __PMU_H */
> --
> 1.8.3.1
>

2023-01-14 22:57:46

by Ian Rogers

Subject: Re: [PATCH v7 0/9] Add metrics for neoverse-n2-v2

On Fri, Jan 13, 2023 at 8:32 AM Jing Zhang <[email protected]> wrote:
>
>
>
> On 2023/1/13 5:59 PM, John Garry wrote:
> > On 13/01/2023 09:22, Jing Zhang wrote:
> >> Changes since v6:
> >> - Split patch #1 into 3 smaller patches as suggested by Ian.
> >> - Change perf_pmu__get_slots into perf_pmu__cpu_slots_per_cycle,
> >> per John's suggestion;
> >> - Return NAN instead of 0 in perf_pmu__cpu_slots_per_cycle weak
> >> function, per John's suggestion;
> >> - Factor out pmu_core__find_same function, per John's suggestion.
> >> - Link: https://lore.kernel.org/all/[email protected]/
> >
> > This looks fine. But for this code:
> >
> > On 13/01/2023 09:22, Jing Zhang wrote:
> >> +double perf_pmu__cpu_slots_per_cycle(void)
> >> +{
> >> + char path[PATH_MAX];
> >> + unsigned long long slots = 0;
> >
> > I would prefer if this returned NAN (and not 0) for when we can't find a pmu or the value from ./caps/slots is zero, but I am not going to get too hung up on that.
> >
>
> Ok, I like this way too.
>
> > For series:
> >
> > Reviewed-by: John Garry <[email protected]>
>
> Thank you very much indeed!

Aside from a naming nit in 1/9, for the series:

Acked-by: Ian Rogers <[email protected]>

Thanks,
Ian

2023-01-14 22:59:27

by Ian Rogers

Subject: Re: [PATCH v7 2/9] perf jevent: Add general metrics support

On Fri, Jan 13, 2023 at 1:22 AM Jing Zhang <[email protected]> wrote:
>
> Add general metrics support, so that some general metrics applicable
> to multiple architectures can be defined in the public json file like
> general events, and then add general metrics through "arch_std_event"
> in json file of different architecture.
>
> Signed-off-by: Jing Zhang <[email protected]>

Acked-by: Ian Rogers <[email protected]>

Thanks,
Ian

> ---
> tools/perf/pmu-events/jevents.py | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
> index 4c398e0..0416b74 100755
> --- a/tools/perf/pmu-events/jevents.py
> +++ b/tools/perf/pmu-events/jevents.py
> @@ -358,6 +358,8 @@ def preprocess_arch_std_files(archpath: str) -> None:
> for event in read_json_events(item.path, topic=''):
> if event.name:
> _arch_std_events[event.name.lower()] = event
> + if event.metric_name:
> + _arch_std_events[event.metric_name.lower()] = event
>
>
> def print_events_table_prefix(tblname: str) -> None:
> --
> 1.8.3.1
>

2023-01-16 03:12:57

by Jing Zhang

Subject: Re: [PATCH v7 0/9] Add metrics for neoverse-n2-v2

On 2023/1/15 6:40 AM, Ian Rogers wrote:
> On Fri, Jan 13, 2023 at 8:32 AM Jing Zhang <[email protected]> wrote:
>>
>>
>>
>> On 2023/1/13 5:59 PM, John Garry wrote:
>>> On 13/01/2023 09:22, Jing Zhang wrote:
>>>> Changes since v6:
>>>> - Split patch #1 into 3 smaller patches as suggested by Ian.
>>>> - Change perf_pmu__get_slots into perf_pmu__cpu_slots_per_cycle,
>>>> per John's suggestion;
>>>> - Return NAN instead of 0 in perf_pmu__cpu_slots_per_cycle weak
>>>> function, per John's suggestion;
>>>> - Factor out pmu_core__find_same function, per John's suggestion.
>>>> - Link: https://lore.kernel.org/all/[email protected]/
>>>
>>> This looks fine. But for this code:
>>>
>>> On 13/01/2023 09:22, Jing Zhang wrote:
>>>> +double perf_pmu__cpu_slots_per_cycle(void)
>>>> +{
>>>> + char path[PATH_MAX];
>>>> + unsigned long long slots = 0;
>>>
>>> I would prefer if this returned NAN (and not 0) for when we can't find a pmu or the value from ./caps/slots is zero, but I am not going to get too hung up on that.
>>>
>>
>> Ok, I like this way too.
>>
>>> For series:
>>>
>>> Reviewed-by: John Garry <[email protected]>
>>
>> Thank you very much indeed!
>
> Aside from a naming nit in 1/9, for the series:
>
> Acked-by: Ian Rogers <[email protected]>
>

Thank you sincerely!

> Thanks,
> Ian

2023-01-16 03:22:19

by Jing Zhang

Subject: Re: [PATCH v7 1/9] perf pmu: Add #slots literal support for arm64

On 2023/1/15 6:15 AM, Ian Rogers wrote:
> On Fri, Jan 13, 2023 at 1:22 AM Jing Zhang <[email protected]> wrote:
>>
>> The number of slots may differ between architectures, so add a #slots
>> literal that returns the slots of the current architecture and can be
>> used in metrics. Currently #slots is only supported on arm64; other
>> architectures return NAN.
>>
>> On arm64, the slots value comes from the register PMMIR_EL1.SLOT, which
>> can be read from /sys/bus/event_source/devices/armv8_pmuv3_*/caps/slots.
>> PMMIR_EL1.SLOT might read as zero if the PMU version is lower than
>> ID_AA64DFR0_EL1_PMUVer_V3P4 or the STALL_SLOT event is not implemented.
>>
>> Signed-off-by: Jing Zhang <[email protected]>
>> ---
>> tools/perf/arch/arm64/util/pmu.c | 34 ++++++++++++++++++++++++++++++++--
>> tools/perf/util/expr.c | 5 +++++
>> tools/perf/util/pmu.c | 6 ++++++
>> tools/perf/util/pmu.h | 1 +
>> 4 files changed, 44 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c
>> index 477e513..5f8667b 100644
>> --- a/tools/perf/arch/arm64/util/pmu.c
>> +++ b/tools/perf/arch/arm64/util/pmu.c
>> @@ -3,8 +3,9 @@
>> #include <internal/cpumap.h>
>> #include "../../../util/cpumap.h"
>> #include "../../../util/pmu.h"
>> +#include <api/fs/fs.h>
>>
>> -const struct pmu_events_table *pmu_events_table__find(void)
>> +static struct perf_pmu *pmu_core__find_same(void)
>
> I'm not sure "find_same" is the best name here. I suspect it should be
> "find_core_pmu" which would agree with is_arm_pmu_core. Unfortunately
> "core" has become an overloaded term sometimes used interchangeably
> with CPU, hyperthread or SMT thread, it was a model name for Intel and
> it is used to distinguish a set of SMT threads running together from a
> single one. Anyway, for consistency I think perf_pmu__find_core_pmu is
> the most appropriate name (or pmu__find_core_pmu, I'm not sure why we
> get the extra perf_ prefix sometimes, in general that indicates the
> functionality is in libperf).
>

I used "pmu_core__find_same" to indicate that we are only dealing with
homogeneous cores. Since most of the static functions in
tools/perf/util/pmu.c have a "pmu_" prefix, maybe we can use
"pmu_find_same_core_pmu"? Ian, John, what do you think?

Thanks,
Jing

> Aside from that, lgtm. Thanks,
> Ian
>
>> {
>> struct perf_pmu *pmu = NULL;
>>
>> @@ -19,8 +20,37 @@ const struct pmu_events_table *pmu_events_table__find(void)
>> if (pmu->cpus->nr != cpu__max_cpu().cpu)
>> return NULL;
>>
>> - return perf_pmu__find_table(pmu);
>> + return pmu;
>> }
>>
>> return NULL;
>> }
>> +
>> +const struct pmu_events_table *pmu_events_table__find(void)
>> +{
>> + struct perf_pmu *pmu = pmu_core__find_same();
>> +
>> + if (pmu)
>> + return perf_pmu__find_table(pmu);
>> +
>> + return NULL;
>> +}
>> +
>> +double perf_pmu__cpu_slots_per_cycle(void)
>> +{
>> + char path[PATH_MAX];
>> + unsigned long long slots = 0;
>> + struct perf_pmu *pmu = pmu_core__find_same();
>> +
>> + if (pmu) {
>> + scnprintf(path, PATH_MAX,
>> + EVENT_SOURCE_DEVICE_PATH "%s/caps/slots", pmu->name);
>> + /*
>> + * The value of slots is not greater than 32 bits, but sysfs__read_int
>> + * can't read value with 0x prefix, so use sysfs__read_ull instead.
>> + */
>> + sysfs__read_ull(path, &slots);
>> + }
>> +
>> + return (double)slots;
>> +}
>> diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
>> index 00dcde3..9d3076a 100644
>> --- a/tools/perf/util/expr.c
>> +++ b/tools/perf/util/expr.c
>> @@ -19,6 +19,7 @@
>> #include <linux/zalloc.h>
>> #include <ctype.h>
>> #include <math.h>
>> +#include "pmu.h"
>>
>> #ifdef PARSER_DEBUG
>> extern int expr_debug;
>> @@ -448,6 +449,10 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
>> result = topology->core_cpus_lists;
>> goto out;
>> }
>> + if (!strcmp("#slots", literal)) {
>> + result = perf_pmu__cpu_slots_per_cycle() ?: NAN;
>> + goto out;
>> + }
>>
>> pr_err("Unrecognized literal '%s'", literal);
>> out:
>> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
>> index 2bdeb89..cbb4fbf 100644
>> --- a/tools/perf/util/pmu.c
>> +++ b/tools/perf/util/pmu.c
>> @@ -19,6 +19,7 @@
>> #include <regex.h>
>> #include <perf/cpumap.h>
>> #include <fnmatch.h>
>> +#include <math.h>
>> #include "debug.h"
>> #include "evsel.h"
>> #include "pmu.h"
>> @@ -1993,3 +1994,8 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
>> *ucpus_ptr = unmatched_cpus;
>> return 0;
>> }
>> +
>> +double __weak perf_pmu__cpu_slots_per_cycle(void)
>> +{
>> + return NAN;
>> +}
>> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
>> index 69ca000..fd414ba 100644
>> --- a/tools/perf/util/pmu.h
>> +++ b/tools/perf/util/pmu.h
>> @@ -259,4 +259,5 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
>>
>> char *pmu_find_real_name(const char *name);
>> char *pmu_find_alias_name(const char *name);
>> +double perf_pmu__cpu_slots_per_cycle(void);
>> #endif /* __PMU_H */
>> --
>> 1.8.3.1
>>

2023-01-16 06:53:21

by Ian Rogers

Subject: Re: [PATCH v7 1/9] perf pmu: Add #slots literal support for arm64

On Sun, Jan 15, 2023 at 6:58 PM Jing Zhang <[email protected]> wrote:
>
> On 2023/1/15 6:15 AM, Ian Rogers wrote:
> > On Fri, Jan 13, 2023 at 1:22 AM Jing Zhang <[email protected]> wrote:
> >>
> >> The number of slots may differ between architectures, so add a #slots
> >> literal that returns the slots of the current architecture and can be
> >> used in metrics. Currently #slots is only supported on arm64; other
> >> architectures return NAN.
> >>
> >> On arm64, the slots value comes from the register PMMIR_EL1.SLOT, which
> >> can be read from /sys/bus/event_source/devices/armv8_pmuv3_*/caps/slots.
> >> PMMIR_EL1.SLOT might read as zero if the PMU version is lower than
> >> ID_AA64DFR0_EL1_PMUVer_V3P4 or the STALL_SLOT event is not implemented.
> >>
> >> Signed-off-by: Jing Zhang <[email protected]>
> >> ---
> >> tools/perf/arch/arm64/util/pmu.c | 34 ++++++++++++++++++++++++++++++++--
> >> tools/perf/util/expr.c | 5 +++++
> >> tools/perf/util/pmu.c | 6 ++++++
> >> tools/perf/util/pmu.h | 1 +
> >> 4 files changed, 44 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c
> >> index 477e513..5f8667b 100644
> >> --- a/tools/perf/arch/arm64/util/pmu.c
> >> +++ b/tools/perf/arch/arm64/util/pmu.c
> >> @@ -3,8 +3,9 @@
> >> #include <internal/cpumap.h>
> >> #include "../../../util/cpumap.h"
> >> #include "../../../util/pmu.h"
> >> +#include <api/fs/fs.h>
> >>
> >> -const struct pmu_events_table *pmu_events_table__find(void)
> >> +static struct perf_pmu *pmu_core__find_same(void)
> >
> > I'm not sure "find_same" is the best name here. I suspect it should be
> > "find_core_pmu" which would agree with is_arm_pmu_core. Unfortunately
> > "core" has become an overloaded term sometimes used interchangeably
> > with CPU, hyperthread or SMT thread, it was a model name for Intel and
> > it is used to distinguish a set of SMT threads running together from a
> > single one. Anyway, for consistency I think perf_pmu__find_core_pmu is
> > the most appropriate name (or pmu__find_core_pmu, I'm not sure why we
> > get the extra perf_ prefix sometimes, in general that indicates the
> > functionality is in libperf).
> >
>
> The reason for using "pmu_core__find_same" before is to indicate that we're
> only dealing with homogeneous cores. And in the tools/perf/util/pmu.c file,
> most of the static functions have "pmu_" prefix, maybe we can use
> "pmu_find_same_core_pmu"? Ian, John, what do you think?

I wouldn't necessarily worry about hybrid given #slots is currently
ARM specific. For hybrid we'd need to know the CPU for the metric. We
do have the list of CPUs (really hyper/SMT threads) that were
requested for the metric:
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/expr.h?h=perf/core#n9
We use it to compute the #core_wide literal. We need the value early
before events are created, hence the string format. You could search
pmus looking for a PMU that isn't uncore and has matching CPUs, but
AMD has multiple core/hardware PMUs (hence Ravi's most recent work)
and so I suspect we'd need to extend PMU to make this work. We've
solved a similar problem to this with the source_count metric
function, that returns a number of aliased events. We could do
something like:
slots(INST_RETIRED.MACRO_FUSED)
and then from the event get the PMU, from the PMU get the CPUs, from a
CPU get the slots. In that case it may just be cleaner to pass the PMU
name to the slots function, so:
slots(cpu_core) or slots(cpu_atom)
But the parser wouldn't understand that cpu_core or cpu_atom were PMU
names and would try to handle them as events or metric references.
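
To make the slots(cpu_core) idea concrete, here is a small sketch of a lookup keyed by PMU name. Everything in it is hypothetical: the struct pmu_slots type, the slots_for_pmu function, and any slot counts a caller puts in the table are illustrations only, not perf code or real hardware values. Unknown PMUs return NAN, matching the weak default in the patch.

```c
#include <math.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-PMU record; slot counts are supplied by the caller. */
struct pmu_slots {
	const char *pmu_name;
	double slots_per_cycle;
};

/* Look up slots by PMU name; unknown PMUs yield NAN, like the weak default. */
double slots_for_pmu(const struct pmu_slots *tbl, size_t n, const char *pmu_name)
{
	for (size_t i = 0; i < n; i++) {
		if (!strcmp(tbl[i].pmu_name, pmu_name))
			return tbl[i].slots_per_cycle;
	}
	return NAN;
}
```

The parser problem described above is untouched by this sketch; it only shows the "PMU name in, slots out" shape once a name has been recognised.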

Again, I think ignoring the hybrid case is fine in this case. Using
"same" in the function name to imply "not hybrid" I don't think works,
so I think something like pmu__find_core_pmu is best. You could have a
comment and also a:
assert(!perf_pmu__is_hybrid(pmu->name));
Ultimately I'd like to get rid of all notions of hybrid and just pair
events with a PMU. I recently cleaned this up in builtin-list.c.
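
A sketch of what the renamed helper with a comment and an assert might look like. It uses simplified stand-in types rather than perf's own: struct mock_pmu, mock_pmu_is_hybrid() and mock_find_core_pmu() are assumptions made for illustration, and the hybrid check here just matches the names cpu_core and cpu_atom.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for perf's struct perf_pmu (assumption). */
struct mock_pmu {
	const char *name;
	bool is_core;
	int nr_cpus;
};

/* Stand-in for perf_pmu__is_hybrid(); only knows the two Intel hybrid names. */
static bool mock_pmu_is_hybrid(const char *name)
{
	return !strcmp(name, "cpu_core") || !strcmp(name, "cpu_atom");
}

/*
 * Return the first core PMU if it covers every CPU, otherwise NULL.
 * Hybrid (heterogeneous) systems are deliberately not handled.
 */
const struct mock_pmu *mock_find_core_pmu(const struct mock_pmu *pmus,
					  size_t n, int max_cpus)
{
	for (size_t i = 0; i < n; i++) {
		if (!pmus[i].is_core)
			continue;
		/* A core PMU covering only some CPUs implies mixed cores. */
		if (pmus[i].nr_cpus != max_cpus)
			return NULL;
		assert(!mock_pmu_is_hybrid(pmus[i].name));
		return &pmus[i];
	}
	return NULL;
}
```

The CPU-count check mirrors the one already in the patch; the assert only documents the "not hybrid" assumption next to it.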

Thanks,
Ian

> Thanks,
> Jing
>
> > Aside from that, lgtm. Thanks,
> > Ian
> >
> >> {
> >> struct perf_pmu *pmu = NULL;
> >>
> >> @@ -19,8 +20,37 @@ const struct pmu_events_table *pmu_events_table__find(void)
> >> if (pmu->cpus->nr != cpu__max_cpu().cpu)
> >> return NULL;
> >>
> >> - return perf_pmu__find_table(pmu);
> >> + return pmu;
> >> }
> >>
> >> return NULL;
> >> }
> >> +
> >> +const struct pmu_events_table *pmu_events_table__find(void)
> >> +{
> >> + struct perf_pmu *pmu = pmu_core__find_same();
> >> +
> >> + if (pmu)
> >> + return perf_pmu__find_table(pmu);
> >> +
> >> + return NULL;
> >> +}
> >> +
> >> +double perf_pmu__cpu_slots_per_cycle(void)
> >> +{
> >> + char path[PATH_MAX];
> >> + unsigned long long slots = 0;
> >> + struct perf_pmu *pmu = pmu_core__find_same();
> >> +
> >> + if (pmu) {
> >> + scnprintf(path, PATH_MAX,
> >> + EVENT_SOURCE_DEVICE_PATH "%s/caps/slots", pmu->name);
> >> + /*
> >> + * The value of slots is not greater than 32 bits, but sysfs__read_int
> >> + * can't read value with 0x prefix, so use sysfs__read_ull instead.
> >> + */
> >> + sysfs__read_ull(path, &slots);
> >> + }
> >> +
> >> + return (double)slots;
> >> +}
> >> diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
> >> index 00dcde3..9d3076a 100644
> >> --- a/tools/perf/util/expr.c
> >> +++ b/tools/perf/util/expr.c
> >> @@ -19,6 +19,7 @@
> >> #include <linux/zalloc.h>
> >> #include <ctype.h>
> >> #include <math.h>
> >> +#include "pmu.h"
> >>
> >> #ifdef PARSER_DEBUG
> >> extern int expr_debug;
> >> @@ -448,6 +449,10 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
> >> result = topology->core_cpus_lists;
> >> goto out;
> >> }
> >> + if (!strcmp("#slots", literal)) {
> >> + result = perf_pmu__cpu_slots_per_cycle() ?: NAN;
> >> + goto out;
> >> + }
> >>
> >> pr_err("Unrecognized literal '%s'", literal);
> >> out:
> >> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> >> index 2bdeb89..cbb4fbf 100644
> >> --- a/tools/perf/util/pmu.c
> >> +++ b/tools/perf/util/pmu.c
> >> @@ -19,6 +19,7 @@
> >> #include <regex.h>
> >> #include <perf/cpumap.h>
> >> #include <fnmatch.h>
> >> +#include <math.h>
> >> #include "debug.h"
> >> #include "evsel.h"
> >> #include "pmu.h"
> >> @@ -1993,3 +1994,8 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
> >> *ucpus_ptr = unmatched_cpus;
> >> return 0;
> >> }
> >> +
> >> +double __weak perf_pmu__cpu_slots_per_cycle(void)
> >> +{
> >> + return NAN;
> >> +}
> >> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
> >> index 69ca000..fd414ba 100644
> >> --- a/tools/perf/util/pmu.h
> >> +++ b/tools/perf/util/pmu.h
> >> @@ -259,4 +259,5 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
> >>
> >> char *pmu_find_real_name(const char *name);
> >> char *pmu_find_alias_name(const char *name);
> >> +double perf_pmu__cpu_slots_per_cycle(void);
> >> #endif /* __PMU_H */
> >> --
> >> 1.8.3.1
> >>

2023-01-16 11:38:50

by Jing Zhang

Subject: Re: [PATCH v7 1/9] perf pmu: Add #slots literal support for arm64

On 2023/1/16 1:59 PM, Ian Rogers wrote:
> On Sun, Jan 15, 2023 at 6:58 PM Jing Zhang <[email protected]> wrote:
>>
>> On 2023/1/15 6:15 AM, Ian Rogers wrote:
>>> On Fri, Jan 13, 2023 at 1:22 AM Jing Zhang <[email protected]> wrote:
>>>>
>>>> The number of slots may differ between architectures, so add a #slots
>>>> literal that returns the slots of the current architecture and can be
>>>> used in metrics. Currently #slots is only supported on arm64; other
>>>> architectures return NAN.
>>>>
>>>> On arm64, the slots value comes from the register PMMIR_EL1.SLOT, which
>>>> can be read from /sys/bus/event_source/devices/armv8_pmuv3_*/caps/slots.
>>>> PMMIR_EL1.SLOT might read as zero if the PMU version is lower than
>>>> ID_AA64DFR0_EL1_PMUVer_V3P4 or the STALL_SLOT event is not implemented.
>>>>
>>>> Signed-off-by: Jing Zhang <[email protected]>
>>>> ---
>>>> tools/perf/arch/arm64/util/pmu.c | 34 ++++++++++++++++++++++++++++++++--
>>>> tools/perf/util/expr.c | 5 +++++
>>>> tools/perf/util/pmu.c | 6 ++++++
>>>> tools/perf/util/pmu.h | 1 +
>>>> 4 files changed, 44 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c
>>>> index 477e513..5f8667b 100644
>>>> --- a/tools/perf/arch/arm64/util/pmu.c
>>>> +++ b/tools/perf/arch/arm64/util/pmu.c
>>>> @@ -3,8 +3,9 @@
>>>> #include <internal/cpumap.h>
>>>> #include "../../../util/cpumap.h"
>>>> #include "../../../util/pmu.h"
>>>> +#include <api/fs/fs.h>
>>>>
>>>> -const struct pmu_events_table *pmu_events_table__find(void)
>>>> +static struct perf_pmu *pmu_core__find_same(void)
>>>
>>> I'm not sure "find_same" is the best name here. I suspect it should be
>>> "find_core_pmu" which would agree with is_arm_pmu_core. Unfortunately
>>> "core" has become an overloaded term sometimes used interchangeably
>>> with CPU, hyperthread or SMT thread, it was a model name for Intel and
>>> it is used to distinguish a set of SMT threads running together from a
>>> single one. Anyway, for consistency I think perf_pmu__find_core_pmu is
>>> the most appropriate name (or pmu__find_core_pmu, I'm not sure why we
>>> get the extra perf_ prefix sometimes, in general that indicates the
>>> functionality is in libperf).
>>>
>>
>> The reason for using "pmu_core__find_same" before is to indicate that we're
>> only dealing with homogeneous cores. And in the tools/perf/util/pmu.c file,
>> most of the static functions have "pmu_" prefix, maybe we can use
>> "pmu_find_same_core_pmu"? Ian, John, what do you think?
>
> I wouldn't necessarily worry about hybrid given #slots is currently
> ARM specific. For hybrid we'd need to know the CPU for the metric. We
> do have the list of CPUs (really hyper/SMT threads) that were
> requested for the metric:
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/expr.h?h=perf/core#n9
> We use it to compute the #core_wide literal. We need the value early
> before events are created, hence the string format. You could search
> pmus looking for a PMU that isn't uncore and has matching CPUs, but
> AMD has multiple core/hardware PMUs (hence Ravi's most recent work)
> and so I suspect we'd need to extend PMU to make this work. We've
> solved a similar problem to this with the source_count metric
> function, that returns a number of aliased events. We could do
> something like:
> slots(INST_RETIRED.MACRO_FUSED)
> and then from the event get the PMU, from the PMU get the CPUs, from a
> CPU get the slots. In that case it may just be cleaner to pass the PMU
> name to the slots function, so:
> slots(cpu_core) or slots(cpu_atom)
> But the parser wouldn't understand that cpu_core or cpu_atom were PMU
> names and would try to handle them as events or metric references.
>
> Again, I think ignoring the hybrid case is fine in this case. Using
> "same" in the function name to imply "not hybrid" I don't think works,
> so I think something like pmu__find_core_pmu is best. You could have a
> comment and also a:
> assert(!perf_pmu__is_hybrid(pmu->name));
> Ultimately I'd like to get rid of all notions of hybrid and just pair
> events with a PMU. I recently cleaned this up in builtin-list.c.
>

OK, you are right. I will follow your suggestion and use pmu__find_core_pmu.

I think “if (pmu->cpus->nr != cpu__max_cpu().cpu)” in the original code
has the same effect as “assert(!perf_pmu__is_hybrid(pmu->name))”, so
I will keep the existing check as it is.

> Thanks,
> Ian
>
>> Thanks,
>> Jing
>>
>>> Aside from that, lgtm. Thanks,
>>> Ian
>>>
>>>> {
>>>> struct perf_pmu *pmu = NULL;
>>>>
>>>> @@ -19,8 +20,37 @@ const struct pmu_events_table *pmu_events_table__find(void)
>>>> if (pmu->cpus->nr != cpu__max_cpu().cpu)
>>>> return NULL;
>>>>
>>>> - return perf_pmu__find_table(pmu);
>>>> + return pmu;
>>>> }
>>>>
>>>> return NULL;
>>>> }
>>>> +
>>>> +const struct pmu_events_table *pmu_events_table__find(void)
>>>> +{
>>>> + struct perf_pmu *pmu = pmu_core__find_same();
>>>> +
>>>> + if (pmu)
>>>> + return perf_pmu__find_table(pmu);
>>>> +
>>>> + return NULL;
>>>> +}
>>>> +
>>>> +double perf_pmu__cpu_slots_per_cycle(void)
>>>> +{
>>>> + char path[PATH_MAX];
>>>> + unsigned long long slots = 0;
>>>> + struct perf_pmu *pmu = pmu_core__find_same();
>>>> +
>>>> + if (pmu) {
>>>> + scnprintf(path, PATH_MAX,
>>>> + EVENT_SOURCE_DEVICE_PATH "%s/caps/slots", pmu->name);
>>>> + /*
>>>> + * The value of slots is not greater than 32 bits, but sysfs__read_int
>>>> + * can't read value with 0x prefix, so use sysfs__read_ull instead.
>>>> + */
>>>> + sysfs__read_ull(path, &slots);
>>>> + }
>>>> +
>>>> + return (double)slots;
>>>> +}
>>>> diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
>>>> index 00dcde3..9d3076a 100644
>>>> --- a/tools/perf/util/expr.c
>>>> +++ b/tools/perf/util/expr.c
>>>> @@ -19,6 +19,7 @@
>>>> #include <linux/zalloc.h>
>>>> #include <ctype.h>
>>>> #include <math.h>
>>>> +#include "pmu.h"
>>>>
>>>> #ifdef PARSER_DEBUG
>>>> extern int expr_debug;
>>>> @@ -448,6 +449,10 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
>>>> result = topology->core_cpus_lists;
>>>> goto out;
>>>> }
>>>> + if (!strcmp("#slots", literal)) {
>>>> + result = perf_pmu__cpu_slots_per_cycle() ?: NAN;
>>>> + goto out;
>>>> + }
>>>>
>>>> pr_err("Unrecognized literal '%s'", literal);
>>>> out:
>>>> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
>>>> index 2bdeb89..cbb4fbf 100644
>>>> --- a/tools/perf/util/pmu.c
>>>> +++ b/tools/perf/util/pmu.c
>>>> @@ -19,6 +19,7 @@
>>>> #include <regex.h>
>>>> #include <perf/cpumap.h>
>>>> #include <fnmatch.h>
>>>> +#include <math.h>
>>>> #include "debug.h"
>>>> #include "evsel.h"
>>>> #include "pmu.h"
>>>> @@ -1993,3 +1994,8 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
>>>> *ucpus_ptr = unmatched_cpus;
>>>> return 0;
>>>> }
>>>> +
>>>> +double __weak perf_pmu__cpu_slots_per_cycle(void)
>>>> +{
>>>> + return NAN;
>>>> +}
>>>> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
>>>> index 69ca000..fd414ba 100644
>>>> --- a/tools/perf/util/pmu.h
>>>> +++ b/tools/perf/util/pmu.h
>>>> @@ -259,4 +259,5 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
>>>>
>>>> char *pmu_find_real_name(const char *name);
>>>> char *pmu_find_alias_name(const char *name);
>>>> +double perf_pmu__cpu_slots_per_cycle(void);
>>>> #endif /* __PMU_H */
>>>> --
>>>> 1.8.3.1
>>>>