This patch series teaches perf to recognise the Arm Neoverse N2 core
and the counters it provides.
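As a rough illustration of what "recognising" the core involves, the
sketch below decodes the MIDR value used in the mapfile.csv row added by
the first patch (0x00000000410fd490) and compares it against a CPU's
MIDR. The field layout is the architectural MIDR_EL1 layout; the helper
names are hypothetical, and masking out variant and revision before
comparing is an assumption about how the match is done, not something
stated in this series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Fields of MIDR_EL1, following the architectural register layout. */
struct midr_fields {
	unsigned int implementer;  /* bits [31:24], 0x41 is Arm */
	unsigned int variant;      /* bits [23:20] */
	unsigned int architecture; /* bits [19:16] */
	unsigned int partnum;      /* bits [15:4] */
	unsigned int revision;     /* bits [3:0] */
};

/* Hypothetical helper: split a MIDR value into its fields. */
struct midr_fields midr_decode(uint64_t midr)
{
	struct midr_fields f = {
		.implementer  = (unsigned int)((midr >> 24) & 0xff),
		.variant      = (unsigned int)((midr >> 20) & 0xf),
		.architecture = (unsigned int)((midr >> 16) & 0xf),
		.partnum      = (unsigned int)((midr >> 4) & 0xfff),
		.revision     = (unsigned int)(midr & 0xf),
	};
	return f;
}

/*
 * Hypothetical helper: clear variant and revision so that all steppings
 * of one core compare equal (assumed matching policy).
 */
uint64_t midr_strip_variant_revision(uint64_t midr)
{
	return midr & ~((0xfULL << 20) | 0xfULL);
}

/*
 * Hypothetical helper: does a mapfile.csv row of the form
 * "cpuid,version,directory,type" describe the CPU with this MIDR?
 * Returns 1 on a match, 0 otherwise (including malformed rows).
 */
int mapfile_row_matches(const char *row, uint64_t cpu_midr)
{
	char *end;
	uint64_t row_midr;

	assert(row != NULL);
	row_midr = strtoull(row, &end, 16); /* base 16 accepts the 0x prefix */
	if (end == row || *end != ',')
		return 0;
	return midr_strip_variant_revision(row_midr) ==
	       midr_strip_variant_revision(cpu_midr);
}
```

Calling midr_decode(0x410fd490) yields implementer 0x41 and part number
0xd49, which is why that row selects the arm/neoverse-n2 directory.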
The first patch adds the new counters to the common and microarch json
file (because they can also exist in other ArmV8 and ArmV9 cpus). Files
are also added to pmu-events/arch/arm64/arm/neoverse-n2 so that 'perf list'
categorises the counters.
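The split between the two kinds of file can be pictured as a lookup:
the per-core files only name an "ArchStdEvent", and the code and
description come from the common file. The sketch below is not how
perf's event tables are actually built; it is a minimal stand-in, with a
hypothetical struct and helper, using a few entries copied from the
common json diff in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one entry of the common json file. */
struct arch_std_event {
	const char *name;   /* "EventName" */
	unsigned int code;  /* "EventCode" */
	const char *brief;  /* "BriefDescription" */
};

/* A handful of entries taken from the common-and-microarch diff. */
static const struct arch_std_event common_events[] = {
	{ "SAMPLE_POP",   0x4000, "Sample Population" },
	{ "SAMPLE_FEED",  0x4001, "Sample Taken" },
	{ "TRB_WRAP",     0x400C, "Trace buffer current write pointer wrapped" },
	{ "LD_ALIGN_LAT", 0x4021, "Load with additional latency from alignment" },
};

/*
 * Resolve the name given in a per-core {"ArchStdEvent": "..."} entry
 * against the common table. Returns NULL when the name is unknown.
 */
const struct arch_std_event *arch_std_event_find(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(common_events) / sizeof(common_events[0]); i++) {
		if (strcmp(common_events[i].name, name) == 0)
			return &common_events[i];
	}
	return NULL;
}
```

With this shape, a per-core file such as neoverse-n2/trace.json only has
to say "TRB_WRAP"; the event code 0x400C is defined once, in the common
file.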
The second patch renames armv8-common-and-microarch.json and
armv8-recommended.json, removing the armv8 prefix to reflect that
they can include counters for armv9.
Changes in v2:
- Addition of a cover letter
- Small changes in commit messages
- Omission of a patch reformatting whitespace that was
applied by Arnaldo
Previous version:
https://lore.kernel.org/lkml/[email protected]/
Andrew Kilroy (2):
perf vendor events: For the Arm Neoverse N2
perf vendor events: Rename arm64 arch std event files
.../arch/arm64/arm/neoverse-n2/branch.json | 8 +
.../arch/arm64/arm/neoverse-n2/bus.json | 20 ++
.../arch/arm64/arm/neoverse-n2/cache.json | 155 ++++++++++++++
.../arch/arm64/arm/neoverse-n2/exception.json | 47 +++++
.../arm64/arm/neoverse-n2/instruction.json | 143 +++++++++++++
.../arch/arm64/arm/neoverse-n2/memory.json | 38 ++++
.../arch/arm64/arm/neoverse-n2/other.json | 5 +
.../arch/arm64/arm/neoverse-n2/pipeline.json | 23 ++
.../arch/arm64/arm/neoverse-n2/spe.json | 14 ++
.../arch/arm64/arm/neoverse-n2/trace.json | 29 +++
...croarch.json => common-and-microarch.json} | 198 ++++++++++++++++++
tools/perf/pmu-events/arch/arm64/mapfile.csv | 1 +
...rmv8-recommended.json => recommended.json} | 0
13 files changed, 681 insertions(+)
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/branch.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/bus.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/exception.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/instruction.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/memory.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/other.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/spe.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/trace.json
rename tools/perf/pmu-events/arch/arm64/{armv8-common-and-microarch.json => common-and-microarch.json} (76%)
rename tools/perf/pmu-events/arch/arm64/{armv8-recommended.json => recommended.json} (100%)
--
2.17.1
Update the common and microarch json file to add counters that are
available in the Arm Neoverse N2 chip and should also apply to other
ArmV8 and ArmV9 cpus. These are specified in the ArmV8 architecture
reference manual:
https://developer.arm.com/documentation/ddi0487/gb/?lang=en
Some of the counters added to armv8-common-and-microarch.json are
specified in the ArmV9 architecture reference manual supplement
(issue A.a):
https://developer.arm.com/documentation/ddi0608/aa
The additional ArmV9 counters are:
TRB_WRAP
TRCEXTOUT0
TRCEXTOUT1
TRCEXTOUT2
TRCEXTOUT3
CTI_TRIGOUT4
CTI_TRIGOUT5
CTI_TRIGOUT6
CTI_TRIGOUT7
This patch also adds files in pmu-events/arch/arm64/arm/neoverse-n2 for
perf list to output the counter names in categories.
Counters on the Neoverse N2 are stated in its reference manual:
https://developer.arm.com/documentation/102099/0000
Signed-off-by: Andrew Kilroy <[email protected]>
---
.../arch/arm64/arm/neoverse-n2/branch.json | 8 +
.../arch/arm64/arm/neoverse-n2/bus.json | 20 ++
.../arch/arm64/arm/neoverse-n2/cache.json | 155 ++++++++++++++
.../arch/arm64/arm/neoverse-n2/exception.json | 47 +++++
.../arm64/arm/neoverse-n2/instruction.json | 143 +++++++++++++
.../arch/arm64/arm/neoverse-n2/memory.json | 38 ++++
.../arch/arm64/arm/neoverse-n2/other.json | 5 +
.../arch/arm64/arm/neoverse-n2/pipeline.json | 23 ++
.../arch/arm64/arm/neoverse-n2/spe.json | 14 ++
.../arch/arm64/arm/neoverse-n2/trace.json | 29 +++
.../arm64/armv8-common-and-microarch.json | 198 ++++++++++++++++++
tools/perf/pmu-events/arch/arm64/mapfile.csv | 1 +
12 files changed, 681 insertions(+)
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/branch.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/bus.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/exception.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/instruction.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/memory.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/other.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/spe.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/trace.json
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/branch.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/branch.json
new file mode 100644
index 000000000000..79f2016c53b0
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/branch.json
@@ -0,0 +1,8 @@
+[
+ {
+ "ArchStdEvent": "BR_MIS_PRED"
+ },
+ {
+ "ArchStdEvent": "BR_PRED"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/bus.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/bus.json
new file mode 100644
index 000000000000..579c1c993d17
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/bus.json
@@ -0,0 +1,20 @@
+[
+ {
+ "ArchStdEvent": "CPU_CYCLES"
+ },
+ {
+ "ArchStdEvent": "BUS_ACCESS"
+ },
+ {
+ "ArchStdEvent": "BUS_CYCLES"
+ },
+ {
+ "ArchStdEvent": "BUS_ACCESS_RD"
+ },
+ {
+ "ArchStdEvent": "BUS_ACCESS_WR"
+ },
+ {
+ "ArchStdEvent": "CNT_CYCLES"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/cache.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/cache.json
new file mode 100644
index 000000000000..0141f749bff3
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/cache.json
@@ -0,0 +1,155 @@
+[
+ {
+ "ArchStdEvent": "L1I_CACHE_REFILL"
+ },
+ {
+ "ArchStdEvent": "L1I_TLB_REFILL"
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_REFILL"
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE"
+ },
+ {
+ "ArchStdEvent": "L1D_TLB_REFILL"
+ },
+ {
+ "ArchStdEvent": "L1I_CACHE"
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_WB"
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE"
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_REFILL"
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_WB"
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_ALLOCATE"
+ },
+ {
+ "ArchStdEvent": "L1D_TLB"
+ },
+ {
+ "ArchStdEvent": "L1I_TLB"
+ },
+ {
+ "ArchStdEvent": "L3D_CACHE_ALLOCATE"
+ },
+ {
+ "ArchStdEvent": "L3D_CACHE_REFILL"
+ },
+ {
+ "ArchStdEvent": "L3D_CACHE"
+ },
+ {
+ "ArchStdEvent": "L2D_TLB_REFILL"
+ },
+ {
+ "ArchStdEvent": "L2D_TLB"
+ },
+ {
+ "ArchStdEvent": "DTLB_WALK"
+ },
+ {
+ "ArchStdEvent": "ITLB_WALK"
+ },
+ {
+ "ArchStdEvent": "LL_CACHE_RD"
+ },
+ {
+ "ArchStdEvent": "LL_CACHE_MISS_RD"
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_LMISS_RD"
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_RD"
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_WR"
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_REFILL_RD"
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_REFILL_WR"
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_REFILL_INNER"
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_REFILL_OUTER"
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_WB_VICTIM"
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_WB_CLEAN"
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_INVAL"
+ },
+ {
+ "ArchStdEvent": "L1D_TLB_REFILL_RD"
+ },
+ {
+ "ArchStdEvent": "L1D_TLB_REFILL_WR"
+ },
+ {
+ "ArchStdEvent": "L1D_TLB_RD"
+ },
+ {
+ "ArchStdEvent": "L1D_TLB_WR"
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_RD"
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_WR"
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_REFILL_RD"
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_REFILL_WR"
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_WB_VICTIM"
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_WB_CLEAN"
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_INVAL"
+ },
+ {
+ "ArchStdEvent": "L2D_TLB_REFILL_RD"
+ },
+ {
+ "ArchStdEvent": "L2D_TLB_REFILL_WR"
+ },
+ {
+ "ArchStdEvent": "L2D_TLB_RD"
+ },
+ {
+ "ArchStdEvent": "L2D_TLB_WR"
+ },
+ {
+ "ArchStdEvent": "L3D_CACHE_RD"
+ },
+ {
+ "ArchStdEvent": "L1I_CACHE_LMISS"
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_LMISS_RD"
+ },
+ {
+ "ArchStdEvent": "L3D_CACHE_LMISS_RD"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/exception.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/exception.json
new file mode 100644
index 000000000000..344a2d552ad5
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/exception.json
@@ -0,0 +1,47 @@
+[
+ {
+ "ArchStdEvent": "EXC_TAKEN"
+ },
+ {
+ "ArchStdEvent": "MEMORY_ERROR"
+ },
+ {
+ "ArchStdEvent": "EXC_UNDEF"
+ },
+ {
+ "ArchStdEvent": "EXC_SVC"
+ },
+ {
+ "ArchStdEvent": "EXC_PABORT"
+ },
+ {
+ "ArchStdEvent": "EXC_DABORT"
+ },
+ {
+ "ArchStdEvent": "EXC_IRQ"
+ },
+ {
+ "ArchStdEvent": "EXC_FIQ"
+ },
+ {
+ "ArchStdEvent": "EXC_SMC"
+ },
+ {
+ "ArchStdEvent": "EXC_HVC"
+ },
+ {
+ "ArchStdEvent": "EXC_TRAP_PABORT"
+ },
+ {
+ "ArchStdEvent": "EXC_TRAP_DABORT"
+ },
+ {
+ "ArchStdEvent": "EXC_TRAP_OTHER"
+ },
+ {
+ "ArchStdEvent": "EXC_TRAP_IRQ"
+ },
+ {
+ "ArchStdEvent": "EXC_TRAP_FIQ"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/instruction.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/instruction.json
new file mode 100644
index 000000000000..e57cd55937c6
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/instruction.json
@@ -0,0 +1,143 @@
+[
+ {
+ "ArchStdEvent": "SW_INCR"
+ },
+ {
+ "ArchStdEvent": "INST_RETIRED"
+ },
+ {
+ "ArchStdEvent": "EXC_RETURN"
+ },
+ {
+ "ArchStdEvent": "CID_WRITE_RETIRED"
+ },
+ {
+ "ArchStdEvent": "INST_SPEC"
+ },
+ {
+ "ArchStdEvent": "TTBR_WRITE_RETIRED"
+ },
+ {
+ "ArchStdEvent": "BR_RETIRED"
+ },
+ {
+ "ArchStdEvent": "BR_MIS_PRED_RETIRED"
+ },
+ {
+ "ArchStdEvent": "OP_RETIRED"
+ },
+ {
+ "ArchStdEvent": "OP_SPEC"
+ },
+ {
+ "ArchStdEvent": "LDREX_SPEC"
+ },
+ {
+ "ArchStdEvent": "STREX_PASS_SPEC"
+ },
+ {
+ "ArchStdEvent": "STREX_FAIL_SPEC"
+ },
+ {
+ "ArchStdEvent": "STREX_SPEC"
+ },
+ {
+ "ArchStdEvent": "LD_SPEC"
+ },
+ {
+ "ArchStdEvent": "ST_SPEC"
+ },
+ {
+ "ArchStdEvent": "DP_SPEC"
+ },
+ {
+ "ArchStdEvent": "ASE_SPEC"
+ },
+ {
+ "ArchStdEvent": "VFP_SPEC"
+ },
+ {
+ "ArchStdEvent": "PC_WRITE_SPEC"
+ },
+ {
+ "ArchStdEvent": "CRYPTO_SPEC"
+ },
+ {
+ "ArchStdEvent": "BR_IMMED_SPEC"
+ },
+ {
+ "ArchStdEvent": "BR_RETURN_SPEC"
+ },
+ {
+ "ArchStdEvent": "BR_INDIRECT_SPEC"
+ },
+ {
+ "ArchStdEvent": "ISB_SPEC"
+ },
+ {
+ "ArchStdEvent": "DSB_SPEC"
+ },
+ {
+ "ArchStdEvent": "DMB_SPEC"
+ },
+ {
+ "ArchStdEvent": "RC_LD_SPEC"
+ },
+ {
+ "ArchStdEvent": "RC_ST_SPEC"
+ },
+ {
+ "ArchStdEvent": "ASE_INST_SPEC"
+ },
+ {
+ "ArchStdEvent": "SVE_INST_SPEC"
+ },
+ {
+ "ArchStdEvent": "FP_HP_SPEC"
+ },
+ {
+ "ArchStdEvent": "FP_SP_SPEC"
+ },
+ {
+ "ArchStdEvent": "FP_DP_SPEC"
+ },
+ {
+ "ArchStdEvent": "SVE_PRED_SPEC"
+ },
+ {
+ "ArchStdEvent": "SVE_PRED_EMPTY_SPEC"
+ },
+ {
+ "ArchStdEvent": "SVE_PRED_FULL_SPEC"
+ },
+ {
+ "ArchStdEvent": "SVE_PRED_PARTIAL_SPEC"
+ },
+ {
+ "ArchStdEvent": "SVE_PRED_NOT_FULL_SPEC"
+ },
+ {
+ "ArchStdEvent": "SVE_LDFF_SPEC"
+ },
+ {
+ "ArchStdEvent": "SVE_LDFF_FAULT_SPEC"
+ },
+ {
+ "ArchStdEvent": "FP_SCALE_OPS_SPEC"
+ },
+ {
+ "ArchStdEvent": "FP_FIXED_OPS_SPEC"
+ },
+ {
+ "ArchStdEvent": "ASE_SVE_INT8_SPEC"
+ },
+ {
+ "ArchStdEvent": "ASE_SVE_INT16_SPEC"
+ },
+ {
+ "ArchStdEvent": "ASE_SVE_INT32_SPEC"
+ },
+ {
+ "ArchStdEvent": "ASE_SVE_INT64_SPEC"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/memory.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/memory.json
new file mode 100644
index 000000000000..e522113aeb96
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/memory.json
@@ -0,0 +1,38 @@
+[
+ {
+ "ArchStdEvent": "MEM_ACCESS"
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_RD"
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_WR"
+ },
+ {
+ "ArchStdEvent": "UNALIGNED_LD_SPEC"
+ },
+ {
+ "ArchStdEvent": "UNALIGNED_ST_SPEC"
+ },
+ {
+ "ArchStdEvent": "UNALIGNED_LDST_SPEC"
+ },
+ {
+ "ArchStdEvent": "LDST_ALIGN_LAT"
+ },
+ {
+ "ArchStdEvent": "LD_ALIGN_LAT"
+ },
+ {
+ "ArchStdEvent": "ST_ALIGN_LAT"
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_CHECKED"
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_CHECKED_RD"
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_CHECKED_WR"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/other.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/other.json
new file mode 100644
index 000000000000..20d8365756c5
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/other.json
@@ -0,0 +1,5 @@
+[
+ {
+ "ArchStdEvent": "REMOTE_ACCESS"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
new file mode 100644
index 000000000000..f9fae15f7555
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
@@ -0,0 +1,23 @@
+[
+ {
+ "ArchStdEvent": "STALL_FRONTEND"
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND"
+ },
+ {
+ "ArchStdEvent": "STALL"
+ },
+ {
+ "ArchStdEvent": "STALL_SLOT_BACKEND"
+ },
+ {
+ "ArchStdEvent": "STALL_SLOT_FRONTEND"
+ },
+ {
+ "ArchStdEvent": "STALL_SLOT"
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND_MEM"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/spe.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/spe.json
new file mode 100644
index 000000000000..20f2165c85fe
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/spe.json
@@ -0,0 +1,14 @@
+[
+ {
+ "ArchStdEvent": "SAMPLE_POP"
+ },
+ {
+ "ArchStdEvent": "SAMPLE_FEED"
+ },
+ {
+ "ArchStdEvent": "SAMPLE_FILTRATE"
+ },
+ {
+ "ArchStdEvent": "SAMPLE_COLLISION"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/trace.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/trace.json
new file mode 100644
index 000000000000..3116135c59e2
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/trace.json
@@ -0,0 +1,29 @@
+[
+ {
+ "ArchStdEvent": "TRB_WRAP"
+ },
+ {
+ "ArchStdEvent": "TRCEXTOUT0"
+ },
+ {
+ "ArchStdEvent": "TRCEXTOUT1"
+ },
+ {
+ "ArchStdEvent": "TRCEXTOUT2"
+ },
+ {
+ "ArchStdEvent": "TRCEXTOUT3"
+ },
+ {
+ "ArchStdEvent": "CTI_TRIGOUT4"
+ },
+ {
+ "ArchStdEvent": "CTI_TRIGOUT5"
+ },
+ {
+ "ArchStdEvent": "CTI_TRIGOUT6"
+ },
+ {
+ "ArchStdEvent": "CTI_TRIGOUT7"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/armv8-common-and-microarch.json b/tools/perf/pmu-events/arch/arm64/armv8-common-and-microarch.json
index 423767510aff..80d7a70829a0 100644
--- a/tools/perf/pmu-events/arch/arm64/armv8-common-and-microarch.json
+++ b/tools/perf/pmu-events/arch/arm64/armv8-common-and-microarch.json
@@ -299,6 +299,30 @@
"EventName": "STALL_SLOT",
"BriefDescription": "No operation sent for execution on a slot"
},
+ {
+ "PublicDescription": "Sample Population",
+ "EventCode": "0x4000",
+ "EventName": "SAMPLE_POP",
+ "BriefDescription": "Sample Population"
+ },
+ {
+ "PublicDescription": "Sample Taken",
+ "EventCode": "0x4001",
+ "EventName": "SAMPLE_FEED",
+ "BriefDescription": "Sample Taken"
+ },
+ {
+ "PublicDescription": "Sample Taken and not removed by filtering",
+ "EventCode": "0x4002",
+ "EventName": "SAMPLE_FILTRATE",
+ "BriefDescription": "Sample Taken and not removed by filtering"
+ },
+ {
+ "PublicDescription": "Sample collided with previous sample",
+ "EventCode": "0x4003",
+ "EventName": "SAMPLE_COLLISION",
+ "BriefDescription": "Sample collided with previous sample"
+ },
{
"PublicDescription": "Constant frequency cycles. The counter increments at a constant frequency equal to the rate of increment of the system counter, CNTPCT_EL0.",
"EventCode": "0x4004",
@@ -329,6 +353,96 @@
"EventName": "L3D_CACHE_LMISS_RD",
"BriefDescription": "Level 3 data cache long-latency read miss"
},
+ {
+ "PublicDescription": "Trace buffer current write pointer wrapped",
+ "EventCode": "0x400C",
+ "EventName": "TRB_WRAP",
+ "BriefDescription": "Trace buffer current write pointer wrapped"
+ },
+ {
+ "PublicDescription": "PE Trace Unit external output 0",
+ "EventCode": "0x4010",
+ "EventName": "TRCEXTOUT0",
+ "BriefDescription": "PE Trace Unit external output 0"
+ },
+ {
+ "PublicDescription": "PE Trace Unit external output 1",
+ "EventCode": "0x4011",
+ "EventName": "TRCEXTOUT1",
+ "BriefDescription": "PE Trace Unit external output 1"
+ },
+ {
+ "PublicDescription": "PE Trace Unit external output 2",
+ "EventCode": "0x4012",
+ "EventName": "TRCEXTOUT2",
+ "BriefDescription": "PE Trace Unit external output 2"
+ },
+ {
+ "PublicDescription": "PE Trace Unit external output 3",
+ "EventCode": "0x4013",
+ "EventName": "TRCEXTOUT3",
+ "BriefDescription": "PE Trace Unit external output 3"
+ },
+ {
+ "PublicDescription": "Cross-trigger Interface output trigger 4",
+ "EventCode": "0x4018",
+ "EventName": "CTI_TRIGOUT4",
+ "BriefDescription": "Cross-trigger Interface output trigger 4"
+ },
+ {
+ "PublicDescription": "Cross-trigger Interface output trigger 5 ",
+ "EventCode": "0x4019",
+ "EventName": "CTI_TRIGOUT5",
+ "BriefDescription": "Cross-trigger Interface output trigger 5 "
+ },
+ {
+ "PublicDescription": "Cross-trigger Interface output trigger 6",
+ "EventCode": "0x401A",
+ "EventName": "CTI_TRIGOUT6",
+ "BriefDescription": "Cross-trigger Interface output trigger 6"
+ },
+ {
+ "PublicDescription": "Cross-trigger Interface output trigger 7",
+ "EventCode": "0x401B",
+ "EventName": "CTI_TRIGOUT7",
+ "BriefDescription": "Cross-trigger Interface output trigger 7"
+ },
+ {
+ "PublicDescription": "Access with additional latency from alignment",
+ "EventCode": "0x4020",
+ "EventName": "LDST_ALIGN_LAT",
+ "BriefDescription": "Access with additional latency from alignment"
+ },
+ {
+ "PublicDescription": "Load with additional latency from alignment",
+ "EventCode": "0x4021",
+ "EventName": "LD_ALIGN_LAT",
+ "BriefDescription": "Load with additional latency from alignment"
+ },
+ {
+ "PublicDescription": "Store with additional latency from alignment",
+ "EventCode": "0x4022",
+ "EventName": "ST_ALIGN_LAT",
+ "BriefDescription": "Store with additional latency from alignment"
+ },
+ {
+ "PublicDescription": "Checked data memory access",
+ "EventCode": "0x4024",
+ "EventName": "MEM_ACCESS_CHECKED",
+ "BriefDescription": "Checked data memory access"
+ },
+ {
+ "PublicDescription": "Checked data memory access, read",
+ "EventCode": "0x4025",
+ "EventName": "MEM_ACCESS_CHECKED_RD",
+ "BriefDescription": "Checked data memory access, read"
+ },
+ {
+ "PublicDescription": "Checked data memory access, write",
+ "EventCode": "0x4026",
+ "EventName": "MEM_ACCESS_CHECKED_WR",
+ "BriefDescription": "Checked data memory access, write"
+ },
{
"PublicDescription": "SIMD Instruction architecturally executed.",
"EventCode": "0x8000",
@@ -341,6 +455,18 @@
"EventName": "SVE_INST_RETIRED",
"BriefDescription": "Instruction architecturally executed, SVE."
},
+ {
+ "PublicDescription": "ASE operations speculatively executed",
+ "EventCode": "0x8005",
+ "EventName": "ASE_INST_SPEC",
+ "BriefDescription": "ASE operations speculatively executed"
+ },
+ {
+ "PublicDescription": "SVE operations speculatively executed",
+ "EventCode": "0x8006",
+ "EventName": "SVE_INST_SPEC",
+ "BriefDescription": "SVE operations speculatively executed"
+ },
{
"PublicDescription": "Microarchitectural operation, Operations speculatively executed.",
"EventCode": "0x8008",
@@ -359,6 +485,24 @@
"EventName": "FP_SPEC",
"BriefDescription": "Floating-point Operations speculatively executed."
},
+ {
+ "PublicDescription": "Floating-point half-precision operations speculatively executed",
+ "EventCode": "0x8014",
+ "EventName": "FP_HP_SPEC",
+ "BriefDescription": "Floating-point half-precision operations speculatively executed"
+ },
+ {
+ "PublicDescription": "Floating-point single-precision operations speculatively executed",
+ "EventCode": "0x8018",
+ "EventName": "FP_SP_SPEC",
+ "BriefDescription": "Floating-point single-precision operations speculatively executed"
+ },
+ {
+ "PublicDescription": "Floating-point double-precision operations speculatively executed",
+ "EventCode": "0x801C",
+ "EventName": "FP_DP_SPEC",
+ "BriefDescription": "Floating-point double-precision operations speculatively executed"
+ },
{
"PublicDescription": "Floating-point FMA Operations speculatively executed.",
"EventCode": "0x8028",
@@ -389,6 +533,30 @@
"EventName": "SVE_PRED_SPEC",
"BriefDescription": "SVE predicated Operations speculatively executed."
},
+ {
+ "PublicDescription": "SVE predicated operations with no active predicates speculatively executed",
+ "EventCode": "0x8075",
+ "EventName": "SVE_PRED_EMPTY_SPEC",
+ "BriefDescription": "SVE predicated operations with no active predicates speculatively executed"
+ },
+ {
+ "PublicDescription": "SVE predicated operations speculatively executed with all active predicates",
+ "EventCode": "0x8076",
+ "EventName": "SVE_PRED_FULL_SPEC",
+ "BriefDescription": "SVE predicated operations speculatively executed with all active predicates"
+ },
+ {
+ "PublicDescription": "SVE predicated operations speculatively executed with partially active predicates",
+ "EventCode": "0x8077",
+ "EventName": "SVE_PRED_PARTIAL_SPEC",
+ "BriefDescription": "SVE predicated operations speculatively executed with partially active predicates"
+ },
+ {
+ "PublicDescription": "SVE predicated operations with empty or partially active predicates",
+ "EventCode": "0x8079",
+ "EventName": "SVE_PRED_NOT_FULL_SPEC",
+ "BriefDescription": "SVE predicated operations with empty or partially active predicates"
+ },
{
"PublicDescription": "SVE MOVPRFX Operations speculatively executed.",
"EventCode": "0x807C",
@@ -497,6 +665,12 @@
"EventName": "SVE_LDFF_SPEC",
"BriefDescription": "SVE First-fault load Operations speculatively executed."
},
+ {
+ "PublicDescription": "SVE first-fault load operations speculatively executed which set FFR bit to 0",
+ "EventCode": "0x80BD",
+ "EventName": "SVE_LDFF_FAULT_SPEC",
+ "BriefDescription": "SVE first-fault load operations speculatively executed which set FFR bit to 0"
+ },
{
"PublicDescription": "Scalable floating-point element Operations speculatively executed.",
"EventCode": "0x80C0",
@@ -544,5 +718,29 @@
"EventCode": "0x80C7",
"EventName": "FP_DP_FIXED_OPS_SPEC",
"BriefDescription": "Non-scalable double-precision floating-point element Operations speculatively executed."
+ },
+ {
+ "PublicDescription": "Advanced SIMD and SVE 8-bit integer operations speculatively executed",
+ "EventCode": "0x80E3",
+ "EventName": "ASE_SVE_INT8_SPEC",
+ "BriefDescription": "Advanced SIMD and SVE 8-bit integer operations speculatively executed"
+ },
+ {
+ "PublicDescription": "Advanced SIMD and SVE 16-bit integer operations speculatively executed",
+ "EventCode": "0x80E7",
+ "EventName": "ASE_SVE_INT16_SPEC",
+ "BriefDescription": "Advanced SIMD and SVE 16-bit integer operations speculatively executed"
+ },
+ {
+ "PublicDescription": "Advanced SIMD and SVE 32-bit integer operations speculatively executed",
+ "EventCode": "0x80EB",
+ "EventName": "ASE_SVE_INT32_SPEC",
+ "BriefDescription": "Advanced SIMD and SVE 32-bit integer operations speculatively executed"
+ },
+ {
+ "PublicDescription": "Advanced SIMD and SVE 64-bit integer operations speculatively executed",
+ "EventCode": "0x80EF",
+ "EventName": "ASE_SVE_INT64_SPEC",
+ "BriefDescription": "Advanced SIMD and SVE 64-bit integer operations speculatively executed"
}
]
diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv b/tools/perf/pmu-events/arch/arm64/mapfile.csv
index 31d8b57ca9bb..b899db48c12a 100644
--- a/tools/perf/pmu-events/arch/arm64/mapfile.csv
+++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv
@@ -19,6 +19,7 @@
0x00000000410fd0b0,v1,arm/cortex-a76-n1,core
0x00000000410fd0c0,v1,arm/cortex-a76-n1,core
0x00000000410fd400,v1,arm/neoverse-v1,core
+0x00000000410fd490,v1,arm/neoverse-n2,core
0x00000000420f5160,v1,cavium/thunderx2,core
0x00000000430f0af0,v1,cavium/thunderx2,core
0x00000000460f0010,v1,fujitsu/a64fx,core
--
2.17.1
A previous commit adds pmu events into the files
armv8-common-and-microarch.json
armv8-recommended.json
that are actually specified in an armv9 reference supplement, not armv8.
As such, naming the files with the armv8 prefix seems artificial.
This patch renames the files to reflect that these two files are for
arch std events regardless of whether they are defined in armv8 or
armv9.
Signed-off-by: Andrew Kilroy <[email protected]>
---
...{armv8-common-and-microarch.json => common-and-microarch.json} | 0
.../arch/arm64/{armv8-recommended.json => recommended.json} | 0
2 files changed, 0 insertions(+), 0 deletions(-)
rename tools/perf/pmu-events/arch/arm64/{armv8-common-and-microarch.json => common-and-microarch.json} (100%)
rename tools/perf/pmu-events/arch/arm64/{armv8-recommended.json => recommended.json} (100%)
diff --git a/tools/perf/pmu-events/arch/arm64/armv8-common-and-microarch.json b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
similarity index 100%
rename from tools/perf/pmu-events/arch/arm64/armv8-common-and-microarch.json
rename to tools/perf/pmu-events/arch/arm64/common-and-microarch.json
diff --git a/tools/perf/pmu-events/arch/arm64/armv8-recommended.json b/tools/perf/pmu-events/arch/arm64/recommended.json
similarity index 100%
rename from tools/perf/pmu-events/arch/arm64/armv8-recommended.json
rename to tools/perf/pmu-events/arch/arm64/recommended.json
--
2.17.1
On 10/12/2021 12:37, Andrew Kilroy wrote:
> Updates the common and microarch json file to add counters available in
> the Arm Neoverse N2 chip, but should also apply to other ArmV8 and ArmV9
> cpus. Specified in ArmV8 architecture reference manual
>
> https://developer.arm.com/documentation/ddi0487/gb/?lang=en
>
> Some of the counters added to armv8-common-and-microarch.json are
> specified in the ArmV9 architecture reference manual supplement
> (issue A.a):
>
> https://developer.arm.com/documentation/ddi0608/aa
>
> The additional ArmV9 counters are
>
> TRB_WRAP
> TRCEXTOUT0
> TRCEXTOUT1
> TRCEXTOUT2
> TRCEXTOUT3
> CTI_TRIGOUT4
> CTI_TRIGOUT5
> CTI_TRIGOUT6
> CTI_TRIGOUT7
>
> This patch also adds files in pmu-events/arch/arm64/arm/neoverse-n2 for
> perf list to output the counter names in categories.
>
> Counters on the Neoverse N2 are stated in its reference manual:
>
> https://developer.arm.com/documentation/102099/0000
>
> Signed-off-by: Andrew Kilroy<[email protected]>
> ---
> .../arch/arm64/arm/neoverse-n2/branch.json | 8 +
> .../arch/arm64/arm/neoverse-n2/bus.json | 20 ++
> .../arch/arm64/arm/neoverse-n2/cache.json | 155 ++++++++++++++
> .../arch/arm64/arm/neoverse-n2/exception.json | 47 +++++
> .../arm64/arm/neoverse-n2/instruction.json | 143 +++++++++++++
> .../arch/arm64/arm/neoverse-n2/memory.json | 38 ++++
> .../arch/arm64/arm/neoverse-n2/other.json | 5 +
> .../arch/arm64/arm/neoverse-n2/pipeline.json | 23 ++
> .../arch/arm64/arm/neoverse-n2/spe.json | 14 ++
> .../arch/arm64/arm/neoverse-n2/trace.json | 29 +++
> .../arm64/armv8-common-and-microarch.json | 198 ++++++++++++++++++
> tools/perf/pmu-events/arch/arm64/mapfile.csv | 1 +
> 12 files changed, 681 insertions(+)
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/branch.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/bus.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/cache.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/exception.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/instruction.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/memory.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/other.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/spe.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/trace.json
>
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/branch.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/branch.json
This looks ok,
Reviewed-by: John Garry <[email protected]>
BTW, I was looking at adding perf tool --topdown support for arm64. This
will require L1 metricgroup support per core - see what I did here for
our hisilicon platform already:
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
I would like to add support for more cores. Generally the arm common
events match up to the definitions here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-stat.txt#n400
Apart from frontend_bound - would you have an equivalent metric
expression for this for these Neoverse cores?
[0] Note that I think that the divisor in the metric expressions is max
uops that the core can deal with.
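To make the footnote about the divisor concrete, here is a sketch of the
general shape such a level 1 expression could take: a counted number of
slots divided by (width * cycles). Everything in it is an assumption for
illustration. The helper names are hypothetical, the width is whatever
the caller passes in, and treating backend bound as the remainder of the
other three categories is just the usual top-down identity, not a
statement about any particular core.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: fraction of all issue slots accounted for by one
 * counter, where the total number of slots is width * cpu_cycles and
 * "width" stands for the max uops the core can deal with per cycle.
 */
double slot_fraction(uint64_t slots_counted, uint64_t cpu_cycles,
		     unsigned int width)
{
	assert(width > 0);
	if (cpu_cycles == 0)
		return 0.0; /* nothing measured, avoid dividing by zero */
	return (double)slots_counted / ((double)width * (double)cpu_cycles);
}

/*
 * Hypothetical helper: the four level 1 categories sum to 1, so backend
 * bound can be taken as what is left over.
 */
double backend_bound(double frontend_bound, double bad_speculation,
		     double retiring)
{
	return 1.0 - (frontend_bound + bad_speculation + retiring);
}
```

For example, 100 counted slots over 100 cycles on a core with an assumed
width of 4 gives a fraction of 0.25.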
Thanks,
John
On 10/12/2021 12:37, Andrew Kilroy wrote:
> A previous commit adds pmu events into the files
>
> armv8-common-and-microarch.json
> armv8-recommended.json
>
> that are actually specified in an armv9 reference supplement, not armv8.
> As such, naming the files with the armv8 prefix seems artificial.
>
> This patch renames the files to reflect that these two files are for
> arch std events regardless of whether they are defined in armv8 or
> armv9.
>
> Signed-off-by: Andrew Kilroy<[email protected]>
Reviewed-by: John Garry <[email protected]>
Em Fri, Dec 10, 2021 at 01:46:48PM +0000, John Garry escreveu:
> On 10/12/2021 12:37, Andrew Kilroy wrote:
> > A previous commit adds pmu events into the files
> >
> > armv8-common-and-microarch.json
> > armv8-recommended.json
> >
> > that are actually specified in an armv9 reference supplement, not armv8.
> > As such, naming the files with the armv8 prefix seems artificial.
> >
> > This patch renames the files to reflect that these two files are for
> > arch std events regardless of whether they are defined in armv8 or
> > armv9.
> >
> > Signed-off-by: Andrew Kilroy<[email protected]>
>
> Reviewed-by: John Garry <[email protected]>
Thanks, applied both patches.
- Arnaldo
Hi John,
> BTW, I was looking at adding perf tool --topdown support for arm64. This will require L1 metricgroup support per core - see what I did here for our hisilicon platform already:
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
>
> I would like to add support for more cores. Generally the arm common events match up to the definitions here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-stat.txt#n400
>
> Apart from frontend_bound - would you have an equivalent metric expression for this for these Neoverse cores?
>
> [0] Note that I think that the divisor in the metric expressions is max uops that the core can deal with.
Thanks for this, we’ve been working on how to support perf --topdown.
We’re still considering the right events/calculations on our CPUs.
To introduce the calculations in metrics jsons we would like this patch
to alter the --topdown option on arm64.
How does this fit with your solution?
Andrew
Andrew Kilroy (1):
perf arm64: Implement --topdown with metrics
tools/perf/arch/arm64/util/Build | 1 +
tools/perf/arch/arm64/util/topdown.c | 8 ++++++++
tools/perf/builtin-stat.c | 13 +++++++++++++
tools/perf/util/metricgroup.c | 12 ++++++++++++
tools/perf/util/metricgroup.h | 7 +++++++
tools/perf/util/topdown.c | 6 ++++++
tools/perf/util/topdown.h | 1 +
7 files changed, 48 insertions(+)
create mode 100644 tools/perf/arch/arm64/util/topdown.c
--
2.17.1
This patch implements the --topdown option by making use of metrics to
dictate what counters are obtained in order to show the various topdown
columns, e.g. Frontend Bound, Backend Bound, Retiring and Bad
Speculation.
The MetricGroup name is used to identify which set of metrics are to be
shown. For the moment, use TopDownL1 and enable it for arm64.
Signed-off-by: Andrew Kilroy <[email protected]>
---
tools/perf/arch/arm64/util/Build | 1 +
tools/perf/arch/arm64/util/topdown.c | 8 ++++++++
tools/perf/builtin-stat.c | 13 +++++++++++++
tools/perf/util/metricgroup.c | 12 ++++++++++++
tools/perf/util/metricgroup.h | 7 +++++++
tools/perf/util/topdown.c | 6 ++++++
tools/perf/util/topdown.h | 1 +
7 files changed, 48 insertions(+)
create mode 100644 tools/perf/arch/arm64/util/topdown.c
diff --git a/tools/perf/arch/arm64/util/Build b/tools/perf/arch/arm64/util/Build
index 9fcb4e68add9..9807aed981cd 100644
--- a/tools/perf/arch/arm64/util/Build
+++ b/tools/perf/arch/arm64/util/Build
@@ -4,6 +4,7 @@ perf-y += perf_regs.o
perf-y += tsc.o
perf-y += pmu.o
perf-y += kvm-stat.o
+perf-y += topdown.o
perf-$(CONFIG_DWARF) += dwarf-regs.o
perf-$(CONFIG_LOCAL_LIBUNWIND) += unwind-libunwind.o
perf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
diff --git a/tools/perf/arch/arm64/util/topdown.c b/tools/perf/arch/arm64/util/topdown.c
new file mode 100644
index 000000000000..a2b1f9c01148
--- /dev/null
+++ b/tools/perf/arch/arm64/util/topdown.c
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <topdown.h>
+
+
+bool arch_topdown_use_json_metrics(void)
+{
+ return true;
+}
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index f6ca2b054c5b..08ef80ef345e 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1856,6 +1856,18 @@ static int add_default_attributes(void)
if (!force_metric_only)
stat_config.metric_only = true;
+ if (arch_topdown_use_json_metrics()) {
+ if (metricgroup__parse_groups_to_evlist(evsel_list, "TopDownL1",
+ stat_config.metric_no_group,
+ stat_config.metric_no_merge,
+ &stat_config.metric_events) < 0) {
+ pr_err("Could not form list of metrics for topdown\n");
+ return -1;
+ }
+
+ goto end_of_topdown_setup;
+ }
+
if (pmu_have_event("cpu", topdown_metric_L2_attrs[5])) {
metric_attrs = topdown_metric_L2_attrs;
max_level = 2;
@@ -1919,6 +1931,7 @@ static int add_default_attributes(void)
fprintf(stderr, "System does not support topdown\n");
return -1;
}
+end_of_topdown_setup:
free(str);
}
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 51c99cb08abf..9b0394372096 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -1535,6 +1535,18 @@ int metricgroup__parse_groups(const struct option *opt,
metric_no_merge, NULL, metric_events, map);
}
+int metricgroup__parse_groups_to_evlist(struct evlist *perf_evlist,
+ const char *str,
+ bool metric_no_group,
+ bool metric_no_merge,
+ struct rblist *metric_events)
+{
+ const struct pmu_events_map *map = pmu_events_map__find();
+
+ return parse_groups(perf_evlist, str, metric_no_group,
+ metric_no_merge, NULL, metric_events, map);
+}
+
int metricgroup__parse_groups_test(struct evlist *evlist,
const struct pmu_events_map *map,
const char *str,
diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index 2b42b778d1bf..1f143ad1d9e1 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -70,6 +70,13 @@ int metricgroup__parse_groups(const struct option *opt,
bool metric_no_group,
bool metric_no_merge,
struct rblist *metric_events);
+
+int metricgroup__parse_groups_to_evlist(struct evlist *perf_evlist,
+ const char *str,
+ bool metric_no_group,
+ bool metric_no_merge,
+ struct rblist *metric_events);
+
const struct pmu_event *metricgroup__find_metric(const char *metric,
const struct pmu_events_map *map);
int metricgroup__parse_groups_test(struct evlist *evlist,
diff --git a/tools/perf/util/topdown.c b/tools/perf/util/topdown.c
index 1081b20f9891..57c0c5f2c6bd 100644
--- a/tools/perf/util/topdown.c
+++ b/tools/perf/util/topdown.c
@@ -56,3 +56,9 @@ __weak bool arch_topdown_sample_read(struct evsel *leader __maybe_unused)
{
return false;
}
+
+__weak bool arch_topdown_use_json_metrics(void)
+{
+ return false;
+}
+
diff --git a/tools/perf/util/topdown.h b/tools/perf/util/topdown.h
index 2f0d0b887639..0a5275a3f078 100644
--- a/tools/perf/util/topdown.h
+++ b/tools/perf/util/topdown.h
@@ -6,6 +6,7 @@
bool arch_topdown_check_group(bool *warn);
void arch_topdown_group_warn(void);
bool arch_topdown_sample_read(struct evsel *leader);
+bool arch_topdown_use_json_metrics(void);
int topdown_filter_events(const char **attr, char **str, bool use_group);
--
2.17.1
On Tue, Dec 14, 2021 at 10:43 AM Andrew Kilroy <[email protected]> wrote:
>
> This patch implements the --topdown option by making use of metrics to
> dictate what counters are obtained in order to show the various topdown
> columns, e.g. Frontend Bound, Backend Bound, Retiring and Bad
> Speculation.
>
> The MetricGroup name is used to identify which set of metrics are to be
> shown. For the moment use TopDownL1 and enable for arm64
>
> Signed-off-by: Andrew Kilroy <[email protected]>
> ---
> tools/perf/arch/arm64/util/Build | 1 +
> tools/perf/arch/arm64/util/topdown.c | 8 ++++++++
> tools/perf/builtin-stat.c | 13 +++++++++++++
> tools/perf/util/metricgroup.c | 12 ++++++++++++
> tools/perf/util/metricgroup.h | 7 +++++++
> tools/perf/util/topdown.c | 6 ++++++
> tools/perf/util/topdown.h | 1 +
> 7 files changed, 48 insertions(+)
> create mode 100644 tools/perf/arch/arm64/util/topdown.c
>
> diff --git a/tools/perf/arch/arm64/util/Build b/tools/perf/arch/arm64/util/Build
> index 9fcb4e68add9..9807aed981cd 100644
> --- a/tools/perf/arch/arm64/util/Build
> +++ b/tools/perf/arch/arm64/util/Build
> @@ -4,6 +4,7 @@ perf-y += perf_regs.o
> perf-y += tsc.o
> perf-y += pmu.o
> perf-y += kvm-stat.o
> +perf-y += topdown.o
> perf-$(CONFIG_DWARF) += dwarf-regs.o
> perf-$(CONFIG_LOCAL_LIBUNWIND) += unwind-libunwind.o
> perf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
> diff --git a/tools/perf/arch/arm64/util/topdown.c b/tools/perf/arch/arm64/util/topdown.c
> new file mode 100644
> index 000000000000..a2b1f9c01148
> --- /dev/null
> +++ b/tools/perf/arch/arm64/util/topdown.c
> @@ -0,0 +1,8 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <topdown.h>
> +
> +
> +bool arch_topdown_use_json_metrics(void)
> +{
> + return true;
> +}
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index f6ca2b054c5b..08ef80ef345e 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1856,6 +1856,18 @@ static int add_default_attributes(void)
> if (!force_metric_only)
> stat_config.metric_only = true;
>
> + if (arch_topdown_use_json_metrics()) {
> + if (metricgroup__parse_groups_to_evlist(evsel_list, "TopDownL1",
> + stat_config.metric_no_group,
> + stat_config.metric_no_merge,
> + &stat_config.metric_events) < 0) {
> + pr_err("Could not form list of metrics for topdown\n");
> + return -1;
> + }
> +
> + goto end_of_topdown_setup;
> + }
> +
> if (pmu_have_event("cpu", topdown_metric_L2_attrs[5])) {
> metric_attrs = topdown_metric_L2_attrs;
> max_level = 2;
> @@ -1919,6 +1931,7 @@ static int add_default_attributes(void)
> fprintf(stderr, "System does not support topdown\n");
> return -1;
> }
> +end_of_topdown_setup:
> free(str);
> }
>
> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> index 51c99cb08abf..9b0394372096 100644
> --- a/tools/perf/util/metricgroup.c
> +++ b/tools/perf/util/metricgroup.c
> @@ -1535,6 +1535,18 @@ int metricgroup__parse_groups(const struct option *opt,
> metric_no_merge, NULL, metric_events, map);
> }
>
> +int metricgroup__parse_groups_to_evlist(struct evlist *perf_evlist,
> + const char *str,
> + bool metric_no_group,
> + bool metric_no_merge,
> + struct rblist *metric_events)
> +{
> + const struct pmu_events_map *map = pmu_events_map__find();
> +
> + return parse_groups(perf_evlist, str, metric_no_group,
> + metric_no_merge, NULL, metric_events, map);
> +}
> +
> int metricgroup__parse_groups_test(struct evlist *evlist,
> const struct pmu_events_map *map,
> const char *str,
> diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
> index 2b42b778d1bf..1f143ad1d9e1 100644
> --- a/tools/perf/util/metricgroup.h
> +++ b/tools/perf/util/metricgroup.h
> @@ -70,6 +70,13 @@ int metricgroup__parse_groups(const struct option *opt,
> bool metric_no_group,
> bool metric_no_merge,
> struct rblist *metric_events);
> +
> +int metricgroup__parse_groups_to_evlist(struct evlist *perf_evlist,
> + const char *str,
> + bool metric_no_group,
> + bool metric_no_merge,
> + struct rblist *metric_events);
> +
> const struct pmu_event *metricgroup__find_metric(const char *metric,
> const struct pmu_events_map *map);
> int metricgroup__parse_groups_test(struct evlist *evlist,
> diff --git a/tools/perf/util/topdown.c b/tools/perf/util/topdown.c
> index 1081b20f9891..57c0c5f2c6bd 100644
> --- a/tools/perf/util/topdown.c
> +++ b/tools/perf/util/topdown.c
> @@ -56,3 +56,9 @@ __weak bool arch_topdown_sample_read(struct evsel *leader __maybe_unused)
> {
> return false;
> }
> +
> +__weak bool arch_topdown_use_json_metrics(void)
> +{
I like this extension! I've ranted in the past about weak symbols
breaking with archives due to lazy loading [1]. In this case
tools/perf/arch/arm64/util/topdown.c has no other symbols within it
and so the weak symbol has an extra chance of being linked
incorrectly. We could add a new command line of --topdown-json to
avoid this, but there seems little difference in doing this over just
doing '-M TopDownL1'. Is it possible to use the json metric approach
for when the CPU version fails?
Thanks,
Ian
[1] https://lore.kernel.org/lkml/CAP-5=fVS2AwZ9bP4BjF9GDcZqmw5fwUZ6OGXdgMnFj3w_2OTaw@mail.gmail.com/
> + return false;
> +}
> +
> diff --git a/tools/perf/util/topdown.h b/tools/perf/util/topdown.h
> index 2f0d0b887639..0a5275a3f078 100644
> --- a/tools/perf/util/topdown.h
> +++ b/tools/perf/util/topdown.h
> @@ -6,6 +6,7 @@
> bool arch_topdown_check_group(bool *warn);
> void arch_topdown_group_warn(void);
> bool arch_topdown_sample_read(struct evsel *leader);
> +bool arch_topdown_use_json_metrics(void);
>
> int topdown_filter_events(const char **attr, char **str, bool use_group);
>
> --
> 2.17.1
>
On 14/12/2021 20:32, Ian Rogers wrote:
> On Tue, Dec 14, 2021 at 10:43 AM Andrew Kilroy <[email protected]> wrote:
>>
>> This patch implements the --topdown option by making use of metrics to
>> dictate what counters are obtained in order to show the various topdown
>> columns, e.g. Frontend Bound, Backend Bound, Retiring and Bad
>> Speculation.
>>
>> The MetricGroup name is used to identify which set of metrics are to be
>> shown. For the moment use TopDownL1 and enable for arm64
>>
>> Signed-off-by: Andrew Kilroy <[email protected]>
>> ---
>> tools/perf/arch/arm64/util/Build | 1 +
>> tools/perf/arch/arm64/util/topdown.c | 8 ++++++++
>> tools/perf/builtin-stat.c | 13 +++++++++++++
>> tools/perf/util/metricgroup.c | 12 ++++++++++++
>> tools/perf/util/metricgroup.h | 7 +++++++
>> tools/perf/util/topdown.c | 6 ++++++
>> tools/perf/util/topdown.h | 1 +
>> 7 files changed, 48 insertions(+)
>> create mode 100644 tools/perf/arch/arm64/util/topdown.c
>>
>> diff --git a/tools/perf/arch/arm64/util/Build b/tools/perf/arch/arm64/util/Build
>> index 9fcb4e68add9..9807aed981cd 100644
>> --- a/tools/perf/arch/arm64/util/Build
>> +++ b/tools/perf/arch/arm64/util/Build
>> @@ -4,6 +4,7 @@ perf-y += perf_regs.o
>> perf-y += tsc.o
>> perf-y += pmu.o
>> perf-y += kvm-stat.o
>> +perf-y += topdown.o
>> perf-$(CONFIG_DWARF) += dwarf-regs.o
>> perf-$(CONFIG_LOCAL_LIBUNWIND) += unwind-libunwind.o
>> perf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
>> diff --git a/tools/perf/arch/arm64/util/topdown.c b/tools/perf/arch/arm64/util/topdown.c
>> new file mode 100644
>> index 000000000000..a2b1f9c01148
>> --- /dev/null
>> +++ b/tools/perf/arch/arm64/util/topdown.c
>> @@ -0,0 +1,8 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +#include <topdown.h>
>> +
>> +
>> +bool arch_topdown_use_json_metrics(void)
>> +{
>> + return true;
>> +}
>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>> index f6ca2b054c5b..08ef80ef345e 100644
>> --- a/tools/perf/builtin-stat.c
>> +++ b/tools/perf/builtin-stat.c
>> @@ -1856,6 +1856,18 @@ static int add_default_attributes(void)
>> if (!force_metric_only)
>> stat_config.metric_only = true;
>>
>> + if (arch_topdown_use_json_metrics()) {
>> + if (metricgroup__parse_groups_to_evlist(evsel_list, "TopDownL1",
>> + stat_config.metric_no_group,
>> + stat_config.metric_no_merge,
>> + &stat_config.metric_events) < 0) {
>> + pr_err("Could not form list of metrics for topdown\n");
>> + return -1;
>> + }
>> +
>> + goto end_of_topdown_setup;
>> + }
>> +
>> if (pmu_have_event("cpu", topdown_metric_L2_attrs[5])) {
>> metric_attrs = topdown_metric_L2_attrs;
>> max_level = 2;
>> @@ -1919,6 +1931,7 @@ static int add_default_attributes(void)
>> fprintf(stderr, "System does not support topdown\n");
>> return -1;
>> }
>> +end_of_topdown_setup:
>> free(str);
>> }
>>
>> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
>> index 51c99cb08abf..9b0394372096 100644
>> --- a/tools/perf/util/metricgroup.c
>> +++ b/tools/perf/util/metricgroup.c
>> @@ -1535,6 +1535,18 @@ int metricgroup__parse_groups(const struct option *opt,
>> metric_no_merge, NULL, metric_events, map);
>> }
>>
>> +int metricgroup__parse_groups_to_evlist(struct evlist *perf_evlist,
>> + const char *str,
>> + bool metric_no_group,
>> + bool metric_no_merge,
>> + struct rblist *metric_events)
>> +{
>> + const struct pmu_events_map *map = pmu_events_map__find();
>> +
>> + return parse_groups(perf_evlist, str, metric_no_group,
>> + metric_no_merge, NULL, metric_events, map);
>> +}
>> +
>> int metricgroup__parse_groups_test(struct evlist *evlist,
>> const struct pmu_events_map *map,
>> const char *str,
>> diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
>> index 2b42b778d1bf..1f143ad1d9e1 100644
>> --- a/tools/perf/util/metricgroup.h
>> +++ b/tools/perf/util/metricgroup.h
>> @@ -70,6 +70,13 @@ int metricgroup__parse_groups(const struct option *opt,
>> bool metric_no_group,
>> bool metric_no_merge,
>> struct rblist *metric_events);
>> +
>> +int metricgroup__parse_groups_to_evlist(struct evlist *perf_evlist,
>> + const char *str,
>> + bool metric_no_group,
>> + bool metric_no_merge,
>> + struct rblist *metric_events);
>> +
>> const struct pmu_event *metricgroup__find_metric(const char *metric,
>> const struct pmu_events_map *map);
>> int metricgroup__parse_groups_test(struct evlist *evlist,
>> diff --git a/tools/perf/util/topdown.c b/tools/perf/util/topdown.c
>> index 1081b20f9891..57c0c5f2c6bd 100644
>> --- a/tools/perf/util/topdown.c
>> +++ b/tools/perf/util/topdown.c
>> @@ -56,3 +56,9 @@ __weak bool arch_topdown_sample_read(struct evsel *leader __maybe_unused)
>> {
>> return false;
>> }
>> +
>> +__weak bool arch_topdown_use_json_metrics(void)
>> +{
>
> I like this extension! I've ranted in the past about weak symbols
> breaking with archives due to lazy loading [1]. In this case
> tools/perf/arch/arm64/util/topdown.c has no other symbols within it
> and so the weak symbol has an extra chance of being linked
> incorrectly. We could add a new command line of --topdown-json to
> avoid this, but there seems little difference in doing this over just
> doing '-M TopDownL1'. Is it possible to use the json metric approach
> for when the CPU version fails?
If weak symbols are an issue we could define the function normally for
all known platforms. Or just do arm64 and 'other'. I think the end result is
the same. Or have only one function like this:
bool arch_topdown_use_json_metrics(void)
{
#if defined(__aarch64__)
	return true;
#else
	/* every other architecture */
	return false;
#endif
}
There are quite a few ways to avoid it. I'm not sure I like the
--topdown-json argument as it would be quite fragmented from a user's
point of view, especially if it was only to work around some linking
issue.
James
>
> Thanks,
> Ian
>
> [1] https://lore.kernel.org/lkml/CAP-5=fVS2AwZ9bP4BjF9GDcZqmw5fwUZ6OGXdgMnFj3w_2OTaw@mail.gmail.com/
>
>> + return false;
>> +}
>> +
>> diff --git a/tools/perf/util/topdown.h b/tools/perf/util/topdown.h
>> index 2f0d0b887639..0a5275a3f078 100644
>> --- a/tools/perf/util/topdown.h
>> +++ b/tools/perf/util/topdown.h
>> @@ -6,6 +6,7 @@
>> bool arch_topdown_check_group(bool *warn);
>> void arch_topdown_group_warn(void);
>> bool arch_topdown_sample_read(struct evsel *leader);
>> +bool arch_topdown_use_json_metrics(void);
>>
>> int topdown_filter_events(const char **attr, char **str, bool use_group);
>>
>> --
>> 2.17.1
>>
Hi Andrew,
>> const struct pmu_event *metricgroup__find_metric(const char *metric,
>> const struct pmu_events_map *map);
>> int metricgroup__parse_groups_test(struct evlist *evlist,
>> diff --git a/tools/perf/util/topdown.c b/tools/perf/util/topdown.c
>> index 1081b20f9891..57c0c5f2c6bd 100644
>> --- a/tools/perf/util/topdown.c
>> +++ b/tools/perf/util/topdown.c
>> @@ -56,3 +56,9 @@ __weak bool arch_topdown_sample_read(struct evsel *leader __maybe_unused)
>> {
>> return false;
>> }
>> +
>> +__weak bool arch_topdown_use_json_metrics(void)
>> +{
AFAICS, only x86 supports topdown today and that is because they have
special kernel topdown events exposed for the kernel CPU PMU driver. So
other architectures - not only arm - would need to rely on metricgroups for
topdown support. So let's make this generic for all archs.
> I like this extension! I've ranted in the past about weak symbols
> breaking with archives due to lazy loading [1]. In this case
> tools/perf/arch/arm64/util/topdown.c has no other symbols within it
> and so the weak symbol has an extra chance of being linked
> incorrectly. We could add a new command line of --topdown-json to
> avoid this, but there seems little difference in doing this over just
> doing '-M TopDownL1'.
> Is it possible to use the json metric approach
> for when the CPU version fails?
I think that's a good idea.
In addition we could also add a --topdown arg to force using JSON
metricgroups.
Did you actually test this patch? I have something experimental working
from some time ago, and it was more complicated than this. I need to
check the code again...
Thanks,
John
Ian, John, thanks for the feedback.
On 15/12/2021 10:52, John Garry wrote:
> Hi Andrew,
>
>>> const struct pmu_event *metricgroup__find_metric(const char *metric,
>>> const struct
>>> pmu_events_map *map);
>>> int metricgroup__parse_groups_test(struct evlist *evlist,
>>> diff --git a/tools/perf/util/topdown.c b/tools/perf/util/topdown.c
>>> index 1081b20f9891..57c0c5f2c6bd 100644
>>> --- a/tools/perf/util/topdown.c
>>> +++ b/tools/perf/util/topdown.c
>>> @@ -56,3 +56,9 @@ __weak bool arch_topdown_sample_read(struct evsel
>>> *leader __maybe_unused)
>>> {
>>> return false;
>>> }
>>> +
>>> +__weak bool arch_topdown_use_json_metrics(void)
>>> +{
>
> AFAICS, only x86 supports topdown today and that is because they have
> special kernel topdown events exposed for the kernel CPU PMU driver. So
> other architectures - not only arm - would need to rely on metricgroups for
> topdown support. So let's make this generic for all archs.
>
>> I like this extension! I've ranted in the past about weak symbols
>> breaking with archives due to lazy loading [1]. In this case
>> tools/perf/arch/arm64/util/topdown.c has no other symbols within it
>> and so the weak symbol has an extra chance of being linked
>> incorrectly. We could add a new command line of --topdown-json to
>> avoid this, but there seems little difference in doing this over just
>> doing '-M TopDownL1'.
>
>
>> Is it possible to use the json metric approach
>> for when the CPU version fails?
>
> I think that's a good idea.
>
Taking a look.
> In addition we could also add a --topdown arg to force using JSON
> metricgroups.
>
What arg do you think would be supplied?
> Did you actually test this patch? I have something experimental working
> from some time ago, and it was more complicated than this. I need to
> check the code again...
>
I got stats back from this implementation, yes. Let me know if there are
things my patch isn't catering for.
> Thanks,
> John
>> In addition we could also add a --topdown arg to force using JSON
>> metricgroups.
>>
>
> What arg do you think would be supplied?
something like -json or -metricgroup, meaning "Use pmu-events metric
events to calculate topdown results rather than kernel CPU PMU events.
This is default fallback if the kernel CPU PMU does not support topdown
events"
>
>> Did you actually test this patch? I have something experimental
>> working from some time ago, and it was more complicated than this. I
>> need to check the code again...
>>
>
> I got stats back from this implementation, yes. Let me know if there's
> things my patch isn't catering for.
I'll give it a spin...
Thanks,
John
On 14/12/2021 18:42, Andrew Kilroy wrote:
> This patch implements the --topdown option by making use of metrics to
> dictate what counters are obtained in order to show the various topdown
> columns, e.g. Frontend Bound, Backend Bound, Retiring and Bad
> Speculation.
>
> The MetricGroup name is used to identify which set of metrics are to be
> shown. For the moment use TopDownL1 and enable for arm64
>
> Signed-off-by: Andrew Kilroy<[email protected]>
This works in that it gives results, but it does not produce the same
output format as on x86, nor does it have the same usage restrictions
(for example, x86 requires -a on the command line, as shown below).
For my x86 broadwell:
john@localhost:~/linux/tools/perf> sudo ./perf stat --topdown sleep 1
top down event configuration requires system-wide mode (-a)
john@localhost:~/linux/tools/perf> sudo ./perf stat --topdown -a sleep 1
 Performance counter stats for 'system wide':

                 retiring   bad speculation   frontend bound   backend bound
S0-D0-C0    2      29.2%         6.3%             37.4%            27.1%
S0-D0-C1    2      20.4%         6.2%             42.1%            31.3%

       0.998007338 seconds time elapsed
john@localhost:~/linux/tools/perf>
---
Then my arm64 hip08 platform:
john@debian:~/kernel-dev/tools/perf$ sudo ./perf stat --topdown sleep 1
 Performance counter stats for 'sleep 1':

    retiring   bad_speculation   backend_bound   frontend_bound
        0.19              0.17            0.27             0.37

       1.000832714 seconds time elapsed

       0.000891000 seconds user
       0.000000000 seconds sys
And there is no colouring for results which are above/below standard
thresholds (see stat-shadow.c:get_ratio_color()).
My impression is that we're not plugging the results from
metricgroup__parse_groups_to_evlist() into the --topdown print
functionality properly.
Thanks,
John
On 15/12/2021 10:52, John Garry wrote:
> Hi Andrew,
>
>>> const struct pmu_event *metricgroup__find_metric(const char *metric,
>>> const struct
>>> pmu_events_map *map);
>>> int metricgroup__parse_groups_test(struct evlist *evlist,
>>> diff --git a/tools/perf/util/topdown.c b/tools/perf/util/topdown.c
>>> index 1081b20f9891..57c0c5f2c6bd 100644
>>> --- a/tools/perf/util/topdown.c
>>> +++ b/tools/perf/util/topdown.c
>>> @@ -56,3 +56,9 @@ __weak bool arch_topdown_sample_read(struct evsel
>>> *leader __maybe_unused)
>>> {
>>> return false;
>>> }
>>> +
>>> +__weak bool arch_topdown_use_json_metrics(void)
>>> +{
>
> AFAICS, only x86 supports topdown today and that is because they have
> special kernel topdown events exposed for the kernel CPU PMU driver. So
> other architectures - not only arm - would need to rely on metricgroups for
> topdown support. So let's make this generic for all archs.
>
>> I like this extension! I've ranted in the past about weak symbols
>> breaking with archives due to lazy loading [1]. In this case
>> tools/perf/arch/arm64/util/topdown.c has no other symbols within it
>> and so the weak symbol has an extra chance of being linked
>> incorrectly. We could add a new command line of --topdown-json to
>> avoid this, but there seems little difference in doing this over just
>> doing '-M TopDownL1'.
>
>
>> Is it possible to use the json metric approach
>> for when the CPU version fails?
>
> I think that's a good idea.
>
While looking into using the json metrics approach as a fallback to the
original, I noticed there are two json metricgroups 'TopdownL1' and
'TopDownL1' (note the case difference) on x86. Not sure if the case
difference is intentional.
On skylake, 'TopdownL1' contains the four json metrics Retiring,
Bad_Speculation, Frontend_Bound, and Backend_Bound. 'TopDownL1' has
'SLOTS', 'CoreIPC', 'CoreIPC_SMT', 'Instructions'. I think it's a
similar situation on other x86 chips.
The search for those metrics by metricgroup name is case insensitive, so
it's picking up all 8 metrics when using the lookup string 'TopDownL1'.
So the extra 'SLOTS', 'CoreIPC', 'CoreIPC_SMT', 'Instructions' metrics
would be printed as well.
Not sure what the significance of the case difference might be.
Should we use a different string than 'TopDownL1' as the metric group
name to search for?
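To illustrate the effect, here is a minimal sketch, not the metricgroup.c
code. The struct and function names are hypothetical, and each metric is
given a single group, whereas real json entries can list several. It
counts how many metrics a lookup string selects with and without case
folding, using the skylake group assignments listed above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* strcasecmp() (POSIX) */

/* Hypothetical, simplified stand-in for one json metric entry. */
struct demo_metric {
	const char *name;
	const char *group;
};

/* Group assignments as described for skylake in this message. */
static const struct demo_metric demo_metrics[] = {
	{ "Retiring",        "TopdownL1" },
	{ "Bad_Speculation", "TopdownL1" },
	{ "Frontend_Bound",  "TopdownL1" },
	{ "Backend_Bound",   "TopdownL1" },
	{ "SLOTS",           "TopDownL1" },
	{ "CoreIPC",         "TopDownL1" },
	{ "CoreIPC_SMT",     "TopDownL1" },
	{ "Instructions",    "TopDownL1" },
};

/* Count the metrics whose group matches, optionally ignoring case. */
static size_t demo_count_group(const char *group, bool ignore_case)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(demo_metrics) / sizeof(demo_metrics[0]); i++) {
		int diff = ignore_case ?
			strcasecmp(demo_metrics[i].group, group) :
			strcmp(demo_metrics[i].group, group);

		if (diff == 0)
			n++;
	}
	return n;
}
```

With ignore_case set, the string 'TopDownL1' selects all 8 entries; an
exact comparison selects only the 4 in the group it names.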
Andrew
On 12/20/2021 9:21 AM, Andrew Kilroy wrote:
>
> On 15/12/2021 10:52, John Garry wrote:
>> Hi Andrew,
>>
>>>> const struct pmu_event *metricgroup__find_metric(const char *metric,
>>>> const struct
>>>> pmu_events_map *map);
>>>> int metricgroup__parse_groups_test(struct evlist *evlist,
>>>> diff --git a/tools/perf/util/topdown.c b/tools/perf/util/topdown.c
>>>> index 1081b20f9891..57c0c5f2c6bd 100644
>>>> --- a/tools/perf/util/topdown.c
>>>> +++ b/tools/perf/util/topdown.c
>>>> @@ -56,3 +56,9 @@ __weak bool arch_topdown_sample_read(struct evsel
>>>> *leader __maybe_unused)
>>>> {
>>>> return false;
>>>> }
>>>> +
>>>> +__weak bool arch_topdown_use_json_metrics(void)
>>>> +{
>>
>> AFAICS, only x86 supports topdown today and that is because they have
>> special kernel topdown events exposed for the kernel CPU PMU driver.
>> So other architectures - not only arm - would need to rely on
>> metricgroups for topdown support. So let's make this generic for all
>> archs.
>>
>>> I like this extension! I've ranted in the past about weak symbols
>>> breaking with archives due to lazy loading [1]. In this case
>>> tools/perf/arch/arm64/util/topdown.c has no other symbols within it
>>> and so the weak symbol has an extra chance of being linked
>>> incorrectly. We could add a new command line of --topdown-json to
>>> avoid this, but there seems little difference in doing this over just
>>> doing '-M TopDownL1'.
>>
>>
>>> Is it possible to use the json metric approach
>>> for when the CPU version fails?
>>
>> I think that's a good idea.
>>
>
>
> While looking into using the json metrics approach as a fallback to
> the original, I noticed there are two json metricgroups 'TopdownL1'
> and 'TopDownL1' (note the case difference) on x86. Not sure if the
> case difference is intentional.
>
> On skylake, 'TopdownL1' contains the four json metrics Retiring,
> Bad_Speculation, Frontend_Bound, and Backend_Bound. 'TopDownL1' has
> 'SLOTS', 'CoreIPC', 'CoreIPC_SMT', 'Instructions'. I think it's a
> similar situation on other x86 chips.
There's also SMT metrics.
We don't want to include CoreIPC etc. by default because it would cause
multiplexing in common situations.
>
> The search for those metrics by metricgroup name is case insensitive,
> so it's picking up all 8 metrics when using the lookup string
> 'TopDownL1'. So the extra 'SLOTS', 'CoreIPC', 'CoreIPC_SMT',
> 'Instructions' metrics would be printed as well.
>
> Not sure what the significance of the case difference might be.
>
> Should we use a different string than 'TopDownL1' as the metric group
> name to search for?
We should probably fix the case (or just make the match case insensitive).
Can we just keep x86 using the kernel metrics? On Skylake and earlier
it needs different formulas and other options depending on whether SMT is
on or off, so it's not straightforward to express it as json directly.
-Andi
On 17/12/2021 10:19, John Garry wrote:
> On 14/12/2021 18:42, Andrew Kilroy wrote:
>> This patch implements the --topdown option by making use of metrics to
>> dictate what counters are obtained in order to show the various topdown
>> columns, e.g. Frontend Bound, Backend Bound, Retiring and Bad
>> Speculation.
>>
>> The MetricGroup name is used to identify which set of metrics are to be
>> shown. For the moment use TopDownL1 and enable for arm64
>>
>> Signed-off-by: Andrew Kilroy<[email protected]>
>
> This works in that it gives results, but does not supply the same output
> format as for x86 nor has same restrictions in usage (-a commandline
> required, for example, below).
>
> For my x86 broadwell:
>
> john@localhost:~/linux/tools/perf> sudo ./perf stat --topdown sleep 1
> top down event configuration requires system-wide mode (-a)
>
> john@localhost:~/linux/tools/perf> sudo ./perf stat --topdown -a sleep 1
> Performance counter stats for 'system wide':
>
>                  retiring   bad speculation   frontend bound   backend bound
> S0-D0-C0    2      29.2%         6.3%             37.4%            27.1%
> S0-D0-C1    2      20.4%         6.2%             42.1%            31.3%
>
> 0.998007338 seconds time elapsed
>
> john@localhost:~/linux/tools/perf>
>
Judging by comments in commits 44b1e60ab576c, 55c36a9fc2aaa, whether -a
is required or not differs depending on the cpu. As to why, I'm not
sure. The requirement was relaxed in 55c36a9fc2aaa, but I guess that
doesn't affect the broadwell.
The stats are printed per cpu because on your broadwell, the existing
code is forcing per-core mode. Hence why -a is required. See
builtin-stat.c lines 1885-1890 on commit 8ff4f20f3eb55.
My patch wasn't forcing per-core, hence it didn't require -a.
Andrew
On 17/12/2021 10:19, John Garry wrote:
>
> And there is no colouring for results which are above/below standard
> thresholds (see stat-shadow.c:get_ratio_color()).
>
> My impression is that we're not plugging the results from
> metricgroup__parse_groups_to_evlist() into the --topdown print
> functionality properly.
>
The --topdown kernel event colouring is dictated by a large if-else
statement in stat-shadow.c:perf_stat__print_shadow_stats.
There are branches depending on what is returned by
perf_stat_evsel__is(), for example:

	} else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_BUBBLES)) {
		double fe_bound = td_fe_bound(cpu, st, &rsd);

		if (fe_bound > 0.2)
			color = PERF_COLOR_RED;
		print_metric(config, ctxp, color, "%8.1f%%", "frontend bound",
			     fe_bound * 100.);
	} else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_RETIRED)) {
Because the patches are enabling metrics (equivalent of the -M
'somemetricname' option), the perf_stat__print_shadow_stats function
always makes calls to generic_metric(), where colours are never picked.
Seeing thresholds like:
retiring > 0.7
fe_bound > 0.2
be_bound > 0.2
bad_spec > 0.1
I'm not sure about adding the colouring really. Are these thresholds
x86 specific?
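For reference, a minimal sketch of what a shared threshold check could
look like if the json metrics path were to reuse these values. This is
not existing perf code; the enum and function names are hypothetical,
and only the four threshold values come from the list above:

```c
#include <stdbool.h>

/* Hypothetical identifiers for the four topdown level 1 columns. */
enum demo_td_metric {
	DEMO_TD_RETIRING,
	DEMO_TD_FE_BOUND,
	DEMO_TD_BE_BOUND,
	DEMO_TD_BAD_SPEC,
};

/* Threshold for each column, as listed in this message. */
static double demo_td_threshold(enum demo_td_metric m)
{
	switch (m) {
	case DEMO_TD_RETIRING:
		return 0.7;
	case DEMO_TD_FE_BOUND:
		return 0.2;
	case DEMO_TD_BE_BOUND:
		return 0.2;
	case DEMO_TD_BAD_SPEC:
		return 0.1;
	}
	return 1.0;	/* unknown column: a ratio <= 1.0 never exceeds this */
}

/* True when the ratio (0.0 to 1.0) is above the column's threshold. */
static bool demo_td_exceeds(enum demo_td_metric m, double ratio)
{
	return ratio > demo_td_threshold(m);
}
```

Applied to the hip08 figures John posted, frontend_bound (0.37),
backend_bound (0.27) and bad_speculation (0.17) would exceed their
thresholds, while retiring (0.19) would not. Whether those thresholds
are meaningful outside x86 is the open question.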
> Thanks,
> John
Andrew
On 15/12/2021 12:53, John Garry wrote:
>>> In addition we could also add a --topdown arg to force using JSON
>>> metricgroups.
>>>
>>
>> What arg do you think would be supplied?
>
> something like -json or -metricgroup, meaning "Use pmu-events metric
> events to calculate topdown results rather than kernel CPU PMU events.
> This is default fallback if the kernel CPU PMU does not support topdown
> events"
>
John,
Andi requested that the json metrics be disabled on x86 in
https://lore.kernel.org/linux-perf-users/[email protected]/#t
If we do that, do you still think the --topdown=<arg> modification is
needed? I guess it would have to report an error if someone requests
--topdown=json-metrics on x86.
Thanks,
Andrew
On 06/01/2022 16:33, Andrew Kilroy wrote:
>
> Andi requested that the json metrics be disabled on x86 in
>
>
> https://lore.kernel.org/linux-perf-users/[email protected]/#t
>
>
> If we do that, do you still think the --topdown=<arg> modification is
> needed?
Probably not. I assumed that the metricgroup solution could work for x86
also, so now no need for this option.
> I guess it would have to report an error if someone requests
> --topdown=json-metrics on x86.
>
Thanks,
John
This patch series adds the ability for the --topdown option to use
metrics (defined in json files in the pmu-events directory) to describe
how to calculate and determine the output columns for topdown level 1.
For this to work, a number of metrics have to be defined for the
relevant processor with the MetricGroup name "TopDownL1". perf will
arrange for the events defined in each metric to be collected, and each
metric will be displayed in the output, as if
perf stat -M 'TopDownL1' --metric-only -- exampleapp
had been used.
Topdown was already implemented where certain kernel events are defined.
If these kernel events are defined, the new json metrics behaviour is
not used. The json metrics approach is only used if the kernel events
are absent.
The last patch in the series disables the json metrics behaviour on x86.
This is because of concerns that due to SMT it's not straightforward to
express the various formulas as json for certain x86 cpus. See
https://lore.kernel.org/linux-perf-users/[email protected]/#t
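
The selection rules described above can be summarised as a small decision function. This is a simplified model under stated assumptions, not the code in the patches: the enum and function names are hypothetical, and the two inputs stand in for the architecture check and the kernel event detection.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum topdown_choice {
	CHOICE_KERNEL_EVENTS,
	CHOICE_JSON_METRICS,
	CHOICE_ERROR,
};

/*
 * arch_is_x86:   json metrics are never used on x86.
 * kernel_events: 1 if topdown kernel events were detected,
 *                0 if they are absent,
 *                negative if detection itself failed.
 */
static enum topdown_choice pick_topdown(bool arch_is_x86, int kernel_events)
{
	if (arch_is_x86)
		return CHOICE_KERNEL_EVENTS;
	if (kernel_events < 0)
		return CHOICE_ERROR;
	return kernel_events > 0 ? CHOICE_KERNEL_EVENTS : CHOICE_JSON_METRICS;
}
```

Note that on x86 the kernel events path is taken without running the detection at all, so any failure is reported by that path rather than by the detection step.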
Changes since v1:
- Addition of code to detect whether topdown kernel events are available,
  and if so use them. Otherwise set up the json metrics.
- Disable the use of json metrics on x86, for the reason stated above.
- Link to v1:
  https://lore.kernel.org/linux-perf-users/[email protected]/T/#m514a788bdc3613f057dbd5b6a339c762d37f8b85
Andrew Kilroy (5):
perf stat: Implement --topdown with metrics
perf stat: Topdown kernel events setup function
perf stat: Topdown json metrics setup function
perf stat: Detect if topdown kernel events supported
perf stat: Ensure only topdown kernel events used on x86
tools/perf/builtin-stat.c | 257 +++++++++++++++++++++++++---------
tools/perf/util/metricgroup.c | 12 ++
tools/perf/util/metricgroup.h | 7 +
tools/perf/util/topdown.c | 10 ++
tools/perf/util/topdown.h | 1 +
5 files changed, 223 insertions(+), 64 deletions(-)
--
2.17.1
This patch implements the --topdown option by making use of metrics to
dictate what counters are obtained in order to show the various topdown
columns, e.g. Frontend Bound, Backend Bound, Retiring and Bad
Speculation.
The MetricGroup name is used to identify which set of metrics is to be
shown. For the moment, use TopDownL1 and enable this for arm64 only.
Signed-off-by: Andrew Kilroy <[email protected]>
---
tools/perf/builtin-stat.c | 13 +++++++++++++
tools/perf/util/metricgroup.c | 12 ++++++++++++
tools/perf/util/metricgroup.h | 7 +++++++
tools/perf/util/topdown.c | 10 ++++++++++
tools/perf/util/topdown.h | 1 +
5 files changed, 43 insertions(+)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index f6ca2b054c5b..975b1e0edaf4 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1856,6 +1856,18 @@ static int add_default_attributes(void)
if (!force_metric_only)
stat_config.metric_only = true;
+ if (topdown_can_use_json_metrics()) {
+ if (metricgroup__parse_groups_to_evlist(evsel_list, "TopDownL1",
+ stat_config.metric_no_group,
+ stat_config.metric_no_merge,
+ &stat_config.metric_events) < 0) {
+ pr_err("Could not form list of metrics for topdown\n");
+ return -1;
+ }
+
+ goto end_of_topdown_setup;
+ }
+
if (pmu_have_event("cpu", topdown_metric_L2_attrs[5])) {
metric_attrs = topdown_metric_L2_attrs;
max_level = 2;
@@ -1919,6 +1931,7 @@ static int add_default_attributes(void)
fprintf(stderr, "System does not support topdown\n");
return -1;
}
+end_of_topdown_setup:
free(str);
}
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 51c99cb08abf..9b0394372096 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -1535,6 +1535,18 @@ int metricgroup__parse_groups(const struct option *opt,
metric_no_merge, NULL, metric_events, map);
}
+int metricgroup__parse_groups_to_evlist(struct evlist *perf_evlist,
+ const char *str,
+ bool metric_no_group,
+ bool metric_no_merge,
+ struct rblist *metric_events)
+{
+ const struct pmu_events_map *map = pmu_events_map__find();
+
+ return parse_groups(perf_evlist, str, metric_no_group,
+ metric_no_merge, NULL, metric_events, map);
+}
+
int metricgroup__parse_groups_test(struct evlist *evlist,
const struct pmu_events_map *map,
const char *str,
diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index 2b42b778d1bf..1f143ad1d9e1 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -70,6 +70,13 @@ int metricgroup__parse_groups(const struct option *opt,
bool metric_no_group,
bool metric_no_merge,
struct rblist *metric_events);
+
+int metricgroup__parse_groups_to_evlist(struct evlist *perf_evlist,
+ const char *str,
+ bool metric_no_group,
+ bool metric_no_merge,
+ struct rblist *metric_events);
+
const struct pmu_event *metricgroup__find_metric(const char *metric,
const struct pmu_events_map *map);
int metricgroup__parse_groups_test(struct evlist *evlist,
diff --git a/tools/perf/util/topdown.c b/tools/perf/util/topdown.c
index 1081b20f9891..a542dddd97f3 100644
--- a/tools/perf/util/topdown.c
+++ b/tools/perf/util/topdown.c
@@ -56,3 +56,13 @@ __weak bool arch_topdown_sample_read(struct evsel *leader __maybe_unused)
{
return false;
}
+
+bool topdown_can_use_json_metrics(void)
+{
+#if defined(__aarch64__)
+ return true;
+#else
+ return false;
+#endif
+}
+
diff --git a/tools/perf/util/topdown.h b/tools/perf/util/topdown.h
index 2f0d0b887639..3e28f77443d3 100644
--- a/tools/perf/util/topdown.h
+++ b/tools/perf/util/topdown.h
@@ -8,5 +8,6 @@ void arch_topdown_group_warn(void);
bool arch_topdown_sample_read(struct evsel *leader);
int topdown_filter_events(const char **attr, char **str, bool use_group);
+bool topdown_can_use_json_metrics(void);
#endif
--
2.17.1
Move the code block that sets up topdown by using kernel events into its
own function.
Signed-off-by: Andrew Kilroy <[email protected]>
---
tools/perf/builtin-stat.c | 157 ++++++++++++++++++++------------------
1 file changed, 82 insertions(+), 75 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 975b1e0edaf4..ab956ac97d94 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1649,6 +1649,84 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
return 0;
}
+static int try_non_json_metrics_topdown(void)
+{
+ int err;
+ const char **metric_attrs = topdown_metric_attrs;
+ unsigned int max_level = 1;
+ char *str = NULL;
+ bool warn = false;
+
+ if (!force_metric_only)
+ stat_config.metric_only = true;
+
+ if (pmu_have_event("cpu", topdown_metric_L2_attrs[5])) {
+ metric_attrs = topdown_metric_L2_attrs;
+ max_level = 2;
+ }
+
+ if (stat_config.topdown_level > max_level) {
+ pr_err("Invalid top-down metrics level. The max level is %u.\n", max_level);
+ return -1;
+ } else if (!stat_config.topdown_level)
+ stat_config.topdown_level = max_level;
+
+ if (topdown_filter_events(metric_attrs, &str, 1) < 0) {
+ pr_err("Out of memory\n");
+ return -1;
+ }
+ if (metric_attrs[0] && str) {
+ if (!stat_config.interval && !stat_config.metric_only) {
+ fprintf(stat_config.output,
+ "Topdown accuracy may decrease when measuring long periods.\n"
+ "Please print the result regularly, e.g. -I1000\n");
+ }
+ goto setup_metrics;
+ }
+
+ zfree(&str);
+
+ if (stat_config.aggr_mode != AGGR_GLOBAL &&
+ stat_config.aggr_mode != AGGR_CORE) {
+ pr_err("top down event configuration requires --per-core mode\n");
+ return -1;
+ }
+ stat_config.aggr_mode = AGGR_CORE;
+ if (nr_cgroups || !target__has_cpu(&target)) {
+ pr_err("top down event configuration requires system-wide mode (-a)\n");
+ return -1;
+ }
+
+ if (topdown_filter_events(topdown_attrs, &str,
+ arch_topdown_check_group(&warn)) < 0) {
+ pr_err("Out of memory\n");
+ return -1;
+ }
+ if (topdown_attrs[0] && str) {
+ struct parse_events_error errinfo;
+ if (warn)
+ arch_topdown_group_warn();
+setup_metrics:
+ parse_events_error__init(&errinfo);
+ err = parse_events(evsel_list, str, &errinfo);
+ if (err) {
+ fprintf(stderr,
+ "Cannot set up top down events %s: %d\n",
+ str, err);
+ parse_events_error__print(&errinfo, str);
+ parse_events_error__exit(&errinfo);
+ free(str);
+ return -1;
+ }
+ parse_events_error__exit(&errinfo);
+ } else {
+ fprintf(stderr, "System does not support topdown\n");
+ return -1;
+ }
+ free(str);
+ return err;
+}
+
/*
* Add default attributes, if there were no attributes specified or
* if -d/--detailed, -d -d or -d -d -d is used:
@@ -1848,14 +1926,6 @@ static int add_default_attributes(void)
}
if (topdown_run) {
- const char **metric_attrs = topdown_metric_attrs;
- unsigned int max_level = 1;
- char *str = NULL;
- bool warn = false;
-
- if (!force_metric_only)
- stat_config.metric_only = true;
-
if (topdown_can_use_json_metrics()) {
if (metricgroup__parse_groups_to_evlist(evsel_list, "TopDownL1",
stat_config.metric_no_group,
@@ -1864,75 +1934,12 @@ static int add_default_attributes(void)
pr_err("Could not form list of metrics for topdown\n");
return -1;
}
-
- goto end_of_topdown_setup;
- }
-
- if (pmu_have_event("cpu", topdown_metric_L2_attrs[5])) {
- metric_attrs = topdown_metric_L2_attrs;
- max_level = 2;
- }
-
- if (stat_config.topdown_level > max_level) {
- pr_err("Invalid top-down metrics level. The max level is %u.\n", max_level);
- return -1;
- } else if (!stat_config.topdown_level)
- stat_config.topdown_level = max_level;
-
- if (topdown_filter_events(metric_attrs, &str, 1) < 0) {
- pr_err("Out of memory\n");
- return -1;
- }
- if (metric_attrs[0] && str) {
- if (!stat_config.interval && !stat_config.metric_only) {
- fprintf(stat_config.output,
- "Topdown accuracy may decrease when measuring long periods.\n"
- "Please print the result regularly, e.g. -I1000\n");
- }
- goto setup_metrics;
- }
-
- zfree(&str);
-
- if (stat_config.aggr_mode != AGGR_GLOBAL &&
- stat_config.aggr_mode != AGGR_CORE) {
- pr_err("top down event configuration requires --per-core mode\n");
- return -1;
- }
- stat_config.aggr_mode = AGGR_CORE;
- if (nr_cgroups || !target__has_cpu(&target)) {
- pr_err("top down event configuration requires system-wide mode (-a)\n");
- return -1;
- }
-
- if (topdown_filter_events(topdown_attrs, &str,
- arch_topdown_check_group(&warn)) < 0) {
- pr_err("Out of memory\n");
- return -1;
- }
- if (topdown_attrs[0] && str) {
- struct parse_events_error errinfo;
- if (warn)
- arch_topdown_group_warn();
-setup_metrics:
- parse_events_error__init(&errinfo);
- err = parse_events(evsel_list, str, &errinfo);
- if (err) {
- fprintf(stderr,
- "Cannot set up top down events %s: %d\n",
- str, err);
- parse_events_error__print(&errinfo, str);
- parse_events_error__exit(&errinfo);
- free(str);
- return -1;
- }
- parse_events_error__exit(&errinfo);
} else {
- fprintf(stderr, "System does not support topdown\n");
- return -1;
+ err = try_non_json_metrics_topdown();
+ if (err)
+ return err;
}
-end_of_topdown_setup:
- free(str);
+
}
if (!evsel_list->core.nr_entries) {
--
2.17.1
Move the setup of json metrics for measuring L1 topdown statistics into
its own function.
Also move the setup of the metric_only member of stat_config outside,
since it's supposed to be common to both the kernel events and json
metrics implementations.
Signed-off-by: Andrew Kilroy <[email protected]>
---
tools/perf/builtin-stat.c | 31 +++++++++++++++++++------------
1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index ab956ac97d94..6122f3a764f8 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1651,15 +1651,12 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
static int try_non_json_metrics_topdown(void)
{
- int err;
+ int err = 0;
const char **metric_attrs = topdown_metric_attrs;
unsigned int max_level = 1;
char *str = NULL;
bool warn = false;
- if (!force_metric_only)
- stat_config.metric_only = true;
-
if (pmu_have_event("cpu", topdown_metric_L2_attrs[5])) {
metric_attrs = topdown_metric_L2_attrs;
max_level = 2;
@@ -1727,6 +1724,18 @@ static int try_non_json_metrics_topdown(void)
return err;
}
+static int try_json_metrics_topdown(void)
+{
+ if (metricgroup__parse_groups_to_evlist(evsel_list, "TopDownL1",
+ stat_config.metric_no_group,
+ stat_config.metric_no_merge,
+ &stat_config.metric_events) < 0) {
+ pr_err("Could not form list of metrics for topdown\n");
+ return -1;
+ }
+ return 0;
+}
+
/*
* Add default attributes, if there were no attributes specified or
* if -d/--detailed, -d -d or -d -d -d is used:
@@ -1926,20 +1935,18 @@ static int add_default_attributes(void)
}
if (topdown_run) {
+ if (!force_metric_only)
+ stat_config.metric_only = true;
+
if (topdown_can_use_json_metrics()) {
- if (metricgroup__parse_groups_to_evlist(evsel_list, "TopDownL1",
- stat_config.metric_no_group,
- stat_config.metric_no_merge,
- &stat_config.metric_events) < 0) {
- pr_err("Could not form list of metrics for topdown\n");
- return -1;
- }
+ err = try_json_metrics_topdown();
+ if (err)
+ return err;
} else {
err = try_non_json_metrics_topdown();
if (err)
return err;
}
-
}
if (!evsel_list->core.nr_entries) {
--
2.17.1
This patch checks if the kernel events topdown implementation can be set
up.
Do this by iterating over two arrays defining what kernel events should
be present. If either of those arrays defines at least one event that is
present, the kernel events are supported.
If no topdown kernel events are detected, the json metrics approach is
attempted.
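
The idea of the check can be sketched independently of perf as follows. This is an illustration under assumptions, not the code in the diff below: the callback type, the helper names and the sample event names are all hypothetical stand-ins, and the real patch goes through topdown_filter_events() rather than a callback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for asking the PMU whether an event exists. */
typedef bool (*have_event_fn)(const char *name);

/* True if at least one name in the NULL-terminated array is present. */
static bool any_event_present(const char *const *events, have_event_fn have)
{
	for (size_t i = 0; events[i] != NULL; i++) {
		if (have(events[i]))
			return true;
	}
	return false;
}

/* Try the L2 list first, then fall back to the basic list. */
static bool kernel_topdown_supported(const char *const *l2_events,
				     const char *const *l1_events,
				     have_event_fn have)
{
	return any_event_present(l2_events, have) ||
	       any_event_present(l1_events, have);
}

/* Sample data for illustration; the event names are placeholders. */
static const char *const sample_l2_events[] = {
	"topdown-heavy-ops", "topdown-br-mispredict", NULL,
};
static const char *const sample_l1_events[] = {
	"topdown-total-slots", "topdown-fetch-bubbles", NULL,
};

/* Pretend PMU that only exposes one of the basic events. */
static bool fake_pmu_has(const char *name)
{
	return strcmp(name, "topdown-fetch-bubbles") == 0;
}
```

With the sample data, the L2 list yields nothing but the basic list matches one event, so the kernel events path would be chosen. If neither list matched, the json metrics path would be tried instead.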
Signed-off-by: Andrew Kilroy <[email protected]>
---
tools/perf/builtin-stat.c | 114 +++++++++++++++++++++++++++++++++++---
1 file changed, 106 insertions(+), 8 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 6122f3a764f8..2f579d29f9f5 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -93,6 +93,7 @@
#include <linux/ctype.h>
#include <perf/evlist.h>
+#include <linux/string.h>
#define DEFAULT_SEPARATOR " "
#define FREEZE_ON_SMI_PATH "devices/cpu/freeze_on_smi"
@@ -1649,6 +1650,75 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
return 0;
}
+// Return the number of elements in the array excluding the final
+// NULL array element.
+static size_t str_array_len(const char **str)
+{
+ size_t c = 0;
+ while (str[c] != NULL) {
+ ++c;
+ }
+ return c;
+}
+
+// Checks if topdown kernel events listed by the given
+// array of event names are supported.
+//
+// The input array is not modified.
+//
+// Returns 1 if supported,
+// 0 if not supported,
+// -1 if some unexpected error occurred while checking
+static int check_events_available(const char **orig_events_array)
+{
+ char *str = NULL;
+ size_t basic_events_len = str_array_len(orig_events_array);
+ size_t basic_events_cpy_bytes = sizeof(const char *) * (basic_events_len + 1);
+ const char **basic_events = NULL;
+
+ // This function shouldn't have any side effects.
+ // Since topdown_filter_events mutates the arrays it inspects,
+ // this function takes temporary shallow copies of the input
+ // string array
+ basic_events = memdup(orig_events_array, basic_events_cpy_bytes);
+ if (basic_events == NULL) {
+ pr_err("Out of memory, could not copy topdown events array\n");
+ return -1;
+ }
+
+ if (topdown_filter_events(basic_events, &str, 1) < 0) {
+ pr_err("Out of memory, could not form events string\n");
+ free(basic_events);
+ return -1;
+ }
+ if (basic_events[0] && str) {
+ free(basic_events);
+ free(str);
+ return 1;
+ }
+ free(basic_events);
+ free(str);
+
+ return 0;
+}
+
+// Checks if topdown kernel events support has been detected
+// on this system.
+//
+// Returns 1 if supported,
+// 0 if not supported,
+// -1 if some unexpected error occurred while checking
+static int topdown_kernel_events_supported(void)
+{
+ int l1_and_l2_available = check_events_available(topdown_metric_L2_attrs);
+
+ if (l1_and_l2_available == 0) {
+ return check_events_available(topdown_attrs);
+ } else {
+ return l1_and_l2_available;
+ }
+}
+
static int try_non_json_metrics_topdown(void)
{
int err = 0;
@@ -1736,6 +1806,27 @@ static int try_json_metrics_topdown(void)
return 0;
}
+enum topdown_mechanism {
+ TOPDOWN_JSON_METRICS,
+ TOPDOWN_KERNEL_EVENTS,
+ TOPDOWN_DETECTION_ERROR,
+};
+
+static enum topdown_mechanism choose_topdown_mechanism(void)
+{
+ int kernel_events_supported = topdown_kernel_events_supported();
+
+ if (kernel_events_supported > 0) {
+ pr_debug("topdown kernel events are supported\n");
+ return TOPDOWN_KERNEL_EVENTS;
+ } else if (kernel_events_supported == 0) {
+ pr_debug("topdown kernel events are unsupported\n");
+ return TOPDOWN_JSON_METRICS;
+ } else {
+ return TOPDOWN_DETECTION_ERROR;
+ }
+}
+
/*
* Add default attributes, if there were no attributes specified or
* if -d/--detailed, -d -d or -d -d -d is used:
@@ -1935,17 +2026,24 @@ static int add_default_attributes(void)
}
if (topdown_run) {
+ int topdown_err = 0;
if (!force_metric_only)
stat_config.metric_only = true;
- if (topdown_can_use_json_metrics()) {
- err = try_json_metrics_topdown();
- if (err)
- return err;
- } else {
- err = try_non_json_metrics_topdown();
- if (err)
- return err;
+ switch (choose_topdown_mechanism()) {
+ case TOPDOWN_JSON_METRICS:
+ topdown_err = try_json_metrics_topdown();
+ break;
+ case TOPDOWN_DETECTION_ERROR:
+ return -1;
+ case TOPDOWN_KERNEL_EVENTS:
+ default:
+ topdown_err = try_non_json_metrics_topdown();
+ break;
+ }
+
+ if (topdown_err < 0) {
+ return -1;
}
}
--
2.17.1
Based on advice here:
https://lore.kernel.org/linux-perf-users/[email protected]/#t
Only use the existing kernel events topdown mechanism on x86, not the
json metrics approach. The json metrics are disabled there because of
concerns that, due to SMT, it's not straightforward to express the
various formulas as json for certain x86 cpus.
Signed-off-by: Andrew Kilroy <[email protected]>
---
tools/perf/builtin-stat.c | 22 +++++++++++++---------
tools/perf/util/topdown.c | 6 +++---
2 files changed, 16 insertions(+), 12 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 2f579d29f9f5..eee58fbf1986 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1814,16 +1814,20 @@ enum topdown_mechanism {
static enum topdown_mechanism choose_topdown_mechanism(void)
{
- int kernel_events_supported = topdown_kernel_events_supported();
-
- if (kernel_events_supported > 0) {
- pr_debug("topdown kernel events are supported\n");
- return TOPDOWN_KERNEL_EVENTS;
- } else if (kernel_events_supported == 0) {
- pr_debug("topdown kernel events are unsupported\n");
- return TOPDOWN_JSON_METRICS;
+ if (topdown_can_use_json_metrics()) {
+ int kernel_events_supported = topdown_kernel_events_supported();
+
+ if (kernel_events_supported > 0) {
+ pr_debug("topdown kernel events are supported\n");
+ return TOPDOWN_KERNEL_EVENTS;
+ } else if (kernel_events_supported == 0) {
+ pr_debug("topdown kernel events are unsupported\n");
+ return TOPDOWN_JSON_METRICS;
+ } else {
+ return TOPDOWN_DETECTION_ERROR;
+ }
} else {
- return TOPDOWN_DETECTION_ERROR;
+ return TOPDOWN_KERNEL_EVENTS;
}
}
diff --git a/tools/perf/util/topdown.c b/tools/perf/util/topdown.c
index a542dddd97f3..36f6c29009fb 100644
--- a/tools/perf/util/topdown.c
+++ b/tools/perf/util/topdown.c
@@ -59,10 +59,10 @@ __weak bool arch_topdown_sample_read(struct evsel *leader __maybe_unused)
bool topdown_can_use_json_metrics(void)
{
-#if defined(__aarch64__)
- return true;
-#else
+#if defined(__i386__) || defined(__x86_64__)
return false;
+#else
+ return true;
#endif
}
--
2.17.1
On 11/01/2022 15:07, Andrew Kilroy wrote:
> This patch series adds the ability for the --topdown option to use
> metrics (defined in json files in the pmu-events directory) to describe
> how to calculate and determine the output columns for topdown level 1.
>
> For this to work, a number of metrics have to be defined for the
> relevant processor with the MetricGroup name "TopDownL1". perf will
> arrange for the events defined in each metric to be collected, and each
> metric will be displayed in the output, as if
>
> perf stat -M 'TopDownL1' --metric-only -- exampleapp
>
> had been used.
>
> Topdown was already implemented where certain kernel events are defined.
> If these kernel events are defined, the new json metrics behaviour is
> not used. The json metrics approach is only used if the kernel events
> are absent.
>
> The last patch in the series disables the json metrics behaviour on x86.
> This is because of concerns that due to SMT it's not straightforward to
> express the various formulas as json for certain x86 cpus. See
I suppose this solution is ok.
A concern is that today we only have 1x arm64 platform which actually
supports this in mainline.
Do you have any more which you plan to support?
I think that it's the frontend bound and fetch_bubble event which
doesn't have a standard arm solution.
Note that I do have a series for perf tool which can read arm cpu pmu
sysfs events folder to find events which are implemented (I don't think
all required events are mandated) and match that against the common arch
events JSON, so that we don't need a JSON definition file for each core
implementation from all implementers - this would improve scalability.
However a concern is that some events - like inst_spec - have imp def
meaning, so may not be good to always use by default for all cores metrics.
Thanks,
John
On 20/01/2022 09:26, John Garry wrote:
> On 11/01/2022 15:07, Andrew Kilroy wrote:
>> This patch series adds the ability for the --topdown option to use
>> metrics (defined in json files in the pmu-events directory) to describe
>> how to calculate and determine the output columns for topdown level 1.
>>
>> For this to work, a number of metrics have to be defined for the
>> relevant processor with the MetricGroup name "TopDownL1". perf will
>> arrange for the events defined in each metric to be collected, and each
>> metric will be displayed in the output, as if
>>
>> perf stat -M 'TopDownL1' --metric-only -- exampleapp
>>
>> had been used.
>>
>> Topdown was already implemented where certain kernel events are defined.
>> If these kernel events are defined, the new json metrics behaviour is
>> not used. The json metrics approach is only used if the kernel events
>> are absent.
>>
>> The last patch in the series disables the json metrics behaviour on x86.
>> This is because of concerns that due to SMT it's not straightforward to
>> express the various formulas as json for certain x86 cpus. See
>
> I suppose this solution is ok.
>
> A concern is that today we only have 1x arm64 platform which actually supports this in mainline.
>
> Do you have any more which you plan to support?
>
> I think that it's the frontend bound and fetch_bubble event which doesn't have a standard arm solution.
>
> Note that I do have a series for perf tool which can read arm cpu pmu
> sysfs events folder to find events which are implemented (I don't think
> all required events are mandated) and match that against the common arch
> events JSON, so that we don't need a JSON definition file for each core
> implementation from all implementers - this would improve scalability.
> However a concern is that some events - like inst_spec - have imp def
> meaning, so may not be good to always use by default for all cores metrics.
Sadly the sysfs list isn't complete, it only includes the events
discoverable from the PMCEIDx registers, and they only cover the
ranges 0x0000-0x003f and 0x4000-0x403f. Although that covers most
events used in standard metrics, it doesn't cover all. Most CPUs
have many more events besides these, and there are now architected
(common) events in the 0x8000 range.
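
To make the gap concrete, here is a minimal sketch of the range check implied above. The function name is made up for illustration; the two ranges are the ones stated in the previous paragraph.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the event number falls in one of the
 * two ranges covered by the PMCEIDx registers, and so could appear in
 * the sysfs events list.
 */
static bool pmceid_discoverable(uint16_t event)
{
	return event <= 0x003f || (event >= 0x4000 && event <= 0x403f);
}
```

Anything outside those ranges, including the architected events in the 0x8000 range, returns false here and would still need vendor-supplied information.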
There's a lot to be said for having the kernel expose the complete
list to userspace via sysfs. It would save each userspace tool
needing its own set of vendor-supplied information. But to get
that complete list, the kernel would need the same vendor
information the userspace tools are using now.
(And your concern about the metrics varying even when the same
events are present, is quite valid.)
Al
Hi Andi,
On 21/12/2021 14:03, Andi Kleen wrote:
>
> On 12/20/2021 9:21 AM, Andrew Kilroy wrote:
>>
>> On 15/12/2021 10:52, John Garry wrote:
>>> Hi Andrew,
>>>
>>>>> const struct pmu_event *metricgroup__find_metric(const char *metric,
>>>>> const struct
>>>>> pmu_events_map *map);
>>>>> int metricgroup__parse_groups_test(struct evlist *evlist,
>>>>> diff --git a/tools/perf/util/topdown.c b/tools/perf/util/topdown.c
>>>>> index 1081b20f9891..57c0c5f2c6bd 100644
>>>>> --- a/tools/perf/util/topdown.c
>>>>> +++ b/tools/perf/util/topdown.c
>>>>> @@ -56,3 +56,9 @@ __weak bool arch_topdown_sample_read(struct evsel
>>>>> *leader __maybe_unused)
>>>>> {
>>>>> return false;
>>>>> }
>>>>> +
>>>>> +__weak bool arch_topdown_use_json_metrics(void)
>>>>> +{
>>>
>>> AFAICS, only x86 supports topdown today and that is because they have
>>> special kernel topdown events exposed for the kernel CPU PMU driver.
>>> So other architectures - not only arm - would need to rely on
>>> metricgroups for topdown support. So let's make this generic for all
>>> archs.
>>>
>>>> I like this extension! I've ranted in the past about weak symbols
>>>> breaking with archives due to lazy loading [1]. In this case
>>>> tools/perf/arch/arm64/util/topdown.c has no other symbols within it
>>>> and so the weak symbol has an extra chance of being linked
>>>> incorrectly. We could add a new command line of --topdown-json to
>>>> avoid this, but there seems little difference in doing this over just
>>>> doing '-M TopDownL1'.
>>>
>>>
>>>> Is it possible to use the json metric approach
>>>> for when the CPU version fails?
>>>
>>> I think that's a good idea.
>>>
>>
>>
>> While looking into using the json metrics approach as a fallback to
>> the original, I noticed there are two json metricgroups 'TopdownL1'
>> and 'TopDownL1' (note the case difference) on x86. Not sure if the
>> case difference is intentional.
>>
>> On skylake, 'TopdownL1' contains the four json metrics Retiring,
>> Bad_Speculation, Frontend_Bound, and Backend_Bound. 'TopDownL1' has
>> 'SLOTS', 'CoreIPC', 'CoreIPC_SMT', 'Instructions'. I think it's a
>> similar situation on other x86 chips.
>
>
> There's also SMT metrics.
>
>
> We don't want to include CoreIPC etc. by default because it would cause
> multiplexing in common situations.
>
>>
>> The search for those metrics by metricgroup name is case insensitive,
>> so it's picking up all 8 metrics when using the lookup string
>> 'TopDownL1'. So the extra 'SLOTS', 'CoreIPC', 'CoreIPC_SMT',
>> 'Instructions' metrics would be printed as well.
>>
>> Not sure what the significance of the case difference might be.
>>
>> Should we use a different string than 'TopDownL1' as the metric group
>> name to search for?
>
>
> We should probably fix the case (or just make the match case insensitive)
>
> Can we just keep x86 at using the kernel metrics? On Skylake and earlier
> it needs different formulas and other options depending whether SMT is
> on or off, so it's not straight forward to express it as json directly.
>
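
The effect of the case-insensitive lookup discussed above can be reproduced with a simplified model. This is not perf's actual matching code: the function name and the sample list are hypothetical, and "Summary" is just a placeholder for an unrelated group.

```c
#include <stddef.h>
#include <strings.h>

/*
 * Count how many group names in a NULL-terminated list match the
 * wanted name when case is ignored.
 */
static size_t count_group_matches(const char *const *groups, const char *wanted)
{
	size_t n = 0;

	for (size_t i = 0; groups[i] != NULL; i++) {
		if (strcasecmp(groups[i], wanted) == 0)
			n++;
	}
	return n;
}

/* Two groups differing only in case, plus an unrelated placeholder. */
static const char *const sample_groups[] = {
	"TopdownL1", "TopDownL1", "Summary", NULL,
};
```

Looking up 'TopDownL1' in the sample list matches both of the first two entries, which is why the metrics from both groups end up being selected.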
I posted a v2 of these patches which keeps x86 only using the kernel
metrics.
https://lore.kernel.org/linux-perf-users/[email protected]/
Would be good to get your feedback,
Thanks
Andrew
On 20/01/2022 09:26, John Garry wrote:
> On 11/01/2022 15:07, Andrew Kilroy wrote:
>> This patch series adds the ability for the --topdown option to use
>> metrics (defined in json files in the pmu-events directory) to describe
>> how to calculate and determine the output columns for topdown level 1.
>>
>> For this to work, a number of metrics have to be defined for the
>> relevant processor with the MetricGroup name "TopDownL1". perf will
>> arrange for the events defined in each metric to be collected, and each
>> metric will be displayed in the output, as if
>>
>> perf stat -M 'TopDownL1' --metric-only -- exampleapp
>>
>> had been used.
>>
>> Topdown was already implemented where certain kernel events are defined.
>> If these kernel events are defined, the new json metrics behaviour is
>> not used. The json metrics approach is only used if the kernel events
>> are absent.
>>
>> The last patch in the series disables the json metrics behaviour on x86.
>> This is because of concerns that due to SMT it's not straightforward to
>> express the various formulas as json for certain x86 cpus. See
>
> I suppose this solution is ok.
>
Thanks, would you mind giving it a Reviewed-By?
> A concern is that today we only have 1x arm64 platform which actually
> supports this in mainline.
>
> Do you have any more which you plan to support?
>
The Neoverse cores, mainly.
> I think that it's the frontend bound and fetch_bubble event which
> doesn't have a standard arm solution.
>
> Note that I do have a series for perf tool which can read arm cpu pmu
> sysfs events folder to find events which are implemented (I don't think
> all required events are mandated) and match that against the common arch
> events JSON, so that we don't need a JSON definition file for each core
> implementation from all implementers - this would improve scalability.
> However a concern is that some events - like inst_spec - have imp def
> meaning, so may not be good to always use by default for all cores metrics.
>
> Thanks,
> John
Thanks,
Andrew
On 11/01/2022 15:07, Andrew Kilroy wrote:
> }
> +
> +bool topdown_can_use_json_metrics(void)
nit: maybe topdown_can_use_metricgroups() could be better
> +{
> +#if defined(__aarch64__)
> + return true;
> +#else
> + return false;
> +#endif
It might be worth just having !x86, though that is probably too
forward-looking.
> +}
> +
On 05/01/2022 16:58, Andrew Kilroy wrote:
>>
>
Sorry for the very slow response.
> The --topdown kernel event colouring is dictated by a large if-else
> statement in stat-shadow.c:perf_stat__print_shadow_stats.
>
> There are branches depending on what is returned by
> perf_stat_evsel__is() for example
>
> } else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_BUBBLES)) {
> double fe_bound = td_fe_bound(cpu, st, &rsd);
>
> if (fe_bound > 0.2)
> color = PERF_COLOR_RED;
> print_metric(config, ctxp, color, "%8.1f%%", "frontend bound",
> fe_bound * 100.);
> } else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_RETIRED)) {
>
>
>
> Because the patches are enabling metrics (equivalent of the -M
> 'somemetricname' option), the perf_stat__print_shadow_stats function
> always makes calls to generic_metric(), where colours are never picked.
>
> Seeing thresholds like:
>
> retiring > 0.7
> fe_bound > 0.2
> be_bound > 0.2
> bad_spec > 0.1
>
>
> I'm not sure about adding the colouring really. Are these thresholds
> x86 specific?
There is info on topdown for vtune here:
https://www.intel.com/content/www/us/en/develop/documentation/vtune-cookbook/top/methodologies/top-down-microarchitecture-analysis-method.html
The threshold info described there seems somewhat consistent with perf
tool topdown thresholds and that is based on general guidelines for
certain compute categories.
Andi did mention "specification" here:
https://lore.kernel.org/lkml/CABPqkBRftsHEAEwgCn3i3=mfk9fjh5r4MycdjHKRka5voTj9JA@mail.gmail.com/
But I don't know it, apart from a paper, A. Yasin's "A Top-Down Method
for Performance Analysis and Counter Architectures" (ISPASS 2014).
Thanks,
John
On 27/01/2022 11:42, Andrew Kilroy wrote:
>
>
> On 20/01/2022 09:26, John Garry wrote:
>> On 11/01/2022 15:07, Andrew Kilroy wrote:
>>> This patch series adds the ability for the --topdown option to use
>>> metrics (defined in json files in the pmu-events directory) to describe
>>> how to calculate and determine the output columns for topdown level 1.
>>>
>>> For this to work, a number of metrics have to be defined for the
>>> relevant processor with the MetricGroup name "TopDownL1". perf will
>>> arrange for the events defined in each metric to be collected, and each
>>> metric will be displayed in the output, as if
>>>
>>> perf stat -M 'TopDownL1' --metric-only -- exampleapp
>>>
>>> had been used.
>>>
>>> Topdown was already implemented where certain kernel events are defined.
>>> If these kernel events are defined, the new json metrics behaviour is
>>> not used. The json metrics approach is only used if the kernel events
>>> are absent.
>>>
>>> The last patch in the series disables the json metrics behaviour on x86.
>>> This is because of concerns that due to SMT it's not straightforward to
>>> express the various formulas as json for certain x86 cpus. See
>>
>> I suppose this solution is ok.
>>
>
> Thanks, would you mind giving it a Reviewed-By?
>
>> A concern is that today we only have 1x arm64 platform which actually
>> supports this in mainline.
>>
>> Do you have any more which you plan to support?
>>
>
> The Neoverse cores, mainly.
>
Thanks for the feedback on this RFC, I think we'll resubmit these
patches at a later time, when we've got a json metrics file or two.
Thanks,
Andrew