2023-04-26 07:02:38

by Ian Rogers

Subject: [PATCH v1 00/40] Fix perf on Intel hybrid CPUs

TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
or individually, event parsing doesn't always scan all PMUs, more and
new tests that also run without hybrid, less code.

The first patches were previously posted to improve metrics here:
"perf stat: Introduce skippable evsels"
https://lore.kernel.org/all/[email protected]/
"perf vendor events intel: Add xxx metric constraints"
https://lore.kernel.org/all/[email protected]/

Next are some general test improvements.

Next, event parsing is rewritten so that it no longer scans all PMUs
for the benefit of raw and legacy cache parsing; instead these are
handled by the lexer and a new term type. This ultimately removes the
need for the hybrid event parser to be recursive, as legacy cache can
be just a term. Tests are re-enabled for events with hyphens, so AMD's
branch-brs event is now parsable.

The cputype option is made a generic pmu filter flag and is tested
even on non-hybrid systems.

The final patches address specific json metric issues on hybrid, in
both the json metrics and the metric code. They also bring in a new
json option to not group events when matching a metricgroup, which
helps reduce counter pressure for the TopdownL1 and TopdownL2 metric
groups. The corresponding changes to the script that updates the json
are posted in:
https://github.com/intel/perfmon/pull/73

The patches add slightly more code than they remove, in areas like
better json metric constraints and tests, but in the core util code,
the removal of hybrid is a net reduction:
20 files changed, 631 insertions(+), 951 deletions(-)

There's specific detail with each patch, but for now here is the 6.3
output followed by that from perf-tools-next with the patch series
applied. The tool is running on an Alderlake CPU on an elderly 5.15
kernel:

Events on hybrid that parse and pass tests:
'''
$ perf-6.3 version
perf version 6.3.rc7.gb7bc77e2f2c7
$ perf-6.3 test
...
6.1: Test event parsing : FAILED!
...
$ perf test
...
6: Parse event definition strings :
6.1: Test event parsing : Ok
6.2: Parsing of all PMU events from sysfs : Ok
6.3: Parsing of given PMU events from sysfs : Ok
6.4: Parsing of aliased events from sysfs : Skip (no aliases in sysfs)
6.5: Parsing of aliased events : Ok
6.6: Parsing of terms (event modifiers) : Ok
...
'''

Running with no event/metric now uses json metrics and TopdownL1 on both PMUs:
'''
$ perf-6.3 stat -a sleep 1

Performance counter stats for 'system wide':

24,073.58 msec cpu-clock # 23.975 CPUs utilized
350 context-switches # 14.539 /sec
25 cpu-migrations # 1.038 /sec
66 page-faults # 2.742 /sec
21,257,199 cpu_core/cycles/ # 883.009 K/sec
2,162,192 cpu_atom/cycles/ # 89.816 K/sec
6,679,379 cpu_core/instructions/ # 277.457 K/sec
753,197 cpu_atom/instructions/ # 31.287 K/sec
1,300,647 cpu_core/branches/ # 54.028 K/sec
148,652 cpu_atom/branches/ # 6.175 K/sec
117,429 cpu_core/branch-misses/ # 4.878 K/sec
14,396 cpu_atom/branch-misses/ # 598.000 /sec
123,097,644 cpu_core/slots/ # 5.113 M/sec
9,241,207 cpu_core/topdown-retiring/ # 7.5% Retiring
8,903,288 cpu_core/topdown-bad-spec/ # 7.2% Bad Speculation
66,590,029 cpu_core/topdown-fe-bound/ # 54.1% Frontend Bound
38,397,500 cpu_core/topdown-be-bound/ # 31.2% Backend Bound
3,294,283 cpu_core/topdown-heavy-ops/ # 2.7% Heavy Operations # 4.8% Light Operations
8,855,769 cpu_core/topdown-br-mispredict/ # 7.2% Branch Mispredict # 0.0% Machine Clears
57,695,714 cpu_core/topdown-fetch-lat/ # 46.9% Fetch Latency # 7.2% Fetch Bandwidth
12,823,926 cpu_core/topdown-mem-bound/ # 10.4% Memory Bound # 20.8% Core Bound

1.004093622 seconds time elapsed

$ perf stat -a sleep 1

Performance counter stats for 'system wide':

24,064.65 msec cpu-clock # 23.973 CPUs utilized
384 context-switches # 15.957 /sec
24 cpu-migrations # 0.997 /sec
71 page-faults # 2.950 /sec
19,737,646 cpu_core/cycles/ # 820.192 K/sec
122,018,505 cpu_atom/cycles/ # 5.070 M/sec (63.32%)
7,636,653 cpu_core/instructions/ # 317.339 K/sec
16,266,629 cpu_atom/instructions/ # 675.955 K/sec (72.50%)
1,552,995 cpu_core/branches/ # 64.534 K/sec
3,208,143 cpu_atom/branches/ # 133.314 K/sec (72.50%)
132,151 cpu_core/branch-misses/ # 5.491 K/sec
547,285 cpu_atom/branch-misses/ # 22.742 K/sec (72.49%)
32,110,597 cpu_atom/TOPDOWN_RETIRING.ALL/ # 1.334 M/sec
# 18.4 % tma_bad_speculation (72.48%)
228,006,765 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 9.475 M/sec
# 38.1 % tma_frontend_bound (72.47%)
225,866,251 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 9.386 M/sec
# 37.7 % tma_backend_bound
# 37.7 % tma_backend_bound_aux (72.73%)
119,748,254 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 4.976 M/sec
# 5.2 % tma_retiring (73.14%)
31,363,579 cpu_atom/TOPDOWN_RETIRING.ALL/ # 1.303 M/sec (73.37%)
227,907,321 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 9.471 M/sec (63.95%)
228,803,268 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 9.508 M/sec (63.55%)
113,357,334 cpu_core/TOPDOWN.SLOTS/ # 30.5 % tma_backend_bound
# 9.2 % tma_retiring
# 8.7 % tma_bad_speculation
# 51.6 % tma_frontend_bound
10,451,044 cpu_core/topdown-retiring/
9,687,449 cpu_core/topdown-bad-spec/
58,703,214 cpu_core/topdown-fe-bound/
34,540,660 cpu_core/topdown-be-bound/
154,902 cpu_core/INT_MISC.UOP_DROPPING/ # 6.437 K/sec

1.003818397 seconds time elapsed
'''
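As a reading aid for the perf-6.3 output above: the cpu_core percentages
are numerically consistent with each topdown count divided by
cpu_core/slots/. The sketch below only reproduces that arithmetic; the
function name is hypothetical and perf's own computation may differ in
detail.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of perf: express a topdown count as a
 * percentage of the slots count reported alongside it.
 */
double sketch_topdown_percent(uint64_t count, uint64_t slots)
{
	assert(slots != 0);
	return 100.0 * (double)count / (double)slots;
}
```

For example, topdown-retiring (9,241,207) over slots (123,097,644) gives
roughly the 7.5% printed as Retiring.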

Json metrics that don't crash:
'''
$ perf-6.3 stat -M TopdownL1 -a sleep 1
WARNING: events in group from different hybrid PMUs!
WARNING: grouped events cpus do not match, disabling group:
anon group { topdown-retiring, topdown-retiring, INT_MISC.UOP_DROPPING, topdown-fe-bound, topdown-fe-bound, CPU_CLK_UNHALTED.CORE, topdown-be-bound, topdown-be-bound, topdown-bad-spec, topdown-bad-spec }
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-retiring).
/bin/dmesg | grep -i perf may provide additional information.

$ perf stat -M TopdownL1 -a sleep 1

Performance counter stats for 'system wide':

811,810 cpu_atom/TOPDOWN_RETIRING.ALL/ # 26.6 % tma_bad_speculation
3,239,281 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 38.8 % tma_frontend_bound
2,037,667 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 24.4 % tma_backend_bound
# 24.4 % tma_backend_bound_aux
1,670,438 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 9.7 % tma_retiring
808,138 cpu_atom/TOPDOWN_RETIRING.ALL/
3,234,707 cpu_atom/TOPDOWN_FE_BOUND.ALL/
2,081,420 cpu_atom/TOPDOWN_BE_BOUND.ALL/
122,795,280 cpu_core/TOPDOWN.SLOTS/ # 31.7 % tma_backend_bound
# 7.0 % tma_bad_speculation
# 54.1 % tma_frontend_bound
# 7.2 % tma_retiring
8,817,636 cpu_core/topdown-retiring/
8,480,817 cpu_core/topdown-bad-spec/
3,108,926 cpu_core/topdown-heavy-ops/
66,566,215 cpu_core/topdown-fe-bound/
38,958,811 cpu_core/topdown-be-bound/
134,194 cpu_core/INT_MISC.UOP_DROPPING/

1.003607796 seconds time elapsed

$ perf stat -M TopdownL2 -a sleep 1

Performance counter stats for 'system wide':

162,334,218 cpu_atom/TOPDOWN_FE_BOUND.FRONTEND_LATENCY/ # 27.7 % tma_fetch_latency (38.99%)
16,191,486 cpu_atom/INST_RETIRED.ANY/ (45.76%)
68,443,205 cpu_atom/TOPDOWN_BE_BOUND.MEM_SCHEDULER/ # 32.2 % tma_memory_bound
# 5.8 % tma_core_bound (45.77%)
14,920,109 cpu_atom/UOPS_RETIRED.MS/ # 2.9 % tma_base (45.92%)
14,829,879 cpu_atom/UOPS_RETIRED.MS/ # 2.5 % tma_ms_uops (46.31%)
31,860,520 cpu_atom/TOPDOWN_RETIRING.ALL/ (46.71%)
117,323,055 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 18.7 % tma_branch_mispredicts
# 11.5 % tma_fetch_bandwidth
# 0.3 % tma_machine_clears
# 37.9 % tma_resource_bound (53.49%)
222,579,768 cpu_atom/TOPDOWN_BE_BOUND.ALL/ (53.90%)
13,672,174 cpu_atom/MEM_SCHEDULER_BLOCK.ST_BUF/ (54.23%)
24,264,262 cpu_atom/LD_HEAD.ANY_AT_RET/ (47.46%)
13,872,813 cpu_atom/MEM_SCHEDULER_BLOCK.ALL/ (47.45%)
223,722,007 cpu_atom/TOPDOWN_BE_BOUND.ALL/ (47.31%)
2,005,972 cpu_atom/TOPDOWN_BAD_SPECULATION.MACHINE_CLEARS/ (46.91%)
109,423,013 cpu_atom/TOPDOWN_BAD_SPECULATION.MISPREDICT/ (39.72%)
67,420,790 cpu_atom/TOPDOWN_FE_BOUND.FRONTEND_BANDWIDTH/ (39.33%)
92,790,312 cpu_core/TOPDOWN.SLOTS/ # 24.3 % tma_core_bound
# 3.0 % tma_heavy_operations
# 5.6 % tma_light_operations
# 10.8 % tma_memory_bound
# 7.8 % tma_branch_mispredicts
# 40.4 % tma_fetch_latency
# 0.2 % tma_machine_clears
# 7.8 % tma_fetch_bandwidth
8,041,595 cpu_core/topdown-retiring/
10,060,500 cpu_core/topdown-mem-bound/
7,314,344 cpu_core/topdown-bad-spec/
2,824,600 cpu_core/topdown-heavy-ops/
37,630,164 cpu_core/topdown-fetch-lat/
7,278,843 cpu_core/topdown-br-mispredict/
44,863,148 cpu_core/topdown-fe-bound/
32,573,458 cpu_core/topdown-be-bound/
5,785,074 cpu_core/INST_RETIRED.ANY/
2,325,424 cpu_core/UOPS_RETIRED.MS/
15,972,774 cpu_core/CPU_CLK_UNHALTED.THREAD/
117,750 cpu_core/INT_MISC.UOP_DROPPING/

1.003519749 seconds time elapsed
'''
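In the `-M TopdownL1` output above, the cpu_atom percentages are
numerically consistent with each TOPDOWN_*.ALL count divided by five
times CPU_CLK_UNHALTED.CORE. The factor of five is inferred from the
printed numbers and is an assumption, as is the pairing of each
percentage with an event count; the authoritative formulas are in the
alderlake json metrics. A minimal sketch of the arithmetic, with
hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed slots per core clock for cpu_atom; inferred from the output above. */
#define SKETCH_ATOM_SLOTS_PER_CLK 5

/*
 * Hypothetical helper, not part of perf: express a cpu_atom topdown
 * count as a percentage of the assumed total slots.
 */
double sketch_atom_tma_percent(uint64_t topdown_count, uint64_t core_clks)
{
	assert(core_clks != 0);
	return 100.0 * (double)topdown_count /
	       (SKETCH_ATOM_SLOTS_PER_CLK * (double)core_clks);
}
```

With TOPDOWN_BE_BOUND.ALL at 2,037,667 and CPU_CLK_UNHALTED.CORE at
1,670,438 this gives roughly the 24.4 % shown for tma_backend_bound.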

Note, flags are added below to reduce the size of the output by
removing event groups and threshold printing support:
'''
$ perf stat --metric-no-threshold --metric-no-group -M TopdownL3 -a sleep 1

Performance counter stats for 'system wide':

3,506,641 cpu_atom/TOPDOWN_BE_BOUND.ALLOC_RESTRICTIONS/ # 0.6 % tma_alloc_restriction (17.14%)
133,962,390 cpu_atom/TOPDOWN_BE_BOUND.SERIALIZATION/ # 22.2 % tma_serialization (17.48%)
11,201,207 cpu_atom/TOPDOWN_FE_BOUND.ITLB/ # 1.9 % tma_itlb_misses (17.88%)
63,876,838 cpu_atom/TOPDOWN_BE_BOUND.MEM_SCHEDULER/ # 10.6 % tma_mem_scheduler
# 10.5 % tma_store_bound
# 2.4 % tma_other_load_store (18.28%)
14,386,940 cpu_atom/UOPS_RETIRED.MS/ (18.68%)
14,432,493 cpu_atom/UOPS_RETIRED.MS/ # 2.7 % tma_other_ret (19.09%)
81,582,687 cpu_atom/TOPDOWN_FE_BOUND.ICACHE/ # 13.5 % tma_icache_misses (19.14%)
30,467,546 cpu_atom/TOPDOWN_RETIRING.ALL/ (19.14%)
16,788,753 cpu_atom/MEM_BOUND_STALLS.LOAD/ # 4.2 % tma_dram_bound
# 3.7 % tma_l2_bound
# 6.7 % tma_l3_bound (19.14%)
14,514,040 cpu_atom/TOPDOWN_FE_BOUND.DECODE/ # 2.4 % tma_decode (19.14%)
688,307 cpu_atom/TOPDOWN_BAD_SPECULATION.NUKE/ # 0.1 % tma_nuke (19.13%)
0 cpu_atom/UOPS_RETIRED.FPDIV/ (19.12%)
4,408,466 cpu_atom/MEM_BOUND_STALLS.LOAD_L2_HIT/ (19.12%)
120,556,998 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 9.3 % tma_branch_detect
# 1.0 % tma_branch_resteer
# 5.8 % tma_cisc
# 0.3 % tma_fast_nuke
# 0.0 % tma_fpdiv_uops
# 4.3 % tma_l1_bound
# 3.2 % tma_non_mem_scheduler
# 1.9 % tma_other_fb
# 1.1 % tma_predecode
# 0.1 % tma_register
# 0.1 % tma_reorder_buffer (22.30%)
34,773,106 cpu_atom/TOPDOWN_FE_BOUND.CISC/ (22.30%)
591,112 cpu_atom/TOPDOWN_BE_BOUND.REGISTER/ (22.30%)
11,286,706 cpu_atom/TOPDOWN_FE_BOUND.OTHER/ (22.30%)
5,082,636 cpu_atom/MEM_BOUND_STALLS.LOAD_DRAM_HIT/ (22.30%)
14,146,185 cpu_atom/MEM_SCHEDULER_BLOCK.ST_BUF/ (22.31%)
55,833,686 cpu_atom/TOPDOWN_FE_BOUND.BRANCH_DETECT/ (22.30%)
25,714,051 cpu_atom/LD_HEAD.ANY_AT_RET/ (19.12%)
456,549 cpu_atom/TOPDOWN_BE_BOUND.REORDER_BUFFER/ (19.12%)
1,616,862 cpu_atom/TOPDOWN_BAD_SPECULATION.FASTNUKE/ (19.12%)
6,680,782 cpu_atom/TOPDOWN_FE_BOUND.PREDECODE/ (19.12%)
14,229,195 cpu_atom/MEM_SCHEDULER_BLOCK.ALL/ (19.12%)
8,128,921 cpu_atom/MEM_BOUND_STALLS.LOAD_LLC_HIT/ (19.12%)
20,941,725 cpu_atom/LD_HEAD.L1_MISS_AT_RET/ (19.11%)
6,177,125 cpu_atom/TOPDOWN_FE_BOUND.BRANCH_RESTEER/ (18.78%)
228,066,346 cpu_atom/TOPDOWN_BE_BOUND.ALL/ (18.38%)
5,204,897 cpu_atom/LD_HEAD.L1_BOUND_AT_RET/ (17.99%)
19,060,104 cpu_atom/TOPDOWN_BE_BOUND.NON_MEM_SCHEDULER/ (17.58%)
0 cpu_atom/UOPS_RETIRED.FPDIV/ (17.19%)
864,565,692 cpu_core/TOPDOWN.SLOTS/ # 4.7 % tma_microcode_sequencer
# 0.4 % tma_few_uops_instructions
# 0.3 % tma_fused_instructions
# 1.8 % tma_memory_operations
# 0.1 % tma_nop_instructions
# 8.9 % tma_ms_switches
# 0.4 % tma_non_fused_branches
# 0.0 % tma_fp_arith
# 0.0 % tma_int_operations
# 35.7 % tma_ports_utilization
# 3.8 % tma_other_light_ops (18.03%)
100,519,954 cpu_core/topdown-retiring/ (18.03%)
68,964,454 cpu_core/topdown-bad-spec/ (18.03%)
44,732,021 cpu_core/topdown-heavy-ops/ (18.03%)
435,618,316 cpu_core/topdown-fe-bound/ (18.03%)
262,842,804 cpu_core/topdown-be-bound/ (18.03%)
10,368,608 cpu_core/BR_INST_RETIRED.ALL_BRANCHES/ (18.43%)
55,947,727 cpu_core/RESOURCE_STALLS.SCOREBOARD/ (18.84%)
125,718,255 cpu_core/UOPS_ISSUED.ANY/ (19.24%)
23,178,652 cpu_core/EXE_ACTIVITY.1_PORTS_UTIL/ (19.65%)
0 cpu_core/INT_VEC_RETIRED.ADD_256/ (20.05%)
1,119,514 cpu_core/DSB2MITE_SWITCHES.PENALTY_CYCLES/ # 0.5 % tma_dsb_switches (20.46%)
27,684,795 cpu_core/MEMORY_ACTIVITY.STALLS_L1D_MISS/ # 10.6 % tma_l1_bound
# 0.7 % tma_l2_bound (20.86%)
108,813,079 cpu_core/UOPS_EXECUTED.THREAD/ (21.27%)
16,563,036 cpu_core/IDQ.MITE_CYCLES_ANY/ # 5.2 % tma_mite (19.14%)
53,037,471 cpu_core/EXE_ACTIVITY.BOUND_ON_LOADS/ (19.14%)
41,005,510 cpu_core/UOPS_RETIRED.MS/ (19.14%)
575,534 cpu_core/ARITH.DIV_ACTIVE/ # 0.2 % tma_divider (19.14%)
0 cpu_core/FP_ARITH_INST_RETIRED.SCALAR_SINGLE,umask=0x03/ (19.14%)
2,207,021 cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/ # 0.9 % tma_store_bound (19.13%)
5,685,032 cpu_core/UOPS_RETIRED.MS,cmask=1,edge/ (19.13%)
25,523 cpu_core/DECODE.LCP/ # 0.0 % tma_lcp (19.12%)
26,095,298 cpu_core/MEMORY_ACTIVITY.STALLS_L2_MISS/ # 10.8 % tma_l3_bound (19.13%)
108,516 cpu_core/MEMORY_ACTIVITY.STALLS_L3_MISS/ # 0.0 % tma_dram_bound (19.13%)
192,239,590 cpu_core/CYCLE_ACTIVITY.STALLS_TOTAL/ (19.12%)
5,978 cpu_core/LSD.CYCLES_ACTIVE/ # -0.0 % tma_lsd (19.12%)
0 cpu_core/INT_VEC_RETIRED.VNNI_128/ (19.13%)
137,530,949 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/ # 0.1 % tma_dsb (19.12%)
240,070,549 cpu_core/CPU_CLK_UNHALTED.THREAD/ # 17.5 % tma_icache_misses
# 6.1 % tma_itlb_misses
# 40.3 % tma_branch_resteers (21.52%)
0 cpu_core/FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE,umask=0x3c/ (21.51%)
595,051 cpu_core/ARITH.DIV_ACTIVE/ (21.52%)
461,041 cpu_core/IDQ.DSB_CYCLES_ANY/ (21.51%)
0 cpu_core/INT_VEC_RETIRED.MUL_256/ (21.52%)
0 cpu_core/UOPS_EXECUTED.X87/ (21.52%)
237,196 cpu_core/IDQ.DSB_CYCLES_OK/ (21.52%)
125,009 cpu_core/LSD.CYCLES_OK/ (21.52%)
0 cpu_core/INT_VEC_RETIRED.ADD_128/ (21.40%)
28,388,778 cpu_core/MEM_UOP_RETIRED.ANY/ (18.61%)
1,806,629 cpu_core/INST_RETIRED.NOP/ (18.21%)
41,928,018 cpu_core/ICACHE_DATA.STALLS/ (17.81%)
0 cpu_core/INT_VEC_RETIRED.VNNI_256/ (17.41%)
18,230,137 cpu_core/EXE_ACTIVITY.2_PORTS_UTIL,umask=0xc/ (17.02%)
28,052,001 cpu_core/EXE_ACTIVITY.3_PORTS_UTIL,umask=0x80/ (16.61%)
4,073,568 cpu_core/INST_RETIRED.MACRO_FUSED/ (16.20%)
66,509,871 cpu_core/INT_MISC.UNKNOWN_BRANCH_CYCLES/ (15.92%)
2,307,447 cpu_core/IDQ.MITE_CYCLES_OK/ (15.91%)
30,345,769 cpu_core/INT_MISC.CLEAR_RESTEER_CYCLES/ (15.91%)
0 cpu_core/INT_VEC_RETIRED.SHUFFLES/ (15.91%)
14,722,079 cpu_core/ICACHE_TAG.STALLS/ (15.90%)

1.004474469 seconds time elapsed

$ perf stat --metric-no-threshold --metric-no-group -M TopdownL4 -a sleep 1

Performance counter stats for 'system wide':

1,004,834,399 ns duration_time # 0.3 % tma_false_sharing
# 40.2 % tma_l3_hit_latency
# 4.4 % tma_contested_accesses
# 1.6 % tma_data_sharing
3,762,410 cpu_atom/LD_HEAD.PGWALK_AT_RET/ # 3.1 % tma_stlb_miss (33.58%)
10 cpu_atom/MACHINE_CLEARS.SMC/ # 0.0 % tma_smc (33.98%)
66,500,689 cpu_atom/TOPDOWN_BE_BOUND.MEM_SCHEDULER/ # 0.0 % tma_ld_buffer
# 0.0 % tma_rsv
# 11.0 % tma_st_buffer (29.60%)
1,051,312 cpu_atom/LD_HEAD.OTHER_AT_RET/ # 0.9 % tma_other_l1 (30.00%)
14,740,093 cpu_atom/UOPS_RETIRED.MS/ (30.39%)
117,899 cpu_atom/LD_HEAD.DTLB_MISS_AT_RET/ # 0.1 % tma_stlb_hit (30.79%)
701,548 cpu_atom/TOPDOWN_BAD_SPECULATION.NUKE/ # 0.0 % tma_disambiguation
# 0.0 % tma_fp_assist
# 0.1 % tma_memory_ordering
# 0.0 % tma_page_fault (31.08%)
12,873 cpu_atom/MACHINE_CLEARS.MEMORY_ORDERING/ (31.07%)
58,321 cpu_atom/MEM_SCHEDULER_BLOCK.LD_BUF/ (31.07%)
43,458 cpu_atom/MEM_SCHEDULER_BLOCK.RSV/ (31.07%)
14,256,005 cpu_atom/MEM_SCHEDULER_BLOCK.ALL/ (31.06%)
122,156,534 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 0.0 % tma_store_fwd_blk (36.16%)
0 cpu_atom/MACHINE_CLEARS.FP_ASSIST/ (35.76%)
13,804 cpu_atom/MACHINE_CLEARS.SLOW/ (35.35%)
14,388,300 cpu_atom/MEM_SCHEDULER_BLOCK.ST_BUF/ (34.95%)
493,070,443 cpu_atom/CPU_CLK_UNHALTED.REF_TSC/ (39.73%)
2 cpu_atom/MACHINE_CLEARS.PAGE_FAULT/ (39.33%)
1,101 cpu_atom/LD_HEAD.ST_ADDR_AT_RET/ (38.93%)
929 cpu_atom/MACHINE_CLEARS.DISAMBIGUATION/ (38.55%)
14,241,213 cpu_atom/MEM_SCHEDULER_BLOCK.ALL/ (33.45%)
1,010,981,054 cpu_core/TOPDOWN.SLOTS/ # 0.0 % tma_assists
# 4.3 % tma_cisc
# 0.0 % tma_fp_scalar
# 0.0 % tma_fp_vector
# 0.0 % tma_shuffles
# 0.0 % tma_int_vector_128b
# 0.0 % tma_x87_use
# 0.0 % tma_int_vector_256b
# 0.7 % tma_clears_resteers
# 12.4 % tma_mispredicts_resteers (8.14%)
132,375,316 cpu_core/topdown-retiring/ (8.14%)
88,303,327 cpu_core/topdown-bad-spec/ (8.14%)
85,519,216 cpu_core/topdown-br-mispredict/ (8.14%)
495,722,455 cpu_core/topdown-fe-bound/ (8.14%)
298,147,134 cpu_core/topdown-be-bound/ (8.14%)
21,418,803 cpu_core/UOPS_EXECUTED.CYCLES_GE_3/ # 8.8 % tma_ports_utilized_3m (10.12%)
35,208,716 cpu_core/OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD,cmask=4/ # 14.5 % tma_mem_bandwidth
# 33.3 % tma_mem_latency (10.52%)
17,358 cpu_core/OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM/ (10.91%)
55,883,811 cpu_core/RESOURCE_STALLS.SCOREBOARD/ # 24.1 % tma_ports_utilized_0 (12.91%)
0 cpu_core/INT_VEC_RETIRED.ADD_256/ (14.89%)
139,890 cpu_core/DTLB_STORE_MISSES.STLB_HIT,cmask=1/ # 2.8 % tma_dtlb_store (15.30%)
216,886 cpu_core/MEM_INST_RETIRED.LOCK_LOADS/ # 3.8 % tma_store_latency
# 0.1 % tma_lock_latency (15.71%)
115,948,790 cpu_core/UOPS_EXECUTED.THREAD/ (17.69%)
52,155,508 cpu_core/EXE_ACTIVITY.BOUND_ON_LOADS/ (15.93%)
6 cpu_core/ASSISTS.ANY,umask=0x1B/ (15.93%)
87,422,517 cpu_core/CYCLE_ACTIVITY.CYCLES_MEM_ANY/ # 5.2 % tma_dtlb_load (15.81%)
37,420,652 cpu_core/MEMORY_ACTIVITY.CYCLES_L1D_MISS/ (15.44%)
43,527,357 cpu_core/UOPS_RETIRED.MS/ (15.04%)
31,787,227 cpu_core/INT_MISC.CLEAR_RESTEER_CYCLES/ (14.64%)
0 cpu_core/FP_ARITH_INST_RETIRED.SCALAR_SINGLE,umask=0x03/ (14.24%)
4,899,130 cpu_core/XQ.FULL_CYCLES/ # 2.0 % tma_sq_full (13.84%)
1,365 cpu_core/OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM/ (13.44%)
23,904,338 cpu_core/EXE_ACTIVITY.1_PORTS_UTIL/ # 9.9 % tma_ports_utilized_1 (13.05%)
251,479 cpu_core/L2_RQSTS.ALL_RFO/ (12.76%)
188,701,010 cpu_core/CYCLE_ACTIVITY.STALLS_TOTAL/ (12.74%)
6,909 cpu_core/MEM_INST_RETIRED.SPLIT_STORES/ # 0.0 % tma_split_stores (12.74%)
619,775 cpu_core/MEM_LOAD_RETIRED.L1_MISS/ (9.56%)
136,716,345 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/ # 0.9 % tma_decoder0_alone (11.15%)
0 cpu_core/INT_VEC_RETIRED.VNNI_128/ (12.74%)
605,850 cpu_core/L1D_PEND_MISS.FB_FULL/ # 0.2 % tma_fb_full (12.73%)
60,079 cpu_core/MEM_STORE_RETIRED.L2_HIT/ (11.14%)
242,508,080 cpu_core/CPU_CLK_UNHALTED.THREAD/ # 4.2 % tma_ports_utilized_2
# 0.2 % tma_store_fwd_blk
# 0.0 % tma_streaming_stores
# 27.5 % tma_unknown_branches
# 0.0 % tma_split_loads (12.74%)
0 cpu_core/FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE,umask=0x3c/ (14.33%)
32,573 cpu_core/LD_BLOCKS.STORE_FORWARD/ (12.74%)
1,130 cpu_core/OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD/ (12.74%)
4,029 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS/ (9.56%)
4,844,548 cpu_core/INST_DECODED.DECODERS,cmask=1/ (9.56%)
5,266 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD/ (6.37%)
0 cpu_core/UOPS_EXECUTED.X87/ (7.96%)
0 cpu_core/INT_VEC_RETIRED.MUL_256/ (9.56%)
2,786,473 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/ (9.56%)
961,614,001 cpu_core/CPU_CLK_UNHALTED.REF_TSC/ (11.15%)
2,433,107 cpu_core/INST_DECODED.DECODERS,cmask=2/ (11.15%)
0 cpu_core/INT_VEC_RETIRED.ADD_128/ (12.74%)
9,058,046 cpu_core/OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO/ (12.74%)
6,399,992 cpu_core/MEM_INST_RETIRED.ALL_STORES/ (12.74%)
45,519,749 cpu_core/L1D_PEND_MISS.PENDING/ (9.56%)
12,200,559 cpu_core/DTLB_LOAD_MISSES.WALK_ACTIVE/ (7.97%)
115,944,190 cpu_core/OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD/ (6.37%)
0 cpu_core/INT_VEC_RETIRED.VNNI_256/ (7.96%)
1,885,278 cpu_core/INT_MISC.UOP_DROPPING/ (9.56%)
524,819 cpu_core/MEM_LOAD_RETIRED.FB_HIT/ (9.56%)
26,866,872 cpu_core/EXE_ACTIVITY.3_PORTS_UTIL,umask=0x80/ (11.15%)
10,265,977 cpu_core/EXE_ACTIVITY.2_PORTS_UTIL/ (12.74%)
66,662,934 cpu_core/INT_MISC.UNKNOWN_BRANCH_CYCLES/ (12.74%)
0 cpu_core/OCR.STREAMING_WR.ANY_RESPONSE/ (12.74%)
12,499 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD/ (12.74%)
0 cpu_core/INT_VEC_RETIRED.SHUFFLES/ (12.74%)
47,649 cpu_core/DTLB_LOAD_MISSES.STLB_HIT,cmask=1/ (12.74%)
106,424 cpu_core/L2_RQSTS.RFO_HIT/ (12.74%)
0 cpu_core/LD_BLOCKS.NO_SR/ (7.97%)
1,343,692 cpu_core/MEM_LOAD_COMPLETED.L1_MISS_ANY/ (7.96%)
28,517 cpu_core/L1D_PEND_MISS.L2_STALLS/ (6.37%)
394,101 cpu_core/MEM_LOAD_RETIRED.L3_HIT/ (6.36%)
76,860,165,929 TSC

1.004834399 seconds time elapsed

$ perf stat --metric-no-threshold --metric-no-group -M TopdownL5 -a sleep 1

Performance counter stats for 'system wide':

839,538,302 cpu_core/TOPDOWN.SLOTS/ # 0.0 % tma_avx_assists
# 0.0 % tma_fp_assists
# 0.0 % tma_page_faults
# 0.0 % tma_fp_vector_128b
# 0.0 % tma_fp_vector_256b (32.40%)
100,274,045 cpu_core/topdown-retiring/ (32.40%)
77,425,642 cpu_core/topdown-bad-spec/ (32.40%)
424,563,652 cpu_core/topdown-fe-bound/ (32.40%)
245,420,564 cpu_core/topdown-be-bound/ (32.40%)
0 cpu_core/FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE/ (32.79%)
54,372,921 cpu_core/RESOURCE_STALLS.SCOREBOARD/ # 22.2 % tma_serializing_operation (33.20%)
23,018,585 cpu_core/UOPS_DISPATCHED.PORT_6/ # 8.0 % tma_alu_op_utilization (33.61%)
17,748,101 cpu_core/UOPS_DISPATCHED.PORT_2_3_10/ # 4.2 % tma_load_op_utilization (34.02%)
0 cpu_core/FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE/ (34.43%)
7,616,700 cpu_core/UOPS_DISPATCHED.PORT_0/ (34.83%)
96,571 cpu_core/DTLB_STORE_MISSES.STLB_HIT,cmask=1/ # 0.6 % tma_store_stlb_hit (35.25%)
84,909,672 cpu_core/CYCLE_ACTIVITY.CYCLES_MEM_ANY/ # 0.2 % tma_load_stlb_hit (35.66%)
32,935,744 cpu_core/MEMORY_ACTIVITY.CYCLES_L1D_MISS/ (31.95%)
16,597,385 cpu_core/UOPS_DISPATCHED.PORT_5_11/ (31.95%)
9,452,844 cpu_core/UOPS_DISPATCHED.PORT_1/ (31.94%)
2,620,695 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/ # 1.8 % tma_store_stlb_miss (31.95%)
15,699,364 cpu_core/UOPS_DISPATCHED.PORT_7_8/ # 5.7 % tma_store_op_utilization (31.95%)
0 cpu_core/FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE/ (31.94%)
142,096,670 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/ (31.95%)
244,591,239 cpu_core/CPU_CLK_UNHALTED.THREAD/ # 5.2 % tma_load_stlb_miss
# 0.0 % tma_mixing_vectors (35.92%)
2,728,385 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/ (35.66%)
0 cpu_core/ASSISTS.SSE_AVX_MIX/ (35.27%)
0 cpu_core/FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE/ (34.86%)
12,664,768 cpu_core/DTLB_LOAD_MISSES.WALK_ACTIVE/ (34.46%)
12,629,733 cpu_core/DTLB_LOAD_MISSES.WALK_ACTIVE/ (34.04%)
0 cpu_core/ASSISTS.FP/ (33.63%)
12 cpu_core/ASSISTS.PAGE_FAULT/ (33.23%)
16,704,699 cpu_core/UOPS_DISPATCHED.PORT_4_9/ (32.81%)
48,386 cpu_core/DTLB_LOAD_MISSES.STLB_HIT,cmask=1/ (28.68%)

1.002806967 seconds time elapsed

$ perf stat --metric-no-threshold --metric-no-group -M TopdownL6 -a sleep 1

Performance counter stats for 'system wide':

743,684 cpu_core/UOPS_DISPATCHED.PORT_0/ # 4.6 % tma_port_0
1,514 cpu_core/MISC2_RETIRED.LFENCE/ # 0.1 % tma_memory_fence
22,120 cpu_core/CPU_CLK_UNHALTED.PAUSE/ # 0.1 % tma_slow_pause
16,187,637 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/ # 4.5 % tma_port_1
# 12.6 % tma_port_6
16,754,672 cpu_core/CPU_CLK_UNHALTED.THREAD/
728,805 cpu_core/UOPS_DISPATCHED.PORT_1/
2,040,181 cpu_core/UOPS_DISPATCHED.PORT_6/

1.002727371 seconds time elapsed
'''

Using --cputype:
'''
$ perf stat --cputype=core -M TopdownL1 -a sleep 1

Performance counter stats for 'system wide':

90,542,172 cpu_core/TOPDOWN.SLOTS/ # 31.3 % tma_backend_bound
# 7.0 % tma_bad_speculation
# 54.0 % tma_frontend_bound
# 7.6 % tma_retiring
6,917,885 cpu_core/topdown-retiring/
6,242,227 cpu_core/topdown-bad-spec/
2,353,956 cpu_core/topdown-heavy-ops/
49,034,945 cpu_core/topdown-fe-bound/
28,390,484 cpu_core/topdown-be-bound/
98,299 cpu_core/INT_MISC.UOP_DROPPING/

1.002395582 seconds time elapsed

$ perf stat --cputype=atom -M TopdownL1 -a sleep 1

Performance counter stats for 'system wide':

645,836 cpu_atom/TOPDOWN_RETIRING.ALL/ # 26.4 % tma_bad_speculation
2,404,468 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 38.9 % tma_frontend_bound
1,455,604 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 23.6 % tma_backend_bound
# 23.6 % tma_backend_bound_aux
1,235,109 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 10.4 % tma_retiring
642,124 cpu_atom/TOPDOWN_RETIRING.ALL/
2,398,892 cpu_atom/TOPDOWN_FE_BOUND.ALL/
1,503,157 cpu_atom/TOPDOWN_BE_BOUND.ALL/

1.002061651 seconds time elapsed
'''
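The two --cputype runs above keep only cpu_core or only cpu_atom events.
A minimal sketch of such a PMU name filter follows. The function name is
hypothetical, and the rule that a filter matches either the full PMU
name or the name with its "cpu_" prefix dropped is an assumption based
on the output above, not a copy of the code in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of perf: decide whether an event on
 * pmu_name survives a filter such as "core" or "cpu_atom".
 * A NULL filter keeps every PMU.
 */
bool sketch_pmu_filter_match(const char *filter, const char *pmu_name)
{
	assert(pmu_name != NULL);

	if (filter == NULL)
		return true;
	if (strcmp(filter, pmu_name) == 0)
		return true;
	/* Assumed convenience: "core" also selects "cpu_core". */
	if (strncmp(pmu_name, "cpu_", 4) == 0 &&
	    strcmp(pmu_name + 4, filter) == 0)
		return true;
	return false;
}
```

Because the check is just a string comparison on the PMU name, nothing
in it is hybrid specific, which fits with the filter being testable on
non-hybrid systems.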

Ian Rogers (40):
perf stat: Introduce skippable evsels
perf vendor events intel: Add alderlake metric constraints
perf vendor events intel: Add icelake metric constraints
perf vendor events intel: Add icelakex metric constraints
perf vendor events intel: Add sapphirerapids metric constraints
perf vendor events intel: Add tigerlake metric constraints
perf stat: Avoid segv on counter->name
perf test: Test more sysfs events
perf test: Use valid for PMU tests
perf test: Mask config then test
perf test: Test more with config_cache
perf test: Roundtrip name, don't assume 1 event per name
perf parse-events: Set attr.type to PMU type early
perf print-events: Avoid unnecessary strlist
perf parse-events: Avoid scanning PMUs before parsing
perf test: Validate events with hyphens in
perf evsel: Modify group pmu name for software events
perf test: Move x86 hybrid tests to arch/x86
perf test x86 hybrid: Don't assume evlist order
perf parse-events: Support PMUs for legacy cache events
perf parse-events: Wildcard legacy cache events
perf print-events: Print legacy cache events for each PMU
perf parse-events: Support wildcards on raw events
perf parse-events: Remove now unused hybrid logic
perf parse-events: Minor type safety cleanup
perf parse-events: Add pmu filter
perf stat: Make cputype filter generic
perf test: Add cputype testing to perf stat
perf test: Fix parse-events tests for >1 core PMU
perf parse-events: Support hardware events as terms
perf parse-events: Avoid error when assigning a term
perf parse-events: Avoid error when assigning a legacy cache term
perf parse-events: Don't auto merge hybrid wildcard events
perf parse-events: Don't reorder atom cpu events
perf metrics: Be PMU specific for referenced metrics.
perf metric: Json flag to not group events if gathering a metric group
perf stat: Command line PMU metric filtering
perf vendor events intel: Correct alderlake metrics
perf jevents: Don't rewrite metrics across PMUs
perf metrics: Be PMU specific in event match

tools/perf/arch/x86/include/arch-tests.h | 1 +
tools/perf/arch/x86/tests/Build | 1 +
tools/perf/arch/x86/tests/arch-tests.c | 10 +
tools/perf/arch/x86/tests/hybrid.c | 275 ++++++
tools/perf/arch/x86/util/evlist.c | 4 +-
tools/perf/builtin-list.c | 19 +-
tools/perf/builtin-record.c | 13 +-
tools/perf/builtin-stat.c | 73 +-
tools/perf/builtin-top.c | 5 +-
tools/perf/builtin-trace.c | 5 +-
.../arch/x86/alderlake/adl-metrics.json | 275 +++---
.../arch/x86/alderlaken/adln-metrics.json | 20 +-
.../arch/x86/broadwell/bdw-metrics.json | 12 +
.../arch/x86/broadwellde/bdwde-metrics.json | 12 +
.../arch/x86/broadwellx/bdx-metrics.json | 12 +
.../arch/x86/cascadelakex/clx-metrics.json | 12 +
.../arch/x86/haswell/hsw-metrics.json | 12 +
.../arch/x86/haswellx/hsx-metrics.json | 12 +
.../arch/x86/icelake/icl-metrics.json | 23 +
.../arch/x86/icelakex/icx-metrics.json | 23 +
.../arch/x86/ivybridge/ivb-metrics.json | 12 +
.../arch/x86/ivytown/ivt-metrics.json | 12 +
.../arch/x86/jaketown/jkt-metrics.json | 12 +
.../arch/x86/sandybridge/snb-metrics.json | 12 +
.../arch/x86/sapphirerapids/spr-metrics.json | 23 +
.../arch/x86/skylake/skl-metrics.json | 12 +
.../arch/x86/skylakex/skx-metrics.json | 12 +
.../arch/x86/tigerlake/tgl-metrics.json | 23 +
tools/perf/pmu-events/jevents.py | 10 +-
tools/perf/pmu-events/metric.py | 28 +-
tools/perf/pmu-events/metric_test.py | 6 +-
tools/perf/pmu-events/pmu-events.h | 2 +
tools/perf/tests/evsel-roundtrip-name.c | 119 ++-
tools/perf/tests/parse-events.c | 826 +++++++++---------
tools/perf/tests/pmu-events.c | 12 +-
tools/perf/tests/shell/stat.sh | 44 +
tools/perf/util/Build | 1 -
tools/perf/util/evlist.h | 1 -
tools/perf/util/evsel.c | 30 +-
tools/perf/util/evsel.h | 1 +
tools/perf/util/metricgroup.c | 111 ++-
tools/perf/util/metricgroup.h | 3 +-
tools/perf/util/parse-events-hybrid.c | 214 -----
tools/perf/util/parse-events-hybrid.h | 25 -
tools/perf/util/parse-events.c | 646 ++++++--------
tools/perf/util/parse-events.h | 61 +-
tools/perf/util/parse-events.l | 108 +--
tools/perf/util/parse-events.y | 222 ++---
tools/perf/util/pmu-hybrid.c | 20 -
tools/perf/util/pmu-hybrid.h | 1 -
tools/perf/util/pmu.c | 16 +-
tools/perf/util/pmu.h | 3 +
tools/perf/util/pmus.c | 25 +-
tools/perf/util/pmus.h | 3 +
tools/perf/util/print-events.c | 85 +-
tools/perf/util/stat-display.c | 6 +-
56 files changed, 1939 insertions(+), 1627 deletions(-)
create mode 100644 tools/perf/arch/x86/tests/hybrid.c
delete mode 100644 tools/perf/util/parse-events-hybrid.c
delete mode 100644 tools/perf/util/parse-events-hybrid.h

--
2.40.1.495.gc816e09b53d-goog


2023-04-26 07:02:45

by Ian Rogers

Subject: [PATCH v1 01/40] perf stat: Introduce skippable evsels

Perf stat with no arguments will use default events and metrics. These
events may fail to open even with kernel and hypervisor disabled. When
they fail, the permissions error appears even though the events were
only implicitly selected. This is particularly a problem with the
automatic selection of the TopdownL1 metric group on certain
architectures like Skylake:

```
$ perf stat true
Error:
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 2:
-1: Allow use of (almost) all events by all users
Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
```

This patch adds skippable evsels: when one fails to open, perf stat
doesn't fail and the evsel doesn't appear in the output. The TopdownL1
events, from the metric group, are marked as skippable. This turns the
failure above into:

```
$ perf stat true

Performance counter stats for 'true':

1.26 msec task-clock:u # 0.328 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
49 page-faults:u # 38.930 K/sec
176,449 cycles:u # 0.140 GHz (48.99%)
122,905 instructions:u # 0.70 insn per cycle
28,264 branches:u # 22.456 M/sec
2,405 branch-misses:u # 8.51% of all branches

0.003834565 seconds time elapsed

0.000000000 seconds user
0.004130000 seconds sys
```
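The skippable behaviour described above can be sketched in isolation as
follows. Everything here is a simplified stand-in: struct sketch_event,
the simulated open_errno field and both functions are hypothetical, and
the real stat_handle_error() has further recovery paths that are not
modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct evsel with only the fields needed here. */
struct sketch_event {
	const char *name;
	bool skippable;	/* implicitly added, e.g. default TopdownL1 events */
	bool supported;	/* cleared when the event is skipped */
	int open_errno;	/* simulated open result: 0 on success, else an errno */
};

enum sketch_open_result { SKETCH_OPEN_OK, SKETCH_OPEN_SKIP, SKETCH_OPEN_FATAL };

/* A failed open is fatal only when the event is not skippable. */
enum sketch_open_result sketch_open(struct sketch_event *ev)
{
	assert(ev != NULL);

	if (ev->open_errno == 0) {
		ev->supported = true;
		return SKETCH_OPEN_OK;
	}
	if (ev->skippable) {
		ev->supported = false;
		return SKETCH_OPEN_SKIP;
	}
	return SKETCH_OPEN_FATAL;
}

/* Returns the number of events opened, or -1 on the first fatal failure. */
int sketch_open_all(struct sketch_event *evs, size_t n)
{
	int opened = 0;

	for (size_t i = 0; i < n; i++) {
		enum sketch_open_result res = sketch_open(&evs[i]);

		if (res == SKETCH_OPEN_FATAL)
			return -1;
		if (res == SKETCH_OPEN_OK)
			opened++;
	}
	return opened;
}
```

In this sketch a skipped event stays in the array with supported
cleared, so a display step can leave it out of the output.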

When the events can be opened with kernel/hypervisor disabled, as on
Tigerlake, perf stat continues to succeed:

```
$ perf stat true

Performance counter stats for 'true':

0.57 msec task-clock:u # 0.385 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
47 page-faults:u # 82.329 K/sec
287,017 cycles:u # 0.503 GHz
133,318 instructions:u # 0.46 insn per cycle
31,396 branches:u # 54.996 M/sec
2,442 branch-misses:u # 7.78% of all branches
998,790 TOPDOWN.SLOTS:u # 14.5 % tma_retiring
# 27.6 % tma_backend_bound
# 40.9 % tma_frontend_bound
# 17.0 % tma_bad_speculation
144,922 topdown-retiring:u
411,266 topdown-fe-bound:u
258,510 topdown-be-bound:u
184,090 topdown-bad-spec:u
2,585 INT_MISC.UOP_DROPPING:u # 4.528 M/sec
3,434 cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/u # 6.015 M/sec

0.001480954 seconds time elapsed

0.000000000 seconds user
0.001686000 seconds sys
```

This likewise works if perf_event_paranoid allows it or when running as root.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/builtin-stat.c | 39 ++++++++++++++++++++++++++--------
tools/perf/util/evsel.c | 15 +++++++++++--
tools/perf/util/evsel.h | 1 +
tools/perf/util/stat-display.c | 4 ++++
4 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index efda63f6bf32..eb34f5418ad3 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -667,6 +667,13 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
evsel_list->core.threads->err_thread = -1;
return COUNTER_RETRY;
}
+ } else if (counter->skippable) {
+ if (verbose > 0)
+ ui__warning("skipping event %s that kernel failed to open .\n",
+ evsel__name(counter));
+ counter->supported = false;
+ counter->errored = true;
+ return COUNTER_SKIP;
}

evsel__open_strerror(counter, &target, errno, msg, sizeof(msg));
@@ -1885,15 +1892,29 @@ static int add_default_attributes(void)
* Add TopdownL1 metrics if they exist. To minimize
* multiplexing, don't request threshold computation.
*/
- if (metricgroup__has_metric("TopdownL1") &&
- metricgroup__parse_groups(evsel_list, "TopdownL1",
- /*metric_no_group=*/false,
- /*metric_no_merge=*/false,
- /*metric_no_threshold=*/true,
- stat_config.user_requested_cpu_list,
- stat_config.system_wide,
- &stat_config.metric_events) < 0)
- return -1;
+ if (metricgroup__has_metric("TopdownL1")) {
+ struct evlist *metric_evlist = evlist__new();
+ struct evsel *metric_evsel;
+
+ if (!metric_evlist)
+ return -1;
+
+ if (metricgroup__parse_groups(metric_evlist, "TopdownL1",
+ /*metric_no_group=*/false,
+ /*metric_no_merge=*/false,
+ /*metric_no_threshold=*/true,
+ stat_config.user_requested_cpu_list,
+ stat_config.system_wide,
+ &stat_config.metric_events) < 0)
+ return -1;
+
+ evlist__for_each_entry(metric_evlist, metric_evsel) {
+ metric_evsel->skippable = true;
+ }
+ evlist__splice_list_tail(evsel_list, &metric_evlist->core.entries);
+ evlist__delete(metric_evlist);
+ }
+
/* Platform specific attrs */
if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
return -1;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 356c07f03be6..1cd04b5998d2 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -290,6 +290,7 @@ void evsel__init(struct evsel *evsel,
evsel->per_pkg_mask = NULL;
evsel->collect_stat = false;
evsel->pmu_name = NULL;
+ evsel->skippable = false;
}

struct evsel *evsel__new_idx(struct perf_event_attr *attr, int idx)
@@ -1725,9 +1726,13 @@ static int get_group_fd(struct evsel *evsel, int cpu_map_idx, int thread)
return -1;

fd = FD(leader, cpu_map_idx, thread);
- BUG_ON(fd == -1);
+ BUG_ON(fd == -1 && !leader->skippable);

- return fd;
+ /*
+ * When the leader has been skipped, return -2 to distinguish from no
+ * group leader case.
+ */
+ return fd == -1 ? -2 : fd;
}

static void evsel__remove_fd(struct evsel *pos, int nr_cpus, int nr_threads, int thread_idx)
@@ -2109,6 +2114,12 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,

group_fd = get_group_fd(evsel, idx, thread);

+ if (group_fd == -2) {
+ pr_debug("broken group leader for %s\n", evsel->name);
+ err = -EINVAL;
+ goto out_close;
+ }
+
test_attr__ready();

/* Debug message used by test scripts */
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 35805dcdb1b9..bf8f01af1c0b 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -95,6 +95,7 @@ struct evsel {
bool weak_group;
bool bpf_counter;
bool use_config_name;
+ bool skippable;
int bpf_fd;
struct bpf_object *bpf_obj;
struct list_head config_terms;
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index e6035ecbeee8..6b46bbb3d322 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -810,6 +810,10 @@ static bool should_skip_zero_counter(struct perf_stat_config *config,
struct perf_cpu cpu;
int idx;

+ /* Skip counters that were speculatively/default enabled rather than requested. */
+ if (counter->skippable)
+ return true;
+
/*
* Skip value 0 when enabling --per-thread globally,
* otherwise it will have too many 0 output.
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:02:56

by Ian Rogers

Subject: [PATCH v1 02/40] perf vendor events intel: Add alderlake metric constraints

Previously these constraints were disabled as they contained topdown
events. Since:
https://lore.kernel.org/all/[email protected]/
the topdown events are correctly grouped even if no group exists.

This change was created by PR:
https://github.com/intel/perfmon/pull/71
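
As a rough illustration of what the constraint string is for, the
sketch below maps a "MetricConstraint" value to a grouping decision.
It assumes that NO_GROUP_EVENTS means the metric's events are not
placed in a single group; sketch_metric_group_events() is a
hypothetical helper and not a function in the perf tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true when a metric's events may be
 * opened together as one group.
 */
bool sketch_metric_group_events(const char *metric_constraint)
{
	if (metric_constraint == NULL)
		return true;	/* no constraint: grouping is allowed */
	if (strcmp(metric_constraint, "NO_GROUP_EVENTS") == 0)
		return false;	/* open each event on its own */
	/* Other constraint values are out of scope for this sketch. */
	return true;
}
```

A metric without the key keeps its events grouped; one carrying
"MetricConstraint": "NO_GROUP_EVENTS" does not.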

Signed-off-by: Ian Rogers <[email protected]>
---
.../pmu-events/arch/x86/alderlake/adl-metrics.json | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
index 75d80e70e5cd..d09361dacd4f 100644
--- a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
@@ -1057,6 +1057,7 @@
},
{
"BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "tma_x87_use + tma_fp_scalar + tma_fp_vector",
"MetricGroup": "HPC;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_fp_arith",
@@ -1181,6 +1182,7 @@
},
{
"BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "(tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) * tma_info_slots / BR_MISP_RETIRED.ALL_BRANCHES",
"MetricGroup": "Bad;BrMispredicts;tma_issueBM",
"MetricName": "tma_info_branch_misprediction_cost",
@@ -1233,6 +1235,7 @@
},
{
"BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization if tma_core_bound < tma_ports_utilization else 1) if tma_info_smt_2t_utilization > 0.5 else 0)",
"MetricGroup": "Cor;SMT",
"MetricName": "tma_info_core_bound_likely",
@@ -1293,6 +1296,7 @@
},
{
"BriefDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * (tma_fetch_latency * tma_dsb_switches / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_mite / (tma_dsb + tma_lsd + tma_mite))",
"MetricGroup": "DSBmiss;Fed;tma_issueFB",
"MetricName": "tma_info_dsb_misses",
@@ -1386,6 +1390,7 @@
},
{
"BriefDescription": "Total pipeline cost of instruction fetch bandwidth related bottlenecks",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * (tma_frontend_bound - tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_info_big_code",
"MetricGroup": "Fed;FetchBW;Frontend",
"MetricName": "tma_info_instruction_fetch_bw",
@@ -1805,6 +1810,7 @@
},
{
"BriefDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * tma_memory_bound * (tma_l1_bound / max(tma_memory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_dtlb_load / max(tma_l1_bound, tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))",
"MetricGroup": "Mem;MemoryTLB;Offcore;tma_issueTLB",
"MetricName": "tma_info_memory_data_tlbs",
@@ -1814,6 +1820,7 @@
},
{
"BriefDescription": "Total pipeline cost of Memory Latency related bottlenecks (external memory and off-core caches)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound))",
"MetricGroup": "Mem;MemoryLat;Offcore;tma_issueLat",
"MetricName": "tma_info_memory_latency",
@@ -1823,6 +1830,7 @@
},
{
"BriefDescription": "Total pipeline cost of Branch Misprediction related bottlenecks",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * (tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))",
"MetricGroup": "Bad;BadSpec;BrMispredicts;tma_issueBM",
"MetricName": "tma_info_mispredictions",
@@ -1855,6 +1863,7 @@
},
{
"BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "tma_retiring * tma_info_slots / cpu_core@UOPS_RETIRED.SLOTS\\,cmask\\=1@",
"MetricGroup": "Pipeline;Ret",
"MetricName": "tma_info_retire",
@@ -2127,6 +2136,7 @@
},
{
"BriefDescription": "This metric represents fraction of slots where the CPU was retiring memory operations -- uops for memory load or store accesses.",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "tma_light_operations * MEM_UOP_RETIRED.ANY / (tma_retiring * tma_info_slots)",
"MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_memory_operations",
@@ -2206,6 +2216,7 @@
},
{
"BriefDescription": "This metric represents the remaining light uops fraction the CPU has executed - remaining means not covered by other sibling nodes",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_int_operations + tma_memory_operations + tma_fused_instructions + tma_non_fused_branches + tma_nop_instructions))",
"MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_other_light_ops",
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:03:09

by Ian Rogers

Subject: [PATCH v1 03/40] perf vendor events intel: Add icelake metric constraints

Previously these constraints were disabled as they contained topdown
events. Since:
https://lore.kernel.org/all/[email protected]/
the topdown events are correctly grouped even if no group exists.

This change was created by PR:
https://github.com/intel/perfmon/pull/71

Signed-off-by: Ian Rogers <[email protected]>
---
.../perf/pmu-events/arch/x86/icelake/icl-metrics.json | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
index f45ae3483df4..cb58317860ea 100644
--- a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
@@ -311,6 +311,7 @@
},
{
"BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "tma_x87_use + tma_fp_scalar + tma_fp_vector",
"MetricGroup": "HPC;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_fp_arith",
@@ -413,6 +414,7 @@
},
{
"BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "(tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) * tma_info_slots / BR_MISP_RETIRED.ALL_BRANCHES",
"MetricGroup": "Bad;BrMispredicts;tma_issueBM",
"MetricName": "tma_info_branch_misprediction_cost",
@@ -458,6 +460,7 @@
},
{
"BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization if tma_core_bound < tma_ports_utilization else 1) if tma_info_smt_2t_utilization > 0.5 else 0)",
"MetricGroup": "Cor;SMT",
"MetricName": "tma_info_core_bound_likely",
@@ -510,6 +513,7 @@
},
{
"BriefDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * (tma_fetch_latency * tma_dsb_switches / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_mite / (tma_dsb + tma_lsd + tma_mite))",
"MetricGroup": "DSBmiss;Fed;tma_issueFB",
"MetricName": "tma_info_dsb_misses",
@@ -591,6 +595,7 @@
},
{
"BriefDescription": "Total pipeline cost of instruction fetch bandwidth related bottlenecks",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * (tma_frontend_bound - tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_info_big_code",
"MetricGroup": "Fed;FetchBW;Frontend",
"MetricName": "tma_info_instruction_fetch_bw",
@@ -929,6 +934,7 @@
},
{
"BriefDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * tma_memory_bound * (tma_l1_bound / max(tma_memory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))",
"MetricGroup": "Mem;MemoryTLB;Offcore;tma_issueTLB",
"MetricName": "tma_info_memory_data_tlbs",
@@ -937,6 +943,7 @@
},
{
"BriefDescription": "Total pipeline cost of Memory Latency related bottlenecks (external memory and off-core caches)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound))",
"MetricGroup": "Mem;MemoryLat;Offcore;tma_issueLat",
"MetricName": "tma_info_memory_latency",
@@ -945,6 +952,7 @@
},
{
"BriefDescription": "Total pipeline cost of Branch Misprediction related bottlenecks",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * (tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))",
"MetricGroup": "Bad;BadSpec;BrMispredicts;tma_issueBM",
"MetricName": "tma_info_mispredictions",
@@ -996,6 +1004,7 @@
},
{
"BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "tma_retiring * tma_info_slots / cpu@UOPS_RETIRED.SLOTS\\,cmask\\=1@",
"MetricGroup": "Pipeline;Ret",
"MetricName": "tma_info_retire"
@@ -1196,6 +1205,7 @@
},
{
"BriefDescription": "This metric represents fraction of slots where the CPU was retiring memory operations -- uops for memory load or store accesses.",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "tma_light_operations * MEM_INST_RETIRED.ANY / INST_RETIRED.ANY",
"MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_memory_operations",
@@ -1266,6 +1276,7 @@
},
{
"BriefDescription": "This metric represents the remaining light uops fraction the CPU has executed - remaining means not covered by other sibling nodes",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_memory_operations + tma_branch_instructions + tma_nop_instructions))",
"MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_other_light_ops",
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:03:11

by Ian Rogers

Subject: [PATCH v1 04/40] perf vendor events intel: Add icelakex metric constraints

Previously these constraints were disabled as they contained topdown
events. Since:
https://lore.kernel.org/all/[email protected]/
the topdown events are correctly grouped even if no group exists.

This change was created by PR:
https://github.com/intel/perfmon/pull/71

Signed-off-by: Ian Rogers <[email protected]>
---
.../pmu-events/arch/x86/icelakex/icx-metrics.json | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
index 0f9b174dfc22..76e60e3f9d31 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
@@ -276,6 +276,7 @@
},
{
"BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "tma_x87_use + tma_fp_scalar + tma_fp_vector",
"MetricGroup": "HPC;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_fp_arith",
@@ -378,6 +379,7 @@
},
{
"BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "(tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) * tma_info_slots / BR_MISP_RETIRED.ALL_BRANCHES",
"MetricGroup": "Bad;BrMispredicts;tma_issueBM",
"MetricName": "tma_info_branch_misprediction_cost",
@@ -423,6 +425,7 @@
},
{
"BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization if tma_core_bound < tma_ports_utilization else 1) if tma_info_smt_2t_utilization > 0.5 else 0)",
"MetricGroup": "Cor;SMT",
"MetricName": "tma_info_core_bound_likely",
@@ -475,6 +478,7 @@
},
{
"BriefDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * (tma_fetch_latency * tma_dsb_switches / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_mite / (tma_dsb + tma_mite))",
"MetricGroup": "DSBmiss;Fed;tma_issueFB",
"MetricName": "tma_info_dsb_misses",
@@ -556,6 +560,7 @@
},
{
"BriefDescription": "Total pipeline cost of instruction fetch bandwidth related bottlenecks",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * (tma_frontend_bound - tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_info_big_code",
"MetricGroup": "Fed;FetchBW;Frontend",
"MetricName": "tma_info_instruction_fetch_bw",
@@ -940,6 +945,7 @@
},
{
"BriefDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * tma_memory_bound * (tma_l1_bound / max(tma_memory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))",
"MetricGroup": "Mem;MemoryTLB;Offcore;tma_issueTLB",
"MetricName": "tma_info_memory_data_tlbs",
@@ -948,6 +954,7 @@
},
{
"BriefDescription": "Total pipeline cost of Memory Latency related bottlenecks (external memory and off-core caches)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound))",
"MetricGroup": "Mem;MemoryLat;Offcore;tma_issueLat",
"MetricName": "tma_info_memory_latency",
@@ -956,6 +963,7 @@
},
{
"BriefDescription": "Total pipeline cost of Branch Misprediction related bottlenecks",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * (tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))",
"MetricGroup": "Bad;BadSpec;BrMispredicts;tma_issueBM",
"MetricName": "tma_info_mispredictions",
@@ -1019,6 +1027,7 @@
},
{
"BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "tma_retiring * tma_info_slots / cpu@UOPS_RETIRED.SLOTS\\,cmask\\=1@",
"MetricGroup": "Pipeline;Ret",
"MetricName": "tma_info_retire"
@@ -1219,6 +1228,7 @@
},
{
"BriefDescription": "This metric represents fraction of slots where the CPU was retiring memory operations -- uops for memory load or store accesses.",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "tma_light_operations * MEM_INST_RETIRED.ANY / INST_RETIRED.ANY",
"MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_memory_operations",
@@ -1289,6 +1299,7 @@
},
{
"BriefDescription": "This metric represents the remaining light uops fraction the CPU has executed - remaining means not covered by other sibling nodes",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_memory_operations + tma_branch_instructions + tma_nop_instructions))",
"MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_other_light_ops",
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:03:43

by Ian Rogers

Subject: [PATCH v1 05/40] perf vendor events intel: Add sapphirerapids metric constraints

Previously these constraints were disabled as they contained topdown
events. Since:
https://lore.kernel.org/all/[email protected]/
the topdown events are correctly grouped even if no group exists.

This change was created by PR:
https://github.com/intel/perfmon/pull/71

Signed-off-by: Ian Rogers <[email protected]>
---
.../arch/x86/sapphirerapids/spr-metrics.json | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
index 126300b7ae77..527d40dde003 100644
--- a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
@@ -284,6 +284,7 @@
},
{
"BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "tma_x87_use + tma_fp_scalar + tma_fp_vector + tma_fp_amx",
"MetricGroup": "HPC;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_fp_arith",
@@ -404,6 +405,7 @@
},
{
"BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "(tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) * tma_info_slots / BR_MISP_RETIRED.ALL_BRANCHES",
"MetricGroup": "Bad;BrMispredicts;tma_issueBM",
"MetricName": "tma_info_branch_misprediction_cost",
@@ -449,6 +451,7 @@
},
{
"BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization if tma_core_bound < tma_ports_utilization else 1) if tma_info_smt_2t_utilization > 0.5 else 0)",
"MetricGroup": "Cor;SMT",
"MetricName": "tma_info_core_bound_likely",
@@ -501,6 +504,7 @@
},
{
"BriefDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * (tma_fetch_latency * tma_dsb_switches / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_mite / (tma_dsb + tma_mite))",
"MetricGroup": "DSBmiss;Fed;tma_issueFB",
"MetricName": "tma_info_dsb_misses",
@@ -582,6 +586,7 @@
},
{
"BriefDescription": "Total pipeline cost of instruction fetch bandwidth related bottlenecks",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * (tma_frontend_bound - tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_info_big_code",
"MetricGroup": "Fed;FetchBW;Frontend",
"MetricName": "tma_info_instruction_fetch_bw",
@@ -990,6 +995,7 @@
},
{
"BriefDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * tma_memory_bound * (tma_l1_bound / max(tma_memory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound) * (tma_dtlb_load / max(tma_l1_bound, tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))",
"MetricGroup": "Mem;MemoryTLB;Offcore;tma_issueTLB",
"MetricName": "tma_info_memory_data_tlbs",
@@ -998,6 +1004,7 @@
},
{
"BriefDescription": "Total pipeline cost of Memory Latency related bottlenecks (external memory and off-core caches)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound))",
"MetricGroup": "Mem;MemoryLat;Offcore;tma_issueLat",
"MetricName": "tma_info_memory_latency",
@@ -1006,6 +1013,7 @@
},
{
"BriefDescription": "Total pipeline cost of Branch Misprediction related bottlenecks",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * (tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))",
"MetricGroup": "Bad;BadSpec;BrMispredicts;tma_issueBM",
"MetricName": "tma_info_mispredictions",
@@ -1046,6 +1054,7 @@
},
{
"BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "tma_retiring * tma_info_slots / cpu@UOPS_RETIRED.SLOTS\\,cmask\\=1@",
"MetricGroup": "Pipeline;Ret",
"MetricName": "tma_info_retire"
@@ -1317,6 +1326,7 @@
},
{
"BriefDescription": "This metric represents fraction of slots where the CPU was retiring memory operations -- uops for memory load or store accesses.",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "tma_light_operations * MEM_UOP_RETIRED.ANY / (tma_retiring * tma_info_slots)",
"MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_memory_operations",
@@ -1388,6 +1398,7 @@
},
{
"BriefDescription": "This metric represents the remaining light uops fraction the CPU has executed - remaining means not covered by other sibling nodes",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_int_operations + tma_memory_operations + tma_fused_instructions + tma_non_fused_branches + tma_nop_instructions))",
"MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_other_light_ops",
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:03:50

by Ian Rogers

Subject: [PATCH v1 07/40] perf stat: Avoid segv on counter->name

Switch to evsel__name(), which, unlike counter->name, doesn't return
NULL for hardware and similar events.
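
The hazard is that strncmp() must not be handed a NULL pointer. A
minimal sketch of the safe pattern follows; struct sketch_named_evsel
and both functions are hypothetical stand-ins, and the "unknown"
fallback is an assumption about what a never-NULL name accessor might
return.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two evsel fields involved. */
struct sketch_named_evsel {
	const char *name;	/* may be NULL, e.g. for hardware events */
	const char *pmu_name;
};

/* Stand-in for a name accessor that never returns NULL. */
const char *sketch_evsel_name(const struct sketch_named_evsel *ev)
{
	return ev->name ? ev->name : "unknown";
}

/* True when the event name already begins with its PMU name. */
bool sketch_name_has_pmu_prefix(const struct sketch_named_evsel *ev)
{
	if (ev->pmu_name == NULL)
		return false;
	/* Go through the accessor so strncmp() never sees NULL. */
	return strncmp(sketch_evsel_name(ev), ev->pmu_name,
		       strlen(ev->pmu_name)) == 0;
}
```

With a NULL name and a PMU name of "cpu_core" the check simply reports
no prefix instead of dereferencing NULL.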

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/stat-display.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 6b46bbb3d322..71dd6cb83918 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -747,7 +747,7 @@ static void uniquify_event_name(struct evsel *counter)
int ret = 0;

if (counter->uniquified_name || counter->use_config_name ||
- !counter->pmu_name || !strncmp(counter->name, counter->pmu_name,
+ !counter->pmu_name || !strncmp(evsel__name(counter), counter->pmu_name,
strlen(counter->pmu_name)))
return;

--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:03:51

by Ian Rogers

Subject: [PATCH v1 06/40] perf vendor events intel: Add tigerlake metric constraints

Previously these constraints were disabled as they contained topdown
events. Since:
https://lore.kernel.org/all/[email protected]/
the topdown events are correctly grouped even if no group exists.

This change was created by PR:
https://github.com/intel/perfmon/pull/71

Signed-off-by: Ian Rogers <[email protected]>
---
.../pmu-events/arch/x86/tigerlake/tgl-metrics.json | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json b/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
index 4c80d6be6cf1..6ac4a9e5d013 100644
--- a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
@@ -305,6 +305,7 @@
},
{
"BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "tma_x87_use + tma_fp_scalar + tma_fp_vector",
"MetricGroup": "HPC;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_fp_arith",
@@ -407,6 +408,7 @@
},
{
"BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "(tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) * tma_info_slots / BR_MISP_RETIRED.ALL_BRANCHES",
"MetricGroup": "Bad;BrMispredicts;tma_issueBM",
"MetricName": "tma_info_branch_misprediction_cost",
@@ -452,6 +454,7 @@
},
{
"BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization if tma_core_bound < tma_ports_utilization else 1) if tma_info_smt_2t_utilization > 0.5 else 0)",
"MetricGroup": "Cor;SMT",
"MetricName": "tma_info_core_bound_likely",
@@ -504,6 +507,7 @@
},
{
"BriefDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * (tma_fetch_latency * tma_dsb_switches / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_mite / (tma_dsb + tma_lsd + tma_mite))",
"MetricGroup": "DSBmiss;Fed;tma_issueFB",
"MetricName": "tma_info_dsb_misses",
@@ -585,6 +589,7 @@
},
{
"BriefDescription": "Total pipeline cost of instruction fetch bandwidth related bottlenecks",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * (tma_frontend_bound - tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_info_big_code",
"MetricGroup": "Fed;FetchBW;Frontend",
"MetricName": "tma_info_instruction_fetch_bw",
@@ -949,6 +954,7 @@
},
{
"BriefDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * tma_memory_bound * (tma_l1_bound / max(tma_memory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))",
"MetricGroup": "Mem;MemoryTLB;Offcore;tma_issueTLB",
"MetricName": "tma_info_memory_data_tlbs",
@@ -957,6 +963,7 @@
},
{
"BriefDescription": "Total pipeline cost of Memory Latency related bottlenecks (external memory and off-core caches)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound))",
"MetricGroup": "Mem;MemoryLat;Offcore;tma_issueLat",
"MetricName": "tma_info_memory_latency",
@@ -965,6 +972,7 @@
},
{
"BriefDescription": "Total pipeline cost of Branch Misprediction related bottlenecks",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "100 * (tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))",
"MetricGroup": "Bad;BadSpec;BrMispredicts;tma_issueBM",
"MetricName": "tma_info_mispredictions",
@@ -1016,6 +1024,7 @@
},
{
"BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "tma_retiring * tma_info_slots / cpu@UOPS_RETIRED.SLOTS\\,cmask\\=1@",
"MetricGroup": "Pipeline;Ret",
"MetricName": "tma_info_retire"
@@ -1210,6 +1219,7 @@
},
{
"BriefDescription": "This metric represents fraction of slots where the CPU was retiring memory operations -- uops for memory load or store accesses.",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "tma_light_operations * MEM_INST_RETIRED.ANY / INST_RETIRED.ANY",
"MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_memory_operations",
@@ -1280,6 +1290,7 @@
},
{
"BriefDescription": "This metric represents the remaining light uops fraction the CPU has executed - remaining means not covered by other sibling nodes",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_memory_operations + tma_branch_instructions + tma_nop_instructions))",
"MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_other_light_ops",
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:04:11

by Ian Rogers

Subject: [PATCH v1 08/40] perf test: Test more sysfs events

Parse events for all PMUs, not just cpu, in the test "Parsing of all
PMU events from sysfs".

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/parse-events.c | 103 +++++++++++++++++---------------
1 file changed, 55 insertions(+), 48 deletions(-)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 8068cfd89b84..385bbbc4a409 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -7,6 +7,7 @@
#include "debug.h"
#include "pmu.h"
#include "pmu-hybrid.h"
+#include "pmus.h"
#include <dirent.h>
#include <errno.h>
#include "fncache.h"
@@ -2225,49 +2226,24 @@ static int test_pmu(void)

static int test__pmu_events(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
{
- struct stat st;
- char path[PATH_MAX];
- struct dirent *ent;
- DIR *dir;
- int ret;
-
- if (!test_pmu())
- return TEST_SKIP;
-
- snprintf(path, PATH_MAX, "%s/bus/event_source/devices/cpu/events/",
- sysfs__mountpoint());
-
- ret = stat(path, &st);
- if (ret) {
- pr_debug("omitting PMU cpu events tests: %s\n", path);
- return TEST_OK;
- }
+ struct perf_pmu *pmu;
+ int ret = TEST_OK;

- dir = opendir(path);
- if (!dir) {
- pr_debug("can't open pmu event dir: %s\n", path);
- return TEST_FAIL;
- }
+ perf_pmus__for_each_pmu(pmu) {
+ struct stat st;
+ char path[PATH_MAX];
+ struct dirent *ent;
+ DIR *dir;
+ int err;

- ret = TEST_OK;
- while ((ent = readdir(dir))) {
- struct evlist_test e = { .name = NULL, };
- char name[2 * NAME_MAX + 1 + 12 + 3];
- int test_ret;
+ snprintf(path, PATH_MAX, "%s/bus/event_source/devices/%s/events/",
+ sysfs__mountpoint(), pmu->name);

- /* Names containing . are special and cannot be used directly */
- if (strchr(ent->d_name, '.'))
+ err = stat(path, &st);
+ if (err) {
+ pr_debug("skipping PMU %s events tests: %s\n", pmu->name, path);
+ ret = combine_test_results(ret, TEST_SKIP);
continue;
-
- snprintf(name, sizeof(name), "cpu/event=%s/u", ent->d_name);
-
- e.name = name;
- e.check = test__checkevent_pmu_events;
-
- test_ret = test_event(&e);
- if (test_ret != TEST_OK) {
- pr_debug("Test PMU event failed for '%s'", name);
- ret = combine_test_results(ret, test_ret);
}
/*
* Names containing '-' are recognized as prefixes and suffixes
@@ -2282,17 +2258,48 @@ static int test__pmu_events(struct test_suite *test __maybe_unused, int subtest
if (strchr(ent->d_name, '-'))
continue;

- snprintf(name, sizeof(name), "%s:u,cpu/event=%s/u", ent->d_name, ent->d_name);
- e.name = name;
- e.check = test__checkevent_pmu_events_mix;
- test_ret = test_event(&e);
- if (test_ret != TEST_OK) {
- pr_debug("Test PMU event failed for '%s'", name);
- ret = combine_test_results(ret, test_ret);
+ dir = opendir(path);
+ if (!dir) {
+ pr_debug("can't open pmu event dir: %s\n", path);
+ ret = combine_test_results(ret, TEST_SKIP);
+ continue;
}
- }

- closedir(dir);
+ while ((ent = readdir(dir))) {
+ struct evlist_test e = { .name = NULL, };
+ char name[2 * NAME_MAX + 1 + 12 + 3];
+ int test_ret;
+
+ /* Names containing . are special and cannot be used directly */
+ if (strchr(ent->d_name, '.'))
+ continue;
+
+ snprintf(name, sizeof(name), "%s/event=%s/u", pmu->name, ent->d_name);
+
+ e.name = name;
+ e.check = test__checkevent_pmu_events;
+
+ test_ret = test_event(&e);
+ if (test_ret != TEST_OK) {
+ pr_debug("Test PMU event failed for '%s'", name);
+ ret = combine_test_results(ret, test_ret);
+ }
+
+ if (!is_pmu_core(pmu->name))
+ continue;
+
+ snprintf(name, sizeof(name), "%s:u,%s/event=%s/u", ent->d_name, pmu->name, ent->d_name);
+ e.name = name;
+ e.check = test__checkevent_pmu_events_mix;
+ test_ret = test_event(&e);
+ if (test_ret != TEST_OK) {
+ pr_debug("Test PMU event failed for '%s'", name);
+ ret = combine_test_results(ret, test_ret);
+ }
+ }
+
+ closedir(dir);
+ }
return ret;
}

--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:04:31

by Ian Rogers

Subject: [PATCH v1 09/40] perf test: Use valid for PMU tests

Rather than skipping all tests in test__events_pmu when the cpu PMU
isn't present, use the per-test valid callback. This allows software
PMU tests to run on hybrid and ARM systems.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/parse-events.c | 27 +++++++++------------------
1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 385bbbc4a409..08d6b8a3015d 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1430,6 +1430,11 @@ static int test__checkevent_config_cache(struct evlist *evlist)
return TEST_OK;
}

+static bool test__pmu_cpu_valid(void)
+{
+ return !!perf_pmu__find("cpu");
+}
+
static bool test__intel_pt_valid(void)
{
return !!perf_pmu__find("intel_pt");
@@ -1979,21 +1984,25 @@ static const struct evlist_test test__events[] = {
static const struct evlist_test test__events_pmu[] = {
{
.name = "cpu/config=10,config1,config2=3,period=1000/u",
+ .valid = test__pmu_cpu_valid,
.check = test__checkevent_pmu,
/* 0 */
},
{
.name = "cpu/config=1,name=krava/u,cpu/config=2/u",
+ .valid = test__pmu_cpu_valid,
.check = test__checkevent_pmu_name,
/* 1 */
},
{
.name = "cpu/config=1,call-graph=fp,time,period=100000/,cpu/config=2,call-graph=no,time=0,period=2000/",
+ .valid = test__pmu_cpu_valid,
.check = test__checkevent_pmu_partial_time_callgraph,
/* 2 */
},
{
.name = "cpu/name='COMPLEX_CYCLES_NAME:orig=cycles,desc=chip-clock-ticks',period=0x1,event=0x2/ukp",
+ .valid = test__pmu_cpu_valid,
.check = test__checkevent_complex_name,
/* 3 */
},
@@ -2209,21 +2218,6 @@ static int test__terms2(struct test_suite *test __maybe_unused, int subtest __ma
return test_terms(test__terms, ARRAY_SIZE(test__terms));
}

-static int test_pmu(void)
-{
- struct stat st;
- char path[PATH_MAX];
- int ret;
-
- snprintf(path, PATH_MAX, "%s/bus/event_source/devices/cpu/format/",
- sysfs__mountpoint());
-
- ret = stat(path, &st);
- if (ret)
- pr_debug("omitting PMU cpu tests\n");
- return !ret;
-}
-
static int test__pmu_events(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
{
struct perf_pmu *pmu;
@@ -2305,9 +2299,6 @@ static int test__pmu_events(struct test_suite *test __maybe_unused, int subtest

static int test__pmu_events2(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
{
- if (!test_pmu())
- return TEST_SKIP;
-
return test_events(test__events_pmu, ARRAY_SIZE(test__events_pmu));
}

--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:04:31

by Ian Rogers

Subject: [PATCH v1 10/40] perf test: Mask config then test

Add a helper to test the config of an evsel. Mask the config so that
the high bits containing the PMU type, which isn't constant for hybrid,
are ignored.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/parse-events.c | 183 +++++++++++++-------------------
1 file changed, 75 insertions(+), 108 deletions(-)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 08d6b8a3015d..fa016afbc250 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -21,6 +21,11 @@
#define PERF_TP_SAMPLE_TYPE (PERF_SAMPLE_RAW | PERF_SAMPLE_TIME | \
PERF_SAMPLE_CPU | PERF_SAMPLE_PERIOD)

+static bool test_config(const struct evsel *evsel, __u64 expected_config)
+{
+ return (evsel->core.attr.config & PERF_HW_EVENT_MASK) == expected_config;
+}
+
#ifdef HAVE_LIBTRACEEVENT

#if defined(__s390x__)
@@ -87,7 +92,7 @@ static int test__checkevent_raw(struct evlist *evlist)

TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0x1a == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
return TEST_OK;
}

@@ -97,7 +102,7 @@ static int test__checkevent_numeric(struct evlist *evlist)

TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", 1 == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 1 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 1));
return TEST_OK;
}

@@ -107,8 +112,7 @@ static int test__checkevent_symbolic_name(struct evlist *evlist)

TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_INSTRUCTIONS == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
return TEST_OK;
}

@@ -118,8 +122,7 @@ static int test__checkevent_symbolic_name_config(struct evlist *evlist)

TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
/*
* The period value gets configured within evlist__config,
* while this test executes only parse events method.
@@ -139,8 +142,7 @@ static int test__checkevent_symbolic_alias(struct evlist *evlist)

TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_SW_PAGE_FAULTS == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_SW_PAGE_FAULTS));
return TEST_OK;
}

@@ -150,7 +152,7 @@ static int test__checkevent_genhw(struct evlist *evlist)

TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HW_CACHE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", (1 << 16) == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 1 << 16));
return TEST_OK;
}

@@ -160,7 +162,7 @@ static int test__checkevent_breakpoint(struct evlist *evlist)

TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_BREAKPOINT == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0));
TEST_ASSERT_VAL("wrong bp_type", (HW_BREAKPOINT_R | HW_BREAKPOINT_W) ==
evsel->core.attr.bp_type);
TEST_ASSERT_VAL("wrong bp_len", HW_BREAKPOINT_LEN_4 ==
@@ -174,7 +176,7 @@ static int test__checkevent_breakpoint_x(struct evlist *evlist)

TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_BREAKPOINT == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0));
TEST_ASSERT_VAL("wrong bp_type",
HW_BREAKPOINT_X == evsel->core.attr.bp_type);
TEST_ASSERT_VAL("wrong bp_len", sizeof(long) == evsel->core.attr.bp_len);
@@ -188,7 +190,7 @@ static int test__checkevent_breakpoint_r(struct evlist *evlist)
TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type",
PERF_TYPE_BREAKPOINT == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0));
TEST_ASSERT_VAL("wrong bp_type",
HW_BREAKPOINT_R == evsel->core.attr.bp_type);
TEST_ASSERT_VAL("wrong bp_len",
@@ -203,7 +205,7 @@ static int test__checkevent_breakpoint_w(struct evlist *evlist)
TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type",
PERF_TYPE_BREAKPOINT == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0));
TEST_ASSERT_VAL("wrong bp_type",
HW_BREAKPOINT_W == evsel->core.attr.bp_type);
TEST_ASSERT_VAL("wrong bp_len",
@@ -218,7 +220,7 @@ static int test__checkevent_breakpoint_rw(struct evlist *evlist)
TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type",
PERF_TYPE_BREAKPOINT == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0));
TEST_ASSERT_VAL("wrong bp_type",
(HW_BREAKPOINT_R|HW_BREAKPOINT_W) == evsel->core.attr.bp_type);
TEST_ASSERT_VAL("wrong bp_len",
@@ -447,7 +449,7 @@ static int test__checkevent_pmu(struct evlist *evlist)

TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 10 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 10));
TEST_ASSERT_VAL("wrong config1", 1 == evsel->core.attr.config1);
TEST_ASSERT_VAL("wrong config2", 3 == evsel->core.attr.config2);
TEST_ASSERT_VAL("wrong config3", 0 == evsel->core.attr.config3);
@@ -469,7 +471,7 @@ static int test__checkevent_list(struct evlist *evlist)

/* r1 */
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 1 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 1));
TEST_ASSERT_VAL("wrong config1", 0 == evsel->core.attr.config1);
TEST_ASSERT_VAL("wrong config2", 0 == evsel->core.attr.config2);
TEST_ASSERT_VAL("wrong config3", 0 == evsel->core.attr.config3);
@@ -492,7 +494,7 @@ static int test__checkevent_list(struct evlist *evlist)
/* 1:1:hp */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", 1 == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 1 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 1));
TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -509,14 +511,14 @@ static int test__checkevent_pmu_name(struct evlist *evlist)
/* cpu/config=1,name=krava/u */
TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 1 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 1));
TEST_ASSERT_VAL("wrong name", !strcmp(evsel__name(evsel), "krava"));

/* cpu/config=2/u" */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 2 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 2));
TEST_ASSERT_VAL("wrong name",
!strcmp(evsel__name(evsel), "cpu/config=2/u"));

@@ -530,7 +532,7 @@ static int test__checkevent_pmu_partial_time_callgraph(struct evlist *evlist)
/* cpu/config=1,call-graph=fp,time,period=100000/ */
TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 1 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 1));
/*
* The period, time and callgraph value gets configured within evlist__config,
* while this test executes only parse events method.
@@ -542,7 +544,7 @@ static int test__checkevent_pmu_partial_time_callgraph(struct evlist *evlist)
/* cpu/config=2,call-graph=no,time=0,period=2000/ */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 2 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 2));
/*
* The period, time and callgraph value gets configured within evlist__config,
* while this test executes only parse events method.
@@ -694,8 +696,7 @@ static int test__group1(struct evlist *evlist)
/* instructions:k */
evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_INSTRUCTIONS == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
@@ -710,8 +711,7 @@ static int test__group1(struct evlist *evlist)
/* cycles:upp */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
@@ -736,8 +736,7 @@ static int test__group2(struct evlist *evlist)
/* faults + :ku modifier */
evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_SW_PAGE_FAULTS == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_SW_PAGE_FAULTS));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
@@ -752,8 +751,7 @@ static int test__group2(struct evlist *evlist)
/* cache-references + :u modifier */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CACHE_REFERENCES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CACHE_REFERENCES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
@@ -767,8 +765,7 @@ static int test__group2(struct evlist *evlist)
/* cycles:k */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
@@ -811,8 +808,7 @@ static int test__group3(struct evlist *evlist __maybe_unused)
/* group1 cycles:kppp */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
@@ -828,8 +824,7 @@ static int test__group3(struct evlist *evlist __maybe_unused)
/* group2 cycles + G modifier */
evsel = leader = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -846,7 +841,7 @@ static int test__group3(struct evlist *evlist __maybe_unused)
/* group2 1:3 + G modifier */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", 1 == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 3 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 3));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -860,8 +855,7 @@ static int test__group3(struct evlist *evlist __maybe_unused)
/* instructions:u */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_INSTRUCTIONS == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
@@ -885,8 +879,7 @@ static int test__group4(struct evlist *evlist __maybe_unused)
/* cycles:u + p */
evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
@@ -903,8 +896,7 @@ static int test__group4(struct evlist *evlist __maybe_unused)
/* instructions:kp + p */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_INSTRUCTIONS == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
@@ -929,8 +921,7 @@ static int test__group5(struct evlist *evlist __maybe_unused)
/* cycles + G */
evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -946,8 +937,7 @@ static int test__group5(struct evlist *evlist __maybe_unused)
/* instructions + G */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_INSTRUCTIONS == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -961,8 +951,7 @@ static int test__group5(struct evlist *evlist __maybe_unused)
/* cycles:G */
evsel = leader = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -978,8 +967,7 @@ static int test__group5(struct evlist *evlist __maybe_unused)
/* instructions:G */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_INSTRUCTIONS == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -992,8 +980,7 @@ static int test__group5(struct evlist *evlist __maybe_unused)
/* cycles */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -1015,8 +1002,7 @@ static int test__group_gh1(struct evlist *evlist)
/* cycles + :H group modifier */
evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -1031,8 +1017,7 @@ static int test__group_gh1(struct evlist *evlist)
/* cache-misses:G + :H group modifier */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CACHE_MISSES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CACHE_MISSES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -1055,8 +1040,7 @@ static int test__group_gh2(struct evlist *evlist)
/* cycles + :G group modifier */
evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -1071,8 +1055,7 @@ static int test__group_gh2(struct evlist *evlist)
/* cache-misses:H + :G group modifier */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CACHE_MISSES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CACHE_MISSES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -1095,8 +1078,7 @@ static int test__group_gh3(struct evlist *evlist)
/* cycles:G + :u group modifier */
evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
@@ -1111,8 +1093,7 @@ static int test__group_gh3(struct evlist *evlist)
/* cache-misses:H + :u group modifier */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CACHE_MISSES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CACHE_MISSES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
@@ -1135,8 +1116,7 @@ static int test__group_gh4(struct evlist *evlist)
/* cycles:G + :uG group modifier */
evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
@@ -1151,8 +1131,7 @@ static int test__group_gh4(struct evlist *evlist)
/* cache-misses:H + :uG group modifier */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CACHE_MISSES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CACHE_MISSES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
@@ -1174,8 +1153,7 @@ static int test__leader_sample1(struct evlist *evlist)
/* cycles - sampling group leader */
evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -1189,8 +1167,7 @@ static int test__leader_sample1(struct evlist *evlist)
/* cache-misses - not sampling */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CACHE_MISSES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CACHE_MISSES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -1203,8 +1180,7 @@ static int test__leader_sample1(struct evlist *evlist)
/* branch-misses - not sampling */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_BRANCH_MISSES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_BRANCH_MISSES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -1227,8 +1203,7 @@ static int test__leader_sample2(struct evlist *evlist __maybe_unused)
/* instructions - sampling group leader */
evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_INSTRUCTIONS == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
@@ -1242,8 +1217,7 @@ static int test__leader_sample2(struct evlist *evlist __maybe_unused)
/* branch-misses - not sampling */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_BRANCH_MISSES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_BRANCH_MISSES));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
@@ -1279,8 +1253,7 @@ static int test__pinned_group(struct evlist *evlist)
/* cycles - group leader */
evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
TEST_ASSERT_VAL("wrong pinned", evsel->core.attr.pinned);
@@ -1288,14 +1261,12 @@ static int test__pinned_group(struct evlist *evlist)
/* cache-misses - can not be pinned, but will go on with the leader */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CACHE_MISSES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CACHE_MISSES));
TEST_ASSERT_VAL("wrong pinned", !evsel->core.attr.pinned);

/* branch-misses - ditto */
evsel = evsel__next(evsel);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_BRANCH_MISSES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_BRANCH_MISSES));
TEST_ASSERT_VAL("wrong pinned", !evsel->core.attr.pinned);

return TEST_OK;
@@ -1323,8 +1294,7 @@ static int test__exclusive_group(struct evlist *evlist)
/* cycles - group leader */
evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
TEST_ASSERT_VAL("wrong exclusive", evsel->core.attr.exclusive);
@@ -1332,14 +1302,12 @@ static int test__exclusive_group(struct evlist *evlist)
/* cache-misses - can not be pinned, but will go on with the leader */
evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_CACHE_MISSES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CACHE_MISSES));
TEST_ASSERT_VAL("wrong exclusive", !evsel->core.attr.exclusive);

/* branch-misses - ditto */
evsel = evsel__next(evsel);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_HW_BRANCH_MISSES == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_BRANCH_MISSES));
TEST_ASSERT_VAL("wrong exclusive", !evsel->core.attr.exclusive);

return TEST_OK;
@@ -1350,7 +1318,7 @@ static int test__checkevent_breakpoint_len(struct evlist *evlist)

TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_BREAKPOINT == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0));
TEST_ASSERT_VAL("wrong bp_type", (HW_BREAKPOINT_R | HW_BREAKPOINT_W) ==
evsel->core.attr.bp_type);
TEST_ASSERT_VAL("wrong bp_len", HW_BREAKPOINT_LEN_1 ==
@@ -1365,7 +1333,7 @@ static int test__checkevent_breakpoint_len_w(struct evlist *evlist)

TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_BREAKPOINT == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0));
TEST_ASSERT_VAL("wrong bp_type", HW_BREAKPOINT_W ==
evsel->core.attr.bp_type);
TEST_ASSERT_VAL("wrong bp_len", HW_BREAKPOINT_LEN_2 ==
@@ -1393,8 +1361,7 @@ static int test__checkevent_precise_max_modifier(struct evlist *evlist)

TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config",
- PERF_COUNT_SW_TASK_CLOCK == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_SW_TASK_CLOCK));
return TEST_OK;
}

@@ -1462,7 +1429,7 @@ static int test__checkevent_raw_pmu(struct evlist *evlist)

TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0x1a == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
return TEST_OK;
}

@@ -1471,7 +1438,7 @@ static int test__sym_event_slash(struct evlist *evlist)
struct evsel *evsel = evlist__first(evlist);

TEST_ASSERT_VAL("wrong type", evsel->core.attr.type == PERF_TYPE_HARDWARE);
- TEST_ASSERT_VAL("wrong config", evsel->core.attr.config == PERF_COUNT_HW_CPU_CYCLES);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
return TEST_OK;
}
@@ -1481,7 +1448,7 @@ static int test__sym_event_dc(struct evlist *evlist)
struct evsel *evsel = evlist__first(evlist);

TEST_ASSERT_VAL("wrong type", evsel->core.attr.type == PERF_TYPE_HARDWARE);
- TEST_ASSERT_VAL("wrong config", evsel->core.attr.config == PERF_COUNT_HW_CPU_CYCLES);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
return TEST_OK;
}
@@ -1548,7 +1515,7 @@ static int test__hybrid_hw_event_with_pmu(struct evlist *evlist)

TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
return TEST_OK;
}

@@ -1559,12 +1526,12 @@ static int test__hybrid_hw_group_event(struct evlist *evlist)
evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));

evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0xc0 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0xc0));
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
return TEST_OK;
}
@@ -1580,7 +1547,7 @@ static int test__hybrid_sw_hw_group_event(struct evlist *evlist)

evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
return TEST_OK;
}
@@ -1592,7 +1559,7 @@ static int test__hybrid_hw_sw_group_event(struct evlist *evlist)
evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));

evsel = evsel__next(evsel);
@@ -1608,14 +1575,14 @@ static int test__hybrid_group_modifier1(struct evlist *evlist)
evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);

evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0xc0 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0xc0));
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
@@ -1629,17 +1596,17 @@ static int test__hybrid_raw1(struct evlist *evlist)
if (!perf_pmu__hybrid_mounted("cpu_atom")) {
TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0x1a == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
return TEST_OK;
}

TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0x1a == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));

/* The type of second event is randome value */
evsel = evsel__next(evsel);
- TEST_ASSERT_VAL("wrong config", 0x1a == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
return TEST_OK;
}

@@ -1649,7 +1616,7 @@ static int test__hybrid_raw2(struct evlist *evlist)

TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0x1a == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
return TEST_OK;
}

--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:04:33

by Ian Rogers

Subject: [PATCH v1 11/40] perf test: Test more with config_cache

test__checkevent_config_cache checks the parsing of
"L1-dcache-misses/name=cachepmu/". Don't just check that the name is
set correctly, also validate the rest of the perf_event_attr for
L1-dcache-misses.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/parse-events.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index fa016afbc250..177464793aa8 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1394,7 +1394,7 @@ static int test__checkevent_config_cache(struct evlist *evlist)
struct evsel *evsel = evlist__first(evlist);

TEST_ASSERT_VAL("wrong name setting", evsel__name_is(evsel, "cachepmu"));
- return TEST_OK;
+ return test__checkevent_genhw(evlist);
}

static bool test__pmu_cpu_valid(void)
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:04:58

by Ian Rogers

Subject: [PATCH v1 12/40] perf test: Roundtrip name, don't assume 1 event per name

On a hybrid system, parsing a hardware event name or a legacy cache
event opens the event on each PMU. Parsing and then checking indexes
fails, as the parsed index is double the expected one. Avoid checking
the index by comparing the names immediately after each parse.

This change removes hard coded hybrid logic and removes assumptions
about the expansion of an event. On hybrid the PMUs may or may not
support an event and so using a distance isn't a consistent solution.
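
As a rough illustration of the per-name check described above, here is
a minimal standalone sketch. It does not use the perf API: struct
name_list and fake_parse() are hypothetical stand-ins for an evlist and
parse_event(), and nr_pmus is an assumed knob modelling how many PMUs a
name expands to.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 4

/* Hypothetical stand-in for an evlist: the names produced by one parse. */
struct name_list {
	const char *names[MAX_ENTRIES];
	int nr;
};

/*
 * Hypothetical stand-in for parse_event(): expands one name into
 * nr_pmus entries, as a hybrid system might. Returns non-zero on failure.
 */
static int fake_parse(struct name_list *list, const char *name, int nr_pmus)
{
	if (nr_pmus < 1 || nr_pmus > MAX_ENTRIES)
		return -1;
	list->nr = 0;
	for (int i = 0; i < nr_pmus; i++)
		list->names[list->nr++] = name;
	return 0;
}

/* Returns 0 if every entry produced for each name round-trips, else -1. */
static int roundtrip_names(const char *const names[], int nr_names, int nr_pmus)
{
	int ret = 0;

	for (int i = 0; i < nr_names; i++) {
		struct name_list list;

		if (fake_parse(&list, names[i], nr_pmus)) {
			ret = -1;
			continue;
		}
		/* Compare right after the parse; no index arithmetic needed. */
		for (int j = 0; j < list.nr; j++) {
			if (strcmp(list.names[j], names[i])) {
				printf("%s != %s\n", list.names[j], names[i]);
				ret = -1;
			}
		}
	}
	return ret;
}
```

Because each name gets its own list, the check gives the same answer
whether a name expands to one entry or several, which is the property
the distance based approach lacked.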

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/evsel-roundtrip-name.c | 119 ++++++++++--------------
1 file changed, 49 insertions(+), 70 deletions(-)

diff --git a/tools/perf/tests/evsel-roundtrip-name.c b/tools/perf/tests/evsel-roundtrip-name.c
index e94fed901992..15ff86f9da0b 100644
--- a/tools/perf/tests/evsel-roundtrip-name.c
+++ b/tools/perf/tests/evsel-roundtrip-name.c
@@ -4,114 +4,93 @@
#include "parse-events.h"
#include "tests.h"
#include "debug.h"
-#include "pmu.h"
-#include "pmu-hybrid.h"
-#include <errno.h>
#include <linux/kernel.h>

static int perf_evsel__roundtrip_cache_name_test(void)
{
- char name[128];
- int type, op, err = 0, ret = 0, i, idx;
- struct evsel *evsel;
- struct evlist *evlist = evlist__new();
+ int ret = TEST_OK;

- if (evlist == NULL)
- return -ENOMEM;
-
- for (type = 0; type < PERF_COUNT_HW_CACHE_MAX; type++) {
- for (op = 0; op < PERF_COUNT_HW_CACHE_OP_MAX; op++) {
+ for (int type = 0; type < PERF_COUNT_HW_CACHE_MAX; type++) {
+ for (int op = 0; op < PERF_COUNT_HW_CACHE_OP_MAX; op++) {
/* skip invalid cache type */
if (!evsel__is_cache_op_valid(type, op))
continue;

- for (i = 0; i < PERF_COUNT_HW_CACHE_RESULT_MAX; i++) {
- __evsel__hw_cache_type_op_res_name(type, op, i, name, sizeof(name));
- err = parse_event(evlist, name);
- if (err)
- ret = err;
- }
- }
- }
-
- idx = 0;
- evsel = evlist__first(evlist);
+ for (int res = 0; res < PERF_COUNT_HW_CACHE_RESULT_MAX; res++) {
+ char name[128];
+ struct evlist *evlist = evlist__new();
+ struct evsel *evsel;
+ int err;

- for (type = 0; type < PERF_COUNT_HW_CACHE_MAX; type++) {
- for (op = 0; op < PERF_COUNT_HW_CACHE_OP_MAX; op++) {
- /* skip invalid cache type */
- if (!evsel__is_cache_op_valid(type, op))
- continue;
+ if (evlist == NULL) {
+ pr_debug("Failed to alloc evlist");
+ return TEST_FAIL;
+ }
+ __evsel__hw_cache_type_op_res_name(type, op, res,
+ name, sizeof(name));

- for (i = 0; i < PERF_COUNT_HW_CACHE_RESULT_MAX; i++) {
- __evsel__hw_cache_type_op_res_name(type, op, i, name, sizeof(name));
- if (evsel->core.idx != idx)
+ err = parse_event(evlist, name);
+ if (err) {
+ pr_debug("Failure to parse cache event '%s' possibly as PMUs don't support it",
+ name);
+ evlist__delete(evlist);
continue;
-
- ++idx;
-
- if (strcmp(evsel__name(evsel), name)) {
- pr_debug("%s != %s\n", evsel__name(evsel), name);
- ret = -1;
}
-
- evsel = evsel__next(evsel);
+ evlist__for_each_entry(evlist, evsel) {
+ if (strcmp(evsel__name(evsel), name)) {
+ pr_debug("%s != %s\n", evsel__name(evsel), name);
+ ret = TEST_FAIL;
+ }
+ }
+ evlist__delete(evlist);
}
}
}
-
- evlist__delete(evlist);
return ret;
}

-static int __perf_evsel__name_array_test(const char *const names[], int nr_names,
- int distance)
+static int perf_evsel__name_array_test(const char *const names[], int nr_names)
{
- int i, err;
- struct evsel *evsel;
- struct evlist *evlist = evlist__new();
+ int ret = TEST_OK;

- if (evlist == NULL)
- return -ENOMEM;
+ for (int i = 0; i < nr_names; ++i) {
+ struct evlist *evlist = evlist__new();
+ struct evsel *evsel;
+ int err;

- for (i = 0; i < nr_names; ++i) {
+ if (evlist == NULL) {
+ pr_debug("Failed to alloc evlist");
+ return TEST_FAIL;
+ }
err = parse_event(evlist, names[i]);
if (err) {
pr_debug("failed to parse event '%s', err %d\n",
names[i], err);
- goto out_delete_evlist;
+ evlist__delete(evlist);
+ ret = TEST_FAIL;
+ continue;
}
- }
-
- err = 0;
- evlist__for_each_entry(evlist, evsel) {
- if (strcmp(evsel__name(evsel), names[evsel->core.idx / distance])) {
- --err;
- pr_debug("%s != %s\n", evsel__name(evsel), names[evsel->core.idx / distance]);
+ evlist__for_each_entry(evlist, evsel) {
+ if (strcmp(evsel__name(evsel), names[i])) {
+ pr_debug("%s != %s\n", evsel__name(evsel), names[i]);
+ ret = TEST_FAIL;
+ }
}
+ evlist__delete(evlist);
}
-
-out_delete_evlist:
- evlist__delete(evlist);
- return err;
+ return ret;
}

-#define perf_evsel__name_array_test(names, distance) \
- __perf_evsel__name_array_test(names, ARRAY_SIZE(names), distance)
-
static int test__perf_evsel__roundtrip_name_test(struct test_suite *test __maybe_unused,
int subtest __maybe_unused)
{
- int err = 0, ret = 0;
-
- if (perf_pmu__has_hybrid() && perf_pmu__hybrid_mounted("cpu_atom"))
- return perf_evsel__name_array_test(evsel__hw_names, 2);
+ int err = 0, ret = TEST_OK;

- err = perf_evsel__name_array_test(evsel__hw_names, 1);
+ err = perf_evsel__name_array_test(evsel__hw_names, PERF_COUNT_HW_MAX);
if (err)
ret = err;

- err = __perf_evsel__name_array_test(evsel__sw_names, PERF_COUNT_SW_DUMMY + 1, 1);
+ err = perf_evsel__name_array_test(evsel__sw_names, PERF_COUNT_SW_DUMMY + 1);
if (err)
ret = err;

--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:05:02

by Ian Rogers

Subject: [PATCH v1 13/40] perf parse-events: Set attr.type to PMU type early

Set attr.type to the PMU type early so that later terms can override
the value. Setting the value in perf_pmu__config, which runs after the
terms are processed, meant that a value set by an earlier step, like
config_term_pmu, was overwritten.
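
The ordering issue can be sketched outside of perf as follows. This is
only an illustration under stated assumptions: struct attr, struct term,
apply_term() and the two build functions are hypothetical, trimmed-down
stand-ins and not the perf data structures.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for perf_event_attr and a term. */
struct attr {
	uint32_t type;
	uint64_t config;
};

struct term {
	int has_type;		/* non-zero if the term carries its own type */
	uint32_t type;
	uint64_t config;
};

/* Apply a term; a term that carries a type overrides attr->type. */
static void apply_term(struct attr *attr, const struct term *term)
{
	if (term->has_type)
		attr->type = term->type;
	attr->config = term->config;
}

/* Default first, terms second: the type chosen by the term survives. */
static struct attr build_attr_early(uint32_t pmu_type, const struct term *term)
{
	struct attr attr = { 0, 0 };

	attr.type = pmu_type;
	apply_term(&attr, term);
	return attr;
}

/* Terms first, default last: the default clobbers the term's type. */
static struct attr build_attr_late(uint32_t pmu_type, const struct term *term)
{
	struct attr attr = { 0, 0 };

	apply_term(&attr, term);
	attr.type = pmu_type;
	return attr;
}
```

With a PMU type of 8 and a term asking for type 0, build_attr_early()
keeps type 0 while build_attr_late() ends up with 8.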

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/parse-events.c | 2 +-
tools/perf/util/pmu.c | 1 -
2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index d71019dcd614..4ba01577618e 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1492,9 +1492,9 @@ int parse_events_add_pmu(struct parse_events_state *parse_state,
} else {
memset(&attr, 0, sizeof(attr));
}
+ attr.type = pmu->type;

if (!head_config) {
- attr.type = pmu->type;
evsel = __add_event(list, &parse_state->idx, &attr,
/*init_attr=*/true, /*name=*/NULL,
/*metric_id=*/NULL, pmu,
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index ad209c88a124..cb33d869f1ed 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1398,7 +1398,6 @@ int perf_pmu__config(struct perf_pmu *pmu, struct perf_event_attr *attr,
{
bool zero = !!pmu->default_config;

- attr->type = pmu->type;
return perf_pmu__config_terms(pmu->name, &pmu->format, attr,
head_terms, zero, err);
}
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:06:10

by Ian Rogers

Subject: [PATCH v1 15/40] perf parse-events: Avoid scanning PMUs before parsing

The event parser needs to handle two special cases:
1) legacy events like L1-dcache-load-miss. These event names don't
appear in json or sysfs, and lookup tables are used for the config
value.
2) raw events where 'r0xead' is the same as 'read' unless the PMU has
an event called 'read' in which case the event has priority.

To handle these cases the previous parser would scan all PMUs for
components of event names. The lexer would then use these components
to classify whether a token should be part of a legacy event, a raw
event or an event. The grammar would handle legacy event tokens or
recombine the tokens back into a regular event name. The code wasn't
PMU specific and had issues with events like AMD's branch-brs, which
would fail to parse as brs was expected to be a suffix on a legacy
event style name:

$ perf stat -e branch-brs true
event syntax error: 'branch-brs'
\___ parser error

This change avoids processing all PMUs by using the lexer as a regular
expression matcher. The lexer returns the token for the longest matched
sequence of characters and, in the event of a tie, the first matching
rule. The legacy events are a fixed number of regular expressions, and
by matching these before a name token it's possible to generate an
accurate legacy event token with everything else matching as a name.
Because of the lexer change, the handling of hyphens in the grammar can
be removed as hyphens just become part of the name.
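
The longest match, first rule on a tie behaviour can be sketched in
plain C. This is a simplification and not the generated lexer: the
legacy table holds only three example spellings, and next_token() and
the two matchers are hypothetical names chosen for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum token { TOK_NONE, TOK_LEGACY_CACHE, TOK_NAME };

/*
 * Hypothetical, much simplified legacy cache rule: the length of the
 * longest known legacy spelling that s starts with, or 0 if none does.
 */
static size_t match_legacy_cache(const char *s)
{
	static const char *const legacy[] = {
		"L1-dcache-load-misses", "L1-dcache", "branch",
	};
	size_t best = 0;

	for (size_t i = 0; i < sizeof(legacy) / sizeof(legacy[0]); i++) {
		size_t n = strlen(legacy[i]);

		if (n > best && !strncmp(s, legacy[i], n))
			best = n;
	}
	return best;
}

/* Simplified name rule: letters, digits, '_', '-' and '.'. */
static size_t match_name(const char *s)
{
	size_t n = 0;

	while (s[n] && (isalnum((unsigned char)s[n]) || strchr("_-.", s[n])))
		n++;
	return n;
}

/* Longest match wins; the strict '>' keeps the earlier rule on a tie. */
static enum token next_token(const char *s, size_t *len)
{
	const size_t lens[] = { match_legacy_cache(s), match_name(s) };
	const enum token toks[] = { TOK_LEGACY_CACHE, TOK_NAME };
	enum token best = TOK_NONE;

	*len = 0;
	for (size_t i = 0; i < 2; i++) {
		if (lens[i] > *len) {
			*len = lens[i];
			best = toks[i];
		}
	}
	return best;
}
```

In this sketch "branch-brs" becomes a name token because the name rule
matches all 10 characters while the legacy rule only matches "branch",
whereas "L1-dcache-load-misses" ties and so stays a legacy cache token.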

To handle raw events and terms the parser is changed to defer trying
to evaluate whether something is a raw event until the PMU is known in
the grammar. Once the PMU is known, the events of the PMU can be
scanned for the 'read' style problem. A new term type is added for
these raw terms, used to enable deferring the evaluation.
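
The deferred decision can be sketched as below. This is a standalone
illustration and not the fix_raw code in the patch: struct resolved and
resolve_raw() are hypothetical, and the PMU's events are modelled as a
plain array of names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result: either a named PMU event or a raw config value. */
struct resolved {
	bool is_event;
	uint64_t config;	/* only meaningful when !is_event */
};

/*
 * Decide what a raw-looking string such as "read" or "r0xead" means once
 * the PMU, and so its list of event names, is known.
 */
static struct resolved resolve_raw(const char *raw, const char *const events[],
				   int nr_events)
{
	struct resolved res = { false, 0 };

	/* An event of that name on the PMU has priority. */
	for (int i = 0; i < nr_events; i++) {
		if (!strcmp(events[i], raw)) {
			res.is_event = true;
			return res;
		}
	}
	/* Otherwise skip the leading 'r' and read the rest as hex. */
	res.config = strtoull(raw + 1, NULL, 16);
	return res;
}
```

With a PMU that has a "read" event the string resolves to that event,
and on a PMU without one the same string resolves to config 0xead.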

While this change is large, with stats of:
170 insertions(+), 436 deletions(-)
the bulk of it is deleting the old approach. It isn't possible to
split up the added code because of the dependencies between the parts
of the parser.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/parse-events.c | 24 +--
tools/perf/tests/pmu-events.c | 9 -
tools/perf/util/parse-events.c | 329 ++++++++++----------------------
tools/perf/util/parse-events.h | 16 +-
tools/perf/util/parse-events.l | 85 +--------
tools/perf/util/parse-events.y | 143 +++++---------
6 files changed, 170 insertions(+), 436 deletions(-)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 177464793aa8..6eadb8a47dbf 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -664,11 +664,11 @@ static int test__checkterms_simple(struct list_head *terms)
*/
term = list_entry(term->list.next, struct parse_events_term, list);
TEST_ASSERT_VAL("wrong type term",
- term->type_term == PARSE_EVENTS__TERM_TYPE_USER);
+ term->type_term == PARSE_EVENTS__TERM_TYPE_RAW);
TEST_ASSERT_VAL("wrong type val",
- term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
- TEST_ASSERT_VAL("wrong val", term->val.num == 1);
- TEST_ASSERT_VAL("wrong config", !strcmp(term->config, "read"));
+ term->type_val == PARSE_EVENTS__TERM_TYPE_STR);
+ TEST_ASSERT_VAL("wrong val", !strcmp(term->val.str, "read"));
+ TEST_ASSERT_VAL("wrong config", !strcmp(term->config, "raw"));

/*
* r0xead
@@ -678,11 +678,11 @@ static int test__checkterms_simple(struct list_head *terms)
*/
term = list_entry(term->list.next, struct parse_events_term, list);
TEST_ASSERT_VAL("wrong type term",
- term->type_term == PARSE_EVENTS__TERM_TYPE_CONFIG);
+ term->type_term == PARSE_EVENTS__TERM_TYPE_RAW);
TEST_ASSERT_VAL("wrong type val",
- term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
- TEST_ASSERT_VAL("wrong val", term->val.num == 0xead);
- TEST_ASSERT_VAL("wrong config", !strcmp(term->config, "config"));
+ term->type_val == PARSE_EVENTS__TERM_TYPE_STR);
+ TEST_ASSERT_VAL("wrong val", !strcmp(term->val.str, "r0xead"));
+ TEST_ASSERT_VAL("wrong config", !strcmp(term->config, "raw"));
return TEST_OK;
}

@@ -2090,7 +2090,6 @@ static int test_event_fake_pmu(const char *str)
return -ENOMEM;

parse_events_error__init(&err);
- perf_pmu__test_parse_init();
ret = __parse_events(evlist, str, &err, &perf_pmu__fake, /*warn_if_reordered=*/true);
if (ret) {
pr_debug("failed to parse event '%s', err %d, str '%s'\n",
@@ -2144,13 +2143,6 @@ static int test_term(const struct terms_test *t)

INIT_LIST_HEAD(&terms);

- /*
- * The perf_pmu__test_parse_init prepares perf_pmu_events_list
- * which gets freed in parse_events_terms.
- */
- if (perf_pmu__test_parse_init())
- return -1;
-
ret = parse_events_terms(&terms, t->str);
if (ret) {
pr_debug("failed to parse terms '%s', err %d\n",
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 1dff863b9711..a2cde61b1c77 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -776,15 +776,6 @@ static int check_parse_id(const char *id, struct parse_events_error *error,
for (cur = strchr(dup, '@') ; cur; cur = strchr(++cur, '@'))
*cur = '/';

- if (fake_pmu) {
- /*
- * Every call to __parse_events will try to initialize the PMU
- * state from sysfs and then clean it up at the end. Reset the
- * PMU events to the test state so that we don't pick up
- * erroneous prefixes and suffixes.
- */
- perf_pmu__test_parse_init();
- }
ret = __parse_events(evlist, dup, error, fake_pmu, /*warn_if_reordered=*/true);
free(dup);

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4ba01577618e..e416e653cf74 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -34,11 +34,6 @@

#define MAX_NAME_LEN 100

-struct perf_pmu_event_symbol {
- char *symbol;
- enum perf_pmu_event_symbol_type type;
-};
-
#ifdef PARSER_DEBUG
extern int parse_events_debug;
#endif
@@ -49,15 +44,6 @@ static int parse_events__with_hybrid_pmu(struct parse_events_state *parse_state,
const char *str, char *pmu_name,
struct list_head *list);

-static struct perf_pmu_event_symbol *perf_pmu_events_list;
-/*
- * The variable indicates the number of supported pmu event symbols.
- * 0 means not initialized and ready to init
- * -1 means failed to init, don't try anymore
- * >0 is the number of supported pmu event symbols
- */
-static int perf_pmu_events_list_num;
-
struct event_symbol event_symbols_hw[PERF_COUNT_HW_MAX] = {
[PERF_COUNT_HW_CPU_CYCLES] = {
.symbol = "cpu-cycles",
@@ -236,6 +222,57 @@ static char *get_config_name(struct list_head *head_terms)
return get_config_str(head_terms, PARSE_EVENTS__TERM_TYPE_NAME);
}

+/**
+ * fix_raw - For each raw term see if there is an event (aka alias) in pmu that
+ * matches the raw's string value. If the string value matches an
+ * event then change the term to be an event, if not then change it to
+ * be a config term. For example, "read" may be an event of the PMU or
+ * a raw hex encoding of 0xead. The fix-up is done late so the PMU of
+ * the event can be determined and we don't need to scan all PMUs
+ * ahead-of-time.
+ * @config_terms: the list of terms that may contain a raw term.
+ * @pmu: the PMU to scan for events from.
+ */
+static void fix_raw(struct list_head *config_terms, struct perf_pmu *pmu)
+{
+ struct parse_events_term *term;
+
+ list_for_each_entry(term, config_terms, list) {
+ struct perf_pmu_alias *alias;
+ bool matched = false;
+
+ if (term->type_term != PARSE_EVENTS__TERM_TYPE_RAW)
+ continue;
+
+ list_for_each_entry(alias, &pmu->aliases, list) {
+ if (!strcmp(alias->name, term->val.str)) {
+ free(term->config);
+ term->config = term->val.str;
+ term->type_val = PARSE_EVENTS__TERM_TYPE_NUM;
+ term->type_term = PARSE_EVENTS__TERM_TYPE_USER;
+ term->val.num = 1;
+ term->no_value = true;
+ matched = true;
+ break;
+ }
+ }
+ if (!matched) {
+ u64 num;
+
+ free(term->config);
+ term->config=strdup("config");
+ errno = 0;
+ num = strtoull(term->val.str + 1, NULL, 16);
+ assert(errno == 0);
+ free(term->val.str);
+ term->type_val = PARSE_EVENTS__TERM_TYPE_NUM;
+ term->type_term = PARSE_EVENTS__TERM_TYPE_CONFIG;
+ term->val.num = num;
+ term->no_value = false;
+ }
+ }
+}
+
static struct evsel *
__add_event(struct list_head *list, int *idx,
struct perf_event_attr *attr,
@@ -328,18 +365,27 @@ static int add_event_tool(struct list_head *list, int *idx,
return 0;
}

-static int parse_aliases(char *str, const char *const names[][EVSEL__MAX_ALIASES], int size)
+/**
+ * parse_aliases - search names for entries beginning or equalling str ignoring
+ * case. If multiple entries in names match str then the longest
+ * is chosen.
+ * @str: The needle to look for.
+ * @names: The haystack to search.
+ * @size: The size of the haystack.
+ * @longest: Out argument giving the length of the matching entry.
+ */
+static int parse_aliases(const char *str, const char *const names[][EVSEL__MAX_ALIASES], int size,
+ int *longest)
{
- int i, j;
- int n, longest = -1;
+ *longest = -1;
+ for (int i = 0; i < size; i++) {
+ for (int j = 0; j < EVSEL__MAX_ALIASES && names[i][j]; j++) {
+ int n = strlen(names[i][j]);

- for (i = 0; i < size; i++) {
- for (j = 0; j < EVSEL__MAX_ALIASES && names[i][j]; j++) {
- n = strlen(names[i][j]);
- if (n > longest && !strncasecmp(str, names[i][j], n))
- longest = n;
+ if (n > *longest && !strncasecmp(str, names[i][j], n))
+ *longest = n;
}
- if (longest > 0)
+ if (*longest > 0)
return i;
}

@@ -357,52 +403,58 @@ static int config_attr(struct perf_event_attr *attr,
struct parse_events_error *err,
config_term_func_t config_term);

-int parse_events_add_cache(struct list_head *list, int *idx,
- char *type, char *op_result1, char *op_result2,
+int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
struct parse_events_error *err,
struct list_head *head_config,
struct parse_events_state *parse_state)
{
struct perf_event_attr attr;
LIST_HEAD(config_terms);
- char name[MAX_NAME_LEN];
const char *config_name, *metric_id;
int cache_type = -1, cache_op = -1, cache_result = -1;
- char *op_result[2] = { op_result1, op_result2 };
- int i, n, ret;
+ int ret, len;
+ const char *name_end = &name[strlen(name) + 1];
bool hybrid;
+ const char *str = name;

/*
- * No fallback - if we cannot get a clear cache type
- * then bail out:
+ * Search str for the legacy cache event name composed of 1, 2 or 3
+ * hyphen separated sections. The first section is the cache type while
+ * the others are the optional op and optional result. To make life hard
+ * the names in the table also contain hyphens and the longest name
+ * should always be selected.
*/
- cache_type = parse_aliases(type, evsel__hw_cache, PERF_COUNT_HW_CACHE_MAX);
+ cache_type = parse_aliases(str, evsel__hw_cache, PERF_COUNT_HW_CACHE_MAX, &len);
if (cache_type == -1)
return -EINVAL;
+ str += len + 1;

config_name = get_config_name(head_config);
- n = snprintf(name, MAX_NAME_LEN, "%s", type);
-
- for (i = 0; (i < 2) && (op_result[i]); i++) {
- char *str = op_result[i];
-
- n += snprintf(name + n, MAX_NAME_LEN - n, "-%s", str);
-
- if (cache_op == -1) {
+ if (str < name_end) {
+ cache_op = parse_aliases(str, evsel__hw_cache_op,
+ PERF_COUNT_HW_CACHE_OP_MAX, &len);
+ if (cache_op >= 0) {
+ if (!evsel__is_cache_op_valid(cache_type, cache_op))
+ return -EINVAL;
+ str += len + 1;
+ } else {
+ cache_result = parse_aliases(str, evsel__hw_cache_result,
+ PERF_COUNT_HW_CACHE_RESULT_MAX, &len);
+ if (cache_result >= 0)
+ str += len + 1;
+ }
+ }
+ if (str < name_end) {
+ if (cache_op < 0) {
cache_op = parse_aliases(str, evsel__hw_cache_op,
- PERF_COUNT_HW_CACHE_OP_MAX);
+ PERF_COUNT_HW_CACHE_OP_MAX, &len);
if (cache_op >= 0) {
if (!evsel__is_cache_op_valid(cache_type, cache_op))
return -EINVAL;
- continue;
}
- }
-
- if (cache_result == -1) {
+ } else if (cache_result < 0) {
cache_result = parse_aliases(str, evsel__hw_cache_result,
- PERF_COUNT_HW_CACHE_RESULT_MAX);
- if (cache_result >= 0)
- continue;
+ PERF_COUNT_HW_CACHE_RESULT_MAX, &len);
}
}

@@ -968,6 +1020,7 @@ static const char *config_term_names[__PARSE_EVENTS__TERM_TYPE_NR] = {
[PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT] = "aux-output",
[PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE] = "aux-sample-size",
[PARSE_EVENTS__TERM_TYPE_METRIC_ID] = "metric-id",
+ [PARSE_EVENTS__TERM_TYPE_RAW] = "raw",
};

static bool config_term_shrinked;
@@ -1089,6 +1142,9 @@ do { \
case PARSE_EVENTS__TERM_TYPE_METRIC_ID:
CHECK_TYPE_VAL(STR);
break;
+ case PARSE_EVENTS__TERM_TYPE_RAW:
+ CHECK_TYPE_VAL(STR);
+ break;
case PARSE_EVENTS__TERM_TYPE_MAX_STACK:
CHECK_TYPE_VAL(NUM);
break;
@@ -1485,6 +1541,8 @@ int parse_events_add_pmu(struct parse_events_state *parse_state,
parse_events_error__handle(err, 0, err_str, NULL);
return -EINVAL;
}
+ if (head_config)
+ fix_raw(head_config, pmu);

if (pmu->default_config) {
memcpy(&attr, pmu->default_config,
@@ -1875,180 +1933,6 @@ int parse_events_name(struct list_head *list, const char *name)
return 0;
}

-static int
-comp_pmu(const void *p1, const void *p2)
-{
- struct perf_pmu_event_symbol *pmu1 = (struct perf_pmu_event_symbol *) p1;
- struct perf_pmu_event_symbol *pmu2 = (struct perf_pmu_event_symbol *) p2;
-
- return strcasecmp(pmu1->symbol, pmu2->symbol);
-}
-
-static void perf_pmu__parse_cleanup(void)
-{
- if (perf_pmu_events_list_num > 0) {
- struct perf_pmu_event_symbol *p;
- int i;
-
- for (i = 0; i < perf_pmu_events_list_num; i++) {
- p = perf_pmu_events_list + i;
- zfree(&p->symbol);
- }
- zfree(&perf_pmu_events_list);
- perf_pmu_events_list_num = 0;
- }
-}
-
-#define SET_SYMBOL(str, stype) \
-do { \
- p->symbol = str; \
- if (!p->symbol) \
- goto err; \
- p->type = stype; \
-} while (0)
-
-/*
- * Read the pmu events list from sysfs
- * Save it into perf_pmu_events_list
- */
-static void perf_pmu__parse_init(void)
-{
-
- struct perf_pmu *pmu = NULL;
- struct perf_pmu_alias *alias;
- int len = 0;
-
- pmu = NULL;
- while ((pmu = perf_pmu__scan(pmu)) != NULL) {
- list_for_each_entry(alias, &pmu->aliases, list) {
- char *tmp = strchr(alias->name, '-');
-
- if (tmp) {
- char *tmp2 = NULL;
-
- tmp2 = strchr(tmp + 1, '-');
- len++;
- if (tmp2)
- len++;
- }
-
- len++;
- }
- }
-
- if (len == 0) {
- perf_pmu_events_list_num = -1;
- return;
- }
- perf_pmu_events_list = malloc(sizeof(struct perf_pmu_event_symbol) * len);
- if (!perf_pmu_events_list)
- return;
- perf_pmu_events_list_num = len;
-
- len = 0;
- pmu = NULL;
- while ((pmu = perf_pmu__scan(pmu)) != NULL) {
- list_for_each_entry(alias, &pmu->aliases, list) {
- struct perf_pmu_event_symbol *p = perf_pmu_events_list + len;
- char *tmp = strchr(alias->name, '-');
- char *tmp2 = NULL;
-
- if (tmp)
- tmp2 = strchr(tmp + 1, '-');
- if (tmp2) {
- SET_SYMBOL(strndup(alias->name, tmp - alias->name),
- PMU_EVENT_SYMBOL_PREFIX);
- p++;
- tmp++;
- SET_SYMBOL(strndup(tmp, tmp2 - tmp), PMU_EVENT_SYMBOL_SUFFIX);
- p++;
- SET_SYMBOL(strdup(++tmp2), PMU_EVENT_SYMBOL_SUFFIX2);
- len += 3;
- } else if (tmp) {
- SET_SYMBOL(strndup(alias->name, tmp - alias->name),
- PMU_EVENT_SYMBOL_PREFIX);
- p++;
- SET_SYMBOL(strdup(++tmp), PMU_EVENT_SYMBOL_SUFFIX);
- len += 2;
- } else {
- SET_SYMBOL(strdup(alias->name), PMU_EVENT_SYMBOL);
- len++;
- }
- }
- }
- qsort(perf_pmu_events_list, len,
- sizeof(struct perf_pmu_event_symbol), comp_pmu);
-
- return;
-err:
- perf_pmu__parse_cleanup();
-}
-
-/*
- * This function injects special term in
- * perf_pmu_events_list so the test code
- * can check on this functionality.
- */
-int perf_pmu__test_parse_init(void)
-{
- struct perf_pmu_event_symbol *list, *tmp, symbols[] = {
- {(char *)"read", PMU_EVENT_SYMBOL},
- {(char *)"event", PMU_EVENT_SYMBOL_PREFIX},
- {(char *)"two", PMU_EVENT_SYMBOL_SUFFIX},
- {(char *)"hyphen", PMU_EVENT_SYMBOL_SUFFIX},
- {(char *)"hyph", PMU_EVENT_SYMBOL_SUFFIX2},
- };
- unsigned long i, j;
-
- tmp = list = malloc(sizeof(*list) * ARRAY_SIZE(symbols));
- if (!list)
- return -ENOMEM;
-
- for (i = 0; i < ARRAY_SIZE(symbols); i++, tmp++) {
- tmp->type = symbols[i].type;
- tmp->symbol = strdup(symbols[i].symbol);
- if (!tmp->symbol)
- goto err_free;
- }
-
- perf_pmu_events_list = list;
- perf_pmu_events_list_num = ARRAY_SIZE(symbols);
-
- qsort(perf_pmu_events_list, ARRAY_SIZE(symbols),
- sizeof(struct perf_pmu_event_symbol), comp_pmu);
- return 0;
-
-err_free:
- for (j = 0, tmp = list; j < i; j++, tmp++)
- zfree(&tmp->symbol);
- free(list);
- return -ENOMEM;
-}
-
-enum perf_pmu_event_symbol_type
-perf_pmu__parse_check(const char *name)
-{
- struct perf_pmu_event_symbol p, *r;
-
- /* scan kernel pmu events from sysfs if needed */
- if (perf_pmu_events_list_num == 0)
- perf_pmu__parse_init();
- /*
- * name "cpu" could be prefix of cpu-cycles or cpu// events.
- * cpu-cycles has been handled by hardcode.
- * So it must be cpu// events, not kernel pmu event.
- */
- if ((perf_pmu_events_list_num <= 0) || !strcmp(name, "cpu"))
- return PMU_EVENT_SYMBOL_ERR;
-
- p.symbol = strdup(name);
- r = bsearch(&p, perf_pmu_events_list,
- (size_t) perf_pmu_events_list_num,
- sizeof(struct perf_pmu_event_symbol), comp_pmu);
- zfree(&p.symbol);
- return r ? r->type : PMU_EVENT_SYMBOL_ERR;
-}
-
static int parse_events__scanner(const char *str,
struct parse_events_state *parse_state)
{
@@ -2086,7 +1970,6 @@ int parse_events_terms(struct list_head *terms, const char *str)
int ret;

ret = parse_events__scanner(str, &parse_state);
- perf_pmu__parse_cleanup();

if (!ret) {
list_splice(parse_state.terms, terms);
@@ -2111,7 +1994,6 @@ static int parse_events__with_hybrid_pmu(struct parse_events_state *parse_state,
int ret;

ret = parse_events__scanner(str, &ps);
- perf_pmu__parse_cleanup();

if (!ret) {
if (!list_empty(&ps.list)) {
@@ -2267,7 +2149,6 @@ int __parse_events(struct evlist *evlist, const char *str,
int ret;

ret = parse_events__scanner(str, &parse_state);
- perf_pmu__parse_cleanup();

if (!ret && list_empty(&parse_state.list)) {
WARN_ONCE(true, "WARNING: event parser found nothing\n");
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 86ad4438a2aa..f638542c8638 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -41,14 +41,6 @@ int parse_events_terms(struct list_head *terms, const char *str);
int parse_filter(const struct option *opt, const char *str, int unset);
int exclude_perf(const struct option *opt, const char *arg, int unset);

-enum perf_pmu_event_symbol_type {
- PMU_EVENT_SYMBOL_ERR, /* not a PMU EVENT */
- PMU_EVENT_SYMBOL, /* normal style PMU event */
- PMU_EVENT_SYMBOL_PREFIX, /* prefix of pre-suf style event */
- PMU_EVENT_SYMBOL_SUFFIX, /* suffix of pre-suf style event */
- PMU_EVENT_SYMBOL_SUFFIX2, /* suffix of pre-suf2 style event */
-};
-
enum {
PARSE_EVENTS__TERM_TYPE_NUM,
PARSE_EVENTS__TERM_TYPE_STR,
@@ -78,6 +70,7 @@ enum {
PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT,
PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE,
PARSE_EVENTS__TERM_TYPE_METRIC_ID,
+ PARSE_EVENTS__TERM_TYPE_RAW,
__PARSE_EVENTS__TERM_TYPE_NR,
};

@@ -174,8 +167,7 @@ int parse_events_add_numeric(struct parse_events_state *parse_state,
int parse_events_add_tool(struct parse_events_state *parse_state,
struct list_head *list,
int tool_event);
-int parse_events_add_cache(struct list_head *list, int *idx,
- char *type, char *op_result1, char *op_result2,
+int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
struct parse_events_error *error,
struct list_head *head_config,
struct parse_events_state *parse_state);
@@ -198,8 +190,6 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
int parse_events_copy_term_list(struct list_head *old,
struct list_head **new);

-enum perf_pmu_event_symbol_type
-perf_pmu__parse_check(const char *name);
void parse_events__set_leader(char *name, struct list_head *list);
void parse_events_update_lists(struct list_head *list_event,
struct list_head *list_all);
@@ -241,8 +231,6 @@ static inline bool is_sdt_event(char *str __maybe_unused)
}
#endif /* HAVE_LIBELF_SUPPORT */

-int perf_pmu__test_parse_init(void);
-
struct evsel *parse_events__add_event_hybrid(struct list_head *list, int *idx,
struct perf_event_attr *attr,
const char *name,
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 51fe0a9fb3de..4b35c099189a 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -63,17 +63,6 @@ static int str(yyscan_t scanner, int token)
return token;
}

-static int raw(yyscan_t scanner)
-{
- YYSTYPE *yylval = parse_events_get_lval(scanner);
- char *text = parse_events_get_text(scanner);
-
- if (perf_pmu__parse_check(text) == PMU_EVENT_SYMBOL)
- return str(scanner, PE_NAME);
-
- return __value(yylval, text + 1, 16, PE_RAW);
-}
-
static bool isbpf_suffix(char *text)
{
int len = strlen(text);
@@ -131,35 +120,6 @@ do { \
yyless(0); \
} while (0)

-static int pmu_str_check(yyscan_t scanner, struct parse_events_state *parse_state)
-{
- YYSTYPE *yylval = parse_events_get_lval(scanner);
- char *text = parse_events_get_text(scanner);
-
- yylval->str = strdup(text);
-
- /*
- * If we're not testing then parse check determines the PMU event type
- * which if it isn't a PMU returns PE_NAME. When testing the result of
- * parse check can't be trusted so we return PE_PMU_EVENT_FAKE unless
- * an '!' is present in which case the text can't be a PMU name.
- */
- switch (perf_pmu__parse_check(text)) {
- case PMU_EVENT_SYMBOL_PREFIX:
- return PE_PMU_EVENT_PRE;
- case PMU_EVENT_SYMBOL_SUFFIX:
- return PE_PMU_EVENT_SUF;
- case PMU_EVENT_SYMBOL_SUFFIX2:
- return PE_PMU_EVENT_SUF2;
- case PMU_EVENT_SYMBOL:
- return parse_state->fake_pmu
- ? PE_PMU_EVENT_FAKE : PE_KERNEL_PMU_EVENT;
- default:
- return parse_state->fake_pmu && !strchr(text,'!')
- ? PE_PMU_EVENT_FAKE : PE_NAME;
- }
-}
-
static int sym(yyscan_t scanner, int type, int config)
{
YYSTYPE *yylval = parse_events_get_lval(scanner);
@@ -211,13 +171,15 @@ bpf_source [^,{}]+\.c[a-zA-Z0-9._]*
num_dec [0-9]+
num_hex 0x[a-fA-F0-9]+
num_raw_hex [a-fA-F0-9]+
-name [a-zA-Z_*?\[\]][a-zA-Z0-9_*?.\[\]!]*
+name [a-zA-Z_*?\[\]][a-zA-Z0-9_*?.\[\]!\-]*
name_tag [\'][a-zA-Z_*?\[\]][a-zA-Z0-9_*?\-,\.\[\]:=]*[\']
name_minus [a-zA-Z_*?][a-zA-Z0-9\-_*?.:]*
drv_cfg_term [a-zA-Z0-9_\.]+(=[a-zA-Z0-9_*?\.:]+)?
/* If you add a modifier you need to update check_modifier() */
modifier_event [ukhpPGHSDIWeb]+
modifier_bp [rwx]{1,3}
+lc_type (L1-dcache|l1-d|l1d|L1-data|L1-icache|l1-i|l1i|L1-instruction|LLC|L2|dTLB|d-tlb|Data-TLB|iTLB|i-tlb|Instruction-TLB|branch|branches|bpu|btb|bpc|node)
+lc_op_result (load|loads|read|store|stores|write|prefetch|prefetches|speculative-read|speculative-load|refs|Reference|ops|access|misses|miss)

%%

@@ -303,8 +265,8 @@ percore { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_PERCORE); }
aux-output { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT); }
aux-sample-size { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE); }
metric-id { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_METRIC_ID); }
-r{num_raw_hex} { return raw(yyscanner); }
-r0x{num_raw_hex} { return raw(yyscanner); }
+r{num_raw_hex} { return str(yyscanner, PE_RAW); }
+r0x{num_raw_hex} { return str(yyscanner, PE_RAW); }
, { return ','; }
"/" { BEGIN(INITIAL); return '/'; }
{name_minus} { return str(yyscanner, PE_NAME); }
@@ -359,47 +321,20 @@ system_time { return tool(yyscanner, PERF_TOOL_SYSTEM_TIME); }
bpf-output { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_BPF_OUTPUT); }
cgroup-switches { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CGROUP_SWITCHES); }

- /*
- * We have to handle the kernel PMU event cycles-ct/cycles-t/mem-loads/mem-stores separately.
- * Because the prefix cycles is mixed up with cpu-cycles.
- * loads and stores are mixed up with cache event
- */
-cycles-ct |
-cycles-t |
-mem-loads |
-mem-loads-aux |
-mem-stores |
-topdown-[a-z-]+ |
-tx-capacity-[a-z-]+ |
-el-capacity-[a-z-]+ { return str(yyscanner, PE_KERNEL_PMU_EVENT); }
-
-L1-dcache|l1-d|l1d|L1-data |
-L1-icache|l1-i|l1i|L1-instruction |
-LLC|L2 |
-dTLB|d-tlb|Data-TLB |
-iTLB|i-tlb|Instruction-TLB |
-branch|branches|bpu|btb|bpc |
-node { return str(yyscanner, PE_NAME_CACHE_TYPE); }
-
-load|loads|read |
-store|stores|write |
-prefetch|prefetches |
-speculative-read|speculative-load |
-refs|Reference|ops|access |
-misses|miss { return str(yyscanner, PE_NAME_CACHE_OP_RESULT); }
-
+{lc_type} { return str(yyscanner, PE_LEGACY_CACHE); }
+{lc_type}-{lc_op_result} { return str(yyscanner, PE_LEGACY_CACHE); }
+{lc_type}-{lc_op_result}-{lc_op_result} { return str(yyscanner, PE_LEGACY_CACHE); }
mem: { BEGIN(mem); return PE_PREFIX_MEM; }
-r{num_raw_hex} { return raw(yyscanner); }
+r{num_raw_hex} { return str(yyscanner, PE_RAW); }
{num_dec} { return value(yyscanner, 10); }
{num_hex} { return value(yyscanner, 16); }

{modifier_event} { return str(yyscanner, PE_MODIFIER_EVENT); }
{bpf_object} { if (!isbpf(yyscanner)) { USER_REJECT }; return str(yyscanner, PE_BPF_OBJECT); }
{bpf_source} { if (!isbpf(yyscanner)) { USER_REJECT }; return str(yyscanner, PE_BPF_SOURCE); }
-{name} { return pmu_str_check(yyscanner, _parse_state); }
+{name} { return str(yyscanner, PE_NAME); }
{name_tag} { return str(yyscanner, PE_NAME); }
"/" { BEGIN(config); return '/'; }
-- { return '-'; }
, { BEGIN(event); return ','; }
: { return ':'; }
"{" { BEGIN(event); return '{'; }
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 4488443e506e..e7072b5601c5 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -8,6 +8,7 @@

#define YYDEBUG 1

+#include <errno.h>
#include <fnmatch.h>
#include <stdio.h>
#include <linux/compiler.h>
@@ -52,36 +53,35 @@ static void free_list_evsel(struct list_head* list_evsel)
%}

%token PE_START_EVENTS PE_START_TERMS
-%token PE_VALUE PE_VALUE_SYM_HW PE_VALUE_SYM_SW PE_RAW PE_TERM
+%token PE_VALUE PE_VALUE_SYM_HW PE_VALUE_SYM_SW PE_TERM
%token PE_VALUE_SYM_TOOL
%token PE_EVENT_NAME
-%token PE_NAME
+%token PE_RAW PE_NAME
%token PE_BPF_OBJECT PE_BPF_SOURCE
%token PE_MODIFIER_EVENT PE_MODIFIER_BP
-%token PE_NAME_CACHE_TYPE PE_NAME_CACHE_OP_RESULT
+%token PE_LEGACY_CACHE
%token PE_PREFIX_MEM PE_PREFIX_RAW PE_PREFIX_GROUP
%token PE_ERROR
-%token PE_PMU_EVENT_PRE PE_PMU_EVENT_SUF PE_PMU_EVENT_SUF2 PE_KERNEL_PMU_EVENT PE_PMU_EVENT_FAKE
+%token PE_KERNEL_PMU_EVENT PE_PMU_EVENT_FAKE
%token PE_ARRAY_ALL PE_ARRAY_RANGE
%token PE_DRV_CFG_TERM
%type <num> PE_VALUE
%type <num> PE_VALUE_SYM_HW
%type <num> PE_VALUE_SYM_SW
%type <num> PE_VALUE_SYM_TOOL
-%type <num> PE_RAW
%type <num> PE_TERM
%type <num> value_sym
+%type <str> PE_RAW
%type <str> PE_NAME
%type <str> PE_BPF_OBJECT
%type <str> PE_BPF_SOURCE
-%type <str> PE_NAME_CACHE_TYPE
-%type <str> PE_NAME_CACHE_OP_RESULT
+%type <str> PE_LEGACY_CACHE
%type <str> PE_MODIFIER_EVENT
%type <str> PE_MODIFIER_BP
%type <str> PE_EVENT_NAME
-%type <str> PE_PMU_EVENT_PRE PE_PMU_EVENT_SUF PE_PMU_EVENT_SUF2 PE_KERNEL_PMU_EVENT PE_PMU_EVENT_FAKE
+%type <str> PE_KERNEL_PMU_EVENT PE_PMU_EVENT_FAKE
%type <str> PE_DRV_CFG_TERM
-%type <str> event_pmu_name
+%type <str> name_or_raw
%destructor { free ($$); } <str>
%type <term> event_term
%destructor { parse_events_term__delete ($$); } <term>
@@ -273,11 +273,8 @@ event_def: event_pmu |
event_legacy_raw sep_dc |
event_bpf_file

-event_pmu_name:
-PE_NAME | PE_PMU_EVENT_PRE
-
event_pmu:
-event_pmu_name opt_pmu_config
+PE_NAME opt_pmu_config
{
struct parse_events_state *parse_state = _parse_state;
struct parse_events_error *error = parse_state->error;
@@ -303,10 +300,12 @@ event_pmu_name opt_pmu_config
list = alloc_list();
if (!list)
CLEANUP_YYABORT;
+ /* Attempt to add to list assuming $1 is a PMU name. */
if (parse_events_add_pmu(_parse_state, list, $1, $2, /*auto_merge_stats=*/false)) {
struct perf_pmu *pmu = NULL;
int ok = 0;

+ /* Failure to add, try wildcard expansion of $1 as a PMU name. */
if (asprintf(&pattern, "%s*", $1) < 0)
CLEANUP_YYABORT;

@@ -329,6 +328,12 @@ event_pmu_name opt_pmu_config
}
}

+ if (!ok) {
+ /* Failure to add, assume $1 is an event name. */
+ zfree(&list);
+ ok = !parse_events_multi_pmu_add(_parse_state, $1, $2, &list);
+ $2 = NULL;
+ }
if (!ok)
CLEANUP_YYABORT;
}
@@ -352,41 +357,27 @@ PE_KERNEL_PMU_EVENT sep_dc
$$ = list;
}
|
-PE_KERNEL_PMU_EVENT opt_pmu_config
+PE_NAME sep_dc
{
struct list_head *list;
int err;

- /* frees $2 */
- err = parse_events_multi_pmu_add(_parse_state, $1, $2, &list);
+ err = parse_events_multi_pmu_add(_parse_state, $1, NULL, &list);
free($1);
if (err < 0)
YYABORT;
$$ = list;
}
|
-PE_PMU_EVENT_PRE '-' PE_PMU_EVENT_SUF '-' PE_PMU_EVENT_SUF2 sep_dc
-{
- struct list_head *list;
- char pmu_name[128];
- snprintf(pmu_name, sizeof(pmu_name), "%s-%s-%s", $1, $3, $5);
- free($1);
- free($3);
- free($5);
- if (parse_events_multi_pmu_add(_parse_state, pmu_name, NULL, &list) < 0)
- YYABORT;
- $$ = list;
-}
-|
-PE_PMU_EVENT_PRE '-' PE_PMU_EVENT_SUF sep_dc
+PE_KERNEL_PMU_EVENT opt_pmu_config
{
struct list_head *list;
- char pmu_name[128];
+ int err;

- snprintf(pmu_name, sizeof(pmu_name), "%s-%s", $1, $3);
+ /* frees $2 */
+ err = parse_events_multi_pmu_add(_parse_state, $1, $2, &list);
free($1);
- free($3);
- if (parse_events_multi_pmu_add(_parse_state, pmu_name, NULL, &list) < 0)
+ if (err < 0)
YYABORT;
$$ = list;
}
@@ -476,7 +467,7 @@ PE_VALUE_SYM_TOOL sep_slash_slash_dc
}

event_legacy_cache:
-PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT '-' PE_NAME_CACHE_OP_RESULT opt_event_config
+PE_LEGACY_CACHE opt_event_config
{
struct parse_events_state *parse_state = _parse_state;
struct parse_events_error *error = parse_state->error;
@@ -485,51 +476,8 @@ PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT '-' PE_NAME_CACHE_OP_RESULT opt_e

list = alloc_list();
ABORT_ON(!list);
- err = parse_events_add_cache(list, &parse_state->idx, $1, $3, $5, error, $6,
- parse_state);
- parse_events_terms__delete($6);
- free($1);
- free($3);
- free($5);
- if (err) {
- free_list_evsel(list);
- YYABORT;
- }
- $$ = list;
-}
-|
-PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT opt_event_config
-{
- struct parse_events_state *parse_state = _parse_state;
- struct parse_events_error *error = parse_state->error;
- struct list_head *list;
- int err;
+ err = parse_events_add_cache(list, &parse_state->idx, $1, error, $2, parse_state);

- list = alloc_list();
- ABORT_ON(!list);
- err = parse_events_add_cache(list, &parse_state->idx, $1, $3, NULL, error, $4,
- parse_state);
- parse_events_terms__delete($4);
- free($1);
- free($3);
- if (err) {
- free_list_evsel(list);
- YYABORT;
- }
- $$ = list;
-}
-|
-PE_NAME_CACHE_TYPE opt_event_config
-{
- struct parse_events_state *parse_state = _parse_state;
- struct parse_events_error *error = parse_state->error;
- struct list_head *list;
- int err;
-
- list = alloc_list();
- ABORT_ON(!list);
- err = parse_events_add_cache(list, &parse_state->idx, $1, NULL, NULL, error, $2,
- parse_state);
parse_events_terms__delete($2);
free($1);
if (err) {
@@ -633,17 +581,6 @@ tracepoint_name opt_event_config
}

tracepoint_name:
-PE_NAME '-' PE_NAME ':' PE_NAME
-{
- struct tracepoint_name tracepoint;
-
- ABORT_ON(asprintf(&tracepoint.sys, "%s-%s", $1, $3) < 0);
- tracepoint.event = $5;
- free($1);
- free($3);
- $$ = tracepoint;
-}
-|
PE_NAME ':' PE_NAME
{
struct tracepoint_name tracepoint = {$1, $3};
@@ -673,10 +610,15 @@ PE_RAW opt_event_config
{
struct list_head *list;
int err;
+ u64 num;

list = alloc_list();
ABORT_ON(!list);
- err = parse_events_add_numeric(_parse_state, list, PERF_TYPE_RAW, $1, $2);
+ errno = 0;
+ num = strtoull($1 + 1, NULL, 16);
+ ABORT_ON(errno);
+ free($1);
+ err = parse_events_add_numeric(_parse_state, list, PERF_TYPE_RAW, num, $2);
parse_events_terms__delete($2);
if (err) {
free(list);
@@ -781,17 +723,22 @@ event_term
$$ = head;
}

+name_or_raw: PE_RAW | PE_NAME
+
event_term:
PE_RAW
{
struct parse_events_term *term;

- ABORT_ON(parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_CONFIG,
- NULL, $1, false, &@1, NULL));
+ if (parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_RAW,
+ strdup("raw"), $1, &@1, &@1)) {
+ free($1);
+ YYABORT;
+ }
$$ = term;
}
|
-PE_NAME '=' PE_NAME
+name_or_raw '=' PE_NAME
{
struct parse_events_term *term;

@@ -804,7 +751,7 @@ PE_NAME '=' PE_NAME
$$ = term;
}
|
-PE_NAME '=' PE_VALUE
+name_or_raw '=' PE_VALUE
{
struct parse_events_term *term;

@@ -816,7 +763,7 @@ PE_NAME '=' PE_VALUE
$$ = term;
}
|
-PE_NAME '=' PE_VALUE_SYM_HW
+name_or_raw '=' PE_VALUE_SYM_HW
{
struct parse_events_term *term;
int config = $3 & 255;
@@ -876,7 +823,7 @@ PE_TERM
$$ = term;
}
|
-PE_NAME array '=' PE_NAME
+name_or_raw array '=' PE_NAME
{
struct parse_events_term *term;

@@ -891,7 +838,7 @@ PE_NAME array '=' PE_NAME
$$ = term;
}
|
-PE_NAME array '=' PE_VALUE
+name_or_raw array '=' PE_VALUE
{
struct parse_events_term *term;

--
2.40.1.495.gc816e09b53d-goog
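
For illustration only, the late fix-up performed by fix_raw() in the
patch above can be sketched outside of perf. The names toy_pmu and
toy_resolve_raw are hypothetical and not part of the patch; the sketch
only shows why a token such as "read" stays ambiguous until the PMU is
known: it is an event if the PMU has an alias of that name, otherwise
it is the raw hex encoding 0xead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a PMU's alias list; not the perf structure. */
struct toy_pmu {
	const char *const *aliases;
	size_t nr_aliases;
};

/*
 * Resolve a raw-looking token such as "read" or "r1a" against one PMU.
 * Returns true and leaves *config untouched when the token names an event
 * of the PMU. Otherwise returns false and stores the hex value of the
 * characters after the leading 'r' in *config.
 */
static bool toy_resolve_raw(const struct toy_pmu *pmu, const char *tok,
			    uint64_t *config)
{
	for (size_t i = 0; i < pmu->nr_aliases; i++) {
		if (!strcmp(pmu->aliases[i], tok))
			return true;
	}
	*config = strtoull(tok + 1, NULL, 16);
	return false;
}
```

Because the lookup is against a single PMU, nothing needs to scan
every PMU's aliases before parsing starts.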

2023-04-26 07:07:52

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v1 16/40] perf test: Validate events with hyphens in

The rewritten event parser can handle event names that contain
components of legacy events, so the test no longer needs to skip PMU
events whose names contain a '-'.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/parse-events.c | 12 ------------
1 file changed, 12 deletions(-)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 6eadb8a47dbf..cb976765b8b0 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -2198,18 +2198,6 @@ static int test__pmu_events(struct test_suite *test __maybe_unused, int subtest
ret = combine_test_results(ret, TEST_SKIP);
continue;
}
- /*
- * Names containing '-' are recognized as prefixes and suffixes
- * due to '-' being a legacy PMU separator. This fails when the
- * prefix or suffix collides with an existing legacy token. For
- * example, branch-brs has a prefix (branch) that collides with
- * a PE_NAME_CACHE_TYPE token causing a parse error as a suffix
- * isn't expected after this. As event names in the config
- * slashes are allowed a '-' in the name we check this works
- * above.
- */
- if (strchr(ent->d_name, '-'))
- continue;

dir = opendir(path);
if (!dir) {
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:07:53

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v1 14/40] perf print-events: Avoid unnecessary strlist

The strlist in print_hwcache_events holds the event names as they are
generated, and then it is iterated and printed. This is unnecessary
and each event can just be printed as it is processed.
Rename the variable i to res to be more intention-revealing and
consistent with other code.
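
As a rough illustration of the config value passed to
is_event_supported() in this function, the packing can be written as a
standalone helper. The name toy_hw_cache_config is hypothetical, and
the shift of 32 for the PMU type is an assumption about the value of
PERF_PMU_TYPE_SHIFT rather than something this patch states.

```c
#include <stdint.h>

/* Assumed value of PERF_PMU_TYPE_SHIFT; check the uapi header before relying on it. */
#define TOY_PMU_TYPE_SHIFT 32

/*
 * Pack a legacy cache event the way the loop does: cache type in the low
 * byte, op in the next byte, result in the byte after that. On hybrid the
 * PMU type is placed in the upper bits; pass 0 for non-hybrid.
 */
static uint64_t toy_hw_cache_config(unsigned int type, unsigned int op,
				    unsigned int res, uint32_t pmu_type)
{
	return (uint64_t)type | ((uint64_t)op << 8) | ((uint64_t)res << 16) |
	       ((uint64_t)pmu_type << TOY_PMU_TYPE_SHIFT);
}
```

With type, op and res each iterating from 0, the three nested loops
visit every combination exactly once, which is why each name can be
printed as soon as it is generated.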

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/print-events.c | 60 ++++++++++++++++++----------------
1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/tools/perf/util/print-events.c b/tools/perf/util/print-events.c
index 386b1ab0b60e..93bbb868d400 100644
--- a/tools/perf/util/print-events.c
+++ b/tools/perf/util/print-events.c
@@ -226,58 +226,60 @@ void print_sdt_events(const struct print_callbacks *print_cb, void *print_state)

int print_hwcache_events(const struct print_callbacks *print_cb, void *print_state)
{
- struct strlist *evt_name_list = strlist__new(NULL, NULL);
- struct str_node *nd;
+ const char *event_type_descriptor = event_type_descriptors[PERF_TYPE_HW_CACHE];

- if (!evt_name_list) {
- pr_debug("Failed to allocate new strlist for hwcache events\n");
- return -ENOMEM;
- }
for (int type = 0; type < PERF_COUNT_HW_CACHE_MAX; type++) {
for (int op = 0; op < PERF_COUNT_HW_CACHE_OP_MAX; op++) {
/* skip invalid cache type */
if (!evsel__is_cache_op_valid(type, op))
continue;

- for (int i = 0; i < PERF_COUNT_HW_CACHE_RESULT_MAX; i++) {
+ for (int res = 0; res < PERF_COUNT_HW_CACHE_RESULT_MAX; res++) {
struct perf_pmu *pmu = NULL;
char name[64];

- __evsel__hw_cache_type_op_res_name(type, op, i, name, sizeof(name));
+ __evsel__hw_cache_type_op_res_name(type, op, res,
+ name, sizeof(name));
if (!perf_pmu__has_hybrid()) {
if (is_event_supported(PERF_TYPE_HW_CACHE,
- type | (op << 8) | (i << 16)))
- strlist__add(evt_name_list, name);
+ type | (op << 8) | (res << 16))) {
+ print_cb->print_event(print_state,
+ "cache",
+ /*pmu_name=*/NULL,
+ name,
+ /*event_alias=*/NULL,
+ /*scale_unit=*/NULL,
+ /*deprecated=*/false,
+ event_type_descriptor,
+ /*desc=*/NULL,
+ /*long_desc=*/NULL,
+ /*encoding_desc=*/NULL);
+ }
continue;
}
perf_pmu__for_each_hybrid_pmu(pmu) {
if (is_event_supported(PERF_TYPE_HW_CACHE,
- type | (op << 8) | (i << 16) |
+ type | (op << 8) | (res << 16) |
((__u64)pmu->type << PERF_PMU_TYPE_SHIFT))) {
char new_name[128];
- snprintf(new_name, sizeof(new_name),
- "%s/%s/", pmu->name, name);
- strlist__add(evt_name_list, new_name);
+ snprintf(new_name, sizeof(new_name),
+ "%s/%s/", pmu->name, name);
+ print_cb->print_event(print_state,
+ "cache",
+ pmu->name,
+ name,
+ new_name,
+ /*scale_unit=*/NULL,
+ /*deprecated=*/false,
+ event_type_descriptor,
+ /*desc=*/NULL,
+ /*long_desc=*/NULL,
+ /*encoding_desc=*/NULL);
}
}
}
}
}
-
- strlist__for_each_entry(nd, evt_name_list) {
- print_cb->print_event(print_state,
- "cache",
- /*pmu_name=*/NULL,
- nd->s,
- /*event_alias=*/NULL,
- /*scale_unit=*/NULL,
- /*deprecated=*/false,
- event_type_descriptors[PERF_TYPE_HW_CACHE],
- /*desc=*/NULL,
- /*long_desc=*/NULL,
- /*encoding_desc=*/NULL);
- }
- strlist__delete(evt_name_list);
return 0;
}

--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:08:35

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v1 17/40] perf evsel: Modify group pmu name for software events

If we have a group of {cycles,faults} then we need the faults software
event to appear to be on the same PMU as cycles so that we don't split
the group in parse_events__sort_events_and_fix_groups. This case is
relatively easy as cycles is the leader and will have a PMU name. In
the reverse case, {faults,cycles} we still need faults to appear to
have the PMU name of cycles but the old behavior is just to return
"cpu". For hybrid this fails as cycles will be on "cpu_core" or
"cpu_atom", causing faults to be split into a different group.

Change the behavior for software events so that the whole group is
searched, starting at the leader, for the first event with a named PMU.
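
A minimal sketch of the new lookup, using a hypothetical toy_evsel
array in place of the real evsel group list, and leaving out the AUX
event leader case that the real function also handles:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an evsel; pmu_name is NULL when unset. */
struct toy_evsel {
	const char *pmu_name;
	bool is_software;
};

/*
 * Return the PMU name used for grouping the event at index idx. group[0]
 * is the leader. A software event borrows the name of the first event in
 * the group that has one; "cpu" is the fallback.
 */
static const char *toy_group_pmu_name(const struct toy_evsel *group,
				      size_t nr, size_t idx)
{
	if (group[idx].pmu_name)
		return group[idx].pmu_name;
	if (group[idx].is_software) {
		for (size_t i = 0; i < nr; i++) {
			if (group[i].pmu_name)
				return group[i].pmu_name;
		}
	}
	return "cpu";
}
```

For a group modelled as {faults,cycles} with cycles on "cpu_core", the
sketch reports "cpu_core" for both events, so the software leader no
longer ends up in a different group from cycles.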

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/evsel.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 1cd04b5998d2..63522322e118 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -829,23 +829,26 @@ bool evsel__name_is(struct evsel *evsel, const char *name)

const char *evsel__group_pmu_name(const struct evsel *evsel)
{
- const struct evsel *leader;
+ struct evsel *leader, *pos;

/* If the pmu_name is set use it. pmu_name isn't set for CPU and software events. */
if (evsel->pmu_name)
return evsel->pmu_name;
/*
* Software events may be in a group with other uncore PMU events. Use
- * the pmu_name of the group leader to avoid breaking the software event
- * out of the group.
+ * the pmu_name of the first non-software event to avoid breaking the
+ * software event out of the group.
*
* Aux event leaders, like intel_pt, expect a group with events from
* other PMUs, so substitute the AUX event's PMU in this case.
*/
leader = evsel__leader(evsel);
- if ((evsel->core.attr.type == PERF_TYPE_SOFTWARE || evsel__is_aux_event(leader)) &&
- leader->pmu_name) {
- return leader->pmu_name;
+ if (evsel->core.attr.type == PERF_TYPE_SOFTWARE || evsel__is_aux_event(leader)) {
+ /* Starting with the leader, find the first event with a named PMU. */
+ for_each_group_evsel(pos, leader) {
+ if (pos->pmu_name)
+ return pos->pmu_name;
+ }
}

return "cpu";
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:09:01

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v1 19/40] perf test x86 hybrid: Don't assume evlist order

Switch to a loop rather than depending on evlist order in the raw
events test.
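
The shape of the order-independent check can be sketched as below. The
helper name toy_all_configs_match is hypothetical, and the mask value
is an assumption about PERF_HW_EVENT_MASK, not something taken from
this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value of PERF_HW_EVENT_MASK: keep the low 32 bits of config. */
#define TOY_HW_EVENT_MASK 0xffffffffULL

/*
 * Check every event's config instead of indexing the first and second
 * entries, so the result does not depend on how many PMUs there are or
 * in what order their events were added.
 */
static bool toy_all_configs_match(const uint64_t *configs, size_t nr,
				  uint64_t expected)
{
	for (size_t i = 0; i < nr; i++) {
		if ((configs[i] & TOY_HW_EVENT_MASK) != expected)
			return false;
	}
	return true;
}
```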

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/arch/x86/tests/hybrid.c | 26 ++++++++++++--------------
1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/tools/perf/arch/x86/tests/hybrid.c b/tools/perf/arch/x86/tests/hybrid.c
index 0f99cfd116ee..66486335652f 100644
--- a/tools/perf/arch/x86/tests/hybrid.c
+++ b/tools/perf/arch/x86/tests/hybrid.c
@@ -11,6 +11,11 @@ static bool test_config(const struct evsel *evsel, __u64 expected_config)
return (evsel->core.attr.config & PERF_HW_EVENT_MASK) == expected_config;
}

+static bool test_perf_config(const struct perf_evsel *evsel, __u64 expected_config)
+{
+ return (evsel->attr.config & PERF_HW_EVENT_MASK) == expected_config;
+}
+
static int test__hybrid_hw_event_with_pmu(struct evlist *evlist)
{
struct evsel *evsel = evlist__first(evlist);
@@ -93,22 +98,15 @@ static int test__hybrid_group_modifier1(struct evlist *evlist)

static int test__hybrid_raw1(struct evlist *evlist)
{
- struct evsel *evsel = evlist__first(evlist);
+ struct perf_evsel *evsel;

- if (!perf_pmu__hybrid_mounted("cpu_atom")) {
- TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
- return TEST_OK;
- }
+ perf_evlist__for_each_evsel(&evlist->core, evsel) {
+ struct perf_pmu *pmu = perf_pmu__find_by_type(evsel->attr.type);

- TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
-
- /* The type of second event is randome value */
- evsel = evsel__next(evsel);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
+ TEST_ASSERT_VAL("missing pmu", pmu);
+ TEST_ASSERT_VAL("unexpected pmu", !strncmp(pmu->name, "cpu_", 4));
+ TEST_ASSERT_VAL("wrong config", test_perf_config(evsel, 0x1a));
+ }
return TEST_OK;
}

--
2.40.1.495.gc816e09b53d-goog
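
The order-independent check in test__hybrid_raw1 can be pictured with the
following minimal sketch. It assumes a hypothetical struct toy_event holding
just a type and a config, and a TOY_HW_EVENT_MASK constant mirroring
PERF_HW_EVENT_MASK (the low 32 bits of the config); it is not the test code
itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HW_EVENT_MASK 0xffffffffULL /* mirrors PERF_HW_EVENT_MASK */

/* Hypothetical stand-in for the attr fields the test looks at. */
struct toy_event {
	uint32_t type;   /* PMU type; differs per hybrid PMU */
	uint64_t config; /* event encoding, PMU type may sit in the high bits */
};

/* True if every event carries the expected config, whatever the order. */
static bool toy_all_configs_match(const struct toy_event *events, size_t n,
				  uint64_t expected)
{
	assert(events || n == 0);

	for (size_t i = 0; i < n; i++) {
		if ((events[i].config & TOY_HW_EVENT_MASK) != expected)
			return false;
	}
	return true;
}
```

Because every entry is checked the same way, the result does not change if
the cpu_core and cpu_atom events swap places in the list.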

2023-04-26 07:09:23

by Ian Rogers

Subject: [PATCH v1 20/40] perf parse-events: Support PMUs for legacy cache events

Allow a legacy cache event to be written both as, for example,
"L1-dcache-load-miss" and as "cpu/L1-dcache-load-miss/" by introducing a
new legacy cache term type. The term type is processed in
config_term_pmu, which sets both the type and the config in
perf_event_attr. The code to determine the config is factored out of
parse_events_add_cache and shared. If the PMU doesn't support legacy
cache events (currently only core and hybrid PMUs do), the term is
treated like a PE_NAME term, as before. If only terms are being parsed,
such as for perf_pmu__new_alias, then the PE_LEGACY_CACHE token is
always parsed as PE_NAME.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/parse-events.c | 18 +++++++++
tools/perf/util/parse-events.c | 70 ++++++++++++++++++++++-----------
tools/perf/util/parse-events.h | 3 ++
tools/perf/util/parse-events.l | 9 ++++-
tools/perf/util/parse-events.y | 14 ++++++-
tools/perf/util/pmu.c | 5 +++
tools/perf/util/pmu.h | 1 +
7 files changed, 96 insertions(+), 24 deletions(-)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 15fec7f01315..6aea51e33dc0 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1861,6 +1861,24 @@ static const struct evlist_test test__events_pmu[] = {
.check = test__checkevent_raw_pmu,
/* 5 */
},
+ {
+ .name = "cpu/L1-dcache-load-miss/",
+ .valid = test__pmu_cpu_valid,
+ .check = test__checkevent_genhw,
+ /* 6 */
+ },
+ {
+ .name = "cpu/L1-dcache-load-miss/kp",
+ .valid = test__pmu_cpu_valid,
+ .check = test__checkevent_genhw_modifier,
+ /* 7 */
+ },
+ {
+ .name = "cpu/L1-dcache-misses,name=cachepmu/",
+ .valid = test__pmu_cpu_valid,
+ .check = test__checkevent_config_cache,
+ /* 8 */
+ },
};

struct terms_test {
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index e416e653cf74..9b2d7b6572c2 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -403,33 +403,27 @@ static int config_attr(struct perf_event_attr *attr,
struct parse_events_error *err,
config_term_func_t config_term);

-int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
- struct parse_events_error *err,
- struct list_head *head_config,
- struct parse_events_state *parse_state)
+/**
+ * parse_events__decode_legacy_cache - Search name for the legacy cache event
+ * name composed of 1, 2 or 3 hyphen
+ * separated sections. The first section is
+ * the cache type while the others are the
+ * optional op and optional result. To make
+ * life hard the names in the table also
+ * contain hyphens and the longest name
+ * should always be selected.
+ */
+static int parse_events__decode_legacy_cache(const char *name, int pmu_type, __u64 *config)
{
- struct perf_event_attr attr;
- LIST_HEAD(config_terms);
- const char *config_name, *metric_id;
- int cache_type = -1, cache_op = -1, cache_result = -1;
- int ret, len;
+ int len, cache_type = -1, cache_op = -1, cache_result = -1;
const char *name_end = &name[strlen(name) + 1];
- bool hybrid;
const char *str = name;

- /*
- * Search str for the legacy cache event name composed of 1, 2 or 3
- * hyphen separated sections. The first section is the cache type while
- * the others are the optional op and optional result. To make life hard
- * the names in the table also contain hyphens and the longest name
- * should always be selected.
- */
cache_type = parse_aliases(str, evsel__hw_cache, PERF_COUNT_HW_CACHE_MAX, &len);
if (cache_type == -1)
return -EINVAL;
str += len + 1;

- config_name = get_config_name(head_config);
if (str < name_end) {
cache_op = parse_aliases(str, evsel__hw_cache_op,
PERF_COUNT_HW_CACHE_OP_MAX, &len);
@@ -470,9 +464,28 @@ int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
if (cache_result == -1)
cache_result = PERF_COUNT_HW_CACHE_RESULT_ACCESS;

+ *config = ((__u64)pmu_type << PERF_PMU_TYPE_SHIFT) |
+ cache_type | (cache_op << 8) | (cache_result << 16);
+ return 0;
+}
+
+int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
+ struct parse_events_error *err,
+ struct list_head *head_config,
+ struct parse_events_state *parse_state)
+{
+ struct perf_event_attr attr;
+ LIST_HEAD(config_terms);
+ const char *config_name, *metric_id;
+ int ret;
+ bool hybrid;
+
+
memset(&attr, 0, sizeof(attr));
- attr.config = cache_type | (cache_op << 8) | (cache_result << 16);
attr.type = PERF_TYPE_HW_CACHE;
+ ret = parse_events__decode_legacy_cache(name, /*pmu_type=*/0, &attr.config);
+ if (ret)
+ return ret;

if (head_config) {
if (config_attr(&attr, head_config, err,
@@ -483,6 +496,7 @@ int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
return -ENOMEM;
}

+ config_name = get_config_name(head_config);
metric_id = get_config_metric_id(head_config);
ret = parse_events__add_cache_hybrid(list, idx, &attr,
config_name ? : name,
@@ -1021,6 +1035,7 @@ static const char *config_term_names[__PARSE_EVENTS__TERM_TYPE_NR] = {
[PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE] = "aux-sample-size",
[PARSE_EVENTS__TERM_TYPE_METRIC_ID] = "metric-id",
[PARSE_EVENTS__TERM_TYPE_RAW] = "raw",
+ [PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE] = "legacy-cache",
};

static bool config_term_shrinked;
@@ -1198,15 +1213,25 @@ static int config_term_pmu(struct perf_event_attr *attr,
struct parse_events_term *term,
struct parse_events_error *err)
{
+ if (term->type_term == PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE) {
+ const struct perf_pmu *pmu = perf_pmu__find_by_type(attr->type);
+
+ if (perf_pmu__supports_legacy_cache(pmu)) {
+ attr->type = PERF_TYPE_HW_CACHE;
+ return parse_events__decode_legacy_cache(term->config, pmu->type,
+ &attr->config);
+ } else
+ term->type_term = PARSE_EVENTS__TERM_TYPE_USER;
+ }
if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER ||
- term->type_term == PARSE_EVENTS__TERM_TYPE_DRV_CFG)
+ term->type_term == PARSE_EVENTS__TERM_TYPE_DRV_CFG) {
/*
* Always succeed for sysfs terms, as we dont know
* at this point what type they need to have.
*/
return 0;
- else
- return config_term_common(attr, term, err);
+ }
+ return config_term_common(attr, term, err);
}

#ifdef HAVE_LIBTRACEEVENT
@@ -2145,6 +2170,7 @@ int __parse_events(struct evlist *evlist, const char *str,
.evlist = evlist,
.stoken = PE_START_EVENTS,
.fake_pmu = fake_pmu,
+ .match_legacy_cache_terms = true,
};
int ret;

diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index f638542c8638..5acb62c2e00a 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -71,6 +71,7 @@ enum {
PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE,
PARSE_EVENTS__TERM_TYPE_METRIC_ID,
PARSE_EVENTS__TERM_TYPE_RAW,
+ PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE,
__PARSE_EVENTS__TERM_TYPE_NR,
};

@@ -122,6 +123,8 @@ struct parse_events_state {
int stoken;
struct perf_pmu *fake_pmu;
char *hybrid_pmu_name;
+ /* Should PE_LEGACY_CACHE tokens be generated for config terms? */
+ bool match_legacy_cache_terms;
bool wild_card_pmus;
};

diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 4b35c099189a..abe0ce681d29 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -63,6 +63,11 @@ static int str(yyscan_t scanner, int token)
return token;
}

+static int lc_str(yyscan_t scanner, const struct parse_events_state *state)
+{
+ return str(scanner, state->match_legacy_cache_terms ? PE_LEGACY_CACHE : PE_NAME);
+}
+
static bool isbpf_suffix(char *text)
{
int len = strlen(text);
@@ -185,7 +190,6 @@ lc_op_result (load|loads|read|store|stores|write|prefetch|prefetches|speculative

%{
struct parse_events_state *_parse_state = parse_events_get_extra(yyscanner);
-
{
int start_token = _parse_state->stoken;

@@ -269,6 +273,9 @@ r{num_raw_hex} { return str(yyscanner, PE_RAW); }
r0x{num_raw_hex} { return str(yyscanner, PE_RAW); }
, { return ','; }
"/" { BEGIN(INITIAL); return '/'; }
+{lc_type} { return lc_str(yyscanner, _parse_state); }
+{lc_type}-{lc_op_result} { return lc_str(yyscanner, _parse_state); }
+{lc_type}-{lc_op_result}-{lc_op_result} { return lc_str(yyscanner, _parse_state); }
{name_minus} { return str(yyscanner, PE_NAME); }
\[all\] { return PE_ARRAY_ALL; }
"[" { BEGIN(array); return '['; }
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index e7072b5601c5..f84fa1b132b3 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -723,7 +723,7 @@ event_term
$$ = head;
}

-name_or_raw: PE_RAW | PE_NAME
+name_or_raw: PE_RAW | PE_NAME | PE_LEGACY_CACHE

event_term:
PE_RAW
@@ -775,6 +775,18 @@ name_or_raw '=' PE_VALUE_SYM_HW
$$ = term;
}
|
+PE_LEGACY_CACHE
+{
+ struct parse_events_term *term;
+
+ if (parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE,
+ $1, 1, true, &@1, NULL)) {
+ free($1);
+ YYABORT;
+ }
+ $$ = term;
+}
+|
PE_NAME
{
struct parse_events_term *term;
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index cb33d869f1ed..63071d876190 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1650,6 +1650,11 @@ bool is_pmu_core(const char *name)
return !strcmp(name, "cpu") || is_arm_pmu_core(name);
}

+bool perf_pmu__supports_legacy_cache(const struct perf_pmu *pmu)
+{
+ return is_pmu_core(pmu->name) || perf_pmu__is_hybrid(pmu->name);
+}
+
static bool pmu_alias_is_duplicate(struct sevent *alias_a,
struct sevent *alias_b)
{
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index b9a02dedd473..05702bc4bcf8 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -220,6 +220,7 @@ void perf_pmu__del_formats(struct list_head *formats);
struct perf_pmu *perf_pmu__scan(struct perf_pmu *pmu);

bool is_pmu_core(const char *name);
+bool perf_pmu__supports_legacy_cache(const struct perf_pmu *pmu);
void print_pmu_events(const struct print_callbacks *print_cb, void *print_state);
bool pmu_have_event(const char *pname, const char *name);

--
2.40.1.495.gc816e09b53d-goog
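
To make the config layout computed by parse_events__decode_legacy_cache
concrete, here is a small standalone sketch of the bit packing. The TOY_
constants are local copies of values from the perf_event uapi header
(PERF_PMU_TYPE_SHIFT and the perf_hw_cache_* enums); the helper name and the
PMU type value 8 used in the note below are assumptions for illustration
only.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PMU_TYPE_SHIFT 32 /* mirrors PERF_PMU_TYPE_SHIFT */

/* Local copies of a few perf_hw_cache_id / op / result values. */
enum { TOY_CACHE_L1D = 0, TOY_CACHE_LL = 2 };
enum { TOY_OP_READ = 0, TOY_OP_WRITE = 1, TOY_OP_PREFETCH = 2 };
enum { TOY_RESULT_ACCESS = 0, TOY_RESULT_MISS = 1 };

/*
 * Pack a legacy cache event: cache id in bits 0-7, op in bits 8-15,
 * result in bits 16-23 and the PMU type in the upper 32 bits.
 */
static uint64_t toy_legacy_cache_config(uint32_t pmu_type, int cache_type,
					int cache_op, int cache_result)
{
	assert(cache_type >= 0 && cache_op >= 0 && cache_result >= 0);

	return ((uint64_t)pmu_type << TOY_PMU_TYPE_SHIFT) |
	       (uint64_t)cache_type |
	       ((uint64_t)cache_op << 8) |
	       ((uint64_t)cache_result << 16);
}
```

Under these assumptions an L1-dcache load miss with pmu_type 0 packs to
0x10000, and LLC-loads on a PMU of type 8 packs to 0x800000002, whose low
32 bits are 0x2.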

2023-04-26 07:09:24

by Ian Rogers

Subject: [PATCH v1 21/40] perf parse-events: Wildcard legacy cache events

It is inconsistent that "perf stat -e instructions-retired" wildcard
opens on all PMUs while legacy cache events like "perf stat -e
L1-dcache-load-miss" do not. A behavior introduced by hybrid is that a
legacy cache event like L1-dcache-load-miss should wildcard open on
all hybrid PMUs. A call to is_event_supported is necessary for each
PMU; if it fails, the event is not added for that PMU. Rather than
special-casing that logic, move it into the main legacy cache event
case and attempt to open legacy cache events on all PMUs.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/parse-events-hybrid.c | 33 -------------
tools/perf/util/parse-events-hybrid.h | 7 ---
tools/perf/util/parse-events.c | 70 ++++++++++++++-------------
tools/perf/util/parse-events.h | 3 +-
tools/perf/util/parse-events.y | 2 +-
5 files changed, 39 insertions(+), 76 deletions(-)

diff --git a/tools/perf/util/parse-events-hybrid.c b/tools/perf/util/parse-events-hybrid.c
index 7c9f9150bad5..d2c0be051d46 100644
--- a/tools/perf/util/parse-events-hybrid.c
+++ b/tools/perf/util/parse-events-hybrid.c
@@ -179,36 +179,3 @@ int parse_events__add_numeric_hybrid(struct parse_events_state *parse_state,
return add_raw_hybrid(parse_state, list, attr, name, metric_id,
config_terms);
}
-
-int parse_events__add_cache_hybrid(struct list_head *list, int *idx,
- struct perf_event_attr *attr,
- const char *name,
- const char *metric_id,
- struct list_head *config_terms,
- bool *hybrid,
- struct parse_events_state *parse_state)
-{
- struct perf_pmu *pmu;
- int ret;
-
- *hybrid = false;
- if (!perf_pmu__has_hybrid())
- return 0;
-
- *hybrid = true;
- perf_pmu__for_each_hybrid_pmu(pmu) {
- LIST_HEAD(terms);
-
- if (pmu_cmp(parse_state, pmu))
- continue;
-
- copy_config_terms(&terms, config_terms);
- ret = create_event_hybrid(PERF_TYPE_HW_CACHE, idx, list,
- attr, name, metric_id, &terms, pmu);
- free_config_terms(&terms);
- if (ret)
- return ret;
- }
-
- return 0;
-}
diff --git a/tools/perf/util/parse-events-hybrid.h b/tools/perf/util/parse-events-hybrid.h
index cbc05fec02a2..bc2966e73897 100644
--- a/tools/perf/util/parse-events-hybrid.h
+++ b/tools/perf/util/parse-events-hybrid.h
@@ -15,11 +15,4 @@ int parse_events__add_numeric_hybrid(struct parse_events_state *parse_state,
struct list_head *config_terms,
bool *hybrid);

-int parse_events__add_cache_hybrid(struct list_head *list, int *idx,
- struct perf_event_attr *attr,
- const char *name, const char *metric_id,
- struct list_head *config_terms,
- bool *hybrid,
- struct parse_events_state *parse_state);
-
#endif /* __PERF_PARSE_EVENTS_HYBRID_H */
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 9b2d7b6572c2..e007b2bc1ab4 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -471,46 +471,50 @@ static int parse_events__decode_legacy_cache(const char *name, int pmu_type, __u

int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
struct parse_events_error *err,
- struct list_head *head_config,
- struct parse_events_state *parse_state)
+ struct list_head *head_config)
{
- struct perf_event_attr attr;
- LIST_HEAD(config_terms);
- const char *config_name, *metric_id;
- int ret;
- bool hybrid;
+ struct perf_pmu *pmu = NULL;
+ bool found_supported = false;
+ const char *config_name = get_config_name(head_config);
+ const char *metric_id = get_config_metric_id(head_config);

+ while ((pmu = perf_pmu__scan(pmu)) != NULL) {
+ LIST_HEAD(config_terms);
+ struct perf_event_attr attr;
+ int ret;

- memset(&attr, 0, sizeof(attr));
- attr.type = PERF_TYPE_HW_CACHE;
- ret = parse_events__decode_legacy_cache(name, /*pmu_type=*/0, &attr.config);
- if (ret)
- return ret;
+ /*
+ * Skip uncore PMUs for performance. Software PMUs can open
+ * PERF_TYPE_HW_CACHE, so skip.
+ */
+ if (pmu->is_uncore || pmu->type == PERF_TYPE_SOFTWARE)
+ continue;

- if (head_config) {
- if (config_attr(&attr, head_config, err,
- config_term_common))
- return -EINVAL;
+ memset(&attr, 0, sizeof(attr));
+ attr.type = PERF_TYPE_HW_CACHE;

- if (get_config_terms(head_config, &config_terms))
- return -ENOMEM;
- }
+ ret = parse_events__decode_legacy_cache(name, pmu->type, &attr.config);
+ if (ret)
+ return ret;

- config_name = get_config_name(head_config);
- metric_id = get_config_metric_id(head_config);
- ret = parse_events__add_cache_hybrid(list, idx, &attr,
- config_name ? : name,
- metric_id,
- &config_terms,
- &hybrid, parse_state);
- if (hybrid)
- goto out_free_terms;
+ if (!is_event_supported(PERF_TYPE_HW_CACHE, attr.config))
+ continue;

- ret = add_event(list, idx, &attr, config_name ? : name, metric_id,
- &config_terms);
-out_free_terms:
- free_config_terms(&config_terms);
- return ret;
+ found_supported = true;
+
+ if (head_config) {
+ if (config_attr(&attr, head_config, err,
+ config_term_common))
+ return -EINVAL;
+
+ if (get_config_terms(head_config, &config_terms))
+ return -ENOMEM;
+ }
+
+ ret = add_event(list, idx, &attr, config_name ? : name, metric_id, &config_terms);
+ free_config_terms(&config_terms);
+ }
+ return found_supported ? 0: -EINVAL;
}

#ifdef HAVE_LIBTRACEEVENT
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 5acb62c2e00a..0c26303f7f63 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -172,8 +172,7 @@ int parse_events_add_tool(struct parse_events_state *parse_state,
int tool_event);
int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
struct parse_events_error *error,
- struct list_head *head_config,
- struct parse_events_state *parse_state);
+ struct list_head *head_config);
int parse_events_add_breakpoint(struct list_head *list, int *idx,
u64 addr, char *type, u64 len);
int parse_events_add_pmu(struct parse_events_state *parse_state,
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index f84fa1b132b3..cc7528558845 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -476,7 +476,7 @@ PE_LEGACY_CACHE opt_event_config

list = alloc_list();
ABORT_ON(!list);
- err = parse_events_add_cache(list, &parse_state->idx, $1, error, $2, parse_state);
+ err = parse_events_add_cache(list, &parse_state->idx, $1, error, $2);

parse_events_terms__delete($2);
free($1);
--
2.40.1.495.gc816e09b53d-goog
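
The control flow of the reworked parse_events_add_cache loop can be sketched
as follows. This is a simplified model, not the perf code: struct toy_pmu and
its supports_event flag are hypothetical, with the flag standing in for the
result of is_event_supported, and "adding" an event is reduced to counting.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PMU properties the loop consults. */
struct toy_pmu {
	const char *name;
	bool is_uncore;
	bool is_software;
	bool supports_event; /* stands in for is_event_supported() */
};

/*
 * Try the event on every PMU. Returns 0 if at least one PMU accepted it,
 * -EINVAL otherwise; *added receives the number of events created.
 */
static int toy_add_cache_on_all_pmus(const struct toy_pmu *pmus, size_t n,
				     size_t *added)
{
	bool found_supported = false;

	assert(added != NULL);
	*added = 0;

	for (size_t i = 0; i < n; i++) {
		/* Uncore and software PMUs are skipped up front. */
		if (pmus[i].is_uncore || pmus[i].is_software)
			continue;
		/* A PMU that cannot open the event is silently passed over. */
		if (!pmus[i].supports_event)
			continue;
		found_supported = true;
		(*added)++;
	}
	return found_supported ? 0 : -EINVAL;
}
```

On a hybrid machine modelled as {cpu_core, cpu_atom, one uncore PMU} this
yields two events, one per core PMU, and the parse only fails when no PMU
at all supports the event.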

2023-04-26 07:09:34

by Ian Rogers

Subject: [PATCH v1 18/40] perf test: Move x86 hybrid tests to arch/x86

The tests use x86 hybrid specific PMUs.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/arch/x86/include/arch-tests.h | 1 +
tools/perf/arch/x86/tests/Build | 1 +
tools/perf/arch/x86/tests/arch-tests.c | 10 +
tools/perf/arch/x86/tests/hybrid.c | 277 +++++++++++++++++++++++
tools/perf/tests/parse-events.c | 181 ---------------
5 files changed, 289 insertions(+), 181 deletions(-)
create mode 100644 tools/perf/arch/x86/tests/hybrid.c

diff --git a/tools/perf/arch/x86/include/arch-tests.h b/tools/perf/arch/x86/include/arch-tests.h
index 902e9ea9b99e..33d39c1d3e64 100644
--- a/tools/perf/arch/x86/include/arch-tests.h
+++ b/tools/perf/arch/x86/include/arch-tests.h
@@ -11,6 +11,7 @@ int test__intel_pt_pkt_decoder(struct test_suite *test, int subtest);
int test__intel_pt_hybrid_compat(struct test_suite *test, int subtest);
int test__bp_modify(struct test_suite *test, int subtest);
int test__x86_sample_parsing(struct test_suite *test, int subtest);
+int test__hybrid(struct test_suite *test, int subtest);

extern struct test_suite *arch_tests[];

diff --git a/tools/perf/arch/x86/tests/Build b/tools/perf/arch/x86/tests/Build
index 6f4e8636c3bf..08cc8b9c931e 100644
--- a/tools/perf/arch/x86/tests/Build
+++ b/tools/perf/arch/x86/tests/Build
@@ -3,5 +3,6 @@ perf-$(CONFIG_DWARF_UNWIND) += dwarf-unwind.o

perf-y += arch-tests.o
perf-y += sample-parsing.o
+perf-y += hybrid.o
perf-$(CONFIG_AUXTRACE) += insn-x86.o intel-pt-test.o
perf-$(CONFIG_X86_64) += bp-modify.o
diff --git a/tools/perf/arch/x86/tests/arch-tests.c b/tools/perf/arch/x86/tests/arch-tests.c
index aae6ea0fe52b..147ad0638bbb 100644
--- a/tools/perf/arch/x86/tests/arch-tests.c
+++ b/tools/perf/arch/x86/tests/arch-tests.c
@@ -22,6 +22,15 @@ struct test_suite suite__intel_pt = {
DEFINE_SUITE("x86 bp modify", bp_modify);
#endif
DEFINE_SUITE("x86 Sample parsing", x86_sample_parsing);
+static struct test_case hybrid_tests[] = {
+ TEST_CASE_REASON("x86 hybrid event parsing", hybrid, "not hybrid"),
+ { .name = NULL, }
+};
+
+struct test_suite suite__hybrid = {
+ .desc = "x86 hybrid",
+ .test_cases = hybrid_tests,
+};

struct test_suite *arch_tests[] = {
#ifdef HAVE_DWARF_UNWIND_SUPPORT
@@ -35,5 +44,6 @@ struct test_suite *arch_tests[] = {
&suite__bp_modify,
#endif
&suite__x86_sample_parsing,
+ &suite__hybrid,
NULL,
};
diff --git a/tools/perf/arch/x86/tests/hybrid.c b/tools/perf/arch/x86/tests/hybrid.c
new file mode 100644
index 000000000000..0f99cfd116ee
--- /dev/null
+++ b/tools/perf/arch/x86/tests/hybrid.c
@@ -0,0 +1,277 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "arch-tests.h"
+#include "debug.h"
+#include "evlist.h"
+#include "evsel.h"
+#include "pmu-hybrid.h"
+#include "tests/tests.h"
+
+static bool test_config(const struct evsel *evsel, __u64 expected_config)
+{
+ return (evsel->core.attr.config & PERF_HW_EVENT_MASK) == expected_config;
+}
+
+static int test__hybrid_hw_event_with_pmu(struct evlist *evlist)
+{
+ struct evsel *evsel = evlist__first(evlist);
+
+ TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
+ return TEST_OK;
+}
+
+static int test__hybrid_hw_group_event(struct evlist *evlist)
+{
+ struct evsel *evsel, *leader;
+
+ evsel = leader = evlist__first(evlist);
+ TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
+ TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
+
+ evsel = evsel__next(evsel);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0xc0));
+ TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
+ return TEST_OK;
+}
+
+static int test__hybrid_sw_hw_group_event(struct evlist *evlist)
+{
+ struct evsel *evsel, *leader;
+
+ evsel = leader = evlist__first(evlist);
+ TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
+
+ evsel = evsel__next(evsel);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
+ TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
+ return TEST_OK;
+}
+
+static int test__hybrid_hw_sw_group_event(struct evlist *evlist)
+{
+ struct evsel *evsel, *leader;
+
+ evsel = leader = evlist__first(evlist);
+ TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
+ TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
+
+ evsel = evsel__next(evsel);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
+ return TEST_OK;
+}
+
+static int test__hybrid_group_modifier1(struct evlist *evlist)
+{
+ struct evsel *evsel, *leader;
+
+ evsel = leader = evlist__first(evlist);
+ TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
+ TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
+ TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
+ TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
+
+ evsel = evsel__next(evsel);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0xc0));
+ TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
+ TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
+ TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
+ return TEST_OK;
+}
+
+static int test__hybrid_raw1(struct evlist *evlist)
+{
+ struct evsel *evsel = evlist__first(evlist);
+
+ if (!perf_pmu__hybrid_mounted("cpu_atom")) {
+ TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
+ return TEST_OK;
+ }
+
+ TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
+
+ /* The type of second event is randome value */
+ evsel = evsel__next(evsel);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
+ return TEST_OK;
+}
+
+static int test__hybrid_raw2(struct evlist *evlist)
+{
+ struct evsel *evsel = evlist__first(evlist);
+
+ TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
+ return TEST_OK;
+}
+
+static int test__hybrid_cache_event(struct evlist *evlist)
+{
+ struct evsel *evsel = evlist__first(evlist);
+
+ TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_HW_CACHE == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong config", 0x2 == (evsel->core.attr.config & 0xffffffff));
+ return TEST_OK;
+}
+
+static int test__checkevent_pmu(struct evlist *evlist)
+{
+
+ struct evsel *evsel = evlist__first(evlist);
+
+ TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong config", 10 == evsel->core.attr.config);
+ TEST_ASSERT_VAL("wrong config1", 1 == evsel->core.attr.config1);
+ TEST_ASSERT_VAL("wrong config2", 3 == evsel->core.attr.config2);
+ TEST_ASSERT_VAL("wrong config3", 0 == evsel->core.attr.config3);
+ /*
+ * The period value gets configured within evlist__config,
+ * while this test executes only parse events method.
+ */
+ TEST_ASSERT_VAL("wrong period", 0 == evsel->core.attr.sample_period);
+
+ return TEST_OK;
+}
+
+struct evlist_test {
+ const char *name;
+ bool (*valid)(void);
+ int (*check)(struct evlist *evlist);
+};
+
+static const struct evlist_test test__hybrid_events[] = {
+ {
+ .name = "cpu_core/cpu-cycles/",
+ .check = test__hybrid_hw_event_with_pmu,
+ /* 0 */
+ },
+ {
+ .name = "{cpu_core/cpu-cycles/,cpu_core/instructions/}",
+ .check = test__hybrid_hw_group_event,
+ /* 1 */
+ },
+ {
+ .name = "{cpu-clock,cpu_core/cpu-cycles/}",
+ .check = test__hybrid_sw_hw_group_event,
+ /* 2 */
+ },
+ {
+ .name = "{cpu_core/cpu-cycles/,cpu-clock}",
+ .check = test__hybrid_hw_sw_group_event,
+ /* 3 */
+ },
+ {
+ .name = "{cpu_core/cpu-cycles/k,cpu_core/instructions/u}",
+ .check = test__hybrid_group_modifier1,
+ /* 4 */
+ },
+ {
+ .name = "r1a",
+ .check = test__hybrid_raw1,
+ /* 5 */
+ },
+ {
+ .name = "cpu_core/r1a/",
+ .check = test__hybrid_raw2,
+ /* 6 */
+ },
+ {
+ .name = "cpu_core/config=10,config1,config2=3,period=1000/u",
+ .check = test__checkevent_pmu,
+ /* 7 */
+ },
+ {
+ .name = "cpu_core/LLC-loads/",
+ .check = test__hybrid_cache_event,
+ /* 8 */
+ },
+};
+
+static int test_event(const struct evlist_test *e)
+{
+ struct parse_events_error err;
+ struct evlist *evlist;
+ int ret;
+
+ if (e->valid && !e->valid()) {
+ pr_debug("... SKIP\n");
+ return TEST_OK;
+ }
+
+ evlist = evlist__new();
+ if (evlist == NULL) {
+ pr_err("Failed allocation");
+ return TEST_FAIL;
+ }
+ parse_events_error__init(&err);
+ ret = parse_events(evlist, e->name, &err);
+ if (ret) {
+ pr_debug("failed to parse event '%s', err %d, str '%s'\n",
+ e->name, ret, err.str);
+ parse_events_error__print(&err, e->name);
+ ret = TEST_FAIL;
+ if (strstr(err.str, "can't access trace events"))
+ ret = TEST_SKIP;
+ } else {
+ ret = e->check(evlist);
+ }
+ parse_events_error__exit(&err);
+ evlist__delete(evlist);
+
+ return ret;
+}
+
+static int combine_test_results(int existing, int latest)
+{
+ if (existing == TEST_FAIL)
+ return TEST_FAIL;
+ if (existing == TEST_SKIP)
+ return latest == TEST_OK ? TEST_SKIP : latest;
+ return latest;
+}
+
+static int test_events(const struct evlist_test *events, int cnt)
+{
+ int ret = TEST_OK;
+
+ for (int i = 0; i < cnt; i++) {
+ const struct evlist_test *e = &events[i];
+ int test_ret;
+
+ pr_debug("running test %d '%s'\n", i, e->name);
+ test_ret = test_event(e);
+ if (test_ret != TEST_OK) {
+ pr_debug("Event test failure: test %d '%s'", i, e->name);
+ ret = combine_test_results(ret, test_ret);
+ }
+ }
+
+ return ret;
+}
+
+int test__hybrid(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
+{
+ if (!perf_pmu__has_hybrid())
+ return TEST_SKIP;
+
+ return test_events(test__hybrid_events, ARRAY_SIZE(test__hybrid_events));
+}
diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index cb976765b8b0..15fec7f01315 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -6,7 +6,6 @@
#include "tests.h"
#include "debug.h"
#include "pmu.h"
-#include "pmu-hybrid.h"
#include "pmus.h"
#include <dirent.h>
#include <errno.h>
@@ -1509,127 +1508,6 @@ static int test__all_tracepoints(struct evlist *evlist)
}
#endif /* HAVE_LIBTRACEVENT */

-static int test__hybrid_hw_event_with_pmu(struct evlist *evlist)
-{
- struct evsel *evsel = evlist__first(evlist);
-
- TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
- return TEST_OK;
-}
-
-static int test__hybrid_hw_group_event(struct evlist *evlist)
-{
- struct evsel *evsel, *leader;
-
- evsel = leader = evlist__first(evlist);
- TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
- TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
-
- evsel = evsel__next(evsel);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0xc0));
- TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
- return TEST_OK;
-}
-
-static int test__hybrid_sw_hw_group_event(struct evlist *evlist)
-{
- struct evsel *evsel, *leader;
-
- evsel = leader = evlist__first(evlist);
- TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
-
- evsel = evsel__next(evsel);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
- TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
- return TEST_OK;
-}
-
-static int test__hybrid_hw_sw_group_event(struct evlist *evlist)
-{
- struct evsel *evsel, *leader;
-
- evsel = leader = evlist__first(evlist);
- TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
- TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
-
- evsel = evsel__next(evsel);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
- return TEST_OK;
-}
-
-static int test__hybrid_group_modifier1(struct evlist *evlist)
-{
- struct evsel *evsel, *leader;
-
- evsel = leader = evlist__first(evlist);
- TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
- TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
- TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
- TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
-
- evsel = evsel__next(evsel);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0xc0));
- TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
- TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
- TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
- return TEST_OK;
-}
-
-static int test__hybrid_raw1(struct evlist *evlist)
-{
- struct evsel *evsel = evlist__first(evlist);
-
- if (!perf_pmu__hybrid_mounted("cpu_atom")) {
- TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
- return TEST_OK;
- }
-
- TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
-
- /* The type of second event is randome value */
- evsel = evsel__next(evsel);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
- return TEST_OK;
-}
-
-static int test__hybrid_raw2(struct evlist *evlist)
-{
- struct evsel *evsel = evlist__first(evlist);
-
- TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
- return TEST_OK;
-}
-
-static int test__hybrid_cache_event(struct evlist *evlist)
-{
- struct evsel *evsel = evlist__first(evlist);
-
- TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_HW_CACHE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", 0x2 == (evsel->core.attr.config & 0xffffffff));
- return TEST_OK;
-}
-
struct evlist_test {
const char *name;
bool (*valid)(void);
@@ -1997,54 +1875,6 @@ static const struct terms_test test__terms[] = {
},
};

-static const struct evlist_test test__hybrid_events[] = {
- {
- .name = "cpu_core/cpu-cycles/",
- .check = test__hybrid_hw_event_with_pmu,
- /* 0 */
- },
- {
- .name = "{cpu_core/cpu-cycles/,cpu_core/instructions/}",
- .check = test__hybrid_hw_group_event,
- /* 1 */
- },
- {
- .name = "{cpu-clock,cpu_core/cpu-cycles/}",
- .check = test__hybrid_sw_hw_group_event,
- /* 2 */
- },
- {
- .name = "{cpu_core/cpu-cycles/,cpu-clock}",
- .check = test__hybrid_hw_sw_group_event,
- /* 3 */
- },
- {
- .name = "{cpu_core/cpu-cycles/k,cpu_core/instructions/u}",
- .check = test__hybrid_group_modifier1,
- /* 4 */
- },
- {
- .name = "r1a",
- .check = test__hybrid_raw1,
- /* 5 */
- },
- {
- .name = "cpu_core/r1a/",
- .check = test__hybrid_raw2,
- /* 6 */
- },
- {
- .name = "cpu_core/config=10,config1,config2=3,period=1000/u",
- .check = test__checkevent_pmu,
- /* 7 */
- },
- {
- .name = "cpu_core/LLC-loads/",
- .check = test__hybrid_cache_event,
- /* 8 */
- },
-};
-
static int test_event(const struct evlist_test *e)
{
struct parse_events_error err;
@@ -2307,14 +2137,6 @@ static bool test_alias(char **event, char **alias)
return false;
}

-static int test__hybrid(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
-{
- if (!perf_pmu__has_hybrid())
- return TEST_SKIP;
-
- return test_events(test__hybrid_events, ARRAY_SIZE(test__hybrid_events));
-}
-
static int test__checkevent_pmu_events_alias(struct evlist *evlist)
{
struct evsel *evsel1 = evlist__first(evlist);
@@ -2378,9 +2200,6 @@ static struct test_case tests__parse_events[] = {
TEST_CASE_REASON("Test event parsing",
events2,
"permissions"),
- TEST_CASE_REASON("Test parsing of \"hybrid\" CPU events",
- hybrid,
- "not hybrid"),
TEST_CASE_REASON("Parsing of all PMU events from sysfs",
pmu_events,
"permissions"),
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:09:43

by Ian Rogers

Subject: [PATCH v1 22/40] perf print-events: Print legacy cache events for each PMU

Mirroring parse_events_add_cache, list the legacy name alongside its
alias with the PMU. Remove the now unnecessary hybrid logic.
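
As a rough illustration of what "legacy name alongside its alias with the PMU" means, the sketch below packs a legacy cache config using the layout documented in parse-events-hybrid.c (0xEEEEEEEE00DDCCBB) and formats the "pmu/event/" alias. The function names and the PMU_TYPE_SHIFT macro are hypothetical stand-ins, not the perf implementation; the shift of 32 is assumed from that layout comment.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: the PMU type ID occupies the upper 32 bits of attr.config. */
#define PMU_TYPE_SHIFT 32

/*
 * Pack a legacy cache config as 0xEEEEEEEE00DDCCBB:
 * BB = cache ID, CC = cache op ID, DD = op result ID, EEEEEEEE = PMU type.
 */
uint64_t legacy_cache_config(uint32_t pmu_type, unsigned int cache_id,
			     unsigned int op, unsigned int result)
{
	return ((uint64_t)pmu_type << PMU_TYPE_SHIFT) |
	       ((uint64_t)(result & 0xff) << 16) |
	       ((uint64_t)(op & 0xff) << 8) |
	       (uint64_t)(cache_id & 0xff);
}

/* Build the "pmu/event/" alias. Returns 0 on success, -1 if truncated. */
int legacy_cache_alias(char *buf, size_t len, const char *pmu_name,
		       const char *event_name)
{
	int n = snprintf(buf, len, "%s/%s/", pmu_name, event_name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a PMU named cpu_core and the legacy name LLC-loads, the alias printed next to the legacy name would be cpu_core/LLC-loads/.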

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/parse-events.c | 2 +-
tools/perf/util/parse-events.h | 1 +
tools/perf/util/print-events.c | 85 ++++++++++++++++------------------
3 files changed, 41 insertions(+), 47 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index e007b2bc1ab4..ae421a5c9ddd 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -413,7 +413,7 @@ static int config_attr(struct perf_event_attr *attr,
* contain hyphens and the longest name
* should always be selected.
*/
-static int parse_events__decode_legacy_cache(const char *name, int pmu_type, __u64 *config)
+int parse_events__decode_legacy_cache(const char *name, int pmu_type, __u64 *config)
{
int len, cache_type = -1, cache_op = -1, cache_result = -1;
const char *name_end = &name[strlen(name) + 1];
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 0c26303f7f63..4e49be290209 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -173,6 +173,7 @@ int parse_events_add_tool(struct parse_events_state *parse_state,
int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
struct parse_events_error *error,
struct list_head *head_config);
+int parse_events__decode_legacy_cache(const char *name, int pmu_type, __u64 *config);
int parse_events_add_breakpoint(struct list_head *list, int *idx,
u64 addr, char *type, u64 len);
int parse_events_add_pmu(struct parse_events_state *parse_state,
diff --git a/tools/perf/util/print-events.c b/tools/perf/util/print-events.c
index 93bbb868d400..d416c5484cd5 100644
--- a/tools/perf/util/print-events.c
+++ b/tools/perf/util/print-events.c
@@ -226,56 +226,49 @@ void print_sdt_events(const struct print_callbacks *print_cb, void *print_state)

int print_hwcache_events(const struct print_callbacks *print_cb, void *print_state)
{
+ struct perf_pmu *pmu = NULL;
const char *event_type_descriptor = event_type_descriptors[PERF_TYPE_HW_CACHE];

- for (int type = 0; type < PERF_COUNT_HW_CACHE_MAX; type++) {
- for (int op = 0; op < PERF_COUNT_HW_CACHE_OP_MAX; op++) {
- /* skip invalid cache type */
- if (!evsel__is_cache_op_valid(type, op))
- continue;
+ while ((pmu = perf_pmu__scan(pmu)) != NULL) {
+ /*
+ * Skip uncore PMUs for performance. Software PMUs can open
+ * PERF_TYPE_HW_CACHE, so skip.
+ */
+ if (pmu->is_uncore || pmu->type == PERF_TYPE_SOFTWARE)
+ continue;

- for (int res = 0; res < PERF_COUNT_HW_CACHE_RESULT_MAX; res++) {
- struct perf_pmu *pmu = NULL;
- char name[64];
-
- __evsel__hw_cache_type_op_res_name(type, op, res,
- name, sizeof(name));
- if (!perf_pmu__has_hybrid()) {
- if (is_event_supported(PERF_TYPE_HW_CACHE,
- type | (op << 8) | (res << 16))) {
- print_cb->print_event(print_state,
- "cache",
- /*pmu_name=*/NULL,
- name,
- /*event_alias=*/NULL,
- /*scale_unit=*/NULL,
- /*deprecated=*/false,
- event_type_descriptor,
- /*desc=*/NULL,
- /*long_desc=*/NULL,
- /*encoding_desc=*/NULL);
- }
+ for (int type = 0; type < PERF_COUNT_HW_CACHE_MAX; type++) {
+ for (int op = 0; op < PERF_COUNT_HW_CACHE_OP_MAX; op++) {
+ /* skip invalid cache type */
+ if (!evsel__is_cache_op_valid(type, op))
continue;
- }
- perf_pmu__for_each_hybrid_pmu(pmu) {
- if (is_event_supported(PERF_TYPE_HW_CACHE,
- type | (op << 8) | (res << 16) |
- ((__u64)pmu->type << PERF_PMU_TYPE_SHIFT))) {
- char new_name[128];
- snprintf(new_name, sizeof(new_name),
- "%s/%s/", pmu->name, name);
- print_cb->print_event(print_state,
- "cache",
- pmu->name,
- name,
- new_name,
- /*scale_unit=*/NULL,
- /*deprecated=*/false,
- event_type_descriptor,
- /*desc=*/NULL,
- /*long_desc=*/NULL,
- /*encoding_desc=*/NULL);
- }
+
+ for (int res = 0; res < PERF_COUNT_HW_CACHE_RESULT_MAX; res++) {
+ char name[64];
+ char alias_name[128];
+ __u64 config;
+ int ret;
+
+ __evsel__hw_cache_type_op_res_name(type, op, res,
+ name, sizeof(name));
+
+ ret = parse_events__decode_legacy_cache(name, pmu->type,
+ &config);
+ if (ret || !is_event_supported(PERF_TYPE_HW_CACHE, config))
+ continue;
+ snprintf(alias_name, sizeof(alias_name), "%s/%s/",
+ pmu->name, name);
+ print_cb->print_event(print_state,
+ "cache",
+ pmu->name,
+ name,
+ alias_name,
+ /*scale_unit=*/NULL,
+ /*deprecated=*/false,
+ event_type_descriptor,
+ /*desc=*/NULL,
+ /*long_desc=*/NULL,
+ /*encoding_desc=*/NULL);
}
}
}
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:09:48

by Ian Rogers

Subject: [PATCH v1 23/40] perf parse-events: Support wildcards on raw events

Legacy raw events like r1a open as PERF_TYPE_RAW on non-hybrid systems
and on each hybrid PMU on hybrid systems. Rather than iterating over
hybrid PMUs, add a perf_pmu__supports_wildcard_numeric function that
says whether a numeric event should be opened on a given PMU. If the
parsed event specifies the type of the PMU then don't wildcard match
PMUs; use the specified PMU type.
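
The control flow can be sketched outside of perf roughly as follows. The toy_pmu and toy_events types, the is_core flag (standing in for perf_pmu__supports_wildcard_numeric) and the function names are all hypothetical; only the shape of the logic follows the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct perf_pmu. */
struct toy_pmu {
	const char *name;
	uint32_t type;
	bool is_core;	/* stands in for perf_pmu__supports_wildcard_numeric() */
};

/* Hypothetical stand-in for the evsel list: records the opened types. */
struct toy_events {
	uint32_t types[8];
	size_t nr;
};

static int toy_add_event(struct toy_events *evs, uint32_t type)
{
	if (evs->nr >= sizeof(evs->types) / sizeof(evs->types[0]))
		return -ENOMEM;
	evs->types[evs->nr++] = type;
	return 0;
}

/*
 * Without wildcard, open exactly the requested type. With wildcard, open the
 * event once per supporting PMU, using that PMU's type, and fail with
 * -EINVAL if no PMU supports it.
 */
int toy_add_numeric(struct toy_events *evs, const struct toy_pmu *pmus,
		    size_t nr_pmus, uint32_t type, bool wildcard)
{
	bool found_supported = false;

	if (!wildcard)
		return toy_add_event(evs, type);

	for (size_t i = 0; i < nr_pmus; i++) {
		int ret;

		if (!pmus[i].is_core)
			continue;

		found_supported = true;
		ret = toy_add_event(evs, pmus[i].type);
		if (ret)
			return ret;
	}
	return found_supported ? 0 : -EINVAL;
}
```

On a toy system with two core PMUs and one uncore PMU, a wildcard raw event yields two entries, one per core PMU, while a non-wildcard event yields a single entry of the given type.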

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/parse-events.c | 43 ++++++++++++++++++++++++----------
tools/perf/util/parse-events.h | 3 ++-
tools/perf/util/parse-events.y | 13 ++++++----
tools/perf/util/pmu.c | 5 ++++
tools/perf/util/pmu.h | 1 +
5 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index ae421a5c9ddd..12b312935353 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -25,7 +25,6 @@
#include "util/parse-branch-options.h"
#include "util/evsel_config.h"
#include "util/event.h"
-#include "util/parse-events-hybrid.h"
#include "util/pmu-hybrid.h"
#include "util/bpf-filter.h"
#include "util/util.h"
@@ -1449,15 +1448,14 @@ int parse_events_add_tracepoint(struct list_head *list, int *idx,
#endif
}

-int parse_events_add_numeric(struct parse_events_state *parse_state,
- struct list_head *list,
- u32 type, u64 config,
- struct list_head *head_config)
+static int __parse_events_add_numeric(struct parse_events_state *parse_state,
+ struct list_head *list,
+ u32 type, u64 config,
+ struct list_head *head_config)
{
struct perf_event_attr attr;
LIST_HEAD(config_terms);
const char *name, *metric_id;
- bool hybrid;
int ret;

memset(&attr, 0, sizeof(attr));
@@ -1475,19 +1473,38 @@ int parse_events_add_numeric(struct parse_events_state *parse_state,

name = get_config_name(head_config);
metric_id = get_config_metric_id(head_config);
- ret = parse_events__add_numeric_hybrid(parse_state, list, &attr,
- name, metric_id,
- &config_terms, &hybrid);
- if (hybrid)
- goto out_free_terms;
-
ret = add_event(list, &parse_state->idx, &attr, name, metric_id,
&config_terms);
-out_free_terms:
free_config_terms(&config_terms);
return ret;
}

+int parse_events_add_numeric(struct parse_events_state *parse_state,
+ struct list_head *list,
+ u32 type, u64 config,
+ struct list_head *head_config,
+ bool wildcard)
+{
+ struct perf_pmu *pmu = NULL;
+ bool found_supported = false;
+
+ if (!wildcard)
+ return __parse_events_add_numeric(parse_state, list, type, config, head_config);
+
+ while ((pmu = perf_pmu__scan(pmu)) != NULL) {
+ int ret;
+
+ if (!perf_pmu__supports_wildcard_numeric(pmu))
+ continue;
+
+ found_supported = true;
+ ret = __parse_events_add_numeric(parse_state, list, pmu->type, config, head_config);
+ if (ret)
+ return ret;
+ }
+ return found_supported ? 0 : -EINVAL;
+}
+
int parse_events_add_tool(struct parse_events_state *parse_state,
struct list_head *list,
int tool_event)
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 4e49be290209..831cd1ff4702 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -166,7 +166,8 @@ int parse_events_load_bpf_obj(struct parse_events_state *parse_state,
int parse_events_add_numeric(struct parse_events_state *parse_state,
struct list_head *list,
u32 type, u64 config,
- struct list_head *head_config);
+ struct list_head *head_config,
+ bool wildcard);
int parse_events_add_tool(struct parse_events_state *parse_state,
struct list_head *list,
int tool_event);
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index cc7528558845..5055a29a448f 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -435,7 +435,8 @@ value_sym '/' event_config '/'

list = alloc_list();
ABORT_ON(!list);
- err = parse_events_add_numeric(_parse_state, list, type, config, $3);
+ err = parse_events_add_numeric(_parse_state, list, type, config, $3,
+ /*wildcard=*/false);
parse_events_terms__delete($3);
if (err) {
free_list_evsel(list);
@@ -452,7 +453,9 @@ value_sym sep_slash_slash_dc

list = alloc_list();
ABORT_ON(!list);
- ABORT_ON(parse_events_add_numeric(_parse_state, list, type, config, NULL));
+ ABORT_ON(parse_events_add_numeric(_parse_state, list, type, config,
+ /*head_config=*/NULL,
+ /*wildcard=*/false));
$$ = list;
}
|
@@ -596,7 +599,8 @@ PE_VALUE ':' PE_VALUE opt_event_config

list = alloc_list();
ABORT_ON(!list);
- err = parse_events_add_numeric(_parse_state, list, (u32)$1, $3, $4);
+ err = parse_events_add_numeric(_parse_state, list, (u32)$1, $3, $4,
+ /*wildcard=*/false);
parse_events_terms__delete($4);
if (err) {
free(list);
@@ -618,7 +622,8 @@ PE_RAW opt_event_config
num = strtoull($1 + 1, NULL, 16);
ABORT_ON(errno);
free($1);
- err = parse_events_add_numeric(_parse_state, list, PERF_TYPE_RAW, num, $2);
+ err = parse_events_add_numeric(_parse_state, list, PERF_TYPE_RAW, num, $2,
+ /*wildcard=*/true);
parse_events_terms__delete($2);
if (err) {
free(list);
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 63071d876190..cd4247a379d4 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1655,6 +1655,11 @@ bool perf_pmu__supports_legacy_cache(const struct perf_pmu *pmu)
return is_pmu_core(pmu->name) || perf_pmu__is_hybrid(pmu->name);
}

+bool perf_pmu__supports_wildcard_numeric(const struct perf_pmu *pmu)
+{
+ return is_pmu_core(pmu->name) || perf_pmu__is_hybrid(pmu->name);
+}
+
static bool pmu_alias_is_duplicate(struct sevent *alias_a,
struct sevent *alias_b)
{
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 05702bc4bcf8..5a19536a5449 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -221,6 +221,7 @@ struct perf_pmu *perf_pmu__scan(struct perf_pmu *pmu);

bool is_pmu_core(const char *name);
bool perf_pmu__supports_legacy_cache(const struct perf_pmu *pmu);
+bool perf_pmu__supports_wildcard_numeric(const struct perf_pmu *pmu);
void print_pmu_events(const struct print_callbacks *print_cb, void *print_state);
bool pmu_have_event(const char *pname, const char *name);

--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:10:43

by Ian Rogers

Subject: [PATCH v1 24/40] perf parse-events: Remove now unused hybrid logic

The event parser no longer needs to recurse in case of a legacy cache
event in a PMU; the necessary wildcard logic has moved to
perf_pmu__supports_legacy_cache and
perf_pmu__supports_wildcard_numeric.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/Build | 1 -
tools/perf/util/parse-events-hybrid.c | 181 --------------------------
tools/perf/util/parse-events-hybrid.h | 18 ---
tools/perf/util/parse-events.c | 74 -----------
tools/perf/util/parse-events.h | 8 --
5 files changed, 282 deletions(-)
delete mode 100644 tools/perf/util/parse-events-hybrid.c
delete mode 100644 tools/perf/util/parse-events-hybrid.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index bd18fe5f2719..c146736ead19 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -24,7 +24,6 @@ perf-y += llvm-utils.o
perf-y += mmap.o
perf-y += memswap.o
perf-y += parse-events.o
-perf-y += parse-events-hybrid.o
perf-y += print-events.o
perf-y += tracepoint.o
perf-y += perf_regs.o
diff --git a/tools/perf/util/parse-events-hybrid.c b/tools/perf/util/parse-events-hybrid.c
deleted file mode 100644
index d2c0be051d46..000000000000
--- a/tools/perf/util/parse-events-hybrid.c
+++ /dev/null
@@ -1,181 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <linux/err.h>
-#include <linux/zalloc.h>
-#include <errno.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <fcntl.h>
-#include <sys/param.h>
-#include "evlist.h"
-#include "evsel.h"
-#include "parse-events.h"
-#include "parse-events-hybrid.h"
-#include "debug.h"
-#include "pmu.h"
-#include "pmu-hybrid.h"
-#include "perf.h"
-
-static void config_hybrid_attr(struct perf_event_attr *attr,
- int type, int pmu_type)
-{
- /*
- * attr.config layout for type PERF_TYPE_HARDWARE and
- * PERF_TYPE_HW_CACHE
- *
- * PERF_TYPE_HARDWARE: 0xEEEEEEEE000000AA
- * AA: hardware event ID
- * EEEEEEEE: PMU type ID
- * PERF_TYPE_HW_CACHE: 0xEEEEEEEE00DDCCBB
- * BB: hardware cache ID
- * CC: hardware cache op ID
- * DD: hardware cache op result ID
- * EEEEEEEE: PMU type ID
- * If the PMU type ID is 0, the PERF_TYPE_RAW will be applied.
- */
- attr->type = type;
- attr->config = (attr->config & PERF_HW_EVENT_MASK) |
- ((__u64)pmu_type << PERF_PMU_TYPE_SHIFT);
-}
-
-static int create_event_hybrid(__u32 config_type, int *idx,
- struct list_head *list,
- struct perf_event_attr *attr, const char *name,
- const char *metric_id,
- struct list_head *config_terms,
- struct perf_pmu *pmu)
-{
- struct evsel *evsel;
- __u32 type = attr->type;
- __u64 config = attr->config;
-
- config_hybrid_attr(attr, config_type, pmu->type);
-
- /*
- * Some hybrid hardware cache events are only available on one CPU
- * PMU. For example, the 'L1-dcache-load-misses' is only available
- * on cpu_core, while the 'L1-icache-loads' is only available on
- * cpu_atom. We need to remove "not supported" hybrid cache events.
- */
- if (attr->type == PERF_TYPE_HW_CACHE
- && !is_event_supported(attr->type, attr->config))
- return 0;
-
- evsel = parse_events__add_event_hybrid(list, idx, attr, name, metric_id,
- pmu, config_terms);
- if (evsel) {
- evsel->pmu_name = strdup(pmu->name);
- if (!evsel->pmu_name)
- return -ENOMEM;
- } else
- return -ENOMEM;
- attr->type = type;
- attr->config = config;
- return 0;
-}
-
-static int pmu_cmp(struct parse_events_state *parse_state,
- struct perf_pmu *pmu)
-{
- if (parse_state->evlist && parse_state->evlist->hybrid_pmu_name)
- return strcmp(parse_state->evlist->hybrid_pmu_name, pmu->name);
-
- if (parse_state->hybrid_pmu_name)
- return strcmp(parse_state->hybrid_pmu_name, pmu->name);
-
- return 0;
-}
-
-static int add_hw_hybrid(struct parse_events_state *parse_state,
- struct list_head *list, struct perf_event_attr *attr,
- const char *name, const char *metric_id,
- struct list_head *config_terms)
-{
- struct perf_pmu *pmu;
- int ret;
-
- perf_pmu__for_each_hybrid_pmu(pmu) {
- LIST_HEAD(terms);
-
- if (pmu_cmp(parse_state, pmu))
- continue;
-
- copy_config_terms(&terms, config_terms);
- ret = create_event_hybrid(PERF_TYPE_HARDWARE,
- &parse_state->idx, list, attr, name,
- metric_id, &terms, pmu);
- free_config_terms(&terms);
- if (ret)
- return ret;
- }
-
- return 0;
-}
-
-static int create_raw_event_hybrid(int *idx, struct list_head *list,
- struct perf_event_attr *attr,
- const char *name,
- const char *metric_id,
- struct list_head *config_terms,
- struct perf_pmu *pmu)
-{
- struct evsel *evsel;
-
- attr->type = pmu->type;
- evsel = parse_events__add_event_hybrid(list, idx, attr, name, metric_id,
- pmu, config_terms);
- if (evsel)
- evsel->pmu_name = strdup(pmu->name);
- else
- return -ENOMEM;
-
- return 0;
-}
-
-static int add_raw_hybrid(struct parse_events_state *parse_state,
- struct list_head *list, struct perf_event_attr *attr,
- const char *name, const char *metric_id,
- struct list_head *config_terms)
-{
- struct perf_pmu *pmu;
- int ret;
-
- perf_pmu__for_each_hybrid_pmu(pmu) {
- LIST_HEAD(terms);
-
- if (pmu_cmp(parse_state, pmu))
- continue;
-
- copy_config_terms(&terms, config_terms);
- ret = create_raw_event_hybrid(&parse_state->idx, list, attr,
- name, metric_id, &terms, pmu);
- free_config_terms(&terms);
- if (ret)
- return ret;
- }
-
- return 0;
-}
-
-int parse_events__add_numeric_hybrid(struct parse_events_state *parse_state,
- struct list_head *list,
- struct perf_event_attr *attr,
- const char *name, const char *metric_id,
- struct list_head *config_terms,
- bool *hybrid)
-{
- *hybrid = false;
- if (attr->type == PERF_TYPE_SOFTWARE)
- return 0;
-
- if (!perf_pmu__has_hybrid())
- return 0;
-
- *hybrid = true;
- if (attr->type != PERF_TYPE_RAW) {
- return add_hw_hybrid(parse_state, list, attr, name, metric_id,
- config_terms);
- }
-
- return add_raw_hybrid(parse_state, list, attr, name, metric_id,
- config_terms);
-}
diff --git a/tools/perf/util/parse-events-hybrid.h b/tools/perf/util/parse-events-hybrid.h
deleted file mode 100644
index bc2966e73897..000000000000
--- a/tools/perf/util/parse-events-hybrid.h
+++ /dev/null
@@ -1,18 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __PERF_PARSE_EVENTS_HYBRID_H
-#define __PERF_PARSE_EVENTS_HYBRID_H
-
-#include <linux/list.h>
-#include <stdbool.h>
-#include <linux/types.h>
-#include <linux/perf_event.h>
-#include <string.h>
-
-int parse_events__add_numeric_hybrid(struct parse_events_state *parse_state,
- struct list_head *list,
- struct perf_event_attr *attr,
- const char *name, const char *metric_id,
- struct list_head *config_terms,
- bool *hybrid);
-
-#endif /* __PERF_PARSE_EVENTS_HYBRID_H */
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 12b312935353..fad0dd4b86b2 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -25,7 +25,6 @@
#include "util/parse-branch-options.h"
#include "util/evsel_config.h"
#include "util/event.h"
-#include "util/pmu-hybrid.h"
#include "util/bpf-filter.h"
#include "util/util.h"
#include "tracepoint.h"
@@ -39,9 +38,6 @@ extern int parse_events_debug;
int parse_events_parse(void *parse_state, void *scanner);
static int get_config_terms(struct list_head *head_config,
struct list_head *head_terms __maybe_unused);
-static int parse_events__with_hybrid_pmu(struct parse_events_state *parse_state,
- const char *str, char *pmu_name,
- struct list_head *list);

struct event_symbol event_symbols_hw[PERF_COUNT_HW_MAX] = {
[PERF_COUNT_HW_CPU_CYCLES] = {
@@ -1524,33 +1520,6 @@ static bool config_term_percore(struct list_head *config_terms)
return false;
}

-static int parse_events__inside_hybrid_pmu(struct parse_events_state *parse_state,
- struct list_head *list, char *name,
- struct list_head *head_config)
-{
- struct parse_events_term *term;
- int ret = -1;
-
- if (parse_state->fake_pmu || !head_config || list_empty(head_config) ||
- !perf_pmu__is_hybrid(name)) {
- return -1;
- }
-
- /*
- * More than one term in list.
- */
- if (head_config->next && head_config->next->next != head_config)
- return -1;
-
- term = list_first_entry(head_config, struct parse_events_term, list);
- if (term && term->config && strcmp(term->config, "event")) {
- ret = parse_events__with_hybrid_pmu(parse_state, term->config,
- name, list);
- }
-
- return ret;
-}
-
int parse_events_add_pmu(struct parse_events_state *parse_state,
struct list_head *list, char *name,
struct list_head *head_config,
@@ -1645,11 +1614,6 @@ int parse_events_add_pmu(struct parse_events_state *parse_state,
if (pmu->default_config && get_config_chgs(pmu, head_config, &config_terms))
return -ENOMEM;

- if (!parse_events__inside_hybrid_pmu(parse_state, list, name,
- head_config)) {
- return 0;
- }
-
if (!parse_state->fake_pmu && perf_pmu__config(pmu, &attr, head_config, parse_state->error)) {
free_config_terms(&config_terms);
return -EINVAL;
@@ -2027,32 +1991,6 @@ int parse_events_terms(struct list_head *terms, const char *str)
return ret;
}

-static int parse_events__with_hybrid_pmu(struct parse_events_state *parse_state,
- const char *str, char *pmu_name,
- struct list_head *list)
-{
- struct parse_events_state ps = {
- .list = LIST_HEAD_INIT(ps.list),
- .stoken = PE_START_EVENTS,
- .hybrid_pmu_name = pmu_name,
- .idx = parse_state->idx,
- };
- int ret;
-
- ret = parse_events__scanner(str, &ps);
-
- if (!ret) {
- if (!list_empty(&ps.list)) {
- list_splice(&ps.list, list);
- parse_state->idx = ps.idx;
- return 0;
- } else
- return -1;
- }
-
- return ret;
-}
-
__weak int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
{
/* Order by insertion index. */
@@ -2776,15 +2714,3 @@ char *parse_events_formats_error_string(char *additional_terms)
fail:
return NULL;
}
-
-struct evsel *parse_events__add_event_hybrid(struct list_head *list, int *idx,
- struct perf_event_attr *attr,
- const char *name,
- const char *metric_id,
- struct perf_pmu *pmu,
- struct list_head *config_terms)
-{
- return __add_event(list, idx, attr, /*init_attr=*/true, name, metric_id,
- pmu, config_terms, /*auto_merge_stats=*/false,
- /*cpu_list=*/NULL);
-}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 831cd1ff4702..77b8f7efdb94 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -122,7 +122,6 @@ struct parse_events_state {
struct list_head *terms;
int stoken;
struct perf_pmu *fake_pmu;
- char *hybrid_pmu_name;
/* Should PE_LEGACY_NAME tokens be generated for config terms? */
bool match_legacy_cache_terms;
bool wild_card_pmus;
@@ -235,11 +234,4 @@ static inline bool is_sdt_event(char *str __maybe_unused)
}
#endif /* HAVE_LIBELF_SUPPORT */

-struct evsel *parse_events__add_event_hybrid(struct list_head *list, int *idx,
- struct perf_event_attr *attr,
- const char *name,
- const char *metric_id,
- struct perf_pmu *pmu,
- struct list_head *config_terms);
-
#endif /* __PERF_PARSE_EVENTS_H */
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:10:46

by Ian Rogers

Subject: [PATCH v1 25/40] perf parse-events: Minor type safety cleanup

Use the typed parse_state rather than void* _parse_state when
available.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/parse-events.y | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 5055a29a448f..e709508b1d6e 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -301,7 +301,7 @@ PE_NAME opt_pmu_config
if (!list)
CLEANUP_YYABORT;
/* Attempt to add to list assuming $1 is a PMU name. */
- if (parse_events_add_pmu(_parse_state, list, $1, $2, /*auto_merge_stats=*/false)) {
+ if (parse_events_add_pmu(parse_state, list, $1, $2, /*auto_merge_stats=*/false)) {
struct perf_pmu *pmu = NULL;
int ok = 0;

@@ -319,7 +319,7 @@ PE_NAME opt_pmu_config
!perf_pmu__match(pattern, pmu->alias_name, $1)) {
if (parse_events_copy_term_list(orig_terms, &terms))
CLEANUP_YYABORT;
- if (!parse_events_add_pmu(_parse_state, list, pmu->name, terms,
+ if (!parse_events_add_pmu(parse_state, list, pmu->name, terms,
/*auto_merge_stats=*/true)) {
ok++;
parse_state->wild_card_pmus = true;
@@ -331,7 +331,7 @@ PE_NAME opt_pmu_config
if (!ok) {
/* Failure to add, assume $1 is an event name. */
zfree(&list);
- ok = !parse_events_multi_pmu_add(_parse_state, $1, $2, &list);
+ ok = !parse_events_multi_pmu_add(parse_state, $1, $2, &list);
$2 = NULL;
}
if (!ok)
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:12:39

by Ian Rogers

Subject: [PATCH v1 28/40] perf test: Add cputype testing to perf stat

Check that a bogus PMU fails and that a known PMU succeeds. Limit the
known PMUs to cpu, cpu_atom and armv8_pmuv3_0.
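
For readers more at home in C than shell, the "find a known PMU" step can be sketched as below. It mirrors only the directory check (test -d "/sys/devices/$i"), not the fallback of probing with perf stat. The function name find_known_pmu and the base parameter are hypothetical, added so the lookup can be pointed at a directory other than /sys/devices.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Return the first candidate that exists as a directory under base,
 * or NULL if none does.
 */
const char *find_known_pmu(const char *base, const char *const *candidates,
			   size_t nr)
{
	char path[4096];
	struct stat st;

	for (size_t i = 0; i < nr; i++) {
		int n = snprintf(path, sizeof(path), "%s/%s", base, candidates[i]);

		/* Skip names that do not fit in the path buffer. */
		if (n < 0 || (size_t)n >= sizeof(path))
			continue;
		if (stat(path, &st) == 0 && S_ISDIR(st.st_mode))
			return candidates[i];
	}
	return NULL;
}
```

Called with base "/sys/devices" and the candidates cpu, cpu_atom and armv8_pmuv3_0, a NULL result corresponds to the "Skipped known PMU not found" branch of the shell test.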

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/shell/stat.sh | 44 ++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)

diff --git a/tools/perf/tests/shell/stat.sh b/tools/perf/tests/shell/stat.sh
index 2c1d3f704995..fe1283ca39d1 100755
--- a/tools/perf/tests/shell/stat.sh
+++ b/tools/perf/tests/shell/stat.sh
@@ -91,9 +91,53 @@ test_topdown_weak_groups() {
echo "Topdown weak groups test [Success]"
}

+test_cputype() {
+ # Test --cputype argument.
+ echo "cputype test"
+
+ # Bogus PMU should fail.
+ if perf stat --cputype="123" -e instructions true > /dev/null 2>&1
+ then
+ echo "cputype test [Bogus PMU didn't fail]"
+ err=1
+ return
+ fi
+
+ # Find a known PMU for cputype.
+ pmu=""
+ for i in cpu cpu_atom armv8_pmuv3_0
+ do
+ if test -d "/sys/devices/$i"
+ then
+ pmu="$i"
+ break
+ fi
+ if perf stat -e "$i/instructions/" true > /dev/null 2>&1
+ then
+ pmu="$i"
+ break
+ fi
+ done
+ if test "x$pmu" = "x"
+ then
+ echo "cputype test [Skipped known PMU not found]"
+ return
+ fi
+
+ # Test running with cputype produces output.
+ if ! perf stat --cputype="$pmu" -e instructions true 2>&1 | grep -E -q "instructions"
+ then
+ echo "cputype test [Failed count missed with given filter]"
+ err=1
+ return
+ fi
+ echo "cputype test [Success]"
+}
+
test_default_stat
test_stat_record_report
test_stat_repeat_weak_groups
test_topdown_groups
test_topdown_weak_groups
+test_cputype
exit $err
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:13:48

by Ian Rogers

Subject: [PATCH v1 38/40] perf vendor events intel: Correct alderlake metrics

Fix the metrics tma_memory_bound on alderlake cpu_core and
tma_microcode_sequencer on alderlake cpu_atom, where metrics had been
rewritten across PMUs. Fix MEM_BOUND_STALLS_AT_RET_CORRECTION, which is
an aux metric but lacks a hash prefix. Add PMU prefixes for
cpu_core/cpu_atom events to avoid wildcard opening of the events.
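
The prefixed form in the json is written pmu@event@ rather than pmu/event/, since '/' is division inside a metric expression. A minimal sketch of mapping the metric spelling back to the event-parser spelling, assuming a plain character substitution is all that is needed (metric_id_to_event is a hypothetical helper, not a perf function):

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy metric_id into dst, replacing every '@' with '/'.
 * Returns 0 on success, -1 if dst is too small.
 */
int metric_id_to_event(char *dst, size_t len, const char *metric_id)
{
	size_t n = strlen(metric_id);

	if (n >= len)
		return -1;

	/* i <= n so that the terminating NUL is copied as well. */
	for (size_t i = 0; i <= n; i++)
		dst[i] = metric_id[i] == '@' ? '/' : metric_id[i];
	return 0;
}
```

For example, cpu_atom@TOPDOWN_RETIRING.ALL@ becomes cpu_atom/TOPDOWN_RETIRING.ALL/, which names the event on one PMU instead of matching it on every PMU.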

Signed-off-by: Ian Rogers <[email protected]>
---
.../arch/x86/alderlake/adl-metrics.json | 238 +++++++++---------
.../arch/x86/alderlaken/adln-metrics.json | 6 +-
2 files changed, 122 insertions(+), 122 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
index 4c2a14ea5a1c..840f6f6fc8c5 100644
--- a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
@@ -151,7 +151,7 @@
},
{
"BriefDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear",
- "MetricExpr": "(tma_info_slots - (TOPDOWN_FE_BOUND.ALL + TOPDOWN_BE_BOUND.ALL + TOPDOWN_RETIRING.ALL)) / tma_info_slots",
+ "MetricExpr": "(tma_info_slots - (cpu_atom@TOPDOWN_FE_BOUND.ALL@ + cpu_atom@TOPDOWN_BE_BOUND.ALL@ + cpu_atom@TOPDOWN_RETIRING.ALL@)) / tma_info_slots",
"MetricGroup": "TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
@@ -162,7 +162,7 @@
},
{
"BriefDescription": "Counts the number of uops that are not from the microsequencer.",
- "MetricExpr": "(TOPDOWN_RETIRING.ALL - UOPS_RETIRED.MS) / tma_info_slots",
+ "MetricExpr": "(cpu_atom@TOPDOWN_RETIRING.ALL@ - cpu_atom@UOPS_RETIRED.MS@) / tma_info_slots",
"MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_base",
"MetricThreshold": "tma_base > 0.6",
@@ -229,7 +229,7 @@
},
{
"BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to memory disambiguation.",
- "MetricExpr": "tma_nuke * (MACHINE_CLEARS.DISAMBIGUATION / MACHINE_CLEARS.SLOW)",
+ "MetricExpr": "tma_nuke * (cpu_atom@MACHINE_CLEARS.DISAMBIGUATION@ / cpu_atom@MACHINE_CLEARS.SLOW@)",
"MetricGroup": "TopdownL4;tma_L4_group;tma_nuke_group",
"MetricName": "tma_disambiguation",
"MetricThreshold": "tma_disambiguation > 0.02",
@@ -239,7 +239,7 @@
{
"BriefDescription": "Counts the number of cycles the core is stalled due to a demand load miss which hit in DRAM or MMIO (Non-DRAM).",
"MetricConstraint": "NO_GROUP_EVENTS",
- "MetricExpr": "MEM_BOUND_STALLS.LOAD_DRAM_HIT / tma_info_clks - MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_DRAM_HIT / MEM_BOUND_STALLS.LOAD",
+ "MetricExpr": "cpu_atom@MEM_BOUND_STALLS.LOAD_DRAM_HIT@ / tma_info_clks - max((cpu_atom@MEM_BOUND_STALLS.LOAD@ - cpu_atom@LD_HEAD.L1_MISS_AT_RET@) / tma_info_clks, 0) * cpu_atom@MEM_BOUND_STALLS.LOAD_DRAM_HIT@ / cpu_atom@MEM_BOUND_STALLS.LOAD@",
"MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
"MetricName": "tma_dram_bound",
"MetricThreshold": "tma_dram_bound > 0.1",
@@ -277,7 +277,7 @@
},
{
"BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to FP assists.",
- "MetricExpr": "tma_nuke * (MACHINE_CLEARS.FP_ASSIST / MACHINE_CLEARS.SLOW)",
+ "MetricExpr": "tma_nuke * (cpu_atom@MACHINE_CLEARS.FP_ASSIST@ / cpu_atom@MACHINE_CLEARS.SLOW@)",
"MetricGroup": "TopdownL4;tma_L4_group;tma_nuke_group",
"MetricName": "tma_fp_assist",
"MetricThreshold": "tma_fp_assist > 0.02",
@@ -314,7 +314,7 @@
},
{
"BriefDescription": "Percentage of total non-speculative loads with a address aliasing block",
- "MetricExpr": "100 * LD_BLOCKS.4K_ALIAS / MEM_UOPS_RETIRED.ALL_LOADS",
+ "MetricExpr": "100 * cpu_atom@LD_BLOCKS.4K_ALIAS@ / MEM_UOPS_RETIRED.ALL_LOADS",
"MetricName": "tma_info_address_alias_blocks",
"Unit": "cpu_atom"
},
@@ -334,14 +334,14 @@
},
{
"BriefDescription": "",
- "MetricExpr": "CPU_CLK_UNHALTED.CORE",
+ "MetricExpr": "cpu_atom@CPU_CLK_UNHALTED.CORE@",
"MetricGroup": " ",
"MetricName": "tma_info_clks",
"Unit": "cpu_atom"
},
{
"BriefDescription": "",
- "MetricExpr": "CPU_CLK_UNHALTED.CORE_P",
+ "MetricExpr": "cpu_atom@CPU_CLK_UNHALTED.CORE_P@",
"MetricGroup": " ",
"MetricName": "tma_info_clks_p",
"Unit": "cpu_atom"
@@ -383,35 +383,35 @@
},
{
"BriefDescription": "Percentage of all uops which are FPDiv uops",
- "MetricExpr": "100 * UOPS_RETIRED.FPDIV / UOPS_RETIRED.ALL",
+ "MetricExpr": "100 * cpu_atom@UOPS_RETIRED.FPDIV@ / UOPS_RETIRED.ALL",
"MetricGroup": " ",
"MetricName": "tma_info_fpdiv_uop_ratio",
"Unit": "cpu_atom"
},
{
"BriefDescription": "Percentage of all uops which are IDiv uops",
- "MetricExpr": "100 * UOPS_RETIRED.IDIV / UOPS_RETIRED.ALL",
+ "MetricExpr": "100 * cpu_atom@UOPS_RETIRED.IDIV@ / UOPS_RETIRED.ALL",
"MetricGroup": " ",
"MetricName": "tma_info_idiv_uop_ratio",
"Unit": "cpu_atom"
},
{
"BriefDescription": "Percent of instruction miss cost that hit in DRAM",
- "MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_DRAM_HIT / MEM_BOUND_STALLS.IFETCH",
+ "MetricExpr": "100 * cpu_atom@MEM_BOUND_STALLS.IFETCH_DRAM_HIT@ / cpu_atom@MEM_BOUND_STALLS.IFETCH@",
"MetricGroup": " ",
"MetricName": "tma_info_inst_miss_cost_dramhit_percent",
"Unit": "cpu_atom"
},
{
"BriefDescription": "Percent of instruction miss cost that hit in the L2",
- "MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_L2_HIT / MEM_BOUND_STALLS.IFETCH",
+ "MetricExpr": "100 * cpu_atom@MEM_BOUND_STALLS.IFETCH_L2_HIT@ / cpu_atom@MEM_BOUND_STALLS.IFETCH@",
"MetricGroup": " ",
"MetricName": "tma_info_inst_miss_cost_l2hit_percent",
"Unit": "cpu_atom"
},
{
"BriefDescription": "Percent of instruction miss cost that hit in the L3",
- "MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_LLC_HIT / MEM_BOUND_STALLS.IFETCH",
+ "MetricExpr": "100 * cpu_atom@MEM_BOUND_STALLS.IFETCH_LLC_HIT@ / cpu_atom@MEM_BOUND_STALLS.IFETCH@",
"MetricGroup": " ",
"MetricName": "tma_info_inst_miss_cost_l3hit_percent",
"Unit": "cpu_atom"
@@ -439,7 +439,7 @@
},
{
"BriefDescription": "Instructions per Far Branch",
- "MetricExpr": "INST_RETIRED.ANY / (BR_INST_RETIRED.FAR_BRANCH / 2)",
+ "MetricExpr": "INST_RETIRED.ANY / (cpu_atom@BR_INST_RETIRED.FAR_BRANCH@ / 2)",
"MetricGroup": " ",
"MetricName": "tma_info_ipfarbranch",
"Unit": "cpu_atom"
@@ -453,7 +453,7 @@
},
{
"BriefDescription": "Instructions per retired conditional Branch Misprediction where the branch was not taken",
- "MetricExpr": "INST_RETIRED.ANY / (BR_MISP_RETIRED.COND - BR_MISP_RETIRED.COND_TAKEN)",
+ "MetricExpr": "INST_RETIRED.ANY / (cpu_atom@BR_MISP_RETIRED.COND@ - cpu_atom@BR_MISP_RETIRED.COND_TAKEN@)",
"MetricName": "tma_info_ipmisp_cond_ntaken",
"Unit": "cpu_atom"
},
@@ -498,20 +498,20 @@
},
{
"BriefDescription": "Percentage of total non-speculative loads that are splits",
- "MetricExpr": "100 * MEM_UOPS_RETIRED.SPLIT_LOADS / MEM_UOPS_RETIRED.ALL_LOADS",
+ "MetricExpr": "100 * cpu_atom@MEM_UOPS_RETIRED.SPLIT_LOADS@ / MEM_UOPS_RETIRED.ALL_LOADS",
"MetricName": "tma_info_load_splits",
"Unit": "cpu_atom"
},
{
"BriefDescription": "load ops retired per 1000 instruction",
- "MetricExpr": "1e3 * MEM_UOPS_RETIRED.ALL_LOADS / INST_RETIRED.ANY",
+ "MetricExpr": "1e3 * cpu_atom@MEM_UOPS_RETIRED.ALL_LOADS@ / INST_RETIRED.ANY",
"MetricGroup": " ",
"MetricName": "tma_info_memloadpki",
"Unit": "cpu_atom"
},
{
"BriefDescription": "Percentage of all uops which are ucode ops",
- "MetricExpr": "100 * UOPS_RETIRED.MS / UOPS_RETIRED.ALL",
+ "MetricExpr": "100 * cpu_atom@UOPS_RETIRED.MS@ / UOPS_RETIRED.ALL",
"MetricGroup": " ",
"MetricName": "tma_info_microcode_uop_ratio",
"Unit": "cpu_atom"
@@ -525,7 +525,7 @@
},
{
"BriefDescription": "Percentage of total non-speculative loads with a store forward or unknown store address block",
- "MetricExpr": "100 * LD_BLOCKS.DATA_UNKNOWN / MEM_UOPS_RETIRED.ALL_LOADS",
+ "MetricExpr": "100 * cpu_atom@LD_BLOCKS.DATA_UNKNOWN@ / MEM_UOPS_RETIRED.ALL_LOADS",
"MetricName": "tma_info_store_fwd_blocks",
"Unit": "cpu_atom"
},
@@ -545,7 +545,7 @@
},
{
"BriefDescription": "Percentage of all uops which are x87 uops",
- "MetricExpr": "100 * UOPS_RETIRED.X87 / UOPS_RETIRED.ALL",
+ "MetricExpr": "100 * cpu_atom@UOPS_RETIRED.X87@ / UOPS_RETIRED.ALL",
"MetricGroup": " ",
"MetricName": "tma_info_x87_uop_ratio",
"Unit": "cpu_atom"
@@ -571,7 +571,7 @@
{
"BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the L2 Cache.",
"MetricConstraint": "NO_GROUP_EVENTS",
- "MetricExpr": "MEM_BOUND_STALLS.LOAD_L2_HIT / tma_info_clks - MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_L2_HIT / MEM_BOUND_STALLS.LOAD",
+ "MetricExpr": "cpu_atom@MEM_BOUND_STALLS.LOAD_L2_HIT@ / tma_info_clks - max((cpu_atom@MEM_BOUND_STALLS.LOAD@ - cpu_atom@LD_HEAD.L1_MISS_AT_RET@) / tma_info_clks, 0) * cpu_atom@MEM_BOUND_STALLS.LOAD_L2_HIT@ / cpu_atom@MEM_BOUND_STALLS.LOAD@",
"MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
"MetricName": "tma_l2_bound",
"MetricThreshold": "tma_l2_bound > 0.1",
@@ -580,7 +580,7 @@
},
{
"BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the Last Level Cache (LLC) or other core with HITE/F/M.",
- "MetricExpr": "MEM_BOUND_STALLS.LOAD_LLC_HIT / tma_info_clks - MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_LLC_HIT / MEM_BOUND_STALLS.LOAD",
+ "MetricExpr": "cpu_atom@MEM_BOUND_STALLS.LOAD_LLC_HIT@ / tma_info_clks - max((cpu_atom@MEM_BOUND_STALLS.LOAD@ - cpu_atom@LD_HEAD.L1_MISS_AT_RET@) / tma_info_clks, 0) * cpu_atom@MEM_BOUND_STALLS.LOAD_LLC_HIT@ / cpu_atom@MEM_BOUND_STALLS.LOAD@",
"MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
"MetricName": "tma_l3_bound",
"MetricThreshold": "tma_l3_bound > 0.1",
@@ -589,7 +589,7 @@
},
{
"BriefDescription": "Counts the number of cycles, relative to the number of mem_scheduler slots, in which uops are blocked due to load buffer full",
- "MetricExpr": "tma_mem_scheduler * MEM_SCHEDULER_BLOCK.LD_BUF / MEM_SCHEDULER_BLOCK.ALL",
+ "MetricExpr": "tma_mem_scheduler * cpu_atom@MEM_SCHEDULER_BLOCK.LD_BUF@ / MEM_SCHEDULER_BLOCK.ALL",
"MetricGroup": "TopdownL4;tma_L4_group;tma_mem_scheduler_group",
"MetricName": "tma_ld_buffer",
"MetricThreshold": "tma_ld_buffer > 0.05",
@@ -617,7 +617,7 @@
},
{
"BriefDescription": "Counts the number of cycles the core is stalled due to stores or loads.",
- "MetricExpr": "min(tma_backend_bound, LD_HEAD.ANY_AT_RET / tma_info_clks + tma_store_bound)",
+ "MetricExpr": "min(cpu_atom@TOPDOWN_BE_BOUND.ALL@ / tma_info_slots, cpu_atom@LD_HEAD.ANY_AT_RET@ / tma_info_clks + tma_store_bound)",
"MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2",
@@ -627,7 +627,7 @@
},
{
"BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to memory ordering.",
- "MetricExpr": "tma_nuke * (MACHINE_CLEARS.MEMORY_ORDERING / MACHINE_CLEARS.SLOW)",
+ "MetricExpr": "tma_nuke * (cpu_atom@MACHINE_CLEARS.MEMORY_ORDERING@ / cpu_atom@MACHINE_CLEARS.SLOW@)",
"MetricGroup": "TopdownL4;tma_L4_group;tma_nuke_group",
"MetricName": "tma_memory_ordering",
"MetricThreshold": "tma_memory_ordering > 0.02",
@@ -636,7 +636,7 @@
},
{
"BriefDescription": "Counts the number of uops that are from the complex flows issued by the micro-sequencer (MS)",
- "MetricExpr": "tma_microcode_sequencer",
+ "MetricExpr": "UOPS_RETIRED.MS / tma_info_slots",
"MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_ms_uops",
"MetricThreshold": "tma_ms_uops > 0.05",
@@ -692,7 +692,7 @@
},
{
"BriefDescription": "Counts the number of uops retired excluding ms and fp div uops.",
- "MetricExpr": "(TOPDOWN_RETIRING.ALL - UOPS_RETIRED.MS - UOPS_RETIRED.FPDIV) / tma_info_slots",
+ "MetricExpr": "(cpu_atom@TOPDOWN_RETIRING.ALL@ - cpu_atom@UOPS_RETIRED.MS@ - cpu_atom@UOPS_RETIRED.FPDIV@) / tma_info_slots",
"MetricGroup": "TopdownL3;tma_L3_group;tma_base_group",
"MetricName": "tma_other_ret",
"MetricThreshold": "tma_other_ret > 0.3",
@@ -701,7 +701,7 @@
},
{
"BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to page faults.",
- "MetricExpr": "tma_nuke * (MACHINE_CLEARS.PAGE_FAULT / MACHINE_CLEARS.SLOW)",
+ "MetricExpr": "tma_nuke * (cpu_atom@MACHINE_CLEARS.PAGE_FAULT@ / cpu_atom@MACHINE_CLEARS.SLOW@)",
"MetricGroup": "TopdownL4;tma_L4_group;tma_nuke_group",
"MetricName": "tma_page_fault",
"MetricThreshold": "tma_page_fault > 0.02",
@@ -758,7 +758,7 @@
},
{
"BriefDescription": "Counts the number of cycles, relative to the number of mem_scheduler slots, in which uops are blocked due to RSV full relative",
- "MetricExpr": "tma_mem_scheduler * MEM_SCHEDULER_BLOCK.RSV / MEM_SCHEDULER_BLOCK.ALL",
+ "MetricExpr": "tma_mem_scheduler * cpu_atom@MEM_SCHEDULER_BLOCK.RSV@ / MEM_SCHEDULER_BLOCK.ALL",
"MetricGroup": "TopdownL4;tma_L4_group;tma_mem_scheduler_group",
"MetricName": "tma_rsv",
"MetricThreshold": "tma_rsv > 0.05",
@@ -776,7 +776,7 @@
},
{
"BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to SMC.",
- "MetricExpr": "tma_nuke * (MACHINE_CLEARS.SMC / MACHINE_CLEARS.SLOW)",
+ "MetricExpr": "tma_nuke * (cpu_atom@MACHINE_CLEARS.SMC@ / cpu_atom@MACHINE_CLEARS.SLOW@)",
"MetricGroup": "TopdownL4;tma_L4_group;tma_nuke_group",
"MetricName": "tma_smc",
"MetricThreshold": "tma_smc > 0.02",
@@ -812,7 +812,7 @@
},
{
"BriefDescription": "Counts the number of cycles the core is stalled due to store buffer full.",
- "MetricExpr": "tma_mem_scheduler * (MEM_SCHEDULER_BLOCK.ST_BUF / MEM_SCHEDULER_BLOCK.ALL)",
+ "MetricExpr": "tma_mem_scheduler * (cpu_atom@MEM_SCHEDULER_BLOCK.ST_BUF@ / cpu_atom@MEM_SCHEDULER_BLOCK.ALL@)",
"MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
"MetricName": "tma_store_bound",
"MetricThreshold": "tma_store_bound > 0.1",
@@ -830,7 +830,7 @@
},
{
"BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution ports for ALU operations.",
- "MetricExpr": "(UOPS_DISPATCHED.PORT_0 + UOPS_DISPATCHED.PORT_1 + UOPS_DISPATCHED.PORT_5_11 + UOPS_DISPATCHED.PORT_6) / (5 * tma_info_core_clks)",
+ "MetricExpr": "(cpu_core@UOPS_DISPATCHED.PORT_0@ + cpu_core@UOPS_DISPATCHED.PORT_1@ + cpu_core@UOPS_DISPATCHED.PORT_5_11@ + cpu_core@UOPS_DISPATCHED.PORT_6@) / (5 * tma_info_core_clks)",
"MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group",
"MetricName": "tma_alu_op_utilization",
"MetricThreshold": "tma_alu_op_utilization > 0.6",
@@ -849,7 +849,7 @@
},
{
"BriefDescription": "This metric estimates fraction of slots the CPU retired uops as a result of handing SSE to AVX* or AVX* to SSE transition Assists.",
- "MetricExpr": "63 * ASSISTS.SSE_AVX_MIX / tma_info_slots",
+ "MetricExpr": "63 * cpu_core@ASSISTS.SSE_AVX_MIX@ / tma_info_slots",
"MetricGroup": "HPC;TopdownL5;tma_L5_group;tma_assists_group",
"MetricName": "tma_avx_assists",
"MetricThreshold": "tma_avx_assists > 0.1",
@@ -858,7 +858,7 @@
},
{
"BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
- "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_slots",
+ "MetricExpr": "cpu_core@topdown\\-be\\-bound@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_slots",
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
@@ -880,7 +880,7 @@
},
{
"BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction",
- "MetricExpr": "topdown\\-br\\-mispredict / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_slots",
+ "MetricExpr": "cpu_core@topdown\\-br\\-mispredict@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_slots",
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
@@ -911,7 +911,7 @@
},
{
"BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Machine Clears",
- "MetricExpr": "(1 - tma_branch_mispredicts / tma_bad_speculation) * INT_MISC.CLEAR_RESTEER_CYCLES / tma_info_clks",
+ "MetricExpr": "(1 - tma_branch_mispredicts / tma_bad_speculation) * cpu_core@INT_MISC.CLEAR_RESTEER_CYCLES@ / tma_info_clks",
"MetricGroup": "BadSpec;MachineClears;TopdownL4;tma_L4_group;tma_branch_resteers_group;tma_issueMC",
"MetricName": "tma_clears_resteers",
"MetricThreshold": "tma_clears_resteers > 0.05 & (tma_branch_resteers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15))",
@@ -922,7 +922,7 @@
{
"BriefDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses",
"MetricConstraint": "NO_GROUP_EVENTS",
- "MetricExpr": "(25 * tma_info_average_frequency * (MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD * (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM / (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM + OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD))) + 24 * tma_info_average_frequency * MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS) * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_info_clks",
+ "MetricExpr": "(25 * tma_info_average_frequency * (cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD@ * (cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM@ / (cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM@ + cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD@))) + 24 * tma_info_average_frequency * cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS@) * (1 + cpu_core@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / 2) / tma_info_clks",
"MetricGroup": "DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_l3_bound_group",
"MetricName": "tma_contested_accesses",
"MetricThreshold": "tma_contested_accesses > 0.05 & (tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
@@ -944,7 +944,7 @@
{
"BriefDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses",
"MetricConstraint": "NO_GROUP_EVENTS",
- "MetricExpr": "24 * tma_info_average_frequency * (MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD + MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD * (1 - OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM / (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM + OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD))) * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_info_clks",
+ "MetricExpr": "24 * tma_info_average_frequency * (cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD@ + cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD@ * (1 - cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM@ / (cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM@ + cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD@))) * (1 + cpu_core@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / 2) / tma_info_clks",
"MetricGroup": "Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_l3_bound_group",
"MetricName": "tma_data_sharing",
"MetricThreshold": "tma_data_sharing > 0.05 & (tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
@@ -975,7 +975,7 @@
{
"BriefDescription": "This metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads",
"MetricConstraint": "NO_GROUP_EVENTS",
- "MetricExpr": "MEMORY_ACTIVITY.STALLS_L3_MISS / tma_info_clks",
+ "MetricExpr": "cpu_core@MEMORY_ACTIVITY.STALLS_L3_MISS@ / tma_info_clks",
"MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
"MetricName": "tma_dram_bound",
"MetricThreshold": "tma_dram_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)",
@@ -985,7 +985,7 @@
},
{
"BriefDescription": "This metric represents Core fraction of cycles in which CPU was likely limited due to DSB (decoded uop cache) fetch pipeline",
- "MetricExpr": "(IDQ.DSB_CYCLES_ANY - IDQ.DSB_CYCLES_OK) / tma_info_core_clks / 2",
+ "MetricExpr": "(cpu_core@IDQ.DSB_CYCLES_ANY@ - cpu_core@IDQ.DSB_CYCLES_OK@) / tma_info_core_clks / 2",
"MetricGroup": "DSB;FetchBW;TopdownL3;tma_L3_group;tma_fetch_bandwidth_group",
"MetricName": "tma_dsb",
"MetricThreshold": "tma_dsb > 0.15 & (tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 6 > 0.35)",
@@ -1005,7 +1005,7 @@
},
{
"BriefDescription": "This metric roughly estimates the fraction of cycles where the Data TLB (DTLB) was missed by load accesses",
- "MetricExpr": "min(7 * cpu_core@DTLB_LOAD_MISSES.STLB_HIT\\,cmask\\=1@ + DTLB_LOAD_MISSES.WALK_ACTIVE, max(CYCLE_ACTIVITY.CYCLES_MEM_ANY - MEMORY_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_clks",
+ "MetricExpr": "min(7 * cpu_core@DTLB_LOAD_MISSES.STLB_HIT\\,cmask\\=1@ + cpu_core@DTLB_LOAD_MISSES.WALK_ACTIVE@, max(cpu_core@CYCLE_ACTIVITY.CYCLES_MEM_ANY@ - cpu_core@MEMORY_ACTIVITY.CYCLES_L1D_MISS@, 0)) / tma_info_clks",
"MetricGroup": "MemoryTLB;TopdownL4;tma_L4_group;tma_issueTLB;tma_l1_bound_group",
"MetricName": "tma_dtlb_load",
"MetricThreshold": "tma_dtlb_load > 0.1 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
@@ -1015,7 +1015,7 @@
},
{
"BriefDescription": "This metric roughly estimates the fraction of cycles spent handling first-level data TLB store misses",
- "MetricExpr": "(7 * cpu_core@DTLB_STORE_MISSES.STLB_HIT\\,cmask\\=1@ + DTLB_STORE_MISSES.WALK_ACTIVE) / tma_info_core_clks",
+ "MetricExpr": "(7 * cpu_core@DTLB_STORE_MISSES.STLB_HIT\\,cmask\\=1@ + cpu_core@DTLB_STORE_MISSES.WALK_ACTIVE@) / tma_info_core_clks",
"MetricGroup": "MemoryTLB;TopdownL4;tma_L4_group;tma_issueTLB;tma_store_bound_group",
"MetricName": "tma_dtlb_store",
"MetricThreshold": "tma_dtlb_store > 0.05 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
@@ -1025,7 +1025,7 @@
},
{
"BriefDescription": "This metric roughly estimates how often CPU was handling synchronizations due to False Sharing",
- "MetricExpr": "28 * tma_info_average_frequency * OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM / tma_info_clks",
+ "MetricExpr": "28 * tma_info_average_frequency * cpu_core@OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM@ / tma_info_clks",
"MetricGroup": "DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_store_bound_group",
"MetricName": "tma_false_sharing",
"MetricThreshold": "tma_false_sharing > 0.05 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
@@ -1056,7 +1056,7 @@
},
{
"BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues",
- "MetricExpr": "topdown\\-fetch\\-lat / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_slots",
+ "MetricExpr": "cpu_core@topdown\\-fetch\\-lat@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) - cpu_core@INT_MISC.UOP_DROPPING@ / tma_info_slots",
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
@@ -1088,7 +1088,7 @@
},
{
"BriefDescription": "This metric roughly estimates fraction of slots the CPU retired uops as a result of handing Floating Point (FP) Assists",
- "MetricExpr": "30 * ASSISTS.FP / tma_info_slots",
+ "MetricExpr": "30 * cpu_core@ASSISTS.FP@ / tma_info_slots",
"MetricGroup": "HPC;TopdownL5;tma_L5_group;tma_assists_group",
"MetricName": "tma_fp_assists",
"MetricThreshold": "tma_fp_assists > 0.1",
@@ -1118,7 +1118,7 @@
},
{
"BriefDescription": "This metric approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors",
- "MetricExpr": "(FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE) / (tma_retiring * tma_info_slots)",
+ "MetricExpr": "(cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE@) / (tma_retiring * tma_info_slots)",
"MetricGroup": "Compute;Flops;TopdownL5;tma_L5_group;tma_fp_vector_group;tma_issue2P",
"MetricName": "tma_fp_vector_128b",
"MetricThreshold": "tma_fp_vector_128b > 0.1 & (tma_fp_vector > 0.1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6))",
@@ -1128,7 +1128,7 @@
},
{
"BriefDescription": "This metric approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors",
- "MetricExpr": "(FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE) / (tma_retiring * tma_info_slots)",
+ "MetricExpr": "(cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@) / (tma_retiring * tma_info_slots)",
"MetricGroup": "Compute;Flops;TopdownL5;tma_L5_group;tma_fp_vector_group;tma_issue2P",
"MetricName": "tma_fp_vector_256b",
"MetricThreshold": "tma_fp_vector_256b > 0.1 & (tma_fp_vector > 0.1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6))",
@@ -1138,7 +1138,7 @@
},
{
"BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
- "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_slots",
+ "MetricExpr": "cpu_core@topdown\\-fe\\-bound@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) - cpu_core@INT_MISC.UOP_DROPPING@ / tma_info_slots",
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
@@ -1149,7 +1149,7 @@
},
{
"BriefDescription": "This metric represents fraction of slots where the CPU was retiring fused instructions -- where one uop can represent multiple contiguous instructions",
- "MetricExpr": "tma_light_operations * INST_RETIRED.MACRO_FUSED / (tma_retiring * tma_info_slots)",
+ "MetricExpr": "tma_light_operations * cpu_core@INST_RETIRED.MACRO_FUSED@ / (tma_retiring * tma_info_slots)",
"MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_fused_instructions",
"MetricThreshold": "tma_fused_instructions > 0.1 & tma_light_operations > 0.6",
@@ -1159,7 +1159,7 @@
},
{
"BriefDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences",
- "MetricExpr": "topdown\\-heavy\\-ops / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_slots",
+ "MetricExpr": "cpu_core@topdown\\-heavy\\-ops@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_slots",
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
@@ -1213,7 +1213,7 @@
},
{
"BriefDescription": "Total pipeline cost of branch related instructions (used for program control-flow including function calls)",
- "MetricExpr": "100 * ((BR_INST_RETIRED.COND + 3 * BR_INST_RETIRED.NEAR_CALL + (BR_INST_RETIRED.NEAR_TAKEN - BR_INST_RETIRED.COND_TAKEN - 2 * BR_INST_RETIRED.NEAR_CALL)) / tma_info_slots)",
+ "MetricExpr": "100 * ((cpu_core@BR_INST_RETIRED.COND@ + 3 * cpu_core@BR_INST_RETIRED.NEAR_CALL@ + (cpu_core@BR_INST_RETIRED.NEAR_TAKEN@ - cpu_core@BR_INST_RETIRED.COND_TAKEN@ - 2 * cpu_core@BR_INST_RETIRED.NEAR_CALL@)) / tma_info_slots)",
"MetricGroup": "Ret;tma_issueBC",
"MetricName": "tma_info_branching_overhead",
"MetricThreshold": "tma_info_branching_overhead > 10",
@@ -1222,21 +1222,21 @@
},
{
"BriefDescription": "Fraction of branches that are CALL or RET",
- "MetricExpr": "(BR_INST_RETIRED.NEAR_CALL + BR_INST_RETIRED.NEAR_RETURN) / BR_INST_RETIRED.ALL_BRANCHES",
+ "MetricExpr": "(cpu_core@BR_INST_RETIRED.NEAR_CALL@ + cpu_core@BR_INST_RETIRED.NEAR_RETURN@) / BR_INST_RETIRED.ALL_BRANCHES",
"MetricGroup": "Bad;Branches",
"MetricName": "tma_info_callret",
"Unit": "cpu_core"
},
{
"BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
- "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
+ "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.THREAD@",
"MetricGroup": "Pipeline",
"MetricName": "tma_info_clks",
"Unit": "cpu_core"
},
{
"BriefDescription": "STLB (2nd level TLB) code speculative misses per kilo instruction (misses of any page-size that complete the page walk)",
- "MetricExpr": "1e3 * ITLB_MISSES.WALK_COMPLETED / INST_RETIRED.ANY",
+ "MetricExpr": "1e3 * cpu_core@ITLB_MISSES.WALK_COMPLETED@ / INST_RETIRED.ANY",
"MetricGroup": "Fed;MemoryTLB",
"MetricName": "tma_info_code_stlb_mpki",
"Unit": "cpu_core"
@@ -1266,7 +1266,7 @@
},
{
"BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core",
- "MetricExpr": "CPU_CLK_UNHALTED.DISTRIBUTED",
+ "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.DISTRIBUTED@",
"MetricGroup": "SMT",
"MetricName": "tma_info_core_clks",
"Unit": "cpu_core"
@@ -1309,7 +1309,7 @@
},
{
"BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)",
- "MetricExpr": "IDQ.DSB_UOPS / UOPS_ISSUED.ANY",
+ "MetricExpr": "IDQ.DSB_UOPS / cpu_core@UOPS_ISSUED.ANY@",
"MetricGroup": "DSB;Fed;FetchBW;tma_issueFB",
"MetricName": "tma_info_dsb_coverage",
"MetricThreshold": "tma_info_dsb_coverage < 0.7 & tma_info_ipc / 6 > 0.35",
@@ -1350,7 +1350,7 @@
},
{
"BriefDescription": "Fill Buffer (FB) hits per kilo instructions for retired demand loads (L1D misses that merge into ongoing miss-handling entries)",
- "MetricExpr": "1e3 * MEM_LOAD_RETIRED.FB_HIT / INST_RETIRED.ANY",
+ "MetricExpr": "1e3 * cpu_core@MEM_LOAD_RETIRED.FB_HIT@ / INST_RETIRED.ANY",
"MetricGroup": "CacheMisses;Mem",
"MetricName": "tma_info_fb_hpki",
"Unit": "cpu_core"
@@ -1365,7 +1365,7 @@
{
"BriefDescription": "Floating Point Operations Per Cycle",
"MetricConstraint": "NO_GROUP_EVENTS",
- "MetricExpr": "(FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * (FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE) / tma_info_core_clks",
+ "MetricExpr": "(cpu_core@FP_ARITH_INST_RETIRED.SCALAR_SINGLE@ + cpu_core@FP_ARITH_INST_RETIRED.SCALAR_DOUBLE@ + 2 * cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ + 4 * (cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE@ + cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE@) + 8 * cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@) / tma_info_core_clks",
"MetricGroup": "Flops;Ret",
"MetricName": "tma_info_flopc",
"Unit": "cpu_core"
@@ -1373,7 +1373,7 @@
{
"BriefDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width)",
"MetricConstraint": "NO_GROUP_EVENTS",
- "MetricExpr": "(FP_ARITH_DISPATCHED.PORT_0 + FP_ARITH_DISPATCHED.PORT_1 + FP_ARITH_DISPATCHED.PORT_5) / (2 * tma_info_core_clks)",
+ "MetricExpr": "(cpu_core@FP_ARITH_DISPATCHED.PORT_0@ + cpu_core@FP_ARITH_DISPATCHED.PORT_1@ + cpu_core@FP_ARITH_DISPATCHED.PORT_5@) / (2 * tma_info_core_clks)",
"MetricGroup": "Cor;Flops;HPC",
"MetricName": "tma_info_fp_arith_utilization",
"PublicDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width). Values > 1 are possible due to ([BDW+] Fused-Multiply Add (FMA) counting - common; [ADL+] use all of ADD/MUL/FMA in Scalar or 128/256-bit vectors - less common).",
@@ -1381,7 +1381,7 @@
},
{
"BriefDescription": "Giga Floating Point Operations Per Second",
- "MetricExpr": "(FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * (FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE) / 1e9 / duration_time",
+ "MetricExpr": "(cpu_core@FP_ARITH_INST_RETIRED.SCALAR_SINGLE@ + cpu_core@FP_ARITH_INST_RETIRED.SCALAR_DOUBLE@ + 2 * cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ + 4 * (cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE@ + cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE@) + 8 * cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@) / 1e9 / duration_time",
"MetricGroup": "Cor;Flops;HPC",
"MetricName": "tma_info_gflops",
"PublicDescription": "Giga Floating Point Operations Per Second. Aggregate across all supported options of: FP precisions, scalar and vector instructions, vector-width and AMX engine.",
@@ -1405,7 +1405,7 @@
},
{
"BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is execution) per-core",
- "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)",
+ "MetricExpr": "UOPS_EXECUTED.THREAD / (cpu_core@UOPS_EXECUTED.CORE_CYCLES_GE_1@ / 2 if #SMT_on else cpu_core@UOPS_EXECUTED.CORE_CYCLES_GE_1@)",
"MetricGroup": "Backend;Cor;Pipeline;PortsUtil",
"MetricName": "tma_info_ilp",
"Unit": "cpu_core"
@@ -1421,7 +1421,7 @@
},
{
"BriefDescription": "Total number of retired Instructions",
- "MetricExpr": "INST_RETIRED.ANY",
+ "MetricExpr": "cpu_core@INST_RETIRED.ANY@",
"MetricGroup": "Summary;TmaL1;tma_L1_group",
"MetricName": "tma_info_instructions",
"PublicDescription": "Total number of retired Instructions. Sample with: INST_RETIRED.PREC_DIST",
@@ -1438,7 +1438,7 @@
},
{
"BriefDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number means higher occurrence rate)",
- "MetricExpr": "INST_RETIRED.ANY / (FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE)",
+ "MetricExpr": "INST_RETIRED.ANY / (cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE@)",
"MetricGroup": "Flops;FpVector;InsType",
"MetricName": "tma_info_iparith_avx128",
"MetricThreshold": "tma_info_iparith_avx128 < 10",
@@ -1447,7 +1447,7 @@
},
{
"BriefDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means higher occurrence rate)",
- "MetricExpr": "INST_RETIRED.ANY / (FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE)",
+ "MetricExpr": "INST_RETIRED.ANY / (cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@)",
"MetricGroup": "Flops;FpVector;InsType",
"MetricName": "tma_info_iparith_avx256",
"MetricThreshold": "tma_info_iparith_avx256 < 10",
@@ -1514,7 +1514,7 @@
},
{
"BriefDescription": "Instructions per Far Branch ( Far Branches apply upon transition from application to operating system, handling interrupts, exceptions) [lower number means higher occurrence rate]",
- "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.FAR_BRANCH:u",
+ "MetricExpr": "INST_RETIRED.ANY / cpu_core@BR_INST_RETIRED.FAR_BRANCH@u",
"MetricGroup": "Branches;OS",
"MetricName": "tma_info_ipfarbranch",
"MetricThreshold": "tma_info_ipfarbranch < 1e6",
@@ -1522,7 +1522,7 @@
},
{
"BriefDescription": "Instructions per Floating Point (FP) Operation (lower number means higher occurrence rate)",
- "MetricExpr": "INST_RETIRED.ANY / (FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * (FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE)",
+ "MetricExpr": "INST_RETIRED.ANY / (cpu_core@FP_ARITH_INST_RETIRED.SCALAR_SINGLE@ + cpu_core@FP_ARITH_INST_RETIRED.SCALAR_DOUBLE@ + 2 * cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ + 4 * (cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE@ + cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE@) + 8 * cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@)",
"MetricGroup": "Flops;InsType",
"MetricName": "tma_info_ipflop",
"MetricThreshold": "tma_info_ipflop < 10",
@@ -1610,14 +1610,14 @@
},
{
"BriefDescription": "Fraction of branches that are unconditional (direct or indirect) jumps",
- "MetricExpr": "(BR_INST_RETIRED.NEAR_TAKEN - BR_INST_RETIRED.COND_TAKEN - 2 * BR_INST_RETIRED.NEAR_CALL) / BR_INST_RETIRED.ALL_BRANCHES",
+ "MetricExpr": "(cpu_core@BR_INST_RETIRED.NEAR_TAKEN@ - cpu_core@BR_INST_RETIRED.COND_TAKEN@ - 2 * cpu_core@BR_INST_RETIRED.NEAR_CALL@) / BR_INST_RETIRED.ALL_BRANCHES",
"MetricGroup": "Bad;Branches",
"MetricName": "tma_info_jump",
"Unit": "cpu_core"
},
{
"BriefDescription": "Cycles Per Instruction for the Operating System (OS) Kernel mode",
- "MetricExpr": "CPU_CLK_UNHALTED.THREAD_P:k / INST_RETIRED.ANY_P:k",
+ "MetricExpr": "CPU_CLK_UNHALTED.THREAD_P:k / cpu_core@INST_RETIRED.ANY_P@k",
"MetricGroup": "OS",
"MetricName": "tma_info_kernel_cpi",
"Unit": "cpu_core"
@@ -1632,7 +1632,7 @@
},
{
"BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]",
- "MetricExpr": "64 * L1D.REPLACEMENT / 1e9 / duration_time",
+ "MetricExpr": "64 * cpu_core@L1D.REPLACEMENT@ / 1e9 / duration_time",
"MetricGroup": "Mem;MemoryBW",
"MetricName": "tma_info_l1d_cache_fill_bw",
"Unit": "cpu_core"
@@ -1646,21 +1646,21 @@
},
{
"BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads",
- "MetricExpr": "1e3 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY",
+ "MetricExpr": "1e3 * cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / INST_RETIRED.ANY",
"MetricGroup": "CacheMisses;Mem",
"MetricName": "tma_info_l1mpki",
"Unit": "cpu_core"
},
{
"BriefDescription": "L1 cache true misses per kilo instruction for all demand loads (including speculative)",
- "MetricExpr": "1e3 * L2_RQSTS.ALL_DEMAND_DATA_RD / INST_RETIRED.ANY",
+ "MetricExpr": "1e3 * cpu_core@L2_RQSTS.ALL_DEMAND_DATA_RD@ / INST_RETIRED.ANY",
"MetricGroup": "CacheMisses;Mem",
"MetricName": "tma_info_l1mpki_load",
"Unit": "cpu_core"
},
{
"BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]",
- "MetricExpr": "64 * L2_LINES_IN.ALL / 1e9 / duration_time",
+ "MetricExpr": "64 * cpu_core@L2_LINES_IN.ALL@ / 1e9 / duration_time",
"MetricGroup": "Mem;MemoryBW",
"MetricName": "tma_info_l2_cache_fill_bw",
"Unit": "cpu_core"
@@ -1674,56 +1674,56 @@
},
{
"BriefDescription": "L2 cache hits per kilo instruction for all request types (including speculative)",
- "MetricExpr": "1e3 * (L2_RQSTS.REFERENCES - L2_RQSTS.MISS) / INST_RETIRED.ANY",
+ "MetricExpr": "1e3 * (cpu_core@L2_RQSTS.REFERENCES@ - cpu_core@L2_RQSTS.MISS@) / INST_RETIRED.ANY",
"MetricGroup": "CacheMisses;Mem",
"MetricName": "tma_info_l2hpki_all",
"Unit": "cpu_core"
},
{
"BriefDescription": "L2 cache hits per kilo instruction for all demand loads (including speculative)",
- "MetricExpr": "1e3 * L2_RQSTS.DEMAND_DATA_RD_HIT / INST_RETIRED.ANY",
+ "MetricExpr": "1e3 * cpu_core@L2_RQSTS.DEMAND_DATA_RD_HIT@ / INST_RETIRED.ANY",
"MetricGroup": "CacheMisses;Mem",
"MetricName": "tma_info_l2hpki_load",
"Unit": "cpu_core"
},
{
"BriefDescription": "L2 cache true misses per kilo instruction for retired demand loads",
- "MetricExpr": "1e3 * MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY",
+ "MetricExpr": "1e3 * cpu_core@MEM_LOAD_RETIRED.L2_MISS@ / INST_RETIRED.ANY",
"MetricGroup": "Backend;CacheMisses;Mem",
"MetricName": "tma_info_l2mpki",
"Unit": "cpu_core"
},
{
"BriefDescription": "L2 cache ([RKL+] true) misses per kilo instruction for all request types (including speculative)",
- "MetricExpr": "1e3 * L2_RQSTS.MISS / INST_RETIRED.ANY",
+ "MetricExpr": "1e3 * cpu_core@L2_RQSTS.MISS@ / INST_RETIRED.ANY",
"MetricGroup": "CacheMisses;Mem;Offcore",
"MetricName": "tma_info_l2mpki_all",
"Unit": "cpu_core"
},
{
"BriefDescription": "L2 cache true code cacheline misses per kilo instruction",
- "MetricExpr": "1e3 * FRONTEND_RETIRED.L2_MISS / INST_RETIRED.ANY",
+ "MetricExpr": "1e3 * cpu_core@FRONTEND_RETIRED.L2_MISS@ / INST_RETIRED.ANY",
"MetricGroup": "IcMiss",
"MetricName": "tma_info_l2mpki_code",
"Unit": "cpu_core"
},
{
"BriefDescription": "L2 cache speculative code cacheline misses per kilo instruction",
- "MetricExpr": "1e3 * L2_RQSTS.CODE_RD_MISS / INST_RETIRED.ANY",
+ "MetricExpr": "1e3 * cpu_core@L2_RQSTS.CODE_RD_MISS@ / INST_RETIRED.ANY",
"MetricGroup": "IcMiss",
"MetricName": "tma_info_l2mpki_code_all",
"Unit": "cpu_core"
},
{
"BriefDescription": "L2 cache ([RKL+] true) misses per kilo instruction for all demand loads (including speculative)",
- "MetricExpr": "1e3 * L2_RQSTS.DEMAND_DATA_RD_MISS / INST_RETIRED.ANY",
+ "MetricExpr": "1e3 * cpu_core@L2_RQSTS.DEMAND_DATA_RD_MISS@ / INST_RETIRED.ANY",
"MetricGroup": "CacheMisses;Mem",
"MetricName": "tma_info_l2mpki_load",
"Unit": "cpu_core"
},
{
"BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]",
- "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1e9 / duration_time",
+ "MetricExpr": "64 * cpu_core@OFFCORE_REQUESTS.ALL_REQUESTS@ / 1e9 / duration_time",
"MetricGroup": "Mem;MemoryBW;Offcore",
"MetricName": "tma_info_l3_cache_access_bw",
"Unit": "cpu_core"
@@ -1737,7 +1737,7 @@
},
{
"BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
- "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1e9 / duration_time",
+ "MetricExpr": "64 * cpu_core@LONGEST_LAT_CACHE.MISS@ / 1e9 / duration_time",
"MetricGroup": "Mem;MemoryBW",
"MetricName": "tma_info_l3_cache_fill_bw",
"Unit": "cpu_core"
@@ -1751,7 +1751,7 @@
},
{
"BriefDescription": "L3 cache true misses per kilo instruction for retired demand loads",
- "MetricExpr": "1e3 * MEM_LOAD_RETIRED.L3_MISS / INST_RETIRED.ANY",
+ "MetricExpr": "1e3 * cpu_core@MEM_LOAD_RETIRED.L3_MISS@ / INST_RETIRED.ANY",
"MetricGroup": "CacheMisses;Mem",
"MetricName": "tma_info_l3mpki",
"Unit": "cpu_core"
@@ -1786,14 +1786,14 @@
},
{
"BriefDescription": "STLB (2nd level TLB) data load speculative misses per kilo instruction (misses of any page-size that complete the page walk)",
- "MetricExpr": "1e3 * DTLB_LOAD_MISSES.WALK_COMPLETED / INST_RETIRED.ANY",
+ "MetricExpr": "1e3 * cpu_core@DTLB_LOAD_MISSES.WALK_COMPLETED@ / INST_RETIRED.ANY",
"MetricGroup": "Mem;MemoryTLB",
"MetricName": "tma_info_load_stlb_mpki",
"Unit": "cpu_core"
},
{
"BriefDescription": "Fraction of Uops delivered by the LSD (Loop Stream Detector; aka Loop Cache)",
- "MetricExpr": "LSD.UOPS / UOPS_ISSUED.ANY",
+ "MetricExpr": "LSD.UOPS / cpu_core@UOPS_ISSUED.ANY@",
"MetricGroup": "Fed;LSD",
"MetricName": "tma_info_lsd_coverage",
"Unit": "cpu_core"
@@ -1877,7 +1877,7 @@
},
{
"BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses",
- "MetricExpr": "(ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING) / (4 * tma_info_core_clks)",
+ "MetricExpr": "(cpu_core@ITLB_MISSES.WALK_PENDING@ + cpu_core@DTLB_LOAD_MISSES.WALK_PENDING@ + cpu_core@DTLB_STORE_MISSES.WALK_PENDING@) / (4 * tma_info_core_clks)",
"MetricGroup": "Mem;MemoryTLB",
"MetricName": "tma_info_page_walks_utilization",
"MetricThreshold": "tma_info_page_walks_utilization > 0.5",
@@ -1893,21 +1893,21 @@
},
{
"BriefDescription": "Total issue-pipeline slots (per-Physical Core till ICL; per-Logical Processor ICL onward)",
- "MetricExpr": "TOPDOWN.SLOTS",
+ "MetricExpr": "cpu_core@TOPDOWN.SLOTS@",
"MetricGroup": "TmaL1;tma_L1_group",
"MetricName": "tma_info_slots",
"Unit": "cpu_core"
},
{
"BriefDescription": "Fraction of Physical Core issue-slots utilized by this Logical Processor",
- "MetricExpr": "(tma_info_slots / (TOPDOWN.SLOTS / 2) if #SMT_on else 1)",
+ "MetricExpr": "(tma_info_slots / (cpu_core@TOPDOWN.SLOTS@ / 2) if #SMT_on else 1)",
"MetricGroup": "SMT;TmaL1;tma_L1_group",
"MetricName": "tma_info_slots_utilization",
"Unit": "cpu_core"
},
{
"BriefDescription": "Fraction of cycles where both hardware Logical Processors were active",
- "MetricExpr": "(1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_DISTRIBUTED if #SMT_on else 0)",
+ "MetricExpr": "(1 - cpu_core@CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE@ / cpu_core@CPU_CLK_UNHALTED.REF_DISTRIBUTED@ if #SMT_on else 0)",
"MetricGroup": "SMT",
"MetricName": "tma_info_smt_2t_utilization",
"Unit": "cpu_core"
@@ -1921,7 +1921,7 @@
},
{
"BriefDescription": "STLB (2nd level TLB) data store speculative misses per kilo instruction (misses of any page-size that complete the page walk)",
- "MetricExpr": "1e3 * DTLB_STORE_MISSES.WALK_COMPLETED / INST_RETIRED.ANY",
+ "MetricExpr": "1e3 * cpu_core@DTLB_STORE_MISSES.WALK_COMPLETED@ / INST_RETIRED.ANY",
"MetricGroup": "Mem;MemoryTLB",
"MetricName": "tma_info_store_stlb_mpki",
"Unit": "cpu_core"
@@ -1969,7 +1969,7 @@
},
{
"BriefDescription": "This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI (Vector Neural Network Instructions) uops fraction the CPU has retired",
- "MetricExpr": "(INT_VEC_RETIRED.ADD_128 + INT_VEC_RETIRED.VNNI_128) / (tma_retiring * tma_info_slots)",
+ "MetricExpr": "(cpu_core@INT_VEC_RETIRED.ADD_128@ + cpu_core@INT_VEC_RETIRED.VNNI_128@) / (tma_retiring * tma_info_slots)",
"MetricGroup": "Compute;IntVector;Pipeline;TopdownL4;tma_L4_group;tma_int_operations_group;tma_issue2P",
"MetricName": "tma_int_vector_128b",
"MetricThreshold": "tma_int_vector_128b > 0.1 & (tma_int_operations > 0.1 & tma_light_operations > 0.6)",
@@ -1979,7 +1979,7 @@
},
{
"BriefDescription": "This metric represents 256-bit vector Integer ADD/SUB/SAD or VNNI (Vector Neural Network Instructions) uops fraction the CPU has retired",
- "MetricExpr": "(INT_VEC_RETIRED.ADD_256 + INT_VEC_RETIRED.MUL_256 + INT_VEC_RETIRED.VNNI_256) / (tma_retiring * tma_info_slots)",
+ "MetricExpr": "(cpu_core@INT_VEC_RETIRED.ADD_256@ + cpu_core@INT_VEC_RETIRED.MUL_256@ + cpu_core@INT_VEC_RETIRED.VNNI_256@) / (tma_retiring * tma_info_slots)",
"MetricGroup": "Compute;IntVector;Pipeline;TopdownL4;tma_L4_group;tma_int_operations_group;tma_issue2P",
"MetricName": "tma_int_vector_256b",
"MetricThreshold": "tma_int_vector_256b > 0.1 & (tma_int_operations > 0.1 & tma_light_operations > 0.6)",
@@ -1999,7 +1999,7 @@
},
{
"BriefDescription": "This metric estimates how often the CPU was stalled without loads missing the L1 data cache",
- "MetricExpr": "max((EXE_ACTIVITY.BOUND_ON_LOADS - MEMORY_ACTIVITY.STALLS_L1D_MISS) / tma_info_clks, 0)",
+ "MetricExpr": "max((cpu_core@EXE_ACTIVITY.BOUND_ON_LOADS@ - cpu_core@MEMORY_ACTIVITY.STALLS_L1D_MISS@) / tma_info_clks, 0)",
"MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_issueL1;tma_issueMC;tma_memory_bound_group",
"MetricName": "tma_l1_bound",
"MetricThreshold": "tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)",
@@ -2010,7 +2010,7 @@
{
"BriefDescription": "This metric estimates how often the CPU was stalled due to L2 cache accesses by loads",
"MetricConstraint": "NO_GROUP_EVENTS",
- "MetricExpr": "(MEMORY_ACTIVITY.STALLS_L1D_MISS - MEMORY_ACTIVITY.STALLS_L2_MISS) / tma_info_clks",
+ "MetricExpr": "(cpu_core@MEMORY_ACTIVITY.STALLS_L1D_MISS@ - cpu_core@MEMORY_ACTIVITY.STALLS_L2_MISS@) / tma_info_clks",
"MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
"MetricName": "tma_l2_bound",
"MetricThreshold": "tma_l2_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)",
@@ -2020,7 +2020,7 @@
},
{
"BriefDescription": "This metric estimates how often the CPU was stalled due to loads accesses to L3 cache or contended with a sibling Core",
- "MetricExpr": "(MEMORY_ACTIVITY.STALLS_L2_MISS - MEMORY_ACTIVITY.STALLS_L3_MISS) / tma_info_clks",
+ "MetricExpr": "(cpu_core@MEMORY_ACTIVITY.STALLS_L2_MISS@ - cpu_core@MEMORY_ACTIVITY.STALLS_L3_MISS@) / tma_info_clks",
"MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
"MetricName": "tma_l3_bound",
"MetricThreshold": "tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)",
@@ -2030,7 +2030,7 @@
},
{
"BriefDescription": "This metric represents fraction of cycles with demand load accesses that hit the L3 cache under unloaded scenarios (possibly L3 latency limited)",
- "MetricExpr": "9 * tma_info_average_frequency * MEM_LOAD_RETIRED.L3_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_info_clks",
+ "MetricExpr": "9 * tma_info_average_frequency * cpu_core@MEM_LOAD_RETIRED.L3_HIT@ * (1 + cpu_core@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / 2) / tma_info_clks",
"MetricGroup": "MemoryLat;TopdownL4;tma_L4_group;tma_issueLat;tma_l3_bound_group",
"MetricName": "tma_l3_hit_latency",
"MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
@@ -2090,7 +2090,7 @@
{
"BriefDescription": "This metric represents fraction of cycles the CPU spent handling cache misses due to lock operations",
"MetricConstraint": "NO_GROUP_EVENTS",
- "MetricExpr": "(16 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS.ALL_RFO) + MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES * (10 * L2_RQSTS.RFO_HIT + min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO))) / tma_info_clks",
+ "MetricExpr": "(16 * max(0, cpu_core@MEM_INST_RETIRED.LOCK_LOADS@ - cpu_core@L2_RQSTS.ALL_RFO@) + cpu_core@MEM_INST_RETIRED.LOCK_LOADS@ / cpu_core@MEM_INST_RETIRED.ALL_STORES@ * (10 * cpu_core@L2_RQSTS.RFO_HIT@ + min(cpu_core@CPU_CLK_UNHALTED.THREAD@, cpu_core@OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO@))) / tma_info_clks",
"MetricGroup": "Offcore;TopdownL4;tma_L4_group;tma_issueRFO;tma_l1_bound_group",
"MetricName": "tma_lock_latency",
"MetricThreshold": "tma_lock_latency > 0.2 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
@@ -2100,7 +2100,7 @@
},
{
"BriefDescription": "This metric represents Core fraction of cycles in which CPU was likely limited due to LSD (Loop Stream Detector) unit",
- "MetricExpr": "(LSD.CYCLES_ACTIVE - LSD.CYCLES_OK) / tma_info_core_clks / 2",
+ "MetricExpr": "(cpu_core@LSD.CYCLES_ACTIVE@ - cpu_core@LSD.CYCLES_OK@) / tma_info_core_clks / 2",
"MetricGroup": "FetchBW;LSD;TopdownL3;tma_L3_group;tma_fetch_bandwidth_group",
"MetricName": "tma_lsd",
"MetricThreshold": "tma_lsd > 0.15 & (tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 6 > 0.35)",
@@ -2121,7 +2121,7 @@
},
{
"BriefDescription": "This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory (DRAM)",
- "MetricExpr": "min(CPU_CLK_UNHALTED.THREAD, cpu_core@OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD\\,cmask\\=4@) / tma_info_clks",
+ "MetricExpr": "min(cpu_core@CPU_CLK_UNHALTED.THREAD@, cpu_core@OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD\\,cmask\\=4@) / tma_info_clks",
"MetricGroup": "MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_dram_bound_group;tma_issueBW",
"MetricName": "tma_mem_bandwidth",
"MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
@@ -2131,7 +2131,7 @@
},
{
"BriefDescription": "This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory (DRAM)",
- "MetricExpr": "min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD) / tma_info_clks - tma_mem_bandwidth",
+ "MetricExpr": "min(cpu_core@CPU_CLK_UNHALTED.THREAD@, cpu_core@OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD@) / tma_info_clks - tma_mem_bandwidth",
"MetricGroup": "MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_dram_bound_group;tma_issueLat",
"MetricName": "tma_mem_latency",
"MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
@@ -2141,7 +2141,7 @@
},
{
"BriefDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck",
- "MetricExpr": "min(tma_backend_bound, LD_HEAD.ANY_AT_RET / tma_info_clks + tma_store_bound)",
+ "MetricExpr": "cpu_core@topdown\\-mem\\-bound@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_slots",
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
@@ -2152,7 +2152,7 @@
},
{
"BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to LFENCE Instructions.",
- "MetricExpr": "13 * MISC2_RETIRED.LFENCE / tma_info_clks",
+ "MetricExpr": "13 * cpu_core@MISC2_RETIRED.LFENCE@ / tma_info_clks",
"MetricGroup": "TopdownL6;tma_L6_group;tma_serializing_operation_group",
"MetricName": "tma_memory_fence",
"MetricThreshold": "tma_memory_fence > 0.05 & (tma_serializing_operation > 0.1 & (tma_ports_utilized_0 > 0.2 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))))",
@@ -2162,7 +2162,7 @@
{
"BriefDescription": "This metric represents fraction of slots where the CPU was retiring memory operations -- uops for memory load or store accesses.",
"MetricConstraint": "NO_GROUP_EVENTS",
- "MetricExpr": "tma_light_operations * MEM_UOP_RETIRED.ANY / (tma_retiring * tma_info_slots)",
+ "MetricExpr": "tma_light_operations * cpu_core@MEM_UOP_RETIRED.ANY@ / (tma_retiring * tma_info_slots)",
"MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_memory_operations",
"MetricThreshold": "tma_memory_operations > 0.1 & tma_light_operations > 0.6",
@@ -2181,7 +2181,7 @@
},
{
"BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Branch Misprediction at execution stage",
- "MetricExpr": "tma_branch_mispredicts / tma_bad_speculation * INT_MISC.CLEAR_RESTEER_CYCLES / tma_info_clks",
+ "MetricExpr": "tma_branch_mispredicts / tma_bad_speculation * cpu_core@INT_MISC.CLEAR_RESTEER_CYCLES@ / tma_info_clks",
"MetricGroup": "BadSpec;BrMispredicts;TopdownL4;tma_L4_group;tma_branch_resteers_group;tma_issueBM",
"MetricName": "tma_mispredicts_resteers",
"MetricThreshold": "tma_mispredicts_resteers > 0.05 & (tma_branch_resteers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15))",
@@ -2191,7 +2191,7 @@
},
{
"BriefDescription": "This metric represents Core fraction of cycles in which CPU was likely limited due to the MITE pipeline (the legacy decode pipeline)",
- "MetricExpr": "(IDQ.MITE_CYCLES_ANY - IDQ.MITE_CYCLES_OK) / tma_info_core_clks / 2",
+ "MetricExpr": "(cpu_core@IDQ.MITE_CYCLES_ANY@ - cpu_core@IDQ.MITE_CYCLES_OK@) / tma_info_core_clks / 2",
"MetricGroup": "DSBmiss;FetchBW;TopdownL3;tma_L3_group;tma_fetch_bandwidth_group",
"MetricName": "tma_mite",
"MetricThreshold": "tma_mite > 0.1 & (tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 6 > 0.35)",
@@ -2201,7 +2201,7 @@
},
{
"BriefDescription": "The Mixing_Vectors metric gives the percentage of injected blend uops out of all uops issued",
- "MetricExpr": "160 * ASSISTS.SSE_AVX_MIX / tma_info_clks",
+ "MetricExpr": "160 * cpu_core@ASSISTS.SSE_AVX_MIX@ / tma_info_clks",
"MetricGroup": "TopdownL5;tma_L5_group;tma_issueMV;tma_ports_utilized_0_group",
"MetricName": "tma_mixing_vectors",
"MetricThreshold": "tma_mixing_vectors > 0.05",
@@ -2211,7 +2211,7 @@
},
{
"BriefDescription": "This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS)",
- "MetricExpr": "3 * cpu_core@UOPS_RETIRED.MS\\,cmask\\=1\\,edge@ / (tma_retiring * tma_info_slots / UOPS_ISSUED.ANY) / tma_info_clks",
+ "MetricExpr": "3 * cpu_core@UOPS_RETIRED.MS\\,cmask\\=1\\,edge@ / (tma_retiring * tma_info_slots / cpu_core@UOPS_ISSUED.ANY@) / tma_info_clks",
"MetricGroup": "FetchLat;MicroSeq;TopdownL3;tma_L3_group;tma_fetch_latency_group;tma_issueMC;tma_issueMS;tma_issueMV;tma_issueSO",
"MetricName": "tma_ms_switches",
"MetricThreshold": "tma_ms_switches > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)",
@@ -2221,7 +2221,7 @@
},
{
"BriefDescription": "This metric represents fraction of slots where the CPU was retiring branch instructions that were not fused",
- "MetricExpr": "tma_light_operations * (BR_INST_RETIRED.ALL_BRANCHES - INST_RETIRED.MACRO_FUSED) / (tma_retiring * tma_info_slots)",
+ "MetricExpr": "tma_light_operations * (cpu_core@BR_INST_RETIRED.ALL_BRANCHES@ - cpu_core@INST_RETIRED.MACRO_FUSED@) / (tma_retiring * tma_info_slots)",
"MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_non_fused_branches",
"MetricThreshold": "tma_non_fused_branches > 0.1 & tma_light_operations > 0.6",
@@ -2231,7 +2231,7 @@
},
{
"BriefDescription": "This metric represents fraction of slots where the CPU was retiring NOP (no op) instructions",
- "MetricExpr": "tma_light_operations * INST_RETIRED.NOP / (tma_retiring * tma_info_slots)",
+ "MetricExpr": "tma_light_operations * cpu_core@INST_RETIRED.NOP@ / (tma_retiring * tma_info_slots)",
"MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_nop_instructions",
"MetricThreshold": "tma_nop_instructions > 0.1 & tma_light_operations > 0.6",
@@ -2252,7 +2252,7 @@
},
{
"BriefDescription": "This metric roughly estimates fraction of slots the CPU retired uops as a result of handing Page Faults",
- "MetricExpr": "99 * ASSISTS.PAGE_FAULT / tma_info_slots",
+ "MetricExpr": "99 * cpu_core@ASSISTS.PAGE_FAULT@ / tma_info_slots",
"MetricGroup": "TopdownL5;tma_L5_group;tma_assists_group",
"MetricName": "tma_page_faults",
"MetricThreshold": "tma_page_faults > 0.05",
@@ -2292,7 +2292,7 @@
},
{
"BriefDescription": "This metric estimates fraction of cycles the CPU performance was potentially limited due to Core computation issues (non divider-related)",
- "MetricExpr": "((cpu_core@EXE_ACTIVITY.3_PORTS_UTIL\\,umask\\=0x80@ + tma_serializing_operation * (CYCLE_ACTIVITY.STALLS_TOTAL - EXE_ACTIVITY.BOUND_ON_LOADS) + (EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring * cpu_core@EXE_ACTIVITY.2_PORTS_UTIL\\,umask\\=0xc@)) / tma_info_clks if ARITH.DIV_ACTIVE < CYCLE_ACTIVITY.STALLS_TOTAL - EXE_ACTIVITY.BOUND_ON_LOADS else (EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring * cpu_core@EXE_ACTIVITY.2_PORTS_UTIL\\,umask\\=0xc@) / tma_info_clks)",
+ "MetricExpr": "((cpu_core@EXE_ACTIVITY.3_PORTS_UTIL\\,umask\\=0x80@ + tma_serializing_operation * (cpu_core@CYCLE_ACTIVITY.STALLS_TOTAL@ - cpu_core@EXE_ACTIVITY.BOUND_ON_LOADS@) + (cpu_core@EXE_ACTIVITY.1_PORTS_UTIL@ + tma_retiring * cpu_core@EXE_ACTIVITY.2_PORTS_UTIL\\,umask\\=0xc@)) / tma_info_clks if cpu_core@ARITH.DIV_ACTIVE@ < cpu_core@CYCLE_ACTIVITY.STALLS_TOTAL@ - cpu_core@EXE_ACTIVITY.BOUND_ON_LOADS@ else (cpu_core@EXE_ACTIVITY.1_PORTS_UTIL@ + tma_retiring * cpu_core@EXE_ACTIVITY.2_PORTS_UTIL\\,umask\\=0xc@) / tma_info_clks)",
"MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_group",
"MetricName": "tma_ports_utilization",
"MetricThreshold": "tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2)",
@@ -2302,7 +2302,7 @@
},
{
"BriefDescription": "This metric represents fraction of cycles CPU executed no uops on any execution port (Logical Processor cycles since ICL, Physical Core cycles otherwise)",
- "MetricExpr": "cpu_core@EXE_ACTIVITY.3_PORTS_UTIL\\,umask\\=0x80@ / tma_info_clks + tma_serializing_operation * (CYCLE_ACTIVITY.STALLS_TOTAL - EXE_ACTIVITY.BOUND_ON_LOADS) / tma_info_clks",
+ "MetricExpr": "cpu_core@EXE_ACTIVITY.3_PORTS_UTIL\\,umask\\=0x80@ / tma_info_clks + tma_serializing_operation * (cpu_core@CYCLE_ACTIVITY.STALLS_TOTAL@ - cpu_core@EXE_ACTIVITY.BOUND_ON_LOADS@) / tma_info_clks",
"MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utilization_group",
"MetricName": "tma_ports_utilized_0",
"MetricThreshold": "tma_ports_utilized_0 > 0.2 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))",
@@ -2342,7 +2342,7 @@
},
{
"BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
- "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_slots",
+ "MetricExpr": "cpu_core@topdown\\-retiring@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_slots",
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
@@ -2382,7 +2382,7 @@
},
{
"BriefDescription": "This metric estimates fraction of cycles handling memory load split accesses - load that cross 64-byte cache line boundary",
- "MetricExpr": "tma_info_load_miss_real_latency * LD_BLOCKS.NO_SR / tma_info_clks",
+ "MetricExpr": "tma_info_load_miss_real_latency * cpu_core@LD_BLOCKS.NO_SR@ / tma_info_clks",
"MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
"MetricName": "tma_split_loads",
"MetricThreshold": "tma_split_loads > 0.2 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
@@ -2402,7 +2402,7 @@
},
{
"BriefDescription": "This metric measures fraction of cycles where the Super Queue (SQ) was full taking into account all request-types and both hardware SMT threads (Logical Processors)",
- "MetricExpr": "(XQ.FULL_CYCLES + L1D_PEND_MISS.L2_STALLS) / tma_info_clks",
+ "MetricExpr": "(cpu_core@XQ.FULL_CYCLES@ + cpu_core@L1D_PEND_MISS.L2_STALLS@) / tma_info_clks",
"MetricGroup": "MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_issueBW;tma_l3_bound_group",
"MetricName": "tma_sq_full",
"MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
@@ -2422,7 +2422,7 @@
},
{
"BriefDescription": "This metric roughly estimates fraction of cycles when the memory subsystem had loads blocked since they could not forward data from earlier (in program order) overlapping stores",
- "MetricExpr": "13 * LD_BLOCKS.STORE_FORWARD / tma_info_clks",
+ "MetricExpr": "13 * cpu_core@LD_BLOCKS.STORE_FORWARD@ / tma_info_clks",
"MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
"MetricName": "tma_store_fwd_blk",
"MetricThreshold": "tma_store_fwd_blk > 0.1 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
@@ -2432,7 +2432,7 @@
},
{
"BriefDescription": "This metric estimates fraction of cycles the CPU spent handling L1D store misses",
- "MetricExpr": "(MEM_STORE_RETIRED.L2_HIT * 10 * (1 - MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES) + (1 - MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES) * min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO)) / tma_info_clks",
+ "MetricExpr": "(cpu_core@MEM_STORE_RETIRED.L2_HIT@ * 10 * (1 - cpu_core@MEM_INST_RETIRED.LOCK_LOADS@ / cpu_core@MEM_INST_RETIRED.ALL_STORES@) + (1 - cpu_core@MEM_INST_RETIRED.LOCK_LOADS@ / cpu_core@MEM_INST_RETIRED.ALL_STORES@) * min(cpu_core@CPU_CLK_UNHALTED.THREAD@, cpu_core@OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO@)) / tma_info_clks",
"MetricGroup": "MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_issueRFO;tma_issueSL;tma_store_bound_group",
"MetricName": "tma_store_latency",
"MetricThreshold": "tma_store_latency > 0.1 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
@@ -2442,7 +2442,7 @@
},
{
"BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port for Store operations",
- "MetricExpr": "(UOPS_DISPATCHED.PORT_4_9 + UOPS_DISPATCHED.PORT_7_8) / (4 * tma_info_core_clks)",
+ "MetricExpr": "(cpu_core@UOPS_DISPATCHED.PORT_4_9@ + cpu_core@UOPS_DISPATCHED.PORT_7_8@) / (4 * tma_info_core_clks)",
"MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group",
"MetricName": "tma_store_op_utilization",
"MetricThreshold": "tma_store_op_utilization > 0.6",
@@ -2470,7 +2470,7 @@
},
{
"BriefDescription": "This metric estimates how often CPU was stalled due to Streaming store memory accesses; Streaming store optimize out a read request required by RFO stores",
- "MetricExpr": "9 * OCR.STREAMING_WR.ANY_RESPONSE / tma_info_clks",
+ "MetricExpr": "9 * cpu_core@OCR.STREAMING_WR.ANY_RESPONSE@ / tma_info_clks",
"MetricGroup": "MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_issueSmSt;tma_store_bound_group",
"MetricName": "tma_streaming_stores",
"MetricThreshold": "tma_streaming_stores > 0.2 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
@@ -2490,7 +2490,7 @@
},
{
"BriefDescription": "This metric serves as an approximation of legacy x87 usage",
- "MetricExpr": "tma_retiring * UOPS_EXECUTED.X87 / UOPS_EXECUTED.THREAD",
+ "MetricExpr": "tma_retiring * cpu_core@UOPS_EXECUTED.X87@ / UOPS_EXECUTED.THREAD",
"MetricGroup": "Compute;TopdownL4;tma_L4_group;tma_fp_arith_group",
"MetricName": "tma_x87_use",
"MetricThreshold": "tma_x87_use > 0.1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6)",
diff --git a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json b/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
index 0402adbf7d92..f4b3c3883643 100644
--- a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
@@ -193,7 +193,7 @@
{
"BriefDescription": "Counts the number of cycles the core is stalled due to a demand load miss which hit in DRAM or MMIO (Non-DRAM).",
"MetricConstraint": "NO_GROUP_EVENTS",
- "MetricExpr": "MEM_BOUND_STALLS.LOAD_DRAM_HIT / tma_info_clks - MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_DRAM_HIT / MEM_BOUND_STALLS.LOAD",
+ "MetricExpr": "MEM_BOUND_STALLS.LOAD_DRAM_HIT / tma_info_clks - max((MEM_BOUND_STALLS.LOAD - LD_HEAD.L1_MISS_AT_RET) / tma_info_clks, 0) * MEM_BOUND_STALLS.LOAD_DRAM_HIT / MEM_BOUND_STALLS.LOAD",
"MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
"MetricName": "tma_dram_bound",
"MetricThreshold": "tma_dram_bound > 0.1",
@@ -480,7 +480,7 @@
{
"BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the L2 Cache.",
"MetricConstraint": "NO_GROUP_EVENTS",
- "MetricExpr": "MEM_BOUND_STALLS.LOAD_L2_HIT / tma_info_clks - MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_L2_HIT / MEM_BOUND_STALLS.LOAD",
+ "MetricExpr": "MEM_BOUND_STALLS.LOAD_L2_HIT / tma_info_clks - max((MEM_BOUND_STALLS.LOAD - LD_HEAD.L1_MISS_AT_RET) / tma_info_clks, 0) * MEM_BOUND_STALLS.LOAD_L2_HIT / MEM_BOUND_STALLS.LOAD",
"MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
"MetricName": "tma_l2_bound",
"MetricThreshold": "tma_l2_bound > 0.1",
@@ -488,7 +488,7 @@
},
{
"BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the Last Level Cache (LLC) or other core with HITE/F/M.",
- "MetricExpr": "MEM_BOUND_STALLS.LOAD_LLC_HIT / tma_info_clks - MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_LLC_HIT / MEM_BOUND_STALLS.LOAD",
+ "MetricExpr": "MEM_BOUND_STALLS.LOAD_LLC_HIT / tma_info_clks - max((MEM_BOUND_STALLS.LOAD - LD_HEAD.L1_MISS_AT_RET) / tma_info_clks, 0) * MEM_BOUND_STALLS.LOAD_LLC_HIT / MEM_BOUND_STALLS.LOAD",
"MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
"MetricName": "tma_l3_bound",
"MetricThreshold": "tma_l3_bound > 0.1",
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:19:14

by Ian Rogers

Subject: [PATCH v1 34/40] perf parse-events: Don't reorder atom cpu events

On hybrid systems the topdown events don't share a fixed counter on
the atom core, so they don't require the sorting that PMUs supporting
perf metrics do.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/arch/x86/util/evlist.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
index d4193479a364..1b6065841fb0 100644
--- a/tools/perf/arch/x86/util/evlist.c
+++ b/tools/perf/arch/x86/util/evlist.c
@@ -6,6 +6,7 @@
#include "util/event.h"
#include "util/pmu-hybrid.h"
#include "topdown.h"
+#include "evsel.h"

static int ___evlist__add_default_attrs(struct evlist *evlist,
struct perf_event_attr *attrs,
@@ -67,8 +68,7 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,

int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
{
- if (topdown_sys_has_perf_metrics() &&
- (!lhs->pmu_name || !strncmp(lhs->pmu_name, "cpu", 3))) {
+ if (topdown_sys_has_perf_metrics() && evsel__sys_has_perf_metrics(lhs)) {
/* Ensure the topdown slots comes first. */
if (strcasestr(lhs->name, "slots"))
return -1;
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:19:22

by Ian Rogers

Subject: [PATCH v1 26/40] perf parse-events: Add pmu filter

To support the cputype argument added to "perf stat" for hybrid, it is
necessary to filter events during wildcard matching. Add a scanner
argument for the filter and check it when wildcard matching.
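
As a rough illustration of the filtering described above, here is a
minimal standalone sketch. The struct and function names (sketch_pmu,
sketch_filter_pmu, sketch_count_matching) are hypothetical stand-ins and
not the perf code itself; the assumption is only that a NULL filter
means "match every PMU" and a non-NULL filter must equal the PMU name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for perf's struct perf_pmu; only the name matters here. */
struct sketch_pmu {
	const char *name;
};

/*
 * Returns true when the PMU should be skipped during wildcard matching,
 * false when it should be considered. A NULL filter disables filtering.
 */
bool sketch_filter_pmu(const char *pmu_filter, const struct sketch_pmu *pmu)
{
	assert(pmu != NULL);

	if (pmu_filter == NULL)
		return false;

	/* A PMU without a name can never equal a requested filter. */
	if (pmu->name == NULL)
		return true;

	return strcmp(pmu_filter, pmu->name) != 0;
}

/* Counts how many PMUs in an array survive the filter. */
size_t sketch_count_matching(const char *pmu_filter,
			     const struct sketch_pmu *pmus, size_t nr)
{
	size_t i, kept = 0;

	for (i = 0; i < nr; i++) {
		if (sketch_filter_pmu(pmu_filter, &pmus[i]))
			continue;
		kept++;
	}
	return kept;
}
```

Each wildcard loop in the parser would apply the same "skip if
filtered" check before doing any per-PMU work.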

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/builtin-record.c | 13 +++++++--
tools/perf/builtin-stat.c | 10 +++++--
tools/perf/builtin-top.c | 5 +++-
tools/perf/builtin-trace.c | 5 +++-
tools/perf/tests/parse-events.c | 3 +-
tools/perf/tests/pmu-events.c | 3 +-
tools/perf/util/evlist.h | 1 -
tools/perf/util/metricgroup.c | 4 +--
tools/perf/util/parse-events.c | 51 ++++++++++++++++++++++++---------
tools/perf/util/parse-events.h | 21 ++++++++++----
tools/perf/util/parse-events.y | 6 ++--
11 files changed, 90 insertions(+), 32 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 7b7e74a56346..7e4490dfc0b5 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -3335,6 +3335,14 @@ const char record_callchain_help[] = CALLCHAIN_RECORD_HELP

static bool dry_run;

+static struct parse_events_option_args parse_events_option_args = {
+ .evlistp = &record.evlist,
+};
+
+static struct parse_events_option_args switch_output_parse_events_option_args = {
+ .evlistp = &record.sb_evlist,
+};
+
/*
* XXX Will stay a global variable till we fix builtin-script.c to stop messing
* with it and switch to use the library functions in perf_evlist that came
@@ -3343,7 +3351,7 @@ static bool dry_run;
* using pipes, etc.
*/
static struct option __record_options[] = {
- OPT_CALLBACK('e', "event", &record.evlist, "event",
+ OPT_CALLBACK('e', "event", &parse_events_option_args, "event",
"event selector. use 'perf list' to list available events",
parse_events_option),
OPT_CALLBACK(0, "filter", &record.evlist, "filter",
@@ -3496,7 +3504,8 @@ static struct option __record_options[] = {
&record.switch_output.set, "signal or size[BKMG] or time[smhd]",
"Switch output when receiving SIGUSR2 (signal) or cross a size or time threshold",
"signal"),
- OPT_CALLBACK_SET(0, "switch-output-event", &record.sb_evlist, &record.switch_output_event_set, "switch output event",
+ OPT_CALLBACK_SET(0, "switch-output-event", &switch_output_parse_events_option_args,
+ &record.switch_output_event_set, "switch output event",
"switch output event selector. use 'perf list' to list available events",
parse_events_option_new_evlist),
OPT_INTEGER(0, "switch-max-files", &record.switch_output.num_files,
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index eb34f5418ad3..46210fa3f14b 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -101,6 +101,10 @@
static void print_counters(struct timespec *ts, int argc, const char **argv);

static struct evlist *evsel_list;
+static struct parse_events_option_args parse_events_option_args = {
+ .evlistp = &evsel_list,
+};
+
static bool all_counters_use_bpf = true;

static struct target target = {
@@ -1096,8 +1100,8 @@ static int parse_hybrid_type(const struct option *opt,
return -1;
}

- evlist->hybrid_pmu_name = perf_pmu__hybrid_type_to_pmu(str);
- if (!evlist->hybrid_pmu_name) {
+ parse_events_option_args.pmu_filter = perf_pmu__hybrid_type_to_pmu(str);
+ if (!parse_events_option_args.pmu_filter) {
fprintf(stderr, "--cputype %s is not supported!\n", str);
return -1;
}
@@ -1108,7 +1112,7 @@ static int parse_hybrid_type(const struct option *opt,
static struct option stat_options[] = {
OPT_BOOLEAN('T', "transaction", &transaction_run,
"hardware transaction statistics"),
- OPT_CALLBACK('e', "event", &evsel_list, "event",
+ OPT_CALLBACK('e', "event", &parse_events_option_args, "event",
"event selector. use 'perf list' to list available events",
parse_events_option),
OPT_CALLBACK(0, "filter", &evsel_list, "filter",
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index eb5740154bc0..48ee49e95c5e 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1440,12 +1440,15 @@ int cmd_top(int argc, const char **argv)
.max_stack = sysctl__max_stack(),
.nr_threads_synthesize = UINT_MAX,
};
+ struct parse_events_option_args parse_events_option_args = {
+ .evlistp = &top.evlist,
+ };
bool branch_call_mode = false;
struct record_opts *opts = &top.record_opts;
struct target *target = &opts->target;
const char *disassembler_style = NULL, *objdump_path = NULL, *addr2line_path = NULL;
const struct option options[] = {
- OPT_CALLBACK('e', "event", &top.evlist, "event",
+ OPT_CALLBACK('e', "event", &parse_events_option_args, "event",
"event selector. use 'perf list' to list available events",
parse_events_option),
OPT_U64('c', "count", &opts->user_interval, "event period to sample"),
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 8ee3a45c3c54..b49d3abb1203 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -4591,8 +4591,11 @@ static int trace__parse_events_option(const struct option *opt, const char *str,
err = 0;

if (lists[0]) {
+ struct parse_events_option_args parse_events_option_args = {
+ .evlistp = &trace->evlist,
+ };
struct option o = {
- .value = &trace->evlist,
+ .value = &parse_events_option_args,
};
err = parse_events_option(&o, lists[0], 0);
}
diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 6aea51e33dc0..0b8ec9b1034f 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1938,7 +1938,8 @@ static int test_event_fake_pmu(const char *str)
return -ENOMEM;

parse_events_error__init(&err);
- ret = __parse_events(evlist, str, &err, &perf_pmu__fake, /*warn_if_reordered=*/true);
+ ret = __parse_events(evlist, str, /*pmu_filter=*/NULL, &err,
+ &perf_pmu__fake, /*warn_if_reordered=*/true);
if (ret) {
pr_debug("failed to parse event '%s', err %d, str '%s'\n",
str, ret, err.str);
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index a2cde61b1c77..734004f1a37d 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -776,7 +776,8 @@ static int check_parse_id(const char *id, struct parse_events_error *error,
for (cur = strchr(dup, '@') ; cur; cur = strchr(++cur, '@'))
*cur = '/';

- ret = __parse_events(evlist, dup, error, fake_pmu, /*warn_if_reordered=*/true);
+ ret = __parse_events(evlist, dup, /*pmu_filter=*/NULL, error, fake_pmu,
+ /*warn_if_reordered=*/true);
free(dup);

evlist__delete(evlist);
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 46cf402add93..e7e5540cc970 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -67,7 +67,6 @@ struct evlist {
struct evsel *selected;
struct events_stats stats;
struct perf_env *env;
- const char *hybrid_pmu_name;
void (*trace_event_sample_raw)(struct evlist *evlist,
union perf_event *event,
struct perf_sample *sample);
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 4b9a16291b96..46fc31cff124 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -1441,8 +1441,8 @@ static int parse_ids(bool metric_no_merge, struct perf_pmu *fake_pmu,
}
pr_debug("Parsing metric events '%s'\n", events.buf);
parse_events_error__init(&parse_error);
- ret = __parse_events(parsed_evlist, events.buf, &parse_error, fake_pmu,
- /*warn_if_reordered=*/false);
+ ret = __parse_events(parsed_evlist, events.buf, /*pmu_filter=*/NULL,
+ &parse_error, fake_pmu, /*warn_if_reordered=*/false);
if (ret) {
parse_events_error__print(&parse_error, events.buf);
goto err_out;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index fad0dd4b86b2..29fa84c4cdd4 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -464,8 +464,24 @@ int parse_events__decode_legacy_cache(const char *name, int pmu_type, __u64 *con
return 0;
}

+/**
+ * parse_events__filter_pmu - returns false if a wildcard PMU should be
+ * considered, true if it should be filtered.
+ */
+bool parse_events__filter_pmu(const struct parse_events_state *parse_state,
+ const struct perf_pmu *pmu)
+{
+ if (parse_state->pmu_filter == NULL)
+ return false;
+
+ if (pmu->name == NULL)
+ return true;
+
+ return strcmp(parse_state->pmu_filter, pmu->name) != 0;
+}
+
int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
- struct parse_events_error *err,
+ struct parse_events_state *parse_state,
struct list_head *head_config)
{
struct perf_pmu *pmu = NULL;
@@ -485,6 +501,9 @@ int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
if (pmu->is_uncore || pmu->type == PERF_TYPE_SOFTWARE)
continue;

+ if (parse_events__filter_pmu(parse_state, pmu))
+ continue;
+
memset(&attr, 0, sizeof(attr));
attr.type = PERF_TYPE_HW_CACHE;

@@ -498,8 +517,7 @@ int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
found_supported = true;

if (head_config) {
- if (config_attr(&attr, head_config, err,
- config_term_common))
+ if (config_attr(&attr, head_config, parse_state->error, config_term_common))
return -EINVAL;

if (get_config_terms(head_config, &config_terms))
@@ -1493,6 +1511,9 @@ int parse_events_add_numeric(struct parse_events_state *parse_state,
if (!perf_pmu__supports_wildcard_numeric(pmu))
continue;

+ if (parse_events__filter_pmu(parse_state, pmu))
+ continue;
+
found_supported = true;
ret = __parse_events_add_numeric(parse_state, list, pmu->type, config, head_config);
if (ret)
@@ -1686,6 +1707,9 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
while ((pmu = perf_pmu__scan(pmu)) != NULL) {
struct perf_pmu_alias *alias;

+ if (parse_events__filter_pmu(parse_state, pmu))
+ continue;
+
list_for_each_entry(alias, &pmu->aliases, list) {
if (!strcasecmp(alias->name, str)) {
parse_events_copy_term_list(head, &orig_head);
@@ -2118,7 +2142,7 @@ static bool parse_events__sort_events_and_fix_groups(struct list_head *list)
return idx_changed || num_leaders != orig_num_leaders;
}

-int __parse_events(struct evlist *evlist, const char *str,
+int __parse_events(struct evlist *evlist, const char *str, const char *pmu_filter,
struct parse_events_error *err, struct perf_pmu *fake_pmu,
bool warn_if_reordered)
{
@@ -2129,6 +2153,7 @@ int __parse_events(struct evlist *evlist, const char *str,
.evlist = evlist,
.stoken = PE_START_EVENTS,
.fake_pmu = fake_pmu,
+ .pmu_filter = pmu_filter,
.match_legacy_cache_terms = true,
};
int ret;
@@ -2310,12 +2335,13 @@ void parse_events_error__print(struct parse_events_error *err,
int parse_events_option(const struct option *opt, const char *str,
int unset __maybe_unused)
{
- struct evlist *evlist = *(struct evlist **)opt->value;
+ struct parse_events_option_args *args = opt->value;
struct parse_events_error err;
int ret;

parse_events_error__init(&err);
- ret = parse_events(evlist, str, &err);
+ ret = __parse_events(*args->evlistp, str, args->pmu_filter, &err,
+ /*fake_pmu=*/NULL, /*warn_if_reordered=*/true);

if (ret) {
parse_events_error__print(&err, str);
@@ -2328,22 +2354,21 @@ int parse_events_option(const struct option *opt, const char *str,

int parse_events_option_new_evlist(const struct option *opt, const char *str, int unset)
{
- struct evlist **evlistp = opt->value;
+ struct parse_events_option_args *args = opt->value;
int ret;

- if (*evlistp == NULL) {
- *evlistp = evlist__new();
+ if (*args->evlistp == NULL) {
+ *args->evlistp = evlist__new();

- if (*evlistp == NULL) {
+ if (*args->evlistp == NULL) {
fprintf(stderr, "Not enough memory to create evlist\n");
return -1;
}
}
-
ret = parse_events_option(opt, str, unset);
if (ret) {
- evlist__delete(*evlistp);
- *evlistp = NULL;
+ evlist__delete(*args->evlistp);
+ *args->evlistp = NULL;
}

return ret;
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 77b8f7efdb94..d4cbda6e946a 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -22,17 +22,24 @@ bool is_event_supported(u8 type, u64 config);

const char *event_type(int type);

+/* Arguments encoded in opt->value. */
+struct parse_events_option_args {
+ struct evlist **evlistp;
+ const char *pmu_filter;
+};
int parse_events_option(const struct option *opt, const char *str, int unset);
int parse_events_option_new_evlist(const struct option *opt, const char *str, int unset);
-__attribute__((nonnull(1, 2, 3)))
-int __parse_events(struct evlist *evlist, const char *str, struct parse_events_error *error,
- struct perf_pmu *fake_pmu, bool warn_if_reordered);
+__attribute__((nonnull(1, 2, 4)))
+int __parse_events(struct evlist *evlist, const char *str, const char *pmu_filter,
+ struct parse_events_error *error, struct perf_pmu *fake_pmu,
+ bool warn_if_reordered);

__attribute__((nonnull(1, 2, 3)))
static inline int parse_events(struct evlist *evlist, const char *str,
struct parse_events_error *err)
{
- return __parse_events(evlist, str, err, /*fake_pmu=*/NULL, /*warn_if_reordered=*/true);
+ return __parse_events(evlist, str, /*pmu_filter=*/NULL, err, /*fake_pmu=*/NULL,
+ /*warn_if_reordered=*/true);
}

int parse_event(struct evlist *evlist, const char *str);
@@ -122,11 +129,15 @@ struct parse_events_state {
struct list_head *terms;
int stoken;
struct perf_pmu *fake_pmu;
+ /* If non-null, when wildcard matching only match the given PMU. */
+ const char *pmu_filter;
/* Should PE_LEGACY_NAME tokens be generated for config terms? */
bool match_legacy_cache_terms;
bool wild_card_pmus;
};

+bool parse_events__filter_pmu(const struct parse_events_state *parse_state,
+ const struct perf_pmu *pmu);
void parse_events__shrink_config_terms(void);
int parse_events__is_hardcoded_term(struct parse_events_term *term);
int parse_events_term__num(struct parse_events_term **term,
@@ -171,7 +182,7 @@ int parse_events_add_tool(struct parse_events_state *parse_state,
struct list_head *list,
int tool_event);
int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
- struct parse_events_error *error,
+ struct parse_events_state *parse_state,
struct list_head *head_config);
int parse_events__decode_legacy_cache(const char *name, int pmu_type, __u64 *config);
int parse_events_add_breakpoint(struct list_head *list, int *idx,
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index e709508b1d6e..c95877cbd6cf 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -312,6 +312,9 @@ PE_NAME opt_pmu_config
while ((pmu = perf_pmu__scan(pmu)) != NULL) {
char *name = pmu->name;

+ if (parse_events__filter_pmu(parse_state, pmu))
+ continue;
+
if (!strncmp(name, "uncore_", 7) &&
strncmp($1, "uncore_", 7))
name += 7;
@@ -473,13 +476,12 @@ event_legacy_cache:
PE_LEGACY_CACHE opt_event_config
{
struct parse_events_state *parse_state = _parse_state;
- struct parse_events_error *error = parse_state->error;
struct list_head *list;
int err;

list = alloc_list();
ABORT_ON(!list);
- err = parse_events_add_cache(list, &parse_state->idx, $1, error, $2);
+ err = parse_events_add_cache(list, &parse_state->idx, $1, parse_state, $2);

parse_events_terms__delete($2);
free($1);
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:19:26

by Ian Rogers

Subject: [PATCH v1 31/40] perf parse-events: Avoid error when assigning a term

Avoid the parser error:
'''
$ perf stat -e 'cycles/name=name/' true
event syntax error: 'cycles/name=name/'
\___ parser error
'''
by turning the term back into a string if it is on the right-hand side
of an assignment. Add PMU and generic parsing tests.
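
The idea can be sketched outside the parser as follows. This is an
illustrative sketch only: the enum, the name table and
sketch_term_from_term are hypothetical and much smaller than perf's real
term table. The assumption is that the lexer has already turned the
right-hand "name" into a term token, so it is mapped back to its
spelling and stored as the string value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced set of term types. */
enum sketch_term_type {
	SKETCH_TERM_CONFIG,
	SKETCH_TERM_NAME,
	SKETCH_TERM_PERIOD,
	SKETCH_TERM_NR,
};

/* Spelling of each term type, indexed by the enum above. */
static const char *const sketch_term_names[SKETCH_TERM_NR] = {
	[SKETCH_TERM_CONFIG] = "config",
	[SKETCH_TERM_NAME]   = "name",
	[SKETCH_TERM_PERIOD] = "period",
};

struct sketch_term {
	enum sketch_term_type type;	/* left-hand side of the '=' */
	char *str;			/* right-hand side as a heap-allocated string */
};

/*
 * Handles "lhs=rhs" where rhs was lexed as a term token: the token is
 * converted back to its name and stored as a string value.
 * Returns 0 on success, -1 on allocation failure.
 */
int sketch_term_from_term(struct sketch_term *term,
			  enum sketch_term_type lhs,
			  enum sketch_term_type rhs)
{
	const char *name;
	size_t len;

	assert((unsigned int)rhs < SKETCH_TERM_NR);
	name = sketch_term_names[rhs];
	len = strlen(name) + 1;

	term->type = lhs;
	term->str = malloc(len);
	if (term->str == NULL)
		return -1;
	memcpy(term->str, name, len);
	return 0;
}
```

The caller owns term->str and is expected to free() it.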

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/parse-events.c | 21 +++++++++++++++++++++
tools/perf/util/parse-events.c | 9 +++++++++
tools/perf/util/parse-events.h | 3 +++
tools/perf/util/parse-events.y | 8 ++++++++
4 files changed, 41 insertions(+)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 4c21bef882ff..06042f450ece 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1485,6 +1485,16 @@ static int test__sym_event_dc(struct evlist *evlist)
return TEST_OK;
}

+static int test__term_equal_term(struct evlist *evlist)
+{
+ struct evsel *evsel = evlist__first(evlist);
+
+ TEST_ASSERT_VAL("wrong type", evsel->core.attr.type == PERF_TYPE_HARDWARE);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
+ TEST_ASSERT_VAL("wrong name setting", strcmp(evsel->name, "name") == 0);
+ return TEST_OK;
+}
+
#ifdef HAVE_LIBTRACEEVENT
static int count_tracepoints(void)
{
@@ -1857,6 +1867,11 @@ static const struct evlist_test test__events[] = {
.check = test__exclusive_group,
/* 7 */
},
+ {
+ .name = "cycles/name=name/",
+ .check = test__term_equal_term,
+ /* 8 */
+ },
};

static const struct evlist_test test__events_pmu[] = {
@@ -2038,6 +2053,12 @@ static const struct evlist_test test__events_pmu[] = {
.check = test__exclusive_group,
/* 9 */
},
+ {
+ .name = "cpu/cycles,name=name/",
+ .valid = test__pmu_cpu_valid,
+ .check = test__term_equal_term,
+ /* 0 */
+ },
};

struct terms_test {
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 27c179323b6d..98e424257278 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -2578,6 +2578,15 @@ int parse_events_term__str(struct parse_events_term **term,
return new_term(term, &temp, str, 0);
}

+int parse_events_term__term(struct parse_events_term **term,
+ int term_lhs, int term_rhs,
+ void *loc_term, void *loc_val)
+{
+ return parse_events_term__str(term, term_lhs, NULL,
+ strdup(config_term_names[term_rhs]),
+ loc_term, loc_val);
+}
+
int parse_events_term__clone(struct parse_events_term **new,
struct parse_events_term *term)
{
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 7fe80b416143..2a8cafe0ee8f 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -148,6 +148,9 @@ int parse_events_term__num(struct parse_events_term **term,
int parse_events_term__str(struct parse_events_term **term,
int type_term, char *config, char *str,
void *loc_term, void *loc_val);
+int parse_events_term__term(struct parse_events_term **term,
+ int term_lhs, int term_rhs,
+ void *loc_term, void *loc_val);
int parse_events_term__clone(struct parse_events_term **new,
struct parse_events_term *term);
void parse_events_term__delete(struct parse_events_term *term);
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 819a5123fd77..0aaebc57748e 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -848,6 +848,14 @@ PE_TERM '=' PE_TERM_HW
$$ = term;
}
|
+PE_TERM '=' PE_TERM
+{
+ struct parse_events_term *term;
+
+ ABORT_ON(parse_events_term__term(&term, (int)$1, (int)$3, &@1, &@3));
+ $$ = term;
+}
+|
PE_TERM '=' PE_VALUE
{
struct parse_events_term *term;
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:19:28

by Ian Rogers

Subject: [PATCH v1 27/40] perf stat: Make cputype filter generic

Rather than limit the --cputype argument for "perf list" and "perf
stat" to hybrid PMUs of just cpu_atom and cpu_core, allow any PMU.

Note that if cpu_atom isn't mounted but a filter of cpu_atom is
requested, this will now fail. Such a filter could never succeed, as
no events can come from an unmounted PMU, so the old behavior was
never useful and failing is clearer.
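
A minimal sketch of the name matching this implies, assuming the PMUs
are available as a plain array of names (the helper names
sketch_cputype_matches and sketch_pmu_for_filter are hypothetical): the
requested string matches a PMU name either exactly or after dropping a
leading "uncore_" or "cpu_" prefix, and no match is reported as NULL so
the caller can fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the PMU called pmu_name is selected by the cputype string. */
bool sketch_cputype_matches(const char *pmu_name, const char *cputype)
{
	assert(pmu_name != NULL && cputype != NULL);

	/* Exact match, e.g. "cpu_atom" or "cpu". */
	if (!strcmp(pmu_name, cputype))
		return true;

	/* Ignore an "uncore_" prefix on the PMU name. */
	if (!strncmp(pmu_name, "uncore_", 7) && !strcmp(pmu_name + 7, cputype))
		return true;

	/* Ignore a "cpu_" prefix, so "atom" selects "cpu_atom". */
	if (!strncmp(pmu_name, "cpu_", 4) && !strcmp(pmu_name + 4, cputype))
		return true;

	return false;
}

/* Returns the first matching PMU name, or NULL if no PMU matches. */
const char *sketch_pmu_for_filter(const char *const *pmu_names, size_t nr,
				  const char *cputype)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (sketch_cputype_matches(pmu_names[i], cputype))
			return pmu_names[i];
	}
	return NULL;
}
```

With only cpu_core present, a request for "atom" yields NULL, which
corresponds to the failure case described above.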

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/builtin-list.c | 19 +++++++++++--------
tools/perf/builtin-stat.c | 12 +++++++-----
tools/perf/util/pmu-hybrid.c | 20 --------------------
tools/perf/util/pmu-hybrid.h | 1 -
tools/perf/util/pmus.c | 25 ++++++++++++++++++++++++-
tools/perf/util/pmus.h | 3 +++
6 files changed, 45 insertions(+), 35 deletions(-)

diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
index 1f5dbd5f0ba4..1b48cf214b6e 100644
--- a/tools/perf/builtin-list.c
+++ b/tools/perf/builtin-list.c
@@ -11,8 +11,8 @@
#include "builtin.h"

#include "util/print-events.h"
+#include "util/pmus.h"
#include "util/pmu.h"
-#include "util/pmu-hybrid.h"
#include "util/debug.h"
#include "util/metricgroup.h"
#include "util/string2.h"
@@ -429,7 +429,7 @@ int cmd_list(int argc, const char **argv)
.print_event = default_print_event,
.print_metric = default_print_metric,
};
- const char *hybrid_name = NULL;
+ const char *cputype = NULL;
const char *unit_name = NULL;
bool json = false;
struct option list_options[] = {
@@ -443,8 +443,8 @@ int cmd_list(int argc, const char **argv)
"Print information on the perf event names and expressions used internally by events."),
OPT_BOOLEAN(0, "deprecated", &default_ps.deprecated,
"Print deprecated events."),
- OPT_STRING(0, "cputype", &hybrid_name, "hybrid cpu type",
- "Limit PMU or metric printing to the given hybrid PMU (e.g. core or atom)."),
+ OPT_STRING(0, "cputype", &cputype, "cpu type",
+ "Limit PMU or metric printing to the given PMU (e.g. cpu, core or atom)."),
OPT_STRING(0, "unit", &unit_name, "PMU name",
"Limit PMU or metric printing to the specified PMU."),
OPT_INCR(0, "debug", &verbose,
@@ -484,10 +484,13 @@ int cmd_list(int argc, const char **argv)
assert(default_ps.visited_metrics);
if (unit_name)
default_ps.pmu_glob = strdup(unit_name);
- else if (hybrid_name) {
- default_ps.pmu_glob = perf_pmu__hybrid_type_to_pmu(hybrid_name);
- if (!default_ps.pmu_glob)
- pr_warning("WARNING: hybrid cputype is not supported!\n");
+ else if (cputype) {
+ const struct perf_pmu *pmu = perf_pmus__pmu_for_pmu_filter(cputype);
+
+ if (!pmu)
+ pr_warning("WARNING: cputype is not supported!\n");
+ else
+ default_ps.pmu_glob = pmu->name;
}
}
print_cb.print_start(ps);
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 46210fa3f14b..e2119ffd08de 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -44,6 +44,7 @@
#include "util/cgroup.h"
#include <subcmd/parse-options.h>
#include "util/parse-events.h"
+#include "util/pmus.h"
#include "util/pmu.h"
#include "util/event.h"
#include "util/evlist.h"
@@ -69,7 +70,6 @@
#include "util/pfm.h"
#include "util/bpf_counter.h"
#include "util/iostat.h"
-#include "util/pmu-hybrid.h"
#include "util/util.h"
#include "asm/bug.h"

@@ -1089,10 +1089,11 @@ static int parse_stat_cgroups(const struct option *opt,
return parse_cgroups(opt, str, unset);
}

-static int parse_hybrid_type(const struct option *opt,
+static int parse_cputype(const struct option *opt,
const char *str,
int unset __maybe_unused)
{
+ const struct perf_pmu *pmu;
struct evlist *evlist = *(struct evlist **)opt->value;

if (!list_empty(&evlist->core.entries)) {
@@ -1100,11 +1101,12 @@ static int parse_hybrid_type(const struct option *opt,
return -1;
}

- parse_events_option_args.pmu_filter = perf_pmu__hybrid_type_to_pmu(str);
- if (!parse_events_option_args.pmu_filter) {
+ pmu = perf_pmus__pmu_for_pmu_filter(str);
+ if (!pmu) {
fprintf(stderr, "--cputype %s is not supported!\n", str);
return -1;
}
+ parse_events_option_args.pmu_filter = pmu->name;

return 0;
}
@@ -1230,7 +1232,7 @@ static struct option stat_options[] = {
OPT_CALLBACK(0, "cputype", &evsel_list, "hybrid cpu type",
"Only enable events on applying cpu with this type "
"for hybrid platform (e.g. core or atom)",
- parse_hybrid_type),
+ parse_cputype),
#ifdef HAVE_LIBPFM
OPT_CALLBACK(0, "pfm-events", &evsel_list, "event",
"libpfm4 event selector. use 'perf list' to list available events",
diff --git a/tools/perf/util/pmu-hybrid.c b/tools/perf/util/pmu-hybrid.c
index 38628805a952..bc4cb0738c35 100644
--- a/tools/perf/util/pmu-hybrid.c
+++ b/tools/perf/util/pmu-hybrid.c
@@ -50,23 +50,3 @@ bool perf_pmu__is_hybrid(const char *name)
{
return perf_pmu__find_hybrid_pmu(name) != NULL;
}
-
-char *perf_pmu__hybrid_type_to_pmu(const char *type)
-{
- char *pmu_name = NULL;
-
- if (asprintf(&pmu_name, "cpu_%s", type) < 0)
- return NULL;
-
- if (perf_pmu__is_hybrid(pmu_name))
- return pmu_name;
-
- /*
- * pmu may be not scanned, check the sysfs.
- */
- if (perf_pmu__hybrid_mounted(pmu_name))
- return pmu_name;
-
- free(pmu_name);
- return NULL;
-}
diff --git a/tools/perf/util/pmu-hybrid.h b/tools/perf/util/pmu-hybrid.h
index 2b186c26a43e..206b94931531 100644
--- a/tools/perf/util/pmu-hybrid.h
+++ b/tools/perf/util/pmu-hybrid.h
@@ -17,7 +17,6 @@ bool perf_pmu__hybrid_mounted(const char *name);

struct perf_pmu *perf_pmu__find_hybrid_pmu(const char *name);
bool perf_pmu__is_hybrid(const char *name);
-char *perf_pmu__hybrid_type_to_pmu(const char *type);

static inline int perf_pmu__hybrid_pmu_num(void)
{
diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index 7f3b93c4d229..140e11f00b29 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -1,5 +1,28 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/list.h>
-#include <pmus.h>
+#include <string.h>
+#include "pmus.h"
+#include "pmu.h"

LIST_HEAD(pmus);
+
+const struct perf_pmu *perf_pmus__pmu_for_pmu_filter(const char *str)
+{
+ struct perf_pmu *pmu = NULL;
+
+ while ((pmu = perf_pmu__scan(pmu)) != NULL) {
+ if (!strcmp(pmu->name, str))
+ return pmu;
+ /* Ignore "uncore_" prefix. */
+ if (!strncmp(pmu->name, "uncore_", 7)) {
+ if (!strcmp(pmu->name + 7, str))
+ return pmu;
+ }
+ /* Ignore "cpu_" prefix on Intel hybrid PMUs. */
+ if (!strncmp(pmu->name, "cpu_", 4)) {
+ if (!strcmp(pmu->name + 4, str))
+ return pmu;
+ }
+ }
+ return NULL;
+}
diff --git a/tools/perf/util/pmus.h b/tools/perf/util/pmus.h
index 5ec12007eb5c..d475e2960c10 100644
--- a/tools/perf/util/pmus.h
+++ b/tools/perf/util/pmus.h
@@ -3,7 +3,10 @@
#define __PMUS_H

extern struct list_head pmus;
+struct perf_pmu;

#define perf_pmus__for_each_pmu(pmu) list_for_each_entry(pmu, &pmus, list)

+const struct perf_pmu *perf_pmus__pmu_for_pmu_filter(const char *str);
+
#endif /* __PMUS_H */
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:19:43

by Ian Rogers

Subject: [PATCH v1 30/40] perf parse-events: Support hardware events as terms

An event like "cpu/instructions/" typically parses due to there being
a sysfs event called instructions. On hybrid, recursive parsing means
that the hardware event is encoded in the attribute, with the PMU type
being placed in the high bits of the config:

'''
$ perf stat -vv -e 'cpu_core/cycles/' true
...
------------------------------------------------------------
perf_event_attr:
size 136
config 0x400000000
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
'''

Make this behavior the default by adding a new term type and token for
hardware events. The token gathers both the numeric config and the
parsed name, so that if the token appears like "cycles/name=cycles/"
then the token can be handled like a name. The numeric value isn't
sufficient to distinguish say "cpu-cycles" from "cycles".
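
For reference, the config value in the output above can be reproduced
with a small sketch. The helper names are hypothetical; the shift of 32
and the 32-bit event mask are assumed to mirror PERF_PMU_TYPE_SHIFT and
PERF_HW_EVENT_MASK from the perf_event uapi header, and PMU type 4 for
cpu_core is taken from the 0x400000000 shown above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror PERF_PMU_TYPE_SHIFT and PERF_HW_EVENT_MASK. */
#define SKETCH_PMU_TYPE_SHIFT	32
#define SKETCH_HW_EVENT_MASK	0xffffffffULL

/* Packs the PMU type into the high 32 bits and the hardware event id into the low 32. */
uint64_t sketch_encode_hw_config(uint32_t pmu_type, uint64_t hw_id)
{
	/* The hardware event id must fit in the low 32 bits. */
	assert((hw_id & ~SKETCH_HW_EVENT_MASK) == 0);
	return ((uint64_t)pmu_type << SKETCH_PMU_TYPE_SHIFT) | hw_id;
}

/* Extracts the PMU type from an encoded config. */
uint32_t sketch_config_pmu_type(uint64_t config)
{
	return (uint32_t)(config >> SKETCH_PMU_TYPE_SHIFT);
}

/* Extracts the hardware event id from an encoded config. */
uint64_t sketch_config_hw_id(uint64_t config)
{
	return config & SKETCH_HW_EVENT_MASK;
}
```

Here cycles is hardware event id 0, so PMU type 4 encodes to
0x400000000. Because only the numeric id is stored, "cpu-cycles" and
"cycles" encode identically, which is why the token also carries the
parsed name.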

Extend the parse-events test so that all current non-PMU hardware
parsing tests are also run with the cpu PMU; this accounts for more
than half of the change.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/parse-events.c | 126 ++++++++++++++++++++++++++++++++
tools/perf/util/parse-events.c | 37 +++-------
tools/perf/util/parse-events.h | 3 +-
tools/perf/util/parse-events.l | 20 +++++
tools/perf/util/parse-events.y | 34 +++++++--
5 files changed, 187 insertions(+), 33 deletions(-)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index a0cd50e18ebc..4c21bef882ff 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1912,6 +1912,132 @@ static const struct evlist_test test__events_pmu[] = {
.check = test__checkevent_config_cache,
/* 8 */
},
+ {
+ .name = "cpu/instructions/",
+ .valid = test__pmu_cpu_valid,
+ .check = test__checkevent_symbolic_name,
+ /* 9 */
+ },
+ {
+ .name = "cpu/cycles,period=100000,config2/",
+ .valid = test__pmu_cpu_valid,
+ .check = test__checkevent_symbolic_name_config,
+ /* 0 */
+ },
+ {
+ .name = "cpu/instructions/h",
+ .valid = test__pmu_cpu_valid,
+ .check = test__checkevent_symbolic_name_modifier,
+ /* 1 */
+ },
+ {
+ .name = "cpu/instructions/G",
+ .valid = test__pmu_cpu_valid,
+ .check = test__checkevent_exclude_host_modifier,
+ /* 2 */
+ },
+ {
+ .name = "cpu/instructions/H",
+ .valid = test__pmu_cpu_valid,
+ .check = test__checkevent_exclude_guest_modifier,
+ /* 3 */
+ },
+ {
+ .name = "{cpu/instructions/k,cpu/cycles/upp}",
+ .valid = test__pmu_cpu_valid,
+ .check = test__group1,
+ /* 4 */
+ },
+ {
+ .name = "{cpu/cycles/u,cpu/instructions/kp}:p",
+ .valid = test__pmu_cpu_valid,
+ .check = test__group4,
+ /* 5 */
+ },
+ {
+ .name = "{cpu/cycles/,cpu/cache-misses/G}:H",
+ .valid = test__pmu_cpu_valid,
+ .check = test__group_gh1,
+ /* 6 */
+ },
+ {
+ .name = "{cpu/cycles/,cpu/cache-misses/H}:G",
+ .valid = test__pmu_cpu_valid,
+ .check = test__group_gh2,
+ /* 7 */
+ },
+ {
+ .name = "{cpu/cycles/G,cpu/cache-misses/H}:u",
+ .valid = test__pmu_cpu_valid,
+ .check = test__group_gh3,
+ /* 8 */
+ },
+ {
+ .name = "{cpu/cycles/G,cpu/cache-misses/H}:uG",
+ .valid = test__pmu_cpu_valid,
+ .check = test__group_gh4,
+ /* 9 */
+ },
+ {
+ .name = "{cpu/cycles/,cpu/cache-misses/,cpu/branch-misses/}:S",
+ .valid = test__pmu_cpu_valid,
+ .check = test__leader_sample1,
+ /* 0 */
+ },
+ {
+ .name = "{cpu/instructions/,cpu/branch-misses/}:Su",
+ .valid = test__pmu_cpu_valid,
+ .check = test__leader_sample2,
+ /* 1 */
+ },
+ {
+ .name = "cpu/instructions/uDp",
+ .valid = test__pmu_cpu_valid,
+ .check = test__checkevent_pinned_modifier,
+ /* 2 */
+ },
+ {
+ .name = "{cpu/cycles/,cpu/cache-misses/,cpu/branch-misses/}:D",
+ .valid = test__pmu_cpu_valid,
+ .check = test__pinned_group,
+ /* 3 */
+ },
+ {
+ .name = "cpu/instructions/I",
+ .valid = test__pmu_cpu_valid,
+ .check = test__checkevent_exclude_idle_modifier,
+ /* 4 */
+ },
+ {
+ .name = "cpu/instructions/kIG",
+ .valid = test__pmu_cpu_valid,
+ .check = test__checkevent_exclude_idle_modifier_1,
+ /* 5 */
+ },
+ {
+ .name = "cpu/cycles/u",
+ .valid = test__pmu_cpu_valid,
+ .check = test__sym_event_slash,
+ /* 6 */
+ },
+ {
+ .name = "cpu/cycles/k",
+ .valid = test__pmu_cpu_valid,
+ .check = test__sym_event_dc,
+ /* 7 */
+ },
+ {
+ .name = "cpu/instructions/uep",
+ .valid = test__pmu_cpu_valid,
+ .check = test__checkevent_exclusive_modifier,
+ /* 8 */
+ },
+ {
+ .name = "{cpu/cycles/,cpu/cache-misses/,cpu/branch-misses/}:e",
+ .valid = test__pmu_cpu_valid,
+ .check = test__exclusive_group,
+ /* 9 */
+ },
};

struct terms_test {
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 29fa84c4cdd4..27c179323b6d 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1053,6 +1053,7 @@ static const char *config_term_names[__PARSE_EVENTS__TERM_TYPE_NR] = {
[PARSE_EVENTS__TERM_TYPE_METRIC_ID] = "metric-id",
[PARSE_EVENTS__TERM_TYPE_RAW] = "raw",
[PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE] = "legacy-cache",
+ [PARSE_EVENTS__TERM_TYPE_HARDWARE] = "hardware",
};

static bool config_term_shrinked;
@@ -1240,6 +1241,17 @@ static int config_term_pmu(struct perf_event_attr *attr,
} else
term->type_term = PARSE_EVENTS__TERM_TYPE_USER;
}
+ if (term->type_term == PARSE_EVENTS__TERM_TYPE_HARDWARE) {
+ const struct perf_pmu *pmu = perf_pmu__find_by_type(attr->type);
+
+ if (!pmu) {
+ pr_debug("Failed to find PMU for type %u\n", attr->type);
+ return -EINVAL;
+ }
+ attr->type = PERF_TYPE_HARDWARE;
+ attr->config = ((__u64)pmu->type << PERF_PMU_TYPE_SHIFT) | term->val.num;
+ return 0;
+ }
if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER ||
term->type_term == PARSE_EVENTS__TERM_TYPE_DRV_CFG) {
/*
@@ -2566,31 +2578,6 @@ int parse_events_term__str(struct parse_events_term **term,
return new_term(term, &temp, str, 0);
}

-int parse_events_term__sym_hw(struct parse_events_term **term,
- char *config, unsigned idx)
-{
- struct event_symbol *sym;
- char *str;
- struct parse_events_term temp = {
- .type_val = PARSE_EVENTS__TERM_TYPE_STR,
- .type_term = PARSE_EVENTS__TERM_TYPE_USER,
- .config = config,
- };
-
- if (!temp.config) {
- temp.config = strdup("event");
- if (!temp.config)
- return -ENOMEM;
- }
- BUG_ON(idx >= PERF_COUNT_HW_MAX);
- sym = &event_symbols_hw[idx];
-
- str = strdup(sym->symbol);
- if (!str)
- return -ENOMEM;
- return new_term(term, &temp, str, 0);
-}
-
int parse_events_term__clone(struct parse_events_term **new,
struct parse_events_term *term)
{
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index d4cbda6e946a..7fe80b416143 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -79,6 +79,7 @@ enum {
PARSE_EVENTS__TERM_TYPE_METRIC_ID,
PARSE_EVENTS__TERM_TYPE_RAW,
PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE,
+ PARSE_EVENTS__TERM_TYPE_HARDWARE,
__PARSE_EVENTS__TERM_TYPE_NR,
};

@@ -147,8 +148,6 @@ int parse_events_term__num(struct parse_events_term **term,
int parse_events_term__str(struct parse_events_term **term,
int type_term, char *config, char *str,
void *loc_term, void *loc_val);
-int parse_events_term__sym_hw(struct parse_events_term **term,
- char *config, unsigned idx);
int parse_events_term__clone(struct parse_events_term **new,
struct parse_events_term *term);
void parse_events_term__delete(struct parse_events_term *term);
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index abe0ce681d29..6deb70c25984 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -149,6 +149,16 @@ static int term(yyscan_t scanner, int type)
return PE_TERM;
}

+static int hw_term(yyscan_t scanner, int config)
+{
+ YYSTYPE *yylval = parse_events_get_lval(scanner);
+ char *text = parse_events_get_text(scanner);
+
+ yylval->hardware_term.str = strdup(text);
+ yylval->hardware_term.num = PERF_TYPE_HARDWARE + config;
+ return PE_TERM_HW;
+}
+
#define YY_USER_ACTION \
do { \
yylloc->last_column = yylloc->first_column; \
@@ -269,6 +279,16 @@ percore { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_PERCORE); }
aux-output { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT); }
aux-sample-size { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE); }
metric-id { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_METRIC_ID); }
+cpu-cycles|cycles { return hw_term(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
+stalled-cycles-frontend|idle-cycles-frontend { return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
+stalled-cycles-backend|idle-cycles-backend { return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
+instructions { return hw_term(yyscanner, PERF_COUNT_HW_INSTRUCTIONS); }
+cache-references { return hw_term(yyscanner, PERF_COUNT_HW_CACHE_REFERENCES); }
+cache-misses { return hw_term(yyscanner, PERF_COUNT_HW_CACHE_MISSES); }
+branch-instructions|branches { return hw_term(yyscanner, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
+branch-misses { return hw_term(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
+bus-cycles { return hw_term(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
+ref-cycles { return hw_term(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
r{num_raw_hex} { return str(yyscanner, PE_RAW); }
r0x{num_raw_hex} { return str(yyscanner, PE_RAW); }
, { return ','; }
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index c95877cbd6cf..819a5123fd77 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -65,6 +65,7 @@ static void free_list_evsel(struct list_head* list_evsel)
%token PE_KERNEL_PMU_EVENT PE_PMU_EVENT_FAKE
%token PE_ARRAY_ALL PE_ARRAY_RANGE
%token PE_DRV_CFG_TERM
+%token PE_TERM_HW
%type <num> PE_VALUE
%type <num> PE_VALUE_SYM_HW
%type <num> PE_VALUE_SYM_SW
@@ -112,6 +113,8 @@ static void free_list_evsel(struct list_head* list_evsel)
%type <array> array_term
%type <array> array_terms
%destructor { free ($$.ranges); } <array>
+%type <hardware_term> PE_TERM_HW
+%destructor { free ($$.str); } <hardware_term>

%union
{
@@ -125,6 +128,10 @@ static void free_list_evsel(struct list_head* list_evsel)
char *event;
} tracepoint_name;
struct parse_events_array array;
+ struct hardware_term {
+ char *str;
+ u64 num;
+ } hardware_term;
}
%%

@@ -770,13 +777,14 @@ name_or_raw '=' PE_VALUE
$$ = term;
}
|
-name_or_raw '=' PE_VALUE_SYM_HW
+name_or_raw '=' PE_TERM_HW
{
struct parse_events_term *term;
- int config = $3 & 255;

- if (parse_events_term__sym_hw(&term, $1, config)) {
+ if (parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_USER,
+ $1, $3.str, &@1, &@3)) {
free($1);
+ free($3.str);
YYABORT;
}
$$ = term;
@@ -806,12 +814,15 @@ PE_NAME
$$ = term;
}
|
-PE_VALUE_SYM_HW
+PE_TERM_HW
{
struct parse_events_term *term;
- int config = $1 & 255;

- ABORT_ON(parse_events_term__sym_hw(&term, NULL, config));
+ if (parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_HARDWARE,
+ $1.str, $1.num & 255, false, &@1, NULL)) {
+ free($1.str);
+ YYABORT;
+ }
$$ = term;
}
|
@@ -826,6 +837,17 @@ PE_TERM '=' PE_NAME
$$ = term;
}
|
+PE_TERM '=' PE_TERM_HW
+{
+ struct parse_events_term *term;
+
+ if (parse_events_term__str(&term, (int)$1, NULL, $3.str, &@1, &@3)) {
+ free($3.str);
+ YYABORT;
+ }
+ $$ = term;
+}
+|
PE_TERM '=' PE_VALUE
{
struct parse_events_term *term;
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:20:15

by Ian Rogers

Subject: [PATCH v1 36/40] perf metric: Json flag to not group events if gathering a metric group

Some metric groups contain metrics whose events don't fully overlap.
Each such metric then gets its own event group, and these event groups
may need to multiplex with each other. This is particularly
unfortunate when there are enough hardware counters that the events
wouldn't need to multiplex if they weren't grouped.

Add a json flag so that, when a metric group is being recorded, the
metrics within that group needn't place their events in event groups.
The flag is added to the Intel TopdownL1 and TopdownL2 metrics.

Signed-off-by: Ian Rogers <[email protected]>
---
.../arch/x86/alderlake/adl-metrics.json | 26 +++++++++++++++++++
.../arch/x86/alderlaken/adln-metrics.json | 14 ++++++++++
.../arch/x86/broadwell/bdw-metrics.json | 12 +++++++++
.../arch/x86/broadwellde/bdwde-metrics.json | 12 +++++++++
.../arch/x86/broadwellx/bdx-metrics.json | 12 +++++++++
.../arch/x86/cascadelakex/clx-metrics.json | 12 +++++++++
.../arch/x86/haswell/hsw-metrics.json | 12 +++++++++
.../arch/x86/haswellx/hsx-metrics.json | 12 +++++++++
.../arch/x86/icelake/icl-metrics.json | 12 +++++++++
.../arch/x86/icelakex/icx-metrics.json | 12 +++++++++
.../arch/x86/ivybridge/ivb-metrics.json | 12 +++++++++
.../arch/x86/ivytown/ivt-metrics.json | 12 +++++++++
.../arch/x86/jaketown/jkt-metrics.json | 12 +++++++++
.../arch/x86/sandybridge/snb-metrics.json | 12 +++++++++
.../arch/x86/sapphirerapids/spr-metrics.json | 12 +++++++++
.../arch/x86/skylake/skl-metrics.json | 12 +++++++++
.../arch/x86/skylakex/skx-metrics.json | 12 +++++++++
.../arch/x86/tigerlake/tgl-metrics.json | 12 +++++++++
tools/perf/pmu-events/jevents.py | 4 ++-
tools/perf/pmu-events/pmu-events.h | 1 +
tools/perf/util/metricgroup.c | 5 +++-
21 files changed, 240 insertions(+), 2 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
index d09361dacd4f..4c2a14ea5a1c 100644
--- a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
@@ -133,6 +133,7 @@
"MetricGroup": "TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. The rest of these subevents count backend stalls, in cycles, due to an outstanding request which is memory bound vs core bound. The subevents are not slot based events and therefore can not be precisely added or subtracted from the Backend_Bound_Aux subevents which are slot based.",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
@@ -143,6 +144,7 @@
"MetricGroup": "TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound_aux",
"MetricThreshold": "tma_backend_bound_aux > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that UOPS must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. All of these subevents count backend stalls, in slots, due to a resource limitation. These are not cycle based events and therefore can not be precisely added or subtracted from the Backend_Bound subevents which are cycle based. These subevents are supplementary to Backend_Bound and can be used to analyze results from a resource perspective at allocation.",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
@@ -153,6 +155,7 @@
"MetricGroup": "TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear. Only issue slots wasted due to fast nukes such as memory ordering nukes are counted. Other nukes are not accounted for. Counts all issue slots blocked during this recovery window including relevant microcode flows and while uops are not yet available in the instruction queue (IQ). Also includes the issue slots that were consumed by the backend but were thrown away because they were younger than the mispredict or machine clear.",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
@@ -163,6 +166,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_base",
"MetricThreshold": "tma_base > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
},
@@ -182,6 +186,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.05",
+ "MetricgroupNoGroup": "TopdownL2",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
},
@@ -209,6 +214,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
},
@@ -255,6 +261,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
},
@@ -264,6 +271,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
},
@@ -291,6 +299,7 @@
"MetricGroup": "TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
},
@@ -593,6 +602,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.05",
+ "MetricgroupNoGroup": "TopdownL2",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
},
@@ -611,6 +621,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
},
@@ -629,6 +640,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_ms_uops",
"MetricThreshold": "tma_ms_uops > 0.05",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "Counts the number of uops that are from the complex flows issued by the micro-sequencer (MS). This includes uops from flows due to complex instructions, faults, assists, and inserted flows.",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
@@ -729,6 +741,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_aux_group",
"MetricName": "tma_resource_bound",
"MetricThreshold": "tma_resource_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count.",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
@@ -739,6 +752,7 @@
"MetricGroup": "TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.75",
+ "MetricgroupNoGroup": "TopdownL1",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
},
@@ -848,6 +862,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
"ScaleUnit": "100%",
"Unit": "cpu_core"
@@ -858,6 +873,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%",
"Unit": "cpu_core"
@@ -868,6 +884,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: TOPDOWN.BR_MISPREDICT_SLOTS. Related metrics: tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredicts_resteers",
"ScaleUnit": "100%",
"Unit": "cpu_core"
@@ -919,6 +936,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%",
"Unit": "cpu_core"
@@ -1031,6 +1049,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 6 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp",
"ScaleUnit": "100%",
"Unit": "cpu_core"
@@ -1041,6 +1060,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
"ScaleUnit": "100%",
"Unit": "cpu_core"
@@ -1122,6 +1142,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
"ScaleUnit": "100%",
"Unit": "cpu_core"
@@ -1142,6 +1163,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences. Sample with: UOPS_RETIRED.HEAVY",
"ScaleUnit": "100%",
"Unit": "cpu_core"
@@ -2032,6 +2054,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%",
"Unit": "cpu_core"
@@ -2091,6 +2114,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%",
"Unit": "cpu_core"
@@ -2121,6 +2145,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%",
"Unit": "cpu_core"
@@ -2321,6 +2346,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
"ScaleUnit": "100%",
"Unit": "cpu_core"
diff --git a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json b/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
index 1a85d935c733..0402adbf7d92 100644
--- a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
@@ -98,6 +98,7 @@
"MetricGroup": "TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. The rest of these subevents count backend stalls, in cycles, due to an outstanding request which is memory bound vs core bound. The subevents are not slot based events and therefore can not be precisely added or subtracted from the Backend_Bound_Aux subevents which are slot based.",
"ScaleUnit": "100%"
},
@@ -107,6 +108,7 @@
"MetricGroup": "TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound_aux",
"MetricThreshold": "tma_backend_bound_aux > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that UOPS must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. All of these subevents count backend stalls, in slots, due to a resource limitation. These are not cycle based events and therefore can not be precisely added or subtracted from the Backend_Bound subevents which are cycle based. These subevents are supplementary to Backend_Bound and can be used to analyze results from a resource perspective at allocation.",
"ScaleUnit": "100%"
},
@@ -116,6 +118,7 @@
"MetricGroup": "TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear. Only issue slots wasted due to fast nukes such as memory ordering nukes are counted. Other nukes are not accounted for. Counts all issue slots blocked during this recovery window including relevant microcode flows and while uops are not yet available in the instruction queue (IQ). Also includes the issue slots that were consumed by the backend but were thrown away because they were younger than the mispredict or machine clear.",
"ScaleUnit": "100%"
},
@@ -125,6 +128,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_base",
"MetricThreshold": "tma_base > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"ScaleUnit": "100%"
},
{
@@ -142,6 +146,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.05",
+ "MetricgroupNoGroup": "TopdownL2",
"ScaleUnit": "100%"
},
{
@@ -166,6 +171,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"ScaleUnit": "100%"
},
{
@@ -207,6 +213,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"ScaleUnit": "100%"
},
{
@@ -215,6 +222,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"ScaleUnit": "100%"
},
{
@@ -239,6 +247,7 @@
"MetricGroup": "TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"ScaleUnit": "100%"
},
{
@@ -499,6 +508,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.05",
+ "MetricgroupNoGroup": "TopdownL2",
"ScaleUnit": "100%"
},
{
@@ -515,6 +525,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"ScaleUnit": "100%"
},
{
@@ -531,6 +542,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_ms_uops",
"MetricThreshold": "tma_ms_uops > 0.05",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "Counts the number of uops that are from the complex flows issued by the micro-sequencer (MS). This includes uops from flows due to complex instructions, faults, assists, and inserted flows.",
"ScaleUnit": "100%"
},
@@ -620,6 +632,7 @@
"MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_aux_group",
"MetricName": "tma_resource_bound",
"MetricThreshold": "tma_resource_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count.",
"ScaleUnit": "100%"
},
@@ -629,6 +642,7 @@
"MetricGroup": "TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.75",
+ "MetricgroupNoGroup": "TopdownL1",
"ScaleUnit": "100%"
},
{
diff --git a/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json b/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json
index 51cf8560a8d3..f9e2316601e1 100644
--- a/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json
@@ -103,6 +103,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
"ScaleUnit": "100%"
},
@@ -112,6 +113,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -122,6 +124,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -170,6 +173,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -263,6 +267,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_lcp",
"ScaleUnit": "100%"
},
@@ -272,6 +277,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
"ScaleUnit": "100%"
},
@@ -326,6 +332,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
"ScaleUnit": "100%"
},
@@ -335,6 +342,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
"ScaleUnit": "100%"
},
@@ -828,6 +836,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -858,6 +867,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -886,6 +896,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -1048,6 +1059,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/bdwde-metrics.json b/tools/perf/pmu-events/arch/x86/broadwellde/bdwde-metrics.json
index fb57c7382408..e9c46d336a8e 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellde/bdwde-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellde/bdwde-metrics.json
@@ -97,6 +97,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
"ScaleUnit": "100%"
},
@@ -106,6 +107,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -116,6 +118,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: TOPDOWN.BR_MISPREDICT_SLOTS. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -164,6 +167,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -248,6 +252,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_lcp",
"ScaleUnit": "100%"
},
@@ -257,6 +262,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
"ScaleUnit": "100%"
},
@@ -311,6 +317,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
"ScaleUnit": "100%"
},
@@ -320,6 +327,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
"ScaleUnit": "100%"
},
@@ -795,6 +803,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -825,6 +834,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -853,6 +863,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -1013,6 +1024,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json b/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
index 65ec0c9e55d1..437b9867acb9 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
@@ -103,6 +103,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
"ScaleUnit": "100%"
},
@@ -112,6 +113,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -122,6 +124,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -170,6 +173,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -263,6 +267,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_lcp",
"ScaleUnit": "100%"
},
@@ -272,6 +277,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
"ScaleUnit": "100%"
},
@@ -326,6 +332,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
"ScaleUnit": "100%"
},
@@ -335,6 +342,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
"ScaleUnit": "100%"
},
@@ -829,6 +837,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -869,6 +878,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -897,6 +907,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -1079,6 +1090,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
index 8f7dc72accd0..875c766222e3 100644
--- a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
@@ -101,6 +101,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
"ScaleUnit": "100%"
},
@@ -110,6 +111,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -120,6 +122,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -167,6 +170,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -271,6 +275,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp",
"ScaleUnit": "100%"
},
@@ -280,6 +285,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
"ScaleUnit": "100%"
},
@@ -354,6 +360,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
"ScaleUnit": "100%"
},
@@ -372,6 +379,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
"ScaleUnit": "100%"
},
@@ -1142,6 +1150,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -1196,6 +1205,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -1224,6 +1234,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -1458,6 +1469,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json b/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json
index 2528418200bb..9570a88d6d1c 100644
--- a/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json
@@ -103,6 +103,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
"ScaleUnit": "100%"
},
@@ -112,6 +113,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -122,6 +124,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -161,6 +164,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -254,6 +258,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_lcp",
"ScaleUnit": "100%"
},
@@ -263,6 +268,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
"ScaleUnit": "100%"
},
@@ -272,6 +278,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
"ScaleUnit": "100%"
},
@@ -281,6 +288,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
"ScaleUnit": "100%"
},
@@ -663,6 +671,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -693,6 +702,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -721,6 +731,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -874,6 +885,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json b/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json
index 11f152c346eb..a522202cf684 100644
--- a/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json
@@ -103,6 +103,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
"ScaleUnit": "100%"
},
@@ -112,6 +113,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -122,6 +124,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -161,6 +164,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -254,6 +258,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_lcp",
"ScaleUnit": "100%"
},
@@ -263,6 +268,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
"ScaleUnit": "100%"
},
@@ -272,6 +278,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
"ScaleUnit": "100%"
},
@@ -281,6 +288,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
"ScaleUnit": "100%"
},
@@ -664,6 +672,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -704,6 +713,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -732,6 +742,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -905,6 +916,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
index cb58317860ea..ae8a96ec7fa5 100644
--- a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
@@ -115,6 +115,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
"ScaleUnit": "100%"
},
@@ -124,6 +125,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -141,6 +143,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -187,6 +190,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -288,6 +292,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 5 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp",
"ScaleUnit": "100%"
},
@@ -297,6 +302,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
"ScaleUnit": "100%"
},
@@ -370,6 +376,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
"ScaleUnit": "100%"
},
@@ -379,6 +386,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
"ScaleUnit": "100%"
},
@@ -1120,6 +1128,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -1173,6 +1182,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -1200,6 +1210,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -1371,6 +1382,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
index 76e60e3f9d31..b736fec164d0 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
@@ -80,6 +80,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
"ScaleUnit": "100%"
},
@@ -89,6 +90,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -106,6 +108,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -152,6 +155,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -253,6 +257,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 5 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp",
"ScaleUnit": "100%"
},
@@ -262,6 +267,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
"ScaleUnit": "100%"
},
@@ -335,6 +341,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
"ScaleUnit": "100%"
},
@@ -344,6 +351,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
"ScaleUnit": "100%"
},
@@ -1143,6 +1151,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -1196,6 +1205,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -1223,6 +1233,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -1421,6 +1432,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json b/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json
index 5247f69c13b6..11080ccffd51 100644
--- a/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json
@@ -103,6 +103,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
"ScaleUnit": "100%"
},
@@ -112,6 +113,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -122,6 +124,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -161,6 +164,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -254,6 +258,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_lcp",
"ScaleUnit": "100%"
},
@@ -263,6 +268,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
"ScaleUnit": "100%"
},
@@ -299,6 +305,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
"ScaleUnit": "100%"
},
@@ -308,6 +315,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
"ScaleUnit": "100%"
},
@@ -724,6 +732,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -754,6 +763,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -782,6 +792,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -917,6 +928,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json b/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json
index 89469b10fa30..65a46d659c0a 100644
--- a/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json
@@ -103,6 +103,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
"ScaleUnit": "100%"
},
@@ -112,6 +113,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -122,6 +124,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -161,6 +164,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -254,6 +258,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_lcp",
"ScaleUnit": "100%"
},
@@ -263,6 +268,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
"ScaleUnit": "100%"
},
@@ -299,6 +305,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
"ScaleUnit": "100%"
},
@@ -308,6 +315,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
"ScaleUnit": "100%"
},
@@ -725,6 +733,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -765,6 +774,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -793,6 +803,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -948,6 +959,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json b/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json
index e8f4e5c01c9f..66a6f657bd6f 100644
--- a/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json
@@ -76,6 +76,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
"ScaleUnit": "100%"
},
@@ -85,6 +86,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -95,6 +97,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -114,6 +117,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -160,6 +164,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_lcp",
"ScaleUnit": "100%"
},
@@ -169,6 +174,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
"ScaleUnit": "100%"
},
@@ -205,6 +211,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
"ScaleUnit": "100%"
},
@@ -214,6 +221,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
"ScaleUnit": "100%"
},
@@ -412,6 +420,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -422,6 +431,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -450,6 +460,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -487,6 +498,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json b/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json
index 4a99fe515f4b..4b8bc19392a4 100644
--- a/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json
@@ -76,6 +76,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
"ScaleUnit": "100%"
},
@@ -85,6 +86,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -95,6 +97,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -114,6 +117,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -160,6 +164,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_lcp",
"ScaleUnit": "100%"
},
@@ -169,6 +174,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
"ScaleUnit": "100%"
},
@@ -205,6 +211,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
"ScaleUnit": "100%"
},
@@ -214,6 +221,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
"ScaleUnit": "100%"
},
@@ -411,6 +419,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -421,6 +430,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -449,6 +459,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -486,6 +497,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
index 527d40dde003..4308e2483112 100644
--- a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
@@ -87,6 +87,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
"ScaleUnit": "100%"
},
@@ -96,6 +97,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -105,6 +107,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: TOPDOWN.BR_MISPREDICT_SLOTS. Related metrics: tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -151,6 +154,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -252,6 +256,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 6 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp",
"ScaleUnit": "100%"
},
@@ -261,6 +266,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
"ScaleUnit": "100%"
},
@@ -352,6 +358,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
"ScaleUnit": "100%"
},
@@ -370,6 +377,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences. Sample with: UOPS_RETIRED.HEAVY",
"ScaleUnit": "100%"
},
@@ -1225,6 +1233,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -1278,6 +1287,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -1313,6 +1323,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -1520,6 +1531,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json b/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
index a6d212b349f5..21ef6c9be816 100644
--- a/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
@@ -101,6 +101,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
"ScaleUnit": "100%"
},
@@ -110,6 +111,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -120,6 +122,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -167,6 +170,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -271,6 +275,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp",
"ScaleUnit": "100%"
},
@@ -280,6 +285,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
"ScaleUnit": "100%"
},
@@ -345,6 +351,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
"ScaleUnit": "100%"
},
@@ -363,6 +370,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
"ScaleUnit": "100%"
},
@@ -1065,6 +1073,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -1110,6 +1119,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -1138,6 +1148,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -1343,6 +1354,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json b/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
index fa2f7f126a30..eb6f12c0343d 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
@@ -101,6 +101,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
"ScaleUnit": "100%"
},
@@ -110,6 +111,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -120,6 +122,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -167,6 +170,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -271,6 +275,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp",
"ScaleUnit": "100%"
},
@@ -280,6 +285,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
"ScaleUnit": "100%"
},
@@ -354,6 +360,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
"ScaleUnit": "100%"
},
@@ -372,6 +379,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
"ScaleUnit": "100%"
},
@@ -1123,6 +1131,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -1177,6 +1186,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -1205,6 +1215,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -1429,6 +1440,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json b/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
index 6ac4a9e5d013..ae62bacf9f5e 100644
--- a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
@@ -109,6 +109,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
"ScaleUnit": "100%"
},
@@ -118,6 +119,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -135,6 +137,7 @@
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -181,6 +184,7 @@
"MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -282,6 +286,7 @@
"MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 5 > 0.35",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp",
"ScaleUnit": "100%"
},
@@ -291,6 +296,7 @@
"MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
"ScaleUnit": "100%"
},
@@ -364,6 +370,7 @@
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
"ScaleUnit": "100%"
},
@@ -373,6 +380,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
"ScaleUnit": "100%"
},
@@ -1134,6 +1142,7 @@
"MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -1187,6 +1196,7 @@
"MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -1214,6 +1224,7 @@
"MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "MetricgroupNoGroup": "TopdownL2",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -1385,6 +1396,7 @@
"MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index bbcaafaf7c25..b18dd2fcbf04 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -52,7 +52,8 @@ _json_event_attributes = [
# Attributes that are in pmu_metric rather than pmu_event.
_json_metric_attributes = [
'pmu', 'metric_name', 'metric_group', 'metric_expr', 'metric_threshold',
- 'desc', 'long_desc', 'unit', 'compat', 'aggr_mode', 'event_grouping'
+ 'desc', 'long_desc', 'unit', 'compat', 'metricgroup_no_group', 'aggr_mode',
+ 'event_grouping'
]
# Attributes that are bools or enum int values, encoded as '0', '1',...
_json_enum_attributes = ['aggr_mode', 'deprecated', 'event_grouping', 'perpkg']
@@ -303,6 +304,7 @@ class JsonEvent:
self.deprecated = jd.get('Deprecated')
self.metric_name = jd.get('MetricName')
self.metric_group = jd.get('MetricGroup')
+ self.metricgroup_no_group = jd.get('MetricgroupNoGroup')
self.event_grouping = convert_metric_constraint(jd.get('MetricConstraint'))
self.metric_expr = None
if 'MetricExpr' in jd:
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
index 1dd8f35a2483..3549e6971a4d 100644
--- a/tools/perf/pmu-events/pmu-events.h
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -60,6 +60,7 @@ struct pmu_metric {
const char *compat;
const char *desc;
const char *long_desc;
+ const char *metricgroup_no_group;
enum aggr_mode_class aggr_mode;
enum metric_event_groups event_grouping;
};
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 39fbdb5bab1f..17478eb33bdc 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -1171,8 +1171,11 @@ static int metricgroup__add_metric_callback(const struct pmu_metric *pm,
int ret = 0;

if (pm->metric_expr && match_pm_metric(pm, data->pmu, data->metric_name)) {
+ bool metric_no_group = data->metric_no_group ||
+ match_metric(data->metric_name, pm->metricgroup_no_group);
+
data->has_match = true;
- ret = add_metric(data->list, pm, data->modifier, data->metric_no_group,
+ ret = add_metric(data->list, pm, data->modifier, metric_no_group,
data->metric_no_threshold, data->user_requested_cpu_list,
data->system_wide, /*root_metric=*/NULL,
/*visited_metrics=*/NULL, table);
--
2.40.1.495.gc816e09b53d-goog
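
The decision made in metricgroup__add_metric_callback() above can be sketched in isolation. This is a minimal sketch, not the perf implementation: struct metric_sketch, name_in_list() and should_not_group() are hypothetical stand-ins, and the matching is simplified to a case-insensitive comparison against a ';'-separated list, which is an assumption about how match_metric() treats its arguments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in for the fields of struct pmu_metric used here. */
struct metric_sketch {
	const char *metric_name;
	const char *metricgroup_no_group; /* e.g. "TopdownL1", or NULL if unset */
};

/* Simplified matcher: is 'name' one of the ';'-separated entries in 'list'? */
static bool name_in_list(const char *name, const char *list)
{
	size_t len;

	if (!name || !list)
		return false;
	len = strlen(name);
	while (*list) {
		const char *end = strchr(list, ';');
		size_t entry_len = end ? (size_t)(end - list) : strlen(list);

		if (entry_len == len && !strncasecmp(list, name, len))
			return true;
		if (!end)
			break;
		list = end + 1;
	}
	return false;
}

/*
 * Events are left ungrouped when the user asked for that globally, or when
 * the requested name is listed in the metric's MetricgroupNoGroup value.
 */
static bool should_not_group(const struct metric_sketch *pm,
			     const char *requested, bool user_no_group)
{
	return user_no_group || name_in_list(requested, pm->metricgroup_no_group);
}
```

Under these assumptions, requesting the TopdownL1 metric group leaves tma_retiring's events ungrouped, while requesting tma_retiring by name keeps the default grouping unless the user disabled grouping globally.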

2023-04-26 07:20:19

by Ian Rogers

Subject: [PATCH v1 37/40] perf stat: Command line PMU metric filtering

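Wire up the --cputype value to limit which metrics are parsed: pass the
PMU filter to metricgroup__has_metric() and metricgroup__parse_groups(),
defaulting to "all" when no filter was given.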

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/builtin-stat.c | 20 ++++++++++++--------
tools/perf/util/metricgroup.c | 3 ++-
tools/perf/util/metricgroup.h | 1 +
3 files changed, 15 insertions(+), 9 deletions(-)
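
The defaulting used throughout this patch can be sketched as follows. This is a minimal sketch under stated assumptions: effective_pmu() and pmu_passes_filter() are hypothetical helpers, the PMU names in the usage note are illustrative, and the exact-match comparison is an assumption rather than the rule perf applies when matching a metric's PMU. The patch itself uses the GNU `a ?: b` shorthand; the portable conditional is spelled out here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Portable spelling of `pmu_filter ?: "all"` as used in the patch. */
static const char *effective_pmu(const char *pmu_filter)
{
	return pmu_filter ? pmu_filter : "all";
}

/*
 * Hypothetical check: does a metric defined for 'metric_pmu' pass the filter?
 * "all" accepts every PMU; anything else must match exactly (an assumption).
 */
static bool pmu_passes_filter(const char *filter, const char *metric_pmu)
{
	const char *pmu = effective_pmu(filter);

	return !strcmp(pmu, "all") || !strcmp(pmu, metric_pmu);
}
```

With no filter set, pmu_passes_filter(NULL, "cpu_atom") is true, so behavior is unchanged for users who do not pass --cputype. With a filter of "cpu_core", a metric for "cpu_atom" would be skipped.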

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 5dbdf001028b..67dc69270ae4 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1783,6 +1783,7 @@ static int add_default_attributes(void)
};

struct perf_event_attr default_null_attrs[] = {};
+ const char *pmu = parse_events_option_args.pmu_filter ?: "all";

/* Set attrs if no event is selected and !null_run: */
if (stat_config.null_run)
@@ -1794,11 +1795,11 @@ static int add_default_attributes(void)
* will use this approach. To determine transaction support
* on an architecture test for such a metric name.
*/
- if (!metricgroup__has_metric("all", "transaction")) {
+ if (!metricgroup__has_metric(pmu, "transaction")) {
pr_err("Missing transaction metrics");
return -1;
}
- return metricgroup__parse_groups(evsel_list, "transaction",
+ return metricgroup__parse_groups(evsel_list, pmu, "transaction",
stat_config.metric_no_group,
stat_config.metric_no_merge,
stat_config.metric_no_threshold,
@@ -1823,7 +1824,7 @@ static int add_default_attributes(void)
smi_reset = true;
}

- if (!metricgroup__has_metric("all", "smi")) {
+ if (!metricgroup__has_metric(pmu, "smi")) {
pr_err("Missing smi metrics");
return -1;
}
@@ -1831,7 +1832,7 @@ static int add_default_attributes(void)
if (!force_metric_only)
stat_config.metric_only = true;

- return metricgroup__parse_groups(evsel_list, "smi",
+ return metricgroup__parse_groups(evsel_list, pmu, "smi",
stat_config.metric_no_group,
stat_config.metric_no_merge,
stat_config.metric_no_threshold,
@@ -1864,7 +1865,8 @@ static int add_default_attributes(void)
"Please print the result regularly, e.g. -I1000\n");
}
str[8] = stat_config.topdown_level + '0';
- if (metricgroup__parse_groups(evsel_list, str,
+ if (metricgroup__parse_groups(evsel_list,
+ pmu, str,
/*metric_no_group=*/false,
/*metric_no_merge=*/false,
/*metric_no_threshold=*/true,
@@ -1898,14 +1900,14 @@ static int add_default_attributes(void)
* Add TopdownL1 metrics if they exist. To minimize
* multiplexing, don't request threshold computation.
*/
- if (metricgroup__has_metric("all", "TopdownL1")) {
+ if (metricgroup__has_metric(pmu, "TopdownL1")) {
struct evlist *metric_evlist = evlist__new();
struct evsel *metric_evsel;

if (!metric_evlist)
return -1;

- if (metricgroup__parse_groups(metric_evlist, "TopdownL1",
+ if (metricgroup__parse_groups(metric_evlist, pmu, "TopdownL1",
/*metric_no_group=*/false,
/*metric_no_merge=*/false,
/*metric_no_threshold=*/true,
@@ -2429,7 +2431,9 @@ int cmd_stat(int argc, const char **argv)
* knowing the target is system-wide.
*/
if (metrics) {
- metricgroup__parse_groups(evsel_list, metrics,
+ const char *pmu = parse_events_option_args.pmu_filter ?: "all";
+
+ metricgroup__parse_groups(evsel_list, pmu, metrics,
stat_config.metric_no_group,
stat_config.metric_no_merge,
stat_config.metric_no_threshold,
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 17478eb33bdc..4245b23d8efe 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -1648,6 +1648,7 @@ static int parse_groups(struct evlist *perf_evlist,
}

int metricgroup__parse_groups(struct evlist *perf_evlist,
+ const char *pmu,
const char *str,
bool metric_no_group,
bool metric_no_merge,
@@ -1661,7 +1662,7 @@ int metricgroup__parse_groups(struct evlist *perf_evlist,
if (!table)
return -EINVAL;

- return parse_groups(perf_evlist, "all", str, metric_no_group, metric_no_merge,
+ return parse_groups(perf_evlist, pmu, str, metric_no_group, metric_no_merge,
metric_no_threshold, user_requested_cpu_list, system_wide,
/*fake_pmu=*/NULL, metric_events, table);
}
diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index 08e9b9e953ec..bf18274c15df 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -67,6 +67,7 @@ struct metric_event *metricgroup__lookup(struct rblist *metric_events,
struct evsel *evsel,
bool create);
int metricgroup__parse_groups(struct evlist *perf_evlist,
+ const char *pmu,
const char *str,
bool metric_no_group,
bool metric_no_merge,
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:20:21

by Ian Rogers

Subject: [PATCH v1 33/40] perf parse-events: Don't auto merge hybrid wildcard events

Bring back the behavior of not auto-merging hybrid events by
delegating the decision to a new helper, perf_pmu__auto_merge_stats(),
which returns false for hybrid PMUs.

Signed-off-by: Ian Rogers <[email protected]>
---
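Note: the following is only a rough Python model of the intent, not
the C code in this patch. It assumes that "auto merge" means summing
the counts of a wildcard event expanded over several PMUs into one
reported line, and that the hybrid PMUs are named cpu_core and
cpu_atom. The names HYBRID_PMUS and report() are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: the hybrid core PMU names seen on Alderlake-like systems.
HYBRID_PMUS = {'cpu_core', 'cpu_atom'}


def auto_merge_stats(pmu_name: str) -> bool:
  """Model of perf_pmu__auto_merge_stats(): false for hybrid PMUs."""
  return pmu_name not in HYBRID_PMUS


def report(counts: List[Tuple[str, str, int]]) -> Dict[str, int]:
  """Sum counts per event, keeping hybrid PMUs on separate lines."""
  out: Dict[str, int] = {}
  for pmu, event, value in counts:
    # Merged events share one key; unmerged events keep their PMU.
    key = event if auto_merge_stats(pmu) else f'{pmu}/{event}/'
    out[key] = out.get(key, 0) + value
  return out


counts = [
    ('uncore_imc_0', 'cas_count_read', 10),
    ('uncore_imc_1', 'cas_count_read', 5),
    ('cpu_core', 'instructions', 100),
    ('cpu_atom', 'instructions', 40),
]
merged = report(counts)
```

In this model the two uncore counts collapse into a single
cas_count_read entry, while cpu_core/instructions/ and
cpu_atom/instructions/ stay separate.
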
tools/perf/util/parse-events.c | 5 ++++-
tools/perf/util/parse-events.y | 4 +++-
tools/perf/util/pmu.c | 5 +++++
tools/perf/util/pmu.h | 1 +
4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 98e424257278..b62dcc51b22f 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1718,16 +1718,19 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,

while ((pmu = perf_pmu__scan(pmu)) != NULL) {
struct perf_pmu_alias *alias;
+ bool auto_merge_stats;

if (parse_events__filter_pmu(parse_state, pmu))
continue;

+ auto_merge_stats = perf_pmu__auto_merge_stats(pmu);
+
list_for_each_entry(alias, &pmu->aliases, list) {
if (!strcasecmp(alias->name, str)) {
parse_events_copy_term_list(head, &orig_head);
if (!parse_events_add_pmu(parse_state, list,
pmu->name, orig_head,
- /*auto_merge_stats=*/true)) {
+ auto_merge_stats)) {
pr_debug("%s -> %s/%s/\n", str,
pmu->name, alias->str);
ok++;
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index f4ee03b5976b..4e1f5de35be8 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -327,10 +327,12 @@ PE_NAME opt_pmu_config
name += 7;
if (!perf_pmu__match(pattern, name, $1) ||
!perf_pmu__match(pattern, pmu->alias_name, $1)) {
+ bool auto_merge_stats = perf_pmu__auto_merge_stats(pmu);
+
if (parse_events_copy_term_list(orig_terms, &terms))
CLEANUP_YYABORT;
if (!parse_events_add_pmu(parse_state, list, pmu->name, terms,
- /*auto_merge_stats=*/true)) {
+ auto_merge_stats)) {
ok++;
parse_state->wild_card_pmus = true;
}
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index cd4247a379d4..f4f0afbc391c 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1660,6 +1660,11 @@ bool perf_pmu__supports_wildcard_numeric(const struct perf_pmu *pmu)
return is_pmu_core(pmu->name) || perf_pmu__is_hybrid(pmu->name);
}

+bool perf_pmu__auto_merge_stats(const struct perf_pmu *pmu)
+{
+ return !perf_pmu__is_hybrid(pmu->name);
+}
+
static bool pmu_alias_is_duplicate(struct sevent *alias_a,
struct sevent *alias_b)
{
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 5a19536a5449..0e0cb6283594 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -222,6 +222,7 @@ struct perf_pmu *perf_pmu__scan(struct perf_pmu *pmu);
bool is_pmu_core(const char *name);
bool perf_pmu__supports_legacy_cache(const struct perf_pmu *pmu);
bool perf_pmu__supports_wildcard_numeric(const struct perf_pmu *pmu);
+bool perf_pmu__auto_merge_stats(const struct perf_pmu *pmu);
void print_pmu_events(const struct print_callbacks *print_cb, void *print_state);
bool pmu_have_event(const char *pname, const char *name);

--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:21:13

by Ian Rogers

Subject: [PATCH v1 39/40] perf jevents: Don't rewrite metrics across PMUs

Don't rewrite metrics across PMUs, as the resulting events likely won't
be found. Identify each metric by a (PMU name, metric name) pair.

Signed-off-by: Ian Rogers <[email protected]>
---
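Note: a toy, self-contained model of the PMU-scoped rewrite may help
when reading the metric.py hunk. It is not the real implementation:
expressions are modelled as sets of summed terms rather than
metric.Expression objects, and rewrite_in_terms_of_others() is a
hypothetical name. Only the (pmu, name) keying, the 'cpu' default and
the same-PMU check are taken from the patch.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# (pmu, metric name, terms that are summed); pmu may be None.
Metric = Tuple[Optional[str], str, FrozenSet[str]]


def rewrite_in_terms_of_others(
    metrics: List[Metric]) -> Dict[Tuple[str, str], FrozenSet[str]]:
  """Shorten metrics using other metrics defined on the same PMU."""
  updates: Dict[Tuple[str, str], FrozenSet[str]] = {}
  for outer_pmu, outer_name, outer_terms in metrics:
    if outer_pmu is None:
      outer_pmu = 'cpu'  # same default as the patch
    updated = outer_terms
    for inner_pmu, inner_name, inner_terms in metrics:
      if inner_pmu is None:
        inner_pmu = 'cpu'
      if inner_pmu.lower() != outer_pmu.lower():
        continue  # never substitute a metric from another PMU
      if inner_name.lower() == outer_name.lower():
        continue  # a metric is not rewritten in terms of itself
      if inner_terms < updated:
        # Strict subset: replace those terms with the metric's name.
        updated = (updated - inner_terms) | {inner_name}
    if updated != outer_terms:
      updates[(outer_pmu, outer_name)] = updated
  return updates


before = [
    ('cpu_core', 'm1', frozenset({'a', 'b', 'c', 'd'})),
    ('cpu_core', 'm2', frozenset({'a', 'b', 'c'})),
    ('cpu_atom', 'm1', frozenset({'a', 'b', 'c', 'd'})),
]
after = rewrite_in_terms_of_others(before)
```

Here only the cpu_core copy of m1 is shortened to m2 + d. The cpu_atom
copy is left alone because no m2 exists on that PMU, which is the
situation the patch is guarding against.
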
tools/perf/pmu-events/jevents.py | 4 ++--
tools/perf/pmu-events/metric.py | 28 +++++++++++++++++-----------
tools/perf/pmu-events/metric_test.py | 6 +++---
3 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index b18dd2fcbf04..487ff01baf1b 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -391,11 +391,11 @@ def read_json_events(path: str, topic: str) -> Sequence[JsonEvent]:
except BaseException as err:
print(f"Exception processing {path}")
raise
- metrics: list[Tuple[str, metric.Expression]] = []
+ metrics: list[Tuple[str, str, metric.Expression]] = []
for event in events:
event.topic = topic
if event.metric_name and '-' not in event.metric_name:
- metrics.append((event.metric_name, event.metric_expr))
+ metrics.append((event.pmu, event.metric_name, event.metric_expr))
updates = metric.RewriteMetricsInTermsOfOthers(metrics)
if updates:
for event in events:
diff --git a/tools/perf/pmu-events/metric.py b/tools/perf/pmu-events/metric.py
index 8ec0ba884673..af58b74d1644 100644
--- a/tools/perf/pmu-events/metric.py
+++ b/tools/perf/pmu-events/metric.py
@@ -552,28 +552,34 @@ def ParsePerfJson(orig: str) -> Expression:
return _Constify(eval(compile(parsed, orig, 'eval')))


-def RewriteMetricsInTermsOfOthers(metrics: List[Tuple[str, Expression]]
- )-> Dict[str, Expression]:
+def RewriteMetricsInTermsOfOthers(metrics: List[Tuple[str, str, Expression]]
+ )-> Dict[Tuple[str, str], Expression]:
"""Shorten metrics by rewriting in terms of others.

Args:
- metrics (list): pairs of metric names and their expressions.
+ metrics (list): pmus, metric names and their expressions.
Returns:
- Dict: mapping from a metric name to a shortened expression.
+ Dict: mapping from a pmu, metric name pair to a shortened expression.
"""
- updates: Dict[str, Expression] = dict()
- for outer_name, outer_expression in metrics:
+ updates: Dict[Tuple[str, str], Expression] = dict()
+ for outer_pmu, outer_name, outer_expression in metrics:
+ if outer_pmu is None:
+ outer_pmu = 'cpu'
updated = outer_expression
while True:
- for inner_name, inner_expression in metrics:
+ for inner_pmu, inner_name, inner_expression in metrics:
+ if inner_pmu is None:
+ inner_pmu = 'cpu'
+ if inner_pmu.lower() != outer_pmu.lower():
+ continue
if inner_name.lower() == outer_name.lower():
continue
- if inner_name in updates:
- inner_expression = updates[inner_name]
+ if (inner_pmu, inner_name) in updates:
+ inner_expression = updates[(inner_pmu, inner_name)]
updated = updated.Substitute(inner_name, inner_expression)
if updated.Equals(outer_expression):
break
- if outer_name in updates and updated.Equals(updates[outer_name]):
+ if (outer_pmu, outer_name) in updates and updated.Equals(updates[(outer_pmu, outer_name)]):
break
- updates[outer_name] = updated
+ updates[(outer_pmu, outer_name)] = updated
return updates
diff --git a/tools/perf/pmu-events/metric_test.py b/tools/perf/pmu-events/metric_test.py
index 40a3c7d8b2bc..ee22ff43ddd7 100755
--- a/tools/perf/pmu-events/metric_test.py
+++ b/tools/perf/pmu-events/metric_test.py
@@ -158,9 +158,9 @@ class TestMetricExpressions(unittest.TestCase):

def test_RewriteMetricsInTermsOfOthers(self):
Expression.__eq__ = lambda e1, e2: e1.Equals(e2)
- before = [('m1', ParsePerfJson('a + b + c + d')),
- ('m2', ParsePerfJson('a + b + c'))]
- after = {'m1': ParsePerfJson('m2 + d')}
+ before = [('cpu', 'm1', ParsePerfJson('a + b + c + d')),
+ ('cpu', 'm2', ParsePerfJson('a + b + c'))]
+ after = {('cpu', 'm1'): ParsePerfJson('m2 + d')}
self.assertEqual(RewriteMetricsInTermsOfOthers(before), after)
Expression.__eq__ = None

--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:21:32

by Ian Rogers

Subject: [PATCH v1 35/40] perf metrics: Be PMU specific for referenced metrics.

Hybrid systems may define the same metric for different PMUs, which can
cause events to be confused between them. To avoid this, make the
searches for referenced metrics PMU specific, matching the PMU in the
table.

Signed-off-by: Ian Rogers <[email protected]>
---
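Note: the PMU filter that match_pm_metric() gains below can be
modelled in a few lines of Python. This is a sketch under stated
assumptions, not a port: PmuMetric is a hypothetical stand-in for
struct pmu_metric, and name matching is reduced to an exact,
case-insensitive comparison against the name or a ';'-separated group
list, which is simpler than what match_metric() does in C.

```python
from typing import NamedTuple, Optional


class PmuMetric(NamedTuple):
  """Hypothetical stand-in for the C struct pmu_metric."""
  pmu: Optional[str]
  metric_name: str
  metric_group: str


def match_pm_metric(pm: PmuMetric, pmu: str, metric: str) -> bool:
  """True if pm is on the requested PMU and matches name or group."""
  # Mirrors `pm->pmu ?: "cpu"`: a missing PMU means the "cpu" PMU.
  pm_pmu = pm.pmu if pm.pmu is not None else 'cpu'
  # "all" disables the PMU filter; otherwise the PMU must be equal.
  if pmu != 'all' and pm_pmu != pmu:
    return False
  # Simplified matching (assumption, see the note above).
  wanted = metric.lower()
  groups = [g.lower() for g in pm.metric_group.split(';')]
  return wanted == pm.metric_name.lower() or wanted in groups


table = [
    PmuMetric('cpu_core', 'tma_retiring', 'TopdownL1'),
    PmuMetric('cpu_atom', 'tma_retiring', 'TopdownL1'),
    PmuMetric(None, 'smi_cycles', 'smi'),
]
atom_only = [m.pmu for m in table
             if match_pm_metric(m, 'cpu_atom', 'TopdownL1')]
every_pmu = [m.pmu for m in table
             if match_pm_metric(m, 'all', 'TopdownL1')]
```

With a filter of "cpu_atom" only the atom copy of the metric matches,
while "all" matches both copies, as the existing callers that pass
"all" expect.
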
tools/perf/builtin-stat.c | 6 +-
tools/perf/pmu-events/jevents.py | 4 +-
tools/perf/pmu-events/pmu-events.h | 1 +
tools/perf/util/metricgroup.c | 97 +++++++++++++++++++++---------
tools/perf/util/metricgroup.h | 2 +-
5 files changed, 75 insertions(+), 35 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index e2119ffd08de..5dbdf001028b 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1794,7 +1794,7 @@ static int add_default_attributes(void)
* will use this approach. To determine transaction support
* on an architecture test for such a metric name.
*/
- if (!metricgroup__has_metric("transaction")) {
+ if (!metricgroup__has_metric("all", "transaction")) {
pr_err("Missing transaction metrics");
return -1;
}
@@ -1823,7 +1823,7 @@ static int add_default_attributes(void)
smi_reset = true;
}

- if (!metricgroup__has_metric("smi")) {
+ if (!metricgroup__has_metric("all", "smi")) {
pr_err("Missing smi metrics");
return -1;
}
@@ -1898,7 +1898,7 @@ static int add_default_attributes(void)
* Add TopdownL1 metrics if they exist. To minimize
* multiplexing, don't request threshold computation.
*/
- if (metricgroup__has_metric("TopdownL1")) {
+ if (metricgroup__has_metric("all", "TopdownL1")) {
struct evlist *metric_evlist = evlist__new();
struct evsel *metric_evsel;

diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index ca99b9cfe4ad..bbcaafaf7c25 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -51,8 +51,8 @@ _json_event_attributes = [

# Attributes that are in pmu_metric rather than pmu_event.
_json_metric_attributes = [
- 'metric_name', 'metric_group', 'metric_expr', 'metric_threshold', 'desc',
- 'long_desc', 'unit', 'compat', 'aggr_mode', 'event_grouping'
+ 'pmu', 'metric_name', 'metric_group', 'metric_expr', 'metric_threshold',
+ 'desc', 'long_desc', 'unit', 'compat', 'aggr_mode', 'event_grouping'
]
# Attributes that are bools or enum int values, encoded as '0', '1',...
_json_enum_attributes = ['aggr_mode', 'deprecated', 'event_grouping', 'perpkg']
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
index b7dff8f1021f..1dd8f35a2483 100644
--- a/tools/perf/pmu-events/pmu-events.h
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -51,6 +51,7 @@ struct pmu_event {
};

struct pmu_metric {
+ const char *pmu;
const char *metric_name;
const char *metric_group;
const char *metric_expr;
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 46fc31cff124..39fbdb5bab1f 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -123,6 +123,7 @@ struct metric {
* within the expression.
*/
struct expr_parse_ctx *pctx;
+ const char *pmu;
/** The name of the metric such as "IPC". */
const char *metric_name;
/** Modifier on the metric such as "u" or NULL for none. */
@@ -216,6 +217,7 @@ static struct metric *metric__new(const struct pmu_metric *pm,
if (!m->pctx)
goto out_err;

+ m->pmu = pm->pmu ?: "cpu";
m->metric_name = pm->metric_name;
m->modifier = NULL;
if (modifier) {
@@ -259,11 +261,12 @@ static bool contains_metric_id(struct evsel **metric_events, int num_events,
/**
* setup_metric_events - Find a group of events in metric_evlist that correspond
* to the IDs from a parsed metric expression.
+ * @pmu: The PMU for the IDs.
* @ids: the metric IDs to match.
* @metric_evlist: the list of perf events.
* @out_metric_events: holds the created metric events array.
*/
-static int setup_metric_events(struct hashmap *ids,
+static int setup_metric_events(const char *pmu, struct hashmap *ids,
struct evlist *metric_evlist,
struct evsel ***out_metric_events)
{
@@ -271,6 +274,7 @@ static int setup_metric_events(struct hashmap *ids,
const char *metric_id;
struct evsel *ev;
size_t ids_size, matched_events, i;
+ bool all_pmus = !strcmp(pmu, "all");

*out_metric_events = NULL;
ids_size = hashmap__size(ids);
@@ -283,6 +287,8 @@ static int setup_metric_events(struct hashmap *ids,
evlist__for_each_entry(metric_evlist, ev) {
struct expr_id_data *val_ptr;

+ if (!all_pmus && strcmp(ev->pmu_name, pmu))
+ continue;
/*
* Check for duplicate events with the same name. For
* example, uncore_imc/cas_count_read/ will turn into 6
@@ -355,8 +361,13 @@ static bool match_metric(const char *n, const char *list)
return false;
}

-static bool match_pm_metric(const struct pmu_metric *pm, const char *metric)
+static bool match_pm_metric(const struct pmu_metric *pm, const char *pmu, const char *metric)
{
+ const char *pm_pmu = pm->pmu ?: "cpu";
+
+ if (strcmp(pmu, "all") && strcmp(pm_pmu, pmu))
+ return false;
+
return match_metric(pm->metric_group, metric) ||
match_metric(pm->metric_name, metric);
}
@@ -766,6 +777,7 @@ struct visited_metric {

struct metricgroup_add_iter_data {
struct list_head *metric_list;
+ const char *pmu;
const char *metric_name;
const char *modifier;
int *ret;
@@ -779,7 +791,8 @@ struct metricgroup_add_iter_data {
const struct pmu_metrics_table *table;
};

-static bool metricgroup__find_metric(const char *metric,
+static bool metricgroup__find_metric(const char *pmu,
+ const char *metric,
const struct pmu_metrics_table *table,
struct pmu_metric *pm);

@@ -798,6 +811,7 @@ static int add_metric(struct list_head *metric_list,
* resolve_metric - Locate metrics within the root metric and recursively add
* references to them.
* @metric_list: The list the metric is added to.
+ * @pmu: The PMU name to resolve metrics on, or "all" for all PMUs.
* @modifier: if non-null event modifiers like "u".
* @metric_no_group: Should events written to events be grouped "{}" or
* global. Grouping is the default but due to multiplexing the
@@ -813,6 +827,7 @@ static int add_metric(struct list_head *metric_list,
* architecture perf is running upon.
*/
static int resolve_metric(struct list_head *metric_list,
+ const char *pmu,
const char *modifier,
bool metric_no_group,
bool metric_no_threshold,
@@ -842,7 +857,7 @@ static int resolve_metric(struct list_head *metric_list,
hashmap__for_each_entry(root_metric->pctx->ids, cur, bkt) {
struct pmu_metric pm;

- if (metricgroup__find_metric(cur->pkey, table, &pm)) {
+ if (metricgroup__find_metric(pmu, cur->pkey, table, &pm)) {
pending = realloc(pending,
(pending_cnt + 1) * sizeof(struct to_resolve));
if (!pending)
@@ -993,9 +1008,12 @@ static int __add_metric(struct list_head *metric_list,
}
if (!ret) {
/* Resolve referenced metrics. */
- ret = resolve_metric(metric_list, modifier, metric_no_group,
+ const char *pmu = pm->pmu ?: "cpu";
+
+ ret = resolve_metric(metric_list, pmu, modifier, metric_no_group,
metric_no_threshold, user_requested_cpu_list,
- system_wide, root_metric, &visited_node, table);
+ system_wide, root_metric, &visited_node,
+ table);
}
if (ret) {
if (is_root)
@@ -1008,6 +1026,7 @@ static int __add_metric(struct list_head *metric_list,
}

struct metricgroup__find_metric_data {
+ const char *pmu;
const char *metric;
struct pmu_metric *pm;
};
@@ -1017,6 +1036,10 @@ static int metricgroup__find_metric_callback(const struct pmu_metric *pm,
void *vdata)
{
struct metricgroup__find_metric_data *data = vdata;
+ const char *pm_pmu = pm->pmu ?: "cpu";
+
+ if (strcmp(data->pmu, "all") && strcmp(pm_pmu, data->pmu))
+ return 0;

if (!match_metric(pm->metric_name, data->metric))
return 0;
@@ -1025,11 +1048,13 @@ static int metricgroup__find_metric_callback(const struct pmu_metric *pm,
return 1;
}

-static bool metricgroup__find_metric(const char *metric,
+static bool metricgroup__find_metric(const char *pmu,
+ const char *metric,
const struct pmu_metrics_table *table,
struct pmu_metric *pm)
{
struct metricgroup__find_metric_data data = {
+ .pmu = pmu,
.metric = metric,
.pm = pm,
};
@@ -1083,7 +1108,7 @@ static int metricgroup__add_metric_sys_event_iter(const struct pmu_metric *pm,
struct metricgroup_add_iter_data *d = data;
int ret;

- if (!match_pm_metric(pm, d->metric_name))
+ if (!match_pm_metric(pm, d->pmu, d->metric_name))
return 0;

ret = add_metric(d->metric_list, pm, d->modifier, d->metric_no_group,
@@ -1128,6 +1153,7 @@ static int metric_list_cmp(void *priv __maybe_unused, const struct list_head *l,

struct metricgroup__add_metric_data {
struct list_head *list;
+ const char *pmu;
const char *metric_name;
const char *modifier;
const char *user_requested_cpu_list;
@@ -1144,10 +1170,7 @@ static int metricgroup__add_metric_callback(const struct pmu_metric *pm,
struct metricgroup__add_metric_data *data = vdata;
int ret = 0;

- if (pm->metric_expr &&
- (match_metric(pm->metric_group, data->metric_name) ||
- match_metric(pm->metric_name, data->metric_name))) {
-
+ if (pm->metric_expr && match_pm_metric(pm, data->pmu, data->metric_name)) {
data->has_match = true;
ret = add_metric(data->list, pm, data->modifier, data->metric_no_group,
data->metric_no_threshold, data->user_requested_cpu_list,
@@ -1159,6 +1182,7 @@ static int metricgroup__add_metric_callback(const struct pmu_metric *pm,

/**
* metricgroup__add_metric - Find and add a metric, or a metric group.
+ * @pmu: The PMU name to search for metrics on, or "all" for all PMUs.
* @metric_name: The name of the metric or metric group. For example, "IPC"
* could be the name of a metric and "TopDownL1" the name of a
* metric group.
@@ -1172,7 +1196,7 @@ static int metricgroup__add_metric_callback(const struct pmu_metric *pm,
* @table: The table that is searched for metrics, most commonly the table for the
* architecture perf is running upon.
*/
-static int metricgroup__add_metric(const char *metric_name, const char *modifier,
+static int metricgroup__add_metric(const char *pmu, const char *metric_name, const char *modifier,
bool metric_no_group, bool metric_no_threshold,
const char *user_requested_cpu_list,
bool system_wide,
@@ -1186,6 +1210,7 @@ static int metricgroup__add_metric(const char *metric_name, const char *modifier
{
struct metricgroup__add_metric_data data = {
.list = &list,
+ .pmu = pmu,
.metric_name = metric_name,
.modifier = modifier,
.metric_no_group = metric_no_group,
@@ -1210,6 +1235,7 @@ static int metricgroup__add_metric(const char *metric_name, const char *modifier
.fn = metricgroup__add_metric_sys_event_iter,
.data = (void *) &(struct metricgroup_add_iter_data) {
.metric_list = &list,
+ .pmu = pmu,
.metric_name = metric_name,
.modifier = modifier,
.metric_no_group = metric_no_group,
@@ -1239,6 +1265,7 @@ static int metricgroup__add_metric(const char *metric_name, const char *modifier
/**
* metricgroup__add_metric_list - Find and add metrics, or metric groups,
* specified in a list.
+ * @pmu: A pmu to restrict the metrics to, or "all" for all PMUs.
* @list: the list of metrics or metric groups. For example, "IPC,CPI,TopDownL1"
* would match the IPC and CPI metrics, and TopDownL1 would match all
* the metrics in the TopDownL1 group.
@@ -1251,7 +1278,8 @@ static int metricgroup__add_metric(const char *metric_name, const char *modifier
* @table: The table that is searched for metrics, most commonly the table for the
* architecture perf is running upon.
*/
-static int metricgroup__add_metric_list(const char *list, bool metric_no_group,
+static int metricgroup__add_metric_list(const char *pmu, const char *list,
+ bool metric_no_group,
bool metric_no_threshold,
const char *user_requested_cpu_list,
bool system_wide, struct list_head *metric_list,
@@ -1270,7 +1298,7 @@ static int metricgroup__add_metric_list(const char *list, bool metric_no_group,
if (modifier)
*modifier++ = '\0';

- ret = metricgroup__add_metric(metric_name, modifier,
+ ret = metricgroup__add_metric(pmu, metric_name, modifier,
metric_no_group, metric_no_threshold,
user_requested_cpu_list,
system_wide, metric_list, table);
@@ -1460,7 +1488,8 @@ static int parse_ids(bool metric_no_merge, struct perf_pmu *fake_pmu,
return ret;
}

-static int parse_groups(struct evlist *perf_evlist, const char *str,
+static int parse_groups(struct evlist *perf_evlist,
+ const char *pmu, const char *str,
bool metric_no_group,
bool metric_no_merge,
bool metric_no_threshold,
@@ -1478,7 +1507,7 @@ static int parse_groups(struct evlist *perf_evlist, const char *str,

if (metric_events_list->nr_entries == 0)
metricgroup__rblist_init(metric_events_list);
- ret = metricgroup__add_metric_list(str, metric_no_group, metric_no_threshold,
+ ret = metricgroup__add_metric_list(pmu, str, metric_no_group, metric_no_threshold,
user_requested_cpu_list,
system_wide, &metric_list, table);
if (ret)
@@ -1535,6 +1564,11 @@ static int parse_groups(struct evlist *perf_evlist, const char *str,
strcmp(m->modifier, n->modifier)))
continue;

+ if ((!m->pmu && n->pmu) ||
+ (m->pmu && !n->pmu) ||
+ (m->pmu && n->pmu && strcmp(m->pmu, n->pmu)))
+ continue;
+
if (expr__subset_of_ids(n->pctx, m->pctx)) {
pr_debug("Events in '%s' fully contained within '%s'\n",
m->metric_name, n->metric_name);
@@ -1552,7 +1586,8 @@ static int parse_groups(struct evlist *perf_evlist, const char *str,

metric_evlist = m->evlist;
}
- ret = setup_metric_events(m->pctx->ids, metric_evlist, &metric_events);
+ ret = setup_metric_events(fake_pmu ? "all" : m->pmu, m->pctx->ids,
+ metric_evlist, &metric_events);
if (ret) {
pr_debug("Cannot resolve IDs for %s: %s\n",
m->metric_name, m->metric_expr);
@@ -1623,7 +1658,7 @@ int metricgroup__parse_groups(struct evlist *perf_evlist,
if (!table)
return -EINVAL;

- return parse_groups(perf_evlist, str, metric_no_group, metric_no_merge,
+ return parse_groups(perf_evlist, "all", str, metric_no_group, metric_no_merge,
metric_no_threshold, user_requested_cpu_list, system_wide,
/*fake_pmu=*/NULL, metric_events, table);
}
@@ -1633,7 +1668,7 @@ int metricgroup__parse_groups_test(struct evlist *evlist,
const char *str,
struct rblist *metric_events)
{
- return parse_groups(evlist, str,
+ return parse_groups(evlist, "all", str,
/*metric_no_group=*/false,
/*metric_no_merge=*/false,
/*metric_no_threshold=*/false,
@@ -1642,28 +1677,32 @@ int metricgroup__parse_groups_test(struct evlist *evlist,
&perf_pmu__fake, metric_events, table);
}

+struct metricgroup__has_metric_data {
+ const char *pmu;
+ const char *metric;
+};
static int metricgroup__has_metric_callback(const struct pmu_metric *pm,
const struct pmu_metrics_table *table __maybe_unused,
void *vdata)
{
- const char *metric = vdata;
-
- if (match_metric(pm->metric_name, metric) ||
- match_metric(pm->metric_group, metric))
- return 1;
+ struct metricgroup__has_metric_data *data = vdata;

- return 0;
+ return match_pm_metric(pm, data->pmu, data->metric) ? 1 : 0;
}

-bool metricgroup__has_metric(const char *metric)
+bool metricgroup__has_metric(const char *pmu, const char *metric)
{
const struct pmu_metrics_table *table = pmu_metrics_table__find();
+ struct metricgroup__has_metric_data data = {
+ .pmu = pmu,
+ .metric = metric,
+ };

if (!table)
return false;

- return pmu_metrics_table_for_each_metric(table, metricgroup__has_metric_callback,
- (void *)metric) ? true : false;
+ return pmu_metrics_table_for_each_metric(table, metricgroup__has_metric_callback, &data)
+ ? true : false;
}

static int metricgroup__topdown_max_level_callback(const struct pmu_metric *pm,
diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index 77472e35705e..08e9b9e953ec 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -80,7 +80,7 @@ int metricgroup__parse_groups_test(struct evlist *evlist,
struct rblist *metric_events);

void metricgroup__print(const struct print_callbacks *print_cb, void *print_state);
-bool metricgroup__has_metric(const char *metric);
+bool metricgroup__has_metric(const char *pmu, const char *metric);
unsigned int metricgroups__topdown_max_level(void);
int arch_get_runtimeparam(const struct pmu_metric *pm);
void metricgroup__rblist_exit(struct rblist *metric_events);
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:21:35

by Ian Rogers

Subject: [PATCH v1 29/40] perf test: Fix parse-events tests for >1 core PMU

Remove assumptions of just 1 core PMU.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/parse-events.c | 179 +++++++++++++++++++-------------
1 file changed, 106 insertions(+), 73 deletions(-)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 0b8ec9b1034f..a0cd50e18ebc 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -25,6 +25,11 @@ static bool test_config(const struct evsel *evsel, __u64 expected_config)
return (evsel->core.attr.config & PERF_HW_EVENT_MASK) == expected_config;
}

+static bool test_perf_config(const struct perf_evsel *evsel, __u64 expected_config)
+{
+ return (evsel->attr.config & PERF_HW_EVENT_MASK) == expected_config;
+}
+
#ifdef HAVE_LIBTRACEEVENT

#if defined(__s390x__)
@@ -87,11 +92,27 @@ static int test__checkevent_tracepoint_multi(struct evlist *evlist)

static int test__checkevent_raw(struct evlist *evlist)
{
- struct evsel *evsel = evlist__first(evlist);
+ struct perf_evsel *evsel;
+ bool raw_type_match = false;

- TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
+ TEST_ASSERT_VAL("wrong number of entries", 0 != evlist->core.nr_entries);
+
+ perf_evlist__for_each_evsel(&evlist->core, evsel) {
+ struct perf_pmu *pmu;
+ bool type_matched = false;
+
+ TEST_ASSERT_VAL("wrong config", test_perf_config(evsel, 0x1a));
+ perf_pmus__for_each_pmu(pmu) {
+ if (pmu->type == evsel->attr.type) {
+ TEST_ASSERT_VAL("PMU type expected once", !type_matched);
+ type_matched = true;
+ if (pmu->type == PERF_TYPE_RAW)
+ raw_type_match = true;
+ }
+ }
+ TEST_ASSERT_VAL("No PMU found for type", type_matched);
+ }
+ TEST_ASSERT_VAL("Raw PMU not matched", raw_type_match);
return TEST_OK;
}

@@ -107,31 +128,35 @@ static int test__checkevent_numeric(struct evlist *evlist)

static int test__checkevent_symbolic_name(struct evlist *evlist)
{
- struct evsel *evsel = evlist__first(evlist);
+ struct perf_evsel *evsel;

- TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
+ TEST_ASSERT_VAL("wrong number of entries", 0 != evlist->core.nr_entries);
+
+ perf_evlist__for_each_evsel(&evlist->core, evsel) {
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->attr.type);
+ TEST_ASSERT_VAL("wrong config",
+ test_perf_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
+ }
return TEST_OK;
}

static int test__checkevent_symbolic_name_config(struct evlist *evlist)
{
- struct evsel *evsel = evlist__first(evlist);
+ struct perf_evsel *evsel;

- TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
- /*
- * The period value gets configured within evlist__config,
- * while this test executes only parse events method.
- */
- TEST_ASSERT_VAL("wrong period",
- 0 == evsel->core.attr.sample_period);
- TEST_ASSERT_VAL("wrong config1",
- 0 == evsel->core.attr.config1);
- TEST_ASSERT_VAL("wrong config2",
- 1 == evsel->core.attr.config2);
+ TEST_ASSERT_VAL("wrong number of entries", 0 != evlist->core.nr_entries);
+
+ perf_evlist__for_each_evsel(&evlist->core, evsel) {
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->attr.type);
+ TEST_ASSERT_VAL("wrong config", test_perf_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
+ /*
+ * The period value gets configured within evlist__config,
+ * while this test executes only parse events method.
+ */
+ TEST_ASSERT_VAL("wrong period", 0 == evsel->attr.sample_period);
+ TEST_ASSERT_VAL("wrong config1", 0 == evsel->attr.config1);
+ TEST_ASSERT_VAL("wrong config2", 1 == evsel->attr.config2);
+ }
return TEST_OK;
}

@@ -147,11 +172,14 @@ static int test__checkevent_symbolic_alias(struct evlist *evlist)

static int test__checkevent_genhw(struct evlist *evlist)
{
- struct evsel *evsel = evlist__first(evlist);
+ struct perf_evsel *evsel;

- TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_HW_CACHE == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 1 << 16));
+ TEST_ASSERT_VAL("wrong number of entries", 0 != evlist->core.nr_entries);
+
+ perf_evlist__for_each_entry(&evlist->core, evsel) {
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_HW_CACHE == evsel->attr.type);
+ TEST_ASSERT_VAL("wrong config", test_perf_config(evsel, 1 << 16));
+ }
return TEST_OK;
}

@@ -243,17 +271,15 @@ static int test__checkevent_tracepoint_modifier(struct evlist *evlist)
static int
test__checkevent_tracepoint_multi_modifier(struct evlist *evlist)
{
- struct evsel *evsel;
+ struct perf_evsel *evsel;

TEST_ASSERT_VAL("wrong number of entries", evlist->core.nr_entries > 1);

- evlist__for_each_entry(evlist, evsel) {
- TEST_ASSERT_VAL("wrong exclude_user",
- !evsel->core.attr.exclude_user);
- TEST_ASSERT_VAL("wrong exclude_kernel",
- evsel->core.attr.exclude_kernel);
- TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
- TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
+ perf_evlist__for_each_entry(&evlist->core, evsel) {
+ TEST_ASSERT_VAL("wrong exclude_user", !evsel->attr.exclude_user);
+ TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
+ TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
+ TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
}

return test__checkevent_tracepoint_multi(evlist);
@@ -262,25 +288,27 @@ test__checkevent_tracepoint_multi_modifier(struct evlist *evlist)

static int test__checkevent_raw_modifier(struct evlist *evlist)
{
- struct evsel *evsel = evlist__first(evlist);
-
- TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
- TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
- TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
- TEST_ASSERT_VAL("wrong precise_ip", evsel->core.attr.precise_ip);
+ struct perf_evsel *evsel;

+ perf_evlist__for_each_entry(&evlist->core, evsel) {
+ TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
+ TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
+ TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
+ TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip);
+ }
return test__checkevent_raw(evlist);
}

static int test__checkevent_numeric_modifier(struct evlist *evlist)
{
- struct evsel *evsel = evlist__first(evlist);
-
- TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
- TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
- TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
- TEST_ASSERT_VAL("wrong precise_ip", evsel->core.attr.precise_ip);
+ struct perf_evsel *evsel;

+ perf_evlist__for_each_entry(&evlist->core, evsel) {
+ TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
+ TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
+ TEST_ASSERT_VAL("wrong exclude_hv", !evsel->attr.exclude_hv);
+ TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip);
+ }
return test__checkevent_numeric(evlist);
}

@@ -298,21 +326,23 @@ static int test__checkevent_symbolic_name_modifier(struct evlist *evlist)

static int test__checkevent_exclude_host_modifier(struct evlist *evlist)
{
- struct evsel *evsel = evlist__first(evlist);
-
- TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
- TEST_ASSERT_VAL("wrong exclude host", evsel->core.attr.exclude_host);
+ struct perf_evsel *evsel;

+ perf_evlist__for_each_entry(&evlist->core, evsel) {
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude host", evsel->attr.exclude_host);
+ }
return test__checkevent_symbolic_name(evlist);
}

static int test__checkevent_exclude_guest_modifier(struct evlist *evlist)
{
- struct evsel *evsel = evlist__first(evlist);
-
- TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
- TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
-
+ struct perf_evsel *evsel;
+
+ perf_evlist__for_each_entry(&evlist->core, evsel) {
+ TEST_ASSERT_VAL("wrong exclude guest", evsel->attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude host", !evsel->attr.exclude_host);
+ }
return test__checkevent_symbolic_name(evlist);
}

@@ -330,13 +360,14 @@ static int test__checkevent_symbolic_alias_modifier(struct evlist *evlist)

static int test__checkevent_genhw_modifier(struct evlist *evlist)
{
- struct evsel *evsel = evlist__first(evlist);
-
- TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
- TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
- TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
- TEST_ASSERT_VAL("wrong precise_ip", evsel->core.attr.precise_ip);
+ struct perf_evsel *evsel;

+ perf_evlist__for_each_entry(&evlist->core, evsel) {
+ TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
+ TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
+ TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
+ TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip);
+ }
return test__checkevent_genhw(evlist);
}

@@ -466,21 +497,23 @@ static int test__checkevent_list(struct evlist *evlist)
{
struct evsel *evsel = evlist__first(evlist);

- TEST_ASSERT_VAL("wrong number of entries", 3 == evlist->core.nr_entries);
+ TEST_ASSERT_VAL("wrong number of entries", 3 <= evlist->core.nr_entries);

/* r1 */
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 1));
- TEST_ASSERT_VAL("wrong config1", 0 == evsel->core.attr.config1);
- TEST_ASSERT_VAL("wrong config2", 0 == evsel->core.attr.config2);
- TEST_ASSERT_VAL("wrong config3", 0 == evsel->core.attr.config3);
- TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
- TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
- TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
- TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_TRACEPOINT != evsel->core.attr.type);
+ while (PERF_TYPE_TRACEPOINT != evsel->core.attr.type) {
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, 1));
+ TEST_ASSERT_VAL("wrong config1", 0 == evsel->core.attr.config1);
+ TEST_ASSERT_VAL("wrong config2", 0 == evsel->core.attr.config2);
+ TEST_ASSERT_VAL("wrong config3", 0 == evsel->core.attr.config3);
+ TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
+ TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
+ TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
+ TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
+ evsel = evsel__next(evsel);
+ }

/* syscalls:sys_enter_openat:k */
- evsel = evsel__next(evsel);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_TRACEPOINT == evsel->core.attr.type);
TEST_ASSERT_VAL("wrong sample_type",
PERF_TP_SAMPLE_TYPE == evsel->core.attr.sample_type);
@@ -1916,7 +1949,7 @@ static int test_event(const struct evlist_test *e)
e->name, ret, err.str);
parse_events_error__print(&err, e->name);
ret = TEST_FAIL;
- if (strstr(err.str, "can't access trace events"))
+ if (err.str && strstr(err.str, "can't access trace events"))
ret = TEST_SKIP;
} else {
ret = e->check(evlist);
--
2.40.1.495.gc816e09b53d-goog

2023-04-26 07:22:11

by Ian Rogers

Subject: [PATCH v1 32/40] perf parse-events: Avoid error when assigning a legacy cache term

Avoid the parser error:
'''
$ perf stat -e 'cycles/name=l1d/' true
event syntax error: 'cycles/name=l1d/'
\___ parser error
'''
by combining the name and legacy cache cases in the parser.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/parse-events.c | 21 +++++++++++++++++++++
tools/perf/util/parse-events.y | 10 ++++++----
2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 06042f450ece..c44f0ffa51c5 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1495,6 +1495,16 @@ static int test__term_equal_term(struct evlist *evlist)
return TEST_OK;
}

+static int test__term_equal_legacy(struct evlist *evlist)
+{
+ struct evsel *evsel = evlist__first(evlist);
+
+ TEST_ASSERT_VAL("wrong type", evsel->core.attr.type == PERF_TYPE_HARDWARE);
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
+ TEST_ASSERT_VAL("wrong name setting", strcmp(evsel->name, "l1d") == 0);
+ return TEST_OK;
+}
+
#ifdef HAVE_LIBTRACEEVENT
static int count_tracepoints(void)
{
@@ -1872,6 +1882,11 @@ static const struct evlist_test test__events[] = {
.check = test__term_equal_term,
/* 8 */
},
+ {
+ .name = "cycles/name=l1d/",
+ .check = test__term_equal_legacy,
+ /* 9 */
+ },
};

static const struct evlist_test test__events_pmu[] = {
@@ -2059,6 +2074,12 @@ static const struct evlist_test test__events_pmu[] = {
.check = test__term_equal_term,
/* 0 */
},
+ {
+ .name = "cpu/cycles,name=l1d/",
+ .valid = test__pmu_cpu_valid,
+ .check = test__term_equal_legacy,
+ /* 1 */
+ },
};

struct terms_test {
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 0aaebc57748e..f4ee03b5976b 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -82,7 +82,7 @@ static void free_list_evsel(struct list_head* list_evsel)
%type <str> PE_EVENT_NAME
%type <str> PE_KERNEL_PMU_EVENT PE_PMU_EVENT_FAKE
%type <str> PE_DRV_CFG_TERM
-%type <str> name_or_raw
+%type <str> name_or_raw name_or_legacy
%destructor { free ($$); } <str>
%type <term> event_term
%destructor { parse_events_term__delete ($$); } <term>
@@ -739,6 +739,8 @@ event_term

name_or_raw: PE_RAW | PE_NAME | PE_LEGACY_CACHE

+name_or_legacy: PE_NAME | PE_LEGACY_CACHE
+
event_term:
PE_RAW
{
@@ -752,7 +754,7 @@ PE_RAW
$$ = term;
}
|
-name_or_raw '=' PE_NAME
+name_or_raw '=' name_or_legacy
{
struct parse_events_term *term;

@@ -826,7 +828,7 @@ PE_TERM_HW
$$ = term;
}
|
-PE_TERM '=' PE_NAME
+PE_TERM '=' name_or_legacy
{
struct parse_events_term *term;

@@ -872,7 +874,7 @@ PE_TERM
$$ = term;
}
|
-name_or_raw array '=' PE_NAME
+name_or_raw array '=' name_or_legacy
{
struct parse_events_term *term;

--
2.40.1.495.gc816e09b53d-goog
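
The grammar change in this patch can be illustrated with a minimal sketch.
It assumes, as the commit message implies, that the lexer classifies "l1d"
as a legacy cache token rather than a plain name. The enum, the function
names and the short list of cache spellings below are hypothetical
stand-ins for illustration, not the bison/flex rules themselves:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds standing in for PE_NAME and PE_LEGACY_CACHE. */
enum sketch_token { SK_NAME, SK_LEGACY_CACHE };

/*
 * Hypothetical lexer rule: a word that spells a legacy cache name is
 * returned as SK_LEGACY_CACHE, anything else as SK_NAME. The list is an
 * illustrative subset only.
 */
static enum sketch_token sketch_lex(const char *word)
{
	static const char *const cache_names[] = {
		"l1d", "l1-d", "L1-dcache", "L1-data",
	};
	size_t i;

	for (i = 0; i < sizeof(cache_names) / sizeof(cache_names[0]); i++) {
		if (!strcmp(word, cache_names[i]))
			return SK_LEGACY_CACHE;
	}
	return SK_NAME;
}

/* Before the patch: the value after '=' had to be a plain name. */
static bool sketch_value_ok_before(enum sketch_token t)
{
	return t == SK_NAME;
}

/* After the patch: name_or_legacy accepts either token kind. */
static bool sketch_value_ok_after(enum sketch_token t)
{
	return t == SK_NAME || t == SK_LEGACY_CACHE;
}
```

Under these assumptions, the value "l1d" in 'cycles/name=l1d/' is rejected
by sketch_value_ok_before() and accepted by sketch_value_ok_after(), which
mirrors the parser error going away.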

2023-04-26 07:22:18

by Ian Rogers

Subject: [PATCH v1 40/40] perf metrics: Be PMU specific in event match

Ids/events from a metric are turned into an event string and parsed;
setup_metric_events matches each id back to the parsed evsel. With
hybrid, the same event may exist on both PMUs under the same name and
be used by metrics on both at the same time. A metric on cpu_core
therefore shouldn't match against evsels on cpu_atom, or the metric will
compute the wrong value. Make the matching sensitive to the PMU being parsed.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/metricgroup.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 4245b23d8efe..490561f430f2 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -274,7 +274,7 @@ static int setup_metric_events(const char *pmu, struct hashmap *ids,
const char *metric_id;
struct evsel *ev;
size_t ids_size, matched_events, i;
- bool all_pmus = !strcmp(pmu, "all");
+ bool all_pmus = !strcmp(pmu, "all") || !perf_pmu__is_hybrid(pmu);

*out_metric_events = NULL;
ids_size = hashmap__size(ids);
@@ -287,7 +287,10 @@ static int setup_metric_events(const char *pmu, struct hashmap *ids,
evlist__for_each_entry(metric_evlist, ev) {
struct expr_id_data *val_ptr;

- if (!all_pmus && strcmp(ev->pmu_name, pmu))
+ /* Don't match events for the wrong hybrid PMU. */
+ if (!all_pmus && ev->pmu_name &&
+ perf_pmu__is_hybrid(ev->pmu_name) &&
+ strcmp(ev->pmu_name, pmu))
continue;
/*
* Check for duplicate events with the same name. For
@@ -304,6 +307,7 @@ static int setup_metric_events(const char *pmu, struct hashmap *ids,
* about this event.
*/
if (hashmap__find(ids, metric_id, &val_ptr)) {
+ pr_debug("Matched metric-id %s to %s\n", metric_id, evsel__name(ev));
metric_events[matched_events++] = ev;

if (matched_events >= ids_size)
@@ -1592,7 +1596,7 @@ static int parse_groups(struct evlist *perf_evlist,
ret = setup_metric_events(fake_pmu ? "all" : m->pmu, m->pctx->ids,
metric_evlist, &metric_events);
if (ret) {
- pr_debug("Cannot resolve IDs for %s: %s\n",
+ pr_err("Cannot resolve IDs for %s: %s\n",
m->metric_name, m->metric_expr);
goto out;
}
--
2.40.1.495.gc816e09b53d-goog
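
The filter added to setup_metric_events can be read as a small predicate.
The sketch below is a standalone approximation: sketch_is_hybrid() is a
hypothetical stand-in for perf_pmu__is_hybrid() that simply assumes the two
Alderlake PMU names seen in this thread, and sketch_skip_evsel() is an
illustrative name, not a function in perf:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for perf_pmu__is_hybrid(): assumes the hybrid
 * PMUs are exactly "cpu_core" and "cpu_atom".
 */
static bool sketch_is_hybrid(const char *pmu_name)
{
	return !strcmp(pmu_name, "cpu_core") || !strcmp(pmu_name, "cpu_atom");
}

/*
 * Returns true when an evsel on ev_pmu should be skipped while resolving
 * ids for a metric on metric_pmu. ev_pmu may be NULL, as in the patch.
 */
static bool sketch_skip_evsel(const char *metric_pmu, const char *ev_pmu)
{
	/* "all", or any non-hybrid metric PMU, matches every evsel. */
	bool all_pmus = !strcmp(metric_pmu, "all") ||
			!sketch_is_hybrid(metric_pmu);

	/* Only evsels on the *other* hybrid PMU are filtered out. */
	return !all_pmus && ev_pmu && sketch_is_hybrid(ev_pmu) &&
	       strcmp(ev_pmu, metric_pmu) != 0;
}
```

With these assumptions, a cpu_core metric skips cpu_atom evsels but still
matches evsels with no PMU name or on a non-hybrid PMU, so events shared
across PMUs remain usable.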

2023-04-26 10:12:27

by James Clark

Subject: Re: [PATCH v1 21/40] perf parse-events: Wildcard legacy cache events



On 26/04/2023 08:00, Ian Rogers wrote:
> It is inconsistent that "perf stat -e instructions-retired" wildcard
> opens on all PMUs while legacy cache events like "perf stat -e
> L1-dcache-load-miss" do not. A behavior introduced by hybrid is that a
> legacy cache event like L1-dcache-load-miss should wildcard open on
> all hybrid PMUs. A call to is_event_supported is necessary for each
> PMU, a failure of which results in the event not being added. Rather
> than special case that logic, move it into the main legacy cache event
> case and attempt to open legacy cache events on all PMUs.
>
> Signed-off-by: Ian Rogers <[email protected]>
> ---
> tools/perf/util/parse-events-hybrid.c | 33 -------------
> tools/perf/util/parse-events-hybrid.h | 7 ---
> tools/perf/util/parse-events.c | 70 ++++++++++++++-------------
> tools/perf/util/parse-events.h | 3 +-
> tools/perf/util/parse-events.y | 2 +-
> 5 files changed, 39 insertions(+), 76 deletions(-)
>
> diff --git a/tools/perf/util/parse-events-hybrid.c b/tools/perf/util/parse-events-hybrid.c
> index 7c9f9150bad5..d2c0be051d46 100644
> --- a/tools/perf/util/parse-events-hybrid.c
> +++ b/tools/perf/util/parse-events-hybrid.c
> @@ -179,36 +179,3 @@ int parse_events__add_numeric_hybrid(struct parse_events_state *parse_state,
> return add_raw_hybrid(parse_state, list, attr, name, metric_id,
> config_terms);
> }
> -
> -int parse_events__add_cache_hybrid(struct list_head *list, int *idx,
> - struct perf_event_attr *attr,
> - const char *name,
> - const char *metric_id,
> - struct list_head *config_terms,
> - bool *hybrid,
> - struct parse_events_state *parse_state)
> -{
> - struct perf_pmu *pmu;
> - int ret;
> -
> - *hybrid = false;
> - if (!perf_pmu__has_hybrid())
> - return 0;
> -
> - *hybrid = true;
> - perf_pmu__for_each_hybrid_pmu(pmu) {
> - LIST_HEAD(terms);
> -
> - if (pmu_cmp(parse_state, pmu))
> - continue;
> -
> - copy_config_terms(&terms, config_terms);
> - ret = create_event_hybrid(PERF_TYPE_HW_CACHE, idx, list,
> - attr, name, metric_id, &terms, pmu);
> - free_config_terms(&terms);
> - if (ret)
> - return ret;
> - }
> -
> - return 0;
> -}
> diff --git a/tools/perf/util/parse-events-hybrid.h b/tools/perf/util/parse-events-hybrid.h
> index cbc05fec02a2..bc2966e73897 100644
> --- a/tools/perf/util/parse-events-hybrid.h
> +++ b/tools/perf/util/parse-events-hybrid.h
> @@ -15,11 +15,4 @@ int parse_events__add_numeric_hybrid(struct parse_events_state *parse_state,
> struct list_head *config_terms,
> bool *hybrid);
>
> -int parse_events__add_cache_hybrid(struct list_head *list, int *idx,
> - struct perf_event_attr *attr,
> - const char *name, const char *metric_id,
> - struct list_head *config_terms,
> - bool *hybrid,
> - struct parse_events_state *parse_state);
> -
> #endif /* __PERF_PARSE_EVENTS_HYBRID_H */
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 9b2d7b6572c2..e007b2bc1ab4 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -471,46 +471,50 @@ static int parse_events__decode_legacy_cache(const char *name, int pmu_type, __u
>
> int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
> struct parse_events_error *err,
> - struct list_head *head_config,
> - struct parse_events_state *parse_state)
> + struct list_head *head_config)
> {
> - struct perf_event_attr attr;
> - LIST_HEAD(config_terms);
> - const char *config_name, *metric_id;
> - int ret;
> - bool hybrid;
> + struct perf_pmu *pmu = NULL;
> + bool found_supported = false;
> + const char *config_name = get_config_name(head_config);
> + const char *metric_id = get_config_metric_id(head_config);
>
> + while ((pmu = perf_pmu__scan(pmu)) != NULL) {
> + LIST_HEAD(config_terms);
> + struct perf_event_attr attr;
> + int ret;
>
> - memset(&attr, 0, sizeof(attr));
> - attr.type = PERF_TYPE_HW_CACHE;
> - ret = parse_events__decode_legacy_cache(name, /*pmu_type=*/0, &attr.config);
> - if (ret)
> - return ret;
> + /*
> + * Skip uncore PMUs for performance. Software PMUs can open
> + * PERF_TYPE_HW_CACHE, so skip.
> + */
> + if (pmu->is_uncore || pmu->type == PERF_TYPE_SOFTWARE)
> + continue;
>
> - if (head_config) {
> - if (config_attr(&attr, head_config, err,
> - config_term_common))
> - return -EINVAL;
> + memset(&attr, 0, sizeof(attr));
> + attr.type = PERF_TYPE_HW_CACHE;
>
> - if (get_config_terms(head_config, &config_terms))
> - return -ENOMEM;
> - }
> + ret = parse_events__decode_legacy_cache(name, pmu->type, &attr.config);
> + if (ret)
> + return ret;
>
> - config_name = get_config_name(head_config);
> - metric_id = get_config_metric_id(head_config);
> - ret = parse_events__add_cache_hybrid(list, idx, &attr,
> - config_name ? : name,
> - metric_id,
> - &config_terms,
> - &hybrid, parse_state);
> - if (hybrid)
> - goto out_free_terms;
> + if (!is_event_supported(PERF_TYPE_HW_CACHE, attr.config))
> + continue;

Hi Ian,

I get a test failure on Arm from this commit. I think it's related to
this check for support that's failing but I'm not sure what the
resolution should be. I also couldn't see why the metrics in
test_soc/cpu/metrics.json aren't run on x86 (assuming they're generic
'test anywhere' type metrics?).

$ perf test -vvv "parsing of PMU event table metrics with fake"
...
parsing 'dcache_miss_cpi': 'l1d\-loads\-misses / inst_retired.any'
parsing metric: l1d\-loads\-misses / inst_retired.any
Attempting to add event pmu 'inst_retired.any' with
'inst_retired.any,' that may result in non-fatal errors
After aliases, add event pmu 'inst_retired.any' with
'inst_retired.any,' that may result in non-fatal errors
inst_retired.any -> fake_pmu/inst_retired.any/
------------------------------------------------------------
perf_event_attr:
type 3
config 0x800010000
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8
sys_perf_event_open failed, error -2

check_parse_fake failed
test child finished with -1
---- end ----
PMU events subtest 4: FAILED!

>
> - ret = add_event(list, idx, &attr, config_name ? : name, metric_id,
> - &config_terms);
> -out_free_terms:
> - free_config_terms(&config_terms);
> - return ret;
> + found_supported = true;
> +
> + if (head_config) {
> + if (config_attr(&attr, head_config, err,
> + config_term_common))
> + return -EINVAL;
> +
> + if (get_config_terms(head_config, &config_terms))
> + return -ENOMEM;
> + }
> +
> + ret = add_event(list, idx, &attr, config_name ? : name, metric_id, &config_terms);
> + free_config_terms(&config_terms);
> + }
> + return found_supported ? 0: -EINVAL;
> }
>
> #ifdef HAVE_LIBTRACEEVENT
> diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
> index 5acb62c2e00a..0c26303f7f63 100644
> --- a/tools/perf/util/parse-events.h
> +++ b/tools/perf/util/parse-events.h
> @@ -172,8 +172,7 @@ int parse_events_add_tool(struct parse_events_state *parse_state,
> int tool_event);
> int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
> struct parse_events_error *error,
> - struct list_head *head_config,
> - struct parse_events_state *parse_state);
> + struct list_head *head_config);
> int parse_events_add_breakpoint(struct list_head *list, int *idx,
> u64 addr, char *type, u64 len);
> int parse_events_add_pmu(struct parse_events_state *parse_state,
> diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
> index f84fa1b132b3..cc7528558845 100644
> --- a/tools/perf/util/parse-events.y
> +++ b/tools/perf/util/parse-events.y
> @@ -476,7 +476,7 @@ PE_LEGACY_CACHE opt_event_config
>
> list = alloc_list();
> ABORT_ON(!list);
> - err = parse_events_add_cache(list, &parse_state->idx, $1, error, $2, parse_state);
> + err = parse_events_add_cache(list, &parse_state->idx, $1, error, $2);
>
> parse_events_terms__delete($2);
> free($1);
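
The control flow of the rewritten parse_events_add_cache quoted above can
be sketched in isolation. struct sketch_pmu, its supports_event flag (a
stand-in for the is_event_supported() call) and sketch_add_cache() are
hypothetical; the sketch also omits the software-PMU check and the config
term handling. It only shows the return-value behaviour and does not
establish the cause of the Arm failure reported above:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a PMU for this sketch only. */
struct sketch_pmu {
	const char *name;
	bool is_uncore;
	bool supports_event;	/* stand-in for is_event_supported() */
};

/*
 * Try the legacy cache event on every PMU. Uncore PMUs and PMUs where
 * the support check fails are skipped; *nr_added counts the PMUs that
 * would receive an event. If no PMU supports the event, the whole parse
 * fails with -EINVAL.
 */
static int sketch_add_cache(const struct sketch_pmu *pmus, size_t nr,
			    size_t *nr_added)
{
	bool found_supported = false;
	size_t i;

	*nr_added = 0;
	for (i = 0; i < nr; i++) {
		if (pmus[i].is_uncore)
			continue;
		if (!pmus[i].supports_event)
			continue;
		found_supported = true;
		(*nr_added)++;
	}
	return found_supported ? 0 : -EINVAL;
}
```

In this model, an environment where the support check fails on every PMU
returns -EINVAL even though the event name itself decoded correctly.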

2023-04-26 14:04:59

by Liang, Kan

Subject: Re: [PATCH v1 00/40] Fix perf on Intel hybrid CPUs



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
> or individually, event parsing doesn't always scan all PMUs, more and
> new tests that also run without hybrid, less code.
>
> The first patches were previously posted to improve metrics here:
> "perf stat: Introduce skippable evsels"
> https://lore.kernel.org/all/[email protected]/
> "perf vendor events intel: Add xxx metric constraints"
> https://lore.kernel.org/all/[email protected]/
>
> Next are some general test improvements.
>
> Next event parsing is rewritten to not scan all PMUs for the benefit
> of raw and legacy cache parsing, instead these are handled by the
> lexer and a new term type. This ultimately removes the need for the
> event parser for hybrid to be recursive as legacy cache can be just a
> term. Tests are re-enabled for events with hyphens, so AMD's
> branch-brs event is now parsable.
>
> The cputype option is made a generic pmu filter flag and is tested
> even on non-hybrid systems.
>
> The final patches address specific json metric issues on hybrid, in
> both the json metrics and the metric code. They also bring in a new
> json option to not group events when matching a metricgroup, this
> helps reduce counter pressure for TopdownL1 and TopdownL2 metric
> groups. The updates to the script that updates the json are posted in:
> https://github.com/intel/perfmon/pull/73
>
> The patches add slightly more code than they remove, in areas like
> better json metric constraints and tests, but in the core util code,
> the removal of hybrid is a net reduction:
> 20 files changed, 631 insertions(+), 951 deletions(-)
>
> There's specific detail with each patch, but for now here is the 6.3
> output followed by that from perf-tools-next with the patch series
> applied. The tool is running on an Alderlake CPU on an elderly 5.15
> kernel:
>
> Events on hybrid that parse and pass tests:
> '''
> $ perf-6.3 version
> perf version 6.3.rc7.gb7bc77e2f2c7
> $ perf-6.3 test
> ...
> 6.1: Test event parsing : FAILED!
> ...
> $ perf test
> ...
> 6: Parse event definition strings :
> 6.1: Test event parsing : Ok
> 6.2: Parsing of all PMU events from sysfs : Ok
> 6.3: Parsing of given PMU events from sysfs : Ok
> 6.4: Parsing of aliased events from sysfs : Skip (no aliases in sysfs)
> 6.5: Parsing of aliased events : Ok
> 6.6: Parsing of terms (event modifiers) : Ok
> ...
> '''
>
> No event/metric running with json metrics and TopdownL1 on both PMUs:
> '''
> $ perf-6.3 stat -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 24,073.58 msec cpu-clock # 23.975 CPUs utilized
> 350 context-switches # 14.539 /sec
> 25 cpu-migrations # 1.038 /sec
> 66 page-faults # 2.742 /sec
> 21,257,199 cpu_core/cycles/ # 883.009 K/sec
> 2,162,192 cpu_atom/cycles/ # 89.816 K/sec
> 6,679,379 cpu_core/instructions/ # 277.457 K/sec
> 753,197 cpu_atom/instructions/ # 31.287 K/sec
> 1,300,647 cpu_core/branches/ # 54.028 K/sec
> 148,652 cpu_atom/branches/ # 6.175 K/sec
> 117,429 cpu_core/branch-misses/ # 4.878 K/sec
> 14,396 cpu_atom/branch-misses/ # 598.000 /sec
> 123,097,644 cpu_core/slots/ # 5.113 M/sec
> 9,241,207 cpu_core/topdown-retiring/ # 7.5% Retiring
> 8,903,288 cpu_core/topdown-bad-spec/ # 7.2% Bad Speculation
> 66,590,029 cpu_core/topdown-fe-bound/ # 54.1% Frontend Bound
> 38,397,500 cpu_core/topdown-be-bound/ # 31.2% Backend Bound
> 3,294,283 cpu_core/topdown-heavy-ops/ # 2.7% Heavy Operations # 4.8% Light Operations
> 8,855,769 cpu_core/topdown-br-mispredict/ # 7.2% Branch Mispredict # 0.0% Machine Clears
> 57,695,714 cpu_core/topdown-fetch-lat/ # 46.9% Fetch Latency # 7.2% Fetch Bandwidth
> 12,823,926 cpu_core/topdown-mem-bound/ # 10.4% Memory Bound # 20.8% Core Bound
>
> 1.004093622 seconds time elapsed
>
> $ perf stat -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 24,064.65 msec cpu-clock # 23.973 CPUs utilized
> 384 context-switches # 15.957 /sec
> 24 cpu-migrations # 0.997 /sec
> 71 page-faults # 2.950 /sec
> 19,737,646 cpu_core/cycles/ # 820.192 K/sec
> 122,018,505 cpu_atom/cycles/ # 5.070 M/sec (63.32%)
> 7,636,653 cpu_core/instructions/ # 317.339 K/sec
> 16,266,629 cpu_atom/instructions/ # 675.955 K/sec (72.50%)
> 1,552,995 cpu_core/branches/ # 64.534 K/sec
> 3,208,143 cpu_atom/branches/ # 133.314 K/sec (72.50%)
> 132,151 cpu_core/branch-misses/ # 5.491 K/sec
> 547,285 cpu_atom/branch-misses/ # 22.742 K/sec (72.49%)
> 32,110,597 cpu_atom/TOPDOWN_RETIRING.ALL/ # 1.334 M/sec
> # 18.4 % tma_bad_speculation (72.48%)
> 228,006,765 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 9.475 M/sec
> # 38.1 % tma_frontend_bound (72.47%)
> 225,866,251 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 9.386 M/sec
> # 37.7 % tma_backend_bound
> # 37.7 % tma_backend_bound_aux (72.73%)
> 119,748,254 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 4.976 M/sec
> # 5.2 % tma_retiring (73.14%)
> 31,363,579 cpu_atom/TOPDOWN_RETIRING.ALL/ # 1.303 M/sec (73.37%)
> 227,907,321 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 9.471 M/sec (63.95%)
> 228,803,268 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 9.508 M/sec (63.55%)
> 113,357,334 cpu_core/TOPDOWN.SLOTS/ # 30.5 % tma_backend_bound
> # 9.2 % tma_retiring
> # 8.7 % tma_bad_speculation
> # 51.6 % tma_frontend_bound
> 10,451,044 cpu_core/topdown-retiring/
> 9,687,449 cpu_core/topdown-bad-spec/
> 58,703,214 cpu_core/topdown-fe-bound/
> 34,540,660 cpu_core/topdown-be-bound/
> 154,902 cpu_core/INT_MISC.UOP_DROPPING/ # 6.437 K/sec
>
> 1.003818397 seconds time elapsed
> '''

Thanks for the fixes. That should work for the -M or --topdown options.
But I don't think the above output is better than the 6.3 output for the
*default* of perf stat.

- The multiplexing on the atom cores messes up the other events.
- The "M/sec" seems useless for the Topdown events.
- The tma_* names are not generic.
"Retiring" is much better than "tma_retiring" as a generic annotation.
It should work for both x86 and Arm.

As the default, it's better to provide clean and generic output for
the end users.

If the users want to know more details, they can use -M or --topdown
options. The events/formats are expected to be different among ARCHs.

Also, there seems to be a bug with all the atom Topdown events: they are
displayed twice.

Thanks,
Kan
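
For context on the multiplexing point: the percentages in parentheses in
the output above, such as (63.32%), indicate the share of the enabled time
during which an event was actually counting, and the displayed count is
scaled up to compensate. A minimal sketch of that arithmetic follows; the
function names are hypothetical and this is not perf's own code:

```c
#include <stdint.h>

/* Share of the enabled time the event was actually on a counter, in %. */
static double sketch_running_pct(uint64_t enabled, uint64_t running)
{
	return enabled ? 100.0 * (double)running / (double)enabled : 0.0;
}

/*
 * Estimate the count over the whole enabled window from the raw count
 * observed while running. Returns 0 if the event never ran.
 */
static uint64_t sketch_scale(uint64_t raw, uint64_t enabled, uint64_t running)
{
	if (running == 0)
		return 0;
	/* Round to nearest by adding 0.5 before truncating. */
	return (uint64_t)((double)raw * (double)enabled / (double)running + 0.5);
}
```

An event that ran for half of its enabled time with a raw count of 100 is
reported as roughly 200, which is why heavily multiplexed events become
estimates rather than exact counts.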

2023-04-26 21:11:02

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH v1 00/40] Fix perf on Intel hybrid CPUs

Em Wed, Apr 26, 2023 at 12:00:10AM -0700, Ian Rogers escreveu:
> TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
> or individually, event parsing doesn't always scan all PMUs, more and
> new tests that also run without hybrid, less code.
>
> The first patches were previously posted to improve metrics here:
> "perf stat: Introduce skippable evsels"
> https://lore.kernel.org/all/[email protected]/
> "perf vendor events intel: Add xxx metric constraints"
> https://lore.kernel.org/all/[email protected]/
>
> Next are some general test improvements.

Kan,

Have you looked at this? I'm doing a test build on it now.

- Arnaldo

> Next event parsing is rewritten to not scan all PMUs for the benefit
> of raw and legacy cache parsing, instead these are handled by the
> lexer and a new term type. This ultimately removes the need for the
> event parser for hybrid to be recursive as legacy cache can be just a
> term. Tests are re-enabled for events with hyphens, so AMD's
> branch-brs event is now parsable.
>
> The cputype option is made a generic pmu filter flag and is tested
> even on non-hybrid systems.
>
> The final patches address specific json metric issues on hybrid, in
> both the json metrics and the metric code. They also bring in a new
> json option to not group events when matching a metricgroup, this
> helps reduce counter pressure for TopdownL1 and TopdownL2 metric
> groups. The updates to the script that updates the json are posted in:
> https://github.com/intel/perfmon/pull/73
>
> The patches add slightly more code than they remove, in areas like
> better json metric constraints and tests, but in the core util code,
> the removal of hybrid is a net reduction:
> 20 files changed, 631 insertions(+), 951 deletions(-)
>
> There's specific detail with each patch, but for now here is the 6.3
> output followed by that from perf-tools-next with the patch series
> applied. The tool is running on an Alderlake CPU on an elderly 5.15
> kernel:
>
> Events on hybrid that parse and pass tests:
> '''
> $ perf-6.3 version
> perf version 6.3.rc7.gb7bc77e2f2c7
> $ perf-6.3 test
> ...
> 6.1: Test event parsing : FAILED!
> ...
> $ perf test
> ...
> 6: Parse event definition strings :
> 6.1: Test event parsing : Ok
> 6.2: Parsing of all PMU events from sysfs : Ok
> 6.3: Parsing of given PMU events from sysfs : Ok
> 6.4: Parsing of aliased events from sysfs : Skip (no aliases in sysfs)
> 6.5: Parsing of aliased events : Ok
> 6.6: Parsing of terms (event modifiers) : Ok
> ...
> '''
>
> No event/metric running with json metrics and TopdownL1 on both PMUs:
> '''
> $ perf-6.3 stat -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 24,073.58 msec cpu-clock # 23.975 CPUs utilized
> 350 context-switches # 14.539 /sec
> 25 cpu-migrations # 1.038 /sec
> 66 page-faults # 2.742 /sec
> 21,257,199 cpu_core/cycles/ # 883.009 K/sec
> 2,162,192 cpu_atom/cycles/ # 89.816 K/sec
> 6,679,379 cpu_core/instructions/ # 277.457 K/sec
> 753,197 cpu_atom/instructions/ # 31.287 K/sec
> 1,300,647 cpu_core/branches/ # 54.028 K/sec
> 148,652 cpu_atom/branches/ # 6.175 K/sec
> 117,429 cpu_core/branch-misses/ # 4.878 K/sec
> 14,396 cpu_atom/branch-misses/ # 598.000 /sec
> 123,097,644 cpu_core/slots/ # 5.113 M/sec
> 9,241,207 cpu_core/topdown-retiring/ # 7.5% Retiring
> 8,903,288 cpu_core/topdown-bad-spec/ # 7.2% Bad Speculation
> 66,590,029 cpu_core/topdown-fe-bound/ # 54.1% Frontend Bound
> 38,397,500 cpu_core/topdown-be-bound/ # 31.2% Backend Bound
> 3,294,283 cpu_core/topdown-heavy-ops/ # 2.7% Heavy Operations # 4.8% Light Operations
> 8,855,769 cpu_core/topdown-br-mispredict/ # 7.2% Branch Mispredict # 0.0% Machine Clears
> 57,695,714 cpu_core/topdown-fetch-lat/ # 46.9% Fetch Latency # 7.2% Fetch Bandwidth
> 12,823,926 cpu_core/topdown-mem-bound/ # 10.4% Memory Bound # 20.8% Core Bound
>
> 1.004093622 seconds time elapsed
>
> $ perf stat -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 24,064.65 msec cpu-clock # 23.973 CPUs utilized
> 384 context-switches # 15.957 /sec
> 24 cpu-migrations # 0.997 /sec
> 71 page-faults # 2.950 /sec
> 19,737,646 cpu_core/cycles/ # 820.192 K/sec
> 122,018,505 cpu_atom/cycles/ # 5.070 M/sec (63.32%)
> 7,636,653 cpu_core/instructions/ # 317.339 K/sec
> 16,266,629 cpu_atom/instructions/ # 675.955 K/sec (72.50%)
> 1,552,995 cpu_core/branches/ # 64.534 K/sec
> 3,208,143 cpu_atom/branches/ # 133.314 K/sec (72.50%)
> 132,151 cpu_core/branch-misses/ # 5.491 K/sec
> 547,285 cpu_atom/branch-misses/ # 22.742 K/sec (72.49%)
> 32,110,597 cpu_atom/TOPDOWN_RETIRING.ALL/ # 1.334 M/sec
> # 18.4 % tma_bad_speculation (72.48%)
> 228,006,765 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 9.475 M/sec
> # 38.1 % tma_frontend_bound (72.47%)
> 225,866,251 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 9.386 M/sec
> # 37.7 % tma_backend_bound
> # 37.7 % tma_backend_bound_aux (72.73%)
> 119,748,254 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 4.976 M/sec
> # 5.2 % tma_retiring (73.14%)
> 31,363,579 cpu_atom/TOPDOWN_RETIRING.ALL/ # 1.303 M/sec (73.37%)
> 227,907,321 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 9.471 M/sec (63.95%)
> 228,803,268 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 9.508 M/sec (63.55%)
> 113,357,334 cpu_core/TOPDOWN.SLOTS/ # 30.5 % tma_backend_bound
> # 9.2 % tma_retiring
> # 8.7 % tma_bad_speculation
> # 51.6 % tma_frontend_bound
> 10,451,044 cpu_core/topdown-retiring/
> 9,687,449 cpu_core/topdown-bad-spec/
> 58,703,214 cpu_core/topdown-fe-bound/
> 34,540,660 cpu_core/topdown-be-bound/
> 154,902 cpu_core/INT_MISC.UOP_DROPPING/ # 6.437 K/sec
>
> 1.003818397 seconds time elapsed
> '''
>
> Json metrics that don't crash:
> '''
> $ perf-6.3 stat -M TopdownL1 -a sleep 1
> WARNING: events in group from different hybrid PMUs!
> WARNING: grouped events cpus do not match, disabling group:
> anon group { topdown-retiring, topdown-retiring, INT_MISC.UOP_DROPPING, topdown-fe-bound, topdown-fe-bound, CPU_CLK_UNHALTED.CORE, topdown-be-bound, topdown-be-bound, topdown-bad-spec, topdown-bad-spec }
> Error:
> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-retiring).
> /bin/dmesg | grep -i perf may provide additional information.
>
> $ perf stat -M TopdownL1 -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 811,810 cpu_atom/TOPDOWN_RETIRING.ALL/ # 26.6 % tma_bad_speculation
> 3,239,281 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 38.8 % tma_frontend_bound
> 2,037,667 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 24.4 % tma_backend_bound
> # 24.4 % tma_backend_bound_aux
> 1,670,438 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 9.7 % tma_retiring
> 808,138 cpu_atom/TOPDOWN_RETIRING.ALL/
> 3,234,707 cpu_atom/TOPDOWN_FE_BOUND.ALL/
> 2,081,420 cpu_atom/TOPDOWN_BE_BOUND.ALL/
> 122,795,280 cpu_core/TOPDOWN.SLOTS/ # 31.7 % tma_backend_bound
> # 7.0 % tma_bad_speculation
> # 54.1 % tma_frontend_bound
> # 7.2 % tma_retiring
> 8,817,636 cpu_core/topdown-retiring/
> 8,480,817 cpu_core/topdown-bad-spec/
> 3,108,926 cpu_core/topdown-heavy-ops/
> 66,566,215 cpu_core/topdown-fe-bound/
> 38,958,811 cpu_core/topdown-be-bound/
> 134,194 cpu_core/INT_MISC.UOP_DROPPING/
>
> 1.003607796 seconds time elapsed
>
> $ perf stat -M TopdownL2 -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 162,334,218 cpu_atom/TOPDOWN_FE_BOUND.FRONTEND_LATENCY/ # 27.7 % tma_fetch_latency (38.99%)
> 16,191,486 cpu_atom/INST_RETIRED.ANY/ (45.76%)
> 68,443,205 cpu_atom/TOPDOWN_BE_BOUND.MEM_SCHEDULER/ # 32.2 % tma_memory_bound
> # 5.8 % tma_core_bound (45.77%)
> 14,920,109 cpu_atom/UOPS_RETIRED.MS/ # 2.9 % tma_base (45.92%)
> 14,829,879 cpu_atom/UOPS_RETIRED.MS/ # 2.5 % tma_ms_uops (46.31%)
> 31,860,520 cpu_atom/TOPDOWN_RETIRING.ALL/ (46.71%)
> 117,323,055 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 18.7 % tma_branch_mispredicts
> # 11.5 % tma_fetch_bandwidth
> # 0.3 % tma_machine_clears
> # 37.9 % tma_resource_bound (53.49%)
> 222,579,768 cpu_atom/TOPDOWN_BE_BOUND.ALL/ (53.90%)
> 13,672,174 cpu_atom/MEM_SCHEDULER_BLOCK.ST_BUF/ (54.23%)
> 24,264,262 cpu_atom/LD_HEAD.ANY_AT_RET/ (47.46%)
> 13,872,813 cpu_atom/MEM_SCHEDULER_BLOCK.ALL/ (47.45%)
> 223,722,007 cpu_atom/TOPDOWN_BE_BOUND.ALL/ (47.31%)
> 2,005,972 cpu_atom/TOPDOWN_BAD_SPECULATION.MACHINE_CLEARS/ (46.91%)
> 109,423,013 cpu_atom/TOPDOWN_BAD_SPECULATION.MISPREDICT/ (39.72%)
> 67,420,790 cpu_atom/TOPDOWN_FE_BOUND.FRONTEND_BANDWIDTH/ (39.33%)
> 92,790,312 cpu_core/TOPDOWN.SLOTS/ # 24.3 % tma_core_bound
> # 3.0 % tma_heavy_operations
> # 5.6 % tma_light_operations
> # 10.8 % tma_memory_bound
> # 7.8 % tma_branch_mispredicts
> # 40.4 % tma_fetch_latency
> # 0.2 % tma_machine_clears
> # 7.8 % tma_fetch_bandwidth
> 8,041,595 cpu_core/topdown-retiring/
> 10,060,500 cpu_core/topdown-mem-bound/
> 7,314,344 cpu_core/topdown-bad-spec/
> 2,824,600 cpu_core/topdown-heavy-ops/
> 37,630,164 cpu_core/topdown-fetch-lat/
> 7,278,843 cpu_core/topdown-br-mispredict/
> 44,863,148 cpu_core/topdown-fe-bound/
> 32,573,458 cpu_core/topdown-be-bound/
> 5,785,074 cpu_core/INST_RETIRED.ANY/
> 2,325,424 cpu_core/UOPS_RETIRED.MS/
> 15,972,774 cpu_core/CPU_CLK_UNHALTED.THREAD/
> 117,750 cpu_core/INT_MISC.UOP_DROPPING/
>
> 1.003519749 seconds time elapsed
> '''
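
For readers who want to post-process output like the block above, here is a minimal sketch of a line parser. It is not part of perf: the regular expression and the helper name parse_stat_line are illustrative assumptions. It only handles lines of the form "count event [# value % metric] [(running%)]"; continuation lines that start with '#' and lines with a unit column (for example "msec cpu-clock" or "ns duration_time") are deliberately left unparsed.

```python
import re

# Assumed layout: "<count> <event> [# <value> % <metric>] [(<running>%)]"
LINE_RE = re.compile(
    r"^\s*(?P<count>[\d,]+)\s+(?P<event>\S+)"
    r"(?:\s+#\s+(?P<value>-?[\d.]+)\s*%\s+(?P<metric>\S+))?"
    r"(?:\s+\((?P<running>[\d.]+)%\))?\s*$"
)


def parse_stat_line(line):
    """Return a dict for one event line, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    event = m.group("event")
    # The PMU name is the text before the first '/', e.g. cpu_atom or cpu_core.
    pmu = event.split("/", 1)[0] if "/" in event else None
    return {
        "count": int(m.group("count").replace(",", "")),
        "event": event,
        "pmu": pmu,
        "metric": m.group("metric"),
        "value": float(m.group("value")) if m.group("value") else None,
        "running": float(m.group("running")) if m.group("running") else None,
    }


if __name__ == "__main__":
    sample = ("162,334,218 cpu_atom/TOPDOWN_FE_BOUND.FRONTEND_LATENCY/ "
              "# 27.7 % tma_fetch_latency (38.99%)")
    print(parse_stat_line(sample))
```

Grouping the parsed rows by the "pmu" key gives a per-PMU view of a run that covers both cpu_atom and cpu_core.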
>
> Note, flags are added below to reduce the size of the output by
> removing event groups and threshold printing support:
> '''
> $ perf stat --metric-no-threshold --metric-no-group -M TopdownL3 -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 3,506,641 cpu_atom/TOPDOWN_BE_BOUND.ALLOC_RESTRICTIONS/ # 0.6 % tma_alloc_restriction (17.14%)
> 133,962,390 cpu_atom/TOPDOWN_BE_BOUND.SERIALIZATION/ # 22.2 % tma_serialization (17.48%)
> 11,201,207 cpu_atom/TOPDOWN_FE_BOUND.ITLB/ # 1.9 % tma_itlb_misses (17.88%)
> 63,876,838 cpu_atom/TOPDOWN_BE_BOUND.MEM_SCHEDULER/ # 10.6 % tma_mem_scheduler
> # 10.5 % tma_store_bound
> # 2.4 % tma_other_load_store (18.28%)
> 14,386,940 cpu_atom/UOPS_RETIRED.MS/ (18.68%)
> 14,432,493 cpu_atom/UOPS_RETIRED.MS/ # 2.7 % tma_other_ret (19.09%)
> 81,582,687 cpu_atom/TOPDOWN_FE_BOUND.ICACHE/ # 13.5 % tma_icache_misses (19.14%)
> 30,467,546 cpu_atom/TOPDOWN_RETIRING.ALL/ (19.14%)
> 16,788,753 cpu_atom/MEM_BOUND_STALLS.LOAD/ # 4.2 % tma_dram_bound
> # 3.7 % tma_l2_bound
> # 6.7 % tma_l3_bound (19.14%)
> 14,514,040 cpu_atom/TOPDOWN_FE_BOUND.DECODE/ # 2.4 % tma_decode (19.14%)
> 688,307 cpu_atom/TOPDOWN_BAD_SPECULATION.NUKE/ # 0.1 % tma_nuke (19.13%)
> 0 cpu_atom/UOPS_RETIRED.FPDIV/ (19.12%)
> 4,408,466 cpu_atom/MEM_BOUND_STALLS.LOAD_L2_HIT/ (19.12%)
> 120,556,998 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 9.3 % tma_branch_detect
> # 1.0 % tma_branch_resteer
> # 5.8 % tma_cisc
> # 0.3 % tma_fast_nuke
> # 0.0 % tma_fpdiv_uops
> # 4.3 % tma_l1_bound
> # 3.2 % tma_non_mem_scheduler
> # 1.9 % tma_other_fb
> # 1.1 % tma_predecode
> # 0.1 % tma_register
> # 0.1 % tma_reorder_buffer (22.30%)
> 34,773,106 cpu_atom/TOPDOWN_FE_BOUND.CISC/ (22.30%)
> 591,112 cpu_atom/TOPDOWN_BE_BOUND.REGISTER/ (22.30%)
> 11,286,706 cpu_atom/TOPDOWN_FE_BOUND.OTHER/ (22.30%)
> 5,082,636 cpu_atom/MEM_BOUND_STALLS.LOAD_DRAM_HIT/ (22.30%)
> 14,146,185 cpu_atom/MEM_SCHEDULER_BLOCK.ST_BUF/ (22.31%)
> 55,833,686 cpu_atom/TOPDOWN_FE_BOUND.BRANCH_DETECT/ (22.30%)
> 25,714,051 cpu_atom/LD_HEAD.ANY_AT_RET/ (19.12%)
> 456,549 cpu_atom/TOPDOWN_BE_BOUND.REORDER_BUFFER/ (19.12%)
> 1,616,862 cpu_atom/TOPDOWN_BAD_SPECULATION.FASTNUKE/ (19.12%)
> 6,680,782 cpu_atom/TOPDOWN_FE_BOUND.PREDECODE/ (19.12%)
> 14,229,195 cpu_atom/MEM_SCHEDULER_BLOCK.ALL/ (19.12%)
> 8,128,921 cpu_atom/MEM_BOUND_STALLS.LOAD_LLC_HIT/ (19.12%)
> 20,941,725 cpu_atom/LD_HEAD.L1_MISS_AT_RET/ (19.11%)
> 6,177,125 cpu_atom/TOPDOWN_FE_BOUND.BRANCH_RESTEER/ (18.78%)
> 228,066,346 cpu_atom/TOPDOWN_BE_BOUND.ALL/ (18.38%)
> 5,204,897 cpu_atom/LD_HEAD.L1_BOUND_AT_RET/ (17.99%)
> 19,060,104 cpu_atom/TOPDOWN_BE_BOUND.NON_MEM_SCHEDULER/ (17.58%)
> 0 cpu_atom/UOPS_RETIRED.FPDIV/ (17.19%)
> 864,565,692 cpu_core/TOPDOWN.SLOTS/ # 4.7 % tma_microcode_sequencer
> # 0.4 % tma_few_uops_instructions
> # 0.3 % tma_fused_instructions
> # 1.8 % tma_memory_operations
> # 0.1 % tma_nop_instructions
> # 8.9 % tma_ms_switches
> # 0.4 % tma_non_fused_branches
> # 0.0 % tma_fp_arith
> # 0.0 % tma_int_operations
> # 35.7 % tma_ports_utilization
> # 3.8 % tma_other_light_ops (18.03%)
> 100,519,954 cpu_core/topdown-retiring/ (18.03%)
> 68,964,454 cpu_core/topdown-bad-spec/ (18.03%)
> 44,732,021 cpu_core/topdown-heavy-ops/ (18.03%)
> 435,618,316 cpu_core/topdown-fe-bound/ (18.03%)
> 262,842,804 cpu_core/topdown-be-bound/ (18.03%)
> 10,368,608 cpu_core/BR_INST_RETIRED.ALL_BRANCHES/ (18.43%)
> 55,947,727 cpu_core/RESOURCE_STALLS.SCOREBOARD/ (18.84%)
> 125,718,255 cpu_core/UOPS_ISSUED.ANY/ (19.24%)
> 23,178,652 cpu_core/EXE_ACTIVITY.1_PORTS_UTIL/ (19.65%)
> 0 cpu_core/INT_VEC_RETIRED.ADD_256/ (20.05%)
> 1,119,514 cpu_core/DSB2MITE_SWITCHES.PENALTY_CYCLES/ # 0.5 % tma_dsb_switches (20.46%)
> 27,684,795 cpu_core/MEMORY_ACTIVITY.STALLS_L1D_MISS/ # 10.6 % tma_l1_bound
> # 0.7 % tma_l2_bound (20.86%)
> 108,813,079 cpu_core/UOPS_EXECUTED.THREAD/ (21.27%)
> 16,563,036 cpu_core/IDQ.MITE_CYCLES_ANY/ # 5.2 % tma_mite (19.14%)
> 53,037,471 cpu_core/EXE_ACTIVITY.BOUND_ON_LOADS/ (19.14%)
> 41,005,510 cpu_core/UOPS_RETIRED.MS/ (19.14%)
> 575,534 cpu_core/ARITH.DIV_ACTIVE/ # 0.2 % tma_divider (19.14%)
> 0 cpu_core/FP_ARITH_INST_RETIRED.SCALAR_SINGLE,umask=0x03/ (19.14%)
> 2,207,021 cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/ # 0.9 % tma_store_bound (19.13%)
> 5,685,032 cpu_core/UOPS_RETIRED.MS,cmask=1,edge/ (19.13%)
> 25,523 cpu_core/DECODE.LCP/ # 0.0 % tma_lcp (19.12%)
> 26,095,298 cpu_core/MEMORY_ACTIVITY.STALLS_L2_MISS/ # 10.8 % tma_l3_bound (19.13%)
> 108,516 cpu_core/MEMORY_ACTIVITY.STALLS_L3_MISS/ # 0.0 % tma_dram_bound (19.13%)
> 192,239,590 cpu_core/CYCLE_ACTIVITY.STALLS_TOTAL/ (19.12%)
> 5,978 cpu_core/LSD.CYCLES_ACTIVE/ # -0.0 % tma_lsd (19.12%)
> 0 cpu_core/INT_VEC_RETIRED.VNNI_128/ (19.13%)
> 137,530,949 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/ # 0.1 % tma_dsb (19.12%)
> 240,070,549 cpu_core/CPU_CLK_UNHALTED.THREAD/ # 17.5 % tma_icache_misses
> # 6.1 % tma_itlb_misses
> # 40.3 % tma_branch_resteers (21.52%)
> 0 cpu_core/FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE,umask=0x3c/ (21.51%)
> 595,051 cpu_core/ARITH.DIV_ACTIVE/ (21.52%)
> 461,041 cpu_core/IDQ.DSB_CYCLES_ANY/ (21.51%)
> 0 cpu_core/INT_VEC_RETIRED.MUL_256/ (21.52%)
> 0 cpu_core/UOPS_EXECUTED.X87/ (21.52%)
> 237,196 cpu_core/IDQ.DSB_CYCLES_OK/ (21.52%)
> 125,009 cpu_core/LSD.CYCLES_OK/ (21.52%)
> 0 cpu_core/INT_VEC_RETIRED.ADD_128/ (21.40%)
> 28,388,778 cpu_core/MEM_UOP_RETIRED.ANY/ (18.61%)
> 1,806,629 cpu_core/INST_RETIRED.NOP/ (18.21%)
> 41,928,018 cpu_core/ICACHE_DATA.STALLS/ (17.81%)
> 0 cpu_core/INT_VEC_RETIRED.VNNI_256/ (17.41%)
> 18,230,137 cpu_core/EXE_ACTIVITY.2_PORTS_UTIL,umask=0xc/ (17.02%)
> 28,052,001 cpu_core/EXE_ACTIVITY.3_PORTS_UTIL,umask=0x80/ (16.61%)
> 4,073,568 cpu_core/INST_RETIRED.MACRO_FUSED/ (16.20%)
> 66,509,871 cpu_core/INT_MISC.UNKNOWN_BRANCH_CYCLES/ (15.92%)
> 2,307,447 cpu_core/IDQ.MITE_CYCLES_OK/ (15.91%)
> 30,345,769 cpu_core/INT_MISC.CLEAR_RESTEER_CYCLES/ (15.91%)
> 0 cpu_core/INT_VEC_RETIRED.SHUFFLES/ (15.91%)
> 14,722,079 cpu_core/ICACHE_TAG.STALLS/ (15.90%)
>
> 1.004474469 seconds time elapsed
>
> $ perf stat --metric-no-threshold --metric-no-group -M TopdownL4 -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 1,004,834,399 ns duration_time # 0.3 % tma_false_sharing
> # 40.2 % tma_l3_hit_latency
> # 4.4 % tma_contested_accesses
> # 1.6 % tma_data_sharing
> 3,762,410 cpu_atom/LD_HEAD.PGWALK_AT_RET/ # 3.1 % tma_stlb_miss (33.58%)
> 10 cpu_atom/MACHINE_CLEARS.SMC/ # 0.0 % tma_smc (33.98%)
> 66,500,689 cpu_atom/TOPDOWN_BE_BOUND.MEM_SCHEDULER/ # 0.0 % tma_ld_buffer
> # 0.0 % tma_rsv
> # 11.0 % tma_st_buffer (29.60%)
> 1,051,312 cpu_atom/LD_HEAD.OTHER_AT_RET/ # 0.9 % tma_other_l1 (30.00%)
> 14,740,093 cpu_atom/UOPS_RETIRED.MS/ (30.39%)
> 117,899 cpu_atom/LD_HEAD.DTLB_MISS_AT_RET/ # 0.1 % tma_stlb_hit (30.79%)
> 701,548 cpu_atom/TOPDOWN_BAD_SPECULATION.NUKE/ # 0.0 % tma_disambiguation
> # 0.0 % tma_fp_assist
> # 0.1 % tma_memory_ordering
> # 0.0 % tma_page_fault (31.08%)
> 12,873 cpu_atom/MACHINE_CLEARS.MEMORY_ORDERING/ (31.07%)
> 58,321 cpu_atom/MEM_SCHEDULER_BLOCK.LD_BUF/ (31.07%)
> 43,458 cpu_atom/MEM_SCHEDULER_BLOCK.RSV/ (31.07%)
> 14,256,005 cpu_atom/MEM_SCHEDULER_BLOCK.ALL/ (31.06%)
> 122,156,534 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 0.0 % tma_store_fwd_blk (36.16%)
> 0 cpu_atom/MACHINE_CLEARS.FP_ASSIST/ (35.76%)
> 13,804 cpu_atom/MACHINE_CLEARS.SLOW/ (35.35%)
> 14,388,300 cpu_atom/MEM_SCHEDULER_BLOCK.ST_BUF/ (34.95%)
> 493,070,443 cpu_atom/CPU_CLK_UNHALTED.REF_TSC/ (39.73%)
> 2 cpu_atom/MACHINE_CLEARS.PAGE_FAULT/ (39.33%)
> 1,101 cpu_atom/LD_HEAD.ST_ADDR_AT_RET/ (38.93%)
> 929 cpu_atom/MACHINE_CLEARS.DISAMBIGUATION/ (38.55%)
> 14,241,213 cpu_atom/MEM_SCHEDULER_BLOCK.ALL/ (33.45%)
> 1,010,981,054 cpu_core/TOPDOWN.SLOTS/ # 0.0 % tma_assists
> # 4.3 % tma_cisc
> # 0.0 % tma_fp_scalar
> # 0.0 % tma_fp_vector
> # 0.0 % tma_shuffles
> # 0.0 % tma_int_vector_128b
> # 0.0 % tma_x87_use
> # 0.0 % tma_int_vector_256b
> # 0.7 % tma_clears_resteers
> # 12.4 % tma_mispredicts_resteers (8.14%)
> 132,375,316 cpu_core/topdown-retiring/ (8.14%)
> 88,303,327 cpu_core/topdown-bad-spec/ (8.14%)
> 85,519,216 cpu_core/topdown-br-mispredict/ (8.14%)
> 495,722,455 cpu_core/topdown-fe-bound/ (8.14%)
> 298,147,134 cpu_core/topdown-be-bound/ (8.14%)
> 21,418,803 cpu_core/UOPS_EXECUTED.CYCLES_GE_3/ # 8.8 % tma_ports_utilized_3m (10.12%)
> 35,208,716 cpu_core/OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD,cmask=4/ # 14.5 % tma_mem_bandwidth
> # 33.3 % tma_mem_latency (10.52%)
> 17,358 cpu_core/OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM/ (10.91%)
> 55,883,811 cpu_core/RESOURCE_STALLS.SCOREBOARD/ # 24.1 % tma_ports_utilized_0 (12.91%)
> 0 cpu_core/INT_VEC_RETIRED.ADD_256/ (14.89%)
> 139,890 cpu_core/DTLB_STORE_MISSES.STLB_HIT,cmask=1/ # 2.8 % tma_dtlb_store (15.30%)
> 216,886 cpu_core/MEM_INST_RETIRED.LOCK_LOADS/ # 3.8 % tma_store_latency
> # 0.1 % tma_lock_latency (15.71%)
> 115,948,790 cpu_core/UOPS_EXECUTED.THREAD/ (17.69%)
> 52,155,508 cpu_core/EXE_ACTIVITY.BOUND_ON_LOADS/ (15.93%)
> 6 cpu_core/ASSISTS.ANY,umask=0x1B/ (15.93%)
> 87,422,517 cpu_core/CYCLE_ACTIVITY.CYCLES_MEM_ANY/ # 5.2 % tma_dtlb_load (15.81%)
> 37,420,652 cpu_core/MEMORY_ACTIVITY.CYCLES_L1D_MISS/ (15.44%)
> 43,527,357 cpu_core/UOPS_RETIRED.MS/ (15.04%)
> 31,787,227 cpu_core/INT_MISC.CLEAR_RESTEER_CYCLES/ (14.64%)
> 0 cpu_core/FP_ARITH_INST_RETIRED.SCALAR_SINGLE,umask=0x03/ (14.24%)
> 4,899,130 cpu_core/XQ.FULL_CYCLES/ # 2.0 % tma_sq_full (13.84%)
> 1,365 cpu_core/OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM/ (13.44%)
> 23,904,338 cpu_core/EXE_ACTIVITY.1_PORTS_UTIL/ # 9.9 % tma_ports_utilized_1 (13.05%)
> 251,479 cpu_core/L2_RQSTS.ALL_RFO/ (12.76%)
> 188,701,010 cpu_core/CYCLE_ACTIVITY.STALLS_TOTAL/ (12.74%)
> 6,909 cpu_core/MEM_INST_RETIRED.SPLIT_STORES/ # 0.0 % tma_split_stores (12.74%)
> 619,775 cpu_core/MEM_LOAD_RETIRED.L1_MISS/ (9.56%)
> 136,716,345 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/ # 0.9 % tma_decoder0_alone (11.15%)
> 0 cpu_core/INT_VEC_RETIRED.VNNI_128/ (12.74%)
> 605,850 cpu_core/L1D_PEND_MISS.FB_FULL/ # 0.2 % tma_fb_full (12.73%)
> 60,079 cpu_core/MEM_STORE_RETIRED.L2_HIT/ (11.14%)
> 242,508,080 cpu_core/CPU_CLK_UNHALTED.THREAD/ # 4.2 % tma_ports_utilized_2
> # 0.2 % tma_store_fwd_blk
> # 0.0 % tma_streaming_stores
> # 27.5 % tma_unknown_branches
> # 0.0 % tma_split_loads (12.74%)
> 0 cpu_core/FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE,umask=0x3c/ (14.33%)
> 32,573 cpu_core/LD_BLOCKS.STORE_FORWARD/ (12.74%)
> 1,130 cpu_core/OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD/ (12.74%)
> 4,029 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS/ (9.56%)
> 4,844,548 cpu_core/INST_DECODED.DECODERS,cmask=1/ (9.56%)
> 5,266 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD/ (6.37%)
> 0 cpu_core/UOPS_EXECUTED.X87/ (7.96%)
> 0 cpu_core/INT_VEC_RETIRED.MUL_256/ (9.56%)
> 2,786,473 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/ (9.56%)
> 961,614,001 cpu_core/CPU_CLK_UNHALTED.REF_TSC/ (11.15%)
> 2,433,107 cpu_core/INST_DECODED.DECODERS,cmask=2/ (11.15%)
> 0 cpu_core/INT_VEC_RETIRED.ADD_128/ (12.74%)
> 9,058,046 cpu_core/OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO/ (12.74%)
> 6,399,992 cpu_core/MEM_INST_RETIRED.ALL_STORES/ (12.74%)
> 45,519,749 cpu_core/L1D_PEND_MISS.PENDING/ (9.56%)
> 12,200,559 cpu_core/DTLB_LOAD_MISSES.WALK_ACTIVE/ (7.97%)
> 115,944,190 cpu_core/OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD/ (6.37%)
> 0 cpu_core/INT_VEC_RETIRED.VNNI_256/ (7.96%)
> 1,885,278 cpu_core/INT_MISC.UOP_DROPPING/ (9.56%)
> 524,819 cpu_core/MEM_LOAD_RETIRED.FB_HIT/ (9.56%)
> 26,866,872 cpu_core/EXE_ACTIVITY.3_PORTS_UTIL,umask=0x80/ (11.15%)
> 10,265,977 cpu_core/EXE_ACTIVITY.2_PORTS_UTIL/ (12.74%)
> 66,662,934 cpu_core/INT_MISC.UNKNOWN_BRANCH_CYCLES/ (12.74%)
> 0 cpu_core/OCR.STREAMING_WR.ANY_RESPONSE/ (12.74%)
> 12,499 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD/ (12.74%)
> 0 cpu_core/INT_VEC_RETIRED.SHUFFLES/ (12.74%)
> 47,649 cpu_core/DTLB_LOAD_MISSES.STLB_HIT,cmask=1/ (12.74%)
> 106,424 cpu_core/L2_RQSTS.RFO_HIT/ (12.74%)
> 0 cpu_core/LD_BLOCKS.NO_SR/ (7.97%)
> 1,343,692 cpu_core/MEM_LOAD_COMPLETED.L1_MISS_ANY/ (7.96%)
> 28,517 cpu_core/L1D_PEND_MISS.L2_STALLS/ (6.37%)
> 394,101 cpu_core/MEM_LOAD_RETIRED.L3_HIT/ (6.36%)
> 76,860,165,929 TSC
>
> 1.004834399 seconds time elapsed
>
> $ perf stat --metric-no-threshold --metric-no-group -M TopdownL5 -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 839,538,302 cpu_core/TOPDOWN.SLOTS/ # 0.0 % tma_avx_assists
> # 0.0 % tma_fp_assists
> # 0.0 % tma_page_faults
> # 0.0 % tma_fp_vector_128b
> # 0.0 % tma_fp_vector_256b (32.40%)
> 100,274,045 cpu_core/topdown-retiring/ (32.40%)
> 77,425,642 cpu_core/topdown-bad-spec/ (32.40%)
> 424,563,652 cpu_core/topdown-fe-bound/ (32.40%)
> 245,420,564 cpu_core/topdown-be-bound/ (32.40%)
> 0 cpu_core/FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE/ (32.79%)
> 54,372,921 cpu_core/RESOURCE_STALLS.SCOREBOARD/ # 22.2 % tma_serializing_operation (33.20%)
> 23,018,585 cpu_core/UOPS_DISPATCHED.PORT_6/ # 8.0 % tma_alu_op_utilization (33.61%)
> 17,748,101 cpu_core/UOPS_DISPATCHED.PORT_2_3_10/ # 4.2 % tma_load_op_utilization (34.02%)
> 0 cpu_core/FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE/ (34.43%)
> 7,616,700 cpu_core/UOPS_DISPATCHED.PORT_0/ (34.83%)
> 96,571 cpu_core/DTLB_STORE_MISSES.STLB_HIT,cmask=1/ # 0.6 % tma_store_stlb_hit (35.25%)
> 84,909,672 cpu_core/CYCLE_ACTIVITY.CYCLES_MEM_ANY/ # 0.2 % tma_load_stlb_hit (35.66%)
> 32,935,744 cpu_core/MEMORY_ACTIVITY.CYCLES_L1D_MISS/ (31.95%)
> 16,597,385 cpu_core/UOPS_DISPATCHED.PORT_5_11/ (31.95%)
> 9,452,844 cpu_core/UOPS_DISPATCHED.PORT_1/ (31.94%)
> 2,620,695 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/ # 1.8 % tma_store_stlb_miss (31.95%)
> 15,699,364 cpu_core/UOPS_DISPATCHED.PORT_7_8/ # 5.7 % tma_store_op_utilization (31.95%)
> 0 cpu_core/FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE/ (31.94%)
> 142,096,670 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/ (31.95%)
> 244,591,239 cpu_core/CPU_CLK_UNHALTED.THREAD/ # 5.2 % tma_load_stlb_miss
> # 0.0 % tma_mixing_vectors (35.92%)
> 2,728,385 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/ (35.66%)
> 0 cpu_core/ASSISTS.SSE_AVX_MIX/ (35.27%)
> 0 cpu_core/FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE/ (34.86%)
> 12,664,768 cpu_core/DTLB_LOAD_MISSES.WALK_ACTIVE/ (34.46%)
> 12,629,733 cpu_core/DTLB_LOAD_MISSES.WALK_ACTIVE/ (34.04%)
> 0 cpu_core/ASSISTS.FP/ (33.63%)
> 12 cpu_core/ASSISTS.PAGE_FAULT/ (33.23%)
> 16,704,699 cpu_core/UOPS_DISPATCHED.PORT_4_9/ (32.81%)
> 48,386 cpu_core/DTLB_LOAD_MISSES.STLB_HIT,cmask=1/ (28.68%)
>
> 1.002806967 seconds time elapsed
>
> $ perf stat --metric-no-threshold --metric-no-group -M TopdownL6 -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 743,684 cpu_core/UOPS_DISPATCHED.PORT_0/ # 4.6 % tma_port_0
> 1,514 cpu_core/MISC2_RETIRED.LFENCE/ # 0.1 % tma_memory_fence
> 22,120 cpu_core/CPU_CLK_UNHALTED.PAUSE/ # 0.1 % tma_slow_pause
> 16,187,637 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/ # 4.5 % tma_port_1
> # 12.6 % tma_port_6
> 16,754,672 cpu_core/CPU_CLK_UNHALTED.THREAD/
> 728,805 cpu_core/UOPS_DISPATCHED.PORT_1/
> 2,040,181 cpu_core/UOPS_DISPATCHED.PORT_6/
>
> 1.002727371 seconds time elapsed
> '''
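
The trailing percentages in parentheses above, such as (17.14%), are the share of the measurement window during which an event was actually scheduled on a counter. When events are multiplexed, the reported count is an extrapolation from the time the event was running. The sketch below shows the usual scaling arithmetic; the function names are hypothetical, and the exact rounding perf applies is not reproduced here.

```python
def scale_count(raw, time_enabled, time_running):
    """Extrapolate a multiplexed count to the full enabled window.

    raw          -- count observed while the event was on a counter
    time_enabled -- time the event was enabled
    time_running -- time the event was actually counting
    """
    if time_running == 0:
        # Nothing was measured, so there is nothing to extrapolate from.
        return None
    return round(raw * time_enabled / time_running)


def running_percent(time_enabled, time_running):
    """Share of the enabled window the event spent on a counter, in percent."""
    return 100.0 * time_running / time_enabled


if __name__ == "__main__":
    # An event that ran for a quarter of the window is scaled up by 4x.
    print(scale_count(100, 1000, 250), running_percent(1000, 250))
```

The lower the running percentage, the larger the extrapolation, so metrics built from events with low percentages should be read with more caution.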
>
> Using --cputype:
> '''
> $ perf stat --cputype=core -M TopdownL1 -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 90,542,172 cpu_core/TOPDOWN.SLOTS/ # 31.3 % tma_backend_bound
> # 7.0 % tma_bad_speculation
> # 54.0 % tma_frontend_bound
> # 7.6 % tma_retiring
> 6,917,885 cpu_core/topdown-retiring/
> 6,242,227 cpu_core/topdown-bad-spec/
> 2,353,956 cpu_core/topdown-heavy-ops/
> 49,034,945 cpu_core/topdown-fe-bound/
> 28,390,484 cpu_core/topdown-be-bound/
> 98,299 cpu_core/INT_MISC.UOP_DROPPING/
>
> 1.002395582 seconds time elapsed
>
> $ perf stat --cputype=atom -M TopdownL1 -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 645,836 cpu_atom/TOPDOWN_RETIRING.ALL/ # 26.4 % tma_bad_speculation
> 2,404,468 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 38.9 % tma_frontend_bound
> 1,455,604 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 23.6 % tma_backend_bound
> # 23.6 % tma_backend_bound_aux
> 1,235,109 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 10.4 % tma_retiring
> 642,124 cpu_atom/TOPDOWN_RETIRING.ALL/
> 2,398,892 cpu_atom/TOPDOWN_FE_BOUND.ALL/
> 1,503,157 cpu_atom/TOPDOWN_BE_BOUND.ALL/
>
> 1.002061651 seconds time elapsed
> '''
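
As a sanity check on the --cputype=core numbers above, the four level 1 percentages can be recomputed from the raw counts. The formulas below are an assumption inferred from the printed values, not quoted from adl-metrics.json, so treat this as a sketch; with these inputs the results round to the 31.3 %, 7.0 %, 54.0 % and 7.6 % shown in the output.

```python
# Counts copied from the `perf stat --cputype=core -M TopdownL1` run above.
slots = 90_542_172        # cpu_core/TOPDOWN.SLOTS/
retiring = 6_917_885      # cpu_core/topdown-retiring/
bad_spec = 6_242_227      # cpu_core/topdown-bad-spec/
fe_bound = 49_034_945     # cpu_core/topdown-fe-bound/
be_bound = 28_390_484     # cpu_core/topdown-be-bound/
uop_dropping = 98_299     # cpu_core/INT_MISC.UOP_DROPPING/

# Assumed formulas: each level 1 bucket is its share of the four topdown
# counters, with frontend bound corrected by dropped uops per slot and
# bad speculation taken as the remainder.
total = retiring + bad_spec + fe_bound + be_bound
tma_frontend_bound = fe_bound / total - uop_dropping / slots
tma_backend_bound = be_bound / total
tma_retiring = retiring / total
tma_bad_speculation = max(
    1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0
)

for name, value in [
    ("tma_backend_bound", tma_backend_bound),
    ("tma_bad_speculation", tma_bad_speculation),
    ("tma_frontend_bound", tma_frontend_bound),
    ("tma_retiring", tma_retiring),
]:
    print(f"{value * 100:5.1f} % {name}")
```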
>
> Ian Rogers (40):
> perf stat: Introduce skippable evsels
> perf vendor events intel: Add alderlake metric constraints
> perf vendor events intel: Add icelake metric constraints
> perf vendor events intel: Add icelakex metric constraints
> perf vendor events intel: Add sapphirerapids metric constraints
> perf vendor events intel: Add tigerlake metric constraints
> perf stat: Avoid segv on counter->name
> perf test: Test more sysfs events
> perf test: Use valid for PMU tests
> perf test: Mask config then test
> perf test: Test more with config_cache
> perf test: Roundtrip name, don't assume 1 event per name
> perf parse-events: Set attr.type to PMU type early
> perf print-events: Avoid unnecessary strlist
> perf parse-events: Avoid scanning PMUs before parsing
> perf test: Validate events with hyphens in
> perf evsel: Modify group pmu name for software events
> perf test: Move x86 hybrid tests to arch/x86
> perf test x86 hybrid: Don't assume evlist order
> perf parse-events: Support PMUs for legacy cache events
> perf parse-events: Wildcard legacy cache events
> perf print-events: Print legacy cache events for each PMU
> perf parse-events: Support wildcards on raw events
> perf parse-events: Remove now unused hybrid logic
> perf parse-events: Minor type safety cleanup
> perf parse-events: Add pmu filter
> perf stat: Make cputype filter generic
> perf test: Add cputype testing to perf stat
> perf test: Fix parse-events tests for >1 core PMU
> perf parse-events: Support hardware events as terms
> perf parse-events: Avoid error when assigning a term
> perf parse-events: Avoid error when assigning a legacy cache term
> perf parse-events: Don't auto merge hybrid wildcard events
> perf parse-events: Don't reorder atom cpu events
> perf metrics: Be PMU specific for referenced metrics.
> perf metric: Json flag to not group events if gathering a metric group
> perf stat: Command line PMU metric filtering
> perf vendor events intel: Correct alderlake metrics
> perf jevents: Don't rewrite metrics across PMUs
> perf metrics: Be PMU specific in event match
>
> tools/perf/arch/x86/include/arch-tests.h | 1 +
> tools/perf/arch/x86/tests/Build | 1 +
> tools/perf/arch/x86/tests/arch-tests.c | 10 +
> tools/perf/arch/x86/tests/hybrid.c | 275 ++++++
> tools/perf/arch/x86/util/evlist.c | 4 +-
> tools/perf/builtin-list.c | 19 +-
> tools/perf/builtin-record.c | 13 +-
> tools/perf/builtin-stat.c | 73 +-
> tools/perf/builtin-top.c | 5 +-
> tools/perf/builtin-trace.c | 5 +-
> .../arch/x86/alderlake/adl-metrics.json | 275 +++---
> .../arch/x86/alderlaken/adln-metrics.json | 20 +-
> .../arch/x86/broadwell/bdw-metrics.json | 12 +
> .../arch/x86/broadwellde/bdwde-metrics.json | 12 +
> .../arch/x86/broadwellx/bdx-metrics.json | 12 +
> .../arch/x86/cascadelakex/clx-metrics.json | 12 +
> .../arch/x86/haswell/hsw-metrics.json | 12 +
> .../arch/x86/haswellx/hsx-metrics.json | 12 +
> .../arch/x86/icelake/icl-metrics.json | 23 +
> .../arch/x86/icelakex/icx-metrics.json | 23 +
> .../arch/x86/ivybridge/ivb-metrics.json | 12 +
> .../arch/x86/ivytown/ivt-metrics.json | 12 +
> .../arch/x86/jaketown/jkt-metrics.json | 12 +
> .../arch/x86/sandybridge/snb-metrics.json | 12 +
> .../arch/x86/sapphirerapids/spr-metrics.json | 23 +
> .../arch/x86/skylake/skl-metrics.json | 12 +
> .../arch/x86/skylakex/skx-metrics.json | 12 +
> .../arch/x86/tigerlake/tgl-metrics.json | 23 +
> tools/perf/pmu-events/jevents.py | 10 +-
> tools/perf/pmu-events/metric.py | 28 +-
> tools/perf/pmu-events/metric_test.py | 6 +-
> tools/perf/pmu-events/pmu-events.h | 2 +
> tools/perf/tests/evsel-roundtrip-name.c | 119 ++-
> tools/perf/tests/parse-events.c | 826 +++++++++---------
> tools/perf/tests/pmu-events.c | 12 +-
> tools/perf/tests/shell/stat.sh | 44 +
> tools/perf/util/Build | 1 -
> tools/perf/util/evlist.h | 1 -
> tools/perf/util/evsel.c | 30 +-
> tools/perf/util/evsel.h | 1 +
> tools/perf/util/metricgroup.c | 111 ++-
> tools/perf/util/metricgroup.h | 3 +-
> tools/perf/util/parse-events-hybrid.c | 214 -----
> tools/perf/util/parse-events-hybrid.h | 25 -
> tools/perf/util/parse-events.c | 646 ++++++--------
> tools/perf/util/parse-events.h | 61 +-
> tools/perf/util/parse-events.l | 108 +--
> tools/perf/util/parse-events.y | 222 ++---
> tools/perf/util/pmu-hybrid.c | 20 -
> tools/perf/util/pmu-hybrid.h | 1 -
> tools/perf/util/pmu.c | 16 +-
> tools/perf/util/pmu.h | 3 +
> tools/perf/util/pmus.c | 25 +-
> tools/perf/util/pmus.h | 3 +
> tools/perf/util/print-events.c | 85 +-
> tools/perf/util/stat-display.c | 6 +-
> 56 files changed, 1939 insertions(+), 1627 deletions(-)
> create mode 100644 tools/perf/arch/x86/tests/hybrid.c
> delete mode 100644 tools/perf/util/parse-events-hybrid.c
> delete mode 100644 tools/perf/util/parse-events-hybrid.h
>
> --
> 2.40.1.495.gc816e09b53d-goog
>

--

- Arnaldo

2023-04-26 21:47:21

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH v1 00/40] Fix perf on Intel hybrid CPUs

On Wed, Apr 26, 2023 at 06:09:36PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Apr 26, 2023 at 12:00:10AM -0700, Ian Rogers wrote:
> > TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
> > or individually, event parsing doesn't always scan all PMUs, more and
> > new tests that also run without hybrid, less code.
> >
> > The first patches were previously posted to improve metrics here:
> > "perf stat: Introduce skippable evsels"
> > https://lore.kernel.org/all/[email protected]/
> > "perf vendor events intel: Add xxx metric constraints"
> > https://lore.kernel.org/all/[email protected]/
> >
> > Next are some general test improvements.
>
> Kan,
>
> Have you looked at this? I'm doing a test build on it now.

And just to make it clear, this is for v6.5.

- Arnaldo
>
> > Next event parsing is rewritten to not scan all PMUs for the benefit
> > of raw and legacy cache parsing, instead these are handled by the
> > lexer and a new term type. This ultimately removes the need for the
> > event parser for hybrid to be recursive as legacy cache can be just a
> > term. Tests are re-enabled for events with hyphens, so AMD's
> > branch-brs event is now parsable.
> >
> > The cputype option is made a generic pmu filter flag and is tested
> > even on non-hybrid systems.
> >
> > The final patches address specific json metric issues on hybrid, in
> > both the json metrics and the metric code. They also bring in a new
> > json option to not group events when matching a metricgroup, this
> > helps reduce counter pressure for TopdownL1 and TopdownL2 metric
> > groups. The updates to the script that updates the json are posted in:
> > https://github.com/intel/perfmon/pull/73
> >
> > The patches add slightly more code than they remove, in areas like
> > better json metric constraints and tests, but in the core util code,
> > the removal of hybrid is a net reduction:
> > 20 files changed, 631 insertions(+), 951 deletions(-)
> >
> > There's specific detail with each patch, but for now here is the 6.3
> > output followed by that from perf-tools-next with the patch series
> > applied. The tool is running on an Alderlake CPU on an elderly 5.15
> > kernel:
> >
> > Events on hybrid that parse and pass tests:
> > '''
> > $ perf-6.3 version
> > perf version 6.3.rc7.gb7bc77e2f2c7
> > $ perf-6.3 test
> > ...
> > 6.1: Test event parsing : FAILED!
> > ...
> > $ perf test
> > ...
> > 6: Parse event definition strings :
> > 6.1: Test event parsing : Ok
> > 6.2: Parsing of all PMU events from sysfs : Ok
> > 6.3: Parsing of given PMU events from sysfs : Ok
> > 6.4: Parsing of aliased events from sysfs : Skip (no aliases in sysfs)
> > 6.5: Parsing of aliased events : Ok
> > 6.6: Parsing of terms (event modifiers) : Ok
> > ...
> > '''
> >
> > No event/metric running with json metrics and TopdownL1 on both PMUs:
> > '''
> > $ perf-6.3 stat -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 24,073.58 msec cpu-clock # 23.975 CPUs utilized
> > 350 context-switches # 14.539 /sec
> > 25 cpu-migrations # 1.038 /sec
> > 66 page-faults # 2.742 /sec
> > 21,257,199 cpu_core/cycles/ # 883.009 K/sec
> > 2,162,192 cpu_atom/cycles/ # 89.816 K/sec
> > 6,679,379 cpu_core/instructions/ # 277.457 K/sec
> > 753,197 cpu_atom/instructions/ # 31.287 K/sec
> > 1,300,647 cpu_core/branches/ # 54.028 K/sec
> > 148,652 cpu_atom/branches/ # 6.175 K/sec
> > 117,429 cpu_core/branch-misses/ # 4.878 K/sec
> > 14,396 cpu_atom/branch-misses/ # 598.000 /sec
> > 123,097,644 cpu_core/slots/ # 5.113 M/sec
> > 9,241,207 cpu_core/topdown-retiring/ # 7.5% Retiring
> > 8,903,288 cpu_core/topdown-bad-spec/ # 7.2% Bad Speculation
> > 66,590,029 cpu_core/topdown-fe-bound/ # 54.1% Frontend Bound
> > 38,397,500 cpu_core/topdown-be-bound/ # 31.2% Backend Bound
> > 3,294,283 cpu_core/topdown-heavy-ops/ # 2.7% Heavy Operations # 4.8% Light Operations
> > 8,855,769 cpu_core/topdown-br-mispredict/ # 7.2% Branch Mispredict # 0.0% Machine Clears
> > 57,695,714 cpu_core/topdown-fetch-lat/ # 46.9% Fetch Latency # 7.2% Fetch Bandwidth
> > 12,823,926 cpu_core/topdown-mem-bound/ # 10.4% Memory Bound # 20.8% Core Bound
> >
> > 1.004093622 seconds time elapsed
> >
> > $ perf stat -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 24,064.65 msec cpu-clock # 23.973 CPUs utilized
> > 384 context-switches # 15.957 /sec
> > 24 cpu-migrations # 0.997 /sec
> > 71 page-faults # 2.950 /sec
> > 19,737,646 cpu_core/cycles/ # 820.192 K/sec
> > 122,018,505 cpu_atom/cycles/ # 5.070 M/sec (63.32%)
> > 7,636,653 cpu_core/instructions/ # 317.339 K/sec
> > 16,266,629 cpu_atom/instructions/ # 675.955 K/sec (72.50%)
> > 1,552,995 cpu_core/branches/ # 64.534 K/sec
> > 3,208,143 cpu_atom/branches/ # 133.314 K/sec (72.50%)
> > 132,151 cpu_core/branch-misses/ # 5.491 K/sec
> > 547,285 cpu_atom/branch-misses/ # 22.742 K/sec (72.49%)
> > 32,110,597 cpu_atom/TOPDOWN_RETIRING.ALL/ # 1.334 M/sec
> > # 18.4 % tma_bad_speculation (72.48%)
> > 228,006,765 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 9.475 M/sec
> > # 38.1 % tma_frontend_bound (72.47%)
> > 225,866,251 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 9.386 M/sec
> > # 37.7 % tma_backend_bound
> > # 37.7 % tma_backend_bound_aux (72.73%)
> > 119,748,254 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 4.976 M/sec
> > # 5.2 % tma_retiring (73.14%)
> > 31,363,579 cpu_atom/TOPDOWN_RETIRING.ALL/ # 1.303 M/sec (73.37%)
> > 227,907,321 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 9.471 M/sec (63.95%)
> > 228,803,268 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 9.508 M/sec (63.55%)
> > 113,357,334 cpu_core/TOPDOWN.SLOTS/ # 30.5 % tma_backend_bound
> > # 9.2 % tma_retiring
> > # 8.7 % tma_bad_speculation
> > # 51.6 % tma_frontend_bound
> > 10,451,044 cpu_core/topdown-retiring/
> > 9,687,449 cpu_core/topdown-bad-spec/
> > 58,703,214 cpu_core/topdown-fe-bound/
> > 34,540,660 cpu_core/topdown-be-bound/
> > 154,902 cpu_core/INT_MISC.UOP_DROPPING/ # 6.437 K/sec
> >
> > 1.003818397 seconds time elapsed
> > '''
> >
> > Json metrics that don't crash:
> > '''
> > $ perf-6.3 stat -M TopdownL1 -a sleep 1
> > WARNING: events in group from different hybrid PMUs!
> > WARNING: grouped events cpus do not match, disabling group:
> > anon group { topdown-retiring, topdown-retiring, INT_MISC.UOP_DROPPING, topdown-fe-bound, topdown-fe-bound, CPU_CLK_UNHALTED.CORE, topdown-be-bound, topdown-be-bound, topdown-bad-spec, topdown-bad-spec }
> > Error:
> > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-retiring).
> > /bin/dmesg | grep -i perf may provide additional information.
> >
> > $ perf stat -M TopdownL1 -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 811,810 cpu_atom/TOPDOWN_RETIRING.ALL/ # 26.6 % tma_bad_speculation
> > 3,239,281 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 38.8 % tma_frontend_bound
> > 2,037,667 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 24.4 % tma_backend_bound
> > # 24.4 % tma_backend_bound_aux
> > 1,670,438 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 9.7 % tma_retiring
> > 808,138 cpu_atom/TOPDOWN_RETIRING.ALL/
> > 3,234,707 cpu_atom/TOPDOWN_FE_BOUND.ALL/
> > 2,081,420 cpu_atom/TOPDOWN_BE_BOUND.ALL/
> > 122,795,280 cpu_core/TOPDOWN.SLOTS/ # 31.7 % tma_backend_bound
> > # 7.0 % tma_bad_speculation
> > # 54.1 % tma_frontend_bound
> > # 7.2 % tma_retiring
> > 8,817,636 cpu_core/topdown-retiring/
> > 8,480,817 cpu_core/topdown-bad-spec/
> > 3,108,926 cpu_core/topdown-heavy-ops/
> > 66,566,215 cpu_core/topdown-fe-bound/
> > 38,958,811 cpu_core/topdown-be-bound/
> > 134,194 cpu_core/INT_MISC.UOP_DROPPING/
> >
> > 1.003607796 seconds time elapsed
> >
> > $ perf stat -M TopdownL2 -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 162,334,218 cpu_atom/TOPDOWN_FE_BOUND.FRONTEND_LATENCY/ # 27.7 % tma_fetch_latency (38.99%)
> > 16,191,486 cpu_atom/INST_RETIRED.ANY/ (45.76%)
> > 68,443,205 cpu_atom/TOPDOWN_BE_BOUND.MEM_SCHEDULER/ # 32.2 % tma_memory_bound
> > # 5.8 % tma_core_bound (45.77%)
> > 14,920,109 cpu_atom/UOPS_RETIRED.MS/ # 2.9 % tma_base (45.92%)
> > 14,829,879 cpu_atom/UOPS_RETIRED.MS/ # 2.5 % tma_ms_uops (46.31%)
> > 31,860,520 cpu_atom/TOPDOWN_RETIRING.ALL/ (46.71%)
> > 117,323,055 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 18.7 % tma_branch_mispredicts
> > # 11.5 % tma_fetch_bandwidth
> > # 0.3 % tma_machine_clears
> > # 37.9 % tma_resource_bound (53.49%)
> > 222,579,768 cpu_atom/TOPDOWN_BE_BOUND.ALL/ (53.90%)
> > 13,672,174 cpu_atom/MEM_SCHEDULER_BLOCK.ST_BUF/ (54.23%)
> > 24,264,262 cpu_atom/LD_HEAD.ANY_AT_RET/ (47.46%)
> > 13,872,813 cpu_atom/MEM_SCHEDULER_BLOCK.ALL/ (47.45%)
> > 223,722,007 cpu_atom/TOPDOWN_BE_BOUND.ALL/ (47.31%)
> > 2,005,972 cpu_atom/TOPDOWN_BAD_SPECULATION.MACHINE_CLEARS/ (46.91%)
> > 109,423,013 cpu_atom/TOPDOWN_BAD_SPECULATION.MISPREDICT/ (39.72%)
> > 67,420,790 cpu_atom/TOPDOWN_FE_BOUND.FRONTEND_BANDWIDTH/ (39.33%)
> > 92,790,312 cpu_core/TOPDOWN.SLOTS/ # 24.3 % tma_core_bound
> > # 3.0 % tma_heavy_operations
> > # 5.6 % tma_light_operations
> > # 10.8 % tma_memory_bound
> > # 7.8 % tma_branch_mispredicts
> > # 40.4 % tma_fetch_latency
> > # 0.2 % tma_machine_clears
> > # 7.8 % tma_fetch_bandwidth
> > 8,041,595 cpu_core/topdown-retiring/
> > 10,060,500 cpu_core/topdown-mem-bound/
> > 7,314,344 cpu_core/topdown-bad-spec/
> > 2,824,600 cpu_core/topdown-heavy-ops/
> > 37,630,164 cpu_core/topdown-fetch-lat/
> > 7,278,843 cpu_core/topdown-br-mispredict/
> > 44,863,148 cpu_core/topdown-fe-bound/
> > 32,573,458 cpu_core/topdown-be-bound/
> > 5,785,074 cpu_core/INST_RETIRED.ANY/
> > 2,325,424 cpu_core/UOPS_RETIRED.MS/
> > 15,972,774 cpu_core/CPU_CLK_UNHALTED.THREAD/
> > 117,750 cpu_core/INT_MISC.UOP_DROPPING/
> >
> > 1.003519749 seconds time elapsed
> > '''
> >
> > Note, flags are added below to reduce the size of the output by
> > removing event groups and threshold printing support:
> > '''
> > $ perf stat --metric-no-threshold --metric-no-group -M TopdownL3 -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 3,506,641 cpu_atom/TOPDOWN_BE_BOUND.ALLOC_RESTRICTIONS/ # 0.6 % tma_alloc_restriction (17.14%)
> > 133,962,390 cpu_atom/TOPDOWN_BE_BOUND.SERIALIZATION/ # 22.2 % tma_serialization (17.48%)
> > 11,201,207 cpu_atom/TOPDOWN_FE_BOUND.ITLB/ # 1.9 % tma_itlb_misses (17.88%)
> > 63,876,838 cpu_atom/TOPDOWN_BE_BOUND.MEM_SCHEDULER/ # 10.6 % tma_mem_scheduler
> > # 10.5 % tma_store_bound
> > # 2.4 % tma_other_load_store (18.28%)
> > 14,386,940 cpu_atom/UOPS_RETIRED.MS/ (18.68%)
> > 14,432,493 cpu_atom/UOPS_RETIRED.MS/ # 2.7 % tma_other_ret (19.09%)
> > 81,582,687 cpu_atom/TOPDOWN_FE_BOUND.ICACHE/ # 13.5 % tma_icache_misses (19.14%)
> > 30,467,546 cpu_atom/TOPDOWN_RETIRING.ALL/ (19.14%)
> > 16,788,753 cpu_atom/MEM_BOUND_STALLS.LOAD/ # 4.2 % tma_dram_bound
> > # 3.7 % tma_l2_bound
> > # 6.7 % tma_l3_bound (19.14%)
> > 14,514,040 cpu_atom/TOPDOWN_FE_BOUND.DECODE/ # 2.4 % tma_decode (19.14%)
> > 688,307 cpu_atom/TOPDOWN_BAD_SPECULATION.NUKE/ # 0.1 % tma_nuke (19.13%)
> > 0 cpu_atom/UOPS_RETIRED.FPDIV/ (19.12%)
> > 4,408,466 cpu_atom/MEM_BOUND_STALLS.LOAD_L2_HIT/ (19.12%)
> > 120,556,998 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 9.3 % tma_branch_detect
> > # 1.0 % tma_branch_resteer
> > # 5.8 % tma_cisc
> > # 0.3 % tma_fast_nuke
> > # 0.0 % tma_fpdiv_uops
> > # 4.3 % tma_l1_bound
> > # 3.2 % tma_non_mem_scheduler
> > # 1.9 % tma_other_fb
> > # 1.1 % tma_predecode
> > # 0.1 % tma_register
> > # 0.1 % tma_reorder_buffer (22.30%)
> > 34,773,106 cpu_atom/TOPDOWN_FE_BOUND.CISC/ (22.30%)
> > 591,112 cpu_atom/TOPDOWN_BE_BOUND.REGISTER/ (22.30%)
> > 11,286,706 cpu_atom/TOPDOWN_FE_BOUND.OTHER/ (22.30%)
> > 5,082,636 cpu_atom/MEM_BOUND_STALLS.LOAD_DRAM_HIT/ (22.30%)
> > 14,146,185 cpu_atom/MEM_SCHEDULER_BLOCK.ST_BUF/ (22.31%)
> > 55,833,686 cpu_atom/TOPDOWN_FE_BOUND.BRANCH_DETECT/ (22.30%)
> > 25,714,051 cpu_atom/LD_HEAD.ANY_AT_RET/ (19.12%)
> > 456,549 cpu_atom/TOPDOWN_BE_BOUND.REORDER_BUFFER/ (19.12%)
> > 1,616,862 cpu_atom/TOPDOWN_BAD_SPECULATION.FASTNUKE/ (19.12%)
> > 6,680,782 cpu_atom/TOPDOWN_FE_BOUND.PREDECODE/ (19.12%)
> > 14,229,195 cpu_atom/MEM_SCHEDULER_BLOCK.ALL/ (19.12%)
> > 8,128,921 cpu_atom/MEM_BOUND_STALLS.LOAD_LLC_HIT/ (19.12%)
> > 20,941,725 cpu_atom/LD_HEAD.L1_MISS_AT_RET/ (19.11%)
> > 6,177,125 cpu_atom/TOPDOWN_FE_BOUND.BRANCH_RESTEER/ (18.78%)
> > 228,066,346 cpu_atom/TOPDOWN_BE_BOUND.ALL/ (18.38%)
> > 5,204,897 cpu_atom/LD_HEAD.L1_BOUND_AT_RET/ (17.99%)
> > 19,060,104 cpu_atom/TOPDOWN_BE_BOUND.NON_MEM_SCHEDULER/ (17.58%)
> > 0 cpu_atom/UOPS_RETIRED.FPDIV/ (17.19%)
> > 864,565,692 cpu_core/TOPDOWN.SLOTS/ # 4.7 % tma_microcode_sequencer
> > # 0.4 % tma_few_uops_instructions
> > # 0.3 % tma_fused_instructions
> > # 1.8 % tma_memory_operations
> > # 0.1 % tma_nop_instructions
> > # 8.9 % tma_ms_switches
> > # 0.4 % tma_non_fused_branches
> > # 0.0 % tma_fp_arith
> > # 0.0 % tma_int_operations
> > # 35.7 % tma_ports_utilization
> > # 3.8 % tma_other_light_ops (18.03%)
> > 100,519,954 cpu_core/topdown-retiring/ (18.03%)
> > 68,964,454 cpu_core/topdown-bad-spec/ (18.03%)
> > 44,732,021 cpu_core/topdown-heavy-ops/ (18.03%)
> > 435,618,316 cpu_core/topdown-fe-bound/ (18.03%)
> > 262,842,804 cpu_core/topdown-be-bound/ (18.03%)
> > 10,368,608 cpu_core/BR_INST_RETIRED.ALL_BRANCHES/ (18.43%)
> > 55,947,727 cpu_core/RESOURCE_STALLS.SCOREBOARD/ (18.84%)
> > 125,718,255 cpu_core/UOPS_ISSUED.ANY/ (19.24%)
> > 23,178,652 cpu_core/EXE_ACTIVITY.1_PORTS_UTIL/ (19.65%)
> > 0 cpu_core/INT_VEC_RETIRED.ADD_256/ (20.05%)
> > 1,119,514 cpu_core/DSB2MITE_SWITCHES.PENALTY_CYCLES/ # 0.5 % tma_dsb_switches (20.46%)
> > 27,684,795 cpu_core/MEMORY_ACTIVITY.STALLS_L1D_MISS/ # 10.6 % tma_l1_bound
> > # 0.7 % tma_l2_bound (20.86%)
> > 108,813,079 cpu_core/UOPS_EXECUTED.THREAD/ (21.27%)
> > 16,563,036 cpu_core/IDQ.MITE_CYCLES_ANY/ # 5.2 % tma_mite (19.14%)
> > 53,037,471 cpu_core/EXE_ACTIVITY.BOUND_ON_LOADS/ (19.14%)
> > 41,005,510 cpu_core/UOPS_RETIRED.MS/ (19.14%)
> > 575,534 cpu_core/ARITH.DIV_ACTIVE/ # 0.2 % tma_divider (19.14%)
> > 0 cpu_core/FP_ARITH_INST_RETIRED.SCALAR_SINGLE,umask=0x03/ (19.14%)
> > 2,207,021 cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/ # 0.9 % tma_store_bound (19.13%)
> > 5,685,032 cpu_core/UOPS_RETIRED.MS,cmask=1,edge/ (19.13%)
> > 25,523 cpu_core/DECODE.LCP/ # 0.0 % tma_lcp (19.12%)
> > 26,095,298 cpu_core/MEMORY_ACTIVITY.STALLS_L2_MISS/ # 10.8 % tma_l3_bound (19.13%)
> > 108,516 cpu_core/MEMORY_ACTIVITY.STALLS_L3_MISS/ # 0.0 % tma_dram_bound (19.13%)
> > 192,239,590 cpu_core/CYCLE_ACTIVITY.STALLS_TOTAL/ (19.12%)
> > 5,978 cpu_core/LSD.CYCLES_ACTIVE/ # -0.0 % tma_lsd (19.12%)
> > 0 cpu_core/INT_VEC_RETIRED.VNNI_128/ (19.13%)
> > 137,530,949 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/ # 0.1 % tma_dsb (19.12%)
> > 240,070,549 cpu_core/CPU_CLK_UNHALTED.THREAD/ # 17.5 % tma_icache_misses
> > # 6.1 % tma_itlb_misses
> > # 40.3 % tma_branch_resteers (21.52%)
> > 0 cpu_core/FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE,umask=0x3c/ (21.51%)
> > 595,051 cpu_core/ARITH.DIV_ACTIVE/ (21.52%)
> > 461,041 cpu_core/IDQ.DSB_CYCLES_ANY/ (21.51%)
> > 0 cpu_core/INT_VEC_RETIRED.MUL_256/ (21.52%)
> > 0 cpu_core/UOPS_EXECUTED.X87/ (21.52%)
> > 237,196 cpu_core/IDQ.DSB_CYCLES_OK/ (21.52%)
> > 125,009 cpu_core/LSD.CYCLES_OK/ (21.52%)
> > 0 cpu_core/INT_VEC_RETIRED.ADD_128/ (21.40%)
> > 28,388,778 cpu_core/MEM_UOP_RETIRED.ANY/ (18.61%)
> > 1,806,629 cpu_core/INST_RETIRED.NOP/ (18.21%)
> > 41,928,018 cpu_core/ICACHE_DATA.STALLS/ (17.81%)
> > 0 cpu_core/INT_VEC_RETIRED.VNNI_256/ (17.41%)
> > 18,230,137 cpu_core/EXE_ACTIVITY.2_PORTS_UTIL,umask=0xc/ (17.02%)
> > 28,052,001 cpu_core/EXE_ACTIVITY.3_PORTS_UTIL,umask=0x80/ (16.61%)
> > 4,073,568 cpu_core/INST_RETIRED.MACRO_FUSED/ (16.20%)
> > 66,509,871 cpu_core/INT_MISC.UNKNOWN_BRANCH_CYCLES/ (15.92%)
> > 2,307,447 cpu_core/IDQ.MITE_CYCLES_OK/ (15.91%)
> > 30,345,769 cpu_core/INT_MISC.CLEAR_RESTEER_CYCLES/ (15.91%)
> > 0 cpu_core/INT_VEC_RETIRED.SHUFFLES/ (15.91%)
> > 14,722,079 cpu_core/ICACHE_TAG.STALLS/ (15.90%)
> >
> > 1.004474469 seconds time elapsed
> >
> > $ perf stat --metric-no-threshold --metric-no-group -M TopdownL4 -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 1,004,834,399 ns duration_time # 0.3 % tma_false_sharing
> > # 40.2 % tma_l3_hit_latency
> > # 4.4 % tma_contested_accesses
> > # 1.6 % tma_data_sharing
> > 3,762,410 cpu_atom/LD_HEAD.PGWALK_AT_RET/ # 3.1 % tma_stlb_miss (33.58%)
> > 10 cpu_atom/MACHINE_CLEARS.SMC/ # 0.0 % tma_smc (33.98%)
> > 66,500,689 cpu_atom/TOPDOWN_BE_BOUND.MEM_SCHEDULER/ # 0.0 % tma_ld_buffer
> > # 0.0 % tma_rsv
> > # 11.0 % tma_st_buffer (29.60%)
> > 1,051,312 cpu_atom/LD_HEAD.OTHER_AT_RET/ # 0.9 % tma_other_l1 (30.00%)
> > 14,740,093 cpu_atom/UOPS_RETIRED.MS/ (30.39%)
> > 117,899 cpu_atom/LD_HEAD.DTLB_MISS_AT_RET/ # 0.1 % tma_stlb_hit (30.79%)
> > 701,548 cpu_atom/TOPDOWN_BAD_SPECULATION.NUKE/ # 0.0 % tma_disambiguation
> > # 0.0 % tma_fp_assist
> > # 0.1 % tma_memory_ordering
> > # 0.0 % tma_page_fault (31.08%)
> > 12,873 cpu_atom/MACHINE_CLEARS.MEMORY_ORDERING/ (31.07%)
> > 58,321 cpu_atom/MEM_SCHEDULER_BLOCK.LD_BUF/ (31.07%)
> > 43,458 cpu_atom/MEM_SCHEDULER_BLOCK.RSV/ (31.07%)
> > 14,256,005 cpu_atom/MEM_SCHEDULER_BLOCK.ALL/ (31.06%)
> > 122,156,534 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 0.0 % tma_store_fwd_blk (36.16%)
> > 0 cpu_atom/MACHINE_CLEARS.FP_ASSIST/ (35.76%)
> > 13,804 cpu_atom/MACHINE_CLEARS.SLOW/ (35.35%)
> > 14,388,300 cpu_atom/MEM_SCHEDULER_BLOCK.ST_BUF/ (34.95%)
> > 493,070,443 cpu_atom/CPU_CLK_UNHALTED.REF_TSC/ (39.73%)
> > 2 cpu_atom/MACHINE_CLEARS.PAGE_FAULT/ (39.33%)
> > 1,101 cpu_atom/LD_HEAD.ST_ADDR_AT_RET/ (38.93%)
> > 929 cpu_atom/MACHINE_CLEARS.DISAMBIGUATION/ (38.55%)
> > 14,241,213 cpu_atom/MEM_SCHEDULER_BLOCK.ALL/ (33.45%)
> > 1,010,981,054 cpu_core/TOPDOWN.SLOTS/ # 0.0 % tma_assists
> > # 4.3 % tma_cisc
> > # 0.0 % tma_fp_scalar
> > # 0.0 % tma_fp_vector
> > # 0.0 % tma_shuffles
> > # 0.0 % tma_int_vector_128b
> > # 0.0 % tma_x87_use
> > # 0.0 % tma_int_vector_256b
> > # 0.7 % tma_clears_resteers
> > # 12.4 % tma_mispredicts_resteers (8.14%)
> > 132,375,316 cpu_core/topdown-retiring/ (8.14%)
> > 88,303,327 cpu_core/topdown-bad-spec/ (8.14%)
> > 85,519,216 cpu_core/topdown-br-mispredict/ (8.14%)
> > 495,722,455 cpu_core/topdown-fe-bound/ (8.14%)
> > 298,147,134 cpu_core/topdown-be-bound/ (8.14%)
> > 21,418,803 cpu_core/UOPS_EXECUTED.CYCLES_GE_3/ # 8.8 % tma_ports_utilized_3m (10.12%)
> > 35,208,716 cpu_core/OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD,cmask=4/ # 14.5 % tma_mem_bandwidth
> > # 33.3 % tma_mem_latency (10.52%)
> > 17,358 cpu_core/OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM/ (10.91%)
> > 55,883,811 cpu_core/RESOURCE_STALLS.SCOREBOARD/ # 24.1 % tma_ports_utilized_0 (12.91%)
> > 0 cpu_core/INT_VEC_RETIRED.ADD_256/ (14.89%)
> > 139,890 cpu_core/DTLB_STORE_MISSES.STLB_HIT,cmask=1/ # 2.8 % tma_dtlb_store (15.30%)
> > 216,886 cpu_core/MEM_INST_RETIRED.LOCK_LOADS/ # 3.8 % tma_store_latency
> > # 0.1 % tma_lock_latency (15.71%)
> > 115,948,790 cpu_core/UOPS_EXECUTED.THREAD/ (17.69%)
> > 52,155,508 cpu_core/EXE_ACTIVITY.BOUND_ON_LOADS/ (15.93%)
> > 6 cpu_core/ASSISTS.ANY,umask=0x1B/ (15.93%)
> > 87,422,517 cpu_core/CYCLE_ACTIVITY.CYCLES_MEM_ANY/ # 5.2 % tma_dtlb_load (15.81%)
> > 37,420,652 cpu_core/MEMORY_ACTIVITY.CYCLES_L1D_MISS/ (15.44%)
> > 43,527,357 cpu_core/UOPS_RETIRED.MS/ (15.04%)
> > 31,787,227 cpu_core/INT_MISC.CLEAR_RESTEER_CYCLES/ (14.64%)
> > 0 cpu_core/FP_ARITH_INST_RETIRED.SCALAR_SINGLE,umask=0x03/ (14.24%)
> > 4,899,130 cpu_core/XQ.FULL_CYCLES/ # 2.0 % tma_sq_full (13.84%)
> > 1,365 cpu_core/OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM/ (13.44%)
> > 23,904,338 cpu_core/EXE_ACTIVITY.1_PORTS_UTIL/ # 9.9 % tma_ports_utilized_1 (13.05%)
> > 251,479 cpu_core/L2_RQSTS.ALL_RFO/ (12.76%)
> > 188,701,010 cpu_core/CYCLE_ACTIVITY.STALLS_TOTAL/ (12.74%)
> > 6,909 cpu_core/MEM_INST_RETIRED.SPLIT_STORES/ # 0.0 % tma_split_stores (12.74%)
> > 619,775 cpu_core/MEM_LOAD_RETIRED.L1_MISS/ (9.56%)
> > 136,716,345 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/ # 0.9 % tma_decoder0_alone (11.15%)
> > 0 cpu_core/INT_VEC_RETIRED.VNNI_128/ (12.74%)
> > 605,850 cpu_core/L1D_PEND_MISS.FB_FULL/ # 0.2 % tma_fb_full (12.73%)
> > 60,079 cpu_core/MEM_STORE_RETIRED.L2_HIT/ (11.14%)
> > 242,508,080 cpu_core/CPU_CLK_UNHALTED.THREAD/ # 4.2 % tma_ports_utilized_2
> > # 0.2 % tma_store_fwd_blk
> > # 0.0 % tma_streaming_stores
> > # 27.5 % tma_unknown_branches
> > # 0.0 % tma_split_loads (12.74%)
> > 0 cpu_core/FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE,umask=0x3c/ (14.33%)
> > 32,573 cpu_core/LD_BLOCKS.STORE_FORWARD/ (12.74%)
> > 1,130 cpu_core/OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD/ (12.74%)
> > 4,029 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS/ (9.56%)
> > 4,844,548 cpu_core/INST_DECODED.DECODERS,cmask=1/ (9.56%)
> > 5,266 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD/ (6.37%)
> > 0 cpu_core/UOPS_EXECUTED.X87/ (7.96%)
> > 0 cpu_core/INT_VEC_RETIRED.MUL_256/ (9.56%)
> > 2,786,473 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/ (9.56%)
> > 961,614,001 cpu_core/CPU_CLK_UNHALTED.REF_TSC/ (11.15%)
> > 2,433,107 cpu_core/INST_DECODED.DECODERS,cmask=2/ (11.15%)
> > 0 cpu_core/INT_VEC_RETIRED.ADD_128/ (12.74%)
> > 9,058,046 cpu_core/OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO/ (12.74%)
> > 6,399,992 cpu_core/MEM_INST_RETIRED.ALL_STORES/ (12.74%)
> > 45,519,749 cpu_core/L1D_PEND_MISS.PENDING/ (9.56%)
> > 12,200,559 cpu_core/DTLB_LOAD_MISSES.WALK_ACTIVE/ (7.97%)
> > 115,944,190 cpu_core/OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD/ (6.37%)
> > 0 cpu_core/INT_VEC_RETIRED.VNNI_256/ (7.96%)
> > 1,885,278 cpu_core/INT_MISC.UOP_DROPPING/ (9.56%)
> > 524,819 cpu_core/MEM_LOAD_RETIRED.FB_HIT/ (9.56%)
> > 26,866,872 cpu_core/EXE_ACTIVITY.3_PORTS_UTIL,umask=0x80/ (11.15%)
> > 10,265,977 cpu_core/EXE_ACTIVITY.2_PORTS_UTIL/ (12.74%)
> > 66,662,934 cpu_core/INT_MISC.UNKNOWN_BRANCH_CYCLES/ (12.74%)
> > 0 cpu_core/OCR.STREAMING_WR.ANY_RESPONSE/ (12.74%)
> > 12,499 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD/ (12.74%)
> > 0 cpu_core/INT_VEC_RETIRED.SHUFFLES/ (12.74%)
> > 47,649 cpu_core/DTLB_LOAD_MISSES.STLB_HIT,cmask=1/ (12.74%)
> > 106,424 cpu_core/L2_RQSTS.RFO_HIT/ (12.74%)
> > 0 cpu_core/LD_BLOCKS.NO_SR/ (7.97%)
> > 1,343,692 cpu_core/MEM_LOAD_COMPLETED.L1_MISS_ANY/ (7.96%)
> > 28,517 cpu_core/L1D_PEND_MISS.L2_STALLS/ (6.37%)
> > 394,101 cpu_core/MEM_LOAD_RETIRED.L3_HIT/ (6.36%)
> > 76,860,165,929 TSC
> >
> > 1.004834399 seconds time elapsed
> >
> > $ perf stat --metric-no-threshold --metric-no-group -M TopdownL5 -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 839,538,302 cpu_core/TOPDOWN.SLOTS/ # 0.0 % tma_avx_assists
> > # 0.0 % tma_fp_assists
> > # 0.0 % tma_page_faults
> > # 0.0 % tma_fp_vector_128b
> > # 0.0 % tma_fp_vector_256b (32.40%)
> > 100,274,045 cpu_core/topdown-retiring/ (32.40%)
> > 77,425,642 cpu_core/topdown-bad-spec/ (32.40%)
> > 424,563,652 cpu_core/topdown-fe-bound/ (32.40%)
> > 245,420,564 cpu_core/topdown-be-bound/ (32.40%)
> > 0 cpu_core/FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE/ (32.79%)
> > 54,372,921 cpu_core/RESOURCE_STALLS.SCOREBOARD/ # 22.2 % tma_serializing_operation (33.20%)
> > 23,018,585 cpu_core/UOPS_DISPATCHED.PORT_6/ # 8.0 % tma_alu_op_utilization (33.61%)
> > 17,748,101 cpu_core/UOPS_DISPATCHED.PORT_2_3_10/ # 4.2 % tma_load_op_utilization (34.02%)
> > 0 cpu_core/FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE/ (34.43%)
> > 7,616,700 cpu_core/UOPS_DISPATCHED.PORT_0/ (34.83%)
> > 96,571 cpu_core/DTLB_STORE_MISSES.STLB_HIT,cmask=1/ # 0.6 % tma_store_stlb_hit (35.25%)
> > 84,909,672 cpu_core/CYCLE_ACTIVITY.CYCLES_MEM_ANY/ # 0.2 % tma_load_stlb_hit (35.66%)
> > 32,935,744 cpu_core/MEMORY_ACTIVITY.CYCLES_L1D_MISS/ (31.95%)
> > 16,597,385 cpu_core/UOPS_DISPATCHED.PORT_5_11/ (31.95%)
> > 9,452,844 cpu_core/UOPS_DISPATCHED.PORT_1/ (31.94%)
> > 2,620,695 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/ # 1.8 % tma_store_stlb_miss (31.95%)
> > 15,699,364 cpu_core/UOPS_DISPATCHED.PORT_7_8/ # 5.7 % tma_store_op_utilization (31.95%)
> > 0 cpu_core/FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE/ (31.94%)
> > 142,096,670 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/ (31.95%)
> > 244,591,239 cpu_core/CPU_CLK_UNHALTED.THREAD/ # 5.2 % tma_load_stlb_miss
> > # 0.0 % tma_mixing_vectors (35.92%)
> > 2,728,385 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/ (35.66%)
> > 0 cpu_core/ASSISTS.SSE_AVX_MIX/ (35.27%)
> > 0 cpu_core/FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE/ (34.86%)
> > 12,664,768 cpu_core/DTLB_LOAD_MISSES.WALK_ACTIVE/ (34.46%)
> > 12,629,733 cpu_core/DTLB_LOAD_MISSES.WALK_ACTIVE/ (34.04%)
> > 0 cpu_core/ASSISTS.FP/ (33.63%)
> > 12 cpu_core/ASSISTS.PAGE_FAULT/ (33.23%)
> > 16,704,699 cpu_core/UOPS_DISPATCHED.PORT_4_9/ (32.81%)
> > 48,386 cpu_core/DTLB_LOAD_MISSES.STLB_HIT,cmask=1/ (28.68%)
> >
> > 1.002806967 seconds time elapsed
> >
> > $ perf stat --metric-no-threshold --metric-no-group -M TopdownL6 -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 743,684 cpu_core/UOPS_DISPATCHED.PORT_0/ # 4.6 % tma_port_0
> > 1,514 cpu_core/MISC2_RETIRED.LFENCE/ # 0.1 % tma_memory_fence
> > 22,120 cpu_core/CPU_CLK_UNHALTED.PAUSE/ # 0.1 % tma_slow_pause
> > 16,187,637 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/ # 4.5 % tma_port_1
> > # 12.6 % tma_port_6
> > 16,754,672 cpu_core/CPU_CLK_UNHALTED.THREAD/
> > 728,805 cpu_core/UOPS_DISPATCHED.PORT_1/
> > 2,040,181 cpu_core/UOPS_DISPATCHED.PORT_6/
> >
> > 1.002727371 seconds time elapsed
> > '''
> >
> > Using --cputype:
> > '''
> > $ perf stat --cputype=core -M TopdownL1 -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 90,542,172 cpu_core/TOPDOWN.SLOTS/ # 31.3 % tma_backend_bound
> > # 7.0 % tma_bad_speculation
> > # 54.0 % tma_frontend_bound
> > # 7.6 % tma_retiring
> > 6,917,885 cpu_core/topdown-retiring/
> > 6,242,227 cpu_core/topdown-bad-spec/
> > 2,353,956 cpu_core/topdown-heavy-ops/
> > 49,034,945 cpu_core/topdown-fe-bound/
> > 28,390,484 cpu_core/topdown-be-bound/
> > 98,299 cpu_core/INT_MISC.UOP_DROPPING/
> >
> > 1.002395582 seconds time elapsed
> >
> > $ perf stat --cputype=atom -M TopdownL1 -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 645,836 cpu_atom/TOPDOWN_RETIRING.ALL/ # 26.4 % tma_bad_speculation
> > 2,404,468 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 38.9 % tma_frontend_bound
> > 1,455,604 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 23.6 % tma_backend_bound
> > # 23.6 % tma_backend_bound_aux
> > 1,235,109 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 10.4 % tma_retiring
> > 642,124 cpu_atom/TOPDOWN_RETIRING.ALL/
> > 2,398,892 cpu_atom/TOPDOWN_FE_BOUND.ALL/
> > 1,503,157 cpu_atom/TOPDOWN_BE_BOUND.ALL/
> >
> > 1.002061651 seconds time elapsed
> > '''
> >
> > Ian Rogers (40):
> > perf stat: Introduce skippable evsels
> > perf vendor events intel: Add alderlake metric constraints
> > perf vendor events intel: Add icelake metric constraints
> > perf vendor events intel: Add icelakex metric constraints
> > perf vendor events intel: Add sapphirerapids metric constraints
> > perf vendor events intel: Add tigerlake metric constraints
> > perf stat: Avoid segv on counter->name
> > perf test: Test more sysfs events
> > perf test: Use valid for PMU tests
> > perf test: Mask config then test
> > perf test: Test more with config_cache
> > perf test: Roundtrip name, don't assume 1 event per name
> > perf parse-events: Set attr.type to PMU type early
> > perf print-events: Avoid unnecessary strlist
> > perf parse-events: Avoid scanning PMUs before parsing
> > perf test: Validate events with hyphens in
> > perf evsel: Modify group pmu name for software events
> > perf test: Move x86 hybrid tests to arch/x86
> > perf test x86 hybrid: Don't assume evlist order
> > perf parse-events: Support PMUs for legacy cache events
> > perf parse-events: Wildcard legacy cache events
> > perf print-events: Print legacy cache events for each PMU
> > perf parse-events: Support wildcards on raw events
> > perf parse-events: Remove now unused hybrid logic
> > perf parse-events: Minor type safety cleanup
> > perf parse-events: Add pmu filter
> > perf stat: Make cputype filter generic
> > perf test: Add cputype testing to perf stat
> > perf test: Fix parse-events tests for >1 core PMU
> > perf parse-events: Support hardware events as terms
> > perf parse-events: Avoid error when assigning a term
> > perf parse-events: Avoid error when assigning a legacy cache term
> > perf parse-events: Don't auto merge hybrid wildcard events
> > perf parse-events: Don't reorder atom cpu events
> > perf metrics: Be PMU specific for referenced metrics.
> > perf metric: Json flag to not group events if gathering a metric group
> > perf stat: Command line PMU metric filtering
> > perf vendor events intel: Correct alderlake metrics
> > perf jevents: Don't rewrite metrics across PMUs
> > perf metrics: Be PMU specific in event match
> >
> > tools/perf/arch/x86/include/arch-tests.h | 1 +
> > tools/perf/arch/x86/tests/Build | 1 +
> > tools/perf/arch/x86/tests/arch-tests.c | 10 +
> > tools/perf/arch/x86/tests/hybrid.c | 275 ++++++
> > tools/perf/arch/x86/util/evlist.c | 4 +-
> > tools/perf/builtin-list.c | 19 +-
> > tools/perf/builtin-record.c | 13 +-
> > tools/perf/builtin-stat.c | 73 +-
> > tools/perf/builtin-top.c | 5 +-
> > tools/perf/builtin-trace.c | 5 +-
> > .../arch/x86/alderlake/adl-metrics.json | 275 +++---
> > .../arch/x86/alderlaken/adln-metrics.json | 20 +-
> > .../arch/x86/broadwell/bdw-metrics.json | 12 +
> > .../arch/x86/broadwellde/bdwde-metrics.json | 12 +
> > .../arch/x86/broadwellx/bdx-metrics.json | 12 +
> > .../arch/x86/cascadelakex/clx-metrics.json | 12 +
> > .../arch/x86/haswell/hsw-metrics.json | 12 +
> > .../arch/x86/haswellx/hsx-metrics.json | 12 +
> > .../arch/x86/icelake/icl-metrics.json | 23 +
> > .../arch/x86/icelakex/icx-metrics.json | 23 +
> > .../arch/x86/ivybridge/ivb-metrics.json | 12 +
> > .../arch/x86/ivytown/ivt-metrics.json | 12 +
> > .../arch/x86/jaketown/jkt-metrics.json | 12 +
> > .../arch/x86/sandybridge/snb-metrics.json | 12 +
> > .../arch/x86/sapphirerapids/spr-metrics.json | 23 +
> > .../arch/x86/skylake/skl-metrics.json | 12 +
> > .../arch/x86/skylakex/skx-metrics.json | 12 +
> > .../arch/x86/tigerlake/tgl-metrics.json | 23 +
> > tools/perf/pmu-events/jevents.py | 10 +-
> > tools/perf/pmu-events/metric.py | 28 +-
> > tools/perf/pmu-events/metric_test.py | 6 +-
> > tools/perf/pmu-events/pmu-events.h | 2 +
> > tools/perf/tests/evsel-roundtrip-name.c | 119 ++-
> > tools/perf/tests/parse-events.c | 826 +++++++++---------
> > tools/perf/tests/pmu-events.c | 12 +-
> > tools/perf/tests/shell/stat.sh | 44 +
> > tools/perf/util/Build | 1 -
> > tools/perf/util/evlist.h | 1 -
> > tools/perf/util/evsel.c | 30 +-
> > tools/perf/util/evsel.h | 1 +
> > tools/perf/util/metricgroup.c | 111 ++-
> > tools/perf/util/metricgroup.h | 3 +-
> > tools/perf/util/parse-events-hybrid.c | 214 -----
> > tools/perf/util/parse-events-hybrid.h | 25 -
> > tools/perf/util/parse-events.c | 646 ++++++--------
> > tools/perf/util/parse-events.h | 61 +-
> > tools/perf/util/parse-events.l | 108 +--
> > tools/perf/util/parse-events.y | 222 ++---
> > tools/perf/util/pmu-hybrid.c | 20 -
> > tools/perf/util/pmu-hybrid.h | 1 -
> > tools/perf/util/pmu.c | 16 +-
> > tools/perf/util/pmu.h | 3 +
> > tools/perf/util/pmus.c | 25 +-
> > tools/perf/util/pmus.h | 3 +
> > tools/perf/util/print-events.c | 85 +-
> > tools/perf/util/stat-display.c | 6 +-
> > 56 files changed, 1939 insertions(+), 1627 deletions(-)
> > create mode 100644 tools/perf/arch/x86/tests/hybrid.c
> > delete mode 100644 tools/perf/util/parse-events-hybrid.c
> > delete mode 100644 tools/perf/util/parse-events-hybrid.h
> >
> > --
> > 2.40.1.495.gc816e09b53d-goog
> >
>
> --
>
> - Arnaldo

--

- Arnaldo

2023-04-26 22:09:25

by Liang, Kan

Subject: Re: [PATCH v1 00/40] Fix perf on Intel hybrid CPUs



On 2023-04-26 5:33 p.m., Arnaldo Carvalho de Melo wrote:
> Em Wed, Apr 26, 2023 at 06:09:36PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Wed, Apr 26, 2023 at 12:00:10AM -0700, Ian Rogers escreveu:
>>> TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
>>> or individually, event parsing doesn't always scan all PMUs, more and
>>> new tests that also run without hybrid, less code.
>>>
>>> The first patches were previously posted to improve metrics here:
>>> "perf stat: Introduce skippable evsels"
>>> https://lore.kernel.org/all/[email protected]/
>>> "perf vendor events intel: Add xxx metric constraints"
>>> https://lore.kernel.org/all/[email protected]/
>>>
>>> Next are some general test improvements.
>>
>> Kan,
>>
>> Have you looked at this? I'm doing a test build on it now.
>
> And just to make clear, this is for v6.5.
>

I'm looking at the patch series, but I cannot finish all the reviews
today. I will try to finish it tomorrow.

But there is one obvious bug with this series.
The topdown events of atom are duplicated. The example below is just one
case; almost all the atom Topdown events in the examples have this issue.

> $ perf stat --cputype=atom -M TopdownL1 -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 645,836 cpu_atom/TOPDOWN_RETIRING.ALL/ # 26.4 % tma_bad_speculation
> 2,404,468 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 38.9 % tma_frontend_bound
> 1,455,604 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 23.6 % tma_backend_bound
> # 23.6 % tma_backend_bound_aux
> 1,235,109 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 10.4 % tma_retiring
> 642,124 cpu_atom/TOPDOWN_RETIRING.ALL/
> 2,398,892 cpu_atom/TOPDOWN_FE_BOUND.ALL/
> 1,503,157 cpu_atom/TOPDOWN_BE_BOUND.ALL/
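
A rough way to check for this kind of duplication mechanically is to count event names that appear more than once in a run. The sketch below is only an illustration: `count_duplicated_events` is a hypothetical helper, not part of perf, and it assumes the event names have already been extracted from the `perf stat` output.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count how many distinct names in `names` occur more than once.
 * Hypothetical helper for illustration; O(n^2), which is fine for the
 * handful of events in a single perf stat run.
 */
static size_t count_duplicated_events(const char *const *names, size_t n)
{
	size_t dups = 0;

	for (size_t i = 0; i < n; i++) {
		int seen_before = 0;
		int seen_later = 0;

		for (size_t j = 0; j < i; j++)
			if (strcmp(names[i], names[j]) == 0)
				seen_before = 1;
		for (size_t j = i + 1; j < n; j++)
			if (strcmp(names[i], names[j]) == 0)
				seen_later = 1;
		/* Count a name once: at its first occurrence, if it repeats. */
		if (!seen_before && seen_later)
			dups++;
	}
	return dups;
}
```

Fed the seven event names from the output above, this would flag TOPDOWN_RETIRING.ALL, TOPDOWN_FE_BOUND.ALL and TOPDOWN_BE_BOUND.ALL as repeated.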



Thanks,
Kan

2023-04-26 23:29:27

by Yasin, Ahmad

Subject: RE: [PATCH v1 01/40] perf stat: Introduce skippable evsels

The output got needlessly lengthened with recent changes from Ian.

These four metrics:
# 14.5 % tma_retiring
# 27.6 % tma_backend_bound
# 40.9 % tma_frontend_bound
# 17.0 % tma_bad_speculation
would be better appended to the right-hand side of these four events (as current perf-stat does):
144,922 topdown-retiring:u
411,266 topdown-fe-bound:u
258,510 topdown-be-bound:u
184,090 topdown-bad-spec:u

Also, I think we should not bother the default perf-stat users with the last two events:
2,585 INT_MISC.UOP_DROPPING:u # 4.528 M/sec
3,434 cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/u # 6.015 M/sec

(yes, these are meant to increase the accuracy of the previous tma_* level 1 metrics, but the underlying events vary from model to model, e.g. SKL to ICL to SPR).

Besides, I can think of better metrics to append to the top-most TMA event (TOPDOWN.SLOTS). tma_retiring does not belong there.

Ahmad

-----Original Message-----
From: Ian Rogers <[email protected]>
Sent: Wednesday, April 26, 2023 10:00
To: Arnaldo Carvalho de Melo <[email protected]>; Kan Liang <[email protected]>; Yasin, Ahmad <[email protected]>; Peter Zijlstra <[email protected]>; Ingo Molnar <[email protected]>; Eranian, Stephane <[email protected]>; Andi Kleen <[email protected]>; Taylor, Perry <[email protected]>; Alt, Samantha <[email protected]>; Biggers, Caleb <[email protected]>; Wang, Weilin <[email protected]>; Baker, Edward <[email protected]>; Mark Rutland <[email protected]>; Alexander Shishkin <[email protected]>; Jiri Olsa <[email protected]>; Namhyung Kim <[email protected]>; Hunter, Adrian <[email protected]>; Florian Fischer <[email protected]>; Rob Herring <[email protected]>; Zhengjun Xing <[email protected]>; John Garry <[email protected]>; Kajol Jain <[email protected]>; Sumanth Korikkar <[email protected]>; Thomas Richter <[email protected]>; Tiezhu Yang <[email protected]>; Ravi Bangoria <[email protected]>; Leo Yan <[email protected]>; Yang Jihong <[email protected]>; James Clark <[email protected]>; Suzuki Poulouse <[email protected]>; Kang Minchul <[email protected]>; Athira Rajeev <[email protected]>; [email protected]; [email protected]
Cc: Ian Rogers <[email protected]>
Subject: [PATCH v1 01/40] perf stat: Introduce skippable evsels

Perf stat with no arguments will use default events and metrics. These events may fail to open even with kernel and hypervisor disabled. When these fail then the permissions error appears even though they were implicitly selected. This is particularly a problem with the automatic selection of the TopdownL1 metric group on certain architectures like Skylake:

```
$ perf stat true
Error:
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open access to performance monitoring and observability operations for processes without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 2:
  -1: Allow use of (almost) all events by all users
      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
```

This patch adds skippable evsels that when they fail to open won't fail and won't appear in output. The TopdownL1 events, from the metric group, are marked as skippable. This turns the failure above to:

```
$ perf stat true

Performance counter stats for 'true':

1.26 msec task-clock:u # 0.328 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
49 page-faults:u # 38.930 K/sec
176,449 cycles:u # 0.140 GHz (48.99%)
122,905 instructions:u # 0.70 insn per cycle
28,264 branches:u # 22.456 M/sec
2,405 branch-misses:u # 8.51% of all branches

0.003834565 seconds time elapsed

0.000000000 seconds user
0.004130000 seconds sys
```
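
The skip-on-failure behaviour described above can be sketched with a small toy model. This is not the perf implementation: the `toy_*` types and functions are hypothetical stand-ins for `struct evsel`, `stat_handle_error()` and the open loop, and `toy_deny_topdown` simply pretends the kernel rejects every event whose name starts with "topdown".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for perf's evsel and recovery enum; not the real types. */
enum toy_recovery { TOY_SKIP, TOY_FATAL };

struct toy_counter {
	const char *name;
	bool skippable;   /* implicitly added, e.g. default TopdownL1 events */
	bool supported;   /* cleared when the open fails */
	bool errored;     /* set when the open fails */
};

/* Decide what to do after an open failure, mirroring the patch's new branch. */
static enum toy_recovery toy_handle_error(struct toy_counter *c)
{
	if (c->skippable) {
		c->supported = false;
		c->errored = true;
		return TOY_SKIP;
	}
	return TOY_FATAL;
}

/*
 * Try to open every counter with the caller-supplied opener (0 on success).
 * Returns the number of counters opened, or -1 on a fatal failure.
 */
static int toy_open_all(struct toy_counter *cs, size_t n,
			int (*opener)(const struct toy_counter *))
{
	int opened = 0;

	for (size_t i = 0; i < n; i++) {
		if (opener(&cs[i]) == 0) {
			opened++;
			continue;
		}
		if (toy_handle_error(&cs[i]) == TOY_FATAL)
			return -1;
	}
	return opened;
}

/* Hypothetical opener: pretend the kernel rejects every "topdown" event. */
static int toy_deny_topdown(const struct toy_counter *c)
{
	return strncmp(c->name, "topdown", 7) == 0 ? -1 : 0;
}
```

In this model an explicitly requested event that fails to open still aborts the run, while an implicitly added one is only marked as errored and left out.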

When the events can have kernel/hypervisor disabled, like on Tigerlake, then it continues to succeed as:

```
$ perf stat true

Performance counter stats for 'true':

0.57 msec task-clock:u # 0.385 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
47 page-faults:u # 82.329 K/sec
287,017 cycles:u # 0.503 GHz
133,318 instructions:u # 0.46 insn per cycle
31,396 branches:u # 54.996 M/sec
2,442 branch-misses:u # 7.78% of all branches
998,790 TOPDOWN.SLOTS:u # 14.5 % tma_retiring
# 27.6 % tma_backend_bound
# 40.9 % tma_frontend_bound
# 17.0 % tma_bad_speculation
144,922 topdown-retiring:u
411,266 topdown-fe-bound:u
258,510 topdown-be-bound:u
184,090 topdown-bad-spec:u
2,585 INT_MISC.UOP_DROPPING:u # 4.528 M/sec
3,434 cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/u # 6.015 M/sec

0.001480954 seconds time elapsed

0.000000000 seconds user
0.001686000 seconds sys
```

And this likewise works if paranoia allows or running as root.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/builtin-stat.c | 39 ++++++++++++++++++++++++++--------
tools/perf/util/evsel.c | 15 +++++++++++--
tools/perf/util/evsel.h | 1 +
tools/perf/util/stat-display.c | 4 ++++
4 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index efda63f6bf32..eb34f5418ad3 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -667,6 +667,13 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
evsel_list->core.threads->err_thread = -1;
return COUNTER_RETRY;
}
+ } else if (counter->skippable) {
+ if (verbose > 0)
+ ui__warning("skipping event %s that kernel failed to open .\n",
+ evsel__name(counter));
+ counter->supported = false;
+ counter->errored = true;
+ return COUNTER_SKIP;
}

evsel__open_strerror(counter, &target, errno, msg, sizeof(msg));
@@ -1885,15 +1892,29 @@ static int add_default_attributes(void)
* Add TopdownL1 metrics if they exist. To minimize
* multiplexing, don't request threshold computation.
*/
- if (metricgroup__has_metric("TopdownL1") &&
- metricgroup__parse_groups(evsel_list, "TopdownL1",
- /*metric_no_group=*/false,
- /*metric_no_merge=*/false,
- /*metric_no_threshold=*/true,
- stat_config.user_requested_cpu_list,
- stat_config.system_wide,
- &stat_config.metric_events) < 0)
- return -1;
+ if (metricgroup__has_metric("TopdownL1")) {
+ struct evlist *metric_evlist = evlist__new();
+ struct evsel *metric_evsel;
+
+ if (!metric_evlist)
+ return -1;
+
+ if (metricgroup__parse_groups(metric_evlist, "TopdownL1",
+ /*metric_no_group=*/false,
+ /*metric_no_merge=*/false,
+ /*metric_no_threshold=*/true,
+ stat_config.user_requested_cpu_list,
+ stat_config.system_wide,
+ &stat_config.metric_events) < 0)
+ return -1;
+
+ evlist__for_each_entry(metric_evlist, metric_evsel) {
+ metric_evsel->skippable = true;
+ }
+ evlist__splice_list_tail(evsel_list, &metric_evlist->core.entries);
+ evlist__delete(metric_evlist);
+ }
+
/* Platform specific attrs */
if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
return -1;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 356c07f03be6..1cd04b5998d2 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -290,6 +290,7 @@ void evsel__init(struct evsel *evsel,
evsel->per_pkg_mask = NULL;
evsel->collect_stat = false;
evsel->pmu_name = NULL;
+ evsel->skippable = false;
}

struct evsel *evsel__new_idx(struct perf_event_attr *attr, int idx)
@@ -1725,9 +1726,13 @@ static int get_group_fd(struct evsel *evsel, int cpu_map_idx, int thread)
return -1;

fd = FD(leader, cpu_map_idx, thread);
- BUG_ON(fd == -1);
+ BUG_ON(fd == -1 && !leader->skippable);

- return fd;
+ /*
+ * When the leader has been skipped, return -2 to distinguish from no
+ * group leader case.
+ */
+ return fd == -1 ? -2 : fd;
}

static void evsel__remove_fd(struct evsel *pos, int nr_cpus, int nr_threads, int thread_idx)
@@ -2109,6 +2114,12 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,

group_fd = get_group_fd(evsel, idx, thread);

+ if (group_fd == -2) {
+ pr_debug("broken group leader for %s\n", evsel->name);
+ err = -EINVAL;
+ goto out_close;
+ }
+
test_attr__ready();

/* Debug message used by test scripts */
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 35805dcdb1b9..bf8f01af1c0b 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -95,6 +95,7 @@ struct evsel {
bool weak_group;
bool bpf_counter;
bool use_config_name;
+ bool skippable;
int bpf_fd;
struct bpf_object *bpf_obj;
struct list_head config_terms;
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index e6035ecbeee8..6b46bbb3d322 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -810,6 +810,10 @@ static bool should_skip_zero_counter(struct perf_stat_config *config,
struct perf_cpu cpu;
int idx;

+ /* Skip counters that were speculatively/default enabled rather than requested. */
+ if (counter->skippable)
+ return true;
+
/*
* Skip value 0 when enabling --per-thread globally,
* otherwise it will have too many 0 output.
--
2.40.1.495.gc816e09b53d-goog
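
The -2 return value in the get_group_fd() hunk above works as a sentinel that separates "this event is its own leader" from "the leader was skipped and has no fd". A minimal sketch of that idea follows, assuming simplified `toy_*` types that are not perf's; the BUG_ON check from the patch is left out.

```c
#include <stdbool.h>

#define TOY_NO_LEADER     (-1)  /* event is its own leader: open without a group */
#define TOY_LEADER_BROKEN (-2)  /* leader was skipped, so it has no fd */

/* Simplified stand-in for struct evsel; one fd instead of a per-CPU/thread table. */
struct toy_evsel {
	struct toy_evsel *leader; /* points to itself for a group leader */
	int fd;                   /* -1 if never opened */
	bool skippable;
};

/* Return the fd to group under, or one of the two sentinels above. */
static int toy_group_fd(const struct toy_evsel *e)
{
	if (e->leader == e)
		return TOY_NO_LEADER;
	if (e->leader->fd == -1)
		return TOY_LEADER_BROKEN;
	return e->leader->fd;
}
```

A caller can then treat `TOY_LEADER_BROKEN` as an error for the group member, much as the evsel__open_cpu() hunk returns -EINVAL, without confusing it with the ordinary no-leader case.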

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
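
For readers following the evsel.c hunks above, the group-leader handling can be sketched in isolation. This is only a minimal sketch under stated assumptions, not the perf code: `struct toy_evsel`, `toy_get_group_fd()`, `toy_open_member()` and the `GROUP_FD_*` constants are hypothetical stand-ins, and `assert()` takes the place of `BUG_ON()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct evsel; only the fields this sketch needs. */
struct toy_evsel {
	struct toy_evsel *leader; /* points at itself for a group leader */
	int fd;                   /* -1 when the event was never opened */
	bool skippable;
};

/* Sentinels mirroring the patch: -1 means "no group", -2 means "leader skipped". */
enum { GROUP_FD_NONE = -1, GROUP_FD_SKIPPED = -2 };

static int toy_get_group_fd(const struct toy_evsel *evsel)
{
	const struct toy_evsel *leader = evsel->leader;

	assert(leader != NULL);
	if (leader == evsel)
		return GROUP_FD_NONE;

	/* An unopened leader is only tolerated when it was marked skippable. */
	assert(leader->fd != -1 || leader->skippable);

	return leader->fd == -1 ? GROUP_FD_SKIPPED : leader->fd;
}

/* A group member cannot be opened once its leader has been skipped. */
static int toy_open_member(const struct toy_evsel *evsel)
{
	if (toy_get_group_fd(evsel) == GROUP_FD_SKIPPED)
		return -EINVAL;
	return 0; /* the real code would go on to call perf_event_open() */
}
```

Using a distinct sentinel keeps the existing "no group leader" return value of -1 untouched, so callers that pass it straight through as the group fd behave as before.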

2023-04-27 00:44:42

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH v1 01/40] perf stat: Introduce skippable evsels

On Wed, Apr 26, 2023 at 4:26 PM Yasin, Ahmad <[email protected]> wrote:
>
> The output got needlessly lengthened with recent changes from Ian.
>
> These four metrics:
> # 14.5 % tma_retiring
> # 27.6 % tma_backend_bound
> # 40.9 % tma_frontend_bound
> # 17.0 % tma_bad_speculation
> better be appended on the right hand side of these four events (as current perf-stat does):
> 144,922 topdown-retiring:u
> 411,266 topdown-fe-bound:u
> 258,510 topdown-be-bound:u
> 184,090 topdown-bad-spec:u
>
> Also, I think we should not bother the default perf-stat users with the last two events:
> 2,585 INT_MISC.UOP_DROPPING:u # 4.528 M/sec
> 3,434 cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/u # 6.015 M/sec
>
> (yes, these are meant to increase accuracy of the previous tma_* level1 metrics, but the underlying events vary from model to model, e.g. SKL to ICL to SPR).
>
> Besides, I can think of better metrics to append on the top-most TMA event (TOPDOWN.SLOTS). tma_retiring does not belong there.
>
> Ahmad

Hi Ahmad,

when running perf without events and metrics you get a set of events
and metrics added for you. Previously this meant hard coded metrics
and these are just wrong. Their formulas are wrong, their thresholds
are wrong, they don't use groups correctly, they are inconsistent
between hybrid core types (appearing for one and not the other) and
there is more. Because hard coded metrics are implicit in the events
being enabled, gathering topdown level 3 metrics may cause topdown
level 1 hard coded metrics to appear. So if I generate CSV output I may
or may not get implicitly added columns from hard coded metrics. This
bug was introduced by Intel and we've
consistently requested a fix, including at our last face-to-face. On
top of this the hard coded metrics in Linux 6.3 use "saved values" for
aggregation and this is broken: counts are at least doubled in some
cases, meaning that depending on flags you get wildly wrong figures
for, for example, memory bandwidth. So everything is wrong in Linux 6.3 and
needed fixing - this is something that Intel have been testing since
February, for a bug introduced in October, but these complaints came
up 2 weeks ago. Note, ARM pointed out an issue with this patch series
in less than a day, which is awesome testing work!

The switch to json metrics is both documented in the code base as
being the desired route and also in presentations we have worked with
Intel on giving (over years). But anyway, the complaint keeps coming
back to the perf output when running without events or metrics. The
output of what are called "shadow stats" in perf is tied to the output
of events; for metrics it is tied to a "leading" event, i.e. the first
event parsed. For perf metric (cpu or cpu_core topdown-*) events there
is a requirement that the group leader is slots, and so when there are
topdown events in a metric it follows that the metrics are all output
with the slots event. We could look to change this behavior but why?
The complaint is that the output is hard to read, but it is hard to
read regardless of json metrics. In the meeting today, "slots" is a
classic example of what users don't understand. This is a separate
problem from fixing bugs in aggregation, hybrid, etc. The naming
decision to use "tma_bad_speculation" rather than "Bad Speculation" as
used in Linux 6.3 was one that I recall you personally approving.

Adding back the hard coded metrics and output for 6.4 is a regression:
- it will break CSV output when moving across architectures
- there is no testing unlike the json metrics that are tested as part
of project Valkyrie
- grouping, formulas, thresholds, etc. are broken.
- the only compelling reason could be to avoid hybrid crashes, but
that is something far better solved with this patch series.

I understand toplev is using this output for the timed pebs work. It
should not! There are no tests on the text output of perf, but there
are tests on the CSV and Json output - that's why they exist. Tools
should be using the CSV and Json output; if text were to be required
tool output then there should be tests on it. I've asked
for toplev to be added to the shell tests every time this has come up
in the past. It is far better to fix toplev not to use text output.
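
A minimal sketch of what consuming the CSV output could look like, assuming the `perf stat -x,` field order described in the perf-stat man page (counter value, unit, event name, then further columns) when no interval or per-CPU prefix columns are requested. `split_fields()`, `parse_stat_row()` and `struct stat_row` are hypothetical names for illustration, and a real tool would also need to handle the optional leading columns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FIELDS 16

/* Split `line` in place on `sep`, keeping empty fields (unlike strtok). */
static size_t split_fields(char *line, char sep, char *fields[], size_t max_fields)
{
	size_t n = 0;
	char *p = line;

	assert(line != NULL && fields != NULL);
	while (n < max_fields) {
		char *end = strchr(p, sep);

		fields[n++] = p;
		if (!end)
			break;
		*end = '\0';
		p = end + 1;
	}
	return n;
}

/* Hypothetical record for the first three CSV columns. */
struct stat_row {
	double value;
	const char *unit;  /* may be the empty string */
	const char *event;
};

/*
 * Returns 0 on success, -1 when the line has too few columns or the
 * value column is not a number (for example "<not supported>").
 * The row points into `line`, which is modified in place.
 */
static int parse_stat_row(char *line, struct stat_row *row)
{
	char *fields[MAX_FIELDS];
	char *endp;
	size_t n;

	line[strcspn(line, "\r\n")] = '\0'; /* drop any trailing newline */
	n = split_fields(line, ',', fields, MAX_FIELDS);
	if (n < 3)
		return -1;

	row->value = strtod(fields[0], &endp);
	if (endp == fields[0] || *endp != '\0')
		return -1;

	row->unit = fields[1];
	row->event = fields[2];
	return 0;
}
```

Keying on the event name column rather than on column positions in the human-readable text is the point: the CSV layout is what the perf tests cover.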

These changes are about fixing hybrid not about the output format of
the perf tool. At no point during 6.4 development has the json metric
output changed. If a tool, toplev, needs fixing because of this, that
is the fault of the tool, which was doing something wrong. Once
tools are not using the text output then we can fix it and make it
more human readable. There is no reason, for example, to include
events *at all* when displaying metrics. At the same time as cleaning
this up we can question whether IPC and branch miss rate should be
gathered, as these contribute to multiplexing - another reason to run
the tool with CSV or Json output with a named metric, metric group or
event.

I acknowledge your point but it is wrong. I can share today's slide
deck with you if it is useful; Sam said she would be following up
inside of Intel. The only proposed fix for 6.4 is to not enable the
TopdownL1 group on hybrid. This patch series is the proposed long-term
hybrid fix, intended for 6.5. Testing, reviews, etc. all very much
appreciated.

Thanks,
Ian

> -----Original Message-----
> From: Ian Rogers <[email protected]>
> Sent: Wednesday, April 26, 2023 10:00
> To: Arnaldo Carvalho de Melo <[email protected]>; Kan Liang <[email protected]>; Yasin, Ahmad <[email protected]>; Peter Zijlstra <[email protected]>; Ingo Molnar <[email protected]>; Eranian, Stephane <[email protected]>; Andi Kleen <[email protected]>; Taylor, Perry <[email protected]>; Alt, Samantha <[email protected]>; Biggers, Caleb <[email protected]>; Wang, Weilin <[email protected]>; Baker, Edward <[email protected]>; Mark Rutland <[email protected]>; Alexander Shishkin <[email protected]>; Jiri Olsa <[email protected]>; Namhyung Kim <[email protected]>; Hunter, Adrian <[email protected]>; Florian Fischer <[email protected]>; Rob Herring <[email protected]>; Zhengjun Xing <[email protected]>; John Garry <[email protected]>; Kajol Jain <[email protected]>; Sumanth Korikkar <[email protected]>; Thomas Richter <[email protected]>; Tiezhu Yang <[email protected]>; Ravi Bangoria <[email protected]>; Leo Yan <[email protected]>; Yang Jihong <[email protected]>; James Clark <[email protected]>; Suzuki Poulouse <[email protected]>; Kang Minchul <[email protected]>; Athira Rajeev <[email protected]>; [email protected]; [email protected]
> Cc: Ian Rogers <[email protected]>
> Subject: [PATCH v1 01/40] perf stat: Introduce skippable evsels
>
> Perf stat with no arguments will use default events and metrics. These events may fail to open even with kernel and hypervisor disabled. When they fail, the permissions error appears even though the events were implicitly selected. This is particularly a problem with the automatic selection of the TopdownL1 metric group on certain architectures like Skylake:
>
> ```
> $ perf stat true
> Error:
> Access to performance monitoring and observability operations is limited.
> Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open access to performance monitoring and observability operations for processes without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
> More information can be found at 'Perf events and tool security' document:
> https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
> perf_event_paranoid setting is 2:
>   -1: Allow use of (almost) all events by all users
>       Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
> >= 0: Disallow raw and ftrace function tracepoint access
> >= 1: Disallow CPU event access
> >= 2: Disallow kernel profiling
> To make the adjusted perf_event_paranoid setting permanent preserve it in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
> ```
>
> This patch adds skippable evsels: when they fail to open, perf stat doesn't fail and they don't appear in the output. The TopdownL1 events, from the metric group, are marked as skippable. This turns the failure above into:
>
> ```
> $ perf stat true
>
> Performance counter stats for 'true':
>
> 1.26 msec task-clock:u # 0.328 CPUs utilized
> 0 context-switches:u # 0.000 /sec
> 0 cpu-migrations:u # 0.000 /sec
> 49 page-faults:u # 38.930 K/sec
> 176,449 cycles:u # 0.140 GHz (48.99%)
> 122,905 instructions:u # 0.70 insn per cycle
> 28,264 branches:u # 22.456 M/sec
> 2,405 branch-misses:u # 8.51% of all branches
>
> 0.003834565 seconds time elapsed
>
> 0.000000000 seconds user
> 0.004130000 seconds sys
> ```
>
> When the events can have kernel/hypervisor disabled, like on Tigerlake, then it continues to succeed as:
>
> ```
> $ perf stat true
>
> Performance counter stats for 'true':
>
> 0.57 msec task-clock:u # 0.385 CPUs utilized
> 0 context-switches:u # 0.000 /sec
> 0 cpu-migrations:u # 0.000 /sec
> 47 page-faults:u # 82.329 K/sec
> 287,017 cycles:u # 0.503 GHz
> 133,318 instructions:u # 0.46 insn per cycle
> 31,396 branches:u # 54.996 M/sec
> 2,442 branch-misses:u # 7.78% of all branches
> 998,790 TOPDOWN.SLOTS:u # 14.5 % tma_retiring
> # 27.6 % tma_backend_bound
> # 40.9 % tma_frontend_bound
> # 17.0 % tma_bad_speculation
> 144,922 topdown-retiring:u
> 411,266 topdown-fe-bound:u
> 258,510 topdown-be-bound:u
> 184,090 topdown-bad-spec:u
> 2,585 INT_MISC.UOP_DROPPING:u # 4.528 M/sec
> 3,434 cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/u # 6.015 M/sec
>
> 0.001480954 seconds time elapsed
>
> 0.000000000 seconds user
> 0.001686000 seconds sys
> ```
>
> And this likewise works if paranoia allows or running as root.
>
> Signed-off-by: Ian Rogers <[email protected]>
> ---
> tools/perf/builtin-stat.c | 39 ++++++++++++++++++++++++++--------
> tools/perf/util/evsel.c | 15 +++++++++++--
> tools/perf/util/evsel.h | 1 +
> tools/perf/util/stat-display.c | 4 ++++
> 4 files changed, 48 insertions(+), 11 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index efda63f6bf32..eb34f5418ad3 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -667,6 +667,13 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
> evsel_list->core.threads->err_thread = -1;
> return COUNTER_RETRY;
> }
> + } else if (counter->skippable) {
> + if (verbose > 0)
> + ui__warning("skipping event %s that kernel failed to open .\n",
> + evsel__name(counter));
> + counter->supported = false;
> + counter->errored = true;
> + return COUNTER_SKIP;
> }
>
> evsel__open_strerror(counter, &target, errno, msg, sizeof(msg));
> @@ -1885,15 +1892,29 @@ static int add_default_attributes(void)
> * Add TopdownL1 metrics if they exist. To minimize
> * multiplexing, don't request threshold computation.
> */
> - if (metricgroup__has_metric("TopdownL1") &&
> - metricgroup__parse_groups(evsel_list, "TopdownL1",
> - /*metric_no_group=*/false,
> - /*metric_no_merge=*/false,
> - /*metric_no_threshold=*/true,
> - stat_config.user_requested_cpu_list,
> - stat_config.system_wide,
> - &stat_config.metric_events) < 0)
> - return -1;
> + if (metricgroup__has_metric("TopdownL1")) {
> + struct evlist *metric_evlist = evlist__new();
> + struct evsel *metric_evsel;
> +
> + if (!metric_evlist)
> + return -1;
> +
> + if (metricgroup__parse_groups(metric_evlist, "TopdownL1",
> + /*metric_no_group=*/false,
> + /*metric_no_merge=*/false,
> + /*metric_no_threshold=*/true,
> + stat_config.user_requested_cpu_list,
> + stat_config.system_wide,
> + &stat_config.metric_events) < 0)
> + return -1;
> +
> + evlist__for_each_entry(metric_evlist, metric_evsel) {
> + metric_evsel->skippable = true;
> + }
> + evlist__splice_list_tail(evsel_list, &metric_evlist->core.entries);
> + evlist__delete(metric_evlist);
> + }
> +
> /* Platform specific attrs */
> if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
> return -1;
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 356c07f03be6..1cd04b5998d2 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -290,6 +290,7 @@ void evsel__init(struct evsel *evsel,
> evsel->per_pkg_mask = NULL;
> evsel->collect_stat = false;
> evsel->pmu_name = NULL;
> + evsel->skippable = false;
> }
>
> struct evsel *evsel__new_idx(struct perf_event_attr *attr, int idx)
> @@ -1725,9 +1726,13 @@ static int get_group_fd(struct evsel *evsel, int cpu_map_idx, int thread)
> return -1;
>
> fd = FD(leader, cpu_map_idx, thread);
> - BUG_ON(fd == -1);
> + BUG_ON(fd == -1 && !leader->skippable);
>
> - return fd;
> + /*
> + * When the leader has been skipped, return -2 to distinguish from no
> + * group leader case.
> + */
> + return fd == -1 ? -2 : fd;
> }
>
> static void evsel__remove_fd(struct evsel *pos, int nr_cpus, int nr_threads, int thread_idx)
> @@ -2109,6 +2114,12 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
>
> group_fd = get_group_fd(evsel, idx, thread);
>
> + if (group_fd == -2) {
> + pr_debug("broken group leader for %s\n", evsel->name);
> + err = -EINVAL;
> + goto out_close;
> + }
> +
> test_attr__ready();
>
> /* Debug message used by test scripts */
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 35805dcdb1b9..bf8f01af1c0b 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -95,6 +95,7 @@ struct evsel {
> bool weak_group;
> bool bpf_counter;
> bool use_config_name;
> + bool skippable;
> int bpf_fd;
> struct bpf_object *bpf_obj;
> struct list_head config_terms;
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index e6035ecbeee8..6b46bbb3d322 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -810,6 +810,10 @@ static bool should_skip_zero_counter(struct perf_stat_config *config,
> struct perf_cpu cpu;
> int idx;
>
> + /* Skip counters that were speculatively/default enabled rather than requested. */
> + if (counter->skippable)
> + return true;
> +
> /*
> * Skip value 0 when enabling --per-thread globally,
> * otherwise it will have too many 0 output.
> --
> 2.40.1.495.gc816e09b53d-goog
>

2023-04-27 02:12:59

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH v1 01/40] perf stat: Introduce skippable evsels

Also Ahmad, note this series fixes the issue you reported to me of PMU
scanning at perf start up. It was done as a way to better tokenize
events, but the scanning is avoided whenever possible (wildcard event
names still scan PMUs, as they are wildcards; avoid the scanning by
naming a PMU, as in pmu/event/). It also fixes json metrics, including
topdown, on hybrid architectures and non-hybrid ones like AlderlakeN.
No perf output was changed in this series of patches except when the
previous output was a breakage.

Thanks,
Ian


2023-04-27 05:56:22

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH v1 21/40] perf parse-events: Wildcard legacy cache events

On Wed, Apr 26, 2023 at 3:11 AM James Clark <[email protected]> wrote:
>
>
>
> On 26/04/2023 08:00, Ian Rogers wrote:
> > It is inconsistent that "perf stat -e instructions-retired" wildcard
> > opens on all PMUs while legacy cache events like "perf stat -e
> > L1-dcache-load-miss" do not. A behavior introduced by hybrid is that a
> > legacy cache event like L1-dcache-load-miss should wildcard open on
> > all hybrid PMUs. A call to is_event_supported is necessary for each
> > PMU, a failure of which results in the event not being added. Rather
> > than special case that logic, move it into the main legacy cache event
> > case and attempt to open legacy cache events on all PMUs.
> >
> > Signed-off-by: Ian Rogers <[email protected]>
> > ---
> > tools/perf/util/parse-events-hybrid.c | 33 -------------
> > tools/perf/util/parse-events-hybrid.h | 7 ---
> > tools/perf/util/parse-events.c | 70 ++++++++++++++-------------
> > tools/perf/util/parse-events.h | 3 +-
> > tools/perf/util/parse-events.y | 2 +-
> > 5 files changed, 39 insertions(+), 76 deletions(-)
> >
> > diff --git a/tools/perf/util/parse-events-hybrid.c b/tools/perf/util/parse-events-hybrid.c
> > index 7c9f9150bad5..d2c0be051d46 100644
> > --- a/tools/perf/util/parse-events-hybrid.c
> > +++ b/tools/perf/util/parse-events-hybrid.c
> > @@ -179,36 +179,3 @@ int parse_events__add_numeric_hybrid(struct parse_events_state *parse_state,
> > return add_raw_hybrid(parse_state, list, attr, name, metric_id,
> > config_terms);
> > }
> > -
> > -int parse_events__add_cache_hybrid(struct list_head *list, int *idx,
> > - struct perf_event_attr *attr,
> > - const char *name,
> > - const char *metric_id,
> > - struct list_head *config_terms,
> > - bool *hybrid,
> > - struct parse_events_state *parse_state)
> > -{
> > - struct perf_pmu *pmu;
> > - int ret;
> > -
> > - *hybrid = false;
> > - if (!perf_pmu__has_hybrid())
> > - return 0;
> > -
> > - *hybrid = true;
> > - perf_pmu__for_each_hybrid_pmu(pmu) {
> > - LIST_HEAD(terms);
> > -
> > - if (pmu_cmp(parse_state, pmu))
> > - continue;
> > -
> > - copy_config_terms(&terms, config_terms);
> > - ret = create_event_hybrid(PERF_TYPE_HW_CACHE, idx, list,
> > - attr, name, metric_id, &terms, pmu);
> > - free_config_terms(&terms);
> > - if (ret)
> > - return ret;
> > - }
> > -
> > - return 0;
> > -}
> > diff --git a/tools/perf/util/parse-events-hybrid.h b/tools/perf/util/parse-events-hybrid.h
> > index cbc05fec02a2..bc2966e73897 100644
> > --- a/tools/perf/util/parse-events-hybrid.h
> > +++ b/tools/perf/util/parse-events-hybrid.h
> > @@ -15,11 +15,4 @@ int parse_events__add_numeric_hybrid(struct parse_events_state *parse_state,
> > struct list_head *config_terms,
> > bool *hybrid);
> >
> > -int parse_events__add_cache_hybrid(struct list_head *list, int *idx,
> > - struct perf_event_attr *attr,
> > - const char *name, const char *metric_id,
> > - struct list_head *config_terms,
> > - bool *hybrid,
> > - struct parse_events_state *parse_state);
> > -
> > #endif /* __PERF_PARSE_EVENTS_HYBRID_H */
> > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> > index 9b2d7b6572c2..e007b2bc1ab4 100644
> > --- a/tools/perf/util/parse-events.c
> > +++ b/tools/perf/util/parse-events.c
> > @@ -471,46 +471,50 @@ static int parse_events__decode_legacy_cache(const char *name, int pmu_type, __u
> >
> > int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
> > struct parse_events_error *err,
> > - struct list_head *head_config,
> > - struct parse_events_state *parse_state)
> > + struct list_head *head_config)
> > {
> > - struct perf_event_attr attr;
> > - LIST_HEAD(config_terms);
> > - const char *config_name, *metric_id;
> > - int ret;
> > - bool hybrid;
> > + struct perf_pmu *pmu = NULL;
> > + bool found_supported = false;
> > + const char *config_name = get_config_name(head_config);
> > + const char *metric_id = get_config_metric_id(head_config);
> >
> > + while ((pmu = perf_pmu__scan(pmu)) != NULL) {
> > + LIST_HEAD(config_terms);
> > + struct perf_event_attr attr;
> > + int ret;
> >
> > - memset(&attr, 0, sizeof(attr));
> > - attr.type = PERF_TYPE_HW_CACHE;
> > - ret = parse_events__decode_legacy_cache(name, /*pmu_type=*/0, &attr.config);
> > - if (ret)
> > - return ret;
> > + /*
> > + * Skip uncore PMUs for performance. Software PMUs can open
> > + * PERF_TYPE_HW_CACHE, so skip.
> > + */
> > + if (pmu->is_uncore || pmu->type == PERF_TYPE_SOFTWARE)
> > + continue;
> >
> > - if (head_config) {
> > - if (config_attr(&attr, head_config, err,
> > - config_term_common))
> > - return -EINVAL;
> > + memset(&attr, 0, sizeof(attr));
> > + attr.type = PERF_TYPE_HW_CACHE;
> >
> > - if (get_config_terms(head_config, &config_terms))
> > - return -ENOMEM;
> > - }
> > + ret = parse_events__decode_legacy_cache(name, pmu->type, &attr.config);
> > + if (ret)
> > + return ret;
> >
> > - config_name = get_config_name(head_config);
> > - metric_id = get_config_metric_id(head_config);
> > - ret = parse_events__add_cache_hybrid(list, idx, &attr,
> > - config_name ? : name,
> > - metric_id,
> > - &config_terms,
> > - &hybrid, parse_state);
> > - if (hybrid)
> > - goto out_free_terms;
> > + if (!is_event_supported(PERF_TYPE_HW_CACHE, attr.config))
> > + continue;
>
> Hi Ian,
>
> I get a test failure on Arm from this commit. I think it's related to
> this check for support that's failing but I'm not sure what the
> resolution should be.

Yes, I brought in a behavior from hybrid: fail at parse time if a
legacy cache event isn't supported. The issue is that perf_event_open
may fail because of permissions. I think we probably need to special
case that and allow the parsing to succeed, otherwise tests like this
will need to skip. I naively tested on a Raspberry Pi, which has no
metrics, so I'll try again tomorrow on a Neoverse.
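
As a rough sketch of what such a special case could look like (not the code in this series): a permission error from the trial open is treated as "cannot tell" rather than "unsupported", so parsing can continue. The enum and the function name are hypothetical and do not exist in perf.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical result of probing an event with a trial open. */
enum probe_result {
	PROBE_SUPPORTED,	/* the trial open succeeded */
	PROBE_UNKNOWN,		/* blocked by permissions; the event may still exist */
	PROBE_UNSUPPORTED,	/* the kernel rejected the event itself */
};

/*
 * Classify the outcome of a trial open. 'fd' is the value returned by the
 * open attempt and 'err' is the errno captured right after it failed.
 */
enum probe_result classify_probe(int fd, int err)
{
	if (fd >= 0)
		return PROBE_SUPPORTED;

	/* A failed open must come with a non-zero errno. */
	assert(err != 0);

	/* Permission failures say nothing about whether the event exists. */
	if (err == EACCES || err == EPERM)
		return PROBE_UNKNOWN;

	return PROBE_UNSUPPORTED;
}
```

A parser built on this would add the event for both PROBE_SUPPORTED and PROBE_UNKNOWN and only drop it for PROBE_UNSUPPORTED.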

> I also couldn't see why the metrics in
> test_soc/cpu/metrics.json aren't run on x86 (assuming they're generic
> 'test anywhere' type metrics?).

The testing code is split into a bunch of places for historical
reasons, but the test_soc is here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/tests/pmu-events.c?h=v6.3#n1031
'''
$ gdb --args perf test -vv -F 10
(gdb) b test__pmu_event_table
Breakpoint 1 at 0x199d7c: file tests/pmu-events.c, line 467.
(gdb) r
Starting program: /tmp/perf/perf test -vv -F 10
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
10: PMU events :
10.1: PMU event table sanity :
--- start ---

Breakpoint 1, test__pmu_event_table (test=0x5555560bd080 <suite.pmu_events>, subtest=0) at tests/pmu-events.c:467
467 find_sys_events_table("pmu_events__test_soc_sys");
'''

Something I observed is that tests/parse-events.c isn't testing
against an ARM PMU and so skips a lot of testing. There should likely
be a helper so that the string in that test can be dependent on the
test platform. I worry this may expose some latent ARM issues with
things like obscure modifiers.

Thanks,
Ian

> $ perf test -vvv "parsing of PMU event table metrics with fake"
> ...
> parsing 'dcache_miss_cpi': 'l1d\-loads\-misses / inst_retired.any'
> parsing metric: l1d\-loads\-misses / inst_retired.any
> Attempting to add event pmu 'inst_retired.any' with
> 'inst_retired.any,' that may result in non-fatal errors
> After aliases, add event pmu 'inst_retired.any' with
> 'inst_retired.any,' that may result in non-fatal errors
> inst_retired.any -> fake_pmu/inst_retired.any/
> ------------------------------------------------------------
> perf_event_attr:
> type 3
> config 0x800010000
> disabled 1
> ------------------------------------------------------------
> sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8
> sys_perf_event_open failed, error -2
>
> check_parse_fake failed
> test child finished with -1
> ---- end ----
> PMU events subtest 4: FAILED!
>
> >
> > - ret = add_event(list, idx, &attr, config_name ? : name, metric_id,
> > - &config_terms);
> > -out_free_terms:
> > - free_config_terms(&config_terms);
> > - return ret;
> > + found_supported = true;
> > +
> > + if (head_config) {
> > + if (config_attr(&attr, head_config, err,
> > + config_term_common))
> > + return -EINVAL;
> > +
> > + if (get_config_terms(head_config, &config_terms))
> > + return -ENOMEM;
> > + }
> > +
> > + ret = add_event(list, idx, &attr, config_name ? : name, metric_id, &config_terms);
> > + free_config_terms(&config_terms);
> > + }
> > + return found_supported ? 0: -EINVAL;
> > }
> >
> > #ifdef HAVE_LIBTRACEEVENT
> > diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
> > index 5acb62c2e00a..0c26303f7f63 100644
> > --- a/tools/perf/util/parse-events.h
> > +++ b/tools/perf/util/parse-events.h
> > @@ -172,8 +172,7 @@ int parse_events_add_tool(struct parse_events_state *parse_state,
> > int tool_event);
> > int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
> > struct parse_events_error *error,
> > - struct list_head *head_config,
> > - struct parse_events_state *parse_state);
> > + struct list_head *head_config);
> > int parse_events_add_breakpoint(struct list_head *list, int *idx,
> > u64 addr, char *type, u64 len);
> > int parse_events_add_pmu(struct parse_events_state *parse_state,
> > diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
> > index f84fa1b132b3..cc7528558845 100644
> > --- a/tools/perf/util/parse-events.y
> > +++ b/tools/perf/util/parse-events.y
> > @@ -476,7 +476,7 @@ PE_LEGACY_CACHE opt_event_config
> >
> > list = alloc_list();
> > ABORT_ON(!list);
> > - err = parse_events_add_cache(list, &parse_state->idx, $1, error, $2, parse_state);
> > + err = parse_events_add_cache(list, &parse_state->idx, $1, error, $2);
> >
> > parse_events_terms__delete($2);
> > free($1);

2023-04-27 19:01:53

by Liang, Kan

Subject: Re: [PATCH v1 01/40] perf stat: Introduce skippable evsels



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> Perf stat with no arguments will use default events and metrics. These
> events may fail to open even with kernel and hypervisor disabled. When
> these fail then the permissions error appears even though they were
> implicitly selected. This is particularly a problem with the automatic
> selection of the TopdownL1 metric group on certain architectures like
> Skylake:
>
> ```
> $ perf stat true
> Error:
> Access to performance monitoring and observability operations is limited.
> Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
> access to performance monitoring and observability operations for processes
> without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
> More information can be found at 'Perf events and tool security' document:
> https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
> perf_event_paranoid setting is 2:
> -1: Allow use of (almost) all events by all users
> Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
> >= 0: Disallow raw and ftrace function tracepoint access
> >= 1: Disallow CPU event access
> >= 2: Disallow kernel profiling
> To make the adjusted perf_event_paranoid setting permanent preserve it
> in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
> ```
>
> This patch adds skippable evsels that when they fail to open won't
> fail and won't appear in output. The TopdownL1 events, from the metric
> group, are marked as skippable. This turns the failure above to:
>
> ```
> $ perf stat true
>
> Performance counter stats for 'true':
>
> 1.26 msec task-clock:u # 0.328 CPUs utilized
> 0 context-switches:u # 0.000 /sec
> 0 cpu-migrations:u # 0.000 /sec
> 49 page-faults:u # 38.930 K/sec
> 176,449 cycles:u # 0.140 GHz (48.99%)
> 122,905 instructions:u # 0.70 insn per cycle
> 28,264 branches:u # 22.456 M/sec
> 2,405 branch-misses:u # 8.51% of all branches
>
> 0.003834565 seconds time elapsed
>
> 0.000000000 seconds user
> 0.004130000 seconds sys
> ```

If the same command is run with root permission, the output is
different, as shown below:

$ sudo ./perf stat sleep 1

Performance counter stats for 'sleep 1':

     0.97 msec task-clock                   #    0.001 CPUs utilized
            1 context-switches              #    1.030 K/sec
            0 cpu-migrations                #    0.000 /sec
           67 page-faults                   #   69.043 K/sec
    1,135,552 cycles                        #    1.170 GHz          (50.51%)
    1,126,446 instructions                  #    0.99  insn per cycle
      252,904 branches                      #  260.615 M/sec
        7,297 branch-misses                 #    2.89% of all branches
       22,518 CPU_CLK_UNHALTED.REF_XCLK     #   23.205 M/sec
56,994 INT_MISC.RECOVERY_CYCLES_ANY # 58.732 M/sec

The last two events are useless.


Relying on perf_event_open()/the kernel to tell whether an event is
available or skippable isn't reliable. The kernel doesn't validate a
specific event.

The patch only works in non-root mode because the event requires root
permission, so the kernel rejects it for lack of permission. If the
same command runs with root privileges, the junk events are printed as
above.

I think a better way is to check the HW capability and decide whether
to append the TopdownL1 metrics.

https://lore.kernel.org/lkml/[email protected]/
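
To illustrate the general idea of checking a capability up front instead of inferring it from a failed open, here is a minimal sketch. It assumes the capability can be detected by the presence of an event file under the PMU's sysfs events directory; that assumption, the function name, and the choice of which event to look for are all illustrative and are not taken from the linked patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical capability check: report whether a PMU advertises an event
 * in sysfs. 'sysfs' is the mount point, normally "/sys".
 */
bool pmu_advertises_event(const char *sysfs, const char *pmu, const char *event)
{
	char path[4096];
	int len;

	assert(sysfs != NULL && pmu != NULL && event != NULL);
	len = snprintf(path, sizeof(path), "%s/bus/event_source/devices/%s/events/%s",
		       sysfs, pmu, event);
	if (len < 0 || (size_t)len >= sizeof(path))
		return false;

	/* F_OK only asks whether the file exists, not whether it is readable. */
	return access(path, F_OK) == 0;
}
```

With a check like this, the default metric group would only be appended when the PMU advertises the events it depends on, and the result would not change between root and non-root runs.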


Thanks,
Kan


>
> When the events can have kernel/hypervisor disabled, like on
> Tigerlake, then it continues to succeed as:
>
> ```
> $ perf stat true
>
> Performance counter stats for 'true':
>
> 0.57 msec task-clock:u # 0.385 CPUs utilized
> 0 context-switches:u # 0.000 /sec
> 0 cpu-migrations:u # 0.000 /sec
> 47 page-faults:u # 82.329 K/sec
> 287,017 cycles:u # 0.503 GHz
> 133,318 instructions:u # 0.46 insn per cycle
> 31,396 branches:u # 54.996 M/sec
> 2,442 branch-misses:u # 7.78% of all branches
> 998,790 TOPDOWN.SLOTS:u # 14.5 % tma_retiring
> # 27.6 % tma_backend_bound
> # 40.9 % tma_frontend_bound
> # 17.0 % tma_bad_speculation
> 144,922 topdown-retiring:u
> 411,266 topdown-fe-bound:u
> 258,510 topdown-be-bound:u
> 184,090 topdown-bad-spec:u
> 2,585 INT_MISC.UOP_DROPPING:u # 4.528 M/sec
> 3,434 cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/u # 6.015 M/sec
>
> 0.001480954 seconds time elapsed
>
> 0.000000000 seconds user
> 0.001686000 seconds sys
> ```
>
> And this likewise works if paranoia allows or running as root.
>
> Signed-off-by: Ian Rogers <[email protected]>
> ---
> tools/perf/builtin-stat.c | 39 ++++++++++++++++++++++++++--------
> tools/perf/util/evsel.c | 15 +++++++++++--
> tools/perf/util/evsel.h | 1 +
> tools/perf/util/stat-display.c | 4 ++++
> 4 files changed, 48 insertions(+), 11 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index efda63f6bf32..eb34f5418ad3 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -667,6 +667,13 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
> evsel_list->core.threads->err_thread = -1;
> return COUNTER_RETRY;
> }
> + } else if (counter->skippable) {
> + if (verbose > 0)
> + ui__warning("skipping event %s that kernel failed to open .\n",
> + evsel__name(counter));
> + counter->supported = false;
> + counter->errored = true;
> + return COUNTER_SKIP;
> }
>
> evsel__open_strerror(counter, &target, errno, msg, sizeof(msg));
> @@ -1885,15 +1892,29 @@ static int add_default_attributes(void)
> * Add TopdownL1 metrics if they exist. To minimize
> * multiplexing, don't request threshold computation.
> */
> - if (metricgroup__has_metric("TopdownL1") &&
> - metricgroup__parse_groups(evsel_list, "TopdownL1",
> - /*metric_no_group=*/false,
> - /*metric_no_merge=*/false,
> - /*metric_no_threshold=*/true,
> - stat_config.user_requested_cpu_list,
> - stat_config.system_wide,
> - &stat_config.metric_events) < 0)
> - return -1;
> + if (metricgroup__has_metric("TopdownL1")) {
> + struct evlist *metric_evlist = evlist__new();
> + struct evsel *metric_evsel;
> +
> + if (!metric_evlist)
> + return -1;
> +
> + if (metricgroup__parse_groups(metric_evlist, "TopdownL1",
> + /*metric_no_group=*/false,
> + /*metric_no_merge=*/false,
> + /*metric_no_threshold=*/true,
> + stat_config.user_requested_cpu_list,
> + stat_config.system_wide,
> + &stat_config.metric_events) < 0)
> + return -1;
> +
> + evlist__for_each_entry(metric_evlist, metric_evsel) {
> + metric_evsel->skippable = true;
> + }
> + evlist__splice_list_tail(evsel_list, &metric_evlist->core.entries);
> + evlist__delete(metric_evlist);
> + }
> +
> /* Platform specific attrs */
> if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
> return -1;
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 356c07f03be6..1cd04b5998d2 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -290,6 +290,7 @@ void evsel__init(struct evsel *evsel,
> evsel->per_pkg_mask = NULL;
> evsel->collect_stat = false;
> evsel->pmu_name = NULL;
> + evsel->skippable = false;
> }
>
> struct evsel *evsel__new_idx(struct perf_event_attr *attr, int idx)
> @@ -1725,9 +1726,13 @@ static int get_group_fd(struct evsel *evsel, int cpu_map_idx, int thread)
> return -1;
>
> fd = FD(leader, cpu_map_idx, thread);
> - BUG_ON(fd == -1);
> + BUG_ON(fd == -1 && !leader->skippable);
>
> - return fd;
> + /*
> + * When the leader has been skipped, return -2 to distinguish from no
> + * group leader case.
> + */
> + return fd == -1 ? -2 : fd;
> }
>
> static void evsel__remove_fd(struct evsel *pos, int nr_cpus, int nr_threads, int thread_idx)
> @@ -2109,6 +2114,12 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
>
> group_fd = get_group_fd(evsel, idx, thread);
>
> + if (group_fd == -2) {
> + pr_debug("broken group leader for %s\n", evsel->name);
> + err = -EINVAL;
> + goto out_close;
> + }
> +
> test_attr__ready();
>
> /* Debug message used by test scripts */
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 35805dcdb1b9..bf8f01af1c0b 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -95,6 +95,7 @@ struct evsel {
> bool weak_group;
> bool bpf_counter;
> bool use_config_name;
> + bool skippable;
> int bpf_fd;
> struct bpf_object *bpf_obj;
> struct list_head config_terms;
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index e6035ecbeee8..6b46bbb3d322 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -810,6 +810,10 @@ static bool should_skip_zero_counter(struct perf_stat_config *config,
> struct perf_cpu cpu;
> int idx;
>
> + /* Skip counters that were speculatively/default enabled rather than requested. */
> + if (counter->skippable)
> + return true;
> +
> /*
> * Skip value 0 when enabling --per-thread globally,
> * otherwise it will have too many 0 output.

2023-04-27 19:07:22

by Liang, Kan

Subject: Re: [PATCH v1 03/40] perf vendor events intel: Add icelake metric constraints



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> Previously these constraints were disabled as they contained topdown
> events. Since:
> https://lore.kernel.org/all/[email protected]/
> the topdown events are correctly grouped even if no group exists.
>
> This change was created by PR:
> https://github.com/intel/perfmon/pull/71
>
> Signed-off-by: Ian Rogers <[email protected]>
> ---
> .../perf/pmu-events/arch/x86/icelake/icl-metrics.json | 11 +++++++++++

Since the series targets fixing the hybrid issues, could you please move
the unrelated patches out of it? A huge series is really hard to
review.


Thanks,
Kan

> 1 file changed, 11 insertions(+)
>
> diff --git a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
> index f45ae3483df4..cb58317860ea 100644
> --- a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
> +++ b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
> @@ -311,6 +311,7 @@
> },
> {
> "BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired)",
> + "MetricConstraint": "NO_GROUP_EVENTS",
> "MetricExpr": "tma_x87_use + tma_fp_scalar + tma_fp_vector",
> "MetricGroup": "HPC;TopdownL3;tma_L3_group;tma_light_operations_group",
> "MetricName": "tma_fp_arith",
> @@ -413,6 +414,7 @@
> },
> {
> "BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)",
> + "MetricConstraint": "NO_GROUP_EVENTS",
> "MetricExpr": "(tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) * tma_info_slots / BR_MISP_RETIRED.ALL_BRANCHES",
> "MetricGroup": "Bad;BrMispredicts;tma_issueBM",
> "MetricName": "tma_info_branch_misprediction_cost",
> @@ -458,6 +460,7 @@
> },
> {
> "BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts",
> + "MetricConstraint": "NO_GROUP_EVENTS",
> "MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization if tma_core_bound < tma_ports_utilization else 1) if tma_info_smt_2t_utilization > 0.5 else 0)",
> "MetricGroup": "Cor;SMT",
> "MetricName": "tma_info_core_bound_likely",
> @@ -510,6 +513,7 @@
> },
> {
> "BriefDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck",
> + "MetricConstraint": "NO_GROUP_EVENTS",
> "MetricExpr": "100 * (tma_fetch_latency * tma_dsb_switches / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_mite / (tma_dsb + tma_lsd + tma_mite))",
> "MetricGroup": "DSBmiss;Fed;tma_issueFB",
> "MetricName": "tma_info_dsb_misses",
> @@ -591,6 +595,7 @@
> },
> {
> "BriefDescription": "Total pipeline cost of instruction fetch bandwidth related bottlenecks",
> + "MetricConstraint": "NO_GROUP_EVENTS",
> "MetricExpr": "100 * (tma_frontend_bound - tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_info_big_code",
> "MetricGroup": "Fed;FetchBW;Frontend",
> "MetricName": "tma_info_instruction_fetch_bw",
> @@ -929,6 +934,7 @@
> },
> {
> "BriefDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)",
> + "MetricConstraint": "NO_GROUP_EVENTS",
> "MetricExpr": "100 * tma_memory_bound * (tma_l1_bound / max(tma_memory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))",
> "MetricGroup": "Mem;MemoryTLB;Offcore;tma_issueTLB",
> "MetricName": "tma_info_memory_data_tlbs",
> @@ -937,6 +943,7 @@
> },
> {
> "BriefDescription": "Total pipeline cost of Memory Latency related bottlenecks (external memory and off-core caches)",
> + "MetricConstraint": "NO_GROUP_EVENTS",
> "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound))",
> "MetricGroup": "Mem;MemoryLat;Offcore;tma_issueLat",
> "MetricName": "tma_info_memory_latency",
> @@ -945,6 +952,7 @@
> },
> {
> "BriefDescription": "Total pipeline cost of Branch Misprediction related bottlenecks",
> + "MetricConstraint": "NO_GROUP_EVENTS",
> "MetricExpr": "100 * (tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))",
> "MetricGroup": "Bad;BadSpec;BrMispredicts;tma_issueBM",
> "MetricName": "tma_info_mispredictions",
> @@ -996,6 +1004,7 @@
> },
> {
> "BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
> + "MetricConstraint": "NO_GROUP_EVENTS",
> "MetricExpr": "tma_retiring * tma_info_slots / cpu@UOPS_RETIRED.SLOTS\\,cmask\\=1@",
> "MetricGroup": "Pipeline;Ret",
> "MetricName": "tma_info_retire"
> @@ -1196,6 +1205,7 @@
> },
> {
> "BriefDescription": "This metric represents fraction of slots where the CPU was retiring memory operations -- uops for memory load or store accesses.",
> + "MetricConstraint": "NO_GROUP_EVENTS",
> "MetricExpr": "tma_light_operations * MEM_INST_RETIRED.ANY / INST_RETIRED.ANY",
> "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
> "MetricName": "tma_memory_operations",
> @@ -1266,6 +1276,7 @@
> },
> {
> "BriefDescription": "This metric represents the remaining light uops fraction the CPU has executed - remaining means not covered by other sibling nodes",
> + "MetricConstraint": "NO_GROUP_EVENTS",
> "MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_memory_operations + tma_branch_instructions + tma_nop_instructions))",
> "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
> "MetricName": "tma_other_light_ops",

2023-04-27 19:19:05

by Liang, Kan

Subject: Re: [PATCH v1 07/40] perf stat: Avoid segv on counter->name



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> Switch to use evsel__name that doesn't return NULL for hardware and
> similar events.

This one should be backported to 6.4. It helps fix the segmentation
fault in default mode.
https://lore.kernel.org/lkml/[email protected]/
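
For readers following along, the hazard is passing a NULL name field to strncmp(). A cut-down sketch of the pattern follows; the struct, the function names, and the "unknown" fallback are hypothetical stand-ins, not perf's actual evsel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for perf's struct evsel. */
struct counter {
	const char *name;	/* may be NULL when no explicit name was set */
	const char *pmu_name;	/* may be NULL for events without a PMU */
};

/* Accessor that never returns NULL, in the spirit of evsel__name(). */
const char *counter_name(const struct counter *c)
{
	assert(c != NULL);
	return c->name ? c->name : "unknown";
}

/* True when the displayed name already starts with the PMU name. */
bool name_has_pmu_prefix(const struct counter *c)
{
	if (!c->pmu_name)
		return false;
	/* Passing c->name directly here would hand NULL to strncmp(). */
	return strncmp(counter_name(c), c->pmu_name, strlen(c->pmu_name)) == 0;
}
```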


> Signed-off-by: Ian Rogers <[email protected]>

Reviewed-by: Kan Liang <[email protected]>

Thanks,
Kan

> ---
> tools/perf/util/stat-display.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 6b46bbb3d322..71dd6cb83918 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -747,7 +747,7 @@ static void uniquify_event_name(struct evsel *counter)
> int ret = 0;
>
> if (counter->uniquified_name || counter->use_config_name ||
> - !counter->pmu_name || !strncmp(counter->name, counter->pmu_name,
> + !counter->pmu_name || !strncmp(evsel__name(counter), counter->pmu_name,
> strlen(counter->pmu_name)))
> return;
>

2023-04-27 19:35:24

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH v1 07/40] perf stat: Avoid segv on counter->name

Em Thu, Apr 27, 2023 at 03:11:49PM -0400, Liang, Kan escreveu:
> On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> > Switch to use evsel__name that doesn't return NULL for hardware and
> > similar events.

> This one should be backported to 6.4. It helps fix the segmentation
> fault in default mode.
> https://lore.kernel.org/lkml/[email protected]/

> > Signed-off-by: Ian Rogers <[email protected]>

> Reviewed-by: Kan Liang <[email protected]>

Thanks,

I'll pick those before pushing to Linus, I'm mostly waiting for you guys
to come to a resolution on moving forward, which I hope has been reached
with that patch you sent.

- Arnaldo

2023-04-27 19:39:26

by Liang, Kan

Subject: Re: [PATCH v1 08/40] perf test: Test more sysfs events



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> Parse events for all PMUs, and not just cpu, in test "Parsing of all
> PMU events from sysfs".
>
> Signed-off-by: Ian Rogers <[email protected]>

I ran the test on Cascade Lake and Alder Lake. It looks good.

Tested-by: Kan Liang <[email protected]>

Thanks,
Kan
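
The per-PMU event string the test builds can be sketched in isolation as below. The helper name and its return convention are hypothetical; the format string and the rule of skipping names containing '.' follow the quoted patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<pmu>/event=<name>/u" into buf. Returns 0 on success, 1 when the
 * name should be skipped (it contains '.'), and -1 if buf is too small.
 */
int build_pmu_event(char *buf, size_t size, const char *pmu, const char *event)
{
	int len;

	assert(buf != NULL && pmu != NULL && event != NULL);

	/* Names containing '.' are special and cannot be used directly. */
	if (strchr(event, '.'))
		return 1;

	len = snprintf(buf, size, "%s/event=%s/u", pmu, event);
	if (len < 0 || (size_t)len >= size)
		return -1;
	return 0;
}
```

Taking the PMU name as a parameter is what lets the same loop cover cpu_core, cpu_atom, and non-core PMUs rather than a hard-coded "cpu".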
> ---
> tools/perf/tests/parse-events.c | 103 +++++++++++++++++---------------
> 1 file changed, 55 insertions(+), 48 deletions(-)
>
> diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
> index 8068cfd89b84..385bbbc4a409 100644
> --- a/tools/perf/tests/parse-events.c
> +++ b/tools/perf/tests/parse-events.c
> @@ -7,6 +7,7 @@
> #include "debug.h"
> #include "pmu.h"
> #include "pmu-hybrid.h"
> +#include "pmus.h"
> #include <dirent.h>
> #include <errno.h>
> #include "fncache.h"
> @@ -2225,49 +2226,24 @@ static int test_pmu(void)
>
> static int test__pmu_events(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
> {
> - struct stat st;
> - char path[PATH_MAX];
> - struct dirent *ent;
> - DIR *dir;
> - int ret;
> -
> - if (!test_pmu())
> - return TEST_SKIP;
> -
> - snprintf(path, PATH_MAX, "%s/bus/event_source/devices/cpu/events/",
> - sysfs__mountpoint());
> -
> - ret = stat(path, &st);
> - if (ret) {
> - pr_debug("omitting PMU cpu events tests: %s\n", path);
> - return TEST_OK;
> - }
> + struct perf_pmu *pmu;
> + int ret = TEST_OK;
>
> - dir = opendir(path);
> - if (!dir) {
> - pr_debug("can't open pmu event dir: %s\n", path);
> - return TEST_FAIL;
> - }
> + perf_pmus__for_each_pmu(pmu) {
> + struct stat st;
> + char path[PATH_MAX];
> + struct dirent *ent;
> + DIR *dir;
> + int err;
>
> - ret = TEST_OK;
> - while ((ent = readdir(dir))) {
> - struct evlist_test e = { .name = NULL, };
> - char name[2 * NAME_MAX + 1 + 12 + 3];
> - int test_ret;
> + snprintf(path, PATH_MAX, "%s/bus/event_source/devices/%s/events/",
> + sysfs__mountpoint(), pmu->name);
>
> - /* Names containing . are special and cannot be used directly */
> - if (strchr(ent->d_name, '.'))
> + err = stat(path, &st);
> + if (err) {
> + pr_debug("skipping PMU %s events tests: %s\n", pmu->name, path);
> + ret = combine_test_results(ret, TEST_SKIP);
> continue;
> -
> - snprintf(name, sizeof(name), "cpu/event=%s/u", ent->d_name);
> -
> - e.name = name;
> - e.check = test__checkevent_pmu_events;
> -
> - test_ret = test_event(&e);
> - if (test_ret != TEST_OK) {
> - pr_debug("Test PMU event failed for '%s'", name);
> - ret = combine_test_results(ret, test_ret);
> }
> /*
> * Names containing '-' are recognized as prefixes and suffixes
> @@ -2282,17 +2258,48 @@ static int test__pmu_events(struct test_suite *test __maybe_unused, int subtest
> if (strchr(ent->d_name, '-'))
> continue;
>
> - snprintf(name, sizeof(name), "%s:u,cpu/event=%s/u", ent->d_name, ent->d_name);
> - e.name = name;
> - e.check = test__checkevent_pmu_events_mix;
> - test_ret = test_event(&e);
> - if (test_ret != TEST_OK) {
> - pr_debug("Test PMU event failed for '%s'", name);
> - ret = combine_test_results(ret, test_ret);
> + dir = opendir(path);
> + if (!dir) {
> + pr_debug("can't open pmu event dir: %s\n", path);
> + ret = combine_test_results(ret, TEST_SKIP);
> + continue;
> }
> - }
>
> - closedir(dir);
> + while ((ent = readdir(dir))) {
> + struct evlist_test e = { .name = NULL, };
> + char name[2 * NAME_MAX + 1 + 12 + 3];
> + int test_ret;
> +
> + /* Names containing . are special and cannot be used directly */
> + if (strchr(ent->d_name, '.'))
> + continue;
> +
> + snprintf(name, sizeof(name), "%s/event=%s/u", pmu->name, ent->d_name);
> +
> + e.name = name;
> + e.check = test__checkevent_pmu_events;
> +
> + test_ret = test_event(&e);
> + if (test_ret != TEST_OK) {
> + pr_debug("Test PMU event failed for '%s'", name);
> + ret = combine_test_results(ret, test_ret);
> + }
> +
> + if (!is_pmu_core(pmu->name))
> + continue;
> +
> + snprintf(name, sizeof(name), "%s:u,%s/event=%s/u", ent->d_name, pmu->name, ent->d_name);
> + e.name = name;
> + e.check = test__checkevent_pmu_events_mix;
> + test_ret = test_event(&e);
> + if (test_ret != TEST_OK) {
> + pr_debug("Test PMU event failed for '%s'", name);
> + ret = combine_test_results(ret, test_ret);
> + }
> + }
> +
> + closedir(dir);
> + }
> return ret;
> }
>

2023-04-27 19:39:54

by Liang, Kan

Subject: Re: [PATCH v1 09/40] perf test: Use valid for PMU tests



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> Rather than skip all tests in test__events_pmu if PMU cpu isn't
> present, use the per-test valid test. This allows the running of
> software PMU tests on hybrid and arm systems.
>
> Signed-off-by: Ian Rogers <[email protected]>

I ran the test on Cascade Lake and Alder Lake. It looks good.

Tested-by: Kan Liang <[email protected]>

Thanks,
Kan
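
The per-test valid hook can be pictured with the simplified runner below. This is a sketch only: the struct, the result codes, and the example hooks are hypothetical and much smaller than the real evlist_test machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { T_OK = 0, T_FAIL = -1 };

/* Hypothetical, simplified table-driven test entry. */
struct event_test {
	const char *name;
	bool (*valid)(void);	/* optional; NULL means "always run" */
	int (*check)(void);
};

/* Example hooks used to exercise the runner. */
bool never_valid(void) { return false; }
int check_ok(void) { return T_OK; }
int check_fail(void) { return T_FAIL; }

/* Run every entry whose valid() hook passes and count the ones skipped. */
int run_event_tests(const struct event_test *tests, size_t n, size_t *skipped)
{
	int ret = T_OK;

	assert(tests != NULL && skipped != NULL);
	*skipped = 0;
	for (size_t i = 0; i < n; i++) {
		/* An entry is skipped on its own; the rest still run. */
		if (tests[i].valid && !tests[i].valid()) {
			(*skipped)++;
			continue;
		}
		if (tests[i].check() != T_OK)
			ret = T_FAIL;
	}
	return ret;
}
```

Compared with one up-front check that skips the whole table, entries without a valid hook (for example software PMU events) still execute on systems that have no PMU named cpu.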

> ---
> tools/perf/tests/parse-events.c | 27 +++++++++------------------
> 1 file changed, 9 insertions(+), 18 deletions(-)
>
> diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
> index 385bbbc4a409..08d6b8a3015d 100644
> --- a/tools/perf/tests/parse-events.c
> +++ b/tools/perf/tests/parse-events.c
> @@ -1430,6 +1430,11 @@ static int test__checkevent_config_cache(struct evlist *evlist)
> return TEST_OK;
> }
>
> +static bool test__pmu_cpu_valid(void)
> +{
> + return !!perf_pmu__find("cpu");
> +}
> +
> static bool test__intel_pt_valid(void)
> {
> return !!perf_pmu__find("intel_pt");
> @@ -1979,21 +1984,25 @@ static const struct evlist_test test__events[] = {
> static const struct evlist_test test__events_pmu[] = {
> {
> .name = "cpu/config=10,config1,config2=3,period=1000/u",
> + .valid = test__pmu_cpu_valid,
> .check = test__checkevent_pmu,
> /* 0 */
> },
> {
> .name = "cpu/config=1,name=krava/u,cpu/config=2/u",
> + .valid = test__pmu_cpu_valid,
> .check = test__checkevent_pmu_name,
> /* 1 */
> },
> {
> .name = "cpu/config=1,call-graph=fp,time,period=100000/,cpu/config=2,call-graph=no,time=0,period=2000/",
> + .valid = test__pmu_cpu_valid,
> .check = test__checkevent_pmu_partial_time_callgraph,
> /* 2 */
> },
> {
> .name = "cpu/name='COMPLEX_CYCLES_NAME:orig=cycles,desc=chip-clock-ticks',period=0x1,event=0x2/ukp",
> + .valid = test__pmu_cpu_valid,
> .check = test__checkevent_complex_name,
> /* 3 */
> },
> @@ -2209,21 +2218,6 @@ static int test__terms2(struct test_suite *test __maybe_unused, int subtest __ma
> return test_terms(test__terms, ARRAY_SIZE(test__terms));
> }
>
> -static int test_pmu(void)
> -{
> - struct stat st;
> - char path[PATH_MAX];
> - int ret;
> -
> - snprintf(path, PATH_MAX, "%s/bus/event_source/devices/cpu/format/",
> - sysfs__mountpoint());
> -
> - ret = stat(path, &st);
> - if (ret)
> - pr_debug("omitting PMU cpu tests\n");
> - return !ret;
> -}
> -
> static int test__pmu_events(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
> {
> struct perf_pmu *pmu;
> @@ -2305,9 +2299,6 @@ static int test__pmu_events(struct test_suite *test __maybe_unused, int subtest
>
> static int test__pmu_events2(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
> {
> - if (!test_pmu())
> - return TEST_SKIP;
> -
> return test_events(test__events_pmu, ARRAY_SIZE(test__events_pmu));
> }
>

2023-04-27 19:40:31

by Liang, Kan

Subject: Re: [PATCH v1 10/40] perf test: Mask config then test



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> Add helper to test the config of an evsel. Mask the config so that
> high-bits containing the PMU type, which isn't constant for hybrid,
> are ignored.
>
> Signed-off-by: Ian Rogers <[email protected]>

Ran the tests on Cascade Lake and Alder Lake. They look good.

Tested-by: Kan Liang <[email protected]>

Thanks,
Kan
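
As a small standalone illustration of why the masking in the commit message quoted above matters, here is a sketch that builds a config value with a PMU type in the upper 32 bits and compares it with and without the mask. The SKETCH_* macros are local stand-ins meant to mirror PERF_PMU_TYPE_SHIFT (32) and PERF_HW_EVENT_MASK (0xffffffff) from the perf_event.h UAPI header; make_hybrid_config(), config_matches() and the PMU type value 8 are assumptions for the example, not names or values taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring PERF_PMU_TYPE_SHIFT and PERF_HW_EVENT_MASK. */
#define SKETCH_PMU_TYPE_SHIFT	32
#define SKETCH_HW_EVENT_MASK	0xffffffffULL

/* Hypothetical helper: place a PMU type above the 32-bit event id. */
static uint64_t make_hybrid_config(uint32_t pmu_type, uint32_t event_id)
{
	return ((uint64_t)pmu_type << SKETCH_PMU_TYPE_SHIFT) | event_id;
}

/* Same shape as the test_config() helper in the patch: mask, then compare. */
static bool config_matches(uint64_t config, uint64_t expected)
{
	return (config & SKETCH_HW_EVENT_MASK) == expected;
}

/* Walk through the hybrid and non-hybrid cases. */
static void self_check(void)
{
	/* 8 is an arbitrary PMU type chosen for the example. */
	uint64_t cfg = make_hybrid_config(8, 1);

	assert(cfg != 1);		/* a plain == comparison would fail */
	assert(config_matches(cfg, 1));	/* the masked comparison passes */
	assert(config_matches(1, 1));	/* non-hybrid configs are unaffected */
}
```

One thing the sketch makes visible: a mask like this also hides any bits above bit 31 in the value under test, so it only suits expected values that fit in 32 bits, as the small constants in these tests do.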

> ---
> tools/perf/tests/parse-events.c | 183 +++++++++++++-------------------
> 1 file changed, 75 insertions(+), 108 deletions(-)
>
> diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
> index 08d6b8a3015d..fa016afbc250 100644
> --- a/tools/perf/tests/parse-events.c
> +++ b/tools/perf/tests/parse-events.c
> @@ -21,6 +21,11 @@
> #define PERF_TP_SAMPLE_TYPE (PERF_SAMPLE_RAW | PERF_SAMPLE_TIME | \
> PERF_SAMPLE_CPU | PERF_SAMPLE_PERIOD)
>
> +static bool test_config(const struct evsel *evsel, __u64 expected_config)
> +{
> + return (evsel->core.attr.config & PERF_HW_EVENT_MASK) == expected_config;
> +}
> +
> #ifdef HAVE_LIBTRACEEVENT
>
> #if defined(__s390x__)
> @@ -87,7 +92,7 @@ static int test__checkevent_raw(struct evlist *evlist)
>
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0x1a == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
> return TEST_OK;
> }
>
> @@ -97,7 +102,7 @@ static int test__checkevent_numeric(struct evlist *evlist)
>
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", 1 == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 1 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 1));
> return TEST_OK;
> }
>
> @@ -107,8 +112,7 @@ static int test__checkevent_symbolic_name(struct evlist *evlist)
>
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_INSTRUCTIONS == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
> return TEST_OK;
> }
>
> @@ -118,8 +122,7 @@ static int test__checkevent_symbolic_name_config(struct evlist *evlist)
>
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> /*
> * The period value gets configured within evlist__config,
> * while this test executes only parse events method.
> @@ -139,8 +142,7 @@ static int test__checkevent_symbolic_alias(struct evlist *evlist)
>
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_SW_PAGE_FAULTS == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_SW_PAGE_FAULTS));
> return TEST_OK;
> }
>
> @@ -150,7 +152,7 @@ static int test__checkevent_genhw(struct evlist *evlist)
>
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HW_CACHE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", (1 << 16) == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 1 << 16));
> return TEST_OK;
> }
>
> @@ -160,7 +162,7 @@ static int test__checkevent_breakpoint(struct evlist *evlist)
>
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_BREAKPOINT == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0));
> TEST_ASSERT_VAL("wrong bp_type", (HW_BREAKPOINT_R | HW_BREAKPOINT_W) ==
> evsel->core.attr.bp_type);
> TEST_ASSERT_VAL("wrong bp_len", HW_BREAKPOINT_LEN_4 ==
> @@ -174,7 +176,7 @@ static int test__checkevent_breakpoint_x(struct evlist *evlist)
>
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_BREAKPOINT == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0));
> TEST_ASSERT_VAL("wrong bp_type",
> HW_BREAKPOINT_X == evsel->core.attr.bp_type);
> TEST_ASSERT_VAL("wrong bp_len", sizeof(long) == evsel->core.attr.bp_len);
> @@ -188,7 +190,7 @@ static int test__checkevent_breakpoint_r(struct evlist *evlist)
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type",
> PERF_TYPE_BREAKPOINT == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0));
> TEST_ASSERT_VAL("wrong bp_type",
> HW_BREAKPOINT_R == evsel->core.attr.bp_type);
> TEST_ASSERT_VAL("wrong bp_len",
> @@ -203,7 +205,7 @@ static int test__checkevent_breakpoint_w(struct evlist *evlist)
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type",
> PERF_TYPE_BREAKPOINT == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0));
> TEST_ASSERT_VAL("wrong bp_type",
> HW_BREAKPOINT_W == evsel->core.attr.bp_type);
> TEST_ASSERT_VAL("wrong bp_len",
> @@ -218,7 +220,7 @@ static int test__checkevent_breakpoint_rw(struct evlist *evlist)
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type",
> PERF_TYPE_BREAKPOINT == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0));
> TEST_ASSERT_VAL("wrong bp_type",
> (HW_BREAKPOINT_R|HW_BREAKPOINT_W) == evsel->core.attr.bp_type);
> TEST_ASSERT_VAL("wrong bp_len",
> @@ -447,7 +449,7 @@ static int test__checkevent_pmu(struct evlist *evlist)
>
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 10 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 10));
> TEST_ASSERT_VAL("wrong config1", 1 == evsel->core.attr.config1);
> TEST_ASSERT_VAL("wrong config2", 3 == evsel->core.attr.config2);
> TEST_ASSERT_VAL("wrong config3", 0 == evsel->core.attr.config3);
> @@ -469,7 +471,7 @@ static int test__checkevent_list(struct evlist *evlist)
>
> /* r1 */
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 1 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 1));
> TEST_ASSERT_VAL("wrong config1", 0 == evsel->core.attr.config1);
> TEST_ASSERT_VAL("wrong config2", 0 == evsel->core.attr.config2);
> TEST_ASSERT_VAL("wrong config3", 0 == evsel->core.attr.config3);
> @@ -492,7 +494,7 @@ static int test__checkevent_list(struct evlist *evlist)
> /* 1:1:hp */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", 1 == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 1 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 1));
> TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> @@ -509,14 +511,14 @@ static int test__checkevent_pmu_name(struct evlist *evlist)
> /* cpu/config=1,name=krava/u */
> TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 1 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 1));
> TEST_ASSERT_VAL("wrong name", !strcmp(evsel__name(evsel), "krava"));
>
> /* cpu/config=2/u" */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 2 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 2));
> TEST_ASSERT_VAL("wrong name",
> !strcmp(evsel__name(evsel), "cpu/config=2/u"));
>
> @@ -530,7 +532,7 @@ static int test__checkevent_pmu_partial_time_callgraph(struct evlist *evlist)
> /* cpu/config=1,call-graph=fp,time,period=100000/ */
> TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 1 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 1));
> /*
> * The period, time and callgraph value gets configured within evlist__config,
> * while this test executes only parse events method.
> @@ -542,7 +544,7 @@ static int test__checkevent_pmu_partial_time_callgraph(struct evlist *evlist)
> /* cpu/config=2,call-graph=no,time=0,period=2000/ */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 2 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 2));
> /*
> * The period, time and callgraph value gets configured within evlist__config,
> * while this test executes only parse events method.
> @@ -694,8 +696,7 @@ static int test__group1(struct evlist *evlist)
> /* instructions:k */
> evsel = leader = evlist__first(evlist);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_INSTRUCTIONS == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
> TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> @@ -710,8 +711,7 @@ static int test__group1(struct evlist *evlist)
> /* cycles:upp */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> @@ -736,8 +736,7 @@ static int test__group2(struct evlist *evlist)
> /* faults + :ku modifier */
> evsel = leader = evlist__first(evlist);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_SW_PAGE_FAULTS == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_SW_PAGE_FAULTS));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> @@ -752,8 +751,7 @@ static int test__group2(struct evlist *evlist)
> /* cache-references + :u modifier */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CACHE_REFERENCES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CACHE_REFERENCES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> @@ -767,8 +765,7 @@ static int test__group2(struct evlist *evlist)
> /* cycles:k */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> @@ -811,8 +808,7 @@ static int test__group3(struct evlist *evlist __maybe_unused)
> /* group1 cycles:kppp */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> @@ -828,8 +824,7 @@ static int test__group3(struct evlist *evlist __maybe_unused)
> /* group2 cycles + G modifier */
> evsel = leader = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> @@ -846,7 +841,7 @@ static int test__group3(struct evlist *evlist __maybe_unused)
> /* group2 1:3 + G modifier */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", 1 == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 3 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 3));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> @@ -860,8 +855,7 @@ static int test__group3(struct evlist *evlist __maybe_unused)
> /* instructions:u */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_INSTRUCTIONS == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> @@ -885,8 +879,7 @@ static int test__group4(struct evlist *evlist __maybe_unused)
> /* cycles:u + p */
> evsel = leader = evlist__first(evlist);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> @@ -903,8 +896,7 @@ static int test__group4(struct evlist *evlist __maybe_unused)
> /* instructions:kp + p */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_INSTRUCTIONS == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
> TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> @@ -929,8 +921,7 @@ static int test__group5(struct evlist *evlist __maybe_unused)
> /* cycles + G */
> evsel = leader = evlist__first(evlist);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> @@ -946,8 +937,7 @@ static int test__group5(struct evlist *evlist __maybe_unused)
> /* instructions + G */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_INSTRUCTIONS == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> @@ -961,8 +951,7 @@ static int test__group5(struct evlist *evlist __maybe_unused)
> /* cycles:G */
> evsel = leader = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> @@ -978,8 +967,7 @@ static int test__group5(struct evlist *evlist __maybe_unused)
> /* instructions:G */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_INSTRUCTIONS == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> @@ -992,8 +980,7 @@ static int test__group5(struct evlist *evlist __maybe_unused)
> /* cycles */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> @@ -1015,8 +1002,7 @@ static int test__group_gh1(struct evlist *evlist)
> /* cycles + :H group modifier */
> evsel = leader = evlist__first(evlist);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> @@ -1031,8 +1017,7 @@ static int test__group_gh1(struct evlist *evlist)
> /* cache-misses:G + :H group modifier */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CACHE_MISSES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CACHE_MISSES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> @@ -1055,8 +1040,7 @@ static int test__group_gh2(struct evlist *evlist)
> /* cycles + :G group modifier */
> evsel = leader = evlist__first(evlist);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> @@ -1071,8 +1055,7 @@ static int test__group_gh2(struct evlist *evlist)
> /* cache-misses:H + :G group modifier */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CACHE_MISSES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CACHE_MISSES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> @@ -1095,8 +1078,7 @@ static int test__group_gh3(struct evlist *evlist)
> /* cycles:G + :u group modifier */
> evsel = leader = evlist__first(evlist);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> @@ -1111,8 +1093,7 @@ static int test__group_gh3(struct evlist *evlist)
> /* cache-misses:H + :u group modifier */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CACHE_MISSES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CACHE_MISSES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> @@ -1135,8 +1116,7 @@ static int test__group_gh4(struct evlist *evlist)
> /* cycles:G + :uG group modifier */
> evsel = leader = evlist__first(evlist);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> @@ -1151,8 +1131,7 @@ static int test__group_gh4(struct evlist *evlist)
> /* cache-misses:H + :uG group modifier */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CACHE_MISSES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CACHE_MISSES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> @@ -1174,8 +1153,7 @@ static int test__leader_sample1(struct evlist *evlist)
> /* cycles - sampling group leader */
> evsel = leader = evlist__first(evlist);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> @@ -1189,8 +1167,7 @@ static int test__leader_sample1(struct evlist *evlist)
> /* cache-misses - not sampling */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CACHE_MISSES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CACHE_MISSES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> @@ -1203,8 +1180,7 @@ static int test__leader_sample1(struct evlist *evlist)
> /* branch-misses - not sampling */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_BRANCH_MISSES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_BRANCH_MISSES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> @@ -1227,8 +1203,7 @@ static int test__leader_sample2(struct evlist *evlist __maybe_unused)
> /* instructions - sampling group leader */
> evsel = leader = evlist__first(evlist);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_INSTRUCTIONS == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> @@ -1242,8 +1217,7 @@ static int test__leader_sample2(struct evlist *evlist __maybe_unused)
> /* branch-misses - not sampling */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_BRANCH_MISSES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_BRANCH_MISSES));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> @@ -1279,8 +1253,7 @@ static int test__pinned_group(struct evlist *evlist)
> /* cycles - group leader */
> evsel = leader = evlist__first(evlist);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
> TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> TEST_ASSERT_VAL("wrong pinned", evsel->core.attr.pinned);
> @@ -1288,14 +1261,12 @@ static int test__pinned_group(struct evlist *evlist)
> /* cache-misses - can not be pinned, but will go on with the leader */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CACHE_MISSES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CACHE_MISSES));
> TEST_ASSERT_VAL("wrong pinned", !evsel->core.attr.pinned);
>
> /* branch-misses - ditto */
> evsel = evsel__next(evsel);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_BRANCH_MISSES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_BRANCH_MISSES));
> TEST_ASSERT_VAL("wrong pinned", !evsel->core.attr.pinned);
>
> return TEST_OK;
> @@ -1323,8 +1294,7 @@ static int test__exclusive_group(struct evlist *evlist)
> /* cycles - group leader */
> evsel = leader = evlist__first(evlist);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
> TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> TEST_ASSERT_VAL("wrong exclusive", evsel->core.attr.exclusive);
> @@ -1332,14 +1302,12 @@ static int test__exclusive_group(struct evlist *evlist)
> /* cache-misses - can not be pinned, but will go on with the leader */
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_CACHE_MISSES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CACHE_MISSES));
> TEST_ASSERT_VAL("wrong exclusive", !evsel->core.attr.exclusive);
>
> /* branch-misses - ditto */
> evsel = evsel__next(evsel);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_HW_BRANCH_MISSES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_BRANCH_MISSES));
> TEST_ASSERT_VAL("wrong exclusive", !evsel->core.attr.exclusive);
>
> return TEST_OK;
> @@ -1350,7 +1318,7 @@ static int test__checkevent_breakpoint_len(struct evlist *evlist)
>
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_BREAKPOINT == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0));
> TEST_ASSERT_VAL("wrong bp_type", (HW_BREAKPOINT_R | HW_BREAKPOINT_W) ==
> evsel->core.attr.bp_type);
> TEST_ASSERT_VAL("wrong bp_len", HW_BREAKPOINT_LEN_1 ==
> @@ -1365,7 +1333,7 @@ static int test__checkevent_breakpoint_len_w(struct evlist *evlist)
>
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_BREAKPOINT == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0));
> TEST_ASSERT_VAL("wrong bp_type", HW_BREAKPOINT_W ==
> evsel->core.attr.bp_type);
> TEST_ASSERT_VAL("wrong bp_len", HW_BREAKPOINT_LEN_2 ==
> @@ -1393,8 +1361,7 @@ static int test__checkevent_precise_max_modifier(struct evlist *evlist)
>
> TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config",
> - PERF_COUNT_SW_TASK_CLOCK == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_SW_TASK_CLOCK));
> return TEST_OK;
> }
>
> @@ -1462,7 +1429,7 @@ static int test__checkevent_raw_pmu(struct evlist *evlist)
>
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0x1a == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
> return TEST_OK;
> }
>
> @@ -1471,7 +1438,7 @@ static int test__sym_event_slash(struct evlist *evlist)
> struct evsel *evsel = evlist__first(evlist);
>
> TEST_ASSERT_VAL("wrong type", evsel->core.attr.type == PERF_TYPE_HARDWARE);
> - TEST_ASSERT_VAL("wrong config", evsel->core.attr.config == PERF_COUNT_HW_CPU_CYCLES);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> return TEST_OK;
> }
> @@ -1481,7 +1448,7 @@ static int test__sym_event_dc(struct evlist *evlist)
> struct evsel *evsel = evlist__first(evlist);
>
> TEST_ASSERT_VAL("wrong type", evsel->core.attr.type == PERF_TYPE_HARDWARE);
> - TEST_ASSERT_VAL("wrong config", evsel->core.attr.config == PERF_COUNT_HW_CPU_CYCLES);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
> TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
> return TEST_OK;
> }
> @@ -1548,7 +1515,7 @@ static int test__hybrid_hw_event_with_pmu(struct evlist *evlist)
>
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
> return TEST_OK;
> }
>
> @@ -1559,12 +1526,12 @@ static int test__hybrid_hw_group_event(struct evlist *evlist)
> evsel = leader = evlist__first(evlist);
> TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
> TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
>
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0xc0 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0xc0));
> TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> return TEST_OK;
> }
> @@ -1580,7 +1547,7 @@ static int test__hybrid_sw_hw_group_event(struct evlist *evlist)
>
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
> TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> return TEST_OK;
> }
> @@ -1592,7 +1559,7 @@ static int test__hybrid_hw_sw_group_event(struct evlist *evlist)
> evsel = leader = evlist__first(evlist);
> TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
> TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
>
> evsel = evsel__next(evsel);
> @@ -1608,14 +1575,14 @@ static int test__hybrid_group_modifier1(struct evlist *evlist)
> evsel = leader = evlist__first(evlist);
> TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
> TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
>
> evsel = evsel__next(evsel);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0xc0 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0xc0));
> TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> @@ -1629,17 +1596,17 @@ static int test__hybrid_raw1(struct evlist *evlist)
> if (!perf_pmu__hybrid_mounted("cpu_atom")) {
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0x1a == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
> return TEST_OK;
> }
>
> TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0x1a == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
>
> /* The type of second event is randome value */
> evsel = evsel__next(evsel);
> - TEST_ASSERT_VAL("wrong config", 0x1a == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
> return TEST_OK;
> }
>
> @@ -1649,7 +1616,7 @@ static int test__hybrid_raw2(struct evlist *evlist)
>
> TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0x1a == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
> return TEST_OK;
> }
>

2023-04-27 19:40:51

by Liang, Kan

Subject: Re: [PATCH v1 11/40] perf test: Test more with config_cache



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> test__checkevent_config_cache checks the parsing of
> "L1-dcache-misses/name=cachepmu/". Don't just check that the name is
> set correctly, also validate the rest of the perf_event_attr for
> L1-dcache-misses.
>
> Signed-off-by: Ian Rogers <[email protected]>

Run the test on Cascade Lake and Alder Lake. It looks good.

Tested-by: Kan Liang <[email protected]>

Thanks,
Kan

> ---
> tools/perf/tests/parse-events.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
> index fa016afbc250..177464793aa8 100644
> --- a/tools/perf/tests/parse-events.c
> +++ b/tools/perf/tests/parse-events.c
> @@ -1394,7 +1394,7 @@ static int test__checkevent_config_cache(struct evlist *evlist)
> struct evsel *evsel = evlist__first(evlist);
>
> TEST_ASSERT_VAL("wrong name setting", evsel__name_is(evsel, "cachepmu"));
> - return TEST_OK;
> + return test__checkevent_genhw(evlist);
> }
>
> static bool test__pmu_cpu_valid(void)

2023-04-27 19:45:18

by Liang, Kan

Subject: Re: [PATCH v1 12/40] perf test: Roundtrip name, don't assume 1 event per name



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> Opening hardware names and a legacy cache event on a hybrid PMU opens
> it on each PMU. Parsing and checking indexes fails, as the parsed
> index is double the expected. Avoid checking the index by just
> comparing the names immediately after the parse.
>
> This change removes hard coded hybrid logic and removes assumptions
> about the expansion of an event. On hybrid the PMUs may or may not
> support an event and so using a distance isn't a consistent solution.
>
> Signed-off-by: Ian Rogers <[email protected]>

Run the test on Cascade Lake and Alder Lake. It looks good.

Tested-by: Kan Liang <[email protected]>

Thanks,
Kan
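
For readers following the thread, the per-name check described in the commit message above can be sketched outside perf roughly as follows. This is only an illustration under stated assumptions: `struct fake_evlist`, `fake_parse_event()` and the `nr_pmus` expansion are hypothetical stand-ins for an evlist and parse_event() on a hybrid system, not the perf API. The point is that each name gets its own list and every entry in that list is compared against the name, so the number of entries an event expands to no longer matters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 4

/* Hypothetical stand-in for an evlist: just the names of the opened events. */
struct fake_evlist {
	const char *names[MAX_ENTRIES];
	int nr_entries;
};

/*
 * Hypothetical stand-in for parse_event(): the event is opened once per PMU,
 * so a single name may expand to one or more entries.
 */
static int fake_parse_event(struct fake_evlist *evlist, const char *name, int nr_pmus)
{
	assert(evlist && name);
	evlist->nr_entries = 0;
	if (nr_pmus < 1 || nr_pmus > MAX_ENTRIES)
		return -1;
	for (int i = 0; i < nr_pmus; i++)
		evlist->names[evlist->nr_entries++] = name;
	return 0;
}

/* Parse each name into its own list, then compare every resulting entry. */
static int roundtrip_names(const char *const names[], int nr_names, int nr_pmus)
{
	int ret = 0;

	for (int i = 0; i < nr_names; i++) {
		struct fake_evlist evlist;

		if (fake_parse_event(&evlist, names[i], nr_pmus)) {
			printf("failed to parse event '%s'\n", names[i]);
			ret = -1;
			continue;
		}
		/* No index arithmetic: one or many entries are handled alike. */
		for (int j = 0; j < evlist.nr_entries; j++) {
			if (strcmp(evlist.names[j], names[i])) {
				printf("%s != %s\n", evlist.names[j], names[i]);
				ret = -1;
			}
		}
	}
	return ret;
}
```

Calling roundtrip_names() with nr_pmus set to 1 or 2 exercises the same code path, which is the property the patch relies on to drop the hard coded "distance" argument.
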
> ---
> tools/perf/tests/evsel-roundtrip-name.c | 119 ++++++++++--------------
> 1 file changed, 49 insertions(+), 70 deletions(-)
>
> diff --git a/tools/perf/tests/evsel-roundtrip-name.c b/tools/perf/tests/evsel-roundtrip-name.c
> index e94fed901992..15ff86f9da0b 100644
> --- a/tools/perf/tests/evsel-roundtrip-name.c
> +++ b/tools/perf/tests/evsel-roundtrip-name.c
> @@ -4,114 +4,93 @@
> #include "parse-events.h"
> #include "tests.h"
> #include "debug.h"
> -#include "pmu.h"
> -#include "pmu-hybrid.h"
> -#include <errno.h>
> #include <linux/kernel.h>
>
> static int perf_evsel__roundtrip_cache_name_test(void)
> {
> - char name[128];
> - int type, op, err = 0, ret = 0, i, idx;
> - struct evsel *evsel;
> - struct evlist *evlist = evlist__new();
> + int ret = TEST_OK;
>
> - if (evlist == NULL)
> - return -ENOMEM;
> -
> - for (type = 0; type < PERF_COUNT_HW_CACHE_MAX; type++) {
> - for (op = 0; op < PERF_COUNT_HW_CACHE_OP_MAX; op++) {
> + for (int type = 0; type < PERF_COUNT_HW_CACHE_MAX; type++) {
> + for (int op = 0; op < PERF_COUNT_HW_CACHE_OP_MAX; op++) {
> /* skip invalid cache type */
> if (!evsel__is_cache_op_valid(type, op))
> continue;
>
> - for (i = 0; i < PERF_COUNT_HW_CACHE_RESULT_MAX; i++) {
> - __evsel__hw_cache_type_op_res_name(type, op, i, name, sizeof(name));
> - err = parse_event(evlist, name);
> - if (err)
> - ret = err;
> - }
> - }
> - }
> -
> - idx = 0;
> - evsel = evlist__first(evlist);
> + for (int res = 0; res < PERF_COUNT_HW_CACHE_RESULT_MAX; res++) {
> + char name[128];
> + struct evlist *evlist = evlist__new();
> + struct evsel *evsel;
> + int err;
>
> - for (type = 0; type < PERF_COUNT_HW_CACHE_MAX; type++) {
> - for (op = 0; op < PERF_COUNT_HW_CACHE_OP_MAX; op++) {
> - /* skip invalid cache type */
> - if (!evsel__is_cache_op_valid(type, op))
> - continue;
> + if (evlist == NULL) {
> + pr_debug("Failed to alloc evlist");
> + return TEST_FAIL;
> + }
> + __evsel__hw_cache_type_op_res_name(type, op, res,
> + name, sizeof(name));
>
> - for (i = 0; i < PERF_COUNT_HW_CACHE_RESULT_MAX; i++) {
> - __evsel__hw_cache_type_op_res_name(type, op, i, name, sizeof(name));
> - if (evsel->core.idx != idx)
> + err = parse_event(evlist, name);
> + if (err) {
> + pr_debug("Failure to parse cache event '%s' possibly as PMUs don't support it",
> + name);
> + evlist__delete(evlist);
> continue;
> -
> - ++idx;
> -
> - if (strcmp(evsel__name(evsel), name)) {
> - pr_debug("%s != %s\n", evsel__name(evsel), name);
> - ret = -1;
> }
> -
> - evsel = evsel__next(evsel);
> + evlist__for_each_entry(evlist, evsel) {
> + if (strcmp(evsel__name(evsel), name)) {
> + pr_debug("%s != %s\n", evsel__name(evsel), name);
> + ret = TEST_FAIL;
> + }
> + }
> + evlist__delete(evlist);
> }
> }
> }
> -
> - evlist__delete(evlist);
> return ret;
> }
>
> -static int __perf_evsel__name_array_test(const char *const names[], int nr_names,
> - int distance)
> +static int perf_evsel__name_array_test(const char *const names[], int nr_names)
> {
> - int i, err;
> - struct evsel *evsel;
> - struct evlist *evlist = evlist__new();
> + int ret = TEST_OK;
>
> - if (evlist == NULL)
> - return -ENOMEM;
> + for (int i = 0; i < nr_names; ++i) {
> + struct evlist *evlist = evlist__new();
> + struct evsel *evsel;
> + int err;
>
> - for (i = 0; i < nr_names; ++i) {
> + if (evlist == NULL) {
> + pr_debug("Failed to alloc evlist");
> + return TEST_FAIL;
> + }
> err = parse_event(evlist, names[i]);
> if (err) {
> pr_debug("failed to parse event '%s', err %d\n",
> names[i], err);
> - goto out_delete_evlist;
> + evlist__delete(evlist);
> + ret = TEST_FAIL;
> + continue;
> }
> - }
> -
> - err = 0;
> - evlist__for_each_entry(evlist, evsel) {
> - if (strcmp(evsel__name(evsel), names[evsel->core.idx / distance])) {
> - --err;
> - pr_debug("%s != %s\n", evsel__name(evsel), names[evsel->core.idx / distance]);
> + evlist__for_each_entry(evlist, evsel) {
> + if (strcmp(evsel__name(evsel), names[i])) {
> + pr_debug("%s != %s\n", evsel__name(evsel), names[i]);
> + ret = TEST_FAIL;
> + }
> }
> + evlist__delete(evlist);
> }
> -
> -out_delete_evlist:
> - evlist__delete(evlist);
> - return err;
> + return ret;
> }
>
> -#define perf_evsel__name_array_test(names, distance) \
> - __perf_evsel__name_array_test(names, ARRAY_SIZE(names), distance)
> -
> static int test__perf_evsel__roundtrip_name_test(struct test_suite *test __maybe_unused,
> int subtest __maybe_unused)
> {
> - int err = 0, ret = 0;
> -
> - if (perf_pmu__has_hybrid() && perf_pmu__hybrid_mounted("cpu_atom"))
> - return perf_evsel__name_array_test(evsel__hw_names, 2);
> + int err = 0, ret = TEST_OK;
>
> - err = perf_evsel__name_array_test(evsel__hw_names, 1);
> + err = perf_evsel__name_array_test(evsel__hw_names, PERF_COUNT_HW_MAX);
> if (err)
> ret = err;
>
> - err = __perf_evsel__name_array_test(evsel__sw_names, PERF_COUNT_SW_DUMMY + 1, 1);
> + err = perf_evsel__name_array_test(evsel__sw_names, PERF_COUNT_SW_DUMMY + 1);
> if (err)
> ret = err;
>

2023-04-27 20:13:03

by Liang, Kan

Subject: Re: [PATCH v1 14/40] perf print-events: Avoid unnecessary strlist



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> The strlist in print_hwcache_events holds the event names as they are
> generated, and then it is iterated and printed. This is unnecessary
> and each event can just be printed as it is processed.
> Rename the variable i to res, to be more intention revealing and
> consistent with other code.
>

Looks good to me.

Reviewed-by: Kan Liang <[email protected]>

Thanks,
Kan
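
As a rough illustration of the change described above, the sketch below emits each cache event through a callback as soon as its name is generated, instead of collecting names in a list first. It is not the perf code: the shortened name tables, `print_cache_events()`, `print_event_fn` and `struct count_state` are hypothetical. The config value follows the `type | (op << 8) | (res << 16)` encoding visible in the quoted diff below.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, shortened name tables; perf's real tables are larger. */
static const char *const cache_types[] = { "L1-dcache", "LLC" };
static const char *const cache_ops[] = { "load", "store" };
static const char *const cache_results[] = { "refs", "misses" };

#define NR(a) ((int)(sizeof(a) / sizeof((a)[0])))

typedef void (*print_event_fn)(void *state, const char *name, uint64_t config);

/* Emit each event as soon as its name is generated; no intermediate list. */
static int print_cache_events(print_event_fn print, void *state)
{
	int printed = 0;

	assert(print);
	for (int type = 0; type < NR(cache_types); type++) {
		for (int op = 0; op < NR(cache_ops); op++) {
			for (int res = 0; res < NR(cache_results); res++) {
				char name[64];
				/* Same bit layout as used in the quoted diff. */
				uint64_t config = (uint64_t)type |
						  ((uint64_t)op << 8) |
						  ((uint64_t)res << 16);

				snprintf(name, sizeof(name), "%s-%s-%s",
					 cache_types[type], cache_ops[op],
					 cache_results[res]);
				print(state, name, config);
				printed++;
			}
		}
	}
	return printed;
}

/* Example callback state: count events and remember the last config seen. */
struct count_state {
	int count;
	uint64_t last_config;
};

static void count_event(void *state, const char *name, uint64_t config)
{
	struct count_state *cs = state;

	(void)name;
	cs->count++;
	cs->last_config = config;
}
```

Passing count_event and a zeroed struct count_state to print_cache_events() walks every combination once, and nothing needs to be allocated or freed around the loop.
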

> Signed-off-by: Ian Rogers <[email protected]>
> ---
> tools/perf/util/print-events.c | 60 ++++++++++++++++++----------------
> 1 file changed, 31 insertions(+), 29 deletions(-)
>
> diff --git a/tools/perf/util/print-events.c b/tools/perf/util/print-events.c
> index 386b1ab0b60e..93bbb868d400 100644
> --- a/tools/perf/util/print-events.c
> +++ b/tools/perf/util/print-events.c
> @@ -226,58 +226,60 @@ void print_sdt_events(const struct print_callbacks *print_cb, void *print_state)
>
> int print_hwcache_events(const struct print_callbacks *print_cb, void *print_state)
> {
> - struct strlist *evt_name_list = strlist__new(NULL, NULL);
> - struct str_node *nd;
> + const char *event_type_descriptor = event_type_descriptors[PERF_TYPE_HW_CACHE];
>
> - if (!evt_name_list) {
> - pr_debug("Failed to allocate new strlist for hwcache events\n");
> - return -ENOMEM;
> - }
> for (int type = 0; type < PERF_COUNT_HW_CACHE_MAX; type++) {
> for (int op = 0; op < PERF_COUNT_HW_CACHE_OP_MAX; op++) {
> /* skip invalid cache type */
> if (!evsel__is_cache_op_valid(type, op))
> continue;
>
> - for (int i = 0; i < PERF_COUNT_HW_CACHE_RESULT_MAX; i++) {
> + for (int res = 0; res < PERF_COUNT_HW_CACHE_RESULT_MAX; res++) {
> struct perf_pmu *pmu = NULL;
> char name[64];
>
> - __evsel__hw_cache_type_op_res_name(type, op, i, name, sizeof(name));
> + __evsel__hw_cache_type_op_res_name(type, op, res,
> + name, sizeof(name));
> if (!perf_pmu__has_hybrid()) {
> if (is_event_supported(PERF_TYPE_HW_CACHE,
> - type | (op << 8) | (i << 16)))
> - strlist__add(evt_name_list, name);
> + type | (op << 8) | (res << 16))) {
> + print_cb->print_event(print_state,
> + "cache",
> + /*pmu_name=*/NULL,
> + name,
> + /*event_alias=*/NULL,
> + /*scale_unit=*/NULL,
> + /*deprecated=*/false,
> + event_type_descriptor,
> + /*desc=*/NULL,
> + /*long_desc=*/NULL,
> + /*encoding_desc=*/NULL);
> + }
> continue;
> }
> perf_pmu__for_each_hybrid_pmu(pmu) {
> if (is_event_supported(PERF_TYPE_HW_CACHE,
> - type | (op << 8) | (i << 16) |
> + type | (op << 8) | (res << 16) |
> ((__u64)pmu->type << PERF_PMU_TYPE_SHIFT))) {
> char new_name[128];
> - snprintf(new_name, sizeof(new_name),
> - "%s/%s/", pmu->name, name);
> - strlist__add(evt_name_list, new_name);
> + snprintf(new_name, sizeof(new_name),
> + "%s/%s/", pmu->name, name);
> + print_cb->print_event(print_state,
> + "cache",
> + pmu->name,
> + name,
> + new_name,
> + /*scale_unit=*/NULL,
> + /*deprecated=*/false,
> + event_type_descriptor,
> + /*desc=*/NULL,
> + /*long_desc=*/NULL,
> + /*encoding_desc=*/NULL);
> }
> }
> }
> }
> }
> -
> - strlist__for_each_entry(nd, evt_name_list) {
> - print_cb->print_event(print_state,
> - "cache",
> - /*pmu_name=*/NULL,
> - nd->s,
> - /*event_alias=*/NULL,
> - /*scale_unit=*/NULL,
> - /*deprecated=*/false,
> - event_type_descriptors[PERF_TYPE_HW_CACHE],
> - /*desc=*/NULL,
> - /*long_desc=*/NULL,
> - /*encoding_desc=*/NULL);
> - }
> - strlist__delete(evt_name_list);
> return 0;
> }
>

2023-04-27 20:13:47

by Liang, Kan

Subject: Re: [PATCH v1 15/40] perf parse-events: Avoid scanning PMUs before parsing



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> The event parser needs to handle two special cases:
> 1) legacy events like L1-dcache-load-miss. These event names don't
> appear in json or sysfs, and lookup tables are used for the config
> value.
> 2) raw events where 'r0xead' is the same as 'read' unless the PMU has
> an event called 'read' in which case the event has priority.
>
> To handle these cases, the previous parser would scan all PMUs for
> components of event names. These components would then be used to
> classify in the lexer whether the token should be part of a legacy
> event, a raw event or an event. The grammar would handle legacy event
> tokens or recombining the tokens back into a regular event name. The
> code wasn't PMU specific and had issues around events like AMD's
> branch-brs that would fail to parse as it expects brs to be a suffix
> on a legacy event style name:
>
> $ perf stat -e branch-brs true
> event syntax error: 'branch-brs'
> \___ parser error
>
> This change removes processing all PMUs by using the lexer in the form
> of a regular expression matcher. The lexer will return the token for
> the longest matched sequence of characters, and in the event of a tie
> the first. The legacy events are a fixed number of regular
> expressions, and by matching these before a name token it's possible to
> generate an accurate legacy event token with everything else matching
> as a name. Because of the lexer change the handling of hyphens in the
> grammar can be removed as hyphens just become a part of the name.
>
> To handle raw events and terms the parser is changed to defer trying
> to evaluate whether something is a raw event until the PMU is known in
> the grammar. Once the PMU is known, the events of the PMU can be
> scanned for the 'read' style problem. A new term type is added for
> these raw terms, used to enable deferring the evaluation.
>
> While this change is large, it has stats of:
> 170 insertions(+), 436 deletions(-)
> the bulk of the change is deleting the old approach. It isn't possible
> to break apart the code added due to the dependencies on how the parts
> of the parsing work.
>
> Signed-off-by: Ian Rogers <[email protected]>
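
The "longest match, first rule wins on a tie" behaviour the commit message relies on can be sketched in plain C as below. This is an illustration only, not the flex rules from parse-events.l: the word tables, `match_legacy_cache()`, `match_name()` and `lex()` are hypothetical and much smaller than the real grammar. It is meant to show why "branch-brs" ends up as a plain name while "L1-dcache-load-misses" still becomes a legacy cache token.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum token { TOK_NONE, TOK_LEGACY_CACHE, TOK_NAME };

#define NR(a) (sizeof(a) / sizeof((a)[0]))

/* Length of the longest word in words[] that is a prefix of s, or 0. */
static size_t match_one_of(const char *s, const char *const words[], size_t nr)
{
	size_t best = 0;

	for (size_t i = 0; i < nr; i++) {
		size_t n = strlen(words[i]);

		if (n > best && !strncmp(s, words[i], n))
			best = n;
	}
	return best;
}

/* Hypothetical, shortened legacy cache pattern: type[-section[-section]]. */
static size_t match_legacy_cache(const char *s)
{
	static const char *const types[] = { "L1-dcache", "LLC", "branch" };
	static const char *const sections[] = {
		"load", "loads", "store", "stores", "misses", "refs",
	};
	size_t len = match_one_of(s, types, NR(types));

	if (!len)
		return 0;
	/* Up to two optional hyphen separated sections may follow. */
	for (int i = 0; i < 2 && s[len] == '-'; i++) {
		size_t n = match_one_of(s + len + 1, sections, NR(sections));

		if (!n)
			break;
		len += 1 + n;
	}
	return len;
}

/* A name is a run of alphanumerics, underscores and hyphens. */
static size_t match_name(const char *s)
{
	size_t len = 0;

	while (isalnum((unsigned char)s[len]) || s[len] == '_' || s[len] == '-')
		len++;
	return len;
}

/* Return the token of the longest match; ties go to the earlier rule. */
static enum token lex(const char *s, size_t *len)
{
	static const struct {
		enum token tok;
		size_t (*match)(const char *s);
	} rules[] = {
		{ TOK_LEGACY_CACHE, match_legacy_cache },
		{ TOK_NAME, match_name },
	};
	enum token best_tok = TOK_NONE;
	size_t best = 0;

	assert(s && len);
	for (size_t i = 0; i < NR(rules); i++) {
		size_t n = rules[i].match(s);

		/* Strictly greater, so an earlier rule keeps a tie. */
		if (n > best) {
			best = n;
			best_tok = rules[i].tok;
		}
	}
	*len = best;
	return best_tok;
}
```

With these tables the legacy rule only matches the 6 characters of "branch" in "branch-brs", so the 10 character name match wins. For "L1-dcache-load-misses" both rules match the whole string and the legacy rule wins because it is listed first.
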


Run the test on Cascade Lake and Alder Lake. It looks good.

Tested-by: Kan Liang <[email protected]>

Thanks,
Kan
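
The deferred handling of raw terms described in the commit message can be approximated as follows. This is a sketch under assumptions rather than the fix_raw() code quoted below: `struct fake_pmu`, `struct resolved_term` and `resolve_raw()` are hypothetical, and unlike the quoted code this version returns an error instead of asserting when the hex digits fail to parse.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum term_kind { TERM_EVENT, TERM_CONFIG };

struct resolved_term {
	enum term_kind kind;
	uint64_t config;	/* meaningful only for TERM_CONFIG */
};

/* Hypothetical stand-in for a PMU and its list of event (alias) names. */
struct fake_pmu {
	const char *const *aliases;
	size_t nr_aliases;
};

static bool pmu_has_event(const struct fake_pmu *pmu, const char *name)
{
	for (size_t i = 0; i < pmu->nr_aliases; i++) {
		if (!strcmp(pmu->aliases[i], name))
			return true;
	}
	return false;
}

/*
 * Resolve a raw-looking string such as "read" or "r0xead" once the PMU is
 * known: an event of that name on the PMU has priority, otherwise the text
 * after the leading 'r' is parsed as a hexadecimal config value.
 */
static int resolve_raw(const struct fake_pmu *pmu, const char *str,
		       struct resolved_term *out)
{
	unsigned long long num;
	char *end;

	assert(pmu && str && out);
	if (pmu_has_event(pmu, str)) {
		out->kind = TERM_EVENT;
		out->config = 0;
		return 0;
	}
	if (str[0] != 'r' || str[1] == '\0')
		return -EINVAL;

	errno = 0;
	num = strtoull(str + 1, &end, 16);
	if (errno || *end != '\0')
		return -EINVAL;

	out->kind = TERM_CONFIG;
	out->config = num;
	return 0;
}
```

On a PMU whose alias list contains "read" the string resolves to the event, while on a PMU without it both "read" and "r0xead" resolve to the config value 0xead. Only the aliases of the one PMU named in the event are scanned.
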

> ---
> tools/perf/tests/parse-events.c | 24 +--
> tools/perf/tests/pmu-events.c | 9 -
> tools/perf/util/parse-events.c | 329 ++++++++++----------------------
> tools/perf/util/parse-events.h | 16 +-
> tools/perf/util/parse-events.l | 85 +--------
> tools/perf/util/parse-events.y | 143 +++++---------
> 6 files changed, 170 insertions(+), 436 deletions(-)
>
> diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
> index 177464793aa8..6eadb8a47dbf 100644
> --- a/tools/perf/tests/parse-events.c
> +++ b/tools/perf/tests/parse-events.c
> @@ -664,11 +664,11 @@ static int test__checkterms_simple(struct list_head *terms)
> */
> term = list_entry(term->list.next, struct parse_events_term, list);
> TEST_ASSERT_VAL("wrong type term",
> - term->type_term == PARSE_EVENTS__TERM_TYPE_USER);
> + term->type_term == PARSE_EVENTS__TERM_TYPE_RAW);
> TEST_ASSERT_VAL("wrong type val",
> - term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
> - TEST_ASSERT_VAL("wrong val", term->val.num == 1);
> - TEST_ASSERT_VAL("wrong config", !strcmp(term->config, "read"));
> + term->type_val == PARSE_EVENTS__TERM_TYPE_STR);
> + TEST_ASSERT_VAL("wrong val", !strcmp(term->val.str, "read"));
> + TEST_ASSERT_VAL("wrong config", !strcmp(term->config, "raw"));
>
> /*
> * r0xead
> @@ -678,11 +678,11 @@ static int test__checkterms_simple(struct list_head *terms)
> */
> term = list_entry(term->list.next, struct parse_events_term, list);
> TEST_ASSERT_VAL("wrong type term",
> - term->type_term == PARSE_EVENTS__TERM_TYPE_CONFIG);
> + term->type_term == PARSE_EVENTS__TERM_TYPE_RAW);
> TEST_ASSERT_VAL("wrong type val",
> - term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
> - TEST_ASSERT_VAL("wrong val", term->val.num == 0xead);
> - TEST_ASSERT_VAL("wrong config", !strcmp(term->config, "config"));
> + term->type_val == PARSE_EVENTS__TERM_TYPE_STR);
> + TEST_ASSERT_VAL("wrong val", !strcmp(term->val.str, "r0xead"));
> + TEST_ASSERT_VAL("wrong config", !strcmp(term->config, "raw"));
> return TEST_OK;
> }
>
> @@ -2090,7 +2090,6 @@ static int test_event_fake_pmu(const char *str)
> return -ENOMEM;
>
> parse_events_error__init(&err);
> - perf_pmu__test_parse_init();
> ret = __parse_events(evlist, str, &err, &perf_pmu__fake, /*warn_if_reordered=*/true);
> if (ret) {
> pr_debug("failed to parse event '%s', err %d, str '%s'\n",
> @@ -2144,13 +2143,6 @@ static int test_term(const struct terms_test *t)
>
> INIT_LIST_HEAD(&terms);
>
> - /*
> - * The perf_pmu__test_parse_init prepares perf_pmu_events_list
> - * which gets freed in parse_events_terms.
> - */
> - if (perf_pmu__test_parse_init())
> - return -1;
> -
> ret = parse_events_terms(&terms, t->str);
> if (ret) {
> pr_debug("failed to parse terms '%s', err %d\n",
> diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
> index 1dff863b9711..a2cde61b1c77 100644
> --- a/tools/perf/tests/pmu-events.c
> +++ b/tools/perf/tests/pmu-events.c
> @@ -776,15 +776,6 @@ static int check_parse_id(const char *id, struct parse_events_error *error,
> for (cur = strchr(dup, '@') ; cur; cur = strchr(++cur, '@'))
> *cur = '/';
>
> - if (fake_pmu) {
> - /*
> - * Every call to __parse_events will try to initialize the PMU
> - * state from sysfs and then clean it up at the end. Reset the
> - * PMU events to the test state so that we don't pick up
> - * erroneous prefixes and suffixes.
> - */
> - perf_pmu__test_parse_init();
> - }
> ret = __parse_events(evlist, dup, error, fake_pmu, /*warn_if_reordered=*/true);
> free(dup);
>
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 4ba01577618e..e416e653cf74 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -34,11 +34,6 @@
>
> #define MAX_NAME_LEN 100
>
> -struct perf_pmu_event_symbol {
> - char *symbol;
> - enum perf_pmu_event_symbol_type type;
> -};
> -
> #ifdef PARSER_DEBUG
> extern int parse_events_debug;
> #endif
> @@ -49,15 +44,6 @@ static int parse_events__with_hybrid_pmu(struct parse_events_state *parse_state,
> const char *str, char *pmu_name,
> struct list_head *list);
>
> -static struct perf_pmu_event_symbol *perf_pmu_events_list;
> -/*
> - * The variable indicates the number of supported pmu event symbols.
> - * 0 means not initialized and ready to init
> - * -1 means failed to init, don't try anymore
> - * >0 is the number of supported pmu event symbols
> - */
> -static int perf_pmu_events_list_num;
> -
> struct event_symbol event_symbols_hw[PERF_COUNT_HW_MAX] = {
> [PERF_COUNT_HW_CPU_CYCLES] = {
> .symbol = "cpu-cycles",
> @@ -236,6 +222,57 @@ static char *get_config_name(struct list_head *head_terms)
> return get_config_str(head_terms, PARSE_EVENTS__TERM_TYPE_NAME);
> }
>
> +/**
> + * fix_raw - For each raw term see if there is an event (aka alias) in pmu that
> + * matches the raw's string value. If the string value matches an
> + * event then change the term to be an event, if not then change it to
> + * be a config term. For example, "read" may be an event of the PMU or
> + * a raw hex encoding of 0xead. The fix-up is done late so the PMU of
> + * the event can be determined and we don't need to scan all PMUs
> + * ahead-of-time.
> + * @config_terms: the list of terms that may contain a raw term.
> + * @pmu: the PMU to scan for events from.
> + */
> +static void fix_raw(struct list_head *config_terms, struct perf_pmu *pmu)
> +{
> + struct parse_events_term *term;
> +
> + list_for_each_entry(term, config_terms, list) {
> + struct perf_pmu_alias *alias;
> + bool matched = false;
> +
> + if (term->type_term != PARSE_EVENTS__TERM_TYPE_RAW)
> + continue;
> +
> + list_for_each_entry(alias, &pmu->aliases, list) {
> + if (!strcmp(alias->name, term->val.str)) {
> + free(term->config);
> + term->config = term->val.str;
> + term->type_val = PARSE_EVENTS__TERM_TYPE_NUM;
> + term->type_term = PARSE_EVENTS__TERM_TYPE_USER;
> + term->val.num = 1;
> + term->no_value = true;
> + matched = true;
> + break;
> + }
> + }
> + if (!matched) {
> + u64 num;
> +
> + free(term->config);
> + term->config=strdup("config");
> + errno = 0;
> + num = strtoull(term->val.str + 1, NULL, 16);
> + assert(errno == 0);
> + free(term->val.str);
> + term->type_val = PARSE_EVENTS__TERM_TYPE_NUM;
> + term->type_term = PARSE_EVENTS__TERM_TYPE_CONFIG;
> + term->val.num = num;
> + term->no_value = false;
> + }
> + }
> +}
> +
> static struct evsel *
> __add_event(struct list_head *list, int *idx,
> struct perf_event_attr *attr,
> @@ -328,18 +365,27 @@ static int add_event_tool(struct list_head *list, int *idx,
> return 0;
> }
>
> -static int parse_aliases(char *str, const char *const names[][EVSEL__MAX_ALIASES], int size)
> +/**
> + * parse_aliases - search names for entries beginning or equalling str ignoring
> + * case. If multiple entries in names match str then the longest
> + * is chosen.
> + * @str: The needle to look for.
> + * @names: The haystack to search.
> + * @size: The size of the haystack.
> + * @longest: Out argument giving the length of the matching entry.
> + */
> +static int parse_aliases(const char *str, const char *const names[][EVSEL__MAX_ALIASES], int size,
> + int *longest)
> {
> - int i, j;
> - int n, longest = -1;
> + *longest = -1;
> + for (int i = 0; i < size; i++) {
> + for (int j = 0; j < EVSEL__MAX_ALIASES && names[i][j]; j++) {
> + int n = strlen(names[i][j]);
>
> - for (i = 0; i < size; i++) {
> - for (j = 0; j < EVSEL__MAX_ALIASES && names[i][j]; j++) {
> - n = strlen(names[i][j]);
> - if (n > longest && !strncasecmp(str, names[i][j], n))
> - longest = n;
> + if (n > *longest && !strncasecmp(str, names[i][j], n))
> + *longest = n;
> }
> - if (longest > 0)
> + if (*longest > 0)
> return i;
> }
>
> @@ -357,52 +403,58 @@ static int config_attr(struct perf_event_attr *attr,
> struct parse_events_error *err,
> config_term_func_t config_term);
>
> -int parse_events_add_cache(struct list_head *list, int *idx,
> - char *type, char *op_result1, char *op_result2,
> +int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
> struct parse_events_error *err,
> struct list_head *head_config,
> struct parse_events_state *parse_state)
> {
> struct perf_event_attr attr;
> LIST_HEAD(config_terms);
> - char name[MAX_NAME_LEN];
> const char *config_name, *metric_id;
> int cache_type = -1, cache_op = -1, cache_result = -1;
> - char *op_result[2] = { op_result1, op_result2 };
> - int i, n, ret;
> + int ret, len;
> + const char *name_end = &name[strlen(name) + 1];
> bool hybrid;
> + const char *str = name;
>
> /*
> - * No fallback - if we cannot get a clear cache type
> - * then bail out:
> + * Search str for the legacy cache event name composed of 1, 2 or 3
> + * hyphen separated sections. The first section is the cache type while
> + * the others are the optional op and optional result. To make life hard
> + * the names in the table also contain hyphens and the longest name
> + * should always be selected.
> */
> - cache_type = parse_aliases(type, evsel__hw_cache, PERF_COUNT_HW_CACHE_MAX);
> + cache_type = parse_aliases(str, evsel__hw_cache, PERF_COUNT_HW_CACHE_MAX, &len);
> if (cache_type == -1)
> return -EINVAL;
> + str += len + 1;
>
> config_name = get_config_name(head_config);
> - n = snprintf(name, MAX_NAME_LEN, "%s", type);
> -
> - for (i = 0; (i < 2) && (op_result[i]); i++) {
> - char *str = op_result[i];
> -
> - n += snprintf(name + n, MAX_NAME_LEN - n, "-%s", str);
> -
> - if (cache_op == -1) {
> + if (str < name_end) {
> + cache_op = parse_aliases(str, evsel__hw_cache_op,
> + PERF_COUNT_HW_CACHE_OP_MAX, &len);
> + if (cache_op >= 0) {
> + if (!evsel__is_cache_op_valid(cache_type, cache_op))
> + return -EINVAL;
> + str += len + 1;
> + } else {
> + cache_result = parse_aliases(str, evsel__hw_cache_result,
> + PERF_COUNT_HW_CACHE_RESULT_MAX, &len);
> + if (cache_result >= 0)
> + str += len + 1;
> + }
> + }
> + if (str < name_end) {
> + if (cache_op < 0) {
> cache_op = parse_aliases(str, evsel__hw_cache_op,
> - PERF_COUNT_HW_CACHE_OP_MAX);
> + PERF_COUNT_HW_CACHE_OP_MAX, &len);
> if (cache_op >= 0) {
> if (!evsel__is_cache_op_valid(cache_type, cache_op))
> return -EINVAL;
> - continue;
> }
> - }
> -
> - if (cache_result == -1) {
> + } else if (cache_result < 0) {
> cache_result = parse_aliases(str, evsel__hw_cache_result,
> - PERF_COUNT_HW_CACHE_RESULT_MAX);
> - if (cache_result >= 0)
> - continue;
> + PERF_COUNT_HW_CACHE_RESULT_MAX, &len);
> }
> }
>
> @@ -968,6 +1020,7 @@ static const char *config_term_names[__PARSE_EVENTS__TERM_TYPE_NR] = {
> [PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT] = "aux-output",
> [PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE] = "aux-sample-size",
> [PARSE_EVENTS__TERM_TYPE_METRIC_ID] = "metric-id",
> + [PARSE_EVENTS__TERM_TYPE_RAW] = "raw",
> };
>
> static bool config_term_shrinked;
> @@ -1089,6 +1142,9 @@ do { \
> case PARSE_EVENTS__TERM_TYPE_METRIC_ID:
> CHECK_TYPE_VAL(STR);
> break;
> + case PARSE_EVENTS__TERM_TYPE_RAW:
> + CHECK_TYPE_VAL(STR);
> + break;
> case PARSE_EVENTS__TERM_TYPE_MAX_STACK:
> CHECK_TYPE_VAL(NUM);
> break;
> @@ -1485,6 +1541,8 @@ int parse_events_add_pmu(struct parse_events_state *parse_state,
> parse_events_error__handle(err, 0, err_str, NULL);
> return -EINVAL;
> }
> + if (head_config)
> + fix_raw(head_config, pmu);
>
> if (pmu->default_config) {
> memcpy(&attr, pmu->default_config,
> @@ -1875,180 +1933,6 @@ int parse_events_name(struct list_head *list, const char *name)
> return 0;
> }
>
> -static int
> -comp_pmu(const void *p1, const void *p2)
> -{
> - struct perf_pmu_event_symbol *pmu1 = (struct perf_pmu_event_symbol *) p1;
> - struct perf_pmu_event_symbol *pmu2 = (struct perf_pmu_event_symbol *) p2;
> -
> - return strcasecmp(pmu1->symbol, pmu2->symbol);
> -}
> -
> -static void perf_pmu__parse_cleanup(void)
> -{
> - if (perf_pmu_events_list_num > 0) {
> - struct perf_pmu_event_symbol *p;
> - int i;
> -
> - for (i = 0; i < perf_pmu_events_list_num; i++) {
> - p = perf_pmu_events_list + i;
> - zfree(&p->symbol);
> - }
> - zfree(&perf_pmu_events_list);
> - perf_pmu_events_list_num = 0;
> - }
> -}
> -
> -#define SET_SYMBOL(str, stype) \
> -do { \
> - p->symbol = str; \
> - if (!p->symbol) \
> - goto err; \
> - p->type = stype; \
> -} while (0)
> -
> -/*
> - * Read the pmu events list from sysfs
> - * Save it into perf_pmu_events_list
> - */
> -static void perf_pmu__parse_init(void)
> -{
> -
> - struct perf_pmu *pmu = NULL;
> - struct perf_pmu_alias *alias;
> - int len = 0;
> -
> - pmu = NULL;
> - while ((pmu = perf_pmu__scan(pmu)) != NULL) {
> - list_for_each_entry(alias, &pmu->aliases, list) {
> - char *tmp = strchr(alias->name, '-');
> -
> - if (tmp) {
> - char *tmp2 = NULL;
> -
> - tmp2 = strchr(tmp + 1, '-');
> - len++;
> - if (tmp2)
> - len++;
> - }
> -
> - len++;
> - }
> - }
> -
> - if (len == 0) {
> - perf_pmu_events_list_num = -1;
> - return;
> - }
> - perf_pmu_events_list = malloc(sizeof(struct perf_pmu_event_symbol) * len);
> - if (!perf_pmu_events_list)
> - return;
> - perf_pmu_events_list_num = len;
> -
> - len = 0;
> - pmu = NULL;
> - while ((pmu = perf_pmu__scan(pmu)) != NULL) {
> - list_for_each_entry(alias, &pmu->aliases, list) {
> - struct perf_pmu_event_symbol *p = perf_pmu_events_list + len;
> - char *tmp = strchr(alias->name, '-');
> - char *tmp2 = NULL;
> -
> - if (tmp)
> - tmp2 = strchr(tmp + 1, '-');
> - if (tmp2) {
> - SET_SYMBOL(strndup(alias->name, tmp - alias->name),
> - PMU_EVENT_SYMBOL_PREFIX);
> - p++;
> - tmp++;
> - SET_SYMBOL(strndup(tmp, tmp2 - tmp), PMU_EVENT_SYMBOL_SUFFIX);
> - p++;
> - SET_SYMBOL(strdup(++tmp2), PMU_EVENT_SYMBOL_SUFFIX2);
> - len += 3;
> - } else if (tmp) {
> - SET_SYMBOL(strndup(alias->name, tmp - alias->name),
> - PMU_EVENT_SYMBOL_PREFIX);
> - p++;
> - SET_SYMBOL(strdup(++tmp), PMU_EVENT_SYMBOL_SUFFIX);
> - len += 2;
> - } else {
> - SET_SYMBOL(strdup(alias->name), PMU_EVENT_SYMBOL);
> - len++;
> - }
> - }
> - }
> - qsort(perf_pmu_events_list, len,
> - sizeof(struct perf_pmu_event_symbol), comp_pmu);
> -
> - return;
> -err:
> - perf_pmu__parse_cleanup();
> -}
> -
> -/*
> - * This function injects special term in
> - * perf_pmu_events_list so the test code
> - * can check on this functionality.
> - */
> -int perf_pmu__test_parse_init(void)
> -{
> - struct perf_pmu_event_symbol *list, *tmp, symbols[] = {
> - {(char *)"read", PMU_EVENT_SYMBOL},
> - {(char *)"event", PMU_EVENT_SYMBOL_PREFIX},
> - {(char *)"two", PMU_EVENT_SYMBOL_SUFFIX},
> - {(char *)"hyphen", PMU_EVENT_SYMBOL_SUFFIX},
> - {(char *)"hyph", PMU_EVENT_SYMBOL_SUFFIX2},
> - };
> - unsigned long i, j;
> -
> - tmp = list = malloc(sizeof(*list) * ARRAY_SIZE(symbols));
> - if (!list)
> - return -ENOMEM;
> -
> - for (i = 0; i < ARRAY_SIZE(symbols); i++, tmp++) {
> - tmp->type = symbols[i].type;
> - tmp->symbol = strdup(symbols[i].symbol);
> - if (!tmp->symbol)
> - goto err_free;
> - }
> -
> - perf_pmu_events_list = list;
> - perf_pmu_events_list_num = ARRAY_SIZE(symbols);
> -
> - qsort(perf_pmu_events_list, ARRAY_SIZE(symbols),
> - sizeof(struct perf_pmu_event_symbol), comp_pmu);
> - return 0;
> -
> -err_free:
> - for (j = 0, tmp = list; j < i; j++, tmp++)
> - zfree(&tmp->symbol);
> - free(list);
> - return -ENOMEM;
> -}
> -
> -enum perf_pmu_event_symbol_type
> -perf_pmu__parse_check(const char *name)
> -{
> - struct perf_pmu_event_symbol p, *r;
> -
> - /* scan kernel pmu events from sysfs if needed */
> - if (perf_pmu_events_list_num == 0)
> - perf_pmu__parse_init();
> - /*
> - * name "cpu" could be prefix of cpu-cycles or cpu// events.
> - * cpu-cycles has been handled by hardcode.
> - * So it must be cpu// events, not kernel pmu event.
> - */
> - if ((perf_pmu_events_list_num <= 0) || !strcmp(name, "cpu"))
> - return PMU_EVENT_SYMBOL_ERR;
> -
> - p.symbol = strdup(name);
> - r = bsearch(&p, perf_pmu_events_list,
> - (size_t) perf_pmu_events_list_num,
> - sizeof(struct perf_pmu_event_symbol), comp_pmu);
> - zfree(&p.symbol);
> - return r ? r->type : PMU_EVENT_SYMBOL_ERR;
> -}
> -
> static int parse_events__scanner(const char *str,
> struct parse_events_state *parse_state)
> {
> @@ -2086,7 +1970,6 @@ int parse_events_terms(struct list_head *terms, const char *str)
> int ret;
>
> ret = parse_events__scanner(str, &parse_state);
> - perf_pmu__parse_cleanup();
>
> if (!ret) {
> list_splice(parse_state.terms, terms);
> @@ -2111,7 +1994,6 @@ static int parse_events__with_hybrid_pmu(struct parse_events_state *parse_state,
> int ret;
>
> ret = parse_events__scanner(str, &ps);
> - perf_pmu__parse_cleanup();
>
> if (!ret) {
> if (!list_empty(&ps.list)) {
> @@ -2267,7 +2149,6 @@ int __parse_events(struct evlist *evlist, const char *str,
> int ret;
>
> ret = parse_events__scanner(str, &parse_state);
> - perf_pmu__parse_cleanup();
>
> if (!ret && list_empty(&parse_state.list)) {
> WARN_ONCE(true, "WARNING: event parser found nothing\n");
> diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
> index 86ad4438a2aa..f638542c8638 100644
> --- a/tools/perf/util/parse-events.h
> +++ b/tools/perf/util/parse-events.h
> @@ -41,14 +41,6 @@ int parse_events_terms(struct list_head *terms, const char *str);
> int parse_filter(const struct option *opt, const char *str, int unset);
> int exclude_perf(const struct option *opt, const char *arg, int unset);
>
> -enum perf_pmu_event_symbol_type {
> - PMU_EVENT_SYMBOL_ERR, /* not a PMU EVENT */
> - PMU_EVENT_SYMBOL, /* normal style PMU event */
> - PMU_EVENT_SYMBOL_PREFIX, /* prefix of pre-suf style event */
> - PMU_EVENT_SYMBOL_SUFFIX, /* suffix of pre-suf style event */
> - PMU_EVENT_SYMBOL_SUFFIX2, /* suffix of pre-suf2 style event */
> -};
> -
> enum {
> PARSE_EVENTS__TERM_TYPE_NUM,
> PARSE_EVENTS__TERM_TYPE_STR,
> @@ -78,6 +70,7 @@ enum {
> PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT,
> PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE,
> PARSE_EVENTS__TERM_TYPE_METRIC_ID,
> + PARSE_EVENTS__TERM_TYPE_RAW,
> __PARSE_EVENTS__TERM_TYPE_NR,
> };
>
> @@ -174,8 +167,7 @@ int parse_events_add_numeric(struct parse_events_state *parse_state,
> int parse_events_add_tool(struct parse_events_state *parse_state,
> struct list_head *list,
> int tool_event);
> -int parse_events_add_cache(struct list_head *list, int *idx,
> - char *type, char *op_result1, char *op_result2,
> +int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
> struct parse_events_error *error,
> struct list_head *head_config,
> struct parse_events_state *parse_state);
> @@ -198,8 +190,6 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
> int parse_events_copy_term_list(struct list_head *old,
> struct list_head **new);
>
> -enum perf_pmu_event_symbol_type
> -perf_pmu__parse_check(const char *name);
> void parse_events__set_leader(char *name, struct list_head *list);
> void parse_events_update_lists(struct list_head *list_event,
> struct list_head *list_all);
> @@ -241,8 +231,6 @@ static inline bool is_sdt_event(char *str __maybe_unused)
> }
> #endif /* HAVE_LIBELF_SUPPORT */
>
> -int perf_pmu__test_parse_init(void);
> -
> struct evsel *parse_events__add_event_hybrid(struct list_head *list, int *idx,
> struct perf_event_attr *attr,
> const char *name,
> diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> index 51fe0a9fb3de..4b35c099189a 100644
> --- a/tools/perf/util/parse-events.l
> +++ b/tools/perf/util/parse-events.l
> @@ -63,17 +63,6 @@ static int str(yyscan_t scanner, int token)
> return token;
> }
>
> -static int raw(yyscan_t scanner)
> -{
> - YYSTYPE *yylval = parse_events_get_lval(scanner);
> - char *text = parse_events_get_text(scanner);
> -
> - if (perf_pmu__parse_check(text) == PMU_EVENT_SYMBOL)
> - return str(scanner, PE_NAME);
> -
> - return __value(yylval, text + 1, 16, PE_RAW);
> -}
> -
> static bool isbpf_suffix(char *text)
> {
> int len = strlen(text);
> @@ -131,35 +120,6 @@ do { \
> yyless(0); \
> } while (0)
>
> -static int pmu_str_check(yyscan_t scanner, struct parse_events_state *parse_state)
> -{
> - YYSTYPE *yylval = parse_events_get_lval(scanner);
> - char *text = parse_events_get_text(scanner);
> -
> - yylval->str = strdup(text);
> -
> - /*
> - * If we're not testing then parse check determines the PMU event type
> - * which if it isn't a PMU returns PE_NAME. When testing the result of
> - * parse check can't be trusted so we return PE_PMU_EVENT_FAKE unless
> - * an '!' is present in which case the text can't be a PMU name.
> - */
> - switch (perf_pmu__parse_check(text)) {
> - case PMU_EVENT_SYMBOL_PREFIX:
> - return PE_PMU_EVENT_PRE;
> - case PMU_EVENT_SYMBOL_SUFFIX:
> - return PE_PMU_EVENT_SUF;
> - case PMU_EVENT_SYMBOL_SUFFIX2:
> - return PE_PMU_EVENT_SUF2;
> - case PMU_EVENT_SYMBOL:
> - return parse_state->fake_pmu
> - ? PE_PMU_EVENT_FAKE : PE_KERNEL_PMU_EVENT;
> - default:
> - return parse_state->fake_pmu && !strchr(text,'!')
> - ? PE_PMU_EVENT_FAKE : PE_NAME;
> - }
> -}
> -
> static int sym(yyscan_t scanner, int type, int config)
> {
> YYSTYPE *yylval = parse_events_get_lval(scanner);
> @@ -211,13 +171,15 @@ bpf_source [^,{}]+\.c[a-zA-Z0-9._]*
> num_dec [0-9]+
> num_hex 0x[a-fA-F0-9]+
> num_raw_hex [a-fA-F0-9]+
> -name [a-zA-Z_*?\[\]][a-zA-Z0-9_*?.\[\]!]*
> +name [a-zA-Z_*?\[\]][a-zA-Z0-9_*?.\[\]!\-]*
> name_tag [\'][a-zA-Z_*?\[\]][a-zA-Z0-9_*?\-,\.\[\]:=]*[\']
> name_minus [a-zA-Z_*?][a-zA-Z0-9\-_*?.:]*
> drv_cfg_term [a-zA-Z0-9_\.]+(=[a-zA-Z0-9_*?\.:]+)?
> /* If you add a modifier you need to update check_modifier() */
> modifier_event [ukhpPGHSDIWeb]+
> modifier_bp [rwx]{1,3}
> +lc_type (L1-dcache|l1-d|l1d|L1-data|L1-icache|l1-i|l1i|L1-instruction|LLC|L2|dTLB|d-tlb|Data-TLB|iTLB|i-tlb|Instruction-TLB|branch|branches|bpu|btb|bpc|node)
> +lc_op_result (load|loads|read|store|stores|write|prefetch|prefetches|speculative-read|speculative-load|refs|Reference|ops|access|misses|miss)
>
> %%
>
> @@ -303,8 +265,8 @@ percore { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_PERCORE); }
> aux-output { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT); }
> aux-sample-size { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE); }
> metric-id { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_METRIC_ID); }
> -r{num_raw_hex} { return raw(yyscanner); }
> -r0x{num_raw_hex} { return raw(yyscanner); }
> +r{num_raw_hex} { return str(yyscanner, PE_RAW); }
> +r0x{num_raw_hex} { return str(yyscanner, PE_RAW); }
> , { return ','; }
> "/" { BEGIN(INITIAL); return '/'; }
> {name_minus} { return str(yyscanner, PE_NAME); }
> @@ -359,47 +321,20 @@ system_time { return tool(yyscanner, PERF_TOOL_SYSTEM_TIME); }
> bpf-output { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_BPF_OUTPUT); }
> cgroup-switches { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CGROUP_SWITCHES); }
>
> - /*
> - * We have to handle the kernel PMU event cycles-ct/cycles-t/mem-loads/mem-stores separately.
> - * Because the prefix cycles is mixed up with cpu-cycles.
> - * loads and stores are mixed up with cache event
> - */
> -cycles-ct |
> -cycles-t |
> -mem-loads |
> -mem-loads-aux |
> -mem-stores |
> -topdown-[a-z-]+ |
> -tx-capacity-[a-z-]+ |
> -el-capacity-[a-z-]+ { return str(yyscanner, PE_KERNEL_PMU_EVENT); }
> -
> -L1-dcache|l1-d|l1d|L1-data |
> -L1-icache|l1-i|l1i|L1-instruction |
> -LLC|L2 |
> -dTLB|d-tlb|Data-TLB |
> -iTLB|i-tlb|Instruction-TLB |
> -branch|branches|bpu|btb|bpc |
> -node { return str(yyscanner, PE_NAME_CACHE_TYPE); }
> -
> -load|loads|read |
> -store|stores|write |
> -prefetch|prefetches |
> -speculative-read|speculative-load |
> -refs|Reference|ops|access |
> -misses|miss { return str(yyscanner, PE_NAME_CACHE_OP_RESULT); }
> -
> +{lc_type} { return str(yyscanner, PE_LEGACY_CACHE); }
> +{lc_type}-{lc_op_result} { return str(yyscanner, PE_LEGACY_CACHE); }
> +{lc_type}-{lc_op_result}-{lc_op_result} { return str(yyscanner, PE_LEGACY_CACHE); }
> mem: { BEGIN(mem); return PE_PREFIX_MEM; }
> -r{num_raw_hex} { return raw(yyscanner); }
> +r{num_raw_hex} { return str(yyscanner, PE_RAW); }
> {num_dec} { return value(yyscanner, 10); }
> {num_hex} { return value(yyscanner, 16); }
>
> {modifier_event} { return str(yyscanner, PE_MODIFIER_EVENT); }
> {bpf_object} { if (!isbpf(yyscanner)) { USER_REJECT }; return str(yyscanner, PE_BPF_OBJECT); }
> {bpf_source} { if (!isbpf(yyscanner)) { USER_REJECT }; return str(yyscanner, PE_BPF_SOURCE); }
> -{name} { return pmu_str_check(yyscanner, _parse_state); }
> +{name} { return str(yyscanner, PE_NAME); }
> {name_tag} { return str(yyscanner, PE_NAME); }
> "/" { BEGIN(config); return '/'; }
> -- { return '-'; }
> , { BEGIN(event); return ','; }
> : { return ':'; }
> "{" { BEGIN(event); return '{'; }
> diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
> index 4488443e506e..e7072b5601c5 100644
> --- a/tools/perf/util/parse-events.y
> +++ b/tools/perf/util/parse-events.y
> @@ -8,6 +8,7 @@
>
> #define YYDEBUG 1
>
> +#include <errno.h>
> #include <fnmatch.h>
> #include <stdio.h>
> #include <linux/compiler.h>
> @@ -52,36 +53,35 @@ static void free_list_evsel(struct list_head* list_evsel)
> %}
>
> %token PE_START_EVENTS PE_START_TERMS
> -%token PE_VALUE PE_VALUE_SYM_HW PE_VALUE_SYM_SW PE_RAW PE_TERM
> +%token PE_VALUE PE_VALUE_SYM_HW PE_VALUE_SYM_SW PE_TERM
> %token PE_VALUE_SYM_TOOL
> %token PE_EVENT_NAME
> -%token PE_NAME
> +%token PE_RAW PE_NAME
> %token PE_BPF_OBJECT PE_BPF_SOURCE
> %token PE_MODIFIER_EVENT PE_MODIFIER_BP
> -%token PE_NAME_CACHE_TYPE PE_NAME_CACHE_OP_RESULT
> +%token PE_LEGACY_CACHE
> %token PE_PREFIX_MEM PE_PREFIX_RAW PE_PREFIX_GROUP
> %token PE_ERROR
> -%token PE_PMU_EVENT_PRE PE_PMU_EVENT_SUF PE_PMU_EVENT_SUF2 PE_KERNEL_PMU_EVENT PE_PMU_EVENT_FAKE
> +%token PE_KERNEL_PMU_EVENT PE_PMU_EVENT_FAKE
> %token PE_ARRAY_ALL PE_ARRAY_RANGE
> %token PE_DRV_CFG_TERM
> %type <num> PE_VALUE
> %type <num> PE_VALUE_SYM_HW
> %type <num> PE_VALUE_SYM_SW
> %type <num> PE_VALUE_SYM_TOOL
> -%type <num> PE_RAW
> %type <num> PE_TERM
> %type <num> value_sym
> +%type <str> PE_RAW
> %type <str> PE_NAME
> %type <str> PE_BPF_OBJECT
> %type <str> PE_BPF_SOURCE
> -%type <str> PE_NAME_CACHE_TYPE
> -%type <str> PE_NAME_CACHE_OP_RESULT
> +%type <str> PE_LEGACY_CACHE
> %type <str> PE_MODIFIER_EVENT
> %type <str> PE_MODIFIER_BP
> %type <str> PE_EVENT_NAME
> -%type <str> PE_PMU_EVENT_PRE PE_PMU_EVENT_SUF PE_PMU_EVENT_SUF2 PE_KERNEL_PMU_EVENT PE_PMU_EVENT_FAKE
> +%type <str> PE_KERNEL_PMU_EVENT PE_PMU_EVENT_FAKE
> %type <str> PE_DRV_CFG_TERM
> -%type <str> event_pmu_name
> +%type <str> name_or_raw
> %destructor { free ($$); } <str>
> %type <term> event_term
> %destructor { parse_events_term__delete ($$); } <term>
> @@ -273,11 +273,8 @@ event_def: event_pmu |
> event_legacy_raw sep_dc |
> event_bpf_file
>
> -event_pmu_name:
> -PE_NAME | PE_PMU_EVENT_PRE
> -
> event_pmu:
> -event_pmu_name opt_pmu_config
> +PE_NAME opt_pmu_config
> {
> struct parse_events_state *parse_state = _parse_state;
> struct parse_events_error *error = parse_state->error;
> @@ -303,10 +300,12 @@ event_pmu_name opt_pmu_config
> list = alloc_list();
> if (!list)
> CLEANUP_YYABORT;
> + /* Attempt to add to list assuming $1 is a PMU name. */
> if (parse_events_add_pmu(_parse_state, list, $1, $2, /*auto_merge_stats=*/false)) {
> struct perf_pmu *pmu = NULL;
> int ok = 0;
>
> + /* Failure to add, try wildcard expansion of $1 as a PMU name. */
> if (asprintf(&pattern, "%s*", $1) < 0)
> CLEANUP_YYABORT;
>
> @@ -329,6 +328,12 @@ event_pmu_name opt_pmu_config
> }
> }
>
> + if (!ok) {
> + /* Failure to add, assume $1 is an event name. */
> + zfree(&list);
> + ok = !parse_events_multi_pmu_add(_parse_state, $1, $2, &list);
> + $2 = NULL;
> + }
> if (!ok)
> CLEANUP_YYABORT;
> }
> @@ -352,41 +357,27 @@ PE_KERNEL_PMU_EVENT sep_dc
> $$ = list;
> }
> |
> -PE_KERNEL_PMU_EVENT opt_pmu_config
> +PE_NAME sep_dc
> {
> struct list_head *list;
> int err;
>
> - /* frees $2 */
> - err = parse_events_multi_pmu_add(_parse_state, $1, $2, &list);
> + err = parse_events_multi_pmu_add(_parse_state, $1, NULL, &list);
> free($1);
> if (err < 0)
> YYABORT;
> $$ = list;
> }
> |
> -PE_PMU_EVENT_PRE '-' PE_PMU_EVENT_SUF '-' PE_PMU_EVENT_SUF2 sep_dc
> -{
> - struct list_head *list;
> - char pmu_name[128];
> - snprintf(pmu_name, sizeof(pmu_name), "%s-%s-%s", $1, $3, $5);
> - free($1);
> - free($3);
> - free($5);
> - if (parse_events_multi_pmu_add(_parse_state, pmu_name, NULL, &list) < 0)
> - YYABORT;
> - $$ = list;
> -}
> -|
> -PE_PMU_EVENT_PRE '-' PE_PMU_EVENT_SUF sep_dc
> +PE_KERNEL_PMU_EVENT opt_pmu_config
> {
> struct list_head *list;
> - char pmu_name[128];
> + int err;
>
> - snprintf(pmu_name, sizeof(pmu_name), "%s-%s", $1, $3);
> + /* frees $2 */
> + err = parse_events_multi_pmu_add(_parse_state, $1, $2, &list);
> free($1);
> - free($3);
> - if (parse_events_multi_pmu_add(_parse_state, pmu_name, NULL, &list) < 0)
> + if (err < 0)
> YYABORT;
> $$ = list;
> }
> @@ -476,7 +467,7 @@ PE_VALUE_SYM_TOOL sep_slash_slash_dc
> }
>
> event_legacy_cache:
> -PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT '-' PE_NAME_CACHE_OP_RESULT opt_event_config
> +PE_LEGACY_CACHE opt_event_config
> {
> struct parse_events_state *parse_state = _parse_state;
> struct parse_events_error *error = parse_state->error;
> @@ -485,51 +476,8 @@ PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT '-' PE_NAME_CACHE_OP_RESULT opt_e
>
> list = alloc_list();
> ABORT_ON(!list);
> - err = parse_events_add_cache(list, &parse_state->idx, $1, $3, $5, error, $6,
> - parse_state);
> - parse_events_terms__delete($6);
> - free($1);
> - free($3);
> - free($5);
> - if (err) {
> - free_list_evsel(list);
> - YYABORT;
> - }
> - $$ = list;
> -}
> -|
> -PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT opt_event_config
> -{
> - struct parse_events_state *parse_state = _parse_state;
> - struct parse_events_error *error = parse_state->error;
> - struct list_head *list;
> - int err;
> + err = parse_events_add_cache(list, &parse_state->idx, $1, error, $2, parse_state);
>
> - list = alloc_list();
> - ABORT_ON(!list);
> - err = parse_events_add_cache(list, &parse_state->idx, $1, $3, NULL, error, $4,
> - parse_state);
> - parse_events_terms__delete($4);
> - free($1);
> - free($3);
> - if (err) {
> - free_list_evsel(list);
> - YYABORT;
> - }
> - $$ = list;
> -}
> -|
> -PE_NAME_CACHE_TYPE opt_event_config
> -{
> - struct parse_events_state *parse_state = _parse_state;
> - struct parse_events_error *error = parse_state->error;
> - struct list_head *list;
> - int err;
> -
> - list = alloc_list();
> - ABORT_ON(!list);
> - err = parse_events_add_cache(list, &parse_state->idx, $1, NULL, NULL, error, $2,
> - parse_state);
> parse_events_terms__delete($2);
> free($1);
> if (err) {
> @@ -633,17 +581,6 @@ tracepoint_name opt_event_config
> }
>
> tracepoint_name:
> -PE_NAME '-' PE_NAME ':' PE_NAME
> -{
> - struct tracepoint_name tracepoint;
> -
> - ABORT_ON(asprintf(&tracepoint.sys, "%s-%s", $1, $3) < 0);
> - tracepoint.event = $5;
> - free($1);
> - free($3);
> - $$ = tracepoint;
> -}
> -|
> PE_NAME ':' PE_NAME
> {
> struct tracepoint_name tracepoint = {$1, $3};
> @@ -673,10 +610,15 @@ PE_RAW opt_event_config
> {
> struct list_head *list;
> int err;
> + u64 num;
>
> list = alloc_list();
> ABORT_ON(!list);
> - err = parse_events_add_numeric(_parse_state, list, PERF_TYPE_RAW, $1, $2);
> + errno = 0;
> + num = strtoull($1 + 1, NULL, 16);
> + ABORT_ON(errno);
> + free($1);
> + err = parse_events_add_numeric(_parse_state, list, PERF_TYPE_RAW, num, $2);
> parse_events_terms__delete($2);
> if (err) {
> free(list);
> @@ -781,17 +723,22 @@ event_term
> $$ = head;
> }
>
> +name_or_raw: PE_RAW | PE_NAME
> +
> event_term:
> PE_RAW
> {
> struct parse_events_term *term;
>
> - ABORT_ON(parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_CONFIG,
> - NULL, $1, false, &@1, NULL));
> + if (parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_RAW,
> + strdup("raw"), $1, &@1, &@1)) {
> + free($1);
> + YYABORT;
> + }
> $$ = term;
> }
> |
> -PE_NAME '=' PE_NAME
> +name_or_raw '=' PE_NAME
> {
> struct parse_events_term *term;
>
> @@ -804,7 +751,7 @@ PE_NAME '=' PE_NAME
> $$ = term;
> }
> |
> -PE_NAME '=' PE_VALUE
> +name_or_raw '=' PE_VALUE
> {
> struct parse_events_term *term;
>
> @@ -816,7 +763,7 @@ PE_NAME '=' PE_VALUE
> $$ = term;
> }
> |
> -PE_NAME '=' PE_VALUE_SYM_HW
> +name_or_raw '=' PE_VALUE_SYM_HW
> {
> struct parse_events_term *term;
> int config = $3 & 255;
> @@ -876,7 +823,7 @@ PE_TERM
> $$ = term;
> }
> |
> -PE_NAME array '=' PE_NAME
> +name_or_raw array '=' PE_NAME
> {
> struct parse_events_term *term;
>
> @@ -891,7 +838,7 @@ PE_NAME array '=' PE_NAME
> $$ = term;
> }
> |
> -PE_NAME array '=' PE_VALUE
> +name_or_raw array '=' PE_VALUE
> {
> struct parse_events_term *term;
>

2023-04-27 20:17:09

by Liang, Kan

Subject: Re: [PATCH v1 16/40] perf test: Validate events with hyphens in



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> Rewritten event parsing can handle event names that contain components
> of legacy events.


Ran the test on Cascade Lake and Alder Lake; it looks good.

Tested-by: Kan Liang <[email protected]>

Thanks,
Kan

>
> Signed-off-by: Ian Rogers <[email protected]>
> ---
> tools/perf/tests/parse-events.c | 12 ------------
> 1 file changed, 12 deletions(-)
>
> diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
> index 6eadb8a47dbf..cb976765b8b0 100644
> --- a/tools/perf/tests/parse-events.c
> +++ b/tools/perf/tests/parse-events.c
> @@ -2198,18 +2198,6 @@ static int test__pmu_events(struct test_suite *test __maybe_unused, int subtest
> ret = combine_test_results(ret, TEST_SKIP);
> continue;
> }
> - /*
> - * Names containing '-' are recognized as prefixes and suffixes
> - * due to '-' being a legacy PMU separator. This fails when the
> - * prefix or suffix collides with an existing legacy token. For
> - * example, branch-brs has a prefix (branch) that collides with
> - * a PE_NAME_CACHE_TYPE token causing a parse error as a suffix
> - * isn't expected after this. As event names in the config
> - * slashes are allowed a '-' in the name we check this works
> - * above.
> - */
> - if (strchr(ent->d_name, '-'))
> - continue;
>
> dir = opendir(path);
> if (!dir) {
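
For readers following the lexer change behind this test, here is a minimal C sketch, not taken from the patch, of why a name such as branch-brs no longer collides with the legacy cache tokens. It assumes flex's usual longest-match rule (ties go to the earlier rule) and only models whether a complete name fits the {lc_type}[-{lc_op_result}[-{lc_op_result}]] shape from parse-events.l. looks_like_legacy_cache() and match_op_results() are hypothetical helpers, and the matching here is case-sensitive, which may differ from the real lexer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Alternatives copied from the lc_type and lc_op_result lexer definitions. */
static const char *const lc_types[] = {
	"L1-dcache", "l1-d", "l1d", "L1-data", "L1-icache", "l1-i", "l1i",
	"L1-instruction", "LLC", "L2", "dTLB", "d-tlb", "Data-TLB", "iTLB",
	"i-tlb", "Instruction-TLB", "branch", "branches", "bpu", "btb", "bpc",
	"node",
};

static const char *const lc_op_results[] = {
	"load", "loads", "read", "store", "stores", "write", "prefetch",
	"prefetches", "speculative-read", "speculative-load", "refs",
	"Reference", "ops", "access", "misses", "miss",
};

#define N_ELEMS(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Hypothetical helper: true if 'rest' is empty, or is at most 'remaining'
 * repetitions of "-<op_result>". Backtracks so that "load" vs "loads"
 * style prefixes are both tried.
 */
static bool match_op_results(const char *rest, int remaining)
{
	size_t i;

	if (*rest == '\0')
		return true;
	if (remaining == 0 || *rest != '-')
		return false;
	rest++;
	for (i = 0; i < N_ELEMS(lc_op_results); i++) {
		size_t len = strlen(lc_op_results[i]);

		if (!strncmp(rest, lc_op_results[i], len) &&
		    match_op_results(rest + len, remaining - 1))
			return true;
	}
	return false;
}

/*
 * Hypothetical helper: true if the whole name has the shape
 * lc_type, lc_type-op_result or lc_type-op_result-op_result.
 */
bool looks_like_legacy_cache(const char *name)
{
	size_t i;

	for (i = 0; i < N_ELEMS(lc_types); i++) {
		size_t len = strlen(lc_types[i]);

		if (!strncmp(name, lc_types[i], len) &&
		    match_op_results(name + len, 2))
			return true;
	}
	return false;
}
```

In this model L1-dcache-load-misses fits the legacy cache shape, while branch-brs only matches the type prefix "branch", so the longer {name} match would win and the string can be tried as a PMU or event name instead.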

2023-04-27 20:21:20

by Liang, Kan

Subject: Re: [PATCH v1 17/40] perf evsel: Modify group pmu name for software events



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> If we have a group of {cycles,faults} then we need the faults software
> event to appear to be on the same PMU as cycles so that we don't split
> the group in parse_events__sort_events_and_fix_groups. This case is
> relatively easy as cycles is the leader and will have a PMU name. In
> the reverse case, {faults,cycles} we still need faults to appear to
> have the PMU name of cycles but the old behavior is just to return
> "cpu". For hybrid this fails as cycles will be on "cpu_core" or
> "cpu_atom", causing faults to be split into a different group.
>
> Change the behavior for software events so that the whole group is
> searched for the named PMU.
>
> Signed-off-by: Ian Rogers <[email protected]>

Reviewed-by: Kan Liang <[email protected]>

Thanks,
Kan

> ---
> tools/perf/util/evsel.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 1cd04b5998d2..63522322e118 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -829,23 +829,26 @@ bool evsel__name_is(struct evsel *evsel, const char *name)
>
> const char *evsel__group_pmu_name(const struct evsel *evsel)
> {
> - const struct evsel *leader;
> + struct evsel *leader, *pos;
>
> /* If the pmu_name is set use it. pmu_name isn't set for CPU and software events. */
> if (evsel->pmu_name)
> return evsel->pmu_name;
> /*
> * Software events may be in a group with other uncore PMU events. Use
> - * the pmu_name of the group leader to avoid breaking the software event
> - * out of the group.
> + * the pmu_name of the first non-software event to avoid breaking the
> + * software event out of the group.
> *
> * Aux event leaders, like intel_pt, expect a group with events from
> * other PMUs, so substitute the AUX event's PMU in this case.
> */
> leader = evsel__leader(evsel);
> - if ((evsel->core.attr.type == PERF_TYPE_SOFTWARE || evsel__is_aux_event(leader)) &&
> - leader->pmu_name) {
> - return leader->pmu_name;
> + if (evsel->core.attr.type == PERF_TYPE_SOFTWARE || evsel__is_aux_event(leader)) {
> + /* Starting with the leader, find the first event with a named PMU. */
> + for_each_group_evsel(pos, leader) {
> + if (pos->pmu_name)
> + return pos->pmu_name;
> + }
> }
>
> return "cpu";
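
To illustrate the {faults,cycles} case from the commit message, here is a minimal sketch of the new lookup order. It is not the perf code: struct toy_evsel and the toy_* functions are hypothetical stand-ins, the AUX-leader case is left out, and it assumes cycles was opened on "cpu_core" so that its pmu_name is set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct evsel. */
struct toy_evsel {
	const char *name;
	const char *pmu_name;	/* NULL when no PMU name was recorded */
	bool is_software;	/* models attr.type == PERF_TYPE_SOFTWARE */
};

/*
 * group[0] is the leader. For a software event without a PMU name, walk
 * the group from the leader and return the first named PMU, otherwise
 * fall back to "cpu".
 */
const char *toy_group_pmu_name(const struct toy_evsel *group, size_t nr,
			       size_t idx)
{
	size_t i;

	if (group[idx].pmu_name)
		return group[idx].pmu_name;
	if (group[idx].is_software) {
		for (i = 0; i < nr; i++) {
			if (group[i].pmu_name)
				return group[i].pmu_name;
		}
	}
	return "cpu";
}

/* {faults,cycles}: the software event leads, cycles carries the PMU name. */
const char *toy_faults_then_cycles(void)
{
	static const struct toy_evsel group[] = {
		{ "faults", NULL, true },
		{ "cycles", "cpu_core", false },
	};

	return toy_group_pmu_name(group, 2, 0);
}

/* A group with only software events still falls back to "cpu". */
const char *toy_software_only(void)
{
	static const struct toy_evsel group[] = {
		{ "faults", NULL, true },
		{ "context-switches", NULL, true },
	};

	return toy_group_pmu_name(group, 2, 0);
}
```

With the old leader-only check, faults as leader has no pmu_name and the function would return "cpu"; scanning the group lets it report the same PMU as cycles, which is what keeps the group from being split.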

2023-04-27 20:21:47

by Liang, Kan

Subject: Re: [PATCH v1 13/40] perf parse-events: Set attr.type to PMU type early



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> Set attr.type to PMU type early so that later terms can override the
> value. Setting the value in perf_pmu__config means that earlier steps,
> like config_term_pmu, can override the value.
>

Looks good to me.

Reviewed-by: Kan Liang <[email protected]>

Thanks,
Kan

> Signed-off-by: Ian Rogers <[email protected]>
> ---
> tools/perf/util/parse-events.c | 2 +-
> tools/perf/util/pmu.c | 1 -
> 2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index d71019dcd614..4ba01577618e 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -1492,9 +1492,9 @@ int parse_events_add_pmu(struct parse_events_state *parse_state,
> } else {
> memset(&attr, 0, sizeof(attr));
> }
> + attr.type = pmu->type;
>
> if (!head_config) {
> - attr.type = pmu->type;
> evsel = __add_event(list, &parse_state->idx, &attr,
> /*init_attr=*/true, /*name=*/NULL,
> /*metric_id=*/NULL, pmu,
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index ad209c88a124..cb33d869f1ed 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -1398,7 +1398,6 @@ int perf_pmu__config(struct perf_pmu *pmu, struct perf_event_attr *attr,
> {
> bool zero = !!pmu->default_config;
>
> - attr->type = pmu->type;
> return perf_pmu__config_terms(pmu->name, &pmu->format, attr,
> head_terms, zero, err);
> }
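
The ordering point can be sketched in a few lines of C. This is an illustration of the idea as I read the commit message, not the perf implementation: struct toy_attr, struct toy_term and the toy_* functions are hypothetical, and the type values 8 and 0 are arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for perf_event_attr and a parsed config term. */
struct toy_attr {
	unsigned int type;
};

struct toy_term {
	bool sets_type;		/* does this term want to override attr.type? */
	unsigned int type;
};

static void toy_apply_terms(struct toy_attr *attr,
			    const struct toy_term *terms, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (terms[i].sets_type)
			attr->type = terms[i].type;
	}
}

/* Default written after the terms: any override from a term is lost. */
unsigned int toy_type_set_late(unsigned int pmu_type,
			       const struct toy_term *terms, size_t nr)
{
	struct toy_attr attr = { 0 };

	toy_apply_terms(&attr, terms, nr);
	attr.type = pmu_type;
	return attr.type;
}

/* Default written first: terms processed afterwards can override it. */
unsigned int toy_type_set_early(unsigned int pmu_type,
				const struct toy_term *terms, size_t nr)
{
	struct toy_attr attr = { 0 };

	attr.type = pmu_type;
	toy_apply_terms(&attr, terms, nr);
	return attr.type;
}

/* One term asks for type 0 on a PMU whose type is 8. */
unsigned int toy_demo(bool early)
{
	static const struct toy_term terms[] = { { true, 0 } };

	return early ? toy_type_set_early(8, terms, 1)
		     : toy_type_set_late(8, terms, 1);
}
```

Here toy_demo(true) keeps the term's type while toy_demo(false) ends up with the PMU default, which is the difference between assigning attr.type at the top of parse_events_add_pmu and assigning it later in perf_pmu__config.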

2023-04-27 20:35:13

by Ian Rogers

Subject: Re: [PATCH v1 03/40] perf vendor events intel: Add icelake metric constraints

On Thu, Apr 27, 2023 at 12:06 PM Liang, Kan <[email protected]> wrote:
>
>
>
> On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> > Previously these constraints were disabled as they contained topdown
> > events. Since:
> > https://lore.kernel.org/all/[email protected]/
> > the topdown events are correctly grouped even if no group exists.
> >
> > This change was created by PR:
> > https://github.com/intel/perfmon/pull/71
> >
> > Signed-off-by: Ian Rogers <[email protected]>
> > ---
> > .../perf/pmu-events/arch/x86/icelake/icl-metrics.json | 11 +++++++++++
>
> Since it targets fixing the hybrid issues, could you please move the
> unrelated patch out of the series? A huge series is really hard to
> review.

I have done so. The independent patches are at the front while the
dependencies are in the later patches. This is covered in the cover
letter.

Thanks,
Ian

> Thanks,
> Kan
>
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
> > index f45ae3483df4..cb58317860ea 100644
> > --- a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
> > +++ b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
> > @@ -311,6 +311,7 @@
> > },
> > {
> > "BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired)",
> > + "MetricConstraint": "NO_GROUP_EVENTS",
> > "MetricExpr": "tma_x87_use + tma_fp_scalar + tma_fp_vector",
> > "MetricGroup": "HPC;TopdownL3;tma_L3_group;tma_light_operations_group",
> > "MetricName": "tma_fp_arith",
> > @@ -413,6 +414,7 @@
> > },
> > {
> > "BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)",
> > + "MetricConstraint": "NO_GROUP_EVENTS",
> > "MetricExpr": "(tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) * tma_info_slots / BR_MISP_RETIRED.ALL_BRANCHES",
> > "MetricGroup": "Bad;BrMispredicts;tma_issueBM",
> > "MetricName": "tma_info_branch_misprediction_cost",
> > @@ -458,6 +460,7 @@
> > },
> > {
> > "BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts",
> > + "MetricConstraint": "NO_GROUP_EVENTS",
> > "MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization if tma_core_bound < tma_ports_utilization else 1) if tma_info_smt_2t_utilization > 0.5 else 0)",
> > "MetricGroup": "Cor;SMT",
> > "MetricName": "tma_info_core_bound_likely",
> > @@ -510,6 +513,7 @@
> > },
> > {
> > "BriefDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck",
> > + "MetricConstraint": "NO_GROUP_EVENTS",
> > "MetricExpr": "100 * (tma_fetch_latency * tma_dsb_switches / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_mite / (tma_dsb + tma_lsd + tma_mite))",
> > "MetricGroup": "DSBmiss;Fed;tma_issueFB",
> > "MetricName": "tma_info_dsb_misses",
> > @@ -591,6 +595,7 @@
> > },
> > {
> > "BriefDescription": "Total pipeline cost of instruction fetch bandwidth related bottlenecks",
> > + "MetricConstraint": "NO_GROUP_EVENTS",
> > "MetricExpr": "100 * (tma_frontend_bound - tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_info_big_code",
> > "MetricGroup": "Fed;FetchBW;Frontend",
> > "MetricName": "tma_info_instruction_fetch_bw",
> > @@ -929,6 +934,7 @@
> > },
> > {
> > "BriefDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)",
> > + "MetricConstraint": "NO_GROUP_EVENTS",
> > "MetricExpr": "100 * tma_memory_bound * (tma_l1_bound / max(tma_memory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))",
> > "MetricGroup": "Mem;MemoryTLB;Offcore;tma_issueTLB",
> > "MetricName": "tma_info_memory_data_tlbs",
> > @@ -937,6 +943,7 @@
> > },
> > {
> > "BriefDescription": "Total pipeline cost of Memory Latency related bottlenecks (external memory and off-core caches)",
> > + "MetricConstraint": "NO_GROUP_EVENTS",
> > "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound))",
> > "MetricGroup": "Mem;MemoryLat;Offcore;tma_issueLat",
> > "MetricName": "tma_info_memory_latency",
> > @@ -945,6 +952,7 @@
> > },
> > {
> > "BriefDescription": "Total pipeline cost of Branch Misprediction related bottlenecks",
> > + "MetricConstraint": "NO_GROUP_EVENTS",
> > "MetricExpr": "100 * (tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))",
> > "MetricGroup": "Bad;BadSpec;BrMispredicts;tma_issueBM",
> > "MetricName": "tma_info_mispredictions",
> > @@ -996,6 +1004,7 @@
> > },
> > {
> > "BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
> > + "MetricConstraint": "NO_GROUP_EVENTS",
> > "MetricExpr": "tma_retiring * tma_info_slots / cpu@UOPS_RETIRED.SLOTS\\,cmask\\=1@",
> > "MetricGroup": "Pipeline;Ret",
> > "MetricName": "tma_info_retire"
> > @@ -1196,6 +1205,7 @@
> > },
> > {
> > "BriefDescription": "This metric represents fraction of slots where the CPU was retiring memory operations -- uops for memory load or store accesses.",
> > + "MetricConstraint": "NO_GROUP_EVENTS",
> > "MetricExpr": "tma_light_operations * MEM_INST_RETIRED.ANY / INST_RETIRED.ANY",
> > "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
> > "MetricName": "tma_memory_operations",
> > @@ -1266,6 +1276,7 @@
> > },
> > {
> > "BriefDescription": "This metric represents the remaining light uops fraction the CPU has executed - remaining means not covered by other sibling nodes",
> > + "MetricConstraint": "NO_GROUP_EVENTS",
> > "MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_memory_operations + tma_branch_instructions + tma_nop_instructions))",
> > "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
> > "MetricName": "tma_other_light_ops",

2023-04-27 20:36:14

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH v1 08/40] perf test: Test more sysfs events

On Thu, Apr 27, 2023 at 12:39 PM Liang, Kan <[email protected]> wrote:
>
>
>
> On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> > Parse events for all PMUs, and not just cpu, in test "Parsing of all
> > PMU events from sysfs".
> >
> > Signed-off-by: Ian Rogers <[email protected]>
>
> Run the test on Cascade Lake and Alder Lake. It looks good.
>
> Tested-by: Kan Liang <[email protected]>

Arnaldo found an issue (strchr called on an uninitialized value) that I
have fixed in v2 of this patch. The bug was introduced while separating
the hybrid changes from the non-hybrid ones.

Thanks,
Ian

> Thanks,
> Kan
> > ---
> > tools/perf/tests/parse-events.c | 103 +++++++++++++++++---------------
> > 1 file changed, 55 insertions(+), 48 deletions(-)
> >
> > diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
> > index 8068cfd89b84..385bbbc4a409 100644
> > --- a/tools/perf/tests/parse-events.c
> > +++ b/tools/perf/tests/parse-events.c
> > @@ -7,6 +7,7 @@
> > #include "debug.h"
> > #include "pmu.h"
> > #include "pmu-hybrid.h"
> > +#include "pmus.h"
> > #include <dirent.h>
> > #include <errno.h>
> > #include "fncache.h"
> > @@ -2225,49 +2226,24 @@ static int test_pmu(void)
> >
> > static int test__pmu_events(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
> > {
> > - struct stat st;
> > - char path[PATH_MAX];
> > - struct dirent *ent;
> > - DIR *dir;
> > - int ret;
> > -
> > - if (!test_pmu())
> > - return TEST_SKIP;
> > -
> > - snprintf(path, PATH_MAX, "%s/bus/event_source/devices/cpu/events/",
> > - sysfs__mountpoint());
> > -
> > - ret = stat(path, &st);
> > - if (ret) {
> > - pr_debug("omitting PMU cpu events tests: %s\n", path);
> > - return TEST_OK;
> > - }
> > + struct perf_pmu *pmu;
> > + int ret = TEST_OK;
> >
> > - dir = opendir(path);
> > - if (!dir) {
> > - pr_debug("can't open pmu event dir: %s\n", path);
> > - return TEST_FAIL;
> > - }
> > + perf_pmus__for_each_pmu(pmu) {
> > + struct stat st;
> > + char path[PATH_MAX];
> > + struct dirent *ent;
> > + DIR *dir;
> > + int err;
> >
> > - ret = TEST_OK;
> > - while ((ent = readdir(dir))) {
> > - struct evlist_test e = { .name = NULL, };
> > - char name[2 * NAME_MAX + 1 + 12 + 3];
> > - int test_ret;
> > + snprintf(path, PATH_MAX, "%s/bus/event_source/devices/%s/events/",
> > + sysfs__mountpoint(), pmu->name);
> >
> > - /* Names containing . are special and cannot be used directly */
> > - if (strchr(ent->d_name, '.'))
> > + err = stat(path, &st);
> > + if (err) {
> > + pr_debug("skipping PMU %s events tests: %s\n", pmu->name, path);
> > + ret = combine_test_results(ret, TEST_SKIP);
> > continue;
> > -
> > - snprintf(name, sizeof(name), "cpu/event=%s/u", ent->d_name);
> > -
> > - e.name = name;
> > - e.check = test__checkevent_pmu_events;
> > -
> > - test_ret = test_event(&e);
> > - if (test_ret != TEST_OK) {
> > - pr_debug("Test PMU event failed for '%s'", name);
> > - ret = combine_test_results(ret, test_ret);
> > }
> > /*
> > * Names containing '-' are recognized as prefixes and suffixes
> > @@ -2282,17 +2258,48 @@ static int test__pmu_events(struct test_suite *test __maybe_unused, int subtest
> > if (strchr(ent->d_name, '-'))
> > continue;
> >
> > - snprintf(name, sizeof(name), "%s:u,cpu/event=%s/u", ent->d_name, ent->d_name);
> > - e.name = name;
> > - e.check = test__checkevent_pmu_events_mix;
> > - test_ret = test_event(&e);
> > - if (test_ret != TEST_OK) {
> > - pr_debug("Test PMU event failed for '%s'", name);
> > - ret = combine_test_results(ret, test_ret);
> > + dir = opendir(path);
> > + if (!dir) {
> > + pr_debug("can't open pmu event dir: %s\n", path);
> > + ret = combine_test_results(ret, TEST_SKIP);
> > + continue;
> > }
> > - }
> >
> > - closedir(dir);
> > + while ((ent = readdir(dir))) {
> > + struct evlist_test e = { .name = NULL, };
> > + char name[2 * NAME_MAX + 1 + 12 + 3];
> > + int test_ret;
> > +
> > + /* Names containing . are special and cannot be used directly */
> > + if (strchr(ent->d_name, '.'))
> > + continue;
> > +
> > + snprintf(name, sizeof(name), "%s/event=%s/u", pmu->name, ent->d_name);
> > +
> > + e.name = name;
> > + e.check = test__checkevent_pmu_events;
> > +
> > + test_ret = test_event(&e);
> > + if (test_ret != TEST_OK) {
> > + pr_debug("Test PMU event failed for '%s'", name);
> > + ret = combine_test_results(ret, test_ret);
> > + }
> > +
> > + if (!is_pmu_core(pmu->name))
> > + continue;
> > +
> > + snprintf(name, sizeof(name), "%s:u,%s/event=%s/u", ent->d_name, pmu->name, ent->d_name);
> > + e.name = name;
> > + e.check = test__checkevent_pmu_events_mix;
> > + test_ret = test_event(&e);
> > + if (test_ret != TEST_OK) {
> > + pr_debug("Test PMU event failed for '%s'", name);
> > + ret = combine_test_results(ret, test_ret);
> > + }
> > + }
> > +
> > + closedir(dir);
> > + }
> > return ret;
> > }
> >

2023-04-27 20:37:27

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH v1 01/40] perf stat: Introduce skippable evsels

On Thu, Apr 27, 2023 at 11:54 AM Liang, Kan <[email protected]> wrote:
>
>
>
> On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> > Perf stat with no arguments will use default events and metrics. These
> > events may fail to open even with kernel and hypervisor disabled. When
> > these fail then the permissions error appears even though they were
> > implicitly selected. This is particularly a problem with the automatic
> > selection of the TopdownL1 metric group on certain architectures like
> > Skylake:
> >
> > ```
> > $ perf stat true
> > Error:
> > Access to performance monitoring and observability operations is limited.
> > Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
> > access to performance monitoring and observability operations for processes
> > without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
> > More information can be found at 'Perf events and tool security' document:
> > https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
> > perf_event_paranoid setting is 2:
> > -1: Allow use of (almost) all events by all users
> > Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
> > >= 0: Disallow raw and ftrace function tracepoint access
> > >= 1: Disallow CPU event access
> > >= 2: Disallow kernel profiling
> > To make the adjusted perf_event_paranoid setting permanent preserve it
> > in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
> > ```
> >
> > This patch adds skippable evsels that when they fail to open won't
> > fail and won't appear in output. The TopdownL1 events, from the metric
> > group, are marked as skippable. This turns the failure above to:
> >
> > ```
> > $ perf stat true
> >
> > Performance counter stats for 'true':
> >
> > 1.26 msec task-clock:u # 0.328 CPUs utilized
> > 0 context-switches:u # 0.000 /sec
> > 0 cpu-migrations:u # 0.000 /sec
> > 49 page-faults:u # 38.930 K/sec
> > 176,449 cycles:u # 0.140 GHz (48.99%)
> > 122,905 instructions:u # 0.70 insn per cycle
> > 28,264 branches:u # 22.456 M/sec
> > 2,405 branch-misses:u # 8.51% of all branches
> >
> > 0.003834565 seconds time elapsed
> >
> > 0.000000000 seconds user
> > 0.004130000 seconds sys
> > ```
>
> If the same command runs with root permission, a different output will
> be displayed as below:
>
> $ sudo ./perf stat sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 0.97 msec task-clock # 0.001 CPUs utilized
> 1 context-switches # 1.030 K/sec
> 0 cpu-migrations # 0.000 /sec
> 67 page-faults # 69.043 K/sec
> 1,135,552 cycles # 1.170 GHz (50.51%)
> 1,126,446 instructions # 0.99 insn per cycle
> 252,904 branches # 260.615 M/sec
> 7,297 branch-misses # 2.89% of all branches
> 22,518 CPU_CLK_UNHALTED.REF_XCLK # 23.205 M/sec
> 56,994 INT_MISC.RECOVERY_CYCLES_ANY # 58.732 M/sec
>
> The last two events are useless.

You missed the system wide (-a) flag.

Thanks,
Ian

> Relying on perf_event_open()/the kernel to tell whether an event is
> available or skippable is not reliable. The kernel doesn't check a
> specific event.
>
> The patch works in non-root mode only because the event requires
> root permission, so the kernel rejects it for lack of permission.
> But if the same command runs with root privileges, the trash
> events are printed as above.
>
> I think a better way is to check the HW capability and decide whether
> to append the TopdownL1 metrics.
>
> https://lore.kernel.org/lkml/[email protected]/
>
>
> Thanks,
> Kan
>
>
> >
> > When the events can have kernel/hypervisor disabled, like on
> > Tigerlake, then it continues to succeed as:
> >
> > ```
> > $ perf stat true
> >
> > Performance counter stats for 'true':
> >
> > 0.57 msec task-clock:u # 0.385 CPUs utilized
> > 0 context-switches:u # 0.000 /sec
> > 0 cpu-migrations:u # 0.000 /sec
> > 47 page-faults:u # 82.329 K/sec
> > 287,017 cycles:u # 0.503 GHz
> > 133,318 instructions:u # 0.46 insn per cycle
> > 31,396 branches:u # 54.996 M/sec
> > 2,442 branch-misses:u # 7.78% of all branches
> > 998,790 TOPDOWN.SLOTS:u # 14.5 % tma_retiring
> > # 27.6 % tma_backend_bound
> > # 40.9 % tma_frontend_bound
> > # 17.0 % tma_bad_speculation
> > 144,922 topdown-retiring:u
> > 411,266 topdown-fe-bound:u
> > 258,510 topdown-be-bound:u
> > 184,090 topdown-bad-spec:u
> > 2,585 INT_MISC.UOP_DROPPING:u # 4.528 M/sec
> > 3,434 cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/u # 6.015 M/sec
> >
> > 0.001480954 seconds time elapsed
> >
> > 0.000000000 seconds user
> > 0.001686000 seconds sys
> > ```
> >
> > And this likewise works if paranoia allows or running as root.
> >
> > Signed-off-by: Ian Rogers <[email protected]>
> > ---
> > tools/perf/builtin-stat.c | 39 ++++++++++++++++++++++++++--------
> > tools/perf/util/evsel.c | 15 +++++++++++--
> > tools/perf/util/evsel.h | 1 +
> > tools/perf/util/stat-display.c | 4 ++++
> > 4 files changed, 48 insertions(+), 11 deletions(-)
> >
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index efda63f6bf32..eb34f5418ad3 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -667,6 +667,13 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
> > evsel_list->core.threads->err_thread = -1;
> > return COUNTER_RETRY;
> > }
> > + } else if (counter->skippable) {
> > + if (verbose > 0)
> > + ui__warning("skipping event %s that kernel failed to open .\n",
> > + evsel__name(counter));
> > + counter->supported = false;
> > + counter->errored = true;
> > + return COUNTER_SKIP;
> > }
> >
> > evsel__open_strerror(counter, &target, errno, msg, sizeof(msg));
> > @@ -1885,15 +1892,29 @@ static int add_default_attributes(void)
> > * Add TopdownL1 metrics if they exist. To minimize
> > * multiplexing, don't request threshold computation.
> > */
> > - if (metricgroup__has_metric("TopdownL1") &&
> > - metricgroup__parse_groups(evsel_list, "TopdownL1",
> > - /*metric_no_group=*/false,
> > - /*metric_no_merge=*/false,
> > - /*metric_no_threshold=*/true,
> > - stat_config.user_requested_cpu_list,
> > - stat_config.system_wide,
> > - &stat_config.metric_events) < 0)
> > - return -1;
> > + if (metricgroup__has_metric("TopdownL1")) {
> > + struct evlist *metric_evlist = evlist__new();
> > + struct evsel *metric_evsel;
> > +
> > + if (!metric_evlist)
> > + return -1;
> > +
> > + if (metricgroup__parse_groups(metric_evlist, "TopdownL1",
> > + /*metric_no_group=*/false,
> > + /*metric_no_merge=*/false,
> > + /*metric_no_threshold=*/true,
> > + stat_config.user_requested_cpu_list,
> > + stat_config.system_wide,
> > + &stat_config.metric_events) < 0)
> > + return -1;
> > +
> > + evlist__for_each_entry(metric_evlist, metric_evsel) {
> > + metric_evsel->skippable = true;
> > + }
> > + evlist__splice_list_tail(evsel_list, &metric_evlist->core.entries);
> > + evlist__delete(metric_evlist);
> > + }
> > +
> > /* Platform specific attrs */
> > if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
> > return -1;
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index 356c07f03be6..1cd04b5998d2 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -290,6 +290,7 @@ void evsel__init(struct evsel *evsel,
> > evsel->per_pkg_mask = NULL;
> > evsel->collect_stat = false;
> > evsel->pmu_name = NULL;
> > + evsel->skippable = false;
> > }
> >
> > struct evsel *evsel__new_idx(struct perf_event_attr *attr, int idx)
> > @@ -1725,9 +1726,13 @@ static int get_group_fd(struct evsel *evsel, int cpu_map_idx, int thread)
> > return -1;
> >
> > fd = FD(leader, cpu_map_idx, thread);
> > - BUG_ON(fd == -1);
> > + BUG_ON(fd == -1 && !leader->skippable);
> >
> > - return fd;
> > + /*
> > + * When the leader has been skipped, return -2 to distinguish from no
> > + * group leader case.
> > + */
> > + return fd == -1 ? -2 : fd;
> > }
> >
> > static void evsel__remove_fd(struct evsel *pos, int nr_cpus, int nr_threads, int thread_idx)
> > @@ -2109,6 +2114,12 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> >
> > group_fd = get_group_fd(evsel, idx, thread);
> >
> > + if (group_fd == -2) {
> > + pr_debug("broken group leader for %s\n", evsel->name);
> > + err = -EINVAL;
> > + goto out_close;
> > + }
> > +
> > test_attr__ready();
> >
> > /* Debug message used by test scripts */
> > diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> > index 35805dcdb1b9..bf8f01af1c0b 100644
> > --- a/tools/perf/util/evsel.h
> > +++ b/tools/perf/util/evsel.h
> > @@ -95,6 +95,7 @@ struct evsel {
> > bool weak_group;
> > bool bpf_counter;
> > bool use_config_name;
> > + bool skippable;
> > int bpf_fd;
> > struct bpf_object *bpf_obj;
> > struct list_head config_terms;
> > diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> > index e6035ecbeee8..6b46bbb3d322 100644
> > --- a/tools/perf/util/stat-display.c
> > +++ b/tools/perf/util/stat-display.c
> > @@ -810,6 +810,10 @@ static bool should_skip_zero_counter(struct perf_stat_config *config,
> > struct perf_cpu cpu;
> > int idx;
> >
> > + /* Skip counters that were speculatively/default enabled rather than requested. */
> > + if (counter->skippable)
> > + return true;
> > +
> > /*
> > * Skip value 0 when enabling --per-thread globally,
> > * otherwise it will have too many 0 output.

2023-04-27 21:04:16

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v1 01/40] perf stat: Introduce skippable evsels

Hello,

On Thu, Apr 27, 2023 at 1:21 PM Ian Rogers <[email protected]> wrote:
>
> On Thu, Apr 27, 2023 at 11:54 AM Liang, Kan <[email protected]> wrote:
> >
> >
> >
> > On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> > > Perf stat with no arguments will use default events and metrics. These
> > > events may fail to open even with kernel and hypervisor disabled. When
> > > these fail then the permissions error appears even though they were
> > > implicitly selected. This is particularly a problem with the automatic
> > > selection of the TopdownL1 metric group on certain architectures like
> > > Skylake:
> > >
> > > ```
> > > $ perf stat true
> > > Error:
> > > Access to performance monitoring and observability operations is limited.
> > > Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
> > > access to performance monitoring and observability operations for processes
> > > without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
> > > More information can be found at 'Perf events and tool security' document:
> > > https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
> > > perf_event_paranoid setting is 2:
> > > -1: Allow use of (almost) all events by all users
> > > Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
> > > >= 0: Disallow raw and ftrace function tracepoint access
> > > >= 1: Disallow CPU event access
> > > >= 2: Disallow kernel profiling
> > > To make the adjusted perf_event_paranoid setting permanent preserve it
> > > in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
> > > ```
> > >
> > > This patch adds skippable evsels that when they fail to open won't
> > > fail and won't appear in output. The TopdownL1 events, from the metric
> > > group, are marked as skippable. This turns the failure above to:
> > >
> > > ```
> > > $ perf stat true
> > >
> > > Performance counter stats for 'true':
> > >
> > > 1.26 msec task-clock:u # 0.328 CPUs utilized
> > > 0 context-switches:u # 0.000 /sec
> > > 0 cpu-migrations:u # 0.000 /sec
> > > 49 page-faults:u # 38.930 K/sec
> > > 176,449 cycles:u # 0.140 GHz (48.99%)
> > > 122,905 instructions:u # 0.70 insn per cycle
> > > 28,264 branches:u # 22.456 M/sec
> > > 2,405 branch-misses:u # 8.51% of all branches
> > >
> > > 0.003834565 seconds time elapsed
> > >
> > > 0.000000000 seconds user
> > > 0.004130000 seconds sys
> > > ```
> >
> > If the same command runs with root permission, a different output will
> > be displayed as below:
> >
> > $ sudo ./perf stat sleep 1
> >
> > Performance counter stats for 'sleep 1':
> >
> > 0.97 msec task-clock # 0.001 CPUs utilized
> > 1 context-switches # 1.030 K/sec
> > 0 cpu-migrations # 0.000 /sec
> > 67 page-faults # 69.043 K/sec
> > 1,135,552 cycles # 1.170 GHz (50.51%)
> > 1,126,446 instructions # 0.99 insn per cycle
> > 252,904 branches # 260.615 M/sec
> > 7,297 branch-misses # 2.89% of all branches
> > 22,518 CPU_CLK_UNHALTED.REF_XCLK # 23.205 M/sec
> > 56,994 INT_MISC.RECOVERY_CYCLES_ANY # 58.732 M/sec
> >
> > The last two events are useless.
>
> You missed the system wide (-a) flag.
>
> Thanks,
> Ian
>
> > Relying on perf_event_open()/the kernel to tell whether an event is
> > available or skippable is not reliable. The kernel doesn't check a
> > specific event.
> >
> > The patch works in non-root mode only because the event requires
> > root permission, so the kernel rejects it for lack of permission.
> > But if the same command runs with root privileges, the trash
> > events are printed as above.
> >
> > I think a better way is to check the HW capability and decide whether
> > to append the TopdownL1 metrics.
> >
> > https://lore.kernel.org/lkml/[email protected]/

Maybe we can also check whether the event was actually enabled, for
example by checking the enabled_time, and then skip the events that
are both skippable and not enabled.
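
As a rough illustration of that idea (a sketch only, not the perf
implementation): the struct and helper below are hypothetical stand-ins
for struct evsel and its counts, and `ena` is assumed to mirror the
enabled time the kernel reports for a counter. A counter is hidden only
when it was added implicitly and never became enabled:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for perf's struct evsel plus its counts. */
struct demo_counter {
	const char *name;
	bool skippable;	/* added implicitly (e.g. default TopdownL1), not requested */
	uint64_t val;	/* counter value */
	uint64_t ena;	/* time the event was enabled */
	uint64_t run;	/* time the event was actually running */
};

/*
 * Hide a counter only if it was implicitly added AND never got enabled.
 * Explicitly requested events are always shown, even when not enabled.
 */
static bool demo_should_hide(const struct demo_counter *c)
{
	return c->skippable && c->ena == 0;
}
```

Note that a skippable event which opened and counted (ena > 0) would
still be printed with this check, so it would not by itself hide the
two events in the root output above.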

Thanks,
Namhyung

2023-04-27 21:06:47

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH v1 21/40] perf parse-events: Wildcard legacy cache events

On Wed, Apr 26, 2023 at 10:50 PM Ian Rogers <[email protected]> wrote:
>
> On Wed, Apr 26, 2023 at 3:11 AM James Clark <[email protected]> wrote:
> >
> >
> >
> > On 26/04/2023 08:00, Ian Rogers wrote:
> > > It is inconsistent that "perf stat -e instructions-retired" wildcard
> > > opens on all PMUs while legacy cache events like "perf stat -e
> > > L1-dcache-load-miss" do not. A behavior introduced by hybrid is that a
> > > legacy cache event like L1-dcache-load-miss should wildcard open on
> > > all hybrid PMUs. A call to is_event_supported is necessary for each
> > > PMU, a failure of which results in the event not being added. Rather
> > > than special case that logic, move it into the main legacy cache event
> > > case and attempt to open legacy cache events on all PMUs.
> > >
> > > Signed-off-by: Ian Rogers <[email protected]>
> > > ---
> > > tools/perf/util/parse-events-hybrid.c | 33 -------------
> > > tools/perf/util/parse-events-hybrid.h | 7 ---
> > > tools/perf/util/parse-events.c | 70 ++++++++++++++-------------
> > > tools/perf/util/parse-events.h | 3 +-
> > > tools/perf/util/parse-events.y | 2 +-
> > > 5 files changed, 39 insertions(+), 76 deletions(-)
> > >
> > > diff --git a/tools/perf/util/parse-events-hybrid.c b/tools/perf/util/parse-events-hybrid.c
> > > index 7c9f9150bad5..d2c0be051d46 100644
> > > --- a/tools/perf/util/parse-events-hybrid.c
> > > +++ b/tools/perf/util/parse-events-hybrid.c
> > > @@ -179,36 +179,3 @@ int parse_events__add_numeric_hybrid(struct parse_events_state *parse_state,
> > > return add_raw_hybrid(parse_state, list, attr, name, metric_id,
> > > config_terms);
> > > }
> > > -
> > > -int parse_events__add_cache_hybrid(struct list_head *list, int *idx,
> > > - struct perf_event_attr *attr,
> > > - const char *name,
> > > - const char *metric_id,
> > > - struct list_head *config_terms,
> > > - bool *hybrid,
> > > - struct parse_events_state *parse_state)
> > > -{
> > > - struct perf_pmu *pmu;
> > > - int ret;
> > > -
> > > - *hybrid = false;
> > > - if (!perf_pmu__has_hybrid())
> > > - return 0;
> > > -
> > > - *hybrid = true;
> > > - perf_pmu__for_each_hybrid_pmu(pmu) {
> > > - LIST_HEAD(terms);
> > > -
> > > - if (pmu_cmp(parse_state, pmu))
> > > - continue;
> > > -
> > > - copy_config_terms(&terms, config_terms);
> > > - ret = create_event_hybrid(PERF_TYPE_HW_CACHE, idx, list,
> > > - attr, name, metric_id, &terms, pmu);
> > > - free_config_terms(&terms);
> > > - if (ret)
> > > - return ret;
> > > - }
> > > -
> > > - return 0;
> > > -}
> > > diff --git a/tools/perf/util/parse-events-hybrid.h b/tools/perf/util/parse-events-hybrid.h
> > > index cbc05fec02a2..bc2966e73897 100644
> > > --- a/tools/perf/util/parse-events-hybrid.h
> > > +++ b/tools/perf/util/parse-events-hybrid.h
> > > @@ -15,11 +15,4 @@ int parse_events__add_numeric_hybrid(struct parse_events_state *parse_state,
> > > struct list_head *config_terms,
> > > bool *hybrid);
> > >
> > > -int parse_events__add_cache_hybrid(struct list_head *list, int *idx,
> > > - struct perf_event_attr *attr,
> > > - const char *name, const char *metric_id,
> > > - struct list_head *config_terms,
> > > - bool *hybrid,
> > > - struct parse_events_state *parse_state);
> > > -
> > > #endif /* __PERF_PARSE_EVENTS_HYBRID_H */
> > > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> > > index 9b2d7b6572c2..e007b2bc1ab4 100644
> > > --- a/tools/perf/util/parse-events.c
> > > +++ b/tools/perf/util/parse-events.c
> > > @@ -471,46 +471,50 @@ static int parse_events__decode_legacy_cache(const char *name, int pmu_type, __u
> > >
> > > int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
> > > struct parse_events_error *err,
> > > - struct list_head *head_config,
> > > - struct parse_events_state *parse_state)
> > > + struct list_head *head_config)
> > > {
> > > - struct perf_event_attr attr;
> > > - LIST_HEAD(config_terms);
> > > - const char *config_name, *metric_id;
> > > - int ret;
> > > - bool hybrid;
> > > + struct perf_pmu *pmu = NULL;
> > > + bool found_supported = false;
> > > + const char *config_name = get_config_name(head_config);
> > > + const char *metric_id = get_config_metric_id(head_config);
> > >
> > > + while ((pmu = perf_pmu__scan(pmu)) != NULL) {
> > > + LIST_HEAD(config_terms);
> > > + struct perf_event_attr attr;
> > > + int ret;
> > >
> > > - memset(&attr, 0, sizeof(attr));
> > > - attr.type = PERF_TYPE_HW_CACHE;
> > > - ret = parse_events__decode_legacy_cache(name, /*pmu_type=*/0, &attr.config);
> > > - if (ret)
> > > - return ret;
> > > + /*
> > > + * Skip uncore PMUs for performance. Software PMUs can open
> > > + * PERF_TYPE_HW_CACHE, so skip.
> > > + */
> > > + if (pmu->is_uncore || pmu->type == PERF_TYPE_SOFTWARE)
> > > + continue;
> > >
> > > - if (head_config) {
> > > - if (config_attr(&attr, head_config, err,
> > > - config_term_common))
> > > - return -EINVAL;
> > > + memset(&attr, 0, sizeof(attr));
> > > + attr.type = PERF_TYPE_HW_CACHE;
> > >
> > > - if (get_config_terms(head_config, &config_terms))
> > > - return -ENOMEM;
> > > - }
> > > + ret = parse_events__decode_legacy_cache(name, pmu->type, &attr.config);
> > > + if (ret)
> > > + return ret;
> > >
> > > - config_name = get_config_name(head_config);
> > > - metric_id = get_config_metric_id(head_config);
> > > - ret = parse_events__add_cache_hybrid(list, idx, &attr,
> > > - config_name ? : name,
> > > - metric_id,
> > > - &config_terms,
> > > - &hybrid, parse_state);
> > > - if (hybrid)
> > > - goto out_free_terms;
> > > + if (!is_event_supported(PERF_TYPE_HW_CACHE, attr.config))
> > > + continue;
> >
> > Hi Ian,
> >
> > I get a test failure on Arm from this commit. I think it's related to
> > this check for support that's failing but I'm not sure what the
> > resolution should be.
>
> Yes, I brought in a behavior from hybrid to fail at parse time if a
> legacy cache event isn't supported. The issue is the perf_event_open
> may fail because of permissions and I think we probably need to
> special case that and allow the parsing to succeed otherwise tests
> like this will need to skip. I naively tested on a raspberry pi, which
> has no metrics, and so I'll try again tomorrow on a neoverse.

So, following discussion with Stephane, we think the right approach is
to not use an "is_event_supported" test at parse time. The event parser
should only take an event name and create a perf_event_attr. Removing
the is_event_supported test will change Intel hybrid behavior:
wildcarded events will always try to open on both PMUs, and the
expectation is that an event that fails to open will report "<not
counted>". I'll add this change in v2.
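
To make the parse-only idea concrete, here is a minimal sketch (not
the perf parser; demo_parse_legacy_cache is a hypothetical helper that
knows just two names). It only fills in a perf_event_attr using the
legacy cache config encoding from perf_event_open(2), cache id |
(op id << 8) | (result id << 16), and never calls perf_event_open():

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not perf's parser: translate one legacy cache event
 * name into a perf_event_attr without trying to open the event.
 * Returns 0 on success, -1 if the name is not recognized.
 */
static int demo_parse_legacy_cache(const char *name, struct perf_event_attr *attr)
{
	uint64_t cache, op, result;

	if (strcmp(name, "L1-dcache-load-misses") == 0) {
		cache = PERF_COUNT_HW_CACHE_L1D;
		op = PERF_COUNT_HW_CACHE_OP_READ;
		result = PERF_COUNT_HW_CACHE_RESULT_MISS;
	} else if (strcmp(name, "L1-dcache-loads") == 0) {
		cache = PERF_COUNT_HW_CACHE_L1D;
		op = PERF_COUNT_HW_CACHE_OP_READ;
		result = PERF_COUNT_HW_CACHE_RESULT_ACCESS;
	} else {
		return -1;
	}

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HW_CACHE;
	/* Encoding documented in perf_event_open(2) for PERF_TYPE_HW_CACHE. */
	attr->config = cache | (op << 8) | (result << 16);
	return 0;
}
```

Whether the event can actually be opened (permissions, PMU support) is
then left to the later open step, which can report the failure per
event instead of failing the parse.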

Thanks,
Ian

> > I also couldn't see why the metrics in
> > test_soc/cpu/metrics.json aren't run on x86 (assuming they're generic
> > 'test anywhere' type metrics?).
>
> The testing code is split into a bunch of places for historical
> reasons, but the test_soc is here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/tests/pmu-events.c?h=v6.3#n1031
> '''
> $ gdb --args perf test -vv -F 10
> (gdb) b test__pmu_event_table
> Breakpoint 1 at 0x199d7c: file tests/pmu-events.c, line 467.
> (gdb) r
> Starting program: /tmp/perf/perf test -vv -F 10
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> 10: PMU events :
> 10.1: PMU event table sanity :
> --- start ---
>
> Breakpoint 1, test__pmu_event_table (test=0x5555560bd080 <suite.pmu_events>, subtest=0) at tests/pmu-events.c:467
> 467 find_sys_events_table("pmu_events__test_soc_sys");
> '''
>
> Something I observed is that tests/parse-events.c isn't testing
> against an ARM PMU and so skips a lot of testing. There should likely
> be a helper so that the string in that test can be dependent on the
> test platform. I worry this may expose some latent ARM issues with
> things like obscure modifiers.
>
> Thanks,
> Ian
>
> > $ perf test -vvv "parsing of PMU event table metrics with fake"
> > ...
> > parsing 'dcache_miss_cpi': 'l1d\-loads\-misses / inst_retired.any'
> > parsing metric: l1d\-loads\-misses / inst_retired.any
> > Attempting to add event pmu 'inst_retired.any' with
> > 'inst_retired.any,' that may result in non-fatal errors
> > After aliases, add event pmu 'inst_retired.any' with
> > 'inst_retired.any,' that may result in non-fatal errors
> > inst_retired.any -> fake_pmu/inst_retired.any/
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 3
> > config 0x800010000
> > disabled 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8
> > sys_perf_event_open failed, error -2
> >
> > check_parse_fake failed
> > test child finished with -1
> > ---- end ----
> > PMU events subtest 4: FAILED!
> >
> > >
> > > - ret = add_event(list, idx, &attr, config_name ? : name, metric_id,
> > > - &config_terms);
> > > -out_free_terms:
> > > - free_config_terms(&config_terms);
> > > - return ret;
> > > + found_supported = true;
> > > +
> > > + if (head_config) {
> > > + if (config_attr(&attr, head_config, err,
> > > + config_term_common))
> > > + return -EINVAL;
> > > +
> > > + if (get_config_terms(head_config, &config_terms))
> > > + return -ENOMEM;
> > > + }
> > > +
> > > + ret = add_event(list, idx, &attr, config_name ? : name, metric_id, &config_terms);
> > > + free_config_terms(&config_terms);
> > > + }
> > > + return found_supported ? 0: -EINVAL;
> > > }
> > >
> > > #ifdef HAVE_LIBTRACEEVENT
> > > diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
> > > index 5acb62c2e00a..0c26303f7f63 100644
> > > --- a/tools/perf/util/parse-events.h
> > > +++ b/tools/perf/util/parse-events.h
> > > @@ -172,8 +172,7 @@ int parse_events_add_tool(struct parse_events_state *parse_state,
> > > int tool_event);
> > > int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
> > > struct parse_events_error *error,
> > > - struct list_head *head_config,
> > > - struct parse_events_state *parse_state);
> > > + struct list_head *head_config);
> > > int parse_events_add_breakpoint(struct list_head *list, int *idx,
> > > u64 addr, char *type, u64 len);
> > > int parse_events_add_pmu(struct parse_events_state *parse_state,
> > > diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
> > > index f84fa1b132b3..cc7528558845 100644
> > > --- a/tools/perf/util/parse-events.y
> > > +++ b/tools/perf/util/parse-events.y
> > > @@ -476,7 +476,7 @@ PE_LEGACY_CACHE opt_event_config
> > >
> > > list = alloc_list();
> > > ABORT_ON(!list);
> > > - err = parse_events_add_cache(list, &parse_state->idx, $1, error, $2, parse_state);
> > > + err = parse_events_add_cache(list, &parse_state->idx, $1, error, $2);
> > >
> > > parse_events_terms__delete($2);
> > > free($1);

2023-04-27 21:11:14

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH v1 01/40] perf stat: Introduce skippable evsels

On Thu, Apr 27, 2023 at 2:00 PM Namhyung Kim <[email protected]> wrote:
>
> Hello,
>
> On Thu, Apr 27, 2023 at 1:21 PM Ian Rogers <[email protected]> wrote:
> >
> > On Thu, Apr 27, 2023 at 11:54 AM Liang, Kan <[email protected]> wrote:
> > >
> > >
> > >
> > > On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> > > > Perf stat with no arguments will use default events and metrics. These
> > > > events may fail to open even with kernel and hypervisor disabled. When
> > > > these fail then the permissions error appears even though they were
> > > > implicitly selected. This is particularly a problem with the automatic
> > > > selection of the TopdownL1 metric group on certain architectures like
> > > > Skylake:
> > > >
> > > > ```
> > > > $ perf stat true
> > > > Error:
> > > > Access to performance monitoring and observability operations is limited.
> > > > Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
> > > > access to performance monitoring and observability operations for processes
> > > > without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
> > > > More information can be found at 'Perf events and tool security' document:
> > > > https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
> > > > perf_event_paranoid setting is 2:
> > > > -1: Allow use of (almost) all events by all users
> > > > Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
> > > > > >= 0: Disallow raw and ftrace function tracepoint access
> > > > > >= 1: Disallow CPU event access
> > > > > >= 2: Disallow kernel profiling
> > > > To make the adjusted perf_event_paranoid setting permanent preserve it
> > > > in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
> > > > ```
> > > >
> > > > This patch adds skippable evsels that when they fail to open won't
> > > > fail and won't appear in output. The TopdownL1 events, from the metric
> > > > group, are marked as skippable. This turns the failure above to:
> > > >
> > > > ```
> > > > $ perf stat true
> > > >
> > > > Performance counter stats for 'true':
> > > >
> > > > 1.26 msec task-clock:u # 0.328 CPUs utilized
> > > > 0 context-switches:u # 0.000 /sec
> > > > 0 cpu-migrations:u # 0.000 /sec
> > > > 49 page-faults:u # 38.930 K/sec
> > > > 176,449 cycles:u # 0.140 GHz (48.99%)
> > > > 122,905 instructions:u # 0.70 insn per cycle
> > > > 28,264 branches:u # 22.456 M/sec
> > > > 2,405 branch-misses:u # 8.51% of all branches
> > > >
> > > > 0.003834565 seconds time elapsed
> > > >
> > > > 0.000000000 seconds user
> > > > 0.004130000 seconds sys
> > > > ```
> > >
> > > If the same command runs with root permission, a different output will
> > > be displayed as below:
> > >
> > > $ sudo ./perf stat sleep 1
> > >
> > > Performance counter stats for 'sleep 1':
> > >
> > > 0.97 msec task-clock # 0.001 CPUs utilized
> > > 1 context-switches # 1.030 K/sec
> > > 0 cpu-migrations # 0.000 /sec
> > > 67 page-faults # 69.043 K/sec
> > > 1,135,552 cycles # 1.170 GHz (50.51%)
> > > 1,126,446 instructions # 0.99 insn per cycle
> > > 252,904 branches # 260.615 M/sec
> > > 7,297 branch-misses # 2.89% of all branches
> > > 22,518 CPU_CLK_UNHALTED.REF_XCLK # 23.205 M/sec
> > > 56,994 INT_MISC.RECOVERY_CYCLES_ANY # 58.732 M/sec
> > >
> > > The last two events are useless.
> >
> > You missed the system wide (-a) flag.
> >
> > Thanks,
> > Ian
> >
> > > It's not reliable to rely on perf_event_open()/kernel to tell whether
> > > an event is available or skippable. The kernel wouldn't check a specific event.
> > >
> > > The patch works in non-root mode only because the event requires
> > > root permission, so the kernel rejects it for lack of
> > > permission. But if the same command runs with root privileges, the trash
> > > events are printed as above.
> > >
> > > I think a better way is to check the HW capability and decide whether
> > > to append the TopdownL1 metrics.
> > >
> > > https://lore.kernel.org/lkml/[email protected]/
>
> Maybe we can also check if the event is actually enabled like
> checking the enabled_time. Then skip the skippable and not
> enabled ones.

Good idea, and I think that addresses Kan's concern over missing
output. I'll add it in v2.

Thanks,
Ian

> Thanks,
> Namhyung
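
The check suggested above (skip events that are both skippable and never enabled) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that went into v2: `struct counter_sample` and `should_print()` are hypothetical names, and the `enabled` field stands in for the enabled time that perf reads back alongside each count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one event's state after counting. */
struct counter_sample {
	bool skippable;   /* event was added implicitly and may be dropped */
	bool opened;      /* opening the event succeeded */
	uint64_t enabled; /* time the event was enabled, in ns */
	uint64_t running; /* time the event was actually counting, in ns */
	uint64_t value;   /* raw count */
};

/* Return true if the event should appear in the output. */
static bool should_print(const struct counter_sample *c)
{
	assert(c != NULL);

	/* Explicitly requested events are always reported, even on failure. */
	if (!c->skippable)
		return true;

	/* Skippable events that failed to open are silently dropped. */
	if (!c->opened)
		return false;

	/* Skippable events that opened but were never enabled are dropped too. */
	return c->enabled != 0;
}
```

With a filter like this, an implicitly added event that opens as root but is never enabled would be hidden, while anything the user asked for explicitly is still reported.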

2023-04-27 21:54:48

by Liang, Kan

Subject: Re: [PATCH v1 18/40] perf test: Move x86 hybrid tests to arch/x86



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> The tests use x86 hybrid specific PMUs.

Failed on my ADL.


>
> Signed-off-by: Ian Rogers <[email protected]>
> ---
> tools/perf/arch/x86/include/arch-tests.h | 1 +
> tools/perf/arch/x86/tests/Build | 1 +
> tools/perf/arch/x86/tests/arch-tests.c | 10 +
> tools/perf/arch/x86/tests/hybrid.c | 277 +++++++++++++++++++++++
> tools/perf/tests/parse-events.c | 181 ---------------
> 5 files changed, 289 insertions(+), 181 deletions(-)
> create mode 100644 tools/perf/arch/x86/tests/hybrid.c
>
> diff --git a/tools/perf/arch/x86/include/arch-tests.h b/tools/perf/arch/x86/include/arch-tests.h
> index 902e9ea9b99e..33d39c1d3e64 100644
> --- a/tools/perf/arch/x86/include/arch-tests.h
> +++ b/tools/perf/arch/x86/include/arch-tests.h
> @@ -11,6 +11,7 @@ int test__intel_pt_pkt_decoder(struct test_suite *test, int subtest);
> int test__intel_pt_hybrid_compat(struct test_suite *test, int subtest);
> int test__bp_modify(struct test_suite *test, int subtest);
> int test__x86_sample_parsing(struct test_suite *test, int subtest);
> +int test__hybrid(struct test_suite *test, int subtest);
>
> extern struct test_suite *arch_tests[];
>
> diff --git a/tools/perf/arch/x86/tests/Build b/tools/perf/arch/x86/tests/Build
> index 6f4e8636c3bf..08cc8b9c931e 100644
> --- a/tools/perf/arch/x86/tests/Build
> +++ b/tools/perf/arch/x86/tests/Build
> @@ -3,5 +3,6 @@ perf-$(CONFIG_DWARF_UNWIND) += dwarf-unwind.o
>
> perf-y += arch-tests.o
> perf-y += sample-parsing.o
> +perf-y += hybrid.o
> perf-$(CONFIG_AUXTRACE) += insn-x86.o intel-pt-test.o
> perf-$(CONFIG_X86_64) += bp-modify.o
> diff --git a/tools/perf/arch/x86/tests/arch-tests.c b/tools/perf/arch/x86/tests/arch-tests.c
> index aae6ea0fe52b..147ad0638bbb 100644
> --- a/tools/perf/arch/x86/tests/arch-tests.c
> +++ b/tools/perf/arch/x86/tests/arch-tests.c
> @@ -22,6 +22,15 @@ struct test_suite suite__intel_pt = {
> DEFINE_SUITE("x86 bp modify", bp_modify);
> #endif
> DEFINE_SUITE("x86 Sample parsing", x86_sample_parsing);
> +static struct test_case hybrid_tests[] = {
> + TEST_CASE_REASON("x86 hybrid event parsing", hybrid, "not hybrid"),
> + { .name = NULL, }
> +};
> +
> +struct test_suite suite__hybrid = {
> + .desc = "x86 hybrid",
> + .test_cases = hybrid_tests,
> +};
>
> struct test_suite *arch_tests[] = {
> #ifdef HAVE_DWARF_UNWIND_SUPPORT
> @@ -35,5 +44,6 @@ struct test_suite *arch_tests[] = {
> &suite__bp_modify,
> #endif
> &suite__x86_sample_parsing,
> + &suite__hybrid,
> NULL,
> };
> diff --git a/tools/perf/arch/x86/tests/hybrid.c b/tools/perf/arch/x86/tests/hybrid.c
> new file mode 100644
> index 000000000000..0f99cfd116ee
> --- /dev/null
> +++ b/tools/perf/arch/x86/tests/hybrid.c
> @@ -0,0 +1,277 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include "arch-tests.h"
> +#include "debug.h"
> +#include "evlist.h"
> +#include "evsel.h"
> +#include "pmu-hybrid.h"
> +#include "tests/tests.h"
> +
> +static bool test_config(const struct evsel *evsel, __u64 expected_config)
> +{
> + return (evsel->core.attr.config & PERF_HW_EVENT_MASK) == expected_config;
> +}
> +
> +static int test__hybrid_hw_event_with_pmu(struct evlist *evlist)
> +{
> + struct evsel *evsel = evlist__first(evlist);
> +
> + TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> + TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
> + return TEST_OK;
> +}
> +
> +static int test__hybrid_hw_group_event(struct evlist *evlist)
> +{
> + struct evsel *evsel, *leader;
> +
> + evsel = leader = evlist__first(evlist);
> + TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> + TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));

The type should be PERF_TYPE_HARDWARE, not PERF_TYPE_RAW. The real hybrid
PMU type can be found in the high 32 bits of attr.config.

/*
* attr.config layout for type PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
* PERF_TYPE_HARDWARE: 0xEEEEEEEE000000AA
* AA: hardware event ID
* EEEEEEEE: PMU type ID


Something like the below should work.

diff --git a/tools/perf/arch/x86/tests/hybrid.c b/tools/perf/arch/x86/tests/hybrid.c
index 66486335652f..6497c3f0801a 100644
--- a/tools/perf/arch/x86/tests/hybrid.c
+++ b/tools/perf/arch/x86/tests/hybrid.c
@@ -16,13 +16,19 @@ static bool test_perf_config(const struct perf_evsel *evsel, __u64 expected_conf
return (evsel->attr.config & PERF_HW_EVENT_MASK) == expected_config;
}

+static bool test_hybrid_type(const struct evsel *evsel, __u64 expected_config)
+{
+ return (evsel->core.attr.config >> PERF_PMU_TYPE_SHIFT) == expected_config;
+}
+
static int test__hybrid_hw_event_with_pmu(struct evlist *evlist)
{
struct evsel *evsel = evlist__first(evlist);

TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong hybrid type", test_hybrid_type(evsel, PERF_TYPE_RAW));
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
return TEST_OK;
}

@@ -32,13 +38,15 @@ static int test__hybrid_hw_group_event(struct evlist *evlist)

evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong hybrid type", test_hybrid_type(evsel, PERF_TYPE_RAW));
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));

evsel = evsel__next(evsel);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0xc0));
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong hybrid type", test_hybrid_type(evsel, PERF_TYPE_RAW));
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
return TEST_OK;
}
@@ -51,10 +59,10 @@ static int test__hybrid_sw_hw_group_event(struct evlist *evlist)
TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
-
evsel = evsel__next(evsel);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong hybrid type", test_hybrid_type(evsel, PERF_TYPE_RAW));
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
return TEST_OK;
}
@@ -65,8 +73,9 @@ static int test__hybrid_hw_sw_group_event(struct evlist *evlist)

evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong hybrid type", test_hybrid_type(evsel, PERF_TYPE_RAW));
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));

evsel = evsel__next(evsel);
@@ -81,15 +90,17 @@ static int test__hybrid_group_modifier1(struct evlist *evlist)

evsel = leader = evlist__first(evlist);
TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong hybrid type", test_hybrid_type(evsel, PERF_TYPE_RAW));
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_CPU_CYCLES));
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);

evsel = evsel__next(evsel);
- TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
- TEST_ASSERT_VAL("wrong config", test_config(evsel, 0xc0));
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->core.attr.type);
+ TEST_ASSERT_VAL("wrong hybrid type", test_hybrid_type(evsel, PERF_TYPE_RAW));
+ TEST_ASSERT_VAL("wrong config", test_config(evsel, PERF_COUNT_HW_INSTRUCTIONS));
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);

Thanks,
Kan


> + TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> +
> + evsel = evsel__next(evsel);
> + TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0xc0));
> + TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> + return TEST_OK;
> +}
> +
> +static int test__hybrid_sw_hw_group_event(struct evlist *evlist)
> +{
> + struct evsel *evsel, *leader;
> +
> + evsel = leader = evlist__first(evlist);
> + TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> + TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
> + TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> +
> + evsel = evsel__next(evsel);
> + TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
> + TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> + return TEST_OK;
> +}
> +
> +static int test__hybrid_hw_sw_group_event(struct evlist *evlist)
> +{
> + struct evsel *evsel, *leader;
> +
> + evsel = leader = evlist__first(evlist);
> + TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> + TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
> + TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> +
> + evsel = evsel__next(evsel);
> + TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
> + TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> + return TEST_OK;
> +}
> +
> +static int test__hybrid_group_modifier1(struct evlist *evlist)
> +{
> + struct evsel *evsel, *leader;
> +
> + evsel = leader = evlist__first(evlist);
> + TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> + TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
> + TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> + TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
> + TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> +
> + evsel = evsel__next(evsel);
> + TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0xc0));
> + TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> + TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> + TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> + return TEST_OK;
> +}
> +
> +static int test__hybrid_raw1(struct evlist *evlist)
> +{
> + struct evsel *evsel = evlist__first(evlist);
> +
> + if (!perf_pmu__hybrid_mounted("cpu_atom")) {
> + TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> + TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
> + return TEST_OK;
> + }
> +
> + TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> + TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
> +
> + /* The type of the second event is a random value */
> + evsel = evsel__next(evsel);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
> + return TEST_OK;
> +}
> +
> +static int test__hybrid_raw2(struct evlist *evlist)
> +{
> + struct evsel *evsel = evlist__first(evlist);
> +
> + TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> + TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> + TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
> + return TEST_OK;
> +}
> +
> +static int test__hybrid_cache_event(struct evlist *evlist)
> +{
> + struct evsel *evsel = evlist__first(evlist);
> +
> + TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> + TEST_ASSERT_VAL("wrong type", PERF_TYPE_HW_CACHE == evsel->core.attr.type);
> + TEST_ASSERT_VAL("wrong config", 0x2 == (evsel->core.attr.config & 0xffffffff));
> + return TEST_OK;
> +}
> +
> +static int test__checkevent_pmu(struct evlist *evlist)
> +{
> +
> + struct evsel *evsel = evlist__first(evlist);
> +
> + TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> + TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> + TEST_ASSERT_VAL("wrong config", 10 == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong config1", 1 == evsel->core.attr.config1);
> + TEST_ASSERT_VAL("wrong config2", 3 == evsel->core.attr.config2);
> + TEST_ASSERT_VAL("wrong config3", 0 == evsel->core.attr.config3);
> + /*
> + * The period value gets configured within evlist__config,
> + * while this test executes only parse events method.
> + */
> + TEST_ASSERT_VAL("wrong period", 0 == evsel->core.attr.sample_period);
> +
> + return TEST_OK;
> +}
> +
> +struct evlist_test {
> + const char *name;
> + bool (*valid)(void);
> + int (*check)(struct evlist *evlist);
> +};
> +
> +static const struct evlist_test test__hybrid_events[] = {
> + {
> + .name = "cpu_core/cpu-cycles/",
> + .check = test__hybrid_hw_event_with_pmu,
> + /* 0 */
> + },
> + {
> + .name = "{cpu_core/cpu-cycles/,cpu_core/instructions/}",
> + .check = test__hybrid_hw_group_event,
> + /* 1 */
> + },
> + {
> + .name = "{cpu-clock,cpu_core/cpu-cycles/}",
> + .check = test__hybrid_sw_hw_group_event,
> + /* 2 */
> + },
> + {
> + .name = "{cpu_core/cpu-cycles/,cpu-clock}",
> + .check = test__hybrid_hw_sw_group_event,
> + /* 3 */
> + },
> + {
> + .name = "{cpu_core/cpu-cycles/k,cpu_core/instructions/u}",
> + .check = test__hybrid_group_modifier1,
> + /* 4 */
> + },
> + {
> + .name = "r1a",
> + .check = test__hybrid_raw1,
> + /* 5 */
> + },
> + {
> + .name = "cpu_core/r1a/",
> + .check = test__hybrid_raw2,
> + /* 6 */
> + },
> + {
> + .name = "cpu_core/config=10,config1,config2=3,period=1000/u",
> + .check = test__checkevent_pmu,
> + /* 7 */
> + },
> + {
> + .name = "cpu_core/LLC-loads/",
> + .check = test__hybrid_cache_event,
> + /* 8 */
> + },
> +};
> +
> +static int test_event(const struct evlist_test *e)
> +{
> + struct parse_events_error err;
> + struct evlist *evlist;
> + int ret;
> +
> + if (e->valid && !e->valid()) {
> + pr_debug("... SKIP\n");
> + return TEST_OK;
> + }
> +
> + evlist = evlist__new();
> + if (evlist == NULL) {
> + pr_err("Failed allocation");
> + return TEST_FAIL;
> + }
> + parse_events_error__init(&err);
> + ret = parse_events(evlist, e->name, &err);
> + if (ret) {
> + pr_debug("failed to parse event '%s', err %d, str '%s'\n",
> + e->name, ret, err.str);
> + parse_events_error__print(&err, e->name);
> + ret = TEST_FAIL;
> + if (strstr(err.str, "can't access trace events"))
> + ret = TEST_SKIP;
> + } else {
> + ret = e->check(evlist);
> + }
> + parse_events_error__exit(&err);
> + evlist__delete(evlist);
> +
> + return ret;
> +}
> +
> +static int combine_test_results(int existing, int latest)
> +{
> + if (existing == TEST_FAIL)
> + return TEST_FAIL;
> + if (existing == TEST_SKIP)
> + return latest == TEST_OK ? TEST_SKIP : latest;
> + return latest;
> +}
> +
> +static int test_events(const struct evlist_test *events, int cnt)
> +{
> + int ret = TEST_OK;
> +
> + for (int i = 0; i < cnt; i++) {
> + const struct evlist_test *e = &events[i];
> + int test_ret;
> +
> + pr_debug("running test %d '%s'\n", i, e->name);
> + test_ret = test_event(e);
> + if (test_ret != TEST_OK) {
> + pr_debug("Event test failure: test %d '%s'", i, e->name);
> + ret = combine_test_results(ret, test_ret);
> + }
> + }
> +
> + return ret;
> +}
> +
> +int test__hybrid(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
> +{
> + if (!perf_pmu__has_hybrid())
> + return TEST_SKIP;
> +
> + return test_events(test__hybrid_events, ARRAY_SIZE(test__hybrid_events));
> +}
> diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
> index cb976765b8b0..15fec7f01315 100644
> --- a/tools/perf/tests/parse-events.c
> +++ b/tools/perf/tests/parse-events.c
> @@ -6,7 +6,6 @@
> #include "tests.h"
> #include "debug.h"
> #include "pmu.h"
> -#include "pmu-hybrid.h"
> #include "pmus.h"
> #include <dirent.h>
> #include <errno.h>
> @@ -1509,127 +1508,6 @@ static int test__all_tracepoints(struct evlist *evlist)
> }
> #endif /* HAVE_LIBTRACEVENT */
>
> -static int test__hybrid_hw_event_with_pmu(struct evlist *evlist)
> -{
> - struct evsel *evsel = evlist__first(evlist);
> -
> - TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> - TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
> - return TEST_OK;
> -}
> -
> -static int test__hybrid_hw_group_event(struct evlist *evlist)
> -{
> - struct evsel *evsel, *leader;
> -
> - evsel = leader = evlist__first(evlist);
> - TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> - TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
> - TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> -
> - evsel = evsel__next(evsel);
> - TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", test_config(evsel, 0xc0));
> - TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> - return TEST_OK;
> -}
> -
> -static int test__hybrid_sw_hw_group_event(struct evlist *evlist)
> -{
> - struct evsel *evsel, *leader;
> -
> - evsel = leader = evlist__first(evlist);
> - TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> - TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> -
> - evsel = evsel__next(evsel);
> - TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
> - TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> - return TEST_OK;
> -}
> -
> -static int test__hybrid_hw_sw_group_event(struct evlist *evlist)
> -{
> - struct evsel *evsel, *leader;
> -
> - evsel = leader = evlist__first(evlist);
> - TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> - TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
> - TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> -
> - evsel = evsel__next(evsel);
> - TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> - return TEST_OK;
> -}
> -
> -static int test__hybrid_group_modifier1(struct evlist *evlist)
> -{
> - struct evsel *evsel, *leader;
> -
> - evsel = leader = evlist__first(evlist);
> - TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> - TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x3c));
> - TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> - TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
> - TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> -
> - evsel = evsel__next(evsel);
> - TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", test_config(evsel, 0xc0));
> - TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> - TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> - TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> - return TEST_OK;
> -}
> -
> -static int test__hybrid_raw1(struct evlist *evlist)
> -{
> - struct evsel *evsel = evlist__first(evlist);
> -
> - if (!perf_pmu__hybrid_mounted("cpu_atom")) {
> - TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> - TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
> - return TEST_OK;
> - }
> -
> - TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
> - TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
> -
> - /* The type of second event is randome value */
> - evsel = evsel__next(evsel);
> - TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
> - return TEST_OK;
> -}
> -
> -static int test__hybrid_raw2(struct evlist *evlist)
> -{
> - struct evsel *evsel = evlist__first(evlist);
> -
> - TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> - TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", test_config(evsel, 0x1a));
> - return TEST_OK;
> -}
> -
> -static int test__hybrid_cache_event(struct evlist *evlist)
> -{
> - struct evsel *evsel = evlist__first(evlist);
> -
> - TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
> - TEST_ASSERT_VAL("wrong type", PERF_TYPE_HW_CACHE == evsel->core.attr.type);
> - TEST_ASSERT_VAL("wrong config", 0x2 == (evsel->core.attr.config & 0xffffffff));
> - return TEST_OK;
> -}
> -
> struct evlist_test {
> const char *name;
> bool (*valid)(void);
> @@ -1997,54 +1875,6 @@ static const struct terms_test test__terms[] = {
> },
> };
>
> -static const struct evlist_test test__hybrid_events[] = {
> - {
> - .name = "cpu_core/cpu-cycles/",
> - .check = test__hybrid_hw_event_with_pmu,
> - /* 0 */
> - },
> - {
> - .name = "{cpu_core/cpu-cycles/,cpu_core/instructions/}",
> - .check = test__hybrid_hw_group_event,
> - /* 1 */
> - },
> - {
> - .name = "{cpu-clock,cpu_core/cpu-cycles/}",
> - .check = test__hybrid_sw_hw_group_event,
> - /* 2 */
> - },
> - {
> - .name = "{cpu_core/cpu-cycles/,cpu-clock}",
> - .check = test__hybrid_hw_sw_group_event,
> - /* 3 */
> - },
> - {
> - .name = "{cpu_core/cpu-cycles/k,cpu_core/instructions/u}",
> - .check = test__hybrid_group_modifier1,
> - /* 4 */
> - },
> - {
> - .name = "r1a",
> - .check = test__hybrid_raw1,
> - /* 5 */
> - },
> - {
> - .name = "cpu_core/r1a/",
> - .check = test__hybrid_raw2,
> - /* 6 */
> - },
> - {
> - .name = "cpu_core/config=10,config1,config2=3,period=1000/u",
> - .check = test__checkevent_pmu,
> - /* 7 */
> - },
> - {
> - .name = "cpu_core/LLC-loads/",
> - .check = test__hybrid_cache_event,
> - /* 8 */
> - },
> -};
> -
> static int test_event(const struct evlist_test *e)
> {
> struct parse_events_error err;
> @@ -2307,14 +2137,6 @@ static bool test_alias(char **event, char **alias)
> return false;
> }
>
> -static int test__hybrid(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
> -{
> - if (!perf_pmu__has_hybrid())
> - return TEST_SKIP;
> -
> - return test_events(test__hybrid_events, ARRAY_SIZE(test__hybrid_events));
> -}
> -
> static int test__checkevent_pmu_events_alias(struct evlist *evlist)
> {
> struct evsel *evsel1 = evlist__first(evlist);
> @@ -2378,9 +2200,6 @@ static struct test_case tests__parse_events[] = {
> TEST_CASE_REASON("Test event parsing",
> events2,
> "permissions"),
> - TEST_CASE_REASON("Test parsing of \"hybrid\" CPU events",
> - hybrid,
> - "not hybrid"),
> TEST_CASE_REASON("Parsing of all PMU events from sysfs",
> pmu_events,
> "permissions"),