2022-12-07 05:52:38

by Sandipan Das

Subject: [PATCH 4/4] perf vendor events amd: Add Zen 4 metrics

Add metrics taken from Section 2.1.15.2 "Performance Measurement" in
the Processor Programming Reference (PPR) for AMD Family 19h Model 11h
Revision B1 processors.

The recommended metrics are sourced from Table 27 "Guidance for Common
Performance Statistics with Complex Event Selects".

The pipeline utilization metrics are sourced from Table 28 "Guidance
for Pipeline Utilization Analysis Statistics". These are new to Zen 4
processors and useful for finding performance bottlenecks by analyzing
activity at different stages of the pipeline. Metric groups have been
added for Level 1 and Level 2 analysis.
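As an aside for readers unfamiliar with top-down analysis: the Level 1 metrics partition all dispatch slots into five fractions that sum to one. The sketch below illustrates that accounting with made-up event counts (on hardware, these come from the PMU events defined in the diff; `d_ratio` is perf's zero-safe division):

```python
def d_ratio(num, den):
    # perf metric-expression d_ratio: returns 0 instead of dividing by zero
    return num / den if den else 0

ls_not_halted_cyc = 1_000_000
total_dispatch_slots = 6 * ls_not_halted_cyc  # up to 6 ops dispatched per cycle

# Hypothetical slot accounting for one measurement interval
no_ops_from_frontend = 1_200_000  # slots left empty by the frontend
backend_stalls       = 1_600_000  # slots lost to backend stalls
smt_contention       =   600_000  # slots handed to the sibling thread
de_src_op_disp_all   = 2_600_000  # ops dispatched
ex_ret_ops           = 2_400_000  # ops retired

frontend_bound  = d_ratio(no_ops_from_frontend, total_dispatch_slots)
backend_bound   = d_ratio(backend_stalls, total_dispatch_slots)
smt             = d_ratio(smt_contention, total_dispatch_slots)
bad_speculation = d_ratio(de_src_op_disp_all - ex_ret_ops, total_dispatch_slots)
retiring        = d_ratio(ex_ret_ops, total_dispatch_slots)

# The five Level 1 fractions partition the dispatch slots
total = frontend_bound + backend_bound + smt + bad_speculation + retiring
assert abs(total - 1.0) < 1e-9
```

Level 2 then subdivides each Level 1 bucket (e.g. frontend_bound into latency vs. bandwidth) the same way.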

Signed-off-by: Sandipan Das <[email protected]>
---
.../pmu-events/arch/x86/amdzen4/pipeline.json | 98 +++++
.../arch/x86/amdzen4/recommended.json | 334 ++++++++++++++++++
2 files changed, 432 insertions(+)
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen4/recommended.json

diff --git a/tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json b/tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json
new file mode 100644
index 000000000000..23d1f35d0903
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json
@@ -0,0 +1,98 @@
+[
+ {
+ "MetricName": "total_dispatch_slots",
+ "BriefDescription": "Total dispatch slots (up to 6 instructions can be dispatched in each cycle).",
+ "MetricExpr": "6 * ls_not_halted_cyc"
+ },
+ {
+ "MetricName": "frontend_bound",
+ "BriefDescription": "Fraction of dispatch slots that remained unused because the frontend did not supply enough instructions/ops.",
+ "MetricExpr": "d_ratio(de_no_dispatch_per_slot.no_ops_from_frontend, total_dispatch_slots)",
+ "MetricGroup": "pipeline_utilization_level1",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricName": "bad_speculation",
+ "BriefDescription": "Fraction of dispatched ops that did not retire.",
+ "MetricExpr": "d_ratio(de_src_op_disp.all - ex_ret_ops, total_dispatch_slots)",
+ "MetricGroup": "pipeline_utilization_level1",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricName": "backend_bound",
+ "BriefDescription": "Fraction of dispatch slots that remained unused because of backend stalls.",
+ "MetricExpr": "d_ratio(de_no_dispatch_per_slot.backend_stalls, total_dispatch_slots)",
+ "MetricGroup": "pipeline_utilization_level1",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricName": "smt_contention",
+ "BriefDescription": "Fraction of dispatch slots that remained unused because the other thread was selected.",
+ "MetricExpr": "d_ratio(de_no_dispatch_per_slot.smt_contention, total_dispatch_slots)",
+ "MetricGroup": "pipeline_utilization_level1",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricName": "retiring",
+ "BriefDescription": "Fraction of dispatch slots used by ops that retired.",
+ "MetricExpr": "d_ratio(ex_ret_ops, total_dispatch_slots)",
+ "MetricGroup": "pipeline_utilization_level1",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricName": "frontend_bound_latency",
+ "BriefDescription": "Fraction of dispatch slots that remained unused because of a latency bottleneck in the frontend (such as instruction cache or TLB misses).",
+ "MetricExpr": "d_ratio((6 * cpu@de_no_dispatch_per_slot.no_ops_from_frontend\\,cmask\\=0x6@), total_dispatch_slots)",
+ "MetricGroup": "pipeline_utilization_level2;frontend_bound_level2",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricName": "frontend_bound_bandwidth",
+ "BriefDescription": "Fraction of dispatch slots that remained unused because of a bandwidth bottleneck in the frontend (such as decode or op cache fetch bandwidth).",
+ "MetricExpr": "d_ratio(de_no_dispatch_per_slot.no_ops_from_frontend - (6 * cpu@de_no_dispatch_per_slot.no_ops_from_frontend\\,cmask\\=0x6@), total_dispatch_slots)",
+ "MetricGroup": "pipeline_utilization_level2;frontend_bound_level2",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricName": "bad_speculation_mispredicts",
+ "BriefDescription": "Fraction of dispatched ops that were flushed due to branch mispredicts.",
+ "MetricExpr": "d_ratio(bad_speculation * ex_ret_brn_misp, ex_ret_brn_misp + resyncs_or_nc_redirects)",
+ "MetricGroup": "pipeline_utilization_level2;bad_speculation_level2",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricName": "bad_speculation_pipeline_restarts",
+ "BriefDescription": "Fraction of dispatched ops that were flushed due to pipeline restarts (resyncs).",
+ "MetricExpr": "d_ratio(bad_speculation * resyncs_or_nc_redirects, ex_ret_brn_misp + resyncs_or_nc_redirects)",
+ "MetricGroup": "pipeline_utilization_level2;bad_speculation_level2",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricName": "backend_bound_memory",
+ "BriefDescription": "Fraction of dispatch slots that remained unused because of stalls due to the memory subsystem.",
+ "MetricExpr": "backend_bound * d_ratio(ex_no_retire.load_not_complete, ex_no_retire.not_complete)",
+ "MetricGroup": "pipeline_utilization_level2;backend_bound_level2",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricName": "backend_bound_cpu",
+ "BriefDescription": "Fraction of dispatch slots that remained unused because of stalls not related to the memory subsystem.",
+ "MetricExpr": "backend_bound * (1 - d_ratio(ex_no_retire.load_not_complete, ex_no_retire.not_complete))",
+ "MetricGroup": "pipeline_utilization_level2;backend_bound_level2",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricName": "retiring_fastpath",
+ "BriefDescription": "Fraction of dispatch slots used by fastpath ops that retired.",
+ "MetricExpr": "retiring * (1 - d_ratio(ex_ret_ucode_ops, ex_ret_ops))",
+ "MetricGroup": "pipeline_utilization_level2;retiring_level2",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricName": "retiring_microcode",
+ "BriefDescription": "Fraction of dispatch slots used by microcode ops that retired.",
+ "MetricExpr": "retiring * d_ratio(ex_ret_ucode_ops, ex_ret_ops)",
+ "MetricGroup": "pipeline_utilization_level2;retiring_level2",
+ "ScaleUnit": "100%"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/x86/amdzen4/recommended.json b/tools/perf/pmu-events/arch/x86/amdzen4/recommended.json
new file mode 100644
index 000000000000..2e3c9d8942b9
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/amdzen4/recommended.json
@@ -0,0 +1,334 @@
+[
+ {
+ "MetricName": "branch_misprediction_ratio",
+ "BriefDescription": "Execution-time branch misprediction ratio (non-speculative).",
+ "MetricExpr": "d_ratio(ex_ret_brn_misp, ex_ret_brn)",
+ "MetricGroup": "branch_prediction",
+ "ScaleUnit": "100%"
+ },
+ {
+ "EventName": "all_data_cache_accesses",
+ "EventCode": "0x29",
+ "BriefDescription": "All data cache accesses.",
+ "UMask": "0x07"
+ },
+ {
+ "MetricName": "all_l2_cache_accesses",
+ "BriefDescription": "All L2 cache accesses.",
+ "MetricExpr": "l2_request_g1.all_no_prefetch + l2_pf_hit_l2.all + l2_pf_miss_l2_hit_l3.all + l2_pf_miss_l2_l3.all",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "MetricName": "l2_cache_accesses_from_l1_ic_misses",
+ "BriefDescription": "L2 cache accesses from L1 instruction cache misses (including prefetch).",
+ "MetricExpr": "l2_request_g1.cacheable_ic_read",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "MetricName": "l2_cache_accesses_from_l1_dc_misses",
+ "BriefDescription": "L2 cache accesses from L1 data cache misses (including prefetch).",
+ "MetricExpr": "l2_request_g1.all_dc",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "MetricName": "l2_cache_accesses_from_l2_hwpf",
+ "BriefDescription": "L2 cache accesses from L2 cache hardware prefetcher.",
+ "MetricExpr": "l2_pf_hit_l2.all + l2_pf_miss_l2_hit_l3.all + l2_pf_miss_l2_l3.all",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "MetricName": "all_l2_cache_misses",
+ "BriefDescription": "All L2 cache misses.",
+ "MetricExpr": "l2_cache_req_stat.ic_dc_miss_in_l2 + l2_pf_miss_l2_hit_l3.all + l2_pf_miss_l2_l3.all",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "MetricName": "l2_cache_misses_from_l1_ic_miss",
+ "BriefDescription": "L2 cache misses from L1 instruction cache misses.",
+ "MetricExpr": "l2_cache_req_stat.ic_fill_miss",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "MetricName": "l2_cache_misses_from_l1_dc_miss",
+ "BriefDescription": "L2 cache misses from L1 data cache misses.",
+ "MetricExpr": "l2_cache_req_stat.ls_rd_blk_c",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "MetricName": "l2_cache_misses_from_l2_hwpf",
+ "BriefDescription": "L2 cache misses from L2 cache hardware prefetcher.",
+ "MetricExpr": "l2_pf_miss_l2_hit_l3.all + l2_pf_miss_l2_l3.all",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "MetricName": "all_l2_cache_hits",
+ "BriefDescription": "All L2 cache hits.",
+ "MetricExpr": "l2_cache_req_stat.ic_dc_hit_in_l2 + l2_pf_hit_l2.all",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "MetricName": "l2_cache_hits_from_l1_ic_miss",
+ "BriefDescription": "L2 cache hits from L1 instruction cache misses.",
+ "MetricExpr": "l2_cache_req_stat.ic_hit_in_l2",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "MetricName": "l2_cache_hits_from_l1_dc_miss",
+ "BriefDescription": "L2 cache hits from L1 data cache misses.",
+ "MetricExpr": "l2_cache_req_stat.dc_hit_in_l2",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "MetricName": "l2_cache_hits_from_l2_hwpf",
+ "BriefDescription": "L2 cache hits from L2 cache hardware prefetcher.",
+ "MetricExpr": "l2_pf_hit_l2.all",
+ "MetricGroup": "l2_cache"
+ },
+ {
+ "MetricName": "l3_cache_accesses",
+ "BriefDescription": "L3 cache accesses.",
+ "MetricExpr": "l3_lookup_state.all_coherent_accesses_to_l3",
+ "MetricGroup": "l3_cache"
+ },
+ {
+ "MetricName": "l3_misses",
+ "BriefDescription": "L3 misses (including cacheline state change requests).",
+ "MetricExpr": "l3_lookup_state.l3_miss",
+ "MetricGroup": "l3_cache"
+ },
+ {
+ "MetricName": "l3_read_miss_latency",
+ "BriefDescription": "Average L3 read miss latency (in core clocks).",
+ "MetricExpr": "(l3_xi_sampled_latency.all * 10) / l3_xi_sampled_latency_requests.all",
+ "MetricGroup": "l3_cache",
+ "ScaleUnit": "1core clocks"
+ },
+ {
+ "MetricName": "op_cache_fetch_miss_ratio",
+ "BriefDescription": "Op cache miss ratio for all fetches.",
+ "MetricExpr": "d_ratio(op_cache_hit_miss.op_cache_miss, op_cache_hit_miss.all_op_cache_accesses)",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricName": "ic_fetch_miss_ratio",
+ "BriefDescription": "Instruction cache miss ratio for all fetches. An instruction cache miss will not be counted by this metric if it is an OC hit.",
+ "MetricExpr": "d_ratio(ic_tag_hit_miss.instruction_cache_miss, ic_tag_hit_miss.all_instruction_cache_accesses)",
+ "ScaleUnit": "100%"
+ },
+ {
+ "MetricName": "l1_data_cache_fills_from_memory",
+ "BriefDescription": "L1 data cache fills from DRAM or MMIO in any NUMA node.",
+ "MetricExpr": "ls_any_fills_from_sys.dram_io_all",
+ "MetricGroup": "l1_dcache"
+ },
+ {
+ "MetricName": "l1_data_cache_fills_from_remote_node",
+ "BriefDescription": "L1 data cache fills from a different NUMA node.",
+ "MetricExpr": "ls_any_fills_from_sys.far_all",
+ "MetricGroup": "l1_dcache"
+ },
+ {
+ "MetricName": "l1_data_cache_fills_from_same_ccx",
+ "BriefDescription": "L1 data cache fills from within the same CCX.",
+ "MetricExpr": "ls_any_fills_from_sys.local_all",
+ "MetricGroup": "l1_dcache"
+ },
+ {
+ "MetricName": "l1_data_cache_fills_from_different_ccx",
+ "BriefDescription": "L1 data cache fills from another CCX cache in any NUMA node.",
+ "MetricExpr": "ls_any_fills_from_sys.remote_cache",
+ "MetricGroup": "l1_dcache"
+ },
+ {
+ "MetricName": "all_l1_data_cache_fills",
+ "BriefDescription": "All L1 data cache fills.",
+ "MetricExpr": "ls_any_fills_from_sys.all",
+ "MetricGroup": "l1_dcache"
+ },
+ {
+ "MetricName": "l1_demand_data_cache_fills_from_local_l2",
+ "BriefDescription": "L1 demand data cache fills from local L2 cache.",
+ "MetricExpr": "ls_dmnd_fills_from_sys.local_l2",
+ "MetricGroup": "l1_dcache"
+ },
+ {
+ "MetricName": "l1_demand_data_cache_fills_from_same_ccx",
+ "BriefDescription": "L1 demand data cache fills from within the same CCX.",
+ "MetricExpr": "ls_dmnd_fills_from_sys.local_ccx",
+ "MetricGroup": "l1_dcache"
+ },
+ {
+ "MetricName": "l1_demand_data_cache_fills_from_near_cache",
+ "BriefDescription": "L1 demand data cache fills from another CCX cache in the same NUMA node.",
+ "MetricExpr": "ls_dmnd_fills_from_sys.near_cache",
+ "MetricGroup": "l1_dcache"
+ },
+ {
+ "MetricName": "l1_demand_data_cache_fills_from_near_memory",
+ "BriefDescription": "L1 demand data cache fills from DRAM or MMIO in the same NUMA node.",
+ "MetricExpr": "ls_dmnd_fills_from_sys.dram_io_near",
+ "MetricGroup": "l1_dcache"
+ },
+ {
+ "MetricName": "l1_demand_data_cache_fills_from_far_cache",
+ "BriefDescription": "L1 demand data cache fills from another CCX cache in a different NUMA node.",
+ "MetricExpr": "ls_dmnd_fills_from_sys.far_cache",
+ "MetricGroup": "l1_dcache"
+ },
+ {
+ "MetricName": "l1_demand_data_cache_fills_from_far_memory",
+ "BriefDescription": "L1 demand data cache fills from DRAM or MMIO in a different NUMA node.",
+ "MetricExpr": "ls_dmnd_fills_from_sys.dram_io_far",
+ "MetricGroup": "l1_dcache"
+ },
+ {
+ "MetricName": "l1_itlb_misses",
+ "BriefDescription": "L1 instruction TLB misses.",
+ "MetricExpr": "bp_l1_tlb_miss_l2_tlb_hit + bp_l1_tlb_miss_l2_tlb_miss.all",
+ "MetricGroup": "tlb"
+ },
+ {
+ "MetricName": "l2_itlb_misses",
+ "BriefDescription": "L2 instruction TLB misses and instruction page walks.",
+ "MetricExpr": "bp_l1_tlb_miss_l2_tlb_miss.all",
+ "MetricGroup": "tlb"
+ },
+ {
+ "MetricName": "l1_dtlb_misses",
+ "BriefDescription": "L1 data TLB misses.",
+ "MetricExpr": "ls_l1_d_tlb_miss.all",
+ "MetricGroup": "tlb"
+ },
+ {
+ "MetricName": "l2_dtlb_misses",
+ "BriefDescription": "L2 data TLB misses and data page walks.",
+ "MetricExpr": "ls_l1_d_tlb_miss.all_l2_miss",
+ "MetricGroup": "tlb"
+ },
+ {
+ "MetricName": "all_tlbs_flushed",
+ "BriefDescription": "All TLBs flushed.",
+ "MetricExpr": "ls_tlb_flush.all",
+ "MetricGroup": "tlb"
+ },
+ {
+ "MetricName": "macro_ops_dispatched",
+ "BriefDescription": "Macro-ops dispatched.",
+ "MetricExpr": "de_src_op_disp.all",
+ "MetricGroup": "decoder"
+ },
+ {
+ "MetricName": "sse_avx_stalls",
+ "BriefDescription": "Mixed SSE/AVX stalls.",
+ "MetricExpr": "fp_disp_faults.sse_avx_all"
+ },
+ {
+ "MetricName": "macro_ops_retired",
+ "BriefDescription": "Macro-ops retired.",
+ "MetricExpr": "ex_ret_ops"
+ },
+ {
+ "MetricName": "dram_read_data_bytes_for_local_processor",
+ "BriefDescription": "DRAM read data bytes for local processor.",
+ "MetricExpr": "local_processor_read_data_beats_cs0 + local_processor_read_data_beats_cs1 + local_processor_read_data_beats_cs2 + local_processor_read_data_beats_cs3 + local_processor_read_data_beats_cs4 + local_processor_read_data_beats_cs5 + local_processor_read_data_beats_cs6 + local_processor_read_data_beats_cs7 + local_processor_read_data_beats_cs8 + local_processor_read_data_beats_cs9 + local_processor_read_data_beats_cs10 + local_processor_read_data_beats_cs11",
+ "MetricGroup": "data_fabric",
+ "PerPkg": "1",
+ "ScaleUnit": "6.103515625e-5MiB"
+ },
+ {
+ "MetricName": "dram_write_data_bytes_for_local_processor",
+ "BriefDescription": "DRAM write data bytes for local processor.",
+ "MetricExpr": "local_processor_write_data_beats_cs0 + local_processor_write_data_beats_cs1 + local_processor_write_data_beats_cs2 + local_processor_write_data_beats_cs3 + local_processor_write_data_beats_cs4 + local_processor_write_data_beats_cs5 + local_processor_write_data_beats_cs6 + local_processor_write_data_beats_cs7 + local_processor_write_data_beats_cs8 + local_processor_write_data_beats_cs9 + local_processor_write_data_beats_cs10 + local_processor_write_data_beats_cs11",
+ "MetricGroup": "data_fabric",
+ "PerPkg": "1",
+ "ScaleUnit": "6.103515625e-5MiB"
+ },
+ {
+ "MetricName": "dram_read_data_bytes_for_remote_processor",
+ "BriefDescription": "DRAM read data bytes for remote processor.",
+ "MetricExpr": "remote_processor_read_data_beats_cs0 + remote_processor_read_data_beats_cs1 + remote_processor_read_data_beats_cs2 + remote_processor_read_data_beats_cs3 + remote_processor_read_data_beats_cs4 + remote_processor_read_data_beats_cs5 + remote_processor_read_data_beats_cs6 + remote_processor_read_data_beats_cs7 + remote_processor_read_data_beats_cs8 + remote_processor_read_data_beats_cs9 + remote_processor_read_data_beats_cs10 + remote_processor_read_data_beats_cs11",
+ "MetricGroup": "data_fabric",
+ "PerPkg": "1",
+ "ScaleUnit": "6.103515625e-5MiB"
+ },
+ {
+ "MetricName": "dram_write_data_bytes_for_remote_processor",
+ "BriefDescription": "DRAM write data bytes for remote processor.",
+ "MetricExpr": "remote_processor_write_data_beats_cs0 + remote_processor_write_data_beats_cs1 + remote_processor_write_data_beats_cs2 + remote_processor_write_data_beats_cs3 + remote_processor_write_data_beats_cs4 + remote_processor_write_data_beats_cs5 + remote_processor_write_data_beats_cs6 + remote_processor_write_data_beats_cs7 + remote_processor_write_data_beats_cs8 + remote_processor_write_data_beats_cs9 + remote_processor_write_data_beats_cs10 + remote_processor_write_data_beats_cs11",
+ "MetricGroup": "data_fabric",
+ "PerPkg": "1",
+ "ScaleUnit": "6.103515625e-5MiB"
+ },
+ {
+ "MetricName": "local_socket_upstream_dma_read_data_bytes",
+ "BriefDescription": "Local socket upstream DMA read data bytes.",
+ "MetricExpr": "local_socket_upstream_read_beats_iom0 + local_socket_upstream_read_beats_iom1 + local_socket_upstream_read_beats_iom2 + local_socket_upstream_read_beats_iom3",
+ "MetricGroup": "data_fabric",
+ "PerPkg": "1",
+ "ScaleUnit": "6.103515625e-5MiB"
+ },
+ {
+ "MetricName": "local_socket_upstream_dma_write_data_bytes",
+ "BriefDescription": "Local socket upstream DMA write data bytes.",
+ "MetricExpr": "local_socket_upstream_write_beats_iom0 + local_socket_upstream_write_beats_iom1 + local_socket_upstream_write_beats_iom2 + local_socket_upstream_write_beats_iom3",
+ "MetricGroup": "data_fabric",
+ "PerPkg": "1",
+ "ScaleUnit": "6.103515625e-5MiB"
+ },
+ {
+ "MetricName": "remote_socket_upstream_dma_read_data_bytes",
+ "BriefDescription": "Remote socket upstream DMA read data bytes.",
+ "MetricExpr": "remote_socket_upstream_read_beats_iom0 + remote_socket_upstream_read_beats_iom1 + remote_socket_upstream_read_beats_iom2 + remote_socket_upstream_read_beats_iom3",
+ "MetricGroup": "data_fabric",
+ "PerPkg": "1",
+ "ScaleUnit": "6.103515625e-5MiB"
+ },
+ {
+ "MetricName": "remote_socket_upstream_dma_write_data_bytes",
+ "BriefDescription": "Remote socket upstream DMA write data bytes.",
+ "MetricExpr": "remote_socket_upstream_write_beats_iom0 + remote_socket_upstream_write_beats_iom1 + remote_socket_upstream_write_beats_iom2 + remote_socket_upstream_write_beats_iom3",
+ "MetricGroup": "data_fabric",
+ "PerPkg": "1",
+ "ScaleUnit": "6.103515625e-5MiB"
+ },
+ {
+ "MetricName": "local_socket_inbound_data_bytes_to_cpu",
+ "BriefDescription": "Local socket inbound data bytes to the CPU (e.g. read data).",
+ "MetricExpr": "local_socket_inf0_inbound_data_beats_ccm0 + local_socket_inf1_inbound_data_beats_ccm0 + local_socket_inf0_inbound_data_beats_ccm1 + local_socket_inf1_inbound_data_beats_ccm1 + local_socket_inf0_inbound_data_beats_ccm2 + local_socket_inf1_inbound_data_beats_ccm2 + local_socket_inf0_inbound_data_beats_ccm3 + local_socket_inf1_inbound_data_beats_ccm3 + local_socket_inf0_inbound_data_beats_ccm4 + local_socket_inf1_inbound_data_beats_ccm4 + local_socket_inf0_inbound_data_beats_ccm5 + local_socket_inf1_inbound_data_beats_ccm5 + local_socket_inf0_inbound_data_beats_ccm6 + local_socket_inf1_inbound_data_beats_ccm6 + local_socket_inf0_inbound_data_beats_ccm7 + local_socket_inf1_inbound_data_beats_ccm7",
+ "MetricGroup": "data_fabric",
+ "PerPkg": "1",
+ "ScaleUnit": "3.0517578125e-5MiB"
+ },
+ {
+ "MetricName": "local_socket_outbound_data_bytes_from_cpu",
+ "BriefDescription": "Local socket outbound data bytes from the CPU (e.g. write data).",
+ "MetricExpr": "local_socket_inf0_outbound_data_beats_ccm0 + local_socket_inf1_outbound_data_beats_ccm0 + local_socket_inf0_outbound_data_beats_ccm1 + local_socket_inf1_outbound_data_beats_ccm1 + local_socket_inf0_outbound_data_beats_ccm2 + local_socket_inf1_outbound_data_beats_ccm2 + local_socket_inf0_outbound_data_beats_ccm3 + local_socket_inf1_outbound_data_beats_ccm3 + local_socket_inf0_outbound_data_beats_ccm4 + local_socket_inf1_outbound_data_beats_ccm4 + local_socket_inf0_outbound_data_beats_ccm5 + local_socket_inf1_outbound_data_beats_ccm5 + local_socket_inf0_outbound_data_beats_ccm6 + local_socket_inf1_outbound_data_beats_ccm6 + local_socket_inf0_outbound_data_beats_ccm7 + local_socket_inf1_outbound_data_beats_ccm7",
+ "MetricGroup": "data_fabric",
+ "PerPkg": "1",
+ "ScaleUnit": "6.103515625e-5MiB"
+ },
+ {
+ "MetricName": "remote_socket_inbound_data_bytes_to_cpu",
+ "BriefDescription": "Remote socket inbound data bytes to the CPU (e.g. read data).",
+ "MetricExpr": "remote_socket_inf0_inbound_data_beats_ccm0 + remote_socket_inf1_inbound_data_beats_ccm0 + remote_socket_inf0_inbound_data_beats_ccm1 + remote_socket_inf1_inbound_data_beats_ccm1 + remote_socket_inf0_inbound_data_beats_ccm2 + remote_socket_inf1_inbound_data_beats_ccm2 + remote_socket_inf0_inbound_data_beats_ccm3 + remote_socket_inf1_inbound_data_beats_ccm3 + remote_socket_inf0_inbound_data_beats_ccm4 + remote_socket_inf1_inbound_data_beats_ccm4 + remote_socket_inf0_inbound_data_beats_ccm5 + remote_socket_inf1_inbound_data_beats_ccm5 + remote_socket_inf0_inbound_data_beats_ccm6 + remote_socket_inf1_inbound_data_beats_ccm6 + remote_socket_inf0_inbound_data_beats_ccm7 + remote_socket_inf1_inbound_data_beats_ccm7",
+ "MetricGroup": "data_fabric",
+ "PerPkg": "1",
+ "ScaleUnit": "3.0517578125e-5MiB"
+ },
+ {
+ "MetricName": "remote_socket_outbound_data_bytes_from_cpu",
+ "BriefDescription": "Remote socket outbound data bytes from the CPU (e.g. write data).",
+ "MetricExpr": "remote_socket_inf0_outbound_data_beats_ccm0 + remote_socket_inf1_outbound_data_beats_ccm0 + remote_socket_inf0_outbound_data_beats_ccm1 + remote_socket_inf1_outbound_data_beats_ccm1 + remote_socket_inf0_outbound_data_beats_ccm2 + remote_socket_inf1_outbound_data_beats_ccm2 + remote_socket_inf0_outbound_data_beats_ccm3 + remote_socket_inf1_outbound_data_beats_ccm3 + remote_socket_inf0_outbound_data_beats_ccm4 + remote_socket_inf1_outbound_data_beats_ccm4 + remote_socket_inf0_outbound_data_beats_ccm5 + remote_socket_inf1_outbound_data_beats_ccm5 + remote_socket_inf0_outbound_data_beats_ccm6 + remote_socket_inf1_outbound_data_beats_ccm6 + remote_socket_inf0_outbound_data_beats_ccm7 + remote_socket_inf1_outbound_data_beats_ccm7",
+ "MetricGroup": "data_fabric",
+ "PerPkg": "1",
+ "ScaleUnit": "6.103515625e-5MiB"
+ },
+ {
+ "MetricName": "local_socket_outbound_data_bytes_from_all_links",
+ "BriefDescription": "Outbound data bytes from all links (local socket).",
+ "MetricExpr": "local_socket_outbound_data_beats_link0 + local_socket_outbound_data_beats_link1 + local_socket_outbound_data_beats_link2 + local_socket_outbound_data_beats_link3 + local_socket_outbound_data_beats_link4 + local_socket_outbound_data_beats_link5 + local_socket_outbound_data_beats_link6 + local_socket_outbound_data_beats_link7",
+ "MetricGroup": "data_fabric",
+ "PerPkg": "1",
+ "ScaleUnit": "6.103515625e-5MiB"
+ }
+]
--
2.34.1
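[Editorial note on the ScaleUnit constants in the data-fabric metrics above: they are simply the beat size expressed in MiB, so that perf reports byte traffic directly from beat counts. 6.103515625e-5 MiB equals 64 bytes and 3.0517578125e-5 MiB equals 32 bytes, which suggests the read/write data-beat events count 64-byte beats while the inbound-data events count 32-byte beats. A minimal sketch of the conversion:]

```python
BYTES_PER_MIB = 1 << 20  # 1048576

# The two ScaleUnit constants are beat sizes in MiB
assert 64 / BYTES_PER_MIB == 6.103515625e-5
assert 32 / BYTES_PER_MIB == 3.0517578125e-5

def beats_to_mib(beats, beat_bytes):
    # Convert a raw data-beat count to MiB transferred
    return beats * beat_bytes / BYTES_PER_MIB
```

For example, 16384 beats of 64 bytes is exactly 1 MiB, matching what perf would print after applying the ScaleUnit.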


2022-12-07 06:27:25

by Ian Rogers

Subject: Re: [PATCH 4/4] perf vendor events amd: Add Zen 4 metrics

On Tue, Dec 6, 2022 at 9:32 PM Sandipan Das <[email protected]> wrote:
>
> Add metrics taken from Section 2.1.15.2 "Performance Measurement" in
> the Processor Programming Reference (PPR) for AMD Family 19h Model 11h
> Revision B1 processors.
>
> The recommended metrics are sourced from Table 27 "Guidance for Common
> Performance Statistics with Complex Event Selects".
>
> The pipeline utilization metrics are sourced from Table 28 "Guidance
> for Pipeline Utilization Analysis Statistics". These are new to Zen 4
> processors and useful for finding performance bottlenecks by analyzing
> activity at different stages of the pipeline. Metric groups have been
> added for Level 1 and Level 2 analysis.
>
> Signed-off-by: Sandipan Das <[email protected]>
> ---
> .../pmu-events/arch/x86/amdzen4/pipeline.json | 98 +++++
> .../arch/x86/amdzen4/recommended.json | 334 ++++++++++++++++++
> 2 files changed, 432 insertions(+)
> create mode 100644 tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json
> create mode 100644 tools/perf/pmu-events/arch/x86/amdzen4/recommended.json
>
> diff --git a/tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json b/tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json
> new file mode 100644
> index 000000000000..23d1f35d0903
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json
> @@ -0,0 +1,98 @@
> +[
> + {
> + "MetricName": "total_dispatch_slots",
> + "BriefDescription": "Total dispatch slots (up to 6 instructions can be dispatched in each cycle).",
> + "MetricExpr": "6 * ls_not_halted_cyc"
> + },
> + {
> + "MetricName": "frontend_bound",
> + "BriefDescription": "Fraction of dispatch slots that remained unused because the frontend did not supply enough instructions/ops.",
> + "MetricExpr": "d_ratio(de_no_dispatch_per_slot.no_ops_from_frontend, total_dispatch_slots)",
> + "MetricGroup": "pipeline_utilization_level1",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "bad_speculation",
> + "BriefDescription": "Fraction of dispatched ops that did not retire.",
> + "MetricExpr": "d_ratio(de_src_op_disp.all - ex_ret_ops, total_dispatch_slots)",
> + "MetricGroup": "pipeline_utilization_level1",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "backend_bound",
> + "BriefDescription": "Fraction of dispatch slots that remained unused because of backend stalls.",
> + "MetricExpr": "d_ratio(de_no_dispatch_per_slot.backend_stalls, total_dispatch_slots)",
> + "MetricGroup": "pipeline_utilization_level1",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "smt_contention",
> + "BriefDescription": "Fraction of dispatch slots that remained unused because the other thread was selected.",
> + "MetricExpr": "d_ratio(de_no_dispatch_per_slot.smt_contention, total_dispatch_slots)",
> + "MetricGroup": "pipeline_utilization_level1",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "retiring",
> + "BriefDescription": "Fraction of dispatch slots used by ops that retired.",
> + "MetricExpr": "d_ratio(ex_ret_ops, total_dispatch_slots)",
> + "MetricGroup": "pipeline_utilization_level1",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "frontend_bound_latency",
> + "BriefDescription": "Fraction of dispatch slots that remained unused because of a latency bottleneck in the frontend (such as instruction cache or TLB misses).",
> + "MetricExpr": "d_ratio((6 * cpu@de_no_dispatch_per_slot.no_ops_from_frontend\\,cmask\\=0x6@), total_dispatch_slots)",
> + "MetricGroup": "pipeline_utilization_level2;frontend_bound_level2",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "frontend_bound_bandwidth",
> + "BriefDescription": "Fraction of dispatch slots that remained unused because of a bandwidth bottleneck in the frontend (such as decode or op cache fetch bandwidth).",
> + "MetricExpr": "d_ratio(de_no_dispatch_per_slot.no_ops_from_frontend - (6 * cpu@de_no_dispatch_per_slot.no_ops_from_frontend\\,cmask\\=0x6@), total_dispatch_slots)",
> + "MetricGroup": "pipeline_utilization_level2;frontend_bound_level2",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "bad_speculation_mispredicts",
> + "BriefDescription": "Fraction of dispatched ops that were flushed due to branch mispredicts.",
> + "MetricExpr": "d_ratio(bad_speculation * ex_ret_brn_misp, ex_ret_brn_misp + resyncs_or_nc_redirects)",
> + "MetricGroup": "pipeline_utilization_level2;bad_speculation_level2",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "bad_speculation_pipeline_restarts",
> + "BriefDescription": "Fraction of dispatched ops that were flushed due to pipeline restarts (resyncs).",
> + "MetricExpr": "d_ratio(bad_speculation * resyncs_or_nc_redirects, ex_ret_brn_misp + resyncs_or_nc_redirects)",
> + "MetricGroup": "pipeline_utilization_level2;bad_speculation_level2",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "backend_bound_memory",
> + "BriefDescription": "Fraction of dispatch slots that remained unused because of stalls due to the memory subsystem.",
> + "MetricExpr": "backend_bound * d_ratio(ex_no_retire.load_not_complete, ex_no_retire.not_complete)",
> + "MetricGroup": "pipeline_utilization_level2;backend_bound_level2",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "backend_bound_cpu",
> + "BriefDescription": "Fraction of dispatch slots that remained unused because of stalls not related to the memory subsystem.",
> + "MetricExpr": "backend_bound * (1 - d_ratio(ex_no_retire.load_not_complete, ex_no_retire.not_complete))",
> + "MetricGroup": "pipeline_utilization_level2;backend_bound_level2",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "retiring_fastpath",
> + "BriefDescription": "Fraction of dispatch slots used by fastpath ops that retired.",
> + "MetricExpr": "retiring * (1 - d_ratio(ex_ret_ucode_ops, ex_ret_ops))",
> + "MetricGroup": "pipeline_utilization_level2;retiring_level2",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "retiring_microcode",
> + "BriefDescription": "Fraction of dispatch slots used by microcode ops that retired.",
> + "MetricExpr": "retiring * d_ratio(ex_ret_ucode_ops, ex_ret_ops)",
> + "MetricGroup": "pipeline_utilization_level2;retiring_level2",
> + "ScaleUnit": "100%"
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/x86/amdzen4/recommended.json b/tools/perf/pmu-events/arch/x86/amdzen4/recommended.json
> new file mode 100644
> index 000000000000..2e3c9d8942b9
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/x86/amdzen4/recommended.json
> @@ -0,0 +1,334 @@
> +[
> + {
> + "MetricName": "branch_misprediction_ratio",
> + "BriefDescription": "Execution-time branch misprediction ratio (non-speculative).",
> + "MetricExpr": "d_ratio(ex_ret_brn_misp, ex_ret_brn)",
> + "MetricGroup": "branch_prediction",
> + "ScaleUnit": "100%"
> + },
> + {
> + "EventName": "all_data_cache_accesses",
> + "EventCode": "0x29",
> + "BriefDescription": "All data cache accesses.",
> + "UMask": "0x07"
> + },
> + {
> + "MetricName": "all_l2_cache_accesses",
> + "BriefDescription": "All L2 cache accesses.",
> + "MetricExpr": "l2_request_g1.all_no_prefetch + l2_pf_hit_l2.all + l2_pf_miss_l2_hit_l3.all + l2_pf_miss_l2_l3.all",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_accesses_from_l1_ic_misses",
> + "BriefDescription": "L2 cache accesses from L1 instruction cache misses (including prefetch).",
> + "MetricExpr": "l2_request_g1.cacheable_ic_read",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_accesses_from_l1_dc_misses",
> + "BriefDescription": "L2 cache accesses from L1 data cache misses (including prefetch).",
> + "MetricExpr": "l2_request_g1.all_dc",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_accesses_from_l2_hwpf",
> + "BriefDescription": "L2 cache accesses from L2 cache hardware prefetcher.",
> + "MetricExpr": "l2_pf_hit_l2.all + l2_pf_miss_l2_hit_l3.all + l2_pf_miss_l2_l3.all",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "all_l2_cache_misses",
> + "BriefDescription": "All L2 cache misses.",
> + "MetricExpr": "l2_cache_req_stat.ic_dc_miss_in_l2 + l2_pf_miss_l2_hit_l3.all + l2_pf_miss_l2_l3.all",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_misses_from_l1_ic_miss",
> + "BriefDescription": "L2 cache misses from L1 instruction cache misses.",
> + "MetricExpr": "l2_cache_req_stat.ic_fill_miss",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_misses_from_l1_dc_miss",
> + "BriefDescription": "L2 cache misses from L1 data cache misses.",
> + "MetricExpr": "l2_cache_req_stat.ls_rd_blk_c",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_misses_from_l2_hwpf",
> + "BriefDescription": "L2 cache misses from L2 cache hardware prefetcher.",
> + "MetricExpr": "l2_pf_miss_l2_hit_l3.all + l2_pf_miss_l2_l3.all",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "all_l2_cache_hits",
> + "BriefDescription": "All L2 cache hits.",
> + "MetricExpr": "l2_cache_req_stat.ic_dc_hit_in_l2 + l2_pf_hit_l2.all",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_hits_from_l1_ic_miss",
> + "BriefDescription": "L2 cache hits from L1 instruction cache misses.",
> + "MetricExpr": "l2_cache_req_stat.ic_hit_in_l2",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_hits_from_l1_dc_miss",
> + "BriefDescription": "L2 cache hits from L1 data cache misses.",
> + "MetricExpr": "l2_cache_req_stat.dc_hit_in_l2",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_hits_from_l2_hwpf",
> + "BriefDescription": "L2 cache hits from L2 cache hardware prefetcher.",
> + "MetricExpr": "l2_pf_hit_l2.all",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l3_cache_accesses",
> + "BriefDescription": "L3 cache accesses.",
> + "MetricExpr": "l3_lookup_state.all_coherent_accesses_to_l3",
> + "MetricGroup": "l3_cache"
> + },
> + {
> + "MetricName": "l3_misses",
> + "BriefDescription": "L3 misses (including cacheline state change requests).",
> + "MetricExpr": "l3_lookup_state.l3_miss",
> + "MetricGroup": "l3_cache"
> + },
> + {
> + "MetricName": "l3_read_miss_latency",
> + "BriefDescription": "Average L3 read miss latency (in core clocks).",
> + "MetricExpr": "(l3_xi_sampled_latency.all * 10) / l3_xi_sampled_latency_requests.all",
> + "MetricGroup": "l3_cache",
> + "ScaleUnit": "1core clocks"
> + },
> + {
> + "MetricName": "op_cache_fetch_miss_ratio",
> + "BriefDescription": "Op cache miss ratio for all fetches.",
> + "MetricExpr": "d_ratio(op_cache_hit_miss.op_cache_miss, op_cache_hit_miss.all_op_cache_accesses)",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "ic_fetch_miss_ratio",
> + "BriefDescription": "Instruction cache miss ratio for all fetches. An instruction cache miss will not be counted by this metric if it is an OC hit.",
> + "MetricExpr": "d_ratio(ic_tag_hit_miss.instruction_cache_miss, ic_tag_hit_miss.all_instruction_cache_accesses)",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "l1_data_cache_fills_from_memory",
> + "BriefDescription": "L1 data cache fills from DRAM or MMIO in any NUMA node.",
> + "MetricExpr": "ls_any_fills_from_sys.dram_io_all",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_data_cache_fills_from_remote_node",
> + "BriefDescription": "L1 data cache fills from a different NUMA node.",
> + "MetricExpr": "ls_any_fills_from_sys.far_all",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_data_cache_fills_from_same_ccx",
> + "BriefDescription": "L1 data cache fills from within the same CCX.",
> + "MetricExpr": "ls_any_fills_from_sys.local_all",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_data_cache_fills_from_different_ccx",
> + "BriefDescription": "L1 data cache fills from another CCX cache in any NUMA node.",
> + "MetricExpr": "ls_any_fills_from_sys.remote_cache",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "all_l1_data_cache_fills",
> + "BriefDescription": "All L1 data cache fills.",
> + "MetricExpr": "ls_any_fills_from_sys.all",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_demand_data_cache_fills_from_local_l2",
> + "BriefDescription": "L1 demand data cache fills from local L2 cache.",
> + "MetricExpr": "ls_dmnd_fills_from_sys.local_l2",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_demand_data_cache_fills_from_same_ccx",
> + "BriefDescription": "L1 demand data cache fills from within the same CCX.",
> + "MetricExpr": "ls_dmnd_fills_from_sys.local_ccx",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_demand_data_cache_fills_from_near_cache",
> + "BriefDescription": "L1 demand data cache fills from another CCX cache in the same NUMA node.",
> + "MetricExpr": "ls_dmnd_fills_from_sys.near_cache",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_demand_data_cache_fills_from_near_memory",
> + "BriefDescription": "L1 demand data cache fills from DRAM or MMIO in the same NUMA node.",
> + "MetricExpr": "ls_dmnd_fills_from_sys.dram_io_near",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_demand_data_cache_fills_from_far_cache",
> + "BriefDescription": "L1 demand data cache fills from another CCX cache in a different NUMA node.",
> + "MetricExpr": "ls_dmnd_fills_from_sys.far_cache",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_demand_data_cache_fills_from_far_memory",
> + "BriefDescription": "L1 demand data cache fills from DRAM or MMIO in a different NUMA node.",
> + "MetricExpr": "ls_dmnd_fills_from_sys.dram_io_far",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_itlb_misses",
> + "BriefDescription": "L1 instruction TLB misses.",
> + "MetricExpr": "bp_l1_tlb_miss_l2_tlb_hit + bp_l1_tlb_miss_l2_tlb_miss.all",
> + "MetricGroup": "tlb"
> + },
> + {
> + "MetricName": "l2_itlb_misses",
> + "BriefDescription": "L2 instruction TLB misses and instruction page walks.",
> + "MetricExpr": "bp_l1_tlb_miss_l2_tlb_miss.all",
> + "MetricGroup": "tlb"
> + },
> + {
> + "MetricName": "l1_dtlb_misses",
> + "BriefDescription": "L1 data TLB misses.",
> + "MetricExpr": "ls_l1_d_tlb_miss.all",
> + "MetricGroup": "tlb"
> + },
> + {
> + "MetricName": "l2_dtlb_misses",
> + "BriefDescription": "L2 data TLB misses and data page walks.",
> + "MetricExpr": "ls_l1_d_tlb_miss.all_l2_miss",
> + "MetricGroup": "tlb"
> + },
> + {
> + "MetricName": "all_tlbs_flushed",
> + "BriefDescription": "All TLBs flushed.",
> + "MetricExpr": "ls_tlb_flush.all",
> + "MetricGroup": "tlb"
> + },
> + {
> + "MetricName": "macro_ops_dispatched",
> + "BriefDescription": "Macro-ops dispatched.",
> + "MetricExpr": "de_src_op_disp.all",
> + "MetricGroup": "decoder"
> + },
> + {
> + "MetricName": "sse_avx_stalls",
> + "BriefDescription": "Mixed SSE/AVX stalls.",
> + "MetricExpr": "fp_disp_faults.sse_avx_all"
> + },
> + {
> + "MetricName": "macro_ops_retired",
> + "BriefDescription": "Macro-ops retired.",
> + "MetricExpr": "ex_ret_ops"
> + },
> + {
> + "MetricName": "dram_read_data_bytes_for_local_processor",
> + "BriefDescription": "DRAM read data bytes for local processor.",
> + "MetricExpr": "local_processor_read_data_beats_cs0 + local_processor_read_data_beats_cs1 + local_processor_read_data_beats_cs2 + local_processor_read_data_beats_cs3 + local_processor_read_data_beats_cs4 + local_processor_read_data_beats_cs5 + local_processor_read_data_beats_cs6 + local_processor_read_data_beats_cs7 + local_processor_read_data_beats_cs8 + local_processor_read_data_beats_cs9 + local_processor_read_data_beats_cs10 + local_processor_read_data_beats_cs11",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "dram_write_data_bytes_for_local_processor",
> + "BriefDescription": "DRAM write data bytes for local processor.",
> + "MetricExpr": "local_processor_write_data_beats_cs0 + local_processor_write_data_beats_cs1 + local_processor_write_data_beats_cs2 + local_processor_write_data_beats_cs3 + local_processor_write_data_beats_cs4 + local_processor_write_data_beats_cs5 + local_processor_write_data_beats_cs6 + local_processor_write_data_beats_cs7 + local_processor_write_data_beats_cs8 + local_processor_write_data_beats_cs9 + local_processor_write_data_beats_cs10 + local_processor_write_data_beats_cs11",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "dram_read_data_bytes_for_remote_processor",
> + "BriefDescription": "DRAM read data bytes for remote processor.",
> + "MetricExpr": "remote_processor_read_data_beats_cs0 + remote_processor_read_data_beats_cs1 + remote_processor_read_data_beats_cs2 + remote_processor_read_data_beats_cs3 + remote_processor_read_data_beats_cs4 + remote_processor_read_data_beats_cs5 + remote_processor_read_data_beats_cs6 + remote_processor_read_data_beats_cs7 + remote_processor_read_data_beats_cs8 + remote_processor_read_data_beats_cs9 + remote_processor_read_data_beats_cs10 + remote_processor_read_data_beats_cs11",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "dram_write_data_bytes_for_remote_processor",
> + "BriefDescription": "DRAM write data bytes for remote processor.",
> + "MetricExpr": "remote_processor_write_data_beats_cs0 + remote_processor_write_data_beats_cs1 + remote_processor_write_data_beats_cs2 + remote_processor_write_data_beats_cs3 + remote_processor_write_data_beats_cs4 + remote_processor_write_data_beats_cs5 + remote_processor_write_data_beats_cs6 + remote_processor_write_data_beats_cs7 + remote_processor_write_data_beats_cs8 + remote_processor_write_data_beats_cs9 + remote_processor_write_data_beats_cs10 + remote_processor_write_data_beats_cs11",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "local_socket_upstream_dma_read_data_bytes",
> + "BriefDescription": "Local socket upstream DMA read data bytes.",
> + "MetricExpr": "local_socket_upstream_read_beats_iom0 + local_socket_upstream_read_beats_iom1 + local_socket_upstream_read_beats_iom2 + local_socket_upstream_read_beats_iom3",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "local_socket_upstream_dma_write_data_bytes",
> + "BriefDescription": "Local socket upstream DMA write data bytes.",
> + "MetricExpr": "local_socket_upstream_write_beats_iom0 + local_socket_upstream_write_beats_iom1 + local_socket_upstream_write_beats_iom2 + local_socket_upstream_write_beats_iom3",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "remote_socket_upstream_dma_read_data_bytes",
> + "BriefDescription": "Remote socket upstream DMA read data bytes.",
> + "MetricExpr": "remote_socket_upstream_read_beats_iom0 + remote_socket_upstream_read_beats_iom1 + remote_socket_upstream_read_beats_iom2 + remote_socket_upstream_read_beats_iom3",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "remote_socket_upstream_dma_write_data_bytes",
> + "BriefDescription": "Remote socket upstream DMA write data bytes.",
> + "MetricExpr": "remote_socket_upstream_write_beats_iom0 + remote_socket_upstream_write_beats_iom1 + remote_socket_upstream_write_beats_iom2 + remote_socket_upstream_write_beats_iom3",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "local_socket_inbound_data_bytes_to_cpu",
> + "BriefDescription": "Local socket inbound data bytes to the CPU (e.g. read data).",
> + "MetricExpr": "local_socket_inf0_inbound_data_beats_ccm0 + local_socket_inf1_inbound_data_beats_ccm0 + local_socket_inf0_inbound_data_beats_ccm1 + local_socket_inf1_inbound_data_beats_ccm1 + local_socket_inf0_inbound_data_beats_ccm2 + local_socket_inf1_inbound_data_beats_ccm2 + local_socket_inf0_inbound_data_beats_ccm3 + local_socket_inf1_inbound_data_beats_ccm3 + local_socket_inf0_inbound_data_beats_ccm4 + local_socket_inf1_inbound_data_beats_ccm4 + local_socket_inf0_inbound_data_beats_ccm5 + local_socket_inf1_inbound_data_beats_ccm5 + local_socket_inf0_inbound_data_beats_ccm6 + local_socket_inf1_inbound_data_beats_ccm6 + local_socket_inf0_inbound_data_beats_ccm7 + local_socket_inf1_inbound_data_beats_ccm7",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "3.0517578125e-5MiB"
> + },
> + {
> + "MetricName": "local_socket_outbound_data_bytes_from_cpu",
> + "BriefDescription": "Local socket outbound data bytes from the CPU (e.g. write data).",
> + "MetricExpr": "local_socket_inf0_outbound_data_beats_ccm0 + local_socket_inf1_outbound_data_beats_ccm0 + local_socket_inf0_outbound_data_beats_ccm1 + local_socket_inf1_outbound_data_beats_ccm1 + local_socket_inf0_outbound_data_beats_ccm2 + local_socket_inf1_outbound_data_beats_ccm2 + local_socket_inf0_outbound_data_beats_ccm3 + local_socket_inf1_outbound_data_beats_ccm3 + local_socket_inf0_outbound_data_beats_ccm4 + local_socket_inf1_outbound_data_beats_ccm4 + local_socket_inf0_outbound_data_beats_ccm5 + local_socket_inf1_outbound_data_beats_ccm5 + local_socket_inf0_outbound_data_beats_ccm6 + local_socket_inf1_outbound_data_beats_ccm6 + local_socket_inf0_outbound_data_beats_ccm7 + local_socket_inf1_outbound_data_beats_ccm7",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "remote_socket_inbound_data_bytes_to_cpu",
> + "BriefDescription": "Remote socket inbound data bytes to the CPU (e.g. read data).",
> + "MetricExpr": "remote_socket_inf0_inbound_data_beats_ccm0 + remote_socket_inf1_inbound_data_beats_ccm0 + remote_socket_inf0_inbound_data_beats_ccm1 + remote_socket_inf1_inbound_data_beats_ccm1 + remote_socket_inf0_inbound_data_beats_ccm2 + remote_socket_inf1_inbound_data_beats_ccm2 + remote_socket_inf0_inbound_data_beats_ccm3 + remote_socket_inf1_inbound_data_beats_ccm3 + remote_socket_inf0_inbound_data_beats_ccm4 + remote_socket_inf1_inbound_data_beats_ccm4 + remote_socket_inf0_inbound_data_beats_ccm5 + remote_socket_inf1_inbound_data_beats_ccm5 + remote_socket_inf0_inbound_data_beats_ccm6 + remote_socket_inf1_inbound_data_beats_ccm6 + remote_socket_inf0_inbound_data_beats_ccm7 + remote_socket_inf1_inbound_data_beats_ccm7",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "3.0517578125e-5MiB"
> + },
> + {
> + "MetricName": "remote_socket_outbound_data_bytes_from_cpu",
> + "BriefDescription": "Remote socket outbound data bytes from the CPU (e.g. write data).",
> + "MetricExpr": "remote_socket_inf0_outbound_data_beats_ccm0 + remote_socket_inf1_outbound_data_beats_ccm0 + remote_socket_inf0_outbound_data_beats_ccm1 + remote_socket_inf1_outbound_data_beats_ccm1 + remote_socket_inf0_outbound_data_beats_ccm2 + remote_socket_inf1_outbound_data_beats_ccm2 + remote_socket_inf0_outbound_data_beats_ccm3 + remote_socket_inf1_outbound_data_beats_ccm3 + remote_socket_inf0_outbound_data_beats_ccm4 + remote_socket_inf1_outbound_data_beats_ccm4 + remote_socket_inf0_outbound_data_beats_ccm5 + remote_socket_inf1_outbound_data_beats_ccm5 + remote_socket_inf0_outbound_data_beats_ccm6 + remote_socket_inf1_outbound_data_beats_ccm6 + remote_socket_inf0_outbound_data_beats_ccm7 + remote_socket_inf1_outbound_data_beats_ccm7",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "local_socket_outbound_data_bytes_from_all_links",
> + "BriefDescription": "Outbound data bytes from all links (local socket).",
> + "MetricExpr": "local_socket_outbound_data_beats_link0 + local_socket_outbound_data_beats_link1 + local_socket_outbound_data_beats_link2 + local_socket_outbound_data_beats_link3 + local_socket_outbound_data_beats_link4 + local_socket_outbound_data_beats_link5 + local_socket_outbound_data_beats_link6 + local_socket_outbound_data_beats_link7",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + }
> +]
> --
> 2.34.1
>
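
(Editor's aside, not part of the thread: the `l3_read_miss_latency` MetricExpr above multiplies the sampled-latency count by 10 before dividing by the request count, which implies each `l3_xi_sampled_latency` increment represents 10 core clocks — an inference from the expression itself, not something the patch states. A minimal sketch with made-up counts:)

```python
# Sketch of how the l3_read_miss_latency MetricExpr evaluates:
#   (l3_xi_sampled_latency.all * 10) / l3_xi_sampled_latency_requests.all
# The counts below are hypothetical, purely for illustration.
sampled_latency = 12_000   # hypothetical l3_xi_sampled_latency.all
requests = 1_000           # hypothetical l3_xi_sampled_latency_requests.all

# Each sampled-latency count appears to stand for 10 core clocks,
# hence the factor of 10 in the expression.
avg_l3_read_miss_latency = (sampled_latency * 10) / requests
assert avg_l3_read_miss_latency == 120.0  # core clocks
```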

2022-12-07 07:20:26

by Sandipan Das

[permalink] [raw]
Subject: Re: [PATCH 4/4] perf vendor events amd: Add Zen 4 metrics

On 12/7/2022 11:35 AM, Ian Rogers wrote:
> On Tue, Dec 6, 2022 at 9:32 PM Sandipan Das <[email protected]> wrote:
>>
>> Add metrics taken from Section 2.1.15.2 "Performance Measurement" in
>> the Processor Programming Reference (PPR) for AMD Family 19h Model 11h
>> Revision B1 processors.
>>
>> The recommended metrics are sourced from Table 27 "Guidance for Common
>> Performance Statistics with Complex Event Selects".
>>
>> The pipeline utilization metrics are sourced from Table 28 "Guidance
>> for Pipeline Utilization Analysis Statistics". These are new to Zen 4
>> processors and useful for finding performance bottlenecks by analyzing
>> activity at different stages of the pipeline. Metric groups have been
>> added for Level 1 and Level 2 analysis.
>>
>> Signed-off-by: Sandipan Das <[email protected]>
>> ---
>> .../pmu-events/arch/x86/amdzen4/pipeline.json | 98 +++++
>> .../arch/x86/amdzen4/recommended.json | 334 ++++++++++++++++++
>> 2 files changed, 432 insertions(+)
>> create mode 100644 tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json
>> create mode 100644 tools/perf/pmu-events/arch/x86/amdzen4/recommended.json
>>
<snip>
>> + {
>> + "MetricName": "dram_read_data_bytes_for_local_processor",
>
> nit: Is "bytes" redundant in the name here? It may even be confusing
> given the units.
>

Agreed. I can replace "bytes" with "mbytes" or "megabytes" for these bandwidth metrics.

- Sandipan


2022-12-07 17:54:34

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH 4/4] perf vendor events amd: Add Zen 4 metrics

On Tue, Dec 6, 2022 at 10:58 PM Sandipan Das <[email protected]> wrote:
>
> On 12/7/2022 11:35 AM, Ian Rogers wrote:
> > On Tue, Dec 6, 2022 at 9:32 PM Sandipan Das <[email protected]> wrote:
> >>
> <snip>
> >> + {
> >> + "MetricName": "dram_read_data_bytes_for_local_processor",
> >
> > nit: Is "bytes" redundant in the name here? It may even be confusing
> > given the units.
> >
>
> Agreed. I can replace "bytes" with "mbytes" or "megabytes" for these bandwidth metrics.
>
> - Sandipan

Perhaps just drop it from the name :-) So,
dram_read_data_for_local_processor, etc.

Thanks,
Ian
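
(Editor's aside, not part of the thread: the "ScaleUnit" factors in the data_fabric metrics being discussed convert raw data-beat counts into MiB. The 64-byte and 32-byte beat widths below are inferred from the factors themselves, not stated in the thread — the PPR is the authority. A small sketch:)

```python
# Working backwards from the ScaleUnit factors in the patch:
#   6.103515625e-5 MiB == 64 bytes  -> 64-byte data beats
#   3.0517578125e-5 MiB == 32 bytes -> 32-byte data beats
# (beat widths inferred from the factors; confirm against the PPR)
MIB = 1 << 20  # bytes per MiB

def beats_to_mib(beats: int, beat_bytes: int) -> float:
    """Convert a raw data-beat count into MiB."""
    return beats * beat_bytes / MIB

# The two ScaleUnit values match 64B and 32B beats exactly.
assert 64 / MIB == 6.103515625e-5
assert 32 / MIB == 3.0517578125e-5

# Example: 1,000,000 64-byte beats is ~61 MiB of DRAM traffic.
assert round(beats_to_mib(1_000_000, 64), 2) == 61.04
```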

> >> + "MetricGroup": "data_fabric",
> >> + "PerPkg": "1",
> >> + "ScaleUnit": "6.103515625e-5MiB"
> >> + },
> >> + {
> >> + "MetricName": "remote_socket_inbound_data_bytes_to_cpu",
> >> + "BriefDescription": "Remote socket inbound data bytes to the CPU (e.g. read data).",
> >> + "MetricExpr": "remote_socket_inf0_inbound_data_beats_ccm0 + remote_socket_inf1_inbound_data_beats_ccm0 + remote_socket_inf0_inbound_data_beats_ccm1 + remote_socket_inf1_inbound_data_beats_ccm1 + remote_socket_inf0_inbound_data_beats_ccm2 + remote_socket_inf1_inbound_data_beats_ccm2 + remote_socket_inf0_inbound_data_beats_ccm3 + remote_socket_inf1_inbound_data_beats_ccm3 + remote_socket_inf0_inbound_data_beats_ccm4 + remote_socket_inf1_inbound_data_beats_ccm4 + remote_socket_inf0_inbound_data_beats_ccm5 + remote_socket_inf1_inbound_data_beats_ccm5 + remote_socket_inf0_inbound_data_beats_ccm6 + remote_socket_inf1_inbound_data_beats_ccm6 + remote_socket_inf0_inbound_data_beats_ccm7 + remote_socket_inf1_inbound_data_beats_ccm7",
> >> + "MetricGroup": "data_fabric",
> >> + "PerPkg": "1",
> >> + "ScaleUnit": "3.0517578125e-5MiB"
> >> + },
> >> + {
> >> + "MetricName": "remote_socket_outbound_data_bytes_from_cpu",
> >> + "BriefDescription": "Remote socket outbound data bytes from the CPU (e.g. write data).",
> >> + "MetricExpr": "remote_socket_inf0_outbound_data_beats_ccm0 + remote_socket_inf1_outbound_data_beats_ccm0 + remote_socket_inf0_outbound_data_beats_ccm1 + remote_socket_inf1_outbound_data_beats_ccm1 + remote_socket_inf0_outbound_data_beats_ccm2 + remote_socket_inf1_outbound_data_beats_ccm2 + remote_socket_inf0_outbound_data_beats_ccm3 + remote_socket_inf1_outbound_data_beats_ccm3 + remote_socket_inf0_outbound_data_beats_ccm4 + remote_socket_inf1_outbound_data_beats_ccm4 + remote_socket_inf0_outbound_data_beats_ccm5 + remote_socket_inf1_outbound_data_beats_ccm5 + remote_socket_inf0_outbound_data_beats_ccm6 + remote_socket_inf1_outbound_data_beats_ccm6 + remote_socket_inf0_outbound_data_beats_ccm7 + remote_socket_inf1_outbound_data_beats_ccm7",
> >> + "MetricGroup": "data_fabric",
> >> + "PerPkg": "1",
> >> + "ScaleUnit": "6.103515625e-5MiB"
> >> + },
> >> + {
> >> + "MetricName": "local_socket_outbound_data_bytes_from_all_links",
> >> + "BriefDescription": "Outbound data bytes from all links (local socket).",
> >> + "MetricExpr": "local_socket_outbound_data_beats_link0 + local_socket_outbound_data_beats_link1 + local_socket_outbound_data_beats_link2 + local_socket_outbound_data_beats_link3 + local_socket_outbound_data_beats_link4 + local_socket_outbound_data_beats_link5 + local_socket_outbound_data_beats_link6 + local_socket_outbound_data_beats_link7",
> >> + "MetricGroup": "data_fabric",
> >> + "PerPkg": "1",
> >> + "ScaleUnit": "6.103515625e-5MiB"
> >> + }
> >> +]
> >> --
> >> 2.34.1
> >>
>
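As an aside on the ScaleUnit values used throughout the data_fabric metrics
above: the odd-looking constants are simply bytes-per-beat divided by 2^20,
i.e. a beats-to-MiB conversion. A quick sketch confirms the arithmetic (note
that which metrics use 32-byte versus 64-byte beats is inferred from the
constants themselves, not stated explicitly in the patch):

```python
# ScaleUnit constants in the data_fabric metrics convert data-fabric
# beats to MiB: bytes-per-beat / 2^20.
MIB = 1 << 20  # bytes per MiB

scale_64b = 64 / MIB  # metrics scaled by 6.103515625e-5 (64-byte beats)
scale_32b = 32 / MIB  # metrics scaled by 3.0517578125e-5 (32-byte beats)

assert scale_64b == 6.103515625e-5
assert scale_32b == 3.0517578125e-5

# Example: 1,000,000 64-byte read beats is about 61.04 MiB transferred
print(f"{1_000_000 * scale_64b:.2f} MiB")
```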

2022-12-07 18:00:22

by Ian Rogers

Subject: Re: [PATCH 4/4] perf vendor events amd: Add Zen 4 metrics

On Tue, Dec 6, 2022 at 9:32 PM Sandipan Das <[email protected]> wrote:
>
> Add metrics taken from Section 2.1.15.2 "Performance Measurement" in
> the Processor Programming Reference (PPR) for AMD Family 19h Model 11h
> Revision B1 processors.
>
> The recommended metrics are sourced from Table 27 "Guidance for Common
> Performance Statistics with Complex Event Selects".
>
> The pipeline utilization metrics are sourced from Table 28 "Guidance
> for Pipeline Utilization Analysis Statistics". These are new to Zen 4
> processors and useful for finding performance bottlenecks by analyzing
> activity at different stages of the pipeline. Metric groups have been
> added for Level 1 and Level 2 analysis.
>
> Signed-off-by: Sandipan Das <[email protected]>
> ---
> .../pmu-events/arch/x86/amdzen4/pipeline.json | 98 +++++
> .../arch/x86/amdzen4/recommended.json | 334 ++++++++++++++++++
> 2 files changed, 432 insertions(+)
> create mode 100644 tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json
> create mode 100644 tools/perf/pmu-events/arch/x86/amdzen4/recommended.json
>
> diff --git a/tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json b/tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json
> new file mode 100644
> index 000000000000..23d1f35d0903
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json
> @@ -0,0 +1,98 @@
> +[
> + {
> + "MetricName": "total_dispatch_slots",
> + "BriefDescription": "Total dispatch slots (up to 6 instructions can be dispatched in each cycle).",
> + "MetricExpr": "6 * ls_not_halted_cyc"
> + },
> + {
> + "MetricName": "frontend_bound",
> + "BriefDescription": "Fraction of dispatch slots that remained unused because the frontend did not supply enough instructions/ops.",
> + "MetricExpr": "d_ratio(de_no_dispatch_per_slot.no_ops_from_frontend, total_dispatch_slots)",
> + "MetricGroup": "pipeline_utilization_level1",

It might be useful here to add the metric group TopdownL1; there was a
proposal to use it with --topdown when topdown events aren't
present:
https://lore.kernel.org/linux-perf-users/[email protected]/
We also describe topdown analysis using metrics starting from this metric group:
https://perf.wiki.kernel.org/index.php/Top-Down_Analysis

> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "bad_speculation",
> + "BriefDescription": "Fraction of dispatched ops that did not retire.",
> + "MetricExpr": "d_ratio(de_src_op_disp.all - ex_ret_ops, total_dispatch_slots)",
> + "MetricGroup": "pipeline_utilization_level1",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "backend_bound",
> + "BriefDescription": "Fraction of dispatch slots that remained unused because of backend stalls.",
> + "MetricExpr": "d_ratio(de_no_dispatch_per_slot.backend_stalls, total_dispatch_slots)",
> + "MetricGroup": "pipeline_utilization_level1",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "smt_contention",
> + "BriefDescription": "Fraction of dispatch slots that remained unused because the other thread was selected.",
> + "MetricExpr": "d_ratio(de_no_dispatch_per_slot.smt_contention, total_dispatch_slots)",
> + "MetricGroup": "pipeline_utilization_level1",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "retiring",
> + "BriefDescription": "Fraction of dispatch slots used by ops that retired.",
> + "MetricExpr": "d_ratio(ex_ret_ops, total_dispatch_slots)",
> + "MetricGroup": "pipeline_utilization_level1",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "frontend_bound_latency",
> + "BriefDescription": "Fraction of dispatch slots that remained unused because of a latency bottleneck in the frontend (such as instruction cache or TLB misses).",
> + "MetricExpr": "d_ratio((6 * cpu@de_no_dispatch_per_slot.no_ops_from_frontend\\,cmask\\=0x6@), total_dispatch_slots)",
> + "MetricGroup": "pipeline_utilization_level2;frontend_bound_level2",

From:
https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
perhaps this should be in a group "frontend_bound_group" to make the
drill-down more obvious.

> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "frontend_bound_bandwidth",
> + "BriefDescription": "Fraction of dispatch slots that remained unused because of a bandwidth bottleneck in the frontend (such as decode or op cache fetch bandwidth).",
> + "MetricExpr": "d_ratio(de_no_dispatch_per_slot.no_ops_from_frontend - (6 * cpu@de_no_dispatch_per_slot.no_ops_from_frontend\\,cmask\\=0x6@), total_dispatch_slots)",
> + "MetricGroup": "pipeline_utilization_level2;frontend_bound_level2",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "bad_speculation_mispredicts",
> + "BriefDescription": "Fraction of dispatched ops that were flushed due to branch mispredicts.",
> + "MetricExpr": "d_ratio(bad_speculation * ex_ret_brn_misp, ex_ret_brn_misp + resyncs_or_nc_redirects)",
> + "MetricGroup": "pipeline_utilization_level2;bad_speculation_level2",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "bad_speculation_pipeline_restarts",
> + "BriefDescription": "Fraction of dispatched ops that were flushed due to pipeline restarts (resyncs).",
> + "MetricExpr": "d_ratio(bad_speculation * resyncs_or_nc_redirects, ex_ret_brn_misp + resyncs_or_nc_redirects)",
> + "MetricGroup": "pipeline_utilization_level2;bad_speculation_level2",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "backend_bound_memory",
> + "BriefDescription": "Fraction of dispatch slots that remained unused because of stalls due to the memory subsystem.",
> + "MetricExpr": "backend_bound * d_ratio(ex_no_retire.load_not_complete, ex_no_retire.not_complete)",
> + "MetricGroup": "pipeline_utilization_level2;backend_bound_level2",

Similarly there could be a "backend_bound_group", etc.

Thanks,
Ian

> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "backend_bound_cpu",
> + "BriefDescription": "Fraction of dispatch slots that remained unused because of stalls not related to the memory subsystem.",
> + "MetricExpr": "backend_bound * (1 - d_ratio(ex_no_retire.load_not_complete, ex_no_retire.not_complete))",
> + "MetricGroup": "pipeline_utilization_level2;backend_bound_level2",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "retiring_fastpath",
> + "BriefDescription": "Fraction of dispatch slots used by fastpath ops that retired.",
> + "MetricExpr": "retiring * (1 - d_ratio(ex_ret_ucode_ops, ex_ret_ops))",
> + "MetricGroup": "pipeline_utilization_level2;retiring_level2",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "retiring_microcode",
> + "BriefDescription": "Fraction of dispatch slots used by microcode ops that retired.",
> + "MetricExpr": "retiring * d_ratio(ex_ret_ucode_ops, ex_ret_ops)",
> + "MetricGroup": "pipeline_utilization_level2;retiring_level2",
> + "ScaleUnit": "100%"
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/x86/amdzen4/recommended.json b/tools/perf/pmu-events/arch/x86/amdzen4/recommended.json
> new file mode 100644
> index 000000000000..2e3c9d8942b9
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/x86/amdzen4/recommended.json
> @@ -0,0 +1,334 @@
> +[
> + {
> + "MetricName": "branch_misprediction_ratio",
> + "BriefDescription": "Execution-time branch misprediction ratio (non-speculative).",
> + "MetricExpr": "d_ratio(ex_ret_brn_misp, ex_ret_brn)",
> + "MetricGroup": "branch_prediction",
> + "ScaleUnit": "100%"
> + },
> + {
> + "EventName": "all_data_cache_accesses",
> + "EventCode": "0x29",
> + "BriefDescription": "All data cache accesses.",
> + "UMask": "0x07"
> + },
> + {
> + "MetricName": "all_l2_cache_accesses",
> + "BriefDescription": "All L2 cache accesses.",
> + "MetricExpr": "l2_request_g1.all_no_prefetch + l2_pf_hit_l2.all + l2_pf_miss_l2_hit_l3.all + l2_pf_miss_l2_l3.all",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_accesses_from_l1_ic_misses",
> + "BriefDescription": "L2 cache accesses from L1 instruction cache misses (including prefetch).",
> + "MetricExpr": "l2_request_g1.cacheable_ic_read",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_accesses_from_l1_dc_misses",
> + "BriefDescription": "L2 cache accesses from L1 data cache misses (including prefetch).",
> + "MetricExpr": "l2_request_g1.all_dc",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_accesses_from_l2_hwpf",
> + "BriefDescription": "L2 cache accesses from L2 cache hardware prefetcher.",
> + "MetricExpr": "l2_pf_hit_l2.all + l2_pf_miss_l2_hit_l3.all + l2_pf_miss_l2_l3.all",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "all_l2_cache_misses",
> + "BriefDescription": "All L2 cache misses.",
> + "MetricExpr": "l2_cache_req_stat.ic_dc_miss_in_l2 + l2_pf_miss_l2_hit_l3.all + l2_pf_miss_l2_l3.all",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_misses_from_l1_ic_miss",
> + "BriefDescription": "L2 cache misses from L1 instruction cache misses.",
> + "MetricExpr": "l2_cache_req_stat.ic_fill_miss",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_misses_from_l1_dc_miss",
> + "BriefDescription": "L2 cache misses from L1 data cache misses.",
> + "MetricExpr": "l2_cache_req_stat.ls_rd_blk_c",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_misses_from_l2_hwpf",
> + "BriefDescription": "L2 cache misses from L2 cache hardware prefetcher.",
> + "MetricExpr": "l2_pf_miss_l2_hit_l3.all + l2_pf_miss_l2_l3.all",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "all_l2_cache_hits",
> + "BriefDescription": "All L2 cache hits.",
> + "MetricExpr": "l2_cache_req_stat.ic_dc_hit_in_l2 + l2_pf_hit_l2.all",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_hits_from_l1_ic_miss",
> + "BriefDescription": "L2 cache hits from L1 instruction cache misses.",
> + "MetricExpr": "l2_cache_req_stat.ic_hit_in_l2",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_hits_from_l1_dc_miss",
> + "BriefDescription": "L2 cache hits from L1 data cache misses.",
> + "MetricExpr": "l2_cache_req_stat.dc_hit_in_l2",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l2_cache_hits_from_l2_hwpf",
> + "BriefDescription": "L2 cache hits from L2 cache hardware prefetcher.",
> + "MetricExpr": "l2_pf_hit_l2.all",
> + "MetricGroup": "l2_cache"
> + },
> + {
> + "MetricName": "l3_cache_accesses",
> + "BriefDescription": "L3 cache accesses.",
> + "MetricExpr": "l3_lookup_state.all_coherent_accesses_to_l3",
> + "MetricGroup": "l3_cache"
> + },
> + {
> + "MetricName": "l3_misses",
> + "BriefDescription": "L3 misses (including cacheline state change requests).",
> + "MetricExpr": "l3_lookup_state.l3_miss",
> + "MetricGroup": "l3_cache"
> + },
> + {
> + "MetricName": "l3_read_miss_latency",
> + "BriefDescription": "Average L3 read miss latency (in core clocks).",
> + "MetricExpr": "(l3_xi_sampled_latency.all * 10) / l3_xi_sampled_latency_requests.all",
> + "MetricGroup": "l3_cache",
> + "ScaleUnit": "1core clocks"
> + },
> + {
> + "MetricName": "op_cache_fetch_miss_ratio",
> + "BriefDescription": "Op cache miss ratio for all fetches.",
> + "MetricExpr": "d_ratio(op_cache_hit_miss.op_cache_miss, op_cache_hit_miss.all_op_cache_accesses)",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "ic_fetch_miss_ratio",
> + "BriefDescription": "Instruction cache miss ratio for all fetches. An instruction cache miss will not be counted by this metric if it is an OC hit.",
> + "MetricExpr": "d_ratio(ic_tag_hit_miss.instruction_cache_miss, ic_tag_hit_miss.all_instruction_cache_accesses)",
> + "ScaleUnit": "100%"
> + },
> + {
> + "MetricName": "l1_data_cache_fills_from_memory",
> + "BriefDescription": "L1 data cache fills from DRAM or MMIO in any NUMA node.",
> + "MetricExpr": "ls_any_fills_from_sys.dram_io_all",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_data_cache_fills_from_remote_node",
> + "BriefDescription": "L1 data cache fills from a different NUMA node.",
> + "MetricExpr": "ls_any_fills_from_sys.far_all",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_data_cache_fills_from_same_ccx",
> + "BriefDescription": "L1 data cache fills from within the same CCX.",
> + "MetricExpr": "ls_any_fills_from_sys.local_all",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_data_cache_fills_from_different_ccx",
> + "BriefDescription": "L1 data cache fills from another CCX cache in any NUMA node.",
> + "MetricExpr": "ls_any_fills_from_sys.remote_cache",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "all_l1_data_cache_fills",
> + "BriefDescription": "All L1 data cache fills.",
> + "MetricExpr": "ls_any_fills_from_sys.all",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_demand_data_cache_fills_from_local_l2",
> + "BriefDescription": "L1 demand data cache fills from local L2 cache.",
> + "MetricExpr": "ls_dmnd_fills_from_sys.local_l2",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_demand_data_cache_fills_from_same_ccx",
> + "BriefDescription": "L1 demand data cache fills from within the same CCX.",
> + "MetricExpr": "ls_dmnd_fills_from_sys.local_ccx",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_demand_data_cache_fills_from_near_cache",
> + "BriefDescription": "L1 demand data cache fills from another CCX cache in the same NUMA node.",
> + "MetricExpr": "ls_dmnd_fills_from_sys.near_cache",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_demand_data_cache_fills_from_near_memory",
> + "BriefDescription": "L1 demand data cache fills from DRAM or MMIO in the same NUMA node.",
> + "MetricExpr": "ls_dmnd_fills_from_sys.dram_io_near",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_demand_data_cache_fills_from_far_cache",
> + "BriefDescription": "L1 demand data cache fills from another CCX cache in a different NUMA node.",
> + "MetricExpr": "ls_dmnd_fills_from_sys.far_cache",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_demand_data_cache_fills_from_far_memory",
> + "BriefDescription": "L1 demand data cache fills from DRAM or MMIO in a different NUMA node.",
> + "MetricExpr": "ls_dmnd_fills_from_sys.dram_io_far",
> + "MetricGroup": "l1_dcache"
> + },
> + {
> + "MetricName": "l1_itlb_misses",
> + "BriefDescription": "L1 instruction TLB misses.",
> + "MetricExpr": "bp_l1_tlb_miss_l2_tlb_hit + bp_l1_tlb_miss_l2_tlb_miss.all",
> + "MetricGroup": "tlb"
> + },
> + {
> + "MetricName": "l2_itlb_misses",
> + "BriefDescription": "L2 instruction TLB misses and instruction page walks.",
> + "MetricExpr": "bp_l1_tlb_miss_l2_tlb_miss.all",
> + "MetricGroup": "tlb"
> + },
> + {
> + "MetricName": "l1_dtlb_misses",
> + "BriefDescription": "L1 data TLB misses.",
> + "MetricExpr": "ls_l1_d_tlb_miss.all",
> + "MetricGroup": "tlb"
> + },
> + {
> + "MetricName": "l2_dtlb_misses",
> + "BriefDescription": "L2 data TLB misses and data page walks.",
> + "MetricExpr": "ls_l1_d_tlb_miss.all_l2_miss",
> + "MetricGroup": "tlb"
> + },
> + {
> + "MetricName": "all_tlbs_flushed",
> + "BriefDescription": "All TLBs flushed.",
> + "MetricExpr": "ls_tlb_flush.all",
> + "MetricGroup": "tlb"
> + },
> + {
> + "MetricName": "macro_ops_dispatched",
> + "BriefDescription": "Macro-ops dispatched.",
> + "MetricExpr": "de_src_op_disp.all",
> + "MetricGroup": "decoder"
> + },
> + {
> + "MetricName": "sse_avx_stalls",
> + "BriefDescription": "Mixed SSE/AVX stalls.",
> + "MetricExpr": "fp_disp_faults.sse_avx_all"
> + },
> + {
> + "MetricName": "macro_ops_retired",
> + "BriefDescription": "Macro-ops retired.",
> + "MetricExpr": "ex_ret_ops"
> + },
> + {
> + "MetricName": "dram_read_data_bytes_for_local_processor",
> + "BriefDescription": "DRAM read data bytes for local processor.",
> + "MetricExpr": "local_processor_read_data_beats_cs0 + local_processor_read_data_beats_cs1 + local_processor_read_data_beats_cs2 + local_processor_read_data_beats_cs3 + local_processor_read_data_beats_cs4 + local_processor_read_data_beats_cs5 + local_processor_read_data_beats_cs6 + local_processor_read_data_beats_cs7 + local_processor_read_data_beats_cs8 + local_processor_read_data_beats_cs9 + local_processor_read_data_beats_cs10 + local_processor_read_data_beats_cs11",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "dram_write_data_bytes_for_local_processor",
> + "BriefDescription": "DRAM write data bytes for local processor.",
> + "MetricExpr": "local_processor_write_data_beats_cs0 + local_processor_write_data_beats_cs1 + local_processor_write_data_beats_cs2 + local_processor_write_data_beats_cs3 + local_processor_write_data_beats_cs4 + local_processor_write_data_beats_cs5 + local_processor_write_data_beats_cs6 + local_processor_write_data_beats_cs7 + local_processor_write_data_beats_cs8 + local_processor_write_data_beats_cs9 + local_processor_write_data_beats_cs10 + local_processor_write_data_beats_cs11",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "dram_read_data_bytes_for_remote_processor",
> + "BriefDescription": "DRAM read data bytes for remote processor.",
> + "MetricExpr": "remote_processor_read_data_beats_cs0 + remote_processor_read_data_beats_cs1 + remote_processor_read_data_beats_cs2 + remote_processor_read_data_beats_cs3 + remote_processor_read_data_beats_cs4 + remote_processor_read_data_beats_cs5 + remote_processor_read_data_beats_cs6 + remote_processor_read_data_beats_cs7 + remote_processor_read_data_beats_cs8 + remote_processor_read_data_beats_cs9 + remote_processor_read_data_beats_cs10 + remote_processor_read_data_beats_cs11",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "dram_write_data_bytes_for_remote_processor",
> + "BriefDescription": "DRAM write data bytes for remote processor.",
> + "MetricExpr": "remote_processor_write_data_beats_cs0 + remote_processor_write_data_beats_cs1 + remote_processor_write_data_beats_cs2 + remote_processor_write_data_beats_cs3 + remote_processor_write_data_beats_cs4 + remote_processor_write_data_beats_cs5 + remote_processor_write_data_beats_cs6 + remote_processor_write_data_beats_cs7 + remote_processor_write_data_beats_cs8 + remote_processor_write_data_beats_cs9 + remote_processor_write_data_beats_cs10 + remote_processor_write_data_beats_cs11",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "local_socket_upstream_dma_read_data_bytes",
> + "BriefDescription": "Local socket upstream DMA read data bytes.",
> + "MetricExpr": "local_socket_upstream_read_beats_iom0 + local_socket_upstream_read_beats_iom1 + local_socket_upstream_read_beats_iom2 + local_socket_upstream_read_beats_iom3",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "local_socket_upstream_dma_write_data_bytes",
> + "BriefDescription": "Local socket upstream DMA write data bytes.",
> + "MetricExpr": "local_socket_upstream_write_beats_iom0 + local_socket_upstream_write_beats_iom1 + local_socket_upstream_write_beats_iom2 + local_socket_upstream_write_beats_iom3",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "remote_socket_upstream_dma_read_data_bytes",
> + "BriefDescription": "Remote socket upstream DMA read data bytes.",
> + "MetricExpr": "remote_socket_upstream_read_beats_iom0 + remote_socket_upstream_read_beats_iom1 + remote_socket_upstream_read_beats_iom2 + remote_socket_upstream_read_beats_iom3",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "remote_socket_upstream_dma_write_data_bytes",
> + "BriefDescription": "Remote socket upstream DMA write data bytes.",
> + "MetricExpr": "remote_socket_upstream_write_beats_iom0 + remote_socket_upstream_write_beats_iom1 + remote_socket_upstream_write_beats_iom2 + remote_socket_upstream_write_beats_iom3",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "local_socket_inbound_data_bytes_to_cpu",
> + "BriefDescription": "Local socket inbound data bytes to the CPU (e.g. read data).",
> + "MetricExpr": "local_socket_inf0_inbound_data_beats_ccm0 + local_socket_inf1_inbound_data_beats_ccm0 + local_socket_inf0_inbound_data_beats_ccm1 + local_socket_inf1_inbound_data_beats_ccm1 + local_socket_inf0_inbound_data_beats_ccm2 + local_socket_inf1_inbound_data_beats_ccm2 + local_socket_inf0_inbound_data_beats_ccm3 + local_socket_inf1_inbound_data_beats_ccm3 + local_socket_inf0_inbound_data_beats_ccm4 + local_socket_inf1_inbound_data_beats_ccm4 + local_socket_inf0_inbound_data_beats_ccm5 + local_socket_inf1_inbound_data_beats_ccm5 + local_socket_inf0_inbound_data_beats_ccm6 + local_socket_inf1_inbound_data_beats_ccm6 + local_socket_inf0_inbound_data_beats_ccm7 + local_socket_inf1_inbound_data_beats_ccm7",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "3.0517578125e-5MiB"
> + },
> + {
> + "MetricName": "local_socket_outbound_data_bytes_from_cpu",
> + "BriefDescription": "Local socket outbound data bytes from the CPU (e.g. write data).",
> + "MetricExpr": "local_socket_inf0_outbound_data_beats_ccm0 + local_socket_inf1_outbound_data_beats_ccm0 + local_socket_inf0_outbound_data_beats_ccm1 + local_socket_inf1_outbound_data_beats_ccm1 + local_socket_inf0_outbound_data_beats_ccm2 + local_socket_inf1_outbound_data_beats_ccm2 + local_socket_inf0_outbound_data_beats_ccm3 + local_socket_inf1_outbound_data_beats_ccm3 + local_socket_inf0_outbound_data_beats_ccm4 + local_socket_inf1_outbound_data_beats_ccm4 + local_socket_inf0_outbound_data_beats_ccm5 + local_socket_inf1_outbound_data_beats_ccm5 + local_socket_inf0_outbound_data_beats_ccm6 + local_socket_inf1_outbound_data_beats_ccm6 + local_socket_inf0_outbound_data_beats_ccm7 + local_socket_inf1_outbound_data_beats_ccm7",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "remote_socket_inbound_data_bytes_to_cpu",
> + "BriefDescription": "Remote socket inbound data bytes to the CPU (e.g. read data).",
> + "MetricExpr": "remote_socket_inf0_inbound_data_beats_ccm0 + remote_socket_inf1_inbound_data_beats_ccm0 + remote_socket_inf0_inbound_data_beats_ccm1 + remote_socket_inf1_inbound_data_beats_ccm1 + remote_socket_inf0_inbound_data_beats_ccm2 + remote_socket_inf1_inbound_data_beats_ccm2 + remote_socket_inf0_inbound_data_beats_ccm3 + remote_socket_inf1_inbound_data_beats_ccm3 + remote_socket_inf0_inbound_data_beats_ccm4 + remote_socket_inf1_inbound_data_beats_ccm4 + remote_socket_inf0_inbound_data_beats_ccm5 + remote_socket_inf1_inbound_data_beats_ccm5 + remote_socket_inf0_inbound_data_beats_ccm6 + remote_socket_inf1_inbound_data_beats_ccm6 + remote_socket_inf0_inbound_data_beats_ccm7 + remote_socket_inf1_inbound_data_beats_ccm7",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "3.0517578125e-5MiB"
> + },
> + {
> + "MetricName": "remote_socket_outbound_data_bytes_from_cpu",
> + "BriefDescription": "Remote socket outbound data bytes from the CPU (e.g. write data).",
> + "MetricExpr": "remote_socket_inf0_outbound_data_beats_ccm0 + remote_socket_inf1_outbound_data_beats_ccm0 + remote_socket_inf0_outbound_data_beats_ccm1 + remote_socket_inf1_outbound_data_beats_ccm1 + remote_socket_inf0_outbound_data_beats_ccm2 + remote_socket_inf1_outbound_data_beats_ccm2 + remote_socket_inf0_outbound_data_beats_ccm3 + remote_socket_inf1_outbound_data_beats_ccm3 + remote_socket_inf0_outbound_data_beats_ccm4 + remote_socket_inf1_outbound_data_beats_ccm4 + remote_socket_inf0_outbound_data_beats_ccm5 + remote_socket_inf1_outbound_data_beats_ccm5 + remote_socket_inf0_outbound_data_beats_ccm6 + remote_socket_inf1_outbound_data_beats_ccm6 + remote_socket_inf0_outbound_data_beats_ccm7 + remote_socket_inf1_outbound_data_beats_ccm7",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + },
> + {
> + "MetricName": "local_socket_outbound_data_bytes_from_all_links",
> + "BriefDescription": "Outbound data bytes from all links (local socket).",
> + "MetricExpr": "local_socket_outbound_data_beats_link0 + local_socket_outbound_data_beats_link1 + local_socket_outbound_data_beats_link2 + local_socket_outbound_data_beats_link3 + local_socket_outbound_data_beats_link4 + local_socket_outbound_data_beats_link5 + local_socket_outbound_data_beats_link6 + local_socket_outbound_data_beats_link7",
> + "MetricGroup": "data_fabric",
> + "PerPkg": "1",
> + "ScaleUnit": "6.103515625e-5MiB"
> + }
> +]
> --
> 2.34.1
>

2022-12-13 10:20:53

by Sandipan Das

Subject: Re: [PATCH 4/4] perf vendor events amd: Add Zen 4 metrics

On 12/7/2022 11:14 PM, Ian Rogers wrote:
> On Tue, Dec 6, 2022 at 9:32 PM Sandipan Das <[email protected]> wrote:
>>
>> Add metrics taken from Section 2.1.15.2 "Performance Measurement" in
>> the Processor Programming Reference (PPR) for AMD Family 19h Model 11h
>> Revision B1 processors.
>>
>> The recommended metrics are sourced from Table 27 "Guidance for Common
>> Performance Statistics with Complex Event Selects".
>>
>> The pipeline utilization metrics are sourced from Table 28 "Guidance
>> for Pipeline Utilization Analysis Statistics". These are new to Zen 4
>> processors and useful for finding performance bottlenecks by analyzing
>> activity at different stages of the pipeline. Metric groups have been
>> added for Level 1 and Level 2 analysis.
>>
>> Signed-off-by: Sandipan Das <[email protected]>
>> ---
>> .../pmu-events/arch/x86/amdzen4/pipeline.json | 98 +++++
>> .../arch/x86/amdzen4/recommended.json | 334 ++++++++++++++++++
>> 2 files changed, 432 insertions(+)
>> create mode 100644 tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json
>> create mode 100644 tools/perf/pmu-events/arch/x86/amdzen4/recommended.json
>>
>> diff --git a/tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json b/tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json
>> new file mode 100644
>> index 000000000000..23d1f35d0903
>> --- /dev/null
>> +++ b/tools/perf/pmu-events/arch/x86/amdzen4/pipeline.json
>> @@ -0,0 +1,98 @@
>> +[
>> + {
>> + "MetricName": "total_dispatch_slots",
>> + "BriefDescription": "Total dispatch slots (up to 6 instructions can be dispatched in each cycle).",
>> + "MetricExpr": "6 * ls_not_halted_cyc"
>> + },
>> + {
>> + "MetricName": "frontend_bound",
>> + "BriefDescription": "Fraction of dispatch slots that remained unused because the frontend did not supply enough instructions/ops.",
>> + "MetricExpr": "d_ratio(de_no_dispatch_per_slot.no_ops_from_frontend, total_dispatch_slots)",
>> + "MetricGroup": "pipeline_utilization_level1",
>
> It might be useful here to add the metric group TopdownL1, there was a
> proposal to use this with --topdown when topdown events aren't
> present:
> https://lore.kernel.org/linux-perf-users/[email protected]/
> We also describe topdown analysis using metrics starting from this metric group:
> https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
>

Thanks for the suggestion.

After looking at Section 3 "Top-Down Analysis" from the original paper [1]
on top-down analysis, my understanding is that a specific categorization of
metrics is expected at each level. E.g. for Level 1 analysis, the metrics
are "Retiring", "Bad Speculation", "Frontend Bound" and "Backend Bound".
For Zen 4, there is an additional Level 1 metric, "SMT Contention". The
pipeline utilization data therefore overlaps with the top-down categories
but also diverges from them, so the classification may not strictly adhere
to the established notion of top-down.

[1] "A Top-Down method for performance analysis and counters architecture"
https://ieeexplore.ieee.org/document/6844459
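For reference, the Level 1 categorization discussed here can be sketched directly from the MetricExpr fields quoted in the patch. The sketch below restates those expressions in Python; the event counts passed in are hypothetical inputs, chosen only so the five fractions account for every dispatch slot:

```python
def level1_metrics(no_ops_from_frontend, backend_stalls, smt_contention,
                   de_src_op_disp_all, ex_ret_ops, ls_not_halted_cyc):
    """Level 1 pipeline utilization fractions, following the MetricExpr
    fields quoted in the patch (d_ratio(a, b) is simply a / b)."""
    slots = 6 * ls_not_halted_cyc  # total_dispatch_slots
    return {
        "frontend_bound": no_ops_from_frontend / slots,
        "bad_speculation": (de_src_op_disp_all - ex_ret_ops) / slots,
        "backend_bound": backend_stalls / slots,
        "smt_contention": smt_contention / slots,
        "retiring": ex_ret_ops / slots,
    }

# Hypothetical counts: 100 unhalted cycles = 600 slots, of which 120 were
# empty due to the frontend, 180 due to backend stalls, 60 lost to the
# sibling thread, and 240 ops dispatched (210 of them retired).
m = level1_metrics(no_ops_from_frontend=120, backend_stalls=180,
                   smt_contention=60, de_src_op_disp_all=240,
                   ex_ret_ops=210, ls_not_halted_cyc=100)
print(round(sum(m.values()), 6))  # → 1.0
```

With consistent counts the five fractions sum to one, which is what lets the Zen 4 scheme (with "SMT Contention" as a fifth top-level bucket) still partition all dispatch slots even though it differs from the four-way split in the original top-down paper.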

>> + "ScaleUnit": "100%"
>> + },
>> + {
>> + "MetricName": "bad_speculation",
>> + "BriefDescription": "Fraction of dispatched ops that did not retire.",
>> + "MetricExpr": "d_ratio(de_src_op_disp.all - ex_ret_ops, total_dispatch_slots)",
>> + "MetricGroup": "pipeline_utilization_level1",
>> + "ScaleUnit": "100%"
>> + },
>> + {
>> + "MetricName": "backend_bound",
>> + "BriefDescription": "Fraction of dispatch slots that remained unused because of backend stalls.",
>> + "MetricExpr": "d_ratio(de_no_dispatch_per_slot.backend_stalls, total_dispatch_slots)",
>> + "MetricGroup": "pipeline_utilization_level1",
>> + "ScaleUnit": "100%"
>> + },
>> + {
>> + "MetricName": "smt_contention",
>> + "BriefDescription": "Fraction of dispatch slots that remained unused because the other thread was selected.",
>> + "MetricExpr": "d_ratio(de_no_dispatch_per_slot.smt_contention, total_dispatch_slots)",
>> + "MetricGroup": "pipeline_utilization_level1",
>> + "ScaleUnit": "100%"
>> + },
>> + {
>> + "MetricName": "retiring",
>> + "BriefDescription": "Fraction of dispatch slots used by ops that retired.",
>> + "MetricExpr": "d_ratio(ex_ret_ops, total_dispatch_slots)",
>> + "MetricGroup": "pipeline_utilization_level1",
>> + "ScaleUnit": "100%"
>> + },
>> + {
>> + "MetricName": "frontend_bound_latency",
>> + "BriefDescription": "Fraction of dispatch slots that remained unused because of a latency bottleneck in the frontend (such as instruction cache or TLB misses).",
>> + "MetricExpr": "d_ratio((6 * cpu@de_no_dispatch_per_slot.no_ops_from_frontend\\,cmask\\=0x6@), total_dispatch_slots)",
>> + "MetricGroup": "pipeline_utilization_level2;frontend_bound_level2",
>
> From:
> https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
> perhaps this should be in a group "frontend_bound_group", to make the
> drill down more obvious.
>

Agreed. I'll try to keep the group naming familiar wherever possible.

>> + "ScaleUnit": "100%"
>> + },
>> + {
>> + "MetricName": "frontend_bound_bandwidth",
>> + "BriefDescription": "Fraction of dispatch slots that remained unused because of a bandwidth bottleneck in the frontend (such as decode or op cache fetch bandwidth).",
>> + "MetricExpr": "d_ratio(de_no_dispatch_per_slot.no_ops_from_frontend - (6 * cpu@de_no_dispatch_per_slot.no_ops_from_frontend\\,cmask\\=0x6@), total_dispatch_slots)",
>> + "MetricGroup": "pipeline_utilization_level2;frontend_bound_level2",
>> + "ScaleUnit": "100%"
>> + },
>> + {
>> + "MetricName": "bad_speculation_mispredicts",
>> + "BriefDescription": "Fraction of dispatched ops that were flushed due to branch mispredicts.",
>> + "MetricExpr": "d_ratio(bad_speculation * ex_ret_brn_misp, ex_ret_brn_misp + resyncs_or_nc_redirects)",
>> + "MetricGroup": "pipeline_utilization_level2;bad_speculation_level2",
>> + "ScaleUnit": "100%"
>> + },
>> + {
>> + "MetricName": "bad_speculation_pipeline_restarts",
>> + "BriefDescription": "Fraction of dispatched ops that were flushed due to pipeline restarts (resyncs).",
>> + "MetricExpr": "d_ratio(bad_speculation * resyncs_or_nc_redirects, ex_ret_brn_misp + resyncs_or_nc_redirects)",
>> + "MetricGroup": "pipeline_utilization_level2;bad_speculation_level2",
>> + "ScaleUnit": "100%"
>> + },
>> + {
>> + "MetricName": "backend_bound_memory",
>> + "BriefDescription": "Fraction of dispatch slots that remained unused because of stalls due to the memory subsystem.",
>> + "MetricExpr": "backend_bound * d_ratio(ex_no_retire.load_not_complete, ex_no_retire.not_complete)",
>> + "MetricGroup": "pipeline_utilization_level2;backend_bound_level2",
>
> Similarly there could be a "backend_bound_group", etc.
>

Agreed.

- Sandipan
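As an aside, the latency/bandwidth split of frontend_bound quoted earlier in the thread rests on the cmask trick: with cmask=6, the event counts cycles in which the frontend delivered nothing to any of the 6 dispatch slots (a full stall, attributed to latency); the remaining empty slots are attributed to insufficient bandwidth. A hedged sketch of that accounting, with a made-up per-cycle trace as input:

```python
def frontend_bound_split(empty_slots_per_cycle):
    """Split frontend-bound slots into latency vs bandwidth fractions,
    mirroring the cmask=6 expressions quoted in the patch. The input is a
    hypothetical list giving, per cycle, how many of the 6 dispatch slots
    the frontend left empty."""
    total_slots = 6 * len(empty_slots_per_cycle)
    empty = sum(empty_slots_per_cycle)          # no_ops_from_frontend
    # cmask=6: cycles where all 6 slots went unfilled by the frontend.
    full_stall_cycles = sum(1 for n in empty_slots_per_cycle if n == 6)
    latency = 6 * full_stall_cycles / total_slots
    bandwidth = (empty - 6 * full_stall_cycles) / total_slots
    return latency, bandwidth

# Two fully stalled cycles, one partially fed cycle, two mostly full ones.
lat, bw = frontend_bound_split([6, 6, 3, 0, 1])
```

For this trace the latency share is 12/30 and the bandwidth share 4/30; the two always sum to the overall frontend_bound fraction, which is why the Level 2 metrics drill down cleanly from Level 1.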