LinuxLists.cc - [PATCH v4 0/6] Add metrics for neoverse-n2

2022-12-23 13:22:46

Subject: [PATCH v4 0/6] Add metrics for neoverse-n2

Changes since v3:
- Add ipc_rate metric;
- Drop the PublicDescription;
- Describe PEutilization metrics in more detail;
- Link: https://lore.kernel.org/all/[email protected]/

Changes since v2:
- Correct the formula of Branch metrics;
- Add more PE utilization metrics;
- Add more TLB metrics;
- Add “ScaleUnit” for some metrics;
- Add a newline at the end of the file;
- Link: https://lore.kernel.org/all/[email protected]/

Changes since v1:
- Corrected formula for topdown L1 due to wrong counts for stall_slot and
stall_slot_frontend;
- Link: https://lore.kernel.org/all/[email protected]/

This series adds six metric groups for neoverse-n2. The topdown L1 formulas
are taken from the ARM sbsa7.0 platform design document [0], D37-38.

However, neoverse-n2 miscounts stall_slot and stall_slot_frontend: the real
stall_slot and stall_slot_frontend values are obtained by subtracting
cpu_cycles from the raw counts, so the topdown expressions are corrected
accordingly. See the ARM neoverse-n2 errata notice [1], D117.

Since neoverse-n2 does not yet support topdown L2, the Cache, TLB, Branch,
InstructionsMix, and PEutilization metric groups are added to help analyze
performance bottlenecks further. See the ARM PMU guides [2][3].

[0] https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
[1] https://documentation-service.arm.com/static/636a66a64e6cf12278ad89cb?token=
[2] https://documentation-service.arm.com/static/628f8fa3dfaf015c2b76eae8?token=
[3] https://documentation-service.arm.com/static/62cfe21e31ea212bb6627393?token=

$ ./perf list
...
Metric Groups:

Branch:
branch_miss_pred_rate
[The rate of branches mis-predited to the overall branches]
branch_mpki
[The rate of branches mis-predicted per kilo instructions]
branch_pki
[The rate of branches retired per kilo instructions]
Cache:
l1d_cache_miss_rate
[The rate of L1 D-Cache misses to the overall L1 D-Cache]
l1d_cache_mpki
[The rate of L1 D-Cache misses per kilo instructions]
...

$ sudo ./perf stat -M TLB false_sharing 2

Performance counter stats for 'false_sharing 2':

31,561 L2D_TLB # 18.8 % l2_tlb_miss_rate (43.23%)
5,944 L2D_TLB_REFILL (43.23%)
2,248 L1I_TLB_REFILL # 0.1 % l1i_tlb_miss_rate (43.85%)
2,203,195 L1I_TLB (43.85%)
328,647,380 L1D_TLB # 0.0 % l1d_tlb_miss_rate (44.32%)
26,347 L1D_TLB_REFILL (44.32%)
747,319 L1I_TLB # 0.0 % itlb_walk_rate (43.74%)
310 ITLB_WALK (43.74%)
839,420,454 INST_RETIRED # 0.00 itlb_mpki (42.77%)
212 ITLB_WALK (42.77%)
468 DTLB_WALK # 0.0 % dtlb_walk_rate (42.28%)
265,405,802 L1D_TLB (42.28%)
790,874,367 INST_RETIRED # 0.00 dtlb_mpki (42.33%)
23 DTLB_WALK (42.33%)

0.515904553 seconds time elapsed

1.410313000 seconds user
0.000000000 seconds sys
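
The percentages in the TLB output above can be checked by hand. The sketch
below is only an illustration: it assumes each "rate" metric is the refill (or
walk) count divided by the matching access count, and each "mpki" metric is the
event count per 1000 retired instructions. These formulas are inferred from how
the counters are paired in the output, not quoted from metrics.json, and the
helper names are hypothetical.

```python
# Raw counts copied from the `perf stat -M TLB` output above.
# The formulas are assumptions inferred from the counter pairing.

def ratio_percent(misses, accesses):
    """Miss (or walk) rate as a percentage of accesses."""
    return 100.0 * misses / accesses


def per_kilo_instructions(events, instructions):
    """Events per 1000 retired instructions."""
    return 1000.0 * events / instructions


l2_tlb_miss_rate = ratio_percent(5_944, 31_561)       # L2D_TLB_REFILL / L2D_TLB
l1i_tlb_miss_rate = ratio_percent(2_248, 2_203_195)   # L1I_TLB_REFILL / L1I_TLB
itlb_mpki = per_kilo_instructions(212, 839_420_454)   # ITLB_WALK / INST_RETIRED

print(f"l2_tlb_miss_rate  {l2_tlb_miss_rate:.1f} %")
print(f"l1i_tlb_miss_rate {l1i_tlb_miss_rate:.1f} %")
print(f"itlb_mpki         {itlb_mpki:.2f}")
```

Under these assumptions the three values round to 18.8 %, 0.1 % and 0.00,
matching the perf output. The same event (for example L1I_TLB) shows different
counts on different lines because each metric's counters were multiplexed
separately, as the percentages in parentheses indicate.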

$ sudo ./perf stat -M TopDownL1 false_sharing 2

Performance counter stats for 'false_sharing 2':

4,310,905,590 cpu_cycles # 0.0 % bad_speculation
# 4.0 % retiring (66.87%)
25,009,763,735 stall_slot (66.87%)
855,659,327 op_spec (66.87%)
854,335,288 op_retired (66.87%)
4,330,308,058 cpu_cycles # 27.1 % frontend_bound (66.99%)
10,207,186,460 stall_slot_frontend (66.99%)
4,316,583,673 cpu_cycles # 69.4 % backend_bound (66.65%)
14,979,136,808 stall_slot_backend (66.65%)

0.572056818 seconds time elapsed

1.572143000 seconds user
0.004010000 seconds sys
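
To make the stall_slot correction concrete, here is a minimal sketch that
recomputes the four TopDownL1 percentages from the raw counts above. It
assumes a slot width of 5 per cycle and the expression shapes shown in the
code. Neither is stated in this cover letter; they are assumptions chosen
because they reproduce the printed values. The only part taken directly from
the description is that cpu_cycles is subtracted from stall_slot and
stall_slot_frontend.

```python
SLOTS = 5  # assumed slots per cycle; not stated in the cover letter


def frontend_bound(stall_slot_frontend, cpu_cycles):
    # Corrected count: subtract cpu_cycles from the raw stall_slot_frontend.
    return (stall_slot_frontend - cpu_cycles) / (SLOTS * cpu_cycles)


def backend_bound(stall_slot_backend, cpu_cycles):
    # stall_slot_backend is used as counted (no correction mentioned).
    return stall_slot_backend / (SLOTS * cpu_cycles)


def unstalled_fraction(stall_slot, cpu_cycles):
    # Corrected count: subtract cpu_cycles from the raw stall_slot.
    return 1 - (stall_slot - cpu_cycles) / (SLOTS * cpu_cycles)


def retiring(op_retired, op_spec, stall_slot, cpu_cycles):
    return (op_retired / op_spec) * unstalled_fraction(stall_slot, cpu_cycles)


def bad_speculation(op_retired, op_spec, stall_slot, cpu_cycles):
    return (1 - op_retired / op_spec) * unstalled_fraction(stall_slot, cpu_cycles)


# Counts copied from the `perf stat -M TopDownL1` output above.
fe = frontend_bound(10_207_186_460, 4_330_308_058)
be = backend_bound(14_979_136_808, 4_316_583_673)
rt = retiring(854_335_288, 855_659_327, 25_009_763_735, 4_310_905_590)
bs = bad_speculation(854_335_288, 855_659_327, 25_009_763_735, 4_310_905_590)

for name, value in [("frontend_bound", fe), ("backend_bound", be),
                    ("retiring", rt), ("bad_speculation", bs)]:
    print(f"{name:16s} {100 * value:5.1f} %")
```

With these assumptions the results round to 27.1 %, 69.4 %, 4.0 % and 0.0 %,
in line with the output above. The four values do not sum to exactly 100 %
here because each metric was measured with its own cpu_cycles sample.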

Jing Zhang (6):
perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
perf vendor events arm64: Add TLB metrics for neoverse-n2
perf vendor events arm64: Add cache metrics for neoverse-n2
perf vendor events arm64: Add branch metrics for neoverse-n2
perf vendor events arm64: Add PE utilization metrics for neoverse-n2
perf vendor events arm64: Add instruction mix metrics for neoverse-n2

.../arch/arm64/arm/neoverse-n2/metrics.json | 277 +++++++++++++++++++++
1 file changed, 277 insertions(+)
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json

--
1.8.3.1

2022-12-30 18:51:54

by Ian Rogers


Subject: Re: [PATCH v4 0/6] Add metrics for neoverse-n2

On Fri, Dec 23, 2022 at 4:39 AM Jing Zhang <[email protected]> wrote:
>
> [...]
>
> Jing Zhang (6):
> perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
> perf vendor events arm64: Add TLB metrics for neoverse-n2
> perf vendor events arm64: Add cache metrics for neoverse-n2
> perf vendor events arm64: Add branch metrics for neoverse-n2
> perf vendor events arm64: Add PE utilization metrics for neoverse-n2
> perf vendor events arm64: Add instruction mix metrics for neoverse-n2

Series:
Acked-by: Ian Rogers <[email protected]>

The only observation I had is that the "per kilo instruction" in the
names (i.e. those ending in pki) could be moved into the ScaleUnit, which
may make the names and the output a little cleaner.

Thanks!
Ian


2023-01-03 07:00:21

by Jing Zhang


Subject: Re: [PATCH v4 0/6] Add metrics for neoverse-n2

On 2022/12/31 at 2:48 AM, Ian Rogers wrote:
> Series:
> Acked-by: Ian Rogers <[email protected]>
>
> The only observation I had is that the "per kilo instruction" in the
> names (i.e. those ending in pki) could be moved into the ScaleUnit, which
> may make the names and the output a little cleaner.
>
> Thanks!
> Ian

Will do. Thank you, Ian!