There are cases where a metric uses more events than the number of
available counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four
data fabric counters but the "nps1_die_to_dram" metric has eight events.
By default, the constituent events are placed in a group. Since the
events cannot be scheduled at the same time, the metric is not computed.
The "all metrics" test also fails because of this.
Before announcing failure, the test can try multiple options for each
available metric. After system-wide mode fails, retry once again with
the "--metric-no-group" option.
E.g.
$ sudo perf test -v 100
Before:
100: perf all metrics test :
--- start ---
test child forked, pid 672731
Testing branch_misprediction_ratio
Testing all_remote_links_outbound
Testing nps1_die_to_dram
Metric 'nps1_die_to_dram' not printed in:
Error:
Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
Testing macro_ops_dispatched
Testing all_l2_cache_accesses
Testing all_l2_cache_hits
Testing all_l2_cache_misses
Testing ic_fetch_miss_ratio
Testing l2_cache_accesses_from_l2_hwpf
Testing l2_cache_misses_from_l2_hwpf
Testing op_cache_fetch_miss_ratio
Testing l3_read_miss_latency
Testing l1_itlb_misses
test child finished with -1
---- end ----
perf all metrics test: FAILED!
After:
100: perf all metrics test :
--- start ---
test child forked, pid 672887
Testing branch_misprediction_ratio
Testing all_remote_links_outbound
Testing nps1_die_to_dram
Testing macro_ops_dispatched
Testing all_l2_cache_accesses
Testing all_l2_cache_hits
Testing all_l2_cache_misses
Testing ic_fetch_miss_ratio
Testing l2_cache_accesses_from_l2_hwpf
Testing l2_cache_misses_from_l2_hwpf
Testing op_cache_fetch_miss_ratio
Testing l3_read_miss_latency
Testing l1_itlb_misses
test child finished with 0
---- end ----
perf all metrics test: Ok
Reported-by: Ayush Jain <[email protected]>
Signed-off-by: Sandipan Das <[email protected]>
---
tools/perf/tests/shell/stat_all_metrics.sh | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh
index 54774525e18a..1e88ea8c5677 100755
--- a/tools/perf/tests/shell/stat_all_metrics.sh
+++ b/tools/perf/tests/shell/stat_all_metrics.sh
@@ -16,6 +16,13 @@ for m in $(perf list --raw-dump metrics); do
then
continue
fi
+ # Failed again, possibly there are not enough counters so retry system wide
+ # mode but without event grouping.
+ result=$(perf stat -M "$m" --metric-no-group -a sleep 0.01 2>&1)
+ if [[ "$result" =~ ${m:0:50} ]]
+ then
+ continue
+ fi
# Failed again, possibly the workload was too small so retry with something
# longer.
result=$(perf stat -M "$m" perf bench internals synthesize 2>&1)
--
2.34.1
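For reference, the retry that the patch adds corresponds to manually
running something like the following (a sketch; "nps1_die_to_dram" is
one of the affected metrics and the short sleep mirrors the test):

  # System-wide mode (-a) makes the data fabric events valid and
  # --metric-no-group keeps the eight DF events from being forced
  # into a single group on hardware with only four DF counters.
  $ perf stat -M nps1_die_to_dram --metric-no-group -a sleep 0.01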
Hello Sandipan,
Thank you for this patch,
On 6/14/2023 2:37 PM, Sandipan Das wrote:
> There are cases where a metric uses more events than the number of
> counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric
> counters but the "nps1_die_to_dram" metric has eight events. By default,
> the constituent events are placed in a group. Since the events cannot be
> scheduled at the same time, the metric is not computed. The all metrics
> test also fails because of this.
>
> Before announcing failure, the test can try multiple options for each
> available metric. After system-wide mode fails, retry once again with
> the "--metric-no-group" option.
>
> E.g.
>
> $ sudo perf test -v 100
>
> Before:
>
> 100: perf all metrics test :
> --- start ---
> test child forked, pid 672731
> Testing branch_misprediction_ratio
> Testing all_remote_links_outbound
> Testing nps1_die_to_dram
> Metric 'nps1_die_to_dram' not printed in:
> Error:
> Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
> Testing macro_ops_dispatched
> Testing all_l2_cache_accesses
> Testing all_l2_cache_hits
> Testing all_l2_cache_misses
> Testing ic_fetch_miss_ratio
> Testing l2_cache_accesses_from_l2_hwpf
> Testing l2_cache_misses_from_l2_hwpf
> Testing op_cache_fetch_miss_ratio
> Testing l3_read_miss_latency
> Testing l1_itlb_misses
> test child finished with -1
> ---- end ----
> perf all metrics test: FAILED!
>
> After:
>
> 100: perf all metrics test :
> --- start ---
> test child forked, pid 672887
> Testing branch_misprediction_ratio
> Testing all_remote_links_outbound
> Testing nps1_die_to_dram
> Testing macro_ops_dispatched
> Testing all_l2_cache_accesses
> Testing all_l2_cache_hits
> Testing all_l2_cache_misses
> Testing ic_fetch_miss_ratio
> Testing l2_cache_accesses_from_l2_hwpf
> Testing l2_cache_misses_from_l2_hwpf
> Testing op_cache_fetch_miss_ratio
> Testing l3_read_miss_latency
> Testing l1_itlb_misses
> test child finished with 0
> ---- end ----
> perf all metrics test: Ok
>
The issue is resolved after applying this patch:
$ ./perf test 102 -vvv
102: perf all metrics test :
--- start ---
test child forked, pid 244991
Testing branch_misprediction_ratio
Testing all_remote_links_outbound
Testing nps1_die_to_dram
Testing all_l2_cache_accesses
Testing all_l2_cache_hits
Testing all_l2_cache_misses
Testing ic_fetch_miss_ratio
Testing l2_cache_accesses_from_l2_hwpf
Testing l2_cache_misses_from_l2_hwpf
Testing l3_read_miss_latency
Testing l1_itlb_misses
test child finished with 0
---- end ----
perf all metrics test: Ok
> Reported-by: Ayush Jain <[email protected]>
> Signed-off-by: Sandipan Das <[email protected]>
Tested-by: Ayush Jain <[email protected]>
> ---
> tools/perf/tests/shell/stat_all_metrics.sh | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh
> index 54774525e18a..1e88ea8c5677 100755
> --- a/tools/perf/tests/shell/stat_all_metrics.sh
> +++ b/tools/perf/tests/shell/stat_all_metrics.sh
> @@ -16,6 +16,13 @@ for m in $(perf list --raw-dump metrics); do
> then
> continue
> fi
> + # Failed again, possibly there are not enough counters so retry system wide
> + # mode but without event grouping.
> + result=$(perf stat -M "$m" --metric-no-group -a sleep 0.01 2>&1)
> + if [[ "$result" =~ ${m:0:50} ]]
> + then
> + continue
> + fi
> # Failed again, possibly the workload was too small so retry with something
> # longer.
> result=$(perf stat -M "$m" perf bench internals synthesize 2>&1)
Thanks & Regards,
Ayush Jain
On Wed, Jun 14, 2023 at 2:07 AM Sandipan Das <[email protected]> wrote:
>
> There are cases where a metric uses more events than the number of
> counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric
> counters but the "nps1_die_to_dram" metric has eight events. By default,
> the constituent events are placed in a group. Since the events cannot be
> scheduled at the same time, the metric is not computed. The all metrics
> test also fails because of this.
Thanks Sandipan. So this is exposing a bug in the AMD data fabric PMU
driver. When the events are added, the driver should create a fake PMU,
check that adding the group is valid and, if not, fail. That failure is
picked up by the tool, which will then remove the group.
I appreciate the need for a time machine to make such a fix work on
already-released kernels. To work around the issue with the metrics,
add:
"MetricConstraint": "NO_GROUP_EVENTS",
to each affected metric in the json.
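For illustration, the resulting entry in
tools/perf/pmu-events/arch/x86/amdzen1/recommended.json would look
roughly like this sketch (only the "MetricConstraint" line is the
suggested change; the other fields are abridged from the metric's
existing definition and the middle events are elided):

  {
      "MetricName": "nps1_die_to_dram",
      "MetricExpr": "dram_channel_data_controller_0 + dram_channel_data_controller_1 + ... + dram_channel_data_controller_7",
      "ScaleUnit": "6.1e-5MiB",
      "MetricConstraint": "NO_GROUP_EVENTS"
  },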
> Before announcing failure, the test can try multiple options for each
> available metric. After system-wide mode fails, retry once again with
> the "--metric-no-group" option.
>
> E.g.
>
> $ sudo perf test -v 100
>
> Before:
>
> 100: perf all metrics test :
> --- start ---
> test child forked, pid 672731
> Testing branch_misprediction_ratio
> Testing all_remote_links_outbound
> Testing nps1_die_to_dram
> Metric 'nps1_die_to_dram' not printed in:
> Error:
> Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
This error doesn't relate to grouping, so I'm confused about having it
in the commit message, aside from the test failure.
Thanks,
Ian
> Testing macro_ops_dispatched
> Testing all_l2_cache_accesses
> Testing all_l2_cache_hits
> Testing all_l2_cache_misses
> Testing ic_fetch_miss_ratio
> Testing l2_cache_accesses_from_l2_hwpf
> Testing l2_cache_misses_from_l2_hwpf
> Testing op_cache_fetch_miss_ratio
> Testing l3_read_miss_latency
> Testing l1_itlb_misses
> test child finished with -1
> ---- end ----
> perf all metrics test: FAILED!
>
> After:
>
> 100: perf all metrics test :
> --- start ---
> test child forked, pid 672887
> Testing branch_misprediction_ratio
> Testing all_remote_links_outbound
> Testing nps1_die_to_dram
> Testing macro_ops_dispatched
> Testing all_l2_cache_accesses
> Testing all_l2_cache_hits
> Testing all_l2_cache_misses
> Testing ic_fetch_miss_ratio
> Testing l2_cache_accesses_from_l2_hwpf
> Testing l2_cache_misses_from_l2_hwpf
> Testing op_cache_fetch_miss_ratio
> Testing l3_read_miss_latency
> Testing l1_itlb_misses
> test child finished with 0
> ---- end ----
> perf all metrics test: Ok
>
> Reported-by: Ayush Jain <[email protected]>
> Signed-off-by: Sandipan Das <[email protected]>
> ---
> tools/perf/tests/shell/stat_all_metrics.sh | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh
> index 54774525e18a..1e88ea8c5677 100755
> --- a/tools/perf/tests/shell/stat_all_metrics.sh
> +++ b/tools/perf/tests/shell/stat_all_metrics.sh
> @@ -16,6 +16,13 @@ for m in $(perf list --raw-dump metrics); do
> then
> continue
> fi
> + # Failed again, possibly there are not enough counters so retry system wide
> + # mode but without event grouping.
> + result=$(perf stat -M "$m" --metric-no-group -a sleep 0.01 2>&1)
> + if [[ "$result" =~ ${m:0:50} ]]
> + then
> + continue
> + fi
> # Failed again, possibly the workload was too small so retry with something
> # longer.
> result=$(perf stat -M "$m" perf bench internals synthesize 2>&1)
> --
> 2.34.1
>
Hi Ian,
On 6/14/2023 10:10 PM, Ian Rogers wrote:
> On Wed, Jun 14, 2023 at 2:07 AM Sandipan Das <[email protected]> wrote:
>>
>> There are cases where a metric uses more events than the number of
>> counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric
>> counters but the "nps1_die_to_dram" metric has eight events. By default,
>> the constituent events are placed in a group. Since the events cannot be
>> scheduled at the same time, the metric is not computed. The all metrics
>> test also fails because of this.
>
> Thanks Sandipan. So this is exposing a bug in the AMD data fabric PMU
> driver. When the events are added the driver should create a fake PMU,
> check that adding the group is valid and if not fail. The failure is
> picked up by the tool and it will remove the group.
>
> I appreciate the need for a time machine to make such a fix work. To
> workaround the issue with the metrics add:
> "MetricConstraint": "NO_GROUP_EVENTS",
> to each metric in the json.
>
Thanks for the suggestions. The amd_uncore driver is indeed missing group
validation checks during event init. Will send out a fix with the
"NO_GROUP_EVENTS" workaround.
>> Before announcing failure, the test can try multiple options for each
>> available metric. After system-wide mode fails, retry once again with
>> the "--metric-no-group" option.
>>
>> E.g.
>>
>> $ sudo perf test -v 100
>>
>> Before:
>>
>> 100: perf all metrics test :
>> --- start ---
>> test child forked, pid 672731
>> Testing branch_misprediction_ratio
>> Testing all_remote_links_outbound
>> Testing nps1_die_to_dram
>> Metric 'nps1_die_to_dram' not printed in:
>> Error:
>> Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
>
> This error doesn't relate to grouping, so I'm confused about having it
> in the commit message, aside from the test failure.
>
Agreed. That's the error message from the last attempt where the test
tries to use a longer running workload (perf bench).
- Sandipan
On Wed, Jun 14, 2023 at 05:08:21PM +0530, Ayush Jain wrote:
> On 6/14/2023 2:37 PM, Sandipan Das wrote:
> > There are cases where a metric uses more events than the number of
> > counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric
> > counters but the "nps1_die_to_dram" metric has eight events. By default,
> > the constituent events are placed in a group. Since the events cannot be
> > scheduled at the same time, the metric is not computed. The all metrics
> > test also fails because of this.
Hmm, I'm not able to reproduce the problem here, even before applying
this patch:
[root@five ~]# grep -m1 "model name" /proc/cpuinfo
model name : AMD Ryzen 9 5950X 16-Core Processor
[root@five ~]# perf test -vvv "perf all metrics test"
104: perf all metrics test :
--- start ---
test child forked, pid 1379713
Testing branch_misprediction_ratio
Testing all_remote_links_outbound
Testing nps1_die_to_dram
Testing macro_ops_dispatched
Testing all_l2_cache_accesses
Testing all_l2_cache_hits
Testing all_l2_cache_misses
Testing ic_fetch_miss_ratio
Testing l2_cache_accesses_from_l2_hwpf
Testing l2_cache_misses_from_l2_hwpf
Testing op_cache_fetch_miss_ratio
Testing l3_read_miss_latency
Testing l1_itlb_misses
test child finished with 0
---- end ----
perf all metrics test: Ok
[root@five ~]#
[root@five ~]# perf stat -M nps1_die_to_dram -a sleep 2
Performance counter stats for 'system wide':
0 dram_channel_data_controller_4 # 10885.3 MiB nps1_die_to_dram (49.96%)
31,334,338 dram_channel_data_controller_1 (50.01%)
0 dram_channel_data_controller_6 (50.04%)
54,679,601 dram_channel_data_controller_3 (50.04%)
38,420,402 dram_channel_data_controller_0 (50.04%)
0 dram_channel_data_controller_5 (49.99%)
54,012,661 dram_channel_data_controller_2 (49.96%)
0 dram_channel_data_controller_7 (49.96%)
2.001465439 seconds time elapsed
[root@five ~]#
[root@five ~]# perf stat -v -M nps1_die_to_dram -a sleep 2
Using CPUID AuthenticAMD-25-21-0
metric expr dram_channel_data_controller_0 + dram_channel_data_controller_1 + dram_channel_data_controller_2 + dram_channel_data_controller_3 + dram_channel_data_controller_4 + dram_channel_data_controller_5 + dram_channel_data_controller_6 + dram_channel_data_controller_7 for nps1_die_to_dram
found event dram_channel_data_controller_4
found event dram_channel_data_controller_1
found event dram_channel_data_controller_6
found event dram_channel_data_controller_3
found event dram_channel_data_controller_0
found event dram_channel_data_controller_5
found event dram_channel_data_controller_2
found event dram_channel_data_controller_7
Parsing metric events 'dram_channel_data_controller_4/metric-id=dram_channel_data_controller_4/,dram_channel_data_controller_1/metric-id=dram_channel_data_controller_1/,dram_channel_data_controller_6/metric-id=dram_channel_data_controller_6/,dram_channel_data_controller_3/metric-id=dram_channel_data_controller_3/,dram_channel_data_controller_0/metric-id=dram_channel_data_controller_0/,dram_channel_data_controller_5/metric-id=dram_channel_data_controller_5/,dram_channel_data_controller_2/metric-id=dram_channel_data_controller_2/,dram_channel_data_controller_7/metric-id=dram_channel_data_controller_7/'
dram_channel_data_controller_4 -> amd_df/metric-id=dram_channel_data_controller_4,dram_channel_data_controller_4/
dram_channel_data_controller_1 -> amd_df/metric-id=dram_channel_data_controller_1,dram_channel_data_controller_1/
Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_1'. Missing kernel support? (<no help>)
dram_channel_data_controller_6 -> amd_df/metric-id=dram_channel_data_controller_6,dram_channel_data_controller_6/
Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_6'. Missing kernel support? (<no help>)
dram_channel_data_controller_3 -> amd_df/metric-id=dram_channel_data_controller_3,dram_channel_data_controller_3/
Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_3'. Missing kernel support? (<no help>)
dram_channel_data_controller_0 -> amd_df/metric-id=dram_channel_data_controller_0,dram_channel_data_controller_0/
Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_0'. Missing kernel support? (<no help>)
dram_channel_data_controller_5 -> amd_df/metric-id=dram_channel_data_controller_5,dram_channel_data_controller_5/
Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_5'. Missing kernel support? (<no help>)
dram_channel_data_controller_2 -> amd_df/metric-id=dram_channel_data_controller_2,dram_channel_data_controller_2/
Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_2'. Missing kernel support? (<no help>)
dram_channel_data_controller_7 -> amd_df/metric-id=dram_channel_data_controller_7,dram_channel_data_controller_7/
Matched metric-id dram_channel_data_controller_4 to dram_channel_data_controller_4
Matched metric-id dram_channel_data_controller_1 to dram_channel_data_controller_1
Matched metric-id dram_channel_data_controller_6 to dram_channel_data_controller_6
Matched metric-id dram_channel_data_controller_3 to dram_channel_data_controller_3
Matched metric-id dram_channel_data_controller_0 to dram_channel_data_controller_0
Matched metric-id dram_channel_data_controller_5 to dram_channel_data_controller_5
Matched metric-id dram_channel_data_controller_2 to dram_channel_data_controller_2
Matched metric-id dram_channel_data_controller_7 to dram_channel_data_controller_7
Control descriptor is not initialized
dram_channel_data_controller_4: 0 2001175127 999996394
dram_channel_data_controller_1: 32346663 2001169897 1000709803
dram_channel_data_controller_6: 0 2001168377 1001193443
dram_channel_data_controller_3: 47551247 2001166947 1001198122
dram_channel_data_controller_0: 38975242 2001165217 1001182923
dram_channel_data_controller_5: 0 2001163067 1000464054
dram_channel_data_controller_2: 49934162 2001160907 999974934
dram_channel_data_controller_7: 0 2001150317 999968825
Performance counter stats for 'system wide':
0 dram_channel_data_controller_4 # 10297.2 MiB nps1_die_to_dram (49.97%)
32,346,663 dram_channel_data_controller_1 (50.01%)
0 dram_channel_data_controller_6 (50.03%)
47,551,247 dram_channel_data_controller_3 (50.03%)
38,975,242 dram_channel_data_controller_0 (50.03%)
0 dram_channel_data_controller_5 (49.99%)
49,934,162 dram_channel_data_controller_2 (49.97%)
0 dram_channel_data_controller_7 (49.97%)
2.001196512 seconds time elapsed
[root@five ~]#
What am I missing?
Ian, I also stumbled on this:
[root@five ~]# perf stat -M dram_channel_data_controller_4
Cannot find metric or group `dram_channel_data_controller_4'
^C
Performance counter stats for 'system wide':
284,908.91 msec cpu-clock # 32.002 CPUs utilized
6,485,456 context-switches # 22.763 K/sec
719 cpu-migrations # 2.524 /sec
32,800 page-faults # 115.125 /sec
189,779,273,552 cycles # 0.666 GHz (83.33%)
2,893,165,259 stalled-cycles-frontend # 1.52% frontend cycles idle (83.33%)
24,807,157,349 stalled-cycles-backend # 13.07% backend cycles idle (83.33%)
99,286,488,807 instructions # 0.52 insn per cycle
# 0.25 stalled cycles per insn (83.33%)
24,120,737,678 branches # 84.661 M/sec (83.33%)
1,907,540,278 branch-misses # 7.91% of all branches (83.34%)
8.902784776 seconds time elapsed
[root@five ~]#
[root@five ~]# perf stat -e dram_channel_data_controller_4
^C
Performance counter stats for 'system wide':
0 dram_channel_data_controller_4
1.189638741 seconds time elapsed
[root@five ~]#
I.e. -M should bail out at that point (Cannot find metric or group `dram_channel_data_controller_4'), no?
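With such a bail-out in place, the expected behaviour would be
something like this (hypothetical output; the non-zero exit status is
illustrative):

[root@five ~]# perf stat -M dram_channel_data_controller_4 -a sleep 1
Cannot find metric or group `dram_channel_data_controller_4'
[root@five ~]# echo $?
1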
- Arnaldo
> > Before announcing failure, the test can try multiple options for each
> > available metric. After system-wide mode fails, retry once again with
> > the "--metric-no-group" option.
> >
> > E.g.
> >
> > $ sudo perf test -v 100
> >
> > Before:
> >
> > 100: perf all metrics test :
> > --- start ---
> > test child forked, pid 672731
> > Testing branch_misprediction_ratio
> > Testing all_remote_links_outbound
> > Testing nps1_die_to_dram
> > Metric 'nps1_die_to_dram' not printed in:
> > Error:
> > Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
> > Testing macro_ops_dispatched
> > Testing all_l2_cache_accesses
> > Testing all_l2_cache_hits
> > Testing all_l2_cache_misses
> > Testing ic_fetch_miss_ratio
> > Testing l2_cache_accesses_from_l2_hwpf
> > Testing l2_cache_misses_from_l2_hwpf
> > Testing op_cache_fetch_miss_ratio
> > Testing l3_read_miss_latency
> > Testing l1_itlb_misses
> > test child finished with -1
> > ---- end ----
> > perf all metrics test: FAILED!
> >
> > After:
> >
> > 100: perf all metrics test :
> > --- start ---
> > test child forked, pid 672887
> > Testing branch_misprediction_ratio
> > Testing all_remote_links_outbound
> > Testing nps1_die_to_dram
> > Testing macro_ops_dispatched
> > Testing all_l2_cache_accesses
> > Testing all_l2_cache_hits
> > Testing all_l2_cache_misses
> > Testing ic_fetch_miss_ratio
> > Testing l2_cache_accesses_from_l2_hwpf
> > Testing l2_cache_misses_from_l2_hwpf
> > Testing op_cache_fetch_miss_ratio
> > Testing l3_read_miss_latency
> > Testing l1_itlb_misses
> > test child finished with 0
> > ---- end ----
> > perf all metrics test: Ok
> >
>
> Issue gets resolved after applying this patch
>
> $ ./perf test 102 -vvv
> 102: perf all metrics test :
> --- start ---
> test child forked, pid 244991
> Testing branch_misprediction_ratio
> Testing all_remote_links_outbound
> Testing nps1_die_to_dram
> Testing all_l2_cache_accesses
> Testing all_l2_cache_hits
> Testing all_l2_cache_misses
> Testing ic_fetch_miss_ratio
> Testing l2_cache_accesses_from_l2_hwpf
> Testing l2_cache_misses_from_l2_hwpf
> Testing l3_read_miss_latency
> Testing l1_itlb_misses
> test child finished with 0
> ---- end ----
> perf all metrics test: Ok
>
> > Reported-by: Ayush Jain <[email protected]>
> > Signed-off-by: Sandipan Das <[email protected]>
>
> Tested-by: Ayush Jain <[email protected]>
>
> > ---
> > tools/perf/tests/shell/stat_all_metrics.sh | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh
> > index 54774525e18a..1e88ea8c5677 100755
> > --- a/tools/perf/tests/shell/stat_all_metrics.sh
> > +++ b/tools/perf/tests/shell/stat_all_metrics.sh
> > @@ -16,6 +16,13 @@ for m in $(perf list --raw-dump metrics); do
> > then
> > continue
> > fi
> > + # Failed again, possibly there are not enough counters so retry system wide
> > + # mode but without event grouping.
> > + result=$(perf stat -M "$m" --metric-no-group -a sleep 0.01 2>&1)
> > + if [[ "$result" =~ ${m:0:50} ]]
> > + then
> > + continue
> > + fi
> > # Failed again, possibly the workload was too small so retry with something
> > # longer.
> > result=$(perf stat -M "$m" perf bench internals synthesize 2>&1)
>
> Thanks & Regards,
> Ayush Jain
--
- Arnaldo
On Wed, Dec 6, 2023 at 5:08 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
>
> On Wed, Jun 14, 2023 at 05:08:21PM +0530, Ayush Jain wrote:
> > On 6/14/2023 2:37 PM, Sandipan Das wrote:
> > > There are cases where a metric uses more events than the number of
> > > counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric
> > > counters but the "nps1_die_to_dram" metric has eight events. By default,
> > > the constituent events are placed in a group. Since the events cannot be
> > > scheduled at the same time, the metric is not computed. The all metrics
> > > test also fails because of this.
>
> Humm, I'm not being able to reproduce here the problem, before applying
> this patch:
>
> [root@five ~]# grep -m1 "model name" /proc/cpuinfo
> model name : AMD Ryzen 9 5950X 16-Core Processor
> [root@five ~]# perf test -vvv "perf all metrics test"
> 104: perf all metrics test :
> --- start ---
> test child forked, pid 1379713
> Testing branch_misprediction_ratio
> Testing all_remote_links_outbound
> Testing nps1_die_to_dram
> Testing macro_ops_dispatched
> Testing all_l2_cache_accesses
> Testing all_l2_cache_hits
> Testing all_l2_cache_misses
> Testing ic_fetch_miss_ratio
> Testing l2_cache_accesses_from_l2_hwpf
> Testing l2_cache_misses_from_l2_hwpf
> Testing op_cache_fetch_miss_ratio
> Testing l3_read_miss_latency
> Testing l1_itlb_misses
> test child finished with 0
> ---- end ----
> perf all metrics test: Ok
> [root@five ~]#
Please don't apply the patch. The patch masks a bug in metrics/PMUs
and the proper fix was:
8d40f74ebf21 perf vendor events amd: Fix large metrics
https://lore.kernel.org/r/[email protected]
> [root@five ~]# perf stat -M nps1_die_to_dram -a sleep 2
>
> Performance counter stats for 'system wide':
>
> 0 dram_channel_data_controller_4 # 10885.3 MiB nps1_die_to_dram (49.96%)
> 31,334,338 dram_channel_data_controller_1 (50.01%)
> 0 dram_channel_data_controller_6 (50.04%)
> 54,679,601 dram_channel_data_controller_3 (50.04%)
> 38,420,402 dram_channel_data_controller_0 (50.04%)
> 0 dram_channel_data_controller_5 (49.99%)
> 54,012,661 dram_channel_data_controller_2 (49.96%)
> 0 dram_channel_data_controller_7 (49.96%)
>
> 2.001465439 seconds time elapsed
>
> [root@five ~]#
>
> [root@five ~]# perf stat -v -M nps1_die_to_dram -a sleep 2
> Using CPUID AuthenticAMD-25-21-0
> metric expr dram_channel_data_controller_0 + dram_channel_data_controller_1 + dram_channel_data_controller_2 + dram_channel_data_controller_3 + dram_channel_data_controller_4 + dram_channel_data_controller_5 + dram_channel_data_controller_6 + dram_channel_data_controller_7 for nps1_die_to_dram
> found event dram_channel_data_controller_4
> found event dram_channel_data_controller_1
> found event dram_channel_data_controller_6
> found event dram_channel_data_controller_3
> found event dram_channel_data_controller_0
> found event dram_channel_data_controller_5
> found event dram_channel_data_controller_2
> found event dram_channel_data_controller_7
> Parsing metric events 'dram_channel_data_controller_4/metric-id=dram_channel_data_controller_4/,dram_channel_data_controller_1/metric-id=dram_channel_data_controller_1/,dram_channel_data_controller_6/metric-id=dram_channel_data_controller_6/,dram_channel_data_controller_3/metric-id=dram_channel_data_controller_3/,dram_channel_data_controller_0/metric-id=dram_channel_data_controller_0/,dram_channel_data_controller_5/metric-id=dram_channel_data_controller_5/,dram_channel_data_controller_2/metric-id=dram_channel_data_controller_2/,dram_channel_data_controller_7/metric-id=dram_channel_data_controller_7/'
> dram_channel_data_controller_4 -> amd_df/metric-id=dram_channel_data_controller_4,dram_channel_data_controller_4/
> dram_channel_data_controller_1 -> amd_df/metric-id=dram_channel_data_controller_1,dram_channel_data_controller_1/
> Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_1'. Missing kernel support? (<no help>)
> dram_channel_data_controller_6 -> amd_df/metric-id=dram_channel_data_controller_6,dram_channel_data_controller_6/
> Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_6'. Missing kernel support? (<no help>)
> dram_channel_data_controller_3 -> amd_df/metric-id=dram_channel_data_controller_3,dram_channel_data_controller_3/
> Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_3'. Missing kernel support? (<no help>)
> dram_channel_data_controller_0 -> amd_df/metric-id=dram_channel_data_controller_0,dram_channel_data_controller_0/
> Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_0'. Missing kernel support? (<no help>)
> dram_channel_data_controller_5 -> amd_df/metric-id=dram_channel_data_controller_5,dram_channel_data_controller_5/
> Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_5'. Missing kernel support? (<no help>)
> dram_channel_data_controller_2 -> amd_df/metric-id=dram_channel_data_controller_2,dram_channel_data_controller_2/
> Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_2'. Missing kernel support? (<no help>)
> dram_channel_data_controller_7 -> amd_df/metric-id=dram_channel_data_controller_7,dram_channel_data_controller_7/
> Matched metric-id dram_channel_data_controller_4 to dram_channel_data_controller_4
> Matched metric-id dram_channel_data_controller_1 to dram_channel_data_controller_1
> Matched metric-id dram_channel_data_controller_6 to dram_channel_data_controller_6
> Matched metric-id dram_channel_data_controller_3 to dram_channel_data_controller_3
> Matched metric-id dram_channel_data_controller_0 to dram_channel_data_controller_0
> Matched metric-id dram_channel_data_controller_5 to dram_channel_data_controller_5
> Matched metric-id dram_channel_data_controller_2 to dram_channel_data_controller_2
> Matched metric-id dram_channel_data_controller_7 to dram_channel_data_controller_7
> Control descriptor is not initialized
> dram_channel_data_controller_4: 0 2001175127 999996394
> dram_channel_data_controller_1: 32346663 2001169897 1000709803
> dram_channel_data_controller_6: 0 2001168377 1001193443
> dram_channel_data_controller_3: 47551247 2001166947 1001198122
> dram_channel_data_controller_0: 38975242 2001165217 1001182923
> dram_channel_data_controller_5: 0 2001163067 1000464054
> dram_channel_data_controller_2: 49934162 2001160907 999974934
> dram_channel_data_controller_7: 0 2001150317 999968825
>
> Performance counter stats for 'system wide':
>
> 0 dram_channel_data_controller_4 # 10297.2 MiB nps1_die_to_dram (49.97%)
> 32,346,663 dram_channel_data_controller_1 (50.01%)
> 0 dram_channel_data_controller_6 (50.03%)
> 47,551,247 dram_channel_data_controller_3 (50.03%)
> 38,975,242 dram_channel_data_controller_0 (50.03%)
> 0 dram_channel_data_controller_5 (49.99%)
> 49,934,162 dram_channel_data_controller_2 (49.97%)
> 0 dram_channel_data_controller_7 (49.97%)
>
> 2.001196512 seconds time elapsed
>
> [root@five ~]#
>
> What am I missing?
>
> Ian, I also stumbled on this:
>
> [root@five ~]# perf stat -M dram_channel_data_controller_4
> Cannot find metric or group `dram_channel_data_controller_4'
> ^C
> Performance counter stats for 'system wide':
>
> 284,908.91 msec cpu-clock # 32.002 CPUs utilized
> 6,485,456 context-switches # 22.763 K/sec
> 719 cpu-migrations # 2.524 /sec
> 32,800 page-faults # 115.125 /sec
> 189,779,273,552 cycles # 0.666 GHz (83.33%)
> 2,893,165,259 stalled-cycles-frontend # 1.52% frontend cycles idle (83.33%)
> 24,807,157,349 stalled-cycles-backend # 13.07% backend cycles idle (83.33%)
> 99,286,488,807 instructions # 0.52 insn per cycle
> # 0.25 stalled cycles per insn (83.33%)
> 24,120,737,678 branches # 84.661 M/sec (83.33%)
> 1,907,540,278 branch-misses # 7.91% of all branches (83.34%)
>
> 8.902784776 seconds time elapsed
>
>
> [root@five ~]#
> [root@five ~]# perf stat -e dram_channel_data_controller_4
> ^C
> Performance counter stats for 'system wide':
>
> 0 dram_channel_data_controller_4
>
> 1.189638741 seconds time elapsed
>
>
> [root@five ~]#
>
> I.e. -M should bail out at that point (Cannot find metric or group `dram_channel_data_controller_4'), no?
We could. I suspect the code has always just not bailed out. I'll put
together a patch adding the bail out.
Thanks,
Ian
> - Arnaldo
>
> > > Before announcing failure, the test can try multiple options for each
> > > available metric. After system-wide mode fails, retry once again with
> > > the "--metric-no-group" option.
> > >
> > > E.g.
> > >
> > > $ sudo perf test -v 100
> > >
> > > Before:
> > >
> > > 100: perf all metrics test :
> > > --- start ---
> > > test child forked, pid 672731
> > > Testing branch_misprediction_ratio
> > > Testing all_remote_links_outbound
> > > Testing nps1_die_to_dram
> > > Metric 'nps1_die_to_dram' not printed in:
> > > Error:
> > > Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
> > > Testing macro_ops_dispatched
> > > Testing all_l2_cache_accesses
> > > Testing all_l2_cache_hits
> > > Testing all_l2_cache_misses
> > > Testing ic_fetch_miss_ratio
> > > Testing l2_cache_accesses_from_l2_hwpf
> > > Testing l2_cache_misses_from_l2_hwpf
> > > Testing op_cache_fetch_miss_ratio
> > > Testing l3_read_miss_latency
> > > Testing l1_itlb_misses
> > > test child finished with -1
> > > ---- end ----
> > > perf all metrics test: FAILED!
> > >
> > > After:
> > >
> > > 100: perf all metrics test :
> > > --- start ---
> > > test child forked, pid 672887
> > > Testing branch_misprediction_ratio
> > > Testing all_remote_links_outbound
> > > Testing nps1_die_to_dram
> > > Testing macro_ops_dispatched
> > > Testing all_l2_cache_accesses
> > > Testing all_l2_cache_hits
> > > Testing all_l2_cache_misses
> > > Testing ic_fetch_miss_ratio
> > > Testing l2_cache_accesses_from_l2_hwpf
> > > Testing l2_cache_misses_from_l2_hwpf
> > > Testing op_cache_fetch_miss_ratio
> > > Testing l3_read_miss_latency
> > > Testing l1_itlb_misses
> > > test child finished with 0
> > > ---- end ----
> > > perf all metrics test: Ok
> > >
> >
> > Issue gets resolved after applying this patch
> >
> > $ ./perf test 102 -vvv
> > 102: perf all metrics test :
> > --- start ---
> > test child forked, pid 244991
> > Testing branch_misprediction_ratio
> > Testing all_remote_links_outbound
> > Testing nps1_die_to_dram
> > Testing all_l2_cache_accesses
> > Testing all_l2_cache_hits
> > Testing all_l2_cache_misses
> > Testing ic_fetch_miss_ratio
> > Testing l2_cache_accesses_from_l2_hwpf
> > Testing l2_cache_misses_from_l2_hwpf
> > Testing l3_read_miss_latency
> > Testing l1_itlb_misses
> > test child finished with 0
> > ---- end ----
> > perf all metrics test: Ok
> >
> > > Reported-by: Ayush Jain <[email protected]>
> > > Signed-off-by: Sandipan Das <[email protected]>
> >
> > Tested-by: Ayush Jain <[email protected]>
> >
> > > ---
> > > tools/perf/tests/shell/stat_all_metrics.sh | 7 +++++++
> > > 1 file changed, 7 insertions(+)
> > >
> > > diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh
> > > index 54774525e18a..1e88ea8c5677 100755
> > > --- a/tools/perf/tests/shell/stat_all_metrics.sh
> > > +++ b/tools/perf/tests/shell/stat_all_metrics.sh
> > > @@ -16,6 +16,13 @@ for m in $(perf list --raw-dump metrics); do
> > > then
> > > continue
> > > fi
> > > + # Failed again, possibly there are not enough counters so retry system wide
> > > + # mode but without event grouping.
> > > + result=$(perf stat -M "$m" --metric-no-group -a sleep 0.01 2>&1)
> > > + if [[ "$result" =~ ${m:0:50} ]]
> > > + then
> > > + continue
> > > + fi
> > > # Failed again, possibly the workload was too small so retry with something
> > > # longer.
> > > result=$(perf stat -M "$m" perf bench internals synthesize 2>&1)
> >
> > Thanks & Regards,
> > Ayush Jain
>
> --
>
> - Arnaldo
On Wed, Dec 06, 2023 at 08:35:23AM -0800, Ian Rogers wrote:
> On Wed, Dec 6, 2023 at 5:08 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > Humm, I'm not being able to reproduce here the problem, before applying
> > this patch:
> Please don't apply the patch. The patch masks a bug in metrics/PMUs
I didn't
> and the proper fix was:
> 8d40f74ebf21 perf vendor events amd: Fix large metrics
> https://lore.kernel.org/r/[email protected]
that is upstream:
⬢[acme@toolbox perf-tools-next]$ git log tools/perf/pmu-events/arch/x86/amdzen1/recommended.json
commit 8d40f74ebf217d3b9e9b7481721e6236b857cc55
Author: Sandipan Das <[email protected]>
Date: Thu Jul 6 12:04:40 2023 +0530
perf vendor events amd: Fix large metrics
There are cases where a metric requires more events than the number of
available counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four
data fabric counters but the "nps1_die_to_dram" metric has eight events.
By default, the constituent events are placed in a group and since the
events cannot be scheduled at the same time, the metric is not computed.
The "all metrics" test also fails because of this.
Use the NO_GROUP_EVENTS constraint for such metrics which anyway expect
the user to run perf with "--metric-no-group".
E.g.
$ sudo perf test -v 101
Before:
101: perf all metrics test :
--- start ---
test child forked, pid 37131
Testing branch_misprediction_ratio
Testing all_remote_links_outbound
Testing nps1_die_to_dram
Metric 'nps1_die_to_dram' not printed in:
Error:
Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
Testing macro_ops_dispatched
Testing all_l2_cache_accesses
Testing all_l2_cache_hits
Testing all_l2_cache_misses
Testing ic_fetch_miss_ratio
Testing l2_cache_accesses_from_l2_hwpf
Testing l2_cache_misses_from_l2_hwpf
Testing op_cache_fetch_miss_ratio
Testing l3_read_miss_latency
Testing l1_itlb_misses
test child finished with -1
---- end ----
perf all metrics test: FAILED!
After:
101: perf all metrics test :
--- start ---
test child forked, pid 43766
Testing branch_misprediction_ratio
Testing all_remote_links_outbound
Testing nps1_die_to_dram
Testing macro_ops_dispatched
Testing all_l2_cache_accesses
Testing all_l2_cache_hits
Testing all_l2_cache_misses
Testing ic_fetch_miss_ratio
Testing l2_cache_accesses_from_l2_hwpf
Testing l2_cache_misses_from_l2_hwpf
Testing op_cache_fetch_miss_ratio
Testing l3_read_miss_latency
Testing l1_itlb_misses
test child finished with 0
---- end ----
perf all metrics test: Ok
Reported-by: Ayush Jain <[email protected]>
Suggested-by: Ian Rogers <[email protected]>
Signed-off-by: Sandipan Das <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ananth Narayan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Santosh Shukla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> > Ian, I also stumbled on this:
> > [root@five ~]# perf stat -M dram_channel_data_controller_4
> > Cannot find metric or group `dram_channel_data_controller_4'
> > ^C
> > Performance counter stats for 'system wide':
> > 284,908.91 msec cpu-clock # 32.002 CPUs utilized
> > 6,485,456 context-switches # 22.763 K/sec
> > 719 cpu-migrations # 2.524 /sec
> > 32,800 page-faults # 115.125 /sec
<SNIP>
> > I.e. -M should bail out at that point (Cannot find metric or group `dram_channel_data_controller_4'), no?
> We could. I suspect the code has always just not bailed out. I'll put
> together a patch adding the bail out.
Great, thanks,
- Arnaldo
On Wed, Dec 6, 2023 at 9:54 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
>
> On Wed, Dec 06, 2023 at 08:35:23AM -0800, Ian Rogers wrote:
> > On Wed, Dec 6, 2023 at 5:08 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > Humm, I'm not being able to reproduce here the problem, before applying
> > > this patch:
>
> > Please don't apply the patch. The patch masks a bug in metrics/PMUs
>
> I didn't
>
> > and the proper fix was:
> > 8d40f74ebf21 perf vendor events amd: Fix large metrics
> > https://lore.kernel.org/r/[email protected]
>
> that is upstream:
>
> ⬢[acme@toolbox perf-tools-next]$ git log tools/perf/pmu-events/arch/x86/amdzen1/recommended.json
> commit 8d40f74ebf217d3b9e9b7481721e6236b857cc55
> Author: Sandipan Das <[email protected]>
> Date: Thu Jul 6 12:04:40 2023 +0530
>
> perf vendor events amd: Fix large metrics
>
> There are cases where a metric requires more events than the number of
> available counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four
> data fabric counters but the "nps1_die_to_dram" metric has eight events.
>
> By default, the constituent events are placed in a group and since the
> events cannot be scheduled at the same time, the metric is not computed.
> The "all metrics" test also fails because of this.
>
> Use the NO_GROUP_EVENTS constraint for such metrics which anyway expect
> the user to run perf with "--metric-no-group".
>
> E.g.
>
> $ sudo perf test -v 101
>
> Before:
>
> 101: perf all metrics test :
> --- start ---
> test child forked, pid 37131
> Testing branch_misprediction_ratio
> Testing all_remote_links_outbound
> Testing nps1_die_to_dram
> Metric 'nps1_die_to_dram' not printed in:
> Error:
> Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
> Testing macro_ops_dispatched
> Testing all_l2_cache_accesses
> Testing all_l2_cache_hits
> Testing all_l2_cache_misses
> Testing ic_fetch_miss_ratio
> Testing l2_cache_accesses_from_l2_hwpf
> Testing l2_cache_misses_from_l2_hwpf
> Testing op_cache_fetch_miss_ratio
> Testing l3_read_miss_latency
> Testing l1_itlb_misses
> test child finished with -1
> ---- end ----
> perf all metrics test: FAILED!
>
> After:
>
> 101: perf all metrics test :
> --- start ---
> test child forked, pid 43766
> Testing branch_misprediction_ratio
> Testing all_remote_links_outbound
> Testing nps1_die_to_dram
> Testing macro_ops_dispatched
> Testing all_l2_cache_accesses
> Testing all_l2_cache_hits
> Testing all_l2_cache_misses
> Testing ic_fetch_miss_ratio
> Testing l2_cache_accesses_from_l2_hwpf
> Testing l2_cache_misses_from_l2_hwpf
> Testing op_cache_fetch_miss_ratio
> Testing l3_read_miss_latency
> Testing l1_itlb_misses
> test child finished with 0
> ---- end ----
> perf all metrics test: Ok
>
> Reported-by: Ayush Jain <[email protected]>
> Suggested-by: Ian Rogers <[email protected]>
> Signed-off-by: Sandipan Das <[email protected]>
> Acked-by: Ian Rogers <[email protected]>
> Cc: Adrian Hunter <[email protected]>
> Cc: Alexander Shishkin <[email protected]>
> Cc: Ananth Narayan <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ravi Bangoria <[email protected]>
> Cc: Santosh Shukla <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
>
> > > Ian, I also stumbled on this:
>
> > > [root@five ~]# perf stat -M dram_channel_data_controller_4
> > > Cannot find metric or group `dram_channel_data_controller_4'
> > > ^C
> > > Performance counter stats for 'system wide':
>
> > > 284,908.91 msec cpu-clock # 32.002 CPUs utilized
> > > 6,485,456 context-switches # 22.763 K/sec
> > > 719 cpu-migrations # 2.524 /sec
> > > 32,800 page-faults # 115.125 /sec
>
> <SNIP>
>
> > > I.e. -M should bail out at that point (Cannot find metric or group `dram_channel_data_controller_4'), no?
>
> > We could. I suspect the code has always just not bailed out. I'll put
> > together a patch adding the bail out.
>
> Great, thanks,
Sent:
https://lore.kernel.org/lkml/[email protected]/
Thanks,
Ian
> - Arnaldo