2023-06-15 14:35:41

by Liang, Kan

[permalink] [raw]
Subject: [PATCH V3 0/8] New metricgroup output in perf stat default mode

From: Kan Liang <[email protected]>

Changes since V2:
- Fixes memory leak (Ian)
(Ian, I cannot reproduce the memory leak on all my machines. Please
check whether the fix works on your side. Thanks.)
- Add Reviewed-by tags for several patches.

Changes since V1:
- Remove EVSEL_EVENT_MASK and use the __evsel__match which is suggested
by Ian.
- Support TopdownL1 on both e-core and p-core of ADL in the default
mode. (Ian)
- Have separate patches for the modifications of metricgroup and output.
(Ian)
- Does 2nd sort for the Default metricgroup. Remove the logic of
changing the associated metric event. (Ian)
- Move all the metric related code to stat-shadow (Ian)
- Move the commong functions between stat+csv_output and stat+std_output
to the lib directory (Ian)

In the default mode, the current output of the metricgroup include both
events and metrics, which is not necessary and makes the output hard to
read. Also, different ARCHs (even different generations of the ARCH) may
have a different output format because of the different events in a
metrics.

The patch proposes a new output format which only outputting the value
of each metric and the metricgroup name. It can brings a clean and
consistent output format among ARCHs and generations.

The first patche is a bug fix for the current code.

The patches 2-5 introduce the new metricgroup output.

The patches 6-8 improve the tests to cover the default mode.

Here are some examples for the new output.

STD output:

On SPR

perf stat -a sleep 1

Performance counter stats for 'system wide':

226,054.13 msec cpu-clock # 224.588 CPUs utilized
932 context-switches # 4.123 /sec
224 cpu-migrations # 0.991 /sec
76 page-faults # 0.336 /sec
45,940,682 cycles # 0.000 GHz
36,676,047 instructions # 0.80 insn per cycle
7,044,516 branches # 31.163 K/sec
62,169 branch-misses # 0.88% of all branches
TopdownL1 # 68.7 % tma_backend_bound
# 3.1 % tma_bad_speculation
# 13.0 % tma_frontend_bound
# 15.2 % tma_retiring
TopdownL2 # 2.7 % tma_branch_mispredicts
# 19.6 % tma_core_bound
# 4.8 % tma_fetch_bandwidth
# 8.3 % tma_fetch_latency
# 2.9 % tma_heavy_operations
# 12.3 % tma_light_operations
# 0.4 % tma_machine_clears
# 49.1 % tma_memory_bound

1.006529767 seconds time elapsed

perf stat -a sleep 1

Performance counter stats for 'system wide':

32,127.99 msec cpu-clock # 31.992 CPUs utilized
240 context-switches # 7.470 /sec
32 cpu-migrations # 0.996 /sec
74 page-faults # 2.303 /sec
6,313,960 cpu_core/cycles/ # 0.000 GHz
257,711,907 cpu_atom/cycles/ # 0.008 GHz (54.18%)
4,477,162 cpu_core/instructions/ # 0.71 insn per cycle
37,721,481 cpu_atom/instructions/ # 5.97 insn per cycle (63.33%)
809,747 cpu_core/branches/ # 25.204 K/sec
6,621,226 cpu_atom/branches/ # 206.089 K/sec (63.32%)
39,667 cpu_core/branch-misses/ # 4.90% of all branches
1,032,146 cpu_atom/branch-misses/ # 127.47% of all branches (63.33%)
TopdownL1 (cpu_core) # nan % tma_backend_bound
# 0.0 % tma_bad_speculation
# nan % tma_frontend_bound
# nan % tma_retiring
TopdownL1 (cpu_atom) # 13.6 % tma_bad_speculation (63.36%)
# 41.1 % tma_frontend_bound (63.54%)
# 39.2 % tma_backend_bound
# 39.2 % tma_backend_bound_aux (63.93%)
# 5.4 % tma_retiring (64.15%)

1.004244114 seconds time elapsed

JSON output

on SPR

perf stat --json -a sleep 1
{"counter-value" : "225904.823297", "unit" : "msec", "event" : "cpu-clock", "event-runtime" : 225904323425, "pcnt-running" : 100.00, "metric-value" : "224.456872", "metric-unit" : "CPUs utilized"}
{"counter-value" : "986.000000", "unit" : "", "event" : "context-switches", "event-runtime" : 225904108985, "pcnt-running" : 100.00, "metric-value" : "4.364670", "metric-unit" : "/sec"}
{"counter-value" : "224.000000", "unit" : "", "event" : "cpu-migrations", "event-runtime" : 225904016141, "pcnt-running" : 100.00, "metric-value" : "0.991568", "metric-unit" : "/sec"}
{"counter-value" : "76.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 225903913270, "pcnt-running" : 100.00, "metric-value" : "0.336425", "metric-unit" : "/sec"}
{"counter-value" : "48433482.000000", "unit" : "", "event" : "cycles", "event-runtime" : 225903792732, "pcnt-running" : 100.00, "metric-value" : "0.000214", "metric-unit" : "GHz"}
{"counter-value" : "38620409.000000", "unit" : "", "event" : "instructions", "event-runtime" : 225903657830, "pcnt-running" : 100.00, "metric-value" : "0.797391", "metric-unit" : "insn per cycle"}
{"counter-value" : "7369473.000000", "unit" : "", "event" : "branches", "event-runtime" : 225903464328, "pcnt-running" : 100.00, "metric-value" : "32.622026", "metric-unit" : "K/sec"}
{"counter-value" : "54747.000000", "unit" : "", "event" : "branch-misses", "event-runtime" : 225903234523, "pcnt-running" : 100.00, "metric-value" : "0.742889", "metric-unit" : "of all branches"}
{"event-runtime" : 225902840555, "pcnt-running" : 100.00, "metricgroup" : "TopdownL1"}
{"metric-value" : "69.950631", "metric-unit" : "% tma_backend_bound"}
{"metric-value" : "2.771783", "metric-unit" : "% tma_bad_speculation"}
{"metric-value" : "12.026074", "metric-unit" : "% tma_frontend_bound"}
{"metric-value" : "15.251513", "metric-unit" : "% tma_retiring"}
{"event-runtime" : 225902840555, "pcnt-running" : 100.00, "metricgroup" : "TopdownL2"}
{"metric-value" : "2.351757", "metric-unit" : "% tma_branch_mispredicts"}
{"metric-value" : "19.729771", "metric-unit" : "% tma_core_bound"}
{"metric-value" : "4.555207", "metric-unit" : "% tma_fetch_bandwidth"}
{"metric-value" : "7.470867", "metric-unit" : "% tma_fetch_latency"}
{"metric-value" : "2.938808", "metric-unit" : "% tma_heavy_operations"}
{"metric-value" : "12.312705", "metric-unit" : "% tma_light_operations"}
{"metric-value" : "0.420026", "metric-unit" : "% tma_machine_clears"}
{"metric-value" : "50.220860", "metric-unit" : "% tma_memory_bound"}

On hybrid

perf stat --json -a sleep 1
{"counter-value" : "32131.530625", "unit" : "msec", "event" : "cpu-clock", "event-runtime" : 32131536951, "pcnt-running" : 100.00, "metric-value" : "31.992642", "metric-unit" : "CPUs utilized"}
{"counter-value" : "328.000000", "unit" : "", "event" : "context-switches", "event-runtime" : 32131525778, "pcnt-running" : 100.00, "metric-value" : "10.208042", "metric-unit" : "/sec"}
{"counter-value" : "32.000000", "unit" : "", "event" : "cpu-migrations", "event-runtime" : 32131515104, "pcnt-running" : 100.00, "metric-value" : "0.995906", "metric-unit" : "/sec"}
{"counter-value" : "353.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 32131501396, "pcnt-running" : 100.00, "metric-value" : "10.986094", "metric-unit" : "/sec"}
{"counter-value" : "18685492.000000", "unit" : "", "event" : "cpu_core/cycles/", "event-runtime" : 16061585292, "pcnt-running" : 100.00, "metric-value" : "0.000582", "metric-unit" : "GHz"}
{"counter-value" : "255620352.000000", "unit" : "", "event" : "cpu_atom/cycles/", "event-runtime" : 8690268422, "pcnt-running" : 54.00, "metric-value" : "0.007955", "metric-unit" : "GHz"}
{"counter-value" : "15489913.000000", "unit" : "", "event" : "cpu_core/instructions/", "event-runtime" : 16061582200, "pcnt-running" : 100.00, "metric-value" : "0.828981", "metric-unit" : "insn per cycle"}
{"counter-value" : "38790161.000000", "unit" : "", "event" : "cpu_atom/instructions/", "event-runtime" : 10163133324, "pcnt-running" : 63.00, "metric-value" : "2.075951", "metric-unit" : "insn per cycle"}
{"counter-value" : "2908031.000000", "unit" : "", "event" : "cpu_core/branches/", "event-runtime" : 16061563416, "pcnt-running" : 100.00, "metric-value" : "90.503967", "metric-unit" : "K/sec"}
{"counter-value" : "6814948.000000", "unit" : "", "event" : "cpu_atom/branches/", "event-runtime" : 10161711336, "pcnt-running" : 63.00, "metric-value" : "212.095343", "metric-unit" : "K/sec"}
{"counter-value" : "97638.000000", "unit" : "", "event" : "cpu_core/branch-misses/", "event-runtime" : 16061535261, "pcnt-running" : 100.00, "metric-value" : "3.357530", "metric-unit" : "of all branches"}
{"counter-value" : "1017066.000000", "unit" : "", "event" : "cpu_atom/branch-misses/", "event-runtime" : 10159971797, "pcnt-running" : 63.00, "metric-value" : "34.974386", "metric-unit" : "of all branches"}
{"event-runtime" : 16061513607, "pcnt-running" : 100.00, "metricgroup" : "TopdownL1 (cpu_core)"}
{"metric-value" : "nan", "metric-unit" : "% tma_backend_bound"}
{"metric-value" : "0.000000", "metric-unit" : "% tma_bad_speculation"}
{"metric-value" : "nan", "metric-unit" : "% tma_frontend_bound"}
{"metric-value" : "nan", "metric-unit" : "% tma_retiring"}
{"event-runtime" : 10157398501, "pcnt-running" : 63.00, "metricgroup" : "TopdownL1 (cpu_atom)"}
{"metric-value" : "13.719821", "metric-unit" : "% tma_bad_speculation"}
{"event-runtime" : 10178698656, "pcnt-running" : 63.00, "metric-value" : "41.016738", "metric-unit" : "% tma_frontend_bound"}
{"event-runtime" : 10240582902, "pcnt-running" : 63.00, "metric-value" : "39.327764", "metric-unit" : "% tma_backend_bound"}
{"metric-value" : "39.327764", "metric-unit" : "% tma_backend_bound_aux"}
{"event-runtime" : 10284284920, "pcnt-running" : 64.00, "metric-value" : "5.374638", "metric-unit" : "% tma_retiring"}

CSV output

On SPR

perf stat -x, -a sleep 1
225851.20,msec,cpu-clock,225850700108,100.00,224.431,CPUs utilized
976,,context-switches,225850504803,100.00,4.321,/sec
224,,cpu-migrations,225850410336,100.00,0.992,/sec
76,,page-faults,225850304155,100.00,0.337,/sec
52288305,,cycles,225850188531,100.00,0.000,GHz
37977214,,instructions,225850071251,100.00,0.73,insn per cycle
7299859,,branches,225849890722,100.00,32.322,K/sec
51102,,branch-misses,225849672536,100.00,0.70,of all branches
,225849327050,100.00,,,,TopdownL1
,,,,,70.1,% tma_backend_bound
,,,,,2.7,% tma_bad_speculation
,,,,,12.5,% tma_frontend_bound
,,,,,14.6,% tma_retiring
,225849327050,100.00,,,,TopdownL2
,,,,,2.3,% tma_branch_mispredicts
,,,,,19.6,% tma_core_bound
,,,,,4.6,% tma_fetch_bandwidth
,,,,,7.9,% tma_fetch_latency
,,,,,2.9,% tma_heavy_operations
,,,,,11.7,% tma_light_operations
,,,,,0.5,% tma_machine_clears
,,,,,50.5,% tma_memory_bound

On Hybrid

perf stat -x, -a sleep 1
32139.34,msec,cpu-clock,32139351409,100.00,32.001,CPUs utilized
225,,context-switches,32139342672,100.00,7.001,/sec
32,,cpu-migrations,32139337772,100.00,0.996,/sec
72,,page-faults,32139328384,100.00,2.240,/sec
6766433,,cpu_core/cycles/,16067551558,100.00,0.000,GHz
256500230,,cpu_atom/cycles/,8695757391,54.00,0.008,GHz
4688595,,cpu_core/instructions/,16067558976,100.00,0.69,insn per cycle
37487490,,cpu_atom/instructions/,10165193856,63.00,5.54,insn per cycle
845211,,cpu_core/branches/,16067540225,100.00,26.298,K/sec
6571193,,cpu_atom/branches/,10155940853,63.00,204.459,K/sec
41359,,cpu_core/branch-misses/,16067516493,100.00,4.89,of all branches
1020231,,cpu_atom/branch-misses/,10159363620,63.00,120.71,of all branches
,16067494476,100.00,,,,TopdownL1 (cpu_core)
,,,,,,% tma_backend_bound
,,,,,0.0,% tma_bad_speculation
,,,,,,% tma_frontend_bound
,,,,,,% tma_retiring
,10160989992,63.00,,,,TopdownL1 (cpu_atom)
,,,,,13.8,% tma_bad_speculation
,10188319019,63.00,,,41.3,% tma_frontend_bound
,10258326591,63.00,,,38.6,% tma_backend_bound
,,,,,38.6,% tma_backend_bound_aux
,10282689488,64.00,,,5.4,% tma_retiring

Kan Liang (8):
perf evsel: Fix the annotation for hardware events on hybrid
perf metric: JSON flag to default metric group
perf stat,jevents: Introduce Default tags for the default mode
perf metrics: Sort the Default metricgroup
perf stat: New metricgroup output for the default mode
pert tests: Update metric-value for perf stat JSON output
perf test: Move all the check functions of stat csv output to lib
perf test: Add test case for the standard perf stat output

tools/perf/builtin-stat.c | 5 +-
.../arch/x86/alderlake/adl-metrics.json | 45 +++--
.../arch/x86/alderlaken/adln-metrics.json | 25 ++-
.../arch/x86/icelake/icl-metrics.json | 20 +-
.../arch/x86/icelakex/icx-metrics.json | 20 +-
.../arch/x86/sapphirerapids/spr-metrics.json | 60 +++---
.../arch/x86/tigerlake/tgl-metrics.json | 20 +-
tools/perf/pmu-events/jevents.py | 5 +-
tools/perf/pmu-events/pmu-events.h | 1 +
.../tests/shell/lib/perf_json_output_lint.py | 6 +-
tools/perf/tests/shell/lib/stat_output.sh | 169 ++++++++++++++++
tools/perf/tests/shell/stat+csv_output.sh | 188 ++----------------
tools/perf/tests/shell/stat+std_output.sh | 108 ++++++++++
tools/perf/util/evsel.h | 18 +-
tools/perf/util/metricgroup.c | 43 ++++
tools/perf/util/metricgroup.h | 3 +
tools/perf/util/stat-display.c | 108 +++++++++-
tools/perf/util/stat-shadow.c | 108 ++++++++--
tools/perf/util/stat.h | 15 ++
19 files changed, 686 insertions(+), 281 deletions(-)
create mode 100755 tools/perf/tests/shell/lib/stat_output.sh
create mode 100755 tools/perf/tests/shell/stat+std_output.sh

--
2.35.1



2023-06-15 14:37:36

by Liang, Kan

[permalink] [raw]
Subject: [PATCH V3 4/8] perf metrics: Sort the Default metricgroup

From: Kan Liang <[email protected]>

The new default mode will print the metrics as a metric group. The
metrics from the same metric group must be adjacent to each other in the
metric list. But the metric_list_cmp() sorts metrics by the number of
events.

Add a new sort for the Default metricgroup, which sorts by
default_metricgroup_name and metric_name.

Add is_default in the struct metric_event to indicate that it's from
the Default metricgroup.

Store the displayed metricgroup name of the Default metricgroup into
the metric expr for output.

Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/util/metricgroup.c | 37 +++++++++++++++++++++++++++++++++++
tools/perf/util/metricgroup.h | 3 +++
2 files changed, 40 insertions(+)

diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 8b19644ade7d..e20adbdd5b56 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -79,6 +79,7 @@ static struct rb_node *metric_event_new(struct rblist *rblist __maybe_unused,
return NULL;
memcpy(me, entry, sizeof(struct metric_event));
me->evsel = ((struct metric_event *)entry)->evsel;
+ me->is_default = false;
INIT_LIST_HEAD(&me->head);
return &me->nd;
}
@@ -1160,6 +1161,25 @@ static int metric_list_cmp(void *priv __maybe_unused, const struct list_head *l,
return right_count - left_count;
}

+/**
+ * default_metricgroup_cmp - Implements complex key for the Default metricgroup
+ * that first sorts by default_metricgroup_name, then
+ * metric_name.
+ */
+static int default_metricgroup_cmp(void *priv __maybe_unused,
+ const struct list_head *l,
+ const struct list_head *r)
+{
+ const struct metric *left = container_of(l, struct metric, nd);
+ const struct metric *right = container_of(r, struct metric, nd);
+ int diff = strcmp(right->default_metricgroup_name, left->default_metricgroup_name);
+
+ if (diff)
+ return diff;
+
+ return strcmp(right->metric_name, left->metric_name);
+}
+
struct metricgroup__add_metric_data {
struct list_head *list;
const char *pmu;
@@ -1515,6 +1535,7 @@ static int parse_groups(struct evlist *perf_evlist,
LIST_HEAD(metric_list);
struct metric *m;
bool tool_events[PERF_TOOL_MAX] = {false};
+ bool is_default = !strcmp(str, "Default");
int ret;

if (metric_events_list->nr_entries == 0)
@@ -1549,6 +1570,9 @@ static int parse_groups(struct evlist *perf_evlist,
goto out;
}

+ if (is_default)
+ list_sort(NULL, &metric_list, default_metricgroup_cmp);
+
list_for_each_entry(m, &metric_list, nd) {
struct metric_event *me;
struct evsel **metric_events;
@@ -1637,6 +1661,19 @@ static int parse_groups(struct evlist *perf_evlist,
expr->metric_unit = m->metric_unit;
expr->metric_events = metric_events;
expr->runtime = m->pctx->sctx.runtime;
+ if (m->pmu && strcmp(m->pmu, "cpu")) {
+ char *name;
+
+ if (asprintf(&name, "%s (%s)", m->default_metricgroup_name, m->pmu) < 0)
+ expr->default_metricgroup_name = m->default_metricgroup_name;
+ else {
+ expr->default_metricgroup_name = strdup(name);
+ free(name);
+ }
+ } else
+ expr->default_metricgroup_name = m->default_metricgroup_name;
+ if (is_default)
+ me->is_default = true;
list_add(&expr->nd, &me->head);
}

diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index bf18274c15df..d5325c6ec8e1 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -22,6 +22,7 @@ struct cgroup;
struct metric_event {
struct rb_node nd;
struct evsel *evsel;
+ bool is_default; /* the metric evsel from the Default metricgroup */
struct list_head head; /* list of metric_expr */
};

@@ -55,6 +56,8 @@ struct metric_expr {
* more human intelligible) and then add "MiB" afterward when displayed.
*/
const char *metric_unit;
+ /** Displayed metricgroup name of the Default metricgroup */
+ const char *default_metricgroup_name;
/** Null terminated array of events used by the metric. */
struct evsel **metric_events;
/** Null terminated array of referenced metrics. */
--
2.35.1


2023-06-15 14:37:37

by Liang, Kan

[permalink] [raw]
Subject: [PATCH V3 7/8] perf test: Move all the check functions of stat csv output to lib

From: Kan Liang <[email protected]>

These functions can be shared with the stat std output test.

There is no functional change.

Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/tests/shell/lib/stat_output.sh | 169 +++++++++++++++++++
tools/perf/tests/shell/stat+csv_output.sh | 188 ++--------------------
2 files changed, 184 insertions(+), 173 deletions(-)
create mode 100755 tools/perf/tests/shell/lib/stat_output.sh

diff --git a/tools/perf/tests/shell/lib/stat_output.sh b/tools/perf/tests/shell/lib/stat_output.sh
new file mode 100755
index 000000000000..363979b1123d
--- /dev/null
+++ b/tools/perf/tests/shell/lib/stat_output.sh
@@ -0,0 +1,169 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# Return true if perf_event_paranoid is > $1 and not running as root.
+function ParanoidAndNotRoot()
+{
+ [ "$(id -u)" != 0 ] && [ "$(cat /proc/sys/kernel/perf_event_paranoid)" -gt $1 ]
+}
+
+# $1 name $2 extra_opt
+check_no_args()
+{
+ echo -n "Checking $1 output: no args"
+ perf stat $2 true
+ commachecker --no-args
+ echo "[Success]"
+}
+
+check_system_wide()
+{
+ echo -n "Checking $1 output: system wide "
+ if ParanoidAndNotRoot 0
+ then
+ echo "[Skip] paranoid and not root"
+ return
+ fi
+ perf stat -a $2 true
+ commachecker --system-wide
+ echo "[Success]"
+}
+
+check_system_wide_no_aggr()
+{
+ echo -n "Checking $1 output: system wide no aggregation "
+ if ParanoidAndNotRoot 0
+ then
+ echo "[Skip] paranoid and not root"
+ return
+ fi
+ perf stat -A -a --no-merge $2 true
+ commachecker --system-wide-no-aggr
+ echo "[Success]"
+}
+
+check_interval()
+{
+ echo -n "Checking $1 output: interval "
+ perf stat -I 1000 $2 true
+ commachecker --interval
+ echo "[Success]"
+}
+
+check_event()
+{
+ echo -n "Checking $1 output: event "
+ perf stat -e cpu-clock $2 true
+ commachecker --event
+ echo "[Success]"
+}
+
+check_per_core()
+{
+ echo -n "Checking $1 output: per core "
+ if ParanoidAndNotRoot 0
+ then
+ echo "[Skip] paranoid and not root"
+ return
+ fi
+ perf stat --per-core -a $2 true
+ commachecker --per-core
+ echo "[Success]"
+}
+
+check_per_thread()
+{
+ echo -n "Checking $1 output: per thread "
+ if ParanoidAndNotRoot 0
+ then
+ echo "[Skip] paranoid and not root"
+ return
+ fi
+ perf stat --per-thread -a $2 true
+ commachecker --per-thread
+ echo "[Success]"
+}
+
+check_per_cache_instance()
+{
+ echo -n "Checking $1 output: per cache instance "
+ if ParanoidAndNotRoot 0
+ then
+ echo "[Skip] paranoid and not root"
+ return
+ fi
+ perf stat --per-cache -a $2 true
+ commachecker --per-cache
+ echo "[Success]"
+}
+
+check_per_die()
+{
+ echo -n "Checking $1 output: per die "
+ if ParanoidAndNotRoot 0
+ then
+ echo "[Skip] paranoid and not root"
+ return
+ fi
+ perf stat --per-die -a $2 true
+ commachecker --per-die
+ echo "[Success]"
+}
+
+check_per_node()
+{
+ echo -n "Checking $1 output: per node "
+ if ParanoidAndNotRoot 0
+ then
+ echo "[Skip] paranoid and not root"
+ return
+ fi
+ perf stat --per-node -a $2 true
+ commachecker --per-node
+ echo "[Success]"
+}
+
+check_per_socket()
+{
+ echo -n "Checking $1 output: per socket "
+ if ParanoidAndNotRoot 0
+ then
+ echo "[Skip] paranoid and not root"
+ return
+ fi
+ perf stat --per-socket -a $2 true
+ commachecker --per-socket
+ echo "[Success]"
+}
+
+# The perf stat options for per-socket, per-core, per-die
+# and -A ( no_aggr mode ) uses the info fetched from this
+# directory: "/sys/devices/system/cpu/cpu*/topology". For
+# example, socket value is fetched from "physical_package_id"
+# file in topology directory.
+# Reference: cpu__get_topology_int in util/cpumap.c
+# If the platform doesn't expose topology information, values
+# will be set to -1. For example, incase of pSeries platform
+# of powerpc, value for "physical_package_id" is restricted
+# and set to -1. Check here validates the socket-id read from
+# topology file before proceeding further
+
+FILE_LOC="/sys/devices/system/cpu/cpu*/topology/"
+FILE_NAME="physical_package_id"
+
+function check_for_topology()
+{
+ if ! ParanoidAndNotRoot 0
+ then
+ socket_file=`ls $FILE_LOC/$FILE_NAME | head -n 1`
+ [ -z $socket_file ] && {
+ echo 0
+ return
+ }
+ socket_id=`cat $socket_file`
+ [ $socket_id == -1 ] && {
+ echo 1
+ return
+ }
+ fi
+ echo 0
+}
diff --git a/tools/perf/tests/shell/stat+csv_output.sh b/tools/perf/tests/shell/stat+csv_output.sh
index ed082daf839c..34a0701fee05 100755
--- a/tools/perf/tests/shell/stat+csv_output.sh
+++ b/tools/perf/tests/shell/stat+csv_output.sh
@@ -6,7 +6,8 @@

set -e

-skip_test=0
+. $(dirname $0)/lib/stat_output.sh
+
csv_sep=@

stat_output=$(mktemp /tmp/__perf_test.stat_output.csv.XXXXX)
@@ -63,181 +64,22 @@ function commachecker()
return 0
}

-# Return true if perf_event_paranoid is > $1 and not running as root.
-function ParanoidAndNotRoot()
-{
- [ "$(id -u)" != 0 ] && [ "$(cat /proc/sys/kernel/perf_event_paranoid)" -gt $1 ]
-}
-
-check_no_args()
-{
- echo -n "Checking CSV output: no args "
- perf stat -x$csv_sep -o "${stat_output}" true
- commachecker --no-args
- echo "[Success]"
-}
-
-check_system_wide()
-{
- echo -n "Checking CSV output: system wide "
- if ParanoidAndNotRoot 0
- then
- echo "[Skip] paranoid and not root"
- return
- fi
- perf stat -x$csv_sep -a -o "${stat_output}" true
- commachecker --system-wide
- echo "[Success]"
-}
-
-check_system_wide_no_aggr()
-{
- echo -n "Checking CSV output: system wide no aggregation "
- if ParanoidAndNotRoot 0
- then
- echo "[Skip] paranoid and not root"
- return
- fi
- perf stat -x$csv_sep -A -a --no-merge -o "${stat_output}" true
- commachecker --system-wide-no-aggr
- echo "[Success]"
-}
-
-check_interval()
-{
- echo -n "Checking CSV output: interval "
- perf stat -x$csv_sep -I 1000 -o "${stat_output}" true
- commachecker --interval
- echo "[Success]"
-}
-
-
-check_event()
-{
- echo -n "Checking CSV output: event "
- perf stat -x$csv_sep -e cpu-clock -o "${stat_output}" true
- commachecker --event
- echo "[Success]"
-}
-
-check_per_core()
-{
- echo -n "Checking CSV output: per core "
- if ParanoidAndNotRoot 0
- then
- echo "[Skip] paranoid and not root"
- return
- fi
- perf stat -x$csv_sep --per-core -a -o "${stat_output}" true
- commachecker --per-core
- echo "[Success]"
-}
-
-check_per_thread()
-{
- echo -n "Checking CSV output: per thread "
- if ParanoidAndNotRoot 0
- then
- echo "[Skip] paranoid and not root"
- return
- fi
- perf stat -x$csv_sep --per-thread -a -o "${stat_output}" true
- commachecker --per-thread
- echo "[Success]"
-}
-
-check_per_cache_instance()
-{
- echo -n "Checking CSV output: per cache instance "
- if ParanoidAndNotRoot 0
- then
- echo "[Skip] paranoid and not root"
- return
- fi
- perf stat -x$csv_sep --per-cache -a true 2>&1 | commachecker --per-cache
- echo "[Success]"
-}
-
-check_per_die()
-{
- echo -n "Checking CSV output: per die "
- if ParanoidAndNotRoot 0
- then
- echo "[Skip] paranoid and not root"
- return
- fi
- perf stat -x$csv_sep --per-die -a -o "${stat_output}" true
- commachecker --per-die
- echo "[Success]"
-}
-
-check_per_node()
-{
- echo -n "Checking CSV output: per node "
- if ParanoidAndNotRoot 0
- then
- echo "[Skip] paranoid and not root"
- return
- fi
- perf stat -x$csv_sep --per-node -a -o "${stat_output}" true
- commachecker --per-node
- echo "[Success]"
-}
-
-check_per_socket()
-{
- echo -n "Checking CSV output: per socket "
- if ParanoidAndNotRoot 0
- then
- echo "[Skip] paranoid and not root"
- return
- fi
- perf stat -x$csv_sep --per-socket -a -o "${stat_output}" true
- commachecker --per-socket
- echo "[Success]"
-}
-
-# The perf stat options for per-socket, per-core, per-die
-# and -A ( no_aggr mode ) uses the info fetched from this
-# directory: "/sys/devices/system/cpu/cpu*/topology". For
-# example, socket value is fetched from "physical_package_id"
-# file in topology directory.
-# Reference: cpu__get_topology_int in util/cpumap.c
-# If the platform doesn't expose topology information, values
-# will be set to -1. For example, incase of pSeries platform
-# of powerpc, value for "physical_package_id" is restricted
-# and set to -1. Check here validates the socket-id read from
-# topology file before proceeding further
-
-FILE_LOC="/sys/devices/system/cpu/cpu*/topology/"
-FILE_NAME="physical_package_id"
-
-check_for_topology()
-{
- if ! ParanoidAndNotRoot 0
- then
- socket_file=`ls $FILE_LOC/$FILE_NAME | head -n 1`
- [ -z $socket_file ] && return 0
- socket_id=`cat $socket_file`
- [ $socket_id == -1 ] && skip_test=1
- return 0
- fi
-}
+perf_cmd="-x$csv_sep -o ${stat_output}"

-check_for_topology
-check_no_args
-check_system_wide
-check_interval
-check_event
-check_per_thread
-check_per_node
+skip_test=$(check_for_topology)
+check_no_args "CSV" "$perf_cmd"
+check_system_wide "CSV" "$perf_cmd"
+check_interval "CSV" "$perf_cmd"
+check_event "CSV" "$perf_cmd"
+check_per_thread "CSV" "$perf_cmd"
+check_per_node "CSV" "$perf_cmd"
if [ $skip_test -ne 1 ]
then
- check_system_wide_no_aggr
- check_per_core
- check_per_cache_instance
- check_per_die
- check_per_socket
+ check_system_wide_no_aggr "CSV" "$perf_cmd"
+ check_per_core "CSV" "$perf_cmd"
+ check_per_cache_instance "CSV" "$perf_cmd"
+ check_per_die "CSV" "$perf_cmd"
+ check_per_socket "CSV" "$perf_cmd"
else
echo "[Skip] Skipping tests for system_wide_no_aggr, per_core, per_die and per_socket since socket id exposed via topology is invalid"
fi
--
2.35.1


2023-06-15 14:39:24

by Liang, Kan

[permalink] [raw]
Subject: [PATCH V3 6/8] pert tests: Update metric-value for perf stat JSON output

From: Kan Liang <[email protected]>

There may be multiplexing triggered, e.g., e-core of ADL.

Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/tests/shell/lib/perf_json_output_lint.py | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/tests/shell/lib/perf_json_output_lint.py b/tools/perf/tests/shell/lib/perf_json_output_lint.py
index 5e9bd68c83fe..ea55d5ea1ced 100644
--- a/tools/perf/tests/shell/lib/perf_json_output_lint.py
+++ b/tools/perf/tests/shell/lib/perf_json_output_lint.py
@@ -66,10 +66,10 @@ def check_json_output(expected_items):
for item in json.loads(input):
if expected_items != -1:
count = len(item)
- if count != expected_items and count >= 1 and count <= 4 and 'metric-value' in item:
+ if count != expected_items and count >= 1 and count <= 6 and 'metric-value' in item:
# Events that generate >1 metric may have isolated metric
- # values and possibly other prefixes like interval, core and
- # aggregate-number.
+ # values and possibly other prefixes like interval, core,
+ # aggregate-number, or event-runtime/pcnt-running from multiplexing.
pass
elif count != expected_items and count >= 1 and count <= 5 and 'metricgroup' in item:
pass
--
2.35.1


2023-06-15 14:39:27

by Liang, Kan

[permalink] [raw]
Subject: [PATCH V3 2/8] perf metric: JSON flag to default metric group

From: Kan Liang <[email protected]>

For the default output, the default metric group could vary on different
platforms. For example, on SPR, the TopdownL1 and TopdownL2 metrics
should be displayed in the default mode. On ICL, only the TopdownL1
should be displayed.

Add a flag so we can tag the default metric group for different
platforms rather than hack the perf code.

The flag is added to Intel TopdownL1 since ICL and ADL, TopdownL2 metrics
since SPR.

Add a new field, DefaultMetricgroupName, in the JSON file to indicate
the real metric group name.

Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
.../arch/x86/alderlake/adl-metrics.json | 45 ++++++++------
.../arch/x86/alderlaken/adln-metrics.json | 25 ++++----
.../arch/x86/icelake/icl-metrics.json | 20 ++++---
.../arch/x86/icelakex/icx-metrics.json | 20 ++++---
.../arch/x86/sapphirerapids/spr-metrics.json | 60 +++++++++++--------
.../arch/x86/tigerlake/tgl-metrics.json | 20 ++++---
6 files changed, 114 insertions(+), 76 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
index c9f7e3d4ab08..85fb975b6f56 100644
--- a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
@@ -129,33 +129,36 @@
},
{
"BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "TOPDOWN_BE_BOUND.ALL / tma_info_core_slots",
- "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.1",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. The rest of these subevents count backend stalls, in cycles, due to an outstanding request which is memory bound vs core bound. The subevents are not slot based events and therefore can not be precisely added or subtracted from the Backend_Bound_Aux subevents which are slot based.",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
},
{
"BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "tma_backend_bound",
- "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound_aux",
"MetricThreshold": "tma_backend_bound_aux > 0.2",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that UOPS must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. All of these subevents count backend stalls, in slots, due to a resource limitation. These are not cycle based events and therefore can not be precisely added or subtracted from the Backend_Bound subevents which are cycle based. These subevents are supplementary to Backend_Bound and can be used to analyze results from a resource perspective at allocation.",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
},
{
"BriefDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "(tma_info_core_slots - (cpu_atom@TOPDOWN_FE_BOUND.ALL@ + cpu_atom@TOPDOWN_BE_BOUND.ALL@ + cpu_atom@TOPDOWN_RETIRING.ALL@)) / tma_info_core_slots",
- "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear. Only issue slots wasted due to fast nukes such as memory ordering nukes are counted. Other nukes are not accounted for. Counts all issue slots blocked during this recovery window including relevant microcode flows and while uops are not yet available in the instruction queue (IQ). Also includes the issue slots that were consumed by the backend but were thrown away because they were younger than the mispredict or machine clear.",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
@@ -295,11 +298,12 @@
},
{
"BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to frontend stalls.",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "TOPDOWN_FE_BOUND.ALL / tma_info_core_slots",
- "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.2",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
},
@@ -722,11 +726,12 @@
},
{
"BriefDescription": "Counts the numer of issue slots that result in retirement slots.",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "TOPDOWN_RETIRING.ALL / tma_info_core_slots",
- "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.75",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
},
@@ -832,22 +837,24 @@
},
{
"BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "cpu_core@topdown\\-be\\-bound@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots",
- "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
"ScaleUnit": "100%",
"Unit": "cpu_core"
},
{
"BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)",
- "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%",
"Unit": "cpu_core"
@@ -1112,11 +1119,12 @@
},
{
"BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "cpu_core@topdown\\-fe\\-bound@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) - cpu_core@INT_MISC.UOP_DROPPING@ / tma_info_thread_slots",
- "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
"ScaleUnit": "100%",
"Unit": "cpu_core"
@@ -2316,11 +2324,12 @@
},
{
"BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "cpu_core@topdown\\-retiring@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots",
- "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
"ScaleUnit": "100%",
"Unit": "cpu_core"
diff --git a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json b/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
index ed9ff25a03cf..0f1628d698da 100644
--- a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
@@ -94,31 +94,34 @@
},
{
"BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "TOPDOWN_BE_BOUND.ALL / tma_info_core_slots",
- "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.1",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. The rest of these subevents count backend stalls, in cycles, due to an outstanding request which is memory bound vs core bound. The subevents are not slot based events and therefore can not be precisely added or subtracted from the Backend_Bound_Aux subevents which are slot based.",
"ScaleUnit": "100%"
},
{
"BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "tma_backend_bound",
- "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound_aux",
"MetricThreshold": "tma_backend_bound_aux > 0.2",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that UOPS must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. All of these subevents count backend stalls, in slots, due to a resource limitation. These are not cycle based events and therefore can not be precisely added or subtracted from the Backend_Bound subevents which are cycle based. These subevents are supplementary to Backend_Bound and can be used to analyze results from a resource perspective at allocation.",
"ScaleUnit": "100%"
},
{
"BriefDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "(tma_info_core_slots - (TOPDOWN_FE_BOUND.ALL + TOPDOWN_BE_BOUND.ALL + TOPDOWN_RETIRING.ALL)) / tma_info_core_slots",
- "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear. Only issue slots wasted due to fast nukes such as memory ordering nukes are counted. Other nukes are not accounted for. Counts all issue slots blocked during this recovery window including relevant microcode flows and while uops are not yet available in the instruction queue (IQ). Also includes the issue slots that were consumed by the backend but were thrown away because they were younger than the mispredict or machine clear.",
"ScaleUnit": "100%"
},
@@ -243,11 +246,12 @@
},
{
"BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to frontend stalls.",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "TOPDOWN_FE_BOUND.ALL / tma_info_core_slots",
- "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.2",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"ScaleUnit": "100%"
},
{
@@ -612,11 +616,12 @@
},
{
"BriefDescription": "Counts the numer of issue slots that result in retirement slots.",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "TOPDOWN_RETIRING.ALL / tma_info_core_slots",
- "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.75",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"ScaleUnit": "100%"
},
{
diff --git a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
index 20210742171d..cc4edf855064 100644
--- a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
@@ -111,21 +111,23 @@
},
{
"BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 5 * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=1\\,edge@ / tma_info_thread_slots",
- "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
"ScaleUnit": "100%"
},
{
"BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)",
- "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -372,11 +374,12 @@
},
{
"BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_thread_slots",
- "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
"ScaleUnit": "100%"
},
@@ -1378,11 +1381,12 @@
},
{
"BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
- "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
index ef25cda019be..6f25b5b7aaf6 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
@@ -315,21 +315,23 @@
},
{
"BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 5 * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=1\\,edge@ / tma_info_thread_slots",
- "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
"ScaleUnit": "100%"
},
{
"BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)",
- "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -576,11 +578,12 @@
},
{
"BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_thread_slots",
- "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
"ScaleUnit": "100%"
},
@@ -1674,11 +1677,12 @@
},
{
"BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
- "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
index 4f3dd85540b6..c732982f70b5 100644
--- a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
@@ -340,31 +340,34 @@
},
{
"BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
- "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
"ScaleUnit": "100%"
},
{
"BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)",
- "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
{
"BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction",
+ "DefaultMetricgroupName": "TopdownL2",
"MetricExpr": "topdown\\-br\\-mispredict / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
- "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
+ "MetricGroup": "BadSpec;BrMispredicts;Default;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
"MetricName": "tma_branch_mispredicts",
"MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
- "MetricgroupNoGroup": "TopdownL2",
+ "MetricgroupNoGroup": "TopdownL2;Default",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: TOPDOWN.BR_MISPREDICT_SLOTS. Related metrics: tma_info_bad_spec_branch_misprediction_cost, tma_info_bottleneck_mispredictions, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
@@ -407,11 +410,12 @@
},
{
"BriefDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck",
+ "DefaultMetricgroupName": "TopdownL2",
"MetricExpr": "max(0, tma_backend_bound - tma_memory_bound)",
- "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
+ "MetricGroup": "Backend;Compute;Default;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
"MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
- "MetricgroupNoGroup": "TopdownL2",
+ "MetricgroupNoGroup": "TopdownL2;Default",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
@@ -509,21 +513,23 @@
},
{
"BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues",
+ "DefaultMetricgroupName": "TopdownL2",
"MetricExpr": "max(0, tma_frontend_bound - tma_fetch_latency)",
- "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
+ "MetricGroup": "Default;FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
"MetricName": "tma_fetch_bandwidth",
"MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_thread_ipc / 6 > 0.35",
- "MetricgroupNoGroup": "TopdownL2",
+ "MetricgroupNoGroup": "TopdownL2;Default",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp",
"ScaleUnit": "100%"
},
{
"BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues",
+ "DefaultMetricgroupName": "TopdownL2",
"MetricExpr": "topdown\\-fetch\\-lat / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_thread_slots",
- "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
+ "MetricGroup": "Default;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
"MetricName": "tma_fetch_latency",
"MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
- "MetricgroupNoGroup": "TopdownL2",
+ "MetricgroupNoGroup": "TopdownL2;Default",
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS",
"ScaleUnit": "100%"
},
@@ -611,11 +617,12 @@
},
{
"BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_thread_slots",
- "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
"ScaleUnit": "100%"
},
@@ -630,11 +637,12 @@
},
{
"BriefDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences",
+ "DefaultMetricgroupName": "TopdownL2",
"MetricExpr": "topdown\\-heavy\\-ops / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
- "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
+ "MetricGroup": "Default;Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
"MetricThreshold": "tma_heavy_operations > 0.1",
- "MetricgroupNoGroup": "TopdownL2",
+ "MetricgroupNoGroup": "TopdownL2;Default",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences. Sample with: UOPS_RETIRED.HEAVY",
"ScaleUnit": "100%"
},
@@ -1486,11 +1494,12 @@
},
{
"BriefDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation)",
+ "DefaultMetricgroupName": "TopdownL2",
"MetricExpr": "max(0, tma_retiring - tma_heavy_operations)",
- "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
+ "MetricGroup": "Default;Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_light_operations",
"MetricThreshold": "tma_light_operations > 0.6",
- "MetricgroupNoGroup": "TopdownL2",
+ "MetricgroupNoGroup": "TopdownL2;Default",
"PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
"ScaleUnit": "100%"
},
@@ -1540,11 +1549,12 @@
},
{
"BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears",
+ "DefaultMetricgroupName": "TopdownL2",
"MetricExpr": "max(0, tma_bad_speculation - tma_branch_mispredicts)",
- "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
+ "MetricGroup": "BadSpec;Default;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
"MetricName": "tma_machine_clears",
"MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
- "MetricgroupNoGroup": "TopdownL2",
+ "MetricgroupNoGroup": "TopdownL2;Default",
"PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
"ScaleUnit": "100%"
},
@@ -1576,11 +1586,12 @@
},
{
"BriefDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck",
+ "DefaultMetricgroupName": "TopdownL2",
"MetricExpr": "topdown\\-mem\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
- "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
+ "MetricGroup": "Backend;Default;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_memory_bound",
"MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
- "MetricgroupNoGroup": "TopdownL2",
+ "MetricgroupNoGroup": "TopdownL2;Default",
"PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
@@ -1784,11 +1795,12 @@
},
{
"BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
- "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
"ScaleUnit": "100%"
},
diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json b/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
index d0538a754288..83346911aa63 100644
--- a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
@@ -105,21 +105,23 @@
},
{
"BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 5 * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=1\\,edge@ / tma_info_thread_slots",
- "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
"MetricThreshold": "tma_backend_bound > 0.2",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
"ScaleUnit": "100%"
},
{
"BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)",
- "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_bad_speculation",
"MetricThreshold": "tma_bad_speculation > 0.15",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
@@ -366,11 +368,12 @@
},
{
"BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_thread_slots",
- "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
"ScaleUnit": "100%"
},
@@ -1392,11 +1395,12 @@
},
{
"BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
+ "DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
- "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
- "MetricgroupNoGroup": "TopdownL1",
+ "MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
"ScaleUnit": "100%"
},
--
2.35.1


2023-06-15 14:41:18

by Liang, Kan

[permalink] [raw]
Subject: [PATCH V3 5/8] perf stat: New metricgroup output for the default mode

From: Kan Liang <[email protected]>

In the default mode, the current output of the metricgroup include both
events and metrics, which is not necessary and just makes the output
hard to read. Since different ARCHs (even different generations in the
same ARCH) may use different events. The output also vary on different
platforms.

For a metricgroup, only outputting the value of each metric is good
enough.

Add a new field default_metricgroup in evsel to indicate an event of
the default metricgroup. For those events, printout() should print
the metricgroup name rather than each event.

Add perf_stat__skip_metric_event() to skip the evsel in the Default
metricgroup, if it's not running or not the metric event.

Add print_metricgroup_header_t to pass the functions which print the
display name of each metricgroup in the Default metricgroup. Support
all three output methods.

Factor out perf_stat__print_shadow_stats_metricgroup() to print out
each metrics.

On SPR
Before:

./perf_old stat sleep 1

Performance counter stats for 'sleep 1':

0.54 msec task-clock:u # 0.001 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
68 page-faults:u # 125.445 K/sec
540,970 cycles:u # 0.998 GHz
556,325 instructions:u # 1.03 insn per cycle
123,602 branches:u # 228.018 M/sec
6,889 branch-misses:u # 5.57% of all branches
3,245,820 TOPDOWN.SLOTS:u # 18.4 % tma_backend_bound
# 17.2 % tma_retiring
# 23.1 % tma_bad_speculation
# 41.4 % tma_frontend_bound
564,859 topdown-retiring:u
1,370,999 topdown-fe-bound:u
603,271 topdown-be-bound:u
744,874 topdown-bad-spec:u
12,661 INT_MISC.UOP_DROPPING:u # 23.357 M/sec

1.001798215 seconds time elapsed

0.000193000 seconds user
0.001700000 seconds sys

After:

$ ./perf stat sleep 1

Performance counter stats for 'sleep 1':

0.51 msec task-clock:u # 0.001 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
68 page-faults:u # 132.683 K/sec
545,228 cycles:u # 1.064 GHz
555,509 instructions:u # 1.02 insn per cycle
123,574 branches:u # 241.120 M/sec
6,957 branch-misses:u # 5.63% of all branches
TopdownL1 # 17.5 % tma_backend_bound
# 22.6 % tma_bad_speculation
# 42.7 % tma_frontend_bound
# 17.1 % tma_retiring
TopdownL2 # 21.8 % tma_branch_mispredicts
# 11.5 % tma_core_bound
# 13.4 % tma_fetch_bandwidth
# 29.3 % tma_fetch_latency
# 2.7 % tma_heavy_operations
# 14.5 % tma_light_operations
# 0.8 % tma_machine_clears
# 6.1 % tma_memory_bound

1.001712086 seconds time elapsed

0.000151000 seconds user
0.001618000 seconds sys

Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/builtin-stat.c | 1 +
tools/perf/util/evsel.h | 1 +
tools/perf/util/stat-display.c | 108 ++++++++++++++++++++++++++++++---
tools/perf/util/stat-shadow.c | 108 +++++++++++++++++++++++++++++----
tools/perf/util/stat.h | 15 +++++
5 files changed, 211 insertions(+), 22 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 55601b4b5c34..3f4e76f76f94 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2172,6 +2172,7 @@ static int add_default_attributes(void)

evlist__for_each_entry(metric_evlist, metric_evsel) {
metric_evsel->skippable = true;
+ metric_evsel->default_metricgroup = true;
}
evlist__splice_list_tail(evsel_list, &metric_evlist->core.entries);
evlist__delete(metric_evlist);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index cc6fb3049b99..9f06d6cd5379 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -131,6 +131,7 @@ struct evsel {
bool reset_group;
bool errored;
bool needs_auxtrace_mmap;
+ bool default_metricgroup; /* A member of the Default metricgroup */
struct hashmap *per_pkg_mask;
int err;
struct {
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index a2bbdc25d979..7329b3340f88 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -25,6 +25,7 @@
#define CNTR_NOT_SUPPORTED "<not supported>"
#define CNTR_NOT_COUNTED "<not counted>"

+#define MGROUP_LEN 50
#define METRIC_LEN 38
#define EVNAME_LEN 32
#define COUNTS_LEN 18
@@ -364,16 +365,27 @@ static void new_line_std(struct perf_stat_config *config __maybe_unused,
os->newline = true;
}

-static void do_new_line_std(struct perf_stat_config *config,
- struct outstate *os)
+static inline void __new_line_std_csv(struct perf_stat_config *config,
+ struct outstate *os)
{
fputc('\n', os->fh);
if (os->prefix)
fputs(os->prefix, os->fh);
aggr_printout(config, os->evsel, os->id, os->aggr_nr);
+}
+
+static inline void __new_line_std(struct outstate *os)
+{
+ fprintf(os->fh, " ");
+}
+
+static void do_new_line_std(struct perf_stat_config *config,
+ struct outstate *os)
+{
+ __new_line_std_csv(config, os);
if (config->aggr_mode == AGGR_NONE)
fprintf(os->fh, " ");
- fprintf(os->fh, " ");
+ __new_line_std(os);
}

static void print_metric_std(struct perf_stat_config *config,
@@ -408,10 +420,7 @@ static void new_line_csv(struct perf_stat_config *config, void *ctx)
struct outstate *os = ctx;
int i;

- fputc('\n', os->fh);
- if (os->prefix)
- fprintf(os->fh, "%s", os->prefix);
- aggr_printout(config, os->evsel, os->id, os->aggr_nr);
+ __new_line_std_csv(config, os);
for (i = 0; i < os->nfields; i++)
fputs(config->csv_sep, os->fh);
}
@@ -462,6 +471,54 @@ static void new_line_json(struct perf_stat_config *config, void *ctx)
aggr_printout(config, os->evsel, os->id, os->aggr_nr);
}

+static void print_metricgroup_header_json(struct perf_stat_config *config,
+ void *ctx,
+ const char *metricgroup_name)
+{
+ if (!metricgroup_name)
+ return;
+
+ fprintf(config->output, "\"metricgroup\" : \"%s\"}", metricgroup_name);
+ new_line_json(config, ctx);
+}
+
+static void print_metricgroup_header_csv(struct perf_stat_config *config,
+ void *ctx,
+ const char *metricgroup_name)
+{
+ struct outstate *os = ctx;
+ int i;
+
+ if (!metricgroup_name) {
+ /* Leave space for running and enabling */
+ for (i = 0; i < os->nfields - 2; i++)
+ fputs(config->csv_sep, os->fh);
+ return;
+ }
+
+ for (i = 0; i < os->nfields; i++)
+ fputs(config->csv_sep, os->fh);
+ fprintf(config->output, "%s", metricgroup_name);
+ new_line_csv(config, ctx);
+}
+
+static void print_metricgroup_header_std(struct perf_stat_config *config,
+ void *ctx,
+ const char *metricgroup_name)
+{
+ struct outstate *os = ctx;
+ int n;
+
+ if (!metricgroup_name) {
+ __new_line_std(os);
+ return;
+ }
+
+ n = fprintf(config->output, " %*s", EVNAME_LEN, metricgroup_name);
+
+ fprintf(config->output, "%*s", MGROUP_LEN - n - 1, "");
+}
+
/* Filter out some columns that don't work well in metrics only mode */

static bool valid_only_metric(const char *unit)
@@ -713,19 +770,23 @@ static void printout(struct perf_stat_config *config, struct outstate *os,
struct perf_stat_output_ctx out;
print_metric_t pm;
new_line_t nl;
+ print_metricgroup_header_t pmh;
bool ok = true;
struct evsel *counter = os->evsel;

if (config->csv_output) {
pm = config->metric_only ? print_metric_only_csv : print_metric_csv;
nl = config->metric_only ? new_line_metric : new_line_csv;
+ pmh = print_metricgroup_header_csv;
os->nfields = 4 + (counter->cgrp ? 1 : 0);
} else if (config->json_output) {
pm = config->metric_only ? print_metric_only_json : print_metric_json;
nl = config->metric_only ? new_line_metric : new_line_json;
+ pmh = print_metricgroup_header_json;
} else {
pm = config->metric_only ? print_metric_only : print_metric_std;
nl = config->metric_only ? new_line_metric : new_line_std;
+ pmh = print_metricgroup_header_std;
}

if (run == 0 || ena == 0 || counter->counts->scaled == -1) {
@@ -747,10 +808,11 @@ static void printout(struct perf_stat_config *config, struct outstate *os,

out.print_metric = pm;
out.new_line = nl;
+ out.print_metricgroup_header = pmh;
out.ctx = os;
out.force_header = false;

- if (!config->metric_only) {
+ if (!config->metric_only && !counter->default_metricgroup) {
abs_printout(config, os->id, os->aggr_nr, counter, uval, ok);

print_noise(config, counter, noise, /*before_metric=*/true);
@@ -758,8 +820,31 @@ static void printout(struct perf_stat_config *config, struct outstate *os,
}

if (ok) {
- perf_stat__print_shadow_stats(config, counter, uval, aggr_idx,
- &out, &config->metric_events);
+ if (!config->metric_only && counter->default_metricgroup) {
+ void *from = NULL;
+
+ aggr_printout(config, os->evsel, os->id, os->aggr_nr);
+ /* Print out all the metricgroup with the same metric event. */
+ do {
+ int num = 0;
+
+ /* Print out the new line for the next new metricgroup. */
+ if (from) {
+ if (config->json_output)
+ new_line_json(config, (void *)os);
+ else
+ __new_line_std_csv(config, os);
+ }
+
+ print_noise(config, counter, noise, /*before_metric=*/true);
+ print_running(config, run, ena, /*before_metric=*/true);
+ from = perf_stat__print_shadow_stats_metricgroup(config, counter, aggr_idx,
+ &num, from, &out,
+ &config->metric_events);
+ } while (from != NULL);
+ } else
+ perf_stat__print_shadow_stats(config, counter, uval, aggr_idx,
+ &out, &config->metric_events);
} else {
pm(config, os, /*color=*/NULL, /*format=*/NULL, /*unit=*/"", /*val=*/0);
}
@@ -889,6 +974,9 @@ static void print_counter_aggrdata(struct perf_stat_config *config,
ena = aggr->counts.ena;
run = aggr->counts.run;

+ if (perf_stat__skip_metric_event(counter, &config->metric_events, ena, run))
+ return;
+
if (val == 0 && should_skip_zero_counter(config, counter, &id))
return;

diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 1566a206ba42..b25974670d30 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -539,6 +539,83 @@ double test_generic_metric(struct metric_expr *mexp, int aggr_idx)
return ratio;
}

+/**
+ * perf_stat__print_shadow_stats_metricgroup - Print out metrics associated with the evsel
+ * For the non-default, all metrics associated
+ * with the evsel are printed.
+ * For the default mode, only the metrics from
+ * the same metricgroup and the name of the
+ * metricgroup are printed. To print the metrics
+ * from the next metricgroup (if available),
+ * invoke the function with correspoinding
+ * metric_expr.
+ */
+void *perf_stat__print_shadow_stats_metricgroup(struct perf_stat_config *config,
+ struct evsel *evsel,
+ int aggr_idx,
+ int *num,
+ void *from,
+ struct perf_stat_output_ctx *out,
+ struct rblist *metric_events)
+{
+ struct metric_event *me;
+ struct metric_expr *mexp = from;
+ void *ctxp = out->ctx;
+ bool header_printed = false;
+ const char *name = NULL;
+ static const char *last_name;
+
+ me = metricgroup__lookup(metric_events, evsel, false);
+ if (me == NULL)
+ return NULL;
+
+ if (!mexp)
+ mexp = list_first_entry(&me->head, typeof(*mexp), nd);
+
+ list_for_each_entry_from(mexp, &me->head, nd) {
+ /* Print the display name of the Default metricgroup */
+ if (me->is_default) {
+ if (!name)
+ name = mexp->default_metricgroup_name;
+ /*
+ * Two or more metricgroup may share the same metric
+ * event, e.g., TopdownL1 and TopdownL2 on SPR.
+ * Return and print the prefix, e.g., noise, running
+ * for the next metricgroup.
+ */
+ if (strcmp(name, mexp->default_metricgroup_name))
+ return (void *)mexp;
+ /* Only print the name of the metricgroup once */
+ if (!header_printed) {
+ header_printed = true;
+ if (!last_name || strcmp(last_name, name)) {
+ /* Print out the name for the new metricgroup. */
+ out->print_metricgroup_header(config, ctxp, name);
+ last_name = name;
+ } else if (!strcmp(last_name, name)) {
+ /*
+ * A metricgroup may have several metric events,
+ * e.g.,TopdownL1 on e-core of ADL.
+ * The name has been output by the first metric
+ * event. Only align with other metics from
+ * different metric events.
+ */
+ out->print_metricgroup_header(config, ctxp, NULL);
+ }
+ }
+ }
+
+ if ((*num)++ > 0)
+ out->new_line(config, ctxp);
+ generic_metric(config, mexp->metric_expr, mexp->metric_threshold,
+ mexp->metric_events, mexp->metric_refs, evsel->name,
+ mexp->metric_name, mexp->metric_unit, mexp->runtime,
+ aggr_idx, out);
+ }
+
+ return NULL;
+}
+
void perf_stat__print_shadow_stats(struct perf_stat_config *config,
struct evsel *evsel,
double avg, int aggr_idx,
@@ -565,7 +642,6 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
};
print_metric_t print_metric = out->print_metric;
void *ctxp = out->ctx;
- struct metric_event *me;
int num = 1;

if (config->iostat_run) {
@@ -592,18 +668,26 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
}
}

- if ((me = metricgroup__lookup(metric_events, evsel, false)) != NULL) {
- struct metric_expr *mexp;
+ perf_stat__print_shadow_stats_metricgroup(config, evsel, aggr_idx,
+ &num, NULL, out, metric_events);

- list_for_each_entry (mexp, &me->head, nd) {
- if (num++ > 0)
- out->new_line(config, ctxp);
- generic_metric(config, mexp->metric_expr, mexp->metric_threshold,
- mexp->metric_events, mexp->metric_refs, evsel->name,
- mexp->metric_name, mexp->metric_unit, mexp->runtime,
- aggr_idx, out);
- }
- }
if (num == 0)
print_metric(config, ctxp, NULL, NULL, NULL, 0);
}
+
+/**
+ * perf_stat__skip_metric_event - Skip the evsel in the Default metricgroup,
+ * if it's not running or not the metric event.
+ */
+bool perf_stat__skip_metric_event(struct evsel *evsel,
+ struct rblist *metric_events,
+ u64 ena, u64 run)
+{
+ if (!evsel->default_metricgroup)
+ return false;
+
+ if (!ena || !run)
+ return true;
+
+ return !metricgroup__lookup(metric_events, evsel, false);
+}
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 7abff7cbb5a1..934f79778cea 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -158,11 +158,16 @@ typedef void (*print_metric_t)(struct perf_stat_config *config,
const char *fmt, double val);
typedef void (*new_line_t)(struct perf_stat_config *config, void *ctx);

+/* Used to print the display name of the Default metricgroup for now. */
+typedef void (*print_metricgroup_header_t)(struct perf_stat_config *config,
+ void *ctx, const char *metricgroup_name);
+
void perf_stat__reset_shadow_stats(void);
struct perf_stat_output_ctx {
void *ctx;
print_metric_t print_metric;
new_line_t new_line;
+ print_metricgroup_header_t print_metricgroup_header;
bool force_header;
};

@@ -171,6 +176,16 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
double avg, int aggr_idx,
struct perf_stat_output_ctx *out,
struct rblist *metric_events);
+bool perf_stat__skip_metric_event(struct evsel *evsel,
+ struct rblist *metric_events,
+ u64 ena, u64 run);
+void *perf_stat__print_shadow_stats_metricgroup(struct perf_stat_config *config,
+ struct evsel *evsel,
+ int aggr_idx,
+ int *num,
+ void *from,
+ struct perf_stat_output_ctx *out,
+ struct rblist *metric_events);

int evlist__alloc_stats(struct perf_stat_config *config,
struct evlist *evlist, bool alloc_raw);
--
2.35.1


2023-06-15 22:08:33

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH V3 4/8] perf metrics: Sort the Default metricgroup

On Thu, Jun 15, 2023 at 6:54 AM <[email protected]> wrote:
>
> From: Kan Liang <[email protected]>
>
> The new default mode will print the metrics as a metric group. The
> metrics from the same metric group must be adjacent to each other in the
> metric list. But the metric_list_cmp() sorts metrics by the number of
> events.
>
> Add a new sort for the Default metricgroup, which sorts by
> default_metricgroup_name and metric_name.
>
> Add is_default in the struct metric_event to indicate that it's from
> the Default metricgroup.
>
> Store the displayed metricgroup name of the Default metricgroup into
> the metric expr for output.
>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> tools/perf/util/metricgroup.c | 37 +++++++++++++++++++++++++++++++++++
> tools/perf/util/metricgroup.h | 3 +++
> 2 files changed, 40 insertions(+)
>
> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> index 8b19644ade7d..e20adbdd5b56 100644
> --- a/tools/perf/util/metricgroup.c
> +++ b/tools/perf/util/metricgroup.c
> @@ -79,6 +79,7 @@ static struct rb_node *metric_event_new(struct rblist *rblist __maybe_unused,
> return NULL;
> memcpy(me, entry, sizeof(struct metric_event));
> me->evsel = ((struct metric_event *)entry)->evsel;
> + me->is_default = false;
> INIT_LIST_HEAD(&me->head);
> return &me->nd;
> }
> @@ -1160,6 +1161,25 @@ static int metric_list_cmp(void *priv __maybe_unused, const struct list_head *l,
> return right_count - left_count;
> }
>
> +/**
> + * default_metricgroup_cmp - Implements complex key for the Default metricgroup
> + * that first sorts by default_metricgroup_name, then
> + * metric_name.
> + */
> +static int default_metricgroup_cmp(void *priv __maybe_unused,
> + const struct list_head *l,
> + const struct list_head *r)
> +{
> + const struct metric *left = container_of(l, struct metric, nd);
> + const struct metric *right = container_of(r, struct metric, nd);
> + int diff = strcmp(right->default_metricgroup_name, left->default_metricgroup_name);
> +
> + if (diff)
> + return diff;
> +
> + return strcmp(right->metric_name, left->metric_name);
> +}
> +
> struct metricgroup__add_metric_data {
> struct list_head *list;
> const char *pmu;
> @@ -1515,6 +1535,7 @@ static int parse_groups(struct evlist *perf_evlist,
> LIST_HEAD(metric_list);
> struct metric *m;
> bool tool_events[PERF_TOOL_MAX] = {false};
> + bool is_default = !strcmp(str, "Default");
> int ret;
>
> if (metric_events_list->nr_entries == 0)
> @@ -1549,6 +1570,9 @@ static int parse_groups(struct evlist *perf_evlist,
> goto out;
> }
>
> + if (is_default)
> + list_sort(NULL, &metric_list, default_metricgroup_cmp);
> +
> list_for_each_entry(m, &metric_list, nd) {
> struct metric_event *me;
> struct evsel **metric_events;
> @@ -1637,6 +1661,19 @@ static int parse_groups(struct evlist *perf_evlist,
> expr->metric_unit = m->metric_unit;
> expr->metric_events = metric_events;
> expr->runtime = m->pctx->sctx.runtime;
> + if (m->pmu && strcmp(m->pmu, "cpu")) {

This shouldn't compare with a PMU name like this. What happens for
memory bandwidth which could be logically with a memory controller
PMU?

> + char *name;
> +
> + if (asprintf(&name, "%s (%s)", m->default_metricgroup_name, m->pmu) < 0)
> + expr->default_metricgroup_name = m->default_metricgroup_name;

Who owns the string in this case? Can't you end up with
default_metricgroup_name pointing to a freed string? I think this
feels a lot more like output code, so I'm unclear why we're setting it
up in the metric.

> + else {
> + expr->default_metricgroup_name = strdup(name);

But name was just allocated, why strdup?

This is still leaking it is just the strdup now leaks and not the asprintf:
```
==2495793==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 6199 byte(s) in 340 object(s) allocated from:
#0 0x7f175c27077b in __interceptor_strdup
../../../../src/libsanitizer/asan/asan_interceptors.cp
p:439
#1 0x5596323b2a28 in parse_groups util/metricgroup.c:1670
#2 0x5596323b2f08 in metricgroup__parse_groups_test util/metricgroup.c:1721
#3 0x5596322992bf in test__parsing_callback tests/pmu-events.c:837
#4 0x559632658ac6 in pmu_metrics_table_for_each_metric
/tmp/perf/pmu-events/pmu-events.c:61641
#5 0x5596326590fc in pmu_for_each_core_metric
/tmp/perf/pmu-events/pmu-events.c:61742
#6 0x559632299b7c in test__parsing tests/pmu-events.c:898
#7 0x5596322663b0 in run_test tests/builtin-test.c:236
#8 0x559632266655 in test_and_print tests/builtin-test.c:265
#9 0x55963226766f in __cmd_test tests/builtin-test.c:436
#10 0x559632268953 in cmd_test tests/builtin-test.c:559
#11 0x5596322f5916 in run_builtin
/home/irogers/kernel.org/tools/perf/perf.c:323
#12 0x5596322f5e87 in handle_internal_command
/home/irogers/kernel.org/tools/perf/perf.c:377
#13 0x5596322f624f in run_argv /home/irogers/kernel.org/tools/perf/perf.c:421
#14 0x5596322f67b7 in main /home/irogers/kernel.org/tools/perf/perf.c:537
#15 0x7f175bebf189 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58

SUMMARY: AddressSanitizer: 6199 byte(s) leaked in 340 allocation(s).
```

> + free(name);
> + }
> + } else
> + expr->default_metricgroup_name = m->default_metricgroup_name;

Who owns the string in this case? Can't you end up with
default_metricgroup_name pointing to a freed string?

I spent some time trying to rationalize this to add as a patch, but
then the more I look at things like the strcmp with "cpu" my changes
were going to modify behavior in a way that would need you to test and
sign-off, so I'll hold back.

To test with leak/address sanitizer use the tmp.perf-tools-next
branch, I build with:
$ make -C tools/perf O=/tmp/perf DEBUG=1 EXTRA_CFLAGS="-O0 -g
-fno-omit-frame-pointer -DREFCNT_CHECKING=1 -fsanitize=address"
NO_LIBTRACEEVENT=1
but the only bit you really need is "-fsanitize=address" which both
clang and gcc support.

Address sanitizer from apt-cache is:
libasan6 - AddressSanitizer -- a fast memory error detector

Fedora and other distro.s have it too.

Thanks,
Ian

> + if (is_default)
> + me->is_default = true;
> list_add(&expr->nd, &me->head);
> }
>
> diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
> index bf18274c15df..d5325c6ec8e1 100644
> --- a/tools/perf/util/metricgroup.h
> +++ b/tools/perf/util/metricgroup.h
> @@ -22,6 +22,7 @@ struct cgroup;
> struct metric_event {
> struct rb_node nd;
> struct evsel *evsel;
> + bool is_default; /* the metric evsel from the Default metricgroup */
> struct list_head head; /* list of metric_expr */
> };
>
> @@ -55,6 +56,8 @@ struct metric_expr {
> * more human intelligible) and then add "MiB" afterward when displayed.
> */
> const char *metric_unit;
> + /** Displayed metricgroup name of the Default metricgroup */
> + const char *default_metricgroup_name;
> /** Null terminated array of events used by the metric. */
> struct evsel **metric_events;
> /** Null terminated array of referenced metrics. */
> --
> 2.35.1
>

2023-06-16 01:45:47

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH V3 0/8] New metricgroup output in perf stat default mode

Em Thu, Jun 15, 2023 at 06:53:07AM -0700, [email protected] escreveu:
> The patch proposes a new output format which only outputting the value
> of each metric and the metricgroup name. It can brings a clean and
> consistent output format among ARCHs and generations.
>
> The first patche is a bug fix for the current code.
>
> The patches 2-5 introduce the new metricgroup output.
>
> The patches 6-8 improve the tests to cover the default mode.

I cherry picked 1-3 and 6, and pushed to the tmp.perf-tools-next branch
at both my usual repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tmp.perf-tools-next

And also at the one we're transitioning to, and that Namhyung Kim has
write access and will update while I'm in vacation in the next two
weeks.

git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git tmp.perf-tools-next

Tomorrow I'll do more tests with it and then transition to
perf-tools-next in both git repositories.

Please prefer
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git
from now on.

- Arnaldo

2023-06-16 02:56:44

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH V3 0/8] New metricgroup output in perf stat default mode



On 2023-06-15 9:28 p.m., Arnaldo Carvalho de Melo wrote:
> Em Thu, Jun 15, 2023 at 06:53:07AM -0700, [email protected] escreveu:
>> The patch proposes a new output format which only outputting the value
>> of each metric and the metricgroup name. It can brings a clean and
>> consistent output format among ARCHs and generations.
>>
>> The first patche is a bug fix for the current code.
>>
>> The patches 2-5 introduce the new metricgroup output.
>>
>> The patches 6-8 improve the tests to cover the default mode.
>
> I cherry picked 1-3 and 6, and pushed to the tmp.perf-tools-next branch
> at both my usual repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tmp.perf-tools-next
>
> And also at the one we're transitioning to, and that Namhyung Kim has
> write access and will update while I'm in vacation in the next two
> weeks.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git tmp.perf-tools-next
>
> Tomorrow I'll do more tests with it and then transition to
> perf-tools-next in both git repositories.
>
> Please prefer
> git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git
> from now on.
>

Thanks Arnaldo!

Kan


2023-06-16 03:09:13

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH V3 4/8] perf metrics: Sort the Default metricgroup



On 2023-06-15 6:05 p.m., Ian Rogers wrote:
> On Thu, Jun 15, 2023 at 6:54 AM <[email protected]> wrote:
>>
>> From: Kan Liang <[email protected]>
>>
>> The new default mode will print the metrics as a metric group. The
>> metrics from the same metric group must be adjacent to each other in the
>> metric list. But the metric_list_cmp() sorts metrics by the number of
>> events.
>>
>> Add a new sort for the Default metricgroup, which sorts by
>> default_metricgroup_name and metric_name.
>>
>> Add is_default in the struct metric_event to indicate that it's from
>> the Default metricgroup.
>>
>> Store the displayed metricgroup name of the Default metricgroup into
>> the metric expr for output.
>>
>> Signed-off-by: Kan Liang <[email protected]>
>> ---
>> tools/perf/util/metricgroup.c | 37 +++++++++++++++++++++++++++++++++++
>> tools/perf/util/metricgroup.h | 3 +++
>> 2 files changed, 40 insertions(+)
>>
>> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
>> index 8b19644ade7d..e20adbdd5b56 100644
>> --- a/tools/perf/util/metricgroup.c
>> +++ b/tools/perf/util/metricgroup.c
>> @@ -79,6 +79,7 @@ static struct rb_node *metric_event_new(struct rblist *rblist __maybe_unused,
>> return NULL;
>> memcpy(me, entry, sizeof(struct metric_event));
>> me->evsel = ((struct metric_event *)entry)->evsel;
>> + me->is_default = false;
>> INIT_LIST_HEAD(&me->head);
>> return &me->nd;
>> }
>> @@ -1160,6 +1161,25 @@ static int metric_list_cmp(void *priv __maybe_unused, const struct list_head *l,
>> return right_count - left_count;
>> }
>>
>> +/**
>> + * default_metricgroup_cmp - Implements complex key for the Default metricgroup
>> + * that first sorts by default_metricgroup_name, then
>> + * metric_name.
>> + */
>> +static int default_metricgroup_cmp(void *priv __maybe_unused,
>> + const struct list_head *l,
>> + const struct list_head *r)
>> +{
>> + const struct metric *left = container_of(l, struct metric, nd);
>> + const struct metric *right = container_of(r, struct metric, nd);
>> + int diff = strcmp(right->default_metricgroup_name, left->default_metricgroup_name);
>> +
>> + if (diff)
>> + return diff;
>> +
>> + return strcmp(right->metric_name, left->metric_name);
>> +}
>> +
>> struct metricgroup__add_metric_data {
>> struct list_head *list;
>> const char *pmu;
>> @@ -1515,6 +1535,7 @@ static int parse_groups(struct evlist *perf_evlist,
>> LIST_HEAD(metric_list);
>> struct metric *m;
>> bool tool_events[PERF_TOOL_MAX] = {false};
>> + bool is_default = !strcmp(str, "Default");
>> int ret;
>>
>> if (metric_events_list->nr_entries == 0)
>> @@ -1549,6 +1570,9 @@ static int parse_groups(struct evlist *perf_evlist,
>> goto out;
>> }
>>
>> + if (is_default)
>> + list_sort(NULL, &metric_list, default_metricgroup_cmp);
>> +
>> list_for_each_entry(m, &metric_list, nd) {
>> struct metric_event *me;
>> struct evsel **metric_events;
>> @@ -1637,6 +1661,19 @@ static int parse_groups(struct evlist *perf_evlist,
>> expr->metric_unit = m->metric_unit;
>> expr->metric_events = metric_events;
>> expr->runtime = m->pctx->sctx.runtime;
>> + if (m->pmu && strcmp(m->pmu, "cpu")) {
>
> This shouldn't compare with a PMU name like this. What happens for
> memory bandwidth which could be logically with a memory controller
> PMU?
>
>> + char *name;
>> +
>> + if (asprintf(&name, "%s (%s)", m->default_metricgroup_name, m->pmu) < 0)
>> + expr->default_metricgroup_name = m->default_metricgroup_name;
>
> Who owns the string in this case? Can't you end up with
> default_metricgroup_name pointing to a freed string? I think this
> feels a lot more like output code, so I'm unclear why we're setting it
> up in the metric.

Yes, we can decide when outputing the name. The below patch move it to
the output code. I will do the modification in V4.

diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index acf86b15ee49..a6a5ed44a679 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -1661,17 +1661,8 @@ static int parse_groups(struct evlist *perf_evlist,
expr->metric_unit = m->metric_unit;
expr->metric_events = metric_events;
expr->runtime = m->pctx->sctx.runtime;
- if (m->pmu && strcmp(m->pmu, "cpu")) {
- char *name;
-
- if (asprintf(&name, "%s (%s)", m->default_metricgroup_name, m->pmu) < 0)
- expr->default_metricgroup_name = m->default_metricgroup_name;
- else
- expr->default_metricgroup_name = name;
- } else
- expr->default_metricgroup_name = m->default_metricgroup_name;
- if (is_default)
- me->is_default = true;
+ expr->default_metricgroup_name = m->default_metricgroup_name;
+ me->is_default = is_default;
list_add(&expr->nd, &me->head);
}

diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index b25974670d30..1440b0fc7d00 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -539,6 +539,42 @@ double test_generic_metric(struct metric_expr
*mexp, int aggr_idx)
return ratio;
}

+static void perf_stat__print_metricgroup_header(struct perf_stat_config
*config,
+ struct evsel *evsel,
+ void *ctxp,
+ const char *name,
+ struct perf_stat_output_ctx *out)
+{
+ bool need_full_name = perf_pmus__num_core_pmus() > 1;
+ static const char *last_name;
+ static const char *last_pmu;
+ char full_name[64];
+
+ /*
+ * A metricgroup may have several metric events,
+ * e.g.,TopdownL1 on e-core of ADL.
+ * The name has been output by the first metric
+ * event. Only align with other metics from
+ * different metric events.
+ */
+ if (last_name && !strcmp(last_name, name)) {
+ if (!need_full_name || !strcmp(last_pmu, evsel->pmu_name)) {
+ out->print_metricgroup_header(config, ctxp, NULL);
+ return;
+ }
+ }
+
+ if (need_full_name)
+ scnprintf(full_name, sizeof(full_name), "%s (%s)", name,
evsel->pmu_name);
+ else
+ scnprintf(full_name, sizeof(full_name), "%s", name);
+
+ out->print_metricgroup_header(config, ctxp, full_name);
+
+ last_name = name;
+ last_pmu = evsel->pmu_name;
+}
+
/**
* perf_stat__print_shadow_stats_metricgroup - Print out metrics
associated with the evsel
* For the non-default, all metrics associated
@@ -563,7 +599,6 @@ void
*perf_stat__print_shadow_stats_metricgroup(struct perf_stat_config *config,
void *ctxp = out->ctx;
bool header_printed = false;
const char *name = NULL;
- static const char *last_name;

me = metricgroup__lookup(metric_events, evsel, false);
if (me == NULL)
@@ -588,20 +623,8 @@ void
*perf_stat__print_shadow_stats_metricgroup(struct perf_stat_config *config,
/* Only print the name of the metricgroup once */
if (!header_printed) {
header_printed = true;
- if (!last_name || strcmp(last_name, name)) {
- /* Print out the name for the new metricgroup. */
- out->print_metricgroup_header(config, ctxp, name);
- last_name = name;
- } else if (!strcmp(last_name, name)) {
- /*
- * A metricgroup may have several metric events,
- * e.g.,TopdownL1 on e-core of ADL.
- * The name has been output by the first metric
- * event. Only align with other metics from
- * different metric events.
- */
- out->print_metricgroup_header(config, ctxp, NULL);
- }
+ perf_stat__print_metricgroup_header(config, evsel, ctxp,
+ name, out);
}
}



>
>> + else {
>> + expr->default_metricgroup_name = strdup(name);
>
> But name was just allocated, why strdup?
>
> This is still leaking it is just the strdup now leaks and not the asprintf:
> ```
> ==2495793==ERROR: LeakSanitizer: detected memory leaks
>
> Direct leak of 6199 byte(s) in 340 object(s) allocated from:
> #0 0x7f175c27077b in __interceptor_strdup
> ../../../../src/libsanitizer/asan/asan_interceptors.cp
> p:439
> #1 0x5596323b2a28 in parse_groups util/metricgroup.c:1670
> #2 0x5596323b2f08 in metricgroup__parse_groups_test util/metricgroup.c:1721
> #3 0x5596322992bf in test__parsing_callback tests/pmu-events.c:837
> #4 0x559632658ac6 in pmu_metrics_table_for_each_metric
> /tmp/perf/pmu-events/pmu-events.c:61641
> #5 0x5596326590fc in pmu_for_each_core_metric
> /tmp/perf/pmu-events/pmu-events.c:61742
> #6 0x559632299b7c in test__parsing tests/pmu-events.c:898
> #7 0x5596322663b0 in run_test tests/builtin-test.c:236
> #8 0x559632266655 in test_and_print tests/builtin-test.c:265
> #9 0x55963226766f in __cmd_test tests/builtin-test.c:436
> #10 0x559632268953 in cmd_test tests/builtin-test.c:559
> #11 0x5596322f5916 in run_builtin
> /home/irogers/kernel.org/tools/perf/perf.c:323
> #12 0x5596322f5e87 in handle_internal_command
> /home/irogers/kernel.org/tools/perf/perf.c:377
> #13 0x5596322f624f in run_argv /home/irogers/kernel.org/tools/perf/perf.c:421
> #14 0x5596322f67b7 in main /home/irogers/kernel.org/tools/perf/perf.c:537
> #15 0x7f175bebf189 in __libc_start_call_main
> ../sysdeps/nptl/libc_start_call_main.h:58
>
> SUMMARY: AddressSanitizer: 6199 byte(s) leaked in 340 allocation(s).
> ```
>
>> + free(name);
>> + }
>> + } else
>> + expr->default_metricgroup_name = m->default_metricgroup_name;
>
> Who owns the string in this case? Can't you end up with
> default_metricgroup_name pointing to a freed string?
>
> I spent some time trying to rationalize this to add as a patch, but
> then the more I look at things like the strcmp with "cpu" my changes
> were going to modify behavior in a way that would need you to test and
> sign-off, so I'll hold back.

THe "cpu" check will be gone in v4.

>
> To test with leak/address sanitizer use the tmp.perf-tools-next
> branch, I build with:
> $ make -C tools/perf O=/tmp/perf DEBUG=1 EXTRA_CFLAGS="-O0 -g
> -fno-omit-frame-pointer -DREFCNT_CHECKING=1 -fsanitize=address"
> NO_LIBTRACEEVENT=1
> but the only bit you really need is "-fsanitize=address" which both
> clang and gcc support.
>
> Address sanitizer from apt-cache is:
> libasan6 - AddressSanitizer -- a fast memory error detector
>
> Fedora and other distro.s have it too.
>

I still cannot reproduce the memory leaks even with the above command.
I will remove the code in v4, so the memory leaks should be gone.

Thanks,
Kan

>
>> + if (is_default)
>> + me->is_default = true;
>> list_add(&expr->nd, &me->head);
>> }
>>
>> diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
>> index bf18274c15df..d5325c6ec8e1 100644
>> --- a/tools/perf/util/metricgroup.h
>> +++ b/tools/perf/util/metricgroup.h
>> @@ -22,6 +22,7 @@ struct cgroup;
>> struct metric_event {
>> struct rb_node nd;
>> struct evsel *evsel;
>> + bool is_default; /* the metric evsel from the Default metricgroup */
>> struct list_head head; /* list of metric_expr */
>> };
>>
>> @@ -55,6 +56,8 @@ struct metric_expr {
>> * more human intelligible) and then add "MiB" afterward when displayed.
>> */
>> const char *metric_unit;
>> + /** Displayed metricgroup name of the Default metricgroup */
>> + const char *default_metricgroup_name;
>> /** Null terminated array of events used by the metric. */
>> struct evsel **metric_events;
>> /** Null terminated array of referenced metrics. */
>> --
>> 2.35.1
>>