2022-10-14 06:18:02

by Namhyung Kim

Subject: [PATCHSET 00/19] perf stat: Cleanup counter aggregation (v2)

Hello,

The current perf stat code is somewhat hard to follow since it handles
many combinations of PMUs/events for the given display and aggregation
options. This is my attempt to clean it up a little. ;-)

Changes in v2:
* fix a segfault in perf stat report for per-process record (Jiri)
* fix metric only display (Jiri)
* add evsel__reset_aggr_stat (Ian)
* add more comments (Ian)
* add Acked-by from Ian

My first concern is that the aggregation and display routines are
intermixed and processed differently depending on the aggregation mode.
I'd like to separate them and make the logic clearer.
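
Roughly, the series ends up separating counter processing from the
display code.  Below is a minimal sketch of the resulting flow,
simplified from patches 13-15 in this thread with error handling
omitted; perf_stat_process_shadow_stats() only appears in the shortlog
here, so its signature in the sketch is an assumption:

  /* builtin-stat.c: process counters regardless of how they are displayed */
  static void process_counters(void)
  {
          struct evsel *counter;

          evlist__for_each_entry(evsel_list, counter)
                  perf_stat_process_counter(&stat_config, counter);

          perf_stat_merge_counters(&stat_config, evsel_list);   /* uncore/hybrid aliases */
          perf_stat_process_percore(&stat_config, evsel_list);  /* per-core with AGGR_NONE */
          /* assumed signature, patch not quoted in this thread */
          perf_stat_process_shadow_stats(&stat_config, evsel_list);
  }

  /* stat-display.c then only reads the aggregated evsel->stats->aggr[] */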

To do that, I added struct perf_stat_aggr to save the aggregated counter
values and other info.  It'll be allocated and processed according to
the aggr_mode, and the display logic will use it.
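
For reference, these are the new data structures, as introduced in
patch 07 below (comments abbreviated; a later patch adds a 'used' flag
to perf_stat_aggr for the per-core handling):

  /* hold aggregated event info */
  struct perf_stat_aggr {
          struct perf_counts_values counts;  /* aggregated values */
          int nr;                            /* number of entries (CPUs) aggregated */
          bool failed;                       /* whether any entry failed to read/process */
  };

  /* per-evsel event stats */
  struct perf_stat_evsel {
          struct stats res_stats;            /* used for repeated runs */
          enum perf_stat_evsel_id id;        /* evsel id for quick check */
          int nr_aggr;                       /* number of allocated 'aggr' entries */
          struct perf_stat_aggr *aggr;       /* aggregated event values */
          u64 *group_data;                   /* used for group read */
  };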

I've tested the following combinations:

$ cat test-matrix.sh
#!/bin/sh

set -e

yes > /dev/null &
TARGET=$!

./perf stat true
./perf stat -a true
./perf stat -C0 true
./perf stat -p $TARGET true
./perf stat -t $TARGET true

./perf stat -a -A true
./perf stat -a --per-node true
./perf stat -a --per-socket true
./perf stat -a --per-die true
./perf stat -a --per-core true
./perf stat -a --per-thread true

./perf stat -a -I 500 sleep 1
./perf stat -a -I 500 --summary sleep 1
./perf stat -a -I 500 --per-socket sleep 1
./perf stat -a -I 500 --summary --per-socket sleep 1

./perf stat -a --metric-only true
./perf stat -a --metric-only --per-socket true
./perf stat -a --metric-only -I 500 sleep 1
./perf stat -a --metric-only -I 500 --per-socket sleep 1

./perf stat record true && ./perf stat report
./perf stat record -p $TARGET true && ./perf stat report
./perf stat record -a true && ./perf stat report
./perf stat record -a --per-core true && ./perf stat report
./perf stat record -a --per-core --metric-only true && ./perf stat report
./perf stat record -a -I 500 sleep 1 && ./perf stat report
./perf stat record -a -I 500 --per-core sleep 1 && ./perf stat report
./perf stat record -a -I 500 --per-core --metric-only sleep 1 && ./perf stat report

./perf stat -a -A -e cpu/event=cpu-cycles,percore/ true
./perf stat -a -A -e cpu/event=cpu-cycles,percore/ --percore-show-thread true

kill $TARGET

The code is available in the 'perf/stat-aggr-v2' branch at

git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung

Namhyung Kim (19):
perf tools: Save evsel->pmu in parse_events()
perf tools: Use pmu info in evsel__is_hybrid()
perf stat: Use evsel__is_hybrid() more
perf stat: Add aggr id for global mode
perf stat: Add cpu aggr id for no aggregation mode
perf stat: Add 'needs_sort' argument to cpu_aggr_map__new()
perf stat: Add struct perf_stat_aggr to perf_stat_evsel
perf stat: Allocate evsel->stats->aggr properly
perf stat: Aggregate events using evsel->stats->aggr
perf stat: Aggregate per-thread stats using evsel->stats->aggr
perf stat: Allocate aggr counts for recorded data
perf stat: Reset aggr counts for each interval
perf stat: Split process_counters()
perf stat: Add perf_stat_merge_counters()
perf stat: Add perf_stat_process_percore()
perf stat: Add perf_stat_process_shadow_stats()
perf stat: Display event stats using aggr counts
perf stat: Display percore events properly
perf stat: Remove unused perf_counts.aggr field

tools/perf/builtin-script.c | 4 +-
tools/perf/builtin-stat.c | 186 +++++--
tools/perf/tests/parse-metric.c | 2 +-
tools/perf/tests/pmu-events.c | 2 +-
tools/perf/util/counts.c | 1 -
tools/perf/util/counts.h | 1 -
tools/perf/util/cpumap.c | 16 +-
tools/perf/util/cpumap.h | 8 +-
tools/perf/util/evsel.c | 13 +-
tools/perf/util/parse-events.c | 1 +
tools/perf/util/pmu.c | 4 +
.../scripting-engines/trace-event-python.c | 6 -
tools/perf/util/stat-display.c | 462 +++---------------
tools/perf/util/stat.c | 385 ++++++++++++---
tools/perf/util/stat.h | 40 +-
15 files changed, 602 insertions(+), 529 deletions(-)


base-commit: d79310700590b8b40d8c867012d6c899ea6fd505
--
2.38.0.413.g74048e4d9e-goog


2022-10-14 06:18:13

by Namhyung Kim

Subject: [PATCH 03/19] perf stat: Use evsel__is_hybrid() more

The stat-display code needs to check whether the current evsel is
hybrid, but it uses perf_pmu__has_hybrid(), which can return true for
non-hybrid events too.  I think it's better to use evsel__is_hybrid().

Also remove the NULL check for the 'config' parameter in hybrid_merge()
since it's called after the config->no_merge check.

Acked-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/stat-display.c | 20 ++++----------------
1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 5c47ee9963a7..4113aa86772f 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -704,7 +704,7 @@ static void uniquify_event_name(struct evsel *counter)
counter->name = new_name;
}
} else {
- if (perf_pmu__has_hybrid()) {
+ if (evsel__is_hybrid(counter)) {
ret = asprintf(&new_name, "%s/%s/",
counter->pmu_name, counter->name);
} else {
@@ -744,26 +744,14 @@ static void collect_all_aliases(struct perf_stat_config *config, struct evsel *c
}
}

-static bool is_uncore(struct evsel *evsel)
-{
- struct perf_pmu *pmu = evsel__find_pmu(evsel);
-
- return pmu && pmu->is_uncore;
-}
-
-static bool hybrid_uniquify(struct evsel *evsel)
-{
- return perf_pmu__has_hybrid() && !is_uncore(evsel);
-}
-
static bool hybrid_merge(struct evsel *counter, struct perf_stat_config *config,
bool check)
{
- if (hybrid_uniquify(counter)) {
+ if (evsel__is_hybrid(counter)) {
if (check)
- return config && config->hybrid_merge;
+ return config->hybrid_merge;
else
- return config && !config->hybrid_merge;
+ return !config->hybrid_merge;
}

return false;
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 06:32:46

by Namhyung Kim

Subject: [PATCH 14/19] perf stat: Add perf_stat_merge_counters()

perf_stat_merge_counters() aggregates the same events on different PMUs,
as in the uncore or hybrid case.  The same logic lives in the stat-display
routines, but I think it should be handled when the event counters are
processed.

Since it works on the aggr_counters, it doesn't change the output yet.

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-stat.c | 2 +
tools/perf/util/stat.c | 96 +++++++++++++++++++++++++++++++++++++++
tools/perf/util/stat.h | 2 +
3 files changed, 100 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 838d29590bed..371d6e896942 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -486,6 +486,8 @@ static void process_counters(void)
pr_warning("failed to process counter %s\n", counter->name);
counter->err = 0;
}
+
+ perf_stat_merge_counters(&stat_config, evsel_list);
}

static void process_interval(void)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 847481cc3d5a..877107f5a820 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -577,6 +577,102 @@ int perf_stat_process_counter(struct perf_stat_config *config,
return 0;
}

+static int evsel__merge_aggr_counters(struct evsel *evsel, struct evsel *alias)
+{
+ struct perf_stat_evsel *ps_a = evsel->stats;
+ struct perf_stat_evsel *ps_b = alias->stats;
+ int i;
+
+ if (ps_a->aggr == NULL && ps_b->aggr == NULL)
+ return 0;
+
+ if (ps_a->nr_aggr != ps_b->nr_aggr) {
+ pr_err("Unmatched aggregation mode between aliases\n");
+ return -1;
+ }
+
+ for (i = 0; i < ps_a->nr_aggr; i++) {
+ struct perf_counts_values *aggr_counts_a = &ps_a->aggr[i].counts;
+ struct perf_counts_values *aggr_counts_b = &ps_b->aggr[i].counts;
+
+ /* NB: don't increase aggr.nr for aliases */
+
+ aggr_counts_a->val += aggr_counts_b->val;
+ aggr_counts_a->ena += aggr_counts_b->ena;
+ aggr_counts_a->run += aggr_counts_b->run;
+ }
+
+ return 0;
+}
+/* events should have the same name, scale, unit, cgroup but on different PMUs */
+static bool evsel__is_alias(struct evsel *evsel_a, struct evsel *evsel_b)
+{
+ if (strcmp(evsel__name(evsel_a), evsel__name(evsel_b)))
+ return false;
+
+ if (evsel_a->scale != evsel_b->scale)
+ return false;
+
+ if (evsel_a->cgrp != evsel_b->cgrp)
+ return false;
+
+ if (strcmp(evsel_a->unit, evsel_b->unit))
+ return false;
+
+ if (evsel__is_clock(evsel_a) != evsel__is_clock(evsel_b))
+ return false;
+
+ return !!strcmp(evsel_a->pmu_name, evsel_b->pmu_name);
+}
+
+static void evsel__merge_aliases(struct evsel *evsel)
+{
+ struct evlist *evlist = evsel->evlist;
+ struct evsel *alias;
+
+ alias = list_prepare_entry(evsel, &(evlist->core.entries), core.node);
+ list_for_each_entry_continue(alias, &evlist->core.entries, core.node) {
+ /* Merge the same events on different PMUs. */
+ if (evsel__is_alias(evsel, alias)) {
+ evsel__merge_aggr_counters(evsel, alias);
+ alias->merged_stat = true;
+ }
+ }
+}
+
+static bool evsel__should_merge_hybrid(struct evsel *evsel, struct perf_stat_config *config)
+{
+ struct perf_pmu *pmu;
+
+ if (!config->hybrid_merge)
+ return false;
+
+ pmu = evsel__find_pmu(evsel);
+ return pmu && pmu->is_hybrid;
+}
+
+static void evsel__merge_stats(struct evsel *evsel, struct perf_stat_config *config)
+{
+ /* this evsel is already merged */
+ if (evsel->merged_stat)
+ return;
+
+ if (evsel->auto_merge_stats || evsel__should_merge_hybrid(evsel, config))
+ evsel__merge_aliases(evsel);
+}
+
+/* merge the same uncore and hybrid events if requested */
+void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist)
+{
+ struct evsel *evsel;
+
+ if (config->no_merge)
+ return;
+
+ evlist__for_each_entry(evlist, evsel)
+ evsel__merge_stats(evsel, config);
+}
+
int perf_event__process_stat_event(struct perf_session *session,
union perf_event *event)
{
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 809f9f0aff0c..728bbc823b0d 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -280,6 +280,8 @@ void evlist__reset_aggr_stats(struct evlist *evlist);

int perf_stat_process_counter(struct perf_stat_config *config,
struct evsel *counter);
+void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist);
+
struct perf_tool;
union perf_event;
struct perf_session;
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 06:33:44

by Namhyung Kim

Subject: [PATCH 19/19] perf stat: Remove unused perf_counts.aggr field

The aggr field in struct perf_counts was used by the old code to keep
the aggregated value for AGGR_GLOBAL, but it's not used anymore.

Acked-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/counts.c | 1 -
tools/perf/util/counts.h | 1 -
tools/perf/util/stat.c | 35 ++---------------------------------
3 files changed, 2 insertions(+), 35 deletions(-)

diff --git a/tools/perf/util/counts.c b/tools/perf/util/counts.c
index 7a447d918458..11cd85b278a6 100644
--- a/tools/perf/util/counts.c
+++ b/tools/perf/util/counts.c
@@ -48,7 +48,6 @@ void perf_counts__reset(struct perf_counts *counts)
{
xyarray__reset(counts->loaded);
xyarray__reset(counts->values);
- memset(&counts->aggr, 0, sizeof(struct perf_counts_values));
}

void evsel__reset_counts(struct evsel *evsel)
diff --git a/tools/perf/util/counts.h b/tools/perf/util/counts.h
index 5de275194f2b..42760242e0df 100644
--- a/tools/perf/util/counts.h
+++ b/tools/perf/util/counts.h
@@ -11,7 +11,6 @@ struct evsel;

struct perf_counts {
s8 scaled;
- struct perf_counts_values aggr;
struct xyarray *values;
struct xyarray *loaded;
};
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 14c45f4cfdd3..6ab9c58beca7 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -308,8 +308,6 @@ static void evsel__copy_prev_raw_counts(struct evsel *evsel)
*perf_counts(evsel->prev_raw_counts, idx, thread);
}
}
-
- evsel->counts->aggr = evsel->prev_raw_counts->aggr;
}

void evlist__copy_prev_raw_counts(struct evlist *evlist)
@@ -320,26 +318,6 @@ void evlist__copy_prev_raw_counts(struct evlist *evlist)
evsel__copy_prev_raw_counts(evsel);
}

-void evlist__save_aggr_prev_raw_counts(struct evlist *evlist)
-{
- struct evsel *evsel;
-
- /*
- * To collect the overall statistics for interval mode,
- * we copy the counts from evsel->prev_raw_counts to
- * evsel->counts. The perf_stat_process_counter creates
- * aggr values from per cpu values, but the per cpu values
- * are 0 for AGGR_GLOBAL. So we use a trick that saves the
- * previous aggr value to the first member of perf_counts,
- * then aggr calculation in process_counter_values can work
- * correctly.
- */
- evlist__for_each_entry(evlist, evsel) {
- *perf_counts(evsel->prev_raw_counts, 0, 0) =
- evsel->prev_raw_counts->aggr;
- }
-}
-
static size_t pkg_id_hash(const void *__key, void *ctx __maybe_unused)
{
uint64_t *key = (uint64_t *) __key;
@@ -423,7 +401,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
int cpu_map_idx, int thread,
struct perf_counts_values *count)
{
- struct perf_counts_values *aggr = &evsel->counts->aggr;
struct perf_stat_evsel *ps = evsel->stats;
static struct perf_counts_values zero;
bool skip = false;
@@ -493,12 +470,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
}
}

- if (config->aggr_mode == AGGR_GLOBAL) {
- aggr->val += count->val;
- aggr->ena += count->ena;
- aggr->run += count->run;
- }
-
return 0;
}

@@ -523,13 +494,10 @@ static int process_counter_maps(struct perf_stat_config *config,
int perf_stat_process_counter(struct perf_stat_config *config,
struct evsel *counter)
{
- struct perf_counts_values *aggr = &counter->counts->aggr;
struct perf_stat_evsel *ps = counter->stats;
- u64 *count = counter->counts->aggr.values;
+ u64 *count;
int ret;

- aggr->val = aggr->ena = aggr->run = 0;
-
if (counter->per_pkg)
evsel__zero_per_pkg(counter);

@@ -540,6 +508,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
if (config->aggr_mode != AGGR_GLOBAL)
return 0;

+ count = ps->aggr[0].counts.values;
update_stats(&ps->res_stats, *count);

if (verbose > 0) {
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 06:37:10

by Namhyung Kim

Subject: [PATCH 09/19] perf stat: Aggregate events using evsel->stats->aggr

Add logic to aggregate counter values into the new evsel->stats->aggr.
This is not used yet, so shadow stats are not updated, but a later patch
will convert the existing code to use it.

With that, we don't need to handle AGGR_GLOBAL specially anymore.  It
can use the same logic with counts, prev_counts and aggr_counts.

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-stat.c | 3 --
tools/perf/util/evsel.c | 9 +---
.../scripting-engines/trace-event-python.c | 6 ---
tools/perf/util/stat.c | 46 ++++++++++++++++---
4 files changed, 41 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 92a8e4512f98..abede56d79b6 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -963,9 +963,6 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
init_stats(&walltime_nsecs_stats);
update_stats(&walltime_nsecs_stats, t1 - t0);

- if (stat_config.aggr_mode == AGGR_GLOBAL)
- evlist__save_aggr_prev_raw_counts(evsel_list);
-
evlist__copy_prev_raw_counts(evsel_list);
evlist__reset_prev_raw_counts(evsel_list);
perf_stat__reset_shadow_per_stat(&rt_stat);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index a6ea91c72659..a1fcb3166149 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1526,13 +1526,8 @@ void evsel__compute_deltas(struct evsel *evsel, int cpu_map_idx, int thread,
if (!evsel->prev_raw_counts)
return;

- if (cpu_map_idx == -1) {
- tmp = evsel->prev_raw_counts->aggr;
- evsel->prev_raw_counts->aggr = *count;
- } else {
- tmp = *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread);
- *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread) = *count;
- }
+ tmp = *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread);
+ *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread) = *count;

count->val = count->val - tmp.val;
count->ena = count->ena - tmp.ena;
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 1f2040f36d4e..7bc8559dce6a 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -1653,12 +1653,6 @@ static void python_process_stat(struct perf_stat_config *config,
struct perf_cpu_map *cpus = counter->core.cpus;
int cpu, thread;

- if (config->aggr_mode == AGGR_GLOBAL) {
- process_stat(counter, (struct perf_cpu){ .cpu = -1 }, -1, tstamp,
- &counter->counts->aggr);
- return;
- }
-
for (thread = 0; thread < threads->nr; thread++) {
for (cpu = 0; cpu < perf_cpu_map__nr(cpus); cpu++) {
process_stat(counter, perf_cpu_map__cpu(cpus, cpu),
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 374149628507..99874254809d 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -387,6 +387,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
struct perf_counts_values *count)
{
struct perf_counts_values *aggr = &evsel->counts->aggr;
+ struct perf_stat_evsel *ps = evsel->stats;
static struct perf_counts_values zero;
bool skip = false;

@@ -398,6 +399,44 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
if (skip)
count = &zero;

+ if (!evsel->snapshot)
+ evsel__compute_deltas(evsel, cpu_map_idx, thread, count);
+ perf_counts_values__scale(count, config->scale, NULL);
+
+ if (ps->aggr) {
+ struct perf_cpu cpu = perf_cpu_map__cpu(evsel->core.cpus, cpu_map_idx);
+ struct aggr_cpu_id aggr_id = config->aggr_get_id(config, cpu);
+ struct perf_stat_aggr *ps_aggr;
+ int i;
+
+ for (i = 0; i < ps->nr_aggr; i++) {
+ if (!aggr_cpu_id__equal(&aggr_id, &config->aggr_map->map[i]))
+ continue;
+
+ ps_aggr = &ps->aggr[i];
+ ps_aggr->nr++;
+
+ /*
+ * When any result is bad, make them all to give
+ * consistent output in interval mode.
+ */
+ if (count->ena == 0 || count->run == 0 ||
+ evsel->counts->scaled == -1) {
+ ps_aggr->counts.val = 0;
+ ps_aggr->counts.ena = 0;
+ ps_aggr->counts.run = 0;
+ ps_aggr->failed = true;
+ }
+
+ if (!ps_aggr->failed) {
+ ps_aggr->counts.val += count->val;
+ ps_aggr->counts.ena += count->ena;
+ ps_aggr->counts.run += count->run;
+ }
+ break;
+ }
+ }
+
switch (config->aggr_mode) {
case AGGR_THREAD:
case AGGR_CORE:
@@ -405,9 +444,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
case AGGR_SOCKET:
case AGGR_NODE:
case AGGR_NONE:
- if (!evsel->snapshot)
- evsel__compute_deltas(evsel, cpu_map_idx, thread, count);
- perf_counts_values__scale(count, config->scale, NULL);
if ((config->aggr_mode == AGGR_NONE) && (!evsel->percore)) {
perf_stat__update_shadow_stats(evsel, count->val,
cpu_map_idx, &rt_stat);
@@ -469,10 +505,6 @@ int perf_stat_process_counter(struct perf_stat_config *config,
if (config->aggr_mode != AGGR_GLOBAL)
return 0;

- if (!counter->snapshot)
- evsel__compute_deltas(counter, -1, -1, aggr);
- perf_counts_values__scale(aggr, config->scale, &counter->counts->scaled);
-
update_stats(&ps->res_stats, *count);

if (verbose > 0) {
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 06:40:33

by Namhyung Kim

Subject: [PATCH 13/19] perf stat: Split process_counters()

It will do more processing with aggregation.  Let's split the function so
that it can be shared by process_stat_round_event() too.

Acked-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-stat.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index bff28a199dfd..838d29590bed 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -465,15 +465,19 @@ static int read_bpf_map_counters(void)
return 0;
}

-static void read_counters(struct timespec *rs)
+static int read_counters(struct timespec *rs)
{
- struct evsel *counter;
-
if (!stat_config.stop_read_counter) {
if (read_bpf_map_counters() ||
read_affinity_counters(rs))
- return;
+ return -1;
}
+ return 0;
+}
+
+static void process_counters(void)
+{
+ struct evsel *counter;

evlist__for_each_entry(evsel_list, counter) {
if (counter->err)
@@ -494,7 +498,8 @@ static void process_interval(void)
perf_stat__reset_shadow_per_stat(&rt_stat);
evlist__reset_aggr_stats(evsel_list);

- read_counters(&rs);
+ if (read_counters(&rs) == 0)
+ process_counters();

if (STAT_RECORD) {
if (WRITE_STAT_ROUND_EVENT(rs.tv_sec * NSEC_PER_SEC + rs.tv_nsec, INTERVAL))
@@ -980,7 +985,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
* avoid arbitrary skew, we must read all counters before closing any
* group leaders.
*/
- read_counters(&(struct timespec) { .tv_nsec = t1-t0 });
+ if (read_counters(&(struct timespec) { .tv_nsec = t1-t0 }) == 0)
+ process_counters();

/*
* We need to keep evsel_list alive, because it's processed
@@ -2099,13 +2105,11 @@ static int process_stat_round_event(struct perf_session *session,
union perf_event *event)
{
struct perf_record_stat_round *stat_round = &event->stat_round;
- struct evsel *counter;
struct timespec tsh, *ts = NULL;
const char **argv = session->header.env.cmdline_argv;
int argc = session->header.env.nr_cmdline;

- evlist__for_each_entry(evsel_list, counter)
- perf_stat_process_counter(&stat_config, counter);
+ process_counters();

if (stat_round->type == PERF_STAT_ROUND_TYPE__FINAL)
update_stats(&walltime_nsecs_stats, stat_round->time);
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 06:40:35

by Namhyung Kim

Subject: [PATCH 04/19] perf stat: Add aggr id for global mode

To make the code simpler, I'd like to use the same aggregation code for
the global mode.  We can simply add an id function that returns cpu 0 and
use print_aggr().

No functional change intended.

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-stat.c | 36 ++++++++++++++++++++++++++++++++++--
tools/perf/util/cpumap.c | 10 ++++++++++
tools/perf/util/cpumap.h | 6 +++++-
3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 265b05157972..75d16e9705a4 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1330,6 +1330,12 @@ static struct aggr_cpu_id perf_stat__get_node(struct perf_stat_config *config __
return aggr_cpu_id__node(cpu, /*data=*/NULL);
}

+static struct aggr_cpu_id perf_stat__get_global(struct perf_stat_config *config __maybe_unused,
+ struct perf_cpu cpu)
+{
+ return aggr_cpu_id__global(cpu, /*data=*/NULL);
+}
+
static struct aggr_cpu_id perf_stat__get_aggr(struct perf_stat_config *config,
aggr_get_id_t get_id, struct perf_cpu cpu)
{
@@ -1366,6 +1372,12 @@ static struct aggr_cpu_id perf_stat__get_node_cached(struct perf_stat_config *co
return perf_stat__get_aggr(config, perf_stat__get_node, cpu);
}

+static struct aggr_cpu_id perf_stat__get_global_cached(struct perf_stat_config *config,
+ struct perf_cpu cpu)
+{
+ return perf_stat__get_aggr(config, perf_stat__get_global, cpu);
+}
+
static bool term_percore_set(void)
{
struct evsel *counter;
@@ -1395,6 +1407,7 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)

return NULL;
case AGGR_GLOBAL:
+ return aggr_cpu_id__global;
case AGGR_THREAD:
case AGGR_UNSET:
case AGGR_MAX:
@@ -1420,6 +1433,7 @@ static aggr_get_id_t aggr_mode__get_id(enum aggr_mode aggr_mode)
}
return NULL;
case AGGR_GLOBAL:
+ return perf_stat__get_global_cached;
case AGGR_THREAD:
case AGGR_UNSET:
case AGGR_MAX:
@@ -1535,6 +1549,16 @@ static struct aggr_cpu_id perf_env__get_node_aggr_by_cpu(struct perf_cpu cpu, vo
return id;
}

+static struct aggr_cpu_id perf_env__get_global_aggr_by_cpu(struct perf_cpu cpu __maybe_unused,
+ void *data __maybe_unused)
+{
+ struct aggr_cpu_id id = aggr_cpu_id__empty();
+
+ /* it always aggregates to the cpu 0 */
+ id.cpu = (struct perf_cpu){ .cpu = 0 };
+ return id;
+}
+
static struct aggr_cpu_id perf_stat__get_socket_file(struct perf_stat_config *config __maybe_unused,
struct perf_cpu cpu)
{
@@ -1558,6 +1582,12 @@ static struct aggr_cpu_id perf_stat__get_node_file(struct perf_stat_config *conf
return perf_env__get_node_aggr_by_cpu(cpu, &perf_stat.session->header.env);
}

+static struct aggr_cpu_id perf_stat__get_global_file(struct perf_stat_config *config __maybe_unused,
+ struct perf_cpu cpu)
+{
+ return perf_env__get_global_aggr_by_cpu(cpu, &perf_stat.session->header.env);
+}
+
static aggr_cpu_id_get_t aggr_mode__get_aggr_file(enum aggr_mode aggr_mode)
{
switch (aggr_mode) {
@@ -1569,8 +1599,9 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr_file(enum aggr_mode aggr_mode)
return perf_env__get_core_aggr_by_cpu;
case AGGR_NODE:
return perf_env__get_node_aggr_by_cpu;
- case AGGR_NONE:
case AGGR_GLOBAL:
+ return perf_env__get_global_aggr_by_cpu;
+ case AGGR_NONE:
case AGGR_THREAD:
case AGGR_UNSET:
case AGGR_MAX:
@@ -1590,8 +1621,9 @@ static aggr_get_id_t aggr_mode__get_id_file(enum aggr_mode aggr_mode)
return perf_stat__get_core_file;
case AGGR_NODE:
return perf_stat__get_node_file;
- case AGGR_NONE:
case AGGR_GLOBAL:
+ return perf_stat__get_global_file;
+ case AGGR_NONE:
case AGGR_THREAD:
case AGGR_UNSET:
case AGGR_MAX:
diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index 8486ca3bec75..60209fe87456 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -354,6 +354,16 @@ struct aggr_cpu_id aggr_cpu_id__node(struct perf_cpu cpu, void *data __maybe_unu
return id;
}

+struct aggr_cpu_id aggr_cpu_id__global(struct perf_cpu cpu, void *data __maybe_unused)
+{
+ struct aggr_cpu_id id = aggr_cpu_id__empty();
+
+ /* it always aggregates to the cpu 0 */
+ cpu.cpu = 0;
+ id.cpu = cpu;
+ return id;
+}
+
/* setup simple routines to easily access node numbers given a cpu number */
static int get_max_num(char *path, int *max)
{
diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index 4a6d029576ee..b2ff648bc417 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -133,5 +133,9 @@ struct aggr_cpu_id aggr_cpu_id__cpu(struct perf_cpu cpu, void *data);
* cpu. The function signature is compatible with aggr_cpu_id_get_t.
*/
struct aggr_cpu_id aggr_cpu_id__node(struct perf_cpu cpu, void *data);
-
+/**
+ * aggr_cpu_id__global - Create an aggr_cpu_id for global aggregation.
+ * The function signature is compatible with aggr_cpu_id_get_t.
+ */
+struct aggr_cpu_id aggr_cpu_id__global(struct perf_cpu cpu, void *data);
#endif /* __PERF_CPUMAP_H */
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 06:40:55

by Namhyung Kim

Subject: [PATCH 15/19] perf stat: Add perf_stat_process_percore()

perf_stat_process_percore() aggregates the counts for an event per core
even if the aggr_mode is AGGR_NONE.  This is enabled when the user
requests it on the command line.

To handle that, it keeps the per-cpu counts first, then aggregates the
counts that have the same core id into aggr->counts and writes the
values back for each cpu.

Later, per-core events will skip one of the CPUs unless the
percore-show-thread option is given.  In that case, it can simply print
all CPU stats with the updated (per-core) values.

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-stat.c | 1 +
tools/perf/util/stat.c | 71 +++++++++++++++++++++++++++++++++++++++
tools/perf/util/stat.h | 3 ++
3 files changed, 75 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 371d6e896942..d6a006e41da0 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -488,6 +488,7 @@ static void process_counters(void)
}

perf_stat_merge_counters(&stat_config, evsel_list);
+ perf_stat_process_percore(&stat_config, evsel_list);
}

static void process_interval(void)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 877107f5a820..4ce457c94898 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -673,6 +673,77 @@ void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *ev
evsel__merge_stats(evsel, config);
}

+static void evsel__update_percore_stats(struct evsel *evsel, struct aggr_cpu_id *core_id)
+{
+ struct perf_stat_evsel *ps = evsel->stats;
+ struct perf_counts_values counts = { 0, };
+ struct aggr_cpu_id id;
+ struct perf_cpu cpu;
+ int idx;
+
+ /* collect per-core counts */
+ perf_cpu_map__for_each_cpu(cpu, idx, evsel->core.cpus) {
+ struct perf_stat_aggr *aggr = &ps->aggr[idx];
+
+ id = aggr_cpu_id__core(cpu, NULL);
+ if (!aggr_cpu_id__equal(core_id, &id))
+ continue;
+
+ counts.val += aggr->counts.val;
+ counts.ena += aggr->counts.ena;
+ counts.run += aggr->counts.run;
+ }
+
+ /* update aggregated per-core counts for each CPU */
+ perf_cpu_map__for_each_cpu(cpu, idx, evsel->core.cpus) {
+ struct perf_stat_aggr *aggr = &ps->aggr[idx];
+
+ id = aggr_cpu_id__core(cpu, NULL);
+ if (!aggr_cpu_id__equal(core_id, &id))
+ continue;
+
+ aggr->counts.val = counts.val;
+ aggr->counts.ena = counts.ena;
+ aggr->counts.run = counts.run;
+
+ aggr->used = true;
+ }
+}
+
+/* we have an aggr_map for cpu, but want to aggregate the counters per-core */
+static void evsel__process_percore(struct evsel *evsel)
+{
+ struct perf_stat_evsel *ps = evsel->stats;
+ struct aggr_cpu_id core_id;
+ struct perf_cpu cpu;
+ int idx;
+
+ if (!evsel->percore)
+ return;
+
+ perf_cpu_map__for_each_cpu(cpu, idx, evsel->core.cpus) {
+ struct perf_stat_aggr *aggr = &ps->aggr[idx];
+
+ if (aggr->used)
+ continue;
+
+ core_id = aggr_cpu_id__core(cpu, NULL);
+ evsel__update_percore_stats(evsel, &core_id);
+ }
+}
+
+/* process cpu stats on per-core events */
+void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *evlist)
+{
+ struct evsel *evsel;
+
+ if (config->aggr_mode != AGGR_NONE)
+ return;
+
+ evlist__for_each_entry(evlist, evsel)
+ evsel__process_percore(evsel);
+}
+
int perf_event__process_stat_event(struct perf_session *session,
union perf_event *event)
{
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 728bbc823b0d..d23f8743e442 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -51,6 +51,8 @@ struct perf_stat_aggr {
int nr;
/* whether any entry has failed to read/process event */
bool failed;
+ /* to mark this data is processed already */
+ bool used;
};

/* per-evsel event stats */
@@ -281,6 +283,7 @@ void evlist__reset_aggr_stats(struct evlist *evlist);
int perf_stat_process_counter(struct perf_stat_config *config,
struct evsel *counter);
void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist);
+void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *evlist);

struct perf_tool;
union perf_event;
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 06:41:21

by Namhyung Kim

Subject: [PATCH 07/19] perf stat: Add struct perf_stat_aggr to perf_stat_evsel

The perf_stat_aggr struct keeps aggregated counter values and states
according to the aggregation mode.  The number of entries depends on the
mode; this is a preparation for later use.

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/stat.c | 34 +++++++++++++++++++++++++++-------
tools/perf/util/stat.h | 19 +++++++++++++++++++
2 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 8ec8bb4a9912..c9d5aa295b54 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -133,15 +133,33 @@ static void perf_stat_evsel_id_init(struct evsel *evsel)
static void evsel__reset_stat_priv(struct evsel *evsel)
{
struct perf_stat_evsel *ps = evsel->stats;
+ struct perf_stat_aggr *aggr = ps->aggr;

init_stats(&ps->res_stats);
+
+ if (aggr)
+ memset(aggr, 0, sizeof(*aggr) * ps->nr_aggr);
}

-static int evsel__alloc_stat_priv(struct evsel *evsel)
+
+static int evsel__alloc_stat_priv(struct evsel *evsel, int nr_aggr)
{
- evsel->stats = zalloc(sizeof(struct perf_stat_evsel));
- if (evsel->stats == NULL)
+ struct perf_stat_evsel *ps;
+
+ ps = zalloc(sizeof(*ps));
+ if (ps == NULL)
return -ENOMEM;
+
+ if (nr_aggr) {
+ ps->nr_aggr = nr_aggr;
+ ps->aggr = calloc(nr_aggr, sizeof(*ps->aggr));
+ if (ps->aggr == NULL) {
+ free(ps);
+ return -ENOMEM;
+ }
+ }
+
+ evsel->stats = ps;
perf_stat_evsel_id_init(evsel);
evsel__reset_stat_priv(evsel);
return 0;
@@ -151,8 +169,10 @@ static void evsel__free_stat_priv(struct evsel *evsel)
{
struct perf_stat_evsel *ps = evsel->stats;

- if (ps)
+ if (ps) {
+ zfree(&ps->aggr);
zfree(&ps->group_data);
+ }
zfree(&evsel->stats);
}

@@ -181,9 +201,9 @@ static void evsel__reset_prev_raw_counts(struct evsel *evsel)
perf_counts__reset(evsel->prev_raw_counts);
}

-static int evsel__alloc_stats(struct evsel *evsel, bool alloc_raw)
+static int evsel__alloc_stats(struct evsel *evsel, int nr_aggr, bool alloc_raw)
{
- if (evsel__alloc_stat_priv(evsel) < 0 ||
+ if (evsel__alloc_stat_priv(evsel, nr_aggr) < 0 ||
evsel__alloc_counts(evsel) < 0 ||
(alloc_raw && evsel__alloc_prev_raw_counts(evsel) < 0))
return -ENOMEM;
@@ -196,7 +216,7 @@ int evlist__alloc_stats(struct evlist *evlist, bool alloc_raw)
struct evsel *evsel;

evlist__for_each_entry(evlist, evsel) {
- if (evsel__alloc_stats(evsel, alloc_raw))
+ if (evsel__alloc_stats(evsel, 0, alloc_raw))
goto out_free;
}

diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index b0899c6e002f..42453513ffea 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -8,6 +8,7 @@
#include <sys/resource.h>
#include "cpumap.h"
#include "rblist.h"
+#include "counts.h"

struct perf_cpu_map;
struct perf_stat_config;
@@ -42,9 +43,27 @@ enum perf_stat_evsel_id {
PERF_STAT_EVSEL_ID__MAX,
};

+/* hold aggregated event info */
+struct perf_stat_aggr {
+ /* aggregated values */
+ struct perf_counts_values counts;
+ /* number of entries (CPUs) aggregated */
+ int nr;
+ /* whether any entry has failed to read/process event */
+ bool failed;
+};
+
+/* per-evsel event stats */
struct perf_stat_evsel {
+ /* used for repeated runs */
struct stats res_stats;
+ /* evsel id for quick check */
enum perf_stat_evsel_id id;
+ /* number of allocated 'aggr' */
+ int nr_aggr;
+ /* aggregated event values */
+ struct perf_stat_aggr *aggr;
+ /* used for group read */
u64 *group_data;
};

--
2.38.0.413.g74048e4d9e-goog

2022-10-14 06:41:33

by Namhyung Kim

Subject: [PATCH 10/19] perf stat: Aggregate per-thread stats using evsel->stats->aggr

Per-thread aggregation doesn't use the CPU numbers, but the logic should
be the same.  Initialize the cpu_aggr_map separately for AGGR_THREAD and
use the thread map idx to aggregate counter values.

Acked-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-stat.c | 31 +++++++++++++++++++++++++++++++
tools/perf/util/stat.c | 26 +++++++++++++++++++++++---
2 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index abede56d79b6..6777fef0d56c 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1465,6 +1465,21 @@ static int perf_stat_init_aggr_mode(void)
stat_config.aggr_get_id = aggr_mode__get_id(stat_config.aggr_mode);
}

+ if (stat_config.aggr_mode == AGGR_THREAD) {
+ nr = perf_thread_map__nr(evsel_list->core.threads);
+ stat_config.aggr_map = cpu_aggr_map__empty_new(nr);
+ if (stat_config.aggr_map == NULL)
+ return -ENOMEM;
+
+ for (int s = 0; s < nr; s++) {
+ struct aggr_cpu_id id = aggr_cpu_id__empty();
+
+ id.thread_idx = s;
+ stat_config.aggr_map->map[s] = id;
+ }
+ return 0;
+ }
+
/*
* The evsel_list->cpus is the base we operate on,
* taking the highest cpu number to be the size of
@@ -1674,6 +1689,22 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
aggr_cpu_id_get_t get_id = aggr_mode__get_aggr_file(stat_config.aggr_mode);
bool needs_sort = stat_config.aggr_mode != AGGR_NONE;

+ if (stat_config.aggr_mode == AGGR_THREAD) {
+ int nr = perf_thread_map__nr(evsel_list->core.threads);
+
+ stat_config.aggr_map = cpu_aggr_map__empty_new(nr);
+ if (stat_config.aggr_map == NULL)
+ return -ENOMEM;
+
+ for (int s = 0; s < nr; s++) {
+ struct aggr_cpu_id id = aggr_cpu_id__empty();
+
+ id.thread_idx = s;
+ stat_config.aggr_map->map[s] = id;
+ }
+ return 0;
+ }
+
if (!get_id)
return 0;

diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 99874254809d..3593af403188 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -403,6 +403,24 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
evsel__compute_deltas(evsel, cpu_map_idx, thread, count);
perf_counts_values__scale(count, config->scale, NULL);

+ if (config->aggr_mode == AGGR_THREAD) {
+ struct perf_counts_values *aggr_counts = &ps->aggr[thread].counts;
+
+ /*
+ * Skip value 0 when enabling --per-thread globally,
+ * otherwise too many 0 output.
+ */
+ if (count->val == 0 && config->system_wide)
+ return 0;
+
+ ps->aggr[thread].nr++;
+
+ aggr_counts->val += count->val;
+ aggr_counts->ena += count->ena;
+ aggr_counts->run += count->run;
+ goto update;
+ }
+
if (ps->aggr) {
struct perf_cpu cpu = perf_cpu_map__cpu(evsel->core.cpus, cpu_map_idx);
struct aggr_cpu_id aggr_id = config->aggr_get_id(config, cpu);
@@ -417,10 +435,11 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
ps_aggr->nr++;

/*
- * When any result is bad, make them all to give
- * consistent output in interval mode.
+ * When any result is bad, make them all to give consistent output
+ * in interval mode. But per-task counters can have 0 enabled time
+ * when some tasks are idle.
*/
- if (count->ena == 0 || count->run == 0 ||
+ if (((count->ena == 0 || count->run == 0) && cpu.cpu != -1) ||
evsel->counts->scaled == -1) {
ps_aggr->counts.val = 0;
ps_aggr->counts.ena = 0;
@@ -437,6 +456,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
}
}

+update:
switch (config->aggr_mode) {
case AGGR_THREAD:
case AGGR_CORE:
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 06:41:53

by Namhyung Kim

Subject: [PATCH 08/19] perf stat: Allocate evsel->stats->aggr properly

The perf_stat_config.aggr_map should have the correct size of the
aggregation map.  Use it to allocate the aggr counts.

Also, AGGR_NONE with per-core events can be tricky because it basically
doesn't aggregate, but it needs to do so for per-core events only.  So
only per-core evsels will have stats->aggr data.

Note that other callers of evlist__alloc_stats() might not have a
stat_config or aggr_map.

Acked-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-script.c | 4 ++--
tools/perf/builtin-stat.c | 6 +++---
tools/perf/tests/parse-metric.c | 2 +-
tools/perf/tests/pmu-events.c | 2 +-
tools/perf/util/stat.c | 9 +++++++--
tools/perf/util/stat.h | 3 ++-
6 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 7ca238277d83..d7ec8c1af293 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2049,7 +2049,7 @@ static void perf_sample__fprint_metric(struct perf_script *script,
u64 val;

if (!evsel->stats)
- evlist__alloc_stats(script->session->evlist, false);
+ evlist__alloc_stats(&stat_config, script->session->evlist, /*alloc_raw=*/false);
if (evsel_script(leader)->gnum++ == 0)
perf_stat__reset_shadow_stats();
val = sample->period * evsel->scale;
@@ -3632,7 +3632,7 @@ static int set_maps(struct perf_script *script)

perf_evlist__set_maps(&evlist->core, script->cpus, script->threads);

- if (evlist__alloc_stats(evlist, true))
+ if (evlist__alloc_stats(&stat_config, evlist, /*alloc_raw=*/true))
return -ENOMEM;

script->allocated = true;
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 9053fd4d15a7..92a8e4512f98 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2121,7 +2121,7 @@ static int set_maps(struct perf_stat *st)

perf_evlist__set_maps(&evsel_list->core, st->cpus, st->threads);

- if (evlist__alloc_stats(evsel_list, true))
+ if (evlist__alloc_stats(&stat_config, evsel_list, /*alloc_raw=*/true))
return -ENOMEM;

st->maps_allocated = true;
@@ -2568,10 +2568,10 @@ int cmd_stat(int argc, const char **argv)
goto out;
}

- if (evlist__alloc_stats(evsel_list, interval))
+ if (perf_stat_init_aggr_mode())
goto out;

- if (perf_stat_init_aggr_mode())
+ if (evlist__alloc_stats(&stat_config, evsel_list, interval))
goto out;

/*
diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c
index 68f5a2a03242..21b7ac00d798 100644
--- a/tools/perf/tests/parse-metric.c
+++ b/tools/perf/tests/parse-metric.c
@@ -103,7 +103,7 @@ static int __compute_metric(const char *name, struct value *vals,
if (err)
goto out;

- err = evlist__alloc_stats(evlist, false);
+ err = evlist__alloc_stats(/*config=*/NULL, evlist, /*alloc_raw=*/false);
if (err)
goto out;

diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 097e05c796ab..5d0d3b239a68 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -889,7 +889,7 @@ static int test__parsing_callback(const struct pmu_event *pe, const struct pmu_e
goto out_err;
}

- err = evlist__alloc_stats(evlist, false);
+ err = evlist__alloc_stats(/*config=*/NULL, evlist, /*alloc_raw=*/false);
if (err)
goto out_err;
/*
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index c9d5aa295b54..374149628507 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -211,12 +211,17 @@ static int evsel__alloc_stats(struct evsel *evsel, int nr_aggr, bool alloc_raw)
return 0;
}

-int evlist__alloc_stats(struct evlist *evlist, bool alloc_raw)
+int evlist__alloc_stats(struct perf_stat_config *config,
+ struct evlist *evlist, bool alloc_raw)
{
struct evsel *evsel;
+ int nr_aggr = 0;
+
+ if (config && config->aggr_map)
+ nr_aggr = config->aggr_map->nr;

evlist__for_each_entry(evlist, evsel) {
- if (evsel__alloc_stats(evsel, 0, alloc_raw))
+ if (evsel__alloc_stats(evsel, nr_aggr, alloc_raw))
goto out_free;
}

diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 42453513ffea..0980875b9be1 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -267,7 +267,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
struct runtime_stat *st);
void perf_stat__collect_metric_expr(struct evlist *);

-int evlist__alloc_stats(struct evlist *evlist, bool alloc_raw);
+int evlist__alloc_stats(struct perf_stat_config *config,
+ struct evlist *evlist, bool alloc_raw);
void evlist__free_stats(struct evlist *evlist);
void evlist__reset_stats(struct evlist *evlist);
void evlist__reset_prev_raw_counts(struct evlist *evlist);
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 06:42:44

by Namhyung Kim

Subject: [PATCH 17/19] perf stat: Display event stats using aggr counts

Now the aggr counts are ready for use.  Convert the display routines to
use the aggr counts and update the shadow stats with them.  They don't
need to aggregate counts or collect aliases during display anymore.  Get
rid of the now-unused struct perf_aggr_thread_value.

Note that there's a difference in the display order among the aggr modes.
For per-core/die/socket/node aggregation, it shows the relevant events in
the same unit together, whereas for global/thread/no aggregation it shows
the same event for different units together.  So it still uses separate
code paths to display them due to the ordering.

One more thing to note is that it breaks the per-core event display for
now.  The next patch will fix it to produce output identical to the
current one.

Acked-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/stat-display.c | 421 ++++-----------------------------
tools/perf/util/stat.c | 5 -
tools/perf/util/stat.h | 9 -
3 files changed, 49 insertions(+), 386 deletions(-)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 4113aa86772f..d9cd2db1398b 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -442,31 +442,6 @@ static void print_metric_header(struct perf_stat_config *config,
fprintf(os->fh, "%*s ", config->metric_only_len, unit);
}

-static int first_shadow_map_idx(struct perf_stat_config *config,
- struct evsel *evsel, const struct aggr_cpu_id *id)
-{
- struct perf_cpu_map *cpus = evsel__cpus(evsel);
- struct perf_cpu cpu;
- int idx;
-
- if (config->aggr_mode == AGGR_NONE)
- return perf_cpu_map__idx(cpus, id->cpu);
-
- if (config->aggr_mode == AGGR_THREAD)
- return id->thread_idx;
-
- if (!config->aggr_get_id)
- return 0;
-
- perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
- struct aggr_cpu_id cpu_id = config->aggr_get_id(config, cpu);
-
- if (aggr_cpu_id__equal(&cpu_id, id))
- return idx;
- }
- return 0;
-}
-
static void abs_printout(struct perf_stat_config *config,
struct aggr_cpu_id id, int nr, struct evsel *evsel, double avg)
{
@@ -537,7 +512,7 @@ static bool is_mixed_hw_group(struct evsel *counter)
static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int nr,
struct evsel *counter, double uval,
char *prefix, u64 run, u64 ena, double noise,
- struct runtime_stat *st)
+ struct runtime_stat *st, int map_idx)
{
struct perf_stat_output_ctx out;
struct outstate os = {
@@ -648,8 +623,7 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
print_running(config, run, ena);
}

- perf_stat__print_shadow_stats(config, counter, uval,
- first_shadow_map_idx(config, counter, &id),
+ perf_stat__print_shadow_stats(config, counter, uval, map_idx,
&out, &config->metric_events, st);
if (!config->csv_output && !config->metric_only && !config->json_output) {
print_noise(config, counter, noise);
@@ -657,34 +631,6 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
}
}

-static void aggr_update_shadow(struct perf_stat_config *config,
- struct evlist *evlist)
-{
- int idx, s;
- struct perf_cpu cpu;
- struct aggr_cpu_id s2, id;
- u64 val;
- struct evsel *counter;
- struct perf_cpu_map *cpus;
-
- for (s = 0; s < config->aggr_map->nr; s++) {
- id = config->aggr_map->map[s];
- evlist__for_each_entry(evlist, counter) {
- cpus = evsel__cpus(counter);
- val = 0;
- perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
- s2 = config->aggr_get_id(config, cpu);
- if (!aggr_cpu_id__equal(&s2, &id))
- continue;
- val += perf_counts(counter->counts, idx, 0)->val;
- }
- perf_stat__update_shadow_stats(counter, val,
- first_shadow_map_idx(config, counter, &id),
- &rt_stat);
- }
- }
-}
-
static void uniquify_event_name(struct evsel *counter)
{
char *new_name;
@@ -721,137 +667,51 @@ static void uniquify_event_name(struct evsel *counter)
counter->uniquified_name = true;
}

-static void collect_all_aliases(struct perf_stat_config *config, struct evsel *counter,
- void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
- bool first),
- void *data)
+static bool hybrid_uniquify(struct evsel *evsel, struct perf_stat_config *config)
{
- struct evlist *evlist = counter->evlist;
- struct evsel *alias;
-
- alias = list_prepare_entry(counter, &(evlist->core.entries), core.node);
- list_for_each_entry_continue (alias, &evlist->core.entries, core.node) {
- /* Merge events with the same name, etc. but on different PMUs. */
- if (!strcmp(evsel__name(alias), evsel__name(counter)) &&
- alias->scale == counter->scale &&
- alias->cgrp == counter->cgrp &&
- !strcmp(alias->unit, counter->unit) &&
- evsel__is_clock(alias) == evsel__is_clock(counter) &&
- strcmp(alias->pmu_name, counter->pmu_name)) {
- alias->merged_stat = true;
- cb(config, alias, data, false);
- }
- }
+ return evsel__is_hybrid(evsel) && !config->hybrid_merge;
}

-static bool hybrid_merge(struct evsel *counter, struct perf_stat_config *config,
- bool check)
+static void uniquify_counter(struct perf_stat_config *config, struct evsel *counter)
{
- if (evsel__is_hybrid(counter)) {
- if (check)
- return config->hybrid_merge;
- else
- return !config->hybrid_merge;
- }
-
- return false;
-}
-
-static bool collect_data(struct perf_stat_config *config, struct evsel *counter,
- void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
- bool first),
- void *data)
-{
- if (counter->merged_stat)
- return false;
- cb(config, counter, data, true);
- if (config->no_merge || hybrid_merge(counter, config, false))
+ if (config->no_merge || hybrid_uniquify(counter, config))
uniquify_event_name(counter);
- else if (counter->auto_merge_stats || hybrid_merge(counter, config, true))
- collect_all_aliases(config, counter, cb, data);
- return true;
-}
-
-struct aggr_data {
- u64 ena, run, val;
- struct aggr_cpu_id id;
- int nr;
- int cpu_map_idx;
-};
-
-static void aggr_cb(struct perf_stat_config *config,
- struct evsel *counter, void *data, bool first)
-{
- struct aggr_data *ad = data;
- int idx;
- struct perf_cpu cpu;
- struct perf_cpu_map *cpus;
- struct aggr_cpu_id s2;
-
- cpus = evsel__cpus(counter);
- perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
- struct perf_counts_values *counts;
-
- s2 = config->aggr_get_id(config, cpu);
- if (!aggr_cpu_id__equal(&s2, &ad->id))
- continue;
- if (first)
- ad->nr++;
- counts = perf_counts(counter->counts, idx, 0);
- /*
- * When any result is bad, make them all to give
- * consistent output in interval mode.
- */
- if (counts->ena == 0 || counts->run == 0 ||
- counter->counts->scaled == -1) {
- ad->ena = 0;
- ad->run = 0;
- break;
- }
- ad->val += counts->val;
- ad->ena += counts->ena;
- ad->run += counts->run;
- }
}

static void print_counter_aggrdata(struct perf_stat_config *config,
struct evsel *counter, int s,
char *prefix, bool metric_only,
- bool *first, struct perf_cpu cpu)
+ bool *first)
{
- struct aggr_data ad;
FILE *output = config->output;
u64 ena, run, val;
- int nr;
- struct aggr_cpu_id id;
double uval;
+ struct perf_stat_evsel *ps = counter->stats;
+ struct perf_stat_aggr *aggr = &ps->aggr[s];
+ struct aggr_cpu_id id = config->aggr_map->map[s];
+ double avg = aggr->counts.val;

- ad.id = id = config->aggr_map->map[s];
- ad.val = ad.ena = ad.run = 0;
- ad.nr = 0;
- if (!collect_data(config, counter, aggr_cb, &ad))
+ if (aggr->nr == 0)
return;

- if (perf_pmu__has_hybrid() && ad.ena == 0)
- return;
+ uniquify_counter(config, counter);
+
+ val = aggr->counts.val;
+ ena = aggr->counts.ena;
+ run = aggr->counts.run;

- nr = ad.nr;
- ena = ad.ena;
- run = ad.run;
- val = ad.val;
if (*first && metric_only) {
*first = false;
- aggr_printout(config, counter, id, nr);
+ aggr_printout(config, counter, id, aggr->nr);
}
if (prefix && !metric_only)
fprintf(output, "%s", prefix);

uval = val * counter->scale;
- if (cpu.cpu != -1)
- id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);

- printout(config, id, nr, counter, uval,
- prefix, run, ena, 1.0, &rt_stat);
+ printout(config, id, aggr->nr, counter, uval,
+ prefix, run, ena, avg, &rt_stat, s);
+
if (!metric_only)
fputc('\n', output);
}
@@ -869,8 +729,6 @@ static void print_aggr(struct perf_stat_config *config,
if (!config->aggr_map || !config->aggr_get_id)
return;

- aggr_update_shadow(config, evlist);
-
/*
* With metric_only everything is on a single line.
* Without each counter has its own line.
@@ -881,188 +739,36 @@ static void print_aggr(struct perf_stat_config *config,

first = true;
evlist__for_each_entry(evlist, counter) {
+ if (counter->merged_stat)
+ continue;
+
print_counter_aggrdata(config, counter, s,
- prefix, metric_only,
- &first, (struct perf_cpu){ .cpu = -1 });
+ prefix, metric_only,
+ &first);
}
if (metric_only)
fputc('\n', output);
}
}

-static int cmp_val(const void *a, const void *b)
-{
- return ((struct perf_aggr_thread_value *)b)->val -
- ((struct perf_aggr_thread_value *)a)->val;
-}
-
-static struct perf_aggr_thread_value *sort_aggr_thread(
- struct evsel *counter,
- int *ret,
- struct target *_target)
-{
- int nthreads = perf_thread_map__nr(counter->core.threads);
- int i = 0;
- double uval;
- struct perf_aggr_thread_value *buf;
-
- buf = calloc(nthreads, sizeof(struct perf_aggr_thread_value));
- if (!buf)
- return NULL;
-
- for (int thread = 0; thread < nthreads; thread++) {
- int idx;
- u64 ena = 0, run = 0, val = 0;
-
- perf_cpu_map__for_each_idx(idx, evsel__cpus(counter)) {
- struct perf_counts_values *counts =
- perf_counts(counter->counts, idx, thread);
-
- val += counts->val;
- ena += counts->ena;
- run += counts->run;
- }
-
- uval = val * counter->scale;
-
- /*
- * Skip value 0 when enabling --per-thread globally,
- * otherwise too many 0 output.
- */
- if (uval == 0.0 && target__has_per_thread(_target))
- continue;
-
- buf[i].counter = counter;
- buf[i].id = aggr_cpu_id__empty();
- buf[i].id.thread_idx = thread;
- buf[i].uval = uval;
- buf[i].val = val;
- buf[i].run = run;
- buf[i].ena = ena;
- i++;
- }
-
- qsort(buf, i, sizeof(struct perf_aggr_thread_value), cmp_val);
-
- if (ret)
- *ret = i;
-
- return buf;
-}
-
-static void print_aggr_thread(struct perf_stat_config *config,
- struct target *_target,
- struct evsel *counter, char *prefix)
-{
- FILE *output = config->output;
- int thread, sorted_threads;
- struct aggr_cpu_id id;
- struct perf_aggr_thread_value *buf;
-
- buf = sort_aggr_thread(counter, &sorted_threads, _target);
- if (!buf) {
- perror("cannot sort aggr thread");
- return;
- }
-
- for (thread = 0; thread < sorted_threads; thread++) {
- if (prefix)
- fprintf(output, "%s", prefix);
-
- id = buf[thread].id;
- printout(config, id, 0, buf[thread].counter, buf[thread].uval,
- prefix, buf[thread].run, buf[thread].ena, 1.0,
- &rt_stat);
- fputc('\n', output);
- }
-
- free(buf);
-}
-
-struct caggr_data {
- double avg, avg_enabled, avg_running;
-};
-
-static void counter_aggr_cb(struct perf_stat_config *config __maybe_unused,
- struct evsel *counter, void *data,
- bool first __maybe_unused)
-{
- struct caggr_data *cd = data;
- struct perf_counts_values *aggr = &counter->counts->aggr;
-
- cd->avg += aggr->val;
- cd->avg_enabled += aggr->ena;
- cd->avg_running += aggr->run;
-}
-
-/*
- * Print out the results of a single counter:
- * aggregated counts in system-wide mode
- */
-static void print_counter_aggr(struct perf_stat_config *config,
- struct evsel *counter, char *prefix)
-{
- bool metric_only = config->metric_only;
- FILE *output = config->output;
- double uval;
- struct caggr_data cd = { .avg = 0.0 };
-
- if (!collect_data(config, counter, counter_aggr_cb, &cd))
- return;
-
- if (prefix && !metric_only)
- fprintf(output, "%s", prefix);
-
- uval = cd.avg * counter->scale;
- printout(config, aggr_cpu_id__empty(), 0, counter, uval, prefix, cd.avg_running,
- cd.avg_enabled, cd.avg, &rt_stat);
- if (!metric_only)
- fprintf(output, "\n");
-}
-
-static void counter_cb(struct perf_stat_config *config __maybe_unused,
- struct evsel *counter, void *data,
- bool first __maybe_unused)
-{
- struct aggr_data *ad = data;
-
- ad->val += perf_counts(counter->counts, ad->cpu_map_idx, 0)->val;
- ad->ena += perf_counts(counter->counts, ad->cpu_map_idx, 0)->ena;
- ad->run += perf_counts(counter->counts, ad->cpu_map_idx, 0)->run;
-}
-
-/*
- * Print out the results of a single counter:
- * does not use aggregated count in system-wide
- */
static void print_counter(struct perf_stat_config *config,
struct evsel *counter, char *prefix)
{
- FILE *output = config->output;
- u64 ena, run, val;
- double uval;
- int idx;
- struct perf_cpu cpu;
- struct aggr_cpu_id id;
-
- perf_cpu_map__for_each_cpu(cpu, idx, evsel__cpus(counter)) {
- struct aggr_data ad = { .cpu_map_idx = idx };
-
- if (!collect_data(config, counter, counter_cb, &ad))
- return;
- val = ad.val;
- ena = ad.ena;
- run = ad.run;
+ bool metric_only = config->metric_only;
+ bool first = false;
+ int s;

- if (prefix)
- fprintf(output, "%s", prefix);
+ /* AGGR_THREAD doesn't have config->aggr_get_id */
+ if (!config->aggr_map)
+ return;

- uval = val * counter->scale;
- id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
- printout(config, id, 0, counter, uval, prefix,
- run, ena, 1.0, &rt_stat);
+ if (counter->merged_stat)
+ return;

- fputc('\n', output);
+ for (s = 0; s < config->aggr_map->nr; s++) {
+ print_counter_aggrdata(config, counter, s,
+ prefix, metric_only,
+ &first);
}
}

@@ -1081,6 +787,7 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
u64 ena, run, val;
double uval;
struct aggr_cpu_id id;
+ struct perf_stat_evsel *ps = counter->stats;
int counter_idx = perf_cpu_map__idx(evsel__cpus(counter), cpu);

if (counter_idx < 0)
@@ -1093,13 +800,13 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
aggr_printout(config, counter, id, 0);
first = false;
}
- val = perf_counts(counter->counts, counter_idx, 0)->val;
- ena = perf_counts(counter->counts, counter_idx, 0)->ena;
- run = perf_counts(counter->counts, counter_idx, 0)->run;
+ val = ps->aggr[counter_idx].counts.val;
+ ena = ps->aggr[counter_idx].counts.ena;
+ run = ps->aggr[counter_idx].counts.run;

uval = val * counter->scale;
printout(config, id, 0, counter, uval, prefix,
- run, ena, 1.0, &rt_stat);
+ run, ena, 1.0, &rt_stat, counter_idx);
}
if (!first)
fputc('\n', config->output);
@@ -1135,8 +842,8 @@ static void print_metric_headers(struct perf_stat_config *config,
};
bool first = true;

- if (config->json_output && !config->interval)
- fprintf(config->output, "{");
+ if (config->json_output && !config->interval)
+ fprintf(config->output, "{");

if (prefix && !config->json_output)
fprintf(config->output, "%s", prefix);
@@ -1379,31 +1086,6 @@ static void print_footer(struct perf_stat_config *config)
"the same PMU. Try reorganizing the group.\n");
}

-static void print_percore_thread(struct perf_stat_config *config,
- struct evsel *counter, char *prefix)
-{
- int s;
- struct aggr_cpu_id s2, id;
- struct perf_cpu_map *cpus;
- bool first = true;
- int idx;
- struct perf_cpu cpu;
-
- cpus = evsel__cpus(counter);
- perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
- s2 = config->aggr_get_id(config, cpu);
- for (s = 0; s < config->aggr_map->nr; s++) {
- id = config->aggr_map->map[s];
- if (aggr_cpu_id__equal(&s2, &id))
- break;
- }
-
- print_counter_aggrdata(config, counter, s,
- prefix, false,
- &first, cpu);
- }
-}
-
static void print_percore(struct perf_stat_config *config,
struct evsel *counter, char *prefix)
{
@@ -1416,15 +1098,14 @@ static void print_percore(struct perf_stat_config *config,
return;

if (config->percore_show_thread)
- return print_percore_thread(config, counter, prefix);
+ return print_counter(config, counter, prefix);

for (s = 0; s < config->aggr_map->nr; s++) {
if (prefix && metric_only)
fprintf(output, "%s", prefix);

print_counter_aggrdata(config, counter, s,
- prefix, metric_only,
- &first, (struct perf_cpu){ .cpu = -1 });
+ prefix, metric_only, &first);
}

if (metric_only)
@@ -1469,17 +1150,13 @@ void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *conf
print_aggr(config, evlist, prefix);
break;
case AGGR_THREAD:
- evlist__for_each_entry(evlist, counter) {
- print_aggr_thread(config, _target, counter, prefix);
- }
- break;
case AGGR_GLOBAL:
if (config->iostat_run)
iostat_print_counters(evlist, config, ts, prefix = buf,
- print_counter_aggr);
+ print_counter);
else {
evlist__for_each_entry(evlist, counter) {
- print_counter_aggr(config, counter, prefix);
+ print_counter(config, counter, prefix);
}
if (metric_only)
fputc('\n', config->output);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index d51ba63b9ab2..14c45f4cfdd3 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -547,11 +547,6 @@ int perf_stat_process_counter(struct perf_stat_config *config,
evsel__name(counter), count[0], count[1], count[2]);
}

- /*
- * Save the full runtime - to allow normalization during printout:
- */
- perf_stat__update_shadow_stats(counter, *count, 0, &rt_stat);
-
return 0;
}

diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 3d413ba8c68a..382a1ab92ce1 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -224,15 +224,6 @@ static inline void update_rusage_stats(struct rusage_stats *ru_stats, struct rus
struct evsel;
struct evlist;

-struct perf_aggr_thread_value {
- struct evsel *counter;
- struct aggr_cpu_id id;
- double uval;
- u64 val;
- u64 run;
- u64 ena;
-};
-
bool __perf_stat_evsel__is(struct evsel *evsel, enum perf_stat_evsel_id id);

#define perf_stat_evsel__is(evsel, id) \
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 06:44:13

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 16/19] perf stat: Add perf_stat_process_shadow_stats()

This function updates the shadow stats uniformly using the aggregated
counts, since it uses the aggr_counts for every aggr mode.

For now the shadow stats will be updated twice for each item, since the
display routines will update them once again. But that's fine as it
shows the average values and the duplication will go away eventually.
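
For readers skimming the thread, the "shadow stats" are the per-aggregation-index
record of well-known event values (cycles, instructions, ...) that the display
code later turns into derived metrics. A minimal illustration of the kind of
ratio computed from them (assumption for illustration only; the real work
happens in perf_stat__print_shadow_stats()):

/* illustrative only: the sort of metric derived from the shadow stats */
static double insn_per_cycle(double instructions, double cycles)
{
	return cycles != 0.0 ? instructions / cycles : 0.0;
}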

Acked-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-stat.c | 1 +
tools/perf/util/stat.c | 50 ++++++++++++++++++++-------------------
tools/perf/util/stat.h | 1 +
3 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index d6a006e41da0..d7c52cef70a3 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -489,6 +489,7 @@ static void process_counters(void)

perf_stat_merge_counters(&stat_config, evsel_list);
perf_stat_process_percore(&stat_config, evsel_list);
+ perf_stat_process_shadow_stats(&stat_config, evsel_list);
}

static void process_interval(void)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 4ce457c94898..d51ba63b9ab2 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -455,7 +455,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
aggr_counts->val += count->val;
aggr_counts->ena += count->ena;
aggr_counts->run += count->run;
- goto update;
+ return 0;
}

if (ps->aggr) {
@@ -493,32 +493,10 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
}
}

-update:
- switch (config->aggr_mode) {
- case AGGR_THREAD:
- case AGGR_CORE:
- case AGGR_DIE:
- case AGGR_SOCKET:
- case AGGR_NODE:
- case AGGR_NONE:
- if ((config->aggr_mode == AGGR_NONE) && (!evsel->percore)) {
- perf_stat__update_shadow_stats(evsel, count->val,
- cpu_map_idx, &rt_stat);
- }
-
- if (config->aggr_mode == AGGR_THREAD) {
- perf_stat__update_shadow_stats(evsel, count->val,
- thread, &rt_stat);
- }
- break;
- case AGGR_GLOBAL:
+ if (config->aggr_mode == AGGR_GLOBAL) {
aggr->val += count->val;
aggr->ena += count->ena;
aggr->run += count->run;
- case AGGR_UNSET:
- case AGGR_MAX:
- default:
- break;
}

return 0;
@@ -744,6 +722,30 @@ void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *e
evsel__process_percore(evsel);
}

+static void evsel__update_shadow_stats(struct evsel *evsel)
+{
+ struct perf_stat_evsel *ps = evsel->stats;
+ int i;
+
+ if (ps->aggr == NULL)
+ return;
+
+ for (i = 0; i < ps->nr_aggr; i++) {
+ struct perf_counts_values *aggr_counts = &ps->aggr[i].counts;
+
+ perf_stat__update_shadow_stats(evsel, aggr_counts->val, i, &rt_stat);
+ }
+}
+
+void perf_stat_process_shadow_stats(struct perf_stat_config *config __maybe_unused,
+ struct evlist *evlist)
+{
+ struct evsel *evsel;
+
+ evlist__for_each_entry(evlist, evsel)
+ evsel__update_shadow_stats(evsel);
+}
+
int perf_event__process_stat_event(struct perf_session *session,
union perf_event *event)
{
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index d23f8743e442..3d413ba8c68a 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -284,6 +284,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
struct evsel *counter);
void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist);
void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *evlist);
+void perf_stat_process_shadow_stats(struct perf_stat_config *config, struct evlist *evlist);

struct perf_tool;
union perf_event;
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 06:59:33

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 02/19] perf tools: Use pmu info in evsel__is_hybrid()

If the evsel already has its pmu set, it can use pmu->is_hybrid directly.

Acked-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/evsel.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 196f8e4859d7..a6ea91c72659 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -3132,6 +3132,9 @@ void evsel__zero_per_pkg(struct evsel *evsel)

bool evsel__is_hybrid(struct evsel *evsel)
{
+ if (evsel->pmu)
+ return evsel->pmu->is_hybrid;
+
return evsel->pmu_name && perf_pmu__is_hybrid(evsel->pmu_name);
}

--
2.38.0.413.g74048e4d9e-goog

2022-10-14 07:08:08

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 01/19] perf tools: Save evsel->pmu in parse_events()

Now that evsel has a pmu pointer, let's save the info there and use it
in places like evsel__find_pmu().
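
A rough sketch of the intended effect (assumption, not part of the patch; it
presumes the usual perf-internal headers plus <assert.h>): once the pmu is
saved at parse time, later lookups just return the cached pointer instead of
re-scanning the PMUs.

static struct perf_pmu *cached_pmu(struct evsel *evsel)
{
	struct perf_pmu *pmu = evsel__find_pmu(evsel);	/* scans only if evsel->pmu is NULL */

	assert(pmu == evsel->pmu);	/* evsel__find_pmu() caches its result now */
	return pmu;
}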

Acked-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/evsel.c | 1 +
tools/perf/util/parse-events.c | 1 +
tools/perf/util/pmu.c | 4 ++++
3 files changed, 6 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 76605fde3507..196f8e4859d7 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -467,6 +467,7 @@ struct evsel *evsel__clone(struct evsel *orig)
evsel->collect_stat = orig->collect_stat;
evsel->weak_group = orig->weak_group;
evsel->use_config_name = orig->use_config_name;
+ evsel->pmu = orig->pmu;

if (evsel__copy_config_terms(evsel, orig) < 0)
goto out_err;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 437389dacf48..9e704841273d 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -263,6 +263,7 @@ __add_event(struct list_head *list, int *idx,
evsel->core.own_cpus = perf_cpu_map__get(cpus);
evsel->core.requires_cpu = pmu ? pmu->is_uncore : false;
evsel->auto_merge_stats = auto_merge_stats;
+ evsel->pmu = pmu;

if (name)
evsel->name = strdup(name);
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 74a2cafb4e8d..15bf5943083a 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1048,11 +1048,15 @@ struct perf_pmu *evsel__find_pmu(struct evsel *evsel)
{
struct perf_pmu *pmu = NULL;

+ if (evsel->pmu)
+ return evsel->pmu;
+
while ((pmu = perf_pmu__scan(pmu)) != NULL) {
if (pmu->type == evsel->core.attr.type)
break;
}

+ evsel->pmu = pmu;
return pmu;
}

--
2.38.0.413.g74048e4d9e-goog

2022-10-14 07:08:14

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 11/19] perf stat: Allocate aggr counts for recorded data

The aggr_mode is only set in process_stat_config_event(), which means the
earlier evlist__alloc_stats() cannot allocate the aggr counts because the
aggr_mode is not known at that point.

Do it after setting the aggr_map, using the new evlist__alloc_aggr_stats().

Acked-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-stat.c | 20 +++++++++++++++-----
tools/perf/util/stat.c | 39 +++++++++++++++++++++++++++++++--------
tools/perf/util/stat.h | 2 ++
3 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 6777fef0d56c..2a6a5d0c5563 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1342,7 +1342,11 @@ static struct aggr_cpu_id perf_stat__get_cpu(struct perf_stat_config *config __m
static struct aggr_cpu_id perf_stat__get_aggr(struct perf_stat_config *config,
aggr_get_id_t get_id, struct perf_cpu cpu)
{
- struct aggr_cpu_id id = aggr_cpu_id__empty();
+ struct aggr_cpu_id id;
+
+ /* per-process mode - should use global aggr mode */
+ if (cpu.cpu == -1)
+ return get_id(config, cpu);

if (aggr_cpu_id__is_empty(&config->cpus_aggr_map->map[cpu.cpu]))
config->cpus_aggr_map->map[cpu.cpu] = get_id(config, cpu);
@@ -2125,17 +2129,23 @@ int process_stat_config_event(struct perf_session *session,
if (perf_cpu_map__empty(st->cpus)) {
if (st->aggr_mode != AGGR_UNSET)
pr_warning("warning: processing task data, aggregation mode not set\n");
- return 0;
- }
-
- if (st->aggr_mode != AGGR_UNSET)
+ } else if (st->aggr_mode != AGGR_UNSET) {
stat_config.aggr_mode = st->aggr_mode;
+ }

if (perf_stat.data.is_pipe)
perf_stat_init_aggr_mode();
else
perf_stat_init_aggr_mode_file(st);

+ if (stat_config.aggr_map) {
+ int nr_aggr = stat_config.aggr_map->nr;
+
+ if (evlist__alloc_aggr_stats(session->evlist, nr_aggr) < 0) {
+ pr_err("cannot allocate aggr counts\n");
+ return -1;
+ }
+ }
return 0;
}

diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 3593af403188..d0e5c4c402aa 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -141,6 +141,31 @@ static void evsel__reset_stat_priv(struct evsel *evsel)
memset(aggr, 0, sizeof(*aggr) * ps->nr_aggr);
}

+static int evsel__alloc_aggr_stats(struct evsel *evsel, int nr_aggr)
+{
+ struct perf_stat_evsel *ps = evsel->stats;
+
+ if (ps == NULL)
+ return 0;
+
+ ps->nr_aggr = nr_aggr;
+ ps->aggr = calloc(nr_aggr, sizeof(*ps->aggr));
+ if (ps->aggr == NULL)
+ return -ENOMEM;
+
+ return 0;
+}
+
+int evlist__alloc_aggr_stats(struct evlist *evlist, int nr_aggr)
+{
+ struct evsel *evsel;
+
+ evlist__for_each_entry(evlist, evsel) {
+ if (evsel__alloc_aggr_stats(evsel, nr_aggr) < 0)
+ return -1;
+ }
+ return 0;
+}

static int evsel__alloc_stat_priv(struct evsel *evsel, int nr_aggr)
{
@@ -150,16 +175,14 @@ static int evsel__alloc_stat_priv(struct evsel *evsel, int nr_aggr)
if (ps == NULL)
return -ENOMEM;

- if (nr_aggr) {
- ps->nr_aggr = nr_aggr;
- ps->aggr = calloc(nr_aggr, sizeof(*ps->aggr));
- if (ps->aggr == NULL) {
- free(ps);
- return -ENOMEM;
- }
+ evsel->stats = ps;
+
+ if (nr_aggr && evsel__alloc_aggr_stats(evsel, nr_aggr) < 0) {
+ evsel->stats = NULL;
+ free(ps);
+ return -ENOMEM;
}

- evsel->stats = ps;
perf_stat_evsel_id_init(evsel);
evsel__reset_stat_priv(evsel);
return 0;
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 0980875b9be1..4c00f814bd79 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -275,6 +275,8 @@ void evlist__reset_prev_raw_counts(struct evlist *evlist);
void evlist__copy_prev_raw_counts(struct evlist *evlist);
void evlist__save_aggr_prev_raw_counts(struct evlist *evlist);

+int evlist__alloc_aggr_stats(struct evlist *evlist, int nr_aggr);
+
int perf_stat_process_counter(struct perf_stat_config *config,
struct evsel *counter);
struct perf_tool;
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 07:08:37

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 06/19] perf stat: Add 'needs_sort' argument to cpu_aggr_map__new()

In case of no aggregation, it needs to keep the original (cpu) ordering
in the aggr_map so that it stays in sync with the cpu map. This makes it
easier for the code to handle AGGR_NONE like the other modes.
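
To spell out the invariant this buys (a sketch of the idea, not part of the
patch; it assumes the per-cpu aggr ids from the previous patch and <assert.h>):
when the qsort is skipped, the aggr map and the cpu map can be walked with the
same index.

static void check_map_sync(struct cpu_aggr_map *aggr_map, struct perf_cpu_map *cpus)
{
	struct perf_cpu cpu;
	int idx;

	/* holds because cpu_aggr_map__new() skipped the qsort for AGGR_NONE */
	perf_cpu_map__for_each_cpu(cpu, idx, cpus)
		assert(aggr_map->map[idx].cpu.cpu == cpu.cpu);
}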

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-stat.c | 7 +++++--
tools/perf/util/cpumap.c | 6 ++++--
tools/perf/util/cpumap.h | 2 +-
3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index b03b530fe9a6..9053fd4d15a7 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1458,8 +1458,9 @@ static int perf_stat_init_aggr_mode(void)
aggr_cpu_id_get_t get_id = aggr_mode__get_aggr(stat_config.aggr_mode);

if (get_id) {
+ bool needs_sort = stat_config.aggr_mode != AGGR_NONE;
stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_requested_cpus,
- get_id, /*data=*/NULL);
+ get_id, /*data=*/NULL, needs_sort);
if (!stat_config.aggr_map) {
pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
return -1;
@@ -1674,11 +1675,13 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
{
struct perf_env *env = &st->session->header.env;
aggr_cpu_id_get_t get_id = aggr_mode__get_aggr_file(stat_config.aggr_mode);
+ bool needs_sort = stat_config.aggr_mode != AGGR_NONE;

if (!get_id)
return 0;

- stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_requested_cpus, get_id, env);
+ stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_requested_cpus,
+ get_id, env, needs_sort);
if (!stat_config.aggr_map) {
pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
return -1;
diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index 60209fe87456..6e3fcf523de9 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -234,7 +234,7 @@ static int aggr_cpu_id__cmp(const void *a_pointer, const void *b_pointer)

struct cpu_aggr_map *cpu_aggr_map__new(const struct perf_cpu_map *cpus,
aggr_cpu_id_get_t get_id,
- void *data)
+ void *data, bool needs_sort)
{
int idx;
struct perf_cpu cpu;
@@ -270,8 +270,10 @@ struct cpu_aggr_map *cpu_aggr_map__new(const struct perf_cpu_map *cpus,
if (trimmed_c)
c = trimmed_c;
}
+
/* ensure we process id in increasing order */
- qsort(c->map, c->nr, sizeof(struct aggr_cpu_id), aggr_cpu_id__cmp);
+ if (needs_sort)
+ qsort(c->map, c->nr, sizeof(struct aggr_cpu_id), aggr_cpu_id__cmp);

return c;

diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index b2ff648bc417..da28b3146ef9 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -97,7 +97,7 @@ typedef struct aggr_cpu_id (*aggr_cpu_id_get_t)(struct perf_cpu cpu, void *data)
*/
struct cpu_aggr_map *cpu_aggr_map__new(const struct perf_cpu_map *cpus,
aggr_cpu_id_get_t get_id,
- void *data);
+ void *data, bool needs_sort);

bool aggr_cpu_id__equal(const struct aggr_cpu_id *a, const struct aggr_cpu_id *b);
bool aggr_cpu_id__is_empty(const struct aggr_cpu_id *a);
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 07:12:47

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 05/19] perf stat: Add cpu aggr id for no aggregation mode

Likewise, add an aggr_id for the cpu in the no-aggregation mode. It is not
actually used yet, but later code will use it to unify the aggregation code.

No functional change intended.
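
For context, the existing aggr_cpu_id__cpu() helper in cpumap.c is expected to
do roughly the following (a sketch based on the other helpers, not shown in
this diff): reuse the core id and additionally record the cpu so that every
CPU gets a unique id.

static struct aggr_cpu_id aggr_cpu_id__cpu_sketch(struct perf_cpu cpu, void *data)
{
	struct aggr_cpu_id id = aggr_cpu_id__core(cpu, data);	/* socket, die, core */

	if (!aggr_cpu_id__is_empty(&id))
		id.cpu = cpu;		/* make the id unique per cpu */
	return id;
}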

Acked-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-stat.c | 48 +++++++++++++++++++++++++++++++++++----
1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 75d16e9705a4..b03b530fe9a6 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1336,6 +1336,12 @@ static struct aggr_cpu_id perf_stat__get_global(struct perf_stat_config *config
return aggr_cpu_id__global(cpu, /*data=*/NULL);
}

+static struct aggr_cpu_id perf_stat__get_cpu(struct perf_stat_config *config __maybe_unused,
+ struct perf_cpu cpu)
+{
+ return aggr_cpu_id__cpu(cpu, /*data=*/NULL);
+}
+
static struct aggr_cpu_id perf_stat__get_aggr(struct perf_stat_config *config,
aggr_get_id_t get_id, struct perf_cpu cpu)
{
@@ -1378,6 +1384,12 @@ static struct aggr_cpu_id perf_stat__get_global_cached(struct perf_stat_config *
return perf_stat__get_aggr(config, perf_stat__get_global, cpu);
}

+static struct aggr_cpu_id perf_stat__get_cpu_cached(struct perf_stat_config *config,
+ struct perf_cpu cpu)
+{
+ return perf_stat__get_aggr(config, perf_stat__get_cpu, cpu);
+}
+
static bool term_percore_set(void)
{
struct evsel *counter;
@@ -1404,8 +1416,7 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
case AGGR_NONE:
if (term_percore_set())
return aggr_cpu_id__core;
-
- return NULL;
+ return aggr_cpu_id__cpu;
case AGGR_GLOBAL:
return aggr_cpu_id__global;
case AGGR_THREAD:
@@ -1428,10 +1439,9 @@ static aggr_get_id_t aggr_mode__get_id(enum aggr_mode aggr_mode)
case AGGR_NODE:
return perf_stat__get_node_cached;
case AGGR_NONE:
- if (term_percore_set()) {
+ if (term_percore_set())
return perf_stat__get_core_cached;
- }
- return NULL;
+ return perf_stat__get_cpu_cached;
case AGGR_GLOBAL:
return perf_stat__get_global_cached;
case AGGR_THREAD:
@@ -1541,6 +1551,26 @@ static struct aggr_cpu_id perf_env__get_core_aggr_by_cpu(struct perf_cpu cpu, vo
return id;
}

+static struct aggr_cpu_id perf_env__get_cpu_aggr_by_cpu(struct perf_cpu cpu, void *data)
+{
+ struct perf_env *env = data;
+ struct aggr_cpu_id id = aggr_cpu_id__empty();
+
+ if (cpu.cpu != -1) {
+ /*
+ * core_id is relative to socket and die,
+ * we need a global id. So we set
+ * socket, die id and core id
+ */
+ id.socket = env->cpu[cpu.cpu].socket_id;
+ id.die = env->cpu[cpu.cpu].die_id;
+ id.core = env->cpu[cpu.cpu].core_id;
+ id.cpu = cpu;
+ }
+
+ return id;
+}
+
static struct aggr_cpu_id perf_env__get_node_aggr_by_cpu(struct perf_cpu cpu, void *data)
{
struct aggr_cpu_id id = aggr_cpu_id__empty();
@@ -1576,6 +1606,12 @@ static struct aggr_cpu_id perf_stat__get_core_file(struct perf_stat_config *conf
return perf_env__get_core_aggr_by_cpu(cpu, &perf_stat.session->header.env);
}

+static struct aggr_cpu_id perf_stat__get_cpu_file(struct perf_stat_config *config __maybe_unused,
+ struct perf_cpu cpu)
+{
+ return perf_env__get_cpu_aggr_by_cpu(cpu, &perf_stat.session->header.env);
+}
+
static struct aggr_cpu_id perf_stat__get_node_file(struct perf_stat_config *config __maybe_unused,
struct perf_cpu cpu)
{
@@ -1602,6 +1638,7 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr_file(enum aggr_mode aggr_mode)
case AGGR_GLOBAL:
return perf_env__get_global_aggr_by_cpu;
case AGGR_NONE:
+ return perf_env__get_cpu_aggr_by_cpu;
case AGGR_THREAD:
case AGGR_UNSET:
case AGGR_MAX:
@@ -1624,6 +1661,7 @@ static aggr_get_id_t aggr_mode__get_id_file(enum aggr_mode aggr_mode)
case AGGR_GLOBAL:
return perf_stat__get_global_file;
case AGGR_NONE:
+ return perf_stat__get_cpu_file;
case AGGR_THREAD:
case AGGR_UNSET:
case AGGR_MAX:
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 07:16:04

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCHSET 00/19] perf stat: Cleanup counter aggregation (v2)

On Thu, Oct 13, 2022 at 11:15:31PM -0700, Namhyung Kim wrote:
> Hello,
>
> Current perf stat code is somewhat hard to follow since it handles
> many combinations of PMUs/events for given display and aggregation
> options. This is my attempt to clean it up a little. ;-)
>
> changes in v2)
> * fix a segfault in perf stat report for per-process record (Jiri)
> * fix metric only display (Jiri)
> * add evsel__reset_aggr_stat (ian)
> * add more comments (Ian)
> * add Acked-by from Ian
>
> My first concern is that aggregation and display routines are intermixed
> and processed differently depends on the aggregation mode. I'd like to
> separate them apart and make the logic clearer.
>
> To do that, I added struct perf_stat_aggr to save the aggregated counter
> values and other info. It'll be allocated and processed according to
> the aggr_mode and display logic will use it.
>
> I've tested the following combination.
>
> $ cat test-matrix.sh
> #!/bin/sh
>
> set -e
>
> yes > /dev/null &
> TARGET=$!
>
> ./perf stat true
> ./perf stat -a true
> ./perf stat -C0 true
> ./perf stat -p $TARGET true
> ./perf stat -t $TARGET true
>
> ./perf stat -a -A true
> ./perf stat -a --per-node true
> ./perf stat -a --per-socket true
> ./perf stat -a --per-die true
> ./perf stat -a --per-core true
> ./perf stat -a --per-thread true
>
> ./perf stat -a -I 500 sleep 1
> ./perf stat -a -I 500 --summary sleep 1
> ./perf stat -a -I 500 --per-socket sleep 1
> ./perf stat -a -I 500 --summary --per-socket sleep 1
>
> ./perf stat -a --metric-only true
> ./perf stat -a --metric-only --per-socket true
> ./perf stat -a --metric-only -I 500 sleep 1
> ./perf stat -a --metric-only -I 500 --per-socket sleep 1
>
> ./perf stat record true && ./perf stat report
> ./perf stat record -p $TARGET true && ./perf stat report
> ./perf stat record -a true && ./perf stat report
> ./perf stat record -a --per-core true && ./perf stat report
> ./perf stat record -a --per-core --metric-only true && ./perf stat report
> ./perf stat record -a -I 500 sleep 1 && ./perf stat report
> ./perf stat record -a -I 500 --per-core sleep 1 && ./perf stat report
> ./perf stat record -a -I 500 --per-core --metric-only sleep 1 && ./perf stat report
>
> ./perf stat -a -A -e cpu/event=cpu-cycles,percore/ true
> ./perf stat -a -A -e cpu/event=cpu-cycles,percore/ --percore-show-thread true
>
> kill $TARGET
>
> The code is available at 'perf/stat-aggr-v2' branch in
>
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Michael,
any chance you could run your test suite on top of this change?

thanks,
jirka

>
> Thanks,
> Namhyung
>
> Namhyung Kim (19):
> perf tools: Save evsel->pmu in parse_events()
> perf tools: Use pmu info in evsel__is_hybrid()
> perf stat: Use evsel__is_hybrid() more
> perf stat: Add aggr id for global mode
> perf stat: Add cpu aggr id for no aggregation mode
> perf stat: Add 'needs_sort' argument to cpu_aggr_map__new()
> perf stat: Add struct perf_stat_aggr to perf_stat_evsel
> perf stat: Allocate evsel->stats->aggr properly
> perf stat: Aggregate events using evsel->stats->aggr
> perf stat: Aggregate per-thread stats using evsel->stats->aggr
> perf stat: Allocate aggr counts for recorded data
> perf stat: Reset aggr counts for each interval
> perf stat: Split process_counters()
> perf stat: Add perf_stat_merge_counters()
> perf stat: Add perf_stat_process_percore()
> perf stat: Add perf_stat_process_shadow_stats()
> perf stat: Display event stats using aggr counts
> perf stat: Display percore events properly
> perf stat: Remove unused perf_counts.aggr field
>
> tools/perf/builtin-script.c | 4 +-
> tools/perf/builtin-stat.c | 186 +++++--
> tools/perf/tests/parse-metric.c | 2 +-
> tools/perf/tests/pmu-events.c | 2 +-
> tools/perf/util/counts.c | 1 -
> tools/perf/util/counts.h | 1 -
> tools/perf/util/cpumap.c | 16 +-
> tools/perf/util/cpumap.h | 8 +-
> tools/perf/util/evsel.c | 13 +-
> tools/perf/util/parse-events.c | 1 +
> tools/perf/util/pmu.c | 4 +
> .../scripting-engines/trace-event-python.c | 6 -
> tools/perf/util/stat-display.c | 462 +++---------------
> tools/perf/util/stat.c | 385 ++++++++++++---
> tools/perf/util/stat.h | 40 +-
> 15 files changed, 602 insertions(+), 529 deletions(-)
>
>
> base-commit: d79310700590b8b40d8c867012d6c899ea6fd505
> --
> 2.38.0.413.g74048e4d9e-goog
>

2022-10-14 07:17:18

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 12/19] perf stat: Reset aggr counts for each interval

The aggr counts in evsel->stats->aggr should be reset for interval
processing since we want to use the values directly for display.
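
For context, the aggregation step from the earlier patches keeps adding into
the same per-aggr buckets on every read, roughly like the sketch below, so
without a reset the buckets would keep growing across intervals.

static void accumulate(struct perf_stat_aggr *bucket, struct perf_counts_values *count)
{
	bucket->counts.val += count->val;
	bucket->counts.ena += count->ena;
	bucket->counts.run += count->run;
}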

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-stat.c | 3 +++
tools/perf/util/stat.c | 20 +++++++++++++++++---
tools/perf/util/stat.h | 1 +
3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 2a6a5d0c5563..bff28a199dfd 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -492,6 +492,8 @@ static void process_interval(void)
diff_timespec(&rs, &ts, &ref_time);

perf_stat__reset_shadow_per_stat(&rt_stat);
+ evlist__reset_aggr_stats(evsel_list);
+
read_counters(&rs);

if (STAT_RECORD) {
@@ -965,6 +967,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)

evlist__copy_prev_raw_counts(evsel_list);
evlist__reset_prev_raw_counts(evsel_list);
+ evlist__reset_aggr_stats(evsel_list);
perf_stat__reset_shadow_per_stat(&rt_stat);
} else {
update_stats(&walltime_nsecs_stats, t1 - t0);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index d0e5c4c402aa..847481cc3d5a 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -130,17 +130,23 @@ static void perf_stat_evsel_id_init(struct evsel *evsel)
}
}

-static void evsel__reset_stat_priv(struct evsel *evsel)
+static void evsel__reset_aggr_stats(struct evsel *evsel)
{
struct perf_stat_evsel *ps = evsel->stats;
struct perf_stat_aggr *aggr = ps->aggr;

- init_stats(&ps->res_stats);
-
if (aggr)
memset(aggr, 0, sizeof(*aggr) * ps->nr_aggr);
}

+static void evsel__reset_stat_priv(struct evsel *evsel)
+{
+ struct perf_stat_evsel *ps = evsel->stats;
+
+ init_stats(&ps->res_stats);
+ evsel__reset_aggr_stats(evsel);
+}
+
static int evsel__alloc_aggr_stats(struct evsel *evsel, int nr_aggr)
{
struct perf_stat_evsel *ps = evsel->stats;
@@ -276,6 +282,14 @@ void evlist__reset_stats(struct evlist *evlist)
}
}

+void evlist__reset_aggr_stats(struct evlist *evlist)
+{
+ struct evsel *evsel;
+
+ evlist__for_each_entry(evlist, evsel)
+ evsel__reset_aggr_stats(evsel);
+}
+
void evlist__reset_prev_raw_counts(struct evlist *evlist)
{
struct evsel *evsel;
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 4c00f814bd79..809f9f0aff0c 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -276,6 +276,7 @@ void evlist__copy_prev_raw_counts(struct evlist *evlist);
void evlist__save_aggr_prev_raw_counts(struct evlist *evlist);

int evlist__alloc_aggr_stats(struct evlist *evlist, int nr_aggr);
+void evlist__reset_aggr_stats(struct evlist *evlist);

int perf_stat_process_counter(struct perf_stat_config *config,
struct evsel *counter);
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 07:23:52

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 18/19] perf stat: Display percore events properly

The recent change in perf stat broke the percore event display. Note
that the aggr counts are already processed so that every sibling thread
in the same core will get the per-core counter values.

Check percore evsels and skip the sibling threads in the display.
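
In other words, after perf_stat_process_percore() the aggr entries of two
sibling threads on the same core carry identical counts, so printing both
would duplicate the numbers. A hypothetical helper, only to illustrate the
point (not in the patch):

static bool siblings_have_same_counts(struct perf_stat_evsel *ps, int a, int b)
{
	return ps->aggr[a].counts.val == ps->aggr[b].counts.val &&
	       ps->aggr[a].counts.ena == ps->aggr[b].counts.ena &&
	       ps->aggr[a].counts.run == ps->aggr[b].counts.run;
}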

Acked-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-stat.c | 16 ----------------
tools/perf/util/stat-display.c | 27 +++++++++++++++++++++++++--
2 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index d7c52cef70a3..9d35a3338976 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1404,18 +1404,6 @@ static struct aggr_cpu_id perf_stat__get_cpu_cached(struct perf_stat_config *con
return perf_stat__get_aggr(config, perf_stat__get_cpu, cpu);
}

-static bool term_percore_set(void)
-{
- struct evsel *counter;
-
- evlist__for_each_entry(evsel_list, counter) {
- if (counter->percore)
- return true;
- }
-
- return false;
-}
-
static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
{
switch (aggr_mode) {
@@ -1428,8 +1416,6 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
case AGGR_NODE:
return aggr_cpu_id__node;
case AGGR_NONE:
- if (term_percore_set())
- return aggr_cpu_id__core;
return aggr_cpu_id__cpu;
case AGGR_GLOBAL:
return aggr_cpu_id__global;
@@ -1453,8 +1439,6 @@ static aggr_get_id_t aggr_mode__get_id(enum aggr_mode aggr_mode)
case AGGR_NODE:
return perf_stat__get_node_cached;
case AGGR_NONE:
- if (term_percore_set())
- return perf_stat__get_core_cached;
return perf_stat__get_cpu_cached;
case AGGR_GLOBAL:
return perf_stat__get_global_cached;
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index d9cd2db1398b..acdd14ac26db 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -1091,7 +1091,8 @@ static void print_percore(struct perf_stat_config *config,
{
bool metric_only = config->metric_only;
FILE *output = config->output;
- int s;
+ struct cpu_aggr_map *core_map;
+ int s, c, i;
bool first = true;

if (!config->aggr_map || !config->aggr_get_id)
@@ -1100,13 +1101,35 @@ static void print_percore(struct perf_stat_config *config,
if (config->percore_show_thread)
return print_counter(config, counter, prefix);

- for (s = 0; s < config->aggr_map->nr; s++) {
+ core_map = cpu_aggr_map__empty_new(config->aggr_map->nr);
+ if (core_map == NULL) {
+ fprintf(output, "Cannot allocate per-core aggr map for display\n");
+ return;
+ }
+
+ for (s = 0, c = 0; s < config->aggr_map->nr; s++) {
+ struct perf_cpu curr_cpu = config->aggr_map->map[s].cpu;
+ struct aggr_cpu_id core_id = aggr_cpu_id__core(curr_cpu, NULL);
+ bool found = false;
+
+ for (i = 0; i < c; i++) {
+ if (aggr_cpu_id__equal(&core_map->map[i], &core_id)) {
+ found = true;
+ break;
+ }
+ }
+ if (found)
+ continue;
+
if (prefix && metric_only)
fprintf(output, "%s", prefix);

print_counter_aggrdata(config, counter, s,
prefix, metric_only, &first);
+
+ core_map->map[c++] = core_id;
}
+ free(core_map);

if (metric_only)
fputc('\n', output);
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 18:14:39

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCHSET 00/19] perf stat: Cleanup counter aggregation (v2)

Hi Jiri and Michael,

On Thu, Oct 13, 2022 at 11:56 PM Jiri Olsa <[email protected]> wrote:
>
> On Thu, Oct 13, 2022 at 11:15:31PM -0700, Namhyung Kim wrote:
> > Hello,
> >
> > Current perf stat code is somewhat hard to follow since it handles
> > many combinations of PMUs/events for given display and aggregation
> > options. This is my attempt to clean it up a little. ;-)
> >
> > changes in v2)
> > * fix a segfault in perf stat report for per-process record (Jiri)
> > * fix metric only display (Jiri)
> > * add evsel__reset_aggr_stat (ian)
> > * add more comments (Ian)
> > * add Acked-by from Ian
> >
> > My first concern is that aggregation and display routines are intermixed
> > and processed differently depends on the aggregation mode. I'd like to
> > separate them apart and make the logic clearer.
> >
> > To do that, I added struct perf_stat_aggr to save the aggregated counter
> > values and other info. It'll be allocated and processed according to
> > the aggr_mode and display logic will use it.
> >
> > I've tested the following combination.
> >
> > $ cat test-matrix.sh
> > #!/bin/sh
> >
> > set -e
> >
> > yes > /dev/null &
> > TARGET=$!
> >
> > ./perf stat true
> > ./perf stat -a true
> > ./perf stat -C0 true
> > ./perf stat -p $TARGET true
> > ./perf stat -t $TARGET true
> >
> > ./perf stat -a -A true
> > ./perf stat -a --per-node true
> > ./perf stat -a --per-socket true
> > ./perf stat -a --per-die true
> > ./perf stat -a --per-core true
> > ./perf stat -a --per-thread true
> >
> > ./perf stat -a -I 500 sleep 1
> > ./perf stat -a -I 500 --summary sleep 1
> > ./perf stat -a -I 500 --per-socket sleep 1
> > ./perf stat -a -I 500 --summary --per-socket sleep 1
> >
> > ./perf stat -a --metric-only true
> > ./perf stat -a --metric-only --per-socket true
> > ./perf stat -a --metric-only -I 500 sleep 1
> > ./perf stat -a --metric-only -I 500 --per-socket sleep 1
> >
> > ./perf stat record true && ./perf stat report
> > ./perf stat record -p $TARGET true && ./perf stat report
> > ./perf stat record -a true && ./perf stat report
> > ./perf stat record -a --per-core true && ./perf stat report
> > ./perf stat record -a --per-core --metric-only true && ./perf stat report
> > ./perf stat record -a -I 500 sleep 1 && ./perf stat report
> > ./perf stat record -a -I 500 --per-core sleep 1 && ./perf stat report
> > ./perf stat record -a -I 500 --per-core --metric-only sleep 1 && ./perf stat report
> >
> > ./perf stat -a -A -e cpu/event=cpu-cycles,percore/ true
> > ./perf stat -a -A -e cpu/event=cpu-cycles,percore/ --percore-show-thread true
> >
> > kill $TARGET
> >
> > The code is available at 'perf/stat-aggr-v2' branch in
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> Michael,
> ay chance you could run your test suite on top of this change?

I've noticed there's an issue with cgroups. Will send a fix soon.

Thanks,
Namhyung

2022-10-14 18:29:34

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 20/19] perf stat: Factor out evsel__count_has_error()

It's possible for some per-task or per-cgroup events to have 0 enabled/running
time since they are not scheduled on any CPU. Treating the whole event as
failed would not work in this case. Thinking about it again, the check is
only meaningful when a CPU-level aggregation mode is used (like per-socket,
per-core, ...).

To make it clearer, factor out the condition check into the new
evsel__count_has_error() function and add some comments.
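
To spell out the resulting behavior, a hypothetical call site (illustration
only; it treats the static helper as visible, uses <assert.h>, and assumes
evsel->err == 0 and counts->scaled != -1): an idle counter is fine under
AGGR_GLOBAL but is still flagged under a CPU-level aggregation mode.

static void count_has_error_examples(struct evsel *evsel,
				     struct perf_stat_config *global_cfg,
				     struct perf_stat_config *percore_cfg)
{
	/* global_cfg->aggr_mode == AGGR_GLOBAL, percore_cfg->aggr_mode == AGGR_CORE */
	struct perf_counts_values idle = { .val = 0, .ena = 0, .run = 0 };

	assert(!evsel__count_has_error(evsel, &idle, global_cfg));	/* ok */
	assert(evsel__count_has_error(evsel, &idle, percore_cfg));	/* flagged */
}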

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/stat.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 6ab9c58beca7..9dfa8cac6bc4 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -396,6 +396,25 @@ static int check_per_pkg(struct evsel *counter, struct perf_counts_values *vals,
return ret;
}

+static bool evsel__count_has_error(struct evsel *evsel,
+ struct perf_counts_values *count,
+ struct perf_stat_config *config)
+{
+ /* the evsel was failed already */
+ if (evsel->err || evsel->counts->scaled == -1)
+ return true;
+
+ /* this is meaningful for CPU aggregation modes only */
+ if (config->aggr_mode == AGGR_GLOBAL)
+ return false;
+
+ /* it's considered ok when it actually ran */
+ if (count->ena != 0 && count->run != 0)
+ return false;
+
+ return true;
+}
+
static int
process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
int cpu_map_idx, int thread,
@@ -450,11 +469,9 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,

/*
* When any result is bad, make them all to give consistent output
- * in interval mode. But per-task counters can have 0 enabled time
- * when some tasks are idle.
+ * in interval mode.
*/
- if (((count->ena == 0 || count->run == 0) && cpu.cpu != -1) ||
- evsel->counts->scaled == -1) {
+ if (evsel__count_has_error(evsel, count, config) && !ps_aggr->failed) {
ps_aggr->counts.val = 0;
ps_aggr->counts.ena = 0;
ps_aggr->counts.run = 0;
--
2.38.0.413.g74048e4d9e-goog

2022-10-14 20:37:50

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 20/19] perf stat: Factor out evsel__count_has_error()

Em Fri, Oct 14, 2022 at 11:16:55AM -0700, Namhyung Kim escreveu:
> It's possible for some per-task or per-cgroup events to have 0 enabled/running
> time since they are not scheduled on any CPU. Treating the whole event as
> failed would not work in this case. Thinking about it again, the check is
> only meaningful when a CPU-level aggregation mode is used (like per-socket,
> per-core, ...).
>
> To make it clearer, factor out the condition check into the new
> evsel__count_has_error() function and add some comments.

So I should just add this one to the 19-long patchkit I already applied
locally, ok.

- Arnaldo

> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/util/stat.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 6ab9c58beca7..9dfa8cac6bc4 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -396,6 +396,25 @@ static int check_per_pkg(struct evsel *counter, struct perf_counts_values *vals,
> return ret;
> }
>
> +static bool evsel__count_has_error(struct evsel *evsel,
> + struct perf_counts_values *count,
> + struct perf_stat_config *config)
> +{
> + /* the evsel was failed already */
> + if (evsel->err || evsel->counts->scaled == -1)
> + return true;
> +
> + /* this is meaningful for CPU aggregation modes only */
> + if (config->aggr_mode == AGGR_GLOBAL)
> + return false;
> +
> + /* it's considered ok when it actually ran */
> + if (count->ena != 0 && count->run != 0)
> + return false;
> +
> + return true;
> +}
> +
> static int
> process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
> int cpu_map_idx, int thread,
> @@ -450,11 +469,9 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
>
> /*
> * When any result is bad, make them all to give consistent output
> - * in interval mode. But per-task counters can have 0 enabled time
> - * when some tasks are idle.
> + * in interval mode.
> */
> - if (((count->ena == 0 || count->run == 0) && cpu.cpu != -1) ||
> - evsel->counts->scaled == -1) {
> + if (evsel__count_has_error(evsel, count, config) && !ps_aggr->failed) {
> ps_aggr->counts.val = 0;
> ps_aggr->counts.ena = 0;
> ps_aggr->counts.run = 0;
> --
> 2.38.0.413.g74048e4d9e-goog

--

- Arnaldo

2022-10-15 13:13:45

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 17/19] perf stat: Display event stats using aggr counts

Em Thu, Oct 13, 2022 at 11:15:48PM -0700, Namhyung Kim escreveu:
> Now aggr counts are ready for use. Convert the display routines to use
> the aggr counts and update the shadow stat with them. It doesn't need
> to aggregate counts or collect aliases anymore during the display. Get
> rid of now unused struct perf_aggr_thread_value.
>
> Note that there's a difference in the display order among the aggr modes.
> For per-core/die/socket/node aggregation, it shows relevant events in
> the same unit together, whereas for global/thread/no aggregation it shows
> the same events for different units together. So it still uses separate
> code paths to display them due to the ordering.
>
> One more thing to note is that it breaks per-core event display for now.
> The next patch will fix it to have identical output as of now.

This breaks:

[root@quaco perf]# perf test "perf all PMU test"
101: perf all PMU test : FAILED!

[acme@quaco perf]$ git bisect good
8ecdc6436c19480ac9458ab94807aebc5c13de7f is the first bad commit
commit 8ecdc6436c19480ac9458ab94807aebc5c13de7f
Author: Namhyung Kim <[email protected]>
Date: Thu Oct 13 23:15:48 2022 -0700

perf stat: Display event stats using aggr counts

Now aggr counts are ready for use. Convert the display routines to use
the aggr counts and update the shadow stat with them. It doesn't need
to aggregate counts or collect aliases anymore during the display. Get
rid of now unused struct perf_aggr_thread_value.

Note that there's a difference in the display order among the aggr modes.
For per-core/die/socket/node aggregation, it shows relevant events in
the same unit together, whereas for global/thread/no aggregation it shows
the same events for different units together. So it still uses separate
code paths to display them due to the ordering.

One more thing to note is that it breaks per-core event display for now.
The next patch will fix it to have identical output as of now.

Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Athira Jajeev <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools/perf/util/stat-display.c | 421 +++++------------------------------------
tools/perf/util/stat.c | 5 -
tools/perf/util/stat.h | 9 -
3 files changed, 49 insertions(+), 386 deletions(-)
[acme@quaco perf]$


Testing directly on it:

[acme@quaco perf]$ git checkout 8ecdc6436c19480ac9458ab94807aebc5c13de7f
Note: switching to '8ecdc6436c19480ac9458ab94807aebc5c13de7f'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

git switch -c <new-branch-name>

Or undo this operation with:

git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 8ecdc6436c19480a perf stat: Display event stats using aggr counts
[acme@quaco perf]$ rm -rf /tmp/build/$(basename $(pwd)) ; mkdir -p /tmp/build/$(basename $(pwd))
[acme@quaco perf]$ alias m='rm -rf ~/libexec/perf-core/ ; perf stat -e cycles:u,instructions:u make -k O=/tmp/build/perf -C tools/perf install-bin && perf test python'
[acme@quaco perf]$ m
<SNIP build>
19: 'import perf' in python : Ok
[acme@quaco perf]$

[root@quaco perf]# id
uid=0(root) gid=0(root) groups=0(root) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[root@quaco perf]# perf -v
perf version 6.0.g8ecdc6436c19
[root@quaco perf]# perf test "perf all PMU test"
101: perf all PMU test : FAILED!
[root@quaco perf]#

[root@quaco perf]# perf test -v "perf all PMU test"
101: perf all PMU test :
--- start ---
test child forked, pid 602103
Testing cpu/branch-instructions/
Testing cpu/branch-misses/
<SNIP>
Testing msr/cpu_thermal_margin/
Testing msr/mperf/
Testing msr/pperf/
Testing msr/smi/
Testing msr/tsc/
Testing power/energy-cores/
Testing power/energy-gpu/
Testing power/energy-pkg/
Testing power/energy-psys/
Testing power/energy-ram/
Testing uncore_cbox_0/clockticks/
Testing uncore_cbox_1/clockticks/
Event 'uncore_cbox_1/clockticks/' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 172.888 usec (+- 0.119 usec)
Average num. events: 53.000 (+- 0.000)
Average time per event 3.262 usec
Average data synthesis took: 182.632 usec (+- 0.168 usec)
Average num. events: 274.000 (+- 0.000)
Average time per event 0.667 usec

Performance counter stats for 'system wide':


3.723058654 seconds time elapsed
test child finished with -1
---- end ----
perf all PMU test: FAILED!
[root@quaco perf]#

So I think I'll remove the whole series and send to Linus what's left.

- Arnaldo

> Acked-by: Ian Rogers <[email protected]>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/util/stat-display.c | 421 ++++-----------------------------
> tools/perf/util/stat.c | 5 -
> tools/perf/util/stat.h | 9 -
> 3 files changed, 49 insertions(+), 386 deletions(-)
>
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 4113aa86772f..d9cd2db1398b 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -442,31 +442,6 @@ static void print_metric_header(struct perf_stat_config *config,
> fprintf(os->fh, "%*s ", config->metric_only_len, unit);
> }
>
> -static int first_shadow_map_idx(struct perf_stat_config *config,
> - struct evsel *evsel, const struct aggr_cpu_id *id)
> -{
> - struct perf_cpu_map *cpus = evsel__cpus(evsel);
> - struct perf_cpu cpu;
> - int idx;
> -
> - if (config->aggr_mode == AGGR_NONE)
> - return perf_cpu_map__idx(cpus, id->cpu);
> -
> - if (config->aggr_mode == AGGR_THREAD)
> - return id->thread_idx;
> -
> - if (!config->aggr_get_id)
> - return 0;
> -
> - perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> - struct aggr_cpu_id cpu_id = config->aggr_get_id(config, cpu);
> -
> - if (aggr_cpu_id__equal(&cpu_id, id))
> - return idx;
> - }
> - return 0;
> -}
> -
> static void abs_printout(struct perf_stat_config *config,
> struct aggr_cpu_id id, int nr, struct evsel *evsel, double avg)
> {
> @@ -537,7 +512,7 @@ static bool is_mixed_hw_group(struct evsel *counter)
> static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int nr,
> struct evsel *counter, double uval,
> char *prefix, u64 run, u64 ena, double noise,
> - struct runtime_stat *st)
> + struct runtime_stat *st, int map_idx)
> {
> struct perf_stat_output_ctx out;
> struct outstate os = {
> @@ -648,8 +623,7 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
> print_running(config, run, ena);
> }
>
> - perf_stat__print_shadow_stats(config, counter, uval,
> - first_shadow_map_idx(config, counter, &id),
> + perf_stat__print_shadow_stats(config, counter, uval, map_idx,
> &out, &config->metric_events, st);
> if (!config->csv_output && !config->metric_only && !config->json_output) {
> print_noise(config, counter, noise);
> @@ -657,34 +631,6 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
> }
> }
>
> -static void aggr_update_shadow(struct perf_stat_config *config,
> - struct evlist *evlist)
> -{
> - int idx, s;
> - struct perf_cpu cpu;
> - struct aggr_cpu_id s2, id;
> - u64 val;
> - struct evsel *counter;
> - struct perf_cpu_map *cpus;
> -
> - for (s = 0; s < config->aggr_map->nr; s++) {
> - id = config->aggr_map->map[s];
> - evlist__for_each_entry(evlist, counter) {
> - cpus = evsel__cpus(counter);
> - val = 0;
> - perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> - s2 = config->aggr_get_id(config, cpu);
> - if (!aggr_cpu_id__equal(&s2, &id))
> - continue;
> - val += perf_counts(counter->counts, idx, 0)->val;
> - }
> - perf_stat__update_shadow_stats(counter, val,
> - first_shadow_map_idx(config, counter, &id),
> - &rt_stat);
> - }
> - }
> -}
> -
> static void uniquify_event_name(struct evsel *counter)
> {
> char *new_name;
> @@ -721,137 +667,51 @@ static void uniquify_event_name(struct evsel *counter)
> counter->uniquified_name = true;
> }
>
> -static void collect_all_aliases(struct perf_stat_config *config, struct evsel *counter,
> - void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
> - bool first),
> - void *data)
> +static bool hybrid_uniquify(struct evsel *evsel, struct perf_stat_config *config)
> {
> - struct evlist *evlist = counter->evlist;
> - struct evsel *alias;
> -
> - alias = list_prepare_entry(counter, &(evlist->core.entries), core.node);
> - list_for_each_entry_continue (alias, &evlist->core.entries, core.node) {
> - /* Merge events with the same name, etc. but on different PMUs. */
> - if (!strcmp(evsel__name(alias), evsel__name(counter)) &&
> - alias->scale == counter->scale &&
> - alias->cgrp == counter->cgrp &&
> - !strcmp(alias->unit, counter->unit) &&
> - evsel__is_clock(alias) == evsel__is_clock(counter) &&
> - strcmp(alias->pmu_name, counter->pmu_name)) {
> - alias->merged_stat = true;
> - cb(config, alias, data, false);
> - }
> - }
> + return evsel__is_hybrid(evsel) && !config->hybrid_merge;
> }
>
> -static bool hybrid_merge(struct evsel *counter, struct perf_stat_config *config,
> - bool check)
> +static void uniquify_counter(struct perf_stat_config *config, struct evsel *counter)
> {
> - if (evsel__is_hybrid(counter)) {
> - if (check)
> - return config->hybrid_merge;
> - else
> - return !config->hybrid_merge;
> - }
> -
> - return false;
> -}
> -
> -static bool collect_data(struct perf_stat_config *config, struct evsel *counter,
> - void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
> - bool first),
> - void *data)
> -{
> - if (counter->merged_stat)
> - return false;
> - cb(config, counter, data, true);
> - if (config->no_merge || hybrid_merge(counter, config, false))
> + if (config->no_merge || hybrid_uniquify(counter, config))
> uniquify_event_name(counter);
> - else if (counter->auto_merge_stats || hybrid_merge(counter, config, true))
> - collect_all_aliases(config, counter, cb, data);
> - return true;
> -}
> -
> -struct aggr_data {
> - u64 ena, run, val;
> - struct aggr_cpu_id id;
> - int nr;
> - int cpu_map_idx;
> -};
> -
> -static void aggr_cb(struct perf_stat_config *config,
> - struct evsel *counter, void *data, bool first)
> -{
> - struct aggr_data *ad = data;
> - int idx;
> - struct perf_cpu cpu;
> - struct perf_cpu_map *cpus;
> - struct aggr_cpu_id s2;
> -
> - cpus = evsel__cpus(counter);
> - perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> - struct perf_counts_values *counts;
> -
> - s2 = config->aggr_get_id(config, cpu);
> - if (!aggr_cpu_id__equal(&s2, &ad->id))
> - continue;
> - if (first)
> - ad->nr++;
> - counts = perf_counts(counter->counts, idx, 0);
> - /*
> - * When any result is bad, make them all to give
> - * consistent output in interval mode.
> - */
> - if (counts->ena == 0 || counts->run == 0 ||
> - counter->counts->scaled == -1) {
> - ad->ena = 0;
> - ad->run = 0;
> - break;
> - }
> - ad->val += counts->val;
> - ad->ena += counts->ena;
> - ad->run += counts->run;
> - }
> }
>
> static void print_counter_aggrdata(struct perf_stat_config *config,
> struct evsel *counter, int s,
> char *prefix, bool metric_only,
> - bool *first, struct perf_cpu cpu)
> + bool *first)
> {
> - struct aggr_data ad;
> FILE *output = config->output;
> u64 ena, run, val;
> - int nr;
> - struct aggr_cpu_id id;
> double uval;
> + struct perf_stat_evsel *ps = counter->stats;
> + struct perf_stat_aggr *aggr = &ps->aggr[s];
> + struct aggr_cpu_id id = config->aggr_map->map[s];
> + double avg = aggr->counts.val;
>
> - ad.id = id = config->aggr_map->map[s];
> - ad.val = ad.ena = ad.run = 0;
> - ad.nr = 0;
> - if (!collect_data(config, counter, aggr_cb, &ad))
> + if (aggr->nr == 0)
> return;
>
> - if (perf_pmu__has_hybrid() && ad.ena == 0)
> - return;
> + uniquify_counter(config, counter);
> +
> + val = aggr->counts.val;
> + ena = aggr->counts.ena;
> + run = aggr->counts.run;
>
> - nr = ad.nr;
> - ena = ad.ena;
> - run = ad.run;
> - val = ad.val;
> if (*first && metric_only) {
> *first = false;
> - aggr_printout(config, counter, id, nr);
> + aggr_printout(config, counter, id, aggr->nr);
> }
> if (prefix && !metric_only)
> fprintf(output, "%s", prefix);
>
> uval = val * counter->scale;
> - if (cpu.cpu != -1)
> - id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
>
> - printout(config, id, nr, counter, uval,
> - prefix, run, ena, 1.0, &rt_stat);
> + printout(config, id, aggr->nr, counter, uval,
> + prefix, run, ena, avg, &rt_stat, s);
> +
> if (!metric_only)
> fputc('\n', output);
> }
> @@ -869,8 +729,6 @@ static void print_aggr(struct perf_stat_config *config,
> if (!config->aggr_map || !config->aggr_get_id)
> return;
>
> - aggr_update_shadow(config, evlist);
> -
> /*
> * With metric_only everything is on a single line.
> * Without each counter has its own line.
> @@ -881,188 +739,36 @@ static void print_aggr(struct perf_stat_config *config,
>
> first = true;
> evlist__for_each_entry(evlist, counter) {
> + if (counter->merged_stat)
> + continue;
> +
> print_counter_aggrdata(config, counter, s,
> - prefix, metric_only,
> - &first, (struct perf_cpu){ .cpu = -1 });
> + prefix, metric_only,
> + &first);
> }
> if (metric_only)
> fputc('\n', output);
> }
> }
>
> -static int cmp_val(const void *a, const void *b)
> -{
> - return ((struct perf_aggr_thread_value *)b)->val -
> - ((struct perf_aggr_thread_value *)a)->val;
> -}
> -
> -static struct perf_aggr_thread_value *sort_aggr_thread(
> - struct evsel *counter,
> - int *ret,
> - struct target *_target)
> -{
> - int nthreads = perf_thread_map__nr(counter->core.threads);
> - int i = 0;
> - double uval;
> - struct perf_aggr_thread_value *buf;
> -
> - buf = calloc(nthreads, sizeof(struct perf_aggr_thread_value));
> - if (!buf)
> - return NULL;
> -
> - for (int thread = 0; thread < nthreads; thread++) {
> - int idx;
> - u64 ena = 0, run = 0, val = 0;
> -
> - perf_cpu_map__for_each_idx(idx, evsel__cpus(counter)) {
> - struct perf_counts_values *counts =
> - perf_counts(counter->counts, idx, thread);
> -
> - val += counts->val;
> - ena += counts->ena;
> - run += counts->run;
> - }
> -
> - uval = val * counter->scale;
> -
> - /*
> - * Skip value 0 when enabling --per-thread globally,
> - * otherwise too many 0 output.
> - */
> - if (uval == 0.0 && target__has_per_thread(_target))
> - continue;
> -
> - buf[i].counter = counter;
> - buf[i].id = aggr_cpu_id__empty();
> - buf[i].id.thread_idx = thread;
> - buf[i].uval = uval;
> - buf[i].val = val;
> - buf[i].run = run;
> - buf[i].ena = ena;
> - i++;
> - }
> -
> - qsort(buf, i, sizeof(struct perf_aggr_thread_value), cmp_val);
> -
> - if (ret)
> - *ret = i;
> -
> - return buf;
> -}
> -
> -static void print_aggr_thread(struct perf_stat_config *config,
> - struct target *_target,
> - struct evsel *counter, char *prefix)
> -{
> - FILE *output = config->output;
> - int thread, sorted_threads;
> - struct aggr_cpu_id id;
> - struct perf_aggr_thread_value *buf;
> -
> - buf = sort_aggr_thread(counter, &sorted_threads, _target);
> - if (!buf) {
> - perror("cannot sort aggr thread");
> - return;
> - }
> -
> - for (thread = 0; thread < sorted_threads; thread++) {
> - if (prefix)
> - fprintf(output, "%s", prefix);
> -
> - id = buf[thread].id;
> - printout(config, id, 0, buf[thread].counter, buf[thread].uval,
> - prefix, buf[thread].run, buf[thread].ena, 1.0,
> - &rt_stat);
> - fputc('\n', output);
> - }
> -
> - free(buf);
> -}
> -
> -struct caggr_data {
> - double avg, avg_enabled, avg_running;
> -};
> -
> -static void counter_aggr_cb(struct perf_stat_config *config __maybe_unused,
> - struct evsel *counter, void *data,
> - bool first __maybe_unused)
> -{
> - struct caggr_data *cd = data;
> - struct perf_counts_values *aggr = &counter->counts->aggr;
> -
> - cd->avg += aggr->val;
> - cd->avg_enabled += aggr->ena;
> - cd->avg_running += aggr->run;
> -}
> -
> -/*
> - * Print out the results of a single counter:
> - * aggregated counts in system-wide mode
> - */
> -static void print_counter_aggr(struct perf_stat_config *config,
> - struct evsel *counter, char *prefix)
> -{
> - bool metric_only = config->metric_only;
> - FILE *output = config->output;
> - double uval;
> - struct caggr_data cd = { .avg = 0.0 };
> -
> - if (!collect_data(config, counter, counter_aggr_cb, &cd))
> - return;
> -
> - if (prefix && !metric_only)
> - fprintf(output, "%s", prefix);
> -
> - uval = cd.avg * counter->scale;
> - printout(config, aggr_cpu_id__empty(), 0, counter, uval, prefix, cd.avg_running,
> - cd.avg_enabled, cd.avg, &rt_stat);
> - if (!metric_only)
> - fprintf(output, "\n");
> -}
> -
> -static void counter_cb(struct perf_stat_config *config __maybe_unused,
> - struct evsel *counter, void *data,
> - bool first __maybe_unused)
> -{
> - struct aggr_data *ad = data;
> -
> - ad->val += perf_counts(counter->counts, ad->cpu_map_idx, 0)->val;
> - ad->ena += perf_counts(counter->counts, ad->cpu_map_idx, 0)->ena;
> - ad->run += perf_counts(counter->counts, ad->cpu_map_idx, 0)->run;
> -}
> -
> -/*
> - * Print out the results of a single counter:
> - * does not use aggregated count in system-wide
> - */
> static void print_counter(struct perf_stat_config *config,
> struct evsel *counter, char *prefix)
> {
> - FILE *output = config->output;
> - u64 ena, run, val;
> - double uval;
> - int idx;
> - struct perf_cpu cpu;
> - struct aggr_cpu_id id;
> -
> - perf_cpu_map__for_each_cpu(cpu, idx, evsel__cpus(counter)) {
> - struct aggr_data ad = { .cpu_map_idx = idx };
> -
> - if (!collect_data(config, counter, counter_cb, &ad))
> - return;
> - val = ad.val;
> - ena = ad.ena;
> - run = ad.run;
> + bool metric_only = config->metric_only;
> + bool first = false;
> + int s;
>
> - if (prefix)
> - fprintf(output, "%s", prefix);
> + /* AGGR_THREAD doesn't have config->aggr_get_id */
> + if (!config->aggr_map)
> + return;
>
> - uval = val * counter->scale;
> - id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
> - printout(config, id, 0, counter, uval, prefix,
> - run, ena, 1.0, &rt_stat);
> + if (counter->merged_stat)
> + return;
>
> - fputc('\n', output);
> + for (s = 0; s < config->aggr_map->nr; s++) {
> + print_counter_aggrdata(config, counter, s,
> + prefix, metric_only,
> + &first);
> }
> }
>
> @@ -1081,6 +787,7 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
> u64 ena, run, val;
> double uval;
> struct aggr_cpu_id id;
> + struct perf_stat_evsel *ps = counter->stats;
> int counter_idx = perf_cpu_map__idx(evsel__cpus(counter), cpu);
>
> if (counter_idx < 0)
> @@ -1093,13 +800,13 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
> aggr_printout(config, counter, id, 0);
> first = false;
> }
> - val = perf_counts(counter->counts, counter_idx, 0)->val;
> - ena = perf_counts(counter->counts, counter_idx, 0)->ena;
> - run = perf_counts(counter->counts, counter_idx, 0)->run;
> + val = ps->aggr[counter_idx].counts.val;
> + ena = ps->aggr[counter_idx].counts.ena;
> + run = ps->aggr[counter_idx].counts.run;
>
> uval = val * counter->scale;
> printout(config, id, 0, counter, uval, prefix,
> - run, ena, 1.0, &rt_stat);
> + run, ena, 1.0, &rt_stat, counter_idx);
> }
> if (!first)
> fputc('\n', config->output);
> @@ -1135,8 +842,8 @@ static void print_metric_headers(struct perf_stat_config *config,
> };
> bool first = true;
>
> - if (config->json_output && !config->interval)
> - fprintf(config->output, "{");
> + if (config->json_output && !config->interval)
> + fprintf(config->output, "{");
>
> if (prefix && !config->json_output)
> fprintf(config->output, "%s", prefix);
> @@ -1379,31 +1086,6 @@ static void print_footer(struct perf_stat_config *config)
> "the same PMU. Try reorganizing the group.\n");
> }
>
> -static void print_percore_thread(struct perf_stat_config *config,
> - struct evsel *counter, char *prefix)
> -{
> - int s;
> - struct aggr_cpu_id s2, id;
> - struct perf_cpu_map *cpus;
> - bool first = true;
> - int idx;
> - struct perf_cpu cpu;
> -
> - cpus = evsel__cpus(counter);
> - perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> - s2 = config->aggr_get_id(config, cpu);
> - for (s = 0; s < config->aggr_map->nr; s++) {
> - id = config->aggr_map->map[s];
> - if (aggr_cpu_id__equal(&s2, &id))
> - break;
> - }
> -
> - print_counter_aggrdata(config, counter, s,
> - prefix, false,
> - &first, cpu);
> - }
> -}
> -
> static void print_percore(struct perf_stat_config *config,
> struct evsel *counter, char *prefix)
> {
> @@ -1416,15 +1098,14 @@ static void print_percore(struct perf_stat_config *config,
> return;
>
> if (config->percore_show_thread)
> - return print_percore_thread(config, counter, prefix);
> + return print_counter(config, counter, prefix);
>
> for (s = 0; s < config->aggr_map->nr; s++) {
> if (prefix && metric_only)
> fprintf(output, "%s", prefix);
>
> print_counter_aggrdata(config, counter, s,
> - prefix, metric_only,
> - &first, (struct perf_cpu){ .cpu = -1 });
> + prefix, metric_only, &first);
> }
>
> if (metric_only)
> @@ -1469,17 +1150,13 @@ void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *conf
> print_aggr(config, evlist, prefix);
> break;
> case AGGR_THREAD:
> - evlist__for_each_entry(evlist, counter) {
> - print_aggr_thread(config, _target, counter, prefix);
> - }
> - break;
> case AGGR_GLOBAL:
> if (config->iostat_run)
> iostat_print_counters(evlist, config, ts, prefix = buf,
> - print_counter_aggr);
> + print_counter);
> else {
> evlist__for_each_entry(evlist, counter) {
> - print_counter_aggr(config, counter, prefix);
> + print_counter(config, counter, prefix);
> }
> if (metric_only)
> fputc('\n', config->output);
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index d51ba63b9ab2..14c45f4cfdd3 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -547,11 +547,6 @@ int perf_stat_process_counter(struct perf_stat_config *config,
> evsel__name(counter), count[0], count[1], count[2]);
> }
>
> - /*
> - * Save the full runtime - to allow normalization during printout:
> - */
> - perf_stat__update_shadow_stats(counter, *count, 0, &rt_stat);
> -
> return 0;
> }
>
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index 3d413ba8c68a..382a1ab92ce1 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -224,15 +224,6 @@ static inline void update_rusage_stats(struct rusage_stats *ru_stats, struct rus
> struct evsel;
> struct evlist;
>
> -struct perf_aggr_thread_value {
> - struct evsel *counter;
> - struct aggr_cpu_id id;
> - double uval;
> - u64 val;
> - u64 run;
> - u64 ena;
> -};
> -
> bool __perf_stat_evsel__is(struct evsel *evsel, enum perf_stat_evsel_id id);
>
> #define perf_stat_evsel__is(evsel, id) \
> --
> 2.38.0.413.g74048e4d9e-goog
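
The hunks quoted above switch the display path from re-reading raw per-CPU
counts to walking the pre-aggregated per-id counts in evsel->stats->aggr.
A minimal standalone sketch of that shape, using hypothetical types
(struct stat_evsel, nr_aggr) rather than the real perf definitions:

/* Sketch only: the print loop walks pre-aggregated per-id counts
 * instead of re-summing raw per-CPU values. Hypothetical types. */
#include <stdio.h>

struct counts { unsigned long long val, ena, run; };

struct stat_evsel {
        const char *name;
        double scale;
        int nr_aggr;                    /* one entry per aggregation id */
        struct counts aggr[4];
};

static void print_counter(const struct stat_evsel *ps)
{
        for (int s = 0; s < ps->nr_aggr; s++) {
                double uval = (double)ps->aggr[s].val * ps->scale;
                double enabled = ps->aggr[s].ena ?
                        100.0 * ps->aggr[s].run / ps->aggr[s].ena : 0.0;

                printf("id %d: %15.0f  %s  (%.2f%%)\n",
                       s, uval, ps->name, enabled);
        }
}

int main(void)
{
        struct stat_evsel cycles = {
                .name = "cycles", .scale = 1.0, .nr_aggr = 2,
                .aggr = { { 1000000, 10, 10 }, { 2000000, 10, 5 } },
        };

        print_counter(&cycles);
        return 0;
}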

--

- Arnaldo

2022-10-15 13:36:01

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 01/19] perf tools: Save evsel->pmu in parse_events()

On Thu, Oct 13, 2022 at 11:15:32PM -0700, Namhyung Kim wrote:
> Now evsel has a pmu pointer, let's save the info and use it like in
> evsel__find_pmu().

This one made 'perf test metricgroups' fail; with it removed, that
test passes. Can you please double check?

I have all the other patches in my local perf/core branch and I'm
preparing a pull request to Linus that includes them; holler if you
disagree.

- Arnaldo

> Acked-by: Ian Rogers <[email protected]>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/util/evsel.c | 1 +
> tools/perf/util/parse-events.c | 1 +
> tools/perf/util/pmu.c | 4 ++++
> 3 files changed, 6 insertions(+)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 76605fde3507..196f8e4859d7 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -467,6 +467,7 @@ struct evsel *evsel__clone(struct evsel *orig)
> evsel->collect_stat = orig->collect_stat;
> evsel->weak_group = orig->weak_group;
> evsel->use_config_name = orig->use_config_name;
> + evsel->pmu = orig->pmu;
>
> if (evsel__copy_config_terms(evsel, orig) < 0)
> goto out_err;
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 437389dacf48..9e704841273d 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -263,6 +263,7 @@ __add_event(struct list_head *list, int *idx,
> evsel->core.own_cpus = perf_cpu_map__get(cpus);
> evsel->core.requires_cpu = pmu ? pmu->is_uncore : false;
> evsel->auto_merge_stats = auto_merge_stats;
> + evsel->pmu = pmu;
>
> if (name)
> evsel->name = strdup(name);
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index 74a2cafb4e8d..15bf5943083a 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -1048,11 +1048,15 @@ struct perf_pmu *evsel__find_pmu(struct evsel *evsel)
> {
> struct perf_pmu *pmu = NULL;
>
> + if (evsel->pmu)
> + return evsel->pmu;
> +
> while ((pmu = perf_pmu__scan(pmu)) != NULL) {
> if (pmu->type == evsel->core.attr.type)
> break;
> }
>
> + evsel->pmu = pmu;
> return pmu;
> }
>
> --
> 2.38.0.413.g74048e4d9e-goog
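
The hunk above caches the evsel's PMU pointer so that later
evsel__find_pmu() calls skip the scan. A standalone sketch of the same
caching pattern, with hypothetical types (struct pmu, struct evsel,
pmu_scan()) rather than the real perf ones:

/* Sketch only: lazy caching of a looked-up pointer. Hypothetical types. */
#include <stddef.h>
#include <stdio.h>

struct pmu { int type; const char *name; };
struct evsel { int attr_type; struct pmu *pmu; /* cached result */ };

static struct pmu pmus[] = { { 4, "cpu" }, { 8, "uncore_imc" } };

/* Walk the (hypothetical) PMU list; NULL starts a new scan. */
static struct pmu *pmu_scan(struct pmu *prev)
{
        size_t n = sizeof(pmus) / sizeof(pmus[0]);
        size_t next = prev ? (size_t)(prev - pmus) + 1 : 0;

        return next < n ? &pmus[next] : NULL;
}

static struct pmu *evsel_find_pmu(struct evsel *evsel)
{
        struct pmu *pmu = NULL;

        if (evsel->pmu)                 /* fast path: reuse cached pointer */
                return evsel->pmu;

        while ((pmu = pmu_scan(pmu)) != NULL) {
                if (pmu->type == evsel->attr_type)
                        break;
        }

        evsel->pmu = pmu;               /* cache for the next lookup */
        return pmu;
}

int main(void)
{
        struct evsel ev = { .attr_type = 8, .pmu = NULL };

        printf("first lookup:  %s\n", evsel_find_pmu(&ev)->name);
        printf("second lookup: %s\n", evsel_find_pmu(&ev)->name); /* cached */
        return 0;
}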

--

- Arnaldo

2022-10-16 14:59:16

by Athira Rajeev

[permalink] [raw]
Subject: Re: [PATCH 19/19] perf stat: Remove unused perf_counts.aggr field



> On 14-Oct-2022, at 11:45 AM, Namhyung Kim <[email protected]> wrote:
>
> The aggr field in the struct perf_counts is to keep the aggregated value
> in the AGGR_GLOBAL for the old code. But it's not used anymore.
>
> Acked-by: Ian Rogers <[email protected]>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/util/counts.c | 1 -
> tools/perf/util/counts.h | 1 -
> tools/perf/util/stat.c | 35 ++---------------------------------
> 3 files changed, 2 insertions(+), 35 deletions(-)
>
> diff --git a/tools/perf/util/counts.c b/tools/perf/util/counts.c
> index 7a447d918458..11cd85b278a6 100644
> --- a/tools/perf/util/counts.c
> +++ b/tools/perf/util/counts.c
> @@ -48,7 +48,6 @@ void perf_counts__reset(struct perf_counts *counts)
> {
> xyarray__reset(counts->loaded);
> xyarray__reset(counts->values);
> - memset(&counts->aggr, 0, sizeof(struct perf_counts_values));
> }
>
> void evsel__reset_counts(struct evsel *evsel)
> diff --git a/tools/perf/util/counts.h b/tools/perf/util/counts.h
> index 5de275194f2b..42760242e0df 100644
> --- a/tools/perf/util/counts.h
> +++ b/tools/perf/util/counts.h
> @@ -11,7 +11,6 @@ struct evsel;
>
> struct perf_counts {
> s8 scaled;
> - struct perf_counts_values aggr;
> struct xyarray *values;
> struct xyarray *loaded;
> };
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 14c45f4cfdd3..6ab9c58beca7 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -308,8 +308,6 @@ static void evsel__copy_prev_raw_counts(struct evsel *evsel)
> *perf_counts(evsel->prev_raw_counts, idx, thread);
> }
> }
> -
> - evsel->counts->aggr = evsel->prev_raw_counts->aggr;
> }
>
> void evlist__copy_prev_raw_counts(struct evlist *evlist)
> @@ -320,26 +318,6 @@ void evlist__copy_prev_raw_counts(struct evlist *evlist)
> evsel__copy_prev_raw_counts(evsel);
> }
>
> -void evlist__save_aggr_prev_raw_counts(struct evlist *evlist)
> -{
> - struct evsel *evsel;
> -
> - /*
> - * To collect the overall statistics for interval mode,
> - * we copy the counts from evsel->prev_raw_counts to
> - * evsel->counts. The perf_stat_process_counter creates
> - * aggr values from per cpu values, but the per cpu values
> - * are 0 for AGGR_GLOBAL. So we use a trick that saves the
> - * previous aggr value to the first member of perf_counts,
> - * then aggr calculation in process_counter_values can work
> - * correctly.
> - */
> - evlist__for_each_entry(evlist, evsel) {
> - *perf_counts(evsel->prev_raw_counts, 0, 0) =
> - evsel->prev_raw_counts->aggr;
> - }
> -}
> -
> static size_t pkg_id_hash(const void *__key, void *ctx __maybe_unused)
> {
> uint64_t *key = (uint64_t *) __key;
> @@ -423,7 +401,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
> int cpu_map_idx, int thread,
> struct perf_counts_values *count)
> {
> - struct perf_counts_values *aggr = &evsel->counts->aggr;
> struct perf_stat_evsel *ps = evsel->stats;
> static struct perf_counts_values zero;
> bool skip = false;
> @@ -493,12 +470,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
> }
> }
>
> - if (config->aggr_mode == AGGR_GLOBAL) {
> - aggr->val += count->val;
> - aggr->ena += count->ena;
> - aggr->run += count->run;
> - }
> -
> return 0;
> }
>
> @@ -523,13 +494,10 @@ static int process_counter_maps(struct perf_stat_config *config,
> int perf_stat_process_counter(struct perf_stat_config *config,
> struct evsel *counter)
> {
> - struct perf_counts_values *aggr = &counter->counts->aggr;
> struct perf_stat_evsel *ps = counter->stats;
> - u64 *count = counter->counts->aggr.values;
> + u64 *count;
> int ret;
>
> - aggr->val = aggr->ena = aggr->run = 0;
> -
> if (counter->per_pkg)
> evsel__zero_per_pkg(counter);
>
> @@ -540,6 +508,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
> if (config->aggr_mode != AGGR_GLOBAL)
> return 0;
>
> + count = ps->aggr[0].counts.values;

Hi Namhyung,

We are always using ps->aggr[0] here. Can you please clarify why the first index is always used?

Thanks
Athira
> update_stats(&ps->res_stats, *count);
>
> if (verbose > 0) {
> --
> 2.38.0.413.g74048e4d9e-goog
>

2022-10-17 23:54:32

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 19/19] perf stat: Remove unused perf_counts.aggr field

Hello,

On Sun, Oct 16, 2022 at 6:33 AM Athira Rajeev
<[email protected]> wrote:
>
>
>
> > On 14-Oct-2022, at 11:45 AM, Namhyung Kim <[email protected]> wrote:
> >
> > The aggr field in the struct perf_counts is to keep the aggregated value
> > in the AGGR_GLOBAL for the old code. But it's not used anymore.
> >
> > Acked-by: Ian Rogers <[email protected]>
> > Signed-off-by: Namhyung Kim <[email protected]>
> > ---
> > tools/perf/util/counts.c | 1 -
> > tools/perf/util/counts.h | 1 -
> > tools/perf/util/stat.c | 35 ++---------------------------------
> > 3 files changed, 2 insertions(+), 35 deletions(-)
> >
> > diff --git a/tools/perf/util/counts.c b/tools/perf/util/counts.c
> > index 7a447d918458..11cd85b278a6 100644
> > --- a/tools/perf/util/counts.c
> > +++ b/tools/perf/util/counts.c
> > @@ -48,7 +48,6 @@ void perf_counts__reset(struct perf_counts *counts)
> > {
> > xyarray__reset(counts->loaded);
> > xyarray__reset(counts->values);
> > - memset(&counts->aggr, 0, sizeof(struct perf_counts_values));
> > }
> >
> > void evsel__reset_counts(struct evsel *evsel)
> > diff --git a/tools/perf/util/counts.h b/tools/perf/util/counts.h
> > index 5de275194f2b..42760242e0df 100644
> > --- a/tools/perf/util/counts.h
> > +++ b/tools/perf/util/counts.h
> > @@ -11,7 +11,6 @@ struct evsel;
> >
> > struct perf_counts {
> > s8 scaled;
> > - struct perf_counts_values aggr;
> > struct xyarray *values;
> > struct xyarray *loaded;
> > };
> > diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> > index 14c45f4cfdd3..6ab9c58beca7 100644
> > --- a/tools/perf/util/stat.c
> > +++ b/tools/perf/util/stat.c
> > @@ -308,8 +308,6 @@ static void evsel__copy_prev_raw_counts(struct evsel *evsel)
> > *perf_counts(evsel->prev_raw_counts, idx, thread);
> > }
> > }
> > -
> > - evsel->counts->aggr = evsel->prev_raw_counts->aggr;
> > }
> >
> > void evlist__copy_prev_raw_counts(struct evlist *evlist)
> > @@ -320,26 +318,6 @@ void evlist__copy_prev_raw_counts(struct evlist *evlist)
> > evsel__copy_prev_raw_counts(evsel);
> > }
> >
> > -void evlist__save_aggr_prev_raw_counts(struct evlist *evlist)
> > -{
> > - struct evsel *evsel;
> > -
> > - /*
> > - * To collect the overall statistics for interval mode,
> > - * we copy the counts from evsel->prev_raw_counts to
> > - * evsel->counts. The perf_stat_process_counter creates
> > - * aggr values from per cpu values, but the per cpu values
> > - * are 0 for AGGR_GLOBAL. So we use a trick that saves the
> > - * previous aggr value to the first member of perf_counts,
> > - * then aggr calculation in process_counter_values can work
> > - * correctly.
> > - */
> > - evlist__for_each_entry(evlist, evsel) {
> > - *perf_counts(evsel->prev_raw_counts, 0, 0) =
> > - evsel->prev_raw_counts->aggr;
> > - }
> > -}
> > -
> > static size_t pkg_id_hash(const void *__key, void *ctx __maybe_unused)
> > {
> > uint64_t *key = (uint64_t *) __key;
> > @@ -423,7 +401,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
> > int cpu_map_idx, int thread,
> > struct perf_counts_values *count)
> > {
> > - struct perf_counts_values *aggr = &evsel->counts->aggr;
> > struct perf_stat_evsel *ps = evsel->stats;
> > static struct perf_counts_values zero;
> > bool skip = false;
> > @@ -493,12 +470,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
> > }
> > }
> >
> > - if (config->aggr_mode == AGGR_GLOBAL) {
> > - aggr->val += count->val;
> > - aggr->ena += count->ena;
> > - aggr->run += count->run;
> > - }
> > -
> > return 0;
> > }
> >
> > @@ -523,13 +494,10 @@ static int process_counter_maps(struct perf_stat_config *config,
> > int perf_stat_process_counter(struct perf_stat_config *config,
> > struct evsel *counter)
> > {
> > - struct perf_counts_values *aggr = &counter->counts->aggr;
> > struct perf_stat_evsel *ps = counter->stats;
> > - u64 *count = counter->counts->aggr.values;
> > + u64 *count;
> > int ret;
> >
> > - aggr->val = aggr->ena = aggr->run = 0;
> > -
> > if (counter->per_pkg)
> > evsel__zero_per_pkg(counter);
> >
> > @@ -540,6 +508,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
> > if (config->aggr_mode != AGGR_GLOBAL)
> > return 0;
> >
> > + count = ps->aggr[0].counts.values;
>
> Hi Namhyung,
>
> We are always using ps->aggr[0] here. Can you please clarify why the first index is always used?

Sure, AGGR_GLOBAL should have a single-entry map because
aggr_cpu_id__global() always returns the same value, so we can
safely use index 0. I'll add a comment.
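
A standalone illustration of that single-slot map, with hypothetical
types rather than the perf sources: every CPU resolves to aggregation
index 0, so all per-CPU values land in aggr[0].

/* Sketch only: AGGR_GLOBAL-style aggregation into a single slot. */
#include <stdio.h>

struct counts { unsigned long long val, ena, run; };

/* Every CPU maps to the same aggregation id in this mode. */
static int global_aggr_idx(int cpu) { (void)cpu; return 0; }

int main(void)
{
        struct counts aggr[1] = { { 0, 0, 0 } };        /* single-entry map */
        struct counts per_cpu[4] = {
                { 100, 10, 10 }, { 200, 10, 10 }, { 300, 10, 10 }, { 400, 10, 10 },
        };

        for (int cpu = 0; cpu < 4; cpu++) {
                int s = global_aggr_idx(cpu);           /* always slot 0 */

                aggr[s].val += per_cpu[cpu].val;
                aggr[s].ena += per_cpu[cpu].ena;
                aggr[s].run += per_cpu[cpu].run;
        }

        printf("aggr[0]: val=%llu ena=%llu run=%llu\n",
               aggr[0].val, aggr[0].ena, aggr[0].run);
        return 0;
}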

Thanks,
Namhyung

2022-10-18 04:57:28

by Athira Rajeev

[permalink] [raw]
Subject: Re: [PATCH 19/19] perf stat: Remove unused perf_counts.aggr field



> On 18-Oct-2022, at 5:01 AM, Namhyung Kim <[email protected]> wrote:
>
> Hello,
>
> On Sun, Oct 16, 2022 at 6:33 AM Athira Rajeev
> <[email protected]> wrote:
>>
>>
>>
>>> On 14-Oct-2022, at 11:45 AM, Namhyung Kim <[email protected]> wrote:
>>>
>>> The aggr field in the struct perf_counts is to keep the aggregated value
>>> in the AGGR_GLOBAL for the old code. But it's not used anymore.
>>>
>>> Acked-by: Ian Rogers <[email protected]>
>>> Signed-off-by: Namhyung Kim <[email protected]>
>>> ---
>>> tools/perf/util/counts.c | 1 -
>>> tools/perf/util/counts.h | 1 -
>>> tools/perf/util/stat.c | 35 ++---------------------------------
>>> 3 files changed, 2 insertions(+), 35 deletions(-)
>>>
>>> diff --git a/tools/perf/util/counts.c b/tools/perf/util/counts.c
>>> index 7a447d918458..11cd85b278a6 100644
>>> --- a/tools/perf/util/counts.c
>>> +++ b/tools/perf/util/counts.c
>>> @@ -48,7 +48,6 @@ void perf_counts__reset(struct perf_counts *counts)
>>> {
>>> xyarray__reset(counts->loaded);
>>> xyarray__reset(counts->values);
>>> - memset(&counts->aggr, 0, sizeof(struct perf_counts_values));
>>> }
>>>
>>> void evsel__reset_counts(struct evsel *evsel)
>>> diff --git a/tools/perf/util/counts.h b/tools/perf/util/counts.h
>>> index 5de275194f2b..42760242e0df 100644
>>> --- a/tools/perf/util/counts.h
>>> +++ b/tools/perf/util/counts.h
>>> @@ -11,7 +11,6 @@ struct evsel;
>>>
>>> struct perf_counts {
>>> s8 scaled;
>>> - struct perf_counts_values aggr;
>>> struct xyarray *values;
>>> struct xyarray *loaded;
>>> };
>>> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
>>> index 14c45f4cfdd3..6ab9c58beca7 100644
>>> --- a/tools/perf/util/stat.c
>>> +++ b/tools/perf/util/stat.c
>>> @@ -308,8 +308,6 @@ static void evsel__copy_prev_raw_counts(struct evsel *evsel)
>>> *perf_counts(evsel->prev_raw_counts, idx, thread);
>>> }
>>> }
>>> -
>>> - evsel->counts->aggr = evsel->prev_raw_counts->aggr;
>>> }
>>>
>>> void evlist__copy_prev_raw_counts(struct evlist *evlist)
>>> @@ -320,26 +318,6 @@ void evlist__copy_prev_raw_counts(struct evlist *evlist)
>>> evsel__copy_prev_raw_counts(evsel);
>>> }
>>>
>>> -void evlist__save_aggr_prev_raw_counts(struct evlist *evlist)
>>> -{
>>> - struct evsel *evsel;
>>> -
>>> - /*
>>> - * To collect the overall statistics for interval mode,
>>> - * we copy the counts from evsel->prev_raw_counts to
>>> - * evsel->counts. The perf_stat_process_counter creates
>>> - * aggr values from per cpu values, but the per cpu values
>>> - * are 0 for AGGR_GLOBAL. So we use a trick that saves the
>>> - * previous aggr value to the first member of perf_counts,
>>> - * then aggr calculation in process_counter_values can work
>>> - * correctly.
>>> - */
>>> - evlist__for_each_entry(evlist, evsel) {
>>> - *perf_counts(evsel->prev_raw_counts, 0, 0) =
>>> - evsel->prev_raw_counts->aggr;
>>> - }
>>> -}
>>> -
>>> static size_t pkg_id_hash(const void *__key, void *ctx __maybe_unused)
>>> {
>>> uint64_t *key = (uint64_t *) __key;
>>> @@ -423,7 +401,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
>>> int cpu_map_idx, int thread,
>>> struct perf_counts_values *count)
>>> {
>>> - struct perf_counts_values *aggr = &evsel->counts->aggr;
>>> struct perf_stat_evsel *ps = evsel->stats;
>>> static struct perf_counts_values zero;
>>> bool skip = false;
>>> @@ -493,12 +470,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
>>> }
>>> }
>>>
>>> - if (config->aggr_mode == AGGR_GLOBAL) {
>>> - aggr->val += count->val;
>>> - aggr->ena += count->ena;
>>> - aggr->run += count->run;
>>> - }
>>> -
>>> return 0;
>>> }
>>>
>>> @@ -523,13 +494,10 @@ static int process_counter_maps(struct perf_stat_config *config,
>>> int perf_stat_process_counter(struct perf_stat_config *config,
>>> struct evsel *counter)
>>> {
>>> - struct perf_counts_values *aggr = &counter->counts->aggr;
>>> struct perf_stat_evsel *ps = counter->stats;
>>> - u64 *count = counter->counts->aggr.values;
>>> + u64 *count;
>>> int ret;
>>>
>>> - aggr->val = aggr->ena = aggr->run = 0;
>>> -
>>> if (counter->per_pkg)
>>> evsel__zero_per_pkg(counter);
>>>
>>> @@ -540,6 +508,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
>>> if (config->aggr_mode != AGGR_GLOBAL)
>>> return 0;
>>>
>>> + count = ps->aggr[0].counts.values;
>>
>> Hi Namhyung,
>>
>> We are always using ps->aggr[0] here. Can you please clarify why the first index is always used?
>
> Sure, AGGR_GLOBAL should have a single-entry map because
> aggr_cpu_id__global() always returns the same value, so we can
> safely use index 0. I'll add a comment.

Hi Namhyung,
Sure, Thanks for the clarification.

Athira
>
> Thanks,
> Namhyung