2020-05-08 08:02:03

by Jin Yao

[permalink] [raw]
Subject: [PATCH v4 0/4] perf stat: Support overall statistics for interval mode

Currently perf-stat supports to print counts at regular interval (-I),
but it's not very easy for user to get the overall statistics.

With this patchset, it supports to report the summary at the end of
interval output.

For example,

root@kbl-ppc:~# perf stat -e cycles -I1000 --interval-count 2
# time counts unit events
1.000412064 2,281,114 cycles
2.001383658 2,547,880 cycles

Performance counter stats for 'system wide':

4,828,994 cycles

2.002860349 seconds time elapsed

root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
# time counts unit events
1.000389902 1,536,093 cycles
1.000389902 420,226 instructions # 0.27 insn per cycle
2.001433453 2,213,952 cycles
2.001433453 735,465 instructions # 0.33 insn per cycle

Performance counter stats for 'system wide':

3,750,045 cycles
1,155,691 instructions # 0.31 insn per cycle

2.003023361 seconds time elapsed

root@kbl-ppc:~# perf stat -M CPI,IPC -I1000 --interval-count 2
# time counts unit events
1.000435121 905,303 inst_retired.any # 2.9 CPI
1.000435121 2,663,333 cycles
1.000435121 914,702 inst_retired.any # 0.3 IPC
1.000435121 2,676,559 cpu_clk_unhalted.thread
2.001615941 1,951,092 inst_retired.any # 1.8 CPI
2.001615941 3,551,357 cycles
2.001615941 1,950,837 inst_retired.any # 0.5 IPC
2.001615941 3,551,044 cpu_clk_unhalted.thread

Performance counter stats for 'system wide':

2,856,395 inst_retired.any # 2.2 CPI
6,214,690 cycles
2,865,539 inst_retired.any # 0.5 IPC
6,227,603 cpu_clk_unhalted.thread

2.003403078 seconds time elapsed

v4:
---
1. Create runtime_stat_reset.

2. Zero the aggr in perf_counts__reset and use it to reset
prev_raw_counts.

3. Move affinity setup and read_counter_cpu to a new function
read_affinity_counters. It's only called when stat_config.summary
is not set.

v3:
---
1. 'perf stat: Fix wrong per-thread runtime stat for interval mode'
is a new patch which fixes an existing issue found in test.

2. We use the prev_raw_counts for summary counts. Drop the summary_counts in v2.

3. Fix some issues.

v2:
---
Rebase to perf/core branch

Jin Yao (4):
perf stat: Fix wrong per-thread runtime stat for interval mode
perf counts: Reset prev_raw_counts counts
perf stat: Copy counts from prev_raw_counts to evsel->counts
perf stat: Report summary for interval mode

tools/perf/builtin-stat.c | 97 ++++++++++++++++++++++++++-------------
tools/perf/util/counts.c | 4 +-
tools/perf/util/counts.h | 1 +
tools/perf/util/evsel.c | 1 +
tools/perf/util/stat.c | 33 ++++++++++---
tools/perf/util/stat.h | 2 +
6 files changed, 99 insertions(+), 39 deletions(-)

--
2.17.1


2020-05-08 08:02:08

by Jin Yao

[permalink] [raw]
Subject: [PATCH v4 1/4] perf stat: Fix wrong per-thread runtime stat for interval mode

root@kbl-ppc:~# perf stat --per-thread -e cycles,instructions -I1000 --interval-count 2
1.004171683 perf-3696 8,747,311 cycles
...
1.004171683 perf-3696 691,730 instructions # 0.08 insn per cycle
...
2.006490373 perf-3696 1,749,936 cycles
...
2.006490373 perf-3696 1,484,582 instructions # 0.28 insn per cycle
...

Let's see interval 2.006490373

perf-3696 1,749,936 cycles
perf-3696 1,484,582 instructions # 0.28 insn per cycle

insn per cycle = 1,484,582 / 1,749,936 = 0.85.
But now it's 0.28, that's not correct.

stat_config.stats[] records the per-thread runtime stat. But for interval
mode, it should be reset for each interval.

So now, with this patch,

root@kbl-ppc:~# perf stat --per-thread -e cycles,instructions -I1000 --interval-count 2
1.005818121 perf-8633 9,898,045 cycles
...
1.005818121 perf-8633 693,298 instructions # 0.07 insn per cycle
...
2.007863743 perf-8633 1,551,619 cycles
...
2.007863743 perf-8633 1,317,514 instructions # 0.85 insn per cycle
...

Let's check interval 2.007863743.

insn per cycle = 1,317,514 / 1,551,619 = 0.85. It's correct.

This patch creates runtime_stat_reset, places it next to
untime_stat_new/runtime_stat_delete and moves all runtime_stat
functions before process_interval.

v4:
---
Create runtime_stat_reset.

Fixes: commit 14e72a21c783 ("perf stat: Update or print per-thread stats")
Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/builtin-stat.c | 70 +++++++++++++++++++++++----------------
1 file changed, 41 insertions(+), 29 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index e0c1ad23c768..f3b3a59ac7d2 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -351,6 +351,46 @@ static void read_counters(struct timespec *rs)
}
}

+static int runtime_stat_new(struct perf_stat_config *config, int nthreads)
+{
+ int i;
+
+ config->stats = calloc(nthreads, sizeof(struct runtime_stat));
+ if (!config->stats)
+ return -1;
+
+ config->stats_num = nthreads;
+
+ for (i = 0; i < nthreads; i++)
+ runtime_stat__init(&config->stats[i]);
+
+ return 0;
+}
+
+static void runtime_stat_delete(struct perf_stat_config *config)
+{
+ int i;
+
+ if (!config->stats)
+ return;
+
+ for (i = 0; i < config->stats_num; i++)
+ runtime_stat__exit(&config->stats[i]);
+
+ zfree(&config->stats);
+}
+
+static void runtime_stat_reset(struct perf_stat_config *config)
+{
+ int i;
+
+ if (!config->stats)
+ return;
+
+ for (i = 0; i < config->stats_num; i++)
+ perf_stat__reset_shadow_per_stat(&config->stats[i]);
+}
+
static void process_interval(void)
{
struct timespec ts, rs;
@@ -359,6 +399,7 @@ static void process_interval(void)
diff_timespec(&rs, &ts, &ref_time);

perf_stat__reset_shadow_per_stat(&rt_stat);
+ runtime_stat_reset(&stat_config);
read_counters(&rs);

if (STAT_RECORD) {
@@ -1737,35 +1778,6 @@ int process_cpu_map_event(struct perf_session *session,
return set_maps(st);
}

-static int runtime_stat_new(struct perf_stat_config *config, int nthreads)
-{
- int i;
-
- config->stats = calloc(nthreads, sizeof(struct runtime_stat));
- if (!config->stats)
- return -1;
-
- config->stats_num = nthreads;
-
- for (i = 0; i < nthreads; i++)
- runtime_stat__init(&config->stats[i]);
-
- return 0;
-}
-
-static void runtime_stat_delete(struct perf_stat_config *config)
-{
- int i;
-
- if (!config->stats)
- return;
-
- for (i = 0; i < config->stats_num; i++)
- runtime_stat__exit(&config->stats[i]);
-
- zfree(&config->stats);
-}
-
static const char * const stat_report_usage[] = {
"perf stat report [<options>]",
NULL,
--
2.17.1

2020-05-08 08:02:13

by Jin Yao

[permalink] [raw]
Subject: [PATCH v4 2/4] perf counts: Reset prev_raw_counts counts

When we want to reset the evsel->prev_raw_counts, zeroing the aggr
is not enough, we need to reset the perf_counts too.

The perf_counts__reset zeros the perf_counts, and it should zero
the aggr too. This patch changes perf_counts__reset to non-static,
and calls it in evsel__reset_prev_raw_counts to reset the
prev_raw_counts.

v4:
---
Zeroing the aggr in perf_counts__reset and use it to reset
prev_raw_counts.

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/counts.c | 4 +++-
tools/perf/util/counts.h | 1 +
tools/perf/util/stat.c | 7 ++-----
3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/counts.c b/tools/perf/util/counts.c
index 615c9f3e95cb..582f3aeaf5e4 100644
--- a/tools/perf/util/counts.c
+++ b/tools/perf/util/counts.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <errno.h>
#include <stdlib.h>
+#include <string.h>
#include "evsel.h"
#include "counts.h"
#include <linux/zalloc.h>
@@ -42,10 +43,11 @@ void perf_counts__delete(struct perf_counts *counts)
}
}

-static void perf_counts__reset(struct perf_counts *counts)
+void perf_counts__reset(struct perf_counts *counts)
{
xyarray__reset(counts->loaded);
xyarray__reset(counts->values);
+ memset(&counts->aggr, 0, sizeof(struct perf_counts_values));
}

void evsel__reset_counts(struct evsel *evsel)
diff --git a/tools/perf/util/counts.h b/tools/perf/util/counts.h
index 8f556c6d98fa..7ff36bf6d644 100644
--- a/tools/perf/util/counts.h
+++ b/tools/perf/util/counts.h
@@ -37,6 +37,7 @@ perf_counts__set_loaded(struct perf_counts *counts, int cpu, int thread, bool lo

struct perf_counts *perf_counts__new(int ncpus, int nthreads);
void perf_counts__delete(struct perf_counts *counts);
+void perf_counts__reset(struct perf_counts *counts);

void evsel__reset_counts(struct evsel *evsel);
int evsel__alloc_counts(struct evsel *evsel, int ncpus, int nthreads);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index f4a44df9b221..e397815f0dfb 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -163,11 +163,8 @@ static void evsel__free_prev_raw_counts(struct evsel *evsel)

static void evsel__reset_prev_raw_counts(struct evsel *evsel)
{
- if (evsel->prev_raw_counts) {
- evsel->prev_raw_counts->aggr.val = 0;
- evsel->prev_raw_counts->aggr.ena = 0;
- evsel->prev_raw_counts->aggr.run = 0;
- }
+ if (evsel->prev_raw_counts)
+ perf_counts__reset(evsel->prev_raw_counts);
}

static int evsel__alloc_stats(struct evsel *evsel, bool alloc_raw)
--
2.17.1

2020-05-08 08:03:08

by Jin Yao

[permalink] [raw]
Subject: [PATCH v4 4/4] perf stat: Report summary for interval mode

Currently perf-stat supports to print counts at regular interval (-I),
but it's not very easy for user to get the overall statistics.

The patch uses 'evsel->prev_raw_counts' to get counts for summary.
Copy the counts to 'evsel->counts' after printing the interval results.
Next, we just follow the non-interval processing.

Let's see some examples,

root@kbl-ppc:~# perf stat -e cycles -I1000 --interval-count 2
# time counts unit events
1.000412064 2,281,114 cycles
2.001383658 2,547,880 cycles

Performance counter stats for 'system wide':

4,828,994 cycles

2.002860349 seconds time elapsed

root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
# time counts unit events
1.000389902 1,536,093 cycles
1.000389902 420,226 instructions # 0.27 insn per cycle
2.001433453 2,213,952 cycles
2.001433453 735,465 instructions # 0.33 insn per cycle

Performance counter stats for 'system wide':

3,750,045 cycles
1,155,691 instructions # 0.31 insn per cycle

2.003023361 seconds time elapsed

root@kbl-ppc:~# perf stat -M CPI,IPC -I1000 --interval-count 2
# time counts unit events
1.000435121 905,303 inst_retired.any # 2.9 CPI
1.000435121 2,663,333 cycles
1.000435121 914,702 inst_retired.any # 0.3 IPC
1.000435121 2,676,559 cpu_clk_unhalted.thread
2.001615941 1,951,092 inst_retired.any # 1.8 CPI
2.001615941 3,551,357 cycles
2.001615941 1,950,837 inst_retired.any # 0.5 IPC
2.001615941 3,551,044 cpu_clk_unhalted.thread

Performance counter stats for 'system wide':

2,856,395 inst_retired.any # 2.2 CPI
6,214,690 cycles
2,865,539 inst_retired.any # 0.5 IPC
6,227,603 cpu_clk_unhalted.thread

2.003403078 seconds time elapsed

v4:
---
Move affinity setup and read_counter_cpu to a new function
read_affinity_counters. It's only called when stat_config.summary
is not set.

v3:
---
Use evsel->prev_raw_counts for summary counts

v2:
---
Rebase to perf/core branch

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/builtin-stat.c | 27 ++++++++++++++++++++++++---
tools/perf/util/stat.c | 2 +-
tools/perf/util/stat.h | 1 +
3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index f3b3a59ac7d2..d6a6aa6b997d 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -314,14 +314,14 @@ static int read_counter_cpu(struct evsel *counter, struct timespec *rs, int cpu)
return 0;
}

-static void read_counters(struct timespec *rs)
+static int read_affinity_counters(struct timespec *rs)
{
struct evsel *counter;
struct affinity affinity;
int i, ncpus, cpu;

if (affinity__setup(&affinity) < 0)
- return;
+ return -1;

ncpus = perf_cpu_map__nr(evsel_list->core.all_cpus);
if (!target__has_cpu(&target) || target__has_per_thread(&target))
@@ -341,6 +341,15 @@ static void read_counters(struct timespec *rs)
}
}
affinity__cleanup(&affinity);
+ return 0;
+}
+
+static void read_counters(struct timespec *rs)
+{
+ struct evsel *counter;
+
+ if (!stat_config.summary && (read_affinity_counters(rs) < 0))
+ return;

evlist__for_each_entry(evsel_list, counter) {
if (counter->err)
@@ -394,6 +403,7 @@ static void runtime_stat_reset(struct perf_stat_config *config)
static void process_interval(void)
{
struct timespec ts, rs;
+ struct stats walltime_nsecs_stats_bak;

clock_gettime(CLOCK_MONOTONIC, &ts);
diff_timespec(&rs, &ts, &ref_time);
@@ -407,9 +417,11 @@ static void process_interval(void)
pr_err("failed to write stat round event\n");
}

+ walltime_nsecs_stats_bak = walltime_nsecs_stats;
init_stats(&walltime_nsecs_stats);
update_stats(&walltime_nsecs_stats, stat_config.interval * 1000000);
print_counters(&rs, 0, NULL);
+ walltime_nsecs_stats = walltime_nsecs_stats_bak;
}

static void enable_counters(void)
@@ -765,6 +777,15 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)

update_stats(&walltime_nsecs_stats, t1 - t0);

+ if (interval) {
+ stat_config.interval = 0;
+ stat_config.summary = true;
+ perf_evlist__copy_prev_raw_counts(evsel_list);
+ perf_evlist__reset_prev_raw_counts(evsel_list);
+ runtime_stat_reset(&stat_config);
+ perf_stat__reset_shadow_per_stat(&rt_stat);
+ }
+
/*
* Closing a group leader splits the group, and as we only disable
* group leaders, results in remaining events becoming enabled. To
@@ -2159,7 +2180,7 @@ int cmd_stat(int argc, const char **argv)
}
}

- if (!forever && status != -1 && !interval)
+ if (!forever && status != -1 && (!interval || stat_config.summary))
print_counters(NULL, argc, argv);

if (STAT_RECORD) {
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index aadc723ce871..49e832a9c109 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -388,7 +388,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
* interval mode, otherwise overall avg running
* averages will be shown for each interval.
*/
- if (config->interval) {
+ if (config->interval || config->summary) {
for (i = 0; i < 3; i++)
init_stats(&ps->res_stats[i]);
}
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 62cf72c71869..c60e9e5d6474 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -110,6 +110,7 @@ struct perf_stat_config {
bool all_kernel;
bool all_user;
bool percore_show_thread;
+ bool summary;
FILE *output;
unsigned int interval;
unsigned int timeout;
--
2.17.1

2020-05-08 08:03:13

by Jin Yao

[permalink] [raw]
Subject: [PATCH v4 3/4] perf stat: Copy counts from prev_raw_counts to evsel->counts

It would be useful to support the overall statistics for perf-stat
interval mode. For example, report the summary at the end of
"perf-stat -I" output.

But since perf-stat can support many aggregation modes, such as
--per-thread, --per-socket, -M and etc, we need a solution which
doesn't bring much complexity.

The idea is to use 'evsel->prev_raw_counts' which is updated in
each interval and it's saved with the latest counts. Before reporting
the summary, we copy the counts from evsel->prev_raw_counts to
evsel->counts, and next we just follow non-interval processing.

In evsel__compute_deltas, this patch saves counts to the member
[cpu0,thread0] of perf_counts for AGGR_GLOBAL.

That's because after copying evsel->prev_raw_counts to evsel->counts,
perf_counts(evsel->counts, cpu, thread) are all 0 for AGGR_GLOBAL.
Once we go to process_counter_maps again, all members of perf_counts
are 0.

So this patch uses a trick that saves the previous aggr value to
the member [cpu0,thread0] of perf_counts, then aggr calculation
in process_counter_values can work correctly.

v4:
---
Change the commit message.
No functional change.

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/evsel.c | 1 +
tools/perf/util/stat.c | 24 ++++++++++++++++++++++++
tools/perf/util/stat.h | 1 +
3 files changed, 26 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 28683b0eb738..6fae1ec28886 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1283,6 +1283,7 @@ void evsel__compute_deltas(struct evsel *evsel, int cpu, int thread,
if (cpu == -1) {
tmp = evsel->prev_raw_counts->aggr;
evsel->prev_raw_counts->aggr = *count;
+ *perf_counts(evsel->prev_raw_counts, 0, 0) = *count;
} else {
tmp = *perf_counts(evsel->prev_raw_counts, cpu, thread);
*perf_counts(evsel->prev_raw_counts, cpu, thread) = *count;
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index e397815f0dfb..aadc723ce871 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -225,6 +225,30 @@ void perf_evlist__reset_prev_raw_counts(struct evlist *evlist)
evsel__reset_prev_raw_counts(evsel);
}

+static void perf_evsel__copy_prev_raw_counts(struct evsel *evsel)
+{
+ int ncpus = evsel__nr_cpus(evsel);
+ int nthreads = perf_thread_map__nr(evsel->core.threads);
+
+ for (int thread = 0; thread < nthreads; thread++) {
+ for (int cpu = 0; cpu < ncpus; cpu++) {
+ *perf_counts(evsel->counts, cpu, thread) =
+ *perf_counts(evsel->prev_raw_counts, cpu,
+ thread);
+ }
+ }
+
+ evsel->counts->aggr = evsel->prev_raw_counts->aggr;
+}
+
+void perf_evlist__copy_prev_raw_counts(struct evlist *evlist)
+{
+ struct evsel *evsel;
+
+ evlist__for_each_entry(evlist, evsel)
+ perf_evsel__copy_prev_raw_counts(evsel);
+}
+
static void zero_per_pkg(struct evsel *counter)
{
if (counter->per_pkg_mask)
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index b4fdfaa7f2c0..62cf72c71869 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -198,6 +198,7 @@ int perf_evlist__alloc_stats(struct evlist *evlist, bool alloc_raw);
void perf_evlist__free_stats(struct evlist *evlist);
void perf_evlist__reset_stats(struct evlist *evlist);
void perf_evlist__reset_prev_raw_counts(struct evlist *evlist);
+void perf_evlist__copy_prev_raw_counts(struct evlist *evlist);

int perf_stat_process_counter(struct perf_stat_config *config,
struct evsel *counter);
--
2.17.1

2020-05-13 15:36:59

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v4 3/4] perf stat: Copy counts from prev_raw_counts to evsel->counts

On Fri, May 08, 2020 at 03:58:16PM +0800, Jin Yao wrote:
> It would be useful to support the overall statistics for perf-stat
> interval mode. For example, report the summary at the end of
> "perf-stat -I" output.
>
> But since perf-stat can support many aggregation modes, such as
> --per-thread, --per-socket, -M and etc, we need a solution which
> doesn't bring much complexity.
>
> The idea is to use 'evsel->prev_raw_counts' which is updated in
> each interval and it's saved with the latest counts. Before reporting
> the summary, we copy the counts from evsel->prev_raw_counts to
> evsel->counts, and next we just follow non-interval processing.
>
> In evsel__compute_deltas, this patch saves counts to the member
> [cpu0,thread0] of perf_counts for AGGR_GLOBAL.
>
> That's because after copying evsel->prev_raw_counts to evsel->counts,
> perf_counts(evsel->counts, cpu, thread) are all 0 for AGGR_GLOBAL.
> Once we go to process_counter_maps again, all members of perf_counts
> are 0.
>
> So this patch uses a trick that saves the previous aggr value to
> the member [cpu0,thread0] of perf_counts, then aggr calculation
> in process_counter_values can work correctly.
>
> v4:
> ---
> Change the commit message.
> No functional change.
>
> Signed-off-by: Jin Yao <[email protected]>
> ---
> tools/perf/util/evsel.c | 1 +
> tools/perf/util/stat.c | 24 ++++++++++++++++++++++++
> tools/perf/util/stat.h | 1 +
> 3 files changed, 26 insertions(+)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 28683b0eb738..6fae1ec28886 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1283,6 +1283,7 @@ void evsel__compute_deltas(struct evsel *evsel, int cpu, int thread,
> if (cpu == -1) {
> tmp = evsel->prev_raw_counts->aggr;
> evsel->prev_raw_counts->aggr = *count;
> + *perf_counts(evsel->prev_raw_counts, 0, 0) = *count;

ok, I think I understand that now.. it's only for AGGR_GLOBAL mode,
because the perf_stat_process_counter will create aggr values from
per cpu values

but why do we need to do that all the time? can't we just set it up
before you zero prev_raw_counts in next patch?


if (interval) {
stat_config.interval = 0;
stat_config.summary = true;
perf_evlist__copy_prev_raw_counts(evsel_list);

-> for AGGR_GLOBAL set the counts[0,0] to prev_raw_counts->aggr

perf_evlist__reset_prev_raw_counts(evsel_list);
runtime_stat_reset(&stat_config);
perf_stat__reset_shadow_per_stat(&rt_stat);
}


thanks,
jirka

2020-05-14 05:46:47

by Jin Yao

[permalink] [raw]
Subject: Re: [PATCH v4 3/4] perf stat: Copy counts from prev_raw_counts to evsel->counts

Hi Jiri,

On 5/13/2020 11:31 PM, Jiri Olsa wrote:
> On Fri, May 08, 2020 at 03:58:16PM +0800, Jin Yao wrote:
>> It would be useful to support the overall statistics for perf-stat
>> interval mode. For example, report the summary at the end of
>> "perf-stat -I" output.
>>
>> But since perf-stat can support many aggregation modes, such as
>> --per-thread, --per-socket, -M and etc, we need a solution which
>> doesn't bring much complexity.
>>
>> The idea is to use 'evsel->prev_raw_counts' which is updated in
>> each interval and it's saved with the latest counts. Before reporting
>> the summary, we copy the counts from evsel->prev_raw_counts to
>> evsel->counts, and next we just follow non-interval processing.
>>
>> In evsel__compute_deltas, this patch saves counts to the member
>> [cpu0,thread0] of perf_counts for AGGR_GLOBAL.
>>
>> That's because after copying evsel->prev_raw_counts to evsel->counts,
>> perf_counts(evsel->counts, cpu, thread) are all 0 for AGGR_GLOBAL.
>> Once we go to process_counter_maps again, all members of perf_counts
>> are 0.
>>
>> So this patch uses a trick that saves the previous aggr value to
>> the member [cpu0,thread0] of perf_counts, then aggr calculation
>> in process_counter_values can work correctly.
>>
>> v4:
>> ---
>> Change the commit message.
>> No functional change.
>>
>> Signed-off-by: Jin Yao <[email protected]>
>> ---
>> tools/perf/util/evsel.c | 1 +
>> tools/perf/util/stat.c | 24 ++++++++++++++++++++++++
>> tools/perf/util/stat.h | 1 +
>> 3 files changed, 26 insertions(+)
>>
>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>> index 28683b0eb738..6fae1ec28886 100644
>> --- a/tools/perf/util/evsel.c
>> +++ b/tools/perf/util/evsel.c
>> @@ -1283,6 +1283,7 @@ void evsel__compute_deltas(struct evsel *evsel, int cpu, int thread,
>> if (cpu == -1) {
>> tmp = evsel->prev_raw_counts->aggr;
>> evsel->prev_raw_counts->aggr = *count;
>> + *perf_counts(evsel->prev_raw_counts, 0, 0) = *count;
>
> ok, I think I understand that now.. it's only for AGGR_GLOBAL mode,
> because the perf_stat_process_counter will create aggr values from
> per cpu values
>
> but why do we need to do that all the time? can't we just set it up
> before you zero prev_raw_counts in next patch?
>
>
> if (interval) {
> stat_config.interval = 0;
> stat_config.summary = true;
> perf_evlist__copy_prev_raw_counts(evsel_list);
>
> -> for AGGR_GLOBAL set the counts[0,0] to prev_raw_counts->aggr
>
> perf_evlist__reset_prev_raw_counts(evsel_list);
> runtime_stat_reset(&stat_config);
> perf_stat__reset_shadow_per_stat(&rt_stat);
> }
>


Yes, I think that's a good idea.

Now in v5, I create a new patch "perf stat: Save aggr value to first member of
prev_raw_counts" to save aggr value to first member of prev_raw_counts for
AGGR_GLOBAL. Then next, perf_stat_process_counter can create aggr values from
per cpu values successfully.

Thanks
Jin Yao

>
> thanks,
> jirka
>