2020-05-07 07:04:21

by Jin Yao

[permalink] [raw]
Subject: [PATCH v3 0/4] perf stat: Support overall statistics for interval mode

Currently perf-stat supports to print counts at regular interval (-I),
but it's not very easy for user to get the overall statistics.

With this patchset, it supports to report the summary at the end of
interval output.

For example,

root@kbl-ppc:~# perf stat -e cycles -I1000 --interval-count 2
# time counts unit events
1.000412064 2,281,114 cycles
2.001383658 2,547,880 cycles

Performance counter stats for 'system wide':

4,828,994 cycles

2.002860349 seconds time elapsed

root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
# time counts unit events
1.000389902 1,536,093 cycles
1.000389902 420,226 instructions # 0.27 insn per cycle
2.001433453 2,213,952 cycles
2.001433453 735,465 instructions # 0.33 insn per cycle

Performance counter stats for 'system wide':

3,750,045 cycles
1,155,691 instructions # 0.31 insn per cycle

2.003023361 seconds time elapsed

root@kbl-ppc:~# perf stat -M CPI,IPC -I1000 --interval-count 2
# time counts unit events
1.000435121 905,303 inst_retired.any # 2.9 CPI
1.000435121 2,663,333 cycles
1.000435121 914,702 inst_retired.any # 0.3 IPC
1.000435121 2,676,559 cpu_clk_unhalted.thread
2.001615941 1,951,092 inst_retired.any # 1.8 CPI
2.001615941 3,551,357 cycles
2.001615941 1,950,837 inst_retired.any # 0.5 IPC
2.001615941 3,551,044 cpu_clk_unhalted.thread

Performance counter stats for 'system wide':

2,856,395 inst_retired.any # 2.2 CPI
6,214,690 cycles
2,865,539 inst_retired.any # 0.5 IPC
6,227,603 cpu_clk_unhalted.thread

2.003403078 seconds time elapsed

v3:
---
1. 'perf stat: Fix wrong per-thread runtime stat for interval mode'
is a new patch which fixes an existing issue found in test.

2. We use the prev_raw_counts for summary counts. Drop the summary_counts in v2.

3. Fix some issues.

v2:
---
Rebase to perf/core branch

Jin Yao (4):
perf stat: Fix wrong per-thread runtime stat for interval mode
perf counts: Reset prev_raw_counts counts
perf stat: Copy counts from prev_raw_counts to evsel->counts
perf stat: Report summary for interval mode

tools/perf/builtin-stat.c | 27 +++++++++++++++++++++++++--
tools/perf/util/counts.c | 5 +++++
tools/perf/util/counts.h | 2 ++
tools/perf/util/evsel.c | 1 +
tools/perf/util/stat.c | 27 ++++++++++++++++++++++++++-
tools/perf/util/stat.h | 2 ++
6 files changed, 61 insertions(+), 3 deletions(-)

--
2.17.1


2020-05-07 07:04:44

by Jin Yao

[permalink] [raw]
Subject: [PATCH v3 1/4] perf stat: Fix wrong per-thread runtime stat for interval mode

root@kbl-ppc:~# perf stat --per-thread -e cycles,instructions -I1000 --interval-count 2
1.004171683 perf-3696 8,747,311 cycles
...
1.004171683 perf-3696 691,730 instructions # 0.08 insn per cycle
...
2.006490373 perf-3696 1,749,936 cycles
...
2.006490373 perf-3696 1,484,582 instructions # 0.28 insn per cycle
...

Let's see interval 2.006490373

perf-3696 1,749,936 cycles
perf-3696 1,484,582 instructions # 0.28 insn per cycle

insn per cycle = 1,484,582 / 1,749,936 = 0.85.
But now it's 0.28, that's not correct.

stat_config.stats[] records the per-thread runtime stat. But for interval
mode, it should be reset for each interval.

So now, with this patch,

root@kbl-ppc:~# perf stat --per-thread -e cycles,instructions -I1000 --interval-count 2
1.005818121 perf-8633 9,898,045 cycles
...
1.005818121 perf-8633 693,298 instructions # 0.07 insn per cycle
...
2.007863743 perf-8633 1,551,619 cycles
...
2.007863743 perf-8633 1,317,514 instructions # 0.85 insn per cycle
...

Let's check interval 2.007863743.

insn per cycle = 1,317,514 / 1,551,619 = 0.85. It's correct.

Fixes: commit 14e72a21c783 ("perf stat: Update or print per-thread stats")
Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/builtin-stat.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index e0c1ad23c768..97ee941649e6 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -351,6 +351,16 @@ static void read_counters(struct timespec *rs)
}
}

+static void thread_stats_reset(struct perf_stat_config *config)
+{
+ int i;
+
+ if (config->stats) {
+ for (i = 0; i < config->stats_num; i++)
+ perf_stat__reset_shadow_per_stat(&config->stats[i]);
+ }
+}
+
static void process_interval(void)
{
struct timespec ts, rs;
@@ -359,6 +369,7 @@ static void process_interval(void)
diff_timespec(&rs, &ts, &ref_time);

perf_stat__reset_shadow_per_stat(&rt_stat);
+ thread_stats_reset(&stat_config);
read_counters(&rs);

if (STAT_RECORD) {
--
2.17.1

2020-05-07 07:05:41

by Jin Yao

[permalink] [raw]
Subject: [PATCH v3 3/4] perf stat: Copy counts from prev_raw_counts to evsel->counts

It would be useful to support the overall statistics for perf-stat
interval mode. For example, report the summary at the end of
"perf-stat -I" output.

But since perf-stat can support many aggregation modes, such as
--per-thread, --per-socket, -M and etc, we need a solution which
doesn't bring much complexity.

The idea is to use 'evsel->prev_raw_counts' which is updated in
each interval and it's saved with the latest counts. Before reporting
the summary, we copy the counts from evsel->prev_raw_counts to
evsel->counts, and next we just follow non-interval processing.

In evsel__compute_deltas, this patch saves counts to the position
of [cpu0,thread0] for AGGR_GLOBAL. After copying counts from
evsel->prev_raw_counts to evsel->counts, we don't need to
modify process_counter_maps in perf_stat_process_counter to let it
work well.

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/evsel.c | 1 +
tools/perf/util/stat.c | 24 ++++++++++++++++++++++++
tools/perf/util/stat.h | 1 +
3 files changed, 26 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f3e60c45d59a..898a697c7cdd 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1288,6 +1288,7 @@ void evsel__compute_deltas(struct evsel *evsel, int cpu, int thread,
if (cpu == -1) {
tmp = evsel->prev_raw_counts->aggr;
evsel->prev_raw_counts->aggr = *count;
+ *perf_counts(evsel->prev_raw_counts, 0, 0) = *count;
} else {
tmp = *perf_counts(evsel->prev_raw_counts, cpu, thread);
*perf_counts(evsel->prev_raw_counts, cpu, thread) = *count;
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 89e541564ed5..ede113805ecd 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -230,6 +230,30 @@ void perf_evlist__reset_prev_raw_counts(struct evlist *evlist)
perf_evsel__reset_prev_raw_counts(evsel);
}

+static void perf_evsel__copy_prev_raw_counts(struct evsel *evsel)
+{
+ int ncpus = evsel__nr_cpus(evsel);
+ int nthreads = perf_thread_map__nr(evsel->core.threads);
+
+ for (int thread = 0; thread < nthreads; thread++) {
+ for (int cpu = 0; cpu < ncpus; cpu++) {
+ *perf_counts(evsel->counts, cpu, thread) =
+ *perf_counts(evsel->prev_raw_counts, cpu,
+ thread);
+ }
+ }
+
+ evsel->counts->aggr = evsel->prev_raw_counts->aggr;
+}
+
+void perf_evlist__copy_prev_raw_counts(struct evlist *evlist)
+{
+ struct evsel *evsel;
+
+ evlist__for_each_entry(evlist, evsel)
+ perf_evsel__copy_prev_raw_counts(evsel);
+}
+
static void zero_per_pkg(struct evsel *counter)
{
if (counter->per_pkg_mask)
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index b4fdfaa7f2c0..62cf72c71869 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -198,6 +198,7 @@ int perf_evlist__alloc_stats(struct evlist *evlist, bool alloc_raw);
void perf_evlist__free_stats(struct evlist *evlist);
void perf_evlist__reset_stats(struct evlist *evlist);
void perf_evlist__reset_prev_raw_counts(struct evlist *evlist);
+void perf_evlist__copy_prev_raw_counts(struct evlist *evlist);

int perf_stat_process_counter(struct perf_stat_config *config,
struct evsel *counter);
--
2.17.1

2020-05-07 15:22:57

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] perf stat: Copy counts from prev_raw_counts to evsel->counts

On Thu, May 07, 2020 at 02:58:21PM +0800, Jin Yao wrote:
> It would be useful to support the overall statistics for perf-stat
> interval mode. For example, report the summary at the end of
> "perf-stat -I" output.
>
> But since perf-stat can support many aggregation modes, such as
> --per-thread, --per-socket, -M and etc, we need a solution which
> doesn't bring much complexity.
>
> The idea is to use 'evsel->prev_raw_counts' which is updated in
> each interval and it's saved with the latest counts. Before reporting
> the summary, we copy the counts from evsel->prev_raw_counts to
> evsel->counts, and next we just follow non-interval processing.

I did not realize we already store the count in prev_raw_counts ;-) nice catch!

>
> In evsel__compute_deltas, this patch saves counts to the position
> of [cpu0,thread0] for AGGR_GLOBAL. After copying counts from
> evsel->prev_raw_counts to evsel->counts, we don't need to
> modify process_counter_maps in perf_stat_process_counter to let it
> work well.

I don't understand why you need to store it in here.. what's the catch
in process_counter_maps?

thanks,
jirka

2020-05-07 15:23:59

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] perf stat: Fix wrong per-thread runtime stat for interval mode

On Thu, May 07, 2020 at 02:58:19PM +0800, Jin Yao wrote:

SNIP

> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index e0c1ad23c768..97ee941649e6 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -351,6 +351,16 @@ static void read_counters(struct timespec *rs)
> }
> }
>
> +static void thread_stats_reset(struct perf_stat_config *config)
> +{
> + int i;
> +
> + if (config->stats) {
> + for (i = 0; i < config->stats_num; i++)
> + perf_stat__reset_shadow_per_stat(&config->stats[i]);
> + }
> +}
> +
> static void process_interval(void)
> {
> struct timespec ts, rs;
> @@ -359,6 +369,7 @@ static void process_interval(void)
> diff_timespec(&rs, &ts, &ref_time);
>
> perf_stat__reset_shadow_per_stat(&rt_stat);
> + thread_stats_reset(&stat_config);

can't you call in here perf_stat__reset_stats?

and if not, I know it's threads related, but new
and delete functions are:

runtime_stat_new, runtime_stat_delete

so let's call it runtime_stat_reset and place it next to
the new/delete functions

other than that it looks ok, thanks

jirka

> read_counters(&rs);
>
> if (STAT_RECORD) {
> --
> 2.17.1
>

2020-05-08 02:05:33

by Jin Yao

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] perf stat: Fix wrong per-thread runtime stat for interval mode

Hi Jiri,

On 5/7/2020 11:19 PM, Jiri Olsa wrote:
> On Thu, May 07, 2020 at 02:58:19PM +0800, Jin Yao wrote:
>
> SNIP
>
>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>> index e0c1ad23c768..97ee941649e6 100644
>> --- a/tools/perf/builtin-stat.c
>> +++ b/tools/perf/builtin-stat.c
>> @@ -351,6 +351,16 @@ static void read_counters(struct timespec *rs)
>> }
>> }
>>
>> +static void thread_stats_reset(struct perf_stat_config *config)
>> +{
>> + int i;
>> +
>> + if (config->stats) {
>> + for (i = 0; i < config->stats_num; i++)
>> + perf_stat__reset_shadow_per_stat(&config->stats[i]);
>> + }
>> +}
>> +
>> static void process_interval(void)
>> {
>> struct timespec ts, rs;
>> @@ -359,6 +369,7 @@ static void process_interval(void)
>> diff_timespec(&rs, &ts, &ref_time);
>>
>> perf_stat__reset_shadow_per_stat(&rt_stat);
>> + thread_stats_reset(&stat_config);
>
> can't you call in here perf_stat__reset_stats?
>

If we call perf_stat__reset_stat here, it will reset the evsel->counts, but I
don't think it's necessary. The counts will be updated in read_counts() soon.

> and if not, I know it's threads related, but new
> and delete functions are:
>
> runtime_stat_new, runtime_stat_delete
>
> so let's call it runtime_stat_reset and place it next to
> the new/delete functions
>

Yes, that's good idea. I will create runtime_stat_reset and place it next to
untime_stat_new/runtime_stat_delete.

> other than that it looks ok, thanks
>
> jirka
>

Thanks!

Thanks
Jin Yao

>> read_counters(&rs);
>>
>> if (STAT_RECORD) {
>> --
>> 2.17.1
>>
>

2020-05-08 03:38:13

by Jin Yao

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] perf stat: Copy counts from prev_raw_counts to evsel->counts

Hi Jiri,

On 5/7/2020 11:19 PM, Jiri Olsa wrote:
> On Thu, May 07, 2020 at 02:58:21PM +0800, Jin Yao wrote:
>> It would be useful to support the overall statistics for perf-stat
>> interval mode. For example, report the summary at the end of
>> "perf-stat -I" output.
>>
>> But since perf-stat can support many aggregation modes, such as
>> --per-thread, --per-socket, -M and etc, we need a solution which
>> doesn't bring much complexity.
>>
>> The idea is to use 'evsel->prev_raw_counts' which is updated in
>> each interval and it's saved with the latest counts. Before reporting
>> the summary, we copy the counts from evsel->prev_raw_counts to
>> evsel->counts, and next we just follow non-interval processing.
>
> I did not realize we already store the count in prev_raw_counts ;-) nice catch!
>

Thanks! :)

>>
>> In evsel__compute_deltas, this patch saves counts to the position
>> of [cpu0,thread0] for AGGR_GLOBAL. After copying counts from
>> evsel->prev_raw_counts to evsel->counts, we don't need to
>> modify process_counter_maps in perf_stat_process_counter to let it
>> work well.
>
> I don't understand why you need to store it in here.. what's the catch
> in process_counter_maps?
>

Sorry, I didn't explain clearly.

You know the idea is to copy evsel->prev_raw_counts to evsel->counts before
reporting the summary.

But for AGGR_GLOBAL (cpu = -1 in perf_evsel__compute_deltas), the
evsel->prev_raw_counts is only stored with the aggr value.

if (cpu == -1) {
tmp = evsel->prev_raw_counts->aggr;
evsel->prev_raw_counts->aggr = *count;
} else {
tmp = *perf_counts(evsel->prev_raw_counts, cpu, thread);
*perf_counts(evsel->prev_raw_counts, cpu, thread) = *count;
}

So after copying evsel->prev_raw_counts to evsel->counts,
perf_counts(evsel->counts, cpu, thread) are all 0.

Once we go to process_counter_maps again, in process_counter_values, count->val
is 0.

case AGGR_GLOBAL:
aggr->val += count->val;
aggr->ena += count->ena;
aggr->run += count->run;

And the aggr->val is 0.

So this patch uses a trick that saves the previous aggr value to cpu0/thread0,
then above aggr->val calculation can work correctly.

Thanks
Jin Yao

> thanks,
> jirka
>