Currently perf-stat supports to print counts at regular interval (-I),
but it's not very easy for user to get the overall statistics.
With this patchset, it supports to report the summary at the end of
interval output.
For example,
root@kbl-ppc:~# perf stat -e cycles -I1000 --interval-count 2
# time counts unit events
1.000412064 2,281,114 cycles
2.001383658 2,547,880 cycles
Performance counter stats for 'system wide':
4,828,994 cycles
2.002860349 seconds time elapsed
root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
# time counts unit events
1.000389902 1,536,093 cycles
1.000389902 420,226 instructions # 0.27 insn per cycle
2.001433453 2,213,952 cycles
2.001433453 735,465 instructions # 0.33 insn per cycle
Performance counter stats for 'system wide':
3,750,045 cycles
1,155,691 instructions # 0.31 insn per cycle
2.003023361 seconds time elapsed
root@kbl-ppc:~# perf stat -M CPI,IPC -I1000 --interval-count 2
# time counts unit events
1.000435121 905,303 inst_retired.any # 2.9 CPI
1.000435121 2,663,333 cycles
1.000435121 914,702 inst_retired.any # 0.3 IPC
1.000435121 2,676,559 cpu_clk_unhalted.thread
2.001615941 1,951,092 inst_retired.any # 1.8 CPI
2.001615941 3,551,357 cycles
2.001615941 1,950,837 inst_retired.any # 0.5 IPC
2.001615941 3,551,044 cpu_clk_unhalted.thread
Performance counter stats for 'system wide':
2,856,395 inst_retired.any # 2.2 CPI
6,214,690 cycles
2,865,539 inst_retired.any # 0.5 IPC
6,227,603 cpu_clk_unhalted.thread
2.003403078 seconds time elapsed
v3:
---
1. 'perf stat: Fix wrong per-thread runtime stat for interval mode'
is a new patch which fixes an existing issue found in test.
2. We use the prev_raw_counts for summary counts. Drop the summary_counts in v2.
3. Fix some issues.
v2:
---
Rebase to perf/core branch
Jin Yao (4):
perf stat: Fix wrong per-thread runtime stat for interval mode
perf counts: Reset prev_raw_counts counts
perf stat: Copy counts from prev_raw_counts to evsel->counts
perf stat: Report summary for interval mode
tools/perf/builtin-stat.c | 27 +++++++++++++++++++++++++--
tools/perf/util/counts.c | 5 +++++
tools/perf/util/counts.h | 2 ++
tools/perf/util/evsel.c | 1 +
tools/perf/util/stat.c | 27 ++++++++++++++++++++++++++-
tools/perf/util/stat.h | 2 ++
6 files changed, 61 insertions(+), 3 deletions(-)
--
2.17.1
root@kbl-ppc:~# perf stat --per-thread -e cycles,instructions -I1000 --interval-count 2
1.004171683 perf-3696 8,747,311 cycles
...
1.004171683 perf-3696 691,730 instructions # 0.08 insn per cycle
...
2.006490373 perf-3696 1,749,936 cycles
...
2.006490373 perf-3696 1,484,582 instructions # 0.28 insn per cycle
...
Let's see interval 2.006490373
perf-3696 1,749,936 cycles
perf-3696 1,484,582 instructions # 0.28 insn per cycle
insn per cycle = 1,484,582 / 1,749,936 = 0.85.
But now it's 0.28, that's not correct.
stat_config.stats[] records the per-thread runtime stat. But for interval
mode, it should be reset for each interval.
So now, with this patch,
root@kbl-ppc:~# perf stat --per-thread -e cycles,instructions -I1000 --interval-count 2
1.005818121 perf-8633 9,898,045 cycles
...
1.005818121 perf-8633 693,298 instructions # 0.07 insn per cycle
...
2.007863743 perf-8633 1,551,619 cycles
...
2.007863743 perf-8633 1,317,514 instructions # 0.85 insn per cycle
...
Let's check interval 2.007863743.
insn per cycle = 1,317,514 / 1,551,619 = 0.85. It's correct.
Fixes: commit 14e72a21c783 ("perf stat: Update or print per-thread stats")
Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/builtin-stat.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index e0c1ad23c768..97ee941649e6 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -351,6 +351,16 @@ static void read_counters(struct timespec *rs)
}
}
+static void thread_stats_reset(struct perf_stat_config *config)
+{
+ int i;
+
+ if (config->stats) {
+ for (i = 0; i < config->stats_num; i++)
+ perf_stat__reset_shadow_per_stat(&config->stats[i]);
+ }
+}
+
static void process_interval(void)
{
struct timespec ts, rs;
@@ -359,6 +369,7 @@ static void process_interval(void)
diff_timespec(&rs, &ts, &ref_time);
perf_stat__reset_shadow_per_stat(&rt_stat);
+ thread_stats_reset(&stat_config);
read_counters(&rs);
if (STAT_RECORD) {
--
2.17.1
It would be useful to support the overall statistics for perf-stat
interval mode. For example, report the summary at the end of
"perf-stat -I" output.
But since perf-stat can support many aggregation modes, such as
--per-thread, --per-socket, -M and etc, we need a solution which
doesn't bring much complexity.
The idea is to use 'evsel->prev_raw_counts' which is updated in
each interval and it's saved with the latest counts. Before reporting
the summary, we copy the counts from evsel->prev_raw_counts to
evsel->counts, and next we just follow non-interval processing.
In evsel__compute_deltas, this patch saves counts to the position
of [cpu0,thread0] for AGGR_GLOBAL. After copying counts from
evsel->prev_raw_counts to evsel->counts, we don't need to
modify process_counter_maps in perf_stat_process_counter to let it
work well.
Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/evsel.c | 1 +
tools/perf/util/stat.c | 24 ++++++++++++++++++++++++
tools/perf/util/stat.h | 1 +
3 files changed, 26 insertions(+)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f3e60c45d59a..898a697c7cdd 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1288,6 +1288,7 @@ void evsel__compute_deltas(struct evsel *evsel, int cpu, int thread,
if (cpu == -1) {
tmp = evsel->prev_raw_counts->aggr;
evsel->prev_raw_counts->aggr = *count;
+ *perf_counts(evsel->prev_raw_counts, 0, 0) = *count;
} else {
tmp = *perf_counts(evsel->prev_raw_counts, cpu, thread);
*perf_counts(evsel->prev_raw_counts, cpu, thread) = *count;
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 89e541564ed5..ede113805ecd 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -230,6 +230,30 @@ void perf_evlist__reset_prev_raw_counts(struct evlist *evlist)
perf_evsel__reset_prev_raw_counts(evsel);
}
+static void perf_evsel__copy_prev_raw_counts(struct evsel *evsel)
+{
+ int ncpus = evsel__nr_cpus(evsel);
+ int nthreads = perf_thread_map__nr(evsel->core.threads);
+
+ for (int thread = 0; thread < nthreads; thread++) {
+ for (int cpu = 0; cpu < ncpus; cpu++) {
+ *perf_counts(evsel->counts, cpu, thread) =
+ *perf_counts(evsel->prev_raw_counts, cpu,
+ thread);
+ }
+ }
+
+ evsel->counts->aggr = evsel->prev_raw_counts->aggr;
+}
+
+void perf_evlist__copy_prev_raw_counts(struct evlist *evlist)
+{
+ struct evsel *evsel;
+
+ evlist__for_each_entry(evlist, evsel)
+ perf_evsel__copy_prev_raw_counts(evsel);
+}
+
static void zero_per_pkg(struct evsel *counter)
{
if (counter->per_pkg_mask)
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index b4fdfaa7f2c0..62cf72c71869 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -198,6 +198,7 @@ int perf_evlist__alloc_stats(struct evlist *evlist, bool alloc_raw);
void perf_evlist__free_stats(struct evlist *evlist);
void perf_evlist__reset_stats(struct evlist *evlist);
void perf_evlist__reset_prev_raw_counts(struct evlist *evlist);
+void perf_evlist__copy_prev_raw_counts(struct evlist *evlist);
int perf_stat_process_counter(struct perf_stat_config *config,
struct evsel *counter);
--
2.17.1
On Thu, May 07, 2020 at 02:58:21PM +0800, Jin Yao wrote:
> It would be useful to support the overall statistics for perf-stat
> interval mode. For example, report the summary at the end of
> "perf-stat -I" output.
>
> But since perf-stat can support many aggregation modes, such as
> --per-thread, --per-socket, -M and etc, we need a solution which
> doesn't bring much complexity.
>
> The idea is to use 'evsel->prev_raw_counts' which is updated in
> each interval and it's saved with the latest counts. Before reporting
> the summary, we copy the counts from evsel->prev_raw_counts to
> evsel->counts, and next we just follow non-interval processing.
I did not realize we already store the count in prev_raw_counts ;-) nice catch!
>
> In evsel__compute_deltas, this patch saves counts to the position
> of [cpu0,thread0] for AGGR_GLOBAL. After copying counts from
> evsel->prev_raw_counts to evsel->counts, we don't need to
> modify process_counter_maps in perf_stat_process_counter to let it
> work well.
I don't understand why you need to store it in here.. what's the catch
in process_counter_maps?
thanks,
jirka
On Thu, May 07, 2020 at 02:58:19PM +0800, Jin Yao wrote:
SNIP
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index e0c1ad23c768..97ee941649e6 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -351,6 +351,16 @@ static void read_counters(struct timespec *rs)
> }
> }
>
> +static void thread_stats_reset(struct perf_stat_config *config)
> +{
> + int i;
> +
> + if (config->stats) {
> + for (i = 0; i < config->stats_num; i++)
> + perf_stat__reset_shadow_per_stat(&config->stats[i]);
> + }
> +}
> +
> static void process_interval(void)
> {
> struct timespec ts, rs;
> @@ -359,6 +369,7 @@ static void process_interval(void)
> diff_timespec(&rs, &ts, &ref_time);
>
> perf_stat__reset_shadow_per_stat(&rt_stat);
> + thread_stats_reset(&stat_config);
can't you call in here perf_stat__reset_stats?
and if not, I know it's threads related, but new
and delete functions are:
runtime_stat_new, runtime_stat_delete
so let's call it runtime_stat_reset and place it next to
the new/delete functions
other than that it looks ok, thanks
jirka
> read_counters(&rs);
>
> if (STAT_RECORD) {
> --
> 2.17.1
>
Hi Jiri,
On 5/7/2020 11:19 PM, Jiri Olsa wrote:
> On Thu, May 07, 2020 at 02:58:19PM +0800, Jin Yao wrote:
>
> SNIP
>
>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>> index e0c1ad23c768..97ee941649e6 100644
>> --- a/tools/perf/builtin-stat.c
>> +++ b/tools/perf/builtin-stat.c
>> @@ -351,6 +351,16 @@ static void read_counters(struct timespec *rs)
>> }
>> }
>>
>> +static void thread_stats_reset(struct perf_stat_config *config)
>> +{
>> + int i;
>> +
>> + if (config->stats) {
>> + for (i = 0; i < config->stats_num; i++)
>> + perf_stat__reset_shadow_per_stat(&config->stats[i]);
>> + }
>> +}
>> +
>> static void process_interval(void)
>> {
>> struct timespec ts, rs;
>> @@ -359,6 +369,7 @@ static void process_interval(void)
>> diff_timespec(&rs, &ts, &ref_time);
>>
>> perf_stat__reset_shadow_per_stat(&rt_stat);
>> + thread_stats_reset(&stat_config);
>
> can't you call in here perf_stat__reset_stats?
>
If we call perf_stat__reset_stat here, it will reset the evsel->counts, but I
don't think it's necessary. The counts will be updated in read_counts() soon.
> and if not, I know it's threads related, but new
> and delete functions are:
>
> runtime_stat_new, runtime_stat_delete
>
> so let's call it runtime_stat_reset and place it next to
> the new/delete functions
>
Yes, that's good idea. I will create runtime_stat_reset and place it next to
untime_stat_new/runtime_stat_delete.
> other than that it looks ok, thanks
>
> jirka
>
Thanks!
Thanks
Jin Yao
>> read_counters(&rs);
>>
>> if (STAT_RECORD) {
>> --
>> 2.17.1
>>
>
Hi Jiri,
On 5/7/2020 11:19 PM, Jiri Olsa wrote:
> On Thu, May 07, 2020 at 02:58:21PM +0800, Jin Yao wrote:
>> It would be useful to support the overall statistics for perf-stat
>> interval mode. For example, report the summary at the end of
>> "perf-stat -I" output.
>>
>> But since perf-stat can support many aggregation modes, such as
>> --per-thread, --per-socket, -M and etc, we need a solution which
>> doesn't bring much complexity.
>>
>> The idea is to use 'evsel->prev_raw_counts' which is updated in
>> each interval and it's saved with the latest counts. Before reporting
>> the summary, we copy the counts from evsel->prev_raw_counts to
>> evsel->counts, and next we just follow non-interval processing.
>
> I did not realize we already store the count in prev_raw_counts ;-) nice catch!
>
Thanks! :)
>>
>> In evsel__compute_deltas, this patch saves counts to the position
>> of [cpu0,thread0] for AGGR_GLOBAL. After copying counts from
>> evsel->prev_raw_counts to evsel->counts, we don't need to
>> modify process_counter_maps in perf_stat_process_counter to let it
>> work well.
>
> I don't understand why you need to store it in here.. what's the catch
> in process_counter_maps?
>
Sorry, I didn't explain clearly.
You know the idea is to copy evsel->prev_raw_counts to evsel->counts before
reporting the summary.
But for AGGR_GLOBAL (cpu = -1 in perf_evsel__compute_deltas), the
evsel->prev_raw_counts is only stored with the aggr value.
if (cpu == -1) {
tmp = evsel->prev_raw_counts->aggr;
evsel->prev_raw_counts->aggr = *count;
} else {
tmp = *perf_counts(evsel->prev_raw_counts, cpu, thread);
*perf_counts(evsel->prev_raw_counts, cpu, thread) = *count;
}
So after copying evsel->prev_raw_counts to evsel->counts,
perf_counts(evsel->counts, cpu, thread) are all 0.
Once we go to process_counter_maps again, in process_counter_values, count->val
is 0.
case AGGR_GLOBAL:
aggr->val += count->val;
aggr->ena += count->ena;
aggr->run += count->run;
And the aggr->val is 0.
So this patch uses a trick that saves the previous aggr value to cpu0/thread0,
then above aggr->val calculation can work correctly.
Thanks
Jin Yao
> thanks,
> jirka
>