2020-11-13 05:08:24

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH] perf stat: Use proper cpu for shadow stats

Currently perf stat shows some metrics (like IPC) for defined events.
But when no aggregation mode is used (-A option), it shows incorrect
values since it used a value from a different cpu.

Before:

$ perf stat -aA -e cycles,instructions sleep 1

Performance counter stats for 'system wide':

CPU0 116,057,380 cycles
CPU1 86,084,722 cycles
CPU2 99,423,125 cycles
CPU3 98,272,994 cycles
CPU0 53,369,217 instructions # 0.46 insn per cycle
CPU1 33,378,058 instructions # 0.29 insn per cycle
CPU2 58,150,086 instructions # 0.50 insn per cycle
CPU3 40,029,703 instructions # 0.34 insn per cycle

1.001816971 seconds time elapsed

So the IPC for CPU1 should be 0.38 (= 33,378,058 / 86,084,722)
but it was 0.29 (= 33,378,058 / 116,057,380) and so on.

After:

$ perf stat -aA -e cycles,instructions sleep 1

Performance counter stats for 'system wide':

CPU0 109,621,384 cycles
CPU1 159,026,454 cycles
CPU2 99,460,366 cycles
CPU3 124,144,142 cycles
CPU0 44,396,706 instructions # 0.41 insn per cycle
CPU1 120,195,425 instructions # 0.76 insn per cycle
CPU2 44,763,978 instructions # 0.45 insn per cycle
CPU3 69,049,079 instructions # 0.56 insn per cycle

1.001910444 seconds time elapsed

Reported-by: Sam Xi <[email protected]>
Fixes: 44d49a600259 ("perf stat: Support metrics in --per-core/socket mode")
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/stat-display.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 4b57c0c07632..a963b5b8eb72 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -324,13 +324,10 @@ static int first_shadow_cpu(struct perf_stat_config *config,
struct evlist *evlist = evsel->evlist;
int i;

- if (!config->aggr_get_id)
- return 0;
-
if (config->aggr_mode == AGGR_NONE)
return id;

- if (config->aggr_mode == AGGR_GLOBAL)
+ if (!config->aggr_get_id)
return 0;

for (i = 0; i < evsel__nr_cpus(evsel); i++) {
--
2.29.2.299.gdc1121823c-goog


2020-11-13 19:20:58

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] perf stat: Use proper cpu for shadow stats

> CPU0 109,621,384 cycles
> CPU1 159,026,454 cycles
> CPU2 99,460,366 cycles
> CPU3 124,144,142 cycles
> CPU0 44,396,706 instructions # 0.41 insn per cycle
> CPU1 120,195,425 instructions # 0.76 insn per cycle
> CPU2 44,763,978 instructions # 0.45 insn per cycle
> CPU3 69,049,079 instructions # 0.56 insn per cycle
>
> 1.001910444 seconds time elapsed

Yes makes a lot of sense. Thanks for tracking that down.

Reviewed-by: Andi Kleen <[email protected]>

-Andi

2020-11-13 22:18:15

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH] perf stat: Use proper cpu for shadow stats

On Fri, Nov 13, 2020 at 02:02:36PM +0900, Namhyung Kim wrote:
> Currently perf stat shows some metrics (like IPC) for defined events.
> But when no aggregation mode is used (-A option), it shows incorrect
> values since it used a value from a different cpu.
>
> Before:
>
> $ perf stat -aA -e cycles,instructions sleep 1
>
> Performance counter stats for 'system wide':
>
> CPU0 116,057,380 cycles
> CPU1 86,084,722 cycles
> CPU2 99,423,125 cycles
> CPU3 98,272,994 cycles
> CPU0 53,369,217 instructions # 0.46 insn per cycle
> CPU1 33,378,058 instructions # 0.29 insn per cycle
> CPU2 58,150,086 instructions # 0.50 insn per cycle
> CPU3 40,029,703 instructions # 0.34 insn per cycle
>
> 1.001816971 seconds time elapsed
>
> So the IPC for CPU1 should be 0.38 (= 33,378,058 / 86,084,722)
> but it was 0.29 (= 33,378,058 / 116,057,380) and so on.
>
> After:
>
> $ perf stat -aA -e cycles,instructions sleep 1
>
> Performance counter stats for 'system wide':
>
> CPU0 109,621,384 cycles
> CPU1 159,026,454 cycles
> CPU2 99,460,366 cycles
> CPU3 124,144,142 cycles
> CPU0 44,396,706 instructions # 0.41 insn per cycle
> CPU1 120,195,425 instructions # 0.76 insn per cycle
> CPU2 44,763,978 instructions # 0.45 insn per cycle
> CPU3 69,049,079 instructions # 0.56 insn per cycle
>
> 1.001910444 seconds time elapsed
>
> Reported-by: Sam Xi <[email protected]>
> Fixes: 44d49a600259 ("perf stat: Support metrics in --per-core/socket mode")
> Signed-off-by: Namhyung Kim <[email protected]>

nice catch! would be great to have test for this

Acked-by: Jiri Olsa <[email protected]>

thanks,
jirka

> ---
> tools/perf/util/stat-display.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 4b57c0c07632..a963b5b8eb72 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -324,13 +324,10 @@ static int first_shadow_cpu(struct perf_stat_config *config,
> struct evlist *evlist = evsel->evlist;
> int i;
>
> - if (!config->aggr_get_id)
> - return 0;
> -
> if (config->aggr_mode == AGGR_NONE)
> return id;
>
> - if (config->aggr_mode == AGGR_GLOBAL)
> + if (!config->aggr_get_id)
> return 0;
>
> for (i = 0; i < evsel__nr_cpus(evsel); i++) {
> --
> 2.29.2.299.gdc1121823c-goog
>

2020-11-14 00:27:32

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH] perf stat: Use proper cpu for shadow stats

On Sat, Nov 14, 2020 at 7:16 AM Jiri Olsa <[email protected]> wrote:
>
> On Fri, Nov 13, 2020 at 02:02:36PM +0900, Namhyung Kim wrote:
> > Currently perf stat shows some metrics (like IPC) for defined events.
> > But when no aggregation mode is used (-A option), it shows incorrect
> > values since it used a value from a different cpu.
> >
> > Before:
> >
> > $ perf stat -aA -e cycles,instructions sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > CPU0 116,057,380 cycles
> > CPU1 86,084,722 cycles
> > CPU2 99,423,125 cycles
> > CPU3 98,272,994 cycles
> > CPU0 53,369,217 instructions # 0.46 insn per cycle
> > CPU1 33,378,058 instructions # 0.29 insn per cycle
> > CPU2 58,150,086 instructions # 0.50 insn per cycle
> > CPU3 40,029,703 instructions # 0.34 insn per cycle
> >
> > 1.001816971 seconds time elapsed
> >
> > So the IPC for CPU1 should be 0.38 (= 33,378,058 / 86,084,722)
> > but it was 0.29 (= 33,378,058 / 116,057,380) and so on.
> >
> > After:
> >
> > $ perf stat -aA -e cycles,instructions sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > CPU0 109,621,384 cycles
> > CPU1 159,026,454 cycles
> > CPU2 99,460,366 cycles
> > CPU3 124,144,142 cycles
> > CPU0 44,396,706 instructions # 0.41 insn per cycle
> > CPU1 120,195,425 instructions # 0.76 insn per cycle
> > CPU2 44,763,978 instructions # 0.45 insn per cycle
> > CPU3 69,049,079 instructions # 0.56 insn per cycle
> >
> > 1.001910444 seconds time elapsed
> >
> > Reported-by: Sam Xi <[email protected]>
> > Fixes: 44d49a600259 ("perf stat: Support metrics in --per-core/socket mode")
> > Signed-off-by: Namhyung Kim <[email protected]>
>
> nice catch! would be great to have test for this
>
> Acked-by: Jiri Olsa <[email protected]>

Thanks! Will add a test later.

Thanks,
Namhyung