2009-12-15 09:52:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: [tip:perf/diff] perf diff: Introduce tool to show performance difference


'perf diff' looks to be very useful!

I think we want to improve the default output some more:

For example there should be column names like for perf report:

# Overhead Command Shared Object Symbol
# ........ ......... ............................ ......
#
13.87% hackbench [kernel] [k] _raw_spin_lock
5.60% hackbench [kernel] [k] copy_user_generic_string
5.23% hackbench [kernel] [k] __cache_free

Secondly, here's the current output of two successive 'hackbench 10' run:

1 +2381 +58 [kernel.kallsyms] avc_has_perm_noaudit
2 -26 [kernel.kallsyms] clear_page_c
3 +107 +29 [kernel.kallsyms] _raw_spin_lock
4 +106 +23 [kernel.kallsyms] _raw_spin_lock
5 +105 +22 [kernel.kallsyms] _raw_spin_lock
6 +104 +21 [kernel.kallsyms] _raw_spin_lock
7 +103 +20 [kernel.kallsyms] _raw_spin_lock
8 +102 +20 [kernel.kallsyms] _raw_spin_lock
9 +101 +19 [kernel.kallsyms] _raw_spin_lock
10 -6 -28 [kernel.kallsyms] selinux_task_wait
11 +99 +18 [kernel.kallsyms] _raw_spin_lock
12 +98 +18 [kernel.kallsyms] _raw_spin_lock
13 +97 +18 [kernel.kallsyms] _raw_spin_lock
14 +96 +18 [kernel.kallsyms] _raw_spin_lock
15 +220 +20 [kernel.kallsyms] _raw_spin_lock_irqsave
16 +94 +17 [kernel.kallsyms] _raw_spin_lock
17 +93 +17 [kernel.kallsyms] _raw_spin_lock
18 +92 +16 [kernel.kallsyms] _raw_spin_lock
19 +9648 +30 [kernel.kallsyms] copy_page_c
20 +90 +15 [kernel.kallsyms] _raw_spin_lock
21 +89 +15 [kernel.kallsyms] _raw_spin_lock

beyond the missing column names, there's some other things visible too:

- the symbol names are per pid, so they repeat all over again. I think we
want the default output to be like perf report's, i.e. PIDs get summarized
over commands.

Furthermore, i think -p should be enabled by default. Especially with a lot of
functions the +/- notation isnt very obvious at first sight.

I.e. output like this would be more useful i think:

baseline delta command symbol
-----------------------------------------------------------
13.87% +0.11% hackbench [k] _raw_spin_lock
5.60% -2.53% hackbench [k] copy_user_generic_string
5.23% -0.14% hackbench [k] __cache_free
4.30% -2.59% hackbench [k] unix_stream_recvmsg
3.35% -1.56% hackbench [k] avc_has_perm_noaudit
-----------------------------------------------------------
Sum:
0.51% +- 0.05% less time elapsed

The relative position can be printed too but should not necessarily be the
default. Printing the delta in elapsed time is also very important, because
not only do we want to know the after/before profile delta, we also want to
see the tangible result in elapsed time - via the same tool.

Btw., i think we also want a "--repeat N" feature for perf record, to allow
the recording of the same command over and over again, so that a higher
quality profile can be achieved.

"perf record -f --repeat 10 cmd" would be a convenience shortcut for:

rm -f perf.data
for ((i=0; i<10; i++)); do
perf record -A cmd
done

Thanks,

Ingo


2009-12-15 12:09:24

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [tip:perf/diff] perf diff: Introduce tool to show performance difference

On Tue, Dec 15, 2009 at 10:52:14AM +0100, Ingo Molnar wrote:
>
> 'perf diff' looks to be very useful!
>
> I think we want to improve the default output some more:
>
> For example there should be column names like for perf report:
>
> # Overhead Command Shared Object Symbol
> # ........ ......... ............................ ......
> #
> 13.87% hackbench [kernel] [k] _raw_spin_lock
> 5.60% hackbench [kernel] [k] copy_user_generic_string
> 5.23% hackbench [kernel] [k] __cache_free
>
> Secondly, here's the current output of two successive 'hackbench 10' run:
>
> 1 +2381 +58 [kernel.kallsyms] avc_has_perm_noaudit
> 2 -26 [kernel.kallsyms] clear_page_c
> 3 +107 +29 [kernel.kallsyms] _raw_spin_lock
> 4 +106 +23 [kernel.kallsyms] _raw_spin_lock
> 5 +105 +22 [kernel.kallsyms] _raw_spin_lock
> 6 +104 +21 [kernel.kallsyms] _raw_spin_lock
> 7 +103 +20 [kernel.kallsyms] _raw_spin_lock
> 8 +102 +20 [kernel.kallsyms] _raw_spin_lock
> 9 +101 +19 [kernel.kallsyms] _raw_spin_lock
> 10 -6 -28 [kernel.kallsyms] selinux_task_wait
> 11 +99 +18 [kernel.kallsyms] _raw_spin_lock
> 12 +98 +18 [kernel.kallsyms] _raw_spin_lock
> 13 +97 +18 [kernel.kallsyms] _raw_spin_lock
> 14 +96 +18 [kernel.kallsyms] _raw_spin_lock
> 15 +220 +20 [kernel.kallsyms] _raw_spin_lock_irqsave
> 16 +94 +17 [kernel.kallsyms] _raw_spin_lock
> 17 +93 +17 [kernel.kallsyms] _raw_spin_lock
> 18 +92 +16 [kernel.kallsyms] _raw_spin_lock
> 19 +9648 +30 [kernel.kallsyms] copy_page_c
> 20 +90 +15 [kernel.kallsyms] _raw_spin_lock
> 21 +89 +15 [kernel.kallsyms] _raw_spin_lock
>
> beyond the missing column names, there's some other things visible too:
>
> - the symbol names are per pid, so they repeat all over again. I think we
> want the default output to be like perf report's, i.e. PIDs get summarized
> over commands.
>
> Furthermore, i think -p should be enabled by default. Especially with a lot of
> functions the +/- notation isnt very obvious at first sight.


Agreed, I think -p gives a better first glance about the global
state.

Also it would be nice to sort by default on deltas, probably
by abs(delta) even because the first purpose is to see what
has changed most.

And those who want better granularity could sort by asc or desc.

Oh, and I see the first column is useful to depict the
number of profiled sites that have varied, but dedicating
a whole column for that seems a bit too much. May be
that can be a default-off option? Or the number could
be output as a summary in the beggining?

Thanks.

2009-12-15 12:56:13

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [tip:perf/diff] perf diff: Introduce tool to show performance difference

Em Tue, Dec 15, 2009 at 10:52:14AM +0100, Ingo Molnar escreveu:
>
> 'perf diff' looks to be very useful!
>
> I think we want to improve the default output some more:
>
> For example there should be column names like for perf report:
>
> # Overhead Command Shared Object Symbol
> # ........ ......... ............................ ......
> #
> 13.87% hackbench [kernel] [k] _raw_spin_lock
> 5.60% hackbench [kernel] [k] copy_user_generic_string
> 5.23% hackbench [kernel] [k] __cache_free

Ok, most suggestions boil down to move more code from builtin-report.c
to session.c and use in builtin-diff.c. Surgery beginning.

- Arnaldo