On Fri, Apr 08, 2016 at 02:57:54PM -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, Apr 08, 2016 at 01:34:15PM +0200, Milian Wolff wrote:
> > Now, one can print the call chain for every encountered sys_exit
> > event, e.g.:
> > Note that it is advised to increase the number of mmap pages to
> > prevent event losses when using this new feature. Often, adding
> > `-m 10M` to the `perf trace` invocation is enough.
> > This feature is also available in strace when built with libunwind
> > via `strace -k`. Performance wise, this solution is much better:
> > $ time find path/to/linux &> /dev/null
> > real 0m0.051s
> > user 0m0.013s
> > sys 0m0.037s
> > $ time perf trace -m 800M --call-graph dwarf find path/to/linux &> /dev/null
> > real 0m2.624s
> > user 0m1.203s
> > sys 0m1.333s
> > $ time strace -k find path/to/linux &> /dev/null
> > real 0m35.398s
> > user 0m10.403s
> > sys 0m23.173s
> > Note that it is currently not possible to configure the print output.
> > Such a feature, similar to what is available in `perf script` via its
> > `--fields` knob, can be added later on.
> You mixed up multiple changes in one single patch, I'll break it down
> while testing, and before pushing upstream.
Expanding a bit the audience:
First test, it works, great! But do we really need that address? I guess not,
right, perhaps via some callchain parameter, to tell what we want to see? But
by default knowing the function name + DSO seems enough, no?
[root@jouet bpf]# trace -e nanosleep --call-graph dwarf usleep 1
0.071 ( 0.071 ms): usleep/5455 nanosleep(rqtp: 0x7ffee070f080) = 0
2036be syscall_slow_exit_work ([kernel.kallsyms])
203dfb do_syscall_64 ([kernel.kallsyms])
9b8fe1 return_from_SYSCALL_64 ([kernel.kallsyms])
7f41622ec790 __nanosleep (/usr/lib64/libc-2.22.so)
7f416231d524 usleep (/usr/lib64/libc-2.22.so)
563b6c6afcab [unknown] (/usr/bin/usleep)
7f4162244580 __libc_start_main (/usr/lib64/libc-2.22.so)
563b6c6afce9 [unknown] (/usr/bin/usleep)
[root@jouet bpf]#
Yeah, you agree with that, now that I read the patch 8-):
+ /* TODO: user-configurable print_opts */
+ unsigned int print_opts = PRINT_IP_OPT_IP
Ok, removing that OPT_IP I get, oops, the alignment is being done only on ip?
[root@jouet bpf]# trace -e nanosleep --call-graph dwarf usleep 1
0.063 ( 0.063 ms): usleep/6132 nanosleep(rqtp: 0x7ffd1b7a8e70 ) = 0
syscall_slow_exit_work ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
return_from_SYSCALL_64 ([kernel.kallsyms])
__nanosleep (/usr/lib64/libc-2.22.so)
usleep (/usr/lib64/libc-2.22.so)
[unknown] (/usr/bin/usleep)
__libc_start_main (/usr/lib64/libc-2.22.so)
[unknown] (/usr/bin/usleep)
[root@jouet bpf]#
Fixing it up we get:
[root@jouet bpf]# trace -e nanosleep --call-graph dwarf usleep 1
0.063 ( 0.063 ms): usleep/6132 nanosleep(rqtp: 0x7ffd1b7a8e70 ) = 0
syscall_slow_exit_work ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
return_from_SYSCALL_64 ([kernel.kallsyms])
__nanosleep (/usr/lib64/libc-2.22.so)
usleep (/usr/lib64/libc-2.22.so)
[unknown] (/usr/bin/usleep)
__libc_start_main (/usr/lib64/libc-2.22.so)
[unknown] (/usr/bin/usleep)
[root@jouet bpf]#
Better, but perhaps we should try aligning, up to a limit, the function names/DSOs?
[root@jouet bpf]# trace -e nanosleep --call-graph dwarf usleep 1
0.063 ( 0.063 ms): usleep/6132 nanosleep(rqtp: 0x7ffd1b7a8e70 ) = 0
syscall_slow_exit_work ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
return_from_SYSCALL_64 ([kernel.kallsyms])
__nanosleep (/usr/lib64/libc-2.22.so)
usleep (/usr/lib64/libc-2.22.so)
[unknown] (/usr/bin/usleep)
__libc_start_main (/usr/lib64/libc-2.22.so)
[unknown] (/usr/bin/usleep)
[root@jouet bpf]#
wdyt?
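To make the alignment idea concrete, here is a minimal sketch in Python rather than the C used in builtin-trace.c. It is not taken from the patch: `format_callchain`, `max_sym_width` and `indent` are hypothetical names, and the frames are copied from the usleep example above. Symbol names are padded to the longest one, capped at a limit, so the DSO column lines up; names longer than the limit simply overflow.

```python
# Frames as (symbol, dso) pairs, copied from the usleep example in this thread.
FRAMES = [
    ("syscall_slow_exit_work", "[kernel.kallsyms]"),
    ("do_syscall_64", "[kernel.kallsyms]"),
    ("return_from_SYSCALL_64", "[kernel.kallsyms]"),
    ("__nanosleep", "/usr/lib64/libc-2.22.so"),
    ("usleep", "/usr/lib64/libc-2.22.so"),
    ("[unknown]", "/usr/bin/usleep"),
    ("__libc_start_main", "/usr/lib64/libc-2.22.so"),
    ("[unknown]", "/usr/bin/usleep"),
]


def format_callchain(frames, max_sym_width=30, indent=4):
    """Return one line per frame with the DSO column aligned.

    The symbol column is as wide as the longest symbol, but never wider
    than max_sym_width (hypothetical limit); longer symbols are printed
    in full and push their DSO to the right.
    """
    if not frames:
        return []
    width = min(max(len(sym) for sym, _ in frames), max_sym_width)
    return [f"{' ' * indent}{sym:<{width}} ({dso})" for sym, dso in frames]


for line in format_callchain(FRAMES):
    print(line)
```

The cap keeps one very long C++ symbol from pushing every DSO far to the right; whether 30 is a sensible default is an open question.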
Also, after this initial support is in, I think the next step is to
allow per syscall configs, like we have for per tracepoints, i.e. this
should be possible:
# trace -e nanosleep(call-graph=dwarf),socket -a
And then we would get callchains just for nanosleep calls, not for
socket ones. We then need to think how to ask that efficiently to the
kernel, in this case it should be instead of using
raw_syscalls:sys_enter + tracepoint filters set via ioctl, to use
syscalls:sys_{enter,exit}_nanosleep, with callgraphs +
syscalls:sys_{enter,exit}_socket, without.
Doing it this way allows us to avoid asking callchains for a lot of
events when we want just for a few ones, to reduce overhead.
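As a rough illustration of that mapping, the following Python sketch turns the proposed spec into per-syscall tracepoint names, each tagged with the requested callchain mode. The `name(key=value)` syntax is only the proposal from this mail, not an existing option, and `plan_events` is a hypothetical helper; it assumes every option inside the parentheses has the form `key=value`.

```python
import re

# One term is a syscall name, optionally followed by "(key=value,...)".
_TERM = re.compile(r"(\w+)(?:\(([^)]*)\))?")


def plan_events(spec):
    """Map e.g. 'nanosleep(call-graph=dwarf),socket' to a list of
    (tracepoint, callchain_mode) pairs; callchain_mode is None when the
    syscall did not ask for callchains."""
    plan = []
    for match in _TERM.finditer(spec):
        name, opts = match.group(1), match.group(2)
        # Assumption: each option is written as key=value.
        config = dict(kv.split("=", 1) for kv in opts.split(",")) if opts else {}
        for phase in ("enter", "exit"):
            plan.append((f"syscalls:sys_{phase}_{name}", config.get("call-graph")))
    return plan


for tracepoint, mode in plan_events("nanosleep(call-graph=dwarf),socket"):
    print(tracepoint, mode)
```

Only the nanosleep tracepoints carry a callchain mode here, so the socket events would be opened without callchain sampling.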
Anyway, I think I'll just break this down into multiple patches and then
we can work on these other aspects.
David, ah, his patch floated on the linux-perf-users mailing list, easy
one once the thread->priv one got out of the way (it was being used by
builtin-trace.c and the unwind code, ugh).
Thanks,
- Arnaldo
On Friday, 8 April 2016 15:18:53 CEST Arnaldo Carvalho de Melo wrote:
> On Fri, Apr 08, 2016 at 02:57:54PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Fri, Apr 08, 2016 at 01:34:15PM +0200, Milian Wolff wrote:
> > > Now, one can print the call chain for every encountered sys_exit
> > > event, e.g.:
> > >
> > > Note that it is advised to increase the number of mmap pages to
> > > prevent event losses when using this new feature. Often, adding
> > > `-m 10M` to the `perf trace` invocation is enough.
> > >
> > > This feature is also available in strace when built with libunwind
> > > via `strace -k`. Performance wise, this solution is much better:
> > >
> > > $ time find path/to/linux &> /dev/null
> > >
> > > real 0m0.051s
> > > user 0m0.013s
> > > sys 0m0.037s
> > >
> > > $ time perf trace -m 800M --call-graph dwarf find path/to/linux &> /dev/null
> > >
> > > real 0m2.624s
> > > user 0m1.203s
> > > sys 0m1.333s
> > >
> > > $ time strace -k find path/to/linux &> /dev/null
> > >
> > > real 0m35.398s
> > > user 0m10.403s
> > > sys 0m23.173s
> > >
> > > Note that it is currently not possible to configure the print output.
> > > Such a feature, similar to what is available in `perf script` via its
> > > `--fields` knob, can be added later on.
> >
> > You mixed up multiple changes in one single patch, I'll break it down
> > while testing, and before pushing upstream.
>
> Expanding a bit the audience:
>
> First test, it works, great! But do we really need that address? I guess
> not, right, perhaps via some callchain parameter, to tell what we want to
> see? But by default knowing the function name + DSO seems enough, no?
...
> Yeah, you agree with that, now that I read the patch 8-):
>
> + /* TODO: user-configurable print_opts */
> + unsigned int print_opts = PRINT_IP_OPT_IP
;-)
I even tried to make the code of `perf script` reusable for `perf trace`, but
stopped once I realised that it currently relies on the existence of a
`perf_session`, which does not exist when we do live tracing. It only exists
for replaying in `builtin-trace.c`. So it involves some more refactoring which
I did not have the time for.
<snip>
> Better, but perhaps we should try aligning, up to a limit, the function
> names/DSOs?
>
> [root@jouet bpf]# trace -e nanosleep --call-graph dwarf usleep 1
> 0.063 ( 0.063 ms): usleep/6132 nanosleep(rqtp: 0x7ffd1b7a8e70 ) = 0
> syscall_slow_exit_work ([kernel.kallsyms])
> do_syscall_64 ([kernel.kallsyms])
> return_from_SYSCALL_64 ([kernel.kallsyms])
> __nanosleep (/usr/lib64/libc-2.22.so)
> usleep (/usr/lib64/libc-2.22.so)
> [unknown] (/usr/bin/usleep)
> __libc_start_main (/usr/lib64/libc-2.22.so)
> [unknown] (/usr/bin/usleep)
> [root@jouet bpf]#
>
> wdyt?
Yes, sounds good. Many profilers I've worked with always dump the IP, so I
thought we should do it here as well. `perf script` e.g. does it. Could we
maybe print the IP if the symbol is [unknown]?
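A minimal sketch of that fallback, in Python for brevity and not taken from the patch: `format_frame` is a hypothetical helper that prints the raw address only when no symbol could be resolved, and otherwise just the function name and DSO.

```python
def format_frame(ip, sym, dso):
    """Format one callchain entry.

    sym is None when the address could not be resolved to a symbol; in
    that case the hex IP is printed so the frame is still identifiable.
    """
    if sym is None:
        return f"{ip:x} [unknown] ({dso})"
    return f"{sym} ({dso})"


# Addresses and names copied from the usleep example earlier in the thread.
print(format_frame(0x7F416231D524, "usleep", "/usr/lib64/libc-2.22.so"))
print(format_frame(0x563B6C6AFCAB, None, "/usr/bin/usleep"))
```

That keeps the default output short while still leaving something to feed into addr2line for the unresolved frames.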
> Also, after this initial support is in, I think the next step is to
> allow per syscall configs, like we have for per tracepoints, i.e. this
> should be possible:
>
> # trace -e nanosleep(call-graph=dwarf),socket -a
>
> And then we would get callchains just for nanosleep calls, not for
> socket ones. We then need to think how to ask that efficiently to the
> kernel, in this case it should be instead of using
> raw_syscalls:sys_enter + tracepoint filters set via ioctl, to use
> syscalls:sys_{enter,exit}_nanosleep, with callgraphs +
> syscalls:sys_{enter,exit}_socket, without.
>
> Doing it this way allows us to avoid asking callchains for a lot of
> events when we want just for a few ones, to reduce overhead.
Yep, sounds useful for some more specific use-cases. For me, this patch is
sufficient as I'd just do:
$ trace -e nanosleep --call-graph=dwarf ...
What I think is more important, though, is to make sure we only ask for
callchains on the sys_exit events. Afaik, my patch also requests them for
sys_enter, which is just additional cost with no benefit. So I think fixing
that first is even more important, but I don't know how.
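To spell out what "only on sys_exit" would mean at the perf_event_attr level, here is a small Python model, not perf code. The `PERF_SAMPLE_*` bit values match `enum perf_event_sample_format` in the kernel UAPI header; the base set of sample bits and the `sample_types` helper are assumptions made for illustration, not what builtin-trace.c actually configures.

```python
# Bit values from enum perf_event_sample_format (linux/perf_event.h).
PERF_SAMPLE_TID = 1 << 1
PERF_SAMPLE_TIME = 1 << 2
PERF_SAMPLE_CALLCHAIN = 1 << 5
PERF_SAMPLE_RAW = 1 << 10

# Assumption: a plausible base sample_type shared by both tracepoints.
BASE_SAMPLE_TYPE = PERF_SAMPLE_TID | PERF_SAMPLE_TIME | PERF_SAMPLE_RAW


def sample_types(callchain_enabled):
    """Return the sample_type per event, requesting callchains only for
    sys_exit so that sys_enter samples stay small."""
    types = {
        "raw_syscalls:sys_enter": BASE_SAMPLE_TYPE,
        "raw_syscalls:sys_exit": BASE_SAMPLE_TYPE,
    }
    if callchain_enabled:
        types["raw_syscalls:sys_exit"] |= PERF_SAMPLE_CALLCHAIN
    return types


for event, sample_type in sample_types(True).items():
    print(f"{event}: {sample_type:#x}")
```

In other words, the callchain bit would have to be set per event rather than on the whole event list.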
> Anyway, I think I'll just break this down into multiple patches and then
> we can work on these other aspects.
Yes, but note that I'll be busy and then on vacation for the next two weeks.
I'll get back to this afterwards.
> David, ah, his patch floated on the linux-perf-users mailing list, easy
> one once the thread->priv one got out of the way (it was being used by
> builtin-trace.c and the unwind code, ugh).
>
> Thanks,
Same to you, cheers!
--
Milian Wolff | [email protected] | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts