2018-01-12 10:04:08

by Thomas-Mich Richter

[permalink] [raw]
Subject: [PATCH v2] perf trace: Fix missing handling of --call-graph dwarf

On Intel platform when I execute test case
probe libc's inet_pton & backtrace it with ping
I get this output which is correct:

[root@f27 perf]# ./perf trace --no-syscalls
-e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.064 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.064/0.064/0.064/0.000 ms
0.000 probe_libc:inet_pton:(7f26a88498a0))
__GI___inet_pton (/usr/lib64/libc-2.26.so)
getaddrinfo (/usr/lib64/libc-2.26.so)
main (/usr/bin/ping)
[root@f27 perf]#

However when I add the global --call-graph dwarf to the command
line I get the exactly same result which is wrong:

[root@f27 perf]# ./perf trace --no-syscalls --call-graph dwarf
-e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1

My understanding is that the '--call-graph dwarf' option should be global
and applies to all events. However it is ignored.
The issue is buried in function apply_config_terms() called from
trace__run()
--> perf_evlist__config()
--> perf_evsel__config()
--> apply_config_terms()

This function iterates over all additional terms and handles the
/max-stack=<xx>/ case. However if there is no additional call-graph=dwarf
setting as in -e probe_libc:inet_pton/max-stack=3,call-graph=dwarf/
the function defaults to "fp" regardless of the value of --call-graph
global command line flag.

So when I add ,call-graph=dwarf to the event I get a good result:

[root@f27 perf]# ./perf trace --no-syscalls
-e probe_libc:inet_pton/max-stack=3,call-graph=dwarf/
ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.084 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.084/0.084/0.084/0.000 ms
0.000 probe_libc:inet_pton:(7f9372a988a0))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
main (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
_start (/usr/bin/ping)
[root@f27 perf]#

In fact the looks pretty similar to the output on s390x which only has
dwarf support.
Ok the max-stack is not observed, but this time the global --max-stack <x>
comes to rescue and limits the output:

[root@f27 perf]# ./perf trace --no-syscalls --max-stack 4
-e probe_libc:inet_pton/max-stack=3,call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.061 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.061/0.061/0.061/0.000 ms
0.000 probe_libc:inet_pton:(7fc9f6c188a0))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
main (/usr/bin/ping)
[root@f27 perf]#

This patch applies the global --call-graph <xxx> setting to the individual
events.

With this patch applied:
[root@f27 perf]# ./perf trace --no-syscalls --call-graph dwarf
-e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.066 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.066/0.066/0.066/0.000 ms
0.000 probe_libc:inet_pton:(7fe74f8358a0))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
main (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
_start (/usr/bin/ping)
[root@f27 perf]#

When using dwarf the /max-stack=<x>/ is not observed.
The reason is that for call-graph=fp the kernel stack unwinder
is used and it stops after unwinding x entries.
For dwarf the complete stack data is returned and the unwinding
is done by perf itself and it uses --max-stack <x>
command line flag.

Signed-off-by: Thomas Richter <[email protected]>
Reviewed-by: Hendrick Brueckner <[email protected]>
---
tools/perf/util/evsel.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index d5fbcf8c7aa7..d1c25aafb821 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -788,8 +788,24 @@ static void apply_config_terms(struct perf_evsel *evsel,
if ((callgraph_buf != NULL) || (dump_size > 0) || max_stack) {
if (max_stack) {
param.max_stack = max_stack;
- if (callgraph_buf == NULL)
- callgraph_buf = "fp";
+ if (callgraph_buf == NULL) {
+ switch (callchain_param.record_mode) {
+ case CALLCHAIN_DWARF:
+ callgraph_buf = "dwarf";
+ break;
+ case CALLCHAIN_NONE:
+ case CALLCHAIN_FP:
+ callgraph_buf = "fp";
+ break;
+ case CALLCHAIN_LBR:
+ callgraph_buf = "lbr";
+ break;
+ default:
+ case CALLCHAIN_MAX:
+ callgraph_buf = "no";
+ break;
+ }
+ }
}

/* parse callgraph parameters */
--
2.14.3


2018-01-12 14:20:37

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v2] perf trace: Fix missing handling of --call-graph dwarf

Em Fri, Jan 12, 2018 at 11:03:55AM +0100, Thomas Richter escreveu:
> On Intel platform when I execute test case
> probe libc's inet_pton & backtrace it with ping
> I get this output which is correct:
>
> [root@f27 perf]# ./perf trace --no-syscalls
> -e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
> PING ::1(::1) 56 data bytes
> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.064 ms
>
> --- ::1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.064/0.064/0.064/0.000 ms
> 0.000 probe_libc:inet_pton:(7f26a88498a0))
> __GI___inet_pton (/usr/lib64/libc-2.26.so)
> getaddrinfo (/usr/lib64/libc-2.26.so)
> main (/usr/bin/ping)
> [root@f27 perf]#
>
> However when I add the global --call-graph dwarf to the command
> line I get the exactly same result which is wrong:
>
> [root@f27 perf]# ./perf trace --no-syscalls --call-graph dwarf
> -e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
>
> My understanding is that the '--call-graph dwarf' option should be global
> and applies to all events. However it is ignored.

It is global, but then, per-event settings can override it.

So if you say that the following call (without that /max-stack=3/, which
implies DWARF, which is the safest bet for userspace callchains):
[root@jouet ~]# perf trace --no-syscalls --call-graph dwarf -e probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.086 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.086/0.086/0.086/0.000 ms
0.000 probe_libc:inet_pton:(7fafee7ce350))
[root@jouet ~]#

And it doesn't work, right, that is a bug. I'll check if your patch
fixes that and then will correct the cset log, i.e. this part:

[root@f27 perf]# ./perf trace --no-syscalls --call-graph dwarf
-e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1

Should read:

[root@f27 perf]# ./perf trace --no-syscalls --call-graph dwarf
-e probe_libc:inet_pton ping -6 -c 1 ::1

And _should_ produce a call-graph, without a stack limit, ok?

I'll reply to this message after testing your patch.

Thanks,

- Arnaldo

> The issue is buried in function apply_config_terms() called from
> trace__run()
> --> perf_evlist__config()
> --> perf_evsel__config()
> --> apply_config_terms()
>
> This function iterates over all additional terms and handles the
> /max-stack=<xx>/ case. However if there is no additional call-graph=dwarf
> setting as in -e probe_libc:inet_pton/max-stack=3,call-graph=dwarf/
> the function defaults to "fp" regardless of the value of --call-graph
> global command line flag.
>
> So when I add ,call-graph=dwarf to the event I get a good result:
>
> [root@f27 perf]# ./perf trace --no-syscalls
> -e probe_libc:inet_pton/max-stack=3,call-graph=dwarf/
> ping -6 -c 1 ::1
> PING ::1(::1) 56 data bytes
> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.084 ms
>
> --- ::1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.084/0.084/0.084/0.000 ms
> 0.000 probe_libc:inet_pton:(7f9372a988a0))
> __inet_pton (inlined)
> gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
> __GI_getaddrinfo (inlined)
> main (/usr/bin/ping)
> __libc_start_main (/usr/lib64/libc-2.26.so)
> _start (/usr/bin/ping)
> [root@f27 perf]#
>
> In fact the looks pretty similar to the output on s390x which only has
> dwarf support.
> Ok the max-stack is not observed, but this time the global --max-stack <x>
> comes to rescue and limits the output:
>
> [root@f27 perf]# ./perf trace --no-syscalls --max-stack 4
> -e probe_libc:inet_pton/max-stack=3,call-graph=dwarf/ ping -6 -c 1 ::1
> PING ::1(::1) 56 data bytes
> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.061 ms
>
> --- ::1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.061/0.061/0.061/0.000 ms
> 0.000 probe_libc:inet_pton:(7fc9f6c188a0))
> __inet_pton (inlined)
> gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
> __GI_getaddrinfo (inlined)
> main (/usr/bin/ping)
> [root@f27 perf]#
>
> This patch applies the global --call-graph <xxx> setting to the individual
> events.
>
> With this patch applied:
> [root@f27 perf]# ./perf trace --no-syscalls --call-graph dwarf
> -e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
> PING ::1(::1) 56 data bytes
> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.066 ms
>
> --- ::1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.066/0.066/0.066/0.000 ms
> 0.000 probe_libc:inet_pton:(7fe74f8358a0))
> __inet_pton (inlined)
> gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
> __GI_getaddrinfo (inlined)
> main (/usr/bin/ping)
> __libc_start_main (/usr/lib64/libc-2.26.so)
> _start (/usr/bin/ping)
> [root@f27 perf]#
>
> When using dwarf the /max-stack=<x>/ is not observed.
> The reason is that for call-graph=fp the kernel stack unwinder
> is used and it stops after unwinding x entries.
> For dwarf the complete stack data is returned and the unwinding
> is done by perf itself and it uses --max-stack <x>
> command line flag.
>
> Signed-off-by: Thomas Richter <[email protected]>
> Reviewed-by: Hendrick Brueckner <[email protected]>
> ---
> tools/perf/util/evsel.c | 20 ++++++++++++++++++--
> 1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index d5fbcf8c7aa7..d1c25aafb821 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -788,8 +788,24 @@ static void apply_config_terms(struct perf_evsel *evsel,
> if ((callgraph_buf != NULL) || (dump_size > 0) || max_stack) {
> if (max_stack) {
> param.max_stack = max_stack;
> - if (callgraph_buf == NULL)
> - callgraph_buf = "fp";
> + if (callgraph_buf == NULL) {
> + switch (callchain_param.record_mode) {
> + case CALLCHAIN_DWARF:
> + callgraph_buf = "dwarf";
> + break;
> + case CALLCHAIN_NONE:
> + case CALLCHAIN_FP:
> + callgraph_buf = "fp";
> + break;
> + case CALLCHAIN_LBR:
> + callgraph_buf = "lbr";
> + break;
> + default:
> + case CALLCHAIN_MAX:
> + callgraph_buf = "no";
> + break;
> + }
> + }
> }
>
> /* parse callgraph parameters */
> --
> 2.14.3

2018-01-12 15:01:50

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v2] perf trace: Fix missing handling of --call-graph dwarf

Em Fri, Jan 12, 2018 at 11:20:32AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, Jan 12, 2018 at 11:03:55AM +0100, Thomas Richter escreveu:
> > On Intel platform when I execute test case
> > probe libc's inet_pton & backtrace it with ping
> > I get this output which is correct:

> > [root@f27 perf]# ./perf trace --no-syscalls
> > -e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
> > PING ::1(::1) 56 data bytes
> > 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.064 ms

> > --- ::1 ping statistics ---
> > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> > rtt min/avg/max/mdev = 0.064/0.064/0.064/0.000 ms
> > 0.000 probe_libc:inet_pton:(7f26a88498a0))
> > __GI___inet_pton (/usr/lib64/libc-2.26.so)
> > getaddrinfo (/usr/lib64/libc-2.26.so)
> > main (/usr/bin/ping)
> > [root@f27 perf]#
> >
> > However when I add the global --call-graph dwarf to the command
> > line I get the exactly same result which is wrong:
> >
> > [root@f27 perf]# ./perf trace --no-syscalls --call-graph dwarf
> > -e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
> >
> > My understanding is that the '--call-graph dwarf' option should be global
> > and applies to all events. However it is ignored.
>
> It is global, but then, per-event settings can override it.
>
> So if you say that the following call (without that /max-stack=3/, which
> implies DWARF, which is the safest bet for userspace callchains):
> [root@jouet ~]# perf trace --no-syscalls --call-graph dwarf -e probe_libc:inet_pton ping -6 -c 1 ::1
> PING ::1(::1) 56 data bytes
> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.086 ms
>
> --- ::1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.086/0.086/0.086/0.000 ms
> 0.000 probe_libc:inet_pton:(7fafee7ce350))
> [root@jouet ~]#
>
> And it doesn't work, right, that is a bug. I'll check if your patch
> fixes that and then will correct the cset log, i.e. this part:
>
> [root@f27 perf]# ./perf trace --no-syscalls --call-graph dwarf
> -e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
>
> Should read:
>
> [root@f27 perf]# ./perf trace --no-syscalls --call-graph dwarf
> -e probe_libc:inet_pton ping -6 -c 1 ::1
>
> And _should_ produce a call-graph, without a stack limit, ok?
>
> I'll reply to this message after testing your patch.

You patch works as advertised, but we only get the callgraph if we
specify max-stack in the event definition, which is odd, i.e. it serves
to ask for callgraphs, but then its value is ignored, to make it clear,
with your patch applied I get:

[root@jouet ~]# perf trace --no-syscalls --call-graph dwarf -e probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.079 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.079/0.079/0.079/0.000 ms
0.000 probe_libc:inet_pton:(7f4585f7d350))
[root@jouet ~]#

Which is a bug, the global setting should have been applied to the only
event set.

Also:

[root@jouet ~]# perf trace --no-syscalls --call-graph dwarf --max-stack=3 -e probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.065 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.065/0.065/0.065/0.000 ms
0.000 probe_libc:inet_pton:(7ffac40fa350))
[root@jouet ~]#

Also doesn't work, which also is a bug, DWARF callgraphs should have
been set _and_ limited to 3 levels.

[root@jouet ~]# perf trace --no-syscalls -e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.063 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.063/0.063/0.063/0.000 ms
0.000 probe_libc:inet_pton:(7f79a0312350))
__GI___inet_pton (/usr/lib64/libc-2.26.so)
getaddrinfo (/usr/lib64/libc-2.26.so)
[0xffffa9fef2875f40] (/usr/bin/ping)
[root@jouet ~]#

Works as advertised, i.e. it sets DWARF callgraphs for that event only
and limits it to 3.

Then if we try with your patch:

[root@jouet ~]# perf trace --no-syscalls --call-graph dwarf -e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.082 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.082/0.082/0.082/0.000 ms
0.000 probe_libc:inet_pton:(7f4c3d953350))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
[0xffffaa1b49e1ef3f] (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
[0xffffaa1b49e1f379] (/usr/bin/ping)
[root@jouet ~]#

It works but ignores the max-stack setting, which is wrong, as per-event
setting should override global ones.

So what I think is happening is that --call-graph is only affecting
syscall events, i.e.:

[root@jouet ~]# perf trace -e nanosleep sleep 1
1.047 (1000.182 ms): sleep/11767 nanosleep(rqtp: 0x7ffc7fbdf060) = 0
[root@jouet ~]# perf trace --call-graph dwarf -e nanosleep sleep 1
no symbols found in /usr/bin/sleep, maybe install a debug package?
0.380 (1000.161 ms): sleep/11772 nanosleep(rqtp: 0x7ffffe08a460) = 0
__GI___nanosleep (inlined)
[0xffffa9e3d99a0856] (/usr/bin/sleep)
[0xffffa9e3d99a062f] (/usr/bin/sleep)
[0xffffa9e3d999d903] (/usr/bin/sleep)
__libc_start_main (/usr/lib64/libc-2.26.so)
[0xffffa9e3d999d9f9] (/usr/bin/sleep)
[root@jouet ~]# perf trace --call-graph dwarf --max-stack 3 -e nanosleep sleep 1
no symbols found in /usr/bin/sleep, maybe install a debug package?
0.954 (1000.160 ms): sleep/11786 nanosleep(rqtp: 0x7ffdd19d42c0) = 0
__GI___nanosleep (inlined)
[0xffffa9ed007bd856] (/usr/bin/sleep)
[0xffffa9ed007bd62f] (/usr/bin/sleep)
[root@jouet ~]# perf trace --max-stack 3 -e nanosleep sleep 1
no symbols found in /usr/bin/sleep, maybe install a debug package?
0.745 (1000.170 ms): sleep/11791 nanosleep(rqtp: 0x7ffd0f665590) = 0
__GI___nanosleep (inlined)
[0xffffaa6de4631856] (/usr/bin/sleep)
[0xffffaa6de463162f] (/usr/bin/sleep)
[root@jouet ~]#

So what we should fix is the setting of --call-graph and --max-stack in
the non syscall events, that became supported by 'perf trace' later, so
I'm not applying your patch and will try later to fix this, unless you
beat me to it.

Thanks!

- Arnaldo

2018-01-12 16:47:10

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v2] perf trace: Fix missing handling of --call-graph dwarf

Em Fri, Jan 12, 2018 at 12:01:42PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, Jan 12, 2018 at 11:20:32AM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Fri, Jan 12, 2018 at 11:03:55AM +0100, Thomas Richter escreveu:
> > > On Intel platform when I execute test case
> > > probe libc's inet_pton & backtrace it with ping
> > > I get this output which is correct:
>
> > > [root@f27 perf]# ./perf trace --no-syscalls
> > > -e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
> > > PING ::1(::1) 56 data bytes
> > > 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.064 ms
>
> > > --- ::1 ping statistics ---
> > > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> > > rtt min/avg/max/mdev = 0.064/0.064/0.064/0.000 ms
> > > 0.000 probe_libc:inet_pton:(7f26a88498a0))
> > > __GI___inet_pton (/usr/lib64/libc-2.26.so)
> > > getaddrinfo (/usr/lib64/libc-2.26.so)
> > > main (/usr/bin/ping)
> > > [root@f27 perf]#
> > >
> > > However when I add the global --call-graph dwarf to the command
> > > line I get the exactly same result which is wrong:
> > >
> > > [root@f27 perf]# ./perf trace --no-syscalls --call-graph dwarf
> > > -e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
> > >
> > > My understanding is that the '--call-graph dwarf' option should be global
> > > and applies to all events. However it is ignored.
> >
> > It is global, but then, per-event settings can override it.
> >
> > So if you say that the following call (without that /max-stack=3/, which
> > implies DWARF, which is the safest bet for userspace callchains):
> > [root@jouet ~]# perf trace --no-syscalls --call-graph dwarf -e probe_libc:inet_pton ping -6 -c 1 ::1
> > PING ::1(::1) 56 data bytes
> > 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.086 ms
> >
> > --- ::1 ping statistics ---
> > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> > rtt min/avg/max/mdev = 0.086/0.086/0.086/0.000 ms
> > 0.000 probe_libc:inet_pton:(7fafee7ce350))
> > [root@jouet ~]#
> >
> > And it doesn't work, right, that is a bug. I'll check if your patch
> > fixes that and then will correct the cset log, i.e. this part:
> >
> > [root@f27 perf]# ./perf trace --no-syscalls --call-graph dwarf
> > -e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
> >
> > Should read:
> >
> > [root@f27 perf]# ./perf trace --no-syscalls --call-graph dwarf
> > -e probe_libc:inet_pton ping -6 -c 1 ::1
> >
> > And _should_ produce a call-graph, without a stack limit, ok?
> >
> > I'll reply to this message after testing your patch.
>
> You patch works as advertised, but we only get the callgraph if we
> specify max-stack in the event definition, which is odd, i.e. it serves
> to ask for callgraphs, but then its value is ignored, to make it clear,
> with your patch applied I get:
>
> [root@jouet ~]# perf trace --no-syscalls --call-graph dwarf -e probe_libc:inet_pton ping -6 -c 1 ::1
> PING ::1(::1) 56 data bytes
> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.079 ms
>
> --- ::1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.079/0.079/0.079/0.000 ms
> 0.000 probe_libc:inet_pton:(7f4585f7d350))
> [root@jouet ~]#
>
> Which is a bug, the global setting should have been applied to the only
> event set.

Please pull my perf/core branch, there are two csets there that should
fix this:

[acme@jouet perf]$ git log --oneline -2
cc28fec89f2a (HEAD -> perf/core, acme.korg/perf/core) perf trace: Fix setting of --call-graph/--max-stack for non-syscall events
236d812c55c2 perf trace: No need to set PERF_SAMPLE_IDENTIFIER explicitely
[acme@jouet perf]$

https://git.kernel.org/acme/c/cc28fec89f2a
https://git.kernel.org/acme/c/236d812c55c2

236d812c55c2 is just a simplification to make cc28fec89f2a smaller and
with just one purpose.

There is still room for improvement, I noticed overriding is not working
for the probe event, investigating it now.

- Arnaldo

2018-01-12 20:02:33

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v2] perf trace: Fix missing handling of --call-graph dwarf

Em Fri, Jan 12, 2018 at 01:47:06PM -0300, Arnaldo Carvalho de Melo escreveu:
> There is still room for improvement, I noticed overriding is not working
> for the probe event, investigating it now.

So, I had to fix this another way to get the possibility of overwriting
the global options (--max-stack, --call-graph) in an specific tracepoint
event:

http://git.kernel.org/acme/c/08e26396c6f2

replaced that HEAD.

This cset may take some more minutes to show up, just pushed.

- Arnaldo

2018-01-15 09:11:12

by Thomas-Mich Richter

[permalink] [raw]
Subject: Re: [PATCH v2] perf trace: Fix missing handling of --call-graph dwarf

On 01/12/2018 09:02 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Jan 12, 2018 at 01:47:06PM -0300, Arnaldo Carvalho de Melo escreveu:
>> There is still room for improvement, I noticed overriding is not working
>> for the probe event, investigating it now.
>
> So, I had to fix this another way to get the possibility of overwriting
> the global options (--max-stack, --call-graph) in an specific tracepoint
> event:
>
> http://git.kernel.org/acme/c/08e26396c6f2
>
> replaced that HEAD.
>
> This cset may take some more minutes to show up, just pushed.
>

Sorry this does *not* work on my s390x.

I have cloned your perf/core tree and above commit is included.
Here is the command I tried:

[root@s8360047 perf]# ./perf trace -vv --no-syscalls --max-stack 4 --call-graph dwarf
-e probe_libc:inet_pton -- ping -6 -c 1 ::1
callchain: type DWARF
callchain: stack dump size 8192
------------------------------------------------------------
perf_event_attr:
type 2
size 112
config 0x45f
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|ADDR|CPU|PERIOD|RAW|DATA_SRC
disabled 1
inherit 1
mmap 1
comm 1
enable_on_exec 1
task 1
mmap_data 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
{ wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid 6735 cpu 0 group_fd -1 flags 0x8 = 3
sys_perf_event_open: pid 6735 cpu 1 group_fd -1 flags 0x8 = 4
mmap size 2101248B
perf event ring buffer mmapped per cpu
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.070 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.070/0.070/0.070/0.000 ms
0.000 probe_libc:inet_pton:(3ffada42060))
[root@s8360047 perf]#

I do miss in sample_type bits for CALLCHAIN, STACK_USER and REG_USER
in the debug output of the perf_event_open() attribute printout.

When I invoke the command with
[root@s8360047 perf]# ./perf trace -vv --no-syscalls
-e probe_libc:inet_pton/call-graph=dwarf,max-stack=3/ -- ping -6 -c 1 ::1
callchain: type DWARF
callchain: stack dump size 8192
------------------------------------------------------------
perf_event_attr:
type 2
size 112
config 0x45f
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CALLCHAIN|CPU|PERIOD|RAW|REGS_USER|STACK_USER
disabled 1
inherit 1
mmap 1
comm 1
enable_on_exec 1
task 1
sample_id_all 1
exclude_guest 1
exclude_callchain_user 1
mmap2 1
comm_exec 1
{ wakeup_events, wakeup_watermark } 1
sample_regs_user 0x3ffffffff
sample_stack_user 8192
sample_max_stack 3
------------------------------------------------------------
sys_perf_event_open: pid 6768 cpu 0 group_fd -1 flags 0x8 = 3
sys_perf_event_open: pid 6768 cpu 1 group_fd -1 flags 0x8 = 4
mmap size 528384B
perf event ring buffer mmapped per cpu
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.074 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.074/0.074/0.074/0.000 ms
[... snip ....]

unwind: _start:ip = 0x2aa1e38457b (0x457b)
0.000 probe_libc:inet_pton:(3ff9b142060))
__GI___inet_pton (/usr/lib64/libc-2.26.so)
gaih_inet (inlined)
__GI_getaddrinfo (inlined)
main (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
_start (/usr/bin/ping)
[root@s8360047 perf]#

I see the proper result as expected.

--
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

2018-01-15 12:31:10

by Thomas-Mich Richter

[permalink] [raw]
Subject: Re: [PATCH v2] perf trace: Fix missing handling of --call-graph dwarf

On 01/12/2018 09:02 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Jan 12, 2018 at 01:47:06PM -0300, Arnaldo Carvalho de Melo escreveu:
>> There is still room for improvement, I noticed overriding is not working
>> for the probe event, investigating it now.
>
> So, I had to fix this another way to get the possibility of overwriting
> the global options (--max-stack, --call-graph) in an specific tracepoint
> event:
>
> http://git.kernel.org/acme/c/08e26396c6f2
>
> replaced that HEAD.
>
> This cset may take some more minutes to show up, just pushed.
>

I have installed your perf/core tree on my Fedora 27 Virtual Machine
running on my Intel notebook.

Here are some commands and the output on an Intel platform:

[root@f27 perf]# uname -a
Linux f27 4.15.0-rc6acme+ #1 SMP Mon Jan 15 12:35:23 CET 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@f27 perf]#

[root@f27 perf]# ./perf trace --no-syscalls --call-graph fp --max-stack 3
-e probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.022 ms
0.000 probe_libc:inet_pton:
--- ::1 ping statistics ---
(7f8fc407d8a0))
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.022/0.022/0.022/0.000 ms
[root@f27 perf]#

--> No call graph at all, the kernel as been compiled with ORC unwinder.

[root@f27 perf]# ./perf trace --no-syscalls --call-graph dwarf --max-stack 3
-e probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.024 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.024/0.024/0.024/0.000 ms
0.000 probe_libc:inet_pton:(7f7ff38488a0))
[root@f27 perf]#

--> No call graph at all, the kernel as been compiled with OCR unwinder.

[root@f27 perf]# ./perf trace --no-syscalls
-e probe_libc:inet_pton/call-graph=dwarf,max-stack=4/
ping -6 -c 1 ::1PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.019 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.019/0.019/0.019/0.000 ms
0.000 probe_libc:inet_pton:(7fc985d658a0))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
main (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
_start (/usr/bin/ping)
[root@f27 perf]#

--> Dwarf call graph, but max-stack=4 not honoured when specified as
event specific restriction.

[root@f27 perf]# ./perf trace --no-syscalls --max-stack 4
-e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.020 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.020/0.020/0.020/0.000 ms
0.000 probe_libc:inet_pton:(7ffbc5f768a0))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
main (/usr/bin/ping)
[root@f27 perf]#


--> Dwarf call graph and --max-stack 4 is also honoured.

I have the feeling that your fix does not work very well when
used with the --no-syscalls option.
Omitting --no-syscalls shows your explained behavior.

So there must be a difference between --no-syscalls and --syscalls
invocation.

--
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

2018-01-15 13:20:59

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v2] perf trace: Fix missing handling of --call-graph dwarf

Em Mon, Jan 15, 2018 at 10:11:02AM +0100, Thomas-Mich Richter escreveu:
> On 01/12/2018 09:02 PM, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Jan 12, 2018 at 01:47:06PM -0300, Arnaldo Carvalho de Melo escreveu:
> >> There is still room for improvement, I noticed overriding is not working
> >> for the probe event, investigating it now.
> >
> > So, I had to fix this another way to get the possibility of overwriting
> > the global options (--max-stack, --call-graph) in an specific tracepoint
> > event:
> >
> > http://git.kernel.org/acme/c/08e26396c6f2
> >
> > replaced that HEAD.
> >
> > This cset may take some more minutes to show up, just pushed.
> >
>
> Sorry this does *not* work on my s390x.

> I have cloned your perf/core tree and above commit is included.
> Here is the command I tried:
>
> [root@s8360047 perf]# ./perf trace -vv --no-syscalls --max-stack 4 --call-graph dwarf
> -e probe_libc:inet_pton -- ping -6 -c 1 ::1

yeah, I'm looking at this now, reproduced here on x86_64 :-\

- Arnaldo

> callchain: type DWARF
> callchain: stack dump size 8192
> ------------------------------------------------------------
> perf_event_attr:
> type 2
> size 112
> config 0x45f
> { sample_period, sample_freq } 1
> sample_type IP|TID|TIME|ADDR|CPU|PERIOD|RAW|DATA_SRC
> disabled 1
> inherit 1
> mmap 1
> comm 1
> enable_on_exec 1
> task 1
> mmap_data 1
> sample_id_all 1
> exclude_guest 1
> mmap2 1
> comm_exec 1
> { wakeup_events, wakeup_watermark } 1
> ------------------------------------------------------------
> sys_perf_event_open: pid 6735 cpu 0 group_fd -1 flags 0x8 = 3
> sys_perf_event_open: pid 6735 cpu 1 group_fd -1 flags 0x8 = 4
> mmap size 2101248B
> perf event ring buffer mmapped per cpu
> PING ::1(::1) 56 data bytes
> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.070 ms
>
> --- ::1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.070/0.070/0.070/0.000 ms
> 0.000 probe_libc:inet_pton:(3ffada42060))
> [root@s8360047 perf]#
>
> I do miss in sample_type bits for CALLCHAIN, STACK_USER and REG_USER
> in the debug output of the perf_event_open() attribute printout.


> When I invoke the command with
> [root@s8360047 perf]# ./perf trace -vv --no-syscalls
> -e probe_libc:inet_pton/call-graph=dwarf,max-stack=3/ -- ping -6 -c 1 ::1
> callchain: type DWARF
> callchain: stack dump size 8192
> ------------------------------------------------------------
> perf_event_attr:
> type 2
> size 112
> config 0x45f
> { sample_period, sample_freq } 1
> sample_type IP|TID|TIME|CALLCHAIN|CPU|PERIOD|RAW|REGS_USER|STACK_USER
> disabled 1
> inherit 1
> mmap 1
> comm 1
> enable_on_exec 1
> task 1
> sample_id_all 1
> exclude_guest 1
> exclude_callchain_user 1
> mmap2 1
> comm_exec 1
> { wakeup_events, wakeup_watermark } 1
> sample_regs_user 0x3ffffffff
> sample_stack_user 8192
> sample_max_stack 3
> ------------------------------------------------------------
> sys_perf_event_open: pid 6768 cpu 0 group_fd -1 flags 0x8 = 3
> sys_perf_event_open: pid 6768 cpu 1 group_fd -1 flags 0x8 = 4
> mmap size 528384B
> perf event ring buffer mmapped per cpu
> PING ::1(::1) 56 data bytes
> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.074 ms
>
> --- ::1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.074/0.074/0.074/0.000 ms
> [... snip ....]
>
> unwind: _start:ip = 0x2aa1e38457b (0x457b)
> 0.000 probe_libc:inet_pton:(3ff9b142060))
> __GI___inet_pton (/usr/lib64/libc-2.26.so)
> gaih_inet (inlined)
> __GI_getaddrinfo (inlined)
> main (/usr/bin/ping)
> __libc_start_main (/usr/lib64/libc-2.26.so)
> _start (/usr/bin/ping)
> [root@s8360047 perf]#
>
> I see the proper result as expected.
>
> --
> Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
> --
> Vorsitzende des Aufsichtsrats: Martina Koederitz
> Gesch?ftsf?hrung: Dirk Wittkopp
> Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

2018-01-15 13:58:01

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v2] perf trace: Fix missing handling of --call-graph dwarf

Em Mon, Jan 15, 2018 at 01:31:00PM +0100, Thomas-Mich Richter escreveu:
> On 01/12/2018 09:02 PM, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Jan 12, 2018 at 01:47:06PM -0300, Arnaldo Carvalho de Melo escreveu:
> >> There is still room for improvement, I noticed overriding is not working
> >> for the probe event, investigating it now.
> >
> > So, I had to fix this another way to get the possibility of overwriting
> > the global options (--max-stack, --call-graph) in an specific tracepoint
> > event:
> >
> > http://git.kernel.org/acme/c/08e26396c6f2
> >
> > replaced that HEAD.
> >
> > This cset may take some more minutes to show up, just pushed.
> >
>
> I have installed your perf/core tree on my Fedora 27 Virtual Machine
> running on my Intel notebook.
>
> Here are some commands and the output on an Intel platform:
>
> [root@f27 perf]# uname -a
> Linux f27 4.15.0-rc6acme+ #1 SMP Mon Jan 15 12:35:23 CET 2018 x86_64 x86_64 x86_64 GNU/Linux
> [root@f27 perf]#
>
> [root@f27 perf]# ./perf trace --no-syscalls --call-graph fp --max-stack 3
> -e probe_libc:inet_pton ping -6 -c 1 ::1
> PING ::1(::1) 56 data bytes
> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.022 ms
> 0.000 probe_libc:inet_pton:
> --- ::1 ping statistics ---
> (7f8fc407d8a0))
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.022/0.022/0.022/0.000 ms
> [root@f27 perf]#
>
> --> No call graph at all, the kernel as been compiled with ORC unwinder.

Results with the patch at the end of this message:

[root@jouet ~]# perf trace --no-syscalls --call-graph fp --max-stack 3 -e probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.061 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.061/0.061/0.061/0.000 ms
0.000 probe_libc:inet_pton:(7f475b8be350))
__GI___inet_pton (/usr/lib64/libc-2.26.so)
getaddrinfo (/usr/lib64/libc-2.26.so)
[0xffffaa7de21b9f40] (/usr/bin/ping)
[root@jouet ~]#


> [root@f27 perf]# ./perf trace --no-syscalls --call-graph dwarf --max-stack 3
> -e probe_libc:inet_pton ping -6 -c 1 ::1
> PING ::1(::1) 56 data bytes
> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.024 ms
>
> --- ::1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.024/0.024/0.024/0.000 ms
> 0.000 probe_libc:inet_pton:(7f7ff38488a0))
> [root@f27 perf]#
>
> --> No call graph at all, the kernel as been compiled with OCR unwinder.

[root@jouet ~]# perf trace --no-syscalls --call-graph fp --max-stack 3 -e probe_libc:inet_pton ping -6 -c 1 ::1
perf trace --no-syscalls --call-graph dwarf --max-stack 3 -e probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.080 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.080/0.080/0.080/0.000 ms
0.000 probe_libc:inet_pton:(7f33e9647350))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
[root@jouet ~]#

And here we see a difference in the fp and DWARF unwinders, have to dig
deeper into this one, perhaps using the DWARF one we have more info and
then can end up with inlines instead of what it calls.

> [root@f27 perf]# ./perf trace --no-syscalls
> -e probe_libc:inet_pton/call-graph=dwarf,max-stack=4/
> ping -6 -c 1 ::1PING ::1(::1) 56 data bytes
> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.019 ms
>
> --- ::1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.019/0.019/0.019/0.000 ms
> 0.000 probe_libc:inet_pton:(7fc985d658a0))
> __inet_pton (inlined)
> gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
> __GI_getaddrinfo (inlined)
> main (/usr/bin/ping)
> __libc_start_main (/usr/lib64/libc-2.26.so)
> _start (/usr/bin/ping)
> [root@f27 perf]#
>
> --> Dwarf call graph, but max-stack=4 not honoured when specified as
> event specific restriction.

[root@jouet ~]# perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf,max-stack=4/
0.000 probe_libc:inet_pton:(7f10482eb350))
Problem processing probe_libc:inet_pton callchain, skipping...
717.753 probe_libc:inet_pton:(7f10482eb350))
Problem processing probe_libc:inet_pton callchain, skipping...
718.182 probe_libc:inet_pton:(7f10482eb350))
Problem processing probe_libc:inet_pton callchain, skipping...
2662.046 probe_libc:inet_pton:(7f10482eb350))
Problem processing probe_libc:inet_pton callchain, skipping...
^C[root@jouet ~]#

Need to investigate...

> [root@f27 perf]# ./perf trace --no-syscalls --max-stack 4
> -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
> PING ::1(::1) 56 data bytes
> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.020 ms
>
> --- ::1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.020/0.020/0.020/0.000 ms
> 0.000 probe_libc:inet_pton:(7ffbc5f768a0))
> __inet_pton (inlined)
> gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
> __GI_getaddrinfo (inlined)
> main (/usr/bin/ping)
> [root@f27 perf]#
>
>
> --> Dwarf call graph and --max-stack 4 is also honoured.

[root@jouet ~]# perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf,max-stack=4/
perf trace --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
Value too large for defined data type
[root@jouet ~]#

Grrr.
>
> I have the feeling that your fix does not work very well when
> used with the --no-syscalls option.
> Omitting --no-syscalls shows your explained behavior.
>
> So there must be a difference between --no-syscalls and --syscalls
> invocation.

Right, there is, please see the patch below, but I need to fix the other
issues I noticed here but you haven't, perhaps we're using a different
DWARF unwinder (there are two possible libraries to use).

commit 7a24f244b452cfc0acfe35d1a6168c9a8ff483ce
Author: Arnaldo Carvalho de Melo <[email protected]>
Date: Mon Jan 15 10:39:55 2018 -0300

perf trace: Setup DWARF callchains for non-syscall events when --max-stack is used

If we use:

perf trace --max-stack=4

then the syscall events will use DWARF callchains, when available
(libunwind enabled in the build) and the printing will stop at 4 levels.

When we introduced support for tracepoint events this ended up not
applying for them, fix it.

Reported-by: Thomas Richter <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Hendrick Brueckner <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 0362974854e9..c6c7498562ea 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -3065,11 +3065,23 @@ int cmd_trace(int argc, const char **argv)
}

#ifdef HAVE_DWARF_UNWIND_SUPPORT
- if ((trace.min_stack || max_stack_user_set) && !callchain_param.enabled && trace.trace_syscalls)
+ if ((trace.min_stack || max_stack_user_set) && !callchain_param.enabled) {
record_opts__parse_callchain(&trace.opts, &callchain_param, "dwarf", false);
+ }
#endif

if (callchain_param.enabled) {
+ struct perf_evsel *evsel;
+ /*
+ * When we use --max-stack and/or --call-graph we need to look at all the
+ * other events passed in the command line to apply this config to them,
+ * if not explicitely set.
+ */
+ evlist__for_each_entry(trace.evlist, evsel) {
+ if (!(evsel->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
+ perf_evsel__config_callchain(evsel, &trace.opts, &callchain_param);
+ }
+
if (!mmap_pages_user_set && geteuid() == 0)
trace.opts.mmap_pages = perf_event_mlock_kb_in_pages() * 4;


2018-01-15 14:16:27

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v2] perf trace: Fix missing handling of --call-graph dwarf

Em Mon, Jan 15, 2018 at 10:57:52AM -0300, Arnaldo Carvalho de Melo escreveu:
> > [root@f27 perf]# ./perf trace --no-syscalls --max-stack 4
> > -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
> > PING ::1(::1) 56 data bytes
> > 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.020 ms
> >
> > --- ::1 ping statistics ---
> > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> > rtt min/avg/max/mdev = 0.020/0.020/0.020/0.000 ms
> > 0.000 probe_libc:inet_pton:(7ffbc5f768a0))
> > __inet_pton (inlined)
> > gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
> > __GI_getaddrinfo (inlined)
> > main (/usr/bin/ping)
> > [root@f27 perf]#
> >
> >
> > --> Dwarf call graph and --max-stack 4 is also honoured.
>
> [root@jouet ~]# perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf,max-stack=4/
> perf trace --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
> Value too large for defined data type
> [root@jouet ~]#
>
> Grrr.


Got this one fixed with the following patch:

commit b78278e11f6992ca348a4b96aad3b2c0a9ecf0f0
Author: Arnaldo Carvalho de Melo <[email protected]>
Date: Mon Jan 15 11:07:58 2018 -0300

perf callchain: Fix attr.sample_max_stack setting

When setting the "dwarf" unwinder for a specific event and not
specifying the max-stack, the attr.sample_max_stack ended up using an
uninitialized callchain_param.max_stack, fix it by using designated
initializers for that callchain_param variable, zeroing all non
explicitely initialized struct members.

Here is what happened:

# perf trace -vv --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
callchain: type DWARF
callchain: stack dump size 8192
perf_event_attr:
type 2
size 112
config 0x730
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|ADDR|CALLCHAIN|CPU|PERIOD|RAW|REGS_USER|STACK_USER|DATA_SRC
exclude_callchain_user 1
{ wakeup_events, wakeup_watermark } 1
sample_regs_user 0xff0fff
sample_stack_user 8192
sample_max_stack 50656
sys_perf_event_open failed, error -75
Value too large for defined data type
# perf trace -vv --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
callchain: type DWARF
callchain: stack dump size 8192
perf_event_attr:
type 2
size 112
config 0x730
sample_type IP|TID|TIME|ADDR|CALLCHAIN|CPU|PERIOD|RAW|REGS_USER|STACK_USER|DATA_SRC
exclude_callchain_user 1
sample_regs_user 0xff0fff
sample_stack_user 8192
sample_max_stack 30448
sys_perf_event_open failed, error -75
Value too large for defined data type
#

Now the attr.sample_max_stack is set to zero and the above works as
expected:

# perf trace --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.072 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.072/0.072/0.072/0.000 ms
0.000 probe_libc:inet_pton:(7feb7a998350))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
[0xffffaa39b6108f3f] (/usr/bin/ping)
#

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Hendrick Brueckner <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Wang Nan <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index efa2e629a669..8f971a2301d1 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -731,14 +731,14 @@ static void apply_config_terms(struct perf_evsel *evsel,
struct perf_evsel_config_term *term;
struct list_head *config_terms = &evsel->config_terms;
struct perf_event_attr *attr = &evsel->attr;
- struct callchain_param param;
+ /* callgraph default */
+ struct callchain_param param = {
+ .record_mode = callchain_param.record_mode,
+ };
u32 dump_size = 0;
int max_stack = 0;
const char *callgraph_buf = NULL;

- /* callgraph default */
- param.record_mode = callchain_param.record_mode;
-
list_for_each_entry(term, config_terms, list) {
switch (term->type) {
case PERF_EVSEL__CONFIG_TERM_PERIOD:

2018-01-15 14:45:57

by Hendrik Brueckner

[permalink] [raw]
Subject: Re: [PATCH v2] perf trace: Fix missing handling of --call-graph dwarf

On Mon, Jan 15, 2018 at 10:57:52AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jan 15, 2018 at 01:31:00PM +0100, Thomas-Mich Richter escreveu:
> > On 01/12/2018 09:02 PM, Arnaldo Carvalho de Melo wrote:
> > > Em Fri, Jan 12, 2018 at 01:47:06PM -0300, Arnaldo Carvalho de Melo escreveu:
> > >
> [root@jouet ~]# perf trace --no-syscalls --call-graph fp --max-stack 3 -e probe_libc:inet_pton ping -6 -c 1 ::1
> perf trace --no-syscalls --call-graph dwarf --max-stack 3 -e probe_libc:inet_pton ping -6 -c 1 ::1
> PING ::1(::1) 56 data bytes
> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.080 ms
>
> --- ::1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.080/0.080/0.080/0.000 ms
> 0.000 probe_libc:inet_pton:(7f33e9647350))
> __inet_pton (inlined)
> gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
> __GI_getaddrinfo (inlined)
> [root@jouet ~]#
>
> And here we see a difference in the fp and DWARF unwinders, have to dig
> deeper into this one, perhaps using the DWARF one we have more info and
> then can end up with inlines instead of what it calls.

DWARF includes information about which functions are inlined. The ORC
unwinder data is based on the analysis of the generated instructions
(created with the objtool). Hence, it does no (and no longer) know whether
a bunch of instructions are originally part of a function or inlined.

Thanks and kind regards,
Hendrik

2018-01-15 14:50:28

by Thomas-Mich Richter

[permalink] [raw]
Subject: Re: [PATCH v2] perf trace: Fix missing handling of --call-graph dwarf

On 01/15/2018 03:16 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jan 15, 2018 at 10:57:52AM -0300, Arnaldo Carvalho de Melo escreveu:
>>> [root@f27 perf]# ./perf trace --no-syscalls --max-stack 4
>>> -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
>>> PING ::1(::1) 56 data bytes
>>> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.020 ms
>>>
>>> --- ::1 ping statistics ---
>>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>> rtt min/avg/max/mdev = 0.020/0.020/0.020/0.000 ms
>>> 0.000 probe_libc:inet_pton:(7ffbc5f768a0))
>>> __inet_pton (inlined)
>>> gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
>>> __GI_getaddrinfo (inlined)
>>> main (/usr/bin/ping)
>>> [root@f27 perf]#
>>>
>>>
>>> --> Dwarf call graph and --max-stack 4 is also honoured.
>>
>> [root@jouet ~]# perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf,max-stack=4/
>> perf trace --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
>> Value too large for defined data type
>> [root@jouet ~]#
>>
>> Grrr.
>
>
> Got this one fixed with the following patch:
>
> commit b78278e11f6992ca348a4b96aad3b2c0a9ecf0f0

[...]

>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index efa2e629a669..8f971a2301d1 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -731,14 +731,14 @@ static void apply_config_terms(struct perf_evsel *evsel,
> struct perf_evsel_config_term *term;
> struct list_head *config_terms = &evsel->config_terms;
> struct perf_event_attr *attr = &evsel->attr;
> - struct callchain_param param;
> + /* callgraph default */
> + struct callchain_param param = {
> + .record_mode = callchain_param.record_mode,
> + };
> u32 dump_size = 0;
> int max_stack = 0;
> const char *callgraph_buf = NULL;
>
> - /* callgraph default */
> - param.record_mode = callchain_param.record_mode;
> -
> list_for_each_entry(term, config_terms, list) {
> switch (term->type) {
> case PERF_EVSEL__CONFIG_TERM_PERIOD:

This patch works for me.
Here is the output on my s390x:

[root@s8360047 perf]# ./perf trace --no-syscalls --max-stack 4
-e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.070 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.070/0.070/0.070/0.000 ms
0.000 probe_libc:inet_pton:(3ffa70c2060))
__GI___inet_pton (/usr/lib64/libc-2.26.so)
gaih_inet (inlined)
__GI_getaddrinfo (inlined)
main (/usr/bin/ping)
[root@s8360047 perf]# ./perf trace --no-syscalls --call-graph dwarf
-e probe_libc:inet_pton -- ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.086 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.086/0.086/0.086/0.000 ms
0.000 probe_libc:inet_pton:(3ff93fc2060))
__GI___inet_pton (/usr/lib64/libc-2.26.so)
gaih_inet (inlined)
__GI_getaddrinfo (inlined)
main (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
_start (/usr/bin/ping)
[root@s8360047 perf]#

[root@s8360047 perf]# ./perf trace --no-syscalls
-e probe_libc:inet_pton/call-graph=dwarf,max-stack=3/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.066 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.066/0.066/0.066/0.000 ms
0.000 probe_libc:inet_pton:(3ffb82c2060))
__GI___inet_pton (/usr/lib64/libc-2.26.so)
gaih_inet (inlined)
__GI_getaddrinfo (inlined)
main (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
_start (/usr/bin/ping)
[root@s8360047 perf]#

Not sure if this can work at all. Since dwarf stack unwinding is done in user space
the attr.sample_max_stack set to 3 is useless in this case.

Yout have my Tested-by.
--
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

2018-01-15 15:02:33

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v2] perf trace: Fix missing handling of --call-graph dwarf

Em Mon, Jan 15, 2018 at 03:50:15PM +0100, Thomas-Mich Richter escreveu:
> On 01/15/2018 03:16 PM, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Jan 15, 2018 at 10:57:52AM -0300, Arnaldo Carvalho de Melo escreveu:
> >>> [root@f27 perf]# ./perf trace --no-syscalls --max-stack 4
> >>> -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
> >>> PING ::1(::1) 56 data bytes
> >>> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.020 ms
> >>>
> >>> --- ::1 ping statistics ---
> >>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> >>> rtt min/avg/max/mdev = 0.020/0.020/0.020/0.000 ms
> >>> 0.000 probe_libc:inet_pton:(7ffbc5f768a0))
> >>> __inet_pton (inlined)
> >>> gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
> >>> __GI_getaddrinfo (inlined)
> >>> main (/usr/bin/ping)
> >>> [root@f27 perf]#
> >>>
> >>>
> >>> --> Dwarf call graph and --max-stack 4 is also honoured.
> >>
> >> [root@jouet ~]# perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf,max-stack=4/
> >> perf trace --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
> >> Value too large for defined data type
> >> [root@jouet ~]#
> >>
> >> Grrr.
> >
> >
> > Got this one fixed with the following patch:
> >
> > commit b78278e11f6992ca348a4b96aad3b2c0a9ecf0f0
>
> [...]
>
> >
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index efa2e629a669..8f971a2301d1 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -731,14 +731,14 @@ static void apply_config_terms(struct perf_evsel *evsel,
> > struct perf_evsel_config_term *term;
> > struct list_head *config_terms = &evsel->config_terms;
> > struct perf_event_attr *attr = &evsel->attr;
> > - struct callchain_param param;
> > + /* callgraph default */
> > + struct callchain_param param = {
> > + .record_mode = callchain_param.record_mode,
> > + };
> > u32 dump_size = 0;
> > int max_stack = 0;
> > const char *callgraph_buf = NULL;
> >
> > - /* callgraph default */
> > - param.record_mode = callchain_param.record_mode;
> > -
> > list_for_each_entry(term, config_terms, list) {
> > switch (term->type) {
> > case PERF_EVSEL__CONFIG_TERM_PERIOD:
>
> This patch works for me.
> Here is the output on my s390x:
>
> [root@s8360047 perf]# ./perf trace --no-syscalls --max-stack 4
> -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
> PING ::1(::1) 56 data bytes
> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.070 ms
>
> --- ::1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.070/0.070/0.070/0.000 ms
> 0.000 probe_libc:inet_pton:(3ffa70c2060))
> __GI___inet_pton (/usr/lib64/libc-2.26.so)
> gaih_inet (inlined)
> __GI_getaddrinfo (inlined)
> main (/usr/bin/ping)
> [root@s8360047 perf]# ./perf trace --no-syscalls --call-graph dwarf
> -e probe_libc:inet_pton -- ping -6 -c 1 ::1
> PING ::1(::1) 56 data bytes
> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.086 ms
>
> --- ::1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.086/0.086/0.086/0.000 ms
> 0.000 probe_libc:inet_pton:(3ff93fc2060))
> __GI___inet_pton (/usr/lib64/libc-2.26.so)
> gaih_inet (inlined)
> __GI_getaddrinfo (inlined)
> main (/usr/bin/ping)
> __libc_start_main (/usr/lib64/libc-2.26.so)
> _start (/usr/bin/ping)
> [root@s8360047 perf]#
>
> [root@s8360047 perf]# ./perf trace --no-syscalls
> -e probe_libc:inet_pton/call-graph=dwarf,max-stack=3/ ping -6 -c 1 ::1
> PING ::1(::1) 56 data bytes
> 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.066 ms
>
> --- ::1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.066/0.066/0.066/0.000 ms
> 0.000 probe_libc:inet_pton:(3ffb82c2060))
> __GI___inet_pton (/usr/lib64/libc-2.26.so)
> gaih_inet (inlined)
> __GI_getaddrinfo (inlined)
> main (/usr/bin/ping)
> __libc_start_main (/usr/lib64/libc-2.26.so)
> _start (/usr/bin/ping)
> [root@s8360047 perf]#
>
> Not sure if this can work at all. Since dwarf stack unwinding is done in user space
> the attr.sample_max_stack set to 3 is useless in this case.

right, I'm working on this case now, this has to be honoured in
trace__resolve_callchain, that has to use evsel->attr.sample_max_stack
if it is set, otherwise fallback to trace->max_stack.

- Arnaldo