Before this patch we can see very large time in the events before
bpf-output event. For example:
# ./perf trace -vv -T --ev sched:sched_switch \
--ev bpf-output/no-inherit,name=evt/ \
--ev ./test_bpf_trace.c/map:channel.event=evt/ \
usleep 10
...
18446744073709.551 (18446564645918.480 ms): usleep/4157 nanosleep(rqtp: 0x7ffd3f0dc4e0 ) ...
18446744073709.551 ( ): evt:Raise a BPF event!..)
179427791.076 ( ): perf_bpf_probe:func_begin:(ffffffff810eb9a0))
179427791.081 ( ): sched:sched_switch:usleep:4157 [120] S ==> swapper/2:0 [120])
...
We can also see the differences between bpf-output events and
breakpoint events:
For bpf output event:
sample_type IP|TID|RAW|IDENTIFIER
For tracepoint events:
sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
This patch fix this differences by adding more sample type for
bpf-output events.
After this patch:
# ./perf trace -vv -T --ev sched:sched_switch \
--ev bpf-output/no-inherit,name=evt/ \
--ev ./test_bpf_trace.c/map:channel.event=evt/ \
usleep 10
...
179877370.878 ( 0.003 ms): usleep/5336 nanosleep(rqtp: 0x7ffff866c450 ) ...
179877370.878 ( ): evt:Raise a BPF event!..)
179877370.878 ( ): perf_bpf_probe:func_begin:(ffffffff810eb9a0))
179877370.882 ( ): sched:sched_switch:usleep:5336 [120] S ==> swapper/4:0 [120])
179877370.945 ( ): evt:Raise a BPF event!..)
...
# ./perf trace -vv -T --ev sched:sched_switch \
--ev bpf-output/no-inherit,name=evt/ \
--ev ./test_bpf_trace.c/map:channel.event=evt/ \
usleep 10 2>&1 | grep sample_type
sample_type IP|TID|TIME|ID|CPU|PERIOD|RAW
sample_type IP|TID|TIME|ID|CPU|PERIOD|RAW
sample_type IP|TID|TIME|ID|CPU|PERIOD|RAW
sample_type IP|TID|TIME|ID|CPU|PERIOD|RAW
sample_type IP|TID|TIME|ID|CPU|PERIOD|RAW
sample_type IP|TID|TIME|ID|CPU|PERIOD|RAW
The 'IDENTIFIER' is not required because all events have the
same sample_type.
Signed-off-by: Wang Nan <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Milian Wolff <[email protected]>
Cc: Namhyung Kim <[email protected]>
---
tools/perf/util/evsel.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 738ce22..3fd7c2c 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -226,7 +226,8 @@ struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx)
perf_evsel__init(evsel, attr, idx);
if (perf_evsel__is_bpf_output(evsel)) {
- evsel->attr.sample_type |= PERF_SAMPLE_RAW;
+ evsel->attr.sample_type |= (PERF_SAMPLE_RAW | PERF_SAMPLE_TIME |
+ PERF_SAMPLE_CPU | PERF_SAMPLE_PERIOD),
evsel->attr.sample_period = 1;
}
--
1.8.3.4
Em Fri, Apr 01, 2016 at 01:26:42PM +0000, Wang Nan escreveu:
> Before this patch we can see very large time in the events before
> bpf-output event. For example:
>
> # perf trace --ev bpf-output/no-inherit,name=evt/ \
> --ev ./test_bpf_trace.c/map:channel.event=evt/ \
> usleep 10
Thanks, applied and tested, now one idea that occurred to me to shorten
the above command line: automagically create a
"__perf_trace_bpf_stdout__" bpf-output event when a .c bpf event is
specified and no bpf-output is present, i.e. the following command line
would produce the same result as the one above:
# trace --ev test_bpf_trace.c usleep 10
Well, it would have to build test_bpf_trace.c and see if it references
the equivalent of a "stdout", i.e. it expects a bpf-output event to be
present to send output to.
I.e. in this example we have a:
struct bpf_map_def SEC("maps") channel = {
That later on you use to do "puts(msg)" like operations, i.e. to a
"stdout" of sorts:
func(void *ctx, int type)
{
char output_str[] = "Raise a BPF event!";
char err_str[] = "BAD %d\n";
int err;
err = perf_event_output(ctx, &channel, get_smp_processor_id(),
&output_str, sizeof(output_str));
if (err)
trace_printk(err_str, sizeof(err_str), err);
return 1;
}
Perhaps, to make all more familiar we could even define equivalents to
stdio.h functions like puts, printf, fputs, etc, that would send to this
bpf-output based "stdout" "channel", then the above would end up being:
func(void *ctx, int type)
{
char err_str[] = "BAD %d\n";
int err;
err = puts("Raise a BPF event!");
if (err)
trace_printk(err_str, sizeof(err_str), err);
return 1;
}
This trace_printk() in turn could become error() (glibc's error.h header), i.e.
the error mechanism would use the equivalent to userland's "syslog", i.e.
trace_printk :-)
In general trying to make BPF C scriptlets fed via perf to be as compact as
possible, hiding all these details while allowing them to be used, if desired.
- Arnaldo
Em Fri, Apr 01, 2016 at 12:16:10PM -0300, Arnaldo Carvalho de Melo escreveu:
> Perhaps, to make all more familiar we could even define equivalents to
> stdio.h functions like puts, printf, fputs, etc, that would send to this
> bpf-output based "stdout" "channel", then the above would end up being:
>
> func(void *ctx, int type)
> {
> char err_str[] = "BAD %d\n";
> int err;
>
> err = puts("Raise a BPF event!");
> if (err)
> trace_printk(err_str, sizeof(err_str), err);
> return 1;
> }
>
> This trace_printk() in turn could become error() (glibc's error.h header), i.e.
> the error mechanism would use the equivalent to userland's "syslog", i.e.
> trace_printk :-)
>
> In general trying to make BPF C scriptlets fed via perf to be as compact as
> possible, hiding all these details while allowing them to be used, if desired.
One extra possibility would be that it would look so much like a user
space C program that we could test it without loading it to the kernel,
with just building it with a different header and feeding it data as it
would get inside the kernel :-)
- Arnaldo
Commit-ID: d37ba880598654fda10b312331377cdca3edd574
Gitweb: http://git.kernel.org/tip/d37ba880598654fda10b312331377cdca3edd574
Author: Wang Nan <[email protected]>
AuthorDate: Fri, 1 Apr 2016 13:26:42 +0000
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Fri, 1 Apr 2016 18:46:25 -0300
perf bpf: Add sample types for 'bpf-output' event
Before this patch we can see very large time in the events before the
'bpf-output' event. For example:
# perf trace -vv -T --ev sched:sched_switch \
--ev bpf-output/no-inherit,name=evt/ \
--ev ./test_bpf_trace.c/map:channel.event=evt/ \
usleep 10
...
18446744073709.551 (18446564645918.480 ms): usleep/4157 nanosleep(rqtp: 0x7ffd3f0dc4e0) ...
18446744073709.551 ( ): evt:Raise a BPF event!..)
179427791.076 ( ): perf_bpf_probe:func_begin:(ffffffff810eb9a0))
179427791.081 ( ): sched:sched_switch:usleep:4157 [120] S ==> swapper/2:0 [120])
...
We can also see the differences between bpf-output events and
breakpoint events:
For bpf output event:
sample_type IP|TID|RAW|IDENTIFIER
For tracepoint events:
sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
This patch fix this differences by adding more sample type for
bpf-output events.
After this patch:
# perf trace -vv -T --ev sched:sched_switch \
--ev bpf-output/no-inherit,name=evt/ \
--ev ./test_bpf_trace.c/map:channel.event=evt/ \
usleep 10
...
179877370.878 ( 0.003 ms): usleep/5336 nanosleep(rqtp: 0x7ffff866c450) ...
179877370.878 ( ): evt:Raise a BPF event!..)
179877370.878 ( ): perf_bpf_probe:func_begin:(ffffffff810eb9a0))
179877370.882 ( ): sched:sched_switch:usleep:5336 [120] S ==> swapper/4:0 [120])
179877370.945 ( ): evt:Raise a BPF event!..)
...
# ./perf trace -vv -T --ev sched:sched_switch \
--ev bpf-output/no-inherit,name=evt/ \
--ev ./test_bpf_trace.c/map:channel.event=evt/ \
usleep 10 2>&1 | grep sample_type
sample_type IP|TID|TIME|ID|CPU|PERIOD|RAW
sample_type IP|TID|TIME|ID|CPU|PERIOD|RAW
sample_type IP|TID|TIME|ID|CPU|PERIOD|RAW
sample_type IP|TID|TIME|ID|CPU|PERIOD|RAW
sample_type IP|TID|TIME|ID|CPU|PERIOD|RAW
sample_type IP|TID|TIME|ID|CPU|PERIOD|RAW
The 'IDENTIFIER' info is not required because all events have the same
sample_type.
Committer notes:
Further testing, on top of the changes making 'perf trace' avoid samples
from events without PERF_SAMPLE_TIME:
Before:
# trace --ev bpf-output/no-inherit,name=evt/ --ev /home/acme/bpf/test_bpf_trace.c/map:channel.event=evt/ usleep 10
<SNIP>
0.560 ( 0.001 ms): brk( ) = 0x55e5a1df8000
18446640227439.430 (18446640227438.859 ms): nanosleep(rqtp: 0x7ffc96643370) ...
18446640227439.430 ( ): evt:Raise a BPF event!..)
0.576 ( ): perf_bpf_probe:func_begin:(ffffffff81112460))
18446640227439.430 ( ): evt:Raise a BPF event!..)
0.645 ( ): perf_bpf_probe:func_end:(ffffffff81112460 <- ffffffff81003d92))
0.646 ( 0.076 ms): ... [continued]: nanosleep()) = 0
#
After:
# trace --ev bpf-output/no-inherit,name=evt/ --ev /home/acme/bpf/test_bpf_trace.c/map:channel.event=evt/ usleep 10
<SNIP>
0.292 ( 0.001 ms): brk( ) = 0x55c7cd6e1000
0.302 ( 0.004 ms): nanosleep(rqtp: 0x7ffedd8bc0f0) ...
0.302 ( ): evt:Raise a BPF event!..)
0.303 ( ): perf_bpf_probe:func_begin:(ffffffff81112460))
0.397 ( ): evt:Raise a BPF event!..)
0.397 ( ): perf_bpf_probe:func_end:(ffffffff81112460 <- ffffffff81003d92))
0.398 ( 0.100 ms): ... [continued]: nanosleep()) = 0
Signed-off-by: Wang Nan <[email protected]>
Reported-and-Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Milian Wolff <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/evsel.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 738ce22..3fd7c2c 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -226,7 +226,8 @@ struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx)
perf_evsel__init(evsel, attr, idx);
if (perf_evsel__is_bpf_output(evsel)) {
- evsel->attr.sample_type |= PERF_SAMPLE_RAW;
+ evsel->attr.sample_type |= (PERF_SAMPLE_RAW | PERF_SAMPLE_TIME |
+ PERF_SAMPLE_CPU | PERF_SAMPLE_PERIOD),
evsel->attr.sample_period = 1;
}