When we add a kprobe point and record events by perf, the execution path
of all threads on each cpu will enter this point, but perf may only
record events on a particular thread or cpu at this kprobe point, a
check on call->perf_events list filters out the threads which perf is
not recording.
Currently, bpf_prog will be entered at the beginning of
kprobe_perf_func() before the above check, which makes bpf_prog be
executed in every threads including determined not to be recorded
threads. A simple test can demonstrate this:
'bpf_prog_on_write.o' contains a bpf prog which outputs to trace buffer
when it is entered. Run a background thread 'another-dd' and 'dd'
simultaneously, but only record 'dd' thread by perf. The result shows
all threads trigger bpf_prog.
$ another-dd if=/dev/zero of=test1 bs=4k count=1000000
$ perf record -v --event bpf_prog_on_write.o -- dd if=/dev/zero of=test2 bs=4k count=3
$ cat /sys/kernel/debug/tracing/trace
another-dd-1007 [000] d... 120.225835: : generic_perform_write: tgid=1007, pid=1007
another-dd-1007 [000] d... 120.227123: : generic_perform_write: tgid=1007, pid=1007
[repeat many times...]
another-dd-1007 [000] d... 120.412395: : generic_perform_write: tgid=1007, pid=1007
another-dd-1007 [000] d... 120.412524: : generic_perform_write: tgid=1007, pid=1007
dd-1009 [000] d... 120.413080: : generic_perform_write: tgid=1009, pid=1009
dd-1009 [000] d... 120.414846: : generic_perform_write: tgid=1009, pid=1009
dd-1009 [000] d... 120.415013: : generic_perform_write: tgid=1009, pid=1009
another-dd-1007 [000] d... 120.416128: : generic_perform_write: tgid=1007, pid=1007
another-dd-1007 [000] d... 120.416295: : generic_perform_write: tgid=1007, pid=1007
This patch moves the check on perf_events list ahead and skip running
bpf_prog on threads perf not care.
After this patch:
$ another-dd if=/dev/zero of=test1 bs=4k count=1000000
$ perf record -v --event bpf_prog_on_write.o -- dd if=/dev/zero of=test2 bs=4k count=3
$ cat /sys/kernel/debug/tracing/trace
dd-994 [000] d... 46.386754: : generic_perform_write: tgid=994, pid=994
dd-994 [000] d... 46.389167: : generic_perform_write: tgid=994, pid=994
dd-994 [000] d... 46.389551: : generic_perform_write: tgid=994, pid=994
Signed-off-by: He Kuang <[email protected]>
---
kernel/trace/trace_kprobe.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index d0ce590..5600df8 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1141,13 +1141,13 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
int size, __size, dsize;
int rctx;
- if (prog && !trace_call_bpf(prog, regs))
- return;
-
head = this_cpu_ptr(call->perf_events);
if (hlist_empty(head))
return;
+ if (prog && !trace_call_bpf(prog, regs))
+ return;
+
dsize = __get_data_size(&tk->tp, regs);
__size = sizeof(*entry) + tk->tp.size + dsize;
size = ALIGN(__size + sizeof(u32), sizeof(u64));
@@ -1176,13 +1176,13 @@ kretprobe_perf_func(struct trace_kprobe *tk, struct kretprobe_instance *ri,
int size, __size, dsize;
int rctx;
- if (prog && !trace_call_bpf(prog, regs))
- return;
-
head = this_cpu_ptr(call->perf_events);
if (hlist_empty(head))
return;
+ if (prog && !trace_call_bpf(prog, regs))
+ return;
+
dsize = __get_data_size(&tk->tp, regs);
__size = sizeof(*entry) + tk->tp.size + dsize;
size = ALIGN(__size + sizeof(u32), sizeof(u64));
--
1.8.5.2
On 6/26/15 11:44 PM, He Kuang wrote:
> @@ -1141,13 +1141,13 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
> int size, __size, dsize;
> int rctx;
>
> - if (prog && !trace_call_bpf(prog, regs))
> - return;
> -
> head = this_cpu_ptr(call->perf_events);
> if (hlist_empty(head))
> return;
>
> + if (prog && !trace_call_bpf(prog, regs))
> + return;
I considered doing that before placing it where it is now.
There were some issues with this approach. Trying to remember.
I'm traveling at the moment. Will elaborate next week.
On 2015/06/27 15:44, He Kuang wrote:
> When we add a kprobe point and record events by perf, the execution path
> of all threads on each cpu will enter this point, but perf may only
> record events on a particular thread or cpu at this kprobe point, a
> check on call->perf_events list filters out the threads which perf is
> not recording.
>
> Currently, bpf_prog will be entered at the beginning of
> kprobe_perf_func() before the above check, which makes bpf_prog be
> executed in every threads including determined not to be recorded
> threads. A simple test can demonstrate this:
>
> 'bpf_prog_on_write.o' contains a bpf prog which outputs to trace buffer
> when it is entered. Run a background thread 'another-dd' and 'dd'
> simultaneously, but only record 'dd' thread by perf. The result shows
> all threads trigger bpf_prog.
>
> $ another-dd if=/dev/zero of=test1 bs=4k count=1000000
> $ perf record -v --event bpf_prog_on_write.o -- dd if=/dev/zero of=test2 bs=4k count=3
> $ cat /sys/kernel/debug/tracing/trace
> another-dd-1007 [000] d... 120.225835: : generic_perform_write: tgid=1007, pid=1007
> another-dd-1007 [000] d... 120.227123: : generic_perform_write: tgid=1007, pid=1007
> [repeat many times...]
> another-dd-1007 [000] d... 120.412395: : generic_perform_write: tgid=1007, pid=1007
> another-dd-1007 [000] d... 120.412524: : generic_perform_write: tgid=1007, pid=1007
> dd-1009 [000] d... 120.413080: : generic_perform_write: tgid=1009, pid=1009
> dd-1009 [000] d... 120.414846: : generic_perform_write: tgid=1009, pid=1009
> dd-1009 [000] d... 120.415013: : generic_perform_write: tgid=1009, pid=1009
> another-dd-1007 [000] d... 120.416128: : generic_perform_write: tgid=1007, pid=1007
> another-dd-1007 [000] d... 120.416295: : generic_perform_write: tgid=1007, pid=1007
>
> This patch moves the check on perf_events list ahead and skip running
> bpf_prog on threads perf not care.
>
> After this patch:
>
> $ another-dd if=/dev/zero of=test1 bs=4k count=1000000
> $ perf record -v --event bpf_prog_on_write.o -- dd if=/dev/zero of=test2 bs=4k count=3
> $ cat /sys/kernel/debug/tracing/trace
> dd-994 [000] d... 46.386754: : generic_perform_write: tgid=994, pid=994
> dd-994 [000] d... 46.389167: : generic_perform_write: tgid=994, pid=994
> dd-994 [000] d... 46.389551: : generic_perform_write: tgid=994, pid=994
This looks nice :)
Acked-by: Masami Hiramatsu <[email protected]>
Thank you!
>
> Signed-off-by: He Kuang <[email protected]>
> ---
> kernel/trace/trace_kprobe.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index d0ce590..5600df8 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -1141,13 +1141,13 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
> int size, __size, dsize;
> int rctx;
>
> - if (prog && !trace_call_bpf(prog, regs))
> - return;
> -
> head = this_cpu_ptr(call->perf_events);
> if (hlist_empty(head))
> return;
>
> + if (prog && !trace_call_bpf(prog, regs))
> + return;
> +
> dsize = __get_data_size(&tk->tp, regs);
> __size = sizeof(*entry) + tk->tp.size + dsize;
> size = ALIGN(__size + sizeof(u32), sizeof(u64));
> @@ -1176,13 +1176,13 @@ kretprobe_perf_func(struct trace_kprobe *tk, struct kretprobe_instance *ri,
> int size, __size, dsize;
> int rctx;
>
> - if (prog && !trace_call_bpf(prog, regs))
> - return;
> -
> head = this_cpu_ptr(call->perf_events);
> if (hlist_empty(head))
> return;
>
> + if (prog && !trace_call_bpf(prog, regs))
> + return;
> +
> dsize = __get_data_size(&tk->tp, regs);
> __size = sizeof(*entry) + tk->tp.size + dsize;
> size = ALIGN(__size + sizeof(u32), sizeof(u64));
>
--
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: [email protected]