2018-01-15 12:28:03

by Cheng Jian

[permalink] [raw]
Subject: [PATCH] perf/trace : Fix repetitious traces of perf on tracepoint When i use perf to trace the sched_wakeup_new tracepoint, there is a bug that output the same event repetitiously. It can be reproduced by :

perf record -e sched:sched_wakeup_new ./bug_fork

bug_fork is an demo that can generating wakeup_new event, parent
process does nothing but fork a child process, and then they both
quit.

There are 4 processors in this machine. before this patch,
perf script(perf-1058, parent-1059, child-1060) :

bug_fork 1059 [001] 62.913666: sched:sched_wakeup_new:
comm=bug_fork pid=1060 prio=120 target_cpu=002
bug_fork 1059 [001] 62.913680: sched:sched_wakeup_new:
comm=bug_fork pid=1060 prio=120 target_cpu=002
bug_fork 1059 [001] 62.913689: sched:sched_wakeup_new:
comm=bug_fork pid=1060 prio=120 target_cpu=002
bug_fork 1059 [001] 62.913698: sched:sched_wakeup_new:
comm=bug_fork pid=1060 prio=120 target_cpu=002
bug_fork 1059 [001] 62.913705: sched:sched_wakeup_new:
comm=bug_fork pid=1060 prio=120 target_cpu=002

but ftrace report this event only once :

bug_fork-1059 [002] d... 62.913666: sched_wakeup_new:
comm=bug_fork pid=1060 prio=120 target_cpu=002

perf script print wakeup_new event multiple times.

These events which trigger this issue all specify a target process.
commit e6dab5ffab59 ("perf/trace: Add ability to set a target task
for events") has designed a method to trace these events. For
example, the sched_wakeup and sched_wakeup_new tracepoint will be
caught when the current task wakeup a target task. If we trace both
of them and task(waken) != current(wakee), it will match this event
at the beginning for tracing current task, and then match again for
tracing the waken task. But these events are registered at all cpus
most of the time, so these events will be matched nr_cpu times
in this branch.

We just forcus on the task thread here, so check the cpu number of
this event and task.

after this patch, perf script(perf-1039, parent-1040, child-1041):

bug_fork 1040 [002] 36.535963: sched:sched_wakeup_new:
comm=bug_fork pid=1041 prio=120 target_cpu=003
bug_fork 1040 [002] 36.536079: sched:sched_wakeup_new:
comm=bug_fork pid=1041 prio=120 target_cpu=003

match it twice, an match for tracing current(parent) and an match
for task(child).

Signed-off-by: Cheng Jian <[email protected]>
---
kernel/events/core.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4df5b69..a199c03 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7924,6 +7924,8 @@ void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
continue;
if (event->attr.config != entry->type)
continue;
+ if (event->cpu != task_cpu(task) && event->cpu != -1)
+ continue;
if (perf_tp_event_match(event, &data, regs))
perf_swevent_event(event, count, &data, regs);
}
--
1.8.3.1


2018-01-15 12:32:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf/trace : Fix repetitious traces of perf on tracepoint When i use perf to trace the sched_wakeup_new tracepoint, there is a bug that output the same event repetitiously. It can be reproduced by :

On Mon, Jan 15, 2018 at 08:33:53PM +0800, Cheng Jian wrote:
> perf record -e sched:sched_wakeup_new ./bug_fork
>
> bug_fork is an demo that can generating wakeup_new event, parent
> process does nothing but fork a child process, and then they both
> quit.
>
> There are 4 processors in this machine. before this patch,
> perf script(perf-1058, parent-1059, child-1060) :
>
> bug_fork 1059 [001] 62.913666: sched:sched_wakeup_new:
> comm=bug_fork pid=1060 prio=120 target_cpu=002
> bug_fork 1059 [001] 62.913680: sched:sched_wakeup_new:
> comm=bug_fork pid=1060 prio=120 target_cpu=002
> bug_fork 1059 [001] 62.913689: sched:sched_wakeup_new:
> comm=bug_fork pid=1060 prio=120 target_cpu=002
> bug_fork 1059 [001] 62.913698: sched:sched_wakeup_new:
> comm=bug_fork pid=1060 prio=120 target_cpu=002
> bug_fork 1059 [001] 62.913705: sched:sched_wakeup_new:
> comm=bug_fork pid=1060 prio=120 target_cpu=002

Please don't wrap that, that's unreadable

>
> but ftrace report this event only once :
>
> bug_fork-1059 [002] d... 62.913666: sched_wakeup_new:
> comm=bug_fork pid=1060 prio=120 target_cpu=002
>
> perf script print wakeup_new event multiple times.
>
> These events which trigger this issue all specify a target process.
> commit e6dab5ffab59 ("perf/trace: Add ability to set a target task
> for events") has designed a method to trace these events. For
> example, the sched_wakeup and sched_wakeup_new tracepoint will be
> caught when the current task wakeup a target task. If we trace both
> of them and task(waken) != current(wakee), it will match this event
> at the beginning for tracing current task, and then match again for
> tracing the waken task. But these events are registered at all cpus
> most of the time, so these events will be matched nr_cpu times
> in this branch.
>
> We just forcus on the task thread here, so check the cpu number of
> this event and task.
>
> after this patch, perf script(perf-1039, parent-1040, child-1041):
>
> bug_fork 1040 [002] 36.535963: sched:sched_wakeup_new:
> comm=bug_fork pid=1041 prio=120 target_cpu=003
> bug_fork 1040 [002] 36.536079: sched:sched_wakeup_new:
> comm=bug_fork pid=1041 prio=120 target_cpu=003
>
> match it twice, an match for tracing current(parent) and an match
> for task(child).

So what is the bug? The parent gets one, the child gets one, that's
correct, no?

2018-01-16 01:07:11

by Cheng Jian

[permalink] [raw]
Subject: Re: [PATCH] perf/trace : Fix repetitious traces of perf on tracepoint When i use perf to trace the sched_wakeup_new tracepoint, there is a bug that output the same event repetitiously. It can be reproduced by :

On 2018/1/15 20:31, Peter Zijlstra wrote:


I'm sorry I gave an inappropriate example to make the phenomenon look
confusing.

These events are registered per_cpu and attach in the the
perf_event_ctxp of task too.
So the same event is placed in the process context CPU times.

perf record -e sched:sched_wakeup_new -vv bug_fork
......
sys_perf_event_open: pid 1063 cpu 0 group_fd -1 flags 0x8 = 4
sys_perf_event_open: pid 1063 cpu 1 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 1063 cpu 2 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid 1063 cpu 3 group_fd -1 flags 0x8 = 8
mmap size 528384B
perf event ring buffer mmapped per cpu




/*
* If we got specified a target task, also iterate its context and
* deliver this event there too.
*/
// Here we want to see whether this event can be matched by the
task not current.
// task is the child now.(tracing task)
if (task && task != current) {
......
// the event mmaped per cpu, But
// task->perf_event_ctxp[perf_sw_context] get all the events
attach in this task
ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]);

list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
if (event->attr.type != PERF_TYPE_TRACEPOINT)
continue;
if (event->attr.config != entry->type)
continue;
// event->cpu(3) != task_cpu(1) bug_fork 1063
[003] 128.001255: sched:sched_wakeup_new: comm=bug_fork pid=1064
prio=120 target_cpu=002
// event->cpu(2) != task_cpu(1) bug_fork 1063
[003] 128.001255: sched:sched_wakeup_new: comm=bug_fork pid=1064
prio=120 target_cpu=002
// event->cpu(1) == task_cpu(1) bug_fork 1063
[003] 128.001255: sched:sched_wakeup_new: comm=bug_fork pid=1064
prio=120 target_cpu=002
// event->cpu(0) != task_cpu(1) bug_fork 1063
[003] 128.001255: sched:sched_wakeup_new: comm=bug_fork pid=1064
prio=120 target_cpu=002
// but we only need one about event->cpu(1) ==
task_cpu(1), the others are repeated
if (perf_tp_event_match(event, &data, regs))
perf_swevent_event(event, count, &data, regs);
}
unlock:
rcu_read_unlock();
}
```



>> after this patch, perf script(perf-1039, parent-1040, child-1041):
>>
>> bug_fork 1040 [002] 36.535963: sched:sched_wakeup_new:
>> comm=bug_fork pid=1041 prio=120 target_cpu=003
>> bug_fork 1040 [002] 36.536079: sched:sched_wakeup_new:
>> comm=bug_fork pid=1041 prio=120 target_cpu=003
>>
>> match it twice, an match for tracing current(parent) and an match
>> for task(child).
> So what is the bug? The parent gets one, the child gets one, that's
> correct, no?
>
> .
So the bug is, we match the same event NR_CPU times when tracing the
waken(child process in my demo).


I will modify my commit, and make it clearer.


Thanks.

CHENG Jian