2018-01-16 12:34:27

by Cheng Jian

[permalink] [raw]
Subject: [PATCH v3] perf/trace : Fix repetitious traces of perf on tracepoint

When i use perf to trace the sched_wakeup_new tracepoint, there is
a bug that output the same event repetitiously.
It can be reproduced by :

#./test_fork
parent pid : 1059
child pid : 1060
#perf record -e sched:sched_wakeup_new -p 1060

test_fork is an demo that can generating wakeup_new event, parent
process does nothing but fork a child process, and then they both
quit.

There are 4 processors in this machine. before this patch,
perf script(perf-1058, parent-1059, child-1060) :

test_fork 1059 [001] 62.913689: sched:sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
test_fork 1059 [001] 62.913698: sched:sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
test_fork 1059 [001] 62.913705: sched:sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002

but ftrace report this event only once :

test_fork-1059 [002] d... 62.913680: sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002

perf script print the same wakeup_new event multiple times.

These events which trigger this issue all specify a target process.
commit e6dab5ffab59 ("perf/trace: Add ability to set a target task
for events") has designed a method to trace these events. For
example, the sched_wakeup and sched_wakeup_new tracepoint will be
caught when the current task wakeup a target task.

These events are registered as per cpu most of the time and attached
to the task too, we will get all of them from the perf_event_context
of this task, they will be matched success but are all the same event.
So check the cpu number of this event to avoid matching them multiple
times.

after this patch, perf script(parent-1040, child-1041):

test_fork 1040 [002] 36.536079: sched:sched_wakeup_new: comm=test_fork pid=1041 prio=120 target_cpu=003

It will match it only once for tracing task(child-1041).

Signed-off-by: Cheng Jian <[email protected]>
---
kernel/events/core.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4df5b69..a199c03 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7924,6 +7924,8 @@ void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
continue;
if (event->attr.config != entry->type)
continue;
+ if (event->cpu != task_cpu(task) && event->cpu != -1)
+ continue;
if (perf_tp_event_match(event, &data, regs))
perf_swevent_event(event, count, &data, regs);
}
--
1.8.3.1


2018-01-16 14:43:02

by Milian Wolff

[permalink] [raw]
Subject: Re: [PATCH v3] perf/trace : Fix repetitious traces of perf on tracepoint

On Tuesday, January 16, 2018 1:40:38 PM CET Cheng Jian wrote:
> When i use perf to trace the sched_wakeup_new tracepoint, there is
> a bug that output the same event repetitiously.
> It can be reproduced by :
>
> #./test_fork
> parent pid : 1059
> child pid : 1060
> #perf record -e sched:sched_wakeup_new -p 1060
>
> test_fork is an demo that can generating wakeup_new event, parent
> process does nothing but fork a child process, and then they both
> quit.
>
> There are 4 processors in this machine. before this patch,
> perf script(perf-1058, parent-1059, child-1060) :
>
> test_fork 1059 [001] 62.913689: sched:sched_wakeup_new:
> comm=test_fork pid=1060 prio=120 target_cpu=002 test_fork 1059 [001]
> 62.913698: sched:sched_wakeup_new: comm=test_fork pid=1060 prio=120
> target_cpu=002 test_fork 1059 [001] 62.913705: sched:sched_wakeup_new:
> comm=test_fork pid=1060 prio=120 target_cpu=002
>
> but ftrace report this event only once :
>
> test_fork-1059 [002] d... 62.913680: sched_wakeup_new: comm=test_fork
> pid=1060 prio=120 target_cpu=002
>
> perf script print the same wakeup_new event multiple times.
>
> These events which trigger this issue all specify a target process.
> commit e6dab5ffab59 ("perf/trace: Add ability to set a target task
> for events") has designed a method to trace these events. For
> example, the sched_wakeup and sched_wakeup_new tracepoint will be
> caught when the current task wakeup a target task.
>
> These events are registered as per cpu most of the time and attached
> to the task too, we will get all of them from the perf_event_context
> of this task, they will be matched success but are all the same event.
> So check the cpu number of this event to avoid matching them multiple
> times.
>
> after this patch, perf script(parent-1040, child-1041):
>
> test_fork 1040 [002] 36.536079: sched:sched_wakeup_new: comm=test_fork
> pid=1041 prio=120 target_cpu=003
>
> It will match it only once for tracing task(child-1041).

Oh, this sounds awesome. I don't have the setup available to compile a kernel
with this patch applied, but I think from the description it solves a long-
standing issue with perf's sleep-time profiling.

Can someone try this please:
https://perf.wiki.kernel.org/index.php/Tutorial#Profiling_sleep_times

Use 'sleep 1' as the debuggee. On my system, I get the period multiplied by
nproc like you describe:

```
$ perf-sleep-record sleep 1
..
$ perf report --stdio --show-total-period | grep "Event count"
..
# Event count (approx.): 8000845488
$ nproc
8
```

The sleep-record script is available at: https://github.com/milianw/shell-helpers/blob/master/perf-sleep-record

I believe your patch also fixes the sched_stat_* tracepoints to be only
emitted once per CPU. Can you verify this? I.e. is the period finally
correctly calculated and we get a value of roughly 1E9ns == 1s?

Thanks

--
Milian Wolff | [email protected] | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts


Attachments:
smime.p7s (3.74 kB)

2018-01-16 15:06:36

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v3] perf/trace : Fix repetitious traces of perf on tracepoint

On Tue, Jan 16, 2018 at 08:40:38PM +0800, Cheng Jian wrote:
> When i use perf to trace the sched_wakeup_new tracepoint, there is
> a bug that output the same event repetitiously.
> It can be reproduced by :
>
> #./test_fork
> parent pid : 1059
> child pid : 1060
> #perf record -e sched:sched_wakeup_new -p 1060
>
> test_fork is an demo that can generating wakeup_new event, parent
> process does nothing but fork a child process, and then they both
> quit.
>
> There are 4 processors in this machine. before this patch,
> perf script(perf-1058, parent-1059, child-1060) :
>
> test_fork 1059 [001] 62.913689: sched:sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
> test_fork 1059 [001] 62.913698: sched:sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
> test_fork 1059 [001] 62.913705: sched:sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
>
> but ftrace report this event only once :
>
> test_fork-1059 [002] d... 62.913680: sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
>
> perf script print the same wakeup_new event multiple times.
>
> These events which trigger this issue all specify a target process.
> commit e6dab5ffab59 ("perf/trace: Add ability to set a target task
> for events") has designed a method to trace these events. For
> example, the sched_wakeup and sched_wakeup_new tracepoint will be
> caught when the current task wakeup a target task.
>
> These events are registered as per cpu most of the time and attached
> to the task too, we will get all of them from the perf_event_context
> of this task, they will be matched success but are all the same event.
> So check the cpu number of this event to avoid matching them multiple
> times.
>
> after this patch, perf script(parent-1040, child-1041):
>
> test_fork 1040 [002] 36.536079: sched:sched_wakeup_new: comm=test_fork pid=1041 prio=120 target_cpu=003
>
> It will match it only once for tracing task(child-1041).
>
> Signed-off-by: Cheng Jian <[email protected]>

the duplicated events are gone.. solution looks ok to me

Reviewed-and-Tested-by: Jiri Olsa <[email protected]>

jirka

> ---
> kernel/events/core.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 4df5b69..a199c03 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7924,6 +7924,8 @@ void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
> continue;
> if (event->attr.config != entry->type)
> continue;
> + if (event->cpu != task_cpu(task) && event->cpu != -1)
> + continue;
> if (perf_tp_event_match(event, &data, regs))
> perf_swevent_event(event, count, &data, regs);
> }
> --
> 1.8.3.1
>

2018-01-17 13:33:47

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v3] perf/trace : Fix repetitious traces of perf on tracepoint

Em Tue, Jan 16, 2018 at 04:06:22PM +0100, Jiri Olsa escreveu:
> On Tue, Jan 16, 2018 at 08:40:38PM +0800, Cheng Jian wrote:
> > When i use perf to trace the sched_wakeup_new tracepoint, there is
> > a bug that output the same event repetitiously.
> > It can be reproduced by :
> >
> > #./test_fork
> > parent pid : 1059
> > child pid : 1060
> > #perf record -e sched:sched_wakeup_new -p 1060
> >
> > test_fork is an demo that can generating wakeup_new event, parent
> > process does nothing but fork a child process, and then they both
> > quit.
> >
> > There are 4 processors in this machine. before this patch,
> > perf script(perf-1058, parent-1059, child-1060) :
> >
> > test_fork 1059 [001] 62.913689: sched:sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
> > test_fork 1059 [001] 62.913698: sched:sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
> > test_fork 1059 [001] 62.913705: sched:sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
> >
> > but ftrace report this event only once :
> >
> > test_fork-1059 [002] d... 62.913680: sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
> >
> > perf script print the same wakeup_new event multiple times.
> >
> > These events which trigger this issue all specify a target process.
> > commit e6dab5ffab59 ("perf/trace: Add ability to set a target task
> > for events") has designed a method to trace these events. For
> > example, the sched_wakeup and sched_wakeup_new tracepoint will be
> > caught when the current task wakeup a target task.
> >
> > These events are registered as per cpu most of the time and attached
> > to the task too, we will get all of them from the perf_event_context
> > of this task, they will be matched success but are all the same event.
> > So check the cpu number of this event to avoid matching them multiple
> > times.
> >
> > after this patch, perf script(parent-1040, child-1041):
> >
> > test_fork 1040 [002] 36.536079: sched:sched_wakeup_new: comm=test_fork pid=1041 prio=120 target_cpu=003
> >
> > It will match it only once for tracing task(child-1041).
> >
> > Signed-off-by: Cheng Jian <[email protected]>
>
> the duplicated events are gone.. solution looks ok to me
>
> Reviewed-and-Tested-by: Jiri Olsa <[email protected]>

I noticed that in the past as well, will try to test this too.

- Arnaldo

2018-01-17 14:25:58

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v3] perf/trace : Fix repetitious traces of perf on tracepoint

Em Wed, Jan 17, 2018 at 10:33:42AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Jan 16, 2018 at 04:06:22PM +0100, Jiri Olsa escreveu:
> > On Tue, Jan 16, 2018 at 08:40:38PM +0800, Cheng Jian wrote:
> > > after this patch, perf script(parent-1040, child-1041):

> > > test_fork 1040 [002] 36.536079: sched:sched_wakeup_new: comm=test_fork pid=1041 prio=120 target_cpu=003

> > > It will match it only once for tracing task(child-1041).

> > > Signed-off-by: Cheng Jian <[email protected]>

> > the duplicated events are gone.. solution looks ok to me

> > Reviewed-and-Tested-by: Jiri Olsa <[email protected]>

> I noticed that in the past as well, will try to test this too.

Unpatched kernel:

[root@jouet ~]# perf trace --no-syscalls -e sched:sched_wakeup_new time sleep 1
0.000 sched:sched_wakeup_new:time:13571 [120] success=1 CPU:003)
18446744073709.547 sched:sched_wakeup_new:time:13571 [120] success=1 CPU:003)
18446744073709.551 sched:sched_wakeup_new:time:13571 [120] success=1 CPU:003)
18446744073709.551 sched:sched_wakeup_new:time:13571 [120] success=1 CPU:003)
18446744073709.551 sched:sched_wakeup_new:time:13571 [120] success=1 CPU:003)
0.00user 0.00system 0:01.00elapsed 0%CPU (0avgtext+0avgdata 1996maxresident)k
0inputs+0outputs (0major+71minor)pagefaults 0swaps
[root@jouet ~]#

Patched kernel:

[root@seventh ~]# perf trace --no-syscalls -e sched:sched_wakeup_new time sleep 1
0.000 sched:sched_wakeup_new:comm=time pid=339 prio=120 target_cpu=002)
0.001 sched:sched_wakeup_new:comm=time pid=339 prio=120 target_cpu=002)
0.00user 0.00system 0:01.00elapsed 0%CPU (0avgtext+0avgdata 2084maxresident)k
0inputs+0outputs (0major+75minor)pagefaults 0swaps
[root@seventh ~]#

2018-01-27 09:43:32

by Cheng Jian

[permalink] [raw]
Subject: Re: [PATCH v3] perf/trace : Fix repetitious traces of perf on tracepoint

Hi, Milian


On 2018/1/16 22:33, Milian Wolff wrote:
>> perf script print the same wakeup_new event multiple times.
>>
>> These events which trigger this issue all specify a target process.
>> commit e6dab5ffab59 ("perf/trace: Add ability to set a target task
>> for events") has designed a method to trace these events. For
>> example, the sched_wakeup and sched_wakeup_new tracepoint will be
>> caught when the current task wakeup a target task.
>>
>>
the sched_stat_* tracepoint also specify a target process, so it will be
reported
nrcpus times too.

for example sched_stat_sleep

swapper 0 [002] 188.752870: sched:sched_stat_sleep:
comm=bug_fork_loop pid=1051 delay=2133649486 [ns]
| |
| |
current task



> Oh, this sounds awesome. I don't have the setup available to compile a kernel
> with this patch applied, but I think from the description it solves a long-
> standing issue with perf's sleep-time profiling.
>
> Can someone try this please:
> https://perf.wiki.kernel.org/index.php/Tutorial#Profiling_sleep_times
>
when current != task.

#echo 1 > /proc/sys/kernel/sched_schedstats
#./perf-bin/perf record -e sched:sched_stat_sleep ./test_fork_loop

before this patch:
:1050 1050 [000] 186.597339: sched:sched_stat_sleep:
comm=perf pid=1051 delay=22955314 [ns]
:1050 1050 [000] 186.597397: sched:sched_stat_sleep:
comm=perf pid=1051 delay=22955314 [ns]
:1050 1050 [000] 186.597406: sched:sched_stat_sleep:
comm=perf pid=1051 delay=22955314 [ns]
:1050 1050 [000] 186.597415: sched:sched_stat_sleep:
comm=perf pid=1051 delay=22955314 [ns]
swapper 0 [002] 188.752870: sched:sched_stat_sleep:
comm=bug_fork_loop pid=1051 delay=2133649486 [ns]
swapper 0 [002] 188.752899: sched:sched_stat_sleep:
comm=bug_fork_loop pid=1051 delay=2133649486 [ns]
swapper 0 [002] 188.752952: sched:sched_stat_sleep:
comm=bug_fork_loop pid=1051 delay=2133649486 [ns]
swapper 0 [002] 188.752965: sched:sched_stat_sleep:
comm=bug_fork_loop pid=1051 delay=2133649486 [ns]

after this patch:
:1041 1041 [001] 62.447062: sched:sched_stat_sleep:
comm=perf pid=1042 delay=43035826 [ns]
swapper 0 [001] 62.704644: sched:sched_stat_sleep:
comm=bug_fork_loop pid=1042 delay=234613068 [ns]


>
> I believe your patch also fixes the sched_stat_* tracepoints to be only
> emitted once per CPU. Can you verify this? I.e. is the period finally
> correctly calculated and we get a value of roughly 1E9ns == 1s?
>
Yeah, this patch fix it.

Thanks.
CHENG Jian