2009-12-04 10:21:47

by Török Edwin

[permalink] [raw]
Subject: Unable to use tools/perf sched and timechart correctly

Hi,

I installed 2.6.32 and tried its tools/perf.
I noticed two problems so far, please tell me if I am doing anything
wrong here:

1. 'perf sched replay' segfaults on any recorded workload
I tested like this:
$ sudo ./perf sched record
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.295 MB perf.data (~12869 samples) ]
$ sudo perf sched replay
run measurement overhead: 1258 nsecs
sleep measurement overhead: 55232 nsecs
the run test took 999778 nsecs
the sleep test took 1055791 nsecs
nr_run_events: 182
nr_sleep_events: 209
nr_wakeup_events: 103
target-less wakeups: 12
[...]
task 25 ( icedove-bin: 8757), nr_events: 3
task 26 ( events/0: 15), nr_events: 3
------------------------------------------------------------
Segmentation fault

If I let perf sched record run for longer I get same results. I can send
you perf.data/trace.info if you can't reproduce the problem.

2. 'perf timechart' shows CPU almost idle, when in fact it is loaded 100
- 400%
$ echo "int main() {while(1){} return 0;}" >looper.c
$ gcc looper.c
$ sudo ./perf timechart record ./a.out
^C[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.608 MB perf.data (~26577 samples) ]
$ sudo ./perf timechart
Written 2.4 seconds of trace to output.svg.

Open timechart, the CPUs are most of the time in sleep state, and the
traced program (./a.out)
shows up with an entirely gray block (sleep), or doesn't show up at all!

I get the same results when trying to trace a real program, that loads
the CPU to almost 400% (according to top and gkrellm),
but in timechart output its shown only as a gray block, and the CPU is
most of the time idle (and when it isn't, the load is
attributed to other processes, not clamd):
http://www.fileden.com/files/2009/9/1/2561232/output.svg.gz

What am I doing wrong? Is the process given on commandline excluded from
the profile?
(I opened the svg with firefox, since inkscape refused to load it)

Attached is my .config

P.S. I don't know if its related but I noticed lots of segfaults of
kernel threads in dmesg:
[ 1939.527417] :<unknown>[13435]: segfault at 7f632d0826a8 ip
000000381de46e4b sp 00007f632d0826b0 error
6
[ 1939.527422] :events/0[13436]: segfault at 7f632d07e6a8 ip
000000381de46e4b sp 00007f632d07e6b0 error 6 in
libc-2.10.2.so[381de00000+14a000] in
libc-2.10.2.so[381de00000+14a000]

[
1939.527431]

[ 1947.751775] :<unknown>[13560]: segfault at 7f740165d6a8 ip
000000381de46e4b sp 00007f740165d6b0 error
6
[ 1947.751781] :events/0[13561]: segfault at 7f74016596a8 ip
000000381de46e4b sp 00007f74016596b0 error 6 in
libc-2.10.2.so[381de00000+14a000] in
libc-2.10.2.so[381de00000+14a000]

[
1947.751792]

[ 1948.709289] :<unknown>[13682]: segfault at 7f6e544a16a8 ip
000000381de46e4b sp 00007f6e544a16b0 error 6 in
libc-2.10.2.so[381de00000+14a000]

[ 1948.709301] :events/0[13683]: segfault at 7f6e5449d6a8 ip
000000381de46e4b sp 00007f6e5449d6b0 error
6
[ 1948.709308] in
libc-2.10.2.so[381de00000+14a000]

[ 2000.919905] :events/0[13860]: segfault at 7fccf062b6a8 ip
000000381de46e4b sp 00007fccf062b6b0 error
6
[ 2000.919910] :<unknown>[13859]: segfault at 7fccf062f6a8 ip
000000381de46e4b sp 00007fccf062f6b0 error 6 in
libc-2.10.2.so[381de00000+14a000] in
libc-2.10.2.so[381de00000+14a000]

[
2000.919919]

[ 2027.356686] :<unknown>[14060]: segfault at 7f40f88046a8 ip
000000381de46e4b sp 00007f40f88046b0 error
6
[ 2027.356693] :events/0[14061]: segfault at 7f40f88006a8 ip
000000381de46e4b sp 00007f40f88006b0 error 6 in
libc-2.10.2.so[381de00000+14a000] in
libc-2.10.2.so[381de00000+14a000]

[
2027.356704]

[ 2067.677990] :mplayer[14225]: segfault at 7fa7fed8c6a8 ip
000000381de46e4b sp 00007fa7fed8c6b0 error
6
[ 2067.677996] :sync_supers[14237]: segfault at 7fa7fed5c6a8 ip
000000381de46e4b sp 00007fa7fed5c6b0 error 6
[ 2067.678002] :klipper[14224]: segfault at 7fa7fed906a8 ip
000000381de46e4b sp 00007fa7fed906b0 error 6 in
libc-2.10.2.so[381de00000+14a000] in
libc-2.10.2.so[381de00000+14a000]

[
2067.678009]

[ 2067.678014] in
libc-2.10.2.so[381de00000+14a000]

[ 2067.678102] :kondemand/0[14231]: segfault at 7fa7fed746a8 ip
000000381de46e4b sp 00007fa7fed746b0 error 6
[ 2067.678119] :gkrellm[14228]: segfault at 7fa7fed806a8 ip
000000381de46e4b sp 00007fa7fed806b0 error
6
[ 2067.678123] :flush-253:2[14235]: segfault at 7fa7fed646a8 ip
000000381de46e4b sp 00007fa7fed646b0 error 6
[ 2067.678127] :Xorg[14229]: segfault at 7fa7fed7c6a8 ip
000000381de46e4b sp 00007fa7fed7c6b0 error
6
[ 2067.678131] :events/0[14236]: segfault at 7fa7fed606a8 ip
000000381de46e4b sp 00007fa7fed606b0 error
6
[ 2067.678136] :md4_raid10[14238]: segfault at 7fa7fed586a8 ip
000000381de46e4b sp 00007fa7fed586b0 error 6
[ 2067.678147] :events/1[14221]: segfault at 7fa7fed9c6a8 ip
000000381de46e4b sp 00007fa7fed9c6b0 error 6 in
libc-2.10.2.so[381de00000+14a000]
[ 2067.678210] in libc-2.10.2.so[381de00000+14a000]
[ 2067.678220] in libc-2.10.2.so[381de00000+14a000]
[ 2067.678225] in libc-2.10.2.so[381de00000+14a000]
[ 2067.678235] in libc-2.10.2.so[381de00000+14a000] in
libc-2.10.2.so[381de00000+14a000]
[ 2067.678239]
[ 2067.678253] in libc-2.10.2.so[381de00000+14a000]
[ 2079.333421] __ratelimit: 10 callbacks suppressed
[ 2079.333426] :kondemand/1[14266]: segfault at 7fd0006d66a8 ip
000000381de46e4b sp 00007fd0006d66b0 error 6
[ 2079.333432] :<unknown>[14265]: segfault at 7fd0006da6a8 ip
000000381de46e4b sp 00007fd0006da6b0 error 6 in
libc-2.10.2.so[381de00000+14a000] in libc-2.10.2.so[381de00000+14a000]
[ 2079.333445]
[ 2107.078219] :<unknown>[14304]: segfault at 7f173a7ac6a8 ip
000000381de46e4b sp 00007f173a7ac6b0 error 6
[ 2107.078224] :events/2[14308]: segfault at 7f173a79c6a8 ip
000000381de46e4b sp 00007f173a79c6b0 error 6
[ 2107.078231] :flush-253:2[14305]: segfault at 7f173a7a86a8 ip
000000381de46e4b sp 00007f173a7a86b0 error 6 in
libc-2.10.2.so[381de00000+14a000]
[ 2107.078237] in libc-2.10.2.so[381de00000+14a000] in
libc-2.10.2.so[381de00000+14a000]
[ 2107.078242]
[ 2462.674266] :ksoftirqd/3[15860]: segfault at 7fd9cd0a66a8 ip
000000381de46e4b sp 00007fd9cd0a66b0 error 6
[ 2462.674273] :swapper[15859]: segfault at 7fd9cd0aa6a8 ip
000000381de46e4b sp 00007fd9cd0aa6b0 error 6 in
libc-2.10.2.so[381de00000+14a000] in libc-2.10.2.so[381de00000+14a000]
[ 2462.674287] :lighttpd[15861]: segfault at 7fd9cd0a26a8 ip
000000381de46e4b sp 00007fd9cd0a26b0 error 6
[ 2462.674292] in libc-2.10.2.so[381de00000+14a000]
[ 2462.674295]
[ 2462.674432] :konsole[15858]: segfault at 7fd9cd0ae6a8 ip
000000381de46e4b sp 00007fd9cd0ae6b0 error 6 in
libc-2.10.2.so[381de00000+14a000]

However the events/[0-3] processes are still running, some of the others
not anymore (like swapper).

Best regards,
--Edwin


Attachments:
config-2.6.32 (58.18 kB)

2009-12-06 10:43:27

by Xiao Guangrong

[permalink] [raw]
Subject: Re: Unable to use tools/perf sched and timechart correctly



Török Edwin wrote:
> Hi,
>
> I installed 2.6.32 and tried its tools/perf.
> I noticed two problems so far, please tell me if I am doing anything
> wrong here:
>
> 1. 'perf sched replay' segfaults on any recorded workload
> I tested like this:
> $ sudo ./perf sched record
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.295 MB perf.data (~12869 samples) ]
> $ sudo perf sched replay
> run measurement overhead: 1258 nsecs
> sleep measurement overhead: 55232 nsecs
> the run test took 999778 nsecs
> the sleep test took 1055791 nsecs
> nr_run_events: 182
> nr_sleep_events: 209
> nr_wakeup_events: 103
> target-less wakeups: 12
> [...]
> task 25 ( icedove-bin: 8757), nr_events: 3
> task 26 ( events/0: 15), nr_events: 3
> ------------------------------------------------------------
> Segmentation fault

You should enable "CONFIG_SCHED_DEBUG", because we use
"/proc/<pid>/sched" file to get the task's cpu usage.

Ingo, I think we do better show the reason about this error,
just like:

---
tools/perf/builtin-sched.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 26b782f..f823631 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -427,7 +427,11 @@ static u64 get_cpu_usage_nsec_self(void)

sprintf(filename, "/proc/%d/sched", getpid());
file = fopen(filename, "r");
- BUG_ON(!file);
+ if (!file)
+ {
+ printf("You should compile your kernel with CONFIG_SCHED_DEBUG option\n");
+ BUG_ON(1);
+ }

while ((chars = getline(&line, &len, file)) != -1) {
ret = sscanf(line, "se.sum_exec_runtime : %ld.%06ld\n",
--
1.6.1.2

Thanks,
Xiao

2009-12-06 10:55:08

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Unable to use tools/perf sched and timechart correctly

On Sun, 2009-12-06 at 18:41 +0800, Xiao Guangrong wrote:

> You should enable "CONFIG_SCHED_DEBUG", because we use
> "/proc/<pid>/sched" file to get the task's cpu usage.
>
> Ingo, I think we do better show the reason about this error,
> just like:
>
> ---
> tools/perf/builtin-sched.c | 6 +++++-
> 1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index 26b782f..f823631 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -427,7 +427,11 @@ static u64 get_cpu_usage_nsec_self(void)
>
> sprintf(filename, "/proc/%d/sched", getpid());
> file = fopen(filename, "r");
> - BUG_ON(!file);
> + if (!file)
> + {
> + printf("You should compile your kernel with CONFIG_SCHED_DEBUG option\n");
> + BUG_ON(1);
> + }
>
> while ((chars = getline(&line, &len, file)) != -1) {
> ret = sscanf(line, "se.sum_exec_runtime : %ld.%06ld\n",

Hrmm, we shouldn't be using a proc file at all, the best solution is to
remove this dependency.


2009-12-06 12:52:16

by Török Edwin

[permalink] [raw]
Subject: Re: Unable to use tools/perf sched and timechart correctly

On 2009-12-06 12:41, Xiao Guangrong wrote:
> Török Edwin wrote:
>
>> Hi,
>>
>> I installed 2.6.32 and tried its tools/perf.
>> I noticed two problems so far, please tell me if I am doing anything
>> wrong here:
>>
>> 1. 'perf sched replay' segfaults on any recorded workload
>> I tested like this:
>> $ sudo ./perf sched record
>> ^C[ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.295 MB perf.data (~12869 samples) ]
>> $ sudo perf sched replay
>> run measurement overhead: 1258 nsecs
>> sleep measurement overhead: 55232 nsecs
>> the run test took 999778 nsecs
>> the sleep test took 1055791 nsecs
>> nr_run_events: 182
>> nr_sleep_events: 209
>> nr_wakeup_events: 103
>> target-less wakeups: 12
>> [...]
>> task 25 ( icedove-bin: 8757), nr_events: 3
>> task 26 ( events/0: 15), nr_events: 3
>> ------------------------------------------------------------
>> Segmentation fault
>>
>
> You should enable "CONFIG_SCHED_DEBUG", because we use
> "/proc/<pid>/sched" file to get the task's cpu usage.
>

Thanks, that got rid of the segfault, and perf sched replay runs without
crashing now.

Am I missing a config option for 'perf timechart' to work?
I tried running 4 infloop executables (on a 4-core box), the last one as
argument to perf timechart record.
Only one of those processes has shown up, as using CPU for 26.5ms, and
then sleeping (when in fact it was running the entire time).
The CPU graphs are mostly white, with a thicker blue line where a.out
shown up, and some scarce thin blue lines.


Best regards,
--Edwin

2009-12-06 17:16:47

by Ingo Molnar

[permalink] [raw]
Subject: Re: Unable to use tools/perf sched and timechart correctly


* Xiao Guangrong <[email protected]> wrote:

> Ingo, I think we do better show the reason about this error,
> just like:

yeah, agreed - mind sending it with a changelog, with a Reported-by and
in a standalone mail? Thanks,

Ingo