2010-04-04 08:11:05

by Hitoshi Mitake

[permalink] [raw]
Subject: Floating point exception by perf record


Hi Arnaldo,

I suffered a mysterious floating point exception when executing perf
on current tip/master.

For example,

% sudo ./perf lock record ./perf bench sched messaging
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 1.375 [sec]
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 160.002 MB perf.data (~6990599 samples) ]
[1] 26348 floating point exception sudo ./perf lock record ./perf
bench sched messaging

dmesg says that

[10212.752490] perf[26348] trap divide error ip:7fade8564310
sp:7fff46d926e8 error:0 in libnewt.so.0.52.2[7fade8556000+15000]

Example is of perf lock, but this is not a special phenomenon of perf lock,
other users of perf record, like perf sched, caused the same problem.

I found that this might be caused by the line 283 of perf.c

if (fclose(stdout)) # <- closing stdout!
die("close failed on standard output: %s", strerror(errno));

I don't know well about libnewt, do you know something like cause?

The version of libnewt I'm using is 0.52.2 (default of Debian lenny)

Thanks,
Hitoshi


2010-04-05 18:31:49

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: Floating point exception by perf record

Em Sun, Apr 04, 2010 at 05:10:57PM +0900, Hitoshi Mitake escreveu:
> I suffered a mysterious floating point exception when executing perf
> on current tip/master.
>
> For example,
>
> % sudo ./perf lock record ./perf bench sched messaging
> # Running sched/messaging benchmark...
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 1.375 [sec]
> [ perf record: Woken up 0 times to write data ]
> [ perf record: Captured and wrote 160.002 MB perf.data (~6990599 samples) ]
> [1] 26348 floating point exception sudo ./perf lock record ./perf
> bench sched messaging
>
> dmesg says that
>
> [10212.752490] perf[26348] trap divide error ip:7fade8564310
> sp:7fff46d926e8 error:0 in libnewt.so.0.52.2[7fade8556000+15000]
>
> Example is of perf lock, but this is not a special phenomenon of perf lock,
> other users of perf record, like perf sched, caused the same problem.
>
> I found that this might be caused by the line 283 of perf.c
>
> if (fclose(stdout)) # <- closing stdout!
> die("close failed on standard output: %s", strerror(errno));
>
> I don't know well about libnewt, do you know something like cause?
>
> The version of libnewt I'm using is 0.52.2 (default of Debian lenny)

I just built a kernel where 'perf lock' can work, but I'm seeing a
different problem:

[root@doppio ~]# perf lock trace | head -3
Warning: Error: expected 'data' but read 'overwrite'
swapper-0 [000] 619.086424: lock_acquire: 0xffff8800029d4b28 &q->lock
swapper-0 [000] 619.086429: lock_acquire: 0xffff8800029d08f8 &ctx->lock
swapper-0 [000] 619.086439: lock_acquire: 0xffffffff81a38e58 clockevents_lock
[root@doppio ~]# perf lock report | head -3
Warning: Error: expected 'data' but read 'overwrite'
ID Name acquired contended total wait (ns) max wait (ns) min wait (ns)

0xffff8800029d4b28 &q->lock 0 0 0 0 0
[root@doppio ~]#

My system details:

[root@doppio ~]# uname -a
Linux doppio.ghostprotocols.net 2.6.34-rc3-tip+ #1 SMP Mon Apr 5 13:49:42 BRT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@doppio ~]# rpm -q newt
newt-0.52.10-4.fc11.x86_64
[root@doppio ~]#

No segfaults here :-\

- Arnaldo

2010-04-07 08:32:32

by Hitoshi Mitake

[permalink] [raw]
Subject: Re: Floating point exception by perf record

On 04/06/10 03:31, Arnaldo Carvalho de Melo wrote:
> Em Sun, Apr 04, 2010 at 05:10:57PM +0900, Hitoshi Mitake escreveu:
>> I suffered a mysterious floating point exception when executing perf
>> on current tip/master.
>>
>> For example,
>>
>> % sudo ./perf lock record ./perf bench sched messaging
>> # Running sched/messaging benchmark...
>> # 20 sender and receiver processes per group
>> # 10 groups == 400 processes run
>>
>> Total time: 1.375 [sec]
>> [ perf record: Woken up 0 times to write data ]
>> [ perf record: Captured and wrote 160.002 MB perf.data (~6990599
samples) ]
>> [1] 26348 floating point exception sudo ./perf lock record ./perf
>> bench sched messaging
>>
>> dmesg says that
>>
>> [10212.752490] perf[26348] trap divide error ip:7fade8564310
>> sp:7fff46d926e8 error:0 in libnewt.so.0.52.2[7fade8556000+15000]
>>
>> Example is of perf lock, but this is not a special phenomenon of
perf lock,
>> other users of perf record, like perf sched, caused the same problem.
>>
>> I found that this might be caused by the line 283 of perf.c
>>
>> if (fclose(stdout)) #<- closing stdout!
>> die("close failed on standard output: %s", strerror(errno));
>>
>> I don't know well about libnewt, do you know something like cause?
>>
>> The version of libnewt I'm using is 0.52.2 (default of Debian lenny)
>
> I just built a kernel where 'perf lock' can work, but I'm seeing a
> different problem:
>
> [root@doppio ~]# perf lock trace | head -3
> Warning: Error: expected 'data' but read 'overwrite'
> swapper-0 [000] 619.086424: lock_acquire:
0xffff8800029d4b28&q->lock
> swapper-0 [000] 619.086429: lock_acquire:
0xffff8800029d08f8&ctx->lock
> swapper-0 [000] 619.086439: lock_acquire:
0xffffffff81a38e58 clockevents_lock
> [root@doppio ~]# perf lock report | head -3
> Warning: Error: expected 'data' but read 'overwrite'
> ID Name acquired contended total
wait (ns) max wait (ns) min wait (ns)
>
> 0xffff8800029d4b28&q->lock 0 0 0
0 0
> [root@doppio ~]#
>
> My system details:
>
> [root@doppio ~]# uname -a
> Linux doppio.ghostprotocols.net 2.6.34-rc3-tip+ #1 SMP Mon Apr 5
13:49:42 BRT 2010 x86_64 x86_64 x86_64 GNU/Linux
> [root@doppio ~]# rpm -q newt
> newt-0.52.10-4.fc11.x86_64
> [root@doppio ~]#
>
> No segfaults here :-\
>
> - Arnaldo
>

I tested perf lock trace and got a similar error (but different).
% sudo ./perf lock trace
Fatal: reading input file (size expected=3 received=0)
It seems that perf.data is broken :( I'll search the cause.

Hitoshi

2010-04-07 10:39:15

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: Floating point exception by perf record

Em Wed, Apr 07, 2010 at 05:32:24PM +0900, Hitoshi Mitake escreveu:
> On 04/06/10 03:31, Arnaldo Carvalho de Melo wrote:
> > Em Sun, Apr 04, 2010 at 05:10:57PM +0900, Hitoshi Mitake escreveu:
> > [root@doppio ~]# perf lock report | head -3
> > Warning: Error: expected 'data' but read 'overwrite'
> > ID Name acquired contended
> total wait (ns) max wait (ns) min wait (ns)
> >
> > 0xffff8800029d4b28&q->lock 0 0 0
> 0 0
> > [root@doppio ~]#
> >
> > My system details:
> >
> > [root@doppio ~]# uname -a
> > Linux doppio.ghostprotocols.net 2.6.34-rc3-tip+ #1 SMP Mon Apr 5
> 13:49:42 BRT 2010 x86_64 x86_64 x86_64 GNU/Linux
> > [root@doppio ~]# rpm -q newt
> > newt-0.52.10-4.fc11.x86_64
> > [root@doppio ~]#
> >
> > No segfaults here :-\
> >
> > - Arnaldo
> >
>
> I tested perf lock trace and got a similar error (but different).
> % sudo ./perf lock trace
> Fatal: reading input file (size expected=3 received=0)
> It seems that perf.data is broken :( I'll search the cause.

Take a look at:

http://git.kernel.org/?p=linux/kernel/git/acme/linux-2.6.git;a=summary

"perf" branch, there is at least one fix there, the "overwrite" one.

- Arnaldo

2010-04-08 06:46:11

by Hitoshi Mitake

[permalink] [raw]
Subject: Re: Floating point exception by perf record

On 04/07/10 19:38, Arnaldo Carvalho de Melo wrote:
> Em Wed, Apr 07, 2010 at 05:32:24PM +0900, Hitoshi Mitake escreveu:
>> On 04/06/10 03:31, Arnaldo Carvalho de Melo wrote:
>>> Em Sun, Apr 04, 2010 at 05:10:57PM +0900, Hitoshi Mitake escreveu:
>>> [root@doppio ~]# perf lock report | head -3
>>> Warning: Error: expected 'data' but read 'overwrite'
>>> ID Name acquired contended
>> total wait (ns) max wait (ns) min wait (ns)
>>>
>>> 0xffff8800029d4b28&q->lock 0 0 0
>> 0 0
>>> [root@doppio ~]#
>>>
>>> My system details:
>>>
>>> [root@doppio ~]# uname -a
>>> Linux doppio.ghostprotocols.net 2.6.34-rc3-tip+ #1 SMP Mon Apr 5
>> 13:49:42 BRT 2010 x86_64 x86_64 x86_64 GNU/Linux
>>> [root@doppio ~]# rpm -q newt
>>> newt-0.52.10-4.fc11.x86_64
>>> [root@doppio ~]#
>>>
>>> No segfaults here :-\
>>>
>>> - Arnaldo
>>>
>>
>> I tested perf lock trace and got a similar error (but different).
>> % sudo ./perf lock trace
>> Fatal: reading input file (size expected=3 received=0)
>> It seems that perf.data is broken :( I'll search the cause.
>
> Take a look at:
>
> http://git.kernel.org/?p=linux/kernel/git/acme/linux-2.6.git;a=summary
>
> "perf" branch, there is at least one fix there, the "overwrite" one.
>

Thanks for your notify!
I applied changes on your branch, but my problem was not solved...

I found that the member "offset" of struct perf_file_header is broken.
I made this easy change,

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 6c9aa16..dbea087 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -571,6 +571,7 @@ int perf_header__process_sections(struct perf_header
*self, int fd,
if (perf_header__getbuffer64(self, fd, feat_sec, sec_size))
goto out_free;

+ printf("feat_sec->offset:%llu\n", feat_sec->offset);
err = 0;
while (idx < nr_sections && feat < HEADER_LAST_FEATURE) {
if (perf_header__has_feat(self, feat)) {

And the result is this:

% sudo ./perf lock report
Fatal: reading input file (size expected=3 received=0)
feat_sec->offset:18014402804449281 # <- Wow! :(

It seems that header of perf.data is broken...
But it may be a cause of this problem, I'll try to fix it.

Thanks,
Hitoshi

2010-04-08 07:29:39

by Hitoshi Mitake

[permalink] [raw]
Subject: Re: Floating point exception by perf record

On 04/08/10 15:46, Hitoshi Mitake wrote:
> On 04/07/10 19:38, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Apr 07, 2010 at 05:32:24PM +0900, Hitoshi Mitake escreveu:
> >> On 04/06/10 03:31, Arnaldo Carvalho de Melo wrote:
> >>> Em Sun, Apr 04, 2010 at 05:10:57PM +0900, Hitoshi Mitake escreveu:
> >>> [root@doppio ~]# perf lock report | head -3
> >>> Warning: Error: expected 'data' but read 'overwrite'
> >>> ID Name acquired contended
> >> total wait (ns) max wait (ns) min wait (ns)
> >>>
> >>> 0xffff8800029d4b28&q->lock 0 0 0
> >> 0 0
> >>> [root@doppio ~]#
> >>>
> >>> My system details:
> >>>
> >>> [root@doppio ~]# uname -a
> >>> Linux doppio.ghostprotocols.net 2.6.34-rc3-tip+ #1 SMP Mon Apr 5
> >> 13:49:42 BRT 2010 x86_64 x86_64 x86_64 GNU/Linux
> >>> [root@doppio ~]# rpm -q newt
> >>> newt-0.52.10-4.fc11.x86_64
> >>> [root@doppio ~]#
> >>>
> >>> No segfaults here :-\
> >>>
> >>> - Arnaldo
> >>>
> >>
> >> I tested perf lock trace and got a similar error (but different).
> >> % sudo ./perf lock trace
> >> Fatal: reading input file (size expected=3 received=0)
> >> It seems that perf.data is broken :( I'll search the cause.
> >
> > Take a look at:
> >
> > http://git.kernel.org/?p=linux/kernel/git/acme/linux-2.6.git;a=summary
> >
> > "perf" branch, there is at least one fix there, the "overwrite" one.
> >
>
> Thanks for your notify!
> I applied changes on your branch, but my problem was not solved...
>
> I found that the member "offset" of struct perf_file_header is broken.
> I made this easy change,
>
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 6c9aa16..dbea087 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -571,6 +571,7 @@ int perf_header__process_sections(struct perf_header
> *self, int fd,
> if (perf_header__getbuffer64(self, fd, feat_sec, sec_size))
> goto out_free;
>
> + printf("feat_sec->offset:%llu\n", feat_sec->offset);
> err = 0;
> while (idx < nr_sections && feat < HEADER_LAST_FEATURE) {
> if (perf_header__has_feat(self, feat)) {
>
> And the result is this:
>
> % sudo ./perf lock report
> Fatal: reading input file (size expected=3 received=0)
> feat_sec->offset:18014402804449281 # <- Wow! :(
>
> It seems that header of perf.data is broken...
> But it may be a cause of this problem, I'll try to fix it.
>

I tried to build perf without newt,
then perf lock record could work without floating point exception
I mentioned earlier.

And perf lock report could parsed the perf.data previous perf lock
record produced.

The cause of breaking header of perf.data is that
handler registerd by atexit() was not called.

Hmm...