2003-07-25 03:26:18

by Randy Hron

Subject: dbench has intermittent hang on 2.6.0-test1-ac2

dbench 64 hung during a run using 2.6.0-test1-ac2 on ext3. One
of the dbench processes never created the clients/clientsXX
directory.

The parent dbench-2.0 process continues to update the throughput
measurement and the MB/sec slowly drops.

I saw the same behavior on a dbench 32 run with 2.6.0-test1-ac1
on reiserfs.

ps -ef|grep dbenc[h]
root 12266 11460 0 21:24 pts/0 00:00:00 ./dbench 64
root 12320 12266 0 21:24 pts/0 00:00:00 ./dbench 64

It isn't highly reproducible. Of 28 different dbench runs
on 2.6.0-test1-ac[12], only 2 have done this.

Uniprocessor x86 running RedHat 7.3 + patches.

Sysrq T for the dbench processes shows:

dbench R C010F024 4089854812 12266 11460 12320 (NOTLB)
c909df60 00000082 bffffa58 c010f024 d5439360 d5439360 00000001 fffffe00
00000000 c0118dda c909c000 00000001 00000000 d5439360 c0113e10 00000000
00000000 c909dfc4 c010836b c909dfc4 00000000 d5439360 c0113e10 d54394b4
Call Trace:
[<c010f024>] restore_i387+0x54/0x80
[<c0118dda>] sys_wait4+0x1ea/0x220
[<c0113e10>] default_wake_function+0x0/0x20
[<c010836b>] sys_sigreturn+0x8b/0xc0
[<c0113e10>] default_wake_function+0x0/0x20
[<c0108e27>] syscall_call+0x7/0xb

dbench S C46F3FC4 4028283312 12320 12266 (NOTLB)
c46f3fb8 00000086 00000000 c46f3fc4 d4546060 0000000b 00000774 00000040
c46f2000 c0120f14 c0108e27 0000000b 00000000 40013000 00000774 00000040
bffffb28 0000001d 0000007b 0000007b 0000001d 400c6837 00000073 00000246
Call Trace:
[<c0120f14>] sys_pause+0x14/0x20
[<c0108e27>] syscall_call+0x7/0xb


strace -p 12320 # child
pause(


kill 12320
kill -INT 12320

The state changes from S to T.

ps axu|grep dbenc[h]
root 12266 0.0 0.1 1364 424 pts/0 S 21:24 0:00 ./dbench 64
root 12320 0.0 0.0 1360 356 pts/0 T 21:24 0:00 ./dbench 64

cat /proc/12320/wchan
finish_stop

kill -CONT 12320 # child and parent exit
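
The ps and wchan checks above can also be scripted. This is a minimal
Python sketch, assuming a Linux /proc layout with per-PID stat and wchan
files. The names proc_info and find_procs are made up for illustration
and are not part of dbench or any kernel tool.

```python
import os


def proc_info(pid, proc_root="/proc"):
    """Return (name, state, wchan) for one PID, read from procfs."""
    base = os.path.join(proc_root, str(pid))
    with open(os.path.join(base, "stat")) as f:
        stat = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so split around the parentheses rather than on whitespace.
    name = stat[stat.index("(") + 1:stat.rindex(")")]
    state = stat[stat.rindex(")") + 1:].split()[0]
    try:
        with open(os.path.join(base, "wchan")) as f:
            wchan = f.read().strip()
    except OSError:
        wchan = "?"  # wchan missing or unreadable (assumption: treat as unknown)
    return name, state, wchan


def find_procs(name, proc_root="/proc"):
    """Map PID -> (name, state, wchan) for every process called `name`."""
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self"
        try:
            info = proc_info(entry, proc_root)
        except (OSError, ValueError):
            continue  # process exited while we were looking
        if info[0] == name:
            found[int(entry)] = info
    return found
```

Calling find_procs("dbench") while a run is stuck should list each
remaining dbench PID with its state letter and wait channel, which is
the same information as the ps and cat /proc/PID/wchan output above.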

All of <sysrq t> output at:
http://home.earthlink.net/~rwhron/kernel/sysrq-t.txt

config at
http://home.earthlink.net/~rwhron/kernel/config/config-2.6.0-test1-ac2

--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html


2003-07-25 03:37:07

by Randy Hron

Subject: Re: dbench has intermittent hang on 2.6.0-test1-ac2

> dbench 64 hung during a run using 2.6.0-test1-ac2 on ext3.

> I saw the same behavior on a dbench 32 run with 2.6.0-test1-ac1
> on reiserfs.

dbench did not hang in 50 runs on 2.6.0-test1 or 50 runs
on 2.6.0-test1-mm2 on the same machine with various filesystems.

--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html

2003-07-28 23:11:50

by Randy Hron

Subject: Re: dbench has intermittent hang on 2.6.0-test1-ac2

dbench 32 hung on 2.6.0-test2. /proc/PID/wchan shows
the dbench child in sys_pause, and /proc/PPID/wchan shows
the parent dbench in sys_wait4.

kill -CONT on the two dbench PIDs changes the child's
wchan to __pdflush, but the processes don't appear to
continue or exit. After waiting a couple of minutes,
I ran kill on both PIDs and dbench exited.

This was on an ext2 filesystem. The previous hangs were
on ext3 and reiserfs.

sysrq t after "kill -CONT" is at:
http://home.earthlink.net/~rwhron/kernel/sysrq.txt

--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html

2003-07-30 11:45:03

by Randy Hron

Subject: Re: dbench has intermittent hang on 2.6.0-test1-ac2

Summary:
dbench has been intermittently failing to complete on a
uniprocessor machine. I run dbench 10 times. 1 of the 10 runs
has 1 dbench child that never gets started. That child is in sys_pause.
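
A repeated-run check like this can be automated with a timeout. The
sketch below is only an illustration: the names run_once and count_hangs
and the 600 second timeout are assumptions, not taken from the runs
described here. It starts each run in its own session so that a hung
child can be killed together with its parent.

```python
import os
import signal
import subprocess


def run_once(cmd, timeout):
    """Run cmd once; return True if it finished within `timeout` seconds."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # child leads its own process group
    )
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # Kill the whole group, so forked children go away as well.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return False


def count_hangs(cmd, runs=10, timeout=600):
    """Run cmd `runs` times and count how many runs had to be killed."""
    hung = 0
    for _ in range(runs):
        if not run_once(cmd, timeout):
            hung += 1
    return hung
```

For example, count_hangs(["./dbench", "64"]) would report how many of
ten runs exceeded the timeout. The timeout has to be longer than a
normal run on the machine under test, and any /proc or sysrq-t data
should be collected before the kill.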

2.6.0-test1 and 2.6.0-test1-mm2 did not hang.

2.6.0-test1-ac1, 2.6.0-test1-ac2, 2.6.0-test2, and
2.6.0-test2-mm1 have hung. So the cause looks like a patch
that Alan may have picked up first.

The hang has occurred on ext2, ext3, reiserfs, and xfs,
so filesystem type seems unrelated.

pkill -9 dbench will let the processes continue.

dbench version 2.0.

<sysrq-t> from 2.6.0-test2-mm1 before pkill is at:
http://home.earthlink.net/~rwhron/kernel/minicom.cap

--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html