2007-06-21 15:07:15

by Daniel Drake

Subject: ext3/journalling deadlock on old kernel

Hi,

To start with, this was seen on a 2.6.18 kernel which was tainted. I
know it doesn't make much sense to work on a report from such an old
code base, never mind a tainted kernel, so feel free to ignore this if
it's not interesting.

I left a system running "stress --cpu 4 --io 1 --vm 1 --hdd 1"
overnight on a 4-way Opteron system. While stress
(http://weather.ou.edu/~apw/projects/stress/) was running, the system
was also repeatedly checking the md5sums of several large files. The
system was under both CPU and IO (DMA) load.
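
For anyone trying to reproduce this, the checksum side of the load can
be approximated with a loop like the one below. This is a minimal
Python sketch, not the script that was actually used: the function
names and the `expected` mapping of path to known-good digest are
hypothetical, and the file list is left to the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files are not loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_files(expected, passes=1):
    """Re-hash every file `passes` times.

    `expected` maps a file path to its known-good hex digest (hypothetical
    input). Returns a list of (path, actual_digest) for every mismatch.
    """
    mismatches = []
    for _ in range(passes):
        for path, want in expected.items():
            got = md5_of_file(path)
            if got != want:
                mismatches.append((path, got))
    return mismatches
```

Calling check_files() repeatedly while stress runs in another shell
gives a comparable mix of read IO and CPU load, though it is only an
approximation of the original setup.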

After a few hours, it hung, but caps lock, ping, and sysrq were still
responsive. sysrq shows me this trace:

EIP at .text.lock.spinlock+0x2/0x97
Trace:
journal_dirty_data+0x78
ext3_journal_dirty_data+0x1d
walk_page_buffers+0x68
ext3_journal_dirty_data+0x0
ext3_ordered_commit_write+0x73
generic_file_buffered_write
__generic_file_aio_write
do_get_write_access
generic_file_aio_write
ext3_file_write
do_sync_write

The lock it hung on, according to gdb:

spin_lock(&journal->j_list_lock);

If a fix is known for this issue, I'd appreciate any pointers. I'll
continue attempting to reproduce this with a serial console hooked up.

Thanks.
--
Daniel Drake
Brontes Technologies, A 3M Company