Hi,
To start with, this was seen on a tainted 2.6.18 kernel. I know it
doesn't make much sense to work on a report from such an old code base,
never mind a tainted kernel, so feel free to ignore this if it's not
interesting.
I left a system running "stress --cpu 4 --io 1 --vm 1 --hdd 1"
overnight on a 4-way Opteron system. While stress
(http://weather.ou.edu/~apw/projects/stress/) was running, the system was
also repeatedly checking the md5sums of several large files, so it was
under both CPU and IO (DMA) load.
After a few hours it hung, but caps lock, ping, and sysrq were still
responsive. sysrq showed this trace:
EIP at .text.lock.spinlock+0x2/0x97
Trace:
journal_dirty_data+0x78
ext3_journal_dirty_data+0x1d
walk_page_buffers+0x68
ext3_journal_dirty_data+0x0
ext3_ordered_commit_write+0x73
generic_file_buffered_write
__generic_file_aio_write
do_get_write_access
generic_file_aio_write
ext3_file_write
do_sync_write
The lock it hung on, according to gdb:
spin_lock(&journal->j_list_lock);
If a fix is known for this issue, I'd appreciate any pointers. In the
meantime, I'll keep trying to reproduce this with a serial console hooked up.
Thanks.
--
Daniel Drake
Brontes Technologies, A 3M Company