2006-09-08 07:36:50

by Janne Karhunen

Subject: debugging a deadlock

Hi,

Sometimes you have to do strange things, such as trying to debug an occasional
deadlock on a system that has been in use for a long, long time. So please, no
nasty comments about an outdated system with no soft lock-up detection and
such :/

Anyhoo, it appears to be an infinite semaphore wait. By modifying the semaphores
to dump the stack on a LONG wait, I managed to get a stack trace. It looks like this:

kernel: [printk+340/384] [printk+340/384] [show_trace+203/240]
[show_trace+203/240] [show_stack+113/120] [show_registers+223/324]
kernel: [__down+147/276] [__down_failed+8/12]
[stext_lock+13538/52069] [error_code+16/64] [system_call+66/76]

Umm, this error_code thing is beyond my current knowledge. What's this?
Some sort of assembly-glued exception handling? Any ideas how to figure
out which semaphore this is?


--
// Janne