Hi,
I am seeing a kernel bug with kernel 2.6.10. The hardware is a
uniprocessor (UP) Pentium 4 CPU at 2.40GHz; output from lspci is attached.
This all happens on customers' machines, so I am unable to easily
switch to a newer kernel version, sorry.
Every few hours a machine deadlocks with the following messages:
<3>scheduling while atomic: swapper/0x00000100/0
<4> [<c010333e>] dump_stack+0x1e/0x20
<4> [<c02de278>] schedule+0x458/0x510
<4> [<c01006bc>] cpu_idle+0x1c/0x50
<4> [<c0100406>] rest_init+0x26/0x30
<4> [<c03ee99a>] start_kernel+0x1ba/0x200
<4> [<c010019f>] L6+0x0/0x2
It seemed quite clear what happened here (the 0x00000100 preempt count
suggests that bottom halves were left disabled), so I started to search
for the missing unlock in some error path, which turned out to be quite
a daunting task. I therefore modified the kernel to record which code
called local_bh_disable() before this all happened. The patch
is attached. This is the output:
<3>scheduling while atomic: swapper/0x00000100/0
<4>bh_users: c011b499
<4>bh_users: 00000000
<4> [<c010333e>] dump_stack+0x1e/0x20
<4> [<c02de278>] schedule+0x458/0x510
<4> [<c01006bc>] cpu_idle+0x1c/0x50
<4> [<c0100406>] rest_init+0x26/0x30
<4> [<c03ee99a>] start_kernel+0x1ba/0x200
<4> [<c010019f>] L6+0x0/0x2
c011b499 is an address inside __do_softirq, and this is the part I do
not understand at the moment.
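The attached patch is not reproduced here. As a rough userspace
illustration of the general idea only (remember the caller of every
disable so that a leaked one can be attributed later), something like
the following would do. All names except bh_users are hypothetical,
the nesting limit is arbitrary, and __builtin_return_address() is a
GCC/Clang extension; this is not the actual kernel change.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_BH_DEPTH 8		/* arbitrary limit for this sketch */

void *bh_users[MAX_BH_DEPTH];	/* caller of each outstanding disable */
unsigned int bh_depth;		/* current nesting depth */

/* noinline so that the return address really is the caller's. */
__attribute__((noinline)) void track_bh_disable(void)
{
	if (bh_depth < MAX_BH_DEPTH)
		bh_users[bh_depth] = __builtin_return_address(0);
	bh_depth++;
}

__attribute__((noinline)) void track_bh_enable(void)
{
	if (bh_depth == 0)
		return;		/* unbalanced enable, nothing to clear */
	bh_depth--;
	if (bh_depth < MAX_BH_DEPTH)
		bh_users[bh_depth] = NULL;
}

/* Print every slot; a non-NULL entry is a disable that was never undone. */
void dump_bh_users(void)
{
	unsigned int i;

	for (i = 0; i < MAX_BH_DEPTH; i++)
		printf("bh_users: %p\n", bh_users[i]);
}
```

Calling dump_bh_users() from the place that reports the problem then
shows which call sites still hold a disable.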
Note that this kernel carries some very intrusive patches such as
LKCD, KDB and Xen 2, so I will disable all but KDB and see what
happens.
/holger
--