2001-12-11 19:50:51

by Brian Horton

Subject: how to debug a deadlock'ed kernel?

Anyone got any good tips on how to debug a SMP system that is locked up
in a deadlock situation in the kernel? I'm working on a kernel module,
and after some number of hours of stress testing, the box locks up. None
of the sysrq options show anything on the display, though the reBoot
option does reboot the system. RedHat 6.2 and its 2.2.14 kernel. Doesn't
hang for me on 2.4, so I need to debug it here...

Any hints?

thx.bri.


2001-12-11 19:58:51

by Bruce Harada

Subject: Re: how to debug a deadlock'ed kernel?

On Tue, 11 Dec 2001 13:57:52 -0600
Brian Horton <[email protected]> wrote:

> Anyone got any good tips on how to debug a SMP system that is locked up
> in a deadlock situation in the kernel? I'm working on a kernel module,
> and after some number of hours of stress testing, the box locks up. None
> of the sysrq options show anything on the display, though the reBoot
> option does reboot the system. RedHat 6.2 and its 2.2.14 kernel. Doesn't
> hang for me on 2.4, so I need to debug it here...
>
> Any hints?

Try using a serial console: enable it in your kernel config and hook up
another PC to the serial port. If the kernel is oopsing, you should see the
oops over the serial line.
Also, I believe you can use kdb over serial as well (although I've never tried).
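
For readers following along, the serial console setup described above might look roughly like the fragment below on a 2.2-era system booted with LILO. This is a sketch under assumptions: the port (ttyS0), the speed (9600n8), and the use of LILO are illustrative choices not stated in the thread, so adjust them to the actual hardware and boot loader.

```
# Kernel configuration (2.2-era option names)
CONFIG_SERIAL=y
CONFIG_SERIAL_CONSOLE=y

# /etc/lilo.conf, inside the image= section for the test kernel
# (port and baud rate are assumptions; match them on the receiving PC)
append="console=ttyS0,9600n8 console=tty0"
```

The second machine is connected with a null-modem cable and runs a terminal program set to the same speed, so that kernel messages printed just before the hang are captured there.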


Bruce

2001-12-11 21:15:58

by Oliver Xymoron

Subject: Re: how to debug a deadlock'ed kernel?

On Tue, 11 Dec 2001, Brian Horton wrote:

> Anyone got any good tips on how to debug a SMP system that is locked up
> in a deadlock situation in the kernel? I'm working on a kernel module,
> and after some number of hours of stress testing, the box locks up. None
> of the sysrq options show anything on the display, though the reBoot
> option does reboot the system. RedHat 6.2 and its 2.2.14 kernel. Doesn't
> hang for me on 2.4, so I need to debug it here...

You might try Keith Owens' kdb. When you lock up, hit <pause>, which brings
up a kdb prompt. From there you can do backtraces, memory examination, and
disassembly on either processor.

It's often quite helpful to modify your test to narrow down what is making
it crash and/or make it happen faster. Reads vs writes, short/long
packets, etc.

--
"Love the dolphins," she advised him. "Write by W.A.S.T.E.."

2001-12-11 21:42:58

by George Anzinger

Subject: Re: how to debug a deadlock'ed kernel?

Brian Horton wrote:
>
> Anyone got any good tips on how to debug a SMP system that is locked up
> in a deadlock situation in the kernel? I'm working on a kernel module,
> and after some number of hours of stress testing, the box locks up. None
> of the sysrq options show anything on the display, though the reBoot
> option does reboot the system. RedHat 6.2 and its 2.2.14 kernel. Doesn't
> hang for me on 2.4, so I need to debug it here...
>
> Any hints?

First read about the NMI boot option in Documentation/nmi_watchdog.txt.
If you have this turned on and are not oopsing, then the timer (at
least) is interrupting. The next step I would take would be to use
either kdb (no experience) or kgdb. I have my own version of this if
you are interested. It does, however, require an RS232 (serial)
connection to a host machine.

I don't know about kdb, but kgdb (my version) uses the NMI to trap the
other cpus and also traps NMIs on the way to oopsing.
--
George [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/

2001-12-11 23:34:15

by Brian Horton

Subject: Re: how to debug a deadlock'ed kernel?

Thanks, I'll try the nmi_watchdog option out. It appears to not be in
the 2.2.14 kernel, but is in a 2.2.19 kernel that I have from RedHat.

Is there a version of kgdb that works with 2.2.x kernels? I only see 2.4
kernels on the web page (http://kgdb.sourceforge.net)

thx.bri.


2001-12-12 00:25:47

by Oliver Xymoron

Subject: Re: how to debug a deadlock'ed kernel?

On Tue, 11 Dec 2001, Brian Horton wrote:

> Thanks, I'll try the nmi_watchdog option out. It appears to not be in
> the 2.2.14 kernel, but is in a 2.2.19 kernel that I have from RedHat.

It's there by default in SMP; you just have to enable it with
nmi_watchdog=1 at boot. I didn't mention it because it
probably won't help your problem: as you can use the magic sysrq keys to
reboot, you are not deadlocked with interrupts off, therefore the timer
interrupt will keep the NMI watchdog from ever firing. The NMI watchdog is
mostly of use when you can't even get the capslock light to toggle.

--
"Love the dolphins," she advised him. "Write by W.A.S.T.E.."

2001-12-14 21:35:37

by Brian Horton

Subject: Re: how to debug a deadlock'ed kernel?

Yup, the nmi_watchdog option didn't work, but I was able to get the
information I needed with kdb!

thx! .bri.
