2005-12-17 16:09:22

by Jan Engelhardt

Subject: Diagnosing a hard lockup

Hi list,


some time after I load drivers (any, rt2500 or via ndiswrap) for a
rt2500-based wlan card, the box locks up hard. Sysrq does not work, so I
suppose it is during irq-disabled context. How could I find out where this
happens?


Jan Engelhardt
--


2005-12-17 17:59:18

by Robert Hancock

Subject: Re: Diagnosing a hard lockup

Jan Engelhardt wrote:
> Hi list,
>
>
> some time after I load drivers (any, rt2500 or via ndiswrap) for a
> rt2500-based wlan card, the box locks up hard. Sysrq does not work, so I
> suppose it is during irq-disabled context. How could I find out where this
> happens?
>
>
> Jan Engelhardt

Try nmi_watchdog=1 on the kernel command line. That may get you a stack
trace for the lockup.
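
Before waiting for the next lockup, it can help to confirm that the option
actually reached the running kernel. The following is a minimal Python sketch,
assuming a Linux system that exposes the boot parameters in /proc/cmdline; the
helper names nmi_watchdog_setting and read_cmdline are made up for
illustration and are not part of any kernel tool.

```python
def nmi_watchdog_setting(cmdline):
    """Return the value given to nmi_watchdog= on a kernel command line.

    Returns None when the option is absent. Assumption: if the option is
    repeated, the last occurrence is the one reported here.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("nmi_watchdog="):
            value = token.split("=", 1)[1]
    return value


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    try:
        setting = nmi_watchdog_setting(read_cmdline())
    except OSError as err:
        print("could not read the kernel command line:", err)
    else:
        if setting is None:
            print("nmi_watchdog is not set on the command line")
        else:
            print("nmi_watchdog=" + setting)
```

If this reports that the option is not set, the boot loader entry was probably
not updated, or a different entry was booted.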

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2005-12-17 18:39:48

by Lee Revell

Subject: Re: Diagnosing a hard lockup

On Sat, 2005-12-17 at 17:09 +0100, Jan Engelhardt wrote:
> Hi list,
>
>
> some time after I load drivers (any, rt2500 or via ndiswrap) for a
> rt2500-based wlan card, the box locks up hard. Sysrq does not work, so I
> suppose it is during irq-disabled context. How could I find out where this
> happens?


First, stick to rt2500 as you won't get help with binary only drivers
here.

Try to reproduce the problem from the console, you're more likely to get
a usable Oops.

Check the driver code & make sure it can't get stuck looping in the
interrupt handler due to an unhandled IRQ. Add printks.

Finally report it to the rt2500 maintainer.
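
The printk approach needs changes to the driver source. As a complementary
userspace check (not the printk method itself), the Python sketch below
samples /proc/interrupts twice and lists the lines whose counts grew the most.
A line that grows by many thousands per second may point at an interrupt that
is never acknowledged. It only helps in the window between loading the driver
and the lockup, and it assumes the usual layout of a CPU header row followed
by one row per interrupt. The names irq_counts and busiest are invented for
this example.

```python
import time


def irq_counts(text):
    """Map each interrupt label to its count summed over all CPUs."""
    lines = text.splitlines()
    ncpu = len(lines[0].split()) if lines else 0  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        # The first ncpu fields after the label are the per-CPU counters.
        counts = [int(f) for f in rest.split()[:ncpu] if f.isdigit()]
        totals[label.strip()] = sum(counts)
    return totals


def busiest(before, after, top=3):
    """Return the interrupt lines with the largest increase, biggest first."""
    deltas = {irq: after[irq] - before.get(irq, 0) for irq in after}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]


def snapshot(path="/proc/interrupts"):
    with open(path) as f:
        return irq_counts(f.read())


if __name__ == "__main__":
    first = snapshot()
    time.sleep(1)  # sampling interval in seconds, chosen arbitrarily
    for irq, delta in busiest(first, snapshot()):
        print(irq, delta)
```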

Lee

2005-12-18 15:42:57

by Jan Engelhardt

Subject: Re: Diagnosing a hard lockup

Hi list,

>> some time after I load drivers (any, rt2500 or via ndiswrap) for a
>> rt2500-based wlan card, the box locks up hard. Sysrq does not work, so I
>> suppose it is during irq-disabled context. How could I find out where this
>> happens?
>
>First, stick to rt2500 as you won't get help with binary only drivers
>here.
>Check the driver code & make sure it can't get stuck looping in the
>interrupt handler due to an unhandled IRQ. Add printks.

It happens with both drivers, which is why I think the problem is not in
the rt2500 driver(s) but somewhere else in the kernel. I do not know
where, though, because the kernel is a lot bigger than the rt code base.

>Try to reproduce the problem from the console, you're more likely to get
>a usable Oops.
>
I did, it just locks. No reaction to Sysrq+T/+P, which is the "hard"
in "hard lockup".



Jan Engelhardt
--

2005-12-19 15:10:37

by Roger Heflin

Subject: RE: Diagnosing a hard lockup



> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Lee Revell
> Sent: Saturday, December 17, 2005 12:41 PM
> To: Jan Engelhardt
> Cc: Linux Kernel Mailing List
> Subject: Re: Diagnosing a hard lockup
>
> On Sat, 2005-12-17 at 17:09 +0100, Jan Engelhardt wrote:
> > Hi list,
> >
> >
> > some time after I load drivers (any, rt2500 or via ndiswrap) for a
> > rt2500-based wlan card, the box locks up hard. Sysrq does not work, so
> > I suppose it is during irq-disabled context. How could I find out
> > where this happens?
>
>
> First, stick to rt2500 as you won't get help with binary only
> drivers here.
>
> Try to reproduce the problem from the console, you're more
> likely to get a usable Oops.
>
> Check the driver code & make sure it can't get stuck looping
> in the interrupt handler due to an unhandled IRQ. Add printks.
>
> Finally report it to the rt2500 maintainer.

Jan,

I got the rt2500usb driver to blow up nicely if I used the
default ieee* routines from the kernel and not the ones that
came with the rt2500 drivers, you might want to verify which
ieee* that you are using. Using the ones that came with the
rt2500 seem to work, or at least not crash the kernel out.

Roger

2005-12-19 15:53:06

by Jan Engelhardt

Subject: RE: Diagnosing a hard lockup


>I got the rt2500usb driver to blow up nicely if I used the
>default ieee* routines from the kernel and not the ones that
>came with the rt2500 drivers, you might want to verify which
>ieee* that you are using. Using the ones that came with the
>rt2500 seem to work, or at least not crash the kernel out.

The rt2500-1.1.0-b3 (not the same as rt2500pci!) package does not include
its own ieee tree yet, so that can't be the issue. Anyway, I tried the card
in a different box, and it worked there. Strangely enough, it always seems
to be the motherboard that makes it fail. The one where it does not work is
a VIA something motherboard with an AMD K6-2/500 CPU.


Jan Engelhardt
--
| Alphagate Systems, http://alphagate.hopto.org/
| jengelh's site, http://jengelh.hopto.org/

2005-12-31 00:11:12

by Jan Engelhardt

Subject: Re: Diagnosing a hard lockup

Hi,

> Try nmi_watchdog=1 on the kernel command line. That may get you a stack trace
> for the lockup.

That does not seem to work.
APIC is enabled, but the kernel reports "No local APIC present or hardware
disabled". /proc/interrupts only lists XT PICs, and the NMI counter in
interrupts is also 0.
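
The two observations above (only XT-PIC entries, NMI count stuck at 0) can be
checked mechanically. Here is a rough Python sketch, assuming the usual
/proc/interrupts layout of a CPU header row followed by one row per interrupt;
parse_interrupts is a hypothetical helper written for this example.

```python
def parse_interrupts(text):
    """Return (nmi_total, controllers) from /proc/interrupts-style text.

    nmi_total is the NMI count summed over all CPUs, or None if there is no
    NMI row. controllers is the set of interrupt controller names seen on
    the numbered IRQ rows, for example {"XT-PIC"}.
    """
    lines = text.splitlines()
    ncpu = len(lines[0].split()) if lines else 0  # header row: CPU0 CPU1 ...
    nmi_total = None
    controllers = set()
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        label = label.strip()
        fields = rest.split()
        if label == "NMI":
            nmi_total = sum(int(c) for c in fields[:ncpu] if c.isdigit())
        elif label.isdigit() and len(fields) > ncpu:
            # On numbered rows the controller name follows the counters.
            controllers.add(fields[ncpu])
    return nmi_total, controllers


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        nmi, ctrl = parse_interrupts(f.read())
    print("NMI count:", nmi)
    print("controllers:", ", ".join(sorted(ctrl)) or "none found")
```

An NMI count that stays at 0 across repeated runs suggests the watchdog is not
firing at all, which would be consistent with the "No local APIC" message.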


Jan Engelhardt
--

2006-01-02 18:58:00

by Jan Engelhardt

Subject: Re: Diagnosing a hard lockup

>Hi,
>
>> Try nmi_watchdog=1 on the kernel command line. That may get you a stack trace
>> for the lockup.
>
>That does not seem to work.
>APIC is enabled, but the kernel reports "No local APIC present or hardware
>disabled". /proc/interrupts only lists XT PICs, and the NMI counter in
>interrupts is also 0.

So, here's a potential answer to my own problem: the mainboard is crap.
0000:00:00.0 Host bridge: VIA Technologies, Inc. VT82C598 [Apollo MVP3] (rev 04)
0000:00:07.0 ISA bridge: VIA Technologies, Inc. VT82C596 ISA [Mobile South] (rev 23)



Jan Engelhardt
--