2005-11-27 02:44:50

by Larry Finger

Subject: What are the general causes of frozen system?

I am trying to help the bcm43xx project develop a driver for the
Broadcom 43xx wireless chips, using my Linksys WPC54G card.
Unfortunately, since the group got far enough to turn on RX DMA, my
system has frozen whenever I load the driver. TX DMA was OK. It seems to
correlate with the receipt of a beacon from my AP, but that cannot be
proven. When the freeze happens, I cannot do anything more and have to
power the system off.

What should I consider as a cause of the freeze? I have reviewed the
code and do not find any obvious out-of-bounds memory references. I have
tried various 'printk' statements, but none of them in the bottom-half
interrupt routine make it to the logs. Are there any tricks that I
should try?

Thanks,

Larry


2005-11-27 02:49:08

by Tim Hockin

Subject: Re: What are the general causes of frozen system?

> What should I consider as a cause of the freeze? I have reviewed the
> code and do not find any obvious out-of-bounds memory references. I have
> tried various 'printk' statements, but none of them in the bottom-half
> interrupt routine make it to the logs. Are there any tricks that I
> should try?

Look for lock-related deadlocks. Try turning on the NMI watchdog.
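
One common shape of a lock-related deadlock in a driver is a lock shared between process context and the interrupt path, taken in process context without disabling interrupts. The following is a minimal userspace sketch of that situation, not bcm43xx code: `dbg_lock`, `dbg_lock_take`, and `simulate_rx_interrupt` are hypothetical names, and the single-CPU model is an assumption made for illustration. In a real driver the usual remedy is to take such a lock with `spin_lock_irqsave()`.

```c
#include <assert.h>
#include <stddef.h>

/* Who currently holds the lock on our (assumed) single CPU. */
enum ctx { CTX_NONE = 0, CTX_PROCESS, CTX_IRQ };

/* Hypothetical debug lock: records its holder instead of spinning. */
struct dbg_lock {
    enum ctx holder; /* CTX_NONE when the lock is free */
};

/*
 * Try to take the lock. Returns 0 on success, or -1 where a real
 * spinlock on a single CPU would spin forever (the lock is already held
 * by the code we interrupted, which cannot run again until we return).
 */
static int dbg_lock_take(struct dbg_lock *l, enum ctx who)
{
    assert(l != NULL);
    if (l->holder != CTX_NONE)
        return -1;
    l->holder = who;
    return 0;
}

static void dbg_lock_release(struct dbg_lock *l)
{
    assert(l != NULL);
    l->holder = CTX_NONE;
}

/*
 * Process context takes the lock, then the device raises an interrupt
 * whose handler wants the same lock. If interrupts are left enabled,
 * the handler runs while the lock is held; if they are disabled, the
 * handler is deferred until after the unlock.
 * Returns 1 if the handler would deadlock, 0 otherwise.
 */
static int simulate_rx_interrupt(int disable_irqs_while_locked)
{
    struct dbg_lock lock = { CTX_NONE };
    int deadlock = 0;

    dbg_lock_take(&lock, CTX_PROCESS);

    if (!disable_irqs_while_locked) {
        /* Interrupt arrives while the lock is still held. */
        if (dbg_lock_take(&lock, CTX_IRQ) != 0)
            deadlock = 1;
    }

    dbg_lock_release(&lock);

    if (disable_irqs_while_locked) {
        /* Pending interrupt is delivered only after the unlock. */
        if (dbg_lock_take(&lock, CTX_IRQ) != 0)
            deadlock = 1;
        else
            dbg_lock_release(&lock);
    }
    return deadlock;
}
```

Calling `simulate_rx_interrupt(0)` reports the deadlock, while `simulate_rx_interrupt(1)` does not. On real hardware the first case would simply hang, which matches a freeze where no further printk output reaches the logs.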

2005-11-27 03:05:42

by Larry Finger

Subject: Re: What are the general causes of frozen system?

[email protected] wrote:
>>What should I consider as a cause of the freeze? I have reviewed the
>>code and do not find any obvious out-of-bounds memory references. I have
>>tried various 'printk' statements, but none of them in the bottom-half
>>interrupt routine make it to the logs. Are there any tricks that I
>>should try?
>
>
> Look for lock-related deadlocks. Try turning on the NMI watchdog.
>

Thanks for the quick response. Unfortunately, adding either
nmi_watchdog=1 or nmi_watchdog=2 makes my machine lock up during boot,
just after ACPI is initialized.

2005-11-27 11:01:09

by Michael Buesch

Subject: Re: What are the general causes of frozen system?

On Sunday 27 November 2005 03:44, you wrote:
> I am trying to help the bcm43xx project develop a driver for the
> Broadcom 43xx wireless chips, using my Linksys WPC54G card.
> Unfortunately, since the group got far enough to turn on RX DMA, my
> system has frozen whenever I load the driver. TX DMA was OK. It seems to
> correlate with the receipt of a beacon from my AP, but that cannot be
> proven. When the freeze happens, I cannot do anything more and have to
> power the system off.
>
> What should I consider as a cause of the freeze? I have reviewed the
> code and do not find any obvious out-of-bounds memory references. I have
> tried various 'printk' statements, but none of them in the bottom-half
> interrupt routine make it to the logs. Are there any tricks that I
> should try?

Enable most of the "Kernel Hacking" debugging options, such as
memory and spinlock debugging. These will often catch memory corruption
or deadlocks.

From your explanation, I'm not sure where the freeze happens.
Does it freeze before or after the DMA operations? I am sure we will find
the reason if you talk to us about the problem on IRC or the bcm43xx
mailing list.

--
Greetings Michael.



2005-11-27 21:17:11

by Benjamin Herrenschmidt

Subject: Re: What are the general causes of frozen system?

On Sat, 2005-11-26 at 20:44 -0600, Larry Finger wrote:
> I am trying to help the bcm43xx project develop a driver for the
> Broadcom 43xx wireless chips, using my Linksys WPC54G card.
> Unfortunately, since the group got far enough to turn on RX DMA, my
> system has frozen whenever I load the driver. TX DMA was OK. It seems to
> correlate with the receipt of a beacon from my AP, but that cannot be
> proven. When the freeze happens, I cannot do anything more and have to
> power the system off.
>
> What should I consider as a cause of the freeze? I have reviewed the
> code and do not find any obvious out-of-bounds memory references. I have
> tried various 'printk' statements, but none of them in the bottom-half
> interrupt routine make it to the logs. Are there any tricks that I
> should try?

Spinlock debugging is one. If it's a bad DMA to low memory, there isn't
much you can do. Another common cause is a stale interrupt: is your IRQ
handler called over and over again in a loop?

The problem is that, depending on what you are interrupting, printk may
not work...

Ben.
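
To illustrate the stale-interrupt case: if the handler never clears the interrupt source, a level-triggered line keeps re-entering it and the machine appears frozen. The sketch below is a userspace simulation under stated assumptions, not the bcm43xx handler. `fake_dev`, `fake_irq_handler`, `run_until_quiet`, and `STORM_LIMIT` are hypothetical names, and the register layout is invented. It shows a handler that counts calls which make no progress and masks the device once a limit is reached, so the loop becomes visible instead of hanging the system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device's interrupt registers. */
struct fake_dev {
    uint32_t irq_status;    /* pending interrupt reasons */
    uint32_t irq_mask;      /* enabled interrupt sources */
    bool     ack_works;     /* false models a source that never clears */
    unsigned spurious_runs; /* consecutive calls that made no progress */
};

enum irq_result { IRQ_NOT_OURS, IRQ_DONE, IRQ_STORM_MASKED };

/* Arbitrary illustrative threshold, not taken from any real driver. */
#define STORM_LIMIT 1000u

static enum irq_result fake_irq_handler(struct fake_dev *dev)
{
    assert(dev != NULL);

    uint32_t reason = dev->irq_status & dev->irq_mask;
    if (reason == 0)
        return IRQ_NOT_OURS; /* shared line: not our device */

    /* Acknowledge the reasons we are about to handle. */
    if (dev->ack_works)
        dev->irq_status &= ~reason;

    if (dev->irq_status & dev->irq_mask) {
        /* Still pending after the ack: we made no progress. */
        if (++dev->spurious_runs >= STORM_LIMIT) {
            dev->irq_mask = 0; /* give up and silence the device */
            return IRQ_STORM_MASKED;
        }
    } else {
        dev->spurious_runs = 0;
    }
    return IRQ_DONE;
}

/*
 * Model a level-triggered line: the handler is re-invoked for as long
 * as an enabled source is pending. Returns the number of invocations.
 */
static unsigned run_until_quiet(struct fake_dev *dev, unsigned max_calls)
{
    unsigned calls = 0;

    while ((dev->irq_status & dev->irq_mask) && calls < max_calls) {
        fake_irq_handler(dev);
        calls++;
    }
    return calls;
}
```

With `ack_works` set, `run_until_quiet` returns after a single call. With it cleared, the handler is re-entered until the storm limit masks the device. A similar counter in a real handler is one way to confirm a stale interrupt when printk output never reaches the logs.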


2005-11-28 05:01:16

by Larry Finger

Subject: Re: What are the general causes of frozen system?

[email protected] wrote:
> On Sat, Nov 26, 2005 at 09:05:37PM -0600, Larry Finger wrote:
>
>>>Look for lock-related deadlocks. Try turning on the NMI watchdog.
>>
>>Thanks for the quick response. Unfortunately, adding either
>>nmi_watchdog=1 or nmi_watchdog=2 makes my machine lock up during boot,
>>just after ACPI is initialized.
>
>
> Try with acpi=off, too?
>

It turned out that a 'lapic' option was causing the boot lockup. I tried
all combinations of acpi on/off and nmi_watchdog=1/2 without any success.
The NMI count in /proc/interrupts remains stuck at 0.