2012-05-22 03:39:14

by Larry Finger

Subject: Network kernel panics with wireless-testing 3.4-rc7

I am getting kernel panics on one of my boxes from the b43legacy driver due to a
"Fatal exception in interrupt".

This particular one happened 50K seconds after bootup, but it has happened
nearly as soon as the network connection was completed. The hand-transcribed
traceback is as follows:

__netif_schedule+0x13/0xa0
ieee80211_propagate_queue_wake+0x166/0x1c0
__ieee80211_wake_queue+0x13b/0x2d0
? __ieee80211_wake_queue+0xc0/0x2d0
ieee80211_wake_queue_by_reason+0x45/0x70
ieee80211_wake_queue+0xb/0x10
b43legacy_dma_handle_txstatus+0x3f9/0x4b0
? _raw_spin_unlock+0x26/0x40
b43legacy_handle_txstatus+0x64/0x90
b43legacy_handle_hwtxstatus+0x66/0x70
b43legacy_dma_rx+0x354/0x610

The offsets are for an x86_64 architecture.
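For readers unfamiliar with the notation, each traceback entry has the form symbol+0xOFFSET/0xSIZE: the return address lies OFFSET bytes into a function that is SIZE bytes long, and a leading "?" marks an address found on the stack that the unwinder could not confirm as part of the call chain. The following is a minimal sketch of splitting such entries apart; the helper name parse_frame and the returned dictionary layout are illustrative assumptions, not part of any kernel tool.

```python
import re

# Matches entries such as "foo+0x13/0xa0" or "? foo+0x26/0x40".
FRAME_RE = re.compile(r"^(\? )?(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)$")

def parse_frame(line):
    """Split one traceback entry into symbol, offset, size and reliability."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        raise ValueError("not a traceback entry: %r" % line)
    return {
        "reliable": match.group(1) is None,   # False when prefixed with "?"
        "symbol": match.group(2),
        "offset": int(match.group(3), 16),    # bytes into the function
        "size": int(match.group(4), 16),      # total function length
    }

frame = parse_frame("b43legacy_dma_handle_txstatus+0x3f9/0x4b0")
print(frame["symbol"], frame["offset"], frame["size"])
```

An offset close to the size, as in this entry, means the call sits near the end of the function.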

These crashes never happen when I use a USB device running the rtl8187 driver,
thus it appears to arise in b43legacy. Any suggestions on what might cause the
problem would be helpful. Sorry I don't have the register dumps, etc.

The code dump at the point of the crash is as follows:
ec 10 4c 89 65 f8 48 89 5d f0 49 89 fc <3c> 0f ba af 80 00 00 00 00

Larry


2012-05-29 07:01:41

by Johannes Berg

Subject: Re: Network kernel panics with wireless-testing 3.4-rc7

On Mon, 2012-05-21 at 22:39 -0500, Larry Finger wrote:
> I am getting kernel panics on one of my boxes from the b43legacy driver due to a
> "Fatal exception in interrupt".
>
> This particular one happened 50K seconds after bootup, but it has happened
> nearly as soon as the network connection was completed. The hand-transcribed
> traceback is as follows:

FWIW, if you have a digital camera I'm happy with a picture too, no need
to hand-transcribe everything.

> __netif_schedule+0x13/0xa0
> ieee80211_propagate_queue_wake+0x166/0x1c0
> __ieee80211_wake_queue+0x13b/0x2d0
> ? __ieee80211_wake_queue+0xc0/0x2d0
> ieee80211_wake_queue_by_reason+0x45/0x70
> ieee80211_wake_queue+0xb/0x10
> b43legacy_dma_handle_txstatus+0x3f9/0x4b0
> ? _raw_spin_unlock+0x26/0x40
> b43legacy_handle_txstatus+0x64/0x90
> b43legacy_handle_hwtxstatus+0x66/0x70
> b43legacy_dma_rx+0x354/0x610
>
> The offsets are for an x86_64 architecture.
>
> These crashes never happen when I use a USB device running the rtl8187 driver,
> thus it appears to arise in b43legacy. Any suggestions on what might cause the
> problem would be helpful. Sorry I don't have the register dumps, etc.

Maybe that device simply never stops/wakes the queues in the same way.
Or the difference is that b43legacy has only a single queue available to
it (right now) and no QoS.

> The code dump at the point of the crash is as follows:
> ec 10 4c 89 65 f8 48 89 5d f0 49 89 fc <3c> 0f ba af 80 00 00 00 00

Hmm. That decodes (scripts/decodecode) to

All code
========
   0:   ec                      in     (%dx),%al
   1:   10 4c 89 65             adc    %cl,0x65(%rcx,%rcx,4)
   5:   f8                      clc
   6:   48 89 5d f0             mov    %rbx,-0x10(%rbp)
   a:   49 89 fc                mov    %rdi,%r12
   d:*  3c 0f                   cmp    $0xf,%al     <-- trapping instruction
   f:   ba af 80 00 00          mov    $0x80af,%edx
        ...

which is odd because that function doesn't seem to have a comparison to
0xf (15) in it as far as I can tell.
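As a cross-check on the listing, the byte wrapped in <..> in the code dump is the first byte of the trapping instruction, so its position in the dump should match the offset marked with "*". Below is a minimal sketch of that check; the helper name trapping_offset is made up for illustration and is not part of scripts/decodecode.

```python
def trapping_offset(code_line):
    """Return the byte offset of the <xx>-marked byte in an oops code dump."""
    for offset, token in enumerate(code_line.split()):
        if token.startswith("<") and token.endswith(">"):
            int(token[1:-1], 16)  # raises ValueError if not a hex byte
            return offset
    raise ValueError("no <..> marker found in code dump")

# The code dump as transcribed in the original report.
CODE = "ec 10 4c 89 65 f8 48 89 5d f0 49 89 fc <3c> 0f ba af 80 00 00 00 00"

print(hex(trapping_offset(CODE)))  # prints 0xd
```

The result, 0xd, agrees with the "d:*" line in the decoded output. Because the dump was transcribed by hand, a single mistyped byte at or before that position would change the decoding of everything that follows.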

I'm pretty stumped. Does this reproduce well? Maybe you can print out
the queue number in the propagate wake function?

johannes