2006-12-17 20:37:58

by Aaron Sethman

Subject: [OOPS] bcm43xx oops on 2.6.20-rc1 on x86_64


I was just loading the bcm43xx module and got the following oops. Note that
this card is one of the newer PCI-E cards. If any other info is needed,
let me know.

-Aaron

ACPI: PCI Interrupt 0000:0c:00.0[A] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:0c:00.0 to 64
bcm43xx: Chip ID 0x4311, rev 0x1
bcm43xx: Number of cores: 4
bcm43xx: Core 0: ID 0x800, rev 0x11, vendor 0x4243
bcm43xx: Core 1: ID 0x812, rev 0xa, vendor 0x4243
bcm43xx: Core 2: ID 0x817, rev 0x3, vendor 0x4243
bcm43xx: Core 3: ID 0x820, rev 0x1, vendor 0x4243
bcm43xx: PHY connected
bcm43xx: Detected PHY: Version: 4, Type 2, Revision 8
bcm43xx: Detected Radio: ID: 2205017f (Manuf: 17f Ver: 2050 Rev: 2)
bcm43xx: Radio turned off
bcm43xx: Radio turned off
Unable to handle kernel NULL pointer dereference at 0000000000000011 RIP:
[<ffffffff88007793>] :ieee80211:ieee80211_wx_set_encode+0x14a/0x4a7
PGD 33a22067 PUD 3469b067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: bcm43xx rng_core ieee80211softmac ieee80211 ieee80211_crypt
Pid: 4088, comm: iwconfig Not tainted 2.6.20-rc1 #2
RIP: 0010:[<ffffffff88007793>] [<ffffffff88007793>] :ieee80211:ieee80211_wx_set_encode+0x14a/0x4a7
RSP: 0018:ffff810032d3fc28 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff81003332ebf8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff810032d3fcd5
RBP: ffff81003332ebf8 R08: 0000000000000005 R09: ffff810032d3fc48
R10: 0000000000000000 R11: 0000000000000202 R12: ffff81003332e4c0
R13: 0000000000000000 R14: 0000000000000000 R15: ffff810032d3fe58
FS: 00002b7863a2d6d0(0000) GS:ffffffff80697000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000011 CR3: 0000000034270000 CR4: 00000000000026e0
Process iwconfig (pid: 4088, threadinfo ffff810032d3e000, task ffff810037c4f690)
Stack: ffff81003d87ecc0 0000000000000000 0000000100000000 ffffffff80290ede
0000000000000404 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
[<ffffffff80290ede>] touch_atime+0xde/0x130
[<ffffffff804f338b>] ioctl_standard_call+0x26b/0x3b0
[<ffffffff8802baa0>] :bcm43xx:bcm43xx_wx_set_encoding+0x0/0x10
[<ffffffff8025a029>] find_get_page+0x29/0x60
[<ffffffff8025c684>] filemap_nopage+0x194/0x350
[<ffffffff8802baa0>] :bcm43xx:bcm43xx_wx_set_encoding+0x0/0x10
[<ffffffff804f3775>] wireless_process_ioctl+0x105/0x3d0
[<ffffffff80267675>] __handle_mm_fault+0x465/0xad0
[<ffffffff804e81d6>] dev_ioctl+0x346/0x3c0
[<ffffffff803988f1>] __up_read+0x21/0xb0
[<ffffffff804db750>] sock_ioctl+0x220/0x240
[<ffffffff802885bf>] do_ioctl+0x2f/0xa0
[<ffffffff802888d3>] vfs_ioctl+0x2a3/0x2e0
[<ffffffff80288959>] sys_ioctl+0x49/0x80
[<ffffffff8055184d>] error_exit+0x0/0x84
[<ffffffff8020a03e>] system_call+0x7e/0x83


Code: 48 8b 40 10 48 85 c0 0f 84 01 01 00 00 48 8b 30 b9 04 00 00
RIP [<ffffffff88007793>] :ieee80211:ieee80211_wx_set_encode+0x14a/0x4a7
RSP <ffff810032d3fc28>
CR2: 0000000000000011
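
A note on reading the trace above: assuming the Code: line starts at the faulting RIP, the first four bytes 48 8b 40 10 decode to mov 0x10(%rax),%rax. With RAX = 1 from the register dump, the load address works out to 0x11, which matches CR2. The sketch below only redoes that arithmetic; the helper name and the single-instruction decoder are illustrative and not taken from any kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* First four bytes of the "Code:" line; assumed to begin at the faulting RIP. */
static const uint8_t oops_code[4] = { 0x48, 0x8b, 0x40, 0x10 };

/*
 * Hypothetical helper: handles only REX.W + 8B /r with mod=01 and rm=000,
 * i.e. "mov disp8(%rax),%reg". Stores the effective address in *addr and
 * returns 0, or returns -1 for any other encoding.
 */
int rax_disp8_load_address(const uint8_t *code, uint64_t rax, uint64_t *addr)
{
	uint8_t modrm;

	if (code[0] != 0x48 || code[1] != 0x8b)
		return -1;
	modrm = code[2];
	if ((modrm >> 6) != 1 || (modrm & 7) != 0)
		return -1;
	/* disp8 is sign-extended before being added to the base register. */
	*addr = rax + (uint64_t)(int64_t)(int8_t)code[3];
	return 0;
}

/* RAX and CR2 values are taken from the register dump above. */
void check_fault_address(void)
{
	uint64_t addr = 0;

	assert(rax_disp8_load_address(oops_code, 0x1, &addr) == 0);
	assert(addr == 0x11); /* equals CR2 */
}
```

Read this way, the base register held 1 rather than a valid pointer, and the fault hit on the load itself, before the check of the loaded value that follows in the code bytes (48 85 c0, test %rax,%rax).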



2006-12-30 19:21:05

by Adrian Bunk

Subject: Re: [OOPS] bcm43xx oops on 2.6.20-rc1 on x86_64

On Sun, Dec 17, 2006 at 03:15:28PM -0500, Aaron Sethman wrote:
>
> Just was loading the bcm43xx module and got the following oops. Note that
> this card is one of the newer PCI-E cards. If any other info is needed
> let me know.

Is this issue still present in 2.6.20-rc2-git1?

If yes, was 2.6.19 working fine?

> -Aaron
>
> ACPI: PCI Interrupt 0000:0c:00.0[A] -> GSI 17 (level, low) -> IRQ 17
> PCI: Setting latency timer of device 0000:0c:00.0 to 64
> bcm43xx: Chip ID 0x4311, rev 0x1
> bcm43xx: Number of cores: 4
> bcm43xx: Core 0: ID 0x800, rev 0x11, vendor 0x4243
> bcm43xx: Core 1: ID 0x812, rev 0xa, vendor 0x4243
> bcm43xx: Core 2: ID 0x817, rev 0x3, vendor 0x4243
> bcm43xx: Core 3: ID 0x820, rev 0x1, vendor 0x4243
> bcm43xx: PHY connected
> bcm43xx: Detected PHY: Version: 4, Type 2, Revision 8
> bcm43xx: Detected Radio: ID: 2205017f (Manuf: 17f Ver: 2050 Rev: 2)
> bcm43xx: Radio turned off
> bcm43xx: Radio turned off
> Unable to handle kernel NULL pointer dereference at 0000000000000011 RIP:
> [<ffffffff88007793>] :ieee80211:ieee80211_wx_set_encode+0x14a/0x4a7
> PGD 33a22067 PUD 3469b067 PMD 0
> Oops: 0000 [1] SMP
> CPU 0
> Modules linked in: bcm43xx rng_core ieee80211softmac ieee80211 ieee80211_crypt
> Pid: 4088, comm: iwconfig Not tainted 2.6.20-rc1 #2
> RIP: 0010:[<ffffffff88007793>] [<ffffffff88007793>] :ieee80211:ieee80211_wx_set_encode+0x14a/0x4a7
> RSP: 0018:ffff810032d3fc28 EFLAGS: 00010202
> RAX: 0000000000000001 RBX: ffff81003332ebf8 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff810032d3fcd5
> RBP: ffff81003332ebf8 R08: 0000000000000005 R09: ffff810032d3fc48
> R10: 0000000000000000 R11: 0000000000000202 R12: ffff81003332e4c0
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff810032d3fe58
> FS: 00002b7863a2d6d0(0000) GS:ffffffff80697000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000011 CR3: 0000000034270000 CR4: 00000000000026e0
> Process iwconfig (pid: 4088, threadinfo ffff810032d3e000, task ffff810037c4f690)
> Stack: ffff81003d87ecc0 0000000000000000 0000000100000000 ffffffff80290ede
> 0000000000000404 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> Call Trace:
> [<ffffffff80290ede>] touch_atime+0xde/0x130
> [<ffffffff804f338b>] ioctl_standard_call+0x26b/0x3b0
> [<ffffffff8802baa0>] :bcm43xx:bcm43xx_wx_set_encoding+0x0/0x10
> [<ffffffff8025a029>] find_get_page+0x29/0x60
> [<ffffffff8025c684>] filemap_nopage+0x194/0x350
> [<ffffffff8802baa0>] :bcm43xx:bcm43xx_wx_set_encoding+0x0/0x10
> [<ffffffff804f3775>] wireless_process_ioctl+0x105/0x3d0
> [<ffffffff80267675>] __handle_mm_fault+0x465/0xad0
> [<ffffffff804e81d6>] dev_ioctl+0x346/0x3c0
> [<ffffffff803988f1>] __up_read+0x21/0xb0
> [<ffffffff804db750>] sock_ioctl+0x220/0x240
> [<ffffffff802885bf>] do_ioctl+0x2f/0xa0
> [<ffffffff802888d3>] vfs_ioctl+0x2a3/0x2e0
> [<ffffffff80288959>] sys_ioctl+0x49/0x80
> [<ffffffff8055184d>] error_exit+0x0/0x84
> [<ffffffff8020a03e>] system_call+0x7e/0x83
>
>
> Code: 48 8b 40 10 48 85 c0 0f 84 01 01 00 00 48 8b 30 b9 04 00 00
> RIP [<ffffffff88007793>] :ieee80211:ieee80211_wx_set_encode+0x14a/0x4a7
> RSP <ffff810032d3fc28>
> CR2: 0000000000000011

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-12-30 21:23:34

by Larry Finger

Subject: Re: [OOPS] bcm43xx oops on 2.6.20-rc1 on x86_64

From: Ulrich Kunitz <[email protected]>

The signature of work functions changed recently from a context
pointer to the work structure pointer. This caused a problem in
the ieee80211softmac code, because the ieee80211softmac_assoc_work
function was called directly with a parameter explicitly
cast to (void*). This compiled correctly but resulted in a
softlock, because mutex_lock was called with the wrong memory
address. The patch fixes the problem. Another issue was a wrong
call of the schedule_work function. Softmac works again and this
fixes the problem I mentioned earlier in the zd1211rw rx tasklet
patch. The patch is against Linus' tree (commit af1713e0).

Signed-off-by: Ulrich Kunitz <[email protected]>
Acked-by: Michael Buesch <[email protected]>
Signed-off-by: Larry Finger <[email protected]>
---

John,

This patch should be pushed upstream to 2.6.20. At the moment, the work
struct changes have not yet propagated to wireless-2.6. When they do,
it will be needed there as well.

Larry

net/ieee80211/softmac/ieee80211softmac_assoc.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ieee80211/softmac/ieee80211softmac_assoc.c b/net/ieee80211/softmac/ieee80211softmac_assoc.c
index eec1a1d..a824852 100644
--- a/net/ieee80211/softmac/ieee80211softmac_assoc.c
+++ b/net/ieee80211/softmac/ieee80211softmac_assoc.c
@@ -167,7 +167,7 @@ static void
 ieee80211softmac_assoc_notify_scan(struct net_device *dev, int event_type, void *context)
 {
 	struct ieee80211softmac_device *mac = ieee80211_priv(dev);
-	ieee80211softmac_assoc_work((void*)mac);
+	ieee80211softmac_assoc_work(&mac->associnfo.work.work);
 }

 static void
@@ -177,7 +177,7 @@ ieee80211softmac_assoc_notify_auth(struc

 	switch (event_type) {
 	case IEEE80211SOFTMAC_EVENT_AUTHENTICATED:
-		ieee80211softmac_assoc_work((void*)mac);
+		ieee80211softmac_assoc_work(&mac->associnfo.work.work);
 		break;
 	case IEEE80211SOFTMAC_EVENT_AUTH_FAILED:
 	case IEEE80211SOFTMAC_EVENT_AUTH_TIMEOUT:


Attachments:
work_struct2 (1.97 kB)
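
For readers following along, the calling convention the patch relies on can be sketched in userspace. This is a simplified model, not the softmac source: the struct layouts and the names softmac_device, assoc_info, mac_from_work and old_call_skew are made up for illustration, and only the member path associnfo.work.work is taken from the diff above. It shows a work function recovering its device from the work pointer with container_of, and why handing it the device pointer itself would point the function at the wrong memory.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel work types. */
struct work_struct {
	void (*func)(struct work_struct *work);
};

struct delayed_work {
	struct work_struct work;
	long timer_placeholder;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Hypothetical, trimmed-down device layout; only the member path
 * associnfo.work.work mirrors the patch.
 */
struct assoc_info {
	long mutex_placeholder;
	struct delayed_work work;
};

struct softmac_device {
	long other_state;
	struct assoc_info associnfo;
};

/*
 * New-style work function prologue: the argument is the work pointer,
 * so the device has to be recovered from it.
 */
struct softmac_device *mac_from_work(struct work_struct *work)
{
	return container_of(work, struct softmac_device, associnfo.work.work);
}

/* Number of bytes by which the old "(void*)mac" call would be off. */
size_t old_call_skew(void)
{
	return offsetof(struct softmac_device, associnfo.work.work);
}

void check_roundtrip(void)
{
	struct softmac_device mac = { 0 };

	/* Passing the embedded work pointer recovers the device. */
	assert(mac_from_work(&mac.associnfo.work.work) == &mac);
	/* Passing the device pointer instead would be skewed by this offset. */
	assert(old_call_skew() > 0);
}
```

With the old call, container_of would subtract that offset from the device pointer itself, so any lock reached through the result would sit at the wrong address. That fits the mutex_lock symptom described in the commit message.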

2006-12-30 21:32:46

by Adrian Bunk

Subject: Re: [OOPS] bcm43xx oops on 2.6.20-rc1 on x86_64

On Sat, Dec 30, 2006 at 03:23:42PM -0600, Larry Finger wrote:
> Adrian Bunk wrote:
> > On Sun, Dec 17, 2006 at 03:15:28PM -0500, Aaron Sethman wrote:
> >> Just was loading the bcm43xx module and got the following oops. Note that
> >> this card is one of the newer PCI-E cards. If any other info is needed
> >> let me know.
> >
> > Is this issue still present in 2.6.20-rc2-git1?
> >
> > If yes, was 2.6.19 working fine?
>...
>
> Any oops involving wireless extensions is due to 2.6.20-rc1 and -rc2 not having the fix for softmac
> that is necessitated by the 2.6.20 changes in the work structure.

"Any oops" are very strong words.

It wouldn't be the first time that we have several similar bug reports,
and it turns out that one is for a completely different issue...

> The needed patch has now been
> pushed by Jeff to Andrew and Linus, and should be in -rc3. In the meantime, it is attached.

That's why I asked for testing with 2.6.20-rc2-git1 that includes the
two ieee80211softmac patches.

> Larry

cu
Adrian

2006-12-30 22:45:08

by Larry Finger

Subject: Re: [OOPS] bcm43xx oops on 2.6.20-rc1 on x86_64

Adrian Bunk wrote:
> On Sat, Dec 30, 2006 at 03:23:42PM -0600, Larry Finger wrote:
>> Adrian Bunk wrote:
>>> On Sun, Dec 17, 2006 at 03:15:28PM -0500, Aaron Sethman wrote:
>>>> Just was loading the bcm43xx module and got the following oops. Note that
>>>> this card is one of the newer PCI-E cards. If any other info is needed
>>>> let me know.
>>> Is this issue still present in 2.6.20-rc2-git1?
>>>
>>> If yes, was 2.6.19 working fine?
>> ...
>>
>> Any oops involving wireless extensions is due to 2.6.20-rc1 and -rc2 not having the fix for softmac
>> that is necessitated by the 2.6.20 changes in the work structure.
>
> "Any oops" are very strong words.

Yes - but I have seen at least 7 or 8 different occurrences of that bug since the patch was first
made available on Dec. 10, and I have seen no bcm43xx oopses from any other cause.

> It wouldn't be the first time that we have several similar bug reports,
> and it turns out that one is for a completely different issue...
>
> That's why I asked for testing with 2.6.20-rc2-git1 that includes the
> two ieee80211softmac patches.

I have been chasing a sound-card issue today and missed that -git1 was out. That version fixes the
two outstanding 2.6.20 softmac issues.

Larry

2006-12-31 04:14:40

by Aaron Sethman

Subject: Re: [OOPS] bcm43xx oops on 2.6.20-rc1 on x86_64


On Sat, 30 Dec 2006, Adrian Bunk wrote:

> On Sun, Dec 17, 2006 at 03:15:28PM -0500, Aaron Sethman wrote:
>>
>> Just was loading the bcm43xx module and got the following oops. Note that
>> this card is one of the newer PCI-E cards. If any other info is needed
>> let me know.
>
> Is this issue still present in 2.6.20-rc2-git1?
>
> If yes, was 2.6.19 working fine?
>

This seems to be fixed in 2.6.20-rc2-git1. Still having other issues
with the driver, but the oops in the SoftMAC code is resolved now at
least.

-Aaron

2006-12-31 05:14:44

by Larry Finger

Subject: Re: [OOPS] bcm43xx oops on 2.6.20-rc1 on x86_64

Aaron Sethman wrote:
>
> On Sat, 30 Dec 2006, Adrian Bunk wrote:
>
>> On Sun, Dec 17, 2006 at 03:15:28PM -0500, Aaron Sethman wrote:
>>>
>>> Just was loading the bcm43xx module and got the following oops. Note that
>>> this card is one of the newer PCI-E cards. If any other info is needed
>>> let me know.
>>
>> Is this issue still present in 2.6.20-rc2-git1?
>>
>> If yes, was 2.6.19 working fine?
>>
>
> This seems to be fixed in 2.6.20-rc2-git1. Still having other issues
> with the driver, but the oops in the SoftMAC code is resolved now at least.

I have just started working with the PCI-E BCM4311 that is in my new computer. It receives data OK,
but there is something wrong with the DMA out stuff in bcm43xx-softmac - at least for x86_64. All
the slots get full but nothing is ever transmitted. FWIW, the wireless-dev git tree works. I'm using
it for communications now. I'm using openSUSE 10.2 which uses NetworkManager to configure my WPA-PSK
TKIP encrypted network. The signal strengths are roughly the same as I got for my old BCM4306 card.

I will post any fixes for the DMA problem as soon as they are available, but it may be a while. I
will be off-line until Thursday while I attend a funeral.

Larry

2006-12-31 13:43:01

by Adrian Bunk

Subject: Re: [OOPS] bcm43xx oops on 2.6.20-rc1 on x86_64

On Sat, Dec 30, 2006 at 04:45:21PM -0600, Larry Finger wrote:
> Adrian Bunk wrote:
> > On Sat, Dec 30, 2006 at 03:23:42PM -0600, Larry Finger wrote:
> >> Adrian Bunk wrote:
> >>> On Sun, Dec 17, 2006 at 03:15:28PM -0500, Aaron Sethman wrote:
> >>>> Just was loading the bcm43xx module and got the following oops. Note that
> >>>> this card is one of the newer PCI-E cards. If any other info is needed
> >>>> let me know.
> >>> Is this issue still present in 2.6.20-rc2-git1?
> >>>
> >>> If yes, was 2.6.19 working fine?
> >> ...
> >>
> >> Any oops involving wireless extensions is due to 2.6.20-rc1 and -rc2 not having the fix for softmac
> >> that is necessitated by the 2.6.20 changes in the work structure.
> >
> > "Any oops" are very strong words.
>
> Yes - but I have seen at least 7 or 8 different occurrences of that bug since the patch was first
> made available on Dec. 10, and I have seen no bcm43xx oopses from any other cause.
>...

To avoid any misunderstandings:

This wasn't in any way meant against you personally.

And in this case you were right, it was the same bug.

My answer was based on experiences like one during 2.6.19-rc where we
had 4 bug reports for a regression with a patch available. And it turned
out that one of them was for a completely different regression.

That's why I prefer to get confirmations that a user actually ran into
the same issue, and not into something completely different with similar
symptoms.

> Larry

cu
Adrian

2006-12-31 14:22:28

by Larry Finger

Subject: Re: [OOPS] bcm43xx oops on 2.6.20-rc1 on x86_64

Adrian Bunk wrote:
>
> To avoid any misunderstandings:
>
> This wasn't in any way meant against you personally.
>
> And in this case you were right, it was the same bug.
>
> My answer was based on experiences like one during 2.6.19-rc where we
> had 4 bug reports for a regression with a patch available. And it turned
> out that one of them was for a completely different regression.
>
> That's why I prefer to get confirmations that a user actually ran into
> the same issue, and not into something completely different with similar
> symptoms.

I certainly understood your intent and agree that confirmation is necessary. I was venting some
frustration at it taking nearly 3 weeks to get a fix into the system for a bug that was caused by a
change in a kernel structure that was not our doing. When Andrew posted a trial patch to fix the
compilation error that resulted, I immediately responded with the two additional fixes that were
needed, but they never made it into the code. Since then I have been seeing these obscure bug
reports and quickly learned that if either softmac or bcm43xx WX were involved, this fix took care
of the problem.

Larry