2009-11-23 09:28:18

by Maxim Levitsky

[permalink] [raw]
Subject: RFKILL related oops on iwl3945 reload

I noticed pretty wierd oops message, but found that it is always
reproduciable.

Just have any program listen to rfkill events, and reload the wireless
module (this also reloads all wireless modules)
I use small script that does modprobe -r $1 && modprobe $1

Below, backtrace that triggered by running 'rfkill event' and reloading
the module.
NM+wpa_supplicant(nl80211) did run, and began association process before
the oops.

This is usually triggered by bluetooth-applet that now listens to rfkill
events as well.


<1>[ 254.182693] BUG: unable to handle kernel paging request at ffffffffa0b8c058
<1>[ 254.182722] IP: [<ffffffff810e2fc3>] do_sys_poll+0x223/0x530
<4>[ 254.182750] PGD 1003067 PUD 1007063 PMD 68ac5067 PTE 0
<0>[ 254.182774] Oops: 0000 [#1] PREEMPT SMP
<0>[ 254.182792] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
<4>[ 254.182813] CPU 1
<4>[ 254.182822] Modules linked in: iwl3945 iwlcore mac80211 cfg80211 rfkill af_packet nfsd exportfs usb_storage usb_libusual nfs cpufreq_powersave cpufreq_conservative cpufreq_userspace snd_hda_codec_realtek uvcvideo lockd snd_hda_intel nfs_acl videodev snd_hda_codec acpi_cpufreq v4l1_compat joydev snd_hwdep v4l2_compat_ioctl32 tg3 coretemp ohci1394 uhci_hcd auth_rpcgss psmouse sbp2 video snd_pcm ehci_hcd serio_raw libphy nvidia(P) output ieee1394 snd_page_alloc usbcore battery ac lirc_ene0100 evdev sunrpc fuse lzo lzo_decompress lzo_compress [last unloaded: rfkill]
<6>[ 254.183040] Pid: 4107, comm: rfkill Tainted: P 2.6.32-rc8-wl #211 Aspire 5720
<6>[ 254.183063] RIP: 0010:[<ffffffff810e2fc3>] [<ffffffff810e2fc3>] do_sys_poll+0x223/0x530
<6>[ 254.183091] RSP: 0018:ffff88006bae1b28 EFLAGS: 00010282
<6>[ 254.183108] RAX: ffffffffa0b8c020 RBX: ffff88006bae1f04 RCX: ffff88006bae1df8
<6>[ 254.183128] RDX: ffff88006bae1df8 RSI: ffff88006bae1f04 RDI: 0000000000000003
<6>[ 254.183149] RBP: ffff88006bae1f38 R08: 0000000000000002 R09: 0000000000000001
<6>[ 254.183170] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88006bae1e0c
<6>[ 254.183190] R13: ffff88006bae1e04 R14: ffff88004239ec80 R15: 0000000000000000
<6>[ 254.183212] FS: 00007f6fae1cb6f0(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
<6>[ 254.183235] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<6>[ 254.183252] CR2: ffffffffa0b8c058 CR3: 0000000054eee000 CR4: 00000000000006e0
<6>[ 254.183273] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<6>[ 254.183294] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[ 254.183315] Process rfkill (pid: 4107, threadinfo ffff88006bae0000, task ffff8800421a8fb0)
<0>[ 254.183337] Stack:
<4>[ 254.183345] ffff880002313c40 ffff88006e9f9980 00ff88006bae1b48 ffff8800421a8fb0
<4>[ 254.183371] <0> 00007fff03e9fbb0 0000000000000000 ffff88006bae1b88 0000000002313c40
<4>[ 254.183399] <0> 0000000000000000 0000000000000000 ffff88006bae1df8 00000000421a8fb0
<0>[ 254.183429] Call Trace:
<4>[ 254.183444] [<ffffffff810e4330>] ? __pollwait+0x0/0xf0
<4>[ 254.183464] [<ffffffff810e4420>] ? pollwake+0x0/0x60
<4>[ 254.183484] [<ffffffff810371dd>] ? task_rq_lock+0x4d/0x90
<4>[ 254.183505] [<ffffffff8106bbfd>] ? trace_hardirqs_off+0xd/0x10
<4>[ 254.183528] [<ffffffff8138ff7d>] ? _spin_unlock_irqrestore+0x3d/0x80
<4>[ 254.183551] [<ffffffff8103ef03>] ? try_to_wake_up+0x1d3/0x240
<4>[ 254.183571] [<ffffffff8106cf3d>] ? trace_hardirqs_on+0xd/0x10
<4>[ 254.183592] [<ffffffff8138ff7d>] ? _spin_unlock_irqrestore+0x3d/0x80
<4>[ 254.183614] [<ffffffff81036f0e>] ? __wake_up+0x4e/0x70
<4>[ 254.183634] [<ffffffff8106cf3d>] ? trace_hardirqs_on+0xd/0x10
<4>[ 254.183655] [<ffffffff81036f0e>] ? __wake_up+0x4e/0x70
<4>[ 254.183673] [<ffffffff8106cf3d>] ? trace_hardirqs_on+0xd/0x10
<4>[ 254.183694] [<ffffffff81248c79>] ? tty_ldisc_deref+0x9/0x10
<4>[ 254.183716] [<ffffffff81241dc2>] ? tty_write+0x242/0x2b0
<4>[ 254.183737] [<ffffffff810d2e82>] ? vfs_write+0x132/0x1a0
<4>[ 254.183757] [<ffffffff810e34b7>] sys_poll+0x77/0x110
<4>[ 254.183778] [<ffffffff8100beab>] system_call_fastpath+0x16/0x1b
<0>[ 254.183796] Code: 00 0f 1f 80 00 00 00 00 48 89 de e8 08 08 ff ff 49 89 c6 48 85 c0 66 90 0f 84 9a 01 00 00 48 8b 40 20 48 85 c0 0f 84 7d 01 00 00 <48> 8b 40 38 48 85 c0 0f 84 70 01 00 00 4d 85 ff 74 18 41 0f b7
<1>[ 254.183946] RIP [<ffffffff810e2fc3>] do_sys_poll+0x223/0x530
<4>[ 254.183967] RSP <ffff88006bae1b28>
<0>[ 254.183979] CR2: ffffffffa0b8c058
<4>[ 254.183994] ---[ end trace 9f0741bcb03a2591 ]---




2009-11-23 17:16:14

by Maxim Levitsky

[permalink] [raw]
Subject: iwl3945 dies on module reload

When reloading the iwl3945 driver I often get this:


[ 830.333386] iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for Linux, 1.2.26kds
[ 830.333407] iwl3945: Copyright(c) 2003-2009 Intel Corporation
[ 830.333536] iwl3945 0000:06:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 830.374979] iwl3945 0000:06:00.0: Tunable channels: 13 802.11bg, 0 802.11a channels
[ 830.374995] iwl3945 0000:06:00.0: Detected Intel Wireless WiFi Link 3945BG
[ 830.375163] iwl3945 0000:06:00.0: irq 32 for MSI/MSI-X
[ 830.376859] phy0: Selected rate control algorithm 'iwl-3945-rs'
[ 830.427214] iwl3945 0000:06:00.0: firmware: requesting iwlwifi-3945-2.ucode
[ 830.526945] iwl3945 0000:06:00.0: loaded firmware version 15.32.2.9
[ 830.544236] CE: hpet increasing min_delta_ns to 15000 nsec
[ 830.575694] iwl3945 0000:06:00.0: BSM uCode verification failed at addr 0x00003800+0 (of 900), is 0xa5a5a5a2, s/b 0xf802020
[ 830.578862] iwl3945 0000:06:00.0: Hardware error detected. Restarting.
[ 832.570133] iwl3945 0000:06:00.0: Wait for START_ALIVE timeout after 2000ms.
[ 832.622676] iwl3945 0000:06:00.0: BSM uCode verification failed at addr 0x00003800+0 (of 900), is 0xa5a5a5a2, s/b 0xf802020
[ 832.625839] iwl3945 0000:06:00.0: Hardware error detected. Restarting.
[ 834.620132] iwl3945 0000:06:00.0: Wait for START_ALIVE timeout after 2000ms.


Note that I used antenna=2 parameter to experiment with quality of reception of both antennas
(this feature strangely works just fine....)

I disabled this option, so next time I feel like rebooting my system, I see if this option was the cause.
(unlikely)


Best regards,
Maxim Levitsky






2009-11-23 16:00:41

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 2.6.32] rfkill: fix miscdev ops

On Mon, 2009-11-23 at 16:12 +0200, Maxim Levitsky wrote:

> Not important, however I had seen that iwlwifi now chokes on reload,
> sometimes with this:
>
>
> [ 830.333386] iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for Linux, 1.2.26kds
> [ 830.333407] iwl3945: Copyright(c) 2003-2009 Intel Corporation
> [ 830.333536] iwl3945 0000:06:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
> [ 830.374979] iwl3945 0000:06:00.0: Tunable channels: 13 802.11bg, 0 802.11a channels
> [ 830.374995] iwl3945 0000:06:00.0: Detected Intel Wireless WiFi Link 3945BG
> [ 830.375163] iwl3945 0000:06:00.0: irq 32 for MSI/MSI-X
> [ 830.376859] phy0: Selected rate control algorithm 'iwl-3945-rs'
> [ 830.427214] iwl3945 0000:06:00.0: firmware: requesting iwlwifi-3945-2.ucode
> [ 830.526945] iwl3945 0000:06:00.0: loaded firmware version 15.32.2.9
> [ 830.544236] CE: hpet increasing min_delta_ns to 15000 nsec
> [ 830.575694] iwl3945 0000:06:00.0: BSM uCode verification failed at addr 0x00003800+0 (of 900), is 0xa5a5a5a2, s/b 0xf802020

I'd really appreciate if you started separate threads for separate
issues -- this can be in no way related to the fix except maybe that the
fix makes it possible for you to run into this whereas before it'd
crash :)

johannes


Attachments:
signature.asc (801.00 B)
This is a digitally signed message part

2009-11-23 10:27:36

by Johannes Berg

[permalink] [raw]
Subject: [PATCH 2.6.32] rfkill: fix miscdev ops

The /dev/rfkill ops don't refer to the module,
so it is possible to unload the module while
file descriptors are open. Fix this oversight.

Reported-by: Maxim Levitsky <[email protected]>
Cc: [email protected]
Signed-off-by: Johannes Berg <[email protected]>
---
net/rfkill/core.c | 1 +
1 file changed, 1 insertion(+)

--- wireless-testing.orig/net/rfkill/core.c 2009-11-23 11:11:13.000000000 +0100
+++ wireless-testing/net/rfkill/core.c 2009-11-23 11:11:59.000000000 +0100
@@ -1191,6 +1191,7 @@ static long rfkill_fop_ioctl(struct file
#endif

static const struct file_operations rfkill_fops = {
+ .owner = THIS_MODULE,
.open = rfkill_fop_open,
.read = rfkill_fop_read,
.write = rfkill_fop_write,



2009-11-23 23:24:11

by Cahill, Ben M

[permalink] [raw]
Subject: RE: [ipw3945-devel] iwl3945 dies on module reload



>-----Original Message-----
>From: Maxim Levitsky [mailto:[email protected]]
>Sent: Monday, November 23, 2009 12:16 PM
>To: Johannes Berg
>Cc: linux-wireless; iwlwifi maling list
>Subject: [ipw3945-devel] iwl3945 dies on module reload
>
>When reloading the iwl3945 driver I often get this:
>
>
>[ 830.333386] iwl3945: Intel(R) PRO/Wireless 3945ABG/BG
>Network Connection driver for Linux, 1.2.26kds
>[ 830.333407] iwl3945: Copyright(c) 2003-2009 Intel Corporation
>[ 830.333536] iwl3945 0000:06:00.0: PCI INT A -> GSI 19
>(level, low) -> IRQ 19
>[ 830.374979] iwl3945 0000:06:00.0: Tunable channels: 13
>802.11bg, 0 802.11a channels
>[ 830.374995] iwl3945 0000:06:00.0: Detected Intel Wireless
>WiFi Link 3945BG
>[ 830.375163] iwl3945 0000:06:00.0: irq 32 for MSI/MSI-X
>[ 830.376859] phy0: Selected rate control algorithm 'iwl-3945-rs'
>[ 830.427214] iwl3945 0000:06:00.0: firmware: requesting
>iwlwifi-3945-2.ucode
>[ 830.526945] iwl3945 0000:06:00.0: loaded firmware version 15.32.2.9
>[ 830.544236] CE: hpet increasing min_delta_ns to 15000 nsec
>[ 830.575694] iwl3945 0000:06:00.0: BSM uCode verification
>failed at addr 0x00003800+0 (of 900), is 0xa5a5a5a2, s/b 0xf802020
>[ 830.578862] iwl3945 0000:06:00.0: Hardware error detected.
>Restarting.
>[ 832.570133] iwl3945 0000:06:00.0: Wait for START_ALIVE
>timeout after 2000ms.
>[ 832.622676] iwl3945 0000:06:00.0: BSM uCode verification
>failed at addr 0x00003800+0 (of 900), is 0xa5a5a5a2, s/b 0xf802020
>[ 832.625839] iwl3945 0000:06:00.0: Hardware error detected.
>Restarting.
>[ 834.620132] iwl3945 0000:06:00.0: Wait for START_ALIVE
>timeout after 2000ms.

The 0xa5a5a5a2 signature indicates that the device is powered up and the PCI Express link is working, but somehow the device internals have not been powered up / initialized.

Are you using a very recent version of the driver? I've done some work in this area in the past couple of months.

-- Ben --



>
>
>Note that I used antenna=2 parameter to experiment with
>quality of reception of both antennas
>(this feature strangely works just fine....)
>
>I disabled this option, so next time I feel like rebooting my
>system, I see if this option was the cause.
>(unlikely)
>
>
>Best regards,
> Maxim Levitsky
>
>
>
>
>
>
>---------------------------------------------------------------
>---------------
>Let Crystal Reports handle the reporting - Free Crystal
>Reports 2008 30-Day
>trial. Simplify your report design, integration and deployment
>- and focus on
>what you do best, core application coding. Discover what's new with
>Crystal Reports now. http://p.sf.net/sfu/bobj-july
>_______________________________________________
>Ipw3945-devel mailing list
>[email protected]
>https://lists.sourceforge.net/lists/listinfo/ipw3945-devel
>

2009-11-23 10:44:40

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 2.6.32] rfkill: fix miscdev ops

On Mon, 2009-11-23 at 11:27 +0100, Johannes Berg wrote:
> The /dev/rfkill ops don't refer to the module,
> so it is possible to unload the module while
> file descriptors are open. Fix this oversight.
>
> Reported-by: Maxim Levitsky <[email protected]>
> Cc: [email protected]
> Signed-off-by: Johannes Berg <[email protected]>
> ---
> net/rfkill/core.c | 1 +
> 1 file changed, 1 insertion(+)
>
> --- wireless-testing.orig/net/rfkill/core.c 2009-11-23 11:11:13.000000000 +0100
> +++ wireless-testing/net/rfkill/core.c 2009-11-23 11:11:59.000000000 +0100
> @@ -1191,6 +1191,7 @@ static long rfkill_fop_ioctl(struct file
> #endif
>
> static const struct file_operations rfkill_fops = {
> + .owner = THIS_MODULE,
> .open = rfkill_fop_open,
> .read = rfkill_fop_read,
> .write = rfkill_fop_write,
>
>

Thanks for the quick patch!, works fine.

Best regards,
Maxim Levitsky


2009-11-23 23:54:27

by Maxim Levitsky

[permalink] [raw]
Subject: RE: [ipw3945-devel] iwl3945 dies on module reload

On Mon, 2009-11-23 at 15:24 -0800, Cahill, Ben M wrote:
>
> >-----Original Message-----
> >From: Maxim Levitsky [mailto:[email protected]]
> >Sent: Monday, November 23, 2009 12:16 PM
> >To: Johannes Berg
> >Cc: linux-wireless; iwlwifi maling list
> >Subject: [ipw3945-devel] iwl3945 dies on module reload
> >
> >When reloading the iwl3945 driver I often get this:
> >
> >
> >[ 830.333386] iwl3945: Intel(R) PRO/Wireless 3945ABG/BG
> >Network Connection driver for Linux, 1.2.26kds
> >[ 830.333407] iwl3945: Copyright(c) 2003-2009 Intel Corporation
> >[ 830.333536] iwl3945 0000:06:00.0: PCI INT A -> GSI 19
> >(level, low) -> IRQ 19
> >[ 830.374979] iwl3945 0000:06:00.0: Tunable channels: 13
> >802.11bg, 0 802.11a channels
> >[ 830.374995] iwl3945 0000:06:00.0: Detected Intel Wireless
> >WiFi Link 3945BG
> >[ 830.375163] iwl3945 0000:06:00.0: irq 32 for MSI/MSI-X
> >[ 830.376859] phy0: Selected rate control algorithm 'iwl-3945-rs'
> >[ 830.427214] iwl3945 0000:06:00.0: firmware: requesting
> >iwlwifi-3945-2.ucode
> >[ 830.526945] iwl3945 0000:06:00.0: loaded firmware version 15.32.2.9
> >[ 830.544236] CE: hpet increasing min_delta_ns to 15000 nsec
> >[ 830.575694] iwl3945 0000:06:00.0: BSM uCode verification
> >failed at addr 0x00003800+0 (of 900), is 0xa5a5a5a2, s/b 0xf802020
> >[ 830.578862] iwl3945 0000:06:00.0: Hardware error detected.
> >Restarting.
> >[ 832.570133] iwl3945 0000:06:00.0: Wait for START_ALIVE
> >timeout after 2000ms.
> >[ 832.622676] iwl3945 0000:06:00.0: BSM uCode verification
> >failed at addr 0x00003800+0 (of 900), is 0xa5a5a5a2, s/b 0xf802020
> >[ 832.625839] iwl3945 0000:06:00.0: Hardware error detected.
> >Restarting.
> >[ 834.620132] iwl3945 0000:06:00.0: Wait for START_ALIVE
> >timeout after 2000ms.
>
> The 0xa5a5a5a2 signature indicates that the device is powered up and the PCI Express link is working, but somehow the device internals have not been powered up / initialized.
>
> Are you using a very recent version of the driver? I've done some work in this area in the past couple of months.


I use -git tip of wireless-testing

Best regards,
Maxim Levitsky


2009-11-23 10:49:24

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 2.6.32] rfkill: fix miscdev ops

On Mon, 2009-11-23 at 12:44 +0200, Maxim Levitsky wrote:
> On Mon, 2009-11-23 at 11:27 +0100, Johannes Berg wrote:
> > The /dev/rfkill ops don't refer to the module,
> > so it is possible to unload the module while
> > file descriptors are open. Fix this oversight.

> Thanks for the quick patch!, works fine.

:)
I'd tested myself in kvm so didn't bother asking, but John can add
Tested-by I guess if you would like.

johannes


Attachments:
signature.asc (801.00 B)
This is a digitally signed message part

2009-11-23 14:12:34

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 2.6.32] rfkill: fix miscdev ops

On Mon, 2009-11-23 at 11:49 +0100, Johannes Berg wrote:
> On Mon, 2009-11-23 at 12:44 +0200, Maxim Levitsky wrote:
> > On Mon, 2009-11-23 at 11:27 +0100, Johannes Berg wrote:
> > > The /dev/rfkill ops don't refer to the module,
> > > so it is possible to unload the module while
> > > file descriptors are open. Fix this oversight.
>
> > Thanks for the quick patch!, works fine.
>
> :)
> I'd tested myself in kvm so didn't bother asking, but John can add
> Tested-by I guess if you would like.
Not important, however I had seen that iwlwifi now chokes on reload,
sometimes with this:


[ 830.333386] iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for Linux, 1.2.26kds
[ 830.333407] iwl3945: Copyright(c) 2003-2009 Intel Corporation
[ 830.333536] iwl3945 0000:06:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 830.374979] iwl3945 0000:06:00.0: Tunable channels: 13 802.11bg, 0 802.11a channels
[ 830.374995] iwl3945 0000:06:00.0: Detected Intel Wireless WiFi Link 3945BG
[ 830.375163] iwl3945 0000:06:00.0: irq 32 for MSI/MSI-X
[ 830.376859] phy0: Selected rate control algorithm 'iwl-3945-rs'
[ 830.427214] iwl3945 0000:06:00.0: firmware: requesting iwlwifi-3945-2.ucode
[ 830.526945] iwl3945 0000:06:00.0: loaded firmware version 15.32.2.9
[ 830.544236] CE: hpet increasing min_delta_ns to 15000 nsec
[ 830.575694] iwl3945 0000:06:00.0: BSM uCode verification failed at addr 0x00003800+0 (of 900), is 0xa5a5a5a2, s/b 0xf802020
[ 830.578862] iwl3945 0000:06:00.0: Hardware error detected. Restarting.
[ 832.570133] iwl3945 0000:06:00.0: Wait for START_ALIVE timeout after 2000ms.
[ 832.622676] iwl3945 0000:06:00.0: BSM uCode verification failed at addr 0x00003800+0 (of 900), is 0xa5a5a5a2, s/b 0xf802020
[ 832.625839] iwl3945 0000:06:00.0: Hardware error detected. Restarting.
[ 834.620132] iwl3945 0000:06:00.0: Wait for START_ALIVE timeout after 2000ms.


Note that I used antenna=2 parameter to experiment with quality of reception of both antennas
(this feature strangely works just fine....)

I disabled this option, so next time I feel like rebooting my system, I see if this option was the cause.
(unlikely)


Best regards,
Maxim Levitsky