2008-02-05 22:00:35

by Pavel Machek

Subject: ipw3945: not only it periodically dies, it also BUG()s

Hi!

Under not-even-high load, it periodically restarts:

Feb 5 21:08:50 amd kernel: iwl3945: Microcode SW error detected. Restarting 0x82000008.
Feb 5 21:08:52 amd kernel: iwl3945: Can't stop Rx DMA.
Feb 5 21:12:51 amd kernel: iwl3945: Microcode SW error detected. Restarting 0x82000008.
Feb 5 21:12:53 amd kernel: iwl3945: Can't stop Rx DMA.
Feb 5 21:21:44 amd kernel: iwl3945: Microcode SW error detected. Restarting 0x82000008.
Feb 5 21:21:46 amd kernel: iwl3945: Can't stop Rx DMA.
Feb 5 21:27:32 amd kernel: iwl3945: Microcode SW error detected. Restarting 0x82000008.
Feb 5 21:27:34 amd kernel: iwl3945: Can't stop Rx DMA.
Feb 5 21:41:29 amd -- MARK --
Feb 5 22:01:29 amd -- MARK --
Feb 5 22:09:11 amd kernel: iwl3945: Microcode SW error detected. Restarting 0x82000008.
Feb 5 22:09:12 amd kernel: iwl3945: Can't stop Rx DMA.

...I've reported this before, with full debugging. Not sure if
anything happened.

Now, I got BUG() in iwl3945-base.c: 3824

static void iwl3945_tx_cmd_complete(struct iwl3945_priv *priv,
                                    struct iwl3945_rx_mem_buffer *rxb)
{
        struct iwl3945_rx_packet *pkt = (struct iwl3945_rx_packet *)rxb->skb->data;
        u16 sequence = le16_to_cpu(pkt->hdr.sequence);
        int txq_id = SEQ_TO_QUEUE(sequence);
        int index = SEQ_TO_INDEX(sequence);
        int huge = sequence & SEQ_HUGE_FRAME;
        int cmd_index;
        struct iwl3945_cmd *cmd;

        /* If a Tx command is being handled and it isn't in the actual
         * command queue then there a command routing bug has been introduced
         * in the queue management code. */
        if (txq_id != IWL_CMD_QUEUE_NUM)
                IWL_ERROR("Error wrong command queue %d command id 0x%X\n",
                          txq_id, pkt->hdr.cmd);
        BUG_ON(txq_id != IWL_CMD_QUEUE_NUM);
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here. Any ideas?
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
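
A note on the check quoted in the message above: the handler splits the 16-bit sequence field into a queue number, a slot index and a huge-frame flag, then BUGs if the queue is not the command queue. The sketch below reproduces that decoding in plain userspace C. The macro values and IWL_CMD_QUEUE_NUM are assumptions modelled on iwl3945 headers of that period and are not quoted anywhere in this thread, so check them against your own tree. The names seq_fields, decode_sequence and is_command_queue are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED definitions, not taken from this thread: verify against the
 * iwl3945 headers in the kernel tree you are actually running. */
#define SEQ_TO_QUEUE(x)   (((x) >> 8) & 0xbf)
#define SEQ_TO_INDEX(x)   ((x) & 0xff)
#define SEQ_HUGE_FRAME    0x4000
#define IWL_CMD_QUEUE_NUM 4

/* Hypothetical container for the three fields the handler extracts. */
struct seq_fields {
    int txq_id; /* Tx queue the reply claims to belong to */
    int index;  /* slot within that queue */
    int huge;   /* 1 if the huge-frame bit is set, else 0 */
};

/* Split a host-order sequence value the same way the quoted code does. */
void decode_sequence(uint16_t sequence, struct seq_fields *out)
{
    assert(out != NULL);
    out->txq_id = SEQ_TO_QUEUE(sequence);
    out->index = SEQ_TO_INDEX(sequence);
    out->huge = (sequence & SEQ_HUGE_FRAME) != 0;
}

/* Returns 1 when the decoded queue is the command queue, i.e. the case
 * in which the quoted BUG_ON() would not fire. */
int is_command_queue(uint16_t sequence)
{
    return SEQ_TO_QUEUE(sequence) == IWL_CMD_QUEUE_NUM;
}
```

Under these assumed definitions, a sequence value of 0x4405 decodes to queue 4, index 5, with the huge-frame flag set. A value such as 0x0205 decodes to queue 2 and would hit the BUG_ON().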


2008-02-06 14:33:25

by Pavel Machek

Subject: Re: ipw3945: not only it periodically dies, it also BUG()s

On Tue 2008-02-05 18:20:58, Chatre, Reinette wrote:
> On Tuesday, February 05, 2008 1:45 PM, Pavel Machek wrote:
>
> >
> > ...I've reported this before, with full debugging. Not sure if
> > anything happened.
>
> Could you please point me to where you have reported it before?

From [email protected] Wed Oct 31 01:52:02 2007
From: Pavel Machek <[email protected]>
To: [email protected],
kernel list <[email protected]>,
[email protected], [email protected]
Subject: iwl3945 in 2.6.24-rc1 dies under load
X-Warning: Reading this can be dangerous to your mental health.

...and thread that resulted.

> > Now, I got BUG() in iwl3945-base.c: 3824
>
> Which driver and kernel are you using?

I'm using the default driver present in 2.6.24-git; I did git
checkout, this is the newest commit I have from Linus:

commit ae9458d6a0956aa21cb49e1251e35a8d4dacbe6e
tree 98c162c79113bc2bd748a3ad5b6fb5ba66139751
parent 63e9b66e29357dd12e8b1d3ebf7036e7591f81e3
parent e91926e9ea9073d8ce95b74602e8c2d775f5a793
author Linus Torvalds <[email protected]> Sat, 02 Feb 2008 15:13:05 +1100
committer Linus Torvalds <[email protected]> Sat, 02 Feb 2008 15:13:05 +1100

Merge git://git.infradead.org/battery-2.6

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (1.39 kB)
delme (21.66 kB)

2008-02-19 23:48:03

by Pavel Machek

Subject: Re: ipw3945: not only it periodically dies, it also BUG()s

Hi!

> > unfortunately the link here does not work.
>
> The instructions on how to report a bug have been updated to address the
> above issues. Thank you very much for helping us to make it better.

Thanks.

One more nit:

[email protected] seems like a normal bugzilla, where you can
hit the reply button and control it by email... but that does not
actually work.

It would be nice to fix that. (Or at least make mail appear from
[email protected], so that it is plain to see it does not work).

It seems that the iwl3945 problem is somehow linked to
GROUP_SCHED... perhaps iwl cannot handle the latencies it introduces?


Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-02-07 22:20:24

by Pavel Machek

Subject: iwl3945: not only it periodically dies, it also BUG()s, and oopses

> Could you please create a new bug in our bug tracking system
> (http://www.bughost.org) to enable us to track this problem? Please include the
> relevant information from the thread as well as the information you
> discovered recently.

I'm connected over that iwl, so filing web form is not exactly
easy. Another day, another problem, today it oopsed:

I guess I should not have tried to fix iwl3945 by rmmoding, oh
well. Kernel is reasonably recent 2.6.25-rc0.
Pavel

iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for Linux, 1.2.23ks
iwl3945: Copyright(c) 2003-2007 Intel Corporation
PCI: Setting latency timer of device 0000:03:00.0 to 64
iwl3945: Detected Intel PRO/Wireless 3945ABG Network Connection
iwl3945: Tunable channels: 11 802.11bg, 13 802.11a channels
PM: Adding info for No Bus:phy0
PM: Adding info for No Bus:wmaster0
phy0: Selected rate control algorithm 'iwl-3945-rs'
PM: Adding info for No Bus:wlan0
ACPI: PCI interrupt for device 0000:03:00.0 disabled
PM: Writing back config space on device 0000:03:00.0 at offset 1 (was 100102, writing 100106)
PM: Adding info for No Bus:firmware-0000:03:00
PM: Removing info for No Bus:firmware-0000:03:00
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:11:2f:0e:95:a0
wlan0: RX authentication from 00:11:2f:0e:95:a0 (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:11:2f:0e:95:a0
wlan0: RX AssocResp from 00:11:2f:0e:95:a0 (capab=0x401 status=0 aid=1)
wlan0: associated
Clocksource tsc unstable (delta = -65126761 ns)
usb 4-1: new full speed USB device using uhci_hcd and address 2
PM: Adding info for usb:4-1
PM: Adding info for No Bus:usbdev4.2_ep00
usb 4-1: configuration #1 chosen from 2 choices
PM: Adding info for usb:4-1:1.0
PM: Adding info for No Bus:usb0
usb0: register 'cdc_ether' at usb-0000:00:1d.2-1, CDC Ethernet Device, 5a:fb:9e:20:17:56
PM: Adding info for No Bus:usbdev4.2_ep83
PM: Adding info for usb:4-1:1.1
PM: Adding info for No Bus:usbdev4.2_ep81
PM: Adding info for No Bus:usbdev4.2_ep02
PM: Adding info for No Bus:usbdev4.2
usb 4-1: New USB device found, idVendor=1457, idProduct=5122
usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 4-1: Product: RNDIS/Ethernet Gadget
usb 4-1: Manufacturer: Linux 2.6.22.5-moko11/s3c2410_udc
acpiphp_glue: cannot get bridge info
usb 4-1: USB disconnect, address 2
PM: Removing info for No Bus:usbdev4.2_ep83
PM: Removing info for usb:4-1:1.0
usb0: unregister 'cdc_ether' usb-0000:00:1d.2-1, CDC Ethernet Device
PM: Removing info for No Bus:usb0
PM: Removing info for No Bus:usbdev4.2_ep81
PM: Removing info for No Bus:usbdev4.2_ep02
PM: Removing info for usb:4-1:1.1
PM: Removing info for usb:4-1
PM: Removing info for No Bus:usbdev4.2
PM: Removing info for No Bus:usbdev4.2_ep00
acpiphp_glue: cannot get bridge info
iwl3945: Microcode SW error detected. Restarting 0x82000008.
iwl3945: TODO: Implement Tx ABORT REQUIRED!!!
iwl3945: Can't stop Rx DMA.
wlan0: RX deauthentication from 00:11:2f:0e:95:a0 (reason=4)
wlan0: deauthenticated
wlan0: authenticate with AP 00:11:2f:0e:95:a0
wlan0: RX authentication from 00:11:2f:0e:95:a0 (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:11:2f:0e:95:a0
wlan0: RX ReassocResp from 00:11:2f:0e:95:a0 (capab=0x401 status=0 aid=1)
wlan0: associated
iwl3945: Microcode SW error detected. Restarting 0x82000008.
iwl3945: Can't stop Rx DMA.
wlan0: RX deauthentication from 00:11:2f:0e:95:a0 (reason=4)
wlan0: deauthenticated
wlan0: authenticate with AP 00:11:2f:0e:95:a0
wlan0: RX authentication from 00:11:2f:0e:95:a0 (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:11:2f:0e:95:a0
wlan0: RX ReassocResp from 00:11:2f:0e:95:a0 (capab=0x401 status=0 aid=1)
wlan0: associated
ACPI: PCI interrupt for device 0000:03:00.0 disabled
PM: Removing info for No Bus:wlan0
PM: Removing info for No Bus:wmaster0
PM: Removing info for No Bus:phy0
iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for Linux, 1.2.23ks
iwl3945: Copyright(c) 2003-2007 Intel Corporation
PCI: Setting latency timer of device 0000:03:00.0 to 64
iwl3945: Detected Intel PRO/Wireless 3945ABG Network Connection
BUG: unable to handle kernel NULL pointer dereference at 00000109
IP: [<c0208063>] dma_alloc_coherent+0x43/0x100
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in: iwl3945(+) [last unloaded: iwl3945]

Pid: 11189, comm: insmod Not tainted (2.6.24 #109)
EIP: 0060:[<c0208063>] EFLAGS: 00010202 CPU: 0
EIP is at dma_alloc_coherent+0x43/0x100
EAX: 00000000 EBX: 00000000 ECX: f338c70c EDX: 0000002c
ESI: 0000002c EDI: f338c710 EBP: 00000101 ESP: d9debd64
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process insmod (pid: 11189, ti=d9dea000 task=d9d25980 task.ti=d9dea000)
Stack: 00000000 f338c70c f7d5fb24 f3388e60 f3388e60 f338c710 f7d5fad0 f8c4b71b
00000020 00000000 00000000 f8c482df f8c501f4 f8c4ee4e f78f9630 c02b13f4
c02b10e1 f3351a88 f7d45530 c02b1134 c02b11fd f3351a88 f3351b18 00000000
Call Trace:
[<f8c4b71b>] iwl3945_hw_set_hw_setting+0x3b/0xc0 [iwl3945]
[<f8c482df>] iwl3945_pci_probe+0x24f/0xcb0 [iwl3945]
[<c02b13f4>] sysfs_addrm_finish+0x34/0x1c0
[<c02b10e1>] sysfs_find_dirent+0x21/0x30
[<c02b1134>] sysfs_add_one+0x44/0xa0
[<c02b11fd>] sysfs_addrm_start+0x6d/0xb0
[<c02b1f4b>] sysfs_create_link+0x8b/0xf0
[<c03856c0>] pci_match_device+0x10/0xa0
[<c0385846>] pci_device_probe+0x56/0x80
[<c042fa08>] driver_probe_device+0x88/0x170
[<c0719033>] klist_next+0x53/0xa0
[<c042fc3a>] __driver_attach+0x7a/0x80
[<c042edba>] bus_for_each_dev+0x3a/0x60
[<c03857f0>] pci_device_probe+0x0/0x80
[<c042f886>] driver_attach+0x16/0x20
[<c042fbc0>] __driver_attach+0x0/0x80
[<c042f63d>] bus_add_driver+0xbd/0x220
[<c0270cc1>] cache_free_debugcheck+0xd1/0x220
[<c0385790>] pci_device_remove+0x0/0x40
[<c03857f0>] pci_device_probe+0x0/0x80
[<c042fdcb>] driver_register+0x3b/0xf0
[<c0248f21>] sys_init_module+0x1521/0x1a60
[<c0385a5d>] __pci_register_driver+0x3d/0x80
[<f8b9e030>] iwl3945_init+0x30/0x49 [iwl3945]
[<c0247b3e>] sys_init_module+0x13e/0x1a60
[<c02750a5>] do_sync_read+0xd5/0x120
[<c07042d0>] ieee80211_rx_irqsafe+0x0/0x80
[<c0273607>] filp_close+0x47/0x80
[<c0203e8e>] syscall_call+0x7/0xb
=======================
Code: 14 89 44 24 08 89 4c 24 04 74 06 8b a8 00 01 00 00 8d 46 ff bb ff ff ff ff c1 e8 0b 89 04 24 83 c3 01 d1 2c 24 75 f8 85 ed 74 5d <8b> 55 08 89 d9 8b 45 10 e8 70 0c 17 00 85 c0 78 44 89 c2 8b 45
EIP: [<c0208063>] dma_alloc_coherent+0x43/0x100 SS:ESP 0068:d9debd64
---[ end trace 96f01332a9244198 ]---


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
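
The fault address in the oops above can be cross-checked by hand. The byte marked <8b> in the Code: line, followed by 55 08, is a 32-bit MOV whose ModRM byte 0x55 selects [EBP + disp8] as the source and EDX as the destination, so the access goes to EBP + 8. With EBP = 00000101 from the register dump, that gives 0x109, which matches the reported address. Below is a minimal sketch of that arithmetic in userspace C. The struct and function names are hypothetical, and only the mod == 1 form (base register plus 8-bit displacement) is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of an x86 ModRM byte plus the 8-bit displacement that follows it. */
struct modrm_disp8 {
    unsigned mod; /* addressing mode, bits 7..6 */
    unsigned reg; /* register operand, bits 5..3 (2 = EDX) */
    unsigned rm;  /* base register, bits 2..0 (5 = EBP when mod == 1) */
    int8_t disp;  /* signed 8-bit displacement */
};

/* Split a ModRM byte and keep its displacement byte alongside it. */
void decode_modrm_disp8(uint8_t modrm, uint8_t disp, struct modrm_disp8 *out)
{
    assert(out != NULL);
    out->mod = (modrm >> 6) & 0x3;
    out->reg = (modrm >> 3) & 0x7;
    out->rm = modrm & 0x7;
    out->disp = (int8_t)disp;
}

/* Effective address for the [base + disp8] form (mod == 1), using
 * 32-bit wrap-around arithmetic as the CPU does. */
uint32_t effective_address(uint32_t base, int8_t disp)
{
    return base + (uint32_t)(int32_t)disp;
}
```

Decoding 0x55 gives mod 1, reg 2 (EDX) and rm 5 (EBP), and effective_address(0x101, 8) comes out as 0x109. So the address in the oops is a small offset from a bogus base of 0x101 rather than from a pointer that was exactly zero.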

2008-02-06 17:19:08

by Reinette Chatre

Subject: RE: ipw3945: not only it periodically dies, it also BUG()s

On Wednesday, February 06, 2008 6:32 AM, Pavel Machek wrote:

> On Tue 2008-02-05 18:20:58, Chatre, Reinette wrote:
>> On Tuesday, February 05, 2008 1:45 PM, Pavel Machek wrote:
>>
>>>
>>> ...I've reported this before, with full debugging. Not sure if
>>> anything happened.
>>
>> Could you please point me to where you have reported it before?
>
> From [email protected] Wed Oct 31 01:52:02 2007
> From: Pavel Machek <[email protected]>
> To: [email protected],
> kernel list <[email protected]>,
> [email protected], [email protected]
> Subject: iwl3945 in 2.6.24-rc1 dies under load
> X-Warning: Reading this can be dangerous to your mental health.
>
> ...and thread that resulted.

Could you please create a new bug in our bug tracking system
(http://www.bughost.org) to enable us to track this problem? Please include the
relevant information from the thread as well as the information you
discovered recently.

Thank you very much

Reinette


2008-02-06 21:00:14

by Pavel Machek

Subject: Re: ipw3945: not only it periodically dies, it also BUG()s

Hi!

> >>> ...I've reported this before, with full debugging. Not sure if
> >>> anything happened.
> >>
> >> Could you please point me to where you have reported it before?
> >
> > From [email protected] Wed Oct 31 01:52:02 2007
> > From: Pavel Machek <[email protected]>
> > To: [email protected],
> > kernel list <[email protected]>,
> > [email protected], [email protected]
> > Subject: iwl3945 in 2.6.24-rc1 dies under load
> > X-Warning: Reading this can be dangerous to your mental health.
> >
> > ...and thread that resulted.
>
> Could you please create a new bug in our bug tracking system
> (http://www.bughost.org) to enable us to track this problem? Please include the
> relevant information from the thread as well as the information you
> discovered recently.

Hmmm... bugzilla says:

* Exact steps to reproduce
* Reproducibility of bug (e.g. intermittent or 100% reproducible)
* Did this problem not exist in previous version of the driver?
* kernel version
* AP brand/model
* dmesg output at debug level 0x43fff
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
it would be nice to specify how to do this. It is
insmod parameter, right?

* Type of security (if any) you are using (e.g. WEP64, WEP128, WPA, WPA2, 802.1x, etc)
* Version of firmware
* Version of the ieee80211 module
* Proximity to the AP

* Before reporting any firmware errors, please be sure to read Ben
Cahill's mailing list post on how to most effectively report such
bugs.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
unfortunately the link here does not work.

BTW, why not use kernel.org bugzilla? Having to create another account
is nasty...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-02-06 02:27:50

by Reinette Chatre

Subject: RE: ipw3945: not only it periodically dies, it also BUG()s

On Tuesday, February 05, 2008 1:45 PM, Pavel Machek wrote:

>
> ...I've reported this before, with full debugging. Not sure if
> anything happened.

Could you please point me to where you have reported it before?

> Now, I got BUG() in iwl3945-base.c: 3824

Which driver and kernel are you using?

Reinette


2008-02-15 22:41:26

by Reinette Chatre

Subject: RE: ipw3945: not only it periodically dies, it also BUG()s

On Wednesday, February 06, 2008 1:00 PM, Pavel Machek wrote:

> Hmmm... bugzilla says:
>
> * Exact steps to reproduce
> * Reproducibility of bug (e.g. intermittent or 100% reproducible)
> * Did this problem not exist in previous version of the driver?
> * kernel version
> * AP brand/model
> * dmesg output at debug level 0x43fff
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> it would be nice to specify how to do this. It is
> insmod parameter, right?
>
> * Type of security (if any) you are using (e.g. WEP64,
> WEP128, WPA, WPA2, 802.1x, etc)
> * Version of firmware
> * Version of the ieee80211 module
> * Proximity to the AP
>
> * Before reporting any firmware errors, please be sure to read Ben
> Cahill's mailing list post on how to most effectively report such
> bugs.
>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> unfortunately the link here does not work.

The instructions on how to report a bug have been updated to address the
above issues. Thank you very much for helping us to make it better.

Reinette

2008-02-05 23:10:00

by Pavel Machek

Subject: Re: ipw3945: not only it periodically dies, it also BUG()s

On Tue 2008-02-05 22:44:41, Pavel Machek wrote:
> Hi!
>
> Under not-even-high load, it periodically restarts:
>
> Feb 5 21:08:50 amd kernel: iwl3945: Microcode SW error detected. Restarting 0x82000008.
> Feb 5 21:08:52 amd kernel: iwl3945: Can't stop Rx DMA.
> Feb 5 21:12:51 amd kernel: iwl3945: Microcode SW error detected. Restarting 0x82000008.
> Feb 5 21:12:53 amd kernel: iwl3945: Can't stop Rx DMA.
> Feb 5 21:21:44 amd kernel: iwl3945: Microcode SW error detected. Restarting 0x82000008.
> Feb 5 21:21:46 amd kernel: iwl3945: Can't stop Rx DMA.
> Feb 5 21:27:32 amd kernel: iwl3945: Microcode SW error detected. Restarting 0x82000008.
> Feb 5 21:27:34 amd kernel: iwl3945: Can't stop Rx DMA.
> Feb 5 21:41:29 amd -- MARK --
> Feb 5 22:01:29 amd -- MARK --
> Feb 5 22:09:11 amd kernel: iwl3945: Microcode SW error detected. Restarting 0x82000008.
> Feb 5 22:09:12 amd kernel: iwl3945: Can't stop Rx DMA.
>

Now it decided to be original:

Feb 5 23:55:39 amd kernel: ACPI: EC: missing write data confirmation, don't expect it any longer.
Feb 5 23:57:37 amd kernel: iwl3945: Microcode SW error detected. Restarting 0x82000008.
Feb 5 23:57:37 amd kernel: iwl3945: TODO: Implement Tx ABORT REQUIRED!!!
Feb 5 23:57:39 amd kernel: iwl3945: Can't stop Rx DMA.

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-02-06 23:09:16

by Reinette Chatre

Subject: RE: ipw3945: not only it periodically dies, it also BUG()s

On Wednesday, February 06, 2008 1:00 PM, Pavel Machek wrote:

> * dmesg output at debug level 0x43fff
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> it would be nice to specify how to do this. It is
> insmod parameter, right?

Correct. You can do this as follows:
$ insmod /path/to/iwl3945.ko debug=0x43fff
or
$ modprobe iwl3945 debug=0x43fff

> * Before reporting any firmware errors, please be sure to read Ben
> Cahill's mailing list post on how to most effectively report such
> bugs.
>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> unfortunately the link here does not work.

We'll try to dig out the instructions from another location and update
that link. Thanks for letting us know.

>
> BTW, why not use kernel.org bugzilla? Having to create another
> account is nasty...

Users can report iwlwifi bugs in many locations ... their OSV's bug
tracker (which could end up being many) as well as the kernel.org
bugzilla. We focus on bugs in the bughost.org system.

Reinette