Hi,
Firstly, I'm not subscribed, so please cc me into any reply.
I've found that I can cause my laptop (DEll Latitude C610) to crash silently
(other than blinking LEDs) with stock 2.6.24-rc8{,-git2,-git3,-git4} kernels.
I suspect the problem is related to the rt61pci driver, so I've cc'd
linux-wireless.
All I have to do to cause the crash is run the following script on the laptop
#!/bin/sh
for i in `seq 1 100`; do
echo $i > itercount
rm -f linux-2.6.23.tar.bz2
wget ftp://192.168.1.10/pub/linux-2.6.23.tar.bz2
done
The crash will occur usually during the first iteration (and often in the
first 12 megabytes), but never later than the second.
I suspect the rt61pci driver because if I down wlan0 and connect to my network
via eth0, the laptop never crashes (and I've let it run for 50 or so
iterations). Similarly, if I build and install -git4 without the in-tree
rt61pci driver and build and install the out-of-tree rt61 driver from
serialmonkey.com (20071117 snapshot), the script runs fine, although I admit
that I stopped it after eight iterations.
My config, the output, from lspci -vv and the contents of syslog from a crash
are attached. The wireless adapter is a Belkin F5D7010.
Other than maybe a little later on tonight, I won't have much time to give
this further attention over the next couple of days, but I thought it best to
give a heads-up now.
Thanks in advance
Chris
--
Beer is proof that God loves us and wants us to be happy - Benjamin Franklin
On Wednesday 23 January 2008, Chris Clayton wrote:
> Hi again.
>
> On Tuesday 22 January 2008, Chris Clayton wrote:
> > Hi,
and here's another, slightly more complete, one, this time from 2.6.24-rc8-git6
(the none I sent earlier was from -git4, by the way)
Call Trace:
[<c011440a>] __update_rq_clock+0x1a/0xf0
[<e08ff59d>] rt2x00lib_txdone+0x9d/0xd0 [rt2x00lib]
[<e0909573>] rt61pci_txdone+0x153/0x1f0 [rt61pci]
[<e09096ad>] rt61pci_interrupt+0x9d/0xb0 [rt61pci]
[<c0140dc7>] handle_IRQ_event+0x27/0x60
[<c0141fbb>] handle_level_irq+0x6b/0x100
[<c014if50>] handle_level_irq+0x0/0x100
[<c0105db8>] do_IRQ+0x68/0xc0
[<c011440a>] __update_rq_clock+0x1a/0xf0
[<c01042f3>] common_interrupt+0x23/0x28
[<c014007b>] encode_comp2_t+0x4b/0x80
[<c011007b>] acpi_copy_wakeup_routine+0x2b/0x9c
[<c02161fb>] acpi_processor_idle+0x25b/0x36e
[<c01020dc>] cpu_idle+0x5c/0x80
[<c040381b>] start_kernel+0x18b/0x1d0
[<c0403360>] unknown_bootoption+0x0/0x150
========================
Code: 00 00 00 e9 6c ff ff ff 8d b6 00 00 00 00 8d bf 00 00 00 00 55 89 cd 57 56
53 83 ec 10 89 d3 89 44 24 08 ba 20 00 00 00 8b 40 58 <89> 43 14 a1 10 66 3e c0
e8 6c f9 82 df 89 44 24 0c 85 c0 89 c7
EIP: [<e0930eb7>] ieee80211_tx_status_irqsafe+0x17/0x120 [mac80211] SS:ESP 0068:
c042ff4c
Kernel panic - not syncing: Fatal exception in interrupt
I'll stop crashing my laptop now :)
Hope this too helps.
>
> I've now got the part of the panic message that doesn't scroll off-screen:
>
> [<c01f870e>] acpi_ds_exec_end_op+0xc1/0x387
> [<c020757d>] acpi_ps_get_arguments+0xa9+0x12d
> [<c0207b3a>] acpi_ps_parse_loop+0x259/0x2b7
> [<c0207052>] acpi_ps_parse_aml_walk+0xb3/0x10e
> [<c02080e3>] acpi_ds_init_aml_walk+0xb3/0c10e
> [<c02080e3>] acpi_ps_execute_method+0xaf/0xe5
> [<c020530b>] acpi_ns_evaluate+0x97/0xec
> [<c0204c66>] acpi_evaluate_object+0x136/0x1da
> [<c01f6ebd>] acpi_evaluate_integer+0x7a/0xb8
> [<c0217a81>] acpi_thermal_get_temprature+0x29/0x3d
> [<c0218498>] acpi_thermal_temp_seq_show+0x15/0x58
> [<c017d27a>] seq_read+0xda/0x2a0
> [<c017d1a0>] seq_read+0x0/0x2a0
> [<c01943ac>] proc_reg_read+0x9c/0xb0
> [<c016359c>] vfs_read+0x9c/0xd0
> [<c01637e7>] sys_read+0x47/0x80
> [<c010414a>] syscall_call+0x7/0xb
> =======================
> Code: 00 00 00 e9 6c ff ff ff 8d b6 00 00 00 00 8d bf 00 00 00 00 55 89 cd 57 56
> 53 83 ec 10 89 d3 89 44 24 08 ba 20 00 00 00 8b 40 58 <89> 42 14 a1 10 66 3e c0
> e8 bc 89 83 df 89 44 24 0c 85 c0 89 c7
> EIP: [<e0927eb7>] ieee80211_tx_status_irqsafe+0x17/0x120 [mac80211] SS:ESP 0068:
> c042ff4c
> Kernel panic - not syncing: fatal exception in interrupt
>
> Hope this helps. Feel free to send me any patches that might fix or help diagnose.
>
> Chris
> >
> > Firstly, I'm not subscribed, so please cc me into any reply.
> >
> > I've found that I can cause my laptop (DEll Latitude C610) to crash silently
> > (other than blinking LEDs) with stock 2.6.24-rc8{,-git2,-git3,-git4} kernels.
> > I suspect the problem is related to the rt61pci driver, so I've cc'd
> > linux-wireless.
> >
> > All I have to do to cause the crash is run the following script on the laptop
> >
> > #!/bin/sh
> > for i in `seq 1 100`; do
> > echo $i > itercount
> > rm -f linux-2.6.23.tar.bz2
> > wget ftp://192.168.1.10/pub/linux-2.6.23.tar.bz2
> > done
> >
> > The crash will occur usually during the first iteration (and often in the
> > first 12 megabytes), but never later than the second.
> >
> > I suspect the rt61pci driver because if I down wlan0 and connect to my network
> > via eth0, the laptop never crashes (and I've let it run for 50 or so
> > iterations). Similarly, if I build and install -git4 without the in-tree
> > rt61pci driver and build and install the out-of-tree rt61 driver from
> > serialmonkey.com (20071117 snapshot), the script runs fine, although I admit
> > that I stopped it after eight iterations.
> >
> > My config, the output, from lspci -vv and the contents of syslog from a crash
> > are attached. The wireless adapter is a Belkin F5D7010.
> >
> > Other than maybe a little later on tonight, I won't have much time to give
> > this further attention over the next couple of days, but I thought it best to
> > give a heads-up now.
> >
> > Thanks in advance
> >
> > Chris
> >
>
--
Beer is proof that God loves us and wants us to be happy - Benjamin Franklin
On Wed, Jan 23, 2008 at 04:42:58PM +0000, Chris Clayton wrote:
> On Wednesday 23 January 2008, Chris Clayton wrote:
> > Hi again.
> >
> > On Tuesday 22 January 2008, Chris Clayton wrote:
> > > Hi,
>
> and here's another, slightly more complete, one, this time from 2.6.24-rc8-git6
> (the none I sent earlier was from -git4, by the way)
>
> Call Trace:
> [<c011440a>] __update_rq_clock+0x1a/0xf0
> [<e08ff59d>] rt2x00lib_txdone+0x9d/0xd0 [rt2x00lib]
> [<e0909573>] rt61pci_txdone+0x153/0x1f0 [rt61pci]
> [<e09096ad>] rt61pci_interrupt+0x9d/0xb0 [rt61pci]
I suspect this could relate to this commit:
commit 62bc060b8ed5fcdafd87da5ab17bdd59a39ebcc9
Author: Mattias Nissler <[email protected]>
Date: Mon Nov 12 15:03:12 2007 +0100
rt2x00: Allow rt61 to catch up after a missing tx report
Could you revert that patch, rebuild, and try to recreate the problem?
Thanks!
John
--
John W. Linville
[email protected]
Thanks for the reply, John.
On Wednesday 23 January 2008, John W. Linville wrote:
> On Wed, Jan 23, 2008 at 04:42:58PM +0000, Chris Clayton wrote:
> > On Wednesday 23 January 2008, Chris Clayton wrote:
> > > Hi again.
> > >
> > > On Tuesday 22 January 2008, Chris Clayton wrote:
> > > > Hi,
> >
> > and here's another, slightly more complete, one, this time from 2.6.24-rc8-git6
> > (the none I sent earlier was from -git4, by the way)
> >
> > Call Trace:
> > [<c011440a>] __update_rq_clock+0x1a/0xf0
> > [<e08ff59d>] rt2x00lib_txdone+0x9d/0xd0 [rt2x00lib]
> > [<e0909573>] rt61pci_txdone+0x153/0x1f0 [rt61pci]
> > [<e09096ad>] rt61pci_interrupt+0x9d/0xb0 [rt61pci]
>
> I suspect this could relate to this commit:
>
> commit 62bc060b8ed5fcdafd87da5ab17bdd59a39ebcc9
> Author: Mattias Nissler <[email protected]>
> Date: Mon Nov 12 15:03:12 2007 +0100
>
> rt2x00: Allow rt61 to catch up after a missing tx report
>
> Could you revert that patch, rebuild, and try to recreate the problem?
>
I've reverted the patch by hand - the original patch on linux-wireless doesn't match the present state of the code
in rt61pci.c and I'm not a git user. I'm pretty sure I've got it right, however.
Problem now is that we are back where we were a few weeks ago because my wireless connection dies
almost immediately after I start the download and from dmesg I see:
phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 2.
Please file bug report to http://rt2x00.serialmonkey.com.
Chris
> Thanks!
>
> John
--
Beer is proof that God loves us and wants us to be happy - Benjamin Franklin
On Thursday 24 January 2008, Chris Clayton wrote:
> Thanks for the reply, John.
>
> On Wednesday 23 January 2008, John W. Linville wrote:
> > On Wed, Jan 23, 2008 at 04:42:58PM +0000, Chris Clayton wrote:
> > > On Wednesday 23 January 2008, Chris Clayton wrote:
> > > > Hi again.
> > > >
> > > > On Tuesday 22 January 2008, Chris Clayton wrote:
> > > > > Hi,
> > >
> > > and here's another, slightly more complete, one, this time from 2.6.24-rc8-git6
> > > (the none I sent earlier was from -git4, by the way)
> > >
> > > Call Trace:
> > > [<c011440a>] __update_rq_clock+0x1a/0xf0
> > > [<e08ff59d>] rt2x00lib_txdone+0x9d/0xd0 [rt2x00lib]
> > > [<e0909573>] rt61pci_txdone+0x153/0x1f0 [rt61pci]
> > > [<e09096ad>] rt61pci_interrupt+0x9d/0xb0 [rt61pci]
> >
> > I suspect this could relate to this commit:
> >
> > commit 62bc060b8ed5fcdafd87da5ab17bdd59a39ebcc9
> > Author: Mattias Nissler <[email protected]>
> > Date: Mon Nov 12 15:03:12 2007 +0100
> >
> > rt2x00: Allow rt61 to catch up after a missing tx report
> >
> > Could you revert that patch, rebuild, and try to recreate the problem?
> >
>
> I've reverted the patch by hand - the original patch on linux-wireless doesn't match the present state of the code
> in rt61pci.c and I'm not a git user. I'm pretty sure I've got it right, however.
>
> Problem now is that we are back where we were a few weeks ago because my wireless connection dies
> almost immediately after I start the download and from dmesg I see:
>
Doh! Of course, I can get the patch that needs to be reverted from Linus' git repository, so just
to be certain my reversion-by-hand was, in fact, OK and not the source of the problem, I've done
just that, and...
> phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 2.
> Please file bug report to http://rt2x00.serialmonkey.com.
>
... this "non-free entry" still appears within a few seconds on any non-trivial network activity starting.
> Chris
>
> > Thanks!
> >
> > John
>
>
>
--
Beer is proof that God loves us and wants us to be happy - Benjamin Franklin
On Thu, 2008-01-24 at 00:03 +0000, Chris Clayton wrote:
> Thanks for the reply, John.
>
> On Wednesday 23 January 2008, John W. Linville wrote:
> > On Wed, Jan 23, 2008 at 04:42:58PM +0000, Chris Clayton wrote:
> > > On Wednesday 23 January 2008, Chris Clayton wrote:
> > > > Hi again.
> > > >
> > > > On Tuesday 22 January 2008, Chris Clayton wrote:
> > > > > Hi,
> > >
> > > and here's another, slightly more complete, one, this time from 2.6.24-rc8-git6
> > > (the none I sent earlier was from -git4, by the way)
> > >
> > > Call Trace:
> > > [<c011440a>] __update_rq_clock+0x1a/0xf0
> > > [<e08ff59d>] rt2x00lib_txdone+0x9d/0xd0 [rt2x00lib]
> > > [<e0909573>] rt61pci_txdone+0x153/0x1f0 [rt61pci]
> > > [<e09096ad>] rt61pci_interrupt+0x9d/0xb0 [rt61pci]
> >
> > I suspect this could relate to this commit:
> >
> > commit 62bc060b8ed5fcdafd87da5ab17bdd59a39ebcc9
> > Author: Mattias Nissler <[email protected]>
> > Date: Mon Nov 12 15:03:12 2007 +0100
> >
> > rt2x00: Allow rt61 to catch up after a missing tx report
> >
> > Could you revert that patch, rebuild, and try to recreate the problem?
> >
>
> I've reverted the patch by hand - the original patch on linux-wireless doesn't match the present state of the code
> in rt61pci.c and I'm not a git user. I'm pretty sure I've got it right, however.
>
> Problem now is that we are back where we were a few weeks ago because my wireless connection dies
> almost immediately after I start the download and from dmesg I see:
>
> phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 2.
> Please file bug report to http://rt2x00.serialmonkey.com.
As the patches author, I'm not very surprised :-) The patch just adds
additional processing that should have happened in a different call of
the interrupt handler. I don't think it can be related at all to the
trouble you're seeing.
When I find time, I'll try the Linus tree on my rt61 to see what
happens.
Mattias
On Thursday 24 January 2008, Mattias Nissler wrote:
>
> On Thu, 2008-01-24 at 00:03 +0000, Chris Clayton wrote:
> > Thanks for the reply, John.
> >
> > On Wednesday 23 January 2008, John W. Linville wrote:
> > > On Wed, Jan 23, 2008 at 04:42:58PM +0000, Chris Clayton wrote:
> > > > On Wednesday 23 January 2008, Chris Clayton wrote:
> > > > > Hi again.
> > > > >
> > > > > On Tuesday 22 January 2008, Chris Clayton wrote:
> > > > > > Hi,
> > > >
> > > > and here's another, slightly more complete, one, this time from 2.6.24-rc8-git6
> > > > (the none I sent earlier was from -git4, by the way)
> > > >
> > > > Call Trace:
> > > > [<c011440a>] __update_rq_clock+0x1a/0xf0
> > > > [<e08ff59d>] rt2x00lib_txdone+0x9d/0xd0 [rt2x00lib]
> > > > [<e0909573>] rt61pci_txdone+0x153/0x1f0 [rt61pci]
> > > > [<e09096ad>] rt61pci_interrupt+0x9d/0xb0 [rt61pci]
> > >
> > > I suspect this could relate to this commit:
> > >
> > > commit 62bc060b8ed5fcdafd87da5ab17bdd59a39ebcc9
> > > Author: Mattias Nissler <[email protected]>
> > > Date: Mon Nov 12 15:03:12 2007 +0100
> > >
> > > rt2x00: Allow rt61 to catch up after a missing tx report
> > >
> > > Could you revert that patch, rebuild, and try to recreate the problem?
> > >
> >
> > I've reverted the patch by hand - the original patch on linux-wireless doesn't match the present state of the code
> > in rt61pci.c and I'm not a git user. I'm pretty sure I've got it right, however.
> >
> > Problem now is that we are back where we were a few weeks ago because my wireless connection dies
> > almost immediately after I start the download and from dmesg I see:
> >
> > phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 2.
> > Please file bug report to http://rt2x00.serialmonkey.com.
>
> As the patches author, I'm not very surprised :-) The patch just adds
> additional processing that should have happened in a different call of
> the interrupt handler. I don't think it can be related at all to the
> trouble you're seeing.
>
> When I find time, I'll try the Linus tree on my rt61 to see what
> happens.
>
OK, thanks. I look forward to seeing how things go. In the meantime, I'll use the out-of-tree module from serialmonkey.
Chris
> Mattias
>
>
--
Beer is proof that God loves us and wants us to be happy - Benjamin Franklin
Hi John,
On Wednesday 23 January 2008, John W. Linville wrote:
> On Wed, Jan 23, 2008 at 04:42:58PM +0000, Chris Clayton wrote:
> > On Wednesday 23 January 2008, Chris Clayton wrote:
> > > Hi again.
> > >
> > > On Tuesday 22 January 2008, Chris Clayton wrote:
> > > > Hi,
> >
> > and here's another, slightly more complete, one, this time from 2.6.24-rc8-git6
> > (the none I sent earlier was from -git4, by the way)
> >
> > Call Trace:
> > [<c011440a>] __update_rq_clock+0x1a/0xf0
> > [<e08ff59d>] rt2x00lib_txdone+0x9d/0xd0 [rt2x00lib]
> > [<e0909573>] rt61pci_txdone+0x153/0x1f0 [rt61pci]
> > [<e09096ad>] rt61pci_interrupt+0x9d/0xb0 [rt61pci]
>
> I suspect this could relate to this commit:
>
> commit 62bc060b8ed5fcdafd87da5ab17bdd59a39ebcc9
> Author: Mattias Nissler <[email protected]>
> Date: Mon Nov 12 15:03:12 2007 +0100
>
> rt2x00: Allow rt61 to catch up after a missing tx report
>
> Could you revert that patch, rebuild, and try to recreate the problem?
As you will have seen, Matthias doubted whether his patch would be at the root
of the problem I am seeing. I noticed you posted some patches earlier this week.
Would any combination of those be likely to fix the bug, please? The patch
http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/upstream/0001-rt61pci-fix-up-merge-damage.patch
applies cleanly to 2.6.24 but the build fails :(
It seems that it may take a little while to get to the bottom of this bug. Should
I raise an incident on bugzilla?
Thanks
Chris
>
> Thanks!
>
> John
--
Beauty is in the eye of the beerholder.
On Thu, Jan 31, 2008 at 08:13:31PM +0000, Chris Clayton wrote:
> Hi John,
>
> On Wednesday 23 January 2008, John W. Linville wrote:
> > On Wed, Jan 23, 2008 at 04:42:58PM +0000, Chris Clayton wrote:
> > > On Wednesday 23 January 2008, Chris Clayton wrote:
> > > > Hi again.
> > > >
> > > > On Tuesday 22 January 2008, Chris Clayton wrote:
> > > > > Hi,
> > >
> > > and here's another, slightly more complete, one, this time from 2.6.24-rc8-git6
> > > (the none I sent earlier was from -git4, by the way)
> > >
> > > Call Trace:
> > > [<c011440a>] __update_rq_clock+0x1a/0xf0
> > > [<e08ff59d>] rt2x00lib_txdone+0x9d/0xd0 [rt2x00lib]
> > > [<e0909573>] rt61pci_txdone+0x153/0x1f0 [rt61pci]
> > > [<e09096ad>] rt61pci_interrupt+0x9d/0xb0 [rt61pci]
> >
> > I suspect this could relate to this commit:
> >
> > commit 62bc060b8ed5fcdafd87da5ab17bdd59a39ebcc9
> > Author: Mattias Nissler <[email protected]>
> > Date: Mon Nov 12 15:03:12 2007 +0100
> >
> > rt2x00: Allow rt61 to catch up after a missing tx report
> >
> > Could you revert that patch, rebuild, and try to recreate the problem?
>
> As you will have seen, Matthias doubted whether his patch would be at the root
> of the problem I am seeing. I noticed you posted some patches earlier this week.
> Would any combination of those be likely to fix the bug, please? The patch
> http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/upstream/0001-rt61pci-fix-up-merge-damage.patch
> applies cleanly to 2.6.24 but the build fails :(
This patch is not intended for 2.6.24, so that won't help. :-)
I saw Mattias's reply, but I didn't see him suggest any other
fixes... :-( I'm wondering if maybe you tried reverting it anyway?
I only ask because the call trace references symbols that would seem
to relate to the patch. So even if _why_ it might break things isn't
obvious, it still might be worth trying a revert "just in case". :-)
> It seems that it may take a little while to get to the bottom of this bug. Should
> I raise an incident on bugzilla?
I've seen other reports of rt61 working fine with 2.6.24. So I
suppose you should open a bug.
Thanks!
John
--
John W. Linville
[email protected]
On Thursday 31 January 2008, John W. Linville wrote:
> On Thu, Jan 31, 2008 at 08:13:31PM +0000, Chris Clayton wrote:
> > Hi John,
> >
> > On Wednesday 23 January 2008, John W. Linville wrote:
> > > On Wed, Jan 23, 2008 at 04:42:58PM +0000, Chris Clayton wrote:
> > > > On Wednesday 23 January 2008, Chris Clayton wrote:
> > > > > Hi again.
> > > > >
> > > > > On Tuesday 22 January 2008, Chris Clayton wrote:
> > > > > > Hi,
> > > >
> > > > and here's another, slightly more complete, one, this time from 2.6.24-rc8-git6
> > > > (the none I sent earlier was from -git4, by the way)
> > > >
> > > > Call Trace:
> > > > [<c011440a>] __update_rq_clock+0x1a/0xf0
> > > > [<e08ff59d>] rt2x00lib_txdone+0x9d/0xd0 [rt2x00lib]
> > > > [<e0909573>] rt61pci_txdone+0x153/0x1f0 [rt61pci]
> > > > [<e09096ad>] rt61pci_interrupt+0x9d/0xb0 [rt61pci]
> > >
> > > I suspect this could relate to this commit:
> > >
> > > commit 62bc060b8ed5fcdafd87da5ab17bdd59a39ebcc9
> > > Author: Mattias Nissler <[email protected]>
> > > Date: Mon Nov 12 15:03:12 2007 +0100
> > >
> > > rt2x00: Allow rt61 to catch up after a missing tx report
> > >
> > > Could you revert that patch, rebuild, and try to recreate the problem?
> >
> > As you will have seen, Matthias doubted whether his patch would be at the root
> > of the problem I am seeing. I noticed you posted some patches earlier this week.
> > Would any combination of those be likely to fix the bug, please? The patch
> > http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/upstream/0001-rt61pci-fix-up-merge-damage.patch
> > applies cleanly to 2.6.24 but the build fails :(
>
> This patch is not intended for 2.6.24, so that won't help. :-)
>
> I saw Mattias's reply, but I didn't see him suggest any other
> fixes... :-( I'm wondering if maybe you tried reverting it anyway?
> I only ask because the call trace references symbols that would seem
> to relate to the patch. So even if _why_ it might break things isn't
> obvious, it still might be worth trying a revert "just in case". :-)
>
Yes, I did revert Matthias' patch- see http://marc.info/?l=linux-wireless&m=120116461507261&w=4
which is what Matthias was replying to.
> > It seems that it may take a little while to get to the bottom of this bug. Should
> > I raise an incident on bugzilla?
>
> I've seen other reports of rt61 working fine with 2.6.24. So I
> suppose you should open a bug.
>
OK, I'll do that.
Incidentally, I've seen other posting to the linux-wireless list containing discussion
about problems when using an 11g device to connect to an 11b AP. That's exactly
what I am doing with a Belkin F5D7010 wireless adapter and a Draytek Vigor
2600WE AP.
> Thanks!
>
> John
--
Beauty is in the eye of the beerholder.