2009-07-13 08:35:46

by Chris Clayton

Subject: 2.6.31-rc2: Possible regression in rt61pci driver

Hi,

Please cc me on any reply because I'm not subscribed.

I've been testing 2.6.31 development kernels on my laptop and find
that I can induce a complete lock-up more or less at will. To do so,
all I have to do is generate some network traffic on my wireless LAN
(I've been using wget to transfer a file from another box on my LAN)
and then wait. If I run netstat repeatedly while waiting, I see a TCP
connection to port 21 on another box on my LAN in a TIME_WAIT state.
It seems that when that connection disappears, the laptop locks up
hard and I can only recover by powering off and on again. I think the
problem is related to the rt61pci driver because I haven't been able
to induce the lock-up when using a wireless card that's supported by
the ath5k driver. I started bisecting, but a couple of times I arrived
at points where although the kernel builds OK, I have no network
connectivity. I guessed at good, but the bisection process finished at
a change that can't be the culprit (because it's for a different
architecture).
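The netstat check described above could be scripted roughly as follows. This is only a sketch: the function name time_wait_on_port is made up for illustration, and it assumes the usual six-column layout printed by 'netstat -tn' (protocol, two queue sizes, local address, foreign address, state).

```shell
# Print TIME_WAIT connections whose remote port matches $1 (default 21).
# Reads 'netstat -tn' style output on stdin, so it also works on saved text.
time_wait_on_port() {
    port=${1:-21}
    awk -v port="$port" '
        $1 ~ /^tcp/ && $6 == "TIME_WAIT" {
            n = split($5, parts, ":")      # field 5 is the foreign address
            if (parts[n] == port) print $4, "->", $5
        }'
}
```

Something like 'netstat -tn | time_wait_on_port 21' run once a second would show when the connection disappears, which is the moment the lock-up seems to follow.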

I attach the best diagnostics I can think of at this point in time
(but am more than happy to provide any others that are requested). It
includes the output from dmesg from a boot that locked up and the
syslog journal from that boot; a description of the wireless card from
lspci -v and the output from netstat that shows the connection I think
is involved. As I say, feel free to ask for any other diagnostics that
will help track the problem down.

I have confirmed that the problem is still present in a kernel built
after a 'git pull' this morning, although it was somewhere around the
time that -rc2 was released that I first came across it. I cannot
induce the problem with 2.6.30.1.

Thanks

Chris


--
No, Sir; there is nothing which has yet been contrived by man, by
which so much happiness is produced as by a good tavern or inn -
Doctor Samuel Johnson


Attachments:
dmesg.txt (29.63 kB)
syslog (85.67 kB)
lspci.txt (345.00 B)
netstat.txt (1.51 kB)

2009-07-28 13:38:32

by Luis Correia

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

Chris

On Tue, Jul 28, 2009 at 14:34, Chris Clayton<[email protected]> wrote:
> 2009/7/28 Luis Correia <[email protected]>:
>> Hi Chris
>
>> Can you do a very dumb test for me?
>>
>> Boot your laptop by adding 'irqpoll' to the kernel command line
>> options and see if the problem still occurs.
>>
>
> Not so dumb a request, actually. Running -rc4 with the irqpoll
> command line option, the laptop has just survived 30 minutes without a
> freeze. It can't survive more than 5 minutes without that option.
>
> Chris

Well, if the laptop keeps working without any problems, this may
prove to be some strange kind of hardware incompatibility.

The irqpoll option solves some of the interrupt related problems, by
polling them instead of 'grabbing' them (at least that is what I think
is happening, correct me if I'm wrong :))


Luis Correia
rt2x00 project admin

2009-07-27 20:08:01

by Pavel Roskin

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

On Mon, 2009-07-27 at 20:35 +0100, Chris Clayton wrote:
> On Monday 27 July 2009, Pavel Roskin wrote:
> >
> > But I think you may be getting something when the freeze happens. But to
> > see it, you need to be on a text console. There are other ways to
> > capture the kernel messages, which are described in the file
> > Documentation/oops-tracing.txt in the Linux sources.
>
> I've taken some photos of the screen from the insertion of the card through to ejecting it, and
> uploaded them to imageshack: They can be viewed at:
>
> http://img145.imageshack.us/img145/8741/xdscn0647.jpg
> http://img170.imageshack.us/img170/1954/xdscn0648.jpg
> http://img195.imageshack.us/img195/807/xdscn0649.jpg
> http://img195.imageshack.us/img195/9625/xdscn0650.jpg
> http://img195.imageshack.us/img195/2717/xdscn0651.jpg
> http://img170.imageshack.us/img170/6103/xdscn0652.jpg
> http://img170.imageshack.us/img170/5844/xdscn0653.jpg
>
> Between them they should show all the output to the console, although some do overlap a little.
>
> I hope they are helpful in tracking down this problem.

Do you remove the card? It's like the card is disconnected at some
point, either physically or logically.

It looks like a problem specific to the hardware, not anything in the
common wireless code. I suggest that you post your question to
[email protected]. That's the mailing list for rt61pci and
other Ralink devices.

--
Regards,
Pavel Roskin

2009-07-29 07:05:45

by Chris Clayton

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

On Tuesday 28 July 2009, Luis Correia wrote:
> Chris
>
> On Tue, Jul 28, 2009 at 14:34, Chris Clayton<[email protected]> wrote:
> > 2009/7/28 Luis Correia <[email protected]>:
> >> Hi Chris
> >>
> >> Can you do a very dumb test for me?
> >>
> >> Boot your laptop by adding 'irqpoll' to the kernel command line
> >> options and see if the problem still occurs.
> >
> > Not so dumb a request, actually. Running -rc4 with the irqpoll
> > command line option, the laptop has just survived 30 minutes without a
> > freeze. It can't survive more than 5 minutes without that option.

Well, the laptop ran -rc4-git2 all evening yesterday without freezing. So using the irqpoll option
seems to be an answer. A few questions and observations though:

1. Are we sure that it is the power-saving change that _caused_ this problem? Could it be that some
other bug is simply unmasked by the power-saving change? I ask this because I noticed yesterday
that the card's network LED switches off several seconds before the laptop freezes.

2. When power-saving kicks in, the network LED on the card is switched off. (At least I'm assuming
that is what switches the LED off) When the card is woken up again, neither the network nor activity
LED lights-up ever again. IMHO, this is a bug.

3. Once power-saving kicks in, the machine cannot be contacted from another machine on the network -
pings are not responded to, for example. Is this the intended behaviour?

4. Are there any plans to be able to switch power-saving on/off on a driver-by-driver basis? It
seems to me, although I'm happy to be corrected, that it is either on or off for a given kernel,
depending on CONFIG_MAC80211_DEFAULT_PS. But, if like me, someone has two cards (one at home and one
at work, for example), one that works fine with PS and one that doesn't, they have to set PS off
for their kernel, or have two kernels. Could PS be {en,dis}abled by, say, a module parameter or a
sysfs switch?

Thanks,

Chris

> >
> > Chris
>
> Well, if the laptop keeps working without any problems, this may
> prove to be some strange kind of hardware incompatibility.
>
> The irqpoll option solves some of the interrupt related problems, by
> polling them instead of 'grabbing' them (at least that is what I think
> is happening, correct me if I'm wrong :))
>
>
> Luis Correia
> rt2x00 project admin



--
No, Sir; there is nothing which has yet been contrived by man, by which
so much happiness is produced as by a good tavern or inn - Doctor Samuel
Johnson

2009-07-27 20:30:43

by Pavel Roskin

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

On Mon, 2009-07-27 at 21:22 +0100, Chris Clayton wrote:

> I think I'll just configure MAC80211 power saving off or use the ath5k
> card that I also own.

OK, it's unfortunate but understandable. I'll keep this in mind in case
I have a chance to do extensive testing for rt61pci. I'm using a PCI
card that cannot be removed too easily.

--
Regards,
Pavel Roskin

2009-07-26 19:15:21

by Chris Clayton

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

2009/7/21 Chris Clayton <[email protected]>:
> 2009/7/14 Chris Clayton <[email protected]>:
> <snip>
>
>> I've updated to 2.6.31-rc3 this morning and done some more testing.
>> I'm now convinced that the rt61pci driver is somehow involved in
>> locking up the laptop. With the (Belkin) rt61 card inserted, the
>> machine will lock up even if I am doing nothing (no web browsing,
>> email or anything else at all) except running this script in a console
>> window:
>>
>> i=0
>> while true; do
>>     let i++
>>     echo -n "$i "
>>     sleep 1
>> done
>>
>> In the tests I have done so far, the counter has never gone beyond 240
>> before the machine locked. With the (no-name) ath5k card inserted I
>> can use the laptop for normal web browsing, email, etc with no
>> problems - the counter in the script above gets to over 2000.
>>
>
> The freeze still happens with 2.6.31-rc3-git5, but I've been doing
> some more fact-finding.
>
> Running the script shown above and with the rt61-based card inserted,
> I can freeze the laptop even if I am doing nothing else on the laptop.
> When the freeze occurs, the laptop is effectively dead, no response to
> mouse movement or keyboard input and no response to pings from
> another machine on my network. However, if I eject the card, the
> laptop comes to life again. The key presses from when the laptop was
> frozen appear on screen and pings from another machine are responded
> to. The script continues to run and display the counter. I then
> reinsert the card and everything appears OK until the laptop freezes
> again a minute or two later. During a test run this morning the
> machine froze at (from the output of the script) 80, 235, 369, 538 and
> 672. Each time, ejecting the card brought the machine back to life.
>
> Trying the same test with the ath5k-based card inserted resulted in
> the script getting to 2300 without the laptop freezing, at which point
> I stopped the script.
>

One more data point. I wondered whether the freeze would "time out" if
I just left the laptop frozen, but my testing shows that it probably
does not (or if it does it takes more than 27 minutes to do so).

I've also tried to bisect again, but, as last time, once I got to the
batch of network-related changes that went into -rc1, I get a series
of kernels that build but either won't boot or have inoperable
wireless networking.

Finally, since I haven't had a single reply to the regression report I
posted almost two weeks ago, I now give up. I'll switch to using the
card supported by the ath5k driver.

<snip>

Chris
--
No, Sir; there is nothing which has yet been contrived by man, by
which so much happiness is produced as by a good tavern or inn -
Doctor Samuel Johnson

2009-07-29 07:21:56

by Johannes Berg

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

On Wed, 2009-07-29 at 08:03 +0100, Chris Clayton wrote:

> 3. Once power-saving kicks in, the machine cannot be contacted from another machine on the network -
> pings are not responded to, for example. Is this the intended behaviour?

No. The device is supposed to listen to beacons, check if there's
traffic pending and retrieve it. So sounds like a device problem unless
your AP is broken.

> 4. Are there any plans to be able to switch power-saving on/off on a driver-by-driver basis? It
> seems to me, although I'm happy to be corrected, that it is either on or off for a given kernel,
> depending on CONFIG_MAC80211_DEFAULT_PS. But, if like me, someone has two cards (one at home and one
> at work, for example), one that works fine with PS and one that doesn't, they have to set PS off
> for their kernel, or have two kernels. Could PS be {en,dis}abled by, say, a module parameter or a
> sysfs switch?

iwconfig wlan0 power off

(yeah, looks weird, but power means "power saving")
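To confirm that the setting took effect, the state reported by iwconfig can be read back. A minimal sketch, assuming iwconfig prints a "Power Management:on" or "Power Management:off" field for the interface (drivers may format this differently); the helper name power_mgmt_state is hypothetical.

```shell
# Extract the "Power Management" state (on/off) from iwconfig output on stdin.
# Prints nothing if the field is absent or formatted differently.
power_mgmt_state() {
    sed -n 's/.*Power Management:\([a-z]*\).*/\1/p'
}
```

Running 'iwconfig wlan0 | power_mgmt_state' before and after 'iwconfig wlan0 power off' should show the change from on to off, if the driver reports the field at all.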

johannes


Attachments:
signature.asc (801.00 B)
This is a digitally signed message part

2009-07-26 20:10:17

by Pavel Roskin

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

On Sun, 2009-07-26 at 20:15 +0100, Chris Clayton wrote:

> One more data point. I wondered whether the freeze would "time out" if
> I just left the laptop frozen, but my testing shows that it probably
> does not (or if it does it takes more than 27 minutes to do so).

I suggest that you run it on the text console after "dmesg -n 8", so
that all kernel messages are seen.
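Whether "dmesg -n 8" took effect can be checked through the printk sysctl, whose first field is the current console log level. A small sketch; the function name console_loglevel is only illustrative.

```shell
# Print the current console log level: the first field of the printk
# sysctl file ($1, default /proc/sys/kernel/printk).
console_loglevel() {
    awk '{ print $1; exit }' "${1:-/proc/sys/kernel/printk}"
}
```

After 'dmesg -n 8' (as root), 'console_loglevel' should print 8, meaning messages of every priority reach the console.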

I'm using rt61pci with wireless-testing, and I don't see any freezes.

> I've also tried to bisect again, but, as last time, once I got to the
> batch of network-related changes that went into -rc1, I get a series
> of kernels that build but either won't boot or have inoperable
> wireless networking.

You can use "git bisect skip" to skip those revisions.

You can specify the paths in "git bisect start" so that only changes to
the interesting places (like drivers/net/wireless, net/mac80211 and
net/wireless) are considered when calculating the next commit. This
will probably help you avoid the bad place.
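For illustration, a path-limited bisection along those lines might be started like this. It is a sketch rather than a tested recipe for this bug: the wrapper name start_wireless_bisect is made up, and the good and bad revisions are whatever the tester has verified (v2.6.30 and v2.6.31-rc2 would match the report).

```shell
# Start a bisection limited to the wireless-related directories.
# $1 = known-good revision, $2 = known-bad revision.
# Run from the top of the kernel source tree.
start_wireless_bisect() {
    good=$1
    bad=$2
    git bisect start "$bad" "$good" -- \
        drivers/net/wireless net/mac80211 net/wireless
}

# After building and testing each kernel git checks out, mark it with one of:
#   git bisect good     # no freeze
#   git bisect bad      # freeze reproduced
#   git bisect skip     # kernel will not boot, or wireless is unusable
# and return to the original branch at the end with:
#   git bisect reset
```

Skipped revisions can leave git unable to name a single first bad commit; in that case it reports the remaining range of candidates instead.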

Also, please try the current wireless-testing. The problem may be fixed
there.

> Finally, since I haven't had a single reply to the regression report I
> posted almost two weeks ago, I now give up. I'll switch to using the
> card supported by the ath5k driver.

Perhaps there was not enough material for others to comment on, and
nobody was experiencing anything similar.

If I had such problem, I would try to bisect it, but I cannot reproduce
it.

--
Regards,
Pavel Roskin

2009-07-14 11:04:59

by Chris Clayton

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

Hi again,


2009/7/13 Chris Clayton <[email protected]>:
> Hi,
>
> Please cc me on any reply because I'm not subscribed.
>
> I've been testing 2.6.31 development kernels on my laptop and find
> that I can induce a complete lock-up more or less at will. To do so,
> all I have to do is generate some network traffic on my wireless LAN
> (I've been using wget to transfer a file from another box on my LAN)
> and then wait. If I run netstat repeatedly while waiting, I see a TCP
> connection to port 21 on another box on my LAN in a TIME_WAIT state.
> It seems that when that connection disappears, the laptop locks up
> hard and I can only recover by powering off and on again. I think the
> problem is related to the rt61pci driver because I haven't been able
> to induce the lock-up when using a wireless card that's supported by
> the ath5k driver. I started bisecting, but a couple of times I arrived
> at points where although the kernel builds OK, I have no network
> connectivity. I guessed at good, but the bisection process finished at
> a change that can't be the culprit (because it's for a different
> architecture).
>
> I attach the best diagnostics I can think of at this point in time
> (but am more than happy to provide any others that are requested). It
> includes the output from dmesg from a boot that locked up and the
> syslog journal from that boot; a description of the wireless card from
> lspci -v and the output from netstat that shows the connection I think
> is involved. As I say, feel free to ask for any other diagnostics that
> will help track the problem down.
>
> I have confirmed that the problem is still present in a kernel built
> after a 'git pull' this morning, although it was somewhere around the
> time that -rc2 was released that I first came across it. I cannot
> induce the problem with 2.6.30.1.

I've updated to 2.6.31-rc3 this morning and done some more testing.
I'm now convinced that the rt61pci driver is somehow involved in
locking up the laptop. With the (Belkin) rt61 card inserted, the
machine will lock up even if I am doing nothing (no web browsing,
email or anything else at all) except running this script in a console
window:

i=0
while true; do
    let i++
    echo -n "$i "
    sleep 1
done

In the tests I have done so far, the counter has never gone beyond 240
before the machine locked. With the (no-name) ath5k card inserted I
can use the laptop for normal web browsing, email, etc with no
problems - the counter in the script above gets to over 2000.
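A variant of the counter that writes each value to a file may be handy when the screen cannot be read after a lock-up. This is a sketch under assumptions: the function name run_counter and its arguments are invented here, and calling sync after each line only makes it likely, not certain, that the last value is on disk.

```shell
# Count once per interval, appending each value with a timestamp to a log
# file and flushing it, so the last count reached can be read after a reboot.
# Arguments: log file, number of ticks (0 = run until interrupted), interval.
run_counter() {
    log=$1
    limit=${2:-0}
    interval=${3:-1}
    i=0
    while [ "$limit" -eq 0 ] || [ "$i" -lt "$limit" ]; do
        i=$((i + 1))
        printf '%s %s\n' "$i" "$(date '+%H:%M:%S')" >> "$log"
        sync            # push the line to disk before a possible lock-up
        sleep "$interval"
    done
}
```

For example, 'run_counter /tmp/freeze-counter.log' counts once a second until interrupted; the last line of the file then gives the count and time of the freeze.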

As I said yesterday, I'm happy to provide additional diagnostics,
apply patches, etc.

Thanks

Chris
--
No, Sir; there is nothing which has yet been contrived by man, by which
so much happiness is produced as by a good tavern or inn - Doctor Samuel
Johnson

2009-07-26 21:33:10

by Chris Clayton

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

Thanks for the reply, Pavel.

2009/7/26 Pavel Roskin <[email protected]>:
> On Sun, 2009-07-26 at 20:15 +0100, Chris Clayton wrote:
>
>> One more data point. I wondered whether the freeze would "time out" if
>> I just left the laptop frozen, but my testing shows that it probably
>> does not (or if it does it takes more than 27 minutes to do so).
>
> I suggest that you run it on the text console after "dmesg -n 8", so
> that all kernel messages are seen.
>
> I'm using rt61pci with wireless-testing, and I don't see any freezes.

Do you have CONFIG_MAC80211_DEFAULT_PS enabled? I have just built and
installed -rc4 with this option disabled and it has survived almost 20
minutes so far without a freeze. No previous kernel in the 2.6.31
series has survived more than 5 minutes without freezing, so this
looks promising.
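For anyone comparing setups, the state of the option can be read from the kernel's .config file. A minimal sketch, with config_state as a made-up helper name; it relies on the standard .config layout, where enabled options appear as "CONFIG_X=y" and disabled ones as "# CONFIG_X is not set".

```shell
# Report the value of option $1 in kernel config file $2.
# Prints the value (y, m, ...) or "unset" if the option is not enabled.
config_state() {
    opt=$1
    file=$2
    val=$(sed -n "s/^$opt=//p" "$file")
    if [ -n "$val" ]; then
        echo "$val"
    else
        echo "unset"
    fi
}
```

From the top of the build tree, 'config_state CONFIG_MAC80211_DEFAULT_PS .config' prints y when power saving is on by default and unset otherwise.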
>
>> I've also tried to bisect again, but, as last time, once I got to the
>> batch of network-related changes that went into -rc1, I get a series
>> of kernels that build but either won't boot or have inoperable
>> wireless networking.
>
> You can use "git bisect skip" to skip those revisions.
>
> You can specify the paths in "git bisect start" so that only changes to
> the interesting places (like drivers/net/wireless, net/mac80211 and
> net/wireless) are considered when calculating the next commit. This
> will probably help you avoid the bad place.
>

Thanks for those tips. I'll note them in my "useful stuff I might
forget" notebook and then try to find time over the next few weeks to
get to grips with the power of git.

Chris


--
No, Sir; there is nothing which has yet been contrived by man, by which
so much happiness is produced as by a good tavern or inn - Doctor Samuel
Johnson

2009-07-27 19:37:10

by Chris Clayton

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

On Monday 27 July 2009, Pavel Roskin wrote:
>
> But I think you may be getting something when the freeze happens. But to
> see it, you need to be on a text console. There are other ways to
> capture the kernel messages, which are described in the file
> Documentation/oops-tracing.txt in the Linux sources.

I've taken some photos of the screen from the insertion of the card through to ejecting it, and
uploaded them to imageshack: They can be viewed at:

http://img145.imageshack.us/img145/8741/xdscn0647.jpg
http://img170.imageshack.us/img170/1954/xdscn0648.jpg
http://img195.imageshack.us/img195/807/xdscn0649.jpg
http://img195.imageshack.us/img195/9625/xdscn0650.jpg
http://img195.imageshack.us/img195/2717/xdscn0651.jpg
http://img170.imageshack.us/img170/6103/xdscn0652.jpg
http://img170.imageshack.us/img170/5844/xdscn0653.jpg

Between them they should show all the output to the console, although some do overlap a little.

I hope they are helpful in tracking down this problem.

Thanks,

Chris


--
No, Sir; there is nothing which has yet been contrived by man, by which
so much happiness is produced as by a good tavern or inn - Doctor Samuel
Johnson

2009-07-27 20:23:39

by Chris Clayton

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

On Monday 27 July 2009, Pavel Roskin wrote:
> On Mon, 2009-07-27 at 20:35 +0100, Chris Clayton wrote:
> > On Monday 27 July 2009, Pavel Roskin wrote:
> > > But I think you may be getting something when the freeze happens. But
> > > to see it, you need to be on a text console. There are other ways to
> > > capture the kernel messages, which are described in the file
> > > Documentation/oops-tracing.txt in the Linux sources.
> >
> > I've taken some photos of the screen from the insertion of the card
> > through to ejecting it, and uploaded them to imageshack: They can be
> > viewed at:
> >
> > http://img145.imageshack.us/img145/8741/xdscn0647.jpg
> > http://img170.imageshack.us/img170/1954/xdscn0648.jpg
> > http://img195.imageshack.us/img195/807/xdscn0649.jpg
> > http://img195.imageshack.us/img195/9625/xdscn0650.jpg
> > http://img195.imageshack.us/img195/2717/xdscn0651.jpg
> > http://img170.imageshack.us/img170/6103/xdscn0652.jpg
> > http://img170.imageshack.us/img170/5844/xdscn0653.jpg
> >
> > Between them they should show all the output to the console, although
> > some do overlap a little.
> >
> > I hope they are helpful in tracking down this problem.
>
> Do you remove the card? It's like the card is disconnected at some
> point, either physically or logically.

Yes, I remove the card once the laptop has frozen, because doing so "thaws" it again. The laptop is
then usable until 2-5 minutes after I insert the card again, when it will freeze again. I can
repeat this cycle over and over.

>
> It looks like a problem specific to the hardware, not anything in the
> common wireless code. I suggest that you post your question to
> [email protected]. That's the mailing list for rt61pci and
> other Ralink devices.

I think I'll just configure MAC80211 power saving off or use the ath5k card that I also own.

Thanks anyway.

Chris
--
No, Sir; there is nothing which has yet been contrived by man, by which
so much happiness is produced as by a good tavern or inn - Doctor Samuel
Johnson

2009-07-27 14:27:42

by Pavel Roskin

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

On Mon, 2009-07-27 at 10:35 +0100, Chris Clayton wrote:

> I've built and installed a kernel with all the various types of
> MAC80211 debugging turned on (in addition to the RT61 driver's) and
> with CONFIG_MAC80211_DEFAULT_PS enabled. The attached file is the
> output from dmesg just after a freeze occurs. (The pcmcia card eject
> represents me ejecting the wireless card, so that the freeze "thaws"
> and I can capture the output from dmesg).
>
> Let me know if any additional diagnostics are needed. I have enabled
> MAC80211 debugfs too.

I could not reproduce the freezing with CONFIG_MAC80211_DEFAULT_PS
enabled. That said, I'm using wireless-testing with a patch for the
scanning state machine (hopefully it will be committed today).

The output of dmesg doesn't show anything interesting. The stack dump
from the video card is unlikely to be related.

But I think you may be getting something when the freeze happens. But to
see it, you need to be on a text console. There are other ways to
capture the kernel messages, which are described in the file
Documentation/oops-tracing.txt in the Linux sources.

--
Regards,
Pavel Roskin

2009-07-29 07:45:27

by Kalle Valo

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

Chris Clayton <[email protected]> writes:

> 4. Are there any plans to be able to switch power-saving on/off on a
> driver-by-driver basis? It seems to me, although I'm happy to be
> corrected, that it is either on or off for a given kernel, depending
> on CONFIG_MAC80211_DEFAULT_PS.

mac80211 drivers can control power save with the IEEE80211_HW_SUPPORTS_PS
flag. It can also be changed at runtime from user space with 'iwconfig
wlan0 power off'. An nl80211 interface is on my todo list.

--
Kalle Valo

2009-07-28 11:03:18

by Chris Clayton

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

Hi everyone,

2009/7/27 Johannes Berg <[email protected]>:
> On Mon, 2009-07-27 at 21:22 +0100, Chris Clayton wrote:
>
>> Yes, I remove the card once the laptop has frozen, because doing so "thaws" it again. The laptop is
>> then usable until 2-5 minutes after I insert the card again, when it will freeze again. I can
>> repeat this cycle over and over.
>

Inspired by the fact that Pavel doesn't get the freeze, I've just
cloned and tested the latest and greatest wireless-testing tree and,
unfortunately, the freeze still occurs. In fact it seems to happen
even more quickly now!

> That really points to a hw (or hw programming) bug that causes the card
> to lock up the bus. Had that many times in early b43 days.

If it's a programming problem, might it be fixable in the firmware
(rt2561s.bin)?

If not and given this is a Belkin card (model F5D7010), I guess there
could be thousands of them out there, so some note on the Devices
section of linuxwireless.org might help other users. In fact two
versions of this model are currently shown as supported by the
b43 driver, but this, which has worked fine until now, isn't mentioned
at all.

>> > It looks like a problem specific to the hardware, not anything in the
>> > common wireless code. I suggest that you post your question to
>> > [email protected]. That's the mailing list for rt61pci and
>> > other Ralink devices.
>>
>> I think I'll just configure MAC80211 power saving off or use the ath5k card that I also own.
>
> You could also ask Ivo to disable power saving for rt61pci.
>

Rather than simply disabling power saving, could it be done through a
module parameter, so that those with more robust cards are not
disadvantaged?

Chris

--
No, Sir; there is nothing which has yet been contrived by man, by which
so much happiness is produced as by a good tavern or inn - Doctor Samuel
Johnson

2009-07-27 00:06:49

by Pavel Roskin

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

On Sun, 2009-07-26 at 22:33 +0100, Chris Clayton wrote:

> Do you have CONFIG_MAC80211_DEFAULT_PS enabled?

No. That may explain everything. I'll try enabling
CONFIG_MAC80211_DEFAULT_PS.

--
Regards,
Pavel Roskin

2009-07-28 14:15:04

by Chris Clayton

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

Luis,

2009/7/28 Luis Correia <[email protected]>:
> Chris
>
> On Tue, Jul 28, 2009 at 14:34, Chris Clayton<[email protected]> wrote:
>> 2009/7/28 Luis Correia <[email protected]>:
>>> Hi Chris
>>
>>> Can you do a very dumb test for me?
>>>
>>> Boot your laptop by adding 'irqpoll' to the kernel command line
>>> options and see if the problem still occurs.
>>>
>>
>> Not so dumb a request, actually. Running -rc4 with the irqpoll
>> command line option, the laptop has just survived 30 minutes without a
>> freeze. It can't survive more than 5 minutes without that option.
>>
>> Chris
>
> Well, if the laptop keeps working without any problems, this may
> prove to be some strange kind of hardware incompatibility.
>
> The irqpoll option solves some of the interrupt related problems, by
> polling them instead of 'grabbing' them (at least that is what I think
> is happening, correct me if I'm wrong :))
>

I've no idea whether you are right or wrong :-) If it is a hardware
quirk, the odd thing is that the card has worked perfectly (modulo
development cycle bugs that I've unearthed and reported) for a long
time now, including the codemonkey driver before rt2x00 hit mainline.

Anyway, thanks for your help.

Chris

>
> Luis Correia
> rt2x00 project admin
>



--
No, Sir; there is nothing which has yet been contrived by man, by which
so much happiness is produced as by a good tavern or inn - Doctor Samuel
Johnson

2009-07-21 11:39:22

by Chris Clayton

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

2009/7/14 Chris Clayton <[email protected]>:
<snip>

> I've updated to 2.6.31-rc3 this morning and done some more testing.
> I'm now convinced that the rt61pci driver is somehow involved in
> locking up the laptop. With the (Belkin) rt61 card inserted, the
> machine will lock up even if I am doing nothing (no web browsing,
> email or anything else at all) except running this script in a console
> window:
>
> i=0
> while true; do
>     let i++
>     echo -n "$i "
>     sleep 1
> done
>
> In the tests I have done so far, the counter has never gone beyond 240
> before the machine locked. With the (no-name) ath5k card inserted I
> can use the laptop for normal web browsing, email, etc with no
> problems - the counter in the script above gets to over 2000.
>

The freeze still happens with 2.6.31-rc3-git5, but I've been doing
some more fact-finding.

Running the script shown above and with the rt61-based card inserted,
I can freeze the laptop even if I am doing nothing else on the laptop.
When the freeze occurs, the laptop is effectively dead, no response to
mouse movement or keyboard input and no response to pings from
another machine on my network. However, if I eject the card, the
laptop comes to life again. The key presses from when the laptop was
frozen appear on screen and pings from another machine are responded
to. The script continues to run and display the counter. I then
reinsert the card and everything appears OK until the laptop freezes
again a minute or two later. During a test run this morning the
machine froze at (from the output of the script) 80, 235, 369, 538 and
672. Each time, ejecting the card brought the machine back to life.

Trying the same test with the ath5k-based card inserted resulted in
the script getting to 2300 without the laptop freezing, at which point
I stopped the script.

I started trying to isolate the change that causes the problem by
reverting changes to the files in drivers/net/wireless/rt2x00. The
change "rt2x00: Remove last usage of beacon_int from
ieee80211_config" reverted cleanly and the kernel built OK, but I
still got the freeze. "rt2x00: Remove usage of
IEEE80211_CONF_CHANGE_BEACON_INTERVAL" also reverted cleanly but the
kernel doesn't build because of dependencies on changes to mac80211.
I'm afraid I am out of my depth now, so I will have to abandon that
line of enquiry.

I hope this new information helps track the problem down. I've
attached the output from dmesg that shows the messages emitted when I
eject the card, plus my config.

Thanks,

Chris
--
No, Sir; there is nothing which has yet been contrived by man, by which
so much happiness is produced as by a good tavern or inn - Doctor Samuel
Johnson


Attachments:
rt61freeze.log (19.20 kB)
.config (49.24 kB)

2009-07-27 20:32:57

by Johannes Berg

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

On Mon, 2009-07-27 at 21:22 +0100, Chris Clayton wrote:

> Yes, I remove the card once the laptop has frozen, because doing so "thaws" it again. The laptop is
> then usable until 2-5 minutes after I insert the card again, when it will freeze again. I can
> repeat this cycle over and over.

That really points to a hw (or hw programming) bug that causes the card
to lock up the bus. Had that many times in early b43 days.

> > It looks like a problem specific to the hardware, not anything in the
> > common wireless code. I suggest that you post your question to
> > [email protected]. That's the mailing list for rt61pci and
> > other Ralink devices.
>
> I think I'll just configure MAC80211 power saving off or use the ath5k card that I also own.

You could also ask Ivo to disable power saving for rt61pci.

johannes


Attachments:
signature.asc (801.00 B)
This is a digitally signed message part

2009-07-28 13:34:34

by Chris Clayton

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

2009/7/28 Luis Correia <[email protected]>:
> Hi Chris

> Can you do a very dumb test for me?
>
> Boot your laptop by adding 'irqpoll' to the kernel command line
> options and see if the problem still occurs.
>

Not so dumb a request, actually. Running -rc4 with the irqpoll
command line option, the laptop has just survived 30 minutes without a
freeze. It can't survive more than 5 minutes without that option.

Chris

> Luis Correia
> rt2x00 project admin
>



--
No, Sir; there is nothing which has yet been contrived by man, by which
so much happiness is produced as by a good tavern or inn - Doctor Samuel
Johnson

2009-07-27 09:40:30

by Chris Clayton

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

2009/7/27 Pavel Roskin <[email protected]>:
> On Sun, 2009-07-26 at 22:33 +0100, Chris Clayton wrote:
>
>> Do you have CONFIG_MAC80211_DEFAULT_PS enabled?
>
> No. That may explain everything. I'll try enabling
> CONFIG_MAC80211_DEFAULT_PS.
>

I've built and installed a kernel with all the various types of
MAC80211 debugging turned on (in addition to the RT61 driver's) and
with CONFIG_MAC80211_DEFAULT_PS enabled. The attached file is the
output from dmesg just after a freeze occurs. (The pcmcia card eject
represents me ejecting the wireless card, so that the freeze "thaws"
and I can capture the output from dmesg).

Let me know if any additional diagnostics are needed. I have enabled
MAC80211 debugfs too.

Chris

> --
> Regards,
> Pavel Roskin
>



--
No, Sir; there is nothing which has yet been contrived by man, by which
so much happiness is produced as by a good tavern or inn - Doctor Samuel
Johnson


Attachments:
freeze.dmesg.txt (39.03 kB)

2009-07-28 11:11:38

by Luis Correia

Subject: Re: 2.6.31-rc2: Possible regression in rt61pci driver

Hi Chris

On Tue, Jul 28, 2009 at 12:03, Chris Clayton<[email protected]> wrote:
> Hi everyone,
>
> 2009/7/27 Johannes Berg <[email protected]>:
>> On Mon, 2009-07-27 at 21:22 +0100, Chris Clayton wrote:
>>
>>> Yes, I remove the card once the laptop has frozen, because doing so "thaws" it again. The laptop is
>>> then usable until 2-5 minutes after I insert the card again, when it will freeze again. I can
>>> repeat this cycle over and over.
>>
>
> Inspired by the fact that Pavel doesn't get the freeze, I've just
> cloned and tested the latest and greatest wireless-testing tree and,
> unfortunately, the freeze still occurs. In fact it seems to happen
> even more quickly now!
>
>> That really points to a hw (or hw programming) bug that causes the card
>> to lock up the bus. Had that many times in early b43 days.
>
> If it's a programming problem, might it be fixable in the firmware
> (rt2561s.bin)?
>
> If not and given this is a Belkin card (model F5D7010), I guess there
> could be thousands of them out there, so some note on the Devices
> section of linuxwireless.org might help other users. In fact two
> versions of this model are currently shown as supported by the
> b43 driver, but this, which has worked fine until now, isn't mentioned
> at all.
>
>>> > It looks like a problem specific to the hardware, not anything in the
>>> > common wireless code. I suggest that you post your question to
>>> > [email protected]. That's the mailing list for rt61pci and
>>> > other Ralink devices.
>>>
>>> I think I'll just configure MAC80211 power saving off or use the ath5k card that I also own.
>>
>> You could also ask Ivo to disable power saving for rt61pci.
>>
>
> Rather than simply disabling power saving, could it be done through a
> module parameter, so that those with more robust cards are not
> disadvantaged?
>
> Chris

Can you do a very dumb test for me?

Boot your laptop by adding 'irqpoll' to the kernel command line
options and see if the problem still occurs.
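Once the option has been added in the boot loader, it is worth confirming that the running kernel actually received it. A small sketch, assuming the command line is read from /proc/cmdline; the helper name has_boot_option is hypothetical.

```shell
# Succeed if kernel option $1 appears as a whole word on the command line
# stored in file $2 (default /proc/cmdline).
has_boot_option() {
    opt=$1
    file=${2:-/proc/cmdline}
    tr ' ' '\n' < "$file" | grep -qxF -- "$opt"
}
```

After rebooting, 'has_boot_option irqpoll && echo "irqpoll is active"' prints the message only if the option is on the command line.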

Luis Correia
rt2x00 project admin