2011-07-27 23:40:04

by Justin Piszcz

Subject: 3.0: rt2800usb(Kernel PANIC) vs. rt2780sta(GOOD/2.6.38)

Hi,

With kernel 3.0, the rt2800usb driver gives horrid ping times and then panics
moments later (see the picture below):
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=301 ttl=64 time=146 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=302 ttl=64 time=274 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=303 ttl=64 time=197 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=304 ttl=64 time=115 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=305 ttl=64 time=243 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=306 ttl=64 time=265 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=307 ttl=64 time=183 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=308 ttl=64 time=201 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=309 ttl=64 time=236 ms

Then it crashes if you use the network:
http://home.comcast.net/~jpiszcz/20110727/photo.JPG

With the rt2870sta driver, the machine has been solid for months (2.6.38)
with consistent low pings:

64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=2 ttl=64 time=1.38 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=3 ttl=64 time=0.520 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=4 ttl=64 time=1.11 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=5 ttl=64 time=0.573 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=6 ttl=64 time=0.562 ms

Can we please get the rt2870sta back into the kernel?

If not, can someone recommend a USB wireless stick that works in Linux, is
not based on a Ralink chipset, does not crash the kernel, and gets pings as
good as the one I have with the rt2870sta driver? Mine is the highest-rated
one on Amazon, which everyone seems to buy:
http://www.amazon.com/Medialink-Wireless-Adapter-802-11n-Compatible/dp/B002RM08RE


Justin.
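
The two ping runs above can be compared numerically rather than by eye. The
following is a minimal sketch, assuming the ping output is piped in or has
been saved to a file; rtt_summary is a hypothetical helper name and not part
of any tool mentioned in this thread.

```sh
#!/bin/sh
# Summarise round-trip times from `ping` output read on stdin.
# Prints the number of replies plus min, avg and max in milliseconds.
rtt_summary() {
    LC_ALL=C awk '
        # Match reply lines such as "... icmp_req=2 ttl=64 time=1.38 ms".
        match($0, /time=[0-9.]+/) {
            # Skip the 5 characters of "time=" and convert to a number.
            rtt = substr($0, RSTART + 5, RLENGTH - 5) + 0
            if (n == 0 || rtt < min) min = rtt
            if (n == 0 || rtt > max) max = rtt
            sum += rtt
            n++
        }
        END {
            if (n == 0) { print "no replies found"; exit 1 }
            printf "n=%d min=%.3f avg=%.3f max=%.3f\n", n, min, sum / n, max
        }
    '
}

# Example (host name taken from the thread):
#   ping -c 20 atomw | rtt_summary
```

Fed the nine rt2800usb replies quoted above, this prints
n=9 min=115.000 avg=206.667 max=274.000; fed the five rt2870sta replies, it
prints n=5 min=0.520 avg=0.829 max=1.380.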


2011-07-31 10:09:55

by Justin Piszcz

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure



On Sat, 30 Jul 2011, Adam Cozzette wrote:

> On Sat, Jul 30, 2011 at 01:10:26PM -0400, Justin Piszcz wrote:
>> (This is with the patch provided earlier)
>>
>> atom:/usr/src/linux# cd drivers/net/wireless/rt2x00/
>> atom:/usr/src/linux/drivers/net/wireless/rt2x00# grep 'memset(skb_push(entry->skb, TXWI_DESC_SIZE), 0, TXWI_DESC_SIZE)' rt2800lib.c
>> memset(skb_push(entry->skb, TXWI_DESC_SIZE), 0, TXWI_DESC_SIZE);
>> atom:/usr/src/linux/drivers/net/wireless/rt2x00# cat /usr/src/linux/.version
>> 2
>> atom:/usr/src/linux/drivers/net/wireless/rt2x00# uname -a
>> Linux atom 3.0.0 #2 SMP Sat Jul 30 08:34:18 EDT 2011 x86_64 GNU/Linux
>> atom:/usr/src/linux/drivers/net/wireless/rt2x00#
>>
>> Justin.
>
> The last two images you posted show rt2800_txdone in the stack trace. I couldn't
> find any such function in the source and unless I'm mistaken, it was renamed in
> commit 8f66bbb5 back in May. Are you sure that the most recent code is what is
> actually running when you run into this problem?
>
> --
> Adam Cozzette
> Harvey Mudd College
>

Hi,

Here is the .config used (3.0):
http://home.comcast.net/~jpiszcz/20110731/config-3.0.txt

Justin.


2011-07-31 03:47:53

by Adam Cozzette

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure

On Sat, Jul 30, 2011 at 01:10:26PM -0400, Justin Piszcz wrote:
> (This is with the patch provided earlier)
>
> atom:/usr/src/linux# cd drivers/net/wireless/rt2x00/
> atom:/usr/src/linux/drivers/net/wireless/rt2x00# grep 'memset(skb_push(entry->skb, TXWI_DESC_SIZE), 0, TXWI_DESC_SIZE)' rt2800lib.c
> memset(skb_push(entry->skb, TXWI_DESC_SIZE), 0, TXWI_DESC_SIZE);
> atom:/usr/src/linux/drivers/net/wireless/rt2x00# cat /usr/src/linux/.version
> 2
> atom:/usr/src/linux/drivers/net/wireless/rt2x00# uname -a
> Linux atom 3.0.0 #2 SMP Sat Jul 30 08:34:18 EDT 2011 x86_64 GNU/Linux
> atom:/usr/src/linux/drivers/net/wireless/rt2x00#
>
> Justin.

The last two images you posted show rt2800_txdone in the stack trace. I couldn't
find any such function in the source and unless I'm mistaken, it was renamed in
commit 8f66bbb5 back in May. Are you sure that the most recent code is what is
actually running when you run into this problem?

--
Adam Cozzette
Harvey Mudd College
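
Whether a symbol seen in a stack trace exists in the tree that was built can
be checked directly. This is a minimal sketch, assuming the source lives
under /usr/src/linux as in the session quoted above; find_symbol is a
hypothetical helper name. It inspects the source only and says nothing about
which module binary is actually loaded.

```sh
#!/bin/sh
# List the files under a directory that mention a symbol from a stack trace.
# Returns non-zero when the symbol does not appear anywhere.
find_symbol() {
    sym=$1   # e.g. rt2800_txdone
    dir=$2   # e.g. /usr/src/linux/drivers/net/wireless/rt2x00
    # -r: recurse, -l: print file names only, -w: match whole words, so a
    # longer identifier that merely starts with $sym is not reported.
    grep -rlw -- "$sym" "$dir"
}

# Example:
#   find_symbol rt2800_txdone /usr/src/linux/drivers/net/wireless/rt2x00
```

An empty result with a non-zero exit status would support the suspicion that
the running code and the source tree differ; a list of files would not.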

2011-07-30 15:04:18

by Andreas Hartmann

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure

Hi Justin,

If you want to test whether the module is stable, run this script for one
or two hours. It will stress the driver and the hardware.

If it doesn't crash and the throughput is stable, you can hope that
the driver is OK for daily work.


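#!/bin/sh

dest="server" # set this to the host running netserver

# Cycle through the three tests until interrupted (Ctrl-C).
while true ; do
    netperf -t TCP_MAERTS -H "$dest"    # server-to-client stream
    netperf -t TCP_STREAM -H "$dest"    # client-to-server stream
    netperf -t TCP_SENDFILE -H "$dest"  # client-to-server stream using sendfile()
done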


Start the script on the client and start netserver on the server.
You can get netperf from http://www.netperf.org/netperf/


Andreas
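
The loop above runs until it is interrupted. A time-boxed variant is sketched
below; it is not Andreas's script. It assumes netserver is already running on
the destination and uses netperf's -l option (test length in seconds) to keep
each run short. The function name stress_driver and the NETPERF override
variable are hypothetical.

```sh
#!/bin/sh
# Time-boxed variant of the netperf stress loop.
NETPERF=${NETPERF:-netperf}   # override to point at a different binary

stress_driver() {
    dest=$1          # host running netserver
    rounds=$2        # how many times to cycle through the three tests
    secs=${3:-60}    # length of each individual test, in seconds

    i=1
    while [ "$i" -le "$rounds" ]; do
        for t in TCP_MAERTS TCP_STREAM TCP_SENDFILE; do
            # Stop at the first failed run; a crashed or wedged link may
            # show up as netperf being unable to reach the server.
            if ! "$NETPERF" -t "$t" -H "$dest" -l "$secs"; then
                echo "round $i: $t against $dest failed" >&2
                return 1
            fi
        done
        i=$((i + 1))
    done
    echo "completed $rounds round(s) without a netperf failure"
}

# Example: 40 rounds of 3 x 60 s is roughly two hours.
#   stress_driver server 40 60
```

Stopping on the first failure gives a rough time-to-failure figure, which may
be more useful than an endless loop when the machine is expected to crash
within minutes.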

2011-07-30 14:20:22

by Justin Piszcz

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure



On Sat, 30 Jul 2011, Justin Piszcz wrote:

>
>
> On Sat, 30 Jul 2011, Stanislaw Gruszka wrote:
>
>> We should clear skb->data not skb itself. Bug was introduced by:
>> commit 0b8004aa12d13ec750d102ba4082a95f0107c649 "rt2x00: Properly
>> reserve room for descriptors in skbs".
>>
>> Cc: [email protected] # 2.6.36+
>> Signed-off-by: Stanislaw Gruszka <[email protected]>
>> ---
>> drivers/net/wireless/rt2x00/rt2800lib.c | 3 +--
>> 1 files changed, 1 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/wireless/rt2x00/rt2800lib.c
>> b/drivers/net/wireless/rt2x00/rt2800lib.c
>> index 75d2c6c..f94d669 100644
>> --- a/drivers/net/wireless/rt2x00/rt2800lib.c
>> +++ b/drivers/net/wireless/rt2x00/rt2800lib.c
>> @@ -703,8 +703,7 @@ void rt2800_write_beacon(struct queue_entry *entry,
>> struct txentry_desc *txdesc)
>> /*
>> * Add space for the TXWI in front of the skb.
>> */
>> - skb_push(entry->skb, TXWI_DESC_SIZE);
>> - memset(entry->skb, 0, TXWI_DESC_SIZE);
>> + memset(skb_push(entry->skb, TXWI_DESC_SIZE), 0, TXWI_DESC_SIZE);
>>
>> /*
>> * Register descriptor details in skb frame descriptor.
>> --
>> 1.7.4
>>
>
> Hi,
>
> Testing w/ Linux kernel 3.0:
>
> # patch -p1 < ../patch3
> patching file drivers/net/wireless/rt2x00/rt2800lib.c
> Hunk #1 succeeded at 784 (offset 81 lines).
>
> Compiled/installed.
>
> Will reboot shortly, need to run a backup first, thanks.
>
> Justin.
>
>

Hi,

Patched, it's working; latency is MUCH better than before the patch!

PING atomw.internal.lan (192.168.0.2) 56(84) bytes of data.
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=1 ttl=64 time=0.535 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=2 ttl=64 time=0.589 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=3 ttl=64 time=48.8 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=4 ttl=64 time=1.14 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=5 ttl=64 time=64.5 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=6 ttl=64 time=0.582 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=7 ttl=64 time=80.6 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=8 ttl=64 time=51.9 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=9 ttl=64 time=35.5 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=10 ttl=64 time=0.614 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=11 ttl=64 time=0.497 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=12 ttl=64 time=41.5 ms

$ uptime
10:19:18 up 5 min, 2 users, load average: 1.32, 1.02, 0.47

It hasn't crashed yet. If you don't hear from me saying that it has crashed
again, you can assume it's stable. I'll let you know if anything happens.

Justin.


2011-07-30 11:31:37

by Stanislaw Gruszka

Subject: [PATCH] rt2x00: rt2800: fix zeroing skb structure

We should clear skb->data not skb itself. Bug was introduced by:
commit 0b8004aa12d13ec750d102ba4082a95f0107c649 "rt2x00: Properly
reserve room for descriptors in skbs".

Cc: [email protected] # 2.6.36+
Signed-off-by: Stanislaw Gruszka <[email protected]>
---
drivers/net/wireless/rt2x00/rt2800lib.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/rt2x00/rt2800lib.c b/drivers/net/wireless/rt2x00/rt2800lib.c
index 75d2c6c..f94d669 100644
--- a/drivers/net/wireless/rt2x00/rt2800lib.c
+++ b/drivers/net/wireless/rt2x00/rt2800lib.c
@@ -703,8 +703,7 @@ void rt2800_write_beacon(struct queue_entry *entry, struct txentry_desc *txdesc)
/*
* Add space for the TXWI in front of the skb.
*/
- skb_push(entry->skb, TXWI_DESC_SIZE);
- memset(entry->skb, 0, TXWI_DESC_SIZE);
+ memset(skb_push(entry->skb, TXWI_DESC_SIZE), 0, TXWI_DESC_SIZE);

/*
* Register descriptor details in skb frame descriptor.
--
1.7.4


2011-07-30 11:29:06

by Stanislaw Gruszka

Subject: Re: 3.0: rt2800usb(Kernel PANIC) vs. rt2780sta(GOOD/2.6.38)

On Thu, Jul 28, 2011 at 11:18:24AM -0500, Larry Finger wrote:
> On 07/27/2011 06:33 PM, Justin Piszcz wrote:
> Any thoughts on how either the skb or the driver_data member might
> not be setup correctly?

I found a bug that is possibly causing this. Justin, please test the patch
posted in the next mail.

Stanislaw

2011-07-30 18:07:35

by Larry Finger

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure

On 07/30/2011 12:07 PM, Justin Piszcz wrote:
>
>
> On Sat, 30 Jul 2011, Andreas Hartmann wrote:
>
>> Hi Justin,
>>
>> if you want to test, if the module works stable, run this script for one
>> or two hours. It will stress the driver and the hardware.
>>
>> If it doesn't crash and if the throughput is stable, you can hope, that
>> the driver is ok for daily work.
>>
>>
>> #!/bin/sh
>>
>> dest="server" # set the servername
>>
>> while true ; do
>> netperf -t TCP_MAERTS -H $dest
>> netperf -t TCP_STREAM -H $dest
>> netperf -t TCP_SENDFILE -H $dest
>> done
>>
>>
>> Start the script on the client. On the server start netserver.
>> You get netperf from http://www.netperf.org/netperf/
>>
>>
>> Andreas
>>
>
> Hello Andreas,
>
> Thanks for the suggestion; however, it is crashing repeatedly after about 5-10
> minutes, so it is not needed yet.
>
> Here you go, crash 2:
> http://home.comcast.net/~jpiszcz/20110730/2630-rt2800usb-crash2p1.jpg
> http://home.comcast.net/~jpiszcz/20110730/2630-rt2800usb-crash2p2.jpg

This is likely progress as the crash is in a different place.

The page fault this time is at rt2800usb_get_txwi+0xc. That translates back to
line 382 of file rt2800usb.c, which says

if (entry->queue->qid == QID_BEACON)

I have no idea why entry->queue->qid is wrong here, but likely one of the
smarter people who knows a lot more about the device will be able to figure it out.

Larry

2011-07-28 16:18:34

by Larry Finger

Subject: Re: 3.0: rt2800usb(Kernel PANIC) vs. rt2780sta(GOOD/2.6.38)

On 07/27/2011 06:33 PM, Justin Piszcz wrote:
> Hi,
>
> Kernel 3.0 (rt2800usb driver- horrid, and then it panics moments after, see
> picture below)
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=301 ttl=64 time=146 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=302 ttl=64 time=274 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=303 ttl=64 time=197 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=304 ttl=64 time=115 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=305 ttl=64 time=243 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=306 ttl=64 time=265 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=307 ttl=64 time=183 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=308 ttl=64 time=201 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=309 ttl=64 time=236 ms
>
> Then it crashes if you use the network:
> http://home.comcast.net/~jpiszcz/20110727/photo.JPG
>
> With the rt2870sta driver, the machine has been solid for months (2.6.38) with
> consistent low pings:
>
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=2 ttl=64 time=1.38 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=3 ttl=64 time=0.520 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=4 ttl=64 time=1.11 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=5 ttl=64 time=0.573 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=6 ttl=64 time=0.562 ms
>
> Can we please get the rt2870sta back into the kernel?

No. One of the reasons for deleting rt2870sta was that having it around was
preventing the use of rt2800usb and it was not getting debugged.

This problem does not happen on my device.

Bus 001 Device 004: ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless
Adapter

That device is an unbranded USB stick that I got from Ebay for $2.59. My pings are

PING sonylap (192.168.1.50) 56(84) bytes of data.
64 bytes from sonylap (192.168.1.50): icmp_req=1 ttl=64 time=13.2 ms
64 bytes from sonylap (192.168.1.50): icmp_req=2 ttl=64 time=11.3 ms
64 bytes from sonylap (192.168.1.50): icmp_req=3 ttl=64 time=10.3 ms
64 bytes from sonylap (192.168.1.50): icmp_req=4 ttl=64 time=10.5 ms
64 bytes from sonylap (192.168.1.50): icmp_req=5 ttl=64 time=11.6 ms
64 bytes from sonylap (192.168.1.50): icmp_req=6 ttl=64 time=11.2 ms
64 bytes from sonylap (192.168.1.50): icmp_req=7 ttl=64 time=10.1 ms
64 bytes from sonylap (192.168.1.50): icmp_req=8 ttl=64 time=11.7 ms
64 bytes from sonylap (192.168.1.50): icmp_req=9 ttl=64 time=10.7 ms
64 bytes from sonylap (192.168.1.50): icmp_req=10 ttl=64 time=13.2 ms
64 bytes from sonylap (192.168.1.50): icmp_req=11 ttl=64 time=10.2 ms
64 bytes from sonylap (192.168.1.50): icmp_req=12 ttl=64 time=11.6 ms
64 bytes from sonylap (192.168.1.50): icmp_req=13 ttl=64 time=10.7 ms
64 bytes from sonylap (192.168.1.50): icmp_req=14 ttl=64 time=10.6 ms
64 bytes from sonylap (192.168.1.50): icmp_req=15 ttl=64 time=16.0 ms
64 bytes from sonylap (192.168.1.50): icmp_req=16 ttl=64 time=10.3 ms
^C
--- sonylap ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 15025ms
rtt min/avg/max/mdev = 10.194/11.499/16.075/1.502 ms

Performance is adequate, but not sparkling. Using tcpperf, I get 6 Mbps upload
speed on an 802.11n network. The output of iwconfig says that I have a speed of
121.5 Mbps set, but I'm only getting 1/10th of the transmit rate that I would
expect for that setting.

@Ivo: I don't know if you saw this or not. From the photo, the kernel panics on
a page fault in interrupt mode. The traceback is to rt2800usb_write_tx_desc+0x4,
which seems to implicate the inline routine get_skb_frame_desc(). The offending
statement is

return (struct skb_frame_desc *)&IEEE80211_SKB_CB(skb)->driver_data;

Any thoughts on how either the skb or the driver_data member might not be setup
correctly?

Thanks,

Larry

2011-07-30 14:32:40

by Justin Piszcz

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure



On Sat, 30 Jul 2011, Justin Piszcz wrote:

>
>
> On Sat, 30 Jul 2011, Justin Piszcz wrote:
>
>>
>>
>> On Sat, 30 Jul 2011, Stanislaw Gruszka wrote:
>>
>>> We should clear skb->data not skb itself. Bug was introduced by:
>>> commit 0b8004aa12d13ec750d102ba4082a95f0107c649 "rt2x00: Properly
>>> reserve room for descriptors in skbs".
>>>
>>> Cc: [email protected] # 2.6.36+
>>> Signed-off-by: Stanislaw Gruszka <[email protected]>
>>> ---
>>> drivers/net/wireless/rt2x00/rt2800lib.c | 3 +--
>>> 1 files changed, 1 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/wireless/rt2x00/rt2800lib.c
>>> b/drivers/net/wireless/rt2x00/rt2800lib.c
>>> index 75d2c6c..f94d669 100644
>>> --- a/drivers/net/wireless/rt2x00/rt2800lib.c
>>> +++ b/drivers/net/wireless/rt2x00/rt2800lib.c
>>> @@ -703,8 +703,7 @@ void rt2800_write_beacon(struct queue_entry *entry,
>>> struct txentry_desc *txdesc)
>>> /*
>>> * Add space for the TXWI in front of the skb.
>>> */
>>> - skb_push(entry->skb, TXWI_DESC_SIZE);
>>> - memset(entry->skb, 0, TXWI_DESC_SIZE);
>>> + memset(skb_push(entry->skb, TXWI_DESC_SIZE), 0, TXWI_DESC_SIZE);
>>>
>>> /*
>>> * Register descriptor details in skb frame descriptor.
>>> --
>>> 1.7.4
>>>
>>
>> Hi,
>>
>> Testing w/ Linux kernel 3.0:
>>
>> # patch -p1 < ../patch3
>> patching file drivers/net/wireless/rt2x00/rt2800lib.c
>> Hunk #1 succeeded at 784 (offset 81 lines).
>>
>> Compiled/installed.
>>
>> Will reboot shortly, need to run a backup first, thanks.
>>
>> Justin.
>>
>>
>
> Hi,
>
> Patched, its working, latency is MUCH better than before the patch!
>
> PING atomw.internal.lan (192.168.0.2) 56(84) bytes of data.
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=1 ttl=64 time=0.535
> ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=2 ttl=64 time=0.589
> ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=3 ttl=64 time=48.8
> ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=4 ttl=64 time=1.14
> ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=5 ttl=64 time=64.5
> ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=6 ttl=64 time=0.582
> ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=7 ttl=64 time=80.6
> ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=8 ttl=64 time=51.9
> ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=9 ttl=64 time=35.5
> ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=10 ttl=64 time=0.614
> ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=11 ttl=64 time=0.497
> ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=12 ttl=64 time=41.5
> ms
>
> $ uptime
> 10:19:18 up 5 min, 2 users, load average: 1.32, 1.02, 0.47
>
> It hasn't crashed yet, if you don't hear from me/stating it has crashed
> again,
> then you can assume its stable, I'll let you know if anything happens.
>

Hmm..

The joy was short lived :(
$ ssh: connect to host atomw port 22: No route to host

Going to see if there is a dump now.

Justin.


2011-07-30 17:10:27

by Justin Piszcz

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure



On Sat, 30 Jul 2011, Justin Piszcz wrote:

>
>
> On Sat, 30 Jul 2011, Andreas Hartmann wrote:
>
>> Hi Justin,
>>
>> if you want to test, if the module works stable, run this script for one
>> or two hours. It will stress the driver and the hardware.
>>
>> If it doesn't crash and if the throughput is stable, you can hope, that
>> the driver is ok for daily work.
>>
>>
>> #!/bin/sh
>>
>> dest="server" # set the servername
>>
>> while true ; do
>> netperf -t TCP_MAERTS -H $dest
>> netperf -t TCP_STREAM -H $dest
>> netperf -t TCP_SENDFILE -H $dest
>> done
>>
>>
>> Start the script on the client. On the server start netserver.
>> You get netperf from http://www.netperf.org/netperf/
>>
>>
>> Andreas
>>
>
> Hello Andreas,
>
> Thanks for the suggestion; however, it is crashing repeatedly after about
> 5-10 minutes, so it is not needed yet.
>
> Here you go, crash 2:
> http://home.comcast.net/~jpiszcz/20110730/2630-rt2800usb-crash2p1.jpg
> http://home.comcast.net/~jpiszcz/20110730/2630-rt2800usb-crash2p2.jpg
>
> Justin.
>

(This is with the patch provided earlier)

atom:/usr/src/linux# cd drivers/net/wireless/rt2x00/
atom:/usr/src/linux/drivers/net/wireless/rt2x00# grep 'memset(skb_push(entry->skb, TXWI_DESC_SIZE), 0, TXWI_DESC_SIZE)' rt2800lib.c
memset(skb_push(entry->skb, TXWI_DESC_SIZE), 0, TXWI_DESC_SIZE);
atom:/usr/src/linux/drivers/net/wireless/rt2x00# cat /usr/src/linux/.version
2
atom:/usr/src/linux/drivers/net/wireless/rt2x00# uname -a
Linux atom 3.0.0 #2 SMP Sat Jul 30 08:34:18 EDT 2011 x86_64 GNU/Linux
atom:/usr/src/linux/drivers/net/wireless/rt2x00#

Justin.
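
The grep in the session above confirms that the fixed line is present. A
slightly stricter check, sketched here on the assumption that the patch
changes only that one memset call, also confirms that the old form is gone.
check_txwi_fix is a hypothetical helper name, and the check looks at the
source file only, not at the module that is actually loaded.

```sh
#!/bin/sh
# Report whether a copy of rt2800lib.c carries the TXWI memset fix.
# Exit status: 0 = patched, 1 = still unpatched, 2 = neither form found.
check_txwi_fix() {
    src=$1
    # Old form: clears the start of the sk_buff structure itself.
    if grep -qF 'memset(entry->skb, 0, TXWI_DESC_SIZE)' "$src"; then
        echo "unpatched: memset still targets the skb structure"
        return 1
    fi
    # New form: clears the area returned by skb_push(), i.e. the packet data.
    if grep -qF 'memset(skb_push(entry->skb, TXWI_DESC_SIZE), 0, TXWI_DESC_SIZE)' "$src"; then
        echo "patched: memset targets the pushed TXWI area"
        return 0
    fi
    echo "unknown: neither form found"
    return 2
}

# Example:
#   check_txwi_fix /usr/src/linux/drivers/net/wireless/rt2x00/rt2800lib.c
```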


2011-07-30 17:07:14

by Justin Piszcz

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure



On Sat, 30 Jul 2011, Andreas Hartmann wrote:

> Hi Justin,
>
> if you want to test, if the module works stable, run this script for one
> or two hours. It will stress the driver and the hardware.
>
> If it doesn't crash and if the throughput is stable, you can hope, that
> the driver is ok for daily work.
>
>
> #!/bin/sh
>
> dest="server" # set the servername
>
> while true ; do
> netperf -t TCP_MAERTS -H $dest
> netperf -t TCP_STREAM -H $dest
> netperf -t TCP_SENDFILE -H $dest
> done
>
>
> Start the script on the client. On the server start netserver.
> You get netperf from http://www.netperf.org/netperf/
>
>
> Andreas
>

Hello Andreas,

Thanks for the suggestion; however, it is crashing repeatedly after about
5-10 minutes, so it is not needed yet.

Here you go, crash 2:
http://home.comcast.net/~jpiszcz/20110730/2630-rt2800usb-crash2p1.jpg
http://home.comcast.net/~jpiszcz/20110730/2630-rt2800usb-crash2p2.jpg

Justin.


2011-07-30 14:02:30

by Ivo Van Doorn

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure

On Sat, Jul 30, 2011 at 1:32 PM, Stanislaw Gruszka <[email protected]> wrote:
> We should clear skb->data not skb itself. Bug was introduced by:
> commit 0b8004aa12d13ec750d102ba4082a95f0107c649 "rt2x00: Properly
> reserve room for descriptors in skbs".
>
> Cc: [email protected] # 2.6.36+
> Signed-off-by: Stanislaw Gruszka <[email protected]>

Acked-by: Ivo van Doorn <[email protected]>

> ---
> drivers/net/wireless/rt2x00/rt2800lib.c | 3 +--
> 1 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/wireless/rt2x00/rt2800lib.c b/drivers/net/wireless/rt2x00/rt2800lib.c
> index 75d2c6c..f94d669 100644
> --- a/drivers/net/wireless/rt2x00/rt2800lib.c
> +++ b/drivers/net/wireless/rt2x00/rt2800lib.c
> @@ -703,8 +703,7 @@ void rt2800_write_beacon(struct queue_entry *entry, struct txentry_desc *txdesc)
> /*
> * Add space for the TXWI in front of the skb.
> */
> - skb_push(entry->skb, TXWI_DESC_SIZE);
> - memset(entry->skb, 0, TXWI_DESC_SIZE);
> + memset(skb_push(entry->skb, TXWI_DESC_SIZE), 0, TXWI_DESC_SIZE);
>
> /*
> * Register descriptor details in skb frame descriptor.
> --
> 1.7.4
>
>

2011-07-28 05:33:26

by Andreas Hartmann

Subject: Re: 3.0: rt2800usb(Kernel PANIC) vs. rt2780sta(GOOD/2.6.38)

Hi Justin,


You're right. The rt2800usb driver is unusable for me, too. You could
try the original driver from Ralink
(http://www.ralinktech.com/support.php?s=2) and see whether it works.


Andreas

2011-07-30 13:47:10

by Gertjan van Wingerde

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure

On 07/30/11 13:32, Stanislaw Gruszka wrote:
> We should clear skb->data not skb itself. Bug was introduced by:
> commit 0b8004aa12d13ec750d102ba4082a95f0107c649 "rt2x00: Properly
> reserve room for descriptors in skbs".
>
> Cc: [email protected] # 2.6.36+
> Signed-off-by: Stanislaw Gruszka <[email protected]>

Ouch. Good catch. Let me go find a brown paper bag now :-(

Not sure if this is the source of Justin's problems, though.

Acked-by: Gertjan van Wingerde <[email protected]>

> ---
> drivers/net/wireless/rt2x00/rt2800lib.c | 3 +--
> 1 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/wireless/rt2x00/rt2800lib.c b/drivers/net/wireless/rt2x00/rt2800lib.c
> index 75d2c6c..f94d669 100644
> --- a/drivers/net/wireless/rt2x00/rt2800lib.c
> +++ b/drivers/net/wireless/rt2x00/rt2800lib.c
> @@ -703,8 +703,7 @@ void rt2800_write_beacon(struct queue_entry *entry, struct txentry_desc *txdesc)
> /*
> * Add space for the TXWI in front of the skb.
> */
> - skb_push(entry->skb, TXWI_DESC_SIZE);
> - memset(entry->skb, 0, TXWI_DESC_SIZE);
> + memset(skb_push(entry->skb, TXWI_DESC_SIZE), 0, TXWI_DESC_SIZE);
>
> /*
> * Register descriptor details in skb frame descriptor.


--
---
Gertjan

2011-07-30 11:39:40

by Justin Piszcz

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure



On Sat, 30 Jul 2011, Stanislaw Gruszka wrote:

> We should clear skb->data not skb itself. Bug was introduced by:
> commit 0b8004aa12d13ec750d102ba4082a95f0107c649 "rt2x00: Properly
> reserve room for descriptors in skbs".
>
> Cc: [email protected] # 2.6.36+
> Signed-off-by: Stanislaw Gruszka <[email protected]>
> ---
> drivers/net/wireless/rt2x00/rt2800lib.c | 3 +--
> 1 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/wireless/rt2x00/rt2800lib.c b/drivers/net/wireless/rt2x00/rt2800lib.c
> index 75d2c6c..f94d669 100644
> --- a/drivers/net/wireless/rt2x00/rt2800lib.c
> +++ b/drivers/net/wireless/rt2x00/rt2800lib.c
> @@ -703,8 +703,7 @@ void rt2800_write_beacon(struct queue_entry *entry, struct txentry_desc *txdesc)
> /*
> * Add space for the TXWI in front of the skb.
> */
> - skb_push(entry->skb, TXWI_DESC_SIZE);
> - memset(entry->skb, 0, TXWI_DESC_SIZE);
> + memset(skb_push(entry->skb, TXWI_DESC_SIZE), 0, TXWI_DESC_SIZE);
>
> /*
> * Register descriptor details in skb frame descriptor.
> --
> 1.7.4
>

Hi,

Testing w/ Linux kernel 3.0:

# patch -p1 < ../patch3
patching file drivers/net/wireless/rt2x00/rt2800lib.c
Hunk #1 succeeded at 784 (offset 81 lines).

Compiled/installed.

Will reboot shortly, need to run a backup first, thanks.

Justin.


2011-08-04 08:03:12

by Justin Piszcz

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure



On Wed, 3 Aug 2011, Stanislaw Gruszka wrote:

> On Wed, Aug 03, 2011 at 01:44:10PM -0400, Justin Piszcz wrote:
> > Tested that patch:
> >
> > 1. The driver no longer automatically loads, I had to manually modprobe it.
> This must be some other problem not related with patch.
>
> > 2. After loading, I get this (keep getting these)
> >
> > [ 384.054538] phy0 -> rt2800_txdone: Error - Data pending
> > [ 384.072773] phy0 -> rt2800_txdone: Error - Data pending
> > [ 384.096545] phy0 -> rt2800_txdone: Error - Data pending
> > [ 384.117301] phy0 -> rt2800_txdone: Error - Data pending
> > [ 384.537586] phy0 -> rt2800_txdone: Error - Data pending
> > [ 384.555716] phy0 -> rt2800_txdone: Error - Data pending
> > [ 384.573903] phy0 -> rt2800_txdone: Error - Data pending
> > [ 384.599465] phy0 -> rt2800_txdone: Error - Data pending
> > [ 384.618523] phy0 -> rt2800_txdone: Error - Data pending
> You can remove line
>
> ERROR(rt2x00dev, "Data pending\n");
>
> from rt2800_txdone to stop seeing this. It's kinda interesting
> how frequent this happens.
>
> > No crash yet, but bad ping again too (always with the rt2800usb) driver:
> >
> > 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=1 ttl=64 time=53.1 ms
> > 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=2 ttl=64 time=285 ms
> > 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=3 ttl=64 time=89.6 ms
> > 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=4 ttl=64 time=120 ms
> > 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=5 ttl=64 time=42.2 ms
> > 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=6 ttl=64 time=156 ms
> > 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=7 ttl=64 time=77.2 ms
>
> I think patch could have site effect to not run tx queue while we have pending
> data on it. Do you have such ping times always (i.e. after a 5 minutes, 10
> minutes, 20 ... ) or just randomly?
>
> Stanislaw

Hi,

So the patch is good but did not solve the problem. After several hours, the
main USB wireless stick went offline:

Aug 3 22:02:16 atom kernel: [23424.597037] phy0 -> rt2800_txdone: Error - Data pending
Aug 3 22:02:16 atom kernel: [23424.597168] phy0 -> rt2800_txdone: Error - Data pending
Aug 3 22:02:16 atom kernel: [23424.598047] phy0 -> rt2800_txdone: Error - Data pending
Aug 3 22:02:16 atom kernel: [23424.598411] phy0 -> rt2800_txdone: Error - Data pending
Aug 3 22:02:16 atom kernel: [23424.599414] phy0 -> rt2800_txdone: Error - Data pending
Aug 3 22:02:16 atom kernel: [23424.599549] phy0 -> rt2800_txdone: Error - Data pending
Aug 3 22:02:16 atom kernel: [23424.599664] phy0 -> rt2800_txdone: Error - Data pending
Aug 3 22:47:12 atom kernel: [26120.279774] wlan0: deauthenticated from hidden (Reason: 2)
Aug 3 22:47:13 atom kernel: [26121.728581] wlan0: authenticate with hidden (try 1)
Aug 3 22:47:13 atom kernel: [26121.730109] wlan0: authenticated
Aug 3 22:47:13 atom kernel: [26121.745088] wlan0: associate with hidden (try 1)
Aug 3 22:47:13 atom kernel: [26121.750738] wlan0: RX ReassocResp from hidden (capab=0x431 status=0 aid=2)
Aug 3 22:47:13 atom kernel: [26121.750746] wlan0: associated
Aug 3 22:47:14 atom kernel: [26121.845641] wlan0: Wrong control channel in association response: configured center-freq: 2417 hti-cfreq: 2437 hti->control_chan: 6 band: 0. Disabling HT.


( at this point wlan0 is offline )

$ ssh atomw
ssh: connect to host atomw port 22: No route to host

BUT: It did not crash the kernel; however, interestingly (I have two of these
sticks), now phy1 starts showing the same symptoms wlan0 had (phy0).

Aug 3 22:49:46 atom kernel: [26274.088150] phy1 -> rt2800_txdone: Error - Data pending
Aug 3 22:49:46 atom kernel: [26274.147869] phy1 -> rt2800_txdone: Error - Data pending
Aug 3 22:49:46 atom kernel: [26274.152743] phy1 -> rt2800_txdone: Error - Data pending

Are these sticks just incompatible with the rt2800usb driver?

Justin.
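
To put a number on how often each adapter logs the error, the kernel log can
be tallied per phy. This is a minimal sketch, assuming the log lines keep the
"phyN -> rt2800_txdone: Error - Data pending" shape shown above;
count_data_pending is a hypothetical helper name.

```sh
#!/bin/sh
# Count "Data pending" errors per phy in kernel log text read on stdin.
# Works for both raw dmesg lines and syslog-prefixed lines, because the
# phy name is always the field directly before the "->" separator.
count_data_pending() {
    awk '
        /rt2800_txdone: Error - Data pending/ {
            for (i = 2; i <= NF; i++)
                if ($i == "->") { count[$(i - 1)]++; break }
        }
        END {
            for (phy in count) printf "%s %d\n", phy, count[phy]
        }
    ' | sort
}

# Example:
#   dmesg | count_data_pending
```

Run over the syslog excerpt quoted in this message, it would report phy0 7
and phy1 3.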


2011-08-03 18:32:33

by Justin Piszcz

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure



On Wed, 3 Aug 2011, Justin Piszcz wrote:

>
>
> On Wed, 3 Aug 2011, Justin Piszcz wrote:
>

For the rt2800usb driver:

It has not crashed in 1hr with the 3.0 kernel:

[ 1027.102856] phy0 -> rt2800_txdone: Error - Data pending
[ 1027.110496] phy0 -> rt2800_txdone: Error - Data pending
[ 1027.114880] phy0 -> rt2800_txdone: Error - Data pending
[ 1027.117616] phy0 -> rt2800_txdone: Error - Data pending
[ 1027.120272] phy0 -> rt2800_txdone: Error - Data pending
[ 1027.143371] phy0 -> rt2800_txdone: Error - Data pending
[ 1027.147252] phy0 -> rt2800_txdone: Error - Data pending
[ 1027.150505] phy0 -> rt2800_txdone: Error - Data pending
[ 1027.153541] phy0 -> rt2800_txdone: Error - Data pending
[ 1027.156118] phy0 -> rt2800_txdone: Error - Data pending
[ 1027.167250] phy0 -> rt2800_txdone: Error - Data pending
[ 1089.588039] phy0 -> rt2800_txdone: Error - Data pending
[ 1089.623248] phy0 -> rt2800_txdone: Error - Data pending
[ 1089.626370] phy0 -> rt2800_txdone: Error - Data pending

Will reboot again and see if it auto-loads (or if a manual modprobe is
necessary)..

Justin.

2011-08-03 18:42:06

by Justin Piszcz

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure



On Wed, 3 Aug 2011, Justin Piszcz wrote:

>
>
> On Wed, 3 Aug 2011, Stanislaw Gruszka wrote:
>
>> On Wed, Aug 03, 2011 at 01:44:10PM -0400, Justin Piszcz wrote:
>>> Tested that patch:
>>>

Hi,

[ 3552.765072] phy0 -> rt2800_txdone: Error - Data pending
[ 3552.774580] phy0 -> rt2800_txdone: Error - Data pending
[ 3985.215031] ieee80211 phy0: wlan0: No probe response from AP (hidden) after 500ms, disconnecting.
[ 3985.262684] cfg80211: Calling CRDA for country: XX
[ 4011.806042] wlan0: authenticate with (hidden) (try 1)
[ 4011.807577] wlan0: authenticated
[ 4011.824287] wlan0: associate with (hidden) (try 1)
[ 4011.828180] wlan0: RX ReassocResp from (hidden) (capab=0x431 status=0 aid=2)
[ 4011.828189] wlan0: associated
[ 4035.209032] ieee80211 phy0: wlan0: No probe response from AP (hidden) after 500ms, disconnecting.
[ 4035.248719] cfg80211: Calling CRDA to update world regulatory domain
[ 4061.777466] wlan0: authenticate with (hidden) (try 1)
[ 4061.786410] wlan0: authenticated
[ 4061.818460] wlan0: associate with (hidden) (try 1)
[ 4061.822059] wlan0: RX ReassocResp from (hidden) (capab=0x431 status=0 aid=2)
[ 4061.822068] wlan0: associated

Saw this on the rt2800usb driver (w/ your patch), is this normal?

Justin.


2011-08-05 16:13:42

by Aleksandar Milivojevic

Subject: Re: [rt2x00-users] [PATCH] rt2x00: rt2800: fix zeroing skb structure

On Thu, Aug 4, 2011 at 1:03 AM, Justin Piszcz <[email protected]> wrote:
> Aug 3 22:47:12 atom kernel: [26120.279774] wlan0: deauthenticated from
> hidden (Reason: 2)

Thinking about it, I'm also seeing a similar "deauthenticated" message
in the kernel logs on my system now and then (with a Linksys WUSB600N, hw
rev 1, rt2870 chip). It happens a few times a day if I leave my system on
24/7. In my case, the reason code listed is "3". Most times it
re-authenticates with the AP automatically, though sometimes it gives up
and I need to tell it manually to try to reconnect to the AP (using
NetworkManager).

I haven't given it much thought before, as I usually have this system
powered on for only a few hours at a time, not long enough for the problem
to occur, unless I forget to power it off and it stays on overnight
(and then by the morning it's bound to have logged at least a few
"deauthenticated" messages).
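
For readers decoding these log lines: the number after "Reason:" is the
reason code carried in the 802.11 deauthentication frame. Below is a minimal
lookup sketch in C that covers only the first three codes of the standard.
The name deauth_reason_str() is made up for this example and is not a
mac80211 function.

```c
/*
 * Hypothetical helper (not part of mac80211): map an IEEE 802.11
 * reason code to a short description. Only codes 1-3 are covered.
 */
static const char *deauth_reason_str(unsigned int code)
{
	switch (code) {
	case 1:
		return "unspecified reason";
	case 2:
		return "previous authentication no longer valid";
	case 3:
		return "sending station is leaving (or has left) the IBSS or ESS";
	default:
		return "not covered by this sketch";
	}
}
```

With this, deauth_reason_str(2) describes the "Reason: 2" case reported
earlier in the thread, and deauth_reason_str(3) the one seen here.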

2011-08-04 12:43:19

by Stanislaw Gruszka

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure

On Thu, Aug 04, 2011 at 04:03:11AM -0400, Justin Piszcz wrote:
> So the patch is good but did not solve the problem, after several hours, the
> main USB wireless stick went off-line:
>
> Aug 3 22:02:16 atom kernel: [23424.597037] phy0 -> rt2800_txdone: Error - Data pending
> Aug 3 22:02:16 atom kernel: [23424.597168] phy0 -> rt2800_txdone: Error - Data pending
> Aug 3 22:02:16 atom kernel: [23424.598047] phy0 -> rt2800_txdone: Error - Data pending
> Aug 3 22:02:16 atom kernel: [23424.598411] phy0 -> rt2800_txdone: Error - Data pending
> Aug 3 22:02:16 atom kernel: [23424.599414] phy0 -> rt2800_txdone: Error - Data pending
> Aug 3 22:02:16 atom kernel: [23424.599549] phy0 -> rt2800_txdone: Error - Data pending
> Aug 3 22:02:16 atom kernel: [23424.599664] phy0 -> rt2800_txdone: Error - Data pending
> Aug 3 22:47:12 atom kernel: [26120.279774] wlan0: deauthenticated from hidden (Reason: 2)
> Aug 3 22:47:13 atom kernel: [26121.728581] wlan0: authenticate with hidden (try 1)
> Aug 3 22:47:13 atom kernel: [26121.730109] wlan0: authenticated
> Aug 3 22:47:13 atom kernel: [26121.745088] wlan0: associate with hidden (try 1)
> Aug 3 22:47:13 atom kernel: [26121.750738] wlan0: RX ReassocResp from hidden (capab=0x431 status=0 aid=2)
> Aug 3 22:47:13 atom kernel: [26121.750746] wlan0: associated
> Aug 3 22:47:14 atom kernel: [26121.845641] wlan0: Wrong control channel in association response: configured center-freq: 2417 hti-cfreq: 2437 hti->control_chan: 6 band: 0. Disabling HT.

Well, the driver still needs some fixes. I have 2 of these dongles and see
some problems here with reading registers, which could be related, I
think. I'm going to fix the issues that I have locally. Then we will see if
the problems are fixed for you as well. For now, you will need to use the
drivers from the Ralink site, if they work well for you.

> ( at this point wlan0 is offline )
>
> $ ssh atomw
> ssh: connect to host atomw port 22: No route to host
>
> BUT: It did not crash the kernel; however, interestingly (I have two of these
> sticks), now phy1 starts showing the same symptoms wlan0 had (phy0).

I'm going to post a modified/cleaned-up patch.

> Aug 3 22:49:46 atom kernel: [26274.088150] phy1 -> rt2800_txdone: Error - Data pending
> Aug 3 22:49:46 atom kernel: [26274.147869] phy1 -> rt2800_txdone: Error - Data pending
> Aug 3 22:49:46 atom kernel: [26274.152743] phy1 -> rt2800_txdone: Error - Data pending
>
> Are these sticks just incompatible with the rt2800usb driver?

If rt2800usb contains the USB ID of the stick, it should work.

Stanislaw

2011-08-03 17:44:12

by Justin Piszcz

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure



On Wed, 3 Aug 2011, Justin Piszcz wrote:

>
>
> On Wed, 3 Aug 2011, Stanislaw Gruszka wrote:
>
>> On Sat, Jul 30, 2011 at 01:07:11PM -0400, Justin Piszcz wrote:
>>> Here you go, crash 2:
>>> http://home.comcast.net/~jpiszcz/20110730/2630-rt2800usb-crash2p1.jpg
>>> http://home.comcast.net/~jpiszcz/20110730/2630-rt2800usb-crash2p2.jpg
>>
>> Here is the next (draft) patch to test:
>
> Thanks,
>
> # patch -p1 < ../patch4
> patching file drivers/net/wireless/rt2x00/rt2800lib.c
> patching file drivers/net/wireless/rt2x00/rt2800lib.h
> patching file drivers/net/wireless/rt2x00/rt2800usb.c
> patching file drivers/net/wireless/rt2x00/rt2x00queue.c
> patching file drivers/net/wireless/rt2x00/rt2x00usb.c
> #
>
> I'll give this driver one more chance (quickly) and let you know if it
> crashes
> afterwards (usually crashes in 5-10 minutes), but most likely I am going to
> move over to some other wireless usb sticks, got a couple think penguin usb
> wifi sticks that use the carl driver, 'open source' wireless usb sticks,
> going to give them a try next.
>
> Also got some other ones as well, hopefully one of them will work without
> 900ms of lag. The rt2870sta was the best driver in 2.6.38.x series that I
> have ever used for wireless USB sticks.
>
> Justin.
>
>

Hi,

Tested that patch:

1. The driver no longer automatically loads, I had to manually modprobe it.
2. After loading, I get this (keep getting these)

[ 384.054538] phy0 -> rt2800_txdone: Error - Data pending
[ 384.072773] phy0 -> rt2800_txdone: Error - Data pending
[ 384.096545] phy0 -> rt2800_txdone: Error - Data pending
[ 384.117301] phy0 -> rt2800_txdone: Error - Data pending
[ 384.537586] phy0 -> rt2800_txdone: Error - Data pending
[ 384.555716] phy0 -> rt2800_txdone: Error - Data pending
[ 384.573903] phy0 -> rt2800_txdone: Error - Data pending
[ 384.599465] phy0 -> rt2800_txdone: Error - Data pending
[ 384.618523] phy0 -> rt2800_txdone: Error - Data pending

No crash yet, but bad ping again too (always with the rt2800usb driver):

64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=1 ttl=64 time=53.1 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=2 ttl=64 time=285 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=3 ttl=64 time=89.6 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=4 ttl=64 time=120 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=5 ttl=64 time=42.2 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=6 ttl=64 time=156 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=7 ttl=64 time=77.2 ms

(Have another machine, testing the carl driver kernel 3.0 on it)

Not having too much luck there either.

[586342.990975] carl9170: Unknown symbol __ieee80211_get_tx_led_name (err 0)
[586342.991059] carl9170: Unknown symbol __ieee80211_get_assoc_led_name (err 0)
[586459.214057] carl9170: Unknown symbol __ieee80211_get_tx_led_name (err 0)
[586459.214131] carl9170: Unknown symbol __ieee80211_get_assoc_led_name (err 0)

Justin.


2011-08-03 17:31:20

by Justin Piszcz

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure



On Wed, 3 Aug 2011, Stanislaw Gruszka wrote:

> On Sat, Jul 30, 2011 at 01:07:11PM -0400, Justin Piszcz wrote:
>> Here you go, crash 2:
>> http://home.comcast.net/~jpiszcz/20110730/2630-rt2800usb-crash2p1.jpg
>> http://home.comcast.net/~jpiszcz/20110730/2630-rt2800usb-crash2p2.jpg
>
> Here is the next (draft) patch to test:

Thanks,

# patch -p1 < ../patch4
patching file drivers/net/wireless/rt2x00/rt2800lib.c
patching file drivers/net/wireless/rt2x00/rt2800lib.h
patching file drivers/net/wireless/rt2x00/rt2800usb.c
patching file drivers/net/wireless/rt2x00/rt2x00queue.c
patching file drivers/net/wireless/rt2x00/rt2x00usb.c
#

I'll give this driver one more chance (quickly) and let you know if it crashes
afterwards (usually crashes in 5-10 minutes), but most likely I am going to
move over to some other wireless usb sticks, got a couple think penguin usb
wifi sticks that use the carl driver, 'open source' wireless usb sticks,
going to give them a try next.

Also got some other ones as well, hopefully one of them will work without
900ms of lag. The rt2870sta was the best driver in 2.6.38.x series that I
have ever used for wireless USB sticks.

Justin.


2011-08-03 18:49:29

by Justin Piszcz

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure



On Wed, 3 Aug 2011, Justin Piszcz wrote:

>
>
> On Wed, 3 Aug 2011, Justin Piszcz wrote:
>
>>
>>
>> On Wed, 3 Aug 2011, Stanislaw Gruszka wrote:
>>
>>> On Wed, Aug 03, 2011 at 01:44:10PM -0400, Justin Piszcz wrote:
>>>> Tested that patch:
>>>>

Ok,

Now with two machines (and multiple Wi-Fi USB sticks), both reboot cleanly
and everything is working, with no errors except for the phy0 ones noted earlier.

No further crashes; I'll update if I see any more.

[ 26.114025] wlan1: no IPv6 routers present
[ 34.682724] phy0 -> rt2800_txdone: Error - Data pending
[ 34.699746] phy0 -> rt2800_txdone: Error - Data pending
[ 34.716553] phy0 -> rt2800_txdone: Error - Data pending


Justin.


2011-08-03 18:35:53

by Justin Piszcz

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure



On Wed, 3 Aug 2011, Stanislaw Gruszka wrote:

> On Wed, Aug 03, 2011 at 01:44:10PM -0400, Justin Piszcz wrote:
>> Tested that patch:
>>
>> 1. The driver no longer automatically loads, I had to manually modprobe it.
> This must be some other problem, not related to the patch.
Yes, will check this again shortly.

>
>> 2. After loading, I get this (keep getting these)
>>
>> [ 384.054538] phy0 -> rt2800_txdone: Error - Data pending
>> [ 384.072773] phy0 -> rt2800_txdone: Error - Data pending
>> [ 384.096545] phy0 -> rt2800_txdone: Error - Data pending
>> [ 384.117301] phy0 -> rt2800_txdone: Error - Data pending
>> [ 384.537586] phy0 -> rt2800_txdone: Error - Data pending
>> [ 384.555716] phy0 -> rt2800_txdone: Error - Data pending
>> [ 384.573903] phy0 -> rt2800_txdone: Error - Data pending
>> [ 384.599465] phy0 -> rt2800_txdone: Error - Data pending
>> [ 384.618523] phy0 -> rt2800_txdone: Error - Data pending
> You can remove the line
>
> ERROR(rt2x00dev, "Data pending\n");
>
> from rt2800_txdone to stop seeing this. It's kinda interesting
> how frequently this happens.
It seems to happen somewhat often; e.g. if you apt-get install a package, it
will pop up 3-4 times.

>
>> No crash yet, but bad ping again too (always with the rt2800usb driver):
>>
>> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=1 ttl=64 time=53.1 ms
>> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=2 ttl=64 time=285 ms
>> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=3 ttl=64 time=89.6 ms
>> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=4 ttl=64 time=120 ms
>> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=5 ttl=64 time=42.2 ms
>> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=6 ttl=64 time=156 ms
>> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=7 ttl=64 time=77.2 ms
>
> I think the patch could have the side effect of not running the tx queue
> while we have pending data on it. Do you get such ping times always (i.e.
> after 5 minutes, 10 minutes, 20 ...) or just randomly?

Randomly.

64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=1 ttl=64 time=203 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=2 ttl=64 time=27.6 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=3 ttl=64 time=159 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=4 ttl=64 time=80.3 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=5 ttl=64 time=194 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=6 ttl=64 time=114 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=7 ttl=64 time=34.0 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=8 ttl=64 time=162 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=9 ttl=64 time=83.1 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=10 ttl=64 time=214 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=11 ttl=64 time=134 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=12 ttl=64 time=149 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=13 ttl=64 time=70.3 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=14 ttl=64 time=201 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=15 ttl=64 time=122 ms
64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=16 ttl=64 time=42.7 ms

34 packets transmitted, 34 received, 0% packet loss, time 33047ms
rtt min/avg/max/mdev = 23.620/117.755/216.279/62.033 ms

Still working so far though, at least the box has not crashed yet, but I'll
give it some more time.

Justin.
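
A note on the "rtt min/avg/max/mdev" summary line above: as far as I know,
iputils ping reports mdev as the population standard deviation of the
round-trip times, i.e. sqrt(mean(rtt^2) - mean(rtt)^2). The sketch below
computes the same four figures from an array of samples. The struct and
function names are invented for illustration, and it cannot reproduce the
exact numbers above because only 16 of the 34 samples appear in the mail.

```c
#include <assert.h>
#include <stddef.h>

/* Summary in the same order as ping's "min/avg/max/mdev" line (ms). */
struct rtt_stats {
	double min, avg, max, mdev;
};

/* Newton iteration for sqrt, so the sketch needs no libm. */
static double sqrt_newton(double v)
{
	double x = v > 1.0 ? v : 1.0;
	int i;

	if (v <= 0.0)
		return 0.0;
	for (i = 0; i < 60; i++)
		x = 0.5 * (x + v / x);
	return x;
}

/* Hypothetical helper: summarize n round-trip times given in ms. */
static struct rtt_stats rtt_summarize(const double *rtt_ms, size_t n)
{
	struct rtt_stats s;
	double sum = 0.0, sum_sq = 0.0, var;
	size_t i;

	assert(n > 0);
	s.min = s.max = rtt_ms[0];
	for (i = 0; i < n; i++) {
		if (rtt_ms[i] < s.min)
			s.min = rtt_ms[i];
		if (rtt_ms[i] > s.max)
			s.max = rtt_ms[i];
		sum += rtt_ms[i];
		sum_sq += rtt_ms[i] * rtt_ms[i];
	}
	s.avg = sum / (double)n;
	/* Population variance: mean of squares minus square of mean. */
	var = sum_sq / (double)n - s.avg * s.avg;
	s.mdev = sqrt_newton(var);
	return s;
}
```

An mdev of about 62 ms on an average of about 118 ms, as in the summary
above, means the latency swings by roughly half its mean from packet to
packet, which matches the "randomly" description.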

2011-08-03 16:00:51

by Stanislaw Gruszka

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure

On Sat, Jul 30, 2011 at 01:07:11PM -0400, Justin Piszcz wrote:
> Here you go, crash 2:
> http://home.comcast.net/~jpiszcz/20110730/2630-rt2800usb-crash2p1.jpg
> http://home.comcast.net/~jpiszcz/20110730/2630-rt2800usb-crash2p2.jpg

Here is the next (draft) patch to test:

diff --git a/drivers/net/wireless/rt2x00/rt2800lib.c b/drivers/net/wireless/rt2x00/rt2800lib.c
index 2a6aa85..2d662f3 100644
--- a/drivers/net/wireless/rt2x00/rt2800lib.c
+++ b/drivers/net/wireless/rt2x00/rt2800lib.c
@@ -725,14 +725,14 @@ void rt2800_txdone_entry(struct queue_entry *entry, u32 status)
 }
 EXPORT_SYMBOL_GPL(rt2800_txdone_entry);

-void rt2800_txdone(struct rt2x00_dev *rt2x00dev)
+int rt2800_txdone(struct rt2x00_dev *rt2x00dev)
 {
 	struct data_queue *queue;
 	struct queue_entry *entry;
 	u32 reg;
 	u8 qid;

-	while (kfifo_get(&rt2x00dev->txstatus_fifo, &reg)) {
+	while (kfifo_peek(&rt2x00dev->txstatus_fifo, &reg)) {

 		/* TX_STA_FIFO_PID_QUEUE is a 2-bit field, thus
 		 * qid is guaranteed to be one of the TX QIDs
@@ -742,25 +742,38 @@ void rt2800_txdone(struct rt2x00_dev *rt2x00dev)
 		if (unlikely(!queue)) {
 			WARNING(rt2x00dev, "Got TX status for an unavailable "
 					   "queue %u, dropping\n", qid);
-			continue;
+			goto next_reg;
 		}

 		/*
 		 * Inside each queue, we process each entry in a chronological
 		 * order. We first check that the queue is not empty.
 		 */
-		entry = NULL;
-		while (!rt2x00queue_empty(queue)) {
+		while (1) {
+			entry = NULL;
+			if (rt2x00queue_empty(queue))
+				break;
+
 			entry = rt2x00queue_get_entry(queue, Q_INDEX_DONE);
+
+			if (test_bit(ENTRY_OWNER_DEVICE_DATA, &entry->flags) ||
+			    !test_bit(ENTRY_DATA_STATUS_PENDING, &entry->flags)) {
+				ERROR(rt2x00dev, "Data pending\n");
+				return 1;
+			}
+
 			if (rt2800_txdone_entry_check(entry, reg))
 				break;
 		}

-		if (!entry || rt2x00queue_empty(queue))
-			break;
-
-		rt2800_txdone_entry(entry, reg);
+		if (entry)
+			rt2800_txdone_entry(entry, reg);
+next_reg:
+		if (kfifo_get(&rt2x00dev->txstatus_fifo, &reg) != 1)
+			ERROR(rt2x00dev, "BUG on kfifo");
 	}
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(rt2800_txdone);

diff --git a/drivers/net/wireless/rt2x00/rt2800lib.h b/drivers/net/wireless/rt2x00/rt2800lib.h
index f2d1594..54d0d14 100644
--- a/drivers/net/wireless/rt2x00/rt2800lib.h
+++ b/drivers/net/wireless/rt2x00/rt2800lib.h
@@ -152,7 +152,7 @@ void rt2800_write_tx_data(struct queue_entry *entry,
 			  struct txentry_desc *txdesc);
 void rt2800_process_rxwi(struct queue_entry *entry, struct rxdone_entry_desc *txdesc);

-void rt2800_txdone(struct rt2x00_dev *rt2x00dev);
+int rt2800_txdone(struct rt2x00_dev *rt2x00dev);
 void rt2800_txdone_entry(struct queue_entry *entry, u32 status);

 void rt2800_write_beacon(struct queue_entry *entry, struct txentry_desc *txdesc);
diff --git a/drivers/net/wireless/rt2x00/rt2800usb.c b/drivers/net/wireless/rt2x00/rt2800usb.c
index ba82c97..a8a9f79 100644
--- a/drivers/net/wireless/rt2x00/rt2800usb.c
+++ b/drivers/net/wireless/rt2x00/rt2800usb.c
@@ -464,7 +464,8 @@ static void rt2800usb_work_txdone(struct work_struct *work)
 	struct data_queue *queue;
 	struct queue_entry *entry;

-	rt2800_txdone(rt2x00dev);
+	if (rt2800_txdone(rt2x00dev))
+		goto out;

 	/*
 	 * Process any trailing TX status reports for IO failures,
@@ -488,6 +489,7 @@ static void rt2800usb_work_txdone(struct work_struct *work)
 		}
 	}

+out:
 	/*
 	 * The hw may delay sending the packet after DMA complete
 	 * if the medium is busy, thus the TX_STA_FIFO entry is
diff --git a/drivers/net/wireless/rt2x00/rt2x00queue.c b/drivers/net/wireless/rt2x00/rt2x00queue.c
index ab8c16f..7635014 100644
--- a/drivers/net/wireless/rt2x00/rt2x00queue.c
+++ b/drivers/net/wireless/rt2x00/rt2x00queue.c
@@ -784,6 +784,57 @@ bool rt2x00queue_for_each_entry(struct data_queue *queue,
 }
 EXPORT_SYMBOL_GPL(rt2x00queue_for_each_entry);

+static void rt2x00queue_validate(struct data_queue *queue)
+{
+	int idx0, idx1, idx2;
+	int tmp = 0;
+	int u;
+
+	switch (queue->qid) {
+	case QID_AC_VO:
+	case QID_AC_VI:
+	case QID_AC_BE:
+	case QID_AC_BK:
+		goto do_validate;
+	default:
+		return;
+	}
+
+do_validate:
+
+	idx0 = queue->index[2];
+	idx1 = queue->index[1];
+	idx2 = queue->index[0];
+
+	tmp = idx0 + queue->length;
+	if (tmp >= queue->limit)
+		tmp -= queue->limit;
+
+	if (tmp != idx2) {
+		u = 0;
+		goto print;
+	}
+
+	if (idx2 >= idx0) {
+		u = 1;
+		if (idx1 < idx0 || idx1 > idx2)
+			goto print;
+	} else {
+		bool check = (idx1 >= idx0 && idx1 < queue->limit) ||
+			     (idx1 >= 0 && idx1 <= idx2);
+
+		u = 2;
+		if (!check)
+			goto print;
+	}
+
+	return;
+
+print:
+	printk(KERN_CRIT "%s %d idx(%d, %d, %d) tmp %d\n", __func__, u, idx2, idx1, idx0, tmp);
+	BUG_ON(1);
+}
+
 struct queue_entry *rt2x00queue_get_entry(struct data_queue *queue,
 					  enum queue_index index)
 {
@@ -800,6 +851,7 @@ struct queue_entry *rt2x00queue_get_entry(struct data_queue *queue,

 	entry = &queue->entries[queue->index[index]];

+	rt2x00queue_validate(queue);
 	spin_unlock_irqrestore(&queue->index_lock, irqflags);

 	return entry;
@@ -832,6 +884,7 @@ void rt2x00queue_index_inc(struct queue_entry *entry, enum queue_index index)
 		queue->count++;
 	}

+	rt2x00queue_validate(queue);
 	spin_unlock_irqrestore(&queue->index_lock, irqflags);
 }

diff --git a/drivers/net/wireless/rt2x00/rt2x00usb.c b/drivers/net/wireless/rt2x00/rt2x00usb.c
index 8f90f62..de3720f 100644
--- a/drivers/net/wireless/rt2x00/rt2x00usb.c
+++ b/drivers/net/wireless/rt2x00/rt2x00usb.c
@@ -265,14 +265,14 @@ static void rt2x00usb_interrupt_txdone(struct urb *urb)
 	if (!test_and_clear_bit(ENTRY_OWNER_DEVICE_DATA, &entry->flags))
 		return;

-	if (rt2x00dev->ops->lib->tx_dma_done)
-		rt2x00dev->ops->lib->tx_dma_done(entry);
-
 	/*
 	 * Report the frame as DMA done
 	 */
 	rt2x00lib_dmadone(entry);

+	if (rt2x00dev->ops->lib->tx_dma_done)
+		rt2x00dev->ops->lib->tx_dma_done(entry);
+
 	/*
 	 * Check if the frame was correctly uploaded
 	 */
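
The rt2x00queue_validate() debug check in the patch above can be read as a
ring-buffer invariant: the "done" index plus the number of queued entries
must land on the write index (modulo the queue size), and the "DMA done"
index must lie between the two, allowing for wrap-around. Below is a
standalone userspace sketch of the same test. It assumes index[0], index[1]
and index[2] correspond to Q_INDEX, Q_INDEX_DMA_DONE and Q_INDEX_DONE; the
struct and function names are made up, and it returns false where the patch
would print and hit BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the index fields of struct data_queue. */
struct ring {
	int head;      /* next entry to be written (index[0] in the patch) */
	int dma_done;  /* next entry awaiting DMA completion (index[1]) */
	int done;      /* next entry awaiting TX status (index[2]) */
	int length;    /* entries currently queued */
	int limit;     /* ring size */
};

/* Mirrors the checks in the patch; true means the indices are consistent. */
static bool ring_indices_valid(const struct ring *q)
{
	int expect_head;

	assert(q->limit > 0);

	/* done + length, wrapped once, must equal the write index. */
	expect_head = q->done + q->length;
	if (expect_head >= q->limit)
		expect_head -= q->limit;
	if (expect_head != q->head)
		return false;

	/* No wrap: done <= dma_done <= head. */
	if (q->head >= q->done)
		return q->dma_done >= q->done && q->dma_done <= q->head;

	/* Wrapped: dma_done is in [done, limit) or in [0, head]. */
	return (q->dma_done >= q->done && q->dma_done < q->limit) ||
	       (q->dma_done >= 0 && q->dma_done <= q->head);
}
```

For example, with limit 8, done 6 and length 4, the write index has to be 2,
and a dma_done of 7 or 1 passes while 4 fails.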


2011-08-03 18:33:36

by Stanislaw Gruszka

Subject: Re: [PATCH] rt2x00: rt2800: fix zeroing skb structure

On Wed, Aug 03, 2011 at 01:44:10PM -0400, Justin Piszcz wrote:
> Tested that patch:
>
> 1. The driver no longer automatically loads, I had to manually modprobe it.
This must be some other problem, not related to the patch.

> 2. After loading, I get this (keep getting these)
>
> [ 384.054538] phy0 -> rt2800_txdone: Error - Data pending
> [ 384.072773] phy0 -> rt2800_txdone: Error - Data pending
> [ 384.096545] phy0 -> rt2800_txdone: Error - Data pending
> [ 384.117301] phy0 -> rt2800_txdone: Error - Data pending
> [ 384.537586] phy0 -> rt2800_txdone: Error - Data pending
> [ 384.555716] phy0 -> rt2800_txdone: Error - Data pending
> [ 384.573903] phy0 -> rt2800_txdone: Error - Data pending
> [ 384.599465] phy0 -> rt2800_txdone: Error - Data pending
> [ 384.618523] phy0 -> rt2800_txdone: Error - Data pending
You can remove the line

ERROR(rt2x00dev, "Data pending\n");

from rt2800_txdone to stop seeing this. It's kinda interesting
how frequently this happens.

> No crash yet, but bad ping again too (always with the rt2800usb driver):
>
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=1 ttl=64 time=53.1 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=2 ttl=64 time=285 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=3 ttl=64 time=89.6 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=4 ttl=64 time=120 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=5 ttl=64 time=42.2 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=6 ttl=64 time=156 ms
> 64 bytes from atomw.internal.lan (192.168.0.2): icmp_req=7 ttl=64 time=77.2 ms

I think the patch could have the side effect of not running the tx queue
while we have pending data on it. Do you get such ping times always (i.e.
after 5 minutes, 10 minutes, 20 ...) or just randomly?

Stanislaw
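
To illustrate the side effect mentioned here: with the test patch,
rt2800_txdone() looks at the oldest TX status with kfifo_peek() and only
removes it with kfifo_get() once the matching queue entry has been handled,
so a status whose entry is not ready stays queued and holds back the ones
behind it. Below is a userspace sketch of that peek-then-consume loop. The
small FIFO, the function names and the "bit 0 set means ready" convention
are all invented for the example; this is not the kfifo API or rt2x00 code.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_CAP 8	/* power of two, so index wrap-around stays consistent */

/* Hypothetical fixed-size FIFO of status words. */
struct fifo {
	unsigned int buf[FIFO_CAP];
	size_t head;	/* count of items read so far */
	size_t tail;	/* count of items written so far */
};

static bool fifo_put(struct fifo *f, unsigned int v)
{
	if (f->tail - f->head == FIFO_CAP)
		return false;	/* full */
	f->buf[f->tail++ % FIFO_CAP] = v;
	return true;
}

/* Look at the oldest item without removing it. */
static bool fifo_peek(const struct fifo *f, unsigned int *v)
{
	if (f->head == f->tail)
		return false;	/* empty */
	*v = f->buf[f->head % FIFO_CAP];
	return true;
}

/* Drop the oldest item, if any. */
static void fifo_skip(struct fifo *f)
{
	if (f->head != f->tail)
		f->head++;
}

/* Assumption for this sketch: bit 0 set means the entry can be completed. */
static bool status_ready(unsigned int status)
{
	return status & 1u;
}

/*
 * Handle statuses in order and stop at the first one that is not ready,
 * leaving it (and everything after it) queued for a later run.
 * Returns the number of statuses consumed.
 */
static size_t drain_ready(struct fifo *f)
{
	unsigned int status;
	size_t handled = 0;

	while (fifo_peek(f, &status)) {
		if (!status_ready(status))
			break;		/* early exit: nothing is lost */
		fifo_skip(f);		/* consume only after handling */
		handled++;
	}
	return handled;
}
```

The trade-off is visible in drain_ready(): no status is dropped, but one
entry that is not ready delays every status behind it until the next run,
which is one possible source of added latency.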