Hello,
using rt2800usb with a Linksys WUSB600N v2 (rt3572) crashes the whole
machine (SMP, Core i5, Linux 3.0) when the module is unloaded after it has
been used for a short period of time:
- 2 times netperf -t TCP_MAERTS -H host
- 2 times netperf -t TCP_STREAM -H host
The error message in /var/log/messages is:
phy0 -> rt2800_wait_wpdma_ready: Error - WPDMA TX/RX busy, aborting
After the crash, you have to hard reset the machine.
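The reproduction steps above can be written down as a small script. This is only a sketch: the helper names `build_repro_commands` and `run_repro`, the `host` placeholder and the final `rmmod rt2800usb` call are assumptions for illustration (the report only says the module was unloaded), and running the sequence on an affected kernel may hang the machine.

```python
import subprocess


def build_repro_commands(host, runs=2, module="rt2800usb"):
    """Return the command sequence from the report as argv lists.

    Hypothetical helper: 'host' is the netperf server and 'module' is the
    driver unloaded at the end. Using rmmod is an assumption; the report
    only says "unloading the module".
    """
    commands = []
    for test in ("TCP_MAERTS", "TCP_STREAM"):
        for _ in range(runs):
            commands.append(["netperf", "-t", test, "-H", host])
    commands.append(["rmmod", module])
    return commands


def run_repro(host):
    """Run the sequence; needs netperf installed and root rights for rmmod."""
    for argv in build_repro_commands(host):
        print("running:", " ".join(argv))
        # check=False: a failing netperf run should not stop the sequence.
        subprocess.run(argv, check=False)
```

Calling `build_repro_commands("host")` alone only builds the list, so the sequence can be inspected before anything is executed.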
BTW:
The throughput to STA is about 4 MB/s.
The throughput from STA is about 1 _k_B/s
Kind regards,
Andreas
On Wed, Aug 24, 2011 at 02:21:51PM +0200, Andreas Hartmann wrote:
> Stanislaw Gruszka wrote:
> > On Tue, Aug 23, 2011 at 11:07:36AM +0200, Andreas Hartmann wrote:
> >> using rt2800usb with a Linksys WUSB600N v2 (rt3572) crashes the complete
> >> machine (SMP, Core i5, linux 3.0) on unloading the module after using it
> >> for a short period of time:
> >> - 2 times netperf -t TCP_MAERTS -H host
> >> - 2 times netperf -t TCP_STREAM -H host
> >>
> >> The error message in /var/log/messages is:
> >>
> >> phy0 -> rt2800_wait_wpdma_ready: Error - WPDMA TX/RX busy, aborting
> >>
> >> After the crash, you have to hard reset the machine.
>
> [...]
>
> > Otherwise perhaps you could photo crash logs on virtual terminal
> > (switched by Alt+Ctrl+F2 from X-window) or by using netconsole or kdump.
>
> There is no crash dump - the machine just hangs up itself and the fan is
> getting loader and loader (until max), because the machine is getting
> hot more and more.
The kernel should generate some information when it hangs, at least when
debug options such as CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_OBJECTS and
CONFIG_LOCKUP_DETECTOR are enabled.
I just realized that compat-wireless-3.1-rc1-1 does not contain some
rt2x00 fixes:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=4b1bfb7d2d125af6653d6c2305356b2677f79dc6
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=df71c9cfceea801e7e26e2c74241758ef9c042e5
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=674db1344443204b6ce3293f2df8fd1b7665deea
Try applying them first, or use compat-wireless-next. If they do not help,
reconfigure the kernel to print messages on lockup.
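Whether the debug options named above are enabled can be checked against the kernel configuration text. The following is a minimal sketch; the helper name `missing_debug_options` is made up for illustration, and the option list contains only the three options mentioned in this mail.

```python
import re

# Options named in this mail; the list is not exhaustive.
WANTED = (
    "CONFIG_DEBUG_SPINLOCK",
    "CONFIG_DEBUG_OBJECTS",
    "CONFIG_LOCKUP_DETECTOR",
)

# Matches lines such as "CONFIG_FOO=y" or "CONFIG_FOO=m".
ENABLED_RE = re.compile(r"^(CONFIG_\w+)=(?:y|m)$")


def missing_debug_options(config_text, wanted=WANTED):
    """Return the wanted options that are not enabled in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        match = ENABLED_RE.match(line.strip())
        if match:
            enabled.add(match.group(1))
    return [opt for opt in wanted if opt not in enabled]
```

The configuration text is commonly found in a file such as /boot/config-<release> or in /proc/config.gz, depending on the distribution; which one exists on a given machine is an assumption to verify.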
Note that I am not able to reproduce the hang using the steps you provided.
However, I do see bad performance, between 6 and 16 Mbit/s measured by netperf
on a connection between two rt2800usb stations through a WRT160NL AP.
I am going to look at this problem when I have a chance.
Stanislaw
On Thu, 25 Aug 2011 18:11:04 +0200,
Stanislaw Gruszka <[email protected]> wrote:
> On Wed, Aug 24, 2011 at 02:21:51PM +0200, Andreas Hartmann wrote:
> > Stanislaw Gruszka wrote:
> > > On Tue, Aug 23, 2011 at 11:07:36AM +0200, Andreas Hartmann wrote:
> > >> using rt2800usb with a Linksys WUSB600N v2 (rt3572) crashes the complete
> > >> machine (SMP, Core i5, linux 3.0) on unloading the module after using it
> > >> for a short period of time:
> > >> - 2 times netperf -t TCP_MAERTS -H host
> > >> - 2 times netperf -t TCP_STREAM -H host
> > >>
> > >> The error message in /var/log/messages is:
> > >>
> > >> phy0 -> rt2800_wait_wpdma_ready: Error - WPDMA TX/RX busy, aborting
> > >>
> > >> After the crash, you have to hard reset the machine.
> >
> > [...]
> >
> > > Otherwise perhaps you could photo crash logs on virtual terminal
> > > (switched by Alt+Ctrl+F2 from X-window) or by using netconsole or kdump.
> >
> > There is no crash dump - the machine just hangs up itself and the fan is
> > getting loader and loader (until max), because the machine is getting
> > hot more and more.
>
> Kernel should generate some information when it hangs, at least when
> debug options are enabled like CONFIG_DEBUG_SPINLOCK,
> CONFIG_DEBUG_OBJECTS, CONFIG_LOCKUP_DETECTOR, ...
>
> I just realized that compat-wireless-3.1-rc1-1, does not contain some
> rt2x00 fixes:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=4b1bfb7d2d125af6653d6c2305356b2677f79dc6
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=df71c9cfceea801e7e26e2c74241758ef9c042e5
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=674db1344443204b6ce3293f2df8fd1b7665deea
>
> Try to apply them first, or use compat-wireless-next. If they not help
> just try to reconfigure kernel to print messages on lockup.
I applied your patches (and the suspend patch) and, with a little luck,
got the following throughput (on a Core i5):
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to host (1.1.1.1) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
87380  16384   16384    10.17    33.48
TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to host (1.1.1.1) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
87380  16384   16384    10.07    73.48
TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to host (1.1.1.1) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
87380  16384   16384    10.35    32.08
These values were only reached after the wlan stack had been reloaded
once. The reload was necessary because the transfer stalled a few
seconds after the module was first loaded.
The system load on one core during the test was 100%. The latency (ping)
was about 5 ms.
I did the same test with a single core CPU (Celeron M). I could see the
same high CPU load during data transfer.
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to host (1.1.1.1) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
87380  16384   16384    10.30    30.61
TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to host (1.1.1.6) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
87380  16384   16384    10.09    63.65
TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to host (1.1.1.6) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
87380  16384   16384    10.50    31.17
The latency on the single-core machine was 0.5 ms (10 times lower!).
Oh, there are two more differences between the single-core and multi-core
machines: the single-core machine runs Linux 2.6.37.6-0.5-desktop
(32 bit), the multi-core machine 3.0.0-39-desktop (64 bit).
Anyway, on both machines I saw transfers suddenly stall like this:
TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to host (1.1.1.1) port 0 AF_INET
send_tcp_maerts: data recv error: Interrupted system call
len was -1
Sometimes it took a long time until the network stack was usable again, or
I even had to reload the modules.
The 144/s status reports (see my other mail in this thread) are still
there when the DEBUG option is set in config.mk.
The throughput during a netperf run is very unsteady. During TCP_MAERTS,
for example, it alternates heavily between 0 and 14 M/s (seen with xosview +n).
During the tests, I saw a few errors like this in messages:
phy0 -> rt2800_wait_wpdma_ready: Error - WPDMA TX/RX busy, aborting.
With the legacy driver, I get _constant_ throughput of up to 16 M/s
for TCP_MAERTS and up to 10 M/s for TCP_STREAM or
TCP_SENDFILE at the same location. The load during these transfers is 0 -
yes, it's really 0.0 - even on the Celeron M machine.
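To compare the netperf figures (reported in 10^6 bits/sec) with the "M/s" values quoted here, the units have to be converted. A small sketch, assuming that "M/s" means 10^6 bytes per second (this is an assumption, the mail does not define the unit) and using the Core i5 results from above:

```python
def mbit_to_mbyte_per_s(mbit_per_s):
    """Convert 10^6 bits/sec into 10^6 bytes/sec (8 bits per byte)."""
    return mbit_per_s / 8.0


# Throughput values taken from the Core i5 netperf runs in this mail.
netperf_results = {
    "TCP_STREAM": 33.48,
    "TCP_MAERTS": 73.48,
    "TCP_SENDFILE": 32.08,
}

for test, mbit in netperf_results.items():
    mbyte = mbit_to_mbyte_per_s(mbit)
    print(f"{test}: {mbit:.2f} Mbit/s = {mbyte:.2f} MB/s")
```

Under that assumption, the best rt2800usb runs come out somewhat below the peak values seen with the legacy driver, in addition to being unsteady.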
> Note, I'm not able to reproduce hangup using steps you provide.
I couldn't reproduce the mentioned hang with your patches. The hang came
up during my first tests without your patches if the DEBUG option
for rt2800usb in the config.mk was switched off! As long as it was
switched on, the machine didn't hang.
> However
> I have bad performance, between 6 and 16 Mbits/s measured by netperf
> on connection between two rt2800usb stations through WRT160NL AP.
> I'm going to look at this problem when I'll have a chance.
Well, that's the real problem (here)! It would be great if this
could be fixed. Something must be really broken if one CPU is
completely used up by network data transfer.
After all the problems that have shown up here, I think it is more or less
luck if the network works at all (once it is put under stress). I
can imagine that on other machines and under other conditions, the machine
could even crash or hang.
As long as the network is mainly idle (just used for a slow internet
line or for ssh -X e.g.), I could see no problem.
I didn't test suspend / resume.
If I want to copy big files or do something like backup /
restore, I can be pretty sure to run into problems, because the
connection isn't stable at all under load.
Therefore I would really appreciate a fix for this problem! You can send it
to me if you have one - I'll test it!
Thank you for your time and work!
Andreas
On Tue, Aug 23, 2011 at 11:07:36AM +0200, Andreas Hartmann wrote:
> using rt2800usb with a Linksys WUSB600N v2 (rt3572) crashes the complete
> machine (SMP, Core i5, linux 3.0) on unloading the module after using it
> for a short period of time:
> - 2 times netperf -t TCP_MAERTS -H host
> - 2 times netperf -t TCP_STREAM -H host
>
> The error message in /var/log/messages is:
>
> phy0 -> rt2800_wait_wpdma_ready: Error - WPDMA TX/RX busy, aborting
>
> After the crash, you have to hard reset the machine.
Did you hibernate the machine at any time before the rmmod?
If so, we have a fix in the wireless-testing tree:
http://git.kernel.org/?p=linux/kernel/git/linville/wireless-testing.git;a=commitdiff;h=543cc38c8fe86deba4169977c61eb88491036837
Otherwise, perhaps you could photograph the crash logs on a virtual terminal
(switch to it with Ctrl+Alt+F2 from X) or capture them using netconsole or kdump.
Stanislaw
Stanislaw Gruszka wrote:
> On Tue, Aug 23, 2011 at 11:07:36AM +0200, Andreas Hartmann wrote:
>> using rt2800usb with a Linksys WUSB600N v2 (rt3572) crashes the complete
>> machine (SMP, Core i5, linux 3.0) on unloading the module after using it
>> for a short period of time:
>> - 2 times netperf -t TCP_MAERTS -H host
>> - 2 times netperf -t TCP_STREAM -H host
>>
>> The error message in /var/log/messages is:
>>
>> phy0 -> rt2800_wait_wpdma_ready: Error - WPDMA TX/RX busy, aborting
>>
>> After the crash, you have to hard reset the machine.
[...]
> Otherwise perhaps you could photo crash logs on virtual terminal
> (switched by Alt+Ctrl+F2 from X-window) or by using netconsole or kdump.
There is no crash dump - the machine just hangs and the fan
gets louder and louder (up to max), because the machine keeps getting
hotter.
Kind regards,
Andreas
Hello Stanislaw,
thank you for spending time on this problem!
On Wed, 24 Aug 2011 11:03:41 +0200,
Stanislaw Gruszka <[email protected]> wrote:
> On Tue, Aug 23, 2011 at 11:07:36AM +0200, Andreas Hartmann wrote:
> > using rt2800usb with a Linksys WUSB600N v2 (rt3572) crashes the complete
> > machine (SMP, Core i5, linux 3.0) on unloading the module after using it
> > for a short period of time:
> > - 2 times netperf -t TCP_MAERTS -H host
> > - 2 times netperf -t TCP_STREAM -H host
> >
> > The error message in /var/log/messages is:
> >
> > phy0 -> rt2800_wait_wpdma_ready: Error - WPDMA TX/RX busy, aborting
> >
> > After the crash, you have to hard reset the machine.
> Did you hibernate machine anytime before the rmmod ?
No, there was no suspend / resume done with this module at all.
I think the cause of this hang lies elsewhere. To get more
information, I switched on debugging in the rt2x00 driver and ran the
tests again. During the tests, I get tons of these warnings
in /var/log/messages:
Aug 24 13:01:55 pc kernel: [22286.549510] phy0 -> rt2800usb_txdone_entry_check: Warning - TX status report missed for queue 2 entry 36
Aug 24 13:01:55 pc kernel: [22286.549530] phy0 -> rt2800usb_txdone_entry_check: Warning - TX status report missed for queue 2 entry 37
Aug 24 13:01:55 pc kernel: [22286.549546] phy0 -> rt2800usb_txdone_entry_check: Warning - TX status report missed for queue 2 entry 39
Aug 24 13:01:55 pc kernel: [22286.549560] phy0 -> rt2800usb_txdone_entry_check: Warning - TX status report missed for queue 2 entry 40
Aug 24 13:01:55 pc kernel: [22286.550511] phy0 -> rt2800usb_txdone_entry_check: Warning - TX status report missed for queue 2 entry 42
Aug 24 13:01:55 pc kernel: [22286.550749] phy0 -> rt2800usb_txdone_entry_check: Warning - TX status report missed for queue 2 entry 44
Aug 24 13:01:55 pc kernel: [22286.550766] phy0 -> rt2800usb_txdone_entry_check: Warning - TX status report missed for queue 2 entry 46
Aug 24 13:01:55 pc kernel: [22286.550780] phy0 -> rt2800usb_txdone_entry_check: Warning - TX status report missed for queue 2 entry 47
Aug 24 13:01:55 pc kernel: [22286.551756] phy0 -> rt2800usb_txdone_entry_check: Warning - TX status report missed for queue 2 entry 48
Aug 24 13:01:55 pc kernel: [22286.552006] phy0 -> rt2800usb_txdone_entry_check: Warning - TX status report missed for queue 2 entry 50
Aug 24 13:01:55 pc kernel: [22286.552032] phy0 -> rt2800usb_txdone_entry_check: Warning - TX status report missed for queue 2 entry 52
(about 144/s)
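A rate like the one above can be estimated by counting the warnings per second of kernel uptime. The following is a rough sketch; the function name `missed_reports_per_second` is made up for illustration, and the regular expression assumes the log line format shown in the excerpt above.

```python
import re
from collections import Counter

# Captures the whole-second part of the kernel timestamp, e.g. "[22286.549510]",
# on lines that contain the rt2800usb "TX status report missed" warning.
MISSED_RE = re.compile(r"\[\s*(\d+)\.\d+\].*TX status report missed")


def missed_reports_per_second(lines):
    """Count 'TX status report missed' warnings per second of kernel uptime."""
    counts = Counter()
    for line in lines:
        match = MISSED_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

The function accepts any iterable of lines, for example an open file object for /var/log/messages, and returns a mapping from uptime second to the number of warnings logged in that second.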
Another entry is:
Aug 24 12:59:46 pc kernel: [22157.608241] phy0 -> rt2x00usb_watchdog_tx_status: Warning - TX queue 0 status timed out, invoke forced tx handler
The measured throughput is:
- to the STA (where rt2800usb runs): about 4 MB/s
- from the STA: about 1 _k_B/s
The Linksys WUSB600N V2 dongle should have a throughput of about 10 M/s
in both directions! I think that once the handling of the queues is fixed,
the problem of the hanging machine will be gone, too!
If I should test some patches or if you need some more information,
please ask - I'll try to provide it!
Kind regards,
Andreas