2010-06-18 18:02:12

by David Ellingsworth

Subject: rt61pci AP performance issues

I've been trying unsuccessfully for some time to use a rt61pci based
wireless card as an Access Point for my network. With kernel version
2.6.32-4 (Debian version) the AP works but has intermittent problems.
Notably, that version continuously prints the message
"ieee80211_tx_status: headroom too small" to my system log and the
driver simply stops working after a random amount of time. The first
of these errors was fixed some time ago after I reported it, but the
other still remains, even with the latest wireless-testing driver, and
there's no indication as to the cause of the issue. With regard to
this issue, it appears that performance steadily drops until it
becomes unusable. With the latest wireless-testing driver I can
reliably reproduce this issue by trying to transfer a large file from
my server to a client. Any help diagnosing and correcting this issue
would be greatly appreciated.

Regards,

David Ellingsworth


2010-06-29 16:55:25

by David Ellingsworth

Subject: Re: rt61pci AP performance issues

On Fri, Jun 18, 2010 at 2:02 PM, David Ellingsworth
<[email protected]> wrote:
> I've been trying unsuccessfully for some time to use a rt61pci based
> wireless card as an Access Point for my network. With kernel version
> 2.6.32-4 (Debian version) the AP works but has intermittent problems.
> Notably, that version continuously prints the message
> "ieee80211_tx_status: headroom too small" to my system log and the
> driver simply stops working after a random amount of time. The first
> of these errors was fixed some time ago after I reported it, but the
> other still remains, even with the latest wireless-testing driver, and
> there's no indication as to the cause of the issue. With regard to
> this issue, it appears that performance steadily drops until it
> becomes unusable. With the latest wireless-testing driver I can
> reliably reproduce this issue by trying to transfer a large file from
> my server to a client. Any help diagnosing and correcting this issue
> would be greatly appreciated.
>

I have not received a response to this issue, so I'll try to clarify
it a bit. With the latest wireless-testing tree, clients are able to
associate to my rt61pci AP but the connection fails after about 10s or
less while transferring a file. Between v2.6.32 and the latest
wireless-testing tree, there have been many changes. These changes
introduced a lot of variability in the connection speed and stability.
All of this has made bisecting this issue very difficult.

Using v2.6.32 as a "good" point and ad57b053612 (the HEAD of
wireless-testing at the time) as a "bad" point, I began bisecting. I
then compiled and tested 15 different kernels. Some of them were
remarkably better and others were much worse. When the bisection
completed, git had identified a single commit as the cause.
Unfortunately, the
commit it identified had nothing to do with the rt61pci driver or the
wireless networking stack. I must therefore have done something wrong
during my testing.

Fortunately, all that work did not go to waste. During my testing, I
identified c91f48d61c as being remarkably better than both v2.6.32 and
the HEAD of the wireless testing branch. At that commit, the
performance is fast and stable and much closer to the HEAD of the
wireless-testing branch. Using it as the first good position, and
bisecting only 'net/mac80211/' and 'drivers/net/wireless/rt2x00/' I
was able to identify e46754f8c9333 as the first bad commit. This
commit was a merge of another branch and consists of about four other
commits, two of which directly affect mac80211.
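
For anyone wanting to repeat the path-limited bisection, here is a
minimal sketch. It assumes the bad endpoint is still ad57b053612 (the
text above only names the new good commit), and bisect_paths is a
hypothetical helper name, not something from the original run.

```shell
#!/bin/sh
# Hypothetical helper: start a bisection that only considers commits
# touching the given paths.
# usage: bisect_paths <bad> <good> <path>...
bisect_paths() {
    bad=$1
    good=$2
    shift 2
    # Everything after "--" is a path limit; commits that do not touch
    # these paths are skipped when git picks the next revision to test.
    git bisect start "$bad" "$good" -- "$@"
}

# Example call (bad endpoint is an assumption, see above):
#   bisect_paths ad57b053612 c91f48d61c \
#       net/mac80211/ drivers/net/wireless/rt2x00/
```

After building and testing each kernel, mark it with "git bisect good"
or "git bisect bad". "git bisect log" prints the record of the session,
which can be redirected to a file such as bisect.log.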

I haven't conducted any other tests beyond what was bisected in the
attached log. Performance across all those revisions remained somewhat
fast, and my markings were based solely on client link failure. Each
bad commit resulted in the link failing after about 10s while
transferring a file. At that point, the transfer would stop and the AP
would be unreachable via ping, but the client remained associated to
the AP. About 30s later, the client would time out and re-associate
to the AP, reactivating the link.
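
To put a number on the time to failure, a rough sketch like the one
below could be run on the client while the transfer is in progress. The
function name link_watch and the 120s default limit are assumptions
made for illustration, and the -W timeout flag is the iputils (Linux)
form of ping.

```shell
#!/bin/sh
# Hypothetical helper: ping the AP once per second and report how long
# the link stayed reachable.
# usage: link_watch <ap-address> [max-seconds]
link_watch() {
    ap=$1
    limit=${2:-120}
    start=$(date +%s)
    # One echo request per iteration, waiting at most 2s for the reply.
    while ping -c 1 -W 2 "$ap" >/dev/null 2>&1; do
        now=$(date +%s)
        if [ $((now - start)) -ge "$limit" ]; then
            echo "link still up after ${limit}s"
            return 0
        fi
        sleep 1
    done
    echo "link lost after $(( $(date +%s) - start ))s"
    return 1
}
```

Start the file transfer, then run link_watch with the AP's address in a
second terminal. The reported time is only accurate to within a few
seconds because of the ping timeout and the one-second sleep.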

Given that I limited my bisection to only the above directories, the
cause could very well be in some other commit than the one identified.
Nonetheless, I hope this additional information helps identify the
cause so that it can be corrected. There's also the chance that even
if this is caused by a change in mac80211, the rt61pci driver may
still be at fault due to unexpected behavior.

Regards,

David Ellingsworth


Attachments:
bisect.log (1.86 kB)

2010-07-05 06:14:43

by Helmut Schaa

Subject: Re: rt61pci AP performance issues

On Tuesday 29 June 2010, David Ellingsworth wrote:
> I haven't conducted any other tests beyond what was bisected in the
> attached log. Performance across all those revisions remained somewhat
> fast, and my markings were based solely on client link failure. Each
> bad commit resulted in the link failing after about 10s while
> transferring a file. At that point, the transfer would stop and the AP
> would be unreachable via ping, but the client remained associated to
> the AP. About 30s later, the client would time out and re-associate
> to the AP, reactivating the link.

Could you please check if the queues get stuck in that situation? I was
able to sometimes observe queues getting stuck on rt2800.

Just do the following (on the AP mode machine):

mount -t debugfs none /sys/kernel/debug
cat /sys/kernel/debug/ieee80211/phy0/queues

That should give you something like:

00: 0x00000000/0
01: 0x00000000/0
02: 0x00000000/0
03: 0x00000000/0

If a 1 appears anywhere in place of a 0, that queue is stopped (which
can, and in some scenarios must, happen without causing problems). If
the 1 stays there for a longer time, nothing restarted the queue and
your connection is most likely stuck.
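
To catch a stuck queue without re-running cat by hand, the file can be
polled. This is only a sketch: the function names are made up, and it
assumes the hex field before the slash is the one that becomes non-zero
when a queue is stopped.

```shell
#!/bin/sh
# Hypothetical helper: print the index of every queue whose hex field
# (the value before the slash) is non-zero.
# usage: stopped_queues <queues-file>
stopped_queues() {
    # Split "00: 0x00000000/0" into "00", "0x00000000" and "0".
    awk -F'[: /]+' 'NF >= 2 && $2 !~ /^0x0+$/ { print $1 }' "$1"
}

# Hypothetical helper: sample the file repeatedly with a timestamp.
# usage: watch_queues <queues-file> [interval-seconds] [samples]
watch_queues() {
    file=$1
    interval=${2:-1}
    count=${3:-30}
    i=0
    while [ "$i" -lt "$count" ]; do
        s=$(stopped_queues "$file" | tr '\n' ' ')
        echo "$(date +%T) stopped: ${s:-none}"
        sleep "$interval"
        i=$((i + 1))
    done
}
```

Pass watch_queues the ieee80211/phy0/queues file from wherever debugfs
is mounted. A queue index that keeps showing up in every sample after
the transfer stalls would be the one that never got restarted.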

Helmut