Hello list,
Context:
I am using one device in AP mode, the other in client mode.
The client uses wpa_supplicant to do *background scans on channels other than the data channel*.
I am running iperf (UDP) *from the AP* to the client.
My device is Cavium development board-based (Octeon III CPU), equipped with Compex WLE350NX.
It used to work correctly with kernel 3.18 and an old 2015 wireless backport.
I have now updated to kernel 4.9 and wireless backport 4.19.32-1, the latest one from the OpenWRT trunk (I previously used backport-2017-11-01, with the same failure).
I am running wireshark with Airpcap to monitor the wireless link.
Problem:
When the client scans offchannel, it correctly sends nullfunc frames around the offchannel period, with the PM bit set then unset.
However, during this time, the AP continues to send data to the client.
This results in a lot of lost frames, even though I set the powersave buffers to high values on the AP side.
After some research I saw that the same kind of problem was fixed [1] and even re-fixed, but since there have been so many changes in the queue management since then, I cannot really compare that fix with the current state of the driver.
Any ideas, patches, or directions for research?
J.P. Tosoni - ACKSYS
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5519541d5a5f19893546883547e2f0f2e5934df7
Jean-Pierre TOSONI <[email protected]> writes:
> Context:
>
> I am using one device in AP mode, the other in client mode.
> The client uses wpa_supplicant to do *background scans on channels other than the data channel*.
> I am running iperf (UDP) *from the AP* to the client.
>
> My device is Cavium development board-based (Octeon III CPU), equipped with Compex WLE350NX.
> It used to work correctly with kernel 3.18 and an old 2015 wireless backport.
> Now I updated to kernel 4.9 and the wireless backport 4.19.32-1, the
> last one from the OpenWRT trunk. (previously I used
> backport-2017-11-01 with the same failure).
>
> I am running wireshark with Airpcap to spy the wireless link.
>
> Problem:
>
> When the client scans offchannel, it correctly sends nullfunc frames around the offchannel period, with the PM bit set then unset.
>
> However, during this time, the AP continues to send data to the client.
>
> This results in a lot of lost frames, though I set the powersave buffers to high values on the AP side.
>
>
> After some research I saw that the same kind of problem was fixed [1]
> and even re-fixed, but since there have been so many changes in the
> queue management since then, I cannot really compare that fix with the
> current state of the driver.
>
> Any idea / patch / directions of research ?
>
> J.P. Tosoni - ACKSYS
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5519541d5a5f19893546883547e2f0f2e5934df7
This is an old report from July, but it sounds like a pretty serious
problem to me, and power save problems are difficult for normal users to
detect, which makes this even more important. Has anyone else noticed
anything similar?
Something which would help a lot is to bisect when the problem appeared.
For example, if you can connect your ath9k board to an x86 device you
could start testing upstream kernel releases[1] directly from git
(v3.18, v4.9, v4.19 and so on). By an upstream kernel release I mean a
release directly from Linus' tree:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
This way there are no out-of-tree patches applied, and we can also rule
out problems in backports, which makes it easier to find the root cause.
Just make sure first that you can reproduce the problem with the
upstream kernel, so that you don't waste time bisecting a problem which
is not there :)
First find the last working release ("good") and the first release which
has the bug ("bad"). This way it's a lot easier to find the culprit and
fix it.
You can check out a specific release like this:
git checkout v4.9
Even better, you can then use git-bisect to find the actual commit
which broke it:
git bisect start
git bisect bad v4.12
git bisect good v4.11
https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html
Please do let me know how it goes; this issue should be fixed. Power
save problems are always tricky and cause a bad user experience.