2021-03-27 00:25:12

by Ben Greear

Subject: mac80211 mlme connection probing woes

I've been digging into a bug where our ath10k driver shows throughput
drops at regular intervals. We've bisected this down to a patch
where we disable the firmware connection monitor and instead ask mac80211 to
do the connection monitoring.

This works fine in the 5.4 kernel, but in 5.11 it does not work well.

First, if anyone has an idea what change might have caused this,
please let me know. We will try with ath9k, assuming it uses
the mac80211 connection monitor, to see if it has the same issue.

And second, if a STA is passing traffic to/from the AP,
why probe the connection at all? We get tx status showing success, and also valid
rx packets from the AP. Shouldn't that cause the probe timer to be
deferred?

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2021-03-30 19:20:45

by Ben Greear

Subject: Re: mac80211 mlme connection probing woes (bisected)

On 3/26/21 5:18 PM, Ben Greear wrote:
> I've been digging into a bug where our ath10k driver shows periodic
> throughput drops on regular intervals.  We've bisected this down to a patch
> where we disable the firmware connection monitor, and so ask mac80211 to
> do the connection monitor.
>
> This works fine in 5.4 kernel, but in 5.11, it does not work well.
>
> First, if anyone has an idea what change might have caused this,
> please let me know.  We will try with ath9k, assuming it uses
> the mac80211 connection monitor to see if it has the same issue.

Ok, it took a while, but I bisected to this:

commit 9abf4e49830d606f18a05111cfa96b8f0b724c7d (HEAD, refs/bisect/good-9abf4e49830d606f18a05111cfa96b8f0b724c7d)
Author: Felix Fietkau <[email protected]>
Date: Tue Sep 8 14:36:56 2020 +0200

    mac80211: optimize station connection monitor

    Calling mod_timer for every rx/tx packet can be quite expensive.
    Instead of constantly updating the timer, we can simply let it run out
    and check the timestamp of the last ACK or rx packet to re-arm it.

    Signed-off-by: Felix Fietkau <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Johannes Berg <[email protected]>
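
For anyone following along, the difference that commit message describes can
be sketched roughly as below. This is only a toy model in Python, not the
mac80211 code: the function names, the unit-less timestamps, and the assumption
that a probe simply re-arms the timer for another full timeout are all
made up for illustration.

```python
def simulate_eager(packet_times, timeout, end):
    """Toy model: re-arm the idle timer on every rx/tx packet.

    packet_times must be sorted and <= end. Returns (timer_arms, probe_times).
    """
    arms = 1                  # timer armed once at t=0
    deadline = timeout
    probes = []
    for t in packet_times:
        # Handle timer expiries that occur before this packet arrives.
        while deadline <= t:
            probes.append(deadline)
            deadline += timeout   # assumption: a probe re-arms for a full timeout
            arms += 1
        deadline = t + timeout    # stands in for the per-packet mod_timer() call
        arms += 1
    while deadline <= end:
        probes.append(deadline)
        deadline += timeout
        arms += 1
    return arms, probes


def simulate_lazy(packet_times, timeout, end):
    """Toy model: packets only record a timestamp; the timer runs out and
    is re-armed from the last recorded activity.
    """
    arms = 1
    deadline = timeout
    last_activity = 0
    probes = []
    i = 0
    while deadline <= end:
        # Packets seen before the timer fires only update the timestamp.
        while i < len(packet_times) and packet_times[i] < deadline:
            last_activity = packet_times[i]
            i += 1
        if deadline - last_activity >= timeout:
            probes.append(deadline)            # idle long enough: probe
            deadline += timeout
        else:
            deadline = last_activity + timeout  # recent activity: just re-arm
        arms += 1
    return arms, probes


if __name__ == "__main__":
    steady = list(range(1, 100))   # one packet per time unit, hypothetical load
    print("eager:", simulate_eager(steady, 30, 100))
    print("lazy: ", simulate_lazy(steady, 30, 100))
```

In this model, steady traffic never triggers a probe in either variant; the
lazy one just re-arms a handful of times instead of once per packet. If the
real code behaves differently under traffic, that difference would be the
thing to look for.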

To do the bisect, I copied my ath10k-ct driver from the 5.4 kernel (a well-tested driver)
over whatever ath10k code was in the particular kernel commit I was testing. I tweaked
the driver slightly to compile and work against the stock kernel.

The failure case is that when in station mode and transmitting UDP in the upload direction
(with a few packets per second of download traffic too),
the throughput drops to zero every 30 seconds, stays quiesced
for about 5 seconds, and then resumes. The station stays connected.

In previous debugging, I noticed this only happens when my driver enables mac80211
connection monitoring. In a different bisect attempt, my driver hit the issue when
changing how the tx-descriptor count was configured, but I am not fully confident that
is a root cause, and changing things a bit made that problem go away.

The problem is not seen with ath9k, nor stock ath10k. Stock ath10k uses in-firmware
connection monitoring.

Felix, if you have any ideas of likely failure points, please let me know.

Thanks,
Ben