2019-11-19 06:08:30

by Kan Yan

Subject: [PATCH v11 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

From: Toke Høiland-Jørgensen <[email protected]>

The previous commit added the ability to throttle stations when they queue
too much airtime in the hardware. This commit enables the functionality by
calculating the expected airtime usage of each packet that is dequeued from
the TXQs in mac80211, and accounting that as pending airtime.

The estimated airtime for each skb is stored in the tx_info, so we can
subtract the same amount from the running total when the skb is freed or
recycled. The throttling mechanism relies on this accounting to be
accurate (i.e., that we are not freeing skbs without subtracting any
airtime they were accounted for), so we put the subtraction into
ieee80211_report_used_skb(). As an optimisation, we also subtract the
airtime on regular TX completion, zeroing out the value stored in the
packet afterwards, to avoid having to do an expensive lookup of the station
from the packet data on every packet.

This patch does *not* include any mechanism to wake a throttled TXQ again,
on the assumption that this will happen anyway as a side effect of whatever
freed the skb (most commonly a TX completion).

Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
---
include/net/mac80211.h | 16 ++++++++++++++++
net/mac80211/status.c | 26 ++++++++++++++++++++++++++
net/mac80211/tx.c | 18 ++++++++++++++++++
3 files changed, 60 insertions(+)
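
A note on the tx_time_est helpers added below: the estimate is stored
in 10 bits in units of 4 us, so stored values are rounded down to a
multiple of 4 us and clamped at 4095 us. A minimal usage sketch with
hypothetical values (not part of the diff):

    struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
    u16 est;

    /* Stores 1337 >> 2 = 334 in the 10-bit field and returns the
     * value actually accounted, i.e. 334 << 2 = 1336 us.
     */
    est = ieee80211_info_set_tx_time_est(info, 1337);

    /* Reads back the same quantized value, 1336 us. */
    est = ieee80211_info_get_tx_time_est(info);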

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index ba3f33cc41ea..aa145808e57a 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -1060,6 +1060,22 @@ struct ieee80211_tx_info {
};
};

+static inline u16
+ieee80211_info_set_tx_time_est(struct ieee80211_tx_info *info, u16 tx_time_est)
+{
+ /* We only have 10 bits in tx_time_est, so store airtime
+ * in increments of 4us and clamp the maximum to 2**12-1
+ */
+ info->tx_time_est = min_t(u16, tx_time_est, 4095) >> 2;
+ return info->tx_time_est << 2;
+}
+
+static inline u16
+ieee80211_info_get_tx_time_est(struct ieee80211_tx_info *info)
+{
+ return info->tx_time_est << 2;
+}
+
/**
* struct ieee80211_tx_status - extended tx status info for rate control
*
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index 0e51def35b8a..39da82b35be9 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -670,12 +670,26 @@ static void ieee80211_report_used_skb(struct ieee80211_local *local,
struct sk_buff *skb, bool dropped)
{
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
+ u16 tx_time_est = ieee80211_info_get_tx_time_est(info);
struct ieee80211_hdr *hdr = (void *)skb->data;
bool acked = info->flags & IEEE80211_TX_STAT_ACK;

if (dropped)
acked = false;

+ if (tx_time_est) {
+ struct sta_info *sta;
+
+ rcu_read_lock();
+
+ sta = sta_info_get_by_addrs(local, hdr->addr1, hdr->addr2);
+ ieee80211_sta_update_pending_airtime(local, sta,
+ skb_get_queue_mapping(skb),
+ tx_time_est,
+ true);
+ rcu_read_unlock();
+ }
+
if (info->flags & IEEE80211_TX_INTFL_MLME_CONN_TX) {
struct ieee80211_sub_if_data *sdata;

@@ -877,6 +891,7 @@ static void __ieee80211_tx_status(struct ieee80211_hw *hw,
struct ieee80211_bar *bar;
int shift = 0;
int tid = IEEE80211_NUM_TIDS;
+ u16 tx_time_est;

rates_idx = ieee80211_tx_get_rates(hw, info, &retry_count);

@@ -986,6 +1001,17 @@ static void __ieee80211_tx_status(struct ieee80211_hw *hw,
ieee80211_sta_register_airtime(&sta->sta, tid,
info->status.tx_time, 0);

+ if ((tx_time_est = ieee80211_info_get_tx_time_est(info)) > 0) {
+ /* Do this here to avoid the expensive lookup of the sta
+ * in ieee80211_report_used_skb().
+ */
+ ieee80211_sta_update_pending_airtime(local, sta,
+ skb_get_queue_mapping(skb),
+ tx_time_est,
+ true);
+ ieee80211_info_set_tx_time_est(info, 0);
+ }
+
if (ieee80211_hw_check(&local->hw, REPORTS_TX_ACK_STATUS)) {
if (info->flags & IEEE80211_TX_STAT_ACK) {
if (sta->status_stats.lost_packets)
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 59db921d1086..dfbaab92dbf2 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3551,6 +3551,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,

WARN_ON_ONCE(softirq_count() == 0);

+ if (!ieee80211_txq_airtime_check(hw, txq))
+ return NULL;
+
begin:
spin_lock_bh(&fq->lock);

@@ -3661,6 +3664,21 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
}

IEEE80211_SKB_CB(skb)->control.vif = vif;
+
+ if (local->airtime_flags & AIRTIME_USE_AQL) {
+ u32 airtime;
+
+ airtime = ieee80211_calc_expected_tx_airtime(hw, vif, txq->sta,
+ skb->len);
+ if (airtime) {
+ airtime = ieee80211_info_set_tx_time_est(info, airtime);
+ ieee80211_sta_update_pending_airtime(local, tx.sta,
+ txq->ac,
+ airtime,
+ false);
+ }
+ }
+
return skb;

out:
--
2.24.0.432.g9d3f5f5b63-goog


2020-02-26 20:31:26

by Felix Fietkau

Subject: Re: [PATCH v11 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

Hi,

I've taken a closer look at the AQL implementation and found some
corner cases that need to be addressed soon:

- AQL estimated airtime does not take into account A-MPDU, so it is
significantly overestimating airtime use for aggregated traffic,
especially at high rates.
My proposed solution would be to check for a running aggregation session
and set estimated tx time to something like:
expected_airtime(16 * skb->len) / 16 (see the sketch below).

- We need an API that allows the driver to change the pending airtime
values, e.g. subtract estimated tx time for a packet.
mt76 and ath9k can queue packets inside the driver that are not currently
in the hardware queues. Typically if the txqs have more data than what
gets put into the hardware queue, both drivers will pull an extra frame
and queue it in its private txq struct. This frame will get used on the
next txq scheduling round for that particular station.
If you have lots of stations doing traffic (or having driver buffered
frames in powersave mode), this could use up a sizable chunk of the AQL
budget.
While removing the airtime of those packets would lead to AQL
temporarily underestimating airtime, I think it would be better than
overestimating it.
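
A rough sketch of what I mean for the first point (hedged: the in_ampdu
flag and the wrapper name are made up; only
ieee80211_calc_expected_tx_airtime() is from this series):

    /* Amortize the per-frame overhead over an assumed average
     * aggregate of 16 frames while an A-MPDU session is running.
     */
    static u32 est_tx_airtime(struct ieee80211_hw *hw,
                              struct ieee80211_vif *vif,
                              struct ieee80211_sta *sta,
                              struct sk_buff *skb, bool in_ampdu)
    {
            if (in_ampdu)
                    return ieee80211_calc_expected_tx_airtime(hw, vif, sta,
                                                              16 * skb->len) / 16;

            return ieee80211_calc_expected_tx_airtime(hw, vif, sta,
                                                      skb->len);
    }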

I will work on some patches. What do you think about these issues and my
proposed fixes?

- Felix

2020-02-26 21:57:03

by Toke Høiland-Jørgensen

Subject: Re: [PATCH v11 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

Felix Fietkau <[email protected]> writes:

> Hi,
>
> I've taken a closer look at the AQL implementation and found some
> corner cases that need to be addressed soon:
>
> - AQL estimated airtime does not take into account A-MPDU, so it is
> significantly overestimating airtime use for aggregated traffic,
> especially at high rates.
> My proposed solution would be to check for a running aggregation session
> and set estimated tx time to something like:
> expected_airtime(16 * skb->len) / 16.

This seems reasonable. Not sure if it'll do anything for ath10k (does
mac80211 know the aggregation state for that?), but maybe this is not
such a big issue on that hardware?

> - We need an API that allows the driver to change the pending airtime
> values, e.g. subtract estimated tx time for a packet.
> mt76 and ath9k can queue packets inside the driver that are not currently
> in the hardware queues. Typically if the txqs have more data than what
> gets put into the hardware queue, both drivers will pull an extra frame
> and queue it in its private txq struct. This frame will get used on the
> next txq scheduling round for that particular station.
> If you have lots of stations doing traffic (or having driver buffered
> frames in powersave mode), this could use up a sizable chunk of the AQL
> budget.

I'm a bit more skeptical about this. If the driver buffers a bunch of
packets that are not accounted for, that will hurt that station due to extra
latency when it wakes up. For ath9k, this is the retry_q you're talking
about, right? The number of packets queued on that is fairly limited,
isn't it? What kind of powersave buffering is the driver doing, and why
can't it leave the packets on the TXQ? That would allow them to be
scheduled along with any new ones that might have arrived in the
meantime, which would be a benefit for latency.

I can see how it can be a problem that many stations in powersave can
block transmissions for *other* stations, though. Maybe an alternative
to the driver subtracting airtime could be to have mac80211 do something
like this when a station is put into powersave mode:

local->aql_total_pending_airtime -= sum(sta->airtime[ac].aql_tx_pending)

(where sum is the summation over all ACs)

and the reverse when the station wakes back up? That should keep the
whole device from throttling but still prevent a big queue building up
for that sta when it wakes back up. Would that work, do you think?
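
Something like this minimal sketch (field names as in the pseudo-code
above; locking/atomicity deliberately elided):

    /* Drop a dozing station's pending airtime from the per-device
     * total so it doesn't throttle other stations; call with
     * wake == true to add it back when the station wakes up.
     */
    static void sta_aql_ps_adjust(struct ieee80211_local *local,
                                  struct sta_info *sta, bool wake)
    {
            int ac;

            for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
                    if (wake)
                            local->aql_total_pending_airtime +=
                                    sta->airtime[ac].aql_tx_pending;
                    else
                            local->aql_total_pending_airtime -=
                                    sta->airtime[ac].aql_tx_pending;
            }
    }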

-Toke

2020-02-27 08:25:41

by Kan Yan

Subject: Re: [PATCH v11 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

> - AQL estimated airtime does not take into account A-MPDU, so it is
> significantly overestimating airtime use for aggregated traffic,
> especially at high rates.
> My proposed solution would be to check for a running aggregation session
> and set estimated tx time to something like:
> expected_airtime(16 * skb->len) / 16.

Yes, that's a known limitation that needs to be improved. I actually
posted a comment on this patch:
"[PATCH v10 2/4] mac80211: Import airtime calculation code from mt76"
>
> When a txq is subjected to the queue limit,
> there is usually a significant number of frames being queued and those
> frames are likely being sent out in large aggregations. So, in most
> cases when AQL is active, the medium access overhead can be amortized
> over many frames and the per frame overhead could be considerably
> lower than 36 us, especially at higher data rates. As a result, the
> pending airtime calculated this way could be higher than the actual
> airtime. In my test, I have to compensate that by increasing
> "aql_txq_limit" via debugfs to get the same peak throughput as without
> AQL.


> My proposed solution would be to check for a running aggregation session
> and set estimated tx time to something like:
> expected_airtime(16 * skb->len) / 16.

I think that's a reasonable approximation, but I doubt aggregation
information is available in all architectures. In some architectures,
the firmware may only report aggregation information after the frame has
been transmitted.

In my earlier version of AQL for the out-of-tree ChromeOS kernel,
A-MPDU is dealt with this way: the medium access overhead is only
counted once for each TXQ for all frames released to the hardware over
a 4 ms period, assuming those frames are likely to be aggregated
together.
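
Roughly like this (a sketch from memory; the last_overhead_charge field
and the constant name are made up):

    #define AQL_MEDIUM_ACCESS_US 36 /* assumed per-burst overhead */

    static u32 txq_airtime_with_overhead(struct txq_info *txqi,
                                         u32 frame_airtime)
    {
            /* Charge the medium access overhead at most once per 4 ms
             * window; frames released within the window are assumed to
             * end up in the same aggregate.
             */
            if (time_after(jiffies, txqi->last_overhead_charge +
                                    msecs_to_jiffies(4))) {
                    txqi->last_overhead_charge = jiffies;
                    return frame_airtime + AQL_MEDIUM_ACCESS_US;
            }

            return frame_airtime;
    }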

Instead of calculating the estimated airtime using the last known phy
rate and then trying to add some estimated overhead for medium access
time, another potentially better approach is to use the average data
rate, i.e. bytes_transmitted / firmware_reported_actual_airtime. The
average rate not only includes medium access overhead, but also takes
retries into account.
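
For example (a hedged sketch; both counters are assumed to come from
firmware statistics):

    /* Estimate airtime from the measured average rate:
     * airtime = len / (bytes / airtime) = len * airtime / bytes.
     * The average rate already folds in medium access overhead
     * and retries.
     */
    static u32 est_airtime_from_avg_rate(u64 bytes_transmitted,
                                         u64 reported_airtime_us,
                                         u32 len)
    {
            if (!bytes_transmitted)
                    return 0; /* no history yet; fall back to phy rate */

            return div64_u64((u64)len * reported_airtime_us,
                             bytes_transmitted);
    }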

> - We need an API that allows the driver to change the pending airtime
> values, e.g. subtract estimated tx time for a packet.
> mt76 and ath9k can queue packets inside the driver that are not currently
> in the hardware queues. Typically if the txqs have more data than what
> gets put into the hardware queue, both drivers will pull an extra frame
> and queue it in its private txq struct. This frame will get used on the
> next txq scheduling round for that particular station.
> If you have lots of stations doing traffic (or having driver buffered
> frames in powersave mode), this could use up a sizable chunk of the AQL
> budget.

Not sure I fully understand your concerns here. The AQL budget is per
STA, per TID, so frames queued in the driver's special queue for one
station won't impact another station's budget. Those frames are
counted towards the per interface pending airtime, which could trigger
AQL to switch to use the lower queue limit. In this case, that could
be the desirable behavior when there is heavy traffic.

Regards,
Kan




2020-02-27 08:35:04

by Felix Fietkau

Subject: Re: [PATCH v11 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

On 2020-02-26 22:56, Toke Høiland-Jørgensen wrote:
> Felix Fietkau <[email protected]> writes:
>> - We need an API that allows the driver to change the pending airtime
>> values, e.g. subtract estimated tx time for a packet.
>> mt76 and ath9k can queue packets inside the driver that are not currently
>> in the hardware queues. Typically if the txqs have more data than what
>> gets put into the hardware queue, both drivers will pull an extra frame
>> and queue it in its private txq struct. This frame will get used on the
>> next txq scheduling round for that particular station.
>> If you have lots of stations doing traffic (or having driver buffered
>> frames in powersave mode), this could use up a sizable chunk of the AQL
>> budget.
>
> I'm a bit more skeptical about this. If the driver buffers a bunch of
> packets that are not accounted for, that will hurt that station due to extra
> latency when it wakes up. For ath9k, this is the retry_q you're talking
> about, right? The number of packets queued on that is fairly limited,
> isn't it? What kind of powersave buffering is the driver doing, and why
> can't it leave the packets on the TXQ? That would allow them to be
> scheduled along with any new ones that might have arrived in the
> meantime, which would be a benefit for latency.
For mt76 there should be at most one frame in the retry queue; it's just
a frame that was pulled from the txq in a transmission attempt but that
couldn't be put in the hw queue because it didn't fit in the current
aggregate batch.
If such a frame is in the retry queue and the sta ends up switching to
powersave mode, that frame stays buffered in the driver queue until the
sta wakes up.

> I can see how it can be a problem that many stations in powersave can
> block transmissions for *other* stations, though. Maybe an alternative
> to the driver subtracting airtime could be to have mac80211 do something
> like this when a station is put into powersave mode:
>
> local->aql_total_pending_airtime -= sum(sta->airtime[ac].aql_tx_pending)
>
> (where sum is the summation over all ACs)
>
> and the reverse when the station wakes back up? That should keep the
> whole device from throttling but still prevent a big queue building up
> for that sta when it wakes back up. Would that work, do you think?
That solves most of the issue but is still incomplete. It also gets
tricky when the driver starts dequeueing packets in powersave
mode (e.g. on PS-Poll).
The remaining corner case is when there are a large number of stations
that each have a single frame in their retry queue (see above). Having
those frames counted for pending airtime could reduce the aggregation
size for each station, even though those frames are not in the hw queue.

- Felix

2020-02-27 09:28:29

by Justin Capella

Subject: Re: [PATCH v11 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

> ieee80211_report_used_skb(). As an optimisation, we also subtract the
> airtime on regular TX completion, zeroing out the value stored in the
> packet afterwards, to avoid having to do an expensive lookup of the station
> from the packet data on every packet.
>
> This patch does *not* include any mechanism to wake a throttled TXQ again,
> on the assumption that this will happen anyway as a side effect of whatever
> freed the skb (most commonly a TX completion).

I recall a recent patch for ath10k sdio that disabled tx
acknowledgement for performance gains, and am wondering if that will be
problematic? Presumably not, since it would be caught at the dequeue,
but I thought I'd ask. I'm wondering what the effect of failed tx's or
block acknowledgement is on this stuff; I'll need to study the code
some more.

https://lore.kernel.org/linux-wireless/0101016eb1903db0-ef7063b4-0f42-4a01-8886-327541e6c1a4-000000@us-west-2.amazonses.com/T/#t

2020-02-27 10:10:02

by Toke Høiland-Jørgensen

Subject: Re: [PATCH v11 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

Felix Fietkau <[email protected]> writes:

> On 2020-02-26 22:56, Toke Høiland-Jørgensen wrote:
>> Felix Fietkau <[email protected]> writes:
>>> - We need an API that allows the driver to change the pending airtime
>>> values, e.g. subtract estimated tx time for a packet.
>>> mt76 and ath9k can queue packets inside the driver that are not currently
>>> in the hardware queues. Typically if the txqs have more data than what
>>> gets put into the hardware queue, both drivers will pull an extra frame
>>> and queue it in its private txq struct. This frame will get used on the
>>> next txq scheduling round for that particular station.
>>> If you have lots of stations doing traffic (or having driver buffered
>>> frames in powersave mode), this could use up a sizable chunk of the AQL
>>> budget.
>>
>> I'm a bit more skeptical about this. If the driver buffers a bunch of
>> packets that are not accounted for, that will hurt that station due to extra
>> latency when it wakes up. For ath9k, this is the retry_q you're talking
>> about, right? The number of packets queued on that is fairly limited,
>> isn't it? What kind of powersave buffering is the driver doing, and why
>> can't it leave the packets on the TXQ? That would allow them to be
>> scheduled along with any new ones that might have arrived in the
>> meantime, which would be a benefit for latency.
> For mt76 there should be at most one frame in the retry queue; it's just
> a frame that was pulled from the txq in a transmission attempt but that
> couldn't be put in the hw queue because it didn't fit in the current
> aggregate batch.

Wait, if it's only a single frame that is queued in the driver, how is
this causing problems? We deliberately set the limit so there was a bit
of slack above the size of an aggregate for things like this. Could you
please describe in a bit more detail what symptoms you are seeing of
this problem? :)

-Toke

2020-02-27 10:18:34

by Toke Høiland-Jørgensen

Subject: Re: [PATCH v11 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

Justin Capella <[email protected]> writes:

>> ieee80211_report_used_skb(). As an optimisation, we also subtract the
>> airtime on regular TX completion, zeroing out the value stored in the
>> packet afterwards, to avoid having to do an expensive lookup of the station
>> from the packet data on every packet.
>>
>> This patch does *not* include any mechanism to wake a throttled TXQ again,
>> on the assumption that this will happen anyway as a side effect of whatever
>> freed the skb (most commonly a TX completion).
>
> I recall a recent patch for ath10k sdio that disabled tx
> acknowledgement for performance gains, and am wondering if that will be
> problematic? Presumably not, since it would be caught at the dequeue,
> but I thought I'd ask. I'm wondering what the effect of failed tx's or
> block acknowledgement is on this stuff; I'll need to study the code
> some more.
>
> https://lore.kernel.org/linux-wireless/0101016eb1903db0-ef7063b4-0f42-4a01-8886-327541e6c1a4-000000@us-west-2.amazonses.com/T/#t

It looks like that patch will just end up disabling AQL (because packets
will be immediately completed as far as mac80211 is concerned) and
replacing it with whatever that credit-based scheme does? No idea how that
will impact latency; you should go ask the developers of that series! As
usual, the patch description only mentions throughput numbers :/

-Toke

2020-02-27 10:43:41

by Felix Fietkau

Subject: Re: [PATCH v11 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

On 2020-02-27 09:24, Kan Yan wrote:
>> - AQL estimated airtime does not take into account A-MPDU, so it is
>> significantly overestimating airtime use for aggregated traffic,
>> especially at high rates.
>> My proposed solution would be to check for a running aggregation session
>> and set estimated tx time to something like:
>> expected_airtime(16 * skb->len) / 16.
>
> Yes, that's a known limitation that needs to be improved. I actually
> posted a comment on this patch:
> "[PATCH v10 2/4] mac80211: Import airtime calculation code from mt76"
>>
>> When a txq is subjected to the queue limit,
>> there is usually a significant number of frames being queued and those
>> frames are likely being sent out in large aggregations. So, in most
>> cases when AQL is active, the medium access overhead can be amortized
>> over many frames and the per frame overhead could be considerably
>> lower than 36 us, especially at higher data rates. As a result, the
>> pending airtime calculated this way could be higher than the actual
>> airtime. In my test, I have to compensate that by increasing
>> "aql_txq_limit" via debugfs to get the same peak throughput as without
>> AQL.
>
>
>> My proposed solution would be to check for a running aggregation session
>> and set estimated tx time to something like:
>> expected_airtime(16 * skb->len) / 16.
>
> I think that's a reasonable approximation, but I doubt aggregation
> information is available in all architectures. In some architectures,
> the firmware may only report aggregation information after the frame has
> been transmitted.
I'm not proposing to use reported frame aggregation information. It
would be a simple approximation that uses a fixed assumed average
aggregation size if an aggregation session is established.

> In my earlier version of AQL for the out-of-tree ChromeOS kernel,
> A-MPDU is dealt with this way: the medium access overhead is only
> counted once for each TXQ for all frames released to the hardware over
> a 4 ms period, assuming those frames are likely to be aggregated
> together.
I think that would be more accurate, but probably also more expensive to
calculate.

> Instead of calculating the estimated airtime using the last known phy
> rate and then trying to add some estimated overhead for medium access
> time, another potentially better approach is to use the average data
> rate, i.e. bytes_transmitted / firmware_reported_actual_airtime. The
> average rate not only includes medium access overhead, but also takes
> retries into account.
Also an interesting idea, though probably not available on all hardware.

>> - We need an API that allows the driver to change the pending airtime
>> values, e.g. subtract estimated tx time for a packet.
>> mt76 and ath9k can queue packets inside the driver that are not currently
>> in the hardware queues. Typically if the txqs have more data than what
>> gets put into the hardware queue, both drivers will pull an extra frame
>> and queue it in its private txq struct. This frame will get used on the
>> next txq scheduling round for that particular station.
>> If you have lots of stations doing traffic (or having driver buffered
>> frames in powersave mode), this could use up a sizable chunk of the AQL
>> budget.
>
> Not sure I fully understand your concerns here. The AQL budget is per
> STA, per TID, so frames queued in the driver's special queue for one
> station won't impact another station's budget. Those frames are
> counted towards the per interface pending airtime, which could trigger
> AQL to switch to use the lower queue limit. In this case, that could
> be the desirable behavior when there is heavy traffic.
Yes, the per interface limit is what I'm concerned about. I'm not sure
if it will be an issue in practice; it's just something that I noticed
while reviewing the code.

- Felix

2020-02-27 11:10:34

by Toke Høiland-Jørgensen

Subject: Re: [PATCH v11 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

Felix Fietkau <[email protected]> writes:

>> Not sure I fully understand your concerns here. The AQL budget is per
>> STA, per TID, so frames queued in the driver's special queue for one
>> station won't impact another station's budget. Those frames are
>> counted towards the per interface pending airtime, which could trigger
>> AQL to switch to use the lower queue limit. In this case, that could
>> be the desirable behavior when there is heavy traffic.
> Yes, the per interface limit is what I'm concerned about. I'm not sure
> if it will be an issue in practice; it's just something that I noticed
> while reviewing the code.

Ah, right. Note that it's not a hard limit, though; it's just a
threshold for when the per-station limit switches to a smaller value.
The idea is that when there are many stations with outstanding data,
each station's latency will be higher since they have to also wait for
their turn to transmit. And so we compensate for this by lowering each
station's queue limit, since the hardware is unlikely to become idle
when there are many stations to send to.

The lower limit is still 5ms of data, though, so there should still be
enough for a full aggregate queued for each station. Arguably the limit
could actually be made *smaller*: On a very busy network with lots of
stations transmitting at once, IMO we *don't* want to send full-sized
aggregates as that will add up to way too much latency. Better to
sacrifice a bit of network efficiency to raise responsiveness :)
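
In code terms the selection is roughly (a simplified sketch of the
logic behind ieee80211_txq_airtime_check(); field names from the
series, from memory, with atomics elided):

    static u32 sta_aql_limit(struct ieee80211_local *local, int ac)
    {
            /* Above the interface-wide threshold, fall back to the
             * lower per-station limit (5 ms of data by default).
             */
            if (local->aql_total_pending_airtime > local->aql_threshold)
                    return local->aql_txq_limit_low[ac];

            return local->aql_txq_limit_high[ac];
    }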

-Toke

2020-02-27 11:35:13

by Felix Fietkau

Subject: Re: [PATCH v11 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

On 2020-02-27 11:07, Toke Høiland-Jørgensen wrote:
> Felix Fietkau <[email protected]> writes:
>
>> On 2020-02-26 22:56, Toke Høiland-Jørgensen wrote:
>>> Felix Fietkau <[email protected]> writes:
>>>> - We need an API that allows the driver to change the pending airtime
>>>> values, e.g. subtract estimated tx time for a packet.
>>>> mt76 and ath9k can queue packets inside the driver that are not currently
>>>> in the hardware queues. Typically if the txqs have more data than what
>>>> gets put into the hardware queue, both drivers will pull an extra frame
>>>> and queue it in its private txq struct. This frame will get used on the
>>>> next txq scheduling round for that particular station.
>>>> If you have lots of stations doing traffic (or having driver buffered
>>>> frames in powersave mode), this could use up a sizable chunk of the AQL
>>>> budget.
>>>
>>> I'm a bit more skeptical about this. If the driver buffers a bunch of
>>> packets that are not accounted for, that will hurt that station due to extra
>>> latency when it wakes up. For ath9k, this is the retry_q you're talking
>>> about, right? The number of packets queued on that is fairly limited,
>>> isn't it? What kind of powersave buffering is the driver doing, and why
>>> can't it leave the packets on the TXQ? That would allow them to be
>>> scheduled along with any new ones that might have arrived in the
>>> meantime, which would be a benefit for latency.
>> For mt76 there should be at most one frame in the retry queue; it's just
>> a frame that was pulled from the txq in a transmission attempt but that
>> couldn't be put in the hw queue because it didn't fit in the current
>> aggregate batch.
>
> Wait, if it's only a single frame that is queued in the driver, how is
> this causing problems? We deliberately set the limit so there was a bit
> of slack above the size of an aggregate for things like this. Could you
> please describe in a bit more detail what symptoms you are seeing of
> this problem? :)
It would be a single frame per sta/txq. I don't know if it will cause
problems in practice; it's just a potential corner case that I found
during review. I'd imagine this would probably show up in some
benchmarks at least.
I'm not seeing any symptoms myself, but I also haven't run any intricate
tests yet.

- Felix

2020-02-27 12:01:05

by Felix Fietkau

Subject: Re: [PATCH v11 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

On 2020-02-27 12:07, Toke Høiland-Jørgensen wrote:
> Felix Fietkau <[email protected]> writes:
>
>>> Not sure I fully understand your concerns here. The AQL budget is per
>>> STA, per TID, so frames queued in the driver's special queue for one
>>> station won't impact another station's budget. Those frames are
>>> counted towards the per interface pending airtime, which could trigger
>>> AQL to switch to use the lower queue limit. In this case, that could
>>> be the desirable behavior when there is heavy traffic.
>> Yes, the per interface limit is what I'm concerned about. I'm not sure
>> if it will be an issue in practice; it's just something that I noticed
>> while reviewing the code.
>
> Ah, right. Note that it's not a hard limit, though; it's just a
> threshold for when the per-station limit switches to a smaller value.
> The idea is that when there are many stations with outstanding data,
> each station's latency will be higher since they have to also wait for
> their turn to transmit. And so we compensate for this by lowering each
> station's queue limit, since the hardware is unlikely to become idle
> when there are many stations to send to.
>
> The lower limit is still 5ms of data, though, so there should still be
> enough for a full aggregate queued for each station. Arguably the limit
> could actually be made *smaller*: On a very busy network with lots of
> stations transmitting at once, IMO we *don't* want to send full-sized
> aggregates as that will add up to way too much latency. Better to
> sacrifice a bit of network efficiency to raise responsiveness :)
Makes sense, thanks.

- Felix

2020-02-27 12:02:01

by Toke Høiland-Jørgensen

Subject: Re: [PATCH v11 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

Felix Fietkau <[email protected]> writes:

> On 2020-02-27 11:07, Toke Høiland-Jørgensen wrote:
>> Felix Fietkau <[email protected]> writes:
>>
>>> On 2020-02-26 22:56, Toke Høiland-Jørgensen wrote:
>>>> Felix Fietkau <[email protected]> writes:
>>>>> - We need an API that allows the driver to change the pending airtime
>>>>> values, e.g. subtract estimated tx time for a packet.
>>>>> mt76 and ath9k can queue packets inside the driver that are not currently
>>>>> in the hardware queues. Typically if the txqs have more data than what
>>>>> gets put into the hardware queue, both drivers will pull an extra frame
>>>>> and queue it in its private txq struct. This frame will get used on the
>>>>> next txq scheduling round for that particular station.
>>>>> If you have lots of stations doing traffic (or having driver buffered
>>>>> frames in powersave mode), this could use up a sizable chunk of the AQL
>>>>> budget.
>>>>
>>>> I'm a bit more skeptical about this. If the driver buffers a bunch of
>>>> packets that are not accounted for, that will hurt that station due to extra
>>>> latency when it wakes up. For ath9k, this is the retry_q you're talking
>>>> about, right? The number of packets queued on that is fairly limited,
>>>> isn't it? What kind of powersave buffering is the driver doing, and why
>>>> can't it leave the packets on the TXQ? That would allow them to be
>>>> scheduled along with any new ones that might have arrived in the
>>>> meantime, which would be a benefit for latency.
>>> For mt76 there should be at most one frame in the retry queue; it's just
>>> a frame that was pulled from the txq in a transmission attempt but that
>>> couldn't be put in the hw queue because it didn't fit in the current
>>> aggregate batch.
>>
>> Wait, if it's only a single frame that is queued in the driver, how is
>> this causing problems? We deliberately set the limit so there was a bit
>> of slack above the size of an aggregate for things like this. Could you
>> please describe in a bit more detail what symptoms you are seeing of
>> this problem? :)
> It would be a single frame per sta/txq. I don't know if it will cause
> problems in practice; it's just a potential corner case that I found
> during review. I'd imagine this would probably show up in some
> benchmarks at least.
> I'm not seeing any symptoms myself, but I also haven't run any intricate
> tests yet.

See my other reply; I'm not convinced this behaviour is wrong :)

-Toke