2016-06-03 16:52:41

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: [RFC/RFT 0/5] Adding an airtime fairness scheduler to ath9k

This patch set is my first pass at attempting to add an airtime fairness
enforcing scheduler to ath9k (for use in access point mode). For some
background on this approach and its possible benefits see this paper:
https://www.usenix.org/event/usenix04/tech/general/full_papers/tan/tan.pdf

This is still very preliminary work, but I thought I'd post it for
comment anyway. Tentatively, it's looking like it gives some
improvement, but is by no means perfect in what it is achieving. See
https://blog.tohojo.dk/2016/06/a-first-stab-at-an-airtime-fairness-scheduler-for-ath9k.html
for some graphs of the patch set. I'm seeing roughly a 15% improvement
in aggregate throughput in the presence of a slow station.

To work, the scheduler needs per-station queueing, so this patch series
also includes Michal's patch to not have qdiscs on the interface when
wake_tx_queue is present, and Tim's patch to add wake_tx_queue to ath9k.
The latter I reworked to not require the global variable renaming, since
I found that change to be too confusing after having spent time getting
used to the ath9k names for things (I put my own name on this commit and
added a second Signed-off-by; apologies if this is not the right way to
do things for this kind of rework).

The scheduler patch set itself is in three parts: The first patch simply
adds airtime accounting and exports it to a per-station debugfs file
(which the latest git version of Flent knows how to read). The second
patch adds the airtime scheduler using only the TX airtime information.
Finally the third patch adds in RX airtime to the deficit used by the
scheduler. This last part is not always a win (see the blog post linked
above), but I haven't tested it extensively. So feel free to test with
or without the last patch.

There is some code duplication between the debugfs code and the
scheduler which I haven't removed yet to be able to easily switch
between having the scheduler use the airtime values and simply
accounting them for debugging purposes.

Comments and test results very welcome! :)

-Toke



Michal Kazior (1):
mac80211: skip netdev queue control with software queuing

Toke Høiland-Jørgensen (4):
ath9k: use mac80211 intermediate software queues
ath9k: Add airstame stats to per-station debugfs
ath9k: Add a per-station airtime deficit scheduler
ath9k: Count RX airtime in airtime deficit.

drivers/net/wireless/ath/ath9k/ath9k.h | 27 +++-
drivers/net/wireless/ath/ath9k/channel.c | 12 +-
drivers/net/wireless/ath/ath9k/debug.h | 29 ++++
drivers/net/wireless/ath/ath9k/debug_sta.c | 144 ++++++++++++++++++-
drivers/net/wireless/ath/ath9k/init.c | 1 +
drivers/net/wireless/ath/ath9k/main.c | 7 +-
drivers/net/wireless/ath/ath9k/recv.c | 50 +++++++
drivers/net/wireless/ath/ath9k/xmit.c | 222 ++++++++++++++++++++++++-----
include/net/mac80211.h | 4 -
net/mac80211/ieee80211_i.h | 2 +-
net/mac80211/iface.c | 18 ++-
net/mac80211/main.c | 3 -
net/mac80211/sta_info.c | 2 +-
net/mac80211/tx.c | 82 ++++++-----
net/mac80211/util.c | 11 +-
15 files changed, 511 insertions(+), 103 deletions(-)

--
2.7.4


2016-06-06 17:31:41

by Tim Shepard

[permalink] [raw]
Subject: Re: [RFC/RFT 2/5] ath9k: use mac80211 intermediate software queues


> And the patchset I was testing included your fq_codel port for ath9k
> but it was based on codel5.h. Michal's latest stuff reworked mainline
> codel to be generically usable, and it *is* a different variant of the
> algorithm.

I think you probably do understand what you are doing and which patch
is which, but I want to make sure that everyone else (and you, just in
case) understands that my patch to ath9k hooks ath9k up to the new
mac80211 per-station and per-tid intermediate queues and does *not*
have any codel or fq stuff in it, and is useful and worth doing even
if you aren't taking the codel and fq stuff. (I didn't port any
fq_codel stuff to ath9k.)

My ath9k patch is a prerequisite for people who want to take Michal's
mac80211 fq and mac80211 codel stuff for a test drive on an ath9k
interface. (Otherwise the ath9k uses the old transmit path through
mac80211 which wouldn't touch Michal's fq and codel tweaks to
mac80211's new intermediate queues.) But it is useful and there are
some benefits to it (especially with Michal's IFF_NO_QUEUE) even
without the fq and codel stuff.


-Tim Shepard
[email protected]


2016-06-07 00:01:09

by Adrian Chadd

[permalink] [raw]
Subject: Re: [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit

[snip]

I also found one of my notes in my version of this - how can we
estimate the duration of an A-MPDU, when we only get hardware
de-encapsulated frames?



-adrian

2016-06-10 09:49:31

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [Make-wifi-fast] [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit

Michal Kazior <[email protected]> writes:

> On 10 June 2016 at 11:08, Toke Høiland-Jørgensen <[email protected]> wrote:
>> Michal Kazior <[email protected]> writes:
>>
>>> For A-MPDU all MPDU rx status (except last one) should share the same
>>> timestamp. Last one has a different one so all you need is to
>>> distinguish first and last MPDU. Non A-MPDU obviously are special case
>>> (status bits are pricky).
>>
>> Right. So comparing the rs_stamp between first and last MPDU should give
>> the duration of the entire thing?
>
> Depends on how you define your "thing" :) I no, I don't know what
> you'll actually measure. It should be reasonable either way.

"Time spent receiving from this station" is the overall metric I want.
In this case that then becomes "Time spend receiving this aggregate"
which I can then account to that stations' total RX airtime.

>> This would require keeping state between subsequent calls to the RX
>> handler. Also, what happens if the last MPDU is lost?
>
> No idea. If that's possible, then track last MPDU within PPDU, so you
> can at least fallback to _something_ when you detect a new first
> (A-)MPDU?
>
> Or maybe it's impossible (i.e. not worth worrying) and HW always
> reports last MPDU as far as status bits are concerned (regardless of
> it being _actual_ last MPDU, i.e. it just says "ok, I'm done with this
> PPDU").

I'll try a couple of different permutations of this and see what works.
Thanks for the ideas!

>>> No idea. Maybe it is as there's distinction between "more" and
>>> "moreaggr".
>>
>> Hmm. If it is, comparing the stamp of the first MPDU to the current time
>> (when handling it) should give the needed duration? Will try doing that
>> and see what the result is.
>
> I'd say it's a little racy/inaccurate (and perhaps unreliable) to
> compare any kind of global timer and compare it against your rx-status
> descriptors.

Well, the timer would be the one coming from the card (tsf in the code
snippet I posted comes from ath9k_hw_gettsf64()). But yeah, it is going
to account some things that are not *strictly* air time (interrupt
latency for one). However, the question is how much this really
contributes (at least in cases where we are not running behind because
of lack of CPU). And if there's a constant overhead it shouldn't matter
so much as long as it is the same for all stations. Certainly just
timestamping packets with the current time works well enough in the
qdisc layer...

-Toke

2016-06-06 16:55:30

by Dave Taht

[permalink] [raw]
Subject: Re: [RFC/RFT 2/5] ath9k: use mac80211 intermediate software queues

On Sun, Jun 5, 2016 at 10:50 PM, Tim Shepard <[email protected]> wrote:
>
>
>> > Thanks. I've gotten no other feedback that suggests anyone else has
>> > read the code. So I very much appreciate it.
>>
>> I not only read it but tested it extensively against the ath9k stock,
>> ath10k stock, ath10k -michal's fqmac 3.5 patches, and your ath9k
>> patches...
>
> My patch to ath9k shouldn't have any effect on any of the ath10k
> stuff. It only cuts ath9k over to use the new intermediate queues.

Like the Sith, there are always at least two wifi chips on a link. One
behaving significantly differently (like flow queuing and codeling and
striving for airtime fairness) does affect the behavior of the other.

While I did do some ath9k to ath9k testing, I was mostly testing the
"whole enchalada". There are many, many moving parts left to swap out.

And the patchset I was testing included your fq_codel port for ath9k
but it was based on codel5.h. Michal's latest stuff reworked mainline
codel to be generically usable, and it *is* a different variant of the
algorithm.

The airtime-fair stuff does not as yet include fq_codel on top. It's a
MPU, and still puzzling as to why it did not get closer to perfect
fairness.

I also fiddled with the idea of dynamically altering the beacon's txop
sizes. A really short best effort txop (94) was "Interesting". I need
to take apart how more beacons are constructed from non-linux vendors.

The whole enchalada is tasting pretty good thus far.

> I should point out again that Avery's observation that michal's
> mac80211 flow queueing patches and mac80211 codel stuff aren't needed
> to the improvement between competing client stations.

To *2* competing client stations, this is somewhat true at present.
There are at least 5 other (pending) factors to factor in.

* Toke's preliminary airtime-fair patches already showed a net gain in
bandwidth for the higher rate competing station. The "performance
anomaly", identified way back in 2003, is still with us without also
striving for airtime fairness.
* In order to hold latencies low with > 2 stations active, I advocate
gradually using shorter txops, which will improve behavior for
stations doing interactive tasks, and offering service sooner to
infrequently seen stations.
* In order to do gang scheduling for mu-mimo, we need to have several
2-3ms sized queues outstanding for the devices that can be mu-mimoed.
* Getting minstrel-blues in there to sample more dynamically would be nice
* Reducing retries and retransmits would be nice when already
congested. I'd also like to try QosNoack.


> All that's
> needed is to use the new mac80211 per-station per-tid intermediate
> queues and also set the IFF_NO_QUEUE bit.

It's a heckofastart.

>
> For ath9k, Felix's mac80211 interemediate queues patch (already in
> mainline Linux over a year ago), my patch to ath9k, and just
> Michal's first patch alone "[PATCHv3 1/5] mac80211: skip netdev
> queue control with software queuing" should (and seems to in
> testing I've done so far) get all the latency improvement there is
> to be had when the competing traffic is to a different client
> station.

I think it can be shaved from the presently observed 7-12ms minimum at
160mbit by another 2-3x. Also the codel implementation is not as yet
as tightly controlling queue size as I'd like - I haven't pushed it as
hard at sub 20mbit performance as I'd like (coping with being enraged
at networkmanager's over-use of channel scans was what I was at, last)
but I'm showing queue depth of well over 25ms at that rate right now
on the ath9k patch.

>
>
>
>> After losing my temper at the impact of channel scans...
>>
>> ( https://plus.google.com/u/0/107942175615993706558/posts/WA915Pt4SRN
>> ), I got told how to get rid of them for testing, and started redoing
>> the work when I got back from vacation.
>
> Heh... I've never seen that problem. But I'm not running
> network-manager on any machine in my testbed.

I tend to think it is important to measure what happens in the real
world, to clearly identify what the real world problems actually are.

I let everybody else, with attenuators, and emulators, do whatever
they want. Me, I'm perpetually setting up real-world labs like the
yurtlab and sflab and the isclab to try to see what happens in
practice.

I now plan to put some work in on making channel scans less nasty and
to also look into what it takes to implement 802.11k, 802.11r and
802.11v.

or at the very least, nag people to get it more right.

https://bugzilla.gnome.org/show_bug.cgi?id=766482

NetworkManager's author suggests here that

https://blogs.gnome.org/dcbw/2016/05/16/networkmanager-and-wifi-scans/

"You can also advocate that your favorite WiFi interface add support
for NetworkManager’s RequestScan()"

>
>> > I looked for a way to ask mac80211 if there are any packets left in
>> > the intermediate queue without dequeueing a packet and I failed to
>> > find such an interface.
>>
>> qdisc->peek like function? A bitmap that says these stations have data
>> outstanding?
>>
>
> I'm not sure what you mean here. Are you asking if that was what I
> was looking for in the existing mac80211 code? If so, yes, that is
> what I was looking for. Or are you suggesting (or agreeing) that
> indeed such an interface should be added to mac80211?

Yes, both.

> Though I don't see how it could be a bitmap... stations don't have
> any obvious stable associated index into such a bitmap.

Mores the pity. I liked the idea of adapting something more qfq-like
than fq_codel like in parts of this. Allocating a fixed number of bits
(usually max_stations is set to like 128), and using array indexes for
various station related params (including rate stats), seems to be
simpler than chasing pointers around.

>
> The real problem might be the way all the code in ath9k/xmit.c is
> designed where it needs ath_tid_has_buffered() in so many places.

Yes, the ath9k is mroe than a bit hairy. :/

> Look at the mt76 driver, and see how it also maintains a small
> (probably only at most one packet ever) queue "retry_q" which
> sometimes does hold a packet which will get sent before it will
> pull another packet from ieee80211_tx_dequeue. But the rest of the
> driver is structured so that a NULL return from mt76_txq_dequeue()
> is all it needs. That's probably the sort of structure we ought to
> have in ath9k, but that's going to be a more invasive patch to
> ath9k (and I haven't figured out how to do it yet).

I did finally acquire a mt76 board to play with and I agree that that
that driver appears the most ideal to be prototyping with.

>
> I think we can get there incrementally though. I hope the patch for
> ath9k that I've got now (and the one I hope to have soon that fixes
> the problem of not making use of hwq_max_pending() ) might be useful
> intermediate steps (that work, and provide improvements) then we can
> clean up ath9k removing mechanisms that are no longer needed (breaking
> the old transmit path, and somehow handling the chanctx stuff).

Groovy.

>
>
> -Tim Shepard
> [email protected]



--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

2016-06-03 16:52:43

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: [RFC/RFT 1/5] mac80211: skip netdev queue control with software queuing

From: Michal Kazior <[email protected]>

Qdiscs are designed with no regard to 802.11
aggregation requirements and hand out
packet-by-packet with no guarantee they are
destined to the same tid. This does more bad than
good no matter how fairly a given qdisc may behave
on an ethernet interface.

Software queuing used per-AC netdev subqueue
congestion control whenever a global AC limit was
hit. This meant in practice a single station or
tid queue could starve others rather easily. This
could resonate with qdiscs in a bad way or could
just end up with poor aggregation performance.
Increasing the AC limit would increase induced
latency which is also bad.

Disabling qdiscs by default and performing
taildrop instead of netdev subqueue congestion
control on the other hand makes it possible for
tid queues to fill up "in the meantime" while
preventing stations starving each other.

This increases aggregation opportunities and
should allow software queuing based drivers
achieve better performance by utilizing airtime
more efficiently with big aggregates.

Signed-off-by: Michal Kazior <[email protected]>
---
include/net/mac80211.h | 4 ---
net/mac80211/ieee80211_i.h | 2 +-
net/mac80211/iface.c | 18 ++++++++--
net/mac80211/main.c | 3 --
net/mac80211/sta_info.c | 2 +-
net/mac80211/tx.c | 82 +++++++++++++++++++++++++---------------------
net/mac80211/util.c | 11 ++++---
7 files changed, 67 insertions(+), 55 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index be30b05..a8683ae 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -2147,9 +2147,6 @@ enum ieee80211_hw_flags {
* @n_cipher_schemes: a size of an array of cipher schemes definitions.
* @cipher_schemes: a pointer to an array of cipher scheme definitions
* supported by HW.
- *
- * @txq_ac_max_pending: maximum number of frames per AC pending in all txq
- * entries for a vif.
*/
struct ieee80211_hw {
struct ieee80211_conf conf;
@@ -2180,7 +2177,6 @@ struct ieee80211_hw {
u8 uapsd_max_sp_len;
u8 n_cipher_schemes;
const struct ieee80211_cipher_scheme *cipher_schemes;
- int txq_ac_max_pending;
};

static inline bool _ieee80211_hw_check(struct ieee80211_hw *hw,
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 9438c94..6346033 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -856,7 +856,7 @@ struct ieee80211_sub_if_data {
bool control_port_no_encrypt;
int encrypt_headroom;

- atomic_t txqs_len[IEEE80211_NUM_ACS];
+ atomic_t num_tx_queued;
struct ieee80211_tx_queue_params tx_conf[IEEE80211_NUM_ACS];
struct mac80211_qos_map __rcu *qos_map;

diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index c59af3e..609c517 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -976,13 +976,13 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,

if (sdata->vif.txq) {
struct txq_info *txqi = to_txq_info(sdata->vif.txq);
+ int n = skb_queue_len(&txqi->queue);

spin_lock_bh(&txqi->queue.lock);
ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
+ atomic_sub(n, &sdata->num_tx_queued);
txqi->byte_cnt = 0;
spin_unlock_bh(&txqi->queue.lock);
-
- atomic_set(&sdata->txqs_len[txqi->txq.ac], 0);
}

if (local->open_count == 0)
@@ -1198,6 +1198,12 @@ static void ieee80211_if_setup(struct net_device *dev)
dev->destructor = ieee80211_if_free;
}

+static void ieee80211_if_setup_no_queue(struct net_device *dev)
+{
+ ieee80211_if_setup(dev);
+ dev->priv_flags |= IFF_NO_QUEUE;
+}
+
static void ieee80211_iface_work(struct work_struct *work)
{
struct ieee80211_sub_if_data *sdata =
@@ -1707,6 +1713,7 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
struct net_device *ndev = NULL;
struct ieee80211_sub_if_data *sdata = NULL;
struct txq_info *txqi;
+ void (*if_setup)(struct net_device *dev);
int ret, i;
int txqs = 1;

@@ -1734,12 +1741,17 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
txq_size += sizeof(struct txq_info) +
local->hw.txq_data_size;

+ if (local->ops->wake_tx_queue)
+ if_setup = ieee80211_if_setup_no_queue;
+ else
+ if_setup = ieee80211_if_setup;
+
if (local->hw.queues >= IEEE80211_NUM_ACS)
txqs = IEEE80211_NUM_ACS;

ndev = alloc_netdev_mqs(size + txq_size,
name, name_assign_type,
- ieee80211_if_setup, txqs, 1);
+ if_setup, txqs, 1);
if (!ndev)
return -ENOMEM;
dev_net_set(ndev, wiphy_net(local->hw.wiphy));
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 7ee91d6..160ac6b 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -1055,9 +1055,6 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)

local->dynamic_ps_forced_timeout = -1;

- if (!local->hw.txq_ac_max_pending)
- local->hw.txq_ac_max_pending = 64;
-
result = ieee80211_wep_init(local);
if (result < 0)
wiphy_debug(local->hw.wiphy, "Failed to initialize wep: %d\n",
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 5ccfdbd..177cc6c 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -116,7 +116,7 @@ static void __cleanup_single_sta(struct sta_info *sta)
int n = skb_queue_len(&txqi->queue);

ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
- atomic_sub(n, &sdata->txqs_len[txqi->txq.ac]);
+ atomic_sub(n, &sdata->num_tx_queued);
txqi->byte_cnt = 0;
}
}
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 2030443..792f0172 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1236,67 +1236,58 @@ ieee80211_tx_prepare(struct ieee80211_sub_if_data *sdata,
return TX_CONTINUE;
}

-static void ieee80211_drv_tx(struct ieee80211_local *local,
- struct ieee80211_vif *vif,
- struct ieee80211_sta *pubsta,
- struct sk_buff *skb)
+static struct txq_info *ieee80211_get_txq(struct ieee80211_local *local,
+ struct ieee80211_vif *vif,
+ struct ieee80211_sta *pubsta,
+ struct sk_buff *skb)
{
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
- struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
- struct ieee80211_tx_control control = {
- .sta = pubsta,
- };
- struct ieee80211_txq *txq = NULL;
- struct txq_info *txqi;
- u8 ac;

if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) ||
(info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE))
- goto tx_normal;
+ return NULL;

if (!ieee80211_is_data(hdr->frame_control))
- goto tx_normal;
+ return NULL;

if (pubsta) {
u8 tid = skb->priority & IEEE80211_QOS_CTL_TID_MASK;

- txq = pubsta->txq[tid];
+ return to_txq_info(pubsta->txq[tid]);
} else if (vif) {
- txq = vif->txq;
+ return to_txq_info(vif->txq);
}

- if (!txq)
- goto tx_normal;
-
- ac = txq->ac;
- txqi = to_txq_info(txq);
- atomic_inc(&sdata->txqs_len[ac]);
- if (atomic_read(&sdata->txqs_len[ac]) >= local->hw.txq_ac_max_pending)
- netif_stop_subqueue(sdata->dev, ac);
+ return NULL;
+}

- spin_lock_bh(&txqi->queue.lock);
- txqi->byte_cnt += skb->len;
- __skb_queue_tail(&txqi->queue, skb);
- spin_unlock_bh(&txqi->queue.lock);
+static void ieee80211_txq_enqueue(struct ieee80211_local *local,
+ struct txq_info *txqi,
+ struct sk_buff *skb)
+{
+ struct ieee80211_sub_if_data *sdata = vif_to_sdata(txqi->txq.vif);

- drv_wake_tx_queue(local, txqi);
+ lockdep_assert_held(&txqi->queue.lock);

- return;
+ if (atomic_read(&sdata->num_tx_queued) >= TOTAL_MAX_TX_BUFFER ||
+ txqi->queue.qlen >= STA_MAX_TX_BUFFER) {
+ ieee80211_free_txskb(&local->hw, skb);
+ return;
+ }

-tx_normal:
- drv_tx(local, &control, skb);
+ atomic_inc(&sdata->num_tx_queued);
+ txqi->byte_cnt += skb->len;
+ __skb_queue_tail(&txqi->queue, skb);
}

struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
struct ieee80211_txq *txq)
{
- struct ieee80211_local *local = hw_to_local(hw);
struct ieee80211_sub_if_data *sdata = vif_to_sdata(txq->vif);
struct txq_info *txqi = container_of(txq, struct txq_info, txq);
struct ieee80211_hdr *hdr;
struct sk_buff *skb = NULL;
- u8 ac = txq->ac;

spin_lock_bh(&txqi->queue.lock);

@@ -1307,12 +1298,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
if (!skb)
goto out;

+ atomic_dec(&sdata->num_tx_queued);
txqi->byte_cnt -= skb->len;

- atomic_dec(&sdata->txqs_len[ac]);
- if (__netif_subqueue_stopped(sdata->dev, ac))
- ieee80211_propagate_queue_wake(local, sdata->vif.hw_queue[ac]);
-
hdr = (struct ieee80211_hdr *)skb->data;
if (txq->sta && ieee80211_is_data_qos(hdr->frame_control)) {
struct sta_info *sta = container_of(txq->sta, struct sta_info,
@@ -1343,7 +1331,9 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
struct sk_buff_head *skbs,
bool txpending)
{
+ struct ieee80211_tx_control control = {};
struct sk_buff *skb, *tmp;
+ struct txq_info *txqi;
unsigned long flags;

skb_queue_walk_safe(skbs, skb, tmp) {
@@ -1358,6 +1348,21 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
}
#endif

+ txqi = ieee80211_get_txq(local, vif, sta, skb);
+ if (txqi) {
+ info->control.vif = vif;
+
+ __skb_unlink(skb, skbs);
+
+ spin_lock_bh(&txqi->queue.lock);
+ ieee80211_txq_enqueue(local, txqi, skb);
+ spin_unlock_bh(&txqi->queue.lock);
+
+ drv_wake_tx_queue(local, txqi);
+
+ continue;
+ }
+
spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
if (local->queue_stop_reasons[q] ||
(!txpending && !skb_queue_empty(&local->pending[q]))) {
@@ -1400,9 +1405,10 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);

info->control.vif = vif;
+ control.sta = sta;

__skb_unlink(skb, skbs);
- ieee80211_drv_tx(local, vif, sta, skb);
+ drv_tx(local, &control, skb);
}

return true;
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 905003f..8903285 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -244,6 +244,9 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue)
struct ieee80211_sub_if_data *sdata;
int n_acs = IEEE80211_NUM_ACS;

+ if (local->ops->wake_tx_queue)
+ return;
+
if (local->hw.queues < IEEE80211_NUM_ACS)
n_acs = 1;

@@ -260,11 +263,6 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue)
for (ac = 0; ac < n_acs; ac++) {
int ac_queue = sdata->vif.hw_queue[ac];

- if (local->ops->wake_tx_queue &&
- (atomic_read(&sdata->txqs_len[ac]) >
- local->hw.txq_ac_max_pending))
- continue;
-
if (ac_queue == queue ||
(sdata->vif.cab_queue == queue &&
local->queue_stop_reasons[ac_queue] == 0 &&
@@ -341,6 +339,9 @@ static void __ieee80211_stop_queue(struct ieee80211_hw *hw, int queue,

trace_stop_queue(local, queue, reason);

+ if (local->ops->wake_tx_queue)
+ return;
+
if (WARN_ON(queue >= hw->queues))
return;

--
2.7.4

2016-06-03 16:52:38

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: [RFC/RFT 3/5] ath9k: Add airstame stats to per-station debugfs

Uses the ts->duration + retry-chain information to account for time
spent transmitting to a station. Calculates the RX airtime from the rate
and packet length. Hopefully these figures are not too inaccurate.

Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
---
drivers/net/wireless/ath/ath9k/ath9k.h | 3 +
drivers/net/wireless/ath/ath9k/debug.h | 21 +++++
drivers/net/wireless/ath/ath9k/debug_sta.c | 125 +++++++++++++++++++++++++++++
drivers/net/wireless/ath/ath9k/recv.c | 1 +
drivers/net/wireless/ath/ath9k/xmit.c | 6 +-
5 files changed, 154 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index caeae10..3fbb0ad 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -264,6 +264,7 @@ struct ath_node {

#ifdef CONFIG_ATH9K_STATION_STATISTICS
struct ath_rx_rate_stats rx_rate_stats;
+ struct ath_airtime_stats airtime_stats;
#endif
u8 key_idx[4];

@@ -573,6 +574,8 @@ void ath_txq_schedule_all(struct ath_softc *sc);
int ath_tx_init(struct ath_softc *sc, int nbufs);
int ath_txq_update(struct ath_softc *sc, int qnum,
struct ath9k_tx_queue_info *q);
+u32 ath_pkt_duration(struct ath_softc *sc, u8 rix, int pktlen,
+ int width, int half_gi, bool shortPreamble);
void ath_update_max_aggr_framelen(struct ath_softc *sc, int queue, int txop);
void ath_assign_seq(struct ath_common *common, struct sk_buff *skb);
int ath_tx_start(struct ieee80211_hw *hw, struct sk_buff *skb,
diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h
index cd68c5f..4f78495 100644
--- a/drivers/net/wireless/ath/ath9k/debug.h
+++ b/drivers/net/wireless/ath/ath9k/debug.h
@@ -223,6 +223,11 @@ struct ath_rx_rate_stats {
} cck_stats[4];
};

+struct ath_airtime_stats {
+ u32 rx_airtime;
+ u32 tx_airtime;
+};
+
#define ANT_MAIN 0
#define ANT_ALT 1

@@ -316,12 +321,28 @@ ath9k_debug_sync_cause(struct ath_softc *sc, u32 sync_cause)
void ath_debug_rate_stats(struct ath_softc *sc,
struct ath_rx_status *rs,
struct sk_buff *skb);
+void ath_debug_tx_airtime(struct ath_softc *sc,
+ struct ath_buf *bf,
+ struct ath_tx_status *ts);
+void ath_debug_rx_airtime(struct ath_softc *sc,
+ struct ath_rx_status *rs,
+ struct sk_buff *skb);
#else
static inline void ath_debug_rate_stats(struct ath_softc *sc,
struct ath_rx_status *rs,
struct sk_buff *skb)
{
}
+static inline void ath_debug_tx_airtime(struct ath_softc *sc,
+ struct ath_buf *bf,
+ struct ath_tx_status *ts)
+{
+}
+static inline void ath_debug_rx_airtime(struct ath_softc *sc,
+ struct ath_rx_status *rs,
+ struct sk_buff *skb)
+{
+}
#endif /* CONFIG_ATH9K_STATION_STATISTICS */

#endif /* DEBUG_H */
diff --git a/drivers/net/wireless/ath/ath9k/debug_sta.c b/drivers/net/wireless/ath/ath9k/debug_sta.c
index 0e7f6b5..56aba3a 100644
--- a/drivers/net/wireless/ath/ath9k/debug_sta.c
+++ b/drivers/net/wireless/ath/ath9k/debug_sta.c
@@ -245,6 +245,130 @@ static const struct file_operations fops_node_recv = {
.llseek = default_llseek,
};

+void ath_debug_tx_airtime(struct ath_softc *sc,
+ struct ath_buf *bf,
+ struct ath_tx_status *ts)
+{
+ struct ath_node *an;
+ struct sk_buff *skb;
+ struct ieee80211_hdr *hdr;
+ struct ath_airtime_stats *astats;
+ struct ieee80211_hw *hw = sc->hw;
+ struct ieee80211_tx_rate rates[4];
+ struct ieee80211_sta *sta;
+ int i;
+
+ skb = bf->bf_mpdu;
+ if(!skb)
+ return;
+
+ hdr = (struct ieee80211_hdr *)skb->data;
+ memcpy(rates, bf->rates, sizeof(rates));
+
+ rcu_read_lock();
+
+ sta = ieee80211_find_sta_by_ifaddr(hw, hdr->addr1, hdr->addr2);
+ if(!sta)
+ goto exit;
+
+
+ an = (struct ath_node *) sta->drv_priv;
+ astats = &an->airtime_stats;
+
+ astats->tx_airtime += ts->duration * (ts->ts_longretry + 1);
+
+ for(i=0; i < ts->ts_rateindex; i++)
+ astats->tx_airtime += ath9k_hw_get_duration(sc->sc_ah, bf->bf_desc, i) * rates[i].count;
+
+
+
+exit:
+ rcu_read_unlock();
+}
+
+void ath_debug_rx_airtime(struct ath_softc *sc,
+ struct ath_rx_status *rs,
+ struct sk_buff *skb)
+{
+ struct ath_airtime_stats *astats;
+ struct ath_node *an;
+ struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
+ struct ath_hw *ah = sc->sc_ah;
+ struct ath_common *common = ath9k_hw_common(ah);
+ struct ieee80211_sta *sta;
+ struct ieee80211_rx_status *rxs;
+ const struct ieee80211_rate *rate;
+ bool is_sgi, is_40, is_sp;
+ int phy;
+
+ if (!ieee80211_is_data(hdr->frame_control))
+ return;
+
+ rcu_read_lock();
+
+ sta = ieee80211_find_sta_by_ifaddr(sc->hw, hdr->addr2, NULL);
+ if (!sta)
+ goto exit;
+ an = (struct ath_node *) sta->drv_priv;
+ rxs = IEEE80211_SKB_RXCB(skb);
+ astats = &an->airtime_stats;
+
+ is_sgi = !!(rxs->flag & RX_FLAG_SHORT_GI);
+ is_40 = !!(rxs->flag & RX_FLAG_40MHZ);
+ is_sp = !!(rxs->flag & RX_FLAG_SHORTPRE);
+
+ if (!!(rxs->flag & RX_FLAG_HT)) {
+ /* MCS rates */
+
+ astats->rx_airtime += ath_pkt_duration(sc, rxs->rate_idx, rs->rs_datalen,
+ is_40, is_sgi, is_sp);
+ goto exit;
+ }
+
+ phy = IS_CCK_RATE(rs->rs_rate) ? WLAN_RC_PHY_CCK : WLAN_RC_PHY_OFDM;
+ rate = &common->sbands[rxs->band].bitrates[rxs->rate_idx];
+ astats->rx_airtime += ath9k_hw_computetxtime(ah, phy, rate->bitrate * 100,
+ rs->rs_datalen, rxs->rate_idx, is_sp);
+
+
+exit:
+ rcu_read_unlock();
+}
+
+
+static ssize_t read_airtime(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ struct ath_node *an = file->private_data;
+ struct ath_airtime_stats *astats;
+ u32 len = 0, size = 128;
+ char *buf;
+ size_t retval;
+
+ buf = kzalloc(size, GFP_KERNEL);
+ if (buf == NULL)
+ return -ENOMEM;
+
+ astats = &an->airtime_stats;
+
+ len += scnprintf(buf + len, size - len, "RX: %u us\n", astats->rx_airtime);
+ len += scnprintf(buf + len, size - len, "TX: %u us\n", astats->tx_airtime);
+
+ retval = simple_read_from_buffer(user_buf, count, ppos, buf, len);
+ kfree(buf);
+
+ return retval;
+}
+
+
+static const struct file_operations fops_airtime = {
+ .read = read_airtime,
+ .open = simple_open,
+ .owner = THIS_MODULE,
+ .llseek = default_llseek,
+};
+
+
void ath9k_sta_add_debugfs(struct ieee80211_hw *hw,
struct ieee80211_vif *vif,
struct ieee80211_sta *sta,
@@ -254,4 +378,5 @@ void ath9k_sta_add_debugfs(struct ieee80211_hw *hw,

debugfs_create_file("node_aggr", S_IRUGO, dir, an, &fops_node_aggr);
debugfs_create_file("node_recv", S_IRUGO, dir, an, &fops_node_recv);
+ debugfs_create_file("airtime", S_IRUGO, dir, an, &fops_airtime);
}
diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c
index 32160fc..7eb8980 100644
--- a/drivers/net/wireless/ath/ath9k/recv.c
+++ b/drivers/net/wireless/ath/ath9k/recv.c
@@ -1137,6 +1137,7 @@ int ath_rx_tasklet(struct ath_softc *sc, int flush, bool hp)
ath9k_antenna_check(sc, &rs);
ath9k_apply_ampdu_details(sc, &rs, rxs);
ath_debug_rate_stats(sc, &rs, skb);
+ ath_debug_rx_airtime(sc, &rs, skb);

hdr = (struct ieee80211_hdr *)skb->data;
if (ieee80211_is_ack(hdr->frame_control))
diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index cdc8684..07d32e7 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -739,6 +739,8 @@ static void ath_tx_process_buffer(struct ath_softc *sc, struct ath_txq *txq,

ts->duration = ath9k_hw_get_duration(sc->sc_ah, bf->bf_desc,
ts->ts_rateindex);
+ ath_debug_tx_airtime(sc, bf, ts);
+
if (!bf_isampdu(bf)) {
if (!flush) {
info = IEEE80211_SKB_CB(bf->bf_mpdu);
@@ -1090,8 +1093,8 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
* width - 0 for 20 MHz, 1 for 40 MHz
* half_gi - to use 4us v/s 3.6 us for symbol time
*/
-static u32 ath_pkt_duration(struct ath_softc *sc, u8 rix, int pktlen,
- int width, int half_gi, bool shortPreamble)
+u32 ath_pkt_duration(struct ath_softc *sc, u8 rix, int pktlen,
+ int width, int half_gi, bool shortPreamble)
{
u32 nbits, nsymbits, duration, nsymbols;
int streams;
--
2.7.4

2016-06-06 17:00:07

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [RFC/RFT 1/5] mac80211: skip netdev queue control with software queuing

Julian Calaby <[email protected]> writes:

> As this patch is passing through your hands, you need to add your
> Signed-off-by too.

Ah, right, sorry; didn't know that. Will add it the next time around.
Thanks for the pointer! :)

-Toke

2016-06-03 16:52:38

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: [RFC/RFT 4/5] ath9k: Add a per-station airtime deficit scheduler

This modifies the logic in ath_txq_schedule to account airtime consumed
by each station and uses a deficit-based scheduler derived from FQ-CoDel
to try to enforce airtime fairness. The deficit is decreased on TX
completion and the scheduler simply makes scheduling decisions based on
this.

Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
---
drivers/net/wireless/ath/ath9k/ath9k.h | 8 ++-
drivers/net/wireless/ath/ath9k/channel.c | 12 ++--
drivers/net/wireless/ath/ath9k/debug.h | 8 +++
drivers/net/wireless/ath/ath9k/debug_sta.c | 12 ++++
drivers/net/wireless/ath/ath9k/main.c | 6 +-
drivers/net/wireless/ath/ath9k/xmit.c | 99 ++++++++++++++++++++++++------
6 files changed, 117 insertions(+), 28 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index 3fbb0ad..7c0fc8b 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -261,6 +261,7 @@ struct ath_node {

bool sleeping;
bool no_ps_filter;
+ s64 airtime_deficit;

#ifdef CONFIG_ATH9K_STATION_STATISTICS
struct ath_rx_rate_stats rx_rate_stats;
@@ -332,10 +333,15 @@ struct ath_rx {
/* Channel Context */
/*******************/

+struct ath_acq {
+ struct list_head acq_new;
+ struct list_head acq_old;
+};
+
struct ath_chanctx {
struct cfg80211_chan_def chandef;
struct list_head vifs;
- struct list_head acq[IEEE80211_NUM_ACS];
+ struct ath_acq acq[IEEE80211_NUM_ACS];
int hw_queue_base;

/* do not dereference, use for comparison only */
diff --git a/drivers/net/wireless/ath/ath9k/channel.c b/drivers/net/wireless/ath/ath9k/channel.c
index e56bafc..2594029 100644
--- a/drivers/net/wireless/ath/ath9k/channel.c
+++ b/drivers/net/wireless/ath/ath9k/channel.c
@@ -118,8 +118,10 @@ void ath_chanctx_init(struct ath_softc *sc)
INIT_LIST_HEAD(&ctx->vifs);
ctx->txpower = ATH_TXPOWER_MAX;
ctx->flush_timeout = HZ / 5; /* 200ms */
- for (j = 0; j < ARRAY_SIZE(ctx->acq); j++)
- INIT_LIST_HEAD(&ctx->acq[j]);
+ for (j = 0; j < ARRAY_SIZE(ctx->acq); j++) {
+ INIT_LIST_HEAD(&ctx->acq[j].acq_new);
+ INIT_LIST_HEAD(&ctx->acq[j].acq_old);
+ }
}
}

@@ -1344,8 +1346,10 @@ void ath9k_offchannel_init(struct ath_softc *sc)
ctx->txpower = ATH_TXPOWER_MAX;
cfg80211_chandef_create(&ctx->chandef, chan, NL80211_CHAN_HT20);

- for (i = 0; i < ARRAY_SIZE(ctx->acq); i++)
- INIT_LIST_HEAD(&ctx->acq[i]);
+ for (i = 0; i < ARRAY_SIZE(ctx->acq); i++) {
+ INIT_LIST_HEAD(&ctx->acq[i].acq_new);
+ INIT_LIST_HEAD(&ctx->acq[i].acq_old);
+ }

sc->offchannel.chan.offchannel = true;
}
diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h
index 4f78495..bf1a540 100644
--- a/drivers/net/wireless/ath/ath9k/debug.h
+++ b/drivers/net/wireless/ath/ath9k/debug.h
@@ -327,6 +327,9 @@ void ath_debug_tx_airtime(struct ath_softc *sc,
void ath_debug_rx_airtime(struct ath_softc *sc,
struct ath_rx_status *rs,
struct sk_buff *skb);
+void ath_debug_airtime(struct ath_softc *sc,
+ struct ath_node *an,
+ u32 rx, u32 tx);
#else
static inline void ath_debug_rate_stats(struct ath_softc *sc,
struct ath_rx_status *rs,
@@ -343,6 +346,11 @@ static inline void ath_debug_rx_airtime(struct ath_softc *sc,
struct sk_buff *skb)
{
}
+static void ath_debug_airtime(struct ath_softc *sc,
+ struct ath_node *an,
+ u32 rx, u32 tx)
+{
+}
#endif /* CONFIG_ATH9K_STATION_STATISTICS */

#endif /* DEBUG_H */
diff --git a/drivers/net/wireless/ath/ath9k/debug_sta.c b/drivers/net/wireless/ath/ath9k/debug_sta.c
index 56aba3a..2313f9e 100644
--- a/drivers/net/wireless/ath/ath9k/debug_sta.c
+++ b/drivers/net/wireless/ath/ath9k/debug_sta.c
@@ -245,6 +245,17 @@ static const struct file_operations fops_node_recv = {
.llseek = default_llseek,
};

+void ath_debug_airtime(struct ath_softc *sc,
+ struct ath_node *an,
+ u32 rx,
+ u32 tx)
+{
+ struct ath_airtime_stats *astats = &an->airtime_stats;
+
+ astats->rx_airtime += rx;
+ astats->tx_airtime += tx;
+}
+
void ath_debug_tx_airtime(struct ath_softc *sc,
struct ath_buf *bf,
struct ath_tx_status *ts)
@@ -353,6 +364,7 @@ static ssize_t read_airtime(struct file *file, char __user *user_buf,

len += scnprintf(buf + len, size - len, "RX: %u us\n", astats->rx_airtime);
len += scnprintf(buf + len, size - len, "TX: %u us\n", astats->tx_airtime);
+ len += scnprintf(buf + len, size - len, "Deficit: %lld us\n", an->airtime_deficit);

retval = simple_read_from_buffer(user_buf, count, ppos, buf, len);
kfree(buf);
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 6ab56e5..e13068b 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -70,10 +70,10 @@ static bool ath9k_has_pending_frames(struct ath_softc *sc, struct ath_txq *txq,
goto out;

if (txq->mac80211_qnum >= 0) {
- struct list_head *list;
+ struct ath_acq *acq;

- list = &sc->cur_chan->acq[txq->mac80211_qnum];
- if (!list_empty(list))
+ acq = &sc->cur_chan->acq[txq->mac80211_qnum];
+ if (!list_empty(&acq->acq_new) || !list_empty(&acq->acq_old))
pending = true;
}
out:
diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index 07d32e7..6ce58d9 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -108,16 +108,17 @@ void ath_txq_unlock_complete(struct ath_softc *sc, struct ath_txq *txq)
static void ath_tx_queue_tid(struct ath_softc *sc, struct ath_txq *txq,
struct ath_atx_tid *tid)
{
- struct list_head *list;
struct ath_vif *avp = (struct ath_vif *) tid->an->vif->drv_priv;
struct ath_chanctx *ctx = avp->chanctx;
+ struct ath_acq *acq;

if (!ctx)
return;

- list = &ctx->acq[TID_TO_WME_AC(tid->tidno)];
+
+ acq = &ctx->acq[TID_TO_WME_AC(tid->tidno)];
if (list_empty(&tid->list))
- list_add_tail(&tid->list, list);
+ list_add_tail(&tid->list, &acq->acq_new);
}

void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *swq)
@@ -722,6 +723,47 @@ static bool bf_is_ampdu_not_probing(struct ath_buf *bf)
return bf_isampdu(bf) && !(info->flags & IEEE80211_TX_CTL_RATE_CTRL_PROBE);
}

+static void ath_tx_count_airtime(struct ath_softc *sc,
+ struct ath_buf *bf,
+ struct ath_tx_status *ts)
+{
+ struct ath_node *an;
+ struct sk_buff *skb;
+ struct ieee80211_hdr *hdr;
+ struct ieee80211_hw *hw = sc->hw;
+ struct ieee80211_tx_rate rates[4];
+ struct ieee80211_sta *sta;
+ int i;
+ u32 airtime = 0;
+
+ skb = bf->bf_mpdu;
+ if(!skb)
+ return;
+
+ hdr = (struct ieee80211_hdr *)skb->data;
+ memcpy(rates, bf->rates, sizeof(rates));
+
+ rcu_read_lock();
+
+ sta = ieee80211_find_sta_by_ifaddr(hw, hdr->addr1, hdr->addr2);
+ if(!sta)
+ goto exit;
+
+
+ an = (struct ath_node *) sta->drv_priv;
+
+ airtime += ts->duration * (ts->ts_longretry + 1);
+
+ for(i=0; i < ts->ts_rateindex; i++)
+ airtime += ath9k_hw_get_duration(sc->sc_ah, bf->bf_desc, i) * rates[i].count;
+
+ an->airtime_deficit -= airtime;
+ ath_debug_airtime(sc, an, 0, airtime);
+
+exit:
+ rcu_read_unlock();
+}
+
static void ath_tx_process_buffer(struct ath_softc *sc, struct ath_txq *txq,
struct ath_tx_status *ts, struct ath_buf *bf,
struct list_head *bf_head)
@@ -739,7 +781,7 @@ static void ath_tx_process_buffer(struct ath_softc *sc, struct ath_txq *txq,

ts->duration = ath9k_hw_get_duration(sc->sc_ah, bf->bf_desc,
ts->ts_rateindex);
- ath_debug_tx_airtime(sc, bf, ts);
+ ath_tx_count_airtime(sc, bf, ts);

if (!bf_isampdu(bf)) {
if (!flush) {
@@ -1989,6 +2031,7 @@ void ath_txq_schedule(struct ath_softc *sc, struct ath_txq *txq)
struct ath_common *common = ath9k_hw_common(sc->sc_ah);
struct ath_atx_tid *tid, *last_tid;
struct list_head *tid_list;
+ struct ath_acq *acq;
bool sent = false;

if (txq->mac80211_qnum < 0)
@@ -1998,37 +2041,51 @@ void ath_txq_schedule(struct ath_softc *sc, struct ath_txq *txq)
return;

spin_lock_bh(&sc->chan_lock);
- tid_list = &sc->cur_chan->acq[txq->mac80211_qnum];
-
- if (list_empty(tid_list)) {
- spin_unlock_bh(&sc->chan_lock);
- return;
- }
+ acq = &sc->cur_chan->acq[txq->mac80211_qnum];

rcu_read_lock();

+ tid_list = &acq->acq_new;
+ if (list_empty(tid_list)) {
+ tid_list = &acq->acq_old;
+ if (list_empty(tid_list))
+ goto out;
+ }
last_tid = list_entry(tid_list->prev, struct ath_atx_tid, list);
- while (!list_empty(tid_list)) {
+ for (;;) {
bool stop = false;

if (sc->cur_chan->stopped)
break;

- tid = list_first_entry(tid_list, struct ath_atx_tid, list);
- list_del_init(&tid->list);
+ tid = list_first_entry_or_null(tid_list, struct ath_atx_tid, list);
+ if (!tid)
+ break;
+
+ if (tid->an->airtime_deficit <= 0) {
+ tid->an->airtime_deficit += 300;
+ list_move_tail(&tid->list, &acq->acq_old);
+ continue;
+ }
+

if (ath_tx_sched_aggr(sc, txq, tid, &stop))
sent = true;

+ /* Give this tid a chance to try again when we have hw buffer space */
+ if (stop)
+ break;
+
/*
- * add tid to round-robin queue if more frames
- * are pending for the tid
+ * If we still have packets or we're in the new list, we need to
+ * pass through old list. The latter case is to prevent
+ * starvation.
*/
- if (ath_tid_has_buffered(tid))
- ath_tx_queue_tid(sc, txq, tid);
+ if (ath_tid_has_buffered(tid) || ((tid_list == &acq->acq_new) && !list_empty(&acq->acq_old)))
+ list_move_tail(&tid->list, &acq->acq_old);
+ else
+ list_del_init(&tid->list);

- if (stop)
- break;

if (tid == last_tid) {
if (!sent)
@@ -2039,7 +2096,7 @@ void ath_txq_schedule(struct ath_softc *sc, struct ath_txq *txq)
struct ath_atx_tid, list);
}
}
-
+out:
rcu_read_unlock();
spin_unlock_bh(&sc->chan_lock);
}
@@ -2934,6 +2991,8 @@ void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
struct ath_atx_tid *tid;
int tidno, acno;

+ an->airtime_deficit = 300;
+
for (tidno = 0;
tidno < IEEE80211_NUM_TIDS;
tidno++) {
--
2.7.4

2016-06-03 16:52:40

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit

This adds RX airtime to the airtime deficit used in the scheduler. This
is not a definite win, but I have only done very limited testing where
it has been included. Feel free to skip this patch when testing.

Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
---
drivers/net/wireless/ath/ath9k/recv.c | 51 ++++++++++++++++++++++++++++++++++-
1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c
index 7eb8980..23d6ebe 100644
--- a/drivers/net/wireless/ath/ath9k/recv.c
+++ b/drivers/net/wireless/ath/ath9k/recv.c
@@ -991,6 +991,55 @@ static void ath9k_apply_ampdu_details(struct ath_softc *sc,
}
}

+static void ath_rx_count_airtime(struct ath_softc *sc,
+ struct ath_rx_status *rs,
+ struct sk_buff *skb)
+{
+ struct ath_node *an;
+ struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
+ struct ath_hw *ah = sc->sc_ah;
+ struct ath_common *common = ath9k_hw_common(ah);
+ struct ieee80211_sta *sta;
+ struct ieee80211_rx_status *rxs;
+ const struct ieee80211_rate *rate;
+ bool is_sgi, is_40, is_sp;
+ int phy;
+ u32 airtime = 0;
+
+ if (!ieee80211_is_data(hdr->frame_control))
+ return;
+
+ rcu_read_lock();
+
+ sta = ieee80211_find_sta_by_ifaddr(sc->hw, hdr->addr2, NULL);
+ if (!sta)
+ goto exit;
+ an = (struct ath_node *) sta->drv_priv;
+ rxs = IEEE80211_SKB_RXCB(skb);
+
+ is_sgi = !!(rxs->flag & RX_FLAG_SHORT_GI);
+ is_40 = !!(rxs->flag & RX_FLAG_40MHZ);
+ is_sp = !!(rxs->flag & RX_FLAG_SHORTPRE);
+
+ if (!!(rxs->flag & RX_FLAG_HT)) {
+ /* MCS rates */
+
+ airtime += ath_pkt_duration(sc, rxs->rate_idx, rs->rs_datalen,
+ is_40, is_sgi, is_sp);
+ } else {
+
+ phy = IS_CCK_RATE(rs->rs_rate) ? WLAN_RC_PHY_CCK : WLAN_RC_PHY_OFDM;
+ rate = &common->sbands[rxs->band].bitrates[rxs->rate_idx];
+ airtime += ath9k_hw_computetxtime(ah, phy, rate->bitrate * 100,
+ rs->rs_datalen, rxs->rate_idx, is_sp);
+ }
+
+ an->airtime_deficit -= airtime;
+ ath_debug_airtime(sc, an, airtime, 0);
+exit:
+ rcu_read_unlock();
+}
+
int ath_rx_tasklet(struct ath_softc *sc, int flush, bool hp)
{
struct ath_rxbuf *bf;
@@ -1137,7 +1186,7 @@ int ath_rx_tasklet(struct ath_softc *sc, int flush, bool hp)
ath9k_antenna_check(sc, &rs);
ath9k_apply_ampdu_details(sc, &rs, rxs);
ath_debug_rate_stats(sc, &rs, skb);
- ath_debug_rx_airtime(sc, &rs, skb);
+ ath_rx_count_airtime(sc, &rs, skb);

hdr = (struct ieee80211_hdr *)skb->data;
if (ieee80211_is_ack(hdr->frame_control))
--
2.7.4

2016-06-06 02:59:01

by Tim Shepard

[permalink] [raw]
Subject: Re: [RFC/RFT 2/5] ath9k: use mac80211 intermediate software queues



> > Huh. I wonder why you did that? Is it really better to call the
> > ieee80211_txq a swq and call the ath9k hardware queue a txq? I
> > thought doing the renaming made for more readable and much more
> > maintainable code (where searching for text strings produced much
> > cleaner results when trying to track down all references).
>
> Well, the immediate reason was that at the time I just spent two weeks
> figuring out how ath9k worked and what the different concepts were, and
> suddenly they start to mean something different. The txq->hwq renaming
> would probably make sense in itself, but suddenly the same variable
> (txq) meant something different, which was quite confusing. So I found
> it was easier to change your patch to not require the renaming.
>
> A secondary reason was to keep the patch delta as small as possible. I'm
> definitely not the right person to make that call, but I found the
> global renaming somewhat intrusive.
>

OK, makes sense. But you left it called txq in mac80211. So someone
reading the mac80211 code and ath9k at the same time (to understand
the whole stack) winds up with txq meaning different things, including
in places in ath9k where it has to reference a field in a structure
defined by mac80211's header files to point to one of its txqs.

> As for the first point, I think it would be easier if you did not call
> the mac80211 queues 'txq' but something else ('swq' like I did, perhaps;
> I also considered 'imq' for intermediate queue). This will at least make
> it clear at a glance that it is something different than 'txq' was
> previously.

I'm not the one who called them txq. That was Felix's patch.
He also wrote the mt76 driver and in that driver txq consistently
means the mac80211 intermediate queue and not a driver internal queue
or a hardware queue. So I was trying to converge ath9k to be more
like the one and only example I had of a driver that uses the
intermediate queue.


> > I am grateful to learn that someone has read my patched version of
> > ath9k at least enough to do this rework. Any comments on the actual
> > work?
>
> Well, it seems to work well enough (haven't noticed the pending_frames
> issue, but on the other hand I haven't been looking very hard). However,
> it does fell like somewhat of a temporary solution. Having another
> intermediate queue in the driver (tid->i_q) and having a side-effect in
> ath_tid_has_buffered() seems a bit iffy to me. Wouldn't the 'right' way
> to do it be to have ath_tid_has_buffered() ask the mac layer if it has
> any frames queued, and then have ath_tx_get_tid_subframe() basically
> translate straight to a call to ieee80211_tx_dequeue() (taking into
> account the retry queue)? Not sure if that covers all code paths, but
> the current approach seems like it has one too many layers of queues?
>
> Haven't given the above a lot of thought, but since you asked... :)

Thanks. I've gotten no other feedback that suggests anyone else has
read the code. So I very much appreciate it.

Yes, I didn't like that side effect either, but (at least for my first
attempt) wanted to leave the old transmit path working, in particular
because I couldn't see how to get all the chanctx stuff working right
without leaving the old driver-internal queues in place. (I'm not
even sure what I would have to do to test the driver with
CONFIG_ATH9K_CHANNEL_CONTEXT turned on. I did test that it compiles
without error when CONFIG_ATH9K_CHANNEL_CONTEXT=y before I released my
v2 patch, and I tried to understand it enough to avoid breaking
anything. (My v1 patch broke compilation when
CONFIG_ATH9K_CHANNEL_CONTEXT=y as some robot notified me after I
posted it.)

I looked for a way to ask mac80211 if there are any packets left in
the intermediate queue without dequeueing a packet and I failed to
find such an interface. I guess I should have submitted a patch to
add that to mac80211. Or maybe there's a way to refactor the
aggregation code in ath9k so that it can cleanly work with the
existing ieee80211_tx_dequeue() without having to add another
interface to mac80211, but I didn't figure it out. It would have been
a bigger patch to ath9k, and require more thinking when reading the
patch to see if it works (assuming pre-patch ath9k works).

My current approach was an attempt to get it working. Yes, the code
looks like it has another layer of queues, but at run time there's
only one packet at a time on the internal queue that packets wind up
on from the mac80211 intermediate queue and by putting it on that
queue it can interface to the same code that pulls from that queue or
the previously existing queues (which the chanctx code seems to
require remain around without major surgery).

I thought about first patching ath9k to remove all the chanctx stuff
just to have a simpler starting point to work on, but I wouldn't
expect that patch to ever be accepted upstream (assuming that somebody
somewhere is getting useful stuff from the chanctx stuff).

Again, thanks for looking at this patch.

I'm now working on figuring out the right way to fix my bug in
the <= v2 patch where I fail to respect the hwq_max_pending
values on the new transmit path, and I plan to post a v3 patch
when I get that done.

-Tim Shepard
[email protected]

2016-06-06 17:28:59

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [RFC/RFT 2/5] ath9k: use mac80211 intermediate software queues

Tim Shepard <[email protected]> writes:

> OK, makes sense. But you left it called txq in mac80211. So someone
> reading the mac80211 code and ath9k at the same time (to understand
> the whole stack) winds up with txq meaning different things, including
> in places in ath9k where it has to reference a field in a structure
> defined by mac80211's header files to point to one of its txqs.

Yeah, realise there's a problem here. I was coming at it from the ath9k
side, so to speak, and having the variable txq suddenly be something
else was confusing.
>
>> As for the first point, I think it would be easier if you did not call
>> the mac80211 queues 'txq' but something else ('swq' like I did, perhaps;
>> I also considered 'imq' for intermediate queue). This will at least make
>> it clear at a glance that it is something different than 'txq' was
>> previously.
>
> I'm not the one who called them txq.

I was referring to the variable names, not the struct name. Having
'txq->foo' suddenly be something else is what ticked me off.

> That was Felix's patch. He also wrote the mt76 driver and in that
> driver txq consistently means the mac80211 intermediate queue and not
> a driver internal queue or a hardware queue. So I was trying to
> converge ath9k to be more like the one and only example I had of a
> driver that uses the intermediate queue.

Well, that is certainly an argument for going ahead with the renaming.
Hmm, would posting the renaming as a proper proposed patch explaining
the reasoning be a way to get some feedback on whether this would be a
tractable way forward (and also of keeping the delta against mainline
lower)?

> Thanks. I've gotten no other feedback that suggests anyone else has
> read the code. So I very much appreciate it.

You're very welcome; your patch made it possible for me to get straight
to hacking on the parts that I wanted to, without having to first figure
out how to best interface with the mac80211 stuff. Very helpful :)

> Yes, I didn't like that side effect either, but (at least for my first
> attempt) wanted to leave the old transmit path working, in particular
> because I couldn't see how to get all the chanctx stuff working right
> without leaving the old driver-internal queues in place. (I'm not even
> sure what I would have to do to test the driver with
> CONFIG_ATH9K_CHANNEL_CONTEXT turned on. I did test that it compiles
> without error when CONFIG_ATH9K_CHANNEL_CONTEXT=y before I released my
> v2 patch, and I tried to understand it enough to avoid breaking
> anything. (My v1 patch broke compilation when
> CONFIG_ATH9K_CHANNEL_CONTEXT=y as some robot notified me after I
> posted it.)

Right. Well I have been cheerfully ignoring the chanctx stuff so far.
What does that do?

> I looked for a way to ask mac80211 if there are any packets left in
> the intermediate queue without dequeueing a packet and I failed to
> find such an interface. I guess I should have submitted a patch to add
> that to mac80211. Or maybe there's a way to refactor the aggregation
> code in ath9k so that it can cleanly work with the existing
> ieee80211_tx_dequeue() without having to add another interface to
> mac80211, but I didn't figure it out. It would have been a bigger
> patch to ath9k, and require more thinking when reading the patch to
> see if it works (assuming pre-patch ath9k works).

Well code actually building the aggregates is not the problem, I think.
That just loops on while(ath_tid_has_buffered()) which is pretty
straight forward to turn into a dequeue + checking for NULL. It's the
aggr_{sleep,wakup,resume} that's conditioning txq wakeup on
ath_tid_has_buffered() that's the main problem I guess. What would
happen if that was changed to just unconditionally schedule the tid on
wakeup?

But maybe having an ieee80211_tx_peek() function would be useful in any
case? It seems that there's an internal data structure in mac80211
(struct txq_info) that keeps a byte count for that queue, so exporting
that would be trivial...

> I'm now working on figuring out the right way to fix my bug in the <=
> v2 patch where I fail to respect the hwq_max_pending values on the new
> transmit path, and I plan to post a v3 patch when I get that done.

What's the symptom of this? As I said I haven't noticed anything, but it
might be worth looking out for.

-Toke

2016-06-03 17:38:06

by Tim Shepard

[permalink] [raw]
Subject: Re: [RFC/RFT 2/5] ath9k: use mac80211 intermediate software queues



> Reworked to not require the global variable renaming in ath9k.
>
> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>


Huh. I wonder why you did that? Is it really better to call the
ieee80211_txq a swq and call the ath9k hardware queue a txq? I
thought doing the renaming made for more readable and much more
maintainable code (where searching for text strings produced much
cleaner results when trying to track down all references).


I am grateful to learn that someone has read my patched version of
ath9k at least enough to do this rework. Any comments on the actual
work?

I've recently figured out that I botched something to do with flow
control from the mac80211 intermediate queue into the hwq. With my v2
patch the ath9k xmit path via the intermediate queues fails to
increment hwq->pending_frames, making the hwq_max_pending values
(which are tweakble in the debug file system) inneffective. I'm in
the process of fixing this flaw in my patch and hope to have that done
soon so I can post a v3 of my ath9k patch.


Now I'm wondering if there is some reason I should also rework it to
avoid the renaming, making it more like what you just posted.

Or was your effort to rework the patch to avoid the renaming just to
make things easier in the interim while we (some of us) are testing
this patch in various systems (with slightly different versions of
ath9k driver)?


-Tim Shepard
[email protected]

2016-06-06 05:50:36

by Tim Shepard

[permalink] [raw]
Subject: Re: [RFC/RFT 2/5] ath9k: use mac80211 intermediate software queues



> > Thanks. I've gotten no other feedback that suggests anyone else has
> > read the code. So I very much appreciate it.
>
> I not only read it but tested it extensively against the ath9k stock,
> ath10k stock, ath10k -michal's fqmac 3.5 patches, and your ath9k
> patches...

My patch to ath9k shouldn't have any effect on any of the ath10k
stuff. It only cuts ath9k over to use the new intermediate queues.

I should point out again that Avery's observation that michal's
mac80211 flow queueing patches and mac80211 codel stuff aren't needed
to the improvement between competing client stations. All that's
needed is to use the new mac80211 per-station per-tid intermediate
queues and also set the IFF_NO_QUEUE bit.

For ath9k, Felix's mac80211 interemediate queues patch (already in
mainline Linux over a year ago), my patch to ath9k, and just
Michal's first patch alone "[PATCHv3 1/5] mac80211: skip netdev
queue control with software queuing" should (and seems to in
testing I've done so far) get all the latency improvement there is
to be had when the competing traffic is to a different client
station.



> After losing my temper at the impact of channel scans...
>
> ( https://plus.google.com/u/0/107942175615993706558/posts/WA915Pt4SRN
> ), I got told how to get rid of them for testing, and started redoing
> the work when I got back from vacation.

Heh... I've never seen that problem. But I'm not running
network-manager on any machine in my testbed.


> > I looked for a way to ask mac80211 if there are any packets left in
> > the intermediate queue without dequeueing a packet and I failed to
> > find such an interface.
>
> qdisc->peek like function? A bitmap that says these stations have data
> outstanding?
>

I'm not sure what you mean here. Are you asking if that was what I
was looking for in the existing mac80211 code? If so, yes, that is
what I was looking for. Or are you suggesting (or agreeing) that
indeed such an interface should be added to mac80211?

Though I don't see how it could be a bitmap... stations don't have
any obvious stable associated index into such a bitmap.

The real problem might be the way all the code in ath9k/xmit.c is
designed where it needs ath_tid_has_buffered() in so many places.

Look at the mt76 driver, and see how it also maintains a small
(probably only at most one packet ever) queue "retry_q" which
sometimes does hold a packet which will get sent before it will
pull another packet from ieee80211_tx_dequeue. But the rest of the
driver is structured so that a NULL return from mt76_txq_dequeue()
is all it needs. That's probably the sort of structure we ought to
have in ath9k, but that's going to be a more invasive patch to
ath9k (and I haven't figured out how to do it yet).

I think we can get there incrementally though. I hope the patch for
ath9k that I've got now (and the one I hope to have soon that fixes
the problem of not making use of hwq_max_pending() ) might be useful
intermediate steps (that work, and provide improvements) then we can
clean up ath9k removing mechanisms that are no longer needed (breaking
the old transmit path, and somehow handling the chanctx stuff).


-Tim Shepard
[email protected]

2016-06-10 09:02:37

by Michal Kazior

[permalink] [raw]
Subject: Re: [Make-wifi-fast] [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit

On 10 June 2016 at 10:53, Toke Høiland-Jørgensen <[email protected]> wrote:
>
>>> I initially thought that using the timestamp put into the frame by the
>>> hardware could be a way to get timing. But there's only a timestamp of
>>> the first symbol in rs_tstamp, and getting a time to compare it with is
>>> difficult; by the time the frame is handled in the rx tasklet, way too
>>> much time has been spent on interrupt handling etc for the current time
>>> to be worth comparing with.
>
> As an aside, I'm no longer sure this explanation for why I got wrong
> timings for this way of measuring RX time is correct. It may simply be
> that I was counting the same time interval more than once because I
> didn't realise that the handler would be called for each frame. See
> below.
>
>> I think rs_tstamp in rx-status is different for first MPDU and last
>> MPDU in an A-MPDU meaning you should be able to compute the entire
>> duration (if you track it, and this should be fairly straightforward
>> as you can't really rx interleaved MPDUs from different
>> A-MPDUs/stations). I'm not sure if the last MPDU defines the tstamp of
>> first symbol or last one.
>
> I actually went down this path again last night, but haven't had a
> chance to test it yet.
>
> So what I have now is this:
>
> /* Only count airtime for last frame in an aggregate. FIXME: Should
> * this be only the first frame? */
> if (!rs->rs_isaggr || !rs->rs_moreaggr)
> airtime = (tsf & 0xffffffff) - rs->rs_tstamp;
>
> Which was under the assumption that rs_tstamp will be the time of the
> first symbol *of the whole ampdu* for all the frames in that aggregate.
> However, if rs_tstamp differs between frames, tracking it at the first
> frame might be the right things to do?

For A-MPDU all MPDU rx status (except last one) should share the same
timestamp. Last one has a different one so all you need is to
distinguish first and last MPDU. Non A-MPDU obviously are special case
(status bits are pricky).


> Is the entire A-MPDU received before the RX handler is called for the
> first frame?

No idea. Maybe it is as there's distinction between "more" and "moreaggr".


Michał

2016-06-05 17:23:09

by Adrian Chadd

[permalink] [raw]
Subject: Re: [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit

On 5 June 2016 at 03:55, Toke Høiland-Jørgensen <[email protected]> wrote:
> Adrian Chadd <[email protected]> writes:
>
>> I've been working on something similar in freebsd, so cool to see this
>> happening here!
>
> Cool. Do you have code available somewhere?

not yet. It was mostly what you just did in ath9k anyway :)

The sample rate control module and net80211 already had the duration
calculation bits, and I was just trying to figure out how to track it
per-node in a useful way without blowing CPU time.

>
>> The only thing missing atm is STBC and LDPC. My RX airtime code looks
>> basically like this one too; but I have TODO items for ensuring
>> LDPC/STBC calculations are sane.
>
> So basically this means taking into account whether there was MIMO in
> use (which may lower the transmit time)? My understanding is that (at
> least in Linux) this is encoded in the rate tables (i.e. a MIMO rate is
> a separate entry). Am I wrong in thinking so? Or is this something else
> entirely?

The existing routines take GI, MCS rate, channel width; figure out
MIMO, and do the math. The assumption is the symbol count is the same.
For STBC I think there's an additional symbol at the end (because it's
mixed current symbol plus previous symbol.) I forget the details for
LDPC.

I'll er, go read the standard again. :)


-adrian

2016-06-05 10:55:18

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit

Adrian Chadd <[email protected]> writes:

> I've been working on something similar in freebsd, so cool to see this
> happening here!

Cool. Do you have code available somewhere?

> The only thing missing atm is STBC and LDPC. My RX airtime code looks
> basically like this one too; but I have TODO items for ensuring
> LDPC/STBC calculations are sane.

So basically this means taking into account whether there was MIMO in
use (which may lower the transmit time)? My understanding is that (at
least in Linux) this is encoded in the rate tables (i.e. a MIMO rate is
a separate entry). Am I wrong in thinking so? Or is this something else
entirely?

-Toke

2016-06-08 13:06:33

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit

Adrian Chadd <[email protected]> writes:

> Right. In the case of RX'ing an A-MPDU, we only get told about the
> A-MPDU boundaries (isaggr/lastaggr or something in the RX descriptor)
> but nothing telling us how long the original RX'ed PPDU is.
>
> So if we get say 16 frames and we are missing the middle one, we can
> reconstruct things okay. But if we miss the first 8 frames, we don't
> know when it started - we only get the RX aggr boundary flags set on
> the 9th and the 16th and we don't even know about the missed frames.

Yes, discovering retransmissions is difficult at the RX side by nature.
Another approach to catching (some of) these is trying to parse the MAC
address even for corrupted frames, and if it fits to a station, account
the airtime. Of course this will only catch cases where the frame header
does make it through, and it won't work if the hardware discards the
frame as corrupt before we get to see it.

> I think that's going to be a shortcoming right now. I couldn't think
> of a clever way to figure it out except to detect holes in the BAW and
> determine the client is missing frames and take actions there.

Yes, I'm fine with having the RX side airtime accounting just ignore the
retransmission issue, at least for now.

-Toke

2016-06-10 09:08:06

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [Make-wifi-fast] [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit

Michal Kazior <[email protected]> writes:

> For A-MPDU all MPDU rx status (except last one) should share the same
> timestamp. Last one has a different one so all you need is to
> distinguish first and last MPDU. Non A-MPDU obviously are special case
> (status bits are pricky).

Right. So comparing the rs_stamp between first and last MPDU should give
the duration of the entire thing? This would require keeping state
between subsequent calls to the RX handler. Also, what happens if the
last MPDU is lost?

>> Is the entire A-MPDU received before the RX handler is called for the
>> first frame?
>
> No idea. Maybe it is as there's distinction between "more" and
> "moreaggr".

Hmm. If it is, comparing the stamp of the first MPDU to the current time
(when handling it) should give the needed duration? Will try doing that
and see what the result is.

-Toke

2016-06-07 08:58:09

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit

Adrian Chadd <[email protected]> writes:

> [snip]
>
> I also found one of my notes in my version of this - how can we
> estimate the duration of an A-MPDU, when we only get hardware
> de-encapsulated frames?

Well in my case I'm sidestepping this by getting the TX duration from a
register in the hardware. There seems to be registers containing the
duration spent on each step in the retry chain; I simply sum these.

-Toke

2016-06-06 17:26:50

by Dave Taht

[permalink] [raw]
Subject: Re: [RFC/RFT 2/5] ath9k: use mac80211 intermediate software queues

For the record, michal's lastest patchset for the ath10k is here:

https://github.com/kazikcz/linux/tree/fqmac-v5

which includes the reworked codel.h support (which also landed in
net-next as of april 22) (no, haven't tried it yet, I'm only a day
back from vacation)

... but it would pay to leverage rate control more, for the ath9k, and
I'd like folk to agree on a standardized set of statistics in a std
location that can be polled for all implementations (ath9k, ath10k,
mt76)


I am also reviewing this:

http://info.iet.unipi.it/~luigi/papers/20160511-mysched-preprint.pdf

as we have a chance to innovate and use less locking with all this
stuff happening at the mac80211 layer.

2016-06-07 01:31:10

by Jonathan Morton

[permalink] [raw]
Subject: Re: [Make-wifi-fast] [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit


> On 7 Jun, 2016, at 03:01, Adrian Chadd <[email protected]> wrote:
>
> I also found one of my notes in my version of this - how can we
> estimate the duration of an A-MPDU, when we only get hardware
> de-encapsulated frames?

If my understanding is correct:

The A-MPDU itself is sent at the “data” rate (selected by Minstrel), so its total size can be used to calculate this portion of its duration in the same way as any other packet.

It is also surrounded by the same framing/preamble material, sent at the fixed “control” rate, as any normal frame.

The sum of these two major components should be the total airtime, no?

Or is the difficulty in identifying which de-encapsulated frames belonged to a particular A-MPDU?

- Jonathan Morton


2016-06-06 04:26:01

by Dave Taht

[permalink] [raw]
Subject: Re: [RFC/RFT 2/5] ath9k: use mac80211 intermediate software queues

On Sun, Jun 5, 2016 at 7:59 PM, Tim Shepard <[email protected]> wrote:
>
>
>> > Huh. I wonder why you did that? Is it really better to call the
>> > ieee80211_txq a swq and call the ath9k hardware queue a txq? I
>> > thought doing the renaming made for more readable and much more
>> > maintainable code (where searching for text strings produced much
>> > cleaner results when trying to track down all references).
>>
>> Well, the immediate reason was that at the time I just spent two weeks
>> figuring out how ath9k worked and what the different concepts were, and
>> suddenly they start to mean something different. The txq->hwq renaming
>> would probably make sense in itself, but suddenly the same variable
>> (txq) meant something different, which was quite confusing. So I found
>> it was easier to change your patch to not require the renaming.
>>
>> A secondary reason was to keep the patch delta as small as possible. I'm
>> definitely not the right person to make that call, but I found the
>> global renaming somewhat intrusive.
>>
>
> OK, makes sense. But you left it called txq in mac80211. So someone
> reading the mac80211 code and ath9k at the same time (to understand
> the whole stack) winds up with txq meaning different things, including
> in places in ath9k where it has to reference a field in a structure
> defined by mac80211's header files to point to one of its txqs.
>
>> As for the first point, I think it would be easier if you did not call
>> the mac80211 queues 'txq' but something else ('swq' like I did, perhaps;
>> I also considered 'imq' for intermediate queue). This will at least make
>> it clear at a glance that it is something different than 'txq' was
>> previously.
>
> I'm not the one who called them txq. That was Felix's patch.
> He also wrote the mt76 driver and in that driver txq consistently
> means the mac80211 intermediate queue and not a driver internal queue
> or a hardware queue. So I was trying to converge ath9k to be more
> like the one and only example I had of a driver that uses the
> intermediate queue.

Well, finding a common and minimally invasive (re)naming scheme would be good.

>
>> > I am grateful to learn that someone has read my patched version of
>> > ath9k at least enough to do this rework. Any comments on the actual
>> > work?
>>
>> Well, it seems to work well enough (haven't noticed the pending_frames
>> issue, but on the other hand I haven't been looking very hard). However,
>> it does fell like somewhat of a temporary solution. Having another
>> intermediate queue in the driver (tid->i_q) and having a side-effect in
>> ath_tid_has_buffered() seems a bit iffy to me. Wouldn't the 'right' way
>> to do it be to have ath_tid_has_buffered() ask the mac layer if it has
>> any frames queued, and then have ath_tx_get_tid_subframe() basically
>> translate straight to a call to ieee80211_tx_dequeue() (taking into
>> account the retry queue)? Not sure if that covers all code paths, but
>> the current approach seems like it has one too many layers of queues?
>>
>> Haven't given the above a lot of thought, but since you asked... :)
>
> Thanks. I've gotten no other feedback that suggests anyone else has
> read the code. So I very much appreciate it.

I not only read it but tested it extensively against the ath9k stock,
ath10k stock, ath10k -michal's fqmac 3.5 patches, and your ath9k
patches...

After losing my temper at the impact of channel scans...

( https://plus.google.com/u/0/107942175615993706558/posts/WA915Pt4SRN
), I got told how to get rid of them for testing, and started redoing
the work when I got back from vacation. The reductions in latency
without a decline in throughput were wonderful vs stock in all cases.

The latency reductions at lower rates and speed of response to rate
changes was not what I'd hoped for, but given all the other pieces
flying in loose formation (like the need for an airtime fair scheduler
also), I'm still pretty happy.

http://blog.cerowrt.org/flent/nmfixed/20mbit_not_entirely_convinced_weve_won_completely.svg

http://blog.cerowrt.org/flent/nmfixed/160_vs_80_half_delay.svg

(More plots and data in that directory and in the channel_scan directory.)

At the moment however I plan to pause until cleaner patch sets arrive....

> Yes, I didn't like that side effect either, but (at least for my first
> attempt) wanted to leave the old transmit path working, in particular
> because I couldn't see how to get all the chanctx stuff working right
> without leaving the old driver-internal queues in place. (I'm not
> even sure what I would have to do to test the driver with
> CONFIG_ATH9K_CHANNEL_CONTEXT turned on. I did test that it compiles
> without error when CONFIG_ATH9K_CHANNEL_CONTEXT=y before I released my
> v2 patch, and I tried to understand it enough to avoid breaking
> anything. (My v1 patch broke compilation when
> CONFIG_ATH9K_CHANNEL_CONTEXT=y as some robot notified me after I
> posted it.)
>
> I looked for a way to ask mac80211 if there are any packets left in
> the intermediate queue without dequeueing a packet and I failed to
> find such an interface.

qdisc->peek like function? A bitmap that says these stations have data
outstanding?

> I guess I should have submitted a patch to
> add that to mac80211. Or maybe there's a way to refactor the
> aggregation code in ath9k so that it can cleanly work with the
> existing ieee80211_tx_dequeue() without having to add another
> interface to mac80211, but I didn't figure it out. It would have been
> a bigger patch to ath9k, and require more thinking when reading the
> patch to see if it works (assuming pre-patch ath9k works).
>
> My current approach was an attempt to get it working. Yes, the code
> looks like it has another layer of queues, but at run time there's
> only one packet at a time on the internal queue that packets wind up
> on from the mac80211 intermediate queue and by putting it on that
> queue it can interface to the same code that pulls from that queue or
> the previously existing queues (which the chanctx code seems to
> require remain around without major surgery).
>
> I thought about first patching ath9k to remove all the chanctx stuff
> just to have a simpler starting point to work on, but I wouldn't
> expect that patch to ever be accepted upstream (assuming that somebody
> somewhere is getting useful stuff from the chanctx stuff).
>
> Again, thanks for looking at this patch.
>
> I'm now working on figuring out the right way to fix my bug in
> the <= v2 patch where I fail to respect the hwq_max_pending
> values on the new transmit path, and I plan to post a v3 patch
> when I get that done.
>
> -Tim Shepard
> [email protected]
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html



--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

2016-06-04 17:06:10

by Adrian Chadd

[permalink] [raw]
Subject: Re: [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit

hi,

I've been working on something similar in freebsd, so cool to see this
happening here!

The only thing missing atm is STBC and LDPC. My RX airtime code looks
basically like this one too; but I have TODO items for ensuring
LDPC/STBC calculations are sane.



-adrian


On 3 June 2016 at 09:51, Toke Høiland-Jørgensen <[email protected]> wrote:
> This adds RX airtime to the airtime deficit used in the scheduler. This
> is not a definite win, but I have only done very limited testing where
> it has been included. Feel free to skip this patch when testing.
>
> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
> ---
> drivers/net/wireless/ath/ath9k/recv.c | 51 ++++++++++++++++++++++++++++++++++-
> 1 file changed, 50 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c
> index 7eb8980..23d6ebe 100644
> --- a/drivers/net/wireless/ath/ath9k/recv.c
> +++ b/drivers/net/wireless/ath/ath9k/recv.c
> @@ -991,6 +991,55 @@ static void ath9k_apply_ampdu_details(struct ath_softc *sc,
> }
> }
>
> +static void ath_rx_count_airtime(struct ath_softc *sc,
> + struct ath_rx_status *rs,
> + struct sk_buff *skb)
> +{
> + struct ath_node *an;
> + struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
> + struct ath_hw *ah = sc->sc_ah;
> + struct ath_common *common = ath9k_hw_common(ah);
> + struct ieee80211_sta *sta;
> + struct ieee80211_rx_status *rxs;
> + const struct ieee80211_rate *rate;
> + bool is_sgi, is_40, is_sp;
> + int phy;
> + u32 airtime = 0;
> +
> + if (!ieee80211_is_data(hdr->frame_control))
> + return;
> +
> + rcu_read_lock();
> +
> + sta = ieee80211_find_sta_by_ifaddr(sc->hw, hdr->addr2, NULL);
> + if (!sta)
> + goto exit;
> + an = (struct ath_node *) sta->drv_priv;
> + rxs = IEEE80211_SKB_RXCB(skb);
> +
> + is_sgi = !!(rxs->flag & RX_FLAG_SHORT_GI);
> + is_40 = !!(rxs->flag & RX_FLAG_40MHZ);
> + is_sp = !!(rxs->flag & RX_FLAG_SHORTPRE);
> +
> + if (!!(rxs->flag & RX_FLAG_HT)) {
> + /* MCS rates */
> +
> + airtime += ath_pkt_duration(sc, rxs->rate_idx, rs->rs_datalen,
> + is_40, is_sgi, is_sp);
> + } else {
> +
> + phy = IS_CCK_RATE(rs->rs_rate) ? WLAN_RC_PHY_CCK : WLAN_RC_PHY_OFDM;
> + rate = &common->sbands[rxs->band].bitrates[rxs->rate_idx];
> + airtime += ath9k_hw_computetxtime(ah, phy, rate->bitrate * 100,
> + rs->rs_datalen, rxs->rate_idx, is_sp);
> + }
> +
> + an->airtime_deficit -= airtime;
> + ath_debug_airtime(sc, an, airtime, 0);
> +exit:
> + rcu_read_unlock();
> +}
> +
> int ath_rx_tasklet(struct ath_softc *sc, int flush, bool hp)
> {
> struct ath_rxbuf *bf;
> @@ -1137,7 +1186,7 @@ int ath_rx_tasklet(struct ath_softc *sc, int flush, bool hp)
> ath9k_antenna_check(sc, &rs);
> ath9k_apply_ampdu_details(sc, &rs, rxs);
> ath_debug_rate_stats(sc, &rs, skb);
> - ath_debug_rx_airtime(sc, &rs, skb);
> + ath_rx_count_airtime(sc, &rs, skb);
>
> hdr = (struct ieee80211_hdr *)skb->data;
> if (ieee80211_is_ack(hdr->frame_control))
> --
> 2.7.4
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2016-06-07 11:12:58

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit

Toke Høiland-Jørgensen <[email protected]> writes:

>> [snip]
>>
>> I also found one of my notes in my version of this - how can we
>> estimate the duration of an A-MPDU, when we only get hardware
>> de-encapsulated frames?
>
> Well in my case I'm sidestepping this by getting the TX duration from
> a register in the hardware. There seems to be registers containing the
> duration spent on each step in the retry chain; I simply sum these.

Ah, but you're still talking RX? Hmm, I'm using ath_pkt_duration() to
compute the RX time, which does take into account MIMO (I think) but
expects the size to include padding. Which is probably not included in
the rs_datalen field of struct ath_rx_status that I'm using.

So yeah, how to account for that?

I initially thought that using the timestamp put into the frame by the
hardware could be a way to get timing. But there's only a timestamp of
the first symbol in rs_tstamp, and getting a time to compare it with is
difficult; by the time the frame is handled in the rx tasklet, way too
much time has been spent on interrupt handling etc for the current time
to be worth comparing with.

-Toke

2016-06-10 09:20:17

by Michal Kazior

[permalink] [raw]
Subject: Re: [Make-wifi-fast] [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit

On 10 June 2016 at 11:08, Toke Høiland-Jørgensen <[email protected]> wrote:
> Michal Kazior <[email protected]> writes:
>
>> For A-MPDU all MPDU rx status (except last one) should share the same
>> timestamp. Last one has a different one so all you need is to
>> distinguish first and last MPDU. Non A-MPDU obviously are special case
>> (status bits are pricky).
>
> Right. So comparing the rs_stamp between first and last MPDU should give
> the duration of the entire thing?

Depends on how you define your "thing" :) I no, I don't know what
you'll actually measure. It should be reasonable either way.


> This would require keeping state
> between subsequent calls to the RX handler. Also, what happens if the
> last MPDU is lost?

No idea. If that's possible, then track last MPDU within PPDU, so you
can at least fallback to _something_ when you detect a new first
(A-)MPDU?

Or maybe it's impossible (i.e. not worth worrying) and HW always
reports last MPDU as far as status bits are concerned (regardless of
it being _actual_ last MPDU, i.e. it just says "ok, I'm done with this
PPDU").


>>> Is the entire A-MPDU received before the RX handler is called for the
>>> first frame?
>>
>> No idea. Maybe it is as there's distinction between "more" and
>> "moreaggr".
>
> Hmm. If it is, comparing the stamp of the first MPDU to the current time
> (when handling it) should give the needed duration? Will try doing that
> and see what the result is.

I'd say it's a little racy/inaccurate (and perhaps unreliable) to
compare any kind of global timer and compare it against your rx-status
descriptors.


Michał

2016-06-06 02:26:30

by Julian Calaby

[permalink] [raw]
Subject: Re: [RFC/RFT 1/5] mac80211: skip netdev queue control with software queuing

Hi Toke,

On Sat, Jun 4, 2016 at 2:51 AM, Toke Høiland-Jørgensen <[email protected]> wrote:
> From: Michal Kazior <[email protected]>
>
> Qdiscs are designed with no regard to 802.11
> aggregation requirements and hand out
> packet-by-packet with no guarantee they are
> destined to the same tid. This does more bad than
> good no matter how fairly a given qdisc may behave
> on an ethernet interface.
>
> Software queuing used per-AC netdev subqueue
> congestion control whenever a global AC limit was
> hit. This meant in practice a single station or
> tid queue could starve others rather easily. This
> could resonate with qdiscs in a bad way or could
> just end up with poor aggregation performance.
> Increasing the AC limit would increase induced
> latency which is also bad.
>
> Disabling qdiscs by default and performing
> taildrop instead of netdev subqueue congestion
> control on the other hand makes it possible for
> tid queues to fill up "in the meantime" while
> preventing stations starving each other.
>
> This increases aggregation opportunities and
> should allow software queuing based drivers
> achieve better performance by utilizing airtime
> more efficiently with big aggregates.
>
> Signed-off-by: Michal Kazior <[email protected]>

As this patch is passing through your hands, you need to add your
Signed-off-by too.

Thanks,

--
Julian Calaby

Email: [email protected]
Profile: http://www.google.com/profiles/julian.calaby/

2016-06-10 08:40:56

by Michal Kazior

[permalink] [raw]
Subject: Re: [Make-wifi-fast] [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit

On 7 June 2016 at 13:12, Toke Høiland-Jørgensen <[email protected]> wrote:
> Toke Høiland-Jørgensen <[email protected]> writes:
>
>>> [snip]
>>>
>>> I also found one of my notes in my version of this - how can we
>>> estimate the duration of an A-MPDU, when we only get hardware
>>> de-encapsulated frames?
>>
>> Well in my case I'm sidestepping this by getting the TX duration from
>> a register in the hardware. There seems to be registers containing the
>> duration spent on each step in the retry chain; I simply sum these.
>
> Ah, but you're still talking RX? Hmm, I'm using ath_pkt_duration() to
> compute the RX time, which does take into account MIMO (I think) but
> expects the size to include padding. Which is probably not included in
> the rs_datalen field of struct ath_rx_status that I'm using.
>
> So yeah, how to account for that?
>
> I initially thought that using the timestamp put into the frame by the
> hardware could be a way to get timing. But there's only a timestamp of
> the first symbol in rs_tstamp, and getting a time to compare it with is
> difficult; by the time the frame is handled in the rx tasklet, way too
> much time has been spent on interrupt handling etc for the current time
> to be worth comparing with.

I think rs_tstamp in rx-status is different for first MPDU and last
MPDU in an A-MPDU meaning you should be able to compute the entire
duration (if you track it, and this should be fairly straightforward
as you can't really rx interleaved MPDUs from different
A-MPDUs/stations). I'm not sure if the last MPDU defines the tstamp of
first symbol or last one.


Michał

2016-06-03 16:52:38

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: [RFC/RFT 2/5] ath9k: use mac80211 intermediate software queues

This patch leaves the code for ath9k's internal per-node per-tid
queues in place and just modifies the driver to also pull from
the new mac80211 intermediate software queues, and implements
the .wake_tx_queue method, which will cause mac80211 to deliver
packets to be sent via the new intermediate queue.

Signed-off-by: Tim Shepard <[email protected]>

Reworked to not require the global variable renaming in ath9k.

Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
---
drivers/net/wireless/ath/ath9k/ath9k.h | 16 +++-
drivers/net/wireless/ath/ath9k/debug_sta.c | 7 +-
drivers/net/wireless/ath/ath9k/init.c | 1 +
drivers/net/wireless/ath/ath9k/main.c | 1 +
drivers/net/wireless/ath/ath9k/xmit.c | 119 +++++++++++++++++++++++++----
5 files changed, 125 insertions(+), 19 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index 93b3793..caeae10 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -145,8 +145,6 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
#define BAW_WITHIN(_start, _bawsz, _seqno) \
((((_seqno) - (_start)) & 4095) < (_bawsz))

-#define ATH_AN_2_TID(_an, _tidno) (&(_an)->tid[(_tidno)])
-
#define IS_HT_RATE(rate) (rate & 0x80)
#define IS_CCK_RATE(rate) ((rate >= 0x18) && (rate <= 0x1e))
#define IS_OFDM_RATE(rate) ((rate >= 0x8) && (rate <= 0xf))
@@ -232,8 +230,10 @@ struct ath_buf {

struct ath_atx_tid {
struct list_head list;
+ struct sk_buff_head i_q;
struct sk_buff_head buf_q;
struct sk_buff_head retry_q;
+ struct ieee80211_txq *swq;
struct ath_node *an;
struct ath_txq *txq;
unsigned long tx_buf[BITS_TO_LONGS(ATH_TID_MAX_BUFS)];
@@ -247,13 +247,13 @@ struct ath_atx_tid {
s8 bar_index;
bool active;
bool clear_ps_filter;
+ bool swq_nonempty;
};

struct ath_node {
struct ath_softc *sc;
struct ieee80211_sta *sta; /* station struct we're part of */
struct ieee80211_vif *vif; /* interface with which we're associated */
- struct ath_atx_tid tid[IEEE80211_NUM_TIDS];

u16 maxampdu;
u8 mpdudensity;
@@ -271,6 +271,15 @@ struct ath_node {
struct list_head list;
};

+static inline
+struct ath_atx_tid *ath_an_2_tid(struct ath_node *an, u8 tidno)
+{
+ struct ieee80211_sta *sta = an->sta;
+ struct ieee80211_vif *vif = an->vif;
+ struct ieee80211_txq *swq = sta ? sta->txq[tidno] : vif->txq;
+ return (struct ath_atx_tid *) swq->drv_priv;
+}
+
struct ath_tx_control {
struct ath_txq *txq;
struct ath_node *an;
@@ -585,6 +594,7 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
u16 tids, int nframes,
enum ieee80211_frame_release_type reason,
bool more_data);
+void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *swq);

/********/
/* VIFs */
diff --git a/drivers/net/wireless/ath/ath9k/debug_sta.c b/drivers/net/wireless/ath/ath9k/debug_sta.c
index b66cfa9..0e7f6b5 100644
--- a/drivers/net/wireless/ath/ath9k/debug_sta.c
+++ b/drivers/net/wireless/ath/ath9k/debug_sta.c
@@ -25,6 +25,7 @@ static ssize_t read_file_node_aggr(struct file *file, char __user *user_buf,
{
struct ath_node *an = file->private_data;
struct ath_softc *sc = an->sc;
+ struct ieee80211_txq *swq;
struct ath_atx_tid *tid;
struct ath_txq *txq;
u32 len = 0, size = 4096;
@@ -52,8 +53,10 @@ static ssize_t read_file_node_aggr(struct file *file, char __user *user_buf,
"TID", "SEQ_START", "SEQ_NEXT", "BAW_SIZE",
"BAW_HEAD", "BAW_TAIL", "BAR_IDX", "SCHED", "PAUSED");

- for (tidno = 0, tid = &an->tid[tidno];
- tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
+ for (tidno = 0;
+ tidno < IEEE80211_NUM_TIDS; tidno++) {
+ swq = an->sta->txq[tidno];
+ tid = (struct ath_atx_tid *) swq->drv_priv;
txq = tid->txq;
ath_txq_lock(sc, txq);
if (tid->active) {
diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
index 2ee8624..211736c 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -873,6 +873,7 @@ static void ath9k_set_hw_capab(struct ath_softc *sc, struct ieee80211_hw *hw)
hw->max_rate_tries = 10;
hw->sta_data_size = sizeof(struct ath_node);
hw->vif_data_size = sizeof(struct ath_vif);
+ hw->txq_data_size = sizeof(struct ath_atx_tid);
hw->extra_tx_headroom = 4;

hw->wiphy->available_antennas_rx = BIT(ah->caps.max_rxchains) - 1;
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 8b63988..6ab56e5 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -2668,4 +2668,5 @@ struct ieee80211_ops ath9k_ops = {
.sw_scan_start = ath9k_sw_scan_start,
.sw_scan_complete = ath9k_sw_scan_complete,
.get_txpower = ath9k_get_txpower,
+ .wake_tx_queue = ath9k_wake_tx_queue,
};
diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index 8ddd604..cdc8684 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -65,6 +65,8 @@ static struct ath_buf *ath_tx_setup_buffer(struct ath_softc *sc,
struct ath_txq *txq,
struct ath_atx_tid *tid,
struct sk_buff *skb);
+static int ath_tx_prepare(struct ieee80211_hw *hw, struct sk_buff *skb,
+ struct ath_tx_control *txctl);

enum {
MCS_HT20,
@@ -118,6 +120,21 @@ static void ath_tx_queue_tid(struct ath_softc *sc, struct ath_txq *txq,
list_add_tail(&tid->list, list);
}

+void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *swq)
+{
+ struct ath_softc *sc = hw->priv;
+ struct ath_atx_tid *tid = (struct ath_atx_tid *) swq->drv_priv;
+ struct ath_txq *txq = tid->txq;
+
+ spin_lock_bh(&txq->axq_lock);
+
+ tid->swq_nonempty = true;
+ ath_tx_queue_tid(sc, txq, tid);
+ ath_txq_schedule(sc, txq);
+
+ spin_unlock_bh(&txq->axq_lock);
+}
+
static struct ath_frame_info *get_frame_info(struct sk_buff *skb)
{
struct ieee80211_tx_info *tx_info = IEEE80211_SKB_CB(skb);
@@ -170,12 +187,51 @@ static struct ath_atx_tid *
ath_get_skb_tid(struct ath_softc *sc, struct ath_node *an, struct sk_buff *skb)
{
u8 tidno = skb->priority & IEEE80211_QOS_CTL_TID_MASK;
- return ATH_AN_2_TID(an, tidno);
+ return ath_an_2_tid(an, tidno);
}

+static void ath_swq_pull(struct ath_atx_tid *tid)
+{
+ struct sk_buff *skb;
+ struct ath_tx_control txctl;
+ struct ath_frame_info *fi;
+ int r;
+
+ if (!skb_queue_empty(&tid->i_q))
+ return;
+
+ if (!tid->swq_nonempty)
+ return;
+
+ skb = ieee80211_tx_dequeue(tid->an->sc->hw, tid->swq);
+ if (!skb) {
+ tid->swq_nonempty = false;
+ } else {
+ /* sad to do all this with axq_lock held */
+ memset(&txctl, 0, sizeof txctl);
+ txctl.txq = tid->txq;
+ txctl.sta = tid->an->sta;
+ r = ath_tx_prepare(tid->an->sc->hw, skb, &txctl);
+ if (WARN_ON(r != 0)) {
+ /** should not happen ??? */
+ } else {
+ /* perhaps not needed here ??? */
+ fi = get_frame_info(skb);
+ fi->txq = skb_get_queue_mapping(skb);
+
+ __skb_queue_tail(&tid->i_q, skb);
+ ++tid->txq->pending_frames;
+ }
+ }
+ }
+
+
static bool ath_tid_has_buffered(struct ath_atx_tid *tid)
{
- return !skb_queue_empty(&tid->buf_q) || !skb_queue_empty(&tid->retry_q);
+ if (!skb_queue_empty(&tid->buf_q) || !skb_queue_empty(&tid->retry_q) || !skb_queue_empty(&tid->i_q))
+ return true;
+ ath_swq_pull(tid);
+ return !skb_queue_empty(&tid->i_q);
}

static struct sk_buff *ath_tid_dequeue(struct ath_atx_tid *tid)
@@ -185,6 +241,12 @@ static struct sk_buff *ath_tid_dequeue(struct ath_atx_tid *tid)
skb = __skb_dequeue(&tid->retry_q);
if (!skb)
skb = __skb_dequeue(&tid->buf_q);
+ if (!skb)
+ skb = __skb_dequeue(&tid->i_q);
+ if (!skb) {
+ ath_swq_pull(tid);
+ skb = __skb_dequeue(&tid->i_q);
+ }

return skb;
}
@@ -870,6 +932,10 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
*q = &tid->retry_q;
if (skb_queue_empty(*q))
*q = &tid->buf_q;
+ if (skb_queue_empty(*q))
+ *q = &tid->i_q;
+ if (skb_queue_empty(*q))
+ ath_swq_pull(tid);

skb = skb_peek(*q);
if (!skb)
@@ -1482,7 +1548,7 @@ int ath_tx_aggr_start(struct ath_softc *sc, struct ieee80211_sta *sta,
ath_dbg(common, XMIT, "%s called\n", __func__);

an = (struct ath_node *)sta->drv_priv;
- txtid = ATH_AN_2_TID(an, tid);
+ txtid = ath_an_2_tid(an, tid);
txq = txtid->txq;

ath_txq_lock(sc, txq);
@@ -1517,7 +1583,7 @@ void ath_tx_aggr_stop(struct ath_softc *sc, struct ieee80211_sta *sta, u16 tid)
{
struct ath_common *common = ath9k_hw_common(sc->sc_ah);
struct ath_node *an = (struct ath_node *)sta->drv_priv;
- struct ath_atx_tid *txtid = ATH_AN_2_TID(an, tid);
+ struct ath_atx_tid *txtid = ath_an_2_tid(an, tid);
struct ath_txq *txq = txtid->txq;

ath_dbg(common, XMIT, "%s called\n", __func__);
@@ -1533,6 +1599,7 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
struct ath_node *an)
{
struct ath_common *common = ath9k_hw_common(sc->sc_ah);
+ struct ieee80211_txq *swq;
struct ath_atx_tid *tid;
struct ath_txq *txq;
bool buffered;
@@ -1540,9 +1607,11 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,

ath_dbg(common, XMIT, "%s called\n", __func__);

- for (tidno = 0, tid = &an->tid[tidno];
- tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
+ for (tidno = 0;
+ tidno < IEEE80211_NUM_TIDS; tidno++) {

+ swq = an->sta->txq[tidno];
+ tid = (struct ath_atx_tid *) swq->drv_priv;
txq = tid->txq;

ath_txq_lock(sc, txq);
@@ -1565,15 +1634,18 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
void ath_tx_aggr_wakeup(struct ath_softc *sc, struct ath_node *an)
{
struct ath_common *common = ath9k_hw_common(sc->sc_ah);
+ struct ieee80211_txq *swq;
struct ath_atx_tid *tid;
struct ath_txq *txq;
int tidno;

ath_dbg(common, XMIT, "%s called\n", __func__);

- for (tidno = 0, tid = &an->tid[tidno];
- tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
+ for (tidno = 0;
+ tidno < IEEE80211_NUM_TIDS; tidno++) {

+ swq = an->sta->txq[tidno];
+ tid = (struct ath_atx_tid *) swq->drv_priv;
txq = tid->txq;

ath_txq_lock(sc, txq);
@@ -1599,7 +1671,7 @@ void ath_tx_aggr_resume(struct ath_softc *sc, struct ieee80211_sta *sta,
ath_dbg(common, XMIT, "%s called\n", __func__);

an = (struct ath_node *)sta->drv_priv;
- tid = ATH_AN_2_TID(an, tidno);
+ tid = ath_an_2_tid(an, tidno);
txq = tid->txq;

ath_txq_lock(sc, txq);
@@ -1637,7 +1709,7 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
if (!(tids & 1))
continue;

- tid = ATH_AN_2_TID(an, i);
+ tid = ath_an_2_tid(an, i);

ath_txq_lock(sc, tid->txq);
while (nframes > 0) {
@@ -2853,12 +2925,18 @@ int ath_tx_init(struct ath_softc *sc, int nbufs)

void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
{
+ struct ieee80211_txq *swq;
+ struct ieee80211_sta *sta = an->sta;
+ struct ieee80211_vif *vif = an->vif;
struct ath_atx_tid *tid;
int tidno, acno;

- for (tidno = 0, tid = &an->tid[tidno];
+ for (tidno = 0;
tidno < IEEE80211_NUM_TIDS;
- tidno++, tid++) {
+ tidno++) {
+ swq = sta ? sta->txq[tidno] : vif->txq;
+ tid = (struct ath_atx_tid *) swq->drv_priv;
+ tid->swq = swq;
tid->an = an;
tid->tidno = tidno;
tid->seq_start = tid->seq_next = 0;
@@ -2866,23 +2944,33 @@ void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
tid->baw_head = tid->baw_tail = 0;
tid->active = false;
tid->clear_ps_filter = true;
+ tid->swq_nonempty = false;
+ __skb_queue_head_init(&tid->i_q);
__skb_queue_head_init(&tid->buf_q);
__skb_queue_head_init(&tid->retry_q);
INIT_LIST_HEAD(&tid->list);
acno = TID_TO_WME_AC(tidno);
tid->txq = sc->tx.txq_map[acno];
+
+ if (!sta)
+ break; /* just one multicast ath_atx_tid */
}
}

void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an)
{
+ struct ieee80211_txq *swq;
+ struct ieee80211_sta *sta = an->sta;
+ struct ieee80211_vif *vif = an->vif;
struct ath_atx_tid *tid;
struct ath_txq *txq;
int tidno;

- for (tidno = 0, tid = &an->tid[tidno];
- tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
+ for (tidno = 0;
+ tidno < IEEE80211_NUM_TIDS; tidno++) {

+ swq = sta ? sta->txq[tidno] : vif->txq;
+ tid = (struct ath_atx_tid *) swq->drv_priv;
txq = tid->txq;

ath_txq_lock(sc, txq);
@@ -2894,6 +2982,9 @@ void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an)
tid->active = false;

ath_txq_unlock(sc, txq);
+
+ if (!sta)
+ break; /* just one multicast ath_atx_tid */
}
}

--
2.7.4

2016-06-10 15:52:52

by Adrian Chadd

[permalink] [raw]
Subject: Re: [Make-wifi-fast] [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit

[picking a random email to reply to]

I /think/ the TSF is only valid for the last frame of an aggregate.

'more' means "this RX frame is split across multiple RX descriptors"
"moreaggr" means "there's more subframes in this RX'ed A-MPDU"

If you get a TSF in the first frame, it's whatever was left over at
that point in the RX FIFO when it was last updated; it isn't an actual
timestamp timestamp. The logic for populating the descriptor contents
just takes stuff out of certain offsets in the RX FIFO, and sometimes
that contains stale/wrong data.

I'll go and double/triple check this (as well as where the TSF is
actually stamped - I think it's actually stamped at the leading edge
of a packet, but stamped at the /end/ of it.)


-adrian

2016-06-04 14:32:25

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [RFC/RFT 2/5] ath9k: use mac80211 intermediate software queues

Tim Shepard <[email protected]> writes:

>> Reworked to not require the global variable renaming in ath9k.
>>
>> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
>
> Huh. I wonder why you did that? Is it really better to call the
> ieee80211_txq a swq and call the ath9k hardware queue a txq? I
> thought doing the renaming made for more readable and much more
> maintainable code (where searching for text strings produced much
> cleaner results when trying to track down all references).

Well, the immediate reason was that at the time I just spent two weeks
figuring out how ath9k worked and what the different concepts were, and
suddenly they start to mean something different. The txq->hwq renaming
would probably make sense in itself, but suddenly the same variable
(txq) meant something different, which was quite confusing. So I found
it was easier to change your patch to not require the renaming.

A secondary reason was to keep the patch delta as small as possible. I'm
definitely not the right person to make that call, but I found the
global renaming somewhat intrusive.

As for the first point, I think it would be easier if you did not call
the mac80211 queues 'txq' but something else ('swq' like I did, perhaps;
I also considered 'imq' for intermediate queue). This will at least make
it clear at a glance that it is something different than 'txq' was
previously.

> I am grateful to learn that someone has read my patched version of
> ath9k at least enough to do this rework. Any comments on the actual
> work?

Well, it seems to work well enough (haven't noticed the pending_frames
issue, but on the other hand I haven't been looking very hard). However,
it does fell like somewhat of a temporary solution. Having another
intermediate queue in the driver (tid->i_q) and having a side-effect in
ath_tid_has_buffered() seems a bit iffy to me. Wouldn't the 'right' way
to do it be to have ath_tid_has_buffered() ask the mac layer if it has
any frames queued, and then have ath_tx_get_tid_subframe() basically
translate straight to a call to ieee80211_tx_dequeue() (taking into
account the retry queue)? Not sure if that covers all code paths, but
the current approach seems like it has one too many layers of queues?

Haven't given the above a lot of thought, but since you asked... :)

-Toke

2016-06-10 15:33:12

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [Make-wifi-fast] [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit

Toke Høiland-Jørgensen <[email protected]> writes:

>> No idea. If that's possible, then track last MPDU within PPDU, so you
>> can at least fallback to _something_ when you detect a new first
>> (A-)MPDU?
>>
>> Or maybe it's impossible (i.e. not worth worrying) and HW always
>> reports last MPDU as far as status bits are concerned (regardless of
>> it being _actual_ last MPDU, i.e. it just says "ok, I'm done with this
>> PPDU").
>
> I'll try a couple of different permutations of this and see what works.
> Thanks for the ideas!

OK, so having tried all the different approaches, this seems to be the
best one. Basically, this:

if (rs->rs_isaggr && rs->rs_firstaggr) {
an->airtime_rx_start = rs->rs_tstamp;
} else if (rs->rs_isaggr && !rs->rs_moreaggr && an->airtime_rx_start) {
airtime = rs->rs_tstamp - an->airtime_rx_start;
} else if (!rs->rs_isaggr) {
an->airtime_rx_start = 0;
/* keeping the previous length-based calculation here */
}


This seems to give me RX airtime figures that correspond fairly well to
the TX airtime for running the same test in the opposite direction (I'm
running a ten-second UDP flood test and looking at the total airtime
accounted after the end of the run).

-Toke

2016-06-10 08:53:40

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [Make-wifi-fast] [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit


>> I initially thought that using the timestamp put into the frame by the
>> hardware could be a way to get timing. But there's only a timestamp of
>> the first symbol in rs_tstamp, and getting a time to compare it with is
>> difficult; by the time the frame is handled in the rx tasklet, way too
>> much time has been spent on interrupt handling etc for the current time
>> to be worth comparing with.

As an aside, I'm no longer sure this explanation for why I got wrong
timings for this way of measuring RX time is correct. It may simply be
that I was counting the same time interval more than once because I
didn't realise that the handler would be called for each frame. See
below.

> I think rs_tstamp in rx-status is different for first MPDU and last
> MPDU in an A-MPDU meaning you should be able to compute the entire
> duration (if you track it, and this should be fairly straightforward
> as you can't really rx interleaved MPDUs from different
> A-MPDUs/stations). I'm not sure if the last MPDU defines the tstamp of
> first symbol or last one.

I actually went down this path again last night, but haven't had a
chance to test it yet.

So what I have now is this:

/* Only count airtime for last frame in an aggregate. FIXME: Should
* this be only the first frame? */
if (!rs->rs_isaggr || !rs->rs_moreaggr)
airtime = (tsf & 0xffffffff) - rs->rs_tstamp;

Which was under the assumption that rs_tstamp will be the time of the
first symbol *of the whole ampdu* for all the frames in that aggregate.
However, if rs_tstamp differs between frames, tracking it at the first
frame might be the right things to do?

Is the entire A-MPDU received before the RX handler is called for the
first frame?

-Toke

2016-06-08 01:41:21

by Adrian Chadd

[permalink] [raw]
Subject: Re: [RFC/RFT 5/5] ath9k: Count RX airtime in airtime deficit

On 7 June 2016 at 04:12, Toke Høiland-Jørgensen <[email protected]> wrote:
> Toke Høiland-Jørgensen <[email protected]> writes:
>
>>> [snip]
>>>
>>> I also found one of my notes in my version of this - how can we
>>> estimate the duration of an A-MPDU, when we only get hardware
>>> de-encapsulated frames?
>>
>> Well in my case I'm sidestepping this by getting the TX duration from
>> a register in the hardware. There seems to be registers containing the
>> duration spent on each step in the retry chain; I simply sum these.
>
> Ah, but you're still talking RX? Hmm, I'm using ath_pkt_duration() to
> compute the RX time, which does take into account MIMO (I think) but
> expects the size to include padding. Which is probably not included in
> the rs_datalen field of struct ath_rx_status that I'm using.
>
> So yeah, how to account for that?
>
> I initially thought that using the timestamp put into the frame by the
> hardware could be a way to get timing. But there's only a timestamp of
> the first symbol in rs_tstamp, and getting a time to compare it with is
> difficult; by the time the frame is handled in the rx tasklet, way too
> much time has been spent on interrupt handling etc for the current time
> to be worth comparing with.

Right. In the case of RX'ing an A-MPDU, we only get told about the
A-MPDU boundaries (isaggr/lastaggr or something in the RX descriptor)
but nothing telling us how long the original RX'ed PPDU is.

So if we get say 16 frames and we are missing the middle one, we can
reconstruct things okay. But if we miss the first 8 frames, we don't
know when it started - we only get the RX aggr boundary flags set on
the 9th and the 16th and we don't even know about the missed frames.

I think that's going to be a shortcoming right now. I couldn't think
of a clever way to figure it out except to detect holes in the BAW and
determine the client is missing frames and take actions there.


-adrian