2016-03-16 10:15:50

by Michal Kazior

Subject: [RFCv2 0/3] mac80211: implement fq codel

Hi,

Most notable changes:
* fixes (duh); fairness should work better now
* EWMA codel target based on estimated service
  time
* new tx scheduling helper with in-flight
  duration limiting (same idea Emmanuel had
  for iwlwifi)
* added a few debugfs hooks
* ath10k proof-of-concept that uses the new tx
  scheduling (results will be posted in a
  separate email)

The patch grew pretty big and I plan on
splitting it up before the next submission.
Any suggestions?

The tx scheduling probably needs more work and
testing. I haven't yet evaluated how CPU
intensive it is, nor how it influences things
like peak throughput (lab conditions and the
like).

I've uploaded a branch for convenience:

https://github.com/kazikcz/linux/tree/fqmac-rfc-v2

This is based on Kalle's ath tree.


Michal Kazior (3):
mac80211: implement fq_codel for software queuing
ath10k: report per-station tx/rate rates to mac80211
ath10k: use ieee80211_tx_schedule()

drivers/net/wireless/ath/ath10k/core.c | 2 -
drivers/net/wireless/ath/ath10k/core.h | 8 +-
drivers/net/wireless/ath/ath10k/debug.c | 61 ++-
drivers/net/wireless/ath/ath10k/mac.c | 126 +++---
drivers/net/wireless/ath/ath10k/wmi.h | 2 +-
include/net/mac80211.h | 96 ++++-
net/mac80211/agg-tx.c | 8 +-
net/mac80211/cfg.c | 2 +-
net/mac80211/codel.h | 264 +++++++++++++
net/mac80211/codel_i.h | 89 +++++
net/mac80211/debugfs.c | 267 +++++++++++++
net/mac80211/ieee80211_i.h | 45 ++-
net/mac80211/iface.c | 25 +-
net/mac80211/main.c | 9 +-
net/mac80211/rx.c | 2 +-
net/mac80211/sta_info.c | 10 +-
net/mac80211/sta_info.h | 27 ++
net/mac80211/status.c | 64 ++++
net/mac80211/tx.c | 658 ++++++++++++++++++++++++++++++--
net/mac80211/util.c | 21 +-
20 files changed, 1629 insertions(+), 157 deletions(-)
create mode 100644 net/mac80211/codel.h
create mode 100644 net/mac80211/codel_i.h

--
2.1.4



2016-03-16 10:15:53

by Michal Kazior

Subject: [RFCv2 1/3] mac80211: implement fq_codel for software queuing

Since 802.11n, aggregation has become important
for getting the best out of txops. However,
aggregation inherently requires buffering and
queuing. Once variable medium conditions to
different associated stations are considered,
it becomes apparent that bufferbloat can't
simply be fought with qdiscs for wireless
drivers.

This is based on codel5 and sch_fq_codel.c. It
may not be the Right Thing yet but it should at
least provide a framework for more improvements.
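
For reference, the fairness mechanism at the heart of fq_codel (deficit round robin) can be modeled in a few lines. This is an illustrative standalone sketch, not the patch code, which chains flows on list_heads rather than an array:

```c
#include <assert.h>

/* Standalone sketch of deficit round robin as used by fq_codel-style
 * fairness: each flow earns `quantum` bytes of deficit per scheduling
 * round and may transmit while its deficit covers the head packet.
 * Simplified to an array of flows; empty flows are normally removed
 * from the rotation instead of being refilled.
 */
#define QUANTUM 300

struct flow {
    int deficit;
    int pkt_len; /* head packet length in bytes; 0 means empty */
};

/* Return the index of the flow allowed to transmit next, or -1. */
static int drr_pick(struct flow *flows, int n, int *rr)
{
    int scanned = 0;

    while (scanned < 2 * n) {
        struct flow *f = &flows[*rr];

        if (f->pkt_len && f->deficit >= f->pkt_len) {
            f->deficit -= f->pkt_len;
            return *rr;
        }

        /* empty or out of budget: refill and move to the next flow */
        f->deficit += QUANTUM;
        *rr = (*rr + 1) % n;
        scanned++;
    }
    return -1;
}
```

Two backlogged flows then get served alternately regardless of how fast either one is drained, which is the property the per-station/per-tid txq scheduling below is after.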

Signed-off-by: Michal Kazior <[email protected]>
---
include/net/mac80211.h | 96 ++++++-
net/mac80211/agg-tx.c | 8 +-
net/mac80211/cfg.c | 2 +-
net/mac80211/codel.h | 264 ++++++++++++++++++
net/mac80211/codel_i.h | 89 ++++++
net/mac80211/debugfs.c | 267 ++++++++++++++++++
net/mac80211/ieee80211_i.h | 45 +++-
net/mac80211/iface.c | 25 +-
net/mac80211/main.c | 9 +-
net/mac80211/rx.c | 2 +-
net/mac80211/sta_info.c | 10 +-
net/mac80211/sta_info.h | 27 ++
net/mac80211/status.c | 64 +++++
net/mac80211/tx.c | 658 ++++++++++++++++++++++++++++++++++++++++++---
net/mac80211/util.c | 21 +-
15 files changed, 1503 insertions(+), 84 deletions(-)
create mode 100644 net/mac80211/codel.h
create mode 100644 net/mac80211/codel_i.h

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index a53333cb1528..947d827f254b 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -565,6 +565,16 @@ struct ieee80211_bss_conf {
struct ieee80211_p2p_noa_attr p2p_noa_attr;
};

+/**
+ * struct codel_params - contains codel parameters
+ * @interval: initial drop spacing (also the estimator's time window)
+ * @target: maximum persistent sojourn time
+ */
+struct codel_params {
+ u64 interval;
+ u64 target;
+};
+
/**
* enum mac80211_tx_info_flags - flags to describe transmission information/status
*
@@ -853,6 +863,8 @@ ieee80211_rate_get_vht_nss(const struct ieee80211_tx_rate *rate)
* @band: the band to transmit on (use for checking for races)
* @hw_queue: HW queue to put the frame on, skb_get_queue_mapping() gives the AC
* @ack_frame_id: internal frame ID for TX status, used internally
+ * @expected_duration: number of microseconds the stack expects this frame to
+ * take to tx. Used for fair queuing.
* @control: union for control data
* @status: union for status data
* @driver_data: array of driver_data pointers
@@ -865,11 +877,10 @@ ieee80211_rate_get_vht_nss(const struct ieee80211_tx_rate *rate)
struct ieee80211_tx_info {
/* common information */
u32 flags;
- u8 band;
-
- u8 hw_queue;
-
- u16 ack_frame_id;
+ u32 band:2,
+ hw_queue:5,
+ ack_frame_id:15,
+ expected_duration:10;

union {
struct {
@@ -888,8 +899,18 @@ struct ieee80211_tx_info {
/* only needed before rate control */
unsigned long jiffies;
};
- /* NB: vif can be NULL for injected frames */
- struct ieee80211_vif *vif;
+ union {
+ /* NB: vif can be NULL for injected frames */
+ struct ieee80211_vif *vif;
+
+ /* When packets are enqueued on txq it's easy
+ * to re-construct the vif pointer. There's no
+ * more space in tx_info so it can be used to
+ * store the necessary enqueue time for packet
+ * sojourn time computation.
+ */
+ u64 enqueue_time;
+ };
struct ieee80211_key_conf *hw_key;
u32 flags;
/* 4 bytes free */
@@ -2114,8 +2135,8 @@ enum ieee80211_hw_flags {
* @cipher_schemes: a pointer to an array of cipher scheme definitions
* supported by HW.
*
- * @txq_ac_max_pending: maximum number of frames per AC pending in all txq
- * entries for a vif.
+ * @txq_cparams: codel parameters to control tx queue dropping behavior
+ * @txq_limit: maximum number of frames queued
*/
struct ieee80211_hw {
struct ieee80211_conf conf;
@@ -2145,7 +2166,8 @@ struct ieee80211_hw {
u8 uapsd_max_sp_len;
u8 n_cipher_schemes;
const struct ieee80211_cipher_scheme *cipher_schemes;
- int txq_ac_max_pending;
+ struct codel_params txq_cparams;
+ u32 txq_limit;
};

static inline bool _ieee80211_hw_check(struct ieee80211_hw *hw,
@@ -5633,6 +5655,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
* txq state can change half-way of this function and the caller may end up
* with "new" frame_cnt and "old" byte_cnt or vice-versa.
*
+ * Moreover, the returned values are best-case, i.e. they assume the queueing
+ * algorithm will not drop frames due to excess latency.
+ *
* @txq: pointer obtained from station or virtual interface
* @frame_cnt: pointer to store frame count
* @byte_cnt: pointer to store byte count
@@ -5640,4 +5665,55 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
void ieee80211_txq_get_depth(struct ieee80211_txq *txq,
unsigned long *frame_cnt,
unsigned long *byte_cnt);
+
+/**
+ * ieee80211_recalc_fq_period - recalculate fair-queuing period
+ *
+ * This is used to alter the dropping rate to react to possibly changing
+ * (active) station-tid service period and air conditions.
+ *
+ * Drivers which implement wake_tx_queue() but don't use ieee80211_tx_schedule()
+ * are encouraged to call this function periodically.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ */
+void ieee80211_recalc_fq_period(struct ieee80211_hw *hw);
+
+/**
+ * ieee80211_tx_schedule - schedule next transmission burst
+ *
+ * This function can (and preferably should) be called by drivers that use the
+ * wake_tx_queue op. It uses an fq_codel-like algorithm to maintain fairness.
+ *
+ * This function may call back into the driver (the get_expected_throughput op)
+ * so be careful with locking.
+ *
+ * Drivers should take care of serializing calls to this function. Otherwise
+ * fairness can't be guaranteed.
+ *
+ * This function returns the following values:
+ * -EBUSY Software queues are not empty yet. The function should
+ * not be called until after driver's next tx completion.
+ * -ENOENT Software queues are empty.
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @wake: callback to driver to handle burst for given txq within given (byte)
+ * budget. The driver is expected to either call ieee80211_tx_dequeue() or
+ * use its internal queues (if any). The budget should be respected only
+ * for frames coming from ieee80211_tx_dequeue(). On termination it is
+ * expected to return number of frames put onto hw queue that were taken
+ * via ieee80211_tx_dequeue(). Frames from internal retry queues shall not
+ * be included in the returned count. If hw queues become/are busy/full
+ * the driver shall return a negative value which will prompt
+ * ieee80211_tx_schedule() to terminate. If hw queues become full after at
+ * least 1 frame dequeued via ieee80211_tx_dequeue() was sent the driver
+ * is free to report either number of sent frames up until that point or a
+ * negative value. The driver may return 0 if it wants to skip the txq
+ * (e.g. target station is in powersave).
+ */
+int ieee80211_tx_schedule(struct ieee80211_hw *hw,
+ int (*wake)(struct ieee80211_hw *hw,
+ struct ieee80211_txq *txq,
+ int budget));
+
#endif /* MAC80211_H */
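
To make the wake-callback contract above concrete, here is a self-contained toy model. Names and types are illustrative, not mac80211's; a real driver operates on struct ieee80211_txq and its hardware queues:

```c
#include <assert.h>

/* Toy model of the wake callback contract documented above: the
 * scheduler hands each txq a byte budget; the callback returns the
 * number of frames it dequeued, 0 to skip the txq, or a negative
 * value when hardware queues fill up (terminating the pass).
 */
struct toy_txq {
    int pending_frames;
    int frame_len; /* bytes per frame, for simplicity */
};

static int toy_wake(struct toy_txq *txq, int budget)
{
    int sent = 0;

    while (txq->pending_frames && budget >= txq->frame_len) {
        budget -= txq->frame_len;
        txq->pending_frames--;
        sent++;
    }

    /* a real driver would return a negative value here if its hw
     * queues became full before anything could be dequeued */
    return sent;
}
```

A driver that overshoots the budget for dequeued frames would defeat the fairness accounting, which is why the contract asks that the budget be respected for frames obtained via ieee80211_tx_dequeue().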
diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
index 4932e9f243a2..b9d0cee2a786 100644
--- a/net/mac80211/agg-tx.c
+++ b/net/mac80211/agg-tx.c
@@ -194,17 +194,21 @@ static void
ieee80211_agg_stop_txq(struct sta_info *sta, int tid)
{
struct ieee80211_txq *txq = sta->sta.txq[tid];
+ struct ieee80211_sub_if_data *sdata;
+ struct ieee80211_fq *fq;
struct txq_info *txqi;

if (!txq)
return;

txqi = to_txq_info(txq);
+ sdata = vif_to_sdata(txq->vif);
+ fq = &sdata->local->fq;

/* Lock here to protect against further seqno updates on dequeue */
- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);
set_bit(IEEE80211_TXQ_STOP, &txqi->flags);
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);
}

static void
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index b37adb60c9cb..238d7bbd275e 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -3029,7 +3029,7 @@ int ieee80211_attach_ack_skb(struct ieee80211_local *local, struct sk_buff *skb,

spin_lock_irqsave(&local->ack_status_lock, spin_flags);
id = idr_alloc(&local->ack_status_frames, ack_skb,
- 1, 0x10000, GFP_ATOMIC);
+ 1, 0x8000, GFP_ATOMIC);
spin_unlock_irqrestore(&local->ack_status_lock, spin_flags);

if (id < 0) {
diff --git a/net/mac80211/codel.h b/net/mac80211/codel.h
new file mode 100644
index 000000000000..e6470dbe5b0b
--- /dev/null
+++ b/net/mac80211/codel.h
@@ -0,0 +1,264 @@
+#ifndef __NET_MAC80211_CODEL_H
+#define __NET_MAC80211_CODEL_H
+
+/*
+ * Codel - The Controlled-Delay Active Queue Management algorithm
+ *
+ * Copyright (C) 2011-2012 Kathleen Nichols <[email protected]>
+ * Copyright (C) 2011-2012 Van Jacobson <[email protected]>
+ * Copyright (C) 2016 Michael D. Taht <[email protected]>
+ * Copyright (C) 2012 Eric Dumazet <[email protected]>
+ * Copyright (C) 2015 Jonathan Morton <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions, and the following disclaimer,
+ * without modification.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. The names of the authors may not be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License ("GPL") version 2, in which case the provisions of the
+ * GPL apply INSTEAD OF those given above.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ *
+ */
+
+#include <linux/version.h>
+#include <linux/types.h>
+#include <linux/ktime.h>
+#include <linux/skbuff.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <linux/reciprocal_div.h>
+
+#include "codel_i.h"
+
+/* Controlling Queue Delay (CoDel) algorithm
+ * =========================================
+ * Source : Kathleen Nichols and Van Jacobson
+ * http://queue.acm.org/detail.cfm?id=2209336
+ *
+ * Implemented on linux by Dave Taht and Eric Dumazet
+ */
+
+/* CoDel5 uses a real clock, unlike codel */
+
+static inline u64 codel_get_time(void)
+{
+ return ktime_get_ns();
+}
+
+static inline u32 codel_time_to_us(u64 val)
+{
+ do_div(val, NSEC_PER_USEC);
+ return (u32)val;
+}
+
+/* sizeof_in_bits(rec_inv_sqrt) */
+#define REC_INV_SQRT_BITS (8 * sizeof(u16))
+/* needed shift to get a Q0.32 number from rec_inv_sqrt */
+#define REC_INV_SQRT_SHIFT (32 - REC_INV_SQRT_BITS)
+
+/* Newton's approximation method needs more iterations at small
+ * inputs to converge.
+ */
+
+static void codel_vars_init(struct codel_vars *vars)
+{
+ memset(vars, 0, sizeof(*vars));
+}
+
+/*
+ * http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Iterative_methods_for_reciprocal_square_roots
+ * new_invsqrt = (invsqrt / 2) * (3 - count * invsqrt^2)
+ *
+ * Here, invsqrt is a fixed point number (< 1.0), 32bit mantissa, aka Q0.32
+ */
+static inline void codel_Newton_step(struct codel_vars *vars)
+{
+ u32 invsqrt = ((u32)vars->rec_inv_sqrt) << REC_INV_SQRT_SHIFT;
+ u32 invsqrt2 = ((u64)invsqrt * invsqrt) >> 32;
+ u64 val = (3LL << 32) - ((u64)vars->count * invsqrt2);
+
+ val >>= 2; /* avoid overflow in following multiply */
+ val = (val * invsqrt) >> (32 - 2 + 1);
+
+ vars->rec_inv_sqrt = val >> REC_INV_SQRT_SHIFT;
+}
+
+/*
+ * CoDel control_law is t + interval/sqrt(count)
+ * We maintain in rec_inv_sqrt the reciprocal value of sqrt(count) to avoid
+ * both sqrt() and divide operation.
+ */
+static u64 codel_control_law(u64 t,
+ u64 interval,
+ u32 rec_inv_sqrt)
+{
+ return t + reciprocal_scale(interval, rec_inv_sqrt <<
+ REC_INV_SQRT_SHIFT);
+}
+
+/* Forward declarations of hooks that the including file must provide */
+
+static inline u64
+custom_codel_get_enqueue_time(struct sk_buff *skb);
+
+static inline struct sk_buff *
+custom_dequeue(struct codel_vars *vars, void *ptr);
+
+static inline void
+custom_drop(struct sk_buff *skb, void *ptr);
+
+static bool codel_should_drop(struct sk_buff *skb,
+ __u32 *backlog,
+ __u32 backlog_thr,
+ struct codel_vars *vars,
+ const struct codel_params *p,
+ u64 now)
+{
+ if (!skb) {
+ vars->first_above_time = 0;
+ return false;
+ }
+
+ if (now - custom_codel_get_enqueue_time(skb) < p->target ||
+ *backlog <= backlog_thr) {
+ /* went below - stay below for at least interval */
+ vars->first_above_time = 0;
+ return false;
+ }
+
+ if (vars->first_above_time == 0) {
+ /* just went above from below; mark the time */
+ vars->first_above_time = now + p->interval;
+
+ } else if (now > vars->first_above_time) {
+ return true;
+ }
+
+ return false;
+}
+
+static struct sk_buff *codel_dequeue(void *ptr,
+ __u32 *backlog,
+ __u32 backlog_thr,
+ struct codel_vars *vars,
+ struct codel_params *p,
+ u64 now,
+ bool overloaded)
+{
+ struct sk_buff *skb = custom_dequeue(vars, ptr);
+ bool drop;
+
+ if (!skb) {
+ vars->dropping = false;
+ return skb;
+ }
+ drop = codel_should_drop(skb, backlog, backlog_thr, vars, p, now);
+ if (vars->dropping) {
+ if (!drop) {
+ /* sojourn time below target - leave dropping state */
+ vars->dropping = false;
+ } else if (now >= vars->drop_next) {
+ /* It's time for the next drop. Drop the current
+ * packet and dequeue the next. The dequeue might
+ * take us out of dropping state.
+ * If not, schedule the next drop.
+ * A large backlog might result in drop rates so high
+ * that the next drop should happen now,
+ * hence the while loop.
+ */
+
+ /* saturating increment */
+ vars->count++;
+ if (!vars->count)
+ vars->count--;
+
+ codel_Newton_step(vars);
+ vars->drop_next = codel_control_law(vars->drop_next,
+ p->interval,
+ vars->rec_inv_sqrt);
+ do {
+ if (INET_ECN_set_ce(skb) && !overloaded) {
+ vars->ecn_mark++;
+ /* and schedule the next drop */
+ vars->drop_next = codel_control_law(
+ vars->drop_next, p->interval,
+ vars->rec_inv_sqrt);
+ goto end;
+ }
+ custom_drop(skb, ptr);
+ vars->drop_count++;
+ skb = custom_dequeue(vars, ptr);
+ if (skb && !codel_should_drop(skb, backlog,
+ backlog_thr,
+ vars, p, now)) {
+ /* leave dropping state */
+ vars->dropping = false;
+ } else {
+ /* schedule the next drop */
+ vars->drop_next = codel_control_law(
+ vars->drop_next, p->interval,
+ vars->rec_inv_sqrt);
+ }
+ } while (skb && vars->dropping && now >=
+ vars->drop_next);
+
+ /* Mark the packet regardless */
+ if (skb && INET_ECN_set_ce(skb))
+ vars->ecn_mark++;
+ }
+ } else if (drop) {
+ if (INET_ECN_set_ce(skb) && !overloaded) {
+ vars->ecn_mark++;
+ } else {
+ custom_drop(skb, ptr);
+ vars->drop_count++;
+
+ skb = custom_dequeue(vars, ptr);
+ drop = codel_should_drop(skb, backlog, backlog_thr,
+ vars, p, now);
+ if (skb && INET_ECN_set_ce(skb))
+ vars->ecn_mark++;
+ }
+ vars->dropping = true;
+ /* if min went above target close to when we last went below
+ * assume that the drop rate that controlled the queue on the
+ * last cycle is a good starting point to control it now.
+ */
+ if (vars->count > 2 &&
+ now - vars->drop_next < 8 * p->interval) {
+ vars->count -= 2;
+ codel_Newton_step(vars);
+ } else {
+ vars->count = 1;
+ vars->rec_inv_sqrt = ~0U >> REC_INV_SQRT_SHIFT;
+ }
+ codel_Newton_step(vars);
+ vars->drop_next = codel_control_law(now, p->interval,
+ vars->rec_inv_sqrt);
+ }
+end:
+ return skb;
+}
+#endif
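
A quick standalone check of the control-law arithmetic in codel.h above: drop_next = t + interval/sqrt(count), with 1/sqrt(count) kept as a Q0.32 fixed-point reciprocal. reciprocal_scale() is re-implemented here for illustration; in the kernel it comes from linux/kernel.h:

```c
#include <assert.h>
#include <stdint.h>

/* Scale val by a Q0.32 fixed-point ratio ep_ro (0 <= ep_ro/2^32 < 1),
 * mirroring the kernel helper of the same name.
 */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
    return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* drop_next = t + interval * (1/sqrt(count)), no division needed */
static uint64_t control_law(uint64_t t, uint32_t interval,
                            uint32_t rec_inv_sqrt_q032)
{
    return t + reciprocal_scale(interval, rec_inv_sqrt_q032);
}
```

As count grows, rec_inv_sqrt shrinks and drops are scheduled closer together, which is how codel ramps up its drop rate under persistent queue delay.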
diff --git a/net/mac80211/codel_i.h b/net/mac80211/codel_i.h
new file mode 100644
index 000000000000..a7d23e45dee9
--- /dev/null
+++ b/net/mac80211/codel_i.h
@@ -0,0 +1,89 @@
+#ifndef __NET_MAC80211_CODEL_I_H
+#define __NET_MAC80211_CODEL_I_H
+
+/*
+ * Codel - The Controlled-Delay Active Queue Management algorithm
+ *
+ * Copyright (C) 2011-2012 Kathleen Nichols <[email protected]>
+ * Copyright (C) 2011-2012 Van Jacobson <[email protected]>
+ * Copyright (C) 2016 Michael D. Taht <[email protected]>
+ * Copyright (C) 2012 Eric Dumazet <[email protected]>
+ * Copyright (C) 2015 Jonathan Morton <[email protected]>
+ * Copyright (C) 2016 Michal Kazior <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions, and the following disclaimer,
+ * without modification.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. The names of the authors may not be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License ("GPL") version 2, in which case the provisions of the
+ * GPL apply INSTEAD OF those given above.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ *
+ */
+
+#include <linux/version.h>
+#include <linux/types.h>
+#include <linux/ktime.h>
+#include <linux/skbuff.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <linux/reciprocal_div.h>
+
+/* Controlling Queue Delay (CoDel) algorithm
+ * =========================================
+ * Source : Kathleen Nichols and Van Jacobson
+ * http://queue.acm.org/detail.cfm?id=2209336
+ *
+ * Implemented on linux by Dave Taht and Eric Dumazet
+ */
+
+/* CoDel5 uses a real clock, unlike codel */
+
+#define MS2TIME(a) ((a) * (u64) NSEC_PER_MSEC)
+#define US2TIME(a) ((a) * (u64) NSEC_PER_USEC)
+
+/**
+ * struct codel_vars - contains codel variables
+ * @count: how many drops we've done since the last time we
+ * entered dropping state
+ * @dropping: set to > 0 if in dropping state
+ * @rec_inv_sqrt: reciprocal value of sqrt(count) >> 1
+ * @first_above_time: when we went (or will go) continuously above target
+ * for interval
+ * @drop_next: time to drop next packet, or when we dropped last
+ * @drop_count: temp count of dropped packets in dequeue()
+ * @ecn_mark: number of packets we ECN marked instead of dropping
+ */
+
+struct codel_vars {
+ u32 count;
+ u16 dropping;
+ u16 rec_inv_sqrt;
+ u64 first_above_time;
+ u64 drop_next;
+ u16 drop_count;
+ u16 ecn_mark;
+};
+#endif
diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 4ab5c522ceee..9b0b8c3d23cd 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -31,6 +31,30 @@ int mac80211_format_buffer(char __user *userbuf, size_t count,
return simple_read_from_buffer(userbuf, count, ppos, buf, res);
}

+static int mac80211_parse_buffer(const char __user *userbuf,
+ size_t count,
+ loff_t *ppos,
+ char *fmt, ...)
+{
+ va_list args;
+ char buf[DEBUGFS_FORMAT_BUFFER_SIZE] = {};
+ int res;
+
+ if (count > sizeof(buf))
+ return -EINVAL;
+
+ if (copy_from_user(buf, userbuf, count))
+ return -EFAULT;
+
+ buf[sizeof(buf) - 1] = '\0';
+
+ va_start(args, fmt);
+ res = vsscanf(buf, fmt, args);
+ va_end(args);
+
+ return res > 0 ? count : -EINVAL;
+}
+
#define DEBUGFS_READONLY_FILE_FN(name, fmt, value...) \
static ssize_t name## _read(struct file *file, char __user *userbuf, \
size_t count, loff_t *ppos) \
@@ -70,6 +94,62 @@ DEBUGFS_READONLY_FILE(wep_iv, "%#08x",
DEBUGFS_READONLY_FILE(rate_ctrl_alg, "%s",
local->rate_ctrl ? local->rate_ctrl->ops->name : "hw/driver");

+DEBUGFS_READONLY_FILE(fq_drop_overlimit, "%d",
+ local->fq.drop_overlimit);
+DEBUGFS_READONLY_FILE(fq_drop_codel, "%d",
+ local->fq.drop_codel);
+DEBUGFS_READONLY_FILE(fq_backlog, "%d",
+ local->fq.backlog);
+DEBUGFS_READONLY_FILE(fq_in_flight_usec, "%d",
+ atomic_read(&local->fq.in_flight_usec));
+DEBUGFS_READONLY_FILE(fq_txq_limit, "%d",
+ local->hw.txq_limit);
+DEBUGFS_READONLY_FILE(fq_txq_interval, "%llu",
+ local->hw.txq_cparams.interval);
+DEBUGFS_READONLY_FILE(fq_txq_target, "%llu",
+ local->hw.txq_cparams.target);
+DEBUGFS_READONLY_FILE(fq_ave_period, "%d",
+ (int)ewma_fq_period_read(&local->fq.ave_period));
+
+#define DEBUGFS_RW_FILE_FN(name, expr) \
+static ssize_t name## _write(struct file *file, \
+ const char __user *userbuf, \
+ size_t count, \
+ loff_t *ppos) \
+{ \
+ struct ieee80211_local *local = file->private_data; \
+ return expr; \
+}
+
+#define DEBUGFS_RW_FILE(name, expr, fmt, value...) \
+ DEBUGFS_READONLY_FILE_FN(name, fmt, value) \
+ DEBUGFS_RW_FILE_FN(name, expr) \
+ DEBUGFS_RW_FILE_OPS(name)
+
+#define DEBUGFS_RW_FILE_OPS(name) \
+static const struct file_operations name## _ops = { \
+ .read = name## _read, \
+ .write = name## _write, \
+ .open = simple_open, \
+ .llseek = generic_file_llseek, \
+};
+
+#define DEBUGFS_RW_EXPR_FQ(name) \
+({ \
+ int res; \
+ res = mac80211_parse_buffer(userbuf, count, ppos, "%d", &name); \
+ ieee80211_recalc_fq_period(&local->hw); \
+ res; \
+})
+
+DEBUGFS_RW_FILE(fq_min_txops_target, DEBUGFS_RW_EXPR_FQ(local->fq.min_txops_target), "%d", local->fq.min_txops_target);
+DEBUGFS_RW_FILE(fq_max_txops_per_txq, DEBUGFS_RW_EXPR_FQ(local->fq.max_txops_per_txq), "%d", local->fq.max_txops_per_txq);
+DEBUGFS_RW_FILE(fq_min_txops_per_hw, DEBUGFS_RW_EXPR_FQ(local->fq.min_txops_per_hw), "%d", local->fq.min_txops_per_hw);
+DEBUGFS_RW_FILE(fq_max_txops_per_hw, DEBUGFS_RW_EXPR_FQ(local->fq.max_txops_per_hw), "%d", local->fq.max_txops_per_hw);
+DEBUGFS_RW_FILE(fq_txop_mixed_usec, DEBUGFS_RW_EXPR_FQ(local->fq.txop_mixed_usec), "%d", local->fq.txop_mixed_usec);
+DEBUGFS_RW_FILE(fq_txop_green_usec, DEBUGFS_RW_EXPR_FQ(local->fq.txop_green_usec), "%d", local->fq.txop_green_usec);
+
+
#ifdef CONFIG_PM
static ssize_t reset_write(struct file *file, const char __user *user_buf,
size_t count, loff_t *ppos)
@@ -177,8 +257,178 @@ static ssize_t queues_read(struct file *file, char __user *user_buf,
return simple_read_from_buffer(user_buf, count, ppos, buf, res);
}

+static ssize_t fq_read(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ struct ieee80211_local *local = file->private_data;
+ struct ieee80211_sub_if_data *sdata;
+ struct sta_info *sta;
+ struct txq_flow *flow;
+ struct txq_info *txqi;
+ void *buf;
+ int new_flows;
+ int old_flows;
+ int len;
+ int i;
+ int rv;
+ int res = 0;
+ static const u8 zeroaddr[ETH_ALEN];
+
+ len = 32 * 1024;
+ buf = kzalloc(len, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ spin_lock_bh(&local->fq.lock);
+ rcu_read_lock();
+
+ list_for_each_entry(txqi, &local->fq.new_flows, flowchain) {
+ res += scnprintf(buf + res, len - res,
+ "sched new txqi vif %s sta %pM tid %d deficit %d\n",
+ container_of(txqi->txq.vif, struct ieee80211_sub_if_data, vif)->name,
+ txqi->txq.sta ? txqi->txq.sta->addr : zeroaddr,
+ txqi->txq.tid,
+ txqi->deficit);
+ }
+
+ list_for_each_entry(txqi, &local->fq.old_flows, flowchain) {
+ res += scnprintf(buf + res, len - res,
+ "sched old txqi vif %s sta %pM tid %d deficit %d\n",
+ container_of(txqi->txq.vif, struct ieee80211_sub_if_data, vif)->name,
+ txqi->txq.sta ? txqi->txq.sta->addr : zeroaddr,
+ txqi->txq.tid,
+ txqi->deficit);
+ }
+
+ list_for_each_entry_rcu(sta, &local->sta_list, list) {
+ for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
+ if (!sta->sta.txq[i])
+ continue;
+
+ txqi = container_of(sta->sta.txq[i], struct txq_info, txq);
+ if (!txqi->backlog_bytes)
+ continue;
+
+ new_flows = 0;
+ old_flows = 0;
+
+ list_for_each_entry(flow, &txqi->new_flows, flowchain)
+ new_flows++;
+ list_for_each_entry(flow, &txqi->old_flows, flowchain)
+ old_flows++;
+
+ res += scnprintf(buf + res, len - res,
+ "sta %pM tid %d backlog (%db %dp) flows (%d new %d old) burst %d bpu %d in-flight %d\n",
+ sta->sta.addr,
+ i,
+ txqi->backlog_bytes,
+ txqi->backlog_packets,
+ new_flows,
+ old_flows,
+ txqi->bytes_per_burst,
+ txqi->bytes_per_usec,
+ atomic_read(&txqi->in_flight_usec)
+ );
+
+ flow = &txqi->flow;
+ res += scnprintf(buf + res, len - res,
+ "sta %pM def flow %p backlog (%db %dp)\n",
+ sta->sta.addr,
+ flow,
+ flow->backlog,
+ flow->queue.qlen
+ );
+
+ list_for_each_entry(flow, &txqi->new_flows, flowchain)
+ res += scnprintf(buf + res, len - res,
+ "sta %pM tid %d new flow %p backlog (%db %dp)\n",
+ sta->sta.addr,
+ i,
+ flow,
+ flow->backlog,
+ flow->queue.qlen
+ );
+
+ list_for_each_entry(flow, &txqi->old_flows, flowchain)
+ res += scnprintf(buf + res, len - res,
+ "sta %pM tid %d old flow %p backlog (%db %dp)\n",
+ sta->sta.addr,
+ i,
+ flow,
+ flow->backlog,
+ flow->queue.qlen
+ );
+ }
+ }
+
+ list_for_each_entry_rcu(sdata, &local->interfaces, list) {
+ if (!sdata->vif.txq)
+ continue;
+
+ txqi = container_of(sdata->vif.txq, struct txq_info, txq);
+ if (!txqi->backlog_bytes)
+ continue;
+
+ new_flows = 0;
+ old_flows = 0;
+
+ list_for_each_entry(flow, &txqi->new_flows, flowchain)
+ new_flows++;
+ list_for_each_entry(flow, &txqi->old_flows, flowchain)
+ old_flows++;
+
+ res += scnprintf(buf + res, len - res,
+ "vif %s backlog (%db %dp) flows (%d new %d old) burst %d bpu %d in-flight %d\n",
+ sdata->name,
+ txqi->backlog_bytes,
+ txqi->backlog_packets,
+ new_flows,
+ old_flows,
+ txqi->bytes_per_burst,
+ txqi->bytes_per_usec,
+ atomic_read(&txqi->in_flight_usec)
+ );
+
+ flow = &txqi->flow;
+ res += scnprintf(buf + res, len - res,
+ "vif %s def flow %p backlog (%db %dp)\n",
+ sdata->name,
+ flow,
+ flow->backlog,
+ flow->queue.qlen
+ );
+
+ list_for_each_entry(flow, &txqi->new_flows, flowchain)
+ res += scnprintf(buf + res, len - res,
+ "vif %s new flow %p backlog (%db %dp)\n",
+ sdata->name,
+ flow,
+ flow->backlog,
+ flow->queue.qlen
+ );
+
+ list_for_each_entry(flow, &txqi->old_flows, flowchain)
+ res += scnprintf(buf + res, len - res,
+ "vif %s old flow %p backlog (%db %dp)\n",
+ sdata->name,
+ flow,
+ flow->backlog,
+ flow->queue.qlen
+ );
+ }
+
+ rcu_read_unlock();
+ spin_unlock_bh(&local->fq.lock);
+
+ rv = simple_read_from_buffer(user_buf, count, ppos, buf, res);
+ kfree(buf);
+
+ return rv;
+}
+
DEBUGFS_READONLY_FILE_OPS(hwflags);
DEBUGFS_READONLY_FILE_OPS(queues);
+DEBUGFS_READONLY_FILE_OPS(fq);

/* statistics stuff */

@@ -247,6 +497,7 @@ void debugfs_hw_add(struct ieee80211_local *local)
DEBUGFS_ADD(total_ps_buffered);
DEBUGFS_ADD(wep_iv);
DEBUGFS_ADD(queues);
+ DEBUGFS_ADD(fq);
#ifdef CONFIG_PM
DEBUGFS_ADD_MODE(reset, 0200);
#endif
@@ -254,6 +505,22 @@ void debugfs_hw_add(struct ieee80211_local *local)
DEBUGFS_ADD(user_power);
DEBUGFS_ADD(power);

+ DEBUGFS_ADD(fq_drop_overlimit);
+ DEBUGFS_ADD(fq_drop_codel);
+ DEBUGFS_ADD(fq_backlog);
+ DEBUGFS_ADD(fq_in_flight_usec);
+ DEBUGFS_ADD(fq_txq_limit);
+ DEBUGFS_ADD(fq_txq_interval);
+ DEBUGFS_ADD(fq_txq_target);
+ DEBUGFS_ADD(fq_ave_period);
+
+ DEBUGFS_ADD(fq_min_txops_target);
+ DEBUGFS_ADD(fq_max_txops_per_txq);
+ DEBUGFS_ADD(fq_min_txops_per_hw);
+ DEBUGFS_ADD(fq_max_txops_per_hw);
+ DEBUGFS_ADD(fq_txop_mixed_usec);
+ DEBUGFS_ADD(fq_txop_green_usec);
+
statsd = debugfs_create_dir("statistics", phyd);

/* if the dir failed, don't put all the other things into the root! */
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index f1565ce35273..443c941d5917 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -805,9 +805,18 @@ enum txq_info_flags {
};

struct txq_info {
- struct sk_buff_head queue;
+ struct txq_flow flow;
+ struct list_head flowchain;
+ struct list_head new_flows;
+ struct list_head old_flows;
+ int backlog_bytes;
+ int backlog_packets;
+ int bytes_per_burst;
+ int bytes_per_usec;
+ int deficit;
+ int in_flight_delta_usec;
+ atomic_t in_flight_usec;
unsigned long flags;
- unsigned long byte_cnt;

/* keep last! */
struct ieee80211_txq txq;
@@ -855,7 +864,6 @@ struct ieee80211_sub_if_data {
bool control_port_no_encrypt;
int encrypt_headroom;

- atomic_t txqs_len[IEEE80211_NUM_ACS];
struct ieee80211_tx_queue_params tx_conf[IEEE80211_NUM_ACS];
struct mac80211_qos_map __rcu *qos_map;

@@ -1092,11 +1100,37 @@ enum mac80211_scan_state {
SCAN_ABORT,
};

+DECLARE_EWMA(fq_period, 16, 4)
+
+struct ieee80211_fq {
+ struct txq_flow *flows;
+ struct list_head backlogs;
+ struct list_head old_flows;
+ struct list_head new_flows;
+ struct ewma_fq_period ave_period;
+ spinlock_t lock;
+ atomic_t in_flight_usec;
+ int flows_cnt;
+ int perturbation;
+ int quantum;
+ int backlog;
+ int min_txops_target;
+ int max_txops_per_txq;
+ int min_txops_per_hw;
+ int max_txops_per_hw;
+ int txop_mixed_usec;
+ int txop_green_usec;
+
+ int drop_overlimit;
+ int drop_codel;
+};
+
struct ieee80211_local {
/* embed the driver visible part.
* don't cast (use the static inlines below), but we keep
* it first anyway so they become a no-op */
struct ieee80211_hw hw;
+ struct ieee80211_fq fq;

const struct ieee80211_ops *ops;

@@ -1928,6 +1962,11 @@ static inline bool ieee80211_can_run_worker(struct ieee80211_local *local)
void ieee80211_init_tx_queue(struct ieee80211_sub_if_data *sdata,
struct sta_info *sta,
struct txq_info *txq, int tid);
+void ieee80211_purge_txq(struct ieee80211_local *local, struct txq_info *txqi);
+void ieee80211_init_flow(struct txq_flow *flow);
+int ieee80211_setup_flows(struct ieee80211_local *local);
+void ieee80211_teardown_flows(struct ieee80211_local *local);
+
void ieee80211_send_auth(struct ieee80211_sub_if_data *sdata,
u16 transaction, u16 auth_alg, u16 status,
const u8 *extra, size_t extra_len, const u8 *bssid,
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 453b4e741780..d1063b50f12c 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -779,6 +779,7 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
bool going_down)
{
struct ieee80211_local *local = sdata->local;
+ struct ieee80211_fq *fq = &local->fq;
unsigned long flags;
struct sk_buff *skb, *tmp;
u32 hw_reconf_flags = 0;
@@ -977,12 +978,9 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
if (sdata->vif.txq) {
struct txq_info *txqi = to_txq_info(sdata->vif.txq);

- spin_lock_bh(&txqi->queue.lock);
- ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
- txqi->byte_cnt = 0;
- spin_unlock_bh(&txqi->queue.lock);
-
- atomic_set(&sdata->txqs_len[txqi->txq.ac], 0);
+ spin_lock_bh(&fq->lock);
+ ieee80211_purge_txq(local, txqi);
+ spin_unlock_bh(&fq->lock);
}

if (local->open_count == 0)
@@ -1198,6 +1196,13 @@ static void ieee80211_if_setup(struct net_device *dev)
dev->destructor = ieee80211_if_free;
}

+static void ieee80211_if_setup_no_queue(struct net_device *dev)
+{
+ ieee80211_if_setup(dev);
+ dev->priv_flags |= IFF_NO_QUEUE;
+ /* Note for backporters: use dev->tx_queue_len = 0 instead of IFF_ */
+}
+
static void ieee80211_iface_work(struct work_struct *work)
{
struct ieee80211_sub_if_data *sdata =
@@ -1707,6 +1712,7 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
struct net_device *ndev = NULL;
struct ieee80211_sub_if_data *sdata = NULL;
struct txq_info *txqi;
+ void (*if_setup)(struct net_device *dev);
int ret, i;
int txqs = 1;

@@ -1734,12 +1740,17 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
txq_size += sizeof(struct txq_info) +
local->hw.txq_data_size;

+ if (local->ops->wake_tx_queue)
+ if_setup = ieee80211_if_setup_no_queue;
+ else
+ if_setup = ieee80211_if_setup;
+
if (local->hw.queues >= IEEE80211_NUM_ACS)
txqs = IEEE80211_NUM_ACS;

ndev = alloc_netdev_mqs(size + txq_size,
name, name_assign_type,
- ieee80211_if_setup, txqs, 1);
+ if_setup, txqs, 1);
if (!ndev)
return -ENOMEM;
dev_net_set(ndev, wiphy_net(local->hw.wiphy));
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 8190bf27ebff..9fd3b10ae52b 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -1053,9 +1053,6 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)

local->dynamic_ps_forced_timeout = -1;

- if (!local->hw.txq_ac_max_pending)
- local->hw.txq_ac_max_pending = 64;
-
result = ieee80211_wep_init(local);
if (result < 0)
wiphy_debug(local->hw.wiphy, "Failed to initialize wep: %d\n",
@@ -1087,6 +1084,10 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)

rtnl_unlock();

+ result = ieee80211_setup_flows(local);
+ if (result)
+ goto fail_flows;
+
#ifdef CONFIG_INET
local->ifa_notifier.notifier_call = ieee80211_ifa_changed;
result = register_inetaddr_notifier(&local->ifa_notifier);
@@ -1112,6 +1113,8 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
#if defined(CONFIG_INET) || defined(CONFIG_IPV6)
fail_ifa:
#endif
+ ieee80211_teardown_flows(local);
+ fail_flows:
rtnl_lock();
rate_control_deinitialize(local);
ieee80211_remove_interfaces(local);
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index dc27becb9b71..70f8f7949bf2 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1268,7 +1268,7 @@ static void sta_ps_start(struct sta_info *sta)
for (tid = 0; tid < ARRAY_SIZE(sta->sta.txq); tid++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[tid]);

- if (!skb_queue_len(&txqi->queue))
+ if (!txqi->backlog_packets)
set_bit(tid, &sta->txq_buffered_tids);
else
clear_bit(tid, &sta->txq_buffered_tids);
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 00c82fb152c0..0729046a0144 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -112,11 +112,7 @@ static void __cleanup_single_sta(struct sta_info *sta)
if (sta->sta.txq[0]) {
for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[i]);
- int n = skb_queue_len(&txqi->queue);
-
- ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
- atomic_sub(n, &sdata->txqs_len[txqi->txq.ac]);
- txqi->byte_cnt = 0;
+ ieee80211_purge_txq(local, txqi);
}
}

@@ -1193,7 +1189,7 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta)
for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[i]);

- if (!skb_queue_len(&txqi->queue))
+ if (!txqi->backlog_packets)
continue;

drv_wake_tx_queue(local, txqi);
@@ -1630,7 +1626,7 @@ ieee80211_sta_ps_deliver_response(struct sta_info *sta,
for (tid = 0; tid < ARRAY_SIZE(sta->sta.txq); tid++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[tid]);

- if (!(tids & BIT(tid)) || skb_queue_len(&txqi->queue))
+ if (!(tids & BIT(tid)) || txqi->backlog_packets)
continue;

sta_info_recalc_tim(sta);
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 053f5c4fa495..dd9d5f754c57 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -19,6 +19,7 @@
#include <linux/etherdevice.h>
#include <linux/rhashtable.h>
#include "key.h"
+#include "codel_i.h"

/**
* enum ieee80211_sta_info_flags - Stations flags
@@ -330,6 +331,32 @@ struct mesh_sta {

DECLARE_EWMA(signal, 1024, 8)

+struct txq_info;
+
+/**
+ * struct txq_flow - per traffic flow queue
+ *
+ * This structure is used to distinguish and queue different traffic flows
+ * separately for fair queueing/AQM purposes.
+ *
+ * @txqi: txq_info structure it is associated at given time
+ * @flowchain: can be linked to other flows for RR purposes
+ * @backlogchain: can be linked to other flows for backlog sorting purposes
+ * @queue: sk_buff queue
+ * @cvars: codel state vars
+ * @backlog: number of bytes pending in the queue
+ * @deficit: used for fair queueing balancing
+ */
+struct txq_flow {
+ struct txq_info *txqi;
+ struct list_head flowchain;
+ struct list_head backlogchain;
+ struct sk_buff_head queue;
+ struct codel_vars cvars;
+ int backlog;
+ int deficit;
+};
+
/**
* struct sta_info - STA information
*
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index 8b1b2ea03eb5..2cd898f8a658 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -502,6 +502,67 @@ static void ieee80211_report_ack_skb(struct ieee80211_local *local,
}
}

+static void ieee80211_report_txq_skb(struct ieee80211_local *local,
+ struct ieee80211_hdr *hdr,
+ struct sk_buff *skb)
+{
+ struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
+ struct ieee80211_fq *fq = &local->fq;
+ struct ieee80211_sub_if_data *sdata;
+ struct ieee80211_txq *txq = NULL;
+ struct sta_info *sta;
+ struct txq_info *txqi;
+ struct rhash_head *tmp;
+ const struct bucket_table *tbl;
+ int tid;
+ __le16 fc = hdr->frame_control;
+ u8 *addr;
+ static const u8 zeroaddr[ETH_ALEN];
+
+ if (!ieee80211_is_data(fc))
+ return;
+
+ rcu_read_lock();
+
+ tbl = rht_dereference_rcu(local->sta_hash.tbl, &local->sta_hash);
+ for_each_sta_info(local, tbl, hdr->addr1, sta, tmp) {
+ /* skip wrong virtual interface */
+ if (!ether_addr_equal(hdr->addr2, sta->sdata->vif.addr))
+ continue;
+
+ tid = skb->priority & IEEE80211_QOS_CTL_TID_MASK;
+ txq = sta->sta.txq[tid];
+
+ break;
+ }
+
+ if (!txq) {
+ addr = ieee80211_get_DA(hdr);
+ if (is_multicast_ether_addr(addr)) {
+ sdata = ieee80211_sdata_from_skb(local, skb);
+ txq = sdata->vif.txq;
+ }
+ }
+
+ if (txq) {
+ txqi = container_of(txq, struct txq_info, txq);
+ atomic_sub(info->expected_duration, &txqi->in_flight_usec);
+ if (atomic_read(&txqi->in_flight_usec) < 0) {
+ WARN_ON_ONCE(1);
+ print_hex_dump(KERN_DEBUG, "skb: ", DUMP_PREFIX_OFFSET, 16, 1,
+ skb->data, skb->len, 0);
+ printk("underflow: txq tid %d sta %pM vif %s\n",
+ txq->tid,
+ txq->sta ? txq->sta->addr : zeroaddr,
+ container_of(txq->vif, struct ieee80211_sub_if_data, vif)->name);
+ }
+ }
+
+ atomic_sub(info->expected_duration, &fq->in_flight_usec);
+
+ rcu_read_unlock();
+}
+
static void ieee80211_report_used_skb(struct ieee80211_local *local,
struct sk_buff *skb, bool dropped)
{
@@ -512,6 +573,9 @@ static void ieee80211_report_used_skb(struct ieee80211_local *local,
if (dropped)
acked = false;

+ if (local->ops->wake_tx_queue)
+ ieee80211_report_txq_skb(local, hdr, skb);
+
if (info->flags & IEEE80211_TX_INTFL_MLME_CONN_TX) {
struct ieee80211_sub_if_data *sdata;

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 6040c29a9e17..3072e460e82a 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -34,6 +34,7 @@
#include "wpa.h"
#include "wme.h"
#include "rate.h"
+#include "codel.h"

/* misc utils */

@@ -1232,27 +1233,335 @@ ieee80211_tx_prepare(struct ieee80211_sub_if_data *sdata,
return TX_CONTINUE;
}

-static void ieee80211_drv_tx(struct ieee80211_local *local,
- struct ieee80211_vif *vif,
- struct ieee80211_sta *pubsta,
- struct sk_buff *skb)
+static inline u64
+custom_codel_get_enqueue_time(struct sk_buff *skb)
+{
+ return IEEE80211_SKB_CB(skb)->control.enqueue_time;
+}
+
+static inline struct sk_buff *
+flow_dequeue(struct ieee80211_local *local, struct txq_flow *flow)
+{
+ struct ieee80211_fq *fq = &local->fq;
+ struct txq_info *txqi = flow->txqi;
+ struct txq_flow *i;
+ struct sk_buff *skb;
+
+ skb = __skb_dequeue(&flow->queue);
+ if (!skb)
+ return NULL;
+
+ txqi->backlog_bytes -= skb->len;
+ txqi->backlog_packets--;
+ flow->backlog -= skb->len;
+ fq->backlog--;
+
+ if (flow->backlog == 0) {
+ list_del_init(&flow->backlogchain);
+ } else {
+ i = flow;
+
+ list_for_each_entry_continue(i, &fq->backlogs, backlogchain)
+ if (i->backlog < flow->backlog)
+ break;
+
+ list_move_tail(&flow->backlogchain, &i->backlogchain);
+ }
+
+ return skb;
+}
+
+static inline struct sk_buff *
+custom_dequeue(struct codel_vars *vars, void *ptr)
+{
+ struct txq_flow *flow = ptr;
+ struct txq_info *txqi = flow->txqi;
+ struct ieee80211_vif *vif = txqi->txq.vif;
+ struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
+ struct ieee80211_local *local = sdata->local;
+
+ return flow_dequeue(local, flow);
+}
+
+static inline void
+custom_drop(struct sk_buff *skb, void *ptr)
+{
+ struct txq_flow *flow = ptr;
+ struct txq_info *txqi = flow->txqi;
+ struct ieee80211_vif *vif = txqi->txq.vif;
+ struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
+ struct ieee80211_local *local = sdata->local;
+ struct ieee80211_hw *hw = &local->hw;
+
+ ieee80211_free_txskb(hw, skb);
+ local->fq.drop_codel++;
+}
+
+static u32 fq_hash(struct ieee80211_fq *fq, struct sk_buff *skb)
+{
+ u32 hash = skb_get_hash_perturb(skb, fq->perturbation);
+ return reciprocal_scale(hash, fq->flows_cnt);
+}
+
+static void fq_drop(struct ieee80211_local *local)
+{
+ struct ieee80211_hw *hw = &local->hw;
+ struct ieee80211_fq *fq = &local->fq;
+ struct txq_flow *flow;
+ struct sk_buff *skb;
+
+ flow = list_first_entry_or_null(&fq->backlogs, struct txq_flow,
+ backlogchain);
+ if (WARN_ON_ONCE(!flow))
+ return;
+
+ skb = flow_dequeue(local, flow);
+ if (WARN_ON_ONCE(!skb))
+ return;
+
+ ieee80211_free_txskb(hw, skb);
+ fq->drop_overlimit++;
+}
+
+void ieee80211_init_flow(struct txq_flow *flow)
+{
+ INIT_LIST_HEAD(&flow->flowchain);
+ INIT_LIST_HEAD(&flow->backlogchain);
+ __skb_queue_head_init(&flow->queue);
+ codel_vars_init(&flow->cvars);
+}
+
+#define MIN_FQ_TARGET_USEC(fq) ((fq)->min_txops_target * (fq)->txop_mixed_usec)
+
+int ieee80211_setup_flows(struct ieee80211_local *local)
+{
+ struct ieee80211_fq *fq = &local->fq;
+ int i;
+
+ if (!local->ops->wake_tx_queue)
+ return 0;
+
+ if (!local->hw.txq_limit)
+ local->hw.txq_limit = 8192;
+
+ memset(fq, 0, sizeof(fq[0]));
+ INIT_LIST_HEAD(&fq->backlogs);
+ INIT_LIST_HEAD(&fq->old_flows);
+ INIT_LIST_HEAD(&fq->new_flows);
+ ewma_fq_period_init(&fq->ave_period);
+ atomic_set(&fq->in_flight_usec, 0);
+ spin_lock_init(&fq->lock);
+ fq->flows_cnt = 4096;
+ fq->perturbation = prandom_u32();
+ fq->quantum = 300;
+ fq->txop_mixed_usec = 5484;
+ fq->txop_green_usec = 10000;
+ fq->min_txops_target = 2;
+ fq->max_txops_per_txq = 1;
+ fq->min_txops_per_hw = 3;
+ fq->max_txops_per_hw = 4;
+
+ if (!local->hw.txq_cparams.target)
+ local->hw.txq_cparams.target = US2TIME(MIN_FQ_TARGET_USEC(fq));
+
+ if (!local->hw.txq_cparams.interval)
+ local->hw.txq_cparams.interval = MS2TIME(100);
+
+ fq->flows = kzalloc(fq->flows_cnt * sizeof(fq->flows[0]), GFP_KERNEL);
+ if (!fq->flows)
+ return -ENOMEM;
+
+ for (i = 0; i < fq->flows_cnt; i++)
+ ieee80211_init_flow(&fq->flows[i]);
+
+ return 0;
+}
+
+static void ieee80211_reset_flow(struct ieee80211_local *local,
+ struct txq_flow *flow)
+{
+ if (!list_empty(&flow->flowchain))
+ list_del_init(&flow->flowchain);
+
+ if (!list_empty(&flow->backlogchain))
+ list_del_init(&flow->backlogchain);
+
+ ieee80211_purge_tx_queue(&local->hw, &flow->queue);
+
+ flow->deficit = 0;
+ flow->txqi = NULL;
+}
+
+void ieee80211_purge_txq(struct ieee80211_local *local, struct txq_info *txqi)
+{
+ struct txq_flow *flow;
+ int i;
+
+ for (i = 0; i < local->fq.flows_cnt; i++) {
+ flow = &local->fq.flows[i];
+
+ if (flow->txqi != txqi)
+ continue;
+
+ ieee80211_reset_flow(local, flow);
+ }
+
+ ieee80211_reset_flow(local, &txqi->flow);
+
+ txqi->backlog_bytes = 0;
+ txqi->backlog_packets = 0;
+}
+
+void ieee80211_teardown_flows(struct ieee80211_local *local)
+{
+ struct ieee80211_fq *fq = &local->fq;
+ struct ieee80211_sub_if_data *sdata;
+ struct sta_info *sta;
+ int i;
+
+ if (!local->ops->wake_tx_queue)
+ return;
+
+ list_for_each_entry_rcu(sta, &local->sta_list, list)
+ for (i = 0; i < IEEE80211_NUM_TIDS; i++)
+ ieee80211_purge_txq(local,
+ to_txq_info(sta->sta.txq[i]));
+
+ list_for_each_entry_rcu(sdata, &local->interfaces, list)
+ ieee80211_purge_txq(local, to_txq_info(sdata->vif.txq));
+
+ for (i = 0; i < fq->flows_cnt; i++)
+ ieee80211_reset_flow(local, &fq->flows[i]);
+
+ kfree(fq->flows);
+
+ fq->flows = NULL;
+ fq->flows_cnt = 0;
+}
+
+static void ieee80211_txq_enqueue(struct ieee80211_local *local,
+ struct txq_info *txqi,
+ struct sk_buff *skb)
+{
+ struct ieee80211_fq *fq = &local->fq;
+ struct ieee80211_hw *hw = &local->hw;
+ struct txq_flow *flow;
+ struct txq_flow *i;
+ size_t idx = fq_hash(fq, skb);
+
+ lockdep_assert_held(&fq->lock);
+
+ flow = &fq->flows[idx];
+
+ if (flow->txqi && flow->txqi != txqi)
+ flow = &txqi->flow;
+
+ /* The following overwrites `vif` pointer effectively. It is later
+ * restored using txq structure.
+ */
+ IEEE80211_SKB_CB(skb)->control.enqueue_time = codel_get_time();
+
+ flow->txqi = txqi;
+ flow->backlog += skb->len;
+ txqi->backlog_bytes += skb->len;
+ txqi->backlog_packets++;
+ fq->backlog++;
+
+ if (list_empty(&flow->backlogchain))
+ list_add_tail(&flow->backlogchain, &fq->backlogs);
+
+ i = flow;
+ list_for_each_entry_continue_reverse(i, &fq->backlogs, backlogchain)
+ if (i->backlog > flow->backlog)
+ break;
+
+ list_move(&flow->backlogchain, &i->backlogchain);
+
+ if (list_empty(&flow->flowchain)) {
+ flow->deficit = fq->quantum;
+ list_add_tail(&flow->flowchain, &txqi->new_flows);
+ }
+
+ if (list_empty(&txqi->flowchain)) {
+ txqi->deficit = fq->quantum;
+ list_add_tail(&txqi->flowchain, &fq->new_flows);
+ }
+
+ __skb_queue_tail(&flow->queue, skb);
+
+ if (fq->backlog > hw->txq_limit)
+ fq_drop(local);
+}
+
+static struct sk_buff *ieee80211_txq_dequeue(struct ieee80211_local *local,
+ struct txq_info *txqi)
+{
+ struct ieee80211_fq *fq = &local->fq;
+ struct ieee80211_hw *hw = &local->hw;
+ struct txq_flow *flow;
+ struct list_head *head;
+ struct sk_buff *skb;
+
+begin:
+ head = &txqi->new_flows;
+ if (list_empty(head)) {
+ head = &txqi->old_flows;
+ if (list_empty(head))
+ return NULL;
+ }
+
+ flow = list_first_entry(head, struct txq_flow, flowchain);
+
+ if (flow->deficit <= 0) {
+ flow->deficit += fq->quantum;
+ list_move_tail(&flow->flowchain, &txqi->old_flows);
+ goto begin;
+ }
+
+ skb = codel_dequeue(flow,
+ &flow->backlog,
+ txqi->bytes_per_burst,
+ &flow->cvars,
+ &hw->txq_cparams,
+ codel_get_time(),
+ false);
+ if (!skb) {
+ if ((head == &txqi->new_flows) &&
+ !list_empty(&txqi->old_flows)) {
+ list_move_tail(&flow->flowchain, &txqi->old_flows);
+ } else {
+ list_del_init(&flow->flowchain);
+ flow->txqi = NULL;
+ }
+ goto begin;
+ }
+
+ flow->deficit -= skb->len;
+
+ /* The `vif` pointer was overwritten with enqueue time during
+ * enqueuing. Restore it before handing to driver.
+ */
+ IEEE80211_SKB_CB(skb)->control.vif = flow->txqi->txq.vif;
+
+ return skb;
+}
+
+static struct txq_info *
+ieee80211_get_txq(struct ieee80211_local *local,
+ struct ieee80211_vif *vif,
+ struct ieee80211_sta *pubsta,
+ struct sk_buff *skb)
{
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
- struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
- struct ieee80211_tx_control control = {
- .sta = pubsta,
- };
struct ieee80211_txq *txq = NULL;
- struct txq_info *txqi;
- u8 ac;

if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) ||
(info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE))
- goto tx_normal;
+ return NULL;

if (!ieee80211_is_data(hdr->frame_control))
- goto tx_normal;
+ return NULL;

if (pubsta) {
u8 tid = skb->priority & IEEE80211_QOS_CTL_TID_MASK;
@@ -1263,57 +1572,48 @@ static void ieee80211_drv_tx(struct ieee80211_local *local,
}

if (!txq)
- goto tx_normal;
+ return NULL;

- ac = txq->ac;
- txqi = to_txq_info(txq);
- atomic_inc(&sdata->txqs_len[ac]);
- if (atomic_read(&sdata->txqs_len[ac]) >= local->hw.txq_ac_max_pending)
- netif_stop_subqueue(sdata->dev, ac);
-
- spin_lock_bh(&txqi->queue.lock);
- txqi->byte_cnt += skb->len;
- __skb_queue_tail(&txqi->queue, skb);
- spin_unlock_bh(&txqi->queue.lock);
-
- drv_wake_tx_queue(local, txqi);
-
- return;
-
-tx_normal:
- drv_tx(local, &control, skb);
+ return to_txq_info(txq);
}

+#define TXQI_BYTES_TO_USEC(txqi, bytes) \
+ DIV_ROUND_UP((bytes), max_t(int, 1, (txqi)->bytes_per_usec))
+
struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
struct ieee80211_txq *txq)
{
struct ieee80211_local *local = hw_to_local(hw);
- struct ieee80211_sub_if_data *sdata = vif_to_sdata(txq->vif);
+ struct ieee80211_fq *fq = &local->fq;
+ struct ieee80211_tx_info *info;
struct txq_info *txqi = container_of(txq, struct txq_info, txq);
struct ieee80211_hdr *hdr;
struct sk_buff *skb = NULL;
- u8 ac = txq->ac;
+ int duration_usec;

- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);

if (test_bit(IEEE80211_TXQ_STOP, &txqi->flags))
goto out;

- skb = __skb_dequeue(&txqi->queue);
+ skb = ieee80211_txq_dequeue(local, txqi);
if (!skb)
goto out;

- txqi->byte_cnt -= skb->len;
+ duration_usec = TXQI_BYTES_TO_USEC(txqi, skb->len);
+ duration_usec = min_t(int, BIT(10) - 1, duration_usec);

- atomic_dec(&sdata->txqs_len[ac]);
- if (__netif_subqueue_stopped(sdata->dev, ac))
- ieee80211_propagate_queue_wake(local, sdata->vif.hw_queue[ac]);
+ info = IEEE80211_SKB_CB(skb);
+ info->expected_duration = duration_usec;
+
+ txqi->in_flight_delta_usec += duration_usec;
+ atomic_add(duration_usec, &txqi->in_flight_usec);
+ atomic_add(duration_usec, &fq->in_flight_usec);

hdr = (struct ieee80211_hdr *)skb->data;
if (txq->sta && ieee80211_is_data_qos(hdr->frame_control)) {
struct sta_info *sta = container_of(txq->sta, struct sta_info,
sta);
- struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);

hdr->seq_ctrl = ieee80211_tx_next_seq(sta, txq->tid);
if (test_bit(IEEE80211_TXQ_AMPDU, &txqi->flags))
@@ -1323,19 +1623,274 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
}

out:
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);

return skb;
}
EXPORT_SYMBOL(ieee80211_tx_dequeue);

+static u16 ieee80211_get_txop_usec(struct ieee80211_local *local,
+ struct txq_info *txqi)
+{
+ struct ieee80211_sub_if_data *sdata;
+ u16 txop_usec;
+
+ sdata = container_of(txqi->txq.vif, struct ieee80211_sub_if_data, vif);
+ txop_usec = sdata->tx_conf[txqi->txq.ac].txop * 32;
+
+ /* How to pick between mixed/greenfield txops? */
+ if (txop_usec == 0)
+ txop_usec = local->fq.txop_mixed_usec;
+
+ return txop_usec;
+}
+
+static u32 ieee80211_get_tput_kbps(struct ieee80211_local *local,
+ struct txq_info *txqi)
+{
+ struct ieee80211_sub_if_data *sdata;
+ struct ieee80211_supported_band *sband;
+ struct ieee80211_chanctx_conf *chanctx_conf;
+ enum ieee80211_band band;
+ struct rate_control_ref *ref = NULL;
+ struct sta_info *sta;
+ int idx;
+ u32 tput;
+
+ if (txqi->txq.sta) {
+ sta = container_of(txqi->txq.sta, struct sta_info, sta);
+
+ if (test_sta_flag(sta, WLAN_STA_RATE_CONTROL))
+ ref = local->rate_ctrl;
+
+ if (ref)
+ tput = ref->ops->get_expected_throughput(sta->rate_ctrl_priv);
+ else if (local->ops->get_expected_throughput)
+ tput = drv_get_expected_throughput(local, &sta->sta);
+ else
+ tput = 0;
+ } else {
+ sdata = container_of(txqi->txq.vif, struct ieee80211_sub_if_data, vif);
+
+ rcu_read_lock();
+ chanctx_conf = rcu_dereference(sdata->vif.chanctx_conf);
+ band = chanctx_conf->def.chan->band;
+ rcu_read_unlock();
+
+ sband = local->hw.wiphy->bands[band];
+ idx = sdata->vif.bss_conf.mcast_rate[band];
+ if (idx > 0) {
+ /* Convert units from 100Kbps and assume 20% MAC
+ * overhead, i.e. 80% efficiency.
+ */
+ tput = sband[band].bitrates[idx].bitrate * 100;
+ tput = (tput * 8) / 10;
+ } else {
+ tput = 1000;
+ }
+ }
+
+ return tput;
+}
+
+static void ieee80211_recalc_txqi_tput(struct ieee80211_local *local,
+ struct txq_info *txqi)
+{
+ struct ieee80211_fq *fq = &local->fq;
+ int tput_kbps;
+ int txop_usec;
+
+ lockdep_assert_held(&fq->lock);
+
+ tput_kbps = ieee80211_get_tput_kbps(local, txqi);
+ txop_usec = ieee80211_get_txop_usec(local, txqi);
+ txqi->bytes_per_usec = max_t(int, 1, DIV_ROUND_UP(1024 * (tput_kbps/8),
+ USEC_PER_SEC));
+ txqi->bytes_per_burst = max_t(int, 1, txop_usec * txqi->bytes_per_usec);
+}
+
+void ieee80211_recalc_fq_period(struct ieee80211_hw *hw)
+{
+ struct ieee80211_local *local = hw_to_local(hw);
+ struct ieee80211_fq *fq = &local->fq;
+ struct txq_info *txqi;
+ int period = 0;
+ int target_usec;
+
+ spin_lock_bh(&fq->lock);
+
+ list_for_each_entry(txqi, &fq->new_flows, flowchain) {
+ ieee80211_recalc_txqi_tput(local, txqi);
+
+ period += TXQI_BYTES_TO_USEC(txqi, min(txqi->backlog_bytes,
+ txqi->bytes_per_burst));
+ }
+
+ list_for_each_entry(txqi, &fq->old_flows, flowchain) {
+ ieee80211_recalc_txqi_tput(local, txqi);
+
+ period += TXQI_BYTES_TO_USEC(txqi, min(txqi->backlog_bytes,
+ txqi->bytes_per_burst));
+ }
+
+ ewma_fq_period_add(&fq->ave_period, period);
+
+ target_usec = ewma_fq_period_read(&fq->ave_period);
+ target_usec = max_t(u64, target_usec, MIN_FQ_TARGET_USEC(fq));
+ hw->txq_cparams.target = US2TIME(target_usec);
+
+ spin_unlock_bh(&fq->lock);
+}
+EXPORT_SYMBOL(ieee80211_recalc_fq_period);
+
+static int ieee80211_tx_sched_budget(struct ieee80211_local *local,
+ struct txq_info *txqi)
+{
+ int txop_usec;
+ int budget;
+
+ /* XXX: Should this consider per-txq or per-sta in flight duration? */
+ txop_usec = ieee80211_get_txop_usec(local, txqi);
+ budget = local->fq.max_txops_per_txq * txop_usec;
+ budget -= atomic_read(&txqi->in_flight_usec);
+ budget = min(budget, txop_usec);
+ budget *= min_t(int, 1, txqi->bytes_per_usec);
+
+ return budget;
+}
+
+static void ieee80211_tx_sched_next_txqi(struct ieee80211_local *local,
+ struct list_head **list,
+ struct list_head **head)
+{
+ struct ieee80211_fq *fq = &local->fq;
+
+ if (!*list) {
+ *head = &fq->new_flows;
+ *list = *head;
+ }
+
+ *list = (*list)->next;
+
+ if (*list != *head)
+ return;
+
+ if (*head == &fq->new_flows) {
+ *head = &fq->old_flows;
+ *list = *head;
+ ieee80211_tx_sched_next_txqi(local, list, head);
+ return;
+ }
+
+ *head = NULL;
+ *list = NULL;
+}
+
+int ieee80211_tx_schedule(struct ieee80211_hw *hw,
+ int (*wake)(struct ieee80211_hw *hw,
+ struct ieee80211_txq *txq,
+ int budget))
+{
+ struct ieee80211_local *local = hw_to_local(hw);
+ struct ieee80211_fq *fq = &local->fq;
+ struct list_head *list = NULL;
+ struct list_head *head = NULL;
+ struct txq_info *txqi = NULL;
+ int min_in_flight_usec;
+ int max_in_flight_usec;
+ int in_flight_usec;
+ int ret = 0;
+ int budget;
+
+ rcu_read_lock();
+ spin_lock_bh(&fq->lock);
+
+ min_in_flight_usec = fq->min_txops_per_hw * fq->txop_mixed_usec;
+ max_in_flight_usec = fq->max_txops_per_hw * fq->txop_mixed_usec;
+ in_flight_usec = atomic_read(&fq->in_flight_usec);
+
+ if (in_flight_usec >= min_in_flight_usec) {
+ ret = -EBUSY;
+ goto unlock;
+ }
+
+ for (;;) {
+ if (in_flight_usec >= max_in_flight_usec) {
+ ret = -EBUSY;
+ break;
+ }
+
+ if (list && list_is_last(list, &fq->old_flows)) {
+ ret = -EBUSY;
+ break;
+ }
+
+ ieee80211_tx_sched_next_txqi(local, &list, &head);
+ if (!list) {
+ ret = -ENOENT;
+ break;
+ }
+
+ txqi = list_entry(list, struct txq_info, flowchain);
+
+ if (txqi->deficit < 0) {
+ txqi->deficit += fq->quantum;
+ list_move_tail(&txqi->flowchain, &fq->old_flows);
+ list = NULL;
+ continue;
+ }
+
+ budget = ieee80211_tx_sched_budget(local, txqi);
+ txqi->in_flight_delta_usec = 0;
+
+ spin_unlock_bh(&fq->lock);
+ ret = wake(hw, &txqi->txq, budget);
+ spin_lock_bh(&fq->lock);
+
+ if (ret > 0) {
+ txqi->deficit -= txqi->in_flight_delta_usec;
+ in_flight_usec += txqi->in_flight_delta_usec;
+ }
+
+ if (!txqi->backlog_bytes) {
+ if (head == &fq->new_flows && !list_empty(&fq->old_flows)) {
+ list_move_tail(&txqi->flowchain, &fq->old_flows);
+ } else {
+ list_del_init(&txqi->flowchain);
+ }
+
+ list = NULL;
+ }
+
+ if (ret < 0) {
+ ret = -EBUSY;
+ break;
+ } else if (ret == 0 && txqi) {
+ /* `list` is not reset to skip over */
+ continue;
+ }
+
+ list = NULL;
+ }
+
+unlock:
+ spin_unlock_bh(&fq->lock);
+ rcu_read_unlock();
+
+ return ret;
+}
+EXPORT_SYMBOL(ieee80211_tx_schedule);
+
static bool ieee80211_tx_frags(struct ieee80211_local *local,
struct ieee80211_vif *vif,
struct ieee80211_sta *sta,
struct sk_buff_head *skbs,
bool txpending)
{
+ struct ieee80211_fq *fq = &local->fq;
+ struct ieee80211_tx_control control = {};
struct sk_buff *skb, *tmp;
+ struct txq_info *txqi;
unsigned long flags;

skb_queue_walk_safe(skbs, skb, tmp) {
@@ -1350,6 +1905,24 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
}
#endif

+ /* XXX: This changes behavior for offchan-tx. Is this really a
+ * problem with per-sta-tid queueing now?
+ */
+ txqi = ieee80211_get_txq(local, vif, sta, skb);
+ if (txqi) {
+ info->control.vif = vif;
+
+ __skb_unlink(skb, skbs);
+
+ spin_lock_bh(&fq->lock);
+ ieee80211_txq_enqueue(local, txqi, skb);
+ spin_unlock_bh(&fq->lock);
+
+ drv_wake_tx_queue(local, txqi);
+
+ continue;
+ }
+
spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
if (local->queue_stop_reasons[q] ||
(!txpending && !skb_queue_empty(&local->pending[q]))) {
@@ -1392,9 +1965,10 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);

info->control.vif = vif;
+ control.sta = sta;

__skb_unlink(skb, skbs);
- ieee80211_drv_tx(local, vif, sta, skb);
+ drv_tx(local, &control, skb);
}

return true;
@@ -2381,7 +2955,7 @@ static struct sk_buff *ieee80211_build_hdr(struct ieee80211_sub_if_data *sdata,

spin_lock_irqsave(&local->ack_status_lock, flags);
id = idr_alloc(&local->ack_status_frames, ack_skb,
- 1, 0x10000, GFP_ATOMIC);
+ 1, 0x8000, GFP_ATOMIC);
spin_unlock_irqrestore(&local->ack_status_lock, flags);

if (id >= 0) {
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 0319d6d4f863..afb1bbf9b3f4 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -244,6 +244,9 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue)
struct ieee80211_sub_if_data *sdata;
int n_acs = IEEE80211_NUM_ACS;

+ if (local->ops->wake_tx_queue)
+ return;
+
if (local->hw.queues < IEEE80211_NUM_ACS)
n_acs = 1;

@@ -260,11 +263,6 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue)
for (ac = 0; ac < n_acs; ac++) {
int ac_queue = sdata->vif.hw_queue[ac];

- if (local->ops->wake_tx_queue &&
- (atomic_read(&sdata->txqs_len[ac]) >
- local->hw.txq_ac_max_pending))
- continue;
-
if (ac_queue == queue ||
(sdata->vif.cab_queue == queue &&
local->queue_stop_reasons[ac_queue] == 0 &&
@@ -352,6 +350,9 @@ static void __ieee80211_stop_queue(struct ieee80211_hw *hw, int queue,
if (__test_and_set_bit(reason, &local->queue_stop_reasons[queue]))
return;

+ if (local->ops->wake_tx_queue)
+ return;
+
if (local->hw.queues < IEEE80211_NUM_ACS)
n_acs = 1;

@@ -3392,8 +3393,12 @@ void ieee80211_init_tx_queue(struct ieee80211_sub_if_data *sdata,
struct sta_info *sta,
struct txq_info *txqi, int tid)
{
- skb_queue_head_init(&txqi->queue);
+ INIT_LIST_HEAD(&txqi->flowchain);
+ INIT_LIST_HEAD(&txqi->old_flows);
+ INIT_LIST_HEAD(&txqi->new_flows);
+ ieee80211_init_flow(&txqi->flow);
txqi->txq.vif = &sdata->vif;
+ txqi->flow.txqi = txqi;

if (sta) {
txqi->txq.sta = &sta->sta;
@@ -3414,9 +3419,9 @@ void ieee80211_txq_get_depth(struct ieee80211_txq *txq,
struct txq_info *txqi = to_txq_info(txq);

if (frame_cnt)
- *frame_cnt = txqi->queue.qlen;
+ *frame_cnt = txqi->backlog_packets;

if (byte_cnt)
- *byte_cnt = txqi->byte_cnt;
+ *byte_cnt = txqi->backlog_bytes;
}
EXPORT_SYMBOL(ieee80211_txq_get_depth);
--
2.1.4


2016-03-22 09:51:18

by Toke Høiland-Jørgensen

Subject: Re: [RFCv2 0/3] mac80211: implement fq codel

Michal Kazior <[email protected]> writes:

> traffic-gen generates only BE traffic. Everything else runs UDP_RR
> which doesn't generate a lot of traffic.

Good point. Fixed that: the newest git version of traffic-gen supports a
-t parameter which will be set as the TOS byte on outgoing traffic
(literal; no smart diffserv handling, so you can override the ECN bits
as well).

Added support for a burst-tos test parameter in the Flent burst test
configs which will use this new parameter if set.

-Toke

2016-03-31 10:26:21

by Michal Kazior

Subject: [PATCHv2 1/2] mac80211: implement fair queuing per txq

Qdiscs assume that all packets, regardless of
destination address, are treated equally by the
underlying device link.

This isn't true for wireless where each node is a
link in its own right with different and varying
signal quality over time.

Existing wireless behavior stuffs device tx queues
with no regard to link conditions. This can result
in queue buildup for slow stations and seconds'
worth of inertia, making it impossible for small
bursty traffic to come through.

The current high-level idea is to keep roughly 1-2
txops worth of data in device tx queues to allow
short bursts to be handled responsively.
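The "1-2 txops in flight" rule above boils down to a hysteresis on in-flight airtime, as in ieee80211_tx_schedule() in the patch. A minimal sketch, using the constants the patch itself defaults to (txop_mixed_usec = 5484, min/max_txops_per_hw = 3/4); the helper names are illustrative, not mac80211's:

```c
#include <assert.h>

/* Constants mirror the patch's defaults in ieee80211_setup_flows():
 * txop_mixed_usec = 5484, min_txops_per_hw = 3, max_txops_per_hw = 4. */
#define TXOP_USEC        5484
#define MIN_TXOPS_PER_HW 3
#define MAX_TXOPS_PER_HW 4

/* Low watermark: start another scheduling pass only while fewer than
 * MIN_TXOPS_PER_HW txops' worth of airtime is queued in hardware. */
static int hw_can_take_more(int in_flight_usec)
{
	return in_flight_usec < MIN_TXOPS_PER_HW * TXOP_USEC;
}

/* High watermark: a scheduling pass stops once the hard in-flight
 * airtime cap is reached. */
static int hw_must_stop(int in_flight_usec)
{
	return in_flight_usec >= MAX_TXOPS_PER_HW * TXOP_USEC;
}
```

The gap between the two watermarks is what allows the driver to refill hardware queues in bursts rather than one frame at a time.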

mac80211's software queues were designed to work
very closely with device tx queues. They are
required to make use of 802.11 packet aggregation
easily and efficiently.

However, the logic imposed a per-AC queue limit.
With the limit too small mac80211 wasn't able
to guarantee fairness across TIDs or stations
because a single burst to a slow station could
monopolize queues and hit the per-AC limit,
preventing traffic from other stations from
being queued into mac80211's software queues.
With the limit too large, smart qdiscs such as
fq_codel become a lot less efficient, as they
are designed on the premise that they sit very
close to the actual device tx queues.
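The fix for this monopolization is per-flow deficit round-robin: each flow spends a deficit that is refilled by a fixed quantum, so a flow with large packets simply gets fewer dequeues per round. A toy sketch of that logic with illustrative names, using raw byte cost (the patch itself mixes new_flows/old_flows lists and scales cost toward expected airtime):

```c
#include <assert.h>

/* Toy deficit round-robin over per-station flows, in the spirit of the
 * deficit/quantum fields the patch adds to txq_flow and txq_info. */
struct flow {
	int backlog_pkts;	/* packets still queued */
	int pkt_len;		/* fixed packet size for this toy flow */
	int deficit;		/* bytes this flow may still send this round */
};

#define QUANTUM 300		/* deficit refill per round, as in the patch */

/* Dequeue one packet; returns the index of the serviced flow or -1 if
 * all flows are empty. A flow whose deficit is exhausted gets a QUANTUM
 * refill and must wait for another turn, so big packets cost
 * proportionally more turns. */
static int drr_dequeue(struct flow *flows, int n, int *rr)
{
	int backlog = 0, i;

	for (i = 0; i < n; i++)
		backlog += flows[i].backlog_pkts;
	if (!backlog)
		return -1;

	for (;;) {
		struct flow *f = &flows[*rr % n];

		if (!f->backlog_pkts || f->deficit < f->pkt_len) {
			if (f->backlog_pkts)
				f->deficit += QUANTUM;
			(*rr)++;
			continue;
		}
		f->deficit -= f->pkt_len;
		f->backlog_pkts--;
		return *rr % n;
	}
}
```

With a 300-byte flow and a 1500-byte flow both backlogged, 12 dequeues yield 10 small packets and 2 large ones — 3000 bytes each — i.e. byte-fair service: the large-packet flow can no longer crowd out the other.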

The patch implements fq_codel-ish logic in
mac80211's software queuing. This doesn't directly
translate to immediate and significant gains.
Moreover, only wake_tx_queue-based drivers will
be able to reap the benefits of fair queuing for
now.

More work is required to make sure drivers keep
their device tx queues at minimum fill (instead of
clogging them up until they're full regardless of
link conditions). Only then the full effect of
fair queuing will be observable.

Signed-off-by: Michal Kazior <[email protected]>
---

Notes:
v2:
* fix invalid ptr deref (I accidentally removed `info` ptr assignment..)

v1:
* move txq_limit and txq_cparams from ieee80211_hw to ieee80211_fq
* remove printks
* improve commit log
* various cleanups
* extra stats
* split out the core txq fairness changes
* should_drop() doesn't consider bursts
* codel target is hardcoded to 20ms

RFC v2:
* actually re-use txq_flows on enqueue [Felix]
* tune should_drop() to consider bursts wrt station expected tput [Dave/Bob]
* make codel target time scale via ewma of estimated txqi service period [Dave]
* generic tx scheduling (with time-based queue limit and naive hysteresis)
* tracking per-frame expected duration
* tracking per-txqi in-flight data duration
* tracking per-hw in-flight data duration
? in-flight means scheduled to driver and assumes driver does report
tx-status on actual tx-completion
* added a few debugfs entries

include/net/mac80211.h | 21 ++-
net/mac80211/agg-tx.c | 8 +-
net/mac80211/codel.h | 264 +++++++++++++++++++++++++++++++
net/mac80211/codel_i.h | 89 +++++++++++
net/mac80211/ieee80211_i.h | 45 +++++-
net/mac80211/iface.c | 24 ++-
net/mac80211/main.c | 9 +-
net/mac80211/rx.c | 2 +-
net/mac80211/sta_info.c | 10 +-
net/mac80211/sta_info.h | 27 ++++
net/mac80211/tx.c | 377 ++++++++++++++++++++++++++++++++++++++++-----
net/mac80211/util.c | 20 ++-
12 files changed, 817 insertions(+), 79 deletions(-)
create mode 100644 net/mac80211/codel.h
create mode 100644 net/mac80211/codel_i.h

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index a53333cb1528..0ee51dbb361b 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -888,8 +888,18 @@ struct ieee80211_tx_info {
/* only needed before rate control */
unsigned long jiffies;
};
- /* NB: vif can be NULL for injected frames */
- struct ieee80211_vif *vif;
+ union {
+ /* NB: vif can be NULL for injected frames */
+ struct ieee80211_vif *vif;
+
+ /* When packets are enqueued on txq it's easy
+ * to re-construct the vif pointer. There's no
+ * more space in tx_info so it can be used to
+ * store the necessary enqueue time for packet
+ * sojourn time computation.
+ */
+ u64 enqueue_time;
+ };
struct ieee80211_key_conf *hw_key;
u32 flags;
/* 4 bytes free */
@@ -2113,9 +2123,6 @@ enum ieee80211_hw_flags {
* @n_cipher_schemes: a size of an array of cipher schemes definitions.
* @cipher_schemes: a pointer to an array of cipher scheme definitions
* supported by HW.
- *
- * @txq_ac_max_pending: maximum number of frames per AC pending in all txq
- * entries for a vif.
*/
struct ieee80211_hw {
struct ieee80211_conf conf;
@@ -2145,7 +2152,6 @@ struct ieee80211_hw {
u8 uapsd_max_sp_len;
u8 n_cipher_schemes;
const struct ieee80211_cipher_scheme *cipher_schemes;
- int txq_ac_max_pending;
};

static inline bool _ieee80211_hw_check(struct ieee80211_hw *hw,
@@ -5633,6 +5639,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
* txq state can change half-way of this function and the caller may end up
* with "new" frame_cnt and "old" byte_cnt or vice-versa.
*
+ * Moreover returned values are best-case, i.e. assuming queueing algorithm
+ * will not drop frames due to excess latency.
+ *
* @txq: pointer obtained from station or virtual interface
* @frame_cnt: pointer to store frame count
* @byte_cnt: pointer to store byte count
diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
index 4932e9f243a2..b9d0cee2a786 100644
--- a/net/mac80211/agg-tx.c
+++ b/net/mac80211/agg-tx.c
@@ -194,17 +194,21 @@ static void
ieee80211_agg_stop_txq(struct sta_info *sta, int tid)
{
struct ieee80211_txq *txq = sta->sta.txq[tid];
+ struct ieee80211_sub_if_data *sdata;
+ struct ieee80211_fq *fq;
struct txq_info *txqi;

if (!txq)
return;

txqi = to_txq_info(txq);
+ sdata = vif_to_sdata(txq->vif);
+ fq = &sdata->local->fq;

/* Lock here to protect against further seqno updates on dequeue */
- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);
set_bit(IEEE80211_TXQ_STOP, &txqi->flags);
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);
}

static void
diff --git a/net/mac80211/codel.h b/net/mac80211/codel.h
new file mode 100644
index 000000000000..e6470dbe5b0b
--- /dev/null
+++ b/net/mac80211/codel.h
@@ -0,0 +1,264 @@
+#ifndef __NET_MAC80211_CODEL_H
+#define __NET_MAC80211_CODEL_H
+
+/*
+ * Codel - The Controlled-Delay Active Queue Management algorithm
+ *
+ * Copyright (C) 2011-2012 Kathleen Nichols <[email protected]>
+ * Copyright (C) 2011-2012 Van Jacobson <[email protected]>
+ * Copyright (C) 2016 Michael D. Taht <[email protected]>
+ * Copyright (C) 2012 Eric Dumazet <[email protected]>
+ * Copyright (C) 2015 Jonathan Morton <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions, and the following disclaimer,
+ * without modification.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. The names of the authors may not be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License ("GPL") version 2, in which case the provisions of the
+ * GPL apply INSTEAD OF those given above.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ *
+ */
+
+#include <linux/version.h>
+#include <linux/types.h>
+#include <linux/ktime.h>
+#include <linux/skbuff.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <linux/reciprocal_div.h>
+
+#include "codel_i.h"
+
+/* Controlling Queue Delay (CoDel) algorithm
+ * =========================================
+ * Source : Kathleen Nichols and Van Jacobson
+ * http://queue.acm.org/detail.cfm?id=2209336
+ *
+ * Implemented on linux by Dave Taht and Eric Dumazet
+ */
+
+/* CoDel5 uses a real clock, unlike codel */
+
+static inline u64 codel_get_time(void)
+{
+ return ktime_get_ns();
+}
+
+static inline u32 codel_time_to_us(u64 val)
+{
+ do_div(val, NSEC_PER_USEC);
+ return (u32)val;
+}
+
+/* sizeof_in_bits(rec_inv_sqrt) */
+#define REC_INV_SQRT_BITS (8 * sizeof(u16))
+/* needed shift to get a Q0.32 number from rec_inv_sqrt */
+#define REC_INV_SQRT_SHIFT (32 - REC_INV_SQRT_BITS)
+
+/* Newton approximation method needs more iterations at small inputs,
+ * so cache them.
+ */
+
+static void codel_vars_init(struct codel_vars *vars)
+{
+ memset(vars, 0, sizeof(*vars));
+}
+
+/*
+ * http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Iterative_methods_for_reciprocal_square_roots
+ * new_invsqrt = (invsqrt / 2) * (3 - count * invsqrt^2)
+ *
+ * Here, invsqrt is a fixed point number (< 1.0), 32bit mantissa, aka Q0.32
+ */
+static inline void codel_Newton_step(struct codel_vars *vars)
+{
+ u32 invsqrt = ((u32)vars->rec_inv_sqrt) << REC_INV_SQRT_SHIFT;
+ u32 invsqrt2 = ((u64)invsqrt * invsqrt) >> 32;
+ u64 val = (3LL << 32) - ((u64)vars->count * invsqrt2);
+
+ val >>= 2; /* avoid overflow in following multiply */
+ val = (val * invsqrt) >> (32 - 2 + 1);
+
+ vars->rec_inv_sqrt = val >> REC_INV_SQRT_SHIFT;
+}
+
+/*
+ * CoDel control_law is t + interval/sqrt(count)
+ * We maintain in rec_inv_sqrt the reciprocal value of sqrt(count) to avoid
+ * both sqrt() and divide operation.
+ */
+static u64 codel_control_law(u64 t,
+ u64 interval,
+ u32 rec_inv_sqrt)
+{
+ return t + reciprocal_scale(interval, rec_inv_sqrt <<
+ REC_INV_SQRT_SHIFT);
+}
+
+/* Forward declaration of this for use elsewhere */
+
+static inline u64
+custom_codel_get_enqueue_time(struct sk_buff *skb);
+
+static inline struct sk_buff *
+custom_dequeue(struct codel_vars *vars, void *ptr);
+
+static inline void
+custom_drop(struct sk_buff *skb, void *ptr);
+
+static bool codel_should_drop(struct sk_buff *skb,
+ __u32 *backlog,
+ __u32 backlog_thr,
+ struct codel_vars *vars,
+ const struct codel_params *p,
+ u64 now)
+{
+ if (!skb) {
+ vars->first_above_time = 0;
+ return false;
+ }
+
+ if (now - custom_codel_get_enqueue_time(skb) < p->target ||
+ *backlog <= backlog_thr) {
+ /* went below - stay below for at least interval */
+ vars->first_above_time = 0;
+ return false;
+ }
+
+ if (vars->first_above_time == 0) {
+ /* just went above from below; mark the time */
+ vars->first_above_time = now + p->interval;
+
+ } else if (now > vars->first_above_time) {
+ return true;
+ }
+
+ return false;
+}
+
+static struct sk_buff *codel_dequeue(void *ptr,
+ __u32 *backlog,
+ __u32 backlog_thr,
+ struct codel_vars *vars,
+ struct codel_params *p,
+ u64 now,
+ bool overloaded)
+{
+ struct sk_buff *skb = custom_dequeue(vars, ptr);
+ bool drop;
+
+ if (!skb) {
+ vars->dropping = false;
+ return skb;
+ }
+ drop = codel_should_drop(skb, backlog, backlog_thr, vars, p, now);
+ if (vars->dropping) {
+ if (!drop) {
+ /* sojourn time below target - leave dropping state */
+ vars->dropping = false;
+ } else if (now >= vars->drop_next) {
+ /* It's time for the next drop. Drop the current
+ * packet and dequeue the next. The dequeue might
+ * take us out of dropping state.
+ * If not, schedule the next drop.
+ * A large backlog might result in drop rates so high
+ * that the next drop should happen now,
+ * hence the while loop.
+ */
+
+ /* saturating increment */
+ vars->count++;
+ if (!vars->count)
+ vars->count--;
+
+ codel_Newton_step(vars);
+ vars->drop_next = codel_control_law(vars->drop_next,
+ p->interval,
+ vars->rec_inv_sqrt);
+ do {
+ if (INET_ECN_set_ce(skb) && !overloaded) {
+ vars->ecn_mark++;
+ /* and schedule the next drop */
+ vars->drop_next = codel_control_law(
+ vars->drop_next, p->interval,
+ vars->rec_inv_sqrt);
+ goto end;
+ }
+ custom_drop(skb, ptr);
+ vars->drop_count++;
+ skb = custom_dequeue(vars, ptr);
+ if (skb && !codel_should_drop(skb, backlog,
+ backlog_thr,
+ vars, p, now)) {
+ /* leave dropping state */
+ vars->dropping = false;
+ } else {
+ /* schedule the next drop */
+ vars->drop_next = codel_control_law(
+ vars->drop_next, p->interval,
+ vars->rec_inv_sqrt);
+ }
+ } while (skb && vars->dropping && now >=
+ vars->drop_next);
+
+ /* Mark the packet regardless */
+ if (skb && INET_ECN_set_ce(skb))
+ vars->ecn_mark++;
+ }
+ } else if (drop) {
+ if (INET_ECN_set_ce(skb) && !overloaded) {
+ vars->ecn_mark++;
+ } else {
+ custom_drop(skb, ptr);
+ vars->drop_count++;
+
+ skb = custom_dequeue(vars, ptr);
+ drop = codel_should_drop(skb, backlog, backlog_thr,
+ vars, p, now);
+ if (skb && INET_ECN_set_ce(skb))
+ vars->ecn_mark++;
+ }
+ vars->dropping = true;
+ /* if min went above target close to when we last went below
+ * assume that the drop rate that controlled the queue on the
+ * last cycle is a good starting point to control it now.
+ */
+ if (vars->count > 2 &&
+ now - vars->drop_next < 8 * p->interval) {
+ vars->count -= 2;
+ codel_Newton_step(vars);
+ } else {
+ vars->count = 1;
+ vars->rec_inv_sqrt = ~0U >> REC_INV_SQRT_SHIFT;
+ }
+ codel_Newton_step(vars);
+ vars->drop_next = codel_control_law(now, p->interval,
+ vars->rec_inv_sqrt);
+ }
+end:
+ return skb;
+}
+#endif
diff --git a/net/mac80211/codel_i.h b/net/mac80211/codel_i.h
new file mode 100644
index 000000000000..60371121e526
--- /dev/null
+++ b/net/mac80211/codel_i.h
@@ -0,0 +1,89 @@
+#ifndef __NET_MAC80211_CODEL_I_H
+#define __NET_MAC80211_CODEL_I_H
+
+/*
+ * Codel - The Controlled-Delay Active Queue Management algorithm
+ *
+ * Copyright (C) 2011-2012 Kathleen Nichols <[email protected]>
+ * Copyright (C) 2011-2012 Van Jacobson <[email protected]>
+ * Copyright (C) 2016 Michael D. Taht <[email protected]>
+ * Copyright (C) 2012 Eric Dumazet <[email protected]>
+ * Copyright (C) 2015 Jonathan Morton <[email protected]>
+ * Copyright (C) 2016 Michal Kazior <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions, and the following disclaimer,
+ * without modification.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. The names of the authors may not be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License ("GPL") version 2, in which case the provisions of the
+ * GPL apply INSTEAD OF those given above.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ *
+ */
+
+#include <linux/version.h>
+#include <linux/types.h>
+#include <linux/ktime.h>
+#include <linux/skbuff.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <linux/reciprocal_div.h>
+
+/* Controlling Queue Delay (CoDel) algorithm
+ * =========================================
+ * Source : Kathleen Nichols and Van Jacobson
+ * http://queue.acm.org/detail.cfm?id=2209336
+ *
+ * Implemented on linux by Dave Taht and Eric Dumazet
+ */
+
+/* CoDel5 uses a real clock, unlike codel */
+
+#define MS2TIME(a) (a * (u64)NSEC_PER_MSEC)
+#define US2TIME(a) (a * (u64)NSEC_PER_USEC)
+
+/**
+ * struct codel_vars - contains codel variables
+ * @count: how many drops we've done since the last time we
+ * entered dropping state
+ * @dropping: set to > 0 if in dropping state
+ * @rec_inv_sqrt: reciprocal value of sqrt(count) >> 1
+ * @first_above_time: when we went (or will go) continuously above target
+ * for interval
+ * @drop_next: time to drop next packet, or when we dropped last
+ * @drop_count: temp count of dropped packets in dequeue()
+ * @ecn_mark: number of packets we ECN marked instead of dropping
+ */
+
+struct codel_vars {
+ u32 count;
+ u16 dropping;
+ u16 rec_inv_sqrt;
+ u64 first_above_time;
+ u64 drop_next;
+ u16 drop_count;
+ u16 ecn_mark;
+};
+#endif
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index c6830fbe7d68..3dc5192b6dd9 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -805,9 +805,18 @@ enum txq_info_flags {
};

struct txq_info {
- struct sk_buff_head queue;
+ struct txq_flow flow;
+ struct list_head new_flows;
+ struct list_head old_flows;
+ u32 backlog_bytes;
+ u32 backlog_packets;
+ u32 drop_codel;
+ u32 drop_overlimit;
+ u32 collisions;
+ u32 flows;
+ u32 tx_bytes;
+ u32 tx_packets;
unsigned long flags;
- unsigned long byte_cnt;

/* keep last! */
struct ieee80211_txq txq;
@@ -855,7 +864,6 @@ struct ieee80211_sub_if_data {
bool control_port_no_encrypt;
int encrypt_headroom;

- atomic_t txqs_len[IEEE80211_NUM_ACS];
struct ieee80211_tx_queue_params tx_conf[IEEE80211_NUM_ACS];
struct mac80211_qos_map __rcu *qos_map;

@@ -1092,11 +1100,37 @@ enum mac80211_scan_state {
SCAN_ABORT,
};

+/**
+ * struct codel_params - stores codel parameters
+ *
+ * @interval: initial drop rate
+ * @target: maximum persistent sojourn time
+ */
+struct codel_params {
+ u64 interval;
+ u64 target;
+};
+
+struct ieee80211_fq {
+ struct txq_flow *flows;
+ struct list_head backlogs;
+ struct codel_params cparams;
+ spinlock_t lock;
+ u32 flows_cnt;
+ u32 perturbation;
+ u32 txq_limit;
+ u32 quantum;
+ u32 backlog;
+ u32 drop_overlimit;
+ u32 drop_codel;
+};
+
struct ieee80211_local {
/* embed the driver visible part.
* don't cast (use the static inlines below), but we keep
* it first anyway so they become a no-op */
struct ieee80211_hw hw;
+ struct ieee80211_fq fq;

const struct ieee80211_ops *ops;

@@ -1928,6 +1962,11 @@ static inline bool ieee80211_can_run_worker(struct ieee80211_local *local)
void ieee80211_init_tx_queue(struct ieee80211_sub_if_data *sdata,
struct sta_info *sta,
struct txq_info *txq, int tid);
+void ieee80211_purge_txq(struct ieee80211_local *local, struct txq_info *txqi);
+void ieee80211_init_flow(struct txq_flow *flow);
+int ieee80211_setup_flows(struct ieee80211_local *local);
+void ieee80211_teardown_flows(struct ieee80211_local *local);
+
void ieee80211_send_auth(struct ieee80211_sub_if_data *sdata,
u16 transaction, u16 auth_alg, u16 status,
const u8 *extra, size_t extra_len, const u8 *bssid,
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 453b4e741780..1faea208edfc 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -779,6 +779,7 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
bool going_down)
{
struct ieee80211_local *local = sdata->local;
+ struct ieee80211_fq *fq = &local->fq;
unsigned long flags;
struct sk_buff *skb, *tmp;
u32 hw_reconf_flags = 0;
@@ -977,12 +978,9 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
if (sdata->vif.txq) {
struct txq_info *txqi = to_txq_info(sdata->vif.txq);

- spin_lock_bh(&txqi->queue.lock);
- ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
- txqi->byte_cnt = 0;
- spin_unlock_bh(&txqi->queue.lock);
-
- atomic_set(&sdata->txqs_len[txqi->txq.ac], 0);
+ spin_lock_bh(&fq->lock);
+ ieee80211_purge_txq(local, txqi);
+ spin_unlock_bh(&fq->lock);
}

if (local->open_count == 0)
@@ -1198,6 +1196,12 @@ static void ieee80211_if_setup(struct net_device *dev)
dev->destructor = ieee80211_if_free;
}

+static void ieee80211_if_setup_no_queue(struct net_device *dev)
+{
+ ieee80211_if_setup(dev);
+ dev->priv_flags |= IFF_NO_QUEUE;
+}
+
static void ieee80211_iface_work(struct work_struct *work)
{
struct ieee80211_sub_if_data *sdata =
@@ -1707,6 +1711,7 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
struct net_device *ndev = NULL;
struct ieee80211_sub_if_data *sdata = NULL;
struct txq_info *txqi;
+ void (*if_setup)(struct net_device *dev);
int ret, i;
int txqs = 1;

@@ -1734,12 +1739,17 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
txq_size += sizeof(struct txq_info) +
local->hw.txq_data_size;

+ if (local->ops->wake_tx_queue)
+ if_setup = ieee80211_if_setup_no_queue;
+ else
+ if_setup = ieee80211_if_setup;
+
if (local->hw.queues >= IEEE80211_NUM_ACS)
txqs = IEEE80211_NUM_ACS;

ndev = alloc_netdev_mqs(size + txq_size,
name, name_assign_type,
- ieee80211_if_setup, txqs, 1);
+ if_setup, txqs, 1);
if (!ndev)
return -ENOMEM;
dev_net_set(ndev, wiphy_net(local->hw.wiphy));
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 8190bf27ebff..9fd3b10ae52b 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -1053,9 +1053,6 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)

local->dynamic_ps_forced_timeout = -1;

- if (!local->hw.txq_ac_max_pending)
- local->hw.txq_ac_max_pending = 64;
-
result = ieee80211_wep_init(local);
if (result < 0)
wiphy_debug(local->hw.wiphy, "Failed to initialize wep: %d\n",
@@ -1087,6 +1084,10 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)

rtnl_unlock();

+ result = ieee80211_setup_flows(local);
+ if (result)
+ goto fail_flows;
+
#ifdef CONFIG_INET
local->ifa_notifier.notifier_call = ieee80211_ifa_changed;
result = register_inetaddr_notifier(&local->ifa_notifier);
@@ -1112,6 +1113,8 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
#if defined(CONFIG_INET) || defined(CONFIG_IPV6)
fail_ifa:
#endif
+ ieee80211_teardown_flows(local);
+ fail_flows:
rtnl_lock();
rate_control_deinitialize(local);
ieee80211_remove_interfaces(local);
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 91279576f4a7..10a6e9c3de51 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1268,7 +1268,7 @@ static void sta_ps_start(struct sta_info *sta)
for (tid = 0; tid < ARRAY_SIZE(sta->sta.txq); tid++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[tid]);

- if (!skb_queue_len(&txqi->queue))
+ if (!txqi->backlog_packets)
set_bit(tid, &sta->txq_buffered_tids);
else
clear_bit(tid, &sta->txq_buffered_tids);
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 00c82fb152c0..0729046a0144 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -112,11 +112,7 @@ static void __cleanup_single_sta(struct sta_info *sta)
if (sta->sta.txq[0]) {
for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[i]);
- int n = skb_queue_len(&txqi->queue);
-
- ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
- atomic_sub(n, &sdata->txqs_len[txqi->txq.ac]);
- txqi->byte_cnt = 0;
+ ieee80211_purge_txq(local, txqi);
}
}

@@ -1193,7 +1189,7 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta)
for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[i]);

- if (!skb_queue_len(&txqi->queue))
+ if (!txqi->backlog_packets)
continue;

drv_wake_tx_queue(local, txqi);
@@ -1630,7 +1626,7 @@ ieee80211_sta_ps_deliver_response(struct sta_info *sta,
for (tid = 0; tid < ARRAY_SIZE(sta->sta.txq); tid++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[tid]);

- if (!(tids & BIT(tid)) || skb_queue_len(&txqi->queue))
+ if (!(tids & BIT(tid)) || txqi->backlog_packets)
continue;

sta_info_recalc_tim(sta);
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 053f5c4fa495..4f7ad9158d31 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -19,6 +19,7 @@
#include <linux/etherdevice.h>
#include <linux/rhashtable.h>
#include "key.h"
+#include "codel_i.h"

/**
* enum ieee80211_sta_info_flags - Stations flags
@@ -330,6 +331,32 @@ struct mesh_sta {

DECLARE_EWMA(signal, 1024, 8)

+struct txq_info;
+
+/**
+ * struct txq_flow - per traffic flow queue
+ *
+ * This structure is used to distinguish and queue different traffic flows
+ * separately for fair queueing/AQM purposes.
+ *
+ * @txqi: txq_info structure it is associated at given time
+ * @flowchain: can be linked to other flows for RR purposes
+ * @backlogchain: can be linked to other flows for backlog sorting purposes
+ * @queue: sk_buff queue
+ * @cvars: codel state vars
+ * @backlog: number of bytes pending in the queue
+ * @deficit: used for fair queueing balancing
+ */
+struct txq_flow {
+ struct txq_info *txqi;
+ struct list_head flowchain;
+ struct list_head backlogchain;
+ struct sk_buff_head queue;
+ struct codel_vars cvars;
+ u32 backlog;
+ int deficit;
+};
+
/**
* struct sta_info - STA information
*
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 485e30a24b38..dd65e34f7107 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -34,6 +34,7 @@
#include "wpa.h"
#include "wme.h"
#include "rate.h"
+#include "codel.h"

/* misc utils */

@@ -1232,27 +1233,323 @@ ieee80211_tx_prepare(struct ieee80211_sub_if_data *sdata,
return TX_CONTINUE;
}

-static void ieee80211_drv_tx(struct ieee80211_local *local,
- struct ieee80211_vif *vif,
- struct ieee80211_sta *pubsta,
- struct sk_buff *skb)
+static inline u64
+custom_codel_get_enqueue_time(struct sk_buff *skb)
+{
+ return IEEE80211_SKB_CB(skb)->control.enqueue_time;
+}
+
+static inline struct sk_buff *
+flow_dequeue(struct ieee80211_local *local, struct txq_flow *flow)
+{
+ struct ieee80211_fq *fq = &local->fq;
+ struct txq_info *txqi = flow->txqi;
+ struct txq_flow *i;
+ struct sk_buff *skb;
+
+ skb = __skb_dequeue(&flow->queue);
+ if (!skb)
+ return NULL;
+
+ txqi->backlog_bytes -= skb->len;
+ txqi->backlog_packets--;
+ flow->backlog -= skb->len;
+ fq->backlog--;
+
+ if (flow->backlog == 0) {
+ list_del_init(&flow->backlogchain);
+ } else {
+ i = flow;
+
+ list_for_each_entry_continue(i, &fq->backlogs, backlogchain)
+ if (i->backlog < flow->backlog)
+ break;
+
+ list_move_tail(&flow->backlogchain, &i->backlogchain);
+ }
+
+ return skb;
+}
+
+static inline struct sk_buff *
+custom_dequeue(struct codel_vars *vars, void *ptr)
+{
+ struct txq_flow *flow = ptr;
+ struct txq_info *txqi = flow->txqi;
+ struct ieee80211_vif *vif = txqi->txq.vif;
+ struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
+ struct ieee80211_local *local = sdata->local;
+
+ return flow_dequeue(local, flow);
+}
+
+static inline void
+custom_drop(struct sk_buff *skb, void *ptr)
+{
+ struct txq_flow *flow = ptr;
+ struct txq_info *txqi = flow->txqi;
+ struct ieee80211_vif *vif = txqi->txq.vif;
+ struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
+ struct ieee80211_local *local = sdata->local;
+ struct ieee80211_hw *hw = &local->hw;
+
+ ieee80211_free_txskb(hw, skb);
+
+ txqi->drop_codel++;
+ local->fq.drop_codel++;
+}
+
+static u32 fq_hash(struct ieee80211_fq *fq, struct sk_buff *skb)
+{
+ u32 hash = skb_get_hash_perturb(skb, fq->perturbation);
+
+ return reciprocal_scale(hash, fq->flows_cnt);
+}
+
+static void fq_drop(struct ieee80211_local *local)
+{
+ struct ieee80211_hw *hw = &local->hw;
+ struct ieee80211_fq *fq = &local->fq;
+ struct txq_flow *flow;
+ struct sk_buff *skb;
+
+ flow = list_first_entry_or_null(&fq->backlogs, struct txq_flow,
+ backlogchain);
+ if (WARN_ON_ONCE(!flow))
+ return;
+
+ skb = flow_dequeue(local, flow);
+ if (WARN_ON_ONCE(!skb))
+ return;
+
+ ieee80211_free_txskb(hw, skb);
+
+ flow->txqi->drop_overlimit++;
+ fq->drop_overlimit++;
+}
+
+void ieee80211_init_flow(struct txq_flow *flow)
+{
+ INIT_LIST_HEAD(&flow->flowchain);
+ INIT_LIST_HEAD(&flow->backlogchain);
+ __skb_queue_head_init(&flow->queue);
+ codel_vars_init(&flow->cvars);
+}
+
+int ieee80211_setup_flows(struct ieee80211_local *local)
+{
+ struct ieee80211_fq *fq = &local->fq;
+ int i;
+
+ if (!local->ops->wake_tx_queue)
+ return 0;
+
+ memset(fq, 0, sizeof(fq[0]));
+ INIT_LIST_HEAD(&fq->backlogs);
+ spin_lock_init(&fq->lock);
+ fq->flows_cnt = 4096;
+ fq->perturbation = prandom_u32();
+ fq->quantum = 300;
+ fq->txq_limit = 8192;
+ fq->cparams.target = MS2TIME(20);
+ fq->cparams.interval = MS2TIME(100);
+
+ fq->flows = kcalloc(fq->flows_cnt, sizeof(fq->flows[0]), GFP_KERNEL);
+ if (!fq->flows)
+ return -ENOMEM;
+
+ for (i = 0; i < fq->flows_cnt; i++)
+ ieee80211_init_flow(&fq->flows[i]);
+
+ return 0;
+}
+
+static void ieee80211_reset_flow(struct ieee80211_local *local,
+ struct txq_flow *flow)
+{
+ if (!list_empty(&flow->flowchain))
+ list_del_init(&flow->flowchain);
+
+ if (!list_empty(&flow->backlogchain))
+ list_del_init(&flow->backlogchain);
+
+ ieee80211_purge_tx_queue(&local->hw, &flow->queue);
+
+ flow->deficit = 0;
+ flow->txqi = NULL;
+}
+
+void ieee80211_purge_txq(struct ieee80211_local *local, struct txq_info *txqi)
+{
+ struct txq_flow *flow;
+ int i;
+
+ for (i = 0; i < local->fq.flows_cnt; i++) {
+ flow = &local->fq.flows[i];
+
+ if (flow->txqi != txqi)
+ continue;
+
+ ieee80211_reset_flow(local, flow);
+ }
+
+ ieee80211_reset_flow(local, &txqi->flow);
+
+ txqi->backlog_bytes = 0;
+ txqi->backlog_packets = 0;
+}
+
+void ieee80211_teardown_flows(struct ieee80211_local *local)
+{
+ struct ieee80211_fq *fq = &local->fq;
+ struct ieee80211_sub_if_data *sdata;
+ struct sta_info *sta;
+ int i;
+
+ if (!local->ops->wake_tx_queue)
+ return;
+
+ list_for_each_entry_rcu(sta, &local->sta_list, list)
+ for (i = 0; i < IEEE80211_NUM_TIDS; i++)
+ ieee80211_purge_txq(local,
+ to_txq_info(sta->sta.txq[i]));
+
+ list_for_each_entry_rcu(sdata, &local->interfaces, list)
+ ieee80211_purge_txq(local, to_txq_info(sdata->vif.txq));
+
+ for (i = 0; i < fq->flows_cnt; i++)
+ ieee80211_reset_flow(local, &fq->flows[i]);
+
+ kfree(fq->flows);
+
+ fq->flows = NULL;
+ fq->flows_cnt = 0;
+}
+
+static void ieee80211_txq_enqueue(struct ieee80211_local *local,
+ struct txq_info *txqi,
+ struct sk_buff *skb)
+{
+ struct ieee80211_fq *fq = &local->fq;
+ struct txq_flow *flow;
+ struct txq_flow *i;
+ size_t idx = fq_hash(fq, skb);
+
+ lockdep_assert_held(&fq->lock);
+
+ flow = &fq->flows[idx];
+
+ if (flow->txqi && flow->txqi != txqi) {
+ flow = &txqi->flow;
+ txqi->collisions++;
+ }
+
+ if (!flow->txqi)
+ txqi->flows++;
+
+ /* The following overwrites `vif` pointer effectively. It is later
+ * restored using txq structure.
+ */
+ IEEE80211_SKB_CB(skb)->control.enqueue_time = codel_get_time();
+
+ flow->txqi = txqi;
+ flow->backlog += skb->len;
+ txqi->backlog_bytes += skb->len;
+ txqi->backlog_packets++;
+ fq->backlog++;
+
+ if (list_empty(&flow->backlogchain))
+ list_add_tail(&flow->backlogchain, &fq->backlogs);
+
+ i = flow;
+ list_for_each_entry_continue_reverse(i, &fq->backlogs, backlogchain)
+ if (i->backlog > flow->backlog)
+ break;
+
+ list_move(&flow->backlogchain, &i->backlogchain);
+
+ if (list_empty(&flow->flowchain)) {
+ flow->deficit = fq->quantum;
+ list_add_tail(&flow->flowchain, &txqi->new_flows);
+ }
+
+ __skb_queue_tail(&flow->queue, skb);
+
+ if (fq->backlog > fq->txq_limit)
+ fq_drop(local);
+}
+
+static struct sk_buff *ieee80211_txq_dequeue(struct ieee80211_local *local,
+ struct txq_info *txqi)
+{
+ struct ieee80211_fq *fq = &local->fq;
+ struct txq_flow *flow;
+ struct list_head *head;
+ struct sk_buff *skb;
+
+begin:
+ head = &txqi->new_flows;
+ if (list_empty(head)) {
+ head = &txqi->old_flows;
+ if (list_empty(head))
+ return NULL;
+ }
+
+ flow = list_first_entry(head, struct txq_flow, flowchain);
+
+ if (flow->deficit <= 0) {
+ flow->deficit += fq->quantum;
+ list_move_tail(&flow->flowchain, &txqi->old_flows);
+ goto begin;
+ }
+
+ skb = codel_dequeue(flow,
+ &flow->backlog,
+ 0,
+ &flow->cvars,
+ &fq->cparams,
+ codel_get_time(),
+ false);
+ if (!skb) {
+ if ((head == &txqi->new_flows) &&
+ !list_empty(&txqi->old_flows)) {
+ list_move_tail(&flow->flowchain, &txqi->old_flows);
+ } else {
+ list_del_init(&flow->flowchain);
+ flow->txqi = NULL;
+ }
+ goto begin;
+ }
+
+ flow->deficit -= skb->len;
+ txqi->tx_bytes += skb->len;
+ txqi->tx_packets++;
+
+ /* The `vif` pointer was overwritten with enqueue time during
+ * enqueuing. Restore it before handing to driver.
+ */
+ IEEE80211_SKB_CB(skb)->control.vif = flow->txqi->txq.vif;
+
+ return skb;
+}
+
+static struct txq_info *
+ieee80211_get_txq(struct ieee80211_local *local,
+ struct ieee80211_vif *vif,
+ struct ieee80211_sta *pubsta,
+ struct sk_buff *skb)
{
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
- struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
- struct ieee80211_tx_control control = {
- .sta = pubsta,
- };
struct ieee80211_txq *txq = NULL;
- struct txq_info *txqi;
- u8 ac;

if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) ||
+ (info->flags & IEEE80211_TX_INTFL_OFFCHAN_TX_OK) ||
(info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE))
- goto tx_normal;
+ return NULL;

if (!ieee80211_is_data(hdr->frame_control))
- goto tx_normal;
+ return NULL;

if (pubsta) {
u8 tid = skb->priority & IEEE80211_QOS_CTL_TID_MASK;
@@ -1263,52 +1560,29 @@ static void ieee80211_drv_tx(struct ieee80211_local *local,
}

if (!txq)
- goto tx_normal;
+ return NULL;

- ac = txq->ac;
- txqi = to_txq_info(txq);
- atomic_inc(&sdata->txqs_len[ac]);
- if (atomic_read(&sdata->txqs_len[ac]) >= local->hw.txq_ac_max_pending)
- netif_stop_subqueue(sdata->dev, ac);
-
- spin_lock_bh(&txqi->queue.lock);
- txqi->byte_cnt += skb->len;
- __skb_queue_tail(&txqi->queue, skb);
- spin_unlock_bh(&txqi->queue.lock);
-
- drv_wake_tx_queue(local, txqi);
-
- return;
-
-tx_normal:
- drv_tx(local, &control, skb);
+ return to_txq_info(txq);
}

struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
struct ieee80211_txq *txq)
{
struct ieee80211_local *local = hw_to_local(hw);
- struct ieee80211_sub_if_data *sdata = vif_to_sdata(txq->vif);
+ struct ieee80211_fq *fq = &local->fq;
struct txq_info *txqi = container_of(txq, struct txq_info, txq);
struct ieee80211_hdr *hdr;
struct sk_buff *skb = NULL;
- u8 ac = txq->ac;

- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);

if (test_bit(IEEE80211_TXQ_STOP, &txqi->flags))
goto out;

- skb = __skb_dequeue(&txqi->queue);
+ skb = ieee80211_txq_dequeue(local, txqi);
if (!skb)
goto out;

- txqi->byte_cnt -= skb->len;
-
- atomic_dec(&sdata->txqs_len[ac]);
- if (__netif_subqueue_stopped(sdata->dev, ac))
- ieee80211_propagate_queue_wake(local, sdata->vif.hw_queue[ac]);
-
hdr = (struct ieee80211_hdr *)skb->data;
if (txq->sta && ieee80211_is_data_qos(hdr->frame_control)) {
struct sta_info *sta = container_of(txq->sta, struct sta_info,
@@ -1323,7 +1597,7 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
}

out:
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);

return skb;
}
@@ -1335,7 +1609,10 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
struct sk_buff_head *skbs,
bool txpending)
{
+ struct ieee80211_fq *fq = &local->fq;
+ struct ieee80211_tx_control control = {};
struct sk_buff *skb, *tmp;
+ struct txq_info *txqi;
unsigned long flags;

skb_queue_walk_safe(skbs, skb, tmp) {
@@ -1350,6 +1627,21 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
}
#endif

+ txqi = ieee80211_get_txq(local, vif, sta, skb);
+ if (txqi) {
+ info->control.vif = vif;
+
+ __skb_unlink(skb, skbs);
+
+ spin_lock_bh(&fq->lock);
+ ieee80211_txq_enqueue(local, txqi, skb);
+ spin_unlock_bh(&fq->lock);
+
+ drv_wake_tx_queue(local, txqi);
+
+ continue;
+ }
+
spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
if (local->queue_stop_reasons[q] ||
(!txpending && !skb_queue_empty(&local->pending[q]))) {
@@ -1392,9 +1684,10 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);

info->control.vif = vif;
+ control.sta = sta;

__skb_unlink(skb, skbs);
- ieee80211_drv_tx(local, vif, sta, skb);
+ drv_tx(local, &control, skb);
}

return true;
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 0319d6d4f863..cbcdf7cf9679 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -244,6 +244,9 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue)
struct ieee80211_sub_if_data *sdata;
int n_acs = IEEE80211_NUM_ACS;

+ if (local->ops->wake_tx_queue)
+ return;
+
if (local->hw.queues < IEEE80211_NUM_ACS)
n_acs = 1;

@@ -260,11 +263,6 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue)
for (ac = 0; ac < n_acs; ac++) {
int ac_queue = sdata->vif.hw_queue[ac];

- if (local->ops->wake_tx_queue &&
- (atomic_read(&sdata->txqs_len[ac]) >
- local->hw.txq_ac_max_pending))
- continue;
-
if (ac_queue == queue ||
(sdata->vif.cab_queue == queue &&
local->queue_stop_reasons[ac_queue] == 0 &&
@@ -352,6 +350,9 @@ static void __ieee80211_stop_queue(struct ieee80211_hw *hw, int queue,
if (__test_and_set_bit(reason, &local->queue_stop_reasons[queue]))
return;

+ if (local->ops->wake_tx_queue)
+ return;
+
if (local->hw.queues < IEEE80211_NUM_ACS)
n_acs = 1;

@@ -3392,8 +3393,11 @@ void ieee80211_init_tx_queue(struct ieee80211_sub_if_data *sdata,
struct sta_info *sta,
struct txq_info *txqi, int tid)
{
- skb_queue_head_init(&txqi->queue);
+ INIT_LIST_HEAD(&txqi->old_flows);
+ INIT_LIST_HEAD(&txqi->new_flows);
+ ieee80211_init_flow(&txqi->flow);
txqi->txq.vif = &sdata->vif;
+ txqi->flow.txqi = txqi;

if (sta) {
txqi->txq.sta = &sta->sta;
@@ -3414,9 +3418,9 @@ void ieee80211_txq_get_depth(struct ieee80211_txq *txq,
struct txq_info *txqi = to_txq_info(txq);

if (frame_cnt)
- *frame_cnt = txqi->queue.qlen;
+ *frame_cnt = txqi->backlog_packets;

if (byte_cnt)
- *byte_cnt = txqi->byte_cnt;
+ *byte_cnt = txqi->backlog_bytes;
}
EXPORT_SYMBOL(ieee80211_txq_get_depth);
--
2.1.4
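
The new/old flow rotation that ieee80211_txq_dequeue() implements in the
patch above can be sketched in Python (a simplified model, not the kernel
code: packet byte lengths stand in for skbs, and an empty per-flow queue
stands in for codel_dequeue() returning NULL):

```python
from collections import deque

class Flow:
    def __init__(self, quantum):
        self.queue = deque()     # pending packet lengths (stand-ins for skbs)
        self.deficit = quantum   # DRR byte credit, refilled one quantum at a time

def drr_dequeue(new_flows, old_flows, quantum):
    """Mirror the new/old list rotation of ieee80211_txq_dequeue().

    Returns the next packet length, or None when both lists are empty.
    """
    while True:
        head = new_flows if new_flows else old_flows
        if not head:
            return None
        flow = head[0]
        if flow.deficit <= 0:
            # Out of credit: refill and demote to the tail of the old list.
            flow.deficit += quantum
            head.remove(flow)
            old_flows.append(flow)
            continue
        if not flow.queue:
            if head is new_flows and old_flows:
                # An emptied new flow gets one more round on the old list.
                head.remove(flow)
                old_flows.append(flow)
            else:
                head.remove(flow)
            continue
        pkt = flow.queue.popleft()
        flow.deficit -= pkt
        return pkt
```

The quantum/deficit bookkeeping is what keeps a single aggressive flow from
starving the others: a flow that overspends its byte credit is rotated to the
tail of the old list before it may transmit again.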


2016-03-16 10:26:13

by Michal Kazior

Subject: Re: [RFCv2 0/3] mac80211: implement fq codel

On 16 March 2016 at 11:17, Michal Kazior <[email protected]> wrote:
> Hi,
>
> Most notable changes:
[...]
> * ath10k proof-of-concept that uses the new tx
> scheduling (will post results in separate
> email)

I'm attaching a bunch of tests I've done using flent. They are all
"burst" tests with burst-ports=1 and burst-length=2. The testing
topology is:

AP ----> STA
AP )) (( STA
[veth]--[br]--[wlan] )) (( [wlan]

You can notice that in some tests plot data gets cut-off. There are 2
problems I've identified:
- excess drops (not a problem with the patchset and can be seen when
there's no codel-in-mac or scheduling isn't used)
- UDP_RR hangs (apparently the QCA99X0 I have sometimes hangs for a
few hundred ms and doesn't Rx frames, causing UDP_RR to stop
mid-way; confirmed with logs and a sniffer; I haven't figured out *why*
exactly, could be some hw/fw quirk)

Let me know if you have questions or comments regarding my testing/results.


Michał


Attachments:
fq.tar.gz (62.26 kB)

2016-03-31 10:26:19

by Michal Kazior

Subject: [PATCHv2 0/2] mac80211: implement fq_codel

Hi,

I've cleaned up and removed the
txop-queue-limiting and scheduling from the patch
(compared to my last RFC). It's still too early for
the scheduling thing to go prime time.

The fair queuing on the other hand does seem to
work. In good RF conditions it seems to improve
things (e.g. multiple TCP streams converge into a
steady average). In bad RF conditions things look
just as grim as before (but not worse).

I've done a few more experiments with naive DQL in
ath10k and some flent tests prove the fair queuing
in mac80211 works better than fq_codel qdisc as
far as wake_tx_queue drivers are concerned. I'll
be posting a separate thread after this.

This is based on mac80211-next/master
(0a87cadbb54e1595a5f64542adb4c63be914d290).

v2:
* fix invalid ptr deref
* fix compilation for backports


Michal Kazior (2):
mac80211: implement fair queuing per txq
mac80211: expose some txq/fq internals and knobs via debugfs

include/net/mac80211.h | 21 ++-
net/mac80211/agg-tx.c | 8 +-
net/mac80211/codel.h | 264 +++++++++++++++++++++++++++++
net/mac80211/codel_i.h | 89 ++++++++++
net/mac80211/debugfs.c | 86 ++++++++++
net/mac80211/debugfs_netdev.c | 29 +++-
net/mac80211/debugfs_sta.c | 46 +++++
net/mac80211/ieee80211_i.h | 45 ++++-
net/mac80211/iface.c | 24 ++-
net/mac80211/main.c | 9 +-
net/mac80211/rx.c | 2 +-
net/mac80211/sta_info.c | 10 +-
net/mac80211/sta_info.h | 27 +++
net/mac80211/tx.c | 383 +++++++++++++++++++++++++++++++++++++-----
net/mac80211/util.c | 20 ++-
15 files changed, 983 insertions(+), 80 deletions(-)
create mode 100644 net/mac80211/codel.h
create mode 100644 net/mac80211/codel_i.h

--
2.1.4
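
For context, the dropping decision implemented by the codel.h header these
patches add can be sketched roughly as follows (a simplified model of the
CoDel control law, not a transcription of the kernel code; the 5ms target
and 100ms interval are the usual CoDel defaults and only assumptions here):

```python
import math

def codel_should_drop(sojourn_ms, now_ms, state,
                      target_ms=5.0, interval_ms=100.0):
    """Decide whether to drop the head packet given its queue sojourn time.

    state carries first_above_time, dropping, drop_next and count across
    calls, mimicking the per-flow codel vars.
    """
    if sojourn_ms < target_ms:
        # Below target: forget any pending drop state.
        state['first_above_time'] = 0
        state['dropping'] = False
        return False
    if state['first_above_time'] == 0:
        # Start the clock: drop only if we stay above target a full interval.
        state['first_above_time'] = now_ms + interval_ms
        return False
    if now_ms < state['first_above_time']:
        return False
    if not state['dropping']:
        state['dropping'] = True
        state['count'] = 1
        state['drop_next'] = now_ms + interval_ms / math.sqrt(state['count'])
        return True
    if now_ms >= state['drop_next']:
        # Successive drops come sooner: the interval/sqrt(count) control law.
        state['count'] += 1
        state['drop_next'] = now_ms + interval_ms / math.sqrt(state['count'])
        return True
    return False
```

The "EWMA codel target based on estimated service time" from the RFCv2 cover
letter amounts to replacing the fixed target above with one derived from how
long frames actually take to get on the air.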


2016-03-16 10:15:56

by Michal Kazior

Subject: [RFCv2 3/3] ath10k: use ieee80211_tx_schedule()

The wake_tx_queue() scheduling for non-pull-push
(or threshold based pull-push) was neither smart
nor fair.

Instead use the new mac80211 Tx scheduling helper
which is part of the fq-codel-in-mac80211 proof of
concept.

Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/core.c | 2 -
drivers/net/wireless/ath/ath10k/core.h | 3 -
drivers/net/wireless/ath/ath10k/mac.c | 100 ++++++++++++++++-----------------
3 files changed, 50 insertions(+), 55 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 2389c0713c13..1848e0b6206a 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -2049,9 +2049,7 @@ struct ath10k *ath10k_core_create(size_t priv_size, struct device *dev,

mutex_init(&ar->conf_mutex);
spin_lock_init(&ar->data_lock);
- spin_lock_init(&ar->txqs_lock);

- INIT_LIST_HEAD(&ar->txqs);
INIT_LIST_HEAD(&ar->peers);
init_waitqueue_head(&ar->peer_mapping_wq);
init_waitqueue_head(&ar->htt.empty_tx_wq);
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 3f76669d44cf..1f94af046f79 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -801,10 +801,7 @@ struct ath10k {

/* protects shared structure data */
spinlock_t data_lock;
- /* protects: ar->txqs, artxq->list */
- spinlock_t txqs_lock;

- struct list_head txqs;
struct list_head arvifs;
struct list_head peers;
struct ath10k_peer *peer_map[ATH10K_MAX_NUM_PEER_IDS];
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index addef9179dbe..5b8becb30b62 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -3624,17 +3624,14 @@ void ath10k_mgmt_over_wmi_tx_work(struct work_struct *work)

static void ath10k_mac_txq_init(struct ieee80211_txq *txq)
{
- struct ath10k_txq *artxq = (void *)txq->drv_priv;
-
if (!txq)
return;

- INIT_LIST_HEAD(&artxq->list);
+ /* It's useful to keep this even though it doesn't do anything now */
}

static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq *txq)
{
- struct ath10k_txq *artxq = (void *)txq->drv_priv;
struct ath10k_skb_cb *cb;
struct sk_buff *msdu;
int msdu_id;
@@ -3642,11 +3639,6 @@ static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq *txq)
if (!txq)
return;

- spin_lock_bh(&ar->txqs_lock);
- if (!list_empty(&artxq->list))
- list_del_init(&artxq->list);
- spin_unlock_bh(&ar->txqs_lock);
-
spin_lock_bh(&ar->htt.tx_lock);
idr_for_each_entry(&ar->htt.pending_tx, msdu, msdu_id) {
cb = ATH10K_SKB_CB(msdu);
@@ -3725,7 +3717,7 @@ int ath10k_mac_tx_push_txq(struct ieee80211_hw *hw,
ath10k_htt_tx_dec_pending(htt, is_mgmt);
spin_unlock_bh(&ar->htt.tx_lock);

- return -ENOENT;
+ return 0;
}

ath10k_mac_tx_h_fill_cb(ar, vif, txq, skb);
@@ -3752,44 +3744,59 @@ int ath10k_mac_tx_push_txq(struct ieee80211_hw *hw,
return skb_len;
}

+#define MIN_TX_SLOTS 42
+
+static int ath10k_mac_tx_wake(struct ieee80211_hw *hw,
+ struct ieee80211_txq *txq,
+ int budget)
+{
+ struct ath10k *ar = hw->priv;
+ int sent = 0;
+ int ret;
+ int slots;
+
+ if (!ath10k_mac_tx_can_push(hw, txq))
+ return 0;
+
+ /* This gives more opportunity to form longer bursts when a lot of
+ * stations are active
+ */
+ slots = ar->htt.max_num_pending_tx - ar->htt.num_pending_tx;
+ if (slots < MIN_TX_SLOTS)
+ return -1;
+
+ while (budget > 0) {
+ ret = ath10k_mac_tx_push_txq(hw, txq);
+ if (ret <= 0)
+ break;
+
+ sent++;
+ budget -= ret;
+ }
+
+ if (ret >= 0)
+ return sent;
+ else
+ return -1;
+}
+
void ath10k_mac_tx_push_pending(struct ath10k *ar)
{
struct ieee80211_hw *hw = ar->hw;
- struct ieee80211_txq *txq;
- struct ath10k_txq *artxq;
- struct ath10k_txq *last;
int ret;
- int max;

- spin_lock_bh(&ar->txqs_lock);
- rcu_read_lock();
-
- last = list_last_entry(&ar->txqs, struct ath10k_txq, list);
- while (!list_empty(&ar->txqs)) {
- artxq = list_first_entry(&ar->txqs, struct ath10k_txq, list);
- txq = container_of((void *)artxq, struct ieee80211_txq,
- drv_priv);
-
- /* Prevent aggressive sta/tid taking over tx queue */
- max = 16;
- while (max--) {
- ret = ath10k_mac_tx_push_txq(hw, txq);
- if (ret < 0)
- break;
- }
-
- list_del_init(&artxq->list);
- ath10k_htt_tx_txq_update(hw, txq);
-
- if (artxq == last || (ret < 0 && ret != -ENOENT)) {
- if (ret != -ENOENT)
- list_add_tail(&artxq->list, &ar->txqs);
- break;
- }
+ ieee80211_recalc_fq_period(hw);
+
+ ret = ieee80211_tx_schedule(hw, ath10k_mac_tx_wake);
+ switch (ret) {
+ default:
+ ath10k_warn(ar, "unexpected tx schedule retval: %d\n",
+ ret);
+ /* pass through */
+ case -EBUSY:
+ case -ENOENT:
+ break;
}
-
- rcu_read_unlock();
- spin_unlock_bh(&ar->txqs_lock);
}

/************/
@@ -4013,16 +4020,9 @@ static void ath10k_mac_op_wake_tx_queue(struct ieee80211_hw *hw,
struct ieee80211_txq *txq)
{
struct ath10k *ar = hw->priv;
- struct ath10k_txq *artxq = (void *)txq->drv_priv;
-
- if (ath10k_mac_tx_can_push(hw, txq)) {
- spin_lock_bh(&ar->txqs_lock);
- if (list_empty(&artxq->list))
- list_add_tail(&artxq->list, &ar->txqs);
- spin_unlock_bh(&ar->txqs_lock);

+ if (ath10k_mac_tx_can_push(hw, txq))
tasklet_schedule(&ar->htt.txrx_compl_task);
- }

ath10k_htt_tx_txq_update(hw, txq);
}
--
2.1.4
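
The budget-driven push loop in ath10k_mac_tx_wake() above can be modeled as
a short standalone sketch (push_one() stands in for ath10k_mac_tx_push_txq()
and free_slots for the free htt tx descriptor count; both names are
illustrative only):

```python
MIN_TX_SLOTS = 42  # same headroom threshold the patch defines

def tx_wake(push_one, budget, free_slots):
    """Push frames from one txq until the scheduler-provided budget runs out.

    push_one() returns the pushed frame length, 0 when the queue is empty,
    or a negative errno on failure.  Returns the number of frames sent, or
    -1 to tell the scheduler to stop early.
    """
    if free_slots < MIN_TX_SLOTS:
        # Keep enough hw descriptors free so long bursts can still form
        # when many stations are active.
        return -1
    sent = 0
    ret = 0
    while budget > 0:
        ret = push_one()
        if ret <= 0:
            break
        sent += 1
        budget -= ret
    return sent if ret >= 0 else -1
```

Returning -1 (rather than 0) presumably signals ieee80211_tx_schedule() to
stop iterating over txqs instead of merely skipping this one.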


2016-03-16 18:55:39

by Bob Copeland

Subject: Re: [RFCv2 0/3] mac80211: implement fq codel

On Wed, Mar 16, 2016 at 11:36:31AM -0700, Dave Taht wrote:
> That is the sanest 802.11e queue behavior I have ever seen! (at both
> 6 and 300mbit! in the ath10k patched mac test)

Out of curiosity, why does BE have larger latency than BK in that chart?
I'd have expected the opposite.

--
Bob Copeland %% http://bobcopeland.com/

2016-03-17 08:55:06

by Michal Kazior

Subject: Re: [RFCv2 0/3] mac80211: implement fq codel

TxOP 0 has a special meaning in the standard. For HT/VHT it means it
is actually limited to 5484us (mixed-mode) or 10000us (greenfield).

I suspect the BK/BE latency difference has to do with the fact that
there's bulk traffic going on BE queues (this isn't reflected
explicitly in the plots). The `bursts` flent test includes short
bursts of traffic on tid0 (BE) which is shared with ICMP and BE UDP_RR
(seen as green and blue lines on the plot). Due to (intended) limited
outflow (6mbps) BE queues build up and don't drain for the duration of
the entire test creating more opportunities for aggregating BE traffic
while other queues are near-empty and very short (time wise as well).
If you consider Wi-Fi is half-duplex and latency in the entire stack
(for processing ICMP and UDP_RR) is greater than 11e contention window
timings you can get your BE flow responses with extra delay (since
other queues might have responses ready quicker).

I've modified traffic-gen and re-run tests with bursts on all tested
tids/ACs (tid0, tid1, tid5). I'm attaching the results.

With bursts on all tids you can clearly see BK has much higher latency than BE.

(Note, I've changed my AP to QCA988X with oldie firmware 10.1.467 for
this test; it doesn't have the weird hiccups I was seeing on QCA99X0
and newer QCA988X firmware reports bogus expected throughput which is
most likely a result of my sloppy proof-of-concept change in ath10k).


Michał

On 16 March 2016 at 20:48, Jasmine Strong <[email protected]> wrote:
> BK usually has 0 txop, so it doesn't do aggregation.
>
> On Wed, Mar 16, 2016 at 11:55 AM, Bob Copeland <[email protected]> wrote:
>>
>> On Wed, Mar 16, 2016 at 11:36:31AM -0700, Dave Taht wrote:
>> > That is the sanest 802.11e queue behavior I have ever seen! (at both
>> > 6 and 300mbit! in the ath10k patched mac test)
>>
>> Out of curiosity, why does BE have larger latency than BK in that chart?
>> I'd have expected the opposite.
>>
>> --
>> Bob Copeland %% http://bobcopeland.com/
>>
>> _______________________________________________
>> ath10k mailing list
>> [email protected]
>> http://lists.infradead.org/mailman/listinfo/ath10k
>
>


Attachments:
bursts-2016-03-17T083932.549858.qca988x_10_1_467_fqmac_ath10k_with_tx_sched_6mbps_.flent.gz (14.31 kB)
bursts-2016-03-17T083803.348752.qca988x_10_1_467_fqmac_ath10k_with_tx_sched_6mbps_.flent.gz (14.68 kB)

2016-03-17 17:00:07

by Dave Taht

Subject: Re: [RFCv2 0/3] mac80211: implement fq codel

On Thu, Mar 17, 2016 at 1:55 AM, Michal Kazior <[email protected]> wrote:

> I suspect the BK/BE latency difference has to do with the fact that
> there's bulk traffic going on BE queues (this isn't reflected
> explicitly in the plots). The `bursts` flent test includes short
> bursts of traffic on tid0 (BE) which is shared with ICMP and BE UDP_RR
> (seen as green and blue lines on the plot). Due to (intended) limited
> outflow (6mbps) BE queues build up and don't drain for the duration of
> the entire test creating more opportunities for aggregating BE traffic
> while other queues are near-empty and very short (time wise as well).

I agree with your explanation. Access to the media and queue length
are the two variables at play here.

I just committed a new flent test that should exercise the vo, vi, be,
and bk queues, "bursts_11e". I dropped the conventional ping from it
and just rely on netperf's udp_rr for each queue. It seems to "do the
right thing" on the ath9k....

And while I'm all in favor of getting 802.11e's behaviors more right,
and this seems like a good way to get there...

netperf's udp_rr is not how much traffic conventionally behaves. It
doesn't do tcp slow start or congestion control in particular...

In the case of the VO queue, for example, the (2004) intended behavior
was 1 isochronous packet per 10ms per voice sending station and one
from the ap, not a "ping". And at the time, VI was intended to be
unicast video. TCP was an afterthought. (wifi's original (1993) mac
was actually designed for ipx/spx!)

I long for regular "rrul" and "rrul_be" tests against the new stuff to
blow it up thoroughly as references along the way.
(tcp_upload, tcp_download, (and several of the rtt_fair tests also
between stations)). Will get formal about it here as soon as we end up
on the same kernel trees....

Furthermore 802.11e is not widely used - in particular, not much
internet bound/sourced traffic falls into more than BE and BK,
presently. and in some cases weirder - comcast remarks a very large
percentage of to the home inbound traffic as CS1 (BK), btw, and
stations tend to use CS0. Data comes in on BK, acks go out on BE.

I/we will try to come up with intermediate tests between the burst
tests and the rrul tests as we go along the way.

> If you consider Wi-Fi is half-duplex and latency in the entire stack

In the context of this test regime...

<pedantry>
Saying wifi is "half"-duplex is a misleading way to think about it in
many respects. it is a shared medium more like early, non-switched
ethernet, with a weird mac that governs what sort of packets get
access to (a txop) the medium first, across all stations co-operating
within EDCA.

Half or full duplex is something that mostly applied to p2p serial
connections (or p2p wifi), not P2MP. Additionally characteristics like
exponential backoff make no sense were wifi any form of duplex, full
or half.

Certainly much stuff within a txop (block acks for example) can be
considered half duplex in a microcosmic context.

I wish we actually had words that accurately described wifi's actual behavior.
</pedantry>

> (for processing ICMP and UDP_RR) is greater than 11e contention window
> timings you can get your BE flow responses with extra delay (since
> other queues might have responses ready quicker).

yes. always having a request pending for each of the 802.11e queues is
actually not the best idea, it is better to take advantage of better
aggregation afforded by 802.11n/ac, to only have one or two of the
queues in use against any given station and promote or demote traffic
into a more-right queue.

simple example of the damage having all 4 queues always contending is
exemplified by running the rrul and rrul_be tests against nearly any
given AP.

>
> I've modified traffic-gen and re-run tests with bursts on all tested
> tids/ACs (tid0, tid1, tid5). I'm attaching the results.
>
> With bursts on all tids you can clearly see BK has much higher latency than BE.

The long term goal here, of course, is for BK (or the other queues) to
not have seconds of queuing latency but something more bounded to 2x
media access time...

> (Note, I've changed my AP to QCA988X with oldie firmware 10.1.467 for
> this test; it doesn't have the weird hiccups I was seeing on QCA99X0
> and newer QCA988X firmware reports bogus expected throughput which is
> most likely a result of my sloppy proof-of-concept change in ath10k).

So I should avoid Ben Greear's firmware for now?

>
>
> Michał
>
> On 16 March 2016 at 20:48, Jasmine Strong <[email protected]> wrote:
>> BK usually has 0 txop, so it doesn't do aggregation.
>>
>> On Wed, Mar 16, 2016 at 11:55 AM, Bob Copeland <[email protected]> wrote:
>>>
>>> On Wed, Mar 16, 2016 at 11:36:31AM -0700, Dave Taht wrote:
>>> > That is the sanest 802.11e queue behavior I have ever seen! (at both
>>> > 6 and 300mbit! in the ath10k patched mac test)
>>>
>>> Out of curiosity, why does BE have larger latency than BK in that chart?
>>> I'd have expected the opposite.
>>>
>>> --
>>> Bob Copeland %% http://bobcopeland.com/
>>>
>>> _______________________________________________
>>> ath10k mailing list
>>> [email protected]
>>> http://lists.infradead.org/mailman/listinfo/ath10k
>>
>>

2016-03-24 07:19:35

by Mohammed Shafi Shajakhan

Subject: Re: [RFCv2 2/3] ath10k: report per-station tx/rate rates to mac80211

Hi Michal,

On Wed, Mar 16, 2016 at 11:17:57AM +0100, Michal Kazior wrote:
> The rate control is offloaded by firmware so it's
> challenging to provide expected throughput value
> for given station.
>
> This approach is naive as it reports last tx rate
> used for given station as provided by firmware
> stat event.
>
> This should be sufficient for airtime estimation
> used for fq-codel-in-mac80211 tx scheduling
> purposes now.
>
> This patch uses a very hacky way to get the stats.
> This is sufficient for proof-of-concept but must
> be cleaned up properly eventually.
>
> Signed-off-by: Michal Kazior <[email protected]>
> ---
> drivers/net/wireless/ath/ath10k/core.h | 5 +++
> drivers/net/wireless/ath/ath10k/debug.c | 61 +++++++++++++++++++++++++++++----
> drivers/net/wireless/ath/ath10k/mac.c | 26 ++++++++------
> drivers/net/wireless/ath/ath10k/wmi.h | 2 +-
> 4 files changed, 76 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
> index 23ba03fb7a5f..3f76669d44cf 100644
> --- a/drivers/net/wireless/ath/ath10k/core.h
> +++ b/drivers/net/wireless/ath/ath10k/core.h
> @@ -331,6 +331,9 @@ struct ath10k_sta {
> /* protected by conf_mutex */
> bool aggr_mode;
> u64 rx_duration;
> +
> + u32 tx_rate_kbps;
> + u32 rx_rate_kbps;
> #endif
> };
>
> @@ -372,6 +375,8 @@ struct ath10k_vif {
> s8 def_wep_key_idx;
>
> u16 tx_seq_no;
> + u32 tx_rate_kbps;
> + u32 rx_rate_kbps;
>
> union {
> struct {
> diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
> index 076d29b53ddf..cc7ebf04ae00 100644
> --- a/drivers/net/wireless/ath/ath10k/debug.c
> +++ b/drivers/net/wireless/ath/ath10k/debug.c
> @@ -316,6 +316,58 @@ static void ath10k_debug_fw_stats_reset(struct ath10k *ar)
> spin_unlock_bh(&ar->data_lock);
> }
>
> +static void ath10k_mac_update_txrx_rate_iter(void *data,
> + u8 *mac,
> + struct ieee80211_vif *vif)
> +{
> + struct ath10k_fw_stats_peer *peer = data;
> + struct ath10k_vif *arvif;
> +
> + if (memcmp(vif->addr, peer->peer_macaddr, ETH_ALEN))
> + return;
> +
> + arvif = (void *)vif->drv_priv;
> + arvif->tx_rate_kbps = peer->peer_tx_rate;
> + arvif->rx_rate_kbps = peer->peer_rx_rate;
> +}
> +
> +static void ath10k_mac_update_txrx_rate(struct ath10k *ar,
> + struct ath10k_fw_stats *stats)
> +{
> + struct ieee80211_hw *hw = ar->hw;
> + struct ath10k_fw_stats_peer *peer;
> + struct ath10k_sta *arsta;
> + struct ieee80211_sta *sta;
> + const u8 *localaddr = NULL;
> +
> + rcu_read_lock();
> +
> + list_for_each_entry(peer, &stats->peers, list) {
> + /* This doesn't account for multiple STA connected on different
> + * vifs. Unfortunately there's no way to derive that from the available
> + * information.
> + */
> + sta = ieee80211_find_sta_by_ifaddr(hw,
> + peer->peer_macaddr,
> + localaddr);
> + if (!sta) {
> + /* This tries to update multicast rates */
> + ieee80211_iterate_active_interfaces_atomic(
> + hw,
> + IEEE80211_IFACE_ITER_NORMAL,
> + ath10k_mac_update_txrx_rate_iter,
> + peer);
> + continue;
> + }
> +
> + arsta = (void *)sta->drv_priv;
> + arsta->tx_rate_kbps = peer->peer_tx_rate;
> + arsta->rx_rate_kbps = peer->peer_rx_rate;
> + }
> +
> + rcu_read_unlock();
> +}
> +
> void ath10k_debug_fw_stats_process(struct ath10k *ar, struct sk_buff *skb)
> {
> struct ath10k_fw_stats stats = {};
> @@ -335,6 +387,8 @@ void ath10k_debug_fw_stats_process(struct ath10k *ar, struct sk_buff *skb)
> goto free;
> }
>
> + ath10k_mac_update_txrx_rate(ar, &stats);
> +
> /* Stat data may exceed htc-wmi buffer limit. In such case firmware
> * splits the stats data and delivers it in a ping-pong fashion of
> * request cmd-update event.
> @@ -351,13 +405,6 @@ void ath10k_debug_fw_stats_process(struct ath10k *ar, struct sk_buff *skb)
> if (peer_stats_svc)
> ath10k_sta_update_rx_duration(ar, &stats.peers);
>
> - if (ar->debug.fw_stats_done) {
> - if (!peer_stats_svc)
> - ath10k_warn(ar, "received unsolicited stats update event\n");
> -
> - goto free;
> - }
> -

[shafi] As you had suggested previously, should we completely clean up this
ping-pong response approach for f/w stats, or should it be retained to support
backward compatibility and the ping-pong response when the user cats fw-stats
(via debugfs)? (I did see in the commit message that this needs to be cleaned
up.)


> num_peers = ath10k_wmi_fw_stats_num_peers(&ar->debug.fw_stats.peers);
> num_vdevs = ath10k_wmi_fw_stats_num_vdevs(&ar->debug.fw_stats.vdevs);
> is_start = (list_empty(&ar->debug.fw_stats.pdevs) &&
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
> index ebff9c0a0784..addef9179dbe 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -4427,16 +4427,14 @@ static int ath10k_start(struct ieee80211_hw *hw)
>
> ar->ani_enabled = true;
>
> - if (test_bit(WMI_SERVICE_PEER_STATS, ar->wmi.svc_map)) {
> - param = ar->wmi.pdev_param->peer_stats_update_period;
> - ret = ath10k_wmi_pdev_set_param(ar, param,
> - PEER_DEFAULT_STATS_UPDATE_PERIOD);
> - if (ret) {
> - ath10k_warn(ar,
> - "failed to set peer stats period : %d\n",
> - ret);
> - goto err_core_stop;
> - }
> + param = ar->wmi.pdev_param->peer_stats_update_period;
> + ret = ath10k_wmi_pdev_set_param(ar, param,
> + PEER_DEFAULT_STATS_UPDATE_PERIOD);
> + if (ret) {
> + ath10k_warn(ar,
> + "failed to set peer stats period : %d\n",
> + ret);
> + goto err_core_stop;
> }

[shafi] If I am correct, this change requires 'PEER_STATS' to be enabled by
default.

>
> ar->num_started_vdevs = 0;
> @@ -7215,6 +7213,13 @@ ath10k_mac_op_switch_vif_chanctx(struct ieee80211_hw *hw,
> return 0;
> }
>
> +static u32
> +ath10k_mac_op_get_expected_throughput(struct ieee80211_sta *sta)
> +{
> + struct ath10k_sta *arsta = (struct ath10k_sta *)sta->drv_priv;
> + return arsta->tx_rate_kbps;
> +}
> +
> static const struct ieee80211_ops ath10k_ops = {
> .tx = ath10k_mac_op_tx,
> .wake_tx_queue = ath10k_mac_op_wake_tx_queue,
> @@ -7254,6 +7259,7 @@ static const struct ieee80211_ops ath10k_ops = {
> .assign_vif_chanctx = ath10k_mac_op_assign_vif_chanctx,
> .unassign_vif_chanctx = ath10k_mac_op_unassign_vif_chanctx,
> .switch_vif_chanctx = ath10k_mac_op_switch_vif_chanctx,
> + .get_expected_throughput = ath10k_mac_op_get_expected_throughput,
>
> CFG80211_TESTMODE_CMD(ath10k_tm_cmd)
>
> diff --git a/drivers/net/wireless/ath/ath10k/wmi.h b/drivers/net/wireless/ath/ath10k/wmi.h
> index 4d3cbc44fcd2..2877a3a27b95 100644
> --- a/drivers/net/wireless/ath/ath10k/wmi.h
> +++ b/drivers/net/wireless/ath/ath10k/wmi.h
> @@ -3296,7 +3296,7 @@ struct wmi_csa_event {
> /* the definition of different PDEV parameters */
> #define PDEV_DEFAULT_STATS_UPDATE_PERIOD 500
> #define VDEV_DEFAULT_STATS_UPDATE_PERIOD 500
> -#define PEER_DEFAULT_STATS_UPDATE_PERIOD 500
> +#define PEER_DEFAULT_STATS_UPDATE_PERIOD 100

[shafi] Is this for more granularity, since 500ms is not sufficient? I understand
the firmware has a default stats_update_period of 500ms and I hope it supports
100ms as well. Also, if we are going to support periodic stats updates we may
need to accumulate the information in the driver (like this change and
rx_duration do).
I will try to take this change, rebase it to TOT and see how it goes.

thanks,
shafi

2016-03-24 12:23:30

by Mohammed Shafi Shajakhan

Subject: Re: [RFCv2 2/3] ath10k: report per-station tx/rate rates to mac80211

On Thu, Mar 24, 2016 at 08:49:12AM +0100, Michal Kazior wrote:
> On 24 March 2016 at 08:19, Mohammed Shafi Shajakhan
> <[email protected]> wrote:
> > Hi Michal,
> >
> > On Wed, Mar 16, 2016 at 11:17:57AM +0100, Michal Kazior wrote:
> >> The rate control is offloaded by firmware so it's
> >> challenging to provide expected throughput value
> >> for given station.
> >>
> >> This approach is naive as it reports last tx rate
> >> used for given station as provided by firmware
> >> stat event.
> >>
> >> This should be sufficient for airtime estimation
> >> used for fq-codel-in-mac80211 tx scheduling
> >> purposes now.
> >>
> >> This patch uses a very hacky way to get the stats.
> >> This is sufficient for proof-of-concept but must
> >> be cleaned up properly eventually.
> >>
> >> Signed-off-by: Michal Kazior <[email protected]>
> >> ---
> >> drivers/net/wireless/ath/ath10k/core.h | 5 +++
> >> drivers/net/wireless/ath/ath10k/debug.c | 61 +++++++++++++++++++++++++++++----
> >> drivers/net/wireless/ath/ath10k/mac.c | 26 ++++++++------
> >> drivers/net/wireless/ath/ath10k/wmi.h | 2 +-
> >> 4 files changed, 76 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
> >> index 23ba03fb7a5f..3f76669d44cf 100644
> >> --- a/drivers/net/wireless/ath/ath10k/core.h
> >> +++ b/drivers/net/wireless/ath/ath10k/core.h
> >> @@ -331,6 +331,9 @@ struct ath10k_sta {
> >> /* protected by conf_mutex */
> >> bool aggr_mode;
> >> u64 rx_duration;
> >> +
> >> + u32 tx_rate_kbps;
> >> + u32 rx_rate_kbps;
> >> #endif
> >> };
> >>
> >> @@ -372,6 +375,8 @@ struct ath10k_vif {
> >> s8 def_wep_key_idx;
> >>
> >> u16 tx_seq_no;
> >> + u32 tx_rate_kbps;
> >> + u32 rx_rate_kbps;
> >>
> >> union {
> >> struct {
> >> diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
> >> index 076d29b53ddf..cc7ebf04ae00 100644
> >> --- a/drivers/net/wireless/ath/ath10k/debug.c
> >> +++ b/drivers/net/wireless/ath/ath10k/debug.c
> >> @@ -316,6 +316,58 @@ static void ath10k_debug_fw_stats_reset(struct ath10k *ar)
> >> spin_unlock_bh(&ar->data_lock);
> >> }
> >>
> >> +static void ath10k_mac_update_txrx_rate_iter(void *data,
> >> + u8 *mac,
> >> + struct ieee80211_vif *vif)
> >> +{
> >> + struct ath10k_fw_stats_peer *peer = data;
> >> + struct ath10k_vif *arvif;
> >> +
> >> + if (memcmp(vif->addr, peer->peer_macaddr, ETH_ALEN))
> >> + return;
> >> +
> >> + arvif = (void *)vif->drv_priv;
> >> + arvif->tx_rate_kbps = peer->peer_tx_rate;
> >> + arvif->rx_rate_kbps = peer->peer_rx_rate;
> >> +}
> >> +
> >> +static void ath10k_mac_update_txrx_rate(struct ath10k *ar,
> >> + struct ath10k_fw_stats *stats)
> >> +{
> >> + struct ieee80211_hw *hw = ar->hw;
> >> + struct ath10k_fw_stats_peer *peer;
> >> + struct ath10k_sta *arsta;
> >> + struct ieee80211_sta *sta;
> >> + const u8 *localaddr = NULL;
> >> +
> >> + rcu_read_lock();
> >> +
> >> + list_for_each_entry(peer, &stats->peers, list) {
> >> + /* This doesn't account for multiple STA connected on different
> >> + * vifs. Unfortunately there's no way to derive that from the available
> >> + * information.
> >> + */
> >> + sta = ieee80211_find_sta_by_ifaddr(hw,
> >> + peer->peer_macaddr,
> >> + localaddr);
> >> + if (!sta) {
> >> + /* This tries to update multicast rates */
> >> + ieee80211_iterate_active_interfaces_atomic(
> >> + hw,
> >> + IEEE80211_IFACE_ITER_NORMAL,
> >> + ath10k_mac_update_txrx_rate_iter,
> >> + peer);
> >> + continue;
> >> + }
> >> +
> >> + arsta = (void *)sta->drv_priv;
> >> + arsta->tx_rate_kbps = peer->peer_tx_rate;
> >> + arsta->rx_rate_kbps = peer->peer_rx_rate;
> >> + }
> >> +
> >> + rcu_read_unlock();
> >> +}
> >> +
> >> void ath10k_debug_fw_stats_process(struct ath10k *ar, struct sk_buff *skb)
> >> {
> >> struct ath10k_fw_stats stats = {};
> >> @@ -335,6 +387,8 @@ void ath10k_debug_fw_stats_process(struct ath10k *ar, struct sk_buff *skb)
> >> goto free;
> >> }
> >>
> >> + ath10k_mac_update_txrx_rate(ar, &stats);
> >> +
> >> /* Stat data may exceed htc-wmi buffer limit. In such case firmware
> >> * splits the stats data and delivers it in a ping-pong fashion of
> >> * request cmd-update event.
> >> @@ -351,13 +405,6 @@ void ath10k_debug_fw_stats_process(struct ath10k *ar, struct sk_buff *skb)
> >> if (peer_stats_svc)
> >> ath10k_sta_update_rx_duration(ar, &stats.peers);
> >>
> >> - if (ar->debug.fw_stats_done) {
> >> - if (!peer_stats_svc)
> >> - ath10k_warn(ar, "received unsolicited stats update event\n");
> >> -
> >> - goto free;
> >> - }
> >> -
> >
> > [shafi] As you had suggested previously, should we completely clean up this
> > ping-pong response approach for f/w stats, or should it be retained to support
> > backward compatibility and the ping-pong response when the user
> > cats fw-stats (via debugfs)? (I did see in the commit message that this
> > needs to be cleaned up.)
>
> I think it makes sense to remove the ping-pong logic and rely on
> periodic updates alone, including fw_stats and ethstats handling.
>
>
> >> - if (test_bit(WMI_SERVICE_PEER_STATS, ar->wmi.svc_map)) {
> >> - param = ar->wmi.pdev_param->peer_stats_update_period;
> >> - ret = ath10k_wmi_pdev_set_param(ar, param,
> >> - PEER_DEFAULT_STATS_UPDATE_PERIOD);
> >> - if (ret) {
> >> - ath10k_warn(ar,
> >> - "failed to set peer stats period : %d\n",
> >> - ret);
> >> - goto err_core_stop;
> >> - }
> >> + param = ar->wmi.pdev_param->peer_stats_update_period;
> >> + ret = ath10k_wmi_pdev_set_param(ar, param,
> >> + PEER_DEFAULT_STATS_UPDATE_PERIOD);
> >> + if (ret) {
> >> + ath10k_warn(ar,
> >> + "failed to set peer stats period : %d\n",
> >> + ret);
> >> + goto err_core_stop;
> >> }
> >
> > [shafi] If i am correct this change requires 'PEER_STATS' to be enabled by
> > default.
>
> No, it does not. Periodic stats have been available since forever.

[shafi] Michal, sorry, I was talking about enabling the WMI_PEER_STATS feature for
10.2. We recently pushed a patch to reduce the number of peers when the
WMI_PEER_STATS feature is enabled (avoiding a f/w crash due to memory
constraints). But this patch requires the feature to be
turned ON always (with periodic stats updates every 100ms as well). Please
correct me if my understanding is wrong.

>
>
> >> diff --git a/drivers/net/wireless/ath/ath10k/wmi.h b/drivers/net/wireless/ath/ath10k/wmi.h
> >> index 4d3cbc44fcd2..2877a3a27b95 100644
> >> --- a/drivers/net/wireless/ath/ath10k/wmi.h
> >> +++ b/drivers/net/wireless/ath/ath10k/wmi.h
> >> @@ -3296,7 +3296,7 @@ struct wmi_csa_event {
> >> /* the definition of different PDEV parameters */
> >> #define PDEV_DEFAULT_STATS_UPDATE_PERIOD 500
> >> #define VDEV_DEFAULT_STATS_UPDATE_PERIOD 500
> >> -#define PEER_DEFAULT_STATS_UPDATE_PERIOD 500
> >> +#define PEER_DEFAULT_STATS_UPDATE_PERIOD 100
> >
> > [shafi] Is this for more granularity, since 500ms is not sufficient? I understand
> > the firmware's default stats_update_period is 500ms and I hope it supports
> > 100ms as well. Also, if we are going to support periodic stats updates we may
> > need to accumulate the information in the driver (like this change and rx_duration)
>
> The patch is used for rough rate estimation, which is used to keep Tx
> queues filled with only 1-2 txops worth of data. Signal conditions can
> change vastly so I figured the peer stat update events need to come
> more often. I didn't really verify whether they come every 100ms. The patch
> has already served its job as a proof-of-concept for smarter tx queuing.

[shafi] Ok, sure. I understand we will get this in ath10k very soon as well
(though it is a POC).

>
> > I will try to take this change, rebase it to TOT and see how it goes.
>
> There's really no benefit in taking this patch as a basis for periodic
> stat handling. The majority of this patch is just handling peer_tx_rate
> for rate estimation purposes. Feel free to knock yourself out
> ripping out the ping-pong stat stuff though :)
>
[shafi] sure Michal, thanks !

regards,
shafi

2016-03-25 09:26:24

by Michal Kazior

[permalink] [raw]
Subject: [PATCH 1/2] mac80211: implement fair queuing per txq

Qdiscs assume all packets, regardless of
destination address, are treated equally by the
underlying device link.

This isn't true for wireless, where each node is
a link in its own right with different and varying
signal quality over time.

Existing wireless behavior stuffs device tx queues
with no regard to link conditions. This can result
in queue buildup for slow stations and seconds
worth of inertia, making it impossible for small
bursty traffic to come through.

The current high-level idea is to keep roughly 1-2
txops worth of data in device tx queues to allow
short bursts to be handled responsively.

mac80211's software queues were designed to work
very closely with device tx queues. They are
required to make use of 802.11 packet aggregation
easily and efficiently.

However the logic imposed a per-AC queue limit.
With the limit too small mac80211 wasn't able
to guarantee fairness across TIDs or stations
because a single burst to a slow station could
monopolize queues and reach the per-AC limit,
preventing traffic from other stations from being
queued into mac80211's software queues. Having the
limit too large would make smart qdiscs, e.g.
fq_codel, a lot less efficient as they are
designed on the premise that they are very close
to the actual device tx queues.

The patch implements fq_codel-ish logic in
mac80211's software queuing. This doesn't directly
translate to immediate and significant gains.
Moreover, only wake_tx_queue-based drivers will be
able to reap the benefits of fair queuing for now.

More work is required to make sure drivers keep
their device tx queues at minimum fill (instead of
clogging them up until they're full regardless of
link conditions). Only then will the full effect of
fair queuing be observable.

Signed-off-by: Michal Kazior <[email protected]>
---

Notes:
v1:
* move txq_limit and txq_cparams from ieee80211_hw to ieee80211_fq
* remove printks
* improve commit log
* various cleanups
* extra stats
* split out the core txq fairness changes
* should_drop() doesn't consider bursts
* codel target is hardcoded to 20ms

RFC v2:
* actually re-use txq_flows on enqueue [Felix]
* tune should_drop() to consider bursts wrt station expected tput [Dave/Bob]
* make codel target time scale via ewma of estimated txqi service period [Dave]
* generic tx scheduling (with time-based queue limit and naive hysteresis)
* tracking per-frame expected duration
* tracking per-txqi in-flight data duration
* tracking per-hw in-flight data duration
    (in-flight means scheduled to the driver and assumes the driver
    does report tx-status on actual tx-completion)
* added a few debugfs entries

include/net/mac80211.h | 21 ++-
net/mac80211/agg-tx.c | 8 +-
net/mac80211/codel.h | 264 +++++++++++++++++++++++++++++++
net/mac80211/codel_i.h | 89 +++++++++++
net/mac80211/ieee80211_i.h | 45 +++++-
net/mac80211/iface.c | 24 ++-
net/mac80211/main.c | 9 +-
net/mac80211/rx.c | 2 +-
net/mac80211/sta_info.c | 10 +-
net/mac80211/sta_info.h | 27 ++++
net/mac80211/tx.c | 379 ++++++++++++++++++++++++++++++++++++++++-----
net/mac80211/util.c | 20 ++-
12 files changed, 818 insertions(+), 80 deletions(-)
create mode 100644 net/mac80211/codel.h
create mode 100644 net/mac80211/codel_i.h

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index a53333cb1528..0ee51dbb361b 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -888,8 +888,18 @@ struct ieee80211_tx_info {
/* only needed before rate control */
unsigned long jiffies;
};
- /* NB: vif can be NULL for injected frames */
- struct ieee80211_vif *vif;
+ union {
+ /* NB: vif can be NULL for injected frames */
+ struct ieee80211_vif *vif;
+
+ /* When packets are enqueued on txq it's easy
+ * to re-construct the vif pointer. There's no
+ * more space in tx_info so it can be used to
+ * store the necessary enqueue time for packet
+ * sojourn time computation.
+ */
+ u64 enqueue_time;
+ };
struct ieee80211_key_conf *hw_key;
u32 flags;
/* 4 bytes free */
@@ -2113,9 +2123,6 @@ enum ieee80211_hw_flags {
* @n_cipher_schemes: a size of an array of cipher schemes definitions.
* @cipher_schemes: a pointer to an array of cipher scheme definitions
* supported by HW.
- *
- * @txq_ac_max_pending: maximum number of frames per AC pending in all txq
- * entries for a vif.
*/
struct ieee80211_hw {
struct ieee80211_conf conf;
@@ -2145,7 +2152,6 @@ struct ieee80211_hw {
u8 uapsd_max_sp_len;
u8 n_cipher_schemes;
const struct ieee80211_cipher_scheme *cipher_schemes;
- int txq_ac_max_pending;
};

static inline bool _ieee80211_hw_check(struct ieee80211_hw *hw,
@@ -5633,6 +5639,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
* txq state can change half-way of this function and the caller may end up
* with "new" frame_cnt and "old" byte_cnt or vice-versa.
*
+ * Moreover returned values are best-case, i.e. assuming queueing algorithm
+ * will not drop frames due to excess latency.
+ *
* @txq: pointer obtained from station or virtual interface
* @frame_cnt: pointer to store frame count
* @byte_cnt: pointer to store byte count
diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
index 4932e9f243a2..b9d0cee2a786 100644
--- a/net/mac80211/agg-tx.c
+++ b/net/mac80211/agg-tx.c
@@ -194,17 +194,21 @@ static void
ieee80211_agg_stop_txq(struct sta_info *sta, int tid)
{
struct ieee80211_txq *txq = sta->sta.txq[tid];
+ struct ieee80211_sub_if_data *sdata;
+ struct ieee80211_fq *fq;
struct txq_info *txqi;

if (!txq)
return;

txqi = to_txq_info(txq);
+ sdata = vif_to_sdata(txq->vif);
+ fq = &sdata->local->fq;

/* Lock here to protect against further seqno updates on dequeue */
- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);
set_bit(IEEE80211_TXQ_STOP, &txqi->flags);
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);
}

static void
diff --git a/net/mac80211/codel.h b/net/mac80211/codel.h
new file mode 100644
index 000000000000..e6470dbe5b0b
--- /dev/null
+++ b/net/mac80211/codel.h
@@ -0,0 +1,264 @@
+#ifndef __NET_MAC80211_CODEL_H
+#define __NET_MAC80211_CODEL_H
+
+/*
+ * Codel - The Controlled-Delay Active Queue Management algorithm
+ *
+ * Copyright (C) 2011-2012 Kathleen Nichols <[email protected]>
+ * Copyright (C) 2011-2012 Van Jacobson <[email protected]>
+ * Copyright (C) 2016 Michael D. Taht <[email protected]>
+ * Copyright (C) 2012 Eric Dumazet <[email protected]>
+ * Copyright (C) 2015 Jonathan Morton <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions, and the following disclaimer,
+ * without modification.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. The names of the authors may not be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License ("GPL") version 2, in which case the provisions of the
+ * GPL apply INSTEAD OF those given above.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ *
+ */
+
+#include <linux/version.h>
+#include <linux/types.h>
+#include <linux/ktime.h>
+#include <linux/skbuff.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <linux/reciprocal_div.h>
+
+#include "codel_i.h"
+
+/* Controlling Queue Delay (CoDel) algorithm
+ * =========================================
+ * Source : Kathleen Nichols and Van Jacobson
+ * http://queue.acm.org/detail.cfm?id=2209336
+ *
+ * Implemented on linux by Dave Taht and Eric Dumazet
+ */
+
+/* CoDel5 uses a real clock, unlike codel */
+
+static inline u64 codel_get_time(void)
+{
+ return ktime_get_ns();
+}
+
+static inline u32 codel_time_to_us(u64 val)
+{
+ do_div(val, NSEC_PER_USEC);
+ return (u32)val;
+}
+
+/* sizeof_in_bits(rec_inv_sqrt) */
+#define REC_INV_SQRT_BITS (8 * sizeof(u16))
+/* needed shift to get a Q0.32 number from rec_inv_sqrt */
+#define REC_INV_SQRT_SHIFT (32 - REC_INV_SQRT_BITS)
+
+/* Newton approximation method needs more iterations at small inputs,
+ * so cache them.
+ */
+
+static void codel_vars_init(struct codel_vars *vars)
+{
+ memset(vars, 0, sizeof(*vars));
+}
+
+/*
+ * http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Iterative_methods_for_reciprocal_square_roots
+ * new_invsqrt = (invsqrt / 2) * (3 - count * invsqrt^2)
+ *
+ * Here, invsqrt is a fixed point number (< 1.0), 32bit mantissa, aka Q0.32
+ */
+static inline void codel_Newton_step(struct codel_vars *vars)
+{
+ u32 invsqrt = ((u32)vars->rec_inv_sqrt) << REC_INV_SQRT_SHIFT;
+ u32 invsqrt2 = ((u64)invsqrt * invsqrt) >> 32;
+ u64 val = (3LL << 32) - ((u64)vars->count * invsqrt2);
+
+ val >>= 2; /* avoid overflow in following multiply */
+ val = (val * invsqrt) >> (32 - 2 + 1);
+
+ vars->rec_inv_sqrt = val >> REC_INV_SQRT_SHIFT;
+}
+
+/*
+ * CoDel control_law is t + interval/sqrt(count)
+ * We maintain in rec_inv_sqrt the reciprocal value of sqrt(count) to avoid
+ * both sqrt() and divide operation.
+ */
+static u64 codel_control_law(u64 t,
+ u64 interval,
+ u32 rec_inv_sqrt)
+{
+ return t + reciprocal_scale(interval, rec_inv_sqrt <<
+ REC_INV_SQRT_SHIFT);
+}
+
+/* Forward declaration of this for use elsewhere */
+
+static inline u64
+custom_codel_get_enqueue_time(struct sk_buff *skb);
+
+static inline struct sk_buff *
+custom_dequeue(struct codel_vars *vars, void *ptr);
+
+static inline void
+custom_drop(struct sk_buff *skb, void *ptr);
+
+static bool codel_should_drop(struct sk_buff *skb,
+ __u32 *backlog,
+ __u32 backlog_thr,
+ struct codel_vars *vars,
+ const struct codel_params *p,
+ u64 now)
+{
+ if (!skb) {
+ vars->first_above_time = 0;
+ return false;
+ }
+
+ if (now - custom_codel_get_enqueue_time(skb) < p->target ||
+ *backlog <= backlog_thr) {
+ /* went below - stay below for at least interval */
+ vars->first_above_time = 0;
+ return false;
+ }
+
+ if (vars->first_above_time == 0) {
+ /* just went above from below; mark the time */
+ vars->first_above_time = now + p->interval;
+
+ } else if (now > vars->first_above_time) {
+ return true;
+ }
+
+ return false;
+}
+
+static struct sk_buff *codel_dequeue(void *ptr,
+ __u32 *backlog,
+ __u32 backlog_thr,
+ struct codel_vars *vars,
+ struct codel_params *p,
+ u64 now,
+ bool overloaded)
+{
+ struct sk_buff *skb = custom_dequeue(vars, ptr);
+ bool drop;
+
+ if (!skb) {
+ vars->dropping = false;
+ return skb;
+ }
+ drop = codel_should_drop(skb, backlog, backlog_thr, vars, p, now);
+ if (vars->dropping) {
+ if (!drop) {
+ /* sojourn time below target - leave dropping state */
+ vars->dropping = false;
+ } else if (now >= vars->drop_next) {
+ /* It's time for the next drop. Drop the current
+ * packet and dequeue the next. The dequeue might
+ * take us out of dropping state.
+ * If not, schedule the next drop.
+ * A large backlog might result in drop rates so high
+ * that the next drop should happen now,
+ * hence the while loop.
+ */
+
+ /* saturating increment */
+ vars->count++;
+ if (!vars->count)
+ vars->count--;
+
+ codel_Newton_step(vars);
+ vars->drop_next = codel_control_law(vars->drop_next,
+ p->interval,
+ vars->rec_inv_sqrt);
+ do {
+ if (INET_ECN_set_ce(skb) && !overloaded) {
+ vars->ecn_mark++;
+ /* and schedule the next drop */
+ vars->drop_next = codel_control_law(
+ vars->drop_next, p->interval,
+ vars->rec_inv_sqrt);
+ goto end;
+ }
+ custom_drop(skb, ptr);
+ vars->drop_count++;
+ skb = custom_dequeue(vars, ptr);
+ if (skb && !codel_should_drop(skb, backlog,
+ backlog_thr,
+ vars, p, now)) {
+ /* leave dropping state */
+ vars->dropping = false;
+ } else {
+ /* schedule the next drop */
+ vars->drop_next = codel_control_law(
+ vars->drop_next, p->interval,
+ vars->rec_inv_sqrt);
+ }
+ } while (skb && vars->dropping && now >=
+ vars->drop_next);
+
+ /* Mark the packet regardless */
+ if (skb && INET_ECN_set_ce(skb))
+ vars->ecn_mark++;
+ }
+ } else if (drop) {
+ if (INET_ECN_set_ce(skb) && !overloaded) {
+ vars->ecn_mark++;
+ } else {
+ custom_drop(skb, ptr);
+ vars->drop_count++;
+
+ skb = custom_dequeue(vars, ptr);
+ drop = codel_should_drop(skb, backlog, backlog_thr,
+ vars, p, now);
+ if (skb && INET_ECN_set_ce(skb))
+ vars->ecn_mark++;
+ }
+ vars->dropping = true;
+ /* if min went above target close to when we last went below
+ * assume that the drop rate that controlled the queue on the
+ * last cycle is a good starting point to control it now.
+ */
+ if (vars->count > 2 &&
+ now - vars->drop_next < 8 * p->interval) {
+ vars->count -= 2;
+ codel_Newton_step(vars);
+ } else {
+ vars->count = 1;
+ vars->rec_inv_sqrt = ~0U >> REC_INV_SQRT_SHIFT;
+ }
+ codel_Newton_step(vars);
+ vars->drop_next = codel_control_law(now, p->interval,
+ vars->rec_inv_sqrt);
+ }
+end:
+ return skb;
+}
+#endif
diff --git a/net/mac80211/codel_i.h b/net/mac80211/codel_i.h
new file mode 100644
index 000000000000..60371121e526
--- /dev/null
+++ b/net/mac80211/codel_i.h
@@ -0,0 +1,89 @@
+#ifndef __NET_MAC80211_CODEL_I_H
+#define __NET_MAC80211_CODEL_I_H
+
+/*
+ * Codel - The Controlled-Delay Active Queue Management algorithm
+ *
+ * Copyright (C) 2011-2012 Kathleen Nichols <[email protected]>
+ * Copyright (C) 2011-2012 Van Jacobson <[email protected]>
+ * Copyright (C) 2016 Michael D. Taht <[email protected]>
+ * Copyright (C) 2012 Eric Dumazet <[email protected]>
+ * Copyright (C) 2015 Jonathan Morton <[email protected]>
+ * Copyright (C) 2016 Michal Kazior <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions, and the following disclaimer,
+ * without modification.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. The names of the authors may not be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License ("GPL") version 2, in which case the provisions of the
+ * GPL apply INSTEAD OF those given above.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ *
+ */
+
+#include <linux/version.h>
+#include <linux/types.h>
+#include <linux/ktime.h>
+#include <linux/skbuff.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <linux/reciprocal_div.h>
+
+/* Controlling Queue Delay (CoDel) algorithm
+ * =========================================
+ * Source : Kathleen Nichols and Van Jacobson
+ * http://queue.acm.org/detail.cfm?id=2209336
+ *
+ * Implemented on linux by Dave Taht and Eric Dumazet
+ */
+
+/* CoDel5 uses a real clock, unlike codel */
+
+#define MS2TIME(a) (a * (u64)NSEC_PER_MSEC)
+#define US2TIME(a) (a * (u64)NSEC_PER_USEC)
+
+/**
+ * struct codel_vars - contains codel variables
+ * @count: how many drops we've done since the last time we
+ * entered dropping state
+ * @dropping: set to > 0 if in dropping state
+ * @rec_inv_sqrt: reciprocal value of sqrt(count) >> 1
+ * @first_above_time: when we went (or will go) continuously above target
+ * for interval
+ * @drop_next: time to drop next packet, or when we dropped last
+ * @drop_count: temp count of dropped packets in dequeue()
+ * @ecn_mark: number of packets we ECN marked instead of dropping
+ */
+
+struct codel_vars {
+ u32 count;
+ u16 dropping;
+ u16 rec_inv_sqrt;
+ u64 first_above_time;
+ u64 drop_next;
+ u16 drop_count;
+ u16 ecn_mark;
+};
+#endif
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index c6830fbe7d68..3dc5192b6dd9 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -805,9 +805,18 @@ enum txq_info_flags {
};

struct txq_info {
- struct sk_buff_head queue;
+ struct txq_flow flow;
+ struct list_head new_flows;
+ struct list_head old_flows;
+ u32 backlog_bytes;
+ u32 backlog_packets;
+ u32 drop_codel;
+ u32 drop_overlimit;
+ u32 collisions;
+ u32 flows;
+ u32 tx_bytes;
+ u32 tx_packets;
unsigned long flags;
- unsigned long byte_cnt;

/* keep last! */
struct ieee80211_txq txq;
@@ -855,7 +864,6 @@ struct ieee80211_sub_if_data {
bool control_port_no_encrypt;
int encrypt_headroom;

- atomic_t txqs_len[IEEE80211_NUM_ACS];
struct ieee80211_tx_queue_params tx_conf[IEEE80211_NUM_ACS];
struct mac80211_qos_map __rcu *qos_map;

@@ -1092,11 +1100,37 @@ enum mac80211_scan_state {
SCAN_ABORT,
};

+/**
+ * struct codel_params - stores codel parameters
+ *
+ * @interval: initial drop rate
+ * @target: maximum persistent sojourn time
+ */
+struct codel_params {
+ u64 interval;
+ u64 target;
+};
+
+struct ieee80211_fq {
+ struct txq_flow *flows;
+ struct list_head backlogs;
+ struct codel_params cparams;
+ spinlock_t lock;
+ u32 flows_cnt;
+ u32 perturbation;
+ u32 txq_limit;
+ u32 quantum;
+ u32 backlog;
+ u32 drop_overlimit;
+ u32 drop_codel;
+};
+
struct ieee80211_local {
/* embed the driver visible part.
* don't cast (use the static inlines below), but we keep
* it first anyway so they become a no-op */
struct ieee80211_hw hw;
+ struct ieee80211_fq fq;

const struct ieee80211_ops *ops;

@@ -1928,6 +1962,11 @@ static inline bool ieee80211_can_run_worker(struct ieee80211_local *local)
void ieee80211_init_tx_queue(struct ieee80211_sub_if_data *sdata,
struct sta_info *sta,
struct txq_info *txq, int tid);
+void ieee80211_purge_txq(struct ieee80211_local *local, struct txq_info *txqi);
+void ieee80211_init_flow(struct txq_flow *flow);
+int ieee80211_setup_flows(struct ieee80211_local *local);
+void ieee80211_teardown_flows(struct ieee80211_local *local);
+
void ieee80211_send_auth(struct ieee80211_sub_if_data *sdata,
u16 transaction, u16 auth_alg, u16 status,
const u8 *extra, size_t extra_len, const u8 *bssid,
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 453b4e741780..1faea208edfc 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -779,6 +779,7 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
bool going_down)
{
struct ieee80211_local *local = sdata->local;
+ struct ieee80211_fq *fq = &local->fq;
unsigned long flags;
struct sk_buff *skb, *tmp;
u32 hw_reconf_flags = 0;
@@ -977,12 +978,9 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
if (sdata->vif.txq) {
struct txq_info *txqi = to_txq_info(sdata->vif.txq);

- spin_lock_bh(&txqi->queue.lock);
- ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
- txqi->byte_cnt = 0;
- spin_unlock_bh(&txqi->queue.lock);
-
- atomic_set(&sdata->txqs_len[txqi->txq.ac], 0);
+ spin_lock_bh(&fq->lock);
+ ieee80211_purge_txq(local, txqi);
+ spin_unlock_bh(&fq->lock);
}

if (local->open_count == 0)
@@ -1198,6 +1196,12 @@ static void ieee80211_if_setup(struct net_device *dev)
dev->destructor = ieee80211_if_free;
}

+static void ieee80211_if_setup_no_queue(struct net_device *dev)
+{
+ ieee80211_if_setup(dev);
+ dev->priv_flags |= IFF_NO_QUEUE;
+}
+
static void ieee80211_iface_work(struct work_struct *work)
{
struct ieee80211_sub_if_data *sdata =
@@ -1707,6 +1711,7 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
struct net_device *ndev = NULL;
struct ieee80211_sub_if_data *sdata = NULL;
struct txq_info *txqi;
+ void (*if_setup)(struct net_device *dev);
int ret, i;
int txqs = 1;

@@ -1734,12 +1739,17 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
txq_size += sizeof(struct txq_info) +
local->hw.txq_data_size;

+ if (local->ops->wake_tx_queue)
+ if_setup = ieee80211_if_setup_no_queue;
+ else
+ if_setup = ieee80211_if_setup;
+
if (local->hw.queues >= IEEE80211_NUM_ACS)
txqs = IEEE80211_NUM_ACS;

ndev = alloc_netdev_mqs(size + txq_size,
name, name_assign_type,
- ieee80211_if_setup, txqs, 1);
+ if_setup, txqs, 1);
if (!ndev)
return -ENOMEM;
dev_net_set(ndev, wiphy_net(local->hw.wiphy));
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 8190bf27ebff..9fd3b10ae52b 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -1053,9 +1053,6 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)

local->dynamic_ps_forced_timeout = -1;

- if (!local->hw.txq_ac_max_pending)
- local->hw.txq_ac_max_pending = 64;
-
result = ieee80211_wep_init(local);
if (result < 0)
wiphy_debug(local->hw.wiphy, "Failed to initialize wep: %d\n",
@@ -1087,6 +1084,10 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)

rtnl_unlock();

+ result = ieee80211_setup_flows(local);
+ if (result)
+ goto fail_flows;
+
#ifdef CONFIG_INET
local->ifa_notifier.notifier_call = ieee80211_ifa_changed;
result = register_inetaddr_notifier(&local->ifa_notifier);
@@ -1112,6 +1113,8 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
#if defined(CONFIG_INET) || defined(CONFIG_IPV6)
fail_ifa:
#endif
+ ieee80211_teardown_flows(local);
+ fail_flows:
rtnl_lock();
rate_control_deinitialize(local);
ieee80211_remove_interfaces(local);
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 91279576f4a7..10a6e9c3de51 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1268,7 +1268,7 @@ static void sta_ps_start(struct sta_info *sta)
for (tid = 0; tid < ARRAY_SIZE(sta->sta.txq); tid++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[tid]);

- if (!skb_queue_len(&txqi->queue))
+ if (!txqi->backlog_packets)
set_bit(tid, &sta->txq_buffered_tids);
else
clear_bit(tid, &sta->txq_buffered_tids);
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 00c82fb152c0..0729046a0144 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -112,11 +112,7 @@ static void __cleanup_single_sta(struct sta_info *sta)
if (sta->sta.txq[0]) {
for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[i]);
- int n = skb_queue_len(&txqi->queue);
-
- ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
- atomic_sub(n, &sdata->txqs_len[txqi->txq.ac]);
- txqi->byte_cnt = 0;
+ ieee80211_purge_txq(local, txqi);
}
}

@@ -1193,7 +1189,7 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta)
for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[i]);

- if (!skb_queue_len(&txqi->queue))
+ if (!txqi->backlog_packets)
continue;

drv_wake_tx_queue(local, txqi);
@@ -1630,7 +1626,7 @@ ieee80211_sta_ps_deliver_response(struct sta_info *sta,
for (tid = 0; tid < ARRAY_SIZE(sta->sta.txq); tid++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[tid]);

- if (!(tids & BIT(tid)) || skb_queue_len(&txqi->queue))
+ if (!(tids & BIT(tid)) || txqi->backlog_packets)
continue;

sta_info_recalc_tim(sta);
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 053f5c4fa495..4f7ad9158d31 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -19,6 +19,7 @@
#include <linux/etherdevice.h>
#include <linux/rhashtable.h>
#include "key.h"
+#include "codel_i.h"

/**
* enum ieee80211_sta_info_flags - Stations flags
@@ -330,6 +331,32 @@ struct mesh_sta {

DECLARE_EWMA(signal, 1024, 8)

+struct txq_info;
+
+/**
+ * struct txq_flow - per traffic flow queue
+ *
+ * This structure is used to distinguish and queue different traffic flows
+ * separately for fair queueing/AQM purposes.
+ *
+ * @txqi: txq_info structure it is associated at given time
+ * @flowchain: can be linked to other flows for RR purposes
+ * @backlogchain: can be linked to other flows for backlog sorting purposes
+ * @queue: sk_buff queue
+ * @cvars: codel state vars
+ * @backlog: number of bytes pending in the queue
+ * @deficit: used for fair queueing balancing
+ */
+struct txq_flow {
+ struct txq_info *txqi;
+ struct list_head flowchain;
+ struct list_head backlogchain;
+ struct sk_buff_head queue;
+ struct codel_vars cvars;
+ u32 backlog;
+ int deficit;
+};
+
/**
* struct sta_info - STA information
*
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 485e30a24b38..202405ad344a 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -34,6 +34,7 @@
#include "wpa.h"
#include "wme.h"
#include "rate.h"
+#include "codel.h"

/* misc utils */

@@ -1232,27 +1233,323 @@ ieee80211_tx_prepare(struct ieee80211_sub_if_data *sdata,
return TX_CONTINUE;
}

-static void ieee80211_drv_tx(struct ieee80211_local *local,
- struct ieee80211_vif *vif,
- struct ieee80211_sta *pubsta,
- struct sk_buff *skb)
+static inline u64
+custom_codel_get_enqueue_time(struct sk_buff *skb)
+{
+ return IEEE80211_SKB_CB(skb)->control.enqueue_time;
+}
+
+static inline struct sk_buff *
+flow_dequeue(struct ieee80211_local *local, struct txq_flow *flow)
+{
+ struct ieee80211_fq *fq = &local->fq;
+ struct txq_info *txqi = flow->txqi;
+ struct txq_flow *i;
+ struct sk_buff *skb;
+
+ skb = __skb_dequeue(&flow->queue);
+ if (!skb)
+ return NULL;
+
+ txqi->backlog_bytes -= skb->len;
+ txqi->backlog_packets--;
+ flow->backlog -= skb->len;
+ fq->backlog--;
+
+ if (flow->backlog == 0) {
+ list_del_init(&flow->backlogchain);
+ } else {
+ i = flow;
+
+ list_for_each_entry_continue(i, &fq->backlogs, backlogchain)
+ if (i->backlog < flow->backlog)
+ break;
+
+ list_move_tail(&flow->backlogchain, &i->backlogchain);
+ }
+
+ return skb;
+}
+
+static inline struct sk_buff *
+custom_dequeue(struct codel_vars *vars, void *ptr)
+{
+ struct txq_flow *flow = ptr;
+ struct txq_info *txqi = flow->txqi;
+ struct ieee80211_vif *vif = txqi->txq.vif;
+ struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
+ struct ieee80211_local *local = sdata->local;
+
+ return flow_dequeue(local, flow);
+}
+
+static inline void
+custom_drop(struct sk_buff *skb, void *ptr)
+{
+ struct txq_flow *flow = ptr;
+ struct txq_info *txqi = flow->txqi;
+ struct ieee80211_vif *vif = txqi->txq.vif;
+ struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
+ struct ieee80211_local *local = sdata->local;
+ struct ieee80211_hw *hw = &local->hw;
+
+ ieee80211_free_txskb(hw, skb);
+
+ txqi->drop_codel++;
+ local->fq.drop_codel++;
+}
+
+static u32 fq_hash(struct ieee80211_fq *fq, struct sk_buff *skb)
+{
+ u32 hash = skb_get_hash_perturb(skb, fq->perturbation);
+
+ return reciprocal_scale(hash, fq->flows_cnt);
+}
+
+static void fq_drop(struct ieee80211_local *local)
+{
+ struct ieee80211_hw *hw = &local->hw;
+ struct ieee80211_fq *fq = &local->fq;
+ struct txq_flow *flow;
+ struct sk_buff *skb;
+
+ flow = list_first_entry_or_null(&fq->backlogs, struct txq_flow,
+ backlogchain);
+ if (WARN_ON_ONCE(!flow))
+ return;
+
+ skb = flow_dequeue(local, flow);
+ if (WARN_ON_ONCE(!skb))
+ return;
+
+ ieee80211_free_txskb(hw, skb);
+
+ flow->txqi->drop_overlimit++;
+ fq->drop_overlimit++;
+}
+
+void ieee80211_init_flow(struct txq_flow *flow)
+{
+ INIT_LIST_HEAD(&flow->flowchain);
+ INIT_LIST_HEAD(&flow->backlogchain);
+ __skb_queue_head_init(&flow->queue);
+ codel_vars_init(&flow->cvars);
+}
+
+int ieee80211_setup_flows(struct ieee80211_local *local)
+{
+ struct ieee80211_fq *fq = &local->fq;
+ int i;
+
+ if (!local->ops->wake_tx_queue)
+ return 0;
+
+ memset(fq, 0, sizeof(fq[0]));
+ INIT_LIST_HEAD(&fq->backlogs);
+ spin_lock_init(&fq->lock);
+ fq->flows_cnt = 4096;
+ fq->perturbation = prandom_u32();
+ fq->quantum = 300;
+ fq->txq_limit = 8192;
+ fq->cparams.target = MS2TIME(20);
+ fq->cparams.interval = MS2TIME(100);
+
+ fq->flows = kcalloc(fq->flows_cnt, sizeof(fq->flows[0]), GFP_KERNEL);
+ if (!fq->flows)
+ return -ENOMEM;
+
+ for (i = 0; i < fq->flows_cnt; i++)
+ ieee80211_init_flow(&fq->flows[i]);
+
+ return 0;
+}
+
+static void ieee80211_reset_flow(struct ieee80211_local *local,
+ struct txq_flow *flow)
+{
+ if (!list_empty(&flow->flowchain))
+ list_del_init(&flow->flowchain);
+
+ if (!list_empty(&flow->backlogchain))
+ list_del_init(&flow->backlogchain);
+
+ ieee80211_purge_tx_queue(&local->hw, &flow->queue);
+
+ flow->deficit = 0;
+ flow->txqi = NULL;
+}
+
+void ieee80211_purge_txq(struct ieee80211_local *local, struct txq_info *txqi)
+{
+ struct txq_flow *flow;
+ int i;
+
+ for (i = 0; i < local->fq.flows_cnt; i++) {
+ flow = &local->fq.flows[i];
+
+ if (flow->txqi != txqi)
+ continue;
+
+ ieee80211_reset_flow(local, flow);
+ }
+
+ ieee80211_reset_flow(local, &txqi->flow);
+
+ txqi->backlog_bytes = 0;
+ txqi->backlog_packets = 0;
+}
+
+void ieee80211_teardown_flows(struct ieee80211_local *local)
+{
+ struct ieee80211_fq *fq = &local->fq;
+ struct ieee80211_sub_if_data *sdata;
+ struct sta_info *sta;
+ int i;
+
+ if (!local->ops->wake_tx_queue)
+ return;
+
+ list_for_each_entry_rcu(sta, &local->sta_list, list)
+ for (i = 0; i < IEEE80211_NUM_TIDS; i++)
+ ieee80211_purge_txq(local,
+ to_txq_info(sta->sta.txq[i]));
+
+ list_for_each_entry_rcu(sdata, &local->interfaces, list)
+ ieee80211_purge_txq(local, to_txq_info(sdata->vif.txq));
+
+ for (i = 0; i < fq->flows_cnt; i++)
+ ieee80211_reset_flow(local, &fq->flows[i]);
+
+ kfree(fq->flows);
+
+ fq->flows = NULL;
+ fq->flows_cnt = 0;
+}
+
+static void ieee80211_txq_enqueue(struct ieee80211_local *local,
+ struct txq_info *txqi,
+ struct sk_buff *skb)
+{
+ struct ieee80211_fq *fq = &local->fq;
+ struct txq_flow *flow;
+ struct txq_flow *i;
+ size_t idx = fq_hash(fq, skb);
+
+ lockdep_assert_held(&fq->lock);
+
+ flow = &fq->flows[idx];
+
+ if (flow->txqi && flow->txqi != txqi) {
+ flow = &txqi->flow;
+ txqi->collisions++;
+ }
+
+ if (!flow->txqi)
+ txqi->flows++;
+
+ /* The following effectively overwrites the `vif` pointer. It is
+ * later restored using the txq structure.
+ */
+ IEEE80211_SKB_CB(skb)->control.enqueue_time = codel_get_time();
+
+ flow->txqi = txqi;
+ flow->backlog += skb->len;
+ txqi->backlog_bytes += skb->len;
+ txqi->backlog_packets++;
+ fq->backlog++;
+
+ if (list_empty(&flow->backlogchain))
+ list_add_tail(&flow->backlogchain, &fq->backlogs);
+
+ i = flow;
+ list_for_each_entry_continue_reverse(i, &fq->backlogs, backlogchain)
+ if (i->backlog > flow->backlog)
+ break;
+
+ list_move(&flow->backlogchain, &i->backlogchain);
+
+ if (list_empty(&flow->flowchain)) {
+ flow->deficit = fq->quantum;
+ list_add_tail(&flow->flowchain, &txqi->new_flows);
+ }
+
+ __skb_queue_tail(&flow->queue, skb);
+
+ if (fq->backlog > fq->txq_limit)
+ fq_drop(local);
+}
+
+static struct sk_buff *ieee80211_txq_dequeue(struct ieee80211_local *local,
+ struct txq_info *txqi)
+{
+ struct ieee80211_fq *fq = &local->fq;
+ struct txq_flow *flow;
+ struct list_head *head;
+ struct sk_buff *skb;
+
+begin:
+ head = &txqi->new_flows;
+ if (list_empty(head)) {
+ head = &txqi->old_flows;
+ if (list_empty(head))
+ return NULL;
+ }
+
+ flow = list_first_entry(head, struct txq_flow, flowchain);
+
+ if (flow->deficit <= 0) {
+ flow->deficit += fq->quantum;
+ list_move_tail(&flow->flowchain, &txqi->old_flows);
+ goto begin;
+ }
+
+ skb = codel_dequeue(flow,
+ &flow->backlog,
+ 0,
+ &flow->cvars,
+ &fq->cparams,
+ codel_get_time(),
+ false);
+ if (!skb) {
+ if ((head == &txqi->new_flows) &&
+ !list_empty(&txqi->old_flows)) {
+ list_move_tail(&flow->flowchain, &txqi->old_flows);
+ } else {
+ list_del_init(&flow->flowchain);
+ flow->txqi = NULL;
+ }
+ goto begin;
+ }
+
+ flow->deficit -= skb->len;
+ txqi->tx_bytes += skb->len;
+ txqi->tx_packets++;
+
+ /* The `vif` pointer was overwritten with the enqueue time during
+ * enqueuing. Restore it before handing the frame to the driver.
+ */
+ IEEE80211_SKB_CB(skb)->control.vif = flow->txqi->txq.vif;
+
+ return skb;
+}
+
+static struct txq_info *
+ieee80211_get_txq(struct ieee80211_local *local,
+ struct ieee80211_vif *vif,
+ struct ieee80211_sta *pubsta,
+ struct sk_buff *skb)
{
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
- struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
- struct ieee80211_tx_control control = {
- .sta = pubsta,
- };
struct ieee80211_txq *txq = NULL;
- struct txq_info *txqi;
- u8 ac;

if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) ||
+ (info->flags & IEEE80211_TX_INTFL_OFFCHAN_TX_OK) ||
(info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE))
- goto tx_normal;
+ return NULL;

if (!ieee80211_is_data(hdr->frame_control))
- goto tx_normal;
+ return NULL;

if (pubsta) {
u8 tid = skb->priority & IEEE80211_QOS_CTL_TID_MASK;
@@ -1263,57 +1560,34 @@ static void ieee80211_drv_tx(struct ieee80211_local *local,
}

if (!txq)
- goto tx_normal;
+ return NULL;

- ac = txq->ac;
- txqi = to_txq_info(txq);
- atomic_inc(&sdata->txqs_len[ac]);
- if (atomic_read(&sdata->txqs_len[ac]) >= local->hw.txq_ac_max_pending)
- netif_stop_subqueue(sdata->dev, ac);
-
- spin_lock_bh(&txqi->queue.lock);
- txqi->byte_cnt += skb->len;
- __skb_queue_tail(&txqi->queue, skb);
- spin_unlock_bh(&txqi->queue.lock);
-
- drv_wake_tx_queue(local, txqi);
-
- return;
-
-tx_normal:
- drv_tx(local, &control, skb);
+ return to_txq_info(txq);
}

struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
struct ieee80211_txq *txq)
{
struct ieee80211_local *local = hw_to_local(hw);
- struct ieee80211_sub_if_data *sdata = vif_to_sdata(txq->vif);
+ struct ieee80211_fq *fq = &local->fq;
+ struct ieee80211_tx_info *info;
struct txq_info *txqi = container_of(txq, struct txq_info, txq);
struct ieee80211_hdr *hdr;
struct sk_buff *skb = NULL;
- u8 ac = txq->ac;

- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);

if (test_bit(IEEE80211_TXQ_STOP, &txqi->flags))
goto out;

- skb = __skb_dequeue(&txqi->queue);
+ skb = ieee80211_txq_dequeue(local, txqi);
if (!skb)
goto out;

- txqi->byte_cnt -= skb->len;
-
- atomic_dec(&sdata->txqs_len[ac]);
- if (__netif_subqueue_stopped(sdata->dev, ac))
- ieee80211_propagate_queue_wake(local, sdata->vif.hw_queue[ac]);
-
hdr = (struct ieee80211_hdr *)skb->data;
if (txq->sta && ieee80211_is_data_qos(hdr->frame_control)) {
struct sta_info *sta = container_of(txq->sta, struct sta_info,
sta);
- struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);

hdr->seq_ctrl = ieee80211_tx_next_seq(sta, txq->tid);
if (test_bit(IEEE80211_TXQ_AMPDU, &txqi->flags))
@@ -1323,7 +1597,7 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
}

out:
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);

return skb;
}
@@ -1335,7 +1609,10 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
struct sk_buff_head *skbs,
bool txpending)
{
+ struct ieee80211_fq *fq = &local->fq;
+ struct ieee80211_tx_control control = {};
struct sk_buff *skb, *tmp;
+ struct txq_info *txqi;
unsigned long flags;

skb_queue_walk_safe(skbs, skb, tmp) {
@@ -1350,6 +1627,21 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
}
#endif

+ txqi = ieee80211_get_txq(local, vif, sta, skb);
+ if (txqi) {
+ info->control.vif = vif;
+
+ __skb_unlink(skb, skbs);
+
+ spin_lock_bh(&fq->lock);
+ ieee80211_txq_enqueue(local, txqi, skb);
+ spin_unlock_bh(&fq->lock);
+
+ drv_wake_tx_queue(local, txqi);
+
+ continue;
+ }
+
spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
if (local->queue_stop_reasons[q] ||
(!txpending && !skb_queue_empty(&local->pending[q]))) {
@@ -1392,9 +1684,10 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);

info->control.vif = vif;
+ control.sta = sta;

__skb_unlink(skb, skbs);
- ieee80211_drv_tx(local, vif, sta, skb);
+ drv_tx(local, &control, skb);
}

return true;
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 0319d6d4f863..cbcdf7cf9679 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -244,6 +244,9 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue)
struct ieee80211_sub_if_data *sdata;
int n_acs = IEEE80211_NUM_ACS;

+ if (local->ops->wake_tx_queue)
+ return;
+
if (local->hw.queues < IEEE80211_NUM_ACS)
n_acs = 1;

@@ -260,11 +263,6 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue)
for (ac = 0; ac < n_acs; ac++) {
int ac_queue = sdata->vif.hw_queue[ac];

- if (local->ops->wake_tx_queue &&
- (atomic_read(&sdata->txqs_len[ac]) >
- local->hw.txq_ac_max_pending))
- continue;
-
if (ac_queue == queue ||
(sdata->vif.cab_queue == queue &&
local->queue_stop_reasons[ac_queue] == 0 &&
@@ -352,6 +350,9 @@ static void __ieee80211_stop_queue(struct ieee80211_hw *hw, int queue,
if (__test_and_set_bit(reason, &local->queue_stop_reasons[queue]))
return;

+ if (local->ops->wake_tx_queue)
+ return;
+
if (local->hw.queues < IEEE80211_NUM_ACS)
n_acs = 1;

@@ -3392,8 +3393,11 @@ void ieee80211_init_tx_queue(struct ieee80211_sub_if_data *sdata,
struct sta_info *sta,
struct txq_info *txqi, int tid)
{
- skb_queue_head_init(&txqi->queue);
+ INIT_LIST_HEAD(&txqi->old_flows);
+ INIT_LIST_HEAD(&txqi->new_flows);
+ ieee80211_init_flow(&txqi->flow);
txqi->txq.vif = &sdata->vif;
+ txqi->flow.txqi = txqi;

if (sta) {
txqi->txq.sta = &sta->sta;
@@ -3414,9 +3418,9 @@ void ieee80211_txq_get_depth(struct ieee80211_txq *txq,
struct txq_info *txqi = to_txq_info(txq);

if (frame_cnt)
- *frame_cnt = txqi->queue.qlen;
+ *frame_cnt = txqi->backlog_packets;

if (byte_cnt)
- *byte_cnt = txqi->byte_cnt;
+ *byte_cnt = txqi->backlog_bytes;
}
EXPORT_SYMBOL(ieee80211_txq_get_depth);
--
2.1.4


2016-03-25 09:26:24

by Michal Kazior

Subject: [PATCH 2/2] mac80211: expose some txq/fq internals and knobs via debugfs

Makes it easier to debug, test and experiment.

Signed-off-by: Michal Kazior <[email protected]>
---
net/mac80211/debugfs.c | 86 +++++++++++++++++++++++++++++++++++++++++++
net/mac80211/debugfs_netdev.c | 29 ++++++++++++++-
net/mac80211/debugfs_sta.c | 46 +++++++++++++++++++++++
net/mac80211/tx.c | 7 +++-
4 files changed, 166 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 4ab5c522ceee..81d3f5a9910d 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -31,6 +31,30 @@ int mac80211_format_buffer(char __user *userbuf, size_t count,
return simple_read_from_buffer(userbuf, count, ppos, buf, res);
}

+static int mac80211_parse_buffer(const char __user *userbuf,
+ size_t count,
+ loff_t *ppos,
+ char *fmt, ...)
+{
+ va_list args;
+ char buf[DEBUGFS_FORMAT_BUFFER_SIZE] = {};
+ int res;
+
+ if (count > sizeof(buf))
+ return -EINVAL;
+
+ if (copy_from_user(buf, userbuf, count))
+ return -EFAULT;
+
+ buf[sizeof(buf) - 1] = '\0';
+
+ va_start(args, fmt);
+ res = vsscanf(buf, fmt, args);
+ va_end(args);
+
+ return count;
+}
+
#define DEBUGFS_READONLY_FILE_FN(name, fmt, value...) \
static ssize_t name## _read(struct file *file, char __user *userbuf, \
size_t count, loff_t *ppos) \
@@ -70,6 +94,59 @@ DEBUGFS_READONLY_FILE(wep_iv, "%#08x",
DEBUGFS_READONLY_FILE(rate_ctrl_alg, "%s",
local->rate_ctrl ? local->rate_ctrl->ops->name : "hw/driver");

+#define DEBUGFS_RW_FILE_FN(name, expr) \
+static ssize_t name## _write(struct file *file, \
+ const char __user *userbuf, \
+ size_t count, \
+ loff_t *ppos) \
+{ \
+ struct ieee80211_local *local = file->private_data; \
+ return expr; \
+}
+
+#define DEBUGFS_RW_FILE(name, expr, fmt, value...) \
+ DEBUGFS_READONLY_FILE_FN(name, fmt, value) \
+ DEBUGFS_RW_FILE_FN(name, expr) \
+ DEBUGFS_RW_FILE_OPS(name)
+
+#define DEBUGFS_RW_FILE_OPS(name) \
+static const struct file_operations name## _ops = { \
+ .read = name## _read, \
+ .write = name## _write, \
+ .open = simple_open, \
+ .llseek = generic_file_llseek, \
+}
+
+#define DEBUGFS_RW_EXPR_FQ(args...) \
+({ \
+ int res; \
+ res = mac80211_parse_buffer(userbuf, count, ppos, args); \
+ res; \
+})
+
+DEBUGFS_READONLY_FILE(fq_drop_overlimit, "%u",
+ local->fq.drop_overlimit);
+DEBUGFS_READONLY_FILE(fq_drop_codel, "%u",
+ local->fq.drop_codel);
+DEBUGFS_READONLY_FILE(fq_backlog, "%u",
+ local->fq.backlog);
+DEBUGFS_READONLY_FILE(fq_flows_cnt, "%u",
+ local->fq.flows_cnt);
+
+DEBUGFS_RW_FILE(fq_target,
+ DEBUGFS_RW_EXPR_FQ("%llu", &local->fq.cparams.target),
+ "%llu", local->fq.cparams.target);
+DEBUGFS_RW_FILE(fq_interval,
+ DEBUGFS_RW_EXPR_FQ("%llu", &local->fq.cparams.interval),
+ "%llu", local->fq.cparams.interval);
+DEBUGFS_RW_FILE(fq_quantum,
+ DEBUGFS_RW_EXPR_FQ("%u", &local->fq.quantum),
+ "%u", local->fq.quantum);
+DEBUGFS_RW_FILE(fq_txq_limit,
+ DEBUGFS_RW_EXPR_FQ("%u", &local->fq.txq_limit),
+ "%u", local->fq.txq_limit);
+
+
#ifdef CONFIG_PM
static ssize_t reset_write(struct file *file, const char __user *user_buf,
size_t count, loff_t *ppos)
@@ -254,6 +331,15 @@ void debugfs_hw_add(struct ieee80211_local *local)
DEBUGFS_ADD(user_power);
DEBUGFS_ADD(power);

+ DEBUGFS_ADD(fq_drop_overlimit);
+ DEBUGFS_ADD(fq_drop_codel);
+ DEBUGFS_ADD(fq_backlog);
+ DEBUGFS_ADD(fq_flows_cnt);
+ DEBUGFS_ADD(fq_target);
+ DEBUGFS_ADD(fq_interval);
+ DEBUGFS_ADD(fq_quantum);
+ DEBUGFS_ADD(fq_txq_limit);
+
statsd = debugfs_create_dir("statistics", phyd);

/* if the dir failed, don't put all the other things into the root! */
diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c
index 37ea30e0754c..39ae13d19387 100644
--- a/net/mac80211/debugfs_netdev.c
+++ b/net/mac80211/debugfs_netdev.c
@@ -30,7 +30,7 @@ static ssize_t ieee80211_if_read(
size_t count, loff_t *ppos,
ssize_t (*format)(const struct ieee80211_sub_if_data *, char *, int))
{
- char buf[70];
+ char buf[200];
ssize_t ret = -EINVAL;

read_lock(&dev_base_lock);
@@ -236,6 +236,32 @@ ieee80211_if_fmt_hw_queues(const struct ieee80211_sub_if_data *sdata,
}
IEEE80211_IF_FILE_R(hw_queues);

+static ssize_t
+ieee80211_if_fmt_txq(const struct ieee80211_sub_if_data *sdata,
+ char *buf, int buflen)
+{
+ struct txq_info *txqi;
+ int len = 0;
+
+ if (!sdata->vif.txq)
+ return 0;
+
+ txqi = to_txq_info(sdata->vif.txq);
+ len += scnprintf(buf + len, buflen - len,
+ "CAB backlog %ub %up flows %u drops %u overlimit %u collisions %u tx %ub %up\n",
+ txqi->backlog_bytes,
+ txqi->backlog_packets,
+ txqi->flows,
+ txqi->drop_codel,
+ txqi->drop_overlimit,
+ txqi->collisions,
+ txqi->tx_bytes,
+ txqi->tx_packets);
+
+ return len;
+}
+IEEE80211_IF_FILE_R(txq);
+
/* STA attributes */
IEEE80211_IF_FILE(bssid, u.mgd.bssid, MAC);
IEEE80211_IF_FILE(aid, u.mgd.aid, DEC);
@@ -618,6 +644,7 @@ static void add_common_files(struct ieee80211_sub_if_data *sdata)
DEBUGFS_ADD(rc_rateidx_vht_mcs_mask_2ghz);
DEBUGFS_ADD(rc_rateidx_vht_mcs_mask_5ghz);
DEBUGFS_ADD(hw_queues);
+ DEBUGFS_ADD(txq);
}

static void add_sta_files(struct ieee80211_sub_if_data *sdata)
diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index a39512f09f9e..7322fb098f4d 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -319,6 +319,51 @@ static ssize_t sta_vht_capa_read(struct file *file, char __user *userbuf,
}
STA_OPS(vht_capa);

+static ssize_t sta_txqs_read(struct file *file,
+ char __user *userbuf,
+ size_t count,
+ loff_t *ppos)
+{
+ struct sta_info *sta = file->private_data;
+ struct txq_info *txqi;
+ char *buf;
+ int buflen;
+ int len;
+ int res;
+ int i;
+
+ len = 0;
+ buflen = 200 * IEEE80211_NUM_TIDS;
+ buf = kzalloc(buflen, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
+ if (!sta->sta.txq[i])
+ break;
+
+ txqi = to_txq_info(sta->sta.txq[i]);
+ len += scnprintf(buf + len, buflen - len,
+ "TID %d AC %d backlog %ub %up flows %u drops %u overlimit %u collisions %u tx %ub %up\n",
+ i,
+ txqi->txq.ac,
+ txqi->backlog_bytes,
+ txqi->backlog_packets,
+ txqi->flows,
+ txqi->drop_codel,
+ txqi->drop_overlimit,
+ txqi->collisions,
+ txqi->tx_bytes,
+ txqi->tx_packets);
+ }
+
+ res = simple_read_from_buffer(userbuf, count, ppos, buf, len);
+ kfree(buf);
+
+ return res;
+}
+STA_OPS(txqs);
+

#define DEBUGFS_ADD(name) \
debugfs_create_file(#name, 0400, \
@@ -365,6 +410,7 @@ void ieee80211_sta_debugfs_add(struct sta_info *sta)
DEBUGFS_ADD(agg_status);
DEBUGFS_ADD(ht_capa);
DEBUGFS_ADD(vht_capa);
+ DEBUGFS_ADD(txqs);

DEBUGFS_ADD_COUNTER(rx_duplicates, rx_stats.num_duplicates);
DEBUGFS_ADD_COUNTER(rx_fragments, rx_stats.fragments);
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 202405ad344a..ded95b80c4f5 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -36,6 +36,11 @@
#include "rate.h"
#include "codel.h"

+static unsigned int fq_flows_cnt = 4096;
+module_param(fq_flows_cnt, uint, 0644);
+MODULE_PARM_DESC(fq_flows_cnt,
+ "Maximum number of txq fair queuing flows");
+
/* misc utils */

static inline void ieee80211_tx_stats(struct net_device *dev, u32 len)
@@ -1347,7 +1352,7 @@ int ieee80211_setup_flows(struct ieee80211_local *local)
memset(fq, 0, sizeof(fq[0]));
INIT_LIST_HEAD(&fq->backlogs);
spin_lock_init(&fq->lock);
- fq->flows_cnt = 4096;
+ fq->flows_cnt = max_t(u32, fq_flows_cnt, 1);
fq->perturbation = prandom_u32();
fq->quantum = 300;
fq->txq_limit = 8192;
--
2.1.4


2016-03-16 15:37:33

by Dave Taht

Subject: Re: [RFCv2 0/3] mac80211: implement fq codel

It is helpful to name the test files coherently in the flent tests, in
addition to using a directory structure and timestamp. It makes doing
comparison plots in data->add-other-open-data-files simpler. "-t
patched-mac-300mbps", for example.

Also, netperf from svn (maybe 2.7, don't remember) will restart udp_rr
after a packet loss in 250ms. Seeing a loss on UDP_RR and having it
stop for a while is "ok".
Dave Täht
Let's go make home routers and wifi faster! With better software!
https://www.gofundme.com/savewifi


On Wed, Mar 16, 2016 at 3:26 AM, Michal Kazior <[email protected]> wrote:
> On 16 March 2016 at 11:17, Michal Kazior <[email protected]> wrote:
>> Hi,
>>
>> Most notable changes:
> [...]
>> * ath10k proof-of-concept that uses the new tx
>> scheduling (will post results in separate
>> email)
>
> I'm attaching a bunch of tests I've done using flent. They are all
> "burst" tests with burst-ports=1 and burst-length=2. The testing
> topology is:
>
> AP ----> STA
> AP )) (( STA
> [veth]--[br]--[wlan] )) (( [wlan]
>
> You can notice that in some tests plot data gets cut off. There are 2
> problems I've identified:
> - excess drops (not a problem with the patchset; can be seen when
> there's no codel-in-mac or scheduling isn't used)
> - UDP_RR hangs (apparently the QCA99X0 I have hangs for a few hundred ms
> at times and doesn't Rx frames, causing UDP_RR to stop
> mid-way; confirmed with logs and sniffer; I haven't figured out *why*
> exactly, could be some hw/fw quirk)
>
> Let me know if you have questions or comments regarding my testing/results.
>
>
> Michał


Attachments:
cdf_comparison.png (85.16 kB)

2016-03-22 06:51:08

by Michal Kazior

Subject: Re: [Make-wifi-fast] [RFCv2 1/3] mac80211: implement fq_codel for software queuing

On 22 March 2016 at 02:35, David Lang <[email protected]> wrote:
> On Wed, 16 Mar 2016, Michal Kazior wrote:
>
>> Since 11n, aggregation has become important to get
>> the best out of txops. However, aggregation
>> inherently requires buffering and queuing. Once
>> variable medium conditions across different
>> associated stations are considered, it becomes
>> apparent that bufferbloat can't simply be fought
>> with qdiscs for wireless drivers.
>
> If the network is quiet enough, don't do any buffering, but in almost all
> situations you are going to need to buffer starting no later than the second
> packet you try to send.
>
> Don't try to make queueing occur, just deal with the queues that form
> naturally because you can't transmit data any faster (and work to keep them
> under some semblance of control)
[...]

This is what already happens. Queues typically start to build up when
hardware tx fifos/queues become busy, so by the time they become
available you might have a bunch of frames you can aggregate.

The patch is more about getting rid of qdiscs because it's inherently
hard to teach them how 802.11 aggregation works (per-station per-tid)
and the ever-changing nature of per-station tx conditions.

I'll update the commit log to better reflect what is being done.


Michał

2016-03-24 07:49:13

by Michal Kazior

Subject: Re: [RFCv2 2/3] ath10k: report per-station tx/rx rates to mac80211

On 24 March 2016 at 08:19, Mohammed Shafi Shajakhan
<[email protected]> wrote:
> Hi Michal,
>
> On Wed, Mar 16, 2016 at 11:17:57AM +0100, Michal Kazior wrote:
>> The rate control is offloaded by firmware so it's
>> challenging to provide an expected throughput
>> value for a given station.
>>
>> This approach is naive as it reports the last tx
>> rate used for a given station as provided by the
>> firmware stat event.
>>
>> This should be sufficient for airtime estimation
>> used for fq-codel-in-mac80211 tx scheduling
>> purposes now.
>>
>> This patch uses a very hacky way to get the stats.
>> This is sufficient for proof-of-concept but must
>> be cleaned up properly eventually.
>>
>> Signed-off-by: Michal Kazior <[email protected]>
>> ---
>> drivers/net/wireless/ath/ath10k/core.h | 5 +++
>> drivers/net/wireless/ath/ath10k/debug.c | 61 +++++++++++++++++++++++++++++----
>> drivers/net/wireless/ath/ath10k/mac.c | 26 ++++++++------
>> drivers/net/wireless/ath/ath10k/wmi.h | 2 +-
>> 4 files changed, 76 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
>> index 23ba03fb7a5f..3f76669d44cf 100644
>> --- a/drivers/net/wireless/ath/ath10k/core.h
>> +++ b/drivers/net/wireless/ath/ath10k/core.h
>> @@ -331,6 +331,9 @@ struct ath10k_sta {
>> /* protected by conf_mutex */
>> bool aggr_mode;
>> u64 rx_duration;
>> +
>> + u32 tx_rate_kbps;
>> + u32 rx_rate_kbps;
>> #endif
>> };
>>
>> @@ -372,6 +375,8 @@ struct ath10k_vif {
>> s8 def_wep_key_idx;
>>
>> u16 tx_seq_no;
>> + u32 tx_rate_kbps;
>> + u32 rx_rate_kbps;
>>
>> union {
>> struct {
>> diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
>> index 076d29b53ddf..cc7ebf04ae00 100644
>> --- a/drivers/net/wireless/ath/ath10k/debug.c
>> +++ b/drivers/net/wireless/ath/ath10k/debug.c
>> @@ -316,6 +316,58 @@ static void ath10k_debug_fw_stats_reset(struct ath10k *ar)
>> spin_unlock_bh(&ar->data_lock);
>> }
>>
>> +static void ath10k_mac_update_txrx_rate_iter(void *data,
>> + u8 *mac,
>> + struct ieee80211_vif *vif)
>> +{
>> + struct ath10k_fw_stats_peer *peer = data;
>> + struct ath10k_vif *arvif;
>> +
>> + if (memcmp(vif->addr, peer->peer_macaddr, ETH_ALEN))
>> + return;
>> +
>> + arvif = (void *)vif->drv_priv;
>> + arvif->tx_rate_kbps = peer->peer_tx_rate;
>> + arvif->rx_rate_kbps = peer->peer_rx_rate;
>> +}
>> +
>> +static void ath10k_mac_update_txrx_rate(struct ath10k *ar,
>> + struct ath10k_fw_stats *stats)
>> +{
>> + struct ieee80211_hw *hw = ar->hw;
>> + struct ath10k_fw_stats_peer *peer;
>> + struct ath10k_sta *arsta;
>> + struct ieee80211_sta *sta;
>> + const u8 *localaddr = NULL;
>> +
>> + rcu_read_lock();
>> +
>> + list_for_each_entry(peer, &stats->peers, list) {
>> + /* This doesn't account for multiple STA connected on different
>> + * vifs. Unfortunately there's no way to derive that from the available
>> + * information.
>> + */
>> + sta = ieee80211_find_sta_by_ifaddr(hw,
>> + peer->peer_macaddr,
>> + localaddr);
>> + if (!sta) {
>> + /* This tries to update multicast rates */
>> + ieee80211_iterate_active_interfaces_atomic(
>> + hw,
>> + IEEE80211_IFACE_ITER_NORMAL,
>> + ath10k_mac_update_txrx_rate_iter,
>> + peer);
>> + continue;
>> + }
>> +
>> + arsta = (void *)sta->drv_priv;
>> + arsta->tx_rate_kbps = peer->peer_tx_rate;
>> + arsta->rx_rate_kbps = peer->peer_rx_rate;
>> + }
>> +
>> + rcu_read_unlock();
>> +}
>> +
>> void ath10k_debug_fw_stats_process(struct ath10k *ar, struct sk_buff *skb)
>> {
>> struct ath10k_fw_stats stats = {};
>> @@ -335,6 +387,8 @@ void ath10k_debug_fw_stats_process(struct ath10k *ar, struct sk_buff *skb)
>> goto free;
>> }
>>
>> + ath10k_mac_update_txrx_rate(ar, &stats);
>> +
>> /* Stat data may exceed htc-wmi buffer limit. In such case firmware
>> * splits the stats data and delivers it in a ping-pong fashion of
>> * request cmd-update event.
>> @@ -351,13 +405,6 @@ void ath10k_debug_fw_stats_process(struct ath10k *ar, struct sk_buff *skb)
>> if (peer_stats_svc)
>> ath10k_sta_update_rx_duration(ar, &stats.peers);
>>
>> - if (ar->debug.fw_stats_done) {
>> - if (!peer_stats_svc)
>> - ath10k_warn(ar, "received unsolicited stats update event\n");
>> -
>> - goto free;
>> - }
>> -
>
> [shafi] As you had suggested previously, should we completely clean up this
> ping-pong response approach for f/w stats, or should it be retained to support
> backward compatibility and the ping-pong response when the user cats fw-stats
> (via debugfs)? (I did see in the commit message this needs to be cleaned up.)

I think it makes sense to remove the ping-pong logic and rely on
periodic updates alone, including fw_stats and ethstats handling.


>> - if (test_bit(WMI_SERVICE_PEER_STATS, ar->wmi.svc_map)) {
>> - param = ar->wmi.pdev_param->peer_stats_update_period;
>> - ret = ath10k_wmi_pdev_set_param(ar, param,
>> - PEER_DEFAULT_STATS_UPDATE_PERIOD);
>> - if (ret) {
>> - ath10k_warn(ar,
>> - "failed to set peer stats period : %d\n",
>> - ret);
>> - goto err_core_stop;
>> - }
>> + param = ar->wmi.pdev_param->peer_stats_update_period;
>> + ret = ath10k_wmi_pdev_set_param(ar, param,
>> + PEER_DEFAULT_STATS_UPDATE_PERIOD);
>> + if (ret) {
>> + ath10k_warn(ar,
>> + "failed to set peer stats period : %d\n",
>> + ret);
>> + goto err_core_stop;
>> }
>
> [shafi] If I am correct, this change requires 'PEER_STATS' to be enabled by
> default.

No, it does not. Periodic stats have been available since forever.


>> diff --git a/drivers/net/wireless/ath/ath10k/wmi.h b/drivers/net/wireless/ath/ath10k/wmi.h
>> index 4d3cbc44fcd2..2877a3a27b95 100644
>> --- a/drivers/net/wireless/ath/ath10k/wmi.h
>> +++ b/drivers/net/wireless/ath/ath10k/wmi.h
>> @@ -3296,7 +3296,7 @@ struct wmi_csa_event {
>> /* the definition of different PDEV parameters */
>> #define PDEV_DEFAULT_STATS_UPDATE_PERIOD 500
>> #define VDEV_DEFAULT_STATS_UPDATE_PERIOD 500
>> -#define PEER_DEFAULT_STATS_UPDATE_PERIOD 500
>> +#define PEER_DEFAULT_STATS_UPDATE_PERIOD 100
>
> [shafi] Is this for more granularity, since 500ms is not sufficient? I understand
> the firmware has a default stats_update_period of 500ms and I hope it supports
> 100ms as well. Also, if we are going to support periodic stats updates we may need
> to accumulate the information in the driver (like this change and rx_duration

The patch is used for rough rate estimation, which in turn is used to
keep Tx queues filled with only 1-2 txops worth of data. Signal
conditions can change vastly, so I figured I need the peer stat update
events to come more often. I didn't really verify whether they come
every 100ms. The patch has already served its job as a
proof-of-concept for smarter tx queuing.


> I will try to take this change, rebase it to TOT and see how it goes.

There's really no benefit in taking this patch as a basis for periodic
stat handling. The majority of this patch is just handling peer_tx_rate
for rate estimation purposes. Feel free to knock yourself out
ripping out the ping-pong stat stuff though :)


Michał

2016-03-17 11:12:59

by Bob Copeland

Subject: Re: [RFCv2 0/3] mac80211: implement fq codel

On Thu, Mar 17, 2016 at 09:55:03AM +0100, Michal Kazior wrote:
> If you consider that Wi-Fi is half-duplex and latency in the entire stack
> (for processing ICMP and UDP_RR) is greater than 11e contention window
> timings, you can get your BE flow responses with extra delay (since
> other queues might have responses ready quicker).

Got it, that makes sense. Thanks for the explanation!

--
Bob Copeland %% http://bobcopeland.com/

2016-03-21 17:10:41

by Dave Taht

Subject: Re: [RFCv2 0/3] mac80211: implement fq codel

thx.

a lot to digest.

A) quick notes on "flent-gui bursts_11e-2016-03-21T09*.gz"

1) the new bursts_11e test *should* have stuck stuff in the VI and VO
queues, and there *should* have been some sort of difference shown on
the plots with it. There wasn't.

For diffserv markings I used BE=CS0, BK=CS1, VI=CS5, and VO=EF.
CS6/CS7 should also land in VO (at least with the soft mac handler
last I looked). Is there a way to check if you are indeed exercising
all four 802.11e hardware queues in this test? in ath9k it is the
"xmit" sysfs var....

2) In all the old cases the BE UDP_RR flow died on the first burst
(why?), and the fullpatch preserved it. (I would have kind of hoped to
have seen the BK flow die, actually, in the fullpatch)

3) I am also confused on 802.11ac - can VO aggregate? (It can't in 802.11n.)


Attachments:
vivosame.png (270.14 kB)

2016-03-16 10:15:54

by Michal Kazior

Subject: [RFCv2 2/3] ath10k: report per-station tx/rx rates to mac80211

The rate control is offloaded by firmware so it's
challenging to provide an expected throughput
value for a given station.

This approach is naive as it reports the last tx
rate used for a given station as provided by the
firmware stat event.

This should be sufficient for airtime estimation
used for fq-codel-in-mac80211 tx scheduling
purposes for now.

This patch uses a very hacky way to get the stats.
This is sufficient for proof-of-concept but must
be cleaned up properly eventually.

Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/core.h | 5 +++
drivers/net/wireless/ath/ath10k/debug.c | 61 +++++++++++++++++++++++++++++----
drivers/net/wireless/ath/ath10k/mac.c | 26 ++++++++------
drivers/net/wireless/ath/ath10k/wmi.h | 2 +-
4 files changed, 76 insertions(+), 18 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 23ba03fb7a5f..3f76669d44cf 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -331,6 +331,9 @@ struct ath10k_sta {
/* protected by conf_mutex */
bool aggr_mode;
u64 rx_duration;
+
+ u32 tx_rate_kbps;
+ u32 rx_rate_kbps;
#endif
};

@@ -372,6 +375,8 @@ struct ath10k_vif {
s8 def_wep_key_idx;

u16 tx_seq_no;
+ u32 tx_rate_kbps;
+ u32 rx_rate_kbps;

union {
struct {
diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
index 076d29b53ddf..cc7ebf04ae00 100644
--- a/drivers/net/wireless/ath/ath10k/debug.c
+++ b/drivers/net/wireless/ath/ath10k/debug.c
@@ -316,6 +316,58 @@ static void ath10k_debug_fw_stats_reset(struct ath10k *ar)
spin_unlock_bh(&ar->data_lock);
}

+static void ath10k_mac_update_txrx_rate_iter(void *data,
+ u8 *mac,
+ struct ieee80211_vif *vif)
+{
+ struct ath10k_fw_stats_peer *peer = data;
+ struct ath10k_vif *arvif;
+
+ if (memcmp(vif->addr, peer->peer_macaddr, ETH_ALEN))
+ return;
+
+ arvif = (void *)vif->drv_priv;
+ arvif->tx_rate_kbps = peer->peer_tx_rate;
+ arvif->rx_rate_kbps = peer->peer_rx_rate;
+}
+
+static void ath10k_mac_update_txrx_rate(struct ath10k *ar,
+ struct ath10k_fw_stats *stats)
+{
+ struct ieee80211_hw *hw = ar->hw;
+ struct ath10k_fw_stats_peer *peer;
+ struct ath10k_sta *arsta;
+ struct ieee80211_sta *sta;
+ const u8 *localaddr = NULL;
+
+ rcu_read_lock();
+
+ list_for_each_entry(peer, &stats->peers, list) {
+ /* This doesn't account for multiple STA connected on different
+ * vifs. Unfortunately there's no way to derive that from the available
+ * information.
+ */
+ sta = ieee80211_find_sta_by_ifaddr(hw,
+ peer->peer_macaddr,
+ localaddr);
+ if (!sta) {
+ /* This tries to update multicast rates */
+ ieee80211_iterate_active_interfaces_atomic(
+ hw,
+ IEEE80211_IFACE_ITER_NORMAL,
+ ath10k_mac_update_txrx_rate_iter,
+ peer);
+ continue;
+ }
+
+ arsta = (void *)sta->drv_priv;
+ arsta->tx_rate_kbps = peer->peer_tx_rate;
+ arsta->rx_rate_kbps = peer->peer_rx_rate;
+ }
+
+ rcu_read_unlock();
+}
+
void ath10k_debug_fw_stats_process(struct ath10k *ar, struct sk_buff *skb)
{
struct ath10k_fw_stats stats = {};
@@ -335,6 +387,8 @@ void ath10k_debug_fw_stats_process(struct ath10k *ar, struct sk_buff *skb)
goto free;
}

+ ath10k_mac_update_txrx_rate(ar, &stats);
+
/* Stat data may exceed htc-wmi buffer limit. In such case firmware
* splits the stats data and delivers it in a ping-pong fashion of
* request cmd-update event.
@@ -351,13 +405,6 @@ void ath10k_debug_fw_stats_process(struct ath10k *ar, struct sk_buff *skb)
if (peer_stats_svc)
ath10k_sta_update_rx_duration(ar, &stats.peers);

- if (ar->debug.fw_stats_done) {
- if (!peer_stats_svc)
- ath10k_warn(ar, "received unsolicited stats update event\n");
-
- goto free;
- }
-
num_peers = ath10k_wmi_fw_stats_num_peers(&ar->debug.fw_stats.peers);
num_vdevs = ath10k_wmi_fw_stats_num_vdevs(&ar->debug.fw_stats.vdevs);
is_start = (list_empty(&ar->debug.fw_stats.pdevs) &&
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index ebff9c0a0784..addef9179dbe 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4427,16 +4427,14 @@ static int ath10k_start(struct ieee80211_hw *hw)

ar->ani_enabled = true;

- if (test_bit(WMI_SERVICE_PEER_STATS, ar->wmi.svc_map)) {
- param = ar->wmi.pdev_param->peer_stats_update_period;
- ret = ath10k_wmi_pdev_set_param(ar, param,
- PEER_DEFAULT_STATS_UPDATE_PERIOD);
- if (ret) {
- ath10k_warn(ar,
- "failed to set peer stats period : %d\n",
- ret);
- goto err_core_stop;
- }
+ param = ar->wmi.pdev_param->peer_stats_update_period;
+ ret = ath10k_wmi_pdev_set_param(ar, param,
+ PEER_DEFAULT_STATS_UPDATE_PERIOD);
+ if (ret) {
+ ath10k_warn(ar,
+ "failed to set peer stats period : %d\n",
+ ret);
+ goto err_core_stop;
}

ar->num_started_vdevs = 0;
@@ -7215,6 +7213,13 @@ ath10k_mac_op_switch_vif_chanctx(struct ieee80211_hw *hw,
return 0;
}

+static u32
+ath10k_mac_op_get_expected_throughput(struct ieee80211_sta *sta)
+{
+ struct ath10k_sta *arsta = (struct ath10k_sta *)sta->drv_priv;
+ return arsta->tx_rate_kbps;
+}
+
static const struct ieee80211_ops ath10k_ops = {
.tx = ath10k_mac_op_tx,
.wake_tx_queue = ath10k_mac_op_wake_tx_queue,
@@ -7254,6 +7259,7 @@ static const struct ieee80211_ops ath10k_ops = {
.assign_vif_chanctx = ath10k_mac_op_assign_vif_chanctx,
.unassign_vif_chanctx = ath10k_mac_op_unassign_vif_chanctx,
.switch_vif_chanctx = ath10k_mac_op_switch_vif_chanctx,
+ .get_expected_throughput = ath10k_mac_op_get_expected_throughput,

CFG80211_TESTMODE_CMD(ath10k_tm_cmd)

diff --git a/drivers/net/wireless/ath/ath10k/wmi.h b/drivers/net/wireless/ath/ath10k/wmi.h
index 4d3cbc44fcd2..2877a3a27b95 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.h
+++ b/drivers/net/wireless/ath/ath10k/wmi.h
@@ -3296,7 +3296,7 @@ struct wmi_csa_event {
/* the definition of different PDEV parameters */
#define PDEV_DEFAULT_STATS_UPDATE_PERIOD 500
#define VDEV_DEFAULT_STATS_UPDATE_PERIOD 500
-#define PEER_DEFAULT_STATS_UPDATE_PERIOD 500
+#define PEER_DEFAULT_STATS_UPDATE_PERIOD 100

struct wmi_pdev_param_map {
u32 tx_chain_mask;
--
2.1.4


2016-03-17 09:43:52

by Michal Kazior

[permalink] [raw]
Subject: Re: [RFCv2 0/3] mac80211: implement fq codel

I've re-tested selected cases with wmm_enabled=0 set on the DUT AP.
I'm attaching results.

Naming:
* "old-" is without mac/ath10k changes (referred to as kvalo-reverts
previously) and fq_codel on qdiscs,
* "patched-" is all patches applied (both mac and ath),
* "-be-bursts" is stock "bursts" flent test,
* "-all-bursts" is modified "bursts" flent test to burst on all 3
tids simultaneously: tid0(BE), tid1(BK), tid5(VI).


Michał

On 16 March 2016 at 19:36, Dave Taht <[email protected]> wrote:
> That is the sanest 802.11e queue behavior I have ever seen! (at both
> 6 and 300mbit! in the ath10k patched mac test)
>
> It would be good to add a flow to this test that exercises the VI
> queue (CS5 diffserv marking?), and to repeat this test with wmm
> disabled for comparison.
>
>
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> https://www.gofundme.com/savewifi
>
>
> On Wed, Mar 16, 2016 at 8:37 AM, Dave Taht <[email protected]> wrote:
>> it is helpful to name the test files coherently in the flent tests, in
>> addition to using a directory structure and timestamp. It makes doing
>> comparison plots in data->add-other-open-data-files simpler. "-t
>> patched-mac-300mbps", for example.
>>
>> Also netperf from svn (maybe 2.7, don't remember) will restart udp_rr
>> after a packet loss in 250ms. Seeing a loss on UDP_RR and it stop for
>> a while is "ok".
>> Dave Täht
>> Let's go make home routers and wifi faster! With better software!
>> https://www.gofundme.com/savewifi
>>
>>
>> On Wed, Mar 16, 2016 at 3:26 AM, Michal Kazior <[email protected]> wrote:
>>> On 16 March 2016 at 11:17, Michal Kazior <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> Most notable changes:
>>> [...]
>>>> * ath10k proof-of-concept that uses the new tx
>>>> scheduling (will post results in separate
>>>> email)
>>>
>>> I'm attaching a bunch of tests I've done using flent. They are all
>>> "burst" tests with burst-ports=1 and burst-length=2. The testing
>>> topology is:
>>>
>>> AP ----> STA
>>> AP )) (( STA
>>> [veth]--[br]--[wlan] )) (( [wlan]
>>>
>>> You can notice that in some tests plot data gets cut-off. There are 2
>>> problems I've identified:
>>> - excess drops (not a problem with the patchset and can be seen when
>>> there's no codel-in-mac or scheduling isn't used)
>>> - UDP_RR hangs (apparently QCA99X0 I have hangs for a few hundred ms
>>> sometimes at times and doesn't Rx frames causing UDP_RR to stop
>>> mid-way; confirmed with logs and sniffer; I haven't figured out *why*
>>> exactly, could be some hw/fw quirk)
>>>
>>> Let me know if you have questions or comments regarding my testing/results.
>>>
>>>
>>> Michał


Attachments:
bursts-2016-03-17T093033.443115.patched_all_bursts.flent.gz (13.52 kB)
bursts-2016-03-17T092946.721003.patched_be_bursts.flent.gz (13.46 kB)
bursts-2016-03-17T092445.132728.old_be_bursts.flent.gz (6.20 kB)
bursts-2016-03-17T091952.053950.old_all_bursts.flent.gz (5.33 kB)
patched-be-bursts.gif (17.54 kB)

2016-03-22 01:46:20

by David Lang

[permalink] [raw]
Subject: Re: [Make-wifi-fast] [RFCv2 1/3] mac80211: implement fq_codel for software queuing

On Wed, 16 Mar 2016, Michal Kazior wrote:

> Since 11n, aggregation has become important for
> getting the best out of txops. However,
> aggregation inherently requires buffering and
> queuing. Once variable medium conditions to
> different associated stations are considered, it
> becomes apparent that bufferbloat can't simply
> be fought with qdiscs for wireless drivers.

If the network is quiet enough, don't do any buffering, but in almost all
situations you are going to need to buffer starting no later than the second
packet you try to send.

Don't try to make queueing occur, just deal with the queues that form naturally
because you can't transmit data any faster (and work to keep them under some
semblance of control).

It's a tempting trap to try to fill the aggregates to transmit as
efficiently as possible, but don't skip transmitting because you don't have a
full aggregate; transmit what you have and if there is more, you'll catch it on
the next pass.

This is slightly less friendly to the network than waiting to see if you can
fill the aggregate in a quick fashion, but it produces the lowest latency
possible, and it deteriorates into the same state that you end up with if you
try to fill the aggregates as the load/congestion builds.

David Lang

>
> This bases on codel5 and sch_fq_codel.c. It may
> not be the Right Thing yet but it should at least
> provide a framework for more improvements.
>
> Signed-off-by: Michal Kazior <[email protected]>
> ---
> include/net/mac80211.h | 96 ++++++-
> net/mac80211/agg-tx.c | 8 +-
> net/mac80211/cfg.c | 2 +-
> net/mac80211/codel.h | 264 ++++++++++++++++++
> net/mac80211/codel_i.h | 89 ++++++
> net/mac80211/debugfs.c | 267 ++++++++++++++++++
> net/mac80211/ieee80211_i.h | 45 +++-
> net/mac80211/iface.c | 25 +-
> net/mac80211/main.c | 9 +-
> net/mac80211/rx.c | 2 +-
> net/mac80211/sta_info.c | 10 +-
> net/mac80211/sta_info.h | 27 ++
> net/mac80211/status.c | 64 +++++
> net/mac80211/tx.c | 658 ++++++++++++++++++++++++++++++++++++++++++---
> net/mac80211/util.c | 21 +-
> 15 files changed, 1503 insertions(+), 84 deletions(-)
> create mode 100644 net/mac80211/codel.h
> create mode 100644 net/mac80211/codel_i.h
>
> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
> index a53333cb1528..947d827f254b 100644
> --- a/include/net/mac80211.h
> +++ b/include/net/mac80211.h
> @@ -565,6 +565,16 @@ struct ieee80211_bss_conf {
> struct ieee80211_p2p_noa_attr p2p_noa_attr;
> };
>
> +/*
> + * struct codel_params - contains codel parameters
> + * @interval: initial drop rate
> + * @target: maximum persistent sojourn time
> + */
> +struct codel_params {
> + u64 interval;
> + u64 target;
> +};
> +
> /**
> * enum mac80211_tx_info_flags - flags to describe transmission information/status
> *
> @@ -853,6 +863,8 @@ ieee80211_rate_get_vht_nss(const struct ieee80211_tx_rate *rate)
> * @band: the band to transmit on (use for checking for races)
> * @hw_queue: HW queue to put the frame on, skb_get_queue_mapping() gives the AC
> * @ack_frame_id: internal frame ID for TX status, used internally
> + * @expected_duration: number of microseconds the stack expects this frame to
> + * take to tx. Used for fair queuing.
> * @control: union for control data
> * @status: union for status data
> * @driver_data: array of driver_data pointers
> @@ -865,11 +877,10 @@ ieee80211_rate_get_vht_nss(const struct ieee80211_tx_rate *rate)
> struct ieee80211_tx_info {
> /* common information */
> u32 flags;
> - u8 band;
> -
> - u8 hw_queue;
> -
> - u16 ack_frame_id;
> + u32 band:2,
> + hw_queue:5,
> + ack_frame_id:15,
> + expected_duration:10;
>
> union {
> struct {
> @@ -888,8 +899,18 @@ struct ieee80211_tx_info {
> /* only needed before rate control */
> unsigned long jiffies;
> };
> - /* NB: vif can be NULL for injected frames */
> - struct ieee80211_vif *vif;
> + union {
> + /* NB: vif can be NULL for injected frames */
> + struct ieee80211_vif *vif;
> +
> + /* When packets are enqueued on txq it's easy
> + * to re-construct the vif pointer. There's no
> + * more space in tx_info so it can be used to
> + * store the necessary enqueue time for packet
> + * sojourn time computation.
> + */
> + u64 enqueue_time;
> + };
> struct ieee80211_key_conf *hw_key;
> u32 flags;
> /* 4 bytes free */
> @@ -2114,8 +2135,8 @@ enum ieee80211_hw_flags {
> * @cipher_schemes: a pointer to an array of cipher scheme definitions
> * supported by HW.
> *
> - * @txq_ac_max_pending: maximum number of frames per AC pending in all txq
> - * entries for a vif.
> + * @txq_cparams: codel parameters to control tx queueing dropping behavior
> + * @txq_limit: maximum number of frames queued
> */
> struct ieee80211_hw {
> struct ieee80211_conf conf;
> @@ -2145,7 +2166,8 @@ struct ieee80211_hw {
> u8 uapsd_max_sp_len;
> u8 n_cipher_schemes;
> const struct ieee80211_cipher_scheme *cipher_schemes;
> - int txq_ac_max_pending;
> + struct codel_params txq_cparams;
> + u32 txq_limit;
> };
>
> static inline bool _ieee80211_hw_check(struct ieee80211_hw *hw,
> @@ -5633,6 +5655,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
> * txq state can change half-way of this function and the caller may end up
> * with "new" frame_cnt and "old" byte_cnt or vice-versa.
> *
> + * Moreover returned values are best-case, i.e. assuming queueing algorithm
> + * will not drop frames due to excess latency.
> + *
> * @txq: pointer obtained from station or virtual interface
> * @frame_cnt: pointer to store frame count
> * @byte_cnt: pointer to store byte count
> @@ -5640,4 +5665,55 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
> void ieee80211_txq_get_depth(struct ieee80211_txq *txq,
> unsigned long *frame_cnt,
> unsigned long *byte_cnt);
> +
> +/**
> + * ieee80211_recalc_fq_period - recalculate fair-queuing period
> + *
> + * This is used to alter the dropping rate to react to possibly changing
> + * (active) station-tid service period and air conditions.
> + *
> + * Driver which implement wake_tx_queue() but don't use ieee80211_tx_schedule()
> + * are encouraged to call this function periodically.
> + *
> + * @hw: pointer as obtained from ieee80211_alloc_hw()
> + */
> +void ieee80211_recalc_fq_period(struct ieee80211_hw *hw);
> +
> +/**
> + * ieee80211_tx_schedule - schedule next transmission burst
> + *
> + * This function can be (and should be, preferably) called by drivers that use
> + * wake_tx_queue op. It uses fq-codel like algorithm to maintain fairness.
> + *
> + * This function may call in back to driver (get_expected_throughput op) so
> + * be careful with locking.
> + *
> + * Driver should take care of serializing calls to this function. Otherwise
> + * fairness can't be guaranteed.
> + *
> + * This function returns the following values:
> + * -EBUSY Software queues are not empty yet. The function should
> + * not be called until after driver's next tx completion.
> + * -ENOENT Software queues are empty.
> + *
> + * @hw: pointer as obtained from ieee80211_alloc_hw()
> + * @wake: callback to driver to handle burst for given txq within given (byte)
> + * budget. The driver is expected to either call ieee80211_tx_dequeue() or
> + * use its internal queues (if any). The budget should be respected only
> + * for frames coming from ieee80211_tx_dequeue(). On termination it is
> + * expected to return number of frames put onto hw queue that were taken
> + * via ieee80211_tx_dequeue(). Frames from internal retry queues shall not
> + * be included in the returned count. If hw queues become/are busy/full
> + * the driver shall return a negative value which will prompt
> + * ieee80211_tx_schedule() to terminate. If hw queues become full after at
> + * least 1 frame dequeued via ieee80211_tx_dequeue() was sent the driver
> + * is free to report either number of sent frames up until that point or a
> + * negative value. The driver may return 0 if it wants to skip the txq
> + * (e.g. target station is in powersave).
> + */
> +int ieee80211_tx_schedule(struct ieee80211_hw *hw,
> + int (*wake)(struct ieee80211_hw *hw,
> + struct ieee80211_txq *txq,
> + int budget));
> +
> #endif /* MAC80211_H */
> diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
> index 4932e9f243a2..b9d0cee2a786 100644
> --- a/net/mac80211/agg-tx.c
> +++ b/net/mac80211/agg-tx.c
> @@ -194,17 +194,21 @@ static void
> ieee80211_agg_stop_txq(struct sta_info *sta, int tid)
> {
> struct ieee80211_txq *txq = sta->sta.txq[tid];
> + struct ieee80211_sub_if_data *sdata;
> + struct ieee80211_fq *fq;
> struct txq_info *txqi;
>
> if (!txq)
> return;
>
> txqi = to_txq_info(txq);
> + sdata = vif_to_sdata(txq->vif);
> + fq = &sdata->local->fq;
>
> /* Lock here to protect against further seqno updates on dequeue */
> - spin_lock_bh(&txqi->queue.lock);
> + spin_lock_bh(&fq->lock);
> set_bit(IEEE80211_TXQ_STOP, &txqi->flags);
> - spin_unlock_bh(&txqi->queue.lock);
> + spin_unlock_bh(&fq->lock);
> }
>
> static void
> diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
> index b37adb60c9cb..238d7bbd275e 100644
> --- a/net/mac80211/cfg.c
> +++ b/net/mac80211/cfg.c
> @@ -3029,7 +3029,7 @@ int ieee80211_attach_ack_skb(struct ieee80211_local *local, struct sk_buff *skb,
>
> spin_lock_irqsave(&local->ack_status_lock, spin_flags);
> id = idr_alloc(&local->ack_status_frames, ack_skb,
> - 1, 0x10000, GFP_ATOMIC);
> + 1, 0x8000, GFP_ATOMIC);
> spin_unlock_irqrestore(&local->ack_status_lock, spin_flags);
>
> if (id < 0) {
> diff --git a/net/mac80211/codel.h b/net/mac80211/codel.h
> new file mode 100644
> index 000000000000..e6470dbe5b0b
> --- /dev/null
> +++ b/net/mac80211/codel.h
> @@ -0,0 +1,264 @@
> +#ifndef __NET_MAC80211_CODEL_H
> +#define __NET_MAC80211_CODEL_H
> +
> +/*
> + * Codel - The Controlled-Delay Active Queue Management algorithm
> + *
> + * Copyright (C) 2011-2012 Kathleen Nichols <[email protected]>
> + * Copyright (C) 2011-2012 Van Jacobson <[email protected]>
> + * Copyright (C) 2016 Michael D. Taht <[email protected]>
> + * Copyright (C) 2012 Eric Dumazet <[email protected]>
> + * Copyright (C) 2015 Jonathan Morton <[email protected]>
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + * notice, this list of conditions, and the following disclaimer,
> + * without modification.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in the
> + * documentation and/or other materials provided with the distribution.
> + * 3. The names of the authors may not be used to endorse or promote products
> + * derived from this software without specific prior written permission.
> + *
> + * Alternatively, provided that this notice is retained in full, this
> + * software may be distributed under the terms of the GNU General
> + * Public License ("GPL") version 2, in which case the provisions of the
> + * GPL apply INSTEAD OF those given above.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> + * DAMAGE.
> + *
> + */
> +
> +#include <linux/version.h>
> +#include <linux/types.h>
> +#include <linux/ktime.h>
> +#include <linux/skbuff.h>
> +#include <net/pkt_sched.h>
> +#include <net/inet_ecn.h>
> +#include <linux/reciprocal_div.h>
> +
> +#include "codel_i.h"
> +
> +/* Controlling Queue Delay (CoDel) algorithm
> + * =========================================
> + * Source : Kathleen Nichols and Van Jacobson
> + * http://queue.acm.org/detail.cfm?id=2209336
> + *
> + * Implemented on linux by Dave Taht and Eric Dumazet
> + */
> +
> +/* CoDel5 uses a real clock, unlike codel */
> +
> +static inline u64 codel_get_time(void)
> +{
> + return ktime_get_ns();
> +}
> +
> +static inline u32 codel_time_to_us(u64 val)
> +{
> + do_div(val, NSEC_PER_USEC);
> + return (u32)val;
> +}
> +
> +/* sizeof_in_bits(rec_inv_sqrt) */
> +#define REC_INV_SQRT_BITS (8 * sizeof(u16))
> +/* needed shift to get a Q0.32 number from rec_inv_sqrt */
> +#define REC_INV_SQRT_SHIFT (32 - REC_INV_SQRT_BITS)
> +
> +/* Newton approximation method needs more iterations at small inputs,
> + * so cache them.
> + */
> +
> +static void codel_vars_init(struct codel_vars *vars)
> +{
> + memset(vars, 0, sizeof(*vars));
> +}
> +
> +/*
> + * http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Iterative_methods_for_reciprocal_square_roots
> + * new_invsqrt = (invsqrt / 2) * (3 - count * invsqrt^2)
> + *
> + * Here, invsqrt is a fixed point number (< 1.0), 32bit mantissa, aka Q0.32
> + */
> +static inline void codel_Newton_step(struct codel_vars *vars)
> +{
> + u32 invsqrt = ((u32)vars->rec_inv_sqrt) << REC_INV_SQRT_SHIFT;
> + u32 invsqrt2 = ((u64)invsqrt * invsqrt) >> 32;
> + u64 val = (3LL << 32) - ((u64)vars->count * invsqrt2);
> +
> + val >>= 2; /* avoid overflow in following multiply */
> + val = (val * invsqrt) >> (32 - 2 + 1);
> +
> + vars->rec_inv_sqrt = val >> REC_INV_SQRT_SHIFT;
> +}
> +
> +/*
> + * CoDel control_law is t + interval/sqrt(count)
> + * We maintain in rec_inv_sqrt the reciprocal value of sqrt(count) to avoid
> + * both sqrt() and divide operation.
> + */
> +static u64 codel_control_law(u64 t,
> + u64 interval,
> + u32 rec_inv_sqrt)
> +{
> + return t + reciprocal_scale(interval, rec_inv_sqrt <<
> + REC_INV_SQRT_SHIFT);
> +}
> +
> +/* Forward declaration of this for use elsewhere */
> +
> +static inline u64
> +custom_codel_get_enqueue_time(struct sk_buff *skb);
> +
> +static inline struct sk_buff *
> +custom_dequeue(struct codel_vars *vars, void *ptr);
> +
> +static inline void
> +custom_drop(struct sk_buff *skb, void *ptr);
> +
> +static bool codel_should_drop(struct sk_buff *skb,
> + __u32 *backlog,
> + __u32 backlog_thr,
> + struct codel_vars *vars,
> + const struct codel_params *p,
> + u64 now)
> +{
> + if (!skb) {
> + vars->first_above_time = 0;
> + return false;
> + }
> +
> + if (now - custom_codel_get_enqueue_time(skb) < p->target ||
> + *backlog <= backlog_thr) {
> + /* went below - stay below for at least interval */
> + vars->first_above_time = 0;
> + return false;
> + }
> +
> + if (vars->first_above_time == 0) {
> + /* just went above from below; mark the time */
> + vars->first_above_time = now + p->interval;
> +
> + } else if (now > vars->first_above_time) {
> + return true;
> + }
> +
> + return false;
> +}
> +
> +static struct sk_buff *codel_dequeue(void *ptr,
> + __u32 *backlog,
> + __u32 backlog_thr,
> + struct codel_vars *vars,
> + struct codel_params *p,
> + u64 now,
> + bool overloaded)
> +{
> + struct sk_buff *skb = custom_dequeue(vars, ptr);
> + bool drop;
> +
> + if (!skb) {
> + vars->dropping = false;
> + return skb;
> + }
> + drop = codel_should_drop(skb, backlog, backlog_thr, vars, p, now);
> + if (vars->dropping) {
> + if (!drop) {
> + /* sojourn time below target - leave dropping state */
> + vars->dropping = false;
> + } else if (now >= vars->drop_next) {
> + /* It's time for the next drop. Drop the current
> + * packet and dequeue the next. The dequeue might
> + * take us out of dropping state.
> + * If not, schedule the next drop.
> + * A large backlog might result in drop rates so high
> + * that the next drop should happen now,
> + * hence the while loop.
> + */
> +
> + /* saturating increment */
> + vars->count++;
> + if (!vars->count)
> + vars->count--;
> +
> + codel_Newton_step(vars);
> + vars->drop_next = codel_control_law(vars->drop_next,
> + p->interval,
> + vars->rec_inv_sqrt);
> + do {
> + if (INET_ECN_set_ce(skb) && !overloaded) {
> + vars->ecn_mark++;
> + /* and schedule the next drop */
> + vars->drop_next = codel_control_law(
> + vars->drop_next, p->interval,
> + vars->rec_inv_sqrt);
> + goto end;
> + }
> + custom_drop(skb, ptr);
> + vars->drop_count++;
> + skb = custom_dequeue(vars, ptr);
> + if (skb && !codel_should_drop(skb, backlog,
> + backlog_thr,
> + vars, p, now)) {
> + /* leave dropping state */
> + vars->dropping = false;
> + } else {
> + /* schedule the next drop */
> + vars->drop_next = codel_control_law(
> + vars->drop_next, p->interval,
> + vars->rec_inv_sqrt);
> + }
> + } while (skb && vars->dropping && now >=
> + vars->drop_next);
> +
> + /* Mark the packet regardless */
> + if (skb && INET_ECN_set_ce(skb))
> + vars->ecn_mark++;
> + }
> + } else if (drop) {
> + if (INET_ECN_set_ce(skb) && !overloaded) {
> + vars->ecn_mark++;
> + } else {
> + custom_drop(skb, ptr);
> + vars->drop_count++;
> +
> + skb = custom_dequeue(vars, ptr);
> + drop = codel_should_drop(skb, backlog, backlog_thr,
> + vars, p, now);
> + if (skb && INET_ECN_set_ce(skb))
> + vars->ecn_mark++;
> + }
> + vars->dropping = true;
> + /* if min went above target close to when we last went below
> + * assume that the drop rate that controlled the queue on the
> + * last cycle is a good starting point to control it now.
> + */
> + if (vars->count > 2 &&
> + now - vars->drop_next < 8 * p->interval) {
> + vars->count -= 2;
> + codel_Newton_step(vars);
> + } else {
> + vars->count = 1;
> + vars->rec_inv_sqrt = ~0U >> REC_INV_SQRT_SHIFT;
> + }
> + codel_Newton_step(vars);
> + vars->drop_next = codel_control_law(now, p->interval,
> + vars->rec_inv_sqrt);
> + }
> +end:
> + return skb;
> +}
> +#endif
> diff --git a/net/mac80211/codel_i.h b/net/mac80211/codel_i.h
> new file mode 100644
> index 000000000000..a7d23e45dee9
> --- /dev/null
> +++ b/net/mac80211/codel_i.h
> @@ -0,0 +1,89 @@
> +#ifndef __NET_MAC80211_CODEL_I_H
> +#define __NET_MAC80211_CODEL_I_H
> +
> +/*
> + * Codel - The Controlled-Delay Active Queue Management algorithm
> + *
> + * Copyright (C) 2011-2012 Kathleen Nichols <[email protected]>
> + * Copyright (C) 2011-2012 Van Jacobson <[email protected]>
> + * Copyright (C) 2016 Michael D. Taht <[email protected]>
> + * Copyright (C) 2012 Eric Dumazet <[email protected]>
> + * Copyright (C) 2015 Jonathan Morton <[email protected]>
> + * Copyright (C) 2016 Michal Kazior <[email protected]>
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + * notice, this list of conditions, and the following disclaimer,
> + * without modification.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in the
> + * documentation and/or other materials provided with the distribution.
> + * 3. The names of the authors may not be used to endorse or promote products
> + * derived from this software without specific prior written permission.
> + *
> + * Alternatively, provided that this notice is retained in full, this
> + * software may be distributed under the terms of the GNU General
> + * Public License ("GPL") version 2, in which case the provisions of the
> + * GPL apply INSTEAD OF those given above.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> + * DAMAGE.
> + *
> + */
> +
> +#include <linux/version.h>
> +#include <linux/types.h>
> +#include <linux/ktime.h>
> +#include <linux/skbuff.h>
> +#include <net/pkt_sched.h>
> +#include <net/inet_ecn.h>
> +#include <linux/reciprocal_div.h>
> +
> +/* Controlling Queue Delay (CoDel) algorithm
> + * =========================================
> + * Source : Kathleen Nichols and Van Jacobson
> + * http://queue.acm.org/detail.cfm?id=2209336
> + *
> + * Implemented on linux by Dave Taht and Eric Dumazet
> + */
> +
> +/* CoDel5 uses a real clock, unlike codel */
> +
> +#define MS2TIME(a) (a * (u64) NSEC_PER_MSEC)
> +#define US2TIME(a) (a * (u64) NSEC_PER_USEC)
> +
> +/**
> + * struct codel_vars - contains codel variables
> + * @count: how many drops we've done since the last time we
> + * entered dropping state
> + * @dropping: set to > 0 if in dropping state
> + * @rec_inv_sqrt: reciprocal value of sqrt(count) >> 1
> + * @first_above_time: when we went (or will go) continuously above target
> + * for interval
> + * @drop_next: time to drop next packet, or when we dropped last
> + * @drop_count: temp count of dropped packets in dequeue()
> + * @ecn_mark: number of packets we ECN marked instead of dropping
> + */
> +
> +struct codel_vars {
> + u32 count;
> + u16 dropping;
> + u16 rec_inv_sqrt;
> + u64 first_above_time;
> + u64 drop_next;
> + u16 drop_count;
> + u16 ecn_mark;
> +};
> +#endif
> diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
> index 4ab5c522ceee..9b0b8c3d23cd 100644
> --- a/net/mac80211/debugfs.c
> +++ b/net/mac80211/debugfs.c
> @@ -31,6 +31,30 @@ int mac80211_format_buffer(char __user *userbuf, size_t count,
> return simple_read_from_buffer(userbuf, count, ppos, buf, res);
> }
>
> +static int mac80211_parse_buffer(const char __user *userbuf,
> + size_t count,
> + loff_t *ppos,
> + char *fmt, ...)
> +{
> + va_list args;
> + char buf[DEBUGFS_FORMAT_BUFFER_SIZE] = {};
> + int res;
> +
> + if (count > sizeof(buf))
> + return -EINVAL;
> +
> + if (copy_from_user(buf, userbuf, count))
> + return -EFAULT;
> +
> + buf[sizeof(buf) - 1] = '\0';
> +
> + va_start(args, fmt);
> + res = vsscanf(buf, fmt, args);
> + va_end(args);
> +
> + return count;
> +}
> +
> #define DEBUGFS_READONLY_FILE_FN(name, fmt, value...) \
> static ssize_t name## _read(struct file *file, char __user *userbuf, \
> size_t count, loff_t *ppos) \
> @@ -70,6 +94,62 @@ DEBUGFS_READONLY_FILE(wep_iv, "%#08x",
> DEBUGFS_READONLY_FILE(rate_ctrl_alg, "%s",
> local->rate_ctrl ? local->rate_ctrl->ops->name : "hw/driver");
>
> +DEBUGFS_READONLY_FILE(fq_drop_overlimit, "%d",
> + local->fq.drop_overlimit);
> +DEBUGFS_READONLY_FILE(fq_drop_codel, "%d",
> + local->fq.drop_codel);
> +DEBUGFS_READONLY_FILE(fq_backlog, "%d",
> + local->fq.backlog);
> +DEBUGFS_READONLY_FILE(fq_in_flight_usec, "%d",
> + atomic_read(&local->fq.in_flight_usec));
> +DEBUGFS_READONLY_FILE(fq_txq_limit, "%d",
> + local->hw.txq_limit);
> +DEBUGFS_READONLY_FILE(fq_txq_interval, "%llu",
> + local->hw.txq_cparams.interval);
> +DEBUGFS_READONLY_FILE(fq_txq_target, "%llu",
> + local->hw.txq_cparams.target);
> +DEBUGFS_READONLY_FILE(fq_ave_period, "%d",
> + (int)ewma_fq_period_read(&local->fq.ave_period));
> +
> +#define DEBUGFS_RW_FILE_FN(name, expr) \
> +static ssize_t name## _write(struct file *file, \
> + const char __user *userbuf, \
> + size_t count, \
> + loff_t *ppos) \
> +{ \
> + struct ieee80211_local *local = file->private_data; \
> + return expr; \
> +}
> +
> +#define DEBUGFS_RW_FILE(name, expr, fmt, value...) \
> + DEBUGFS_READONLY_FILE_FN(name, fmt, value) \
> + DEBUGFS_RW_FILE_FN(name, expr) \
> + DEBUGFS_RW_FILE_OPS(name)
> +
> +#define DEBUGFS_RW_FILE_OPS(name) \
> +static const struct file_operations name## _ops = { \
> + .read = name## _read, \
> + .write = name## _write, \
> + .open = simple_open, \
> + .llseek = generic_file_llseek, \
> +};
> +
> +#define DEBUGFS_RW_EXPR_FQ(name) \
> +({ \
> + int res; \
> + res = mac80211_parse_buffer(userbuf, count, ppos, "%d", &name); \
> + ieee80211_recalc_fq_period(&local->hw); \
> + res; \
> +})
> +
> +DEBUGFS_RW_FILE(fq_min_txops_target, DEBUGFS_RW_EXPR_FQ(local->fq.min_txops_target), "%d", local->fq.min_txops_target);
> +DEBUGFS_RW_FILE(fq_max_txops_per_txq, DEBUGFS_RW_EXPR_FQ(local->fq.max_txops_per_txq), "%d", local->fq.max_txops_per_txq);
> +DEBUGFS_RW_FILE(fq_min_txops_per_hw, DEBUGFS_RW_EXPR_FQ(local->fq.min_txops_per_hw), "%d", local->fq.min_txops_per_hw);
> +DEBUGFS_RW_FILE(fq_max_txops_per_hw, DEBUGFS_RW_EXPR_FQ(local->fq.max_txops_per_hw), "%d", local->fq.max_txops_per_hw);
> +DEBUGFS_RW_FILE(fq_txop_mixed_usec, DEBUGFS_RW_EXPR_FQ(local->fq.txop_mixed_usec), "%d", local->fq.txop_mixed_usec);
> +DEBUGFS_RW_FILE(fq_txop_green_usec, DEBUGFS_RW_EXPR_FQ(local->fq.txop_green_usec), "%d", local->fq.txop_green_usec);
> +
> +
> #ifdef CONFIG_PM
> static ssize_t reset_write(struct file *file, const char __user *user_buf,
> size_t count, loff_t *ppos)
> @@ -177,8 +257,178 @@ static ssize_t queues_read(struct file *file, char __user *user_buf,
> return simple_read_from_buffer(user_buf, count, ppos, buf, res);
> }
>
> +static ssize_t fq_read(struct file *file, char __user *user_buf,
> + size_t count, loff_t *ppos)
> +{
> + struct ieee80211_local *local = file->private_data;
> + struct ieee80211_sub_if_data *sdata;
> + struct sta_info *sta;
> + struct txq_flow *flow;
> + struct txq_info *txqi;
> + void *buf;
> + int new_flows;
> + int old_flows;
> + int len;
> + int i;
> + int rv;
> + int res = 0;
> + static const u8 zeroaddr[ETH_ALEN];
> +
> + len = 32 * 1024;
> + buf = kzalloc(len, GFP_KERNEL);
> + if (!buf)
> + return -ENOMEM;
> +
> + spin_lock_bh(&local->fq.lock);
> + rcu_read_lock();
> +
> + list_for_each_entry(txqi, &local->fq.new_flows, flowchain) {
> + res += scnprintf(buf + res, len - res,
> + "sched new txqi vif %s sta %pM tid %d deficit %d\n",
> + container_of(txqi->txq.vif, struct ieee80211_sub_if_data, vif)->name,
> + txqi->txq.sta ? txqi->txq.sta->addr : zeroaddr,
> + txqi->txq.tid,
> + txqi->deficit);
> + }
> +
> + list_for_each_entry(txqi, &local->fq.old_flows, flowchain) {
> + res += scnprintf(buf + res, len - res,
> + "sched old txqi vif %s sta %pM tid %d deficit %d\n",
> + container_of(txqi->txq.vif, struct ieee80211_sub_if_data, vif)->name,
> + txqi->txq.sta ? txqi->txq.sta->addr : zeroaddr,
> + txqi->txq.tid,
> + txqi->deficit);
> + }
> +
> + list_for_each_entry_rcu(sta, &local->sta_list, list) {
> + for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
> + if (!sta->sta.txq[i])
> + continue;
> +
> + txqi = container_of(sta->sta.txq[i], struct txq_info, txq);
> + if (!txqi->backlog_bytes)
> + continue;
> +
> + new_flows = 0;
> + old_flows = 0;
> +
> + list_for_each_entry(flow, &txqi->new_flows, flowchain)
> + new_flows++;
> + list_for_each_entry(flow, &txqi->old_flows, flowchain)
> + old_flows++;
> +
> + res += scnprintf(buf + res, len - res,
> + "sta %pM tid %d backlog (%db %dp) flows (%d new %d old) burst %d bpu %d in-flight %d\n",
> + sta->sta.addr,
> + i,
> + txqi->backlog_bytes,
> + txqi->backlog_packets,
> + new_flows,
> + old_flows,
> + txqi->bytes_per_burst,
> + txqi->bytes_per_usec,
> + atomic_read(&txqi->in_flight_usec)
> + );
> +
> + flow = &txqi->flow;
> + res += scnprintf(buf + res, len - res,
> + "sta %pM def flow %p backlog (%db %dp)\n",
> + sta->sta.addr,
> + flow,
> + flow->backlog,
> + flow->queue.qlen
> + );
> +
> + list_for_each_entry(flow, &txqi->new_flows, flowchain)
> + res += scnprintf(buf + res, len - res,
> + "sta %pM tid %d new flow %p backlog (%db %dp)\n",
> + sta->sta.addr,
> + i,
> + flow,
> + flow->backlog,
> + flow->queue.qlen
> + );
> +
> + list_for_each_entry(flow, &txqi->old_flows, flowchain)
> + res += scnprintf(buf + res, len - res,
> + "sta %pM tid %d old flow %p backlog (%db %dp)\n",
> + sta->sta.addr,
> + i,
> + flow,
> + flow->backlog,
> + flow->queue.qlen
> + );
> + }
> + }
> +
> + list_for_each_entry_rcu(sdata, &local->interfaces, list) {
> + if (!sdata->vif.txq)
> + continue;
> +
> + txqi = container_of(sdata->vif.txq, struct txq_info, txq);
> + if (!txqi->backlog_bytes)
> + continue;
> +
> + new_flows = 0;
> + old_flows = 0;
> +
> + list_for_each_entry(flow, &txqi->new_flows, flowchain)
> + new_flows++;
> + list_for_each_entry(flow, &txqi->old_flows, flowchain)
> + old_flows++;
> +
> + res += scnprintf(buf + res, len - res,
> + "vif %s backlog (%db %dp) flows (%d new %d old) burst %d bpu %d in-flight %d\n",
> + sdata->name,
> + txqi->backlog_bytes,
> + txqi->backlog_packets,
> + new_flows,
> + old_flows,
> + txqi->bytes_per_burst,
> + txqi->bytes_per_usec,
> + atomic_read(&txqi->in_flight_usec)
> + );
> +
> + flow = &txqi->flow;
> + res += scnprintf(buf + res, len - res,
> + "vif %s def flow %p backlog (%db %dp)\n",
> + sdata->name,
> + flow,
> + flow->backlog,
> + flow->queue.qlen
> + );
> +
> + list_for_each_entry(flow, &txqi->new_flows, flowchain)
> + res += scnprintf(buf + res, len - res,
> + "vif %s new flow %p backlog (%db %dp)\n",
> + sdata->name,
> + flow,
> + flow->backlog,
> + flow->queue.qlen
> + );
> +
> + list_for_each_entry(flow, &txqi->old_flows, flowchain)
> + res += scnprintf(buf + res, len - res,
> + "vif %s old flow %p backlog (%db %dp)\n",
> + sdata->name,
> + flow,
> + flow->backlog,
> + flow->queue.qlen
> + );
> + }
> +
> + rcu_read_unlock();
> + spin_unlock_bh(&local->fq.lock);
> +
> + rv = simple_read_from_buffer(user_buf, count, ppos, buf, res);
> + kfree(buf);
> +
> + return rv;
> +}
> +
> DEBUGFS_READONLY_FILE_OPS(hwflags);
> DEBUGFS_READONLY_FILE_OPS(queues);
> +DEBUGFS_READONLY_FILE_OPS(fq);
>
> /* statistics stuff */
>
> @@ -247,6 +497,7 @@ void debugfs_hw_add(struct ieee80211_local *local)
> DEBUGFS_ADD(total_ps_buffered);
> DEBUGFS_ADD(wep_iv);
> DEBUGFS_ADD(queues);
> + DEBUGFS_ADD(fq);
> #ifdef CONFIG_PM
> DEBUGFS_ADD_MODE(reset, 0200);
> #endif
> @@ -254,6 +505,22 @@ void debugfs_hw_add(struct ieee80211_local *local)
> DEBUGFS_ADD(user_power);
> DEBUGFS_ADD(power);
>
> + DEBUGFS_ADD(fq_drop_overlimit);
> + DEBUGFS_ADD(fq_drop_codel);
> + DEBUGFS_ADD(fq_backlog);
> + DEBUGFS_ADD(fq_in_flight_usec);
> + DEBUGFS_ADD(fq_txq_limit);
> + DEBUGFS_ADD(fq_txq_interval);
> + DEBUGFS_ADD(fq_txq_target);
> + DEBUGFS_ADD(fq_ave_period);
> +
> + DEBUGFS_ADD(fq_min_txops_target);
> + DEBUGFS_ADD(fq_max_txops_per_txq);
> + DEBUGFS_ADD(fq_min_txops_per_hw);
> + DEBUGFS_ADD(fq_max_txops_per_hw);
> + DEBUGFS_ADD(fq_txop_mixed_usec);
> + DEBUGFS_ADD(fq_txop_green_usec);
> +
> statsd = debugfs_create_dir("statistics", phyd);
>
> /* if the dir failed, don't put all the other things into the root! */
> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
> index f1565ce35273..443c941d5917 100644
> --- a/net/mac80211/ieee80211_i.h
> +++ b/net/mac80211/ieee80211_i.h
> @@ -805,9 +805,18 @@ enum txq_info_flags {
> };
>
> struct txq_info {
> - struct sk_buff_head queue;
> + struct txq_flow flow;
> + struct list_head flowchain;
> + struct list_head new_flows;
> + struct list_head old_flows;
> + int backlog_bytes;
> + int backlog_packets;
> + int bytes_per_burst;
> + int bytes_per_usec;
> + int deficit;
> + int in_flight_delta_usec;
> + atomic_t in_flight_usec;
> unsigned long flags;
> - unsigned long byte_cnt;
>
> /* keep last! */
> struct ieee80211_txq txq;
> @@ -855,7 +864,6 @@ struct ieee80211_sub_if_data {
> bool control_port_no_encrypt;
> int encrypt_headroom;
>
> - atomic_t txqs_len[IEEE80211_NUM_ACS];
> struct ieee80211_tx_queue_params tx_conf[IEEE80211_NUM_ACS];
> struct mac80211_qos_map __rcu *qos_map;
>
> @@ -1092,11 +1100,37 @@ enum mac80211_scan_state {
> SCAN_ABORT,
> };
>
> +DECLARE_EWMA(fq_period, 16, 4)
> +
> +struct ieee80211_fq {
> + struct txq_flow *flows;
> + struct list_head backlogs;
> + struct list_head old_flows;
> + struct list_head new_flows;
> + struct ewma_fq_period ave_period;
> + spinlock_t lock;
> + atomic_t in_flight_usec;
> + int flows_cnt;
> + int perturbation;
> + int quantum;
> + int backlog;
> + int min_txops_target;
> + int max_txops_per_txq;
> + int min_txops_per_hw;
> + int max_txops_per_hw;
> + int txop_mixed_usec;
> + int txop_green_usec;
> +
> + int drop_overlimit;
> + int drop_codel;
> +};
> +
> struct ieee80211_local {
> /* embed the driver visible part.
> * don't cast (use the static inlines below), but we keep
> * it first anyway so they become a no-op */
> struct ieee80211_hw hw;
> + struct ieee80211_fq fq;
>
> const struct ieee80211_ops *ops;
>
> @@ -1928,6 +1962,11 @@ static inline bool ieee80211_can_run_worker(struct ieee80211_local *local)
> void ieee80211_init_tx_queue(struct ieee80211_sub_if_data *sdata,
> struct sta_info *sta,
> struct txq_info *txq, int tid);
> +void ieee80211_purge_txq(struct ieee80211_local *local, struct txq_info *txqi);
> +void ieee80211_init_flow(struct txq_flow *flow);
> +int ieee80211_setup_flows(struct ieee80211_local *local);
> +void ieee80211_teardown_flows(struct ieee80211_local *local);
> +
> void ieee80211_send_auth(struct ieee80211_sub_if_data *sdata,
> u16 transaction, u16 auth_alg, u16 status,
> const u8 *extra, size_t extra_len, const u8 *bssid,
> diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
> index 453b4e741780..d1063b50f12c 100644
> --- a/net/mac80211/iface.c
> +++ b/net/mac80211/iface.c
> @@ -779,6 +779,7 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
> bool going_down)
> {
> struct ieee80211_local *local = sdata->local;
> + struct ieee80211_fq *fq = &local->fq;
> unsigned long flags;
> struct sk_buff *skb, *tmp;
> u32 hw_reconf_flags = 0;
> @@ -977,12 +978,9 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
> if (sdata->vif.txq) {
> struct txq_info *txqi = to_txq_info(sdata->vif.txq);
>
> - spin_lock_bh(&txqi->queue.lock);
> - ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
> - txqi->byte_cnt = 0;
> - spin_unlock_bh(&txqi->queue.lock);
> -
> - atomic_set(&sdata->txqs_len[txqi->txq.ac], 0);
> + spin_lock_bh(&fq->lock);
> + ieee80211_purge_txq(local, txqi);
> + spin_unlock_bh(&fq->lock);
> }
>
> if (local->open_count == 0)
> @@ -1198,6 +1196,13 @@ static void ieee80211_if_setup(struct net_device *dev)
> dev->destructor = ieee80211_if_free;
> }
>
> +static void ieee80211_if_setup_no_queue(struct net_device *dev)
> +{
> + ieee80211_if_setup(dev);
> + dev->priv_flags |= IFF_NO_QUEUE;
> + /* Note for backporters: use dev->tx_queue_len = 0 instead of IFF_ */
> +}
> +
> static void ieee80211_iface_work(struct work_struct *work)
> {
> struct ieee80211_sub_if_data *sdata =
> @@ -1707,6 +1712,7 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
> struct net_device *ndev = NULL;
> struct ieee80211_sub_if_data *sdata = NULL;
> struct txq_info *txqi;
> + void (*if_setup)(struct net_device *dev);
> int ret, i;
> int txqs = 1;
>
> @@ -1734,12 +1740,17 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
> txq_size += sizeof(struct txq_info) +
> local->hw.txq_data_size;
>
> + if (local->ops->wake_tx_queue)
> + if_setup = ieee80211_if_setup_no_queue;
> + else
> + if_setup = ieee80211_if_setup;
> +
> if (local->hw.queues >= IEEE80211_NUM_ACS)
> txqs = IEEE80211_NUM_ACS;
>
> ndev = alloc_netdev_mqs(size + txq_size,
> name, name_assign_type,
> - ieee80211_if_setup, txqs, 1);
> + if_setup, txqs, 1);
> if (!ndev)
> return -ENOMEM;
> dev_net_set(ndev, wiphy_net(local->hw.wiphy));
> diff --git a/net/mac80211/main.c b/net/mac80211/main.c
> index 8190bf27ebff..9fd3b10ae52b 100644
> --- a/net/mac80211/main.c
> +++ b/net/mac80211/main.c
> @@ -1053,9 +1053,6 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
>
> local->dynamic_ps_forced_timeout = -1;
>
> - if (!local->hw.txq_ac_max_pending)
> - local->hw.txq_ac_max_pending = 64;
> -
> result = ieee80211_wep_init(local);
> if (result < 0)
> wiphy_debug(local->hw.wiphy, "Failed to initialize wep: %d\n",
> @@ -1087,6 +1084,10 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
>
> rtnl_unlock();
>
> + result = ieee80211_setup_flows(local);
> + if (result)
> + goto fail_flows;
> +
> #ifdef CONFIG_INET
> local->ifa_notifier.notifier_call = ieee80211_ifa_changed;
> result = register_inetaddr_notifier(&local->ifa_notifier);
> @@ -1112,6 +1113,8 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
> #if defined(CONFIG_INET) || defined(CONFIG_IPV6)
> fail_ifa:
> #endif
> + ieee80211_teardown_flows(local);
> + fail_flows:
> rtnl_lock();
> rate_control_deinitialize(local);
> ieee80211_remove_interfaces(local);
> diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
> index dc27becb9b71..70f8f7949bf2 100644
> --- a/net/mac80211/rx.c
> +++ b/net/mac80211/rx.c
> @@ -1268,7 +1268,7 @@ static void sta_ps_start(struct sta_info *sta)
> for (tid = 0; tid < ARRAY_SIZE(sta->sta.txq); tid++) {
> struct txq_info *txqi = to_txq_info(sta->sta.txq[tid]);
>
> - if (!skb_queue_len(&txqi->queue))
> + if (!txqi->backlog_packets)
> set_bit(tid, &sta->txq_buffered_tids);
> else
> clear_bit(tid, &sta->txq_buffered_tids);
> diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
> index 00c82fb152c0..0729046a0144 100644
> --- a/net/mac80211/sta_info.c
> +++ b/net/mac80211/sta_info.c
> @@ -112,11 +112,7 @@ static void __cleanup_single_sta(struct sta_info *sta)
> if (sta->sta.txq[0]) {
> for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
> struct txq_info *txqi = to_txq_info(sta->sta.txq[i]);
> - int n = skb_queue_len(&txqi->queue);
> -
> - ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
> - atomic_sub(n, &sdata->txqs_len[txqi->txq.ac]);
> - txqi->byte_cnt = 0;
> + ieee80211_purge_txq(local, txqi);
> }
> }
>
> @@ -1193,7 +1189,7 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta)
> for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
> struct txq_info *txqi = to_txq_info(sta->sta.txq[i]);
>
> - if (!skb_queue_len(&txqi->queue))
> + if (!txqi->backlog_packets)
> continue;
>
> drv_wake_tx_queue(local, txqi);
> @@ -1630,7 +1626,7 @@ ieee80211_sta_ps_deliver_response(struct sta_info *sta,
> for (tid = 0; tid < ARRAY_SIZE(sta->sta.txq); tid++) {
> struct txq_info *txqi = to_txq_info(sta->sta.txq[tid]);
>
> - if (!(tids & BIT(tid)) || skb_queue_len(&txqi->queue))
> + if (!(tids & BIT(tid)) || txqi->backlog_packets)
> continue;
>
> sta_info_recalc_tim(sta);
> diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
> index 053f5c4fa495..dd9d5f754c57 100644
> --- a/net/mac80211/sta_info.h
> +++ b/net/mac80211/sta_info.h
> @@ -19,6 +19,7 @@
> #include <linux/etherdevice.h>
> #include <linux/rhashtable.h>
> #include "key.h"
> +#include "codel_i.h"
>
> /**
> * enum ieee80211_sta_info_flags - Stations flags
> @@ -330,6 +331,32 @@ struct mesh_sta {
>
> DECLARE_EWMA(signal, 1024, 8)
>
> +struct txq_info;
> +
> +/**
> + * struct txq_flow - per traffic flow queue
> + *
> + * This structure is used to distinguish and queue different traffic flows
> + * separately for fair queueing/AQM purposes.
> + *
> + * @txqi: txq_info structure it is associated at given time
> + * @flowchain: can be linked to other flows for RR purposes
> + * @backlogchain: can be linked to other flows for backlog sorting purposes
> + * @queue: sk_buff queue
> + * @cvars: codel state vars
> + * @backlog: number of bytes pending in the queue
> + * @deficit: used for fair queueing balancing
> + */
> +struct txq_flow {
> + struct txq_info *txqi;
> + struct list_head flowchain;
> + struct list_head backlogchain;
> + struct sk_buff_head queue;
> + struct codel_vars cvars;
> + int backlog;
> + int deficit;
> +};
> +
> /**
> * struct sta_info - STA information
> *
> diff --git a/net/mac80211/status.c b/net/mac80211/status.c
> index 8b1b2ea03eb5..2cd898f8a658 100644
> --- a/net/mac80211/status.c
> +++ b/net/mac80211/status.c
> @@ -502,6 +502,67 @@ static void ieee80211_report_ack_skb(struct ieee80211_local *local,
> }
> }
>
> +static void ieee80211_report_txq_skb(struct ieee80211_local *local,
> + struct ieee80211_hdr *hdr,
> + struct sk_buff *skb)
> +{
> + struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
> + struct ieee80211_fq *fq = &local->fq;
> + struct ieee80211_sub_if_data *sdata;
> + struct ieee80211_txq *txq = NULL;
> + struct sta_info *sta;
> + struct txq_info *txqi;
> + struct rhash_head *tmp;
> + const struct bucket_table *tbl;
> + int tid;
> + __le16 fc = hdr->frame_control;
> + u8 *addr;
> + static const u8 zeroaddr[ETH_ALEN];
> +
> + if (!ieee80211_is_data(fc))
> + return;
> +
> + rcu_read_lock();
> +
> + tbl = rht_dereference_rcu(local->sta_hash.tbl, &local->sta_hash);
> + for_each_sta_info(local, tbl, hdr->addr1, sta, tmp) {
> + /* skip wrong virtual interface */
> + if (!ether_addr_equal(hdr->addr2, sta->sdata->vif.addr))
> + continue;
> +
> + tid = skb->priority & IEEE80211_QOS_CTL_TID_MASK;
> + txq = sta->sta.txq[tid];
> +
> + break;
> + }
> +
> + if (!txq) {
> + addr = ieee80211_get_DA(hdr);
> + if (is_multicast_ether_addr(addr)) {
> + sdata = ieee80211_sdata_from_skb(local, skb);
> + txq = sdata ? sdata->vif.txq : NULL;
> + }
> + }
> +
> + if (txq) {
> + txqi = container_of(txq, struct txq_info, txq);
> + atomic_sub(info->expected_duration, &txqi->in_flight_usec);
> + if (atomic_read(&txqi->in_flight_usec) < 0) {
> + WARN_ON_ONCE(1);
> + print_hex_dump(KERN_DEBUG, "skb: ", DUMP_PREFIX_OFFSET, 16, 1,
> + skb->data, skb->len, 0);
> + printk(KERN_DEBUG "underflow: txq tid %d sta %pM vif %s\n",
> + txq->tid,
> + txq->sta ? txq->sta->addr : zeroaddr,
> + container_of(txq->vif, struct ieee80211_sub_if_data, vif)->name);
> + }
> + }
> +
> + atomic_sub(info->expected_duration, &fq->in_flight_usec);
> +
> + rcu_read_unlock();
> +}
> +
> static void ieee80211_report_used_skb(struct ieee80211_local *local,
> struct sk_buff *skb, bool dropped)
> {
> @@ -512,6 +573,9 @@ static void ieee80211_report_used_skb(struct ieee80211_local *local,
> if (dropped)
> acked = false;
>
> + if (local->ops->wake_tx_queue)
> + ieee80211_report_txq_skb(local, hdr, skb);
> +
> if (info->flags & IEEE80211_TX_INTFL_MLME_CONN_TX) {
> struct ieee80211_sub_if_data *sdata;
>
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 6040c29a9e17..3072e460e82a 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -34,6 +34,7 @@
> #include "wpa.h"
> #include "wme.h"
> #include "rate.h"
> +#include "codel.h"
>
> /* misc utils */
>
> @@ -1232,27 +1233,335 @@ ieee80211_tx_prepare(struct ieee80211_sub_if_data *sdata,
> return TX_CONTINUE;
> }
>
> -static void ieee80211_drv_tx(struct ieee80211_local *local,
> - struct ieee80211_vif *vif,
> - struct ieee80211_sta *pubsta,
> - struct sk_buff *skb)
> +static inline u64
> +custom_codel_get_enqueue_time(struct sk_buff *skb)
> +{
> + return IEEE80211_SKB_CB(skb)->control.enqueue_time;
> +}
> +
> +static inline struct sk_buff *
> +flow_dequeue(struct ieee80211_local *local, struct txq_flow *flow)
> +{
> + struct ieee80211_fq *fq = &local->fq;
> + struct txq_info *txqi = flow->txqi;
> + struct txq_flow *i;
> + struct sk_buff *skb;
> +
> + skb = __skb_dequeue(&flow->queue);
> + if (!skb)
> + return NULL;
> +
> + txqi->backlog_bytes -= skb->len;
> + txqi->backlog_packets--;
> + flow->backlog -= skb->len;
> + fq->backlog--;
> +
> + if (flow->backlog == 0) {
> + list_del_init(&flow->backlogchain);
> + } else {
> + i = flow;
> +
> + list_for_each_entry_continue(i, &fq->backlogs, backlogchain)
> + if (i->backlog < flow->backlog)
> + break;
> +
> + list_move_tail(&flow->backlogchain, &i->backlogchain);
> + }
> +
> + return skb;
> +}
> +
> +static inline struct sk_buff *
> +custom_dequeue(struct codel_vars *vars, void *ptr)
> +{
> + struct txq_flow *flow = ptr;
> + struct txq_info *txqi = flow->txqi;
> + struct ieee80211_vif *vif = txqi->txq.vif;
> + struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
> + struct ieee80211_local *local = sdata->local;
> +
> + return flow_dequeue(local, flow);
> +}
> +
> +static inline void
> +custom_drop(struct sk_buff *skb, void *ptr)
> +{
> + struct txq_flow *flow = ptr;
> + struct txq_info *txqi = flow->txqi;
> + struct ieee80211_vif *vif = txqi->txq.vif;
> + struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
> + struct ieee80211_local *local = sdata->local;
> + struct ieee80211_hw *hw = &local->hw;
> +
> + ieee80211_free_txskb(hw, skb);
> + local->fq.drop_codel++;
> +}
> +
> +static u32 fq_hash(struct ieee80211_fq *fq, struct sk_buff *skb)
> +{
> + u32 hash = skb_get_hash_perturb(skb, fq->perturbation);
> + return reciprocal_scale(hash, fq->flows_cnt);
> +}
> +
> +static void fq_drop(struct ieee80211_local *local)
> +{
> + struct ieee80211_hw *hw = &local->hw;
> + struct ieee80211_fq *fq = &local->fq;
> + struct txq_flow *flow;
> + struct sk_buff *skb;
> +
> + flow = list_first_entry_or_null(&fq->backlogs, struct txq_flow,
> + backlogchain);
> + if (WARN_ON_ONCE(!flow))
> + return;
> +
> + skb = flow_dequeue(local, flow);
> + if (WARN_ON_ONCE(!skb))
> + return;
> +
> + ieee80211_free_txskb(hw, skb);
> + fq->drop_overlimit++;
> +}
> +
> +void ieee80211_init_flow(struct txq_flow *flow)
> +{
> + INIT_LIST_HEAD(&flow->flowchain);
> + INIT_LIST_HEAD(&flow->backlogchain);
> + __skb_queue_head_init(&flow->queue);
> + codel_vars_init(&flow->cvars);
> +}
> +
> +#define MIN_FQ_TARGET_USEC(fq) ((fq)->min_txops_target * (fq)->txop_mixed_usec)
> +
> +int ieee80211_setup_flows(struct ieee80211_local *local)
> +{
> + struct ieee80211_fq *fq = &local->fq;
> + int i;
> +
> + if (!local->ops->wake_tx_queue)
> + return 0;
> +
> + if (!local->hw.txq_limit)
> + local->hw.txq_limit = 8192;
> +
> + memset(fq, 0, sizeof(fq[0]));
> + INIT_LIST_HEAD(&fq->backlogs);
> + INIT_LIST_HEAD(&fq->old_flows);
> + INIT_LIST_HEAD(&fq->new_flows);
> + ewma_fq_period_init(&fq->ave_period);
> + atomic_set(&fq->in_flight_usec, 0);
> + spin_lock_init(&fq->lock);
> + fq->flows_cnt = 4096;
> + fq->perturbation = prandom_u32();
> + fq->quantum = 300;
> + fq->txop_mixed_usec = 5484;
> + fq->txop_green_usec = 10000;
> + fq->min_txops_target = 2;
> + fq->max_txops_per_txq = 1;
> + fq->min_txops_per_hw = 3;
> + fq->max_txops_per_hw = 4;
> +
> + if (!local->hw.txq_cparams.target)
> + local->hw.txq_cparams.target = US2TIME(MIN_FQ_TARGET_USEC(fq));
> +
> + if (!local->hw.txq_cparams.interval)
> + local->hw.txq_cparams.interval = MS2TIME(100);
> +
> + fq->flows = kzalloc(fq->flows_cnt * sizeof(fq->flows[0]), GFP_KERNEL);
> + if (!fq->flows)
> + return -ENOMEM;
> +
> + for (i = 0; i < fq->flows_cnt; i++)
> + ieee80211_init_flow(&fq->flows[i]);
> +
> + return 0;
> +}
> +
> +static void ieee80211_reset_flow(struct ieee80211_local *local,
> + struct txq_flow *flow)
> +{
> + if (!list_empty(&flow->flowchain))
> + list_del_init(&flow->flowchain);
> +
> + if (!list_empty(&flow->backlogchain))
> + list_del_init(&flow->backlogchain);
> +
> + ieee80211_purge_tx_queue(&local->hw, &flow->queue);
> +
> + flow->deficit = 0;
> + flow->txqi = NULL;
> +}
> +
> +void ieee80211_purge_txq(struct ieee80211_local *local, struct txq_info *txqi)
> +{
> + struct txq_flow *flow;
> + int i;
> +
> + for (i = 0; i < local->fq.flows_cnt; i++) {
> + flow = &local->fq.flows[i];
> +
> + if (flow->txqi != txqi)
> + continue;
> +
> + ieee80211_reset_flow(local, flow);
> + }
> +
> + ieee80211_reset_flow(local, &txqi->flow);
> +
> + txqi->backlog_bytes = 0;
> + txqi->backlog_packets = 0;
> +}
> +
> +void ieee80211_teardown_flows(struct ieee80211_local *local)
> +{
> + struct ieee80211_fq *fq = &local->fq;
> + struct ieee80211_sub_if_data *sdata;
> + struct sta_info *sta;
> + int i;
> +
> + if (!local->ops->wake_tx_queue)
> + return;
> +
> + list_for_each_entry_rcu(sta, &local->sta_list, list)
> + for (i = 0; i < IEEE80211_NUM_TIDS; i++)
> + if (sta->sta.txq[i])
> + ieee80211_purge_txq(local,
> + to_txq_info(sta->sta.txq[i]));
> +
> + list_for_each_entry_rcu(sdata, &local->interfaces, list)
> + if (sdata->vif.txq)
> + ieee80211_purge_txq(local, to_txq_info(sdata->vif.txq));
> +
> + for (i = 0; i < fq->flows_cnt; i++)
> + ieee80211_reset_flow(local, &fq->flows[i]);
> +
> + kfree(fq->flows);
> +
> + fq->flows = NULL;
> + fq->flows_cnt = 0;
> +}
> +
> +static void ieee80211_txq_enqueue(struct ieee80211_local *local,
> + struct txq_info *txqi,
> + struct sk_buff *skb)
> +{
> + struct ieee80211_fq *fq = &local->fq;
> + struct ieee80211_hw *hw = &local->hw;
> + struct txq_flow *flow;
> + struct txq_flow *i;
> + size_t idx = fq_hash(fq, skb);
> +
> + lockdep_assert_held(&fq->lock);
> +
> + flow = &fq->flows[idx];
> +
> + if (flow->txqi && flow->txqi != txqi)
> + flow = &txqi->flow;
> +
> + /* Storing enqueue_time overwrites the `vif` pointer (they share
> + * storage in the tx info); it is restored from the txq structure
> + * on dequeue.
> + */
> + IEEE80211_SKB_CB(skb)->control.enqueue_time = codel_get_time();
> +
> + flow->txqi = txqi;
> + flow->backlog += skb->len;
> + txqi->backlog_bytes += skb->len;
> + txqi->backlog_packets++;
> + fq->backlog++;
> +
> + if (list_empty(&flow->backlogchain))
> + list_add_tail(&flow->backlogchain, &fq->backlogs);
> +
> + i = flow;
> + list_for_each_entry_continue_reverse(i, &fq->backlogs, backlogchain)
> + if (i->backlog > flow->backlog)
> + break;
> +
> + list_move(&flow->backlogchain, &i->backlogchain);
> +
> + if (list_empty(&flow->flowchain)) {
> + flow->deficit = fq->quantum;
> + list_add_tail(&flow->flowchain, &txqi->new_flows);
> + }
> +
> + if (list_empty(&txqi->flowchain)) {
> + txqi->deficit = fq->quantum;
> + list_add_tail(&txqi->flowchain, &fq->new_flows);
> + }
> +
> + __skb_queue_tail(&flow->queue, skb);
> +
> + if (fq->backlog > hw->txq_limit)
> + fq_drop(local);
> +}
> +
> +static struct sk_buff *ieee80211_txq_dequeue(struct ieee80211_local *local,
> + struct txq_info *txqi)
> +{
> + struct ieee80211_fq *fq = &local->fq;
> + struct ieee80211_hw *hw = &local->hw;
> + struct txq_flow *flow;
> + struct list_head *head;
> + struct sk_buff *skb;
> +
> +begin:
> + head = &txqi->new_flows;
> + if (list_empty(head)) {
> + head = &txqi->old_flows;
> + if (list_empty(head))
> + return NULL;
> + }
> +
> + flow = list_first_entry(head, struct txq_flow, flowchain);
> +
> + if (flow->deficit <= 0) {
> + flow->deficit += fq->quantum;
> + list_move_tail(&flow->flowchain, &txqi->old_flows);
> + goto begin;
> + }
> +
> + skb = codel_dequeue(flow,
> + &flow->backlog,
> + txqi->bytes_per_burst,
> + &flow->cvars,
> + &hw->txq_cparams,
> + codel_get_time(),
> + false);
> + if (!skb) {
> + if ((head == &txqi->new_flows) &&
> + !list_empty(&txqi->old_flows)) {
> + list_move_tail(&flow->flowchain, &txqi->old_flows);
> + } else {
> + list_del_init(&flow->flowchain);
> + flow->txqi = NULL;
> + }
> + goto begin;
> + }
> +
> + flow->deficit -= skb->len;
> +
> + /* The `vif` pointer was overwritten with the enqueue time during
> + * enqueuing. Restore it before handing the frame to the driver.
> + */
> + IEEE80211_SKB_CB(skb)->control.vif = flow->txqi->txq.vif;
> +
> + return skb;
> +}
> +
> +static struct txq_info *
> +ieee80211_get_txq(struct ieee80211_local *local,
> + struct ieee80211_vif *vif,
> + struct ieee80211_sta *pubsta,
> + struct sk_buff *skb)
> {
> struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
> - struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
> struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
> - struct ieee80211_tx_control control = {
> - .sta = pubsta,
> - };
> struct ieee80211_txq *txq = NULL;
> - struct txq_info *txqi;
> - u8 ac;
>
> if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) ||
> (info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE))
> - goto tx_normal;
> + return NULL;
>
> if (!ieee80211_is_data(hdr->frame_control))
> - goto tx_normal;
> + return NULL;
>
> if (pubsta) {
> u8 tid = skb->priority & IEEE80211_QOS_CTL_TID_MASK;
> @@ -1263,57 +1572,48 @@ static void ieee80211_drv_tx(struct ieee80211_local *local,
> }
>
> if (!txq)
> - goto tx_normal;
> + return NULL;
>
> - ac = txq->ac;
> - txqi = to_txq_info(txq);
> - atomic_inc(&sdata->txqs_len[ac]);
> - if (atomic_read(&sdata->txqs_len[ac]) >= local->hw.txq_ac_max_pending)
> - netif_stop_subqueue(sdata->dev, ac);
> -
> - spin_lock_bh(&txqi->queue.lock);
> - txqi->byte_cnt += skb->len;
> - __skb_queue_tail(&txqi->queue, skb);
> - spin_unlock_bh(&txqi->queue.lock);
> -
> - drv_wake_tx_queue(local, txqi);
> -
> - return;
> -
> -tx_normal:
> - drv_tx(local, &control, skb);
> + return to_txq_info(txq);
> }
>
> +#define TXQI_BYTES_TO_USEC(txqi, bytes) \
> + DIV_ROUND_UP((bytes), max_t(int, 1, (txqi)->bytes_per_usec))
> +
> struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
> struct ieee80211_txq *txq)
> {
> struct ieee80211_local *local = hw_to_local(hw);
> - struct ieee80211_sub_if_data *sdata = vif_to_sdata(txq->vif);
> + struct ieee80211_fq *fq = &local->fq;
> + struct ieee80211_tx_info *info;
> struct txq_info *txqi = container_of(txq, struct txq_info, txq);
> struct ieee80211_hdr *hdr;
> struct sk_buff *skb = NULL;
> - u8 ac = txq->ac;
> + int duration_usec;
>
> - spin_lock_bh(&txqi->queue.lock);
> + spin_lock_bh(&fq->lock);
>
> if (test_bit(IEEE80211_TXQ_STOP, &txqi->flags))
> goto out;
>
> - skb = __skb_dequeue(&txqi->queue);
> + skb = ieee80211_txq_dequeue(local, txqi);
> if (!skb)
> goto out;
>
> - txqi->byte_cnt -= skb->len;
> + duration_usec = TXQI_BYTES_TO_USEC(txqi, skb->len);
> + duration_usec = min_t(int, BIT(10) - 1, duration_usec);
>
> - atomic_dec(&sdata->txqs_len[ac]);
> - if (__netif_subqueue_stopped(sdata->dev, ac))
> - ieee80211_propagate_queue_wake(local, sdata->vif.hw_queue[ac]);
> + info = IEEE80211_SKB_CB(skb);
> + info->expected_duration = duration_usec;
> +
> + txqi->in_flight_delta_usec += duration_usec;
> + atomic_add(duration_usec, &txqi->in_flight_usec);
> + atomic_add(duration_usec, &fq->in_flight_usec);
>
> hdr = (struct ieee80211_hdr *)skb->data;
> if (txq->sta && ieee80211_is_data_qos(hdr->frame_control)) {
> struct sta_info *sta = container_of(txq->sta, struct sta_info,
> sta);
> - struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
>
> hdr->seq_ctrl = ieee80211_tx_next_seq(sta, txq->tid);
> if (test_bit(IEEE80211_TXQ_AMPDU, &txqi->flags))
> @@ -1323,19 +1623,274 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
> }
>
> out:
> - spin_unlock_bh(&txqi->queue.lock);
> + spin_unlock_bh(&fq->lock);
>
> return skb;
> }
> EXPORT_SYMBOL(ieee80211_tx_dequeue);
>
> +static u16 ieee80211_get_txop_usec(struct ieee80211_local *local,
> + struct txq_info *txqi)
> +{
> + struct ieee80211_sub_if_data *sdata;
> + u16 txop_usec;
> +
> + sdata = container_of(txqi->txq.vif, struct ieee80211_sub_if_data, vif);
> + txop_usec = sdata->tx_conf[txqi->txq.ac].txop * 32;
> +
> + /* How to pick between mixed/greenfield txops? */
> + if (txop_usec == 0)
> + txop_usec = local->fq.txop_mixed_usec;
> +
> + return txop_usec;
> +}
> +
> +static u32 ieee80211_get_tput_kbps(struct ieee80211_local *local,
> + struct txq_info *txqi)
> +{
> + struct ieee80211_sub_if_data *sdata;
> + struct ieee80211_supported_band *sband;
> + struct ieee80211_chanctx_conf *chanctx_conf;
> + enum ieee80211_band band;
> + struct rate_control_ref *ref = NULL;
> + struct sta_info *sta;
> + int idx;
> + u32 tput;
> +
> + if (txqi->txq.sta) {
> + sta = container_of(txqi->txq.sta, struct sta_info, sta);
> +
> + if (test_sta_flag(sta, WLAN_STA_RATE_CONTROL))
> + ref = local->rate_ctrl;
> +
> + if (ref)
> + tput = ref->ops->get_expected_throughput(sta->rate_ctrl_priv);
> + else if (local->ops->get_expected_throughput)
> + tput = drv_get_expected_throughput(local, &sta->sta);
> + else
> + tput = 0;
> + } else {
> + sdata = container_of(txqi->txq.vif, struct ieee80211_sub_if_data, vif);
> +
> + rcu_read_lock();
> + chanctx_conf = rcu_dereference(sdata->vif.chanctx_conf);
> + band = chanctx_conf->def.chan->band;
> + rcu_read_unlock();
> +
> + sband = local->hw.wiphy->bands[band];
> + idx = sdata->vif.bss_conf.mcast_rate[band];
> + if (idx > 0) {
> + /* Convert units from 100Kbps and assume 20% MAC
> + * overhead, i.e. 80% efficiency.
> + */
> + tput = sband[band].bitrates[idx].bitrate * 100;
> + tput = (tput * 8) / 10;
> + } else {
> + tput = 1000;
> + }
> + }
> +
> + return tput;
> +}
> +
> +static void ieee80211_recalc_txqi_tput(struct ieee80211_local *local,
> + struct txq_info *txqi)
> +{
> + struct ieee80211_fq *fq = &local->fq;
> + int tput_kbps;
> + int txop_usec;
> +
> + lockdep_assert_held(&fq->lock);
> +
> + tput_kbps = ieee80211_get_tput_kbps(local, txqi);
> + txop_usec = ieee80211_get_txop_usec(local, txqi);
> + txqi->bytes_per_usec = max_t(int, 1, DIV_ROUND_UP(1024 * (tput_kbps/8),
> + USEC_PER_SEC));
> + txqi->bytes_per_burst = max_t(int, 1, txop_usec * txqi->bytes_per_usec);
> +}
> +
> +void ieee80211_recalc_fq_period(struct ieee80211_hw *hw)
> +{
> + struct ieee80211_local *local = hw_to_local(hw);
> + struct ieee80211_fq *fq = &local->fq;
> + struct txq_info *txqi;
> + int period = 0;
> + int target_usec;
> +
> + spin_lock_bh(&fq->lock);
> +
> + list_for_each_entry(txqi, &fq->new_flows, flowchain) {
> + ieee80211_recalc_txqi_tput(local, txqi);
> +
> + period += TXQI_BYTES_TO_USEC(txqi, min(txqi->backlog_bytes,
> + txqi->bytes_per_burst));
> + }
> +
> + list_for_each_entry(txqi, &fq->old_flows, flowchain) {
> + ieee80211_recalc_txqi_tput(local, txqi);
> +
> + period += TXQI_BYTES_TO_USEC(txqi, min(txqi->backlog_bytes,
> + txqi->bytes_per_burst));
> + }
> +
> + ewma_fq_period_add(&fq->ave_period, period);
> +
> + target_usec = ewma_fq_period_read(&fq->ave_period);
> + target_usec = max_t(u64, target_usec, MIN_FQ_TARGET_USEC(fq));
> + hw->txq_cparams.target = US2TIME(target_usec);
> +
> + spin_unlock_bh(&fq->lock);
> +}
> +EXPORT_SYMBOL(ieee80211_recalc_fq_period);
> +
> +static int ieee80211_tx_sched_budget(struct ieee80211_local *local,
> + struct txq_info *txqi)
> +{
> + int txop_usec;
> + int budget;
> +
> + /* XXX: Should this consider per-txq or per-sta in flight duration? */
> + txop_usec = ieee80211_get_txop_usec(local, txqi);
> + budget = local->fq.max_txops_per_txq * txop_usec;
> + budget -= atomic_read(&txqi->in_flight_usec);
> + budget = min(budget, txop_usec);
> + budget *= min_t(int, 1, txqi->bytes_per_usec);
> +
> + return budget;
> +}
> +
> +static void ieee80211_tx_sched_next_txqi(struct ieee80211_local *local,
> + struct list_head **list,
> + struct list_head **head)
> +{
> + struct ieee80211_fq *fq = &local->fq;
> +
> + if (!*list) {
> + *head = &fq->new_flows;
> + *list = *head;
> + }
> +
> + *list = (*list)->next;
> +
> + if (*list != *head)
> + return;
> +
> + if (*head == &fq->new_flows) {
> + *head = &fq->old_flows;
> + *list = *head;
> + ieee80211_tx_sched_next_txqi(local, list, head);
> + return;
> + }
> +
> + *head = NULL;
> + *list = NULL;
> +}
> +
> +int ieee80211_tx_schedule(struct ieee80211_hw *hw,
> + int (*wake)(struct ieee80211_hw *hw,
> + struct ieee80211_txq *txq,
> + int budget))
> +{
> + struct ieee80211_local *local = hw_to_local(hw);
> + struct ieee80211_fq *fq = &local->fq;
> + struct list_head *list = NULL;
> + struct list_head *head = NULL;
> + struct txq_info *txqi = NULL;
> + int min_in_flight_usec;
> + int max_in_flight_usec;
> + int in_flight_usec;
> + int ret = 0;
> + int budget;
> +
> + rcu_read_lock();
> + spin_lock_bh(&fq->lock);
> +
> + min_in_flight_usec = fq->min_txops_per_hw * fq->txop_mixed_usec;
> + max_in_flight_usec = fq->max_txops_per_hw * fq->txop_mixed_usec;
> + in_flight_usec = atomic_read(&fq->in_flight_usec);
> +
> + if (in_flight_usec >= min_in_flight_usec) {
> + ret = -EBUSY;
> + goto unlock;
> + }
> +
> + for (;;) {
> + if (in_flight_usec >= max_in_flight_usec) {
> + ret = -EBUSY;
> + break;
> + }
> +
> + if (list && list_is_last(list, &fq->old_flows)) {
> + ret = -EBUSY;
> + break;
> + }
> +
> + ieee80211_tx_sched_next_txqi(local, &list, &head);
> + if (!list) {
> + ret = -ENOENT;
> + break;
> + }
> +
> + txqi = list_entry(list, struct txq_info, flowchain);
> +
> + if (txqi->deficit < 0) {
> + txqi->deficit += fq->quantum;
> + list_move_tail(&txqi->flowchain, &fq->old_flows);
> + list = NULL;
> + continue;
> + }
> +
> + budget = ieee80211_tx_sched_budget(local, txqi);
> + txqi->in_flight_delta_usec = 0;
> +
> + spin_unlock_bh(&fq->lock);
> + ret = wake(hw, &txqi->txq, budget);
> + spin_lock_bh(&fq->lock);
> +
> + if (ret > 0) {
> + txqi->deficit -= txqi->in_flight_delta_usec;
> + in_flight_usec += txqi->in_flight_delta_usec;
> + }
> +
> + if (!txqi->backlog_bytes) {
> + if (head == &fq->new_flows && !list_empty(&fq->old_flows)) {
> + list_move_tail(&txqi->flowchain, &fq->old_flows);
> + } else {
> + list_del_init(&txqi->flowchain);
> + }
> +
> + list = NULL;
> + }
> +
> + if (ret < 0) {
> + ret = -EBUSY;
> + break;
> + } else if (ret == 0 && txqi) {
> + /* `list` is not reset to skip over */
> + continue;
> + }
> +
> + list = NULL;
> + }
> +
> +unlock:
> + spin_unlock_bh(&fq->lock);
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(ieee80211_tx_schedule);
> +
> static bool ieee80211_tx_frags(struct ieee80211_local *local,
> struct ieee80211_vif *vif,
> struct ieee80211_sta *sta,
> struct sk_buff_head *skbs,
> bool txpending)
> {
> + struct ieee80211_fq *fq = &local->fq;
> + struct ieee80211_tx_control control = {};
> struct sk_buff *skb, *tmp;
> + struct txq_info *txqi;
> unsigned long flags;
>
> skb_queue_walk_safe(skbs, skb, tmp) {
> @@ -1350,6 +1905,24 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
> }
> #endif
>
> + /* XXX: This changes behavior for offchan-tx. Is this really a
> + * problem with per-sta-tid queueing now?
> + */
> + txqi = ieee80211_get_txq(local, vif, sta, skb);
> + if (txqi) {
> + info->control.vif = vif;
> +
> + __skb_unlink(skb, skbs);
> +
> + spin_lock_bh(&fq->lock);
> + ieee80211_txq_enqueue(local, txqi, skb);
> + spin_unlock_bh(&fq->lock);
> +
> + drv_wake_tx_queue(local, txqi);
> +
> + continue;
> + }
> +
> spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
> if (local->queue_stop_reasons[q] ||
> (!txpending && !skb_queue_empty(&local->pending[q]))) {
> @@ -1392,9 +1965,10 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
> spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);
>
> info->control.vif = vif;
> + control.sta = sta;
>
> __skb_unlink(skb, skbs);
> - ieee80211_drv_tx(local, vif, sta, skb);
> + drv_tx(local, &control, skb);
> }
>
> return true;
> @@ -2381,7 +2955,7 @@ static struct sk_buff *ieee80211_build_hdr(struct ieee80211_sub_if_data *sdata,
>
> spin_lock_irqsave(&local->ack_status_lock, flags);
> id = idr_alloc(&local->ack_status_frames, ack_skb,
> - 1, 0x10000, GFP_ATOMIC);
> + 1, 0x8000, GFP_ATOMIC);
> spin_unlock_irqrestore(&local->ack_status_lock, flags);
>
> if (id >= 0) {
> diff --git a/net/mac80211/util.c b/net/mac80211/util.c
> index 0319d6d4f863..afb1bbf9b3f4 100644
> --- a/net/mac80211/util.c
> +++ b/net/mac80211/util.c
> @@ -244,6 +244,9 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue)
> struct ieee80211_sub_if_data *sdata;
> int n_acs = IEEE80211_NUM_ACS;
>
> + if (local->ops->wake_tx_queue)
> + return;
> +
> if (local->hw.queues < IEEE80211_NUM_ACS)
> n_acs = 1;
>
> @@ -260,11 +263,6 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue)
> for (ac = 0; ac < n_acs; ac++) {
> int ac_queue = sdata->vif.hw_queue[ac];
>
> - if (local->ops->wake_tx_queue &&
> - (atomic_read(&sdata->txqs_len[ac]) >
> - local->hw.txq_ac_max_pending))
> - continue;
> -
> if (ac_queue == queue ||
> (sdata->vif.cab_queue == queue &&
> local->queue_stop_reasons[ac_queue] == 0 &&
> @@ -352,6 +350,9 @@ static void __ieee80211_stop_queue(struct ieee80211_hw *hw, int queue,
> if (__test_and_set_bit(reason, &local->queue_stop_reasons[queue]))
> return;
>
> + if (local->ops->wake_tx_queue)
> + return;
> +
> if (local->hw.queues < IEEE80211_NUM_ACS)
> n_acs = 1;
>
> @@ -3392,8 +3393,12 @@ void ieee80211_init_tx_queue(struct ieee80211_sub_if_data *sdata,
> struct sta_info *sta,
> struct txq_info *txqi, int tid)
> {
> - skb_queue_head_init(&txqi->queue);
> + INIT_LIST_HEAD(&txqi->flowchain);
> + INIT_LIST_HEAD(&txqi->old_flows);
> + INIT_LIST_HEAD(&txqi->new_flows);
> + ieee80211_init_flow(&txqi->flow);
> txqi->txq.vif = &sdata->vif;
> + txqi->flow.txqi = txqi;
>
> if (sta) {
> txqi->txq.sta = &sta->sta;
> @@ -3414,9 +3419,9 @@ void ieee80211_txq_get_depth(struct ieee80211_txq *txq,
> struct txq_info *txqi = to_txq_info(txq);
>
> if (frame_cnt)
> - *frame_cnt = txqi->queue.qlen;
> + *frame_cnt = txqi->backlog_packets;
>
> if (byte_cnt)
> - *byte_cnt = txqi->byte_cnt;
> + *byte_cnt = txqi->backlog_bytes;
> }
> EXPORT_SYMBOL(ieee80211_txq_get_depth);
> --
> 2.1.4
>
> _______________________________________________
> Make-wifi-fast mailing list
> [email protected]
> https://lists.bufferbloat.net/listinfo/make-wifi-fast

2016-03-24 12:31:33

by Michal Kazior

[permalink] [raw]
Subject: Re: [RFCv2 2/3] ath10k: report per-station tx/rate rates to mac80211

On 24 March 2016 at 13:23, Mohammed Shafi Shajakhan
<[email protected]> wrote:
> On Thu, Mar 24, 2016 at 08:49:12AM +0100, Michal Kazior wrote:
>> On 24 March 2016 at 08:19, Mohammed Shafi Shajakhan
>> <[email protected]> wrote:
>> > Hi Michal,
>> >
>> > On Wed, Mar 16, 2016 at 11:17:57AM +0100, Michal Kazior wrote:
>> >> The rate control is offloaded by firmware so it's
>> >> challenging to provide expected throughput value
>> >> for given station.
>> >>
>> >> This approach is naive as it reports last tx rate
>> >> used for given station as provided by firmware
>> >> stat event.
>> >>
>> >> This should be sufficient for airtime estimation
>> >> used for fq-codel-in-mac80211 tx scheduling
>> >> purposes now.
>> >>
>> >> This patch uses a very hacky way to get the stats.
>> >> This is sufficient for proof-of-concept but must
>> >> be cleaned up properly eventually.
>> >>
>> >> Signed-off-by: Michal Kazior <[email protected]>
>> >> ---
>> >> drivers/net/wireless/ath/ath10k/core.h | 5 +++
>> >> drivers/net/wireless/ath/ath10k/debug.c | 61 +++++++++++++++++++++++++++++----
>> >> drivers/net/wireless/ath/ath10k/mac.c | 26 ++++++++------
>> >> drivers/net/wireless/ath/ath10k/wmi.h | 2 +-
>> >> 4 files changed, 76 insertions(+), 18 deletions(-)
>> >>
>> >> diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
>> >> index 23ba03fb7a5f..3f76669d44cf 100644
>> >> --- a/drivers/net/wireless/ath/ath10k/core.h
>> >> +++ b/drivers/net/wireless/ath/ath10k/core.h
>> >> @@ -331,6 +331,9 @@ struct ath10k_sta {
>> >> /* protected by conf_mutex */
>> >> bool aggr_mode;
>> >> u64 rx_duration;
>> >> +
>> >> + u32 tx_rate_kbps;
>> >> + u32 rx_rate_kbps;
>> >> #endif
>> >> };
>> >>
>> >> @@ -372,6 +375,8 @@ struct ath10k_vif {
>> >> s8 def_wep_key_idx;
>> >>
>> >> u16 tx_seq_no;
>> >> + u32 tx_rate_kbps;
>> >> + u32 rx_rate_kbps;
>> >>
>> >> union {
>> >> struct {
>> >> diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
>> >> index 076d29b53ddf..cc7ebf04ae00 100644
>> >> --- a/drivers/net/wireless/ath/ath10k/debug.c
>> >> +++ b/drivers/net/wireless/ath/ath10k/debug.c
>> >> @@ -316,6 +316,58 @@ static void ath10k_debug_fw_stats_reset(struct ath10k *ar)
>> >> spin_unlock_bh(&ar->data_lock);
>> >> }
>> >>
>> >> +static void ath10k_mac_update_txrx_rate_iter(void *data,
>> >> + u8 *mac,
>> >> + struct ieee80211_vif *vif)
>> >> +{
>> >> + struct ath10k_fw_stats_peer *peer = data;
>> >> + struct ath10k_vif *arvif;
>> >> +
>> >> + if (memcmp(vif->addr, peer->peer_macaddr, ETH_ALEN))
>> >> + return;
>> >> +
>> >> + arvif = (void *)vif->drv_priv;
>> >> + arvif->tx_rate_kbps = peer->peer_tx_rate;
>> >> + arvif->rx_rate_kbps = peer->peer_rx_rate;
>> >> +}
>> >> +
>> >> +static void ath10k_mac_update_txrx_rate(struct ath10k *ar,
>> >> + struct ath10k_fw_stats *stats)
>> >> +{
>> >> + struct ieee80211_hw *hw = ar->hw;
>> >> + struct ath10k_fw_stats_peer *peer;
>> >> + struct ath10k_sta *arsta;
>> >> + struct ieee80211_sta *sta;
>> >> + const u8 *localaddr = NULL;
>> >> +
>> >> + rcu_read_lock();
>> >> +
>> >> + list_for_each_entry(peer, &stats->peers, list) {
>> >> + /* This doesn't account for multiple STA connected on different
>> >> + * vifs. Unfortunately there's no way to derive that from the available
>> >> + * information.
>> >> + */
>> >> + sta = ieee80211_find_sta_by_ifaddr(hw,
>> >> + peer->peer_macaddr,
>> >> + localaddr);
>> >> + if (!sta) {
>> >> + /* This tries to update multicast rates */
>> >> + ieee80211_iterate_active_interfaces_atomic(
>> >> + hw,
>> >> + IEEE80211_IFACE_ITER_NORMAL,
>> >> + ath10k_mac_update_txrx_rate_iter,
>> >> + peer);
>> >> + continue;
>> >> + }
>> >> +
>> >> + arsta = (void *)sta->drv_priv;
>> >> + arsta->tx_rate_kbps = peer->peer_tx_rate;
>> >> + arsta->rx_rate_kbps = peer->peer_rx_rate;
>> >> + }
>> >> +
>> >> + rcu_read_unlock();
>> >> +}
>> >> +
>> >> void ath10k_debug_fw_stats_process(struct ath10k *ar, struct sk_buff *skb)
>> >> {
>> >> struct ath10k_fw_stats stats = {};
>> >> @@ -335,6 +387,8 @@ void ath10k_debug_fw_stats_process(struct ath10k *ar, struct sk_buff *skb)
>> >> goto free;
>> >> }
>> >>
>> >> + ath10k_mac_update_txrx_rate(ar, &stats);
>> >> +
>> >> /* Stat data may exceed htc-wmi buffer limit. In such case firmware
>> >> * splits the stats data and delivers it in a ping-pong fashion of
>> >> * request cmd-update event.
>> >> @@ -351,13 +405,6 @@ void ath10k_debug_fw_stats_process(struct ath10k *ar, struct sk_buff *skb)
>> >> if (peer_stats_svc)
>> >> ath10k_sta_update_rx_duration(ar, &stats.peers);
>> >>
>> >> - if (ar->debug.fw_stats_done) {
>> >> - if (!peer_stats_svc)
>> >> - ath10k_warn(ar, "received unsolicited stats update event\n");
>> >> -
>> >> - goto free;
>> >> - }
>> >> -
>> >
>> > [shafi] As you had suggested previously, should we completely clean up this ping
>> > - pong response approach for f/w stats, (or) this should be retained to support
>> > backward compatibility and also for supporting ping - pong response when user
>> > cats for fw-stats (via debugfs) (i did see in the commit message this needs to
>> > be cleaned up)
>>
>> I think it makes sense to remove the ping-pong logic and rely on
>> periodic updates alone, including fw_stats and ethstats handling.
>>
>>
>> >> - if (test_bit(WMI_SERVICE_PEER_STATS, ar->wmi.svc_map)) {
>> >> - param = ar->wmi.pdev_param->peer_stats_update_period;
>> >> - ret = ath10k_wmi_pdev_set_param(ar, param,
>> >> - PEER_DEFAULT_STATS_UPDATE_PERIOD);
>> >> - if (ret) {
>> >> - ath10k_warn(ar,
>> >> - "failed to set peer stats period : %d\n",
>> >> - ret);
>> >> - goto err_core_stop;
>> >> - }
>> >> + param = ar->wmi.pdev_param->peer_stats_update_period;
>> >> + ret = ath10k_wmi_pdev_set_param(ar, param,
>> >> + PEER_DEFAULT_STATS_UPDATE_PERIOD);
>> >> + if (ret) {
>> >> + ath10k_warn(ar,
>> >> + "failed to set peer stats period : %d\n",
>> >> + ret);
>> >> + goto err_core_stop;
>> >> }
>> >
>> > [shafi] If i am correct this change requires 'PEER_STATS' to be enabled by
>> > default.
>>
>> No, it does not. Periodic stats have been available since forever.
>
> [shafi] Michal, sorry i was talking about enabling WMI_PEER_STATS feature for
> 10.2, and we have a patch pushed recently to reduce the number of peers if
> 'WMI_PEER_STATS' feature is enabled(avoiding f/w crash due to memory
> constraints) . But this patch requires the feature to be
> turned ON always (with periodic stats update as well for every 100ms). Please
> correct me if my understanding is wrong.

Periodic stats and extended stats are two separate things.

WMI_PEER_STATS is a feature which prompts firmware to gather more
statistics and then report them to host via stat update event and
includes, e.g. rx_duration. Due to how rx_duration was designed in
firmware it needs to be combined with reading out the stats often to
make it usable. Periodic stats were used instead of explicit
ping-pong.

Periodic stats are just one of two ways (the other being ping-pong)
of asking firmware for a stat update event. They have been in
firmware since forever.


Michał

2016-03-25 09:26:22

by Michal Kazior

[permalink] [raw]
Subject: [PATCH 0/2] mac80211: implement fq_codel

Hi,

I've cleaned up and removed the
txop-queue-limiting and scheduling from the patch
(compared to my last RFC). It's still too early for
the scheduling thing to go prime time.

The fair queuing on the other hand does seem to
work. In good RF conditions it seems to improve
things (e.g. multiple TCP streams converge into a
steady average). In bad RF conditions things look
just as grim as before (but not worse).

I've done a few more experiments with naive DQL in
ath10k and some flent tests prove the fair queuing
in mac80211 works better than fq_codel qdisc as
far as wake_tx_queue drivers are concerned. I'll
be posting a separate thread after this.

This is based on mac80211-next/master
(0a87cadbb54e1595a5f64542adb4c63be914d290).


Michal Kazior (2):
mac80211: implement fair queuing per txq
mac80211: expose some txq/fq internals and knobs via debugfs

include/net/mac80211.h | 21 ++-
net/mac80211/agg-tx.c | 8 +-
net/mac80211/codel.h | 264 +++++++++++++++++++++++++++++
net/mac80211/codel_i.h | 89 ++++++++++
net/mac80211/debugfs.c | 86 ++++++++++
net/mac80211/debugfs_netdev.c | 29 +++-
net/mac80211/debugfs_sta.c | 46 +++++
net/mac80211/ieee80211_i.h | 45 ++++-
net/mac80211/iface.c | 24 ++-
net/mac80211/main.c | 9 +-
net/mac80211/rx.c | 2 +-
net/mac80211/sta_info.c | 10 +-
net/mac80211/sta_info.h | 27 +++
net/mac80211/tx.c | 384 +++++++++++++++++++++++++++++++++++++-----
net/mac80211/util.c | 20 ++-
15 files changed, 983 insertions(+), 81 deletions(-)
create mode 100644 net/mac80211/codel.h
create mode 100644 net/mac80211/codel_i.h

--
2.1.4


2016-03-22 08:05:35

by Michal Kazior

[permalink] [raw]
Subject: Re: [RFCv2 0/3] mac80211: implement fq codel

On 21 March 2016 at 18:10, Dave Taht <[email protected]> wrote:
> thx.
>
> a lot to digest.
>
> A) quick notes on "flent-gui bursts_11e-2016-03-21T09*.gz"
>
> 1) the new bursts_11e test *should* have stuck stuff in the VI and VO
> queues, and there *should* have been some sort of difference shown on
> the plots with it. There wasn't.

traffic-gen generates only BE traffic. Everything else runs UDP_RR
which doesn't generate a lot of traffic.


> For diffserv markings I used BE=CS0, BK=CS1, VI=CS5, and VO=EF.
> CS6/CS7 should also land in VO (at least with the soft mac handler
> last I looked). Is there a way to check if you are indeed exercising
> all four 802.11e hardware queues in this test? in ath9k it is the
> "xmit" sysfs var....

Hmm.. there are no txq stats. I guess it makes sense to have them?

There is /sys/kernel/debug/ieee80211/phy*/fq which dumps state of all
queues which will be mostly empty with UDP_RR. You can run netperf UDP
stream with diffserv marking to see onto which tid they are mapped.
You can see tid-AC mappings here:
https://wireless.wiki.kernel.org/en/developers/documentation/mac80211/queues

I just checked and EF ends up as tid5 which is VI. It's actually the
same as CS5. You can use CS7 to run on tid7 which is VO.


> 2) In all the old cases the BE UDP_RR flow died on the first burst
> (why?), and the fullpatch preserved it.

I think it's related to my setup which involves veth pairs. I use them
to simulate bridging/AP behavior but maybe it's not doing the job
right, hmm..


> (I would have kind of hoped to
> have seen the BK flow die, actually, in the fullpatch)

There's no extra weight priority to BK. The difference between BE and
BK in 802.11 is contention window access time so BK gets less txops
statistically. Both share the same txop, which is 5.484ms in most
cases.


> 3) I am also confused on 802.11ac - can VO aggregate? (it can't in 802.11n).

Yes, it should be able to, albeit VI and VO have shorter txops
compared to BE/BK: 3.008ms and 1.504ms respectively.

UDP_RR doesn't really create a lot of opportunities for aggregation.
If you want to see how different queues behave when loaded you'll need
to modify traffic-gen and add bursts across different ACs in the
bursts_11e test.


Michał

2016-03-17 17:24:52

by Rick Jones

[permalink] [raw]
Subject: Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel

On 03/17/2016 10:00 AM, Dave Taht wrote:
> netperf's udp_rr is not how much traffic conventionally behaves. It
> doesn't do tcp slow start or congestion control in particular...

Nor would one expect it to need to, unless one were using "burst mode"
to have more than one transaction inflight at one time.

And unless one uses the test-specific -e option to provide a very crude
retransmission mechanism based on a socket read timeout, neither does
UDP_RR recover from lost datagrams.

happy benchmarking,

rick jones
http://www.netperf.org/

2016-03-22 14:24:17

by Dave Taht

[permalink] [raw]
Subject: Re: [RFCv2 0/3] mac80211: implement fq codel

We have a huge cc list on this thread, and admittedly this work does
cut across a great deal of wireless, potentially, but does netdev need
to be on it?
there's been nothing codel specific on it in a while, so I cut those
from the cc.

On Tue, Mar 22, 2016 at 1:05 AM, Michal Kazior <[email protected]> wrote:
> On 21 March 2016 at 18:10, Dave Taht <[email protected]> wrote:
>> thx.
>>
>> a lot to digest.
>>
>> A) quick notes on "flent-gui bursts_11e-2016-03-21T09*.gz"
>>
>> 1) the new bursts_11e test *should* have stuck stuff in the VI and VO
>> queues, and there *should* have been some sort of difference shown on
>> the plots with it. There wasn't.
>
> traffic-gen generates only BE traffic. Everything else runs UDP_RR
> which doesn't generate a lot of traffic.
>
>
>> For diffserv markings I used BE=CS0, BK=CS1, VI=CS5, and VO=EF.
>> CS6/CS7 should also land in VO (at least with the soft mac handler
>> last I looked). Is there a way to check if you are indeed exercising
>> all four 802.11e hardware queues in this test? in ath9k it is the
>> "xmit" sysfs var....
>
> Hmm.. there are no txq stats. I guess it makes sense to have them?

ath9k xmit has been useful to capture. I'm kind of unconvinced those
stats are correct, at the moment, but...

> There is /sys/kernel/debug/ieee80211/phy*/fq which dumps state of all
> queues which will be mostly empty with UDP_RR. You can run netperf UDP
> stream with diffserv marking to see onto which tid they are mapped.
> You can see tid-AC mappings here:
> https://wireless.wiki.kernel.org/en/developers/documentation/mac80211/queues

We can try to capture those, but sampling summary per-station stats
ties back better to actual traffic analysis.

Also useful to capture has been the minstrel stats, the minstrel-blues
version provided these in a handy csv format.

> I just checked and EF ends up as tid5 which is VI. It's actually the
> same as CS5. You can use CS7 to run on tid7 which is VO.

The intent of CS6 is somewhat incompatible with VO's intent, but we
can argue diffserv's usefulness and mappings another day.

I have changed the bursts_11e test to use CS7, which will break
parsing our previous test runs' data, but actually test what I'd
intended to test in the first place.

>> 2) In all the old cases the BE UDP_RR flow died on the first burst
>> (why?), and the fullpatch preserved it.
>
> I think it's related to my setup which involves veth pairs. I use them
> to simulate bridging/AP behavior but maybe it's not doing the job
> right, hmm..
>
>
>> (I would have kind of hoped to
>> have seen the BK flow die, actually, in the fullpatch)
>
> There's no extra weight priority to BK. The difference between BE and
> BK in 802.11 is contention window access time so BK gets less txops
> statistically. Both share the same txop, which is 5.484ms in most
> cases.

Um, well, another day.

>
>> 3) I am also confused on 802.11ac - can VO aggregate? (it can't in 802.11n).
>
> Yes, it should be able to, albeit VI and VO have shorter txops
> compared to BE/BK: 3.008ms and 1.504ms respectively.

Not being able to aggregate in VO in n was a bad thing. There is an
awful lot I like about ac over n.

>
> UDP_RR doesn't really create a lot of opportunities for aggregation.
> If you want to see how different queues behave when loaded you'll need
> to modify traffic-gen and add bursts across different ACs in the
> bursts_11e test.

or flood the queues with other tests like rrul or toke's enhancement
to traffic-gen. :) I liked being able to arbitrarily mark udp packets
ecn capable...

>
>
> Michał

2016-03-31 10:26:22

by Michal Kazior

[permalink] [raw]
Subject: [PATCHv2 2/2] mac80211: expose some txq/fq internals and knobs via debugfs

Makes it easier to debug, test and experiment.

Signed-off-by: Michal Kazior <[email protected]>
---

Notes:
v2:
* add moduleparam.h to tx.c (fixes broken compilation with backports)

net/mac80211/debugfs.c | 86 +++++++++++++++++++++++++++++++++++++++++++
net/mac80211/debugfs_netdev.c | 29 ++++++++++++++-
net/mac80211/debugfs_sta.c | 46 +++++++++++++++++++++++
net/mac80211/tx.c | 8 +++-
4 files changed, 167 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 4ab5c522ceee..81d3f5a9910d 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -31,6 +31,30 @@ int mac80211_format_buffer(char __user *userbuf, size_t count,
return simple_read_from_buffer(userbuf, count, ppos, buf, res);
}

+static int mac80211_parse_buffer(const char __user *userbuf,
+ size_t count,
+ loff_t *ppos,
+ char *fmt, ...)
+{
+ va_list args;
+ char buf[DEBUGFS_FORMAT_BUFFER_SIZE] = {};
+ int res;
+
+ if (count > sizeof(buf))
+ return -EINVAL;
+
+ if (copy_from_user(buf, userbuf, count))
+ return -EFAULT;
+
+ buf[sizeof(buf) - 1] = '\0';
+
+ va_start(args, fmt);
+ res = vsscanf(buf, fmt, args);
+ va_end(args);
+
+ return count;
+}
+
#define DEBUGFS_READONLY_FILE_FN(name, fmt, value...) \
static ssize_t name## _read(struct file *file, char __user *userbuf, \
size_t count, loff_t *ppos) \
@@ -70,6 +94,59 @@ DEBUGFS_READONLY_FILE(wep_iv, "%#08x",
DEBUGFS_READONLY_FILE(rate_ctrl_alg, "%s",
local->rate_ctrl ? local->rate_ctrl->ops->name : "hw/driver");

+#define DEBUGFS_RW_FILE_FN(name, expr) \
+static ssize_t name## _write(struct file *file, \
+ const char __user *userbuf, \
+ size_t count, \
+ loff_t *ppos) \
+{ \
+ struct ieee80211_local *local = file->private_data; \
+ return expr; \
+}
+
+#define DEBUGFS_RW_FILE(name, expr, fmt, value...) \
+ DEBUGFS_READONLY_FILE_FN(name, fmt, value) \
+ DEBUGFS_RW_FILE_FN(name, expr) \
+ DEBUGFS_RW_FILE_OPS(name)
+
+#define DEBUGFS_RW_FILE_OPS(name) \
+static const struct file_operations name## _ops = { \
+ .read = name## _read, \
+ .write = name## _write, \
+ .open = simple_open, \
+ .llseek = generic_file_llseek, \
+}
+
+#define DEBUGFS_RW_EXPR_FQ(args...) \
+({ \
+ int res; \
+ res = mac80211_parse_buffer(userbuf, count, ppos, args); \
+ res; \
+})
+
+DEBUGFS_READONLY_FILE(fq_drop_overlimit, "%u",
+ local->fq.drop_overlimit);
+DEBUGFS_READONLY_FILE(fq_drop_codel, "%u",
+ local->fq.drop_codel);
+DEBUGFS_READONLY_FILE(fq_backlog, "%u",
+ local->fq.backlog);
+DEBUGFS_READONLY_FILE(fq_flows_cnt, "%u",
+ local->fq.flows_cnt);
+
+DEBUGFS_RW_FILE(fq_target,
+ DEBUGFS_RW_EXPR_FQ("%llu", &local->fq.cparams.target),
+ "%llu", local->fq.cparams.target);
+DEBUGFS_RW_FILE(fq_interval,
+ DEBUGFS_RW_EXPR_FQ("%llu", &local->fq.cparams.interval),
+ "%llu", local->fq.cparams.interval);
+DEBUGFS_RW_FILE(fq_quantum,
+ DEBUGFS_RW_EXPR_FQ("%u", &local->fq.quantum),
+ "%u", local->fq.quantum);
+DEBUGFS_RW_FILE(fq_txq_limit,
+ DEBUGFS_RW_EXPR_FQ("%u", &local->fq.txq_limit),
+ "%u", local->fq.txq_limit);
+
+
#ifdef CONFIG_PM
static ssize_t reset_write(struct file *file, const char __user *user_buf,
size_t count, loff_t *ppos)
@@ -254,6 +331,15 @@ void debugfs_hw_add(struct ieee80211_local *local)
DEBUGFS_ADD(user_power);
DEBUGFS_ADD(power);

+ DEBUGFS_ADD(fq_drop_overlimit);
+ DEBUGFS_ADD(fq_drop_codel);
+ DEBUGFS_ADD(fq_backlog);
+ DEBUGFS_ADD(fq_flows_cnt);
+ DEBUGFS_ADD(fq_target);
+ DEBUGFS_ADD(fq_interval);
+ DEBUGFS_ADD(fq_quantum);
+ DEBUGFS_ADD(fq_txq_limit);
+
statsd = debugfs_create_dir("statistics", phyd);

/* if the dir failed, don't put all the other things into the root! */
diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c
index 37ea30e0754c..39ae13d19387 100644
--- a/net/mac80211/debugfs_netdev.c
+++ b/net/mac80211/debugfs_netdev.c
@@ -30,7 +30,7 @@ static ssize_t ieee80211_if_read(
size_t count, loff_t *ppos,
ssize_t (*format)(const struct ieee80211_sub_if_data *, char *, int))
{
- char buf[70];
+ char buf[200];
ssize_t ret = -EINVAL;

read_lock(&dev_base_lock);
@@ -236,6 +236,32 @@ ieee80211_if_fmt_hw_queues(const struct ieee80211_sub_if_data *sdata,
}
IEEE80211_IF_FILE_R(hw_queues);

+static ssize_t
+ieee80211_if_fmt_txq(const struct ieee80211_sub_if_data *sdata,
+ char *buf, int buflen)
+{
+ struct txq_info *txqi;
+ int len = 0;
+
+ if (!sdata->vif.txq)
+ return 0;
+
+ txqi = to_txq_info(sdata->vif.txq);
+ len += scnprintf(buf + len, buflen - len,
+ "CAB backlog %ub %up flows %u drops %u overlimit %u collisions %u tx %ub %up\n",
+ txqi->backlog_bytes,
+ txqi->backlog_packets,
+ txqi->flows,
+ txqi->drop_codel,
+ txqi->drop_overlimit,
+ txqi->collisions,
+ txqi->tx_bytes,
+ txqi->tx_packets);
+
+ return len;
+}
+IEEE80211_IF_FILE_R(txq);
+
/* STA attributes */
IEEE80211_IF_FILE(bssid, u.mgd.bssid, MAC);
IEEE80211_IF_FILE(aid, u.mgd.aid, DEC);
@@ -618,6 +644,7 @@ static void add_common_files(struct ieee80211_sub_if_data *sdata)
DEBUGFS_ADD(rc_rateidx_vht_mcs_mask_2ghz);
DEBUGFS_ADD(rc_rateidx_vht_mcs_mask_5ghz);
DEBUGFS_ADD(hw_queues);
+ DEBUGFS_ADD(txq);
}

static void add_sta_files(struct ieee80211_sub_if_data *sdata)
diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index a39512f09f9e..7322fb098f4d 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -319,6 +319,51 @@ static ssize_t sta_vht_capa_read(struct file *file, char __user *userbuf,
}
STA_OPS(vht_capa);

+static ssize_t sta_txqs_read(struct file *file,
+ char __user *userbuf,
+ size_t count,
+ loff_t *ppos)
+{
+ struct sta_info *sta = file->private_data;
+ struct txq_info *txqi;
+ char *buf;
+ int buflen;
+ int len;
+ int res;
+ int i;
+
+ len = 0;
+ buflen = 200 * IEEE80211_NUM_TIDS;
+ buf = kzalloc(buflen, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
+ if (!sta->sta.txq[i])
+ break;
+
+ txqi = to_txq_info(sta->sta.txq[i]);
+ len += scnprintf(buf + len, buflen - len,
+ "TID %d AC %d backlog %ub %up flows %u drops %u overlimit %u collisions %u tx %ub %up\n",
+ i,
+ txqi->txq.ac,
+ txqi->backlog_bytes,
+ txqi->backlog_packets,
+ txqi->flows,
+ txqi->drop_codel,
+ txqi->drop_overlimit,
+ txqi->collisions,
+ txqi->tx_bytes,
+ txqi->tx_packets);
+ }
+
+ res = simple_read_from_buffer(userbuf, count, ppos, buf, len);
+ kfree(buf);
+
+ return res;
+}
+STA_OPS(txqs);
+

#define DEBUGFS_ADD(name) \
debugfs_create_file(#name, 0400, \
@@ -365,6 +410,7 @@ void ieee80211_sta_debugfs_add(struct sta_info *sta)
DEBUGFS_ADD(agg_status);
DEBUGFS_ADD(ht_capa);
DEBUGFS_ADD(vht_capa);
+ DEBUGFS_ADD(txqs);

DEBUGFS_ADD_COUNTER(rx_duplicates, rx_stats.num_duplicates);
DEBUGFS_ADD_COUNTER(rx_fragments, rx_stats.fragments);
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index dd65e34f7107..29ce9a110680 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -17,6 +17,7 @@
#include <linux/slab.h>
#include <linux/skbuff.h>
#include <linux/etherdevice.h>
+#include <linux/moduleparam.h>
#include <linux/bitmap.h>
#include <linux/rcupdate.h>
#include <linux/export.h>
@@ -36,6 +37,11 @@
#include "rate.h"
#include "codel.h"

+static unsigned int fq_flows_cnt = 4096;
+module_param(fq_flows_cnt, uint, 0644);
+MODULE_PARM_DESC(fq_flows_cnt,
+ "Maximum number of txq fair queuing flows");
+
/* misc utils */

static inline void ieee80211_tx_stats(struct net_device *dev, u32 len)
@@ -1347,7 +1353,7 @@ int ieee80211_setup_flows(struct ieee80211_local *local)
memset(fq, 0, sizeof(fq[0]));
INIT_LIST_HEAD(&fq->backlogs);
spin_lock_init(&fq->lock);
- fq->flows_cnt = 4096;
+ fq->flows_cnt = max_t(u32, fq_flows_cnt, 1);
fq->perturbation = prandom_u32();
fq->quantum = 300;
fq->txq_limit = 8192;
--
2.1.4


2016-03-17 09:03:47

by Michal Kazior

[permalink] [raw]
Subject: Re: [RFCv2 0/3] mac80211: implement fq codel

On 16 March 2016 at 16:37, Dave Taht <[email protected]> wrote:
> it is helpful to name the test files coherently in the flent tests, in
> addition to using a directory structure and timestamp. It makes doing
> comparison plots in data->add-other-open-data-files simpler. "-t
> patched-mac-300mbps", for example.

Sorry. I'm still trying to figure out what variables are worth
considering for comparison purposes.


> Also netperf from svn (maybe 2.7, don't remember) will restart udp_rr
> after a packet loss in 250ms. Seeing a loss on UDP_RR and it stop for
> a while is "ok".

I'm using 2.6 straight out of debian repos so yeah. I guess I'll try
using more recent netperf if I can't figure out the hiccups.


Michał


> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> https://www.gofundme.com/savewifi
>
>
> On Wed, Mar 16, 2016 at 3:26 AM, Michal Kazior <[email protected]> wrote:
>> On 16 March 2016 at 11:17, Michal Kazior <[email protected]> wrote:
>>> Hi,
>>>
>>> Most notable changes:
>> [...]
>>> * ath10k proof-of-concept that uses the new tx
>>> scheduling (will post results in separate
>>> email)
>>
>> I'm attaching a bunch of tests I've done using flent. They are all
>> "burst" tests with burst-ports=1 and burst-length=2. The testing
>> topology is:
>>
>> AP ----> STA
>> AP )) (( STA
>> [veth]--[br]--[wlan] )) (( [wlan]
>>
>> You can notice that in some tests plot data gets cut-off. There are 2
>> problems I've identified:
>> - excess drops (not a problem with the patchset and can be seen when
>> there's no codel-in-mac or scheduling isn't used)
>> - UDP_RR hangs (apparently QCA99X0 I have hangs for a few hundred ms
>> at times and doesn't Rx frames, causing UDP_RR to stop
>> mid-way; confirmed with logs and sniffer; I haven't figured out *why*
>> exactly, could be some hw/fw quirk)
>>
>> Let me know if you have questions or comments regarding my testing/results.
>>
>>
>> Michał

2016-03-16 18:36:32

by Dave Taht

[permalink] [raw]
Subject: Re: [RFCv2 0/3] mac80211: implement fq codel

That is the sanest 802.11e queue behavior I have ever seen! (at both
6 and 300mbit! in the ath10k patched mac test)

It would be good to add a flow to this test that exercises the VI
queue (CS5 diffserv marking?), and to repeat this test with wmm
disabled for comparison.


Dave Täht
Let's go make home routers and wifi faster! With better software!
https://www.gofundme.com/savewifi


On Wed, Mar 16, 2016 at 8:37 AM, Dave Taht <[email protected]> wrote:
> it is helpful to name the test files coherently in the flent tests, in
> addition to using a directory structure and timestamp. It makes doing
> comparison plots in data->add-other-open-data-files simpler. "-t
> patched-mac-300mbps", for example.
>
> Also netperf from svn (maybe 2.7, don't remember) will restart udp_rr
> after a packet loss in 250ms. Seeing a loss on UDP_RR and it stop for
> a while is "ok".
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> https://www.gofundme.com/savewifi
>
>
> On Wed, Mar 16, 2016 at 3:26 AM, Michal Kazior <[email protected]> wrote:
>> On 16 March 2016 at 11:17, Michal Kazior <[email protected]> wrote:
>>> Hi,
>>>
>>> Most notable changes:
>> [...]
>>> * ath10k proof-of-concept that uses the new tx
>>> scheduling (will post results in separate
>>> email)
>>
>> I'm attaching a bunch of tests I've done using flent. They are all
>> "burst" tests with burst-ports=1 and burst-length=2. The testing
>> topology is:
>>
>> AP ----> STA
>> AP )) (( STA
>> [veth]--[br]--[wlan] )) (( [wlan]
>>
>> You can notice that in some tests plot data gets cut-off. There are 2
>> problems I've identified:
>> - excess drops (not a problem with the patchset and can be seen when
>> there's no codel-in-mac or scheduling isn't used)
>> - UDP_RR hangs (apparently QCA99X0 I have hangs for a few hundred ms
>> at times and doesn't Rx frames, causing UDP_RR to stop
>> mid-way; confirmed with logs and sniffer; I haven't figured out *why*
>> exactly, could be some hw/fw quirk)
>>
>> Let me know if you have questions or comments regarding my testing/results.
>>
>>
>> Michał


Attachments:
sanest_802.11eresult_i_have_ever_seen.png (143.51 kB)

2016-04-06 16:46:48

by Jonathan Morton

[permalink] [raw]
Subject: Re: [Make-wifi-fast] [PATCHv2 1/2] mac80211: implement fair queuing per txq


> On 6 Apr, 2016, at 10:16, Michal Kazior <[email protected]> wrote:
>
> When a driver asks mac80211 to dequeue given txq it implies a
> destination station as well. This is important because 802.11
> aggregation can be performed only on groups of packets going to a
> single station on a single tid.
>
> Cake - as I understand it - doesn't really *guarantee* maintaining
> this. Keep in mind you can run with hundreds of stations connected.
>
> You don't really want to burden drivers with sorting this grouping up
> themselves (and hence coerce them into introducing another level of
> intermediate queues, bis).

Well, no. Cake isn’t designed to maintain per-station queues explicitly, though it does have support for stochastic fairness between hosts. It is also blissfully unaware of the requirements of wifi aggregation, largely because the standard qdisc interface is likewise ignorant. I’m therefore not suggesting that you use Cake as-is.

What I’m pointing at instead is the set-associative hash, which could easily be tweaked to put greater emphasis on avoiding putting multiple stations’ traffic in one queue, while maintaining the performance benefits of a fixed queue pool indexed by a hash table, and an extended operating region in which flow isolation is maintained. You can then have a linked-list of queues assigned to a particular station, so that when a packet for a particular station is requested, you can easily locate one.

I hadn’t appreciated, though, that the TXQ struct was station-specific. This wasn’t obvious from the code fragments posted, so it looked like packets that incurred hash collisions would be dumped into a single overflow queue.

- Jonathan Morton


2016-04-14 12:16:28

by Michal Kazior

[permalink] [raw]
Subject: [PATCHv3 0/5] mac80211: implement fq_codel

Hi,

This patchset disables qdiscs for drivers
using software queuing and performs fq_codel-like
dequeuing on txqs.

I've reworked it as per Avery's suggestion (and
more).

I've (re)tested it against ath10k with and without
DQL (my ath10k RFC which is limited but sufficient
for some proofing scenarios) and got quite nice
looking results:

http://imgur.com/a/8ruhK
http://kazikcz.github.io/dl/2016-04-12-flent-fqmac-ath10k-dql.tar.gz

All DQL cases show incremental improvement and are
within expectations. The "dql-fq" case loses TCP
fairness/convergence (compared to "dql-taildrop")
because it removes per-txq 64 packet limit and
"dql-fqcodel" gets it back.


v3:
* split taildrop, fq and codel functionalities
into separate patches [Avery]

v2:
* fix invalid ptr deref
* fix compilation for backports


Michal Kazior (5):
mac80211: skip netdev queue control with software queuing
mac80211: implement fair queueing per txq
mac80211: add debug knobs for fair queuing
mac80211: implement codel on fair queuing flows
mac80211: add debug knobs for codel

include/net/mac80211.h | 17 ++-
net/mac80211/agg-tx.c | 8 +-
net/mac80211/codel.h | 265 ++++++++++++++++++++++++++++++++++++++++
net/mac80211/codel_i.h | 100 +++++++++++++++
net/mac80211/debugfs.c | 91 ++++++++++++++
net/mac80211/debugfs_netdev.c | 28 ++++-
net/mac80211/debugfs_sta.c | 45 +++++++
net/mac80211/fq.h | 276 ++++++++++++++++++++++++++++++++++++++++++
net/mac80211/fq_i.h | 82 +++++++++++++
net/mac80211/ieee80211_i.h | 32 ++++-
net/mac80211/iface.c | 26 ++--
net/mac80211/main.c | 10 +-
net/mac80211/rx.c | 2 +-
net/mac80211/sta_info.c | 14 +--
net/mac80211/tx.c | 274 ++++++++++++++++++++++++++++++++++-------
net/mac80211/util.c | 34 ++----
16 files changed, 1204 insertions(+), 100 deletions(-)
create mode 100644 net/mac80211/codel.h
create mode 100644 net/mac80211/codel_i.h
create mode 100644 net/mac80211/fq.h
create mode 100644 net/mac80211/fq_i.h

--
2.1.4


2016-04-19 09:31:42

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCHv3 4/5] mac80211: implement codel on fair queuing flows

On 19 April 2016 at 11:06, Johannes Berg <[email protected]> wrote:
> On Mon, 2016-04-18 at 14:38 +0200, Michal Kazior wrote:
>> On 18 April 2016 at 07:31, Michal Kazior <[email protected]>
>> wrote:
>> >
>> > On 17 April 2016 at 00:29, Johannes Berg <[email protected]
>> > > wrote:
>> > >
>> > > On Thu, 2016-04-14 at 14:18 +0200, Michal Kazior wrote:
>> > > >
>> > > >
>> > > > + struct ieee80211_vif *vif;
>> > > > +
>> > > > + /* When packets are enqueued on txq it's easy
>> > > > + * to re-construct the vif pointer. There's no
>> > > > + * more space in tx_info so it can be used to
>> > > > + * store the necessary enqueue time for packet
>> > > > + * sojourn time computation.
>> > > > + */
>> > > > + u64 enqueue_time;
>> > > > + };
>> > > I wonder if we could move something like the hw_key into
>> > > tx_control
>> > > instead?
>> > Hmm.. It's probably doable. From a quick look it'll require quite
>> > some
>> > change here and there (e.g. tdls_channel_switch op will need to be
>> > extended to pass tx_control). I'll play with the idea..
>> This is actually far more than I thought initially.
>
> Fair enough. Perhaps it could be done for the vif? But ISTR there were
> issues with that when I looked.

Still tricky in a similar fashion as hw_key.


> We should just get rid of all the rate stuff and convert everything to
> use rate tables, but ... :)

I'm guessing it's not trivial either and you risk breaking a lot of stuff? :)


>> A lot of drivers
>> (b43, b43legacy, rtlwifi, wlxxxx, cw1200) access hw_key outside of tx
>> op context (tx workers, tx completions). I'm not even sure this is
>> safe (keys can be freed in the meantime by mac80211 hence invaliding
>> the pointer inside skb, no?).
>>
>
> Hm, yeah, that does seem problematic unless they synchronize against
> key removal somehow?

I didn't see any explicit synchronization but maybe I missed something.


Michał

2016-04-18 05:39:18

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCHv3 1/5] mac80211: skip netdev queue control with software queuing

On 17 April 2016 at 00:21, Johannes Berg <[email protected]> wrote:
>> +static void ieee80211_txq_enqueue(struct ieee80211_local *local,
>> + struct txq_info *txqi,
>> + struct sk_buff *skb)
>> +{
>> + lockdep_assert_held(&txqi->queue.lock);
> [...]
>> + atomic_inc(&local->num_tx_queued);
>
> This global kinda bothers me - anything we can do about removing it?

I don't think so. Re-counting via sta/vif/txq iteration every time is
rather a bad idea.

FWIW this is removed by the "fq" patch. The main purpose of the taildrop
patch is to make some comparisons easier.


> We obviously didn't have it now - just one (even bigger limit!) per
> queue, so that's 4000 frames default per interface ... now you're down
> to 512 for the entire hardware. Perhaps keeping it per interface at
> least gets away the worst of the contention here?

The default qdisc limits were arguably already too big anyway.
Nevertheless it makes sense to have the 512 limit per interface
instead of per radio. I'll move num_tx_queued to sdata.
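A hedged userspace sketch of what such a per-interface cap could look like (the struct name, field, and the try-charge helper are all illustrative — the real accounting happens inside ieee80211_txq_enqueue and the actual struct is ieee80211_sub_if_data):

```c
#include <assert.h>
#include <stdatomic.h>

#define TXQ_LIMIT 512

/* illustrative per-interface state; in mac80211 terms, num_tx_queued
 * would move from the per-radio ieee80211_local into the per-interface
 * ieee80211_sub_if_data */
struct sdata {
    atomic_int num_tx_queued;
};

/* Returns 1 if a packet may be enqueued on this interface, 0 if the
 * per-interface cap is hit (caller would then taildrop). */
static int txq_try_charge(struct sdata *sdata)
{
    if (atomic_fetch_add(&sdata->num_tx_queued, 1) >= TXQ_LIMIT) {
        /* undo the optimistic increment */
        atomic_fetch_sub(&sdata->num_tx_queued, 1);
        return 0;
    }
    return 1;
}
```

Keeping the counter per interface rather than per radio also reduces cacheline contention between interfaces, which was part of the concern raised above.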


Michał

2016-04-06 05:35:49

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCHv2 1/2] mac80211: implement fair queuing per txq

On 5 April 2016 at 15:57, Johannes Berg <[email protected]> wrote:
> On Thu, 2016-03-31 at 12:28 +0200, Michal Kazior wrote:
>
>> +++ b/net/mac80211/codel.h
>> +++ b/net/mac80211/codel_i.h
>
> Do we really need all this code in .h files? It seems very odd to me to
> have all the algorithm implementation there rather than a C file, you
> should (can?) only include codel.h into a single C file anyway.

I just wanted to follow the suggested/implied usage of codel code and
keep modifications to a minimum. I could very well just assimilate it
if you wish.


>> struct txq_info {
>> - struct sk_buff_head queue;
>> + struct txq_flow flow;
>> + struct list_head new_flows;
>> + struct list_head old_flows;
>
> This is confusing, can you please document that? Why are there two
> lists of flows, *and* an embedded flow? Is the embedded flow on any of
> the lists?

The new/old flows lists follow the same principle as net/sched/sch_fq_codel.c

The embedded flow is for possible collisions, explained below.

Nevertheless I'll add more comments on what-is-what-and-why.


>> + u32 backlog_bytes;
>> + u32 backlog_packets;
>> + u32 drop_codel;
>
> Would it make some sense to at least conceptually layer this a bit?
> I.e. rather than calling this "drop_codel" call it "drop_congestion" or
> something like that?

Sure, I'll change it.


>> +/**
>> + * struct txq_flow - per traffic flow queue
>> + *
>> + * This structure is used to distinguish and queue different traffic flows
>> + * separately for fair queueing/AQM purposes.
>> + *
>> + * @txqi: txq_info structure it is associated at given time
>
> Do we actually have to keep that? It's on a list per txqi, no?

It's used to track ownership.

Packets can be destined to different stations/txqs. At enqueue time I
do a partial hash of a packet to get an "index" which I then use to
address a txq_flow from per-radio list (out of 4096 of them). You can
end up with a situation like this:
- packet A hashing to X destined to txq P which is VI
- packet B hashing to X destined to txq Q which is BK

You can't use the same txq_flow for both A and B because you want to
maintain packets per txqs more than you want to maintain them per flow
(you don't want to queue BK traffic onto VI or vice versa as an
artifact, do you? ;). When a txq_flow doesn't have a txqi yet, it gets
bound to the packet's destination txq. Later, if a collision happens
(i.e. the resulting txq_flow has a non-NULL
txqi) the "embedded" per-txq flow is used:

struct txq_info {
- struct sk_buff_head queue;
+ struct txq_flow flow; // <--- this

When txq_flow becomes empty its txqi is reset.

The embedded flow is otherwise treated like any other flow, i.e. it
can be linked to old_flows and new_flows.
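The collision fallback described above can be sketched in userspace C (illustrative names, plain modulo instead of the kernel's reciprocal_scale(), and greatly simplified structs — not the actual mac80211 code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_FLOWS 4096

struct txq_info;

struct txq_flow {
    struct txq_info *txqi;  /* owner txq; NULL while the flow is idle */
};

struct txq_info {
    struct txq_flow flow;   /* embedded fallback flow for collisions */
};

static struct txq_flow flows[N_FLOWS];  /* per-radio shared flow pool */

/* Pick a flow for a packet hashing to `hash`, destined to `txqi`.
 * An idle shared flow gets bound to the txq; if it is already bound
 * to a *different* txq (a collision), fall back to the per-txq
 * embedded flow so e.g. BK traffic never rides on a VI txq. */
static struct txq_flow *flow_classify(uint32_t hash, struct txq_info *txqi)
{
    struct txq_flow *flow = &flows[hash % N_FLOWS];

    if (flow->txqi && flow->txqi != txqi)
        flow = &txqi->flow;

    flow->txqi = txqi;
    return flow;
}
```

When the flow's backlog drains to zero, the real code resets `txqi` to NULL so the shared slot can be re-bound by the next packet that hashes to it.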


>> + * @flowchain: can be linked to other flows for RR purposes
>
> RR?

Round-robin. Assuming it's correct to call fq_codel an RR scheme?



>> +void ieee80211_teardown_flows(struct ieee80211_local *local)
>> +{
>> + struct ieee80211_fq *fq = &local->fq;
>> + struct ieee80211_sub_if_data *sdata;
>> + struct sta_info *sta;
>> + int i;
>> +
>> + if (!local->ops->wake_tx_queue)
>> + return;
>> +
>> + list_for_each_entry_rcu(sta, &local->sta_list, list)
>> + for (i = 0; i < IEEE80211_NUM_TIDS; i++)
>> + ieee80211_purge_txq(local,
>> + to_txq_info(sta->sta.txq[i]));
>> +
>> + list_for_each_entry_rcu(sdata, &local->interfaces, list)
>> + ieee80211_purge_txq(local, to_txq_info(sdata->vif.txq));
>
> Using RCU iteration here seems rather strange, since it's a teardown
> flow? That doesn't seem necessary, since it's control path and must be
> holding appropriate locks anyway to make sure nothing is added to the
> lists.

You're probably right. I'll look into changing it.


>
>> + skb = codel_dequeue(flow,
>> + &flow->backlog,
>> + 0,
>> + &flow->cvars,
>> + &fq->cparams,
>> + codel_get_time(),
>> + false);
>
> What happened here? :)

I'm not a huge fan of wrapping functions with a lot of (ugly-looking)
arguments. I can make it a different ugly if you want :)


>> + if (!skb) {
>> + if ((head == &txqi->new_flows) &&
>> + !list_empty(&txqi->old_flows)) {
>> + list_move_tail(&flow->flowchain, &txqi->old_flows);
>> + } else {
>> + list_del_init(&flow->flowchain);
>> + flow->txqi = NULL;
>> + }
>> + goto begin;
>> + }
>
> Ouch. Any way you can make that easier to follow?

This follows net/sched/sch_fq_codel.c. I can put up a comment to
explain what it's supposed to do?


Michał

2016-04-14 12:16:33

by Michal Kazior

[permalink] [raw]
Subject: [PATCHv3 3/5] mac80211: add debug knobs for fair queuing

This adds a few debugfs entries and a module
parameter to make it easier to test, debug and
experiment.

Signed-off-by: Michal Kazior <[email protected]>
---
net/mac80211/debugfs.c | 77 +++++++++++++++++++++++++++++++++++++++++++
net/mac80211/debugfs_netdev.c | 28 +++++++++++++++-
net/mac80211/debugfs_sta.c | 45 +++++++++++++++++++++++++
net/mac80211/fq.h | 13 +++++++-
net/mac80211/fq_i.h | 7 ++++
net/mac80211/tx.c | 8 ++++-
6 files changed, 175 insertions(+), 3 deletions(-)

diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 4ab5c522ceee..5cbaa5872e6b 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -31,6 +31,30 @@ int mac80211_format_buffer(char __user *userbuf, size_t count,
return simple_read_from_buffer(userbuf, count, ppos, buf, res);
}

+static int mac80211_parse_buffer(const char __user *userbuf,
+ size_t count,
+ loff_t *ppos,
+ char *fmt, ...)
+{
+ va_list args;
+ char buf[DEBUGFS_FORMAT_BUFFER_SIZE] = {};
+ int res;
+
+ if (count > sizeof(buf))
+ return -EINVAL;
+
+ if (copy_from_user(buf, userbuf, count))
+ return -EFAULT;
+
+ buf[sizeof(buf) - 1] = '\0';
+
+ va_start(args, fmt);
+ res = vsscanf(buf, fmt, args);
+ va_end(args);
+
+ return res > 0 ? count : -EINVAL;
+}
+
#define DEBUGFS_READONLY_FILE_FN(name, fmt, value...) \
static ssize_t name## _read(struct file *file, char __user *userbuf, \
size_t count, loff_t *ppos) \
@@ -70,6 +94,52 @@ DEBUGFS_READONLY_FILE(wep_iv, "%#08x",
DEBUGFS_READONLY_FILE(rate_ctrl_alg, "%s",
local->rate_ctrl ? local->rate_ctrl->ops->name : "hw/driver");

+#define DEBUGFS_RW_FILE_FN(name, expr) \
+static ssize_t name## _write(struct file *file, \
+ const char __user *userbuf, \
+ size_t count, \
+ loff_t *ppos) \
+{ \
+ struct ieee80211_local *local = file->private_data; \
+ return expr; \
+}
+
+#define DEBUGFS_RW_FILE(name, expr, fmt, value...) \
+ DEBUGFS_READONLY_FILE_FN(name, fmt, value) \
+ DEBUGFS_RW_FILE_FN(name, expr) \
+ DEBUGFS_RW_FILE_OPS(name)
+
+#define DEBUGFS_RW_FILE_OPS(name) \
+static const struct file_operations name## _ops = { \
+ .read = name## _read, \
+ .write = name## _write, \
+ .open = simple_open, \
+ .llseek = generic_file_llseek, \
+}
+
+#define DEBUGFS_RW_EXPR_FQ(args...) \
+({ \
+ int res; \
+ res = mac80211_parse_buffer(userbuf, count, ppos, args); \
+ res; \
+})
+
+DEBUGFS_READONLY_FILE(fq_flows_cnt, "%u",
+ local->fq.flows_cnt);
+DEBUGFS_READONLY_FILE(fq_backlog, "%u",
+ local->fq.backlog);
+DEBUGFS_READONLY_FILE(fq_overlimit, "%u",
+ local->fq.overlimit);
+DEBUGFS_READONLY_FILE(fq_collisions, "%u",
+ local->fq.collisions);
+
+DEBUGFS_RW_FILE(fq_limit,
+ DEBUGFS_RW_EXPR_FQ("%u", &local->fq.limit),
+ "%u", local->fq.limit);
+DEBUGFS_RW_FILE(fq_quantum,
+ DEBUGFS_RW_EXPR_FQ("%u", &local->fq.quantum),
+ "%u", local->fq.quantum);
+
#ifdef CONFIG_PM
static ssize_t reset_write(struct file *file, const char __user *user_buf,
size_t count, loff_t *ppos)
@@ -254,6 +324,13 @@ void debugfs_hw_add(struct ieee80211_local *local)
DEBUGFS_ADD(user_power);
DEBUGFS_ADD(power);

+ DEBUGFS_ADD(fq_flows_cnt);
+ DEBUGFS_ADD(fq_backlog);
+ DEBUGFS_ADD(fq_overlimit);
+ DEBUGFS_ADD(fq_collisions);
+ DEBUGFS_ADD(fq_limit);
+ DEBUGFS_ADD(fq_quantum);
+
statsd = debugfs_create_dir("statistics", phyd);

/* if the dir failed, don't put all the other things into the root! */
diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c
index 37ea30e0754c..471cab40a25f 100644
--- a/net/mac80211/debugfs_netdev.c
+++ b/net/mac80211/debugfs_netdev.c
@@ -30,7 +30,7 @@ static ssize_t ieee80211_if_read(
size_t count, loff_t *ppos,
ssize_t (*format)(const struct ieee80211_sub_if_data *, char *, int))
{
- char buf[70];
+ char buf[200];
ssize_t ret = -EINVAL;

read_lock(&dev_base_lock);
@@ -236,6 +236,31 @@ ieee80211_if_fmt_hw_queues(const struct ieee80211_sub_if_data *sdata,
}
IEEE80211_IF_FILE_R(hw_queues);

+static ssize_t
+ieee80211_if_fmt_txq(const struct ieee80211_sub_if_data *sdata,
+ char *buf, int buflen)
+{
+ struct txq_info *txqi;
+ int len = 0;
+
+ if (!sdata->vif.txq)
+ return 0;
+
+ txqi = to_txq_info(sdata->vif.txq);
+ len += scnprintf(buf + len, buflen - len,
+ "CAB backlog %ub %up flows %u overlimit %u collisions %u tx %ub %up\n",
+ txqi->tin.backlog_bytes,
+ txqi->tin.backlog_packets,
+ txqi->tin.flows,
+ txqi->tin.overlimit,
+ txqi->tin.collisions,
+ txqi->tin.tx_bytes,
+ txqi->tin.tx_packets);
+
+ return len;
+}
+IEEE80211_IF_FILE_R(txq);
+
/* STA attributes */
IEEE80211_IF_FILE(bssid, u.mgd.bssid, MAC);
IEEE80211_IF_FILE(aid, u.mgd.aid, DEC);
@@ -618,6 +643,7 @@ static void add_common_files(struct ieee80211_sub_if_data *sdata)
DEBUGFS_ADD(rc_rateidx_vht_mcs_mask_2ghz);
DEBUGFS_ADD(rc_rateidx_vht_mcs_mask_5ghz);
DEBUGFS_ADD(hw_queues);
+ DEBUGFS_ADD(txq);
}

static void add_sta_files(struct ieee80211_sub_if_data *sdata)
diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index a39512f09f9e..b5eb4f402710 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -319,6 +319,50 @@ static ssize_t sta_vht_capa_read(struct file *file, char __user *userbuf,
}
STA_OPS(vht_capa);

+static ssize_t sta_txqs_read(struct file *file,
+ char __user *userbuf,
+ size_t count,
+ loff_t *ppos)
+{
+ struct sta_info *sta = file->private_data;
+ struct txq_info *txqi;
+ char *buf;
+ int buflen;
+ int len;
+ int res;
+ int i;
+
+ len = 0;
+ buflen = 200 * IEEE80211_NUM_TIDS;
+ buf = kzalloc(buflen, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
+ if (!sta->sta.txq[i])
+ break;
+
+ txqi = to_txq_info(sta->sta.txq[i]);
+ len += scnprintf(buf + len, buflen - len,
+ "TID %d AC %d backlog %ub %up flows %u overlimit %u collisions %u tx %ub %up\n",
+ i,
+ txqi->txq.ac,
+ txqi->tin.backlog_bytes,
+ txqi->tin.backlog_packets,
+ txqi->tin.flows,
+ txqi->tin.overlimit,
+ txqi->tin.collisions,
+ txqi->tin.tx_bytes,
+ txqi->tin.tx_packets);
+ }
+
+ res = simple_read_from_buffer(userbuf, count, ppos, buf, len);
+ kfree(buf);
+
+ return res;
+}
+STA_OPS(txqs);
+

#define DEBUGFS_ADD(name) \
debugfs_create_file(#name, 0400, \
@@ -365,6 +409,7 @@ void ieee80211_sta_debugfs_add(struct sta_info *sta)
DEBUGFS_ADD(agg_status);
DEBUGFS_ADD(ht_capa);
DEBUGFS_ADD(vht_capa);
+ DEBUGFS_ADD(txqs);

DEBUGFS_ADD_COUNTER(rx_duplicates, rx_stats.num_duplicates);
DEBUGFS_ADD_COUNTER(rx_fragments, rx_stats.fragments);
diff --git a/net/mac80211/fq.h b/net/mac80211/fq.h
index fa98576e1825..aa68363d6221 100644
--- a/net/mac80211/fq.h
+++ b/net/mac80211/fq.h
@@ -102,6 +102,8 @@ begin:
}

flow->deficit -= skb->len;
+ tin->tx_bytes += skb->len;
+ tin->tx_packets++;

return skb;
}
@@ -120,8 +122,14 @@ static struct fq_flow *fq_flow_classify(struct fq *fq,
idx = reciprocal_scale(hash, fq->flows_cnt);
flow = &fq->flows[idx];

- if (flow->tin && flow->tin != tin)
+ if (flow->tin && flow->tin != tin) {
flow = fq_flow_get_default_fn(fq, tin, idx, skb);
+ tin->collisions++;
+ fq->collisions++;
+ }
+
+ if (!flow->tin)
+ tin->flows++;

return flow;
}
@@ -174,6 +182,9 @@ static void fq_tin_enqueue(struct fq *fq,
return;

fq_skb_free_fn(fq, flow->tin, flow, skb);
+
+ flow->tin->overlimit++;
+ fq->overlimit++;
}
}

diff --git a/net/mac80211/fq_i.h b/net/mac80211/fq_i.h
index 5d8423f22e8d..0e25dda4fce3 100644
--- a/net/mac80211/fq_i.h
+++ b/net/mac80211/fq_i.h
@@ -51,6 +51,11 @@ struct fq_tin {
struct list_head old_flows;
u32 backlog_bytes;
u32 backlog_packets;
+ u32 overlimit;
+ u32 collisions;
+ u32 flows;
+ u32 tx_bytes;
+ u32 tx_packets;
};

/**
@@ -70,6 +75,8 @@ struct fq {
u32 limit;
u32 quantum;
u32 backlog;
+ u32 overlimit;
+ u32 collisions;
};

#endif
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index d4e0c87ecec5..396d0d17edeb 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -17,6 +17,7 @@
#include <linux/slab.h>
#include <linux/skbuff.h>
#include <linux/etherdevice.h>
+#include <linux/moduleparam.h>
#include <linux/bitmap.h>
#include <linux/rcupdate.h>
#include <linux/export.h>
@@ -36,6 +37,11 @@
#include "rate.h"
#include "fq.h"

+static unsigned int fq_flows_cnt = 4096;
+module_param(fq_flows_cnt, uint, 0644);
+MODULE_PARM_DESC(fq_flows_cnt,
+ "Maximum number of txq fair queuing flows");
+
/* misc utils */

static inline void ieee80211_tx_stats(struct net_device *dev, u32 len)
@@ -1336,7 +1342,7 @@ int ieee80211_txq_setup_flows(struct ieee80211_local *local)
if (!local->ops->wake_tx_queue)
return 0;

- ret = fq_init(fq, 4096);
+ ret = fq_init(fq, max_t(u32, fq_flows_cnt, 1));
if (ret)
return ret;

--
2.1.4


2016-04-06 06:03:58

by Jonathan Morton

[permalink] [raw]
Subject: Re: [Make-wifi-fast] [PATCHv2 1/2] mac80211: implement fair queuing per txq


> On 6 Apr, 2016, at 08:35, Michal Kazior <[email protected]> wrote:
>
> Packets can be destined to different stations/txqs. At enqueue time I
> do a partial hash of a packet to get an "index" which I then use to
> address a txq_flow from per-radio list (out of 4096 of them). You can
> end up with a situation like this:
> - packet A hashing to X destined to txq P which is VI
> - packet B hashing to X destined to txq Q which is BK
>
> You can't use the same txq_flow for both A and B because you want to
> maintain packets per txqs more than you want to maintain them per flow
> (you don't want to queue BK traffic onto VI or vice versa as an
> artifact, do you? ;). When a txq_flow doesn't have a txqi yet, it gets
> bound to the packet's destination txq. Later, if a collision happens
> (i.e. the resulting txq_flow has a non-NULL
> txqi) the "embedded" per-txq flow is used:
>
> struct txq_info {
> - struct sk_buff_head queue;
> + struct txq_flow flow; // <--- this
>
> When txq_flow becomes empty its txqi is reset.
>
> The embedded flow is otherwise treated like any other flow, i.e. it
> can be linked to old_flows and new_flows.

This smells like a very fragile and complex solution to the collision problem. You may want to look at how Cake solves it.

I use a separate pool of flows per traffic class (essentially, VO/VI/BE/BK), and there is also a set-associative hash to take care of the birthday problem. The latter has an order-of-magnitude effect on the general flow collision rate once you get into the tens of flows, for very little CPU cost.
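A minimal userspace sketch of such a set-associative lookup (sizes and names are illustrative; Cake's actual implementation also handles eviction, per-class pools, and uses a proper flow hash):

```c
#include <assert.h>
#include <stdint.h>

#define WAYS 4
#define SETS 256            /* SETS * WAYS queues in total */

struct slot {
    uint32_t tag;           /* full flow hash; 0 = empty (sketch only) */
};

static struct slot table[SETS][WAYS];

/* Return a queue index for `hash`: probe the WAYS slots of one set
 * for an exact tag match, then for an empty slot, and only accept a
 * collision when every way in the set is occupied. This is what cuts
 * the birthday-problem collision rate by roughly an order of
 * magnitude compared to a direct-mapped table of the same size. */
static int set_assoc_lookup(uint32_t hash)
{
    struct slot *set = table[hash % SETS];
    int i;

    for (i = 0; i < WAYS; i++)
        if (set[i].tag == hash)
            return (hash % SETS) * WAYS + i;  /* existing flow */

    for (i = 0; i < WAYS; i++)
        if (set[i].tag == 0) {
            set[i].tag = hash;                /* claim a free way */
            return (hash % SETS) * WAYS + i;
        }

    return (hash % SETS) * WAYS;              /* all ways busy: collide */
}
```

(The 0-means-empty sentinel is a simplification; a real table needs a separate occupancy bit so a flow hashing to 0 isn't mistaken for an empty slot.)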

- Jonathan Morton


2016-04-06 07:21:21

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCHv2 1/2] mac80211: implement fair queuing per txq

[removing other lists since they spam me with moderation bounces]

> The hope had been the original codel.h would have been reusable,
> which is not the case at present.

So what's the strategy for making it happen? Unless there is one, I
don't see the point in making the code more complicated than it already
has to be anyway.

johannes

2016-04-18 05:31:39

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCHv3 4/5] mac80211: implement codel on fair queuing flows

On 17 April 2016 at 00:29, Johannes Berg <[email protected]> wrote:
> On Thu, 2016-04-14 at 14:18 +0200, Michal Kazior wrote:
>>
>> + struct ieee80211_vif *vif;
>> +
>> + /* When packets are enqueued on txq it's easy
>> + * to re-construct the vif pointer. There's no
>> + * more space in tx_info so it can be used to
>> + * store the necessary enqueue time for packet
>> + * sojourn time computation.
>> + */
>> + u64 enqueue_time;
>> + };
>
> I wonder if we could move something like the hw_key into tx_control
> instead?

Hmm.. It's probably doable. From a quick look it'll require quite some
change here and there (e.g. tdls_channel_switch op will need to be
extended to pass tx_control). I'll play with the idea..


Michał

2016-04-05 14:32:13

by Dave Taht

[permalink] [raw]
Subject: Re: [PATCHv2 1/2] mac80211: implement fair queuing per txq

thx for the review!

On Tue, Apr 5, 2016 at 6:57 AM, Johannes Berg <[email protected]> wrote:
> On Thu, 2016-03-31 at 12:28 +0200, Michal Kazior wrote:
>
>> +++ b/net/mac80211/codel.h
>> +++ b/net/mac80211/codel_i.h
>
> Do we really need all this code in .h files? It seems very odd to me to
> have all the algorithm implementation there rather than a C file, you
> should (can?) only include codel.h into a single C file anyway.

The hope had been the original codel.h would have been reusable, which
is not the case at present.

>
>> struct txq_info {
>> - struct sk_buff_head queue;
>> + struct txq_flow flow;
>> + struct list_head new_flows;
>> + struct list_head old_flows;
>
> This is confusing, can you please document that? Why are there two
> lists of flows, *and* an embedded flow? Is the embedded flow on any of
> the lists?

To explain the new and old flow concepts, there's
https://tools.ietf.org/html/draft-ietf-aqm-fq-codel-06, which is in the
IETF editor's queue for final publication and doesn't have a final name
yet.

The embedded flow concept is michal's and I'm not convinced it's the
right idea as yet.

>
>> + u32 backlog_bytes;
>> + u32 backlog_packets;
>> + u32 drop_codel;
>
> Would it make some sense to at least conceptually layer this a bit?
> I.e. rather than calling this "drop_codel" call it "drop_congestion" or
> something like that?

Is there a more generic place overall in ieee80211 to record per-sta
backlogs, drops and marks?

>> + skb = codel_dequeue(flow,
>> + &flow->backlog,
>> + 0,
>> + &flow->cvars,
>> + &fq->cparams,
>> + codel_get_time(),
>> + false);
>
> What happened here? :)

Magic.

>
>> + if (!skb) {
>> + if ((head == &txqi->new_flows) &&
>> + !list_empty(&txqi->old_flows)) {
>> + list_move_tail(&flow->flowchain, &txqi->old_flows);
>> + } else {
>> + list_del_init(&flow->flowchain);
>> + flow->txqi = NULL;
>> + }
>> + goto begin;
>> + }
>
> Ouch. Any way you can make that easier to follow?

It made my brain hurt in the original code, too, but it is Eric
optimizing out cycles at his finest.

If the new_flows list is expired or done, switch to the old_flows
list; if the old_flows list is done, go try selecting another queue to
pull from (which may or may not exist). See the pending RFC for a more
elongated version.
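That rotation can be sketched in plain userspace C (singly linked lists and a byte counter stand in for the kernel's list_head and skb queues; names are illustrative, not the mac80211 code):

```c
#include <assert.h>
#include <stddef.h>

#define QUANTUM 300

struct flow {
    struct flow *next;
    int deficit;
    int backlog;            /* bytes still queued; 0 means empty */
};

struct sched {
    struct flow *new_flows; /* NULL-terminated singly linked lists */
    struct flow *old_flows;
};

static void push_tail(struct flow **head, struct flow *f)
{
    while (*head)
        head = &(*head)->next;
    f->next = NULL;
    *head = f;
}

static struct flow *pop_head(struct flow **head)
{
    struct flow *f = *head;
    if (f)
        *head = f->next;
    return f;
}

/* Select the next flow to serve, mirroring the fq_codel rotation:
 * serve new_flows first; a flow out of deficit is topped up and
 * demoted to old_flows; an emptied flow from new_flows gets one last
 * round on old_flows (if any), while an emptied old flow retires. */
static struct flow *pick_flow(struct sched *s)
{
    for (;;) {
        int from_new = (s->new_flows != NULL);
        struct flow **head = from_new ? &s->new_flows : &s->old_flows;
        struct flow *f = *head;

        if (!f)
            return NULL;                  /* both lists drained */

        if (f->deficit <= 0) {            /* out of credit: recycle */
            f->deficit += QUANTUM;
            push_tail(&s->old_flows, pop_head(head));
            continue;
        }

        if (f->backlog == 0) {            /* nothing to send */
            pop_head(head);
            if (from_new && s->old_flows)
                push_tail(&s->old_flows, f);
            /* else: the flow simply leaves the scheduler */
            continue;
        }

        return f;  /* caller dequeues and charges deficit/backlog */
    }
}
```

The caller would then subtract the dequeued packet's length from both `f->deficit` and `f->backlog`, which is the "deficit round robin" part of the scheme.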

>
> johannes

2016-04-16 22:23:07

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCHv3 2/5] mac80211: implement fair queueing per txq

On Thu, 2016-04-14 at 14:18 +0200, Michal Kazior wrote:

> +++ b/net/mac80211/fq.h
>
Now that you've mostly rewritten it, why keep it in a .h file?

johannes

2016-04-19 09:57:57

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCHv3 4/5] mac80211: implement codel on fair queuing flows

On Tue, 2016-04-19 at 11:31 +0200, Michal Kazior wrote:

> > We should just get rid of all the rate stuff and convert everything
> > to use rate tables, but ... :)
> I'm guessing it's not trivial either and you risk breaking a lot of
> stuff? :)

It's not "tricky" in the same sense - but we'd have to convert drivers
to use a different data structure etc. Quite a bit of work.

johannes

2016-04-11 07:25:46

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCH 1/2] mac80211: implement fair queuing per txq

On 8 April 2016 at 06:37, Avery Pennarun <[email protected]> wrote:
> On Fri, Mar 25, 2016 at 5:27 AM, Michal Kazior <[email protected]> wrote:
>> mac80211's software queues were designed to work
>> very closely with device tx queues. They are
>> required to make use of 802.11 packet aggregation
>> easily and efficiently.
>>
>> However the logic imposed a per-AC queue limit.
>> With the limit too small mac80211 wasn't able
>> to guarantee fairness across TIDs nor stations
>> because single burst to a slow station could
>> monopolize queues and reach per-AC limit
>> preventing traffic from other stations being
>> queued into mac80211's software queues. Having the
>> limit too large would make smart qdiscs, e.g.
>> fq_codel, a lot less efficient as they are
>> designed on the premise that they are very close
>> to the actual device tx queues.
>
> As usual, I'm way behind on everything, but I have been testing this
> patch series in the background (no clear results to report yet) and
> wanted to comment at a very high level. I think you are actually
> doing several stages of improvements all at once here:
>
> [0. Baseline: one big queue going into the driver]
> 1. Switch ath10k to mac80211 per-station queues.
> 2. Change per-station queues to use NO_QUEUE qdisc and *not* ever stop
> the kernel netdev queue (since there no longer is one).
> 3. Actively manage per-station queues with fq_codel.
> 4. DQL-like control system for managing hardware queues.

The #4 is not really part of this patch series (unless you add the
older RFCv2 with txop scheduling or ath10k RFC which tests global DQL
to the mix).

Otherwise you're correct. Though I would argue #1 on its own doesn't
really matter, because without it all the other changes are a no-op.


> Just to clarify what I mean by #2, if I understand correctly, before
> this patch, the driver+mac80211 keeps track of the total number of
> packets in all the mac80211 queues.

It keeps track of the number of packets per AC..


> When the total exceeds a fixed
> amount (or when one of the per-station queues gets full?) mac80211
> tells the kernel to stop sending in new packets, so they sit around in
> the qdisc instead.

..and stops netdev subqueues (0..3) when this per-AC limit is reached.

That is all assuming you have a wake_tx_queue() based driver.
Otherwise mac80211 doesn't really do anything special. It stops
subqueues when driver asks it to (when internal driver/device queues
become full).
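To make the described accounting concrete, here is a toy userspace model of the pre-patch behaviour (the constant and names are illustrative, not mac80211's; the real code tracks sdata->txqs_len and calls the netif stop/wake subqueue helpers):

```c
#include <stdbool.h>

/* Toy model: packets are counted per AC (not per station), and the
 * matching netdev subqueue is stopped once the shared per-AC limit
 * is reached -- so a single busy station can stall all stations on
 * that AC, which is exactly the fairness problem being discussed.
 */
#define NUM_ACS		4
#define AC_QUEUE_LIMIT	256	/* illustrative value only */

struct ac_state {
	int len[NUM_ACS];		/* queued packets per AC */
	bool stopped[NUM_ACS];		/* models netif_stop_subqueue() */
};

static void ac_enqueue(struct ac_state *s, int ac)
{
	if (++s->len[ac] >= AC_QUEUE_LIMIT)
		s->stopped[ac] = true;	/* backpressure to the qdisc */
}

static void ac_dequeue(struct ac_state *s, int ac)
{
	if (--s->len[ac] < AC_QUEUE_LIMIT)
		s->stopped[ac] = false;	/* models netif_wake_subqueue() */
}
```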


> The problem with this behaviour is we probably
> have a lot of packets for one station, and not many packets for other
> stations, even if the netdev qdisc has plenty of packets still waiting
> for those other stations. When you then go to drain the mac80211
> queues in a round-robin fashion, only the fullest queue (corresponding
> to the busiest stream to the fastest station) can get optimal results.
> The driver can then either send out from the fullest queue (unfair but
> fast) or round robin using the non-full queues (fair but non-optimal
> speed).

Roughly, yes.


> Upon implementing #2, we would essentially never tell the kernel to
> stop sending packets; instead, it just always forwards them to
> mac80211, which needs to learn how to drop them instead of providing
> backpressure. This moves the entire qdisc functionality into
> mac80211, hence the use of NO_QUEUE.

Correct.


> It's then obvious that if you just did the obvious thing (tail drop),
> you'll end up with high latency, so you added fq_codel to the mix.

Yes. Also, mere taildrop doesn't split flows, so you end up with
bulky TCP traffic delaying/starving e.g. your ICMP (and other
short/bursty traffic), which might be considered a regression if
someone used fq_codel before.
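The flow-splitting point can be shown with a trivial userspace hash (a stand-in integer mix, not the kernel's skb flow hash): packets are classified into per-flow queues, so a sparse ICMP flow lands in its own queue rather than behind a bulk TCP flow's backlog.

```c
#include <stdint.h>

/* Illustrative 5-tuple -> flow-queue classifier. The mix constant
 * and NUM_FLOWS are arbitrary; real fq_codel uses a keyed skb hash
 * over many more queues to resist collisions.
 */
#define NUM_FLOWS 16

static unsigned int flow_idx(uint32_t saddr, uint32_t daddr,
			     uint16_t sport, uint16_t dport,
			     uint8_t proto)
{
	uint32_t h = saddr ^ daddr ^ ((uint32_t)sport << 16 | dport) ^ proto;

	h ^= h >> 16;
	h *= 0x45d9f3bu;	/* simple integer mixing step */
	h ^= h >> 16;
	return h % NUM_FLOWS;	/* queue index for this flow */
}
```

With a plain shared taildrop queue, both flows would share one backlog and the ICMP packets would sit behind every queued TCP segment.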


> However, as people on this thread have noticed, fq_codel is
> complicated. I'd like to be able to evaluate the performance impact
> of each of the above steps separately. In particular, my theory is
> that if we implement #2 with just a simple FIFO queue per station,
> then if we have two stations competing (one slow, one fast), and
> dequeue aggregates using round robin, then we should get all of:
>
> a) Full airtime utilization and max-length aggregates
> and
> b) High latency only on busy stations, but near-zero latency on idle
> stations (because of round-robin servicing of the per-station queues).

Roughly, yes.


> Using just a tail drop implementation, it should be very easy for me
> to test that (a) and (b) are true. It should also be strictly equal
> (one station) or better (multiple stations) than using mac80211 soft
> queues with the pfifo_fast qdisc. If that isn't what happens, then
> we'll know something went wrong with that part of the code, and we can
> debug that before moving on to a wifi-aware fq_codel.
>
> So my request: do you mind splitting your patch into two patches, one
> that implements just NO_QUEUE and per-station fifo tail drop, with a
> second patch that converts the tail drop to fq_codel?

Hmm.. Actually it might be worth splitting it up into 3 patches:
1. noqueue (taildrop)
2. fq (headdrop)
3. codel

In which case (2) should introduce additional intra-station flow
fairness (compared to (1)) and (3) should make tcp behave better,
right?

Moreover I'm thinking of making the "fq" part reusable the same way
"codel" is (or at least attempts to be today), i.e. header trickery
for easy gluing. This should make it easier to fix non-mac80211
drivers as well, possibly with little effort, in the future. Thoughts?
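The "header trickery" pattern looks roughly like this in userspace miniature (all names invented for illustration): the shared header forward-declares static hooks that the including .c file defines, so everything lives in one translation unit and the compiler can inline the whole thing with no function-pointer indirection. This is the same trick net/mac80211/codel.h plays with codel_dequeue_fn() and friends.

```c
/* --- the shared "algorithm" header would contain: --------------- */

static int glue_pop(void *ctx);	/* hook the includer must define */

static int shared_alg(void *ctx)
{
	/* Reusable fast-path logic calling back into the includer.
	 * Because both functions are static in one translation unit,
	 * the call can be inlined, unlike a function-pointer design.
	 */
	return glue_pop(ctx) * 2;
}

/* --- the including .c file then supplies the hook: -------------- */

static int glue_pop(void *ctx)
{
	return *(int *)ctx;	/* pretend this pops from a real queue */
}
```

The cost, as noted elsewhere in the thread, is that such a header can only be included once per translation unit, which is why Johannes and Eric push back on it for non-fast-path code.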


> Another advantage of the split is that we could then test NO_QUEUE +
> tail_drop + DQL. Again, that should be strictly better than the
> NO_QUEUE + tail_drop + fixed_driver_queue. Then it might be easier to
> debug the (much more fiddly) fq_codel on top.
>
> Thoughts?

It does make sense to split it up.


Michał

2016-04-19 09:06:34

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCHv3 4/5] mac80211: implement codel on fair queuing flows

On Mon, 2016-04-18 at 14:38 +0200, Michal Kazior wrote:
> On 18 April 2016 at 07:31, Michal Kazior <[email protected]>
> wrote:
> >
> > On 17 April 2016 at 00:29, Johannes Berg <[email protected]
> > > wrote:
> > >
> > > On Thu, 2016-04-14 at 14:18 +0200, Michal Kazior wrote:
> > > >
> > > >
> > > > +                             struct ieee80211_vif *vif;
> > > > +
> > > > +                             /* When packets are enqueued on txq it's easy
> > > > +                              * to re-construct the vif pointer. There's no
> > > > +                              * more space in tx_info so it can be used to
> > > > +                              * store the necessary enqueue time for packet
> > > > +                              * sojourn time computation.
> > > > +                              */
> > > > +                             u64 enqueue_time;
> > > > +                     };
> > > I wonder if we could move something like the hw_key into
> > > tx_control
> > > instead?
> > Hmm.. It's probably doable. From a quick look it'll require quite
> > some
> > change here and there (e.g. tdls_channel_switch op will need to be
> > extended to pass tx_control). I'll play with the idea..
> This is actually far more work than I thought initially.

Fair enough. Perhaps it could be done for the vif? But ISTR there were
issues with that when I looked.

We should just get rid of all the rate stuff and convert everything to
use rate tables, but ... :)

> A lot of drivers
> (b43, b43legacy, rtlwifi, wlxxxx, cw1200) access hw_key outside of tx
> op context (tx workers, tx completions). I'm not even sure this is
> safe (keys can be freed in the meantime by mac80211, hence invalidating
> the pointer inside skb, no?).
>

Hm, yeah, that does seem problematic unless they synchronize against
key removal somehow?

johannes

2016-04-07 08:53:51

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCHv2 1/2] mac80211: implement fair queuing per txq

On Wed, 2016-04-06 at 10:39 -0700, Dave Taht wrote:

> > > The hope had been the original codel.h would have been reusable,
> > > which is not the case at present.
> > So what's the strategy for making it happen?
> Strategy? to meander towards a result that gives low latency to all
> stations, no matter their bandwidth, on several chipsets.

I meant "strategy for making the code reusable". Or something like that
anyway. I don't see the point in trying and then failing. Here we're
adding a completely different version of codel to the kernel - why?
What makes this version unusable for the original usage in
include/net/codel.h? Can't we replace that one with the newer version
and actually use the same file here?

Or - why bother with the header file to make it shareable, if we're not
even attempting to do that?

johannes

2016-04-05 13:57:43

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCHv2 1/2] mac80211: implement fair queuing per txq

On Thu, 2016-03-31 at 12:28 +0200, Michal Kazior wrote:

> +++ b/net/mac80211/codel.h
> +++ b/net/mac80211/codel_i.h

Do we really need all this code in .h files? It seems very odd to me to
have all the algorithm implementation there rather than a C file, you
should (can?) only include codel.h into a single C file anyway.

>  struct txq_info {
> - struct sk_buff_head queue;
> + struct txq_flow flow;
> + struct list_head new_flows;
> + struct list_head old_flows;

This is confusing, can you please document that? Why are there two
lists of flows, *and* an embedded flow? Is the embedded flow on any of
the lists?

> + u32 backlog_bytes;
> + u32 backlog_packets;
> + u32 drop_codel;

Would it make some sense to at least conceptually layer this a bit?
I.e. rather than calling this "drop_codel" call it "drop_congestion" or
something like that?

> @@ -977,12 +978,9 @@ static void ieee80211_do_stop(struct
> ieee80211_sub_if_data *sdata,
>   if (sdata->vif.txq) {
>   struct txq_info *txqi = to_txq_info(sdata->vif.txq);
>  
> - spin_lock_bh(&txqi->queue.lock);
> - ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
> - txqi->byte_cnt = 0;
> - spin_unlock_bh(&txqi->queue.lock);
> -
> - atomic_set(&sdata->txqs_len[txqi->txq.ac], 0);
> + spin_lock_bh(&fq->lock);
> + ieee80211_purge_txq(local, txqi);
> + spin_unlock_bh(&fq->lock);

This isn't very nice - you're going from locking a single txqi to
having a global hardware lock.

It's probably fine in this particular case, but I'll need to look for
other places :)

> +/**
> + * struct txq_flow - per traffic flow queue
> + *
> + * This structure is used to distinguish and queue different traffic flows
> + * separately for fair queueing/AQM purposes.
> + *
> + * @txqi: txq_info structure it is associated at given time

Do we actually have to keep that? It's on a list per txqi, no?

> + * @flowchain: can be linked to other flows for RR purposes

RR?

> +void ieee80211_teardown_flows(struct ieee80211_local *local)
> +{
> + struct ieee80211_fq *fq = &local->fq;
> + struct ieee80211_sub_if_data *sdata;
> + struct sta_info *sta;
> + int i;
> +
> + if (!local->ops->wake_tx_queue)
> + return;
> +
> + list_for_each_entry_rcu(sta, &local->sta_list, list)
> + for (i = 0; i < IEEE80211_NUM_TIDS; i++)
> + ieee80211_purge_txq(local,
> +     to_txq_info(sta->sta.txq[i]));
> +
> + list_for_each_entry_rcu(sdata, &local->interfaces, list)
> + ieee80211_purge_txq(local, to_txq_info(sdata->vif.txq));

Using RCU iteration here seems rather strange, since it's a teardown
flow? That doesn't seem necessary, since it's control path and must be
holding appropriate locks anyway to make sure nothing is added to the
lists.

> + skb = codel_dequeue(flow,
> +     &flow->backlog,
> +     0,
> +     &flow->cvars,
> +     &fq->cparams,
> +     codel_get_time(),
> +     false);

What happened here? :)

> + if (!skb) {
> + if ((head == &txqi->new_flows) &&
> +     !list_empty(&txqi->old_flows)) {
> + list_move_tail(&flow->flowchain, &txqi->old_flows);
> + } else {
> + list_del_init(&flow->flowchain);
> + flow->txqi = NULL;
> + }
> + goto begin;
> + }

Ouch. Any way you can make that easier to follow?

johannes

2016-04-16 22:29:34

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCHv3 4/5] mac80211: implement codel on fair queuing flows

On Thu, 2016-04-14 at 14:18 +0200, Michal Kazior wrote:

> + struct ieee80211_vif *vif;
> +
> + /* When packets are enqueued on txq it's easy
> +  * to re-construct the vif pointer. There's no
> +  * more space in tx_info so it can be used to
> +  * store the necessary enqueue time for packet
> +  * sojourn time computation.
> +  */
> + u64 enqueue_time;
> + };

I wonder if we could move something like the hw_key into tx_control
instead?

johannes

2016-04-08 04:37:43

by Avery Pennarun

[permalink] [raw]
Subject: Re: [PATCH 1/2] mac80211: implement fair queuing per txq

On Fri, Mar 25, 2016 at 5:27 AM, Michal Kazior <[email protected]> wrote:
> mac80211's software queues were designed to work
> very closely with device tx queues. They are
> required to make use of 802.11 packet aggregation
> easily and efficiently.
>
> However the logic imposed a per-AC queue limit.
> With the limit too small mac80211 wasn't able
> to guarantee fairness across TIDs nor stations
> because single burst to a slow station could
> monopolize queues and reach per-AC limit
> preventing traffic from other stations being
> queued into mac80211's software queues. Having the
> limit too large would make smart qdiscs, e.g.
> fq_codel, a lot less efficient as they are
> designed on the premise that they are very close
> to the actual device tx queues.

As usual, I'm way behind on everything, but I have been testing this
patch series in the background (no clear results to report yet) and
wanted to comment at a very high level. I think you are actually
doing several stages of improvements all at once here:

[0. Baseline: one big queue going into the driver]
1. Switch ath10k to mac80211 per-station queues.
2. Change per-station queues to use NO_QUEUE qdisc and *not* ever stop
the kernel netdev queue (since there no longer is one).
3. Actively manage per-station queues with fq_codel.
4. DQL-like control system for managing hardware queues.

Just to clarify what I mean by #2, if I understand correctly, before
this patch, the driver+mac80211 keeps track of the total number of
packets in all the mac80211 queues. When the total exceeds a fixed
amount (or when one of the per-station queues gets full?) mac80211
tells the kernel to stop sending in new packets, so they sit around in
the qdisc instead. The problem with this behaviour is we probably
have a lot of packets for one station, and not many packets for other
stations, even if the netdev qdisc has plenty of packets still waiting
for those other stations. When you then go to drain the mac80211
queues in a round-robin fashion, only the fullest queue (corresponding
to the busiest stream to the fastest station) can get optimal results.
The driver can then either send out from the fullest queue (unfair but
fast) or round robin using the non-full queues (fair but non-optimal
speed).

Upon implementing #2, we would essentially never tell the kernel to
stop sending packets; instead, it just always forwards them to
mac80211, which needs to learn how to drop them instead of providing
backpressure. This moves the entire qdisc functionality into
mac80211, hence the use of NO_QUEUE.

It's then obvious that if you just did the obvious thing (tail drop),
you'll end up with high latency, so you added fq_codel to the mix.

However, as people on this thread have noticed, fq_codel is
complicated. I'd like to be able to evaluate the performance impact
of each of the above steps separately. In particular, my theory is
that if we implement #2 with just a simple FIFO queue per station,
then if we have two stations competing (one slow, one fast), and
dequeue aggregates using round robin, then we should get all of:

a) Full airtime utilization and max-length aggregates
and
b) High latency only on busy stations, but near-zero latency on idle
stations (because of round-robin servicing of the per-station queues).

Using just a tail drop implementation, it should be very easy for me
to test that (a) and (b) are true. It should also be strictly equal
(one station) or better (multiple stations) than using mac80211 soft
queues with the pfifo_fast qdisc. If that isn't what happens, then
we'll know something went wrong with that part of the code, and we can
debug that before moving on to a wifi-aware fq_codel.

So my request: do you mind splitting your patch into two patches, one
that implements just NO_QUEUE and per-station fifo tail drop, with a
second patch that converts the tail drop to fq_codel?

Another advantage of the split is that we could then test NO_QUEUE +
tail_drop + DQL. Again, that should be strictly better than the
NO_QUEUE + tail_drop + fixed_driver_queue. Then it might be easier to
debug the (much more fiddly) fq_codel on top.

Thoughts?

Thanks,

Avery

2016-04-06 17:39:38

by Dave Taht

[permalink] [raw]
Subject: Re: [PATCHv2 1/2] mac80211: implement fair queuing per txq

On Wed, Apr 6, 2016 at 12:21 AM, Johannes Berg
<[email protected]> wrote:
> [removing other lists since they spam me with moderation bounces]

I have added your email address be accepted to the codel,
make-wifi-fast lists. My apologies for the bounces.

The people on those lists generally do not have the time to tackle the
volume of traffic on linux-wireless.

>> The hope had been the original codel.h would have been reusable,
>> which is not the case at present.
>
> So what's the strategy for making it happen?

Strategy? to meander towards a result that gives low latency to all
stations, no matter their bandwidth, on several chipsets.

The holy grail from my viewpoint is to get airtime fairness, better
mac utilization, slow stations not starving fast ones, more stations
serviceable, and so on, and my focus has generally been on having an
architecture that applied equally to APs and clients. Getting clients
alone to have a queuing latency reduction of these orders of magnitude
on uploads at low rates would be a huge win, but not the holy grail.

It was really nice to have michal's proof of concept(s) show up and
show fq_codel-like benefits at both low and high speeds on wifi, but
it is clear more architectural rework is required to fit the theory
into the reality.

> Unless there is one, I
> don't see the point in making the code more complicated than it already
> has to be anyway.

+1.

Next steps were:
- get Toke's and my testbeds up
- Avery/Tim/myself to keep hammering at the ath9k
- Michal exploring DQL
- Jonathan poking at it with cake-like ideas
- anyone else that cares to join in on finally fixing bufferbloat on wifi.

and maybe put together a videoconference in 2-3 weeks or so with where
we are stuck at (felix will be off vacation, too, I think). There are
still multiple points where we all talk past each other.

I, for example, am overly fixated on having a per-station queue to
start with (which in the case of a client is two stations: one for
multicast/mgmt frames and one for regular traffic) and on not dealing
with TIDs until much later in the process. Unfortunately it seems the
hook is very late in the process.
>
> johannes

2016-04-18 12:38:23

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCHv3 4/5] mac80211: implement codel on fair queuing flows

On 18 April 2016 at 07:31, Michal Kazior <[email protected]> wrote:
> On 17 April 2016 at 00:29, Johannes Berg <[email protected]> wrote:
>> On Thu, 2016-04-14 at 14:18 +0200, Michal Kazior wrote:
>>>
>>> + struct ieee80211_vif *vif;
>>> +
>>> + /* When packets are enqueued on txq it's easy
>>> +  * to re-construct the vif pointer. There's no
>>> +  * more space in tx_info so it can be used to
>>> +  * store the necessary enqueue time for packet
>>> +  * sojourn time computation.
>>> + */
>>> + u64 enqueue_time;
>>> + };
>>
>> I wonder if we could move something like the hw_key into tx_control
>> instead?
>
> Hmm.. It's probably doable. From a quick look it'll require quite some
> change here and there (e.g. tdls_channel_switch op will need to be
> extended to pass tx_control). I'll play with the idea..

This is actually far more work than I thought initially. A lot of
drivers (b43, b43legacy, rtlwifi, wlxxxx, cw1200) access hw_key
outside of tx op context (tx workers, tx completions). I'm not even
sure this is safe (keys can be freed in the meantime by mac80211,
hence invalidating the pointer inside the skb, no?).


Michał

2016-04-18 12:31:15

by Eric Dumazet

[permalink] [raw]
Subject: Re: [Codel] [PATCHv3 2/5] mac80211: implement fair queueing per txq

On Mon, 2016-04-18 at 07:16 +0200, Michal Kazior wrote:

>
> I guess .h file can give the compiler an opportunity for more
> optimizations. With .c you would need LTO which I'm not sure if it's
> available everywhere.
>

This makes little sense really. Otherwise everything would be in .h
files.

include/net/codel.h is an include file because both codel and fq_codel
use a common template for codel_dequeue() in fast path.

But net/mac80211/fq.h is included once, so should be a .c

Certainly all the code in the control plane is not fast path and does
not deserve to be duplicated.




2016-04-06 07:19:57

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCHv2 1/2] mac80211: implement fair queuing per txq

On Wed, 2016-04-06 at 07:35 +0200, Michal Kazior wrote:

> I just wanted to follow the suggested/implied usage of codel code and
> keep modifications to a minimum. I could very well just assimilate it
> if you wish.

I don't really feel all that strongly about it, but I also don't see
the point. It makes it harder to look for the code though, and that
seems fairly pointless.

Btw, just realized that there's also __u32 in there which you should
probably remove and use just u32. Also don't #include <linux/version.h>


> This follows net/sched/sch_fq_codel.h. I can put up a comment to
> explain what it's supposed to do?
>

Ok, fair enough.

johannes

2016-04-18 05:16:53

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCHv3 2/5] mac80211: implement fair queueing per txq

On 17 April 2016 at 00:25, Johannes Berg <[email protected]> wrote:
> On Sun, 2016-04-17 at 00:23 +0200, Johannes Berg wrote:
>> On Thu, 2016-04-14 at 14:18 +0200, Michal Kazior wrote:
>> >
>> >
>> > +++ b/net/mac80211/fq.h
>> >
>> Now that you've mostly rewritten it, why keep it in a .h file?
>>
>
> I think I just confused this with codel.h, but still - why a .h file
> with all this "real" code, meaning the file can really only be included
> once?

I guess .h file can give the compiler an opportunity for more
optimizations. With .c you would need LTO which I'm not sure if it's
available everywhere.


Michał

2016-04-14 12:16:34

by Michal Kazior

[permalink] [raw]
Subject: [PATCHv3 4/5] mac80211: implement codel on fair queuing flows

There is no limit other than a global packet
count limit when using software queuing. This
means a single flow queue can grow insanely
long. This is particularly bad for TCP
congestion algorithms which require a somewhat
more sophisticated frame dropping scheme than
a mere headdrop on limit overflow.

Hence apply a (slightly modified, to fit the
knobs) CoDel5 on flow queues. This improves
TCP convergence and stability when combined
with a wireless driver which keeps its own tx
queue/fifo at a minimum fill level for given
link conditions.

Signed-off-by: Michal Kazior <[email protected]>
---
include/net/mac80211.h | 13 ++-
net/mac80211/codel.h | 265 +++++++++++++++++++++++++++++++++++++++++++++
net/mac80211/codel_i.h | 100 +++++++++++++++++
net/mac80211/ieee80211_i.h | 5 +
net/mac80211/tx.c | 99 ++++++++++++++++-
5 files changed, 480 insertions(+), 2 deletions(-)
create mode 100644 net/mac80211/codel.h
create mode 100644 net/mac80211/codel_i.h

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index c24d0b8e4deb..d53b14bc4e79 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -889,7 +889,18 @@ struct ieee80211_tx_info {
unsigned long jiffies;
};
/* NB: vif can be NULL for injected frames */
- struct ieee80211_vif *vif;
+ union {
+ /* NB: vif can be NULL for injected frames */
+ struct ieee80211_vif *vif;
+
+ /* When packets are enqueued on txq it's easy
+ * to re-construct the vif pointer. There's no
+ * more space in tx_info so it can be used to
+ * store the necessary enqueue time for packet
+ * sojourn time computation.
+ */
+ u64 enqueue_time;
+ };
struct ieee80211_key_conf *hw_key;
u32 flags;
/* 4 bytes free */
diff --git a/net/mac80211/codel.h b/net/mac80211/codel.h
new file mode 100644
index 000000000000..63ccedcbce04
--- /dev/null
+++ b/net/mac80211/codel.h
@@ -0,0 +1,265 @@
+#ifndef __NET_MAC80211_CODEL_H
+#define __NET_MAC80211_CODEL_H
+
+/*
+ * Codel - The Controlled-Delay Active Queue Management algorithm
+ *
+ * Copyright (C) 2011-2012 Kathleen Nichols <[email protected]>
+ * Copyright (C) 2011-2012 Van Jacobson <[email protected]>
+ * Copyright (C) 2016 Michael D. Taht <[email protected]>
+ * Copyright (C) 2012 Eric Dumazet <[email protected]>
+ * Copyright (C) 2015 Jonathan Morton <[email protected]>
+ * Copyright (C) 2016 Michal Kazior <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions, and the following disclaimer,
+ * without modification.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. The names of the authors may not be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License ("GPL") version 2, in which case the provisions of the
+ * GPL apply INSTEAD OF those given above.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ *
+ */
+
+#include <linux/types.h>
+#include <linux/ktime.h>
+#include <linux/skbuff.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <linux/reciprocal_div.h>
+
+#include "codel_i.h"
+
+/* Controlling Queue Delay (CoDel) algorithm
+ * =========================================
+ * Source : Kathleen Nichols and Van Jacobson
+ * http://queue.acm.org/detail.cfm?id=2209336
+ *
+ * Implemented on linux by Dave Taht and Eric Dumazet
+ */
+
+/* CoDel5 uses a real clock, unlike codel */
+
+static inline u64 codel_get_time(void)
+{
+ return ktime_get_ns();
+}
+
+static inline u32 codel_time_to_us(u64 val)
+{
+ do_div(val, NSEC_PER_USEC);
+ return (u32)val;
+}
+
+/* sizeof_in_bits(rec_inv_sqrt) */
+#define REC_INV_SQRT_BITS (8 * sizeof(u16))
+/* needed shift to get a Q0.32 number from rec_inv_sqrt */
+#define REC_INV_SQRT_SHIFT (32 - REC_INV_SQRT_BITS)
+
+/* Newton approximation method needs more iterations at small inputs,
+ * so cache them.
+ */
+
+static void codel_vars_init(struct codel_vars *vars)
+{
+ memset(vars, 0, sizeof(*vars));
+}
+
+/*
+ * http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Iterative_methods_for_reciprocal_square_roots
+ * new_invsqrt = (invsqrt / 2) * (3 - count * invsqrt^2)
+ *
+ * Here, invsqrt is a fixed point number (< 1.0), 32bit mantissa, aka Q0.32
+ */
+static inline void codel_Newton_step(struct codel_vars *vars)
+{
+ u32 invsqrt = ((u32)vars->rec_inv_sqrt) << REC_INV_SQRT_SHIFT;
+ u32 invsqrt2 = ((u64)invsqrt * invsqrt) >> 32;
+ u64 val = (3LL << 32) - ((u64)vars->count * invsqrt2);
+
+ val >>= 2; /* avoid overflow in following multiply */
+ val = (val * invsqrt) >> (32 - 2 + 1);
+
+ vars->rec_inv_sqrt = val >> REC_INV_SQRT_SHIFT;
+}
+
+/*
+ * CoDel control_law is t + interval/sqrt(count)
+ * We maintain in rec_inv_sqrt the reciprocal value of sqrt(count) to avoid
+ * both sqrt() and divide operation.
+ */
+static u64 codel_control_law(u64 t,
+ u64 interval,
+ u32 rec_inv_sqrt)
+{
+ return t + reciprocal_scale(interval, rec_inv_sqrt <<
+ REC_INV_SQRT_SHIFT);
+}
+
+/* Forward declaration of this for use elsewhere */
+
+static u64 codel_get_enqueue_time_fn(void *ctx,
+ struct sk_buff *skb);
+static struct sk_buff *codel_dequeue_fn(void *ctx,
+ struct codel_vars *vars);
+static void codel_drop_fn(void *ctx,
+ struct codel_vars *vars,
+ struct sk_buff *skb);
+
+static bool codel_should_drop(void *ctx,
+ struct sk_buff *skb,
+ u32 *backlog,
+ u32 backlog_thr,
+ struct codel_vars *vars,
+ const struct codel_params *p,
+ u64 now)
+{
+ if (!skb) {
+ vars->first_above_time = 0;
+ return false;
+ }
+
+ if (now - codel_get_enqueue_time_fn(ctx, skb) < p->target ||
+ *backlog <= backlog_thr) {
+ /* went below - stay below for at least interval */
+ vars->first_above_time = 0;
+ return false;
+ }
+
+ if (vars->first_above_time == 0) {
+ /* just went above from below; mark the time */
+ vars->first_above_time = now + p->interval;
+
+ } else if (now > vars->first_above_time) {
+ return true;
+ }
+
+ return false;
+}
+
+static struct sk_buff *codel_dequeue(void *ctx,
+ u32 *backlog,
+ u32 backlog_thr,
+ struct codel_vars *vars,
+ struct codel_params *p,
+ u64 now,
+ bool overloaded)
+{
+ struct sk_buff *skb = codel_dequeue_fn(ctx, vars);
+ bool drop;
+
+ if (!skb) {
+ vars->dropping = false;
+ return skb;
+ }
+ drop = codel_should_drop(ctx, skb, backlog, backlog_thr, vars, p, now);
+ if (vars->dropping) {
+ if (!drop) {
+ /* sojourn time below target - leave dropping state */
+ vars->dropping = false;
+ } else if (now >= vars->drop_next) {
+ /* It's time for the next drop. Drop the current
+ * packet and dequeue the next. The dequeue might
+ * take us out of dropping state.
+ * If not, schedule the next drop.
+ * A large backlog might result in drop rates so high
+ * that the next drop should happen now,
+ * hence the while loop.
+ */
+
+ /* saturating increment */
+ vars->count++;
+ if (!vars->count)
+ vars->count--;
+
+ codel_Newton_step(vars);
+ vars->drop_next = codel_control_law(vars->drop_next,
+ p->interval,
+ vars->rec_inv_sqrt);
+ do {
+ if (INET_ECN_set_ce(skb) && !overloaded) {
+ vars->ecn_mark++;
+ /* and schedule the next drop */
+ vars->drop_next = codel_control_law(
+ vars->drop_next, p->interval,
+ vars->rec_inv_sqrt);
+ goto end;
+ }
+ codel_drop_fn(ctx, vars, skb);
+ vars->drop_count++;
+ skb = codel_dequeue_fn(ctx, vars);
+ if (skb && !codel_should_drop(ctx, skb,
+ backlog,
+ backlog_thr,
+ vars, p, now)) {
+ /* leave dropping state */
+ vars->dropping = false;
+ } else {
+ /* schedule the next drop */
+ vars->drop_next = codel_control_law(
+ vars->drop_next, p->interval,
+ vars->rec_inv_sqrt);
+ }
+ } while (skb && vars->dropping && now >=
+ vars->drop_next);
+
+ /* Mark the packet regardless */
+ if (skb && INET_ECN_set_ce(skb))
+ vars->ecn_mark++;
+ }
+ } else if (drop) {
+ if (INET_ECN_set_ce(skb) && !overloaded) {
+ vars->ecn_mark++;
+ } else {
+ codel_drop_fn(ctx, vars, skb);
+ vars->drop_count++;
+
+ skb = codel_dequeue_fn(ctx, vars);
+ drop = codel_should_drop(ctx, skb, backlog,
+ backlog_thr, vars, p, now);
+ if (skb && INET_ECN_set_ce(skb))
+ vars->ecn_mark++;
+ }
+ vars->dropping = true;
+ /* if min went above target close to when we last went below
+ * assume that the drop rate that controlled the queue on the
+ * last cycle is a good starting point to control it now.
+ */
+ if (vars->count > 2 &&
+ now - vars->drop_next < 8 * p->interval) {
+ vars->count -= 2;
+ codel_Newton_step(vars);
+ } else {
+ vars->count = 1;
+ vars->rec_inv_sqrt = ~0U >> REC_INV_SQRT_SHIFT;
+ }
+ codel_Newton_step(vars);
+ vars->drop_next = codel_control_law(now, p->interval,
+ vars->rec_inv_sqrt);
+ }
+end:
+ return skb;
+}
+#endif
diff --git a/net/mac80211/codel_i.h b/net/mac80211/codel_i.h
new file mode 100644
index 000000000000..57369d78d131
--- /dev/null
+++ b/net/mac80211/codel_i.h
@@ -0,0 +1,100 @@
+#ifndef __NET_MAC80211_CODEL_I_H
+#define __NET_MAC80211_CODEL_I_H
+
+/*
+ * Codel - The Controlled-Delay Active Queue Management algorithm
+ *
+ * Copyright (C) 2011-2012 Kathleen Nichols <[email protected]>
+ * Copyright (C) 2011-2012 Van Jacobson <[email protected]>
+ * Copyright (C) 2016 Michael D. Taht <[email protected]>
+ * Copyright (C) 2012 Eric Dumazet <[email protected]>
+ * Copyright (C) 2015 Jonathan Morton <[email protected]>
+ * Copyright (C) 2016 Michal Kazior <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions, and the following disclaimer,
+ * without modification.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. The names of the authors may not be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License ("GPL") version 2, in which case the provisions of the
+ * GPL apply INSTEAD OF those given above.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ *
+ */
+
+#include <linux/types.h>
+#include <linux/ktime.h>
+#include <linux/skbuff.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <linux/reciprocal_div.h>
+
+/* Controlling Queue Delay (CoDel) algorithm
+ * =========================================
+ * Source : Kathleen Nichols and Van Jacobson
+ * http://queue.acm.org/detail.cfm?id=2209336
+ *
+ * Implemented on linux by Dave Taht and Eric Dumazet
+ */
+
+/* CoDel5 uses a real clock, unlike codel */
+
+#define MS2TIME(a) ((a) * (u64)NSEC_PER_MSEC)
+#define US2TIME(a) ((a) * (u64)NSEC_PER_USEC)
+
+/**
+ * struct codel_vars - contains codel variables
+ * @count: how many drops we've done since the last time we
+ * entered dropping state
+ * @dropping: set to > 0 if in dropping state
+ * @rec_inv_sqrt: reciprocal value of sqrt(count) >> 1
+ * @first_above_time: when we went (or will go) continuously above target
+ * for interval
+ * @drop_next: time to drop next packet, or when we dropped last
+ * @drop_count: temp count of dropped packets in dequeue()
+ * @ecn_mark: number of packets we ECN marked instead of dropping
+ */
+
+struct codel_vars {
+ u32 count;
+ u16 dropping;
+ u16 rec_inv_sqrt;
+ u64 first_above_time;
+ u64 drop_next;
+ u16 drop_count;
+ u16 ecn_mark;
+};
+
+/**
+ * struct codel_params - stores codel parameters
+ *
+ * @interval: initial drop rate
+ * @target: maximum persistent sojourn time
+ */
+struct codel_params {
+ u64 interval;
+ u64 target;
+};
+
+#endif
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 49396d13ba9a..78953b495a25 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -31,6 +31,7 @@
#include <net/cfg80211.h>
#include <net/mac80211.h>
#include "fq_i.h"
+#include "codel_i.h"
#include "key.h"
#include "sta_info.h"
#include "debug.h"
@@ -811,10 +812,12 @@ enum txq_info_flags {
* @tin: contains packets split into multiple flows
* @def_flow: used as a fallback flow when a packet destined to @tin hashes to
* a fq_flow which is already owned by a different tin
+ * @def_cvars: codel vars for @def_flow
*/
struct txq_info {
struct fq_tin tin;
struct fq_flow def_flow;
+ struct codel_vars def_cvars;
unsigned long flags;

/* keep last! */
@@ -1106,6 +1109,8 @@ struct ieee80211_local {
struct ieee80211_hw hw;

struct fq fq;
+ struct codel_vars *cvars;
+ struct codel_params cparams;

const struct ieee80211_ops *ops;

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 396d0d17edeb..238cb8e979fd 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -36,6 +36,7 @@
#include "wme.h"
#include "rate.h"
#include "fq.h"
+#include "codel.h"

static unsigned int fq_flows_cnt = 4096;
module_param(fq_flows_cnt, uint, 0644);
@@ -1265,11 +1266,86 @@ static struct txq_info *ieee80211_get_txq(struct ieee80211_local *local,
return NULL;
}

+static void ieee80211_set_skb_enqueue_time(struct sk_buff *skb)
+{
+ IEEE80211_SKB_CB(skb)->control.enqueue_time = codel_get_time();
+}
+
+static void ieee80211_set_skb_vif(struct sk_buff *skb, struct txq_info *txqi)
+{
+ IEEE80211_SKB_CB(skb)->control.vif = txqi->txq.vif;
+}
+
+static u64 codel_get_enqueue_time_fn(void *ctx,
+ struct sk_buff *skb)
+{
+ return IEEE80211_SKB_CB(skb)->control.enqueue_time;
+}
+
+static struct sk_buff *codel_dequeue_fn(void *ctx,
+ struct codel_vars *cvars)
+{
+ struct ieee80211_local *local;
+ struct txq_info *txqi;
+ struct fq *fq;
+ struct fq_flow *flow;
+
+ txqi = ctx;
+ local = vif_to_sdata(txqi->txq.vif)->local;
+ fq = &local->fq;
+
+ if (cvars == &txqi->def_cvars)
+ flow = &txqi->def_flow;
+ else
+ flow = &fq->flows[cvars - local->cvars];
+
+ return fq_flow_dequeue(fq, flow);
+}
+
+static void codel_drop_fn(void *ctx,
+ struct codel_vars *cvars,
+ struct sk_buff *skb)
+{
+ struct ieee80211_local *local;
+ struct ieee80211_hw *hw;
+ struct txq_info *txqi;
+
+ txqi = ctx;
+ local = vif_to_sdata(txqi->txq.vif)->local;
+ hw = &local->hw;
+
+ ieee80211_free_txskb(hw, skb);
+}
+
static struct sk_buff *fq_tin_dequeue_fn(struct fq *fq,
struct fq_tin *tin,
struct fq_flow *flow)
{
- return fq_flow_dequeue(fq, flow);
+ struct ieee80211_local *local;
+ struct txq_info *txqi;
+ struct codel_vars *cvars;
+ struct codel_params *cparams;
+ bool overloaded;
+
+ local = container_of(fq, struct ieee80211_local, fq);
+ txqi = container_of(tin, struct txq_info, tin);
+ cparams = &local->cparams;
+
+ if (flow == &txqi->def_flow)
+ cvars = &txqi->def_cvars;
+ else
+ cvars = &local->cvars[flow - fq->flows];
+
+ /* TODO */
+ overloaded = false;
+
+ return codel_dequeue(txqi,
+ &flow->backlog,
+ 0,
+ cvars,
+ cparams,
+ codel_get_time(),
+ overloaded);
}

static void fq_skb_free_fn(struct fq *fq,
@@ -1301,6 +1377,7 @@ static void ieee80211_txq_enqueue(struct ieee80211_local *local,
struct fq *fq = &local->fq;
struct fq_tin *tin = &txqi->tin;

+ ieee80211_set_skb_enqueue_time(skb);
fq_tin_enqueue(fq, tin, skb);
}

@@ -1310,6 +1387,7 @@ void ieee80211_txq_init(struct ieee80211_sub_if_data *sdata,
{
fq_tin_init(&txqi->tin);
fq_flow_init(&txqi->def_flow);
+ codel_vars_init(&txqi->def_cvars);

txqi->txq.vif = &sdata->vif;

@@ -1338,6 +1416,7 @@ int ieee80211_txq_setup_flows(struct ieee80211_local *local)
{
struct fq *fq = &local->fq;
int ret;
+ int i;

if (!local->ops->wake_tx_queue)
return 0;
@@ -1346,6 +1425,19 @@ int ieee80211_txq_setup_flows(struct ieee80211_local *local)
if (ret)
return ret;

+ local->cparams.interval = MS2TIME(100);
+ local->cparams.target = MS2TIME(20);
+
+ local->cvars = kcalloc(fq->flows_cnt, sizeof(local->cvars[0]),
+ GFP_KERNEL);
+ if (!local->cvars) {
+ fq_reset(fq);
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < fq->flows_cnt; i++)
+ codel_vars_init(&local->cvars[i]);
+
return 0;
}

@@ -1356,6 +1448,9 @@ void ieee80211_txq_teardown_flows(struct ieee80211_local *local)
if (!local->ops->wake_tx_queue)
return;

+ kfree(local->cvars);
+ local->cvars = NULL;
+
fq_reset(fq);
}

@@ -1378,6 +1473,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
if (!skb)
goto out;

+ ieee80211_set_skb_vif(skb, txqi);
+
hdr = (struct ieee80211_hdr *)skb->data;
if (txq->sta && ieee80211_is_data_qos(hdr->frame_control)) {
struct sta_info *sta = container_of(txq->sta, struct sta_info,
--
2.1.4


2016-04-19 09:10:22

by Johannes Berg

[permalink] [raw]
Subject: Re: [Codel] [PATCHv3 2/5] mac80211: implement fair queueing per txq

On Mon, 2016-04-18 at 15:36 +0200, Michal Kazior wrote:

> FWIW cfg80211 drivers might become another user of the fq/codel stuff
> in the future.
>
> Arguably I should make include/net/codel.h not be qdisc specific as
> it is now (and hence re-usable by mac80211) and submit fq.h to
> include/net/. Would that be better (it'll probably take a lot longer
> to propagate over trees, no?)

I think it would be better, and we could sync and take it through my
tree, or I can sync and just pull back davem's tree once it's there; I
don't think code "management" wise it'll be an issue.

> johannes

2016-04-16 22:25:54

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCHv3 2/5] mac80211: implement fair queueing per txq

On Sun, 2016-04-17 at 00:23 +0200, Johannes Berg wrote:
> On Thu, 2016-04-14 at 14:18 +0200, Michal Kazior wrote:
> >
> >  
> > +++ b/net/mac80211/fq.h
> >
> Now that you've mostly rewritten it, why keep it in a .h file?
>

I think I just confused this with codel.h, but still - why a .h file
with all this "real" code, meaning the file can really only be included
once?

johannes

2016-04-16 22:21:10

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCHv3 1/5] mac80211: skip netdev queue control with software queuing

> +static void ieee80211_txq_enqueue(struct ieee80211_local *local,
> +   struct txq_info *txqi,
> +   struct sk_buff *skb)
> +{
> + lockdep_assert_held(&txqi->queue.lock);
[...]
> + atomic_inc(&local->num_tx_queued);

This global kinda bothers me - anything we can do about removing it?

We obviously didn't have this before - just one (even bigger!) limit
per queue, so that's 4000 frames by default per interface ... now
you're down to 512 for the entire hardware. Perhaps keeping it per
interface at least keeps away the worst of the contention here?

johannes

2016-04-06 07:17:02

by Michal Kazior

[permalink] [raw]
Subject: Re: [Make-wifi-fast] [PATCHv2 1/2] mac80211: implement fair queuing per txq

On 6 April 2016 at 08:03, Jonathan Morton <[email protected]> wrote:
>
>> On 6 Apr, 2016, at 08:35, Michal Kazior <[email protected]> wrote:
>>
>> Packets can be destined to different stations/txqs. At enqueue time I
>> do a partial hash of a packet to get an "index" which I then use to
>> address a txq_flow from per-radio list (out of 4096 of them). You can
>> end up with a situation like this:
>> - packet A hashing to X destined to txq P which is VI
>> - packet B hashing to X destined to txq Q which is BK
>>
>> You can't use the same txq_flow for both A and B because you want to
>> maintain packets per txqs more than you want to maintain them per flow
>> (you don't want to queue BK traffic onto VI or vice versa as an
>> artifact, do you? ;). When a txq_flow doesn't have a txqi yet, it gets bound to one.
>> Later, if a collision happens (i.e. resulting txq_flow has non-NULL
>> txqi) the "embedded" per-txq flow is used:
>>
>> struct txq_info {
>> - struct sk_buff_head queue;
>> + struct txq_flow flow; // <--- this
>>
>> When txq_flow becomes empty its txqi is reset.
>>
>> The embedded flow is otherwise treated like any other flow, i.e. it
>> can be linked to old_flows and new_flows.
>
> This smells like a very fragile and complex solution to the collision problem. You may want to look at how Cake solves it.
>
> I use a separate pool of flows per traffic class (essentially, VO/VI/BE/BK), and there is also a set-associative hash to take care of the birthday problem. The latter has an order-of-magnitude effect on the general flow collision rate once you get into the tens of flows, for very little CPU cost.

When a driver asks mac80211 to dequeue given txq it implies a
destination station as well. This is important because 802.11
aggregation can be performed only on groups of packets going to a
single station on a single tid.

Cake - as I understand it - doesn't really *guarantee* maintaining
this. Keep in mind you can run with hundreds of stations connected.

You don't really want to burden drivers with sorting this grouping out
themselves (and hence coerce them into introducing yet another level
of intermediate queues).

Without the per-txq fallback flow (regardless of using fq_codel-like
scheme or cake-like scheme in mac80211) you'll need to modify
codel_dequeue() itself to compensate and re-queue/skip frames not
belonging to requested txq.

I'm not sure it's worth it for initial fair-queuing implementation.


Michał

2016-04-18 13:36:26

by Michal Kazior

[permalink] [raw]
Subject: Re: [Codel] [PATCHv3 2/5] mac80211: implement fair queueing per txq

On 18 April 2016 at 14:31, Eric Dumazet <[email protected]> wrote:
> On Mon, 2016-04-18 at 07:16 +0200, Michal Kazior wrote:
>
>>
>> I guess .h file can give the compiler an opportunity for more
>> optimizations. With .c you would need LTO which I'm not sure if it's
>> available everywhere.
>>
>
> This makes little sense really. Otherwise everything would be in .h
> files.
>
> include/net/codel.h is an include file because both codel and fq_codel
> use a common template for codel_dequeue() in fast path.
>
> But net/mac80211/fq.h is included once, so should be a .c

FWIW cfg80211 drivers might become another user of the fq/codel stuff
in the future.

Arguably I should make include/net/codel.h not be qdisc specific as it
is now (and hence re-usable by mac80211) and submit fq.h to
include/net/. Would that be better (it'll probably take a lot longer
to propagate over trees, no?)


>
> Certainly all the code in control plan is not fast path and does not
> deserve being duplicated.

Good point. The fq init/reset stuff is probably a good example.

However if I were to put fq.h into include/net/ where should I put the
init/reset stuff then? net/core/?


Michał

2016-04-14 12:16:31

by Michal Kazior

[permalink] [raw]
Subject: [PATCHv3 2/5] mac80211: implement fair queueing per txq

mac80211's software queues were designed to work
very closely with device tx queues. They are
required to make use of 802.11 packet aggregation
easily and efficiently.

Due to the way 802.11 aggregation is designed it
only makes sense to keep fair queuing as close to
hardware as possible to reduce induced latency and
inertia and provide the best flow responsiveness.

This change doesn't translate directly into
immediate and significant gains. The end result
depends on the driver's induced latency. Best
results are achieved if the driver keeps its own
tx queue/fifo fill level to a minimum.

Signed-off-by: Michal Kazior <[email protected]>
---
net/mac80211/agg-tx.c | 8 +-
net/mac80211/fq.h | 265 +++++++++++++++++++++++++++++++++++++++++++++
net/mac80211/fq_i.h | 75 +++++++++++++
net/mac80211/ieee80211_i.h | 25 ++++-
net/mac80211/iface.c | 12 +-
net/mac80211/main.c | 7 ++
net/mac80211/rx.c | 2 +-
net/mac80211/sta_info.c | 14 +--
net/mac80211/tx.c | 110 ++++++++++++++++---
net/mac80211/util.c | 23 +---
10 files changed, 481 insertions(+), 60 deletions(-)
create mode 100644 net/mac80211/fq.h
create mode 100644 net/mac80211/fq_i.h

diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
index 4932e9f243a2..908ac84a1962 100644
--- a/net/mac80211/agg-tx.c
+++ b/net/mac80211/agg-tx.c
@@ -194,17 +194,21 @@ static void
ieee80211_agg_stop_txq(struct sta_info *sta, int tid)
{
struct ieee80211_txq *txq = sta->sta.txq[tid];
+ struct ieee80211_sub_if_data *sdata;
+ struct fq *fq;
struct txq_info *txqi;

if (!txq)
return;

txqi = to_txq_info(txq);
+ sdata = vif_to_sdata(txq->vif);
+ fq = &sdata->local->fq;

/* Lock here to protect against further seqno updates on dequeue */
- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);
set_bit(IEEE80211_TXQ_STOP, &txqi->flags);
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);
}

static void
diff --git a/net/mac80211/fq.h b/net/mac80211/fq.h
new file mode 100644
index 000000000000..fa98576e1825
--- /dev/null
+++ b/net/mac80211/fq.h
@@ -0,0 +1,265 @@
+/*
+ * Copyright (c) 2016 Qualcomm Atheros, Inc
+ *
+ * GPL v2
+ *
+ * Based on net/sched/sch_fq_codel.c
+ */
+#ifndef FQ_H
+#define FQ_H
+
+#include "fq_i.h"
+
+/* forward declarations the includer must implement */
+
+static struct sk_buff *fq_tin_dequeue_fn(struct fq *,
+ struct fq_tin *,
+ struct fq_flow *flow);
+
+static void fq_skb_free_fn(struct fq *,
+ struct fq_tin *,
+ struct fq_flow *,
+ struct sk_buff *);
+
+static struct fq_flow *fq_flow_get_default_fn(struct fq *,
+ struct fq_tin *,
+ int idx,
+ struct sk_buff *);
+
+/* functions that are embedded into includer */
+
+static struct sk_buff *fq_flow_dequeue(struct fq *fq,
+ struct fq_flow *flow)
+{
+ struct fq_tin *tin = flow->tin;
+ struct fq_flow *i;
+ struct sk_buff *skb;
+
+ lockdep_assert_held(&fq->lock);
+
+ skb = __skb_dequeue(&flow->queue);
+ if (!skb)
+ return NULL;
+
+ tin->backlog_bytes -= skb->len;
+ tin->backlog_packets--;
+ flow->backlog -= skb->len;
+ fq->backlog--;
+
+ if (flow->backlog == 0) {
+ list_del_init(&flow->backlogchain);
+ } else {
+ i = flow;
+
+ list_for_each_entry_continue(i, &fq->backlogs, backlogchain)
+ if (i->backlog < flow->backlog)
+ break;
+
+ list_move_tail(&flow->backlogchain,
+ &i->backlogchain);
+ }
+
+ return skb;
+}
+
+static struct sk_buff *fq_tin_dequeue(struct fq *fq,
+ struct fq_tin *tin)
+{
+ struct fq_flow *flow;
+ struct list_head *head;
+ struct sk_buff *skb;
+
+ lockdep_assert_held(&fq->lock);
+
+begin:
+ head = &tin->new_flows;
+ if (list_empty(head)) {
+ head = &tin->old_flows;
+ if (list_empty(head))
+ return NULL;
+ }
+
+ flow = list_first_entry(head, struct fq_flow, flowchain);
+
+ if (flow->deficit <= 0) {
+ flow->deficit += fq->quantum;
+ list_move_tail(&flow->flowchain,
+ &tin->old_flows);
+ goto begin;
+ }
+
+ skb = fq_tin_dequeue_fn(fq, tin, flow);
+ if (!skb) {
+ /* force a pass through old_flows to prevent starvation */
+ if ((head == &tin->new_flows) &&
+ !list_empty(&tin->old_flows)) {
+ list_move_tail(&flow->flowchain, &tin->old_flows);
+ } else {
+ list_del_init(&flow->flowchain);
+ flow->tin = NULL;
+ }
+ goto begin;
+ }
+
+ flow->deficit -= skb->len;
+
+ return skb;
+}
+
+static struct fq_flow *fq_flow_classify(struct fq *fq,
+ struct fq_tin *tin,
+ struct sk_buff *skb)
+{
+ struct fq_flow *flow;
+ u32 hash;
+ u32 idx;
+
+ lockdep_assert_held(&fq->lock);
+
+ hash = skb_get_hash_perturb(skb, fq->perturbation);
+ idx = reciprocal_scale(hash, fq->flows_cnt);
+ flow = &fq->flows[idx];
+
+ if (flow->tin && flow->tin != tin)
+ flow = fq_flow_get_default_fn(fq, tin, idx, skb);
+
+ return flow;
+}
+
+static void fq_tin_enqueue(struct fq *fq,
+ struct fq_tin *tin,
+ struct sk_buff *skb)
+{
+ struct fq_flow *flow;
+ struct fq_flow *i;
+
+ lockdep_assert_held(&fq->lock);
+
+ flow = fq_flow_classify(fq, tin, skb);
+
+ flow->tin = tin;
+ flow->backlog += skb->len;
+ tin->backlog_bytes += skb->len;
+ tin->backlog_packets++;
+ fq->backlog++;
+
+ if (list_empty(&flow->backlogchain))
+ list_add_tail(&flow->backlogchain, &fq->backlogs);
+
+ i = flow;
+ list_for_each_entry_continue_reverse(i, &fq->backlogs,
+ backlogchain)
+ if (i->backlog > flow->backlog)
+ break;
+
+ list_move(&flow->backlogchain, &i->backlogchain);
+
+ if (list_empty(&flow->flowchain)) {
+ flow->deficit = fq->quantum;
+ list_add_tail(&flow->flowchain,
+ &tin->new_flows);
+ }
+
+ __skb_queue_tail(&flow->queue, skb);
+
+ if (fq->backlog > fq->limit) {
+ flow = list_first_entry_or_null(&fq->backlogs,
+ struct fq_flow,
+ backlogchain);
+ if (!flow)
+ return;
+
+ skb = fq_flow_dequeue(fq, flow);
+ if (!skb)
+ return;
+
+ fq_skb_free_fn(fq, flow->tin, flow, skb);
+ }
+}
+
+static void fq_flow_reset(struct fq *fq, struct fq_flow *flow)
+{
+ struct sk_buff *skb;
+
+ while ((skb = fq_flow_dequeue(fq, flow)))
+ fq_skb_free_fn(fq, flow->tin, flow, skb);
+
+ if (!list_empty(&flow->flowchain))
+ list_del_init(&flow->flowchain);
+
+ if (!list_empty(&flow->backlogchain))
+ list_del_init(&flow->backlogchain);
+
+ flow->tin = NULL;
+
+ WARN_ON_ONCE(flow->backlog);
+}
+
+static void fq_tin_reset(struct fq *fq, struct fq_tin *tin)
+{
+ struct list_head *head;
+ struct fq_flow *flow;
+
+ for (;;) {
+ head = &tin->new_flows;
+ if (list_empty(head)) {
+ head = &tin->old_flows;
+ if (list_empty(head))
+ break;
+ }
+
+ flow = list_first_entry(head, struct fq_flow, flowchain);
+ fq_flow_reset(fq, flow);
+ }
+
+ WARN_ON_ONCE(tin->backlog_bytes);
+ WARN_ON_ONCE(tin->backlog_packets);
+}
+
+static void fq_flow_init(struct fq_flow *flow)
+{
+ INIT_LIST_HEAD(&flow->flowchain);
+ INIT_LIST_HEAD(&flow->backlogchain);
+ __skb_queue_head_init(&flow->queue);
+}
+
+static void fq_tin_init(struct fq_tin *tin)
+{
+ INIT_LIST_HEAD(&tin->new_flows);
+ INIT_LIST_HEAD(&tin->old_flows);
+}
+
+static int fq_init(struct fq *fq, int flows_cnt)
+{
+ int i;
+
+ memset(fq, 0, sizeof(fq[0]));
+ INIT_LIST_HEAD(&fq->backlogs);
+ spin_lock_init(&fq->lock);
+ fq->flows_cnt = max_t(u32, flows_cnt, 1);
+ fq->perturbation = prandom_u32();
+ fq->quantum = 300;
+ fq->limit = 8192;
+
+ fq->flows = kcalloc(fq->flows_cnt, sizeof(fq->flows[0]), GFP_KERNEL);
+ if (!fq->flows)
+ return -ENOMEM;
+
+ for (i = 0; i < fq->flows_cnt; i++)
+ fq_flow_init(&fq->flows[i]);
+
+ return 0;
+}
+
+static void fq_reset(struct fq *fq)
+{
+ int i;
+
+ for (i = 0; i < fq->flows_cnt; i++)
+ fq_flow_reset(fq, &fq->flows[i]);
+
+ kfree(fq->flows);
+ fq->flows = NULL;
+}
+
+#endif
diff --git a/net/mac80211/fq_i.h b/net/mac80211/fq_i.h
new file mode 100644
index 000000000000..5d8423f22e8d
--- /dev/null
+++ b/net/mac80211/fq_i.h
@@ -0,0 +1,75 @@
+/*
+ * Copyright (c) 2016 Qualcomm Atheros, Inc
+ *
+ * GPL v2
+ *
+ * Based on net/sched/sch_fq_codel.c
+ */
+#ifndef FQ_I_H
+#define FQ_I_H
+
+struct fq_tin;
+struct fq_flow;
+
+/**
+ * struct fq_flow - per traffic flow queue
+ *
+ * @tin: owner of this flow. Used to manage collisions, i.e. when a packet
+ *	hashes to an index pointing at a flow that is already owned by a tin
+ *	other than the one the packet is destined to. In such a case the
+ *	implementer must provide a fallback flow
+ * @flowchain: can be linked to fq_tin's new_flows or old_flows. Used for DRR++
+ * (deficit round robin) based round robin queuing similar to the one
+ * found in net/sched/sch_fq_codel.c
+ * @backlogchain: can be linked to other fq_flow and fq. Used to keep track of
+ * fat flows and efficient head-dropping if packet limit is reached
+ * @queue: sk_buff queue to hold packets
+ * @backlog: number of bytes pending in the queue. The number of packets can be
+ * found in @queue.qlen
+ * @deficit: used for DRR++
+ */
+struct fq_flow {
+ struct fq_tin *tin;
+ struct list_head flowchain;
+ struct list_head backlogchain;
+ struct sk_buff_head queue;
+ u32 backlog;
+ int deficit;
+};
+
+/**
+ * struct fq_tin - a logical container of fq_flows
+ *
+ * Used to group fq_flows into a logical aggregate. DRR++ scheme is used to
+ * pull interleaved packets out of the associated flows.
+ *
+ * @new_flows: linked list of fq_flow
+ * @old_flows: linked list of fq_flow
+ */
+struct fq_tin {
+ struct list_head new_flows;
+ struct list_head old_flows;
+ u32 backlog_bytes;
+ u32 backlog_packets;
+};
+
+/**
+ * struct fq - main container for fair queuing purposes
+ *
+ * @backlogs: linked to fq_flows. Used to maintain fat flows for efficient
+ * head-dropping when @backlog reaches @limit
+ * @limit: max number of packets that can be queued across all flows
+ * @backlog: number of packets queued across all flows
+ */
+struct fq {
+ struct fq_flow *flows;
+ struct list_head backlogs;
+ spinlock_t lock;
+ u32 flows_cnt;
+ u32 perturbation;
+ u32 limit;
+ u32 quantum;
+ u32 backlog;
+};
+
+#endif
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index b2570aa66d33..49396d13ba9a 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -30,6 +30,7 @@
#include <net/ieee80211_radiotap.h>
#include <net/cfg80211.h>
#include <net/mac80211.h>
+#include "fq_i.h"
#include "key.h"
#include "sta_info.h"
#include "debug.h"
@@ -804,10 +805,17 @@ enum txq_info_flags {
IEEE80211_TXQ_AMPDU,
};

+/**
+ * struct txq_info - per tid queue
+ *
+ * @tin: contains packets split into multiple flows
+ * @def_flow: used as a fallback flow when a packet destined to @tin hashes to
+ * a fq_flow which is already owned by a different tin
+ */
struct txq_info {
- struct sk_buff_head queue;
+ struct fq_tin tin;
+ struct fq_flow def_flow;
unsigned long flags;
- unsigned long byte_cnt;

/* keep last! */
struct ieee80211_txq txq;
@@ -1097,6 +1105,8 @@ struct ieee80211_local {
* it first anyway so they become a no-op */
struct ieee80211_hw hw;

+ struct fq fq;
+
const struct ieee80211_ops *ops;

/*
@@ -1117,7 +1127,6 @@ struct ieee80211_local {
fif_probe_req;
int probe_req_reg;
unsigned int filter_flags; /* FIF_* */
- atomic_t num_tx_queued;

bool wiphy_ciphers_allocated;

@@ -1925,9 +1934,13 @@ static inline bool ieee80211_can_run_worker(struct ieee80211_local *local)
return true;
}

-void ieee80211_init_tx_queue(struct ieee80211_sub_if_data *sdata,
- struct sta_info *sta,
- struct txq_info *txq, int tid);
+int ieee80211_txq_setup_flows(struct ieee80211_local *local);
+void ieee80211_txq_teardown_flows(struct ieee80211_local *local);
+void ieee80211_txq_init(struct ieee80211_sub_if_data *sdata,
+ struct sta_info *sta,
+ struct txq_info *txq, int tid);
+void ieee80211_txq_purge(struct ieee80211_local *local,
+ struct txq_info *txqi);
void ieee80211_send_auth(struct ieee80211_sub_if_data *sdata,
u16 transaction, u16 auth_alg, u16 status,
const u8 *extra, size_t extra_len, const u8 *bssid,
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 67c9b1e565ad..a7ac80944ae6 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -779,6 +779,7 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
bool going_down)
{
struct ieee80211_local *local = sdata->local;
+ struct fq *fq = &local->fq;
unsigned long flags;
struct sk_buff *skb, *tmp;
u32 hw_reconf_flags = 0;
@@ -976,13 +977,10 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,

if (sdata->vif.txq) {
struct txq_info *txqi = to_txq_info(sdata->vif.txq);
- int n = skb_queue_len(&txqi->queue);

- spin_lock_bh(&txqi->queue.lock);
- ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
- atomic_sub(n, &local->num_tx_queued);
- txqi->byte_cnt = 0;
- spin_unlock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);
+ ieee80211_txq_purge(local, txqi);
+ spin_unlock_bh(&fq->lock);
}

if (local->open_count == 0)
@@ -1792,7 +1790,7 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,

if (txq_size) {
txqi = netdev_priv(ndev) + size;
- ieee80211_init_tx_queue(sdata, NULL, txqi, 0);
+ ieee80211_txq_init(sdata, NULL, txqi, 0);
}

sdata->dev = ndev;
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 609abc39e454..a1b19297ac22 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -1084,6 +1084,10 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)

rtnl_unlock();

+ result = ieee80211_txq_setup_flows(local);
+ if (result)
+ goto fail_flows;
+
#ifdef CONFIG_INET
local->ifa_notifier.notifier_call = ieee80211_ifa_changed;
result = register_inetaddr_notifier(&local->ifa_notifier);
@@ -1109,6 +1113,8 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
#if defined(CONFIG_INET) || defined(CONFIG_IPV6)
fail_ifa:
#endif
+ ieee80211_txq_teardown_flows(local);
+ fail_flows:
rtnl_lock();
rate_control_deinitialize(local);
ieee80211_remove_interfaces(local);
@@ -1167,6 +1173,7 @@ void ieee80211_unregister_hw(struct ieee80211_hw *hw)
skb_queue_purge(&local->skb_queue);
skb_queue_purge(&local->skb_queue_unreliable);
skb_queue_purge(&local->skb_queue_tdls_chsw);
+ ieee80211_txq_teardown_flows(local);

destroy_workqueue(local->workqueue);
wiphy_unregister(local->hw.wiphy);
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 91279576f4a7..5c52fc14a0e9 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1268,7 +1268,7 @@ static void sta_ps_start(struct sta_info *sta)
for (tid = 0; tid < ARRAY_SIZE(sta->sta.txq); tid++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[tid]);

- if (!skb_queue_len(&txqi->queue))
+ if (!txqi->tin.backlog_packets)
set_bit(tid, &sta->txq_buffered_tids);
else
clear_bit(tid, &sta->txq_buffered_tids);
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 4ab97d454bc1..15a1265d20b0 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -89,6 +89,7 @@ static void __cleanup_single_sta(struct sta_info *sta)
struct tid_ampdu_tx *tid_tx;
struct ieee80211_sub_if_data *sdata = sta->sdata;
struct ieee80211_local *local = sdata->local;
+ struct fq *fq = &local->fq;
struct ps_data *ps;

if (test_sta_flag(sta, WLAN_STA_PS_STA) ||
@@ -112,11 +113,10 @@ static void __cleanup_single_sta(struct sta_info *sta)
if (sta->sta.txq[0]) {
for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[i]);
- int n = skb_queue_len(&txqi->queue);

- ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
- atomic_sub(n, &local->num_tx_queued);
- txqi->byte_cnt = 0;
+ spin_lock_bh(&fq->lock);
+ ieee80211_txq_purge(local, txqi);
+ spin_unlock_bh(&fq->lock);
}
}

@@ -357,7 +357,7 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
struct txq_info *txq = txq_data + i * size;

- ieee80211_init_tx_queue(sdata, sta, txq, i);
+ ieee80211_txq_init(sdata, sta, txq, i);
}
}

@@ -1193,7 +1193,7 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta)
for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[i]);

- if (!skb_queue_len(&txqi->queue))
+ if (!txqi->tin.backlog_packets)
continue;

drv_wake_tx_queue(local, txqi);
@@ -1630,7 +1630,7 @@ ieee80211_sta_ps_deliver_response(struct sta_info *sta,
for (tid = 0; tid < ARRAY_SIZE(sta->sta.txq); tid++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[tid]);

- if (!(tids & BIT(tid)) || skb_queue_len(&txqi->queue))
+ if (!(tids & BIT(tid)) || txqi->tin.backlog_packets)
continue;

sta_info_recalc_tim(sta);
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 8de3d2676397..d4e0c87ecec5 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -34,6 +34,7 @@
#include "wpa.h"
#include "wme.h"
#include "rate.h"
+#include "fq.h"

/* misc utils */

@@ -1258,21 +1259,98 @@ static struct txq_info *ieee80211_get_txq(struct ieee80211_local *local,
return NULL;
}

+static struct sk_buff *fq_tin_dequeue_fn(struct fq *fq,
+ struct fq_tin *tin,
+ struct fq_flow *flow)
+{
+ return fq_flow_dequeue(fq, flow);
+}
+
+static void fq_skb_free_fn(struct fq *fq,
+ struct fq_tin *tin,
+ struct fq_flow *flow,
+ struct sk_buff *skb)
+{
+ struct ieee80211_local *local;
+
+ local = container_of(fq, struct ieee80211_local, fq);
+ ieee80211_free_txskb(&local->hw, skb);
+}
+
+static struct fq_flow *fq_flow_get_default_fn(struct fq *fq,
+ struct fq_tin *tin,
+ int idx,
+ struct sk_buff *skb)
+{
+ struct txq_info *txqi;
+
+ txqi = container_of(tin, struct txq_info, tin);
+ return &txqi->def_flow;
+}
+
static void ieee80211_txq_enqueue(struct ieee80211_local *local,
struct txq_info *txqi,
struct sk_buff *skb)
{
- lockdep_assert_held(&txqi->queue.lock);
+ struct fq *fq = &local->fq;
+ struct fq_tin *tin = &txqi->tin;

- if (atomic_read(&local->num_tx_queued) >= TOTAL_MAX_TX_BUFFER ||
- txqi->queue.qlen >= STA_MAX_TX_BUFFER) {
- ieee80211_free_txskb(&local->hw, skb);
- return;
+ fq_tin_enqueue(fq, tin, skb);
+}
+
+void ieee80211_txq_init(struct ieee80211_sub_if_data *sdata,
+ struct sta_info *sta,
+ struct txq_info *txqi, int tid)
+{
+ fq_tin_init(&txqi->tin);
+ fq_flow_init(&txqi->def_flow);
+
+ txqi->txq.vif = &sdata->vif;
+
+ if (sta) {
+ txqi->txq.sta = &sta->sta;
+ sta->sta.txq[tid] = &txqi->txq;
+ txqi->txq.tid = tid;
+ txqi->txq.ac = ieee802_1d_to_ac[tid & 7];
+ } else {
+ sdata->vif.txq = &txqi->txq;
+ txqi->txq.tid = 0;
+ txqi->txq.ac = IEEE80211_AC_BE;
}
+}

- atomic_inc(&local->num_tx_queued);
- txqi->byte_cnt += skb->len;
- __skb_queue_tail(&txqi->queue, skb);
+void ieee80211_txq_purge(struct ieee80211_local *local,
+ struct txq_info *txqi)
+{
+ struct fq *fq = &local->fq;
+ struct fq_tin *tin = &txqi->tin;
+
+ fq_tin_reset(fq, tin);
+}
+
+int ieee80211_txq_setup_flows(struct ieee80211_local *local)
+{
+ struct fq *fq = &local->fq;
+ int ret;
+
+ if (!local->ops->wake_tx_queue)
+ return 0;
+
+ ret = fq_init(fq, 4096);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+void ieee80211_txq_teardown_flows(struct ieee80211_local *local)
+{
+ struct fq *fq = &local->fq;
+
+ if (!local->ops->wake_tx_queue)
+ return;
+
+ fq_reset(fq);
}

struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
@@ -1282,19 +1360,18 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
struct txq_info *txqi = container_of(txq, struct txq_info, txq);
struct ieee80211_hdr *hdr;
struct sk_buff *skb = NULL;
+ struct fq *fq = &local->fq;
+ struct fq_tin *tin = &txqi->tin;

- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);

if (test_bit(IEEE80211_TXQ_STOP, &txqi->flags))
goto out;

- skb = __skb_dequeue(&txqi->queue);
+ skb = fq_tin_dequeue(fq, tin);
if (!skb)
goto out;

- atomic_dec(&local->num_tx_queued);
- txqi->byte_cnt -= skb->len;
-
hdr = (struct ieee80211_hdr *)skb->data;
if (txq->sta && ieee80211_is_data_qos(hdr->frame_control)) {
struct sta_info *sta = container_of(txq->sta, struct sta_info,
@@ -1309,7 +1386,7 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
}

out:
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);

return skb;
}
@@ -1322,6 +1399,7 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
bool txpending)
{
struct ieee80211_tx_control control = {};
+ struct fq *fq = &local->fq;
struct sk_buff *skb, *tmp;
struct txq_info *txqi;
unsigned long flags;
@@ -1344,9 +1422,9 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,

__skb_unlink(skb, skbs);

- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);
ieee80211_txq_enqueue(local, txqi, skb);
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);

drv_wake_tx_queue(local, txqi);

diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index f13b08896238..bee776e46612 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -3389,25 +3389,6 @@ u8 *ieee80211_add_wmm_info_ie(u8 *buf, u8 qosinfo)
return buf;
}

-void ieee80211_init_tx_queue(struct ieee80211_sub_if_data *sdata,
- struct sta_info *sta,
- struct txq_info *txqi, int tid)
-{
- skb_queue_head_init(&txqi->queue);
- txqi->txq.vif = &sdata->vif;
-
- if (sta) {
- txqi->txq.sta = &sta->sta;
- sta->sta.txq[tid] = &txqi->txq;
- txqi->txq.tid = tid;
- txqi->txq.ac = ieee802_1d_to_ac[tid & 7];
- } else {
- sdata->vif.txq = &txqi->txq;
- txqi->txq.tid = 0;
- txqi->txq.ac = IEEE80211_AC_BE;
- }
-}
-
void ieee80211_txq_get_depth(struct ieee80211_txq *txq,
unsigned long *frame_cnt,
unsigned long *byte_cnt)
@@ -3415,9 +3396,9 @@ void ieee80211_txq_get_depth(struct ieee80211_txq *txq,
struct txq_info *txqi = to_txq_info(txq);

if (frame_cnt)
- *frame_cnt = txqi->queue.qlen;
+ *frame_cnt = txqi->tin.backlog_packets;

if (byte_cnt)
- *byte_cnt = txqi->byte_cnt;
+ *byte_cnt = txqi->tin.backlog_bytes;
}
EXPORT_SYMBOL(ieee80211_txq_get_depth);
--
2.1.4


2016-04-14 12:16:29

by Michal Kazior

Subject: [PATCHv3 1/5] mac80211: skip netdev queue control with software queuing

Qdiscs are designed with no regard for 802.11
aggregation requirements and hand out packets
one by one with no guarantee that consecutive
ones are destined for the same tid. This does
more harm than good no matter how fairly a given
qdisc may behave on an ethernet interface.

Software queuing used per-AC netdev subqueue
congestion control whenever a global AC limit was
hit. In practice this meant a single station or
tid queue could rather easily starve others. This
could resonate with qdiscs in a bad way or could
simply end up with poor aggregation performance.
Increasing the AC limit would increase induced
latency, which is also bad.

Disabling qdiscs by default and performing
taildrop instead of netdev subqueue congestion
control on the other hand makes it possible for
tid queues to fill up "in the meantime" while
preventing stations starving each other.

This increases aggregation opportunities and
should allow software-queuing-based drivers to
achieve better performance by utilizing airtime
more efficiently with big aggregates.

Signed-off-by: Michal Kazior <[email protected]>
---
include/net/mac80211.h | 4 ---
net/mac80211/ieee80211_i.h | 2 +-
net/mac80211/iface.c | 18 +++++++++--
net/mac80211/main.c | 3 --
net/mac80211/sta_info.c | 2 +-
net/mac80211/tx.c | 80 ++++++++++++++++++++++++----------------------
net/mac80211/util.c | 11 ++++---
7 files changed, 65 insertions(+), 55 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index a53333cb1528..c24d0b8e4deb 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -2113,9 +2113,6 @@ enum ieee80211_hw_flags {
* @n_cipher_schemes: a size of an array of cipher schemes definitions.
* @cipher_schemes: a pointer to an array of cipher scheme definitions
* supported by HW.
- *
- * @txq_ac_max_pending: maximum number of frames per AC pending in all txq
- * entries for a vif.
*/
struct ieee80211_hw {
struct ieee80211_conf conf;
@@ -2145,7 +2142,6 @@ struct ieee80211_hw {
u8 uapsd_max_sp_len;
u8 n_cipher_schemes;
const struct ieee80211_cipher_scheme *cipher_schemes;
- int txq_ac_max_pending;
};

static inline bool _ieee80211_hw_check(struct ieee80211_hw *hw,
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index c6830fbe7d68..b2570aa66d33 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -855,7 +855,6 @@ struct ieee80211_sub_if_data {
bool control_port_no_encrypt;
int encrypt_headroom;

- atomic_t txqs_len[IEEE80211_NUM_ACS];
struct ieee80211_tx_queue_params tx_conf[IEEE80211_NUM_ACS];
struct mac80211_qos_map __rcu *qos_map;

@@ -1118,6 +1117,7 @@ struct ieee80211_local {
fif_probe_req;
int probe_req_reg;
unsigned int filter_flags; /* FIF_* */
+ atomic_t num_tx_queued;

bool wiphy_ciphers_allocated;

diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 453b4e741780..67c9b1e565ad 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -976,13 +976,13 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,

if (sdata->vif.txq) {
struct txq_info *txqi = to_txq_info(sdata->vif.txq);
+ int n = skb_queue_len(&txqi->queue);

spin_lock_bh(&txqi->queue.lock);
ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
+ atomic_sub(n, &local->num_tx_queued);
txqi->byte_cnt = 0;
spin_unlock_bh(&txqi->queue.lock);
-
- atomic_set(&sdata->txqs_len[txqi->txq.ac], 0);
}

if (local->open_count == 0)
@@ -1198,6 +1198,12 @@ static void ieee80211_if_setup(struct net_device *dev)
dev->destructor = ieee80211_if_free;
}

+static void ieee80211_if_setup_no_queue(struct net_device *dev)
+{
+ ieee80211_if_setup(dev);
+ dev->priv_flags |= IFF_NO_QUEUE;
+}
+
static void ieee80211_iface_work(struct work_struct *work)
{
struct ieee80211_sub_if_data *sdata =
@@ -1707,6 +1713,7 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
struct net_device *ndev = NULL;
struct ieee80211_sub_if_data *sdata = NULL;
struct txq_info *txqi;
+ void (*if_setup)(struct net_device *dev);
int ret, i;
int txqs = 1;

@@ -1734,12 +1741,17 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
txq_size += sizeof(struct txq_info) +
local->hw.txq_data_size;

+ if (local->ops->wake_tx_queue)
+ if_setup = ieee80211_if_setup_no_queue;
+ else
+ if_setup = ieee80211_if_setup;
+
if (local->hw.queues >= IEEE80211_NUM_ACS)
txqs = IEEE80211_NUM_ACS;

ndev = alloc_netdev_mqs(size + txq_size,
name, name_assign_type,
- ieee80211_if_setup, txqs, 1);
+ if_setup, txqs, 1);
if (!ndev)
return -ENOMEM;
dev_net_set(ndev, wiphy_net(local->hw.wiphy));
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 8190bf27ebff..609abc39e454 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -1053,9 +1053,6 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)

local->dynamic_ps_forced_timeout = -1;

- if (!local->hw.txq_ac_max_pending)
- local->hw.txq_ac_max_pending = 64;
-
result = ieee80211_wep_init(local);
if (result < 0)
wiphy_debug(local->hw.wiphy, "Failed to initialize wep: %d\n",
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 00c82fb152c0..4ab97d454bc1 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -115,7 +115,7 @@ static void __cleanup_single_sta(struct sta_info *sta)
int n = skb_queue_len(&txqi->queue);

ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
- atomic_sub(n, &sdata->txqs_len[txqi->txq.ac]);
+ atomic_sub(n, &local->num_tx_queued);
txqi->byte_cnt = 0;
}
}
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 485e30a24b38..8de3d2676397 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1232,67 +1232,56 @@ ieee80211_tx_prepare(struct ieee80211_sub_if_data *sdata,
return TX_CONTINUE;
}

-static void ieee80211_drv_tx(struct ieee80211_local *local,
- struct ieee80211_vif *vif,
- struct ieee80211_sta *pubsta,
- struct sk_buff *skb)
+static struct txq_info *ieee80211_get_txq(struct ieee80211_local *local,
+ struct ieee80211_vif *vif,
+ struct ieee80211_sta *pubsta,
+ struct sk_buff *skb)
{
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
- struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
- struct ieee80211_tx_control control = {
- .sta = pubsta,
- };
- struct ieee80211_txq *txq = NULL;
- struct txq_info *txqi;
- u8 ac;

if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) ||
(info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE))
- goto tx_normal;
+ return NULL;

if (!ieee80211_is_data(hdr->frame_control))
- goto tx_normal;
+ return NULL;

if (pubsta) {
u8 tid = skb->priority & IEEE80211_QOS_CTL_TID_MASK;

- txq = pubsta->txq[tid];
+ return to_txq_info(pubsta->txq[tid]);
} else if (vif) {
- txq = vif->txq;
+ return to_txq_info(vif->txq);
}

- if (!txq)
- goto tx_normal;
+ return NULL;
+}

- ac = txq->ac;
- txqi = to_txq_info(txq);
- atomic_inc(&sdata->txqs_len[ac]);
- if (atomic_read(&sdata->txqs_len[ac]) >= local->hw.txq_ac_max_pending)
- netif_stop_subqueue(sdata->dev, ac);
+static void ieee80211_txq_enqueue(struct ieee80211_local *local,
+ struct txq_info *txqi,
+ struct sk_buff *skb)
+{
+ lockdep_assert_held(&txqi->queue.lock);

- spin_lock_bh(&txqi->queue.lock);
+ if (atomic_read(&local->num_tx_queued) >= TOTAL_MAX_TX_BUFFER ||
+ txqi->queue.qlen >= STA_MAX_TX_BUFFER) {
+ ieee80211_free_txskb(&local->hw, skb);
+ return;
+ }
+
+ atomic_inc(&local->num_tx_queued);
txqi->byte_cnt += skb->len;
__skb_queue_tail(&txqi->queue, skb);
- spin_unlock_bh(&txqi->queue.lock);
-
- drv_wake_tx_queue(local, txqi);
-
- return;
-
-tx_normal:
- drv_tx(local, &control, skb);
}

struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
struct ieee80211_txq *txq)
{
struct ieee80211_local *local = hw_to_local(hw);
- struct ieee80211_sub_if_data *sdata = vif_to_sdata(txq->vif);
struct txq_info *txqi = container_of(txq, struct txq_info, txq);
struct ieee80211_hdr *hdr;
struct sk_buff *skb = NULL;
- u8 ac = txq->ac;

spin_lock_bh(&txqi->queue.lock);

@@ -1303,12 +1292,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
if (!skb)
goto out;

+ atomic_dec(&local->num_tx_queued);
txqi->byte_cnt -= skb->len;

- atomic_dec(&sdata->txqs_len[ac]);
- if (__netif_subqueue_stopped(sdata->dev, ac))
- ieee80211_propagate_queue_wake(local, sdata->vif.hw_queue[ac]);
-
hdr = (struct ieee80211_hdr *)skb->data;
if (txq->sta && ieee80211_is_data_qos(hdr->frame_control)) {
struct sta_info *sta = container_of(txq->sta, struct sta_info,
@@ -1335,7 +1321,9 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
struct sk_buff_head *skbs,
bool txpending)
{
+ struct ieee80211_tx_control control = {};
struct sk_buff *skb, *tmp;
+ struct txq_info *txqi;
unsigned long flags;

skb_queue_walk_safe(skbs, skb, tmp) {
@@ -1350,6 +1338,21 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
}
#endif

+ txqi = ieee80211_get_txq(local, vif, sta, skb);
+ if (txqi) {
+ info->control.vif = vif;
+
+ __skb_unlink(skb, skbs);
+
+ spin_lock_bh(&txqi->queue.lock);
+ ieee80211_txq_enqueue(local, txqi, skb);
+ spin_unlock_bh(&txqi->queue.lock);
+
+ drv_wake_tx_queue(local, txqi);
+
+ continue;
+ }
+
spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
if (local->queue_stop_reasons[q] ||
(!txpending && !skb_queue_empty(&local->pending[q]))) {
@@ -1392,9 +1395,10 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);

info->control.vif = vif;
+ control.sta = sta;

__skb_unlink(skb, skbs);
- ieee80211_drv_tx(local, vif, sta, skb);
+ drv_tx(local, &control, skb);
}

return true;
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 0319d6d4f863..f13b08896238 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -244,6 +244,9 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue)
struct ieee80211_sub_if_data *sdata;
int n_acs = IEEE80211_NUM_ACS;

+ if (local->ops->wake_tx_queue)
+ return;
+
if (local->hw.queues < IEEE80211_NUM_ACS)
n_acs = 1;

@@ -260,11 +263,6 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue)
for (ac = 0; ac < n_acs; ac++) {
int ac_queue = sdata->vif.hw_queue[ac];

- if (local->ops->wake_tx_queue &&
- (atomic_read(&sdata->txqs_len[ac]) >
- local->hw.txq_ac_max_pending))
- continue;
-
if (ac_queue == queue ||
(sdata->vif.cab_queue == queue &&
local->queue_stop_reasons[ac_queue] == 0 &&
@@ -341,6 +339,9 @@ static void __ieee80211_stop_queue(struct ieee80211_hw *hw, int queue,

trace_stop_queue(local, queue, reason);

+ if (local->ops->wake_tx_queue)
+ return;
+
if (WARN_ON(queue >= hw->queues))
return;

--
2.1.4


2016-04-14 12:16:35

by Michal Kazior

Subject: [PATCHv3 5/5] mac80211: add debug knobs for codel

This adds a few debugfs entries to make it easier
to test, debug and experiment.

Signed-off-by: Michal Kazior <[email protected]>
---
net/mac80211/debugfs.c | 14 ++++++++++++++
net/mac80211/ieee80211_i.h | 2 ++
net/mac80211/tx.c | 21 ++++++++++++++-------
3 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 5cbaa5872e6b..9088e505fa85 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -132,6 +132,10 @@ DEBUGFS_READONLY_FILE(fq_overlimit, "%u",
local->fq.overlimit);
DEBUGFS_READONLY_FILE(fq_collisions, "%u",
local->fq.collisions);
+DEBUGFS_READONLY_FILE(codel_drop_count, "%u",
+ local->cdrop_count);
+DEBUGFS_READONLY_FILE(codel_ecn_mark, "%u",
+ local->cecn_mark);

DEBUGFS_RW_FILE(fq_limit,
DEBUGFS_RW_EXPR_FQ("%u", &local->fq.limit),
@@ -139,6 +143,12 @@ DEBUGFS_RW_FILE(fq_limit,
DEBUGFS_RW_FILE(fq_quantum,
DEBUGFS_RW_EXPR_FQ("%u", &local->fq.quantum),
"%u", local->fq.quantum);
+DEBUGFS_RW_FILE(codel_interval,
+ DEBUGFS_RW_EXPR_FQ("%llu", &local->cparams.interval),
+ "%llu", local->cparams.interval);
+DEBUGFS_RW_FILE(codel_target,
+ DEBUGFS_RW_EXPR_FQ("%llu", &local->cparams.target),
+ "%llu", local->cparams.target);

#ifdef CONFIG_PM
static ssize_t reset_write(struct file *file, const char __user *user_buf,
@@ -330,6 +340,10 @@ void debugfs_hw_add(struct ieee80211_local *local)
DEBUGFS_ADD(fq_collisions);
DEBUGFS_ADD(fq_limit);
DEBUGFS_ADD(fq_quantum);
+ DEBUGFS_ADD(codel_interval);
+ DEBUGFS_ADD(codel_target);
+ DEBUGFS_ADD(codel_drop_count);
+ DEBUGFS_ADD(codel_ecn_mark);

statsd = debugfs_create_dir("statistics", phyd);

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 78953b495a25..7aecb7b6528c 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -1111,6 +1111,8 @@ struct ieee80211_local {
struct fq fq;
struct codel_vars *cvars;
struct codel_params cparams;
+ unsigned int cdrop_count;
+ unsigned int cecn_mark;

const struct ieee80211_ops *ops;

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 238cb8e979fd..b5506411b8e6 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1314,6 +1314,7 @@ static void codel_drop_fn(void *ctx,
local = vif_to_sdata(txqi->txq.vif)->local;
hw = &local->hw;

+ local->cdrop_count++;
ieee80211_free_txskb(hw, skb);
}

@@ -1325,6 +1326,8 @@ static struct sk_buff *fq_tin_dequeue_fn(struct fq *fq,
struct txq_info *txqi;
struct codel_vars *cvars;
struct codel_params *cparams;
+ struct sk_buff *skb;
+ u16 ecn_mark;
bool overloaded;

local = container_of(fq, struct ieee80211_local, fq);
@@ -1339,13 +1342,17 @@ static struct sk_buff *fq_tin_dequeue_fn(struct fq *fq,
/* TODO */
overloaded = false;

- return codel_dequeue(txqi,
- &flow->backlog,
- 0,
- cvars,
- cparams,
- codel_get_time(),
- overloaded);
+ ecn_mark = cvars->ecn_mark;
+ skb = codel_dequeue(txqi,
+ &flow->backlog,
+ 0,
+ cvars,
+ cparams,
+ codel_get_time(),
+ overloaded);
+ local->cecn_mark += cvars->ecn_mark - ecn_mark;
+
+ return skb;
}

static void fq_skb_free_fn(struct fq *fq,
--
2.1.4


2016-05-06 06:33:09

by Michal Kazior

Subject: Re: [PATCHv4 5/5] mac80211: add debug knobs for codel

On 6 May 2016 at 07:51, Dave Taht <[email protected]> wrote:
> On Thu, May 5, 2016 at 10:27 PM, Michal Kazior <[email protected]> wrote:
>> On 5 May 2016 at 17:21, Dave Taht <[email protected]> wrote:
>>> On Thu, May 5, 2016 at 4:00 AM, Michal Kazior <[email protected]> wrote:
>>>> This adds a few debugfs entries to make it easier
>>>> to test, debug and experiment.
>>>
>>> I might argue in favor of moving all these (inc the fq ones) into
>>> their own dir, maybe "aqm" or "sqm".
>>>
>>> The mixture of read only stats and configuration vars is a bit confusing.
>>>
>>> Also in my testing of the previous patch, actually seeing the stats
>>> get updated seemed to be highly async or inaccurate. For example, it
>>> was obvious from the captures themselves that codel_ce_mark-ing was
>>> happening, but the actual numbers were out of whack with the marks
>>> seen or the fq_backlog seen. (I can go back and revisit this)
>>
>> That's kind of expected since all of these bits are exposed as
>> separate debugfs entries/files. To avoid that it'd be necessary to
>> provide a single debugfs entry/file whose contents are generated on
>> open() while holding local->fq.lock. But then you could argue it
>> should contain all per-sta-tid info (backlog, flows, drops) as well
>> instead of having them in netdev*/stations/*/txqs.
>> Hmm..
>
> I have not had time to write up todays results to any full extent, but
> they were pretty spectacular.
>
> I have a comparison of the baseline ath10k driver vs your 3.5 patchset
> here on the second plot:
>
> http://blog.cerowrt.org/post/predictive_codeling/
>
> The raw data is here:
> https://github.com/dtaht/blog-cerowrt/tree/master/content/flent/qca-10.2-fqmac35-codel-5

It's probably good to explicitly mention that you test(ed) ath10k with
my RFC DQL patch applied. Without it the fqcodel benefits are a lot
less significant.

(oh, and the "3.5" is pre-PATCHv4 before fq/codel split work:
https://github.com/kazikcz/linux/tree/fqmac-v3.5 )


>
> ...
>
> a note: quantum of the mtu (typically 1514) is a saner default than 300,
>
> (the older patch I had set it to 300; dunno what your default is now).

I still use 300.


> and quantum 1514, codel target 5ms rather than 20ms for this test
> series was *just fine* (but more testing of the lower target is
> needed)

I would keep 20ms for now until we get more test data. I'm mostly
concerned about MU performance on ath10k, which requires a significant
amount of buffering.


> However:
>
> quantum "300" only makes sense for very, very low bandwidths (say <
> 6mbits), in other scenarios it just eats extra cpu (5 passes through
> the loop to send a big packet) and disables
> the "new/old" queue feature which helps "push" new flows to flow
> balance. I'd default it to the larger value.

Perhaps in the future this could be dynamically adjusted to match the
slowest station known to rate control? Oh, and there's
multicast..


Michał

2016-05-05 15:21:40

by Dave Taht

Subject: Re: [PATCHv4 5/5] mac80211: add debug knobs for codel

On Thu, May 5, 2016 at 4:00 AM, Michal Kazior <[email protected]> wrote:
> This adds a few debugfs entries to make it easier
> to test, debug and experiment.

I might argue in favor of moving all these (inc the fq ones) into
their own dir, maybe "aqm" or "sqm".

The mixture of read only stats and configuration vars is a bit confusing.

Also in my testing of the previous patch, actually seeing the stats
get updated seemed to be highly async or inaccurate. For example, it
was obvious from the captures themselves that codel_ce_mark-ing was
happening, but the actual numbers were out of whack with the marks
seen or the fq_backlog seen. (I can go back and revisit this)

>
> Signed-off-by: Michal Kazior <[email protected]>
> ---
>
> Notes:
> v4:
> * stats adjustments (in-kernel codel has more of them)
>
> net/mac80211/debugfs.c | 40 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 40 insertions(+)
>
> diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
> index 43592b6f79f0..c7cfedc61fc4 100644
> --- a/net/mac80211/debugfs.c
> +++ b/net/mac80211/debugfs.c
> @@ -124,6 +124,15 @@ static const struct file_operations name## _ops = { \
> res; \
> })
>
> +#define DEBUGFS_RW_BOOL(arg) \
> +({ \
> + int res; \
> + int val; \
> + res = mac80211_parse_buffer(userbuf, count, ppos, "%d", &val); \
> + arg = !!(val); \
> + res; \
> +})
> +
> DEBUGFS_READONLY_FILE(fq_flows_cnt, "%u",
> local->fq.flows_cnt);
> DEBUGFS_READONLY_FILE(fq_backlog, "%u",
> @@ -132,6 +141,16 @@ DEBUGFS_READONLY_FILE(fq_overlimit, "%u",
> local->fq.overlimit);
> DEBUGFS_READONLY_FILE(fq_collisions, "%u",
> local->fq.collisions);
> +DEBUGFS_READONLY_FILE(codel_maxpacket, "%u",
> + local->cstats.maxpacket);
> +DEBUGFS_READONLY_FILE(codel_drop_count, "%u",
> + local->cstats.drop_count);
> +DEBUGFS_READONLY_FILE(codel_drop_len, "%u",
> + local->cstats.drop_len);
> +DEBUGFS_READONLY_FILE(codel_ecn_mark, "%u",
> + local->cstats.ecn_mark);
> +DEBUGFS_READONLY_FILE(codel_ce_mark, "%u",
> + local->cstats.ce_mark);
>
> DEBUGFS_RW_FILE(fq_limit,
> DEBUGFS_RW_EXPR_FQ("%u", &local->fq.limit),
> @@ -139,6 +158,18 @@ DEBUGFS_RW_FILE(fq_limit,
> DEBUGFS_RW_FILE(fq_quantum,
> DEBUGFS_RW_EXPR_FQ("%u", &local->fq.quantum),
> "%u", local->fq.quantum);
> +DEBUGFS_RW_FILE(codel_interval,
> + DEBUGFS_RW_EXPR_FQ("%u", &local->cparams.interval),
> + "%u", local->cparams.interval);
> +DEBUGFS_RW_FILE(codel_target,
> + DEBUGFS_RW_EXPR_FQ("%u", &local->cparams.target),
> + "%u", local->cparams.target);
> +DEBUGFS_RW_FILE(codel_mtu,
> + DEBUGFS_RW_EXPR_FQ("%u", &local->cparams.mtu),
> + "%u", local->cparams.mtu);
> +DEBUGFS_RW_FILE(codel_ecn,
> + DEBUGFS_RW_BOOL(local->cparams.ecn),
> + "%d", local->cparams.ecn ? 1 : 0);
>
> #ifdef CONFIG_PM
> static ssize_t reset_write(struct file *file, const char __user *user_buf,
> @@ -333,6 +364,15 @@ void debugfs_hw_add(struct ieee80211_local *local)
> DEBUGFS_ADD(fq_collisions);
> DEBUGFS_ADD(fq_limit);
> DEBUGFS_ADD(fq_quantum);
> + DEBUGFS_ADD(codel_maxpacket);
> + DEBUGFS_ADD(codel_drop_count);
> + DEBUGFS_ADD(codel_drop_len);
> + DEBUGFS_ADD(codel_ecn_mark);
> + DEBUGFS_ADD(codel_ce_mark);
> + DEBUGFS_ADD(codel_interval);
> + DEBUGFS_ADD(codel_target);
> + DEBUGFS_ADD(codel_mtu);
> + DEBUGFS_ADD(codel_ecn);
>
> statsd = debugfs_create_dir("statistics", phyd);
>
> --
> 2.1.4
>



--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

2016-05-05 10:58:51

by Michal Kazior

Subject: [PATCHv4 1/5] mac80211: skip netdev queue control with software queuing

Qdiscs are designed with no regard for 802.11
aggregation requirements and hand out packets
one by one with no guarantee that consecutive
ones are destined for the same tid. This does
more harm than good no matter how fairly a given
qdisc may behave on an ethernet interface.

Software queuing used per-AC netdev subqueue
congestion control whenever a global AC limit was
hit. In practice this meant a single station or
tid queue could rather easily starve others. This
could resonate with qdiscs in a bad way or could
simply end up with poor aggregation performance.
Increasing the AC limit would increase induced
latency, which is also bad.

Disabling qdiscs by default and performing
taildrop instead of netdev subqueue congestion
control on the other hand makes it possible for
tid queues to fill up "in the meantime" while
preventing stations starving each other.

This increases aggregation opportunities and
should allow software-queuing-based drivers to
achieve better performance by utilizing airtime
more efficiently with big aggregates.

Signed-off-by: Michal Kazior <[email protected]>
---

Notes:
v4:
* make queue depth limit per interface instead of
per radio [Johannes]

include/net/mac80211.h | 4 ---
net/mac80211/ieee80211_i.h | 2 +-
net/mac80211/iface.c | 18 ++++++++--
net/mac80211/main.c | 3 --
net/mac80211/sta_info.c | 2 +-
net/mac80211/tx.c | 82 +++++++++++++++++++++++++---------------------
net/mac80211/util.c | 11 ++++---
7 files changed, 67 insertions(+), 55 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index 07ef9378df2b..ffb90dfe0d70 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -2143,9 +2143,6 @@ enum ieee80211_hw_flags {
* @n_cipher_schemes: a size of an array of cipher schemes definitions.
* @cipher_schemes: a pointer to an array of cipher scheme definitions
* supported by HW.
- *
- * @txq_ac_max_pending: maximum number of frames per AC pending in all txq
- * entries for a vif.
*/
struct ieee80211_hw {
struct ieee80211_conf conf;
@@ -2176,7 +2173,6 @@ struct ieee80211_hw {
u8 uapsd_max_sp_len;
u8 n_cipher_schemes;
const struct ieee80211_cipher_scheme *cipher_schemes;
- int txq_ac_max_pending;
};

static inline bool _ieee80211_hw_check(struct ieee80211_hw *hw,
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 9438c9406687..634603320374 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -856,7 +856,7 @@ struct ieee80211_sub_if_data {
bool control_port_no_encrypt;
int encrypt_headroom;

- atomic_t txqs_len[IEEE80211_NUM_ACS];
+ atomic_t num_tx_queued;
struct ieee80211_tx_queue_params tx_conf[IEEE80211_NUM_ACS];
struct mac80211_qos_map __rcu *qos_map;

diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index c59af3eb9fa4..609c5174d798 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -976,13 +976,13 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,

if (sdata->vif.txq) {
struct txq_info *txqi = to_txq_info(sdata->vif.txq);
+ int n = skb_queue_len(&txqi->queue);

spin_lock_bh(&txqi->queue.lock);
ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
+ atomic_sub(n, &sdata->num_tx_queued);
txqi->byte_cnt = 0;
spin_unlock_bh(&txqi->queue.lock);
-
- atomic_set(&sdata->txqs_len[txqi->txq.ac], 0);
}

if (local->open_count == 0)
@@ -1198,6 +1198,12 @@ static void ieee80211_if_setup(struct net_device *dev)
dev->destructor = ieee80211_if_free;
}

+static void ieee80211_if_setup_no_queue(struct net_device *dev)
+{
+ ieee80211_if_setup(dev);
+ dev->priv_flags |= IFF_NO_QUEUE;
+}
+
static void ieee80211_iface_work(struct work_struct *work)
{
struct ieee80211_sub_if_data *sdata =
@@ -1707,6 +1713,7 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
struct net_device *ndev = NULL;
struct ieee80211_sub_if_data *sdata = NULL;
struct txq_info *txqi;
+ void (*if_setup)(struct net_device *dev);
int ret, i;
int txqs = 1;

@@ -1734,12 +1741,17 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
txq_size += sizeof(struct txq_info) +
local->hw.txq_data_size;

+ if (local->ops->wake_tx_queue)
+ if_setup = ieee80211_if_setup_no_queue;
+ else
+ if_setup = ieee80211_if_setup;
+
if (local->hw.queues >= IEEE80211_NUM_ACS)
txqs = IEEE80211_NUM_ACS;

ndev = alloc_netdev_mqs(size + txq_size,
name, name_assign_type,
- ieee80211_if_setup, txqs, 1);
+ if_setup, txqs, 1);
if (!ndev)
return -ENOMEM;
dev_net_set(ndev, wiphy_net(local->hw.wiphy));
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 7ee91d6151d1..160ac6b8b9a1 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -1055,9 +1055,6 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)

local->dynamic_ps_forced_timeout = -1;

- if (!local->hw.txq_ac_max_pending)
- local->hw.txq_ac_max_pending = 64;
-
result = ieee80211_wep_init(local);
if (result < 0)
wiphy_debug(local->hw.wiphy, "Failed to initialize wep: %d\n",
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 5ccfdbd406bd..177cc6cd6416 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -116,7 +116,7 @@ static void __cleanup_single_sta(struct sta_info *sta)
int n = skb_queue_len(&txqi->queue);

ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
- atomic_sub(n, &sdata->txqs_len[txqi->txq.ac]);
+ atomic_sub(n, &sdata->num_tx_queued);
txqi->byte_cnt = 0;
}
}
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 203044379ce0..792f01721d65 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1236,67 +1236,58 @@ ieee80211_tx_prepare(struct ieee80211_sub_if_data *sdata,
return TX_CONTINUE;
}

-static void ieee80211_drv_tx(struct ieee80211_local *local,
- struct ieee80211_vif *vif,
- struct ieee80211_sta *pubsta,
- struct sk_buff *skb)
+static struct txq_info *ieee80211_get_txq(struct ieee80211_local *local,
+ struct ieee80211_vif *vif,
+ struct ieee80211_sta *pubsta,
+ struct sk_buff *skb)
{
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
- struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
- struct ieee80211_tx_control control = {
- .sta = pubsta,
- };
- struct ieee80211_txq *txq = NULL;
- struct txq_info *txqi;
- u8 ac;

if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) ||
(info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE))
- goto tx_normal;
+ return NULL;

if (!ieee80211_is_data(hdr->frame_control))
- goto tx_normal;
+ return NULL;

if (pubsta) {
u8 tid = skb->priority & IEEE80211_QOS_CTL_TID_MASK;

- txq = pubsta->txq[tid];
+ return to_txq_info(pubsta->txq[tid]);
} else if (vif) {
- txq = vif->txq;
+ return to_txq_info(vif->txq);
}

- if (!txq)
- goto tx_normal;
+ return NULL;
+}

- ac = txq->ac;
- txqi = to_txq_info(txq);
- atomic_inc(&sdata->txqs_len[ac]);
- if (atomic_read(&sdata->txqs_len[ac]) >= local->hw.txq_ac_max_pending)
- netif_stop_subqueue(sdata->dev, ac);
+static void ieee80211_txq_enqueue(struct ieee80211_local *local,
+ struct txq_info *txqi,
+ struct sk_buff *skb)
+{
+ struct ieee80211_sub_if_data *sdata = vif_to_sdata(txqi->txq.vif);

- spin_lock_bh(&txqi->queue.lock);
+ lockdep_assert_held(&txqi->queue.lock);
+
+ if (atomic_read(&sdata->num_tx_queued) >= TOTAL_MAX_TX_BUFFER ||
+ txqi->queue.qlen >= STA_MAX_TX_BUFFER) {
+ ieee80211_free_txskb(&local->hw, skb);
+ return;
+ }
+
+ atomic_inc(&sdata->num_tx_queued);
txqi->byte_cnt += skb->len;
__skb_queue_tail(&txqi->queue, skb);
- spin_unlock_bh(&txqi->queue.lock);
-
- drv_wake_tx_queue(local, txqi);
-
- return;
-
-tx_normal:
- drv_tx(local, &control, skb);
}

struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
struct ieee80211_txq *txq)
{
- struct ieee80211_local *local = hw_to_local(hw);
struct ieee80211_sub_if_data *sdata = vif_to_sdata(txq->vif);
struct txq_info *txqi = container_of(txq, struct txq_info, txq);
struct ieee80211_hdr *hdr;
struct sk_buff *skb = NULL;
- u8 ac = txq->ac;

spin_lock_bh(&txqi->queue.lock);

@@ -1307,12 +1298,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
if (!skb)
goto out;

+ atomic_dec(&sdata->num_tx_queued);
txqi->byte_cnt -= skb->len;

- atomic_dec(&sdata->txqs_len[ac]);
- if (__netif_subqueue_stopped(sdata->dev, ac))
- ieee80211_propagate_queue_wake(local, sdata->vif.hw_queue[ac]);
-
hdr = (struct ieee80211_hdr *)skb->data;
if (txq->sta && ieee80211_is_data_qos(hdr->frame_control)) {
struct sta_info *sta = container_of(txq->sta, struct sta_info,
@@ -1343,7 +1331,9 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
struct sk_buff_head *skbs,
bool txpending)
{
+ struct ieee80211_tx_control control = {};
struct sk_buff *skb, *tmp;
+ struct txq_info *txqi;
unsigned long flags;

skb_queue_walk_safe(skbs, skb, tmp) {
@@ -1358,6 +1348,21 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
}
#endif

+ txqi = ieee80211_get_txq(local, vif, sta, skb);
+ if (txqi) {
+ info->control.vif = vif;
+
+ __skb_unlink(skb, skbs);
+
+ spin_lock_bh(&txqi->queue.lock);
+ ieee80211_txq_enqueue(local, txqi, skb);
+ spin_unlock_bh(&txqi->queue.lock);
+
+ drv_wake_tx_queue(local, txqi);
+
+ continue;
+ }
+
spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
if (local->queue_stop_reasons[q] ||
(!txpending && !skb_queue_empty(&local->pending[q]))) {
@@ -1400,9 +1405,10 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);

info->control.vif = vif;
+ control.sta = sta;

__skb_unlink(skb, skbs);
- ieee80211_drv_tx(local, vif, sta, skb);
+ drv_tx(local, &control, skb);
}

return true;
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 905003f75c4d..8903285337da 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -244,6 +244,9 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue)
struct ieee80211_sub_if_data *sdata;
int n_acs = IEEE80211_NUM_ACS;

+ if (local->ops->wake_tx_queue)
+ return;
+
if (local->hw.queues < IEEE80211_NUM_ACS)
n_acs = 1;

@@ -260,11 +263,6 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue)
for (ac = 0; ac < n_acs; ac++) {
int ac_queue = sdata->vif.hw_queue[ac];

- if (local->ops->wake_tx_queue &&
- (atomic_read(&sdata->txqs_len[ac]) >
- local->hw.txq_ac_max_pending))
- continue;
-
if (ac_queue == queue ||
(sdata->vif.cab_queue == queue &&
local->queue_stop_reasons[ac_queue] == 0 &&
@@ -341,6 +339,9 @@ static void __ieee80211_stop_queue(struct ieee80211_hw *hw, int queue,

trace_stop_queue(local, queue, reason);

+ if (local->ops->wake_tx_queue)
+ return;
+
if (WARN_ON(queue >= hw->queues))
return;

--
2.1.4
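The two-level taildrop in ieee80211_txq_enqueue() above can be mirrored in a
userspace sketch. The constants below are illustrative stand-ins, not
necessarily the kernel's actual values from sta_info.h:

```c
#include <assert.h>

/* Illustrative stand-ins for the kernel's limits. */
#define TOTAL_MAX_TX_BUFFER 128	/* per-interface cap (sdata->num_tx_queued) */
#define STA_MAX_TX_BUFFER   64	/* per-txq cap */

struct txq { int qlen; };

static int num_tx_queued;	/* models the per-interface atomic counter */

/* Returns 1 if the frame is queued, 0 if tail-dropped
 * (ieee80211_free_txskb() in the real code). */
static int txq_enqueue(struct txq *q)
{
	if (num_tx_queued >= TOTAL_MAX_TX_BUFFER ||
	    q->qlen >= STA_MAX_TX_BUFFER)
		return 0;
	num_tx_queued++;
	q->qlen++;
	return 1;
}
```

A queue that hits its own cap drops only its own frames; the interface-wide
cap only kicks in once the sum across all queues reaches it, so one busy tid
cannot stop other queues from accepting traffic.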


2016-05-19 08:36:06

by Michal Kazior

Subject: [PATCHv5 3/5] mac80211: add debug knobs for fair queuing

This adds a debugfs entry to read and modify some
fq parameters and introduces a module parameter to
control the number of flows mac80211 should
maintain.

This makes it easy to debug, test and experiment.
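As a rough illustration of the command format the aqm write handler accepts,
here is a userspace sketch of its sscanf-based parsing (not the kernel code
itself, which also deals with copy_from_user and buffer termination):

```c
#include <stdio.h>

struct fq_params {
	unsigned int limit;
	unsigned int quantum;
};

/* Parse one "name value" command the way aqm_write() does;
 * returns 0 on success, -1 (-EINVAL in the kernel) otherwise. */
static int aqm_parse(struct fq_params *fq, const char *buf)
{
	if (sscanf(buf, "fq_limit %u", &fq->limit) == 1)
		return 0;
	if (sscanf(buf, "fq_quantum %u", &fq->quantum) == 1)
		return 0;
	return -1;
}
```

Only the two parameters marked RW in the read output are writable; any other
name is rejected.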

Signed-off-by: Michal Kazior <[email protected]>
---

Notes:
v5:
* expose a single "aqm" debugfs knob to maintain
coherent stat values [Dave]

net/mac80211/debugfs.c | 173 +++++++++++++++++++++++++++++++++++++++++++++++++
net/mac80211/tx.c | 8 ++-
2 files changed, 180 insertions(+), 1 deletion(-)

diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index b251b2f7f8dd..2906c1004e1a 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -10,6 +10,7 @@

#include <linux/debugfs.h>
#include <linux/rtnetlink.h>
+#include <linux/vmalloc.h>
#include "ieee80211_i.h"
#include "driver-ops.h"
#include "rate.h"
@@ -70,6 +71,177 @@ DEBUGFS_READONLY_FILE(wep_iv, "%#08x",
DEBUGFS_READONLY_FILE(rate_ctrl_alg, "%s",
local->rate_ctrl ? local->rate_ctrl->ops->name : "hw/driver");

+struct aqm_info {
+ struct ieee80211_local *local;
+ size_t size;
+ size_t len;
+ unsigned char buf[0];
+};
+
+#define AQM_HDR_LEN 200
+#define AQM_HW_ENTRY_LEN 40
+#define AQM_TXQ_ENTRY_LEN 110
+
+static int aqm_open(struct inode *inode, struct file *file)
+{
+ struct ieee80211_local *local = inode->i_private;
+ struct ieee80211_sub_if_data *sdata;
+ struct sta_info *sta;
+ struct txq_info *txqi;
+ struct fq *fq = &local->fq;
+ struct aqm_info *info = NULL;
+ int len = 0;
+ int i;
+
+ if (!local->ops->wake_tx_queue)
+ return -EOPNOTSUPP;
+
+ len += AQM_HDR_LEN;
+ len += 6 * AQM_HW_ENTRY_LEN;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(sdata, &local->interfaces, list)
+ len += AQM_TXQ_ENTRY_LEN;
+ list_for_each_entry_rcu(sta, &local->sta_list, list)
+ len += AQM_TXQ_ENTRY_LEN * ARRAY_SIZE(sta->sta.txq);
+ rcu_read_unlock();
+
+ info = vmalloc(len);
+ if (!info)
+ return -ENOMEM;
+
+ spin_lock_bh(&local->fq.lock);
+ rcu_read_lock();
+
+ file->private_data = info;
+ info->local = local;
+ info->size = len;
+ len = 0;
+
+ len += scnprintf(info->buf + len, info->size - len,
+ "* hw\n"
+ "access name value\n"
+ "R fq_flows_cnt %u\n"
+ "R fq_backlog %u\n"
+ "R fq_overlimit %u\n"
+ "R fq_collisions %u\n"
+ "RW fq_limit %u\n"
+ "RW fq_quantum %u\n",
+ fq->flows_cnt,
+ fq->backlog,
+ fq->overlimit,
+ fq->collisions,
+ fq->limit,
+ fq->quantum);
+
+ len += scnprintf(info->buf + len,
+ info->size - len,
+ "* vif\n"
+ "ifname addr ac backlog-bytes backlog-packets flows overlimit collisions tx-bytes tx-packets\n");
+
+ list_for_each_entry_rcu(sdata, &local->interfaces, list) {
+ txqi = to_txq_info(sdata->vif.txq);
+ len += scnprintf(info->buf + len, info->size - len,
+ "%s %pM %u %u %u %u %u %u %u %u\n",
+ sdata->name,
+ sdata->vif.addr,
+ txqi->txq.ac,
+ txqi->tin.backlog_bytes,
+ txqi->tin.backlog_packets,
+ txqi->tin.flows,
+ txqi->tin.overlimit,
+ txqi->tin.collisions,
+ txqi->tin.tx_bytes,
+ txqi->tin.tx_packets);
+ }
+
+ len += scnprintf(info->buf + len,
+ info->size - len,
+ "* sta\n"
+ "ifname addr tid ac backlog-bytes backlog-packets flows overlimit collisions tx-bytes tx-packets\n");
+
+ list_for_each_entry_rcu(sta, &local->sta_list, list) {
+ sdata = sta->sdata;
+ for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
+ txqi = to_txq_info(sta->sta.txq[i]);
+ len += scnprintf(info->buf + len, info->size - len,
+ "%s %pM %d %d %u %u %u %u %u %u %u\n",
+ sdata->name,
+ sta->sta.addr,
+ txqi->txq.tid,
+ txqi->txq.ac,
+ txqi->tin.backlog_bytes,
+ txqi->tin.backlog_packets,
+ txqi->tin.flows,
+ txqi->tin.overlimit,
+ txqi->tin.collisions,
+ txqi->tin.tx_bytes,
+ txqi->tin.tx_packets);
+ }
+ }
+
+ info->len = len;
+
+ rcu_read_unlock();
+ spin_unlock_bh(&local->fq.lock);
+
+ return 0;
+}
+
+static int aqm_release(struct inode *inode, struct file *file)
+{
+ vfree(file->private_data);
+ return 0;
+}
+
+static ssize_t aqm_read(struct file *file,
+ char __user *user_buf,
+ size_t count,
+ loff_t *ppos)
+{
+ struct aqm_info *info = file->private_data;
+
+ return simple_read_from_buffer(user_buf, count, ppos,
+ info->buf, info->len);
+}
+
+static ssize_t aqm_write(struct file *file,
+ const char __user *user_buf,
+ size_t count,
+ loff_t *ppos)
+{
+ struct aqm_info *info = file->private_data;
+ struct ieee80211_local *local = info->local;
+ char buf[100];
+ size_t len;
+
+ if (count > sizeof(buf))
+ return -EINVAL;
+
+ if (copy_from_user(buf, user_buf, count))
+ return -EFAULT;
+
+ buf[sizeof(buf) - 1] = '\0';
+ len = strlen(buf);
+ if (len > 0 && buf[len-1] == '\n')
+ buf[len-1] = 0;
+
+ if (sscanf(buf, "fq_limit %u", &local->fq.limit) == 1)
+ return count;
+ else if (sscanf(buf, "fq_quantum %u", &local->fq.quantum) == 1)
+ return count;
+
+ return -EINVAL;
+}
+
+static const struct file_operations aqm_ops = {
+ .write = aqm_write,
+ .read = aqm_read,
+ .open = aqm_open,
+ .release = aqm_release,
+ .llseek = default_llseek,
+};
+
#ifdef CONFIG_PM
static ssize_t reset_write(struct file *file, const char __user *user_buf,
size_t count, loff_t *ppos)
@@ -256,6 +428,7 @@ void debugfs_hw_add(struct ieee80211_local *local)
DEBUGFS_ADD(hwflags);
DEBUGFS_ADD(user_power);
DEBUGFS_ADD(power);
+ DEBUGFS_ADD_MODE(aqm, 0600);

statsd = debugfs_create_dir("statistics", phyd);

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 1d8343fca6d4..2b60b10e6990 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -17,6 +17,7 @@
#include <linux/slab.h>
#include <linux/skbuff.h>
#include <linux/etherdevice.h>
+#include <linux/moduleparam.h>
#include <linux/bitmap.h>
#include <linux/rcupdate.h>
#include <linux/export.h>
@@ -36,6 +37,11 @@
#include "wme.h"
#include "rate.h"

+static unsigned int fq_flows_cnt = 4096;
+module_param(fq_flows_cnt, uint, 0644);
+MODULE_PARM_DESC(fq_flows_cnt,
+ "Maximum number of txq fair queuing flows. ");
+
/* misc utils */

static inline void ieee80211_tx_stats(struct net_device *dev, u32 len)
@@ -1346,7 +1352,7 @@ int ieee80211_txq_setup_flows(struct ieee80211_local *local)
if (!local->ops->wake_tx_queue)
return 0;

- ret = fq_init(fq, 4096);
+ ret = fq_init(fq, max_t(u32, fq_flows_cnt, 1));
if (ret)
return ret;

--
2.1.4


2016-05-19 08:36:05

by Michal Kazior

Subject: [PATCHv5 2/5] mac80211: implement fair queueing per txq

mac80211's software queues were designed to work
very closely with device tx queues. They are
required to make use of 802.11 packet aggregation
easily and efficiently.

Due to the way 802.11 aggregation is designed it
only makes sense to keep fair queuing as close to
the hardware as possible to reduce induced latency
and inertia and to provide the best flow
responsiveness.

This change doesn't translate directly to
immediate and significant gains. The end result
depends on the driver's induced latency. Best
results can be achieved if the driver keeps its
own tx queue/fifo fill level to a minimum.
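The fairness itself comes from the deficit round-robin scheme in the shared
fq code. A minimal userspace sketch of the idea follows; the flow layout and
list handling are greatly simplified relative to net/fq_impl.h:

```c
#include <assert.h>

#define NFLOWS 2

struct flow {
	int deficit;	/* byte credit, replenished by `quantum` per round */
	int pkt_len;	/* all packets in a sketch flow are equal-sized */
	int npkts;
};

/* Dequeue one packet using deficit round-robin; returns the index of
 * the flow serviced, or -1 if every flow is empty. */
static int drr_dequeue(struct flow *flows, int quantum)
{
	int i, backlog = 0;

	for (i = 0; i < NFLOWS; i++)
		backlog += flows[i].npkts;
	if (!backlog)
		return -1;

	for (;;) {
		for (i = 0; i < NFLOWS; i++) {
			struct flow *f = &flows[i];

			if (!f->npkts)
				continue;
			if (f->deficit <= 0) {
				f->deficit += quantum;
				continue;
			}
			f->deficit -= f->pkt_len;
			f->npkts--;
			return i;
		}
	}
}
```

Over time each backlogged flow dequeues roughly `quantum` bytes per round
regardless of its packet sizes, which is what keeps a bulk flow from starving
a sparse one.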

Signed-off-by: Michal Kazior <[email protected]>
---

Notes:
v4:
* removed internal fq.h and re-used in-kernel one

net/mac80211/agg-tx.c | 8 ++-
net/mac80211/ieee80211_i.h | 24 ++++++--
net/mac80211/iface.c | 12 ++--
net/mac80211/main.c | 7 +++
net/mac80211/rx.c | 2 +-
net/mac80211/sta_info.c | 14 ++---
net/mac80211/tx.c | 136 ++++++++++++++++++++++++++++++++++++++-------
net/mac80211/util.c | 23 +-------
8 files changed, 162 insertions(+), 64 deletions(-)

diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
index 42fa81031dfa..5650c46bf91a 100644
--- a/net/mac80211/agg-tx.c
+++ b/net/mac80211/agg-tx.c
@@ -194,17 +194,21 @@ static void
ieee80211_agg_stop_txq(struct sta_info *sta, int tid)
{
struct ieee80211_txq *txq = sta->sta.txq[tid];
+ struct ieee80211_sub_if_data *sdata;
+ struct fq *fq;
struct txq_info *txqi;

if (!txq)
return;

txqi = to_txq_info(txq);
+ sdata = vif_to_sdata(txq->vif);
+ fq = &sdata->local->fq;

/* Lock here to protect against further seqno updates on dequeue */
- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);
set_bit(IEEE80211_TXQ_STOP, &txqi->flags);
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);
}

static void
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 634603320374..6f8375f1df88 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -30,6 +30,7 @@
#include <net/ieee80211_radiotap.h>
#include <net/cfg80211.h>
#include <net/mac80211.h>
+#include <net/fq.h>
#include "key.h"
#include "sta_info.h"
#include "debug.h"
@@ -805,10 +806,17 @@ enum txq_info_flags {
IEEE80211_TXQ_NO_AMSDU,
};

+/**
+ * struct txq_info - per tid queue
+ *
+ * @tin: contains packets split into multiple flows
+ * @def_flow: used as a fallback flow when a packet destined to @tin hashes to
+ * a fq_flow which is already owned by a different tin
+ */
struct txq_info {
- struct sk_buff_head queue;
+ struct fq_tin tin;
+ struct fq_flow def_flow;
unsigned long flags;
- unsigned long byte_cnt;

/* keep last! */
struct ieee80211_txq txq;
@@ -1099,6 +1107,8 @@ struct ieee80211_local {
* it first anyway so they become a no-op */
struct ieee80211_hw hw;

+ struct fq fq;
+
const struct ieee80211_ops *ops;

/*
@@ -1931,9 +1941,13 @@ static inline bool ieee80211_can_run_worker(struct ieee80211_local *local)
return true;
}

-void ieee80211_init_tx_queue(struct ieee80211_sub_if_data *sdata,
- struct sta_info *sta,
- struct txq_info *txq, int tid);
+int ieee80211_txq_setup_flows(struct ieee80211_local *local);
+void ieee80211_txq_teardown_flows(struct ieee80211_local *local);
+void ieee80211_txq_init(struct ieee80211_sub_if_data *sdata,
+ struct sta_info *sta,
+ struct txq_info *txq, int tid);
+void ieee80211_txq_purge(struct ieee80211_local *local,
+ struct txq_info *txqi);
void ieee80211_send_auth(struct ieee80211_sub_if_data *sdata,
u16 transaction, u16 auth_alg, u16 status,
const u8 *extra, size_t extra_len, const u8 *bssid,
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 609c5174d798..b123a9e325b3 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -779,6 +779,7 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
bool going_down)
{
struct ieee80211_local *local = sdata->local;
+ struct fq *fq = &local->fq;
unsigned long flags;
struct sk_buff *skb, *tmp;
u32 hw_reconf_flags = 0;
@@ -976,13 +977,10 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,

if (sdata->vif.txq) {
struct txq_info *txqi = to_txq_info(sdata->vif.txq);
- int n = skb_queue_len(&txqi->queue);

- spin_lock_bh(&txqi->queue.lock);
- ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
- atomic_sub(n, &sdata->num_tx_queued);
- txqi->byte_cnt = 0;
- spin_unlock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);
+ ieee80211_txq_purge(local, txqi);
+ spin_unlock_bh(&fq->lock);
}

if (local->open_count == 0)
@@ -1792,7 +1790,7 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,

if (txq_size) {
txqi = netdev_priv(ndev) + size;
- ieee80211_init_tx_queue(sdata, NULL, txqi, 0);
+ ieee80211_txq_init(sdata, NULL, txqi, 0);
}

sdata->dev = ndev;
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 160ac6b8b9a1..d00ea9b13f49 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -1086,6 +1086,10 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)

rtnl_unlock();

+ result = ieee80211_txq_setup_flows(local);
+ if (result)
+ goto fail_flows;
+
#ifdef CONFIG_INET
local->ifa_notifier.notifier_call = ieee80211_ifa_changed;
result = register_inetaddr_notifier(&local->ifa_notifier);
@@ -1111,6 +1115,8 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
#if defined(CONFIG_INET) || defined(CONFIG_IPV6)
fail_ifa:
#endif
+ ieee80211_txq_teardown_flows(local);
+ fail_flows:
rtnl_lock();
rate_control_deinitialize(local);
ieee80211_remove_interfaces(local);
@@ -1169,6 +1175,7 @@ void ieee80211_unregister_hw(struct ieee80211_hw *hw)
skb_queue_purge(&local->skb_queue);
skb_queue_purge(&local->skb_queue_unreliable);
skb_queue_purge(&local->skb_queue_tdls_chsw);
+ ieee80211_txq_teardown_flows(local);

destroy_workqueue(local->workqueue);
wiphy_unregister(local->hw.wiphy);
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 5e65e838992a..9a1eb70cb120 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1268,7 +1268,7 @@ static void sta_ps_start(struct sta_info *sta)
for (tid = 0; tid < ARRAY_SIZE(sta->sta.txq); tid++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[tid]);

- if (!skb_queue_len(&txqi->queue))
+ if (!txqi->tin.backlog_packets)
set_bit(tid, &sta->txq_buffered_tids);
else
clear_bit(tid, &sta->txq_buffered_tids);
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 177cc6cd6416..76b737dcc36f 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -90,6 +90,7 @@ static void __cleanup_single_sta(struct sta_info *sta)
struct tid_ampdu_tx *tid_tx;
struct ieee80211_sub_if_data *sdata = sta->sdata;
struct ieee80211_local *local = sdata->local;
+ struct fq *fq = &local->fq;
struct ps_data *ps;

if (test_sta_flag(sta, WLAN_STA_PS_STA) ||
@@ -113,11 +114,10 @@ static void __cleanup_single_sta(struct sta_info *sta)
if (sta->sta.txq[0]) {
for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[i]);
- int n = skb_queue_len(&txqi->queue);

- ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
- atomic_sub(n, &sdata->num_tx_queued);
- txqi->byte_cnt = 0;
+ spin_lock_bh(&fq->lock);
+ ieee80211_txq_purge(local, txqi);
+ spin_unlock_bh(&fq->lock);
}
}

@@ -368,7 +368,7 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
struct txq_info *txq = txq_data + i * size;

- ieee80211_init_tx_queue(sdata, sta, txq, i);
+ ieee80211_txq_init(sdata, sta, txq, i);
}
}

@@ -1211,7 +1211,7 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta)
for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[i]);

- if (!skb_queue_len(&txqi->queue))
+ if (!txqi->tin.backlog_packets)
continue;

drv_wake_tx_queue(local, txqi);
@@ -1648,7 +1648,7 @@ ieee80211_sta_ps_deliver_response(struct sta_info *sta,
for (tid = 0; tid < ARRAY_SIZE(sta->sta.txq); tid++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[tid]);

- if (!(tids & BIT(tid)) || skb_queue_len(&txqi->queue))
+ if (!(tids & BIT(tid)) || txqi->tin.backlog_packets)
continue;

sta_info_recalc_tim(sta);
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 3e77da195ce8..1d8343fca6d4 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -25,6 +25,7 @@
#include <net/cfg80211.h>
#include <net/mac80211.h>
#include <asm/unaligned.h>
+#include <net/fq_impl.h>

#include "ieee80211_i.h"
#include "driver-ops.h"
@@ -1266,46 +1267,121 @@ static struct txq_info *ieee80211_get_txq(struct ieee80211_local *local,
return to_txq_info(txq);
}

+static struct sk_buff *fq_tin_dequeue_func(struct fq *fq,
+ struct fq_tin *tin,
+ struct fq_flow *flow)
+{
+ return fq_flow_dequeue(fq, flow);
+}
+
+static void fq_skb_free_func(struct fq *fq,
+ struct fq_tin *tin,
+ struct fq_flow *flow,
+ struct sk_buff *skb)
+{
+ struct ieee80211_local *local;
+
+ local = container_of(fq, struct ieee80211_local, fq);
+ ieee80211_free_txskb(&local->hw, skb);
+}
+
+static struct fq_flow *fq_flow_get_default_func(struct fq *fq,
+ struct fq_tin *tin,
+ int idx,
+ struct sk_buff *skb)
+{
+ struct txq_info *txqi;
+
+ txqi = container_of(tin, struct txq_info, tin);
+ return &txqi->def_flow;
+}
+
static void ieee80211_txq_enqueue(struct ieee80211_local *local,
struct txq_info *txqi,
struct sk_buff *skb)
{
- struct ieee80211_sub_if_data *sdata = vif_to_sdata(txqi->txq.vif);
+ struct fq *fq = &local->fq;
+ struct fq_tin *tin = &txqi->tin;

- lockdep_assert_held(&txqi->queue.lock);
+ fq_tin_enqueue(fq, tin, skb,
+ fq_skb_free_func,
+ fq_flow_get_default_func);
+}

- if (atomic_read(&sdata->num_tx_queued) >= TOTAL_MAX_TX_BUFFER ||
- txqi->queue.qlen >= STA_MAX_TX_BUFFER) {
- ieee80211_free_txskb(&local->hw, skb);
- return;
+void ieee80211_txq_init(struct ieee80211_sub_if_data *sdata,
+ struct sta_info *sta,
+ struct txq_info *txqi, int tid)
+{
+ fq_tin_init(&txqi->tin);
+ fq_flow_init(&txqi->def_flow);
+
+ txqi->txq.vif = &sdata->vif;
+
+ if (sta) {
+ txqi->txq.sta = &sta->sta;
+ sta->sta.txq[tid] = &txqi->txq;
+ txqi->txq.tid = tid;
+ txqi->txq.ac = ieee802_1d_to_ac[tid & 7];
+ } else {
+ sdata->vif.txq = &txqi->txq;
+ txqi->txq.tid = 0;
+ txqi->txq.ac = IEEE80211_AC_BE;
}
+}

- atomic_inc(&sdata->num_tx_queued);
- txqi->byte_cnt += skb->len;
- __skb_queue_tail(&txqi->queue, skb);
+void ieee80211_txq_purge(struct ieee80211_local *local,
+ struct txq_info *txqi)
+{
+ struct fq *fq = &local->fq;
+ struct fq_tin *tin = &txqi->tin;
+
+ fq_tin_reset(fq, tin, fq_skb_free_func);
+}
+
+int ieee80211_txq_setup_flows(struct ieee80211_local *local)
+{
+ struct fq *fq = &local->fq;
+ int ret;
+
+ if (!local->ops->wake_tx_queue)
+ return 0;
+
+ ret = fq_init(fq, 4096);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+void ieee80211_txq_teardown_flows(struct ieee80211_local *local)
+{
+ struct fq *fq = &local->fq;
+
+ if (!local->ops->wake_tx_queue)
+ return;
+
+ fq_reset(fq, fq_skb_free_func);
}

struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
struct ieee80211_txq *txq)
{
struct ieee80211_local *local = hw_to_local(hw);
- struct ieee80211_sub_if_data *sdata = vif_to_sdata(txq->vif);
struct txq_info *txqi = container_of(txq, struct txq_info, txq);
struct ieee80211_hdr *hdr;
struct sk_buff *skb = NULL;
+ struct fq *fq = &local->fq;
+ struct fq_tin *tin = &txqi->tin;

- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);

if (test_bit(IEEE80211_TXQ_STOP, &txqi->flags))
goto out;

- skb = __skb_dequeue(&txqi->queue);
+ skb = fq_tin_dequeue(fq, tin, fq_tin_dequeue_func);
if (!skb)
goto out;

- atomic_dec(&sdata->num_tx_queued);
- txqi->byte_cnt -= skb->len;
-
hdr = (struct ieee80211_hdr *)skb->data;
if (txq->sta && ieee80211_is_data_qos(hdr->frame_control)) {
struct sta_info *sta = container_of(txq->sta, struct sta_info,
@@ -1320,7 +1396,7 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
}

out:
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);

if (skb && skb_has_frag_list(skb) &&
!ieee80211_hw_check(&local->hw, TX_FRAG_LIST))
@@ -1337,6 +1413,7 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
bool txpending)
{
struct ieee80211_tx_control control = {};
+ struct fq *fq = &local->fq;
struct sk_buff *skb, *tmp;
struct txq_info *txqi;
unsigned long flags;
@@ -1359,9 +1436,9 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,

__skb_unlink(skb, skbs);

- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);
ieee80211_txq_enqueue(local, txqi, skb);
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);

drv_wake_tx_queue(local, txqi);

@@ -2893,6 +2970,9 @@ static bool ieee80211_amsdu_aggregate(struct ieee80211_sub_if_data *sdata,
struct sk_buff *skb)
{
struct ieee80211_local *local = sdata->local;
+ struct fq *fq = &local->fq;
+ struct fq_tin *tin;
+ struct fq_flow *flow;
u8 tid = skb->priority & IEEE80211_QOS_CTL_TAG1D_MASK;
struct ieee80211_txq *txq = sta->sta.txq[tid];
struct txq_info *txqi;
@@ -2904,6 +2984,7 @@ static bool ieee80211_amsdu_aggregate(struct ieee80211_sub_if_data *sdata,
__be16 len;
void *data;
bool ret = false;
+ unsigned int orig_len;
int n = 1, nfrags;

if (!ieee80211_hw_check(&local->hw, TX_AMSDU))
@@ -2920,12 +3001,20 @@ static bool ieee80211_amsdu_aggregate(struct ieee80211_sub_if_data *sdata,
max_amsdu_len = min_t(int, max_amsdu_len,
sta->sta.max_rc_amsdu_len);

- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);

- head = skb_peek_tail(&txqi->queue);
+ /* TODO: Ideally aggregation should be done on dequeue to remain
+ * responsive to environment changes.
+ */
+
+ tin = &txqi->tin;
+ flow = fq_flow_classify(fq, tin, skb, fq_flow_get_default_func);
+ head = skb_peek_tail(&flow->queue);
if (!head)
goto out;

+ orig_len = head->len;
+
if (skb->len + head->len > max_amsdu_len)
goto out;

@@ -2964,8 +3053,13 @@ static bool ieee80211_amsdu_aggregate(struct ieee80211_sub_if_data *sdata,
head->data_len += skb->len;
*frag_tail = skb;

+ flow->backlog += head->len - orig_len;
+ tin->backlog_bytes += head->len - orig_len;
+
+ fq_recalc_backlog(fq, tin, flow);
+
out:
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);

return ret;
}
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 0db46442bdcf..42bf0b6685e8 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -3389,25 +3389,6 @@ u8 *ieee80211_add_wmm_info_ie(u8 *buf, u8 qosinfo)
return buf;
}

-void ieee80211_init_tx_queue(struct ieee80211_sub_if_data *sdata,
- struct sta_info *sta,
- struct txq_info *txqi, int tid)
-{
- skb_queue_head_init(&txqi->queue);
- txqi->txq.vif = &sdata->vif;
-
- if (sta) {
- txqi->txq.sta = &sta->sta;
- sta->sta.txq[tid] = &txqi->txq;
- txqi->txq.tid = tid;
- txqi->txq.ac = ieee802_1d_to_ac[tid & 7];
- } else {
- sdata->vif.txq = &txqi->txq;
- txqi->txq.tid = 0;
- txqi->txq.ac = IEEE80211_AC_BE;
- }
-}
-
void ieee80211_txq_get_depth(struct ieee80211_txq *txq,
unsigned long *frame_cnt,
unsigned long *byte_cnt)
@@ -3415,9 +3396,9 @@ void ieee80211_txq_get_depth(struct ieee80211_txq *txq,
struct txq_info *txqi = to_txq_info(txq);

if (frame_cnt)
- *frame_cnt = txqi->queue.qlen;
+ *frame_cnt = txqi->tin.backlog_packets;

if (byte_cnt)
- *byte_cnt = txqi->byte_cnt;
+ *byte_cnt = txqi->tin.backlog_bytes;
}
EXPORT_SYMBOL(ieee80211_txq_get_depth);
--
2.1.4


2016-05-09 12:28:55

by Michal Kazior

Subject: Re: [PATCHv4 1/5] mac80211: skip netdev queue control with software queuing

On 5 May 2016 at 13:00, Michal Kazior <[email protected]> wrote:
[...]
> -static void ieee80211_drv_tx(struct ieee80211_local *local,
> - struct ieee80211_vif *vif,
> - struct ieee80211_sta *pubsta,
> - struct sk_buff *skb)
> +static struct txq_info *ieee80211_get_txq(struct ieee80211_local *local,
> + struct ieee80211_vif *vif,
> + struct ieee80211_sta *pubsta,
> + struct sk_buff *skb)
> {
> struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
> - struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
> struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
> - struct ieee80211_tx_control control = {
> - .sta = pubsta,
> - };
> - struct ieee80211_txq *txq = NULL;
> - struct txq_info *txqi;
> - u8 ac;
>
> if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) ||
> (info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE))
> - goto tx_normal;
> + return NULL;
>
> if (!ieee80211_is_data(hdr->frame_control))
> - goto tx_normal;
> + return NULL;
>
> if (pubsta) {
> u8 tid = skb->priority & IEEE80211_QOS_CTL_TID_MASK;
>
> - txq = pubsta->txq[tid];
> + return to_txq_info(pubsta->txq[tid]);
> } else if (vif) {
> - txq = vif->txq;
> + return to_txq_info(vif->txq);
> }

I just noticed this crashes on non-wake_tx_queue drivers. I'll re-spin
a v5 with this fixed later.


Michał

2016-05-19 08:36:04

by Michal Kazior

Subject: [PATCHv5 1/5] mac80211: skip netdev queue control with software queuing

Qdiscs are designed with no regard for 802.11
aggregation requirements and hand out packets one
by one with no guarantee they are destined to the
same tid. This does more harm than good no matter
how fairly a given qdisc may behave on an ethernet
interface.

Software queuing used per-AC netdev subqueue
congestion control whenever a global AC limit was
hit. In practice this meant a single station or
tid queue could starve others rather easily. This
could resonate with qdiscs in a bad way or simply
end up with poor aggregation performance.
Increasing the AC limit would increase induced
latency, which is also bad.

Disabling qdiscs by default and performing
taildrop instead of netdev subqueue congestion
control, on the other hand, makes it possible for
tid queues to fill up "in the meantime" while
preventing stations from starving each other.

This increases aggregation opportunities and
should allow software queuing based drivers to
achieve better performance by utilizing airtime
more efficiently with big aggregates.
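The contrast between the old per-AC subqueue stop and the new per-queue
taildrop can be sketched like this; the constants and structure are
illustrative only, not the kernel's:

```c
#include <assert.h>

#define AC_LIMIT 8	/* old: stop the whole AC subqueue at this depth */
#define Q_LIMIT  8	/* new: taildrop only on the offending queue */

/* Old scheme: a shared per-AC counter gates every station on the AC. */
static int old_enqueue(int *ac_pending, int *qlen)
{
	if (*ac_pending >= AC_LIMIT)
		return 0;	/* subqueue stopped: everyone blocks */
	(*ac_pending)++;
	(*qlen)++;
	return 1;
}

/* New scheme: each tid queue drops independently. */
static int new_enqueue(int *qlen)
{
	if (*qlen >= Q_LIMIT)
		return 0;	/* taildrop hits this queue only */
	(*qlen)++;
	return 1;
}
```

With the shared counter, one busy station filling the AC blocks an idle
station's empty queue; with per-queue taildrop, only the full queue drops and
the others keep filling.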

Signed-off-by: Michal Kazior <[email protected]>
---

Notes:
v5:
* fix null-deref for non-txq drivers
* fix compilation after rebase
* fix powersave on ath9k w/ wake_tx_queue
[reported by Tim]

v4:
* make queue depth limit per interface instead of
per radio [Johannes]

include/net/mac80211.h | 4 ---
net/mac80211/ieee80211_i.h | 2 +-
net/mac80211/iface.c | 18 +++++++++--
net/mac80211/main.c | 3 --
net/mac80211/sta_info.c | 2 +-
net/mac80211/tx.c | 77 ++++++++++++++++++++++++++--------------------
net/mac80211/util.c | 11 ++++---
7 files changed, 67 insertions(+), 50 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index be30b0549b88..a8683aec6dbe 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -2147,9 +2147,6 @@ enum ieee80211_hw_flags {
* @n_cipher_schemes: a size of an array of cipher schemes definitions.
* @cipher_schemes: a pointer to an array of cipher scheme definitions
* supported by HW.
- *
- * @txq_ac_max_pending: maximum number of frames per AC pending in all txq
- * entries for a vif.
*/
struct ieee80211_hw {
struct ieee80211_conf conf;
@@ -2180,7 +2177,6 @@ struct ieee80211_hw {
u8 uapsd_max_sp_len;
u8 n_cipher_schemes;
const struct ieee80211_cipher_scheme *cipher_schemes;
- int txq_ac_max_pending;
};

static inline bool _ieee80211_hw_check(struct ieee80211_hw *hw,
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 9438c9406687..634603320374 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -856,7 +856,7 @@ struct ieee80211_sub_if_data {
bool control_port_no_encrypt;
int encrypt_headroom;

- atomic_t txqs_len[IEEE80211_NUM_ACS];
+ atomic_t num_tx_queued;
struct ieee80211_tx_queue_params tx_conf[IEEE80211_NUM_ACS];
struct mac80211_qos_map __rcu *qos_map;

diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index c59af3eb9fa4..609c5174d798 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -976,13 +976,13 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,

if (sdata->vif.txq) {
struct txq_info *txqi = to_txq_info(sdata->vif.txq);
+ int n = skb_queue_len(&txqi->queue);

spin_lock_bh(&txqi->queue.lock);
ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
+ atomic_sub(n, &sdata->num_tx_queued);
txqi->byte_cnt = 0;
spin_unlock_bh(&txqi->queue.lock);
-
- atomic_set(&sdata->txqs_len[txqi->txq.ac], 0);
}

if (local->open_count == 0)
@@ -1198,6 +1198,12 @@ static void ieee80211_if_setup(struct net_device *dev)
dev->destructor = ieee80211_if_free;
}

+static void ieee80211_if_setup_no_queue(struct net_device *dev)
+{
+ ieee80211_if_setup(dev);
+ dev->priv_flags |= IFF_NO_QUEUE;
+}
+
static void ieee80211_iface_work(struct work_struct *work)
{
struct ieee80211_sub_if_data *sdata =
@@ -1707,6 +1713,7 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
struct net_device *ndev = NULL;
struct ieee80211_sub_if_data *sdata = NULL;
struct txq_info *txqi;
+ void (*if_setup)(struct net_device *dev);
int ret, i;
int txqs = 1;

@@ -1734,12 +1741,17 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
txq_size += sizeof(struct txq_info) +
local->hw.txq_data_size;

+ if (local->ops->wake_tx_queue)
+ if_setup = ieee80211_if_setup_no_queue;
+ else
+ if_setup = ieee80211_if_setup;
+
if (local->hw.queues >= IEEE80211_NUM_ACS)
txqs = IEEE80211_NUM_ACS;

ndev = alloc_netdev_mqs(size + txq_size,
name, name_assign_type,
- ieee80211_if_setup, txqs, 1);
+ if_setup, txqs, 1);
if (!ndev)
return -ENOMEM;
dev_net_set(ndev, wiphy_net(local->hw.wiphy));
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 7ee91d6151d1..160ac6b8b9a1 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -1055,9 +1055,6 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)

local->dynamic_ps_forced_timeout = -1;

- if (!local->hw.txq_ac_max_pending)
- local->hw.txq_ac_max_pending = 64;
-
result = ieee80211_wep_init(local);
if (result < 0)
wiphy_debug(local->hw.wiphy, "Failed to initialize wep: %d\n",
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 5ccfdbd406bd..177cc6cd6416 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -116,7 +116,7 @@ static void __cleanup_single_sta(struct sta_info *sta)
int n = skb_queue_len(&txqi->queue);

ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
- atomic_sub(n, &sdata->txqs_len[txqi->txq.ac]);
+ atomic_sub(n, &sdata->num_tx_queued);
txqi->byte_cnt = 0;
}
}
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 203044379ce0..3e77da195ce8 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1236,27 +1236,21 @@ ieee80211_tx_prepare(struct ieee80211_sub_if_data *sdata,
return TX_CONTINUE;
}

-static void ieee80211_drv_tx(struct ieee80211_local *local,
- struct ieee80211_vif *vif,
- struct ieee80211_sta *pubsta,
- struct sk_buff *skb)
+static struct txq_info *ieee80211_get_txq(struct ieee80211_local *local,
+ struct ieee80211_vif *vif,
+ struct ieee80211_sta *pubsta,
+ struct sk_buff *skb)
{
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
- struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
- struct ieee80211_tx_control control = {
- .sta = pubsta,
- };
struct ieee80211_txq *txq = NULL;
- struct txq_info *txqi;
- u8 ac;

if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) ||
(info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE))
- goto tx_normal;
+ return NULL;

if (!ieee80211_is_data(hdr->frame_control))
- goto tx_normal;
+ return NULL;

if (pubsta) {
u8 tid = skb->priority & IEEE80211_QOS_CTL_TID_MASK;
@@ -1267,25 +1261,28 @@ static void ieee80211_drv_tx(struct ieee80211_local *local,
}

if (!txq)
- goto tx_normal;
+ return NULL;

- ac = txq->ac;
- txqi = to_txq_info(txq);
- atomic_inc(&sdata->txqs_len[ac]);
- if (atomic_read(&sdata->txqs_len[ac]) >= local->hw.txq_ac_max_pending)
- netif_stop_subqueue(sdata->dev, ac);
+ return to_txq_info(txq);
+}

- spin_lock_bh(&txqi->queue.lock);
+static void ieee80211_txq_enqueue(struct ieee80211_local *local,
+ struct txq_info *txqi,
+ struct sk_buff *skb)
+{
+ struct ieee80211_sub_if_data *sdata = vif_to_sdata(txqi->txq.vif);
+
+ lockdep_assert_held(&txqi->queue.lock);
+
+ if (atomic_read(&sdata->num_tx_queued) >= TOTAL_MAX_TX_BUFFER ||
+ txqi->queue.qlen >= STA_MAX_TX_BUFFER) {
+ ieee80211_free_txskb(&local->hw, skb);
+ return;
+ }
+
+ atomic_inc(&sdata->num_tx_queued);
txqi->byte_cnt += skb->len;
__skb_queue_tail(&txqi->queue, skb);
- spin_unlock_bh(&txqi->queue.lock);
-
- drv_wake_tx_queue(local, txqi);
-
- return;
-
-tx_normal:
- drv_tx(local, &control, skb);
}

struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
@@ -1296,7 +1293,6 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
struct txq_info *txqi = container_of(txq, struct txq_info, txq);
struct ieee80211_hdr *hdr;
struct sk_buff *skb = NULL;
- u8 ac = txq->ac;

spin_lock_bh(&txqi->queue.lock);

@@ -1307,12 +1303,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
if (!skb)
goto out;

+ atomic_dec(&sdata->num_tx_queued);
txqi->byte_cnt -= skb->len;

- atomic_dec(&sdata->txqs_len[ac]);
- if (__netif_subqueue_stopped(sdata->dev, ac))
- ieee80211_propagate_queue_wake(local, sdata->vif.hw_queue[ac]);
-
hdr = (struct ieee80211_hdr *)skb->data;
if (txq->sta && ieee80211_is_data_qos(hdr->frame_control)) {
struct sta_info *sta = container_of(txq->sta, struct sta_info,
@@ -1343,7 +1336,9 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
struct sk_buff_head *skbs,
bool txpending)
{
+ struct ieee80211_tx_control control = {};
struct sk_buff *skb, *tmp;
+ struct txq_info *txqi;
unsigned long flags;

skb_queue_walk_safe(skbs, skb, tmp) {
@@ -1358,6 +1353,21 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
}
#endif

+ txqi = ieee80211_get_txq(local, vif, sta, skb);
+ if (txqi) {
+ info->control.vif = vif;
+
+ __skb_unlink(skb, skbs);
+
+ spin_lock_bh(&txqi->queue.lock);
+ ieee80211_txq_enqueue(local, txqi, skb);
+ spin_unlock_bh(&txqi->queue.lock);
+
+ drv_wake_tx_queue(local, txqi);
+
+ continue;
+ }
+
spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
if (local->queue_stop_reasons[q] ||
(!txpending && !skb_queue_empty(&local->pending[q]))) {
@@ -1400,9 +1410,10 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);

info->control.vif = vif;
+ control.sta = sta;

__skb_unlink(skb, skbs);
- ieee80211_drv_tx(local, vif, sta, skb);
+ drv_tx(local, &control, skb);
}

return true;
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 905003f75c4d..0db46442bdcf 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -244,6 +244,9 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue)
struct ieee80211_sub_if_data *sdata;
int n_acs = IEEE80211_NUM_ACS;

+ if (local->ops->wake_tx_queue)
+ return;
+
if (local->hw.queues < IEEE80211_NUM_ACS)
n_acs = 1;

@@ -260,11 +263,6 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue)
for (ac = 0; ac < n_acs; ac++) {
int ac_queue = sdata->vif.hw_queue[ac];

- if (local->ops->wake_tx_queue &&
- (atomic_read(&sdata->txqs_len[ac]) >
- local->hw.txq_ac_max_pending))
- continue;
-
if (ac_queue == queue ||
(sdata->vif.cab_queue == queue &&
local->queue_stop_reasons[ac_queue] == 0 &&
@@ -352,6 +350,9 @@ static void __ieee80211_stop_queue(struct ieee80211_hw *hw, int queue,
if (__test_and_set_bit(reason, &local->queue_stop_reasons[queue]))
return;

+ if (local->ops->wake_tx_queue)
+ return;
+
if (local->hw.queues < IEEE80211_NUM_ACS)
n_acs = 1;

--
2.1.4


2016-05-31 12:12:20

by Toke Høiland-Jørgensen


Subject: Re: [Make-wifi-fast] [PATCHv5 0/5] mac80211: implement fq_codel

Michal Kazior <[email protected]> writes:

> This patchset disables qdiscs for drivers
> using software queuing and performs fq_codel-like
> dequeuing on txqs.

Hi Michal

Is this version in a git repo somewhere I can pull from? :)

-Toke

2016-05-19 08:36:01

by Michal Kazior


Subject: [PATCHv5 0/5] mac80211: implement fq_codel

Hi,

This patchset disables qdiscs for drivers
using software queuing and performs fq_codel-like
dequeuing on txqs.

This is based on net-next/master
(0b7962a6c4a37ef3cbb25d976af7b9ec4ce8ad01).

Background:

https://www.spinics.net/lists/linux-wireless/msg149776.html
https://www.spinics.net/lists/linux-wireless/msg148714.html
https://www.spinics.net/lists/linux-wireless/msg149039.html
http://blog.cerowrt.org/post/dql_on_wifi_2/
http://blog.cerowrt.org/post/dql_on_wifi/
http://blog.cerowrt.org/post/fq_codel_on_ath10k/

v5:
* some fixes (crash, powersave) [me, Tim]
* reworked debugfs knob (single file now) [Dave]

v4:
* the taildrop stop-gap patch moved to
per-interface limit (instead of per-radio) [Johannes]
* pushed fq.h and codel.h changes to include/net/ [Johannes]

v3:
* split taildrop, fq and codel functionalities
into separate patches [Avery]

v2:
* fix invalid ptr deref
* fix compilation for backports


Michal Kazior (5):
mac80211: skip netdev queue control with software queuing
mac80211: implement fair queueing per txq
mac80211: add debug knobs for fair queuing
mac80211: implement codel on fair queuing flows
mac80211: add debug knobs for codel

include/net/mac80211.h | 18 ++-
net/mac80211/agg-tx.c | 8 +-
net/mac80211/debugfs.c | 202 ++++++++++++++++++++++++++++++
net/mac80211/ieee80211_i.h | 31 ++++-
net/mac80211/iface.c | 26 ++--
net/mac80211/main.c | 10 +-
net/mac80211/rx.c | 2 +-
net/mac80211/sta_info.c | 14 +--
net/mac80211/tx.c | 298 +++++++++++++++++++++++++++++++++++++++------
net/mac80211/util.c | 34 ++----
10 files changed, 545 insertions(+), 98 deletions(-)

--
2.1.4


2016-05-05 10:58:55

by Michal Kazior

Subject: [PATCHv4 4/5] mac80211: implement codel on fair queuing flows

There is no limit other than a global
packet count limit when using software queuing.
This means a single flow queue can grow insanely
long. This is particularly bad for TCP congestion
algorithms, which require a somewhat more
sophisticated frame dropping scheme than a mere
head-drop on limit overflow.

Hence apply a slightly modified (to fit the
knobs) CoDel5 on flow queues. This improves TCP
convergence and stability when combined with a
wireless driver that keeps its own tx queue/fifo
at a minimum fill level for the given link
conditions.

Signed-off-by: Michal Kazior <[email protected]>
---

Notes:
v4:
* removed internal codel.h and re-used in-kernel one

include/net/mac80211.h | 14 +++++-
net/mac80211/ieee80211_i.h | 5 +++
net/mac80211/tx.c | 109 ++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index ffb90dfe0d70..cc534f1b0f8e 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -21,6 +21,7 @@
#include <linux/skbuff.h>
#include <linux/ieee80211.h>
#include <net/cfg80211.h>
+#include <net/codel.h>
#include <asm/unaligned.h>

/**
@@ -895,7 +896,18 @@ struct ieee80211_tx_info {
unsigned long jiffies;
};
/* NB: vif can be NULL for injected frames */
- struct ieee80211_vif *vif;
+ union {
+ /* NB: vif can be NULL for injected frames */
+ struct ieee80211_vif *vif;
+
+ /* When packets are enqueued on txq it's easy
+ * to re-construct the vif pointer. There's no
+ * more space in tx_info so it can be used to
+ * store the necessary enqueue time for packet
+ * sojourn time computation.
+ */
+ codel_time_t enqueue_time;
+ };
struct ieee80211_key_conf *hw_key;
u32 flags;
/* 4 bytes free */
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 6f8375f1df88..54edfb6fc1d1 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -812,10 +812,12 @@ enum txq_info_flags {
* @tin: contains packets split into multiple flows
* @def_flow: used as a fallback flow when a packet destined to @tin hashes to
* a fq_flow which is already owned by a different tin
+ * @def_cvars: codel vars for @def_flow
*/
struct txq_info {
struct fq_tin tin;
struct fq_flow def_flow;
+ struct codel_vars def_cvars;
unsigned long flags;

/* keep last! */
@@ -1108,6 +1110,9 @@ struct ieee80211_local {
struct ieee80211_hw hw;

struct fq fq;
+ struct codel_vars *cvars;
+ struct codel_params cparams;
+ struct codel_stats cstats;

const struct ieee80211_ops *ops;

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 47936b939591..013b382f6888 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -25,6 +25,8 @@
#include <net/ieee80211_radiotap.h>
#include <net/cfg80211.h>
#include <net/mac80211.h>
+#include <net/codel.h>
+#include <net/codel_impl.h>
#include <asm/unaligned.h>
#include <net/fq_impl.h>

@@ -1269,11 +1271,92 @@ static struct txq_info *ieee80211_get_txq(struct ieee80211_local *local,
return NULL;
}

+static void ieee80211_set_skb_enqueue_time(struct sk_buff *skb)
+{
+ IEEE80211_SKB_CB(skb)->control.enqueue_time = codel_get_time();
+}
+
+static void ieee80211_set_skb_vif(struct sk_buff *skb, struct txq_info *txqi)
+{
+ IEEE80211_SKB_CB(skb)->control.vif = txqi->txq.vif;
+}
+
+static u32 codel_skb_len_func(const struct sk_buff *skb)
+{
+ return skb->len;
+}
+
+static codel_time_t codel_skb_time_func(const struct sk_buff *skb)
+{
+ const struct ieee80211_tx_info *info;
+
+ info = (const struct ieee80211_tx_info *)skb->cb;
+ return info->control.enqueue_time;
+}
+
+static struct sk_buff *codel_dequeue_func(struct codel_vars *cvars,
+ void *ctx)
+{
+ struct ieee80211_local *local;
+ struct txq_info *txqi;
+ struct fq *fq;
+ struct fq_flow *flow;
+
+ txqi = ctx;
+ local = vif_to_sdata(txqi->txq.vif)->local;
+ fq = &local->fq;
+
+ if (cvars == &txqi->def_cvars)
+ flow = &txqi->def_flow;
+ else
+ flow = &fq->flows[cvars - local->cvars];
+
+ return fq_flow_dequeue(fq, flow);
+}
+
+static void codel_drop_func(struct sk_buff *skb,
+ void *ctx)
+{
+ struct ieee80211_local *local;
+ struct ieee80211_hw *hw;
+ struct txq_info *txqi;
+
+ txqi = ctx;
+ local = vif_to_sdata(txqi->txq.vif)->local;
+ hw = &local->hw;
+
+ ieee80211_free_txskb(hw, skb);
+}
+
static struct sk_buff *fq_tin_dequeue_func(struct fq *fq,
struct fq_tin *tin,
struct fq_flow *flow)
{
- return fq_flow_dequeue(fq, flow);
+ struct ieee80211_local *local;
+ struct txq_info *txqi;
+ struct codel_vars *cvars;
+ struct codel_params *cparams;
+ struct codel_stats *cstats;
+
+ local = container_of(fq, struct ieee80211_local, fq);
+ txqi = container_of(tin, struct txq_info, tin);
+ cparams = &local->cparams;
+ cstats = &local->cstats;
+
+ if (flow == &txqi->def_flow)
+ cvars = &txqi->def_cvars;
+ else
+ cvars = &local->cvars[flow - fq->flows];
+
+ return codel_dequeue(txqi,
+ &flow->backlog,
+ cparams,
+ cvars,
+ cstats,
+ codel_skb_len_func,
+ codel_skb_time_func,
+ codel_drop_func,
+ codel_dequeue_func);
}

static void fq_skb_free_func(struct fq *fq,
@@ -1305,6 +1388,7 @@ static void ieee80211_txq_enqueue(struct ieee80211_local *local,
struct fq *fq = &local->fq;
struct fq_tin *tin = &txqi->tin;

+ ieee80211_set_skb_enqueue_time(skb);
fq_tin_enqueue(fq, tin, skb,
fq_skb_free_func,
fq_flow_get_default_func);
@@ -1316,6 +1400,7 @@ void ieee80211_txq_init(struct ieee80211_sub_if_data *sdata,
{
fq_tin_init(&txqi->tin);
fq_flow_init(&txqi->def_flow);
+ codel_vars_init(&txqi->def_cvars);

txqi->txq.vif = &sdata->vif;

@@ -1344,6 +1429,7 @@ int ieee80211_txq_setup_flows(struct ieee80211_local *local)
{
struct fq *fq = &local->fq;
int ret;
+ int i;

if (!local->ops->wake_tx_queue)
return 0;
@@ -1352,6 +1438,22 @@ int ieee80211_txq_setup_flows(struct ieee80211_local *local)
if (ret)
return ret;

+ codel_params_init(&local->cparams);
+ codel_stats_init(&local->cstats);
+ local->cparams.interval = MS2TIME(100);
+ local->cparams.target = MS2TIME(20);
+ local->cparams.ecn = true;
+
+ local->cvars = kcalloc(fq->flows_cnt, sizeof(local->cvars[0]),
+ GFP_KERNEL);
+ if (!local->cvars) {
+ fq_reset(fq, fq_skb_free_func);
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < fq->flows_cnt; i++)
+ codel_vars_init(&local->cvars[i]);
+
return 0;
}

@@ -1362,6 +1464,9 @@ void ieee80211_txq_teardown_flows(struct ieee80211_local *local)
if (!local->ops->wake_tx_queue)
return;

+ kfree(local->cvars);
+ local->cvars = NULL;
+
fq_reset(fq, fq_skb_free_func);
}

@@ -1384,6 +1489,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
if (!skb)
goto out;

+ ieee80211_set_skb_vif(skb, txqi);
+
hdr = (struct ieee80211_hdr *)skb->data;
if (txq->sta && ieee80211_is_data_qos(hdr->frame_control)) {
struct sta_info *sta = container_of(txq->sta, struct sta_info,
--
2.1.4


2016-05-05 10:58:54

by Michal Kazior

Subject: [PATCHv4 3/5] mac80211: add debug knobs for fair queuing

This adds a few debugfs entries and a module
parameter to make it easier to test, debug and
experiment.

Signed-off-by: Michal Kazior <[email protected]>
---
net/mac80211/debugfs.c | 77 +++++++++++++++++++++++++++++++++++++++++++
net/mac80211/debugfs_netdev.c | 28 +++++++++++++++-
net/mac80211/debugfs_sta.c | 45 +++++++++++++++++++++++++
net/mac80211/tx.c | 8 ++++-
4 files changed, 156 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index b251b2f7f8dd..43592b6f79f0 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -31,6 +31,30 @@ int mac80211_format_buffer(char __user *userbuf, size_t count,
return simple_read_from_buffer(userbuf, count, ppos, buf, res);
}

+static int mac80211_parse_buffer(const char __user *userbuf,
+ size_t count,
+ loff_t *ppos,
+ char *fmt, ...)
+{
+ va_list args;
+ char buf[DEBUGFS_FORMAT_BUFFER_SIZE] = {};
+ int res;
+
+ if (count > sizeof(buf))
+ return -EINVAL;
+
+ if (copy_from_user(buf, userbuf, count))
+ return -EFAULT;
+
+ buf[sizeof(buf) - 1] = '\0';
+
+ va_start(args, fmt);
+ res = vsscanf(buf, fmt, args);
+ va_end(args);
+
+ return count;
+}
+
#define DEBUGFS_READONLY_FILE_FN(name, fmt, value...) \
static ssize_t name## _read(struct file *file, char __user *userbuf, \
size_t count, loff_t *ppos) \
@@ -70,6 +94,52 @@ DEBUGFS_READONLY_FILE(wep_iv, "%#08x",
DEBUGFS_READONLY_FILE(rate_ctrl_alg, "%s",
local->rate_ctrl ? local->rate_ctrl->ops->name : "hw/driver");

+#define DEBUGFS_RW_FILE_FN(name, expr) \
+static ssize_t name## _write(struct file *file, \
+ const char __user *userbuf, \
+ size_t count, \
+ loff_t *ppos) \
+{ \
+ struct ieee80211_local *local = file->private_data; \
+ return expr; \
+}
+
+#define DEBUGFS_RW_FILE(name, expr, fmt, value...) \
+ DEBUGFS_READONLY_FILE_FN(name, fmt, value) \
+ DEBUGFS_RW_FILE_FN(name, expr) \
+ DEBUGFS_RW_FILE_OPS(name)
+
+#define DEBUGFS_RW_FILE_OPS(name) \
+static const struct file_operations name## _ops = { \
+ .read = name## _read, \
+ .write = name## _write, \
+ .open = simple_open, \
+ .llseek = generic_file_llseek, \
+}
+
+#define DEBUGFS_RW_EXPR_FQ(args...) \
+({ \
+ int res; \
+ res = mac80211_parse_buffer(userbuf, count, ppos, args); \
+ res; \
+})
+
+DEBUGFS_READONLY_FILE(fq_flows_cnt, "%u",
+ local->fq.flows_cnt);
+DEBUGFS_READONLY_FILE(fq_backlog, "%u",
+ local->fq.backlog);
+DEBUGFS_READONLY_FILE(fq_overlimit, "%u",
+ local->fq.overlimit);
+DEBUGFS_READONLY_FILE(fq_collisions, "%u",
+ local->fq.collisions);
+
+DEBUGFS_RW_FILE(fq_limit,
+ DEBUGFS_RW_EXPR_FQ("%u", &local->fq.limit),
+ "%u", local->fq.limit);
+DEBUGFS_RW_FILE(fq_quantum,
+ DEBUGFS_RW_EXPR_FQ("%u", &local->fq.quantum),
+ "%u", local->fq.quantum);
+
#ifdef CONFIG_PM
static ssize_t reset_write(struct file *file, const char __user *user_buf,
size_t count, loff_t *ppos)
@@ -257,6 +327,13 @@ void debugfs_hw_add(struct ieee80211_local *local)
DEBUGFS_ADD(user_power);
DEBUGFS_ADD(power);

+ DEBUGFS_ADD(fq_flows_cnt);
+ DEBUGFS_ADD(fq_backlog);
+ DEBUGFS_ADD(fq_overlimit);
+ DEBUGFS_ADD(fq_collisions);
+ DEBUGFS_ADD(fq_limit);
+ DEBUGFS_ADD(fq_quantum);
+
statsd = debugfs_create_dir("statistics", phyd);

/* if the dir failed, don't put all the other things into the root! */
diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c
index a5ba739cd2a7..369755b2b078 100644
--- a/net/mac80211/debugfs_netdev.c
+++ b/net/mac80211/debugfs_netdev.c
@@ -30,7 +30,7 @@ static ssize_t ieee80211_if_read(
size_t count, loff_t *ppos,
ssize_t (*format)(const struct ieee80211_sub_if_data *, char *, int))
{
- char buf[70];
+ char buf[200];
ssize_t ret = -EINVAL;

read_lock(&dev_base_lock);
@@ -236,6 +236,31 @@ ieee80211_if_fmt_hw_queues(const struct ieee80211_sub_if_data *sdata,
}
IEEE80211_IF_FILE_R(hw_queues);

+static ssize_t
+ieee80211_if_fmt_txq(const struct ieee80211_sub_if_data *sdata,
+ char *buf, int buflen)
+{
+ struct txq_info *txqi;
+ int len = 0;
+
+ if (!sdata->vif.txq)
+ return 0;
+
+ txqi = to_txq_info(sdata->vif.txq);
+ len += scnprintf(buf + len, buflen - len,
+ "CAB backlog %ub %up flows %u overlimit %u collisions %u tx %ub %up\n",
+ txqi->tin.backlog_bytes,
+ txqi->tin.backlog_packets,
+ txqi->tin.flows,
+ txqi->tin.overlimit,
+ txqi->tin.collisions,
+ txqi->tin.tx_bytes,
+ txqi->tin.tx_packets);
+
+ return len;
+}
+IEEE80211_IF_FILE_R(txq);
+
/* STA attributes */
IEEE80211_IF_FILE(bssid, u.mgd.bssid, MAC);
IEEE80211_IF_FILE(aid, u.mgd.aid, DEC);
@@ -618,6 +643,7 @@ static void add_common_files(struct ieee80211_sub_if_data *sdata)
DEBUGFS_ADD(rc_rateidx_vht_mcs_mask_2ghz);
DEBUGFS_ADD(rc_rateidx_vht_mcs_mask_5ghz);
DEBUGFS_ADD(hw_queues);
+ DEBUGFS_ADD(txq);
}

static void add_sta_files(struct ieee80211_sub_if_data *sdata)
diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index 33dfcbc2bf9c..bae1c39517af 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -355,6 +355,50 @@ static ssize_t sta_vht_capa_read(struct file *file, char __user *userbuf,
}
STA_OPS(vht_capa);

+static ssize_t sta_txqs_read(struct file *file,
+ char __user *userbuf,
+ size_t count,
+ loff_t *ppos)
+{
+ struct sta_info *sta = file->private_data;
+ struct txq_info *txqi;
+ char *buf;
+ int buflen;
+ int len;
+ int res;
+ int i;
+
+ len = 0;
+ buflen = 200 * IEEE80211_NUM_TIDS;
+ buf = kzalloc(buflen, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
+ if (!sta->sta.txq[i])
+ break;
+
+ txqi = to_txq_info(sta->sta.txq[i]);
+ len += scnprintf(buf + len, buflen - len,
+ "TID %d AC %d backlog %ub %up flows %u overlimit %u collisions %u tx %ub %up\n",
+ i,
+ txqi->txq.ac,
+ txqi->tin.backlog_bytes,
+ txqi->tin.backlog_packets,
+ txqi->tin.flows,
+ txqi->tin.overlimit,
+ txqi->tin.collisions,
+ txqi->tin.tx_bytes,
+ txqi->tin.tx_packets);
+ }
+
+ res = simple_read_from_buffer(userbuf, count, ppos, buf, len);
+ kfree(buf);
+
+ return res;
+}
+STA_OPS(txqs);
+

#define DEBUGFS_ADD(name) \
debugfs_create_file(#name, 0400, \
@@ -399,6 +443,7 @@ void ieee80211_sta_debugfs_add(struct sta_info *sta)
DEBUGFS_ADD(agg_status);
DEBUGFS_ADD(ht_capa);
DEBUGFS_ADD(vht_capa);
+ DEBUGFS_ADD(txqs);

DEBUGFS_ADD_COUNTER(rx_duplicates, rx_stats.num_duplicates);
DEBUGFS_ADD_COUNTER(rx_fragments, rx_stats.fragments);
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 56633b012ba1..47936b939591 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -17,6 +17,7 @@
#include <linux/slab.h>
#include <linux/skbuff.h>
#include <linux/etherdevice.h>
+#include <linux/moduleparam.h>
#include <linux/bitmap.h>
#include <linux/rcupdate.h>
#include <linux/export.h>
@@ -36,6 +37,11 @@
#include "wme.h"
#include "rate.h"

+static unsigned int fq_flows_cnt = 4096;
+module_param(fq_flows_cnt, uint, 0644);
+MODULE_PARM_DESC(fq_flows_cnt,
+ "Maximum number of txq fair queuing flows. ");
+
/* misc utils */

static inline void ieee80211_tx_stats(struct net_device *dev, u32 len)
@@ -1342,7 +1348,7 @@ int ieee80211_txq_setup_flows(struct ieee80211_local *local)
if (!local->ops->wake_tx_queue)
return 0;

- ret = fq_init(fq, 4096);
+ ret = fq_init(fq, max_t(u32, fq_flows_cnt, 1));
if (ret)
return ret;

--
2.1.4


2016-05-19 08:36:08

by Michal Kazior

Subject: [PATCHv5 4/5] mac80211: implement codel on fair queuing flows

There is no limit other than a global
packet count limit when using software queuing.
This means a single flow queue can grow insanely
long. This is particularly bad for TCP congestion
algorithms, which require a somewhat more
sophisticated frame dropping scheme than a mere
head-drop on limit overflow.

Hence apply a slightly modified (to fit the
knobs) CoDel5 on flow queues. This improves TCP
convergence and stability when combined with a
wireless driver that keeps its own tx queue/fifo
at a minimum fill level for the given link
conditions.

Signed-off-by: Michal Kazior <[email protected]>
---

Notes:
v4:
* removed internal codel.h and re-used in-kernel one

include/net/mac80211.h | 14 +++++-
net/mac80211/ieee80211_i.h | 5 +++
net/mac80211/tx.c | 109 ++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index a8683aec6dbe..a52009ffc19f 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -21,6 +21,7 @@
#include <linux/skbuff.h>
#include <linux/ieee80211.h>
#include <net/cfg80211.h>
+#include <net/codel.h>
#include <asm/unaligned.h>

/**
@@ -895,7 +896,18 @@ struct ieee80211_tx_info {
unsigned long jiffies;
};
/* NB: vif can be NULL for injected frames */
- struct ieee80211_vif *vif;
+ union {
+ /* NB: vif can be NULL for injected frames */
+ struct ieee80211_vif *vif;
+
+ /* When packets are enqueued on txq it's easy
+ * to re-construct the vif pointer. There's no
+ * more space in tx_info so it can be used to
+ * store the necessary enqueue time for packet
+ * sojourn time computation.
+ */
+ codel_time_t enqueue_time;
+ };
struct ieee80211_key_conf *hw_key;
u32 flags;
/* 4 bytes free */
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 6f8375f1df88..54edfb6fc1d1 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -812,10 +812,12 @@ enum txq_info_flags {
* @tin: contains packets split into multiple flows
* @def_flow: used as a fallback flow when a packet destined to @tin hashes to
* a fq_flow which is already owned by a different tin
+ * @def_cvars: codel vars for @def_flow
*/
struct txq_info {
struct fq_tin tin;
struct fq_flow def_flow;
+ struct codel_vars def_cvars;
unsigned long flags;

/* keep last! */
@@ -1108,6 +1110,9 @@ struct ieee80211_local {
struct ieee80211_hw hw;

struct fq fq;
+ struct codel_vars *cvars;
+ struct codel_params cparams;
+ struct codel_stats cstats;

const struct ieee80211_ops *ops;

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 2b60b10e6990..a7c9b6704ffb 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -25,6 +25,8 @@
#include <net/ieee80211_radiotap.h>
#include <net/cfg80211.h>
#include <net/mac80211.h>
+#include <net/codel.h>
+#include <net/codel_impl.h>
#include <asm/unaligned.h>
#include <net/fq_impl.h>

@@ -1273,11 +1275,92 @@ static struct txq_info *ieee80211_get_txq(struct ieee80211_local *local,
return to_txq_info(txq);
}

+static void ieee80211_set_skb_enqueue_time(struct sk_buff *skb)
+{
+ IEEE80211_SKB_CB(skb)->control.enqueue_time = codel_get_time();
+}
+
+static void ieee80211_set_skb_vif(struct sk_buff *skb, struct txq_info *txqi)
+{
+ IEEE80211_SKB_CB(skb)->control.vif = txqi->txq.vif;
+}
+
+static u32 codel_skb_len_func(const struct sk_buff *skb)
+{
+ return skb->len;
+}
+
+static codel_time_t codel_skb_time_func(const struct sk_buff *skb)
+{
+ const struct ieee80211_tx_info *info;
+
+ info = (const struct ieee80211_tx_info *)skb->cb;
+ return info->control.enqueue_time;
+}
+
+static struct sk_buff *codel_dequeue_func(struct codel_vars *cvars,
+ void *ctx)
+{
+ struct ieee80211_local *local;
+ struct txq_info *txqi;
+ struct fq *fq;
+ struct fq_flow *flow;
+
+ txqi = ctx;
+ local = vif_to_sdata(txqi->txq.vif)->local;
+ fq = &local->fq;
+
+ if (cvars == &txqi->def_cvars)
+ flow = &txqi->def_flow;
+ else
+ flow = &fq->flows[cvars - local->cvars];
+
+ return fq_flow_dequeue(fq, flow);
+}
+
+static void codel_drop_func(struct sk_buff *skb,
+ void *ctx)
+{
+ struct ieee80211_local *local;
+ struct ieee80211_hw *hw;
+ struct txq_info *txqi;
+
+ txqi = ctx;
+ local = vif_to_sdata(txqi->txq.vif)->local;
+ hw = &local->hw;
+
+ ieee80211_free_txskb(hw, skb);
+}
+
static struct sk_buff *fq_tin_dequeue_func(struct fq *fq,
struct fq_tin *tin,
struct fq_flow *flow)
{
- return fq_flow_dequeue(fq, flow);
+ struct ieee80211_local *local;
+ struct txq_info *txqi;
+ struct codel_vars *cvars;
+ struct codel_params *cparams;
+ struct codel_stats *cstats;
+
+ local = container_of(fq, struct ieee80211_local, fq);
+ txqi = container_of(tin, struct txq_info, tin);
+ cparams = &local->cparams;
+ cstats = &local->cstats;
+
+ if (flow == &txqi->def_flow)
+ cvars = &txqi->def_cvars;
+ else
+ cvars = &local->cvars[flow - fq->flows];
+
+ return codel_dequeue(txqi,
+ &flow->backlog,
+ cparams,
+ cvars,
+ cstats,
+ codel_skb_len_func,
+ codel_skb_time_func,
+ codel_drop_func,
+ codel_dequeue_func);
}

static void fq_skb_free_func(struct fq *fq,
@@ -1309,6 +1392,7 @@ static void ieee80211_txq_enqueue(struct ieee80211_local *local,
struct fq *fq = &local->fq;
struct fq_tin *tin = &txqi->tin;

+ ieee80211_set_skb_enqueue_time(skb);
fq_tin_enqueue(fq, tin, skb,
fq_skb_free_func,
fq_flow_get_default_func);
@@ -1320,6 +1404,7 @@ void ieee80211_txq_init(struct ieee80211_sub_if_data *sdata,
{
fq_tin_init(&txqi->tin);
fq_flow_init(&txqi->def_flow);
+ codel_vars_init(&txqi->def_cvars);

txqi->txq.vif = &sdata->vif;

@@ -1348,6 +1433,7 @@ int ieee80211_txq_setup_flows(struct ieee80211_local *local)
{
struct fq *fq = &local->fq;
int ret;
+ int i;

if (!local->ops->wake_tx_queue)
return 0;
@@ -1356,6 +1442,22 @@ int ieee80211_txq_setup_flows(struct ieee80211_local *local)
if (ret)
return ret;

+ codel_params_init(&local->cparams);
+ codel_stats_init(&local->cstats);
+ local->cparams.interval = MS2TIME(100);
+ local->cparams.target = MS2TIME(20);
+ local->cparams.ecn = true;
+
+ local->cvars = kcalloc(fq->flows_cnt, sizeof(local->cvars[0]),
+ GFP_KERNEL);
+ if (!local->cvars) {
+ fq_reset(fq, fq_skb_free_func);
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < fq->flows_cnt; i++)
+ codel_vars_init(&local->cvars[i]);
+
return 0;
}

@@ -1366,6 +1468,9 @@ void ieee80211_txq_teardown_flows(struct ieee80211_local *local)
if (!local->ops->wake_tx_queue)
return;

+ kfree(local->cvars);
+ local->cvars = NULL;
+
fq_reset(fq, fq_skb_free_func);
}

@@ -1388,6 +1493,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
if (!skb)
goto out;

+ ieee80211_set_skb_vif(skb, txqi);
+
hdr = (struct ieee80211_hdr *)skb->data;
if (txq->sta && ieee80211_is_data_qos(hdr->frame_control)) {
struct sta_info *sta = container_of(txq->sta, struct sta_info,
--
2.1.4


2016-05-05 15:30:34

by Dave Taht

Subject: Re: [PATCHv4 4/5] mac80211: implement codel on fair queuing flows

On Thu, May 5, 2016 at 4:00 AM, Michal Kazior <[email protected]> wrote:
> There is no other limit other than a global
> packet count limit when using software queuing.
> This means a single flow queue can grow insanely
> long. This is particularly bad for TCP congestion
> algorithms which requires a little more
> sophisticated frame dropping scheme than a mere
> headdrop on limit overflow.
>
> Hence apply (a slighly modified, to fit the knobs)
> CoDel5 on flow queues. This improves TCP
> convergence and stability when combined with
> wireless driver which keeps its own tx queue/fifo
> at a minimum fill level for given link conditions.
>
> Signed-off-by: Michal Kazior <[email protected]>
> ---
>
> Notes:
> v4:
> * removed internal codel.h and re-used in-kernel one
>
> include/net/mac80211.h | 14 +++++-
> net/mac80211/ieee80211_i.h | 5 +++
> net/mac80211/tx.c | 109 ++++++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 126 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
> index ffb90dfe0d70..cc534f1b0f8e 100644
> --- a/include/net/mac80211.h
> +++ b/include/net/mac80211.h
> @@ -21,6 +21,7 @@
> #include <linux/skbuff.h>
> #include <linux/ieee80211.h>
> #include <net/cfg80211.h>
> +#include <net/codel.h>
> #include <asm/unaligned.h>
>
> /**
> @@ -895,7 +896,18 @@ struct ieee80211_tx_info {
> unsigned long jiffies;
> };
> /* NB: vif can be NULL for injected frames */
> - struct ieee80211_vif *vif;
> + union {
> + /* NB: vif can be NULL for injected frames */
> + struct ieee80211_vif *vif;
> +
> + /* When packets are enqueued on txq it's easy
> + * to re-construct the vif pointer. There's no
> + * more space in tx_info so it can be used to
> + * store the necessary enqueue time for packet
> + * sojourn time computation.
> + */
> + codel_time_t enqueue_time;
> + };

Can't the skb->timestamp be used instead? (or does that still stomp on tcp)

(my longstanding dream of course has been to always timestamp coming
off the rx ring, and to not have to do it on entrance to the codel
enqueue routine here. It adds measuring total system processing time
to the queue measurement, allows for offloaded timestamping, etc, but
did involve changing all of linux to use it)

> struct ieee80211_key_conf *hw_key;
> u32 flags;
> /* 4 bytes free */
> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
> index 6f8375f1df88..54edfb6fc1d1 100644
> --- a/net/mac80211/ieee80211_i.h
> +++ b/net/mac80211/ieee80211_i.h
> @@ -812,10 +812,12 @@ enum txq_info_flags {
> * @tin: contains packets split into multiple flows
> * @def_flow: used as a fallback flow when a packet destined to @tin hashes to
> * a fq_flow which is already owned by a different tin
> + * @def_cvars: codel vars for @def_flow
> */
> struct txq_info {
> struct fq_tin tin;
> struct fq_flow def_flow;
> + struct codel_vars def_cvars;
> unsigned long flags;
>
> /* keep last! */
> @@ -1108,6 +1110,9 @@ struct ieee80211_local {
> struct ieee80211_hw hw;
>
> struct fq fq;
> + struct codel_vars *cvars;
> + struct codel_params cparams;
> + struct codel_stats cstats;
>
> const struct ieee80211_ops *ops;
>
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 47936b939591..013b382f6888 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -25,6 +25,8 @@
> #include <net/ieee80211_radiotap.h>
> #include <net/cfg80211.h>
> #include <net/mac80211.h>
> +#include <net/codel.h>
> +#include <net/codel_impl.h>
> #include <asm/unaligned.h>
> #include <net/fq_impl.h>
>
> @@ -1269,11 +1271,92 @@ static struct txq_info *ieee80211_get_txq(struct ieee80211_local *local,
> return NULL;
> }
>
> +static void ieee80211_set_skb_enqueue_time(struct sk_buff *skb)
> +{
> + IEEE80211_SKB_CB(skb)->control.enqueue_time = codel_get_time();
> +}
> +
> +static void ieee80211_set_skb_vif(struct sk_buff *skb, struct txq_info *txqi)
> +{
> + IEEE80211_SKB_CB(skb)->control.vif = txqi->txq.vif;
> +}
> +
> +static u32 codel_skb_len_func(const struct sk_buff *skb)
> +{
> + return skb->len;
> +}
> +
> +static codel_time_t codel_skb_time_func(const struct sk_buff *skb)
> +{
> + const struct ieee80211_tx_info *info;
> +
> + info = (const struct ieee80211_tx_info *)skb->cb;
> + return info->control.enqueue_time;
> +}
> +
> +static struct sk_buff *codel_dequeue_func(struct codel_vars *cvars,
> + void *ctx)
> +{
> + struct ieee80211_local *local;
> + struct txq_info *txqi;
> + struct fq *fq;
> + struct fq_flow *flow;
> +
> + txqi = ctx;
> + local = vif_to_sdata(txqi->txq.vif)->local;
> + fq = &local->fq;
> +
> + if (cvars == &txqi->def_cvars)
> + flow = &txqi->def_flow;
> + else
> + flow = &fq->flows[cvars - local->cvars];
> +
> + return fq_flow_dequeue(fq, flow);
> +}
> +
> +static void codel_drop_func(struct sk_buff *skb,
> + void *ctx)
> +{
> + struct ieee80211_local *local;
> + struct ieee80211_hw *hw;
> + struct txq_info *txqi;
> +
> + txqi = ctx;
> + local = vif_to_sdata(txqi->txq.vif)->local;
> + hw = &local->hw;
> +
> + ieee80211_free_txskb(hw, skb);
> +}
> +
> static struct sk_buff *fq_tin_dequeue_func(struct fq *fq,
> struct fq_tin *tin,
> struct fq_flow *flow)
> {
> - return fq_flow_dequeue(fq, flow);
> + struct ieee80211_local *local;
> + struct txq_info *txqi;
> + struct codel_vars *cvars;
> + struct codel_params *cparams;
> + struct codel_stats *cstats;
> +
> + local = container_of(fq, struct ieee80211_local, fq);
> + txqi = container_of(tin, struct txq_info, tin);
> + cparams = &local->cparams;
> + cstats = &local->cstats;
> +
> + if (flow == &txqi->def_flow)
> + cvars = &txqi->def_cvars;
> + else
> + cvars = &local->cvars[flow - fq->flows];
> +
> + return codel_dequeue(txqi,
> + &flow->backlog,
> + cparams,
> + cvars,
> + cstats,
> + codel_skb_len_func,
> + codel_skb_time_func,
> + codel_drop_func,
> + codel_dequeue_func);
> }
>
> static void fq_skb_free_func(struct fq *fq,
> @@ -1305,6 +1388,7 @@ static void ieee80211_txq_enqueue(struct ieee80211_local *local,
> struct fq *fq = &local->fq;
> struct fq_tin *tin = &txqi->tin;
>
> + ieee80211_set_skb_enqueue_time(skb);
> fq_tin_enqueue(fq, tin, skb,
> fq_skb_free_func,
> fq_flow_get_default_func);
> @@ -1316,6 +1400,7 @@ void ieee80211_txq_init(struct ieee80211_sub_if_data *sdata,
> {
> fq_tin_init(&txqi->tin);
> fq_flow_init(&txqi->def_flow);
> + codel_vars_init(&txqi->def_cvars);
>
> txqi->txq.vif = &sdata->vif;
>
> @@ -1344,6 +1429,7 @@ int ieee80211_txq_setup_flows(struct ieee80211_local *local)
> {
> struct fq *fq = &local->fq;
> int ret;
> + int i;
>
> if (!local->ops->wake_tx_queue)
> return 0;
> @@ -1352,6 +1438,22 @@ int ieee80211_txq_setup_flows(struct ieee80211_local *local)
> if (ret)
> return ret;
>
> + codel_params_init(&local->cparams);
> + codel_stats_init(&local->cstats);
> + local->cparams.interval = MS2TIME(100);
> + local->cparams.target = MS2TIME(20);
> + local->cparams.ecn = true;
> +
> + local->cvars = kcalloc(fq->flows_cnt, sizeof(local->cvars[0]),
> + GFP_KERNEL);
> + if (!local->cvars) {
> + fq_reset(fq, fq_skb_free_func);
> + return -ENOMEM;
> + }
> +
> + for (i = 0; i < fq->flows_cnt; i++)
> + codel_vars_init(&local->cvars[i]);
> +
> return 0;
> }
>
> @@ -1362,6 +1464,9 @@ void ieee80211_txq_teardown_flows(struct ieee80211_local *local)
> if (!local->ops->wake_tx_queue)
> return;
>
> + kfree(local->cvars);
> + local->cvars = NULL;
> +
> fq_reset(fq, fq_skb_free_func);
> }
>
> @@ -1384,6 +1489,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
> if (!skb)
> goto out;
>
> + ieee80211_set_skb_vif(skb, txqi);
> +
> hdr = (struct ieee80211_hdr *)skb->data;
> if (txq->sta && ieee80211_is_data_qos(hdr->frame_control)) {
> struct sta_info *sta = container_of(txq->sta, struct sta_info,
> --
> 2.1.4
>



--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

2016-05-05 10:58:49

by Michal Kazior

[permalink] [raw]
Subject: [PATCHv4 0/5] mac80211: implement fq_codel

Hi,

This patchset disables qdiscs for drivers
using software queuing and performs fq_codel-like
dequeuing on txqs.

This is based on net-next/master
(035cd6ba53eff060760c4f4d11339fcc916a967c).

For anyone interested I've pushed tree with my
(now oldish) ath10k DQL RFC and a small fix I've
been testing:

https://github.com/kazikcz/linux/tree/fqmac-v4%2Bdqlrfc%2Bcpuregrfix

Background:

https://www.spinics.net/lists/linux-wireless/msg149776.html
https://www.spinics.net/lists/linux-wireless/msg148714.html
https://www.spinics.net/lists/linux-wireless/msg149039.html
http://blog.cerowrt.org/post/dql_on_wifi_2/
http://blog.cerowrt.org/post/dql_on_wifi/
http://blog.cerowrt.org/post/fq_codel_on_ath10k/


v4:
* the taildrop stop-gap patch moved to
per-interface limit (instead of per-radio) [Johannes]
* pushed fq.h and codel.h changes to include/net/ [Johannes]

v3:
* split taildrop, fq and codel functionalities
into separate patches [Avery]

v2:
* fix invalid ptr deref
* fix compilation for backports


Michal Kazior (5):
mac80211: skip netdev queue control with software queuing
mac80211: implement fair queueing per txq
mac80211: add debug knobs for fair queuing
mac80211: implement codel on fair queuing flows
mac80211: add debug knobs for codel

include/net/mac80211.h | 18 ++-
net/mac80211/agg-tx.c | 8 +-
net/mac80211/debugfs.c | 117 ++++++++++++++++
net/mac80211/debugfs_netdev.c | 28 +++-
net/mac80211/debugfs_sta.c | 45 +++++++
net/mac80211/ieee80211_i.h | 31 ++++-
net/mac80211/iface.c | 26 ++--
net/mac80211/main.c | 10 +-
net/mac80211/rx.c | 2 +-
net/mac80211/sta_info.c | 14 +-
net/mac80211/tx.c | 302 ++++++++++++++++++++++++++++++++++++------
net/mac80211/util.c | 34 ++---
12 files changed, 532 insertions(+), 103 deletions(-)

--
2.1.4


2016-05-06 05:51:17

by Dave Taht

[permalink] [raw]
Subject: Re: [PATCHv4 5/5] mac80211: add debug knobs for codel

On Thu, May 5, 2016 at 10:27 PM, Michal Kazior <[email protected]> wrote:
> On 5 May 2016 at 17:21, Dave Taht <[email protected]> wrote:
>> On Thu, May 5, 2016 at 4:00 AM, Michal Kazior <[email protected]> wrote:
>>> This adds a few debugfs entries to make it easier
>>> to test, debug and experiment.
>>
>> I might argue in favor of moving all these (inc the fq ones) into
>> their own dir, maybe "aqm" or "sqm".
>>
>> The mixture of read only stats and configuration vars is a bit confusing.
>>
>> Also in my testing of the previous patch, actually seeing the stats
>> get updated seemed to be highly async or inaccurate. For example, it
>> was obvious from the captures themselves that codel_ce_mark-ing was
>> happening, but the actual numbers were out of whack with the marks
>> seen or the fq_backlog observed. (I can go back to revisit this)
>
> That's kind of expected since all of these bits are exposed as
> separate debugfs entries/files. To avoid that it'd be necessary to
> provide a single debugfs entry/file whose contents are generated on
> open() while holding local->fq.lock. But then you could argue it
> should contain all per-sta-tid info (backlog, flows, drops) as well,
> instead of having it in netdev*/stations/*/txqs.
> Hmm..

I have not had time to write up today's results to any full extent, but
they were pretty spectacular.

I have a comparison of the baseline ath10k driver vs your 3.5 patchset
here on the second plot:

http://blog.cerowrt.org/post/predictive_codeling/

The raw data is here:
https://github.com/dtaht/blog-cerowrt/tree/master/content/flent/qca-10.2-fqmac35-codel-5

...

a note: a quantum of the MTU (typically 1514) is a saner default than 300,

(the older patch I had set it to 300; dunno what your default is now).

and quantum 1514, codel target 5ms rather than 20ms for this test
series was *just fine* (but more testing of the lower target is
needed)

However:

quantum "300" only makes sense for very, very low bandwidths (say <
6 Mbit/s); in other scenarios it just eats extra CPU (several passes
through the scheduler loop to send a full-size packet) and disables
the "new/old" queue feature which helps "push" new flows toward flow
balance. I'd default it to the larger value.




...

In other news, SpaceX just landed on the barge a few minutes ago.

The webcast is still going on
https://www.youtube.com/watch?v=L0bMeDj76ig and you can reverse it to
the landing.
:awesome:

>
>
> Michał



--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

2016-05-05 10:58:57

by Michal Kazior

[permalink] [raw]
Subject: [PATCHv4 5/5] mac80211: add debug knobs for codel

This adds a few debugfs entries to make it easier
to test, debug and experiment.

Signed-off-by: Michal Kazior <[email protected]>
---

Notes:
v4:
* stats adjustments (in-kernel codel has more of them)

net/mac80211/debugfs.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)

diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 43592b6f79f0..c7cfedc61fc4 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -124,6 +124,15 @@ static const struct file_operations name## _ops = { \
res; \
})

+#define DEBUGFS_RW_BOOL(arg) \
+({ \
+ int res; \
+ int val; \
+ res = mac80211_parse_buffer(userbuf, count, ppos, "%d", &val); \
+ arg = !!(val); \
+ res; \
+})
+
DEBUGFS_READONLY_FILE(fq_flows_cnt, "%u",
local->fq.flows_cnt);
DEBUGFS_READONLY_FILE(fq_backlog, "%u",
@@ -132,6 +141,16 @@ DEBUGFS_READONLY_FILE(fq_overlimit, "%u",
local->fq.overlimit);
DEBUGFS_READONLY_FILE(fq_collisions, "%u",
local->fq.collisions);
+DEBUGFS_READONLY_FILE(codel_maxpacket, "%u",
+ local->cstats.maxpacket);
+DEBUGFS_READONLY_FILE(codel_drop_count, "%u",
+ local->cstats.drop_count);
+DEBUGFS_READONLY_FILE(codel_drop_len, "%u",
+ local->cstats.drop_len);
+DEBUGFS_READONLY_FILE(codel_ecn_mark, "%u",
+ local->cstats.ecn_mark);
+DEBUGFS_READONLY_FILE(codel_ce_mark, "%u",
+ local->cstats.ce_mark);

DEBUGFS_RW_FILE(fq_limit,
DEBUGFS_RW_EXPR_FQ("%u", &local->fq.limit),
@@ -139,6 +158,18 @@ DEBUGFS_RW_FILE(fq_limit,
DEBUGFS_RW_FILE(fq_quantum,
DEBUGFS_RW_EXPR_FQ("%u", &local->fq.quantum),
"%u", local->fq.quantum);
+DEBUGFS_RW_FILE(codel_interval,
+ DEBUGFS_RW_EXPR_FQ("%u", &local->cparams.interval),
+ "%u", local->cparams.interval);
+DEBUGFS_RW_FILE(codel_target,
+ DEBUGFS_RW_EXPR_FQ("%u", &local->cparams.target),
+ "%u", local->cparams.target);
+DEBUGFS_RW_FILE(codel_mtu,
+ DEBUGFS_RW_EXPR_FQ("%u", &local->cparams.mtu),
+ "%u", local->cparams.mtu);
+DEBUGFS_RW_FILE(codel_ecn,
+ DEBUGFS_RW_BOOL(local->cparams.ecn),
+ "%d", local->cparams.ecn ? 1 : 0);

#ifdef CONFIG_PM
static ssize_t reset_write(struct file *file, const char __user *user_buf,
@@ -333,6 +364,15 @@ void debugfs_hw_add(struct ieee80211_local *local)
DEBUGFS_ADD(fq_collisions);
DEBUGFS_ADD(fq_limit);
DEBUGFS_ADD(fq_quantum);
+ DEBUGFS_ADD(codel_maxpacket);
+ DEBUGFS_ADD(codel_drop_count);
+ DEBUGFS_ADD(codel_drop_len);
+ DEBUGFS_ADD(codel_ecn_mark);
+ DEBUGFS_ADD(codel_ce_mark);
+ DEBUGFS_ADD(codel_interval);
+ DEBUGFS_ADD(codel_target);
+ DEBUGFS_ADD(codel_mtu);
+ DEBUGFS_ADD(codel_ecn);

statsd = debugfs_create_dir("statistics", phyd);

--
2.1.4


2016-05-06 07:23:34

by Dave Taht

[permalink] [raw]
Subject: Re: [PATCHv4 5/5] mac80211: add debug knobs for codel

On Thu, May 5, 2016 at 11:33 PM, Michal Kazior <[email protected]> wrote:
> On 6 May 2016 at 07:51, Dave Taht <[email protected]> wrote:
>> On Thu, May 5, 2016 at 10:27 PM, Michal Kazior <[email protected]> wrote:
>>> On 5 May 2016 at 17:21, Dave Taht <[email protected]> wrote:
>>>> On Thu, May 5, 2016 at 4:00 AM, Michal Kazior <[email protected]> wrote:
>>>>> This adds a few debugfs entries to make it easier
>>>>> to test, debug and experiment.
>>>>
>>>> I might argue in favor of moving all these (inc the fq ones) into
>>>> their own dir, maybe "aqm" or "sqm".
>>>>
>>>> The mixture of read only stats and configuration vars is a bit confusing.
>>>>
>>>> Also in my testing of the previous patch, actually seeing the stats
>>>> get updated seemed to be highly async or inaccurate. For example, it
>>>> was obvious from the captures themselves that codel_ce_mark-ing was
>>>> happening, but the actual numbers were out of whack with the marks
>>>> seen or the fq_backlog observed. (I can go back to revisit this)
>>>
>>> That's kind of expected since all of these bits are exposed as
>>> separate debugfs entries/files. To avoid that it'd be necessary to
>>> provide a single debugfs entry/file whose contents are generated on
>>> open() while holding local->fq.lock. But then you could argue it
>>> should contain all per-sta-tid info (backlog, flows, drops) as well,
>>> instead of having it in netdev*/stations/*/txqs.
>>> Hmm..
>>
>> I have not had time to write up today's results to any full extent, but
>> they were pretty spectacular.
>>
>> I have a comparison of the baseline ath10k driver vs your 3.5 patchset
>> here on the second plot:
>>
>> http://blog.cerowrt.org/post/predictive_codeling/
>>
>> The raw data is here:
>> https://github.com/dtaht/blog-cerowrt/tree/master/content/flent/qca-10.2-fqmac35-codel-5
>
> It's probably good to explicitly mention that you test(ed) ath10k with
> my RFC DQL patch applied. Without it the fqcodel benefits are a lot
> less significant.

Yes. I am trying to establish a baseline before and after, starting at
the max rate my ath9k (2x2) can take the ath10k (2x2) at a distance of
about 12 feet. Without moving anything.

https://github.com/dtaht/blog-cerowrt/tree/master/content/flent/stock-4.4.1-22
has the baseline stats from that ubuntu 16.04 kernel...
but the comparison plots I'd generated there were against the ct-10.1
firmware, and before I'd realized you'd used the smaller quantum. Life
is *even* better using the bigger quantum in the
qca-10.2-fqmac35-codel-5 patchset.

>
> (oh, and the "3.5" is pre-PATCHv4 before fq/codel split work:
> https://github.com/kazikcz/linux/tree/fqmac-v3.5 )

I have insufficient time in life to track any but the most advanced
patchset, and I am catching up as fast as I can. First up was finding
the max ath9k performance, (5x reduction in latency, no reduction in
throughput at about 110mbit).

Then I'll try locking the bitrate at say 24mbit for another run. You
already showed the latency reduction at 6mbit at about 100x to 1, so I
don't plan to repeat that.

then I'll get another ath10k 3x3 up and wash, rinse, repeat.

I would not mind if your patch 4.1 had good stats generation (maybe
put all the relevant stats in a single file?) and defaulted to quantum
1514, since it seems likely I'll not get this first test run done
before Monday.

Additional test suggestions wanted? I plan to add the tcp_square_wave
tests to the next run to show how much better the congestion control
is, and I'll add iperf3 floods too.

I am not sure how avery is planning to test each individual piece.

>
>>
>> ...
>>
>> a note: a quantum of the MTU (typically 1514) is a saner default than 300,
>>
>> (the older patch I had set it to 300; dunno what your default is now).
>
> I still use 300.
>
>
>> and quantum 1514, codel target 5ms rather than 20ms for this test
>> series was *just fine* (but more testing of the lower target is
>> needed)
>
> I would keep 20ms for now until we get more test data. I'm mostly
> concerned about MU performance on ath10k, which requires a significant
> amount of buffering.

ok.

>
>> However:
>>
>> quantum "300" only makes sense for very, very low bandwidths (say <
>> 6 Mbit/s); in other scenarios it just eats extra CPU (several passes
>> through the scheduler loop to send a full-size packet) and disables
>> the "new/old" queue feature which helps "push" new flows toward flow
>> balance. I'd default it to the larger value.
>
> Perhaps this could be dynamically adjusted to match the slowest
> station known to rate control in the future?

Meh. We've done a lot of fiddling with the quantum to not much avail.
300 was a good number at really low rates. The rest of the time, the
larger number is vastly easier on CPU AND on flow balance.

https://dev.openwrt.org/ticket/21326

> Oh, and there's
> multicast..

Multicast gets its own queue in the per-station queueing model.

>
>
> Michał



--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

2016-05-05 10:58:53

by Michal Kazior

[permalink] [raw]
Subject: [PATCHv4 2/5] mac80211: implement fair queueing per txq

mac80211's software queues were designed to work
very closely with device tx queues. They are
required to make use of 802.11 packet aggregation
easily and efficiently.

Due to the way 802.11 aggregation is designed it
only makes sense to keep fair queuing as close to
hardware as possible to reduce induced latency and
inertia and provide the best flow responsiveness.

This change doesn't translate directly to
immediate and significant gains. The end result
depends on the driver's induced latency. Best
results are achieved if the driver keeps its own
tx queue/fifo fill level to a minimum.

Signed-off-by: Michal Kazior <[email protected]>
---

Notes:
v4:
* removed internal fq.h and re-used in-kernel one

net/mac80211/agg-tx.c | 8 ++-
net/mac80211/ieee80211_i.h | 24 ++++++--
net/mac80211/iface.c | 12 ++--
net/mac80211/main.c | 7 +++
net/mac80211/rx.c | 2 +-
net/mac80211/sta_info.c | 14 ++---
net/mac80211/tx.c | 137 ++++++++++++++++++++++++++++++++++++++-------
net/mac80211/util.c | 23 +-------
8 files changed, 163 insertions(+), 64 deletions(-)

diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
index 42fa81031dfa..5650c46bf91a 100644
--- a/net/mac80211/agg-tx.c
+++ b/net/mac80211/agg-tx.c
@@ -194,17 +194,21 @@ static void
ieee80211_agg_stop_txq(struct sta_info *sta, int tid)
{
struct ieee80211_txq *txq = sta->sta.txq[tid];
+ struct ieee80211_sub_if_data *sdata;
+ struct fq *fq;
struct txq_info *txqi;

if (!txq)
return;

txqi = to_txq_info(txq);
+ sdata = vif_to_sdata(txq->vif);
+ fq = &sdata->local->fq;

/* Lock here to protect against further seqno updates on dequeue */
- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);
set_bit(IEEE80211_TXQ_STOP, &txqi->flags);
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);
}

static void
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 634603320374..6f8375f1df88 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -30,6 +30,7 @@
#include <net/ieee80211_radiotap.h>
#include <net/cfg80211.h>
#include <net/mac80211.h>
+#include <net/fq.h>
#include "key.h"
#include "sta_info.h"
#include "debug.h"
@@ -805,10 +806,17 @@ enum txq_info_flags {
IEEE80211_TXQ_NO_AMSDU,
};

+/**
+ * struct txq_info - per tid queue
+ *
+ * @tin: contains packets split into multiple flows
+ * @def_flow: used as a fallback flow when a packet destined to @tin hashes to
+ * a fq_flow which is already owned by a different tin
+ */
struct txq_info {
- struct sk_buff_head queue;
+ struct fq_tin tin;
+ struct fq_flow def_flow;
unsigned long flags;
- unsigned long byte_cnt;

/* keep last! */
struct ieee80211_txq txq;
@@ -1099,6 +1107,8 @@ struct ieee80211_local {
* it first anyway so they become a no-op */
struct ieee80211_hw hw;

+ struct fq fq;
+
const struct ieee80211_ops *ops;

/*
@@ -1931,9 +1941,13 @@ static inline bool ieee80211_can_run_worker(struct ieee80211_local *local)
return true;
}

-void ieee80211_init_tx_queue(struct ieee80211_sub_if_data *sdata,
- struct sta_info *sta,
- struct txq_info *txq, int tid);
+int ieee80211_txq_setup_flows(struct ieee80211_local *local);
+void ieee80211_txq_teardown_flows(struct ieee80211_local *local);
+void ieee80211_txq_init(struct ieee80211_sub_if_data *sdata,
+ struct sta_info *sta,
+ struct txq_info *txq, int tid);
+void ieee80211_txq_purge(struct ieee80211_local *local,
+ struct txq_info *txqi);
void ieee80211_send_auth(struct ieee80211_sub_if_data *sdata,
u16 transaction, u16 auth_alg, u16 status,
const u8 *extra, size_t extra_len, const u8 *bssid,
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 609c5174d798..b123a9e325b3 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -779,6 +779,7 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
bool going_down)
{
struct ieee80211_local *local = sdata->local;
+ struct fq *fq = &local->fq;
unsigned long flags;
struct sk_buff *skb, *tmp;
u32 hw_reconf_flags = 0;
@@ -976,13 +977,10 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,

if (sdata->vif.txq) {
struct txq_info *txqi = to_txq_info(sdata->vif.txq);
- int n = skb_queue_len(&txqi->queue);

- spin_lock_bh(&txqi->queue.lock);
- ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
- atomic_sub(n, &sdata->num_tx_queued);
- txqi->byte_cnt = 0;
- spin_unlock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);
+ ieee80211_txq_purge(local, txqi);
+ spin_unlock_bh(&fq->lock);
}

if (local->open_count == 0)
@@ -1792,7 +1790,7 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,

if (txq_size) {
txqi = netdev_priv(ndev) + size;
- ieee80211_init_tx_queue(sdata, NULL, txqi, 0);
+ ieee80211_txq_init(sdata, NULL, txqi, 0);
}

sdata->dev = ndev;
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 160ac6b8b9a1..d00ea9b13f49 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -1086,6 +1086,10 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)

rtnl_unlock();

+ result = ieee80211_txq_setup_flows(local);
+ if (result)
+ goto fail_flows;
+
#ifdef CONFIG_INET
local->ifa_notifier.notifier_call = ieee80211_ifa_changed;
result = register_inetaddr_notifier(&local->ifa_notifier);
@@ -1111,6 +1115,8 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
#if defined(CONFIG_INET) || defined(CONFIG_IPV6)
fail_ifa:
#endif
+ ieee80211_txq_teardown_flows(local);
+ fail_flows:
rtnl_lock();
rate_control_deinitialize(local);
ieee80211_remove_interfaces(local);
@@ -1169,6 +1175,7 @@ void ieee80211_unregister_hw(struct ieee80211_hw *hw)
skb_queue_purge(&local->skb_queue);
skb_queue_purge(&local->skb_queue_unreliable);
skb_queue_purge(&local->skb_queue_tdls_chsw);
+ ieee80211_txq_teardown_flows(local);

destroy_workqueue(local->workqueue);
wiphy_unregister(local->hw.wiphy);
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index c5678703921e..14aae75a5c75 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1268,7 +1268,7 @@ static void sta_ps_start(struct sta_info *sta)
for (tid = 0; tid < ARRAY_SIZE(sta->sta.txq); tid++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[tid]);

- if (!skb_queue_len(&txqi->queue))
+ if (!txqi->tin.backlog_packets)
set_bit(tid, &sta->txq_buffered_tids);
else
clear_bit(tid, &sta->txq_buffered_tids);
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 177cc6cd6416..76b737dcc36f 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -90,6 +90,7 @@ static void __cleanup_single_sta(struct sta_info *sta)
struct tid_ampdu_tx *tid_tx;
struct ieee80211_sub_if_data *sdata = sta->sdata;
struct ieee80211_local *local = sdata->local;
+ struct fq *fq = &local->fq;
struct ps_data *ps;

if (test_sta_flag(sta, WLAN_STA_PS_STA) ||
@@ -113,11 +114,10 @@ static void __cleanup_single_sta(struct sta_info *sta)
if (sta->sta.txq[0]) {
for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[i]);
- int n = skb_queue_len(&txqi->queue);

- ieee80211_purge_tx_queue(&local->hw, &txqi->queue);
- atomic_sub(n, &sdata->num_tx_queued);
- txqi->byte_cnt = 0;
+ spin_lock_bh(&fq->lock);
+ ieee80211_txq_purge(local, txqi);
+ spin_unlock_bh(&fq->lock);
}
}

@@ -368,7 +368,7 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
struct txq_info *txq = txq_data + i * size;

- ieee80211_init_tx_queue(sdata, sta, txq, i);
+ ieee80211_txq_init(sdata, sta, txq, i);
}
}

@@ -1211,7 +1211,7 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta)
for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[i]);

- if (!skb_queue_len(&txqi->queue))
+ if (!txqi->tin.backlog_packets)
continue;

drv_wake_tx_queue(local, txqi);
@@ -1648,7 +1648,7 @@ ieee80211_sta_ps_deliver_response(struct sta_info *sta,
for (tid = 0; tid < ARRAY_SIZE(sta->sta.txq); tid++) {
struct txq_info *txqi = to_txq_info(sta->sta.txq[tid]);

- if (!(tids & BIT(tid)) || skb_queue_len(&txqi->queue))
+ if (!(tids & BIT(tid)) || txqi->tin.backlog_packets)
continue;

sta_info_recalc_tim(sta);
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 792f01721d65..56633b012ba1 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -25,6 +25,7 @@
#include <net/cfg80211.h>
#include <net/mac80211.h>
#include <asm/unaligned.h>
+#include <net/fq_impl.h>

#include "ieee80211_i.h"
#include "driver-ops.h"
@@ -1262,45 +1263,121 @@ static struct txq_info *ieee80211_get_txq(struct ieee80211_local *local,
return NULL;
}

+static struct sk_buff *fq_tin_dequeue_func(struct fq *fq,
+ struct fq_tin *tin,
+ struct fq_flow *flow)
+{
+ return fq_flow_dequeue(fq, flow);
+}
+
+static void fq_skb_free_func(struct fq *fq,
+ struct fq_tin *tin,
+ struct fq_flow *flow,
+ struct sk_buff *skb)
+{
+ struct ieee80211_local *local;
+
+ local = container_of(fq, struct ieee80211_local, fq);
+ ieee80211_free_txskb(&local->hw, skb);
+}
+
+static struct fq_flow *fq_flow_get_default_func(struct fq *fq,
+ struct fq_tin *tin,
+ int idx,
+ struct sk_buff *skb)
+{
+ struct txq_info *txqi;
+
+ txqi = container_of(tin, struct txq_info, tin);
+ return &txqi->def_flow;
+}
+
static void ieee80211_txq_enqueue(struct ieee80211_local *local,
struct txq_info *txqi,
struct sk_buff *skb)
{
- struct ieee80211_sub_if_data *sdata = vif_to_sdata(txqi->txq.vif);
+ struct fq *fq = &local->fq;
+ struct fq_tin *tin = &txqi->tin;

- lockdep_assert_held(&txqi->queue.lock);
+ fq_tin_enqueue(fq, tin, skb,
+ fq_skb_free_func,
+ fq_flow_get_default_func);
+}

- if (atomic_read(&sdata->num_tx_queued) >= TOTAL_MAX_TX_BUFFER ||
- txqi->queue.qlen >= STA_MAX_TX_BUFFER) {
- ieee80211_free_txskb(&local->hw, skb);
- return;
+void ieee80211_txq_init(struct ieee80211_sub_if_data *sdata,
+ struct sta_info *sta,
+ struct txq_info *txqi, int tid)
+{
+ fq_tin_init(&txqi->tin);
+ fq_flow_init(&txqi->def_flow);
+
+ txqi->txq.vif = &sdata->vif;
+
+ if (sta) {
+ txqi->txq.sta = &sta->sta;
+ sta->sta.txq[tid] = &txqi->txq;
+ txqi->txq.tid = tid;
+ txqi->txq.ac = ieee802_1d_to_ac[tid & 7];
+ } else {
+ sdata->vif.txq = &txqi->txq;
+ txqi->txq.tid = 0;
+ txqi->txq.ac = IEEE80211_AC_BE;
}
+}

- atomic_inc(&sdata->num_tx_queued);
- txqi->byte_cnt += skb->len;
- __skb_queue_tail(&txqi->queue, skb);
+void ieee80211_txq_purge(struct ieee80211_local *local,
+ struct txq_info *txqi)
+{
+ struct fq *fq = &local->fq;
+ struct fq_tin *tin = &txqi->tin;
+
+ fq_tin_reset(fq, tin, fq_skb_free_func);
+}
+
+int ieee80211_txq_setup_flows(struct ieee80211_local *local)
+{
+ struct fq *fq = &local->fq;
+ int ret;
+
+ if (!local->ops->wake_tx_queue)
+ return 0;
+
+ ret = fq_init(fq, 4096);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+void ieee80211_txq_teardown_flows(struct ieee80211_local *local)
+{
+ struct fq *fq = &local->fq;
+
+ if (!local->ops->wake_tx_queue)
+ return;
+
+ fq_reset(fq, fq_skb_free_func);
}

struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
struct ieee80211_txq *txq)
{
- struct ieee80211_sub_if_data *sdata = vif_to_sdata(txq->vif);
+ struct ieee80211_local *local = hw_to_local(hw);
struct txq_info *txqi = container_of(txq, struct txq_info, txq);
struct ieee80211_hdr *hdr;
struct sk_buff *skb = NULL;
+ struct fq *fq = &local->fq;
+ struct fq_tin *tin = &txqi->tin;

- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);

if (test_bit(IEEE80211_TXQ_STOP, &txqi->flags))
goto out;

- skb = __skb_dequeue(&txqi->queue);
+ skb = fq_tin_dequeue(fq, tin, fq_tin_dequeue_func);
if (!skb)
goto out;

- atomic_dec(&sdata->num_tx_queued);
- txqi->byte_cnt -= skb->len;
-
hdr = (struct ieee80211_hdr *)skb->data;
if (txq->sta && ieee80211_is_data_qos(hdr->frame_control)) {
struct sta_info *sta = container_of(txq->sta, struct sta_info,
@@ -1315,7 +1392,7 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
}

out:
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);

if (skb && skb_has_frag_list(skb) &&
!ieee80211_hw_check(&local->hw, TX_FRAG_LIST))
@@ -1332,6 +1409,7 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
bool txpending)
{
struct ieee80211_tx_control control = {};
+ struct fq *fq = &local->fq;
struct sk_buff *skb, *tmp;
struct txq_info *txqi;
unsigned long flags;
@@ -1354,9 +1432,9 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,

__skb_unlink(skb, skbs);

- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);
ieee80211_txq_enqueue(local, txqi, skb);
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);

drv_wake_tx_queue(local, txqi);

@@ -2888,6 +2966,9 @@ static bool ieee80211_amsdu_aggregate(struct ieee80211_sub_if_data *sdata,
struct sk_buff *skb)
{
struct ieee80211_local *local = sdata->local;
+ struct fq *fq = &local->fq;
+ struct fq_tin *tin;
+ struct fq_flow *flow;
u8 tid = skb->priority & IEEE80211_QOS_CTL_TAG1D_MASK;
struct ieee80211_txq *txq = sta->sta.txq[tid];
struct txq_info *txqi;
@@ -2899,6 +2980,7 @@ static bool ieee80211_amsdu_aggregate(struct ieee80211_sub_if_data *sdata,
__be16 len;
void *data;
bool ret = false;
+ unsigned int orig_len;
int n = 1, nfrags;

if (!ieee80211_hw_check(&local->hw, TX_AMSDU))
@@ -2915,12 +2997,20 @@ static bool ieee80211_amsdu_aggregate(struct ieee80211_sub_if_data *sdata,
max_amsdu_len = min_t(int, max_amsdu_len,
sta->sta.max_rc_amsdu_len);

- spin_lock_bh(&txqi->queue.lock);
+ spin_lock_bh(&fq->lock);

- head = skb_peek_tail(&txqi->queue);
+ /* TODO: Ideally aggregation should be done on dequeue to remain
+ * responsive to environment changes.
+ */
+
+ tin = &txqi->tin;
+ flow = fq_flow_classify(fq, tin, skb, fq_flow_get_default_func);
+ head = skb_peek_tail(&flow->queue);
if (!head)
goto out;

+ orig_len = head->len;
+
if (skb->len + head->len > max_amsdu_len)
goto out;

@@ -2959,8 +3049,13 @@ static bool ieee80211_amsdu_aggregate(struct ieee80211_sub_if_data *sdata,
head->data_len += skb->len;
*frag_tail = skb;

+ flow->backlog += head->len - orig_len;
+ tin->backlog_bytes += head->len - orig_len;
+
+ fq_recalc_backlog(fq, tin, flow);
+
out:
- spin_unlock_bh(&txqi->queue.lock);
+ spin_unlock_bh(&fq->lock);

return ret;
}
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 8903285337da..7a50086fb84a 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -3389,25 +3389,6 @@ u8 *ieee80211_add_wmm_info_ie(u8 *buf, u8 qosinfo)
return buf;
}

-void ieee80211_init_tx_queue(struct ieee80211_sub_if_data *sdata,
- struct sta_info *sta,
- struct txq_info *txqi, int tid)
-{
- skb_queue_head_init(&txqi->queue);
- txqi->txq.vif = &sdata->vif;
-
- if (sta) {
- txqi->txq.sta = &sta->sta;
- sta->sta.txq[tid] = &txqi->txq;
- txqi->txq.tid = tid;
- txqi->txq.ac = ieee802_1d_to_ac[tid & 7];
- } else {
- sdata->vif.txq = &txqi->txq;
- txqi->txq.tid = 0;
- txqi->txq.ac = IEEE80211_AC_BE;
- }
-}
-
void ieee80211_txq_get_depth(struct ieee80211_txq *txq,
unsigned long *frame_cnt,
unsigned long *byte_cnt)
@@ -3415,9 +3396,9 @@ void ieee80211_txq_get_depth(struct ieee80211_txq *txq,
struct txq_info *txqi = to_txq_info(txq);

if (frame_cnt)
- *frame_cnt = txqi->queue.qlen;
+ *frame_cnt = txqi->tin.backlog_packets;

if (byte_cnt)
- *byte_cnt = txqi->byte_cnt;
+ *byte_cnt = txqi->tin.backlog_bytes;
}
EXPORT_SYMBOL(ieee80211_txq_get_depth);
--
2.1.4
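The A-MSDU aggregation hunk above charges the size growth of the head frame to both the per-flow and per-tin byte counters. A minimal userspace sketch of that bookkeeping, using stand-in structs (field names follow the patch, but these are not the kernel definitions):

```c
#include <assert.h>

/* Hypothetical, simplified mirrors of the per-flow and per-tin
 * counters the hunk maintains. */
struct fq_flow { unsigned int backlog; };
struct fq_tin  { unsigned int backlog_bytes; };

/* After appending an skb to the A-MSDU head frame, the head grew
 * from orig_len to new_len bytes; charge the growth to both the
 * flow and its tin, as the hunk does before fq_recalc_backlog(). */
static void fq_account_amsdu_growth(struct fq_tin *tin,
                                    struct fq_flow *flow,
                                    unsigned int orig_len,
                                    unsigned int new_len)
{
	unsigned int growth = new_len - orig_len;

	flow->backlog += growth;
	tin->backlog_bytes += growth;
}
```

Keeping both counters in sync matters because fq_recalc_backlog() reorders flows by their backlog, while the tin-level total feeds ieee80211_txq_get_depth().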


2016-05-31 12:31:02

by Michal Kazior

[permalink] [raw]
Subject: Re: [Make-wifi-fast] [PATCHv5 0/5] mac80211: implement fq_codel

On 31 May 2016 at 14:12, Toke Høiland-Jørgensen <[email protected]> wrote:
> Michal Kazior <[email protected]> writes:
>
>> This patchset disables qdiscs for drivers
>> using software queuing and performs fq_codel-like
>> dequeuing on txqs.
>
> Hi Michal
>
> Is this version in a git repo somewhere I can pull from? :)

Just pushed, enjoy!

https://github.com/kazikcz/linux/tree/fqmac-v5


Michał

2016-05-19 08:36:09

by Michal Kazior

[permalink] [raw]
Subject: [PATCHv5 5/5] mac80211: add debug knobs for codel

This adds a few debugfs entries to make it easier
to test, debug and experiment.

Signed-off-by: Michal Kazior <[email protected]>
---

Notes:
v5:
* use the single "aqm" debugfs knob [Dave]

v4:
* stats adjustments (in-kernel codel has more of them)

net/mac80211/debugfs.c | 33 +++++++++++++++++++++++++++++++--
1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 2906c1004e1a..53a315401a4b 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -126,13 +126,31 @@ static int aqm_open(struct inode *inode, struct file *file)
"R fq_overlimit %u\n"
"R fq_collisions %u\n"
"RW fq_limit %u\n"
- "RW fq_quantum %u\n",
+ "RW fq_quantum %u\n"
+ "R codel_maxpacket %u\n"
+ "R codel_drop_count %u\n"
+ "R codel_drop_len %u\n"
+ "R codel_ecn_mark %u\n"
+ "R codel_ce_mark %u\n"
+ "RW codel_interval %u\n"
+ "RW codel_target %u\n"
+ "RW codel_mtu %u\n"
+ "RW codel_ecn %u\n",
fq->flows_cnt,
fq->backlog,
fq->overlimit,
fq->collisions,
fq->limit,
- fq->quantum);
+ fq->quantum,
+ local->cstats.maxpacket,
+ local->cstats.drop_count,
+ local->cstats.drop_len,
+ local->cstats.ecn_mark,
+ local->cstats.ce_mark,
+ local->cparams.interval,
+ local->cparams.target,
+ local->cparams.mtu,
+ local->cparams.ecn ? 1U : 0U);

len += scnprintf(info->buf + len,
info->size - len,
@@ -214,6 +232,7 @@ static ssize_t aqm_write(struct file *file,
struct ieee80211_local *local = info->local;
char buf[100];
size_t len;
+ unsigned int ecn;

if (count > sizeof(buf))
return -EINVAL;
@@ -230,6 +249,16 @@ static ssize_t aqm_write(struct file *file,
return count;
else if (sscanf(buf, "fq_quantum %u", &local->fq.quantum) == 1)
return count;
+ else if (sscanf(buf, "codel_interval %u", &local->cparams.interval) == 1)
+ return count;
+ else if (sscanf(buf, "codel_target %u", &local->cparams.target) == 1)
+ return count;
+ else if (sscanf(buf, "codel_mtu %u", &local->cparams.mtu) == 1)
+ return count;
+ else if (sscanf(buf, "codel_ecn %u", &ecn) == 1) {
+ local->cparams.ecn = !!ecn;
+ return count;
+ }

return -EINVAL;
}
--
2.1.4
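The aqm_write() handler above dispatches on "name value" tokens with a chain of sscanf() calls, returning the byte count on a match and -EINVAL otherwise. A hypothetical userspace mirror of that dispatch for a subset of the knobs (struct and function names are illustrative, not mac80211's):

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the subset of knobs the write handler parses. */
struct aqm_params {
	unsigned int fq_limit;
	unsigned int codel_target;
	unsigned int codel_ecn;
};

/* Returns 0 on a recognized "name value" token, -1 otherwise,
 * mimicking the count / -EINVAL split in the patch. */
static int aqm_parse(const char *buf, struct aqm_params *p)
{
	unsigned int ecn;

	if (sscanf(buf, "fq_limit %u", &p->fq_limit) == 1)
		return 0;
	if (sscanf(buf, "codel_target %u", &p->codel_target) == 1)
		return 0;
	if (sscanf(buf, "codel_ecn %u", &ecn) == 1) {
		p->codel_ecn = !!ecn; /* normalize to 0/1 as the patch does */
		return 0;
	}
	return -1;
}
```

Note the codel_ecn case parses into a temporary first so an arbitrary nonzero value can be normalized before being stored, exactly the pattern the patch uses with its local `ecn` variable.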


2016-05-06 05:27:14

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCHv4 5/5] mac80211: add debug knobs for codel

On 5 May 2016 at 17:21, Dave Taht <[email protected]> wrote:
> On Thu, May 5, 2016 at 4:00 AM, Michal Kazior <[email protected]> wrote:
>> This adds a few debugfs entries to make it easier
>> to test, debug and experiment.
>
> I might argue in favor of moving all these (inc the fq ones) into
> their own dir, maybe "aqm" or "sqm".
>
> The mixture of read only stats and configuration vars is a bit confusing.
>
> Also in my testing of the previous patch, actually seeing the stats
> get updated seemed to be highly async or inaccurate. For example, it
> was obvious from the captures themselves that codel_ce_mark-ing was
> happening, but the actual numbers were out of whack with the marks
> seen or the fq_backlog seen. (I can go back to revisit this)

That's kind of expected since all of these bits are exposed as
separate debugfs entries/files. To avoid that it'd be necessary to
provide a single debugfs entry/file whose contents are generated on
open() while holding local->fq.lock. But then you could argue it
should contain all per-sta-tid info (backlog, flows, drops) as well
instead of having them in netdev*/stations/*/txqs.
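The snapshot-on-open idea described here can be sketched in userspace: format all stats into one buffer while holding the lock, so a reader sees a single consistent view instead of fields sampled at different times. Names are illustrative, not mac80211's:

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stats block protected by one lock, standing in for
 * the fields guarded by local->fq.lock. */
struct stats {
	pthread_mutex_t lock;
	unsigned int backlog;
	unsigned int drop_count;
};

/* Allocate and fill a consistent text snapshot under the lock,
 * as a debugfs open() handler would; the caller frees it. */
static char *stats_snapshot(struct stats *s)
{
	char *buf = malloc(128);

	if (!buf)
		return NULL;

	pthread_mutex_lock(&s->lock);
	snprintf(buf, 128, "backlog %u\ndrop_count %u\n",
		 s->backlog, s->drop_count);
	pthread_mutex_unlock(&s->lock);

	return buf;
}
```

Subsequent read()s then serve the frozen buffer, so the mismatch between individually-sampled counters goes away at the cost of the snapshot being stale by the time it is read.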

Hmm..


Michał

2016-06-09 09:49:15

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCHv5 0/5] mac80211: implement fq_codel

Applied 1-4, with the changes and comments on 5 noted in separate
emails.

johannes

2016-06-09 09:47:59

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCHv5 5/5] mac80211: add debug knobs for codel


> +++ b/net/mac80211/debugfs.c
> @@ -126,13 +126,31 @@ static int aqm_open(struct inode *inode, struct
> file *file)
>    "R fq_overlimit %u\n"
>    "R fq_collisions %u\n"
>    "RW fq_limit %u\n"
> -  "RW fq_quantum %u\n",
> +  "RW fq_quantum %u\n"
> +  "R codel_maxpacket %u\n"
> +  "R codel_drop_count %u\n"
>
It seems to me that this needs to adjust the length of the buffer
that's allocated.

johannes

2016-06-09 09:48:49

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCHv4 3/5] mac80211: add debug knobs for fair queuing

On Thu, 2016-05-05 at 13:00 +0200, Michal Kazior wrote:
> This adds a few debugfs entries and a module
> parameter to make it easier to test, debug and
> experiment.
>
I removed the module parameter; I don't really like that. Maybe it can
be replaced by a mac80211 debugfs file, that already exists when
mac80211 is loaded, outside the context of a device, so it can be set
before loading the driver?

johannes