Return-path: Received: from arrakis.dune.hu ([78.24.191.176]:44547 "EHLO arrakis.dune.hu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1422745AbcBZQs6 (ORCPT ); Fri, 26 Feb 2016 11:48:58 -0500 Subject: Re: [RFC/RFT] mac80211: implement fq_codel for software queuing To: Michal Kazior , linux-wireless@vger.kernel.org References: <1456492163-11437-1-git-send-email-michal.kazior@tieto.com> Cc: johannes@sipsolutions.net, netdev@vger.kernel.org, eric.dumazet@gmail.com, dave.taht@gmail.com, emmanuel.grumbach@intel.com, Tim Shepard From: Felix Fietkau Message-ID: <56D081F1.3030801@openwrt.org> (sfid-20160226_175110_135318_FA5004C8) Date: Fri, 26 Feb 2016 17:48:49 +0100 MIME-Version: 1.0 In-Reply-To: <1456492163-11437-1-git-send-email-michal.kazior@tieto.com> Content-Type: text/plain; charset=utf-8 Sender: linux-wireless-owner@vger.kernel.org List-ID: On 2016-02-26 14:09, Michal Kazior wrote: > Since 11n aggregation become important to get the > best out of txops. However aggregation inherently > requires buffering and queuing. Once variable > medium conditions to different associated stations > is considered it became apparent that bufferbloat > can't be simply fought with qdiscs for wireless > drivers. 11ac with MU-MIMO makes the problem > worse because the bandwidth-delay product becomes > even greater. > > This bases on codel5 and sch_fq_codel.c. It may > not be the Right Thing yet but it should at least > provide a framework for more improvements. Nice work! > I guess dropping rate could factor in per-station > rate control info but I don't know how this should > exactly be done. HW rate control drivers would > need extra work to take advantage of this. > > This obviously works only with drivers that use > wake_tx_queue op. > > Note: This uses IFF_NO_QUEUE to get rid of qdiscs > for wireless drivers that use mac80211 and > implement wake_tx_queue op. > > Moreover the current txq_limit and latency setting > might need tweaking. Either from userspace or be > dynamically scaled with regard to, e.g. number of > associated stations. > > FWIW This already works nicely with ath10k's (not > yey merged) pull-push congestion control for > MU-MIMO as far as throughput is concerned. > > Evaluating latency improvements is a little tricky > at this point if a driver is using more queue > layering and/or its firmware controls tx > scheduling - hence I don't have any solid data on > this. I'm open for suggestions though. > > It might also be a good idea to do the following > in the future: > > - make generic tx scheduling which does some RR > over per-sta-tid queues and dequeues bursts of > packets to form a PPDU to fit into designated > txop timeframe and bytelimit > > This could in theory be shared and used by > ath9k and (future) mt76. > > Moreover tx scheduling could factor in rate > control info and keep per-station number of > queued packets at a sufficient low threshold to > avoid queue buildup for slow stations. Emmanuel > already did similar experiment for iwlwifi's > station mode and got promising results. > > - make software queueing default internally in > mac80211. This could help other drivers to get > at least some benefit from mac80211 smarter > queueing. > > Signed-off-by: Michal Kazior > --- > include/net/mac80211.h | 36 ++++- > net/mac80211/agg-tx.c | 8 +- > net/mac80211/codel.h | 260 +++++++++++++++++++++++++++++++ > net/mac80211/codel_i.h | 89 +++++++++++ > net/mac80211/ieee80211_i.h | 27 +++- > net/mac80211/iface.c | 25 ++- > net/mac80211/main.c | 9 +- > net/mac80211/rx.c | 2 +- > net/mac80211/sta_info.c | 10 +- > net/mac80211/sta_info.h | 27 ++++ > net/mac80211/tx.c | 370 ++++++++++++++++++++++++++++++++++++++++----- > net/mac80211/util.c | 20 ++- > 12 files changed, 805 insertions(+), 78 deletions(-) > create mode 100644 net/mac80211/codel.h > create mode 100644 net/mac80211/codel_i.h > > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c > index af584f7cdd63..f42f898cb8b5 100644 > --- a/net/mac80211/tx.c > +++ b/net/mac80211/tx.c > + [...] > +static void ieee80211_txq_enqueue(struct ieee80211_local *local, > + struct txq_info *txqi, > + struct sk_buff *skb) > +{ > + struct ieee80211_fq *fq = &local->fq; > + struct ieee80211_hw *hw = &local->hw; > + struct txq_flow *flow; > + struct txq_flow *i; > + size_t idx = fq_hash(fq, skb); > + > + flow = &fq->flows[idx]; > + > + if (flow->txqi) > + flow = &txqi->flow; I'm not sure I understand this part correctly, but shouldn't that be: if (flow->txqi && flow->txqi != txqi) > + > + /* The following overwrites `vif` pointer effectively. It is later > + * restored using txq structure. > + */ > + IEEE80211_SKB_CB(skb)->control.enqueue_time = codel_get_time(); > + > + flow->txqi = txqi; > + flow->backlog += skb->len; > + txqi->backlog_bytes += skb->len; > + txqi->backlog_packets++; > + fq->backlog++; > + > + if (list_empty(&flow->backlogchain)) > + i = list_last_entry(&fq->backlogs, struct txq_flow, backlogchain); > + else > + i = flow; > + > + list_for_each_entry_continue_reverse(i, &fq->backlogs, backlogchain) > + if (i->backlog > flow->backlog) > + break; > + > + list_move(&flow->backlogchain, &i->backlogchain); > + > + if (list_empty(&flow->flowchain)) { > + flow->deficit = fq->quantum; > + list_add_tail(&flow->flowchain, &txqi->new_flows); > + } > + > + __skb_queue_tail(&flow->queue, skb); > + > + if (fq->backlog > hw->txq_limit) > + fq_drop(local); > +}