Date: Fri, 12 Apr 2019 14:34:11 +0800
From: Yibo Zhao <yiboz@codeaurora.org>
To: Toke Høiland-Jørgensen
Cc: make-wifi-fast@lists.bufferbloat.net, linux-wireless@vger.kernel.org,
    Felix Fietkau, Rajkumar Manoharan, Kan Yan,
    linux-wireless-owner@vger.kernel.org
Subject: Re: [RFC/RFT] mac80211: Switch to a virtual time-based airtime scheduler
In-Reply-To: <877ec0u6mu.fsf@toke.dk>
References: <20190215170512.31512-1-toke@redhat.com>
 <753b328855b85f960ceaf974194a7506@codeaurora.org>
 <87ftqy41ea.fsf@toke.dk> <877ec2ykrh.fsf@toke.dk>
 <89d32174b282006c8d4e7614657171be@codeaurora.org>
 <87a7gyw3cu.fsf@toke.dk>
 <73077ba7cda566d5eeb2395978b3524c@codeaurora.org>
 <877ec0u6mu.fsf@toke.dk>
User-Agent: Roundcube Webmail/1.2.5
X-Mailing-List: linux-wireless@vger.kernel.org

On 2019-04-11 19:24, Toke Høiland-Jørgensen wrote:
> Yibo Zhao writes:
> 
>> On 2019-04-10 18:40, Toke Høiland-Jørgensen wrote:
>>> Yibo Zhao writes:
>>> 
>>>> On 2019-04-10 04:41, Toke Høiland-Jørgensen wrote:
>>>>> Yibo Zhao writes:
>>>>> 
>>>>>> On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote:
>>>>>>> Yibo Zhao writes:
>>>>>>> 
>>>>>>>> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>>>>>>>>> This switches the airtime scheduler in mac80211 to use a
>>>>>>>>> virtual time-based scheduler instead of the round-robin
>>>>>>>>> scheduler used before. This has a couple of advantages:
>>>>>>>>> 
>>>>>>>>> - No need to sync up the round-robin scheduler in
>>>>>>>>>   firmware/hardware with the round-robin airtime scheduler.
>>>>>>>>> 
>>>>>>>>> - If several stations are eligible for transmission we can
>>>>>>>>>   schedule all of them; no need to hard-block the scheduling
>>>>>>>>>   rotation until the head of the queue has used up its
>>>>>>>>>   quantum.
>>>>>>>>> 
>>>>>>>>> - The check of whether a station is eligible for transmission
>>>>>>>>>   becomes simpler (in ieee80211_txq_may_transmit()).
>>>>>>>>> 
>>>>>>>>> The drawback is that scheduling becomes slightly more
>>>>>>>>> expensive, as we need to maintain an rbtree of TXQs sorted by
>>>>>>>>> virtual time. This means that ieee80211_register_airtime()
>>>>>>>>> becomes O(log N) in the number of currently scheduled TXQs.
>>>>>>>>> However, hopefully this number rarely grows too big (it's
>>>>>>>>> only TXQs currently backlogged, not all associated stations),
>>>>>>>>> so it shouldn't be too big of an issue.
>>>>>>>>> 
>>>>>>>>> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
>>>>>>>>>  {
>>>>>>>>>  	struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>>>>>>>>>  	struct ieee80211_local *local = sta->sdata->local;
>>>>>>>>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>>>>>>>  	u8 ac = ieee80211_ac_from_tid(tid);
>>>>>>>>> -	u32 airtime = 0;
>>>>>>>>> +	u64 airtime = 0, weight_sum;
>>>>>>>>> +
>>>>>>>>> +	if (!txq)
>>>>>>>>> +		return;
>>>>>>>>> 
>>>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>>>>>>>  		airtime += tx_airtime;
>>>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>>>>>>>  		airtime += rx_airtime;
>>>>>>>>> 
>>>>>>>>> +	/* Weights scale so the unit weight is 256 */
>>>>>>>>> +	airtime <<= 8;
>>>>>>>>> +
>>>>>>>>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>>>>>>>>> +
>>>>>>>>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>>>>>>>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>>>>>>>>> -	sta->airtime[ac].deficit -= airtime;
>>>>>>>>> +
>>>>>>>>> +	weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
>>>>>>>>> +
>>>>>>>>> +	local->airtime_v_t[ac] += airtime / weight_sum;
>>>>>>>> 
>>>>>>>> Hi Toke,
>>>>>>>> 
>>>>>>>> Please ignore the previous two broken emails from me regarding
>>>>>>>> this new proposal.
>>>>>>>> 
>>>>>>>> It looks like local->airtime_v_t acts as a Tx criterion: only
>>>>>>>> the stations with less airtime than it are eligible for Tx.
>>>>>>>> That means there are situations, e.g. with 50 clients, where
>>>>>>>> only some of the stations get a chance to Tx when next_txq is
>>>>>>>> called in a loop. Am I right?
>>>>>>> 
>>>>>>> I'm not sure what you mean here. Are you referring to the case
>>>>>>> where new stations appear with a very low (zero) airtime_v_t?
>>>>>>> That is handled when the station is enqueued.
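To make sure we are reading the bookkeeping the same way, here is my
understanding of the rule as a simplified, standalone sketch (plain C,
illustration only, not the actual mac80211 code: register_airtime(),
may_transmit(), global_v_t and weight_sum are stand-ins for
ieee80211_sta_register_airtime(), the scheduler's eligibility check,
local->airtime_v_t[ac] and local->airtime_weight_sum[ac]; the
per-station v_t update is my assumption, mirroring the
sta->airtime[ac].v_t comparison discussed further down):

#include <stdint.h>

struct sta {
	uint64_t v_t;    /* per-station virtual time */
	uint32_t weight; /* airtime weight; unit weight is 256 */
};

static uint64_t global_v_t; /* per-AC global virtual time */
static uint64_t weight_sum; /* sum of weights of active stations */

/* Charge 'usecs' of airtime to 'sta'. The airtime is scaled up by 256
 * (the unit weight) before dividing, so a station with twice the
 * weight accrues virtual time half as fast and ends up with twice the
 * airtime share. The global clock advances by the same airtime
 * divided by the total weight of all active stations. */
static void register_airtime(struct sta *sta, uint32_t usecs)
{
	uint64_t scaled = (uint64_t)usecs << 8;

	sta->v_t += scaled / sta->weight;
	global_v_t += scaled / (weight_sum ? weight_sum : sta->weight);
}

/* A station stays eligible for Tx while its virtual time has not
 * overtaken the global clock. */
static int may_transmit(const struct sta *sta)
{
	return sta->v_t <= global_v_t;
}

If that matches your intent, then with equal weights every station
drifts towards the same v_t, and a heavier weight simply slows a
station's clock down, which is where the fairness comes from.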
>>>>>> Hi Toke,
>>>>>> 
>>>>>> Sorry for the confusion. I am not referring to the case you
>>>>>> mentioned, though that one can be solved by your subtle design,
>>>>>> max(local vt, sta vt). :-)
>>>>>> 
>>>>>> Actually, my concern is the situation where next_txq is called
>>>>>> in a loop. Let me explain a little more; see below.
>>>>>> 
>>>>>>> @@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
>>>>>>>  struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac)
>>>>>>>  {
>>>>>>>  	struct ieee80211_local *local = hw_to_local(hw);
>>>>>>> +	struct rb_node *node = local->schedule_pos[ac];
>>>>>>>  	struct txq_info *txqi = NULL;
>>>>>>> +	bool first = false;
>>>>>>> 
>>>>>>>  	lockdep_assert_held(&local->active_txq_lock[ac]);
>>>>>>> 
>>>>>>> -begin:
>>>>>>> -	txqi = list_first_entry_or_null(&local->active_txqs[ac],
>>>>>>> -					struct txq_info,
>>>>>>> -					schedule_order);
>>>>>>> -	if (!txqi)
>>>>>>> +	if (!node) {
>>>>>>> +		node = rb_first_cached(&local->active_txqs[ac]);
>>>>>>> +		first = true;
>>>>>>> +	} else
>>>>>>> +		node = rb_next(node);
>>>>>> 
>>>>>> Consider this piece of code from ath10k_mac_schedule_txq:
>>>>>> 
>>>>>> 	ieee80211_txq_schedule_start(hw, ac);
>>>>>> 	while ((txq = ieee80211_next_txq(hw, ac))) {
>>>>>> 		while (ath10k_mac_tx_can_push(hw, txq)) {
>>>>>> 			ret = ath10k_mac_tx_push_txq(hw, txq);
>>>>>> 			if (ret < 0)
>>>>>> 				break;
>>>>>> 		}
>>>>>> 		ieee80211_return_txq(hw, txq);
>>>>>> 		ath10k_htt_tx_txq_update(hw, txq);
>>>>>> 		if (ret == -EBUSY)
>>>>>> 			break;
>>>>>> 	}
>>>>>> 	ieee80211_txq_schedule_end(hw, ac);
>>>>>> 
>>>>>> If my understanding is right, local->schedule_pos is used to
>>>>>> record the last scheduled node and to traverse the rbtree for a
>>>>>> valid txq. There is a chance that an empty txq is fed to
>>>>>> return_txq and gets removed from the rbtree. The empty txq will
>>>>>> always be the rb_first node. Then, in the following next_txq,
>>>>>> local->schedule_pos becomes meaningless, since its rb_next will
>>>>>> return NULL and the loop breaks. Only rb_first gets dequeued
>>>>>> during this loop.
>>>>>> 
>>>>>> 	if (!node || RB_EMPTY_NODE(node)) {
>>>>>> 		node = rb_first_cached(&local->active_txqs[ac]);
>>>>>> 		first = true;
>>>>>> 	} else
>>>>>> 		node = rb_next(node);
>>>>> 
>>>>> Ah, I see what you mean. Yes, that would indeed be a problem -
>>>>> nice catch! :)
>>>>> 
>>>>>> How about this? The nodes on the rbtree will be dequeued and
>>>>>> removed from the rbtree one by one until the HW is busy. Please
>>>>>> note that the local vt and sta vt will not be updated, since the
>>>>>> txq lock is held during this time.
>>>>> 
>>>>> Insertion and removal from the rbtree are relatively expensive,
>>>>> so I'd rather not do that for every txq. I think a better way to
>>>>> solve this is to just defer the actual removal from the tree
>>>>> until ieee80211_txq_schedule_end()... Will fix that when I submit
>>>>> this again.
>>>> 
>>>> Do you mean we keep the empty txqs in the rbtree until the loop
>>>> finishes and remove them in ieee80211_txq_schedule_end (maybe put
>>>> return_txq in it)? If that is the case, I suppose a list is needed
>>>> to store the empty txqs so as to dequeue them in
>>>> ieee80211_txq_schedule_end.
>>> 
>>> Yeah, return_txq() would just put "to be removed" TXQs on a list,
>>> and schedule_end() would do the actual removal (after checking
>>> whether a new packet showed up in the meantime).
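Just to check my understanding of that plan, here is a rough sketch of
how the work could be split between return_txq() and schedule_end()
(my guess at the shape of the fix, not the actual patch: the
remove_list members on txq_info and ieee80211_local are made up for
illustration, txq_has_queue() is the existing empty-check helper in
net/mac80211/tx.c, and the unlock at the end assumes this RFC's
locking scheme where schedule_start()/schedule_end() take and release
active_txq_lock):

void ieee80211_return_txq(struct ieee80211_hw *hw,
			  struct ieee80211_txq *txq)
{
	struct ieee80211_local *local = hw_to_local(hw);
	struct txq_info *txqi = to_txq_info(txq);

	lockdep_assert_held(&local->active_txq_lock[txq->ac]);

	/* Don't erase from the rbtree here; just park empty TXQs on a
	 * "to be removed" list so that rb_next() from schedule_pos
	 * keeps working for the rest of the scheduling round. */
	if (!txq_has_queue(txq) && list_empty(&txqi->remove_list))
		list_add_tail(&txqi->remove_list,
			      &local->remove_list[txq->ac]);
}

void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
{
	struct ieee80211_local *local = hw_to_local(hw);
	struct txq_info *txqi, *tmp;

	/* Do the actual removal now that the traversal is done,
	 * re-checking each TXQ in case a new packet showed up in the
	 * meantime. */
	list_for_each_entry_safe(txqi, tmp, &local->remove_list[ac],
				 remove_list) {
		list_del_init(&txqi->remove_list);
		if (!txq_has_queue(&txqi->txq))
			rb_erase_cached(&txqi->schedule_order,
					&local->active_txqs[ac]);
	}

	local->schedule_pos[ac] = NULL;
	spin_unlock_bh(&local->active_txq_lock[ac]);
}

That way insertion/removal stays off the per-txq fast path, and
rb_next() in next_txq() never lands on an already-erased node.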
>> 
>> SGTM
>> 
>>> 
>>>> And one more thing,
>>>> 
>>>>> +		if (sta->airtime[ac].v_t > local->airtime_v_t[ac]) {
>>>>> +			if (first)
>>>>> +				local->airtime_v_t[ac] = sta->airtime[ac].v_t;
>>>>> +			else
>>>>> +				return NULL;
>>>> 
>>>> As local->airtime_v_t will not be updated during the loop, we
>>>> don't need to return NULL.
>>> 
>>> Yes we do; this is actually the break condition. I.e., stations
>>> whose virtual time is higher than the global time (in
>>> local->airtime_v_t) are not allowed to transmit. And since we are
>>> traversing them in order, when we find the first such station, we
>>> are done and can break out of the scheduling loop entirely (which
>>> is what we do by returning NULL). The other branch in the inner
>>> if() is just for the case where no stations are currently eligible
>>> to transmit according to this rule; here we don't want to stall,
>>> so we advance the global timer so the first station becomes
>>> eligible...
>> 
>> Yes, the inner if() makes sure the first node always gets scheduled
>> no matter its vt.
>> 
>> To detail my concern, let's assume there are only two nodes in the
>> tree and empty nodes stay in the tree until schedule_end(). In the
>> loop, and in case the hw is not busy, ath10k will drain every node
>> next_txq returns before asking for another txq. Then, as we
>> traverse to the next rb node, it is highly likely that the second
>> node is not allowed to transmit, since the global time has not been
>> updated yet while the active txq lock is held. At this point, only
>> the second node in the tree has data and the hw is capable of
>> sending more data. I don't think the second node should be
>> ineligible for transmission in this situation.
>> 
>> With more nodes in the tree in this situation, I think the same
>> thing happens: all nodes except the first are not allowed to
>> transmit, since none of their vts are less than the global time,
>> which is not updated in time. The loop breaks when we are checking
>> the second node.
> 
> Yeah, in many cases we will end up throttling all but the first
> (couple of) node(s). This is by design; otherwise we can't ensure
> fairness. As long as we are making forward progress that is fine,
> though...

All right, I see your point. Fairness is our first priority. :)

> 
> -Toke

-- 
Yibo