From: Yunsheng Lin <linyunsheng@huawei.com>
Subject: [RFC v3] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc
Date: Mon, 22 Mar 2021 17:09:16 +0800
Message-ID: <1616404156-11772-1-git-send-email-linyunsheng@huawei.com>
In-Reply-To: <1616050402-37023-1-git-send-email-linyunsheng@huawei.com>
References: <1616050402-37023-1-git-send-email-linyunsheng@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Currently pfifo_fast has
both the TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK flags set, but qdisc
bypass does not work for lockless qdisc because the skb is always
enqueued to the qdisc even when the qdisc is empty, see
__dev_xmit_skb().

This patch calls sch_direct_xmit() to transmit the skb directly to
the driver for an empty lockless qdisc too, which avoids the
enqueuing and dequeuing operation. qdisc->empty is set to false
whenever a skb is enqueued, see pfifo_fast_enqueue(), and is set to
true when skb dequeuing returns NULL, see pfifo_fast_dequeue().

There is a data race between enqueue/dequeue and the setting of
qdisc->empty, so qdisc->empty is only used as a hint; we still need
to call sch_may_need_requeuing() to check whether the queue is really
empty and whether there is a requeued skb, which has higher priority
than the current skb.

The performance of the ip_forward test increases by about 10% with
this patch.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
Hi, Vladimir and Ahmad

Please give this a test to see if there is any out-of-order packet
with this patch, which has removed the priv->lock added in RFC v2.

There is a data race as below:

          CPU1                       CPU2
    qdisc_run_begin(q)                 .
            .                     q->enqueue()
  sch_may_need_requeuing()             .
        return true                    .
            .                          .
            .                          .
       q->enqueue()                    .

When the above happens, the skb enqueued by CPU1 is dequeued after
the skb enqueued by CPU2, because sch_may_need_requeuing() returned
true. Without qdisc bypass, CPU1 would have a better chance to queue
its skb sooner than CPU2.

This patch does not take care of the above data race, because I view
it as similar to the following case: even if CPU1 and CPU2 write
their skbs at the same time to two sockets that both lead to the same
qdisc, there is no guarantee which skb will hit the qdisc first,
because many factors like interrupts, softirqs, cache misses and
scheduling affect that.

So I hope the above data race will not cause a problem for Vladimir
and Ahmad.
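For reference, a minimal userspace sketch (hypothetical test helper,
not part of this patch) of the kind of out-of-order check meant
above: the sender stamps each UDP datagram with a sequence number and
the receiver flags any datagram that arrives behind a higher one. To
actually exercise the bypass race, several senders pinned to
different CPUs would need to transmit toward the same egress qdisc.

/*
 * seqcheck.c - hypothetical out-of-order detector, not part of the patch.
 * Usage: ./seqcheck recv <port>                (on the receiving host)
 *        ./seqcheck send <ip> <port> <count>   (on the sending host)
 */
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	struct sockaddr_in addr = { .sin_family = AF_INET };
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	uint64_t seq, last = 0;

	if (argc == 3 && !strcmp(argv[1], "recv")) {
		addr.sin_addr.s_addr = htonl(INADDR_ANY);
		addr.sin_port = htons(atoi(argv[2]));
		if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
			return 1;
		while (recv(fd, &seq, sizeof(seq), 0) == sizeof(seq)) {
			if (seq < last)	/* arrived behind a later packet */
				printf("out of order: %llu after %llu\n",
				       (unsigned long long)seq,
				       (unsigned long long)last);
			else
				last = seq;
		}
	} else if (argc == 5 && !strcmp(argv[1], "send")) {
		addr.sin_addr.s_addr = inet_addr(argv[2]);
		addr.sin_port = htons(atoi(argv[3]));
		for (seq = 1; seq <= (uint64_t)atoll(argv[4]); seq++)
			sendto(fd, &seq, sizeof(seq), 0,
			       (struct sockaddr *)&addr, sizeof(addr));
	}
	close(fd);
	return 0;
}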
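To make the locking pattern in the patch below easier to follow, here
is a minimal userspace analogue (pthreads and C11 atomics standing in
for the kernel primitives; names invented for illustration, and the
enqueue locking is simplified, since the real pfifo_fast enqueue path
is lockless): an advisory empty flag checked without the lock, a
trylock standing in for qdisc->seqlock, and a re-check of the real
queue state under the lock, which is the role sch_may_need_requeuing()
plays.

/* seqlock_hint.c - userspace sketch of the bypass pattern, NOT kernel
 * code. Build: gcc -pthread seqlock_hint.c -o seqlock_hint
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_spinlock_t seqlock;	/* stands in for qdisc->seqlock */
static atomic_bool empty = true;	/* advisory hint, like q->empty */
static int backlog;			/* real queue depth, lock-protected */

/* Fast path: try to "transmit" directly, bypassing the queue. */
static bool try_bypass(int pkt)
{
	if (atomic_load_explicit(&empty, memory_order_relaxed) &&
	    pthread_spin_trylock(&seqlock) == 0) {
		if (backlog == 0) {	/* hint may be stale: re-check */
			printf("direct xmit %d\n", pkt);
			pthread_spin_unlock(&seqlock);
			return true;
		}
		pthread_spin_unlock(&seqlock);
	}
	return false;
}

/* Slow path: enqueue and clear the hint, as the patch does after
 * q->enqueue(). The lock here only keeps the toy example simple. */
static void enqueue(int pkt)
{
	pthread_spin_lock(&seqlock);
	backlog++;
	pthread_spin_unlock(&seqlock);
	atomic_store_explicit(&empty, false, memory_order_relaxed);
}

int main(void)
{
	pthread_spin_init(&seqlock, PTHREAD_PROCESS_PRIVATE);
	if (!try_bypass(1))
		enqueue(1);
	pthread_spin_destroy(&seqlock);
	return 0;
}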
---
 include/net/pkt_sched.h   |  1 +
 include/net/sch_generic.h |  1 -
 net/core/dev.c            | 22 ++++++++++++++++++++++
 net/sched/sch_generic.c   | 11 +++++++++++
 4 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index f5c1bee..5715ddf 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -122,6 +122,7 @@ void qdisc_warn_nonwc(const char *txt, struct Qdisc *qdisc);
 bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 		     struct net_device *dev, struct netdev_queue *txq,
 		     spinlock_t *root_lock, bool validate);
+bool sch_may_need_requeuing(struct Qdisc *q);
 
 void __qdisc_run(struct Qdisc *q);
 
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f7a6e14..e08cc77 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -161,7 +161,6 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 	if (qdisc->flags & TCQ_F_NOLOCK) {
 		if (!spin_trylock(&qdisc->seqlock))
 			return false;
-		WRITE_ONCE(qdisc->empty, false);
 	} else if (qdisc_is_running(qdisc)) {
 		return false;
 	}
diff --git a/net/core/dev.c b/net/core/dev.c
index be941ed..317180a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3796,9 +3796,31 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 	qdisc_calculate_pkt_len(skb, q);
 
 	if (q->flags & TCQ_F_NOLOCK) {
+		if (q->flags & TCQ_F_CAN_BYPASS && READ_ONCE(q->empty) &&
+		    qdisc_run_begin(q)) {
+			if (sch_may_need_requeuing(q)) {
+				rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
+				__qdisc_run(q);
+				qdisc_run_end(q);
+
+				goto no_lock_out;
+			}
+
+			qdisc_bstats_cpu_update(q, skb);
+
+			if (sch_direct_xmit(skb, q, dev, txq, NULL, true) &&
+			    !READ_ONCE(q->empty))
+				__qdisc_run(q);
+
+			qdisc_run_end(q);
+			return NET_XMIT_SUCCESS;
+		}
+
 		rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
+		WRITE_ONCE(q->empty, false);
 		qdisc_run(q);
 
+no_lock_out:
 		if (unlikely(to_free))
 			kfree_skb_list(to_free);
 		return rc;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 44991ea..2145fdad 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -146,6 +146,8 @@ static inline void dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 	}
 
 	if (lock)
 		spin_unlock(lock);
+
+	WRITE_ONCE(q->empty, false);
 	__netif_schedule(q);
 }
@@ -273,6 +275,15 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
 	return skb;
 }
 
+bool sch_may_need_requeuing(struct Qdisc *q)
+{
+	if (likely(skb_queue_empty(&q->gso_skb) &&
+		   !q->ops->peek(q)))
+		return false;
+
+	return true;
+}
+
 /*
  * Transmit possibly several skbs, and handle the return status as
  * required. Owning running seqcount bit guarantees that
-- 
2.7.4