Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp104071pxk; Fri, 11 Sep 2020 01:26:41 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwpF2y85quVgL1VhwJQQlf8al5MMb7XqiEdwtv98Hsa7M82t0Pe7KMxVndTP2ZpLpljNqYB X-Received: by 2002:a17:906:480a:: with SMTP id w10mr933985ejq.372.1599812801689; Fri, 11 Sep 2020 01:26:41 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1599812801; cv=none; d=google.com; s=arc-20160816; b=z967RDLHB+4Fp7LOStmyC3xYms8SrAqn2B7P784AZtxNyR0/UC9EE0DHcdqpV6+Ure Td//4WKtAetHX7q0vNZzCW/JZUaMjk6iImZlPSVy9eLXyrItXkQsaT55ieWjIKb7SXu2 Rtux2t+EHVpvJzMcIqSiIXai61FDwmdwreTX1bB2KwzS4DUvbvMuKl/TFzADGwE0WkY2 dxvhlgzMXfHHQWGIUGmWlGVbTeEuMNPf7mYwWA8Uh9Tx+JuaCM0GzwB2uto78gFdl2Yw c1lGFQWHRNxxH53Z+6rCvoysB8mn5n1O7bBQD83OwzvTlCgcefWfBNuZT9w/7Z9tthJ7 GdBg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:references:cc:to:from:subject; bh=oeh8kpekJcPMetDOPpBBtzabpZVi8se6h4dn2gayanQ=; b=dKJb4unCUrl+AvYoVlotHUrdM6ZjEA1Q3j4e0PEaqmtePK7L4K8nZDBeqTHqfeNTlq bAv8XkrxRJn1djwJHAz5sw38nyk1cmCLpCQr54d8p96sgwewk1PLFxHe+wGWFsKLTQ2O xxzh+xhAcuaMrc+NFOvmkWyYBezOJVI9HAqZFsDi0hbm8szotXju+yEmBNsvb698hl7g /c107/7UNVVFA/PZucFfQtKmKml+RHfrR8VY1nGZpeFeRFoySj9X2w+724717Kh6R+de 7GSJ54IORnbCI6fRZzy6oLt4gs656pdWEm6MdkxSoc+4whqYWo4qZFpPNQRU5vaeGDwI eSNw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g16si811414edv.344.2020.09.11.01.26.17; Fri, 11 Sep 2020 01:26:41 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725786AbgIKIZi (ORCPT + 99 others); Fri, 11 Sep 2020 04:25:38 -0400 Received: from szxga05-in.huawei.com ([45.249.212.191]:12232 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725554AbgIKIZh (ORCPT ); Fri, 11 Sep 2020 04:25:37 -0400 Received: from DGGEMS404-HUB.china.huawei.com (unknown [172.30.72.58]) by Forcepoint Email with ESMTP id 738753D7BCA3945F5F93; Fri, 11 Sep 2020 16:25:34 +0800 (CST) Received: from [10.74.191.121] (10.74.191.121) by DGGEMS404-HUB.china.huawei.com (10.3.19.204) with Microsoft SMTP Server id 14.3.487.0; Fri, 11 Sep 2020 16:25:26 +0800 Subject: Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc From: Yunsheng Lin To: Cong Wang CC: Jamal Hadi Salim , Jiri Pirko , "David Miller" , Jakub Kicinski , "Linux Kernel Network Developers" , LKML , , John Fastabend , Eric Dumazet References: <1599562954-87257-1-git-send-email-linyunsheng@huawei.com> <830f85b5-ef29-c68e-c982-de20ac880bd9@huawei.com> Message-ID: <763e6ad2-5e3a-a9db-9cb0-5b4529c56b50@huawei.com> Date: Fri, 11 Sep 2020 16:25:25 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0 MIME-Version: 1.0 In-Reply-To: <830f85b5-ef29-c68e-c982-de20ac880bd9@huawei.com> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.74.191.121] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2020/9/11 16:13, Yunsheng Lin wrote: > On 2020/9/11 4:07, Cong Wang wrote: >> On Tue, Sep 8, 2020 at 4:06 AM Yunsheng Lin wrote: >>> >>> Currently there is concurrent reset and enqueue operation for the >>> same lockless qdisc when there is no lock to synchronize the >>> q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in >>> qdisc_deactivate() called by dev_deactivate_queue(), which may cause >>> out-of-bounds access for priv->ring[] in hns3 driver if user has >>> requested a smaller queue num when __dev_xmit_skb() still enqueue a >>> skb with a larger queue_mapping after the corresponding qdisc is >>> reset, and call hns3_nic_net_xmit() with that skb later. >>> >>> Reused the existing synchronize_net() in dev_deactivate_many() to >>> make sure skb with larger queue_mapping enqueued to old qdisc(which >>> is saved in dev_queue->qdisc_sleeping) will always be reset when >>> dev_reset_queue() is called. >>> >>> Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") >>> Signed-off-by: Yunsheng Lin >>> --- >>> ChangeLog V2: >>> Reuse existing synchronize_net(). >>> --- >>> net/sched/sch_generic.c | 48 +++++++++++++++++++++++++++++++++--------------- >>> 1 file changed, 33 insertions(+), 15 deletions(-) >>> >>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c >>> index 265a61d..54c4172 100644 >>> --- a/net/sched/sch_generic.c >>> +++ b/net/sched/sch_generic.c >>> @@ -1131,24 +1131,10 @@ EXPORT_SYMBOL(dev_activate); >>> >>> static void qdisc_deactivate(struct Qdisc *qdisc) >>> { >>> - bool nolock = qdisc->flags & TCQ_F_NOLOCK; >>> - >>> if (qdisc->flags & TCQ_F_BUILTIN) >>> return; >>> - if (test_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state)) >>> - return; >>> - >>> - if (nolock) >>> - spin_lock_bh(&qdisc->seqlock); >>> - spin_lock_bh(qdisc_lock(qdisc)); >>> >>> set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state); >>> - >>> - qdisc_reset(qdisc); >>> - >>> - spin_unlock_bh(qdisc_lock(qdisc)); >>> - if (nolock) >>> - spin_unlock_bh(&qdisc->seqlock); >>> } >>> >>> static void dev_deactivate_queue(struct net_device *dev, >>> @@ -1165,6 +1151,30 @@ static void dev_deactivate_queue(struct net_device *dev, >>> } >>> } >>> >>> +static void dev_reset_queue(struct net_device *dev, >>> + struct netdev_queue *dev_queue, >>> + void *_unused) >>> +{ >>> + struct Qdisc *qdisc; >>> + bool nolock; >>> + >>> + qdisc = dev_queue->qdisc_sleeping; >>> + if (!qdisc) >>> + return; >>> + >>> + nolock = qdisc->flags & TCQ_F_NOLOCK; >>> + >>> + if (nolock) >>> + spin_lock_bh(&qdisc->seqlock); >>> + spin_lock_bh(qdisc_lock(qdisc)); >> >> >> I think you do not need this lock for lockless one. > > It seems so. > Maybe another patch to remove qdisc_lock(qdisc) for lockless > qdisc? > > >> >>> + >>> + qdisc_reset(qdisc); >>> + >>> + spin_unlock_bh(qdisc_lock(qdisc)); >>> + if (nolock) >>> + spin_unlock_bh(&qdisc->seqlock); >>> +} >>> + >>> static bool some_qdisc_is_busy(struct net_device *dev) >>> { >>> unsigned int i; >>> @@ -1213,12 +1223,20 @@ void dev_deactivate_many(struct list_head *head) >>> dev_watchdog_down(dev); >>> } >>> >>> - /* Wait for outstanding qdisc-less dev_queue_xmit calls. >>> + /* Wait for outstanding qdisc-less dev_queue_xmit calls or >>> + * outstanding qdisc enqueuing calls. >>> * This is avoided if all devices are in dismantle phase : >>> * Caller will call synchronize_net() for us >>> */ >>> synchronize_net(); >>> >>> + list_for_each_entry(dev, head, close_list) { >>> + netdev_for_each_tx_queue(dev, dev_reset_queue, NULL); >>> + >>> + if (dev_ingress_queue(dev)) >>> + dev_reset_queue(dev, dev_ingress_queue(dev), NULL); >>> + } >>> + >>> /* Wait for outstanding qdisc_run calls. */ >>> list_for_each_entry(dev, head, close_list) { >>> while (some_qdisc_is_busy(dev)) { >> >> Do you want to reset before waiting for TX action? >> >> I think it is safer to do it after, at least prior to commit 759ae57f1b >> we did after. > > The reference to the txq->qdisc is always protected by RCU, so the synchronize_net() > should be enought to ensure there is no skb enqueued to the old qdisc that is saved > in the dev_queue->qdisc_sleeping, because __dev_queue_xmit can only see the new qdisc > after synchronize_net(), which is noop_qdisc, and noop_qdisc will make sure any skb > enqueued to it will be dropped and freed, right? > > If we do any additional reset that is not related to qdisc in dev_reset_queue(), we > can move it after some_qdisc_is_busy() checking. > > Also, it seems the __QDISC_STATE_DEACTIVATED checking in qdisc_run() is unnecessary > after this patch, because after synchronize_net() qdisc_run() will now see the old > qdisc. now -> not sorry for the typo. > > static inline void qdisc_run(struct Qdisc *q) > { > if (qdisc_run_begin(q)) { > /* NOLOCK qdisc must check 'state' under the qdisc seqlock > * to avoid racing with dev_qdisc_reset() > */ > if (!(q->flags & TCQ_F_NOLOCK) || > likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) > __qdisc_run(q); > qdisc_run_end(q); > } > } > >> >> Thanks. >> . >> > . >