Subject: Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc
From: Yunsheng Lin
To: Cong Wang
CC: Jamal Hadi Salim, Jiri Pirko, David Miller, Jakub Kicinski, Linux Kernel Network Developers, LKML, John Fastabend, Eric Dumazet
Date: Tue, 3 Nov 2020 15:24:32 +0800
Message-ID: <5472023c-b50b-0cb3-4cb6-7bbea42d3612@huawei.com>
References: <1599562954-87257-1-git-send-email-linyunsheng@huawei.com> <830f85b5-ef29-c68e-c982-de20ac880bd9@huawei.com> <1f8ebcde-f5ff-43df-960e-3661706e8d04@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2020/11/3 0:55, Cong Wang wrote:
> On Fri, Oct 30, 2020 at 12:38 AM Yunsheng Lin wrote:
>>
>> On 2020/10/30 3:05, Cong Wang wrote:
>>>
>>> I do not see how and why it should. synchronize_net() is merely an optimized
>>> version of synchronize_rcu(); it should wait for RCU readers, softirqs are not
>>> necessarily RCU readers, and net_tx_action() does not take the RCU read lock either.
>>
>> Ok, that makes sense.
>>
>> Taking the RCU read lock in net_tx_action() does not seem to solve the problem;
>> what about the time window between __netif_reschedule() and net_tx_action()?
>>
>> It seems we need to re-dereference the qdisc whenever the RCU read lock is released
>> and the qdisc is still in sd->output_queue, or wait for sd->output_queue to drain?
>
> Not suggesting you take the RCU read lock. We already wait for the TX action with
> a sleep loop. To me, the only thing missing is just moving the reset after that
> wait.

__QDISC_STATE_SCHED is cleared before qdisc_run() is called in net_tx_action(), so some_qdisc_is_busy() does not seem to wait fully for the TX action; at the least, the qdisc is still being accessed even if __QDISC_STATE_DEACTIVATED is set.

>>>>>> If we do any additional reset that is not related to the qdisc in dev_reset_queue(), we
>>>>>> can move it after the some_qdisc_is_busy() check.
>>>>>
>>>>> I am not suggesting to do an additional reset, I am suggesting to move
>>>>> your reset after the busy waiting.
>>>>
>>>> There may be a deadlock here if we reset the qdisc after the some_qdisc_is_busy() check,
>>>> because some_qdisc_is_busy() may require the qdisc reset to clear the skb, so that
>>>
>>> some_qdisc_is_busy() checks the status of the qdisc, not the skb queue.
>>
>> Is there any reason why we do not check the skb queue in the qdisc?
>> It seems there may be skbs left when the netdev is deactivated; maybe we should at least
>> warn when there are still skbs left when the netdev is deactivated?
>> Is that why we call qdisc_reset() to clear the leftover skbs in qdisc_destroy()?
>>
>>>
>>>> some_qdisc_is_busy() can return false. I am not sure this is really a problem, but
>>>> sch_direct_xmit() may requeue the skb when dev_hard_start_xmit() returns TX_BUSY.
>>>
>>> Sounds like another reason we should move the reset as late as possible?
>>
>> Why?
>
> You said "sch_direct_xmit() may requeue the skb", and I agree. I assume you mean
> net_tx_action() calls sch_direct_xmit(), which does the requeue and then races with
> the reset. No?

Looking at the current code again, I think there is no race between sch_direct_xmit() in net_tx_action() and dev_reset_queue() in dev_deactivate_many(), because qdisc_lock(qdisc) or qdisc->seqlock has been taken when calling sch_direct_xmit() or dev_reset_queue().

>>
>> The current netdev down order is mainly as below:
>>
>> netif_tx_stop_all_queues()
>> dev_deactivate_queue()
>> synchronize_net()
>> dev_reset_queue()
>> some_qdisc_is_busy()
>>
>> You suggest changing it to the below order, right?
>>
>> netif_tx_stop_all_queues()
>> dev_deactivate_queue()
>> synchronize_net()
>> some_qdisc_is_busy()
>> dev_reset_queue()
>
> Yes.
>
>> What is the semantics of some_qdisc_is_busy()?
>
> Waiting for a flying TX action.

It waits for __QDISC_STATE_SCHED to clear and for the qdisc run to finish, but there is still a time window between __QDISC_STATE_SCHED being cleared and the qdisc running, right?

>> From my understanding, we can do anything with the old qdisc (including
>> destroying the old qdisc) after some_qdisc_is_busy() returns false.
>
> But the current code does the reset _before_ some_qdisc_is_busy(). ;)

If the lock is taken when doing the reset, it does not matter whether the reset is before some_qdisc_is_busy(), right?

> Thanks.