Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp1010685pxj; Sat, 29 May 2021 00:06:03 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwK7qEedan3LVUZXSqaVpBAY8sqNo/HE/QNTj1yHQkt5iBLZV298P/HDL7mWTA8/Zi7pSAj X-Received: by 2002:a05:6e02:156d:: with SMTP id k13mr10017890ilu.149.1622271962906; Sat, 29 May 2021 00:06:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1622271962; cv=none; d=google.com; s=arc-20160816; b=pqsiPV64Vsfh+Sv/hpWNhO/9KaZwJsL7fNBrrj/8xvZbHInfgjUHo6oaKGgOr3oQmY iGxoDN22dX3Oe8kQeFBueVnhKtun0Jl3DX7Wh5im58QM8MIoD79faJ6IATpeZxOnVjFo 33Brg3IvbEKcHehidnoE/UvX6+sm70sR+MbHltZ5xJQE++0KSHTGqQb26GMBdWiySpUr xPeKbdmV4Xhseu1/ldgITR1KXur+WXfjloVCSloKMj7SVaZj71Z/+MCPLk4c7lGwpt7l fJtVnwF6DtIQecSv14YnzdoVhI8cXDEQSqBUggm8mATIFs9NkzwbpicM6vo5bpk8yUwU 07hw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject; bh=Vx6vyn9/+Fp7y0szI7ikBQwcSAEg0AVOUJZ22FJO0aE=; b=V87wtRxu7Osb56xSGLSwYDQ0fxtFea6koEA7MiPyUfs7Gp2U0ZgG9pRlmwH23nNVCJ ILeiGFuzzNNPe91Ah/byDYPGUTRf9EslLtc33HPzdQn13uDS66EJv40+JpgYlsl+DEI8 aIIbVN2i8Zf3x+yEwDwgeTXIadSLNdhAT8rCs4ObnxfTxkwew9cTEYwTzwwzrGrzkb8B jctAm7Nilj5s33+QGj9yJs9qDH3hqLZ7k8UhFVfWjTM5CeUJMkBTtu/pYF0MqQZCUJGh Qfx+XAzIqTMvj075pXUthRyZQanjwqNJ2WkPLthEyUe6M9ArUR0W8RjFxeHnG7m27SIv ceSA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id j11si7609401ila.145.2021.05.29.00.05.24; Sat, 29 May 2021 00:06:02 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229620AbhE2HEv (ORCPT + 99 others); Sat, 29 May 2021 03:04:51 -0400 Received: from szxga03-in.huawei.com ([45.249.212.189]:2403 "EHLO szxga03-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229559AbhE2HEu (ORCPT ); Sat, 29 May 2021 03:04:50 -0400 Received: from dggemv711-chm.china.huawei.com (unknown [172.30.72.56]) by szxga03-in.huawei.com (SkyGuard) with ESMTP id 4FsXVC420qz66Rp; Sat, 29 May 2021 14:59:31 +0800 (CST) Received: from dggpemm500005.china.huawei.com (7.185.36.74) by dggemv711-chm.china.huawei.com (10.1.198.66) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2176.2; Sat, 29 May 2021 15:03:11 +0800 Received: from [127.0.0.1] (10.69.30.204) by dggpemm500005.china.huawei.com (7.185.36.74) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256) id 15.1.2176.2; Sat, 29 May 2021 15:03:10 +0800 Subject: Re: [PATCH net-next 2/3] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc To: Jakub Kicinski CC: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , References: <1622170197-27370-1-git-send-email-linyunsheng@huawei.com> <1622170197-27370-3-git-send-email-linyunsheng@huawei.com> <20210528180012.676797d6@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net> <20210528213218.2b90864c@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net> From: Yunsheng Lin Message-ID: Date: Sat, 29 May 2021 15:03:09 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0 MIME-Version: 1.0 In-Reply-To: <20210528213218.2b90864c@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.69.30.204] X-ClientProxiedBy: dggeme719-chm.china.huawei.com (10.1.199.115) To dggpemm500005.china.huawei.com (7.185.36.74) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2021/5/29 12:32, Jakub Kicinski wrote: > On Sat, 29 May 2021 09:44:57 +0800 Yunsheng Lin wrote: >>>> @@ -3852,10 +3852,32 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q, >>>> qdisc_calculate_pkt_len(skb, q); >>>> >>>> if (q->flags & TCQ_F_NOLOCK) { >>>> + if (q->flags & TCQ_F_CAN_BYPASS && nolock_qdisc_is_empty(q) && >>>> + qdisc_run_begin(q)) { >>>> + /* Retest nolock_qdisc_is_empty() within the protection >>>> + * of q->seqlock to ensure qdisc is indeed empty. >>>> + */ >>>> + if (unlikely(!nolock_qdisc_is_empty(q))) { >>> >>> This is just for the DRAINING case right? >>> >>> MISSED can be set at any moment, testing MISSED seems confusing. >> >> MISSED is only set when there is lock contention, which means it >> is better not to do the qdisc bypass to avoid out of order packet >> problem, > > Avoid as in make less likely? Nothing guarantees other thread is not > interrupted after ->enqueue and before qdisc_run_begin(). > > TBH I'm not sure what out-of-order situation you're referring to, > there is no ordering guarantee between separate threads trying to > transmit AFAIU. A thread need to do the bypass checking before doing enqueuing, so it means MISSED is set or the trylock fails for the bypass transmiting( which will set the MISSED after the first trylock), so the MISSED will always be set before a thread doing a enqueuing, and we ensure MISSED only be cleared during the protection of q->seqlock, after clearing MISSED, we do anther round of dequeuing within the protection of q->seqlock. So if a thread has taken the q->seqlock and the MISSED is not set yet, it is allowed to send the packet directly without going through the qdisc enqueuing and dequeuing process. > > IOW this check is not required for correctness, right? if a thread has taken the q->seqlock and the MISSED is not set, it means other thread has not set MISSED after the first trylock and before the second trylock, which means the enqueuing is not done yet. So I assume the this check is required for correctness if I understand your question correctly. > >> another good thing is that we could also do the batch >> dequeuing and transmiting of packets when there is lock contention. > > No doubt, but did you see the flag get set significantly often here > to warrant the double-checking? No, that is just my guess:) > >>> Is it really worth the extra code? >> >> Currently DRAINING is only set for the netdev queue stopped. >> We could only use DRAINING to indicate the non-empty of a qdisc, >> then we need to set the DRAINING evrywhere MISSED is set, that is >> why I use both DRAINING and MISSED to indicate a non-empty qdisc.