Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp2202073pxj; Sun, 30 May 2021 17:45:52 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwoRUsnCbrp5nM6vObRVH6Z3xnl2r0qbaoQ5bgjvvL+t7gKdrANB5aoSsOL6hVQopJUszHg X-Received: by 2002:a17:906:55cb:: with SMTP id z11mr8110103ejp.475.1622421951817; Sun, 30 May 2021 17:45:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1622421951; cv=none; d=google.com; s=arc-20160816; b=GBSwCXeO0fAvc9Qb7YSITdOQ481Eo/wghLyyY3fDGVGyAi/zJemrylvKVXdI6IRkt7 C3lnZEvHaI7WggZ4y4Ow5fGGpaZ8NrRKj1jbmKu+AIn4GVdf3TLQL60h7WAEjEmw9I1B EXd1xh7+RJChMDnRDJNodV+lqNVyMOGV2SCHoKdxH2dlYR80X6D5OTelQbvy9wnWTDRg Vi2PTrNy+D0wI26m9NFUXApRgYMBEON3ACKQHoYygtXFWv75gJ9YOOUUq0ypqYNUzNCu oH+976xe9ZUlwyteYLUHYEmT9xydmqYFR90zKqb8SW81MXWOpE7AvFRTfPmWjyp7PY9A jmPA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject; bh=tLniVgTlhlY7xYM+PYnGg8ESq8vFNco++WRr8V8im74=; b=vg4vTEMDTnEyRbqVgc1rdQyHTs07rMkeBaGVf/cJyp302Rg32Y9Dk5pP4XWl2ax6o0 yrVS+NT5GjhqRhGkrLmi+DNgvCeTR6bt/cFrZN95zUbm2DNZg6MX7cIUjdZIXgfq46qY /RfDDg7oBTPDF1TCEgTMPFaTOxNxlRKIuvhBoUOGSBrvvMm/yn9DeIE5d2grJCvTh5pI 6Ad7DKaDWLjkLzr3Gw0e1WEsLo2K+75oHzX491jIEXv25i1rvprLnTZ6HQQGN61amK9l TK4DpphstZ87KmG6fa9JPvRW+nCVRB8bP4Cc53Ux/MgU7nnUSy+My6DvsowSMGcqswsG rTiw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id z4si12274148ejp.321.2021.05.30.17.45.09; Sun, 30 May 2021 17:45:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229961AbhEaAmE (ORCPT + 99 others); Sun, 30 May 2021 20:42:04 -0400 Received: from szxga01-in.huawei.com ([45.249.212.187]:2107 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229887AbhEaAmA (ORCPT ); Sun, 30 May 2021 20:42:00 -0400 Received: from dggemv704-chm.china.huawei.com (unknown [172.30.72.57]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4FtbtL4y58zWpJM; Mon, 31 May 2021 08:35:38 +0800 (CST) Received: from dggpemm500005.china.huawei.com (7.185.36.74) by dggemv704-chm.china.huawei.com (10.3.19.47) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2176.2; Mon, 31 May 2021 08:40:15 +0800 Received: from [127.0.0.1] (10.69.30.204) by dggpemm500005.china.huawei.com (7.185.36.74) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256) id 15.1.2176.2; Mon, 31 May 2021 08:40:15 +0800 Subject: Re: [PATCH net-next 2/3] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc To: Jakub Kicinski , Yunsheng Lin CC: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , References: <1622170197-27370-1-git-send-email-linyunsheng@huawei.com> <1622170197-27370-3-git-send-email-linyunsheng@huawei.com> <20210528180012.676797d6@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net> <20210528213218.2b90864c@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net> <20210529114919.4f8b1980@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net> <9cc9f513-7655-07df-3c74-5abe07ae8321@gmail.com> <20210530132111.3a974275@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net> From: Yunsheng Lin Message-ID: <3c2fbc70-841f-d90b-ca13-1f058169be50@huawei.com> Date: Mon, 31 May 2021 08:40:14 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0 MIME-Version: 1.0 In-Reply-To: <20210530132111.3a974275@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.69.30.204] X-ClientProxiedBy: dggeme712-chm.china.huawei.com (10.1.199.108) To dggpemm500005.china.huawei.com (7.185.36.74) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2021/5/31 4:21, Jakub Kicinski wrote: > On Sun, 30 May 2021 09:37:09 +0800 Yunsheng Lin wrote: >> On 2021/5/30 2:49, Jakub Kicinski wrote: >>> The fact that MISSED is only cleared under q->seqlock does not matter, >>> because setting it and ->enqueue() are not under any lock. If the thread >>> gets interrupted between: >>> >>> if (q->flags & TCQ_F_CAN_BYPASS && nolock_qdisc_is_empty(q) && >>> qdisc_run_begin(q)) { >>> >>> and ->enqueue() we can't guarantee that something else won't come in, >>> take q->seqlock and clear MISSED. >>> >>> thread1 thread2 thread3 >>> # holds seqlock >>> qdisc_run_begin(q) >>> set(MISSED) >>> pfifo_fast_dequeue >>> clear(MISSED) >>> # recheck the queue >>> qdisc_run_end() >>> ->enqueue() >>> q->flags & TCQ_F_CAN_BYPASS.. >>> qdisc_run_begin() # true >>> sch_direct_xmit() >>> qdisc_run_begin() >>> set(MISSED) >>> >>> Or am I missing something? >>> >>> Re-checking nolock_qdisc_is_empty() may or may not help. >>> But it doesn't really matter because there is no ordering >>> requirement between thread2 and thread3 here. >> >> I were more focued on explaining that using MISSED is reliable >> as sch_may_need_requeuing() checking in RFCv3 [1] to indicate a >> empty qdisc, and forgot to mention the data race described in >> RFCv3, which is kind of like the one described above: >> >> "There is a data race as below: >> >> CPU1 CPU2 >> qdisc_run_begin(q) . >> . q->enqueue() >> sch_may_need_requeuing() . >> return true . >> . . >> . . >> q->enqueue() . >> >> When above happen, the skb enqueued by CPU1 is dequeued after the >> skb enqueued by CPU2 because sch_may_need_requeuing() return true. >> If there is not qdisc bypass, the CPU1 has better chance to queue >> the skb quicker than CPU2. >> >> This patch does not take care of the above data race, because I >> view this as similar as below: >> >> Even at the same time CPU1 and CPU2 write the skb to two socket >> which both heading to the same qdisc, there is no guarantee that >> which skb will hit the qdisc first, becuase there is a lot of >> factor like interrupt/softirq/cache miss/scheduling afffecting >> that." >> >> Does above make sense? Or any idea to avoid it? >> >> 1. https://patchwork.kernel.org/project/netdevbpf/patch/1616404156-11772-1-git-send-email-linyunsheng@huawei.com/ > > We agree on this one. > > Could you draw a sequence diagram of different CPUs (like the one > above) for the case where removing re-checking nolock_qdisc_is_empty() > under q->seqlock leads to incorrect behavior? When nolock_qdisc_is_empty() is not re-checking under q->seqlock, we may have: CPU1 CPU2 qdisc_run_begin(q) . . enqueue skb1 deuqueue skb1 and clear MISSED . . nolock_qdisc_is_empty() return true requeue skb . q->enqueue() . set MISSED . . . qdisc_run_end(q) . . qdisc_run_begin(q) . transmit skb2 directly . transmit the requeued skb1 The problem here is that skb1 and skb2 are from the same CPU, which means they are likely from the same flow, so we need to avoid this, right? > > If there is no such case would you be willing to repeat the benchmark > with and without this test? > > Sorry for dragging the review out.. > > . >