Received: by 2002:a05:6a10:17d3:0:0:0:0 with SMTP id hz19csp2657107pxb; Tue, 13 Apr 2021 07:12:04 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxqsaxWDLV5twLeG+RAvXSPUqAI2bR/gF82vKaT6uJVmGKcbfbr3uaZCNJrxMFPaUP6Aybn X-Received: by 2002:a05:6402:350f:: with SMTP id b15mr34799214edd.6.1618323123853; Tue, 13 Apr 2021 07:12:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1618323123; cv=none; d=google.com; s=arc-20160816; b=XNVzAFU6O5E1BNvxVf7i9ftyRC3CE08xkSPgbSUYBCUBgjDtmcqSRWjKJXFcneJt0j 8CzxIA1Tp8ZhT/9BtmN1OMeYtK3NUf4iJDO0ErMBXmZXjd/XkG92tWceZvjojZ3lIakC q+oB9iAi8ecXe5Dkoyx89BSla+tI8ZSbUyMKgZ5JoTcYHpyeh/0qGg6VGmOdq5uzsAfY xf7aOsVsj0TCmGZDY3Sb71eWeurLJ0HxXGdR0j+GkgMlLrygJyNphbPOe4BOk9fTjWX+ BONo4hNdxcxmficsKB/4bF7UvEdCJn/+W9ae3rOwy8SzTapv2fkYGqxMsHPaZjTo0Ye+ xELw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject; bh=6dGUgHuk4mge9S/rIw7z/iiAdlm9jmJ2aVkLxreDiec=; b=N36k43RLzaPtbci4ZMB6rOMPTtTdgOT9QaL3yZ4RAfaYnIt0NzIyK8jIHFsmt9gwt7 segIq1/1V/oFJJegfLNGKBGIvgZJk1THZtwSRhHrHOkY12FhajMxObdF21OYlBf8w+iw ErvBLLBpAxpfwMwD135J43vY+z0L6Z7tYY8kYXAl6UuH0ne/xvOiZNn47ABUw2LBI8i2 qFzdQqL89KWRyL/Y2KJGxgVJNOvTL23bui+M9tLZV0mxPWLopWRYes4x6grvjwTC1Ir1 vzLuxJIuPL4MYFuSifb665sKocptvHxsNRUuWua30AqApbHhtn+jn2mAuRzVHqpGGuHQ FV1w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g20si10591297ejz.520.2021.04.13.07.11.39; Tue, 13 Apr 2021 07:12:03 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229960AbhDMJE4 (ORCPT + 99 others); Tue, 13 Apr 2021 05:04:56 -0400 Received: from szxga02-in.huawei.com ([45.249.212.188]:3527 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244480AbhDMJEJ (ORCPT ); Tue, 13 Apr 2021 05:04:09 -0400 Received: from DGGEML401-HUB.china.huawei.com (unknown [172.30.72.57]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4FKKNP5ymYzRdpd; Tue, 13 Apr 2021 17:01:41 +0800 (CST) Received: from dggpemm500005.china.huawei.com (7.185.36.74) by DGGEML401-HUB.china.huawei.com (10.3.17.32) with Microsoft SMTP Server (TLS) id 14.3.498.0; Tue, 13 Apr 2021 17:03:31 +0800 Received: from [127.0.0.1] (10.69.30.204) by dggpemm500005.china.huawei.com (7.185.36.74) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256) id 15.1.2106.2; Tue, 13 Apr 2021 17:03:30 +0800 Subject: Re: [PATCH net v3] net: sched: fix packet stuck problem for lockless qdisc To: Hillf Danton CC: Juergen Gross , , , Jiri Kosina References: <1616641991-14847-1-git-send-email-linyunsheng@huawei.com> <20210409090909.1767-1-hdanton@sina.com> <20210412032111.1887-1-hdanton@sina.com> <20210412072856.2046-1-hdanton@sina.com> <20210413022129.2203-1-hdanton@sina.com> <20210413032620.2259-1-hdanton@sina.com> <20210413071241.2325-1-hdanton@sina.com> <20210413083352.2424-1-hdanton@sina.com> From: Yunsheng Lin Message-ID: <1cd37014-4b2a-b82c-0cfc-6beffb8d36de@huawei.com> Date: Tue, 13 Apr 2021 17:03:30 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0 MIME-Version: 1.0 In-Reply-To: <20210413083352.2424-1-hdanton@sina.com> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.69.30.204] X-ClientProxiedBy: dggeme713-chm.china.huawei.com (10.1.199.109) To dggpemm500005.china.huawei.com (7.185.36.74) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2021/4/13 16:33, Hillf Danton wrote: > On Tue, 13 Apr 2021 15:57:29 Yunsheng Lin wrote: >> On 2021/4/13 15:12, Hillf Danton wrote: >>> On Tue, 13 Apr 2021 11:34:27 Yunsheng Lin wrote: >>>> On 2021/4/13 11:26, Hillf Danton wrote: >>>>> On Tue, 13 Apr 2021 10:56:42 Yunsheng Lin wrote: >>>>>> On 2021/4/13 10:21, Hillf Danton wrote: >>>>>>> On Mon, 12 Apr 2021 20:00:43 Yunsheng Lin wrote: >>>>>>>> >>>>>>>> Yes, the below patch seems to fix the data race described in >>>>>>>> the commit log. >>>>>>>> Then what is the difference between my patch and your patch below:) >>>>>>> >>>>>>> Hehe, this is one of the tough questions over a bounch of weeks. >>>>>>> >>>>>>> If a seqcount can detect the race between skb enqueue and dequeue then we >>>>>>> cant see any excuse for not rolling back to the point without NOLOCK. >>>>>> >>>>>> I am not sure I understood what you meant above. >>>>>> >>>>>> As my understanding, the below patch is essentially the same as >>>>>> your previous patch, the only difference I see is it uses qdisc->pad >>>>>> instead of __QDISC_STATE_NEED_RESCHEDULE. >>>>>> >>>>>> So instead of proposing another patch, it would be better if you >>>>>> comment on my patch, and make improvement upon that. >>>>>> >>>>> Happy to do that after you show how it helps revert NOLOCK. >>>> >>>> Actually I am not going to revert NOLOCK, but add optimization >>>> to it if the patch fixes the packet stuck problem. >>>> >>> Fix is not optimization, right? >> >> For this patch, it is a fix. >> In case you missed it, I do have a couple of idea to optimize the >> lockless qdisc: >> >> 1. RFC patch to add lockless qdisc bypass optimization: >> >> https://patchwork.kernel.org/project/netdevbpf/patch/1616404156-11772-1-git-send-email-linyunsheng@huawei.com/ >> >> 2. implement lockless enqueuing for lockless qdisc using the idea >> from Jason and Toke. And it has a noticable proformance increase with >> 1-4 threads running using the below prototype based on ptr_ring. >> >> static inline int __ptr_ring_multi_produce(struct ptr_ring *r, void *ptr) >> { >> >> int producer, next_producer; >> >> >> do { >> producer = READ_ONCE(r->producer); >> if (unlikely(!r->size) || r->queue[producer]) >> return -ENOSPC; >> next_producer = producer + 1; >> if (unlikely(next_producer >= r->size)) >> next_producer = 0; >> } while(cmpxchg_relaxed(&r->producer, producer, next_producer) != producer); >> >> /* Make sure the pointer we are storing points to a valid data. */ >> /* Pairs with the dependency ordering in __ptr_ring_consume. */ >> smp_wmb(); >> >> WRITE_ONCE(r->queue[producer], ptr); >> return 0; >> } >> >> 3. Maybe it is possible to remove the netif_tx_lock for lockless qdisc >> too, because dev_hard_start_xmit is also in the protection of >> qdisc_run_begin()/qdisc_run_end()(if there is only one qdisc using >> a netdev queue, which is true for pfifo_fast, I believe). >> >> 4. Remove the qdisc->running seqcount operation for lockless qdisc, which >> is mainly used to do heuristic locking on q->busylock for locked qdisc. >> > > Sounds good. They can stand two months, cant they? >>> >>>> Is there any reason why you want to revert it? >>>> >>> I think you know Jiri's plan and it would be nice to wait a couple of >>> months for it to complete. >> >> I am not sure I am aware of Jiri's plan. >> Is there any link referring to the plan? >> > https://lore.kernel.org/lkml/eaff25bc-9b64-037e-b9bc-c06fc4a5a9fb@huawei.com/ I think there is some misunderstanding here. As my understanding, Jiri and Juergen are from the same team(using the suse.com mail server). what Jiri said about "I am still planning to have Yunsheng Lin's (CCing) fix [1] tested in the coming days." is that Juergen has done the test and provide a "Tested-by" tag. So if this patch fixes the packet stuck problem, Jiri is ok with NOLOCK qdisc too. Or do I misunderstand it again? Perhaps Jiri and Juergen can help to clarify this? > > . >