Subject: Re: [Linuxarm] [PATCH net] net: sched: fix packet stuck problem for lockless qdisc
From: Yunsheng Lin
Date: Fri, 19 Mar 2021 17:25:46 +0800
References: <1616050402-37023-1-git-send-email-linyunsheng@huawei.com>
In-Reply-To: <1616050402-37023-1-git-send-email-linyunsheng@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org
On 2021/3/18 14:53, Yunsheng Lin wrote:
> Lockless qdisc has the following concurrency problem:
>
>        cpu0                      cpu1
>         .                          .
>     q->enqueue                     .
>         .                          .
>     qdisc_run_begin()              .
>         .                          .
>     dequeue_skb()                  .
>         .                          .
>     sch_direct_xmit()              .
>         .                          .
>         .                     q->enqueue
>         .                   qdisc_run_begin()
>         .                 return and do nothing
>         .                          .
>     qdisc_run_end()                .
>
> cpu1 enqueues an skb without calling __qdisc_run(), because cpu0
> has not released the lock yet and spin_trylock() returns false
> for cpu1 in qdisc_run_begin(); and cpu0 does not see the skb
> enqueued by cpu1 when calling dequeue_skb(), because cpu1 may
> enqueue the skb after cpu0 has called dequeue_skb() and before
> cpu0 calls qdisc_run_end().
>
> Lockless qdisc has another concurrency problem when tx_action
> is involved:
>
>   cpu0(serving tx_action)        cpu1              cpu2
>             .                      .                 .
>             .                 q->enqueue             .
>             .               qdisc_run_begin()        .
>             .                 dequeue_skb()          .
>             .                      .             q->enqueue
>             .                      .                 .
>             .                sch_direct_xmit()       .
>             .                      .          qdisc_run_begin()
>             .                      .        return and do nothing
>             .                      .                 .
>   clear __QDISC_STATE_SCHED        .                 .
>   qdisc_run_begin()                .                 .
>   return and do nothing            .                 .
>             .                      .                 .
>             .                qdisc_run_begin()       .
>
> This patch fixes the above data race by:
> 1. Setting a flag after spin_trylock() returns false.
> 2. Retrying spin_trylock() in case the other CPU may not see the
>    new flag after it releases the lock.
> 3. Rescheduling if the flag is set after the lock is released
>    at the end of qdisc_run_end().
>
> For the tx_action case, the flag is also set when cpu1 is at the
> end of qdisc_run_begin(), so tx_action will be rescheduled
> again to dequeue the skb enqueued by cpu2.
>
> Also clear the flag before dequeuing in order to reduce the
> overhead of the above process, and avoid doing the heavy
> test_and_clear_bit() at the end of qdisc_run_end().
>
> Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
> Signed-off-by: Yunsheng Lin
> ---
> For those who have not been following the qdisc scheduling
> discussion: there is a packet stuck problem for lockless qdisc,
> see [1], and I have also done some cleanup and added some
> enhanced features, see [2] [3].
> While I was doing the optimization for lockless qdisc, it
> occurred to me that these optimizations are useless if there is
> still a basic bug in lockless qdisc, even if the bug is not
> easily reproducible. So looking through [1] again, I found the
> data race for tx_action mentioned by Michael, thought deeply
> about it, and came up with this patch trying to fix it.
>
> So I would really appreciate it if someone who still has the
> reproducer could try this patch and report back.

I have done some performance tests to see if there is value in
fixing the packet stuck problem and supporting lockless qdisc
bypass. Here are some results using pktgen in 'queue_xmit' mode
on a dummy device, as Paolo Abeni had done in [1], with the
pfifo_fast qdisc:

 threads    vanilla    locked-qdisc    vanilla+this_patch
    1       2.6Mpps      2.9Mpps           2.5Mpps
    2       3.9Mpps      4.8Mpps           3.6Mpps
    4       5.6Mpps      3.0Mpps           4.7Mpps
    8       2.7Mpps      1.6Mpps           2.8Mpps
   16       2.2Mpps      1.3Mpps           2.3Mpps

locked-qdisc: tested by removing "TCQ_F_NOLOCK | TCQ_F_CPUSTATS".
And adding the lockless qdisc bypass patch and other optimizations
on top of this patch:

 threads    patch_set_1    patch_set_2    patch_set_3
    1         2.5Mpps        3.0Mpps        3.0Mpps
    2         3.6Mpps        4.1Mpps        5.3Mpps
    4         4.7Mpps        4.6Mpps        5.1Mpps
    8         2.8Mpps        2.6Mpps        2.7Mpps
   16         2.3Mpps        2.2Mpps        2.2Mpps

patch_set_1: vanilla + this_patch
patch_set_2: vanilla + this_patch + lockless_qdisc_bypass_patch
patch_set_3: vanilla + this_patch + lockless_qdisc_bypass_patch +
             remove_seq_operation_for_lockless_qdisc_optimization +
             check_rc_before_calling_qdisc_run()_optimization +
             spin_trylock()_retry_optimization.

So with all the fixes and optimizations added together, the lockless
qdisc has better performance than vanilla except for the 4-thread
case, which shows about 9% performance degradation compared with the
vanilla one, but is still better than the locked-qdisc.

>
> 1. https://lore.kernel.org/netdev/d102074f-7489-e35a-98cf-e2cad7efd8a2@netrounds.com/t/#ma7013a79b8c4d8e7c49015c724e481e6d5325b32
> 2. https://patchwork.kernel.org/project/netdevbpf/patch/1615777818-13969-1-git-send-email-linyunsheng@huawei.com/
> 3. https://patchwork.kernel.org/project/netdevbpf/patch/1615800610-34700-1-git-send-email-linyunsheng@huawei.com/
>
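
To make the three steps in the quoted changelog easier to follow, here
is a rough, illustrative sketch of how the lockless (TCQ_F_NOLOCK) path
of qdisc_run_begin()/qdisc_run_end() could look with such a flag. This
is not the actual diff from the patch: the flag name
__QDISC_STATE_MISSED and the surrounding details are assumptions made
only for illustration, and the locked-qdisc paths are elided.

static inline bool qdisc_run_begin(struct Qdisc *qdisc)
{
	if (!(qdisc->flags & TCQ_F_NOLOCK))
		return true;	/* locked-qdisc path elided in this sketch */

	if (spin_trylock(&qdisc->seqlock))
		return true;

	/* step 1: trylock failed, mark that a run was missed so the
	 * current lock holder knows there may be more work queued
	 */
	set_bit(__QDISC_STATE_MISSED, &qdisc->state);

	/* step 2: retry once, in case the holder released the lock
	 * before it could observe the flag set above
	 */
	return spin_trylock(&qdisc->seqlock);
}

static inline void qdisc_run_end(struct Qdisc *qdisc)
{
	if (!(qdisc->flags & TCQ_F_NOLOCK))
		return;		/* locked-qdisc path elided in this sketch */

	spin_unlock(&qdisc->seqlock);

	/* step 3: another cpu failed qdisc_run_begin() while we held
	 * the lock; reschedule so its skb is not left stuck
	 */
	if (unlikely(test_bit(__QDISC_STATE_MISSED, &qdisc->state)))
		__netif_schedule(qdisc);
}

In such a scheme the flag would be cleared before dequeueing (for
example at the start of dequeue_skb()), so the common path only pays a
test_bit() in qdisc_run_end() rather than a heavier
test_and_clear_bit(), matching the changelog's last point.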