Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp51366pxf; Wed, 24 Mar 2021 20:42:13 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyjZUwaTVIQ5jautqf9surOvrnbe0lD49X+vYAev2FJoZmTLEk2+c7tnU4zS9ZlofZWA9V1 X-Received: by 2002:aa7:cc94:: with SMTP id p20mr6870760edt.353.1616643733569; Wed, 24 Mar 2021 20:42:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1616643733; cv=none; d=google.com; s=arc-20160816; b=qgVrGE12/qYAceij/IJNU7k0DNcjSSrcXL0MV+PxMT9Swp19F9HDoz30ZmtLKWgQkT RjVo6AXECvTtxlnW1hjAFCJ3oOkwdnsjw2jGo38W5SyaJxBr7FKZqZYFKed7QCMSwWD3 a+VnuToYSpQp5L8uBAK/ZdWJUZHhpkWa6qrRwVawt9+7sL4y/CP319zqk25etfB9tscq HJ6HsCVhT/irgxXx4EcLrzz16nSE3ZUVceB4NFOsyRExLTypdQ6ITMqnJfN9w2fMVXz6 L6VdMDdKp+/qmczGqqlsqBoH3+eVpcGo5x1AceVfAaPvhxxbRmNdfi388y1M5Rkst3bb iZow== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject; bh=4hqZuLZLD0ZxcU0Va268C/8xwyRSwGWiQ3wXm1UNF5M=; b=CQzdZtzSxqNDIk7Rli4wgnkufaluj/7eaGMEh4JqFTr21xLHsyMxHFn8vBmPFne44X 8AMJU2F1yEFGjzT869hk+nb+dXNrKi9P6B2+2tLMSIIEsLooBeG7w6HnC6q+hl58hU3R oB6ypR71mNfpgHE19WF++YVJBOfdmMAl4kzZSuFcJOkvZ+lUYY+2dNzH31MIm/l3WJUe 9oo+QZxco7ICGa9az6l4BPh5NUpv7aRvGYWA0ynm2QFjFtiCxuWG/wV00fR7uXK6v/mQ KB3weAGBuBEF7Ro/N2C05FqotH3sUmoVyg2V+Bc6NT6ZS/Ayl+lmp3PoE0B5cxfyCVNo BDHA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id br23si3174258ejb.740.2021.03.24.20.41.44; Wed, 24 Mar 2021 20:42:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233962AbhCYCJF (ORCPT + 99 others); Wed, 24 Mar 2021 22:09:05 -0400 Received: from szxga02-in.huawei.com ([45.249.212.188]:3499 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233752AbhCYCIv (ORCPT ); Wed, 24 Mar 2021 22:08:51 -0400 Received: from DGGEML403-HUB.china.huawei.com (unknown [172.30.72.57]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4F5T4f3KKvzRTnV; Thu, 25 Mar 2021 10:06:58 +0800 (CST) Received: from dggpemm500005.china.huawei.com (7.185.36.74) by DGGEML403-HUB.china.huawei.com (10.3.17.33) with Microsoft SMTP Server (TLS) id 14.3.498.0; Thu, 25 Mar 2021 10:08:48 +0800 Received: from [127.0.0.1] (10.69.30.204) by dggpemm500005.china.huawei.com (7.185.36.74) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256) id 15.1.2106.2; Thu, 25 Mar 2021 10:08:48 +0800 Subject: Re: [PATCH net v2] net: sched: fix packet stuck problem for lockless qdisc To: Cong Wang CC: David Miller , Jakub Kicinski , Vladimir Oltean , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Eric Dumazet , Wei Wang , "Cong Wang ." , Taehee Yoo , "Linux Kernel Network Developers" , LKML , , Marc Kleine-Budde , , Jamal Hadi Salim , Jiri Pirko , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , , bpf , Jonas Bonn , Paolo Abeni , Michael Zhivich , Josh Hunt , "Jike Song" , Kehuan Feng , Ahmad Fatoum , , Alexander Duyck References: <1616552677-39016-1-git-send-email-linyunsheng@huawei.com> From: Yunsheng Lin Message-ID: <364d994a-9234-fe52-a8ad-ab17934e6205@huawei.com> Date: Thu, 25 Mar 2021 10:08:47 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.69.30.204] X-ClientProxiedBy: dggeme711-chm.china.huawei.com (10.1.199.107) To dggpemm500005.china.huawei.com (7.185.36.74) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2021/3/25 3:20, Cong Wang wrote: > On Tue, Mar 23, 2021 at 7:24 PM Yunsheng Lin wrote: >> @@ -176,8 +207,23 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc) >> static inline void qdisc_run_end(struct Qdisc *qdisc) >> { >> write_seqcount_end(&qdisc->running); >> - if (qdisc->flags & TCQ_F_NOLOCK) >> + if (qdisc->flags & TCQ_F_NOLOCK) { >> spin_unlock(&qdisc->seqlock); >> + >> + /* qdisc_run_end() is protected by RCU lock, and >> + * qdisc reset will do a synchronize_net() after >> + * setting __QDISC_STATE_DEACTIVATED, so testing >> + * the below two bits separately should be fine. > > Hmm, why synchronize_net() after setting this bit is fine? It could > still be flipped right after you test RESCHEDULE bit. That depends on when it will be fliped again. As I see: 1. __QDISC_STATE_DEACTIVATED is set during dev_deactivate() process, which should also wait for all process related to "test_bit( __QDISC_STATE_NEED_RESCHEDULE, &q->state)" to finish by calling synchronize_net() and checking some_qdisc_is_busy(). 2. it is cleared during dev_activate() process. And dev_deactivate() and dev_activate() is protected by RTNL lock, or serialized by linkwatch. > > >> + * For qdisc_run() in net_tx_action() case, we >> + * really should provide rcu protection explicitly >> + * for document purposes or PREEMPT_RCU. >> + */ >> + if (unlikely(test_bit(__QDISC_STATE_NEED_RESCHEDULE, >> + &qdisc->state) && >> + !test_bit(__QDISC_STATE_DEACTIVATED, >> + &qdisc->state))) > > Why do you want to test __QDISC_STATE_DEACTIVATED bit at all? > dev_deactivate_many() will wait for those scheduled but being > deactivated, so what's the problem of scheduling it even with this bit? The problem I tried to fix is: CPU0(calling dev_deactivate) CPU1(calling qdisc_run_end) CPU2(calling tx_atcion) . __netif_schedule() . . set __QDISC_STATE_SCHED . . . . clear __QDISC_STATE_DEACTIVATED . . synchronize_net() . . . . . . . clear __QDISC_STATE_SCHED . . . some_qdisc_is_busy() return false . . . . . . . qdisc_run() some_qdisc_is_busy() checks if the qdisc is busy by checking __QDISC_STATE_SCHED and spin_is_locked(&qdisc->seqlock) for lockless qdisc, and some_qdisc_is_busy() return false for CPU0 because CPU2 has cleared the __QDISC_STATE_SCHED and has not taken the qdisc->seqlock yet, qdisc is clearly still busy when qdisc_run() is run by CPU2 later. So you are right, testing __QDISC_STATE_DEACTIVATED does not completely solve the above data race, and there are __netif_schedule() called by dev_requeue_skb() and __qdisc_run() too, which need the same fixing. So will remove the __QDISC_STATE_DEACTIVATED testing for this patch first, and deal with it later. > > Thanks. > > . >