Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp187153pxk; Wed, 2 Sep 2020 18:52:13 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxicKgY9g5ZPfUc+02iHmIHEPlzlOMAKNh+/MhK9BRUveu+5PZYAI1sDnGdq5QH23KyJCO5 X-Received: by 2002:a17:906:5902:: with SMTP id h2mr989490ejq.423.1599097933684; Wed, 02 Sep 2020 18:52:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1599097933; cv=none; d=google.com; s=arc-20160816; b=POcGknKEGpcfMKUXJPtmYRNE6z0Rl2OeuHIbrp+NfTqLRcgLsp7rYcUxU9kalZWoRL I85+GuQqGRbUaReRSHTeuX1YE5Iu8VMTA/+JFBlClIKs23NtCItTGEHYSgmzr+PJa59a Bt9RG8EJGCyO+MUvZG/ZUs1MybajayjjVr0XGfKixqxXiXNcdK6bQqS3+pLbLkXLgkaW slK2IJ6ErFzOb/u4DgwRP/LRiRXTWwjyh+E7Agoaw2FJJdun258deD62nZ0gNdFIyVZX iKTPqjuKe0JdK52dPPxvp2uez3WMcaLrThzdso/T5AI5FirMZLZu157hSU5SsKwo4PSj XxlA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=A041r2jEFTomfDx8u7avd16rEk4eRvYegHw5wHbWQMg=; b=KQRAybuk0jgikziloLMDZBwWJVqYhl9K3BUXRTI+Ms68QHjCyC7IZYOsDWB85daOMc IELwk0VDWnJoBon4Jki+wAI98X26mBHHT8lPZTnPpL7OmAsr1FIQAvx7UTfF1fdHAj9K gCfqIH6hoBCG521A4DOhZMeaXUczAo2LT+2NBpSVjrHsahSaAalByoni7MRsaEy9Qu5b TGAZU+8lX9GpCXLH3aA19Yg7IwhWIa18Q3/bYLDZxIiuq0wfklmIDNPwbkbdst/1SnbQ 75DrN3lpaH6pAvdBBovWmKG9tH7QQwym+8+iOFFHWQDWdb3pq7/HcjM0MxRRUBwBtLwy wWjQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=gaDMGyt8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id p21si766241eds.193.2020.09.02.18.51.51; Wed, 02 Sep 2020 18:52:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=gaDMGyt8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726936AbgICBtH (ORCPT + 99 others); Wed, 2 Sep 2020 21:49:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60538 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726177AbgICBtE (ORCPT ); Wed, 2 Sep 2020 21:49:04 -0400 Received: from mail-io1-xd43.google.com (mail-io1-xd43.google.com [IPv6:2607:f8b0:4864:20::d43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 57354C061244; Wed, 2 Sep 2020 18:49:04 -0700 (PDT) Received: by mail-io1-xd43.google.com with SMTP id b6so1023319iof.6; Wed, 02 Sep 2020 18:49:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=A041r2jEFTomfDx8u7avd16rEk4eRvYegHw5wHbWQMg=; b=gaDMGyt8CFZEoVtbbxWAN5JkAjcMRRu+pPuNEtpms4SNTjXMkIlwuCmglykMPmkbEI GbZGdG6yTHOQnfZWVdFbXTdvTcLUpvfVL43Um6QJAzl0Y4s/Nl1DYyf3mZBkyYSpUdA4 OhLI99hG2/RsLXjXzul2Z/LLduxwOsJszmmllCDBe/gwqZxeqis695H1yBj661g8ERbB J2k5yB/duPOVNpQxNL5gXrtNr7/YT1Vma7JCLr3A7By6kE6L9q7NM9nDfMCkxi8aPh20 7cop2Le0OMYh/3reNWDgQ1BZcuKrrrhrsxiFCjt4sCfjxGUbyVzTHQ+tsNvSoPcRKQLF KJ7g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=A041r2jEFTomfDx8u7avd16rEk4eRvYegHw5wHbWQMg=; b=W3sjdJX/2LBKz5wuAdyFGMCtO0tV3Ku9SYXUfJpCcYXFZreY6KM9W44bDPqkfWgpZL 1SehNKZLgvxPR733xriXWmB6LruWyVynGOqjN/wfsZzVJVu2rF+QWSgSW3yNphzDibon eW7LymCIEvac7Z6e+73LvhiKwBhIw4FNY2U2Nqhr7mWmQcnIlpKR2JDangMljdFdd2Bi yEu21zRdJTHXHwt8Vm8iLjPMxRvi/CJEE5jnagXkieUXLrgxCl7qNbsSm9zWdcjm3PbO OFNtvEatR1m0Uc+M83KjQlXx2bRwhJNnlI9h/NRerQAzMmxWemWtPe665GYWCuYAo6Vh KKtw== X-Gm-Message-State: AOAM533pn4ELA94VOsYPUHk/16tJHRx26boSyrGQjuaIRcOExdrx/T/W FbhOcBTuDvfOCKOJoinvQ/L90tj8UwN62I+JVaI= X-Received: by 2002:a5d:980f:: with SMTP id a15mr1070690iol.12.1599097743554; Wed, 02 Sep 2020 18:49:03 -0700 (PDT) MIME-Version: 1.0 References: <1598921718-79505-1-git-send-email-linyunsheng@huawei.com> <511bcb5c-b089-ab4e-4424-a83c6e718bfa@huawei.com> In-Reply-To: From: Cong Wang Date: Wed, 2 Sep 2020 18:48:52 -0700 Message-ID: Subject: Re: [PATCH net-next] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc To: Yunsheng Lin Cc: Jamal Hadi Salim , Jiri Pirko , David Miller , Jakub Kicinski , Linux Kernel Network Developers , LKML , linuxarm@huawei.com Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Sep 2, 2020 at 6:22 PM Yunsheng Lin wrote: > > On 2020/9/3 8:35, Cong Wang wrote: > > On Tue, Sep 1, 2020 at 11:35 PM Yunsheng Lin wrote: > >> > >> On 2020/9/2 12:41, Cong Wang wrote: > >>> On Tue, Sep 1, 2020 at 6:42 PM Yunsheng Lin wrote: > >>>> > >>>> On 2020/9/2 2:24, Cong Wang wrote: > >>>>> On Mon, Aug 31, 2020 at 5:59 PM Yunsheng Lin wrote: > >>>>>> > >>>>>> Currently there is concurrent reset and enqueue operation for the > >>>>>> same lockless qdisc when there is no lock to synchronize the > >>>>>> q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in > >>>>>> qdisc_deactivate() called by dev_deactivate_queue(), which may cause > >>>>>> out-of-bounds access for priv->ring[] in hns3 driver if user has > >>>>>> requested a smaller queue num when __dev_xmit_skb() still enqueue a > >>>>>> skb with a larger queue_mapping after the corresponding qdisc is > >>>>>> reset, and call hns3_nic_net_xmit() with that skb later. > >>>>> > >>>>> Can you be more specific here? Which call path requests a smaller > >>>>> tx queue num? If you mean netif_set_real_num_tx_queues(), clearly > >>>>> we already have a synchronize_net() there. > >>>> > >>>> When the netdevice is in active state, the synchronize_net() seems to > >>>> do the correct work, as below: > >>>> > >>>> CPU 0: CPU1: > >>>> __dev_queue_xmit() netif_set_real_num_tx_queues() > >>>> rcu_read_lock_bh(); > >>>> netdev_core_pick_tx(dev, skb, sb_dev); > >>>> . > >>>> . dev->real_num_tx_queues = txq; > >>>> . . > >>>> . . > >>>> . synchronize_net(); > >>>> . . > >>>> q->enqueue() . > >>>> . . > >>>> rcu_read_unlock_bh() . > >>>> qdisc_reset_all_tx_gt > >>>> > >>>> > >>> > >>> Right. > >>> > >>> > >>>> but dev->real_num_tx_queues is not RCU-protected, maybe that is a problem > >>>> too. > >>>> > >>>> The problem we hit is as below: > >>>> In hns3_set_channels(), hns3_reset_notify(h, HNAE3_DOWN_CLIENT) is called > >>>> to deactive the netdevice when user requested a smaller queue num, and > >>>> txq->qdisc is already changed to noop_qdisc when calling > >>>> netif_set_real_num_tx_queues(), so the synchronize_net() in the function > >>>> netif_set_real_num_tx_queues() does not help here. > >>> > >>> How could qdisc still be running after deactivating the device? > >> > >> qdisc could be running during the device deactivating process. > >> > >> The main process of changing channel number is as below: > >> > >> 1. dev_deactivate() > >> 2. hns3 handware related setup > >> 3. netif_set_real_num_tx_queues() > >> 4. netif_tx_wake_all_queues() > >> 5. dev_activate() > >> > >> During step 1, qdisc could be running while qdisc is resetting, so > >> there could be skb left in the old qdisc(which will be restored back to > >> txq->qdisc during dev_activate()), as below: > >> > >> CPU 0: CPU1: > >> __dev_queue_xmit(): dev_deactivate_many(): > >> rcu_read_lock_bh(); qdisc_deactivate(qdisc); > >> q = rcu_dereference_bh(txq->qdisc); . > >> netdev_core_pick_tx(dev, skb, sb_dev); . > >> . > >> . rcu_assign_pointer(dev_queue->qdisc, qdisc_default); > >> . . > >> . . > >> . . > >> . . > >> q->enqueue() . > > > > > > Well, like I said, if the deactivated bit were tested before ->enqueue(), > > there would be no packet queued after qdisc_deactivate(). > > Only if the deactivated bit testing is also protected by qdisc->seqlock? > otherwise there is still window between setting and testing the deactivated bit. Can you be more specific here? Why testing or setting a bit is not atomic? AFAIU, qdisc->seqlock is an optimization to replace __QDISC_STATE_RUNNING, which has nothing to do with deactivate bit. Thanks.