Date: Tue, 5 Jul 2022 23:59:18 +0200
From: Horatiu Vultur
To: Vladimir Oltean
CC: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, UNGLinuxDriver@microchip.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Subject: Re: [PATCH net-next v3 2/7] net: lan966x: Split lan966x_fdb_event_work
Message-ID: <20220705215918.uwcp4yco5fn3fdex@soft-dev3-1.localhost>
In-Reply-To: <20220702140834.gyqmtmaru6ecdamb@skbuf>
References: <20220701205227.1337160-1-horatiu.vultur@microchip.com> <20220701205227.1337160-3-horatiu.vultur@microchip.com> <20220702140834.gyqmtmaru6ecdamb@skbuf>

The 07/02/2022 14:08, Vladimir Oltean wrote:
> On Fri, Jul 01, 2022 at 10:52:22PM +0200, Horatiu Vultur wrote:
> > Split the function lan966x_fdb_event_work. One case for when the
> > orig_dev is a bridge and one case when orig_dev is lan966x port.
> > This is preparation for lag support. There is no functional change.
> > 
> > Signed-off-by: Horatiu Vultur
> > ---
> 
> > -static void lan966x_fdb_event_work(struct work_struct *work)
> > +void lan966x_fdb_flush_workqueue(struct lan966x *lan966x)
> > +{
> > +	flush_workqueue(lan966x->fdb_work);
> > +}
> > +
> 
> > diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c b/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c
> > index df2bee678559..d9fc6a9a3da1 100644
> > --- a/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c
> > +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c
> > @@ -320,9 +320,10 @@ static int lan966x_port_prechangeupper(struct net_device *dev,
> >  {
> >  	struct lan966x_port *port = netdev_priv(dev);
> >  
> > -	if (netif_is_bridge_master(info->upper_dev) && !info->linking)
> > -		switchdev_bridge_port_unoffload(port->dev, port,
> > -						NULL, NULL);
> > +	if (netif_is_bridge_master(info->upper_dev) && !info->linking) {
> > +		switchdev_bridge_port_unoffload(port->dev, port, NULL, NULL);
> > +		lan966x_fdb_flush_workqueue(port->lan966x);
> > +	}
> 
> Very curious as to why you decided to stuff this change in here.
> There was no functional change in v2, now there is. And it's a change
> you might need to come back to later (probably sooner than you'd like),
> since the flushing of the workqueue is susceptible to causing deadlocks
> if done improperly - let's see how you blame a commit that was only
> supposed to move code, in that case ;)

There is a functional change here, and I forgot to update the commit
message to reflect it.

> 
> The deadlock that I'm talking about comes from the fact that
> lan966x_port_prechangeupper() runs with rtnl_lock() held. So the code of
> the flushed workqueue item must not hold rtnl_lock(), or any other lock
> that is blocked by the rtnl_lock(). Otherwise, the flushing will wait
> for a workqueue item to complete, that in turn waits to acquire the
> rtnl_lock, which is held by the thread waiting for the workqueue to complete.
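The following is a minimal user-space sketch of the wait cycle described above. It assumes a plain pthread mutex stands in for rtnl_lock() and pthread_join() stands in for flush_workqueue(). The hypothetical work item uses pthread_mutex_trylock() so that it reports the would-be deadlock instead of hanging. The names rtnl, fdb_work_item() and flush_while_holding() are illustrative only; they are not kernel or lan966x APIs.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Illustrative stand-in for rtnl_lock(): an ordinary, non-recursive mutex. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical queued work item whose body needs rtnl_lock().
 * Instead of blocking forever, it records whether it would have blocked. */
static void *fdb_work_item(void *arg)
{
	bool *would_block = arg;

	if (pthread_mutex_trylock(&rtnl) == EBUSY) {
		/* A real rtnl_lock() would sleep here until the flusher
		 * releases the lock, but the flusher is waiting for us. */
		*would_block = true;
		return NULL;
	}
	*would_block = false;
	/* ... work that needs rtnl_lock() would run here ... */
	pthread_mutex_unlock(&rtnl);
	return NULL;
}

/* Stand-in for flush_workqueue(): start the work item and wait for it.
 * Returns true if the work item would have deadlocked against the caller. */
static bool flush_while_holding(bool hold_rtnl)
{
	pthread_t worker;
	bool would_block = false;
	int ret;

	if (hold_rtnl)
		pthread_mutex_lock(&rtnl); /* like prechangeupper running under rtnl */

	ret = pthread_create(&worker, NULL, fdb_work_item, &would_block);
	assert(ret == 0);
	(void)ret;
	pthread_join(worker, NULL); /* the "flush": wait for completion */

	if (hold_rtnl)
		pthread_mutex_unlock(&rtnl);
	return would_block;
}
```

With these definitions, flush_while_holding(true) returns true because the work item cannot obtain the lock its flusher is holding. flush_while_holding(false) returns false. A real kernel rtnl_lock() would sleep rather than fail, and that is what turns this cycle into a hang.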
> 
> Analyzing your code, lan966x_mac_notifiers() takes rtnl_lock().
> That is taken from threaded interrupt context - lan966x_mac_irq_process(),
> but is a sub-lock of spin_lock(&lan966x->mac_lock).
> 
> There are 2 problems with that already: rtnl_lock() is a mutex => can
> sleep, but &lan966x->mac_lock is a spin lock => is atomic. You can't
> take rtnl_lock() from atomic context. Lockdep and/or CONFIG_DEBUG_ATOMIC_SLEEP
> will tell you so much.
> 
> The second problem is the lock ordering inversion that this causes.
> There exists a threaded IRQ which takes the locks in the order mac_lock
> -> rtnl_lock, and there exists this new fdb_flush_workqueue which takes
> the locks in the order rtnl_lock -> mac_lock. If they run at the same
> time, kaboom. Again, lockdep will tell you as much.
> 
> I'm sorry, but you need to solve the existing locking problems with the
> code first.

As I see it, there are 2 'different problems' which both have the same
root cause, the usage of lan966x->mac_lock:
1. One is with lan966x_mac_notifiers and lan966x_mac_irq_process, which
   is an issue on net, and it needs a separate patch.
2. The second one is introduced by flushing the workqueue.

I am pretty sure I ran with CONFIG_DEBUG_ATOMIC_SLEEP enabled, but I
didn't see any errors or warnings.

So let me start by fixing the first issue on net.

> 
> > 
> >  	return NOTIFY_DONE;
> >  }
> > -- 
> > 2.33.0
> > 

-- 
/Horatiu