Date: Mon, 21 Mar 2022 17:00:14 +0000
From: William McVicker
To: Johannes Berg, linux-wireless@vger.kernel.org
Cc: Marek Szyprowski, Kalle Valo, "David S. Miller", Jakub Kicinski,
    netdev@vger.kernel.org, Amitkumar Karwar, Ganapathi Bhat, Xinming Hu,
    "Cc: Android Kernel"
Subject: Re: [BUG] deadlock in nl80211_vendor_cmd

On 03/21/2022, Will McVicker wrote:
> On Thu, Mar 17, 2022 at 10:09 AM wrote:
>
> > Hi,
> >
> > I wanted to report a deadlock that I'm hitting as a result of the upstream
> > commit a05829a7222e ("cfg80211: avoid holding the RTNL when calling the
> > driver"). I'm using the Pixel 6 with a downstream version of the 5.15
> > kernel, but I'm pretty sure this will happen on the upstream tip-of-tree
> > kernel as well.
> >
> > Basically, my wlan driver uses the wiphy_vendor_command ops to handle
> > a number of vendor-specific operations. One of them in particular deletes
> > a cfg80211 interface. The deadlock happens when thread 1 tries to take the
> > RTNL lock before calling cfg80211_unregister_device() while thread 2 is
> > inside nl80211_pre_doit(), holding the RTNL lock, and waiting on
> > wiphy_lock().
> >
> > Here is the call flow:
> >
> > Thread 1:                           Thread 2:
> >
> > nl80211_pre_doit():
> >   -> rtnl_lock()
> >                                     nl80211_pre_doit():
> >                                       -> rtnl_lock()
> >                                       -> <blocked on RTNL>
> >   -> wiphy_lock()
> >   -> rtnl_unlock()
> >                                       -> <acquires RTNL>
> > exit nl80211_pre_doit()
> >
> >                                       -> wiphy_lock()
> >                                       -> <blocked on wiphy mutex>
> > nl80211_doit()
> >   -> nl80211_vendor_cmd()
> >     -> rtnl_lock()
> >     -> cfg80211_unregister_device()
> >     -> rtnl_unlock()
> >
> >
> > To be complete, here are the kernel call traces when the deadlock occurs:
> >
> > Thread 1 Call trace:
> >
> >   nl80211_vendor_cmd+0x210/0x218
> >   genl_rcv_msg+0x3ac/0x45c
> >   netlink_rcv_skb+0x130/0x168
> >   genl_rcv+0x38/0x54
> >   netlink_unicast_kernel+0xe4/0x1f4
> >   netlink_unicast+0x128/0x21c
> >   netlink_sendmsg+0x2d8/0x3d8
> >
> > Thread 2 Call trace:
> >
> >   nl80211_pre_doit+0x1b0/0x250
> >   genl_rcv_msg+0x37c/0x45c
> >   netlink_rcv_skb+0x130/0x168
> >   genl_rcv+0x38/0x54
> >   netlink_unicast_kernel+0xe4/0x1f4
> >   netlink_unicast+0x128/0x21c
> >   netlink_sendmsg+0x2d8/0x3d8
> >
> > I'm not a networking expert, so my main question is whether I'm allowed
> > to take the RTNL lock inside the nl80211_vendor_cmd callbacks. If so,
> > then regardless of why I take it, we shouldn't be allowing this deadlock
> > situation, right?
> >
> > I hope that helps explain the issue. Let me know if you need any more
> > details.
> >
> > Thanks,
> > Will
>
> Sorry, my CC list got dropped. Adding the following:
>
> Kalle Valo
> "David S. Miller"
> Jakub Kicinski
> netdev@vger.kernel.org
> Amitkumar Karwar
> Ganapathi Bhat
> Xinming Hu
> kernel-team@android.com

Sorry for the noise. The lists bounced due to HTML. Resending with mutt to
make sure everyone gets this message.

As an update, I was able to fix the deadlock by updating nl80211_pre_doit()
so that it does not hold the RTNL lock while waiting for the wiphy lock. This
allows us to take the RTNL lock within nl80211_doit() while still allowing
parallel calls to nl80211_doit(). Below is the logic I tested. Please let me
know if I'm heading in the right direction.
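The deadlock in the call flow is a lock-order (ABBA) inversion: thread 2 holds RTNL and waits for the wiphy mutex, while thread 1 holds the wiphy mutex (taken in nl80211_pre_doit() and kept until post_doit) and waits for RTNL. As a toy illustration of the idea lockdep is built on, and not lockdep itself, the single-threaded C sketch below records "lock B taken while holding lock A" and reports when the opposite order shows up. All names here (enum lock_id, record_acquire(), replay_report(), reset_orders()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical IDs standing in for the RTNL and the wiphy mutex. */
enum lock_id { LOCK_RTNL, LOCK_WIPHY, NUM_LOCKS };

/* order_seen[a][b] becomes true once some thread took lock b while holding lock a. */
static bool order_seen[NUM_LOCKS][NUM_LOCKS];

static void reset_orders(void)
{
	memset(order_seen, 0, sizeof(order_seen));
}

/*
 * Record that a thread holding held[0..n_held) now takes `next`.
 * Returns true if the opposite order was recorded earlier (ABBA inversion).
 */
static bool record_acquire(const enum lock_id *held, size_t n_held,
			   enum lock_id next)
{
	bool inversion = false;

	assert(next < NUM_LOCKS);
	for (size_t i = 0; i < n_held; i++) {
		if (order_seen[next][held[i]])
			inversion = true;
		order_seen[held[i]][next] = true;
	}
	return inversion;
}

/* Replays the two sequences from the report; returns true if an inversion is found. */
static bool replay_report(void)
{
	/* Thread 2, nl80211_pre_doit(): RTNL held, then the wiphy mutex. */
	const enum lock_id t2_held[] = { LOCK_RTNL };
	bool found = record_acquire(t2_held, 1, LOCK_WIPHY);

	/* Thread 1, vendor command: wiphy mutex held since pre_doit, then RTNL. */
	const enum lock_id t1_held[] = { LOCK_WIPHY };
	found = record_acquire(t1_held, 1, LOCK_RTNL) || found;

	return found;
}
```

Replaying the two sequences flags the second acquisition as an inversion, while taking the locks in the same order repeatedly does not.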
Thanks,
Will

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 686a69381731..bb4ad746509b 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -15227,7 +15227,24 @@ static int nl80211_pre_doit(const struct genl_ops *ops, struct sk_buff *skb,
 	}
 
 	if (rdev && !(ops->internal_flags & NL80211_FLAG_NO_WIPHY_MTX)) {
-		wiphy_lock(&rdev->wiphy);
+		while (!mutex_trylock(&rdev->wiphy.mtx)) {
+			/* Holding the RTNL lock while waiting for the wiphy lock can lead to
+			 * a deadlock within doit() ops that don't hold the RTNL in pre_doit. So
+			 * we need to release the RTNL lock first while we wait for the wiphy
+			 * lock.
+			 */
+			rtnl_unlock();
+			wiphy_lock(&rdev->wiphy);
+
+			/* Once we get the wiphy_lock, we need to grab the RTNL lock. If we can't
+			 * get it, then we need to unlock the wiphy to avoid a deadlock in
+			 * pre_doit and then retry taking the locks again. */
+			if (!rtnl_trylock()) {
+				wiphy_unlock(&rdev->wiphy);
+				rtnl_lock();
+			} else
+				break;
+		}
 		/* we keep the mutex locked until post_doit */
 		__release(&rdev->wiphy.mtx);
 	}
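To sanity-check the back-off idea outside the kernel, here is a userspace sketch with POSIX threads. It assumes plain pthread mutexes are a fair stand-in for RTNL and wiphy.mtx, and it only models lock ordering, not anything else those locks protect in the real nl80211 path. lock_wiphy_with_rtnl_held() follows the same steps as the loop in the patch; it and the other names (pre_doit_path(), vendor_cmd_path(), run_both()) are hypothetical. One thread takes the locks in pre_doit() order through the helper, the other in the vendor-command order (wiphy, then RTNL).

```c
#include <pthread.h>

/* Userspace stand-ins for the RTNL and the wiphy mutex. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t wiphy_mtx = PTHREAD_MUTEX_INITIALIZER;
static long work_done; /* only modified while both locks are held */

/* Called with rtnl held; returns with rtnl and wiphy_mtx both held. */
static void lock_wiphy_with_rtnl_held(void)
{
	while (pthread_mutex_trylock(&wiphy_mtx) != 0) {
		/* Never sleep on wiphy_mtx while holding rtnl. */
		pthread_mutex_unlock(&rtnl);
		pthread_mutex_lock(&wiphy_mtx);

		if (pthread_mutex_trylock(&rtnl) == 0)
			break; /* got both locks */

		/* rtnl is busy: drop wiphy_mtx, wait for rtnl, then retry. */
		pthread_mutex_unlock(&wiphy_mtx);
		pthread_mutex_lock(&rtnl);
	}
}

/* pre_doit() order: rtnl first, then wiphy_mtx via the back-off helper. */
static void *pre_doit_path(void *arg)
{
	long iters = *(const long *)arg;

	for (long i = 0; i < iters; i++) {
		pthread_mutex_lock(&rtnl);
		lock_wiphy_with_rtnl_held();
		work_done++;
		pthread_mutex_unlock(&wiphy_mtx);
		pthread_mutex_unlock(&rtnl);
	}
	return NULL;
}

/* Vendor-command order: wiphy_mtx held, then rtnl taken inside. */
static void *vendor_cmd_path(void *arg)
{
	long iters = *(const long *)arg;

	for (long i = 0; i < iters; i++) {
		pthread_mutex_lock(&wiphy_mtx);
		pthread_mutex_lock(&rtnl);
		work_done++;
		pthread_mutex_unlock(&rtnl);
		pthread_mutex_unlock(&wiphy_mtx);
	}
	return NULL;
}

/* Runs both paths concurrently; returns completed iterations, or -1 on error. */
static long run_both(long iters)
{
	pthread_t a, b;

	work_done = 0;
	if (pthread_create(&a, NULL, pre_doit_path, &iters) != 0)
		return -1;
	if (pthread_create(&b, NULL, vendor_cmd_path, &iters) != 0) {
		pthread_join(a, NULL);
		return -1;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return work_done;
}
```

run_both(n) should return 2 * n. If the helper is replaced with a plain pthread_mutex_lock(&wiphy_mtx), the two threads can hang forever instead. The back-off avoids the deadlock, but under heavy contention the loop may retry several times; it does not guarantee fairness.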