Message-ID: <976e8cf697c7e5bc3a752e758a484b69a058710a.camel@sipsolutions.net>
Subject: Re: [BUG] deadlock in nl80211_vendor_cmd
From: Johannes Berg
To: William McVicker
Cc: Jakub Kicinski, linux-wireless@vger.kernel.org, Marek Szyprowski,
 Kalle Valo, "David S. Miller", netdev@vger.kernel.org, Amitkumar Karwar,
 Ganapathi Bhat, Xinming Hu, kernel-team@android.com, Paolo Abeni
Date: Fri, 25 Mar 2022 22:16:29 +0100
References: <0000000000009e9b7105da6d1779@google.com>
 <99eda6d1dad3ff49435b74e539488091642b10a8.camel@sipsolutions.net>
 <5d5cf050-7de0-7bad-2407-276970222635@quicinc.com>
 <19e12e6b5f04ba9e5b192001fbe31a3fc47d380a.camel@sipsolutions.net>
 <20220325094952.10c46350@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Fri, 2022-03-25 at 20:36 +0000, William McVicker wrote:
> 
> I found that my wlan driver is using the vendor commands to create/delete NAN
> interfaces for this Android feature called Wi-Fi aware [1].
> Basically, this features allows users to discover other nearby devices and
> allows them to connect directly with one another over a local network.
> 

Wait, why is it doing that? We actually support a NAN interface type
upstream :) It's not really quite fully fleshed out, but it could be?
Probably should be?

> Thread 1                         Thread 2
>  nl80211_pre_doit():
>    rtnl_lock()
>    wiphy_lock()                   nl80211_pre_doit():
>                                     rtnl_lock() // blocked by Thread 1
>  nl80211_vendor_cmd():
>    doit()
>      cfg80211_unregister_netdevice()
>    rtnl_unlock():
>      netdev_run_todo():
>        __rtnl_unlock()
>
>                                     wiphy_lock() // blocked by Thread 1
>        rtnl_lock(); // DEADLOCK
>  nl80211_post_doit():
>    wiphy_unlock();

Right, this is what I had discussed in my other mails. Basically, you're
actually doing (some form of) unregister_netdevice() before
rtnl_unlock().

Clearly this isn't possible in cfg80211 itself.

However, I couldn't entirely discount the possibility that this is
possible:

Thread 1                        Thread 2
                                rtnl_lock()
                                unregister_netdevice()
                                __rtnl_unlock()
rtnl_lock()
wiphy_lock()
netdev_run_todo()
  __rtnl_unlock()
  // list not empty now
  // because of thread 2
                                rtnl_lock()
  rtnl_lock()
                                wiphy_lock()

** DEADLOCK **

Given my other discussion with Jakub though, it seems that we can indeed
make sure that this cannot happen, and then this scenario is impossible
without the unregistration you're doing.

> Since I'm unlocking the RTNL inside nl80211_vendor_cmd() after calling doit()
> instead of waiting till post_doit(), I get into the situation you mentioned
> where the net_todo_list is not empty when calling rtnl_unlock. So I decided to
> drop the rtnl_unlock() in nl80211_vendor_cmd() and defer that until
> nl80211_post_doit() after calling wiphy_unlock(). With this change, I haven't
> been able to reproduce the deadlock.
> So it's possible that we aren't actually
> able to hit this deadlock in nl80211_pre_doit() with the existing code since,
> as you mentioned, one wouldn't be able to call unregister_netdevice() without
> having the RTNL lock.
> 

Right, this is why I said earlier that actually adding a flag for vendor
commands to get the RTNL would be more complex - you'd have to basically
open-code pre_doit() and post_doit() in there and check the sub-command
flag at the very beginning and very end.

johannes
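
To make the ordering in that last paragraph concrete, here is a rough user-space sketch in C. It is not kernel code and not a proposed patch: the model_* helpers, the trace string and MODEL_VENDOR_FLAG_NEED_RTNL are all made up for illustration (no such nl80211 flag is implied to exist), and the "locks" are plain booleans, so it only shows the order of operations. It assumes the end of the command drops the wiphy mutex before the RTNL, as described in the quoted text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for the RTNL and the wiphy mutex: flags only, no real locking. */
static bool rtnl_held;
static bool wiphy_held;

/* One character per step: R/r = RTNL lock/unlock, W/w = wiphy lock/unlock, D = doit. */
static char trace[16];
static size_t trace_len;

static void record(char op)
{
	assert(trace_len < sizeof(trace) - 1);
	trace[trace_len++] = op;
	trace[trace_len] = '\0';
}

static void model_rtnl_lock(void)    { assert(!rtnl_held);  rtnl_held = true;   record('R'); }
static void model_rtnl_unlock(void)  { assert(rtnl_held);   rtnl_held = false;  record('r'); }
static void model_wiphy_unlock(void) { assert(wiphy_held);  wiphy_held = false; record('w'); }

static void model_wiphy_lock(void)
{
	assert(rtnl_held);	/* in this model the wiphy mutex is only taken under the RTNL */
	assert(!wiphy_held);
	wiphy_held = true;
	record('W');
}

/* Hypothetical per-sub-command flag, invented for this sketch. */
#define MODEL_VENDOR_FLAG_NEED_RTNL 0x1u

static void model_doit(unsigned int flags)
{
	/* A sub-command that asked for the RTNL must actually see it held. */
	if (flags & MODEL_VENDOR_FLAG_NEED_RTNL)
		assert(rtnl_held);
	assert(wiphy_held);
	record('D');
}

/* Open-coded "pre_doit" / "post_doit" around one vendor sub-command. */
static const char *model_vendor_cmd(unsigned int flags)
{
	bool keep_rtnl = (flags & MODEL_VENDOR_FLAG_NEED_RTNL) != 0;

	memset(trace, 0, sizeof(trace));
	trace_len = 0;

	/* Very beginning: check the flag to decide whether the RTNL stays held. */
	model_rtnl_lock();
	model_wiphy_lock();
	if (!keep_rtnl)
		model_rtnl_unlock();

	model_doit(flags);

	/* Very end: wiphy mutex first, and only then the RTNL if it was kept. */
	model_wiphy_unlock();
	if (keep_rtnl)
		model_rtnl_unlock();

	return trace;
}
```

With the flag set, the RTNL is released only after the wiphy mutex, so in this model the RTNL is never dropped and retaken while the wiphy mutex is still held, which is the step that deadlocks in the first diagram.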