Message-ID: <99eda6d1dad3ff49435b74e539488091642b10a8.camel@sipsolutions.net>
Subject: Re: [BUG] deadlock in nl80211_vendor_cmd
From: Johannes Berg
To: willmcvicker@google.com, linux-wireless@vger.kernel.org
Cc: Marek Szyprowski
Date: Mon, 21 Mar 2022 21:07:57 +0100
In-Reply-To: <0000000000009e9b7105da6d1779@google.com>
References: <0000000000009e9b7105da6d1779@google.com>

Hi,

> Basically, my wlan driver uses the wiphy_vendor_command ops to handle
> a number of vendor specific operations.

I guess it's an out-of-tree driver, since I (hope I) fixed all the
issues in the code here ... :)

> One of them in particular deletes
> a cfg80211 interface.

There's a quite normal API for that, why would you do that?!

> The deadlock happens when thread 1 tries to take the
> RTNL lock before calling cfg80211_unregister_device() while thread 2 is
> inside nl80211_pre_doit(), holding the RTNL lock, and waiting on
> wiphy_lock().
>
> Here is the call flow:
>
> Thread 1:                         Thread 2:
>
> nl80211_pre_doit():
>   -> rtnl_lock()
>                                   nl80211_pre_doit():
>                                     -> rtnl_lock()
>                                     ->
>   -> wiphy_lock()
>   -> rtnl_unlock()
>   ->
> exit nl80211_pre_doit()
>
>                                     -> wiphy_lock()
>                                     ->
> nl80211_doit()
>   -> nl80211_vendor_cmd()
>       -> rtnl_lock()

Yeah, I guess the way we invoke vendor commands now w/o RTNL held means
you cannot safely acquire RTNL in them.

I mean, the whole above thing basically collapses down to

Thread 1                        Thread 2
 wiphy_lock();    // nl80211
                                 rtnl_lock();
                                 wiphy_lock();
 rtnl_lock();     // your driver

The correct order to _acquire_ these is rtnl -> wiphy, and we do it that
way around everywhere (else).

> I'm not a networking expert. So my main question is if I'm allowed to take
> the RTNL lock inside the nl80211_vendor_cmd callbacks?

Evidently, you're not. It's interesting though: it used to be that we
called these with the RTNL held, now we don't, and the driver you're
using somehow "got fixed" to take it, but whoever fixed it didn't take
into account that this is not possible?

> I hope that helps explain the issue. Let me know if you need any more
> details.

It does, but I don't think there's any way to fix it. You just
fundamentally cannot acquire the RTNL in a vendor command operation,
since that introduces the ABBA deadlock you observed.

Since it's an out-of-tree driver, that's about as much as I can help.

johannes
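
Illustration (not part of the original message): the ordering rule
above, rtnl -> wiphy, can be modeled in user space with a tiny
lock-rank checker. This is only a sketch under stated assumptions: the
names RANK_RTNL, RANK_WIPHY, model_acquire(), model_release() and the
two model_*() scenarios are hypothetical and are not kernel APIs. It
models a single thread's view of which locks it holds and takes no real
locks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks: a lower rank must be acquired first (rtnl -> wiphy). */
enum lock_rank { RANK_RTNL = 0, RANK_WIPHY = 1 };

/* Bitmask of the ranks currently held by the single modeled thread. */
static unsigned int held_mask;

/*
 * Record that `rank` is taken if that respects the ordering rule.
 * Returns false, and takes nothing, if a lock of equal or higher
 * rank is already held.
 */
static bool model_acquire(enum lock_rank rank)
{
	if (held_mask >> rank)
		return false;
	held_mask |= 1u << rank;
	return true;
}

/* Drop `rank`; it must currently be held. */
static void model_release(enum lock_rank rank)
{
	assert(held_mask & (1u << rank));
	held_mask &= ~(1u << rank);
}

/* The documented order: RTNL first, then the wiphy lock. */
bool model_rtnl_then_wiphy(void)
{
	bool ok;

	held_mask = 0;
	ok = model_acquire(RANK_RTNL) && model_acquire(RANK_WIPHY);
	held_mask = 0;
	return ok;
}

/*
 * The reported flow: the pre_doit step takes RTNL, takes the wiphy
 * lock, and drops RTNL; the vendor command then asks for RTNL while
 * the wiphy lock is still held.
 */
bool model_vendor_cmd_takes_rtnl(void)
{
	bool ok;

	held_mask = 0;
	model_acquire(RANK_RTNL);	/* pre_doit: rtnl_lock() */
	model_acquire(RANK_WIPHY);	/* pre_doit: wiphy_lock() */
	model_release(RANK_RTNL);	/* pre_doit: rtnl_unlock() */
	ok = model_acquire(RANK_RTNL);	/* driver: rtnl_lock(), wiphy held */
	held_mask = 0;
	return ok;
}
```

In this model, model_rtnl_then_wiphy() returns true and
model_vendor_cmd_takes_rtnl() returns false: asking for RTNL while
holding the wiphy lock inverts the order, which is the "A then B" versus
"B then A" pattern behind the ABBA deadlock described above.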