2021-05-15 04:40:03

by Brian Norris

[permalink] [raw]
Subject: [PATCH 5.13] mwifiex: bring down link before deleting interface

We can deadlock when rmmod'ing the driver or going through firmware
reset, because the cfg80211_unregister_wdev() has to bring down the link
for us, ... which then grab the same wiphy lock.

nl80211_del_interface() already handles a very similar case, with a nice
description:

/*
* We hold RTNL, so this is safe, without RTNL opencount cannot
* reach 0, and thus the rdev cannot be deleted.
*
* We need to do it for the dev_close(), since that will call
* the netdev notifiers, and we need to acquire the mutex there
* but don't know if we get there from here or from some other
* place (e.g. "ip link set ... down").
*/
mutex_unlock(&rdev->wiphy.mtx);
...

Do similarly for mwifiex teardown, by ensuring we bring the link down
first.

Sample deadlock trace:

[ 247.103516] INFO: task rmmod:2119 blocked for more than 123 seconds.
[ 247.110630] Not tainted 5.12.4 #5
[ 247.115796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 247.124557] task:rmmod state:D stack: 0 pid: 2119 ppid: 2114 flags:0x00400208
[ 247.133905] Call trace:
[ 247.136644] __switch_to+0x130/0x170
[ 247.140643] __schedule+0x714/0xa0c
[ 247.144548] schedule_preempt_disabled+0x88/0xf4
[ 247.149714] __mutex_lock_common+0x43c/0x750
[ 247.154496] mutex_lock_nested+0x5c/0x68
[ 247.158884] cfg80211_netdev_notifier_call+0x280/0x4e0 [cfg80211]
[ 247.165769] raw_notifier_call_chain+0x4c/0x78
[ 247.170742] call_netdevice_notifiers_info+0x68/0xa4
[ 247.176305] __dev_close_many+0x7c/0x138
[ 247.180693] dev_close_many+0x7c/0x10c
[ 247.184893] unregister_netdevice_many+0xfc/0x654
[ 247.190158] unregister_netdevice_queue+0xb4/0xe0
[ 247.195424] _cfg80211_unregister_wdev+0xa4/0x204 [cfg80211]
[ 247.201816] cfg80211_unregister_wdev+0x20/0x2c [cfg80211]
[ 247.208016] mwifiex_del_virtual_intf+0xc8/0x188 [mwifiex]
[ 247.214174] mwifiex_uninit_sw+0x158/0x1b0 [mwifiex]
[ 247.219747] mwifiex_remove_card+0x38/0xa0 [mwifiex]
[ 247.225316] mwifiex_pcie_remove+0xd0/0xe0 [mwifiex_pcie]
[ 247.231451] pci_device_remove+0x50/0xe0
[ 247.235849] device_release_driver_internal+0x110/0x1b0
[ 247.241701] driver_detach+0x5c/0x9c
[ 247.245704] bus_remove_driver+0x84/0xb8
[ 247.250095] driver_unregister+0x3c/0x60
[ 247.254486] pci_unregister_driver+0x2c/0x90
[ 247.259267] cleanup_module+0x18/0xcdc [mwifiex_pcie]

Fixes: a05829a7222e ("cfg80211: avoid holding the RTNL when calling the driver")
Cc: [email protected]
Link: https://lore.kernel.org/linux-wireless/[email protected]/
Link: https://lore.kernel.org/linux-wireless/[email protected]/
Reported-by: Maximilian Luz <[email protected]>
Reported-by: [email protected]
Cc: Johannes Berg <[email protected]>
Signed-off-by: Brian Norris <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/main.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index 529dfd8b7ae8..17399d4aa129 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1445,11 +1445,18 @@ static void mwifiex_uninit_sw(struct mwifiex_adapter *adapter)
if (!priv)
continue;
rtnl_lock();
- wiphy_lock(adapter->wiphy);
if (priv->netdev &&
- priv->wdev.iftype != NL80211_IFTYPE_UNSPECIFIED)
+ priv->wdev.iftype != NL80211_IFTYPE_UNSPECIFIED) {
+ /*
+ * Close the netdev now, because if we do it later, the
+ * netdev notifiers will need to acquire the wiphy lock
+ * again --> deadlock.
+ */
+ dev_close(priv->wdev.netdev);
+ wiphy_lock(adapter->wiphy);
mwifiex_del_virtual_intf(adapter->wiphy, &priv->wdev);
- wiphy_unlock(adapter->wiphy);
+ wiphy_unlock(adapter->wiphy);
+ }
rtnl_unlock();
}

--



2021-05-15 22:18:49

by Dave Olsthoorn

[permalink] [raw]
Subject: Re: [PATCH 5.13] mwifiex: bring down link before deleting interface

On 2021-05-15 04:42, Brian Norris wrote:
> We can deadlock when rmmod'ing the driver or going through firmware
> reset, because the cfg80211_unregister_wdev() has to bring down the
> link
> for us, ... which then grab the same wiphy lock.
>
> nl80211_del_interface() already handles a very similar case, with a
> nice
> description:
>
> /*
> * We hold RTNL, so this is safe, without RTNL opencount cannot
> * reach 0, and thus the rdev cannot be deleted.
> *
> * We need to do it for the dev_close(), since that will call
> * the netdev notifiers, and we need to acquire the mutex there
> * but don't know if we get there from here or from some other
> * place (e.g. "ip link set ... down").
> */
> mutex_unlock(&rdev->wiphy.mtx);
> ...
>
> Do similarly for mwifiex teardown, by ensuring we bring the link down
> first.
>
> Sample deadlock trace:
>
> [ 247.103516] INFO: task rmmod:2119 blocked for more than 123 seconds.
> [ 247.110630] Not tainted 5.12.4 #5
> [ 247.115796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 247.124557] task:rmmod state:D stack: 0 pid: 2119
> ppid: 2114 flags:0x00400208
> [ 247.133905] Call trace:
> [ 247.136644] __switch_to+0x130/0x170
> [ 247.140643] __schedule+0x714/0xa0c
> [ 247.144548] schedule_preempt_disabled+0x88/0xf4
> [ 247.149714] __mutex_lock_common+0x43c/0x750
> [ 247.154496] mutex_lock_nested+0x5c/0x68
> [ 247.158884] cfg80211_netdev_notifier_call+0x280/0x4e0 [cfg80211]
> [ 247.165769] raw_notifier_call_chain+0x4c/0x78
> [ 247.170742] call_netdevice_notifiers_info+0x68/0xa4
> [ 247.176305] __dev_close_many+0x7c/0x138
> [ 247.180693] dev_close_many+0x7c/0x10c
> [ 247.184893] unregister_netdevice_many+0xfc/0x654
> [ 247.190158] unregister_netdevice_queue+0xb4/0xe0
> [ 247.195424] _cfg80211_unregister_wdev+0xa4/0x204 [cfg80211]
> [ 247.201816] cfg80211_unregister_wdev+0x20/0x2c [cfg80211]
> [ 247.208016] mwifiex_del_virtual_intf+0xc8/0x188 [mwifiex]
> [ 247.214174] mwifiex_uninit_sw+0x158/0x1b0 [mwifiex]
> [ 247.219747] mwifiex_remove_card+0x38/0xa0 [mwifiex]
> [ 247.225316] mwifiex_pcie_remove+0xd0/0xe0 [mwifiex_pcie]
> [ 247.231451] pci_device_remove+0x50/0xe0
> [ 247.235849] device_release_driver_internal+0x110/0x1b0
> [ 247.241701] driver_detach+0x5c/0x9c
> [ 247.245704] bus_remove_driver+0x84/0xb8
> [ 247.250095] driver_unregister+0x3c/0x60
> [ 247.254486] pci_unregister_driver+0x2c/0x90
> [ 247.259267] cleanup_module+0x18/0xcdc [mwifiex_pcie]
>
> Fixes: a05829a7222e ("cfg80211: avoid holding the RTNL when calling the
> driver")
> Cc: [email protected]
> Link:
> https://lore.kernel.org/linux-wireless/[email protected]/
> Link:
> https://lore.kernel.org/linux-wireless/[email protected]/
> Reported-by: Maximilian Luz <[email protected]>
> Reported-by: Dave Olsthoorn <[email protected]>

Thanks!

The firmware still seems to crash quicker than previously, but that's a
unrelated problem.

Tested-by: Dave Olsthoorn <[email protected]>

> Cc: Johannes Berg <[email protected]>
> Signed-off-by: Brian Norris <[email protected]>
> ---
> drivers/net/wireless/marvell/mwifiex/main.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/main.c
> b/drivers/net/wireless/marvell/mwifiex/main.c
> index 529dfd8b7ae8..17399d4aa129 100644
> --- a/drivers/net/wireless/marvell/mwifiex/main.c
> +++ b/drivers/net/wireless/marvell/mwifiex/main.c
> @@ -1445,11 +1445,18 @@ static void mwifiex_uninit_sw(struct
> mwifiex_adapter *adapter)
> if (!priv)
> continue;
> rtnl_lock();
> - wiphy_lock(adapter->wiphy);
> if (priv->netdev &&
> - priv->wdev.iftype != NL80211_IFTYPE_UNSPECIFIED)
> + priv->wdev.iftype != NL80211_IFTYPE_UNSPECIFIED) {
> + /*
> + * Close the netdev now, because if we do it later, the
> + * netdev notifiers will need to acquire the wiphy lock
> + * again --> deadlock.
> + */
> + dev_close(priv->wdev.netdev);
> + wiphy_lock(adapter->wiphy);
> mwifiex_del_virtual_intf(adapter->wiphy, &priv->wdev);
> - wiphy_unlock(adapter->wiphy);
> + wiphy_unlock(adapter->wiphy);
> + }
> rtnl_unlock();
> }

2021-05-15 22:21:27

by Pali Rohár

[permalink] [raw]
Subject: mwifiex firmware crash (Was: Re: [PATCH 5.13] mwifiex: bring down link before deleting interface)

On Saturday 15 May 2021 17:10:31 Dave Olsthoorn wrote:
> The firmware still seems to crash quicker than previously, but that's a
> unrelated problem.

Hello! Do you have some more details (or links) about mentioned firmware crash?

2021-05-15 22:33:27

by Dave Olsthoorn

[permalink] [raw]
Subject: Re: mwifiex firmware crash

Hi,

On 2021-05-15 17:40, Pali Rohár wrote:
> On Saturday 15 May 2021 17:10:31 Dave Olsthoorn wrote:
>> The firmware still seems to crash quicker than previously, but that's
>> a
>> unrelated problem.
>
> Hello! Do you have some more details (or links) about mentioned
> firmware crash?

Sure, firmware crashes have always been a problem on the Surface
devices. They seem to be related, at least for some of the crashes, to
power management. For this reason I disabled powersaving in
NetworkManager which used to make it at least stable enough for me, in
5.13 this trick does not seem to work.

The dmesg log attached shows a firmware crash happening, the card does
not work even after a reset or remove & rescan on the pci(e) bus.

There are patches [1] which have not been submitted yet and where
developed as part of the linux-surface effort [2]. From my experience
these patches resolve most if not all of the firmware crashes.

Regards,
Dave Olsthoorn

[1]:
https://github.com/linux-surface/linux-surface/blob/master/patches/5.12/0002-mwifiex.patch
[2]: https://github.com/linux-surface

dmesg.log:
[ 127.367628] mwifiex_pcie 0000:01:00.0: info: successfully
disconnected from xx:xx:xx:xx:xx:xx: reason code 3
[ 127.371059] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 127.371083] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[ 127.371086] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 127.371088] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[ 127.371091] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 127.371093] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[ 127.371096] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 127.371099] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[ 127.371101] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 127.371103] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[ 127.374001] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 127.374028] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 127.374103] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 127.374113] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 127.375041] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 127.375311] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 127.375445] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 127.375903] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 127.379980] mwifiex_pcie 0000:01:00.0: info: shutdown mwifiex...
[ 127.448638] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[ 127.457009] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[ 127.494749] mwifiex_pcie 0000:01:00.0: performing
cancel_work_sync()...
[ 127.494756] mwifiex_pcie 0000:01:00.0: cancel_work_sync() done
[ 155.308155] mwifiex_pcie: PCI memory map Virt0: 00000000ff7d3684 PCI
memory map Virt2: 000000004aa1ebbb
[ 155.357139] mwifiex_pcie 0000:01:00.0: WLAN FW already running! Skip
FW dnld
[ 155.357146] mwifiex_pcie 0000:01:00.0: WLAN FW is active
[ 155.510296] mwifiex_pcie 0000:01:00.0: info: MWIFIEX VERSION:
mwifiex 1.0 (15.68.19.p21)
[ 155.510319] mwifiex_pcie 0000:01:00.0: driver_version = mwifiex 1.0
(15.68.19.p21)
[ 155.560724] mwifiex_pcie 0000:01:00.0 wlp1s0: renamed from mlan0
[ 161.848939] mwifiex_pcie 0000:01:00.0: info: trying to associate to
bssid xx:xx:xx:xx:xx:xx
[ 162.027982] mwifiex_pcie 0000:01:00.0: info: associated to bssid
xx:xx:xx:xx:xx:xx successfully
[ 162.063062] IPv6: ADDRCONF(NETDEV_CHANGE): wlp1s0: link becomes
ready
[ 173.452786] mwifiex_pcie 0000:01:00.0: mwifiex_cmd_timeout_func:
Timeout cmd id = 0x10, act = 0x1
[ 173.452867] mwifiex_pcie 0000:01:00.0: num_data_h2c_failure = 0
[ 173.452873] mwifiex_pcie 0000:01:00.0: num_cmd_h2c_failure = 0
[ 173.452877] mwifiex_pcie 0000:01:00.0: is_cmd_timedout = 1
[ 173.452881] mwifiex_pcie 0000:01:00.0: num_tx_timeout = 0
[ 173.452885] mwifiex_pcie 0000:01:00.0: last_cmd_index = 1
[ 173.452889] mwifiex_pcie 0000:01:00.0: last_cmd_id: 16 00 10 00 28
00 a4 00 7f 00
[ 173.452894] mwifiex_pcie 0000:01:00.0: last_cmd_act: 00 00 01 00 13
00 00 00 00 00
[ 173.452898] mwifiex_pcie 0000:01:00.0: last_cmd_resp_index = 0
[ 173.452905] mwifiex_pcie 0000:01:00.0: last_cmd_resp_id: 16 80 10 80
28 80 a4 80 7f 80
[ 173.452911] mwifiex_pcie 0000:01:00.0: last_event_index = 3
[ 173.452918] mwifiex_pcie 0000:01:00.0: last_event: 6a 00 17 00 2b 00
33 00 58 00
[ 173.452925] mwifiex_pcie 0000:01:00.0: data_sent=1 cmd_sent=1
[ 173.452930] mwifiex_pcie 0000:01:00.0: ps_mode=0 ps_state=0
[ 173.455878] mwifiex_pcie 0000:01:00.0: ===mwifiex driverinfo dump
start===
[ 173.455914] mwifiex_pcie 0000:01:00.0: info: MWIFIEX VERSION:
mwifiex 1.0 (15.68.19.p21)
[ 173.455925] mwifiex_pcie 0000:01:00.0: PCIE register dump start
[ 173.456014] mwifiex_pcie 0000:01:00.0: pcie scratch register:
[ 173.456026] mwifiex_pcie 0000:01:00.0: reg:0xcf0, value=0xfedcba00
reg:0xcf8, value=0x9141cc
reg:0xcfc, value=0x28c8c00
[ 173.456030] mwifiex_pcie 0000:01:00.0: PCIE register dump end
[ 173.456173] mwifiex_pcie 0000:01:00.0: ===mwifiex driverinfo dump
end===
[ 173.456177] mwifiex_pcie 0000:01:00.0: == mwifiex firmware dump
start ==
[ 177.548244] mwifiex_pcie 0000:01:00.0: cmd_wait_q terminated: -110
[ 177.548341] mwifiex_pcie 0000:01:00.0: failed to get signal
information
[ 177.548708] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[ 177.548712] mwifiex_pcie 0000:01:00.0: failed to get signal
information
[ 182.816381] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[ 182.816389] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[ 183.100502] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[ 183.100524] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[ 183.463055] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[ 183.463059] mwifiex_pcie 0000:01:00.0: failed to get signal
information
[ 189.465526] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[ 189.465538] mwifiex_pcie 0000:01:00.0: failed to get signal
information
[ 195.464760] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[ 195.464770] mwifiex_pcie 0000:01:00.0: failed to get signal
information
[ 201.467820] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[ 201.467826] mwifiex_pcie 0000:01:00.0: failed to get signal
information
[ 207.463143] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[ 207.463154] mwifiex_pcie 0000:01:00.0: failed to get signal
information
[ 213.462237] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[ 213.462244] mwifiex_pcie 0000:01:00.0: failed to get signal
information
[ 219.464173] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[ 219.464180] mwifiex_pcie 0000:01:00.0: failed to get signal
information
[ 223.116948] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[ 225.468213] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[ 225.468219] mwifiex_pcie 0000:01:00.0: failed to get signal
information
[ 229.349824] mwifiex_pcie 0000:01:00.0: Firmware dump Finished!
[ 229.349835] mwifiex_pcie 0000:01:00.0: == mwifiex firmware dump end
==
[ 229.350968] mwifiex_pcie 0000:01:00.0: == mwifiex dump information
to /sys/class/devcoredump start
[ 229.354949] mwifiex_pcie 0000:01:00.0: == mwifiex dump information
to /sys/class/devcoredump end
[ 229.354981] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[ 229.356825] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[ 229.357175] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[ 229.357184] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[ 229.357190] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[ 229.357195] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[ 229.357200] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[ 229.357205] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[ 229.357210] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[ 229.357215] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[ 229.357219] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[ 229.357224] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[ 229.357228] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[ 229.357233] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[ 229.359375] mwifiex_pcie 0000:01:00.0: info: shutdown mwifiex...
[ 229.362422] mwifiex_pcie 0000:01:00.0: rx_pending=0, tx_pending=1,
cmd_pending=0
[ 229.493680] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[ 229.493700] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[ 229.509094] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[ 230.653787] mwifiex_pcie 0000:01:00.0: WLAN FW already running! Skip
FW dnld
[ 230.653794] mwifiex_pcie 0000:01:00.0: WLAN FW is active
[ 241.037922] mwifiex_pcie 0000:01:00.0: mwifiex_cmd_timeout_func:
Timeout cmd id = 0xfa, act = 0x4000
[ 241.038005] mwifiex_pcie 0000:01:00.0: num_data_h2c_failure = 0
[ 241.038031] mwifiex_pcie 0000:01:00.0: num_cmd_h2c_failure = 0
[ 241.038048] mwifiex_pcie 0000:01:00.0: is_cmd_timedout = 1
[ 241.038067] mwifiex_pcie 0000:01:00.0: num_tx_timeout = 0
[ 241.038086] mwifiex_pcie 0000:01:00.0: last_cmd_index = 2
[ 241.038104] mwifiex_pcie 0000:01:00.0: last_cmd_id: 16 00 10 00 fa
00 a4 00 7f 00
[ 241.038126] mwifiex_pcie 0000:01:00.0: last_cmd_act: 00 00 01 00 00
40 00 00 00 00
[ 241.038147] mwifiex_pcie 0000:01:00.0: last_cmd_resp_index = 0
[ 241.038168] mwifiex_pcie 0000:01:00.0: last_cmd_resp_id: 16 80 10 80
28 80 a4 80 7f 80
[ 241.038187] mwifiex_pcie 0000:01:00.0: last_event_index = 3
[ 241.038207] mwifiex_pcie 0000:01:00.0: last_event: 6a 00 17 00 2b 00
33 00 58 00
[ 241.038226] mwifiex_pcie 0000:01:00.0: data_sent=0 cmd_sent=1
[ 241.038246] mwifiex_pcie 0000:01:00.0: ps_mode=0 ps_state=0
[ 241.040764] mwifiex_pcie 0000:01:00.0: info: _mwifiex_fw_dpc:
unregister device
[ 241.057936] mwifiex_pcie 0000:01:00.0: skipped cancel_work_sync()
because we're in card reset failure path
[ 241.060488] mwifiex: Failed to bring up adapter: -5
[ 241.060504] mwifiex_pcie 0000:01:00.0: reinit failed: -5

2021-06-11 00:20:33

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH 5.13] mwifiex: bring down link before deleting interface

On Fri, May 14, 2021 at 7:45 PM Brian Norris <[email protected]> wrote:
>
> We can deadlock when rmmod'ing the driver or going through firmware
> reset, because the cfg80211_unregister_wdev() has to bring down the link
> for us, ... which then grab the same wiphy lock.
...
> Fixes: a05829a7222e ("cfg80211: avoid holding the RTNL when calling the driver")
> Cc: [email protected]
> Link: https://lore.kernel.org/linux-wireless/[email protected]/
> Link: https://lore.kernel.org/linux-wireless/[email protected]/
> Reported-by: Maximilian Luz <[email protected]>
> Reported-by: [email protected]
> Cc: Johannes Berg <[email protected]>
> Signed-off-by: Brian Norris <[email protected]>

Ping - is this going to get merged? It's a 5.12 regression, and we
have multiple people complaining about it (and they tested the fix
too!).

Thanks,
Brian

2021-06-11 10:05:34

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH 5.13] mwifiex: bring down link before deleting interface

Brian Norris <[email protected]> wrote:

> We can deadlock when rmmod'ing the driver or going through firmware
> reset, because the cfg80211_unregister_wdev() has to bring down the link
> for us, ... which then grab the same wiphy lock.
>
> nl80211_del_interface() already handles a very similar case, with a nice
> description:
>
> /*
> * We hold RTNL, so this is safe, without RTNL opencount cannot
> * reach 0, and thus the rdev cannot be deleted.
> *
> * We need to do it for the dev_close(), since that will call
> * the netdev notifiers, and we need to acquire the mutex there
> * but don't know if we get there from here or from some other
> * place (e.g. "ip link set ... down").
> */
> mutex_unlock(&rdev->wiphy.mtx);
> ...
>
> Do similarly for mwifiex teardown, by ensuring we bring the link down
> first.
>
> Sample deadlock trace:
>
> [ 247.103516] INFO: task rmmod:2119 blocked for more than 123 seconds.
> [ 247.110630] Not tainted 5.12.4 #5
> [ 247.115796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 247.124557] task:rmmod state:D stack: 0 pid: 2119 ppid: 2114 flags:0x00400208
> [ 247.133905] Call trace:
> [ 247.136644] __switch_to+0x130/0x170
> [ 247.140643] __schedule+0x714/0xa0c
> [ 247.144548] schedule_preempt_disabled+0x88/0xf4
> [ 247.149714] __mutex_lock_common+0x43c/0x750
> [ 247.154496] mutex_lock_nested+0x5c/0x68
> [ 247.158884] cfg80211_netdev_notifier_call+0x280/0x4e0 [cfg80211]
> [ 247.165769] raw_notifier_call_chain+0x4c/0x78
> [ 247.170742] call_netdevice_notifiers_info+0x68/0xa4
> [ 247.176305] __dev_close_many+0x7c/0x138
> [ 247.180693] dev_close_many+0x7c/0x10c
> [ 247.184893] unregister_netdevice_many+0xfc/0x654
> [ 247.190158] unregister_netdevice_queue+0xb4/0xe0
> [ 247.195424] _cfg80211_unregister_wdev+0xa4/0x204 [cfg80211]
> [ 247.201816] cfg80211_unregister_wdev+0x20/0x2c [cfg80211]
> [ 247.208016] mwifiex_del_virtual_intf+0xc8/0x188 [mwifiex]
> [ 247.214174] mwifiex_uninit_sw+0x158/0x1b0 [mwifiex]
> [ 247.219747] mwifiex_remove_card+0x38/0xa0 [mwifiex]
> [ 247.225316] mwifiex_pcie_remove+0xd0/0xe0 [mwifiex_pcie]
> [ 247.231451] pci_device_remove+0x50/0xe0
> [ 247.235849] device_release_driver_internal+0x110/0x1b0
> [ 247.241701] driver_detach+0x5c/0x9c
> [ 247.245704] bus_remove_driver+0x84/0xb8
> [ 247.250095] driver_unregister+0x3c/0x60
> [ 247.254486] pci_unregister_driver+0x2c/0x90
> [ 247.259267] cleanup_module+0x18/0xcdc [mwifiex_pcie]
>
> Fixes: a05829a7222e ("cfg80211: avoid holding the RTNL when calling the driver")
> Cc: [email protected]
> Link: https://lore.kernel.org/linux-wireless/[email protected]/
> Link: https://lore.kernel.org/linux-wireless/[email protected]/
> Reported-by: Maximilian Luz <[email protected]>
> Reported-by: [email protected]
> Cc: Johannes Berg <[email protected]>
> Signed-off-by: Brian Norris <[email protected]>
> Tested-by: Maximilian Luz <[email protected]>
> Tested-by: Dave Olsthoorn <[email protected]>

Patch applied to wireless-drivers.git, thanks.

1f9482aa8d41 mwifiex: bring down link before deleting interface

--
https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches