From: Wen Gong <[email protected]>
Running this test in a loop, it is easy to reproduce an rtnl deadlock:

    iw reg set FI
    ifconfig wlan0 down

What happens is that thread A (the workqueue) tries to update the regulatory
domain, and ar->regd_update_work tries to acquire rtnl_lock:

    rtnl_lock+0x17/0x20
    ath11k_regd_update+0x15a/0x260 [ath11k]
    ath11k_regd_update_work+0x15/0x20 [ath11k]
    process_one_work+0x228/0x670
    worker_thread+0x4d/0x440
    kthread+0x16d/0x1b0
    ret_from_fork+0x22/0x30

And thread B (ifconfig) tries to stop the interface, so ath11k_mac_op_stop()
calls cancel_work_sync(&ar->regd_update_work):

    ifconfig 3109 [003] 2414.232506: probe:ath11k_mac_op_stop: (ffffffffc14187a0)
    drv_stop+0x30 ([mac80211])
    ieee80211_do_stop+0x5d2 ([mac80211])
    ieee80211_stop+0x3e ([mac80211])
    __dev_close_many+0x9e ([kernel.kallsyms])
    __dev_change_flags+0xbe ([kernel.kallsyms])
    dev_change_flags+0x23 ([kernel.kallsyms])
    devinet_ioctl+0x5e3 ([kernel.kallsyms])
    inet_ioctl+0x197 ([kernel.kallsyms])
    sock_do_ioctl+0x4d ([kernel.kallsyms])
    sock_ioctl+0x264 ([kernel.kallsyms])
    __x64_sys_ioctl+0x92 ([kernel.kallsyms])
    do_syscall_64+0x3a ([kernel.kallsyms])
    entry_SYSCALL_64_after_hwframe+0x63 ([kernel.kallsyms])
    __GI___ioctl+0x7 (/lib/x86_64-linux-gnu/libc-2.23.so)

The deadlock sequence is:

1. Thread B calls rtnl_lock().

2. Thread A starts to run and calls rtnl_lock() from within
   ath11k_regd_update_work(), then blocks because thread B owns the lock.

3. Thread B continues and calls cancel_work_sync(&ar->regd_update_work),
   but thread A is still in ath11k_regd_update_work() waiting for
   rtnl_lock(). cancel_work_sync() therefore waits forever for
   ath11k_regd_update_work() to finish, and we have a deadlock.

Fix this by switching from regulatory_set_wiphy_regd_sync() to
regulatory_set_wiphy_regd(). cfg80211 then schedules its own deferred work,
which takes the required locks itself, so the ath11k work item can simply
exit without taking any locks. This avoids the deadlock.

Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3

Signed-off-by: Wen Gong <[email protected]>
[kvalo: improve commit log]
Signed-off-by: Kalle Valo <[email protected]>
---
drivers/net/wireless/ath/ath11k/reg.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/net/wireless/ath/ath11k/reg.c b/drivers/net/wireless/ath/ath11k/reg.c
index 7ee3ff69dfc8..6fae4e61ede7 100644
--- a/drivers/net/wireless/ath/ath11k/reg.c
+++ b/drivers/net/wireless/ath/ath11k/reg.c
@@ -287,11 +287,7 @@ int ath11k_regd_update(struct ath11k *ar)
 		goto err;
 	}
 
-	rtnl_lock();
-	wiphy_lock(ar->hw->wiphy);
-	ret = regulatory_set_wiphy_regd_sync(ar->hw->wiphy, regd_copy);
-	wiphy_unlock(ar->hw->wiphy);
-	rtnl_unlock();
+	ret = regulatory_set_wiphy_regd(ar->hw->wiphy, regd_copy);
 
 	kfree(regd_copy);
 
base-commit: 023baf1318ef21442fab3842bf03883bc81223e0
--
2.30.2
On 10/6/2022 8:17 AM, Kalle Valo wrote:
Reviewed-by: Jeff Johnson <[email protected]>
Kalle Valo <[email protected]> wrote:
Patch applied to ath-next branch of ath.git, thanks.
d99884ad9e36 wifi: ath11k: avoid deadlock during regulatory update in ath11k_regd_update()
--
https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches