If BSS instance is not provided in __cfg80211_connect_result then
a get bss is performed. This can return NULL if the BSS for the
given SSID is expired due to delayed scheduling of connect result event
in rdev->event_work. This can cause WARN_ON(!cr->bss) in
__cfg80211_connect_result to be triggered and cause cascading
failures. To mitigate this, initiate a get bss call in
cfg80211_connect_done itself and hold it to ensure that the BSS
instance does not get expired.
Signed-off-by: Abhishek Kumar <[email protected]>
---
net/wireless/sme.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/net/wireless/sme.c b/net/wireless/sme.c
index 07756ca5e3b5..52f65991accd 100644
--- a/net/wireless/sme.c
+++ b/net/wireless/sme.c
@@ -724,15 +724,8 @@ void __cfg80211_connect_result(struct net_device *dev,
}
#endif
- if (!cr->bss && (cr->status == WLAN_STATUS_SUCCESS)) {
+ if (cr->status == WLAN_STATUS_SUCCESS)
WARN_ON_ONCE(!wiphy_to_rdev(wdev->wiphy)->ops->connect);
- cr->bss = cfg80211_get_bss(wdev->wiphy, NULL, cr->bssid,
- wdev->ssid, wdev->ssid_len,
- wdev->conn_bss_type,
- IEEE80211_PRIVACY_ANY);
- if (cr->bss)
- cfg80211_hold_bss(bss_from_pub(cr->bss));
- }
if (wdev->current_bss) {
cfg80211_unhold_bss(wdev->current_bss);
@@ -882,6 +875,18 @@ void cfg80211_connect_done(struct net_device *dev,
ev->cr.fils.update_erp_next_seq_num = params->fils.update_erp_next_seq_num;
if (params->fils.update_erp_next_seq_num)
ev->cr.fils.erp_next_seq_num = params->fils.erp_next_seq_num;
+
+ /* Acquire and hold the bss if bss is not provided in argument.
+ * This ensures that the BSS does not get expired if the schedule
+ * of the rdev->event_work gets delayed.
+ */
+ if (!params->bss && params->bssid)
+ params->bss = cfg80211_get_bss(wdev->wiphy, NULL,
+ params->bssid,
+ wdev->ssid, wdev->ssid_len,
+ wdev->conn_bss_type,
+ IEEE80211_PRIVACY_ANY);
+
if (params->bss)
cfg80211_hold_bss(bss_from_pub(params->bss));
ev->cr.bss = params->bss;
--
2.31.0.rc2.261.g7f71774620-goog
On Tue, 2021-03-16 at 19:29 +0000, Abhishek Kumar wrote:
> If BSS instance is not provided in __cfg80211_connect_result then
> a get bss is performed. This can return NULL if the BSS for the
> given SSID is expired due to delayed scheduling of connect result event
> in rdev->event_work. This can cause WARN_ON(!cr->bss) in
> __cfg80211_connect_result to be triggered and cause cascading
> failures. To mitigate this, initiate a get bss call in
> cfg80211_connect_done itself and hold it to ensure that the BSS
> instance does not get expired.
I'm not sure I see the value in this.
You're basically picking a slightly earlier point in time where cfg80211
might know about the BSS entry still, so you're really just making the
problem window a few microseconds or perhaps milliseconds (whatever ends
up being the worker delay) shorter.
Compared to the 30s entry lifetime, that's nothing.
So what's the point? Please fix the driver instead to actually hold on
to it and report it back.
johannes
Greeting,
FYI, we noticed the following commit (built with gcc-9):
commit: 01df75115977b90907f0d3bc0511c58f14d4b2f9 ("[PATCH] net: wireless: search and hold bss in cfg80211_connect_done")
url: https://github.com/0day-ci/linux/commits/Abhishek-Kumar/net-wireless-search-and-hold-bss-in-cfg80211_connect_done/20210317-033314
base: https://git.kernel.org/cgit/linux/kernel/git/jberg/mac80211-next.git master
in testcase: hwsim
version: hwsim-x86_64-537ab94-1_20210101
with following parameters:
test: group-19
ucode: 0xe2
on test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz with 32G memory
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>
[ 70.141545] ------------[ cut here ]------------
[ 70.146165] WARNING: CPU: 3 PID: 35 at net/wireless/sme.c:728 __cfg80211_connect_result+0x391/0x3c0 [cfg80211]
[ 70.156225] Modules linked in: ccm mac80211_hwsim mac80211 cfg80211 rfkill libarc4 xfs libcrc32c sd_mod t10_pi sg ipmi_devintf ipmi_msghandler intel_rapl_msr intel_rapl_common i915 intel_gtt x86_pkg_temp_thermal intel_powerclamp drm_kms_helper coretemp syscopyarea kvm_intel sysfillrect ahci dell_wmi kvm irqbypass mei_wdt sparse_keymap dell_smbios crct10dif_pclmul crc32_pclmul crc32c_intel dell_wmi_descriptor wmi_bmof sysimgblt ghash_clmulni_intel libahci dcdbas fb_sys_fops rapl intel_cstate mei_me intel_uncore drm libata mei intel_pch_thermal wmi video acpi_pad intel_pmc_core ip_tables
[ 70.208119] CPU: 3 PID: 35 Comm: kworker/u8:3 Tainted: G I 5.11.0-rc7-02022-g01df75115977 #1
[ 70.217872] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.1.1 10/07/2015
[ 70.225263] Workqueue: phy0 ieee80211_iface_work [mac80211]
[ 70.230903] RIP: 0010:__cfg80211_connect_result+0x391/0x3c0 [cfg80211]
[ 70.237483] Code: ff ff e9 a0 fe ff ff 0f 0b 0f 0b e9 97 fe ff ff 65 ff 0d 62 ce 5a 3f e8 7d 97 6f c0 e9 86 fe ff ff 0f 0b e9 a5 fd ff ff 0f 0b <0f> 0b e9 b8 fe ff ff 0f 0b e9 b4 fd ff ff 0f 0b eb 95 0f 0b eb a9
[ 70.256263] RSP: 0018:ffffc900001838b0 EFLAGS: 00010246
[ 70.261488] RAX: ffffffffc0bf4200 RBX: ffff88886beb4910 RCX: 0000000000000000
[ 70.268621] RDX: 0000000000000000 RSI: ffff888100046e00 RDI: 0000000000002000
[ 70.275764] RBP: ffffc900001838f0 R08: 0000000000000ea0 R09: ffff88885dc20044
[ 70.282919] R10: 0000000000082a20 R11: ffff888871dab0f0 R12: ffffc90000183928
[ 70.290062] R13: ffff88886beb4000 R14: ffffc900001838b0 R15: ffffc90000183928
[ 70.297210] FS: 0000000000000000(0000) GS:ffff888871d80000(0000) knlGS:0000000000000000
[ 70.305302] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 70.311050] CR2: 00007fca7921f000 CR3: 000000087020a005 CR4: 00000000003706e0
[ 70.318184] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 70.325317] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 70.332454] Call Trace:
[ 70.334900] cfg80211_rx_assoc_resp+0x108/0x1e0 [cfg80211]
[ 70.340437] ieee80211_rx_mgmt_assoc_resp.cold+0x1f1/0x2ac [mac80211]
[ 70.346951] ieee80211_sta_rx_queued_mgmt+0xca/0x640 [mac80211]
[ 70.352957] ? update_load_avg+0x78/0x640
[ 70.356967] ? sched_clock+0x5/0x20
[ 70.360458] ? sched_clock_cpu+0xc/0xc0
[ 70.364296] ? newidle_balance+0x1f0/0x400
[ 70.368391] ieee80211_iface_work+0x27a/0x3a0 [mac80211]
[ 70.373750] process_one_work+0x1ed/0x3c0
[ 70.377759] worker_thread+0x50/0x3c0
[ 70.381431] ? process_one_work+0x3c0/0x3c0
[ 70.385628] kthread+0x116/0x160
[ 70.388857] ? kthread_park+0xa0/0xa0
[ 70.392518] ret_from_fork+0x22/0x30
[ 70.396093] ---[ end trace 0acdcc8abf949a49 ]---
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
Thanks,
Oliver Sang