2012-07-31 12:55:14

by Pedro Francisco

Subject: unloading WiFi modules is usually triggering kernel crash

I've noticed in the past few days a pattern: sometimes nm-applet
starts showing empty bars for the signal strength.

Running the script:
sudo ifconfig wlan0 down; sleep 1
sudo rmmod hp_wmi; sudo rmmod iwl3945; sudo rmmod iwlegacy; sudo rmmod mac80211; sudo rmmod cfg80211
sleep 2; sudo rmmod rfkill; sync
sudo modprobe rfkill; sudo modprobe cfg80211; sudo modprobe mac80211; sudo modprobe iwlegacy
sudo modprobe iwl3945; sudo modprobe hp_wmi; sleep 1; sudo ifconfig wlan0 up

usually triggers a kernel crash. This has happened twice so far. I
tried it now for the third time but it didn't crash.
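
For reference, the same unload/reload sequence can be sketched as a
script that stops at the first failing step, which makes it easier to
see which rmmod or modprobe was running when things went wrong. This is
only a sketch under assumptions: the module list and interface name are
the ones from the script above, the helper names (module_loaded,
reverse_words, run, reload_wifi) and the DRY_RUN variable are made up
for illustration, the 2-second pause before removing rfkill is left
out, and it has to run as root.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are illustrative assumptions.

# Modules in unload order (dependents first), as in the script above.
UNLOAD_ORDER="hp_wmi iwl3945 iwlegacy mac80211 cfg80211 rfkill"

# Succeed if module $1 is listed in a /proc/modules-style file
# ($2, defaulting to /proc/modules).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Print the words of "$1" in reverse order (this gives the load order).
reverse_words() {
    out=""
    for w in $1; do out="$w${out:+ $out}"; done
    echo "$out"
}

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

reload_wifi() {
    iface="${1:-wlan0}"
    run ifconfig "$iface" down || return 1
    run sleep 1
    for m in $UNLOAD_ORDER; do
        # Skip modules that are not loaded instead of letting rmmod fail.
        if module_loaded "$m"; then
            run rmmod "$m" || { echo "rmmod $m failed" >&2; return 1; }
        fi
    done
    run sync
    for m in $(reverse_words "$UNLOAD_ORDER"); do
        run modprobe "$m" || { echo "modprobe $m failed" >&2; return 1; }
    done
    run sleep 1
    run ifconfig "$iface" up
}
```

Calling "DRY_RUN=1 reload_wifi wlan0" only prints the commands that
would be run for the modules currently loaded, which is a cheap way to
check the order before risking a crash.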

Logs (running with slub_debug ):
https://dl.dropbox.com/u/1332655/WiFi-issues/notTainted-cfg80211_mlme_disassoc-WARNING.log
https://dl.dropbox.com/u/1332655/WiFi-issues/alreadyTainted-debug_print_object-WARNING.log
(debug_print_object-WARNING was caused by running the above script,
which rmmods the modules)
https://dl.dropbox.com/u/1332655/WiFi-issues/iw_dev_scan.log
https://dl.dropbox.com/u/1332655/WiFi-issues/gshell-wifiBars_empty.png

Any ideas on what is going on? Looking at other mails around here it
seems not to be driver specific, at least the cfg80211_mlme_disassoc
part.

Thanks in Advance,
--
Pedro


2012-07-31 13:15:20

by John W. Linville

Subject: Re: unloading WiFi modules is usually triggering kernel crash

On Tue, Jul 31, 2012 at 01:54:52PM +0100, Pedro Francisco wrote:
> I've noticed in the past few days a pattern: sometimes nm-applet
> starts showing empty bars for the signal strength.
>
> Running the script:
> sudo ifconfig wlan0 down; sleep 1
> sudo rmmod hp_wmi; sudo rmmod iwl3945; sudo rmmod iwlegacy; sudo rmmod
> mac80211; sudo rmmod cfg80211
> sleep 2; sudo rmmod rfkill; sync
> sudo modprobe rfkill; sudo modprobe cfg80211; sudo modprobe mac80211;
> sudo modprobe iwlegacy
> sudo modprobe iwl3945; sudo modprobe hp_wmi; sleep 1; sudo ifconfig wlan0 up
>
> usually triggers a kernel crash. This has happened twice so far. I
> tried it now for the third time but it didn't crash.
>
> Logs (running with slub_debug ):
> https://dl.dropbox.com/u/1332655/WiFi-issues/notTainted-cfg80211_mlme_disassoc-WARNING.log
> https://dl.dropbox.com/u/1332655/WiFi-issues/alreadyTainted-debug_print_object-WARNING.log
> (debug_print_object-WARNING was caused by running the above script
> rmmoding things)
> https://dl.dropbox.com/u/1332655/WiFi-issues/iw_dev_scan.log
> https://dl.dropbox.com/u/1332655/WiFi-issues/gshell-wifiBars_empty.png
>
> Any ideas on what is going on? Looking at other mails around here it
> seems not to be driver specific, at least the cfg80211_mlme_disassoc
> part.

Looks the same as this one, FWIW...

https://bugzilla.redhat.com/show_bug.cgi?id=834158

John
--
John W. Linville
Someday the world will need a hero, and you might be all we have. Be ready.

2012-08-07 10:22:13

by Stanislaw Gruszka

Subject: Re: unloading WiFi modules is usually triggering kernel crash

On Tue, Jul 31, 2012 at 01:54:52PM +0100, Pedro Francisco wrote:
> I've noticed in the past few days a pattern: sometimes nm-applet
> starts showing empty bars for the signal strength.

This could be an RSSI reporting problem or maybe a NetworkManager issue.
Does the problem go away when you switch to an older or newer kernel?

> Running the script:
> sudo ifconfig wlan0 down; sleep 1
> sudo rmmod hp_wmi; sudo rmmod iwl3945; sudo rmmod iwlegacy; sudo rmmod
> mac80211; sudo rmmod cfg80211
> sleep 2; sudo rmmod rfkill; sync
> sudo modprobe rfkill; sudo modprobe cfg80211; sudo modprobe mac80211;
> sudo modprobe iwlegacy
> sudo modprobe iwl3945; sudo modprobe hp_wmi; sleep 1; sudo ifconfig wlan0 up

I ran a slightly modified script (I do not have hp_wmi.ko and rfkill.ko) for
a few hours and did not get any WARNING or crash. I used 3.5; can you check
whether the problem is also fixed on your system on 3.5 or newer?

Stanislaw

2012-08-30 15:59:18

by Pedro Francisco

Subject: Re: unloading WiFi modules is usually triggering kernel crash

On Tue, Aug 7, 2012 at 11:22 AM, Stanislaw Gruszka wrote:
> On Tue, Jul 31, 2012 at 01:54:52PM +0100, Pedro Francisco wrote:
>> I've noticed in the past few days a pattern: sometimes nm-applet
>> starts showing empty bars for the signal strength.
>
> RSSI reporting problem or maybe NM issue. When you change kernel to
> older or newer does this problem go away ?
>
>> Running the script:
>> sudo ifconfig wlan0 down; sleep 1
>> sudo rmmod hp_wmi; sudo rmmod iwl3945; sudo rmmod iwlegacy; sudo rmmod
>> mac80211; sudo rmmod cfg80211
>> sleep 2; sudo rmmod rfkill; sync
>> sudo modprobe rfkill; sudo modprobe cfg80211; sudo modprobe mac80211;
>> sudo modprobe iwlegacy
>> sudo modprobe iwl3945; sudo modprobe hp_wmi; sleep 1; sudo ifconfig wlan0 up
>
> I run a bit modified script (I do not have hp_wmi.ko and rfkill.ko) for few
> hours, and did not get any WARNING/crash. I used 3.5, can you check if that
> problem is also fixed on your system on 3.5 or newer.

On 3.5.2-3.fc17.i686.PAE everything seems stable. The problem I had
described hasn't happened recently.
I guess it got fixed in the meantime.

Thank you for your time,
--
Pedro

2012-09-26 12:47:39

by Pedro Francisco

Subject: Re: unloading WiFi modules is usually triggering kernel crash

On Thu, Aug 30, 2012 at 4:58 PM, Pedro Francisco wrote:
> On Tue, Aug 7, 2012 at 11:22 AM, Stanislaw Gruszka wrote:
>> On Tue, Jul 31, 2012 at 01:54:52PM +0100, Pedro Francisco wrote:
>>> I've noticed in the past few days a pattern: sometimes nm-applet
>>> starts showing empty bars for the signal strength.
>>
>> RSSI reporting problem or maybe NM issue. When you change kernel to
>> older or newer does this problem go away ?
>>
>>> Running the script:
>>> sudo ifconfig wlan0 down; sleep 1
>>> sudo rmmod hp_wmi; sudo rmmod iwl3945; sudo rmmod iwlegacy; sudo rmmod
>>> mac80211; sudo rmmod cfg80211
>>> sleep 2; sudo rmmod rfkill; sync
>>> sudo modprobe rfkill; sudo modprobe cfg80211; sudo modprobe mac80211;
>>> sudo modprobe iwlegacy
>>> sudo modprobe iwl3945; sudo modprobe hp_wmi; sleep 1; sudo ifconfig wlan0 up
>>
>> I run a bit modified script (I do not have hp_wmi.ko and rfkill.ko) for few
>> hours, and did not get any WARNING/crash. I used 3.5, can you check if that
>> problem is also fixed on your system on 3.5 or newer.
>
> On 3.5.2-3.fc17.i686.PAE everything seems stable. The problem I had
> described hasn't happened recently.
> I guess it got fixed in the meantime.

I was wrong, got it again.

So, to recap: once the network applet shows no signal, but only then,
removing the wireless modules triggers an unrecoverable kernel panic.
I still haven't compiled a relocatable x86 kernel to get a proper
backtrace using kexec/kdump, sorry.

I found something else as well. Notice this output of "iwconfig" when
everything is _normal_:
$ iwconfig wlan0
wlan0 IEEE 802.11abg ESSID:"eduroam"
Mode:Managed Frequency:2.437 GHz Access Point: B8:62:1F:XX:XX:XX
Bit Rate=54 Mb/s Tx-Power=15 dBm
Retry long limit:7 RTS thr:off Fragment thr:off
Power Management:off
Link Quality=58/70 Signal level=-52 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:0 Missed beacon:0

When I have the "empty signal bars" issue:
$ iwconfig wlan0
wlan0 IEEE 802.11abg ESSID:off/any
Mode:Managed Access Point: Not-Associated Tx-Power=15 dBm
Retry long limit:7 RTS thr:off Fragment thr:off
Power Management:off

In case you're wondering, it is connected and streaming stuff :)
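
A quick way to catch this state from a script is to look for
"Not-Associated" in the iwconfig output. The following is a minimal
sketch; the function name is made up, and it only inspects the text
that iwconfig prints, so it says nothing about why the reported state
and the actual link disagree.

```shell
#!/bin/sh
# Sketch only: reads `iwconfig <iface>` output on stdin and succeeds
# if the interface is reported as not associated.
reports_not_associated() {
    grep -q 'Access Point: Not-Associated'
}
```

Something like
'iwconfig wlan0 | reports_not_associated && echo "reported as not associated"'
could then be run periodically while the connection is known to be up.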

I can sometimes trigger it on purpose: I just have to roam to a 5GHz
AP of the same ESS, cycle around 2GHz and back to 5GHz (using wpa_cli
roam XX:XX:XX:XX:XX ). If I get "SME: Authentication request to the
driver failed", then disabling NetworkManager (not wireless) and
reenabling will _probably_ get the "empty signal bars" (I was just
able to trigger the "empty signal bars" now after a clean boot).
So I'm guessing something gets corrupted, which is why reloading the
modules will crash.

I'm aware, from a patch to _iwlwifi_ (not iwl3945/iwlegacy) [1], that
2->5GHz roaming does not work very well on newer Intel wireless cards,
so it is worth considering whether the same is happening here.

Also, some info collected two days ago regarding "Invalid misc:". Is
getting 10 "invalid misc" packets in 10 seconds normal? The output of
several runs of
'VAL=`date`; VAL="$VAL $(iwconfig wlan0 |grep "Invalid misc")"; echo $VAL'
follows:
Seg Set 24 15:06:36 WEST 2012 Tx excessive retries:5 Invalid misc:133 Missed beacon:0
Seg Set 24 15:06:46 WEST 2012 Tx excessive retries:5 Invalid misc:143 Missed beacon:0
Seg Set 24 15:07:00 WEST 2012 Tx excessive retries:5 Invalid misc:148 Missed beacon:0
Seg Set 24 15:21:46 WEST 2012 Tx excessive retries:22 Invalid misc:495 Missed beacon:0
Seg Set 24 15:24:41 WEST 2012 Tx excessive retries:24 Invalid misc:593 Missed beacon:0
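
To put numbers on this, the counter can be pulled out of the iwconfig
output and compared between two samples. The helpers below are a
minimal sketch with made-up names; they assume the "Invalid misc:NNN"
field format shown in the samples above.

```shell
#!/bin/sh
# Sketch only: function names are illustrative.

# Print the "Invalid misc" counter found in iwconfig output on stdin.
invalid_misc_count() {
    sed -n 's/.*Invalid misc:\([0-9][0-9]*\).*/\1/p'
}

# Print how much the counter grew between two samples.
# Usage: counter_delta OLD NEW
counter_delta() {
    echo $(( $2 - $1 ))
}
```

With the samples above, the counter grows by 10 between 15:06:36 and
15:06:46 (about one per second) and by 98 between 15:21:46 and
15:24:41 (175 seconds, so roughly one every two seconds).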


So, something is getting corrupted here. Do you want the full logs?

[1] http://thread.gmane.org/gmane.linux.kernel.wireless.general/89361/focus=89445

--
Pedro

2012-10-15 15:49:17

by Pedro Francisco

Subject: Re: unloading WiFi modules is usually triggering kernel crash

On Fri, Oct 12, 2012 at 1:13 PM, Stanislaw Gruszka wrote:
> On Tue, Oct 09, 2012 at 10:14:40AM +0100, Pedro Francisco wrote:
>> So, I'm guessing this means it is related to what you found on iwlwifi
>> (even if I'm on iwlegacy)?
>
> Yes, this seems to be cfg80211 problem. I think crash happen because
> cfg80211 is in disassociate state (i.e. has wdev->current_bss NULL) and
> erroneously mac80211 stays in associate state. So while we unload
> module cfg80211_mlme_down() we do not call ieee80211_deauth().
>
> I think this state mishmash happens because wrong behaviour on
> __cfg80211_mlme_deauth(). Below patch try to correct that.
> Can you check if it prevent a crash? On my environment I can
> not reproduce this problem reliably.
>
> Thanks
> Stanislaw
>
> diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
> index ab78b53..9b99b60 100644
> --- a/include/net/cfg80211.h
> +++ b/include/net/cfg80211.h
> @@ -1218,6 +1218,7 @@ struct cfg80211_deauth_request {
> const u8 *ie;
> size_t ie_len;
> u16 reason_code;
> + bool local_state_change;
> };
>
> /**
> diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
> index e714ed8..e510a33 100644
> --- a/net/mac80211/mlme.c
> +++ b/net/mac80211/mlme.c
> @@ -3549,6 +3549,7 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
> {
> struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
> u8 frame_buf[IEEE80211_DEAUTH_FRAME_LEN];
> + bool tx = !req->local_state_change;
>
> mutex_lock(&ifmgd->mtx);
>
> @@ -3565,12 +3566,12 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
> if (ifmgd->associated &&
> ether_addr_equal(ifmgd->associated->bssid, req->bssid)) {
> ieee80211_set_disassoc(sdata, IEEE80211_STYPE_DEAUTH,
> - req->reason_code, true, frame_buf);
> + req->reason_code, tx, frame_buf);
> } else {
> drv_mgd_prepare_tx(sdata->local, sdata);
> ieee80211_send_deauth_disassoc(sdata, req->bssid,
> IEEE80211_STYPE_DEAUTH,
> - req->reason_code, true,
> + req->reason_code, tx,
> frame_buf);
> }
>
> diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c
> index 3df195a..4954010 100644
> --- a/net/wireless/mlme.c
> +++ b/net/wireless/mlme.c
> @@ -457,21 +457,11 @@ int __cfg80211_mlme_deauth(struct cfg80211_registered_device *rdev,
> .reason_code = reason,
> .ie = ie,
> .ie_len = ie_len,
> + .local_state_change = local_state_change,
> };
>
> ASSERT_WDEV_LOCK(wdev);
>
> - if (local_state_change) {
> - if (wdev->current_bss &&
> - ether_addr_equal(wdev->current_bss->pub.bssid, bssid)) {
> - cfg80211_unhold_bss(wdev->current_bss);
> - cfg80211_put_bss(&wdev->current_bss->pub);
> - wdev->current_bss = NULL;
> - }
> -
> - return 0;
> - }
> -
> return rdev->ops->deauth(&rdev->wiphy, dev, &req);
> }
>

I've been testing the patch since this morning (GMT), and I can't
reproduce any of the issues I referred to in this thread (I had to adapt
the patch slightly, though). Seems to be fixed!

Thank you for your help!
--
Pedro Francisco


Attachments:
mlme-timers-fedora-3.6.1-fc17-kernel.patch (1.88 kB)

2012-10-12 12:14:36

by Stanislaw Gruszka

Subject: Re: unloading WiFi modules is usually triggering kernel crash

On Tue, Oct 09, 2012 at 10:14:40AM +0100, Pedro Francisco wrote:
> So, I'm guessing this means it is related to what you found on iwlwifi
> (even if I'm on iwlegacy)?

Yes, this seems to be a cfg80211 problem. I think the crash happens because
cfg80211 is in the disassociated state (i.e. wdev->current_bss is NULL) while
mac80211 erroneously stays in the associated state. So when we unload the
module, cfg80211_mlme_down() does not end up calling ieee80211_deauth().

I think this state mishmash happens because of wrong behaviour in
__cfg80211_mlme_deauth(). The patch below tries to correct that.
Can you check whether it prevents the crash? In my environment I cannot
reproduce this problem reliably.

Thanks
Stanislaw

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index ab78b53..9b99b60 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -1218,6 +1218,7 @@ struct cfg80211_deauth_request {
const u8 *ie;
size_t ie_len;
u16 reason_code;
+ bool local_state_change;
};

/**
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index e714ed8..e510a33 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -3549,6 +3549,7 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
{
struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
u8 frame_buf[IEEE80211_DEAUTH_FRAME_LEN];
+ bool tx = !req->local_state_change;

mutex_lock(&ifmgd->mtx);

@@ -3565,12 +3566,12 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
if (ifmgd->associated &&
ether_addr_equal(ifmgd->associated->bssid, req->bssid)) {
ieee80211_set_disassoc(sdata, IEEE80211_STYPE_DEAUTH,
- req->reason_code, true, frame_buf);
+ req->reason_code, tx, frame_buf);
} else {
drv_mgd_prepare_tx(sdata->local, sdata);
ieee80211_send_deauth_disassoc(sdata, req->bssid,
IEEE80211_STYPE_DEAUTH,
- req->reason_code, true,
+ req->reason_code, tx,
frame_buf);
}

diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c
index 3df195a..4954010 100644
--- a/net/wireless/mlme.c
+++ b/net/wireless/mlme.c
@@ -457,21 +457,11 @@ int __cfg80211_mlme_deauth(struct cfg80211_registered_device *rdev,
.reason_code = reason,
.ie = ie,
.ie_len = ie_len,
+ .local_state_change = local_state_change,
};

ASSERT_WDEV_LOCK(wdev);

- if (local_state_change) {
- if (wdev->current_bss &&
- ether_addr_equal(wdev->current_bss->pub.bssid, bssid)) {
- cfg80211_unhold_bss(wdev->current_bss);
- cfg80211_put_bss(&wdev->current_bss->pub);
- wdev->current_bss = NULL;
- }
-
- return 0;
- }
-
return rdev->ops->deauth(&rdev->wiphy, dev, &req);
}


2012-10-03 14:31:27

by Stanislaw Gruszka

Subject: Re: unloading WiFi modules is usually triggering kernel crash

On Wed, Sep 26, 2012 at 01:47:18PM +0100, Pedro Francisco wrote:
> On Thu, Aug 30, 2012 at 4:58 PM, Pedro Francisco wrote:
> > On Tue, Aug 7, 2012 at 11:22 AM, Stanislaw Gruszka wrote:
> >> On Tue, Jul 31, 2012 at 01:54:52PM +0100, Pedro Francisco wrote:
> >>> I've noticed in the past few days a pattern: sometimes nm-applet
> >>> starts showing empty bars for the signal strength.
> >>
> >> RSSI reporting problem or maybe NM issue. When you change kernel to
> >> older or newer does this problem go away ?
> >>
> >>> Running the script:
> >>> sudo ifconfig wlan0 down; sleep 1
> >>> sudo rmmod hp_wmi; sudo rmmod iwl3945; sudo rmmod iwlegacy; sudo rmmod
> >>> mac80211; sudo rmmod cfg80211
> >>> sleep 2; sudo rmmod rfkill; sync
> >>> sudo modprobe rfkill; sudo modprobe cfg80211; sudo modprobe mac80211;
> >>> sudo modprobe iwlegacy
> >>> sudo modprobe iwl3945; sudo modprobe hp_wmi; sleep 1; sudo ifconfig wlan0 up
> >>
> >> I run a bit modified script (I do not have hp_wmi.ko and rfkill.ko) for few
> >> hours, and did not get any WARNING/crash. I used 3.5, can you check if that
> >> problem is also fixed on your system on 3.5 or newer.
> >
> > On 3.5.2-3.fc17.i686.PAE everything seems stable. The problem I had
> > described hasn't happened recently.
> > I guess it got fixed in the meantime.
>
> I was wrong, got it again.
>
> So, to recap: once the network applet shows no signal, but only then,
> removing the wireless modules triggers an unrecoverable kernel panic.
> I still haven't compiled a relocatable x86 kernel to get a proper
> backtrace using kexec/kdump, sorry.
>
> I found something else as well. Notice this output of "iwconfig" when
> everything is _normal_:
> $ iwconfig wlan0
> wlan0 IEEE 802.11abg ESSID:"eduroam"
> Mode:Managed Frequency:2.437 GHz Access Point: B8:62:1F:XX:XX:XX
> Bit Rate=54 Mb/s Tx-Power=15 dBm
> Retry long limit:7 RTS thr:off Fragment thr:off
> Power Management:off
> Link Quality=58/70 Signal level=-52 dBm
> Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
> Tx excessive retries:0 Invalid misc:0 Missed beacon:0
>
> When I have the "empty signal bars" issue:
> $ iwconfig wlan0
> wlan0 IEEE 802.11abg ESSID:off/any
> Mode:Managed Access Point: Not-Associated Tx-Power=15 dBm
> Retry long limit:7 RTS thr:off Fragment thr:off
> Power Management:off
>
> In case you're wondering, it is connected and streaming stuff :)
>
> I can sometimes trigger it on purpose: I just have to roam to a 5GHz
> AP of the same ESS, cycle around 2GHz and back to 5GHz (using wpa_cli
> roam XX:XX:XX:XX:XX ). If I get "SME: Authentication request to the
> driver failed", then disabling NetworkManager (not wireless) and
> reenabling will _probably_ get the "empty signal bars" (I was just
> able to trigger the "empty signal bars" now after a clean boot).
> So I'm guessing something gets corrupted, which is why reloading the
> modules will crash.

We do not stop mac80211 timers on module unload. I reproduced the
warnings below with iwlwifi on a 3.5 kernel with DEBUG_OBJECTS enabled.
I forced roaming many times and then ran "modprobe -r iwlwifi".
Unfortunately those steps do not trigger the warnings every time; they
happened just once.

iwlwifi 0000:02:00.0: ACTIVATE a non DRIVER active station id 0 addr 6c:50:4d:3f:79:73
------------[ cut here ]------------
WARNING: at lib/debugobjects.c:261 debug_print_object+0x8e/0xb0()
Hardware name: SandyBridge Platform
ODEBUG: free active (active state 0) object type: timer_list hint: ieee80211_sta_conn_mon_timer+0x0/0x40 [mac80211]
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq
freq_table mperf ipv6 uinput arc4 sg iwlwifi(-) mac80211 cfg80211 rfkill
coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr
lpc_ich mfd_core i2c_i801 e1000e ext4 mbcache jbd2 sd_mod crc_t10dif
sr_mod cdrom aesni_intel cryptd aes_x86_64 aes_generic ahci libahci i915
drm_kms_helper drm i2c_algo_bit i2c_core video dm_mirror dm_region_hash
dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 3064, comm: modprobe Not tainted 3.5.0 #1
Call Trace:
[<ffffffff810535af>] warn_slowpath_common+0x7f/0xc0
[<ffffffff810536a6>] warn_slowpath_fmt+0x46/0x50
[<ffffffff812901be>] debug_print_object+0x8e/0xb0
[<ffffffffa03a09b0>] ? ieee80211_chswitch_timer+0x40/0x40 [mac80211]
[<ffffffff81290a0d>] __debug_check_no_obj_freed+0x10d/0x200
[<ffffffff81290b1d>] debug_check_no_obj_freed+0x1d/0x30
[<ffffffff8117a2b0>] kfree+0xc0/0x330
[<ffffffff810b9083>] ? __lock_release+0x133/0x1a0
[<ffffffff815555f0>] ? _raw_spin_unlock_irqrestore+0x40/0x80
[<ffffffff814957c4>] netdev_release+0x44/0x60
[<ffffffff813704b7>] device_release+0x27/0xa0
[<ffffffff8127da42>] kobject_cleanup+0x82/0x1b0
[<ffffffff8127db7d>] kobject_release+0xd/0x10
[<ffffffff8127d8cc>] kobject_put+0x2c/0x60
[<ffffffff8147e371>] netdev_run_todo+0x101/0x180
[<ffffffff8148f5ae>] rtnl_unlock+0xe/0x10
[<ffffffffa0366178>] ieee80211_unregister_hw+0x58/0x120 [mac80211]
[<ffffffffa040912b>] iwlagn_mac_unregister+0x2b/0x40 [iwlwifi]
[<ffffffffa03fdf59>] iwl_op_mode_dvm_stop+0x49/0xf0 [iwlwifi]
[<ffffffffa041f730>] iwl_drv_stop+0x40/0x60 [iwlwifi]
[<ffffffffa0430a39>] iwl_pci_remove+0x25/0x3c [iwlwifi]
[<ffffffff812aafc2>] pci_device_remove+0x52/0x120
[<ffffffff813741cc>] __device_release_driver+0x7c/0xe0
[<ffffffff81374308>] driver_detach+0xd8/0xe0
[<ffffffff81372f61>] bus_remove_driver+0x91/0x110
[<ffffffff81374fd2>] driver_unregister+0x62/0xa0
[<ffffffff812ab2b4>] pci_unregister_driver+0x44/0xa0
[<ffffffffa041f3d5>] iwl_pci_unregister_driver+0x15/0x20 [iwlwifi]
[<ffffffffa0430a01>] iwl_exit+0x9/0x1c [iwlwifi]
[<ffffffff810c50f1>] sys_delete_module+0x1d1/0x2c0
[<ffffffff81555855>] ? retint_swapgs+0x13/0x1b
[<ffffffff810e169c>] ? __audit_syscall_entry+0xcc/0x210
[<ffffffff812896ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8155de69>] system_call_fastpath+0x16/0x1b
---[ end trace 8070f580fc119b8b ]---
------------[ cut here ]------------
WARNING: at lib/debugobjects.c:261 debug_print_object+0x8e/0xb0()
Hardware name: SandyBridge Platform
ODEBUG: free active (active state 0) object type: timer_list hint: ieee80211_sta_bcn_mon_timer+0x0/0x40 [mac80211]
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq
freq_table mperf ipv6 uinput arc4 sg iwlwifi(-) mac80211 cfg80211 rfkill
coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr
lpc_ich mfd_core i2c_i801 e1000e ext4 mbcache jbd2 sd_mod crc_t10dif
sr_mod cdrom aesni_intel cryptd aes_x86_64 aes_generic ahci libahci i915
drm_kms_helper drm i2c_algo_bit i2c_core video dm_mirror dm_region_hash
dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 3064, comm: modprobe Tainted: G W 3.5.0 #1
Call Trace:
[<ffffffff810535af>] warn_slowpath_common+0x7f/0xc0
[<ffffffff810536a6>] warn_slowpath_fmt+0x46/0x50
[<ffffffff812901be>] debug_print_object+0x8e/0xb0
[<ffffffffa03a09f0>] ? ieee80211_sta_conn_mon_timer+0x40/0x40 [mac80211]
[<ffffffff81290a0d>] __debug_check_no_obj_freed+0x10d/0x200
[<ffffffff81290b1d>] debug_check_no_obj_freed+0x1d/0x30
[<ffffffff8117a2b0>] kfree+0xc0/0x330
[<ffffffff810b9083>] ? __lock_release+0x133/0x1a0
[<ffffffff815555f0>] ? _raw_spin_unlock_irqrestore+0x40/0x80
[<ffffffff814957c4>] netdev_release+0x44/0x60
[<ffffffff813704b7>] device_release+0x27/0xa0
[<ffffffff8127da42>] kobject_cleanup+0x82/0x1b0
[<ffffffff8127db7d>] kobject_release+0xd/0x10
[<ffffffff8127d8cc>] kobject_put+0x2c/0x60
[<ffffffff8147e371>] netdev_run_todo+0x101/0x180
[<ffffffff8148f5ae>] rtnl_unlock+0xe/0x10
[<ffffffffa0366178>] ieee80211_unregister_hw+0x58/0x120 [mac80211]
[<ffffffffa040912b>] iwlagn_mac_unregister+0x2b/0x40 [iwlwifi]
[<ffffffffa03fdf59>] iwl_op_mode_dvm_stop+0x49/0xf0 [iwlwifi]
[<ffffffffa041f730>] iwl_drv_stop+0x40/0x60 [iwlwifi]
[<ffffffffa0430a39>] iwl_pci_remove+0x25/0x3c [iwlwifi]
[<ffffffff812aafc2>] pci_device_remove+0x52/0x120
[<ffffffff813741cc>] __device_release_driver+0x7c/0xe0
[<ffffffff81374308>] driver_detach+0xd8/0xe0
[<ffffffff81372f61>] bus_remove_driver+0x91/0x110
[<ffffffff81374fd2>] driver_unregister+0x62/0xa0
[<ffffffff812ab2b4>] pci_unregister_driver+0x44/0xa0
[<ffffffffa041f3d5>] iwl_pci_unregister_driver+0x15/0x20 [iwlwifi]
[<ffffffffa0430a01>] iwl_exit+0x9/0x1c [iwlwifi]
[<ffffffff810c50f1>] sys_delete_module+0x1d1/0x2c0
[<ffffffff81555855>] ? retint_swapgs+0x13/0x1b
[<ffffffff810e169c>] ? __audit_syscall_entry+0xcc/0x210
[<ffffffff812896ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8155de69>] system_call_fastpath+0x16/0x1b
---[ end trace 8070f580fc119b8c ]---
Bridge firewalling registered
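
When hunting for these in a long kernel log, it helps to list only the
timers that debugobjects complained about. The helper below is a
minimal sketch with a made-up name; it assumes each "ODEBUG: free
active" message sits on a single line, as dmesg prints it, and that the
"hint:" field names the timer function as in the warnings above.

```shell
#!/bin/sh
# Sketch only: reads kernel log text on stdin and prints the function
# named in the hint of each "ODEBUG: free active" line, one per line.
odebug_free_active_hints() {
    sed -n 's/.*ODEBUG: free active.*hint: \([A-Za-z0-9_]*\).*/\1/p'
}
```

For example, "dmesg | odebug_free_active_hints | sort | uniq -c" would
count how often each timer was freed while still active.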

> misc:" is getting 10 "invalid misc" packets in 10 seconds normal?
> Several 'VAL=`date`; VAL="$VAL $(iwconfig wlan0 |grep "Invalid
> misc")"; echo $VAL' follow:
> Seg Set 24 15:06:36 WEST 2012 Tx excessive retries:5 Invalid misc:133
> Missed beacon:0
> Seg Set 24 15:06:46 WEST 2012 Tx excessive retries:5 Invalid misc:143
> Missed beacon:0
> Seg Set 24 15:07:00 WEST 2012 Tx excessive retries:5 Invalid misc:148
> Missed beacon:0
> Seg Set 24 15:21:46 WEST 2012 Tx excessive retries:22 Invalid misc:495
> Missed beacon:0
> Seg Set 24 15:24:41 WEST 2012 Tx excessive retries:24 Invalid misc:593
> Missed beacon:0

I see a lot of that. It can be caused by a noisy radio environment, but it
can also be a firmware/driver bug. Unfortunately, those kinds of bugs are
not easy to fix.

Stanislaw

2012-10-09 09:15:03

by Pedro Francisco

Subject: Re: unloading WiFi modules is usually triggering kernel crash

On Wed, Oct 3, 2012 at 3:30 PM, Stanislaw Gruszka wrote:
> On Wed, Sep 26, 2012 at 01:47:18PM +0100, Pedro Francisco wrote:
>> On Thu, Aug 30, 2012 at 4:58 PM, Pedro Francisco wrote:
>> > On Tue, Aug 7, 2012 at 11:22 AM, Stanislaw Gruszka wrote:
>> >> On Tue, Jul 31, 2012 at 01:54:52PM +0100, Pedro Francisco wrote:
>> >>> I've noticed in the past few days a pattern: sometimes nm-applet
>> >>> starts showing empty bars for the signal strength.
>> >>
>> >> RSSI reporting problem or maybe NM issue. When you change kernel to
>> >> older or newer does this problem go away ?
>> >>
>> >>> Running the script:
>> >>> sudo ifconfig wlan0 down; sleep 1
>> >>> sudo rmmod hp_wmi; sudo rmmod iwl3945; sudo rmmod iwlegacy; sudo rmmod
>> >>> mac80211; sudo rmmod cfg80211
>> >>> sleep 2; sudo rmmod rfkill; sync
>> >>> sudo modprobe rfkill; sudo modprobe cfg80211; sudo modprobe mac80211;
>> >>> sudo modprobe iwlegacy
>> >>> sudo modprobe iwl3945; sudo modprobe hp_wmi; sleep 1; sudo ifconfig wlan0 up
>> >>
>> >> I run a bit modified script (I do not have hp_wmi.ko and rfkill.ko) for few
>> >> hours, and did not get any WARNING/crash. I used 3.5, can you check if that
>> >> problem is also fixed on your system on 3.5 or newer.
>> >
>> > On 3.5.2-3.fc17.i686.PAE everything seems stable. The problem I had
>> > described hasn't happened recently.
>> > I guess it got fixed in the meantime.
>>
>> I was wrong, got it again.
>>
>> So, to recap: once the network applet shows no signal, but only then,
>> removing the wireless modules triggers an unrecoverable kernel panic.
>> I still haven't compiled a relocatable x86 kernel to get a proper
>> backtrace using kexec/kdump, sorry.
>>
>> I found something else as well. Notice this output of "iwconfig" when
>> everything is _normal_:
>> $ iwconfig wlan0
>> wlan0 IEEE 802.11abg ESSID:"eduroam"
>> Mode:Managed Frequency:2.437 GHz Access Point: B8:62:1F:XX:XX:XX
>> Bit Rate=54 Mb/s Tx-Power=15 dBm
>> Retry long limit:7 RTS thr:off Fragment thr:off
>> Power Management:off
>> Link Quality=58/70 Signal level=-52 dBm
>> Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
>> Tx excessive retries:0 Invalid misc:0 Missed beacon:0
>>
>> When I have the "empty signal bars" issue:
>> $ iwconfig wlan0
>> wlan0 IEEE 802.11abg ESSID:off/any
>> Mode:Managed Access Point: Not-Associated Tx-Power=15 dBm
>> Retry long limit:7 RTS thr:off Fragment thr:off
>> Power Management:off
>>
>> In case you're wondering, it is connected and streaming stuff :)
>>
>> I can sometimes trigger it on purpose: I just have to roam to a 5GHz
>> AP of the same ESS, cycle around 2GHz and back to 5GHz (using wpa_cli
>> roam XX:XX:XX:XX:XX ). If I get "SME: Authentication request to the
>> driver failed", then disabling NetworkManager (not wireless) and
>> reenabling will _probably_ get the "empty signal bars" (I was just
>> able to trigger the "empty signal bars" now after a clean boot).
>> So I'm guessing something gets corrupted, which is why reloading the
>> modules will crash.
>
> We do not stop mac80211 timers on module unload. I reproduced below
> warnings with iwlwifi on 3.5 kernel with DEBUG_OBJECTS enabled.
> I forced roaming many times, and then do "modprobe -r iwlwifi".
> Unfortunately those steps do not trigger warnings anytime, they
> happened just once.
>
> iwlwifi 0000:02:00.0: ACTIVATE a non DRIVER active station id 0 addr 6c:50:4d:3f:79:73
> ------------[ cut here ]------------
> WARNING: at lib/debugobjects.c:261 debug_print_object+0x8e/0xb0()
> Hardware name: SandyBridge Platform
> ODEBUG: free active (active state 0) object type: timer_list hint:
> ieee80211_sta_conn_mon_timer+0x0/0x40 [mac80211]
> Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq
> freq_table mperf ipv6 uinput arc4 sg iwlwifi(-) mac80211 cfg80211 rfkill
> coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr
> lpc_ich mfd_core i2c_i801 e1000e ext4 mbcache jbd2 sd_mod crc_t10dif
> sr_mod cdrom aesni_intel cryptd aes_x86_64 aes_generic ahci libahci i915
> drm_kms_helper drm i2c_algo_bit i2c_core video dm_mirror dm_region_hash
> dm_log dm_mod [last unloaded: scsi_wait_scan]
> Pid: 3064, comm: modprobe Not tainted 3.5.0 #1
> Call Trace:
> [<ffffffff810535af>] warn_slowpath_common+0x7f/0xc0
> [<ffffffff810536a6>] warn_slowpath_fmt+0x46/0x50
> [<ffffffff812901be>] debug_print_object+0x8e/0xb0
> [<ffffffffa03a09b0>] ? ieee80211_chswitch_timer+0x40/0x40 [mac80211]
> [<ffffffff81290a0d>] __debug_check_no_obj_freed+0x10d/0x200
> [<ffffffff81290b1d>] debug_check_no_obj_freed+0x1d/0x30
> [<ffffffff8117a2b0>] kfree+0xc0/0x330
> [<ffffffff810b9083>] ? __lock_release+0x133/0x1a0
> [<ffffffff815555f0>] ? _raw_spin_unlock_irqrestore+0x40/0x80
> [<ffffffff814957c4>] netdev_release+0x44/0x60
> [<ffffffff813704b7>] device_release+0x27/0xa0
> [<ffffffff8127da42>] kobject_cleanup+0x82/0x1b0
> [<ffffffff8127db7d>] kobject_release+0xd/0x10
> [<ffffffff8127d8cc>] kobject_put+0x2c/0x60
> [<ffffffff8147e371>] netdev_run_todo+0x101/0x180
> [<ffffffff8148f5ae>] rtnl_unlock+0xe/0x10
> [<ffffffffa0366178>] ieee80211_unregister_hw+0x58/0x120 [mac80211]
> [<ffffffffa040912b>] iwlagn_mac_unregister+0x2b/0x40 [iwlwifi]
> [<ffffffffa03fdf59>] iwl_op_mode_dvm_stop+0x49/0xf0 [iwlwifi]
> [<ffffffffa041f730>] iwl_drv_stop+0x40/0x60 [iwlwifi]
> [<ffffffffa0430a39>] iwl_pci_remove+0x25/0x3c [iwlwifi]
> [<ffffffff812aafc2>] pci_device_remove+0x52/0x120
> [<ffffffff813741cc>] __device_release_driver+0x7c/0xe0
> [<ffffffff81374308>] driver_detach+0xd8/0xe0
> [<ffffffff81372f61>] bus_remove_driver+0x91/0x110
> [<ffffffff81374fd2>] driver_unregister+0x62/0xa0
> [<ffffffff812ab2b4>] pci_unregister_driver+0x44/0xa0
> [<ffffffffa041f3d5>] iwl_pci_unregister_driver+0x15/0x20 [iwlwifi]
> [<ffffffffa0430a01>] iwl_exit+0x9/0x1c [iwlwifi]
> [<ffffffff810c50f1>] sys_delete_module+0x1d1/0x2c0
> [<ffffffff81555855>] ? retint_swapgs+0x13/0x1b
> [<ffffffff810e169c>] ? __audit_syscall_entry+0xcc/0x210
> [<ffffffff812896ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff8155de69>] system_call_fastpath+0x16/0x1b
> ---[ end trace 8070f580fc119b8b ]---
> ------------[ cut here ]------------
> WARNING: at lib/debugobjects.c:261 debug_print_object+0x8e/0xb0()
> Hardware name: SandyBridge Platform
> ODEBUG: free active (active state 0) object type: timer_list hint:
> ieee80211_sta_bcn_mon_timer+0x0/0x40 [mac80211]
> Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq
> freq_table mperf ipv6 uinput arc4 sg iwlwifi(-) mac80211 cfg80211 rfkill
> coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr
> lpc_ich mfd_core i2c_i801 e1000e ext4 mbcache jbd2 sd_mod crc_t10dif
> sr_mod cdrom aesni_intel cryptd aes_x86_64 aes_generic ahci libahci i915
> drm_kms_helper drm i2c_algo_bit i2c_core video dm_mirror dm_region_hash
> dm_log dm_mod [last unloaded: scsi_wait_scan]
> Pid: 3064, comm: modprobe Tainted: G W 3.5.0 #1
> Call Trace:
> [<ffffffff810535af>] warn_slowpath_common+0x7f/0xc0
> [<ffffffff810536a6>] warn_slowpath_fmt+0x46/0x50
> [<ffffffff812901be>] debug_print_object+0x8e/0xb0
> [<ffffffffa03a09f0>] ? ieee80211_sta_conn_mon_timer+0x40/0x40
> [mac80211]
> [<ffffffff81290a0d>] __debug_check_no_obj_freed+0x10d/0x200
> [<ffffffff81290b1d>] debug_check_no_obj_freed+0x1d/0x30
> [<ffffffff8117a2b0>] kfree+0xc0/0x330
> [<ffffffff810b9083>] ? __lock_release+0x133/0x1a0
> [<ffffffff815555f0>] ? _raw_spin_unlock_irqrestore+0x40/0x80
> [<ffffffff814957c4>] netdev_release+0x44/0x60
> [<ffffffff813704b7>] device_release+0x27/0xa0
> [<ffffffff8127da42>] kobject_cleanup+0x82/0x1b0
> [<ffffffff8127db7d>] kobject_release+0xd/0x10
> [<ffffffff8127d8cc>] kobject_put+0x2c/0x60
> [<ffffffff8147e371>] netdev_run_todo+0x101/0x180
> [<ffffffff8148f5ae>] rtnl_unlock+0xe/0x10
> [<ffffffffa0366178>] ieee80211_unregister_hw+0x58/0x120 [mac80211]
> [<ffffffffa040912b>] iwlagn_mac_unregister+0x2b/0x40 [iwlwifi]
> [<ffffffffa03fdf59>] iwl_op_mode_dvm_stop+0x49/0xf0 [iwlwifi]
> [<ffffffffa041f730>] iwl_drv_stop+0x40/0x60 [iwlwifi]
> [<ffffffffa0430a39>] iwl_pci_remove+0x25/0x3c [iwlwifi]
> [<ffffffff812aafc2>] pci_device_remove+0x52/0x120
> [<ffffffff813741cc>] __device_release_driver+0x7c/0xe0
> [<ffffffff81374308>] driver_detach+0xd8/0xe0
> [<ffffffff81372f61>] bus_remove_driver+0x91/0x110
> [<ffffffff81374fd2>] driver_unregister+0x62/0xa0
> [<ffffffff812ab2b4>] pci_unregister_driver+0x44/0xa0
> [<ffffffffa041f3d5>] iwl_pci_unregister_driver+0x15/0x20 [iwlwifi]
> [<ffffffffa0430a01>] iwl_exit+0x9/0x1c [iwlwifi]
> [<ffffffff810c50f1>] sys_delete_module+0x1d1/0x2c0
> [<ffffffff81555855>] ? retint_swapgs+0x13/0x1b
> [<ffffffff810e169c>] ? __audit_syscall_entry+0xcc/0x210
> [<ffffffff812896ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff8155de69>] system_call_fastpath+0x16/0x1b
> ---[ end trace 8070f580fc119b8c ]---
> Bridge firewalling registered
>

Hi!
I was finally able to compile a relocatable kernel: here's what I got,
after a crash on iwlegacy module removal:
# crash vmcore /usr/lib/debug/lib/modules/`uname -r`/vmlinux

crash 6.0.8-1.fc18 [note, I'm on FC17 but had to install FC18's crash
to work around a log structure change in the 3.5 kernel]
(...)
This GDB was configured as "i686-pc-linux-gnu"...


============ CRASH 1 ============
KERNEL: /usr/lib/debug/lib/modules/3.5.4-2.pedro.fc17.i686.PAE/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 2
DATE: Tue Oct 9 09:17:35 2012
UPTIME: 00:12:47
LOAD AVERAGE: 0.50, 0.62, 0.60
TASKS: 322
NODENAME: s2
RELEASE: 3.5.4-2.pedro.fc17.i686.PAE
VERSION: #1 SMP Mon Oct 8 23:15:44 WEST 2012
MACHINE: i686 (1496 Mhz)
MEMORY: 2 GB
PANIC: "Oops: 0000 [#1] SMP " (check log for details)
PID: 0
COMMAND: "swapper/1"
TASK: f4104240 (1 of 2) [THREAD_INFO: f4146000]
CPU: 1
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 0 TASK: f4104240 CPU: 1 COMMAND: "swapper/1"
#0 [f4147db4] crash_kexec at c04a7d59
#1 [f4147e04] timerqueue_add at c0675503
#2 [f4147e14] ktime_get at c04921ee
#3 [f4147e30] bad_area_nosemaphore at c0958328
#4 [f4147e3c] do_page_fault at c0964c25
#5 [f4147eb8] error_code (via page_fault) at c0961eb1
EAX: 6b6b6b6b EBX: 00072420 ECX: 00000001 EDX: f4178930 EBP: f4147f30
DS: 007b ESI: 00000024 ES: 007b EDI: 00072420 GS: 00e0
CS: 0060 EIP: c0457993 ERR: ffffffff EFLAGS: 00010003
#6 [f4147eec] get_next_timer_interrupt at c0457993
#7 [f4147f34] tick_nohz_stop_sched_tick.isra.11 at c049a19a
#8 [f4147f78] tick_nohz_idle_enter at c049a661
#9 [f4147f80] cpu_idle at c04195da


============ CRASH 2 ============

KERNEL: /usr/lib/debug/lib/modules/3.5.4-2.pedro.fc17.i686.PAE/vmlinux
DUMPFILE: vmcore
CPUS: 2
DATE: Tue Oct 9 09:29:35 2012
UPTIME: 00:10:22
LOAD AVERAGE: 0.30, 0.78, 0.67
TASKS: 323
NODENAME: s2
RELEASE: 3.5.4-2.pedro.fc17.i686.PAE
VERSION: #1 SMP Mon Oct 8 23:15:44 WEST 2012
MACHINE: i686 (1496 Mhz)
MEMORY: 2 GB
PANIC: "kernel BUG at kernel/timer.c:1091!"
PID: 6563
COMMAND: "rpm" <-- ?
TASK: eaa5dcc0 [THREAD_INFO: f414a000]
CPU: 0
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 6563 TASK: eaa5dcc0 CPU: 0 COMMAND: "rpm"
bt: cannot resolve stack trace:
#0 [f414bd58] __schedule at c095fba6
#1 [f414bdd4] sched_clock_local at c047a56d
bt: text symbols on stack:
[f414bd5c] kmap_atomic_prot at c0441244
[f414bd70] __kunmap_atomic at c04410dd
[f414bd80] get_page_from_freelist at c0504e40
[f414bdcc] sched_clock at c0417a28
[f414bdd4] sched_clock_local at c047a572
[f414be28] update_curr at c047cdb2
[f414be60] clear_nohz_tick_stopped.part.37 at c0958f63
[f414be6c] trigger_load_balance at c047ff73
[f414be88] scheduler_tick at c04774e5
[f414beac] timerqueue_add at c0675508
[f414bec0] ktime_get at c04921f0
[f414bed4] lapic_next_event at c042f75b
[f414bedc] clockevents_program_event at c049877d
[f414bef4] tick_program_event at c0499a79
[f414bf04] hrtimer_interrupt at c046bcc8
[f414bf54] irq_exit at c045001d
[f414bf5c] smp_apic_timer_interrupt at c042fdbe
[f414bf74] apic_timer_interrupt at c0961c85
[f414bfa8] sysenter_past_esp at c0968322
bt: possible exception frame:
USER-MODE EXCEPTION FRAME AT f414bfb4:
EAX: 0000002d EBX: 09fba000 ECX: 45a4bff4 EDX: 09fba000
DS: 007b ESI: 09f99000 ES: 007b EDI: 09fba000
SS: 007b ESP: bff95804 EBP: bff95804 GS: 0033
CS: 0073 EIP: b7720424 ERR: 0000002d EFLAGS: 00000202



So, I'm guessing this means it is related to what you found on iwlwifi
(even if I'm on iwlegacy)?
The crash kernel crashed again but I can try to add a script to try to
recover dmesg -- I believe slub_debug caught something as well...
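
One detail in the first backtrace that may be worth noting: EAX holds
6b6b6b6b at the faulting instruction in get_next_timer_interrupt. With
slub_debug poisoning enabled, freed slab objects are filled with the byte
0x6b (POISON_FREE in include/linux/poison.h), so this value is consistent
with the timer code following a pointer read out of already-freed memory.
That is an interpretation, not something the dump proves. A minimal sketch
of the check, where the helper name looks_like_freed_slab is made up for
illustration and is not a kernel or crash-utility function:

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte written over freed slab objects when slub_debug poisoning is on
 * (POISON_FREE in include/linux/poison.h). */
#define POISON_FREE_BYTE 0x6bu

/* Hypothetical helper: returns true if every byte of a 32-bit register
 * value equals the free-poison byte, i.e. the value was most likely
 * loaded from a freed, poisoned object. */
static bool looks_like_freed_slab(uint32_t value)
{
    for (int i = 0; i < 4; i++) {
        if (((value >> (8 * i)) & 0xffu) != POISON_FREE_BYTE)
            return false;
    }
    return true;
}
```

Applied to the registers above, only EAX (6b6b6b6b) matches; EBX (00072420)
and EDX (f4178930) do not.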

--
Pedro

2012-10-15 11:02:49

by Johannes Berg

[permalink] [raw]
Subject: Re: unloading WiFi modules is usually triggering kernel crash

On Fri, 2012-10-12 at 14:13 +0200, Stanislaw Gruszka wrote:
> On Tue, Oct 09, 2012 at 10:14:40AM +0100, Pedro Francisco wrote:
> > So, I'm guessing this means it is related to what you found on iwlwifi
> > (even if I'm on iwlegacy)?
>
> Yes, this seems to be a cfg80211 problem. I think the crash happens because
> cfg80211 is in the disassociated state (i.e. has wdev->current_bss NULL) while
> mac80211 erroneously stays in the associated state. So while we unload the
> module, in cfg80211_mlme_down() we do not call ieee80211_deauth().
>
> I think this state mishmash happens because of wrong behaviour in
> __cfg80211_mlme_deauth(). The patch below tries to correct that.
> Can you check if it prevents the crash? In my environment I cannot
> reproduce this problem reliably.

Ugh, yeah, what was I thinking with the code below ... ??


> diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
> index ab78b53..9b99b60 100644
> --- a/include/net/cfg80211.h
> +++ b/include/net/cfg80211.h
> @@ -1218,6 +1218,7 @@ struct cfg80211_deauth_request {
> const u8 *ie;
> size_t ie_len;
> u16 reason_code;
> + bool local_state_change;
> };
>
> /**
> diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
> index e714ed8..e510a33 100644
> --- a/net/mac80211/mlme.c
> +++ b/net/mac80211/mlme.c
> @@ -3549,6 +3549,7 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
> {
> struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
> u8 frame_buf[IEEE80211_DEAUTH_FRAME_LEN];
> + bool tx = !req->local_state_change;
>
> mutex_lock(&ifmgd->mtx);
>
> @@ -3565,12 +3566,12 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
> if (ifmgd->associated &&
> ether_addr_equal(ifmgd->associated->bssid, req->bssid)) {
> ieee80211_set_disassoc(sdata, IEEE80211_STYPE_DEAUTH,
> - req->reason_code, true, frame_buf);
> + req->reason_code, tx, frame_buf);
> } else {
> drv_mgd_prepare_tx(sdata->local, sdata);
> ieee80211_send_deauth_disassoc(sdata, req->bssid,
> IEEE80211_STYPE_DEAUTH,
> - req->reason_code, true,
> + req->reason_code, tx,
> frame_buf);
> }
>
> diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c
> index 3df195a..4954010 100644
> --- a/net/wireless/mlme.c
> +++ b/net/wireless/mlme.c
> @@ -457,21 +457,11 @@ int __cfg80211_mlme_deauth(struct cfg80211_registered_device *rdev,
> .reason_code = reason,
> .ie = ie,
> .ie_len = ie_len,
> + .local_state_change = local_state_change,
> };
>
> ASSERT_WDEV_LOCK(wdev);
>
> - if (local_state_change) {
> - if (wdev->current_bss &&
> - ether_addr_equal(wdev->current_bss->pub.bssid, bssid)) {
> - cfg80211_unhold_bss(wdev->current_bss);
> - cfg80211_put_bss(&wdev->current_bss->pub);
> - wdev->current_bss = NULL;
> - }
> -
> - return 0;
> - }
> -


This looks fine to me. Probably needs Cc: stable?

Then again, maybe if the deauth request is for a BSS that *isn't* the
current BSS we should "swallow" it in cfg80211? IOW, something like

	if (local_state_change && (!wdev->current_bss ||
	    !ether_addr_equal(...)))
		return 0;

since neither mac80211 nor cfg80211 track authentication... Doesn't
matter much though.
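
To make the suggested condition concrete, here is a small userspace model
of it. This is only a sketch under assumptions: struct wdev_model and
swallow_local_deauth are hypothetical names invented for this example,
a NULL current_bssid stands in for wdev->current_bss being NULL, and
memcmp over ETH_ALEN bytes stands in for the kernel's ether_addr_equal().
It is not the code that would go into net/wireless/mlme.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for the relevant wireless device state.
 * current_bssid == NULL models wdev->current_bss == NULL. */
struct wdev_model {
    const unsigned char *current_bssid;
};

/* Returns true when a deauth request could be dropped in cfg80211
 * without calling into the driver: it is a local state change only and
 * it does not target the BSS we are currently connected to. */
static bool swallow_local_deauth(const struct wdev_model *wdev,
                                 const unsigned char *bssid,
                                 bool local_state_change)
{
    if (!local_state_change)
        return false;               /* a frame has to go out: pass down */
    if (!wdev->current_bssid)
        return true;                /* no current BSS at all */
    /* Swallow only if the request is for some other BSS. */
    return memcmp(wdev->current_bssid, bssid, ETH_ALEN) != 0;
}
```

With this condition, a local-only deauth for the current BSS is still
passed down, so the lower layer gets the chance to tear down its own
association state, which is the case the patch above is about.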

johannes