2009-08-19 20:47:44

by Fabio Comolli

Subject: WARNING: at net/mac80211/mlme.c:2292

Hi all.
I see the following warning on an eeePC 900 (AR5001) running
2.6.31-rc6 after a suspend/resume cycle:

[ 292.377941] ------------[ cut here ]------------
[ 292.377976] WARNING: at net/mac80211/mlme.c:2292 ieee80211_sta_work+0x89/0xc39 [mac80211]()
[ 292.377981] Hardware name: 900
[ 292.377984] Modules linked in: arc4 ecb snd_hda_codec_realtek ath5k snd_hda_intel snd_hda_codec mac80211 snd_pcm_oss snd_mixer_oss usb_storage ath snd_hwdep battery snd_pcm snd_timer ac cfg80211 snd soundcore snd_page_alloc thermal button processor uhci_hcd ehci_hcd
[ 292.378327] Pid: 866, comm: phy0 Tainted: G M 2.6.31-rc6-eee #8
[ 292.378331] Call Trace:
[ 292.378345] [<c101d13a>] ? warn_slowpath_common+0x5d/0x70
[ 292.378354] [<c101d158>] ? warn_slowpath_null+0xb/0xd
[ 292.378379] [<f87e8ab0>] ? ieee80211_sta_work+0x89/0xc39 [mac80211]
[ 292.378398] [<f8872784>] ? ath5k_hw_reset+0xfdd/0xff1 [ath5k]
[ 292.378417] [<f886b94c>] ? ath5k_hw_set_imr+0x14d/0x156 [ath5k]
[ 292.378433] [<f8873c52>] ? ath5k_beacon_config+0x16c/0x173 [ath5k]
[ 292.378449] [<f8873f85>] ? ath5k_txq_cleanup+0x19b/0x1b5 [ath5k]
[ 292.378457] [<c10199da>] ? __wake_up+0x1d/0x3d
[ 292.378466] [<c102a529>] ? insert_work+0x8f/0x96
[ 292.378473] [<c102a777>] ? queue_work_on+0x24/0x2b
[ 292.378480] [<c102a7a5>] ? queue_work+0x1a/0x39
[ 292.378506] [<f87e64dc>] ? ieee80211_mlme_notify_scan_completed+0x40/0x66 [mac80211]
[ 292.378532] [<f87e3caa>] ? ieee80211_scan_completed+0x2ef/0x2fc [mac80211]
[ 292.378539] [<c102aaad>] ? worker_thread+0x15c/0x1fd
[ 292.378564] [<f87e8a27>] ? ieee80211_sta_work+0x0/0xc39 [mac80211]
[ 292.378572] [<c102d8e0>] ? autoremove_wake_function+0x0/0x29
[ 292.378579] [<c102a951>] ? worker_thread+0x0/0x1fd
[ 292.378586] [<c102d67f>] ? kthread+0x68/0x6d
[ 292.378592] [<c102d617>] ? kthread+0x0/0x6d
[ 292.378600] [<c1002fb3>] ? kernel_thread_helper+0x7/0x10
[ 292.378605] ---[ end trace 7349ad9bfff515b3 ]---

A very similar warning is also seen with compat-wireless pulled 9 days ago:

Aug 10 20:08:35 archeee kernel: [ 1440.840885] ------------[ cut here ]------------
Aug 10 20:08:35 archeee kernel: [ 1440.840932] WARNING: at /home/fcomolli/SRC/src/compat-wireless-2.6.31-rc4/net/mac80211/mlme.c:2292 ieee80211_sta_work+0x97/0xd30 [mac80211]()
Aug 10 20:08:35 archeee kernel: [ 1440.840939] Hardware name: 900
Aug 10 20:08:35 archeee kernel: [ 1440.840942] Modules linked in: uvcvideo videodev v4l1_compat arc4 ecb ath5k mac80211 ath cfg80211
Aug 10 20:08:35 archeee kernel: [ 1440.840961] Pid: 3218, comm: phy3 Tainted: G M W 2.6.31-rc5 #1
Aug 10 20:08:35 archeee kernel: [ 1440.840966] Call Trace:
Aug 10 20:08:35 archeee kernel: [ 1440.840978] [<c1024359>] warn_slowpath_common+0x65/0x7c
Aug 10 20:08:35 archeee kernel: [ 1440.841015] [<f848b2af>] ? ieee80211_sta_work+0x97/0xd30 [mac80211]
Aug 10 20:08:35 archeee kernel: [ 1440.841023] [<c102437d>] warn_slowpath_null+0xd/0x10
Aug 10 20:08:35 archeee kernel: [ 1440.841058] [<f848b2af>] ieee80211_sta_work+0x97/0xd30 [mac80211]
Aug 10 20:08:35 archeee kernel: [ 1440.841069] [<c101f49b>] ? __wake_up+0x2f/0x56
Aug 10 20:08:35 archeee kernel: [ 1440.841076] [<c101f4a9>] ? __wake_up+0x3d/0x56
Aug 10 20:08:35 archeee kernel: [ 1440.841086] [<c1032410>] ? insert_work+0x96/0x9f
Aug 10 20:08:35 archeee kernel: [ 1440.841095] [<c103266f>] ? __queue_work+0x32/0x49
Aug 10 20:08:35 archeee kernel: [ 1440.841103] [<c10326ad>] ? queue_work_on+0x27/0x2f
Aug 10 20:08:35 archeee kernel: [ 1440.841109] [<c1032e3b>] ? queue_work+0x2d/0x45
Aug 10 20:08:35 archeee kernel: [ 1440.841150] [<f849a9ac>] ? ieee80211_mesh_notify_scan_completed+0x4d/0x64 [mac80211]
Aug 10 20:08:35 archeee kernel: [ 1440.841185] [<f84860ce>] ? ieee80211_scan_completed+0x327/0x32f [mac80211]
Aug 10 20:08:35 archeee kernel: [ 1440.841220] [<f84861d5>] ? ieee80211_scan_work+0xb5/0x171 [mac80211]
Aug 10 20:08:35 archeee kernel: [ 1440.841229] [<c1032981>] worker_thread+0x15d/0x204
Aug 10 20:08:35 archeee kernel: [ 1440.841265] [<f848b218>] ? ieee80211_sta_work+0x0/0xd30 [mac80211]
Aug 10 20:08:35 archeee kernel: [ 1440.841274] [<c1035b23>] ? autoremove_wake_function+0x0/0x2f
Aug 10 20:08:35 archeee kernel: [ 1440.841282] [<c1032824>] ? worker_thread+0x0/0x204
Aug 10 20:08:35 archeee kernel: [ 1440.841289] [<c10358b7>] kthread+0x63/0x68
Aug 10 20:08:35 archeee kernel: [ 1440.841296] [<c1035854>] ? kthread+0x0/0x68
Aug 10 20:08:35 archeee kernel: [ 1440.841305] [<c10030b3>] kernel_thread_helper+0x7/0x10
Aug 10 20:08:35 archeee kernel: [ 1440.841310] ---[ end trace bc8f70f38f66567f ]---


The effect is that after the warning the interface is unusable until
disabled and reenabled with the rfkill switch.
The warning is new in the 31-rc series but the effect was present in
previous versions (at least it probably happened with the 29.x series
- I skipped the 30.x kernels).

The bug is quite easy to reproduce.

Regards,
Fabio


2009-08-22 19:29:39

by Fabio Comolli

Subject: Re: WARNING: at net/mac80211/mlme.c:2292

Hi Bob.
Unfortunately the patch doesn't apply at all with compat-wireless,
there's no "flush_workqueue" before "local->suspended" there....

Please advise.
Regards,
Fabio




On Sat, Aug 22, 2009 at 3:40 PM, Bob Copeland<[email protected]> wrote:
> On Sat, Aug 22, 2009 at 03:02:26PM +0200, Fabio Comolli wrote:
>> Hi Bob.
>>
>> I'm not a git user. How can I use wireless-testing without git? Is
>> there a patch or something?
>
> No, I'm unaware of a way to get it without git.  Well, okay, try
> compat-wireless, but be sure to use the 'bleeding edge' compat-wireless,
> and post the contents of its 'git-describe' file with the warnings so
> I can be sure we're on the same page.
>
> http://wireless.kernel.org/en/users/Download
>
> --
> Bob Copeland %% http://www.bobcopeland.com
>
>

2009-08-22 13:02:27

by Fabio Comolli

Subject: Re: WARNING: at net/mac80211/mlme.c:2292

Hi Bob.

On Sat, Aug 22, 2009 at 2:47 PM, Bob Copeland<[email protected]> wrote:
> On Fri, Aug 21, 2009 at 10:19:33AM -0400, Bob Copeland wrote:
>> Okay, I think I see what is going on here.
>
> Well I can't quite convince myself of what exactly is requeuing
> sta work; we do cancel everything already as far as I can tell, but
> something is rearming, I just can't tell which.
>
> Fabio, can you please apply this patch against wireless-testing (not
> compat-wireless) and report all the warnings produced when doing a
> suspend/resume?  This should tell us the code paths that are queuing
> work too late.

I'm not a git user. How can I use wireless-testing without git? Is
there a patch or something?


>
> From 6eb7d5c3ae8f2f42b164491acd02631858515876 Mon Sep 17 00:00:00 2001
> From: Bob Copeland <[email protected]>
> Date: Sat, 22 Aug 2009 08:40:53 -0400
> Subject: [PATCH] mac80211: set suspended before final flush_workqueue
>
> Just a temporary debugging aid.
> ---
>  net/mac80211/pm.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/mac80211/pm.c b/net/mac80211/pm.c
> index a5d2f1f..231c8be 100644
> --- a/net/mac80211/pm.c
> +++ b/net/mac80211/pm.c
> @@ -117,13 +117,13 @@ int __ieee80211_suspend(struct ieee80211_hw *hw)
>         * shouldn't be doing (or cancel everything in the
>         * stop callback) that but better safe than sorry.
>         */
> -       flush_workqueue(local->workqueue);
> -
>        local->suspended = true;
>        /* need suspended to be visible before quiescing is false */
>        barrier();
>        local->quiescing = false;
>
> +       flush_workqueue(local->workqueue);
> +
>        return 0;
>  }
>
> --
> 1.6.2.5
>
>
> --
> Bob Copeland %% http://www.bobcopeland.com
>
>

Regards,
Fabio

2009-08-23 14:40:14

by Fabio Comolli

Subject: Re: WARNING: at net/mac80211/mlme.c:2292

Hi.

On Sun, Aug 23, 2009 at 2:12 AM, Bob Copeland<[email protected]> wrote:
> On Sat, Aug 22, 2009 at 09:29:39PM +0200, Fabio Comolli wrote:
>> Hi Bob.
>> Unfortunately the patch doesn't apply at all with compat-wireless,
>> there's no "flush_workqueue" before "local->suspended" there....
>
> Ah yes, it got moved into ieee80211_stop_device().  Can you put
> local->suspended and the barrier() ahead of that?
>

Well, this crashed my system. Backtrace copied by hand:

warning at net/wireless/core.c wdev_cleanup_work [cfg80211]

warn_slowpath_common
warn_slowpath_null
wdev_cleanup_work [cfg80211]
worker_thread
wdev_cleanup_work [cfg80211]
autoremove_wake_function
worker_thread
kthread
kthread
kernel_thread_helper


> Thanks!

Regards,
Fabio


>
> --
> Bob Copeland %% http://www.bobcopeland.com
>
>

2009-08-22 12:47:28

by Bob Copeland

Subject: Re: WARNING: at net/mac80211/mlme.c:2292

On Fri, Aug 21, 2009 at 10:19:33AM -0400, Bob Copeland wrote:
> Okay, I think I see what is going on here.

Well I can't quite convince myself of what exactly is requeuing
sta work; we do cancel everything already as far as I can tell, but
something is rearming, I just can't tell which.

Fabio, can you please apply this patch against wireless-testing (not
compat-wireless) and report all the warnings produced when doing a
suspend/resume? This should tell us the code paths that are queuing
work too late.

From 6eb7d5c3ae8f2f42b164491acd02631858515876 Mon Sep 17 00:00:00 2001
From: Bob Copeland <[email protected]>
Date: Sat, 22 Aug 2009 08:40:53 -0400
Subject: [PATCH] mac80211: set suspended before final flush_workqueue

Just a temporary debugging aid.
---
 net/mac80211/pm.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/pm.c b/net/mac80211/pm.c
index a5d2f1f..231c8be 100644
--- a/net/mac80211/pm.c
+++ b/net/mac80211/pm.c
@@ -117,13 +117,13 @@ int __ieee80211_suspend(struct ieee80211_hw *hw)
 	 * shouldn't be doing (or cancel everything in the
 	 * stop callback) that but better safe than sorry.
 	 */
-	flush_workqueue(local->workqueue);
-
 	local->suspended = true;
 	/* need suspended to be visible before quiescing is false */
 	barrier();
 	local->quiescing = false;

+	flush_workqueue(local->workqueue);
+
 	return 0;
 }

--
1.6.2.5


--
Bob Copeland %% http://www.bobcopeland.com


2009-08-19 21:42:28

by Bob Copeland

Subject: Re: WARNING: at net/mac80211/mlme.c:2292

On Wed, Aug 19, 2009 at 4:47 PM, Fabio Comolli<[email protected]> wrote:
> Hi all.
> I see the following warning on an eeePC 900 (AR5001) running
> 2.6.31-rc6 after a suspend/resume cycle:
>
> [ 292.377941] ------------[ cut here ]------------
> [ 292.377976] WARNING: at net/mac80211/mlme.c:2292 ieee80211_sta_work+0x89/0xc39 [mac80211]()

if (WARN_ON(local->suspended)) ...

> [ 292.378457] [<c10199da>] ? __wake_up+0x1d/0x3d
> [ 292.378466] [<c102a529>] ? insert_work+0x8f/0x96
> [ 292.378473] [<c102a777>] ? queue_work_on+0x24/0x2b
> [ 292.378480] [<c102a7a5>] ? queue_work+0x1a/0x39
> [ 292.378506] [<f87e64dc>] ? ieee80211_mlme_notify_scan_completed+0x40/0x66 [mac80211]
> [ 292.378532] [<f87e3caa>] ? ieee80211_scan_completed+0x2ef/0x2fc [mac80211]

Looks like ieee80211_scan_completed() racing with suspend()?

> ieee80211_mesh_notify_scan_completed+0x4d/0x64 [mac80211]
> Aug 10 20:08:35 archeee kernel: [ 1440.841185] [<f84860ce>] ? ieee80211_scan_completed+0x327/0x32f [mac80211]

Same here.

> The effect is that after the warning the interface is unusable until
> disabled and reenabled with the rfkill switch.

Interesting, I tried and didn't reproduce it, but I wasn't trying
while scanning before.

--
Bob Copeland %% http://www.bobcopeland.com

2009-08-21 14:19:32

by Bob Copeland

Subject: Re: WARNING: at net/mac80211/mlme.c:2292

On Wed, Aug 19, 2009 at 5:42 PM, Bob Copeland<[email protected]> wrote:
> On Wed, Aug 19, 2009 at 4:47 PM, Fabio Comolli<[email protected]> wrote:
>> Hi all.
>> I see the following warning on an eeePC 900 (AR5001) running
>> 2.6.31-rc6 after a suspend/resume cycle:
>>
>> [ 292.377941] ------------[ cut here ]------------
>> [ 292.377976] WARNING: at net/mac80211/mlme.c:2292 ieee80211_sta_work+0x89/0xc39 [mac80211]()
>
>    if (WARN_ON(local->suspended)) ...

Okay, I think I see what is going on here.

suspend
  ieee80211_scan_cancel
    ieee80211_notify_scan_completed
      ieee80211_mlme_scan_completed
      ieee80211_ibss_scan_completed
      ieee80211_mesh_scan_completed

All of these completed() notifications queue work. That's not a problem
because we flush the workqueue shortly thereafter. However, flushing the
workqueue runs the scheduled work, which can queue _more_ work (sometimes
on a timer with queue_delayed_work), and none of the work functions in
2.6.31 currently check for local->quiesced.

Repeat this process once more and we eventually hit the warning.

Can we just replace the last flush_workqueue() with cancel_delayed_work_sync()
or will that break something?
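
[Editorial illustration, not part of the original message.] The requeue
problem described above can be reproduced in a small userspace model. This
is only a sketch under stated assumptions: every name in it (sim_local,
sim_queue_work, sim_flush, sim_sta_work, sim_suspend,
sim_late_runs_after_suspend) is hypothetical and none of it is mac80211
code. It also simplifies a flush to "run only the items that were pending
when the flush started", so anything queued by those items is left behind,
much like delayed work armed on a timer would be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT mac80211 or workqueue APIs. */
#define QUEUE_CAP 16

typedef struct sim_local sim_local;
typedef void (*work_fn)(sim_local *);

struct sim_local {
    bool quiescing;        /* suspend is in progress */
    bool suspended;        /* suspend has finished */
    bool check_quiescing;  /* do work functions refuse to rearm? */
    work_fn pending[QUEUE_CAP];
    size_t head, tail;     /* simple FIFO of pending work items */
    int late_runs;         /* work that ran after suspended was set */
};

static void sim_queue_work(sim_local *l, work_fn fn)
{
    assert(l->tail < QUEUE_CAP);
    l->pending[l->tail++] = fn;
}

/* Model of a flush: run only what was pending when the flush began. */
static void sim_flush(sim_local *l)
{
    size_t end = l->tail;

    while (l->head < end) {
        work_fn fn = l->pending[l->head++];
        fn(l);
    }
}

/* A work function that rearms itself unless it notices the suspend. */
static void sim_sta_work(sim_local *l)
{
    if (l->suspended) {
        l->late_runs++;    /* stands in for WARN_ON(local->suspended) */
        return;
    }
    if (l->check_quiescing && l->quiescing)
        return;            /* guard: do not queue more work */
    sim_queue_work(l, sim_sta_work);
}

static void sim_suspend(sim_local *l)
{
    l->quiescing = true;
    sim_queue_work(l, sim_sta_work); /* a scan-completed notification */
    sim_flush(l);                    /* runs it, and it may rearm */
    l->suspended = true;
    l->quiescing = false;
}

/* Suspend, then let any leftover work run; count the late executions. */
int sim_late_runs_after_suspend(bool check_quiescing)
{
    sim_local l = {0};

    l.check_quiescing = check_quiescing;
    sim_suspend(&l);
    sim_flush(&l);
    return l.late_runs;
}
```

Calling sim_late_runs_after_suspend(false) models work functions that
never look at the quiescing state: the flushed item rearms itself and the
leftover runs after suspended is set. Calling it with true models a guard
in each work function, in which case nothing is left to run late.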

--
Bob Copeland %% http://www.bobcopeland.com

2009-08-23 15:04:00

by Fabio Comolli

Subject: Re: WARNING: at net/mac80211/mlme.c:2292

OK, some more info.
After the warning the system is usable but wireless doesn't work. The
hang happens when I activate the rfkill switch.

This happens even after a fresh boot. After activating the rfkill
switch the console fills with:

unregister_netdevice: waiting for device wlan0 to become free

with a line every 5 seconds or so. At this time the system is
completely unresponsive.

Regards,
Fabio





On Sun, Aug 23, 2009 at 4:40 PM, Fabio Comolli<[email protected]> wrote:
> Hi.
>
> On Sun, Aug 23, 2009 at 2:12 AM, Bob Copeland<[email protected]> wrote:
>> On Sat, Aug 22, 2009 at 09:29:39PM +0200, Fabio Comolli wrote:
>>> Hi Bob.
>>> Unfortunately the patch doesn't apply at all with compat-wireless,
>>> there's no "flush_workqueue" before "local->suspended" there....
>>
>> Ah yes, it got moved into ieee80211_stop_device().  Can you put
>> local->suspended and the barrier() ahead of that?
>>
>
> Well, this crashed my system. Backtrace copied by hand:
>
> warning at net/wireless/core.c    wdev_cleanup_work [cfg80211]
>
> warn_slowpath_common
> warn_slowpath_null
> wdev_cleanup_work [cfg80211]
> worker_thread
> wdev_cleanup_work [cfg80211]
> autoremove_wake_function
> worker_thread
> kthread
> kthread
> kernel_thread_helper
>
>
>> Thanks!
>
> Regards,
> Fabio
>
>
>>
>> --
>> Bob Copeland %% http://www.bobcopeland.com
>>
>>
>

2009-08-22 13:41:02

by Bob Copeland

Subject: Re: WARNING: at net/mac80211/mlme.c:2292

On Sat, Aug 22, 2009 at 03:02:26PM +0200, Fabio Comolli wrote:
> Hi Bob.
>
> I'm not a git user. How can I use wireless-testing without git? Is
> there a patch or something?

No, I'm unaware of a way to get it without git. Well, okay, try
compat-wireless, but be sure to use the 'bleeding edge' compat-wireless,
and post the contents of its 'git-describe' file with the warnings so
I can be sure we're on the same page.

http://wireless.kernel.org/en/users/Download

--
Bob Copeland %% http://www.bobcopeland.com


2009-08-23 00:12:35

by Bob Copeland

Subject: Re: WARNING: at net/mac80211/mlme.c:2292

On Sat, Aug 22, 2009 at 09:29:39PM +0200, Fabio Comolli wrote:
> Hi Bob.
> Unfortunately the patch doesn't apply at all with compat-wireless,
> there's no "flush_workqueue" before "local->suspended" there....

Ah yes, it got moved into ieee80211_stop_device(). Can you put
local->suspended and the barrier() ahead of that?

Thanks!

--
Bob Copeland %% http://www.bobcopeland.com