2008-07-02 20:57:23

by Giacomo Mulas

Subject: b43 locks the machine when resuming after suspend to disk

I tried searching the list for this before posting, but searching the
mailing list archives with keywords such as b43, suspend, resume... brings
up such a ludicrous number of threads that it's not realistic to check them
all, so just tell me what to look for if it's been asked already.

Whenever I suspend to disk after using b43, the computer freezes hard
as soon as it tries to access b43 again after resume.

Minimal steps to reproduce the freeze:
1) modprobe b43
2) hibernate (using any suspend to disk, which one is irrelevant)
3) resume
4) ifconfig wlan0 up

This has been happening (at least) since b43 was included in the mainline
kernel, on my Asus A6K laptop running an x86_64 kernel (currently the
latest 2.6.25 stable release, or one compiled from the latest released
Debian sid sources). The nvidia module is not responsible: I explicitly
booted my laptop in single-user mode without any unnecessary modules, with
the same result. It does not happen with the Windows driver under
ndiswrapper (which I would prefer to avoid for other reasons), so it
definitely depends on b43 or something it depends on. Unloading and
reloading the b43 module and all the other modules it depends on does not
change anything. Loading the module just once, then hibernating and
resuming, means a freeze-up as soon as the module is actually initialised
the next time, regardless of whether it has been unloaded and reloaded any
number of times before or after the suspend/resume cycle. No oopses,
nothing in the system logs, just an instant freeze-up. Is there any testing
a user can do to help nail this down? I am not a kernel developer, although
I am a decent C programmer.

Please CC me on replies, I am not on the list.

Thanks in advance,
Giacomo Mulas

--
_________________________________________________________________

Giacomo Mulas <[email protected]>
_________________________________________________________________

OSSERVATORIO ASTRONOMICO DI CAGLIARI
Str. 54, Loc. Poggio dei Pini * 09012 Capoterra (CA)

Tel. (OAC): +39 070 71180 248 Fax : +39 070 71180 222
Tel. (UNICA): +39 070 675 4916
_________________________________________________________________

"When the storms are raging around you, stay right where you are"
(Freddie Mercury)
_________________________________________________________________


2008-07-02 21:39:18

by Rafael J. Wysocki

Subject: Re: b43 locks the machine when resuming after suspend to disk

On Wednesday, 2 of July 2008, Giacomo Mulas wrote:
> Whenever I suspend to disk after using b43, the computer freezes hard
> as soon as it tries to access b43 again after resume.

I think you need the appended patch, but it only applies to linux-next.

Thanks,
Rafael

---
When a driver rejects a frame in its ->tx() callback, it must also
stop queues, otherwise mac80211 can go into a loop here. Detect this
situation and abort the loop after five retries, warning about the
driver bug.

Signed-off-by: Johannes Berg <[email protected]>
---
net/mac80211/tx.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

Index: linux-next/net/mac80211/tx.c
===================================================================
--- linux-next.orig/net/mac80211/tx.c
+++ linux-next/net/mac80211/tx.c
@@ -1144,7 +1144,7 @@ static int ieee80211_tx(struct net_devic
struct ieee80211_tx_data tx;
ieee80211_tx_result res_prepare;
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
- int ret, i;
+ int ret, i, retries = 0;
u16 queue;

queue = skb_get_queue_mapping(skb);
@@ -1206,6 +1206,13 @@ retry:
*/
if (!__netif_subqueue_stopped(local->mdev, queue)) {
clear_bit(queue, local->queues_pending);
+ retries++;
+ /*
+ * Driver bug, it's rejecting packets but
+ * not stopping queues.
+ */
+ if (WARN_ON_ONCE(retries > 5))
+ goto drop;
goto retry;
}
store->skb = skb;

2008-07-02 21:47:10

by Johannes Berg

Subject: Re: b43 locks the machine when resuming after suspend to disk

On Wed, 2008-07-02 at 23:40 +0200, Rafael J. Wysocki wrote:
> On Wednesday, 2 of July 2008, Giacomo Mulas wrote:
> > Whenever I suspend to disk after using b43, the computer freezes hard
> > as soon as it tries to access b43 again after resume.
>
> I think you need the appended patch, but it only applies to linux-next.

A different version has been merged into what will become 2.6.26. I'll
see what we can do about stable.

johannes


2008-07-02 21:56:56

by Johannes Berg

Subject: Re: b43 locks the machine when resuming after suspend to disk


> > I think you need the appended patch, but it only applies to linux-next.
>
> A different version has been merged into what will become 2.6.26. I'll
> see what we can do about stable.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ef3a62d272f033989e83eb1f26505f93f93e3e69;hp=6d1a3fb567a728d31474636e167c324702a0c38b

Anybody have a stable tree around to see if that applies? I think it
should.

johannes


2008-07-02 22:31:29

by Larry Finger

Subject: Re: b43 locks the machine when resuming after suspend to disk

Index: linux-2.6/net/mac80211/tx.c
===================================================================
--- linux-2.6.orig/net/mac80211/tx.c
+++ linux-2.6/net/mac80211/tx.c
@@ -1090,7 +1090,7 @@ static int ieee80211_tx(struct net_devic
ieee80211_tx_handler *handler;
struct ieee80211_txrx_data tx;
ieee80211_txrx_result res = TXRX_DROP, res_prepare;
- int ret, i;
+ int ret, i, retries = 0;

WARN_ON(__ieee80211_queue_pending(local, control->queue));

@@ -1181,6 +1181,13 @@ retry:
if (!__ieee80211_queue_stopped(local, control->queue)) {
clear_bit(IEEE80211_LINK_STATE_PENDING,
&local->state[control->queue]);
+ retries++;
+ /*
+ * Driver bug, it's rejecting packets but
+ * not stopping queues.
+ */
+ if (WARN_ON_ONCE(retries > 5))
+ goto drop;
goto retry;
}
memcpy(&store->control, control,


2008-07-02 22:38:38

by Johannes Berg

Subject: Re: b43 locks the machine when resuming after suspend to disk

On Wed, 2008-07-02 at 17:32 -0500, Larry Finger wrote:
> Johannes Berg wrote:
> >>> I think you need the appended patch, but it only applies to linux-next.
> >> A different version has been merged into what will become 2.6.26. I'll
> >> see what we can do about stable.
> >
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ef3a62d272f033989e83eb1f26505f93f93e3e69;hp=6d1a3fb567a728d31474636e167c324702a0c38b
> >
> > Anybody have a stable tree around to see if that applies? I think it
> > should.
>
> It didn't, but this version will. It has been compile tested only.

Ah, the TXRX result thing, thanks a bunch. Adding stable to CC, can you
pick this up?


Subject: mac80211: detect driver tx bugs

When a driver rejects a frame in its ->tx() callback, it must also
stop queues, otherwise mac80211 can go into a loop here. Detect this
situation and abort the loop after five retries, warning about the
driver bug.

Thanks to Larry Finger <[email protected]> for doing the -stable
port.


--- linux-2.6.orig/net/mac80211/tx.c
+++ linux-2.6/net/mac80211/tx.c
@@ -1090,7 +1090,7 @@ static int ieee80211_tx(struct net_devic
ieee80211_tx_handler *handler;
struct ieee80211_txrx_data tx;
ieee80211_txrx_result res = TXRX_DROP, res_prepare;
- int ret, i;
+ int ret, i, retries = 0;

WARN_ON(__ieee80211_queue_pending(local, control->queue));

@@ -1181,6 +1181,13 @@ retry:
if (!__ieee80211_queue_stopped(local, control->queue)) {
clear_bit(IEEE80211_LINK_STATE_PENDING,
&local->state[control->queue]);
+ retries++;
+ /*
+ * Driver bug, it's rejecting packets but
+ * not stopping queues.
+ */
+ if (WARN_ON_ONCE(retries > 5))
+ goto drop;
goto retry;
}
memcpy(&store->control, control,


2008-07-02 22:57:53

by Johannes Berg

Subject: Re: b43 locks the machine when resuming after suspend to disk

On Wed, 2008-07-02 at 23:40 +0200, Rafael J. Wysocki wrote:
> On Wednesday, 2 of July 2008, Giacomo Mulas wrote:
> > I tried searching on the list for this, before posting, but searching the
> > mailing list archives with keywords such as b43, suspend, resume... brings
> > up such a ludicrous amount of threads that it's not realistic to check them
> > all, so just tell me what to look for if it's been asked already.
> >
> > Whenever I do a suspend to disk after using b43, the computer freezes hard
> > as soon as it attempts again to access b43 after resume.
> >
> > Minimal steps to reproduce the freeze:
> > 1) modprobe b43
> > 2) hibernate (using any suspend to disk, which one is irrelevant)
> > 3) resume
> > 4) ifconfig wlan0 up

> I think you need the appended patch, but it only applies to linux-next.

Rafael, you misled me :) This is a completely different thing.

johannes


2008-07-02 23:06:48

by Rafael J. Wysocki

Subject: Re: b43 locks the machine when resuming after suspend to disk

On Thursday, 3 of July 2008, Johannes Berg wrote:
> On Wed, 2008-07-02 at 23:40 +0200, Rafael J. Wysocki wrote:
> > I think you need the appended patch, but it only applies to linux-next.
>
> Rafael, you misled me :) This is a completely different thing.

Ah, sorry then. I was too quick with my response.

Rafael

2008-07-02 23:08:37

by Johannes Berg

Subject: Re: b43 locks the machine when resuming after suspend to disk

On Thu, 2008-07-03 at 01:08 +0200, Rafael J. Wysocki wrote:
> On Thursday, 3 of July 2008, Johannes Berg wrote:
> > On Wed, 2008-07-02 at 23:40 +0200, Rafael J. Wysocki wrote:
> > > I think you need the appended patch, but it only applies to linux-next.
> >
> > Rafael, you misled me :) This is a completely different thing.
>
> Ah, sorry then. I was too quick with my response.

No trouble, it reminded me that I wanted to ask stable to pick up that
patch anyway although I don't think we ever ran into the issue there.

This seems very odd though, Giacomo, are you sure it also happens if you
unload the module?

johannes


2008-07-02 23:25:56

by Larry Finger

Subject: Re: b43 locks the machine when resuming after suspend to disk

Johannes Berg wrote:
> On Thu, 2008-07-03 at 01:08 +0200, Rafael J. Wysocki wrote:
>> On Thursday, 3 of July 2008, Johannes Berg wrote:
>>> Rafael, you misled me :) This is a completely different thing.
>> Ah, sorry then. I was too quick with my response.
>
> No trouble, it reminded me that I wanted to ask stable to pick up that
> patch anyway although I don't think we ever ran into the issue there.
>
> This seems very odd though, Giacomo, are you sure it also happens if you
> unload the module?

I'm confused. Should the "mac80211: detect driver tx bugs" patch be sent to stable?

Larry

2008-07-02 23:42:37

by Johannes Berg

Subject: Re: b43 locks the machine when resuming after suspend to disk

On Wed, 2008-07-02 at 18:27 -0500, Larry Finger wrote:
> Johannes Berg wrote:
> > On Thu, 2008-07-03 at 01:08 +0200, Rafael J. Wysocki wrote:
> >> On Thursday, 3 of July 2008, Johannes Berg wrote:
> >>> Rafael, you misled me :) This is a completely different thing.
> >> Ah, sorry then. I was too quick with my response.
> >
> > No trouble, it reminded me that I wanted to ask stable to pick up that
> > patch anyway although I don't think we ever ran into the issue there.
> >
> > This seems very odd though, Giacomo, are you sure it also happens if you
> > unload the module?
>
> I'm confused. Should the "mac80211: detect driver tx bugs" patch be sent to stable?

Yeah I think it still should even if that's not the bug here.

johannes


2008-07-03 12:30:45

by Giacomo Mulas

Subject: Re: b43 locks the machine when resuming after suspend to disk

On Thu, 3 Jul 2008, Johannes Berg wrote:

> This seems very odd though, Giacomo, are you sure it also happens if you
> unload the module?

Yes, absolutely (unfortunately). I can unload the module before suspending,
reload it after resuming, same result; I can actually do any number of
suspend/resume cycles and unload and reload the modules any number of
times, and everything still works until I try to ifconfig up the
interface; then it hangs solid. I also tried old-style suspend to disk,
suspend2 and user-space suspend... exactly the same.

I am now compiling a kernel with the patch you sent, to see whether this
improves things. I will let you know. By the way, is there a module
debugging option I could use to make the b43 and/or mac80211 modules emit
lots of printk's, so that I could at least give you a hint as to where the
code hangs?

Thanks, bye
Giacomo

P.S. please CC replies to me, I'm not on the list(s)

2008-07-10 18:11:03

by Pavel Machek

Subject: Re: b43 locks the machine when resuming after suspend to disk

Hi!

> > This seems very odd though, Giacomo, are you sure it also happens if you
> > unload the module?
>
> Yes, absolutely (unfortunately). I can unload the module before suspending,
> reload it after resuming, same result; I can actually do any number of
> suspend/resume cycles and unload and reload the modules any number of
> times, and everything still works until I try to ifconfig up the
> interface; then it hangs solid.

That starts to sound like some core problem -- a bug in b43 does not
explain the symptoms you see. Which other drivers does b43 share its
interrupt with? Do those work after resume? Maybe booting with irqpoll
helps?

> I am now compiling a kernel with the patch you sent, to see whether this
> improves things. I will let you know. By the way, is there a module
> debugging option I could use to make the b43 and/or mac80211 modules emit
> lots of printk's, so that I could at least give you a hint as to where the
> code hangs?

Adding printks to b43's resume should be easy...

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html