2007-05-01 23:28:41

by Mark Lord

Subject: [BUG] usb/core/hub.c loops forever on resume from ram due to bluetooth

I have just replaced my primary single-core notebook
with a nearly identical dual-core notebook,
and moved the usb-bluetooth peripheral from the old
machine to the new one.

On the single-core machine, suspend/resume (RAM) worked
fine even with the bluetooth module enabled.

On the new dual-core machine, resuming with bluetooth
enabled results in an infinite(?) lockup in an unbounded
loop in hub_tt_kevent(). With PM debug on, I see
tens of thousands of these messages scrolling on the console:

kernel: usb 5-1: clear tt 4 (9042) error -71
kernel: usb 5-1: clear tt 4 (9042) error -71
kernel: usb 5-1: clear tt 4 (9042) error -71
(over and over and ...)

By restricting iterations on the unbounded loop
the machine is able to resume again.

Greg / Marcel: any words of wisdom?

And we should probably put bounds permanently on that loop:

I devised/used this patch to accomplish it.
Now, I still get close to a thousand or so such
messages, in groups, showing up in syslog,
but at least the system can resume after suspend.

Signed-off-by: Mark Lord <[email protected]>

--- linux/drivers/usb/core/hub.c.orig 2007-04-26 12:02:47.000000000 -0400
+++ linux/drivers/usb/core/hub.c 2007-05-01 18:48:46.000000000 -0400
@@ -403,9 +403,10 @@
 	struct usb_hub *hub =
 		container_of(work, struct usb_hub, tt.kevent);
 	unsigned long flags;
+	int limit = 500;

 	spin_lock_irqsave (&hub->tt.lock, flags);
-	while (!list_empty (&hub->tt.clear_list)) {
+	while (--limit && !list_empty (&hub->tt.clear_list)) {
 		struct list_head *temp;
 		struct usb_tt_clear *clear;
 		struct usb_device *hdev = hub->hdev;
-----


2007-05-02 19:59:29

by Alan Stern

Subject: Re: [linux-usb-devel] [BUG] usb/core/hub.c loops forever on resume from ram due to bluetooth

On Tue, 1 May 2007, Mark Lord wrote:

> I have just replaced my primary single-core notebook
> with a nearly identical dual-core notebook,
> and moved the usb-bluetooth peripheral from the old
> machine to the new one.
>
> On the single-core machine, suspend/resume (RAM) worked
> fine even with the bluetooth module enabled.
>
> On the new dual-core machine, resuming with bluetooth
> enabled results in an infinite(?) lockup in an unbounded
> loop in hub_tt_kevent(). With PM debug on, I see
> tens of thousands of these messages scrolling on the console:
>
> kernel: usb 5-1: clear tt 4 (9042) error -71
> kernel: usb 5-1: clear tt 4 (9042) error -71
> kernel: usb 5-1: clear tt 4 (9042) error -71
> (over and over and ...)
>
> By restricting iterations on the unbounded loop
> the machine is able to resume again.
>
> Greg / Marcel: any words of wisdom?
>
> And we should probably put bounds permanently on that loop:
>
> I devised/used this patch to accomplish it.
> Now, I still get close to a thousand or so such
> messages, in groups, showing up in syslog,
> but at least the system can resume after suspend.

A better approach would be to find out why your system gets into that loop
and fix the underlying cause.

Alan Stern

2007-05-02 21:39:09

by Mark Lord

Subject: Re: [linux-usb-devel] [BUG] usb/core/hub.c loops forever on resume from ram due to bluetooth

Alan Stern wrote:
>
> A better approach would be to find out why your system gets into that loop
> and fix the underlying cause.

Not better, just parallel.

That loop should not be unbounded, as this example proves.
But it also shouldn't get stuck there regardless.

Two fixes needed.

Cheers

2007-05-03 13:33:27

by Alan Stern

Subject: Re: [linux-usb-devel] [BUG] usb/core/hub.c loops forever on resume from ram due to bluetooth

On Wed, 2 May 2007, Mark Lord wrote:

> Alan Stern wrote:
> >
> > A better approach would be to find out why your system gets into that loop
> > and fix the underlying cause.
>
> Not better, just parallel.
>
> That loop should not be unbounded, as this example proves.
> But it also shouldn't get stuck there regardless.
>
> Two fixes needed.

If the code never gets stuck in a loop, then there's no need to check
whether the loop is unbounded! :-)

So only one fix needed.

Alan Stern

2007-05-03 13:44:39

by Mark Lord

Subject: Re: [linux-usb-devel] [BUG] usb/core/hub.c loops forever on resume from ram due to bluetooth

Alan Stern wrote:
> On Wed, 2 May 2007, Mark Lord wrote:
>
>> Alan Stern wrote:
>>> A better approach would be to find out why your system gets into that loop
>>> and fix the underlying cause.
>> Not better, just parallel.
>>
>> That loop should not be unbounded, as this example proves.
>> But it also shouldn't get stuck there regardless.
>>
>> Two fixes needed.
>
> If the code never gets stuck in a loop, then there's no need to check
> whether the loop is unbounded! :-)

Yes, except here we know it does actually get stuck in a loop,
and having unbounded loops in device-driver code is a known baddy.

One cannot predict perfectly exactly how devices will fail,
but one can program defensively against them with simple precautions
like limiting list traversals and the like. :)
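
A minimal sketch of that kind of precaution, written as userspace C rather
than kernel code. All names here (pending_clear, drain_bounded,
failing_clear, working_clear) are hypothetical and are not taken from hub.c;
the failing operation simply re-queues its request to imitate a list that
never empties.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one queued "clear TT buffer" request. */
struct pending_clear {
	struct pending_clear *next;
	int devinfo;
};

typedef int (*clear_op)(struct pending_clear **head,
			struct pending_clear *req);

/* Imitates a device that keeps failing: the request goes back on the list. */
static int failing_clear(struct pending_clear **head,
			 struct pending_clear *req)
{
	req->next = *head;
	*head = req;
	return -71;
}

/* Imitates a healthy device: the request is consumed. */
static int working_clear(struct pending_clear **head,
			 struct pending_clear *req)
{
	(void)head;
	(void)req;
	return 0;
}

/*
 * Process at most max_iter entries from *head and return how many were
 * processed. Anything still queued afterwards is left on the list.
 */
static int drain_bounded(struct pending_clear **head, int max_iter,
			 clear_op op)
{
	int done = 0;

	assert(max_iter >= 0);
	while (done < max_iter && *head != NULL) {
		struct pending_clear *req = *head;

		*head = req->next;	/* unlink before calling op */
		req->next = NULL;
		(void)op(head, req);	/* a real driver would log the error */
		done++;
	}
	return done;
}
```

With failing_clear the call returns after max_iter passes and leaves the
request queued; with working_clear it returns as soon as the list is empty.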

Sure, Marcel may eventually look at the bluetooth code and fix it
to not get confused, but some other USB device may then show up
in the future with similar issues. The messages are still there
so we'll know about any future failure, but it won't just silently
crash the machine on resume this way.

Remember, resume is a very tough operation to debug at the best
of times, so adding some harmless robustness to the more troublesome
drivers is a very Good Thing(tm) here.

Cheers

2007-05-03 14:46:20

by Alan Stern

Subject: Re: [linux-usb-devel] [BUG] usb/core/hub.c loops forever on resume from ram due to bluetooth

On Thu, 3 May 2007, Mark Lord wrote:

> > If the code never gets stuck in a loop, then there's no need to check
> > whether the loop is unbounded! :-)
>
> Yes, except here we know it does actually get stuck in a loop,
> and having unbounded loops in device-driver code is a known baddy.
>
> One cannot predict perfectly exactly how devices will fail,
> but one can program defensively against them with simple precautions
> like limiting list traversals and the like. :)
>
> Sure, Marcel may eventually look at the bluetooth code and fix it
> to not get confused, but some other USB device may then show up
> in the future with similar issues.

I'd agree, except that the problem isn't in the bluetooth code. And it
probably doesn't have anything in particular to do with the fact that this
is a bluetooth device; the TT code gets used with USB hubs only. More
accurately, the code helps manage the TT embedded in a high-speed USB hub,
allowing the hub to communicate with a full- or low-speed USB device
plugged into it.

> The messages are still there
> so we'll know about any future failure, but it won't just silently
> crash the machine on resume this way.
>
> Remember, resume is a very tough operation to debug at the best
> of times, so adding some harmless robustness to the more troublesome
> drivers is a very Good Thing(tm) here.

Alan Stern

2007-05-08 13:53:10

by Mark Lord

Subject: Re: [BUG] usb/core/hub.c loops forever on resume from ram due to bluetooth

Greg ?

The oddball thing here is that on a UP machine with a UP kernel,
this (below) was never an issue.

After moving the drive to a dual-core machine and rebuilding
the kernel with SMP=y, the problem becomes a killer here.
The two machines are nearly identical, apart from the CPUs.

The failing machine is a Dell Inspiron 9400,
and mine isn't the only unit that has this issue.


----------Original message:

I have just replaced my primary single-core notebook
with a nearly identical dual-core notebook,
and moved the usb-bluetooth peripheral from the old
machine to the new one.

On the single-core machine, suspend/resume (RAM) worked
fine even with the bluetooth module enabled.

On the new dual-core machine, resuming with bluetooth
enabled results in an infinite(?) lockup in an unbounded
loop in hub_tt_kevent(). With PM debug on, I see
tens of thousands of these messages scrolling on the console:

kernel: usb 5-1: clear tt 4 (9042) error -71
kernel: usb 5-1: clear tt 4 (9042) error -71
kernel: usb 5-1: clear tt 4 (9042) error -71
(over and over and ...)

By restricting iterations on the unbounded loop
the machine is able to resume again.

Greg / Marcel: any words of wisdom?

And we should probably put bounds permanently on that loop:

I devised/used this patch to accomplish it.
Now, I still get close to a thousand or so such
messages, in groups, showing up in syslog,
but at least the system can resume after suspend.

Signed-off-by: Mark Lord <[email protected]>

--- linux/drivers/usb/core/hub.c.orig 2007-04-26 12:02:47.000000000 -0400
+++ linux/drivers/usb/core/hub.c 2007-05-01 18:48:46.000000000 -0400
@@ -403,9 +403,10 @@
 	struct usb_hub *hub =
 		container_of(work, struct usb_hub, tt.kevent);
 	unsigned long flags;
+	int limit = 500;

 	spin_lock_irqsave (&hub->tt.lock, flags);
-	while (!list_empty (&hub->tt.clear_list)) {
+	while (--limit && !list_empty (&hub->tt.clear_list)) {
 		struct list_head *temp;
 		struct usb_tt_clear *clear;
 		struct usb_device *hdev = hub->hdev;
-----

2007-05-14 01:42:33

by Greg KH

Subject: Re: [BUG] usb/core/hub.c loops forever on resume from ram due to bluetooth

On Tue, May 08, 2007 at 09:53:05AM -0400, Mark Lord wrote:
> Greg ?
>
> The oddball thing here is that on a UP machine with a UP kernel,
> this (below) was never an issue.
>
> After moving the drive to a dual-core machine and rebuilding
> the kernel with SMP=y, the problem becomes a killer here.
> The two machines are nearly identical, apart from the CPUs.
>
> The failing machine is a Dell Inspiron 9400,
> and mine isn't the only unit that has this issue.

Ok, I'll take a patch to keep the loop from going forever, but the main
issue here is that there is probably a hardware failure somewhere.

Care to resend the patch with proper formatting?

thanks,

greg k-h

2007-05-14 23:48:16

by Mark Lord

Subject: Re: [BUG] usb/core/hub.c loops forever on resume from ram due to bluetooth

Greg KH wrote:
>
> Ok, I'll take a patch to keep the loop from going forever, but the main
> issue here is that there is probably a hardware failure somewhere.

Okay, found it. The root cause here was a missing CONFIG_USB_SUSPEND=y,
which means the hci_usb device never got marked as USB_STATE_SUSPENDED,
which then caused the loop to go on forever.

The system works fine now with CONFIG_USB_SUSPEND=y in the .config.

Here's the patch to prevent future lockups for this or other causes.
I no longer need it, but it does still seem a good idea.

Signed-off-by: Mark Lord <[email protected]>
---

--- old/drivers/usb/core/hub.c 2007-04-26 12:02:47.000000000 -0400
+++ linux/drivers/usb/core/hub.c 2007-05-01 18:48:46.000000000 -0400
@@ -403,9 +403,10 @@
 	struct usb_hub *hub =
 		container_of(work, struct usb_hub, tt.kevent);
 	unsigned long flags;
+	int limit = 100;

 	spin_lock_irqsave (&hub->tt.lock, flags);
-	while (!list_empty (&hub->tt.clear_list)) {
+	while (--limit && !list_empty (&hub->tt.clear_list)) {
 		struct list_head *temp;
 		struct usb_tt_clear *clear;
 		struct usb_device *hdev = hub->hdev;
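
A side note on the `--limit` idiom used in the patch above: because the
counter is decremented before it is tested, the loop body can run at most
limit - 1 times. The sketch below is a standalone illustration, not kernel
code; iterations_allowed is a hypothetical helper that counts the passes
when the second loop condition never becomes false.

```c
#include <assert.h>

/*
 * Count how many times a "while (--limit && cond)" body runs when cond
 * stays true for the whole run. The pre-decrement means the bound is
 * limit - 1, not limit.
 */
static int iterations_allowed(int limit)
{
	int runs = 0;

	assert(limit > 0);	/* a non-positive start would not bound the loop sensibly */
	while (--limit)
		runs++;
	return runs;
}
```

So a starting value of 100 caps the loop at 99 passes. When the cap is hit,
any entries still queued simply remain on the list.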

2007-05-22 23:53:32

by Greg KH

Subject: patch usb-hub.c-loops-forever-on-resume-from-ram-due-to-bluetooth.patch added to gregkh-2.6 tree


This is a note to let you know that I've just added the patch titled

Subject: USB: hub.c loops forever on resume from ram due to bluetooth

to my gregkh-2.6 tree. Its filename is

usb-hub.c-loops-forever-on-resume-from-ram-due-to-bluetooth.patch

This tree can be found at
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


From [email protected] Mon May 14 16:48:14 2007
From: Mark Lord <[email protected]>
Date: Mon, 14 May 2007 19:48:02 -0400
Subject: USB: hub.c loops forever on resume from ram due to bluetooth
To: Greg KH <[email protected]>
Cc: Linux Kernel <[email protected]>, Andrew Morton <[email protected]>, [email protected]
Message-ID: <[email protected]>


Okay, found it. The root cause here was a missing CONFIG_USB_SUSPEND=y,
which means the hci_usb device never got marked as USB_STATE_SUSPENDED,
which then caused the loop to go on forever.

The system works fine now with CONFIG_USB_SUSPEND=y in the .config.

Here's the patch to prevent future lockups for this or other causes.
I no longer need it, but it does still seem a good idea.

Signed-off-by: Mark Lord <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/core/hub.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -403,9 +403,10 @@ static void hub_tt_kevent (struct work_s
 	struct usb_hub *hub =
 		container_of(work, struct usb_hub, tt.kevent);
 	unsigned long flags;
+	int limit = 100;

 	spin_lock_irqsave (&hub->tt.lock, flags);
-	while (!list_empty (&hub->tt.clear_list)) {
+	while (--limit && !list_empty (&hub->tt.clear_list)) {
 		struct list_head *temp;
 		struct usb_tt_clear *clear;
 		struct usb_device *hdev = hub->hdev;


Patches currently in gregkh-2.6 which might be from [email protected] are

usb/usb-hub.c-loops-forever-on-resume-from-ram-due-to-bluetooth.patch