I have just replaced my primary single-core notebook
with a nearly identical dual-core notebook,
and moved the usb-bluetooth peripheral from the old
machine to the new one.
On the single-core machine, suspend/resume (RAM) worked
fine even with the bluetooth module enabled.
On the new dual-core machine, resuming with bluetooth
enabled results in an infinite(?) lockup in an unbounded
loop in hub_tt_kevent(). With PM debug on, I see
tens of thousands of these messages scrolling on the console:
kernel: usb 5-1: clear tt 4 (9042) error -71
kernel: usb 5-1: clear tt 4 (9042) error -71
kernel: usb 5-1: clear tt 4 (9042) error -71
(over and over and ...)
By restricting iterations on the unbounded loop
the machine is able to resume again.
Greg / Marcel: any words of wisdom?
And we should probably put bounds permanently on that loop:
I devised/used this patch to accomplish it.
Now, I still get close to a thousand or so such
messages, in groups, showing up in syslog,
but at least the system can resume after suspend.
Signed-off-by: Mark Lord <[email protected]>
--- linux/drivers/usb/core/hub.c.orig 2007-04-26 12:02:47.000000000 -0400
+++ linux/drivers/usb/core/hub.c 2007-05-01 18:48:46.000000000 -0400
@@ -403,9 +403,10 @@
 	struct usb_hub *hub =
 		container_of(work, struct usb_hub, tt.kevent);
 	unsigned long flags;
+	int limit = 500;
 
 	spin_lock_irqsave (&hub->tt.lock, flags);
-	while (!list_empty (&hub->tt.clear_list)) {
+	while (--limit && !list_empty (&hub->tt.clear_list)) {
 		struct list_head *temp;
 		struct usb_tt_clear *clear;
 		struct usb_device *hdev = hub->hdev;
-----
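A note on the loop condition in the patch above: because the counter is pre-decremented before it is tested, a starting value of N allows at most N-1 passes per invocation. The following is a minimal userspace sketch of that behaviour, not kernel code; `struct pending` and `drain_bounded()` are hypothetical names, and a plain counter stands in for hub->tt.clear_list.

```c
#include <stddef.h>

/* Hypothetical stand-in for the pending clear list: only the number
 * of queued entries is tracked. */
struct pending {
	size_t count;
};

/* Drain entries using the same condition as the patched loop:
 * pre-decrement the budget, then check whether anything is queued.
 * Returns the number of entries processed in this call. */
size_t drain_bounded(struct pending *p, int limit)
{
	size_t done = 0;

	while (--limit && p->count != 0) {
		p->count--;	/* stands in for removing one entry and issuing the clear */
		done++;
	}
	return done;
}
```

With 1000 entries queued and limit = 500, this model processes 499 entries and leaves 501 behind for a later invocation. The starting value must be positive: with limit = 0 the first test sees -1, which is non-zero, so the loop would not be bounded in any useful sense.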
On Tue, 1 May 2007, Mark Lord wrote:
> I have just replaced my primary single-core notebook
> with a nearly identical dual-core notebook,
> and moved the usb-bluetooth peripheral from the old
> machine to the new one.
>
> On the single-core machine, suspend/resume (RAM) worked
> fine even with the bluetooth module enabled.
>
> On the new dual-core machine, resuming with bluetooth
> enabled results in an infinite(?) lockup in an unbounded
> loop in hub_tt_kevent(). With PM debug on, I see
> tens of thousands of these messages scrolling on the console:
>
> kernel: usb 5-1: clear tt 4 (9042) error -71
> kernel: usb 5-1: clear tt 4 (9042) error -71
> kernel: usb 5-1: clear tt 4 (9042) error -71
> (over and over and ...)
>
> By restricting iterations on the unbounded loop
> the machine is able to resume again.
>
> Greg / Marcel: any words of wisdom?
>
> And we should probably put bounds permanently on that loop:
>
> I devised/used this patch to accomplish it.
> Now, I still get close to a thousand or so such
> messages, in groups, showing up in syslog,
> but at least the system can resume after suspend.
A better approach would be to find out why your system gets into that loop
and fix the underlying cause.
Alan Stern
Alan Stern wrote:
>
> A better approach would be to find out why your system gets into that loop
> and fix the underlying cause.
Not better, just parallel.
That loop should not be unbounded, as this example proves.
But it also shouldn't get stuck there regardless.
Two fixes needed.
Cheers
On Wed, 2 May 2007, Mark Lord wrote:
> Alan Stern wrote:
> >
> > A better approach would be to find out why your system gets into that loop
> > and fix the underlying cause.
>
> Not better, just parallel.
>
> That loop should not be unbounded, as this example proves.
> But it also shouldn't get stuck there regardless.
>
> Two fixes needed.
If the code never gets stuck in a loop, then there's no need to check
whether the loop is unbounded! :-)
So only one fix needed.
Alan Stern
Alan Stern wrote:
> On Wed, 2 May 2007, Mark Lord wrote:
>
>> Alan Stern wrote:
>>> A better approach would be to find out why your system gets into that loop
>>> and fix the underlying cause.
>> Not better, just parallel.
>>
>> That loop should not be unbounded, as this example proves.
>> But it also shouldn't get stuck there regardless.
>>
>> Two fixes needed.
>
> If the code never gets stuck in a loop, then there's no need to check
> whether the loop is unbounded! :-)
Yes, except here we know it does actually get stuck in a loop,
and having unbounded loops in device-driver code is a known baddy.
One cannot predict exactly how devices will fail,
but one can program defensively against them with simple precautions
like limiting list traversals and the like. :)
Sure, Marcel may eventually look at the bluetooth code and fix it
to not get confused, but some other USB device may then show up
in the future with similar issues. The messages are still there
so we'll know about any future failure, but it won't just silently
crash the machine on resume this way.
Remember, resume is a very tough operation to debug at the best
of times, so adding some harmless robustness to the more troublesome
drivers is a very Good Thing(tm) here.
Cheers
On Thu, 3 May 2007, Mark Lord wrote:
> > If the code never gets stuck in a loop, then there's no need to check
> > whether the loop is unbounded! :-)
>
> Yes, except here we know it does actually get stuck in a loop,
> and having unbounded loops in device-driver code is a known baddy.
>
> One cannot predict exactly how devices will fail,
> but one can program defensively against them with simple precautions
> like limiting list traversals and the like. :)
>
> Sure, Marcel may eventually look at the bluetooth code and fix it
> to not get confused, but some other USB device may then show up
> in the future with similar issues.
I'd agree, except that the problem isn't in the bluetooth code. And it
probably doesn't have anything in particular to do with the fact that this
is a bluetooth device; the TT code gets used with USB hubs only. More
accurately, the code helps manage the TT embedded in a high-speed USB hub,
allowing the hub to communicate with a full- or low-speed USB device
plugged into it.
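
For reference, the numbers in the repeated log line can be unpacked by hand. The sketch below assumes the message has the form "clear tt <tt> (<devinfo>) error <status>" and that devinfo follows the wValue layout of the USB 2.0 Clear_TT_Buffer hub request. The struct and function names are made up for illustration and are not taken from hub.c. The status -71 is -EPROTO on Linux.

```c
#include <stdint.h>

/* Fields packed into the 16-bit devinfo word (assumed to match the
 * Clear_TT_Buffer wValue layout). Names here are hypothetical. */
struct tt_devinfo {
	unsigned endpoint;	/* bits 3..0: endpoint number */
	unsigned devaddr;	/* bits 10..4: device address */
	unsigned ep_type;	/* bits 12..11: 0 = control, 2 = bulk */
	unsigned dir_in;	/* bit 15: 1 = IN, 0 = OUT */
};

/* Split a devinfo word into its fields. */
struct tt_devinfo decode_tt_devinfo(uint16_t devinfo)
{
	struct tt_devinfo d;

	d.endpoint = devinfo & 0x0f;
	d.devaddr = (devinfo >> 4) & 0x7f;
	d.ep_type = (devinfo >> 11) & 0x03;
	d.dir_in = (devinfo >> 15) & 0x01;
	return d;
}
```

Under those assumptions, 0x9042 decodes to endpoint 2 on device address 4, transfer type bulk, direction IN, and the leading 4 in "clear tt 4" would be the TT number on the hub.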
> The messages are still there
> so we'll know about any future failure, but it won't just silently
> crash the machine on resume this way.
>
> Remember, resume is a very tough operation to debug at the best
> of times, so adding some harmless robustness to the more troublesome
> drivers is a very Good Thing(tm) here.
Alan Stern
Greg ?
The oddball thing here is that on a UP machine with a UP kernel,
this (below) was never an issue.
After moving the drive to a dual-core machine and rebuilding
the kernel with SMP=y, the problem becomes a killer here.
The two machines are nearly identical, apart from the CPUs.
The failing machine is a Dell Inspiron 9400,
and mine isn't the only unit that has this issue.
----------Original message:
I have just replaced my primary single-core notebook
with a nearly identical dual-core notebook,
and moved the usb-bluetooth peripheral from the old
machine to the new one.
On the single-core machine, suspend/resume (RAM) worked
fine even with the bluetooth module enabled.
On the new dual-core machine, resuming with bluetooth
enabled results in an infinite(?) lockup in an unbounded
loop in hub_tt_kevent(). With PM debug on, I see
tens of thousands of these messages scrolling on the console:
kernel: usb 5-1: clear tt 4 (9042) error -71
kernel: usb 5-1: clear tt 4 (9042) error -71
kernel: usb 5-1: clear tt 4 (9042) error -71
(over and over and ...)
By restricting iterations on the unbounded loop
the machine is able to resume again.
Greg / Marcel: any words of wisdom?
And we should probably put bounds permanently on that loop:
I devised/used this patch to accomplish it.
Now, I still get close to a thousand or so such
messages, in groups, showing up in syslog,
but at least the system can resume after suspend.
Signed-off-by: Mark Lord <[email protected]>
--- linux/drivers/usb/core/hub.c.orig 2007-04-26 12:02:47.000000000 -0400
+++ linux/drivers/usb/core/hub.c 2007-05-01 18:48:46.000000000 -0400
@@ -403,9 +403,10 @@
 	struct usb_hub *hub =
 		container_of(work, struct usb_hub, tt.kevent);
 	unsigned long flags;
+	int limit = 500;
 
 	spin_lock_irqsave (&hub->tt.lock, flags);
-	while (!list_empty (&hub->tt.clear_list)) {
+	while (--limit && !list_empty (&hub->tt.clear_list)) {
 		struct list_head *temp;
 		struct usb_tt_clear *clear;
 		struct usb_device *hdev = hub->hdev;
-----
On Tue, May 08, 2007 at 09:53:05AM -0400, Mark Lord wrote:
> Greg ?
>
> The oddball thing here is that on a UP machine with a UP kernel,
> this (below) was never an issue.
>
> After moving the drive to a dual-core machine and rebuilding
> the kernel with SMP=y, the problem becomes a killer here.
> The two machines are nearly identical, apart from the CPUs.
>
> The failing machine is a Dell Inspiron 9400,
> and mine isn't the only unit that has this issue.
Ok, I'll take a patch to keep the loop from going forever, but the main
issue here is that there is probably a hardware failure somewhere.
Care to resend the patch with proper formatting?
thanks,
greg k-h
Greg KH wrote:
>
> Ok, I'll take a patch to keep the loop from going forever, but the main
> issue here is that there is probably a hardware failure somewhere.
Okay, found it. The root cause here was a missing CONFIG_USB_SUSPEND=y,
which means the hci_usb device never got marked as USB_STATE_SUSPENDED,
which then caused the loop to go on forever.
The system works fine now with CONFIG_USB_SUSPEND=y in the .config.
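
For anyone wanting to check their own build for the same omission, here is a rough sketch of testing a .config for an enabled option. It assumes the usual .config conventions, where an enabled option appears as "NAME=y" at the start of a line and a disabled one as "# NAME is not set". The function name config_enabled() is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if config_text contains a line that is exactly "<name>=y".
 * The match must begin at the start of a line, so the commented form
 * "# <name> is not set" and longer option names never match. */
bool config_enabled(const char *config_text, const char *name)
{
	size_t len = strlen(name);
	const char *line = config_text;

	while (line && *line) {
		if (strncmp(line, name, len) == 0 &&
		    strncmp(line + len, "=y", 2) == 0 &&
		    (line[len + 2] == '\n' || line[len + 2] == '\0'))
			return true;
		line = strchr(line, '\n');	/* advance to the next line, if any */
		if (line)
			line++;
	}
	return false;
}
```

Called with the contents of the .config and "CONFIG_USB_SUSPEND", this returns true only when the option is built in.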
Here's the patch to prevent future lockups for this or other causes.
I no longer need it, but it does still seem a good idea.
Signed-off-by: Mark Lord <[email protected]>
---
--- old/drivers/usb/core/hub.c 2007-04-26 12:02:47.000000000 -0400
+++ linux/drivers/usb/core/hub.c 2007-05-01 18:48:46.000000000 -0400
@@ -403,9 +403,10 @@
 	struct usb_hub *hub =
 		container_of(work, struct usb_hub, tt.kevent);
 	unsigned long flags;
+	int limit = 100;
 
 	spin_lock_irqsave (&hub->tt.lock, flags);
-	while (!list_empty (&hub->tt.clear_list)) {
+	while (--limit && !list_empty (&hub->tt.clear_list)) {
 		struct list_head *temp;
 		struct usb_tt_clear *clear;
 		struct usb_device *hdev = hub->hdev;
This is a note to let you know that I've just added the patch titled
Subject: USB: hub.c loops forever on resume from ram due to bluetooth
to my gregkh-2.6 tree. Its filename is
usb-hub.c-loops-forever-on-resume-from-ram-due-to-bluetooth.patch
This tree can be found at
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/
>From [email protected] Mon May 14 16:48:14 2007
From: Mark Lord <[email protected]>
Date: Mon, 14 May 2007 19:48:02 -0400
Subject: USB: hub.c loops forever on resume from ram due to bluetooth
To: Greg KH <[email protected]>
Cc: Linux Kernel <[email protected]>, Andrew Morton <[email protected]>, [email protected]
Message-ID: <[email protected]>
Okay, found it. The root cause here was a missing CONFIG_USB_SUSPEND=y,
which means the hci_usb device never got marked as USB_STATE_SUSPENDED,
which then caused the loop to go on forever.
The system works fine now with CONFIG_USB_SUSPEND=y in the .config.
Here's the patch to prevent future lockups for this or other causes.
I no longer need it, but it does still seem a good idea.
Signed-off-by: Mark Lord <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/usb/core/hub.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -403,9 +403,10 @@ static void hub_tt_kevent (struct work_s
 	struct usb_hub *hub =
 		container_of(work, struct usb_hub, tt.kevent);
 	unsigned long flags;
+	int limit = 100;
 
 	spin_lock_irqsave (&hub->tt.lock, flags);
-	while (!list_empty (&hub->tt.clear_list)) {
+	while (--limit && !list_empty (&hub->tt.clear_list)) {
 		struct list_head *temp;
 		struct usb_tt_clear *clear;
 		struct usb_device *hdev = hub->hdev;
Patches currently in gregkh-2.6 which might be from [email protected] are
usb/usb-hub.c-loops-forever-on-resume-from-ram-due-to-bluetooth.patch