2007-05-29 16:36:18

by Mark Lord

[permalink] [raw]
Subject: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset

I just "upgraded" from 2.6.21.3 to 2.6.22-rc3,
but will be rebooting back into 2.6.21.xx shortly.

Suspend/Resume (RAM) works perfectly on this Dell i9400 dual-core notebook.
Except with 2.6.22-rc3, all USB devices are non-functional on resume.
Most of the time.
Once in five reboots, my mouse worked after the first suspend/resume cycle,
but not thereafter. 'lsusb' also hangs after resume.

I'm attaching a full syslog with USB_DEBUG=y.
I cannot use PM_DEBUG=y due to an unresolved race condition
in the HRTIMER stuff (previously reported/discussed with no attempt at resolution).

This notebook uses very mainstream Intel chips for most stuff:

0000:00:00.0 Host bridge: Intel Corporation Mobile Memory Controller Hub (rev 03)
0000:00:01.0 PCI bridge: Intel Corporation Mobile PCI Express Graphics Port (rev 03)
0000:00:1b.0 0403: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
0000:00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
0000:00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01)
0000:00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01)
0000:00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01)
0000:00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01)
0000:00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01)
0000:00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01)
0000:00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
0000:00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01)
0000:00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA Storage Controllers cc=IDE (rev 01)
0000:00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
0000:01:00.0 VGA compatible controller: ATI Technologies Inc: Unknown device 7145
0000:03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02)
0000:03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd: Unknown device 0832
0000:03:01.1 0805: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19)
0000:03:01.2 System peripheral: Ricoh Co Ltd: Unknown device 0843 (rev 01)
0000:03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 0a)
0000:03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05)
0000:0c:00.0 Network controller: Intel Corporation: Unknown device 4222 (rev 02)

There are a zillion USB patches in 2.6.22-rc*.
Greg: got any good suggestions on which one to revert first?

Thanks
Mark Lord


Attachments:
usb_toa.log (82.09 kB)

2007-05-29 16:46:29

by Linus Torvalds

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset



On Tue, 29 May 2007, Mark Lord wrote:
>
> There are a zillion USB patches in 2.6.22-rc*.
> Greg: got any good suggestions on which one to revert first?

Any chance of bisecting it?

Linus

2007-05-29 17:00:20

by Mark Lord

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset

Linus Torvalds wrote:
>
> On Tue, 29 May 2007, Mark Lord wrote:
>> There are a zillion USB patches in 2.6.22-rc*.
>> Greg: got any good suggestions on which one to revert first?
>
> Any chance of bisecting it?

Ugh. Is there a way to tell bisect to only work around the USB updates?
I suppose so.. just pick the commits before and after the USB dump
and let it pick through the middle.

Still, that'll take a few hours, and frankly I'm getting sick of having
to re-debug the USB layer with each new kernel rev.

Got a pointer to the "bisect how-to" ?

Thanks

2007-05-29 17:03:49

by Nish Aravamudan

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset

On 5/29/07, Mark Lord <[email protected]> wrote:
> I just "upgraded" from 2.6.21.3 to 2.6.22-rc3,
> but will be rebooting back into 2.6.21.xx shortly.
>
> Suspend/Resume (RAM) works perfectly on this Dell i9400 dual-core notebook.
> Except with 2.6.22-rc3, all USB devices are non-functional on resume.
> Most of the time.
> Once in five reboots, my mouse worked after the first suspend/resume cycle,
> but not thereafter. 'lsusb' also hangs after resume.

Could this be related to: http://lkml.org/lkml/2007/5/27/56/?

If so, Kay has a patch available (not sure if it's in -rc3 or not...):
http://lkml.org/lkml/2007/5/27/71

Thanks,
Nish

2007-05-29 17:04:45

by Mark Lord

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset

Mmmm.. is there a way to get the raw patches out of the http://git.kernel.org/
browser windows ?

Clicking on "diff" gives me a fancy HTML coloured diff,
whereas I need just the raw patch so I can revert it.

???

2007-05-29 17:06:14

by Mark Lord

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset

Nish Aravamudan wrote:
> On 5/29/07, Mark Lord <[email protected]> wrote:
>> I just "upgraded" from 2.6.21.3 to 2.6.22-rc3,
>> but will be rebooting back into 2.6.21.xx shortly.
>>
>> Suspend/Resume (RAM) works perfectly on this Dell i9400 dual-core
>> notebook.
>> Except with 2.6.22-rc3, all USB devices are non-functional on resume.
>> Most of the time.
>> Once in five reboots, my mouse worked after the first suspend/resume
>> cycle,
>> but not thereafter. 'lsusb' also hangs after resume.
>
> Could this be related to: http://lkml.org/lkml/2007/5/27/56/?

No, the .config here has CONFIG_USB_DEVICE_CLASS=y

Thanks.

2007-05-29 17:06:29

by Nish Aravamudan

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset

On 5/29/07, Mark Lord <[email protected]> wrote:
> Linus Torvalds wrote:
> >
> > On Tue, 29 May 2007, Mark Lord wrote:
> >> There are a zillion USB patches in 2.6.22-rc*.
> >> Greg: got any good suggestions on which one to revert first?
> >
> > Any chance of bisecting it?
>
> Ugh. Is there a way to tell bisect to only work around the USB updates?
> I suppose so.. just pick the commits before and after the USB dump
> and let it pick through the middle.

git bisect start drivers/usb ? Should only bisect over commits
affecting any paths you give it.

> Still, that'll take a few hours, and frankly I'm getting sick of having
> to re-debug the USB layer with each new kernel rev.
>
> Got a pointer to the "bisect how-to" ?

The man-page is pretty good. There used to be
http://www.kernel.org/pub/software/scm/git/docs/howto/isolate-bugs-with-bisect.txt,
but it's 404 now...

Thanks,
Nish

2007-05-29 17:07:52

by Nish Aravamudan

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset

On 5/29/07, Mark Lord <[email protected]> wrote:
> Mmmm.. is there a way to get the raw patches out of the http://git.kernel.org/
> browser windows ?
>
> Clicking on "diff" gives me a fancy HTML coloured diff,
> whereas I need just the raw patch so I can revert it.

Click on "raw"? e.g.:

summary | shortlog | log | commit | commitdiff | tree
raw (parent: 1ea0975)

Neare the top.

Thanks,
Nish

2007-05-29 17:19:19

by Mark Lord

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset

Mark Lord wrote:
> I just "upgraded" from 2.6.21.3 to 2.6.22-rc3,
> but will be rebooting back into 2.6.21.xx shortly.
>
> Suspend/Resume (RAM) works perfectly on this Dell i9400 dual-core notebook.
> Except with 2.6.22-rc3, all USB devices are non-functional on resume.
> Most of the time.
> Once in five reboots, my mouse worked after the first suspend/resume cycle,
> but not thereafter. 'lsusb' also hangs after resume.
>
> I'm attaching a full syslog with USB_DEBUG=y.
> I cannot use PM_DEBUG=y due to an unresolved race condition
> in the HRTIMER stuff (previously reported/discussed with no attempt at
> resolution).
>
> This notebook uses very mainstream Intel chips for most stuff:
..
> There are a zillion USB patches in 2.6.22-rc*.
> Greg: got any good suggestions on which one to revert first?

Okay, I used the "targeted revert" method instead of "git-bisect",
and nailed it in one try by reverting these two commits by Alan Stern:

7ed92f1a149dddc3cb537ccd7441e98adac12c3e USB: make the autosuspend workqueue thread freezable
ef7f6c7084b333c7524dcd297e0578d43733a2a2 USB: more autosuspend timer stuff

I yanked them both, as they appeared to be releated based on the titles.
Reverting this pair of commits fixes the USB suspend/resume regression.

Alan Stern: your ball now.

-ml


2007-05-29 17:37:50

by Mark Lord

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset

Mark Lord wrote:
>..
> 7ed92f1a149dddc3cb537ccd7441e98adac12c3e USB: make the autosuspend
> workqueue thread freezable
> ef7f6c7084b333c7524dcd297e0578d43733a2a2 USB: more autosuspend timer stuff
>
> I yanked them both, as they appeared to be releated based on the titles.
> Reverting this pair of commits fixes the USB suspend/resume regression.

Okay, just to make it trivial,
I've narrowed it down to only this commit from Alan Stern:

7ed92f1a149dddc3cb537ccd7441e98adac12c3e USB: make the autosuspend workqueue thread freezable
>
> From: Alan Stern <[email protected]>
> Date: Tue, 22 May 2007 13:38:39 +0000 (-0400)
> Subject: USB: make the autosuspend workqueue thread freezable
>
> USB: make the autosuspend workqueue thread freezable
>
> This patch (as881b) makes the ksuspend_usb_wq workqueue freezable. We
> don't want a rogue workqueue thread running around, unexpectedly
> suspending or resuming USB devices in the middle of a system sleep
> transition.
>
> This fixes Bugzilla #8498.
>
> Signed-off-by: Alan Stern <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
> ---
>
> diff --git a/drivers/usb/core/usb.c b/drivers/usb/core/usb.c
> index 18ddc5e..80627b6 100644
> --- a/drivers/usb/core/usb.c
> +++ b/drivers/usb/core/usb.c
> @@ -205,7 +205,11 @@ struct device_type usb_device_type = {
>
> static int ksuspend_usb_init(void)
> {
> - ksuspend_usb_wq = create_singlethread_workqueue("ksuspend_usbd");
> + /* This workqueue is supposed to be both freezable and
> + * singlethreaded. Its job doesn't justify running on more
> + * than one CPU.
> + */
> + ksuspend_usb_wq = create_freezeable_workqueue("ksuspend_usbd");
> if (!ksuspend_usb_wq)
> return -ENOMEM;
> return 0;
>

2007-05-29 17:42:40

by Linus Torvalds

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset



On Tue, 29 May 2007, Mark Lord wrote:
>
> Ugh. Is there a way to tell bisect to only work around the USB updates?

Well, you _can_ actually give "git bisect" a pathspec (the same way you
git "git log" and friends), and tell it to only care about stuff that
changed that pathspec.

However, that was broken in some older git versions, and in general hasn't
had a huge amount of testing even in new ones, so I'm a tad nervous about
recommending people do it. But yes, you should be able to say

git bisect start drivers/usb/
git bisect good v2.6.21
git bisect bad v2.6.22-rc3

and off you go. However, if you do this, you need to make sure that you
have at LEAST git-1.5.1.

The other downside of path limiting is that if it turns out that the bug
was really introduced by something else, and just happened to _look_ like
it's USB-related, the path limiting will then cause "git bisect" to blame
a commit that just happens to be the next commit after the _real_ bug was
introduced.

In this case, I don't think it's likely to be an issue, but it *could*
obviously be something else.

(In contrast, the non-path-limited "git bisect" should work in pretty much
any situation and with any git version)

> Still, that'll take a few hours, and frankly I'm getting sick of having
> to re-debug the USB layer with each new kernel rev.

Yeah, I'm not surprised. USB is probably the worst possible case for
suspend/resume (at least if you ignore ACPI-related problems). It's nasty.

> Got a pointer to the "bisect how-to" ?

It's so disgustingly simple that I don't think we've ever done any
specific bisection tutorial, but the "git-bisect" man-page does exist, and
it talks about the only half-way interesting case, namely the case where
the automatic selection of a half-way point causes git to pick a point
that doesn't work for some other reason (ie stupid compile problem or
whatever). In which case you have to pick another one manually.

So that kind of gotcha is at least _mentioned_ in the git-bisect man-page,
even if it doesn't get much further than that.

There's also the git users manual, but I think the man-page is more
detailed. But for future reference, just do

git user manual

in google, and press "I'm feeling lucky". It finds the right thing at
least for me (and at least right now).

Linus

2007-05-29 17:47:25

by Linus Torvalds

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset



On Tue, 29 May 2007, Nish Aravamudan wrote:
>
> git bisect start drivers/usb ? Should only bisect over commits
> affecting any paths you give it.

Yes, but see the caveat in my other email. You need to have git-1.5.1 or
newer to have this work (it will *mostly* work in older versions, but
older versions have a terminating problem, so while it actually does the
bisection correctly, it will screw up at the very end unless all commits
in the final range are "dense" in that they all trigger in those
pathnames).

> The man-page is pretty good. There used to be
> http://www.kernel.org/pub/software/scm/git/docs/howto/isolate-bugs-with-bisect.txt,
> but it's 404 now...

That file got folded into the user manual, so it doesn't exist as a
stand-alone thing any more.

Linus

2007-05-29 17:55:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset



On Tue, 29 May 2007, Mark Lord wrote:
>
> Okay, just to make it trivial,
> I've narrowed it down to only this commit from Alan Stern:
>
> 7ed92f1a149dddc3cb537ccd7441e98adac12c3e USB: make the autosuspend workqueue
> thread freezable

Heh. Have I mentioned how much I *hate* those kernel threads being frozen?

Just for fun, could you try if the patch that just rips out the freezer
calls from the STR code just fixes the problem too (instead of reverting
that commit?)

It was done by Matthew Garrett <[email protected]>, and you should be
able to find it in the kernel archives under the subject

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Message-ID: <[email protected]>

and it would be interesting to hear if that just solves the problem for
you too.

There's a different (but related to all the same freezer problems) patch
by Rafael Wysocki <[email protected]>:

Re: [RFC][PATCH][EXPERIMENTAL] Make kernel threads nonfreezable by default
Message-Id: <[email protected]>

which would also be worth trying out.

Linus

2007-05-29 18:19:18

by Mark Lord

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset

Linus Torvalds wrote:
>
> On Tue, 29 May 2007, Mark Lord wrote:
>> Okay, just to make it trivial,
>> I've narrowed it down to only this commit from Alan Stern:
>>
>> 7ed92f1a149dddc3cb537ccd7441e98adac12c3e USB: make the autosuspend workqueue
>> thread freezable
>
> Heh. Have I mentioned how much I *hate* those kernel threads being frozen?
>
> Just for fun, could you try if the patch that just rips out the freezer
> calls from the STR code just fixes the problem too (instead of reverting
> that commit?)
>
> It was done by Matthew Garrett <[email protected]>, and you should be
> able to find it in the kernel archives under the subject
>
> Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
> Message-ID: <[email protected]>

Okay, I tried that one, but the machine just hung on the VGA console screen
at suspend time. No alt-sysrq or anything. Power-cycled it to recover.

> and it would be interesting to hear if that just solves the problem for
> you too.
>
> There's a different (but related to all the same freezer problems) patch
> by Rafael Wysocki <[email protected]>:
>
> Re: [RFC][PATCH][EXPERIMENTAL] Make kernel threads nonfreezable by default
> Message-Id: <[email protected]>

I'll hunt for Rafael's patch next.

2007-05-29 18:29:16

by Mark Lord

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset

Mark Lord wrote:
> Linus Torvalds wrote:
..
>> There's a different (but related to all the same freezer problems)
>> patch by Rafael Wysocki <[email protected]>:
>>
>> Re: [RFC][PATCH][EXPERIMENTAL] Make kernel threads nonfreezable by
>> default
>> Message-Id: <[email protected]>
>
> I'll hunt for Rafael's patch next.

Mmm.. Rafael's patch appears to be part of a large series of 15 patches,
and it's not totally clear to me how to test just that part of it,
so I think I'll leave things as is (working!) for now.

Here's my current fix:

-- snip --

Regarding commit 7ed92f1a149dddc3cb537ccd7441e98adac12c3e
"USB: make the autosuspend workqueue thread freezable":

This commit causes USB lockups on resume from Suspend-to-RAM
on my Core2duo notebook. Applying the patch below to revert
the commit fixes the problem for me in 2.6.22-rc3.

Signed-off-by: Mark Lord <[email protected]>
---
--- b0rken/drivers/usb/core/usb.c 2007-05-29 13:32:32.000000000 -0400
+++ linux/drivers/usb/core/usb.c 2007-05-29 13:27:10.000000000 -0400
@@ -205,11 +205,7 @@

static int ksuspend_usb_init(void)
{
- /* This workqueue is supposed to be both freezable and
- * singlethreaded. Its job doesn't justify running on more
- * than one CPU.
- */
- ksuspend_usb_wq = create_freezeable_workqueue("ksuspend_usbd");
+ ksuspend_usb_wq = create_singlethread_workqueue("ksuspend_usbd");
if (!ksuspend_usb_wq)
return -ENOMEM;
return 0;

2007-05-29 18:43:21

by Alan Stern

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset

On Tue, 29 May 2007, Mark Lord wrote:

> Mark Lord wrote:
> >..
> > 7ed92f1a149dddc3cb537ccd7441e98adac12c3e USB: make the autosuspend
> > workqueue thread freezable
> > ef7f6c7084b333c7524dcd297e0578d43733a2a2 USB: more autosuspend timer stuff
> >
> > I yanked them both, as they appeared to be releated based on the titles.
> > Reverting this pair of commits fixes the USB suspend/resume regression.
>
> Okay, just to make it trivial,
> I've narrowed it down to only this commit from Alan Stern:
>
> 7ed92f1a149dddc3cb537ccd7441e98adac12c3e USB: make the autosuspend workqueue thread freezable

I'm a little surprised; I would have expected the other patch to be the
one responsible. Are you sure you got the right one?

Your symptoms suggest that the ksuspend_usbd and khubd kernel threads
are in deadlock. You could verify this by getting an Alt-SysRq-T stack
trace of the two threads after resuming. Andrew Morton saw this
happen, under slightly different circumstances, on his system.

If this is indeed the case, the patch below ought to fix your problem.
Essentially it replaces calls to flush_workqueue() with
cancel_sync_work(). I sent it to Andrew last week, but his problem
occurs only sporadically so it's hard to test.


On Tue, 29 May 2007, Linus Torvalds wrote:

> Heh. Have I mentioned how much I *hate* those kernel threads being frozen?
>
> Just for fun, could you try if the patch that just rips out the freezer
> calls from the STR code just fixes the problem too (instead of reverting
> that commit?)

We'll know for certain after Mark tries this patch. However at the
moment I doubt that freezing the kernel threads has anything to do with
the problem.

BTW, you mentioned in an earlier email that for suspend-to-RAM, instead
of freezing tasks we should freeze I/O. It's a good idea, but have you
considered how much overhead it would add? Even if the necessary
changes are confined to the subsystem level, just think about what
would happen for example with char device drivers.

The only common subsystem they share is VFS. Every entry point into
VFS would therefore need to test whether a suspend-to-RAM was in
progress, and if it was, block until the suspend was over. Each of
these tests would be on a hot path -- not something we really want to
do if it is at all avoidable.

Alan Stern



Index: 2.6.22-rc3/drivers/usb/core/hub.c
===================================================================
--- 2.6.22-rc3.orig/drivers/usb/core/hub.c
+++ 2.6.22-rc3/drivers/usb/core/hub.c
@@ -1158,6 +1158,30 @@ static void release_address(struct usb_d
}
}

+#ifdef CONFIG_USB_SUSPEND
+
+static void usb_stop_pm(struct usb_device *udev)
+{
+ /* Synchronize with the ksuspend thread to prevent any more
+ * autosuspend requests from being submitted, and decrement
+ * the parent's count of unsuspended children.
+ */
+ usb_pm_lock(udev);
+ if (udev->parent && !udev->discon_suspended)
+ usb_autosuspend_device(udev->parent);
+ usb_pm_unlock(udev);
+
+ /* Stop any autosuspend requests already submitted */
+ cancel_rearming_delayed_work(&udev->autosuspend);
+}
+
+#else
+
+static inline void usb_stop_pm(struct usb_device *udev)
+{ }
+
+#endif
+
/**
* usb_disconnect - disconnect a device (usbcore-internal)
* @pdev: pointer to device being disconnected
@@ -1224,13 +1248,7 @@ void usb_disconnect(struct usb_device **
*pdev = NULL;
spin_unlock_irq(&device_state_lock);

- /* Decrement the parent's count of unsuspended children */
- if (udev->parent) {
- usb_pm_lock(udev);
- if (!udev->discon_suspended)
- usb_autosuspend_device(udev->parent);
- usb_pm_unlock(udev);
- }
+ usb_stop_pm(udev);

put_device(&udev->dev);
}
Index: 2.6.22-rc3/drivers/usb/core/usb.c
===================================================================
--- 2.6.22-rc3.orig/drivers/usb/core/usb.c
+++ 2.6.22-rc3/drivers/usb/core/usb.c
@@ -184,10 +184,6 @@ static void usb_release_dev(struct devic

udev = to_usb_device(dev);

-#ifdef CONFIG_USB_SUSPEND
- cancel_delayed_work(&udev->autosuspend);
- flush_workqueue(ksuspend_usb_wq);
-#endif
usb_destroy_configuration(udev);
usb_put_hcd(bus_to_hcd(udev->bus));
kfree(udev->product);
Index: 2.6.22-rc3/drivers/usb/core/hcd.c
===================================================================
--- 2.6.22-rc3.orig/drivers/usb/core/hcd.c
+++ 2.6.22-rc3/drivers/usb/core/hcd.c
@@ -1681,7 +1681,7 @@ void usb_remove_hcd(struct usb_hcd *hcd)
spin_unlock_irq (&hcd_root_hub_lock);

#ifdef CONFIG_PM
- flush_workqueue(ksuspend_usb_wq);
+ cancel_work_sync(&hcd->wakeup_work);
#endif

mutex_lock(&usb_bus_list_lock);

2007-05-29 19:11:50

by Mark Lord

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset

Alan Stern wrote:
> On Tue, 29 May 2007, Mark Lord wrote:
>> Mark Lord wrote:
>>> 7ed92f1a149dddc3cb537ccd7441e98adac12c3e USB: make the autosuspend
>>> workqueue thread freezable
>>> ef7f6c7084b333c7524dcd297e0578d43733a2a2 USB: more autosuspend timer stuff
>>>
>>> I yanked them both, as they appeared to be releated based on the titles.
>>> Reverting this pair of commits fixes the USB suspend/resume regression.
>> Okay, just to make it trivial,
>> I've narrowed it down to only this commit from Alan Stern:
>>
>> 7ed92f1a149dddc3cb537ccd7441e98adac12c3e USB: make the autosuspend workqueue thread freezable
>
> I'm a little surprised; I would have expected the other patch to be the
> one responsible. Are you sure you got the right one?

Absolutely, 100% sure.

> Your symptoms suggest that the ksuspend_usbd and khubd kernel threads
> are in deadlock.
..
> If this is indeed the case, the patch below ought to fix your problem.
> Essentially it replaces calls to flush_workqueue() with
> cancel_sync_work(). I sent it to Andrew last week, but his problem
> occurs only sporadically so it's hard to test.

I didn't re-break the kernel to do the alt-sysrq thing,
but I did reapply the b0rken commit plus your new patch (below).

So far, so good with that combination, thanks.
..

> Index: 2.6.22-rc3/drivers/usb/core/hub.c
> ===================================================================
> --- 2.6.22-rc3.orig/drivers/usb/core/hub.c
> +++ 2.6.22-rc3/drivers/usb/core/hub.c
> @@ -1158,6 +1158,30 @@ static void release_address(struct usb_d
> }
> }
>
> +#ifdef CONFIG_USB_SUSPEND
> +
> +static void usb_stop_pm(struct usb_device *udev)
> +{
> + /* Synchronize with the ksuspend thread to prevent any more
> + * autosuspend requests from being submitted, and decrement
> + * the parent's count of unsuspended children.
> + */
> + usb_pm_lock(udev);
> + if (udev->parent && !udev->discon_suspended)
> + usb_autosuspend_device(udev->parent);
> + usb_pm_unlock(udev);
> +
> + /* Stop any autosuspend requests already submitted */
> + cancel_rearming_delayed_work(&udev->autosuspend);
> +}
> +
> +#else
> +
> +static inline void usb_stop_pm(struct usb_device *udev)
> +{ }
> +
> +#endif
> +
> /**
> * usb_disconnect - disconnect a device (usbcore-internal)
> * @pdev: pointer to device being disconnected
> @@ -1224,13 +1248,7 @@ void usb_disconnect(struct usb_device **
> *pdev = NULL;
> spin_unlock_irq(&device_state_lock);
>
> - /* Decrement the parent's count of unsuspended children */
> - if (udev->parent) {
> - usb_pm_lock(udev);
> - if (!udev->discon_suspended)
> - usb_autosuspend_device(udev->parent);
> - usb_pm_unlock(udev);
> - }
> + usb_stop_pm(udev);
>
> put_device(&udev->dev);
> }
> Index: 2.6.22-rc3/drivers/usb/core/usb.c
> ===================================================================
> --- 2.6.22-rc3.orig/drivers/usb/core/usb.c
> +++ 2.6.22-rc3/drivers/usb/core/usb.c
> @@ -184,10 +184,6 @@ static void usb_release_dev(struct devic
>
> udev = to_usb_device(dev);
>
> -#ifdef CONFIG_USB_SUSPEND
> - cancel_delayed_work(&udev->autosuspend);
> - flush_workqueue(ksuspend_usb_wq);
> -#endif
> usb_destroy_configuration(udev);
> usb_put_hcd(bus_to_hcd(udev->bus));
> kfree(udev->product);
> Index: 2.6.22-rc3/drivers/usb/core/hcd.c
> ===================================================================
> --- 2.6.22-rc3.orig/drivers/usb/core/hcd.c
> +++ 2.6.22-rc3/drivers/usb/core/hcd.c
> @@ -1681,7 +1681,7 @@ void usb_remove_hcd(struct usb_hcd *hcd)
> spin_unlock_irq (&hcd_root_hub_lock);
>
> #ifdef CONFIG_PM
> - flush_workqueue(ksuspend_usb_wq);
> + cancel_work_sync(&hcd->wakeup_work);
> #endif
>
> mutex_lock(&usb_bus_list_lock);

2007-05-29 19:23:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset



On Tue, 29 May 2007, Mark Lord wrote:
>
> I didn't re-break the kernel to do the alt-sysrq thing,
> but I did reapply the b0rken commit plus your new patch (below).
>
> So far, so good with that combination, thanks.

Ok, let's merge the fix them. Alan, can you send it up-stream with the
proper commit message and sign-off?

Linus

2007-05-29 19:32:49

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset

On Tue, May 29, 2007 at 10:47:06AM -0700, Linus Torvalds wrote:
> On Tue, 29 May 2007, Nish Aravamudan wrote:
> > The man-page is pretty good. There used to be
> > http://www.kernel.org/pub/software/scm/git/docs/howto/isolate-bugs-with-bisect.txt,
> > but it's 404 now...
>
> That file got folded into the user manual, so it doesn't exist as a
> stand-alone thing any more.

Yup, see
http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#using-bisect

which also has a pointer to the man page.

--b.

2007-05-29 20:12:22

by Alan Stern

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle on Intel chipset

On Tue, 29 May 2007, Mark Lord wrote:

> >> Okay, just to make it trivial,
> >> I've narrowed it down to only this commit from Alan Stern:
> >>
> >> 7ed92f1a149dddc3cb537ccd7441e98adac12c3e USB: make the autosuspend workqueue thread freezable
> >
> > I'm a little surprised; I would have expected the other patch to be the
> > one responsible. Are you sure you got the right one?
>
> Absolutely, 100% sure.

In fact, looking back at things I suspect the real culprit was

6b157c9bf3bace6eeb4a973da63923ef24995cce USB: separate autosuspend from
external suspend

The patch you identified merely makes the problem much more likely to
occur. It causes both threads involved in the deadlock to be frozen
instead of just one of them, so when things get unfrozen the race is
poised to occur.

> I didn't re-break the kernel to do the alt-sysrq thing,
> but I did reapply the b0rken commit plus your new patch (below).

No need, it seems pretty clear that this was the problem.

> So far, so good with that combination, thanks.

You're welcome. I'll send the patch on up the line...

Greg, the patch I sent Mark is for 2.6.22-rc3. It's different from the
corresponding patch for the USB development tree because of intervening
changes. How would you like to handle this? Should I send you both
versions of the patch?

Alan Stern

2007-05-29 20:35:45

by Alan Stern

[permalink] [raw]
Subject: [PATCH] USB: replace flush_workqueue with cancel_sync_work

This patch (as912) replaces a couple of calls to flush_workqueue()
with cancel_sync_work() and cancel_rearming_delayed_work(). Using a
more directed approach allows us to avoid some nasty deadlocks. The
prime example occurs when a first-level device (the parent is a root
hub) is removed while at the same time the root hub gets a remote
wakeup request. khubd would try to flush the autosuspend workqueue
while holding the root-hub's lock, and the remote-wakeup workqueue
routine would be waiting to lock the root hub.

The patch also reorganizes the power management portion of
usb_disconnect(), separating it out into its own routine. The
autosuspend workqueue entry is cancelled immediately instead of
waiting for the device's release routine. In addition,
synchronization with the autosuspend thread is carried out even for
root hubs (an oversight in the original code).

Signed-off-by: Alan Stern <[email protected]>

---


On Tue, 29 May 2007, Linus Torvalds wrote:

> Ok, let's merge the fix them. Alan, can you send it up-stream with the
> proper commit message and sign-off?

Here it is. This will apply to 2.6.22-rc3. I'll let you duke it out
with Greg and Andrew to see who wants to apply it where. :-)

Alan Stern


Index: 2.6.22-rc3/drivers/usb/core/hub.c
===================================================================
--- 2.6.22-rc3.orig/drivers/usb/core/hub.c
+++ 2.6.22-rc3/drivers/usb/core/hub.c
@@ -1158,6 +1158,30 @@ static void release_address(struct usb_d
}
}

+#ifdef CONFIG_USB_SUSPEND
+
+static void usb_stop_pm(struct usb_device *udev)
+{
+ /* Synchronize with the ksuspend thread to prevent any more
+ * autosuspend requests from being submitted, and decrement
+ * the parent's count of unsuspended children.
+ */
+ usb_pm_lock(udev);
+ if (udev->parent && !udev->discon_suspended)
+ usb_autosuspend_device(udev->parent);
+ usb_pm_unlock(udev);
+
+ /* Stop any autosuspend requests already submitted */
+ cancel_rearming_delayed_work(&udev->autosuspend);
+}
+
+#else
+
+static inline void usb_stop_pm(struct usb_device *udev)
+{ }
+
+#endif
+
/**
* usb_disconnect - disconnect a device (usbcore-internal)
* @pdev: pointer to device being disconnected
@@ -1224,13 +1248,7 @@ void usb_disconnect(struct usb_device **
*pdev = NULL;
spin_unlock_irq(&device_state_lock);

- /* Decrement the parent's count of unsuspended children */
- if (udev->parent) {
- usb_pm_lock(udev);
- if (!udev->discon_suspended)
- usb_autosuspend_device(udev->parent);
- usb_pm_unlock(udev);
- }
+ usb_stop_pm(udev);

put_device(&udev->dev);
}
Index: 2.6.22-rc3/drivers/usb/core/usb.c
===================================================================
--- 2.6.22-rc3.orig/drivers/usb/core/usb.c
+++ 2.6.22-rc3/drivers/usb/core/usb.c
@@ -184,10 +184,6 @@ static void usb_release_dev(struct devic

udev = to_usb_device(dev);

-#ifdef CONFIG_USB_SUSPEND
- cancel_delayed_work(&udev->autosuspend);
- flush_workqueue(ksuspend_usb_wq);
-#endif
usb_destroy_configuration(udev);
usb_put_hcd(bus_to_hcd(udev->bus));
kfree(udev->product);
Index: 2.6.22-rc3/drivers/usb/core/hcd.c
===================================================================
--- 2.6.22-rc3.orig/drivers/usb/core/hcd.c
+++ 2.6.22-rc3/drivers/usb/core/hcd.c
@@ -1681,7 +1681,7 @@ void usb_remove_hcd(struct usb_hcd *hcd)
spin_unlock_irq (&hcd_root_hub_lock);

#ifdef CONFIG_PM
- flush_workqueue(ksuspend_usb_wq);
+ cancel_work_sync(&hcd->wakeup_work);
#endif

mutex_lock(&usb_bus_list_lock);

2007-05-29 21:20:16

by Mark Lord

[permalink] [raw]
Subject: Re: [PATCH] USB: replace flush_workqueue with cancel_sync_work

Thanks again, Alan!

From: Alan Stern <[email protected]>
>
>This patch (as912) replaces a couple of calls to flush_workqueue()
>with cancel_sync_work() and cancel_rearming_delayed_work(). Using a
>more directed approach allows us to avoid some nasty deadlocks. The
>prime example occurs when a first-level device (the parent is a root
>hub) is removed while at the same time the root hub gets a remote
>wakeup request. khubd would try to flush the autosuspend workqueue
>while holding the root-hub's lock, and the remote-wakeup workqueue
>routine would be waiting to lock the root hub.
>
>The patch also reorganizes the power management portion of
>usb_disconnect(), separating it out into its own routine. The
>autosuspend workqueue entry is cancelled immediately instead of
>waiting for the device's release routine. In addition,
>synchronization with the autosuspend thread is carried out even for
>root hubs (an oversight in the original code).
>
>Signed-off-by: Alan Stern <[email protected]>
Acked-by: Mark Lord <[email protected]>

---

Index: 2.6.22-rc3/drivers/usb/core/hub.c
===================================================================
--- 2.6.22-rc3.orig/drivers/usb/core/hub.c
+++ 2.6.22-rc3/drivers/usb/core/hub.c
@@ -1158,6 +1158,30 @@ static void release_address(struct usb_d
}
}

+#ifdef CONFIG_USB_SUSPEND
+
+static void usb_stop_pm(struct usb_device *udev)
+{
+ /* Synchronize with the ksuspend thread to prevent any more
+ * autosuspend requests from being submitted, and decrement
+ * the parent's count of unsuspended children.
+ */
+ usb_pm_lock(udev);
+ if (udev->parent && !udev->discon_suspended)
+ usb_autosuspend_device(udev->parent);
+ usb_pm_unlock(udev);
+
+ /* Stop any autosuspend requests already submitted */
+ cancel_rearming_delayed_work(&udev->autosuspend);
+}
+
+#else
+
+static inline void usb_stop_pm(struct usb_device *udev)
+{ }
+
+#endif
+
/**
* usb_disconnect - disconnect a device (usbcore-internal)
* @pdev: pointer to device being disconnected
@@ -1224,13 +1248,7 @@ void usb_disconnect(struct usb_device **
*pdev = NULL;
spin_unlock_irq(&device_state_lock);

- /* Decrement the parent's count of unsuspended children */
- if (udev->parent) {
- usb_pm_lock(udev);
- if (!udev->discon_suspended)
- usb_autosuspend_device(udev->parent);
- usb_pm_unlock(udev);
- }
+ usb_stop_pm(udev);

put_device(&udev->dev);
}
Index: 2.6.22-rc3/drivers/usb/core/usb.c
===================================================================
--- 2.6.22-rc3.orig/drivers/usb/core/usb.c
+++ 2.6.22-rc3/drivers/usb/core/usb.c
@@ -184,10 +184,6 @@ static void usb_release_dev(struct devic

udev = to_usb_device(dev);

-#ifdef CONFIG_USB_SUSPEND
- cancel_delayed_work(&udev->autosuspend);
- flush_workqueue(ksuspend_usb_wq);
-#endif
usb_destroy_configuration(udev);
usb_put_hcd(bus_to_hcd(udev->bus));
kfree(udev->product);
Index: 2.6.22-rc3/drivers/usb/core/hcd.c
===================================================================
--- 2.6.22-rc3.orig/drivers/usb/core/hcd.c
+++ 2.6.22-rc3/drivers/usb/core/hcd.c
@@ -1681,7 +1681,7 @@ void usb_remove_hcd(struct usb_hcd *hcd)
spin_unlock_irq (&hcd_root_hub_lock);

#ifdef CONFIG_PM
- flush_workqueue(ksuspend_usb_wq);
+ cancel_work_sync(&hcd->wakeup_work);
#endif

mutex_lock(&usb_bus_list_lock);

-

2007-05-30 07:02:39

by Romano Giannetti

[permalink] [raw]
Subject: Re: Regression: USB is nfg after suspend/resume(RAM) cycle onIntel chipset

On Tue, 2007-05-29 at 10:07 -0700, Nish Aravamudan wrote:

> Click on "raw"? e.g.:
>
> summary | shortlog | log | commit | commitdiff | tree
> raw (parent: 1ea0975)
>
> Neare the top.

Hmmm...yes, but if you click raw, you ave "500 internal error", and a
raw html source (that is, content: header is right :-) a least).

Added webmaster to Cc:, as per instruction in the error itself.

Romano



--
La presente comunicación tiene carácter confidencial y es para el exclusivo uso del destinatario indicado en la misma. Si Ud. no es el destinatario indicado, le informamos que cualquier forma de distribución, reproducción o uso de esta comunicación y/o de la información contenida en la misma están estrictamente prohibidos por la ley. Si Ud. ha recibido esta comunicación por error, por favor, notifíquelo inmediatamente al remitente contestando a este mensaje y proceda a continuación a destruirlo. Gracias por su colaboración.

This communication contains confidential information. It is for the exclusive use of the intended addressee. If you are not the intended addressee, please note that any form of distribution, copying or use of this communication or the information in it is strictly prohibited by law. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy this message. Thank you for your cooperation.