Hi,
The following patch fixes an endless loop that occurs after I sleep and
resume my iBook with a linux-wlan-ng controller plugged in, then remove
the stick and plug it back in (getting an "IRQ lossage" message).
It supersedes the previous one, in which:
- I hadn't noticed that limit was unsigned,
- limit was decremented twice as fast as intended,
- the goto was a bit useless.
Signed-off-by: Colin Leroy <[email protected]>
--- a/drivers/usb/host/ohci-hcd.c 2004-11-26 11:28:21.284259057 +0100
+++ b/drivers/usb/host/ohci-hcd.c 2004-11-26 11:28:03.437351150 +0100
@@ -344,7 +344,7 @@
 	int epnum = ep & USB_ENDPOINT_NUMBER_MASK;
 	unsigned long flags;
 	struct ed *ed;
-	unsigned limit = 1000;
+	int limit = 1000;
 
 	/* ASSERT: any requests/urbs are being unlinked */
 	/* ASSERT: nobody can be submitting urbs for this any more */
@@ -375,6 +375,11 @@
 		spin_unlock_irqrestore (&ohci->lock, flags);
 		set_current_state (TASK_UNINTERRUPTIBLE);
 		schedule_timeout (1);
+		if (limit < 1000) {
+			ohci_warn (ohci, "Can't recover, restarting.\n");
+			ohci_restart(ohci);
+			return;
+		}
 		goto rescan;
 	case ED_IDLE: /* fully unlinked */
 		if (list_empty (&ed->td_list)) {
On Friday 26 November 2004 02:30, Colin Leroy wrote:
> @@ -375,6 +375,11 @@
> spin_unlock_irqrestore (&ohci->lock, flags);
> set_current_state (TASK_UNINTERRUPTIBLE);
> schedule_timeout (1);
> + if (limit < 1000) {
> + ohci_warn (ohci, "Can't recover, restarting.\n");
> + ohci_restart(ohci);
> + return;
> + }
So instead of waiting a moment for the ED to finish
its normal processing and move from state ED_UNLINK
into ED_IDLE, you want to always clobber the whole
USB device tree attached to that bus? That'd happen
quite routinely.
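
[Illustration] The bounded wait described above can be sketched as a
user-space toy. This is not the ohci-hcd code: struct fake_ed,
fake_tick() and wait_for_idle() are hypothetical names, and fake_tick()
merely stands in for the schedule_timeout (1) pause. The sketch assumes
the intent is to escalate only once the retry budget is used up. If
limit has already been decremented once when the new check runs (the
decrement is outside the hunk shown), then "limit < 1000" holds on the
very first pass, which is the behaviour being objected to.

```c
#include <assert.h>

/* State names borrowed from the thread; the values are illustrative. */
enum ed_state { ED_IDLE, ED_UNLINK };

/* Hypothetical stand-in for the hardware: the ED goes idle after
 * ticks_needed polls, or never if ticks_needed is 0. */
struct fake_ed {
	enum ed_state state;
	int ticks_needed;
};

/* Simulates one scheduler tick of waiting for the controller. */
static void fake_tick(struct fake_ed *ed)
{
	if (ed->ticks_needed > 0 && --ed->ticks_needed == 0)
		ed->state = ED_IDLE;
}

/* Returns 0 if the ED reached ED_IDLE within the retry budget,
 * -1 if the budget ran out while it was still in ED_UNLINK. */
static int wait_for_idle(struct fake_ed *ed, int limit)
{
	assert(limit >= 0);
	while (ed->state == ED_UNLINK) {
		if (limit-- == 0)
			return -1;	/* only now is drastic recovery justified */
		fake_tick(ed);
	}
	return 0;
}
```

An ED that needs three ticks returns 0 without any escalation; one that
never leaves ED_UNLINK returns -1 only after all retries are spent.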
This isn't a good patch either... maybe your best
bet would be to find out why the IRQs stopped getting
delivered.
- Dave
On Fri, 2004-11-26 at 09:57 -0800, David Brownell wrote:
> On Friday 26 November 2004 09:37, Colin Leroy wrote:
> > On 26 Nov 2004 at 09h11, David Brownell wrote:
> > > This isn't a good patch either... maybe your best
> > > bet would be to find out why the IRQs stopped getting
> > > delivered.
> >
> > It's probably a linux-wlan-ng issue...
>
> I suspect PPC resume issues myself.
Colin, you didn't tell us which controller it was. The NEC one is a
totally normal off-the-shelf controller coming out of D3. The Apple
ones are a bit special, though.
>
> As expected, if IRQs aren't arriving. Though you
> may not be using the latest kernel; it's supposed
> to give warnings about IRQ delivery problems after
> resume too, not just on initial startup.
It could be a problem in the code restarting the clocks to the USB cell
in KL (provided it's one of those controllers and not the NEC), which
would need some more delay before restarting things...
> I'm not expert in PPC IRQ delivery, which is where the
> root cause of this problem seems to live. We all have
> places where we need help!
There is nothing fancy with PPC IRQ delivery. IRQs work on wakeup for
everybody or nobody. It's a problem with the USB chip. (There is no
fancy firmware IRQ routing or the like; every device is physically
wired to one of the roughly 128 IRQ lines of the MPIC.)
Ben.
On 26 Nov 2004 at 09h11, David Brownell wrote:
Hi,
> So instead of waiting a moment for the ED to finish
> its normal processing and move from state ED_UNLINK
> into ED_IDLE, you want to always clobber the whole
> USB device tree attached to that bus? That'd happen
> quite routinely.
Yeah. Sorry. Also, I just noticed that this patch seemed
to work because I overlooked the unsigned bit, making my
hack not go through sanitize - which changes ed->state and
thus does not get back to the ED_UNLINK path. Duh... I must
have been tired.
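
[Illustration] The unsigned pitfall in isolation. This is a generic
sketch only: the earlier patch is not shown in this thread, and the
function names below are made up. A countdown that tests for "below
zero" can fire only when the counter is signed, because an unsigned
counter wraps from 0 to UINT_MAX instead of going negative.

```c
#include <assert.h>
#include <limits.h>

/* Decrement a signed counter until it drops below zero.
 * Returns the number of decrements, or -1 if max_steps is hit first. */
static int steps_until_negative_signed(int limit, int max_steps)
{
	assert(max_steps >= 0);
	for (int steps = 1; steps <= max_steps; steps++) {
		limit--;
		if (limit < 0)
			return steps;
	}
	return -1;
}

/* Same loop with an unsigned counter: the test can never succeed. */
static int steps_until_negative_unsigned(unsigned limit, int max_steps)
{
	assert(max_steps >= 0);
	for (int steps = 1; steps <= max_steps; steps++) {
		limit--;		/* 0 - 1 wraps to UINT_MAX */
		if (limit < 0)		/* always false; compilers may warn */
			return steps;
	}
	return -1;
}

/* What an unsigned counter holds after one decrement from zero. */
static unsigned wrap_once(void)
{
	unsigned limit = 0;

	limit--;
	return limit;
}
```

Starting from 3, the signed version reports the fourth decrement as the
one that goes negative; the unsigned version gives up at max_steps
without the check ever firing.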
> This isn't a good patch either... maybe your best
> bet would be to find out why the IRQs stopped getting
> delivered.
It's probably a linux-wlan-ng issue... What do you think
of these logs ?
#resume logs...
#disconnecting the stick:
usb 4-1: USB disconnect, address 2
ohci_hcd 0001:10:1b.1: IRQ INTR_SF lossage
hfa384x_usbin_callback: Fatal, failed to resubmit rx_urb. error=-19
hfa384x_dorrid: ctlx failure=REQ_TIMEOUT
prism2sta_mlmerequest: Failed to read eth1 statistics: error=-5
#reconnecting the stick:
usb 4-1: new full speed USB device using address 3
usb 4-1: control timeout on ep0out
maybe the lwlan driver should catch these and kill the urbs or
something?
Thanks for your help, I'm not an expert at all in the usb world...
--
Colin
On Friday 26 November 2004 09:37, Colin Leroy wrote:
> On 26 Nov 2004 at 09h11, David Brownell wrote:
> > This isn't a good patch either... maybe your best
> > bet would be to find out why the IRQs stopped getting
> > delivered.
>
> It's probably a linux-wlan-ng issue...
I suspect PPC resume issues myself.
> What do you think
> of these logs ?
>
> #resume logs...
> #disconnecting the stick:
> usb 4-1: USB disconnect, address 2
> ohci_hcd 0001:10:1b.1: IRQ INTR_SF lossage
That does seem to be the first problem; fixing
it (that is, making sure IRQs arrive again!)
should make the rest go away.
> hfa384x_usbin_callback: Fatal, failed to resubmit rx_urb. error=-19
> hfa384x_dorrid: ctlx failure=REQ_TIMEOUT
> prism2sta_mlmerequest: Failed to read eth1 statistics: error=-5
Those look like plausible ways for that driver to
behave. "-19" == "-ENODEV" for device-gone (you
unplugged it!), though the rest (timeout, EIO)
suggest that WLAN code fault recovery is weird.
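
[Illustration] Decoding the numbers in those log lines. Kernel-style
status codes are negated errno values, so on Linux -19 is -ENODEV and
-5 is -EIO. The helper below is a hypothetical user-space sketch that
covers only the codes seen in this thread; status_name() is not a
kernel or linux-wlan-ng function.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Map a negated errno value, as printed in the logs, to its symbolic
 * name. Only the codes discussed in this thread are handled. */
static const char *status_name(int status)
{
	switch (-status) {
	case 0:
		return "OK";
	case EIO:
		return "EIO";
	case ENODEV:
		return "ENODEV";
	default:
		return "unknown";
	}
}
```

With this, status_name(-19) yields "ENODEV" (the stick was unplugged)
and status_name(-5) yields "EIO" on a Linux system.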
> #reconnecting the stick:
> usb 4-1: new full speed USB device using address 3
> usb 4-1: control timeout on ep0out
As expected, if IRQs aren't arriving. Though you
may not be using the latest kernel; it's supposed
to give warnings about IRQ delivery problems after
resume too, not just on initial startup.
> maybe the lwlan driver should catch these and kill the urbs or
> something?
The only obvious "looks wrong" thing from that WLAN
code is discarding the non-recoverable ENODEV status
in favor of reporting a usually-recoverable (timeout)
then maybe-recoverable (EIO) error. But that's not
necessarily troublesome here.
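
[Illustration] One way a driver could avoid losing the non-recoverable
status is to let -ENODEV win over anything reported later. This is a
sketch under assumptions, not linux-wlan-ng code: struct req_status and
record_error() are hypothetical, and -ETIMEDOUT is used in the example
only as a stand-in for the driver's own REQ_TIMEOUT state.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-request status slot; 0 means no error so far. */
struct req_status {
	int err;
};

/* Record an error (0 or a negated errno). The first error is kept,
 * except that -ENODEV always replaces a softer error, and once
 * -ENODEV is stored nothing overwrites it. */
static void record_error(struct req_status *st, int err)
{
	assert(err <= 0);
	if (st->err == -ENODEV)
		return;		/* device is gone; later errors add nothing */
	if (err == -ENODEV || st->err == 0)
		st->err = err;
}
```

Feeding it -ENODEV, then -ETIMEDOUT, then -EIO leaves -ENODEV in place,
so the caller still learns that the device has gone away.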
> Thanks for your help, I'm not an expert at all in the usb world...
Most people aren't... :)
I'm not expert in PPC IRQ delivery, which is where the
root cause of this problem seems to live. We all have
places where we need help!
- Dave
Colin reported off-line that he's using 2.6.9
rather than 2.6.10-rc2 or newer ... so it's
actually expected that his kernel misbehave
with USB PM. The workaround, for all 2.6
kernels until very recently, is to rmmod the
HCDs before entering a system sleep state.
I think that starting in 2.6.10 it'll be OK
to leave the USB HCDs loaded during various
PM sleep states ... in at least some common
system configurations. There are several
hundred different possibilities, it's hard
to test all of them even if you do happen to
have all that hardware!
But for earlier kernels, don't even try that.
- Dave
On 27 Nov 2004 at 09h11, Benjamin Herrenschmidt wrote:
Hi,
> > > It's probably a linux-wlan-ng issue...
> >
> > I suspect PPC resume issues myself.
>
> Colin, you didn't tell us which controller it was ? The NEC one is a
> totally normal off-the-shelves controller coming out of D3. The Apple
> ones are a bit special tho.
It's the ibook G4's controller:
[colin@jack ~]$ for i in 1 2 3 4; do cat /sys/bus/usb/devices/usb$i/product; done;
NEC Corporation USB 2.0
Apple Computer Inc. KeyLargo/Intrepid USB (#3)
NEC Corporation USB
NEC Corporation USB (#2)
--
Colin
On Mon, 2004-11-29 at 09:04 +0100, Colin Leroy wrote:
> On 27 Nov 2004 at 09h11, Benjamin Herrenschmidt wrote:
>
> Hi,
>
> > > > It's probably a linux-wlan-ng issue...
> > >
> > > I suspect PPC resume issues myself.
> >
> > Colin, you didn't tell us which controller it was ? The NEC one is a
> > totally normal off-the-shelves controller coming out of D3. The Apple
> > ones are a bit special tho.
>
> It's the ibook G4's controller:
> [colin@jack ~]$ for i in 1 2 3 4; do cat /sys/bus/usb/devices/usb$i/product; done;
> NEC Corporation USB 2.0
> Apple Computer Inc. KeyLargo/Intrepid USB (#3)
> NEC Corporation USB
> NEC Corporation USB (#2)
Hrm... there is some problem in communication here. I asked you which
controller out of the 3 OHCIs you have in this machine is the culprit;
you gave me a list of all of them, but without PCI IDs... From the
archive, I think it was USB bus #4, no? I'm not sure which of these
controllers it matches.

The iBook G4 actually has 3 "Apple" OHCIs in KeyLargo/Intrepid, but with
2 of them disabled by the firmware (not wired), plus one NEC USB2
controller (which contains 1 EHCI and 2 OHCIs) on the PCI bus. The code
managing their sleep process is very different.
Ben.
On Mon, 2004-11-29 at 23:34 +0100, Colin Leroy wrote:
> On 30 Nov 2004 at 09h11, Benjamin Herrenschmidt wrote:
>
> Hi,
>
> > Hrm... there is some problem in communication here. I asked you which
> > controller out of the 3 OHCIs you have in this machine is the culprit,
> > you give me a list of all of them but without PCI IDs ... From the
> > archive, I think it was USB bus #4 no ? not sure which of these
> > controllers it matches.
> >
> > The iBook G4 has actually 3 "Apple" OHCI's in KeyLargo/Intrepid but
> > with 2 of them disabled by the firmware (not wired) plus one NEC USB2
> > controller (which contains 1 EHCI and 2 OHCIs) on the PCI bus. The
> > code managing their sleep process is very different.
>
> Sorry, i was away and had a problem of /proc/bus/usb being empty. As my
> link was on the wireless stick I couldn't reload usb modules. The
> culprit is usb 4-1, I think it would be this one (as the stick is bus
> 004 device 001):
Ok, this is a perfectly normal "off-the-shelf" NEC chip, no
special "Mac" thing in there; it just uses normal PCI PM...

It could be one of the devices not properly dealing with being
suspended, or it could be some delay needing to be increased here or
there in the resume process; difficult to say at this point.
Ben.
On 30 Nov 2004 at 09h11, Benjamin Herrenschmidt wrote:
Hi,
> Hrm... there is some problem in communication here. I asked you which
> controller out of the 3 OHCIs you have in this machine is the culprit,
> you give me a list of all of them but without PCI IDs ... From the
> archive, I think it was USB bus #4 no ? not sure which of these
> controllers it matches.
>
> The iBook G4 has actually 3 "Apple" OHCI's in KeyLargo/Intrepid but
> with 2 of them disabled by the firmware (not wired) plus one NEC USB2
> controller (which contains 1 EHCI and 2 OHCIs) on the PCI bus. The
> code managing their sleep process is very different.
Sorry, I was away and had a problem with /proc/bus/usb being empty. As my
link was on the wireless stick, I couldn't reload the usb modules. The
culprit is usb 4-1; I think it would be this one (the stick is on bus
004, whose root hub is device 001):
Bus 004 Device 001: ID 0000:0000
Device Descriptor:
  bLength 18
  bDescriptorType 1
  bcdUSB 10.01
  bDeviceClass 9 Hub
  bDeviceSubClass 0
  bDeviceProtocol 0
  bMaxPacketSize0 8
  idVendor 0x0000
  idProduct 0x0000
  bcdDevice 6.02
  iManufacturer 3 Linux 2.6.9 ohci_hcd
  iProduct 2 NEC Corporation USB (#2)
  iSerial 1 0001:10:1b.1
  bNumConfigurations 1
  Configuration Descriptor:
    bLength 9
    bDescriptorType 2
    wTotalLength 25
    bNumInterfaces 1
    bConfigurationValue 1
    iConfiguration 0
    bmAttributes 0xc0
      Self Powered
    MaxPower 0mA
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 0
      bNumEndpoints 1
      bInterfaceClass 9 Hub
      bInterfaceSubClass 0
      bInterfaceProtocol 0
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type none
        wMaxPacketSize 2
        bInterval 255
Language IDs: (length=4)
  0409 English(US)
--
Colin
On Monday 29 November 2004 2:43 pm, Benjamin Herrenschmidt wrote:
> On Mon, 2004-11-29 at 23:34 +0100, Colin Leroy wrote:
>
> Ok, this is a perfectly normal "out of the schelves" NEC chip, no
> special "Mac" thing in there, it just use normal PCI PM...
>
> It could be one of the devices not properly dealing with beeing
> suspended, or it could be some delay needing to be increased here or
> there in the resume process, difficult to say at this point.
Or as I said before, it's probably one of the issues fixed
in the USB PM patches in 2.6.10-rc2 ... really, it's not
even worth testing that with straight 2.6.9 kernels.