2006-09-15 22:14:25

by Rafael J. Wysocki

Subject: 2.6.18-rc6-mm2 (-mm1): ohci_hcd sometimes does not initialize properly on x86_64

Hi,

It looks like the ohci_hcd driver sometimes has problems with
initialization (e.g. a USB mouse doesn't work after a fresh boot, and
reloading the driver helps).

I have observed this on two different x86_64 boxes (HPC 6325, Asus L5D),
but it is not readily reproducible. Anyway I've got a dmesg output from a
failing case which is attached.

Greetings,
Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller


Attachments:
(No filename) (457.00 B)
dmesg.log.gz (8.13 kB)

2006-09-16 08:14:54

by Rafael J. Wysocki

Subject: 2.6.18-rc6-mm2 (-mm1): ohci_hcd does not recognize new devices

Hi,

On Saturday, 16 September 2006 00:13, Rafael J. Wysocki wrote:
> It looks like the ohci_hcd driver sometimes has problems with the
> initialization (eg. USB mouse doesn't work after a fresh boot and reloading
> of the driver helps).
>
> I have observed this on two different x86_64 boxes (HPC 6325, Asus L5D),
> but it is not readily reproducible. Anyway I've got a dmesg output from a
> failing case which is attached.

Actually, the problem is ohci_hcd doesn't seem to recognize devices plugged
into the USB ports.

For example, if I unplug and replug a mouse (that worked before unplugging),
it doesn't work any more. I have to reload ohci_hcd to make it work again.

This is 100% reproducible and occurs on the two boxes above.

Appended is a snippet from a dmesg output that I think is relevant to this
issue. It covers the unplugging and replugging of a USB mouse (there are no
more USB-related messages in the dmesg).

Greetings,
Rafael


hub 3-0:1.0: state 7 ports 4 chg 0000 evt 0010
ohci_hcd 0000:00:13.1: GetStatus roothub.portstatus [3] = 0x00030100 PESC CSC PPS
hub 3-0:1.0: port 4, status 0100, change 0003, 12 Mb/s
usb 3-4: USB disconnect, address 2
usb 3-4: unregistering device
usb 3-4: usb_disable_device nuking all URBs
ohci_hcd 0000:00:13.1: shutdown urb ffff81002f77d4b8 pipe 40408280 ep1in-intr
usb 3-4: unregistering interface 3-4:1.0
PM: Removing info for No Bus:usbdev3.2_ep81
usbdev3.2_ep81: ep_device_release called for usbdev3.2_ep81
PM: Removing info for usb:3-4:1.0
usb 3-4:1.0: uevent
PM: Removing info for No Bus:usbdev3.2
PM: Removing info for No Bus:usbdev3.2_ep00
usbdev3.2_ep00: ep_device_release called for usbdev3.2_ep00
PM: Removing info for usb:3-4
usb 3-4: uevent
hub 3-0:1.0: debounce: port 4: total 100ms stable 100ms status 0x100
hub 1-0:1.0: state 7 ports 8 chg 0000 evt 0100
ehci_hcd 0000:00:13.2: GetStatus port 8 status 001403 POWER sig=k CSC CONNECT
hub 1-0:1.0: port 8, status 0501, change 0001, 480 Mb/s
hub 1-0:1.0: debounce: port 8: total 100ms stable 100ms status 0x501
ehci_hcd 0000:00:13.2: port 8 low speed --> companion
ehci_hcd 0000:00:13.2: GetStatus port 8 status 003002 POWER OWNER sig=se0 CSC
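
For readers following the log, the flag names printed next to the OHCI register value can be recovered from the raw number. The following is a minimal Python sketch, assuming the HcRhPortStatus bit layout from the OHCI specification; the helper names `decode_ohci_portstatus` and `split_hub_status` are made up for illustration and are not kernel functions. It also assumes, as the two adjacent log lines suggest, that the hub-level "status" and "change" words are the low and high halves of the same register value.

```python
# Bit positions of HcRhPortStatus, per the OHCI specification.
OHCI_PORT_BITS = {
    0: "CCS",    # current connect status
    1: "PES",    # port enable status
    2: "PSS",    # port suspend status
    3: "POCI",   # port over-current indicator
    4: "PRS",    # port reset status
    8: "PPS",    # port power status
    9: "LSDA",   # low-speed device attached
    16: "CSC",   # connect status change
    17: "PESC",  # port enable status change
    18: "PSSC",  # port suspend status change
    19: "OCIC",  # over-current indicator change
    20: "PRSC",  # port reset status change
}


def decode_ohci_portstatus(value):
    """Return the names of the set bits, highest bit first (illustrative helper)."""
    return [name
            for bit, name in sorted(OHCI_PORT_BITS.items(), reverse=True)
            if value & (1 << bit)]


def split_hub_status(value):
    """Split the 32-bit value into (status, change) 16-bit halves."""
    return value & 0xFFFF, value >> 16


# Value taken from the ohci_hcd line in the log above.
print(decode_ohci_portstatus(0x00030100))
print("status %04x, change %04x" % split_hub_status(0x00030100))
```

Read this way, the port is powered (PPS) and reports both a connect change and an enable change, but no device is currently connected (CCS is clear), which is consistent with the disconnect handling that follows in the log.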


--
You never change things by fighting the existing reality.
R. Buckminster Fuller

2006-09-18 06:24:26

by Rafael J. Wysocki

Subject: Re: 2.6.18-rc6-mm2 (-mm1): ohci_hcd does not recognize new devices

On Saturday, 16 September 2006 10:13, Rafael J. Wysocki wrote:
> On Saturday, 16 September 2006 00:13, Rafael J. Wysocki wrote:
>
> > It looks like the ohci_hcd driver sometimes has problems with the
> > initialization (eg. USB mouse doesn't work after a fresh boot and reloading
> > of the driver helps).
> >
> > I have observed this on two different x86_64 boxes (HPC 6325, Asus L5D),
> > but it is not readily reproducible. Anyway I've got a dmesg output from a
> > failing case which is attached.
>
> Actually, the problem is ohci_hcd doesn't seem to recognize devices plugged
> into the USB ports.
>
> For example, if I unplug and replug a mouse (that worked before unplugging),
> it doesn't work any more. I have to reload ohci_hcd to make it work again.
>
> This is 100% reproducible and occurs on the two boxes above.

I have carried out a binary search and found that the problem is caused by

gregkh-usb-usbcore-remove-usb_suspend_root_hub.patch
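
The binary search over the patch series can be sketched roughly as follows. This is only an illustration under stated assumptions: the -mm series is treated as an ordered list of patch names, and `is_broken` is a hypothetical callback standing in for "build a kernel with the first n patches applied, boot it, then unplug and replug the mouse". It further assumes the failure is introduced by exactly one patch and persists once that patch is applied.

```python
def first_bad_patch(series, is_broken):
    """Return the patch whose application first makes the failure appear.

    series    -- ordered list of patch names (hypothetical input)
    is_broken -- hypothetical test: is_broken(n) is True when a kernel with
                 the first n patches applied shows the failure
    Assumes is_broken(0) is False and is_broken(len(series)) is True.
    """
    good, bad = 0, len(series)   # invariant: prefix 'good' works, prefix 'bad' fails
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_broken(mid):
            bad = mid            # failure already present: culprit is at or before mid
        else:
            good = mid           # still fine: culprit comes after mid
    return series[bad - 1]       # the patch that turned a good prefix into a bad one


# Placeholder names only; the real series has many more patches.
series = ["a.patch", "b.patch", "c.patch", "d.patch", "e.patch"]
print(first_bad_patch(series, lambda n: n >= 3))
```

Each iteration halves the remaining range, so the number of kernels to build and test grows only with the logarithm of the series length.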

Greetings,
Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller

2006-09-18 06:51:00

by Jan De Luyck

Subject: Re: 2.6.18-rc6-mm2 (-mm1): ohci_hcd does not recognize new devices

On Monday 18 September 2006 08:27, Rafael J. Wysocki wrote:
> On Saturday, 16 September 2006 10:13, Rafael J. Wysocki wrote:
> > On Saturday, 16 September 2006 00:13, Rafael J. Wysocki wrote:
> > > It looks like the ohci_hcd driver sometimes has problems with the
> > > initialization (eg. USB mouse doesn't work after a fresh boot and
> > > reloading of the driver helps).
> > >
> > > I have observed this on two different x86_64 boxes (HPC 6325, Asus
> > > L5D), but it is not readily reproducible. Anyway I've got a dmesg
> > > output from a failing case which is attached.
> >
> > Actually, the problem is ohci_hcd doesn't seem to recognize devices
> > plugged into the USB ports.
> >
> > For example, if I unplug and replug a mouse (that worked before
> > unplugging), it doesn't work any more. I have to reload ohci_hcd to make
> > it work again.
> >
> > This is 100% reproducible and occurs on the two boxes above.

I can confirm this behaviour. I've also seen that sometimes my USB
keyboard/mouse doesn't work after booting up. Reloading the module solves the
problem.

This is on an amd64 box, ABIT kn9-sli, nForce 550.

This is with 2.6.17.13.

> I have carried out a binary search and found that the problem is caused by
>
> gregkh-usb-usbcore-remove-usb_suspend_root_hub.patch

Will this work against 2.6.17.13 vanilla?

Thanks,

Jan
--
QOTD:
"If I could walk that way, I wouldn't need the cologne, now would I?"

2006-09-18 11:17:06

by Rafael J. Wysocki

Subject: Re: 2.6.18-rc6-mm2 (-mm1): ohci_hcd does not recognize new devices

On Monday, 18 September 2006 08:50, Jan De Luyck wrote:
> On Monday 18 September 2006 08:27, Rafael J. Wysocki wrote:
> > On Saturday, 16 September 2006 10:13, Rafael J. Wysocki wrote:
> > > On Saturday, 16 September 2006 00:13, Rafael J. Wysocki wrote:
> > > > It looks like the ohci_hcd driver sometimes has problems with the
> > > > initialization (eg. USB mouse doesn't work after a fresh boot and
> > > > reloading of the driver helps).
> > > >
> > > > I have observed this on two different x86_64 boxes (HPC 6325, Asus
> > > > L5D), but it is not readily reproducible. Anyway I've got a dmesg
> > > > output from a failing case which is attached.
> > >
> > > Actually, the problem is ohci_hcd doesn't seem to recognize devices
> > > plugged into the USB ports.
> > >
> > > For example, if I unplug and replug a mouse (that worked before
> > > unplugging), it doesn't work any more. I have to reload ohci_hcd to make
> > > it work again.
> > >
> > > This is 100% reproducible and occurs on the two boxes above.
>
> I can confirm this behaviour. I've also seen that sometimes my USB
> keyboard/mouse doesn't work after booting up. Reloading the module solves the
> problem.
>
> This is on an amd64 box, ABIT kn9-sli, nForce 550.
>
> This is with 2.6.17.13.
>
> > I have carried out a binary search and found that the problem is caused by
> >
> > gregkh-usb-usbcore-remove-usb_suspend_root_hub.patch
>
> Will this work against 2.6.17.13 vanilla?

No, this patch is not present in vanilla.

Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller

2006-09-18 15:07:15

by Alan Stern

Subject: Re: 2.6.18-rc6-mm2 (-mm1): ohci_hcd does not recognize new devices

On Mon, 18 Sep 2006, Rafael J. Wysocki wrote:

> > Actually, the problem is ohci_hcd doesn't seem to recognize devices plugged
> > into the USB ports.
> >
> > For example, if I unplug and replug a mouse (that worked before unplugging),
> > it doesn't work any more. I have to reload ohci_hcd to make it work again.
> >
> > This is 100% reproducible and occurs on the two boxes above.
>
> I have carried out a binary search and found that the problem is caused by
>
> gregkh-usb-usbcore-remove-usb_suspend_root_hub.patch

Tell me, what happens if you leave that patch installed, and you use
the patch I sent last week (the one that removes a chunk of code from
ohci-hub.c), and you also set CONFIG_USB_SUSPEND?

I think the real underlying problem here is that David's implementation of
root-hub suspend in ohci-hcd is incompatible with the overall scheme I've
been working on. In the end I'll probably have to rewrite the ohci-hcd
code.

Alan Stern

2006-09-18 20:47:58

by Rafael J. Wysocki

Subject: Re: 2.6.18-rc6-mm2 (-mm1): ohci_hcd does not recognize new devices

On Monday, 18 September 2006 17:07, Alan Stern wrote:
> On Mon, 18 Sep 2006, Rafael J. Wysocki wrote:
>
> > > Actually, the problem is ohci_hcd doesn't seem to recognize devices plugged
> > > into the USB ports.
> > >
> > > For example, if I unplug and replug a mouse (that worked before unplugging),
> > > it doesn't work any more. I have to reload ohci_hcd to make it work again.
> > >
> > > This is 100% reproducible and occurs on the two boxes above.
> >
> > I have carried out a binary search and found that the problem is caused by
> >
> > gregkh-usb-usbcore-remove-usb_suspend_root_hub.patch
>
> Tell me, what happens if you leave that patch installed, and you use
> the patch I sent last week (the one that removes a chunk of code from
> ohci-hub.c), and you also set CONFIG_USB_SUSPEND?

The problem continues to happen.

Moreover, if I revert the above patch and apply the patch removing code
from ohci-hub.c, the problem reappears.

> I think the real underlying problem here is that David's implementation of
> root-hub suspend in ohci-hcd is incompatible with the overall scheme I've
> been working on. In the end I'll probably have to rewrite the ohci-hcd
> code.

Well, at this point I can only help you by testing some code. ;-)

Seriously, if you have any new patches to test, please let me know.

Greetings,
Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller

2006-09-18 21:16:33

by Alan Stern

Subject: Re: 2.6.18-rc6-mm2 (-mm1): ohci_hcd does not recognize new devices

On Mon, 18 Sep 2006, Rafael J. Wysocki wrote:

> > > I have carried out a binary search and found that the problem is caused by
> > >
> > > gregkh-usb-usbcore-remove-usb_suspend_root_hub.patch
> >
> > Tell me, what happens if you leave that patch installed, and you use
> > the patch I sent last week (the one that removes a chunk of code from
> > ohci-hub.c), and you also set CONFIG_USB_SUSPEND?
>
> The problem continues to happen.
>
> Moreover, if I revert the above patch and apply the patch removing code
> from ohci-hub.c, the problem reappears.

Very strange.

> > I think the real underlying problem here is that David's implementation of
> > root-hub suspend in ohci-hcd is incompatible with the overall scheme I've
> > been working on. In the end I'll probably have to rewrite the ohci-hcd
> > code.
>
> Well, at this point I can only help you by testing some code. ;-)
>
> Seriously, if you have any new patches to test, please let me know.

I definitely will. However they won't be ready for a few days...

Alan Stern

2006-09-19 02:10:04

by David Brownell

Subject: Re: 2.6.18-rc6-mm2 (-mm1): ohci_hcd sometimes does not initialize properly on x86_64

On Friday 15 September 2006 3:13 pm, Rafael J. Wysocki wrote:
> Hi,
>
> It looks like the ohci_hcd driver sometimes has problems with the
> initialization (eg. USB mouse doesn't work after a fresh boot and reloading
> of the driver helps).
>
> I have observed this on two different x86_64 boxes (HPC 6325, Asus L5D),
> but it is not readily reproducible. Anyway I've got a dmesg output from a
> failing case which is attached.

Where I've seen such issues in the past has been with one specific
device: a UPS that seems unhappy if it doesn't get a VBUS power cycle,
so that OHCI implementations that don't implement power switching are
bad choices for connecting that particular UPS.

I believe that's not the issue in your case. I compared the boot
sequence you sent with one for the NF3-150 I use a lot (also x86_64)
which does not exhibit this failure, and the differences I noticed
were:

- NOCP set in roothub.a ... your BIOS reports no overcurrent protection
- different 2.6.18 prepatches ... you used rc6-mm2, not rc7
- different irqs (you used PIC not IOAPIC)
- driver registration sequence different ... I registered EHCI first
- yours came _up_ with RHSC irq pending on one root (device present)

And re those last two, it didn't finish mouse enumeration with OHCI before
starting to do it with EHCI. I could easily see how that would lead to
timing-dependent/intermittent failures.

Now, registering EHCI first is not "supposed" to matter, but I'm thinking
it started to matter a while back, since a few folk have reported as much.

One suspicion being that some of the hub driver changes have had some bad
consequences. (My suspicions there were highlighted by noticing some of
the misbehavior associated with an embedded USB controller I was testing,
which provoked failures in those same code paths...) The root hub handoff
relies on the usb/core/hub.c code to do the right things, notably treating
disconnect-during-reset (handoff to companion) as routine, but I think I
noticed that fault handling logic has changed.
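
The handoff to the companion controller is visible in the two ehci_hcd GetStatus values from the replug log posted earlier in this thread (001403, then 003002). The following is a minimal Python sketch for decoding them, assuming the PORTSC bit layout from the EHCI specification; only a subset of the bits is listed, and the names `decode_ehci_portsc` and the "undef" label are made up for illustration.

```python
# Selected PORTSC bits, per the EHCI specification (not a complete list).
EHCI_PORT_FLAGS = [
    (1 << 13, "OWNER"),    # port handed to the companion (OHCI/UHCI) controller
    (1 << 12, "POWER"),    # port power
    (1 << 8, "RESET"),     # port reset in progress
    (1 << 2, "PE"),        # port enabled
    (1 << 1, "CSC"),       # connect status change
    (1 << 0, "CONNECT"),   # current connect status
]

# Line status lives in bits 11:10; a K state before reset means low speed.
LINE_STATUS = {0: "se0", 1: "k", 2: "j", 3: "undef"}


def decode_ehci_portsc(value):
    """Return (list of set flag names, line-status name) for a PORTSC value."""
    flags = [name for mask, name in EHCI_PORT_FLAGS if value & mask]
    return flags, LINE_STATUS[(value >> 10) & 3]


# Values taken from the ehci_hcd lines in the replug log.
for raw in (0x001403, 0x003002):
    print("%06x" % raw, decode_ehci_portsc(raw))
```

The first value shows a connected device signalling a K state, i.e. low speed, which EHCI cannot drive itself. The second shows the OWNER bit set and CONNECT cleared: from the EHCI side the port looks disconnected, and the device is expected to show up on the OHCI root hub instead.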

At any rate, that suggests a few experiments to me.

- First, does this still show up with the stock RC7 code? There are
  a bunch of IMO rather experimental USB patches in the MM tree...
  including several affecting usbcore hub support.

- Second, does it appear without EHCI loaded? If not, that would
  tend to confirm an issue in the usbcore hub driver handoff logic.

- Third, does it appear if EHCI is loaded _first_ (as the distro
  should already have been doing just to avoid thrashing during
  system startup)? Similar comment re previous experiment, though
  it'd provide a potential workaround.

I'd kind of suspect that the generic RC7 code, with EHCI loaded first
as it should be, would "just work".

- Dave



2006-09-19 20:50:37

by Rafael J. Wysocki

Subject: Re: 2.6.18-rc6-mm2 (-mm1): ohci_hcd sometimes does not initialize properly on x86_64

On Tuesday, 19 September 2006 02:04, David Brownell wrote:
> On Friday 15 September 2006 3:13 pm, Rafael J. Wysocki wrote:
> > Hi,
> >
> > It looks like the ohci_hcd driver sometimes has problems with the
> > initialization (eg. USB mouse doesn't work after a fresh boot and reloading
> > of the driver helps).
> >
> > I have observed this on two different x86_64 boxes (HPC 6325, Asus L5D),
> > but it is not readily reproducible. Anyway I've got a dmesg output from a
> > failing case which is attached.
>
> Where I've seen such issues in the past has been with one specific
> device: a UPS that seems unhappy if it doesn't get a VBUS power cycle,
> so that OHCI implementations that don't implement power switching are
> bad choices for connecting that particular UPS.
>
> I believe that's not the issue in your case. I compared the boot
> sequence you sent with one for the NF3-150 I use a lot (also x86_64)
> which does not exhibit this failure, and the differences I noticed
> were:
>
> - NOCP set in roothub.a ... your BIOS reports no overcurrent protection
> - different 2.6.18 prepatches ... you used rc6-mm2, not rc7
> - different irqs (you used PIC not IOAPIC)
> - driver registration sequence different ... I registered EHCI first
> - yours came _up_ with RHSC irq pending on one root (device present)
>
> And re those last two, it didn't finish mouse enumeration with OHCI before
> starting to do it with EHCI. I could easily see how that would lead to
> timing-dependent/intermittent failures.
>
> Now, registering EHCI first is not "supposed" to matter, but I'm thinking
> it started to matter a while back, since a few folk have reported as much.
>
> One suspicion being that some of the hub driver changes have had some bad
> consequences. (My suspicions there were highlighted by noticing some of
> the misbehavior associated with an embedded USB controller I was testing,
> which provoked failures in those same code paths...) The root hub handoff
> relies on the usb/core/hub.c code to do the right things, notably treating
> disconnect-during-reset (handoff to companion) as routine, but I think I
> noticed that fault handling logic has changed.
>
> At any rate, that suggests a few experiments to me.
>
> - First, does this still show up with the stock RC7 code? There are
> a bunch of IMO rather experimental USB patches in the MM tree...
> including several affecting usbcore hub support.
>
> - Second does it appear without EHCI loaded? If not, that would
> tend to confirm an issue usbcore hub driver handoff logic.
>
> - Third, does it appear if EHCI is loaded _first_ (as the distro
> should already have been doing just to avoid thrashing during
> system startup)? Similar comment re previous experiment, though
> it'd provide a potential workaround.
>
> I'd kind of suspect that the generic RC7 code, with EHCI loaded first
> as it should be, would "just work".

Yes, I think the problem resulted from the experimental patches in -mm.

Greetings,
Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller