2019-12-20 03:00:24

by AceLan Kao

[permalink] [raw]
Subject: [PATCH] usb: hub: move resume delay at the head of all USB access functions

usb_control_msg() function should be called after the resume delay, or
you'll encounter the below errors sometime.
After the issue happens, have to re-plug the USB cable to recover.

[ 837.483573] hub 2-3:1.0: hub_ext_port_status failed (err = -71)
[ 837.490889] hub 2-3:1.0: hub_ext_port_status failed (err = -71)
[ 837.506780] usb 2-3-port4: cannot disable (err = -71)

Signed-off-by: AceLan Kao <[email protected]>
---
drivers/usb/core/hub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index f229ad6952c0..2fb2816b0d38 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3522,6 +3522,7 @@ int usb_port_resume(struct usb_device *udev, pm_message_t msg)
}
}

+ msleep(USB_RESUME_TIMEOUT);
usb_lock_port(port_dev);

/* Skip the initial Clear-Suspend step for a remote wakeup */
@@ -3544,7 +3545,6 @@ int usb_port_resume(struct usb_device *udev, pm_message_t msg)
/* drive resume for USB_RESUME_TIMEOUT msec */
dev_dbg(&udev->dev, "usb %sresume\n",
(PMSG_IS_AUTO(msg) ? "auto-" : ""));
- msleep(USB_RESUME_TIMEOUT);

/* Virtual root hubs can trigger on GET_PORT_STATUS to
* stop resume signaling. Then finish the resume
--
2.17.1


2019-12-20 15:50:00

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] usb: hub: move resume delay at the head of all USB access functions

On Fri, 20 Dec 2019, AceLan Kao wrote:

> usb_control_msg() function should be called after the resume delay, or

Which usb_control_msg() call are you referring to? Is it the call
under hub_port_status()?

> you'll encounter the below errors sometime.
> After the issue happens, have to re-plug the USB cable to recover.
>
> [ 837.483573] hub 2-3:1.0: hub_ext_port_status failed (err = -71)
> [ 837.490889] hub 2-3:1.0: hub_ext_port_status failed (err = -71)
> [ 837.506780] usb 2-3-port4: cannot disable (err = -71)

You need to do a better job of figuring out why these errors occur. It
is not connected to the resume delay; there must be a different reason.
Hint: This is the sort of error you would expect to see if the kernel
tried to resume a device while its parent hub was still suspended.

> Signed-off-by: AceLan Kao <[email protected]>
> ---
> drivers/usb/core/hub.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index f229ad6952c0..2fb2816b0d38 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -3522,6 +3522,7 @@ int usb_port_resume(struct usb_device *udev, pm_message_t msg)
> }
> }
>
> + msleep(USB_RESUME_TIMEOUT);

This makes no sense at all. At this point we haven't even started to
do the resume signalling, so there's no reason to wait for it to
finish.

> usb_lock_port(port_dev);
>
> /* Skip the initial Clear-Suspend step for a remote wakeup */
> @@ -3544,7 +3545,6 @@ int usb_port_resume(struct usb_device *udev, pm_message_t msg)
> /* drive resume for USB_RESUME_TIMEOUT msec */
> dev_dbg(&udev->dev, "usb %sresume\n",
> (PMSG_IS_AUTO(msg) ? "auto-" : ""));
> - msleep(USB_RESUME_TIMEOUT);

This is wrong also. At this point the resume signal _is_ being sent,
and the USB spec requires that we wait a minimum amount of time for the
device to fully resume.

Alan Stern

2019-12-23 04:33:55

by AceLan Kao

[permalink] [raw]
Subject: Re: [PATCH] usb: hub: move resume delay at the head of all USB access functions

Alan Stern <[email protected]> 於 2019年12月20日 週五 下午11:48寫道:
>
> On Fri, 20 Dec 2019, AceLan Kao wrote:
>
> > usb_control_msg() function should be called after the resume delay, or
>
> Which usb_control_msg() call are you referring to? Is it the call
> under hub_port_status()?
usb_port_resume() -> hub_port_status() -> hub_ext_port_status()
-> get_port_status() -> usb_control_msg()

>
> > you'll encounter the below errors sometime.
> > After the issue happens, have to re-plug the USB cable to recover.
> >
> > [ 837.483573] hub 2-3:1.0: hub_ext_port_status failed (err = -71)
> > [ 837.490889] hub 2-3:1.0: hub_ext_port_status failed (err = -71)
> > [ 837.506780] usb 2-3-port4: cannot disable (err = -71)
>
> You need to do a better job of figuring out why these errors occur. It
> is not connected to the resume delay; there must be a different reason.
> Hint: This is the sort of error you would expect to see if the kernel
> tried to resume a device while its parent hub was still suspended.
Once this error shows, the USB port doesn't work until re-plug the cable.
I have no idea what else I can do to this, do you have any idea that I
could try?
Thanks.

>
> > Signed-off-by: AceLan Kao <[email protected]>
> > ---
> > drivers/usb/core/hub.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > index f229ad6952c0..2fb2816b0d38 100644
> > --- a/drivers/usb/core/hub.c
> > +++ b/drivers/usb/core/hub.c
> > @@ -3522,6 +3522,7 @@ int usb_port_resume(struct usb_device *udev, pm_message_t msg)
> > }
> > }
> >
> > + msleep(USB_RESUME_TIMEOUT);
>
> This makes no sense at all. At this point we haven't even started to
> do the resume signalling, so there's no reason to wait for it to
> finish.
I thought the h/w need some time to be back to stable status when resuming.
>
> > usb_lock_port(port_dev);
> >
> > /* Skip the initial Clear-Suspend step for a remote wakeup */
> > @@ -3544,7 +3545,6 @@ int usb_port_resume(struct usb_device *udev, pm_message_t msg)
> > /* drive resume for USB_RESUME_TIMEOUT msec */
> > dev_dbg(&udev->dev, "usb %sresume\n",
> > (PMSG_IS_AUTO(msg) ? "auto-" : ""));
> > - msleep(USB_RESUME_TIMEOUT);
>
> This is wrong also. At this point the resume signal _is_ being sent,
> and the USB spec requires that we wait a minimum amount of time for the
> device to fully resume.
I don't see the difference that after the delay, it calls hub_port_status(), but
in the beginning of usb_port_resume() it call the same function, too.
So, I think it should be good to move the delay before the first
hub_port_status()
>
> Alan Stern
>

2019-12-24 15:24:14

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] usb: hub: move resume delay at the head of all USB access functions

On Mon, 23 Dec 2019, AceLan Kao wrote:

> > > you'll encounter the below errors sometime.
> > > After the issue happens, have to re-plug the USB cable to recover.
> > >
> > > [ 837.483573] hub 2-3:1.0: hub_ext_port_status failed (err = -71)
> > > [ 837.490889] hub 2-3:1.0: hub_ext_port_status failed (err = -71)
> > > [ 837.506780] usb 2-3-port4: cannot disable (err = -71)
> >
> > You need to do a better job of figuring out why these errors occur. It
> > is not connected to the resume delay; there must be a different reason.
> > Hint: This is the sort of error you would expect to see if the kernel
> > tried to resume a device while its parent hub was still suspended.
> Once this error shows, the USB port doesn't work until re-plug the cable.
> I have no idea what else I can do to this, do you have any idea that I
> could try?
> Thanks.

Maybe you can collect a usbmon trace showing what happens when the
error occurs. There's probably something going on that you aren't
aware of.

Alan Stern

2019-12-25 03:16:49

by AceLan Kao

[permalink] [raw]
Subject: Re: [PATCH] usb: hub: move resume delay at the head of all USB access functions

Here[1] are the dmesg and the usbmon log from wireshark, and
/sys/kernel/debug/usb/usbmon/0u.

I verified this issue on Dell XPS 13 + Dell Salomon WD19 docking
station(plug-in 3 USB disk on it)
After s2idle 7 times, 2 usb disks lost. But from wireshark log, the
packets look normal, no error.

So, I re-do the test again and log the usbmon/0u output, but it's greek to me.
Hope you can help to find some clues in the logs.
Thanks.

1. https://people.canonical.com/~acelan/bugs/usb_issue/

Alan Stern <[email protected]> 於 2019年12月24日 週二 下午11:23寫道:
>
> On Mon, 23 Dec 2019, AceLan Kao wrote:
>
> > > > you'll encounter the below errors sometime.
> > > > After the issue happens, have to re-plug the USB cable to recover.
> > > >
> > > > [ 837.483573] hub 2-3:1.0: hub_ext_port_status failed (err = -71)
> > > > [ 837.490889] hub 2-3:1.0: hub_ext_port_status failed (err = -71)
> > > > [ 837.506780] usb 2-3-port4: cannot disable (err = -71)
> > >
> > > You need to do a better job of figuring out why these errors occur. It
> > > is not connected to the resume delay; there must be a different reason.
> > > Hint: This is the sort of error you would expect to see if the kernel
> > > tried to resume a device while its parent hub was still suspended.
> > Once this error shows, the USB port doesn't work until re-plug the cable.
> > I have no idea what else I can do to this, do you have any idea that I
> > could try?
> > Thanks.
>
> Maybe you can collect a usbmon trace showing what happens when the
> error occurs. There's probably something going on that you aren't
> aware of.
>
> Alan Stern
>


Attachments:
dmesg_usb.log (119.33 kB)

2019-12-26 04:15:41

by Harry Pan

[permalink] [raw]
Subject: RE: [PATCH] usb: hub: move resume delay at the head of all USB access functions

Hi AceLan,

Would you mind to read this thread and evaluate whether it is helpful or not by kernel downgrade?
https://bugzilla.kernel.org/show_bug.cgi?id=202541

BTW, would you mind to the shared pcap file permission as well?

-Harry
________________________________________
$BWO(B: AceLan Kao [[email protected]]
$B4s7oF|4|(B: 2019$BG/(B12$B7n(B25$BF|(B $B>e8a(B 11:15
$B;j(B: Alan Stern
$BI{K\(B: Greg Kroah-Hartman; Kai-Heng Feng; Thinh Nguyen; Pan, Harry; David Heinzelmann; Andrey Konovalov; Nicolas Saenz Julienne; Mathieu Malaterre; [email protected]; Linux-Kernel@Vger. Kernel. Org
$B<g;](B: Re: [PATCH] usb: hub: move resume delay at the head of all USB access functions

Here[1] are the dmesg and the usbmon log from wireshark, and
/sys/kernel/debug/usb/usbmon/0u.

I verified this issue on Dell XPS 13 + Dell Salomon WD19 docking
station(plug-in 3 USB disk on it)
After s2idle 7 times, 2 usb disks lost. But from wireshark log, the
packets look normal, no error.

So, I re-do the test again and log the usbmon/0u output, but it's greek to me.
Hope you can help to find some clues in the logs.
Thanks.

1. https://people.canonical.com/~acelan/bugs/usb_issue/

2019-12-26 09:10:01

by AceLan Kao

[permalink] [raw]
Subject: Re: [PATCH] usb: hub: move resume delay at the head of all USB access functions

Sorry, I just corrected the file permission.

Pan, Harry <[email protected]> 於 2019年12月26日 週四 下午12:14寫道:
>
> Hi AceLan,
>
> Would you mind to read this thread and evaluate whether it is helpful or not by kernel downgrade?
> https://bugzilla.kernel.org/show_bug.cgi?id=202541
>
> BTW, would you mind to the shared pcap file permission as well?
>
> -Harry
> ________________________________________
> 從: AceLan Kao [[email protected]]
> 寄件日期: 2019年12月25日 上午 11:15
> 至: Alan Stern
> 副本: Greg Kroah-Hartman; Kai-Heng Feng; Thinh Nguyen; Pan, Harry; David Heinzelmann; Andrey Konovalov; Nicolas Saenz Julienne; Mathieu Malaterre; [email protected]; Linux-Kernel@Vger. Kernel. Org
> 主旨: Re: [PATCH] usb: hub: move resume delay at the head of all USB access functions
>
> Here[1] are the dmesg and the usbmon log from wireshark, and
> /sys/kernel/debug/usb/usbmon/0u.
>
> I verified this issue on Dell XPS 13 + Dell Salomon WD19 docking
> station(plug-in 3 USB disk on it)
> After s2idle 7 times, 2 usb disks lost. But from wireshark log, the
> packets look normal, no error.
>
> So, I re-do the test again and log the usbmon/0u output, but it's greek to me.
> Hope you can help to find some clues in the logs.
> Thanks.
>
> 1. https://people.canonical.com/~acelan/bugs/usb_issue/

2020-01-02 21:38:15

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] usb: hub: move resume delay at the head of all USB access functions

On Wed, 25 Dec 2019, AceLan Kao wrote:

> Here[1] are the dmesg and the usbmon log from wireshark, and
> /sys/kernel/debug/usb/usbmon/0u.
>
> I verified this issue on Dell XPS 13 + Dell Salomon WD19 docking
> station(plug-in 3 USB disk on it)
> After s2idle 7 times, 2 usb disks lost. But from wireshark log, the
> packets look normal, no error.
>
> So, I re-do the test again and log the usbmon/0u output, but it's greek to me.
> Hope you can help to find some clues in the logs.
> Thanks.
>
> 1. https://people.canonical.com/~acelan/bugs/usb_issue/

Maybe Mathias can help figure out what's going on. It clearly is an
xHCI-related problem of some sort.

I can't get much out of these logs. For one thing, the time period
covered by the usbmon trace is different from the time period in your
dmesg log. When you collect two kinds of logs for a test, it's
important that they should be collected at the _same_ time! Otherwise
they record different events, which is no use.

For another, your usbmon trace used the 0u file, but you should have
used the 2u file instead. All the errors you get involve devices on
bus 2; including other buses in the trace just makes it more confusing
and causes more entries to be dropped.

Also, I don't understand why your Dell Dock devices show up the way
they do. The SuperSpeed dock is on bus 2, and the high speed dock is
on bus 3, which is attached to a different xHCI controller! A hub
(which is what a dock is) should show up twice, and both instances
should be attached to the same controller.

In any case, it's clear that you've got some problem involving the xHCI
controller and the SuperSpeed Dell Dock (device 2-1). Here's an
excerpt from the dmesg log showing what goes wrong. This occurs during
a system resume; apparently the dock's connection drops while the
system is suspended and there are errors when the system tries to
re-activate it:

[ 721.507202] usb 2-1: USB disconnect, device number 32
[ 721.507206] usb 2-1.3: USB disconnect, device number 33
[ 721.507209] usb 2-1.3.1: USB disconnect, device number 35
[ 721.540214] usb 2-1.3.3: USB disconnect, device number 36
[ 721.584518] usb 2-1.4: USB disconnect, device number 34
[ 721.647466] xhci_hcd 0000:00:0d.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

That's the first problem.

[ 721.919433] usb 2-1: new SuperSpeedPlus Gen 2 USB device number 38 using xhci_hcd
[ 721.942597] usb 2-1: New USB device found, idVendor=0bda, idProduct=0487, bcdDevice= 1.47
[ 721.942600] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 721.942602] usb 2-1: Product: Dell dock
[ 721.942604] usb 2-1: Manufacturer: Dell Inc.
[ 721.944849] hub 2-1:1.0: USB hub found
[ 721.945232] hub 2-1:1.0: 4 ports detected
[ 722.178395] hub 2-1:1.0: hub_ext_port_status failed (err = -71)
[ 722.178570] usb 2-1-port3: attempt power cycle
[ 722.559212] usb 2-1: USB disconnect, device number 38
[ 722.559311] usb 2-1: Failed to suspend device, error -19

And that's a second error. Then the system tries again:

[ 722.831431] usb 2-1: new SuperSpeedPlus Gen 2 USB device number 43 using xhci_hcd
[ 722.854608] usb 2-1: New USB device found, idVendor=0bda, idProduct=0487, bcdDevice= 1.47
[ 722.854611] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 722.854614] usb 2-1: Product: Dell dock
[ 722.854616] usb 2-1: Manufacturer: Dell Inc.
[ 722.857034] hub 2-1:1.0: USB hub found
[ 722.857440] hub 2-1:1.0: 4 ports detected
[ 723.090106] hub 2-1:1.0: hub_ext_port_status failed (err = -71)
[ 723.090316] usb 2-1-port3: attempt power cycle
[ 723.471212] usb 2-1: USB disconnect, device number 43
[ 723.471274] usb 2-1: Failed to suspend device, error -19

And it fails in the same way. A third try ends up succeeding:

[ 723.743465] usb 2-1: new SuperSpeedPlus Gen 2 USB device number 48 using xhci_hcd
...

I don't see why the device numbers jump around in such a crazy manner.
When device 38 disconnects, the next device is number 43. And when
that fails, the next number is 48! It looks like device numbers are
somehow being allocated for the four child devices even though the log
doesn't show them being detected.

Here's a portion of the usbmon trace that apparently corresponds
somewhat to part of the log above, although it's hard to be certain.
In this trace, device 77 on bus 2 is 2-1. Unlike the log above, it
only has a child attached to port 3, not to port 4:

ffff9879b43ff840 291366827 S Ci:2:001:0 s a3 00 0000 0001 0004 4 <
ffff9879b43ff840 291366846 C Ci:2:001:0 0 4 = 03020000
ffff9879b2f54780 291469875 S Ci:2:077:0 s a3 00 0000 0001 0004 4 <
ffff9879b2f54780 291470062 C Ci:2:077:0 0 4 = a0020000
ffff9879b2f54780 291470069 S Ci:2:077:0 s a3 00 0000 0002 0004 4 <
ffff9879b2f54780 291470208 C Ci:2:077:0 0 4 = a0020000
ffff9879b2f54780 291470214 S Ci:2:077:0 s a3 00 0000 0003 0004 4 <
ffff9879b2f54780 291470387 C Ci:2:077:0 0 4 = 03021000
ffff9879b2f54780 291470392 S Co:2:077:0 s 23 01 0014 0003 0000 0
ffff9879b2f54780 291470527 C Co:2:077:0 0 0
ffff9879b2f54780 291470533 S Ci:2:077:0 s a3 00 0000 0004 0004 4 <
ffff9879b2f54780 291470709 C Ci:2:077:0 0 4 = a0020000
ffff9879b43ff480 291573804 S Ii:2:077:1 -115:128 2 <
ffff9879b2f54780 291573839 S Ci:2:077:0 s a3 00 0000 0003 0004 4 <
ffff9879b2f54780 291574068 C Ci:2:077:0 0 4 = 03020000
ffff9879b2f54780 291574166 S Ci:2:077:0 s a3 00 0000 0003 0004 4 <
ffff9879b2f54780 291574363 C Ci:2:077:0 0 4 = 03020000
ffff9879b2f54780 291574383 S Co:2:077:0 s 23 03 0004 0003 0000 0
ffff9879b2f54780 291574580 C Co:2:077:0 0 0
ffff9879b2f54780 291593804 S Ci:2:077:0 s a3 00 0002 0003 0008 8 <
ffff9879ae2969c0 291595804 C Ii:2:001:1 0:2048 1 = 02
ffff9879ae2969c0 291595824 S Ii:2:001:1 -115:2048 4 <
ffff9879ba3476c0 291595843 S Ci:2:001:0 s a3 00 0000 0001 0004 4 <
ffff9879ba3476c0 291595857 C Ci:2:001:0 0 4 = c0024100

As far as I can tell, the device was working fine until at this point
it disconnected itself.

ffff9879ba3476c0 291595863 S Co:2:001:0 s 23 01 0010 0001 0000 0
ffff9879ba3476c0 291595867 C Co:2:001:0 0 0
ffff9879ba3476c0 291595870 S Co:2:001:0 s 23 01 0019 0001 0000 0
ffff9879ba3476c0 291595875 C Co:2:001:0 0 0
ffff9879ba3476c0 291595879 S Co:2:001:0 s 23 03 001c 0001 0000 0
ffff9879ba3476c0 291595884 C Co:2:001:0 0 0
ffff9879b2f54780 291600949 C Ci:2:077:0 -71 0
ffff9879b2f54780 291600997 S Co:2:077:0 s 23 03 0004 0003 0000 0
ffff9879b2f54780 291601000 E Co:2:077:0 -19 0

Mathias, any suggestions on how Acelan can track down the reasons for
these problems?

Alan Stern

2020-01-03 05:28:55

by Kai-Heng Feng

[permalink] [raw]
Subject: Re: [PATCH] usb: hub: move resume delay at the head of all USB access functions

Hi Alan,

> On Jan 3, 2020, at 05:37, Alan Stern <[email protected]> wrote:
>
> On Wed, 25 Dec 2019, AceLan Kao wrote:
>
>> Here[1] are the dmesg and the usbmon log from wireshark, and
>> /sys/kernel/debug/usb/usbmon/0u.
>>
>> I verified this issue on Dell XPS 13 + Dell Salomon WD19 docking
>> station(plug-in 3 USB disk on it)
>> After s2idle 7 times, 2 usb disks lost. But from wireshark log, the
>> packets look normal, no error.
>>
>> So, I re-do the test again and log the usbmon/0u output, but it's greek to me.
>> Hope you can help to find some clues in the logs.
>> Thanks.
>>
>> 1. https://people.canonical.com/~acelan/bugs/usb_issue/
>
> Maybe Mathias can help figure out what's going on. It clearly is an
> xHCI-related problem of some sort.

I think I figured this issue out and I'll send out a patch series soon.

>
> I can't get much out of these logs. For one thing, the time period
> covered by the usbmon trace is different from the time period in your
> dmesg log. When you collect two kinds of logs for a test, it's
> important that they should be collected at the _same_ time! Otherwise
> they record different events, which is no use.
>
> For another, your usbmon trace used the 0u file, but you should have
> used the 2u file instead. All the errors you get involve devices on
> bus 2; including other buses in the trace just makes it more confusing
> and causes more entries to be dropped.
>
> Also, I don't understand why your Dell Dock devices show up the way
> they do. The SuperSpeed dock is on bus 2, and the high speed dock is
> on bus 3, which is attached to a different xHCI controller! A hub
> (which is what a dock is) should show up twice, and both instances
> should be attached to the same controller.

I think this is how Ice Lake xHCI works, the Thunderbolt xHCI is also part of SoC on Ice Lake.

>
> In any case, it's clear that you've got some problem involving the xHCI
> controller and the SuperSpeed Dell Dock (device 2-1). Here's an
> excerpt from the dmesg log showing what goes wrong. This occurs during
> a system resume; apparently the dock's connection drops while the
> system is suspended and there are errors when the system tries to
> re-activate it:
>
> [ 721.507202] usb 2-1: USB disconnect, device number 32
> [ 721.507206] usb 2-1.3: USB disconnect, device number 33
> [ 721.507209] usb 2-1.3.1: USB disconnect, device number 35
> [ 721.540214] usb 2-1.3.3: USB disconnect, device number 36
> [ 721.584518] usb 2-1.4: USB disconnect, device number 34
> [ 721.647466] xhci_hcd 0000:00:0d.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
>
> That's the first problem.
>
> [ 721.919433] usb 2-1: new SuperSpeedPlus Gen 2 USB device number 38 using xhci_hcd
> [ 721.942597] usb 2-1: New USB device found, idVendor=0bda, idProduct=0487, bcdDevice= 1.47
> [ 721.942600] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [ 721.942602] usb 2-1: Product: Dell dock
> [ 721.942604] usb 2-1: Manufacturer: Dell Inc.
> [ 721.944849] hub 2-1:1.0: USB hub found
> [ 721.945232] hub 2-1:1.0: 4 ports detected
> [ 722.178395] hub 2-1:1.0: hub_ext_port_status failed (err = -71)
> [ 722.178570] usb 2-1-port3: attempt power cycle
> [ 722.559212] usb 2-1: USB disconnect, device number 38
> [ 722.559311] usb 2-1: Failed to suspend device, error -19
>
> And that's a second error. Then the system tries again:
>
> [ 722.831431] usb 2-1: new SuperSpeedPlus Gen 2 USB device number 43 using xhci_hcd
> [ 722.854608] usb 2-1: New USB device found, idVendor=0bda, idProduct=0487, bcdDevice= 1.47
> [ 722.854611] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [ 722.854614] usb 2-1: Product: Dell dock
> [ 722.854616] usb 2-1: Manufacturer: Dell Inc.
> [ 722.857034] hub 2-1:1.0: USB hub found
> [ 722.857440] hub 2-1:1.0: 4 ports detected
> [ 723.090106] hub 2-1:1.0: hub_ext_port_status failed (err = -71)
> [ 723.090316] usb 2-1-port3: attempt power cycle
> [ 723.471212] usb 2-1: USB disconnect, device number 43
> [ 723.471274] usb 2-1: Failed to suspend device, error -19
>
> And it fails in the same way. A third try ends up succeeding:
>
> [ 723.743465] usb 2-1: new SuperSpeedPlus Gen 2 USB device number 48 using xhci_hcd
> ...
>
> I don't see why the device numbers jump around in such a crazy manner.
> When device 38 disconnects, the next device is number 43. And when
> that fails, the next number is 48! It looks like device numbers are
> somehow being allocated for the four child devices even though the log
> doesn't show them being detected.
>
> Here's a portion of the usbmon trace that apparently corresponds
> somewhat to part of the log above, although it's hard to be certain.
> In this trace, device 77 on bus 2 is 2-1. Unlike the log above, it
> only has a child attached to port 3, not to port 4:
>
> ffff9879b43ff840 291366827 S Ci:2:001:0 s a3 00 0000 0001 0004 4 <
> ffff9879b43ff840 291366846 C Ci:2:001:0 0 4 = 03020000
> ffff9879b2f54780 291469875 S Ci:2:077:0 s a3 00 0000 0001 0004 4 <
> ffff9879b2f54780 291470062 C Ci:2:077:0 0 4 = a0020000
> ffff9879b2f54780 291470069 S Ci:2:077:0 s a3 00 0000 0002 0004 4 <
> ffff9879b2f54780 291470208 C Ci:2:077:0 0 4 = a0020000
> ffff9879b2f54780 291470214 S Ci:2:077:0 s a3 00 0000 0003 0004 4 <
> ffff9879b2f54780 291470387 C Ci:2:077:0 0 4 = 03021000
> ffff9879b2f54780 291470392 S Co:2:077:0 s 23 01 0014 0003 0000 0
> ffff9879b2f54780 291470527 C Co:2:077:0 0 0
> ffff9879b2f54780 291470533 S Ci:2:077:0 s a3 00 0000 0004 0004 4 <
> ffff9879b2f54780 291470709 C Ci:2:077:0 0 4 = a0020000
> ffff9879b43ff480 291573804 S Ii:2:077:1 -115:128 2 <
> ffff9879b2f54780 291573839 S Ci:2:077:0 s a3 00 0000 0003 0004 4 <
> ffff9879b2f54780 291574068 C Ci:2:077:0 0 4 = 03020000
> ffff9879b2f54780 291574166 S Ci:2:077:0 s a3 00 0000 0003 0004 4 <
> ffff9879b2f54780 291574363 C Ci:2:077:0 0 4 = 03020000
> ffff9879b2f54780 291574383 S Co:2:077:0 s 23 03 0004 0003 0000 0
> ffff9879b2f54780 291574580 C Co:2:077:0 0 0
> ffff9879b2f54780 291593804 S Ci:2:077:0 s a3 00 0002 0003 0008 8 <
> ffff9879ae2969c0 291595804 C Ii:2:001:1 0:2048 1 = 02
> ffff9879ae2969c0 291595824 S Ii:2:001:1 -115:2048 4 <
> ffff9879ba3476c0 291595843 S Ci:2:001:0 s a3 00 0000 0001 0004 4 <
> ffff9879ba3476c0 291595857 C Ci:2:001:0 0 4 = c0024100
>
> As far as I can tell, the device was working fine until at this point
> it disconnected itself.
>
> ffff9879ba3476c0 291595863 S Co:2:001:0 s 23 01 0010 0001 0000 0
> ffff9879ba3476c0 291595867 C Co:2:001:0 0 0
> ffff9879ba3476c0 291595870 S Co:2:001:0 s 23 01 0019 0001 0000 0
> ffff9879ba3476c0 291595875 C Co:2:001:0 0 0
> ffff9879ba3476c0 291595879 S Co:2:001:0 s 23 03 001c 0001 0000 0
> ffff9879ba3476c0 291595884 C Co:2:001:0 0 0
> ffff9879b2f54780 291600949 C Ci:2:077:0 -71 0
> ffff9879b2f54780 291600997 S Co:2:077:0 s 23 03 0004 0003 0000 0
> ffff9879b2f54780 291601000 E Co:2:077:0 -19 0
>
> Mathias, any suggestions on how Acelan can track down the reasons for
> these problems?

There are three issues for this hub:
- It may not be in U3 after suspend
- It may not be in U0 after resume
- It disconnects when setting downstream ports to U0 while the hub itself is not in U0.

I'll send a patch series soon.

Kai-Heng

>
> Alan Stern