2017-07-24 12:03:46

by Zdenek Kabelac

[permalink] [raw]
Subject: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

Hi

I've problem with my USB storage devices: WD Elements 1TB.
(Bus 004 Device 002: ID 1058:10a8 Western Digital Technologies, Inc. Elements
Portable (WDBUZG))


After kernel >4.9 when disk is attached via cable it has very low speed
(less then 1MB/s).

It can run at full speed (>22MB/s) when the Linux kernel is fully rebooted (so
disk is attached during the reboot of Lenovo T61, C2D, Fedora Rawhide).

However when >4.9 kernel is running and disk is just attached it's very slow.

I've played a bisect game - and the clean result has been:

22547c4cc4fe20698a6a85a55b8788859134b8e4

When I just revert this patch with 4.13-rc1 - it's again running with full
speed even when disk is attached (thus no reboot is needed for full speed).


So while I've no idea what 22547c4cc4fe20698... is doing, it seems to have
some unpleasant side-effect on regular USB devices.

So what else is needed to get this properly working ?
(assuming plain revert of 22547c4cc4fe20698 is unwanted).

What more info can I provide to get this storage 'normally' usable without
rebooting the machine.

Regards

Zdenek


2017-07-24 14:41:18

by Alan Stern

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

On Mon, 24 Jul 2017, Zdenek Kabelac wrote:

> Hi
>
> I've problem with my USB storage devices: WD Elements 1TB.
> (Bus 004 Device 002: ID 1058:10a8 Western Digital Technologies, Inc. Elements
> Portable (WDBUZG))
>
>
> After kernel >4.9 when disk is attached via cable it has very low speed
> (less then 1MB/s).
>
> It can run at full speed (>22MB/s) when the Linux kernel is fully rebooted (so
> disk is attached during the reboot of Lenovo T61, C2D, Fedora Rawhide).
>
> However when >4.9 kernel is running and disk is just attached it's very slow.
>
> I've played a bisect game - and the clean result has been:
>
> 22547c4cc4fe20698a6a85a55b8788859134b8e4
>
> When I just revert this patch with 4.13-rc1 - it's again running with full
> speed even when disk is attached (thus no reboot is needed for full speed).
>
>
> So while I've no idea what 22547c4cc4fe20698... is doing, it seems to have
> some unpleasant side-effect on regular USB devices.
>
> So what else is needed to get this properly working ?
> (assuming plain revert of 22547c4cc4fe20698 is unwanted).
>
> What more info can I provide to get this storage 'normally' usable without
> rebooting the machine.

Please post the dmesg logs showing what happens when the disk is
first attached and operates slowly, and what happens when the disk is
attached following a reboot and operates normally.

Alan Stern

2017-07-24 21:04:11

by Guenter Roeck

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

On Mon, Jul 24, 2017 at 10:41:12AM -0400, Alan Stern wrote:
> On Mon, 24 Jul 2017, Zdenek Kabelac wrote:
>
> > Hi
> >
> > I've problem with my USB storage devices: WD Elements 1TB.
> > (Bus 004 Device 002: ID 1058:10a8 Western Digital Technologies, Inc. Elements
> > Portable (WDBUZG))
> >
> >
> > After kernel >4.9 when disk is attached via cable it has very low speed
> > (less then 1MB/s).
> >
> > It can run at full speed (>22MB/s) when the Linux kernel is fully rebooted (so
> > disk is attached during the reboot of Lenovo T61, C2D, Fedora Rawhide).
> >
> > However when >4.9 kernel is running and disk is just attached it's very slow.
> >
> > I've played a bisect game - and the clean result has been:
> >
> > 22547c4cc4fe20698a6a85a55b8788859134b8e4
> >
> > When I just revert this patch with 4.13-rc1 - it's again running with full
> > speed even when disk is attached (thus no reboot is needed for full speed).
> >
> >
> > So while I've no idea what 22547c4cc4fe20698... is doing, it seems to have
> > some unpleasant side-effect on regular USB devices.
> >
> > So what else is needed to get this properly working ?
> > (assuming plain revert of 22547c4cc4fe20698 is unwanted).
> >
> > What more info can I provide to get this storage 'normally' usable without
> > rebooting the machine.
>
> Please post the dmesg logs showing what happens when the disk is
> first attached and operates slowly, and what happens when the disk is
> attached following a reboot and operates normally.
>

Only idea I came up with is that some device may require more than one
port reset. Agreed, dmesg logs should give us an idea if this is the case.

I ordered one of those drives; maybe I can reproduce the problem.

Guenter

2017-07-25 10:06:22

by Zdenek Kabelac

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

Dne 24.7.2017 v 16:41 Alan Stern napsal(a):
> On Mon, 24 Jul 2017, Zdenek Kabelac wrote:
>
>> Hi
>>
>> I've problem with my USB storage devices: WD Elements 1TB.
>> (Bus 004 Device 002: ID 1058:10a8 Western Digital Technologies, Inc. Elements
>> Portable (WDBUZG))
>>
>>
>> After kernel >4.9 when disk is attached via cable it has very low speed
>> (less then 1MB/s).
>>
>> It can run at full speed (>22MB/s) when the Linux kernel is fully rebooted (so
>> disk is attached during the reboot of Lenovo T61, C2D, Fedora Rawhide).
>>
>> However when >4.9 kernel is running and disk is just attached it's very slow.
>>
>> I've played a bisect game - and the clean result has been:
>>
>> 22547c4cc4fe20698a6a85a55b8788859134b8e4
>>
>> When I just revert this patch with 4.13-rc1 - it's again running with full
>> speed even when disk is attached (thus no reboot is needed for full speed).
>>
>>
>> So while I've no idea what 22547c4cc4fe20698... is doing, it seems to have
>> some unpleasant side-effect on regular USB devices.
>>
>> So what else is needed to get this properly working ?
>> (assuming plain revert of 22547c4cc4fe20698 is unwanted).
>>
>> What more info can I provide to get this storage 'normally' usable without
>> rebooting the machine.
>
> Please post the dmesg logs showing what happens when the disk is
> first attached and operates slowly, and what happens when the disk is
> attached following a reboot and operates normally.

So I'm attaching kernel traces from kernel 4.8 & 4.12 from T61.

Both are from full boot (all kernel: messages)

In both cases - boot was with USB WD disk attached -
then I've detached USB disk and reattached again.

On 4.8 this had normal speed all the time
On 4.12 after reattach -> slow speed.

I should also add that on Lenovo T440s - there seems to be NO slowdown
when this WD Element drive is attached it works normally all the time.

So it could be probably related to USB chipset on T61 ??

For completeness I'm also attaching boot kernel trace from T440s where USB
disk is just attached and works with normal speed.


Zdenek


Attachments:
k4.8 (44.03 kB)
k4.12 (45.23 kB)
k440s (43.87 kB)
Download all attachments

2017-07-25 14:25:36

by Alan Stern

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

On Tue, 25 Jul 2017, Zdenek Kabelac wrote:

> Dne 24.7.2017 v 16:41 Alan Stern napsal(a):
> > On Mon, 24 Jul 2017, Zdenek Kabelac wrote:
> >
> >> Hi
> >>
> >> I've problem with my USB storage devices: WD Elements 1TB.
> >> (Bus 004 Device 002: ID 1058:10a8 Western Digital Technologies, Inc. Elements
> >> Portable (WDBUZG))
> >>
> >>
> >> After kernel >4.9 when disk is attached via cable it has very low speed
> >> (less then 1MB/s).
> >>
> >> It can run at full speed (>22MB/s) when the Linux kernel is fully rebooted (so
> >> disk is attached during the reboot of Lenovo T61, C2D, Fedora Rawhide).
> >>
> >> However when >4.9 kernel is running and disk is just attached it's very slow.
> >>
> >> I've played a bisect game - and the clean result has been:
> >>
> >> 22547c4cc4fe20698a6a85a55b8788859134b8e4
> >>
> >> When I just revert this patch with 4.13-rc1 - it's again running with full
> >> speed even when disk is attached (thus no reboot is needed for full speed).
> >>
> >>
> >> So while I've no idea what 22547c4cc4fe20698... is doing, it seems to have
> >> some unpleasant side-effect on regular USB devices.
> >>
> >> So what else is needed to get this properly working ?
> >> (assuming plain revert of 22547c4cc4fe20698 is unwanted).
> >>
> >> What more info can I provide to get this storage 'normally' usable without
> >> rebooting the machine.
> >
> > Please post the dmesg logs showing what happens when the disk is
> > first attached and operates slowly, and what happens when the disk is
> > attached following a reboot and operates normally.
>
> So I'm attaching kernel traces from kernel 4.8 & 4.12 from T61.
>
> Both are from full boot (all kernel: messages)
>
> In both cases - boot was with USB WD disk attached -
> then I've detached USB disk and reattached again.
>
> On 4.8 this had normal speed all the time
> On 4.12 after reattach -> slow speed.
>
> I should also add that on Lenovo T440s - there seems to be NO slowdown
> when this WD Element drive is attached it works normally all the time.
>
> So it could be probably related to USB chipset on T61 ??
>
> For completeness I'm also attaching boot kernel trace from T440s where USB
> disk is just attached and works with normal speed.

The log shows that the drive is reconnecting at full speed (12 Mb/s)
instead of high speed (480 Mb/s), which is why the communication
becomes so slow.

I think there's some weird timing issue going on. Hard to tell from
just the log, though.

Please collect a usbmon trace, starting from just before the unplug and
ending after the drive has reconnected at the slower speed.
Instructions are in Documentation/usb/usbmon.txt. Use bus number 0 for
the trace, so we can see what's going on with both the high-speed and
full-speed connections. And collect traces for both 4.8 and 4.12.

And try to avoid using any other USB devices during the test, to
minimize the amount of excess information recorded in the trace.

Alan Stern

2017-07-25 14:34:12

by Zdenek Kabelac

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

Dne 25.7.2017 v 16:25 Alan Stern napsal(a):
> On Tue, 25 Jul 2017, Zdenek Kabelac wrote:
>
>> Dne 24.7.2017 v 16:41 Alan Stern napsal(a):
>>> On Mon, 24 Jul 2017, Zdenek Kabelac wrote:
>>>
>>>> Hi
>>>>
>>>> I've problem with my USB storage devices: WD Elements 1TB.
>>>> (Bus 004 Device 002: ID 1058:10a8 Western Digital Technologies, Inc. Elements
>>>> Portable (WDBUZG))
>>>>
>>>>
>>>> After kernel >4.9 when disk is attached via cable it has very low speed
>>>> (less then 1MB/s).
>>>>
>>>> It can run at full speed (>22MB/s) when the Linux kernel is fully rebooted (so
>>>> disk is attached during the reboot of Lenovo T61, C2D, Fedora Rawhide).
>>>>
>>>> However when >4.9 kernel is running and disk is just attached it's very slow.
>>>>
>>>> I've played a bisect game - and the clean result has been:
>>>>
>>>> 22547c4cc4fe20698a6a85a55b8788859134b8e4
>>>>
>>>> When I just revert this patch with 4.13-rc1 - it's again running with full
>>>> speed even when disk is attached (thus no reboot is needed for full speed).
>>>>
>>>>
>>>> So while I've no idea what 22547c4cc4fe20698... is doing, it seems to have
>>>> some unpleasant side-effect on regular USB devices.
>>>>
>>>> So what else is needed to get this properly working ?
>>>> (assuming plain revert of 22547c4cc4fe20698 is unwanted).
>>>>
>>>> What more info can I provide to get this storage 'normally' usable without
>>>> rebooting the machine.
>>>
>>> Please post the dmesg logs showing what happens when the disk is
>>> first attached and operates slowly, and what happens when the disk is
>>> attached following a reboot and operates normally.
>>
>> So I'm attaching kernel traces from kernel 4.8 & 4.12 from T61.
>>
>> Both are from full boot (all kernel: messages)
>>
>> In both cases - boot was with USB WD disk attached -
>> then I've detached USB disk and reattached again.
>>
>> On 4.8 this had normal speed all the time
>> On 4.12 after reattach -> slow speed.
>>
>> I should also add that on Lenovo T440s - there seems to be NO slowdown
>> when this WD Element drive is attached it works normally all the time.
>>
>> So it could be probably related to USB chipset on T61 ??
>>
>> For completeness I'm also attaching boot kernel trace from T440s where USB
>> disk is just attached and works with normal speed.
>
> The log shows that the drive is reconnecting at full speed (12 Mb/s)
> instead of high speed (480 Mb/s), which is why the communication
> becomes so slow.
>
> I think there's some weird timing issue going on. Hard to tell from
> just the log, though.
>
> Please collect a usbmon trace, starting from just before the unplug and
> ending after the drive has reconnected at the slower speed.
> Instructions are in Documentation/usb/usbmon.txt. Use bus number 0 for
> the trace, so we can see what's going on with both the high-speed and
> full-speed connections. And collect traces for both 4.8 and 4.12.
>
> And try to avoid using any other USB devices during the test, to
> minimize the amount of excess information recorded in the trace.
>


Ahh nice spot - I've not been checking that message in detail - so maybe I
should actually run another bisect - where:

"new high-speed USB device number..."

turned into:

"new full-speed USB device number 2 using uhci_hcd"
"not running at top speed; connect to a high speed hub"

As I've just ended with a commit that made 'slow-speed' reality
but I need to find the commit having impact on connecting message

Zdenek

2017-07-25 15:03:08

by Zdenek Kabelac

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

Dne 25.7.2017 v 16:34 Zdenek Kabelac napsal(a):
> Dne 25.7.2017 v 16:25 Alan Stern napsal(a):
>> On Tue, 25 Jul 2017, Zdenek Kabelac wrote:
>>
>>> Dne 24.7.2017 v 16:41 Alan Stern napsal(a):
>>>> On Mon, 24 Jul 2017, Zdenek Kabelac wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I've problem with my USB storage devices: WD Elements 1TB.
>>>>> (Bus 004 Device 002: ID 1058:10a8 Western Digital Technologies, Inc.
>>>>> Elements
>>>>> Portable (WDBUZG))
>>>>>
>>>>>
>>>>> After kernel >4.9 when disk is attached via cable it has very low speed
>>>>> (less then 1MB/s).
>>>>>
>>>>> It can run at full speed (>22MB/s) when the Linux kernel is fully
>>>>> rebooted (so
>>>>> disk is attached during the reboot of Lenovo T61, C2D, Fedora Rawhide).
>>>>>
>>>>> However when >4.9 kernel is running and disk is just attached it's very
>>>>> slow.
>>>>>
>>>>> I've played a bisect game - and the clean result has been:
>>>>>
>>>>> 22547c4cc4fe20698a6a85a55b8788859134b8e4
>>>>>
>>>>> When I just revert this patch with 4.13-rc1 - it's again running with full
>>>>> speed even when disk is attached (thus no reboot is needed for full speed).
>>>>>
>>>>>
>>>>> So while I've no idea what 22547c4cc4fe20698... is doing, it seems to have
>>>>> some unpleasant side-effect on regular USB devices.
>>>>>
>>>>> So what else is needed to get this properly working ?
>>>>> (assuming plain revert of 22547c4cc4fe20698 is unwanted).
>>>>>
>>>>> What more info can I provide to get this storage 'normally' usable without
>>>>> rebooting the machine.
>>>>
>>>> Please post the dmesg logs showing what happens when the disk is
>>>> first attached and operates slowly, and what happens when the disk is
>>>> attached following a reboot and operates normally.
>>>
>>> So I'm attaching kernel traces from kernel 4.8 & 4.12 from T61.
>>>
>>> Both are from full boot (all kernel: messages)
>>>
>>> In both cases - boot was with USB WD disk attached -
>>> then I've detached USB disk and reattached again.
>>>
>>> On 4.8 this had normal speed all the time
>>> On 4.12 after reattach -> slow speed.
>>>
>>> I should also add that on Lenovo T440s - there seems to be NO slowdown
>>> when this WD Element drive is attached it works normally all the time.
>>>
>>> So it could be probably related to USB chipset on T61 ??
>>>
>>> For completeness I'm also attaching boot kernel trace from T440s where USB
>>> disk is just attached and works with normal speed.
>>
>> The log shows that the drive is reconnecting at full speed (12 Mb/s)
>> instead of high speed (480 Mb/s), which is why the communication
>> becomes so slow.
>>
>> I think there's some weird timing issue going on. Hard to tell from
>> just the log, though.
>>
>> Please collect a usbmon trace, starting from just before the unplug and
>> ending after the drive has reconnected at the slower speed.
>> Instructions are in Documentation/usb/usbmon.txt. Use bus number 0 for
>> the trace, so we can see what's going on with both the high-speed and
>> full-speed connections. And collect traces for both 4.8 and 4.12.
>>
>> And try to avoid using any other USB devices during the test, to
>> minimize the amount of excess information recorded in the trace.
>>
>
>
> Ahh nice spot - I've not been checking that message in detail - so maybe I
> should actually run another bisect - where:
>
> "new high-speed USB device number..."
>
> turned into:
>
> "new full-speed USB device number 2 using uhci_hcd"
> "not running at top speed; connect to a high speed hub"
>
> As I've just ended with a commit that made 'slow-speed' reality
> but I need to find the commit having impact on connecting message
>

And in fact it's the very same commit - which adds this message
(just check current 4.13 with and without this commit reverted)

So here goes usbmon trace (aka 'cat /sys/kernel/debug/usb/usbmon/0u')
no other usb device has been touch so should not hopefully interfere here.

Trace is from 4.12 kernel - so it has reported "not running at top speed"

Zdenek


Attachments:
usbmon.txt (85.08 kB)

2017-07-25 17:02:33

by Alan Stern

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

On Tue, 25 Jul 2017, Zdenek Kabelac wrote:

> And in fact it's the very same commit - which adds this message
> (just check current 4.13 with and without this commit reverted)
>
> So here goes usbmon trace (aka 'cat /sys/kernel/debug/usb/usbmon/0u')
> no other usb device has been touch so should not hopefully interfere here.
>
> Trace is from 4.12 kernel - so it has reported "not running at top speed"

Can you collect an equivalent trace under 4.8?

Alan Stern

2017-07-25 17:46:32

by Zdenek Kabelac

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

Dne 25.7.2017 v 19:02 Alan Stern napsal(a):
> On Tue, 25 Jul 2017, Zdenek Kabelac wrote:
>
>> And in fact it's the very same commit - which adds this message
>> (just check current 4.13 with and without this commit reverted)
>>
>> So here goes usbmon trace (aka 'cat /sys/kernel/debug/usb/usbmon/0u')
>> no other usb device has been touch so should not hopefully interfere here.
>>
>> Trace is from 4.12 kernel - so it has reported "not running at top speed"
>
> Can you collect an equivalent trace under 4.8?
>
> Alan Stern
>

Hi


Sure - no pb.

Just attached & detached USB disk in this trace

Zdenek





Attachments:
usmon4.8 (83.93 kB)

2017-07-25 19:50:10

by Alan Stern

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

On Tue, 25 Jul 2017, Zdenek Kabelac wrote:

> Dne 25.7.2017 v 19:02 Alan Stern napsal(a):
> > On Tue, 25 Jul 2017, Zdenek Kabelac wrote:
> >
> >> And in fact it's the very same commit - which adds this message
> >> (just check current 4.13 with and without this commit reverted)
> >>
> >> So here goes usbmon trace (aka 'cat /sys/kernel/debug/usb/usbmon/0u')
> >> no other usb device has been touch so should not hopefully interfere here.
> >>
> >> Trace is from 4.12 kernel - so it has reported "not running at top speed"
> >
> > Can you collect an equivalent trace under 4.8?
> >
> > Alan Stern
> >
>
> Hi
>
>
> Sure - no pb.
>
> Just attached & detached USB disk in this trace

This is really peculiar. The only difference is that the 4.12 trace
shows an extra 250 ms delay (including two extra Get-Port-Status
requests) compared to the 4.8 trace. I honestly can't tell what could
be causing the switch from high speed to full speed, or why it would
happen in one case but not the other.

It's obvious that _something_ is confusing the USB hardware in the disk
drive. This shows up in both traces: After you unplug and replug the
drive, it initially connects at high-speed and then disconnects itself
before the computer can communicate with it. Then a second connection
occurs. For example, in the 4.8 log:

08:18 usb 2-2: USB disconnect, device number 2
08:25 usb 2-2: new high-speed USB device number 3 using ehci-pci
08:26 usb 2-2: new high-speed USB device number 4 using ehci-pci

If the drive were working entirely correctly, it wouldn't do that.

We could continue futzing around with hardware and driver tests for a
long time. But there may be a shortcut: If you have a USB hub, you
could try attaching the drive through it. It's entirely possible that
this will fix the problem.

If not, you'll have to start doing some very detailed tests. As a
start, you can enable debugging for the usbcore and ehci_hcd drivers
immediately before the test:

echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
echo 'module ehci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
dmesg -C

Then after the test, see what shows up in the dmesg output. And again,
we'll want to do a comparison. In fact, 4.12 with and without the
commit you identified would make a better comparison than 4.12 vs. 4.8.

Alan Stern

2017-07-26 02:33:43

by Guenter Roeck

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

On 07/25/2017 12:50 PM, Alan Stern wrote:
> On Tue, 25 Jul 2017, Zdenek Kabelac wrote:
>
>> Dne 25.7.2017 v 19:02 Alan Stern napsal(a):
>>> On Tue, 25 Jul 2017, Zdenek Kabelac wrote:
>>>
>>>> And in fact it's the very same commit - which adds this message
>>>> (just check current 4.13 with and without this commit reverted)
>>>>
>>>> So here goes usbmon trace (aka 'cat /sys/kernel/debug/usb/usbmon/0u')
>>>> no other usb device has been touch so should not hopefully interfere here.
>>>>
>>>> Trace is from 4.12 kernel - so it has reported "not running at top speed"
>>>
>>> Can you collect an equivalent trace under 4.8?
>>>
>>> Alan Stern
>>>
>>
>> Hi
>>
>>
>> Sure - no pb.
>>
>> Just attached & detached USB disk in this trace
>
> This is really peculiar. The only difference is that the 4.12 trace
> shows an extra 250 ms delay (including two extra Get-Port-Status

Most likely we are now catching the long reset timeout. HUB_LONG_RESET_TIME
is 200 ms. It looks like the code is catching exactly the condition
addressed in my patch, ie USB_PORT_STAT_RESET is cleared but
USB_PORT_STAT_CONNECTION is not yet set. hub_port_wait_reset() will
now wait for USB_PORT_STAT_CONNECTION, which it didn't do before.

> requests) compared to the 4.8 trace. I honestly can't tell what could
> be causing the switch from high speed to full speed, or why it would
> happen in one case but not the other.
>

We talked about this today. What is causing the switch from high speed to
full speed ? Is this triggered by the kernel, or by the USB controller ?

If it is triggered by the USB controller, maybe the changed timing around
repeated hub resets, either in combination with an additional hub reset
or in combination with fewer resets, results in the full speed connection.
That is just a wild guess, of course.

Guenter

> It's obvious that _something_ is confusing the USB hardware in the disk
> drive. This shows up in both traces: After you unplug and replug the
> drive, it initially connects at high-speed and then disconnects itself
> before the computer can communicate with it. Then a second connection
> occurs. For example, in the 4.8 log:
>
> 08:18 usb 2-2: USB disconnect, device number 2
> 08:25 usb 2-2: new high-speed USB device number 3 using ehci-pci
> 08:26 usb 2-2: new high-speed USB device number 4 using ehci-pci
>
> If the drive were working entirely correctly, it wouldn't do that.
> > We could continue futzing around with hardware and driver tests for a
> long time. But there may be a shortcut: If you have a USB hub, you
> could try attaching the drive through it. It's entirely possible that
> this will fix the problem.
>
> If not, you'll have to start doing some very detailed tests. As a
> start, you can enable debugging for the usbcore and ehci_hcd drivers
> immediately before the test:
>
> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> echo 'module ehci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
> dmesg -C
>
> Then after the test, see what shows up in the dmesg output. And again,
> we'll want to do a comparison. In fact, 4.12 with and without the
> commit you identified would make a better comparison than 4.12 vs. 4.8.
>
> Alan Stern
>
>

2017-07-26 14:28:59

by Alan Stern

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

On Tue, 25 Jul 2017, Guenter Roeck wrote:

> On 07/25/2017 12:50 PM, Alan Stern wrote:
> > On Tue, 25 Jul 2017, Zdenek Kabelac wrote:
> >
> >> Dne 25.7.2017 v 19:02 Alan Stern napsal(a):
> >>> On Tue, 25 Jul 2017, Zdenek Kabelac wrote:
> >>>
> >>>> And in fact it's the very same commit - which adds this message
> >>>> (just check current 4.13 with and without this commit reverted)
> >>>>
> >>>> So here goes usbmon trace (aka 'cat /sys/kernel/debug/usb/usbmon/0u')
> >>>> no other usb device has been touch so should not hopefully interfere here.
> >>>>
> >>>> Trace is from 4.12 kernel - so it has reported "not running at top speed"
> >>>
> >>> Can you collect an equivalent trace under 4.8?
> >>>
> >>> Alan Stern
> >>>
> >>
> >> Hi
> >>
> >>
> >> Sure - no pb.
> >>
> >> Just attached & detached USB disk in this trace
> >
> > This is really peculiar. The only difference is that the 4.12 trace
> > shows an extra 250 ms delay (including two extra Get-Port-Status
>
> Most likely we are now catching the long reset timeout. HUB_LONG_RESET_TIME
> is 200 ms. It looks like the code is catching exactly the condition
> addressed in my patch, ie USB_PORT_STAT_RESET is cleared but
> USB_PORT_STAT_CONNECTION is not yet set. hub_port_wait_reset() will
> now wait for USB_PORT_STAT_CONNECTION, which it didn't do before.
>
> > requests) compared to the 4.8 trace. I honestly can't tell what could
> > be causing the switch from high speed to full speed, or why it would
> > happen in one case but not the other.
> >
>
> We talked about this today. What is causing the switch from high speed to
> full speed ? Is this triggered by the kernel, or by the USB controller ?

A little of both. When a reset completes, if the device does not
follow the high-speed chirp protocol, the EHCI status registers show
that the port is not enabled. When the driver sees that, it sets a bit
that causes the connection to be moved over from the high-speed EHCI
controller to the companion full-speed UHCI controller. The code that
does this is in check_reset_complete() in ehci-hub.c.

The logs showed the same thing in both kernels: When the reset
completed, the port status showed no connection and not enabled.
Nevertheless, one kernel switched the port over to the UHCI controller
and the other kernel didn't.

> If it is triggered by the USB controller, maybe the changed timing around
> repeated hub resets, either in combination with an additional hub reset
> or in combination with fewer resets, results in the full speed connection.
> That is just a wild guess, of course.

I don't think that is what happened.

Alan Stern

2017-07-26 14:36:42

by Zdenek Kabelac

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

Dne 26.7.2017 v 16:28 Alan Stern napsal(a):
> On Tue, 25 Jul 2017, Guenter Roeck wrote:
>
>> On 07/25/2017 12:50 PM, Alan Stern wrote:
>>> On Tue, 25 Jul 2017, Zdenek Kabelac wrote:
>>>
>>>> Dne 25.7.2017 v 19:02 Alan Stern napsal(a):
>>>>> On Tue, 25 Jul 2017, Zdenek Kabelac wrote:
>>>>>
>>>>>> And in fact it's the very same commit - which adds this message
>>>>>> (just check current 4.13 with and without this commit reverted)
>>>>>>
>>>>>> So here goes usbmon trace (aka 'cat /sys/kernel/debug/usb/usbmon/0u')
>>>>>> no other usb device has been touch so should not hopefully interfere here.
>>>>>>
>>>>>> Trace is from 4.12 kernel - so it has reported "not running at top speed"
>>>>>
>>>>> Can you collect an equivalent trace under 4.8?
>>>>>
>>>>> Alan Stern
>>>>>
>>>>
>>>> Hi
>>>>
>>>>
>>>> Sure - no pb.
>>>>
>>>> Just attached & detached USB disk in this trace
>>>
>>> This is really peculiar. The only difference is that the 4.12 trace
>>> shows an extra 250 ms delay (including two extra Get-Port-Status
>>
>> Most likely we are now catching the long reset timeout. HUB_LONG_RESET_TIME
>> is 200 ms. It looks like the code is catching exactly the condition
>> addressed in my patch, ie USB_PORT_STAT_RESET is cleared but
>> USB_PORT_STAT_CONNECTION is not yet set. hub_port_wait_reset() will
>> now wait for USB_PORT_STAT_CONNECTION, which it didn't do before.
>>
>>> requests) compared to the 4.8 trace. I honestly can't tell what could
>>> be causing the switch from high speed to full speed, or why it would
>>> happen in one case but not the other.
>>>
>>
>> We talked about this today. What is causing the switch from high speed to
>> full speed ? Is this triggered by the kernel, or by the USB controller ?
>
> A little of both. When a reset completes, if the device does not
> follow the high-speed chirp protocol, the EHCI status registers show
> that the port is not enabled. When the driver sees that, it sets a bit
> that causes the connection to be moved over from the high-speed EHCI
> controller to the companion full-speed UHCI controller. The code that
> does this is in check_reset_complete() in ehci-hub.c.


Well I do have 4.13-rc1 kernel - where the only difference is the revert
of mentioned patched - so I can provide probably traces from this one if that
helps anything.

What is not quite clear to me - why T440 works.

Could this be in some way related to the fact that T61 is USB2 only old lenovo
machine while T440 works with USB3 (new SuperSpeed USB device number...)

So maybe there is some time-sensitive logic - where WD Elements
decides to be USB2 or USB3 device ???


Zdenek

2017-07-26 15:04:05

by Alan Stern

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

On Wed, 26 Jul 2017, Zdenek Kabelac wrote:

> Well I do have 4.13-rc1 kernel - where the only difference is the revert
> of mentioned patched - so I can provide probably traces from this one if that
> helps anything.
>
> What is not quite clear to me - why T440 works.
>
> Could this be in some way related to the fact that T61 is USB2 only old lenovo
> machine while T440 works with USB3 (new SuperSpeed USB device number...)

Very likely that is the reason. The hardware is different.

> So maybe there is some time-sensitive logic - where WD Elements
> decides to be USB2 or USB3 device ???

The decision has to be agreed upon on both sides -- the drive and the
kernel. If you enable debugging in the kernel, we'll have a better
picture of what it is deciding to do.

Alan Stern

2017-07-26 17:49:26

by Zdenek Kabelac

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

Dne 25.7.2017 v 21:50 Alan Stern napsal(a):
> On Tue, 25 Jul 2017, Zdenek Kabelac wrote:
>
>
> 08:18 usb 2-2: USB disconnect, device number 2
> 08:25 usb 2-2: new high-speed USB device number 3 using ehci-pci
> 08:26 usb 2-2: new high-speed USB device number 4 using ehci-pci
>
> If the drive were working entirely correctly, it wouldn't do that.
>
> We could continue futzing around with hardware and driver tests for a
> long time. But there may be a shortcut: If you have a USB hub, you
> could try attaching the drive through it. It's entirely possible that
> this will fix the problem.
>
> If not, you'll have to start doing some very detailed tests. As a
> start, you can enable debugging for the usbcore and ehci_hcd drivers
> immediately before the test:
>
> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> echo 'module ehci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
> dmesg -C
>
> Then after the test, see what shows up in the dmesg output. And again,
> we'll want to do a comparison. In fact, 4.12 with and without the
> commit you identified would make a better comparison than 4.12 vs. 4.8.
>


Hi


So here we go with traces - made with freshly compiled recent 4.13-rc2.
OK trace is with revert patch applied.
BAD trace is the one with it (essentially vaniala master).

Trace also has KOBJECT debugging enabled - I think difference is
nicely visible between them - but I've no explanation for it.

Both traces start with cable detach followed with attachment.

Regards

Zdenek


Attachments:
OKtrace.txt (33.47 kB)
BADtrace.txt (39.11 kB)
Download all attachments

2017-07-27 01:03:32

by Alan Stern

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

[Added linux-usb mailing list to CC:]

Short description of bug: In 4.12 or later, when Zdenek's Western
Digital disk is attached to an EHCI controller, it ends up connecting
at full speed to the companion UHCI controller instead. But when
commit 22547c4cc4fe ("usb: hub: Wait for connection to be reestablished
after port reset") is reverted, the disk connects correctly at high
speed.

On Wed, 26 Jul 2017, Zdenek Kabelac wrote:

> Dne 25.7.2017 v 21:50 Alan Stern napsal(a):
> > On Tue, 25 Jul 2017, Zdenek Kabelac wrote:
> >
> >
> > 08:18 usb 2-2: USB disconnect, device number 2
> > 08:25 usb 2-2: new high-speed USB device number 3 using ehci-pci
> > 08:26 usb 2-2: new high-speed USB device number 4 using ehci-pci
> >
> > If the drive were working entirely correctly, it wouldn't do that.
> >
> > We could continue futzing around with hardware and driver tests for a
> > long time. But there may be a shortcut: If you have a USB hub, you
> > could try attaching the drive through it. It's entirely possible that
> > this will fix the problem.
> >
> > If not, you'll have to start doing some very detailed tests. As a
> > start, you can enable debugging for the usbcore and ehci_hcd drivers
> > immediately before the test:
> >
> > echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> > echo 'module ehci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
> > dmesg -C
> >
> > Then after the test, see what shows up in the dmesg output. And again,
> > we'll want to do a comparison. In fact, 4.12 with and without the
> > commit you identified would make a better comparison than 4.12 vs. 4.8.
> >
>
>
> Hi
>
>
> So here we go with traces - made with freshly compiled recent 4.13-rc2.
> OK trace is with revert patch applied.
> BAD trace is the one with it (essentially vaniala master).
>
> Trace also has KOBJECT debugging enabled - I think difference is
> nicely visible between them - but I've no explanation for it.
>
> Both traces start with cable detach followed with attachment.

Okay, I figured out the cause.

The USB stack assumes that if a high-speed device initialization fails
and the device is still connected, it means that the device can't run
properly at high speed and the computer needs to try again at full
speed. See commit 90da096ee46b ("USB: force handover port to companion
when hub_port_connect_change fails").

In Zdenek's case, the device really does disconnect itself before the
second port reset. If 22547c4cc4fe is present, it causes an extra
delay -- some 200 ms -- long enough for the device to reconnect again.
So we appear to be in the situation described above, and the connection
is switched over to the companion controller.

If 22547c4cc4fe is not present, the kernel sees that the device is not
connected at the end of the second reset and gives up trying to
initialize the device. When the device reconnects about 140 ms later,
the kernel treats it as a new connection and successfully negotiates a
high-speed link.

Zdenek, you check this explanation by commenting out these last two
lines at the end of hub_port_connect() in drivers/usb/core/hub.c:

if (hcd->driver->relinquish_port && !hub->hdev->parent)
hcd->driver->relinquish_port(hcd, port1);

That should prevent the connection from being handed over to the UHCI
companion, allowing the device to operate at high speed.

Alan Stern

2017-07-27 09:11:55

by Zdenek Kabelac

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

Dne 27.7.2017 v 03:03 Alan Stern napsal(a):
> [Added linux-usb mailing list to CC:]
>
> Short description of bug: In 4.12 or later, when Zdenek's Western
> Digital disk is attached to an EHCI controller, it ends up connecting
> at full speed to the companion UHCI controller instead. But when
> commit 22547c4cc4fe ("usb: hub: Wait for connection to be reestablished
> after port reset") is reverted, the disk connects correctly at high
> speed.
>
> On Wed, 26 Jul 2017, Zdenek Kabelac wrote:
>
>> Dne 25.7.2017 v 21:50 Alan Stern napsal(a):
>>> On Tue, 25 Jul 2017, Zdenek Kabelac wrote:
>>>
>>>
>>> 08:18 usb 2-2: USB disconnect, device number 2
>>> 08:25 usb 2-2: new high-speed USB device number 3 using ehci-pci
>>> 08:26 usb 2-2: new high-speed USB device number 4 using ehci-pci
>>>
>>> If the drive were working entirely correctly, it wouldn't do that.
>>>
>>> We could continue futzing around with hardware and driver tests for a
>>> long time. But there may be a shortcut: If you have a USB hub, you
>>> could try attaching the drive through it. It's entirely possible that
>>> this will fix the problem.
>>>
>>> If not, you'll have to start doing some very detailed tests. As a
>>> start, you can enable debugging for the usbcore and ehci_hcd drivers
>>> immediately before the test:
>>>
>>> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
>>> echo 'module ehci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
>>> dmesg -C
>>>
>>> Then after the test, see what shows up in the dmesg output. And again,
>>> we'll want to do a comparison. In fact, 4.12 with and without the
>>> commit you identified would make a better comparison than 4.12 vs. 4.8.
>>>
>>
>>
>> Hi
>>
>>
>> So here we go with traces - made with freshly compiled recent 4.13-rc2.
>> OK trace is with revert patch applied.
>> BAD trace is the one with it (essentially vaniala master).
>>
>> Trace also has KOBJECT debugging enabled - I think difference is
>> nicely visible between them - but I've no explanation for it.
>>
>> Both traces start with cable detach followed with attachment.
>
> Okay, I figured out the cause.
>
> The USB stack assumes that if a high-speed device initialization fails
> and the device is still connected, it means that the device can't run
> properly at high speed and the computer needs to try again at full
> speed. See commit 90da096ee46b ("USB: force handover port to companion
> when hub_port_connect_change fails").
>
> In Zdenek's case, the device really does disconnect itself before the
> second port reset. If 22547c4cc4fe is present, it causes an extra
> delay -- some 200 ms -- long enough for the device to reconnect again.
> So we appear to be in the situation described above, and the connection
> is switched over to the companion controller.
>
> If 22547c4cc4fe is not present, the kernel sees that the device is not
> connected at the end of the second reset and gives up trying to
> initialize the device. When the device reconnects about 140 ms later,
> the kernel treats it as a new connection and successfully negotiates a
> high-speed link.
>
> Zdenek, you check this explanation by commenting out these last two
> lines at the end of hub_port_connect() in drivers/usb/core/hub.c:
>
> if (hcd->driver->relinquish_port && !hub->hdev->parent)
> hcd->driver->relinquish_port(hcd, port1);
>
> That should prevent the connection from being handed over to the UHCI
> companion, allowing the device to operate at high speed.
>

Hi

Yep - seems this helped - I've dropped revert and commented those 2 lines
and I've used the very same kernel - and speed was all good:

usb 2-2: new high-speed USB device number 2 using ehci-pci
usb 2-2: new high-speed USB device number 3 using ehci-pci
usb 2-2: New USB device found, idVendor=1058, idProduct=10a8
usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3


So while I'm not sure if this is final fix for USB - this solution surely
solved my WD Element disk attachment issue.

Regards

Zdenek

2017-07-28 18:33:44

by Alan Stern

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

On Thu, 27 Jul 2017, Zdenek Kabelac wrote:

> > Zdenek, you check this explanation by commenting out these last two
> > lines at the end of hub_port_connect() in drivers/usb/core/hub.c:
> >
> > if (hcd->driver->relinquish_port && !hub->hdev->parent)
> > hcd->driver->relinquish_port(hcd, port1);
> >
> > That should prevent the connection from being handed over to the UHCI
> > companion, allowing the device to operate at high speed.
> >
>
> Hi
>
> Yep - seems this helped - I've dropped revert and commented those 2 lines
> and I've used the very same kernel - and speed was all good:
>
> usb 2-2: new high-speed USB device number 2 using ehci-pci
> usb 2-2: new high-speed USB device number 3 using ehci-pci
> usb 2-2: New USB device found, idVendor=1058, idProduct=10a8
> usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
>
>
> So while I'm not sure if this is final fix for USB - this solution surely
> solved my WD Element disk attachment issue.

Great! Here's a real patch to test; I think this will be acceptable if
it fixes your problem.

Alan Stern



Index: usb-4.x/drivers/usb/core/hub.c
===================================================================
--- usb-4.x.orig/drivers/usb/core/hub.c
+++ usb-4.x/drivers/usb/core/hub.c
@@ -4934,9 +4934,10 @@ loop:

done:
hub_port_disable(hub, port1, 1);
- if (hcd->driver->relinquish_port && !hub->hdev->parent)
- hcd->driver->relinquish_port(hcd, port1);
-
+ if (hcd->driver->relinquish_port && !hub->hdev->parent) {
+ if (status != -ENOTCONN && status != -ENODEV)
+ hcd->driver->relinquish_port(hcd, port1);
+ }
}

/* Handle physical or logical connection change events.

2017-07-29 02:09:16

by Alan Stern

[permalink] [raw]
Subject: Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4

On Fri, 28 Jul 2017, Zdenek Kabelac wrote:

> Dne 28.7.2017 v 20:33 Alan Stern napsal(a):
> > On Thu, 27 Jul 2017, Zdenek Kabelac wrote:
> >
> >>> Zdenek, you check this explanation by commenting out these last two
> >>> lines at the end of hub_port_connect() in drivers/usb/core/hub.c:
> >>>
> >>> if (hcd->driver->relinquish_port && !hub->hdev->parent)
> >>> hcd->driver->relinquish_port(hcd, port1);
> >>>
> >>> That should prevent the connection from being handed over to the UHCI
> >>> companion, allowing the device to operate at high speed.
> >>>
> >>
> >> Hi
> >>
> >> Yep - seems this helped - I've dropped revert and commented those 2 lines
> >> and I've used the very same kernel - and speed was all good:
> >>
> >> usb 2-2: new high-speed USB device number 2 using ehci-pci
> >> usb 2-2: new high-speed USB device number 3 using ehci-pci
> >> usb 2-2: New USB device found, idVendor=1058, idProduct=10a8
> >> usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> >>
> >>
> >> So while I'm not sure if this is final fix for USB - this solution surely
> >> solved my WD Element disk attachment issue.
> >
> > Great! Here's a real patch to test; I think this will be acceptable if
> > it fixes your problem.
> >
> > Alan Stern
> >
> >
> >
> > Index: usb-4.x/drivers/usb/core/hub.c
> > ===================================================================
> > --- usb-4.x.orig/drivers/usb/core/hub.c
> > +++ usb-4.x/drivers/usb/core/hub.c
> > @@ -4934,9 +4934,10 @@ loop:
> >
> > done:
> > hub_port_disable(hub, port1, 1);
> > - if (hcd->driver->relinquish_port && !hub->hdev->parent)
> > - hcd->driver->relinquish_port(hcd, port1);
> > -
> > + if (hcd->driver->relinquish_port && !hub->hdev->parent) {
> > + if (status != -ENOTCONN && status != -ENODEV)
> > + hcd->driver->relinquish_port(hcd, port1);
> > + }
> > }
> >
> > /* Handle physical or logical connection change events.
> >
>
>
> Hi
>
> Yep - still good speed - however compilation reported this Warning:
>
> drivers/usb/core/hub.c:4933:37: warning: ‘status’ may be used uninitialized in
> this function [-Wmaybe-uninitialized]
> if (status != -ENOTCONN && status != -ENODEV)
> drivers/usb/core/hub.c:4728:6: note: ‘status’ was declared here
> int status, i;
>
>
> So I've just made assignment of 'status = 0' at declaration place -
> but then you have some other unneeded assigns of =0 left around...
> so possibly worth some minor cleanup...

The initialization actually should be -ENODEV. I have changed the
patch and will submit it after the weekend.

Alan Stern