2020-09-01 16:38:38

by Alan Stern

Subject: Re: [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices

On Tue, Sep 01, 2020 at 09:15:46AM -0700, Khalid Aziz wrote:
> On 8/31/20 8:31 PM, Alan Stern wrote:
> > Can you collect a usbmon trace showing an example of this problem?
> >
>
> I have attached usbmon traces for when the USB hub with keyboards and mouse
> is plugged into the USB 2.0 port and when it is plugged into the NEC USB 3.0
> port.

The usbmon traces show lots of errors, but no Clear-TT events. The
large number of errors suggests that you've got a hardware problem:
either a bad hub or bad USB connections.

Alan Stern


2020-09-01 17:01:35

by Khalid Aziz

Subject: Re: [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices

On 9/1/20 10:36 AM, Alan Stern wrote:
> On Tue, Sep 01, 2020 at 09:15:46AM -0700, Khalid Aziz wrote:
>> On 8/31/20 8:31 PM, Alan Stern wrote:
>>> Can you collect a usbmon trace showing an example of this problem?
>>>
>>
>> I have attached usbmon traces for when the USB hub with keyboards and mouse
>> is plugged into the USB 2.0 port and when it is plugged into the NEC USB 3.0
>> port.
>
> The usbmon traces show lots of errors, but no Clear-TT events. The
> large number of errors suggests that you've got a hardware problem:
> either a bad hub or bad USB connections.

That is what I thought initially, which is why I got additional hubs and
a USB 2.0 PCI card to test. I am seeing errors across 3 USB controllers,
4 USB hubs and 4 low/full-speed devices. All of the hubs and low/full-speed
devices work with zero errors on my laptop. My keyboard/mouse devices
and 2 of my USB hubs predate the motherboard upgrade, and they all worked
flawlessly before it. Some combinations of these also work with no
errors on my desktop with the new motherboard, as I had listed in my
original email:

2. USB 2.0 controller - WORKS
5. USB 3.0/3.1 controller -> Bus powered USB 2.0 hub - WORKS

I am not seeing a common factor here that would point to any specific
piece of hardware being bad. Besides, that one code change in ehci-q.c
(which I still can't say is the right code change) makes the USB 2.0
controller work reliably with all my devices.

--
Khalid

2020-09-01 19:55:44

by Alan Stern

Subject: Re: [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices

On Tue, Sep 01, 2020 at 11:00:16AM -0600, Khalid Aziz wrote:
> On 9/1/20 10:36 AM, Alan Stern wrote:
> > On Tue, Sep 01, 2020 at 09:15:46AM -0700, Khalid Aziz wrote:
> >> On 8/31/20 8:31 PM, Alan Stern wrote:
> >>> Can you collect a usbmon trace showing an example of this problem?
> >>>
> >>
> >> I have attached usbmon traces for when the USB hub with keyboards and mouse
> >> is plugged into the USB 2.0 port and when it is plugged into the NEC USB 3.0
> >> port.
> >
> > The usbmon traces show lots of errors, but no Clear-TT events. The
> > large number of errors suggests that you've got a hardware problem:
> > either a bad hub or bad USB connections.
>
> That is what I thought initially, which is why I got additional hubs and
> a USB 2.0 PCI card to test. I am seeing errors across 3 USB controllers,
> 4 USB hubs and 4 low/full-speed devices. All of the hubs and low/full-speed
> devices work with zero errors on my laptop. My keyboard/mouse devices
> and 2 of my USB hubs predate the motherboard upgrade, and they all worked
> flawlessly before it. Some combinations of these also work with no
> errors on my desktop with the new motherboard, as I had listed in my
> original email:

It's a very puzzling situation.

One thing which probably would work well, surprisingly, would be to buy
an old USB-1.1 hub and plug it into the PCI card. That combination is
likely to be similar to what you see when plugging the devices directly
into the PCI card. It might even work okay with the USB-3 controllers.

> 2. USB 2.0 controller - WORKS
> 5. USB 3.0/3.1 controller -> Bus powered USB 2.0 hub - WORKS
>
> I am not seeing a common factor here that would point to any specific
> piece of hardware being bad. Besides, that one code change in ehci-q.c
> (which I still can't say is the right code change) makes the USB 2.0
> controller work reliably with all my devices.

The USB and EHCI designs are flawed in that under the circumstances
you're seeing, they don't have any way to tell the difference between a
STALL and a host timing error. The current code treats these situations
as timing/transmission errors (resulting in device resets); your change
causes them to be treated as STALLs. However, there are known, common
situations in which those same symptoms really are caused by
transmission errors, so we don't want to start treating them as STALLs.

Besides, I suspect that your code change does _not_ make the USB-2
controller work reliably with your devices. You should collect a usbmon
trace under those conditions; I predict it will be full of STALLs. And
furthermore, I believe these STALLs will not show up in a usbmon trace
made with the devices plugged directly into the PCI card. If I'm right
about these things, the errors are still present even with your patch;
all it does is hide them.

Short of a USB bus analyzer, however, there's no way to tell what's
really going on.

Alan Stern

2020-09-01 22:58:27

by Khalid Aziz

Subject: Re: [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices

On 9/1/20 1:51 PM, Alan Stern wrote:
> On Tue, Sep 01, 2020 at 11:00:16AM -0600, Khalid Aziz wrote:
>> On 9/1/20 10:36 AM, Alan Stern wrote:
>>> On Tue, Sep 01, 2020 at 09:15:46AM -0700, Khalid Aziz wrote:
>>>> On 8/31/20 8:31 PM, Alan Stern wrote:
>>>>> Can you collect a usbmon trace showing an example of this problem?
>>>>>
>>>>
>>>> I have attached usbmon traces for when the USB hub with keyboards and mouse
>>>> is plugged into the USB 2.0 port and when it is plugged into the NEC USB 3.0
>>>> port.
>>>
>>> The usbmon traces show lots of errors, but no Clear-TT events. The
>>> large number of errors suggests that you've got a hardware problem:
>>> either a bad hub or bad USB connections.
>>
>> That is what I thought initially, which is why I got additional hubs and
>> a USB 2.0 PCI card to test. I am seeing errors across 3 USB controllers,
>> 4 USB hubs and 4 low/full-speed devices. All of the hubs and low/full-speed
>> devices work with zero errors on my laptop. My keyboard/mouse devices
>> and 2 of my USB hubs predate the motherboard upgrade, and they all worked
>> flawlessly before it. Some combinations of these also work with no
>> errors on my desktop with the new motherboard, as I had listed in my
>> original email:
>
> It's a very puzzling situation.
>
> One thing which probably would work well, surprisingly, would be to buy
> an old USB-1.1 hub and plug it into the PCI card. That combination is
> likely to be similar to what you see when plugging the devices directly
> into the PCI card. It might even work okay with the USB-3 controllers.
>
>> 2. USB 2.0 controller - WORKS
>> 5. USB 3.0/3.1 controller -> Bus powered USB 2.0 hub - WORKS
>>
>> I am not seeing a common factor here that would point to any specific
>> piece of hardware being bad. Besides, that one code change in ehci-q.c
>> (which I still can't say is the right code change) makes the USB 2.0
>> controller work reliably with all my devices.
>
> The USB and EHCI designs are flawed in that under the circumstances
> you're seeing, they don't have any way to tell the difference between a
> STALL and a host timing error. The current code treats these situations
> as timing/transmission errors (resulting in device resets); your change
> causes them to be treated as STALLs. However, there are known, common
> situations in which those same symptoms really are caused by
> transmission errors, so we don't want to start treating them as STALLs.
>
> Besides, I suspect that your code change does _not_ make the USB-2
> controller work reliably with your devices. You should collect a usbmon
> trace under those conditions; I predict it will be full of STALLs. And
> furthermore, I believe these STALLs will not show up in a usbmon trace
> made with the devices plugged directly into the PCI card. If I'm right
> about these things, the errors are still present even with your patch;
> all it does is hide them.
>
> Short of a USB bus analyzer, however, there's no way to tell what's
> really going on.

I have managed to find a hardware combination that seems to work, so for
now at least my machine is usable. I will figure out how to interpret
usbmon output and run more experiments. There seems to be a real problem
in the driver somewhere, and it should be solved.

Thanks,
Khalid


2020-09-02 01:46:52

by Alan Stern

Subject: Re: [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices

On Tue, Sep 01, 2020 at 04:54:48PM -0600, Khalid Aziz wrote:
> I have managed to find a hardware combination that seems to work, so for
> now at least my machine is usable. I will figure out how to interpret
> usbmon output and run more experiments. There seems to be a real problem
> in the driver somewhere, and it should be solved.

Correction: You're using two different drivers. Although it's not
impossible, it seems very unlikely that they both contain the same bug.

Alan Stern