2016-12-14 10:58:59

by Michal Necasek

Subject: xhci_reset_endpoint() doesn't reset endpoint


Hi Mathias,

We have run into a problem with a USB printer which we're quite confident is a bug in the Linux xHCI driver. There is no problem when the same printer is plugged into a port managed by the EHCI driver.

The core problem is that xhci_reset_endpoint() doesn't do anything, and more specifically does not reset the xHC's data toggle/sequence number. That is not normally an issue, because the reset does happen in response to a STALL; in our scenario, there is no STALL or any other error. That can lead to the data toggle getting out of sync and the host dropping a packet sent by the device.

Now a detailed problem description. We have a USB printer passed through to a VM. The VM runs Windows 8.1 or 10 (other versions may be affected too), and uses Microsoft's standard usbprint.sys to talk to the printer. The vendor printer driver tries to query the printer's configuration, using the control endpoint, one OUT endpoint, and one IN endpoint. The query always times out/fails when the printer is plugged into a port managed by xHCI, yet works in EHCI ports.

The usbprint.sys driver is a bit funny and in many cases (though not always) queues up URBs on the IN endpoint in advance, and once it decides that it has received the entire response, cancels the last URB and resets the IN endpoint (issuing SetFeature(CLEAR_HALT)). After much head scratching, we realized, and later confirmed with a USB analyzer, that the next IN packet that the printer sends is not seen by the host's USB stack at all, let alone the guest OS. Other packets arrive just fine, but the guest OS keeps waiting for more data to arrive, eventually loses patience and fails.

We cannot observe the data toggle state of the xHC but we are fairly certain that things go wrong when the data toggle is set (on both ends) prior to the endpoint reset. SetFeature(CLEAR_HALT) resets the toggle on the device, but not on the host. But we know for a fact that the device sends a packet (with data toggle 0) which the host USB stack never sees, and a data toggle mismatch explains that quite well.
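
To make the suspected failure mode concrete, here is a minimal sketch of the toggle bookkeeping on both ends of a USB 2.0 bulk IN endpoint. It is a toy model, not kernel or controller code; the struct and function names are made up for illustration. It assumes the usual USB 2.0 behaviour that a host receiving a packet with an unexpected toggle acknowledges it but discards the data as a presumed retransmission.

```c
#include <stdbool.h>

/* Hypothetical model of one IN endpoint; all names are illustrative only. */
struct toggle_pair {
    unsigned host;   /* toggle the host expects next (0 = DATA0, 1 = DATA1) */
    unsigned device; /* toggle the device will send next */
};

/* Device sends one packet. Returns true if the host accepts the data. */
static bool transfer_packet(struct toggle_pair *t)
{
    if (t->device != t->host) {
        /* Mismatch: the host ACKs but discards the data, so only the
         * device advances its toggle. The payload is silently lost. */
        t->device ^= 1;
        return false;
    }
    /* Match: data is accepted and both sides advance. */
    t->host ^= 1;
    t->device ^= 1;
    return true;
}

/* What we believe happens on xHCI today: only the device resets. */
static void clear_halt_device_only(struct toggle_pair *t)
{
    t->device = 0;
}

/* What the usbfs documentation describes: both sides reset. */
static void clear_halt_both(struct toggle_pair *t)
{
    t->device = 0;
    t->host = 0;
}
```

In this model, resetting only the device side while both toggles are 1 loses exactly one packet, after which the two sides are back in sync, which is consistent with the later packets arriving normally.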

We are using USBFS to talk to the printer, but that shouldn't matter much. I will note that the available documentation<1> explicitly says that USBDEVFS_RESETEP and USBDEVFS_CLEAR_HALT both reset the data toggle. That is indeed the case for the Linux EHCI driver but not xHCI. Both of the USBFS IOCTLs call into xhci_reset_endpoint() which does nothing.

We believe that xhci_reset_endpoint() needs to reset the data toggle/sequence number to match the documentation and for compatibility with the EHCI driver. We tried but failed to find a workaround which would reset the data toggle without side effects (e.g. USBDEVFS_SETINTERFACE does reset the toggle on the IN endpoint, but also resets it on the OUT endpoint and talks to the device, so that's no good).

The data toggle management is not terribly well documented in the xHCI spec, so we hope you know more about it than we do. Based on our understanding of the xHCI specification, xhci_reset_endpoint() should issue either a Reset Endpoint command with TSP=0 or a dummy Configure Endpoint command dropping/re-adding the specified endpoint (as the xHCI 1.1 spec suggests at the end of 4.6.8). Please confirm whether that should solve the problem.

We don't know how many devices this problem affects. We suspect it affects many USB printers and could in theory affect more or less any device, but few drivers reset endpoints when there are no errors. The problem scenario can probably be artificially reproduced with more or less any USB device (when data toggle is set, issue USBDEVFS_CLEAR_HALT, see if next packet arrives at destination).
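
For anyone who wants to try the artificial reproduction, the USBFS side could look roughly like the sketch below. The helper name usbfs_clear_halt() is made up; USBDEVFS_CLEAR_HALT and its unsigned int argument come from <linux/usbdevice_fs.h>. The sketch assumes the caller has already opened the device node under /dev/bus/usb and has the interface owning the endpoint available to it.

```c
#include <sys/ioctl.h>
#include <linux/usbdevice_fs.h>

/*
 * Ask usbfs to clear the halt condition on one endpoint.
 * fd:      descriptor of an opened usbfs device node (caller's job).
 * ep_addr: endpoint address, e.g. 0x81 for IN endpoint 1.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int usbfs_clear_halt(int fd, unsigned int ep_addr)
{
    /* The ioctl takes a pointer to the endpoint address. */
    return ioctl(fd, USBDEVFS_CLEAR_HALT, &ep_addr);
}
```

The test would then be: transfer an odd number of packets on the IN endpoint so the toggle is set, call the helper, and check whether the next packet from the device reaches the application.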


Regards,
Michal


1:
https://www.kernel.org/doc/htmldocs/usb/usbfs-ioctl.html


2016-12-14 14:27:50

by Mathias Nyman

Subject: Re: xhci_reset_endpoint() doesn't reset endpoint

On 14.12.2016 12:58, Michal Necasek wrote:
> prior to the endpoint reset. SetFeature(CLEAR_HALT) resets the toggle
> on the device, but not on the host. But we know for a fact that the
> device sends a packet (with data toggle 0) which the host USB stack
> never sees, and a data toggle mismatch explains that quite well.
>
> We are using USBFS to talk to the printer, but that shouldn't matter
> much. I will note that the available documentation<1> explicitly says
> that USBDEVFS_RESETEP and USBDEVFS_CLEAR_HALT both reset the data
> toggle. That is indeed the case for the Linux EHCI driver but not
> xHCI. Both of the USBFS IOCTLs call into xhci_reset_endpoint() which
> does nothing.
>

This is very likely the case.

xhci can not reset the host side of the endpoint unless it really is halted.
xhci 4.6.8:

"If the endpoint is not in the Halted state when an Reset Endpoint Command
is executed -The xHC shall reject the command and generate a Command
Completion Event with the Completion Code set to Context State Error."


Normal halt/stall case is that xhci receives a STALL from the device,
and immediately resets the endpoint (clears toggle, host side) then
propagates the HALT status to usb core.
USB core then sends SetFeature(CLEAR_HALT) to the device which will reset the
toggle for the device side of the endpoint, and host and device toggles
will be in sync.

After this xhci_endpoint_reset() is called by usb core to inform xhci that the
endpoint was reset, but currently we don't do anything in it.

If SetFeature(CLEAR_HALT) is called without the endpoint actually being HALTED, we cannot
reset it from xhci. We should issue a config endpoint command to reset the host side
toggle, as mentioned in xhci 1.0 120814 as a last note:

"Note: The Reset Endpoint Command may only be issued to endpoints in the Halted state.
If software wishes reset the Data Toggle or Sequence Number of an endpoint that isn't
in the Halted state, then software may issue a Configure Endpoint Command with the Drop
and Add bits set for the target endpoint. that is in the Stopped state."
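
As a rough illustration of that note (not the driver's actual code), the choice of command and the Drop/Add flag for a non-control endpoint could be sketched as below. The enum and function names are hypothetical. The flag position assumes the usual xHCI Device Context Index, DCI = endpoint number * 2 + direction (IN = 1), with the same bit used in both the Drop and Add Context flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of endpoint states relevant to a toggle reset. */
enum ep_state { EP_RUNNING, EP_HALTED, EP_STOPPED };

/* Hypothetical result: which step would reset the host-side toggle. */
enum toggle_reset_cmd {
    CMD_STOP_FIRST,         /* endpoint must be stopped before anything else */
    CMD_RESET_ENDPOINT,     /* only valid for a halted endpoint */
    CMD_CONFIGURE_ENDPOINT  /* drop + add the same endpoint */
};

/* Device Context Index for a non-control endpoint address (e.g. 0x81). */
static unsigned int ep_dci(uint8_t ep_addr)
{
    unsigned int num = ep_addr & 0x0f;
    unsigned int in = (ep_addr & 0x80) ? 1 : 0;

    assert(num != 0); /* the control endpoint cannot be dropped */
    return num * 2 + in;
}

/* Bit to set in both the Drop and Add Context flags for this endpoint. */
static uint32_t ep_ctx_flag(uint8_t ep_addr)
{
    return (uint32_t)1 << ep_dci(ep_addr);
}

/* Pick the command according to the spec note quoted above. */
static enum toggle_reset_cmd pick_toggle_reset(enum ep_state state)
{
    switch (state) {
    case EP_HALTED:
        return CMD_RESET_ENDPOINT;
    case EP_STOPPED:
        return CMD_CONFIGURE_ENDPOINT;
    default:
        return CMD_STOP_FIRST;
    }
}
```

For IN endpoint 1 (address 0x81) this yields DCI 3, so bit 3 would be set in both flag words of the input control context.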

There was a case with a scanner we believed had the same issue, and we tried to
resolve it by issuing the configure endpoint command in xhci_endpoint_reset(), but
if I remember correctly it did not resolve the case and the code never got anywhere.

I might have some really old implementation somewhere for this, at least there is
a really old and outdated hack at


git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git ep_reset_halt_test
https://git.kernel.org/cgit/linux/kernel/git/mnyman/xhci.git/log/?h=ep_reset_halt_test

which really is quite a hack, and based on the 3.19 kernel, so it's probably only useful
as an idea to base a real solution on.

-Mathias

2016-12-15 15:42:26

by Michal Necasek

Subject: Re: xhci_reset_endpoint() doesn't reset endpoint

On 12/14/2016 3:28 PM, Mathias Nyman wrote:
> On 14.12.2016 12:58, Michal Necasek wrote:
>> prior to the endpoint reset. SetFeature(CLEAR_HALT) resets the toggle
>> on the device, but not on the host. But we know for a fact that the
>> device sends a packet (with data toggle 0) which the host USB stack
>> never sees, and a data toggle mismatch explains that quite well.
>>
>> We are using USBFS to talk to the printer, but that shouldn't matter
>> much. I will note that the available documentation<1> explicitly says
>> that USBDEVFS_RESETEP and USBDEVFS_CLEAR_HALT both reset the data
>> toggle. That is indeed the case for the Linux EHCI driver but not
>> xHCI. Both of the USBFS IOCTLs call into xhci_reset_endpoint() which
>> does nothing.
>>
>
> This is very likely the case.
>
Thanks for confirming that.

> xhci can not reset the host side of the endpoint unless it really is
> halted.
> xhci 4.6.8:
>
> "If the endpoint is not in the Halted state when an Reset Endpoint Command
> is executed -The xHC shall reject the command and generate a Command
> Completion Event with the Completion Code set to Context State Error."
>
Right, the Reset Endpoint is limited to halted endpoints, not even
stopped ones. It's really only useful for error cleanup.

> Normal halt/stall case is that xhci receives a STALL from the device,
> and immediately resets the endpoint (clears toggle, host side) then
> propagates the HALT status to usb core.
> USB core then sends SetFeature(CLEAR_HALT) to the device which will
> reset the
> toggle for the device side of the endpoint, and host and device toggles
> will be in sync.
>
> After this xhci_endpoint_reset() is called by usb core to inform xhci
> that the
> endpoint was reset, but currently we don't do anything in it.
>
OK, that's what I thought was happening.

> If SetFeature(CLEAR_HALT) is called without endpoint actually being
> HALTED we can not
> reset it from xhci. we should issue a config endpoint command to reset
> the host side
> toggle, as mentioned in xhci 1.0 120814 as a last note:
>
> "Note: The Reset Endpoint Command may only be issued to endpoints in the
> Halted state.
> If software wishes reset the Data Toggle or Sequence Number of an
> endpoint that isn't
> in the Halted state, then software may issue a Configure Endpoint
> Command with the Drop
> and Add bits set for the target endpoint. that is in the Stopped state."
>
> There was a case with a scanner we believed had the same issue, and we
> tried to
> resolve it by issuing the configure endpoint command in
> xhci_endpoint_reset() but
> if I remember correctly It did not resolve the case and code never got
> anywhere.
>
Ah, that's interesting. I'm not surprised that we weren't the first
ones running into this. The scenario is a bit obscure but not very
device specific.

> I might have some really old implementation somewhere for this, at least
> there is
> a really old and outdated hack at
>
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git
> ep_reset_halt_test
> https://git.kernel.org/cgit/linux/kernel/git/mnyman/xhci.git/log/?h=ep_reset_halt_test
>
>
> which really is quite a hack, and based on 3.19 kernel so it's probably
> only useful
> as an Idea to base a real solution on.
>
Thanks, we'll have to try it out. It looks like the right approach to me.

In the meantime, I did a little survey of other operating systems. I
know Windows can do it (reset the data toggle for non-halted endpoint),
because the printer works just fine on a Windows host, using both Intel
xHCI drivers (Windows 7) and Microsoft xHCI drivers (Windows 10). And
that is true whether the printer is attached to the host or passed
through to a VM. I do not yet fully understand how Windows does it.

Solaris can't do it. FreeBSD can, I believe, but the FreeBSD method is
very heavy-handed (they just reset everything in sight that's remotely
related to the endpoint).

I also looked at Apple's xHCI driver <1> (not new, but the most recent
one published) and found that they do exactly what the xHCI 1.1 spec
suggests, doing a dummy Configure Endpoint command which drops and
re-adds the endpoint without explicitly changing its state. There are
even some interesting comments: "State is not halted, so the quiesce
endpoint did not clear the toggles. This will clear the toggles", and
"Now do configure endpoint with both add and drop flags set."


Thanks,
Michal

1:

https://opensource.apple.com/source/IOUSBFamily/IOUSBFamily-560.4.2/AppleUSBXHCI/Classes/AppleUSBXHCIUIM.cpp.auto.html