2015-12-01 08:21:56

by Mathias Nyman

[permalink] [raw]
Subject: [TESTPATCH v2] xhci: fix usb2 resume timing and races.

usb2 ports need to signal resume for 20ms before moving to U0 state.
Both device and host can initiate resume.

On host initated resume port is set to resume state, sleep 20ms,
and finally set port to U0 state.

On device initated resume a port status interrupt with a port in resume
state in issued. The interrupt handler tags a resume_done[port]
timestamp with current time + 20ms, and kick roothub timer.
Root hub timer requests for port status, finds the port in resume state,
checks if resume_done[port] timestamp passed, and set port to U0 state.

There are a few issues with this approach,
1. A host initated resume will also generate a resume event, the event
handler will find the port in resume state, believe it's a device
initated and act accordingly.

2. A port status request might cut the 20ms resume signalling short if a
get_port_status request is handled during the 20ms host resume.
The port will be found in resume state. The timestamp is not set leading
to time_after_eq(jiffoes, timestamp) returning true, as timestamp = 0.
get_port_status will proceed with moving the port to U0.

3. If an error, or anything else happends to the port during device
initated 20ms resume signalling it will leave all device resume
parameters hanging uncleared preventing further resume.

Fix this by using the existing resuming_ports bitfield to indicate if
resume signalling timing is taken care of.
Also check if the resume_done[port] is set before using it in time
comparison. Also clear out any resume signalling related variables if port
is not in U0 or Resume state.

v2. fix parentheses when checking for uncleared resume variables.
we want: if ((unclear1 OR unclear2 ) AND !in_resume AND !in_U3) { .. }

Signed-off-by: Mathias Nyman <[email protected]>
---
drivers/usb/host/xhci-hub.c | 38 ++++++++++++++++++++++++++++++++++++--
drivers/usb/host/xhci-ring.c | 3 ++-
2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 78241b5..a750298 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -616,8 +616,30 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd,
if ((raw_port_status & PORT_RESET) ||
!(raw_port_status & PORT_PE))
return 0xffffffff;
- if (time_after_eq(jiffies,
- bus_state->resume_done[wIndex])) {
+ /* did port event handler already start 20ms resume timing? */
+ if (!bus_state->resume_done[wIndex]) {
+ /* If not, maybe we are in a host initated resume? */
+ if (test_bit(wIndex, &bus_state->resuming_ports)) {
+ /* Host initated resume doesn't time the resume
+ * signalling using resume_done[].
+ * It manually sets RESUME state, sleeps 20ms
+ * and sets U0 state. This should probably be
+ * changed, but not right now, do nothing
+ */
+ } else {
+ /* port resume was discovered now and here,
+ * start resume timing
+ */
+ unsigned long timeout = jiffies +
+ msecs_to_jiffies(USB_RESUME_TIMEOUT);
+
+ set_bit(wIndex, &bus_state->resuming_ports);
+ bus_state->resume_done[wIndex] = timeout;
+ mod_timer(&hcd->rh_timer, timeout);
+ }
+ /* Has resume been signalled for 20ms? yet? */
+ } else if (time_after_eq(jiffies,
+ bus_state->resume_done[wIndex])) {
int time_left;

xhci_dbg(xhci, "Resume USB2 port %d\n",
@@ -665,6 +687,16 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd,
status |= USB_PORT_STAT_SUSPEND;
}
}
+ /* Clear stale usb2 resume signalling variables in case port changed
+ * state during 20ms resume signalling. For example on error
+ */
+ if ((bus_state->resume_done[wIndex] ||
+ test_bit(wIndex, &bus_state->resuming_ports)) &&
+ (raw_port_status & PORT_PLS_MASK) != XDEV_U3 &&
+ (raw_port_status & PORT_PLS_MASK) != XDEV_RESUME) {
+ bus_state->resume_done[wIndex] = 0;
+ clear_bit(wIndex, &bus_state->resuming_ports);
+ }
if ((raw_port_status & PORT_PLS_MASK) == XDEV_U0
&& (raw_port_status & PORT_POWER)
&& (bus_state->suspended_ports & (1 << wIndex))) {
@@ -995,6 +1027,7 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
if ((temp & PORT_PE) == 0)
goto error;

+ set_bit(wIndex, &bus_state->resuming_ports);
xhci_set_link_state(xhci, port_array, wIndex,
XDEV_RESUME);
spin_unlock_irqrestore(&xhci->lock, flags);
@@ -1002,6 +1035,7 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
spin_lock_irqsave(&xhci->lock, flags);
xhci_set_link_state(xhci, port_array, wIndex,
XDEV_U0);
+ clear_bit(wIndex, &bus_state->resuming_ports);
}
bus_state->port_c_suspend |= 1 << wIndex;

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 97ffe39..3743bb2 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1583,7 +1583,8 @@ static void handle_port_status(struct xhci_hcd *xhci,
*/
bogus_port_status = true;
goto cleanup;
- } else {
+ } else if (!test_bit(faked_port_index,
+ &bus_state->resuming_ports)) {
xhci_dbg(xhci, "resume HS port %d\n", port_id);
bus_state->resume_done[faked_port_index] = jiffies +
msecs_to_jiffies(USB_RESUME_TIMEOUT);
--
1.9.1


2015-12-01 14:32:18

by Felipe Balbi

[permalink] [raw]
Subject: Re: [TESTPATCH v2] xhci: fix usb2 resume timing and races.


Hi,

Mathias Nyman <[email protected]> writes:
> usb2 ports need to signal resume for 20ms before moving to U0 state.

at least 20ms ;-) Recently, we decided to drive resume for 40ms to
support devices with broken FW.

> Both device and host can initiate resume.
>
> On host initated resume port is set to resume state, sleep 20ms,

sleep 40ms:

include/linux/usb.h:232:#define USB_RESUME_TIMEOUT 40 /* ms */

> and finally set port to U0 state.
>
> On device initated resume a port status interrupt with a port in resume
> state in issued. The interrupt handler tags a resume_done[port]
> timestamp with current time + 20ms, and kick roothub timer.

+ 40ms ?

> Root hub timer requests for port status, finds the port in resume state,
> checks if resume_done[port] timestamp passed, and set port to U0 state.
>
> There are a few issues with this approach,
> 1. A host initated resume will also generate a resume event, the event
> handler will find the port in resume state, believe it's a device
> initated and act accordingly.
>
> 2. A port status request might cut the 20ms resume signalling short if a
> get_port_status request is handled during the 20ms host resume.
> The port will be found in resume state. The timestamp is not set leading
> to time_after_eq(jiffoes, timestamp) returning true, as timestamp = 0.
> get_port_status will proceed with moving the port to U0.
>
> 3. If an error, or anything else happends to the port during device
> initated 20ms resume signalling it will leave all device resume
> parameters hanging uncleared preventing further resume.
>
> Fix this by using the existing resuming_ports bitfield to indicate if
> resume signalling timing is taken care of.
> Also check if the resume_done[port] is set before using it in time
> comparison. Also clear out any resume signalling related variables if port
> is not in U0 or Resume state.
>
> v2. fix parentheses when checking for uncleared resume variables.
> we want: if ((unclear1 OR unclear2 ) AND !in_resume AND !in_U3) { .. }

this 'v2' note doesn't have to go into commit log, IMO.

> Signed-off-by: Mathias Nyman <[email protected]>
> ---
> drivers/usb/host/xhci-hub.c | 38 ++++++++++++++++++++++++++++++++++++--
> drivers/usb/host/xhci-ring.c | 3 ++-
> 2 files changed, 38 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
> index 78241b5..a750298 100644
> --- a/drivers/usb/host/xhci-hub.c
> +++ b/drivers/usb/host/xhci-hub.c
> @@ -616,8 +616,30 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd,
> if ((raw_port_status & PORT_RESET) ||
> !(raw_port_status & PORT_PE))
> return 0xffffffff;
> - if (time_after_eq(jiffies,
> - bus_state->resume_done[wIndex])) {
> + /* did port event handler already start 20ms resume timing? */
> + if (!bus_state->resume_done[wIndex]) {
> + /* If not, maybe we are in a host initated resume? */
> + if (test_bit(wIndex, &bus_state->resuming_ports)) {
> + /* Host initated resume doesn't time the resume
> + * signalling using resume_done[].
> + * It manually sets RESUME state, sleeps 20ms
> + * and sets U0 state. This should probably be
> + * changed, but not right now, do nothing
> + */
> + } else {
> + /* port resume was discovered now and here,
> + * start resume timing
> + */
> + unsigned long timeout = jiffies +
> + msecs_to_jiffies(USB_RESUME_TIMEOUT);
> +
> + set_bit(wIndex, &bus_state->resuming_ports);
> + bus_state->resume_done[wIndex] = timeout;
> + mod_timer(&hcd->rh_timer, timeout);
> + }
> + /* Has resume been signalled for 20ms? yet? */

How about: "Has resume been signalled for at least 20ms?"

Or even better:

Has resume been signalled for at least USB_RESUME_TIMEOUT ms yet ?

> + } else if (time_after_eq(jiffies,
> + bus_state->resume_done[wIndex])) {
> int time_left;
>
> xhci_dbg(xhci, "Resume USB2 port %d\n",
> @@ -665,6 +687,16 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd,
> status |= USB_PORT_STAT_SUSPEND;
> }
> }
> + /* Clear stale usb2 resume signalling variables in case port changed
> + * state during 20ms resume signalling. For example on error
> + */
> + if ((bus_state->resume_done[wIndex] ||
> + test_bit(wIndex, &bus_state->resuming_ports)) &&
> + (raw_port_status & PORT_PLS_MASK) != XDEV_U3 &&
> + (raw_port_status & PORT_PLS_MASK) != XDEV_RESUME) {
> + bus_state->resume_done[wIndex] = 0;
> + clear_bit(wIndex, &bus_state->resuming_ports);
> + }
> if ((raw_port_status & PORT_PLS_MASK) == XDEV_U0
> && (raw_port_status & PORT_POWER)
> && (bus_state->suspended_ports & (1 << wIndex))) {
> @@ -995,6 +1027,7 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
> if ((temp & PORT_PE) == 0)
> goto error;
>
> + set_bit(wIndex, &bus_state->resuming_ports);
> xhci_set_link_state(xhci, port_array, wIndex,
> XDEV_RESUME);
> spin_unlock_irqrestore(&xhci->lock, flags);
> @@ -1002,6 +1035,7 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
> spin_lock_irqsave(&xhci->lock, flags);
> xhci_set_link_state(xhci, port_array, wIndex,
> XDEV_U0);
> + clear_bit(wIndex, &bus_state->resuming_ports);
> }
> bus_state->port_c_suspend |= 1 << wIndex;
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 97ffe39..3743bb2 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -1583,7 +1583,8 @@ static void handle_port_status(struct xhci_hcd *xhci,
> */
> bogus_port_status = true;
> goto cleanup;
> - } else {
> + } else if (!test_bit(faked_port_index,
> + &bus_state->resuming_ports)) {
> xhci_dbg(xhci, "resume HS port %d\n", port_id);
> bus_state->resume_done[faked_port_index] = jiffies +
> msecs_to_jiffies(USB_RESUME_TIMEOUT);
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
balbi


Attachments:
signature.asc (818.00 B)

2015-12-01 15:04:10

by Mathias Nyman

[permalink] [raw]
Subject: Re: [TESTPATCH v2] xhci: fix usb2 resume timing and races.

On 01.12.2015 16:32, Felipe Balbi wrote:
>
> Hi,
>
> Mathias Nyman <[email protected]> writes:
>> usb2 ports need to signal resume for 20ms before moving to U0 state.
>
> at least 20ms ;-) Recently, we decided to drive resume for 40ms to
> support devices with broken FW.
>

True, but specs talk about 20ms, and I'm just trying to give some context for what's
going on. This testpatch doesn't touch the timings.

Daniel is able to trigger a USB2 xhci resume issue which I hope is fixed with this patch.
This is especially made for his setup running a 4.3 kernel

If this works I'll clean up all the "20ms" in the commit message and comments.

Just noticed that xhci USB2 host initiated resume uses a hardcoded msleep(20).
That needs to be changed to USB_RESUME_TIMEOUT at some point.
For now I'm just interested in knowing if this patch works.

>
> this 'v2' note doesn't have to go into commit log, IMO.
>

It's going to be cleaned out as well, just there to explain to Daniel, and the world why
a second version of the testpatch was created.

-Mathias

2015-12-01 15:12:09

by Felipe Balbi

[permalink] [raw]
Subject: Re: [TESTPATCH v2] xhci: fix usb2 resume timing and races.


Hi,

Mathias Nyman <[email protected]> writes:
> On 01.12.2015 16:32, Felipe Balbi wrote:
>>
>> Hi,
>>
>> Mathias Nyman <[email protected]> writes:
>>> usb2 ports need to signal resume for 20ms before moving to U0 state.
>>
>> at least 20ms ;-) Recently, we decided to drive resume for 40ms to
>> support devices with broken FW.
>>
>
> True, but specs talk about 20ms, and I'm just trying to give some context for what's
> going on. This testpatch doesn't touch the timings.
>
> Daniel is able to trigger a USB2 xhci resume issue which I hope is fixed with this patch.
> This is especially made for his setup running a 4.3 kernel
>
> If this works I'll clean up all the "20ms" in the commit message and comments.
>
> Just noticed that xhci USB2 host initiated resume uses a hardcoded msleep(20).
> That needs to be changed to USB_RESUME_TIMEOUT at some point.
> For now I'm just interested in knowing if this patch works.
>
>>
>> this 'v2' note doesn't have to go into commit log, IMO.
>>
>
> It's going to be cleaned out as well, just there to explain to Daniel, and the world why
> a second version of the testpatch was created.

all right, thanks ;-)

--
balbi


Attachments:
signature.asc (818.00 B)

2015-12-05 04:03:03

by Daniel J Blueman

[permalink] [raw]
Subject: Re: [TESTPATCH v2] xhci: fix usb2 resume timing and races.

On 1 December 2015 at 16:26, Mathias Nyman
<[email protected]> wrote:
> usb2 ports need to signal resume for 20ms before moving to U0 state.
> Both device and host can initiate resume.
>
> On host initated resume port is set to resume state, sleep 20ms,
> and finally set port to U0 state.
>
> On device initated resume a port status interrupt with a port in resume
> state in issued. The interrupt handler tags a resume_done[port]
> timestamp with current time + 20ms, and kick roothub timer.
> Root hub timer requests for port status, finds the port in resume state,
> checks if resume_done[port] timestamp passed, and set port to U0 state.
>
> There are a few issues with this approach,
> 1. A host initated resume will also generate a resume event, the event
> handler will find the port in resume state, believe it's a device
> initated and act accordingly.
>
> 2. A port status request might cut the 20ms resume signalling short if a
> get_port_status request is handled during the 20ms host resume.
> The port will be found in resume state. The timestamp is not set leading
> to time_after_eq(jiffoes, timestamp) returning true, as timestamp = 0.
> get_port_status will proceed with moving the port to U0.
>
> 3. If an error, or anything else happends to the port during device
> initated 20ms resume signalling it will leave all device resume
> parameters hanging uncleared preventing further resume.
>
> Fix this by using the existing resuming_ports bitfield to indicate if
> resume signalling timing is taken care of.
> Also check if the resume_done[port] is set before using it in time
> comparison. Also clear out any resume signalling related variables if port
> is not in U0 or Resume state.
>
> v2. fix parentheses when checking for uncleared resume variables.
> we want: if ((unclear1 OR unclear2 ) AND !in_resume AND !in_U3) { .. }
>
> Signed-off-by: Mathias Nyman <[email protected]>

Excellent; this correctly prevents the cyclic chain of suspend
attempts, resolving the issue.

Tested-by: Daniel J Blueman <[email protected]>

Thanks Mathias!
Daniel

2015-12-08 11:11:17

by Mathias Nyman

[permalink] [raw]
Subject: Re: [TESTPATCH v2] xhci: fix usb2 resume timing and races.

On 05.12.2015 06:02, Daniel J Blueman wrote:
> On 1 December 2015 at 16:26, Mathias Nyman
> <[email protected]> wrote:
>> usb2 ports need to signal resume for 20ms before moving to U0 state.
>> Both device and host can initiate resume.
>>
>> On host initated resume port is set to resume state, sleep 20ms,
>> and finally set port to U0 state.
>>
>> On device initated resume a port status interrupt with a port in resume
>> state in issued. The interrupt handler tags a resume_done[port]
>> timestamp with current time + 20ms, and kick roothub timer.
>> Root hub timer requests for port status, finds the port in resume state,
>> checks if resume_done[port] timestamp passed, and set port to U0 state.
>>
>> There are a few issues with this approach,
>> 1. A host initated resume will also generate a resume event, the event
>> handler will find the port in resume state, believe it's a device
>> initated and act accordingly.
>>
>> 2. A port status request might cut the 20ms resume signalling short if a
>> get_port_status request is handled during the 20ms host resume.
>> The port will be found in resume state. The timestamp is not set leading
>> to time_after_eq(jiffoes, timestamp) returning true, as timestamp = 0.
>> get_port_status will proceed with moving the port to U0.
>>
>> 3. If an error, or anything else happends to the port during device
>> initated 20ms resume signalling it will leave all device resume
>> parameters hanging uncleared preventing further resume.
>>
>> Fix this by using the existing resuming_ports bitfield to indicate if
>> resume signalling timing is taken care of.
>> Also check if the resume_done[port] is set before using it in time
>> comparison. Also clear out any resume signalling related variables if port
>> is not in U0 or Resume state.
>>
>> v2. fix parentheses when checking for uncleared resume variables.
>> we want: if ((unclear1 OR unclear2 ) AND !in_resume AND !in_U3) { .. }
>>
>> Signed-off-by: Mathias Nyman <[email protected]>
>
> Excellent; this correctly prevents the cyclic chain of suspend
> attempts, resolving the issue.
>
> Tested-by: Daniel J Blueman <[email protected]>
>
> Thanks Mathias!
> Daniel
>

Thanks for testing it
I'll do some minor cleanups and add it to the queue

-Mathias