2021-08-13 04:06:02

by Tao Wang

Subject: [PATCH] usb: xhci-ring: set all cancelled_td's cancel_status to TD_CLEARING_CACHE

A USB SSD may fail to unmount if it is disconnected during a data transfer.

The unmount gets stuck in usb_kill_urb() because the urb's use_count never
drops to zero, which means the urb is never given back.
xhci_handle_cmd_set_deq() gives back an urb only if its td's cancel_status
equals TD_CLEARING_CACHE,
but xhci_invalidate_cancelled_tds() sets only the last cancelled td's
cancel_status to TD_CLEARING_CACHE,
so only the last urb is given back.

This change sets the cancel_status of every cancelled td to TD_CLEARING_CACHE
rather than only the last one, so that all urbs can be given back.

Signed-off-by: Tao Wang <[email protected]>
---
drivers/usb/host/xhci-ring.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 8fea44b..c7dd7c0 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -960,19 +960,19 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
                        td_to_noop(xhci, ring, td, false);
                        td->cancel_status = TD_CLEARED;
                }
-       }
-       if (cached_td) {
-               cached_td->cancel_status = TD_CLEARING_CACHE;
-
-               err = xhci_move_dequeue_past_td(xhci, slot_id, ep->ep_index,
-                                               cached_td->urb->stream_id,
-                                               cached_td);
-               /* Failed to move past cached td, try just setting it noop */
-               if (err) {
-                       td_to_noop(xhci, ring, cached_td, false);
-                       cached_td->cancel_status = TD_CLEARED;
+               if (cached_td) {
+                       cached_td->cancel_status = TD_CLEARING_CACHE;
+
+                       err = xhci_move_dequeue_past_td(xhci, slot_id, ep->ep_index,
+                                                       cached_td->urb->stream_id,
+                                                       cached_td);
+                       /* Failed to move past cached td, try just setting it noop */
+                       if (err) {
+                               td_to_noop(xhci, ring, cached_td, false);
+                               cached_td->cancel_status = TD_CLEARED;
+                       }
+                       cached_td = NULL;
                }
-               cached_td = NULL;
        }
        return 0;
 }
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
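
The failure mode described in the commit message above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: the toy_* names and the simplified struct are hypothetical and are not the kernel's data structures; only the status names are taken from the thread. It contrasts marking just the last cached TD with marking every cached TD, then counts how many URBs a Set TR Deq completion would give back.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: status names echo the thread, everything else is hypothetical. */
enum toy_cancel_status { TD_DIRTY, TD_HALTED, TD_CLEARING_CACHE, TD_CLEARED };

struct toy_td {
	enum toy_cancel_status cancel_status;
	int given_back; /* set once the URB has been returned to its submitter */
};

/* Behaviour described as buggy: only the last cached TD gets marked. */
static void toy_mark_last_only(struct toy_td *tds, size_t n)
{
	struct toy_td *cached_td = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (tds[i].cancel_status == TD_DIRTY ||
		    tds[i].cancel_status == TD_HALTED)
			cached_td = &tds[i];
	if (cached_td)
		cached_td->cancel_status = TD_CLEARING_CACHE;
}

/* Behaviour the patch aims for: every cached TD gets marked. */
static void toy_mark_all(struct toy_td *tds, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tds[i].cancel_status == TD_DIRTY ||
		    tds[i].cancel_status == TD_HALTED)
			tds[i].cancel_status = TD_CLEARING_CACHE;
}

/* Set TR Deq completion: give back only TDs in TD_CLEARING_CACHE. */
static size_t toy_handle_set_deq(struct toy_td *tds, size_t n)
{
	size_t i, given = 0;

	for (i = 0; i < n; i++) {
		if (tds[i].cancel_status != TD_CLEARING_CACHE)
			continue;
		tds[i].cancel_status = TD_CLEARED;
		tds[i].given_back = 1;
		given++;
	}
	return given;
}
```

With three cached TDs, toy_mark_last_only() followed by toy_handle_set_deq() gives back one URB and leaves two pending forever, which matches the usb_kill_urb() hang reported above; toy_mark_all() lets all three be given back.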


2021-08-13 09:10:22

by Mathias Nyman

Subject: Re: [PATCH] usb: xhci-ring: set all cancelled_td's cancel_status to TD_CLEARING_CACHE

On 13.8.2021 11.44, [email protected] wrote:
> On 2021-08-13 15:25, Ikjoon Jang wrote:
>> Hi,
>>
>> On Fri, Aug 13, 2021 at 10:44 AM Tao Wang <[email protected]> wrote:
>>>
>>> USB SSD may fail to unmount if disconnect during data transferring.
>>>
>>> it stuck in usb_kill_urb() due to urb use_count will not become zero,
>>> this means urb giveback is not happen.
>>> in xhci_handle_cmd_set_deq() will giveback urb if td's cancel_status
>>> equal to TD_CLEARING_CACHE,
>>> but in xhci_invalidate_cancelled_tds(), only last canceled td's
>>> cancel_status change to TD_CLEARING_CACHE,
>>> thus giveback only happen to last urb.
>>>
>>> this change set all cancelled_td's cancel_status to TD_CLEARING_CACHE
>>> rather than the last one, so all urb can giveback.
>>>
>>> Signed-off-by: Tao Wang <[email protected]>
>>> ---
>>>  drivers/usb/host/xhci-ring.c | 24 ++++++++++++------------
>>>  1 file changed, 12 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
>>> index 8fea44b..c7dd7c0 100644
>>> --- a/drivers/usb/host/xhci-ring.c
>>> +++ b/drivers/usb/host/xhci-ring.c
>>> @@ -960,19 +960,19 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
>>>                         td_to_noop(xhci, ring, td, false);
>>>                         td->cancel_status = TD_CLEARED;
>>>                 }
>>> -       }
>>> -       if (cached_td) {
>>> -               cached_td->cancel_status = TD_CLEARING_CACHE;
>>> -
>>> -               err = xhci_move_dequeue_past_td(xhci, slot_id, ep->ep_index,
>>> -                                               cached_td->urb->stream_id,
>>> -                                               cached_td);
>>> -               /* Failed to move past cached td, try just setting it noop */
>>> -               if (err) {
>>> -                       td_to_noop(xhci, ring, cached_td, false);
>>> -                       cached_td->cancel_status = TD_CLEARED;
>>> +               if (cached_td) {
>>> +                       cached_td->cancel_status = TD_CLEARING_CACHE;
>>> +
>>> +                       err = xhci_move_dequeue_past_td(xhci, slot_id, ep->ep_index,
>>> +                                                       cached_td->urb->stream_id,
>>> +                                                       cached_td);
>>> +                       /* Failed to move past cached td, try just setting it noop */
>>> +                       if (err) {
>>> +                               td_to_noop(xhci, ring, cached_td, false);
>>> +                               cached_td->cancel_status = TD_CLEARED;
>>> +                       }
>>> +                       cached_td = NULL;
>>>                 }
>>> -               cached_td = NULL;
>>
>> I think we can call xhci_move_dequeue_past_td() just once to
>> the last halted && cancelled TD in a ring.
>>
>> But that might need to compare two TDs to see which one is
>> the latter, I'm not sure how to do this well. :-/
>>
>> if (!cached_td || cached_td < td)
>>   cached_td = td;
>>
>
> thanks, I think you are correct that we can call xhci_move_dequeue_past_td() just once to
>  the last halted && cancelled TD in a ring,
> but the set status "cached_td->cancel_status = TD_CLEARING_CACHE;" should be every cancelled TD.
> I am not very good at td and ring, I have a question why we need to
> compare two TDs to see which one is the latter.

I'm debugging the exact same issue.
For normal endpoints (no streams) it should be enough to set cancel_td->cancel_status = TD_CLEARING_CACHE
in the TD_DIRTY and TD_HALTED case.

We don't need to move the dq past the last cancelled TD, as the other cancelled TDs are set to no-op and
the command to move the dq will flush the xHC controller's TD cache and make it re-read the no-ops.
(Just make sure we call xhci_move_dequeue_past_td() _after_ overwriting the cancelled TDs with no-ops.)

Streams get trickier, as each endpoint has several rings and we might need to move the dq pointer for
many stream rings on that endpoint. This needs more work, as we shouldn't start the endpoint before
all the move dq commands complete, i.e. the current ep->ep_state &= ~SET_DEQ_PENDING isn't enough.

-Mathias
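
The point about streams can be made concrete with a toy comparison between a single pending flag and a count of outstanding commands. This is a sketch only: the toy_* names, the flag value and the deq_pending counter are assumptions made for illustration, and nothing here claims to be how the driver does or will track this.

```c
#include <assert.h>

/* Hypothetical flag value and fields, chosen for illustration only. */
#define TOY_SET_DEQ_PENDING (1u << 0)

struct toy_ep {
	unsigned int ep_state;    /* single-flag tracking, as quoted in the thread */
	unsigned int deq_pending; /* assumed counter, not a real driver field */
};

/* Queue one "move dequeue pointer" command for one stream ring. */
static void toy_queue_set_deq(struct toy_ep *ep)
{
	ep->ep_state |= TOY_SET_DEQ_PENDING;
	ep->deq_pending++;
}

/* One command completed: the flag is cleared, the counter is decremented. */
static void toy_complete_set_deq(struct toy_ep *ep)
{
	ep->ep_state &= ~TOY_SET_DEQ_PENDING;
	if (ep->deq_pending > 0)
		ep->deq_pending--;
}

/* Flag view: restart looks safe as soon as the bit is clear. */
static int toy_flag_allows_restart(const struct toy_ep *ep)
{
	return !(ep->ep_state & TOY_SET_DEQ_PENDING);
}

/* Counter view: restart is safe only when no command is outstanding. */
static int toy_counter_allows_restart(const struct toy_ep *ep)
{
	return ep->deq_pending == 0;
}
```

If three commands are queued for three stream rings and one completes, the flag view already reports the endpoint as restartable while the counter view still shows two commands outstanding, which is the gap described above.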

2021-08-13 09:52:40

by Ikjoon Jang

Subject: Re: [PATCH] usb: xhci-ring: set all cancelled_td's cancel_status to TD_CLEARING_CACHE

Hi,

On Fri, Aug 13, 2021 at 10:44 AM Tao Wang <[email protected]> wrote:
>
> USB SSD may fail to unmount if disconnect during data transferring.
>
> it stuck in usb_kill_urb() due to urb use_count will not become zero,
> this means urb giveback is not happen.
> in xhci_handle_cmd_set_deq() will giveback urb if td's cancel_status
> equal to TD_CLEARING_CACHE,
> but in xhci_invalidate_cancelled_tds(), only last canceled td's
> cancel_status change to TD_CLEARING_CACHE,
> thus giveback only happen to last urb.
>
> this change set all cancelled_td's cancel_status to TD_CLEARING_CACHE
> rather than the last one, so all urb can giveback.
>
> Signed-off-by: Tao Wang <[email protected]>
> ---
> drivers/usb/host/xhci-ring.c | 24 ++++++++++++------------
> 1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 8fea44b..c7dd7c0 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -960,19 +960,19 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
> td_to_noop(xhci, ring, td, false);
> td->cancel_status = TD_CLEARED;
> }
> - }
> - if (cached_td) {
> - cached_td->cancel_status = TD_CLEARING_CACHE;
> -
> - err = xhci_move_dequeue_past_td(xhci, slot_id, ep->ep_index,
> - cached_td->urb->stream_id,
> - cached_td);
> - /* Failed to move past cached td, try just setting it noop */
> - if (err) {
> - td_to_noop(xhci, ring, cached_td, false);
> - cached_td->cancel_status = TD_CLEARED;
> + if (cached_td) {
> + cached_td->cancel_status = TD_CLEARING_CACHE;
> +
> + err = xhci_move_dequeue_past_td(xhci, slot_id, ep->ep_index,
> + cached_td->urb->stream_id,
> + cached_td);
> + /* Failed to move past cached td, try just setting it noop */
> + if (err) {
> + td_to_noop(xhci, ring, cached_td, false);
> + cached_td->cancel_status = TD_CLEARED;
> + }
> + cached_td = NULL;
> }
> - cached_td = NULL;

I think we can call xhci_move_dequeue_past_td() just once to
the last halted && cancelled TD in a ring.

But that might need to compare two TDs to see which one is
the latter, I'm not sure how to do this well. :-/

if (!cached_td || cached_td < td)
  cached_td = td;

> }
> return 0;
> }
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>

2021-08-13 09:55:49

by Tao Wang

Subject: Re: [PATCH] usb: xhci-ring: set all cancelled_td's cancel_status to TD_CLEARING_CACHE

On 2021-08-13 15:25, Ikjoon Jang wrote:
> Hi,
>
> On Fri, Aug 13, 2021 at 10:44 AM Tao Wang <[email protected]> wrote:
>>
>> USB SSD may fail to unmount if disconnect during data transferring.
>>
>> it stuck in usb_kill_urb() due to urb use_count will not become zero,
>> this means urb giveback is not happen.
>> in xhci_handle_cmd_set_deq() will giveback urb if td's cancel_status
>> equal to TD_CLEARING_CACHE,
>> but in xhci_invalidate_cancelled_tds(), only last canceled td's
>> cancel_status change to TD_CLEARING_CACHE,
>> thus giveback only happen to last urb.
>>
>> this change set all cancelled_td's cancel_status to TD_CLEARING_CACHE
>> rather than the last one, so all urb can giveback.
>>
>> Signed-off-by: Tao Wang <[email protected]>
>> ---
>> drivers/usb/host/xhci-ring.c | 24 ++++++++++++------------
>> 1 file changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/usb/host/xhci-ring.c
>> b/drivers/usb/host/xhci-ring.c
>> index 8fea44b..c7dd7c0 100644
>> --- a/drivers/usb/host/xhci-ring.c
>> +++ b/drivers/usb/host/xhci-ring.c
>> @@ -960,19 +960,19 @@ static int xhci_invalidate_cancelled_tds(struct
>> xhci_virt_ep *ep)
>> td_to_noop(xhci, ring, td, false);
>> td->cancel_status = TD_CLEARED;
>> }
>> - }
>> - if (cached_td) {
>> - cached_td->cancel_status = TD_CLEARING_CACHE;
>> -
>> - err = xhci_move_dequeue_past_td(xhci, slot_id,
>> ep->ep_index,
>> -
>> cached_td->urb->stream_id,
>> - cached_td);
>> - /* Failed to move past cached td, try just setting it
>> noop */
>> - if (err) {
>> - td_to_noop(xhci, ring, cached_td, false);
>> - cached_td->cancel_status = TD_CLEARED;
>> + if (cached_td) {
>> + cached_td->cancel_status = TD_CLEARING_CACHE;
>> +
>> + err = xhci_move_dequeue_past_td(xhci, slot_id,
>> ep->ep_index,
>> +
>> cached_td->urb->stream_id,
>> + cached_td);
>> + /* Failed to move past cached td, try just
>> setting it noop */
>> + if (err) {
>> + td_to_noop(xhci, ring, cached_td,
>> false);
>> + cached_td->cancel_status = TD_CLEARED;
>> + }
>> + cached_td = NULL;
>> }
>> - cached_td = NULL;
>
> I think we can call xhci_move_dequeue_past_td() just once to
> the last halted && cancelled TD in a ring.
>
> But that might need to compare two TDs to see which one is
> the latter, I'm not sure how to do this well. :-/
>
> if (!cached_td || cached_td < td)
> cached_td = td;
>

Thanks. I think you are correct that we can call
xhci_move_dequeue_past_td() just once, for
the last halted && cancelled TD in a ring,
but the status assignment "cached_td->cancel_status = TD_CLEARING_CACHE;"
should be applied to every cancelled TD.
I am not very familiar with TDs and rings, so I have a question: why do we
need to compare two TDs to see which one is the latter?

>> }
>> return 0;
>> }
>> --
>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
>> Forum,
>> a Linux Foundation Collaborative Project
>>

2021-08-13 11:30:01

by Tao Wang

Subject: Re: [PATCH] usb: xhci-ring: set all cancelled_td's cancel_status to TD_CLEARING_CACHE

On 2021-08-13 17:09, Mathias Nyman wrote:
> On 13.8.2021 11.44, [email protected] wrote:
>> On 2021-08-13 15:25, Ikjoon Jang wrote:
>>> Hi,
>>>
>>> On Fri, Aug 13, 2021 at 10:44 AM Tao Wang <[email protected]> wrote:
>>>>
>>>> USB SSD may fail to unmount if disconnect during data transferring.
>>>>
>>>> it stuck in usb_kill_urb() due to urb use_count will not become
>>>> zero,
>>>> this means urb giveback is not happen.
>>>> in xhci_handle_cmd_set_deq() will giveback urb if td's cancel_status
>>>> equal to TD_CLEARING_CACHE,
>>>> but in xhci_invalidate_cancelled_tds(), only last canceled td's
>>>> cancel_status change to TD_CLEARING_CACHE,
>>>> thus giveback only happen to last urb.
>>>>
>>>> this change set all cancelled_td's cancel_status to
>>>> TD_CLEARING_CACHE
>>>> rather than the last one, so all urb can giveback.
>>>>
>>>> Signed-off-by: Tao Wang <[email protected]>
>>>> ---
>>>>  drivers/usb/host/xhci-ring.c | 24 ++++++++++++------------
>>>>  1 file changed, 12 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/drivers/usb/host/xhci-ring.c
>>>> b/drivers/usb/host/xhci-ring.c
>>>> index 8fea44b..c7dd7c0 100644
>>>> --- a/drivers/usb/host/xhci-ring.c
>>>> +++ b/drivers/usb/host/xhci-ring.c
>>>> @@ -960,19 +960,19 @@ static int
>>>> xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
>>>>                         td_to_noop(xhci, ring, td, false);
>>>>                         td->cancel_status = TD_CLEARED;
>>>>                 }
>>>> -       }
>>>> -       if (cached_td) {
>>>> -               cached_td->cancel_status = TD_CLEARING_CACHE;
>>>> -
>>>> -               err = xhci_move_dequeue_past_td(xhci, slot_id,
>>>> ep->ep_index,
>>>> -                                              
>>>> cached_td->urb->stream_id,
>>>> -                                               cached_td);
>>>> -               /* Failed to move past cached td, try just setting
>>>> it noop */
>>>> -               if (err) {
>>>> -                       td_to_noop(xhci, ring, cached_td, false);
>>>> -                       cached_td->cancel_status = TD_CLEARED;
>>>> +               if (cached_td) {
>>>> +                       cached_td->cancel_status =
>>>> TD_CLEARING_CACHE;
>>>> +
>>>> +                       err = xhci_move_dequeue_past_td(xhci,
>>>> slot_id, ep->ep_index,
>>>> +                                                      
>>>> cached_td->urb->stream_id,
>>>> +                                                       cached_td);
>>>> +                       /* Failed to move past cached td, try just
>>>> setting it noop */
>>>> +                       if (err) {
>>>> +                               td_to_noop(xhci, ring, cached_td,
>>>> false);
>>>> +                               cached_td->cancel_status =
>>>> TD_CLEARED;
>>>> +                       }
>>>> +                       cached_td = NULL;
>>>>                 }
>>>> -               cached_td = NULL;
>>>
>>> I think we can call xhci_move_dequeue_past_td() just once to
>>> the last halted && cancelled TD in a ring.
>>>
>>> But that might need to compare two TDs to see which one is
>>> the latter, I'm not sure how to do this well. :-/
>>>
>>> if (!cached_td || cached_td < td)
>>>   cached_td = td;
>>>
>>
>> thanks, I think you are correct that we can call
>> xhci_move_dequeue_past_td() just once to
>>  the last halted && cancelled TD in a ring,
>> but the set status "cached_td->cancel_status = TD_CLEARING_CACHE;"
>> should be every cancelled TD.
>> I am not very good at td and ring, I have a question why we need to
>> compare two TDs to see which one is the latter.
>
> I'm debugging the exact same issue.
> For normal endpoints (no streams) it should be enough to set
> cancel_td->cancel_status = TD_CLEARING_CACHE
> in the TD_DIRTY and TD_HALTED case.
>
> We don't need to move the dq past the last cancelled TD as other
> cancelled TDs are set to no-op, and
> the command to move the dq will flush the xHC controllers TD cache and
> read the no-ops.
> (just make sure we call xhci_move_dequeue_past_td() _after_
> overwriting cancelled TDs with no-op)
>
> Streams get trickier as each endpoint has several rings, and we might
> need to move the dq pointer for
> many stream rings on that endpoint. This needs more work as we
> shouldn't start the endpoint before all
> the all move dq commands complete. i.e. the current ep->ep_state &=
> ~SET_DEQ_PENDING isn't enough.
>
> -Mathias
OK, thanks. Please let me know if you find a good solution after debugging;
I still have a lot to learn from you.

2021-08-13 17:02:04

by Mathias Nyman

Subject: [RFT PATCH] xhci: fix failure to give back some cached cancelled URBs.

Only TDs with status TD_CLEARING_CACHE will be given back after the
cache is cleared with a Set TR Deq command.

xhci_invalidate_cancelled_tds() failed to set the TD_CLEARING_CACHE status
for some cancelled TDs, as it assumed an endpoint only needs to clear the
TD it stopped on from the cache. There are some cases where this isn't true.

For example, with streams an endpoint may have several stream rings,
each stopping on a different TD.

* FIXME *, explain that the streams case isn't fully solved by this, but it's
100x better than before, as we give back the URBs and the thread won't hang.
Some of the streams' TRB cache still isn't cleared, and may point to data
in URBs we already gave back. The xHC controller may touch this.

Signed-off-by: Mathias Nyman <[email protected]>
---
drivers/usb/host/xhci-ring.c | 40 ++++++++++++++++++++++--------------
1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index d0faa67a689d..9017986241f5 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -942,17 +942,21 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
                                         td->urb->stream_id);
                hw_deq &= ~0xf;

-               if (td->cancel_status == TD_HALTED) {
-                       cached_td = td;
-               } else if (trb_in_td(xhci, td->start_seg, td->first_trb,
-                             td->last_trb, hw_deq, false)) {
+               if (td->cancel_status == TD_HALTED ||
+                   trb_in_td(xhci, td->start_seg, td->first_trb, td->last_trb, hw_deq, false)) {
                        switch (td->cancel_status) {
                        case TD_CLEARED: /* TD is already no-op */
                        case TD_CLEARING_CACHE: /* set TR deq command already queued */
                                break;
                        case TD_DIRTY: /* TD is cached, clear it */
                        case TD_HALTED:
-                               /* FIXME stream case, several stopped rings */
+                               td->cancel_status = TD_CLEARING_CACHE;
+                               if (cached_td)
+                                       /* FIXME stream case, several stopped rings */
+                                       xhci_dbg(xhci,
+                                                "Move dq past stream %u URB %p instead of stream %u URB %p\n",
+                                                td->urb->stream_id, td->urb,
+                                                cached_td->urb->stream_id, cached_td->urb);
                                cached_td = td;
                                break;
                        }
@@ -961,18 +965,24 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
                        td->cancel_status = TD_CLEARED;
                }
        }
-       if (cached_td) {
-               cached_td->cancel_status = TD_CLEARING_CACHE;

-               err = xhci_move_dequeue_past_td(xhci, slot_id, ep->ep_index,
-                                               cached_td->urb->stream_id,
-                                               cached_td);
-               /* Failed to move past cached td, try just setting it noop */
-               if (err) {
-                       td_to_noop(xhci, ring, cached_td, false);
-                       cached_td->cancel_status = TD_CLEARED;
+       /* If there's no need to move the dequeue pointer then we're done */
+       if (!cached_td)
+               return 0;
+
+       err = xhci_move_dequeue_past_td(xhci, slot_id, ep->ep_index,
+                                       cached_td->urb->stream_id,
+                                       cached_td);
+       if (err) {
+               /* Failed to move past cached td, just set cached TDs to no-op */
+               list_for_each_entry_safe(td, tmp_td, &ep->cancelled_td_list, cancelled_td_list) {
+                       if (td->cancel_status != TD_CLEARING_CACHE)
+                               continue;
+                       xhci_dbg(xhci, "Failed to clear cancelled cached URB %p, mark clear anyway\n",
+                                td->urb);
+                       td_to_noop(xhci, ring, td, false);
+                       td->cancel_status = TD_CLEARED;
                }
-               cached_td = NULL;
        }
        return 0;
 }
--
2.25.1
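
The error path added by the patch above can be read as "if the dequeue pointer could not be moved, downgrade every TD that was waiting for the cache flush". A minimal stand-alone sketch of that idea follows; the toy_* names, the is_noop field and the return value are assumptions made for illustration and do not appear in the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: status names echo the patch, everything else is hypothetical. */
enum toy_cancel_status { TD_DIRTY, TD_HALTED, TD_CLEARING_CACHE, TD_CLEARED };

struct toy_td {
	enum toy_cancel_status cancel_status;
	int is_noop; /* 1 once the TD has been overwritten with a no-op */
};

/*
 * Fallback for a failed move-dequeue attempt (err != 0): every TD still
 * waiting in TD_CLEARING_CACHE is turned into a no-op and marked cleared.
 * Returns how many TDs were converted; does nothing when err == 0.
 */
static size_t toy_fallback_noop(struct toy_td *tds, size_t n, int err)
{
	size_t i, converted = 0;

	if (!err)
		return 0;

	for (i = 0; i < n; i++) {
		if (tds[i].cancel_status != TD_CLEARING_CACHE)
			continue;
		tds[i].is_noop = 1;
		tds[i].cancel_status = TD_CLEARED;
		converted++;
	}
	return converted;
}
```

The difference from the removed code is that the old fallback touched only the single cached_td, whereas the loop covers every TD that was marked TD_CLEARING_CACHE earlier in the function.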