2020-01-22 22:28:24

by John Stultz

[permalink] [raw]
Subject: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs

Hey all,
I wanted to send these out for comment and thoughts.

Since ~4.20, when the functionfs gadget enabled scatter-gather
support, we have seen problems with adb connections stalling and
stopping to function on hardware with dwc3 usb controllers.
Specifically, HiKey960, Dragonboard 845c, and Pixel3 devices.

Initally the workaround we used was to simply disable scatter
gather support on the dwc3 by commenting out the
"dwc->gadget.sg_supported = true;" line.

After working with Fei Yang, who was seeing similar trouble on
Intel dwc3 based hardare, Thinh Nguyen mentioned that a fix had
already been found and pointed me to one of Anurag's patches.

This solved the issue on HiKey960 and I sent it out to the list
but didn't get any feedback.

Additional testing with the Dragonboard 845c found that that
first fix was not sufficient, and so I've sat on the fix
thinking something deeper was amiss and went back to the hack
of disabling sg_supported on all dwc3 platforms.

In the following months Fei's continued and repeated efforts
didn't seem to get enough review to result in a fix, and they've
since moved on to other work.

Recently, I found that folks at qcom have seen similer issues
and pointed me to the second patch in this series, which does
seem to resolve the issue on the Dragonboard 845c, but not the
HiKey960 on its own.

So I wanted to send these patches out for comment. There's
clearly a number of folks seeing broken behavior for ahwile on
dwc3 hardware, and we're all seeemingly working around it in our
own ways, so either those individual fixes need to get upstream
or we need to figure out some deeper solution to the issue.

So I wanted to send these two out for review and feedback.

thanks
-john

Cc: Felipe Balbi <[email protected]>
Cc: Yang Fei <[email protected]>
Cc: Thinh Nguyen <[email protected]>
Cc: Tejas Joglekar <[email protected]>
Cc: Andrzej Pietrasiewicz <[email protected]>
Cc: Jack Pham <[email protected]>
Cc: Todd Kjos <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Linux USB List <[email protected]>
Cc: stable <[email protected]>

Anurag Kumar Vulisha (2):
usb: dwc3: gadget: Check for IOC/LST bit in both event->status and
TRB->ctrl fields
usb: dwc3: gadget: Correct the logic for finding last SG entry

drivers/usb/dwc3/gadget.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

--
2.17.1


2020-01-22 22:29:30

by John Stultz

[permalink] [raw]
Subject: [RFC][PATCH 2/2] usb: dwc3: gadget: Correct the logic for finding last SG entry

From: Anurag Kumar Vulisha <[email protected]>

As a process of preparing TRBs usb_gadget_map_request_by_dev() is
called from dwc3_prepare_trbs() for mapping the request. This will
call dma_map_sg() if req->num_sgs are greater than 0. dma_map_sg()
will map the sg entries in sglist and return the number of mapped SGs.
As a part of mapping, some sg entries having contigous memory may be
merged together into a single sg (when IOMMU used). So, the number of
mapped sg entries may not be equal to the number of orginal sg entries
in the request (req->num_sgs).

As a part of preparing the TRBs, dwc3_prepare_one_trb_sg() iterates over
the sg entries present in the sglist and calls sg_is_last() to identify
whether the sg entry is last and set IOC bit for the last sg entry. The
sg_is_last() determines last sg if SG_END is set in sg->page_link. When
IOMMU used, dma_map_sg() merges 2 or more sgs into a single sg and it
doesn't retain the page_link properties. Because of this reason the
sg_is_last() may not find SG_END and thus resulting in IOC bit never
getting set.

For example:

Consider a request having 8 sg entries with each entry having a length of
4096 bytes. Assume that sg1 & sg2, sg3 & sg4, sg5 & sg6, sg7 & sg8 are
having contigous memory regions.

Before calling dma_map_sg():
sg1-->sg2-->sg3-->sg4-->sg6-->sg7-->sg8
dma_length: 4K 4K 4K 4K 4K 4K 4K
SG_END: False False False False False False True
num_sgs = 8
num_mapped_sgs = 0

The dma_map_sg() merges sg1 & sg2 memory regions into sg1->dma_address.
Similarly sg3 & sg4 into sg2->dma_address, sg5 & sg6 into the
sg3->dma_address and sg6 & sg8 into sg4->dma_address. Here the memory
regions are merged but the page_link properties like SG_END are not
retained into the merged sgs.

After calling dma_map_sg();
sg1-->sg2-->sg3-->sg4-->sg6-->sg7-->sg8
dma_length: 8K 8K 8K 8K 0K 0K 0K
SG_END: False False False False False False True
num_sgs = 8
num_mapped_sgs = 4

After calling dma_map_sg(), sg1,sg2,sg3,sg4 are having dma_length of
8096 bytes each and remaining sg4,sg5,sg6,sg7 are having 0 bytes of
dma_length.

After dma_map_sg() is performed dma_perpare_trb_sg() iterates on all sg
entries and sets IOC bit only for the sg8 (since sg_is_last() returns true
only for sg8). But after calling dma_map_sg() the valid data are present
only till sg4 and the IOC bit should be set for sg4 TRB only (which is not
happening in the present code)

The above mentioned issue can be fixed by determining last sg based on the
req->num_queued_sgs instead of sg_is_last(). If (req->num_queued_sgs + 1)
is equal to req->num_mapped_sgs, then this sg is the last sg. In the above
example, the dwc3 driver has already queued 3 sgs (upto sg3), so the
num_queued_sgs = 3. On preparing the next sg (i.e sg4), check for last sg
(num_queued_sgs + 1) == num_mapped_sgs becomes true. So, the driver sets
IOC bit for sg4. This patch does the same.

At a practical level, this patch resolves USB transfer stalls
seen with adb on dwc3 based db845c, pixel3 and other qcom
hardware after functionfs gadget added scatter-gather support
around v4.20.

Cc: Felipe Balbi <[email protected]>
Cc: Yang Fei <[email protected]>
Cc: Thinh Nguyen <[email protected]>
Cc: Tejas Joglekar <[email protected]>
Cc: Andrzej Pietrasiewicz <[email protected]>
Cc: Jack Pham <[email protected]>
Cc: Todd Kjos <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Linux USB List <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Anurag Kumar Vulisha <[email protected]>
[jstultz: Add note to end of commit message on specific issue this resovles]
Signed-off-by: John Stultz <[email protected]>
---
drivers/usb/dwc3/gadget.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 1edce3bbb55c..30a80bc97cfe 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1068,7 +1068,7 @@ static void dwc3_prepare_one_trb_sg(struct dwc3_ep *dep,
unsigned int rem = length % maxp;
unsigned chain = true;

- if (sg_is_last(s))
+ if ((req->num_queued_sgs + 1) == req->request.num_mapped_sgs)
chain = false;

if (rem && usb_endpoint_dir_out(dep->endpoint.desc) && !chain) {
--
2.17.1

2020-01-22 22:29:30

by John Stultz

[permalink] [raw]
Subject: [RFC][PATCH 1/2] usb: dwc3: gadget: Check for IOC/LST bit in both event->status and TRB->ctrl fields

From: Anurag Kumar Vulisha <[email protected]>

The present code in dwc3_gadget_ep_reclaim_completed_trb() will check
for IOC/LST bit in the event->status and returns if IOC/LST bit is
set. This logic doesn't work if multiple TRBs are queued per
request and the IOC/LST bit is set on the last TRB of that request.
Consider an example where a queued request has multiple queued TRBs
and IOC/LST bit is set only for the last TRB. In this case, the Core
generates XferComplete/XferInProgress events only for the last TRB
(since IOC/LST are set only for the last TRB). As per the logic in
dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for
IOC/LST bit and returns on the first TRB. This makes the remaining
TRBs left unhandled.
To aviod this, changed the code to check for IOC/LST bits in both
event->status & TRB->ctrl. This patch does the same.

At a practical level, this patch resolves USB transfer stalls seen
with adb on dwc3 based HiKey960 after functionfs gadget added
scatter-gather support around v4.20.

Cc: Felipe Balbi <[email protected]>
Cc: Yang Fei <[email protected]>
Cc: Thinh Nguyen <[email protected]>
Cc: Tejas Joglekar <[email protected]>
Cc: Andrzej Pietrasiewicz <[email protected]>
Cc: Jack Pham <[email protected]>
Cc: Todd Kjos <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Linux USB List <[email protected]>
Cc: stable <[email protected]>
Tested-by: Tejas Joglekar <[email protected]>
Reviewed-by: Thinh Nguyen <[email protected]>
Signed-off-by: Anurag Kumar Vulisha <[email protected]>
[jstultz: forward ported to mainline, added note to commit log]
Signed-off-by: John Stultz <[email protected]>
---
drivers/usb/dwc3/gadget.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 154f3f3e8cff..1edce3bbb55c 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2420,7 +2420,12 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
if (event->status & DEPEVT_STATUS_SHORT && !chain)
return 1;

- if (event->status & DEPEVT_STATUS_IOC)
+ if ((event->status & DEPEVT_STATUS_IOC) &&
+ (trb->ctrl & DWC3_TRB_CTRL_IOC))
+ return 1;
+
+ if ((event->status & DEPEVT_STATUS_LST) &&
+ (trb->ctrl & DWC3_TRB_CTRL_LST))
return 1;

return 0;
--
2.17.1

2020-01-23 07:19:10

by Felipe Balbi

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs


Hi,

John Stultz <[email protected]> writes:
> Hey all,
> I wanted to send these out for comment and thoughts.
>
> Since ~4.20, when the functionfs gadget enabled scatter-gather
> support, we have seen problems with adb connections stalling and
> stopping to function on hardware with dwc3 usb controllers.
> Specifically, HiKey960, Dragonboard 845c, and Pixel3 devices.
>
> Initally the workaround we used was to simply disable scatter
> gather support on the dwc3 by commenting out the
> "dwc->gadget.sg_supported = true;" line.
>
> After working with Fei Yang, who was seeing similar trouble on
> Intel dwc3 based hardare, Thinh Nguyen mentioned that a fix had
> already been found and pointed me to one of Anurag's patches.
>
> This solved the issue on HiKey960 and I sent it out to the list
> but didn't get any feedback.
>
> Additional testing with the Dragonboard 845c found that that
> first fix was not sufficient, and so I've sat on the fix
> thinking something deeper was amiss and went back to the hack
> of disabling sg_supported on all dwc3 platforms.
>
> In the following months Fei's continued and repeated efforts
> didn't seem to get enough review to result in a fix, and they've
> since moved on to other work.
>
> Recently, I found that folks at qcom have seen similer issues
> and pointed me to the second patch in this series, which does
> seem to resolve the issue on the Dragonboard 845c, but not the
> HiKey960 on its own.
>
> So I wanted to send these patches out for comment. There's
> clearly a number of folks seeing broken behavior for ahwile on
> dwc3 hardware, and we're all seeemingly working around it in our
> own ways, so either those individual fixes need to get upstream
> or we need to figure out some deeper solution to the issue.
>
> So I wanted to send these two out for review and feedback.
>
> thanks
> -john
>
> Cc: Felipe Balbi <[email protected]>
> Cc: Yang Fei <[email protected]>
> Cc: Thinh Nguyen <[email protected]>
> Cc: Tejas Joglekar <[email protected]>
> Cc: Andrzej Pietrasiewicz <[email protected]>
> Cc: Jack Pham <[email protected]>
> Cc: Todd Kjos <[email protected]>
> Cc: Greg KH <[email protected]>
> Cc: Linux USB List <[email protected]>
> Cc: stable <[email protected]>
>
> Anurag Kumar Vulisha (2):
> usb: dwc3: gadget: Check for IOC/LST bit in both event->status and
> TRB->ctrl fields
> usb: dwc3: gadget: Correct the logic for finding last SG entry

I remember commenting on these patches before and never getting a newer
version from Anurag.

--
balbi


Attachments:
signature.asc (847.00 B)

2020-01-23 07:25:45

by Felipe Balbi

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] usb: dwc3: gadget: Check for IOC/LST bit in both event->status and TRB->ctrl fields


Hi,

John Stultz <[email protected]> writes:

> From: Anurag Kumar Vulisha <[email protected]>
>
> The present code in dwc3_gadget_ep_reclaim_completed_trb() will check
> for IOC/LST bit in the event->status and returns if IOC/LST bit is
> set. This logic doesn't work if multiple TRBs are queued per
> request and the IOC/LST bit is set on the last TRB of that request.
> Consider an example where a queued request has multiple queued TRBs
> and IOC/LST bit is set only for the last TRB. In this case, the Core
> generates XferComplete/XferInProgress events only for the last TRB
> (since IOC/LST are set only for the last TRB). As per the logic in
> dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for
> IOC/LST bit and returns on the first TRB. This makes the remaining
> TRBs left unhandled.
> To aviod this, changed the code to check for IOC/LST bits in both
avoid

> event->status & TRB->ctrl. This patch does the same.

We don't need to check both. It's very likely that checking the TRB is
enough.

> At a practical level, this patch resolves USB transfer stalls seen
> with adb on dwc3 based HiKey960 after functionfs gadget added
> scatter-gather support around v4.20.

Right, I remember asking for tracepoint data showing this problem
happening. It's the best way to figure out what's really going on.

Before we accept these two patches, could you collect dwc3 tracepoint
data and share here?

--
balbi


Attachments:
signature.asc (847.00 B)

2020-01-23 07:26:16

by Felipe Balbi

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] usb: dwc3: gadget: Correct the logic for finding last SG entry


Hi,

John Stultz <[email protected]> writes:
> From: Anurag Kumar Vulisha <[email protected]>
>
> As a process of preparing TRBs usb_gadget_map_request_by_dev() is
> called from dwc3_prepare_trbs() for mapping the request. This will
> call dma_map_sg() if req->num_sgs are greater than 0. dma_map_sg()
> will map the sg entries in sglist and return the number of mapped SGs.
> As a part of mapping, some sg entries having contigous memory may be
> merged together into a single sg (when IOMMU used). So, the number of
> mapped sg entries may not be equal to the number of orginal sg entries
> in the request (req->num_sgs).
>
> As a part of preparing the TRBs, dwc3_prepare_one_trb_sg() iterates over
> the sg entries present in the sglist and calls sg_is_last() to identify
> whether the sg entry is last and set IOC bit for the last sg entry. The
> sg_is_last() determines last sg if SG_END is set in sg->page_link. When
> IOMMU used, dma_map_sg() merges 2 or more sgs into a single sg and it
> doesn't retain the page_link properties. Because of this reason the
> sg_is_last() may not find SG_END and thus resulting in IOC bit never
> getting set.
>
> For example:
>
> Consider a request having 8 sg entries with each entry having a length of
> 4096 bytes. Assume that sg1 & sg2, sg3 & sg4, sg5 & sg6, sg7 & sg8 are
> having contigous memory regions.
>
> Before calling dma_map_sg():
> sg1-->sg2-->sg3-->sg4-->sg6-->sg7-->sg8
> dma_length: 4K 4K 4K 4K 4K 4K 4K
> SG_END: False False False False False False True
> num_sgs = 8
> num_mapped_sgs = 0
>
> The dma_map_sg() merges sg1 & sg2 memory regions into sg1->dma_address.
> Similarly sg3 & sg4 into sg2->dma_address, sg5 & sg6 into the
> sg3->dma_address and sg6 & sg8 into sg4->dma_address. Here the memory
> regions are merged but the page_link properties like SG_END are not
> retained into the merged sgs.
>
> After calling dma_map_sg();
> sg1-->sg2-->sg3-->sg4-->sg6-->sg7-->sg8
> dma_length: 8K 8K 8K 8K 0K 0K 0K
> SG_END: False False False False False False True
> num_sgs = 8
> num_mapped_sgs = 4
>
> After calling dma_map_sg(), sg1,sg2,sg3,sg4 are having dma_length of
> 8096 bytes each and remaining sg4,sg5,sg6,sg7 are having 0 bytes of
> dma_length.
>
> After dma_map_sg() is performed dma_perpare_trb_sg() iterates on all sg
> entries and sets IOC bit only for the sg8 (since sg_is_last() returns true
> only for sg8). But after calling dma_map_sg() the valid data are present
> only till sg4 and the IOC bit should be set for sg4 TRB only (which is not
> happening in the present code)
>
> The above mentioned issue can be fixed by determining last sg based on the
> req->num_queued_sgs instead of sg_is_last(). If (req->num_queued_sgs + 1)
> is equal to req->num_mapped_sgs, then this sg is the last sg. In the above
> example, the dwc3 driver has already queued 3 sgs (upto sg3), so the
> num_queued_sgs = 3. On preparing the next sg (i.e sg4), check for last sg
> (num_queued_sgs + 1) == num_mapped_sgs becomes true. So, the driver sets
> IOC bit for sg4. This patch does the same.
>
> At a practical level, this patch resolves USB transfer stalls
> seen with adb on dwc3 based db845c, pixel3 and other qcom
> hardware after functionfs gadget added scatter-gather support
> around v4.20.
>
> Cc: Felipe Balbi <[email protected]>
> Cc: Yang Fei <[email protected]>
> Cc: Thinh Nguyen <[email protected]>
> Cc: Tejas Joglekar <[email protected]>
> Cc: Andrzej Pietrasiewicz <[email protected]>
> Cc: Jack Pham <[email protected]>
> Cc: Todd Kjos <[email protected]>
> Cc: Greg KH <[email protected]>
> Cc: Linux USB List <[email protected]>
> Cc: stable <[email protected]>
> Signed-off-by: Anurag Kumar Vulisha <[email protected]>
> [jstultz: Add note to end of commit message on specific issue this resovles]
> Signed-off-by: John Stultz <[email protected]>
> ---
> drivers/usb/dwc3/gadget.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 1edce3bbb55c..30a80bc97cfe 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -1068,7 +1068,7 @@ static void dwc3_prepare_one_trb_sg(struct dwc3_ep *dep,
> unsigned int rem = length % maxp;
> unsigned chain = true;
>
> - if (sg_is_last(s))
> + if ((req->num_queued_sgs + 1) == req->request.num_mapped_sgs)

This is probably a bug on DMA API. If it combines pages from
scatter-list, then it should also move the last SG so sg_is_last()
continues to work.

I had asked author to discuss this with DMA API maintainers. Can you do
that?

--
balbi


Attachments:
signature.asc (847.00 B)

2020-01-23 08:45:02

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs

Hi John,

W dniu 22.01.2020 o 23:26, John Stultz pisze:
> Hey all,
> I wanted to send these out for comment and thoughts.
>
> Since ~4.20, when the functionfs gadget enabled scatter-gather
> support, we have seen problems with adb connections stalling and
> stopping to function on hardware with dwc3 usb controllers.
> Specifically, HiKey960, Dragonboard 845c, and Pixel3 devices.

Any chance this:

https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git/commit/?h=testing/next&id=f63333e8e4fd63d8d8ae83b89d2c38cf21d64801

has something to do with the problem you are reporting?

Andrzej

2020-01-23 15:51:51

by Anurag Kumar Vulisha

[permalink] [raw]
Subject: RE: [RFC][PATCH 2/2] usb: dwc3: gadget: Correct the logic for finding last SG entry

Hi Felipe,
>-----Original Message-----
>From: Felipe Balbi <[email protected]> On Behalf Of Felipe Balbi
>Sent: Thursday, January 23, 2020 12:56 PM
>To: John Stultz <[email protected]>; lkml <[email protected]>
>Cc: Anurag Kumar Vulisha <[email protected]>; Felipe Balbi
><[email protected]>; Yang Fei <[email protected]>; Thinh Nguyen
><[email protected]>; Tejas Joglekar <[email protected]>;
>Andrzej Pietrasiewicz <[email protected]>; Jack Pham
><[email protected]>; Todd Kjos <[email protected]>; Greg KH
><[email protected]>; Linux USB List <[email protected]>;
>stable <[email protected]>; John Stultz <[email protected]>
>Subject: Re: [RFC][PATCH 2/2] usb: dwc3: gadget: Correct the logic for finding
>last SG entry
>
>
>Hi,
>
>John Stultz <[email protected]> writes:
>> From: Anurag Kumar Vulisha <[email protected]>
>>
>> As a process of preparing TRBs usb_gadget_map_request_by_dev() is
>> called from dwc3_prepare_trbs() for mapping the request. This will
>> call dma_map_sg() if req->num_sgs are greater than 0. dma_map_sg()
>> will map the sg entries in sglist and return the number of mapped SGs.
>> As a part of mapping, some sg entries having contigous memory may be
>> merged together into a single sg (when IOMMU used). So, the number of
>> mapped sg entries may not be equal to the number of orginal sg entries
>> in the request (req->num_sgs).
>>
>> As a part of preparing the TRBs, dwc3_prepare_one_trb_sg() iterates
>> over the sg entries present in the sglist and calls sg_is_last() to
>> identify whether the sg entry is last and set IOC bit for the last sg
>> entry. The
>> sg_is_last() determines last sg if SG_END is set in sg->page_link.
>> When IOMMU used, dma_map_sg() merges 2 or more sgs into a single sg
>> and it doesn't retain the page_link properties. Because of this reason
>> the
>> sg_is_last() may not find SG_END and thus resulting in IOC bit never
>> getting set.
>>
>> For example:
>>
>> Consider a request having 8 sg entries with each entry having a length
>> of
>> 4096 bytes. Assume that sg1 & sg2, sg3 & sg4, sg5 & sg6, sg7 & sg8 are
>> having contigous memory regions.
>>
>> Before calling dma_map_sg():
>> sg1-->sg2-->sg3-->sg4-->sg6-->sg7-->sg8
>> dma_length: 4K 4K 4K 4K 4K 4K 4K
>> SG_END: False False False False False False True
>> num_sgs = 8
>> num_mapped_sgs = 0
>>
>> The dma_map_sg() merges sg1 & sg2 memory regions into sg1->dma_address.
>> Similarly sg3 & sg4 into sg2->dma_address, sg5 & sg6 into the
>> sg3->dma_address and sg6 & sg8 into sg4->dma_address. Here the memory
>> regions are merged but the page_link properties like SG_END are not
>> retained into the merged sgs.
>>
>> After calling dma_map_sg();
>> sg1-->sg2-->sg3-->sg4-->sg6-->sg7-->sg8
>> dma_length: 8K 8K 8K 8K 0K 0K 0K
>> SG_END: False False False False False False True
>> num_sgs = 8
>> num_mapped_sgs = 4
>>
>> After calling dma_map_sg(), sg1,sg2,sg3,sg4 are having dma_length of
>> 8096 bytes each and remaining sg4,sg5,sg6,sg7 are having 0 bytes of
>> dma_length.
>>
>> After dma_map_sg() is performed dma_perpare_trb_sg() iterates on all
>> sg entries and sets IOC bit only for the sg8 (since sg_is_last()
>> returns true only for sg8). But after calling dma_map_sg() the valid
>> data are present only till sg4 and the IOC bit should be set for sg4
>> TRB only (which is not happening in the present code)
>>
>> The above mentioned issue can be fixed by determining last sg based on
>> the
>> req->num_queued_sgs instead of sg_is_last(). If (req->num_queued_sgs +
>> req->1)
>> is equal to req->num_mapped_sgs, then this sg is the last sg. In the
>> above example, the dwc3 driver has already queued 3 sgs (upto sg3), so
>> the num_queued_sgs = 3. On preparing the next sg (i.e sg4), check for
>> last sg (num_queued_sgs + 1) == num_mapped_sgs becomes true. So, the
>> driver sets IOC bit for sg4. This patch does the same.
>>
>> At a practical level, this patch resolves USB transfer stalls seen
>> with adb on dwc3 based db845c, pixel3 and other qcom hardware after
>> functionfs gadget added scatter-gather support around v4.20.
>>
>> Cc: Felipe Balbi <[email protected]>
>> Cc: Yang Fei <[email protected]>
>> Cc: Thinh Nguyen <[email protected]>
>> Cc: Tejas Joglekar <[email protected]>
>> Cc: Andrzej Pietrasiewicz <[email protected]>
>> Cc: Jack Pham <[email protected]>
>> Cc: Todd Kjos <[email protected]>
>> Cc: Greg KH <[email protected]>
>> Cc: Linux USB List <[email protected]>
>> Cc: stable <[email protected]>
>> Signed-off-by: Anurag Kumar Vulisha <[email protected]>
>> [jstultz: Add note to end of commit message on specific issue this
>> resovles]
>> Signed-off-by: John Stultz <[email protected]>
>> ---
>> drivers/usb/dwc3/gadget.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index 1edce3bbb55c..30a80bc97cfe 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -1068,7 +1068,7 @@ static void dwc3_prepare_one_trb_sg(struct
>dwc3_ep *dep,
>> unsigned int rem = length % maxp;
>> unsigned chain = true;
>>
>> - if (sg_is_last(s))
>> + if ((req->num_queued_sgs + 1) == req-
>>request.num_mapped_sgs)
>
>This is probably a bug on DMA API. If it combines pages from scatter-list, then it
>should also move the last SG so sg_is_last() continues to work.
>
>I had asked author to discuss this with DMA API maintainers. Can you do that?
>
I was stuck with other tasks, so couldn't discuss with DMA maintainers on this.
I am sorry for that.

Hi John,
Thanks for bringing this patch back . Please let me know if I can help you with
anything on this. If you want, I am ready to start working on this.

Thanks,
Anurag Kumar Vulisha

2020-01-23 16:30:32

by Yang, Fei

[permalink] [raw]
Subject: RE: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs

>> Hey all,
>> I wanted to send these out for comment and thoughts.
>>
>> Since ~4.20, when the functionfs gadget enabled scatter-gather
>> support, we have seen problems with adb connections stalling and
>> stopping to function on hardware with dwc3 usb controllers.
>> Specifically, HiKey960, Dragonboard 845c, and Pixel3 devices.
>
> Any chance this:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git/commit/?h=testing/next&id=f63333e8e4fd63d8d8ae83b89d2c38cf21d64801
This is a different issue. I have tried initializing num_sgs when debugging this adb stall problem, but it didn't help.

>> has something to do with the problem you are reporting?
>
>> Andrzej

2020-01-23 17:32:24

by Felipe Balbi

[permalink] [raw]
Subject: RE: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs


Hi,

"Yang, Fei" <[email protected]> writes:
>>> Hey all,
>>> I wanted to send these out for comment and thoughts.
>>>
>>> Since ~4.20, when the functionfs gadget enabled scatter-gather
>>> support, we have seen problems with adb connections stalling and
>>> stopping to function on hardware with dwc3 usb controllers.
>>> Specifically, HiKey960, Dragonboard 845c, and Pixel3 devices.
>>
>> Any chance this:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git/commit/?h=testing/next&id=f63333e8e4fd63d8d8ae83b89d2c38cf21d64801
> This is a different issue. I have tried initializing num_sgs when debugging this adb stall problem, but it didn't help.

So multiple folks have run through this problem, but not *one* has
tracepoints collected from the issue? C'mon guys. Can someone, please,
collect tracepoints so we can figure out what's actually going on?

I'm pretty sure this should be solved at the DMA API level, just want to
confirm.

cheers

--
balbi


Attachments:
signature.asc (847.00 B)

2020-01-23 17:38:39

by Yang, Fei

[permalink] [raw]
Subject: RE: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs

>>>> Hey all,
>>>> I wanted to send these out for comment and thoughts.
>>>>
>>>> Since ~4.20, when the functionfs gadget enabled scatter-gather
>>>> support, we have seen problems with adb connections stalling and
>>>> stopping to function on hardware with dwc3 usb controllers.
>>>> Specifically, HiKey960, Dragonboard 845c, and Pixel3 devices.
>>>
>>> Any chance this:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git/commit/
>>> ?h=testing/next&id=f63333e8e4fd63d8d8ae83b89d2c38cf21d64801
>> This is a different issue. I have tried initializing num_sgs when debugging this adb stall problem, but it didn't help.
>
> So multiple folks have run through this problem, but not *one* has tracepoints collected from the issue? C'mon guys. Can someone, please, collect tracepoints so we can figure out what's actually going on?
>
> I'm pretty sure this should be solved at the DMA API level, just want to confirm.
I have sent you the tracepoints long time ago. Also my analysis of the problem (BTW, I don't think the tracepoints helped much). It's basically a logic problem in function dwc3_gadget_ep_reclaim_trb_sg().
I can try dig into my old emails and resend, but that is a bit hard to find.

-Fei

2020-01-23 17:47:16

by Felipe Balbi

[permalink] [raw]
Subject: RE: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs


Hi,

"Yang, Fei" <[email protected]> writes:
>>>>> Hey all,
>>>>> I wanted to send these out for comment and thoughts.
>>>>>
>>>>> Since ~4.20, when the functionfs gadget enabled scatter-gather
>>>>> support, we have seen problems with adb connections stalling and
>>>>> stopping to function on hardware with dwc3 usb controllers.
>>>>> Specifically, HiKey960, Dragonboard 845c, and Pixel3 devices.
>>>>
>>>> Any chance this:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git/commit/
>>>> ?h=testing/next&id=f63333e8e4fd63d8d8ae83b89d2c38cf21d64801
>>> This is a different issue. I have tried initializing num_sgs when debugging this adb stall problem, but it didn't help.
>>
>> So multiple folks have run through this problem, but not *one* has tracepoints collected from the issue? C'mon guys. Can someone, please, collect tracepoints so we can figure out what's actually going on?
>>
>> I'm pretty sure this should be solved at the DMA API level, just want to confirm.
>
> I have sent you the tracepoints long time ago. Also my analysis of the
> problem (BTW, I don't think the tracepoints helped much). It's
> basically a logic problem in function dwc3_gadget_ep_reclaim_trb_sg().

AFAICT, this is caused by DMA API merging pages together when map an
sglist for DMA. While doing that, it does *not* move the SG_END flag
which sg_is_last() checks.

I consider that an overlook on the DMA API, wouldn't you? Why should DMA
API users care if pages were merged or not while mapping the sglist? We
have for_each_sg() and sg_is_last() for a reason.

> I can try dig into my old emails and resend, but that is a bit hard to find.

Don't bother, I'm still not convinced we should fix at the driver level
when sg_is_last() should be working here, unless we should iterate over
num_sgs instead of num_mapped_sgs, though I don't think that's the case
since in that case we would have to chain buffers of size zero.

--
balbi


Attachments:
signature.asc (847.00 B)

2020-01-23 18:30:09

by Yang, Fei

[permalink] [raw]
Subject: RE: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs

>>>>>> Since ~4.20, when the functionfs gadget enabled scatter-gather
>>>>>> support, we have seen problems with adb connections stalling and
>>>>>> stopping to function on hardware with dwc3 usb controllers.
>>>>>> Specifically, HiKey960, Dragonboard 845c, and Pixel3 devices.
>>>>>
>>>>> Any chance this:
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git/commi
>>>>> t/
>>>>> ?h=testing/next&id=f63333e8e4fd63d8d8ae83b89d2c38cf21d64801
>>>> This is a different issue. I have tried initializing num_sgs when debugging this adb stall problem, but it didn't help.
>>>
>>> So multiple folks have run through this problem, but not *one* has tracepoints collected from the issue? C'mon guys.
>>> Can someone, please, collect tracepoints so we can figure out what's actually going on?
>>>
>>> I'm pretty sure this should be solved at the DMA API level, just want to confirm.
>>
>> I have sent you the tracepoints long time ago. Also my analysis of the
>> problem (BTW, I don't think the tracepoints helped much). It's
>> basically a logic problem in function dwc3_gadget_ep_reclaim_trb_sg().
>
> AFAICT, this is caused by DMA API merging pages together when map an sglist for DMA. While doing that,
> it does *not* move the SG_END flag which sg_is_last() checks.
>
> I consider that an overlook on the DMA API, wouldn't you? Why should DMA API users care if pages were merged or not while mapping the sglist?
> We have for_each_sg() and sg_is_last() for a reason.

Oops, my bad. Actually, I was talking about the other patch, not the one setting num_sgs = 0; I don't know if this patch is really needed, but from
what I remember the DMA API is setting up the num_sgs properly. I agree even if there is a problem initializing num_sgs, it should be fixed in DMA API.

> I can try dig into my old emails and resend, but that is a bit hard to find.
>
> Don't bother, I'm still not convinced we should fix at the driver level when sg_is_last() should be working here,
> unless we should iterate over num_sgs instead of num_mapped_sgs, though I don't think that's the case since
> in that case we would have to chain buffers of size zero.

> --
> balbi

2020-01-23 18:55:28

by Yang, Fei

[permalink] [raw]
Subject: RE: [RFC][PATCH 1/2] usb: dwc3: gadget: Check for IOC/LST bit in both event->status and TRB->ctrl fields

>> The present code in dwc3_gadget_ep_reclaim_completed_trb() will check
>> for IOC/LST bit in the event->status and returns if IOC/LST bit is
>> set. This logic doesn't work if multiple TRBs are queued per request
>> and the IOC/LST bit is set on the last TRB of that request.
>> Consider an example where a queued request has multiple queued TRBs
>> and IOC/LST bit is set only for the last TRB. In this case, the Core
>> generates XferComplete/XferInProgress events only for the last TRB
>> (since IOC/LST are set only for the last TRB). As per the logic in
>> dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for
>> IOC/LST bit and returns on the first TRB. This makes the remaining
>> TRBs left unhandled.
>> To aviod this, changed the code to check for IOC/LST bits in both
> avoid
>
>> event->status & TRB->ctrl. This patch does the same.
>
> We don't need to check both. It's very likely that checking the TRB is enough.
>
>> At a practical level, this patch resolves USB transfer stalls seen
>> with adb on dwc3 based HiKey960 after functionfs gadget added
>> scatter-gather support around v4.20.
>
> Right, I remember asking for tracepoint data showing this problem happening. It's the best way to figure out what's really going on.
>
> Before we accept these two patches, could you collect dwc3 tracepoint data and share here?

I should have replied to this one. Sorry for the confusion on the other thread.
I have sent tracepoints long time ago for this problem, but the tracepoints did help much on debugging the issue, so I'm not going to send again.

But the problem is really obvious though. In function dwc3_gadget_ep_reclaim_trb_sg(), the for_each_sg loop doesn't get a chance to iterate through all TRBs in the sg list, because this function only gets called when the last TRB in the list is completed (because of setting IOC), so at this moment event->status has IOC bit set. The consequence is that, when the for_each_sg loop trying to reclaim the first TRB in the sg list, the call dwc3_gadget_ep_reclaim_completed_trb() returns 1 (if (event-status & DEPEVT_STATUS_IOC) return 1;), thus the for loop is terminated before the rest of the TRBs are reclaimed.

static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
struct dwc3_request *req, const struct dwc3_event_depevt *event,
int status)
{
struct dwc3_trb *trb = &dep->trb_pool[dep->trb_dequeue];
struct scatterlist *sg = req->sg;
struct scatterlist *s;
unsigned int pending = req->num_pending_sgs;
unsigned int i;
int ret = 0;

for_each_sg(sg, s, pending, i) {
trb = &dep->trb_pool[dep->trb_dequeue];

if (trb->ctrl & DWC3_TRB_CTRL_HWO)
break;

req->sg = sg_next(s);
req->num_pending_sgs--;

ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
trb, event, status, true);
if (ret)
break;
}

return ret;
}

2020-01-23 20:51:11

by John Stultz

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs

On Thu, Jan 23, 2020 at 9:31 AM Felipe Balbi <[email protected]> wrote:
> "Yang, Fei" <[email protected]> writes:
> >>> Hey all,
> >>> I wanted to send these out for comment and thoughts.
> >>>
> >>> Since ~4.20, when the functionfs gadget enabled scatter-gather
> >>> support, we have seen problems with adb connections stalling and
> >>> stopping to function on hardware with dwc3 usb controllers.
> >>> Specifically, HiKey960, Dragonboard 845c, and Pixel3 devices.
> >>
> >> Any chance this:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git/commit/?h=testing/next&id=f63333e8e4fd63d8d8ae83b89d2c38cf21d64801
> > This is a different issue. I have tried initializing num_sgs when debugging this adb stall problem, but it didn't help.
>
> So multiple folks have run through this problem, but not *one* has
> tracepoints collected from the issue? C'mon guys. Can someone, please,
> collect tracepoints so we can figure out what's actually going on?

Sure, I can do that. Though to be fair, I recall Fei sending out
tracepoint data earlier that didn't get a response.

So attached is trace/regdump data for db845c both in the failure case
and with the patch ("Correct the logic for finding last SG entry").

I'll collect HiKey960 data here after lunch when I can swap over to
that board and will send it along soon.

Thanks so much for taking a look at this!
-john


Attachments:
db845c.tar.xz (188.09 kB)

2020-01-24 01:20:15

by John Stultz

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] usb: dwc3: gadget: Check for IOC/LST bit in both event->status and TRB->ctrl fields

On Wed, Jan 22, 2020 at 11:23 PM Felipe Balbi <[email protected]> wrote:
> > From: Anurag Kumar Vulisha <[email protected]>
> >
> > The present code in dwc3_gadget_ep_reclaim_completed_trb() will check
> > for IOC/LST bit in the event->status and returns if IOC/LST bit is
> > set. This logic doesn't work if multiple TRBs are queued per
> > request and the IOC/LST bit is set on the last TRB of that request.
> > Consider an example where a queued request has multiple queued TRBs
> > and IOC/LST bit is set only for the last TRB. In this case, the Core
> > generates XferComplete/XferInProgress events only for the last TRB
> > (since IOC/LST are set only for the last TRB). As per the logic in
> > dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for
> > IOC/LST bit and returns on the first TRB. This makes the remaining
> > TRBs left unhandled.
> > To aviod this, changed the code to check for IOC/LST bits in both
> avoid
>
> > event->status & TRB->ctrl. This patch does the same.
>
> We don't need to check both. It's very likely that checking the TRB is
> enough.

Sorry, just to clarify, are you suggesting instead of:
- if (event->status & DEPEVT_STATUS_IOC)
+ if ((event->status & DEPEVT_STATUS_IOC) &&
+ (trb->ctrl & DWC3_TRB_CTRL_IOC))


We do something like:
- if (event->status & DEPEVT_STATUS_IOC)
+ if (trb->ctrl & DWC3_TRB_CTRL_IOC)
+ return 1;
+
+ if (trb->ctrl & DWC3_TRB_CTRL_LST)
return 1;

?

> > At a practical level, this patch resolves USB transfer stalls seen
> > with adb on dwc3 based HiKey960 after functionfs gadget added
> > scatter-gather support around v4.20.
>
> Right, I remember asking for tracepoint data showing this problem
> happening. It's the best way to figure out what's really going on.
>
> Before we accept these two patches, could you collect dwc3 tracepoint
> data and share here?

Sure. Attached is trace logs and regdumps for hikey960.

The one gotcha with the logs is that in the working case (with this
patch applied), I booted with the usb-c cable disconnected (as
suggested in the dwc3.rst doc), enabled tracing and plugged in the
device, then ran adb logcat a few times to validate no stalls.

In the failure case (without this patch), I booted with the usb-c
cable disconnected, enabled tracing and then when I plugged in the
device, it never was detected by adb (it seems perhaps the problem had
already struck?).

So I generated the failure2 log by booting with USB-C plugged in,
enabling tracing, and running adb logcat on the host to observe the
stall.

Anyway, all three sets of logs are included. Let me know if you need
me to try anything else.

thanks
-john


Attachments:
hikey960.tar.xz (314.86 kB)

2020-01-24 07:49:18

by Felipe Balbi

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] usb: dwc3: gadget: Check for IOC/LST bit in both event->status and TRB->ctrl fields


Hi,

John Stultz <[email protected]> writes:

> On Wed, Jan 22, 2020 at 11:23 PM Felipe Balbi <[email protected]> wrote:
>> > From: Anurag Kumar Vulisha <[email protected]>
>> >
>> > The present code in dwc3_gadget_ep_reclaim_completed_trb() will check
>> > for IOC/LST bit in the event->status and returns if IOC/LST bit is
>> > set. This logic doesn't work if multiple TRBs are queued per
>> > request and the IOC/LST bit is set on the last TRB of that request.
>> > Consider an example where a queued request has multiple queued TRBs
>> > and IOC/LST bit is set only for the last TRB. In this case, the Core
>> > generates XferComplete/XferInProgress events only for the last TRB
>> > (since IOC/LST are set only for the last TRB). As per the logic in
>> > dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for
>> > IOC/LST bit and returns on the first TRB. This makes the remaining
>> > TRBs left unhandled.
>> > To aviod this, changed the code to check for IOC/LST bits in both
>> avoid
>>
>> > event->status & TRB->ctrl. This patch does the same.
>>
>> We don't need to check both. It's very likely that checking the TRB is
>> enough.
>
> Sorry, just to clarify, are you suggesting instead of:
> - if (event->status & DEPEVT_STATUS_IOC)
> + if ((event->status & DEPEVT_STATUS_IOC) &&
> + (trb->ctrl & DWC3_TRB_CTRL_IOC))
>
>
> We do something like:
> - if (event->status & DEPEVT_STATUS_IOC)
> + if (trb->ctrl & DWC3_TRB_CTRL_IOC)
> + return 1;
> +
> + if (trb->ctrl & DWC3_TRB_CTRL_LST)
> return 1;
>
> ?

that's correct. In hindsight, I have no idea why I used the
event->status here since all other checks are done against the TRB
only.

>> > At a practical level, this patch resolves USB transfer stalls seen
>> > with adb on dwc3 based HiKey960 after functionfs gadget added
>> > scatter-gather support around v4.20.
>>
>> Right, I remember asking for tracepoint data showing this problem
>> happening. It's the best way to figure out what's really going on.
>>
>> Before we accept these two patches, could you collect dwc3 tracepoint
>> data and share here?
>
> Sure. Attached is trace logs and regdumps for hikey960.

Thanks

> The one gotcha with the logs is that in the working case (with this
> patch applied), I booted with the usb-c cable disconnected (as
> suggested in the dwc3.rst doc), enabled tracing and plugged in the
> device, then ran adb logcat a few times to validate no stalls.
>
> In the failure case (without this patch), I booted with the usb-c
> cable disconnected, enabled tracing and then when I plugged in the
> device, it never was detected by adb (it seems perhaps the problem had
> already struck?).

You never got a Reset Interrupt, so something else is going on. I
suggest putting a sniffer and first making sure the host *does* drive
reset signalling. Second step would be to look at your phy
configuration. Is it going in suspend for any reason? Might want to try
our snps,dis_u3_susphy_quirk and snps,dis_u2_susphy_quirk flags.

> So I generated the failure2 log by booting with USB-C plugged in,
> enabling tracing, and running adb logcat on the host to observe the
> stall.

Thank you. Here's a quick summary of what's in failure2:

There is a series of 24-byte transfers on ep1out and that's the one
which shows a problem. We can clearly see that adb is issuing one
transfer at a time, only enqueueing transfer n+1 when transfer n is
completed and given back, so we see a series of similar blocks:

- dwc3_alloc_request
- dwc3_ep_queue
- dwc3_prepare_trb
- dwc3_prepare_trb (for the chained bit)
- dwc3_gadget_ep_cmd (update transfer)
- dwc3_event (transfer in progress)
- dwc3_complete_trb
- dwc3_complete_trb (for the chained bit)
- dwc3_gadget_giveback
- dwc3_free_request

So this works for several iterations. Note, however, that the TRB
addresses don't really make sense. DWC3 allocates a contiguous block of
memory to server as TRB pool, but we see non-consecutive addresses on
these TRBs. I'm assuming there's an IOMMU in your system.

Anyway, the failing point is here:

> adbd-461 [002] d..1 49.855992: dwc3_alloc_request: ep1out: req 000000004e6eaaba length 0/0 zsI ==> 0
> adbd-461 [002] d..2 49.855994: dwc3_ep_queue: ep1out: req 000000004e6eaaba length 0/24 zsI ==> -115
> adbd-461 [002] d..2 49.855996: dwc3_prepare_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 24 ctrl 0000001d (HlCS:sc:normal)
> adbd-461 [002] d..2 49.855997: dwc3_prepare_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
> adbd-461 [002] d..2 49.856003: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
> irq/65-dwc3-498 [000] d..1 53.902752: dwc3_event: event (00006084): ep1out: Transfer In Progress [0] (SIm)
> irq/65-dwc3-498 [000] d..1 53.902763: dwc3_complete_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 0 ctrl 0000001c (hlCS:sc:normal)
> irq/65-dwc3-498 [000] d..1 53.902769: dwc3_complete_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
> irq/65-dwc3-498 [000] d..1 53.902781: dwc3_gadget_giveback: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
> kworker/u16:0-7 [000] .... 53.903020: dwc3_free_request: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
> adbd-461 [002] d..1 53.903273: dwc3_alloc_request: ep1out: req 00000000c769beab length 0/0 zsI ==> 0
> adbd-461 [002] d..2 53.903285: dwc3_ep_queue: ep1out: req 00000000c769beab length 0/24 zsI ==> -115
> adbd-461 [002] d..2 53.903292: dwc3_prepare_trb: ep1out: trb 00000000f0ffa827 buf 000000009eb11e80 size 24 ctrl 0000001d (HlCS:sc:normal)
> adbd-461 [002] d..2 53.903296: dwc3_prepare_trb: ep1out: trb 00000000d6a9892a buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
> adbd-461 [002] d..2 53.903315: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful

Note that this transfer, after started, took 4 seconds to complete,
while all others completed within a few ms. There's no real reason for
this visible from dwc3 driver itself. What follows, is a transfer that
never completed.

The only thing I can come up with, is that we starve the TRB ring, by
continuously reclaiming a single TRB. We have 255 usable TRBs, so after
a few iterations, we would see a stall due to starved TRB ring.

There is a way to verify this by tracking trb_enqueue and trb_dequeue,
if you're willing to do that, that'll help us prove that this is really
the problem and, since current tracepoints doen't really show that
information, it may be a good idea to add this information to
dwc3_log_trb tracepoint class. Something like below should be enough,
could you re-run the test of failure2 with this patch applied?

drivers/usb/dwc3/trace.h | 9 +++++++--

modified drivers/usb/dwc3/trace.h
@@ -227,6 +227,8 @@ DECLARE_EVENT_CLASS(dwc3_log_trb,
__field(u32, size)
__field(u32, ctrl)
__field(u32, type)
+ __field(u32, enqueue)
+ __field(u32, dequeue)
),
TP_fast_assign(
__assign_str(name, dep->name);
@@ -236,9 +238,12 @@ DECLARE_EVENT_CLASS(dwc3_log_trb,
__entry->size = trb->size;
__entry->ctrl = trb->ctrl;
__entry->type = usb_endpoint_type(dep->endpoint.desc);
+ __entry->enqueue = dep->trb_enqueue
+ __entry->dequeue = dep->trb_dequeue
),
- TP_printk("%s: trb %p buf %08x%08x size %s%d ctrl %08x (%c%c%c%c:%c%c:%s)",
- __get_str(name), __entry->trb, __entry->bph, __entry->bpl,
+ TP_printk("%s: trb %p (E%d:D%d) buf %08x%08x size %s%d ctrl %08x (%c%c%c%c:%c%c:%s)",
+ __get_str(name), __entry->trb, __entry->enqueue,
+ __entry->dequeue, __entry->bph, __entry->bpl,
({char *s;
int pcm = ((__entry->size >> 24) & 3) + 1;
switch (__entry->type) {

> Anyway, all three sets of logs are included. Let me know if you need
> me to try anything else.

Thanks for doing this

--
balbi


Attachments:
signature.asc (847.00 B)

2020-01-24 22:12:15

by John Stultz

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] usb: dwc3: gadget: Check for IOC/LST bit in both event->status and TRB->ctrl fields

On Thu, Jan 23, 2020 at 11:44 PM Felipe Balbi <[email protected]> wrote:
>
>
> Hi,
>
> John Stultz <[email protected]> writes:
>
> > On Wed, Jan 22, 2020 at 11:23 PM Felipe Balbi <[email protected]> wrote:
> >> > From: Anurag Kumar Vulisha <[email protected]>
> >> >
> >> > The present code in dwc3_gadget_ep_reclaim_completed_trb() will check
> >> > for IOC/LST bit in the event->status and returns if IOC/LST bit is
> >> > set. This logic doesn't work if multiple TRBs are queued per
> >> > request and the IOC/LST bit is set on the last TRB of that request.
> >> > Consider an example where a queued request has multiple queued TRBs
> >> > and IOC/LST bit is set only for the last TRB. In this case, the Core
> >> > generates XferComplete/XferInProgress events only for the last TRB
> >> > (since IOC/LST are set only for the last TRB). As per the logic in
> >> > dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for
> >> > IOC/LST bit and returns on the first TRB. This makes the remaining
> >> > TRBs left unhandled.
> >> > To aviod this, changed the code to check for IOC/LST bits in both
> >> avoid
> >>
> >> > event->status & TRB->ctrl. This patch does the same.
> >>
> >> We don't need to check both. It's very likely that checking the TRB is
> >> enough.
> >
> > Sorry, just to clarify, are you suggesting instead of:
> > - if (event->status & DEPEVT_STATUS_IOC)
> > + if ((event->status & DEPEVT_STATUS_IOC) &&
> > + (trb->ctrl & DWC3_TRB_CTRL_IOC))
> >
> >
> > We do something like:
> > - if (event->status & DEPEVT_STATUS_IOC)
> > + if (trb->ctrl & DWC3_TRB_CTRL_IOC)
> > + return 1;
> > +
> > + if (trb->ctrl & DWC3_TRB_CTRL_LST)
> > return 1;
> >
> > ?
>
> that's correct. In hindsight, I have no idea why I used the
> event->status here since all other checks are done against the TRB
> only.
>
> >> > At a practical level, this patch resolves USB transfer stalls seen
> >> > with adb on dwc3 based HiKey960 after functionfs gadget added
> >> > scatter-gather support around v4.20.
> >>
> >> Right, I remember asking for tracepoint data showing this problem
> >> happening. It's the best way to figure out what's really going on.
> >>
> >> Before we accept these two patches, could you collect dwc3 tracepoint
> >> data and share here?
> >
> > Sure. Attached is trace logs and regdumps for hikey960.
>
> Thanks
>
> > The one gotcha with the logs is that in the working case (with this
> > patch applied), I booted with the usb-c cable disconnected (as
> > suggested in the dwc3.rst doc), enabled tracing and plugged in the
> > device, then ran adb logcat a few times to validate no stalls.
> >
> > In the failure case (without this patch), I booted with the usb-c
> > cable disconnected, enabled tracing and then when I plugged in the
> > device, it never was detected by adb (it seems perhaps the problem had
> > already struck?).
>
> You never got a Reset Interrupt, so something else is going on. I
> suggest putting a sniffer and first making sure the host *does* drive
> reset signalling. Second step would be to look at your phy
> configuration. Is it going in suspend for any reason? Might want to try
> our snps,dis_u3_susphy_quirk and snps,dis_u2_susphy_quirk flags.
>
> > So I generated the failure2 log by booting with USB-C plugged in,
> > enabling tracing, and running adb logcat on the host to observe the
> > stall.
>
> Thank you. Here's a quick summary of what's in failure2:
>
> There is a series of 24-byte transfers on ep1out and that's the one
> which shows a problem. We can clearly see that adb is issuing one
> transfer at a time, only enqueueing transfer n+1 when transfer n is
> completed and given back, so we see a series of similar blocks:
>
> - dwc3_alloc_request
> - dwc3_ep_queue
> - dwc3_prepare_trb
> - dwc3_prepare_trb (for the chained bit)
> - dwc3_gadget_ep_cmd (update transfer)
> - dwc3_event (transfer in progress)
> - dwc3_complete_trb
> - dwc3_complete_trb (for the chained bit)
> - dwc3_gadget_giveback
> - dwc3_free_request
>
> So this works for several iterations. Note, however, that the TRB
> addresses don't really make sense. DWC3 allocates a contiguous block of
> memory to server as TRB pool, but we see non-consecutive addresses on
> these TRBs. I'm assuming there's an IOMMU in your system.
>
> Anyway, the failing point is here:
>
> > adbd-461 [002] d..1 49.855992: dwc3_alloc_request: ep1out: req 000000004e6eaaba length 0/0 zsI ==> 0
> > adbd-461 [002] d..2 49.855994: dwc3_ep_queue: ep1out: req 000000004e6eaaba length 0/24 zsI ==> -115
> > adbd-461 [002] d..2 49.855996: dwc3_prepare_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 24 ctrl 0000001d (HlCS:sc:normal)
> > adbd-461 [002] d..2 49.855997: dwc3_prepare_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
> > adbd-461 [002] d..2 49.856003: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
> > irq/65-dwc3-498 [000] d..1 53.902752: dwc3_event: event (00006084): ep1out: Transfer In Progress [0] (SIm)
> > irq/65-dwc3-498 [000] d..1 53.902763: dwc3_complete_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 0 ctrl 0000001c (hlCS:sc:normal)
> > irq/65-dwc3-498 [000] d..1 53.902769: dwc3_complete_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
> > irq/65-dwc3-498 [000] d..1 53.902781: dwc3_gadget_giveback: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
> > kworker/u16:0-7 [000] .... 53.903020: dwc3_free_request: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
> > adbd-461 [002] d..1 53.903273: dwc3_alloc_request: ep1out: req 00000000c769beab length 0/0 zsI ==> 0
> > adbd-461 [002] d..2 53.903285: dwc3_ep_queue: ep1out: req 00000000c769beab length 0/24 zsI ==> -115
> > adbd-461 [002] d..2 53.903292: dwc3_prepare_trb: ep1out: trb 00000000f0ffa827 buf 000000009eb11e80 size 24 ctrl 0000001d (HlCS:sc:normal)
> > adbd-461 [002] d..2 53.903296: dwc3_prepare_trb: ep1out: trb 00000000d6a9892a buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
> > adbd-461 [002] d..2 53.903315: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
>
> Note that this transfer, after started, took 4 seconds to complete,
> while all others completed within a few ms. There's no real reason for
> this visible from dwc3 driver itself. What follows, is a transfer that
> never completed.
>
> The only thing I can come up with, is that we starve the TRB ring, by
> continuously reclaiming a single TRB. We have 255 usable TRBs, so after
> a few iterations, we would see a stall due to starved TRB ring.
>
> There is a way to verify this by tracking trb_enqueue and trb_dequeue,
> if you're willing to do that, that'll help us prove that this is really
> the problem and, since current tracepoints doen't really show that
> information, it may be a good idea to add this information to
> dwc3_log_trb tracepoint class. Something like below should be enough,
> could you re-run the test of failure2 with this patch applied?


Ok. Attached is the trace logs using the new tracepoints with and
without the patch. In both cases, I started with the usb-c cable
plugged in, started tracing and ran "adb logcat -d" a few times.

Also, in the -with-fix case, I'm using the patch modified as we
discussed yesterday (which still avoids the issue). If this log
confirms your suspicions I'll go ahead and resubmit the new patch.

thanks
-john


Attachments:
hikey960.tar.xz (100.36 kB)

2020-01-25 11:02:20

by Felipe Balbi

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] usb: dwc3: gadget: Check for IOC/LST bit in both event->status and TRB->ctrl fields


Hi,

John Stultz <[email protected]> writes:
>> > On Wed, Jan 22, 2020 at 11:23 PM Felipe Balbi <[email protected]> wrote:
>> >> > From: Anurag Kumar Vulisha <[email protected]>
>> >> >
>> >> > The present code in dwc3_gadget_ep_reclaim_completed_trb() will check
>> >> > for IOC/LST bit in the event->status and returns if IOC/LST bit is
>> >> > set. This logic doesn't work if multiple TRBs are queued per
>> >> > request and the IOC/LST bit is set on the last TRB of that request.
>> >> > Consider an example where a queued request has multiple queued TRBs
>> >> > and IOC/LST bit is set only for the last TRB. In this case, the Core
>> >> > generates XferComplete/XferInProgress events only for the last TRB
>> >> > (since IOC/LST are set only for the last TRB). As per the logic in
>> >> > dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for
>> >> > IOC/LST bit and returns on the first TRB. This makes the remaining
>> >> > TRBs left unhandled.
>> >> > To aviod this, changed the code to check for IOC/LST bits in both
>> >> avoid
>> >>
>> >> > event->status & TRB->ctrl. This patch does the same.
>> >>
>> >> We don't need to check both. It's very likely that checking the TRB is
>> >> enough.
>> >
>> > Sorry, just to clarify, are you suggesting instead of:
>> > - if (event->status & DEPEVT_STATUS_IOC)
>> > + if ((event->status & DEPEVT_STATUS_IOC) &&
>> > + (trb->ctrl & DWC3_TRB_CTRL_IOC))
>> >
>> >
>> > We do something like:
>> > - if (event->status & DEPEVT_STATUS_IOC)
>> > + if (trb->ctrl & DWC3_TRB_CTRL_IOC)
>> > + return 1;
>> > +
>> > + if (trb->ctrl & DWC3_TRB_CTRL_LST)
>> > return 1;
>> >
>> > ?
>>
>> that's correct. In hindsight, I have no idea why I used the
>> event->status here since all other checks are done against the TRB
>> only.
>>
>> >> > At a practical level, this patch resolves USB transfer stalls seen
>> >> > with adb on dwc3 based HiKey960 after functionfs gadget added
>> >> > scatter-gather support around v4.20.
>> >>
>> >> Right, I remember asking for tracepoint data showing this problem
>> >> happening. It's the best way to figure out what's really going on.
>> >>
>> >> Before we accept these two patches, could you collect dwc3 tracepoint
>> >> data and share here?
>> >
>> > Sure. Attached is trace logs and regdumps for hikey960.
>>
>> Thanks
>>
>> > The one gotcha with the logs is that in the working case (with this
>> > patch applied), I booted with the usb-c cable disconnected (as
>> > suggested in the dwc3.rst doc), enabled tracing and plugged in the
>> > device, then ran adb logcat a few times to validate no stalls.
>> >
>> > In the failure case (without this patch), I booted with the usb-c
>> > cable disconnected, enabled tracing and then when I plugged in the
>> > device, it never was detected by adb (it seems perhaps the problem had
>> > already struck?).
>>
>> You never got a Reset Interrupt, so something else is going on. I
>> suggest putting a sniffer and first making sure the host *does* drive
>> reset signalling. Second step would be to look at your phy
>> configuration. Is it going in suspend for any reason? Might want to try
>> our snps,dis_u3_susphy_quirk and snps,dis_u2_susphy_quirk flags.
>>
>> > So I generated the failure2 log by booting with USB-C plugged in,
>> > enabling tracing, and running adb logcat on the host to observe the
>> > stall.
>>
>> Thank you. Here's a quick summary of what's in failure2:
>>
>> There is a series of 24-byte transfers on ep1out and that's the one
>> which shows a problem. We can clearly see that adb is issuing one
>> transfer at a time, only enqueueing transfer n+1 when transfer n is
>> completed and given back, so we see a series of similar blocks:
>>
>> - dwc3_alloc_request
>> - dwc3_ep_queue
>> - dwc3_prepare_trb
>> - dwc3_prepare_trb (for the chained bit)
>> - dwc3_gadget_ep_cmd (update transfer)
>> - dwc3_event (transfer in progress)
>> - dwc3_complete_trb
>> - dwc3_complete_trb (for the chained bit)
>> - dwc3_gadget_giveback
>> - dwc3_free_request
>>
>> So this works for several iterations. Note, however, that the TRB
>> addresses don't really make sense. DWC3 allocates a contiguous block of
>> memory to server as TRB pool, but we see non-consecutive addresses on
>> these TRBs. I'm assuming there's an IOMMU in your system.
>>
>> Anyway, the failing point is here:
>>
>> > adbd-461 [002] d..1 49.855992: dwc3_alloc_request: ep1out: req 000000004e6eaaba length 0/0 zsI ==> 0
>> > adbd-461 [002] d..2 49.855994: dwc3_ep_queue: ep1out: req 000000004e6eaaba length 0/24 zsI ==> -115
>> > adbd-461 [002] d..2 49.855996: dwc3_prepare_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 24 ctrl 0000001d (HlCS:sc:normal)
>> > adbd-461 [002] d..2 49.855997: dwc3_prepare_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
>> > adbd-461 [002] d..2 49.856003: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
>> > irq/65-dwc3-498 [000] d..1 53.902752: dwc3_event: event (00006084): ep1out: Transfer In Progress [0] (SIm)
>> > irq/65-dwc3-498 [000] d..1 53.902763: dwc3_complete_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 0 ctrl 0000001c (hlCS:sc:normal)
>> > irq/65-dwc3-498 [000] d..1 53.902769: dwc3_complete_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
>> > irq/65-dwc3-498 [000] d..1 53.902781: dwc3_gadget_giveback: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
>> > kworker/u16:0-7 [000] .... 53.903020: dwc3_free_request: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
>> > adbd-461 [002] d..1 53.903273: dwc3_alloc_request: ep1out: req 00000000c769beab length 0/0 zsI ==> 0
>> > adbd-461 [002] d..2 53.903285: dwc3_ep_queue: ep1out: req 00000000c769beab length 0/24 zsI ==> -115
>> > adbd-461 [002] d..2 53.903292: dwc3_prepare_trb: ep1out: trb 00000000f0ffa827 buf 000000009eb11e80 size 24 ctrl 0000001d (HlCS:sc:normal)
>> > adbd-461 [002] d..2 53.903296: dwc3_prepare_trb: ep1out: trb 00000000d6a9892a buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
>> > adbd-461 [002] d..2 53.903315: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
>>
>> Note that this transfer, after started, took 4 seconds to complete,
>> while all others completed within a few ms. There's no real reason for
>> this visible from dwc3 driver itself. What follows, is a transfer that
>> never completed.
>>
>> The only thing I can come up with, is that we starve the TRB ring, by
>> continuously reclaiming a single TRB. We have 255 usable TRBs, so after
>> a few iterations, we would see a stall due to starved TRB ring.
>>
>> There is a way to verify this by tracking trb_enqueue and trb_dequeue,
>> if you're willing to do that, that'll help us prove that this is really
>> the problem and, since current tracepoints doen't really show that
>> information, it may be a good idea to add this information to
>> dwc3_log_trb tracepoint class. Something like below should be enough,
>> could you re-run the test of failure2 with this patch applied?
>
>
> Ok. Attached is the trace logs using the new tracepoints with and
> without the patch. In both cases, I started with the usb-c cable
> plugged in, started tracing and ran "adb logcat -d" a few times.
>
> Also, in the -with-fix case, I'm using the patch modified as we
> discussed yesterday (which still avoids the issue). If this log
> confirms your suspicions I'll go ahead and resubmit the new patch.

So the problem is caused with ep1in, not ep1out as I originally
though. Here's snippet with the fix:

adbd-2020 [005] d..2 696.765411: dwc3_ep_queue: ep1in: req 0000000090c1f3b7 length 0/8197 zsI ==> -115
adbd-2020 [005] d..2 696.765414: dwc3_prepare_trb: ep1in: trb 00000000c0b7b1ee (E97:D96) buf 00000000aac5d000 size 4096 ctrl 00000015 (HlCs:sc:normal)
adbd-2020 [005] d..2 696.765415: dwc3_prepare_trb: ep1in: trb 00000000cd8ddc31 (E98:D96) buf 00000000adf18000 size 4101 ctrl 00000811 (Hlcs:sC:normal)
adbd-2020 [005] d..2 696.765419: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [30007] params 00000000 00000000 00000000 --> status: Successful
irq/65-dwc3-2021 [000] d..1 696.765640: dwc3_event: event (00004086): ep1in: Transfer In Progress [0] (sIm)
irq/65-dwc3-2021 [000] d..1 696.765642: dwc3_complete_trb: ep1in: trb 00000000c0b7b1ee (E98:D97) buf 00000000aac5d000 size 0 ctrl 00000014 (hlCs:sc:normal)
irq/65-dwc3-2021 [000] d..1 696.765644: dwc3_complete_trb: ep1in: trb 00000000cd8ddc31 (E98:D98) buf 00000000adf18000 size 0 ctrl 00000810 (hlcs:sC:normal)
irq/65-dwc3-2021 [000] d..1 696.765647: dwc3_gadget_giveback: ep1in: req 0000000090c1f3b7 length 8197/8197 zsI ==> 0
kworker/u16:0-7 [003] .... 696.765667: dwc3_free_request: ep1in: req 0000000090c1f3b7 length 8197/8197 zsI ==> 0

And without the fix:

adbd-469 [005] d..1 40.118540: dwc3_alloc_request: ep1in: req 000000000dca92a3 length 0/0 zsI ==> 0
adbd-469 [005] d..2 40.118541: dwc3_ep_queue: ep1in: req 000000000dca92a3 length 0/5424 zsI ==> -115
adbd-469 [005] d..2 40.118543: dwc3_prepare_trb: ep1in: trb 0000000020352887 (E77:D76) buf 0000000057db5000 size 4096 ctrl 00000015 (HlCs:sc:normal)
adbd-469 [005] d..2 40.118543: dwc3_prepare_trb: ep1in: trb 00000000227d614e (E78:D76) buf 0000000057db4000 size 1328 ctrl 00000811 (Hlcs:sC:normal)
adbd-469 [005] d..2 40.118547: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [30007] params 00000000 00000000 00000000 --> status: Successful
irq/65-dwc3-473 [000] d..1 40.118720: dwc3_event: event (00004086): ep1in: Transfer In Progress [0] (sIm)
irq/65-dwc3-473 [000] d..1 40.118721: dwc3_complete_trb: ep1in: trb 0000000020352887 (E78:D77) buf 0000000057db5000 size 0 ctrl 00000014 (hlCs:sc:normal)
irq/65-dwc3-473 [000] d..1 40.118730: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [30007] params 00000000 00000000 00000000 --> status: Successful

Note that we completed a single TRB in the failure case. The odd thing
is why this doesn't happen with OUT direction? (/me goes look at the
code).

Okay, here's the answer: With OUT direction, DWC3, itself, is adding an
extra chained TRB because OUT transfers must be aligned to
wMaxPacketSize. Because of that we set needs_extra_trb flag which causes
this flow:

XferInProgress
dwc3_gadget_ep_cleanup_completed_request
dwc3_gadget_ep_reclaim_trb_sg
for_each_sg {
dwc3_gadget_ep_reclaim_completed_trb
if (IOC)
break;
}
if (needs_extra_trb)
dwc3_gadget_ep_reclaim_trb_linear
dwc3_gadget_ep_reclaim_completed_trb

In summary, OUT directions work solely out of luck :-) If gadget
function enqueues an unaligned request with sglist already in it, it
should fail the same way, since we will append another TRB to something
that already uses more than one TRB.

We should probably add some of this explanation to commit log as well
and, BTW, tracepoints actually had the data to show where the problem
was, arguably printing out enqueue and dequeue points made it easier to
see the issue.

I'm now convinced of what the problem really is, please resend the
modified patch so we can apply and backport it.

cheers

--
balbi


Attachments:
signature.asc (847.00 B)

2020-01-27 18:49:43

by John Stultz

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] usb: dwc3: gadget: Check for IOC/LST bit in both event->status and TRB->ctrl fields

On Sat, Jan 25, 2020 at 3:01 AM Felipe Balbi <[email protected]> wrote:
>
>
> Hi,
>
> John Stultz <[email protected]> writes:
> >> > On Wed, Jan 22, 2020 at 11:23 PM Felipe Balbi <[email protected]> wrote:
> >> >> > From: Anurag Kumar Vulisha <[email protected]>
> >> >> >
> >> >> > The present code in dwc3_gadget_ep_reclaim_completed_trb() will check
> >> >> > for IOC/LST bit in the event->status and returns if IOC/LST bit is
> >> >> > set. This logic doesn't work if multiple TRBs are queued per
> >> >> > request and the IOC/LST bit is set on the last TRB of that request.
> >> >> > Consider an example where a queued request has multiple queued TRBs
> >> >> > and IOC/LST bit is set only for the last TRB. In this case, the Core
> >> >> > generates XferComplete/XferInProgress events only for the last TRB
> >> >> > (since IOC/LST are set only for the last TRB). As per the logic in
> >> >> > dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for
> >> >> > IOC/LST bit and returns on the first TRB. This makes the remaining
> >> >> > TRBs left unhandled.
> >> >> > To aviod this, changed the code to check for IOC/LST bits in both
> >> >> avoid
> >> >>
> >> >> > event->status & TRB->ctrl. This patch does the same.
> >> >>
> >> >> We don't need to check both. It's very likely that checking the TRB is
> >> >> enough.
> >> >
> >> > Sorry, just to clarify, are you suggesting instead of:
> >> > - if (event->status & DEPEVT_STATUS_IOC)
> >> > + if ((event->status & DEPEVT_STATUS_IOC) &&
> >> > + (trb->ctrl & DWC3_TRB_CTRL_IOC))
> >> >
> >> >
> >> > We do something like:
> >> > - if (event->status & DEPEVT_STATUS_IOC)
> >> > + if (trb->ctrl & DWC3_TRB_CTRL_IOC)
> >> > + return 1;
> >> > +
> >> > + if (trb->ctrl & DWC3_TRB_CTRL_LST)
> >> > return 1;
> >> >
> >> > ?
> >>
> >> that's correct. In hindsight, I have no idea why I used the
> >> event->status here since all other checks are done against the TRB
> >> only.
> >>
> >> >> > At a practical level, this patch resolves USB transfer stalls seen
> >> >> > with adb on dwc3 based HiKey960 after functionfs gadget added
> >> >> > scatter-gather support around v4.20.
> >> >>
> >> >> Right, I remember asking for tracepoint data showing this problem
> >> >> happening. It's the best way to figure out what's really going on.
> >> >>
> >> >> Before we accept these two patches, could you collect dwc3 tracepoint
> >> >> data and share here?
> >> >
> >> > Sure. Attached is trace logs and regdumps for hikey960.
> >>
> >> Thanks
> >>
> >> > The one gotcha with the logs is that in the working case (with this
> >> > patch applied), I booted with the usb-c cable disconnected (as
> >> > suggested in the dwc3.rst doc), enabled tracing and plugged in the
> >> > device, then ran adb logcat a few times to validate no stalls.
> >> >
> >> > In the failure case (without this patch), I booted with the usb-c
> >> > cable disconnected, enabled tracing and then when I plugged in the
> >> > device, it never was detected by adb (it seems perhaps the problem had
> >> > already struck?).
> >>
> >> You never got a Reset Interrupt, so something else is going on. I
> >> suggest putting a sniffer and first making sure the host *does* drive
> >> reset signalling. Second step would be to look at your phy
> >> configuration. Is it going in suspend for any reason? Might want to try
> >> our snps,dis_u3_susphy_quirk and snps,dis_u2_susphy_quirk flags.
> >>
> >> > So I generated the failure2 log by booting with USB-C plugged in,
> >> > enabling tracing, and running adb logcat on the host to observe the
> >> > stall.
> >>
> >> Thank you. Here's a quick summary of what's in failure2:
> >>
> >> There is a series of 24-byte transfers on ep1out and that's the one
> >> which shows a problem. We can clearly see that adb is issuing one
> >> transfer at a time, only enqueueing transfer n+1 when transfer n is
> >> completed and given back, so we see a series of similar blocks:
> >>
> >> - dwc3_alloc_request
> >> - dwc3_ep_queue
> >> - dwc3_prepare_trb
> >> - dwc3_prepare_trb (for the chained bit)
> >> - dwc3_gadget_ep_cmd (update transfer)
> >> - dwc3_event (transfer in progress)
> >> - dwc3_complete_trb
> >> - dwc3_complete_trb (for the chained bit)
> >> - dwc3_gadget_giveback
> >> - dwc3_free_request
> >>
> >> So this works for several iterations. Note, however, that the TRB
> >> addresses don't really make sense. DWC3 allocates a contiguous block of
> >> memory to server as TRB pool, but we see non-consecutive addresses on
> >> these TRBs. I'm assuming there's an IOMMU in your system.
> >>
> >> Anyway, the failing point is here:
> >>
> >> > adbd-461 [002] d..1 49.855992: dwc3_alloc_request: ep1out: req 000000004e6eaaba length 0/0 zsI ==> 0
> >> > adbd-461 [002] d..2 49.855994: dwc3_ep_queue: ep1out: req 000000004e6eaaba length 0/24 zsI ==> -115
> >> > adbd-461 [002] d..2 49.855996: dwc3_prepare_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 24 ctrl 0000001d (HlCS:sc:normal)
> >> > adbd-461 [002] d..2 49.855997: dwc3_prepare_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
> >> > adbd-461 [002] d..2 49.856003: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
> >> > irq/65-dwc3-498 [000] d..1 53.902752: dwc3_event: event (00006084): ep1out: Transfer In Progress [0] (SIm)
> >> > irq/65-dwc3-498 [000] d..1 53.902763: dwc3_complete_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 0 ctrl 0000001c (hlCS:sc:normal)
> >> > irq/65-dwc3-498 [000] d..1 53.902769: dwc3_complete_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
> >> > irq/65-dwc3-498 [000] d..1 53.902781: dwc3_gadget_giveback: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
> >> > kworker/u16:0-7 [000] .... 53.903020: dwc3_free_request: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
> >> > adbd-461 [002] d..1 53.903273: dwc3_alloc_request: ep1out: req 00000000c769beab length 0/0 zsI ==> 0
> >> > adbd-461 [002] d..2 53.903285: dwc3_ep_queue: ep1out: req 00000000c769beab length 0/24 zsI ==> -115
> >> > adbd-461 [002] d..2 53.903292: dwc3_prepare_trb: ep1out: trb 00000000f0ffa827 buf 000000009eb11e80 size 24 ctrl 0000001d (HlCS:sc:normal)
> >> > adbd-461 [002] d..2 53.903296: dwc3_prepare_trb: ep1out: trb 00000000d6a9892a buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
> >> > adbd-461 [002] d..2 53.903315: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
> >>
> >> Note that this transfer, after started, took 4 seconds to complete,
> >> while all others completed within a few ms. There's no real reason for
> >> this visible from dwc3 driver itself. What follows, is a transfer that
> >> never completed.
> >>
> >> The only thing I can come up with, is that we starve the TRB ring, by
> >> continuously reclaiming a single TRB. We have 255 usable TRBs, so after
> >> a few iterations, we would see a stall due to starved TRB ring.
> >>
> >> There is a way to verify this by tracking trb_enqueue and trb_dequeue,
> >> if you're willing to do that, that'll help us prove that this is really
> >> the problem and, since current tracepoints doen't really show that
> >> information, it may be a good idea to add this information to
> >> dwc3_log_trb tracepoint class. Something like below should be enough,
> >> could you re-run the test of failure2 with this patch applied?
> >
> >
> > Ok. Attached is the trace logs using the new tracepoints with and
> > without the patch. In both cases, I started with the usb-c cable
> > plugged in, started tracing and ran "adb logcat -d" a few times.
> >
> > Also, in the -with-fix case, I'm using the patch modified as we
> > discussed yesterday (which still avoids the issue). If this log
> > confirms your suspicions I'll go ahead and resubmit the new patch.
>
> So the problem is caused with ep1in, not ep1out as I originally
> though. Here's snippet with the fix:
>
> adbd-2020 [005] d..2 696.765411: dwc3_ep_queue: ep1in: req 0000000090c1f3b7 length 0/8197 zsI ==> -115
> adbd-2020 [005] d..2 696.765414: dwc3_prepare_trb: ep1in: trb 00000000c0b7b1ee (E97:D96) buf 00000000aac5d000 size 4096 ctrl 00000015 (HlCs:sc:normal)
> adbd-2020 [005] d..2 696.765415: dwc3_prepare_trb: ep1in: trb 00000000cd8ddc31 (E98:D96) buf 00000000adf18000 size 4101 ctrl 00000811 (Hlcs:sC:normal)
> adbd-2020 [005] d..2 696.765419: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [30007] params 00000000 00000000 00000000 --> status: Successful
> irq/65-dwc3-2021 [000] d..1 696.765640: dwc3_event: event (00004086): ep1in: Transfer In Progress [0] (sIm)
> irq/65-dwc3-2021 [000] d..1 696.765642: dwc3_complete_trb: ep1in: trb 00000000c0b7b1ee (E98:D97) buf 00000000aac5d000 size 0 ctrl 00000014 (hlCs:sc:normal)
> irq/65-dwc3-2021 [000] d..1 696.765644: dwc3_complete_trb: ep1in: trb 00000000cd8ddc31 (E98:D98) buf 00000000adf18000 size 0 ctrl 00000810 (hlcs:sC:normal)
> irq/65-dwc3-2021 [000] d..1 696.765647: dwc3_gadget_giveback: ep1in: req 0000000090c1f3b7 length 8197/8197 zsI ==> 0
> kworker/u16:0-7 [003] .... 696.765667: dwc3_free_request: ep1in: req 0000000090c1f3b7 length 8197/8197 zsI ==> 0
>
> And without the fix:
>
> adbd-469 [005] d..1 40.118540: dwc3_alloc_request: ep1in: req 000000000dca92a3 length 0/0 zsI ==> 0
> adbd-469 [005] d..2 40.118541: dwc3_ep_queue: ep1in: req 000000000dca92a3 length 0/5424 zsI ==> -115
> adbd-469 [005] d..2 40.118543: dwc3_prepare_trb: ep1in: trb 0000000020352887 (E77:D76) buf 0000000057db5000 size 4096 ctrl 00000015 (HlCs:sc:normal)
> adbd-469 [005] d..2 40.118543: dwc3_prepare_trb: ep1in: trb 00000000227d614e (E78:D76) buf 0000000057db4000 size 1328 ctrl 00000811 (Hlcs:sC:normal)
> adbd-469 [005] d..2 40.118547: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [30007] params 00000000 00000000 00000000 --> status: Successful
> irq/65-dwc3-473 [000] d..1 40.118720: dwc3_event: event (00004086): ep1in: Transfer In Progress [0] (sIm)
> irq/65-dwc3-473 [000] d..1 40.118721: dwc3_complete_trb: ep1in: trb 0000000020352887 (E78:D77) buf 0000000057db5000 size 0 ctrl 00000014 (hlCs:sc:normal)
> irq/65-dwc3-473 [000] d..1 40.118730: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [30007] params 00000000 00000000 00000000 --> status: Successful
>
> Note that we completed a single TRB in the failure case. The odd thing
> is why this doesn't happen with OUT direction? (/me goes look at the
> code).
>
> Okay, here's the answer: With OUT direction, DWC3, itself, is adding an
> extra chained TRB because OUT transfers must be aligned to
> wMaxPacketSize. Because of that we set needs_extra_trb flag which causes
> this flow:
>
> XferInProgress
> dwc3_gadget_ep_cleanup_completed_request
> dwc3_gadget_ep_reclaim_trb_sg
> for_each_sg {
> dwc3_gadget_ep_reclaim_completed_trb
> if (IOC)
> break;
> }
> if (needs_extra_trb)
> dwc3_gadget_ep_reclaim_trb_linear
> dwc3_gadget_ep_reclaim_completed_trb
>
> In summary, OUT directions work solely out of luck :-) If gadget
> function enqueues an unaligned request with sglist already in it, it
> should fail the same way, since we will append another TRB to something
> that already uses more than one TRB.
>
> We should probably add some of this explanation to commit log as well
> and, BTW, tracepoints actually had the data to show where the problem
> was, arguably printing out enqueue and dequeue points made it easier to
> see the issue.
>
> I'm now convinced of what the problem really is, please resend the
> modified patch so we can apply and backport it.

Sure thing! Though I'm not as adept at staring at the matrix/tracelogs
as you, so I'll do my best to add a comment to the commit log to the
effect of the above, but it may not be accurate so feel free to reword
it yourself to correct it after I send it out. :)

thanks so much again for taking a look at this!
-john

2020-02-05 21:05:41

by John Stultz

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs

On Thu, Jan 23, 2020 at 9:46 AM Felipe Balbi <[email protected]> wrote:
> >>>>>
> >>>>> Since ~4.20, when the functionfs gadget enabled scatter-gather
> >>>>> support, we have seen problems with adb connections stalling and
> >>>>> stopping to function on hardware with dwc3 usb controllers.
> >>>>> Specifically, HiKey960, Dragonboard 845c, and Pixel3 devices.
...
> >>
> >> I'm pretty sure this should be solved at the DMA API level, just want to confirm.
> >
> > I have sent you the tracepoints long time ago. Also my analysis of the
> > problem (BTW, I don't think the tracepoints helped much). It's
> > basically a logic problem in function dwc3_gadget_ep_reclaim_trb_sg().
>
> AFAICT, this is caused by DMA API merging pages together when map an
> sglist for DMA. While doing that, it does *not* move the SG_END flag
> which sg_is_last() checks.
>
> I consider that an overlook on the DMA API, wouldn't you? Why should DMA
> API users care if pages were merged or not while mapping the sglist? We
> have for_each_sg() and sg_is_last() for a reason.
>

From an initial look, I agree this is pretty confusing. dma_map_sg()
can coalesce entries in the sg list, modifying the sg entires
themselves, however, in doing so it doesn't modify the number of
entries in the sglist (nor the end state bit). That's pretty subtle!

My initial naive attempt to fix the dma-iommu path to set the end bit
at the tail of __finalize_sg() which does the sg entry modifications,
only caused trouble elsewhere, as there's plenty of logic that expects
the number of entries to not change, so having sg_next() return NULL
before that point results in lots of null pointer traversals.

Further, looking at the history, while apparently quirky, this has
been the documented behavior in DMA-API.txt for over almost 14 years
(at least). It clearly states that that dma_map_api can return fewer
mapped entries then sg entries, and one should loop only that many
times (for_each_sg() having a max number of entries to iterate over
seems specifically for this purpose). Additionally, it says one must
preserve the original number of entries (not # mapped entries) for
dma_unmap_sg().

So I'm not sure that sg_is_last() is really valid for iterating on
mapped sg lists.

Should it be? Probably (at least with my unfamiliar eyes), but
sg_is_last() has been around for almost as long coexisting with this
behavioral quirk, so I'm also not sure this is the best hill for the
dwc3 driver to die on. :)

The fix here:
https://lore.kernel.org/lkml/[email protected]/
Or maybe the slightly cleaner varient here:
https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/db845c-mainline-WIP&id=61a5816aa71ec719e77df9f2260dbbf6bcec7c99
seems like it would correctly address things following the
documentation and avoid the failures we're seeing.

As to if dma_map_sg() should fixup the state bits or properly shrink
the sg list when it coalesces entries, that seems like it would be a
much more intrusive change across quite a bit of the kernel that was
written to follow the documented method. So my confidence that such a
change would make it upstream in a reasonable amount of time isn't
very high, and it seems like a bad idea to block the driver from
working properly in the meantime.

Pulling in Christoph and Jens as I suspect they have more context on
this, and maybe can explain thats its not so quirky with the right
perspective?

Thoughts? Maybe there is an easier way to make it less quirky then
what I imagine?

thanks
-john

2020-02-06 06:25:21

by Felipe Balbi

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs


Hi,

John Stultz <[email protected]> writes:
> On Thu, Jan 23, 2020 at 9:46 AM Felipe Balbi <[email protected]> wrote:
>> >>>>>
>> >>>>> Since ~4.20, when the functionfs gadget enabled scatter-gather
>> >>>>> support, we have seen problems with adb connections stalling and
>> >>>>> stopping to function on hardware with dwc3 usb controllers.
>> >>>>> Specifically, HiKey960, Dragonboard 845c, and Pixel3 devices.
> ...
>> >>
>> >> I'm pretty sure this should be solved at the DMA API level, just want to confirm.
>> >
>> > I have sent you the tracepoints long time ago. Also my analysis of the
>> > problem (BTW, I don't think the tracepoints helped much). It's
>> > basically a logic problem in function dwc3_gadget_ep_reclaim_trb_sg().
>>
>> AFAICT, this is caused by DMA API merging pages together when map an
>> sglist for DMA. While doing that, it does *not* move the SG_END flag
>> which sg_is_last() checks.
>>
>> I consider that an overlook on the DMA API, wouldn't you? Why should DMA
>> API users care if pages were merged or not while mapping the sglist? We
>> have for_each_sg() and sg_is_last() for a reason.
>>
>
> From an initial look, I agree this is pretty confusing. dma_map_sg()
> can coalesce entries in the sg list, modifying the sg entires
> themselves, however, in doing so it doesn't modify the number of
> entries in the sglist (nor the end state bit). That's pretty subtle!
>
> My initial naive attempt to fix the dma-iommu path to set the end bit
> at the tail of __finalize_sg() which does the sg entry modifications,
> only caused trouble elsewhere, as there's plenty of logic that expects
> the number of entries to not change, so having sg_next() return NULL
> before that point results in lots of null pointer traversals.
>
> Further, looking at the history, while apparently quirky, this has
> been the documented behavior in DMA-API.txt for over almost 14 years
> (at least). It clearly states that that dma_map_api can return fewer
> mapped entries then sg entries, and one should loop only that many
> times (for_each_sg() having a max number of entries to iterate over
> seems specifically for this purpose). Additionally, it says one must
> preserve the original number of entries (not # mapped entries) for
> dma_unmap_sg().
>
> So I'm not sure that sg_is_last() is really valid for iterating on
> mapped sg lists.
>
> Should it be? Probably (at least with my unfamiliar eyes), but
> sg_is_last() has been around for almost as long coexisting with this
> behavioral quirk, so I'm also not sure this is the best hill for the
> dwc3 driver to die on. :)
>
> The fix here:
> https://lore.kernel.org/lkml/[email protected]/
> Or maybe the slightly cleaner varient here:
> https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/db845c-mainline-WIP&id=61a5816aa71ec719e77df9f2260dbbf6bcec7c99

in that case, we don't need to use sg_is_last() at all, since i will
always encode the last entry in the list.

> seems like it would correctly address things following the
> documentation and avoid the failures we're seeing.
>
> As to if dma_map_sg() should fixup the state bits or properly shrink
> the sg list when it coalesces entries, that seems like it would be a
> much more intrusive change across quite a bit of the kernel that was
> written to follow the documented method. So my confidence that such a
> change would make it upstream in a reasonable amount of time isn't
> very high, and it seems like a bad idea to block the driver from
> working properly in the meantime.
>
> Pulling in Christoph and Jens as I suspect they have more context on
> this, and maybe can explain thats its not so quirky with the right
> perspective?
>
> Thoughts? Maybe there is an easier way to make it less quirky then
> what I imagine?

it just seems very counter-intuitive to me that DMA api can coalesce
entries but they're actually still there and drivers have to cope with
this behavior.

--
balbi


Attachments:
signature.asc (847.00 B)

2020-02-06 07:44:05

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs

On Wed, Feb 05, 2020 at 01:03:51PM -0800, John Stultz wrote:
> On Thu, Jan 23, 2020 at 9:46 AM Felipe Balbi <[email protected]> wrote:
> > >> I'm pretty sure this should be solved at the DMA API level, just want to confirm.
> > >
> > > I have sent you the tracepoints long time ago. Also my analysis of the
> > > problem (BTW, I don't think the tracepoints helped much). It's
> > > basically a logic problem in function dwc3_gadget_ep_reclaim_trb_sg().
> >
> > AFAICT, this is caused by DMA API merging pages together when map an
> > sglist for DMA. While doing that, it does *not* move the SG_END flag
> > which sg_is_last() checks.
> >
> > I consider that an overlook on the DMA API, wouldn't you? Why should DMA
> > API users care if pages were merged or not while mapping the sglist? We
> > have for_each_sg() and sg_is_last() for a reason.
> >
>
> >From an initial look, I agree this is pretty confusing. dma_map_sg()
> can coalesce entries in the sg list, modifying the sg entires
> themselves, however, in doing so it doesn't modify the number of
> entries in the sglist (nor the end state bit). That's pretty subtle!

dma_map_sg only coalesces the dma address. The page, offset and len
members are immutable.

The problem is really the design of the scatterlist structure - it
combines immutable input parameters (page, offset, len) and output
parameters (dma_addr, dma_len) in one data structure, and then needs
different accessors depending on which information you care about.
The end marker only works for the "CPU" view.

The right fix is top stop using struct scatterlist, but that is going to
be larger and painful change. At least for block layer stuff I plan to
incrementally do that, though.

> So I'm not sure that sg_is_last() is really valid for iterating on
> mapped sg lists.
>
> Should it be? Probably (at least with my unfamiliar eyes), but
> sg_is_last() has been around for almost as long coexisting with this
> behavioral quirk, so I'm also not sure this is the best hill for the
> dwc3 driver to die on. :)

No, it shoudn't. dma_map_sg returns the number of mapped segments,
and the callers need to remember that.

>
> The fix here:
> https://lore.kernel.org/lkml/[email protected]/
> Or maybe the slightly cleaner varient here:
> https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/db845c-mainline-WIP&id=61a5816aa71ec719e77df9f2260dbbf6bcec7c99
> seems like it would correctly address things following the
> documentation and avoid the failures we're seeing.

The first version is the corret one. sg_is_last has no meaning for the
"DMA" view of the scatterlist.

2020-02-06 18:31:11

by Felipe Balbi

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs


Hi,

Christoph Hellwig <[email protected]> writes:
> On Wed, Feb 05, 2020 at 01:03:51PM -0800, John Stultz wrote:
>> On Thu, Jan 23, 2020 at 9:46 AM Felipe Balbi <[email protected]> wrote:
>> > >> I'm pretty sure this should be solved at the DMA API level, just want to confirm.
>> > >
>> > > I have sent you the tracepoints long time ago. Also my analysis of the
>> > > problem (BTW, I don't think the tracepoints helped much). It's
>> > > basically a logic problem in function dwc3_gadget_ep_reclaim_trb_sg().
>> >
>> > AFAICT, this is caused by DMA API merging pages together when map an
>> > sglist for DMA. While doing that, it does *not* move the SG_END flag
>> > which sg_is_last() checks.
>> >
>> > I consider that an overlook on the DMA API, wouldn't you? Why should DMA
>> > API users care if pages were merged or not while mapping the sglist? We
>> > have for_each_sg() and sg_is_last() for a reason.
>> >
>>
>> >From an initial look, I agree this is pretty confusing. dma_map_sg()
>> can coalesce entries in the sg list, modifying the sg entires
>> themselves, however, in doing so it doesn't modify the number of
>> entries in the sglist (nor the end state bit). That's pretty subtle!
>
> dma_map_sg only coalesces the dma address. The page, offset and len
> members are immutable.

ok

> The problem is really the design of the scatterlist structure - it
> combines immutable input parameters (page, offset, len) and output
> parameters (dma_addr, dma_len) in one data structure, and then needs
> different accessors depending on which information you care about.
> The end marker only works for the "CPU" view.

right

> The right fix is top stop using struct scatterlist, but that is going to
> be larger and painful change. At least for block layer stuff I plan to
> incrementally do that, though.

I don't think that would be necessary though.

>> So I'm not sure that sg_is_last() is really valid for iterating on
>> mapped sg lists.
>>
>> Should it be? Probably (at least with my unfamiliar eyes), but
>> sg_is_last() has been around for almost as long coexisting with this
>> behavioral quirk, so I'm also not sure this is the best hill for the
>> dwc3 driver to die on. :)
>
> No, it shoudn't. dma_map_sg returns the number of mapped segments,
> and the callers need to remember that.

We _do_ remember that:

unsigned int remaining = req->request.num_mapped_sgs
- req->num_queued_sgs;

for_each_sg(sg, s, remaining, i) {
unsigned int length = req->request.length;
unsigned int maxp = usb_endpoint_maxp(dep->endpoint.desc);
unsigned int rem = length % maxp;
unsigned chain = true;

if (sg_is_last(s))
chain = false;

if (rem && usb_endpoint_dir_out(dep->endpoint.desc) && !chain) {

that req->request.num_mapped_sgs is the returned value. So you're saying
we should test for i == num_mapped_sgs, instead of using
sg_is_last(). Is that it?

Fair enough. Just out of curiosity, then, when *should* we use
sg_is_last()?

cheers

--
balbi


Attachments:
signature.asc (847.00 B)

2020-02-06 18:42:42

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs

On Thu, Feb 06, 2020 at 08:29:45PM +0200, Felipe Balbi wrote:
> > No, it shoudn't. dma_map_sg returns the number of mapped segments,
> > and the callers need to remember that.
>
> We _do_ remember that:

That helps :)

> that req->request.num_mapped_sgs is the returned value. So you're saying
> we should test for i == num_mapped_sgs, instead of using
> sg_is_last(). Is that it?

Yes.

> Fair enough. Just out of curiosity, then, when *should* we use
> sg_is_last()?

Outside of sg_next/sg_last it really shoud not be used at all as far
as I'm concerned.

2020-02-07 06:01:53

by Felipe Balbi

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs


Hi,

Christoph Hellwig <[email protected]> writes:
> On Thu, Feb 06, 2020 at 08:29:45PM +0200, Felipe Balbi wrote:
>> Fair enough. Just out of curiosity, then, when *should* we use
>> sg_is_last()?
>
> Outside of sg_next/sg_last it really shoud not be used at all as far
> as I'm concerned.

In that case, we may have other drivers with similar issues that just
haven't surfaced:

$ git --no-pager grep -e sg_is_last
drivers/ata/pata_octeon_cf.c:701: if (!sg_is_last(qc->cursg)) {
drivers/crypto/amcc/crypto4xx_core.c:738: if (sg_is_last(dst) && force_sd == false) {
drivers/crypto/atmel-sha.c:318: if ((ctx->sg->length == 0) && !sg_is_last(ctx->sg)) {
drivers/crypto/atmel-sha.c:781: if (!sg_is_last(sg) && !IS_ALIGNED(sg->length, ctx->block_size))
drivers/crypto/atmel-sha.c:787: if (sg_is_last(sg)) {
drivers/crypto/ccree/cc_buffer_mgr.c:293: if (sg_is_last(sg)) {
drivers/crypto/ccree/cc_buffer_mgr.c:305: } else { /*sg_is_last*/
drivers/crypto/hisilicon/hpre/hpre_crypto.c:238: if ((sg_is_last(data) && len == ctx->key_sz) &&
drivers/crypto/marvell/tdma.c:25: if (sg_is_last(sgiter->sg))
drivers/crypto/mediatek/mtk-sha.c:196: if ((ctx->sg->length == 0) && !sg_is_last(ctx->sg)) {
drivers/crypto/mediatek/mtk-sha.c:530: if (!sg_is_last(sg) && !IS_ALIGNED(sg->length, ctx->bs))
drivers/crypto/mediatek/mtk-sha.c:536: if (sg_is_last(sg)) {
drivers/crypto/mxs-dcp.c:339: if (actx->fill == out_off || sg_is_last(src) ||
drivers/crypto/qat/qat_common/qat_asym_algs.c:322: if (sg_is_last(req->src) && req->src_len == ctx->p_size) {
drivers/crypto/qat/qat_common/qat_asym_algs.c:353: if (sg_is_last(req->dst) && req->dst_len == ctx->p_size) {
drivers/crypto/qat/qat_common/qat_asym_algs.c:730: if (sg_is_last(req->src) && req->src_len == ctx->key_sz) {
drivers/crypto/qat/qat_common/qat_asym_algs.c:749: if (sg_is_last(req->dst) && req->dst_len == ctx->key_sz) {
drivers/crypto/qat/qat_common/qat_asym_algs.c:874: if (sg_is_last(req->src) && req->src_len == ctx->key_sz) {
drivers/crypto/qat/qat_common/qat_asym_algs.c:893: if (sg_is_last(req->dst) && req->dst_len == ctx->key_sz) {
drivers/crypto/rockchip/rk3288_crypto_ahash.c:238: if (sg_is_last(dev->sg_src)) {
drivers/crypto/rockchip/rk3288_crypto_skcipher.c:356: if (sg_is_last(dev->sg_src)) {
drivers/crypto/s5p-sss.c:581: if (!sg_is_last(dev->sg_dst)) {
drivers/crypto/s5p-sss.c:603: if (!sg_is_last(dev->sg_src)) {
drivers/crypto/s5p-sss.c:690: if (sg_is_last(dev->sg_dst))
drivers/crypto/stm32/stm32-hash.c:304: if ((rctx->sg->length == 0) && !sg_is_last(rctx->sg)) {
drivers/crypto/stm32/stm32-hash.c:568: if (sg_is_last(sg)) {
drivers/crypto/stm32/stm32-hash.c:595: !sg_is_last(sg));
drivers/crypto/stm32/stm32-hash.c:668: (!sg_is_last(sg)))
drivers/dma/ipu/ipu_idmac.c:847: dma_addr_t dma_1 = sg_is_last(desc->sg) ? 0 :
drivers/gpu/drm/i915/i915_gpu_error.c:649: if (sg_is_last(sg))
drivers/gpu/drm/i915/i915_gpu_error.c:653: sg = sg_is_last(sg) ? NULL : sg_chain_ptr(sg);
drivers/gpu/drm/i915/i915_gpu_error.c:903: } while (!sg_is_last(sg++));
drivers/gpu/drm/i915/i915_scatterlist.h:66: return sg_is_last(sg) ? NULL : ____sg_next(sg);
drivers/hwtracing/intel_th/msu.c:545: if (sg_is_last(iter->block))
drivers/memstick/core/ms_block.c:43: if (sg_is_last(sg_from))
drivers/memstick/core/ms_block.c:58: if (sg_is_last(sg_from) || !len)
drivers/memstick/core/ms_block.c:73: if (sg_is_last(sg_from) || !len)
drivers/mmc/host/bcm2835.c:485: if (sg_is_last(sg)) {
drivers/rapidio/devices/tsi721_dma.c:514: if (sg_is_last(sg)) {
drivers/s390/scsi/zfcp_qdio.h:184: return sg_is_last(sg) && sg->length <= ZFCP_QDIO_SBALE_LEN;
drivers/scsi/NCR5380.c:171: if (!cmd->SCp.this_residual && s && !sg_is_last(s)) {
drivers/scsi/NCR5380.c:184: while (!sg_is_last(s)) {
drivers/scsi/aha152x.c:2019: !sg_is_last(CURRENT_SC->SCp.buffer)) {
drivers/scsi/aha152x.c:2125: !sg_is_last(CURRENT_SC->SCp.buffer)) {
drivers/scsi/aha152x.c:2155: while (done > 0 && !sg_is_last(sg)) {
drivers/scsi/qla2xxx/qla_iocb.c:1226: sg_is_last(sg)) {
drivers/spi/spi-bcm2835.c:484: if (bs->tx_buf && !sg_is_last(&tfr->tx_sg.sgl[0]))
drivers/spi/spi-bcm2835.c:487: if (bs->rx_buf && !sg_is_last(&tfr->rx_sg.sgl[0])) {
drivers/spi/spi-bcm2835.c:491: if (!bs->tx_buf || sg_is_last(&tfr->tx_sg.sgl[0])) {
drivers/usb/dwc2/gadget.c:861: sg_is_last(sg));
drivers/usb/dwc3/gadget.c:1074: if (sg_is_last(s))
drivers/usb/image/microtek.c:507: sg_is_last(context->curr_sg) ?
include/linux/devcoredump.h:40: while (!sg_is_last(iter)) {
include/linux/scatterlist.h:73:#define sg_is_last(sg) ((sg)->page_link & SG_END)
lib/scatterlist.c:25: if (sg_is_last(sg))
lib/scatterlist.c:109: BUG_ON(!sg_is_last(ret));
net/core/skbuff.c:4289: if (unlikely(elt && sg_is_last(&sg[elt - 1])))
net/core/skbuff.c:4311: if (unlikely(elt && sg_is_last(&sg[elt - 1])))
net/tls/tls_main.c:116: if (sg_is_last(sg))
net/xfrm/espintcp.c:179: if (sg_is_last(sg))
samples/kfifo/dma-example.c:79: if (sg_is_last(&sg[i]))
samples/kfifo/dma-example.c:108: if (sg_is_last(&sg[i]))
tools/virtio/linux/scatterlist.h:15:#define sg_is_last(sg) ((sg)->page_link & 0x02)
tools/virtio/linux/scatterlist.h:139: if (sg_is_last(sg))

--
balbi


Attachments:
signature.asc (847.00 B)