2023-05-18 20:59:55

by Frank Li

Subject: [PATCH 1/2] usb: cdns3: improve handling of unaligned address case

When the address of a request was not aligned with an 8-byte boundary, the
USB DMA was unable to process it, necessitating the use of an internal
bounce buffer.

In these cases, request->buf had to be copied to or from this bounce
buffer. However, in this unaligned case it is unnecessary to perform
heavy cache maintenance operations such as
usb_gadget_map_request_by_dev() and usb_gadget_unmap_request_by_dev() on
request->buf, because the DMA does not use it at all. Skip these calls in
this case.

iperf3 tests on the rndis case:

Transmit speed (TX): Improved from 299Mbps to 440Mbps
Receive speed (RX): Improved from 290Mbps to 500Mbps

Signed-off-by: Frank Li <[email protected]>
---
drivers/usb/cdns3/cdns3-gadget.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/cdns3/cdns3-gadget.c b/drivers/usb/cdns3/cdns3-gadget.c
index 1dcadef933e3..09a0882a4e97 100644
--- a/drivers/usb/cdns3/cdns3-gadget.c
+++ b/drivers/usb/cdns3/cdns3-gadget.c
@@ -800,7 +800,8 @@ void cdns3_gadget_giveback(struct cdns3_endpoint *priv_ep,
 	if (request->status == -EINPROGRESS)
 		request->status = status;

-	usb_gadget_unmap_request_by_dev(priv_dev->sysdev, request,
+	if (likely(!(priv_req->flags & REQUEST_UNALIGNED)))
+		usb_gadget_unmap_request_by_dev(priv_dev->sysdev, request,
 					priv_ep->dir);

 	if ((priv_req->flags & REQUEST_UNALIGNED) &&
@@ -2543,10 +2544,12 @@ static int __cdns3_gadget_ep_queue(struct usb_ep *ep,
 	if (ret < 0)
 		return ret;

-	ret = usb_gadget_map_request_by_dev(priv_dev->sysdev, request,
+	if (likely(!(priv_req->flags & REQUEST_UNALIGNED))) {
+		ret = usb_gadget_map_request_by_dev(priv_dev->sysdev, request,
 					    usb_endpoint_dir_in(ep->desc));
-	if (ret)
-		return ret;
+		if (ret)
+			return ret;
+	}

 	list_add_tail(&request->list, &priv_ep->deferred_req_list);

--
2.34.1
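
The control flow described in the patch above can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not
the driver code: the struct, the functions and the SKETCH_* constants are
hypothetical names, the flag value is made up, and a counter stands in for
the real usb_gadget_map_request_by_dev() and
usb_gadget_unmap_request_by_dev() calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants: the 8-byte boundary comes from the commit
 * message, the flag value is invented for this sketch. */
#define SKETCH_DMA_ALIGN 8u
#define SKETCH_REQUEST_UNALIGNED 0x1u

/* Hypothetical stand-in for the driver's private request state. */
struct sketch_request {
	void *buf;              /* caller's buffer, never dereferenced here */
	unsigned int flags;     /* SKETCH_REQUEST_UNALIGNED when bounced */
	unsigned int maint_ops; /* counts simulated map/unmap calls */
};

/* True when buf does not sit on an 8-byte boundary. */
static bool sketch_buf_unaligned(const void *buf)
{
	return ((uintptr_t)buf & (SKETCH_DMA_ALIGN - 1u)) != 0;
}

/* Queue path: mark unaligned requests, map only the aligned ones. */
static void sketch_queue(struct sketch_request *req)
{
	assert(req != NULL);

	if (sketch_buf_unaligned(req->buf))
		req->flags |= SKETCH_REQUEST_UNALIGNED; /* bounce buffer is used */

	if (!(req->flags & SKETCH_REQUEST_UNALIGNED))
		req->maint_ops++; /* stands in for the map call */
}

/* Completion path: unmap only what was mapped at queue time. */
static void sketch_giveback(struct sketch_request *req)
{
	assert(req != NULL);

	if (!(req->flags & SKETCH_REQUEST_UNALIGNED))
		req->maint_ops++; /* stands in for the unmap call */
}
```

In this sketch, a request whose buffer address is a multiple of 8 goes
through both simulated cache maintenance steps, while a request at an odd
address goes through neither, because only the bounce buffer would be
handed to the DMA.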



2023-06-04 23:23:14

by Peter Chen

Subject: Re: [PATCH 1/2] usb: cdns3: improve handling of unaligned address case

On 23-05-18 16:49:45, Frank Li wrote:
> When the address of a request was not aligned with an 8-byte boundary, the
> USB DMA was unable to process it, necessitating the use of an internal
> bounce buffer.
>
> In these cases, request->buf had to be copied to or from this bounce
> buffer. However, in this unaligned case it is unnecessary to perform
> heavy cache maintenance operations such as
> usb_gadget_map_request_by_dev() and usb_gadget_unmap_request_by_dev() on
> request->buf, because the DMA does not use it at all. Skip these calls in
> this case.
>
> iperf3 tests on the rndis case:
>
> Transmit speed (TX): Improved from 299Mbps to 440Mbps
> Receive speed (RX): Improved from 290Mbps to 500Mbps
>
> Signed-off-by: Frank Li <[email protected]>
> ---
> drivers/usb/cdns3/cdns3-gadget.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/usb/cdns3/cdns3-gadget.c b/drivers/usb/cdns3/cdns3-gadget.c
> index 1dcadef933e3..09a0882a4e97 100644
> --- a/drivers/usb/cdns3/cdns3-gadget.c
> +++ b/drivers/usb/cdns3/cdns3-gadget.c
> @@ -800,7 +800,8 @@ void cdns3_gadget_giveback(struct cdns3_endpoint *priv_ep,
> if (request->status == -EINPROGRESS)
> request->status = status;
>
> - usb_gadget_unmap_request_by_dev(priv_dev->sysdev, request,
> + if (likely(!(priv_req->flags & REQUEST_UNALIGNED)))
> + usb_gadget_unmap_request_by_dev(priv_dev->sysdev, request,
> priv_ep->dir);
>
> if ((priv_req->flags & REQUEST_UNALIGNED) &&
> @@ -2543,10 +2544,12 @@ static int __cdns3_gadget_ep_queue(struct usb_ep *ep,
> if (ret < 0)
> return ret;
>
> - ret = usb_gadget_map_request_by_dev(priv_dev->sysdev, request,
> + if (likely(!(priv_req->flags & REQUEST_UNALIGNED))) {
> + ret = usb_gadget_map_request_by_dev(priv_dev->sysdev, request,
> usb_endpoint_dir_in(ep->desc));

So, the possible reason for the performance drop is that cache coherency
operations are done twice for unaligned buffers?

Peter

> - if (ret)
> - return ret;
> + if (ret)
> + return ret;
> + }
>
> list_add_tail(&request->list, &priv_ep->deferred_req_list);
>
> --
> 2.34.1
>

--

Thanks,
Peter Chen