2020-06-16 12:37:18

by Macpaul Lin

Subject: [PATCH 1/2] usb: gadget: introduce flag for large request

Some USB hardware, such as a DMA engine, can split the data of each
URB request into small packets on its own. For example, the max packet
size for high speed is 512 bytes. Such hardware can split continuous
Tx/Rx data requests into packets of exactly the max packet size during
transmission. Hence upper-layer software can save the effort of
queueing many small requests back and forth for larger data.

Here we introduce a "can_exceed_maxp" flag in gadget, set when the
hardware supports these operations.

Signed-off-by: Macpaul Lin <[email protected]>
---
drivers/usb/mtu3/mtu3_qmu.c | 11 ++++++++++-
include/linux/usb/gadget.h | 1 +
2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/mtu3/mtu3_qmu.c b/drivers/usb/mtu3/mtu3_qmu.c
index 3f414f9..2b51a20 100644
--- a/drivers/usb/mtu3/mtu3_qmu.c
+++ b/drivers/usb/mtu3/mtu3_qmu.c
@@ -620,7 +620,7 @@ irqreturn_t mtu3_qmu_isr(struct mtu3 *mtu)

 int mtu3_qmu_init(struct mtu3 *mtu)
 {
-
+	int i;
 	compiletime_assert(QMU_GPD_SIZE == 16, "QMU_GPD size SHOULD be 16B");

 	mtu->qmu_gpd_pool = dma_pool_create("QMU_GPD", mtu->dev,
@@ -629,10 +629,19 @@ int mtu3_qmu_init(struct mtu3 *mtu)
 	if (!mtu->qmu_gpd_pool)
 		return -ENOMEM;

+	/* Let gadget know we can process request larger than max packet */
+	for (i = 1; i < mtu->num_eps; i++)
+		mtu->ep_array[i].ep.can_exceed_maxp = 1;
+
 	return 0;
 }

 void mtu3_qmu_exit(struct mtu3 *mtu)
 {
+	int i;
 	dma_pool_destroy(mtu->qmu_gpd_pool);
+
+	/* Disable large request support */
+	for (i = 1; i < mtu->num_eps; i++)
+		mtu->ep_array[i].ep.can_exceed_maxp = 0;
 }
diff --git a/include/linux/usb/gadget.h b/include/linux/usb/gadget.h
index 6a17817..60e0645 100644
--- a/include/linux/usb/gadget.h
+++ b/include/linux/usb/gadget.h
@@ -236,6 +236,7 @@ struct usb_ep {
 	unsigned		max_streams:16;
 	unsigned		mult:2;
 	unsigned		maxburst:5;
+	unsigned		can_exceed_maxp:1;
 	u8			address;
 	const struct usb_endpoint_descriptor	*desc;
 	const struct usb_ss_ep_comp_descriptor	*comp_desc;
--
1.7.9.5


2020-06-16 12:37:47

by Macpaul Lin

Subject: [PATCH 2/2] usb: gadget: u_serial: improve performance for large data

If the hardware (such as a DMA engine) supports USB requests larger than
the maximum packet size, using a larger buffer for Rx/Tx reduces the
number of requests and improves performance.

Signed-off-by: Macpaul Lin <[email protected]>
---
drivers/usb/gadget/function/u_serial.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c
index 3cfc6e2..cdcc070 100644
--- a/drivers/usb/gadget/function/u_serial.c
+++ b/drivers/usb/gadget/function/u_serial.c
@@ -80,6 +80,8 @@
 #define QUEUE_SIZE		16
 #define WRITE_BUF_SIZE		8192		/* TX only */
 #define GS_CONSOLE_BUF_SIZE	8192
+/* for hardware can do more than max packet */
+#define REQ_BUF_SIZE		4096

 /* console info */
 struct gs_console {
@@ -247,7 +249,8 @@ static int gs_start_tx(struct gs_port *port)
 			break;

 		req = list_entry(pool->next, struct usb_request, list);
-		len = gs_send_packet(port, req->buf, in->maxpacket);
+		len = gs_send_packet(port, req->buf, in->can_exceed_maxp ?
+				REQ_BUF_SIZE : in->maxpacket);
 		if (len == 0) {
 			wake_up_interruptible(&port->drain_wait);
 			break;
@@ -514,7 +517,9 @@ static int gs_alloc_requests(struct usb_ep *ep, struct list_head *head,
 	 * be as speedy as we might otherwise be.
 	 */
 	for (i = 0; i < n; i++) {
-		req = gs_alloc_req(ep, ep->maxpacket, GFP_ATOMIC);
+		req = gs_alloc_req(ep, ep->can_exceed_maxp ?
+				REQ_BUF_SIZE : ep->maxpacket,
+				GFP_ATOMIC);
 		if (!req)
 			return list_empty(head) ? -ENOMEM : 0;
 		req->complete = fn;
--
1.7.9.5

2020-06-16 14:06:06

by Alan Stern

Subject: Re: [PATCH 1/2] usb: gadget: introduce flag for large request

On Tue, Jun 16, 2020 at 08:34:43PM +0800, Macpaul Lin wrote:
> Some USB hardware, such as a DMA engine, can split the data of each
> URB request into small packets on its own. For example, the max packet
> size for high speed is 512 bytes. Such hardware can split continuous
> Tx/Rx data requests into packets of exactly the max packet size during
> transmission. Hence upper-layer software can save the effort of
> queueing many small requests back and forth for larger data.
>
> Here we introduce a "can_exceed_maxp" flag in gadget, set when the
> hardware supports these operations.

This isn't needed. All UDC drivers must be able to support requests that
are larger than the maxpacket size.

Alan Stern
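
As a rough illustration of why larger requests help, the following user-space sketch counts how many packets one request is split into and how many requests are needed to drain a buffer. The helper names are hypothetical and are not kernel APIs; the sizes in the note below (512-byte high-speed max packet, 4096-byte REQ_BUF_SIZE, 8192-byte WRITE_BUF_SIZE) come from the patches in this thread. Zero-length packets and short-packet termination are ignored.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of bus packets needed to carry `length`
 * bytes when each packet holds at most `maxpacket` bytes.
 * Plain ceiling division; zero-length packets are not modelled.
 */
static size_t packets_per_request(size_t length, size_t maxpacket)
{
	return (length + maxpacket - 1) / maxpacket;
}

/*
 * Hypothetical helper: number of requests needed to move `total` bytes
 * when each request carries at most `req_size` bytes.
 */
static size_t requests_needed(size_t total, size_t req_size)
{
	return (total + req_size - 1) / req_size;
}
```

With these numbers, packets_per_request(4096, 512) is 8, and draining an 8192-byte write buffer takes requests_needed(8192, 4096) = 2 requests instead of requests_needed(8192, 512) = 16.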

2020-06-17 02:32:15

by Macpaul Lin

Subject: Re: [PATCH 1/2] usb: gadget: introduce flag for large request

Alan Stern <[email protected]> wrote on Tue, Jun 16, 2020 at 10:05 PM:
>
> On Tue, Jun 16, 2020 at 08:34:43PM +0800, Macpaul Lin wrote:
> > Some USB hardware, such as a DMA engine, can split the data of each
> > URB request into small packets on its own. For example, the max packet
> > size for high speed is 512 bytes. Such hardware can split continuous
> > Tx/Rx data requests into packets of exactly the max packet size during
> > transmission. Hence upper-layer software can save the effort of
> > queueing many small requests back and forth for larger data.
> >
> > Here we introduce a "can_exceed_maxp" flag in gadget, set when the
> > hardware supports these operations.
>
> This isn't needed. All UDC drivers must be able to support requests that
> are larger than the maxpacket size.
>
> Alan Stern

Thanks for your reply. Could we just modify patch 2 (u_serial.c) to
improve performance? I'm not sure why there was a restriction to the
max packet size. Is there any historical reason?

--
Best regards,
Macpaul Lin

2020-06-17 02:49:03

by Macpaul Lin

Subject: [PATCH v2] usb: gadget: u_serial: improve performance for large data

Nowadays some embedded systems use VCOM to transfer large logs and data.
Take an LTE modem as an example: during the long debugging stage, large
logs and data are transferred through VCOM during field trials or in the
operator's lab. Here we suggest slightly increasing the transfer buffer
in u_serial.c to improve performance.

Signed-off-by: Macpaul Lin <[email protected]>
---
drivers/usb/gadget/function/u_serial.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c
index 3cfc6e2..d7912a9 100644
--- a/drivers/usb/gadget/function/u_serial.c
+++ b/drivers/usb/gadget/function/u_serial.c
@@ -80,6 +80,7 @@
 #define QUEUE_SIZE		16
 #define WRITE_BUF_SIZE		8192		/* TX only */
 #define GS_CONSOLE_BUF_SIZE	8192
+#define REQ_BUF_SIZE		4096

 /* console info */
 struct gs_console {
@@ -247,7 +248,7 @@ static int gs_start_tx(struct gs_port *port)
 			break;

 		req = list_entry(pool->next, struct usb_request, list);
-		len = gs_send_packet(port, req->buf, in->maxpacket);
+		len = gs_send_packet(port, req->buf, REQ_BUF_SIZE);
 		if (len == 0) {
 			wake_up_interruptible(&port->drain_wait);
 			break;
@@ -514,7 +515,7 @@ static int gs_alloc_requests(struct usb_ep *ep, struct list_head *head,
 	 * be as speedy as we might otherwise be.
 	 */
 	for (i = 0; i < n; i++) {
-		req = gs_alloc_req(ep, ep->maxpacket, GFP_ATOMIC);
+		req = gs_alloc_req(ep, REQ_BUF_SIZE, GFP_ATOMIC);
 		if (!req)
 			return list_empty(head) ? -ENOMEM : 0;
 		req->complete = fn;
--
1.7.9.5

2020-06-17 05:16:39

by Greg Kroah-Hartman

Subject: Re: [PATCH v2] usb: gadget: u_serial: improve performance for large data

On Wed, Jun 17, 2020 at 10:46:47AM +0800, Macpaul Lin wrote:
> Nowadays some embedded systems use VCOM to transfer large logs and data.
> Take an LTE modem as an example: during the long debugging stage, large
> logs and data are transferred through VCOM during field trials or in the
> operator's lab. Here we suggest slightly increasing the transfer buffer
> in u_serial.c to improve performance.
>
> Signed-off-by: Macpaul Lin <[email protected]>
> ---
> drivers/usb/gadget/function/u_serial.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)

What changed from v1? Always put that below the --- line as the
documentation asks for.

v3?

2020-06-17 05:36:36

by Macpaul Lin

Subject: Re: [PATCH v2] usb: gadget: u_serial: improve performance for large data

On Wed, 2020-06-17 at 07:14 +0200, Greg Kroah-Hartman wrote:
> On Wed, Jun 17, 2020 at 10:46:47AM +0800, Macpaul Lin wrote:
> > Nowadays some embedded systems use VCOM to transfer large logs and data.
> > Take an LTE modem as an example: during the long debugging stage, large
> > logs and data are transferred through VCOM during field trials or in the
> > operator's lab. Here we suggest slightly increasing the transfer buffer
> > in u_serial.c to improve performance.
> >
> > Signed-off-by: Macpaul Lin <[email protected]>
> > ---
> > drivers/usb/gadget/function/u_serial.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
>
> What changed from v1? Always put that below the --- line as the
> documentation asks for.
>
> v3?
Sorry, I just forgot to add the change log. I'll send v3 later.

Thanks!

BR,
Macpaul Lin

2020-06-17 05:45:24

by Macpaul Lin

Subject: [PATCH v3] usb: gadget: u_serial: improve performance for large data

Nowadays some embedded systems use VCOM to transfer large logs and data.
Take an LTE modem as an example: during the long debugging stage, large
logs and data are transferred through VCOM during field trials or in the
operator's lab. Here we suggest slightly increasing the transfer buffer
in u_serial.c to improve performance.

Signed-off-by: Macpaul Lin <[email protected]>
---
Changes for v2:
  - Drop the previous patch that added a hardware capability flag to
    gadget.h and the DMA engine, following Alan's suggestion. Thanks.
  - Use the buffer size "REQ_BUF_SIZE" unconditionally instead of
    checking hardware capability.
  - Refine commit messages.
Changes for v3:
  - Code: no change.
  - Commit: Add the change log that was missing in v2.

drivers/usb/gadget/function/u_serial.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c
index 3cfc6e2..d7912a9 100644
--- a/drivers/usb/gadget/function/u_serial.c
+++ b/drivers/usb/gadget/function/u_serial.c
@@ -80,6 +80,7 @@
 #define QUEUE_SIZE		16
 #define WRITE_BUF_SIZE		8192		/* TX only */
 #define GS_CONSOLE_BUF_SIZE	8192
+#define REQ_BUF_SIZE		4096

 /* console info */
 struct gs_console {
@@ -247,7 +248,7 @@ static int gs_start_tx(struct gs_port *port)
 			break;

 		req = list_entry(pool->next, struct usb_request, list);
-		len = gs_send_packet(port, req->buf, in->maxpacket);
+		len = gs_send_packet(port, req->buf, REQ_BUF_SIZE);
 		if (len == 0) {
 			wake_up_interruptible(&port->drain_wait);
 			break;
@@ -514,7 +515,7 @@ static int gs_alloc_requests(struct usb_ep *ep, struct list_head *head,
 	 * be as speedy as we might otherwise be.
 	 */
 	for (i = 0; i < n; i++) {
-		req = gs_alloc_req(ep, ep->maxpacket, GFP_ATOMIC);
+		req = gs_alloc_req(ep, REQ_BUF_SIZE, GFP_ATOMIC);
 		if (!req)
 			return list_empty(head) ? -ENOMEM : 0;
 		req->complete = fn;
--
1.7.9.5

2020-06-24 06:48:33

by Felipe Balbi

Subject: Re: [PATCH 1/2] usb: gadget: introduce flag for large request


Hi,

Macpaul Lin <[email protected]> writes:
> Some USB hardware, such as a DMA engine, can split the data of each
> URB request into small packets on its own. For example, the max packet
> size for high speed is 512 bytes. Such hardware can split continuous
> Tx/Rx data requests into packets of exactly the max packet size during
> transmission. Hence upper-layer software can save the effort of
> queueing many small requests back and forth for larger data.
>
> Here we introduce a "can_exceed_maxp" flag in gadget, set when the
> hardware supports these operations.
>
> Signed-off-by: Macpaul Lin <[email protected]>
> ---
> drivers/usb/mtu3/mtu3_qmu.c | 11 ++++++++++-
> include/linux/usb/gadget.h | 1 +
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/mtu3/mtu3_qmu.c b/drivers/usb/mtu3/mtu3_qmu.c
> index 3f414f9..2b51a20 100644
> --- a/drivers/usb/mtu3/mtu3_qmu.c
> +++ b/drivers/usb/mtu3/mtu3_qmu.c
> @@ -620,7 +620,7 @@ irqreturn_t mtu3_qmu_isr(struct mtu3 *mtu)
>
> int mtu3_qmu_init(struct mtu3 *mtu)
> {
> -
> + int i;
> compiletime_assert(QMU_GPD_SIZE == 16, "QMU_GPD size SHOULD be 16B");
>
> mtu->qmu_gpd_pool = dma_pool_create("QMU_GPD", mtu->dev,
> @@ -629,10 +629,19 @@ int mtu3_qmu_init(struct mtu3 *mtu)
> if (!mtu->qmu_gpd_pool)
> return -ENOMEM;
>
> + /* Let gadget know we can process request larger than max packet */
> + for (i = 1; i < mtu->num_eps; i++)
> + mtu->ep_array[i].ep.can_exceed_maxp = 1;
> +
> return 0;
> }
>
> void mtu3_qmu_exit(struct mtu3 *mtu)
> {
> + int i;
> dma_pool_destroy(mtu->qmu_gpd_pool);
> +
> + /* Disable large request support */
> + for (i = 1; i < mtu->num_eps; i++)
> + mtu->ep_array[i].ep.can_exceed_maxp = 0;
> }
> diff --git a/include/linux/usb/gadget.h b/include/linux/usb/gadget.h
> index 6a17817..60e0645 100644
> --- a/include/linux/usb/gadget.h
> +++ b/include/linux/usb/gadget.h
> @@ -236,6 +236,7 @@ struct usb_ep {
> unsigned max_streams:16;
> unsigned mult:2;
> unsigned maxburst:5;
> + unsigned can_exceed_maxp:1;

every driver does this without this flag. This is unnecessary.

--
balbi



2020-06-24 06:49:32

by Felipe Balbi

Subject: Re: [PATCH 2/2] usb: gadget: u_serial: improve performance for large data


Hi,

Macpaul Lin <[email protected]> writes:

> If the hardware (such as a DMA engine) supports USB requests larger than
> the maximum packet size, using a larger buffer for Rx/Tx reduces the
> number of requests and improves performance.
>
> Signed-off-by: Macpaul Lin <[email protected]>
> ---
> drivers/usb/gadget/function/u_serial.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c
> index 3cfc6e2..cdcc070 100644
> --- a/drivers/usb/gadget/function/u_serial.c
> +++ b/drivers/usb/gadget/function/u_serial.c
> @@ -80,6 +80,8 @@
> #define QUEUE_SIZE 16
> #define WRITE_BUF_SIZE 8192 /* TX only */
> #define GS_CONSOLE_BUF_SIZE 8192
> +/* for hardware can do more than max packet */
> +#define REQ_BUF_SIZE 4096
>
> /* console info */
> struct gs_console {
> @@ -247,7 +249,8 @@ static int gs_start_tx(struct gs_port *port)
> break;
>
> req = list_entry(pool->next, struct usb_request, list);
> - len = gs_send_packet(port, req->buf, in->maxpacket);
> + len = gs_send_packet(port, req->buf, in->can_exceed_maxp ?
> + REQ_BUF_SIZE : in->maxpacket);

just do this unconditionally.

> if (len == 0) {
> wake_up_interruptible(&port->drain_wait);
> break;
> @@ -514,7 +517,9 @@ static int gs_alloc_requests(struct usb_ep *ep, struct list_head *head,
> * be as speedy as we might otherwise be.
> */
> for (i = 0; i < n; i++) {
> - req = gs_alloc_req(ep, ep->maxpacket, GFP_ATOMIC);
> + req = gs_alloc_req(ep, ep->can_exceed_maxp ?
> + REQ_BUF_SIZE : ep->maxpacket,
> + GFP_ATOMIC);

allocating 4kiB in atomic isn't very good. A better idea would be to
preallocate a list of requests and recycle them, rather than allocating
every time you need to do a transfer
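
A minimal user-space sketch of that preallocate-and-recycle idea follows. It is not the u_serial code: struct demo_req and the demo_pool_* helpers are hypothetical stand-ins for struct usb_request and its list handling, and malloc() stands in for an allocation done once, up front, in a context that may sleep. The transfer path then only takes from and returns to the free list and never allocates.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct usb_request: a buffer plus a list link. */
struct demo_req {
	void *buf;
	size_t size;
	struct demo_req *next;
};

/* Free list of preallocated requests. */
struct demo_pool {
	struct demo_req *free;
};

/* Allocate up to n requests once, up front; returns how many were created. */
static int demo_pool_init(struct demo_pool *pool, int n, size_t size)
{
	int i;

	pool->free = NULL;
	for (i = 0; i < n; i++) {
		struct demo_req *req = malloc(sizeof(*req));

		if (!req)
			break;
		req->buf = malloc(size);
		if (!req->buf) {
			free(req);
			break;
		}
		req->size = size;
		req->next = pool->free;
		pool->free = req;
	}
	return i;
}

/* Take a request for a transfer; NULL means all requests are in flight. */
static struct demo_req *demo_pool_get(struct demo_pool *pool)
{
	struct demo_req *req = pool->free;

	if (req) {
		pool->free = req->next;
		req->next = NULL;
	}
	return req;
}

/* Recycle a request once its transfer has completed. */
static void demo_pool_put(struct demo_pool *pool, struct demo_req *req)
{
	req->next = pool->free;
	pool->free = req;
}

/* Release every request that is currently on the free list. */
static void demo_pool_destroy(struct demo_pool *pool)
{
	while (pool->free) {
		struct demo_req *req = pool->free;

		pool->free = req->next;
		free(req->buf);
		free(req);
	}
}
```

In this sketch a completion handler would call demo_pool_put() and the transmit path would call demo_pool_get(), so a 4 KiB buffer is only ever allocated in demo_pool_init().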

--
balbi



2020-06-24 06:52:04

by Felipe Balbi

Subject: Re: [PATCH v3] usb: gadget: u_serial: improve performance for large data

Hi,

Macpaul Lin <[email protected]> writes:
> Nowadays some embedded systems use VCOM to transfer large logs and data.
> Take an LTE modem as an example: during the long debugging stage, large
> logs and data are transferred through VCOM during field trials or in the
> operator's lab. Here we suggest slightly increasing the transfer buffer
> in u_serial.c to improve performance.
>
> Signed-off-by: Macpaul Lin <[email protected]>
> ---
> Changes for v2:
>   - Drop the previous patch that added a hardware capability flag to
>     gadget.h and the DMA engine, following Alan's suggestion. Thanks.
>   - Use the buffer size "REQ_BUF_SIZE" unconditionally instead of
>     checking hardware capability.
>   - Refine commit messages.
> Changes for v3:
>   - Code: no change.
>   - Commit: Add the change log that was missing in v2.
>
> drivers/usb/gadget/function/u_serial.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c
> index 3cfc6e2..d7912a9 100644
> --- a/drivers/usb/gadget/function/u_serial.c
> +++ b/drivers/usb/gadget/function/u_serial.c
> @@ -80,6 +80,7 @@
> #define QUEUE_SIZE 16
> #define WRITE_BUF_SIZE 8192 /* TX only */
> #define GS_CONSOLE_BUF_SIZE 8192
> +#define REQ_BUF_SIZE 4096
>
> /* console info */
> struct gs_console {
> @@ -247,7 +248,7 @@ static int gs_start_tx(struct gs_port *port)
> break;
>
> req = list_entry(pool->next, struct usb_request, list);
> - len = gs_send_packet(port, req->buf, in->maxpacket);
> + len = gs_send_packet(port, req->buf, REQ_BUF_SIZE);
> if (len == 0) {
> wake_up_interruptible(&port->drain_wait);
> break;
> @@ -514,7 +515,7 @@ static int gs_alloc_requests(struct usb_ep *ep, struct list_head *head,
> * be as speedy as we might otherwise be.
> */
> for (i = 0; i < n; i++) {
> - req = gs_alloc_req(ep, ep->maxpacket, GFP_ATOMIC);
> + req = gs_alloc_req(ep, REQ_BUF_SIZE, GFP_ATOMIC);

since this can only be applied for the next merge window, it would be
much better if you rework how requests are used here and, as I
mentioned in the other subthread, preallocate a list of requests that
get recycled. This would allow us to allocate memory without GFP_ATOMIC.

--
balbi

