2020-05-21 08:38:16

by Wesley Cheng

Subject: [PATCH v2 0/3] Re-introduce TX FIFO resize for larger EP bursting

Changes in V2:
- Modified TXFIFO resizing logic to ensure that each EP is reserved a
FIFO.
- Removed dev_dbg() prints and fixed typos from patches
- Added some more description on the dt-bindings commit message

Reviewed-by: Felipe Balbi <[email protected]>
Reviewed-by: Rob Herring <[email protected]>

Currently, there is no functionality to resize the TXFIFOs, so the driver
relies on the HW default setting for the TXFIFO depth. In most cases, the
HW default is probably sufficient, but for USB compositions that contain
multiple functions that require EP bursting, the default settings
might not be enough. Also note that the current SW will assign an EP to a
function driver w/o checking whether the TXFIFO size for that particular
EP is large enough. (This is a problem if there are multiple HW defined
values for the TXFIFO size.)

The SNPS databook mentions that a minimum TX FIFO depth of 3 is required
for an EP that supports bursting. Otherwise, bursts may end frequently.
For high bandwidth functions, such as data tethering (protocols that
support data aggregation), mass storage, and media transfer protocol (over
FFS), the bMaxBurst value can be large, and a bigger TXFIFO depth may
improve USB throughput (which can depend on system access latency, etc.).
It allows for a more consistent burst of traffic, w/o any interruptions,
as data is readily available in the FIFO.

Testing with the mass storage function driver shows that a larger TXFIFO
depth increases the bandwidth significantly: the average sequential read
throughput rose from about 192 MB/s (depth 3) to about 294 MB/s (depth 6).

Test Parameters:
- Platform: Qualcomm SM8150
- bMaxBurst = 6
- USB req size = 256kB
- Num of USB reqs = 16
- USB Speed = Super-Speed
- Function Driver: Mass Storage (w/ ramdisk)
- Test Application: CrystalDiskMark

Results:

TXFIFO Depth = 3 max packets

Test Case  | Data Size | AVG tput (in MB/s)
-------------------------------------------
Sequential | 1 GB x    |
Read       | 9 loops   | 193.60
           |           | 195.86
           |           | 184.77
           |           | 193.60
-------------------------------------------

TXFIFO Depth = 6 max packets

Test Case  | Data Size | AVG tput (in MB/s)
-------------------------------------------
Sequential | 1 GB x    |
Read       | 9 loops   | 287.35
           |           | 304.94
           |           | 289.64
           |           | 293.61
-------------------------------------------

Wesley Cheng (3):
usb: dwc3: Resize TX FIFOs to meet EP bursting requirements
arm64: boot: dts: qcom: sm8150: Enable dynamic TX FIFO resize logic
dt-bindings: usb: dwc3: Add entry for tx-fifo-resize

Documentation/devicetree/bindings/usb/dwc3.txt | 2 +-
arch/arm64/boot/dts/qcom/sm8150.dtsi | 1 +
drivers/usb/dwc3/core.c | 2 +
drivers/usb/dwc3/core.h | 8 ++
drivers/usb/dwc3/ep0.c | 37 ++++++++-
drivers/usb/dwc3/gadget.c | 111 +++++++++++++++++++++++++
6 files changed, 159 insertions(+), 2 deletions(-)

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
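
For readers who want to check the size of the gain, the two result tables in
this cover letter can be reduced to a single number. The following is only a
sketch: the helper names (mean_tput, gain_percent) and the array names are
made up for illustration and are not part of the series; the samples are
copied from the tables.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n throughput samples, in MB/s. */
static double mean_tput(const double *samples, size_t n)
{
	double sum = 0.0;

	assert(samples != NULL && n > 0);
	for (size_t i = 0; i < n; i++)
		sum += samples[i];
	return sum / (double)n;
}

/* Relative gain of "after" over "before", in percent. */
static double gain_percent(double before, double after)
{
	assert(before > 0.0);
	return (after / before - 1.0) * 100.0;
}

/* Sequential-read samples copied from the two result tables (MB/s). */
static const double tput_depth3[] = { 193.60, 195.86, 184.77, 193.60 };
static const double tput_depth6[] = { 287.35, 304.94, 289.64, 293.61 };
```

Calling gain_percent(mean_tput(tput_depth3, 4), mean_tput(tput_depth6, 4))
gives a gain of a little over 50% for the depth 6 configuration on this one
platform and workload; other setups may behave differently.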


2020-05-21 08:38:45

by Wesley Cheng

Subject: [PATCH v2 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements

Some devices have USB compositions which may require multiple endpoints
that support EP bursting. HW defined TX FIFO sizes may not always be
sufficient for these compositions. Flexible TX FIFO allocation allows
endpoints to request the FIFO depth required to achieve higher bandwidth.
With some higher bMaxBurst configurations, using a larger TX FIFO size
results in better TX throughput.

Ensure that one TX FIFO is reserved for every IN endpoint. This prevents
the FIFO logic from running out of FIFO space.

Signed-off-by: Wesley Cheng <[email protected]>
Reviewed-by: Felipe Balbi <[email protected]>
---
drivers/usb/dwc3/core.c | 2 +
drivers/usb/dwc3/core.h | 8 ++++
drivers/usb/dwc3/ep0.c | 37 +++++++++++++++-
drivers/usb/dwc3/gadget.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 157 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index edc1715..cca5554 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1304,6 +1304,8 @@ static void dwc3_get_properties(struct dwc3 *dwc)
&tx_thr_num_pkt_prd);
device_property_read_u8(dev, "snps,tx-max-burst-prd",
&tx_max_burst_prd);
+ dwc->needs_fifo_resize = device_property_read_bool(dev,
+ "tx-fifo-resize");

dwc->disable_scramble_quirk = device_property_read_bool(dev,
"snps,disable_scramble_quirk");
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 4c171a8..ce0bf28 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -675,6 +675,7 @@ struct dwc3_event_buffer {
* isochronous START TRANSFER command failure workaround
* @start_cmd_status: the status of testing START TRANSFER command with
* combo_num = 'b00
+ * @fifo_depth: allocated TXFIFO depth
*/
struct dwc3_ep {
struct usb_ep endpoint;
@@ -727,6 +728,7 @@ struct dwc3_ep {
/* For isochronous START TRANSFER workaround only */
u8 combo_num;
int start_cmd_status;
+ int fifo_depth;
};

enum dwc3_phy {
@@ -1004,6 +1006,7 @@ struct dwc3_scratchpad_array {
* 1 - utmi_l1_suspend_n
* @is_fpga: true when we are using the FPGA board
* @pending_events: true when we have pending IRQs to be handled
+ * @needs_fifo_resize: not all users might want fifo resizing, flag it
* @pullups_connected: true when Run/Stop bit is set
* @setup_packet_pending: true when there's a Setup Packet in FIFO. Workaround
* @three_stage_setup: set if we perform a three phase setup
@@ -1044,6 +1047,8 @@ struct dwc3_scratchpad_array {
* @dis_metastability_quirk: set to disable metastability quirk.
* @imod_interval: set the interrupt moderation interval in 250ns
* increments or 0 to disable.
+ * @last_fifo_depth: total TXFIFO depth of all enabled USB IN/INT endpoints
+ * @num_ep_resized: the number of TX FIFOs that have already been resized
*/
struct dwc3 {
struct work_struct drd_work;
@@ -1204,6 +1209,7 @@ struct dwc3 {
unsigned is_utmi_l1_suspend:1;
unsigned is_fpga:1;
unsigned pending_events:1;
+ unsigned needs_fifo_resize:1;
unsigned pullups_connected:1;
unsigned setup_packet_pending:1;
unsigned three_stage_setup:1;
@@ -1236,6 +1242,8 @@ struct dwc3 {
unsigned dis_metastability_quirk:1;

u16 imod_interval;
+ int last_fifo_depth;
+ int num_ep_resized;
};

#define INCRX_BURST_MODE 0
diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index 6dee4da..76db9b5 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -601,8 +601,9 @@ static int dwc3_ep0_set_config(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
{
enum usb_device_state state = dwc->gadget.state;
u32 cfg;
- int ret;
+ int ret, num, size;
u32 reg;
+ struct dwc3_ep *dep;

cfg = le16_to_cpu(ctrl->wValue);

@@ -611,6 +612,40 @@ static int dwc3_ep0_set_config(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
return -EINVAL;

case USB_STATE_ADDRESS:
+ /*
+ * If tx-fifo-resize flag is not set for the controller, then
+ * do not clear existing allocated TXFIFO since we do not
+ * allocate it again in dwc3_gadget_resize_tx_fifos
+ */
+ if (dwc->needs_fifo_resize) {
+ /* Read ep0IN related TXFIFO size */
+ dep = dwc->eps[1];
+ size = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0));
+ if (dwc3_is_usb31(dwc))
+ dep->fifo_depth = DWC31_GTXFIFOSIZ_TXFDEP(size);
+ else
+ dep->fifo_depth = DWC3_GTXFIFOSIZ_TXFDEP(size);
+
+ dwc->last_fifo_depth = dep->fifo_depth;
+ /* Clear existing TXFIFO for all IN eps except ep0 */
+ for (num = 3; num < min_t(int, dwc->num_eps,
+ DWC3_ENDPOINTS_NUM); num += 2) {
+ dep = dwc->eps[num];
+ /* Don't change TXFRAMNUM on usb31 version */
+ size = dwc3_is_usb31(dwc) ?
+ dwc3_readl(dwc->regs,
+ DWC3_GTXFIFOSIZ(num >> 1)) &
+ DWC31_GTXFIFOSIZ_TXFRAMNUM :
+ 0;
+
+ dwc3_writel(dwc->regs,
+ DWC3_GTXFIFOSIZ(num >> 1),
+ size);
+ dep->fifo_depth = 0;
+ }
+ dwc->num_ep_resized = 0;
+ }
+
ret = dwc3_ep0_delegate_req(dwc, ctrl);
/* if the cfg matches and the cfg is non zero */
if (cfg && (!ret || (ret == USB_GADGET_DELAYED_STATUS))) {
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 00746c2..65bcc0f 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -540,6 +540,113 @@ static int dwc3_gadget_start_config(struct dwc3_ep *dep)
return 0;
}

+/*
+ * dwc3_gadget_resize_tx_fifos - reallocate fifo spaces for current use-case
+ * @dwc: pointer to our context structure
+ *
+ * This function will do a best effort FIFO allocation in order
+ * to improve FIFO usage and throughput, while still allowing
+ * us to enable as many endpoints as possible.
+ *
+ * Keep in mind that this operation will be highly dependent
+ * on the configured size for RAM1 - which contains TxFifo -,
+ * the amount of endpoints enabled on coreConsultant tool, and
+ * the width of the Master Bus.
+ *
+ * In general, FIFO depths are represented with the following equation:
+ *
+ * fifo_size = mult * ((max_packet + mdwidth)/mdwidth + 1) + 1
+ *
+ * Conversions can be done to the equation to derive the number of packets that
+ * will fit to a particular FIFO size value.
+ */
+static int dwc3_gadget_resize_tx_fifos(struct dwc3 *dwc, struct dwc3_ep *dep)
+{
+ int ram1_depth, mdwidth, fifo_0_start, tmp, num_in_ep;
+ int min_depth, remaining, fifo_size, mult = 1, fifo, max_packet = 1024;
+
+ if (!dwc->needs_fifo_resize)
+ return 0;
+
+ /* resize IN endpoints except ep0 */
+ if (!usb_endpoint_dir_in(dep->endpoint.desc) || dep->number <= 1)
+ return 0;
+
+ /* Don't resize already resized IN endpoint */
+ if (dep->fifo_depth)
+ return 0;
+
+ ram1_depth = DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7);
+ mdwidth = DWC3_MDWIDTH(dwc->hwparams.hwparams0);
+ /* MDWIDTH is represented in bits, we need it in bytes */
+ mdwidth >>= 3;
+
+ if (((dep->endpoint.maxburst > 1) &&
+ usb_endpoint_xfer_bulk(dep->endpoint.desc))
+ || usb_endpoint_xfer_isoc(dep->endpoint.desc))
+ mult = 3;
+
+ if ((dep->endpoint.maxburst > 6) &&
+ usb_endpoint_xfer_bulk(dep->endpoint.desc)
+ && dwc3_is_usb31(dwc))
+ mult = 6;
+
+ /* FIFO size for a single buffer */
+ fifo = (max_packet + mdwidth)/mdwidth;
+ fifo++;
+
+ /* Calculate the number of remaining EPs w/o any FIFO */
+ num_in_ep = dwc->num_eps/2;
+ num_in_ep -= dwc->num_ep_resized;
+ /* Ignore EP0 IN */
+ num_in_ep--;
+
+ /* Reserve at least one FIFO for the number of IN EPs */
+ min_depth = num_in_ep * (fifo+1);
+ remaining = ram1_depth - min_depth - dwc->last_fifo_depth;
+
+ /* We've already reserved 1 FIFO per EP, so check what we can fit in
+ * addition to it. If there is not enough remaining space, allocate
+ * all the remaining space to the EP.
+ */
+ fifo_size = (mult-1) * fifo;
+ if (remaining < fifo_size && remaining > 0)
+ fifo_size = remaining;
+
+ fifo_size += fifo;
+ fifo_size++;
+ dep->fifo_depth = fifo_size;
+
+ /* Check if TXFIFOs start at non-zero addr */
+ tmp = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0));
+ fifo_0_start = DWC3_GTXFIFOSIZ_TXFSTADDR(tmp);
+
+ fifo_size |= (fifo_0_start + (dwc->last_fifo_depth << 16));
+ if (dwc3_is_usb31(dwc))
+ dwc->last_fifo_depth += DWC31_GTXFIFOSIZ_TXFDEP(fifo_size);
+ else
+ dwc->last_fifo_depth += DWC3_GTXFIFOSIZ_TXFDEP(fifo_size);
+
+ /* Check fifo size allocation doesn't exceed available RAM size. */
+ if (dwc->last_fifo_depth >= ram1_depth) {
+ dev_err(dwc->dev, "Fifosize(%d) > RAM size(%d) %s depth:%d\n",
+ (dwc->last_fifo_depth * mdwidth), ram1_depth,
+ dep->endpoint.name, fifo_size);
+ if (dwc3_is_usb31(dwc))
+ fifo_size = DWC31_GTXFIFOSIZ_TXFDEP(fifo_size);
+ else
+ fifo_size = DWC3_GTXFIFOSIZ_TXFDEP(fifo_size);
+ dwc->last_fifo_depth -= fifo_size;
+ dep->fifo_depth = 0;
+ WARN_ON(1);
+ return -ENOMEM;
+ }
+
+ dwc3_writel(dwc->regs, DWC3_GTXFIFOSIZ(dep->number >> 1), fifo_size);
+ dwc->num_ep_resized++;
+ return 0;
+}
+
static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
{
const struct usb_ss_ep_comp_descriptor *comp_desc;
@@ -620,6 +727,10 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action)
int ret;

if (!(dep->flags & DWC3_EP_ENABLED)) {
+ ret = dwc3_gadget_resize_tx_fifos(dwc, dep);
+ if (ret)
+ return ret;
+
ret = dwc3_gadget_start_config(dep);
if (ret)
return ret;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
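
The depth arithmetic in dwc3_gadget_resize_tx_fifos() can be tried out in
userspace. The following is a rough model, not driver code: struct
fifo_model, fifo_words_per_packet() and fifo_model_alloc() are hypothetical
names, the register accesses are left out, and the caller passes the number
of IN endpoints directly instead of deriving it from dwc->num_eps. It keeps
the same steps as the patch: reserve one buffer per pending IN endpoint,
clamp the extra buffers to the remaining space, and reject an allocation
with the same ">=" comparison against the RAM1 depth.

```c
#include <assert.h>

#define MAX_PACKET 1024	/* max packet size used by the patch */

/* Depth, in MDWIDTH-sized words, of a single max-packet buffer. */
static int fifo_words_per_packet(int mdwidth_bytes)
{
	assert(mdwidth_bytes > 0);
	return (MAX_PACKET + mdwidth_bytes) / mdwidth_bytes + 1;
}

/* Hypothetical stand-in for the bookkeeping the patch adds to struct dwc3. */
struct fifo_model {
	int ram1_depth;      /* total RAM1 depth, in words */
	int mdwidth_bytes;   /* master bus width, in bytes */
	int num_in_eps;      /* IN endpoints, including ep0 IN */
	int last_fifo_depth; /* words already handed out (starts at ep0 IN depth) */
	int num_ep_resized;  /* endpoints resized so far */
};

/*
 * Grant a FIFO to the next IN endpoint that asks for "mult" buffers.
 * Returns the granted depth in words, or -1 if it does not fit.
 */
static int fifo_model_alloc(struct fifo_model *m, int mult)
{
	int fifo = fifo_words_per_packet(m->mdwidth_bytes);
	/* Endpoints still without a FIFO, ignoring ep0 IN. */
	int pending = m->num_in_eps - m->num_ep_resized - 1;
	int min_depth = pending * (fifo + 1);
	int remaining = m->ram1_depth - min_depth - m->last_fifo_depth;
	int extra = (mult - 1) * fifo;
	int depth;

	/* Not enough room for all extra buffers: take what is left. */
	if (remaining < extra && remaining > 0)
		extra = remaining;
	depth = extra + fifo + 1;

	/* Same comparison as the patch: ending exactly at RAM1 depth fails. */
	if (m->last_fifo_depth + depth >= m->ram1_depth)
		return -1;

	m->last_fifo_depth += depth;
	m->num_ep_resized++;
	return depth;
}
```

With an assumed 8-byte bus, one buffer is 130 words, so mult = 3 gives
3 * 130 + 1 = 391 words and mult = 6 gives 781 words, matching the equation
in the function comment. The RAM1 depth and ep0 IN depth fed into the model
are up to the caller; real values come from GHWPARAMS7 and GTXFIFOSIZ(0).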

2020-05-21 08:38:52

by Wesley Cheng

Subject: [PATCH v2 3/3] dt-bindings: usb: dwc3: Add entry for tx-fifo-resize

Re-introduce the comment for the tx-fifo-resize setting for the DWC3
controller. This allows vendors to control whether they require the TX FIFO
resizing logic on their HW, as the default FIFO size configurations may
already be sufficient.

Signed-off-by: Wesley Cheng <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
---
Documentation/devicetree/bindings/usb/dwc3.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt b/Documentation/devicetree/bindings/usb/dwc3.txt
index 9946ff9..489f5da 100644
--- a/Documentation/devicetree/bindings/usb/dwc3.txt
+++ b/Documentation/devicetree/bindings/usb/dwc3.txt
@@ -105,7 +105,7 @@ Optional properties:
1-16 (DWC_usb31 programming guide section 1.2.3) to
enable periodic ESS TX threshold.

- - <DEPRECATED> tx-fifo-resize: determines if the FIFO *has* to be reallocated.
+ - tx-fifo-resize: determines if the FIFO *has* to be reallocated.
- snps,incr-burst-type-adjustment: Value for INCR burst type of GSBUSCFG0
register, undefined length INCR burst type enable and INCRx type.
When just one value, which means INCRX burst mode enabled. When
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

2020-05-21 17:39:46

by Wesley Cheng

Subject: Re: [PATCH v2 0/3] Re-introduce TX FIFO resize for larger EP bursting



On 5/21/2020 1:36 AM, Wesley Cheng wrote:
> Changes in V2:
> - Modified TXFIFO resizing logic to ensure that each EP is reserved a
> FIFO.
> - Removed dev_dbg() prints and fixed typos from patches
> - Added some more description on the dt-bindings commit message
>


> Reviewed-by: Felipe Balbi <[email protected]>
> Reviewed-by: Rob Herring <[email protected]>
>

Sorry, please disregard the Reviewed-by tags in the patches. I added
those mistakenly.



2020-05-22 09:56:57

by Felipe Balbi

Subject: Re: [PATCH v2 0/3] Re-introduce TX FIFO resize for larger EP bursting

Wesley Cheng <[email protected]> writes:

> Changes in V2:
> - Modified TXFIFO resizing logic to ensure that each EP is reserved a
> FIFO.
> - Removed dev_dbg() prints and fixed typos from patches
> - Added some more description on the dt-bindings commit message
>
> Reviewed-by: Felipe Balbi <[email protected]>

I don't remember giving you a Reviewed-by, did I?

--
balbi



2020-05-22 17:53:35

by Wesley Cheng

Subject: Re: [PATCH v2 0/3] Re-introduce TX FIFO resize for larger EP bursting



On 5/22/2020 2:54 AM, Felipe Balbi wrote:
> Wesley Cheng <[email protected]> writes:
>
>> Changes in V2:
>> - Modified TXFIFO resizing logic to ensure that each EP is reserved a
>> FIFO.
>> - Removed dev_dbg() prints and fixed typos from patches
>> - Added some more description on the dt-bindings commit message
>>
>> Reviewed-by: Felipe Balbi <[email protected]>
>
> I don't remember giving you a Reviewed-by, did I?
>

Hi Felipe,

Sorry, I added the Reviewed-by tags by mistake. I sent a follow-up email
asking to disregard them. If you need me to resubmit the patch series,
please let me know. Thanks!
