This series now effectively has two purposes:
1. Speed up dwc2 interrupt latency.
2. Start fixing up the microframe scheduler.
...the two things were separate series in the past but they ended up
running into each other, so now they're combined.
To summarize what we have here:
1. usb: dwc2: rockchip: Make the max_transfer_size automatic
No brainer. Can land any time.
2. usb: dwc2: host: Get aligned DMA in a more supported way
Although this touches a lot of code, it's mostly just deleting
stuff. The way this is working is nearly the same as tegra. Biggest
objection I expect is that it has too much duplication with tegra and
musb. I'd personally prefer to land it now and remove duplication
later, but up to others. Speeding up interrupt handler helps with
SOF scheduling, so this is not just a dumb optimization.
3. usb: dwc2: host: Add scheduler tracing
Useful for patches below.
4. usb: dwc2: host: Rewrite the microframe scheduler
Seems hard to believe this would make things worse since the old
scheduler is easy to break. Certainly microframe scheduler isn't
amazing, but small steps, right?
5. usb: dwc2: host: Keep track of and use our scheduled microframe
Needs review, but seems simple to me. Maybe doesn't fix everything,
but fixes some things...
6. usb: dwc2: host: Assume all devices are on one single_tt hub
Questionable, but maybe worth landing it?
7. usb: dwc2: host: Add a delay before releasing periodic bandwidth
Pretty much the same patch I sent before, just rebased.
8. usb: dwc2: host: Giveback URB in tasklet context
Simple and a nice speedup assuming it doesn't break anything. My
belief is that our scheduler is already broken enough that things
aren't made worse by this patch (and lots of things are made better
by speeding up the interrupt handler and not mising SOFs), but
welcome other testing and opinions.
===
Below is discussion of some of the speedup stuff.
===
The dwc2 interrupt handler is quite slow. On rk3288 with a few things
plugged into the ports and with cpufreq locked at 696MHz (to simulate
real world idle system), I can easily observe dwc2_handle_hcd_intr()
taking > 120 us, sometimes > 150 us. Note that SOF interrupts come
every 125 us with high speed USB, so taking > 120 us in the interrupt
handler is a big deal.
The patches here will speed up the interrupt controller significantly.
After this series, I have a hard time seeing the interrupt controller
taking > 20 us and I don't ever see it taking > 30 us ever in my tests
unless I bring the cpufreq back down. With the cpufreq at 126 MHz I can
still see the interrupt handler take > 50 us, so I'm sure we could
improve this further. ...but hey, it's a start.
This series also shows big speed improvements when testing with a USB
Gigabit Ethernet adapter. Previously the tested adapter would top out
at about 15MB/s. After these changes it gets about 23MB/s.
In addition to the speedup, this series also has the advantage of
simplifying dwc2 and making it more like everyone else (introducing the
possibility of future simplifications). Picking this series up will
help your diffstat and likely win you friends. ;)
===
Steps for gathering data with ftrace:
cd /sys/devices/system/cpu/cpu0/cpufreq/
echo userspace > scaling_governor
echo 696000 > scaling_setspeed
cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo "" > trace
echo nop > current_tracer
echo function_graph > current_tracer
echo dwc2_handle_hcd_intr > set_graph_function
echo dwc2_handle_common_intr >> set_graph_function
echo dwc2_handle_hcd_intr > set_ftrace_filter
echo dwc2_handle_common_intr >> set_ftrace_filter
echo funcgraph-abstime > trace_options
echo 70 > tracing_thresh
echo 1 > /sys/kernel/debug/tracing/tracing_on
sleep 2
cat trace
===
NOTE: This series doesn't replace any other patches I've submitted
recently, it merely adds another set of changes that upstream could
benefit from.
Changes in v3:
- scheduler tracing new for v3.
- The uframe scheduler patch is folded into optimization series.
- Optimize uframe scheduler "single uframe" case a little.
- uframe scheduler now atop logging patches.
- uframe scheduler now before delayed bandwidth release patches.
- Add defines like EARLY_FRAME_USEC
- Reorder dwc2_deschedule_periodic() in prep for future patches.
- uframe scheduler now shows real usefulness w/ future patches!
- Keep track and use our uframe new for v3.
- Assuming single_tt is new for v3; not terribly well tested (yet).
- Moved periodic bandwidth release delay patch later in the series.
Changes in v2:
- Add a warn if setup_dma is not aligned (Julius Werner).
- Totally rewrote uframe scheduler again after writing test code.
- uframe scheduler atop delayed bandwidth release patches.
- Periodic bandwidth release delay new for V2
- Commit message now says that URB giveback change needs delay change.
Douglas Anderson (8):
usb: dwc2: rockchip: Make the max_transfer_size automatic
usb: dwc2: host: Get aligned DMA in a more supported way
usb: dwc2: host: Add scheduler tracing
usb: dwc2: host: Rewrite the microframe scheduler
usb: dwc2: host: Keep track of and use our scheduled microframe
usb: dwc2: host: Assume all devices are on one single_tt hub
usb: dwc2: host: Add a delay before releasing periodic bandwidth
usb: dwc2: host: Giveback URB in tasklet context
drivers/usb/dwc2/core.c | 21 +--
drivers/usb/dwc2/core.h | 20 ++-
drivers/usb/dwc2/hcd.c | 177 +++++++++----------
drivers/usb/dwc2/hcd.h | 30 ++--
drivers/usb/dwc2/hcd_intr.c | 73 +-------
drivers/usb/dwc2/hcd_queue.c | 407 ++++++++++++++++++++++++++++++-------------
drivers/usb/dwc2/platform.c | 2 +-
7 files changed, 416 insertions(+), 314 deletions(-)
--
2.6.0.rc2.230.g3dd15c0
Previously we needed to set the max_transfer_size to explicitly be 65535
because the old driver would detect that our hardware could support much
bigger transfers and then would try to do them. This wouldn't work
since the DMA alignment code couldn't support it.
Later in commit e8f8c14d9da7 ("usb: dwc2: clip max_transfer_size to
65535") upstream added support for clipping this automatically. Since
that commit it has been OK to just use "-1" (default), but nobody
bothered to change it.
Let's change it to default now for two reasons:
- It's nice to use autodetected params.
- If we can remove the 65535 limit, we can transfer more!
Signed-off-by: Douglas Anderson <[email protected]>
Tested-by: Heiko Stuebner <[email protected]>
---
Changes in v3: None
Changes in v2: None
drivers/usb/dwc2/platform.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/dwc2/platform.c b/drivers/usb/dwc2/platform.c
index 5859b0fa19ee..f26e0c31c07e 100644
--- a/drivers/usb/dwc2/platform.c
+++ b/drivers/usb/dwc2/platform.c
@@ -95,7 +95,7 @@ static const struct dwc2_core_params params_rk3066 = {
.host_rx_fifo_size = 520, /* 520 DWORDs */
.host_nperio_tx_fifo_size = 128, /* 128 DWORDs */
.host_perio_tx_fifo_size = 256, /* 256 DWORDs */
- .max_transfer_size = 65535,
+ .max_transfer_size = -1,
.max_packet_count = -1,
.host_channels = -1,
.phy_type = -1,
--
2.6.0.rc2.230.g3dd15c0
All other host controllers who want aligned buffers for DMA do it a
certain way. Let's do that too instead of working behind the USB core's
back. This makes our interrupt handler not take forever and also rips
out a lot of code, simplifying things a bunch.
This also has the side effect of removing the 65535 max transfer size
limit.
NOTE: The actual code to allocate the aligned buffers is ripped almost
completely from the tegra EHCI driver. At some point in the future we
may want to add this functionality to the USB core to share more code
everywhere.
Signed-off-by: Douglas Anderson <[email protected]>
Tested-by: Heiko Stuebner <[email protected]>
---
Changes in v3: None
Changes in v2:
- Add a warn if setup_dma is not aligned (Julius Werner).
drivers/usb/dwc2/core.c | 21 +-----
drivers/usb/dwc2/hcd.c | 170 +++++++++++++++++++++----------------------
drivers/usb/dwc2/hcd.h | 10 ---
drivers/usb/dwc2/hcd_intr.c | 65 -----------------
drivers/usb/dwc2/hcd_queue.c | 7 +-
5 files changed, 87 insertions(+), 186 deletions(-)
diff --git a/drivers/usb/dwc2/core.c b/drivers/usb/dwc2/core.c
index ef73e498e98f..7e28cfafcfd8 100644
--- a/drivers/usb/dwc2/core.c
+++ b/drivers/usb/dwc2/core.c
@@ -1830,19 +1830,11 @@ void dwc2_hc_start_transfer(struct dwc2_hsotg *hsotg,
}
if (hsotg->core_params->dma_enable > 0) {
- dma_addr_t dma_addr;
-
- if (chan->align_buf) {
- if (dbg_hc(chan))
- dev_vdbg(hsotg->dev, "align_buf\n");
- dma_addr = chan->align_buf;
- } else {
- dma_addr = chan->xfer_dma;
- }
- dwc2_writel((u32)dma_addr, hsotg->regs + HCDMA(chan->hc_num));
+ dwc2_writel((u32)chan->xfer_dma,
+ hsotg->regs + HCDMA(chan->hc_num));
if (dbg_hc(chan))
dev_vdbg(hsotg->dev, "Wrote %08lx to HCDMA(%d)\n",
- (unsigned long)dma_addr, chan->hc_num);
+ (unsigned long)chan->xfer_dma, chan->hc_num);
}
/* Start the split */
@@ -3137,13 +3129,6 @@ int dwc2_get_hwparams(struct dwc2_hsotg *hsotg)
width = (hwcfg3 & GHWCFG3_XFER_SIZE_CNTR_WIDTH_MASK) >>
GHWCFG3_XFER_SIZE_CNTR_WIDTH_SHIFT;
hw->max_transfer_size = (1 << (width + 11)) - 1;
- /*
- * Clip max_transfer_size to 65535. dwc2_hc_setup_align_buf() allocates
- * coherent buffers with this size, and if it's too large we can
- * exhaust the coherent DMA pool.
- */
- if (hw->max_transfer_size > 65535)
- hw->max_transfer_size = 65535;
width = (hwcfg3 & GHWCFG3_PACKET_SIZE_CNTR_WIDTH_MASK) >>
GHWCFG3_PACKET_SIZE_CNTR_WIDTH_SHIFT;
hw->max_packet_count = (1 << (width + 4)) - 1;
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index e79baf73c234..33495b235b3c 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -598,9 +598,9 @@ static void dwc2_hc_init_split(struct dwc2_hsotg *hsotg,
chan->hub_port = (u8)hub_port;
}
-static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
- struct dwc2_host_chan *chan,
- struct dwc2_qtd *qtd, void *bufptr)
+static void dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
+ struct dwc2_host_chan *chan,
+ struct dwc2_qtd *qtd)
{
struct dwc2_hcd_urb *urb = qtd->urb;
struct dwc2_hcd_iso_packet_desc *frame_desc;
@@ -620,7 +620,6 @@ static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
else
chan->xfer_buf = urb->setup_packet;
chan->xfer_len = 8;
- bufptr = NULL;
break;
case DWC2_CONTROL_DATA:
@@ -647,7 +646,6 @@ static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
chan->xfer_dma = hsotg->status_buf_dma;
else
chan->xfer_buf = hsotg->status_buf;
- bufptr = NULL;
break;
}
break;
@@ -680,14 +678,6 @@ static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
chan->xfer_len = frame_desc->length - qtd->isoc_split_offset;
- /* For non-dword aligned buffers */
- if (hsotg->core_params->dma_enable > 0 &&
- (chan->xfer_dma & 0x3))
- bufptr = (u8 *)urb->buf + frame_desc->offset +
- qtd->isoc_split_offset;
- else
- bufptr = NULL;
-
if (chan->xact_pos == DWC2_HCSPLT_XACTPOS_ALL) {
if (chan->xfer_len <= 188)
chan->xact_pos = DWC2_HCSPLT_XACTPOS_ALL;
@@ -696,63 +686,93 @@ static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
}
break;
}
+}
+
+#define DWC2_USB_DMA_ALIGN 4
+
+struct dma_aligned_buffer {
+ void *kmalloc_ptr;
+ void *old_xfer_buffer;
+ u8 data[0];
+};
+
+static void dwc2_free_dma_aligned_buffer(struct urb *urb)
+{
+ struct dma_aligned_buffer *temp;
+
+ if (!(urb->transfer_flags & URB_ALIGNED_TEMP_BUFFER))
+ return;
+
+ temp = container_of(urb->transfer_buffer,
+ struct dma_aligned_buffer, data);
- return bufptr;
+ if (usb_urb_dir_in(urb))
+ memcpy(temp->old_xfer_buffer, temp->data,
+ urb->transfer_buffer_length);
+ urb->transfer_buffer = temp->old_xfer_buffer;
+ kfree(temp->kmalloc_ptr);
+
+ urb->transfer_flags &= ~URB_ALIGNED_TEMP_BUFFER;
}
-static int dwc2_hc_setup_align_buf(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh,
- struct dwc2_host_chan *chan,
- struct dwc2_hcd_urb *urb, void *bufptr)
+static int dwc2_alloc_dma_aligned_buffer(struct urb *urb, gfp_t mem_flags)
{
- u32 buf_size;
- struct urb *usb_urb;
- struct usb_hcd *hcd;
+ struct dma_aligned_buffer *temp, *kmalloc_ptr;
+ size_t kmalloc_size;
- if (!qh->dw_align_buf) {
- if (chan->ep_type != USB_ENDPOINT_XFER_ISOC)
- buf_size = hsotg->core_params->max_transfer_size;
- else
- /* 3072 = 3 max-size Isoc packets */
- buf_size = 3072;
+ if (urb->num_sgs || urb->sg ||
+ urb->transfer_buffer_length == 0 ||
+ !((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1)))
+ return 0;
- qh->dw_align_buf = kmalloc(buf_size, GFP_ATOMIC | GFP_DMA);
- if (!qh->dw_align_buf)
- return -ENOMEM;
- qh->dw_align_buf_size = buf_size;
- }
+ /* Allocate a buffer with enough padding for alignment */
+ kmalloc_size = urb->transfer_buffer_length +
+ sizeof(struct dma_aligned_buffer) + DWC2_USB_DMA_ALIGN - 1;
- if (chan->xfer_len) {
- dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n", __func__);
- usb_urb = urb->priv;
+ kmalloc_ptr = kmalloc(kmalloc_size, mem_flags);
+ if (!kmalloc_ptr)
+ return -ENOMEM;
- if (usb_urb) {
- if (usb_urb->transfer_flags &
- (URB_SETUP_MAP_SINGLE | URB_DMA_MAP_SG |
- URB_DMA_MAP_PAGE | URB_DMA_MAP_SINGLE)) {
- hcd = dwc2_hsotg_to_hcd(hsotg);
- usb_hcd_unmap_urb_for_dma(hcd, usb_urb);
- }
- if (!chan->ep_is_in)
- memcpy(qh->dw_align_buf, bufptr,
- chan->xfer_len);
- } else {
- dev_warn(hsotg->dev, "no URB in dwc2_urb\n");
- }
- }
+ /* Position our struct dma_aligned_buffer such that data is aligned */
+ temp = PTR_ALIGN(kmalloc_ptr + 1, DWC2_USB_DMA_ALIGN) - 1;
+ temp->kmalloc_ptr = kmalloc_ptr;
+ temp->old_xfer_buffer = urb->transfer_buffer;
+ if (usb_urb_dir_out(urb))
+ memcpy(temp->data, urb->transfer_buffer,
+ urb->transfer_buffer_length);
+ urb->transfer_buffer = temp->data;
- qh->dw_align_buf_dma = dma_map_single(hsotg->dev,
- qh->dw_align_buf, qh->dw_align_buf_size,
- chan->ep_is_in ? DMA_FROM_DEVICE : DMA_TO_DEVICE);
- if (dma_mapping_error(hsotg->dev, qh->dw_align_buf_dma)) {
- dev_err(hsotg->dev, "can't map align_buf\n");
- chan->align_buf = 0;
- return -EINVAL;
- }
+ urb->transfer_flags |= URB_ALIGNED_TEMP_BUFFER;
- chan->align_buf = qh->dw_align_buf_dma;
return 0;
}
+static int dwc2_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb,
+ gfp_t mem_flags)
+{
+ int ret;
+
+ /* We assume setup_dma is always aligned; warn if not */
+ WARN_ON_ONCE(urb->setup_dma &&
+ (urb->setup_dma & (DWC2_USB_DMA_ALIGN - 1)));
+
+ ret = dwc2_alloc_dma_aligned_buffer(urb, mem_flags);
+ if (ret)
+ return ret;
+
+ ret = usb_hcd_map_urb_for_dma(hcd, urb, mem_flags);
+ if (ret)
+ dwc2_free_dma_aligned_buffer(urb);
+
+ return ret;
+}
+
+static void dwc2_unmap_urb_for_dma(struct usb_hcd *hcd, struct urb *urb)
+{
+ usb_hcd_unmap_urb_for_dma(hcd, urb);
+ dwc2_free_dma_aligned_buffer(urb);
+}
+
/**
* dwc2_assign_and_init_hc() - Assigns transactions from a QTD to a free host
* channel and initializes the host channel to perform the transactions. The
@@ -767,7 +787,6 @@ static int dwc2_assign_and_init_hc(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
struct dwc2_host_chan *chan;
struct dwc2_hcd_urb *urb;
struct dwc2_qtd *qtd;
- void *bufptr = NULL;
if (dbg_qh(qh))
dev_vdbg(hsotg->dev, "%s(%p,%p)\n", __func__, hsotg, qh);
@@ -829,16 +848,10 @@ static int dwc2_assign_and_init_hc(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
!dwc2_hcd_is_pipe_in(&urb->pipe_info))
urb->actual_length = urb->length;
- if (hsotg->core_params->dma_enable > 0) {
+ if (hsotg->core_params->dma_enable > 0)
chan->xfer_dma = urb->dma + urb->actual_length;
-
- /* For non-dword aligned case */
- if (hsotg->core_params->dma_desc_enable <= 0 &&
- (chan->xfer_dma & 0x3))
- bufptr = (u8 *)urb->buf + urb->actual_length;
- } else {
+ else
chan->xfer_buf = (u8 *)urb->buf + urb->actual_length;
- }
chan->xfer_len = urb->length - urb->actual_length;
chan->xfer_count = 0;
@@ -850,27 +863,7 @@ static int dwc2_assign_and_init_hc(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
chan->do_split = 0;
/* Set the transfer attributes */
- bufptr = dwc2_hc_init_xfer(hsotg, chan, qtd, bufptr);
-
- /* Non DWORD-aligned buffer case */
- if (bufptr) {
- dev_vdbg(hsotg->dev, "Non-aligned buffer\n");
- if (dwc2_hc_setup_align_buf(hsotg, qh, chan, urb, bufptr)) {
- dev_err(hsotg->dev,
- "%s: Failed to allocate memory to handle non-dword aligned buffer\n",
- __func__);
- /* Add channel back to free list */
- chan->align_buf = 0;
- chan->multi_count = 0;
- list_add_tail(&chan->hc_list_entry,
- &hsotg->free_hc_list);
- qtd->in_process = 0;
- qh->channel = NULL;
- return -ENOMEM;
- }
- } else {
- chan->align_buf = 0;
- }
+ dwc2_hc_init_xfer(hsotg, chan, qtd);
if (chan->ep_type == USB_ENDPOINT_XFER_INT ||
chan->ep_type == USB_ENDPOINT_XFER_ISOC)
@@ -2904,6 +2897,9 @@ static struct hc_driver dwc2_hc_driver = {
.bus_suspend = _dwc2_hcd_suspend,
.bus_resume = _dwc2_hcd_resume,
+
+ .map_urb_for_dma = dwc2_map_urb_for_dma,
+ .unmap_urb_for_dma = dwc2_unmap_urb_for_dma,
};
/*
diff --git a/drivers/usb/dwc2/hcd.h b/drivers/usb/dwc2/hcd.h
index f105bada2fd1..8a4486e1a542 100644
--- a/drivers/usb/dwc2/hcd.h
+++ b/drivers/usb/dwc2/hcd.h
@@ -75,8 +75,6 @@ struct dwc2_qh;
* (micro)frame
* @xfer_buf: Pointer to current transfer buffer position
* @xfer_dma: DMA address of xfer_buf
- * @align_buf: In Buffer DMA mode this will be used if xfer_buf is not
- * DWORD aligned
* @xfer_len: Total number of bytes to transfer
* @xfer_count: Number of bytes transferred so far
* @start_pkt_count: Packet count at start of transfer
@@ -132,7 +130,6 @@ struct dwc2_host_chan {
u8 *xfer_buf;
dma_addr_t xfer_dma;
- dma_addr_t align_buf;
u32 xfer_len;
u32 xfer_count;
u16 start_pkt_count;
@@ -241,10 +238,6 @@ enum dwc2_transaction_type {
* @frame_usecs: Internal variable used by the microframe scheduler
* @start_split_frame: (Micro)frame at which last start split was initialized
* @ntd: Actual number of transfer descriptors in a list
- * @dw_align_buf: Used instead of original buffer if its physical address
- * is not dword-aligned
- * @dw_align_buf_size: Size of dw_align_buf
- * @dw_align_buf_dma: DMA address for dw_align_buf
* @qtd_list: List of QTDs for this QH
* @channel: Host channel currently processing transfers for this QH
* @qh_list_entry: Entry for QH in either the periodic or non-periodic
@@ -276,9 +269,6 @@ struct dwc2_qh {
u16 frame_usecs[8];
u16 start_split_frame;
u16 ntd;
- u8 *dw_align_buf;
- int dw_align_buf_size;
- dma_addr_t dw_align_buf_dma;
struct list_head qtd_list;
struct dwc2_host_chan *channel;
struct list_head qh_list_entry;
diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c
index bda0b21b850f..84190243b8be 100644
--- a/drivers/usb/dwc2/hcd_intr.c
+++ b/drivers/usb/dwc2/hcd_intr.c
@@ -465,18 +465,6 @@ static int dwc2_update_urb_state(struct dwc2_hsotg *hsotg,
xfer_length = urb->length - urb->actual_length;
}
- /* Non DWORD-aligned buffer case handling */
- if (chan->align_buf && xfer_length) {
- dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n", __func__);
- dma_unmap_single(hsotg->dev, chan->qh->dw_align_buf_dma,
- chan->qh->dw_align_buf_size,
- chan->ep_is_in ?
- DMA_FROM_DEVICE : DMA_TO_DEVICE);
- if (chan->ep_is_in)
- memcpy(urb->buf + urb->actual_length,
- chan->qh->dw_align_buf, xfer_length);
- }
-
dev_vdbg(hsotg->dev, "urb->actual_length=%d xfer_length=%d\n",
urb->actual_length, xfer_length);
urb->actual_length += xfer_length;
@@ -558,21 +546,6 @@ static enum dwc2_halt_status dwc2_update_isoc_urb_state(
frame_desc->status = 0;
frame_desc->actual_length = dwc2_get_actual_xfer_length(hsotg,
chan, chnum, qtd, halt_status, NULL);
-
- /* Non DWORD-aligned buffer case handling */
- if (chan->align_buf && frame_desc->actual_length) {
- dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n",
- __func__);
- dma_unmap_single(hsotg->dev, chan->qh->dw_align_buf_dma,
- chan->qh->dw_align_buf_size,
- chan->ep_is_in ?
- DMA_FROM_DEVICE : DMA_TO_DEVICE);
- if (chan->ep_is_in)
- memcpy(urb->buf + frame_desc->offset +
- qtd->isoc_split_offset,
- chan->qh->dw_align_buf,
- frame_desc->actual_length);
- }
break;
case DWC2_HC_XFER_FRAME_OVERRUN:
urb->error_count++;
@@ -593,21 +566,6 @@ static enum dwc2_halt_status dwc2_update_isoc_urb_state(
frame_desc->actual_length = dwc2_get_actual_xfer_length(hsotg,
chan, chnum, qtd, halt_status, NULL);
- /* Non DWORD-aligned buffer case handling */
- if (chan->align_buf && frame_desc->actual_length) {
- dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n",
- __func__);
- dma_unmap_single(hsotg->dev, chan->qh->dw_align_buf_dma,
- chan->qh->dw_align_buf_size,
- chan->ep_is_in ?
- DMA_FROM_DEVICE : DMA_TO_DEVICE);
- if (chan->ep_is_in)
- memcpy(urb->buf + frame_desc->offset +
- qtd->isoc_split_offset,
- chan->qh->dw_align_buf,
- frame_desc->actual_length);
- }
-
/* Skip whole frame */
if (chan->qh->do_split &&
chan->ep_type == USB_ENDPOINT_XFER_ISOC && chan->ep_is_in &&
@@ -673,8 +631,6 @@ static void dwc2_deactivate_qh(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh,
}
no_qtd:
- if (qh->channel)
- qh->channel->align_buf = 0;
qh->channel = NULL;
dwc2_hcd_qh_deactivate(hsotg, qh, continue_split);
}
@@ -939,14 +895,6 @@ static int dwc2_xfercomp_isoc_split_in(struct dwc2_hsotg *hsotg,
frame_desc->actual_length += len;
- if (chan->align_buf) {
- dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n", __func__);
- dma_unmap_single(hsotg->dev, chan->qh->dw_align_buf_dma,
- chan->qh->dw_align_buf_size, DMA_FROM_DEVICE);
- memcpy(qtd->urb->buf + frame_desc->offset +
- qtd->isoc_split_offset, chan->qh->dw_align_buf, len);
- }
-
qtd->isoc_split_offset += len;
if (frame_desc->actual_length >= frame_desc->length) {
@@ -1169,19 +1117,6 @@ static void dwc2_update_urb_state_abn(struct dwc2_hsotg *hsotg,
xfer_length = urb->length - urb->actual_length;
}
- /* Non DWORD-aligned buffer case handling */
- if (chan->align_buf && xfer_length && chan->ep_is_in) {
- dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n", __func__);
- dma_unmap_single(hsotg->dev, chan->qh->dw_align_buf_dma,
- chan->qh->dw_align_buf_size,
- chan->ep_is_in ?
- DMA_FROM_DEVICE : DMA_TO_DEVICE);
- if (chan->ep_is_in)
- memcpy(urb->buf + urb->actual_length,
- chan->qh->dw_align_buf,
- xfer_length);
- }
-
urb->actual_length += xfer_length;
hctsiz = dwc2_readl(hsotg->regs + HCTSIZ(chnum));
diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
index 7d8d06cfe3c1..3e1edafc593c 100644
--- a/drivers/usb/dwc2/hcd_queue.c
+++ b/drivers/usb/dwc2/hcd_queue.c
@@ -232,13 +232,8 @@ struct dwc2_qh *dwc2_hcd_qh_create(struct dwc2_hsotg *hsotg,
*/
void dwc2_hcd_qh_free(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
{
- if (hsotg->core_params->dma_desc_enable > 0) {
+ if (hsotg->core_params->dma_desc_enable > 0)
dwc2_hcd_qh_free_ddma(hsotg, qh);
- } else {
- /* kfree(NULL) is safe */
- kfree(qh->dw_align_buf);
- qh->dw_align_buf_dma = (dma_addr_t)0;
- }
kfree(qh);
}
--
2.6.0.rc2.230.g3dd15c0
In preparation for future changes to the scheduler let's add some
tracing that makes it easy for us to see what's happening. By default
this tracing will be off.
Note that I've chosen to point tracing at ftrace rather than the console
since performance is pretty critical for these traces and ftrace is more
appropriate for performance-critical traces.
Signed-off-by: Douglas Anderson <[email protected]>
---
Changes in v3:
- scheduler tracing new for v3.
Changes in v2: None
drivers/usb/dwc2/core.h | 9 +++++++++
drivers/usb/dwc2/hcd.h | 5 +++++
drivers/usb/dwc2/hcd_intr.c | 7 ++++++-
drivers/usb/dwc2/hcd_queue.c | 24 +++++++++++++++++++++++-
4 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
index a66d3cb62b65..8224a1c21ac3 100644
--- a/drivers/usb/dwc2/core.h
+++ b/drivers/usb/dwc2/core.h
@@ -44,6 +44,15 @@
#include <linux/usb/phy.h>
#include "hw.h"
+#ifdef DWC2_TRACE_SCHEDULER
+#define dwc2_scheduler_printk trace_printk
+#else
+#define dwc2_scheduler_printk no_printk
+#endif
+#define dwc2_sch_dbg(hsotg, fmt, ...) \
+ dwc2_scheduler_printk(pr_fmt("%s: SCH: " fmt), \
+ dev_name(hsotg->dev), ##__VA_ARGS__)
+
static inline u32 dwc2_readl(const void __iomem *addr)
{
u32 value = __raw_readl(addr);
diff --git a/drivers/usb/dwc2/hcd.h b/drivers/usb/dwc2/hcd.h
index 8a4486e1a542..de8c0b0937e6 100644
--- a/drivers/usb/dwc2/hcd.h
+++ b/drivers/usb/dwc2/hcd.h
@@ -554,6 +554,11 @@ static inline u16 dwc2_frame_num_inc(u16 frame, u16 inc)
return (frame + inc) & HFNUM_MAX_FRNUM;
}
+static inline u16 dwc2_frame_num_dec(u16 frame, u16 dec)
+{
+ return (frame + HFNUM_MAX_FRNUM + 1 - dec) & HFNUM_MAX_FRNUM;
+}
+
static inline u16 dwc2_full_frame_num(u16 frame)
{
return (frame & HFNUM_MAX_FRNUM) >> 3;
diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c
index 84190243b8be..c3f6200bc630 100644
--- a/drivers/usb/dwc2/hcd_intr.c
+++ b/drivers/usb/dwc2/hcd_intr.c
@@ -135,13 +135,18 @@ static void dwc2_sof_intr(struct dwc2_hsotg *hsotg)
while (qh_entry != &hsotg->periodic_sched_inactive) {
qh = list_entry(qh_entry, struct dwc2_qh, qh_list_entry);
qh_entry = qh_entry->next;
- if (dwc2_frame_num_le(qh->sched_frame, hsotg->frame_number))
+ if (dwc2_frame_num_le(qh->sched_frame, hsotg->frame_number)) {
+ dwc2_sch_dbg(hsotg,
+ "ready %p fn=%04x, sch=%04x\n",
+ qh, hsotg->frame_number, qh->sched_frame);
+
/*
* Move QH to the ready list to be executed next
* (micro)frame
*/
list_move(&qh->qh_list_entry,
&hsotg->periodic_sched_ready);
+ }
}
tr_type = dwc2_hcd_select_transactions(hsotg);
if (tr_type != DWC2_TRANSACTION_NONE)
diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
index 3e1edafc593c..93b26c8fef88 100644
--- a/drivers/usb/dwc2/hcd_queue.c
+++ b/drivers/usb/dwc2/hcd_queue.c
@@ -113,6 +113,9 @@ static void dwc2_qh_init(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh,
qh->sched_frame = dwc2_frame_num_inc(hsotg->frame_number,
SCHEDULE_SLOP);
qh->interval = urb->interval;
+ dwc2_sch_dbg(hsotg, "init %p sch=%04x, fn=%04x, int=%#x\n",
+ qh, qh->sched_frame, hsotg->frame_number,
+ qh->interval);
#if 0
/* Increase interrupt polling rate for debugging */
if (qh->ep_type == USB_ENDPOINT_XFER_INT)
@@ -126,6 +129,11 @@ static void dwc2_qh_init(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh,
qh->interval *= 8;
qh->sched_frame |= 0x7;
qh->start_split_frame = qh->sched_frame;
+ dwc2_sch_dbg(hsotg,
+ "init*8 %p sch=%04x, fn=%04x, int=%#x\n",
+ qh, qh->sched_frame, hsotg->frame_number,
+ qh->interval);
+
}
dev_dbg(hsotg->dev, "interval=%d\n", qh->interval);
}
@@ -482,6 +490,8 @@ static int dwc2_schedule_periodic(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
if (frame >= 0) {
qh->sched_frame &= ~0x7;
qh->sched_frame |= (frame & 7);
+ dwc2_sch_dbg(hsotg, "sched_p %p sch=%04x, uf=%d\n",
+ qh, qh->sched_frame, frame);
}
if (status > 0)
@@ -583,10 +593,16 @@ int dwc2_hcd_qh_add(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
if (!dwc2_frame_num_le(qh->sched_frame, hsotg->frame_number) &&
!hsotg->frame_number) {
+ u16 new_frame;
+
dev_dbg(hsotg->dev,
"reset frame number counter\n");
- qh->sched_frame = dwc2_frame_num_inc(hsotg->frame_number,
+ new_frame = dwc2_frame_num_inc(hsotg->frame_number,
SCHEDULE_SLOP);
+
+ dwc2_sch_dbg(hsotg, "reset %p sch=%04x=>%04x\n",
+ qh, qh->sched_frame, new_frame);
+ qh->sched_frame = new_frame;
}
/* Add the new QH to the appropriate schedule */
@@ -652,6 +668,7 @@ static void dwc2_sched_periodic_split(struct dwc2_hsotg *hsotg,
int sched_next_periodic_split)
{
u16 incr;
+ u16 old_frame = qh->sched_frame;
if (sched_next_periodic_split) {
qh->sched_frame = frame_number;
@@ -677,6 +694,11 @@ static void dwc2_sched_periodic_split(struct dwc2_hsotg *hsotg,
qh->sched_frame |= 0x7;
qh->start_split_frame = qh->sched_frame;
}
+
+ dwc2_sch_dbg(hsotg, "next(%d) %p fn=%04x, sch=%04x=>%04x (%+d)\n",
+ sched_next_periodic_split, qh, frame_number, old_frame,
+ qh->sched_frame,
+ dwc2_frame_num_dec(qh->sched_frame, old_frame));
}
/*
--
2.6.0.rc2.230.g3dd15c0
The old microframe scheduler was hard to follow and had some bugs in
it. Specifically, I made some code to visualize what was happening and
found:
Add W (280 us):
WWWWWWWWWW|WWWWWWWWWW|WWWWWWWW | | | | |
Add B (260 us):
WWWWWWWWWW|WWWWWWWWWW|WWWWWWWWBB|BBBBBBBBBB|BBBBBBBBBB|BBBB | |
Add C ( 80 us):
WWWWWWWWWW|WWWWWWWWWW|WWWWWWWWBB|BBBBBBBBBB|BBBBBBBBBB|BBBBCCCCCC|CC |
Add K ( 10 us):
WWWWWWWWWW|WWWWWWWWWW|WWWWWWWWBB|BBBBBBBBBB|BBBBBBBBBB|BBBBK | |
Add S (170 us):
SSSSSSSSSS|SSSSSSS | BB|BBBBBBBBBB|BBBBBBBBBB|BBBBK | |
Add L ( 60 us):
SSSSSSSSSS|SSSSSSSLLL|LLL BB|BBBBBBBBBB|BBBBBBBBBB|BBBBK | |
Add W ( 60 us):
SSSSSSSSSS|SSSSSSSLLL|LLLWWWWWBB|BBBBBBBBBB|BBBBBBBBBB|BBBBKW | |
As you can see, the "W" is broken up in a bogus way. It's in microframe
2 and microframe 5. It's easy to find more examples of bugs with more
testing.
As a first step toward fixing things, let's first change the scheduling
functions to do what I believe was the intent of the old function. This
new code uses the existing Linux bitmap code to avoid reinventing the
wheel. You can see (ugly) test code for this up on pastebin:
http://pastebin.com/PjxktNYA
Note that the frames picked by the microframe scheduler functions aren't
properly used yet elsewhere, so this patch won't really have much of an
effect. See future patches in the series.
Signed-off-by: Douglas Anderson <[email protected]>
---
Changes in v3:
- The uframe scheduler patch is folded into optimization series.
- Optimize uframe scheduler "single uframe" case a little.
- uframe scheduler now atop logging patches.
- uframe scheduler now before delayed bandwidth release patches.
- Add defines like EARLY_FRAME_USEC
- Reorder dwc2_deschedule_periodic() in prep for future patches.
- uframe scheduler now shows real usefulness w/ future patches!
Changes in v2:
- Totally rewrote uframe scheduler again after writing test code.
- uframe scheduler atop delayed bandwidth release patches.
drivers/usb/dwc2/core.h | 10 +++-
drivers/usb/dwc2/hcd.c | 3 -
drivers/usb/dwc2/hcd.h | 5 +-
drivers/usb/dwc2/hcd_intr.c | 5 +-
drivers/usb/dwc2/hcd_queue.c | 134 ++++++++++++++++++-------------------------
5 files changed, 69 insertions(+), 88 deletions(-)
diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
index 8224a1c21ac3..567ee2c9e69f 100644
--- a/drivers/usb/dwc2/core.h
+++ b/drivers/usb/dwc2/core.h
@@ -658,7 +658,7 @@ struct dwc2_hregs_backup {
* This value is in microseconds per (micro)frame. The
* assumption is that all periodic transfers may occur in
* the same (micro)frame.
- * @frame_usecs: Internal variable used by the microframe scheduler
+ * @periodic_bitmap: Bitmap used by the microframe scheduler
* @frame_number: Frame number read from the core at SOF. The value ranges
* from 0 to HFNUM_MAX_FRNUM.
* @periodic_qh_count: Count of periodic QHs, if using several eps. Used for
@@ -753,6 +753,11 @@ struct dwc2_hsotg {
#define DWC2_CORE_REV_3_00a 0x4f54300a
#if IS_ENABLED(CONFIG_USB_DWC2_HOST) || IS_ENABLED(CONFIG_USB_DWC2_DUAL_ROLE)
+
+/* Total number of microseconds for scheduling */
+#define EARLY_FRAME_USEC 100
+#define TOTAL_PERIODIC_USEC (6 * EARLY_FRAME_USEC + 30)
+
union dwc2_hcd_internal_flags {
u32 d32;
struct {
@@ -775,7 +780,8 @@ struct dwc2_hsotg {
struct list_head periodic_sched_assigned;
struct list_head periodic_sched_queued;
u16 periodic_usecs;
- u16 frame_usecs[8];
+ unsigned long periodic_bitmap[DIV_ROUND_UP(TOTAL_PERIODIC_USEC,
+ BITS_PER_LONG)];
u16 frame_number;
u16 periodic_qh_count;
bool bus_suspended;
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index 33495b235b3c..803992985935 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -3091,9 +3091,6 @@ int dwc2_hcd_init(struct dwc2_hsotg *hsotg, int irq)
hsotg->hc_ptr_array[i] = channel;
}
- if (hsotg->core_params->uframe_sched > 0)
- dwc2_hcd_init_usecs(hsotg);
-
/* Initialize hsotg start work */
INIT_DELAYED_WORK(&hsotg->start_work, dwc2_hcd_start_func);
diff --git a/drivers/usb/dwc2/hcd.h b/drivers/usb/dwc2/hcd.h
index de8c0b0937e6..a3dbc561fe3f 100644
--- a/drivers/usb/dwc2/hcd.h
+++ b/drivers/usb/dwc2/hcd.h
@@ -235,7 +235,7 @@ enum dwc2_transaction_type {
* @interval: Interval between transfers in (micro)frames
* @sched_frame: (Micro)frame to initialize a periodic transfer.
* The transfer executes in the following (micro)frame.
- * @frame_usecs: Internal variable used by the microframe scheduler
+ * @start_usecs: Exact start microsecond value.
* @start_split_frame: (Micro)frame at which last start split was initialized
* @ntd: Actual number of transfer descriptors in a list
* @qtd_list: List of QTDs for this QH
@@ -266,7 +266,7 @@ struct dwc2_qh {
u16 usecs;
u16 interval;
u16 sched_frame;
- u16 frame_usecs[8];
+ u16 start_usecs;
u16 start_split_frame;
u16 ntd;
struct list_head qtd_list;
@@ -452,7 +452,6 @@ extern void dwc2_hcd_queue_transactions(struct dwc2_hsotg *hsotg,
/* Schedule Queue Functions */
/* Implemented in hcd_queue.c */
-extern void dwc2_hcd_init_usecs(struct dwc2_hsotg *hsotg);
extern struct dwc2_qh *dwc2_hcd_qh_create(struct dwc2_hsotg *hsotg,
struct dwc2_hcd_urb *urb,
gfp_t mem_flags);
diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c
index c3f6200bc630..6071b8ba1ee7 100644
--- a/drivers/usb/dwc2/hcd_intr.c
+++ b/drivers/usb/dwc2/hcd_intr.c
@@ -137,8 +137,9 @@ static void dwc2_sof_intr(struct dwc2_hsotg *hsotg)
qh_entry = qh_entry->next;
if (dwc2_frame_num_le(qh->sched_frame, hsotg->frame_number)) {
dwc2_sch_dbg(hsotg,
- "ready %p fn=%04x, sch=%04x\n",
- qh, hsotg->frame_number, qh->sched_frame);
+ "ready %p fn=%04x, sch=%04x, us=%d\n",
+ qh, hsotg->frame_number, qh->sched_frame,
+ qh->start_usecs);
/*
* Move QH to the ready list to be executed next
diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
index 93b26c8fef88..64b779a5c93f 100644
--- a/drivers/usb/dwc2/hcd_queue.c
+++ b/drivers/usb/dwc2/hcd_queue.c
@@ -325,35 +325,45 @@ static int dwc2_check_periodic_bandwidth(struct dwc2_hsotg *hsotg,
/**
* Microframe scheduler
- * track the total use in hsotg->frame_usecs
- * keep each qh use in qh->frame_usecs
+ * track the total use in hsotg->periodic_bitmap
+ * keep each qh use in qh->start_usecs
* when surrendering the qh then donate the time back
*/
static const unsigned short max_uframe_usecs[] = {
- 100, 100, 100, 100, 100, 100, 30, 0
+ EARLY_FRAME_USEC, EARLY_FRAME_USEC, EARLY_FRAME_USEC,
+ EARLY_FRAME_USEC, EARLY_FRAME_USEC, EARLY_FRAME_USEC,
+ 30, 0
};
-void dwc2_hcd_init_usecs(struct dwc2_hsotg *hsotg)
-{
- int i;
-
- for (i = 0; i < 8; i++)
- hsotg->frame_usecs[i] = max_uframe_usecs[i];
-}
-
static int dwc2_find_single_uframe(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
{
+ unsigned short frame_start = 0;
unsigned short utime = qh->usecs;
int i;
- for (i = 0; i < 8; i++) {
- /* At the start hsotg->frame_usecs[i] = max_uframe_usecs[i] */
- if (utime <= hsotg->frame_usecs[i]) {
- hsotg->frame_usecs[i] -= utime;
- qh->frame_usecs[i] += utime;
+ for (i = 0; i < ARRAY_SIZE(max_uframe_usecs); i++) {
+ unsigned short frame_time = max_uframe_usecs[i];
+ unsigned long start;
+
+ /* Look for a chunk starting at begin of this frame */
+ start = bitmap_find_next_zero_area(hsotg->periodic_bitmap,
+ frame_start + frame_time,
+ frame_start, utime, 0);
+
+ /* The chunk has to totally fit in this frame */
+ if (start < frame_start + frame_time) {
+ bitmap_set(hsotg->periodic_bitmap, start, utime);
+ qh->start_usecs = start;
+ dwc2_sch_dbg(hsotg, "%s: assigned %d us @ %d us\n",
+ __func__, qh->usecs, qh->start_usecs);
return i;
}
+
+ frame_start += frame_time;
}
+
+ dwc2_sch_dbg(hsotg, "%s: failed to assign %d us\n",
+ __func__, qh->usecs);
return -ENOSPC;
}
@@ -363,57 +373,23 @@ static int dwc2_find_single_uframe(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
static int dwc2_find_multi_uframe(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
{
unsigned short utime = qh->usecs;
- unsigned short xtime;
- int t_left;
- int i;
- int j;
- int k;
+ unsigned long start;
+
+ start = bitmap_find_next_zero_area(hsotg->periodic_bitmap,
+ TOTAL_PERIODIC_USEC, 0, utime, 0);
+ if (start >= TOTAL_PERIODIC_USEC) {
+ dwc2_sch_dbg(hsotg, "%s: failed to assign %d us\n",
+ __func__, qh->usecs);
+ return -ENOSPC;
+ }
- for (i = 0; i < 8; i++) {
- if (hsotg->frame_usecs[i] <= 0)
- continue;
+ bitmap_set(hsotg->periodic_bitmap, start, qh->usecs);
+ qh->start_usecs = start;
- /*
- * we need n consecutive slots so use j as a start slot
- * j plus j+1 must be enough time (for now)
- */
- xtime = hsotg->frame_usecs[i];
- for (j = i + 1; j < 8; j++) {
- /*
- * if we add this frame remaining time to xtime we may
- * be OK, if not we need to test j for a complete frame
- */
- if (xtime + hsotg->frame_usecs[j] < utime) {
- if (hsotg->frame_usecs[j] <
- max_uframe_usecs[j])
- continue;
- }
- if (xtime >= utime) {
- t_left = utime;
- for (k = i; k < 8; k++) {
- t_left -= hsotg->frame_usecs[k];
- if (t_left <= 0) {
- qh->frame_usecs[k] +=
- hsotg->frame_usecs[k]
- + t_left;
- hsotg->frame_usecs[k] = -t_left;
- return i;
- } else {
- qh->frame_usecs[k] +=
- hsotg->frame_usecs[k];
- hsotg->frame_usecs[k] = 0;
- }
- }
- }
- /* add the frame time to x time */
- xtime += hsotg->frame_usecs[j];
- /* we must have a fully available next frame or break */
- if (xtime < utime &&
- hsotg->frame_usecs[j] == max_uframe_usecs[j])
- continue;
- }
- }
- return -ENOSPC;
+ dwc2_sch_dbg(hsotg, "%s: assigned %d us @ %d us\n",
+ __func__, qh->usecs, qh->start_usecs);
+
+ return start / EARLY_FRAME_USEC;
}
static int dwc2_find_uframe(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
@@ -490,8 +466,10 @@ static int dwc2_schedule_periodic(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
if (frame >= 0) {
qh->sched_frame &= ~0x7;
qh->sched_frame |= (frame & 7);
- dwc2_sch_dbg(hsotg, "sched_p %p sch=%04x, uf=%d\n",
- qh, qh->sched_frame, frame);
+ dwc2_sch_dbg(hsotg,
+ "sched_p %p sch=%04x, uf=%d, us=%d\n",
+ qh, qh->sched_frame, frame,
+ qh->start_usecs);
}
if (status > 0)
@@ -551,22 +529,21 @@ static int dwc2_schedule_periodic(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg,
struct dwc2_qh *qh)
{
- int i;
+ int start = qh->start_usecs;
+ int utime = qh->usecs;
list_del_init(&qh->qh_list_entry);
/* Update claimed usecs per (micro)frame */
hsotg->periodic_usecs -= qh->usecs;
- if (hsotg->core_params->uframe_sched > 0) {
- for (i = 0; i < 8; i++) {
- hsotg->frame_usecs[i] += qh->frame_usecs[i];
- qh->frame_usecs[i] = 0;
- }
- } else {
+ if (hsotg->core_params->uframe_sched <= 0) {
/* Release periodic channel reservation */
hsotg->periodic_channels--;
+ return;
}
+
+ bitmap_clear(hsotg->periodic_bitmap, start, utime);
}
/**
@@ -600,8 +577,8 @@ int dwc2_hcd_qh_add(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
new_frame = dwc2_frame_num_inc(hsotg->frame_number,
SCHEDULE_SLOP);
- dwc2_sch_dbg(hsotg, "reset %p sch=%04x=>%04x\n",
- qh, qh->sched_frame, new_frame);
+ dwc2_sch_dbg(hsotg, "reset %p sch=%04x=>%04x, us=%d\n",
+ qh, qh->sched_frame, new_frame, qh->start_usecs);
qh->sched_frame = new_frame;
}
@@ -695,10 +672,11 @@ static void dwc2_sched_periodic_split(struct dwc2_hsotg *hsotg,
qh->start_split_frame = qh->sched_frame;
}
- dwc2_sch_dbg(hsotg, "next(%d) %p fn=%04x, sch=%04x=>%04x (%+d)\n",
+ dwc2_sch_dbg(hsotg, "next(%d) %p fn=%04x, sch=%04x=>%04x (%+d) us=%d\n",
sched_next_periodic_split, qh, frame_number, old_frame,
qh->sched_frame,
- dwc2_frame_num_dec(qh->sched_frame, old_frame));
+ dwc2_frame_num_dec(qh->sched_frame, old_frame),
+ qh->start_usecs);
}
/*
--
2.6.0.rc2.230.g3dd15c0
The microframe scheduler did a lot of work to pick the proper
microframe. Then dwc2_sched_periodic_split() went ahead and ignored
which microframe we picked if it ever needed to re-assign things.
Let's keep track of the uframe and then we'll always use it.
Signed-off-by: Douglas Anderson <[email protected]>
---
Changes in v3:
- Keep track and use our uframe new for v3.
Changes in v2: None
drivers/usb/dwc2/hcd.h | 4 ++++
drivers/usb/dwc2/hcd_queue.c | 16 +++++++++++++---
2 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/dwc2/hcd.h b/drivers/usb/dwc2/hcd.h
index a3dbc561fe3f..ff99cb44c89d 100644
--- a/drivers/usb/dwc2/hcd.h
+++ b/drivers/usb/dwc2/hcd.h
@@ -231,6 +231,9 @@ enum dwc2_transaction_type {
* @do_split: Full/low speed endpoint on high-speed hub requires split
* @td_first: Index of first activated isochronous transfer descriptor
* @td_last: Index of last activated isochronous transfer descriptor
+ * @sched_uframe: The microframe that we're scheduled to be in (0 - 7).
+ * Whenever we start a new split this is expected to be the
+ * lower 3 bits of sched_frame.
* @usecs: Bandwidth in microseconds per (micro)frame
* @interval: Interval between transfers in (micro)frames
* @sched_frame: (Micro)frame to initialize a periodic transfer.
@@ -263,6 +266,7 @@ struct dwc2_qh {
u8 do_split;
u8 td_first;
u8 td_last;
+ u8 sched_uframe;
u16 usecs;
u16 interval;
u16 sched_frame;
diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
index 64b779a5c93f..4c1d9cf482d0 100644
--- a/drivers/usb/dwc2/hcd_queue.c
+++ b/drivers/usb/dwc2/hcd_queue.c
@@ -464,8 +464,10 @@ static int dwc2_schedule_periodic(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
/* Set the new frame up */
if (frame >= 0) {
- qh->sched_frame &= ~0x7;
- qh->sched_frame |= (frame & 7);
+ qh->sched_uframe = frame;
+
+ /* The next frame that matches our scheduled uframe */
+ qh->sched_frame = ((qh->sched_frame + 7) & ~7) | frame;
dwc2_sch_dbg(hsotg,
"sched_p %p sch=%04x, uf=%d, us=%d\n",
qh, qh->sched_frame, frame,
@@ -668,7 +670,15 @@ static void dwc2_sched_periodic_split(struct dwc2_hsotg *hsotg,
qh->interval);
if (dwc2_frame_num_le(qh->sched_frame, frame_number))
qh->sched_frame = frame_number;
- qh->sched_frame |= 0x7;
+
+ if (hsotg->core_params->uframe_sched > 0)
+ /* The next frame that matches our scheduled uframe */
+ qh->sched_frame = ((qh->sched_frame + 7) & ~7) |
+ qh->sched_uframe;
+ else
+ /* The beginning of the next full frame */
+ qh->sched_frame |= 0x7;
+
qh->start_split_frame = qh->sched_frame;
}
--
2.6.0.rc2.230.g3dd15c0
Until we have logic to determine which devices share the same TT let's
add logic to assume that all devices on a given dwc2 controller are on
one single_tt hub. This is better than the previous code that assumed
that all devices were on one multi_tt hub, since single_tt hubs
appear (in my experience) to be much more common and any schedule that
would work on a single_tt hub will also work on a multi_tt hub. This
will prevent more than 8 total low/full speed devices to be on the bus
at one time, but that's a reasonable restriction until we've made things
smarter.
Signed-off-by: Douglas Anderson <[email protected]>
---
Changes in v3:
- Assuming single_tt is new for v3; not terribly well tested (yet).
Changes in v2: None
drivers/usb/dwc2/core.h | 1 +
drivers/usb/dwc2/hcd_queue.c | 40 +++++++++++++++++++++++++++++++++++++++-
2 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
index 567ee2c9e69f..09aa2b5ae29e 100644
--- a/drivers/usb/dwc2/core.h
+++ b/drivers/usb/dwc2/core.h
@@ -782,6 +782,7 @@ struct dwc2_hsotg {
u16 periodic_usecs;
unsigned long periodic_bitmap[DIV_ROUND_UP(TOTAL_PERIODIC_USEC,
BITS_PER_LONG)];
+ bool has_split[8];
u16 frame_number;
u16 periodic_qh_count;
bool bus_suspended;
diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
index 4c1d9cf482d0..c5a2edb04bec 100644
--- a/drivers/usb/dwc2/hcd_queue.c
+++ b/drivers/usb/dwc2/hcd_queue.c
@@ -372,10 +372,33 @@ static int dwc2_find_single_uframe(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
*/
static int dwc2_find_multi_uframe(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
{
+ unsigned long tmp_bmp[DIV_ROUND_UP(TOTAL_PERIODIC_USEC, BITS_PER_LONG)];
+ unsigned long *bmp;
unsigned short utime = qh->usecs;
unsigned long start;
+ int i;
+
+ if (qh->do_split) {
+ /*
+ * Eventually we'll only want to exclude out microframes used by
+ * other people on the same TT as us, and then only if we're on
+ * a single_tt hub. ...but until we have that logic, just
+ * schedule everyone together.
+ */
+ bmp = tmp_bmp;
+ memcpy(bmp, hsotg->periodic_bitmap, sizeof(tmp_bmp));
+ start = 0;
+
+ for (i = 0; i < ARRAY_SIZE(max_uframe_usecs); i++) {
+ if (hsotg->has_split[i])
+ bitmap_set(bmp, start, max_uframe_usecs[i]);
+ start += max_uframe_usecs[i];
+ }
+ } else {
+ bmp = hsotg->periodic_bitmap;
+ }
- start = bitmap_find_next_zero_area(hsotg->periodic_bitmap,
+ start = bitmap_find_next_zero_area(bmp,
TOTAL_PERIODIC_USEC, 0, utime, 0);
if (start >= TOTAL_PERIODIC_USEC) {
dwc2_sch_dbg(hsotg, "%s: failed to assign %d us\n",
@@ -386,6 +409,13 @@ static int dwc2_find_multi_uframe(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
bitmap_set(hsotg->periodic_bitmap, start, qh->usecs);
qh->start_usecs = start;
+ if (qh->do_split) {
+ for (i = start / EARLY_FRAME_USEC;
+ i < DIV_ROUND_UP(start + utime - 1, EARLY_FRAME_USEC);
+ i++)
+ hsotg->has_split[i] = true;
+ }
+
dwc2_sch_dbg(hsotg, "%s: assigned %d us @ %d us\n",
__func__, qh->usecs, qh->start_usecs);
@@ -533,6 +563,7 @@ static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg,
{
int start = qh->start_usecs;
int utime = qh->usecs;
+ int i;
list_del_init(&qh->qh_list_entry);
@@ -546,6 +577,13 @@ static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg,
}
bitmap_clear(hsotg->periodic_bitmap, start, utime);
+
+ if (qh->do_split) {
+ for (i = start / EARLY_FRAME_USEC;
+ i < DIV_ROUND_UP(start + utime - 1, EARLY_FRAME_USEC);
+ i++)
+ hsotg->has_split[i] = false;
+ }
}
/**
--
2.6.0.rc2.230.g3dd15c0
We'd like to be able to use HCD_BH in order to speed up the dwc2 host
interrupt handler quite a bit. However, according to the kernel doc for
usb_submit_urb() (specifically the part about "Reserved Bandwidth
Transfers"), we need to keep a reservation active as long as a device
driver keeps submitting. That was easy to do when we gave back the URB
in the interrupt context: we just looked at when our queue was empty and
released the reserved bandwidth then. ...but now we need a little more
complexity.
We'll follow EHCI's lead in commit 9118f9eb4f1e ("USB: EHCI: improve
interrupt qh unlink") and add a 5ms delay. Since we don't have a whole
timer infrastructure in dwc2, we'll just add a timer per QH. The
overhead for this is very small.
Signed-off-by: Douglas Anderson <[email protected]>
---
Changes in v3:
- Moved periodic bandwidth release delay patch later in the series.
Changes in v2:
- Periodic bandwidth release delay new for V2
drivers/usb/dwc2/hcd.h | 6 ++
drivers/usb/dwc2/hcd_queue.c | 252 ++++++++++++++++++++++++++++++++-----------
2 files changed, 193 insertions(+), 65 deletions(-)
diff --git a/drivers/usb/dwc2/hcd.h b/drivers/usb/dwc2/hcd.h
index ff99cb44c89d..2d9bb45c1777 100644
--- a/drivers/usb/dwc2/hcd.h
+++ b/drivers/usb/dwc2/hcd.h
@@ -211,6 +211,7 @@ enum dwc2_transaction_type {
/**
* struct dwc2_qh - Software queue head structure
*
+ * @hsotg: The HCD state structure for the DWC OTG controller
* @ep_type: Endpoint type. One of the following values:
* - USB_ENDPOINT_XFER_CONTROL
* - USB_ENDPOINT_XFER_BULK
@@ -250,13 +251,16 @@ enum dwc2_transaction_type {
* @n_bytes: Xfer Bytes array. Each element corresponds to a transfer
* descriptor and indicates original XferSize value for the
* descriptor
+ * @unreserve_timer: Timer for releasing periodic reservation.
* @tt_buffer_dirty True if clear_tt_buffer_complete is pending
+ * @unreserve_pending: True if we planned to unreserve but haven't yet.
*
* A Queue Head (QH) holds the static characteristics of an endpoint and
* maintains a list of transfers (QTDs) for that endpoint. A QH structure may
* be entered in either the non-periodic or periodic schedule.
*/
struct dwc2_qh {
+ struct dwc2_hsotg *hsotg;
u8 ep_type;
u8 ep_is_in;
u16 maxp;
@@ -279,7 +283,9 @@ struct dwc2_qh {
struct dwc2_hcd_dma_desc *desc_list;
dma_addr_t desc_list_dma;
u32 *n_bytes;
+ struct timer_list unreserve_timer;
unsigned tt_buffer_dirty:1;
+ unsigned unreserve_pending:1;
};
/**
diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
index c5a2edb04bec..a287e52f9a63 100644
--- a/drivers/usb/dwc2/hcd_queue.c
+++ b/drivers/usb/dwc2/hcd_queue.c
@@ -53,6 +53,101 @@
#include "core.h"
#include "hcd.h"
+/* Wait this long before releasing periodic reservation */
+#define DWC2_UNRESERVE_DELAY (msecs_to_jiffies(5))
+
+/**
+ * dwc2_do_unreserve() - Actually release the periodic reservation
+ *
+ * This function actually releases the periodic bandwidth that was reserved
+ * by the given qh.
+ *
+ * @hsotg: The HCD state structure for the DWC OTG controller
+ * @qh: QH for the periodic transfer.
+ */
+static void dwc2_do_unreserve(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
+{
+ int start = qh->start_usecs;
+ int utime = qh->usecs;
+ int i;
+
+ assert_spin_locked(&hsotg->lock);
+
+ WARN_ON(!qh->unreserve_pending);
+
+ /* No more unreserve pending--we're doing it */
+ qh->unreserve_pending = false;
+
+ if (WARN_ON(!list_empty(&qh->qh_list_entry)))
+ list_del_init(&qh->qh_list_entry);
+
+ /* Update claimed usecs per (micro)frame */
+ hsotg->periodic_usecs -= qh->usecs;
+
+ /* Release periodic channel reservation */
+ if (hsotg->core_params->uframe_sched <= 0) {
+ hsotg->periodic_channels--;
+ return;
+ }
+
+ bitmap_clear(hsotg->periodic_bitmap, start, utime);
+
+ if (qh->do_split) {
+ for (i = start / EARLY_FRAME_USEC;
+ i < DIV_ROUND_UP(start + utime - 1, EARLY_FRAME_USEC);
+ i++)
+ hsotg->has_split[i] = false;
+ }
+}
+
+/**
+ * dwc2_unreserve_timer_fn() - Timer function to release periodic reservation
+ *
+ * According to the kernel doc for usb_submit_urb() (specifically the part about
+ * "Reserved Bandwidth Transfers"), we need to keep a reservation active as
+ * long as a device driver keeps submitting. Since we're using HCD_BH to give
+ * back the URB we need to give the driver a little bit of time before we
+ * release the reservation. This worker is called after the appropriate
+ * delay.
+ *
+ * @work: Pointer to a qh unreserve_work.
+ */
+static void dwc2_unreserve_timer_fn(unsigned long data)
+{
+ struct dwc2_qh *qh = (struct dwc2_qh *)data;
+ struct dwc2_hsotg *hsotg = qh->hsotg;
+ unsigned long flags;
+
+ /*
+ * Wait for the lock, or for us to be scheduled again. We
+ * could be scheduled again if:
+ * - We started executing but didn't get the lock yet.
+ * - A new reservation came in, but cancel didn't take effect
+ * because we already started executing.
+ * - The timer has been kicked again.
+ * In that case cancel and wait for the next call.
+ */
+ while (!spin_trylock_irqsave(&hsotg->lock, flags)) {
+ if (timer_pending(&qh->unreserve_timer))
+ return;
+ }
+
+ /*
+ * Might be no more unreserve pending if:
+ * - We started executing but didn't get the lock yet.
+ * - A new reservation came in, but cancel didn't take effect
+ * because we already started executing.
+ *
+ * We can't put this in the loop above because unreserve_pending needs
+ * to be accessed under lock, so we can only check it once we got the
+ * lock.
+ */
+ if (qh->unreserve_pending)
+ dwc2_do_unreserve(hsotg, qh);
+
+ spin_unlock_irqrestore(&hsotg->lock, flags);
+}
+
/**
* dwc2_qh_init() - Initializes a QH structure
*
@@ -71,6 +166,9 @@ static void dwc2_qh_init(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh,
dev_vdbg(hsotg->dev, "%s()\n", __func__);
/* Initialize QH */
+ qh->hsotg = hsotg;
+ setup_timer(&qh->unreserve_timer, dwc2_unreserve_timer_fn,
+ (unsigned long)qh);
qh->ep_type = dwc2_hcd_get_pipe_type(&urb->pipe_info);
qh->ep_is_in = dwc2_hcd_is_pipe_in(&urb->pipe_info) ? 1 : 0;
@@ -240,6 +338,15 @@ struct dwc2_qh *dwc2_hcd_qh_create(struct dwc2_hsotg *hsotg,
*/
void dwc2_hcd_qh_free(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
{
+ /* Make sure any unreserve work is finished. */
+ if (del_timer_sync(&qh->unreserve_timer)) {
+ unsigned long flags;
+
+ spin_lock_irqsave(&hsotg->lock, flags);
+ dwc2_do_unreserve(hsotg, qh);
+ spin_unlock_irqrestore(&hsotg->lock, flags);
+ }
+
if (hsotg->core_params->dma_desc_enable > 0)
dwc2_hcd_qh_free_ddma(hsotg, qh);
kfree(qh);
@@ -483,55 +590,76 @@ static int dwc2_schedule_periodic(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
{
int status;
- if (hsotg->core_params->uframe_sched > 0) {
- int frame = -1;
+ status = dwc2_check_max_xfer_size(hsotg, qh);
+ if (status) {
+ dev_dbg(hsotg->dev,
+ "%s: Channel max transfer size too small for periodic transfer\n",
+ __func__);
+ return status;
+ }
- status = dwc2_find_uframe(hsotg, qh);
- if (status == 0)
- frame = 7;
- else if (status > 0)
- frame = status - 1;
+ /* Cancel pending unreserve; if canceled OK, unreserve was pending */
+ if (del_timer(&qh->unreserve_timer))
+ WARN_ON(!qh->unreserve_pending);
- /* Set the new frame up */
- if (frame >= 0) {
- qh->sched_uframe = frame;
+ /*
+ * Only need to reserve if there's not an unreserve pending, since if an
+ * unreserve is pending then by definition our old reservation is still
+ * valid. Unreserve might still be pending even if we didn't cancel if
+ * dwc2_unreserve_timer_fn() already started. Code in the timer handles
+ * that case.
+ */
+ if (!qh->unreserve_pending) {
+ if (hsotg->core_params->uframe_sched > 0) {
+ int frame = -1;
+
+ status = dwc2_find_uframe(hsotg, qh);
+ if (status == 0)
+ frame = 7;
+ else if (status > 0)
+ frame = status - 1;
+
+ /* Set the new frame up */
+ if (frame >= 0) {
+ qh->sched_uframe = frame;
+ qh->sched_frame = ((qh->sched_frame + 7) & ~7) |
+ frame;
+ dwc2_sch_dbg(hsotg,
+ "sched_p %p sch=%04x, uf=%d, us=%d\n",
+ qh, qh->sched_frame, frame,
+ qh->start_usecs);
+ }
- /* The next frame that matches our scheduled uframe */
- qh->sched_frame = ((qh->sched_frame + 7) & ~7) | frame;
- dwc2_sch_dbg(hsotg,
- "sched_p %p sch=%04x, uf=%d, us=%d\n",
- qh, qh->sched_frame, frame,
- qh->start_usecs);
+ if (status > 0)
+ status = 0;
+ } else {
+ status = dwc2_periodic_channel_available(hsotg);
+ if (status) {
+ dev_info(hsotg->dev,
+ "%s: No host channel available for periodic transfer\n",
+ __func__);
+ return status;
+ }
+
+ status = dwc2_check_periodic_bandwidth(hsotg, qh);
}
- if (status > 0)
- status = 0;
- } else {
- status = dwc2_periodic_channel_available(hsotg);
if (status) {
- dev_info(hsotg->dev,
- "%s: No host channel available for periodic transfer\n",
- __func__);
+ dev_dbg(hsotg->dev,
+ "%s: Insufficient periodic bandwidth for periodic transfer\n",
+ __func__);
return status;
}
- status = dwc2_check_periodic_bandwidth(hsotg, qh);
- }
+ if (hsotg->core_params->uframe_sched <= 0)
+ /* Reserve periodic channel */
+ hsotg->periodic_channels++;
- if (status) {
- dev_dbg(hsotg->dev,
- "%s: Insufficient periodic bandwidth for periodic transfer\n",
- __func__);
- return status;
+ /* Update claimed usecs per (micro)frame */
+ hsotg->periodic_usecs += qh->usecs;
}
- status = dwc2_check_max_xfer_size(hsotg, qh);
- if (status) {
- dev_dbg(hsotg->dev,
- "%s: Channel max transfer size too small for periodic transfer\n",
- __func__);
- return status;
- }
+ qh->unreserve_pending = 0;
if (hsotg->core_params->dma_desc_enable > 0)
/* Don't rely on SOF and start in ready schedule */
@@ -541,13 +669,6 @@ static int dwc2_schedule_periodic(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
list_add_tail(&qh->qh_list_entry,
&hsotg->periodic_sched_inactive);
- if (hsotg->core_params->uframe_sched <= 0)
- /* Reserve periodic channel */
- hsotg->periodic_channels++;
-
- /* Update claimed usecs per (micro)frame */
- hsotg->periodic_usecs += qh->usecs;
-
return status;
}
@@ -561,29 +682,31 @@ static int dwc2_schedule_periodic(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg,
struct dwc2_qh *qh)
{
- int start = qh->start_usecs;
- int utime = qh->usecs;
- int i;
+ bool did_modify;
- list_del_init(&qh->qh_list_entry);
+ assert_spin_locked(&hsotg->lock);
- /* Update claimed usecs per (micro)frame */
- hsotg->periodic_usecs -= qh->usecs;
-
- if (hsotg->core_params->uframe_sched <= 0) {
- /* Release periodic channel reservation */
- hsotg->periodic_channels--;
- return;
- }
-
- bitmap_clear(hsotg->periodic_bitmap, start, utime);
+ /*
+ * Schedule the unreserve to happen in a little bit. Cases here:
+ * - Unreserve worker might be sitting there waiting to grab the lock.
+ * In this case it will notice it's been schedule again and will
+ * quit.
+ * - Unreserve worker might not be scheduled.
+ *
+ * We should never already be scheduled since dwc2_schedule_periodic()
+ * should have canceled the scheduled unreserve timer (hence the
+ * warning on did_modify).
+ *
+ * We add + 1 to the timer to guarantee that at least 1 jiffy has
+ * passed (otherwise if the jiffy counter might tick right after we
+ * read it and we'll get no delay).
+ */
+ did_modify = mod_timer(&qh->unreserve_timer,
+ jiffies + DWC2_UNRESERVE_DELAY + 1);
+ WARN_ON(did_modify);
+ qh->unreserve_pending = 1;
- if (qh->do_split) {
- for (i = start / EARLY_FRAME_USEC;
- i < DIV_ROUND_UP(start + utime - 1, EARLY_FRAME_USEC);
- i++)
- hsotg->has_split[i] = false;
- }
+ list_del_init(&qh->qh_list_entry);
}
/**
@@ -710,7 +833,6 @@ static void dwc2_sched_periodic_split(struct dwc2_hsotg *hsotg,
qh->sched_frame = frame_number;
if (hsotg->core_params->uframe_sched > 0)
- /* The next frame that matches our scheduled uframe */
qh->sched_frame = ((qh->sched_frame + 7) & ~7) |
qh->sched_uframe;
else
--
2.6.0.rc2.230.g3dd15c0
In commit 94dfd7edfd5c ("USB: HCD: support giveback of URB in tasklet
context") support was added to give back the URB in tasklet context.
Let's take advantage of this in dwc2.
This speeds up the dwc2 interrupt handler considerably.
Note that this requires the change ("usb: dwc2: host: Add a delay before
releasing periodic bandwidth") to come first.
Signed-off-by: Douglas Anderson <[email protected]>
Tested-by: Heiko Stuebner <[email protected]>
---
Changes in v3: None
Changes in v2:
- Commit message now says that URB giveback change needs delay change.
drivers/usb/dwc2/hcd.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index 803992985935..5fc86aad542e 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -2266,9 +2266,7 @@ void dwc2_host_complete(struct dwc2_hsotg *hsotg, struct dwc2_qtd *qtd,
kfree(qtd->urb);
qtd->urb = NULL;
- spin_unlock(&hsotg->lock);
usb_hcd_giveback_urb(dwc2_hsotg_to_hcd(hsotg), urb, status);
- spin_lock(&hsotg->lock);
}
/*
@@ -2881,7 +2879,7 @@ static struct hc_driver dwc2_hc_driver = {
.hcd_priv_size = sizeof(struct wrapper_priv_data),
.irq = _dwc2_hcd_irq,
- .flags = HCD_MEMORY | HCD_USB2,
+ .flags = HCD_MEMORY | HCD_USB2 | HCD_BH,
.start = _dwc2_hcd_start,
.stop = _dwc2_hcd_stop,
--
2.6.0.rc2.230.g3dd15c0
Hi,
On Mon, Nov 16, 2015 at 4:51 PM, Douglas Anderson <[email protected]> wrote:
> Until we have logic to determine which devices share the same TT let's
> add logic to assume that all devices on a given dwc2 controller are on
> one single_tt hub. This is better than the previous code that assumed
> that all devices were on one multi_tt hub, since single_tt hubs
> appear (in my experience) to be much more common and any schedule that
> would work on a single_tt hub will also work on a multi_tt hub. This
> will prevent more than 8 total low/full speed devices to be on the bus
> at one time, but that's a reasonable restriction until we've made things
> smarter.
>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
> Changes in v3:
> - Assuming single_tt is new for v3; not terribly well tested (yet).
>
> Changes in v2: None
>
> drivers/usb/dwc2/core.h | 1 +
> drivers/usb/dwc2/hcd_queue.c | 40 +++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 40 insertions(+), 1 deletion(-)
Just as a FYI I managed to make this a little better with
<http://crosreview.com/312981>, but not posting that yet because
Julius has pointed out some things offline that I could be doing
better (actually schedule the low speed bus properly). I'll hopefully
post something more soon...
-Doug
Hi,
Douglas Anderson <[email protected]> writes:
> Until we have logic to determine which devices share the same TT let's
> add logic to assume that all devices on a given dwc2 controller are on
> one single_tt hub. This is better than the previous code that assumed
> that all devices were on one multi_tt hub, since single_tt hubs
> appear (in my experience) to be much more common and any schedule that
> would work on a single_tt hub will also work on a multi_tt hub. This
> will prevent more than 8 total low/full speed devices to be on the bus
> at one time, but that's a reasonable restriction until we've made things
> smarter.
>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
> Changes in v3:
> - Assuming single_tt is new for v3; not terribly well tested (yet).
>
> Changes in v2: None
>
> drivers/usb/dwc2/core.h | 1 +
> drivers/usb/dwc2/hcd_queue.c | 40 +++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
> index 567ee2c9e69f..09aa2b5ae29e 100644
> --- a/drivers/usb/dwc2/core.h
> +++ b/drivers/usb/dwc2/core.h
> @@ -782,6 +782,7 @@ struct dwc2_hsotg {
> u16 periodic_usecs;
> unsigned long periodic_bitmap[DIV_ROUND_UP(TOTAL_PERIODIC_USEC,
> BITS_PER_LONG)];
> + bool has_split[8];
why don't you use a u8 instead then just set each bit for each
"has_split" you need to take care of. This array is kinda ugly.
> @@ -386,6 +409,13 @@ static int dwc2_find_multi_uframe(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
> bitmap_set(hsotg->periodic_bitmap, start, qh->usecs);
> qh->start_usecs = start;
>
> + if (qh->do_split) {
> + for (i = start / EARLY_FRAME_USEC;
> + i < DIV_ROUND_UP(start + utime - 1, EARLY_FRAME_USEC);
> + i++)
> + hsotg->has_split[i] = true;
hsotg->has_split |= BIT(i);
> @@ -546,6 +577,13 @@ static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg,
> }
>
> bitmap_clear(hsotg->periodic_bitmap, start, utime);
> +
> + if (qh->do_split) {
> + for (i = start / EARLY_FRAME_USEC;
> + i < DIV_ROUND_UP(start + utime - 1, EARLY_FRAME_USEC);
> + i++)
> + hsotg->has_split[i] = false;
hsotg->has_split &= ~BIT(i);
--
balbi
Felipe,
On Thu, Nov 19, 2015 at 7:34 AM, Felipe Balbi <[email protected]> wrote:
>
> Hi,
>
> Douglas Anderson <[email protected]> writes:
>> Until we have logic to determine which devices share the same TT let's
>> add logic to assume that all devices on a given dwc2 controller are on
>> one single_tt hub. This is better than the previous code that assumed
>> that all devices were on one multi_tt hub, since single_tt hubs
>> appear (in my experience) to be much more common and any schedule that
>> would work on a single_tt hub will also work on a multi_tt hub. This
>> will prevent more than 8 total low/full speed devices to be on the bus
>> at one time, but that's a reasonable restriction until we've made things
>> smarter.
>>
>> Signed-off-by: Douglas Anderson <[email protected]>
>> ---
>> Changes in v3:
>> - Assuming single_tt is new for v3; not terribly well tested (yet).
>>
>> Changes in v2: None
>>
>> drivers/usb/dwc2/core.h | 1 +
>> drivers/usb/dwc2/hcd_queue.c | 40 +++++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 40 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
>> index 567ee2c9e69f..09aa2b5ae29e 100644
>> --- a/drivers/usb/dwc2/core.h
>> +++ b/drivers/usb/dwc2/core.h
>> @@ -782,6 +782,7 @@ struct dwc2_hsotg {
>> u16 periodic_usecs;
>> unsigned long periodic_bitmap[DIV_ROUND_UP(TOTAL_PERIODIC_USEC,
>> BITS_PER_LONG)];
>> + bool has_split[8];
>
> why don't you use a u8 instead then just set each bit for each
> "has_split" you need to take care of. This array is kinda ugly.
Let's actually drop this patch completely. Julius and I sat down and
he talked me through things, and with my current understanding the
current microframe scheduler in dwc2 is broken enough that small
little band-aids like this will do little more than just push the
problems around.
I'm a good portion of the way through a better microframe scheduler.
I have no doubt that it won't be perfect, but hopefully it will at
least be based in reality...
My latest thinking on the patches in this series:
1. usb: dwc2: rockchip: Make the max_transfer_size automatic
I'll probably separate this out into its own patch so I can stop
sending it as part of this series. ...or if someone wanted to land it
then I won't bother.
2. usb: dwc2: host: Get aligned DMA in a more supported way
Still can land any time and has good benefits. I believe that I can't
separate this because it will cause conflicts with scheduler patches.
3. usb: dwc2: host: Add scheduler tracing
Would be nice to land.
4. usb: dwc2: host: Rewrite the microframe scheduler
5. usb: dwc2: host: Keep track of and use our scheduled microframe
6. usb: dwc2: host: Assume all devices are on one single_tt hub
Please drop patches 4-6 right now.
7. usb: dwc2: host: Add a delay before releasing periodic bandwidth
8. usb: dwc2: host: Giveback URB in tasklet context
I'll probably move these back up in the series (like in v2) and put
microframe rewrite atop them. With my current understanding the
scheduling is so broken today that the concerns Alan brought up can
wait until we have a proper scheduler to be addressed. In the
meantime getting the huge boost in interrupt speed will help with
dwc2's correctness (and performance) because it means we're much less
likely to miss SOF interrupts.
If anyone has any review time, giving a review to v2 of these patches
would be helpful. Otherwise I'll double check that v2 still looks
good with my current understanding and eventually post them again.
-Doug
Hi,
Doug Anderson <[email protected]> writes:
>> Douglas Anderson <[email protected]> writes:
>>> Until we have logic to determine which devices share the same TT let's
>>> add logic to assume that all devices on a given dwc2 controller are on
>>> one single_tt hub. This is better than the previous code that assumed
>>> that all devices were on one multi_tt hub, since single_tt hubs
>>> appear (in my experience) to be much more common and any schedule that
>>> would work on a single_tt hub will also work on a multi_tt hub. This
>>> will prevent more than 8 total low/full speed devices to be on the bus
>>> at one time, but that's a reasonable restriction until we've made things
>>> smarter.
>>>
>>> Signed-off-by: Douglas Anderson <[email protected]>
>>> ---
>>> Changes in v3:
>>> - Assuming single_tt is new for v3; not terribly well tested (yet).
>>>
>>> Changes in v2: None
>>>
>>> drivers/usb/dwc2/core.h | 1 +
>>> drivers/usb/dwc2/hcd_queue.c | 40 +++++++++++++++++++++++++++++++++++++++-
>>> 2 files changed, 40 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
>>> index 567ee2c9e69f..09aa2b5ae29e 100644
>>> --- a/drivers/usb/dwc2/core.h
>>> +++ b/drivers/usb/dwc2/core.h
>>> @@ -782,6 +782,7 @@ struct dwc2_hsotg {
>>> u16 periodic_usecs;
>>> unsigned long periodic_bitmap[DIV_ROUND_UP(TOTAL_PERIODIC_USEC,
>>> BITS_PER_LONG)];
>>> + bool has_split[8];
>>
>> why don't you use a u8 instead then just set each bit for each
>> "has_split" you need to take care of. This array is kinda ugly.
>
> Let's actually drop this patch completely. Julius and I sat down and
> he talked me through things, and with my current understanding the
> current microframe scheduler in dwc2 is broken enough that small
> little band-aids like this will do little more than just push the
> problems around.
>
> I'm a good portion of the way through a better microframe scheduler.
> I have no doubt that it won't be perfect, but hopefully it will at
> least be based in reality...
>
> My latest thinking on the patches in this series:
>
> 1. usb: dwc2: rockchip: Make the max_transfer_size automatic
>
> I'll probably separate this out into its own patch so I can stop
> sending it as part of this series. ...or if someone wanted to land it
> then I won't bother.
>
>
> 2. usb: dwc2: host: Get aligned DMA in a more supported way
>
> Still can land any time and has good benefits. I believe that I can't
> separate this because it will cause conflicts with scheduler patches.
>
>
> 3. usb: dwc2: host: Add scheduler tracing
>
> Would be nice to land.
>
>
> 4. usb: dwc2: host: Rewrite the microframe scheduler
> 5. usb: dwc2: host: Keep track of and use our scheduled microframe
> 6. usb: dwc2: host: Assume all devices are on one single_tt hub
>
> Please drop patches 4-6 right now.
>
>
> 7. usb: dwc2: host: Add a delay before releasing periodic bandwidth
> 8. usb: dwc2: host: Giveback URB in tasklet context
if you can, it's best to send a new series with the changes. This makes
mine and John's life a lot easier :-)
> I'll probably move these back up in the series (like in v2) and put
> microframe rewrite atop them. With my current understanding the
> scheduling is so broken today that the concerns Alan brought up can
> wait until we have a proper scheduler to be addressed. In the
> meantime getting the huge boost in interrupt speed will help with
> dwc2's correctness (and performance) because it means we're much less
> likely to miss SOF interrupts.
>
> If anyone has any review time, giving a review to v2 of these patches
> would be helpful. Otherwise I'll double check that v2 still looks
> good with my current understanding and eventually post them again.
That would have to be someone experienced with that IP. I don't even
have docs :-)
--
balbi
On 11/19/2015 8:27 AM, Doug Anderson wrote:
> Felipe,
>
> On Thu, Nov 19, 2015 at 7:34 AM, Felipe Balbi <[email protected]> wrote:
>>
>> Hi,
>>
>> Douglas Anderson <[email protected]> writes:
>>> Until we have logic to determine which devices share the same TT let's
>>> add logic to assume that all devices on a given dwc2 controller are on
>>> one single_tt hub. This is better than the previous code that assumed
>>> that all devices were on one multi_tt hub, since single_tt hubs
>>> appear (in my experience) to be much more common and any schedule that
>>> would work on a single_tt hub will also work on a multi_tt hub. This
>>> will prevent more than 8 total low/full speed devices to be on the bus
>>> at one time, but that's a reasonable restriction until we've made things
>>> smarter.
>>>
>>> Signed-off-by: Douglas Anderson <[email protected]>
>>> ---
>>> Changes in v3:
>>> - Assuming single_tt is new for v3; not terribly well tested (yet).
>>>
>>> Changes in v2: None
>>>
>>> drivers/usb/dwc2/core.h | 1 +
>>> drivers/usb/dwc2/hcd_queue.c | 40 +++++++++++++++++++++++++++++++++++++++-
>>> 2 files changed, 40 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
>>> index 567ee2c9e69f..09aa2b5ae29e 100644
>>> --- a/drivers/usb/dwc2/core.h
>>> +++ b/drivers/usb/dwc2/core.h
>>> @@ -782,6 +782,7 @@ struct dwc2_hsotg {
>>> u16 periodic_usecs;
>>> unsigned long periodic_bitmap[DIV_ROUND_UP(TOTAL_PERIODIC_USEC,
>>> BITS_PER_LONG)];
>>> + bool has_split[8];
>>
>> why don't you use a u8 instead then just set each bit for each
>> "has_split" you need to take care of. This array is kinda ugly.
>
> Let's actually drop this patch completely. Julius and I sat down and
> he talked me through things, and with my current understanding the
> current microframe scheduler in dwc2 is broken enough that small
> little band-aids like this will do little more than just push the
> problems around.
>
> I'm a good portion of the way through a better microframe scheduler.
> I have no doubt that it won't be perfect, but hopefully it will at
> least be based in reality...
>
> My latest thinking on the patches in this series:
>
> 1. usb: dwc2: rockchip: Make the max_transfer_size automatic
>
> I'll probably separate this out into its own patch so I can stop
> sending it as part of this series. ...or if someone wanted to land it
> then I won't bother.
>
>
> 2. usb: dwc2: host: Get aligned DMA in a more supported way
>
> Still can land any time and has good benefits. I believe that I can't
> separate this because it will cause conflicts with scheduler patches.
>
>
> 3. usb: dwc2: host: Add scheduler tracing
>
> Would be nice to land.
>
>
> 4. usb: dwc2: host: Rewrite the microframe scheduler
> 5. usb: dwc2: host: Keep track of and use our scheduled microframe
> 6. usb: dwc2: host: Assume all devices are on one single_tt hub
>
> Please drop patches 4-6 right now.
>
>
> 7. usb: dwc2: host: Add a delay before releasing periodic bandwidth
> 8. usb: dwc2: host: Giveback URB in tasklet context
>
> I'll probably move these back up in the series (like in v2) and put
> microframe rewrite atop them. With my current understanding the
> scheduling is so broken today that the concerns Alan brought up can
> wait until we have a proper scheduler to be addressed. In the
> meantime getting the huge boost in interrupt speed will help with
> dwc2's correctness (and performance) because it means we're much less
> likely to miss SOF interrupts.
>
> If anyone has any review time, giving a review to v2 of these patches
> would be helpful. Otherwise I'll double check that v2 still looks
> good with my current understanding and eventually post them again.
>
> -Doug
>
Hi Doug,
Patches 1-3:
Acked-by: John Youn <[email protected]>
Patch 2:
Tested-by: John Youn <[email protected]>
Tested on core version 3.20 using internal TE for un-aligned
buffers.
I haven't had time to look into the scheduling patches yet. But I
agree with you that there are fundamental problems. I'll await
your rewrite.
Regards,
John
John,
On Thu, Nov 19, 2015 at 8:33 PM, John Youn <[email protected]> wrote:
> Patches 1-3:
> Acked-by: John Youn <[email protected]>
>
> Patch 2:
> Tested-by: John Youn <[email protected]>
>
> Tested on core version 3.20 using internal TE for un-aligned
> buffers.
>
> I haven't had time to look into the scheduling patches yet. But I
> agree with you that there are fundamental problems. I'll await
> your rewrite.
Thanks! I'm going to attempt to at least get things reposted with
your tags before Thanksgiving. I'm hopeful that I'll actually be able
to post a RFC patch for the microframe rewrite by then, too! :)
I have something that seems to be sane at
<https://chromium-review.googlesource.com/#/c/313808/>, but it
obviously still needs a lot of cleanup. I also need to pick all of
the things that have landed on Felipe's tree recently and rebase atop
those.
-Doug
Hi,
On Mon, Nov 23, 2015 at 4:28 PM, Doug Anderson <[email protected]> wrote:
> John,
>
> On Thu, Nov 19, 2015 at 8:33 PM, John Youn <[email protected]> wrote:
>> Patches 1-3:
>> Acked-by: John Youn <[email protected]>
>>
>> Patch 2:
>> Tested-by: John Youn <[email protected]>
>>
>> Tested on core version 3.20 using internal TE for un-aligned
>> buffers.
>>
>> I haven't had time to look into the scheduling patches yet. But I
>> agree with you that there are fundamental problems. I'll await
>> your rewrite.
>
> Thanks! I'm going to attempt to at least get things reposted with
> your tags before Thanksgiving. I'm hopeful that I'll actually be able
> to post a RFC patch for the microframe rewrite by then, too! :)
>
> I have something that seems to be sane at
> <https://chromium-review.googlesource.com/#/c/313808/>, but it
> obviously still needs a lot of cleanup. I also need to pick all of
> the things that have landed on Felipe's tree recently and rebase atop
> those.
I didn't succeed at getting something postable. :( I could easily
repost some of the earlier patches (like fixing aligned memory, etc)
if there's a desire for those, but I figured I'd wait until I had
something postable for the microframe scheduler.
I've definitely made progress in the last few days and things are a
bit cleaner at <https://chromium-review.googlesource.com/#/c/313808/3>.
It's also working reasonably well, but I'd still like to understand
some of the behavior I'm seeing before I actually post it to the
mailing list.
I'm going to be away from my computer for about a week, so I'll plan
to pick this up again when I'm back...
-Doug