2022-10-17 20:57:47

by Dan Vacura

Subject: [PATCH v3 0/6] uvc gadget performance issues

Hello uvc gadget developers,

V3 series updated with fixes for the two issues discussed below, plus
fixes for the configfs interrupt patch.

The V2 series added patches that allow these performance features to be
disabled from userspace for devices that don't work well with the UDC hw,
i.e. dwc3 in this case. It also updated the comments for the v1 patch.

Original note:

I'm working on a 5.15.41 based kernel on a qcom chipset with the dwc3
controller and I'm encountering two problems related to the recent performance
improvement changes:

https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/ and
https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/

If I revert these two changes, then I have much improved stability and a
transmission problem I'm seeing is gone. Has there been any success from
others on 5.15 with this uvc improvement and any recommendations for my
current problems? Those being:

1) An smmu panic, snippet here:

<3>[  718.314900][  T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
<3>[  718.314994][  T803] arm-smmu 15000000.apps-smmu: FAR    = 0x00000000efe60800
<3>[  718.315023][  T803] arm-smmu 15000000.apps-smmu: PAR    = 0x0000000000000000
<3>[  718.315048][  T803] arm-smmu 15000000.apps-smmu: FSR    = 0x40000402 [TF R SS ]
<3>[  718.315074][  T803] arm-smmu 15000000.apps-smmu: FSYNR0    = 0x5f0003
<3>[  718.315096][  T803] arm-smmu 15000000.apps-smmu: FSYNR1    = 0xaa02
<3>[  718.315117][  T803] arm-smmu 15000000.apps-smmu: context bank#    = 0x1b
<3>[  718.315141][  T803] arm-smmu 15000000.apps-smmu: TTBR0  = 0x001b0000c2a92000
<3>[  718.315165][  T803] arm-smmu 15000000.apps-smmu: TTBR1  = 0x001b000000000000
<3>[  718.315192][  T803] arm-smmu 15000000.apps-smmu: SCTLR  = 0x0a5f00e7 ACTLR  = 0x00000003
<3>[  718.315245][  T803] arm-smmu 15000000.apps-smmu: CBAR  = 0x0001f300
<3>[  718.315274][  T803] arm-smmu 15000000.apps-smmu: MAIR0   = 0xf404ff44 MAIR1   = 0x0000efe4
<3>[  718.315297][  T803] arm-smmu 15000000.apps-smmu: SID = 0x40
<3>[  718.315318][  T803] arm-smmu 15000000.apps-smmu: Client info: BID=0x5, PID=0xa, MID=0x2
<3>[  718.315377][  T803] arm-smmu 15000000.apps-smmu: soft iova-to-phys=0x0000000000000000

I can reduce this panic with the proposed patch, but it still happens until I
disable the "req->no_interrupt = 1" logic.

2) The frame is not fully transmitted in dwc3 with sg support enabled.

There seems to be a mapping limit: only roughly the first 70% of the
total frame is sent. Interestingly, if I allocate a larger buffer
upfront in uvc_queue_setup(), e.g. sizes[0] = video->imagesize * 3,
then the issue rarely happens. For example, with YUYV I see green,
uninitialized data in the bottom part of the frame. With MJPG and its
smaller filled sizes, the transmission is fine.

+-------------------------+
|                         |
|                         |
|                         |
|        Good data        |
|                         |
|                         |
|                         |
+-------------------------+
|xxxxxxxxxxxxxxxxxxxxxxxxx|
|xxxx  Bad data  xxxxxxxxx|
|xxxxxxxxxxxxxxxxxxxxxxxxx|
+-------------------------+


Dan Vacura (4):
  usb: gadget: uvc: fix dropped frame after missed isoc
  usb: gadget: uvc: fix sg handling in error case
  usb: gadget: uvc: make interrupt skip logic configurable
  usb: gadget: uvc: add configfs option for sg support

Jeff Vanhoof (2):
  usb: dwc3: gadget: cancel requests instead of release after missed
    isoc
  usb: gadget: uvc: fix sg handling during video encode

.../ABI/testing/configfs-usb-gadget-uvc | 2 +
Documentation/usb/gadget-testing.rst | 4 ++
drivers/usb/dwc3/core.h | 1 +
drivers/usb/dwc3/gadget.c | 38 +++++++++++++------
drivers/usb/gadget/function/f_uvc.c | 5 +++
drivers/usb/gadget/function/u_uvc.h | 2 +
drivers/usb/gadget/function/uvc.h | 2 +
drivers/usb/gadget/function/uvc_configfs.c | 4 ++
drivers/usb/gadget/function/uvc_queue.c | 18 ++++++---
drivers/usb/gadget/function/uvc_video.c | 27 +++++++++----
10 files changed, 78 insertions(+), 25 deletions(-)

--
2.34.1


2022-10-17 20:58:00

by Dan Vacura

Subject: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

From: Jeff Vanhoof <[email protected]>

arm-smmu related crashes seen after a Missed ISOC interrupt when
no_interrupt=1 is used. This can happen if the hardware is still using
the data associated with a TRB after the usb_request's ->complete call
has been made. Instead of immediately releasing a request when a Missed
ISOC interrupt has occurred, this change will add logic to cancel the
request instead where it will eventually be released when the
END_TRANSFER command has completed. This logic is similar to some of the
cleanup done in dwc3_gadget_ep_dequeue.

Fixes: 6d8a019614f3 ("usb: dwc3: gadget: check for Missed Isoc from event status")
Cc: <[email protected]>
Signed-off-by: Jeff Vanhoof <[email protected]>
Co-developed-by: Dan Vacura <[email protected]>
Signed-off-by: Dan Vacura <[email protected]>
---
V1 -> V3:
- no change, new patch in series

drivers/usb/dwc3/core.h | 1 +
drivers/usb/dwc3/gadget.c | 38 ++++++++++++++++++++++++++------------
2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 8f9959ba9fd4..9b005d912241 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -943,6 +943,7 @@ struct dwc3_request {
#define DWC3_REQUEST_STATUS_DEQUEUED 3
#define DWC3_REQUEST_STATUS_STALLED 4
#define DWC3_REQUEST_STATUS_COMPLETED 5
+#define DWC3_REQUEST_STATUS_MISSED_ISOC 6
#define DWC3_REQUEST_STATUS_UNKNOWN -1

u8 epnum;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 079cd333632e..411532c5c378 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2021,6 +2021,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
case DWC3_REQUEST_STATUS_STALLED:
dwc3_gadget_giveback(dep, req, -EPIPE);
break;
+ case DWC3_REQUEST_STATUS_MISSED_ISOC:
+ dwc3_gadget_giveback(dep, req, -EXDEV);
+ break;
default:
dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
dwc3_gadget_giveback(dep, req, -ECONNRESET);
@@ -3402,21 +3405,32 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
struct dwc3 *dwc = dep->dwc;
bool no_started_trb = true;

- dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
+ if (status == -EXDEV) {
+ struct dwc3_request *tmp;
+ struct dwc3_request *req;

- if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
- goto out;
+ if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
+ dwc3_stop_active_transfer(dep, true, true);

- if (!dep->endpoint.desc)
- return no_started_trb;
+ list_for_each_entry_safe(req, tmp, &dep->started_list, list)
+ dwc3_gadget_move_cancelled_request(req,
+ DWC3_REQUEST_STATUS_MISSED_ISOC);
+ } else {
+ dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);

- if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
- list_empty(&dep->started_list) &&
- (list_empty(&dep->pending_list) || status == -EXDEV))
- dwc3_stop_active_transfer(dep, true, true);
- else if (dwc3_gadget_ep_should_continue(dep))
- if (__dwc3_gadget_kick_transfer(dep) == 0)
- no_started_trb = false;
+ if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
+ goto out;
+
+ if (!dep->endpoint.desc)
+ return no_started_trb;
+
+ if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
+ list_empty(&dep->started_list) && list_empty(&dep->pending_list))
+ dwc3_stop_active_transfer(dep, true, true);
+ else if (dwc3_gadget_ep_should_continue(dep))
+ if (__dwc3_gadget_kick_transfer(dep) == 0)
+ no_started_trb = false;
+ }

out:
/*
--
2.34.1
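
The deferred giveback that this patch proposes can be sketched as a small userspace toy model. This is only an illustration of the intent described in the commit message, not the driver code: `toy_request`, `toy_cancel_started()`, `toy_cleanup_cancelled()` and `cancelled_status_to_errno()` are hypothetical names, and only the two status values and three errno mappings visible in the hunks above are reproduced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Values copied from the core.h hunk; only the ones the patch shows. */
#define DWC3_REQUEST_STATUS_STALLED      4
#define DWC3_REQUEST_STATUS_MISSED_ISOC  6

/* Hypothetical stand-in for a dwc3 request. */
struct toy_request {
	int status;	/* cancellation reason */
	int result;	/* errno passed to ->complete(), 0 until given back */
};

/* Same mapping as the switch in the cleanup hunk, pulled into a helper. */
static int cancelled_status_to_errno(int status)
{
	switch (status) {
	case DWC3_REQUEST_STATUS_STALLED:
		return -EPIPE;
	case DWC3_REQUEST_STATUS_MISSED_ISOC:
		return -EXDEV;
	default:
		return -ECONNRESET;
	}
}

/* On a missed isoc event: record the reason, but do not give back yet. */
static void toy_cancel_started(struct toy_request *reqs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		reqs[i].status = DWC3_REQUEST_STATUS_MISSED_ISOC;
}

/* Once END_TRANSFER has completed: hand every request back exactly once. */
static void toy_cleanup_cancelled(struct toy_request *reqs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(reqs[i].result == 0);
		reqs[i].result = cancelled_status_to_errno(reqs[i].status);
	}
}
```

The point of the split is that the gadget driver only sees -EXDEV after the second step, so it cannot reuse a buffer while the controller might still reference it. Whether the hardware really does so is the open question in this thread.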

2022-10-17 20:58:16

by Dan Vacura

Subject: [PATCH v3 3/6] usb: gadget: uvc: fix sg handling in error case

If there is a transmission error the buffer will be returned too early,
causing a memory fault as subsequent requests for that buffer are still
queued up to be sent. Refactor the error handling to wait for the final
request to come in before reporting back the buffer to userspace for all
transfer types (bulk/isoc/isoc_sg). This ensures userspace knows if the
frame was successfully sent.

Fixes: e81e7f9a0eb9 ("usb: gadget: uvc: add scatter gather support")
Cc: <[email protected]> # 859c675d84d4: usb: gadget: uvc: consistently use define for headerlen
Cc: <[email protected]> # f262ce66d40c: usb: gadget: uvc: use on returned header len in video_encode_isoc_sg
Cc: <[email protected]> # 61aa709ca58a: usb: gadget: uvc: rework uvcg_queue_next_buffer to uvcg_complete_buffer
Cc: <[email protected]> # 9b969f93bcef: usb: gadget: uvc: giveback vb2 buffer on req complete
Cc: <[email protected]> # aef11279888c: usb: gadget: uvc: improve sg exit condition
Cc: <[email protected]>
Signed-off-by: Dan Vacura <[email protected]>
---
V1 -> V2:
- undo error rename
- change uvcg_info to uvcg_dbg
V2 -> V3:
- no changes

drivers/usb/gadget/function/uvc_queue.c | 8 +++++---
drivers/usb/gadget/function/uvc_video.c | 18 ++++++++++++++----
2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/gadget/function/uvc_queue.c b/drivers/usb/gadget/function/uvc_queue.c
index ec500ee499ee..0aa3d7e1f3cc 100644
--- a/drivers/usb/gadget/function/uvc_queue.c
+++ b/drivers/usb/gadget/function/uvc_queue.c
@@ -304,6 +304,7 @@ int uvcg_queue_enable(struct uvc_video_queue *queue, int enable)

queue->sequence = 0;
queue->buf_used = 0;
+ queue->flags &= ~UVC_QUEUE_DROP_INCOMPLETE;
} else {
ret = vb2_streamoff(&queue->queue, queue->queue.type);
if (ret < 0)
@@ -329,10 +330,11 @@ int uvcg_queue_enable(struct uvc_video_queue *queue, int enable)
void uvcg_complete_buffer(struct uvc_video_queue *queue,
struct uvc_buffer *buf)
{
- if ((queue->flags & UVC_QUEUE_DROP_INCOMPLETE) &&
- buf->length != buf->bytesused) {
- buf->state = UVC_BUF_STATE_QUEUED;
+ if (queue->flags & UVC_QUEUE_DROP_INCOMPLETE) {
+ queue->flags &= ~UVC_QUEUE_DROP_INCOMPLETE;
+ buf->state = UVC_BUF_STATE_ERROR;
vb2_set_plane_payload(&buf->buf.vb2_buf, 0, 0);
+ vb2_buffer_done(&buf->buf.vb2_buf, VB2_BUF_STATE_ERROR);
return;
}

diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index 91a58567beac..dd54841b0b3e 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -88,6 +88,7 @@ uvc_video_encode_bulk(struct usb_request *req, struct uvc_video *video,
struct uvc_buffer *buf)
{
void *mem = req->buf;
+ struct uvc_request *ureq = req->context;
int len = video->req_size;
int ret;

@@ -113,13 +114,14 @@ uvc_video_encode_bulk(struct usb_request *req, struct uvc_video *video,
video->queue.buf_used = 0;
buf->state = UVC_BUF_STATE_DONE;
list_del(&buf->queue);
- uvcg_complete_buffer(&video->queue, buf);
video->fid ^= UVC_STREAM_FID;
+ ureq->last_buf = buf;

video->payload_size = 0;
}

if (video->payload_size == video->max_payload_size ||
+ video->queue.flags & UVC_QUEUE_DROP_INCOMPLETE ||
buf->bytesused == video->queue.buf_used)
video->payload_size = 0;
}
@@ -180,7 +182,8 @@ uvc_video_encode_isoc_sg(struct usb_request *req, struct uvc_video *video,
req->length -= len;
video->queue.buf_used += req->length - header_len;

- if (buf->bytesused == video->queue.buf_used || !buf->sg) {
+ if (buf->bytesused == video->queue.buf_used || !buf->sg ||
+ video->queue.flags & UVC_QUEUE_DROP_INCOMPLETE) {
video->queue.buf_used = 0;
buf->state = UVC_BUF_STATE_DONE;
buf->offset = 0;
@@ -195,6 +198,7 @@ uvc_video_encode_isoc(struct usb_request *req, struct uvc_video *video,
struct uvc_buffer *buf)
{
void *mem = req->buf;
+ struct uvc_request *ureq = req->context;
int len = video->req_size;
int ret;

@@ -209,12 +213,13 @@ uvc_video_encode_isoc(struct usb_request *req, struct uvc_video *video,

req->length = video->req_size - len;

- if (buf->bytesused == video->queue.buf_used) {
+ if (buf->bytesused == video->queue.buf_used ||
+ video->queue.flags & UVC_QUEUE_DROP_INCOMPLETE) {
video->queue.buf_used = 0;
buf->state = UVC_BUF_STATE_DONE;
list_del(&buf->queue);
- uvcg_complete_buffer(&video->queue, buf);
video->fid ^= UVC_STREAM_FID;
+ ureq->last_buf = buf;
}
}

@@ -255,6 +260,11 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
case 0:
break;

+ case -EXDEV:
+ uvcg_dbg(&video->uvc->func, "VS request missed xfer.\n");
+ queue->flags |= UVC_QUEUE_DROP_INCOMPLETE;
+ break;
+
case -ESHUTDOWN: /* disconnect from host. */
uvcg_dbg(&video->uvc->func, "VS request cancelled.\n");
uvcg_queue_cancel(queue, 1);
--
2.34.1
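
The flag handling in this patch can be illustrated with a minimal userspace sketch. All `toy_*` names are hypothetical stand-ins for the uvc queue and buffer structures; the sketch only assumes what the hunks above show: a -EXDEV completion sets the drop flag, and the buffer is reported as an error (with the flag cleared) once the frame's final request completes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for UVC_QUEUE_DROP_INCOMPLETE. */
#define TOY_QUEUE_DROP_INCOMPLETE (1u << 0)

enum toy_buf_state { TOY_BUF_DONE, TOY_BUF_ERROR };

struct toy_queue { unsigned int flags; };
struct toy_buffer { enum toy_buf_state state; unsigned int payload; };

/* Per-request completion: a missed transfer only marks the frame as bad. */
static void toy_request_complete(struct toy_queue *q, int status)
{
	assert(q);
	if (status == -EXDEV)
		q->flags |= TOY_QUEUE_DROP_INCOMPLETE;
}

/*
 * Called when the final request of a frame has completed.
 * Returns true if the frame is reported to userspace as good.
 */
static bool toy_complete_buffer(struct toy_queue *q, struct toy_buffer *buf)
{
	assert(q && buf);
	if (q->flags & TOY_QUEUE_DROP_INCOMPLETE) {
		/* Clear the flag so the next frame starts clean. */
		q->flags &= ~TOY_QUEUE_DROP_INCOMPLETE;
		buf->state = TOY_BUF_ERROR;
		buf->payload = 0;
		return false;
	}
	buf->state = TOY_BUF_DONE;
	return true;
}
```

Because the buffer is only handed back in the second function, no request that still points into it can be in flight at that time, which is the early-return problem the commit message describes.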

2022-10-17 20:59:24

by Dan Vacura

Subject: [PATCH v3 4/6] usb: gadget: uvc: fix sg handling during video encode

From: Jeff Vanhoof <[email protected]>

In uvc_video_encode_isoc_sg, the uvc_request's sg list is
incorrectly being populated leading to corrupt video being
received by the remote end. When building the sg list the
usage of buf->sg's 'dma_length' field is not correct and
instead its 'length' field should be used.

Fixes: e81e7f9a0eb9 ("usb: gadget: uvc: add scatter gather support")
Cc: <[email protected]>
Signed-off-by: Jeff Vanhoof <[email protected]>
Signed-off-by: Dan Vacura <[email protected]>
---
V1 -> V3:
- no change, new patch in series

drivers/usb/gadget/function/uvc_video.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index dd54841b0b3e..7d4508a83d5d 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -157,10 +157,10 @@ uvc_video_encode_isoc_sg(struct usb_request *req, struct uvc_video *video,
sg = sg_next(sg);

for_each_sg(sg, iter, ureq->sgt.nents - 1, i) {
- if (!len || !buf->sg || !sg_dma_len(buf->sg))
+ if (!len || !buf->sg || !buf->sg->length)
break;

- sg_left = sg_dma_len(buf->sg) - buf->offset;
+ sg_left = buf->sg->length - buf->offset;
part = min_t(unsigned int, len, sg_left);

sg_set_page(iter, sg_page(buf->sg), part, buf->offset);
--
2.34.1
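
One plausible way the two fields can diverge is sketched below. This is an assumption, not something stated in the patch: when a scatterlist is DMA-mapped behind an IOMMU, adjacent entries may be merged, so the DMA length of the first entry can cover several pages while later entries report a DMA length of 0. The CPU-side 'length' of each entry is unchanged. `toy_sg` and `toy_entries_walked()` are hypothetical and only mimic the zero-length exit check in the encode loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two size fields of a scatterlist entry. */
struct toy_sg {
	unsigned int length;		/* CPU-side size of this entry */
	unsigned int dma_length;	/* size after DMA mapping, may be merged */
};

/*
 * Count how many entries the walk visits before hitting the
 * "entry has no bytes" exit condition.
 */
static size_t toy_entries_walked(const struct toy_sg *sg, size_t n,
				 int use_dma_len)
{
	size_t i;

	assert(sg != NULL);
	for (i = 0; i < n; i++) {
		unsigned int len = use_dma_len ? sg[i].dma_length
					       : sg[i].length;
		if (!len)
			break;
	}
	return i;
}
```

With three 4096-byte entries merged into one DMA segment, a walk keyed on the DMA length stops after the first entry, while a walk keyed on 'length' visits all three. Since the loop pairs the size with sg_page(), the per-page 'length' is the matching quantity.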

2022-10-17 20:59:38

by Dan Vacura

Subject: [PATCH v3 5/6] usb: gadget: uvc: make interrupt skip logic configurable

Some UDC hw may not support skipping interrupts, but still support the
request. Allow the interrupt frequency to be configured by the user.

Signed-off-by: Dan Vacura <[email protected]>
---
V1 -> V2:
- no change, new patch in series
V2 -> V3:
- default to baseline value of 4, fix storing the initial value

Documentation/ABI/testing/configfs-usb-gadget-uvc | 1 +
Documentation/usb/gadget-testing.rst | 2 ++
drivers/usb/gadget/function/f_uvc.c | 3 +++
drivers/usb/gadget/function/u_uvc.h | 1 +
drivers/usb/gadget/function/uvc.h | 2 ++
drivers/usb/gadget/function/uvc_configfs.c | 2 ++
drivers/usb/gadget/function/uvc_queue.c | 6 ++++++
drivers/usb/gadget/function/uvc_video.c | 3 ++-
8 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/configfs-usb-gadget-uvc b/Documentation/ABI/testing/configfs-usb-gadget-uvc
index 611b23e6488d..5dfaa3f7f6a4 100644
--- a/Documentation/ABI/testing/configfs-usb-gadget-uvc
+++ b/Documentation/ABI/testing/configfs-usb-gadget-uvc
@@ -8,6 +8,7 @@ Description: UVC function directory
streaming_maxpacket 1..1023 (fs), 1..3072 (hs/ss)
streaming_interval 1..16
function_name string [32]
+ req_int_skip_div unsigned int
=================== =============================

What: /config/usb-gadget/gadget/functions/uvc.name/control
diff --git a/Documentation/usb/gadget-testing.rst b/Documentation/usb/gadget-testing.rst
index 2278c9ffb74a..f9b5a09be1f4 100644
--- a/Documentation/usb/gadget-testing.rst
+++ b/Documentation/usb/gadget-testing.rst
@@ -794,6 +794,8 @@ The uvc function provides these attributes in its function directory:
sending or receiving when this configuration is
selected
function_name name of the interface
+ req_int_skip_div divisor of total requests to aid in calculating
+ interrupt frequency, 0 indicates all interrupt
=================== ================================================

There are also "control" and "streaming" subdirectories, each of which contain
diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c
index 6e196e06181e..e40ca26b9c55 100644
--- a/drivers/usb/gadget/function/f_uvc.c
+++ b/drivers/usb/gadget/function/f_uvc.c
@@ -655,6 +655,8 @@ uvc_function_bind(struct usb_configuration *c, struct usb_function *f)
cpu_to_le16(max_packet_size * max_packet_mult *
(opts->streaming_maxburst + 1));

+ uvc->config_skip_int_div = opts->req_int_skip_div;
+
/* Allocate endpoints. */
ep = usb_ep_autoconfig(cdev->gadget, &uvc_control_ep);
if (!ep) {
@@ -872,6 +874,7 @@ static struct usb_function_instance *uvc_alloc_inst(void)

opts->streaming_interval = 1;
opts->streaming_maxpacket = 1024;
+ opts->req_int_skip_div = 4;
snprintf(opts->function_name, sizeof(opts->function_name), "UVC Camera");

ret = uvcg_attach_configfs(opts);
diff --git a/drivers/usb/gadget/function/u_uvc.h b/drivers/usb/gadget/function/u_uvc.h
index 24b8681b0d6f..6f73bd5638ed 100644
--- a/drivers/usb/gadget/function/u_uvc.h
+++ b/drivers/usb/gadget/function/u_uvc.h
@@ -24,6 +24,7 @@ struct f_uvc_opts {
unsigned int streaming_interval;
unsigned int streaming_maxpacket;
unsigned int streaming_maxburst;
+ unsigned int req_int_skip_div;

unsigned int control_interface;
unsigned int streaming_interface;
diff --git a/drivers/usb/gadget/function/uvc.h b/drivers/usb/gadget/function/uvc.h
index 40226b1f7e14..29f9477c92cc 100644
--- a/drivers/usb/gadget/function/uvc.h
+++ b/drivers/usb/gadget/function/uvc.h
@@ -107,6 +107,7 @@ struct uvc_video {
spinlock_t req_lock;

unsigned int req_int_count;
+ unsigned int req_int_skip_div;

void (*encode) (struct usb_request *req, struct uvc_video *video,
struct uvc_buffer *buf);
@@ -155,6 +156,7 @@ struct uvc_device {
/* Events */
unsigned int event_length;
unsigned int event_setup_out : 1;
+ unsigned int config_skip_int_div;
};

static inline struct uvc_device *to_uvc(struct usb_function *f)
diff --git a/drivers/usb/gadget/function/uvc_configfs.c b/drivers/usb/gadget/function/uvc_configfs.c
index 4303a3283ba0..419e926ab57e 100644
--- a/drivers/usb/gadget/function/uvc_configfs.c
+++ b/drivers/usb/gadget/function/uvc_configfs.c
@@ -2350,6 +2350,7 @@ UVC_ATTR(f_uvc_opts_, cname, cname)
UVCG_OPTS_ATTR(streaming_interval, streaming_interval, 16);
UVCG_OPTS_ATTR(streaming_maxpacket, streaming_maxpacket, 3072);
UVCG_OPTS_ATTR(streaming_maxburst, streaming_maxburst, 15);
+UVCG_OPTS_ATTR(req_int_skip_div, req_int_skip_div, UINT_MAX);

#undef UVCG_OPTS_ATTR

@@ -2399,6 +2400,7 @@ static struct configfs_attribute *uvc_attrs[] = {
&f_uvc_opts_attr_streaming_interval,
&f_uvc_opts_attr_streaming_maxpacket,
&f_uvc_opts_attr_streaming_maxburst,
+ &f_uvc_opts_attr_req_int_skip_div,
&f_uvc_opts_string_attr_function_name,
NULL,
};
diff --git a/drivers/usb/gadget/function/uvc_queue.c b/drivers/usb/gadget/function/uvc_queue.c
index 0aa3d7e1f3cc..02559906a55a 100644
--- a/drivers/usb/gadget/function/uvc_queue.c
+++ b/drivers/usb/gadget/function/uvc_queue.c
@@ -63,6 +63,12 @@ static int uvc_queue_setup(struct vb2_queue *vq,
*/
nreq = DIV_ROUND_UP(DIV_ROUND_UP(sizes[0], 2), req_size);
nreq = clamp(nreq, 4U, 64U);
+ if (0 == video->uvc->config_skip_int_div) {
+ video->req_int_skip_div = nreq;
+ } else {
+ video->req_int_skip_div = min_t(unsigned int, nreq,
+ video->uvc->config_skip_int_div);
+ }
video->uvc_num_requests = nreq;

return 0;
diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index 7d4508a83d5d..9ff02691b6a4 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -423,7 +423,8 @@ static void uvcg_video_pump(struct work_struct *work)
if (list_empty(&video->req_free) ||
buf->state == UVC_BUF_STATE_DONE ||
!(video->req_int_count %
- DIV_ROUND_UP(video->uvc_num_requests, 4))) {
+ DIV_ROUND_UP(video->uvc_num_requests,
+ video->req_int_skip_div))) {
video->req_int_count = 0;
req->no_interrupt = 0;
} else {
--
2.34.1
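
The arithmetic behind req_int_skip_div can be worked through in a small userspace sketch. The helper names (`num_requests()`, `effective_skip_div()`, `interrupt_interval()`) are hypothetical; the formulas are taken from the uvc_queue_setup() and uvcg_video_pump() hunks above, with DIV_ROUND_UP and clamp re-implemented for userspace.

```c
#include <assert.h>

/* Userspace stand-in for the kernel macro of the same name. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

static unsigned int clamp_uint(unsigned int v, unsigned int lo,
			       unsigned int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* nreq as in uvc_queue_setup(): half the buffer in requests, in [4, 64]. */
static unsigned int num_requests(unsigned int buf_size, unsigned int req_size)
{
	assert(req_size != 0);
	return clamp_uint(DIV_ROUND_UP(DIV_ROUND_UP(buf_size, 2), req_size),
			  4U, 64U);
}

/* A configured value of 0 means "interrupt on every request". */
static unsigned int effective_skip_div(unsigned int nreq, unsigned int cfg_div)
{
	if (cfg_div == 0)
		return nreq;
	return cfg_div < nreq ? cfg_div : nreq;
}

/* Every interval-th request is queued with no_interrupt = 0. */
static unsigned int interrupt_interval(unsigned int nreq, unsigned int cfg_div)
{
	return DIV_ROUND_UP(nreq, effective_skip_div(nreq, cfg_div));
}
```

For example, with 64 requests and the default divisor of 4 the interval is 16, so up to 15 requests in a row are queued with no_interrupt = 1, which matches the "up to 15 requests" figure mentioned elsewhere in this thread. A divisor of 0, or any value of 64 or more, gives an interval of 1, i.e. an interrupt on every request.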

2022-10-17 20:59:45

by Dan Vacura

Subject: [PATCH v3 6/6] usb: gadget: uvc: add configfs option for sg support

The scatter gather support doesn't appear to work well with some UDC hw.
Add the ability to turn on the feature depending on the controller in
use.

Signed-off-by: Dan Vacura <[email protected]>
---
V1 -> V2:
- no change, new patch in series
V2 -> V3:
- default on, same as baseline

Documentation/ABI/testing/configfs-usb-gadget-uvc | 1 +
Documentation/usb/gadget-testing.rst | 2 ++
drivers/usb/gadget/function/f_uvc.c | 2 ++
drivers/usb/gadget/function/u_uvc.h | 1 +
drivers/usb/gadget/function/uvc_configfs.c | 2 ++
drivers/usb/gadget/function/uvc_queue.c | 4 ++--
6 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/configfs-usb-gadget-uvc b/Documentation/ABI/testing/configfs-usb-gadget-uvc
index 5dfaa3f7f6a4..839a75fc28ee 100644
--- a/Documentation/ABI/testing/configfs-usb-gadget-uvc
+++ b/Documentation/ABI/testing/configfs-usb-gadget-uvc
@@ -9,6 +9,7 @@ Description: UVC function directory
streaming_interval 1..16
function_name string [32]
req_int_skip_div unsigned int
+ sg_supported 0..1
=================== =============================

What: /config/usb-gadget/gadget/functions/uvc.name/control
diff --git a/Documentation/usb/gadget-testing.rst b/Documentation/usb/gadget-testing.rst
index f9b5a09be1f4..8e3072d6a590 100644
--- a/Documentation/usb/gadget-testing.rst
+++ b/Documentation/usb/gadget-testing.rst
@@ -796,6 +796,8 @@ The uvc function provides these attributes in its function directory:
function_name name of the interface
req_int_skip_div divisor of total requests to aid in calculating
interrupt frequency, 0 indicates all interrupt
+ sg_supported allow for scatter gather to be used if the UDC
+ hw supports it
=================== ================================================

There are also "control" and "streaming" subdirectories, each of which contain
diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c
index e40ca26b9c55..d08ebe3ffeb2 100644
--- a/drivers/usb/gadget/function/f_uvc.c
+++ b/drivers/usb/gadget/function/f_uvc.c
@@ -656,6 +656,7 @@ uvc_function_bind(struct usb_configuration *c, struct usb_function *f)
(opts->streaming_maxburst + 1));

uvc->config_skip_int_div = opts->req_int_skip_div;
+ uvc->video.queue.use_sg = opts->sg_supported;

/* Allocate endpoints. */
ep = usb_ep_autoconfig(cdev->gadget, &uvc_control_ep);
@@ -875,6 +876,7 @@ static struct usb_function_instance *uvc_alloc_inst(void)
opts->streaming_interval = 1;
opts->streaming_maxpacket = 1024;
opts->req_int_skip_div = 4;
+ opts->sg_supported = 1;
snprintf(opts->function_name, sizeof(opts->function_name), "UVC Camera");

ret = uvcg_attach_configfs(opts);
diff --git a/drivers/usb/gadget/function/u_uvc.h b/drivers/usb/gadget/function/u_uvc.h
index 6f73bd5638ed..5ccced629925 100644
--- a/drivers/usb/gadget/function/u_uvc.h
+++ b/drivers/usb/gadget/function/u_uvc.h
@@ -25,6 +25,7 @@ struct f_uvc_opts {
unsigned int streaming_maxpacket;
unsigned int streaming_maxburst;
unsigned int req_int_skip_div;
+ unsigned int sg_supported;

unsigned int control_interface;
unsigned int streaming_interface;
diff --git a/drivers/usb/gadget/function/uvc_configfs.c b/drivers/usb/gadget/function/uvc_configfs.c
index 419e926ab57e..3784c0e02d01 100644
--- a/drivers/usb/gadget/function/uvc_configfs.c
+++ b/drivers/usb/gadget/function/uvc_configfs.c
@@ -2351,6 +2351,7 @@ UVCG_OPTS_ATTR(streaming_interval, streaming_interval, 16);
UVCG_OPTS_ATTR(streaming_maxpacket, streaming_maxpacket, 3072);
UVCG_OPTS_ATTR(streaming_maxburst, streaming_maxburst, 15);
UVCG_OPTS_ATTR(req_int_skip_div, req_int_skip_div, UINT_MAX);
+UVCG_OPTS_ATTR(sg_supported, sg_supported, 1);

#undef UVCG_OPTS_ATTR

@@ -2401,6 +2402,7 @@ static struct configfs_attribute *uvc_attrs[] = {
&f_uvc_opts_attr_streaming_maxpacket,
&f_uvc_opts_attr_streaming_maxburst,
&f_uvc_opts_attr_req_int_skip_div,
+ &f_uvc_opts_attr_sg_supported,
&f_uvc_opts_string_attr_function_name,
NULL,
};
diff --git a/drivers/usb/gadget/function/uvc_queue.c b/drivers/usb/gadget/function/uvc_queue.c
index 02559906a55a..3c7aa5c4bba2 100644
--- a/drivers/usb/gadget/function/uvc_queue.c
+++ b/drivers/usb/gadget/function/uvc_queue.c
@@ -149,11 +149,11 @@ int uvcg_queue_init(struct uvc_video_queue *queue, struct device *dev, enum v4l2
queue->queue.buf_struct_size = sizeof(struct uvc_buffer);
queue->queue.ops = &uvc_queue_qops;
queue->queue.lock = lock;
- if (cdev->gadget->sg_supported) {
+ if (queue->use_sg && cdev->gadget->sg_supported) {
queue->queue.mem_ops = &vb2_dma_sg_memops;
- queue->use_sg = 1;
} else {
queue->queue.mem_ops = &vb2_vmalloc_memops;
+ queue->use_sg = false;
}

queue->queue.timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY
--
2.34.1

2022-10-17 21:08:02

by Dan Vacura

Subject: [PATCH] usb: gadget: uvc: fix dropped frame after missed isoc

With the re-use of the previous completion status in 0d1c407b1a749
("usb: dwc3: gadget: Return proper request status") it could be possible
that the next frame would also get dropped if the current frame has a
missed isoc error. Ensure that an interrupt is requested for the start
of a new frame.

Fixes: fc78941d8169 ("usb: gadget: uvc: decrease the interrupt load to a quarter")
Cc: <[email protected]>
Signed-off-by: Dan Vacura <[email protected]>
---
V1 -> V3:
- no change, new patch in series

drivers/usb/gadget/function/uvc_video.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index bb037fcc90e6..323977716f5a 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -431,7 +431,8 @@ static void uvcg_video_pump(struct work_struct *work)

/* Endpoint now owns the request */
req = NULL;
- video->req_int_count++;
+ if (buf->state != UVC_BUF_STATE_DONE)
+ video->req_int_count++;
}

if (!req)
--
2.34.1
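
The effect of the counter change can be seen in a simplified toy model of the pump loop. This is a sketch under stated assumptions: `toy_pump_one()` is a hypothetical function, the list_empty(&video->req_free) condition is left out, and 'last_of_frame' stands for buf->state == UVC_BUF_STATE_DONE.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Queue one request. Returns true if it is queued with an interrupt
 * (no_interrupt = 0). 'patched' selects the counter behaviour after
 * this patch; otherwise the counter is always incremented.
 */
static bool toy_pump_one(unsigned int *count, unsigned int interval,
			 bool last_of_frame, bool patched)
{
	bool irq;

	assert(count && interval != 0);

	irq = last_of_frame || (*count % interval) == 0;
	if (irq)
		*count = 0;

	/* Patched: do not count the request that finished a frame. */
	if (!patched || !last_of_frame)
		(*count)++;

	return irq;
}
```

In the unpatched variant the counter is already 1 when the first request of the next frame is queued, so that request carries no interrupt. In the patched variant the counter stays at 0, so the first request of a new frame always asks for one.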

2022-10-17 21:40:52

by Thinh Nguyen

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

On Mon, Oct 17, 2022, Dan Vacura wrote:
> From: Jeff Vanhoof <[email protected]>
>
> arm-smmu related crashes seen after a Missed ISOC interrupt when
> no_interrupt=1 is used. This can happen if the hardware is still using
> the data associated with a TRB after the usb_request's ->complete call
> has been made. Instead of immediately releasing a request when a Missed
> ISOC interrupt has occurred, this change will add logic to cancel the
> request instead where it will eventually be released when the
> END_TRANSFER command has completed. This logic is similar to some of the
> cleanup done in dwc3_gadget_ep_dequeue.

This doesn't sound right. How did you determine that the hardware is
still using the data associated with the TRB? Did you check the TRB's
HWO bit?

The dwc3 driver would only give back the requests if the TRBs of the
associated requests are completed or when the device is disconnected.
If the TRB indicated missed isoc, that means that the TRB is completed
and its status was updated.

There's a special case in which dwc3 may give back requests early: the
device disconnecting. The requests should be returned with
-ESHUTDOWN, and the gadget driver shouldn't be re-using the requests on
de-initialization anyway.

We should not issue the End Transfer command just because of missed isoc.
We may want to issue End Transfer if the gadget driver is too slow and
unable to feed requests in time (causing underrun and missed isoc) to
resync with the host, but we already handle that.

I'm still not clear on what problem you're seeing. Do you have the
crash log? Tracepoints?

BR,
Thinh

2022-10-18 01:57:30

by Bagas Sanjaya

Subject: Re: [PATCH] usb: gadget: uvc: fix dropped frame after missed isoc

On 10/18/22 03:54, Dan Vacura wrote:
> With the re-use of the previous completion status in 0d1c407b1a749
> ("usb: dwc3: gadget: Return proper request status") it could be possible
> that the next frame would also get dropped if the current frame has a
> missed isoc error. Ensure that an interrupt is requested for the start
> of a new frame.
>

Shouldn't the subject line say [PATCH v3 1/6]?

--
An old man doll... just what I always wanted! - Clara

2022-10-18 02:47:42

by Dan Vacura

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

Hi Thinh,

On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
> On Mon, Oct 17, 2022, Dan Vacura wrote:
> > From: Jeff Vanhoof <[email protected]>
> >
> > arm-smmu related crashes seen after a Missed ISOC interrupt when
> > no_interrupt=1 is used. This can happen if the hardware is still using
> > the data associated with a TRB after the usb_request's ->complete call
> > has been made. Instead of immediately releasing a request when a Missed
> > ISOC interrupt has occurred, this change will add logic to cancel the
> > request instead where it will eventually be released when the
> > END_TRANSFER command has completed. This logic is similar to some of the
> > cleanup done in dwc3_gadget_ep_dequeue.
>
> This doesn't sound right. How did you determine that the hardware is
> still using the data associated with the TRB? Did you check the TRB's
> HWO bit?

The problem we're seeing was mentioned in the summary of this patch
series, issue #1. Basically, with the following patch
https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
integrated, an smmu panic occurs on our Android device with the 5.15
kernel:

<3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!

The uvc gadget driver appears to be the first (and only) gadget that
uses the no_interrupt=1 logic, so this seems to be a new condition for
the dwc3 driver. In our configuration, we have up to 64 requests, with
no_interrupt=1 set for up to 15 requests. The dep->started_list can
grow to that size when looping through to clean up the completed
requests. From testing and debugging, the smmu panic occurs when a
-EXDEV status shows up, right after
dwc3_gadget_ep_cleanup_completed_request() is visited. Our conclusion
was that the requests were getting returned to the gadget too early.

>
> The dwc3 driver would only give back the requests if the TRBs of the
> associated requests are completed or when the device is disconnected.
> If the TRB indicated missed isoc, that means that the TRB is completed
> and its status was updated.

Interesting, the device is not disconnected as we don't get the
-ESHUTDOWN status back and with this patch in place things continue
after a -EXDEV status is received.

>
> There's a special case in which dwc3 may give back requests early: the
> device disconnecting. The requests should be returned with
> -ESHUTDOWN, and the gadget driver shouldn't be re-using the requests on
> de-initialization anyway.
>
> We should not issue the End Transfer command just because of missed isoc.
> We may want to issue End Transfer if the gadget driver is too slow and
> unable to feed requests in time (causing underrun and missed isoc) to
> resync with the host, but we already handle that.

Hmm, isn't that what happens when we get into this
condition in dwc3_gadget_endpoint_trbs_complete():

	if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
		list_empty(&dep->started_list) &&
		(list_empty(&dep->pending_list) || status == -EXDEV))
		dwc3_stop_active_transfer(dep, true, true);

>
> I'm still not clear on what problem you're seeing. Do you have the
> crash log? Tracepoints?
>
> BR,
> Thinh

Appreciate the support!

Dan

2022-10-18 02:48:14

by Dan Vacura

Subject: Re: [PATCH] usb: gadget: uvc: fix dropped frame after missed isoc

On Tue, Oct 18, 2022 at 08:50:03AM +0700, Bagas Sanjaya wrote:
> On 10/18/22 03:54, Dan Vacura wrote:
> > With the re-use of the previous completion status in 0d1c407b1a749
> > ("usb: dwc3: gadget: Return proper request status") it could be possible
> > that the next frame would also get dropped if the current frame has a
> > missed isoc error. Ensure that an interrupt is requested for the start
> > of a new frame.
> >
>
> Shouldn't the subject line says [PATCH v3 1/6]?

Yes. Clerical error on my side, not updating this after resolving a
checkpatch error... Not sure if it matters, as this patch can exist on
its own. I could send this again with a fixed subject line, but that
may confuse others, since there's no code difference.

>
> --
> An old man doll... just what I always wanted! - Clara
>

2022-10-18 05:40:51

by Greg KH

Subject: Re: [PATCH] usb: gadget: uvc: fix dropped frame after missed isoc

On Mon, Oct 17, 2022 at 09:15:43PM -0500, Dan Vacura wrote:
> On Tue, Oct 18, 2022 at 08:50:03AM +0700, Bagas Sanjaya wrote:
> > On 10/18/22 03:54, Dan Vacura wrote:
> > > With the re-use of the previous completion status in 0d1c407b1a749
> > > ("usb: dwc3: gadget: Return proper request status") it could be possible
> > > that the next frame would also get dropped if the current frame has a
> > > missed isoc error. Ensure that an interrupt is requested for the start
> > > of a new frame.
> > >
> >
> > Shouldn't the subject line says [PATCH v3 1/6]?
>
> Yes. Clerical error on my side not updating this after resolving a
> check-patch error... Not sure if it matters as this patch can exist on
> it's own. Or if I can send this again with fixed subject line, but that
> may confuse others, since there's no code difference.

Our tools (b4) will complain that they cannot find patch 1 in the
series, so yes, please resend with them properly numbered so that we
can find them all when going to apply them to the tree.

thanks,

greg k-h

2022-10-18 14:20:52

by Michael Grzeschik

Subject: Re: [PATCH v3 6/6] usb: gadget: uvc: add configfs option for sg support

Hi Dan!
Hi Dan!

On Tue, Oct 18, 2022 at 02:27:13PM +0100, Dan Scally wrote:
>Hi Dan
>
>On 17/10/2022 21:54, Dan Vacura wrote:
>>The scatter gather support doesn't appear to work well with some UDC hw.
>>Add the ability to turn on the feature depending on the controller in
>>use.
>>
>>Signed-off-by: Dan Vacura <[email protected]>
>
>
>Nitpick: I would call it use_sg everywhere, but either way:

Or even only "scatter_gather". How does that sound?

>
>Reviewed-by: Daniel Scally <[email protected]>
>
>Tested-by: Daniel Scally <[email protected]>
>
>>---
>>V1 -> V2:
>>- no change, new patch in serie
>>V2 -> V3:
>>- default on, same as baseline
>>
>> Documentation/ABI/testing/configfs-usb-gadget-uvc | 1 +
>> Documentation/usb/gadget-testing.rst | 2 ++
>> drivers/usb/gadget/function/f_uvc.c | 2 ++
>> drivers/usb/gadget/function/u_uvc.h | 1 +
>> drivers/usb/gadget/function/uvc_configfs.c | 2 ++
>> drivers/usb/gadget/function/uvc_queue.c | 4 ++--
>> 6 files changed, 10 insertions(+), 2 deletions(-)
>>
>>diff --git a/Documentation/ABI/testing/configfs-usb-gadget-uvc b/Documentation/ABI/testing/configfs-usb-gadget-uvc
>>index 5dfaa3f7f6a4..839a75fc28ee 100644
>>--- a/Documentation/ABI/testing/configfs-usb-gadget-uvc
>>+++ b/Documentation/ABI/testing/configfs-usb-gadget-uvc
>>@@ -9,6 +9,7 @@ Description: UVC function directory
>> streaming_interval 1..16
>> function_name string [32]
>> req_int_skip_div unsigned int
>>+ sg_supported 0..1
>> =================== =============================
>> What: /config/usb-gadget/gadget/functions/uvc.name/control
>>diff --git a/Documentation/usb/gadget-testing.rst b/Documentation/usb/gadget-testing.rst
>>index f9b5a09be1f4..8e3072d6a590 100644
>>--- a/Documentation/usb/gadget-testing.rst
>>+++ b/Documentation/usb/gadget-testing.rst
>>@@ -796,6 +796,8 @@ The uvc function provides these attributes in its function directory:
>> function_name name of the interface
>> req_int_skip_div divisor of total requests to aid in calculating
>> interrupt frequency, 0 indicates all interrupt
>>+ sg_supported allow for scatter gather to be used if the UDC
>>+ hw supports it
>> =================== ================================================
>> There are also "control" and "streaming" subdirectories, each of which contain
>>diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c
>>index e40ca26b9c55..d08ebe3ffeb2 100644
>>--- a/drivers/usb/gadget/function/f_uvc.c
>>+++ b/drivers/usb/gadget/function/f_uvc.c
>>@@ -656,6 +656,7 @@ uvc_function_bind(struct usb_configuration *c, struct usb_function *f)
>> (opts->streaming_maxburst + 1));
>> uvc->config_skip_int_div = opts->req_int_skip_div;
>>+ uvc->video.queue.use_sg = opts->sg_supported;

Why do you set this here?

>> /* Allocate endpoints. */
>> ep = usb_ep_autoconfig(cdev->gadget, &uvc_control_ep);
>>@@ -875,6 +876,7 @@ static struct usb_function_instance *uvc_alloc_inst(void)
>> opts->streaming_interval = 1;
>> opts->streaming_maxpacket = 1024;
>> opts->req_int_skip_div = 4;
>>+ opts->sg_supported = 1;
>> snprintf(opts->function_name, sizeof(opts->function_name), "UVC Camera");
>> ret = uvcg_attach_configfs(opts);
>>diff --git a/drivers/usb/gadget/function/u_uvc.h b/drivers/usb/gadget/function/u_uvc.h
>>index 6f73bd5638ed..5ccced629925 100644
>>--- a/drivers/usb/gadget/function/u_uvc.h
>>+++ b/drivers/usb/gadget/function/u_uvc.h
>>@@ -25,6 +25,7 @@ struct f_uvc_opts {
>> unsigned int streaming_maxpacket;
>> unsigned int streaming_maxburst;
>> unsigned int req_int_skip_div;
>>+ unsigned int sg_supported;
>> unsigned int control_interface;
>> unsigned int streaming_interface;
>>diff --git a/drivers/usb/gadget/function/uvc_configfs.c b/drivers/usb/gadget/function/uvc_configfs.c
>>index 419e926ab57e..3784c0e02d01 100644
>>--- a/drivers/usb/gadget/function/uvc_configfs.c
>>+++ b/drivers/usb/gadget/function/uvc_configfs.c
>>@@ -2351,6 +2351,7 @@ UVCG_OPTS_ATTR(streaming_interval, streaming_interval, 16);
>> UVCG_OPTS_ATTR(streaming_maxpacket, streaming_maxpacket, 3072);
>> UVCG_OPTS_ATTR(streaming_maxburst, streaming_maxburst, 15);
>> UVCG_OPTS_ATTR(req_int_skip_div, req_int_skip_div, UINT_MAX);
>>+UVCG_OPTS_ATTR(sg_supported, sg_supported, 1);
>> #undef UVCG_OPTS_ATTR
>>@@ -2401,6 +2402,7 @@ static struct configfs_attribute *uvc_attrs[] = {
>> &f_uvc_opts_attr_streaming_maxpacket,
>> &f_uvc_opts_attr_streaming_maxburst,
>> &f_uvc_opts_attr_req_int_skip_div,
>>+ &f_uvc_opts_attr_sg_supported,
>> &f_uvc_opts_string_attr_function_name,
>> NULL,
>> };
>>diff --git a/drivers/usb/gadget/function/uvc_queue.c b/drivers/usb/gadget/function/uvc_queue.c
>>index 02559906a55a..3c7aa5c4bba2 100644
>>--- a/drivers/usb/gadget/function/uvc_queue.c
>>+++ b/drivers/usb/gadget/function/uvc_queue.c
>>@@ -149,11 +149,11 @@ int uvcg_queue_init(struct uvc_video_queue *queue, struct device *dev, enum v4l2
>> queue->queue.buf_struct_size = sizeof(struct uvc_buffer);
>> queue->queue.ops = &uvc_queue_qops;
>> queue->queue.lock = lock;
>>- if (cdev->gadget->sg_supported) {
>>+ if (queue->use_sg && cdev->gadget->sg_supported) {
>> queue->queue.mem_ops = &vb2_dma_sg_memops;
>>- queue->use_sg = 1;
>> } else {
>> queue->queue.mem_ops = &vb2_vmalloc_memops;
>>+ queue->use_sg = false;

I am unsure, but can't you actually always use vb2_dma_sg_memops?

With my last patch we always set buf->mem to vb2_plane_vaddr(vb, 0);

https://lore.kernel.org/linux-usb/[email protected]/T/#u


The condition that decides between encode_isoc_sg and encode_isoc should
then remain the last place to switch between sg or not. I would hook the
userspace decision in there.

You can also directly get to opts->scatter_gather by using

struct f_uvc_opts *opts = fi_to_f_uvc_opts(uvc->func.fi);

in the function uvcg_video_enable.

>> }
>> queue->queue.timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY
>

Thanks,
Michael

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |



2022-10-18 14:38:33

by Dan Scally

Subject: Re: [PATCH v3 6/6] usb: gadget: uvc: add configfs option for sg support

Hi Michael - again!

On 18/10/2022 15:04, Michael Grzeschik wrote:
> Hi Dan!
> Hi Dan!
>
> On Tue, Oct 18, 2022 at 02:27:13PM +0100, Dan Scally wrote:
>> Hi Dan
>>
>> On 17/10/2022 21:54, Dan Vacura wrote:
>>> The scatter gather support doesn't appear to work well with some UDC
>>> hw.
>>> Add the ability to turn on the feature depending on the controller in
>>> use.
>>>
>>> Signed-off-by: Dan Vacura <[email protected]>
>>
>>
>> Nitpick: I would call it use_sg everywhere, but either way:
>
> Or even only "scatter_gather". How does that sound?


I think I prefer use_sg actually, but I don't have a strong feeling
either way.

>
>>
>> Reviewed-by: Daniel Scally <[email protected]>
>>
>> Tested-by: Daniel Scally <[email protected]>
>>
>>> ---
>>> V1 -> V2:
>>> - no change, new patch in serie
>>> V2 -> V3:
>>> - default on, same as baseline
>>>
>>>  Documentation/ABI/testing/configfs-usb-gadget-uvc | 1 +
>>>  Documentation/usb/gadget-testing.rst              | 2 ++
>>>  drivers/usb/gadget/function/f_uvc.c               | 2 ++
>>>  drivers/usb/gadget/function/u_uvc.h               | 1 +
>>>  drivers/usb/gadget/function/uvc_configfs.c        | 2 ++
>>>  drivers/usb/gadget/function/uvc_queue.c           | 4 ++--
>>>  6 files changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/Documentation/ABI/testing/configfs-usb-gadget-uvc
>>> b/Documentation/ABI/testing/configfs-usb-gadget-uvc
>>> index 5dfaa3f7f6a4..839a75fc28ee 100644
>>> --- a/Documentation/ABI/testing/configfs-usb-gadget-uvc
>>> +++ b/Documentation/ABI/testing/configfs-usb-gadget-uvc
>>> @@ -9,6 +9,7 @@ Description:    UVC function directory
>>>          streaming_interval    1..16
>>>          function_name        string [32]
>>>          req_int_skip_div    unsigned int
>>> +        sg_supported        0..1
>>>          ===================    =============================
>>>  What: /config/usb-gadget/gadget/functions/uvc.name/control
>>> diff --git a/Documentation/usb/gadget-testing.rst
>>> b/Documentation/usb/gadget-testing.rst
>>> index f9b5a09be1f4..8e3072d6a590 100644
>>> --- a/Documentation/usb/gadget-testing.rst
>>> +++ b/Documentation/usb/gadget-testing.rst
>>> @@ -796,6 +796,8 @@ The uvc function provides these attributes in
>>> its function directory:
>>>      function_name       name of the interface
>>>      req_int_skip_div    divisor of total requests to aid in
>>> calculating
>>>                  interrupt frequency, 0 indicates all interrupt
>>> +    sg_supported        allow for scatter gather to be used if the UDC
>>> +                hw supports it
>>>      ===================
>>> ================================================
>>>  There are also "control" and "streaming" subdirectories, each of
>>> which contain
>>> diff --git a/drivers/usb/gadget/function/f_uvc.c
>>> b/drivers/usb/gadget/function/f_uvc.c
>>> index e40ca26b9c55..d08ebe3ffeb2 100644
>>> --- a/drivers/usb/gadget/function/f_uvc.c
>>> +++ b/drivers/usb/gadget/function/f_uvc.c
>>> @@ -656,6 +656,7 @@ uvc_function_bind(struct usb_configuration *c,
>>> struct usb_function *f)
>>>                  (opts->streaming_maxburst + 1));
>>>      uvc->config_skip_int_div = opts->req_int_skip_div;
>>> +    uvc->video.queue.use_sg = opts->sg_supported;
>
> Why do you set this here?
>
>>>      /* Allocate endpoints. */
>>>      ep = usb_ep_autoconfig(cdev->gadget, &uvc_control_ep);
>>> @@ -875,6 +876,7 @@ static struct usb_function_instance
>>> *uvc_alloc_inst(void)
>>>      opts->streaming_interval = 1;
>>>      opts->streaming_maxpacket = 1024;
>>>      opts->req_int_skip_div = 4;
>>> +    opts->sg_supported = 1;
>>>      snprintf(opts->function_name, sizeof(opts->function_name), "UVC
>>> Camera");
>>>      ret = uvcg_attach_configfs(opts);
>>> diff --git a/drivers/usb/gadget/function/u_uvc.h
>>> b/drivers/usb/gadget/function/u_uvc.h
>>> index 6f73bd5638ed..5ccced629925 100644
>>> --- a/drivers/usb/gadget/function/u_uvc.h
>>> +++ b/drivers/usb/gadget/function/u_uvc.h
>>> @@ -25,6 +25,7 @@ struct f_uvc_opts {
>>>      unsigned int                    streaming_maxpacket;
>>>      unsigned int                    streaming_maxburst;
>>>      unsigned int                    req_int_skip_div;
>>> +    unsigned int                    sg_supported;
>>>      unsigned int                    control_interface;
>>>      unsigned int                    streaming_interface;
>>> diff --git a/drivers/usb/gadget/function/uvc_configfs.c
>>> b/drivers/usb/gadget/function/uvc_configfs.c
>>> index 419e926ab57e..3784c0e02d01 100644
>>> --- a/drivers/usb/gadget/function/uvc_configfs.c
>>> +++ b/drivers/usb/gadget/function/uvc_configfs.c
>>> @@ -2351,6 +2351,7 @@ UVCG_OPTS_ATTR(streaming_interval,
>>> streaming_interval, 16);
>>>  UVCG_OPTS_ATTR(streaming_maxpacket, streaming_maxpacket, 3072);
>>>  UVCG_OPTS_ATTR(streaming_maxburst, streaming_maxburst, 15);
>>>  UVCG_OPTS_ATTR(req_int_skip_div, req_int_skip_div, UINT_MAX);
>>> +UVCG_OPTS_ATTR(sg_supported, sg_supported, 1);
>>>  #undef UVCG_OPTS_ATTR
>>> @@ -2401,6 +2402,7 @@ static struct configfs_attribute *uvc_attrs[] = {
>>>      &f_uvc_opts_attr_streaming_maxpacket,
>>>      &f_uvc_opts_attr_streaming_maxburst,
>>>      &f_uvc_opts_attr_req_int_skip_div,
>>> +    &f_uvc_opts_attr_sg_supported,
>>>      &f_uvc_opts_string_attr_function_name,
>>>      NULL,
>>>  };
>>> diff --git a/drivers/usb/gadget/function/uvc_queue.c
>>> b/drivers/usb/gadget/function/uvc_queue.c
>>> index 02559906a55a..3c7aa5c4bba2 100644
>>> --- a/drivers/usb/gadget/function/uvc_queue.c
>>> +++ b/drivers/usb/gadget/function/uvc_queue.c
>>> @@ -149,11 +149,11 @@ int uvcg_queue_init(struct uvc_video_queue
>>> *queue, struct device *dev, enum v4l2
>>>      queue->queue.buf_struct_size = sizeof(struct uvc_buffer);
>>>      queue->queue.ops = &uvc_queue_qops;
>>>      queue->queue.lock = lock;
>>> -    if (cdev->gadget->sg_supported) {
>>> +    if (queue->use_sg && cdev->gadget->sg_supported) {
>>>          queue->queue.mem_ops = &vb2_dma_sg_memops;
>>> -        queue->use_sg = 1;
>>>      } else {
>>>          queue->queue.mem_ops = &vb2_vmalloc_memops;
>>> +        queue->use_sg = false;
>
> I am unsure, but can you actually not always use vb2_dma_sg_memops.
>
> With my last patch we always set buf->mem to vb2_plane_vaddr(vb, 0);
>
> https://lore.kernel.org/linux-usb/[email protected]/T/#u
>
>
>
> The condition to decide if encode_isoc_sg or encode_isoc should then
> remain the last place to switch between sg or not. I would hook the
> userspace decision in here.
>
> You can also directly get to opts->scatter_gather by using
>
>     struct f_uvc_opts *opts = fi_to_f_uvc_opts(uvc->func.fi);
>
> in the function uvcg_video_enable.
>
>>>      }
>>>      queue->queue.timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY
>>
>
> Thanks,
> Michael
>

2022-10-18 14:51:16

by Dan Scally

Subject: Re: [PATCH v3 6/6] usb: gadget: uvc: add configfs option for sg support

Hi Dan

On 17/10/2022 21:54, Dan Vacura wrote:
> The scatter gather support doesn't appear to work well with some UDC hw.
> Add the ability to turn on the feature depending on the controller in
> use.
>
> Signed-off-by: Dan Vacura <[email protected]>


Nitpick: I would call it use_sg everywhere, but either way:


Reviewed-by: Daniel Scally <[email protected]>

Tested-by: Daniel Scally <[email protected]>

> ---
> V1 -> V2:
> - no change, new patch in serie
> V2 -> V3:
> - default on, same as baseline
>
> Documentation/ABI/testing/configfs-usb-gadget-uvc | 1 +
> Documentation/usb/gadget-testing.rst | 2 ++
> drivers/usb/gadget/function/f_uvc.c | 2 ++
> drivers/usb/gadget/function/u_uvc.h | 1 +
> drivers/usb/gadget/function/uvc_configfs.c | 2 ++
> drivers/usb/gadget/function/uvc_queue.c | 4 ++--
> 6 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/ABI/testing/configfs-usb-gadget-uvc b/Documentation/ABI/testing/configfs-usb-gadget-uvc
> index 5dfaa3f7f6a4..839a75fc28ee 100644
> --- a/Documentation/ABI/testing/configfs-usb-gadget-uvc
> +++ b/Documentation/ABI/testing/configfs-usb-gadget-uvc
> @@ -9,6 +9,7 @@ Description: UVC function directory
> streaming_interval 1..16
> function_name string [32]
> req_int_skip_div unsigned int
> + sg_supported 0..1
> =================== =============================
>
> What: /config/usb-gadget/gadget/functions/uvc.name/control
> diff --git a/Documentation/usb/gadget-testing.rst b/Documentation/usb/gadget-testing.rst
> index f9b5a09be1f4..8e3072d6a590 100644
> --- a/Documentation/usb/gadget-testing.rst
> +++ b/Documentation/usb/gadget-testing.rst
> @@ -796,6 +796,8 @@ The uvc function provides these attributes in its function directory:
> function_name name of the interface
> req_int_skip_div divisor of total requests to aid in calculating
> interrupt frequency, 0 indicates all interrupt
> + sg_supported allow for scatter gather to be used if the UDC
> + hw supports it
> =================== ================================================
>
> There are also "control" and "streaming" subdirectories, each of which contain
> diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c
> index e40ca26b9c55..d08ebe3ffeb2 100644
> --- a/drivers/usb/gadget/function/f_uvc.c
> +++ b/drivers/usb/gadget/function/f_uvc.c
> @@ -656,6 +656,7 @@ uvc_function_bind(struct usb_configuration *c, struct usb_function *f)
> (opts->streaming_maxburst + 1));
>
> uvc->config_skip_int_div = opts->req_int_skip_div;
> + uvc->video.queue.use_sg = opts->sg_supported;
>
> /* Allocate endpoints. */
> ep = usb_ep_autoconfig(cdev->gadget, &uvc_control_ep);
> @@ -875,6 +876,7 @@ static struct usb_function_instance *uvc_alloc_inst(void)
> opts->streaming_interval = 1;
> opts->streaming_maxpacket = 1024;
> opts->req_int_skip_div = 4;
> + opts->sg_supported = 1;
> snprintf(opts->function_name, sizeof(opts->function_name), "UVC Camera");
>
> ret = uvcg_attach_configfs(opts);
> diff --git a/drivers/usb/gadget/function/u_uvc.h b/drivers/usb/gadget/function/u_uvc.h
> index 6f73bd5638ed..5ccced629925 100644
> --- a/drivers/usb/gadget/function/u_uvc.h
> +++ b/drivers/usb/gadget/function/u_uvc.h
> @@ -25,6 +25,7 @@ struct f_uvc_opts {
> unsigned int streaming_maxpacket;
> unsigned int streaming_maxburst;
> unsigned int req_int_skip_div;
> + unsigned int sg_supported;
>
> unsigned int control_interface;
> unsigned int streaming_interface;
> diff --git a/drivers/usb/gadget/function/uvc_configfs.c b/drivers/usb/gadget/function/uvc_configfs.c
> index 419e926ab57e..3784c0e02d01 100644
> --- a/drivers/usb/gadget/function/uvc_configfs.c
> +++ b/drivers/usb/gadget/function/uvc_configfs.c
> @@ -2351,6 +2351,7 @@ UVCG_OPTS_ATTR(streaming_interval, streaming_interval, 16);
> UVCG_OPTS_ATTR(streaming_maxpacket, streaming_maxpacket, 3072);
> UVCG_OPTS_ATTR(streaming_maxburst, streaming_maxburst, 15);
> UVCG_OPTS_ATTR(req_int_skip_div, req_int_skip_div, UINT_MAX);
> +UVCG_OPTS_ATTR(sg_supported, sg_supported, 1);
>
> #undef UVCG_OPTS_ATTR
>
> @@ -2401,6 +2402,7 @@ static struct configfs_attribute *uvc_attrs[] = {
> &f_uvc_opts_attr_streaming_maxpacket,
> &f_uvc_opts_attr_streaming_maxburst,
> &f_uvc_opts_attr_req_int_skip_div,
> + &f_uvc_opts_attr_sg_supported,
> &f_uvc_opts_string_attr_function_name,
> NULL,
> };
> diff --git a/drivers/usb/gadget/function/uvc_queue.c b/drivers/usb/gadget/function/uvc_queue.c
> index 02559906a55a..3c7aa5c4bba2 100644
> --- a/drivers/usb/gadget/function/uvc_queue.c
> +++ b/drivers/usb/gadget/function/uvc_queue.c
> @@ -149,11 +149,11 @@ int uvcg_queue_init(struct uvc_video_queue *queue, struct device *dev, enum v4l2
> queue->queue.buf_struct_size = sizeof(struct uvc_buffer);
> queue->queue.ops = &uvc_queue_qops;
> queue->queue.lock = lock;
> - if (cdev->gadget->sg_supported) {
> + if (queue->use_sg && cdev->gadget->sg_supported) {
> queue->queue.mem_ops = &vb2_dma_sg_memops;
> - queue->use_sg = 1;
> } else {
> queue->queue.mem_ops = &vb2_vmalloc_memops;
> + queue->use_sg = false;
> }
>
> queue->queue.timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY

2022-10-18 14:54:36

by Dan Scally

Subject: Re: [PATCH v3 6/6] usb: gadget: uvc: add configfs option for sg support

Hi Michael

On 18/10/2022 15:04, Michael Grzeschik wrote:
> Hi Dan!
> Hi Dan!
>
> On Tue, Oct 18, 2022 at 02:27:13PM +0100, Dan Scally wrote:
>> Hi Dan
>>
>> On 17/10/2022 21:54, Dan Vacura wrote:
>>> The scatter gather support doesn't appear to work well with some UDC
>>> hw.
>>> Add the ability to turn on the feature depending on the controller in
>>> use.
>>>
>>> Signed-off-by: Dan Vacura <[email protected]>
>>
>>
>> Nitpick: I would call it use_sg everywhere, but either way:
>
> Or even only "scatter_gather". How does that sound?
>
>>
>> Reviewed-by: Daniel Scally <[email protected]>
>>
>> Tested-by: Daniel Scally <[email protected]>
>>
>>> ---
>>> V1 -> V2:
>>> - no change, new patch in serie
>>> V2 -> V3:
>>> - default on, same as baseline
>>>
>>>  Documentation/ABI/testing/configfs-usb-gadget-uvc | 1 +
>>>  Documentation/usb/gadget-testing.rst              | 2 ++
>>>  drivers/usb/gadget/function/f_uvc.c               | 2 ++
>>>  drivers/usb/gadget/function/u_uvc.h               | 1 +
>>>  drivers/usb/gadget/function/uvc_configfs.c        | 2 ++
>>>  drivers/usb/gadget/function/uvc_queue.c           | 4 ++--
>>>  6 files changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/Documentation/ABI/testing/configfs-usb-gadget-uvc
>>> b/Documentation/ABI/testing/configfs-usb-gadget-uvc
>>> index 5dfaa3f7f6a4..839a75fc28ee 100644
>>> --- a/Documentation/ABI/testing/configfs-usb-gadget-uvc
>>> +++ b/Documentation/ABI/testing/configfs-usb-gadget-uvc
>>> @@ -9,6 +9,7 @@ Description:    UVC function directory
>>>          streaming_interval    1..16
>>>          function_name        string [32]
>>>          req_int_skip_div    unsigned int
>>> +        sg_supported        0..1
>>>          ===================    =============================
>>>  What: /config/usb-gadget/gadget/functions/uvc.name/control
>>> diff --git a/Documentation/usb/gadget-testing.rst
>>> b/Documentation/usb/gadget-testing.rst
>>> index f9b5a09be1f4..8e3072d6a590 100644
>>> --- a/Documentation/usb/gadget-testing.rst
>>> +++ b/Documentation/usb/gadget-testing.rst
>>> @@ -796,6 +796,8 @@ The uvc function provides these attributes in
>>> its function directory:
>>>      function_name       name of the interface
>>>      req_int_skip_div    divisor of total requests to aid in
>>> calculating
>>>                  interrupt frequency, 0 indicates all interrupt
>>> +    sg_supported        allow for scatter gather to be used if the UDC
>>> +                hw supports it
>>>      ===================
>>> ================================================
>>>  There are also "control" and "streaming" subdirectories, each of
>>> which contain
>>> diff --git a/drivers/usb/gadget/function/f_uvc.c
>>> b/drivers/usb/gadget/function/f_uvc.c
>>> index e40ca26b9c55..d08ebe3ffeb2 100644
>>> --- a/drivers/usb/gadget/function/f_uvc.c
>>> +++ b/drivers/usb/gadget/function/f_uvc.c
>>> @@ -656,6 +656,7 @@ uvc_function_bind(struct usb_configuration *c,
>>> struct usb_function *f)
>>>                  (opts->streaming_maxburst + 1));
>>>      uvc->config_skip_int_div = opts->req_int_skip_div;
>>> +    uvc->video.queue.use_sg = opts->sg_supported;
>
> Why do you set this here?
>
>>>      /* Allocate endpoints. */
>>>      ep = usb_ep_autoconfig(cdev->gadget, &uvc_control_ep);
>>> @@ -875,6 +876,7 @@ static struct usb_function_instance
>>> *uvc_alloc_inst(void)
>>>      opts->streaming_interval = 1;
>>>      opts->streaming_maxpacket = 1024;
>>>      opts->req_int_skip_div = 4;
>>> +    opts->sg_supported = 1;
>>>      snprintf(opts->function_name, sizeof(opts->function_name), "UVC
>>> Camera");
>>>      ret = uvcg_attach_configfs(opts);
>>> diff --git a/drivers/usb/gadget/function/u_uvc.h
>>> b/drivers/usb/gadget/function/u_uvc.h
>>> index 6f73bd5638ed..5ccced629925 100644
>>> --- a/drivers/usb/gadget/function/u_uvc.h
>>> +++ b/drivers/usb/gadget/function/u_uvc.h
>>> @@ -25,6 +25,7 @@ struct f_uvc_opts {
>>>      unsigned int                    streaming_maxpacket;
>>>      unsigned int                    streaming_maxburst;
>>>      unsigned int                    req_int_skip_div;
>>> +    unsigned int                    sg_supported;
>>>      unsigned int                    control_interface;
>>>      unsigned int                    streaming_interface;
>>> diff --git a/drivers/usb/gadget/function/uvc_configfs.c
>>> b/drivers/usb/gadget/function/uvc_configfs.c
>>> index 419e926ab57e..3784c0e02d01 100644
>>> --- a/drivers/usb/gadget/function/uvc_configfs.c
>>> +++ b/drivers/usb/gadget/function/uvc_configfs.c
>>> @@ -2351,6 +2351,7 @@ UVCG_OPTS_ATTR(streaming_interval,
>>> streaming_interval, 16);
>>>  UVCG_OPTS_ATTR(streaming_maxpacket, streaming_maxpacket, 3072);
>>>  UVCG_OPTS_ATTR(streaming_maxburst, streaming_maxburst, 15);
>>>  UVCG_OPTS_ATTR(req_int_skip_div, req_int_skip_div, UINT_MAX);
>>> +UVCG_OPTS_ATTR(sg_supported, sg_supported, 1);
>>>  #undef UVCG_OPTS_ATTR
>>> @@ -2401,6 +2402,7 @@ static struct configfs_attribute *uvc_attrs[] = {
>>>      &f_uvc_opts_attr_streaming_maxpacket,
>>>      &f_uvc_opts_attr_streaming_maxburst,
>>>      &f_uvc_opts_attr_req_int_skip_div,
>>> +    &f_uvc_opts_attr_sg_supported,
>>>      &f_uvc_opts_string_attr_function_name,
>>>      NULL,
>>>  };
>>> diff --git a/drivers/usb/gadget/function/uvc_queue.c
>>> b/drivers/usb/gadget/function/uvc_queue.c
>>> index 02559906a55a..3c7aa5c4bba2 100644
>>> --- a/drivers/usb/gadget/function/uvc_queue.c
>>> +++ b/drivers/usb/gadget/function/uvc_queue.c
>>> @@ -149,11 +149,11 @@ int uvcg_queue_init(struct uvc_video_queue
>>> *queue, struct device *dev, enum v4l2
>>>      queue->queue.buf_struct_size = sizeof(struct uvc_buffer);
>>>      queue->queue.ops = &uvc_queue_qops;
>>>      queue->queue.lock = lock;
>>> -    if (cdev->gadget->sg_supported) {
>>> +    if (queue->use_sg && cdev->gadget->sg_supported) {
>>>          queue->queue.mem_ops = &vb2_dma_sg_memops;
>>> -        queue->use_sg = 1;
>>>      } else {
>>>          queue->queue.mem_ops = &vb2_vmalloc_memops;
>>> +        queue->use_sg = false;
>
> I am unsure, but can you actually not always use vb2_dma_sg_memops.


We have problems with this on the imx8mp EVK, because vb2_dma_sg_memops
doesn't account for dma-range properties in the device tree and so will
attempt to allocate memory outside the 4GB range that a DWC3 can access.
The allocation attempts to fall back on the swiotlb, but that's slow even
if it works...and it can fail and return -ENOMEM if the swiotlb is too
small.


Probably some future work needs to fix the vb2_dma_sg_memops to take
those dma constraints into account, but for now I wouldn't force its use.
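
To illustrate the constraint (a sketch only, not kernel code;
buffer_reachable() and its parameters are invented for this example): a
buffer is directly usable by the controller only if its whole range fits
below the addressing limit, 4GB here. Anything else would have to bounce
through the swiotlb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: can a device that addresses `addr_bits` bits of
 * physical memory reach the whole buffer [addr, addr + len)?
 */
static bool buffer_reachable(uint64_t addr, uint64_t len,
			     unsigned int addr_bits)
{
	uint64_t limit;

	assert(addr_bits >= 1 && addr_bits <= 64);

	/* Highest address the device can generate. */
	limit = (addr_bits == 64) ? UINT64_MAX
				  : ((UINT64_C(1) << addr_bits) - 1);

	if (len == 0)
		return addr <= limit;

	/* Written this way to avoid overflow in addr + len. */
	return addr <= limit && len - 1 <= limit - addr;
}
```

With addr_bits = 32, a buffer ending at 0xFFFFFFFF is fine, while one
starting at 0x100000000 is not.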

>
> With my last patch we always set buf->mem to vb2_plane_vaddr(vb, 0);
>
> https://lore.kernel.org/linux-usb/[email protected]/T/#u
>
>
>
> The condition to decide if encode_isoc_sg or encode_isoc should then
> remain the last place to switch between sg or not. I would hook the
> userspace decision in here.
>
> You can also directly get to opts->scatter_gather by using
>
>     struct f_uvc_opts *opts = fi_to_f_uvc_opts(uvc->func.fi);
>
> in the function uvcg_video_enable.
>
>>>      }
>>>      queue->queue.timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY
>>
>
> Thanks,
> Michael
>

2022-10-18 15:12:46

by Dan Vacura

Subject: Re: [PATCH v3 6/6] usb: gadget: uvc: add configfs option for sg support

Hi Dan and Michael,

On Tue, Oct 18, 2022 at 03:10:38PM +0100, Dan Scally wrote:
> Hi Michael - again!
>
> On 18/10/2022 15:04, Michael Grzeschik wrote:
> > Hi Dan!
> > Hi Dan!
> >
> > On Tue, Oct 18, 2022 at 02:27:13PM +0100, Dan Scally wrote:
> > > Hi Dan
> > >
> > > On 17/10/2022 21:54, Dan Vacura wrote:
> > > > The scatter gather support doesn't appear to work well with some
> > > > UDC hw.
> > > > Add the ability to turn on the feature depending on the controller in
> > > > use.
> > > >
> > > > Signed-off-by: Dan Vacura <[email protected]>
> > >
> > >
> > > Nitpick: I would call it use_sg everywhere, but either way:
> >
> > Or even only "scatter_gather". How does that sound?
>
>
> I think I prefer use_sg actually, but I don't have a strong feeling either
> way.

I went with sg_supported since use_sg and scatter_gather may imply that
the feature is guaranteed to be used if set to 1. I thought that
sg_supported conveyed that the driver supports sg and that it'll be
used if the UDC driver supports it too. Also, that name is used in
struct usb_gadget, with similar wording: "true if we can handle
scatter-gather".
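
To spell out the intent: the configfs value only permits scatter-gather,
and the UDC still has to advertise it. A minimal standalone sketch of
that decision follows. The enum and function names are placeholders for
illustration, not driver symbols, and the two outcomes stand in for
vb2_dma_sg_memops and vb2_vmalloc_memops from the uvc_queue.c hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder for the two vb2 memory backends chosen in uvcg_queue_init(). */
enum queue_mem_backend {
	BACKEND_VMALLOC,	/* stands in for vb2_vmalloc_memops */
	BACKEND_DMA_SG,		/* stands in for vb2_dma_sg_memops */
};

/*
 * Hypothetical model of the check in the patch:
 *   opts_sg_supported - value written to the configfs attribute (0 or 1)
 *   udc_sg_supported  - what the UDC driver reports in gadget->sg_supported
 * Scatter-gather is used only when both agree.
 */
static enum queue_mem_backend pick_backend(unsigned int opts_sg_supported,
					   bool udc_sg_supported)
{
	assert(opts_sg_supported <= 1);	/* the attribute is limited to 0..1 */

	if (opts_sg_supported && udc_sg_supported)
		return BACKEND_DMA_SG;

	return BACKEND_VMALLOC;
}
```

So writing 1 to sg_supported on a UDC without sg support still ends up
on the vmalloc path, which is why the name avoids promising "use".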

>
> >
> > >
> > > Reviewed-by: Daniel Scally <[email protected]>
> > >
> > > Tested-by: Daniel Scally <[email protected]>
> > >
> > > > ---
> > > > V1 -> V2:
> > > > - no change, new patch in serie
> > > > V2 -> V3:
> > > > - default on, same as baseline
> > > >
> > > >  Documentation/ABI/testing/configfs-usb-gadget-uvc | 1 +
> > > >  Documentation/usb/gadget-testing.rst              | 2 ++
> > > >  drivers/usb/gadget/function/f_uvc.c               | 2 ++
> > > >  drivers/usb/gadget/function/u_uvc.h               | 1 +
> > > >  drivers/usb/gadget/function/uvc_configfs.c        | 2 ++
> > > >  drivers/usb/gadget/function/uvc_queue.c           | 4 ++--
> > > >  6 files changed, 10 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/Documentation/ABI/testing/configfs-usb-gadget-uvc
> > > > b/Documentation/ABI/testing/configfs-usb-gadget-uvc
> > > > index 5dfaa3f7f6a4..839a75fc28ee 100644
> > > > --- a/Documentation/ABI/testing/configfs-usb-gadget-uvc
> > > > +++ b/Documentation/ABI/testing/configfs-usb-gadget-uvc
> > > > @@ -9,6 +9,7 @@ Description:    UVC function directory
> > > >          streaming_interval    1..16
> > > >          function_name        string [32]
> > > >          req_int_skip_div    unsigned int
> > > > +        sg_supported        0..1
> > > >          ===================    =============================
> > > >  What: /config/usb-gadget/gadget/functions/uvc.name/control
> > > > diff --git a/Documentation/usb/gadget-testing.rst
> > > > b/Documentation/usb/gadget-testing.rst
> > > > index f9b5a09be1f4..8e3072d6a590 100644
> > > > --- a/Documentation/usb/gadget-testing.rst
> > > > +++ b/Documentation/usb/gadget-testing.rst
> > > > @@ -796,6 +796,8 @@ The uvc function provides these attributes
> > > > in its function directory:
> > > >      function_name       name of the interface
> > > >      req_int_skip_div    divisor of total requests to aid in
> > > > calculating
> > > >                  interrupt frequency, 0 indicates all interrupt
> > > > +    sg_supported        allow for scatter gather to be used if the UDC
> > > > +                hw supports it
> > > >      ===================
> > > > ================================================
> > > >  There are also "control" and "streaming" subdirectories, each
> > > > of which contain
> > > > diff --git a/drivers/usb/gadget/function/f_uvc.c
> > > > b/drivers/usb/gadget/function/f_uvc.c
> > > > index e40ca26b9c55..d08ebe3ffeb2 100644
> > > > --- a/drivers/usb/gadget/function/f_uvc.c
> > > > +++ b/drivers/usb/gadget/function/f_uvc.c
> > > > @@ -656,6 +656,7 @@ uvc_function_bind(struct usb_configuration
> > > > *c, struct usb_function *f)
> > > >                  (opts->streaming_maxburst + 1));
> > > >      uvc->config_skip_int_div = opts->req_int_skip_div;
> > > > +    uvc->video.queue.use_sg = opts->sg_supported;
> >
> > Why do you set this here?

This is set here to enable or disable the support. I wasn't aware of
direct access to opts in uvcg_queue_init(). I'll update to use it
directly, as that'll be clearer.

> >
> > > >      /* Allocate endpoints. */
> > > >      ep = usb_ep_autoconfig(cdev->gadget, &uvc_control_ep);
> > > > @@ -875,6 +876,7 @@ static struct usb_function_instance
> > > > *uvc_alloc_inst(void)
> > > >      opts->streaming_interval = 1;
> > > >      opts->streaming_maxpacket = 1024;
> > > >      opts->req_int_skip_div = 4;
> > > > +    opts->sg_supported = 1;
> > > >      snprintf(opts->function_name, sizeof(opts->function_name),
> > > > "UVC Camera");
> > > >      ret = uvcg_attach_configfs(opts);
> > > > diff --git a/drivers/usb/gadget/function/u_uvc.h
> > > > b/drivers/usb/gadget/function/u_uvc.h
> > > > index 6f73bd5638ed..5ccced629925 100644
> > > > --- a/drivers/usb/gadget/function/u_uvc.h
> > > > +++ b/drivers/usb/gadget/function/u_uvc.h
> > > > @@ -25,6 +25,7 @@ struct f_uvc_opts {
> > > > ???? unsigned int??????????????????? streaming_maxpacket;
> > > > ???? unsigned int??????????????????? streaming_maxburst;
> > > > ???? unsigned int??????????????????? req_int_skip_div;
> > > > +??? unsigned int??????????????????? sg_supported;
> > > > ???? unsigned int??????????????????? control_interface;
> > > > ???? unsigned int??????????????????? streaming_interface;
> > > > diff --git a/drivers/usb/gadget/function/uvc_configfs.c
> > > > b/drivers/usb/gadget/function/uvc_configfs.c
> > > > index 419e926ab57e..3784c0e02d01 100644
> > > > --- a/drivers/usb/gadget/function/uvc_configfs.c
> > > > +++ b/drivers/usb/gadget/function/uvc_configfs.c
> > > > @@ -2351,6 +2351,7 @@ UVCG_OPTS_ATTR(streaming_interval,
> > > > streaming_interval, 16);
> > > > ?UVCG_OPTS_ATTR(streaming_maxpacket, streaming_maxpacket, 3072);
> > > > ?UVCG_OPTS_ATTR(streaming_maxburst, streaming_maxburst, 15);
> > > > ?UVCG_OPTS_ATTR(req_int_skip_div, req_int_skip_div, UINT_MAX);
> > > > +UVCG_OPTS_ATTR(sg_supported, sg_supported, 1);
> > > > ?#undef UVCG_OPTS_ATTR
> > > > @@ -2401,6 +2402,7 @@ static struct configfs_attribute *uvc_attrs[] = {
> > > > ???? &f_uvc_opts_attr_streaming_maxpacket,
> > > > ???? &f_uvc_opts_attr_streaming_maxburst,
> > > > ???? &f_uvc_opts_attr_req_int_skip_div,
> > > > +??? &f_uvc_opts_attr_sg_supported,
> > > > ???? &f_uvc_opts_string_attr_function_name,
> > > > ???? NULL,
> > > > ?};
> > > > diff --git a/drivers/usb/gadget/function/uvc_queue.c
> > > > b/drivers/usb/gadget/function/uvc_queue.c
> > > > index 02559906a55a..3c7aa5c4bba2 100644
> > > > --- a/drivers/usb/gadget/function/uvc_queue.c
> > > > +++ b/drivers/usb/gadget/function/uvc_queue.c
> > > > @@ -149,11 +149,11 @@ int uvcg_queue_init(struct uvc_video_queue
> > > > *queue, struct device *dev, enum v4l2
> > > > ???? queue->queue.buf_struct_size = sizeof(struct uvc_buffer);
> > > > ???? queue->queue.ops = &uvc_queue_qops;
> > > > ???? queue->queue.lock = lock;
> > > > -??? if (cdev->gadget->sg_supported) {
> > > > +??? if (queue->use_sg && cdev->gadget->sg_supported) {
> > > > ???????? queue->queue.mem_ops = &vb2_dma_sg_memops;
> > > > -??????? queue->use_sg = 1;
> > > > ???? } else {
> > > > ???????? queue->queue.mem_ops = &vb2_vmalloc_memops;
> > > > +??????? queue->use_sg = false;
> >
> > I am unsure, but can you not actually always use vb2_dma_sg_memops?
> >
> > With my last patch we always set buf->mem to vb2_plane_vaddr(vb, 0);
> >
> > https://lore.kernel.org/linux-usb/[email protected]/T/#u
> >
> >
> >
> > The condition to decide if encode_isoc_sg or encode_isoc should then
> > remain the last place to switch between sg or not. I would hook the
> > userspace decision in here.
> >
> > You can also directly get to opts->scatter_gather by using
> >
> >     struct f_uvc_opts *opts = fi_to_f_uvc_opts(uvc->func.fi);
> >
> > in the function uvcg_video_enable.
> >
> > > >      }
> > > > ???? queue->queue.timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY
> > >
> >
> > Thanks,
> > Michael
> >

2022-10-18 15:50:19

by Alan Stern

Subject: Re: [PATCH v3 6/6] usb: gadget: uvc: add configfs option for sg support

On Tue, Oct 18, 2022 at 02:27:13PM +0100, Dan Scally wrote:
> Hi Dan
>
> On 17/10/2022 21:54, Dan Vacura wrote:
> > The scatter gather support doesn't appear to work well with some UDC hw.
> > Add the ability to turn on the feature depending on the controller in
> > use.
> >
> > Signed-off-by: Dan Vacura <[email protected]>
>
>
> Nitpick: I would call it use_sg everywhere, but either way:
>
>
> Reviewed-by: Daniel Scally <[email protected]>
>
> Tested-by: Daniel Scally <[email protected]>
>
> > ---
> > V1 -> V2:
> > - no change, new patch in series
> > V2 -> V3:
> > - default on, same as baseline
> >
> > Documentation/ABI/testing/configfs-usb-gadget-uvc | 1 +
> > Documentation/usb/gadget-testing.rst | 2 ++
> > drivers/usb/gadget/function/f_uvc.c | 2 ++
> > drivers/usb/gadget/function/u_uvc.h | 1 +
> > drivers/usb/gadget/function/uvc_configfs.c | 2 ++
> > drivers/usb/gadget/function/uvc_queue.c | 4 ++--
> > 6 files changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/ABI/testing/configfs-usb-gadget-uvc b/Documentation/ABI/testing/configfs-usb-gadget-uvc
> > index 5dfaa3f7f6a4..839a75fc28ee 100644
> > --- a/Documentation/ABI/testing/configfs-usb-gadget-uvc
> > +++ b/Documentation/ABI/testing/configfs-usb-gadget-uvc
> > @@ -9,6 +9,7 @@ Description: UVC function directory
> > streaming_interval 1..16
> > function_name string [32]
> > req_int_skip_div unsigned int
> > + sg_supported 0..1
> > =================== =============================
> > What: /config/usb-gadget/gadget/functions/uvc.name/control
> > diff --git a/Documentation/usb/gadget-testing.rst b/Documentation/usb/gadget-testing.rst
> > index f9b5a09be1f4..8e3072d6a590 100644
> > --- a/Documentation/usb/gadget-testing.rst
> > +++ b/Documentation/usb/gadget-testing.rst
> > @@ -796,6 +796,8 @@ The uvc function provides these attributes in its function directory:
> > function_name name of the interface
> > req_int_skip_div divisor of total requests to aid in calculating
> > interrupt frequency, 0 indicates all interrupt
> > + sg_supported allow for scatter gather to be used if the UDC
> > + hw supports it

Why is a configuration option needed for this? Why not always use SG
when the UDC supports it? Or at least, make the decision automatically
(say, based on the amount of data to be transferred) with no need for
any user input?

Is this because the SG support in some UDC drivers is buggy? In that
case the proper approach is to fix the UDC drivers, not add new options
that users won't know when to use.

Or is it because the UDC hardware itself is buggy? In that case the
best approach is to fix the UDC drivers so that they don't advertise
working SG support when the hardware is unable to handle it.

Alan Stern

2022-10-18 16:04:46

by Alan Stern

Subject: Re: [PATCH v3 6/6] usb: gadget: uvc: add configfs option for sg support

On Tue, Oct 18, 2022 at 10:14:54AM -0500, Dan Vacura wrote:
> Hi Alan,
>
> On Tue, Oct 18, 2022 at 10:32:33AM -0400, Alan Stern wrote:
> > On Tue, Oct 18, 2022 at 02:27:13PM +0100, Dan Scally wrote:
> > > Hi Dan

> > > > --- a/Documentation/usb/gadget-testing.rst
> > > > +++ b/Documentation/usb/gadget-testing.rst
> > > > @@ -796,6 +796,8 @@ The uvc function provides these attributes in its function directory:
> > > > function_name name of the interface
> > > > req_int_skip_div divisor of total requests to aid in calculating
> > > > interrupt frequency, 0 indicates all interrupt
> > > > + sg_supported allow for scatter gather to be used if the UDC
> > > > + hw supports it
> >
> > Why is a configuration option needed for this? Why not always use SG
> > when the UDC supports it? Or at least, make the decision automatically
> > (say, based on the amount of data to be transferred) with no need for
> > any user input?
>
> Patches for a fix and to select to use SG depending on amount of data
> are already submitted and under review. I agree, ideally we don't need
> this patch, but there have been several regressions uncovered with
> enabling this support and it takes time to root cause these issues.

Please put this information into the patch description, and maybe also
into the documentation file. For your readers' and reviewers' sake it's
important -- probably _more_ important -- to explain why you're making a
change than what that change is.

Alan Stern

> In my specific environment, Android GKI 2.0, changes need to get
> upstreamed first here before they're pulled into Android device
> software. Having this logic in place gives us the ability to turn off
> this functionality without going through this process. A revert was also
> considered until all the bugs are resolved, but the code is quite
> entrenched now to take out, plus others seem to benefit from it being
> enabled. Thus the configurability.

2022-10-18 16:24:57

by Michael Grzeschik

Subject: Re: [PATCH v3 6/6] usb: gadget: uvc: add configfs option for sg support

Hi Dan!

On Tue, Oct 18, 2022 at 10:14:54AM -0500, Dan Vacura wrote:
>On Tue, Oct 18, 2022 at 10:32:33AM -0400, Alan Stern wrote:
>> On Tue, Oct 18, 2022 at 02:27:13PM +0100, Dan Scally wrote:
>> > On 17/10/2022 21:54, Dan Vacura wrote:
>> > > The scatter gather support doesn't appear to work well with some UDC hw.
>> > > Add the ability to turn on the feature depending on the controller in
>> > > use.
>> > >
>> > > Signed-off-by: Dan Vacura <[email protected]>
>> >
>> >
>> > Nitpick: I would call it use_sg everywhere, but either way:
>> >
>> >
>> > Reviewed-by: Daniel Scally <[email protected]>
>> >
>> > Tested-by: Daniel Scally <[email protected]>
>> >
>> > > ---
>> > > V1 -> V2:
>> > > - no change, new patch in series
>> > > V2 -> V3:
>> > > - default on, same as baseline
>> > >
>> > > Documentation/ABI/testing/configfs-usb-gadget-uvc | 1 +
>> > > Documentation/usb/gadget-testing.rst | 2 ++
>> > > drivers/usb/gadget/function/f_uvc.c | 2 ++
>> > > drivers/usb/gadget/function/u_uvc.h | 1 +
>> > > drivers/usb/gadget/function/uvc_configfs.c | 2 ++
>> > > drivers/usb/gadget/function/uvc_queue.c | 4 ++--
>> > > 6 files changed, 10 insertions(+), 2 deletions(-)
>> > >
>> > > diff --git a/Documentation/ABI/testing/configfs-usb-gadget-uvc b/Documentation/ABI/testing/configfs-usb-gadget-uvc
>> > > index 5dfaa3f7f6a4..839a75fc28ee 100644
>> > > --- a/Documentation/ABI/testing/configfs-usb-gadget-uvc
>> > > +++ b/Documentation/ABI/testing/configfs-usb-gadget-uvc
>> > > @@ -9,6 +9,7 @@ Description: UVC function directory
>> > > streaming_interval 1..16
>> > > function_name string [32]
>> > > req_int_skip_div unsigned int
>> > > + sg_supported 0..1
>> > > =================== =============================
>> > > What: /config/usb-gadget/gadget/functions/uvc.name/control
>> > > diff --git a/Documentation/usb/gadget-testing.rst b/Documentation/usb/gadget-testing.rst
>> > > index f9b5a09be1f4..8e3072d6a590 100644
>> > > --- a/Documentation/usb/gadget-testing.rst
>> > > +++ b/Documentation/usb/gadget-testing.rst
>> > > @@ -796,6 +796,8 @@ The uvc function provides these attributes in its function directory:
>> > > function_name name of the interface
>> > > req_int_skip_div divisor of total requests to aid in calculating
>> > > interrupt frequency, 0 indicates all interrupt
>> > > + sg_supported allow for scatter gather to be used if the UDC
>> > > + hw supports it
>>
>> Why is a configuration option needed for this? Why not always use SG
>> when the UDC supports it? Or at least, make the decision automatically
>> (say, based on the amount of data to be transferred) with no need for
>> any user input?
>
>Patches for a fix and to select to use SG depending on amount of data
>are already submitted and under review. I agree, ideally we don't need
>this patch, but there have been several regressions uncovered with
>enabling this support and it takes time to root cause these issues.
>

>In my specific environment, Android GKI 2.0, changes need to get
>upstreamed first here before they're pulled into Android device
>software.

In fact this is actually a good policy, but adding workarounds to mainline
in order to "hopefully" fix the real problems later is probably not what this
policy is about. Hopefully!

Michael

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |



2022-10-18 16:28:34

by Dan Vacura

Subject: Re: [PATCH v3 6/6] usb: gadget: uvc: add configfs option for sg support

Hi Alan,

On Tue, Oct 18, 2022 at 10:32:33AM -0400, Alan Stern wrote:
> On Tue, Oct 18, 2022 at 02:27:13PM +0100, Dan Scally wrote:
> > Hi Dan
> >
> > On 17/10/2022 21:54, Dan Vacura wrote:
> > > The scatter gather support doesn't appear to work well with some UDC hw.
> > > Add the ability to turn on the feature depending on the controller in
> > > use.
> > >
> > > Signed-off-by: Dan Vacura <[email protected]>
> >
> >
> > Nitpick: I would call it use_sg everywhere, but either way:
> >
> >
> > Reviewed-by: Daniel Scally <[email protected]>
> >
> > Tested-by: Daniel Scally <[email protected]>
> >
> > > ---
> > > V1 -> V2:
> > > > - no change, new patch in series
> > > V2 -> V3:
> > > - default on, same as baseline
> > >
> > > Documentation/ABI/testing/configfs-usb-gadget-uvc | 1 +
> > > Documentation/usb/gadget-testing.rst | 2 ++
> > > drivers/usb/gadget/function/f_uvc.c | 2 ++
> > > drivers/usb/gadget/function/u_uvc.h | 1 +
> > > drivers/usb/gadget/function/uvc_configfs.c | 2 ++
> > > drivers/usb/gadget/function/uvc_queue.c | 4 ++--
> > > 6 files changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/Documentation/ABI/testing/configfs-usb-gadget-uvc b/Documentation/ABI/testing/configfs-usb-gadget-uvc
> > > index 5dfaa3f7f6a4..839a75fc28ee 100644
> > > --- a/Documentation/ABI/testing/configfs-usb-gadget-uvc
> > > +++ b/Documentation/ABI/testing/configfs-usb-gadget-uvc
> > > @@ -9,6 +9,7 @@ Description: UVC function directory
> > > streaming_interval 1..16
> > > function_name string [32]
> > > req_int_skip_div unsigned int
> > > + sg_supported 0..1
> > > =================== =============================
> > > What: /config/usb-gadget/gadget/functions/uvc.name/control
> > > diff --git a/Documentation/usb/gadget-testing.rst b/Documentation/usb/gadget-testing.rst
> > > index f9b5a09be1f4..8e3072d6a590 100644
> > > --- a/Documentation/usb/gadget-testing.rst
> > > +++ b/Documentation/usb/gadget-testing.rst
> > > @@ -796,6 +796,8 @@ The uvc function provides these attributes in its function directory:
> > > function_name name of the interface
> > > req_int_skip_div divisor of total requests to aid in calculating
> > > interrupt frequency, 0 indicates all interrupt
> > > + sg_supported allow for scatter gather to be used if the UDC
> > > + hw supports it
>
> Why is a configuration option needed for this? Why not always use SG
> when the UDC supports it? Or at least, make the decision automatically
> (say, based on the amount of data to be transferred) with no need for
> any user input?

Patches for a fix, and for selecting SG depending on the amount of data,
are already submitted and under review. I agree, ideally we don't need
this patch, but there have been several regressions uncovered with
enabling this support and it takes time to root cause these issues.

In my specific environment, Android GKI 2.0, changes need to get
upstreamed first here before they're pulled into Android device
software. Having this logic in place gives us the ability to turn off
this functionality without going through this process. A revert was also
considered until all the bugs are resolved, but the code is too
entrenched now to take out, plus others seem to benefit from it being
enabled. Thus the configurability.

>
> Is this because the SG support in some UDC drivers is buggy? In that
> case the proper approach is to fix the UDC drivers, not add new options
> that users won't know when to use.
>
> Or is it because the UDC hardware itself is buggy? In that case the
> best approach is to fix the UDC drivers so that they don't advertise
> working SG support when the hardware is unable to handle it.
>
> Alan Stern

2022-10-18 19:16:53

by Thinh Nguyen

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

Hi Dan,

On Mon, Oct 17, 2022, Dan Vacura wrote:
> Hi Thinh,
>
> On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
> > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > From: Jeff Vanhoof <[email protected]>
> > >
> > > arm-smmu related crashes seen after a Missed ISOC interrupt when
> > > no_interrupt=1 is used. This can happen if the hardware is still using
> > > the data associated with a TRB after the usb_request's ->complete call
> > > has been made. Instead of immediately releasing a request when a Missed
> > > ISOC interrupt has occurred, this change will add logic to cancel the
> > > request instead where it will eventually be released when the
> > > END_TRANSFER command has completed. This logic is similar to some of the
> > > cleanup done in dwc3_gadget_ep_dequeue.
> >
> > This doesn't sound right. How did you determine that the hardware is
> > still using the data associated with the TRB? Did you check the TRB's
> > HWO bit?
>
> The problem we're seeing was mentioned in the summary of this patch
> series, issue #1. Basically, with the following patch
> https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
> integrated a smmu panic is occurring on our Android device with the 5.15
> kernel which is:
>
> <3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
>
> The uvc gadget driver appears to be the first (and only) gadget that
> uses the no_interrupt=1 logic, so this seems to be a new condition for
> the dwc3 driver. In our configuration, we have up to 64 requests and the
> no_interrupt=1 for up to 15 requests. The list size of dep->started_list
> would get up to that amount when looping through to cleanup the
> completed requests. From testing and debugging the smmu panic occurs
> when a -EXDEV status shows up and right after
> dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
> we had was the requests were getting returned to the gadget too early.

As I mentioned, if the status is updated to missed isoc, that means that
the controller returned ownership of the TRB to the driver. At least for
the particular request with -EXDEV, its TRBs are completed. I'm not
clear on your conclusion.

Do we know where the crash occurred? Is it from the dwc3 driver or from the uvc
driver, and at what line? It'd be great if we could see the driver log.

>
> >
> > The dwc3 driver would only give back the requests if the TRBs of the
> > associated requests are completed or when the device is disconnected.
> > If the TRB indicated missed isoc, that means that the TRB is completed
> > and its status was updated.
>
> Interesting, the device is not disconnected as we don't get the
> -ESHUTDOWN status back and with this patch in place things continue
> after a -EXDEV status is received.
>

Actually, minor correction here: a recent change
b44c0e7fef51 ("usb: dwc3: gadget: conditionally remove requests")
changed the -ESHUTDOWN request status to -ECONNRESET when disabling an endpoint.
This doesn't look right.

While disabling endpoint may also apply for other cases such as
switching alternate interface in addition to disconnect, -ESHUTDOWN
seems more fitting there.

Hi Michael,

Can you help clarify the change above? This changed the usage of
requests. Now requests returned by disconnection won't be returned as
-ESHUTDOWN.

> >
> > There's a special case in which dwc3 may give back requests early: the
> > device disconnecting. The requests should be returned with
> > -ESHUTDOWN, and the gadget driver shouldn't be re-using the requests on
> > de-initialization anyway.
> >
> > We should not issue an End Transfer command just because of a missed isoc.
> > We may want to issue End Transfer if the gadget driver is too slow and unable
> > to feed requests in time (causing underrun and missed isoc) to resync
> > with the host, but we already handle that.
>
> Hmm, isn't that what happens when we get into this
> condition in dwc3_gadget_endpoint_trbs_complete():
>
> if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> list_empty(&dep->started_list) &&
> (list_empty(&dep->pending_list) || status == -EXDEV))
> dwc3_stop_active_transfer(dep, true, true);
>

Yes, it's being handled there.
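
To make the quoted condition easier to reason about, here is a small user-space sketch of the same predicate. The struct and function names are hypothetical; the list-empty checks are modelled as counters, and only the boolean logic is taken from the snippet quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the endpoint state the condition looks at. */
struct model_ep {
	bool is_isoc;		/* usb_endpoint_xfer_isoc() result */
	size_t started;		/* entries on dep->started_list */
	size_t pending;		/* entries on dep->pending_list */
};

/*
 * True when the quoted condition would stop the active transfer:
 * an isoc endpoint with nothing started, and either nothing pending
 * or a missed-isoc (-EXDEV) status.
 */
static bool model_should_stop_transfer(const struct model_ep *ep, int status)
{
	assert(ep != NULL);

	return ep->is_isoc &&
	       ep->started == 0 &&
	       (ep->pending == 0 || status == -EXDEV);
}
```

Note that in this model a missed isoc alone does not stop the transfer: requests still sitting on the started list keep it running.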

> >
> > I'm still not clear what's the problem you're seeing. Do you have the
> > crash log? Tracepoints?
> >
>
> Appreciate the support!
>

Thanks,
Thinh

2022-10-18 19:53:34

by Michael Grzeschik

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

Hi Thinh,

On Tue, Oct 18, 2022 at 06:45:40PM +0000, Thinh Nguyen wrote:
>On Mon, Oct 17, 2022, Dan Vacura wrote:
>> On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
>> > On Mon, Oct 17, 2022, Dan Vacura wrote:
>> > > From: Jeff Vanhoof <[email protected]>
>> > >
>> > > arm-smmu related crashes seen after a Missed ISOC interrupt when
>> > > no_interrupt=1 is used. This can happen if the hardware is still using
>> > > the data associated with a TRB after the usb_request's ->complete call
>> > > has been made. Instead of immediately releasing a request when a Missed
>> > > ISOC interrupt has occurred, this change will add logic to cancel the
>> > > request instead where it will eventually be released when the
>> > > END_TRANSFER command has completed. This logic is similar to some of the
>> > > cleanup done in dwc3_gadget_ep_dequeue.
>> >
>> > This doesn't sound right. How did you determine that the hardware is
>> > still using the data associated with the TRB? Did you check the TRB's
>> > HWO bit?
>>
>> The problem we're seeing was mentioned in the summary of this patch
>> series, issue #1. Basically, with the following patch
>> https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
>> integrated a smmu panic is occurring on our Android device with the 5.15
>> kernel which is:
>>
>> <3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
>>
>> The uvc gadget driver appears to be the first (and only) gadget that
>> uses the no_interrupt=1 logic, so this seems to be a new condition for
>> the dwc3 driver. In our configuration, we have up to 64 requests and the
>> no_interrupt=1 for up to 15 requests. The list size of dep->started_list
>> would get up to that amount when looping through to cleanup the
>> completed requests. From testing and debugging the smmu panic occurs
>> when a -EXDEV status shows up and right after
>> dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
>> we had was the requests were getting returned to the gadget too early.
>
>As I mentioned, if the status is updated to missed isoc, that means that
>the controller returned ownership of the TRB to the driver. At least for
>the particular request with -EXDEV, its TRBs are completed. I'm not
>clear on your conclusion.
>
>Do we know where the crash occurred? Is it from the dwc3 driver or from the uvc
>driver, and at what line? It'd be great if we could see the driver log.
>
>>
>> >
>> > The dwc3 driver would only give back the requests if the TRBs of the
>> > associated requests are completed or when the device is disconnected.
>> > If the TRB indicated missed isoc, that means that the TRB is completed
>> > and its status was updated.
>>
>> Interesting, the device is not disconnected as we don't get the
>> -ESHUTDOWN status back and with this patch in place things continue
>> after a -EXDEV status is received.
>>
>
>Actually, minor correction here: a recent change
>b44c0e7fef51 ("usb: dwc3: gadget: conditionally remove requests")
>changed the -ESHUTDOWN request status to -ECONNRESET when disabling an endpoint.
>This doesn't look right.
>
>While disabling endpoint may also apply for other cases such as
>switching alternate interface in addition to disconnect, -ESHUTDOWN
>seems more fitting there.
>
>Hi Michael,
>
>Can you help clarify the change above? This changed the usage of
>requests. Now requests returned by disconnection won't be returned as
>-ESHUTDOWN.

When writing the patch, I was looking into
Documentation/driver-api/usb/error-codes.rst.

After looking into it today, I see that ESHUTDOWN should be sent on
ep_disable (device disable) and ECONNRESET on stop_active_transfer.
So I probably just mixed them up while writing the patch. :/

The followup patch would then just be to swap the status results of
__dwc3_gadget_ep_disable and dwc3_stop_active_transfers on the
dwc3_remove_requests call.
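
The intended mapping can be sketched as a tiny user-space model, assuming the two teardown paths named above are the only ones of interest. The enum and function names are hypothetical and exist only for illustration; the status values come from the standard errno definitions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical labels for the two teardown paths discussed above. */
enum model_teardown {
	MODEL_EP_DISABLE,		/* endpoint or device is going away */
	MODEL_STOP_ACTIVE_TRANSFER,	/* transfer ended, endpoint stays enabled */
};

/* Status a request would be given back with once the codes are swapped. */
static int model_giveback_status(enum model_teardown reason)
{
	switch (reason) {
	case MODEL_EP_DISABLE:
		return -ESHUTDOWN;
	case MODEL_STOP_ACTIVE_TRANSFER:
		return -ECONNRESET;
	}

	assert(!"unknown teardown reason");
	return -EINVAL;
}
```

With this mapping, a gadget driver that sees -ESHUTDOWN can treat the request as dead, while -ECONNRESET leaves it free to requeue.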

Michael

>> >
>> > There's a special case in which dwc3 may give back requests early: the
>> > device disconnecting. The requests should be returned with
>> > -ESHUTDOWN, and the gadget driver shouldn't be re-using the requests on
>> > de-initialization anyway.
>> >
>> > We should not issue an End Transfer command just because of a missed isoc.
>> > We may want to issue End Transfer if the gadget driver is too slow and unable
>> > to feed requests in time (causing underrun and missed isoc) to resync
>> > with the host, but we already handle that.
>>
>> Hmm, isn't that what happens when we get into this
>> condition in dwc3_gadget_endpoint_trbs_complete():
>>
>> if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
>> list_empty(&dep->started_list) &&
>> (list_empty(&dep->pending_list) || status == -EXDEV))
>> dwc3_stop_active_transfer(dep, true, true);
>>
>
>Yes, it's being handled there.
>
>> >
>> > I'm still not clear what's the problem you're seeing. Do you have the
>> > crash log? Tracepoints?
>> >
>>
>> Appreciate the support!
>>
>
>Thanks,
>Thinh

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |



2022-10-18 21:18:20

by Jeffrey Vanhoof

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc


Hi Thinh,

On Tue, Oct 18, 2022 at 06:45:40PM +0000, Thinh Nguyen wrote:
> Hi Dan,
>
> On Mon, Oct 17, 2022, Dan Vacura wrote:
> > Hi Thinh,
> >
> > On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
> > > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > > From: Jeff Vanhoof <[email protected]>
> > > >
> > > > arm-smmu related crashes seen after a Missed ISOC interrupt when
> > > > no_interrupt=1 is used. This can happen if the hardware is still using
> > > > the data associated with a TRB after the usb_request's ->complete call
> > > > has been made. Instead of immediately releasing a request when a Missed
> > > > ISOC interrupt has occurred, this change will add logic to cancel the
> > > > request instead where it will eventually be released when the
> > > > END_TRANSFER command has completed. This logic is similar to some of the
> > > > cleanup done in dwc3_gadget_ep_dequeue.
> > >
> > > This doesn't sound right. How did you determine that the hardware is
> > > still using the data associated with the TRB? Did you check the TRB's
> > > HWO bit?
> >
> > The problem we're seeing was mentioned in the summary of this patch
> > series, issue #1. Basically, with the following patch
> > https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
> > integrated a smmu panic is occurring on our Android device with the 5.15
> > kernel which is:
> >
> > <3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
> >
> > The uvc gadget driver appears to be the first (and only) gadget that
> > uses the no_interrupt=1 logic, so this seems to be a new condition for
> > the dwc3 driver. In our configuration, we have up to 64 requests and the
> > no_interrupt=1 for up to 15 requests. The list size of dep->started_list
> > would get up to that amount when looping through to cleanup the
> > completed requests. From testing and debugging the smmu panic occurs
> > when a -EXDEV status shows up and right after
> > dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
> > we had was the requests were getting returned to the gadget too early.
>
> As I mentioned, if the status is updated to missed isoc, that means that
> the controller returned ownership of the TRB to the driver. At least for
> the particular request with -EXDEV, its TRBs are completed. I'm not
> clear on your conclusion.
>
> Do we know where the crash occurred? Is it from the dwc3 driver or from the uvc
> driver, and at what line? It'd be great if we could see the driver log.
>

To interject, what should happen in dwc3_gadget_ep_reclaim_completed_trb if the
IOC bit is not set (but the IMI bit is) and -EXDEV status is passed into it?
If the function returns 0, another attempt to reclaim may occur. If this
happens and the next request did have the HWO bit set, the function would
return 1 but dwc3_gadget_ep_cleanup_completed_request would still call
dwc3_gadget_giveback.

As a test (without this patch), I added a check to see if the HWO bit was set in
dwc3_gadget_ep_cleanup_completed_requests(). If the use case was ISOC and the
HWO bit was set, I avoided calling dwc3_gadget_ep_cleanup_completed_request().
This also seemed to avoid the iommu-related crash being seen.
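
The scenario described above can be sketched as a toy user-space model. It assumes one TRB per request, that reclaim returns 0 when neither HWO nor IOC is set, and that the give-back happens regardless of the reclaim result; the test guard is modelled as stopping the loop at the first hardware-owned TRB. All model_* names are hypothetical, and this is a simplification of the description, not the dwc3 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-TRB-per-request model. */
struct model_trb {
	bool hwo;	/* still owned by the controller */
	bool ioc;	/* interrupt on completion was requested */
};

struct model_req {
	struct model_trb trb;
	bool given_back;
};

/* Returns 1 to stop walking the started list, 0 to keep reclaiming. */
static int model_reclaim(const struct model_trb *trb)
{
	if (trb->hwo)
		return 1;
	if (trb->ioc)
		return 1;
	return 0;
}

/*
 * Walks the started list and gives requests back. Returns how many
 * requests were given back while their TRB was still hardware-owned.
 * With skip_if_hwo set, the loop stops before touching such a request.
 */
static size_t model_cleanup(struct model_req *reqs, size_t n, bool skip_if_hwo)
{
	size_t owned_givebacks = 0;

	assert(reqs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (skip_if_hwo && reqs[i].trb.hwo)
			break;

		int stop = model_reclaim(&reqs[i].trb);

		/* Give-back happens even when reclaim asked to stop. */
		reqs[i].given_back = true;
		if (reqs[i].trb.hwo)
			owned_givebacks++;

		if (stop)
			break;
	}

	return owned_givebacks;
}
```

In this model, a completed request without IOC followed by a hardware-owned one leads to exactly one early give-back unless the guard is enabled.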

Is there an issue in this area that needs to be corrected instead? Not having
interrupts set for each request may be causing some new issues to be uncovered.

As for the crash seen without this patch, no good stack trace is given. The line
reported for the crash varied a bit, but tended to be towards the end of
dwc3_stop_active_transfer() or dwc3_gadget_endpoint_trbs_complete().

Since dwc3_gadget_endpoint_trbs_complete() can be called from multiple
locations, I duplicated the function to help identify which path it was likely
being called from. At the time of the crashes seen,
dwc3_gadget_endpoint_transfer_in_progress() appeared to be the caller.

dwc3_gadget_endpoint_transfer_in_progress()
->dwc3_gadget_endpoint_trbs_complete() (crashed towards end of here)
->dwc3_stop_active_transfer() (sometimes crashed towards end of here)

I hope this clarifies things a bit.

> >
> > >
> > > The dwc3 driver would only give back the requests if the TRBs of the
> > > associated requests are completed or when the device is disconnected.
> > > If the TRB indicated missed isoc, that means that the TRB is completed
> > > and its status was updated.
> >
> > Interesting, the device is not disconnected as we don't get the
> > -ESHUTDOWN status back and with this patch in place things continue
> > after a -EXDEV status is received.
> >
>
> Actually, minor correction here: a recent change
> b44c0e7fef51 ("usb: dwc3: gadget: conditionally remove requests")
> changed the -ESHUTDOWN request status to -ECONNRESET when disabling an endpoint.
> This doesn't look right.
>
> While disabling endpoint may also apply for other cases such as
> switching alternate interface in addition to disconnect, -ESHUTDOWN
> seems more fitting there.
>
Btw, we don't have "usb: dwc3: gadget: conditionally remove requests" in our baseline.

> Hi Michael,
>
> Can you help clarify for the change above? This changed the usage of
> requests. Now requests returned by disconnection won't be returned as
> -ESHUTDOWN.
>
> > >
> > > There's a special case which dwc3 may give back requests early is the
> > > case of the device disconnecting. The requests should be returned with
> > > -ESHUTDOWN, and the gadget driver shouldn't be re-using the requests on
> > > de-initialization anyway.
> > >
> > > We should not issue End Transfer command just because of missed isoc. We
> > > may want issue End Transfer if the gadget driver is too slow and unable
> > > to feed requests in time (causing underrun and missed isoc) to resync
> > > with the host, but we already handle that.
> >
> > Hmm, isn't that what happens when we get into this
> > condition in dwc3_gadget_endpoint_trbs_complete():
> >
> > if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > list_empty(&dep->started_list) &&
> > (list_empty(&dep->pending_list) || status == -EXDEV))
> > dwc3_stop_active_transfer(dep, true, true);
> >
>
> Yes, it's being handled there.
>
> > >
> > > I'm still not clear what's the problem you're seeing. Do you have the
> > > crash log? Tracepoints?
> > >
> >
> > Appreciate the support!
> >
>
> Thanks,
> Thinh

Thanks,
Jeff

2022-10-18 22:53:02

by Thinh Nguyen

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

On Tue, Oct 18, 2022, Jeffrey Vanhoof wrote:
> Hi Thinh,
>
> On Tue, Oct 18, 2022 at 06:45:40PM +0000, Thinh Nguyen wrote:
> > Hi Dan,
> >
> > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > Hi Thinh,
> > >
> > > On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
> > > > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > > > From: Jeff Vanhoof <[email protected]>
> > > > >
> > > > > arm-smmu related crashes seen after a Missed ISOC interrupt when
> > > > > no_interrupt=1 is used. This can happen if the hardware is still using
> > > > > the data associated with a TRB after the usb_request's ->complete call
> > > > > has been made. Instead of immediately releasing a request when a Missed
> > > > > ISOC interrupt has occurred, this change will add logic to cancel the
> > > > > request instead where it will eventually be released when the
> > > > > END_TRANSFER command has completed. This logic is similar to some of the
> > > > > cleanup done in dwc3_gadget_ep_dequeue.
> > > >
> > > > This doesn't sound right. How did you determine that the hardware is
> > > > still using the data associated with the TRB? Did you check the TRB's
> > > > HWO bit?
> > >
> > > The problem we're seeing was mentioned in the summary of this patch
> > > series, issue #1. Basically, with the following patch
> > > > https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
> > > integrated a smmu panic is occurring on our Android device with the 5.15
> > > kernel which is:
> > >
> > > <3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
> > >
> > > The uvc gadget driver appears to be the first (and only) gadget that
> > > uses the no_interrupt=1 logic, so this seems to be a new condition for
> > > the dwc3 driver. In our configuration, we have up to 64 requests and the
> > > no_interrupt=1 for up to 15 requests. The list size of dep->started_list
> > > would get up to that amount when looping through to cleanup the
> > > completed requests. From testing and debugging the smmu panic occurs
> > > when a -EXDEV status shows up and right after
> > > dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
> > > we had was the requests were getting returned to the gadget too early.
> >
> > As I mentioned, if the status is updated to missed isoc, that means that
> > the controller returned ownership of the TRB to the driver. At least for
> > the particular request with -EXDEV, its TRBs are completed. I'm not
> > clear on your conclusion.
> >
> > Do we know where did the crash occur? Is it from dwc3 driver or from uvc
> > driver, and at what line? It'd great if we can see the driver log.
> >
>
> To interject, what should happen in dwc3_gadget_ep_reclaim_completed_trb if the
> IOC bit is not set (but the IMI bit is) and -EXDEV status is passed into it?

Hm... we may have overlooked this case for no_interrupt scenario. If IMI
is set, then there will be an interrupt when there's missed isoc
regardless of whether no_interrupt is set by the gadget driver.

> If the function returns 0, another attempt to reclaim may occur. If this
> happens and the next request did have the HWO bit set, the function would
> return 1 but dwc3_gadget_ep_cleanup_completed_request would still call
> dwc3_gadget_giveback.
>
> As a test (without this patch), I added a check to see if HWO bit was set in
> dwc3_gadget_ep_cleanup_completed_requests(). If the usecase was ISOC and the
> HWO bit was set I avoided calling dwc3_gadget_ep_cleanup_completed_request().
> This seemed to also avoid the iommu related crash being seen.
>
> Is there an issue in this area that needs to be corrected instead? Not having
> interrupts set for each request may be causing some new issues to be uncovered.
>
> As far as the crash seen without this patch, no good stacktrace is given. Line
> provided for crash varied a bit, but tended to appear towards the end of
> dwc3_stop_active_transfer() or dwc3_gadget_endpoint_trbs_complete().
>
> Since dwc3_gadget_endpoint_trbs_complete() can be called from multiple
> locations, I duplicated the function to help identify which path it was likely
> being called from. At the time of the crashes seen,
> dwc3_gadget_endpoint_transfer_in_progress() appeared to be the caller.
>
> dwc3_gadget_endpoint_transfer_in_progress()
> ->dwc3_gadget_endpoint_trbs_complete() (crashed towards end of here)
> ->dwc3_stop_active_transfer() (sometimes crashed towards end of here)
>
> I hope this clarifies things a bit.
>

Can we try this? Let me know if it resolves your issue.

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 61fba2b7389b..8352f4b5dd9f 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -3657,6 +3657,10 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
if (event->status & DEPEVT_STATUS_SHORT && !chain)
return 1;

+ if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
+ (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain)
+ return 1;
+
if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
(trb->ctrl & DWC3_TRB_CTRL_LST))
return 1;
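
For readers following along, the return-value logic in the hunk above can be
sketched as a small stand-alone function. This is only a model under stated
assumptions: sk_stop_reclaim() and the SK_* flag values are hypothetical names
invented for the sketch, not the driver's definitions, and only the three
checks visible in the hunk are reproduced.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values for this sketch only; not taken from the driver headers. */
#define SK_EVT_SHORT        (1u << 0)
#define SK_EVT_MISSED_ISOC  (1u << 1)
#define SK_TRB_IOC          (1u << 0)
#define SK_TRB_LST          (1u << 1)

/*
 * Mirrors only the checks visible in the hunk above, in the same order.
 * Returns 1 when the caller should stop reclaiming, 0 to continue with
 * the next TRB.
 */
static int sk_stop_reclaim(uint32_t evt_status, uint32_t trb_ctrl,
			   bool is_isoc, bool chain)
{
	if ((evt_status & SK_EVT_SHORT) && !chain)
		return 1;

	/* Proposed addition: a missed isoc event ends the walk at this TRB. */
	if (is_isoc && (evt_status & SK_EVT_MISSED_ISOC) && !chain)
		return 1;

	if (trb_ctrl & (SK_TRB_IOC | SK_TRB_LST))
		return 1;

	return 0;
}
```

In this model a TRB queued without IOC (the no_interrupt case) returns 0 on a
missed isoc event unless the added check is present, which is the situation
raised in the question above.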

Thanks,
Thinh

2022-10-18 23:26:28

by Thinh Nguyen

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

On Tue, Oct 18, 2022, Michael Grzeschik wrote:
> Hi Thinh,
>
> On Tue, Oct 18, 2022 at 06:45:40PM +0000, Thinh Nguyen wrote:
> > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
> > > > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > > > From: Jeff Vanhoof <[email protected]>
> > > > >
> > > > > arm-smmu related crashes seen after a Missed ISOC interrupt when
> > > > > no_interrupt=1 is used. This can happen if the hardware is still using
> > > > > the data associated with a TRB after the usb_request's ->complete call
> > > > > has been made. Instead of immediately releasing a request when a Missed
> > > > > ISOC interrupt has occurred, this change will add logic to cancel the
> > > > > request instead where it will eventually be released when the
> > > > > END_TRANSFER command has completed. This logic is similar to some of the
> > > > > cleanup done in dwc3_gadget_ep_dequeue.
> > > >
> > > > This doesn't sound right. How did you determine that the hardware is
> > > > still using the data associated with the TRB? Did you check the TRB's
> > > > HWO bit?
> > >
> > > The problem we're seeing was mentioned in the summary of this patch
> > > series, issue #1. Basically, with the following patch
> > > > https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
> > > integrated a smmu panic is occurring on our Android device with the 5.15
> > > kernel which is:
> > >
> > > <3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
> > >
> > > The uvc gadget driver appears to be the first (and only) gadget that
> > > uses the no_interrupt=1 logic, so this seems to be a new condition for
> > > the dwc3 driver. In our configuration, we have up to 64 requests and the
> > > no_interrupt=1 for up to 15 requests. The list size of dep->started_list
> > > would get up to that amount when looping through to cleanup the
> > > completed requests. From testing and debugging the smmu panic occurs
> > > when a -EXDEV status shows up and right after
> > > dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
> > > we had was the requests were getting returned to the gadget too early.
> >
> > As I mentioned, if the status is updated to missed isoc, that means that
> > the controller returned ownership of the TRB to the driver. At least for
> > the particular request with -EXDEV, its TRBs are completed. I'm not
> > clear on your conclusion.
> >
> > Do we know where did the crash occur? Is it from dwc3 driver or from uvc
> > driver, and at what line? It'd great if we can see the driver log.
> >
> > >
> > > >
> > > > The dwc3 driver would only give back the requests if the TRBs of the
> > > > associated requests are completed or when the device is disconnected.
> > > > If the TRB indicated missed isoc, that means that the TRB is completed
> > > > and its status was updated.
> > >
> > > Interesting, the device is not disconnected as we don't get the
> > > -ESHUTDOWN status back and with this patch in place things continue
> > > after a -EXDEV status is received.
> > >
> >
> > Actually, minor correction here: a recent change
> > b44c0e7fef51 ("usb: dwc3: gadget: conditionally remove requests")
> > changed -ESHUTDOWN request status to -ECONNRESET when disable endpoint.
> > This doesn't look right.
> >
> > While disabling endpoint may also apply for other cases such as
> > switching alternate interface in addition to disconnect, -ESHUTDOWN
> > seems more fitting there.
> >
> > Hi Michael,
> >
> > Can you help clarify for the change above? This changed the usage of
> > requests. Now requests returned by disconnection won't be returned as
> > -ESHUTDOWN.
>
> When writing the patch, I was looking into
> Documentation/driver-api/usb/error-codes.rst.
>
> After looking into it today, I see that ESHUTDOWN should be send on
> ep_disable (device disable) and ECONNRESET on stop_active_transfer.
> So I probably just mixed them up, while writing the patch. :/
>

I think you mean ECONNRESET for ep_dequeue()?
dwc3_stop_active_transfer() is called for both scenarios.

> The followup patch would then just be to swap the status results of
> __dwc3_gadget_ep_disable and dwc3_stop_active_transfers on the
> dwc3_remove_requests call.
>
> Michael

Can you help make a fix?

Thanks!
Thinh

2022-10-19 01:49:32

by Jeff Vanhoof

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

Hi Thinh,

On Tue, Oct 18, 2022 at 10:35:30PM +0000, Thinh Nguyen wrote:
> On Tue, Oct 18, 2022, Jeffrey Vanhoof wrote:
> > Hi Thinh,
> >
> > On Tue, Oct 18, 2022 at 06:45:40PM +0000, Thinh Nguyen wrote:
> > > Hi Dan,
> > >
> > > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > > Hi Thinh,
> > > >
> > > > On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
> > > > > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > > > > From: Jeff Vanhoof <[email protected]>
> > > > > >
> > > > > > arm-smmu related crashes seen after a Missed ISOC interrupt when
> > > > > > no_interrupt=1 is used. This can happen if the hardware is still using
> > > > > > the data associated with a TRB after the usb_request's ->complete call
> > > > > > has been made. Instead of immediately releasing a request when a Missed
> > > > > > ISOC interrupt has occurred, this change will add logic to cancel the
> > > > > > request instead where it will eventually be released when the
> > > > > > END_TRANSFER command has completed. This logic is similar to some of the
> > > > > > cleanup done in dwc3_gadget_ep_dequeue.
> > > > >
> > > > > This doesn't sound right. How did you determine that the hardware is
> > > > > still using the data associated with the TRB? Did you check the TRB's
> > > > > HWO bit?
> > > >
> > > > The problem we're seeing was mentioned in the summary of this patch
> > > > series, issue #1. Basically, with the following patch
> > > > > https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
> > > > integrated a smmu panic is occurring on our Android device with the 5.15
> > > > kernel which is:
> > > >
> > > > <3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
> > > >
> > > > The uvc gadget driver appears to be the first (and only) gadget that
> > > > uses the no_interrupt=1 logic, so this seems to be a new condition for
> > > > the dwc3 driver. In our configuration, we have up to 64 requests and the
> > > > no_interrupt=1 for up to 15 requests. The list size of dep->started_list
> > > > would get up to that amount when looping through to cleanup the
> > > > completed requests. From testing and debugging the smmu panic occurs
> > > > when a -EXDEV status shows up and right after
> > > > dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
> > > > we had was the requests were getting returned to the gadget too early.
> > >
> > > As I mentioned, if the status is updated to missed isoc, that means that
> > > the controller returned ownership of the TRB to the driver. At least for
> > > the particular request with -EXDEV, its TRBs are completed. I'm not
> > > clear on your conclusion.
> > >
> > > Do we know where did the crash occur? Is it from dwc3 driver or from uvc
> > > driver, and at what line? It'd great if we can see the driver log.
> > >
> >
> > To interject, what should happen in dwc3_gadget_ep_reclaim_completed_trb if the
> > IOC bit is not set (but the IMI bit is) and -EXDEV status is passed into it?
>
> Hm... we may have overlooked this case for no_interrupt scenario. If IMI
> is set, then there will be an interrupt when there's missed isoc
> regardless of whether no_interrupt is set by the gadget driver.
>
> > If the function returns 0, another attempt to reclaim may occur. If this
> > happens and the next request did have the HWO bit set, the function would
> > return 1 but dwc3_gadget_ep_cleanup_completed_request would still call
> > dwc3_gadget_giveback.
> >
> > As a test (without this patch), I added a check to see if HWO bit was set in
> > dwc3_gadget_ep_cleanup_completed_requests(). If the usecase was ISOC and the
> > HWO bit was set I avoided calling dwc3_gadget_ep_cleanup_completed_request().
> > This seemed to also avoid the iommu related crash being seen.
> >
> > Is there an issue in this area that needs to be corrected instead? Not having
> > interrupts set for each request may be causing some new issues to be uncovered.
> >
> > As far as the crash seen without this patch, no good stacktrace is given. Line
> > provided for crash varied a bit, but tended to appear towards the end of
> > dwc3_stop_active_transfer() or dwc3_gadget_endpoint_trbs_complete().
> >
> > Since dwc3_gadget_endpoint_trbs_complete() can be called from multiple
> > locations, I duplicated the function to help identify which path it was likely
> > being called from. At the time of the crashes seen,
> > dwc3_gadget_endpoint_transfer_in_progress() appeared to be the caller.
> >
> > dwc3_gadget_endpoint_transfer_in_progress()
> > ->dwc3_gadget_endpoint_trbs_complete() (crashed towards end of here)
> > ->dwc3_stop_active_transfer() (sometimes crashed towards end of here)
> >
> > I hope this clarifies things a bit.
> >
>
> Can we try this? Let me know if it resolves your issue.
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 61fba2b7389b..8352f4b5dd9f 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -3657,6 +3657,10 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> if (event->status & DEPEVT_STATUS_SHORT && !chain)
> return 1;
>
> + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> + (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain)
> + return 1;
> +
> if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> (trb->ctrl & DWC3_TRB_CTRL_LST))
> return 1;
>

With this change it doesn't seem to crash but unfortunately the output
completely hangs after the first missed isoc. At the moment I do not understand
why this might happen.

> Thanks,
> Thinh

Note that I haven't quite learned how to reply correctly to the mailing
list. I apologize for messing up the thread a bit.

Regards,
Jeff

2022-10-19 02:26:57

by Thinh Nguyen

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

On Tue, Oct 18, 2022, Jeff Vanhoof wrote:
> Hi Thinh,
>
> On Tue, Oct 18, 2022 at 10:35:30PM +0000, Thinh Nguyen wrote:
> > On Tue, Oct 18, 2022, Jeffrey Vanhoof wrote:
> > > Hi Thinh,
> > >
> > > On Tue, Oct 18, 2022 at 06:45:40PM +0000, Thinh Nguyen wrote:
> > > > Hi Dan,
> > > >
> > > > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > > > Hi Thinh,
> > > > >
> > > > > On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
> > > > > > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > > > > > From: Jeff Vanhoof <[email protected]>
> > > > > > >
> > > > > > > arm-smmu related crashes seen after a Missed ISOC interrupt when
> > > > > > > no_interrupt=1 is used. This can happen if the hardware is still using
> > > > > > > the data associated with a TRB after the usb_request's ->complete call
> > > > > > > has been made. Instead of immediately releasing a request when a Missed
> > > > > > > ISOC interrupt has occurred, this change will add logic to cancel the
> > > > > > > request instead where it will eventually be released when the
> > > > > > > END_TRANSFER command has completed. This logic is similar to some of the
> > > > > > > cleanup done in dwc3_gadget_ep_dequeue.
> > > > > >
> > > > > > This doesn't sound right. How did you determine that the hardware is
> > > > > > still using the data associated with the TRB? Did you check the TRB's
> > > > > > HWO bit?
> > > > >
> > > > > The problem we're seeing was mentioned in the summary of this patch
> > > > > series, issue #1. Basically, with the following patch
> > > > > > https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
> > > > > integrated a smmu panic is occurring on our Android device with the 5.15
> > > > > kernel which is:
> > > > >
> > > > > <3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
> > > > >
> > > > > The uvc gadget driver appears to be the first (and only) gadget that
> > > > > uses the no_interrupt=1 logic, so this seems to be a new condition for
> > > > > the dwc3 driver. In our configuration, we have up to 64 requests and the
> > > > > no_interrupt=1 for up to 15 requests. The list size of dep->started_list
> > > > > would get up to that amount when looping through to cleanup the
> > > > > completed requests. From testing and debugging the smmu panic occurs
> > > > > when a -EXDEV status shows up and right after
> > > > > dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
> > > > > we had was the requests were getting returned to the gadget too early.
> > > >
> > > > As I mentioned, if the status is updated to missed isoc, that means that
> > > > the controller returned ownership of the TRB to the driver. At least for
> > > > the particular request with -EXDEV, its TRBs are completed. I'm not
> > > > clear on your conclusion.
> > > >
> > > > Do we know where did the crash occur? Is it from dwc3 driver or from uvc
> > > > driver, and at what line? It'd great if we can see the driver log.
> > > >
> > >
> > > To interject, what should happen in dwc3_gadget_ep_reclaim_completed_trb if the
> > > IOC bit is not set (but the IMI bit is) and -EXDEV status is passed into it?
> >
> > Hm... we may have overlooked this case for no_interrupt scenario. If IMI
> > is set, then there will be an interrupt when there's missed isoc
> > regardless of whether no_interrupt is set by the gadget driver.
> >
> > > If the function returns 0, another attempt to reclaim may occur. If this
> > > happens and the next request did have the HWO bit set, the function would
> > > return 1 but dwc3_gadget_ep_cleanup_completed_request would still call
> > > dwc3_gadget_giveback.
> > >
> > > As a test (without this patch), I added a check to see if HWO bit was set in
> > > dwc3_gadget_ep_cleanup_completed_requests(). If the usecase was ISOC and the
> > > HWO bit was set I avoided calling dwc3_gadget_ep_cleanup_completed_request().
> > > This seemed to also avoid the iommu related crash being seen.
> > >
> > > Is there an issue in this area that needs to be corrected instead? Not having
> > > interrupts set for each request may be causing some new issues to be uncovered.
> > >
> > > As far as the crash seen without this patch, no good stacktrace is given. Line
> > > provided for crash varied a bit, but tended to appear towards the end of
> > > dwc3_stop_active_transfer() or dwc3_gadget_endpoint_trbs_complete().
> > >
> > > Since dwc3_gadget_endpoint_trbs_complete() can be called from multiple
> > > locations, I duplicated the function to help identify which path it was likely
> > > being called from. At the time of the crashes seen,
> > > dwc3_gadget_endpoint_transfer_in_progress() appeared to be the caller.
> > >
> > > dwc3_gadget_endpoint_transfer_in_progress()
> > > ->dwc3_gadget_endpoint_trbs_complete() (crashed towards end of here)
> > > ->dwc3_stop_active_transfer() (sometimes crashed towards end of here)
> > >
> > > I hope this clarifies things a bit.
> > >
> >
> > Can we try this? Let me know if it resolves your issue.
> >
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index 61fba2b7389b..8352f4b5dd9f 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -3657,6 +3657,10 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > return 1;
> >
> > + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > + (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain)
> > + return 1;
> > +
> > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > return 1;
> >
>
> With this change it doesn't seem to crash but unfortunately the output
> completely hangs after the first missed isoc. At the moment I do not understand
> why this might happen.
>

Can you capture the driver tracepoints with the change above?

>
> Note that I haven't quite learned how to reply correctly to the mailing
> list. I apologize for messing up the thread a bit.
>

Seems fine to me. As long as I can read and understand, I've no issue. :)

Thanks,
Thinh

2022-10-19 07:06:42

by Michael Grzeschik

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

Hi Thinh,

On Tue, Oct 18, 2022 at 10:45:16PM +0000, Thinh Nguyen wrote:
>On Tue, Oct 18, 2022, Michael Grzeschik wrote:
>> On Tue, Oct 18, 2022 at 06:45:40PM +0000, Thinh Nguyen wrote:
>> > On Mon, Oct 17, 2022, Dan Vacura wrote:
>> > > On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
>> > > > On Mon, Oct 17, 2022, Dan Vacura wrote:
>> > > > > From: Jeff Vanhoof <[email protected]>
>> > > > >
>> > > > > arm-smmu related crashes seen after a Missed ISOC interrupt when
>> > > > > no_interrupt=1 is used. This can happen if the hardware is still using
>> > > > > the data associated with a TRB after the usb_request's ->complete call
>> > > > > has been made. Instead of immediately releasing a request when a Missed
>> > > > > ISOC interrupt has occurred, this change will add logic to cancel the
>> > > > > request instead where it will eventually be released when the
>> > > > > END_TRANSFER command has completed. This logic is similar to some of the
>> > > > > cleanup done in dwc3_gadget_ep_dequeue.
>> > > >
>> > > > This doesn't sound right. How did you determine that the hardware is
>> > > > still using the data associated with the TRB? Did you check the TRB's
>> > > > HWO bit?
>> > >
>> > > The problem we're seeing was mentioned in the summary of this patch
>> > > series, issue #1. Basically, with the following patch
>> > > https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
>> > > integrated a smmu panic is occurring on our Android device with the 5.15
>> > > kernel which is:
>> > >
>> > > <3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
>> > >
>> > > The uvc gadget driver appears to be the first (and only) gadget that
>> > > uses the no_interrupt=1 logic, so this seems to be a new condition for
>> > > the dwc3 driver. In our configuration, we have up to 64 requests and the
>> > > no_interrupt=1 for up to 15 requests. The list size of dep->started_list
>> > > would get up to that amount when looping through to cleanup the
>> > > completed requests. From testing and debugging the smmu panic occurs
>> > > when a -EXDEV status shows up and right after
>> > > dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
>> > > we had was the requests were getting returned to the gadget too early.
>> >
>> > As I mentioned, if the status is updated to missed isoc, that means that
>> > the controller returned ownership of the TRB to the driver. At least for
>> > the particular request with -EXDEV, its TRBs are completed. I'm not
>> > clear on your conclusion.
>> >
>> > Do we know where did the crash occur? Is it from dwc3 driver or from uvc
>> > driver, and at what line? It'd great if we can see the driver log.
>> >
>> > >
>> > > >
>> > > > The dwc3 driver would only give back the requests if the TRBs of the
>> > > > associated requests are completed or when the device is disconnected.
>> > > > If the TRB indicated missed isoc, that means that the TRB is completed
>> > > > and its status was updated.
>> > >
>> > > Interesting, the device is not disconnected as we don't get the
>> > > -ESHUTDOWN status back and with this patch in place things continue
>> > > after a -EXDEV status is received.
>> > >
>> >
>> > Actually, minor correction here: a recent change
>> > b44c0e7fef51 ("usb: dwc3: gadget: conditionally remove requests")
>> > changed -ESHUTDOWN request status to -ECONNRESET when disable endpoint.
>> > This doesn't look right.
>> >
>> > While disabling endpoint may also apply for other cases such as
>> > switching alternate interface in addition to disconnect, -ESHUTDOWN
>> > seems more fitting there.
>> >
>> > Hi Michael,
>> >
>> > Can you help clarify for the change above? This changed the usage of
>> > requests. Now requests returned by disconnection won't be returned as
>> > -ESHUTDOWN.
>>
>> When writing the patch, I was looking into
>> Documentation/driver-api/usb/error-codes.rst.
>>
>> After looking into it today, I see that ESHUTDOWN should be send on
>> ep_disable (device disable) and ECONNRESET on stop_active_transfer.
>> So I probably just mixed them up, while writing the patch. :/
>>
>
>I think you mean ECONNRESET for ep_dequeue()?
>dwc3_stop_active_transfer() is called for both scenarios.

No, I meant dwc3_stop_active_transfer*s*.
On ep_dequeue the request status is already ECONNRESET.
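
The status mapping being discussed can be summarised in a short sketch. This
is not the follow-up patch, only a hedged model of the intent as stated in
this exchange; enum sk_reason and sk_giveback_status() are hypothetical names
that do not exist in the driver.

```c
#include <errno.h>

/* Hypothetical reasons a request is handed back before it completes. */
enum sk_reason {
	SK_EP_DISABLE,            /* endpoint disabled, e.g. on disconnect */
	SK_STOP_ACTIVE_TRANSFERS, /* active transfers stopped */
	SK_EP_DEQUEUE,            /* gadget driver dequeued the request */
};

/* Status a request would carry for each reason, per the discussion above. */
static int sk_giveback_status(enum sk_reason reason)
{
	switch (reason) {
	case SK_EP_DISABLE:
		return -ESHUTDOWN;
	case SK_STOP_ACTIVE_TRANSFERS:
	case SK_EP_DEQUEUE:
		return -ECONNRESET;
	}
	return -EINVAL; /* unknown reason */
}
```

A gadget driver's completion handler could then treat -ESHUTDOWN as "stop
queueing" and -ECONNRESET as "this request was cancelled", which is the
distinction the two codes are meant to carry.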

>> The followup patch would then just be to swap the status results of
>> __dwc3_gadget_ep_disable and dwc3_stop_active_transfers on the
>> dwc3_remove_requests call.
>>
>> Michael
>
>Can you help make a fix?

Sure, I will write a patch.

Thanks,
Michael

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |



2022-10-19 07:46:53

by Jeff Vanhoof

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

Hi Thinh,

On Wed, Oct 19, 2022 at 02:02:53AM +0000, Thinh Nguyen wrote:
> On Tue, Oct 18, 2022, Jeff Vanhoof wrote:
> > Hi Thinh,
> >
> > On Tue, Oct 18, 2022 at 10:35:30PM +0000, Thinh Nguyen wrote:
> > > On Tue, Oct 18, 2022, Jeffrey Vanhoof wrote:
> > > > Hi Thinh,
> > > >
> > > > On Tue, Oct 18, 2022 at 06:45:40PM +0000, Thinh Nguyen wrote:
> > > > > Hi Dan,
> > > > >
> > > > > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > > > > Hi Thinh,
> > > > > >
> > > > > > On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
> > > > > > > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > > > > > > From: Jeff Vanhoof <[email protected]>
> > > > > > > >
> > > > > > > > arm-smmu related crashes seen after a Missed ISOC interrupt when
> > > > > > > > no_interrupt=1 is used. This can happen if the hardware is still using
> > > > > > > > the data associated with a TRB after the usb_request's ->complete call
> > > > > > > > has been made. Instead of immediately releasing a request when a Missed
> > > > > > > > ISOC interrupt has occurred, this change will add logic to cancel the
> > > > > > > > request instead where it will eventually be released when the
> > > > > > > > END_TRANSFER command has completed. This logic is similar to some of the
> > > > > > > > cleanup done in dwc3_gadget_ep_dequeue.
> > > > > > >
> > > > > > > This doesn't sound right. How did you determine that the hardware is
> > > > > > > still using the data associated with the TRB? Did you check the TRB's
> > > > > > > HWO bit?
> > > > > >
> > > > > > The problem we're seeing was mentioned in the summary of this patch
> > > > > > series, issue #1. Basically, with the following patch
> > > > > > > https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
> > > > > > integrated a smmu panic is occurring on our Android device with the 5.15
> > > > > > kernel which is:
> > > > > >
> > > > > > <3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
> > > > > >
> > > > > > The uvc gadget driver appears to be the first (and only) gadget that
> > > > > > uses the no_interrupt=1 logic, so this seems to be a new condition for
> > > > > > the dwc3 driver. In our configuration, we have up to 64 requests and the
> > > > > > no_interrupt=1 for up to 15 requests. The list size of dep->started_list
> > > > > > would get up to that amount when looping through to cleanup the
> > > > > > completed requests. From testing and debugging the smmu panic occurs
> > > > > > when a -EXDEV status shows up and right after
> > > > > > dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
> > > > > > we had was the requests were getting returned to the gadget too early.
> > > > >
> > > > > As I mentioned, if the status is updated to missed isoc, that means that
> > > > > the controller returned ownership of the TRB to the driver. At least for
> > > > > the particular request with -EXDEV, its TRBs are completed. I'm not
> > > > > clear on your conclusion.
> > > > >
> > > > > Do we know where did the crash occur? Is it from dwc3 driver or from uvc
> > > > > driver, and at what line? It'd great if we can see the driver log.
> > > > >
> > > >
> > > > To interject, what should happen in dwc3_gadget_ep_reclaim_completed_trb if the
> > > > IOC bit is not set (but the IMI bit is) and -EXDEV status is passed into it?
> > >
> > > Hm... we may have overlooked this case for no_interrupt scenario. If IMI
> > > is set, then there will be an interrupt when there's missed isoc
> > > regardless of whether no_interrupt is set by the gadget driver.
> > >
> > > > If the function returns 0, another attempt to reclaim may occur. If this
> > > > happens and the next request did have the HWO bit set, the function would
> > > > return 1 but dwc3_gadget_ep_cleanup_completed_request would still call
> > > > dwc3_gadget_giveback.
> > > >
> > > > As a test (without this patch), I added a check to see if HWO bit was set in
> > > > dwc3_gadget_ep_cleanup_completed_requests(). If the usecase was ISOC and the
> > > > HWO bit was set I avoided calling dwc3_gadget_ep_cleanup_completed_request().
> > > > This seemed to also avoid the iommu related crash being seen.
> > > >
> > > > Is there an issue in this area that needs to be corrected instead? Not having
> > > > interrupts set for each request may be causing some new issues to be uncovered.
> > > >
> > > > As far as the crash seen without this patch, no good stacktrace is given. Line
> > > > provided for crash varied a bit, but tended to appear towards the end of
> > > > dwc3_stop_active_transfer() or dwc3_gadget_endpoint_trbs_complete().
> > > >
> > > > Since dwc3_gadget_endpoint_trbs_complete() can be called from multiple
> > > > locations, I duplicated the function to help identify which path it was likely
> > > > being called from. At the time of the crashes seen,
> > > > dwc3_gadget_endpoint_transfer_in_progress() appeared to be the caller.
> > > >
> > > > dwc3_gadget_endpoint_transfer_in_progress()
> > > > ->dwc3_gadget_endpoint_trbs_complete() (crashed towards end of here)
> > > > ->dwc3_stop_active_transfer() (sometimes crashed towards end of here)
> > > >
> > > > I hope this clarifies things a bit.
> > > >
> > >
> > > Can we try this? Let me know if it resolves your issue.
> > >
> > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > index 61fba2b7389b..8352f4b5dd9f 100644
> > > --- a/drivers/usb/dwc3/gadget.c
> > > +++ b/drivers/usb/dwc3/gadget.c
> > > @@ -3657,6 +3657,10 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > >  	if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > >  		return 1;
> > >
> > > +	if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > +	    (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain)
> > > +		return 1;
> > > +
> > >  	if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > >  	    (trb->ctrl & DWC3_TRB_CTRL_LST))
> > >  		return 1;
> > >
> >
> > With this change it doesn't seem to crash but unfortunately the output
> > completely hangs after the first missed isoc. At the moment I do not understand
> > why this might happen.
> >
>
> Can you capture the driver tracepoints with the change above?
>

Due to the size of the log, I have sent it directly to you.

From what I can gather from the log, with the current changes it seems that
after a missed isoc event a few requests are staying longer than expected in the
started_list (not getting reclaimed). This prevents the transmission from
stopping and starting again, and opens the door for a continuous stream of
missed isoc events that appears to the user as a frozen video.

So one thought, if IOC bit is not set every frame, but IMI bit is, when a
missed isoc related interrupt occurs it seems likely that more than one trb
request will need to be reclaimed, but the current set of changes is not
handling this.

In the good transfer case this issue seems to be taken care of since the IOC
bit is not set every frame and the reclamation will loop through every item in
the started_list and only stop if there are no additional trbs or if one has
its HWO bit set. Although I am a bit surprised that we are not yet seeing iommu
related panics here too since the trb is being reclaimed and given back even
with the HWO bit still set, but maybe I am misunderstanding something. In my
earlier testing, if a missed isoc event was received and we attempted to
reclaim/giveback a trb with its HWO bit set, an iommu related panic would be
seen.

Can you propose an additional change to handle freeing up the extra trbs in the
missed isoc case? I think that somehow the HWO bit should be checked before
entering dwc3_gadget_ep_cleanup_completed_request in order to avoid the trb
being given back too soon.
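
The check being asked for here can be illustrated with a toy model. This is
standalone C, not dwc3 code: the toy_* names are made up for the example, each
request is assumed to own exactly one TRB, and the only detail taken from the
driver is that HWO is bit 0 of the TRB control word. The walk gives requests
back in order and stops at the first TRB the hardware still owns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HWO is bit 0 of the TRB control word (hypothetical toy constant). */
#define TOY_TRB_CTRL_HWO (1u << 0)

/* Toy stand-in for a started request that owns exactly one TRB. */
struct toy_req {
	uint32_t trb_ctrl;	/* control word of the request's only TRB */
	bool given_back;	/* set once the request is returned to the gadget */
};

/*
 * Walk the started list in order and give back requests until one is
 * found whose TRB is still owned by the hardware. Returns the number
 * of requests given back.
 */
static size_t toy_reclaim_until_hwo(struct toy_req *reqs, size_t n)
{
	size_t done = 0;

	assert(reqs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (reqs[i].trb_ctrl & TOY_TRB_CTRL_HWO)
			break;	/* hardware may still touch this buffer */
		reqs[i].given_back = true;
		done++;
	}
	return done;
}
```

With three started requests where only the last TRB still has HWO set, the
first two are given back and the third stays on the list.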

Your thoughts?

> >
> > Note that I haven't quite learned correctly how to reply correct to the mailing
> > list. I appologize for messing up the thread a bit.
> >
>
> Seems fine to me. As long as I can read and understand, I've no issue. :)
>
> Thanks,
> Thinh

Thanks,
Jeff

2022-10-19 19:14:13

by Thinh Nguyen

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> Hi Thinh,
>
> On Wed, Oct 19, 2022 at 02:02:53AM +0000, Thinh Nguyen wrote:
> > On Tue, Oct 18, 2022, Jeff Vanhoof wrote:
> > > Hi Thinh,
> > >
> > > On Tue, Oct 18, 2022 at 10:35:30PM +0000, Thinh Nguyen wrote:
> > > > On Tue, Oct 18, 2022, Jeffrey Vanhoof wrote:
> > > > > Hi Thinh,
> > > > >
> > > > > On Tue, Oct 18, 2022 at 06:45:40PM +0000, Thinh Nguyen wrote:
> > > > > > Hi Dan,
> > > > > >
> > > > > > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > > > > > Hi Thinh,
> > > > > > >
> > > > > > > On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
> > > > > > > > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > > > > > > > From: Jeff Vanhoof <[email protected]>
> > > > > > > > >
> > > > > > > > > arm-smmu related crashes seen after a Missed ISOC interrupt when
> > > > > > > > > no_interrupt=1 is used. This can happen if the hardware is still using
> > > > > > > > > the data associated with a TRB after the usb_request's ->complete call
> > > > > > > > > has been made. Instead of immediately releasing a request when a Missed
> > > > > > > > > ISOC interrupt has occurred, this change will add logic to cancel the
> > > > > > > > > request instead where it will eventually be released when the
> > > > > > > > > END_TRANSFER command has completed. This logic is similar to some of the
> > > > > > > > > cleanup done in dwc3_gadget_ep_dequeue.
> > > > > > > >
> > > > > > > > This doesn't sound right. How did you determine that the hardware is
> > > > > > > > still using the data associated with the TRB? Did you check the TRB's
> > > > > > > > HWO bit?
> > > > > > >
> > > > > > > The problem we're seeing was mentioned in the summary of this patch
> > > > > > > series, issue #1. Basically, with the following patch
> > > > > > > https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
> > > > > > > integrated a smmu panic is occurring on our Android device with the 5.15
> > > > > > > kernel which is:
> > > > > > >
> > > > > > > <3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
> > > > > > >
> > > > > > > The uvc gadget driver appears to be the first (and only) gadget that
> > > > > > > uses the no_interrupt=1 logic, so this seems to be a new condition for
> > > > > > > the dwc3 driver. In our configuration, we have up to 64 requests and the
> > > > > > > no_interrupt=1 for up to 15 requests. The list size of dep->started_list
> > > > > > > would get up to that amount when looping through to cleanup the
> > > > > > > completed requests. From testing and debugging the smmu panic occurs
> > > > > > > when a -EXDEV status shows up and right after
> > > > > > > dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
> > > > > > > we had was the requests were getting returned to the gadget too early.
> > > > > >
> > > > > > As I mentioned, if the status is updated to missed isoc, that means that
> > > > > > the controller returned ownership of the TRB to the driver. At least for
> > > > > > the particular request with -EXDEV, its TRBs are completed. I'm not
> > > > > > clear on your conclusion.
> > > > > >
> > > > > > Do we know where did the crash occur? Is it from dwc3 driver or from uvc
> > > > > > driver, and at what line? It'd great if we can see the driver log.
> > > > > >
> > > > >
> > > > > To interject, what should happen in dwc3_gadget_ep_reclaim_completed_trb if the
> > > > > IOC bit is not set (but the IMI bit is) and -EXDEV status is passed into it?
> > > >
> > > > Hm... we may have overlooked this case for no_interrupt scenario. If IMI
> > > > is set, then there will be an interrupt when there's missed isoc
> > > > regardless of whether no_interrupt is set by the gadget driver.
> > > >
> > > > > If the function returns 0, another attempt to reclaim may occur. If this
> > > > > happens and the next request did have the HWO bit set, the function would
> > > > > return 1 but dwc3_gadget_ep_cleanup_completed_request would still call
> > > > > dwc3_gadget_giveback.
> > > > >
> > > > > As a test (without this patch), I added a check to see if HWO bit was set in
> > > > > dwc3_gadget_ep_cleanup_completed_requests(). If the usecase was ISOC and the
> > > > > HWO bit was set I avoided calling dwc3_gadget_ep_cleanup_completed_request().
> > > > > This seemed to also avoid the iommu related crash being seen.
> > > > >
> > > > > Is there an issue in this area that needs to be corrected instead? Not having
> > > > > interrupts set for each request may be causing some new issues to be uncovered.
> > > > >
> > > > > As far as the crash seen without this patch, no good stacktrace is given. Line
> > > > > provided for crash varied a bit, but tended to appear towards the end of
> > > > > dwc3_stop_active_transfer() or dwc3_gadget_endpoint_trbs_complete().
> > > > >
> > > > > Since dwc3_gadget_endpoint_trbs_complete() can be called from multiple
> > > > > locations, I duplicated the function to help identify which path it was likely
> > > > > being called from. At the time of the crashes seen,
> > > > > dwc3_gadget_endpoint_transfer_in_progress() appeared to be the caller.
> > > > >
> > > > > dwc3_gadget_endpoint_transfer_in_progress()
> > > > > ->dwc3_gadget_endpoint_trbs_complete() (crashed towards end of here)
> > > > > ->dwc3_stop_active_transfer() (sometimes crashed towards end of here)
> > > > >
> > > > > I hope this clarifies things a bit.
> > > > >
> > > >
> > > > Can we try this? Let me know if it resolves your issue.
> > > >
> > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > index 61fba2b7389b..8352f4b5dd9f 100644
> > > > --- a/drivers/usb/dwc3/gadget.c
> > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > @@ -3657,6 +3657,10 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > >  	if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > >  		return 1;
> > > >
> > > > +	if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > > +	    (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain)
> > > > +		return 1;
> > > > +
> > > >  	if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > >  	    (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > >  		return 1;
> > > >
> > >
> > > With this change it doesn't seem to crash but unfortunately the output
> > > completely hangs after the first missed isoc. At the moment I do not understand
> > > why this might happen.
> > >
> >
> > Can you capture the driver tracepoints with the change above?
> >
>
> Due to the size of the log, have sent it to directly to you.

Just got it. Thanks.

>
> From what I can gather from the log, with the current changes it seems that
> after a missed isoc event few requests are staying longer than expected in the
> started_list (not getting reclaimed) and this is preventing the transmission
> from stopping/starting again, and opening the door for continuous stream of
> missed isoc events that cause what appears to the user as a frozen video.
>
> So one thought, if IOC bit is not set every frame, but IMI bit is, when a
> missed isoc related interrupt occurs it seems likely that more than one trb
> request will need to be reclaimed, but the current set of changes is not
> handling this.
>
> In the good transfer case this issue seems to be taken care of since the IOC
> bit is not set every frame and the reclaimation will loop through every item in
> the started_list and only stop if there are no additional trbs or if one has

It should stop at the request that is associated with the interrupt event,
whether it's because of IMI or IOC.

> its HWO bit set. Although I am a bit surprised that we are not yet seeing iommu
> related panics here too since the trb is being reclaimed and given back even
> the HWO bit still set, but maybe I am misunderstanding something. In my earlier
> testing, if a missed isoc event was received and we attempted to
> reclaim/giveback a trb with its HWO bit set, a iommu related panic would be
> seen.

If the controller processed the TRB, it would clear the HWO bit after
completion. There are cases in which the HWO bit is still set for some
TRBs, but the request is completed (e.g. short transfers causing the
controller to stop processing further). In those cases, the driver
would clear the HWO bit manually.

>
> Can you propose an additional change to handle freeing up the extra trbs in the
> missed isoc case? I think that somehow the HWO bit should be checked before
> entering dwc3_gadget_ep_cleanup_completed_request in order to avoid the trb
> being given back too soon.

We should not check for TRBs of requests beyond the request associated
with the interrupt event.

>
> Your thoughts?
>

The capture shows underrun, and it causes missed isoc.

Let's take a look at the first missed isoc request (req ffffff88f834db00)

<...>-22551 [002] d..2 13985.789684: dwc3_ep_queue: ep2in: req ffffff88f834db00 length 0/49152 zsi ==> -115
<...>-22551 [002] dn.2 13985.789728: dwc3_prepare_trb: ep2in: trb ffffffc016071970 (E152:D149) buf 00000000ea800000 size 1x 49152 ctrl 00000461 (Hlcs:Sc:isoc-first)

Each request uses one TRB. From above, you can see that there are only
3 intervals prepared (E152 - D149 = 3).

The timestamp of the last request completion is 13985.788919.
The timestamp at which the queued request is prepared is 13985.789728.
We can roughly estimate the diff is at least 809us.

3 intervals (3x125us) is less than 809us. So missed isoc is expected:

irq/399-dwc3-15269 [002] d..1 13985.790754: dwc3_event: event (f9acc08a): ep2in: Transfer In Progress [63916] (sIM)
irq/399-dwc3-15269 [002] d..1 13985.790758: dwc3_complete_trb: ep2in: trb ffffffc016071970 (E154:D152) buf 00000000ea800000 size 1x 49152 ctrl 3e6a0460 (hlcs:Sc:isoc-first)
irq/399-dwc3-15269 [002] d..1 13985.790808: dwc3_gadget_giveback: ep2in: req ffffff88f834db00 length 0/49152 zsi ==> -18
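
The arithmetic behind the trace above can be reproduced with a small
standalone calculation. This is only a sketch: the toy_* helpers are made up
for the example, timestamps are taken in microseconds, and the 125us interval
is the value used in the estimate above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Gap between the last completion and the next prepared TRB, in us. */
static uint64_t toy_gap_us(uint64_t last_done_us, uint64_t prepared_us)
{
	assert(prepared_us >= last_done_us);
	return prepared_us - last_done_us;
}

/* True when the prepared intervals cover less time than the gap. */
static bool toy_miss_expected(uint64_t gap_us, unsigned int intervals,
			      unsigned int interval_us)
{
	return (uint64_t)intervals * interval_us < gap_us;
}
```

Feeding in 13985788919 and 13985789728 gives a gap of 809us, while 3 intervals
of 125us cover only 375us, so the request is already late when it is prepared.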


After this point, the uvc driver keeps playing catch-up to stay in sync
with the host. It tries, but it can't catch up. Also, occasionally
something seems to be blocking dwc3, creating time gaps. Maybe because
of a spin_lock held somewhere?

The logic to detect underrun doesn't trigger because the queued list is
always non-empty, but the queued requests are expected to be missed
already. So you keep seeing missed isoc.

There are a few things you can do to mitigate this issue:
1) Don't set IMI if the request indicates no_interrupt. This reduces the
time software needs to handle interrupts.
2) Improve the underrun detection logic.
3) Increase the queuing frequency from the uvc to keep the request queue
full. Note that reducing or avoiding the use of no_interrupt will allow the
controller driver to update uvc more often so it can keep requeuing new
requests.

Best option is 3), but maybe we can do all 3.

For 2), you can set IMI for all isoc requests as it is now. On missed
isoc, check the TRB's scheduled uframe (from TRB info) and compare
it to the current uframe (from DSTS) to get the number of intervals in
between. With the number of queued requests, you can calculate whether
the gadget driver queued enough requests. If it didn't, send the End
Transfer command and cancel all the queued requests. The next set of
requests will be in sync again.
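
A rough sketch of that check, as standalone C rather than a driver patch. The
toy_* names are hypothetical, the microframe counter is assumed to be 14 bits
wide and to wrap, and each queued request is assumed to occupy one interval.
It is one possible reading of the comparison described above, not the actual
dwc3 logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the microframe counter is 14 bits wide and wraps. */
#define TOY_UFRAME_MASK 0x3fffu

/* Number of whole intervals between the scheduled and current uframe. */
static unsigned int toy_intervals_behind(unsigned int scheduled_uf,
					 unsigned int current_uf,
					 unsigned int uframes_per_interval)
{
	assert(uframes_per_interval > 0);

	/* Unsigned subtraction plus the mask handles counter wraparound. */
	unsigned int delta = (current_uf - scheduled_uf) & TOY_UFRAME_MASK;

	return delta / uframes_per_interval;
}

/*
 * True when every queued request is already scheduled in the past, i.e.
 * the gadget driver did not queue enough to get ahead of the host. In
 * that case the transfer would be ended and the queue cancelled.
 */
static bool toy_should_end_transfer(unsigned int scheduled_uf,
				    unsigned int current_uf,
				    unsigned int uframes_per_interval,
				    unsigned int queued_requests)
{
	return queued_requests <= toy_intervals_behind(scheduled_uf,
						       current_uf,
						       uframes_per_interval);
}
```

For example, with the missed TRB scheduled 8 intervals in the past and only 3
requests queued, all 3 would also be missed, so restarting the transfer is the
cheaper way to get back in sync.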

BR,
Thinh

PS. On a side note, I notice that there are reports of issues when using
SG, right? Please note that the dwc3 driver only allocates 256 TRBs
(including a link TRB). Each SG entry takes a TRB. If a request has many
SG entries, that eats up the available TRBs. So, even though the UVC may
queue many requests, not all of them are prepared immediately. If the
TRB ring is full, the driver needs to wait for more TRBs to free up
before preparing more. From the log, I see that it's sending 48KB. Let's
say the uvc splits it into PAGE_SIZE chunks of 4KB; that would take at least
12 TRBs per request. (Side thought: I'm not sure why UVC needs SG in the
first place with its current implementation.)
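
To put numbers on the TRB budget described above, here is a small standalone
calculation. It is only a sketch: the toy_* names are made up, and the ring
size of 256 with one link TRB is taken from the note above.

```c
#include <assert.h>

/* Ring size from the note above; one slot is used by the link TRB. */
#define TOY_TRB_RING_SIZE 256u

/* TRBs needed for one request when it is split into SG entries. */
static unsigned int toy_trbs_per_request(unsigned int req_bytes,
					 unsigned int sg_entry_bytes)
{
	assert(sg_entry_bytes > 0);
	return (req_bytes + sg_entry_bytes - 1) / sg_entry_bytes;
}

/* How many whole requests can be prepared in the ring at once. */
static unsigned int toy_requests_prepared(unsigned int trbs_per_req)
{
	assert(trbs_per_req > 0);
	return (TOY_TRB_RING_SIZE - 1) / trbs_per_req;
}
```

A 48KB request split into 4KB entries needs 12 TRBs, so at most 21 such
requests fit in the ring at a time, however many the UVC driver has queued.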

2022-10-19 22:22:00

by Jeff Vanhoof

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

Hi Thinh,
On Wed, Oct 19, 2022 at 07:08:27PM +0000, Thinh Nguyen wrote:
> On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> > Hi Thinh,
> >
> > On Wed, Oct 19, 2022 at 02:02:53AM +0000, Thinh Nguyen wrote:
> > > On Tue, Oct 18, 2022, Jeff Vanhoof wrote:
> > > > Hi Thinh,
> > > >
> > > > On Tue, Oct 18, 2022 at 10:35:30PM +0000, Thinh Nguyen wrote:
> > > > > On Tue, Oct 18, 2022, Jeffrey Vanhoof wrote:
> > > > > > Hi Thinh,
> > > > > >
> > > > > > On Tue, Oct 18, 2022 at 06:45:40PM +0000, Thinh Nguyen wrote:
> > > > > > > Hi Dan,
> > > > > > >
> > > > > > > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > > > > > > Hi Thinh,
> > > > > > > >
> > > > > > > > On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
> > > > > > > > > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > > > > > > > > From: Jeff Vanhoof <[email protected]>
> > > > > > > > > >
> > > > > > > > > > arm-smmu related crashes seen after a Missed ISOC interrupt when
> > > > > > > > > > no_interrupt=1 is used. This can happen if the hardware is still using
> > > > > > > > > > the data associated with a TRB after the usb_request's ->complete call
> > > > > > > > > > has been made. Instead of immediately releasing a request when a Missed
> > > > > > > > > > ISOC interrupt has occurred, this change will add logic to cancel the
> > > > > > > > > > request instead where it will eventually be released when the
> > > > > > > > > > END_TRANSFER command has completed. This logic is similar to some of the
> > > > > > > > > > cleanup done in dwc3_gadget_ep_dequeue.
> > > > > > > > >
> > > > > > > > > This doesn't sound right. How did you determine that the hardware is
> > > > > > > > > still using the data associated with the TRB? Did you check the TRB's
> > > > > > > > > HWO bit?
> > > > > > > >
> > > > > > > > The problem we're seeing was mentioned in the summary of this patch
> > > > > > > > series, issue #1. Basically, with the following patch
> > > > > > > > https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
> > > > > > > > integrated a smmu panic is occurring on our Android device with the 5.15
> > > > > > > > kernel which is:
> > > > > > > >
> > > > > > > > <3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
> > > > > > > >
> > > > > > > > The uvc gadget driver appears to be the first (and only) gadget that
> > > > > > > > uses the no_interrupt=1 logic, so this seems to be a new condition for
> > > > > > > > the dwc3 driver. In our configuration, we have up to 64 requests and the
> > > > > > > > no_interrupt=1 for up to 15 requests. The list size of dep->started_list
> > > > > > > > would get up to that amount when looping through to cleanup the
> > > > > > > > completed requests. From testing and debugging the smmu panic occurs
> > > > > > > > when a -EXDEV status shows up and right after
> > > > > > > > dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
> > > > > > > > we had was the requests were getting returned to the gadget too early.
> > > > > > >
> > > > > > > As I mentioned, if the status is updated to missed isoc, that means that
> > > > > > > the controller returned ownership of the TRB to the driver. At least for
> > > > > > > the particular request with -EXDEV, its TRBs are completed. I'm not
> > > > > > > clear on your conclusion.
> > > > > > >
> > > > > > > Do we know where did the crash occur? Is it from dwc3 driver or from uvc
> > > > > > > driver, and at what line? It'd great if we can see the driver log.
> > > > > > >
> > > > > >
> > > > > > To interject, what should happen in dwc3_gadget_ep_reclaim_completed_trb if the
> > > > > > IOC bit is not set (but the IMI bit is) and -EXDEV status is passed into it?
> > > > >
> > > > > Hm... we may have overlooked this case for no_interrupt scenario. If IMI
> > > > > is set, then there will be an interrupt when there's missed isoc
> > > > > regardless of whether no_interrupt is set by the gadget driver.
> > > > >
> > > > > > If the function returns 0, another attempt to reclaim may occur. If this
> > > > > > happens and the next request did have the HWO bit set, the function would
> > > > > > return 1 but dwc3_gadget_ep_cleanup_completed_request would still call
> > > > > > dwc3_gadget_giveback.
> > > > > >
> > > > > > As a test (without this patch), I added a check to see if HWO bit was set in
> > > > > > dwc3_gadget_ep_cleanup_completed_requests(). If the usecase was ISOC and the
> > > > > > HWO bit was set I avoided calling dwc3_gadget_ep_cleanup_completed_request().
> > > > > > This seemed to also avoid the iommu related crash being seen.
> > > > > >
> > > > > > Is there an issue in this area that needs to be corrected instead? Not having
> > > > > > interrupts set for each request may be causing some new issues to be uncovered.
> > > > > >
> > > > > > As far as the crash seen without this patch, no good stacktrace is given. Line
> > > > > > provided for crash varied a bit, but tended to appear towards the end of
> > > > > > dwc3_stop_active_transfer() or dwc3_gadget_endpoint_trbs_complete().
> > > > > >
> > > > > > Since dwc3_gadget_endpoint_trbs_complete() can be called from multiple
> > > > > > locations, I duplicated the function to help identify which path it was likely
> > > > > > being called from. At the time of the crashes seen,
> > > > > > dwc3_gadget_endpoint_transfer_in_progress() appeared to be the caller.
> > > > > >
> > > > > > dwc3_gadget_endpoint_transfer_in_progress()
> > > > > > ->dwc3_gadget_endpoint_trbs_complete() (crashed towards end of here)
> > > > > > ->dwc3_stop_active_transfer() (sometimes crashed towards end of here)
> > > > > >
> > > > > > I hope this clarifies things a bit.
> > > > > >
> > > > >
> > > > > Can we try this? Let me know if it resolves your issue.
> > > > >
> > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > > index 61fba2b7389b..8352f4b5dd9f 100644
> > > > > --- a/drivers/usb/dwc3/gadget.c
> > > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > > @@ -3657,6 +3657,10 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > > >  	if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > > >  		return 1;
> > > > >
> > > > > +	if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > > > +	    (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain)
> > > > > +		return 1;
> > > > > +
> > > > >  	if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > > >  	    (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > > >  		return 1;
> > > > >
> > > >
> > > > With this change it doesn't seem to crash but unfortunately the output
> > > > completely hangs after the first missed isoc. At the moment I do not understand
> > > > why this might happen.
> > > >
> > >
> > > Can you capture the driver tracepoints with the change above?
> > >
> >
> > Due to the size of the log, have sent it to directly to you.
>
> Just got it. Thanks.
>
> >
> > From what I can gather from the log, with the current changes it seems that
> > after a missed isoc event few requests are staying longer than expected in the
> > started_list (not getting reclaimed) and this is preventing the transmission
> > from stopping/starting again, and opening the door for continuous stream of
> > missed isoc events that cause what appears to the user as a frozen video.
> >
> > So one thought, if IOC bit is not set every frame, but IMI bit is, when a
> > missed isoc related interrupt occurs it seems likely that more than one trb
> > request will need to be reclaimed, but the current set of changes is not
> > handling this.
> >
> > In the good transfer case this issue seems to be taken care of since the IOC
> > bit is not set every frame and the reclaimation will loop through every item in
> > the started_list and only stop if there are no additional trbs or if one has
>
> It should stop at the request that associated with the interrupt event,
> whether it's because of IMI or IOC.

In this case I was concerned that if multiple queued reqs did not have the IOC
bit set, but there was a missed isoc on one of the last reqs, we might not
reclaim all of the requests up to the missed isoc related req. I'm not sure if
my concern is valid or not.

>
> > its HWO bit set. Although I am a bit surprised that we are not yet seeing iommu
> > related panics here too since the trb is being reclaimed and given back even
> > the HWO bit still set, but maybe I am misunderstanding something. In my earlier
> > testing, if a missed isoc event was received and we attempted to
> > reclaim/giveback a trb with its HWO bit set, a iommu related panic would be
> > seen.
>
> If the controller processed the TRB, it would clear the HWO bit after
> completion. There are cases which the HWO bit is still set for some
> TRBs, but the request is completed (e.g. short transfers causing the
> controller to stop processing further). That those cases, the driver
> would clear the HWO bit manually.
>
> >
> > Can you propose an additional change to handle freeing up the extra trbs in the
> > missed isoc case? I think that somehow the HWO bit should be checked before
> > entering dwc3_gadget_ep_cleanup_completed_request in order to avoid the trb
> > being given back too soon.
>
> We should not check for TRBs of requests beyond the request associated
> with the interrupt event.
>
> >
> > Your thoughts?
> >
>
> The capture shows underrun, and it causes missed isoc.
>
> Let's take a look at the first missed isoc request (req ffffff88f834db00)
>
> <...>-22551 [002] d..2 13985.789684: dwc3_ep_queue: ep2in: req ffffff88f834db00 length 0/49152 zsi ==> -115
> <...>-22551 [002] dn.2 13985.789728: dwc3_prepare_trb: ep2in: trb ffffffc016071970 (E152:D149) buf 00000000ea800000 size 1x 49152 ctrl 00000461 (Hlcs:Sc:isoc-first)
>
> Each request uses a one TRB. From above, you can see that there are only
> 3 intervals prepared (E152 - D149 = 3).
>
> The timestamp of the last request completion is 13985.788919.
> The timestamp which the queued request is 13985.789728.
> We can roughly estimate the diff is at least 809us.
>
> 3 intervals (3x125us) is less than 809us. So missed isoc is expected:
>
> irq/399-dwc3-15269 [002] d..1 13985.790754: dwc3_event: event (f9acc08a): ep2in: Transfer In Progress [63916] (sIM)
> irq/399-dwc3-15269 [002] d..1 13985.790758: dwc3_complete_trb: ep2in: trb ffffffc016071970 (E154:D152) buf 00000000ea800000 size 1x 49152 ctrl 3e6a0460 (hlcs:Sc:isoc-first)
> irq/399-dwc3-15269 [002] d..1 13985.790808: dwc3_gadget_giveback: ep2in: req ffffff88f834db00 length 0/49152 zsi ==> -18
>
>
> After this point, the uvc driver keeps playing catch up to stay in sync
> with the host. It tries, but it couldn't catch up. Also, occasionally
> something seems to be blocking dwc3, creating time gaps. Maybe because
> of a spin_lock held somewhere?
>

Could the time gaps be created by the interrupt frequency changes? They
completely change up the timing of when the transfers are kicked in dwc3 and
when uvc_video_complete/uvcg_video_pump is called.

> The logic to detect underrun doesn't trigger because the queued list is
> always non-empty, but the queued requests are expected to be missed
> already. So you keep seeing missed isoc.
>
> There are a few things you can mitigate this issue:
> 1) Don't set IMI if the request indicates no_interrupt. This reduces the
> time software needs to handle interrupts.

I did try this out earlier and it did not prevent the video freeze. It does
make sense what you are suggesting, but because it didn't work for me it made
me think that not all reqs are being reclaimed after a missed isoc is seen.
I'll revisit this area again.

> 2) Improve the underrun detection logic.

I like this idea a lot but I'm not up to the task just yet. Will attempt to
follow your recommendations below and see where I get with this.

> 3) Increase the queuing frequency from the uvc to keep the request queue
> full. Note that reduce/avoid setting no_interrupt will allow the
> controller driver to update uvc often to keep requeuing new requests.
>
> Best option is 3), but maybe we can do all 3.
>

I think that this is our best option too. Dan provided a patch to make
the interrupt skip logic configurable in the uvc driver:
https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
https://lore.kernel.org/linux-usb/[email protected]/

> For 2), you can set IMI for all isoc request as it is now. On missed
> isoc, check for the TRB's scheduled uframe (from TRB info) and compare
> it to the current uframe (from DSTS) for the number of intervals in
> between. With the number of queued requests, you can calculate whether
> the gadget driver queued enough requests. If it doesn't, send End
> Transfer command and cancel all the queued requests. The next set of
> requests will be in-sync again.
>
> BR,
> Thinh
>
> PS. On a side note, I notice that there are reports of issue when using
> SG right? Please note that dwc3 driver only allocates 256 TRBs
> (including a link TRB). Each SG entry takes a TRB. If a request has many
> SG entries, that eats up the available TRBs. So, even though the UVC may
> queue many requests, not all of them are prepared immediately. If the
> TRB ring is full, the driver need to wait for more TRBs to free up
> before preparing more. From the log, I see that it's sending 48KB. Let's
> say the uvc splits it into PAGE_SIZE of 4KB, that would take at least 12
> TRB per request. (Side thought: I'm not sure why UVC needs SG in the
> first place with its current implementation)

On our platform I am seeing 2 items in the sg list being sent out from the uvc
driver. The 1st item in the list is a 2 byte uvc header and the 2nd item is
the 48KB of data. To me this seems inefficient, but it sort of makes sense why
it was done, likely to avoid a memory copy just for the 2 byte header. But I
share your concern here: it's possible that other users won't be so lucky and
will wind up having a lot of page sized items in the sg list.

We are also hoping to make the use of sg configurable with the change:
https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
https://lore.kernel.org/linux-usb/[email protected]/

Thanks,
Jeff

2022-10-19 23:28:11

by Thinh Nguyen

[permalink] [raw]
Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

Hi,

On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> Hi Thinh,
> On Wed, Oct 19, 2022 at 07:08:27PM +0000, Thinh Nguyen wrote:
> > On Wed, Oct 19, 2022, Jeff Vanhoof wrote:

<snip>

> > >
> > > From what I can gather from the log, with the current changes it seems that
> > > after a missed isoc event few requests are staying longer than expected in the
> > > started_list (not getting reclaimed) and this is preventing the transmission
> > > from stopping/starting again, and opening the door for continuous stream of
> > > missed isoc events that cause what appears to the user as a frozen video.
> > >
> > > So one thought, if IOC bit is not set every frame, but IMI bit is, when a
> > > missed isoc related interrupt occurs it seems likely that more than one trb
> > > request will need to be reclaimed, but the current set of changes is not
> > > handling this.
> > >
> > > In the good transfer case this issue seems to be taken care of since the IOC
> > > bit is not set every frame and the reclaimation will loop through every item in
> > > the started_list and only stop if there are no additional trbs or if one has
> >
> > It should stop at the request that associated with the interrupt event,
> > whether it's because of IMI or IOC.
>
> In this case I was concerned that if multipled queued reqs did not have IOC bit
> set, but there was a missed isoc on one of the last reqs, whether or not we would
> reclaim all of the requests up to the missed isoc related req. I'm not sure if
> my concern is valid or not.
>

There should be no problem. If there's an interrupt event indicating a
TRB completion, the driver will give back all the requests up to the
request associated with the interrupt event, and the controller will
continue processing the remaining TRBs. On the next TRB completion
event, the driver will again give back all the requests up to the
request associated with that event.
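
A toy model of that behaviour, as a minimal sketch only: the started list is
reduced to a pair of indices, and the struct and function names are
hypothetical rather than taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of a started list: requests [head, count) are still owned by
 * the controller. Not dwc3 code; all names are hypothetical.
 */
struct model_started_list {
	size_t head;  /* index of the oldest request not yet given back */
	size_t count; /* total number of requests that were started */
};

/*
 * Handle a completion event tied to request index event_req: give back
 * every request from head up to and including event_req.
 * Returns how many requests were given back.
 */
static size_t model_giveback_up_to(struct model_started_list *list,
				   size_t event_req)
{
	size_t given;

	assert(event_req < list->count);
	if (event_req < list->head)
		return 0; /* already given back by an earlier event */

	given = event_req - list->head + 1;
	list->head = event_req + 1;
	return given;
}

/* Requests the controller still owns after the events handled so far. */
static size_t model_still_started(const struct model_started_list *list)
{
	return list->count - list->head;
}
```

With 8 started requests, an event on request 2 gives back 3 and leaves 5; a
later event on request 7 gives back the remaining 5.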

> >
> > > its HWO bit set. Although I am a bit surprised that we are not yet seeing iommu
> > > related panics here too since the trb is being reclaimed and given back even
> > > the HWO bit still set, but maybe I am misunderstanding something. In my earlier
> > > testing, if a missed isoc event was received and we attempted to
> > > reclaim/giveback a trb with its HWO bit set, a iommu related panic would be
> > > seen.
> >
> > If the controller processed the TRB, it would clear the HWO bit after
> > completion. There are cases which the HWO bit is still set for some
> > TRBs, but the request is completed (e.g. short transfers causing the
> > controller to stop processing further). That those cases, the driver
> > would clear the HWO bit manually.
> >
> > >
> > > Can you propose an additional change to handle freeing up the extra trbs in the
> > > missed isoc case? I think that somehow the HWO bit should be checked before
> > > entering dwc3_gadget_ep_cleanup_completed_request in order to avoid the trb
> > > being given back too soon.
> >
> > We should not check for TRBs of requests beyond the request associated
> > with the interrupt event.
> >
> > >
> > > Your thoughts?
> > >
> >
> > The capture shows underrun, and it causes missed isoc.
> >
> > Let's take a look at the first missed isoc request (req ffffff88f834db00)
> >
> > <...>-22551 [002] d..2 13985.789684: dwc3_ep_queue: ep2in: req ffffff88f834db00 length 0/49152 zsi ==> -115
> > <...>-22551 [002] dn.2 13985.789728: dwc3_prepare_trb: ep2in: trb ffffffc016071970 (E152:D149) buf 00000000ea800000 size 1x 49152 ctrl 00000461 (Hlcs:Sc:isoc-first)
> >
> > Each request uses a one TRB. From above, you can see that there are only
> > 3 intervals prepared (E152 - D149 = 3).
> >
> > The timestamp of the last request completion is 13985.788919.
> > The timestamp which the queued request is 13985.789728.
> > We can roughly estimate the diff is at least 809us.
> >
> > 3 intervals (3x125us) is less than 809us. So missed isoc is expected:
> >
> > irq/399-dwc3-15269 [002] d..1 13985.790754: dwc3_event: event (f9acc08a): ep2in: Transfer In Progress [63916] (sIM)
> > irq/399-dwc3-15269 [002] d..1 13985.790758: dwc3_complete_trb: ep2in: trb ffffffc016071970 (E154:D152) buf 00000000ea800000 size 1x 49152 ctrl 3e6a0460 (hlcs:Sc:isoc-first)
> > irq/399-dwc3-15269 [002] d..1 13985.790808: dwc3_gadget_giveback: ep2in: req ffffff88f834db00 length 0/49152 zsi ==> -18
> >
> >
> > After this point, the uvc driver keeps playing catch up to stay in sync
> > with the host. It tries, but it couldn't catch up. Also, occasionally
> > something seems to be blocking dwc3, creating time gaps. Maybe because
> > of a spin_lock held somewhere?
> >
>
> Could the time gaps be created by the interrupt frequency changes? They
> completely change up the timing of when the transfers are kicked in dwc3 and
> when uvc_video_complete/uvcg_video_pump is called.

You can check all the "fastrpc_msg" tracepoints in the log. These steps
seem to slow down the queuing process in uvc.

>
> > The logic to detect underrun doesn't trigger because the queued list is
> > always non-empty, but the queued requests are expected to be missed
> > already. So you keep seeing missed isoc.
> >
> > There are a few things you can mitigate this issue:
> > 1) Don't set IMI if the request indicates no_interrupt. This reduces the
> > time software needs to handle interrupts.
>
> I did try this out earlier and it did not prevent the video freeze. It does
> make sense what you are suggesting, but because it didn't work for me it made
> me think that not all reqs are being reclaimed after a missed isoc is seen.
> I'll revisit this area again.

I don't expect this to improve much.

>
> > 2) Improve the underrun detection logic.
>
> I like this idea a lot but I'm not up to the task just yet. Will attempt to
> follow your recommendations below and see where I get with this.
>
> > 3) Increase the queuing frequency from the uvc to keep the request queue
> > full. Note that reduce/avoid setting no_interrupt will allow the
> > controller driver to update uvc often to keep requeuing new requests.
> >
> > Best option is 3), but maybe we can do all 3.
> >
>
> I think that this is our best option too. Dan provided a patch to make
> the interrupt skip logic configurable in the uvc driver:
> https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
> https://lore.kernel.org/linux-usb/[email protected]/
>
> > For 2), you can set IMI for all isoc request as it is now. On missed
> > isoc, check for the TRB's scheduled uframe (from TRB info) and compare
> > it to the current uframe (from DSTS) for the number of intervals in
> > between. With the number of queued requests, you can calculate whether
> > the gadget driver queued enough requests. If it doesn't, send End
> > Transfer command and cancel all the queued requests. The next set of
> > requests will be in-sync again.
> >
> > BR,
> > Thinh
> >
> > PS. On a side note, I notice that there are reports of issue when using
> > SG right? Please note that dwc3 driver only allocates 256 TRBs
> > (including a link TRB). Each SG entry takes a TRB. If a request has many
> > SG entries, that eats up the available TRBs. So, even though the UVC may
> > queue many requests, not all of them are prepared immediately. If the
> > TRB ring is full, the driver need to wait for more TRBs to free up
> > before preparing more. From the log, I see that it's sending 48KB. Let's
> > say the uvc splits it into PAGE_SIZE of 4KB, that would take at least 12
> > TRB per request. (Side thought: I'm not sure why UVC needs SG in the
> > first place with its current implementation)
>
> On our platform I am seeing 2 items in the sg list being sent out from the uvc
> driver. The 1st item in the list is a 2 byte uvc header and the 2nd item is
> the 48KBs of data. To me this seems inefficient but sort of makes sense why it
> was done, likely to avoid a memory copy just for the 2 byte header. But I

That doesn't make sense to me, but I'm sure there's a real reason. Just
curious, that's all. :)

> share your concern here, it's possible that other users wont be so lucky and
> will wind up having a lot of page sized items in the sg list.
>
> We are also hoping to make the use of sg configurable with the change:
> https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
> https://lore.kernel.org/linux-usb/[email protected]/
>

Ok.

Thanks,
Thinh

2022-10-20 17:00:27

by Jeff Vanhoof

[permalink] [raw]
Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

Hi Thinh,

On Wed, Oct 19, 2022 at 11:06:08PM +0000, Thinh Nguyen wrote:
> Hi,
>
> On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> > Hi Thinh,
> > On Wed, Oct 19, 2022 at 07:08:27PM +0000, Thinh Nguyen wrote:
> > > On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
>
> <snip>
>
> > > >
> > > > From what I can gather from the log, with the current changes it seems that
> > > > after a missed isoc event few requests are staying longer than expected in the
> > > > started_list (not getting reclaimed) and this is preventing the transmission
> > > > from stopping/starting again, and opening the door for continuous stream of
> > > > missed isoc events that cause what appears to the user as a frozen video.
> > > >
> > > > So one thought, if IOC bit is not set every frame, but IMI bit is, when a
> > > > missed isoc related interrupt occurs it seems likely that more than one trb
> > > > request will need to be reclaimed, but the current set of changes is not
> > > > handling this.
> > > >
> > > > In the good transfer case this issue seems to be taken care of since the IOC
> > > > bit is not set every frame and the reclaimation will loop through every item in
> > > > the started_list and only stop if there are no additional trbs or if one has
> > >
> > > It should stop at the request that associated with the interrupt event,
> > > whether it's because of IMI or IOC.
> >
> > In this case I was concerned that if multipled queued reqs did not have IOC bit
> > set, but there was a missed isoc on one of the last reqs, whether or not we would
> > reclaim all of the requests up to the missed isoc related req. I'm not sure if
> > my concern is valid or not.
> >
>
> There should be no problem. If there's an interrupt event indicating a
> TRB completion, the driver will give back all the requests up to the
> request associated with the interrupt event, and the controller will
> continue processing the remaining TRBs. On the next TRB completion
> event, the driver will again give back all the requests up to the
> request associated with that event.
>

I was testing with the following patch you suggested:

> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 61fba2b7389b..8352f4b5dd9f 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -3657,6 +3657,10 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> if (event->status & DEPEVT_STATUS_SHORT && !chain)
> return 1;
>
> + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> + (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain)
> + return 1;
> +
> if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> (trb->ctrl & DWC3_TRB_CTRL_LST))
> return 1;
>

At this time the IMI bit was set for every frame. With these changes it
appeared that, in the case of a missed isoc, sometimes not all requests would be
reclaimed (enqueued != dequeued even 100ms after the last interrupt was
handled). If the 1st req in the started_list was fine (IMI set, but not IOC),
and a later req was the one actually missed, then because of this status check
the reclamation could stop early and not clean up to the appropriate req. As
suggested yesterday, I also tried only setting the IMI bit when no_interrupt is
not set; however, I was still seeing the complete freezes. After analyzing this
issue a bit, I have updated the diff to look more like this:

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index dfaf9ac24c4f..bb800a81815b 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1230,8 +1230,9 @@ static void __dwc3_prepare_one_trb(struct dwc3_ep *dep, struct dwc3_trb *trb,
 			trb->ctrl = DWC3_TRBCTL_ISOCHRONOUS;
 		}

-		/* always enable Interrupt on Missed ISOC */
-		trb->ctrl |= DWC3_TRB_CTRL_ISP_IMI;
+		/* enable Interrupt on Missed ISOC */
+		if ((!no_interrupt && !chain) || must_interrupt)
+			trb->ctrl |= DWC3_TRB_CTRL_ISP_IMI;
 		break;

 	case USB_ENDPOINT_XFER_BULK:
@@ -3195,6 +3196,11 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
 	if (event->status & DEPEVT_STATUS_SHORT && !chain)
 		return 1;

+	if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
+	    (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain
+	    && (trb->ctrl & DWC3_TRB_CTRL_ISP_IMI))
+		return 1;
+
 	if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
 	    (trb->ctrl & DWC3_TRB_CTRL_LST))
 		return 1;

Here the trb must have the IMI bit set before the reclaim returns early. This
seemed to make the freezes recoverable.
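
To illustrate the difference between the two stop conditions, here is a
simplified user-space model. It is a sketch under assumptions: the bit
definitions are local stand-ins, the SHORT, LST and isoc endpoint checks are
left out, and none of the names below exist in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the dwc3 bits; values are for this model only. */
#define M_TRB_CTRL_CHN     (1u << 2)
#define M_TRB_CTRL_ISP_IMI (1u << 10)
#define M_TRB_CTRL_IOC     (1u << 11)
#define M_EVT_MISSED_ISOC  (1u << 7)

/* First version: any non-chained TRB stops the loop on a missed isoc event. */
static bool stop_v1(uint32_t trb_ctrl, uint32_t evt_status)
{
	bool chain = (trb_ctrl & M_TRB_CTRL_CHN) != 0;

	if ((evt_status & M_EVT_MISSED_ISOC) && !chain)
		return true;
	return (trb_ctrl & M_TRB_CTRL_IOC) != 0;
}

/* Updated version: the TRB must also carry IMI to stop the loop. */
static bool stop_v2(uint32_t trb_ctrl, uint32_t evt_status)
{
	bool chain = (trb_ctrl & M_TRB_CTRL_CHN) != 0;

	if ((evt_status & M_EVT_MISSED_ISOC) && !chain &&
	    (trb_ctrl & M_TRB_CTRL_ISP_IMI))
		return true;
	return (trb_ctrl & M_TRB_CTRL_IOC) != 0;
}

/* Count TRBs reclaimed for one event, walking from the head of the list. */
static size_t reclaimed(const uint32_t *ctrl, size_t n, uint32_t evt_status,
			bool (*stop)(uint32_t, uint32_t))
{
	size_t i;

	assert(ctrl != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (stop(ctrl[i], evt_status))
			return i + 1;
	return n;
}
```

In this model, four TRBs that all carry IMI with IOC only on the last one stop
after a single TRB under stop_v1 on a missed isoc event, while four TRBs with
IMI and IOC only on the last one are all reclaimed under stop_v2.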

> > >
> > > > its HWO bit set. Although I am a bit surprised that we are not yet seeing iommu
> > > > related panics here too since the trb is being reclaimed and given back even
> > > > the HWO bit still set, but maybe I am misunderstanding something. In my earlier
> > > > testing, if a missed isoc event was received and we attempted to
> > > > reclaim/giveback a trb with its HWO bit set, a iommu related panic would be
> > > > seen.
> > >
> > > If the controller processed the TRB, it would clear the HWO bit after
> > > completion. There are cases which the HWO bit is still set for some
> > > TRBs, but the request is completed (e.g. short transfers causing the
> > > controller to stop processing further). That those cases, the driver
> > > would clear the HWO bit manually.
> > >
> > > >
> > > > Can you propose an additional change to handle freeing up the extra trbs in the
> > > > missed isoc case? I think that somehow the HWO bit should be checked before
> > > > entering dwc3_gadget_ep_cleanup_completed_request in order to avoid the trb
> > > > being given back too soon.
> > >
> > > We should not check for TRBs of requests beyond the request associated
> > > with the interrupt event.
> > >
> > > >
> > > > Your thoughts?
> > > >
> > >
> > > The capture shows underrun, and it causes missed isoc.
> > >
> > > Let's take a look at the first missed isoc request (req ffffff88f834db00)
> > >
> > > <...>-22551 [002] d..2 13985.789684: dwc3_ep_queue: ep2in: req ffffff88f834db00 length 0/49152 zsi ==> -115
> > > <...>-22551 [002] dn.2 13985.789728: dwc3_prepare_trb: ep2in: trb ffffffc016071970 (E152:D149) buf 00000000ea800000 size 1x 49152 ctrl 00000461 (Hlcs:Sc:isoc-first)
> > >
> > > Each request uses a one TRB. From above, you can see that there are only
> > > 3 intervals prepared (E152 - D149 = 3).
> > >
> > > The timestamp of the last request completion is 13985.788919.
> > > The timestamp which the queued request is 13985.789728.
> > > We can roughly estimate the diff is at least 809us.
> > >
> > > 3 intervals (3x125us) is less than 809us. So missed isoc is expected:
> > >
> > > irq/399-dwc3-15269 [002] d..1 13985.790754: dwc3_event: event (f9acc08a): ep2in: Transfer In Progress [63916] (sIM)
> > > irq/399-dwc3-15269 [002] d..1 13985.790758: dwc3_complete_trb: ep2in: trb ffffffc016071970 (E154:D152) buf 00000000ea800000 size 1x 49152 ctrl 3e6a0460 (hlcs:Sc:isoc-first)
> > > irq/399-dwc3-15269 [002] d..1 13985.790808: dwc3_gadget_giveback: ep2in: req ffffff88f834db00 length 0/49152 zsi ==> -18
> > >
> > >
> > > After this point, the uvc driver keeps playing catch up to stay in sync
> > > with the host. It tries, but it couldn't catch up. Also, occasionally
> > > something seems to be blocking dwc3, creating time gaps. Maybe because
> > > of a spin_lock held somewhere?
> > >
> >
> > Could the time gaps be created by the interrupt frequency changes? They
> > completely change up the timing of when the transfers are kicked in dwc3 and
> > when uvc_video_complete/uvcg_video_pump is called.
>
> You can check all the "fastrpc_msg" tracepoints in the log. These steps
> seem to slow down the queuing process in uvc.
>

The large time gaps appear related to the low fps used for the camera
configuration. I was able to confirm this by adding additional trace points in
the uvc driver to see if pump was being called after new work was queued up
by v4l2.

Some additional thoughts:

On the scheduling side, one issue I've seen:
1) uvc/dwc3 may take a few milliseconds to queue up the requests for a single
frame. How quickly this is completed could account for some of the differences
seen from one platform to another.

2) If the isoc transfer has not yet started and uvc begins to queue up requests to
dwc3, it appears that the isoc transfer will almost immediately start. And if
the remaining requests for the frame do not come quickly enough, there is a
potential for the transfer to stop or a missed isoc event to occur.

Maybe this happens frequently on my platform because the debug kernel I am
using has a bit too much overhead. Would allowing a bit more time for initially
queuing up some data before kicking off the transfer be prudent? Is 500us
really enough for a mobile platform? We've always had issues with random
frame drops, but on much older platforms I was able to reduce their frequency
by delaying the start, pushing out the frame_number by 16 or 32.
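
The estimate quoted earlier in this thread (3 prepared intervals of 125us each
against a queuing gap of roughly 809us) can be written out as a small helper.
This is only a sketch; it assumes one interval per 125us microframe as in that
estimate, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define UFRAME_US 125u /* one interval, as used in the quoted estimate */

/* Time covered by a number of prepared intervals, in microseconds. */
static uint32_t covered_us(uint32_t intervals)
{
	assert(intervals <= UINT32_MAX / UFRAME_US);
	return intervals * UFRAME_US;
}

/* Smallest number of intervals whose span is at least gap_us. */
static uint32_t intervals_needed(uint32_t gap_us)
{
	assert(gap_us <= UINT32_MAX - UFRAME_US);
	return (gap_us + UFRAME_US - 1) / UFRAME_US;
}
```

covered_us(3) is 375, which is below the 809us gap, and intervals_needed(809)
is 7. A 500us head start corresponds to covered_us(4).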

Also, one problem I am seeing specifically with the skip interrupts:
1) If a missed isoc event occurs, dwc3_gadget_giveback will call the request's
->complete callback where the status -EXDEV is passed. In uvc_video_complete it
will begin to cancel the queue if this status is seen. If this occurs while in
the middle of queuing up the requests for the same frame, then it's quite
possible that the last set of requests will have had no_interrupt=1 set.

2) In dwc3 after the missed isoc event, if the remaining set of requests in the
started_list do not have the IOC bit set, then no further interrupts will
occur. And because the started_list is not emptied the transfer will not be
stopped. Only after uvc queues up the requests for the next frame (which may
come much later, i.e. 30-100ms) will the transfers appear to continue, however
additional missed isoc events will be received and the cycle could continue.

I believe that this is a problem that should be addressed somehow. Not sure if
the uvc driver needs to send a dummy request when an error is received, or if
it should just send one more additional request with the no_interrupt bit not
set, or if it needs to just send the complete frame in spite of the
error. Hopefully I'm not just completely lost on this stuff.
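
A minimal sketch of the condition described in 2), assuming that a request
queued with no_interrupt=1 raises neither an IOC nor an IMI interrupt (as with
the updated diff above). The function is a model for illustration, not uvc or
dwc3 code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * no_interrupt[i] is true when request i was queued with no_interrupt=1.
 * Returns how many requests at the tail of the list can complete without
 * raising any interrupt, i.e. would sit in the started list until
 * something else is queued. Model only.
 */
static size_t stranded_tail(const bool *no_interrupt, size_t n)
{
	size_t stranded = 0;

	assert(no_interrupt != NULL || n == 0);
	while (stranded < n && no_interrupt[n - 1 - stranded])
		stranded++;
	return stranded;
}
```

If queuing is cut short after two no_interrupt requests, stranded_tail()
returns 2. Appending one more request with no_interrupt cleared, one of the
options mentioned above, brings it back to 0.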

Using the above patch, I grabbed the following trace snippets to show what is being seen:

Normal looking missed isoc related trace where enqueued == dequeued and transfer ends as expected:
kworker/u17:5-8169 [000] d..2 187.669572: dwc3_gadget_ep_cmd: ep2in: cmd 'Start Transfer' [49540406] params 00000000 efff5f30 00000000 --> status: Successful
irq/398-dwc3-9882 [000] d..1 187.672504: dwc3_complete_trb: ep2in: trb ffffffc016396f30 (E250:D244) buf 00000000ed470000 size 1x 0 ctrl 12550c60 (hlcs:SC:isoc-first)
irq/398-dwc3-9882 [000] d..1 187.672804: dwc3_complete_trb: ep2in: trb ffffffc016396f40 (E250:D245) buf 00000000ec5e0000 size 1x 0 ctrl 12558060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d..1 187.672885: dwc3_complete_trb: ep2in: trb ffffffc016396f50 (E250:D246) buf 00000000ecd30000 size 1x 0 ctrl 12560060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d..1 187.672942: dwc3_complete_trb: ep2in: trb ffffffc016396f60 (E250:D247) buf 00000000ecd40000 size 1x 0 ctrl 12568060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d..1 187.672993: dwc3_complete_trb: ep2in: trb ffffffc016396f70 (E250:D248) buf 00000000edfb0000 size 1x 0 ctrl 12570060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d..1 187.673049: dwc3_complete_trb: ep2in: trb ffffffc016396f80 (E250:D249) buf 00000000edfa0000 size 1x 0 ctrl 12578c60 (hlcs:SC:isoc-first)
irq/398-dwc3-9882 [000] d..1 187.674615: dwc3_complete_trb: ep2in: trb ffffffc016396f90 (E254:D250) buf 00000000eca80000 size 1x 0 ctrl 12580060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d..1 187.674722: dwc3_complete_trb: ep2in: trb ffffffc016396fa0 (E254:D251) buf 00000000edfa0000 size 1x 49152 ctrl 12588060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d... 187.674793: usb_gadget_giveback_request: ep2in: req ffffff894498a600 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
irq/398-dwc3-9882 [000] d..1 187.675008: dwc3_complete_trb: ep2in: trb ffffffc016396fb0 (E254:D252) buf 00000000edfb0000 size 1x 49152 ctrl 12590060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d... 187.675093: usb_gadget_giveback_request: ep2in: req ffffff892bcd3500 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
irq/398-dwc3-9882 [000] d..1 187.675120: dwc3_complete_trb: ep2in: trb ffffffc016396fc0 (E254:D253) buf 00000000ecd40000 size 1x 49152 ctrl 12598060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d... 187.675207: usb_gadget_giveback_request: ep2in: req ffffff8918f48200 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
irq/398-dwc3-9882 [000] d..1 187.675225: dwc3_complete_trb: ep2in: trb ffffffc016396fd0 (E254:D254) buf 00000000ecd30000 size 1x 49152 ctrl 125a0c60 (hlcs:SC:isoc-first)
irq/398-dwc3-9882 [000] d... 187.675274: usb_gadget_giveback_request: ep2in: req ffffff892bcd3600 length 0/49152 sgs 0/0 stream 0 zsI status -18 --> 0
irq/398-dwc3-9882 [000] d..1 187.675364: dwc3_gadget_ep_cmd: ep2in: cmd 'End Transfer' [50d08] params 00000000 00000000 00000000 --> status: Successful

Abnormal trace where the 'End Transfer' is delayed by a few frames:
kworker/u17:5-8169 [000] dn.2 190.804817: dwc3_gadget_ep_cmd: ep2in: cmd 'Start Transfer' [ab500406] params 00000000 efff5290 00000000 --> status: Successful
irq/398-dwc3-9882 [000] d..1 190.809070: dwc3_complete_trb: ep2in: trb ffffffc016396290 (E51:D42) buf 00000000ef6b0000 size 1x 0 ctrl 2ad40c60 (hlcs:SC:isoc-first)
irq/398-dwc3-9882 [000] d..1 190.809478: dwc3_complete_trb: ep2in: trb ffffffc0163962a0 (E53:D43) buf 00000000ef060000 size 1x 49152 ctrl 2ad48060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d... 190.809534: usb_gadget_giveback_request: ep2in: req ffffff8918f48200 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
irq/398-dwc3-9882 [000] d..1 190.809752: dwc3_complete_trb: ep2in: trb ffffffc0163962b0 (E54:D44) buf 00000000ec5a0000 size 1x 49152 ctrl 2ad50060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d... 190.809878: usb_gadget_giveback_request: ep2in: req ffffff8859af4e00 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
irq/398-dwc3-9882 [000] d..1 190.809966: dwc3_complete_trb: ep2in: trb ffffffc0163962c0 (E54:D45) buf 00000000ef180000 size 1x 49152 ctrl 2ad58060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d... 190.810037: usb_gadget_giveback_request: ep2in: req ffffff8918f48000 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
irq/398-dwc3-9882 [000] d..1 190.810058: dwc3_complete_trb: ep2in: trb ffffffc0163962d0 (E54:D46) buf 00000000ef070000 size 1x 49152 ctrl 2ad60060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d... 190.810139: usb_gadget_giveback_request: ep2in: req ffffff894498a500 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
irq/398-dwc3-9882 [000] d..1 190.810159: dwc3_complete_trb: ep2in: trb ffffffc0163962e0 (E54:D47) buf 00000000eeae0000 size 1x 49152 ctrl 2ad68c60 (hlcs:SC:isoc-first)
irq/398-dwc3-9882 [000] d... 190.810238: usb_gadget_giveback_request: ep2in: req ffffff88bddb9f00 length 0/49152 sgs 0/0 stream 0 zsI status -18 --> 0
irq/398-dwc3-9882 [000] d..1 190.810293: dwc3_complete_trb: ep2in: trb ffffffc0163962f0 (E54:D48) buf 00000000ec0d0000 size 1x 49152 ctrl 2ad70060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d... 190.810368: usb_gadget_giveback_request: ep2in: req ffffff892bcd3500 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
irq/398-dwc3-9882 [000] d..1 190.810389: dwc3_complete_trb: ep2in: trb ffffffc016396300 (E54:D49) buf 00000000ec0e0000 size 1x 0 ctrl 2ad78060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d..1 190.810454: dwc3_complete_trb: ep2in: trb ffffffc016396310 (E54:D50) buf 00000000ec0f0000 size 1x 0 ctrl 2ad80060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d..1 190.810526: dwc3_complete_trb: ep2in: trb ffffffc016396320 (E54:D51) buf 00000000efa20000 size 1x 49152 ctrl 2ad88060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d... 190.810591: usb_gadget_giveback_request: ep2in: req ffffff8859af5800 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
irq/398-dwc3-9882 [000] d..1 190.810609: dwc3_complete_trb: ep2in: trb ffffffc016396330 (E54:D52) buf 00000000efa30000 size 1x 49152 ctrl 2ad90c60 (hlcs:SC:isoc-first)
irq/398-dwc3-9882 [000] d... 190.810676: usb_gadget_giveback_request: ep2in: req ffffff88bddb8300 length 0/49152 sgs 0/0 stream 0 zsI status -18 --> 0
...
irq/398-dwc3-9882 [000] d..1 190.919720: dwc3_complete_trb: ep2in: trb ffffffc016396340 (E56:D53) buf 00000000efa40000 size 1x 49152 ctrl 2ad98060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d... 190.919804: usb_gadget_giveback_request: ep2in: req ffffff894498a900 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
irq/398-dwc3-9882 [000] d..1 190.920065: dwc3_complete_trb: ep2in: trb ffffffc016396350 (E57:D54) buf 00000000efa50000 size 1x 49152 ctrl 2ada0060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d... 190.920116: usb_gadget_giveback_request: ep2in: req ffffff88bddb8b00 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
irq/398-dwc3-9882 [000] d..1 190.920175: dwc3_complete_trb: ep2in: trb ffffffc016396360 (E57:D55) buf 00000000efa60000 size 1x 49152 ctrl 2ada8c60 (hlcs:SC:isoc-first)
irq/398-dwc3-9882 [000] d... 190.920223: usb_gadget_giveback_request: ep2in: req ffffff892bcd3600 length 0/49152 sgs 0/0 stream 0 zsI status -18 --> 0
...
irq/398-dwc3-9882 [000] d..1 190.992902: dwc3_complete_trb: ep2in: trb ffffffc016396370 (E59:D56) buf 00000000efa90000 size 1x 49152 ctrl 2adb0060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d... 190.993014: usb_gadget_giveback_request: ep2in: req ffffff88a0b9b400 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
irq/398-dwc3-9882 [000] d..1 190.993365: dwc3_complete_trb: ep2in: trb ffffffc016396380 (E60:D57) buf 00000000efaa0000 size 1x 49152 ctrl 2adb8060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d... 190.993439: usb_gadget_giveback_request: ep2in: req ffffff8859af4f00 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
irq/398-dwc3-9882 [000] d..1 190.993528: dwc3_complete_trb: ep2in: trb ffffffc016396390 (E60:D58) buf 00000000efab0000 size 1x 49152 ctrl 2adc0c60 (hlcs:SC:isoc-first)
irq/398-dwc3-9882 [000] d... 190.993622: usb_gadget_giveback_request: ep2in: req ffffff8859af4e00 length 0/49152 sgs 0/0 stream 0 zsI status -18 --> 0
...
irq/398-dwc3-9882 [000] d..1 191.104270: dwc3_complete_trb: ep2in: trb ffffffc0163963a0 (E61:D59) buf 00000000ee950000 size 1x 49152 ctrl 2adc8060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d... 191.104410: usb_gadget_giveback_request: ep2in: req ffffff8918f48000 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
irq/398-dwc3-9882 [000] d..1 191.104648: dwc3_complete_trb: ep2in: trb ffffffc0163963b0 (E61:D60) buf 00000000ed530000 size 1x 49152 ctrl 2add0060 (hlcs:sc:isoc-first)
irq/398-dwc3-9882 [000] d... 191.104745: usb_gadget_giveback_request: ep2in: req ffffff894498a500 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
irq/398-dwc3-9882 [000] d..1 191.104834: dwc3_complete_trb: ep2in: trb ffffffc0163963c0 (E61:D61) buf 00000000ef610000 size 1x 49152 ctrl 2add8c60 (hlcs:SC:isoc-first)
irq/398-dwc3-9882 [000] d... 191.104925: usb_gadget_giveback_request: ep2in: req ffffff8859af5800 length 0/49152 sgs 0/0 stream 0 zsI status -18 --> 0
irq/398-dwc3-9882 [000] d..1 191.105098: dwc3_gadget_ep_cmd: ep2in: cmd 'End Transfer' [50d08] params 00000000 00000000 00000000 --> status: Successful

Here you can see that in between frames enqueued != dequeued, and this can
cause the video to freeze/skip for more than a single frame. The length of the
freeze is all timing related and depends on whether or not an interrupt occurs
on the last req in the started_list.
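
For checking a long trace, the (E..:D..) pairs can be pulled out with a small
user-space helper. This is a sketch only: the function names are hypothetical,
and the difference it computes is a rough count that wraps at 256 and ignores
the link TRB slot.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the enqueue/dequeue indices from a dwc3_complete_trb trace line
 * containing "(E<n>:D<m>)". Returns 0 on success, -1 if the pattern is absent.
 */
static int parse_enq_deq(const char *line, unsigned int *enq, unsigned int *deq)
{
	const char *p;

	assert(line && enq && deq);
	p = strstr(line, "(E");
	if (!p || sscanf(p, "(E%u:D%u)", enq, deq) != 2)
		return -1;
	return 0;
}

/* Rough count of TRBs still outstanding; ignores the link TRB slot. */
static unsigned int outstanding(unsigned int enq, unsigned int deq)
{
	return (enq - deq) & 0xffu;
}
```

By this rough measure, the line logged at 190.810609 with (E54:D52) still has
two TRBs outstanding, while (E61:D61) at 191.104834 shows nothing outstanding
just before the End Transfer.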

So, to sort of rehash things again, we came up with the patch:
[PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc
to work around the iommu related crashes being seen with the skip interrupt solution.

As an alternative solution, we are now using the diff mentioned above, and it
seems to avoid the crash; however, we are seeing longer freezes with it, I believe
due to the explanation I provided above.

If you agree with my analysis, what should we do from here? What's the best way to ensure
that the started_list can get cleaned up after a missed isoc with the skip interrupt solution
enabled?


> >
> > > The logic to detect underrun doesn't trigger because the queued list is
> > > always non-empty, but the queued requests are expected to be missed
> > > already. So you keep seeing missed isoc.
> > >
> > > There are a few things you can mitigate this issue:
> > > 1) Don't set IMI if the request indicates no_interrupt. This reduces the
> > > time software needs to handle interrupts.
> >
> > I did try this out earlier and it did not prevent the video freeze. It does
> > make sense what you are suggesting, but because it didn't work for me it made
> > me think that not all reqs are being reclaimed after a missed isoc is seen.
> > I'll revisit this area again.
>
> I don't expect this to improve much.
>
> >
> > > 2) Improve the underrun detection logic.
> >
> > I like this idea a lot but I'm not up to the task just yet. Will attempt to
> > follow your recommendations below and see where I get with this.
> >
> > > 3) Increase the queuing frequency from the uvc to keep the request queue
> > > full. Note that reduce/avoid setting no_interrupt will allow the
> > > controller driver to update uvc often to keep requeuing new requests.
> > >
> > > Best option is 3), but maybe we can do all 3.
> > >
> >
> > I think that this is our best option too. Dan provided a patch to make
> > the interrupt skip logic configurable in the uvc driver:
> > https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/
> > https://lore.kernel.org/linux-usb/[email protected]/
> >
> > > For 2), you can set IMI for all isoc request as it is now. On missed
> > > isoc, check for the TRB's scheduled uframe (from TRB info) and compare
> > > it to the current uframe (from DSTS) for the number of intervals in
> > > between. With the number of queued requests, you can calculate whether
> > > the gadget driver queued enough requests. If it doesn't, send End
> > > Transfer command and cancel all the queued requests. The next set of
> > > requests will be in-sync again.
> > >
> > > BR,
> > > Thinh
> > >
> > > PS. On a side note, I notice that there are reports of issues when using
> > > SG, right? Please note that the dwc3 driver only allocates 256 TRBs
> > > (including a link TRB). Each SG entry takes a TRB. If a request has many
> > > SG entries, that eats up the available TRBs. So, even though the UVC may
> > > queue many requests, not all of them are prepared immediately. If the
> > > TRB ring is full, the driver needs to wait for more TRBs to free up
> > > before preparing more. From the log, I see that it's sending 48KB. Let's
> > > say the uvc splits it into PAGE_SIZE of 4KB, that would take at least 12
> > > TRBs per request. (Side thought: I'm not sure why UVC needs SG in the
> > > first place with its current implementation)
> >
> > On our platform I am seeing 2 items in the sg list being sent out from the uvc
> > driver. The 1st item in the list is a 2 byte uvc header and the 2nd item is
> > the 48KBs of data. To me this seems inefficient but sort of makes sense why it
> > was done, likely to avoid a memory copy just for the 2 byte header. But I
>
> That doesn't make sense to me, but I'm sure there's a real reason. Just
> curious, that's all. :)
>
> > share your concern here, it's possible that other users won't be so lucky and
> > will wind up having a lot of page sized items in the sg list.
> >
> > We are also hoping to make the use of sg configurable with the change:
> > https://urldefense.com/v3/__https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/__;!!A4F2R9G_pg!eBu4j9m9C_XJHcuTXmYqo4CAe8bcQ0ZC3UWT3NJUZYYG6S-VJpriVwd2Q5NmAOFnN2PLgTauFDZ-fAiSl95Q$
> > https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/[email protected]/__;!!A4F2R9G_pg!eBu4j9m9C_XJHcuTXmYqo4CAe8bcQ0ZC3UWT3NJUZYYG6S-VJpriVwd2Q5NmAOFnN2PLgTauFDZ-fDoaNx6Q$
> >
>
> Ok.
>
> Thanks,
> Thinh

Thanks,
Jeff

2022-10-20 21:25:15

by Jeff Vanhoof

[permalink] [raw]
Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

Hi Thinh,

> On Wed, Oct 19, 2022 at 11:06:08PM +0000, Thinh Nguyen wrote:
> > Hi,
> >

<snip>

> > >
> > > > The logic to detect underrun doesn't trigger because the queued list is
> > > > always non-empty, but the queued requests are expected to be missed
> > > > already. So you keep seeing missed isoc.
> > > >
> > > > There are a few things you can do to mitigate this issue:
> > > > 1) Don't set IMI if the request indicates no_interrupt. This reduces the
> > > > time software needs to handle interrupts.
> > >
> > >
> > > > 2) Improve the underrun detection logic.
> > >
> > >
> > > > 3) Increase the queuing frequency from the uvc to keep the request queue
> > > > full. Note that reducing or avoiding the use of no_interrupt will allow
> > > > the controller driver to update uvc more often so that it keeps requeuing
> > > > new requests.
> > > >
> > > > Best option is 3), but maybe we can do all 3.
> > > >
> > >

I forgot about your option 2. Will start looking into it.
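
Option 2 as described earlier in the thread (compare the uframe a missed TRB
was scheduled for against the current uframe read from DSTS, then check
whether the queued requests cover the gap) might be sketched roughly as
follows. This is standalone C, not dwc3 code: the 14-bit counter width and the
names uframes_between() and isoc_underrun() are assumptions made for
illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the uframe counter wraps at 14 bits. */
#define UFRAME_MASK 0x3fffu

/* Number of uframes from 'from' to 'to', accounting for counter wrap. */
static uint32_t uframes_between(uint32_t from, uint32_t to)
{
	return (to - from) & UFRAME_MASK;
}

/*
 * Hypothetical helper: returns true when the requests still queued cannot
 * cover the gap between the missed TRB's scheduled uframe and the current
 * uframe, i.e. every queued request is already scheduled in the past.
 * interval_uframes is the endpoint service interval (1 means 125us).
 */
static bool isoc_underrun(uint32_t scheduled_uframe, uint32_t current_uframe,
			  uint32_t interval_uframes, uint32_t queued_requests)
{
	uint32_t intervals_behind;

	if (interval_uframes == 0)
		interval_uframes = 1;	/* guard against division by zero */

	intervals_behind = uframes_between(scheduled_uframe, current_uframe) /
			   interval_uframes;

	return queued_requests <= intervals_behind;
}
```

When isoc_underrun() returns true, the thread's suggestion is to send End
Transfer and cancel the queued requests so that the next batch starts in sync.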

Thanks,
Jeff

2022-10-20 23:01:25

by Thinh Nguyen

[permalink] [raw]
Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

On Thu, Oct 20, 2022, Jeff Vanhoof wrote:
> Hi Thinh,
>
> On Wed, Oct 19, 2022 at 11:06:08PM +0000, Thinh Nguyen wrote:
> > Hi,
> >
> > On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> > > Hi Thinh,
> > > On Wed, Oct 19, 2022 at 07:08:27PM +0000, Thinh Nguyen wrote:
> > > > On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> >
> > <snip>
> >
> > > > >
> > > > > From what I can gather from the log, with the current changes it seems that
> > > > > after a missed isoc event a few requests are staying longer than expected in the
> > > > > started_list (not getting reclaimed) and this is preventing the transmission
> > > > > from stopping/starting again, and opening the door for a continuous stream of
> > > > > missed isoc events that cause what appears to the user as a frozen video.
> > > > >
> > > > > So one thought, if IOC bit is not set every frame, but IMI bit is, when a
> > > > > missed isoc related interrupt occurs it seems likely that more than one trb
> > > > > request will need to be reclaimed, but the current set of changes is not
> > > > > handling this.
> > > > >
> > > > > In the good transfer case this issue seems to be taken care of since the IOC
> > > > > bit is not set every frame and the reclamation will loop through every item in
> > > > > the started_list and only stop if there are no additional trbs or if one has
> > > >
> > > > It should stop at the request that is associated with the interrupt event,
> > > > whether it's because of IMI or IOC.
> > >
> > > In this case I was concerned that if multiple queued reqs did not have IOC bit
> > > set, but there was a missed isoc on one of the last reqs, whether or not we would
> > > reclaim all of the requests up to the missed isoc related req. I'm not sure if
> > > my concern is valid or not.
> > >
> >
> > There should be no problem. If there's an interrupt event indicating a
> > TRB completion, the driver will give back all the requests up to the
> > request associated with the interrupt event, and the controller will
> > continue processing the remaining TRBs. On the next TRB completion
> > event, the driver will again give back all the requests up to the
> > request associated with that event.
> >
>
> I was testing with the following patch you suggested:
>
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index 61fba2b7389b..8352f4b5dd9f 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -3657,6 +3657,10 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> >  	if (event->status & DEPEVT_STATUS_SHORT && !chain)
> >  		return 1;
> >
> > +	if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > +	    (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain)
> > +		return 1;
> > +
> >  	if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> >  	    (trb->ctrl & DWC3_TRB_CTRL_LST))
> >  		return 1;
> >
>
> At this time the IMI bit was set for every frame. With these changes it
> appeared in case of missed isoc that sometimes not all requests would be
> reclaimed (enqueued != dequeued even 100ms after the last interrupt was
> handled). If the 1st req in the started_list was fine (IMI set, but not IOC),
> and a later req was the one actually missed, because of this status check the
> reclamation could stop early and not clean up to the appropriate req. As

Oops. You're right.

> suggested yesterday, I also tried only setting the IMI bit when no_interrupt is
> not set, however I was still seeing the complete freezes. After analyzing this
> issue a bit, I have updated the diff to look more like this:
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index dfaf9ac24c4f..bb800a81815b 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -1230,8 +1230,9 @@ static void __dwc3_prepare_one_trb(struct dwc3_ep *dep, struct dwc3_trb *trb,
>  			trb->ctrl = DWC3_TRBCTL_ISOCHRONOUS;
>  		}
>
> -		/* always enable Interrupt on Missed ISOC */
> -		trb->ctrl |= DWC3_TRB_CTRL_ISP_IMI;
> +		/* enable Interrupt on Missed ISOC */
> +		if ((!no_interrupt && !chain) || must_interrupt)
> +			trb->ctrl |= DWC3_TRB_CTRL_ISP_IMI;
>  		break;

Either all or none of the TRBs of a request should have IMI set, not just
some of them.

>
> case USB_ENDPOINT_XFER_BULK:
> @@ -3195,6 +3196,11 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
>  	if (event->status & DEPEVT_STATUS_SHORT && !chain)
>  		return 1;
>
> +	if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> +	    (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain
> +	    && (trb->ctrl & DWC3_TRB_CTRL_ISP_IMI))
> +		return 1;
> +
>  	if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
>  	    (trb->ctrl & DWC3_TRB_CTRL_LST))
>  		return 1;
>
> Where the trb must have the IMI set before returning early. This seemed to make
> the freezes recoverable.

Can you try this revised change:

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 61fba2b7389b..a69d8c28d86b 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -3654,7 +3654,7 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
 	if ((trb->ctrl & DWC3_TRB_CTRL_HWO) && status != -ESHUTDOWN)
 		return 1;

-	if (event->status & DEPEVT_STATUS_SHORT && !chain)
+	if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
 		return 1;

 	if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
@@ -3673,6 +3673,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
 	struct scatterlist *s;
 	unsigned int num_queued = req->num_queued_sgs;
 	unsigned int i;
+	bool missed_isoc = false;
 	int ret = 0;

 	for_each_sg(sg, s, num_queued, i) {
@@ -3681,12 +3682,18 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
 		req->sg = sg_next(s);
 		req->num_queued_sgs--;

+		if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC)
+			missed_isoc = true;
+
 		ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
 				trb, event, status, true);
 		if (ret)
 			break;
 	}

+	if (missed_isoc)
+		ret = 1;
+
 	return ret;
 }
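
The idea behind keying the stop condition on the TRB's own status (rather than
on the event status, which made reclaim stop at the first request seen) can be
illustrated with a small standalone model. Nothing below is dwc3 code: the
model_* types, the status enum and reclaim_for_event() are hypothetical names,
and the model ignores SG chaining and short transfers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-TRB completion status, standing in for the TRB status field. */
enum model_trb_status { MODEL_TRB_OK, MODEL_TRB_MISSED_ISOC };

struct model_trb {
	enum model_trb_status status;
	bool hwo;	/* still owned by the hardware */
	bool ioc;	/* interrupt-on-completion was requested */
};

struct model_req {
	const struct model_trb *trbs;
	size_t num_trbs;
};

/*
 * Walk the started requests in order and return how many can be given back
 * for a single event. Reclaim stops before a request the hardware still
 * owns, and after the first request that has IOC set or reports a missed isoc.
 */
static size_t reclaim_for_event(const struct model_req *reqs, size_t num_reqs)
{
	size_t given_back = 0;

	for (size_t r = 0; r < num_reqs; r++) {
		bool stop_after = false;

		for (size_t t = 0; t < reqs[r].num_trbs; t++) {
			const struct model_trb *trb = &reqs[r].trbs[t];

			if (trb->hwo)
				return given_back;
			if (trb->ioc || trb->status == MODEL_TRB_MISSED_ISOC)
				stop_after = true;
		}

		given_back++;
		if (stop_after)
			break;
	}

	return given_back;
}
```

For a started list of [ok, ok, missed, still-owned], the model gives back three
requests: the two good ones and the one that was actually missed. That is the
case raised above, where the first request was fine and a later one was missed.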


>
> > > >
> > > > > its HWO bit set. Although I am a bit surprised that we are not yet seeing iommu
> > > > > related panics here too since the trb is being reclaimed and given back even
> > > > > with the HWO bit still set, but maybe I am misunderstanding something. In my earlier
> > > > > testing, if a missed isoc event was received and we attempted to
> > > > > reclaim/giveback a trb with its HWO bit set, an iommu related panic would be
> > > > > seen.
> > > >
> > > > If the controller processed the TRB, it would clear the HWO bit after
> > > > completion. There are cases in which the HWO bit is still set for some
> > > > TRBs, but the request is completed (e.g. short transfers causing the
> > > > controller to stop processing further). In those cases, the driver
> > > > would clear the HWO bit manually.
> > > >
> > > > >
> > > > > Can you propose an additional change to handle freeing up the extra trbs in the
> > > > > missed isoc case? I think that somehow the HWO bit should be checked before
> > > > > entering dwc3_gadget_ep_cleanup_completed_request in order to avoid the trb
> > > > > being given back too soon.
> > > >
> > > > We should not check for TRBs of requests beyond the request associated
> > > > with the interrupt event.
> > > >
> > > > >
> > > > > Your thoughts?
> > > > >
> > > >
> > > > The capture shows underrun, and it causes missed isoc.
> > > >
> > > > Let's take a look at the first missed isoc request (req ffffff88f834db00)
> > > >
> > > > <...>-22551 [002] d..2 13985.789684: dwc3_ep_queue: ep2in: req ffffff88f834db00 length 0/49152 zsi ==> -115
> > > > <...>-22551 [002] dn.2 13985.789728: dwc3_prepare_trb: ep2in: trb ffffffc016071970 (E152:D149) buf 00000000ea800000 size 1x 49152 ctrl 00000461 (Hlcs:Sc:isoc-first)
> > > >
> > > > Each request uses one TRB. From above, you can see that there are only
> > > > 3 intervals prepared (E152 - D149 = 3).
> > > >
> > > > The timestamp of the last request completion is 13985.788919.
> > > > The timestamp at which the request is queued is 13985.789728.
> > > > We can roughly estimate the diff is at least 809us.
> > > >
> > > > 3 intervals (3x125us) is less than 809us. So missed isoc is expected:
> > > >
> > > > irq/399-dwc3-15269 [002] d..1 13985.790754: dwc3_event: event (f9acc08a): ep2in: Transfer In Progress [63916] (sIM)
> > > > irq/399-dwc3-15269 [002] d..1 13985.790758: dwc3_complete_trb: ep2in: trb ffffffc016071970 (E154:D152) buf 00000000ea800000 size 1x 49152 ctrl 3e6a0460 (hlcs:Sc:isoc-first)
> > > > irq/399-dwc3-15269 [002] d..1 13985.790808: dwc3_gadget_giveback: ep2in: req ffffff88f834db00 length 0/49152 zsi ==> -18
> > > >
> > > >
> > > > After this point, the uvc driver keeps playing catch up to stay in sync
> > > > with the host. It tries, but it couldn't catch up. Also, occasionally
> > > > something seems to be blocking dwc3, creating time gaps. Maybe because
> > > > of a spin_lock held somewhere?
> > > >
> > >
> > > Could the time gaps be created by the interrupt frequency changes? They
> > > completely change up the timing of when the transfers are kicked in dwc3 and
> > > when uvc_video_complete/uvcg_video_pump is called.
> >
> > You can check all the "fastrpc_msg" tracepoints in the log. These steps
> > seem to slow down the queuing process in uvc.
> >
>
> The large time gaps appear related to the low fps used for the camera
> configuration. I was able to confirm this by adding additional trace points in
> the uvc driver to see if pump was being called after new work was queued up
> by v4l2.
>
> Some additional thoughts:
>
> On the scheduling side, one issue I've seen:
> 1) uvc/dwc3 may take a few millisecs to queue up the requests for a single
> frame. How quickly this is completed could make for some of the differences
> seen from one platform to another.

Are you referring to a video frame or a usb periodic frame/microframe? The
dwc3 driver doesn't take much time at all to take queued requests,
prepare TRBs and pass them to the controller (much less than a ms). The UVC
driver has to keep up and feed requests at the interval the endpoint is
configured for, which is 125us per request for UVC.

The controller doesn't take much time completing the TRB either. The
time for DMA to complete and the interrupt to be asserted should be
short relative to an interval. However, you're at the mercy of when the
OS handles the interrupt. Different setups will have different interrupt
handler latency, and it may not be consistent. So, your gadget driver
queuing frequency needs to account for that (i.e. maintain the frequency
at a little less than 125us per request).
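
As a rough worked example of the headroom involved, the number of requests
that need to be in flight to ride out a given handling delay is the delay
divided by the service interval, rounded up. The helper below is a
hypothetical illustration, not driver code:

```c
#include <stdint.h>

/*
 * Requests that must already be queued to cover latency_us of delay when
 * the endpoint consumes one request every interval_us. Rounds up.
 */
static uint32_t requests_to_cover(uint32_t latency_us, uint32_t interval_us)
{
	if (interval_us == 0)
		return 0;

	return (latency_us + interval_us - 1) / interval_us;
}
```

With the figures quoted earlier in this thread (a gap of about 809us and a
125us interval), requests_to_cover(809, 125) gives 7, while only 3 intervals
had been prepared.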

>
> 2) If the isoc transfer has not yet started and uvc begins to queue up requests to
> dwc3, it appears that the isoc transfer will almost immediately start. And if
> the remaining requests for the frame do not come quickly enough, there is a
> potential for the transfer to stop or a missed isoc event to occur.

The isoc transfers start the moment the SetInterface control request
completes. That tells the host to start polling (sending ACK TP) for
isoc transfers, but there may not be any real data. If there's no
data, the controller will send a ZLP (for IN direction) for every poll.

When there's real data, the driver will schedule the uframe at which the
first data should be sent. The gadget driver is supposed to maintain a
constant amount of data for the endpoint after that. Otherwise we get
underrun and missed isoc.

>
> Maybe this happens frequently on my platform because the debug kernel I am
> using has a bit too much overhead. Would allowing a bit more time for initially
> queuing up some data before kicking off the transfer be prudent? Is 500us
> really enough for a mobile platform? We've always had issues with random
> frame drops, but on much older platforms I was able to reduce the frequency of
> them by delaying the start by pushing out the frame_number by 16 or 32.

The 500us should be enough. As long as you maintain a constant flow of
requests to avoid underrun, missed isoc is less likely. The odd thing
with UVC is that it treats the isoc endpoint as if it's bulk, and "pumps"
a set of requests every once in a while. So the endpoint has to
restart for every pump to avoid underrun.

>
> Also, one problem I am seeing specifically with the skip interrupts:
> 1) If a missed isoc event occurs, dwc3_gadget_giveback will call the request's
> ->complete callback where the status -EXDEV is passed. In uvc_video_complete it
> will begin to cancel the queue if this status is seen. If this occurs while in
> the middle of queuing up the requests for the same frame, then it's quite
> possible that the last set of requests will have had no_interrupt=1 set.

That's not good. Also, it seems like it's blocking the queuing?

>
> 2) In dwc3 after the missed isoc event, if the remaining set of requests in the
> started_list do not have the IOC bit set, then no further interrupts will
> occur. And because the started_list is not emptied the transfer will not be
> stopped. Only after uvc queues up the requests for the next frame (which may
> come much later, i.e. 30-100ms) will the transfers appear to continue, however
> additional missed isoc events will be received and the cycle could continue.

If the gadget driver decides to not receive interrupts, then yes, this
can happen. If there's no interrupt telling the dwc3 driver to start
cleaning up the used TRBs, then the dwc3 driver is not going to do so
automatically. However, if the gadget driver keeps queuing more
requests until the TRB ring is full, the last TRB of the TRB ring will be
set with IOC, so there will be an interrupt.

>
> I believe that this is a problem that should be addressed somehow. Not sure if

This seems to be more of an issue with the gadget driver, right? It
should track and maintain the list of requests, ensuring that at least one
of the requests in the queue has IOC.

One thing we can change is to not give back a request with no_interrupt
right away, and instead keep it in a list until a request with IOC
completes, then return them all.
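
A minimal sketch of one way a gadget driver could make sure that at least one
queued request always raises an interrupt: suppress the interrupt on most
requests, but never on the last request of a batch. The function name and the
"stride" parameter are hypothetical and are not taken from the uvc driver.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether request number 'index' (0-based) in a batch of
 * 'batch_len' requests may set no_interrupt. Every 'stride'-th request
 * and the last request of the batch keep their interrupt, so the
 * controller driver always gets a chance to clean up the started list.
 */
static bool may_skip_interrupt(size_t index, size_t batch_len, size_t stride)
{
	if (batch_len == 0 || index + 1 >= batch_len)
		return false;	/* last queued request always interrupts */

	if (stride <= 1)
		return false;	/* interrupt on every request */

	return ((index + 1) % stride) != 0;
}
```

With a batch of 10 and a stride of 4, requests 3, 7 and 9 keep their interrupt
and the rest may skip it.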

> the uvc driver needs to send a dummy request when an error is received, or if
> it should just send one more additional request with the no_interrupt bit not
> set, or if it needs to just send the complete frame in spite of the
> error. Hopefully I'm not just completely lost on this stuff.
>
> Using the above patch, I grabbed the following trace snippets to show what is being seen:
>
> Normal looking missed isoc related trace where enqueued == dequeued and transfer ends as expected:
> kworker/u17:5-8169 [000] d..2 187.669572: dwc3_gadget_ep_cmd: ep2in: cmd 'Start Transfer' [49540406] params 00000000 efff5f30 00000000 --> status: Successful
> irq/398-dwc3-9882 [000] d..1 187.672504: dwc3_complete_trb: ep2in: trb ffffffc016396f30 (E250:D244) buf 00000000ed470000 size 1x 0 ctrl 12550c60 (hlcs:SC:isoc-first)
> irq/398-dwc3-9882 [000] d..1 187.672804: dwc3_complete_trb: ep2in: trb ffffffc016396f40 (E250:D245) buf 00000000ec5e0000 size 1x 0 ctrl 12558060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d..1 187.672885: dwc3_complete_trb: ep2in: trb ffffffc016396f50 (E250:D246) buf 00000000ecd30000 size 1x 0 ctrl 12560060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d..1 187.672942: dwc3_complete_trb: ep2in: trb ffffffc016396f60 (E250:D247) buf 00000000ecd40000 size 1x 0 ctrl 12568060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d..1 187.672993: dwc3_complete_trb: ep2in: trb ffffffc016396f70 (E250:D248) buf 00000000edfb0000 size 1x 0 ctrl 12570060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d..1 187.673049: dwc3_complete_trb: ep2in: trb ffffffc016396f80 (E250:D249) buf 00000000edfa0000 size 1x 0 ctrl 12578c60 (hlcs:SC:isoc-first)
> irq/398-dwc3-9882 [000] d..1 187.674615: dwc3_complete_trb: ep2in: trb ffffffc016396f90 (E254:D250) buf 00000000eca80000 size 1x 0 ctrl 12580060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d..1 187.674722: dwc3_complete_trb: ep2in: trb ffffffc016396fa0 (E254:D251) buf 00000000edfa0000 size 1x 49152 ctrl 12588060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d... 187.674793: usb_gadget_giveback_request: ep2in: req ffffff894498a600 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 187.675008: dwc3_complete_trb: ep2in: trb ffffffc016396fb0 (E254:D252) buf 00000000edfb0000 size 1x 49152 ctrl 12590060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d... 187.675093: usb_gadget_giveback_request: ep2in: req ffffff892bcd3500 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 187.675120: dwc3_complete_trb: ep2in: trb ffffffc016396fc0 (E254:D253) buf 00000000ecd40000 size 1x 49152 ctrl 12598060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d... 187.675207: usb_gadget_giveback_request: ep2in: req ffffff8918f48200 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 187.675225: dwc3_complete_trb: ep2in: trb ffffffc016396fd0 (E254:D254) buf 00000000ecd30000 size 1x 49152 ctrl 125a0c60 (hlcs:SC:isoc-first)
> irq/398-dwc3-9882 [000] d... 187.675274: usb_gadget_giveback_request: ep2in: req ffffff892bcd3600 length 0/49152 sgs 0/0 stream 0 zsI status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 187.675364: dwc3_gadget_ep_cmd: ep2in: cmd 'End Transfer' [50d08] params 00000000 00000000 00000000 --> status: Successful
>
> Abnormal trace where the 'End Transfer' is delayed by a few frames:
> kworker/u17:5-8169 [000] dn.2 190.804817: dwc3_gadget_ep_cmd: ep2in: cmd 'Start Transfer' [ab500406] params 00000000 efff5290 00000000 --> status: Successful
> irq/398-dwc3-9882 [000] d..1 190.809070: dwc3_complete_trb: ep2in: trb ffffffc016396290 (E51:D42) buf 00000000ef6b0000 size 1x 0 ctrl 2ad40c60 (hlcs:SC:isoc-first)
> irq/398-dwc3-9882 [000] d..1 190.809478: dwc3_complete_trb: ep2in: trb ffffffc0163962a0 (E53:D43) buf 00000000ef060000 size 1x 49152 ctrl 2ad48060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d... 190.809534: usb_gadget_giveback_request: ep2in: req ffffff8918f48200 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 190.809752: dwc3_complete_trb: ep2in: trb ffffffc0163962b0 (E54:D44) buf 00000000ec5a0000 size 1x 49152 ctrl 2ad50060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d... 190.809878: usb_gadget_giveback_request: ep2in: req ffffff8859af4e00 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 190.809966: dwc3_complete_trb: ep2in: trb ffffffc0163962c0 (E54:D45) buf 00000000ef180000 size 1x 49152 ctrl 2ad58060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d... 190.810037: usb_gadget_giveback_request: ep2in: req ffffff8918f48000 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 190.810058: dwc3_complete_trb: ep2in: trb ffffffc0163962d0 (E54:D46) buf 00000000ef070000 size 1x 49152 ctrl 2ad60060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d... 190.810139: usb_gadget_giveback_request: ep2in: req ffffff894498a500 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 190.810159: dwc3_complete_trb: ep2in: trb ffffffc0163962e0 (E54:D47) buf 00000000eeae0000 size 1x 49152 ctrl 2ad68c60 (hlcs:SC:isoc-first)
> irq/398-dwc3-9882 [000] d... 190.810238: usb_gadget_giveback_request: ep2in: req ffffff88bddb9f00 length 0/49152 sgs 0/0 stream 0 zsI status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 190.810293: dwc3_complete_trb: ep2in: trb ffffffc0163962f0 (E54:D48) buf 00000000ec0d0000 size 1x 49152 ctrl 2ad70060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d... 190.810368: usb_gadget_giveback_request: ep2in: req ffffff892bcd3500 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 190.810389: dwc3_complete_trb: ep2in: trb ffffffc016396300 (E54:D49) buf 00000000ec0e0000 size 1x 0 ctrl 2ad78060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d..1 190.810454: dwc3_complete_trb: ep2in: trb ffffffc016396310 (E54:D50) buf 00000000ec0f0000 size 1x 0 ctrl 2ad80060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d..1 190.810526: dwc3_complete_trb: ep2in: trb ffffffc016396320 (E54:D51) buf 00000000efa20000 size 1x 49152 ctrl 2ad88060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d... 190.810591: usb_gadget_giveback_request: ep2in: req ffffff8859af5800 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 190.810609: dwc3_complete_trb: ep2in: trb ffffffc016396330 (E54:D52) buf 00000000efa30000 size 1x 49152 ctrl 2ad90c60 (hlcs:SC:isoc-first)
> irq/398-dwc3-9882 [000] d... 190.810676: usb_gadget_giveback_request: ep2in: req ffffff88bddb8300 length 0/49152 sgs 0/0 stream 0 zsI status -18 --> 0
> ...
> irq/398-dwc3-9882 [000] d..1 190.919720: dwc3_complete_trb: ep2in: trb ffffffc016396340 (E56:D53) buf 00000000efa40000 size 1x 49152 ctrl 2ad98060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d... 190.919804: usb_gadget_giveback_request: ep2in: req ffffff894498a900 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 190.920065: dwc3_complete_trb: ep2in: trb ffffffc016396350 (E57:D54) buf 00000000efa50000 size 1x 49152 ctrl 2ada0060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d... 190.920116: usb_gadget_giveback_request: ep2in: req ffffff88bddb8b00 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 190.920175: dwc3_complete_trb: ep2in: trb ffffffc016396360 (E57:D55) buf 00000000efa60000 size 1x 49152 ctrl 2ada8c60 (hlcs:SC:isoc-first)
> irq/398-dwc3-9882 [000] d... 190.920223: usb_gadget_giveback_request: ep2in: req ffffff892bcd3600 length 0/49152 sgs 0/0 stream 0 zsI status -18 --> 0
> ...
> irq/398-dwc3-9882 [000] d..1 190.992902: dwc3_complete_trb: ep2in: trb ffffffc016396370 (E59:D56) buf 00000000efa90000 size 1x 49152 ctrl 2adb0060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d... 190.993014: usb_gadget_giveback_request: ep2in: req ffffff88a0b9b400 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 190.993365: dwc3_complete_trb: ep2in: trb ffffffc016396380 (E60:D57) buf 00000000efaa0000 size 1x 49152 ctrl 2adb8060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d... 190.993439: usb_gadget_giveback_request: ep2in: req ffffff8859af4f00 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 190.993528: dwc3_complete_trb: ep2in: trb ffffffc016396390 (E60:D58) buf 00000000efab0000 size 1x 49152 ctrl 2adc0c60 (hlcs:SC:isoc-first)
> irq/398-dwc3-9882 [000] d... 190.993622: usb_gadget_giveback_request: ep2in: req ffffff8859af4e00 length 0/49152 sgs 0/0 stream 0 zsI status -18 --> 0
> ...
> irq/398-dwc3-9882 [000] d..1 191.104270: dwc3_complete_trb: ep2in: trb ffffffc0163963a0 (E61:D59) buf 00000000ee950000 size 1x 49152 ctrl 2adc8060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d... 191.104410: usb_gadget_giveback_request: ep2in: req ffffff8918f48000 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 191.104648: dwc3_complete_trb: ep2in: trb ffffffc0163963b0 (E61:D60) buf 00000000ed530000 size 1x 49152 ctrl 2add0060 (hlcs:sc:isoc-first)
> irq/398-dwc3-9882 [000] d... 191.104745: usb_gadget_giveback_request: ep2in: req ffffff894498a500 length 0/49152 sgs 0/0 stream 0 zsi status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 191.104834: dwc3_complete_trb: ep2in: trb ffffffc0163963c0 (E61:D61) buf 00000000ef610000 size 1x 49152 ctrl 2add8c60 (hlcs:SC:isoc-first)
> irq/398-dwc3-9882 [000] d... 191.104925: usb_gadget_giveback_request: ep2in: req ffffff8859af5800 length 0/49152 sgs 0/0 stream 0 zsI status -18 --> 0
> irq/398-dwc3-9882 [000] d..1 191.105098: dwc3_gadget_ep_cmd: ep2in: cmd 'End Transfer' [50d08] params 00000000 00000000 00000000 --> status: Successful
>
> Where you can see that in between frames enqueued != dequeued, and this can
> cause the video to freeze/skip for more than a single frame. The length of the
> freeze is all timing related and depends on if an interrupt occurs on the last
> req or not in the started_list.
>
> So, to sort of rehash things again, we came up with the patch:
> [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc
> to workaround the iommu related crashes being seen with the skip interrupt solution.
>

We shouldn't just End Transfer when there's missed isoc. That's not
good.

> As an alternative solution, we are now using the diff mentioned above, and it
> seems to avoid the crash, however we are seeing longer freezes with it, I believe
> due to the explanation I provided above.
>
> If you agree with my analysis, what should we do from here? What's the best way to ensure
> that the started_list can get cleaned up after a missed isoc with the skip interrupt solution
> enabled?
>

See my previous comments.

>
> > >
> > > > The logic to detect underrun doesn't trigger because the queued list is
> > > > always non-empty, but the queued requests are expected to be missed
> > > > already. So you keep seeing missed isoc.
> > > >
> > > > There are a few things you can do to mitigate this issue:
> > > > 1) Don't set IMI if the request indicates no_interrupt. This reduces the
> > > > time software needs to handle interrupts.
> > >
> > > I did try this out earlier and it did not prevent the video freeze. It does
> > > make sense what you are suggesting, but because it didn't work for me it made
> > > me think that not all reqs are being reclaimed after a missed isoc is seen.
> > > I'll revisit this area again.
> >
> > I don't expect this to improve much.
> >
> > >
> > > > 2) Improve the underrun detection logic.
> > >
> > > I like this idea a lot but I'm not up to the task just yet. Will attempt to
> > > follow your recommendations below and see where I get with this.
> > >
> > > > 3) Increase the queuing frequency from the uvc to keep the request queue
> > > > full. Note that reducing or avoiding the use of no_interrupt will allow
> > > > the controller driver to update uvc more often so that it keeps requeuing
> > > > new requests.
> > > >
> > > > Best option is 3), but maybe we can do all 3.
> > > >
> > >
> > > I think that this is our best option too. Dan provided a patch to make
> > > the interrupt skip logic configurable in the uvc driver:
> > > https://urldefense.com/v3/__https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/__;!!A4F2R9G_pg!eBu4j9m9C_XJHcuTXmYqo4CAe8bcQ0ZC3UWT3NJUZYYG6S-VJpriVwd2Q5NmAOFnN2PLgTauFDZ-fBM35ftO$
> > > https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/[email protected]/__;!!A4F2R9G_pg!eBu4j9m9C_XJHcuTXmYqo4CAe8bcQ0ZC3UWT3NJUZYYG6S-VJpriVwd2Q5NmAOFnN2PLgTauFDZ-fCLOxuhL$
> > >
> > > > For 2), you can set IMI for all isoc requests as it is now. On missed
> > > > isoc, check for the TRB's scheduled uframe (from TRB info) and compare
> > > > it to the current uframe (from DSTS) for the number of intervals in
> > > > between. With the number of queued requests, you can calculate whether
> > > > the gadget driver queued enough requests. If it didn't, send the End
> > > > Transfer command and cancel all the queued requests. The next set of
> > > > requests will be in-sync again.
> > > >
> > > > BR,
> > > > Thinh
> > > >
> > > > PS. On a side note, I notice that there are reports of issues when using
> > > > SG, right? Please note that the dwc3 driver only allocates 256 TRBs
> > > > (including a link TRB). Each SG entry takes a TRB. If a request has many
> > > > SG entries, that eats up the available TRBs. So, even though the UVC may
> > > > queue many requests, not all of them are prepared immediately. If the
> > > > TRB ring is full, the driver needs to wait for more TRBs to free up
> > > > before preparing more. From the log, I see that it's sending 48KB. Let's
> > > > say the uvc splits it into PAGE_SIZE of 4KB, that would take at least 12
> > > > TRBs per request. (Side thought: I'm not sure why UVC needs SG in the
> > > > first place with its current implementation)
> > >
> > > On our platform I am seeing 2 items in the sg list being sent out from the uvc
> > > driver. The 1st item in the list is a 2 byte uvc header and the 2nd item is
> > > the 48KBs of data. To me this seems inefficient but sort of makes sense why it
> > > was done, likely to avoid a memory copy just for the 2 byte header. But I
> >
> > That doesn't make sense to me, but I'm sure there's a real reason. Just
> > curious, that's all. :)
> >
> > > share your concern here, it's possible that other users won't be so lucky and
> > > will wind up having a lot of page sized items in the sg list.
> > >
> > > We are also hoping to make the use of sg configurable with the change:
> > > https://urldefense.com/v3/__https://patchwork.kernel.org/project/linux-usb/patch/[email protected]/__;!!A4F2R9G_pg!eBu4j9m9C_XJHcuTXmYqo4CAe8bcQ0ZC3UWT3NJUZYYG6S-VJpriVwd2Q5NmAOFnN2PLgTauFDZ-fAiSl95Q$
> > > https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/[email protected]/__;!!A4F2R9G_pg!eBu4j9m9C_XJHcuTXmYqo4CAe8bcQ0ZC3UWT3NJUZYYG6S-VJpriVwd2Q5NmAOFnN2PLgTauFDZ-fDoaNx6Q$
> > >
> >
> > Ok.
> >
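
The TRB budget from the PS above can be worked through with a small
calculation. The helpers below are a hypothetical illustration, assuming one
TRB per SG entry and one link TRB in the ring as stated in the PS; they are
not dwc3 functions.

```c
#include <stdint.h>

/* TRBs needed for one request split into SG entries of sg_entry_bytes. */
static uint32_t trbs_per_request(uint32_t req_bytes, uint32_t sg_entry_bytes)
{
	if (sg_entry_bytes == 0)
		return 0;

	return (req_bytes + sg_entry_bytes - 1) / sg_entry_bytes;
}

/* Whole requests that fit in the ring once the link TRBs are set aside. */
static uint32_t requests_that_fit(uint32_t ring_trbs, uint32_t link_trbs,
				  uint32_t trbs_per_req)
{
	if (trbs_per_req == 0 || ring_trbs <= link_trbs)
		return 0;

	return (ring_trbs - link_trbs) / trbs_per_req;
}
```

A 48KB request split into 4KB entries needs trbs_per_request(49152, 4096) = 12
TRBs, so requests_that_fit(256, 1, 12) = 21 requests can be prepared at once.
With the two-entry SG list (header plus payload) seen on Jeff's platform, the
same ring holds requests_that_fit(256, 1, 2) = 127 requests.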

Thanks,
Thinh

2022-10-21 01:04:59

by Thinh Nguyen

[permalink] [raw]
Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

On Thu, Oct 20, 2022, Thinh Nguyen wrote:
> On Thu, Oct 20, 2022, Jeff Vanhoof wrote:
> > Hi Thinh,
> >
> > On Wed, Oct 19, 2022 at 11:06:08PM +0000, Thinh Nguyen wrote:
> > > Hi,
> > >
> > > On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> > > > Hi Thinh,
> > > > On Wed, Oct 19, 2022 at 07:08:27PM +0000, Thinh Nguyen wrote:
> > > > > On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> > >
> > > <snip>
> > >
> > > > > >
> > > > > > From what I can gather from the log, with the current changes it seems that
> > > > > > after a missed isoc event few requests are staying longer than expected in the
> > > > > > started_list (not getting reclaimed) and this is preventing the transmission
> > > > > > from stopping/starting again, and opening the door for continuous stream of
> > > > > > missed isoc events that cause what appears to the user as a frozen video.
> > > > > >
> > > > > > So one thought, if IOC bit is not set every frame, but IMI bit is, when a
> > > > > > missed isoc related interrupt occurs it seems likely that more than one trb
> > > > > > request will need to be reclaimed, but the current set of changes is not
> > > > > > handling this.
> > > > > >
> > > > > > In the good transfer case this issue seems to be taken care of since the IOC
> > > > > > > bit is not set every frame and the reclamation will loop through every item in
> > > > > > the started_list and only stop if there are no additional trbs or if one has
> > > > >
> > > > > > It should stop at the request that is associated with the interrupt event,
> > > > > whether it's because of IMI or IOC.
> > > >
> > > > > In this case I was concerned that if multiple queued reqs did not have IOC bit
> > > > set, but there was a missed isoc on one of the last reqs, whether or not we would
> > > > reclaim all of the requests up to the missed isoc related req. I'm not sure if
> > > > my concern is valid or not.
> > > >
> > >
> > > There should be no problem. If there's an interrupt event indicating a
> > > TRB completion, the driver will give back all the requests up to the
> > > request associated with the interrupt event, and the controller will
> > > continue processing the remaining TRBs. On the next TRB completion
> > > event, the driver will again give back all the requests up to the
> > > request associated with that event.
> > >
> >
> > I was testing with the following patch you suggested:
> >
> > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > index 61fba2b7389b..8352f4b5dd9f 100644
> > > --- a/drivers/usb/dwc3/gadget.c
> > > +++ b/drivers/usb/dwc3/gadget.c
> > > @@ -3657,6 +3657,10 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > return 1;
> > >
> > > + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > + (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain)
> > > + return 1;
> > > +
> > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > return 1;
> > >
> >
> > At this time the IMI bit was set for every frame. With these changes it
> > appeared in case of missed isoc that sometimes not all requests would be
> > reclaimed (enqueued != dequeued even 100ms after the last interrupt was
> > handled). If the 1st req in the started_list was fine (IMI set, but not IOC),
> > and a later req was the one actually missed, because of this status check the
> > > reclamation could stop early and not clean up to the appropriate req. As
>
> Oops. You're right.
>
> > suggested yesterday, I also tried only setting the IMI bit when no_interrupt is
> > not set, however I was still seeing the complete freezes. After analyzing this
> > issue a bit, I have updated the diff to look more like this:
> >
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index dfaf9ac24c4f..bb800a81815b 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -1230,8 +1230,9 @@ static void __dwc3_prepare_one_trb(struct dwc3_ep *dep, struct dwc3_trb *trb,
> > trb->ctrl = DWC3_TRBCTL_ISOCHRONOUS;
> > }
> >
> > - /* always enable Interrupt on Missed ISOC */
> > - trb->ctrl |= DWC3_TRB_CTRL_ISP_IMI;
> > + /* enable Interrupt on Missed ISOC */
> > + if ((!no_interrupt && !chain) || must_interrupt)
> > + trb->ctrl |= DWC3_TRB_CTRL_ISP_IMI;
> > break;
>
> Either all or none of the TRBs of a request is set with IMI, and not
> some.
>
> >
> > case USB_ENDPOINT_XFER_BULK:
> > @@ -3195,6 +3196,11 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > return 1;
> >
> > + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > + (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain
> > + && (trb->ctrl & DWC3_TRB_CTRL_ISP_IMI))
> > + return 1;
> > +
> > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > return 1;
> >
> > Where the trb must have the IMI set before returning early. This seemed to make
> > the freezes recoverable.
>
> Can you try this revised change:
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 61fba2b7389b..a69d8c28d86b 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -3654,7 +3654,7 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> if ((trb->ctrl & DWC3_TRB_CTRL_HWO) && status != -ESHUTDOWN)
> return 1;
>
> - if (event->status & DEPEVT_STATUS_SHORT && !chain)

I accidentally deleted a couple of lines here.

> + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
> return 1;
>
> if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||

I meant to do this:

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 61fba2b7389b..cb65371572ee 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -3657,6 +3657,9 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
if (event->status & DEPEVT_STATUS_SHORT && !chain)
return 1;

+ if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
+ return 1;
+
if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
(trb->ctrl & DWC3_TRB_CTRL_LST))
return 1;
@@ -3673,6 +3676,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
struct scatterlist *s;
unsigned int num_queued = req->num_queued_sgs;
unsigned int i;
+ bool missed_isoc = false;
int ret = 0;

for_each_sg(sg, s, num_queued, i) {
@@ -3681,12 +3685,18 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
req->sg = sg_next(s);
req->num_queued_sgs--;

+ if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC)
+ missed_isoc = true;
+
ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
trb, event, status, true);
if (ret)
break;
}

+ if (missed_isoc)
+ ret = 1;
+
return ret;
}


BR,
Thinh

2022-10-21 09:49:13

by Jeff Vanhoof

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

Hi Thinh,

On Fri, Oct 21, 2022 at 12:55:51AM +0000, Thinh Nguyen wrote:
> On Thu, Oct 20, 2022, Thinh Nguyen wrote:
> > On Thu, Oct 20, 2022, Jeff Vanhoof wrote:
> > > Hi Thinh,
> > >
> > > On Wed, Oct 19, 2022 at 11:06:08PM +0000, Thinh Nguyen wrote:
> > > > Hi,
> > > >
> > > > On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> > > > > Hi Thinh,
> > > > > On Wed, Oct 19, 2022 at 07:08:27PM +0000, Thinh Nguyen wrote:
> > > > > > On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> > > >
> > > > <snip>
> > > >
> > > > > > >
> > > > > > > From what I can gather from the log, with the current changes it seems that
> > > > > > > after a missed isoc event few requests are staying longer than expected in the
> > > > > > > started_list (not getting reclaimed) and this is preventing the transmission
> > > > > > > from stopping/starting again, and opening the door for continuous stream of
> > > > > > > missed isoc events that cause what appears to the user as a frozen video.
> > > > > > >
> > > > > > > So one thought, if IOC bit is not set every frame, but IMI bit is, when a
> > > > > > > missed isoc related interrupt occurs it seems likely that more than one trb
> > > > > > > request will need to be reclaimed, but the current set of changes is not
> > > > > > > handling this.
> > > > > > >
> > > > > > > In the good transfer case this issue seems to be taken care of since the IOC
> > > > > > > > bit is not set every frame and the reclamation will loop through every item in
> > > > > > > the started_list and only stop if there are no additional trbs or if one has
> > > > > >
> > > > > > > It should stop at the request that is associated with the interrupt event,
> > > > > > whether it's because of IMI or IOC.
> > > > >
> > > > > > In this case I was concerned that if multiple queued reqs did not have IOC bit
> > > > > set, but there was a missed isoc on one of the last reqs, whether or not we would
> > > > > reclaim all of the requests up to the missed isoc related req. I'm not sure if
> > > > > my concern is valid or not.
> > > > >
> > > >
> > > > There should be no problem. If there's an interrupt event indicating a
> > > > TRB completion, the driver will give back all the requests up to the
> > > > request associated with the interrupt event, and the controller will
> > > > continue processing the remaining TRBs. On the next TRB completion
> > > > event, the driver will again give back all the requests up to the
> > > > request associated with that event.
> > > >
> > >
> > > I was testing with the following patch you suggested:
> > >
> > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > index 61fba2b7389b..8352f4b5dd9f 100644
> > > > --- a/drivers/usb/dwc3/gadget.c
> > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > @@ -3657,6 +3657,10 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > > return 1;
> > > >
> > > > + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > > + (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain)
> > > > + return 1;
> > > > +
> > > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > > return 1;
> > > >
> > >
> > > At this time the IMI bit was set for every frame. With these changes it
> > > appeared in case of missed isoc that sometimes not all requests would be
> > > reclaimed (enqueued != dequeued even 100ms after the last interrupt was
> > > handled). If the 1st req in the started_list was fine (IMI set, but not IOC),
> > > and a later req was the one actually missed, because of this status check the
> > > > reclamation could stop early and not clean up to the appropriate req. As
> >
> > Oops. You're right.
> >
> > > suggested yesterday, I also tried only setting the IMI bit when no_interrupt is
> > > not set, however I was still seeing the complete freezes. After analyzing this
> > > issue a bit, I have updated the diff to look more like this:
> > >
> > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > index dfaf9ac24c4f..bb800a81815b 100644
> > > --- a/drivers/usb/dwc3/gadget.c
> > > +++ b/drivers/usb/dwc3/gadget.c
> > > @@ -1230,8 +1230,9 @@ static void __dwc3_prepare_one_trb(struct dwc3_ep *dep, struct dwc3_trb *trb,
> > > trb->ctrl = DWC3_TRBCTL_ISOCHRONOUS;
> > > }
> > >
> > > - /* always enable Interrupt on Missed ISOC */
> > > - trb->ctrl |= DWC3_TRB_CTRL_ISP_IMI;
> > > + /* enable Interrupt on Missed ISOC */
> > > + if ((!no_interrupt && !chain) || must_interrupt)
> > > + trb->ctrl |= DWC3_TRB_CTRL_ISP_IMI;
> > > break;
> >
> > Either all or none of the TRBs of a request is set with IMI, and not
> > some.
> >
> > >
> > > case USB_ENDPOINT_XFER_BULK:
> > > @@ -3195,6 +3196,11 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > return 1;
> > >
> > > + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > + (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain
> > > + && (trb->ctrl & DWC3_TRB_CTRL_ISP_IMI))
> > > + return 1;
> > > +
> > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > return 1;
> > >
> > > Where the trb must have the IMI set before returning early. This seemed to make
> > > the freezes recoverable.
> >
> > Can you try this revised change:
> >
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index 61fba2b7389b..a69d8c28d86b 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -3654,7 +3654,7 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > if ((trb->ctrl & DWC3_TRB_CTRL_HWO) && status != -ESHUTDOWN)
> > return 1;
> >
> > - if (event->status & DEPEVT_STATUS_SHORT && !chain)
>
> I accidentally deleted a couple of lines here.
>
> > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
> > return 1;
> >
> > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
>
> I meant to do this:
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 61fba2b7389b..cb65371572ee 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -3657,6 +3657,9 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> if (event->status & DEPEVT_STATUS_SHORT && !chain)
> return 1;
>
> + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
> + return 1;
> +
> if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> (trb->ctrl & DWC3_TRB_CTRL_LST))
> return 1;
> @@ -3673,6 +3676,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> struct scatterlist *s;
> unsigned int num_queued = req->num_queued_sgs;
> unsigned int i;
> + bool missed_isoc = false;
> int ret = 0;
>
> for_each_sg(sg, s, num_queued, i) {
> @@ -3681,12 +3685,18 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> req->sg = sg_next(s);
> req->num_queued_sgs--;
>
> + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC)
> + missed_isoc = true;
> +
> ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
> trb, event, status, true);
> if (ret)
> break;
> }
>
> + if (missed_isoc)
> + ret = 1;
> +
> return ret;
> }
>
>
> BR,
> Thinh

I tried out the patch you provided, shown below as applied to my tree, and I
did not see any IOMMU-related crashes with these changes:

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index dfaf9ac24c4f..50287437d6de 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -3195,6 +3195,9 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
if (event->status & DEPEVT_STATUS_SHORT && !chain)
return 1;

+ if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
+ return 1;
+
if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
(trb->ctrl & DWC3_TRB_CTRL_LST))
return 1;
@@ -3211,6 +3214,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
struct scatterlist *s;
unsigned int num_queued = req->num_queued_sgs;
unsigned int i;
+ bool missed_isoc = false;
int ret = 0;

for_each_sg(sg, s, num_queued, i) {
@@ -3219,12 +3223,18 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
req->sg = sg_next(s);
req->num_queued_sgs--;

+ if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC)
+ missed_isoc = true;
+
ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
trb, event, status, true);
if (ret)
break;
}

+ if (missed_isoc)
+ ret = 1;
+
return ret;
}


As we discussed earlier, when uvc's complete function is called with -EXDEV in
the request's status, the uvc driver begins to cancel its queue. With the
current skip-interrupt implementation in the uvc driver, if this happens while
the uvc driver is pumping the current frame, there is no guarantee that the
last request(s) will have had 'no_interrupt=0'. If the last requests passed to
dwc3 had 'no_interrupt=1', they end up at the tail of the started_list. Since
the IOC bit is not set on them, and if no missed isoc event occurs on them, the
dwc3 driver is never interrupted: those requests stay in the started_list and
dwc3 does not perform an 'End Transfer' as expected. Once the uvc driver begins
to pump the requests for the next frame, this will most likely result in
additional missed isoc events, which the user sees as an extended video freeze.
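
The scenario above can be modelled with a small standalone C sketch (a toy
model, not dwc3 code). The struct and function names are hypothetical. The only
rule it assumes is the one stated earlier in this thread: on an event, the
driver gives back all requests up to and including the one associated with the
event (IOC set, or missed isoc status) and then stops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a request sitting in the started_list. */
struct toy_req {
	bool ioc;    /* no_interrupt == 0: completion raises an interrupt */
	bool missed; /* hardware reported a missed isoc for this request */
};

/*
 * Count the requests that are never given back once the hardware has
 * processed all of them: everything queued after the last request that
 * can raise an event (IOC or missed isoc).
 */
static inline size_t toy_stuck_requests(const struct toy_req *reqs, size_t n)
{
	size_t reclaimed = 0;
	size_t i;

	assert(reqs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (reqs[i].ioc || reqs[i].missed)
			reclaimed = i + 1; /* event: give back reqs[0..i] */
	}

	return n - reclaimed;
}
```

With four queued requests where only the second one is missed and none has
IOC set, the model leaves the last two requests stuck. Setting IOC on the final
request brings the count back to zero, which is the property I am arguing the
uvc driver should guarantee.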

I hope that the uvc driver maintainers can chime in here to help determine the
correct path forward. With the skip-interrupt implementation, the uvc driver should
guarantee that the last request sent to dwc3 has 'no_interrupt=0'; otherwise,
if a missed isoc error occurs, it becomes very likely that the next set of
frames will be dropped/cancelled because the dwc3 driver could not perform a timely
'End Transfer'.

For testing, I implemented the following changes to see what I could do about this
issue. Note that I am on an older implementation that is missing much of the
sg-related code. The idea is that if the queue is empty and req_int_count is
non-zero, then the last request likely had 'no_interrupt=1' set. In that case
we send a dummy request to dwc3 with 'no_interrupt=0' set to make sure that no
requests get stuck in its started_list.

diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index a00786bc6e60..cb82af6bf453 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -253,6 +253,70 @@ uvc_video_alloc_requests(struct uvc_video *video)
* Video streaming
*/

+/*
+ * uvcg_video_check_pump_fake - Pumps fake/empty video data into a USB request
+ *
+ * This function fills an available USB request (listed in req_free) with
+ * fake/empty video data to ensure that the last request had no_interrupt == 0
+ */
+static void uvcg_video_check_pump_fake(struct work_struct *work)
+{
+ struct uvc_video *video = container_of(work, struct uvc_video, pump);
+ struct uvc_video_queue *queue = &video->queue;
+ struct usb_request *req = NULL;
+ unsigned long flags;
+ uint8_t *data = NULL;
+ int ret;
+
+ if (video->ep->enabled && list_empty(&queue->irqqueue) &&
+ video->req_int_count != 0) {
+ /* Retrieve the first available USB request, protected by the
+ * request lock.
+ */
+ spin_lock_irqsave(&video->req_lock, flags);
+ if (list_empty(&video->req_free)) {
+ spin_unlock_irqrestore(&video->req_lock, flags);
+ return;
+ }
+ req = list_first_entry(&video->req_free, struct usb_request,
+ list);
+ list_del(&req->list);
+ spin_unlock_irqrestore(&video->req_lock, flags);
+
+ /* Set up the fake/empty uvc request */
+ data = req->buf;
+ data[0] = 2;
+ data[1] = UVC_STREAM_EOH | video->fid | UVC_STREAM_EOF;
+ req->length = 2;
+ req->no_interrupt = 0;
+
+ /* Queue the USB request */
+ spin_lock_irqsave(&queue->irqlock, flags);
+ ret = uvcg_video_ep_queue(video, req);
+ spin_unlock_irqrestore(&queue->irqlock, flags);
+
+ if (ret < 0) {
+ uvcg_queue_cancel(queue, 0);
+ } else {
+ /* Endpoint now owns the request */
+ req = NULL;
+
+ /* Reset req_int_count to force an interrupt on the next request
+ * and to avoid going through uvcg_video_check_pump_fake again.
+ */
+ video->req_int_count = 0;
+ }
+ }
+
+ if (!req)
+ return;
+
+ spin_lock_irqsave(&video->req_lock, flags);
+ list_add_tail(&req->list, &video->req_free);
+ spin_unlock_irqrestore(&video->req_lock, flags);
+ return;
+}
+
/*
* uvcg_video_pump - Pump video data into the USB requests
*
@@ -268,6 +332,8 @@ static void uvcg_video_pump(struct work_struct *work)
unsigned long flags;
int ret;

+ uvcg_video_check_pump_fake(work);
+
while (video->ep->enabled) {
/* Retrieve the first available USB request, protected by the
* request lock.
@@ -318,7 +384,8 @@ static void uvcg_video_pump(struct work_struct *work)
/* Endpoint now owns the request */
req = NULL;
}
- video->req_int_count++;
+ if (buf->state != UVC_BUF_STATE_DONE)
+ video->req_int_count++;
}

if (!req)


Alternatively, we could simply not cancel the queue upon receiving -EXDEV,
which could also solve the problem, but I don't think that is a great idea,
especially if things start falling behind.

I hope that someone more fluent in this area of code can take a crack at
improving/fixing this issue.

The changes above do seem to help dwc3 end its transfers in a timely manner,
but mainly in cases where some requests are missed but the ones immediately
following are not (I'm talking within a couple of hundred microseconds). Most
of the time, if missed isocs occur for a frame, the remaining reqs in the
started_list will likely also error out, the list will be emptied, and dwc3
will still send 'End Transfer' in a timely manner. In reality this covers a
corner case that can adversely affect the quality of the video being watched.
I just wanted to be upfront with these details.

Thinh, any pointers on how we should proceed from here? It looks like your
changes are working well.

Thanks,
Jeff

2022-10-21 17:21:41

by Thinh Nguyen

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

On Fri, Oct 21, 2022, Jeff Vanhoof wrote:
> Hi Thinh,
>
> On Fri, Oct 21, 2022 at 12:55:51AM +0000, Thinh Nguyen wrote:
> > On Thu, Oct 20, 2022, Thinh Nguyen wrote:
> > > On Thu, Oct 20, 2022, Jeff Vanhoof wrote:
> > > > Hi Thinh,
> > > >
> > > > On Wed, Oct 19, 2022 at 11:06:08PM +0000, Thinh Nguyen wrote:
> > > > > Hi,
> > > > >
> > > > > On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> > > > > > Hi Thinh,
> > > > > > On Wed, Oct 19, 2022 at 07:08:27PM +0000, Thinh Nguyen wrote:
> > > > > > > On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> > > > >
> > > > > <snip>
> > > > >
> > > > > > > >
> > > > > > > > From what I can gather from the log, with the current changes it seems that
> > > > > > > > after a missed isoc event few requests are staying longer than expected in the
> > > > > > > > started_list (not getting reclaimed) and this is preventing the transmission
> > > > > > > > from stopping/starting again, and opening the door for continuous stream of
> > > > > > > > missed isoc events that cause what appears to the user as a frozen video.
> > > > > > > >
> > > > > > > > So one thought, if IOC bit is not set every frame, but IMI bit is, when a
> > > > > > > > missed isoc related interrupt occurs it seems likely that more than one trb
> > > > > > > > request will need to be reclaimed, but the current set of changes is not
> > > > > > > > handling this.
> > > > > > > >
> > > > > > > > In the good transfer case this issue seems to be taken care of since the IOC
> > > > > > > > > bit is not set every frame and the reclamation will loop through every item in
> > > > > > > > the started_list and only stop if there are no additional trbs or if one has
> > > > > > >
> > > > > > > > It should stop at the request that is associated with the interrupt event,
> > > > > > > whether it's because of IMI or IOC.
> > > > > >
> > > > > > > In this case I was concerned that if multiple queued reqs did not have IOC bit
> > > > > > set, but there was a missed isoc on one of the last reqs, whether or not we would
> > > > > > reclaim all of the requests up to the missed isoc related req. I'm not sure if
> > > > > > my concern is valid or not.
> > > > > >
> > > > >
> > > > > There should be no problem. If there's an interrupt event indicating a
> > > > > TRB completion, the driver will give back all the requests up to the
> > > > > request associated with the interrupt event, and the controller will
> > > > > continue processing the remaining TRBs. On the next TRB completion
> > > > > event, the driver will again give back all the requests up to the
> > > > > request associated with that event.
> > > > >
> > > >
> > > > I was testing with the following patch you suggested:
> > > >
> > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > > index 61fba2b7389b..8352f4b5dd9f 100644
> > > > > --- a/drivers/usb/dwc3/gadget.c
> > > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > > @@ -3657,6 +3657,10 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > > > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > > > return 1;
> > > > >
> > > > > + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > > > + (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain)
> > > > > + return 1;
> > > > > +
> > > > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > > > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > > > return 1;
> > > > >
> > > >
> > > > At this time the IMI bit was set for every frame. With these changes it
> > > > appeared in case of missed isoc that sometimes not all requests would be
> > > > reclaimed (enqueued != dequeued even 100ms after the last interrupt was
> > > > handled). If the 1st req in the started_list was fine (IMI set, but not IOC),
> > > > and a later req was the one actually missed, because of this status check the
> > > > > reclamation could stop early and not clean up to the appropriate req. As
> > >
> > > Oops. You're right.
> > >
> > > > suggested yesterday, I also tried only setting the IMI bit when no_interrupt is
> > > > not set, however I was still seeing the complete freezes. After analyzing this
> > > > issue a bit, I have updated the diff to look more like this:
> > > >
> > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > index dfaf9ac24c4f..bb800a81815b 100644
> > > > --- a/drivers/usb/dwc3/gadget.c
> > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > @@ -1230,8 +1230,9 @@ static void __dwc3_prepare_one_trb(struct dwc3_ep *dep, struct dwc3_trb *trb,
> > > > trb->ctrl = DWC3_TRBCTL_ISOCHRONOUS;
> > > > }
> > > >
> > > > - /* always enable Interrupt on Missed ISOC */
> > > > - trb->ctrl |= DWC3_TRB_CTRL_ISP_IMI;
> > > > + /* enable Interrupt on Missed ISOC */
> > > > + if ((!no_interrupt && !chain) || must_interrupt)
> > > > + trb->ctrl |= DWC3_TRB_CTRL_ISP_IMI;
> > > > break;
> > >
> > > Either all or none of the TRBs of a request is set with IMI, and not
> > > some.
> > >
> > > >
> > > > case USB_ENDPOINT_XFER_BULK:
> > > > @@ -3195,6 +3196,11 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > > return 1;
> > > >
> > > > + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > > + (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain
> > > > + && (trb->ctrl & DWC3_TRB_CTRL_ISP_IMI))
> > > > + return 1;
> > > > +
> > > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > > return 1;
> > > >
> > > > Where the trb must have the IMI set before returning early. This seemed to make
> > > > the freezes recoverable.
> > >
> > > Can you try this revised change:
> > >
> > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > index 61fba2b7389b..a69d8c28d86b 100644
> > > --- a/drivers/usb/dwc3/gadget.c
> > > +++ b/drivers/usb/dwc3/gadget.c
> > > @@ -3654,7 +3654,7 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > if ((trb->ctrl & DWC3_TRB_CTRL_HWO) && status != -ESHUTDOWN)
> > > return 1;
> > >
> > > - if (event->status & DEPEVT_STATUS_SHORT && !chain)
> >
> > I accidentally deleted a couple of lines here.
> >
> > > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
> > > return 1;
> > >
> > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> >
> > I meant to do this:
> >
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index 61fba2b7389b..cb65371572ee 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -3657,6 +3657,9 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > return 1;
> >
> > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
> > + return 1;
> > +
> > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > return 1;
> > @@ -3673,6 +3676,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> > struct scatterlist *s;
> > unsigned int num_queued = req->num_queued_sgs;
> > unsigned int i;
> > + bool missed_isoc = false;
> > int ret = 0;
> >
> > for_each_sg(sg, s, num_queued, i) {
> > @@ -3681,12 +3685,18 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> > req->sg = sg_next(s);
> > req->num_queued_sgs--;
> >
> > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC)
> > + missed_isoc = true;
> > +
> > ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
> > trb, event, status, true);
> > if (ret)
> > break;
> > }
> >
> > + if (missed_isoc)
> > + ret = 1;
> > +
> > return ret;
> > }
> >
> >
> > BR,
> > Thinh
>
> I tried out the following patch diff you provided and I did not see any iommu
> related crashes with these changes:
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index dfaf9ac24c4f..50287437d6de 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -3195,6 +3195,9 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> if (event->status & DEPEVT_STATUS_SHORT && !chain)
> return 1;
>
> + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
> + return 1;
> +
> if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> (trb->ctrl & DWC3_TRB_CTRL_LST))
> return 1;
> @@ -3211,6 +3214,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> struct scatterlist *s;
> unsigned int num_queued = req->num_queued_sgs;
> unsigned int i;
> + bool missed_isoc = false;
> int ret = 0;
>
> for_each_sg(sg, s, num_queued, i) {
> @@ -3219,12 +3223,18 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> req->sg = sg_next(s);
> req->num_queued_sgs--;
>
> + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC)
> + missed_isoc = true;
> +
> ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
> trb, event, status, true);
> if (ret)
> break;
> }
>
> + if (missed_isoc)
> + ret = 1;
> +
> return ret;
> }
>
>
> As we discussed earlier, when uvc's complete function is called, if an -EXDEV
> is returned in the request's status, the uvc driver will begin to cancel its
> queue. With the current skip interrupt implementation in the uvc driver, if
> this occurs while the uvc driver is pumping the current frame, then there is no
> guarantee that the last request(s) will have had 'no_interrupt=0'. If the last
> requests passed to dwc3 had 'no_interrupt=1', these requests would eventually
> be placed at the end of the started_list. Since the IOC bit will not be set,
> and if no missed isoc event occurs on these requests, then the dwc3 driver will
> not be interrupted, leaving those remaining requests sitting in the
> started_list, and dwc3 will not perform an 'End Transfer' as expected. Once the
> uvc driver begins to pump the requests for the next frame, then it most likely
> will result in additional missed isoc events, with the result being an extended
> video freeze seen by the user.
>
> I hope that other uvc driver maintainers can chime in here to help determine the
> correct path forward. With the skip interrupt implementation, the uvc driver should
> guarantee that the last request sent to dwc3 has 'no_interrupt=0', otherwise

Rather than guaranteeing whether no_interrupt is set or not, it's more
important that UVC maintains a constant queue of requests to the controller
driver. Isoc transfers are meant to be sent at the constant rate for which the
endpoint is configured.

I recall Dan mentioning that the UVC gadget driver can queue up to 64
requests, with no_interrupt=1 set on up to 15 requests in a row. But I keep
seeing that the gadget driver only "pumps" 16 requests and doesn't continue
until they are completed. We can almost guarantee that it's going to
underrun. Can UVC "pump" multiple times at once?

> if a missed isoc error occurs, it becomes very likely that the next immediate set of
> frames could be dropped/cancelled because the dwc3 driver could not perform a timely
> 'End Transfer'.
>
> For testing I implemented the following changes to see what I could do for this
> issue. Note that I am on an older implementation and it's missing a lot of the

Please use the latest kernel; there are a lot of fixes and improvements to
dwc3 in every kernel version.

> sg related implementation. The idea here is that if the queue is empty, and that
> req_int_count is non-zero then the last request likely had 'no_interrupt=1' set.
> And if this is the case then we will want to send some dummy request to dwc3 with
> 'no_interrupt=0' set to make sure that no requests get stuck in its started_list.

This is inefficient and unnecessary.

<snip>

>
>
> Alternatively we may just not want to cancel the queue upon receiving -EXDEV
> and this could solve the problem too, but I don't think that it's such a great
> idea, especially if things start falling behind.
>
> I hope that someone more fluent in this area of code can take a crack at
> improving/fixing this issue.
>
> The changes above do seem to help dwc3 timely end its transfers, but mainly for
> cases where some requests are missed but the next immediate ones are not (i'm
> talking within a couple of hundred microseconds). Most of the time if missed
> isocs occurs for a frame that the remaining reqs in the started_list will
> likely also error out and the list will be emptied and dwc3 will still timely
> send 'End Transfer'. In reality this is to cover a corner case that can
> adversely affect the quality of the video being watched. Just wanted to be
> upfront with these details.
>
> Thinh, any pointers on how we should proceed from here? It looks like your
> changes are working well.
>

You can add the underrun detection check to dwc3 whenever it receives a
new request.

i.e. when a new request comes in, check whether the last prepared TRB's HWO
bit is cleared; if it is and the endpoint is started, send the End Transfer
command to reschedule the isoc transfers for the incoming requests.

This is probably the simpler workaround for the UVC underrun issue.
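
As a rough illustration of that check, here is a standalone sketch. It is not
dwc3 code: the struct layouts, the sketch_ names and the flag value are
assumptions made up for this example, and the real driver would read this
state from its own endpoint and TRB structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; these are not the dwc3 types. */
#define SKETCH_TRB_CTRL_HWO (1u << 0)	/* "hardware owns this TRB" flag */

struct sketch_trb {
	uint32_t ctrl;
};

struct sketch_ep {
	bool transfer_started;			/* a transfer is currently active */
	const struct sketch_trb *last_prepared;	/* NULL if nothing was prepared */
};

/*
 * Called when a new isoc request arrives. Returns true when the endpoint
 * looks underrun: the transfer is started, yet the controller has already
 * consumed the last TRB that was prepared (its HWO bit is clear). In that
 * case the caller would issue End Transfer and reschedule.
 */
static bool sketch_isoc_underrun(const struct sketch_ep *ep)
{
	assert(ep != NULL);

	if (!ep->transfer_started || ep->last_prepared == NULL)
		return false;

	return (ep->last_prepared->ctrl & SKETCH_TRB_CTRL_HWO) == 0;
}
```

The sketch only covers the decision itself; issuing the command and
restarting the transfer are left out.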

BR,
Thinh

2022-10-21 19:14:28

by Jeff Vanhoof

[permalink] [raw]
Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

Hi Thinh,

On Fri, Oct 21, 2022 at 04:43:52PM +0000, Thinh Nguyen wrote:
> On Fri, Oct 21, 2022, Jeff Vanhoof wrote:
> > Hi Thinh,
> >
> > On Fri, Oct 21, 2022 at 12:55:51AM +0000, Thinh Nguyen wrote:
> > > On Thu, Oct 20, 2022, Thinh Nguyen wrote:
> > > > On Thu, Oct 20, 2022, Jeff Vanhoof wrote:
> > > > > Hi Thinh,
> > > > >
> > > > > On Wed, Oct 19, 2022 at 11:06:08PM +0000, Thinh Nguyen wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> > > > > > > Hi Thinh,
> > > > > > > On Wed, Oct 19, 2022 at 07:08:27PM +0000, Thinh Nguyen wrote:
> > > > > > > > On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> > > > > >
> > > > > > <snip>
> > > > > >
> > > > > > > > >
> > > > > > > > > From what I can gather from the log, with the current changes it seems that
> > > > > > > > > after a missed isoc event few requests are staying longer than expected in the
> > > > > > > > > started_list (not getting reclaimed) and this is preventing the transmission
> > > > > > > > > from stopping/starting again, and opening the door for continuous stream of
> > > > > > > > > missed isoc events that cause what appears to the user as a frozen video.
> > > > > > > > >
> > > > > > > > > So one thought, if IOC bit is not set every frame, but IMI bit is, when a
> > > > > > > > > missed isoc related interrupt occurs it seems likely that more than one trb
> > > > > > > > > request will need to be reclaimed, but the current set of changes is not
> > > > > > > > > handling this.
> > > > > > > > >
> > > > > > > > > In the good transfer case this issue seems to be taken care of since the IOC
> > > > > > > > > > bit is not set every frame and the reclamation will loop through every item in
> > > > > > > > > the started_list and only stop if there are no additional trbs or if one has
> > > > > > > >
> > > > > > > > It should stop at the request that associated with the interrupt event,
> > > > > > > > whether it's because of IMI or IOC.
> > > > > > >
> > > > > > > > In this case I was concerned that if multiple queued reqs did not have IOC bit
> > > > > > > set, but there was a missed isoc on one of the last reqs, whether or not we would
> > > > > > > reclaim all of the requests up to the missed isoc related req. I'm not sure if
> > > > > > > my concern is valid or not.
> > > > > > >
> > > > > >
> > > > > > There should be no problem. If there's an interrupt event indicating a
> > > > > > TRB completion, the driver will give back all the requests up to the
> > > > > > request associated with the interrupt event, and the controller will
> > > > > > continue processing the remaining TRBs. On the next TRB completion
> > > > > > event, the driver will again give back all the requests up to the
> > > > > > request associated with that event.
> > > > > >
> > > > >
> > > > > I was testing with the following patch you suggested:
> > > > >
> > > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > > > index 61fba2b7389b..8352f4b5dd9f 100644
> > > > > > --- a/drivers/usb/dwc3/gadget.c
> > > > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > > > @@ -3657,6 +3657,10 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > > > > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > > > > return 1;
> > > > > >
> > > > > > + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > > > > + (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain)
> > > > > > + return 1;
> > > > > > +
> > > > > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > > > > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > > > > return 1;
> > > > > >
> > > > >
> > > > > At this time the IMI bit was set for every frame. With these changes it
> > > > > appeared in case of missed isoc that sometimes not all requests would be
> > > > > reclaimed (enqueued != dequeued even 100ms after the last interrupt was
> > > > > handled). If the 1st req in the started_list was fine (IMI set, but not IOC),
> > > > > and a later req was the one actually missed, because of this status check the
> > > > > > reclamation could stop early and not clean up to the appropriate req. As
> > > >
> > > > Oops. You're right.
> > > >
> > > > > suggested yesterday, I also tried only setting the IMI bit when no_interrupt is
> > > > > not set, however I was still seeing the complete freezes. After analyzing this
> > > > > issue a bit, I have updated the diff to look more like this:
> > > > >
> > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > > index dfaf9ac24c4f..bb800a81815b 100644
> > > > > --- a/drivers/usb/dwc3/gadget.c
> > > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > > @@ -1230,8 +1230,9 @@ static void __dwc3_prepare_one_trb(struct dwc3_ep *dep, struct dwc3_trb *trb,
> > > > > trb->ctrl = DWC3_TRBCTL_ISOCHRONOUS;
> > > > > }
> > > > >
> > > > > - /* always enable Interrupt on Missed ISOC */
> > > > > - trb->ctrl |= DWC3_TRB_CTRL_ISP_IMI;
> > > > > + /* enable Interrupt on Missed ISOC */
> > > > > + if ((!no_interrupt && !chain) || must_interrupt)
> > > > > + trb->ctrl |= DWC3_TRB_CTRL_ISP_IMI;
> > > > > break;
> > > >
> > > > Either all or none of the TRBs of a request is set with IMI, and not
> > > > some.
> > > >
> > > > >
> > > > > case USB_ENDPOINT_XFER_BULK:
> > > > > @@ -3195,6 +3196,11 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > > > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > > > return 1;
> > > > >
> > > > > + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > > > + (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain
> > > > > + && (trb->ctrl & DWC3_TRB_CTRL_ISP_IMI))
> > > > > + return 1;
> > > > > +
> > > > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > > > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > > > return 1;
> > > > >
> > > > > Where the trb must have the IMI set before returning early. This seemed to make
> > > > > the freezes recoverable.
> > > >
> > > > Can you try this revised change:
> > > >
> > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > index 61fba2b7389b..a69d8c28d86b 100644
> > > > --- a/drivers/usb/dwc3/gadget.c
> > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > @@ -3654,7 +3654,7 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > > if ((trb->ctrl & DWC3_TRB_CTRL_HWO) && status != -ESHUTDOWN)
> > > > return 1;
> > > >
> > > > - if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > >
> > > I accidentally deleted a couple of lines here.
> > >
> > > > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
> > > > return 1;
> > > >
> > > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > >
> > > I meant to do this:
> > >
> > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > index 61fba2b7389b..cb65371572ee 100644
> > > --- a/drivers/usb/dwc3/gadget.c
> > > +++ b/drivers/usb/dwc3/gadget.c
> > > @@ -3657,6 +3657,9 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > return 1;
> > >
> > > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
> > > + return 1;
> > > +
> > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > return 1;
> > > @@ -3673,6 +3676,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> > > struct scatterlist *s;
> > > unsigned int num_queued = req->num_queued_sgs;
> > > unsigned int i;
> > > + bool missed_isoc = false;
> > > int ret = 0;
> > >
> > > for_each_sg(sg, s, num_queued, i) {
> > > @@ -3681,12 +3685,18 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> > > req->sg = sg_next(s);
> > > req->num_queued_sgs--;
> > >
> > > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC)
> > > + missed_isoc = true;
> > > +
> > > ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
> > > trb, event, status, true);
> > > if (ret)
> > > break;
> > > }
> > >
> > > + if (missed_isoc)
> > > + ret = 1;
> > > +
> > > return ret;
> > > }
> > >
> > >
> > > BR,
> > > Thinh
> >
> > I tried out the following patch diff you provided and I did not see any iommu
> > related crashes with these changes:
> >
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index dfaf9ac24c4f..50287437d6de 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -3195,6 +3195,9 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > return 1;
> >
> > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
> > + return 1;
> > +
> > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > return 1;
> > @@ -3211,6 +3214,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> > struct scatterlist *s;
> > unsigned int num_queued = req->num_queued_sgs;
> > unsigned int i;
> > + bool missed_isoc = false;
> > int ret = 0;
> >
> > for_each_sg(sg, s, num_queued, i) {
> > @@ -3219,12 +3223,18 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> > req->sg = sg_next(s);
> > req->num_queued_sgs--;
> >
> > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC)
> > + missed_isoc = true;
> > +
> > ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
> > trb, event, status, true);
> > if (ret)
> > break;
> > }
> >
> > + if (missed_isoc)
> > + ret = 1;
> > +
> > return ret;
> > }
> >
> >
> > As we discussed earlier, when uvc's complete function is called, if an -EXDEV
> > is returned in the request's status, the uvc driver will begin to cancel its
> > queue. With the current skip interrupt implementation in the uvc driver, if
> > this occurs while the uvc driver is pumping the current frame, then there is no
> > guarantee that the last request(s) will have had 'no_interrupt=0'. If the last
> > requests passed to dwc3 had 'no_interrupt=1', these requests would eventually
> > be placed at the end of the started_list. Since the IOC bit will not be set,
> > and if no missed isoc event occurs on these requests, then the dwc3 driver will
> > not be interrupted, leaving those remaining requests sitting in the
> > started_list, and dwc3 will not perform an 'End Transfer' as expected. Once the
> > uvc driver begins to pump the requests for the next frame, then it most likely
> > will result in additional missed isoc events, with the result being an extended
> > video freeze seen by the user.
> >
> > I hope that other uvc driver maintainers can chime in here to help determine the
> > correct path forward. With the skip interrupt implementation, the uvc driver should
> > guarantee that the last request sent to dwc3 has 'no_interrupt=0', otherwise
>
> Rather than guaranteeing no_interrupt or not, it's more important that
> the UVC maintains a constant queue of requests to the controller driver.
> Isoc transfers are meant to be sent at a constant rate which the
> endpoint is configured.
>

I agree with you on this, but it will probably always be a race with uvc
queuing up one req at a time and dwc3 starting to transmit almost immediately.
We can configure the streaming_interval on a product to kind of slow down or
delay the usb transfers, but between dwc3 and uvc driver it would be nice to
have an interface that would allow pre-queuing a certain number of reqs before
the transfer is actually started. If that is not possible, then uvc could
instead prepare a number of reqs ahead of time and attempt to queue them each
as fast as possible in a very tight loop.

> I recalled Dan mentioned that UVC gadget driver can queue up to 64
> requests with no_interrupt=1 up to 15 requests. But I keep seeing that
> the gadget driver only "pumps" 16 requests and doesn't continue until
> they are completed. We can almost guarantee that it's going to be
> underrun. Can UVC "pumps" multiple times at once?
>

uvc will usually pump when a new frame comes in or when a req's complete
callback gets called. uvc should fill up front all the reqs required to
transfer a frame (up to 64), but once the available reqs are filled or the
frame is completely queued up, it takes a new incoming frame or a kick via the
complete call for it to attempt to fill any remaining reqs in the queue again.
To me this looks ok to do, but for heavy transfers we have a somewhat smaller
queue as a buffer: 48 reqs instead of 64 (64 - 16).

Note that for the very last request of a frame/buffer, the uvc driver does
set no_interrupt to 0.

> > if a missed isoc error occurs, it becomes very likely that the next immediate set of
> > frames could be dropped/cancelled because the dwc3 driver could not perform a timely
> > 'End Transfer'.
> >
> > For testing I implemented the following changes to see what I could do for this
> > issue. Note that I am on an older implementation and it's missing a lot of the
>
> Please use the latest kernel, there are a lot of fixes/improvement to
> dwc3 every kernel version.
>

I've been debugging the sg implementation and the skip interrupt implementation
separately, backporting what can be backported. I'm working off of a 5.10
kernel debugging various issues Dan Vacura was seeing on a 5.15 kernel on a
newer product. What we had on our 5.10 based kernel was stable for uvc/dwc3, so
we needed to understand what came in since that time that broke stability.

> > sg related implementation. The idea here is that if the queue is empty, and that
> > req_int_count is non-zero then the last request likely had 'no_interrupt=1' set.
> > And if this is the case then we will want to send some dummy request to dwc3 with
> > 'no_interrupt=0' set to make sure that no requests get stuck in its started_list.
>
> This is not efficient and unnecessary.
>

Agree, but to fix this in the uvc driver the correct way seemed a bit more
complicated at first. I was thinking that the driver would always send one last
request of the frame buffer once an error is seen, but I'm now thinking of a
simpler solution. If we can update the uvc pump to prepare a number of requests
and make sure that the last request has no_interrupt set to 0 before queuing
them all up in a tight loop to the dwc3 driver, this would effectively solve
this problem too. A bit of extra smarts in this area might also address some
of your concerns about uvc not pumping a lot of data at once (especially at
the beginning of a frame). I believe we should prepare more reqs up front
if the req queue is empty and fewer reqs if it's already busy.
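
To make the idea a bit more concrete, here is a minimal standalone sketch of
the two pieces: picking a batch size based on how busy the queue is, and
marking the batch so that only its last request interrupts. This is not uvc
driver code. The sketch_ names, the request struct and both batch sizes are
hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request type; only the flag relevant here is modelled. */
struct sketch_req {
	bool no_interrupt;
};

/* Arbitrary illustrative batch sizes, not values taken from the uvc driver. */
#define SKETCH_BATCH_IDLE 16u	/* queue empty: prepare more up front */
#define SKETCH_BATCH_BUSY 4u	/* queue already busy: prepare fewer */

/* Pick how many reqs to prepare, capped by the number of free reqs. */
static size_t sketch_batch_size(size_t in_flight, size_t free_reqs)
{
	size_t want = in_flight == 0 ? SKETCH_BATCH_IDLE : SKETCH_BATCH_BUSY;

	return want < free_reqs ? want : free_reqs;
}

/*
 * Mark a prepared batch so that every request except the last one has
 * no_interrupt=1. The last request always interrupts, so the batch can
 * never end on a request that completes silently.
 */
static void sketch_mark_batch(struct sketch_req *reqs, size_t n)
{
	size_t i;

	assert(reqs != NULL || n == 0);

	for (i = 0; i < n; i++)
		reqs[i].no_interrupt = (i + 1 < n);
}
```

After marking, the batch would be queued to the controller in a tight loop;
that part depends on the real driver and is left out here.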

I'm hoping that someone can step up to help here :). If not, then this will be
my next activity.

> <snip>
>
> >
> >
> > Alternatively we may just not want to cancel the queue upon receiving -EXDEV
> > and this could solve the problem too, but I don't think that it's such a great
> > idea, especially if things start falling behind.
> >
> > I hope that someone more fluent in this area of code can take a crack at
> > improving/fixing this issue.
> >
> > The changes above do seem to help dwc3 timely end its transfers, but mainly for
> > cases where some requests are missed but the next immediate ones are not (i'm
> > talking within a couple of hundred microseconds). Most of the time if missed
> > isocs occurs for a frame that the remaining reqs in the started_list will
> > likely also error out and the list will be emptied and dwc3 will still timely
> > send 'End Transfer'. In reality this is to cover a corner case that can
> > adversely affect the quality of the video being watched. Just wanted to be
> > upfront with these details.
> >
> > Thinh, any pointers on how we should proceed from here? It looks like your
> > changes are working well.
> >
>
> You can add the underrun detection check to dwc3 whenever it receives a
> new request.
>
> ie. When the new request comes, check if the last prepared TRB's HWO bit
> is cleared and if the endpoint is started, send End Transfer command to
> reschedule the isoc transfers for the incoming requests.
>
> This is probably the simpler workaround to the underrun issue of UVC.
>

This sounds like a good optimization too by itself. Would it be possible for
you to implement something here to help get me started? Even if it's not
perfect, I'll take what I can get. We are running up against the clock for
trying to close things out (changes must be released to mainline and backported
to 5.15 for Android).

> BR,
> Thinh

Thanks,
Jeff

2022-10-21 19:16:19

by Thinh Nguyen

[permalink] [raw]
Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

On Fri, Oct 21, 2022, Jeff Vanhoof wrote:
> Hi Thinh,
>
> On Fri, Oct 21, 2022 at 04:43:52PM +0000, Thinh Nguyen wrote:
> > On Fri, Oct 21, 2022, Jeff Vanhoof wrote:
> > > Hi Thinh,
> > >
> > > On Fri, Oct 21, 2022 at 12:55:51AM +0000, Thinh Nguyen wrote:
> > > > On Thu, Oct 20, 2022, Thinh Nguyen wrote:
> > > > > On Thu, Oct 20, 2022, Jeff Vanhoof wrote:
> > > > > > Hi Thinh,
> > > > > >
> > > > > > On Wed, Oct 19, 2022 at 11:06:08PM +0000, Thinh Nguyen wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> > > > > > > > Hi Thinh,
> > > > > > > > On Wed, Oct 19, 2022 at 07:08:27PM +0000, Thinh Nguyen wrote:
> > > > > > > > > On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> > > > > > >
> > > > > > > <snip>
> > > > > > >
> > > > > > > > > >
> > > > > > > > > > From what I can gather from the log, with the current changes it seems that
> > > > > > > > > > after a missed isoc event few requests are staying longer than expected in the
> > > > > > > > > > started_list (not getting reclaimed) and this is preventing the transmission
> > > > > > > > > > from stopping/starting again, and opening the door for continuous stream of
> > > > > > > > > > missed isoc events that cause what appears to the user as a frozen video.
> > > > > > > > > >
> > > > > > > > > > So one thought, if IOC bit is not set every frame, but IMI bit is, when a
> > > > > > > > > > missed isoc related interrupt occurs it seems likely that more than one trb
> > > > > > > > > > request will need to be reclaimed, but the current set of changes is not
> > > > > > > > > > handling this.
> > > > > > > > > >
> > > > > > > > > > In the good transfer case this issue seems to be taken care of since the IOC
> > > > > > > > > > > bit is not set every frame and the reclamation will loop through every item in
> > > > > > > > > > the started_list and only stop if there are no additional trbs or if one has
> > > > > > > > >
> > > > > > > > > It should stop at the request that associated with the interrupt event,
> > > > > > > > > whether it's because of IMI or IOC.
> > > > > > > >
> > > > > > > > > In this case I was concerned that if multiple queued reqs did not have IOC bit
> > > > > > > > set, but there was a missed isoc on one of the last reqs, whether or not we would
> > > > > > > > reclaim all of the requests up to the missed isoc related req. I'm not sure if
> > > > > > > > my concern is valid or not.
> > > > > > > >
> > > > > > >
> > > > > > > There should be no problem. If there's an interrupt event indicating a
> > > > > > > TRB completion, the driver will give back all the requests up to the
> > > > > > > request associated with the interrupt event, and the controller will
> > > > > > > continue processing the remaining TRBs. On the next TRB completion
> > > > > > > event, the driver will again give back all the requests up to the
> > > > > > > request associated with that event.
> > > > > > >
> > > > > >
> > > > > > I was testing with the following patch you suggested:
> > > > > >
> > > > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > > > > index 61fba2b7389b..8352f4b5dd9f 100644
> > > > > > > --- a/drivers/usb/dwc3/gadget.c
> > > > > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > > > > @@ -3657,6 +3657,10 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > > > > > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > > > > > return 1;
> > > > > > >
> > > > > > > + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > > > > > + (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain)
> > > > > > > + return 1;
> > > > > > > +
> > > > > > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > > > > > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > > > > > return 1;
> > > > > > >
> > > > > >
> > > > > > At this time the IMI bit was set for every frame. With these changes it
> > > > > > appeared in case of missed isoc that sometimes not all requests would be
> > > > > > reclaimed (enqueued != dequeued even 100ms after the last interrupt was
> > > > > > handled). If the 1st req in the started_list was fine (IMI set, but not IOC),
> > > > > > and a later req was the one actually missed, because of this status check the
> > > > > > > reclamation could stop early and not clean up to the appropriate req. As
> > > > >
> > > > > Oops. You're right.
> > > > >
> > > > > > suggested yesterday, I also tried only setting the IMI bit when no_interrupt is
> > > > > > not set, however I was still seeing the complete freezes. After analyzing this
> > > > > > issue a bit, I have updated the diff to look more like this:
> > > > > >
> > > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > > > index dfaf9ac24c4f..bb800a81815b 100644
> > > > > > --- a/drivers/usb/dwc3/gadget.c
> > > > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > > > @@ -1230,8 +1230,9 @@ static void __dwc3_prepare_one_trb(struct dwc3_ep *dep, struct dwc3_trb *trb,
> > > > > > trb->ctrl = DWC3_TRBCTL_ISOCHRONOUS;
> > > > > > }
> > > > > >
> > > > > > - /* always enable Interrupt on Missed ISOC */
> > > > > > - trb->ctrl |= DWC3_TRB_CTRL_ISP_IMI;
> > > > > > + /* enable Interrupt on Missed ISOC */
> > > > > > + if ((!no_interrupt && !chain) || must_interrupt)
> > > > > > + trb->ctrl |= DWC3_TRB_CTRL_ISP_IMI;
> > > > > > break;
> > > > >
> > > > > Either all or none of the TRBs of a request is set with IMI, and not
> > > > > some.
> > > > >
> > > > > >
> > > > > > case USB_ENDPOINT_XFER_BULK:
> > > > > > @@ -3195,6 +3196,11 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > > > > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > > > > return 1;
> > > > > >
> > > > > > + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > > > > + (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain
> > > > > > + && (trb->ctrl & DWC3_TRB_CTRL_ISP_IMI))
> > > > > > + return 1;
> > > > > > +
> > > > > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > > > > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > > > > return 1;
> > > > > >
> > > > > > Where the trb must have the IMI set before returning early. This seemed to make
> > > > > > the freezes recoverable.
> > > > >
> > > > > Can you try this revised change:
> > > > >
> > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > > index 61fba2b7389b..a69d8c28d86b 100644
> > > > > --- a/drivers/usb/dwc3/gadget.c
> > > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > > @@ -3654,7 +3654,7 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > > > if ((trb->ctrl & DWC3_TRB_CTRL_HWO) && status != -ESHUTDOWN)
> > > > > return 1;
> > > > >
> > > > > - if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > >
> > > > I accidentally deleted a couple of lines here.
> > > >
> > > > > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
> > > > > return 1;
> > > > >
> > > > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > >
> > > > I meant to do this:
> > > >
> > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > index 61fba2b7389b..cb65371572ee 100644
> > > > --- a/drivers/usb/dwc3/gadget.c
> > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > @@ -3657,6 +3657,9 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > > return 1;
> > > >
> > > > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
> > > > + return 1;
> > > > +
> > > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > > return 1;
> > > > @@ -3673,6 +3676,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> > > > struct scatterlist *s;
> > > > unsigned int num_queued = req->num_queued_sgs;
> > > > unsigned int i;
> > > > + bool missed_isoc = false;
> > > > int ret = 0;
> > > >
> > > > for_each_sg(sg, s, num_queued, i) {
> > > > @@ -3681,12 +3685,18 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> > > > req->sg = sg_next(s);
> > > > req->num_queued_sgs--;
> > > >
> > > > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC)
> > > > + missed_isoc = true;
> > > > +
> > > > ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
> > > > trb, event, status, true);
> > > > if (ret)
> > > > break;
> > > > }
> > > >
> > > > + if (missed_isoc)
> > > > + ret = 1;
> > > > +
> > > > return ret;
> > > > }
> > > >
> > > >
> > > > BR,
> > > > Thinh
> > >
> > > I tried out the following patch diff you provided and I did not see any iommu
> > > related crashes with these changes:
> > >
> > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > index dfaf9ac24c4f..50287437d6de 100644
> > > --- a/drivers/usb/dwc3/gadget.c
> > > +++ b/drivers/usb/dwc3/gadget.c
> > > @@ -3195,6 +3195,9 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > return 1;
> > >
> > > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
> > > + return 1;
> > > +
> > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > return 1;
> > > @@ -3211,6 +3214,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> > > struct scatterlist *s;
> > > unsigned int num_queued = req->num_queued_sgs;
> > > unsigned int i;
> > > + bool missed_isoc = false;
> > > int ret = 0;
> > >
> > > for_each_sg(sg, s, num_queued, i) {
> > > @@ -3219,12 +3223,18 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> > > req->sg = sg_next(s);
> > > req->num_queued_sgs--;
> > >
> > > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC)
> > > + missed_isoc = true;
> > > +
> > > ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
> > > trb, event, status, true);
> > > if (ret)
> > > break;
> > > }
> > >
> > > + if (missed_isoc)
> > > + ret = 1;
> > > +
> > > return ret;
> > > }
> > >
> > >
> > > As we discussed earlier, when uvc's complete function is called, if an -EXDEV
> > > is returned in the request's status, the uvc driver will begin to cancel its
> > > queue. With the current skip interrupt implementation in the uvc driver, if
> > > this occurs while the uvc driver is pumping the current frame, then there is no
> > > guarantee that the last request(s) will have had 'no_interrupt=0'. If the last
> > > requests passed to dwc3 had 'no_interrupt=1', these requests would eventually
> > > be placed at the end of the started_list. Since the IOC bit will not be set,
> > > and if no missed isoc event occurs on these requests, then the dwc3 driver will
> > > not be interrupted, leaving those remaining requests sitting in the
> > > started_list, and dwc3 will not perform an 'End Transfer' as expected. Once the
> > > uvc driver begins to pump the requests for the next frame, then it most likely
> > > will result in additional missed isoc events, with the result being an extended
> > > video freeze seen by the user.
> > >
> > > I hope that other uvc driver maintainers can chime in here to help determine the
> > > correct path forward. With the skip interrupt implementation, the uvc driver should
> > > guarantee that the last request sent to dwc3 has 'no_interrupt=0', otherwise
> >
> > Rather than guaranteeing no_interrupt or not, it's more important that
> > the UVC maintains a constant queue of requests to the controller driver.
> > Isoc transfers are meant to be sent at a constant rate which the
> > endpoint is configured.
> >
>
> I agree with you on this, but it will probably always be a race with uvc
> queuing up one req at a time and dwc3 starting to transmit almost immediately.
> We can configure the streaming_interval on a product to kind of slow down or

No, it doesn't work that way. The controller only sends the data when the
host requests it. The host will request data every 125us, so a UVC request
will complete roughly every 125us. There should be no race with uvc.

> delay the usb transfers, but between dwc3 and uvc driver it would be nice to
> have an interface that would allow pre-queuing a certain number of reqs before
> the transfer is actually started. If that is not possible, then uvc could
> instead prepare a number of reqs ahead of time and attempt to queue them each
> as fast as possible in a very tight loop.

All the UVC gadget driver has to do is keep multiple requests prepared ahead
of time and not starve the controller driver. That is, don't let the queue
reach 0. Let's say uvc can only pump 16 requests at a time: set
no_interrupt=0 on at least every 8th request, so that uvc has time to feed
more requests when it gets the notification that 8 requests have completed.
It would then have roughly 8x125us to queue more requests before an underrun.
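
A small standalone sketch of that arithmetic, for illustration only. The
sketch_ names are hypothetical and the 125us interval is simply the figure
used above. It marks every 8th request as interrupting and computes how long
the remaining queued requests last.

```c
#include <assert.h>
#include <stdbool.h>

/* One request completes per service interval; 125us as discussed above. */
#define SKETCH_SERVICE_INTERVAL_US 125u

/*
 * Decide the no_interrupt flag for the request at a zero-based index when
 * every period-th request should interrupt (index period-1, 2*period-1, ...).
 */
static bool sketch_no_interrupt(unsigned int index, unsigned int period)
{
	assert(period != 0);

	return (index + 1) % period != 0;
}

/* Time left to queue more requests while 'outstanding' are still queued. */
static unsigned int sketch_refill_budget_us(unsigned int outstanding)
{
	return outstanding * SKETCH_SERVICE_INTERVAL_US;
}
```

With 16 requests pumped and a period of 8, requests 8 and 16 interrupt. When
the first interrupt arrives, 8 requests are still queued, which leaves about
8 x 125us = 1000us to refill.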

>
> > I recalled Dan mentioned that UVC gadget driver can queue up to 64
> > requests with no_interrupt=1 up to 15 requests. But I keep seeing that
> > the gadget driver only "pumps" 16 requests and doesn't continue until
> > they are completed. We can almost guarantee that it's going to be
> > underrun. Can UVC "pumps" multiple times at once?
> >
>
> uvc will usually pump when new frames comes in or when a req's complete gets
> called. uvc should fill up front all the reqs required to transfer a frame (up
> to 64), but once the available reqs are filled or the frame completely queued
> up, it would take a new incoming frame or a kick via the complete call to have
> it attempt to fill any remaining reqs in the queue again. To me this looks
> ok to do, but for heavy transfers we have a somewhat smaller queue as a buffer,
> 48 reqs vs 64 reqs (64 - 16).
>
> To note, for the very last request of a frame/buffer (the end of it) the uvc
> driver does set the no_interrupt to 0 for this request.
>
> > > if a missed isoc error occurs, it becomes very likely that the next immediate set of
> > > frames could be dropped/cancelled because the dwc3 driver could not perform a timely
> > > 'End Transfer'.
> > >
> > > For testing I implemented the following changes to see what I could do for this
> > > issue. Note that I am on an older implementation and it's missing a lot of the
> >
> > Please use the latest kernel, there are a lot of fixes/improvement to
> > dwc3 every kernel version.
> >
>
> I've been debugging the sg implementation and the skip interrupt implementation
> separately, backporting what can be backported. I'm working off of a 5.10
> kernel debugging various issues Dan Vacura was seeing on a 5.15 kernel on a
> newer product. What we had on our 5.10 based kernel was stable for uvc/dwc3, so
> we needed to understand what came in since that time that broke stability.
>
> > > sg related implementation. The idea here is that if the queue is empty, and that
> > > req_int_count is non-zero then the last request likely had 'no_interrupt=1' set.
> > > And if this is the case then we will want to send some dummy request to dwc3 with
> > > 'no_interrupt=0' set to make sure that no requests get stuck in its started_list.
> >
> > This is not efficient and unnecessary.
> >
>
> Agree, but to fix this in the uvc driver the correct way seemed a bit more
> complicated at first. I was thinking that the driver would always send one last
> request of the frame buffer once an error is seen, but I'm now thinking of a
> simpler solution. If we can update the uvc pump to prepare a number of requests
> and make sure that the last request has no_interrupt set to 0 before queuing
> them all up in a tight loop to the dwc3 driver, this would effectively solve
> this problem too. A bit of extra smarts in this area might also address some
> of your concerns about uvc not pumping a lot of data at once (especially for
> the beginning of a frame). I believe we should prepare more reqs up front
> if the req queue is empty and fewer reqs if it's already busy.
>
> I'm hoping that someone can step up to help here :). If not, then this will be
> my next activity.
>
> > <snip>
> >
> > >
> > >
> > > Alternatively we may just not want to cancel the queue upon receiving -EXDEV
> > > and this could solve the problem too, but I don't think that it's such a great
> > > idea, especially if things start falling behind.
> > >
> > > I hope that someone more fluent in this area of code can take a crack at
> > > improving/fixing this issue.
> > >
> > > The changes above do seem to help dwc3 timely end its transfers, but mainly for
> > > cases where some requests are missed but the next immediate ones are not (I'm
> > > talking within a couple of hundred microseconds). Most of the time if missed
> > > isocs occurs for a frame that the remaining reqs in the started_list will
> > > likely also error out and the list will be emptied and dwc3 will still timely
> > > send 'End Transfer'. In reality this is to cover a corner case that can
> > > adversely affect the quality of the video being watched. Just wanted to be
> > > upfront with these details.
> > >
> > > Thinh, any pointers on how we should proceed from here? It looks like your
> > > changes are working well.
> > >
> >
> > You can add the underrun detection check to dwc3 whenever it receives a
> > new request.
> >
> > ie. When the new request comes, check if the last prepared TRB's HWO bit
> > is cleared and if the endpoint is started, send End Transfer command to
> > reschedule the isoc transfers for the incoming requests.
> >
> > This is probably the simpler workaround to the underrun issue of UVC.
> >
>
> This sounds like a good optimization too by itself. Would it be possible for
> you to implement something here to help get me started? Even if it's not
> perfect, I'll take what I can get. We are running up against the clock for
> trying to close things out (changes must be released to mainline and backported
> to 5.15 for Android).
>

At the moment, I have very limited bandwidth to implement and run tests.
However, if you or someone else provides a patch, I can help review.

It should look something like this (pseudo code):

ep_queue(request) {
    add_pending_list(request)
    if (ep is isoc and enabled) {
        if (prev_trb.HWO == 0) {
            send_cmd(End Transfer)
            return;
        }
    }
}

On End Transfer command completion:
    reclaim_and_giveback_requests(started_list)
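
The pseudo code above can be modelled in plain C to check the decision logic.
This is only a sketch under assumptions: `sketch_ep`, `sketch_trb` and
`sketch_ep_queue` are made-up names, not dwc3 symbols, and the model only
reports what the driver would do rather than issuing any command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_action {
	SKETCH_QUEUED,        /* request added, running transfer continues */
	SKETCH_END_TRANSFER,  /* underrun detected, End Transfer is needed */
};

/* Hypothetical TRB model: only the hardware-owned bit is tracked. */
struct sketch_trb {
	bool hwo;  /* cleared once the controller has finished with the TRB */
};

/* Hypothetical endpoint model. */
struct sketch_ep {
	bool isoc;
	bool started;
	size_t pending;                     /* requests on the pending list */
	const struct sketch_trb *prev_trb;  /* last prepared TRB, NULL if none */
};

/*
 * Queue one request and report whether the isoc transfer has to be
 * rescheduled: if the last prepared TRB is no longer owned by the
 * hardware, the controller has already run dry.
 */
static enum sketch_action sketch_ep_queue(struct sketch_ep *ep)
{
	assert(ep != NULL);

	ep->pending++;  /* add_pending_list(request) */

	if (ep->isoc && ep->started && ep->prev_trb && !ep->prev_trb->hwo)
		return SKETCH_END_TRANSFER;

	return SKETCH_QUEUED;
}
```

In a real driver the giveback of the started list would happen on End Transfer
command completion, as noted above; the model leaves that part out.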

BR,
Thinh

2022-10-21 19:39:07

by Jeff Vanhoof

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

On Fri, Oct 21, 2022 at 07:09:55PM +0000, Thinh Nguyen wrote:
> On Fri, Oct 21, 2022, Jeff Vanhoof wrote:
> > Hi Thinh,
> >
> > On Fri, Oct 21, 2022 at 04:43:52PM +0000, Thinh Nguyen wrote:
> > > On Fri, Oct 21, 2022, Jeff Vanhoof wrote:
> > > > Hi Thinh,
> > > >
> > > > On Fri, Oct 21, 2022 at 12:55:51AM +0000, Thinh Nguyen wrote:
> > > > > On Thu, Oct 20, 2022, Thinh Nguyen wrote:
> > > > > > On Thu, Oct 20, 2022, Jeff Vanhoof wrote:
> > > > > > > Hi Thinh,
> > > > > > >
> > > > > > > On Wed, Oct 19, 2022 at 11:06:08PM +0000, Thinh Nguyen wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> > > > > > > > > Hi Thinh,
> > > > > > > > > On Wed, Oct 19, 2022 at 07:08:27PM +0000, Thinh Nguyen wrote:
> > > > > > > > > > On Wed, Oct 19, 2022, Jeff Vanhoof wrote:
> > > > > > > >
> > > > > > > > <snip>
> > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > From what I can gather from the log, with the current changes it seems that
> > > > > > > > > > > after a missed isoc event few requests are staying longer than expected in the
> > > > > > > > > > > started_list (not getting reclaimed) and this is preventing the transmission
> > > > > > > > > > > from stopping/starting again, and opening the door for continuous stream of
> > > > > > > > > > > missed isoc events that cause what appears to the user as a frozen video.
> > > > > > > > > > >
> > > > > > > > > > > So one thought, if IOC bit is not set every frame, but IMI bit is, when a
> > > > > > > > > > > missed isoc related interrupt occurs it seems likely that more than one trb
> > > > > > > > > > > request will need to be reclaimed, but the current set of changes is not
> > > > > > > > > > > handling this.
> > > > > > > > > > >
> > > > > > > > > > > In the good transfer case this issue seems to be taken care of since the IOC
> > > > > > > > > > > > bit is not set every frame and the reclamation will loop through every item in
> > > > > > > > > > > the started_list and only stop if there are no additional trbs or if one has
> > > > > > > > > >
> > > > > > > > > > It should stop at the request that associated with the interrupt event,
> > > > > > > > > > whether it's because of IMI or IOC.
> > > > > > > > >
> > > > > > > > > > In this case I was concerned that if multiple queued reqs did not have IOC bit
> > > > > > > > > set, but there was a missed isoc on one of the last reqs, whether or not we would
> > > > > > > > > reclaim all of the requests up to the missed isoc related req. I'm not sure if
> > > > > > > > > my concern is valid or not.
> > > > > > > > >
> > > > > > > >
> > > > > > > > There should be no problem. If there's an interrupt event indicating a
> > > > > > > > TRB completion, the driver will give back all the requests up to the
> > > > > > > > request associated with the interrupt event, and the controller will
> > > > > > > > continue processing the remaining TRBs. On the next TRB completion
> > > > > > > > event, the driver will again give back all the requests up to the
> > > > > > > > request associated with that event.
> > > > > > > >
> > > > > > >
> > > > > > > I was testing with the following patch you suggested:
> > > > > > >
> > > > > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > > > > > index 61fba2b7389b..8352f4b5dd9f 100644
> > > > > > > > --- a/drivers/usb/dwc3/gadget.c
> > > > > > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > > > > > @@ -3657,6 +3657,10 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > > > > > > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > > > > > > return 1;
> > > > > > > >
> > > > > > > > + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > > > > > > + (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain)
> > > > > > > > + return 1;
> > > > > > > > +
> > > > > > > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > > > > > > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > > > > > > return 1;
> > > > > > > >
> > > > > > >
> > > > > > > At this time the IMI bit was set for every frame. With these changes it
> > > > > > > appeared in case of missed isoc that sometimes not all requests would be
> > > > > > > reclaimed (enqueued != dequeued even 100ms after the last interrupt was
> > > > > > > handled). If the 1st req in the started_list was fine (IMI set, but not IOC),
> > > > > > > and a later req was the one actually missed, because of this status check the
> > > > > > > reclamation could stop early and not clean up to the appropriate req. As
> > > > > >
> > > > > > Oops. You're right.
> > > > > >
> > > > > > > suggested yesterday, I also tried only setting the IMI bit when no_interrupt is
> > > > > > > not set, however I was still seeing the complete freezes. After analyzing this
> > > > > > > issue a bit, I have updated the diff to look more like this:
> > > > > > >
> > > > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > > > > index dfaf9ac24c4f..bb800a81815b 100644
> > > > > > > --- a/drivers/usb/dwc3/gadget.c
> > > > > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > > > > @@ -1230,8 +1230,9 @@ static void __dwc3_prepare_one_trb(struct dwc3_ep *dep, struct dwc3_trb *trb,
> > > > > > > trb->ctrl = DWC3_TRBCTL_ISOCHRONOUS;
> > > > > > > }
> > > > > > >
> > > > > > > - /* always enable Interrupt on Missed ISOC */
> > > > > > > - trb->ctrl |= DWC3_TRB_CTRL_ISP_IMI;
> > > > > > > + /* enable Interrupt on Missed ISOC */
> > > > > > > + if ((!no_interrupt && !chain) || must_interrupt)
> > > > > > > + trb->ctrl |= DWC3_TRB_CTRL_ISP_IMI;
> > > > > > > break;
> > > > > >
> > > > > > Either all or none of the TRBs of a request is set with IMI, and not
> > > > > > some.
> > > > > >
> > > > > > >
> > > > > > > case USB_ENDPOINT_XFER_BULK:
> > > > > > > @@ -3195,6 +3196,11 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > > > > > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > > > > > return 1;
> > > > > > >
> > > > > > > + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > > > > > + (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain
> > > > > > > + && (trb->ctrl & DWC3_TRB_CTRL_ISP_IMI))
> > > > > > > + return 1;
> > > > > > > +
> > > > > > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > > > > > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > > > > > return 1;
> > > > > > >
> > > > > > > Where the trb must have the IMI set before returning early. This seemed to make
> > > > > > > the freezes recoverable.
> > > > > >
> > > > > > Can you try this revised change:
> > > > > >
> > > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > > > index 61fba2b7389b..a69d8c28d86b 100644
> > > > > > --- a/drivers/usb/dwc3/gadget.c
> > > > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > > > @@ -3654,7 +3654,7 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > > > > if ((trb->ctrl & DWC3_TRB_CTRL_HWO) && status != -ESHUTDOWN)
> > > > > > return 1;
> > > > > >
> > > > > > - if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > > >
> > > > > I accidentally deleted a couple of lines here.
> > > > >
> > > > > > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
> > > > > > return 1;
> > > > > >
> > > > > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > > >
> > > > > I meant to do this:
> > > > >
> > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > > index 61fba2b7389b..cb65371572ee 100644
> > > > > --- a/drivers/usb/dwc3/gadget.c
> > > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > > @@ -3657,6 +3657,9 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > > > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > > > return 1;
> > > > >
> > > > > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
> > > > > + return 1;
> > > > > +
> > > > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > > > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > > > return 1;
> > > > > @@ -3673,6 +3676,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> > > > > struct scatterlist *s;
> > > > > unsigned int num_queued = req->num_queued_sgs;
> > > > > unsigned int i;
> > > > > + bool missed_isoc = false;
> > > > > int ret = 0;
> > > > >
> > > > > for_each_sg(sg, s, num_queued, i) {
> > > > > @@ -3681,12 +3685,18 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> > > > > req->sg = sg_next(s);
> > > > > req->num_queued_sgs--;
> > > > >
> > > > > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC)
> > > > > + missed_isoc = true;
> > > > > +
> > > > > ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
> > > > > trb, event, status, true);
> > > > > if (ret)
> > > > > break;
> > > > > }
> > > > >
> > > > > + if (missed_isoc)
> > > > > + ret = 1;
> > > > > +
> > > > > return ret;
> > > > > }
> > > > >
> > > > >
> > > > > BR,
> > > > > Thinh
> > > >
> > > > I tried out the following patch diff you provided and I did not see any iommu
> > > > related crashes with these changes:
> > > >
> > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > index dfaf9ac24c4f..50287437d6de 100644
> > > > --- a/drivers/usb/dwc3/gadget.c
> > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > @@ -3195,6 +3195,9 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
> > > > if (event->status & DEPEVT_STATUS_SHORT && !chain)
> > > > return 1;
> > > >
> > > > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC && !chain)
> > > > + return 1;
> > > > +
> > > > if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
> > > > (trb->ctrl & DWC3_TRB_CTRL_LST))
> > > > return 1;
> > > > @@ -3211,6 +3214,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> > > > struct scatterlist *s;
> > > > unsigned int num_queued = req->num_queued_sgs;
> > > > unsigned int i;
> > > > + bool missed_isoc = false;
> > > > int ret = 0;
> > > >
> > > > for_each_sg(sg, s, num_queued, i) {
> > > > @@ -3219,12 +3223,18 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
> > > > req->sg = sg_next(s);
> > > > req->num_queued_sgs--;
> > > >
> > > > + if (DWC3_TRB_SIZE_TRBSTS(trb->size) == DWC3_TRBSTS_MISSED_ISOC)
> > > > + missed_isoc = true;
> > > > +
> > > > ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
> > > > trb, event, status, true);
> > > > if (ret)
> > > > break;
> > > > }
> > > >
> > > > + if (missed_isoc)
> > > > + ret = 1;
> > > > +
> > > > return ret;
> > > > }
> > > >
> > > >
> > > > As we discussed earlier, when uvc's complete function is called, if an -EXDEV
> > > > is returned in the request's status, the uvc driver will begin to cancel its
> > > > queue. With the current skip interrupt implementation in the uvc driver, if
> > > > this occurs while the uvc driver is pumping the current frame, then there is no
> > > > guarantee that the last request(s) will have had 'no_interrupt=0'. If the last
> > > > requests passed to dwc3 had 'no_interrupt=1', these requests would eventually
> > > > be placed at the end of the started_list. Since the IOC bit will not be set,
> > > > and if no missed isoc event occurs on these requests, then the dwc3 driver will
> > > > not be interrupted, leaving those remaining requests sitting in the
> > > > started_list, and dwc3 will not perform an 'End Transfer' as expected. Once the
> > > > uvc driver begins to pump the requests for the next frame, then it most likely
> > > > will result in additional missed isoc events, with the result being an extended
> > > > video freeze seen by the user.
> > > >
> > > > I hope that other uvc driver maintainers can chime in here to help determine the
> > > > correct path forward. With the skip interrupt implementation, the uvc driver should
> > > > guarantee that the last request sent to dwc3 has 'no_interrupt=0', otherwise
> > >
> > > Rather than guaranteeing no_interrupt or not, it's more important that
> > > the UVC maintains a constant queue of requests to the controller driver.
> > > Isoc transfers are meant to be sent at a constant rate which the
> > > endpoint is configured.
> > >
> >
> > I agree with you on this, but it will probably always be a race with uvc
> > queuing up one req at a time and dwc3 starting to transmit almost immediately.
> > We can configure the streaming_interval on a product to kind of slow down or
>
> No, it doesn't work that way. The controller only sends the data
> when the host requests it. The host requests data every
> 125us, so a UVC request will complete roughly every 125us. There should
> be no race with uvc.
>

I wasn't very clear here. I was talking about the UVC Gadget Driver, about how
uvcg_video_pump queues up requests to the dwc3 driver via uvcg_video_ep_queue
(->usb_ep_queue->dwc3_gadget_ep_queue).

> > delay the usb transfers, but between dwc3 and uvc driver it would be nice to
> > have an interface that would allow pre-queuing a certain number of reqs before
> > the transfer is actually started. If that is not possible, then uvc could
> > instead prepare a number of reqs ahead of time and attempt to queue them each
> > as fast as possible in a very tight loop.
>
> All the UVC gadget driver has to do is keep multiple requests
> prepared ahead of time and not starve the controller driver. That is,
> don't let the queue reach 0. Let's say uvc can only pump 16 requests
> at a time: set no_interrupt=0 on at least every 8th request, so that
> uvc has time to feed more requests when it gets the notification
> that 8 requests completed. It would have roughly 8x125us to queue more
> requests before underrun.
>

I think we are saying the same thing. I'll try to be more clear going forward. :)

> >
> > > I recalled Dan mentioned that UVC gadget driver can queue up to 64
> > > requests with no_interrupt=1 up to 15 requests. But I keep seeing that
> > > the gadget driver only "pumps" 16 requests and doesn't continue until
> > > they are completed. We can almost guarantee that it's going to be
> > > underrun. Can UVC "pump" multiple times at once?
> > >
> >
> > uvc will usually pump when a new frame comes in or when a req's complete gets
> > called. uvc should fill up front all the reqs required to transfer a frame (up
> > to 64), but once the available reqs are filled or the frame completely queued
> > up, it would take a new incoming frame or a kick via the complete call to have
> > it attempt to fill any remaining reqs in the queue again. To me this looks
> > ok to do, but for heavy transfers we have a somewhat smaller queue as a buffer,
> > 48 reqs vs 64 reqs (64 - 16).
> >
> > To note, for the very last request of a frame/buffer (the end of it) the uvc
> > driver does set the no_interrupt to 0 for this request.
> >
> > > > if a missed isoc error occurs, it becomes very likely that the next immediate set of
> > > > frames could be dropped/cancelled because the dwc3 driver could not perform a timely
> > > > 'End Transfer'.
> > > >
> > > > For testing I implemented the following changes to see what I could do for this
> > > > issue. Note that I am on an older implementation and it's missing a lot of the
> > >
> > > Please use the latest kernel, there are a lot of fixes/improvement to
> > > dwc3 every kernel version.
> > >
> >
> > I've been debugging the sg implementation and the skip interrupt implementation
> > separately, backporting what can be backported. I'm working off of a 5.10
> > kernel debugging various issues Dan Vacura was seeing on a 5.15 kernel on a
> > newer product. What we had on our 5.10 based kernel was stable for uvc/dwc3, so
> > we needed to understand what came in since that time that broke stability.
> >
> > > > sg related implementation. The idea here is that if the queue is empty, and that
> > > > req_int_count is non-zero then the last request likely had 'no_interrupt=1' set.
> > > > And if this is the case then we will want to send some dummy request to dwc3 with
> > > > 'no_interrupt=0' set to make sure that no requests get stuck in its started_list.
> > >
> > > This is not efficient and unnecessary.
> > >
> >
> > Agree, but to fix this in the uvc driver the correct way seemed a bit more
> > complicated at first. I was thinking that the driver would always send one last
> > request of the frame buffer once an error is seen, but I'm now thinking of a
> > simpler solution. If we can update the uvc pump to prepare a number of requests
> > and make sure that the last request has no_interrupt set to 0 before queuing
> > them all up in a tight loop to the dwc3 driver, this would effectively solve
> > this problem too. A bit of extra smarts in this area might also address some
> > of your concerns about uvc not pumping a lot of data at once (especially for
> > the beginning of a frame). I believe we should prepare more reqs up front
> > if the req queue is empty and fewer reqs if it's already busy.
> >
> > I'm hoping that someone can step up to help here :). If not, then this will be
> > my next activity.
> >
> > > <snip>
> > >
> > > >
> > > >
> > > > Alternatively we may just not want to cancel the queue upon receiving -EXDEV
> > > > and this could solve the problem too, but I don't think that it's such a great
> > > > idea, especially if things start falling behind.
> > > >
> > > > I hope that someone more fluent in this area of code can take a crack at
> > > > improving/fixing this issue.
> > > >
> > > > The changes above do seem to help dwc3 timely end its transfers, but mainly for
> > > > cases where some requests are missed but the next immediate ones are not (I'm
> > > > talking within a couple of hundred microseconds). Most of the time if missed
> > > > isocs occurs for a frame that the remaining reqs in the started_list will
> > > > likely also error out and the list will be emptied and dwc3 will still timely
> > > > send 'End Transfer'. In reality this is to cover a corner case that can
> > > > adversely affect the quality of the video being watched. Just wanted to be
> > > > upfront with these details.
> > > >
> > > > Thinh, any pointers on how we should proceed from here? It looks like your
> > > > changes are working well.
> > > >
> > >
> > > You can add the underrun detection check to dwc3 whenever it receives a
> > > new request.
> > >
> > > ie. When the new request comes, check if the last prepared TRB's HWO bit
> > > is cleared and if the endpoint is started, send End Transfer command to
> > > reschedule the isoc transfers for the incoming requests.
> > >
> > > This is probably the simpler workaround to the underrun issue of UVC.
> > >
> >
> > This sounds like a good optimization too by itself. Would it be possible for
> > you to implement something here to help get me started? Even if it's not
> > perfect, I'll take what I can get. We are running up against the clock for
> > trying to close things out (changes must be released to mainline and backported
> > to 5.15 for Android).
> >
>
> At the moment, I have very limited bandwidth to implement and run tests.
> However, if you or someone else provides a patch, I can help review.
>
> It should look something like this (pseudo code):
>
> ep_queue(request) {
>     add_pending_list(request)
>     if (ep is isoc and enabled) {
>         if (prev_trb.HWO == 0) {
>             send_cmd(End Transfer)
>             return;
>         }
>     }
> }
>
> On End Transfer command completion:
>     reclaim_and_giveback_requests(started_list)
>
> BR,
> Thinh

Thanks,
Jeff

2024-02-22 00:04:15

by Michael Grzeschik

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

Sorry for digging up this grave! :)

I once more came across the situation we have been encountering
for a year or so now, and found some reasons why:

#1 there are so many latencies that the system is not fast enough to
enqueue requests back into a running HW-Transfer. At least on our
system setup.

and

#2 there are so many missed transfers leading to broken frames
when adding requests with no_interrupt set.

For #1: There are sometimes situations in the system where the threaded
interrupt handler for the dwc3 is not called fast enough, although the
HW-irq fired early, enqueued the irq event and woke the irq thread
early. In our case this often happens when there are other tasks
involved on the same CPU and the scheduler is not able to schedule the
irq thread in the necessary time. In our case the main issue is a
HW-irq handler of the ethernet controller (cadence macb) that runs
berserk on CPU0 and therefore takes a lot of CPU time. By default on
our system all irq handlers run on the same CPU. Since by definition
every interrupt thread is started on the same CPU on which its
irq was called, this puts a lot of pressure on one core. So changing
the smp_affinity of the dwc3 irq to the second CPU only already solves
a lot of the underruns.
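
For reference, the affinity change amounts to writing a hex CPU bitmask to
/proc/irq/<irq>/smp_affinity. A minimal C sketch follows; the helper names are
made up for illustration, and the irq number has to be looked up in
/proc/interrupts on the target system.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format the hex bitmask that selects exactly one CPU, e.g. CPU1 -> "2".
 * Returns the number of characters written, as snprintf() does.
 */
static int sketch_affinity_mask(unsigned int cpu, char *buf, size_t len)
{
	assert(cpu < 32);  /* sketch limitation: single 32-bit mask word */

	return snprintf(buf, len, "%x", 1u << cpu);
}

/*
 * Write the mask for `cpu` to `path`, which is expected to be something
 * like /proc/irq/<irq>/smp_affinity. Returns 0 on success, -1 on error.
 */
static int sketch_set_irq_affinity(const char *path, unsigned int cpu)
{
	char mask[16];
	FILE *f;
	int ret = 0;

	sketch_affinity_mask(cpu, mask, sizeof(mask));

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%s\n", mask) < 0)
		ret = -1;
	if (fclose(f) != 0)
		ret = -1;

	return ret;
}
```

Pinning to the second CPU means passing cpu 1, which yields the mask "2".
Writing the file needs root.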

For #2: I found an issue in the handling of the completion of requests in
the started list. The interrupt handler *explicitly* calls
stop_active_transfer if the overall event of the request was a missed
event. This event value only represents the status of the request that
actually triggered the interrupt.

It also calls ep_cleanup_completed_requests, which iterates over the
started requests and calls the giveback/complete functions of the
requests with the proper request status.

So this will also catch missed requests in the queue. However, since
there might be, let's say, 5 good requests and one missed request, what
will happen is that each complete call for the first good requests will
enqueue new requests into the started list and will also call the
updatecmd on that transfer, which was already missed, until the loop
reaches the one request with the MISSED status bit set.

So in my opinion the patch from Jeff makes sense when adding the
following change as well. With both changes the underruns and
broken frames finally disappear. I am still unsure whether this is the
complete solution, since with this the mentioned 5 good requests
will be cancelled as well. So this is still a WIP status here.

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index e031813c5769b..b991d25bbf897 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -3509,6 +3509,45 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,
return ret;
}

+static int dwc3_gadget_ep_check_missed_requests(struct dwc3_ep *dep)
+{
+ struct dwc3_request *req;
+ struct dwc3_request *tmp;
+ int ret = 0;
+
+ list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
+ struct dwc3_trb *trb;
+
+ /* TODO: check if the trb association is correct */
+ trb = req->trb;
+ switch (DWC3_TRB_SIZE_TRBSTS(trb->size)) {
+ case DWC3_TRBSTS_MISSED_ISOC:
+ /* Isoc endpoint only */
+ ret = -EXDEV;
+ break;
+ case DWC3_TRB_STS_XFER_IN_PROG:
+ /* Applicable when End Transfer with ForceRM=0 */
+ case DWC3_TRBSTS_SETUP_PENDING:
+ /* Control endpoint only */
+ case DWC3_TRBSTS_OK:
+ default:
+ ret = 0;
+ break;
+ }
+ }
+
+ return ret;
+}
+
static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
const struct dwc3_event_depevt *event, int status)
{
@@ -3566,7 +3605,7 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
struct dwc3 *dwc = dep->dwc;
bool no_started_trb = true;

- if (status == -EXDEV) {
+ if (status == -EXDEV || dwc3_gadget_ep_check_missed_requests(dep)) {
struct dwc3_request *tmp;
struct dwc3_request *req;
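
The scan in dwc3_gadget_ep_check_missed_requests above can be tried out in
userspace with a small model. This is a sketch only, with made-up `sketch_*`
types standing in for the dwc3 request list and the TRB status field. Note
that in the diff the `break` statements leave only the `switch`, so `ret` ends
up reflecting the last entry of the started list; the model below instead
stops at the first missed entry, which is an assumption about the intended
behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DWC3_TRBSTS_* values. */
enum sketch_trbsts {
	SKETCH_TRBSTS_OK,
	SKETCH_TRBSTS_MISSED_ISOC,
	SKETCH_TRBSTS_XFER_IN_PROG,
};

/* Hypothetical model of one entry on the started list. */
struct sketch_started_req {
	enum sketch_trbsts trbsts;
};

/*
 * Walk the started list in order and return the index of the first
 * request whose TRB reports a missed isoc, or -1 if there is none.
 */
static int sketch_first_missed(const struct sketch_started_req *started,
			       size_t n)
{
	assert(started != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (started[i].trbsts == SKETCH_TRBSTS_MISSED_ISOC)
			return (int)i;
	}

	return -1;
}
```

For the case described above, five good requests followed by one missed
request, the scan reports index 5, so the whole started list would be
cancelled before any of the good completions can enqueue more requests.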


On Mon, Oct 17, 2022 at 03:54:40PM -0500, Dan Vacura wrote:
>From: Jeff Vanhoof <[email protected]>
>
>arm-smmu related crashes seen after a Missed ISOC interrupt when
>no_interrupt=1 is used. This can happen if the hardware is still using
>the data associated with a TRB after the usb_request's ->complete call
>has been made. Instead of immediately releasing a request when a Missed
>ISOC interrupt has occurred, this change will add logic to cancel the
>request instead where it will eventually be released when the
>END_TRANSFER command has completed. This logic is similar to some of the
>cleanup done in dwc3_gadget_ep_dequeue.
>
>Fixes: 6d8a019614f3 ("usb: dwc3: gadget: check for Missed Isoc from event status")
>Cc: <[email protected]>
>Signed-off-by: Jeff Vanhoof <[email protected]>
>Co-developed-by: Dan Vacura <[email protected]>
>Signed-off-by: Dan Vacura <[email protected]>
>---
>V1 -> V3:
>- no change, new patch in series
>
> drivers/usb/dwc3/core.h | 1 +
> drivers/usb/dwc3/gadget.c | 38 ++++++++++++++++++++++++++------------
> 2 files changed, 27 insertions(+), 12 deletions(-)
>
>diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
>index 8f9959ba9fd4..9b005d912241 100644
>--- a/drivers/usb/dwc3/core.h
>+++ b/drivers/usb/dwc3/core.h
>@@ -943,6 +943,7 @@ struct dwc3_request {
> #define DWC3_REQUEST_STATUS_DEQUEUED 3
> #define DWC3_REQUEST_STATUS_STALLED 4
> #define DWC3_REQUEST_STATUS_COMPLETED 5
>+#define DWC3_REQUEST_STATUS_MISSED_ISOC 6
> #define DWC3_REQUEST_STATUS_UNKNOWN -1
>
> u8 epnum;
>diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>index 079cd333632e..411532c5c378 100644
>--- a/drivers/usb/dwc3/gadget.c
>+++ b/drivers/usb/dwc3/gadget.c
>@@ -2021,6 +2021,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
> case DWC3_REQUEST_STATUS_STALLED:
> dwc3_gadget_giveback(dep, req, -EPIPE);
> break;
>+ case DWC3_REQUEST_STATUS_MISSED_ISOC:
>+ dwc3_gadget_giveback(dep, req, -EXDEV);
>+ break;
> default:
> dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
> dwc3_gadget_giveback(dep, req, -ECONNRESET);
>@@ -3402,21 +3405,32 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
> struct dwc3 *dwc = dep->dwc;
> bool no_started_trb = true;
>
>- dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
>+ if (status == -EXDEV) {
>+ struct dwc3_request *tmp;
>+ struct dwc3_request *req;
>
>- if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
>- goto out;
>+ if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
>+ dwc3_stop_active_transfer(dep, true, true);
>
>- if (!dep->endpoint.desc)
>- return no_started_trb;
>+ list_for_each_entry_safe(req, tmp, &dep->started_list, list)
>+ dwc3_gadget_move_cancelled_request(req,
>+ DWC3_REQUEST_STATUS_MISSED_ISOC);
>+ } else {
>+ dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
>
>- if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
>- list_empty(&dep->started_list) &&
>- (list_empty(&dep->pending_list) || status == -EXDEV))
>- dwc3_stop_active_transfer(dep, true, true);
>- else if (dwc3_gadget_ep_should_continue(dep))
>- if (__dwc3_gadget_kick_transfer(dep) == 0)
>- no_started_trb = false;
>+ if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
>+ goto out;
>+
>+ if (!dep->endpoint.desc)
>+ return no_started_trb;
>+
>+ if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
>+ list_empty(&dep->started_list) && list_empty(&dep->pending_list))
>+ dwc3_stop_active_transfer(dep, true, true);
>+ else if (dwc3_gadget_ep_should_continue(dep))
>+ if (__dwc3_gadget_kick_transfer(dep) == 0)
>+ no_started_trb = false;
>+ }
>
> out:
> /*
>--
>2.34.1
>

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |



2024-02-22 01:21:48

by Thinh Nguyen

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

On Thu, Feb 22, 2024, Michael Grzeschik wrote:
> Sorry for digging up this grave! :)
>
> I once more came across the whole situation that we have been
> encountering for a year or so, and found some of the reasons why:
>
> #1 there are so many latencies that the system is not fast enough to
> enqueue requests back into a running HW-Transfer. At least on our
> system setup.
>
> and
>
> #2 there are so many missed transfers leading to broken frames
> when adding requests with no_interrupt set.
>
> For #1: There are sometimes situations in the system where the threaded
> interrupt handler for the dwc3 is not called fast enough, although the
> HW-irq was called early, enqueued the irq event and woke the irq
> thread early. In our case this often happens when there are other tasks
> involved on the same CPU and the scheduler is not able to schedule the
> irq thread in the necessary time. In our case the main issue is a
> HW-irq handler of the ethernet controller (cadence macb) that runs
> berserk on CPU0 and therefore takes a lot of CPU time. By default on
> our system all irq handlers are running on the same CPU. As, by
> definition, all interrupt threads are started on the same CPU on which
> the irq was called, this puts a lot of pressure on one core. So changing
> the smp_affinity of the dwc3 irq to the second CPU only already solves
> a lot of the underruns.

That's great!

>
> For #2: I found an issue in the handling of the completion of requests in
> the started list. When the interrupt handler is *explicitly* calling
> stop_active_transfer if the overall event of the request was an missed
> event. This event value only represents the value of the request that
> was actually triggering the interrupt.
>
> It also calls ep_cleanup_completed_requests and is iterating over the
> started requests and will call giveback/complete functions of the
> requests with the proper request status.
>
> So this will also catch missed requests in the queue. However, since
> there might be, lets say 5 good requests and one missed request, what
> will happen is, that each complete call for the first good requests will
> enqueue new requests into the started list and will also call the
> updatecmd on that transfer that was already missed until the loop will
> reach the one request with the MISSED status bit set.
>
> So in my opinion the patch from Jeff makes sense when adding the
> following change aswell. With those both changes the underruns and
> broken frames finally disappear. I am still unsure about the complete
> solution about that, since with this the mentioned 5 good requests
> will be cancelled aswell. So this is still a WIP status here.
>

When the dwc3 driver issues stop_active_transfer(), that means that the
started_list is empty and there is an underrun. It treats the incoming
requests as stale. However, for UVC, they are still "good".

I think you can just check if the started_list is empty before queuing
new requests. If it is, perform stop_active_transfer() to reschedule the
incoming requests. None of the newly queued requests will be released
yet since they are in the pending_list.

For UVC, perhaps you can introduce a new flag to usb_request called
"ignore_queue_latency" or something equivalent. The dwc3 is already
partially doing this for UVC. With this new flag, we can rework dwc3 to
clearly separate the expected behavior from the function driver.

BR,
Thinh

2024-02-27 21:02:11

by Michael Grzeschik

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

On Thu, Feb 22, 2024 at 01:20:04AM +0000, Thinh Nguyen wrote:
>On Thu, Feb 22, 2024, Michael Grzeschik wrote:
>> For #2: I found an issue in the handling of the completion of requests in
>> the started list. When the interrupt handler is *explicitly* calling
>> stop_active_transfer if the overall event of the request was an missed
>> event. This event value only represents the value of the request that
>> was actually triggering the interrupt.
>>
>> It also calls ep_cleanup_completed_requests and is iterating over the
>> started requests and will call giveback/complete functions of the
>> requests with the proper request status.
>>
>> So this will also catch missed requests in the queue. However, since
>> there might be, lets say 5 good requests and one missed request, what
>> will happen is, that each complete call for the first good requests will
>> enqueue new requests into the started list and will also call the
>> updatecmd on that transfer that was already missed until the loop will
>> reach the one request with the MISSED status bit set.
>>
>> So in my opinion the patch from Jeff makes sense when adding the
>> following change aswell. With those both changes the underruns and
>> broken frames finally disappear. I am still unsure about the complete
>> solution about that, since with this the mentioned 5 good requests
>> will be cancelled aswell. So this is still a WIP status here.
>>
>
>When the dwc3 driver issues stop_active_transfer(), that means that the
>started_list is empty and there is an underrun.

At the moment this is only the case when both the pending and the started
list are empty, or when the interrupt event was EXDEV.

The main problem is that the function
dwc3_gadget_ep_cleanup_completed_requests(dep, event, status); will
issue a completion for each started request, which in turn will
refill the pending list. Therefore, after that refill, the
stop_active_transfer path is currently never hit.
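
The effect described above can be reproduced in a small userspace model.
This is only a sketch under stated assumptions, not dwc3 code: the
toy_ep structure, its counters and the requeue_on_complete switch are
invented for illustration, and the condition merely has the same shape
as the pre-patch check in dwc3_gadget_endpoint_trbs_complete().

```c
#include <stdbool.h>

/* Toy model of one isoc endpoint; all names here are illustrative. */
struct toy_ep {
	int started;			/* requests already handed to the HW */
	int pending;			/* requests queued but not started */
	bool requeue_on_complete;	/* UVC-like: completion queues a new request */
	bool stop_issued;		/* set when the stop path is taken */
};

/* Stand-in for giving back every started request. */
static void toy_cleanup_completed(struct toy_ep *ep)
{
	while (ep->started > 0) {
		ep->started--;
		if (ep->requeue_on_complete)
			ep->pending++;	/* the completion handler refills the queue */
	}
}

/*
 * Same shape as the pre-patch condition: stop only when nothing is queued,
 * or when the interrupting event itself reported a missed interval.
 */
static void toy_trbs_complete(struct toy_ep *ep, bool event_missed)
{
	toy_cleanup_completed(ep);

	if (ep->started == 0 && (ep->pending == 0 || event_missed))
		ep->stop_issued = true;
}
```

With requeue_on_complete set and an event that did not itself report a
miss, the pending counter is non-zero by the time the check runs, so
stop_issued stays false even though the started list has drained.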

>It treats the incoming requests as staled. However, for UVC, they are
>still "good".

Right, so in that case we can requeue them anyway. But this will have to
be done after the stop transfer cmd has finished.

>I think you can just check if the started_list is empty before queuing
>new requests. If it is, perform stop_active_transfer() to reschedule the
>incoming requests. None of the newly queue requests will be released
>yet since they are in the pending_list.

So that is basically exactly what my patch is doing. However, in the case
of an underrun it is not safe to call dwc3_gadget_ep_cleanup_completed_requests,
as Jeff stated. So his underlying patch really is fixing an issue here.

>For UVC, perhaps you can introduce a new flag to usb_request called
>"ignore_queue_latency" or something equivalent. The dwc3 is already
>partially doing this for UVC. With this new flag, we can rework dwc3 to
>clearly separate the expected behavior from the function driver.

I don't know why this "extra" flag is even necessary. The code example
is already working without that extra flag.

Actually, I even came up with a better solution. In addition to checking
whether one of the requests in the started list was missed, we can actively
check whether the trb ring ran dry, i.e. whether
dwc3_gadget_endpoint_trbs_complete is going to enqueue into an empty trb ring.
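
As a standalone illustration of the run-dry check (separate from the
actual patch), here is a minimal sketch of looking at the ownership bit
of the most recently enqueued TRB. The ring layout, the toy_* names and
the bit position are assumptions made for the example; the real ring
also contains a link TRB, which this model leaves out.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_TRB_NUM		8
#define TOY_TRB_CTRL_HWO	(1u << 0)	/* set while the HW owns the TRB */

struct toy_trb {
	uint32_t ctrl;
};

struct toy_ring {
	struct toy_trb trbs[TOY_TRB_NUM];
	unsigned int enqueue;			/* index of the next free slot */
};

/* Software prepares one TRB and hands it to the hardware. */
static void toy_ring_enqueue(struct toy_ring *r)
{
	r->trbs[r->enqueue].ctrl |= TOY_TRB_CTRL_HWO;
	r->enqueue = (r->enqueue + 1) % TOY_TRB_NUM;
}

/* Hardware has processed the TRB at idx and gives it back. */
static void toy_hw_complete(struct toy_ring *r, unsigned int idx)
{
	r->trbs[idx % TOY_TRB_NUM].ctrl &= ~TOY_TRB_CTRL_HWO;
}

/*
 * The transfer is still in flight only if the last TRB that was enqueued
 * is still owned by the hardware. If that bit is clear, the ring ran dry.
 */
static bool toy_transfer_in_flight(const struct toy_ring *r)
{
	unsigned int last = (r->enqueue + TOY_TRB_NUM - 1) % TOY_TRB_NUM;

	return (r->trbs[last].ctrl & TOY_TRB_CTRL_HWO) != 0;
}
```

Checking only the last enqueued TRB is enough in this model because the
hardware works through the ring in order: once the newest TRB has been
given back, everything before it has been handled as well.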

So my whole change looks like this:

diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index efe6caf4d0e87..2c8047dcd1612 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -952,6 +952,7 @@ struct dwc3_request {
#define DWC3_REQUEST_STATUS_DEQUEUED 3
#define DWC3_REQUEST_STATUS_STALLED 4
#define DWC3_REQUEST_STATUS_COMPLETED 5
+#define DWC3_REQUEST_STATUS_MISSED_ISOC 6
#define DWC3_REQUEST_STATUS_UNKNOWN -1

u8 epnum;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 858fe4c299b7a..a31f4d3502bd3 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2057,6 +2057,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
req = next_request(&dep->cancelled_list);
dwc3_gadget_ep_skip_trbs(dep, req);
switch (req->status) {
+ case 0:
+ dwc3_gadget_giveback(dep, req, 0);
+ break;
case DWC3_REQUEST_STATUS_DISCONNECTED:
dwc3_gadget_giveback(dep, req, -ESHUTDOWN);
break;
@@ -2066,6 +2069,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
case DWC3_REQUEST_STATUS_STALLED:
dwc3_gadget_giveback(dep, req, -EPIPE);
break;
+ case DWC3_REQUEST_STATUS_MISSED_ISOC:
+ dwc3_gadget_giveback(dep, req, -EXDEV);
+ break;
default:
dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
dwc3_gadget_giveback(dep, req, -ECONNRESET);
@@ -3509,6 +3515,36 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,
return ret;
}

+static int dwc3_gadget_ep_check_missed_requests(struct dwc3_ep *dep)
+{
+ struct dwc3_request *req;
+ struct dwc3_request *tmp;
+ int ret = 0;
+
+ list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
+ struct dwc3_trb *trb;
+
+ trb = req->trb;
+ switch (DWC3_TRB_SIZE_TRBSTS(trb->size)) {
+ case DWC3_TRBSTS_MISSED_ISOC:
+ /* Isoc endpoint only */
+ ret = -EXDEV;
+ break;
+ case DWC3_TRB_STS_XFER_IN_PROG:
+ /* Applicable when End Transfer with ForceRM=0 */
+ case DWC3_TRBSTS_SETUP_PENDING:
+ /* Control endpoint only */
+ case DWC3_TRBSTS_OK:
+ default:
+ ret = 0;
+ break;
+ }
+ }
+
+ return ret;
+}
+
static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
const struct dwc3_event_depevt *event, int status)
{
@@ -3565,22 +3601,51 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
{
struct dwc3 *dwc = dep->dwc;
bool no_started_trb = true;
+ unsigned int transfer_in_flight = 0;
+
+ /* It is possible that the interrupt thread was delayed by
+ * scheduling in the system, and therefore the HW has already
+ * run dry. In that case the last trb in the queue has already
+ * been handled by the hw. By checking the HWO bit we know to
+ * restart the whole transfer. This condition is more likely
+ * if not every trb has the IOC bit set, so that the interrupt
+ * thread is triggered less often.
+ */
+ if (dep->number && usb_endpoint_xfer_isoc(dep->endpoint.desc)) {
+ struct dwc3_trb *trb;

- dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
+ trb = dwc3_ep_prev_trb(dep, dep->trb_enqueue);
+ transfer_in_flight = trb->ctrl & DWC3_TRB_CTRL_HWO;
+ }

- if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
- goto out;
+ if (status == -EXDEV || !transfer_in_flight) {
+ struct dwc3_request *tmp;
+ struct dwc3_request *req;

- if (!dep->endpoint.desc)
- return no_started_trb;
+ if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
+ dwc3_stop_active_transfer(dep, true, true);

- if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
- list_empty(&dep->started_list) &&
- (list_empty(&dep->pending_list) || status == -EXDEV))
- dwc3_stop_active_transfer(dep, true, true);
- else if (dwc3_gadget_ep_should_continue(dep))
- if (__dwc3_gadget_kick_transfer(dep) == 0)
- no_started_trb = false;
+ list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
+ dwc3_gadget_move_cancelled_request(req,
+ (DWC3_TRB_SIZE_TRBSTS(req->trb->size) == DWC3_TRBSTS_MISSED_ISOC) ?
+ DWC3_REQUEST_STATUS_MISSED_ISOC : 0);
+ }
+ } else {
+ dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
+
+ if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
+ goto out;
+
+ if (!dep->endpoint.desc)
+ return no_started_trb;
+
+ if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
+ list_empty(&dep->started_list) && list_empty(&dep->pending_list))
+ dwc3_stop_active_transfer(dep, true, true);
+ else if (dwc3_gadget_ep_should_continue(dep))
+ if (__dwc3_gadget_kick_transfer(dep) == 0)
+ no_started_trb = false;
+ }

out:
/*

I will separate the whole hunk into smaller changes and send a v1
for review in the next few days.

Regards,
Michael


--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |



2024-03-07 01:58:33

by Thinh Nguyen

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

On Tue, Feb 27, 2024, Michael Grzeschik wrote:
> On Thu, Feb 22, 2024 at 01:20:04AM +0000, Thinh Nguyen wrote:
> > On Thu, Feb 22, 2024, Michael Grzeschik wrote:
> > > For #2: I found an issue in the handling of the completion of requests in
> > > the started list. When the interrupt handler is *explicitly* calling
> > > stop_active_transfer if the overall event of the request was an missed
> > > event. This event value only represents the value of the request that
> > > was actually triggering the interrupt.
> > >
> > > It also calls ep_cleanup_completed_requests and is iterating over the
> > > started requests and will call giveback/complete functions of the
> > > requests with the proper request status.
> > >
> > > So this will also catch missed requests in the queue. However, since
> > > there might be, lets say 5 good requests and one missed request, what
> > > will happen is, that each complete call for the first good requests will
> > > enqueue new requests into the started list and will also call the
> > > updatecmd on that transfer that was already missed until the loop will
> > > reach the one request with the MISSED status bit set.
> > >
> > > So in my opinion the patch from Jeff makes sense when adding the
> > > following change aswell. With those both changes the underruns and
> > > broken frames finally disappear. I am still unsure about the complete
> > > solution about that, since with this the mentioned 5 good requests
> > > will be cancelled aswell. So this is still a WIP status here.
> > >
> >
> > When the dwc3 driver issues stop_active_transfer(), that means that the
> > started_list is empty and there is an underrun.
>
> At this moment this is only the case when both, pending and started list
> are empty. Or the interrupt event was EXDEV.
>
> The main problem is that the function
> dwc3_gadget_ep_cleanup_completed_requests(dep, event, status); will
> issue an complete for each started request, which on the other hand will
> refill the pending list, and therefor after that refill the
> stop_active_transfer is currently never hit.
>
> > It treats the incoming requests as staled. However, for UVC, they are
> > still "good".
>
> Right, so in that case we can requeue them anyway. But this will have to
> be done after the stop transfer cmd has finished.
>
> > I think you can just check if the started_list is empty before queuing
> > new requests. If it is, perform stop_active_transfer() to reschedule the
> > incoming requests. None of the newly queue requests will be released
> > yet since they are in the pending_list.
>
> So that is basically exactly what my patch is doing. However in the case
> of an underrun it is not safe to call dwc3_gadget_ep_cleanup_completed_requests
> as jeff stated. So his underlying patch is really fixing an issue here.

What I mean is to actively check the started list on every
usb_ep_queue() call. Checking during
dwc3_gadget_ep_cleanup_completed_requests() is already too late.
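
A rough userspace sketch of such a queue-time check, under loud
assumptions: sketch_ep, its fields and both functions are invented for
this example and only stand in for the started/pending lists and the
stop_active_transfer() path; this is not how dwc3 is implemented today.

```c
#include <stdbool.h>

/* Illustrative endpoint state; none of these names exist in dwc3. */
struct sketch_ep {
	int started;			/* requests owned by the HW */
	int pending;			/* requests waiting to be started */
	bool transfer_active;		/* a transfer has been started */
	bool end_transfer_pending;	/* stop issued, restart not yet done */
};

/* Stand-in for the stop path: the transfer ends and is restarted later. */
static void sketch_stop_active_transfer(struct sketch_ep *ep)
{
	ep->end_transfer_pending = true;
	ep->transfer_active = false;
}

/*
 * Queue one request. Returns true if the request has to wait in the
 * pending list for a restart, false if it was handed over right away.
 */
static bool sketch_ep_queue(struct sketch_ep *ep)
{
	ep->pending++;

	if (ep->end_transfer_pending)
		return true;		/* restart already under way */

	/* Active transfer with nothing started: the HW has underrun. */
	if (ep->transfer_active && ep->started == 0) {
		sketch_stop_active_transfer(ep);
		return true;
	}

	/* Normal case: hand everything pending to the controller. */
	ep->started += ep->pending;
	ep->pending = 0;
	ep->transfer_active = true;
	return false;
}
```

The point of doing the check here is that the new request is still in
the pending list when the underrun is noticed, so nothing is given back
to the function driver before the transfer is restarted.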

>
> > For UVC, perhaps you can introduce a new flag to usb_request called
> > "ignore_queue_latency" or something equivalent. The dwc3 is already
> > partially doing this for UVC. With this new flag, we can rework dwc3 to
> > clearly separate the expected behavior from the function driver.
>
> I don't know why this "extra" flag is even necessary. The code example
> is already working without that extra flag.

The flag is for the controller driver to determine what kind of behavior
the function driver expects. My intention is that if this extra flag is
not set, the dwc3 driver will not attempt to reschedule isoc requests at
all (i.e. no stop_active_transfer()).
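
To make the intended split concrete, here is a minimal sketch of the
decision. It is hypothetical throughout: "ignore_queue_latency" is only
the name floated in this thread and is not an existing usb_request
field, and toy_request, toy_action and toy_on_queue() are made up for
the example.

```c
#include <stdbool.h>

/* Hypothetical request model carrying only the proposed flag. */
struct toy_request {
	bool ignore_queue_latency;
};

enum toy_action {
	TOY_CONTINUE,		/* keep the running transfer as it is */
	TOY_RESCHEDULE,		/* stop the transfer and restart it later */
};

/*
 * Decide what the controller driver does when a request is queued.
 * Without the flag it never reschedules; with the flag it reschedules
 * only when the started list is empty, i.e. on an underrun.
 */
static enum toy_action toy_on_queue(const struct toy_request *req,
				    bool started_list_empty)
{
	if (req->ignore_queue_latency && started_list_empty)
		return TOY_RESCHEDULE;

	return TOY_CONTINUE;
}
```

A function driver like UVC would set the flag on its requests, while
drivers that rely on strict interval timing would leave it clear and
never see their transfers stopped behind their back.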

>
> Actually I even came up with an better solution. Additionally of checking if
> one of the requests in the started list was missed, we can activly check if
> the trb ring did run dry and if dwc3_gadget_endpoint_trbs_complete is
> going to enqueue in to the empty trb ring.
>
> So my whole change looks like that:
>
> diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
> index efe6caf4d0e87..2c8047dcd1612 100644
> --- a/drivers/usb/dwc3/core.h
> +++ b/drivers/usb/dwc3/core.h
> @@ -952,6 +952,7 @@ struct dwc3_request {
> #define DWC3_REQUEST_STATUS_DEQUEUED 3
> #define DWC3_REQUEST_STATUS_STALLED 4
> #define DWC3_REQUEST_STATUS_COMPLETED 5
> +#define DWC3_REQUEST_STATUS_MISSED_ISOC 6
> #define DWC3_REQUEST_STATUS_UNKNOWN -1
> u8 epnum;
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 858fe4c299b7a..a31f4d3502bd3 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2057,6 +2057,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
> req = next_request(&dep->cancelled_list);
> dwc3_gadget_ep_skip_trbs(dep, req);
> switch (req->status) {
> + case 0:
> + dwc3_gadget_giveback(dep, req, 0);
> + break;
> case DWC3_REQUEST_STATUS_DISCONNECTED:
> dwc3_gadget_giveback(dep, req, -ESHUTDOWN);
> break;
> @@ -2066,6 +2069,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
> case DWC3_REQUEST_STATUS_STALLED:
> dwc3_gadget_giveback(dep, req, -EPIPE);
> break;
> + case DWC3_REQUEST_STATUS_MISSED_ISOC:
> + dwc3_gadget_giveback(dep, req, -EXDEV);
> + break;
> default:
> dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
> dwc3_gadget_giveback(dep, req, -ECONNRESET);
> @@ -3509,6 +3515,36 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,
> return ret;
> }
> +static int dwc3_gadget_ep_check_missed_requests(struct dwc3_ep *dep)
> +{
> + struct dwc3_request *req;
> + struct dwc3_request *tmp;
> + int ret = 0;
> +
> + list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
> + struct dwc3_trb *trb;
> +
> + trb = req->trb;
> + switch (DWC3_TRB_SIZE_TRBSTS(trb->size)) {
> + case DWC3_TRBSTS_MISSED_ISOC:
> + /* Isoc endpoint only */
> + ret = -EXDEV;
> + break;
> + case DWC3_TRB_STS_XFER_IN_PROG:
> + /* Applicable when End Transfer with ForceRM=0 */
> + case DWC3_TRBSTS_SETUP_PENDING:
> + /* Control endpoint only */
> + case DWC3_TRBSTS_OK:
> + default:
> + ret = 0;
> + break;
> + }
> + }
> +
> + return ret;
> +}
> +
> static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
> const struct dwc3_event_depevt *event, int status)
> {
> @@ -3565,22 +3601,51 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
> {
> struct dwc3 *dwc = dep->dwc;
> bool no_started_trb = true;
> + unsigned int transfer_in_flight = 0;
> +
> + /* It is possible that the interrupt thread was delayed by
> + * scheduling in the system, and therefor the HW has already
> + * run dry. In that case the last trb in the queue is already
> + * handled by the hw. By checking the HWO bit we know to restart
> + * the whole transfer. The condition to appear is more likelely
> + * if not every trb has the IOC bit set and therefor does not
> + * trigger the interrupt thread fewer.
> + */
> + if (dep->number && usb_endpoint_xfer_isoc(dep->endpoint.desc)) {
> + struct dwc3_trb *trb;
> - dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
> + trb = dwc3_ep_prev_trb(dep, dep->trb_enqueue);
> + transfer_in_flight = trb->ctrl & DWC3_TRB_CTRL_HWO;
> + }
> - if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
> - goto out;
> + if (status == -EXDEV || !transfer_in_flight) {
> + struct dwc3_request *tmp;
> + struct dwc3_request *req;
> - if (!dep->endpoint.desc)
> - return no_started_trb;
> + if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
> + dwc3_stop_active_transfer(dep, true, true);
> - if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> - list_empty(&dep->started_list) &&
> - (list_empty(&dep->pending_list) || status == -EXDEV))
> - dwc3_stop_active_transfer(dep, true, true);
> - else if (dwc3_gadget_ep_should_continue(dep))
> - if (__dwc3_gadget_kick_transfer(dep) == 0)
> - no_started_trb = false;
> + list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
> + dwc3_gadget_move_cancelled_request(req,
> + (DWC3_TRB_SIZE_TRBSTS(req->trb->size) == DWC3_TRBSTS_MISSED_ISOC) ?
> + DWC3_REQUEST_STATUS_MISSED_ISOC : 0);
> + }
> + } else {
> + dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
> +
> + if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
> + goto out;
> +
> + if (!dep->endpoint.desc)
> + return no_started_trb;
> +
> + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> + list_empty(&dep->started_list) && list_empty(&dep->pending_list))
> + dwc3_stop_active_transfer(dep, true, true);
> + else if (dwc3_gadget_ep_should_continue(dep))
> + if (__dwc3_gadget_kick_transfer(dep) == 0)
> + no_started_trb = false;
> + }
> out:
> /*
>
> I will seperate the whole hunk into smaller changes and send an v1
> the next days to review.
>

No, we should not reschedule for every missed-isoc. We only want to
target the underrun condition.

Thanks,
Thinh

2024-03-07 16:15:52

by Michael Grzeschik

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

On Thu, Mar 07, 2024 at 01:57:44AM +0000, Thinh Nguyen wrote:
>On Tue, Feb 27, 2024, Michael Grzeschik wrote:
>> On Thu, Feb 22, 2024 at 01:20:04AM +0000, Thinh Nguyen wrote:
>> > On Thu, Feb 22, 2024, Michael Grzeschik wrote:
>> > > For #2: I found an issue in the handling of the completion of requests in
>> > > the started list. When the interrupt handler is *explicitly* calling
>> > > stop_active_transfer if the overall event of the request was an missed
>> > > event. This event value only represents the value of the request that
>> > > was actually triggering the interrupt.
>> > >
>> > > It also calls ep_cleanup_completed_requests and is iterating over the
>> > > started requests and will call giveback/complete functions of the
>> > > requests with the proper request status.
>> > >
>> > > So this will also catch missed requests in the queue. However, since
>> > > there might be, lets say 5 good requests and one missed request, what
>> > > will happen is, that each complete call for the first good requests will
>> > > enqueue new requests into the started list and will also call the
>> > > updatecmd on that transfer that was already missed until the loop will
>> > > reach the one request with the MISSED status bit set.
>> > >
>> > > So in my opinion the patch from Jeff makes sense when adding the
>> > > following change aswell. With those both changes the underruns and
>> > > broken frames finally disappear. I am still unsure about the complete
>> > > solution about that, since with this the mentioned 5 good requests
>> > > will be cancelled aswell. So this is still a WIP status here.
>> > >
>> >
>> > When the dwc3 driver issues stop_active_transfer(), that means that the
>> > started_list is empty and there is an underrun.
>>
>> At this moment this is only the case when both, pending and started list
>> are empty. Or the interrupt event was EXDEV.
>>
>> The main problem is that the function
>> dwc3_gadget_ep_cleanup_completed_requests(dep, event, status); will
>> issue an complete for each started request, which on the other hand will
>> refill the pending list, and therefor after that refill the
>> stop_active_transfer is currently never hit.
>>
>> > It treats the incoming requests as staled. However, for UVC, they are
>> > still "good".
>>
>> Right, so in that case we can requeue them anyway. But this will have to
>> be done after the stop transfer cmd has finished.
>>
>> > I think you can just check if the started_list is empty before queuing
>> > new requests. If it is, perform stop_active_transfer() to reschedule the
>> > incoming requests. None of the newly queue requests will be released
>> > yet since they are in the pending_list.
>>
>> So that is basically exactly what my patch is doing. However in the case
>> of an underrun it is not safe to call dwc3_gadget_ep_cleanup_completed_requests
>> as jeff stated. So his underlying patch is really fixing an issue here.
>
>What I mean is to actively check for started list on every
>usb_ep_queue() call. Checking during
>dwc3_gadget_ep_cleanup_completed_requests() is already too late.

I see.

>>
>> > For UVC, perhaps you can introduce a new flag to usb_request called
>> > "ignore_queue_latency" or something equivalent. The dwc3 is already
>> > partially doing this for UVC. With this new flag, we can rework dwc3 to
>> > clearly separate the expected behavior from the function driver.
>>
>> I don't know why this "extra" flag is even necessary. The code example
>> is already working without that extra flag.
>
>The flag is for controller to determine what kinds of behavior the
>function driver expects. My intention is if this extra flag is not set,
>the dwc3 driver will not attempt to reshcedule isoc request at all (ie.
>no stop_active_transfer()).

Ok.

>>
>> Actually I even came up with an better solution. Additionally of checking if
>> one of the requests in the started list was missed, we can activly check if
>> the trb ring did run dry and if dwc3_gadget_endpoint_trbs_complete is
>> going to enqueue in to the empty trb ring.
>>
>> So my whole change looks like that:
>>
>> diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
>> index efe6caf4d0e87..2c8047dcd1612 100644
>> --- a/drivers/usb/dwc3/core.h
>> +++ b/drivers/usb/dwc3/core.h
>> @@ -952,6 +952,7 @@ struct dwc3_request {
>> #define DWC3_REQUEST_STATUS_DEQUEUED 3
>> #define DWC3_REQUEST_STATUS_STALLED 4
>> #define DWC3_REQUEST_STATUS_COMPLETED 5
>> +#define DWC3_REQUEST_STATUS_MISSED_ISOC 6
>> #define DWC3_REQUEST_STATUS_UNKNOWN -1
>> u8 epnum;
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index 858fe4c299b7a..a31f4d3502bd3 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -2057,6 +2057,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>> req = next_request(&dep->cancelled_list);
>> dwc3_gadget_ep_skip_trbs(dep, req);
>> switch (req->status) {
>> + case 0:
>> + dwc3_gadget_giveback(dep, req, 0);
>> + break;
>> case DWC3_REQUEST_STATUS_DISCONNECTED:
>> dwc3_gadget_giveback(dep, req, -ESHUTDOWN);
>> break;
>> @@ -2066,6 +2069,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>> case DWC3_REQUEST_STATUS_STALLED:
>> dwc3_gadget_giveback(dep, req, -EPIPE);
>> break;
>> + case DWC3_REQUEST_STATUS_MISSED_ISOC:
>> + dwc3_gadget_giveback(dep, req, -EXDEV);
>> + break;
>> default:
>> dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
>> dwc3_gadget_giveback(dep, req, -ECONNRESET);
>> @@ -3509,6 +3515,36 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,
>> return ret;
>> }
>> +static int dwc3_gadget_ep_check_missed_requests(struct dwc3_ep *dep)
>> +{
>> + struct dwc3_request *req;
>> + struct dwc3_request *tmp;
>> + int ret = 0;
>> +
>> + list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
>> + struct dwc3_trb *trb;
>> +
>> + trb = req->trb;
>> + switch (DWC3_TRB_SIZE_TRBSTS(trb->size)) {
>> + case DWC3_TRBSTS_MISSED_ISOC:
>> + /* Isoc endpoint only */
>> + ret = -EXDEV;
>> + break;
>> + case DWC3_TRB_STS_XFER_IN_PROG:
>> + /* Applicable when End Transfer with ForceRM=0 */
>> + case DWC3_TRBSTS_SETUP_PENDING:
>> + /* Control endpoint only */
>> + case DWC3_TRBSTS_OK:
>> + default:
>> + ret = 0;
>> + break;
>> + }
>> + }
>> +
>> + return ret;
>> +}
>> +
>> static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
>> const struct dwc3_event_depevt *event, int status)
>> {
>> @@ -3565,22 +3601,51 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
>> {
>> struct dwc3 *dwc = dep->dwc;
>> bool no_started_trb = true;
>> + unsigned int transfer_in_flight = 0;
>> +
>> + /* It is possible that the interrupt thread was delayed by
>> + * scheduling in the system, and therefor the HW has already
>> + * run dry. In that case the last trb in the queue is already
>> + * handled by the hw. By checking the HWO bit we know to restart
>> + * the whole transfer. The condition to appear is more likelely
>> + * if not every trb has the IOC bit set and therefor does not
>> + * trigger the interrupt thread fewer.
>> + */
>> + if (dep->number && usb_endpoint_xfer_isoc(dep->endpoint.desc)) {
>> + struct dwc3_trb *trb;
>> - dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
>> + trb = dwc3_ep_prev_trb(dep, dep->trb_enqueue);
>> + transfer_in_flight = trb->ctrl & DWC3_TRB_CTRL_HWO;
>> + }
>> - if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
>> - goto out;
>> + if (status == -EXDEV || !transfer_in_flight) {
>> + struct dwc3_request *tmp;
>> + struct dwc3_request *req;
>> - if (!dep->endpoint.desc)
>> - return no_started_trb;
>> + if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
>> + dwc3_stop_active_transfer(dep, true, true);
>> - if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
>> - list_empty(&dep->started_list) &&
>> - (list_empty(&dep->pending_list) || status == -EXDEV))

@[!!here!!]

>> - dwc3_stop_active_transfer(dep, true, true);
>> - else if (dwc3_gadget_ep_should_continue(dep))
>> - if (__dwc3_gadget_kick_transfer(dep) == 0)
>> - no_started_trb = false;
>> + list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
>> + dwc3_gadget_move_cancelled_request(req,
>> + (DWC3_TRB_SIZE_TRBSTS(req->trb->size) == DWC3_TRBSTS_MISSED_ISOC) ?
>> + DWC3_REQUEST_STATUS_MISSED_ISOC : 0);
>> + }
>> + } else {
>> + dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
>> +
>> + if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
>> + goto out;
>> +
>> + if (!dep->endpoint.desc)
>> + return no_started_trb;
>> +
>> + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
>> + list_empty(&dep->started_list) && list_empty(&dep->pending_list))
>> + dwc3_stop_active_transfer(dep, true, true);
>> + else if (dwc3_gadget_ep_should_continue(dep))
>> + if (__dwc3_gadget_kick_transfer(dep) == 0)
>> + no_started_trb = false;
>> + }
>> out:
>> /*
>>
>> I will seperate the whole hunk into smaller changes and send an v1
>> the next days to review.
>>

I finally sent a v1 of my series.

https://lore.kernel.org/linux-usb/20240307-dwc3-gadget-complete-irq-v1-0-4fe9ac0ba2b7@pengutronix.de/

For the rest of the discussion, I would like to move the conversation to
the newly sent series.

>No, we should not reschedule for every missed-isoc. We only want to
>target underrun condition.

As you stated above, by "reschedule" you mean calling
stop_transfer after a missed transfer was seen?

If so, why is this condition already in there? (@[!!here!!])

Michael

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |



2024-03-08 02:57:25

by Thinh Nguyen

Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc

On Thu, Mar 07, 2024, Michael Grzeschik wrote:
> On Thu, Mar 07, 2024 at 01:57:44AM +0000, Thinh Nguyen wrote:
> > On Tue, Feb 27, 2024, Michael Grzeschik wrote:
> > > On Thu, Feb 22, 2024 at 01:20:04AM +0000, Thinh Nguyen wrote:
> > > > On Thu, Feb 22, 2024, Michael Grzeschik wrote:
> > > > > > For #2: I found an issue in the handling of the completion of requests in
> > > > > > the started list. The interrupt handler *explicitly* calls
> > > > > > stop_active_transfer if the overall event of the request was a missed
> > > > > > event. This event value only represents the value of the request that
> > > > > > actually triggered the interrupt.
> > > > > >
> > > > > > It also calls ep_cleanup_completed_requests, which iterates over the
> > > > > > started requests and calls the giveback/complete functions of the
> > > > > > requests with the proper request status.
> > > > > >
> > > > > > So this will also catch missed requests in the queue. However, since
> > > > > > there might be, let's say, 5 good requests and one missed request, what
> > > > > > will happen is that each complete call for the first good requests will
> > > > > > enqueue new requests into the started list and will also call the
> > > > > > updatecmd on the transfer that was already missed, until the loop
> > > > > > reaches the one request with the MISSED status bit set.
> > > > > >
> > > > > > So in my opinion the patch from Jeff makes sense when adding the
> > > > > > following change as well. With both changes the underruns and
> > > > > > broken frames finally disappear. I am still unsure whether this is the
> > > > > > complete solution, since with this the mentioned 5 good requests
> > > > > > will be cancelled as well. So this is still WIP.
> > > > >
> > > >
> > > > When the dwc3 driver issues stop_active_transfer(), that means that the
> > > > started_list is empty and there is an underrun.
> > >
> > > > At the moment this is only the case when both the pending and started
> > > > lists are empty, or when the interrupt event was EXDEV.
> > > >
> > > > The main problem is that the function
> > > > dwc3_gadget_ep_cleanup_completed_requests(dep, event, status); will
> > > > issue a complete for each started request, which in turn will
> > > > refill the pending list, and therefore after that refill the
> > > > stop_active_transfer is currently never hit.
> > >
> > > > > It treats the incoming requests as stale. However, for UVC, they are
> > > > > still "good".
> > >
> > > Right, so in that case we can requeue them anyway. But this will have to
> > > be done after the stop transfer cmd has finished.
> > >
> > > > I think you can just check if the started_list is empty before queuing
> > > > new requests. If it is, perform stop_active_transfer() to reschedule the
> > > > > incoming requests. None of the newly queued requests will be released
> > > > > yet since they are in the pending_list.
> > >
> > > > So that is basically exactly what my patch is doing. However, in the case
> > > > of an underrun it is not safe to call dwc3_gadget_ep_cleanup_completed_requests,
> > > > as Jeff stated. So his underlying patch is really fixing an issue here.
> >
> > > What I mean is to actively check the started list on every
> > > usb_ep_queue() call. Checking during
> > > dwc3_gadget_ep_cleanup_completed_requests() is already too late.
>
> I see.
>
> > >
> > > > For UVC, perhaps you can introduce a new flag to usb_request called
> > > > "ignore_queue_latency" or something equivalent. The dwc3 is already
> > > > partially doing this for UVC. With this new flag, we can rework dwc3 to
> > > > clearly separate the expected behavior from the function driver.
> > >
> > > I don't know why this "extra" flag is even necessary. The code example
> > > is already working without that extra flag.
> >
> > > The flag is for the controller to determine what kind of behavior the
> > > function driver expects. My intention is that if this extra flag is not set,
> > > the dwc3 driver will not attempt to reschedule isoc requests at all (i.e.
> > > no stop_active_transfer()).
>
> Ok.
>
> > >
> > > > Actually I even came up with a better solution. In addition to checking if
> > > > one of the requests in the started list was missed, we can actively check if
> > > > the trb ring ran dry and if dwc3_gadget_endpoint_trbs_complete is
> > > > going to enqueue into the empty trb ring.
> > > >
> > > > So my whole change looks like this:
> > >
> > > diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
> > > index efe6caf4d0e87..2c8047dcd1612 100644
> > > --- a/drivers/usb/dwc3/core.h
> > > +++ b/drivers/usb/dwc3/core.h
> > > @@ -952,6 +952,7 @@ struct dwc3_request {
> > > #define DWC3_REQUEST_STATUS_DEQUEUED 3
> > > #define DWC3_REQUEST_STATUS_STALLED 4
> > > #define DWC3_REQUEST_STATUS_COMPLETED 5
> > > +#define DWC3_REQUEST_STATUS_MISSED_ISOC 6
> > > #define DWC3_REQUEST_STATUS_UNKNOWN -1
> > > u8 epnum;
> > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > index 858fe4c299b7a..a31f4d3502bd3 100644
> > > --- a/drivers/usb/dwc3/gadget.c
> > > +++ b/drivers/usb/dwc3/gadget.c
> > > @@ -2057,6 +2057,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
> > > req = next_request(&dep->cancelled_list);
> > > dwc3_gadget_ep_skip_trbs(dep, req);
> > > switch (req->status) {
> > > + case 0:
> > > + dwc3_gadget_giveback(dep, req, 0);
> > > + break;
> > > case DWC3_REQUEST_STATUS_DISCONNECTED:
> > > dwc3_gadget_giveback(dep, req, -ESHUTDOWN);
> > > break;
> > > @@ -2066,6 +2069,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
> > > case DWC3_REQUEST_STATUS_STALLED:
> > > dwc3_gadget_giveback(dep, req, -EPIPE);
> > > break;
> > > + case DWC3_REQUEST_STATUS_MISSED_ISOC:
> > > + dwc3_gadget_giveback(dep, req, -EXDEV);
> > > + break;
> > > default:
> > > dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
> > > dwc3_gadget_giveback(dep, req, -ECONNRESET);
> > > @@ -3509,6 +3515,36 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,
> > > return ret;
> > > }
> > > +static int dwc3_gadget_ep_check_missed_requests(struct dwc3_ep *dep)
> > > +{
> > > + struct dwc3_request *req;
> > > + struct dwc3_request *tmp;
> > > + int ret = 0;
> > > +
> > > + list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
> > > + struct dwc3_trb *trb;
> > > +
> > > + trb = req->trb;
> > > + switch (DWC3_TRB_SIZE_TRBSTS(trb->size)) {
> > > + case DWC3_TRBSTS_MISSED_ISOC:
> > > + /* Isoc endpoint only */
> > > + ret = -EXDEV;
> > > + break;
> > > + case DWC3_TRB_STS_XFER_IN_PROG:
> > > + /* Applicable when End Transfer with ForceRM=0 */
> > > + case DWC3_TRBSTS_SETUP_PENDING:
> > > + /* Control endpoint only */
> > > + case DWC3_TRBSTS_OK:
> > > + default:
> > > + ret = 0;
> > > + break;
> > > + }
> > > + }
> > > +
> > > + return ret;
> > > +}
> > > +
> > > static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
> > > const struct dwc3_event_depevt *event, int status)
> > > {
> > > @@ -3565,22 +3601,51 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
> > > {
> > > struct dwc3 *dwc = dep->dwc;
> > > bool no_started_trb = true;
> > > + unsigned int transfer_in_flight = 0;
> > > +
> > > > + /* It is possible that the interrupt thread was delayed by
> > > > + * scheduling in the system, and therefore the HW has already
> > > > + * run dry. In that case the last trb in the queue is already
> > > > + * handled by the hw. By checking the HWO bit we know to restart
> > > > + * the whole transfer. The condition is more likely to appear
> > > > + * if not every trb has the IOC bit set and the interrupt thread
> > > > + * is therefore triggered less often.
> > > > + */
> > > + if (dep->number && usb_endpoint_xfer_isoc(dep->endpoint.desc)) {
> > > + struct dwc3_trb *trb;
> > > - dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
> > > + trb = dwc3_ep_prev_trb(dep, dep->trb_enqueue);
> > > + transfer_in_flight = trb->ctrl & DWC3_TRB_CTRL_HWO;
> > > + }
> > > - if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
> > > - goto out;
> > > + if (status == -EXDEV || !transfer_in_flight) {
> > > + struct dwc3_request *tmp;
> > > + struct dwc3_request *req;
> > > - if (!dep->endpoint.desc)
> > > - return no_started_trb;
> > > + if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
> > > + dwc3_stop_active_transfer(dep, true, true);
> > > - if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > - list_empty(&dep->started_list) &&
> > > - (list_empty(&dep->pending_list) || status == -EXDEV))
>
> @[!!here!!]
>
> > > - dwc3_stop_active_transfer(dep, true, true);
> > > - else if (dwc3_gadget_ep_should_continue(dep))
> > > - if (__dwc3_gadget_kick_transfer(dep) == 0)
> > > - no_started_trb = false;
> > > + list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
> > > + dwc3_gadget_move_cancelled_request(req,
> > > + (DWC3_TRB_SIZE_TRBSTS(req->trb->size) == DWC3_TRBSTS_MISSED_ISOC) ?
> > > + DWC3_REQUEST_STATUS_MISSED_ISOC : 0);
> > > + }
> > > + } else {
> > > + dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
> > > +
> > > + if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
> > > + goto out;
> > > +
> > > + if (!dep->endpoint.desc)
> > > + return no_started_trb;
> > > +
> > > + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > + list_empty(&dep->started_list) && list_empty(&dep->pending_list))
> > > + dwc3_stop_active_transfer(dep, true, true);
> > > + else if (dwc3_gadget_ep_should_continue(dep))
> > > + if (__dwc3_gadget_kick_transfer(dep) == 0)
> > > + no_started_trb = false;
> > > + }
> > > out:
> > > /*
> > >
> > > > I will separate the whole hunk into smaller changes and send a v1
> > > > in the next few days for review.
> > >
>
> I finally sent a v1 of my series.
>
> https://lore.kernel.org/linux-usb/20240307-dwc3-gadget-complete-irq-v1-0-4fe9ac0ba2b7@pengutronix.de/
>
> For the rest of the discussion, I would like to move the conversation to
> the newly sent series.

I saw your pushes. Thanks. I'll review and move the discussion there.

>
> > No, we should not reschedule for every missed-isoc. We only want to
> > target underrun condition.
>
> As you stated above, by "reschedule" do you mean calling
> stop_transfer after a missed transfer was seen?
>
> If so, why is this condition in there already? (@[!!here!!])
>

It's only to reschedule if started_list is empty _and_ if there's either
no pending request or there's a missed isoc. Not for every missed isoc.
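
As a minimal sketch, the condition can be written as a standalone predicate.
The struct ep_state and should_reschedule() names are hypothetical and only
stand in for the list checks on struct dwc3_ep; this is not the driver code:

```c
#include <stdbool.h>
#include <errno.h>

/*
 * Hypothetical, simplified stand-in for the endpoint state.
 * The real driver tests list_empty() on dep->started_list and
 * dep->pending_list instead of these flags.
 */
struct ep_state {
	bool is_isoc;            /* endpoint is isochronous */
	bool started_list_empty; /* no TRBs handed to the hardware */
	bool pending_list_empty; /* no requests waiting to be started */
};

/*
 * Reschedule (i.e. stop the active transfer) only if nothing is started
 * and either nothing is pending or the event reported a missed isoc.
 * A missed isoc alone, with requests still started, does not qualify.
 */
static bool should_reschedule(const struct ep_state *ep, int status)
{
	if (!ep->is_isoc)
		return false;

	if (!ep->started_list_empty)
		return false;

	return ep->pending_list_empty || status == -EXDEV;
}
```

With requests still in the started list, the predicate is false even when
status is -EXDEV, which is the "not for every missed isoc" part.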

BR,
Thinh