2018-01-03 20:33:12

by Kieran Bingham

[permalink] [raw]
Subject: [RFC/RFT PATCH 0/6] Asynchronous UVC

From: Kieran Bingham <[email protected]>

The Linux UVC driver has long provided adequate performance capabilities for
web-cams and low data rate video devices in Linux while resolutions were low.

Modern USB cameras are now capable of high data rates thanks to USB3 with
1080p, and even 4k capture resolutions supported.

Cameras such as the Stereolabs ZED or the Logitech Brio can generate more data
than an embedded ARM core is able to process on a single core, resulting in
frame loss.

A large part of this performance impact is from the requirement to
‘memcpy’ frames out from URB packets to destination frames. This unfortunate
requirement is due to the UVC protocol allowing a variable length header, and
thus it is not possible to provide the target frame buffers directly.

Extra throughput is possible by moving the actual memcpy actions to a work
queue, and moving the memcpy out of interrupt context and allowing work tasks
to be scheduled across multiple cores.

This series has been tested on both the ZED and Brio cameras on arm64
platforms, however due to the intrinsic changes in the driver I would like to
see it tested with other devices and other platforms, so I'd appreciate if
anyone can test this on a range of USB cameras.

Kieran Bingham (6):
uvcvideo: Refactor URB descriptors
uvcvideo: Convert decode functions to use new context structure
uvcvideo: Protect queue internals with helper
uvcvideo: queue: Simplify spin-lock usage
uvcvideo: queue: Support asynchronous buffer handling
uvcvideo: Move decode processing to process context

drivers/media/usb/uvc/uvc_isight.c | 4 +-
drivers/media/usb/uvc/uvc_queue.c | 115 ++++++++++++++----
drivers/media/usb/uvc/uvc_video.c | 191 ++++++++++++++++++++++--------
drivers/media/usb/uvc/uvcvideo.h | 56 +++++++--
4 files changed, 289 insertions(+), 77 deletions(-)

base-commit: 6f0e5fd39143a59c22d60e7befc4f33f22aeed2f
--
git-series 0.9.1


2018-01-03 20:33:17

by Kieran Bingham

[permalink] [raw]
Subject: [RFC/RFT PATCH 1/6] uvcvideo: Refactor URB descriptors

From: Kieran Bingham <[email protected]>

We currently store three separate arrays for each URB reference we hold.

Objectify the data needed to track URBs into a single uvc_urb structure,
allowing better object management and tracking of the URB.

All accesses to the data pointers through stream, are converted to use a
uvc_urb pointer for consistency.

Signed-off-by: Kieran Bingham <[email protected]>
Reviewed-by: Laurent Pinchart <[email protected]>
---
drivers/media/usb/uvc/uvc_video.c | 46 ++++++++++++++++++++------------
drivers/media/usb/uvc/uvcvideo.h | 18 ++++++++++---
2 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
index 73cd44e8bd81..86512f6229dd 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -1357,14 +1357,16 @@ static void uvc_free_urb_buffers(struct uvc_streaming *stream)
unsigned int i;

for (i = 0; i < UVC_URBS; ++i) {
- if (stream->urb_buffer[i]) {
+ struct uvc_urb *uvc_urb = &stream->uvc_urb[i];
+
+ if (uvc_urb->urb_buffer) {
#ifndef CONFIG_DMA_NONCOHERENT
usb_free_coherent(stream->dev->udev, stream->urb_size,
- stream->urb_buffer[i], stream->urb_dma[i]);
+ uvc_urb->urb_buffer, uvc_urb->urb_dma);
#else
- kfree(stream->urb_buffer[i]);
+ kfree(uvc_urb->urb_buffer);
#endif
- stream->urb_buffer[i] = NULL;
+ uvc_urb->urb_buffer = NULL;
}
}

@@ -1402,16 +1404,18 @@ static int uvc_alloc_urb_buffers(struct uvc_streaming *stream,
/* Retry allocations until one succeed. */
for (; npackets > 1; npackets /= 2) {
for (i = 0; i < UVC_URBS; ++i) {
+ struct uvc_urb *uvc_urb = &stream->uvc_urb[i];
+
stream->urb_size = psize * npackets;
#ifndef CONFIG_DMA_NONCOHERENT
- stream->urb_buffer[i] = usb_alloc_coherent(
+ uvc_urb->urb_buffer = usb_alloc_coherent(
stream->dev->udev, stream->urb_size,
- gfp_flags | __GFP_NOWARN, &stream->urb_dma[i]);
+ gfp_flags | __GFP_NOWARN, &uvc_urb->urb_dma);
#else
- stream->urb_buffer[i] =
+ uvc_urb->urb_buffer =
kmalloc(stream->urb_size, gfp_flags | __GFP_NOWARN);
#endif
- if (!stream->urb_buffer[i]) {
+ if (!uvc_urb->urb_buffer) {
uvc_free_urb_buffers(stream);
break;
}
@@ -1441,13 +1445,15 @@ static void uvc_uninit_video(struct uvc_streaming *stream, int free_buffers)
uvc_video_stats_stop(stream);

for (i = 0; i < UVC_URBS; ++i) {
- urb = stream->urb[i];
+ struct uvc_urb *uvc_urb = &stream->uvc_urb[i];
+
+ urb = uvc_urb->urb;
if (urb == NULL)
continue;

usb_kill_urb(urb);
usb_free_urb(urb);
- stream->urb[i] = NULL;
+ uvc_urb->urb = NULL;
}

if (free_buffers)
@@ -1502,6 +1508,8 @@ static int uvc_init_video_isoc(struct uvc_streaming *stream,
size = npackets * psize;

for (i = 0; i < UVC_URBS; ++i) {
+ struct uvc_urb *uvc_urb = &stream->uvc_urb[i];
+
urb = usb_alloc_urb(npackets, gfp_flags);
if (urb == NULL) {
uvc_uninit_video(stream, 1);
@@ -1514,12 +1522,12 @@ static int uvc_init_video_isoc(struct uvc_streaming *stream,
ep->desc.bEndpointAddress);
#ifndef CONFIG_DMA_NONCOHERENT
urb->transfer_flags = URB_ISO_ASAP | URB_NO_TRANSFER_DMA_MAP;
- urb->transfer_dma = stream->urb_dma[i];
+ urb->transfer_dma = uvc_urb->urb_dma;
#else
urb->transfer_flags = URB_ISO_ASAP;
#endif
urb->interval = ep->desc.bInterval;
- urb->transfer_buffer = stream->urb_buffer[i];
+ urb->transfer_buffer = uvc_urb->urb_buffer;
urb->complete = uvc_video_complete;
urb->number_of_packets = npackets;
urb->transfer_buffer_length = size;
@@ -1529,7 +1537,7 @@ static int uvc_init_video_isoc(struct uvc_streaming *stream,
urb->iso_frame_desc[j].length = psize;
}

- stream->urb[i] = urb;
+ uvc_urb->urb = urb;
}

return 0;
@@ -1568,6 +1576,8 @@ static int uvc_init_video_bulk(struct uvc_streaming *stream,
size = 0;

for (i = 0; i < UVC_URBS; ++i) {
+ struct uvc_urb *uvc_urb = &stream->uvc_urb[i];
+
urb = usb_alloc_urb(0, gfp_flags);
if (urb == NULL) {
uvc_uninit_video(stream, 1);
@@ -1575,14 +1585,14 @@ static int uvc_init_video_bulk(struct uvc_streaming *stream,
}

usb_fill_bulk_urb(urb, stream->dev->udev, pipe,
- stream->urb_buffer[i], size, uvc_video_complete,
+ uvc_urb->urb_buffer, size, uvc_video_complete,
stream);
#ifndef CONFIG_DMA_NONCOHERENT
urb->transfer_flags = URB_NO_TRANSFER_DMA_MAP;
- urb->transfer_dma = stream->urb_dma[i];
+ urb->transfer_dma = uvc_urb->urb_dma;
#endif

- stream->urb[i] = urb;
+ uvc_urb->urb = urb;
}

return 0;
@@ -1673,7 +1683,9 @@ static int uvc_init_video(struct uvc_streaming *stream, gfp_t gfp_flags)

/* Submit the URBs. */
for (i = 0; i < UVC_URBS; ++i) {
- ret = usb_submit_urb(stream->urb[i], gfp_flags);
+ struct uvc_urb *uvc_urb = &stream->uvc_urb[i];
+
+ ret = usb_submit_urb(uvc_urb->urb, gfp_flags);
if (ret < 0) {
uvc_printk(KERN_ERR, "Failed to submit URB %u "
"(%d).\n", i, ret);
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index 19e725e2bda5..4afa8ce13ea7 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -479,6 +479,20 @@ struct uvc_stats_stream {
unsigned int max_sof; /* Maximum STC.SOF value */
};

+/**
+ * struct uvc_urb - URB context management structure
+ *
+ * @urb: described URB. Must be allocated with usb_alloc_urb()
+ * @urb_buffer: memory storage for the URB
+ * @urb_dma: DMA coherent addressing for the urb_buffer
+ */
+struct uvc_urb {
+ struct urb *urb;
+
+ char *urb_buffer;
+ dma_addr_t urb_dma;
+};
+
struct uvc_streaming {
struct list_head list;
struct uvc_device *dev;
@@ -521,9 +535,7 @@ struct uvc_streaming {
__u32 max_payload_size;
} bulk;

- struct urb *urb[UVC_URBS];
- char *urb_buffer[UVC_URBS];
- dma_addr_t urb_dma[UVC_URBS];
+ struct uvc_urb uvc_urb[UVC_URBS];
unsigned int urb_size;

__u32 sequence;
--
git-series 0.9.1

2018-01-03 20:33:26

by Kieran Bingham

[permalink] [raw]
Subject: [RFC/RFT PATCH 2/6] uvcvideo: Convert decode functions to use new context structure

From: Kieran Bingham <[email protected]>

The URB completion handlers currently reference the stream context.

Now that each URB has its own context structure, convert the decode (and
one encode) functions to utilise this context for URB management.

Signed-off-by: Kieran Bingham <[email protected]>
Reviewed-by: Laurent Pinchart <[email protected]>
---
drivers/media/usb/uvc/uvc_isight.c | 4 +++-
drivers/media/usb/uvc/uvc_video.c | 30 ++++++++++++++++++++----------
drivers/media/usb/uvc/uvcvideo.h | 8 ++++----
3 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_isight.c b/drivers/media/usb/uvc/uvc_isight.c
index 8510e7259e76..433b8b4f96e2 100644
--- a/drivers/media/usb/uvc/uvc_isight.c
+++ b/drivers/media/usb/uvc/uvc_isight.c
@@ -99,9 +99,11 @@ static int isight_decode(struct uvc_video_queue *queue, struct uvc_buffer *buf,
return 0;
}

-void uvc_video_decode_isight(struct urb *urb, struct uvc_streaming *stream,
+void uvc_video_decode_isight(struct uvc_urb *uvc_urb,
struct uvc_buffer *buf)
{
+ struct urb *urb = uvc_urb->urb;
+ struct uvc_streaming *stream = uvc_urb->stream;
int ret, i;

for (i = 0; i < urb->number_of_packets; ++i) {
diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
index 86512f6229dd..17a40c9a1fa3 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -1153,9 +1153,11 @@ static void uvc_video_validate_buffer(const struct uvc_streaming *stream,
/*
* Completion handler for video URBs.
*/
-static void uvc_video_decode_isoc(struct urb *urb, struct uvc_streaming *stream,
- struct uvc_buffer *buf)
+static void uvc_video_decode_isoc(struct uvc_urb *uvc_urb,
+ struct uvc_buffer *buf)
{
+ struct urb *urb = uvc_urb->urb;
+ struct uvc_streaming *stream = uvc_urb->stream;
u8 *mem;
int ret, i;

@@ -1199,9 +1201,11 @@ static void uvc_video_decode_isoc(struct urb *urb, struct uvc_streaming *stream,
}
}

-static void uvc_video_decode_bulk(struct urb *urb, struct uvc_streaming *stream,
- struct uvc_buffer *buf)
+static void uvc_video_decode_bulk(struct uvc_urb *uvc_urb,
+ struct uvc_buffer *buf)
{
+ struct urb *urb = uvc_urb->urb;
+ struct uvc_streaming *stream = uvc_urb->stream;
u8 *mem;
int len, ret;

@@ -1266,9 +1270,12 @@ static void uvc_video_decode_bulk(struct urb *urb, struct uvc_streaming *stream,
}
}

-static void uvc_video_encode_bulk(struct urb *urb, struct uvc_streaming *stream,
- struct uvc_buffer *buf)
+static void uvc_video_encode_bulk(struct uvc_urb *uvc_urb,
+ struct uvc_buffer *buf)
{
+ struct urb *urb = uvc_urb->urb;
+ struct uvc_streaming *stream = uvc_urb->stream;
+
u8 *mem = urb->transfer_buffer;
int len = stream->urb_size, ret;

@@ -1311,7 +1318,8 @@ static void uvc_video_encode_bulk(struct urb *urb, struct uvc_streaming *stream,

static void uvc_video_complete(struct urb *urb)
{
- struct uvc_streaming *stream = urb->context;
+ struct uvc_urb *uvc_urb = urb->context;
+ struct uvc_streaming *stream = uvc_urb->stream;
struct uvc_video_queue *queue = &stream->queue;
struct uvc_buffer *buf = NULL;
unsigned long flags;
@@ -1341,7 +1349,7 @@ static void uvc_video_complete(struct urb *urb)
queue);
spin_unlock_irqrestore(&queue->irqlock, flags);

- stream->decode(urb, stream, buf);
+ stream->decode(uvc_urb, buf);

if ((ret = usb_submit_urb(urb, GFP_ATOMIC)) < 0) {
uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n",
@@ -1419,6 +1427,8 @@ static int uvc_alloc_urb_buffers(struct uvc_streaming *stream,
uvc_free_urb_buffers(stream);
break;
}
+
+ uvc_urb->stream = stream;
}

if (i == UVC_URBS) {
@@ -1517,7 +1527,7 @@ static int uvc_init_video_isoc(struct uvc_streaming *stream,
}

urb->dev = stream->dev->udev;
- urb->context = stream;
+ urb->context = uvc_urb;
urb->pipe = usb_rcvisocpipe(stream->dev->udev,
ep->desc.bEndpointAddress);
#ifndef CONFIG_DMA_NONCOHERENT
@@ -1586,7 +1596,7 @@ static int uvc_init_video_bulk(struct uvc_streaming *stream,

usb_fill_bulk_urb(urb, stream->dev->udev, pipe,
uvc_urb->urb_buffer, size, uvc_video_complete,
- stream);
+ uvc_urb);
#ifndef CONFIG_DMA_NONCOHERENT
urb->transfer_flags = URB_NO_TRANSFER_DMA_MAP;
urb->transfer_dma = uvc_urb->urb_dma;
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index 4afa8ce13ea7..e4bd3d68a273 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -483,11 +483,13 @@ struct uvc_stats_stream {
* struct uvc_urb - URB context management structure
*
* @urb: described URB. Must be allocated with usb_alloc_urb()
+ * @stream: UVC streaming context
* @urb_buffer: memory storage for the URB
* @urb_dma: DMA coherent addressing for the urb_buffer
*/
struct uvc_urb {
struct urb *urb;
+ struct uvc_streaming *stream;

char *urb_buffer;
dma_addr_t urb_dma;
@@ -523,8 +525,7 @@ struct uvc_streaming {
/* Buffers queue. */
unsigned int frozen : 1;
struct uvc_video_queue queue;
- void (*decode) (struct urb *urb, struct uvc_streaming *video,
- struct uvc_buffer *buf);
+ void (*decode) (struct uvc_urb *uvc_urb, struct uvc_buffer *buf);

/* Context data used by the bulk completion handler. */
struct {
@@ -780,8 +781,7 @@ extern struct usb_host_endpoint *uvc_find_endpoint(
struct usb_host_interface *alts, __u8 epaddr);

/* Quirks support */
-void uvc_video_decode_isight(struct urb *urb, struct uvc_streaming *stream,
- struct uvc_buffer *buf);
+void uvc_video_decode_isight(struct uvc_urb *uvc_urb, struct uvc_buffer *buf);

/* debugfs and statistics */
void uvc_debugfs_init(void);
--
git-series 0.9.1

2018-01-03 20:33:36

by Kieran Bingham

[permalink] [raw]
Subject: [RFC/RFT PATCH 3/6] uvcvideo: Protect queue internals with helper

From: Kieran Bingham <[email protected]>

The URB completion operation obtains the current buffer by reading
directly into the queue internal interface.

Protect this queue abstraction by providing a helper
uvc_queue_get_current_buffer() which can be used by both the decode
task, and the uvc_queue_next_buffer() functions.

Signed-off-by: Kieran Bingham <[email protected]>
Reviewed-by: Laurent Pinchart <[email protected]>
---
drivers/media/usb/uvc/uvc_queue.c | 34 +++++++++++++++++++++++++++-----
drivers/media/usb/uvc/uvc_video.c | 7 +------
drivers/media/usb/uvc/uvcvideo.h | 2 ++-
3 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_queue.c b/drivers/media/usb/uvc/uvc_queue.c
index c8d78b2f3de4..0711e3d9ff76 100644
--- a/drivers/media/usb/uvc/uvc_queue.c
+++ b/drivers/media/usb/uvc/uvc_queue.c
@@ -399,6 +399,34 @@ void uvc_queue_cancel(struct uvc_video_queue *queue, int disconnect)
spin_unlock_irqrestore(&queue->irqlock, flags);
}

+/*
+ * uvc_queue_get_current_buffer: Obtain the current working output buffer
+ *
+ * Buffers may span multiple packets, and even URBs, therefore the active buffer
+ * remains on the queue until the EOF marker.
+ */
+static struct uvc_buffer *
+__uvc_queue_get_current_buffer(struct uvc_video_queue *queue)
+{
+ if (!list_empty(&queue->irqqueue))
+ return list_first_entry(&queue->irqqueue, struct uvc_buffer,
+ queue);
+ else
+ return NULL;
+}
+
+struct uvc_buffer *uvc_queue_get_current_buffer(struct uvc_video_queue *queue)
+{
+ struct uvc_buffer *nextbuf;
+ unsigned long flags;
+
+ spin_lock_irqsave(&queue->irqlock, flags);
+ nextbuf = __uvc_queue_get_current_buffer(queue);
+ spin_unlock_irqrestore(&queue->irqlock, flags);
+
+ return nextbuf;
+}
+
struct uvc_buffer *uvc_queue_next_buffer(struct uvc_video_queue *queue,
struct uvc_buffer *buf)
{
@@ -415,11 +443,7 @@ struct uvc_buffer *uvc_queue_next_buffer(struct uvc_video_queue *queue,

spin_lock_irqsave(&queue->irqlock, flags);
list_del(&buf->queue);
- if (!list_empty(&queue->irqqueue))
- nextbuf = list_first_entry(&queue->irqqueue, struct uvc_buffer,
- queue);
- else
- nextbuf = NULL;
+ nextbuf = __uvc_queue_get_current_buffer(queue);
spin_unlock_irqrestore(&queue->irqlock, flags);

buf->state = buf->error ? UVC_BUF_STATE_ERROR : UVC_BUF_STATE_DONE;
diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
index 17a40c9a1fa3..045ac655313c 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -1322,7 +1322,6 @@ static void uvc_video_complete(struct urb *urb)
struct uvc_streaming *stream = uvc_urb->stream;
struct uvc_video_queue *queue = &stream->queue;
struct uvc_buffer *buf = NULL;
- unsigned long flags;
int ret;

switch (urb->status) {
@@ -1343,11 +1342,7 @@ static void uvc_video_complete(struct urb *urb)
return;
}

- spin_lock_irqsave(&queue->irqlock, flags);
- if (!list_empty(&queue->irqqueue))
- buf = list_first_entry(&queue->irqqueue, struct uvc_buffer,
- queue);
- spin_unlock_irqrestore(&queue->irqlock, flags);
+ buf = uvc_queue_get_current_buffer(queue);

stream->decode(uvc_urb, buf);

diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index e4bd3d68a273..f274c685087d 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -692,6 +692,8 @@ extern int uvc_queue_streamon(struct uvc_video_queue *queue,
extern int uvc_queue_streamoff(struct uvc_video_queue *queue,
enum v4l2_buf_type type);
extern void uvc_queue_cancel(struct uvc_video_queue *queue, int disconnect);
+extern struct uvc_buffer *
+ uvc_queue_get_current_buffer(struct uvc_video_queue *queue);
extern struct uvc_buffer *uvc_queue_next_buffer(struct uvc_video_queue *queue,
struct uvc_buffer *buf);
extern int uvc_queue_mmap(struct uvc_video_queue *queue,
--
git-series 0.9.1

2018-01-03 20:33:48

by Kieran Bingham

[permalink] [raw]
Subject: [RFC/RFT PATCH 5/6] uvcvideo: queue: Support asynchronous buffer handling

From: Kieran Bingham <[email protected]>

The buffer queue interface currently operates sequentially, processing
buffers after they have fully completed.

In preparation for supporting parallel tasks operating on the buffers,
we will need to support buffers being processed on multiple CPUs.

Adapt the uvc_queue_next_buffer() such that a reference count tracks the
active use of the buffer, returning the buffer to the VB2 stack at
completion.

Signed-off-by: Kieran Bingham <[email protected]>
---
drivers/media/usb/uvc/uvc_queue.c | 61 ++++++++++++++++++++++++++------
drivers/media/usb/uvc/uvcvideo.h | 4 ++-
2 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_queue.c b/drivers/media/usb/uvc/uvc_queue.c
index 28dbdf8bb533..204dd91a8526 100644
--- a/drivers/media/usb/uvc/uvc_queue.c
+++ b/drivers/media/usb/uvc/uvc_queue.c
@@ -131,6 +131,7 @@ static void uvc_buffer_queue(struct vb2_buffer *vb)

spin_lock_irqsave(&queue->irqlock, flags);
if (likely(!(queue->flags & UVC_QUEUE_DISCONNECTED))) {
+ kref_init(&buf->ref);
list_add_tail(&buf->queue, &queue->irqqueue);
} else {
/* If the device is disconnected return the buffer to userspace
@@ -425,28 +426,66 @@ struct uvc_buffer *uvc_queue_get_current_buffer(struct uvc_video_queue *queue)
return nextbuf;
}

-struct uvc_buffer *uvc_queue_next_buffer(struct uvc_video_queue *queue,
+/*
+ * uvc_queue_requeue: Requeue a buffer on our internal irqqueue
+ *
+ * Reuse a buffer through our internal queue without the need to 'prepare'
+ * The buffer will be returned to userspace through the uvc_buffer_queue call if
+ * the device has been disconnected
+ */
+static void uvc_queue_requeue(struct uvc_video_queue *queue,
struct uvc_buffer *buf)
{
- struct uvc_buffer *nextbuf;
- unsigned long flags;
+ buf->error = 0;
+ buf->state = UVC_BUF_STATE_QUEUED;
+ buf->bytesused = 0;
+ vb2_set_plane_payload(&buf->buf.vb2_buf, 0, 0);
+
+ uvc_buffer_queue(&buf->buf.vb2_buf);
+}
+
+static void uvc_queue_buffer_complete(struct kref *ref)
+{
+ struct uvc_buffer *buf = container_of(ref, struct uvc_buffer, ref);
+ struct vb2_buffer *vb = &buf->buf.vb2_buf;
+ struct uvc_video_queue *queue = vb2_get_drv_priv(vb->vb2_queue);

if ((queue->flags & UVC_QUEUE_DROP_CORRUPTED) && buf->error) {
- buf->error = 0;
- buf->state = UVC_BUF_STATE_QUEUED;
- buf->bytesused = 0;
- vb2_set_plane_payload(&buf->buf.vb2_buf, 0, 0);
- return buf;
+ uvc_queue_requeue(queue, buf);
+ return;
}

+ buf->state = buf->error ? UVC_BUF_STATE_ERROR : UVC_BUF_STATE_DONE;
+ vb2_set_plane_payload(&buf->buf.vb2_buf, 0, buf->bytesused);
+ vb2_buffer_done(&buf->buf.vb2_buf, VB2_BUF_STATE_DONE);
+}
+
+/*
+ * Release a reference on the buffer. Complete the buffer when the last
+ * reference is released
+ */
+void uvc_queue_buffer_release(struct uvc_buffer *buf)
+{
+ kref_put(&buf->ref, uvc_queue_buffer_complete);
+}
+
+/*
+ * Remove this buffer from the queue. Lifetime will persist while async actions
+ * are still running (if any), and uvc_queue_buffer_release will give the buffer
+ * back to VB2 when all users have completed.
+ */
+struct uvc_buffer *uvc_queue_next_buffer(struct uvc_video_queue *queue,
+ struct uvc_buffer *buf)
+{
+ struct uvc_buffer *nextbuf;
+ unsigned long flags;
+
spin_lock_irqsave(&queue->irqlock, flags);
list_del(&buf->queue);
nextbuf = __uvc_queue_get_current_buffer(queue);
spin_unlock_irqrestore(&queue->irqlock, flags);

- buf->state = buf->error ? UVC_BUF_STATE_ERROR : UVC_BUF_STATE_DONE;
- vb2_set_plane_payload(&buf->buf.vb2_buf, 0, buf->bytesused);
- vb2_buffer_done(&buf->buf.vb2_buf, VB2_BUF_STATE_DONE);
+ uvc_queue_buffer_release(buf);

return nextbuf;
}
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index f274c685087d..2e51bbdf5dac 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -404,6 +404,9 @@ struct uvc_buffer {
unsigned int bytesused;

u32 pts;
+
+ /* asynchronous buffer handling */
+ struct kref ref;
};

#define UVC_QUEUE_DISCONNECTED (1 << 0)
@@ -696,6 +699,7 @@ extern struct uvc_buffer *
uvc_queue_get_current_buffer(struct uvc_video_queue *queue);
extern struct uvc_buffer *uvc_queue_next_buffer(struct uvc_video_queue *queue,
struct uvc_buffer *buf);
+extern void uvc_queue_buffer_release(struct uvc_buffer *buf);
extern int uvc_queue_mmap(struct uvc_video_queue *queue,
struct vm_area_struct *vma);
extern unsigned int uvc_queue_poll(struct uvc_video_queue *queue,
--
git-series 0.9.1

2018-01-03 20:33:52

by Kieran Bingham

[permalink] [raw]
Subject: [RFC/RFT PATCH 6/6] uvcvideo: Move decode processing to process context

From: Kieran Bingham <[email protected]>

Newer high definition cameras, and cameras with multiple lenses such as
the range of stereovision cameras now available have ever increasing
data rates.

The inclusion of a variable length packet header in URB packets mean
that we must memcpy the frame data out to our destination 'manually'.
This can result in data rates of up to 2 gigabits per second being
processed.

To improve efficiency, and maximise throughput, handle the URB decode
processing through a work queue to move it from interrupt context, and
allow multiple processors to work on URBs in parallel.

Signed-off-by: Kieran Bingham <[email protected]>
---
drivers/media/usb/uvc/uvc_queue.c | 12 +++-
drivers/media/usb/uvc/uvc_video.c | 114 ++++++++++++++++++++++++++-----
drivers/media/usb/uvc/uvcvideo.h | 24 +++++++-
3 files changed, 132 insertions(+), 18 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_queue.c b/drivers/media/usb/uvc/uvc_queue.c
index 204dd91a8526..07fcbfc132c9 100644
--- a/drivers/media/usb/uvc/uvc_queue.c
+++ b/drivers/media/usb/uvc/uvc_queue.c
@@ -179,10 +179,22 @@ static void uvc_stop_streaming(struct vb2_queue *vq)
struct uvc_video_queue *queue = vb2_get_drv_priv(vq);
struct uvc_streaming *stream = uvc_queue_to_stream(queue);

+ /* Prevent new buffers coming in. */
+ spin_lock_irq(&queue->irqlock);
+ queue->flags |= UVC_QUEUE_STOPPING;
+ spin_unlock_irq(&queue->irqlock);
+
+ /*
+ * All pending work should be completed before disabling the stream, as
+ * all URBs will be free'd during uvc_video_enable(s, 0).
+ */
+ flush_workqueue(stream->async_wq);
+
uvc_video_enable(stream, 0);

spin_lock_irq(&queue->irqlock);
uvc_queue_return_buffers(queue, UVC_BUF_STATE_ERROR);
+ queue->flags &= ~UVC_QUEUE_STOPPING;
spin_unlock_irq(&queue->irqlock);
}

diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
index 045ac655313c..b7b32a6bc2dc 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -1058,21 +1058,70 @@ static int uvc_video_decode_start(struct uvc_streaming *stream,
return data[0];
}

-static void uvc_video_decode_data(struct uvc_streaming *stream,
- struct uvc_buffer *buf, const __u8 *data, int len)
+/*
+ * uvc_video_decode_data_work: Asynchronous memcpy processing
+ *
+ * Perform memcpy tasks in process context, with completion handlers
+ * to return the URB, and buffer handles.
+ *
+ * The work submitter must pre-determine that the work is safe
+ */
+static void uvc_video_decode_data_work(struct work_struct *work)
{
- unsigned int maxlen, nbytes;
- void *mem;
+ struct uvc_urb *uvc_urb = container_of(work, struct uvc_urb, work);
+ struct uvc_streaming *stream = uvc_urb->stream;
+ struct uvc_video_queue *queue = &stream->queue;
+ unsigned int i;
+ bool stopping;
+ int ret;
+
+ for (i = 0; i < uvc_urb->packets; i++) {
+ struct uvc_decode_op *op = &uvc_urb->decodes[i];
+
+ memcpy(op->dst, op->src, op->len);
+
+ /* Release reference taken on this buffer */
+ uvc_queue_buffer_release(op->buf);
+ }
+
+ /*
+ * Prevent resubmitting URBs when shutting down to ensure that no new
+ * work item will be scheduled after uvc_stop_streaming() flushes the
+ * work queue.
+ */
+ spin_lock_irq(&queue->irqlock);
+ stopping = queue->flags & UVC_QUEUE_STOPPING;
+ spin_unlock_irq(&queue->irqlock);
+
+ if (stopping)
+ return;
+
+ ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC);
+ if (ret < 0)
+ uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n",
+ ret);
+}
+
+static void uvc_video_decode_data(struct uvc_decode_op *decode,
+ struct uvc_urb *uvc_urb, struct uvc_buffer *buf,
+ const __u8 *data, int len)
+{
+ unsigned int maxlen;

if (len <= 0)
return;

- /* Copy the video data to the buffer. */
maxlen = buf->length - buf->bytesused;
- mem = buf->mem + buf->bytesused;
- nbytes = min((unsigned int)len, maxlen);
- memcpy(mem, data, nbytes);
- buf->bytesused += nbytes;
+
+ /* Take a buffer reference for async work */
+ kref_get(&buf->ref);
+
+ decode->buf = buf;
+ decode->src = data;
+ decode->dst = buf->mem + buf->bytesused;
+ decode->len = min_t(unsigned int, len, maxlen);
+
+ buf->bytesused += decode->len;

/* Complete the current frame if the buffer size was exceeded. */
if (len > maxlen) {
@@ -1080,6 +1129,8 @@ static void uvc_video_decode_data(struct uvc_streaming *stream,
buf->error = 1;
buf->state = UVC_BUF_STATE_READY;
}
+
+ uvc_urb->packets++;
}

static void uvc_video_decode_end(struct uvc_streaming *stream,
@@ -1162,6 +1213,8 @@ static void uvc_video_decode_isoc(struct uvc_urb *uvc_urb,
int ret, i;

for (i = 0; i < urb->number_of_packets; ++i) {
+ struct uvc_decode_op *op = &uvc_urb->decodes[uvc_urb->packets];
+
if (urb->iso_frame_desc[i].status < 0) {
uvc_trace(UVC_TRACE_FRAME, "USB isochronous frame "
"lost (%d).\n", urb->iso_frame_desc[i].status);
@@ -1187,7 +1240,7 @@ static void uvc_video_decode_isoc(struct uvc_urb *uvc_urb,
continue;

/* Decode the payload data. */
- uvc_video_decode_data(stream, buf, mem + ret,
+ uvc_video_decode_data(op, uvc_urb, buf, mem + ret,
urb->iso_frame_desc[i].actual_length - ret);

/* Process the header again. */
@@ -1248,9 +1301,12 @@ static void uvc_video_decode_bulk(struct uvc_urb *uvc_urb,
* sure buf is never dereferenced if NULL.
*/

- /* Process video data. */
- if (!stream->bulk.skip_payload && buf != NULL)
- uvc_video_decode_data(stream, buf, mem, len);
+ /* Prepare video data for processing. */
+ if (!stream->bulk.skip_payload && buf != NULL) {
+ struct uvc_decode_op *op = &uvc_urb->decodes[0];
+
+ uvc_video_decode_data(op, uvc_urb, buf, mem, len);
+ }

/* Detect the payload end by a URB smaller than the maximum size (or
* a payload size equal to the maximum) and process the header again.
@@ -1322,7 +1378,8 @@ static void uvc_video_complete(struct urb *urb)
struct uvc_streaming *stream = uvc_urb->stream;
struct uvc_video_queue *queue = &stream->queue;
struct uvc_buffer *buf = NULL;
- int ret;
+ unsigned long flags;
+ bool stopping;

switch (urb->status) {
case 0:
@@ -1342,14 +1399,30 @@ static void uvc_video_complete(struct urb *urb)
return;
}

+ /*
+ * Simply accept and discard completed URBs without processing when the
+ * stream is being shutdown. URBs will be freed as part of the
+ * uvc_video_enable(s, 0) action, so we must not queue asynchronous
+ * work based upon them.
+ */
+ spin_lock_irqsave(&queue->irqlock, flags);
+ stopping = queue->flags & UVC_QUEUE_STOPPING;
+ spin_unlock_irqrestore(&queue->irqlock, flags);
+
+ if (stopping)
+ return;
+
buf = uvc_queue_get_current_buffer(queue);

+ /* Re-initialise the URB packet work */
+ uvc_urb->packets = 0;
+
+ /* Process the URB headers, but work is deferred to a work queue */
stream->decode(uvc_urb, buf);

- if ((ret = usb_submit_urb(urb, GFP_ATOMIC)) < 0) {
- uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n",
- ret);
- }
+ /* Handle any heavy lifting required */
+ INIT_WORK(&uvc_urb->work, uvc_video_decode_data_work);
+ queue_work(stream->async_wq, &uvc_urb->work);
}

/*
@@ -1621,6 +1694,11 @@ static int uvc_init_video(struct uvc_streaming *stream, gfp_t gfp_flags)

uvc_video_stats_start(stream);

+ stream->async_wq = alloc_workqueue("uvcvideo", WQ_UNBOUND | WQ_HIGHPRI,
+ 0);
+ if (!stream->async_wq)
+ return -ENOMEM;
+
if (intf->num_altsetting > 1) {
struct usb_host_endpoint *best_ep = NULL;
unsigned int best_psize = UINT_MAX;
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index 2e51bbdf5dac..1f7399cb66e1 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -411,6 +411,7 @@ struct uvc_buffer {

#define UVC_QUEUE_DISCONNECTED (1 << 0)
#define UVC_QUEUE_DROP_CORRUPTED (1 << 1)
+#define UVC_QUEUE_STOPPING (1 << 2)

struct uvc_video_queue {
struct vb2_queue queue;
@@ -483,12 +484,30 @@ struct uvc_stats_stream {
};

/**
+ * struct uvc_decode_op: Context structure to schedule asynchronous memcpy
+ *
+ * @buf: active buf object for this decode
+ * @dst: copy destination address
+ * @src: copy source address
+ * @len: copy length
+ */
+struct uvc_decode_op {
+ struct uvc_buffer *buf;
+ void *dst;
+ const __u8 *src;
+ int len;
+};
+
+/**
* struct uvc_urb - URB context management structure
*
* @urb: described URB. Must be allocated with usb_alloc_urb()
* @stream: UVC streaming context
* @urb_buffer: memory storage for the URB
* @urb_dma: DMA coherent addressing for the urb_buffer
+ * @packets: counter to indicate the number of copy operations
+ * @decodes: work descriptors for asynchronous copy operations
+ * @work: work queue entry for asynchronous decode
*/
struct uvc_urb {
struct urb *urb;
@@ -496,6 +515,10 @@ struct uvc_urb {

char *urb_buffer;
dma_addr_t urb_dma;
+
+ unsigned int packets;
+ struct uvc_decode_op decodes[UVC_MAX_PACKETS];
+ struct work_struct work;
};

struct uvc_streaming {
@@ -528,6 +551,7 @@ struct uvc_streaming {
/* Buffers queue. */
unsigned int frozen : 1;
struct uvc_video_queue queue;
+ struct workqueue_struct *async_wq;
void (*decode) (struct uvc_urb *uvc_urb, struct uvc_buffer *buf);

/* Context data used by the bulk completion handler. */
--
git-series 0.9.1

2018-01-03 20:34:30

by Kieran Bingham

[permalink] [raw]
Subject: [RFC/RFT PATCH 4/6] uvcvideo: queue: Simplify spin-lock usage

From: Kieran Bingham <[email protected]>

Both uvc_start_streaming(), and uvc_stop_streaming() are called from
userspace context. As such, they do not need to save the IRQ state, and
can use spin_lock_irq() and spin_unlock_irq() respectively.

Signed-off-by: Kieran Bingham <[email protected]>
---
drivers/media/usb/uvc/uvc_queue.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_queue.c b/drivers/media/usb/uvc/uvc_queue.c
index 0711e3d9ff76..28dbdf8bb533 100644
--- a/drivers/media/usb/uvc/uvc_queue.c
+++ b/drivers/media/usb/uvc/uvc_queue.c
@@ -158,7 +158,6 @@ static int uvc_start_streaming(struct vb2_queue *vq, unsigned int count)
{
struct uvc_video_queue *queue = vb2_get_drv_priv(vq);
struct uvc_streaming *stream = uvc_queue_to_stream(queue);
- unsigned long flags;
int ret;

queue->buf_used = 0;
@@ -167,9 +166,9 @@ static int uvc_start_streaming(struct vb2_queue *vq, unsigned int count)
if (ret == 0)
return 0;

- spin_lock_irqsave(&queue->irqlock, flags);
+ spin_lock_irq(&queue->irqlock);
uvc_queue_return_buffers(queue, UVC_BUF_STATE_QUEUED);
- spin_unlock_irqrestore(&queue->irqlock, flags);
+ spin_unlock_irq(&queue->irqlock);

return ret;
}
@@ -178,13 +177,12 @@ static void uvc_stop_streaming(struct vb2_queue *vq)
{
struct uvc_video_queue *queue = vb2_get_drv_priv(vq);
struct uvc_streaming *stream = uvc_queue_to_stream(queue);
- unsigned long flags;

uvc_video_enable(stream, 0);

- spin_lock_irqsave(&queue->irqlock, flags);
+ spin_lock_irq(&queue->irqlock);
uvc_queue_return_buffers(queue, UVC_BUF_STATE_ERROR);
- spin_unlock_irqrestore(&queue->irqlock, flags);
+ spin_unlock_irq(&queue->irqlock);
}

static const struct vb2_ops uvc_queue_qops = {
--
git-series 0.9.1

2018-01-03 21:13:13

by Troy Kisky

[permalink] [raw]
Subject: Re: [RFC/RFT PATCH 0/6] Asynchronous UVC

On 1/3/2018 12:32 PM, Kieran Bingham wrote:
> From: Kieran Bingham <[email protected]>
>
> The Linux UVC driver has long provided adequate performance capabilities for
> web-cams and low data rate video devices in Linux while resolutions were low.
>
> Modern USB cameras are now capable of high data rates thanks to USB3 with
> 1080p, and even 4k capture resolutions supported.
>
> Cameras such as the Stereolabs ZED or the Logitech Brio can generate more data
> than an embedded ARM core is able to process on a single core, resulting in
> frame loss.
>
> A large part of this performance impact is from the requirement to
> ‘memcpy’ frames out from URB packets to destination frames. This unfortunate
> requirement is due to the UVC protocol allowing a variable length header, and
> thus it is not possible to provide the target frame buffers directly.


I have a rather large patch that does provide frame buffers directly for bulk
cameras. It cannot be used with ISOC cameras. But it is currently for 4.1.
I'll be porting it to 4.9 in a few days if you'd like to see it.

BR
Troy


>
> Extra throughput is possible by moving the actual memcpy actions to a work
> queue, and moving the memcpy out of interrupt context and allowing work tasks
> to be scheduled across multiple cores.
>
> This series has been tested on both the ZED and Brio cameras on arm64
> platforms, however due to the intrinsic changes in the driver I would like to
> see it tested with other devices and other platforms, so I'd appreciate if
> anyone can test this on a range of USB cameras.
>
> Kieran Bingham (6):
> uvcvideo: Refactor URB descriptors
> uvcvideo: Convert decode functions to use new context structure
> uvcvideo: Protect queue internals with helper
> uvcvideo: queue: Simplify spin-lock usage
> uvcvideo: queue: Support asynchronous buffer handling
> uvcvideo: Move decode processing to process context
>
> drivers/media/usb/uvc/uvc_isight.c | 4 +-
> drivers/media/usb/uvc/uvc_queue.c | 115 ++++++++++++++----
> drivers/media/usb/uvc/uvc_video.c | 191 ++++++++++++++++++++++--------
> drivers/media/usb/uvc/uvcvideo.h | 56 +++++++--
> 4 files changed, 289 insertions(+), 77 deletions(-)
>
> base-commit: 6f0e5fd39143a59c22d60e7befc4f33f22aeed2f
>

2018-01-04 10:25:17

by Kieran Bingham

[permalink] [raw]
Subject: Re: [RFC/RFT PATCH 0/6] Asynchronous UVC

Hi Troy,

On 03/01/18 21:13, Troy Kisky wrote:
> On 1/3/2018 12:32 PM, Kieran Bingham wrote:
>> From: Kieran Bingham <[email protected]>
>>
>> The Linux UVC driver has long provided adequate performance capabilities for
>> web-cams and low data rate video devices in Linux while resolutions were low.
>>
>> Modern USB cameras are now capable of high data rates thanks to USB3 with
>> 1080p, and even 4k capture resolutions supported.
>>
>> Cameras such as the Stereolabs ZED or the Logitech Brio can generate more data
>> than an embedded ARM core is able to process on a single core, resulting in
>> frame loss.
>>
>> A large part of this performance impact is from the requirement to
>> ‘memcpy’ frames out from URB packets to destination frames. This unfortunate
>> requirement is due to the UVC protocol allowing a variable length header, and
>> thus it is not possible to provide the target frame buffers directly.
>
> I have a rather large patch that does provide frame buffers directly for bulk
> cameras. It cannot be used with ISOC cameras. But it is currently for 4.1.
> I'll be porting it to 4.9 in a few days if you'd like to see it.

That does indeed sound interesting!

Having direct frame buffers for bulk would certainly be a benefit.
This series could then only be relevant to the ISOC cameras.

Let me know if there's anything I can do to help with porting to mainline, (as
opposed to 4.9) and feel free to send me patches to test and review.

If you're able to test these patches along side yours then that could also be
useful, although I suspect there may be a lot of cross-over - so it will likely
be more effort than just applying the patches :)

--
Regards

Kieran



> BR
> Troy
>
>
>>
>> Extra throughput is possible by moving the actual memcpy actions to a work
>> queue, and moving the memcpy out of interrupt context and allowing work tasks
>> to be scheduled across multiple cores.
>>
>> This series has been tested on both the ZED and Brio cameras on arm64
>> platforms, however due to the intrinsic changes in the driver I would like to
>> see it tested with other devices and other platforms, so I'd appreciate if
>> anyone can test this on a range of USB cameras.
>>
>> Kieran Bingham (6):
>> uvcvideo: Refactor URB descriptors
>> uvcvideo: Convert decode functions to use new context structure
>> uvcvideo: Protect queue internals with helper
>> uvcvideo: queue: Simplify spin-lock usage
>> uvcvideo: queue: Support asynchronous buffer handling
>> uvcvideo: Move decode processing to process context
>>
>> drivers/media/usb/uvc/uvc_isight.c | 4 +-
>> drivers/media/usb/uvc/uvc_queue.c | 115 ++++++++++++++----
>> drivers/media/usb/uvc/uvc_video.c | 191 ++++++++++++++++++++++--------
>> drivers/media/usb/uvc/uvcvideo.h | 56 +++++++--
>> 4 files changed, 289 insertions(+), 77 deletions(-)
>>
>> base-commit: 6f0e5fd39143a59c22d60e7befc4f33f22aeed2f
>>
>

2018-01-04 18:25:15

by Guennadi Liakhovetski

[permalink] [raw]
Subject: Re: [RFC/RFT PATCH 1/6] uvcvideo: Refactor URB descriptors

Hi Kieran,

Just minor suggestions below:

On Wed, 3 Jan 2018, Kieran Bingham wrote:

> From: Kieran Bingham <[email protected]>
>
> We currently store three separate arrays for each URB reference we hold.
>
> Objectify the data needed to track URBs into a single uvc_urb structure,
> allowing better object management and tracking of the URB.
>
> All accesses to the data pointers through stream, are converted to use a
> uvc_urb pointer for consistency.
>
> Signed-off-by: Kieran Bingham <[email protected]>
> Reviewed-by: Laurent Pinchart <[email protected]>
> ---
> drivers/media/usb/uvc/uvc_video.c | 46 ++++++++++++++++++++------------
> drivers/media/usb/uvc/uvcvideo.h | 18 ++++++++++---
> 2 files changed, 44 insertions(+), 20 deletions(-)

[snip]

> diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
> index 19e725e2bda5..4afa8ce13ea7 100644
> --- a/drivers/media/usb/uvc/uvcvideo.h
> +++ b/drivers/media/usb/uvc/uvcvideo.h
> @@ -479,6 +479,20 @@ struct uvc_stats_stream {
> unsigned int max_sof; /* Maximum STC.SOF value */
> };
>
> +/**
> + * struct uvc_urb - URB context management structure
> + *
> + * @urb: described URB. Must be allocated with usb_alloc_urb()

Didn't you mean "describes?"

> + * @urb_buffer: memory storage for the URB
> + * @urb_dma: DMA coherent addressing for the urb_buffer

The whole struct describes URBs, so, I wouldn't repeat that in these two
field names, I'd just call them "buffer" and "dma." OTOH, later you add
more fields like "stream," which aren't per-URB, so, maybe you want to
keep these prefixes.

Thanks
Guennadi

> + */
> +struct uvc_urb {
> + struct urb *urb;
> +
> + char *urb_buffer;
> + dma_addr_t urb_dma;
> +};
> +
> struct uvc_streaming {
> struct list_head list;
> struct uvc_device *dev;
> @@ -521,9 +535,7 @@ struct uvc_streaming {
> __u32 max_payload_size;
> } bulk;
>
> - struct urb *urb[UVC_URBS];
> - char *urb_buffer[UVC_URBS];
> - dma_addr_t urb_dma[UVC_URBS];
> + struct uvc_urb uvc_urb[UVC_URBS];
> unsigned int urb_size;
>
> __u32 sequence;
> --
> git-series 0.9.1
>

2018-01-04 18:26:07

by Guennadi Liakhovetski

[permalink] [raw]
Subject: Re: [RFC/RFT PATCH 3/6] uvcvideo: Protect queue internals with helper

Hi Kieran,

On Wed, 3 Jan 2018, Kieran Bingham wrote:

> From: Kieran Bingham <[email protected]>
>
> The URB completion operation obtains the current buffer by reading
> directly into the queue internal interface.
>
> Protect this queue abstraction by providing a helper
> uvc_queue_get_current_buffer() which can be used by both the decode
> task, and the uvc_queue_next_buffer() functions.
>
> Signed-off-by: Kieran Bingham <[email protected]>
> Reviewed-by: Laurent Pinchart <[email protected]>
> ---
> drivers/media/usb/uvc/uvc_queue.c | 34 +++++++++++++++++++++++++++-----
> drivers/media/usb/uvc/uvc_video.c | 7 +------
> drivers/media/usb/uvc/uvcvideo.h | 2 ++-
> 3 files changed, 32 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/media/usb/uvc/uvc_queue.c b/drivers/media/usb/uvc/uvc_queue.c
> index c8d78b2f3de4..0711e3d9ff76 100644
> --- a/drivers/media/usb/uvc/uvc_queue.c
> +++ b/drivers/media/usb/uvc/uvc_queue.c
> @@ -399,6 +399,34 @@ void uvc_queue_cancel(struct uvc_video_queue *queue, int disconnect)
> spin_unlock_irqrestore(&queue->irqlock, flags);
> }
>
> +/*
> + * uvc_queue_get_current_buffer: Obtain the current working output buffer
> + *
> + * Buffers may span multiple packets, and even URBs, therefore the active buffer
> + * remains on the queue until the EOF marker.
> + */
> +static struct uvc_buffer *
> +__uvc_queue_get_current_buffer(struct uvc_video_queue *queue)
> +{
> + if (!list_empty(&queue->irqqueue))
> + return list_first_entry(&queue->irqqueue, struct uvc_buffer,
> + queue);
> + else
> + return NULL;

I think the preferred style is not to use "else" in such cases. It might
even be prettier to write

if (list_empty(...))
return NULL;

return list_first_entry(...);

Thanks
Guennadi

2018-01-04 18:55:30

by Guennadi Liakhovetski

[permalink] [raw]
Subject: Re: [RFC/RFT PATCH 6/6] uvcvideo: Move decode processing to process context

On Wed, 3 Jan 2018, Kieran Bingham wrote:

> From: Kieran Bingham <[email protected]>
>
> Newer high definition cameras, and cameras with multiple lenses such as
> the range of stereovision cameras now available have ever increasing
> data rates.
>
> The inclusion of a variable length packet header in URB packets mean
> that we must memcpy the frame data out to our destination 'manually'.
> This can result in data rates of up to 2 gigabits per second being
> processed.
>
> To improve efficiency, and maximise throughput, handle the URB decode
> processing through a work queue to move it from interrupt context, and
> allow multiple processors to work on URBs in parallel.
>
> Signed-off-by: Kieran Bingham <[email protected]>
> ---
> drivers/media/usb/uvc/uvc_queue.c | 12 +++-
> drivers/media/usb/uvc/uvc_video.c | 114 ++++++++++++++++++++++++++-----
> drivers/media/usb/uvc/uvcvideo.h | 24 +++++++-
> 3 files changed, 132 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/media/usb/uvc/uvc_queue.c b/drivers/media/usb/uvc/uvc_queue.c
> index 204dd91a8526..07fcbfc132c9 100644
> --- a/drivers/media/usb/uvc/uvc_queue.c
> +++ b/drivers/media/usb/uvc/uvc_queue.c
> @@ -179,10 +179,22 @@ static void uvc_stop_streaming(struct vb2_queue *vq)
> struct uvc_video_queue *queue = vb2_get_drv_priv(vq);
> struct uvc_streaming *stream = uvc_queue_to_stream(queue);
>
> + /* Prevent new buffers coming in. */
> + spin_lock_irq(&queue->irqlock);
> + queue->flags |= UVC_QUEUE_STOPPING;
> + spin_unlock_irq(&queue->irqlock);
> +
> + /*
> + * All pending work should be completed before disabling the stream, as
> + * all URBs will be free'd during uvc_video_enable(s, 0).
> + */
> + flush_workqueue(stream->async_wq);
> +
> uvc_video_enable(stream, 0);
>
> spin_lock_irq(&queue->irqlock);
> uvc_queue_return_buffers(queue, UVC_BUF_STATE_ERROR);
> + queue->flags &= ~UVC_QUEUE_STOPPING;
> spin_unlock_irq(&queue->irqlock);
> }
>
> diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
> index 045ac655313c..b7b32a6bc2dc 100644
> --- a/drivers/media/usb/uvc/uvc_video.c
> +++ b/drivers/media/usb/uvc/uvc_video.c
> @@ -1058,21 +1058,70 @@ static int uvc_video_decode_start(struct uvc_streaming *stream,
> return data[0];
> }
>
> -static void uvc_video_decode_data(struct uvc_streaming *stream,
> - struct uvc_buffer *buf, const __u8 *data, int len)
> +/*
> + * uvc_video_decode_data_work: Asynchronous memcpy processing
> + *
> + * Perform memcpy tasks in process context, with completion handlers
> + * to return the URB, and buffer handles.
> + *
> + * The work submitter must pre-determine that the work is safe
> + */
> +static void uvc_video_decode_data_work(struct work_struct *work)
> {
> - unsigned int maxlen, nbytes;
> - void *mem;
> + struct uvc_urb *uvc_urb = container_of(work, struct uvc_urb, work);
> + struct uvc_streaming *stream = uvc_urb->stream;
> + struct uvc_video_queue *queue = &stream->queue;
> + unsigned int i;
> + bool stopping;
> + int ret;
> +
> + for (i = 0; i < uvc_urb->packets; i++) {
> + struct uvc_decode_op *op = &uvc_urb->decodes[i];
> +
> + memcpy(op->dst, op->src, op->len);
> +
> + /* Release reference taken on this buffer */
> + uvc_queue_buffer_release(op->buf);
> + }
> +
> + /*
> + * Prevent resubmitting URBs when shutting down to ensure that no new
> + * work item will be scheduled after uvc_stop_streaming() flushes the
> + * work queue.
> + */
> + spin_lock_irq(&queue->irqlock);
> + stopping = queue->flags & UVC_QUEUE_STOPPING;
> + spin_unlock_irq(&queue->irqlock);

Are you sure this locking really helps? What if uvc_stop_streaming() runs
here?

Thanks
Guennadi

> +
> + if (stopping)
> + return;
> +
> + ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC);
> + if (ret < 0)
> + uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n",
> + ret);
> +}

2018-01-06 18:29:50

by Kieran Bingham

[permalink] [raw]
Subject: Re: [RFC/RFT PATCH 6/6] uvcvideo: Move decode processing to process context

Hi Guennadi,

Thank you for taking the time to review this series,

On 04/01/18 18:54, Guennadi Liakhovetski wrote:
> On Wed, 3 Jan 2018, Kieran Bingham wrote:
>
>> From: Kieran Bingham <[email protected]>
>>
>> Newer high definition cameras, and cameras with multiple lenses such as
>> the range of stereovision cameras now available have ever increasing
>> data rates.
>>
>> The inclusion of a variable length packet header in URB packets mean
>> that we must memcpy the frame data out to our destination 'manually'.
>> This can result in data rates of up to 2 gigabits per second being
>> processed.
>>
>> To improve efficiency, and maximise throughput, handle the URB decode
>> processing through a work queue to move it from interrupt context, and
>> allow multiple processors to work on URBs in parallel.
>>
>> Signed-off-by: Kieran Bingham <[email protected]>
>> ---
>> drivers/media/usb/uvc/uvc_queue.c | 12 +++-
>> drivers/media/usb/uvc/uvc_video.c | 114 ++++++++++++++++++++++++++-----
>> drivers/media/usb/uvc/uvcvideo.h | 24 +++++++-
>> 3 files changed, 132 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/media/usb/uvc/uvc_queue.c b/drivers/media/usb/uvc/uvc_queue.c
>> index 204dd91a8526..07fcbfc132c9 100644
>> --- a/drivers/media/usb/uvc/uvc_queue.c
>> +++ b/drivers/media/usb/uvc/uvc_queue.c
>> @@ -179,10 +179,22 @@ static void uvc_stop_streaming(struct vb2_queue *vq)
>> struct uvc_video_queue *queue = vb2_get_drv_priv(vq);
>> struct uvc_streaming *stream = uvc_queue_to_stream(queue);
>>
>> + /* Prevent new buffers coming in. */
>> + spin_lock_irq(&queue->irqlock);
>> + queue->flags |= UVC_QUEUE_STOPPING;
>> + spin_unlock_irq(&queue->irqlock);
>> +
>> + /*
>> + * All pending work should be completed before disabling the stream, as
>> + * all URBs will be free'd during uvc_video_enable(s, 0).
>> + */
>> + flush_workqueue(stream->async_wq);
>> +
>> uvc_video_enable(stream, 0);
>>
>> spin_lock_irq(&queue->irqlock);
>> uvc_queue_return_buffers(queue, UVC_BUF_STATE_ERROR);
>> + queue->flags &= ~UVC_QUEUE_STOPPING;
>> spin_unlock_irq(&queue->irqlock);
>> }
>>
>> diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
>> index 045ac655313c..b7b32a6bc2dc 100644
>> --- a/drivers/media/usb/uvc/uvc_video.c
>> +++ b/drivers/media/usb/uvc/uvc_video.c
>> @@ -1058,21 +1058,70 @@ static int uvc_video_decode_start(struct uvc_streaming *stream,
>> return data[0];
>> }
>>
>> -static void uvc_video_decode_data(struct uvc_streaming *stream,
>> - struct uvc_buffer *buf, const __u8 *data, int len)
>> +/*
>> + * uvc_video_decode_data_work: Asynchronous memcpy processing
>> + *
>> + * Perform memcpy tasks in process context, with completion handlers
>> + * to return the URB, and buffer handles.
>> + *
>> + * The work submitter must pre-determine that the work is safe
>> + */
>> +static void uvc_video_decode_data_work(struct work_struct *work)
>> {
>> - unsigned int maxlen, nbytes;
>> - void *mem;
>> + struct uvc_urb *uvc_urb = container_of(work, struct uvc_urb, work);
>> + struct uvc_streaming *stream = uvc_urb->stream;
>> + struct uvc_video_queue *queue = &stream->queue;
>> + unsigned int i;
>> + bool stopping;
>> + int ret;
>> +
>> + for (i = 0; i < uvc_urb->packets; i++) {
>> + struct uvc_decode_op *op = &uvc_urb->decodes[i];
>> +
>> + memcpy(op->dst, op->src, op->len);
>> +
>> + /* Release reference taken on this buffer */
>> + uvc_queue_buffer_release(op->buf);
>> + }
>> +
>> + /*
>> + * Prevent resubmitting URBs when shutting down to ensure that no new
>> + * work item will be scheduled after uvc_stop_streaming() flushes the
>> + * work queue.
>> + */
>> + spin_lock_irq(&queue->irqlock);
>> + stopping = queue->flags & UVC_QUEUE_STOPPING;
>> + spin_unlock_irq(&queue->irqlock);
>
> Are you sure this locking really helps? What if uvc_stop_streaming() runs
> here?

Quite - there is a race there. It protects the flag though ;-)

I thought I had pulled the lock out to only check the flag, to prevent
surrounding the usb_submit_urb(), but now I look again - I see no reason not to
lock for the whole critical section.

I've tested locally on my laptop, and it seems safe to hold the lock during the
usb_submit_urb() (unless anyone sees something I'm missing), so I'll update this
for the next spin to something more like:

spin_lock_irq(&queue->irqlock);
if (!(queue->flags & UVC_QUEUE_STOPPING)) {
ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC);
if (ret < 0)
uvc_printk(KERN_ERR,
"Failed to resubmit video URB (%d).\n",
ret);
}
spin_unlock_irq(&queue->irqlock);


> Thanks
> Guennadi
>
>> +
>> + if (stopping)
>> + return;
>> +
>> + ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC);
>> + if (ret < 0)
>> + uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n",
>> + ret);
>> +}


Regards
--
Kieran

2018-01-06 18:30:48

by Kieran Bingham

[permalink] [raw]
Subject: Re: [RFC/RFT PATCH 1/6] uvcvideo: Refactor URB descriptors

Hi Guennadi,

Thanks for your review,

On 04/01/18 18:24, Guennadi Liakhovetski wrote:
> Hi Kieran,
>
> Just minor suggestions below:
>
> On Wed, 3 Jan 2018, Kieran Bingham wrote:
>
>> From: Kieran Bingham <[email protected]>
>>
>> We currently store three separate arrays for each URB reference we hold.
>>
>> Objectify the data needed to track URBs into a single uvc_urb structure,
>> allowing better object management and tracking of the URB.
>>
>> All accesses to the data pointers through stream, are converted to use a
>> uvc_urb pointer for consistency.
>>
>> Signed-off-by: Kieran Bingham <[email protected]>
>> Reviewed-by: Laurent Pinchart <[email protected]>
>> ---
>> drivers/media/usb/uvc/uvc_video.c | 46 ++++++++++++++++++++------------
>> drivers/media/usb/uvc/uvcvideo.h | 18 ++++++++++---
>> 2 files changed, 44 insertions(+), 20 deletions(-)
>
> [snip]
>
>> diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
>> index 19e725e2bda5..4afa8ce13ea7 100644
>> --- a/drivers/media/usb/uvc/uvcvideo.h
>> +++ b/drivers/media/usb/uvc/uvcvideo.h
>> @@ -479,6 +479,20 @@ struct uvc_stats_stream {
>> unsigned int max_sof; /* Maximum STC.SOF value */
>> };
>>
>> +/**
>> + * struct uvc_urb - URB context management structure
>> + *
>> + * @urb: described URB. Must be allocated with usb_alloc_urb()
>
> Didn't you mean "describes?"

Hrm ... I think I meant described as in "This is the URB described by this
uvc_urb structure", rather than "this variable describes the URB"

Perhaps I'll change this to:

@urb: The URB described by this context structure.

I think the 'must be allocated with usb_alloc_urb() is fairly implicit, so could
be dropped in that case.

>
>> + * @urb_buffer: memory storage for the URB
>> + * @urb_dma: DMA coherent addressing for the urb_buffer
>
> The whole struct describes URBs, so, I wouldn't repeat that in these two
> field names, I'd just call them "buffer" and "dma." OTOH, later you add
> more fields like "stream," which aren't per-URB, so, maybe you want to
> keep these prefixes.

These names were kept from the original fields. But actually I think you're
right here - it wouldn't hurt to shorten the names, even with the other fields
added.

> Thanks
> Guennadi
>
>> + */
>> +struct uvc_urb {
>> + struct urb *urb;
>> +
>> + char *urb_buffer;
>> + dma_addr_t urb_dma;
>> +};
>> +
>> struct uvc_streaming {
>> struct list_head list;
>> struct uvc_device *dev;
>> @@ -521,9 +535,7 @@ struct uvc_streaming {
>> __u32 max_payload_size;
>> } bulk;
>>
>> - struct urb *urb[UVC_URBS];
>> - char *urb_buffer[UVC_URBS];
>> - dma_addr_t urb_dma[UVC_URBS];
>> + struct uvc_urb uvc_urb[UVC_URBS];
>> unsigned int urb_size;
>>
>> __u32 sequence;
>> --
>> git-series 0.9.1
>>

--
Kieran

2018-01-06 18:37:30

by Kieran Bingham

[permalink] [raw]
Subject: Re: [RFC/RFT PATCH 3/6] uvcvideo: Protect queue internals with helper

Hi Guennadi,

On 04/01/18 18:25, Guennadi Liakhovetski wrote:
> Hi Kieran,
>
> On Wed, 3 Jan 2018, Kieran Bingham wrote:
>
>> From: Kieran Bingham <[email protected]>
>>
>> The URB completion operation obtains the current buffer by reading
>> directly into the queue internal interface.
>>
>> Protect this queue abstraction by providing a helper
>> uvc_queue_get_current_buffer() which can be used by both the decode
>> task, and the uvc_queue_next_buffer() functions.
>>
>> Signed-off-by: Kieran Bingham <[email protected]>
>> Reviewed-by: Laurent Pinchart <[email protected]>
>> ---
>> drivers/media/usb/uvc/uvc_queue.c | 34 +++++++++++++++++++++++++++-----
>> drivers/media/usb/uvc/uvc_video.c | 7 +------
>> drivers/media/usb/uvc/uvcvideo.h | 2 ++-
>> 3 files changed, 32 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/media/usb/uvc/uvc_queue.c b/drivers/media/usb/uvc/uvc_queue.c
>> index c8d78b2f3de4..0711e3d9ff76 100644
>> --- a/drivers/media/usb/uvc/uvc_queue.c
>> +++ b/drivers/media/usb/uvc/uvc_queue.c
>> @@ -399,6 +399,34 @@ void uvc_queue_cancel(struct uvc_video_queue *queue, int disconnect)
>> spin_unlock_irqrestore(&queue->irqlock, flags);
>> }
>>
>> +/*
>> + * uvc_queue_get_current_buffer: Obtain the current working output buffer
>> + *
>> + * Buffers may span multiple packets, and even URBs, therefore the active buffer
>> + * remains on the queue until the EOF marker.
>> + */
>> +static struct uvc_buffer *
>> +__uvc_queue_get_current_buffer(struct uvc_video_queue *queue)
>> +{
>> + if (!list_empty(&queue->irqqueue))
>> + return list_first_entry(&queue->irqqueue, struct uvc_buffer,
>> + queue);
>> + else
>> + return NULL;
>
> I think the preferred style is not to use "else" in such cases. It might
> even be prettier to write
>
> if (list_empty(...))
> return NULL;
>
> return list_first_entry(...);

Ah yes, I believe you are correct.
Good spot!

Fixed, and looks much neater.

--
Kieran

>
> Thanks
> Guennadi
>

2018-06-05 09:02:38

by Kieran Bingham

[permalink] [raw]
Subject: Re: [RFC/RFT PATCH 0/6] Asynchronous UVC

Hi Troy

On 03/01/18 21:13, Troy Kisky wrote:
> On 1/3/2018 12:32 PM, Kieran Bingham wrote:
>> From: Kieran Bingham <[email protected]>
>>
>> The Linux UVC driver has long provided adequate performance capabilities for
>> web-cams and low data rate video devices in Linux while resolutions were low.
>>
>> Modern USB cameras are now capable of high data rates thanks to USB3 with
>> 1080p, and even 4k capture resolutions supported.
>>
>> Cameras such as the Stereolabs ZED or the Logitech Brio can generate more data
>> than an embedded ARM core is able to process on a single core, resulting in
>> frame loss.
>>
>> A large part of this performance impact is from the requirement to
>> ‘memcpy’ frames out from URB packets to destination frames. This unfortunate
>> requirement is due to the UVC protocol allowing a variable length header, and
>> thus it is not possible to provide the target frame buffers directly.
>
>
> I have a rather large patch that does provide frame buffers directly for bulk
> cameras. It cannot be used with ISOC cameras. But it is currently for 4.1.
> I'll be porting it to 4.9 in a few days if you'd like to see it.


How did you get on with this porting activity?

Is it possible to share any of this work with the mailing lists ?

(If you have not ported to v4.9 - I think it would be useful even to post the
v4.1 patch and we can look at what's needed for getting it ported to mainline)

--
Regards

Kieran


>
> BR
> Troy
>
>
>>
>> Extra throughput is possible by moving the actual memcpy actions to a work
>> queue, and moving the memcpy out of interrupt context and allowing work tasks
>> to be scheduled across multiple cores.
>>
>> This series has been tested on both the ZED and Brio cameras on arm64
>> platforms, however due to the intrinsic changes in the driver I would like to
>> see it tested with other devices and other platforms, so I'd appreciate if
>> anyone can test this on a range of USB cameras.
>>
>> Kieran Bingham (6):
>> uvcvideo: Refactor URB descriptors
>> uvcvideo: Convert decode functions to use new context structure
>> uvcvideo: Protect queue internals with helper
>> uvcvideo: queue: Simplify spin-lock usage
>> uvcvideo: queue: Support asynchronous buffer handling
>> uvcvideo: Move decode processing to process context
>>
>> drivers/media/usb/uvc/uvc_isight.c | 4 +-
>> drivers/media/usb/uvc/uvc_queue.c | 115 ++++++++++++++----
>> drivers/media/usb/uvc/uvc_video.c | 191 ++++++++++++++++++++++--------
>> drivers/media/usb/uvc/uvcvideo.h | 56 +++++++--
>> 4 files changed, 289 insertions(+), 77 deletions(-)
>>
>> base-commit: 6f0e5fd39143a59c22d60e7befc4f33f22aeed2f
>>
>

2018-06-05 19:21:38

by Troy Kisky

[permalink] [raw]
Subject: Re: [RFC/RFT PATCH 0/6] Asynchronous UVC

On 6/5/2018 2:01 AM, Kieran Bingham wrote:
> Hi Troy
>
> On 03/01/18 21:13, Troy Kisky wrote:
>> On 1/3/2018 12:32 PM, Kieran Bingham wrote:
>>> From: Kieran Bingham <[email protected]>
>>>
>>> The Linux UVC driver has long provided adequate performance capabilities for
>>> web-cams and low data rate video devices in Linux while resolutions were low.
>>>
>>> Modern USB cameras are now capable of high data rates thanks to USB3 with
>>> 1080p, and even 4k capture resolutions supported.
>>>
>>> Cameras such as the Stereolabs ZED or the Logitech Brio can generate more data
>>> than an embedded ARM core is able to process on a single core, resulting in
>>> frame loss.
>>>
>>> A large part of this performance impact is from the requirement to
>>> ‘memcpy’ frames out from URB packets to destination frames. This unfortunate
>>> requirement is due to the UVC protocol allowing a variable length header, and
>>> thus it is not possible to provide the target frame buffers directly.
>>
>>
>> I have a rather large patch that does provide frame buffers directly for bulk
>> cameras. It cannot be used with ISOC cameras. But it is currently for 4.1.
>> I'll be porting it to 4.9 in a few days if you'd like to see it.
>
>
> How did you get on with this porting activity?
>
> Is it possible to share any of this work with the mailing lists ?


This is pretty ugly all squashed together but here is the 4.9 patch

It does a bit more than 0 copy. I'll just post a link, because I doubt anyone
else wants to look.

https://github.com/boundarydevices/linux-imx6/commit/5cbb48a3332a6e8aad4a1359b1b5eb05eb0fff96

HTH
Troy

>
> (If you have not ported to v4.9 - I think it would be useful even to post the
> v4.1 patch and we can look at what's needed for getting it ported to mainline)
>
> --
> Regards
>
> Kieran
>
>
>>
>> BR
>> Troy