From: Gustavo Padovan <[email protected]>
Hi,
This RFC adds support for Explicit Synchronization of shared buffers in V4L2.
It uses the Sync File Framework[1] as the mechanism to communicate fences
between the kernel and userspace.
I'm sending this to start the discussion on the best approach to implement
Explicit Synchronization, please check the TODO/OPEN section below.
Explicit Synchronization allows us to control the synchronization of
shared buffers from userspace by passing fences to the kernel and/or
receiving them from the kernel.
Fences passed to the kernel are named in-fences and the kernel should wait
for them to signal before using the buffer. On the other side, the kernel creates
out-fences for every buffer it receives from userspace. This fence is sent back
to userspace and it will signal when the capture, for example, has finished.
Signalling an out-fence in V4L2 would mean that the job on the buffer is done
and the buffer can be used by other drivers.
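To illustrate the intended flow from userspace, here is a minimal sketch of a
QBUF call using both fence flags, assuming the fence_fd field and the
V4L2_BUF_FLAG_IN_FENCE/V4L2_BUF_FLAG_OUT_FENCE flags proposed in this series
(video_fd, index, dmabuf_fd and in_fence_fd are placeholders; error handling
omitted):

	struct v4l2_buffer buf;

	memset(&buf, 0, sizeof(buf));
	buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	buf.memory = V4L2_MEMORY_DMABUF;
	buf.index = index;
	buf.m.fd = dmabuf_fd;
	/* wait on in_fence_fd before the hw uses the buffer, and ask
	 * for an out-fence that signals when the capture is done */
	buf.flags = V4L2_BUF_FLAG_IN_FENCE | V4L2_BUF_FLAG_OUT_FENCE;
	buf.fence_fd = in_fence_fd;
	ioctl(video_fd, VIDIOC_QBUF, &buf);
	/* on return, fence_fd carries the out-fence fd */
	out_fence_fd = buf.fence_fd;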
Current RFC implementation
--------------------------
The current implementation is not intended to be more than a PoC to start
the discussion on how Explicit Synchronization should be supported in V4L2.
The first patch proposes a userspace API for fences, then in patch 2
we prepare for the addition of in-fences in patch 3, by introducing the
infrastructure on vb2 to wait on an in-fence signal before queueing the buffer
in the driver.
Patch 4 fixes uvc v4l2 event handling and patch 5 configures q->dev for the vivid
driver to enable subscribing to and dequeuing events on it.
Patches 6-7 enable support for notifying BUF_QUEUED events, i.e., they let userspace
know that a particular buffer was enqueued in the driver. This is needed
because we return the out-fence fd as an out argument in QBUF, but at the time
it returns we don't know to which buffer the fence will be attached; thus
the BUF_QUEUED event tells userspace which buffer is associated with the fence
received in QBUF.
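For reference, a rough sketch of how userspace could pair the returned
out-fence with a buffer via the new event (assuming the V4L2_EVENT_BUF_QUEUED
type and buf_queued payload added by these patches):

	struct v4l2_event_subscription sub;
	struct v4l2_event ev;

	memset(&sub, 0, sizeof(sub));
	sub.type = V4L2_EVENT_BUF_QUEUED;
	ioctl(video_fd, VIDIOC_SUBSCRIBE_EVENT, &sub);
	/* ... QBUF with V4L2_BUF_FLAG_OUT_FENCE returns an out-fence fd ... */
	ioctl(video_fd, VIDIOC_DQEVENT, &ev);
	if (ev.type == V4L2_EVENT_BUF_QUEUED)
		/* the buffer the just-returned out-fence belongs to */
		index = ev.u.buf_queued.index;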
Patches 8 and 9 add more fence infrastructure to support out-fences, and finally
patch 10 adds support for out-fences.
TODO/OPEN:
----------
* For this first implementation we will keep the ordering of the buffers queued
in videobuf2; that means we will only enqueue a buffer whose fence has signalled
if that buffer is the first one in the queue. Otherwise it has to wait until it
is the first one. This is not implemented yet. Later we could create a flag to
allow unordered queueing in the drivers from vb2 if needed.
* Should we have out-fences per-buffer or per-plane? Or both? In this RFC, for
simplicity they are per-buffer, but Mauro and Javier raised the option of
doing per-plane fences. That could benefit mem2mem and V4L2 <-> GPU operation,
at least in cases where we have capture hw that releases the Y plane before the
other planes, for example. When using V4L2 per-plane out-fences to communicate
with KMS they would need to be merged together, as currently the DRM Plane
interface only supports one fence per DRM Plane (a userspace sketch of such a
merge follows after this list).
In-fences should be per-buffer as the DRM only has per-buffer fences, but
in case of mem2mem operations per-plane fences might be useful?
So should we have both ways, per-plane and per-buffer, or just one of them
for now?
* other open topics are how to deal with hw-fences and Request API.
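As a reference for the per-plane merging mentioned above, userspace can
already combine two sync_file fds with the existing SYNC_IOC_MERGE ioctl
from the Sync File Framework. A minimal sketch (the plane fence fd variables
are placeholders, error handling omitted):

	#include <linux/sync_file.h>

	struct sync_merge_data data;

	memset(&data, 0, sizeof(data));
	strcpy(data.name, "v4l2-planes");
	data.fd2 = plane1_fence_fd;
	/* merge plane0's and plane1's fences; the result is in data.fence */
	ioctl(plane0_fence_fd, SYNC_IOC_MERGE, &data);
	merged_fd = data.fence;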
Comments are welcome!
Regards,
Gustavo
---
Gustavo Padovan (9):
[media] vb2: add explicit fence user API
[media] vb2: split out queueing from vb_core_qbuf()
[media] vb2: add in-fence support to QBUF
[media] uvc: enable subscriptions to other events
[media] vivid: assign the specific device to the vb2_queue->dev
[media] v4l: add V4L2_EVENT_BUF_QUEUED event
[media] v4l: add support to BUF_QUEUED event
[media] vb2: add infrastructure to support out-fences
[media] vb2: add out-fence support to QBUF
Javier Martinez Canillas (1):
[media] vb2: add videobuf2 dma-buf fence helpers
drivers/media/Kconfig | 1 +
drivers/media/platform/vivid/vivid-core.c | 10 +-
drivers/media/usb/uvc/uvc_v4l2.c | 2 +-
drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 4 +-
drivers/media/v4l2-core/v4l2-ctrls.c | 6 +-
drivers/media/v4l2-core/videobuf2-core.c | 139 ++++++++++++++++++++------
drivers/media/v4l2-core/videobuf2-v4l2.c | 29 +++++-
include/media/videobuf2-core.h | 12 ++-
include/media/videobuf2-fence.h | 49 +++++++++
include/uapi/linux/videodev2.h | 12 ++-
10 files changed, 218 insertions(+), 46 deletions(-)
create mode 100644 include/media/videobuf2-fence.h
--
2.9.3
From: Gustavo Padovan <[email protected]>
Receive an in-fence from userspace and add support for waiting on it
before queueing the buffer to the driver.
Signed-off-by: Gustavo Padovan <[email protected]>
---
drivers/media/Kconfig | 1 +
drivers/media/v4l2-core/videobuf2-core.c | 24 ++++++++++++++++++++----
drivers/media/v4l2-core/videobuf2-v4l2.c | 14 +++++++++++++-
include/media/videobuf2-core.h | 7 ++++++-
4 files changed, 40 insertions(+), 6 deletions(-)
diff --git a/drivers/media/Kconfig b/drivers/media/Kconfig
index 3512316..7c5a0e0 100644
--- a/drivers/media/Kconfig
+++ b/drivers/media/Kconfig
@@ -5,6 +5,7 @@
menuconfig MEDIA_SUPPORT
tristate "Multimedia support"
depends on HAS_IOMEM
+ select SYNC_FILE
help
If you want to use Webcams, Video grabber devices and/or TV devices
enable this option and other options below.
diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
index 0e30fcd..e0e7109 100644
--- a/drivers/media/v4l2-core/videobuf2-core.c
+++ b/drivers/media/v4l2-core/videobuf2-core.c
@@ -1400,7 +1400,18 @@ static int __vb2_core_qbuf(struct vb2_buffer *vb, struct vb2_queue *q)
return 0;
}
-int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
+static void vb2_qbuf_fence_cb(struct dma_fence *f, struct dma_fence_cb *cb)
+{
+ struct vb2_buffer *vb = container_of(cb, struct vb2_buffer, fence_cb);
+
+ dma_fence_put(vb->in_fence);
+ vb->in_fence = NULL;
+
+ __vb2_core_qbuf(vb, vb->vb2_queue);
+}
+
+int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb,
+ struct dma_fence *fence)
{
struct vb2_buffer *vb;
int ret;
@@ -1432,6 +1443,11 @@ int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
if (pb)
call_void_bufop(q, fill_user_buffer, vb, pb);
+ vb->in_fence = fence;
+ if (fence && !dma_fence_add_callback(fence, &vb->fence_cb,
+ vb2_qbuf_fence_cb))
+ return 0;
+
return __vb2_core_qbuf(vb, q);
}
EXPORT_SYMBOL_GPL(vb2_core_qbuf);
@@ -2246,7 +2262,7 @@ static int __vb2_init_fileio(struct vb2_queue *q, int read)
* Queue all buffers.
*/
for (i = 0; i < q->num_buffers; i++) {
- ret = vb2_core_qbuf(q, i, NULL);
+ ret = vb2_core_qbuf(q, i, NULL, NULL);
if (ret)
goto err_reqbufs;
fileio->bufs[i].queued = 1;
@@ -2425,7 +2441,7 @@ static size_t __vb2_perform_fileio(struct vb2_queue *q, char __user *data, size_
if (copy_timestamp)
b->timestamp = ktime_get_ns();
- ret = vb2_core_qbuf(q, index, NULL);
+ ret = vb2_core_qbuf(q, index, NULL, NULL);
dprintk(5, "vb2_dbuf result: %d\n", ret);
if (ret)
return ret;
@@ -2528,7 +2544,7 @@ static int vb2_thread(void *data)
if (copy_timestamp)
vb->timestamp = ktime_get_ns();;
if (!threadio->stop)
- ret = vb2_core_qbuf(q, vb->index, NULL);
+ ret = vb2_core_qbuf(q, vb->index, NULL, NULL);
call_void_qop(q, wait_prepare, q);
if (ret || threadio->stop)
break;
diff --git a/drivers/media/v4l2-core/videobuf2-v4l2.c b/drivers/media/v4l2-core/videobuf2-v4l2.c
index d23c1bf..c164aa0 100644
--- a/drivers/media/v4l2-core/videobuf2-v4l2.c
+++ b/drivers/media/v4l2-core/videobuf2-v4l2.c
@@ -23,6 +23,7 @@
#include <linux/sched.h>
#include <linux/freezer.h>
#include <linux/kthread.h>
+#include <linux/sync_file.h>
#include <media/v4l2-dev.h>
#include <media/v4l2-fh.h>
@@ -557,6 +558,7 @@ EXPORT_SYMBOL_GPL(vb2_create_bufs);
int vb2_qbuf(struct vb2_queue *q, struct v4l2_buffer *b)
{
+ struct dma_fence *fence = NULL;
int ret;
if (vb2_fileio_is_active(q)) {
@@ -565,7 +567,17 @@ int vb2_qbuf(struct vb2_queue *q, struct v4l2_buffer *b)
}
ret = vb2_queue_or_prepare_buf(q, b, "qbuf");
- return ret ? ret : vb2_core_qbuf(q, b->index, b);
+
+ if (b->flags & V4L2_BUF_FLAG_IN_FENCE) {
+ if (b->memory != VB2_MEMORY_DMABUF)
+ return -EINVAL;
+
+ fence = sync_file_get_fence(b->fence_fd);
+ if (!fence)
+ return -EINVAL;
+ }
+
+ return ret ? ret : vb2_core_qbuf(q, b->index, b, fence);
}
EXPORT_SYMBOL_GPL(vb2_qbuf);
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index ac5898a..fe2de99 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -16,6 +16,7 @@
#include <linux/mutex.h>
#include <linux/poll.h>
#include <linux/dma-buf.h>
+#include <linux/dma-fence.h>
#define VB2_MAX_FRAME (32)
#define VB2_MAX_PLANES (8)
@@ -259,6 +260,9 @@ struct vb2_buffer {
struct list_head queued_entry;
struct list_head done_entry;
+
+ struct dma_fence *in_fence;
+ struct dma_fence_cb fence_cb;
#ifdef CONFIG_VIDEO_ADV_DEBUG
/*
* Counters for how often these buffer-related ops are
@@ -727,7 +731,8 @@ int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb);
* The return values from this function are intended to be directly returned
* from vidioc_qbuf handler in driver.
*/
-int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb);
+int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb,
+ struct dma_fence *fence);
/**
* vb2_core_dqbuf() - Dequeue a buffer to the userspace
--
2.9.3
From: Gustavo Padovan <[email protected]>
Implement the needed pieces to let userspace subscribe to
V4L2_EVENT_BUF_QUEUED events. Videobuf2 will queue the event for the
DQEVENT ioctl.
Signed-off-by: Gustavo Padovan <[email protected]>
---
drivers/media/v4l2-core/v4l2-ctrls.c | 6 +++++-
drivers/media/v4l2-core/videobuf2-core.c | 15 +++++++++++++++
2 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c
index b9e08e3..1be554b 100644
--- a/drivers/media/v4l2-core/v4l2-ctrls.c
+++ b/drivers/media/v4l2-core/v4l2-ctrls.c
@@ -3413,8 +3413,12 @@ EXPORT_SYMBOL(v4l2_ctrl_log_status);
int v4l2_ctrl_subscribe_event(struct v4l2_fh *fh,
const struct v4l2_event_subscription *sub)
{
- if (sub->type == V4L2_EVENT_CTRL)
+ switch (sub->type) {
+ case V4L2_EVENT_CTRL:
return v4l2_event_subscribe(fh, sub, 0, &v4l2_ctrl_sub_ev_ops);
+ case V4L2_EVENT_BUF_QUEUED:
+ return v4l2_event_subscribe(fh, sub, 0, NULL);
+ }
return -EINVAL;
}
EXPORT_SYMBOL(v4l2_ctrl_subscribe_event);
diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
index e0e7109..d9cb777 100644
--- a/drivers/media/v4l2-core/videobuf2-core.c
+++ b/drivers/media/v4l2-core/videobuf2-core.c
@@ -25,6 +25,7 @@
#include <linux/kthread.h>
#include <media/videobuf2-core.h>
+#include <media/v4l2-event.h>
#include <media/v4l2-mc.h>
#include <trace/events/vb2.h>
@@ -1221,6 +1222,18 @@ static int __qbuf_dmabuf(struct vb2_buffer *vb, const void *pb)
return ret;
}
+static void vb2_buffer_queued_event(struct vb2_buffer *vb)
+{
+ struct video_device *vdev = to_video_device(vb->vb2_queue->dev);
+ struct v4l2_event event;
+
+ memset(&event, 0, sizeof(event));
+ event.type = V4L2_EVENT_BUF_QUEUED;
+ event.u.buf_queued.index = vb->index;
+
+ v4l2_event_queue(vdev, &event);
+}
+
/**
* __enqueue_in_driver() - enqueue a vb2_buffer in driver for processing
*/
@@ -1239,6 +1252,8 @@ static void __enqueue_in_driver(struct vb2_buffer *vb)
call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
call_void_vb_qop(vb, buf_queue, vb);
+
+ vb2_buffer_queued_event(vb);
}
static int __buf_prepare(struct vb2_buffer *vb, const void *pb)
--
2.9.3
From: Gustavo Padovan <[email protected]>
Add a new event that userspace can subscribe to in order to receive
notifications when a buffer is enqueued onto the driver. The event provides
the index of the enqueued buffer.
Signed-off-by: Gustavo Padovan <[email protected]>
---
include/uapi/linux/videodev2.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 3b6cfa6..5b44fd0 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -2133,6 +2133,7 @@ struct v4l2_streamparm {
#define V4L2_EVENT_FRAME_SYNC 4
#define V4L2_EVENT_SOURCE_CHANGE 5
#define V4L2_EVENT_MOTION_DET 6
+#define V4L2_EVENT_BUF_QUEUED 7
#define V4L2_EVENT_PRIVATE_START 0x08000000
/* Payload for V4L2_EVENT_VSYNC */
@@ -2185,6 +2186,10 @@ struct v4l2_event_motion_det {
__u32 region_mask;
};
+struct v4l2_event_buf_queued {
+ __u32 index;
+};
+
struct v4l2_event {
__u32 type;
union {
@@ -2193,6 +2198,7 @@ struct v4l2_event {
struct v4l2_event_frame_sync frame_sync;
struct v4l2_event_src_change src_change;
struct v4l2_event_motion_det motion_det;
+ struct v4l2_event_buf_queued buf_queued;
__u8 data[64];
} u;
__u32 pending;
--
2.9.3
From: Gustavo Padovan <[email protected]>
Assign the specific video device instead of the global v4l2 device;
the latter was causing trouble when using V4L2 events with vivid
devices. The queue device should be the same one we opened in userspace.
Signed-off-by: Gustavo Padovan <[email protected]>
---
drivers/media/platform/vivid/vivid-core.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/media/platform/vivid/vivid-core.c b/drivers/media/platform/vivid/vivid-core.c
index ef344b9..8843170 100644
--- a/drivers/media/platform/vivid/vivid-core.c
+++ b/drivers/media/platform/vivid/vivid-core.c
@@ -1070,7 +1070,7 @@ static int vivid_create_instance(struct platform_device *pdev, int inst)
q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
q->min_buffers_needed = 2;
q->lock = &dev->mutex;
- q->dev = dev->v4l2_dev.dev;
+ q->dev = &dev->vid_cap_dev.dev;
ret = vb2_queue_init(q);
if (ret)
@@ -1090,7 +1090,7 @@ static int vivid_create_instance(struct platform_device *pdev, int inst)
q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
q->min_buffers_needed = 2;
q->lock = &dev->mutex;
- q->dev = dev->v4l2_dev.dev;
+ q->dev = &dev->vid_out_dev.dev;
ret = vb2_queue_init(q);
if (ret)
@@ -1110,7 +1110,7 @@ static int vivid_create_instance(struct platform_device *pdev, int inst)
q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
q->min_buffers_needed = 2;
q->lock = &dev->mutex;
- q->dev = dev->v4l2_dev.dev;
+ q->dev = &dev->vbi_cap_dev.dev;
ret = vb2_queue_init(q);
if (ret)
@@ -1130,7 +1130,7 @@ static int vivid_create_instance(struct platform_device *pdev, int inst)
q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
q->min_buffers_needed = 2;
q->lock = &dev->mutex;
- q->dev = dev->v4l2_dev.dev;
+ q->dev = &dev->vbi_out_dev.dev;
ret = vb2_queue_init(q);
if (ret)
@@ -1149,7 +1149,7 @@ static int vivid_create_instance(struct platform_device *pdev, int inst)
q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
q->min_buffers_needed = 8;
q->lock = &dev->mutex;
- q->dev = dev->v4l2_dev.dev;
+ q->dev = &dev->sdr_cap_dev.dev;
ret = vb2_queue_init(q);
if (ret)
--
2.9.3
From: Gustavo Padovan <[email protected]>
If the V4L2_BUF_FLAG_OUT_FENCE flag is present on the QBUF call we create
an out-fence for the buffer and return it to userspace in the fence_fd
field.
The fence is signaled on buffer_done(), when the job on the buffer is
finished.
TODO: clean up on __vb2_queue_cancel()
Signed-off-by: Gustavo Padovan <[email protected]>
---
drivers/media/v4l2-core/videobuf2-core.c | 6 ++++++
drivers/media/v4l2-core/videobuf2-v4l2.c | 15 ++++++++++++++-
2 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
index 54b1404..ca8abcc 100644
--- a/drivers/media/v4l2-core/videobuf2-core.c
+++ b/drivers/media/v4l2-core/videobuf2-core.c
@@ -356,6 +356,7 @@ static int __vb2_queue_alloc(struct vb2_queue *q, enum vb2_memory memory,
vb->planes[plane].length = plane_sizes[plane];
vb->planes[plane].min_length = plane_sizes[plane];
}
+ vb->out_fence_fd = -1;
q->bufs[vb->index] = vb;
/* Allocate video buffer memory for the MMAP type */
@@ -940,6 +941,11 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
__enqueue_in_driver(vb);
return;
default:
+ dma_fence_signal(vb->out_fence);
+ dma_fence_put(vb->out_fence);
+ vb->out_fence = NULL;
+ vb->out_fence_fd = -1;
+
/* Inform any processes that may be waiting for buffers */
wake_up(&q->done_wq);
break;
diff --git a/drivers/media/v4l2-core/videobuf2-v4l2.c b/drivers/media/v4l2-core/videobuf2-v4l2.c
index c164aa0..1b43d9f 100644
--- a/drivers/media/v4l2-core/videobuf2-v4l2.c
+++ b/drivers/media/v4l2-core/videobuf2-v4l2.c
@@ -204,9 +204,14 @@ static void __fill_v4l2_buffer(struct vb2_buffer *vb, void *pb)
b->timestamp = ns_to_timeval(vb->timestamp);
b->timecode = vbuf->timecode;
b->sequence = vbuf->sequence;
- b->fence_fd = -1;
+ b->fence_fd = vb->out_fence_fd;
b->reserved = 0;
+ if (vb->sync_file) {
+ fd_install(vb->out_fence_fd, vb->sync_file->file);
+ vb->sync_file = NULL;
+ }
+
if (q->is_multiplanar) {
/*
* Fill in plane-related data if userspace provided an array
@@ -577,6 +582,14 @@ int vb2_qbuf(struct vb2_queue *q, struct v4l2_buffer *b)
return -EINVAL;
}
+ if (b->flags & V4L2_BUF_FLAG_OUT_FENCE) {
+ ret = vb2_setup_out_fence(q, b->index);
+ if (ret) {
+ dma_fence_put(fence);
+ return ret;
+ }
+ }
+
return ret ? ret : vb2_core_qbuf(q, b->index, b, fence);
}
EXPORT_SYMBOL_GPL(vb2_qbuf);
--
2.9.3
From: Gustavo Padovan <[email protected]>
Add vb2_setup_out_fence() and the needed members to struct vb2_buffer.
Signed-off-by: Gustavo Padovan <[email protected]>
---
drivers/media/v4l2-core/videobuf2-core.c | 31 +++++++++++++++++++++++++++++++
include/media/videobuf2-core.h | 5 +++++
2 files changed, 36 insertions(+)
diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
index d9cb777..54b1404 100644
--- a/drivers/media/v4l2-core/videobuf2-core.c
+++ b/drivers/media/v4l2-core/videobuf2-core.c
@@ -23,8 +23,11 @@
#include <linux/sched.h>
#include <linux/freezer.h>
#include <linux/kthread.h>
+#include <linux/sync_file.h>
+#include <linux/dma-fence.h>
#include <media/videobuf2-core.h>
+#include <media/videobuf2-fence.h>
#include <media/v4l2-event.h>
#include <media/v4l2-mc.h>
@@ -1315,6 +1318,34 @@ int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb)
}
EXPORT_SYMBOL_GPL(vb2_core_prepare_buf);
+int vb2_setup_out_fence(struct vb2_queue *q, unsigned int index)
+{
+ struct vb2_buffer *vb = q->bufs[index];
+
+ vb->out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
+ if (vb->out_fence_fd < 0)
+ return vb->out_fence_fd;
+
+ vb->out_fence = vb2_fence_alloc();
+ if (!vb->out_fence)
+ goto err;
+
+ vb->sync_file = sync_file_create(vb->out_fence);
+ if (!vb->sync_file) {
+ dma_fence_put(vb->out_fence);
+ vb->out_fence = NULL;
+ goto err;
+ }
+
+ return 0;
+
+err:
+ put_unused_fd(vb->out_fence_fd);
+ vb->out_fence_fd = -1;
+ return -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(vb2_setup_out_fence);
+
/**
* vb2_start_streaming() - Attempt to start streaming.
* @q: videobuf2 queue
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index fe2de99..efdc390 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -263,6 +263,10 @@ struct vb2_buffer {
struct dma_fence *in_fence;
struct dma_fence_cb fence_cb;
+
+ struct dma_fence *out_fence;
+ struct sync_file *sync_file;
+ int out_fence_fd;
#ifdef CONFIG_VIDEO_ADV_DEBUG
/*
* Counters for how often these buffer-related ops are
@@ -710,6 +714,7 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
*/
int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb);
+int vb2_setup_out_fence(struct vb2_queue *q, unsigned int index);
/**
* vb2_core_qbuf() - Queue a buffer from userspace
*
--
2.9.3
From: Javier Martinez Canillas <[email protected]>
Add a videobuf2-fence.h header file that contains different helpers
for DMA buffer sharing explicit fence support in videobuf2.
Signed-off-by: Javier Martinez Canillas <[email protected]>
Signed-off-by: Gustavo Padovan <[email protected]>
---
include/media/videobuf2-fence.h | 49 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
create mode 100644 include/media/videobuf2-fence.h
diff --git a/include/media/videobuf2-fence.h b/include/media/videobuf2-fence.h
new file mode 100644
index 0000000..ed5612c
--- /dev/null
+++ b/include/media/videobuf2-fence.h
@@ -0,0 +1,49 @@
+/*
+ * videobuf2-fence.h - DMA buffer sharing fence helpers for videobuf 2
+ *
+ * Copyright (C) 2016 Samsung Electronics
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/dma-fence.h>
+#include <linux/slab.h>
+
+static DEFINE_SPINLOCK(vb2_fence_lock);
+
+static inline const char *vb2_fence_get_driver_name(struct dma_fence *fence)
+{
+ return "vb2_fence";
+}
+
+static inline const char *vb2_fence_get_timeline_name(struct dma_fence *fence)
+{
+ return "vb2_fence_timeline";
+}
+
+static inline bool vb2_fence_enable_signaling(struct dma_fence *fence)
+{
+ return true;
+}
+
+static const struct dma_fence_ops vb2_fence_ops = {
+ .get_driver_name = vb2_fence_get_driver_name,
+ .get_timeline_name = vb2_fence_get_timeline_name,
+ .enable_signaling = vb2_fence_enable_signaling,
+ .wait = dma_fence_default_wait,
+};
+
+static inline struct dma_fence *vb2_fence_alloc(void)
+{
+ struct dma_fence *vb2_fence = kzalloc(sizeof(*vb2_fence), GFP_KERNEL);
+
+ if (!vb2_fence)
+ return NULL;
+
+ dma_fence_init(vb2_fence, &vb2_fence_ops, &vb2_fence_lock,
+ dma_fence_context_alloc(1), 1);
+
+ return vb2_fence;
+}
--
2.9.3
From: Gustavo Padovan <[email protected]>
Call v4l2_ctrl_subscribe_event() to allow subscribing to the other events
supported by v4l.
Signed-off-by: Gustavo Padovan <[email protected]>
---
drivers/media/usb/uvc/uvc_v4l2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/media/usb/uvc/uvc_v4l2.c b/drivers/media/usb/uvc/uvc_v4l2.c
index 3e7e283..dfa0ccd 100644
--- a/drivers/media/usb/uvc/uvc_v4l2.c
+++ b/drivers/media/usb/uvc/uvc_v4l2.c
@@ -1240,7 +1240,7 @@ static int uvc_ioctl_subscribe_event(struct v4l2_fh *fh,
case V4L2_EVENT_CTRL:
return v4l2_event_subscribe(fh, sub, 0, &uvc_ctrl_sub_ev_ops);
default:
- return -EINVAL;
+ return v4l2_ctrl_subscribe_event(fh, sub);
}
}
--
2.9.3
From: Gustavo Padovan <[email protected]>
In order to support explicit synchronization we need to divide
vb2_core_qbuf() into two parts: one to be executed before the fence
signals and another one after that, to do the actual queueing of
the buffer.
Signed-off-by: Gustavo Padovan <[email protected]>
---
drivers/media/v4l2-core/videobuf2-core.c | 65 ++++++++++++++++++--------------
1 file changed, 36 insertions(+), 29 deletions(-)
diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
index 94afbbf9..0e30fcd 100644
--- a/drivers/media/v4l2-core/videobuf2-core.c
+++ b/drivers/media/v4l2-core/videobuf2-core.c
@@ -1363,29 +1363,10 @@ static int vb2_start_streaming(struct vb2_queue *q)
return ret;
}
-int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
+static int __vb2_core_qbuf(struct vb2_buffer *vb, struct vb2_queue *q)
{
- struct vb2_buffer *vb;
int ret;
- vb = q->bufs[index];
-
- switch (vb->state) {
- case VB2_BUF_STATE_DEQUEUED:
- ret = __buf_prepare(vb, pb);
- if (ret)
- return ret;
- break;
- case VB2_BUF_STATE_PREPARED:
- break;
- case VB2_BUF_STATE_PREPARING:
- dprintk(1, "buffer still being prepared\n");
- return -EINVAL;
- default:
- dprintk(1, "invalid buffer state %d\n", vb->state);
- return -EINVAL;
- }
-
/*
* Add to the queued buffers list, a buffer will stay on it until
* dequeued in dqbuf.
@@ -1395,11 +1376,6 @@ int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
q->waiting_for_buffers = false;
vb->state = VB2_BUF_STATE_QUEUED;
- if (pb)
- call_void_bufop(q, copy_timestamp, vb, pb);
-
- trace_vb2_qbuf(q, vb);
-
/*
* If already streaming, give the buffer to driver for processing.
* If not, the buffer will be given to driver on next streamon.
@@ -1407,10 +1383,6 @@ int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
if (q->start_streaming_called)
__enqueue_in_driver(vb);
- /* Fill buffer information for the userspace */
- if (pb)
- call_void_bufop(q, fill_user_buffer, vb, pb);
-
/*
* If streamon has been called, and we haven't yet called
* start_streaming() since not enough buffers were queued, and
@@ -1427,6 +1399,41 @@ int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
dprintk(1, "qbuf of buffer %d succeeded\n", vb->index);
return 0;
}
+
+int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
+{
+ struct vb2_buffer *vb;
+ int ret;
+
+ vb = q->bufs[index];
+
+ switch (vb->state) {
+ case VB2_BUF_STATE_DEQUEUED:
+ ret = __buf_prepare(vb, pb);
+ if (ret)
+ return ret;
+ break;
+ case VB2_BUF_STATE_PREPARED:
+ break;
+ case VB2_BUF_STATE_PREPARING:
+ dprintk(1, "buffer still being prepared\n");
+ return -EINVAL;
+ default:
+ dprintk(1, "invalid buffer state %d\n", vb->state);
+ return -EINVAL;
+ }
+
+ if (pb)
+ call_void_bufop(q, copy_timestamp, vb, pb);
+
+ trace_vb2_qbuf(q, vb);
+
+ /* Fill buffer information for the userspace */
+ if (pb)
+ call_void_bufop(q, fill_user_buffer, vb, pb);
+
+ return __vb2_core_qbuf(vb, q);
+}
EXPORT_SYMBOL_GPL(vb2_core_qbuf);
/**
--
2.9.3
From: Gustavo Padovan <[email protected]>
Turn the reserved2 field into fence_fd, which we will use to send
an in-fence to the kernel and to return an out-fence from the kernel to
userspace.
Two new flags were added, V4L2_BUF_FLAG_IN_FENCE and
V4L2_BUF_FLAG_OUT_FENCE. They should be used when setting in-fence and
out-fence, respectively.
Signed-off-by: Gustavo Padovan <[email protected]>
---
drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 4 ++--
drivers/media/v4l2-core/videobuf2-v4l2.c | 2 +-
include/uapi/linux/videodev2.h | 6 ++++--
3 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
index eac9565..0a522cb 100644
--- a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
+++ b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
@@ -348,7 +348,7 @@ struct v4l2_buffer32 {
__s32 fd;
} m;
__u32 length;
- __u32 reserved2;
+ __s32 fence_fd;
__u32 reserved;
};
@@ -511,7 +511,7 @@ static int put_v4l2_buffer32(struct v4l2_buffer *kp, struct v4l2_buffer32 __user
put_user(kp->timestamp.tv_usec, &up->timestamp.tv_usec) ||
copy_to_user(&up->timecode, &kp->timecode, sizeof(struct v4l2_timecode)) ||
put_user(kp->sequence, &up->sequence) ||
- put_user(kp->reserved2, &up->reserved2) ||
+ put_user(kp->fence_fd, &up->fence_fd) ||
put_user(kp->reserved, &up->reserved) ||
put_user(kp->length, &up->length))
return -EFAULT;
diff --git a/drivers/media/v4l2-core/videobuf2-v4l2.c b/drivers/media/v4l2-core/videobuf2-v4l2.c
index 3529849..d23c1bf 100644
--- a/drivers/media/v4l2-core/videobuf2-v4l2.c
+++ b/drivers/media/v4l2-core/videobuf2-v4l2.c
@@ -203,7 +203,7 @@ static void __fill_v4l2_buffer(struct vb2_buffer *vb, void *pb)
b->timestamp = ns_to_timeval(vb->timestamp);
b->timecode = vbuf->timecode;
b->sequence = vbuf->sequence;
- b->reserved2 = 0;
+ b->fence_fd = -1;
b->reserved = 0;
if (q->is_multiplanar) {
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 45184a2..3b6cfa6 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -911,7 +911,7 @@ struct v4l2_buffer {
__s32 fd;
} m;
__u32 length;
- __u32 reserved2;
+ __s32 fence_fd;
__u32 reserved;
};
@@ -946,8 +946,10 @@ struct v4l2_buffer {
#define V4L2_BUF_FLAG_TSTAMP_SRC_MASK 0x00070000
#define V4L2_BUF_FLAG_TSTAMP_SRC_EOF 0x00000000
#define V4L2_BUF_FLAG_TSTAMP_SRC_SOE 0x00010000
+#define V4L2_BUF_FLAG_IN_FENCE 0x00100000
+#define V4L2_BUF_FLAG_OUT_FENCE 0x00200000
/* mem2mem encoder/decoder */
-#define V4L2_BUF_FLAG_LAST 0x00100000
+#define V4L2_BUF_FLAG_LAST 0x00400000
/**
* struct v4l2_exportbuffer - export of video buffer as DMABUF file descriptor
--
2.9.3
Hi Gustavo,
On Mon, 2017-03-13 at 16:20 -0300, Gustavo Padovan wrote:
> From: Gustavo Padovan <[email protected]>
>
> Turn the reserved2 field into fence_fd, which we will use to send
> an in-fence to the kernel and to return an out-fence from the kernel to
> userspace.
>
> Two new flags were added, V4L2_BUF_FLAG_IN_FENCE and
> V4L2_BUF_FLAG_OUT_FENCE. They should be used when setting in-fence and
> out-fence, respectively.
>
> Signed-off-by: Gustavo Padovan <[email protected]>
> ---
> drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 4 ++--
> drivers/media/v4l2-core/videobuf2-v4l2.c | 2 +-
> include/uapi/linux/videodev2.h | 6 ++++--
> 3 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
> index eac9565..0a522cb 100644
> --- a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
> +++ b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
> @@ -348,7 +348,7 @@ struct v4l2_buffer32 {
> __s32 fd;
> } m;
> __u32 length;
> - __u32 reserved2;
> + __s32 fence_fd;
> __u32 reserved;
> };
>
> @@ -511,7 +511,7 @@ static int put_v4l2_buffer32(struct v4l2_buffer *kp, struct v4l2_buffer32 __user
> put_user(kp->timestamp.tv_usec, &up->timestamp.tv_usec) ||
> copy_to_user(&up->timecode, &kp->timecode, sizeof(struct v4l2_timecode)) ||
> put_user(kp->sequence, &up->sequence) ||
> - put_user(kp->reserved2, &up->reserved2) ||
> + put_user(kp->fence_fd, &up->fence_fd) ||
> put_user(kp->reserved, &up->reserved) ||
> put_user(kp->length, &up->length))
> return -EFAULT;
> diff --git a/drivers/media/v4l2-core/videobuf2-v4l2.c b/drivers/media/v4l2-core/videobuf2-v4l2.c
> index 3529849..d23c1bf 100644
> --- a/drivers/media/v4l2-core/videobuf2-v4l2.c
> +++ b/drivers/media/v4l2-core/videobuf2-v4l2.c
> @@ -203,7 +203,7 @@ static void __fill_v4l2_buffer(struct vb2_buffer *vb, void *pb)
> b->timestamp = ns_to_timeval(vb->timestamp);
> b->timecode = vbuf->timecode;
> b->sequence = vbuf->sequence;
> - b->reserved2 = 0;
> + b->fence_fd = -1;
> b->reserved = 0;
>
> if (q->is_multiplanar) {
> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> index 45184a2..3b6cfa6 100644
> --- a/include/uapi/linux/videodev2.h
> +++ b/include/uapi/linux/videodev2.h
> @@ -911,7 +911,7 @@ struct v4l2_buffer {
> __s32 fd;
> } m;
> __u32 length;
> - __u32 reserved2;
> + __s32 fence_fd;
> __u32 reserved;
> };
>
> @@ -946,8 +946,10 @@ struct v4l2_buffer {
> #define V4L2_BUF_FLAG_TSTAMP_SRC_MASK 0x00070000
> #define V4L2_BUF_FLAG_TSTAMP_SRC_EOF 0x00000000
> #define V4L2_BUF_FLAG_TSTAMP_SRC_SOE 0x00010000
> +#define V4L2_BUF_FLAG_IN_FENCE 0x00100000
> +#define V4L2_BUF_FLAG_OUT_FENCE 0x00200000
> /* mem2mem encoder/decoder */
> -#define V4L2_BUF_FLAG_LAST 0x00100000
> +#define V4L2_BUF_FLAG_LAST 0x00400000
This is not just a sentinel, it is a meaningful flag that must not be
changed. It indicates the last buffer produced by a hardware codec.
regards
Philipp
Hi Gustavo,
Em Mon, 13 Mar 2017 16:20:25 -0300
Gustavo Padovan <[email protected]> escreveu:
> From: Gustavo Padovan <[email protected]>
>
> Hi,
>
> This RFC adds support for Explicit Synchronization of shared buffers in V4L2.
> It uses the Sync File Framework[1] as the mechanism to communicate fences
> between the kernel and userspace.
Thanks for your work!
I looked at your patchset, and I didn't notice anything really weird
there. So, instead of reviewing patch by patch, I prefer to discuss
the requirements and API, as depending on them, the code base will
change a lot.
I'd like to do some tests with it on devices with mem2mem drivers.
My plan is to use an Exynos board for such a thing, but I guess that
the DRM driver for it currently doesn't support fences. I'm checking internally if someone
can make sure that the upstream Exynos driver will become ready for such
tests.
Javier wrote some patches last year meant to implement implicit
fences support. What we noticed is that, while his mechanism worked
fine for pure capture and pure output devices, when we added a mem2mem
device, on a DMABUF+fences pipeline, e. g.:
sensor -> [m2m] -> DRM
With everything using fences/DMABUF, the fences mechanism caused
deadlocks on existing userspace apps.
A m2m device has both capture and output devnodes. Both should be
queued/dequeued. The capture queue is synchronized internally at the
driver with the output buffer[1].
[1] The names here are counter-intuitive: "capture" is a devnode
where userspace receives a video stream; "output" is a devnode where
userspace feeds a video stream.
The problem is that adding implicit fences changed the behavior of
the ioctls, causing gstreamer to wait forever for buffers to be ready.
I suspect that, even with explicit fences, the behavior of Q/DQ
will be incompatible with the current behavior (or will require some
dirty hacks to make it identical).
So, IMHO, the best would be to use a new set of ioctls, when fences are
used (like VIDIOC_QFENCE/DQFENCE).
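Purely as an illustration of the idea, a hypothetical sketch of what such a
pair could look like (none of these names, fields or ioctl numbers exist
today; they are made up for discussion):

	struct v4l2_fence_cmd {
		__u32	index;		/* buffer index */
		__s32	fence_fd;	/* QFENCE: in-fence in; DQFENCE: out-fence out */
		__u32	flags;
		__u32	reserved[5];
	};

	#define VIDIOC_QFENCE	_IOWR('V', 104, struct v4l2_fence_cmd)
	#define VIDIOC_DQFENCE	_IOWR('V', 105, struct v4l2_fence_cmd)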
>
> I'm sending this to start the discussion on the best approach to implement
> Explicit Synchronization, please check the TODO/OPEN section below.
>
> Explicit Synchronization allows us to control the synchronization of
> shared buffers from userspace by passing fences to the kernel and/or
> receiving them from the kernel.
>
> Fences passed to the kernel are named in-fences and the kernel should wait
> for them to signal before using the buffer. On the other side, the kernel creates
> out-fences for every buffer it receives from userspace. This fence is sent back
> to userspace and it will signal when the capture, for example, has finished.
>
> Signalling an out-fence in V4L2 would mean that the job on the buffer is done
> and the buffer can be used by other drivers.
>
> Current RFC implementation
> --------------------------
>
> The current implementation is not intended to be more than a PoC to start
> the discussion on how Explicit Synchronization should be supported in V4L2.
>
> The first patch proposes a userspace API for fences, then in patch 2
> we prepare for the addition of in-fences in patch 3, by introducing the
> infrastructure on vb2 to wait on an in-fence signal before queueing the buffer
> in the driver.
>
> Patch 4 fixes uvc v4l2 event handling and patch 5 configures q->dev for the vivid
> driver to enable subscribing to and dequeuing events on it.
>
> Patches 6-7 enable support for notifying BUF_QUEUED events, i.e., they let userspace
> know that a particular buffer was enqueued in the driver. This is needed
> because we return the out-fence fd as an out argument in QBUF, but at the time
> it returns we don't know to which buffer the fence will be attached; thus
> the BUF_QUEUED event tells userspace which buffer is associated with the fence
> received in QBUF.
>
> Patches 8 and 9 add more fence infrastructure to support out-fences, and finally
> patch 10 adds support for out-fences.
>
> TODO/OPEN:
> ----------
>
> * For this first implementation we will keep the ordering of the buffers queued
> in videobuf2; that means we will only enqueue a buffer whose fence has signalled
> if that buffer is the first one in the queue. Otherwise it has to wait until it
> is the first one. This is not implemented yet. Later we could create a flag to
> allow unordered queueing in the drivers from vb2 if needed.
The V4L2 spec doesn't guarantee that the buffers will be dequeued in the
order they were queued.
In practice, however, most drivers will not reorder. Yet, mem2mem codec
drivers may reorder the buffers at the output, as the luminance information
(Y) usually comes first in JPEG/MPEG-like formats.
> * Should we have out-fences per-buffer or per-plane? Or both? In this RFC, for
> simplicity they are per-buffer, but Mauro and Javier raised the option of
> doing per-plane fences. That could benefit mem2mem and V4L2 <-> GPU operation,
> at least in cases where we have capture hw that releases the Y plane before the
> other planes, for example. When using V4L2 per-plane out-fences to communicate
> with KMS they would need to be merged together, as currently the DRM Plane
> interface only supports one fence per DRM Plane.
That's another advantage of using a new set of ioctls for queues: with that,
queueing/dequeueing per plane will be easier. On codec drivers, doing it per
plane could bring performance improvements.
> In-fences should be per-buffer as the DRM only has per-buffer fences, but
> in case of mem2mem operations per-plane fences might be useful?
>
> So should we have both ways, per-plane and per-buffer, or just one of them
> for now?
The API should be flexible enough to support both use cases. We could
implement just per-buffer in the beginning, but, in that case, we
should deploy an API that will allow us to later add per-plane fences
without breaking userspace.
So, I prefer that, for multi-planar buffers, we have one fence per plane,
even if, in the first implementation, all fences will be released
at the same time, when the buffer is fully filled. That would allow
us to later improve it, without changing userspace.
> * other open topics are how to deal with hw-fences and Request API.
Let's not mix issues here. Request API is already complex enough
without explicit fences. The same is true for explicit fences: it is
also complex to add support for it with multiple planes and an
unordered DQ.
So, my suggestion here is to not delay Request API due to fences,
nor to delay fences due to Request API, letting them be merged
in different kernel releases. When both features get added, someone
can work on a patchset that would allow using Request API with fences.
--
Thanks,
Mauro
Hello Gustavo,
On 03/13/2017 04:20 PM, Gustavo Padovan wrote:
[snip]
>
> int vb2_qbuf(struct vb2_queue *q, struct v4l2_buffer *b)
> {
> + struct dma_fence *fence = NULL;
> int ret;
>
> if (vb2_fileio_is_active(q)) {
> @@ -565,7 +567,17 @@ int vb2_qbuf(struct vb2_queue *q, struct v4l2_buffer *b)
> }
>
> ret = vb2_queue_or_prepare_buf(q, b, "qbuf");
> - return ret ? ret : vb2_core_qbuf(q, b->index, b);
> +
> + if (b->flags & V4L2_BUF_FLAG_IN_FENCE) {
> + if (b->memory != VB2_MEMORY_DMABUF)
> + return -EINVAL;
> +
I wonder if it is correct to check this. Only one side of the pipeline uses
V4L2_MEMORY_DMABUF while the other uses V4L2_MEMORY_MMAP + VIDIOC_EXPBUF.
So that means that fences can only be used in one direction?
Best regards,
--
Javier Martinez Canillas
Open Source Group
Samsung Research America
Hello Mauro and Gustavo,
On 04/03/2017 07:16 AM, Mauro Carvalho Chehab wrote:
> Hi Gustavo,
>
> Em Mon, 13 Mar 2017 16:20:25 -0300
> Gustavo Padovan <[email protected]> escreveu:
>
>> From: Gustavo Padovan <[email protected]>
>>
>> Hi,
>>
>> This RFC adds support for Explicit Synchronization of shared buffers in V4L2.
>> It uses the Sync File Framework[1] as the mechanism to communicate fences
>> between the kernel and userspace.
>
> Thanks for your work!
>
> I looked at your patchset, and I didn't notice anything really weird
> there. So, instead of reviewing patch by patch, I prefer to discuss
> the requirements and API, as depending on them, the code base will
> change a lot.
>
Agreed, it's better to first settle on a uAPI and then implement based on that.
> I'd like to do some tests with it on devices with mem2mem drivers.
> My plan is to use an Exynos board for such a thing, but I guess that
> the DRM driver for it currently doesn't support fences. I'm checking internally if someone
> can make sure that the upstream Exynos driver will become ready for such
> tests.
>
Not sure if you should try to do testing before agreeing on a uAPI and
implementation.
> Javier wrote some patches last year meant to implement implicit
> fences support. What we noticed is that, while his mechanism worked
> fine for pure capture and pure output devices, when we added a mem2mem
> device, on a DMABUF+fences pipeline, e. g.:
>
> sensor -> [m2m] -> DRM
>
> With everything using fences/DMABUF, the fences mechanism caused
> deadlocks on existing userspace apps.
>
> A m2m device has both capture and output devnodes. Both should be
> queued/dequeued. The capture queue is synchronized internally at the
> driver with the output buffer[1].
>
> [1] The names here are counter-intuitive: "capture" is a devnode
> where userspace receives a video stream; "output" is a devnode where
> userspace feeds a video stream.
>
> The problem is that adding implicit fences changed the behavior of
> the ioctls, causing gstreamer to wait forever for buffers to be ready.
>
The problem was related to trying to make user-space unaware of the implicit
fences support, and so it tried to QBUF a buffer that already had a pending
fence. A workaround was to block the second QBUF ioctl if the buffer had a
pending fence, but this caused the mentioned deadlock since GStreamer wasn't
expecting the QBUF ioctl to block.
> I suspect that, even with explicit fences, the behavior of Q/DQ
> will be incompatible with the current behavior (or will require some
> dirty hacks to make it identical).
>
> So, IMHO, the best would be to use a new set of ioctls, when fences are
> used (like VIDIOC_QFENCE/DQFENCE).
>
For explicit you can check if there's an in-fence, so it is different from
implicit, but still I agree that it would be better to have specific ioctls.
>>
>> I'm sending this to start the discussion on the best approach to implement
>> Explicit Synchronization, please check the TODO/OPEN section below.
>>
>> Explicit Synchronization allows us to control the synchronization of
>> shared buffers from userspace by passing fences to the kernel and/or
>> receiving them from the kernel.
>>
>> Fences passed to the kernel are named in-fences and the kernel should wait
>> for them to signal before using the buffer. On the other side, the kernel creates
>> out-fences for every buffer it receives from userspace. This fence is sent back
>> to userspace and it will signal when the capture, for example, has finished.
>>
>> Signalling an out-fence in V4L2 would mean that the job on the buffer is done
>> and the buffer can be used by other drivers.
>>
>> Current RFC implementation
>> --------------------------
>>
>> The current implementation is not intended to be more than a PoC to start
>> the discussion on how Explicit Synchronization should be supported in V4L2.
>>
>> The first patch proposes a userspace API for fences, then in patch 2
>> we prepare for the addition of in-fences in patch 3, by introducing the
>> infrastructure on vb2 to wait on an in-fence signal before queueing the buffer
>> in the driver.
>>
>> Patch 4 fixes uvc v4l2 event handling and patch 5 configures q->dev for the vivid
>> driver to enable subscribing to and dequeuing events on it.
>>
>> Patches 6-7 enable support for notifying BUF_QUEUED events, i.e., they let userspace
>> know that a particular buffer was enqueued in the driver. This is needed
>> because we return the out-fence fd as an out argument in QBUF, but at the time
>> it returns we don't know to which buffer the fence will be attached; thus
>> the BUF_QUEUED event tells userspace which buffer is associated with the fence
>> received in QBUF.
>>
>> Patches 8 and 9 add more fence infrastructure to support out-fences, and finally
>> patch 10 adds support for out-fences.
>>
>> TODO/OPEN:
>> ----------
>>
>> * For this first implementation we will keep the ordering of the buffers queued
>> in videobuf2; that means we will only enqueue a buffer whose fence has signalled
>> if that buffer is the first one in the queue. Otherwise it has to wait until it
>> is the first one. This is not implemented yet. Later we could create a flag to
>> allow unordered queueing in the drivers from vb2 if needed.
>
> The V4L2 spec doesn't guarantee that the buffers will be dequeued in the
> order they were queued.
>
> In practice, however, most drivers will not reorder. Yet, mem2mem codec
> drivers may reorder the buffers at the output, as the luminance information
> (Y) usually comes first in JPEG/MPEG-like formats.
>
>> * Should we have out-fences per-buffer or per-plane? Or both? In this RFC, for
>> simplicity they are per-buffer, but Mauro and Javier raised the option of
>> doing per-plane fences. That could benefit mem2mem and V4L2 <-> GPU operation,
>> at least in cases where we have capture hw that releases the Y plane before the
>> other planes, for example. When using V4L2 per-plane out-fences to communicate
>> with KMS they would need to be merged together, as currently the DRM Plane
>> interface only supports one fence per DRM Plane.
>
> That's another advantage of using a new set of ioctls for queues: with that,
> queueing/dequeueing per plane will be easier. On codec drivers, doing it per
> plane could bring performance improvements.
>
You don't really need to Q/DQ on a per-plane basis AFAICT, since on QBUF you
can get a set of out-fences that can be passed to the other driver, which
should then be able to wait on each fence.
>> In-fences should be per-buffer as the DRM only has per-buffer fences, but
I'm not that familiar with DRM, but I thought DRM fences were also per plane
and not per buffer.
How does this work without fences? For V4L2 there's a dma-buf fd per plane and
so I was expecting the DRM API to also import a dma-buf fd per DRM plane.
I only have access to an Exynos board whose display controller supports
single plane formats, so I don't know how this works for multi planar.
>> in case of mem2mem operations per-plane fences might be useful?
>>
>> So should we have both ways, per-plane and per-buffer, or just one of them
>> for now?
>
> The API should be flexible enough to support both use cases. We could
> implement just per-buffer in the beginning, but, in that case, we
> should deploy an API that will allow us to later add per-plane fences
> without breaking userspace.
>
> So, I prefer that, for multi-planar buffers, we have one fence per plane,
> even if, in the first implementation, all fences will be released
> at the same time, when the buffer is fully filled. That would allow
> us to later improve it, without changing userspace.
It's true that vb2 can't currently signal fences per plane, since the interface
between vb2 and drivers is per vb2_buffer. But the uAPI shouldn't be restricted
by this implementation detail (that can be changed) and should support per-plane
fences IMHO.
That's for example the case with the V4L2 dma-buf API. There is a dma-buf fd per
plane, and internally vb2 single-planar buffers use the dma-buf associated
with plane 0.
Now when mentioning this, I noticed that in your implementation the fences are
not associated with a dma-buf. I thought the idea was for the fences to be
associated with a dma-buf's reservation object. If we do that, then fences will
be per plane, since the dma-buf/reservation objects are also per plane in v4l2/vb2.
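For reference, a rough sketch of what attaching the out-fence to a plane's
dma-buf reservation object could look like inside vb2, assuming the
reservation_object API from linux/reservation.h (the helper name is
hypothetical; locking and error handling are simplified):

	#include <linux/reservation.h>

	static void vb2_attach_out_fence(struct vb2_buffer *vb, unsigned int plane)
	{
		struct reservation_object *resv = vb->planes[plane].dbuf->resv;

		ww_mutex_lock(&resv->lock, NULL);
		/* make the out-fence visible to importers of this dma-buf */
		reservation_object_add_excl_fence(resv, vb->out_fence);
		ww_mutex_unlock(&resv->lock);
	}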
>
>> * other open topics are how to deal with hw-fences and Request API.
>
> Let's not mix issues here. Request API is already complex enough
> without explicit fences. The same is true for explicit fences: it is
> also complex to add support for it with multiple planes and an
> unordered DQ.
>
> So, my suggestion here is to not delay Request API due to fences,
> nor to delay fences due to Request API, letting them be merged
> in different kernel releases. When both features get added, someone
> can work on a patchset that would allow using Request API with fences.
>
Best regards,
--
Javier Martinez Canillas
Open Source Group
Samsung Research America
Hi Gustavo,
On Mon, Apr 3, 2017 at 1:46 PM, Javier Martinez Canillas
<[email protected]> wrote:
> Hello Mauro and Gustavo,
>
> On 04/03/2017 07:16 AM, Mauro Carvalho Chehab wrote:
>> Hi Gustavo,
>>
>> Em Mon, 13 Mar 2017 16:20:25 -0300
>> Gustavo Padovan <[email protected]> escreveu:
>>
>>> From: Gustavo Padovan <[email protected]>
>>>
>>> Hi,
>>>
>>> This RFC adds support for Explicit Synchronization of shared buffers in V4L2.
>>> It uses the Sync File Framework[1] as the mechanism to communicate fences
>>> between the kernel and userspace.
>>
>> Thanks for your work!
>>
>> I looked at your patchset, and I didn't notice anything really weird
>> there. So, instead of reviewing patch by patch, I prefer to discuss
>> the requirements and API, as depending on them, the code base will
>> change a lot.
>>
>
> Agreed, it's better to first settle on a uAPI and then implement based on that.
Yes. Agreeing on uAPI first would work well.
>
>> I'd like to do some tests with it on devices with mem2mem drivers.
>> My plan is to use an Exynos board for such a thing, but I guess that
>> the DRM driver for it currently doesn't support fences. I'm checking internally if someone
>> can make sure that the upstream Exynos driver will become ready for such
>> tests.
>>
>
> Not sure if you should try to do testing before agreeing on a uAPI and
> implementation.
Running some tests might give you a better feel for m2m - export buf - drm
cases without fences on exynos. This could help understand the performance
gains with fences.
>
>> Javier wrote some patches last year meant to implement implicit
>> fences support. What we noticed is that, while his mechanism worked
>> fine for pure capture and pure output devices, when we added a mem2mem
>> device, on a DMABUF+fences pipeline, e. g.:
>>
>> sensor -> [m2m] -> DRM
>>
>> With everything using fences/DMABUF, the fences mechanism caused
>> deadlocks on existing userspace apps.
>>
>> A m2m device has both capture and output devnodes. Both should be
>> queued/dequeued. The capture queue is synchronized internally at the
>> driver with the output buffer[1].
>>
>> [1] The names here are counter-intuitive: "capture" is a devnode
>> where userspace receives a video stream; "output" is a devnode where
>> userspace feeds a video stream.
>>
>> The problem is that adding implicit fences changed the behavior of
>> the ioctls, causing gstreamer to wait forever for buffers to be ready.
>>
>
> The problem was related to trying to make user-space unaware of the implicit
> fences support, and so it tried to QBUF a buffer that already had a pending
> fence. A workaround was to block the second QBUF ioctl if the buffer had a
> pending fence, but this caused the mentioned deadlock since GStreamer wasn't
> expecting the QBUF ioctl to block.
>
>> I suspect that, even with explicit fences, the behavior of Q/DQ
>> will be incompatible with the current behavior (or will require some
>> dirty hacks to make it identical).
One big difference between implicit and explicit is that user-space is aware
of fences in the case of explicit. Can that knowledge be helpful in avoiding
the problems we have seen with implicit?
>>
>> So, IMHO, the best would be to use a new set of ioctls, when fences are
>> used (like VIDIOC_QFENCE/DQFENCE).
>>
>
> For explicit you can check if there's an in-fence, so it is different from
> implicit, but still I agree that it would be better to have specific ioctls.
>
It would be nice if we could avoid adding more ioctls. As I mentioned earlier,
user-space is aware that fences are in use. Can we key off of that and make
it a queue property that changes the user-space behavior for the fence vs.
no-fence case?
<snip.>
thanks,
-- Shuah
Hi Gustavo,
Thank you for the patchset. Please see my comments below.
On Mon, Mar 13, 2017 at 04:20:25PM -0300, Gustavo Padovan wrote:
> From: Gustavo Padovan <[email protected]>
>
> Hi,
>
> This RFC adds support for Explicit Synchronization of shared buffers in V4L2.
> It uses the Sync File Framework[1] as the mechanism to communicate fences
> between the kernel and userspace.
>
> I'm sending this to start the discussion on the best approach to implement
> Explicit Synchronization, please check the TODO/OPEN section below.
>
> Explicit Synchronization allows us to control the synchronization of
> shared buffers from userspace by passing fences to the kernel and/or
> receiving them from the kernel.
>
> Fences passed to the kernel are named in-fences and the kernel should wait
> for them to signal before using the buffer. On the other side, the kernel creates
> out-fences for every buffer it receives from userspace. This fence is sent back
> to userspace and it will signal when the capture, for example, has finished.
>
> Signalling an out-fence in V4L2 would mean that the job on the buffer is done
> and the buffer can be used by other drivers.
Shouldn't you be able to add two fences to the buffer, one in and one out?
I.e. you'd have the buffer passed from another device to a V4L2 device and
on to a third device.
(Or, two fences per plane, as you elaborated below.)
>
> Current RFC implementation
> --------------------------
>
> The current implementation is not intended to be more than a PoC to start
> the discussion on how Explicit Synchronization should be supported in V4L2.
>
> The first patch proposes a userspace API for fences, then in patch 2
> we prepare for the addition of in-fences in patch 3, by introducing the
> infrastructure on vb2 to wait on an in-fence signal before queueing the buffer
> in the driver.
>
> Patch 4 fixes uvc v4l2 event handling and patch 5 configures q->dev for the vivid
> driver to enable subscribing to and dequeuing events on it.
>
> Patches 6-7 enable support for notifying BUF_QUEUED events, i.e., they let userspace
> know that a particular buffer was enqueued in the driver. This is needed
> because we return the out-fence fd as an out argument in QBUF, but at the time
> it returns we don't know to which buffer the fence will be attached; thus
> the BUF_QUEUED event tells userspace which buffer is associated with the fence
> received in QBUF.
>
> Patches 8 and 9 add more fence infrastructure to support out-fences, and finally
> patch 10 adds support for out-fences.
>
> TODO/OPEN:
> ----------
>
> * For this first implementation we will keep the ordering of the buffers queued
> in videobuf2; that means we will only enqueue a buffer whose fence has signalled
> if that buffer is the first one in the queue. Otherwise it has to wait until it
> is the first one. This is not implemented yet. Later we could create a flag to
> allow unordered queueing in the drivers from vb2 if needed.
>
> * Should we have out-fences per-buffer or per-plane? Or both? In this RFC, for
> simplicity they are per-buffer, but Mauro and Javier raised the option of
> doing per-plane fences. That could benefit mem2mem and V4L2 <-> GPU operation,
> at least in cases where we have capture hw that releases the Y plane before the
> other planes, for example. When using V4L2 per-plane out-fences to communicate
> with KMS they would need to be merged together, as currently the DRM Plane
> interface only supports one fence per DRM Plane.
>
> In-fences should be per-buffer as the DRM only has per-buffer fences, but
> in case of mem2mem operations per-plane fences might be useful?
>
> So should we have both ways, per-plane and per-buffer, or just one of them
> for now?
The data_offset field is only present in struct v4l2_plane, i.e. it is only
available through using the multi-planar API even if you just have a single
plane.
I'd say it'd be appropriate to have a fence per-plane, but I have no strong
opinion either way at least at the moment.
How otherwise could you make use of this in multi-planar OUTPUT queues? It
may seem like a far-fetched use case, but I still wouldn't ignore it in the
design.
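For context, the multi-planar API passes a struct v4l2_plane array on QBUF,
so that is presumably where per-plane fence fds would have to live. A rough
sketch of today's multi-planar DMABUF QBUF (the per-plane fence field itself
is purely hypothetical; fd variables are placeholders):

	struct v4l2_plane planes[VIDEO_MAX_PLANES];
	struct v4l2_buffer buf;

	memset(planes, 0, sizeof(planes));
	memset(&buf, 0, sizeof(buf));
	buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
	buf.memory = V4L2_MEMORY_DMABUF;
	buf.index = index;
	buf.length = 2;			/* number of planes */
	buf.m.planes = planes;
	planes[0].m.fd = y_dmabuf_fd;
	planes[1].m.fd = cbcr_dmabuf_fd;
	/* hypothetical: a per-plane fence fd could live in one of the
	 * reserved fields of struct v4l2_plane */
	ioctl(video_fd, VIDIOC_QBUF, &buf);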
>
> * other open topics are how to deal with hw-fences and Request API.
>
> Comments are welcome!
>
> Regards,
>
> Gustavo
>
> ---
> Gustavo Padovan (9):
> [media] vb2: add explicit fence user API
> [media] vb2: split out queueing from vb_core_qbuf()
> [media] vb2: add in-fence support to QBUF
> [media] uvc: enable subscriptions to other events
> [media] vivid: assign the specific device to the vb2_queue->dev
> [media] v4l: add V4L2_EVENT_BUF_QUEUED event
> [media] v4l: add support to BUF_QUEUED event
> [media] vb2: add infrastructure to support out-fences
> [media] vb2: add out-fence support to QBUF
>
> Javier Martinez Canillas (1):
> [media] vb2: add videobuf2 dma-buf fence helpers
>
> drivers/media/Kconfig | 1 +
> drivers/media/platform/vivid/vivid-core.c | 10 +-
> drivers/media/usb/uvc/uvc_v4l2.c | 2 +-
> drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 4 +-
> drivers/media/v4l2-core/v4l2-ctrls.c | 6 +-
> drivers/media/v4l2-core/videobuf2-core.c | 139 ++++++++++++++++++++------
> drivers/media/v4l2-core/videobuf2-v4l2.c | 29 +++++-
> include/media/videobuf2-core.h | 12 ++-
> include/media/videobuf2-fence.h | 49 +++++++++
> include/uapi/linux/videodev2.h | 12 ++-
> 10 files changed, 218 insertions(+), 46 deletions(-)
> create mode 100644 include/media/videobuf2-fence.h
>
--
Kind regards,
Sakari Ailus
e-mail: [email protected] XMPP: [email protected]
Hi Philipp,
2017-04-03 Philipp Zabel <[email protected]>:
> Hi Gustavo,
>
> On Mon, 2017-03-13 at 16:20 -0300, Gustavo Padovan wrote:
> > From: Gustavo Padovan <[email protected]>
> >
> > Turn the reserved2 field into fence_fd, which we will use to send
> > an in-fence to the kernel and to return an out-fence from the kernel to
> > userspace.
> >
> > Two new flags were added, V4L2_BUF_FLAG_IN_FENCE and
> > V4L2_BUF_FLAG_OUT_FENCE. They should be used when setting in-fence and
> > out-fence, respectively.
> >
> > Signed-off-by: Gustavo Padovan <[email protected]>
> > ---
> > drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 4 ++--
> > drivers/media/v4l2-core/videobuf2-v4l2.c | 2 +-
> > include/uapi/linux/videodev2.h | 6 ++++--
> > 3 files changed, 7 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
> > index eac9565..0a522cb 100644
> > --- a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
> > +++ b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
> > @@ -348,7 +348,7 @@ struct v4l2_buffer32 {
> > __s32 fd;
> > } m;
> > __u32 length;
> > - __u32 reserved2;
> > + __s32 fence_fd;
> > __u32 reserved;
> > };
> >
> > @@ -511,7 +511,7 @@ static int put_v4l2_buffer32(struct v4l2_buffer *kp, struct v4l2_buffer32 __user
> > put_user(kp->timestamp.tv_usec, &up->timestamp.tv_usec) ||
> > copy_to_user(&up->timecode, &kp->timecode, sizeof(struct v4l2_timecode)) ||
> > put_user(kp->sequence, &up->sequence) ||
> > - put_user(kp->reserved2, &up->reserved2) ||
> > + put_user(kp->fence_fd, &up->fence_fd) ||
> > put_user(kp->reserved, &up->reserved) ||
> > put_user(kp->length, &up->length))
> > return -EFAULT;
> > diff --git a/drivers/media/v4l2-core/videobuf2-v4l2.c b/drivers/media/v4l2-core/videobuf2-v4l2.c
> > index 3529849..d23c1bf 100644
> > --- a/drivers/media/v4l2-core/videobuf2-v4l2.c
> > +++ b/drivers/media/v4l2-core/videobuf2-v4l2.c
> > @@ -203,7 +203,7 @@ static void __fill_v4l2_buffer(struct vb2_buffer *vb, void *pb)
> > b->timestamp = ns_to_timeval(vb->timestamp);
> > b->timecode = vbuf->timecode;
> > b->sequence = vbuf->sequence;
> > - b->reserved2 = 0;
> > + b->fence_fd = -1;
> > b->reserved = 0;
> >
> > if (q->is_multiplanar) {
> > diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> > index 45184a2..3b6cfa6 100644
> > --- a/include/uapi/linux/videodev2.h
> > +++ b/include/uapi/linux/videodev2.h
> > @@ -911,7 +911,7 @@ struct v4l2_buffer {
> > __s32 fd;
> > } m;
> > __u32 length;
> > - __u32 reserved2;
> > + __s32 fence_fd;
> > __u32 reserved;
> > };
> >
> > @@ -946,8 +946,10 @@ struct v4l2_buffer {
> > #define V4L2_BUF_FLAG_TSTAMP_SRC_MASK 0x00070000
> > #define V4L2_BUF_FLAG_TSTAMP_SRC_EOF 0x00000000
> > #define V4L2_BUF_FLAG_TSTAMP_SRC_SOE 0x00010000
> > +#define V4L2_BUF_FLAG_IN_FENCE 0x00100000
> > +#define V4L2_BUF_FLAG_OUT_FENCE 0x00200000
> > /* mem2mem encoder/decoder */
> > -#define V4L2_BUF_FLAG_LAST 0x00100000
> > +#define V4L2_BUF_FLAG_LAST 0x00400000
>
> This is not just a sentinel, it is a meaningful flag that must not be
> changed. It indicates the last buffer produced by a hardware codec.
Oh, I wouldn't have been able to figure that out myself! Thanks.
Gustavo
2017-04-03 Javier Martinez Canillas <[email protected]>:
> Hello Mauro and Gustavo,
>
> On 04/03/2017 07:16 AM, Mauro Carvalho Chehab wrote:
> > Hi Gustavo,
> >
> > Em Mon, 13 Mar 2017 16:20:25 -0300
> > Gustavo Padovan <[email protected]> escreveu:
> >
> >> From: Gustavo Padovan <[email protected]>
> >>
> >> Hi,
> >>
> >> This RFC adds support for Explicit Synchronization of shared buffers in V4L2.
> >> It uses the Sync File Framework[1] as vector to communicate the fences
> >> between kernel and userspace.
> >
> > Thanks for your work!
> >
> > I looked on your patchset, and I didn't notice anything really weird
> > there. So, instead on reviewing patch per patch, I prefer to discuss
> > about the requirements and API, as depending on it, the code base will
> > change a lot.
> >
>
> Agree that's better to first set on an uAPI and then implement based on that.
>
> > I'd like to do some tests with it on devices with mem2mem drivers.
> > My plan is to use an Exynos board for such thing, but I guess that
> > the DRM driver for it currently doesn't. I'm seeing internally if someone
> > could be sure that Exynos driver upstream will become ready for such
> > tests.
> >
>
> Not sure if you should try to do testing before agreeing on an uAPI and
> implementation.
>
> > Javier wrote some patches last year meant to implement implicit
> > fences support. What we noticed is that, while his mechanism worked
> > fine for pure capture and pure output devices, when we added a mem2mem
> > device, on a DMABUF+fences pipeline, e. g.:
> >
> > sensor -> [m2m] -> DRM
> >
> > End everything using fences/DMABUF, the fences mechanism caused dead
> > locks on existing userspace apps.
> >
> > A m2m device has both capture and output devnodes. Both should be
> > queued/dequeued. The capture queue is synchronized internally at the
> > driver with the output buffer[1].
> >
> > [1] The names here are counter-intuitive: "capture" is a devnode
> > where userspace receives a video stream; "output" is a devnode where
> > userspace feeds a video stream.
> >
> > The problem is that adding implicit fences changed the behavior of
> > the ioctls, causing gstreamer to wait forever for buffers to be ready.
> >
>
> The problem was related to trying to make user-space unaware of the implicit
> fences support, and so it tried to QBUF a buffer that had already a pending
> fence. A workaround was to block the second QBUF ioctl if the buffer had a
> pending fence, but this caused the mentioned deadlock since GStreamer wasn't
> expecting the QBUF ioctl to block.
>
> > I suspect that, even with explicit fences, the behavior of Q/DQ
> > will be incompatible with the current behavior (or will require some
> > dirty hacks to make it identical).
For QBUF the only difference is that we set flags for fences and pass
and receive in- and out-fences. For DQBUF the behavior is exactly the
same. What incompatibilities or hacks do you see?
My expectation was that the flags would let userspace learn about any
different behavior.
> >
> > So, IMHO, the best would be to use a new set of ioctls, when fences are
> > used (like VIDIOC_QFENCE/DQFENCE).
> >
>
> For explicit you can check if there's an input-fence so is different than
> implicit, but still I agree that it would be better to have specific ioctls.
I'm pretty new to v4l2 so I don't know all use cases yet, but what I
thought was to just add extra flags to QBUF to mark when fences are used,
instead of having userspace set up completely new ioctls for fences.
The burden on userspace should be smaller with flags.
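To make it concrete, userspace would do roughly the following (untested
sketch against this RFC's uAPI; I'm assuming the out-fence fd comes back
through the same fence_fd field that carries the in-fence):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Untested sketch against this RFC's uAPI: queue a buffer with an
 * in-fence and ask vb2 for an out-fence. Assumes the out-fence fd is
 * returned through the same fence_fd field. */
static int qbuf_with_fences(int vfd, __u32 index, int in_fence_fd,
			    int *out_fence_fd)
{
	struct v4l2_buffer buf;

	memset(&buf, 0, sizeof(buf));
	buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	buf.memory = V4L2_MEMORY_MMAP;
	buf.index = index;
	buf.fence_fd = in_fence_fd;
	buf.flags = V4L2_BUF_FLAG_IN_FENCE | V4L2_BUF_FLAG_OUT_FENCE;

	if (ioctl(vfd, VIDIOC_QBUF, &buf) < 0)
		return -1;

	/* The BUF_QUEUED event later tells which buffer index this
	 * out-fence is actually attached to. */
	*out_fence_fd = buf.fence_fd;
	return 0;
}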
>
> >>
> >> I'm sending this to start the discussion on the best approach to implement
> >> Explicit Synchronization, please check the TODO/OPEN section below.
> >>
> >> Explicit Synchronization allows us to control the synchronization of
> >> shared buffers from userspace by passing fences to the kernel and/or
> >> receiving them from the the kernel.
> >>
> >> Fences passed to the kernel are named in-fences and the kernel should wait
> >> them to signal before using the buffer. On the other side, the kernel creates
> >> out-fences for every buffer it receives from userspace. This fence is sent back
> >> to userspace and it will signal when the capture, for example, has finished.
> >>
> >> Signalling an out-fence in V4L2 would mean that the job on the buffer is done
> >> and the buffer can be used by other drivers.
> >>
> >> Current RFC implementation
> >> --------------------------
> >>
> >> The current implementation is not intended to be more than a PoC to start
> >> the discussion on how Explicit Synchronization should be supported in V4L2.
> >>
> >> The first patch proposes an userspace API for fences, then on patch 2
> >> we prepare to the addition of in-fences in patch 3, by introducing the
> >> infrastructure on vb2 to wait on an in-fence signal before queueing the buffer
> >> in the driver.
> >>
> >> Patch 4 fix uvc v4l2 event handling and patch 5 configure q->dev for vivid
> >> drivers to enable to subscribe and dequeue events on it.
> >>
> >> Patches 6-7 enables support to notify BUF_QUEUED events, i.e., let userspace
> >> know that particular buffer was enqueued in the driver. This is needed,
> >> because we return the out-fence fd as an out argument in QBUF, but at the time
> >> it returns we don't know to which buffer the fence will be attached thus
> >> the BUF_QUEUED event tells which buffer is associated to the fence received in
> >> QBUF by userspace.
> >>
> >> Patches 8 and 9 add more fence infrastructure to support out-fences and finally
> >> patch 10 adds support to out-fences.
> >>
> >> TODO/OPEN:
> >> ----------
> >>
> >> * For this first implementation we will keep the ordering of the buffers queued
> >> in videobuf2, that means we will only enqueue buffer whose fence was signalled
> >> if that buffer is the first one in the queue. Otherwise it has to wait until it
> >> is the first one. This is not implmented yet. Later we could create a flag to
> >> allow unordered queing in the drivers from vb2 if needed.
> >
> > The V4L2 spec doesn't warrant that the buffers will be dequeued at the
> > queue order.
> >
> > In practice, however, most drivers will not reorder. Yet, mem2mem codec
> > drivers may reorder the buffers at the output, as the luminance information
> > (Y) usually comes first on JPEG/MPEG-like formats.
> >
> >> * Should we have out-fences per-buffer or per-plane? or both? In this RFC, for
> >> simplicity they are per-buffer, but Mauro and Javier raised the option of
> >> doing per-plane fences. That could benefit mem2mem and V4L2 <-> GPU operation
> >> at least on cases when we have Capture hw that releases the Y frame before the
> >> other frames for example. When using V4L2 per-plane out-fences to communicate
> >> with KMS they would need to be merged together as currently the DRM Plane
> >> interface only supports one fence per DRM Plane.
> >
> > That's another advantage of using a new set of ioctls for queues: with that,
> > queuing/dequeing per plane will be easier. On codec drivers, doing it per
> > plane could bring performance improvements.
> >
>
> You don't really need to Q/DQ on a per plane basis AFAICT. Since on QBUF you
> can get a set of out-fences that can be passed to the other driver and so it
> should be able to wait per fence.
>
> >> In-fences should be per-buffer as the DRM only has per-buffer fences, but
>
> I'm not that familiar with DRM, but I thought DRM fences was also per plane
> and not per buffer.
A DRM plane is a different thing: it's a representation of a region on the
screen, and there is only one buffer for each DRM plane.
One of the questions I raised was: how to match V4L2 per-plane fences to
DRM per-buffer fences?
>
> How this works without fences? For V4L2 there's a dma-buf fd per plane and
> so I was expecting the DRM API to also import a dma-buf fd per DRM plane.
Yes. It should do something similar behind the Framebuffer abstraction.
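Something like this, I believe (untested libdrm sketch for NV12): one
dma-buf per color plane goes in, and KMS only ever sees the resulting
framebuffer object:

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>
#include <drm_fourcc.h>

/* Untested sketch: import one dma-buf per color plane (NV12: Y plane
 * plus interleaved CbCr plane) and hide both behind one framebuffer. */
static int import_nv12_fb(int drm_fd, int dbuf_fd[2],
			  uint32_t w, uint32_t h,
			  uint32_t pitch[2], uint32_t offset[2],
			  uint32_t *fb_id)
{
	uint32_t handles[4] = { 0 }, pitches[4] = { 0 }, offsets[4] = { 0 };

	if (drmPrimeFDToHandle(drm_fd, dbuf_fd[0], &handles[0]) ||
	    drmPrimeFDToHandle(drm_fd, dbuf_fd[1], &handles[1]))
		return -1;

	pitches[0] = pitch[0];  offsets[0] = offset[0];
	pitches[1] = pitch[1];  offsets[1] = offset[1];

	/* From here on KMS only sees the framebuffer, not the
	 * individual per-plane dma-bufs. */
	return drmModeAddFB2(drm_fd, w, h, DRM_FORMAT_NV12,
			     handles, pitches, offsets, fb_id, 0);
}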
>
> I only have access to an Exynos board whose display controller supports
> single plane formats, so I don't know how this works for multi planar.
>
> >> in case of mem2mem operations per-plane fences might be useful?
> >>
> >> So should we have both ways, per-plane and per-buffer, or just one of them
> >> for now?
> >
> > The API should be flexible enough to support both usecases. We could
> > implement just per-buffer in the beginning, but, on such case, we
> > should deploy an API that will allow to later add per-plane fences
> > without breaking userspace.
I believe we can just extend the per-plane parts of QBUF for their
fences. We could even use plane[0] for the per-buffer case.
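To illustrate what I mean (purely hypothetical, nothing here is in the
RFC; the per-plane fence array stands in for an invented fence_fd field
in struct v4l2_plane):

#include <linux/videodev2.h>

/* Hypothetical helper, not part of this RFC: if struct v4l2_plane grew
 * a fence fd, the per-buffer case could simply alias plane 0, keeping
 * one code path. plane_fence_fd[] stands in for that invented field;
 * b->fence_fd is the single-planar field this RFC adds. */
static inline int buf_fence_fd(const struct v4l2_buffer *b,
			       const __s32 *plane_fence_fd)
{
	if (V4L2_TYPE_IS_MULTIPLANAR(b->type))
		return plane_fence_fd[0];	/* per-buffer == plane 0 */
	return b->fence_fd;			/* single-planar field */
}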
> >
> > So, I prefer that, on multiplane fences, we have one fence per plane,
> > even if, at the first implementation, all fences will be released
> > at the same time, when the buffer is fully filled. That would allow
> > us to later improve it, without changing userspace.
>
> It's true that vb2 can't currently signal fences per plane since the interface
> between vb2 and drivers is per vb2_buffer. But the uAPI shouldn't be restricted
> by this implementation detail (that can be changed) and should support per plane
> fences IMHO.
>
> That's for example the case with the V4L2 dma-buf API. There is a dma-buf fd per
> plane, and internally for vb2 single planar buffers use the dma-buf associated
> with plane 0.
>
> Now when mentioning this, I noticed that in your implementation the fences are
> not associated with a dma-buf. I thought the idea was for the fences to be
> associated with a dma-buf's reservation object. If we do that, then fences will
> be per fence since the dma-buf/reservation objet are also per fence in v4l2/vb2.
Can you explain what you were thinking regarding the relation between
fences and reservation objects? I'm not sure I follow.
Gustavo
Hi Sakari,
2017-04-04 Sakari Ailus <[email protected]>:
> Hi Gustavo,
>
> Thank you for the patchset. Please see my comments below.
>
> On Mon, Mar 13, 2017 at 04:20:25PM -0300, Gustavo Padovan wrote:
> > From: Gustavo Padovan <[email protected]>
> >
> > Hi,
> >
> > This RFC adds support for Explicit Synchronization of shared buffers in V4L2.
> > It uses the Sync File Framework[1] as vector to communicate the fences
> > between kernel and userspace.
> >
> > I'm sending this to start the discussion on the best approach to implement
> > Explicit Synchronization, please check the TODO/OPEN section below.
> >
> > Explicit Synchronization allows us to control the synchronization of
> > shared buffers from userspace by passing fences to the kernel and/or
> > receiving them from the the kernel.
> >
> > Fences passed to the kernel are named in-fences and the kernel should wait
> > them to signal before using the buffer. On the other side, the kernel creates
> > out-fences for every buffer it receives from userspace. This fence is sent back
> > to userspace and it will signal when the capture, for example, has finished.
> >
> > Signalling an out-fence in V4L2 would mean that the job on the buffer is done
> > and the buffer can be used by other drivers.
>
> Shouldn't you be able to add two fences to the buffer, one in and one out?
> I.e. you'd have the buffer passed from another device to a V4L2 device and
> on to a third device.
>
> (Or, two fences per a plane, as you elaborated below.)
The out-fence should be created by V4L2 in this case, sent to userspace
and then passed on to the third device. Another option is what we've been
calling future fences in DRM, where we may have a syscall to create this
out-fence for us, so that we could pass both the in- and out-fence to the
device. But that can be supported later, along with what this RFC
proposes.
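e.g. handing the V4L2 out-fence on to KMS would look roughly like this
(untested sketch; the ids of the plane's "IN_FENCE_FD" and "FB_ID"
properties are assumed to have been looked up beforehand):

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Untested sketch: ask KMS to scan out fb_id only after the V4L2
 * out-fence signals. prop_in_fence_fd/prop_fb_id are the ids of the
 * plane's "IN_FENCE_FD" and "FB_ID" properties. */
static int flip_when_signalled(int drm_fd, uint32_t plane_id,
			       uint32_t prop_in_fence_fd, uint32_t prop_fb_id,
			       uint32_t fb_id, int v4l2_out_fence_fd)
{
	drmModeAtomicReq *req = drmModeAtomicAlloc();
	int ret;

	drmModeAtomicAddProperty(req, plane_id, prop_in_fence_fd,
				 v4l2_out_fence_fd);
	drmModeAtomicAddProperty(req, plane_id, prop_fb_id, fb_id);

	ret = drmModeAtomicCommit(drm_fd, req, DRM_MODE_ATOMIC_NONBLOCK, NULL);
	drmModeAtomicFree(req);
	return ret;
}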
>
> >
> > Current RFC implementation
> > --------------------------
> >
> > The current implementation is not intended to be more than a PoC to start
> > the discussion on how Explicit Synchronization should be supported in V4L2.
> >
> > The first patch proposes an userspace API for fences, then on patch 2
> > we prepare to the addition of in-fences in patch 3, by introducing the
> > infrastructure on vb2 to wait on an in-fence signal before queueing the buffer
> > in the driver.
> >
> > Patch 4 fix uvc v4l2 event handling and patch 5 configure q->dev for vivid
> > drivers to enable to subscribe and dequeue events on it.
> >
> > Patches 6-7 enables support to notify BUF_QUEUED events, i.e., let userspace
> > know that particular buffer was enqueued in the driver. This is needed,
> > because we return the out-fence fd as an out argument in QBUF, but at the time
> > it returns we don't know to which buffer the fence will be attached thus
> > the BUF_QUEUED event tells which buffer is associated to the fence received in
> > QBUF by userspace.
> >
> > Patches 8 and 9 add more fence infrastructure to support out-fences and finally
> > patch 10 adds support to out-fences.
> >
> > TODO/OPEN:
> > ----------
> >
> > * For this first implementation we will keep the ordering of the buffers queued
> > in videobuf2, that means we will only enqueue buffer whose fence was signalled
> > if that buffer is the first one in the queue. Otherwise it has to wait until it
> > is the first one. This is not implmented yet. Later we could create a flag to
> > allow unordered queing in the drivers from vb2 if needed.
> >
> > * Should we have out-fences per-buffer or per-plane? or both? In this RFC, for
> > simplicity they are per-buffer, but Mauro and Javier raised the option of
> > doing per-plane fences. That could benefit mem2mem and V4L2 <-> GPU operation
> > at least on cases when we have Capture hw that releases the Y frame before the
> > other frames for example. When using V4L2 per-plane out-fences to communicate
> > with KMS they would need to be merged together as currently the DRM Plane
> > interface only supports one fence per DRM Plane.
> >
> > In-fences should be per-buffer as the DRM only has per-buffer fences, but
> > in case of mem2mem operations per-plane fences might be useful?
> >
> > So should we have both ways, per-plane and per-buffer, or just one of them
> > for now?
>
> The data_offset field is only present in struct v4l2_plane, i.e. it is only
> available through using the multi-planar API even if you just have a single
> plane.
I didn't get why you mentioned the data_offset field. :)
Gustavo
Hello Gustavo,
On 04/05/2017 11:09 AM, Gustavo Padovan wrote:
> 2017-04-03 Javier Martinez Canillas <[email protected]>:
>
>> Hello Mauro and Gustavo,
>>
>> On 04/03/2017 07:16 AM, Mauro Carvalho Chehab wrote:
>>> Hi Gustavo,
>>>
>>> Em Mon, 13 Mar 2017 16:20:25 -0300
>>> Gustavo Padovan <[email protected]> escreveu:
>>>
>>>> From: Gustavo Padovan <[email protected]>
>>>>
>>>> Hi,
>>>>
>>>> This RFC adds support for Explicit Synchronization of shared buffers in V4L2.
>>>> It uses the Sync File Framework[1] as vector to communicate the fences
>>>> between kernel and userspace.
>>>
>>> Thanks for your work!
>>>
>>> I looked on your patchset, and I didn't notice anything really weird
>>> there. So, instead on reviewing patch per patch, I prefer to discuss
>>> about the requirements and API, as depending on it, the code base will
>>> change a lot.
>>>
>>
>> Agree that's better to first set on an uAPI and then implement based on that.
>>
>>> I'd like to do some tests with it on devices with mem2mem drivers.
>>> My plan is to use an Exynos board for such thing, but I guess that
>>> the DRM driver for it currently doesn't. I'm seeing internally if someone
>>> could be sure that Exynos driver upstream will become ready for such
>>> tests.
>>>
>>
>> Not sure if you should try to do testing before agreeing on an uAPI and
>> implementation.
>>
>>> Javier wrote some patches last year meant to implement implicit
>>> fences support. What we noticed is that, while his mechanism worked
>>> fine for pure capture and pure output devices, when we added a mem2mem
>>> device, on a DMABUF+fences pipeline, e. g.:
>>>
>>> sensor -> [m2m] -> DRM
>>>
>>> End everything using fences/DMABUF, the fences mechanism caused dead
>>> locks on existing userspace apps.
>>>
>>> A m2m device has both capture and output devnodes. Both should be
>>> queued/dequeued. The capture queue is synchronized internally at the
>>> driver with the output buffer[1].
>>>
>>> [1] The names here are counter-intuitive: "capture" is a devnode
>>> where userspace receives a video stream; "output" is a devnode where
>>> userspace feeds a video stream.
>>>
>>> The problem is that adding implicit fences changed the behavior of
>>> the ioctls, causing gstreamer to wait forever for buffers to be ready.
>>>
>>
>> The problem was related to trying to make user-space unaware of the implicit
>> fences support, and so it tried to QBUF a buffer that had already a pending
>> fence. A workaround was to block the second QBUF ioctl if the buffer had a
>> pending fence, but this caused the mentioned deadlock since GStreamer wasn't
>> expecting the QBUF ioctl to block.
>>
>>> I suspect that, even with explicit fences, the behavior of Q/DQ
>>> will be incompatible with the current behavior (or will require some
>>> dirty hacks to make it identical).
>
> For QBUF the only difference is that we set flags for fences and pass
> and receives in and out fences. For DQBUF the behavior is exactly the
> same. What incompatibles or hacks do you see?
>
> I had the expectation that the flags would be for userspace to learn
> about any different behavior.
>
>>>
>>> So, IMHO, the best would be to use a new set of ioctls, when fences are
>>> used (like VIDIOC_QFENCE/DQFENCE).
>>>
>>
>> For explicit you can check if there's an input-fence so is different than
>> implicit, but still I agree that it would be better to have specific ioctls.
>
> I'm pretty new to v4l2 so I don't know all use cases yet, but what I
> thought was to just add extra flags to QBUF to mark when using fences
> instead of having userspace to setup completely new ioctls for fences.
> The burden for userspace should be smaller with flags.
>
Yes, you are right. So I guess it's better indeed to just extend the current
ioctls as you propose and only move to new ones if really needed.
>>
>>>>
>>>> I'm sending this to start the discussion on the best approach to implement
>>>> Explicit Synchronization, please check the TODO/OPEN section below.
>>>>
>>>> Explicit Synchronization allows us to control the synchronization of
>>>> shared buffers from userspace by passing fences to the kernel and/or
>>>> receiving them from the the kernel.
>>>>
>>>> Fences passed to the kernel are named in-fences and the kernel should wait
>>>> them to signal before using the buffer. On the other side, the kernel creates
>>>> out-fences for every buffer it receives from userspace. This fence is sent back
>>>> to userspace and it will signal when the capture, for example, has finished.
>>>>
>>>> Signalling an out-fence in V4L2 would mean that the job on the buffer is done
>>>> and the buffer can be used by other drivers.
>>>>
>>>> Current RFC implementation
>>>> --------------------------
>>>>
>>>> The current implementation is not intended to be more than a PoC to start
>>>> the discussion on how Explicit Synchronization should be supported in V4L2.
>>>>
>>>> The first patch proposes an userspace API for fences, then on patch 2
>>>> we prepare to the addition of in-fences in patch 3, by introducing the
>>>> infrastructure on vb2 to wait on an in-fence signal before queueing the buffer
>>>> in the driver.
>>>>
>>>> Patch 4 fix uvc v4l2 event handling and patch 5 configure q->dev for vivid
>>>> drivers to enable to subscribe and dequeue events on it.
>>>>
>>>> Patches 6-7 enables support to notify BUF_QUEUED events, i.e., let userspace
>>>> know that particular buffer was enqueued in the driver. This is needed,
>>>> because we return the out-fence fd as an out argument in QBUF, but at the time
>>>> it returns we don't know to which buffer the fence will be attached thus
>>>> the BUF_QUEUED event tells which buffer is associated to the fence received in
>>>> QBUF by userspace.
>>>>
>>>> Patches 8 and 9 add more fence infrastructure to support out-fences and finally
>>>> patch 10 adds support to out-fences.
>>>>
>>>> TODO/OPEN:
>>>> ----------
>>>>
>>>> * For this first implementation we will keep the ordering of the buffers queued
>>>> in videobuf2, that means we will only enqueue buffer whose fence was signalled
>>>> if that buffer is the first one in the queue. Otherwise it has to wait until it
>>>> is the first one. This is not implmented yet. Later we could create a flag to
>>>> allow unordered queing in the drivers from vb2 if needed.
>>>
>>> The V4L2 spec doesn't warrant that the buffers will be dequeued at the
>>> queue order.
>>>
>>> In practice, however, most drivers will not reorder. Yet, mem2mem codec
>>> drivers may reorder the buffers at the output, as the luminance information
>>> (Y) usually comes first on JPEG/MPEG-like formats.
>>>
>>>> * Should we have out-fences per-buffer or per-plane? or both? In this RFC, for
>>>> simplicity they are per-buffer, but Mauro and Javier raised the option of
>>>> doing per-plane fences. That could benefit mem2mem and V4L2 <-> GPU operation
>>>> at least on cases when we have Capture hw that releases the Y frame before the
>>>> other frames for example. When using V4L2 per-plane out-fences to communicate
>>>> with KMS they would need to be merged together as currently the DRM Plane
>>>> interface only supports one fence per DRM Plane.
>>>
>>> That's another advantage of using a new set of ioctls for queues: with that,
>>> queuing/dequeing per plane will be easier. On codec drivers, doing it per
>>> plane could bring performance improvements.
>>>
>>
>> You don't really need to Q/DQ on a per plane basis AFAICT. Since on QBUF you
>> can get a set of out-fences that can be passed to the other driver and so it
>> should be able to wait per fence.
>>
>>>> In-fences should be per-buffer as the DRM only has per-buffer fences, but
>>
>> I'm not that familiar with DRM, but I thought DRM fences was also per plane
>> and not per buffer.
>
> DRM plane is a different thing, its a representation of a region on the
> screen and there is only one buffer for each DRM plane.
>
Yes, but you can still have different DRM planes, right? For example an RGB
primary plane and a YUV overlay plane for video display. That's why I thought
there would be some mapping between the v4l2 planes and DRM planes, since
there's a way for DRM to import dma-bufs exported by v4l2 and vice versa.
> One of the questions I raised was: how to match V4L2 per-plane fences to
> DRM per-buffer fences?
>
I don't think that DRM fences should be per-buffer but instead per dma-buf.
It's OK if there's a 1:1 mapping between DRM buffer and dma-buf, but that's
not the case for v4l2: you can have many dma-bufs in a v4l2_buffer, one
per v4l2 plane.
IOW, it shouldn't be about DRM buffers waiting for fences associated with
v4l2 buffers, but instead about DRM waiting for fences associated with
dma-bufs that were exported by v4l2 and imported by DRM.
>>
>> How this works without fences? For V4L2 there's a dma-buf fd per plane and
>> so I was expecting the DRM API to also import a dma-buf fd per DRM plane.
>
> Yes. It should do something similar behind the Framebuffer abstraction.
>
>>
>> I only have access to an Exynos board whose display controller supports
>> single plane formats, so I don't know how this works for multi planar.
>>
>>>> in case of mem2mem operations per-plane fences might be useful?
>>>>
>>>> So should we have both ways, per-plane and per-buffer, or just one of them
>>>> for now?
>>>
>>> The API should be flexible enough to support both usecases. We could
>>> implement just per-buffer in the beginning, but, on such case, we
>>> should deploy an API that will allow to later add per-plane fences
>>> without breaking userspace.
>
> I believe we can just extend the per-plane parts of QBUF for their
> fences. We could even use plane[0] for the per-buffer case.
>
Agreed.
>>>
>>> So, I prefer that, on multiplane fences, we have one fence per plane,
>>> even if, at the first implementation, all fences will be released
>>> at the same time, when the buffer is fully filled. That would allow
>>> us to later improve it, without changing userspace.
>>
>> It's true that vb2 can't currently signal fences per plane since the interface
>> between vb2 and drivers is per vb2_buffer. But the uAPI shouldn't be restricted
>> by this implementation detail (that can be changed) and should support per plane
>> fences IMHO.
>>
>> That's for example the case with the V4L2 dma-buf API. There is a dma-buf fd per
>> plane, and internally for vb2 single planar buffers use the dma-buf associated
>> with plane 0.
>>
>> Now when mentioning this, I noticed that in your implementation the fences are
>> not associated with a dma-buf. I thought the idea was for the fences to be
>> associated with a dma-buf's reservation object. If we do that, then fences will
>> be per fence since the dma-buf/reservation objet are also per fence in v4l2/vb2.
>
Argh, I noticed that I made a lot of typos here. What I meant was:
"then fences will be per v4l2 plane since the dma-buf/reservation object are also
per v4l2 plane in v4l2/vb2."
So what I tried to say is that each v4l2 plane has an associated dma-buf, and
so if fences are per plane, they will also be per dma-buf. I guess the DRM side
is similar: you have dma-bufs associated with DRM planes that fences should
protect.
> Can you explain what you were thinking on the relation between fences
> and reservation objects? Not sure I follow.
>
As mentioned above, I believe that fences should be associated with dma-bufs.
A dma-buf already has a reservation object that can be used to attach fences
to it. You can add shared or exclusive fences, and they can be looked up by
the other driver.
That's what implicit fencing uses, and I believe explicit fencing should also
use this infrastructure and export these fences with sync_file_create() as
you do.
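Something along these lines is what I had in mind (untested kernel-side
sketch):

#include <linux/dma-buf.h>
#include <linux/dma-fence.h>
#include <linux/reservation.h>
#include <linux/sync_file.h>

/* Untested sketch: attach the out-fence to the dma-buf's reservation
 * object (so implicit users also see it) and export it as a sync_file
 * for explicit userspace. */
static struct sync_file *export_buf_fence(struct dma_buf *dbuf,
					  struct dma_fence *fence)
{
	struct reservation_object *resv = dbuf->resv;

	ww_mutex_lock(&resv->lock, NULL);
	reservation_object_add_excl_fence(resv, fence);
	ww_mutex_unlock(&resv->lock);

	return sync_file_create(fence);
}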
> Gustavo
>
Best regards,
--
Javier Martinez Canillas
Open Source Group
Samsung Research America
Hi Gustavo,
On Wed, Apr 05, 2017 at 05:24:57PM +0200, Gustavo Padovan wrote:
> Hi Sakari,
>
> 2017-04-04 Sakari Ailus <[email protected]>:
>
> > Hi Gustavo,
> >
> > Thank you for the patchset. Please see my comments below.
> >
> > On Mon, Mar 13, 2017 at 04:20:25PM -0300, Gustavo Padovan wrote:
> > > From: Gustavo Padovan <[email protected]>
> > >
> > > Hi,
> > >
> > > This RFC adds support for Explicit Synchronization of shared buffers in V4L2.
> > > It uses the Sync File Framework[1] as vector to communicate the fences
> > > between kernel and userspace.
> > >
> > > I'm sending this to start the discussion on the best approach to implement
> > > Explicit Synchronization, please check the TODO/OPEN section below.
> > >
> > > Explicit Synchronization allows us to control the synchronization of
> > > shared buffers from userspace by passing fences to the kernel and/or
> > > receiving them from the the kernel.
> > >
> > > Fences passed to the kernel are named in-fences and the kernel should wait
> > > them to signal before using the buffer. On the other side, the kernel creates
> > > out-fences for every buffer it receives from userspace. This fence is sent back
> > > to userspace and it will signal when the capture, for example, has finished.
> > >
> > > Signalling an out-fence in V4L2 would mean that the job on the buffer is done
> > > and the buffer can be used by other drivers.
> >
> > Shouldn't you be able to add two fences to the buffer, one in and one out?
> > I.e. you'd have the buffer passed from another device to a V4L2 device and
> > on to a third device.
> >
> > (Or, two fences per a plane, as you elaborated below.)
>
> The out one should be created by V4L2 in this case, sent to userspace
> and then sent to third device. Another options is what we've been
> calling future fences in DRM. Where we may have a syscall to create this
> out-fence for us and then we could pass both in and out fence to the
> device. But that can be supported later along with what this RFC
> proposes.
Please excuse my ignorance on fences.
I just wanted to make sure that case was also considered. struct v4l2_buffer
will run out of space soon, so we'll need a replacement anyway. The timecode
field is still available for re-use...
>
>
> >
> > >
> > > Current RFC implementation
> > > --------------------------
> > >
> > > The current implementation is not intended to be more than a PoC to start
> > > the discussion on how Explicit Synchronization should be supported in V4L2.
> > >
> > > The first patch proposes an userspace API for fences, then on patch 2
> > > we prepare to the addition of in-fences in patch 3, by introducing the
> > > infrastructure on vb2 to wait on an in-fence signal before queueing the buffer
> > > in the driver.
> > >
> > > Patch 4 fix uvc v4l2 event handling and patch 5 configure q->dev for vivid
> > > drivers to enable to subscribe and dequeue events on it.
> > >
> > > Patches 6-7 enables support to notify BUF_QUEUED events, i.e., let userspace
> > > know that particular buffer was enqueued in the driver. This is needed,
> > > because we return the out-fence fd as an out argument in QBUF, but at the time
> > > it returns we don't know to which buffer the fence will be attached thus
> > > the BUF_QUEUED event tells which buffer is associated to the fence received in
> > > QBUF by userspace.
> > >
> > > Patches 8 and 9 add more fence infrastructure to support out-fences and finally
> > > patch 10 adds support to out-fences.
> > >
> > > TODO/OPEN:
> > > ----------
> > >
> > > * For this first implementation we will keep the ordering of the buffers queued
> > > in videobuf2, that means we will only enqueue buffer whose fence was signalled
> > > if that buffer is the first one in the queue. Otherwise it has to wait until it
> > > is the first one. This is not implmented yet. Later we could create a flag to
> > > allow unordered queing in the drivers from vb2 if needed.
> > >
> > > * Should we have out-fences per-buffer or per-plane? or both? In this RFC, for
> > > simplicity they are per-buffer, but Mauro and Javier raised the option of
> > > doing per-plane fences. That could benefit mem2mem and V4L2 <-> GPU operation
> > > at least on cases when we have Capture hw that releases the Y frame before the
> > > other frames for example. When using V4L2 per-plane out-fences to communicate
> > > with KMS they would need to be merged together as currently the DRM Plane
> > > interface only supports one fence per DRM Plane.
> > >
> > > In-fences should be per-buffer as the DRM only has per-buffer fences, but
> > > in case of mem2mem operations per-plane fences might be useful?
> > >
> > > So should we have both ways, per-plane and per-buffer, or just one of them
> > > for now?
> >
> > The data_offset field is only present in struct v4l2_plane, i.e. it is only
> > available through using the multi-planar API even if you just have a single
> > plane.
>
> I didn't get why you mentioned the data_offset field. :)
I think I meant to continue this but didn't end up writing it down. :-)
What I wanted to say is that the multi-plane API is a super-set of the
single-plane API, and there's already a precedent for not all the
functionality being available through the single-plane API. At least I'm OK
with adding the per-plane fences to the multi-plane API only.
The framework could possibly do more to support the single-plane API as an
application interface, so that applications using only the single-plane API
would get that as a bonus. (Just thinking out loud; definitely out of scope
for this patchset.)
--
Kind regards,
Sakari Ailus
e-mail: [email protected] XMPP: [email protected]
Hi Javier,
2017-04-05 Javier Martinez Canillas <[email protected]>:
> Hello Gustavo,
>
> On 04/05/2017 11:09 AM, Gustavo Padovan wrote:
> > 2017-04-03 Javier Martinez Canillas <[email protected]>:
> >
> >> Hello Mauro and Gustavo,
> >>
> >> On 04/03/2017 07:16 AM, Mauro Carvalho Chehab wrote:
> >>> Hi Gustavo,
> >>>
> >>> Em Mon, 13 Mar 2017 16:20:25 -0300
> >>> Gustavo Padovan <[email protected]> escreveu:
> >>>
> >>>> From: Gustavo Padovan <[email protected]>
> >>>>
> >>>> Hi,
> >>>>
> >>>> This RFC adds support for Explicit Synchronization of shared buffers in V4L2.
> >>>> It uses the Sync File Framework[1] as vector to communicate the fences
> >>>> between kernel and userspace.
> >>>
> >>> Thanks for your work!
> >>>
> >>> I looked on your patchset, and I didn't notice anything really weird
> >>> there. So, instead on reviewing patch per patch, I prefer to discuss
> >>> about the requirements and API, as depending on it, the code base will
> >>> change a lot.
> >>>
> >>
> >> Agree that's better to first set on an uAPI and then implement based on that.
> >>
> >>> I'd like to do some tests with it on devices with mem2mem drivers.
> >>> My plan is to use an Exynos board for such thing, but I guess that
> >>> the DRM driver for it currently doesn't. I'm seeing internally if someone
> >>> could be sure that Exynos driver upstream will become ready for such
> >>> tests.
> >>>
> >>
> >> Not sure if you should try to do testing before agreeing on an uAPI and
> >> implementation.
> >>
> >>> Javier wrote some patches last year meant to implement implicit
> >>> fences support. What we noticed is that, while his mechanism worked
> >>> fine for pure capture and pure output devices, when we added a mem2mem
> >>> device, on a DMABUF+fences pipeline, e. g.:
> >>>
> >>> sensor -> [m2m] -> DRM
> >>>
> >>> End everything using fences/DMABUF, the fences mechanism caused dead
> >>> locks on existing userspace apps.
> >>>
> >>> A m2m device has both capture and output devnodes. Both should be
> >>> queued/dequeued. The capture queue is synchronized internally at the
> >>> driver with the output buffer[1].
> >>>
> >>> [1] The names here are counter-intuitive: "capture" is a devnode
> >>> where userspace receives a video stream; "output" is a devnode where
> >>> userspace feeds a video stream.
> >>>
> >>> The problem is that adding implicit fences changed the behavior of
> >>> the ioctls, causing gstreamer to wait forever for buffers to be ready.
> >>>
> >>
> >> The problem was related to trying to make user-space unaware of the implicit
> >> fences support, and so it tried to QBUF a buffer that had already a pending
> >> fence. A workaround was to block the second QBUF ioctl if the buffer had a
> >> pending fence, but this caused the mentioned deadlock since GStreamer wasn't
> >> expecting the QBUF ioctl to block.
> >>
> >>> I suspect that, even with explicit fences, the behavior of Q/DQ
> >>> will be incompatible with the current behavior (or will require some
> >>> dirty hacks to make it identical).
> >
> > For QBUF the only difference is that we set flags for fences and pass
> > and receives in and out fences. For DQBUF the behavior is exactly the
> > same. What incompatibles or hacks do you see?
> >
> > I had the expectation that the flags would be for userspace to learn
> > about any different behavior.
> >
> >>>
> >>> So, IMHO, the best would be to use a new set of ioctls, when fences are
> >>> used (like VIDIOC_QFENCE/DQFENCE).
> >>>
> >>
> >> For explicit you can check if there's an input-fence so is different than
> >> implicit, but still I agree that it would be better to have specific ioctls.
> >
> > I'm pretty new to v4l2 so I don't know all use cases yet, but what I
> > thought was to just add extra flags to QBUF to mark when using fences
> > instead of having userspace to setup completely new ioctls for fences.
> > The burden for userspace should be smaller with flags.
> >
>
> Yes, you are right. So I guess its better indeed to just extend the current
> ioctls as you propose and only move to new ones if really needed.
>
> >>
> >>>>
> >>>> I'm sending this to start the discussion on the best approach to implement
> >>>> Explicit Synchronization, please check the TODO/OPEN section below.
> >>>>
> >>>> Explicit Synchronization allows us to control the synchronization of
> >>>> shared buffers from userspace by passing fences to the kernel and/or
> >>>> receiving them from the the kernel.
> >>>>
> >>>> Fences passed to the kernel are named in-fences and the kernel should wait
> >>>> them to signal before using the buffer. On the other side, the kernel creates
> >>>> out-fences for every buffer it receives from userspace. This fence is sent back
> >>>> to userspace and it will signal when the capture, for example, has finished.
> >>>>
> >>>> Signalling an out-fence in V4L2 would mean that the job on the buffer is done
> >>>> and the buffer can be used by other drivers.
> >>>>
> >>>> Current RFC implementation
> >>>> --------------------------
> >>>>
> >>>> The current implementation is not intended to be more than a PoC to start
> >>>> the discussion on how Explicit Synchronization should be supported in V4L2.
> >>>>
> >>>> The first patch proposes an userspace API for fences, then on patch 2
> >>>> we prepare to the addition of in-fences in patch 3, by introducing the
> >>>> infrastructure on vb2 to wait on an in-fence signal before queueing the buffer
> >>>> in the driver.
> >>>>
> >>>> Patch 4 fix uvc v4l2 event handling and patch 5 configure q->dev for vivid
> >>>> drivers to enable to subscribe and dequeue events on it.
> >>>>
> >>>> Patches 6-7 enables support to notify BUF_QUEUED events, i.e., let userspace
> >>>> know that particular buffer was enqueued in the driver. This is needed,
> >>>> because we return the out-fence fd as an out argument in QBUF, but at the time
> >>>> it returns we don't know to which buffer the fence will be attached thus
> >>>> the BUF_QUEUED event tells which buffer is associated to the fence received in
> >>>> QBUF by userspace.
> >>>>
> >>>> Patches 8 and 9 add more fence infrastructure to support out-fences and finally
> >>>> patch 10 adds support to out-fences.
> >>>>
> >>>> TODO/OPEN:
> >>>> ----------
> >>>>
> >>>> * For this first implementation we will keep the ordering of the buffers queued
> >>>> in videobuf2, that means we will only enqueue buffer whose fence was signalled
> >>>> if that buffer is the first one in the queue. Otherwise it has to wait until it
> >>>> is the first one. This is not implmented yet. Later we could create a flag to
> >>>> allow unordered queing in the drivers from vb2 if needed.
> >>>
> >>> The V4L2 spec doesn't warrant that the buffers will be dequeued at the
> >>> queue order.
> >>>
> >>> In practice, however, most drivers will not reorder. Yet, mem2mem codec
> >>> drivers may reorder the buffers at the output, as the luminance information
> >>> (Y) usually comes first on JPEG/MPEG-like formats.
> >>>
> >>>> * Should we have out-fences per-buffer or per-plane? or both? In this RFC, for
> >>>> simplicity they are per-buffer, but Mauro and Javier raised the option of
> >>>> doing per-plane fences. That could benefit mem2mem and V4L2 <-> GPU operation
> >>>> at least on cases when we have Capture hw that releases the Y frame before the
> >>>> other frames for example. When using V4L2 per-plane out-fences to communicate
> >>>> with KMS they would need to be merged together as currently the DRM Plane
> >>>> interface only supports one fence per DRM Plane.
> >>>
> >>> That's another advantage of using a new set of ioctls for queues: with that,
> >>> queuing/dequeing per plane will be easier. On codec drivers, doing it per
> >>> plane could bring performance improvements.
> >>>
> >>
> >> You don't really need to Q/DQ on a per plane basis AFAICT. Since on QBUF you
> >> can get a set of out-fences that can be passed to the other driver and so it
> >> should be able to wait per fence.
> >>
> >>>> In-fences should be per-buffer as the DRM only has per-buffer fences, but
> >>
> >> I'm not that familiar with DRM, but I thought DRM fences was also per plane
> >> and not per buffer.
> >
> > DRM plane is a different thing, its a representation of a region on the
> > screen and there is only one buffer for each DRM plane.
> >
>
> Yes, but you can still have different DRM planes, right? For example a RGB
> primary plane and a YUV overlay plane for video display. That's why I thought
> there would be some mapping between the v4l2 planes and DRM planes since there's
> a way for DRM to import dma-bufs exported by v4l2 and vice-versa.
>
> > One of the questions I raised was: how to match V4L2 per-plane fences to
> > DRM per-buffer fences?
> >
>
> I don't think that DRM fences should be per-buffer but instead per dma-buf.
> It's OK if there's a 1:1 mapping between DRM buffer and dma-buf, but that's
> not the case for v4l2. You can have many dma-bufs in a v4l2_buffer, one per
> each v4l2 plane.
>
> IOW, it shouldn't be about DRM buffers waiting for fences associated with
> v4l2 buffers, but instead about DRM waiting for fences associated with
> dma-bufs that were exported by v4l2 and imported by DRM.
The explicit synchronization implementation on DRM never deals directly
with reservation objects; that is what implicit synchronization does.
Strictly speaking, in explicit synchronization we associate the fence with
a job, and the job can be anything. DRM in-fences are per-plane, and the
signaling of the fence tells us the framebuffer is ready to be scanned out.
Out-fences are per-CRTC (which covers many dma-bufs). So making them
per dma-buf only for V4L2 limits our options for solving problems.
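For the record, the per-CRTC out-fence on the KMS side looks roughly like
this (untested sketch; prop_out_fence_ptr is the id of the CRTC's
"OUT_FENCE_PTR" property, looked up elsewhere):

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Untested sketch: request a per-CRTC out-fence; the kernel writes a
 * new fence fd into out_fd when the atomic commit is accepted. */
static int commit_with_out_fence(int drm_fd, drmModeAtomicReq *req,
				 uint32_t crtc_id,
				 uint32_t prop_out_fence_ptr)
{
	int32_t out_fd = -1;

	drmModeAtomicAddProperty(req, crtc_id, prop_out_fence_ptr,
				 (uint64_t)(uintptr_t)&out_fd);
	if (drmModeAtomicCommit(drm_fd, req, DRM_MODE_ATOMIC_NONBLOCK, NULL))
		return -1;

	return out_fd;	/* signals when this frame hits the screen */
}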
So instead, let's look at what problems we want to solve and how sane the
userspace API can be.
In which situations would per-plane fences on V4L2 be an advantage? Not
all userspace applications will need that, and per-buffer is much simpler
as an API.
As discussed earlier, we should do it in incremental steps: support
per-v4l2-buffer fences first, and once that is done we can add per-plane
on top.
Gustavo
Hello Gustavo,
On 04/06/2017 10:08 AM, Gustavo Padovan wrote:
> Hi Javier,
>
> 2017-04-05 Javier Martinez Canillas <[email protected]>:
>
>> Hello Gustavo,
>>
>> On 04/05/2017 11:09 AM, Gustavo Padovan wrote:
>>> 2017-04-03 Javier Martinez Canillas <[email protected]>:
>>>
>>>> Hello Mauro and Gustavo,
>>>>
>>>> On 04/03/2017 07:16 AM, Mauro Carvalho Chehab wrote:
>>>>> Hi Gustavo,
>>>>>
>>>>> Em Mon, 13 Mar 2017 16:20:25 -0300
>>>>> Gustavo Padovan <[email protected]> escreveu:
>>>>>
>>>>>> From: Gustavo Padovan <[email protected]>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This RFC adds support for Explicit Synchronization of shared buffers in V4L2.
>>>>>> It uses the Sync File Framework[1] as vector to communicate the fences
>>>>>> between kernel and userspace.
>>>>>
>>>>> Thanks for your work!
>>>>>
>>>>> I looked on your patchset, and I didn't notice anything really weird
>>>>> there. So, instead on reviewing patch per patch, I prefer to discuss
>>>>> about the requirements and API, as depending on it, the code base will
>>>>> change a lot.
>>>>>
>>>>
>>>> Agree that's better to first set on an uAPI and then implement based on that.
>>>>
>>>>> I'd like to do some tests with it on devices with mem2mem drivers.
>>>>> My plan is to use an Exynos board for such thing, but I guess that
>>>>> the DRM driver for it currently doesn't. I'm seeing internally if someone
>>>>> could be sure that Exynos driver upstream will become ready for such
>>>>> tests.
>>>>>
>>>>
>>>> Not sure if you should try to do testing before agreeing on an uAPI and
>>>> implementation.
>>>>
>>>>> Javier wrote some patches last year meant to implement implicit
>>>>> fences support. What we noticed is that, while his mechanism worked
>>>>> fine for pure capture and pure output devices, when we added a mem2mem
>>>>> device, on a DMABUF+fences pipeline, e. g.:
>>>>>
>>>>> sensor -> [m2m] -> DRM
>>>>>
>>>>> End everything using fences/DMABUF, the fences mechanism caused dead
>>>>> locks on existing userspace apps.
>>>>>
>>>>> A m2m device has both capture and output devnodes. Both should be
>>>>> queued/dequeued. The capture queue is synchronized internally at the
>>>>> driver with the output buffer[1].
>>>>>
>>>>> [1] The names here are counter-intuitive: "capture" is a devnode
>>>>> where userspace receives a video stream; "output" is a devnode where
>>>>> userspace feeds a video stream.
>>>>>
>>>>> The problem is that adding implicit fences changed the behavior of
>>>>> the ioctls, causing gstreamer to wait forever for buffers to be ready.
>>>>>
>>>>
>>>> The problem was related to trying to make user-space unaware of the implicit
>>>> fences support, and so it tried to QBUF a buffer that had already a pending
>>>> fence. A workaround was to block the second QBUF ioctl if the buffer had a
>>>> pending fence, but this caused the mentioned deadlock since GStreamer wasn't
>>>> expecting the QBUF ioctl to block.
>>>>
>>>>> I suspect that, even with explicit fences, the behavior of Q/DQ
>>>>> will be incompatible with the current behavior (or will require some
>>>>> dirty hacks to make it identical).
>>>
>>> For QBUF the only difference is that we set flags for fences and pass
>>> and receives in and out fences. For DQBUF the behavior is exactly the
>>> same. What incompatibles or hacks do you see?
>>>
>>> I had the expectation that the flags would be for userspace to learn
>>> about any different behavior.
>>>
>>>>>
>>>>> So, IMHO, the best would be to use a new set of ioctls, when fences are
>>>>> used (like VIDIOC_QFENCE/DQFENCE).
>>>>>
>>>>
>>>> For explicit you can check if there's an input-fence so is different than
>>>> implicit, but still I agree that it would be better to have specific ioctls.
>>>
>>> I'm pretty new to v4l2 so I don't know all use cases yet, but what I
>>> thought was to just add extra flags to QBUF to mark when using fences
>>> instead of having userspace to setup completely new ioctls for fences.
>>> The burden for userspace should be smaller with flags.
>>>
>>
>> Yes, you are right. So I guess its better indeed to just extend the current
>> ioctls as you propose and only move to new ones if really needed.
>>
>>>>
>>>>>>
>>>>>> I'm sending this to start the discussion on the best approach to implement
>>>>>> Explicit Synchronization, please check the TODO/OPEN section below.
>>>>>>
>>>>>> Explicit Synchronization allows us to control the synchronization of
>>>>>> shared buffers from userspace by passing fences to the kernel and/or
>>>>>> receiving them from the the kernel.
>>>>>>
>>>>>> Fences passed to the kernel are named in-fences and the kernel should wait
>>>>>> them to signal before using the buffer. On the other side, the kernel creates
>>>>>> out-fences for every buffer it receives from userspace. This fence is sent back
>>>>>> to userspace and it will signal when the capture, for example, has finished.
>>>>>>
>>>>>> Signalling an out-fence in V4L2 would mean that the job on the buffer is done
>>>>>> and the buffer can be used by other drivers.
>>>>>>
>>>>>> Current RFC implementation
>>>>>> --------------------------
>>>>>>
>>>>>> The current implementation is not intended to be more than a PoC to start
>>>>>> the discussion on how Explicit Synchronization should be supported in V4L2.
>>>>>>
>>>>>> The first patch proposes an userspace API for fences, then on patch 2
>>>>>> we prepare to the addition of in-fences in patch 3, by introducing the
>>>>>> infrastructure on vb2 to wait on an in-fence signal before queueing the buffer
>>>>>> in the driver.
>>>>>>
>>>>>> Patch 4 fix uvc v4l2 event handling and patch 5 configure q->dev for vivid
>>>>>> drivers to enable to subscribe and dequeue events on it.
>>>>>>
>>>>>> Patches 6-7 enables support to notify BUF_QUEUED events, i.e., let userspace
>>>>>> know that particular buffer was enqueued in the driver. This is needed,
>>>>>> because we return the out-fence fd as an out argument in QBUF, but at the time
>>>>>> it returns we don't know to which buffer the fence will be attached thus
>>>>>> the BUF_QUEUED event tells which buffer is associated to the fence received in
>>>>>> QBUF by userspace.
>>>>>>
>>>>>> Patches 8 and 9 add more fence infrastructure to support out-fences and finally
>>>>>> patch 10 adds support to out-fences.
>>>>>>
>>>>>> TODO/OPEN:
>>>>>> ----------
>>>>>>
>>>>>> * For this first implementation we will keep the ordering of the buffers queued
>>>>>> in videobuf2, that means we will only enqueue buffer whose fence was signalled
>>>>>> if that buffer is the first one in the queue. Otherwise it has to wait until it
>>>>>> is the first one. This is not implmented yet. Later we could create a flag to
>>>>>> allow unordered queing in the drivers from vb2 if needed.
>>>>>
>>>>> The V4L2 spec doesn't warrant that the buffers will be dequeued at the
>>>>> queue order.
>>>>>
>>>>> In practice, however, most drivers will not reorder. Yet, mem2mem codec
>>>>> drivers may reorder the buffers at the output, as the luminance information
>>>>> (Y) usually comes first on JPEG/MPEG-like formats.
>>>>>
>>>>>> * Should we have out-fences per-buffer or per-plane? or both? In this RFC, for
>>>>>> simplicity they are per-buffer, but Mauro and Javier raised the option of
>>>>>> doing per-plane fences. That could benefit mem2mem and V4L2 <-> GPU operation
>>>>>> at least on cases when we have Capture hw that releases the Y frame before the
>>>>>> other frames for example. When using V4L2 per-plane out-fences to communicate
>>>>>> with KMS they would need to be merged together as currently the DRM Plane
>>>>>> interface only supports one fence per DRM Plane.
>>>>>
>>>>> That's another advantage of using a new set of ioctls for queues: with that,
>>>>> queuing/dequeing per plane will be easier. On codec drivers, doing it per
>>>>> plane could bring performance improvements.
>>>>>
>>>>
>>>> You don't really need to Q/DQ on a per plane basis AFAICT. Since on QBUF you
>>>> can get a set of out-fences that can be passed to the other driver and so it
>>>> should be able to wait per fence.
>>>>
>>>>>> In-fences should be per-buffer as the DRM only has per-buffer fences, but
>>>>
>>>> I'm not that familiar with DRM, but I thought DRM fences was also per plane
>>>> and not per buffer.
>>>
>>> DRM plane is a different thing, its a representation of a region on the
>>> screen and there is only one buffer for each DRM plane.
>>>
>>
>> Yes, but you can still have different DRM planes, right? For example a RGB
>> primary plane and a YUV overlay plane for video display. That's why I thought
>> there would be some mapping between the v4l2 planes and DRM planes since there's
>> a way for DRM to import dma-bufs exported by v4l2 and vice-versa.
>>
>>> One of the questions I raised was: how to match V4L2 per-plane fences to
>>> DRM per-buffer fences?
>>>
>>
>> I don't think that DRM fences should be per-buffer but instead per dma-buf.
>> It's OK if there's a 1:1 mapping between DRM buffer and dma-buf, but that's
>> not the case for v4l2. You can have many dma-bufs in a v4l2_buffer, one per
>> each v4l2 plane.
>>
>> IOW, it shouldn't be about DRM buffers waiting for fences associated with
>> v4l2 buffers, but instead about DRM waiting for fences associated with
>> dma-bufs that were exported by v4l2 and imported by DRM.
>
> The explicit synchronization implementation on DRM never deals directly
> with reservation objects. That is what implicit synchronization does.
> To be strict in explicit we associated the to a job, and the job can be
> anything. For DRM in-fences they are per-plane and the signaling of the
> fence tell us the framebuffer is ready to be scanned out. For out-fences
> they are per-crtc (which includes many dma-bufs). So setting them to be
I see, thanks a lot for the clarification, and sorry for my ignorance about
explicit fence support in DRM.
> per dma-buf only for V4L2 limits our possibilities to solve problems.
>
Yes, having it per dma-buf will not help when using fences with DRM indeed.
> So instead, let's what problems we want to solve and how sane the
> userspace api can be.
>
> In which situations will we have an advantage for having per-plane
> fences on V4L2? Not all userspace applications will need that and
> per-buffer is much simpler as an API.
>
I was thinking of a case where you are using fences between two v4l2
devices that can process individual planes of a multi-planar format.
But as mentioned in a previous email, currently that won't have any
advantage, since the interface between videobuf2 and v4l2 drivers uses
vb2_buffer and not vb2_planes. That is, individual planes can't be
passed between vb2 and drivers, only complete buffers (so fences would
be signaled at the same time anyway, even if they are per plane).
But I thought the API shouldn't be restricted by this vb2 implementation
detail, since vb2 can be changed in the future to pass vb2_planes instead.
> As discussed earlier we should do it in incremental steps and support
> per v4l2-buffer first, then once that is done we can add per-plane to
> it.
>
Yes, I misunderstood how explicit fences work in DRM and thought they were
used to protect individual dma-bufs. But as you said, out-fences are
per CRTC, so I guess it makes sense to have per-v4l2_buffer fences
in v4l2/vb2, since explicit fences are used to signal that an operation
completed rather than that a specific dma-buf is ready to be accessed.
> Gustavo
>
Best regards,
--
Javier Martinez Canillas
Open Source Group
Samsung Research America
Hi Gustavo,
Em Wed, 24 May 2017 21:31:01 -0300
Gustavo Padovan <[email protected]> escreveu:
> Hi all,
>
> I've been working on the v2 of this series, but I think I hit a blocker
> when trying to cover the case where the driver asks to requeue the
> buffer. It is related to the out-fence side.
>
> In the current implementation we return on QBUF an out-fence fd that is not
> tied to any buffer, because we don't know the queueing order until the
> buffer is queued to the driver. Then when the buffer is queued we use
> the BUF_QUEUED event to notify userspace of the index of the buffer,
> so now userspace knows the buffer associated to the out-fence fd
> received earlier.
>
> Userspace goes ahead and sends a DRM Atomic Request to the kernel to
> display that buffer on the screen once the fence signals. If it is
> a nonblocking request, the fence wait happens past the check phase,
> thus it isn't allowed to fail anymore.
>
> But now, what happens if the V4L2 driver calls buffer_done() asking
> to requeue the buffer? That means the operation failed and we can't
> signal the fence, starving the DRM side.
>
> We need to fix that. The only way I can see is to guarantee ordering of
> buffers when out-fences are used. Ordering is something that HAL3 needs
> too, so maybe there is more than one reason to do it like this. I'm not
> a V4L2 expert, so I don't know all the consequences of such a change.
>
> Any other ideas?
>
> The current patchset is at:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/padovan/linux.git/log/?h=v4l2-fences
Currently, nothing guarantees that buffers will be returned in order,
but that should be the case for most drivers. I guess the main
exception would be mem2mem codec drivers; on those, the driver
or the hardware may opt to reorder the buffers.

If this is a mandatory requirement for explicit fences to work, then
we'll need to be able to explicitly enable it, per driver, and
clearly document that drivers using it *should* guarantee that
dequeued buffers follow the queued order.

We may need to modify VB2 to enforce that, or to return an
error if the order doesn't match.
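Just to illustrate what such an enforcement could look like (a rough
sketch only; the ordered_in_driver, queued_seqno and last_done_seqno
fields are made up for this example and don't exist in vb2 today):

    #include <media/videobuf2-core.h>

    static int vb2_check_done_order(struct vb2_queue *q,
                                    struct vb2_buffer *vb)
    {
            if (!q->ordered_in_driver)
                    return 0;

            /* Driver completed a buffer out of the queued order. */
            if (vb->queued_seqno != q->last_done_seqno + 1) {
                    WARN_ONCE(1, "vb2: out-of-order buffer_done with out-fences in use\n");
                    return -EINVAL;
            }

            q->last_done_seqno = vb->queued_seqno;
            return 0;
    }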
--
Thanks,
Mauro
Hi Gustavo,
On Thu, Jun 8, 2017 at 2:17 PM, Mauro Carvalho Chehab
<[email protected]> wrote:
> Hi Gustavo,
>
> On Wed, 24 May 2017 21:31:01 -0300,
> Gustavo Padovan <[email protected]> wrote:
>
>> Hi all,
>>
>> I've been working on the v2 of this series, but I think I hit a blocker
>> when trying to cover the case where the driver asks to requeue the
>> buffer. It is related to the out-fence side.
>>
>> In the current implementation we return on QBUF an out-fence fd that is not
>> tied to any buffer, because we don't know the queueing order until the
>> buffer is queued to the driver. Then when the buffer is queued we use
>> the BUF_QUEUED event to notify userspace of the index of the buffer,
>> so now userspace knows the buffer associated to the out-fence fd
>> received earlier.
>>
>> Userspace goes ahead and sends a DRM Atomic Request to the kernel to
>> display that buffer on the screen once the fence signals. If it is
>> a nonblocking request, the fence wait happens past the check phase,
>> thus it isn't allowed to fail anymore.
>>
>> But now, what happens if the V4L2 driver calls buffer_done() asking
>> to requeue the buffer? That means the operation failed and we can't
>> signal the fence, starving the DRM side.
>>
>> We need to fix that. The only way I can see is to guarantee ordering of
>> buffers when out-fences are used. Ordering is something that HAL3 needs
>> too, so maybe there is more than one reason to do it like this. I'm not
>> a V4L2 expert, so I don't know all the consequences of such a change.
>>
>> Any other ideas?
>>
>> The current patchset is at:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/padovan/linux.git/log/?h=v4l2-fences
Do you plan to send the v2 out? I did a quick review and have a few comments.
[media] vb2: split out queueing from vb_core_qbuf()

It changes the sequence a bit:

        /* Fill buffer information for the userspace */
        if (pb)
                call_void_bufop(q, fill_user_buffer, vb, pb);

With the changes, user information is filled before __enqueue_in_driver(vb);
Anyway, it might be a good idea to send the v2 out for review and we can review
patches in detail. I am hoping to test your patch series on odroid-xu4
next week.
Could you please add me to the thread as well as include me when you send
v2 and subsequent versions.
thanks,
-- Shuah
Hi Mauro,
2017-06-08 Mauro Carvalho Chehab <[email protected]>:
> Hi Gustavo,
>
> On Wed, 24 May 2017 21:31:01 -0300,
> Gustavo Padovan <[email protected]> wrote:
>
> > Hi all,
> >
> > I've been working on the v2 of this series, but I think I hit a blocker
> > when trying to cover the case where the driver asks to requeue the
> > buffer. It is related to the out-fence side.
> >
> > In the current implementation we return on QBUF an out-fence fd that is not
> > tied to any buffer, because we don't know the queueing order until the
> > buffer is queued to the driver. Then when the buffer is queued we use
> > the BUF_QUEUED event to notify userspace of the index of the buffer,
> > so now userspace knows the buffer associated to the out-fence fd
> > received earlier.
> >
> > Userspace goes ahead and sends a DRM Atomic Request to the kernel to
> > display that buffer on the screen once the fence signals. If it is
> > a nonblocking request, the fence wait happens past the check phase,
> > thus it isn't allowed to fail anymore.
> >
> > But now, what happens if the V4L2 driver calls buffer_done() asking
> > to requeue the buffer? That means the operation failed and we can't
> > signal the fence, starving the DRM side.
> >
> > We need to fix that. The only way I can see is to guarantee ordering of
> > buffers when out-fences are used. Ordering is something that HAL3 needs
> > too, so maybe there is more than one reason to do it like this. I'm not
> > a V4L2 expert, so I don't know all the consequences of such a change.
> >
> > Any other ideas?
> >
> > The current patchset is at:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/padovan/linux.git/log/?h=v4l2-fences
>
> Currently, nothing guarantees that buffers will be returned in order,
> but that should be the case for most drivers. I guess the main
> exception would be mem2mem codec drivers; on those, the driver
> or the hardware may opt to reorder the buffers.
>
> If this is a mandatory requirement for explicit fences to work, then
> we'll need to be able to explicitly enable it, per driver, and
> clearly document that drivers using it *should* guarantee that
> dequeued buffers follow the queued order.
It is mandatory in the sense that it won't work properly, and will make
DRM fail an atomic commit, if a buffer is requeued. So it is fair to ask
drivers to guarantee ordering. Then later we can deal with the drivers
that can't.
>
> We may need to modify VB2 to enforce that, or to return an
> error if the order doesn't match.
Yeah, I'll look into how to do this.
Gustavo
2017-06-08 Shuah Khan <[email protected]>:
> Hi Gustavo,
>
> On Thu, Jun 8, 2017 at 2:17 PM, Mauro Carvalho Chehab
> <[email protected]> wrote:
> > Hi Gustavo,
> >
> > On Wed, 24 May 2017 21:31:01 -0300,
> > Gustavo Padovan <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >> I've been working on the v2 of this series, but I think I hit a blocker
> >> when trying to cover the case where the driver asks to requeue the
> >> buffer. It is related to the out-fence side.
> >>
> >> In the current implementation we return on QBUF an out-fence fd that is not
> >> tied to any buffer, because we don't know the queueing order until the
> >> buffer is queued to the driver. Then when the buffer is queued we use
> >> the BUF_QUEUED event to notify userspace of the index of the buffer,
> >> so now userspace knows the buffer associated to the out-fence fd
> >> received earlier.
> >>
> >> Userspace goes ahead and sends a DRM Atomic Request to the kernel to
> >> display that buffer on the screen once the fence signals. If it is
> >> a nonblocking request, the fence wait happens past the check phase,
> >> thus it isn't allowed to fail anymore.
> >>
> >> But now, what happens if the V4L2 driver calls buffer_done() asking
> >> to requeue the buffer? That means the operation failed and we can't
> >> signal the fence, starving the DRM side.
> >>
> >> We need to fix that. The only way I can see is to guarantee ordering of
> >> buffers when out-fences are used. Ordering is something that HAL3 needs
> >> too, so maybe there is more than one reason to do it like this. I'm not
> >> a V4L2 expert, so I don't know all the consequences of such a change.
> >>
> >> Any other ideas?
> >>
> >> The current patchset is at:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/padovan/linux.git/log/?h=v4l2-fences
>
> Do you plan to send the v2 out? I did a quick review and have a few comments.
>
> [media] vb2: split out queueing from vb_core_qbuf()
>
> It changes the sequence a bit:
>
>         /* Fill buffer information for the userspace */
>         if (pb)
>                 call_void_bufop(q, fill_user_buffer, vb, pb);
>
> With the changes, user information is filled before __enqueue_in_driver(vb);
Without my changes it also fills it before __enqueue_in_driver() when
start_streaming hasn't been called yet. So I don't think it really matters.
>
> Anyway, it might be a good idea to send the v2 out for review and we can review
> patches in detail. I am hoping to test your patch series on odroid-xu4
> next week.
> Could you please add me to the thread as well as include me when you send
> v2 and subsequent versions.
I will send a v2 as soon as I can, but from Thursday next week until
the 25th I'll be on vacation.
Gustavo
On Monday, 3 April 2017 at 15:46 -0400, Javier Martinez Canillas
wrote:
> > The problem is that adding implicit fences changed the behavior of
> > the ioctls, causing gstreamer to wait forever for buffers to be ready.
> >
>
> The problem was related to trying to make user-space unaware of the implicit
> fences support, and so it tried to QBUF a buffer that already had a pending
> fence. A workaround was to block the second QBUF ioctl if the buffer had a
> pending fence, but this caused the mentioned deadlock since GStreamer wasn't
> expecting the QBUF ioctl to block.
That QBUF may block isn't a problem in itself, but modern userspace
applications, not just GStreamer, need "cancellable" operations. This is
achieved by avoiding blocking calls that cannot be interrupted. What is
usually done is to poll the device FD to determine when it is safe to
call QBUF in a way that it will return in a finite amount of time. For
the implicit fence this could not work, since the driver is not yet
aware of the fence and hence cannot use it to delay the poll operation.
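For illustration, the usual cancellable pattern looks roughly like this
(a sketch only, assuming an output queue where POLLOUT means it is safe
to proceed; video_fd, timeout_ms and buf are set up elsewhere):

    #include <poll.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    /* Poll with a timeout so the wait stays cancellable, then issue
     * QBUF only once the fd signals it is safe to proceed. */
    struct pollfd pfd = { .fd = video_fd, .events = POLLOUT };
    int ret = poll(&pfd, 1, timeout_ms);
    if (ret > 0 && (pfd.revents & POLLOUT))
            ioctl(video_fd, VIDIOC_QBUF, &buf);
    /* ret == 0 means the timeout expired: bail out or retry */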
Though, it's not clear why it couldn't wait asynchronously like this
RFC is doing with the explicit fences.
In the current RFC, the fences are signalled through a callback, and
QBUF is split in half. So waiting on the fence is done elsewhere, and
the qbuf operation completes on the fence callback thread.
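A rough sketch of that split, for illustration (not the actual RFC code;
the in_fence and fence_cb fields are assumed here, while
__enqueue_in_driver() is the existing vb2 helper):

    #include <linux/dma-fence.h>
    #include <media/videobuf2-core.h>

    /* Second half of QBUF: runs from the fence callback once the
     * in-fence signals, finishing the queueing to the driver. */
    static void vb2_qbuf_fence_cb(struct dma_fence *f,
                                  struct dma_fence_cb *cb)
    {
            struct vb2_buffer *vb = container_of(cb, struct vb2_buffer,
                                                 fence_cb);

            __enqueue_in_driver(vb);
    }

    /* First half of QBUF: register the callback instead of blocking.
     * dma_fence_add_callback() returns -ENOENT if the fence already
     * signalled, in which case we can queue right away. */
    if (dma_fence_add_callback(vb->in_fence, &vb->fence_cb,
                               vb2_qbuf_fence_cb))
            __enqueue_in_driver(vb);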
Nicolas
On 06/09/2017 12:25 AM, Gustavo Padovan wrote:
> 2017-06-08 Shuah Khan <[email protected]>:
>
>> Hi Gustavo,
>>
>> On Thu, Jun 8, 2017 at 2:17 PM, Mauro Carvalho Chehab
>> <[email protected]> wrote:
>>> Hi Gustavo,
>>>
>>> On Wed, 24 May 2017 21:31:01 -0300,
>>> Gustavo Padovan <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've been working on the v2 of this series, but I think I hit a blocker
>>>> when trying to cover the case where the driver asks to requeue the
>>>> buffer. It is related to the out-fence side.
>>>>
>>>> In the current implementation we return on QBUF an out-fence fd that is not
>>>> tied to any buffer, because we don't know the queueing order until the
>>>> buffer is queued to the driver. Then when the buffer is queued we use
>>>> the BUF_QUEUED event to notify userspace of the index of the buffer,
>>>> so now userspace knows the buffer associated to the out-fence fd
>>>> received earlier.
>>>>
>>>> Userspace goes ahead and sends a DRM Atomic Request to the kernel to
>>>> display that buffer on the screen once the fence signals. If it is
>>>> a nonblocking request, the fence wait happens past the check phase,
>>>> thus it isn't allowed to fail anymore.
>>>>
>>>> But now, what happens if the V4L2 driver calls buffer_done() asking
>>>> to requeue the buffer? That means the operation failed and we can't
>>>> signal the fence, starving the DRM side.
>>>>
>>>> We need to fix that. The only way I can see is to guarantee ordering of
>>>> buffers when out-fences are used. Ordering is something that HAL3 needs
>>>> too, so maybe there is more than one reason to do it like this. I'm not
>>>> a V4L2 expert, so I don't know all the consequences of such a change.
>>>>
>>>> Any other ideas?
>>>>
>>>> The current patchset is at:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/padovan/linux.git/log/?h=v4l2-fences
>>
>> Do you plan to send the v2 out? I did a quick review and have a few comments.
>>
>> [media] vb2: split out queueing from vb_core_qbuf()
>>
>> It changes the sequence a bit:
>>
>>         /* Fill buffer information for the userspace */
>>         if (pb)
>>                 call_void_bufop(q, fill_user_buffer, vb, pb);
>>
>> With the changes, user information is filled before __enqueue_in_driver(vb);
>
> Without my changes it also fills it before __enqueue_in_driver() when
> start_streaming hasn't been called yet. So I don't think it really matters.
Right, with this change it fills the buffer information before
__enqueue_in_driver() even when start_streaming has been called. Is that
an issue? I don't know for sure. Perhaps filling it isn't necessary in
the path where streaming has already started.
>
>>
>> Anyway, it might be a good idea to send the v2 out for review and we can review
>> patches in detail. I am hoping to test your patch series on odroid-xu4
>> next week.
>> Could you please add me to the thread as well as include me when you send
>> v2 and subsequent versions.
>
> I will send a v2 as soon as I can, but from Thursday next week until
> the 25th I'll be on vacation.
Okay, sounds good.
thanks,
-- Shuah