2023-07-04 04:23:02

by Hsia-Jun Li

[permalink] [raw]
Subject: [PATCH 0/2] Improve V4L2 M2M job scheduler

From: "Hsia-Jun(Randy) Li" <[email protected]>

The first patch is an old patch, I resend it again.
I want to make the work thats parses the bitstream
to extract the sequence information or video resolution
as a part of V4L2 schedule. Such a work would also
consume the device's resources likes remote CPU
time.

Although reuse a flag which no current driver may
not be a good idea. I could add a new flag for that
if people like that.

The second is a patch offering a generic solution
for tracking buffers which have been pushed to
hardware(or firmware). It didn't record which buffer
that hardware(firmware) still holds for future
decoding(likes the reference buffer), while it
has been sent to the user(dequeue). We may need
a flag for this work.

Hsia-Jun(Randy) Li (1):
media: v4l2-mem2mem: add a list for buf used by hw

Randy Li (1):
media: v4l2-mem2mem: allow device run without buf

drivers/media/v4l2-core/v4l2-mem2mem.c | 30 +++++++++++++++++---------
include/media/v4l2-mem2mem.h | 10 ++++++++-
2 files changed, 29 insertions(+), 11 deletions(-)

--
2.17.1



2023-07-04 04:47:34

by Hsia-Jun Li

[permalink] [raw]
Subject: [PATCH 1/2] [RESEND] media: v4l2-mem2mem: allow device run without buf

From: Randy Li <[email protected]>

For the decoder supports Dynamic Resolution Change,
we don't need to allocate any CAPTURE or graphics buffer
for them at inital CAPTURE setup step.

We need to make the device run or we can't get those
metadata.

Signed-off-by: Randy Li <[email protected]>
---
drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
index 0cc30397fbad..c771aba42015 100644
--- a/drivers/media/v4l2-core/v4l2-mem2mem.c
+++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
@@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,

dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);

- if (!m2m_ctx->out_q_ctx.q.streaming
- || !m2m_ctx->cap_q_ctx.q.streaming) {
+ if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
+ || !(m2m_ctx->cap_q_ctx.q.streaming
+ || m2m_ctx->cap_q_ctx.buffered)) {
dprintk("Streaming needs to be on for both queues\n");
return;
}
--
2.17.1


2023-07-04 05:03:12

by Hsia-Jun Li

[permalink] [raw]
Subject: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw

From: "Hsia-Jun(Randy) Li" <[email protected]>

Many drivers have to create its own buf_struct for a
vb2_queue to track such a state. Also driver has to
iterate over rdy_queue every times to find out a buffer
which is not sent to hardware(or firmware), this new
list just offers the driver a place to store the buffer
that hardware(firmware) has acknowledged.

One important advance about this list, it doesn't like
rdy_queue which both bottom half of the user calling
could operate it, while the v4l2 worker would as well.
The v4l2 core could only operate this queue when its
v4l2_context is not running, the driver would only
access this new hw_queue in its own worker.

Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
---
drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
include/media/v4l2-mem2mem.h | 10 +++++++++-
2 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
index c771aba42015..b4151147d5bd 100644
--- a/drivers/media/v4l2-core/v4l2-mem2mem.c
+++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
@@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
goto job_unlock;
}

- src = v4l2_m2m_next_src_buf(m2m_ctx);
- dst = v4l2_m2m_next_dst_buf(m2m_ctx);
- if (!src && !m2m_ctx->out_q_ctx.buffered) {
- dprintk("No input buffers available\n");
- goto job_unlock;
+ if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
+ src = v4l2_m2m_next_src_buf(m2m_ctx);
+
+ if (!src && !m2m_ctx->out_q_ctx.buffered) {
+ dprintk("No input buffers available\n");
+ goto job_unlock;
+ }
}
- if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
- dprintk("No output buffers available\n");
- goto job_unlock;
+
+ if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
+ dst = v4l2_m2m_next_dst_buf(m2m_ctx);
+ if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
+ dprintk("No output buffers available\n");
+ goto job_unlock;
+ }
}

m2m_ctx->new_frame = true;
@@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
INIT_LIST_HEAD(&q_ctx->rdy_queue);
q_ctx->num_rdy = 0;
spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
+ INIT_LIST_HEAD(&q_ctx->hw_queue);

if (m2m_dev->curr_ctx == m2m_ctx) {
m2m_dev->curr_ctx = NULL;
@@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,

INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
+ INIT_LIST_HEAD(&out_q_ctx->hw_queue);
+ INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
spin_lock_init(&out_q_ctx->rdy_spinlock);
spin_lock_init(&cap_q_ctx->rdy_spinlock);

diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
index d6c8eb2b5201..2342656e582d 100644
--- a/include/media/v4l2-mem2mem.h
+++ b/include/media/v4l2-mem2mem.h
@@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
* processed
*
* @q: pointer to struct &vb2_queue
- * @rdy_queue: List of V4L2 mem-to-mem queues
+ * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
+ * called in struct vb2_ops->buf_queue(), the buffer enqueued
+ * by user would be added to this list.
* @rdy_spinlock: spin lock to protect the struct usage
* @num_rdy: number of buffers ready to be processed
+ * @hw_queue: A list for tracking the buffer is occupied by the hardware
+ * (or device's firmware). A buffer could only be in either
+ * this list or @rdy_queue.
+ * Driver may choose not to use this list while uses its own
+ * private data to do this work.
* @buffered: is the queue buffered?
*
* Queue for buffers ready to be processed as soon as this
@@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
struct list_head rdy_queue;
spinlock_t rdy_spinlock;
u8 num_rdy;
+ struct list_head hw_queue;
bool buffered;
};

--
2.17.1


2023-07-07 19:40:45

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: [PATCH 1/2] [RESEND] media: v4l2-mem2mem: allow device run without buf

Hi Randy,

Le mardi 04 juillet 2023 à 12:00 +0800, Hsia-Jun Li a écrit :
> From: Randy Li <[email protected]>
>
> For the decoder supports Dynamic Resolution Change,
> we don't need to allocate any CAPTURE or graphics buffer
> for them at inital CAPTURE setup step.
>
> We need to make the device run or we can't get those
> metadata.
>
> Signed-off-by: Randy Li <[email protected]>
> ---
> drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> index 0cc30397fbad..c771aba42015 100644
> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> @@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
>
> dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
>
> - if (!m2m_ctx->out_q_ctx.q.streaming
> - || !m2m_ctx->cap_q_ctx.q.streaming) {
> + if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
> + || !(m2m_ctx->cap_q_ctx.q.streaming
> + || m2m_ctx->cap_q_ctx.buffered)) {

I have a two atches with similar goals in my wave5 tree. It will be easier to
upstream with an actual user, though, I'm probably a month or two away from
submitting this driver again.

https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/ac59eafd5076c4deb3bfe1fb85b3b776586ef3eb
https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/5de4fbe0abb20b8e8d862b654f93e3efeb1ef251

Sebastien and I authored this without giving it much thought, but we believe
this massively simplify our handling of DRC (dynamic resolution change).

The main difference, is that we added ignore_streaming to the ctx, so that
drivers can opt-in the mode of operation. Thinking it would avoid any potential
side effects in drivers that aren't prepared to that. We didn't want to tied it
up to buffered, this is open to discussion of course, we do use buffered on both
queues and use a slightly more advance job_ready function, that take into
account our driver state.

In short, Sebastien and I agree this small change is the right direction, we
simply have a different implementation. I can send it as RFC if one believe its
would be useful now (even without a user).

> dprintk("Streaming needs to be on for both queues\n");
> return;
> }


2023-07-07 19:56:38

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw

Le mardi 04 juillet 2023 à 12:00 +0800, Hsia-Jun Li a écrit :
> From: "Hsia-Jun(Randy) Li" <[email protected]>
>
> Many drivers have to create its own buf_struct for a
> vb2_queue to track such a state. Also driver has to
> iterate over rdy_queue every times to find out a buffer
> which is not sent to hardware(or firmware), this new
> list just offers the driver a place to store the buffer
> that hardware(firmware) has acknowledged.
>
> One important advance about this list, it doesn't like
> rdy_queue which both bottom half of the user calling
> could operate it, while the v4l2 worker would as well.
> The v4l2 core could only operate this queue when its
> v4l2_context is not running, the driver would only
> access this new hw_queue in its own worker.

That's an interesting proposal. I didn't like leaving decoded frames into the
rdy_queue, but removing them required me to have my own list, so that I can
clean it up if some buffers are never displayed.

We'll see if we can use this into wave5.

>
> Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
> ---
> drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
> include/media/v4l2-mem2mem.h | 10 +++++++++-
> 2 files changed, 26 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> index c771aba42015..b4151147d5bd 100644
> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> goto job_unlock;
> }
>
> - src = v4l2_m2m_next_src_buf(m2m_ctx);
> - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> - if (!src && !m2m_ctx->out_q_ctx.buffered) {
> - dprintk("No input buffers available\n");
> - goto job_unlock;
> + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
> + src = v4l2_m2m_next_src_buf(m2m_ctx);
> +
> + if (!src && !m2m_ctx->out_q_ctx.buffered) {
> + dprintk("No input buffers available\n");
> + goto job_unlock;
> + }
> }
> - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> - dprintk("No output buffers available\n");
> - goto job_unlock;
> +
> + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
> + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> + dprintk("No output buffers available\n");
> + goto job_unlock;
> + }
> }
>
> m2m_ctx->new_frame = true;
> @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
> INIT_LIST_HEAD(&q_ctx->rdy_queue);
> q_ctx->num_rdy = 0;
> spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
> + INIT_LIST_HEAD(&q_ctx->hw_queue);
>
> if (m2m_dev->curr_ctx == m2m_ctx) {
> m2m_dev->curr_ctx = NULL;
> @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
>
> INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
> INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
> + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
> + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
> spin_lock_init(&out_q_ctx->rdy_spinlock);
> spin_lock_init(&cap_q_ctx->rdy_spinlock);
>
> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
> index d6c8eb2b5201..2342656e582d 100644
> --- a/include/media/v4l2-mem2mem.h
> +++ b/include/media/v4l2-mem2mem.h
> @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
> * processed
> *
> * @q: pointer to struct &vb2_queue
> - * @rdy_queue: List of V4L2 mem-to-mem queues
> + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
> + * called in struct vb2_ops->buf_queue(), the buffer enqueued
> + * by user would be added to this list.
> * @rdy_spinlock: spin lock to protect the struct usage
> * @num_rdy: number of buffers ready to be processed
> + * @hw_queue: A list for tracking the buffer is occupied by the hardware
> + * (or device's firmware). A buffer could only be in either
> + * this list or @rdy_queue.
> + * Driver may choose not to use this list while uses its own
> + * private data to do this work.

What's the threading protection around this one ? Also, would it be possible to
opt-in that the driver cleanup that list automatically after streamoff has been
executed ?

One thing the doc is missing, is that HW buffer are actually flagged as ACTIVE
buffer vb2, there is a strong link between the two concept, and the doc should
take care.

> * @buffered: is the queue buffered?
> *
> * Queue for buffers ready to be processed as soon as this
> @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
> struct list_head rdy_queue;
> spinlock_t rdy_spinlock;
> u8 num_rdy;
> + struct list_head hw_queue;
> bool buffered;
> };
>


2023-07-11 07:01:19

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw

Hi Hsia-Jun,

kernel test robot noticed the following build warnings:

https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Hsia-Jun-Li/media-v4l2-mem2mem-allow-device-run-without-buf/20230704-120308
base: git://linuxtv.org/media_tree.git master
patch link: https://lore.kernel.org/r/20230704040044.681850-3-randy.li%40synaptics.com
patch subject: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw
config: arm64-randconfig-m041-20230710 (https://download.01.org/0day-ci/archive/20230711/[email protected]/config)
compiler: aarch64-linux-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230711/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Reported-by: Dan Carpenter <[email protected]>
| Closes: https://lore.kernel.org/r/[email protected]/

smatch warnings:
drivers/media/v4l2-core/v4l2-mem2mem.c:343 __v4l2_m2m_try_queue() error: uninitialized symbol 'src'.
drivers/media/v4l2-core/v4l2-mem2mem.c:343 __v4l2_m2m_try_queue() error: uninitialized symbol 'dst'.

vim +/src +343 drivers/media/v4l2-core/v4l2-mem2mem.c

9db3bbf58be59a drivers/media/v4l2-core/v4l2-mem2mem.c Ezequiel Garcia 2018-07-25 296 static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
9db3bbf58be59a drivers/media/v4l2-core/v4l2-mem2mem.c Ezequiel Garcia 2018-07-25 297 struct v4l2_m2m_ctx *m2m_ctx)
7f98639def42a6 drivers/media/video/v4l2-mem2mem.c Pawel Osciak 2010-04-23 298 {
f8cca8c97a63d7 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 299 unsigned long flags_job;
f8cca8c97a63d7 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 300 struct vb2_v4l2_buffer *dst, *src;
7f98639def42a6 drivers/media/video/v4l2-mem2mem.c Pawel Osciak 2010-04-23 301
7f98639def42a6 drivers/media/video/v4l2-mem2mem.c Pawel Osciak 2010-04-23 302 dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
7f98639def42a6 drivers/media/video/v4l2-mem2mem.c Pawel Osciak 2010-04-23 303
f3e1a4c9d7a02d drivers/media/v4l2-core/v4l2-mem2mem.c Randy Li 2023-07-04 304 if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
f3e1a4c9d7a02d drivers/media/v4l2-core/v4l2-mem2mem.c Randy Li 2023-07-04 305 || !(m2m_ctx->cap_q_ctx.q.streaming
f3e1a4c9d7a02d drivers/media/v4l2-core/v4l2-mem2mem.c Randy Li 2023-07-04 306 || m2m_ctx->cap_q_ctx.buffered)) {
7f98639def42a6 drivers/media/video/v4l2-mem2mem.c Pawel Osciak 2010-04-23 307 dprintk("Streaming needs to be on for both queues\n");
7f98639def42a6 drivers/media/video/v4l2-mem2mem.c Pawel Osciak 2010-04-23 308 return;
7f98639def42a6 drivers/media/video/v4l2-mem2mem.c Pawel Osciak 2010-04-23 309 }
7f98639def42a6 drivers/media/video/v4l2-mem2mem.c Pawel Osciak 2010-04-23 310
7f98639def42a6 drivers/media/video/v4l2-mem2mem.c Pawel Osciak 2010-04-23 311 spin_lock_irqsave(&m2m_dev->job_spinlock, flags_job);
2ad5389b341282 drivers/media/v4l2-core/v4l2-mem2mem.c Shaik Ameer Basha 2013-09-20 312
2ad5389b341282 drivers/media/v4l2-core/v4l2-mem2mem.c Shaik Ameer Basha 2013-09-20 313 /* If the context is aborted then don't schedule it */
2ad5389b341282 drivers/media/v4l2-core/v4l2-mem2mem.c Shaik Ameer Basha 2013-09-20 314 if (m2m_ctx->job_flags & TRANS_ABORT) {
2ad5389b341282 drivers/media/v4l2-core/v4l2-mem2mem.c Shaik Ameer Basha 2013-09-20 315 dprintk("Aborted context\n");
cbec2836f8be61 drivers/media/v4l2-core/v4l2-mem2mem.c Sakari Ailus 2018-10-18 316 goto job_unlock;
2ad5389b341282 drivers/media/v4l2-core/v4l2-mem2mem.c Shaik Ameer Basha 2013-09-20 317 }
2ad5389b341282 drivers/media/v4l2-core/v4l2-mem2mem.c Shaik Ameer Basha 2013-09-20 318
7f98639def42a6 drivers/media/video/v4l2-mem2mem.c Pawel Osciak 2010-04-23 319 if (m2m_ctx->job_flags & TRANS_QUEUED) {
7f98639def42a6 drivers/media/video/v4l2-mem2mem.c Pawel Osciak 2010-04-23 320 dprintk("On job queue already\n");
cbec2836f8be61 drivers/media/v4l2-core/v4l2-mem2mem.c Sakari Ailus 2018-10-18 321 goto job_unlock;
7f98639def42a6 drivers/media/video/v4l2-mem2mem.c Pawel Osciak 2010-04-23 322 }
7f98639def42a6 drivers/media/video/v4l2-mem2mem.c Pawel Osciak 2010-04-23 323
cafbe530b02613 drivers/media/v4l2-core/v4l2-mem2mem.c Hsia-Jun(Randy Li 2023-07-04 324) if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
f8cca8c97a63d7 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 325 src = v4l2_m2m_next_src_buf(m2m_ctx);
cafbe530b02613 drivers/media/v4l2-core/v4l2-mem2mem.c Hsia-Jun(Randy Li 2023-07-04 326)
f8cca8c97a63d7 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 327 if (!src && !m2m_ctx->out_q_ctx.buffered) {
7f98639def42a6 drivers/media/video/v4l2-mem2mem.c Pawel Osciak 2010-04-23 328 dprintk("No input buffers available\n");
f8cca8c97a63d7 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 329 goto job_unlock;
7f98639def42a6 drivers/media/video/v4l2-mem2mem.c Pawel Osciak 2010-04-23 330 }
cafbe530b02613 drivers/media/v4l2-core/v4l2-mem2mem.c Hsia-Jun(Randy Li 2023-07-04 331) }

src uninitialized on else path.

cafbe530b02613 drivers/media/v4l2-core/v4l2-mem2mem.c Hsia-Jun(Randy Li 2023-07-04 332)
cafbe530b02613 drivers/media/v4l2-core/v4l2-mem2mem.c Hsia-Jun(Randy Li 2023-07-04 333) if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
cafbe530b02613 drivers/media/v4l2-core/v4l2-mem2mem.c Hsia-Jun(Randy Li 2023-07-04 334) dst = v4l2_m2m_next_dst_buf(m2m_ctx);
f8cca8c97a63d7 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 335 if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
7f98639def42a6 drivers/media/video/v4l2-mem2mem.c Pawel Osciak 2010-04-23 336 dprintk("No output buffers available\n");
f8cca8c97a63d7 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 337 goto job_unlock;
f8cca8c97a63d7 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 338 }
cafbe530b02613 drivers/media/v4l2-core/v4l2-mem2mem.c Hsia-Jun(Randy Li 2023-07-04 339) }

dst not initialized if !list_empty()

f8cca8c97a63d7 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 340
f07602ac388723 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 341 m2m_ctx->new_frame = true;
f07602ac388723 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 342
f07602ac388723 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 @343 if (src && dst && dst->is_held &&

Uninitialized.

f07602ac388723 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 344 dst->vb2_buf.copied_timestamp &&
f8cca8c97a63d7 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 345 dst->vb2_buf.timestamp != src->vb2_buf.timestamp) {
86ef61ad686c17 drivers/media/v4l2-core/v4l2-mem2mem.c Nicolas Dufresne 2022-04-26 346 dprintk("Timestamp mismatch, returning held capture buffer\n");
f8cca8c97a63d7 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 347 dst->is_held = false;
f8cca8c97a63d7 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 348 v4l2_m2m_dst_buf_remove(m2m_ctx);
f8cca8c97a63d7 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 349 v4l2_m2m_buf_done(dst, VB2_BUF_STATE_DONE);
f8cca8c97a63d7 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 350 dst = v4l2_m2m_next_dst_buf(m2m_ctx);
f8cca8c97a63d7 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 351
f8cca8c97a63d7 drivers/media/v4l2-core/v4l2-mem2mem.c Hans Verkuil 2019-10-11 352 if (!dst && !m2m_ctx->cap_q_ctx.buffered) {

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


2023-07-11 09:18:58

by Randy Li

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw


On 2023/7/8 03:20, Nicolas Dufresne wrote:
> Le mardi 04 juillet 2023 à 12:00 +0800, Hsia-Jun Li a écrit :
>> From: "Hsia-Jun(Randy) Li" <[email protected]>
>>
>> Many drivers have to create its own buf_struct for a
>> vb2_queue to track such a state. Also driver has to
>> iterate over rdy_queue every times to find out a buffer
>> which is not sent to hardware(or firmware), this new
>> list just offers the driver a place to store the buffer
>> that hardware(firmware) has acknowledged.
>>
>> One important advance about this list, it doesn't like
>> rdy_queue which both bottom half of the user calling
>> could operate it, while the v4l2 worker would as well.
>> The v4l2 core could only operate this queue when its
>> v4l2_context is not running, the driver would only
>> access this new hw_queue in its own worker.
> That's an interesting proposal. I didn't like leaving decoded frames into the
> rdy_queue, but removing them required me to have my own list, so that I can
> clean it up if some buffers are never displayed.
>
> We'll see if we can use this into wave5.
>
>> Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
>> ---
>> drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
>> include/media/v4l2-mem2mem.h | 10 +++++++++-
>> 2 files changed, 26 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
>> index c771aba42015..b4151147d5bd 100644
>> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
>> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
>> @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
>> goto job_unlock;
>> }
>>
>> - src = v4l2_m2m_next_src_buf(m2m_ctx);
>> - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
>> - if (!src && !m2m_ctx->out_q_ctx.buffered) {
>> - dprintk("No input buffers available\n");
>> - goto job_unlock;
>> + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
>> + src = v4l2_m2m_next_src_buf(m2m_ctx);
>> +
>> + if (!src && !m2m_ctx->out_q_ctx.buffered) {
>> + dprintk("No input buffers available\n");
>> + goto job_unlock;
>> + }
>> }
>> - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
>> - dprintk("No output buffers available\n");
>> - goto job_unlock;
>> +
>> + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
>> + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
>> + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
>> + dprintk("No output buffers available\n");
>> + goto job_unlock;
>> + }
>> }
>>
>> m2m_ctx->new_frame = true;
>> @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
>> INIT_LIST_HEAD(&q_ctx->rdy_queue);
>> q_ctx->num_rdy = 0;
>> spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
>> + INIT_LIST_HEAD(&q_ctx->hw_queue);
>>
>> if (m2m_dev->curr_ctx == m2m_ctx) {
>> m2m_dev->curr_ctx = NULL;
>> @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
>>
>> INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
>> INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
>> + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
>> + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
>> spin_lock_init(&out_q_ctx->rdy_spinlock);
>> spin_lock_init(&cap_q_ctx->rdy_spinlock);
>>
>> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
>> index d6c8eb2b5201..2342656e582d 100644
>> --- a/include/media/v4l2-mem2mem.h
>> +++ b/include/media/v4l2-mem2mem.h
>> @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
>> * processed
>> *
>> * @q: pointer to struct &vb2_queue
>> - * @rdy_queue: List of V4L2 mem-to-mem queues
>> + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
>> + * called in struct vb2_ops->buf_queue(), the buffer enqueued
>> + * by user would be added to this list.
>> * @rdy_spinlock: spin lock to protect the struct usage
>> * @num_rdy: number of buffers ready to be processed
>> + * @hw_queue: A list for tracking the buffer is occupied by the hardware
>> + * (or device's firmware). A buffer could only be in either
>> + * this list or @rdy_queue.
>> + * Driver may choose not to use this list while uses its own
>> + * private data to do this work.
> What's the threading protection around this one ?

I mentioned that in commit message.

"

The v4l2 core could only operate this queue when its
v4l2_context is not running, the driver would only
access this new hw_queue in its own worker.
"

> Also, would it be possible to
> opt-in that the driver cleanup that list automatically after streamoff has been
> executed ?
I think that is what streamoff purposes to do, or we don't have a method
to let the hardware(firmware) release all its buffers.
>
> One thing the doc is missing, is that HW buffer are actually flagged as ACTIVE
> buffer vb2, there is a strong link between the two concept, and the doc should
> take care.

I would like to complete the doc if people are not against about this idea.

And I would put all necessary information in one place.

>
>> * @buffered: is the queue buffered?
>> *
>> * Queue for buffers ready to be processed as soon as this
>> @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
>> struct list_head rdy_queue;
>> spinlock_t rdy_spinlock;
>> u8 num_rdy;
>> + struct list_head hw_queue;
>> bool buffered;
>> };
>>

2023-07-12 09:59:01

by Tomasz Figa

[permalink] [raw]
Subject: Re: [PATCH 1/2] [RESEND] media: v4l2-mem2mem: allow device run without buf

On Fri, Jul 07, 2023 at 03:14:23PM -0400, Nicolas Dufresne wrote:
> Hi Randy,
>
> Le mardi 04 juillet 2023 ? 12:00 +0800, Hsia-Jun Li a ?crit?:
> > From: Randy Li <[email protected]>
> >
> > For the decoder supports Dynamic Resolution Change,
> > we don't need to allocate any CAPTURE or graphics buffer
> > for them at inital CAPTURE setup step.
> >
> > We need to make the device run or we can't get those
> > metadata.
> >
> > Signed-off-by: Randy Li <[email protected]>
> > ---
> > drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > index 0cc30397fbad..c771aba42015 100644
> > --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> > +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > @@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> >
> > dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
> >
> > - if (!m2m_ctx->out_q_ctx.q.streaming
> > - || !m2m_ctx->cap_q_ctx.q.streaming) {
> > + if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
> > + || !(m2m_ctx->cap_q_ctx.q.streaming
> > + || m2m_ctx->cap_q_ctx.buffered)) {
>
> I have a two atches with similar goals in my wave5 tree. It will be easier to
> upstream with an actual user, though, I'm probably a month or two away from
> submitting this driver again.
>
> https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/ac59eafd5076c4deb3bfe1fb85b3b776586ef3eb
> https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/5de4fbe0abb20b8e8d862b654f93e3efeb1ef251
>

While I'm not going to NAK this series or those 2 patches if you send
them, I'm not really convinced that adding more and more complexity to
the mem2mem helpers is a good idea, especially since all of those seem
to be only needed by stateful video decoders.

The mem2mem framework started as a set of helpers to eliminate boiler
plate from simple drivers that always get 1 CAPTURE and 1 OUTPUT buffer,
run 1 processing job on them and then return both of the to the userspace
and I think it should stay like this.

I think we're strongly in need of a stateful video decoder framework that
would actually address the exact problems that those have rather than
bending something that wasn't designed with them in mind to work around the
differences.

Best regards,
Tomasz

> Sebastien and I authored this without giving it much thought, but we believe
> this massively simplify our handling of DRC (dynamic resolution change).
>
> The main difference, is that we added ignore_streaming to the ctx, so that
> drivers can opt-in the mode of operation. Thinking it would avoid any potential
> side effects in drivers that aren't prepared to that. We didn't want to tied it
> up to buffered, this is open to discussion of course, we do use buffered on both
> queues and use a slightly more advance job_ready function, that take into
> account our driver state.
>
> In short, Sebastien and I agree this small change is the right direction, we
> simply have a different implementation. I can send it as RFC if one believe its
> would be useful now (even without a user).
>
> > dprintk("Streaming needs to be on for both queues\n");
> > return;
> > }
>

2023-07-12 10:01:36

by Tomasz Figa

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw

On Tue, Jul 04, 2023 at 12:00:38PM +0800, Hsia-Jun Li wrote:
> From: "Hsia-Jun(Randy) Li" <[email protected]>
>
> Many drivers have to create its own buf_struct for a
> vb2_queue to track such a state. Also driver has to
> iterate over rdy_queue every times to find out a buffer
> which is not sent to hardware(or firmware), this new
> list just offers the driver a place to store the buffer
> that hardware(firmware) has acknowledged.
>
> One important advance about this list, it doesn't like
> rdy_queue which both bottom half of the user calling
> could operate it, while the v4l2 worker would as well.
> The v4l2 core could only operate this queue when its
> v4l2_context is not running, the driver would only
> access this new hw_queue in its own worker.

Could you describe in what case such a list would be useful for a
mem2mem driver?

>
> Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
> ---
> drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
> include/media/v4l2-mem2mem.h | 10 +++++++++-
> 2 files changed, 26 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> index c771aba42015..b4151147d5bd 100644
> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> goto job_unlock;
> }
>
> - src = v4l2_m2m_next_src_buf(m2m_ctx);
> - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> - if (!src && !m2m_ctx->out_q_ctx.buffered) {
> - dprintk("No input buffers available\n");
> - goto job_unlock;
> + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
> + src = v4l2_m2m_next_src_buf(m2m_ctx);
> +
> + if (!src && !m2m_ctx->out_q_ctx.buffered) {
> + dprintk("No input buffers available\n");
> + goto job_unlock;
> + }
> }
> - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> - dprintk("No output buffers available\n");
> - goto job_unlock;
> +
> + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
> + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> + dprintk("No output buffers available\n");
> + goto job_unlock;
> + }
> }

src and dst would be referenced unitialized below if neither of the
above ifs hits...

Best regards,
Tomasz

>
> m2m_ctx->new_frame = true;
> @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
> INIT_LIST_HEAD(&q_ctx->rdy_queue);
> q_ctx->num_rdy = 0;
> spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
> + INIT_LIST_HEAD(&q_ctx->hw_queue);
>
> if (m2m_dev->curr_ctx == m2m_ctx) {
> m2m_dev->curr_ctx = NULL;
> @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
>
> INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
> INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
> + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
> + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
> spin_lock_init(&out_q_ctx->rdy_spinlock);
> spin_lock_init(&cap_q_ctx->rdy_spinlock);
>
> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
> index d6c8eb2b5201..2342656e582d 100644
> --- a/include/media/v4l2-mem2mem.h
> +++ b/include/media/v4l2-mem2mem.h
> @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
> * processed
> *
> * @q: pointer to struct &vb2_queue
> - * @rdy_queue: List of V4L2 mem-to-mem queues
> + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
> + * called in struct vb2_ops->buf_queue(), the buffer enqueued
> + * by user would be added to this list.
> * @rdy_spinlock: spin lock to protect the struct usage
> * @num_rdy: number of buffers ready to be processed
> + * @hw_queue: A list for tracking the buffer is occupied by the hardware
> + * (or device's firmware). A buffer could only be in either
> + * this list or @rdy_queue.
> + * Driver may choose not to use this list while uses its own
> + * private data to do this work.
> * @buffered: is the queue buffered?
> *
> * Queue for buffers ready to be processed as soon as this
> @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
> struct list_head rdy_queue;
> spinlock_t rdy_spinlock;
> u8 num_rdy;
> + struct list_head hw_queue;
> bool buffered;
> };
>
> --
> 2.17.1
>

2023-07-12 10:20:04

by Sebastian Fricke

[permalink] [raw]
Subject: Re: [PATCH 1/2] [RESEND] media: v4l2-mem2mem: allow device run without buf

Hey Tomasz,

On 12.07.2023 09:31, Tomasz Figa wrote:
>On Fri, Jul 07, 2023 at 03:14:23PM -0400, Nicolas Dufresne wrote:
>> Hi Randy,
>>
>> Le mardi 04 juillet 2023 à 12:00 +0800, Hsia-Jun Li a écrit :
>> > From: Randy Li <[email protected]>
>> >
>> > For the decoder supports Dynamic Resolution Change,
>> > we don't need to allocate any CAPTURE or graphics buffer
>> > for them at inital CAPTURE setup step.
>> >
>> > We need to make the device run or we can't get those
>> > metadata.
>> >
>> > Signed-off-by: Randy Li <[email protected]>
>> > ---
>> > drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
>> > 1 file changed, 3 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
>> > index 0cc30397fbad..c771aba42015 100644
>> > --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
>> > +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
>> > @@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
>> >
>> > dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
>> >
>> > - if (!m2m_ctx->out_q_ctx.q.streaming
>> > - || !m2m_ctx->cap_q_ctx.q.streaming) {
>> > + if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
>> > + || !(m2m_ctx->cap_q_ctx.q.streaming
>> > + || m2m_ctx->cap_q_ctx.buffered)) {
>>
>> I have a two atches with similar goals in my wave5 tree. It will be easier to
>> upstream with an actual user, though, I'm probably a month or two away from
>> submitting this driver again.
>>
>> https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/ac59eafd5076c4deb3bfe1fb85b3b776586ef3eb
>> https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/5de4fbe0abb20b8e8d862b654f93e3efeb1ef251
>>
>
>While I'm not going to NAK this series or those 2 patches if you send
>them, I'm not really convinced that adding more and more complexity to
>the mem2mem helpers is a good idea, especially since all of those seem
>to be only needed by stateful video decoders.
>
>The mem2mem framework started as a set of helpers to eliminate boiler
>plate from simple drivers that always get 1 CAPTURE and 1 OUTPUT buffer,
>run 1 processing job on them and then return both of the to the userspace
>and I think it should stay like this.
>
>I think we're strongly in need of a stateful video decoder framework that
>would actually address the exact problems that those have rather than
>bending something that wasn't designed with them in mind to work around the
>differences.

Thanks for the feedback.

I have recently discussed how we could approach creating a framework for
the codecs side, with Hans Verkuil and Nicolas Dufresne.

The first step we would have to do is come up with a list of
requirements for that framework and expected future needs, maybe we can
start a public discussion on the mailing list to generate a list like
that.
But all in all this endeavor will probably require quite a bit of time
and effort, do you think we could modify M2M a bit for our use case and
then when we are in the process of creating the new framework, we could
maybe think about simplifying the M2M framework again?

>
>Best regards,
>Tomasz

Greetings,
Sebastian

>
>> Sebastien and I authored this without giving it much thought, but we believe
>> this massively simplify our handling of DRC (dynamic resolution change).
>>
>> The main difference, is that we added ignore_streaming to the ctx, so that
>> drivers can opt-in the mode of operation. Thinking it would avoid any potential
>> side effects in drivers that aren't prepared to that. We didn't want to tied it
>> up to buffered, this is open to discussion of course, we do use buffered on both
>> queues and use a slightly more advance job_ready function, that take into
>> account our driver state.
>>
>> In short, Sebastien and I agree this small change is the right direction, we
>> simply have a different implementation. I can send it as RFC if one believe its
>> would be useful now (even without a user).
>>
>> > dprintk("Streaming needs to be on for both queues\n");
>> > return;
>> > }
>>

2023-07-13 03:52:44

by Tomasz Figa

[permalink] [raw]
Subject: Re: [PATCH 1/2] [RESEND] media: v4l2-mem2mem: allow device run without buf

On Wed, Jul 12, 2023 at 6:44 PM Sebastian Fricke
<[email protected]> wrote:
>
> Hey Tomasz,
>
> On 12.07.2023 09:31, Tomasz Figa wrote:
> >On Fri, Jul 07, 2023 at 03:14:23PM -0400, Nicolas Dufresne wrote:
> >> Hi Randy,
> >>
> >> Le mardi 04 juillet 2023 à 12:00 +0800, Hsia-Jun Li a écrit :
> >> > From: Randy Li <[email protected]>
> >> >
> >> > For the decoder supports Dynamic Resolution Change,
> >> > we don't need to allocate any CAPTURE or graphics buffer
> >> > for them at inital CAPTURE setup step.
> >> >
> >> > We need to make the device run or we can't get those
> >> > metadata.
> >> >
> >> > Signed-off-by: Randy Li <[email protected]>
> >> > ---
> >> > drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
> >> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> >> > index 0cc30397fbad..c771aba42015 100644
> >> > --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> >> > +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> >> > @@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> >> >
> >> > dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
> >> >
> >> > - if (!m2m_ctx->out_q_ctx.q.streaming
> >> > - || !m2m_ctx->cap_q_ctx.q.streaming) {
> >> > + if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
> >> > + || !(m2m_ctx->cap_q_ctx.q.streaming
> >> > + || m2m_ctx->cap_q_ctx.buffered)) {
> >>
> >> I have a two atches with similar goals in my wave5 tree. It will be easier to
> >> upstream with an actual user, though, I'm probably a month or two away from
> >> submitting this driver again.
> >>
> >> https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/ac59eafd5076c4deb3bfe1fb85b3b776586ef3eb
> >> https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/5de4fbe0abb20b8e8d862b654f93e3efeb1ef251
> >>
> >
> >While I'm not going to NAK this series or those 2 patches if you send
> >them, I'm not really convinced that adding more and more complexity to
> >the mem2mem helpers is a good idea, especially since all of those seem
> >to be only needed by stateful video decoders.
> >
> >The mem2mem framework started as a set of helpers to eliminate boiler
> >plate from simple drivers that always get 1 CAPTURE and 1 OUTPUT buffer,
> >run 1 processing job on them and then return both of the to the userspace
> >and I think it should stay like this.
> >
> >I think we're strongly in need of a stateful video decoder framework that
> >would actually address the exact problems that those have rather than
> >bending something that wasn't designed with them in mind to work around the
> >differences.
>
> Thanks for the feedback.
>
> I have recently discussed how we could approach creating a framework for
> the codecs side, with Hans Verkuil and Nicolas Dufresne.

That's great to hear, thanks. :)

>
> The first step we would have to do is come up with a list of
> requirements for that framework and expected future needs, maybe we can
> start a public discussion on the mailing list to generate a list like
> that.

Makes sense. Let me CC some ChromeOS folks working on video codec
drivers these days.

> But all in all this endeavor will probably require quite a bit of time
> and effort, do you think we could modify M2M a bit for our use case and
> then when we are in the process of creating the new framework, we could
> maybe think about simplifying the M2M framework again?

Sure, as I said, I'm not NAKing this series.

>
> >
> >Best regards,
> >Tomasz
>
> Greetings,
> Sebastian
>
> >
> >> Sebastien and I authored this without giving it much thought, but we believe
> >> this massively simplify our handling of DRC (dynamic resolution change).
> >>
> >> The main difference, is that we added ignore_streaming to the ctx, so that
> >> drivers can opt-in the mode of operation. Thinking it would avoid any potential
> >> side effects in drivers that aren't prepared to that. We didn't want to tied it
> >> up to buffered, this is open to discussion of course, we do use buffered on both
> >> queues and use a slightly more advance job_ready function, that take into
> >> account our driver state.
> >>
> >> In short, Sebastien and I agree this small change is the right direction, we
> >> simply have a different implementation. I can send it as RFC if one believe its
> >> would be useful now (even without a user).
> >>
> >> > dprintk("Streaming needs to be on for both queues\n");
> >> > return;
> >> > }
> >>

2023-07-17 07:47:50

by Hsia-Jun Li

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw


On 7/12/23 17:33, Tomasz Figa wrote:
> CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
>
>
> On Tue, Jul 04, 2023 at 12:00:38PM +0800, Hsia-Jun Li wrote:
>> From: "Hsia-Jun(Randy) Li" <[email protected]>
>>
>> Many drivers have to create its own buf_struct for a
>> vb2_queue to track such a state. Also driver has to
>> iterate over rdy_queue every times to find out a buffer
>> which is not sent to hardware(or firmware), this new
>> list just offers the driver a place to store the buffer
>> that hardware(firmware) has acknowledged.
>>
>> One important advance about this list, it doesn't like
>> rdy_queue which both bottom half of the user calling
>> could operate it, while the v4l2 worker would as well.
>> The v4l2 core could only operate this queue when its
>> v4l2_context is not running, the driver would only
>> access this new hw_queue in its own worker.
> Could you describe in what case such a list would be useful for a
> mem2mem driver?

This list, as its description, just for saving us from creating a
private buffer struct to track buffer state.

The queue in the kernel is not the queue that hardware(codec firmware)
are using.


>> Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
>> ---
>> drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
>> include/media/v4l2-mem2mem.h | 10 +++++++++-
>> 2 files changed, 26 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
>> index c771aba42015..b4151147d5bd 100644
>> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
>> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
>> @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
>> goto job_unlock;
>> }
>>
>> - src = v4l2_m2m_next_src_buf(m2m_ctx);
>> - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
>> - if (!src && !m2m_ctx->out_q_ctx.buffered) {
>> - dprintk("No input buffers available\n");
>> - goto job_unlock;
>> + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
>> + src = v4l2_m2m_next_src_buf(m2m_ctx);
>> +
>> + if (!src && !m2m_ctx->out_q_ctx.buffered) {
>> + dprintk("No input buffers available\n");
>> + goto job_unlock;
>> + }
>> }
>> - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
>> - dprintk("No output buffers available\n");
>> - goto job_unlock;
>> +
>> + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
>> + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
>> + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
>> + dprintk("No output buffers available\n");
>> + goto job_unlock;
>> + }
>> }
> src and dst would be referenced unitialized below if neither of the
> above ifs hits...
I think they have been initialized at v4l2_m2m_ctx_init()
>
> Best regards,
> Tomasz
>
>> m2m_ctx->new_frame = true;
>> @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
>> INIT_LIST_HEAD(&q_ctx->rdy_queue);
>> q_ctx->num_rdy = 0;
>> spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
>> + INIT_LIST_HEAD(&q_ctx->hw_queue);
>>
>> if (m2m_dev->curr_ctx == m2m_ctx) {
>> m2m_dev->curr_ctx = NULL;
>> @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
>>
>> INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
>> INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
>> + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
>> + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
>> spin_lock_init(&out_q_ctx->rdy_spinlock);
>> spin_lock_init(&cap_q_ctx->rdy_spinlock);
>>
>> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
>> index d6c8eb2b5201..2342656e582d 100644
>> --- a/include/media/v4l2-mem2mem.h
>> +++ b/include/media/v4l2-mem2mem.h
>> @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
>> * processed
>> *
>> * @q: pointer to struct &vb2_queue
>> - * @rdy_queue: List of V4L2 mem-to-mem queues
>> + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
>> + * called in struct vb2_ops->buf_queue(), the buffer enqueued
>> + * by user would be added to this list.
>> * @rdy_spinlock: spin lock to protect the struct usage
>> * @num_rdy: number of buffers ready to be processed
>> + * @hw_queue: A list for tracking the buffer is occupied by the hardware
>> + * (or device's firmware). A buffer could only be in either
>> + * this list or @rdy_queue.
>> + * Driver may choose not to use this list while uses its own
>> + * private data to do this work.
>> * @buffered: is the queue buffered?
>> *
>> * Queue for buffers ready to be processed as soon as this
>> @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
>> struct list_head rdy_queue;
>> spinlock_t rdy_spinlock;
>> u8 num_rdy;
>> + struct list_head hw_queue;
>> bool buffered;
>> };
>>
>> --
>> 2.17.1
>>
--
Hsia-Jun(Randy) Li


2023-07-17 14:19:55

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: [PATCH 1/2] [RESEND] media: v4l2-mem2mem: allow device run without buf

Le mercredi 12 juillet 2023 à 09:31 +0000, Tomasz Figa a écrit :
> On Fri, Jul 07, 2023 at 03:14:23PM -0400, Nicolas Dufresne wrote:
> > Hi Randy,
> >
> > Le mardi 04 juillet 2023 à 12:00 +0800, Hsia-Jun Li a écrit :
> > > From: Randy Li <[email protected]>
> > >
> > > For the decoder supports Dynamic Resolution Change,
> > > we don't need to allocate any CAPTURE or graphics buffer
> > > for them at inital CAPTURE setup step.
> > >
> > > We need to make the device run or we can't get those
> > > metadata.
> > >
> > > Signed-off-by: Randy Li <[email protected]>
> > > ---
> > > drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
> > > 1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > index 0cc30397fbad..c771aba42015 100644
> > > --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > @@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> > >
> > > dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
> > >
> > > - if (!m2m_ctx->out_q_ctx.q.streaming
> > > - || !m2m_ctx->cap_q_ctx.q.streaming) {
> > > + if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
> > > + || !(m2m_ctx->cap_q_ctx.q.streaming
> > > + || m2m_ctx->cap_q_ctx.buffered)) {
> >
> > I have a two atches with similar goals in my wave5 tree. It will be easier to
> > upstream with an actual user, though, I'm probably a month or two away from
> > submitting this driver again.
> >
> > https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/ac59eafd5076c4deb3bfe1fb85b3b776586ef3eb
> > https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/5de4fbe0abb20b8e8d862b654f93e3efeb1ef251
> >
>
> While I'm not going to NAK this series or those 2 patches if you send
> them, I'm not really convinced that adding more and more complexity to
> the mem2mem helpers is a good idea, especially since all of those seem
> to be only needed by stateful video decoders.
>
> The mem2mem framework started as a set of helpers to eliminate boiler
> plate from simple drivers that always get 1 CAPTURE and 1 OUTPUT buffer,
> run 1 processing job on them and then return both of the to the userspace
> and I think it should stay like this.

Its a bit late to try and bring that argument. It should have been raised couple
of years ago (before I even started helping with these CODEC). Now that all the
newly written stately decoders uses this framework, it is logical to keep
reducing the boiler plate for these too. In my opinion, the job_ready()
callback, should have been a lot more flexible from the start. And allowing
driver to make it more powerful does not really add that much complexity.

Speaking of complexity, driving the output manually (outside of the job
workqueue) during sequence initialization is a way more complex and risky then
this. Finally, sticking with 1:1 pattern means encoder, detilers, image
enhancement reducing framerate, etc. would all be unwelcome to use this. Which
in short, means no one should even use this.

>
> I think we're strongly in need of a stateful video decoder framework that
> would actually address the exact problems that those have rather than
> bending something that wasn't designed with them in mind to work around the
> differences.

The bend is already there, of course I'd be happy to help with any new
framework. Specially on modern stateless, were there is a need to do better
scheduling. Just ping me if you have some effort starting, I don't currently
have a budget or bandwidth to write new drivers or port existing drivers them on
a newly written framework.

Nicolas


[...]

2023-07-17 14:50:03

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw

Le mercredi 12 juillet 2023 à 09:33 +0000, Tomasz Figa a écrit :
> On Tue, Jul 04, 2023 at 12:00:38PM +0800, Hsia-Jun Li wrote:
> > From: "Hsia-Jun(Randy) Li" <[email protected]>
> >
> > Many drivers have to create its own buf_struct for a
> > vb2_queue to track such a state. Also driver has to
> > iterate over rdy_queue every times to find out a buffer
> > which is not sent to hardware(or firmware), this new
> > list just offers the driver a place to store the buffer
> > that hardware(firmware) has acknowledged.
> >
> > One important advance about this list, it doesn't like
> > rdy_queue which both bottom half of the user calling
> > could operate it, while the v4l2 worker would as well.
> > The v4l2 core could only operate this queue when its
> > v4l2_context is not running, the driver would only
> > access this new hw_queue in its own worker.
>
> Could you describe in what case such a list would be useful for a
> mem2mem driver?

Today all driver must track buffers that are "owned by the hardware". This is a
concept dictated by the m2m framework and enforced through the ACTIVE flag. All
buffers from this list must be mark as done/error/queued after streamoff of the
respective queue in order to acknowledge that they are no longer in use by the
HW. Not doing so will warn:

videobuf2_common: driver bug: stop_streaming operation is leaving buf ...

Though, there is no queue to easily iterate them. All driver endup having their
own queue, or just leaving the buffers in the rdy_queue (which isn't better).

Nicolas
>
> >
> > Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
> > ---
> > drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
> > include/media/v4l2-mem2mem.h | 10 +++++++++-
> > 2 files changed, 26 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > index c771aba42015..b4151147d5bd 100644
> > --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> > +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> > goto job_unlock;
> > }
> >
> > - src = v4l2_m2m_next_src_buf(m2m_ctx);
> > - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> > - if (!src && !m2m_ctx->out_q_ctx.buffered) {
> > - dprintk("No input buffers available\n");
> > - goto job_unlock;
> > + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
> > + src = v4l2_m2m_next_src_buf(m2m_ctx);
> > +
> > + if (!src && !m2m_ctx->out_q_ctx.buffered) {
> > + dprintk("No input buffers available\n");
> > + goto job_unlock;
> > + }
> > }
> > - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> > - dprintk("No output buffers available\n");
> > - goto job_unlock;
> > +
> > + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
> > + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> > + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> > + dprintk("No output buffers available\n");
> > + goto job_unlock;
> > + }
> > }
>
> src and dst would be referenced unitialized below if neither of the
> above ifs hits...
>
> Best regards,
> Tomasz
>
> >
> > m2m_ctx->new_frame = true;
> > @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
> > INIT_LIST_HEAD(&q_ctx->rdy_queue);
> > q_ctx->num_rdy = 0;
> > spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
> > + INIT_LIST_HEAD(&q_ctx->hw_queue);
> >
> > if (m2m_dev->curr_ctx == m2m_ctx) {
> > m2m_dev->curr_ctx = NULL;
> > @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
> >
> > INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
> > INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
> > + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
> > + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
> > spin_lock_init(&out_q_ctx->rdy_spinlock);
> > spin_lock_init(&cap_q_ctx->rdy_spinlock);
> >
> > diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
> > index d6c8eb2b5201..2342656e582d 100644
> > --- a/include/media/v4l2-mem2mem.h
> > +++ b/include/media/v4l2-mem2mem.h
> > @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
> > * processed
> > *
> > * @q: pointer to struct &vb2_queue
> > - * @rdy_queue: List of V4L2 mem-to-mem queues
> > + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
> > + * called in struct vb2_ops->buf_queue(), the buffer enqueued
> > + * by user would be added to this list.
> > * @rdy_spinlock: spin lock to protect the struct usage
> > * @num_rdy: number of buffers ready to be processed
> > + * @hw_queue: A list for tracking the buffer is occupied by the hardware
> > + * (or device's firmware). A buffer could only be in either
> > + * this list or @rdy_queue.
> > + * Driver may choose not to use this list while uses its own
> > + * private data to do this work.
> > * @buffered: is the queue buffered?
> > *
> > * Queue for buffers ready to be processed as soon as this
> > @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
> > struct list_head rdy_queue;
> > spinlock_t rdy_spinlock;
> > u8 num_rdy;
> > + struct list_head hw_queue;
> > bool buffered;
> > };
> >
> > --
> > 2.17.1
> >


2023-07-18 05:12:13

by Hsia-Jun Li

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw


On 7/17/23 22:07, Nicolas Dufresne wrote:
> CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
>
>
> Le mercredi 12 juillet 2023 à 09:33 +0000, Tomasz Figa a écrit :
>> On Tue, Jul 04, 2023 at 12:00:38PM +0800, Hsia-Jun Li wrote:
>>> From: "Hsia-Jun(Randy) Li" <[email protected]>
>>>
>>> Many drivers have to create its own buf_struct for a
>>> vb2_queue to track such a state. Also driver has to
>>> iterate over rdy_queue every times to find out a buffer
>>> which is not sent to hardware(or firmware), this new
>>> list just offers the driver a place to store the buffer
>>> that hardware(firmware) has acknowledged.
>>>
>>> One important advance about this list, it doesn't like
>>> rdy_queue which both bottom half of the user calling
>>> could operate it, while the v4l2 worker would as well.
>>> The v4l2 core could only operate this queue when its
>>> v4l2_context is not running, the driver would only
>>> access this new hw_queue in its own worker.
>> Could you describe in what case such a list would be useful for a
>> mem2mem driver?
> Today all driver must track buffers that are "owned by the hardware". This is a
> concept dictated by the m2m framework and enforced through the ACTIVE flag.

I think that flag is confusing, the m2m framework would only set this
flag when a buffer is enqueue into v4l2 (__enqueue_in_driver()).

The application can't know whether the driver(or hardware) would still
use it when that buffer is dequeued(__vb2_dqbuf() would override the
state here).

I was trying to suggest a flag for the new DELETE_BUF ioctl() or it
could delete a buffer which the hardware could still use in the future,
if we are not in the case for non-secure stateless codec.

> All
> buffers from this list must be mark as done/error/queued after streamoff of the
> respective queue in order to acknowledge that they are no longer in use by the
> HW. Not doing so will warn:
>
> videobuf2_common: driver bug: stop_streaming operation is leaving buf ...
>
> Though, there is no queue to easily iterate them. All driver endup having their
> own queue, or just leaving the buffers in the rdy_queue (which isn't better).
>
> Nicolas
>>> Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
>>> ---
>>> drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
>>> include/media/v4l2-mem2mem.h | 10 +++++++++-
>>> 2 files changed, 26 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>> index c771aba42015..b4151147d5bd 100644
>>> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
>>> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>> @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
>>> goto job_unlock;
>>> }
>>>
>>> - src = v4l2_m2m_next_src_buf(m2m_ctx);
>>> - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
>>> - if (!src && !m2m_ctx->out_q_ctx.buffered) {
>>> - dprintk("No input buffers available\n");
>>> - goto job_unlock;
>>> + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
>>> + src = v4l2_m2m_next_src_buf(m2m_ctx);
>>> +
>>> + if (!src && !m2m_ctx->out_q_ctx.buffered) {
>>> + dprintk("No input buffers available\n");
>>> + goto job_unlock;
>>> + }
>>> }
>>> - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
>>> - dprintk("No output buffers available\n");
>>> - goto job_unlock;
>>> +
>>> + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
>>> + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
>>> + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
>>> + dprintk("No output buffers available\n");
>>> + goto job_unlock;
>>> + }
>>> }
>> src and dst would be referenced unitialized below if neither of the
>> above ifs hits...
>>
>> Best regards,
>> Tomasz
>>
>>> m2m_ctx->new_frame = true;
>>> @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
>>> INIT_LIST_HEAD(&q_ctx->rdy_queue);
>>> q_ctx->num_rdy = 0;
>>> spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
>>> + INIT_LIST_HEAD(&q_ctx->hw_queue);
>>>
>>> if (m2m_dev->curr_ctx == m2m_ctx) {
>>> m2m_dev->curr_ctx = NULL;
>>> @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
>>>
>>> INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
>>> INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
>>> + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
>>> + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
>>> spin_lock_init(&out_q_ctx->rdy_spinlock);
>>> spin_lock_init(&cap_q_ctx->rdy_spinlock);
>>>
>>> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
>>> index d6c8eb2b5201..2342656e582d 100644
>>> --- a/include/media/v4l2-mem2mem.h
>>> +++ b/include/media/v4l2-mem2mem.h
>>> @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
>>> * processed
>>> *
>>> * @q: pointer to struct &vb2_queue
>>> - * @rdy_queue: List of V4L2 mem-to-mem queues
>>> + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
>>> + * called in struct vb2_ops->buf_queue(), the buffer enqueued
>>> + * by user would be added to this list.
>>> * @rdy_spinlock: spin lock to protect the struct usage
>>> * @num_rdy: number of buffers ready to be processed
>>> + * @hw_queue: A list for tracking the buffer is occupied by the hardware
>>> + * (or device's firmware). A buffer could only be in either
>>> + * this list or @rdy_queue.
>>> + * Driver may choose not to use this list while uses its own
>>> + * private data to do this work.
>>> * @buffered: is the queue buffered?
>>> *
>>> * Queue for buffers ready to be processed as soon as this
>>> @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
>>> struct list_head rdy_queue;
>>> spinlock_t rdy_spinlock;
>>> u8 num_rdy;
>>> + struct list_head hw_queue;
>>> bool buffered;
>>> };
>>>
>>> --
>>> 2.17.1
>>>
--
Hsia-Jun(Randy) Li


2023-07-21 10:10:38

by Hsia-Jun Li

[permalink] [raw]
Subject: Re: [PATCH 1/2] [RESEND] media: v4l2-mem2mem: allow device run without buf



On 7/17/23 22:00, Nicolas Dufresne wrote:
> CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
>
>
> Le mercredi 12 juillet 2023 à 09:31 +0000, Tomasz Figa a écrit :
>> On Fri, Jul 07, 2023 at 03:14:23PM -0400, Nicolas Dufresne wrote:
>>> Hi Randy,
>>>
>>> Le mardi 04 juillet 2023 à 12:00 +0800, Hsia-Jun Li a écrit :
>>>> From: Randy Li <[email protected]>
>>>>
>>>> For the decoder supports Dynamic Resolution Change,
>>>> we don't need to allocate any CAPTURE or graphics buffer
>>>> for them at inital CAPTURE setup step.
>>>>
>>>> We need to make the device run or we can't get those
>>>> metadata.
>>>>
>>>> Signed-off-by: Randy Li <[email protected]>
>>>> ---
>>>> drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
>>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>> index 0cc30397fbad..c771aba42015 100644
>>>> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>> @@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
>>>>
>>>> dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
>>>>
>>>> - if (!m2m_ctx->out_q_ctx.q.streaming
>>>> - || !m2m_ctx->cap_q_ctx.q.streaming) {
>>>> + if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
>>>> + || !(m2m_ctx->cap_q_ctx.q.streaming
>>>> + || m2m_ctx->cap_q_ctx.buffered)) {
>>>
>>> I have a two atches with similar goals in my wave5 tree. It will be easier to
>>> upstream with an actual user, though, I'm probably a month or two away from
>>> submitting this driver again.
>>>
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__gitlab.collabora.com_chipsnmedia_kernel_-2D_commit_ac59eafd5076c4deb3bfe1fb85b3b776586ef3eb&d=DwIFaQ&c=7dfBJ8cXbWjhc0BhImu8wVIoUFmBzj1s88r8EGyM0UY&r=P4xb2_7biqBxD4LGGPrSV6j-jf3C3xlR7PXU-mLTeZE&m=9eWwqueFnh1yZHTW11j-syNVQvema7iBzNQeX1GKUQwXZ9pm6V4HDL_R2tIYKoOw&s=Ez5AyEYFIAJmC_k00IPO_ImzVdLZjr_veRq1bN4RSNg&e=
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__gitlab.collabora.com_chipsnmedia_kernel_-2D_commit_5de4fbe0abb20b8e8d862b654f93e3efeb1ef251&d=DwIFaQ&c=7dfBJ8cXbWjhc0BhImu8wVIoUFmBzj1s88r8EGyM0UY&r=P4xb2_7biqBxD4LGGPrSV6j-jf3C3xlR7PXU-mLTeZE&m=9eWwqueFnh1yZHTW11j-syNVQvema7iBzNQeX1GKUQwXZ9pm6V4HDL_R2tIYKoOw&s=tM81gjNe-bTjpjmidZ1sAhiodMh6npcVJNOhMCi1mPo&e=
>>>
>>
>> While I'm not going to NAK this series or those 2 patches if you send
>> them, I'm not really convinced that adding more and more complexity to
>> the mem2mem helpers is a good idea, especially since all of those seem
>> to be only needed by stateful video decoders.
>>
>> The mem2mem framework started as a set of helpers to eliminate boiler
>> plate from simple drivers that always get 1 CAPTURE and 1 OUTPUT buffer,
>> run 1 processing job on them and then return both of the to the userspace
>> and I think it should stay like this.
>
> Its a bit late to try and bring that argument. It should have been raised couple
> of years ago (before I even started helping with these CODEC). Now that all the
> newly written stately decoders uses this framework, it is logical to keep
> reducing the boiler plate for these too. In my opinion, the job_ready()
> callback, should have been a lot more flexible from the start. And allowing
> driver to make it more powerful does not really add that much complexity.
>
> Speaking of complexity, driving the output manually (outside of the job
> workqueue) during sequence initialization is a way more complex and risky then
> this. Finally, sticking with 1:1 pattern means encoder, detilers, image
> enhancement reducing framerate, etc. would all be unwelcome to use this. Which
> in short, means no one should even use this.
>
I think those things are m2m, but it would be hard to present in current
m2m framework:
1. N:1 compositor(It may be implemented as a loop running 2:1 compositor).
2. AV1 film gain
3. HDR with dynamic meta data to SDR

The video things fix for m2m model could be just a little less complex
than ISP or camera pipeline. The only difference is just ISP won't have
multiple contexts running at the same time.
If we could design a model for the video encoder I think we could solve
the most of problems.
A video encoder would have:
1. input graphics buffer
2. reconstruction graphics buffer
3. motion vector cache buffer(optional)
4. coded bitstream output
5. encoding statistic report
>>
>> I think we're strongly in need of a stateful video decoder framework that
>> would actually address the exact problems that those have rather than
>> bending something that wasn't designed with them in mind to work around the
>> differences.
>
> The bend is already there, of course I'd be happy to help with any new
> framework. Specially on modern stateless, were there is a need to do better
> scheduling.
I didn't know the schedule problem about stateless codec, are they
supposed to be in the job queue when the buffers that DPB requests are
own by the driver and its registers are prepared except the trigger bit?
Just ping me if you have some effort starting, I don't currently
> have a budget or bandwidth to write new drivers or port existing drivers them on
> a newly written framework.
>
> Nicolas
>
>
> [...]

--
Hsia-Jun(Randy) Li

2023-07-21 17:09:07

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: [PATCH 1/2] [RESEND] media: v4l2-mem2mem: allow device run without buf

Le vendredi 21 juillet 2023 à 16:56 +0800, Hsia-Jun Li a écrit :
>
> On 7/17/23 22:00, Nicolas Dufresne wrote:
> > CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
> >
> >
> > Le mercredi 12 juillet 2023 à 09:31 +0000, Tomasz Figa a écrit :
> > > On Fri, Jul 07, 2023 at 03:14:23PM -0400, Nicolas Dufresne wrote:
> > > > Hi Randy,
> > > >
> > > > Le mardi 04 juillet 2023 à 12:00 +0800, Hsia-Jun Li a écrit :
> > > > > From: Randy Li <[email protected]>
> > > > >
> > > > > For the decoder supports Dynamic Resolution Change,
> > > > > we don't need to allocate any CAPTURE or graphics buffer
> > > > > for them at inital CAPTURE setup step.
> > > > >
> > > > > We need to make the device run or we can't get those
> > > > > metadata.
> > > > >
> > > > > Signed-off-by: Randy Li <[email protected]>
> > > > > ---
> > > > > drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
> > > > > 1 file changed, 3 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > > index 0cc30397fbad..c771aba42015 100644
> > > > > --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > > +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > > @@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> > > > >
> > > > > dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
> > > > >
> > > > > - if (!m2m_ctx->out_q_ctx.q.streaming
> > > > > - || !m2m_ctx->cap_q_ctx.q.streaming) {
> > > > > + if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
> > > > > + || !(m2m_ctx->cap_q_ctx.q.streaming
> > > > > + || m2m_ctx->cap_q_ctx.buffered)) {
> > > >
> > > > I have a two atches with similar goals in my wave5 tree. It will be easier to
> > > > upstream with an actual user, though, I'm probably a month or two away from
> > > > submitting this driver again.
> > > >
> > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__gitlab.collabora.com_chipsnmedia_kernel_-2D_commit_ac59eafd5076c4deb3bfe1fb85b3b776586ef3eb&d=DwIFaQ&c=7dfBJ8cXbWjhc0BhImu8wVIoUFmBzj1s88r8EGyM0UY&r=P4xb2_7biqBxD4LGGPrSV6j-jf3C3xlR7PXU-mLTeZE&m=9eWwqueFnh1yZHTW11j-syNVQvema7iBzNQeX1GKUQwXZ9pm6V4HDL_R2tIYKoOw&s=Ez5AyEYFIAJmC_k00IPO_ImzVdLZjr_veRq1bN4RSNg&e=
> > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__gitlab.collabora.com_chipsnmedia_kernel_-2D_commit_5de4fbe0abb20b8e8d862b654f93e3efeb1ef251&d=DwIFaQ&c=7dfBJ8cXbWjhc0BhImu8wVIoUFmBzj1s88r8EGyM0UY&r=P4xb2_7biqBxD4LGGPrSV6j-jf3C3xlR7PXU-mLTeZE&m=9eWwqueFnh1yZHTW11j-syNVQvema7iBzNQeX1GKUQwXZ9pm6V4HDL_R2tIYKoOw&s=tM81gjNe-bTjpjmidZ1sAhiodMh6npcVJNOhMCi1mPo&e=
> > > >
> > >
> > > While I'm not going to NAK this series or those 2 patches if you send
> > > them, I'm not really convinced that adding more and more complexity to
> > > the mem2mem helpers is a good idea, especially since all of those seem
> > > to be only needed by stateful video decoders.
> > >
> > > The mem2mem framework started as a set of helpers to eliminate boiler
> > > plate from simple drivers that always get 1 CAPTURE and 1 OUTPUT buffer,
> > > run 1 processing job on them and then return both of the to the userspace
> > > and I think it should stay like this.
> >
> > Its a bit late to try and bring that argument. It should have been raised couple
> > of years ago (before I even started helping with these CODEC). Now that all the
> > newly written stately decoders uses this framework, it is logical to keep
> > reducing the boiler plate for these too. In my opinion, the job_ready()
> > callback, should have been a lot more flexible from the start. And allowing
> > driver to make it more powerful does not really add that much complexity.
> >
> > Speaking of complexity, driving the output manually (outside of the job
> > workqueue) during sequence initialization is a way more complex and risky then
> > this. Finally, sticking with 1:1 pattern means encoder, detilers, image
> > enhancement reducing framerate, etc. would all be unwelcome to use this. Which
> > in short, means no one should even use this.
> >
> I think those things are m2m, but it would be hard to present in current
> m2m framework:
> 1. N:1 compositor(It may be implemented as a loop running 2:1 compositor).

Correct, only SRC/DST/BG type of blitters can be supported for compositing,
which is quite limiting. Currently there is no way to make an N:1 M2M, as M2M
instances are implemented at the video node layer, and not at the MC layer. This
is a entirely new subject and API design space to tackle (same goes for 1:N,
like multi scalers, svc decoders etc.).

> 2. AV1 film gain

For AV1/HEVC film grain, it is handle similar to inline converters and scalers.
The driver secretly allocate the reference frames, and post process into the
user visible buffers. It breaks some assumption made by most protected memory
setup though, as not all allocation is user driven, meaning the decoder needs to
know if its secure or not. Secure memory is a also another API design space to
tackle.

> 3. HDR with dynamic meta data to SDR

True, but easy to design around the stateless model. I'm not worried for these.

>
> The video things fix for m2m model could be just a little less complex
> than ISP or camera pipeline. The only difference is just ISP won't have
> multiple contexts running at the same time.

I thought that having the kernel schedule ISP reprocessing jobs (which requires
instances) would be nice. But this can only be solved after we have solved the
N:N use cases of m2m (with multiple instances).

> If we could design a model for the video encoder I think we could solve
> the most of problems.
> A video encoder would have:
> 1. input graphics buffer
> 2. reconstruction graphics buffer
> 3. motion vector cache buffer(optional)
> 4. coded bitstream output
> 5. encoding statistic report
> > >
> > > I think we're strongly in need of a stateful video decoder framework that
> > > would actually address the exact problems that those have rather than
> > > bending something that wasn't designed with them in mind to work around the
> > > differences.
> >
> > The bend is already there, of course I'd be happy to help with any new
> > framework. Specially on modern stateless, were there is a need to do better
> > scheduling.
> I didn't know the schedule problem about stateless codec, are they
> supposed to be in the job queue when the buffers that DPB requests are
> own by the driver and its registers are prepared except the trigger bit?

On RK3588 at least, decoder scheduling is going to be complex. There is an even
number of cores, but when you need to decode 8K, you have to pair two cores
(there is a specific set of cores that are to be paired with). We need a decent
scheduling logic to ensure we don't starve 8K decoding session when there is
multiple smaller resolution session on-going.

On MTK, the entropy decoding (LAT) and the reconstruction (CORE) is split. MTK
vcodec is using multiple workqueues to move jobs around, which is clearly
expensive. Also, the life time of a job is not exactly easy to manage.

On RPi HEVC (not upstream yet, but being worked on), the entropy decoding and
reconstruction is done one the same core, but remains 2 concurrent operation.
Does not impose a complex scheduling issue, but it raised the need for a way to
fully utilize such HW.

This is just some examples of complexity for which the current framework is not
that helpful (even though, its not impossible either).

> Just ping me if you have some effort starting, I don't currently
> > have a budget or bandwidth to write new drivers or port existing drivers them on
> > a newly written framework.
> >
> > Nicolas
> >
> >
> > [...]
>


2023-07-24 17:45:07

by Randy Li

[permalink] [raw]
Subject: Re: [PATCH 1/2] [RESEND] media: v4l2-mem2mem: allow device run without buf


On 2023/7/22 00:22, Nicolas Dufresne wrote:
> Le vendredi 21 juillet 2023 à 16:56 +0800, Hsia-Jun Li a écrit :
>> On 7/17/23 22:00, Nicolas Dufresne wrote:
>>> CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
>>>
>>>
>>> Le mercredi 12 juillet 2023 à 09:31 +0000, Tomasz Figa a écrit :
>>>> On Fri, Jul 07, 2023 at 03:14:23PM -0400, Nicolas Dufresne wrote:
>>>>> Hi Randy,
>>>>>
>>>>> Le mardi 04 juillet 2023 à 12:00 +0800, Hsia-Jun Li a écrit :
>>>>>> From: Randy Li <[email protected]>
>>>>>>
>>>>>> For the decoder supports Dynamic Resolution Change,
>>>>>> we don't need to allocate any CAPTURE or graphics buffer
>>>>>> for them at inital CAPTURE setup step.
>>>>>>
>>>>>> We need to make the device run or we can't get those
>>>>>> metadata.
>>>>>>
>>>>>> Signed-off-by: Randy Li <[email protected]>
>>>>>> ---
>>>>>> drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>>>> index 0cc30397fbad..c771aba42015 100644
>>>>>> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>>>> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>>>> @@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
>>>>>>
>>>>>> dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
>>>>>>
>>>>>> - if (!m2m_ctx->out_q_ctx.q.streaming
>>>>>> - || !m2m_ctx->cap_q_ctx.q.streaming) {
>>>>>> + if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
>>>>>> + || !(m2m_ctx->cap_q_ctx.q.streaming
>>>>>> + || m2m_ctx->cap_q_ctx.buffered)) {
>>>>> I have a two atches with similar goals in my wave5 tree. It will be easier to
>>>>> upstream with an actual user, though, I'm probably a month or two away from
>>>>> submitting this driver again.
>>>>>
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__gitlab.collabora.com_chipsnmedia_kernel_-2D_commit_ac59eafd5076c4deb3bfe1fb85b3b776586ef3eb&d=DwIFaQ&c=7dfBJ8cXbWjhc0BhImu8wVIoUFmBzj1s88r8EGyM0UY&r=P4xb2_7biqBxD4LGGPrSV6j-jf3C3xlR7PXU-mLTeZE&m=9eWwqueFnh1yZHTW11j-syNVQvema7iBzNQeX1GKUQwXZ9pm6V4HDL_R2tIYKoOw&s=Ez5AyEYFIAJmC_k00IPO_ImzVdLZjr_veRq1bN4RSNg&e=
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__gitlab.collabora.com_chipsnmedia_kernel_-2D_commit_5de4fbe0abb20b8e8d862b654f93e3efeb1ef251&d=DwIFaQ&c=7dfBJ8cXbWjhc0BhImu8wVIoUFmBzj1s88r8EGyM0UY&r=P4xb2_7biqBxD4LGGPrSV6j-jf3C3xlR7PXU-mLTeZE&m=9eWwqueFnh1yZHTW11j-syNVQvema7iBzNQeX1GKUQwXZ9pm6V4HDL_R2tIYKoOw&s=tM81gjNe-bTjpjmidZ1sAhiodMh6npcVJNOhMCi1mPo&e=
>>>>>
>>>> While I'm not going to NAK this series or those 2 patches if you send
>>>> them, I'm not really convinced that adding more and more complexity to
>>>> the mem2mem helpers is a good idea, especially since all of those seem
>>>> to be only needed by stateful video decoders.
>>>>
>>>> The mem2mem framework started as a set of helpers to eliminate boiler
>>>> plate from simple drivers that always get 1 CAPTURE and 1 OUTPUT buffer,
>>>> run 1 processing job on them and then return both of the to the userspace
>>>> and I think it should stay like this.
>>> Its a bit late to try and bring that argument. It should have been raised couple
>>> of years ago (before I even started helping with these CODEC). Now that all the
>>> newly written stately decoders uses this framework, it is logical to keep
>>> reducing the boiler plate for these too. In my opinion, the job_ready()
>>> callback, should have been a lot more flexible from the start. And allowing
>>> driver to make it more powerful does not really add that much complexity.
>>>
>>> Speaking of complexity, driving the output manually (outside of the job
>>> workqueue) during sequence initialization is a way more complex and risky then
>>> this. Finally, sticking with 1:1 pattern means encoder, detilers, image
>>> enhancement reducing framerate, etc. would all be unwelcome to use this. Which
>>> in short, means no one should even use this.
>>>
>> I think those things are m2m, but it would be hard to present in current
>> m2m framework:
>> 1. N:1 compositor(It may be implemented as a loop running 2:1 compositor).
> Correct, only SRC/DST/BG type of blitters can be supported for compositing,
> which is quite limiting. Currently there is no way to make an N:1 M2M, as M2M
> instances are implemented at the video node layer, and not at the MC layer. This
> is a entirely new subject and API design space to tackle (same goes for 1:N,
> like multi scalers, svc decoders etc.).
SVC case is the one I mention in the talk, although the major problem
may only happens to SVC-S.
>
>> 2. AV1 film gain
> For AV1/HEVC film grain, it is handle similar to inline converters and scalers.

I know a few decoders in the market didn't implement such feature in the
its hardware, they rely on the other hardware.

Actually, it would be better to let NPU do such job.

> The driver secretly allocate the reference frames, and post process into the
> user visible buffers.
Hiding internal buffer is the worst case, frame buffer could be large.
> It breaks some assumption made by most protected memory
> setup though, as not all allocation is user driven, meaning the decoder needs to
> know if its secure or not. Secure memory is a also another API design space to
> tackle.
>
>> 3. HDR with dynamic meta data to SDR
> True, but easy to design around the stateless model. I'm not worried for these.
The current stateless API won't support DMA buffer for the metadata.
>
>> The video things fix for m2m model could be just a little less complex
>> than ISP or camera pipeline. The only difference is just ISP won't have
>> multiple contexts running at the same time.
> I thought that having the kernel schedule ISP reprocessing jobs (which requires
> instances) would be nice. But this can only be solved after we have solved the
> N:N use cases of m2m (with multiple instances).
>
>> If we could design a model for the video encoder I think we could solve
>> the most of problems.
>> A video encoder would have:
>> 1. input graphics buffer
>> 2. reconstruction graphics buffer
>> 3. motion vector cache buffer(optional)
>> 4. coded bitstream output
>> 5. encoding statistic report
>>>> I think we're strongly in need of a stateful video decoder framework that
>>>> would actually address the exact problems that those have rather than
>>>> bending something that wasn't designed with them in mind to work around the
>>>> differences.
>>> The bend is already there, of course I'd be happy to help with any new
>>> framework. Specially on modern stateless, were there is a need to do better
>>> scheduling.
>> I didn't know the schedule problem about stateless codec, are they
>> supposed to be in the job queue when the buffers that DPB requests are
>> own by the driver and its registers are prepared except the trigger bit?
> On RK3588 at least, decoder scheduling is going to be complex. There is an even
> number of cores, but when you need to decode 8K, you have to pair two cores
> (there is a specific set of cores that are to be paired with). We need a decent

How do two cores work parallel? Tiles ?

But AV1 could do intra block copy.

> scheduling logic to ensure we don't starve 8K decoding session when there is
> multiple smaller resolution session on-going.
>
> On MTK, the entropy decoding (LAT) and the reconstruction (CORE) is split. MTK
> vcodec is using multiple workqueues to move jobs around, which is clearly
> expensive. Also, the life time of a job is not exactly easy to manage.

This model sounds easy,

LAT produces partial frame buffer with intra blocks and its motion
vector buffer

CORE complete the frame from the motion vector buffer and its reference
buffers

We just separately two hardware devices here.

>
> On RPi HEVC (not upstream yet, but being worked on), the entropy decoding and
> reconstruction is done one the same core, but remains 2 concurrent operation.
> Does not impose a complex scheduling issue, but it raised the need for a way to
> fully utilize such HW.

This sounds be more complex than MTK's case. It would be hard to measure
the job length with entropy part and inter construction part.

Although usually the later one would consume more memory bandwidth or
hardware time.

>
> This is just some examples of complexity for which the current framework is not
> that helpful (even though, its not impossible either).
>
>> Just ping me if you have some effort starting, I don't currently
>>> have a budget or bandwidth to write new drivers or port existing drivers them on
>>> a newly written framework.
>>>
>>> Nicolas
>>>
>>>
>>> [...]

2023-07-27 07:51:32

by Tomasz Figa

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw

On Mon, Jul 17, 2023 at 4:15 PM Hsia-Jun Li <[email protected]> wrote:
>
>
> On 7/12/23 17:33, Tomasz Figa wrote:
> > CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
> >
> >
> > On Tue, Jul 04, 2023 at 12:00:38PM +0800, Hsia-Jun Li wrote:
> >> From: "Hsia-Jun(Randy) Li" <[email protected]>
> >>
> >> Many drivers have to create its own buf_struct for a
> >> vb2_queue to track such a state. Also driver has to
> >> iterate over rdy_queue every times to find out a buffer
> >> which is not sent to hardware(or firmware), this new
> >> list just offers the driver a place to store the buffer
> >> that hardware(firmware) has acknowledged.
> >>
> >> One important advance about this list, it doesn't like
> >> rdy_queue which both bottom half of the user calling
> >> could operate it, while the v4l2 worker would as well.
> >> The v4l2 core could only operate this queue when its
> >> v4l2_context is not running, the driver would only
> >> access this new hw_queue in its own worker.
> > Could you describe in what case such a list would be useful for a
> > mem2mem driver?
>
> This list, as its description, just for saving us from creating a
> private buffer struct to track buffer state.
>
> The queue in the kernel is not the queue that hardware(codec firmware)
> are using.
>

Sorry, I find the description difficult to understand. It might make
sense to have the text proofread by someone experienced in writing
technical documentation in English before posting in the future.
Thanks.

I think I got the point from Nicolas' explanation, though.

>
> >> Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
> >> ---
> >> drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
> >> include/media/v4l2-mem2mem.h | 10 +++++++++-
> >> 2 files changed, 26 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> >> index c771aba42015..b4151147d5bd 100644
> >> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> >> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> >> @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> >> goto job_unlock;
> >> }
> >>
> >> - src = v4l2_m2m_next_src_buf(m2m_ctx);
> >> - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> >> - if (!src && !m2m_ctx->out_q_ctx.buffered) {
> >> - dprintk("No input buffers available\n");
> >> - goto job_unlock;
> >> + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
> >> + src = v4l2_m2m_next_src_buf(m2m_ctx);
> >> +
> >> + if (!src && !m2m_ctx->out_q_ctx.buffered) {
> >> + dprintk("No input buffers available\n");
> >> + goto job_unlock;
> >> + }
> >> }
> >> - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> >> - dprintk("No output buffers available\n");
> >> - goto job_unlock;
> >> +
> >> + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
> >> + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> >> + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> >> + dprintk("No output buffers available\n");
> >> + goto job_unlock;
> >> + }
> >> }
> > src and dst would be referenced unitialized below if neither of the
> > above ifs hits...
> I think they have been initialized at v4l2_m2m_ctx_init()

What do you mean? They are local variables in this function.

> >
> > Best regards,
> > Tomasz
> >
> >> m2m_ctx->new_frame = true;
> >> @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
> >> INIT_LIST_HEAD(&q_ctx->rdy_queue);
> >> q_ctx->num_rdy = 0;
> >> spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
> >> + INIT_LIST_HEAD(&q_ctx->hw_queue);
> >>
> >> if (m2m_dev->curr_ctx == m2m_ctx) {
> >> m2m_dev->curr_ctx = NULL;
> >> @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
> >>
> >> INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
> >> INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
> >> + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
> >> + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
> >> spin_lock_init(&out_q_ctx->rdy_spinlock);
> >> spin_lock_init(&cap_q_ctx->rdy_spinlock);
> >>
> >> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
> >> index d6c8eb2b5201..2342656e582d 100644
> >> --- a/include/media/v4l2-mem2mem.h
> >> +++ b/include/media/v4l2-mem2mem.h
> >> @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
> >> * processed
> >> *
> >> * @q: pointer to struct &vb2_queue
> >> - * @rdy_queue: List of V4L2 mem-to-mem queues
> >> + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
> >> + * called in struct vb2_ops->buf_queue(), the buffer enqueued
> >> + * by user would be added to this list.
> >> * @rdy_spinlock: spin lock to protect the struct usage
> >> * @num_rdy: number of buffers ready to be processed
> >> + * @hw_queue: A list for tracking the buffer is occupied by the hardware
> >> + * (or device's firmware). A buffer could only be in either
> >> + * this list or @rdy_queue.
> >> + * Driver may choose not to use this list while uses its own
> >> + * private data to do this work.
> >> * @buffered: is the queue buffered?
> >> *
> >> * Queue for buffers ready to be processed as soon as this
> >> @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
> >> struct list_head rdy_queue;
> >> spinlock_t rdy_spinlock;
> >> u8 num_rdy;
> >> + struct list_head hw_queue;
> >> bool buffered;
> >> };
> >>
> >> --
> >> 2.17.1
> >>
> --
> Hsia-Jun(Randy) Li
>

2023-07-27 08:14:53

by Tomasz Figa

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw

On Thu, Jul 27, 2023 at 4:33 PM Hsia-Jun Li <[email protected]> wrote:
>
>
>
> On 7/27/23 15:25, Tomasz Figa wrote:
> > CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
> >
> >
> > On Mon, Jul 17, 2023 at 4:15 PM Hsia-Jun Li <[email protected]> wrote:
> >>
> >>
> >> On 7/12/23 17:33, Tomasz Figa wrote:
> >>> CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
> >>>
> >>>
> >>> On Tue, Jul 04, 2023 at 12:00:38PM +0800, Hsia-Jun Li wrote:
> >>>> From: "Hsia-Jun(Randy) Li" <[email protected]>
> >>>>
> >>>> Many drivers have to create its own buf_struct for a
> >>>> vb2_queue to track such a state. Also driver has to
> >>>> iterate over rdy_queue every times to find out a buffer
> >>>> which is not sent to hardware(or firmware), this new
> >>>> list just offers the driver a place to store the buffer
> >>>> that hardware(firmware) has acknowledged.
> >>>>
> >>>> One important advance about this list, it doesn't like
> >>>> rdy_queue which both bottom half of the user calling
> >>>> could operate it, while the v4l2 worker would as well.
> >>>> The v4l2 core could only operate this queue when its
> >>>> v4l2_context is not running, the driver would only
> >>>> access this new hw_queue in its own worker.
> >>> Could you describe in what case such a list would be useful for a
> >>> mem2mem driver?
> >>
> >> This list, as its description, just for saving us from creating a
> >> private buffer struct to track buffer state.
> >>
> >> The queue in the kernel is not the queue that hardware(codec firmware)
> >> are using.
> >>
> >
> > Sorry, I find the description difficult to understand. It might make
> > sense to have the text proofread by someone experienced in writing
> > technical documentation in English before posting in the future.
> > Thanks.
> >
> Sorry, I may not be able to get more resource here. I would try to ask a
> chatbot to fix my description next time.

I think people on the linux-media IRC channel (including me) could
also be willing to help with rephrasing things, but if a chatbot can
do it too, it's even better. :)

> > I think I got the point from Nicolas' explanation, though.
> >
> >>
> >>>> Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
> >>>> ---
> >>>> drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
> >>>> include/media/v4l2-mem2mem.h | 10 +++++++++-
> >>>> 2 files changed, 26 insertions(+), 9 deletions(-)
> >>>>
> >>>> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> >>>> index c771aba42015..b4151147d5bd 100644
> >>>> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> >>>> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> >>>> @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> >>>> goto job_unlock;
> >>>> }
> >>>>
> >>>> - src = v4l2_m2m_next_src_buf(m2m_ctx);
> >>>> - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> >>>> - if (!src && !m2m_ctx->out_q_ctx.buffered) {
> >>>> - dprintk("No input buffers available\n");
> >>>> - goto job_unlock;
> >>>> + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
> >>>> + src = v4l2_m2m_next_src_buf(m2m_ctx);
> >>>> +
> >>>> + if (!src && !m2m_ctx->out_q_ctx.buffered) {
> >>>> + dprintk("No input buffers available\n");
> >>>> + goto job_unlock;
> >>>> + }
> >>>> }
> >>>> - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> >>>> - dprintk("No output buffers available\n");
> >>>> - goto job_unlock;
> >>>> +
> >>>> + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
> >>>> + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> >>>> + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> >>>> + dprintk("No output buffers available\n");
> >>>> + goto job_unlock;
> >>>> + }
> >>>> }
> >>> src and dst would be referenced unitialized below if neither of the
> >>> above ifs hits...
> >> I think they have been initialized at v4l2_m2m_ctx_init()
> >
> > What do you mean? They are local variables in this function.
> >
> Sorry, I didn't notice the sentence after that.
> >>>
> >>> Best regards,
> >>> Tomasz
> >>>
> >>>> m2m_ctx->new_frame = true;
> >>>> @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
> >>>> INIT_LIST_HEAD(&q_ctx->rdy_queue);
> >>>> q_ctx->num_rdy = 0;
> >>>> spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
> >>>> + INIT_LIST_HEAD(&q_ctx->hw_queue);
> >>>>
> >>>> if (m2m_dev->curr_ctx == m2m_ctx) {
> >>>> m2m_dev->curr_ctx = NULL;
> >>>> @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
> >>>>
> >>>> INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
> >>>> INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
> >>>> + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
> >>>> + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
> >>>> spin_lock_init(&out_q_ctx->rdy_spinlock);
> >>>> spin_lock_init(&cap_q_ctx->rdy_spinlock);
> >>>>
> >>>> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
> >>>> index d6c8eb2b5201..2342656e582d 100644
> >>>> --- a/include/media/v4l2-mem2mem.h
> >>>> +++ b/include/media/v4l2-mem2mem.h
> >>>> @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
> >>>> * processed
> >>>> *
> >>>> * @q: pointer to struct &vb2_queue
> >>>> - * @rdy_queue: List of V4L2 mem-to-mem queues
> >>>> + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
> >>>> + * called in struct vb2_ops->buf_queue(), the buffer enqueued
> >>>> + * by user would be added to this list.
> >>>> * @rdy_spinlock: spin lock to protect the struct usage
> >>>> * @num_rdy: number of buffers ready to be processed
> >>>> + * @hw_queue: A list for tracking the buffer is occupied by the hardware
> >>>> + * (or device's firmware). A buffer could only be in either
> >>>> + * this list or @rdy_queue.
> >>>> + * Driver may choose not to use this list while uses its own
> >>>> + * private data to do this work.
> >>>> * @buffered: is the queue buffered?
> >>>> *
> >>>> * Queue for buffers ready to be processed as soon as this
> >>>> @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
> >>>> struct list_head rdy_queue;
> >>>> spinlock_t rdy_spinlock;
> >>>> u8 num_rdy;
> >>>> + struct list_head hw_queue;
> >>>> bool buffered;
> >>>> };
> >>>>
> >>>> --
> >>>> 2.17.1
> >>>>
> >> --
> >> Hsia-Jun(Randy) Li
> >>
>
> --
> Hsia-Jun(Randy) Li

2023-07-27 08:58:54

by Tomasz Figa

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw

On Mon, Jul 17, 2023 at 11:07 PM Nicolas Dufresne <[email protected]> wrote:
>
> Le mercredi 12 juillet 2023 à 09:33 +0000, Tomasz Figa a écrit :
> > On Tue, Jul 04, 2023 at 12:00:38PM +0800, Hsia-Jun Li wrote:
> > > From: "Hsia-Jun(Randy) Li" <[email protected]>
> > >
> > > Many drivers have to create its own buf_struct for a
> > > vb2_queue to track such a state. Also driver has to
> > > iterate over rdy_queue every times to find out a buffer
> > > which is not sent to hardware(or firmware), this new
> > > list just offers the driver a place to store the buffer
> > > that hardware(firmware) has acknowledged.
> > >
> > > One important advance about this list, it doesn't like
> > > rdy_queue which both bottom half of the user calling
> > > could operate it, while the v4l2 worker would as well.
> > > The v4l2 core could only operate this queue when its
> > > v4l2_context is not running, the driver would only
> > > access this new hw_queue in its own worker.
> >
> > Could you describe in what case such a list would be useful for a
> > mem2mem driver?
>
> Today all driver must track buffers that are "owned by the hardware". This is a
> concept dictated by the m2m framework and enforced through the ACTIVE flag. All
> buffers from this list must be mark as done/error/queued after streamoff of the
> respective queue in order to acknowledge that they are no longer in use by the
> HW. Not doing so will warn:
>
> videobuf2_common: driver bug: stop_streaming operation is leaving buf ...
>
> Though, there is no queue to easily iterate them. All driver endup having their
> own queue, or just leaving the buffers in the rdy_queue (which isn't better).
>

Thanks for the explanation. I see how it could be useful now.

Although I guess this is a problem specifically for hardware (or
firmware) which can internally queue more than 1 buffer, right?
Otherwise the current buffer could just stay at the top of the
rdy_queue until it's removed by the driver's completion handler,
timeout/error handler or context destruction.

Best regards,
Tomasz

> Nicolas
> >
> > >
> > > Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
> > > ---
> > > drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
> > > include/media/v4l2-mem2mem.h | 10 +++++++++-
> > > 2 files changed, 26 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > index c771aba42015..b4151147d5bd 100644
> > > --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> > > goto job_unlock;
> > > }
> > >
> > > - src = v4l2_m2m_next_src_buf(m2m_ctx);
> > > - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> > > - if (!src && !m2m_ctx->out_q_ctx.buffered) {
> > > - dprintk("No input buffers available\n");
> > > - goto job_unlock;
> > > + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
> > > + src = v4l2_m2m_next_src_buf(m2m_ctx);
> > > +
> > > + if (!src && !m2m_ctx->out_q_ctx.buffered) {
> > > + dprintk("No input buffers available\n");
> > > + goto job_unlock;
> > > + }
> > > }
> > > - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> > > - dprintk("No output buffers available\n");
> > > - goto job_unlock;
> > > +
> > > + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
> > > + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> > > + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> > > + dprintk("No output buffers available\n");
> > > + goto job_unlock;
> > > + }
> > > }
> >
> > src and dst would be referenced unitialized below if neither of the
> > above ifs hits...
> >
> > Best regards,
> > Tomasz
> >
> > >
> > > m2m_ctx->new_frame = true;
> > > @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
> > > INIT_LIST_HEAD(&q_ctx->rdy_queue);
> > > q_ctx->num_rdy = 0;
> > > spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
> > > + INIT_LIST_HEAD(&q_ctx->hw_queue);
> > >
> > > if (m2m_dev->curr_ctx == m2m_ctx) {
> > > m2m_dev->curr_ctx = NULL;
> > > @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
> > >
> > > INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
> > > INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
> > > + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
> > > + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
> > > spin_lock_init(&out_q_ctx->rdy_spinlock);
> > > spin_lock_init(&cap_q_ctx->rdy_spinlock);
> > >
> > > diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
> > > index d6c8eb2b5201..2342656e582d 100644
> > > --- a/include/media/v4l2-mem2mem.h
> > > +++ b/include/media/v4l2-mem2mem.h
> > > @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
> > > * processed
> > > *
> > > * @q: pointer to struct &vb2_queue
> > > - * @rdy_queue: List of V4L2 mem-to-mem queues
> > > + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
> > > + * called in struct vb2_ops->buf_queue(), the buffer enqueued
> > > + * by user would be added to this list.
> > > * @rdy_spinlock: spin lock to protect the struct usage
> > > * @num_rdy: number of buffers ready to be processed
> > > + * @hw_queue: A list for tracking the buffer is occupied by the hardware
> > > + * (or device's firmware). A buffer could only be in either
> > > + * this list or @rdy_queue.
> > > + * Driver may choose not to use this list while uses its own
> > > + * private data to do this work.
> > > * @buffered: is the queue buffered?
> > > *
> > > * Queue for buffers ready to be processed as soon as this
> > > @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
> > > struct list_head rdy_queue;
> > > spinlock_t rdy_spinlock;
> > > u8 num_rdy;
> > > + struct list_head hw_queue;
> > > bool buffered;
> > > };
> > >
> > > --
> > > 2.17.1
> > >
>

2023-07-27 09:08:54

by Hsia-Jun Li

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw



On 7/27/23 15:25, Tomasz Figa wrote:
> CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
>
>
> On Mon, Jul 17, 2023 at 4:15 PM Hsia-Jun Li <[email protected]> wrote:
>>
>>
>> On 7/12/23 17:33, Tomasz Figa wrote:
>>> CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
>>>
>>>
>>> On Tue, Jul 04, 2023 at 12:00:38PM +0800, Hsia-Jun Li wrote:
>>>> From: "Hsia-Jun(Randy) Li" <[email protected]>
>>>>
>>>> Many drivers have to create its own buf_struct for a
>>>> vb2_queue to track such a state. Also driver has to
>>>> iterate over rdy_queue every times to find out a buffer
>>>> which is not sent to hardware(or firmware), this new
>>>> list just offers the driver a place to store the buffer
>>>> that hardware(firmware) has acknowledged.
>>>>
>>>> One important advance about this list, it doesn't like
>>>> rdy_queue which both bottom half of the user calling
>>>> could operate it, while the v4l2 worker would as well.
>>>> The v4l2 core could only operate this queue when its
>>>> v4l2_context is not running, the driver would only
>>>> access this new hw_queue in its own worker.
>>> Could you describe in what case such a list would be useful for a
>>> mem2mem driver?
>>
>> This list, as its description, just for saving us from creating a
>> private buffer struct to track buffer state.
>>
>> The queue in the kernel is not the queue that hardware(codec firmware)
>> are using.
>>
>
> Sorry, I find the description difficult to understand. It might make
> sense to have the text proofread by someone experienced in writing
> technical documentation in English before posting in the future.
> Thanks.
>
Sorry, I may not be able to get more resource here. I would try to ask a
chatbot to fix my description next time.
> I think I got the point from Nicolas' explanation, though.
>
>>
>>>> Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
>>>> ---
>>>> drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
>>>> include/media/v4l2-mem2mem.h | 10 +++++++++-
>>>> 2 files changed, 26 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>> index c771aba42015..b4151147d5bd 100644
>>>> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>> @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
>>>> goto job_unlock;
>>>> }
>>>>
>>>> - src = v4l2_m2m_next_src_buf(m2m_ctx);
>>>> - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
>>>> - if (!src && !m2m_ctx->out_q_ctx.buffered) {
>>>> - dprintk("No input buffers available\n");
>>>> - goto job_unlock;
>>>> + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
>>>> + src = v4l2_m2m_next_src_buf(m2m_ctx);
>>>> +
>>>> + if (!src && !m2m_ctx->out_q_ctx.buffered) {
>>>> + dprintk("No input buffers available\n");
>>>> + goto job_unlock;
>>>> + }
>>>> }
>>>> - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
>>>> - dprintk("No output buffers available\n");
>>>> - goto job_unlock;
>>>> +
>>>> + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
>>>> + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
>>>> + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
>>>> + dprintk("No output buffers available\n");
>>>> + goto job_unlock;
>>>> + }
>>>> }
>>> src and dst would be referenced unitialized below if neither of the
>>> above ifs hits...
>> I think they have been initialized at v4l2_m2m_ctx_init()
>
> What do you mean? They are local variables in this function.
>
Sorry, I didn't notice the sentence after that.
>>>
>>> Best regards,
>>> Tomasz
>>>
>>>> m2m_ctx->new_frame = true;
>>>> @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
>>>> INIT_LIST_HEAD(&q_ctx->rdy_queue);
>>>> q_ctx->num_rdy = 0;
>>>> spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
>>>> + INIT_LIST_HEAD(&q_ctx->hw_queue);
>>>>
>>>> if (m2m_dev->curr_ctx == m2m_ctx) {
>>>> m2m_dev->curr_ctx = NULL;
>>>> @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
>>>>
>>>> INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
>>>> INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
>>>> + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
>>>> + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
>>>> spin_lock_init(&out_q_ctx->rdy_spinlock);
>>>> spin_lock_init(&cap_q_ctx->rdy_spinlock);
>>>>
>>>> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
>>>> index d6c8eb2b5201..2342656e582d 100644
>>>> --- a/include/media/v4l2-mem2mem.h
>>>> +++ b/include/media/v4l2-mem2mem.h
>>>> @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
>>>> * processed
>>>> *
>>>> * @q: pointer to struct &vb2_queue
>>>> - * @rdy_queue: List of V4L2 mem-to-mem queues
>>>> + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
>>>> + * called in struct vb2_ops->buf_queue(), the buffer enqueued
>>>> + * by user would be added to this list.
>>>> * @rdy_spinlock: spin lock to protect the struct usage
>>>> * @num_rdy: number of buffers ready to be processed
>>>> + * @hw_queue: A list for tracking the buffer is occupied by the hardware
>>>> + * (or device's firmware). A buffer could only be in either
>>>> + * this list or @rdy_queue.
>>>> + * Driver may choose not to use this list while uses its own
>>>> + * private data to do this work.
>>>> * @buffered: is the queue buffered?
>>>> *
>>>> * Queue for buffers ready to be processed as soon as this
>>>> @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
>>>> struct list_head rdy_queue;
>>>> spinlock_t rdy_spinlock;
>>>> u8 num_rdy;
>>>> + struct list_head hw_queue;
>>>> bool buffered;
>>>> };
>>>>
>>>> --
>>>> 2.17.1
>>>>
>> --
>> Hsia-Jun(Randy) Li
>>

--
Hsia-Jun(Randy) Li

2023-07-27 17:34:02

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw

Le jeudi 27 juillet 2023 à 16:43 +0900, Tomasz Figa a écrit :
> On Mon, Jul 17, 2023 at 11:07 PM Nicolas Dufresne <[email protected]> wrote:
> >
> > Le mercredi 12 juillet 2023 à 09:33 +0000, Tomasz Figa a écrit :
> > > On Tue, Jul 04, 2023 at 12:00:38PM +0800, Hsia-Jun Li wrote:
> > > > From: "Hsia-Jun(Randy) Li" <[email protected]>
> > > >
> > > > Many drivers have to create its own buf_struct for a
> > > > vb2_queue to track such a state. Also driver has to
> > > > iterate over rdy_queue every times to find out a buffer
> > > > which is not sent to hardware(or firmware), this new
> > > > list just offers the driver a place to store the buffer
> > > > that hardware(firmware) has acknowledged.
> > > >
> > > > One important advance about this list, it doesn't like
> > > > rdy_queue which both bottom half of the user calling
> > > > could operate it, while the v4l2 worker would as well.
> > > > The v4l2 core could only operate this queue when its
> > > > v4l2_context is not running, the driver would only
> > > > access this new hw_queue in its own worker.
> > >
> > > Could you describe in what case such a list would be useful for a
> > > mem2mem driver?
> >
> > Today all driver must track buffers that are "owned by the hardware". This is a
> > concept dictated by the m2m framework and enforced through the ACTIVE flag. All
> > buffers from this list must be mark as done/error/queued after streamoff of the
> > respective queue in order to acknowledge that they are no longer in use by the
> > HW. Not doing so will warn:
> >
> > videobuf2_common: driver bug: stop_streaming operation is leaving buf ...
> >
> > Though, there is no queue to easily iterate them. All driver endup having their
> > own queue, or just leaving the buffers in the rdy_queue (which isn't better).
> >
>
> Thanks for the explanation. I see how it could be useful now.
>
> Although I guess this is a problem specifically for hardware (or
> firmware) which can internally queue more than 1 buffer, right?
> Otherwise the current buffer could just stay at the top of the
> rdy_queue until it's removed by the driver's completion handler,
> timeout/error handler or context destruction.

Correct, its only an issue when you need to process multiple src buffers before
producing a dst buffer. If affects stateful decoder, stateful encoders and
deinterlacer as far as I'm aware.

Nicolas

>
> Best regards,
> Tomasz
>
> > Nicolas
> > >
> > > >
> > > > Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
> > > > ---
> > > > drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
> > > > include/media/v4l2-mem2mem.h | 10 +++++++++-
> > > > 2 files changed, 26 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > index c771aba42015..b4151147d5bd 100644
> > > > --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> > > > goto job_unlock;
> > > > }
> > > >
> > > > - src = v4l2_m2m_next_src_buf(m2m_ctx);
> > > > - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> > > > - if (!src && !m2m_ctx->out_q_ctx.buffered) {
> > > > - dprintk("No input buffers available\n");
> > > > - goto job_unlock;
> > > > + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
> > > > + src = v4l2_m2m_next_src_buf(m2m_ctx);
> > > > +
> > > > + if (!src && !m2m_ctx->out_q_ctx.buffered) {
> > > > + dprintk("No input buffers available\n");
> > > > + goto job_unlock;
> > > > + }
> > > > }
> > > > - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> > > > - dprintk("No output buffers available\n");
> > > > - goto job_unlock;
> > > > +
> > > > + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
> > > > + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> > > > + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> > > > + dprintk("No output buffers available\n");
> > > > + goto job_unlock;
> > > > + }
> > > > }
> > >
> > > src and dst would be referenced unitialized below if neither of the
> > > above ifs hits...
> > >
> > > Best regards,
> > > Tomasz
> > >
> > > >
> > > > m2m_ctx->new_frame = true;
> > > > @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
> > > > INIT_LIST_HEAD(&q_ctx->rdy_queue);
> > > > q_ctx->num_rdy = 0;
> > > > spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
> > > > + INIT_LIST_HEAD(&q_ctx->hw_queue);
> > > >
> > > > if (m2m_dev->curr_ctx == m2m_ctx) {
> > > > m2m_dev->curr_ctx = NULL;
> > > > @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
> > > >
> > > > INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
> > > > INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
> > > > + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
> > > > + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
> > > > spin_lock_init(&out_q_ctx->rdy_spinlock);
> > > > spin_lock_init(&cap_q_ctx->rdy_spinlock);
> > > >
> > > > diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
> > > > index d6c8eb2b5201..2342656e582d 100644
> > > > --- a/include/media/v4l2-mem2mem.h
> > > > +++ b/include/media/v4l2-mem2mem.h
> > > > @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
> > > > * processed
> > > > *
> > > > * @q: pointer to struct &vb2_queue
> > > > - * @rdy_queue: List of V4L2 mem-to-mem queues
> > > > + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
> > > > + * called in struct vb2_ops->buf_queue(), the buffer enqueued
> > > > + * by user would be added to this list.
> > > > * @rdy_spinlock: spin lock to protect the struct usage
> > > > * @num_rdy: number of buffers ready to be processed
> > > > + * @hw_queue: A list for tracking the buffer is occupied by the hardware
> > > > + * (or device's firmware). A buffer could only be in either
> > > > + * this list or @rdy_queue.
> > > > + * Driver may choose not to use this list while uses its own
> > > > + * private data to do this work.
> > > > * @buffered: is the queue buffered?
> > > > *
> > > > * Queue for buffers ready to be processed as soon as this
> > > > @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
> > > > struct list_head rdy_queue;
> > > > spinlock_t rdy_spinlock;
> > > > u8 num_rdy;
> > > > + struct list_head hw_queue;
> > > > bool buffered;
> > > > };
> > > >
> > > > --
> > > > 2.17.1
> > > >
> >


2023-07-28 05:47:25

by Tomasz Figa

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw

On Fri, Jul 28, 2023 at 1:58 AM Nicolas Dufresne <[email protected]> wrote:
>
> Le jeudi 27 juillet 2023 à 16:43 +0900, Tomasz Figa a écrit :
> > On Mon, Jul 17, 2023 at 11:07 PM Nicolas Dufresne <[email protected]> wrote:
> > >
> > > Le mercredi 12 juillet 2023 à 09:33 +0000, Tomasz Figa a écrit :
> > > > On Tue, Jul 04, 2023 at 12:00:38PM +0800, Hsia-Jun Li wrote:
> > > > > From: "Hsia-Jun(Randy) Li" <[email protected]>
> > > > >
> > > > > Many drivers have to create its own buf_struct for a
> > > > > vb2_queue to track such a state. Also driver has to
> > > > > iterate over rdy_queue every times to find out a buffer
> > > > > which is not sent to hardware(or firmware), this new
> > > > > list just offers the driver a place to store the buffer
> > > > > that hardware(firmware) has acknowledged.
> > > > >
> > > > > One important advance about this list, it doesn't like
> > > > > rdy_queue which both bottom half of the user calling
> > > > > could operate it, while the v4l2 worker would as well.
> > > > > The v4l2 core could only operate this queue when its
> > > > > v4l2_context is not running, the driver would only
> > > > > access this new hw_queue in its own worker.
> > > >
> > > > Could you describe in what case such a list would be useful for a
> > > > mem2mem driver?
> > >
> > > Today all driver must track buffers that are "owned by the hardware". This is a
> > > concept dictated by the m2m framework and enforced through the ACTIVE flag. All
> > > buffers from this list must be mark as done/error/queued after streamoff of the
> > > respective queue in order to acknowledge that they are no longer in use by the
> > > HW. Not doing so will warn:
> > >
> > > videobuf2_common: driver bug: stop_streaming operation is leaving buf ...
> > >
> > > Though, there is no queue to easily iterate them. All driver endup having their
> > > own queue, or just leaving the buffers in the rdy_queue (which isn't better).
> > >
> >
> > Thanks for the explanation. I see how it could be useful now.
> >
> > Although I guess this is a problem specifically for hardware (or
> > firmware) which can internally queue more than 1 buffer, right?
> > Otherwise the current buffer could just stay at the top of the
> > rdy_queue until it's removed by the driver's completion handler,
> > timeout/error handler or context destruction.
>
> Correct, its only an issue when you need to process multiple src buffers before
> producing a dst buffer. If affects stateful decoder, stateful encoders and
> deinterlacer as far as I'm aware.

Is it actually necessary to keep those buffers in a list in that case, though?
I can see that a deinterlacer would indeed need 2 input buffers to
perform the deinterlacing operation, but those would be just known to
the driver, since it's running the task currently.
For a stateful decoder, wouldn't it just consume the bitstream buffer
(producing something partially decoded to its own internal buffers)
and return it shortly?
The most realistic scenario would be for stateful encoders which could
keep some input buffers as reference frames for further encoding, but
then would this patch actually work for them? It would make
__v4l2_m2m_try_queue never add the context to the job_queue if there
are some buffers in that hw_queue list.

Maybe what I need here are actual patches modifying some existing
drivers. Randy, would you be able to include that in the next version?
Thanks.

Best regards,
Tomasz

>
> Nicolas
>
> >
> > Best regards,
> > Tomasz
> >
> > > Nicolas
> > > >
> > > > >
> > > > > Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
> > > > > ---
> > > > > drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
> > > > > include/media/v4l2-mem2mem.h | 10 +++++++++-
> > > > > 2 files changed, 26 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > > index c771aba42015..b4151147d5bd 100644
> > > > > --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > > +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > > @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> > > > > goto job_unlock;
> > > > > }
> > > > >
> > > > > - src = v4l2_m2m_next_src_buf(m2m_ctx);
> > > > > - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> > > > > - if (!src && !m2m_ctx->out_q_ctx.buffered) {
> > > > > - dprintk("No input buffers available\n");
> > > > > - goto job_unlock;
> > > > > + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
> > > > > + src = v4l2_m2m_next_src_buf(m2m_ctx);
> > > > > +
> > > > > + if (!src && !m2m_ctx->out_q_ctx.buffered) {
> > > > > + dprintk("No input buffers available\n");
> > > > > + goto job_unlock;
> > > > > + }
> > > > > }
> > > > > - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> > > > > - dprintk("No output buffers available\n");
> > > > > - goto job_unlock;
> > > > > +
> > > > > + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
> > > > > + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> > > > > + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> > > > > + dprintk("No output buffers available\n");
> > > > > + goto job_unlock;
> > > > > + }
> > > > > }
> > > >
> > > > src and dst would be referenced unitialized below if neither of the
> > > > above ifs hits...
> > > >
> > > > Best regards,
> > > > Tomasz
> > > >
> > > > >
> > > > > m2m_ctx->new_frame = true;
> > > > > @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
> > > > > INIT_LIST_HEAD(&q_ctx->rdy_queue);
> > > > > q_ctx->num_rdy = 0;
> > > > > spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
> > > > > + INIT_LIST_HEAD(&q_ctx->hw_queue);
> > > > >
> > > > > if (m2m_dev->curr_ctx == m2m_ctx) {
> > > > > m2m_dev->curr_ctx = NULL;
> > > > > @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
> > > > >
> > > > > INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
> > > > > INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
> > > > > + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
> > > > > + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
> > > > > spin_lock_init(&out_q_ctx->rdy_spinlock);
> > > > > spin_lock_init(&cap_q_ctx->rdy_spinlock);
> > > > >
> > > > > diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
> > > > > index d6c8eb2b5201..2342656e582d 100644
> > > > > --- a/include/media/v4l2-mem2mem.h
> > > > > +++ b/include/media/v4l2-mem2mem.h
> > > > > @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
> > > > > * processed
> > > > > *
> > > > > * @q: pointer to struct &vb2_queue
> > > > > - * @rdy_queue: List of V4L2 mem-to-mem queues
> > > > > + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
> > > > > + * called in struct vb2_ops->buf_queue(), the buffer enqueued
> > > > > + * by user would be added to this list.
> > > > > * @rdy_spinlock: spin lock to protect the struct usage
> > > > > * @num_rdy: number of buffers ready to be processed
> > > > > + * @hw_queue: A list for tracking the buffer is occupied by the hardware
> > > > > + * (or device's firmware). A buffer could only be in either
> > > > > + * this list or @rdy_queue.
> > > > > + * Driver may choose not to use this list while uses its own
> > > > > + * private data to do this work.
> > > > > * @buffered: is the queue buffered?
> > > > > *
> > > > > * Queue for buffers ready to be processed as soon as this
> > > > > @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
> > > > > struct list_head rdy_queue;
> > > > > spinlock_t rdy_spinlock;
> > > > > u8 num_rdy;
> > > > > + struct list_head hw_queue;
> > > > > bool buffered;
> > > > > };
> > > > >
> > > > > --
> > > > > 2.17.1
> > > > >
> > >
>

2023-07-28 08:04:48

by Hsia-Jun Li

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw



On 7/28/23 12:43, Tomasz Figa wrote:
> CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
>
>
> On Fri, Jul 28, 2023 at 1:58 AM Nicolas Dufresne <[email protected]> wrote:
>>
>> Le jeudi 27 juillet 2023 à 16:43 +0900, Tomasz Figa a écrit :
>>> On Mon, Jul 17, 2023 at 11:07 PM Nicolas Dufresne <[email protected]> wrote:
>>>>
>>>> Le mercredi 12 juillet 2023 à 09:33 +0000, Tomasz Figa a écrit :
>>>>> On Tue, Jul 04, 2023 at 12:00:38PM +0800, Hsia-Jun Li wrote:
>>>>>> From: "Hsia-Jun(Randy) Li" <[email protected]>
>>>>>>
>>>>>> Many drivers have to create its own buf_struct for a
>>>>>> vb2_queue to track such a state. Also driver has to
>>>>>> iterate over rdy_queue every times to find out a buffer
>>>>>> which is not sent to hardware(or firmware), this new
>>>>>> list just offers the driver a place to store the buffer
>>>>>> that hardware(firmware) has acknowledged.
>>>>>>
>>>>>> One important advance about this list, it doesn't like
>>>>>> rdy_queue which both bottom half of the user calling
>>>>>> could operate it, while the v4l2 worker would as well.
>>>>>> The v4l2 core could only operate this queue when its
>>>>>> v4l2_context is not running, the driver would only
>>>>>> access this new hw_queue in its own worker.
>>>>>
>>>>> Could you describe in what case such a list would be useful for a
>>>>> mem2mem driver?
>>>>
>>>> Today all driver must track buffers that are "owned by the hardware". This is a
>>>> concept dictated by the m2m framework and enforced through the ACTIVE flag. All
>>>> buffers from this list must be mark as done/error/queued after streamoff of the
>>>> respective queue in order to acknowledge that they are no longer in use by the
>>>> HW. Not doing so will warn:
>>>>
>>>> videobuf2_common: driver bug: stop_streaming operation is leaving buf ...
>>>>
>>>> Though, there is no queue to easily iterate them. All driver endup having their
>>>> own queue, or just leaving the buffers in the rdy_queue (which isn't better).
>>>>
>>>
>>> Thanks for the explanation. I see how it could be useful now.
>>>
>>> Although I guess this is a problem specifically for hardware (or
>>> firmware) which can internally queue more than 1 buffer, right?
>>> Otherwise the current buffer could just stay at the top of the
>>> rdy_queue until it's removed by the driver's completion handler,
>>> timeout/error handler or context destruction.
>>
>> Correct, its only an issue when you need to process multiple src buffers before
>> producing a dst buffer. If affects stateful decoder, stateful encoders and
>> deinterlacer as far as I'm aware.
>
> Is it actually necessary to keep those buffers in a list in that case, though?
> I can see that a deinterlacer would indeed need 2 input buffers to
> perform the deinterlacing operation, but those would be just known to
> the driver, since it's running the task currently.
> For a stateful decoder, wouldn't it just consume the bitstream buffer
> (producing something partially decoded to its own internal buffers)
> and return it shortly?
Display re-order. Firmware could do such batch work, taking a few
bitstream buffer, then output a list graphics buffer in the display
order also discard the usage of the non-display buffer when it is
removed from dpb.

Even in one input and one output mode, firmware need to do redo, let the
driver know when a graphics buffer could be display, so firmware would
usually hold the graphics buffer(frame) until its display time.

Besides, I hate the driver occupied a large of memory without user's
order. I would like to drop those internal buffers.
> The most realistic scenario would be for stateful encoders which could
> keep some input buffers as reference frames for further encoding, but
> then would this patch actually work for them? It would make
> __v4l2_m2m_try_queue never add the context to the job_queue if there
> are some buffers in that hw_queue list.
why?
>
> Maybe what I need here are actual patches modifying some existing
> drivers. Randy, would you be able to include that in the next version?
May not. The Synaptics VideoSmart is a secure video platform(DRM), I
could release a snapshot of the driver when I got the permission, that
would be after the official release of the SDK.
But you may not be able to compile it because we have our own TEE
interface(not optee), also running it because the trusted app would be
signed with a per-device key.
> Thanks.
>
> Best regards,
> Tomasz
>
>>
>> Nicolas
>>
>>>
>>> Best regards,
>>> Tomasz
>>>
>>>> Nicolas
>>>>>
>>>>>>
>>>>>> Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
>>>>>> ---
>>>>>> drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
>>>>>> include/media/v4l2-mem2mem.h | 10 +++++++++-
>>>>>> 2 files changed, 26 insertions(+), 9 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>>>> index c771aba42015..b4151147d5bd 100644
>>>>>> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>>>> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>>>> @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
>>>>>> goto job_unlock;
>>>>>> }
>>>>>>
>>>>>> - src = v4l2_m2m_next_src_buf(m2m_ctx);
>>>>>> - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
>>>>>> - if (!src && !m2m_ctx->out_q_ctx.buffered) {
>>>>>> - dprintk("No input buffers available\n");
>>>>>> - goto job_unlock;
>>>>>> + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
>>>>>> + src = v4l2_m2m_next_src_buf(m2m_ctx);
>>>>>> +
>>>>>> + if (!src && !m2m_ctx->out_q_ctx.buffered) {
>>>>>> + dprintk("No input buffers available\n");
>>>>>> + goto job_unlock;
>>>>>> + }
>>>>>> }
>>>>>> - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
>>>>>> - dprintk("No output buffers available\n");
>>>>>> - goto job_unlock;
>>>>>> +
>>>>>> + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
>>>>>> + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
>>>>>> + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
>>>>>> + dprintk("No output buffers available\n");
>>>>>> + goto job_unlock;
>>>>>> + }
>>>>>> }
>>>>>
>>>>> src and dst would be referenced unitialized below if neither of the
>>>>> above ifs hits...
>>>>>
>>>>> Best regards,
>>>>> Tomasz
>>>>>
>>>>>>
>>>>>> m2m_ctx->new_frame = true;
>>>>>> @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
>>>>>> INIT_LIST_HEAD(&q_ctx->rdy_queue);
>>>>>> q_ctx->num_rdy = 0;
>>>>>> spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
>>>>>> + INIT_LIST_HEAD(&q_ctx->hw_queue);
>>>>>>
>>>>>> if (m2m_dev->curr_ctx == m2m_ctx) {
>>>>>> m2m_dev->curr_ctx = NULL;
>>>>>> @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
>>>>>>
>>>>>> INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
>>>>>> INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
>>>>>> + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
>>>>>> + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
>>>>>> spin_lock_init(&out_q_ctx->rdy_spinlock);
>>>>>> spin_lock_init(&cap_q_ctx->rdy_spinlock);
>>>>>>
>>>>>> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
>>>>>> index d6c8eb2b5201..2342656e582d 100644
>>>>>> --- a/include/media/v4l2-mem2mem.h
>>>>>> +++ b/include/media/v4l2-mem2mem.h
>>>>>> @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
>>>>>> * processed
>>>>>> *
>>>>>> * @q: pointer to struct &vb2_queue
>>>>>> - * @rdy_queue: List of V4L2 mem-to-mem queues
>>>>>> + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
>>>>>> + * called in struct vb2_ops->buf_queue(), the buffer enqueued
>>>>>> + * by user would be added to this list.
>>>>>> * @rdy_spinlock: spin lock to protect the struct usage
>>>>>> * @num_rdy: number of buffers ready to be processed
>>>>>> + * @hw_queue: A list for tracking the buffer is occupied by the hardware
>>>>>> + * (or device's firmware). A buffer could only be in either
>>>>>> + * this list or @rdy_queue.
>>>>>> + * Driver may choose not to use this list while uses its own
>>>>>> + * private data to do this work.
>>>>>> * @buffered: is the queue buffered?
>>>>>> *
>>>>>> * Queue for buffers ready to be processed as soon as this
>>>>>> @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
>>>>>> struct list_head rdy_queue;
>>>>>> spinlock_t rdy_spinlock;
>>>>>> u8 num_rdy;
>>>>>> + struct list_head hw_queue;
>>>>>> bool buffered;
>>>>>> };
>>>>>>
>>>>>> --
>>>>>> 2.17.1
>>>>>>
>>>>
>>

--
Hsia-Jun(Randy) Li

2023-07-28 08:31:50

by Hsia-Jun Li

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw



On 7/28/23 15:26, Tomasz Figa wrote:
> CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
>
>
> On Fri, Jul 28, 2023 at 4:09 PM Hsia-Jun Li <[email protected]> wrote:
>>
>>
>>
>> On 7/28/23 12:43, Tomasz Figa wrote:
>>> CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
>>>
>>>
>>> On Fri, Jul 28, 2023 at 1:58 AM Nicolas Dufresne <[email protected]> wrote:
>>>>
>>>> Le jeudi 27 juillet 2023 à 16:43 +0900, Tomasz Figa a écrit :
>>>>> On Mon, Jul 17, 2023 at 11:07 PM Nicolas Dufresne <[email protected]> wrote:
>>>>>>
>>>>>> Le mercredi 12 juillet 2023 à 09:33 +0000, Tomasz Figa a écrit :
>>>>>>> On Tue, Jul 04, 2023 at 12:00:38PM +0800, Hsia-Jun Li wrote:
>>>>>>>> From: "Hsia-Jun(Randy) Li" <[email protected]>
>>>>>>>>
>>>>>>>> Many drivers have to create its own buf_struct for a
>>>>>>>> vb2_queue to track such a state. Also driver has to
>>>>>>>> iterate over rdy_queue every times to find out a buffer
>>>>>>>> which is not sent to hardware(or firmware), this new
>>>>>>>> list just offers the driver a place to store the buffer
>>>>>>>> that hardware(firmware) has acknowledged.
>>>>>>>>
>>>>>>>> One important advance about this list, it doesn't like
>>>>>>>> rdy_queue which both bottom half of the user calling
>>>>>>>> could operate it, while the v4l2 worker would as well.
>>>>>>>> The v4l2 core could only operate this queue when its
>>>>>>>> v4l2_context is not running, the driver would only
>>>>>>>> access this new hw_queue in its own worker.
>>>>>>>
>>>>>>> Could you describe in what case such a list would be useful for a
>>>>>>> mem2mem driver?
>>>>>>
>>>>>> Today all driver must track buffers that are "owned by the hardware". This is a
>>>>>> concept dictated by the m2m framework and enforced through the ACTIVE flag. All
>>>>>> buffers from this list must be mark as done/error/queued after streamoff of the
>>>>>> respective queue in order to acknowledge that they are no longer in use by the
>>>>>> HW. Not doing so will warn:
>>>>>>
>>>>>> videobuf2_common: driver bug: stop_streaming operation is leaving buf ...
>>>>>>
>>>>>> Though, there is no queue to easily iterate them. All driver endup having their
>>>>>> own queue, or just leaving the buffers in the rdy_queue (which isn't better).
>>>>>>
>>>>>
>>>>> Thanks for the explanation. I see how it could be useful now.
>>>>>
>>>>> Although I guess this is a problem specifically for hardware (or
>>>>> firmware) which can internally queue more than 1 buffer, right?
>>>>> Otherwise the current buffer could just stay at the top of the
>>>>> rdy_queue until it's removed by the driver's completion handler,
>>>>> timeout/error handler or context destruction.
>>>>
>>>> Correct, its only an issue when you need to process multiple src buffers before
>>>> producing a dst buffer. If affects stateful decoder, stateful encoders and
>>>> deinterlacer as far as I'm aware.
>>>
>>> Is it actually necessary to keep those buffers in a list in that case, though?
>>> I can see that a deinterlacer would indeed need 2 input buffers to
>>> perform the deinterlacing operation, but those would be just known to
>>> the driver, since it's running the task currently.
>>> For a stateful decoder, wouldn't it just consume the bitstream buffer
>>> (producing something partially decoded to its own internal buffers)
>>> and return it shortly?
>> Display re-order. Firmware could do such batch work, taking a few
>> bitstream buffer, then output a list graphics buffer in the display
>> order also discard the usage of the non-display buffer when it is
>> removed from dpb.
>>
>> Even in one input and one output mode, firmware need to do redo, let the
>> driver know when a graphics buffer could be display, so firmware would
>> usually hold the graphics buffer(frame) until its display time.
>>
>
> Okay, so that hold would be for frame buffers, not bitstream buffers, right?
For the 1:1 model, decoder won't hold the input(OUTPUT queue) buffer
usually.
While for the VP9, we have a super frame and temporal unit packing for
AV1 which break the current API requirement for an AU in a buffer. The
hardware would trigger multiple work for that(that means multiple irqs
ack for a usual devices).
For the encoder, it is a different story.
> But yeah, I see that then it could hold onto those buffers until it's
> their turn to display and it could be a bigger number of frames,
> depending on the complexity of the codec.
>
>> Besides, I hate the driver occupied a large of memory without user's
>> order. I would like to drop those internal buffers.
>
> I think this is one reason to migrate to the stateless decoder design.
>
I didn't know such plan here. I don't think the current stateless API
could export the reconstruction buffers for encoder or post-processing
buffer for decoder to us.
>>> The most realistic scenario would be for stateful encoders which could
>>> keep some input buffers as reference frames for further encoding, but
>>> then would this patch actually work for them? It would make
>>> __v4l2_m2m_try_queue never add the context to the job_queue if there
>>> are some buffers in that hw_queue list.
>> why?
>>>
>>> Maybe what I need here are actual patches modifying some existing
>>> drivers. Randy, would you be able to include that in the next version?
>> May not. The Synaptics VideoSmart is a secure video platform(DRM), I
>> could release a snapshot of the driver when I got the permission, that
>> would be after the official release of the SDK.
>> But you may not be able to compile it because we have our own TEE
>> interface(not optee), also running it because the trusted app would be
>> signed with a per-device key.
>
> Could you modify another, already existing driver then?
>
>>> Thanks.
>>>
>>> Best regards,
>>> Tomasz
>>>
>>>>
>>>> Nicolas
>>>>
>>>>>
>>>>> Best regards,
>>>>> Tomasz
>>>>>
>>>>>> Nicolas
>>>>>>>
>>>>>>>>
>>>>>>>> Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
>>>>>>>> ---
>>>>>>>> drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
>>>>>>>> include/media/v4l2-mem2mem.h | 10 +++++++++-
>>>>>>>> 2 files changed, 26 insertions(+), 9 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>>>>>> index c771aba42015..b4151147d5bd 100644
>>>>>>>> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>>>>>> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>>>>>> @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
>>>>>>>> goto job_unlock;
>>>>>>>> }
>>>>>>>>
>>>>>>>> - src = v4l2_m2m_next_src_buf(m2m_ctx);
>>>>>>>> - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
>>>>>>>> - if (!src && !m2m_ctx->out_q_ctx.buffered) {
>>>>>>>> - dprintk("No input buffers available\n");
>>>>>>>> - goto job_unlock;
>>>>>>>> + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
>>>>>>>> + src = v4l2_m2m_next_src_buf(m2m_ctx);
>>>>>>>> +
>>>>>>>> + if (!src && !m2m_ctx->out_q_ctx.buffered) {
>>>>>>>> + dprintk("No input buffers available\n");
>>>>>>>> + goto job_unlock;
>>>>>>>> + }
>>>>>>>> }
>>>>>>>> - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
>>>>>>>> - dprintk("No output buffers available\n");
>>>>>>>> - goto job_unlock;
>>>>>>>> +
>>>>>>>> + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
>>>>>>>> + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
>>>>>>>> + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
>>>>>>>> + dprintk("No output buffers available\n");
>>>>>>>> + goto job_unlock;
>>>>>>>> + }
>>>>>>>> }
>>>>>>>
>>>>>>> src and dst would be referenced unitialized below if neither of the
>>>>>>> above ifs hits...
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Tomasz
>>>>>>>
>>>>>>>>
>>>>>>>> m2m_ctx->new_frame = true;
>>>>>>>> @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
>>>>>>>> INIT_LIST_HEAD(&q_ctx->rdy_queue);
>>>>>>>> q_ctx->num_rdy = 0;
>>>>>>>> spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
>>>>>>>> + INIT_LIST_HEAD(&q_ctx->hw_queue);
>>>>>>>>
>>>>>>>> if (m2m_dev->curr_ctx == m2m_ctx) {
>>>>>>>> m2m_dev->curr_ctx = NULL;
>>>>>>>> @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
>>>>>>>>
>>>>>>>> INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
>>>>>>>> INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
>>>>>>>> + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
>>>>>>>> + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
>>>>>>>> spin_lock_init(&out_q_ctx->rdy_spinlock);
>>>>>>>> spin_lock_init(&cap_q_ctx->rdy_spinlock);
>>>>>>>>
>>>>>>>> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
>>>>>>>> index d6c8eb2b5201..2342656e582d 100644
>>>>>>>> --- a/include/media/v4l2-mem2mem.h
>>>>>>>> +++ b/include/media/v4l2-mem2mem.h
>>>>>>>> @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
>>>>>>>> * processed
>>>>>>>> *
>>>>>>>> * @q: pointer to struct &vb2_queue
>>>>>>>> - * @rdy_queue: List of V4L2 mem-to-mem queues
>>>>>>>> + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
>>>>>>>> + * called in struct vb2_ops->buf_queue(), the buffer enqueued
>>>>>>>> + * by user would be added to this list.
>>>>>>>> * @rdy_spinlock: spin lock to protect the struct usage
>>>>>>>> * @num_rdy: number of buffers ready to be processed
>>>>>>>> + * @hw_queue: A list for tracking the buffer is occupied by the hardware
>>>>>>>> + * (or device's firmware). A buffer could only be in either
>>>>>>>> + * this list or @rdy_queue.
>>>>>>>> + * Driver may choose not to use this list while uses its own
>>>>>>>> + * private data to do this work.
>>>>>>>> * @buffered: is the queue buffered?
>>>>>>>> *
>>>>>>>> * Queue for buffers ready to be processed as soon as this
>>>>>>>> @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
>>>>>>>> struct list_head rdy_queue;
>>>>>>>> spinlock_t rdy_spinlock;
>>>>>>>> u8 num_rdy;
>>>>>>>> + struct list_head hw_queue;
>>>>>>>> bool buffered;
>>>>>>>> };
>>>>>>>>
>>>>>>>> --
>>>>>>>> 2.17.1
>>>>>>>>
>>>>>>
>>>>
>>
>> --
>> Hsia-Jun(Randy) Li

--
Hsia-Jun(Randy) Li

2023-07-28 09:13:46

by Tomasz Figa

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw

On Fri, Jul 28, 2023 at 4:09 PM Hsia-Jun Li <[email protected]> wrote:
>
>
>
> On 7/28/23 12:43, Tomasz Figa wrote:
> > CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
> >
> >
> > On Fri, Jul 28, 2023 at 1:58 AM Nicolas Dufresne <[email protected]> wrote:
> >>
> >> Le jeudi 27 juillet 2023 à 16:43 +0900, Tomasz Figa a écrit :
> >>> On Mon, Jul 17, 2023 at 11:07 PM Nicolas Dufresne <[email protected]> wrote:
> >>>>
> >>>> Le mercredi 12 juillet 2023 à 09:33 +0000, Tomasz Figa a écrit :
> >>>>> On Tue, Jul 04, 2023 at 12:00:38PM +0800, Hsia-Jun Li wrote:
> >>>>>> From: "Hsia-Jun(Randy) Li" <[email protected]>
> >>>>>>
> >>>>>> Many drivers have to create its own buf_struct for a
> >>>>>> vb2_queue to track such a state. Also driver has to
> >>>>>> iterate over rdy_queue every times to find out a buffer
> >>>>>> which is not sent to hardware(or firmware), this new
> >>>>>> list just offers the driver a place to store the buffer
> >>>>>> that hardware(firmware) has acknowledged.
> >>>>>>
> >>>>>> One important advance about this list, it doesn't like
> >>>>>> rdy_queue which both bottom half of the user calling
> >>>>>> could operate it, while the v4l2 worker would as well.
> >>>>>> The v4l2 core could only operate this queue when its
> >>>>>> v4l2_context is not running, the driver would only
> >>>>>> access this new hw_queue in its own worker.
> >>>>>
> >>>>> Could you describe in what case such a list would be useful for a
> >>>>> mem2mem driver?
> >>>>
> >>>> Today all driver must track buffers that are "owned by the hardware". This is a
> >>>> concept dictated by the m2m framework and enforced through the ACTIVE flag. All
> >>>> buffers from this list must be mark as done/error/queued after streamoff of the
> >>>> respective queue in order to acknowledge that they are no longer in use by the
> >>>> HW. Not doing so will warn:
> >>>>
> >>>> videobuf2_common: driver bug: stop_streaming operation is leaving buf ...
> >>>>
> >>>> Though, there is no queue to easily iterate them. All driver endup having their
> >>>> own queue, or just leaving the buffers in the rdy_queue (which isn't better).
> >>>>
> >>>
> >>> Thanks for the explanation. I see how it could be useful now.
> >>>
> >>> Although I guess this is a problem specifically for hardware (or
> >>> firmware) which can internally queue more than 1 buffer, right?
> >>> Otherwise the current buffer could just stay at the top of the
> >>> rdy_queue until it's removed by the driver's completion handler,
> >>> timeout/error handler or context destruction.
> >>
> >> Correct, its only an issue when you need to process multiple src buffers before
> >> producing a dst buffer. If affects stateful decoder, stateful encoders and
> >> deinterlacer as far as I'm aware.
> >
> > Is it actually necessary to keep those buffers in a list in that case, though?
> > I can see that a deinterlacer would indeed need 2 input buffers to
> > perform the deinterlacing operation, but those would be just known to
> > the driver, since it's running the task currently.
> > For a stateful decoder, wouldn't it just consume the bitstream buffer
> > (producing something partially decoded to its own internal buffers)
> > and return it shortly?
> Display re-order. Firmware could do such batch work, taking a few
> bitstream buffer, then output a list graphics buffer in the display
> order also discard the usage of the non-display buffer when it is
> removed from dpb.
>
> Even in one input and one output mode, firmware need to do redo, let the
> driver know when a graphics buffer could be display, so firmware would
> usually hold the graphics buffer(frame) until its display time.
>

Okay, so that hold would be for frame buffers, not bitstream buffers, right?
But yeah, I see that then it could hold onto those buffers until it's
their turn to display and it could be a bigger number of frames,
depending on the complexity of the codec.

> Besides, I hate the driver occupied a large of memory without user's
> order. I would like to drop those internal buffers.

I think this is one reason to migrate to the stateless decoder design.

> > The most realistic scenario would be for stateful encoders which could
> > keep some input buffers as reference frames for further encoding, but
> > then would this patch actually work for them? It would make
> > __v4l2_m2m_try_queue never add the context to the job_queue if there
> > are some buffers in that hw_queue list.
> why?
> >
> > Maybe what I need here are actual patches modifying some existing
> > drivers. Randy, would you be able to include that in the next version?
> May not. The Synaptics VideoSmart is a secure video platform(DRM), I
> could release a snapshot of the driver when I got the permission, that
> would be after the official release of the SDK.
> But you may not be able to compile it because we have our own TEE
> interface(not optee), also running it because the trusted app would be
> signed with a per-device key.

Could you modify another, already existing driver then?

> > Thanks.
> >
> > Best regards,
> > Tomasz
> >
> >>
> >> Nicolas
> >>
> >>>
> >>> Best regards,
> >>> Tomasz
> >>>
> >>>> Nicolas
> >>>>>
> >>>>>>
> >>>>>> Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
> >>>>>> ---
> >>>>>> drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
> >>>>>> include/media/v4l2-mem2mem.h | 10 +++++++++-
> >>>>>> 2 files changed, 26 insertions(+), 9 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> >>>>>> index c771aba42015..b4151147d5bd 100644
> >>>>>> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> >>>>>> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> >>>>>> @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> >>>>>> goto job_unlock;
> >>>>>> }
> >>>>>>
> >>>>>> - src = v4l2_m2m_next_src_buf(m2m_ctx);
> >>>>>> - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> >>>>>> - if (!src && !m2m_ctx->out_q_ctx.buffered) {
> >>>>>> - dprintk("No input buffers available\n");
> >>>>>> - goto job_unlock;
> >>>>>> + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
> >>>>>> + src = v4l2_m2m_next_src_buf(m2m_ctx);
> >>>>>> +
> >>>>>> + if (!src && !m2m_ctx->out_q_ctx.buffered) {
> >>>>>> + dprintk("No input buffers available\n");
> >>>>>> + goto job_unlock;
> >>>>>> + }
> >>>>>> }
> >>>>>> - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> >>>>>> - dprintk("No output buffers available\n");
> >>>>>> - goto job_unlock;
> >>>>>> +
> >>>>>> + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
> >>>>>> + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> >>>>>> + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> >>>>>> + dprintk("No output buffers available\n");
> >>>>>> + goto job_unlock;
> >>>>>> + }
> >>>>>> }
> >>>>>
> >>>>> src and dst would be referenced unitialized below if neither of the
> >>>>> above ifs hits...
> >>>>>
> >>>>> Best regards,
> >>>>> Tomasz
> >>>>>
> >>>>>>
> >>>>>> m2m_ctx->new_frame = true;
> >>>>>> @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
> >>>>>> INIT_LIST_HEAD(&q_ctx->rdy_queue);
> >>>>>> q_ctx->num_rdy = 0;
> >>>>>> spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
> >>>>>> + INIT_LIST_HEAD(&q_ctx->hw_queue);
> >>>>>>
> >>>>>> if (m2m_dev->curr_ctx == m2m_ctx) {
> >>>>>> m2m_dev->curr_ctx = NULL;
> >>>>>> @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
> >>>>>>
> >>>>>> INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
> >>>>>> INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
> >>>>>> + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
> >>>>>> + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
> >>>>>> spin_lock_init(&out_q_ctx->rdy_spinlock);
> >>>>>> spin_lock_init(&cap_q_ctx->rdy_spinlock);
> >>>>>>
> >>>>>> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
> >>>>>> index d6c8eb2b5201..2342656e582d 100644
> >>>>>> --- a/include/media/v4l2-mem2mem.h
> >>>>>> +++ b/include/media/v4l2-mem2mem.h
> >>>>>> @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
> >>>>>> * processed
> >>>>>> *
> >>>>>> * @q: pointer to struct &vb2_queue
> >>>>>> - * @rdy_queue: List of V4L2 mem-to-mem queues
> >>>>>> + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
> >>>>>> + * called in struct vb2_ops->buf_queue(), the buffer enqueued
> >>>>>> + * by user would be added to this list.
> >>>>>> * @rdy_spinlock: spin lock to protect the struct usage
> >>>>>> * @num_rdy: number of buffers ready to be processed
> >>>>>> + * @hw_queue: A list for tracking the buffer is occupied by the hardware
> >>>>>> + * (or device's firmware). A buffer could only be in either
> >>>>>> + * this list or @rdy_queue.
> >>>>>> + * Driver may choose not to use this list while uses its own
> >>>>>> + * private data to do this work.
> >>>>>> * @buffered: is the queue buffered?
> >>>>>> *
> >>>>>> * Queue for buffers ready to be processed as soon as this
> >>>>>> @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
> >>>>>> struct list_head rdy_queue;
> >>>>>> spinlock_t rdy_spinlock;
> >>>>>> u8 num_rdy;
> >>>>>> + struct list_head hw_queue;
> >>>>>> bool buffered;
> >>>>>> };
> >>>>>>
> >>>>>> --
> >>>>>> 2.17.1
> >>>>>>
> >>>>
> >>
>
> --
> Hsia-Jun(Randy) Li

2023-07-28 16:34:31

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw

Le vendredi 28 juillet 2023 à 13:43 +0900, Tomasz Figa a écrit :
> On Fri, Jul 28, 2023 at 1:58 AM Nicolas Dufresne <[email protected]> wrote:
> >
> > Le jeudi 27 juillet 2023 à 16:43 +0900, Tomasz Figa a écrit :
> > > On Mon, Jul 17, 2023 at 11:07 PM Nicolas Dufresne <[email protected]> wrote:
> > > >
> > > > Le mercredi 12 juillet 2023 à 09:33 +0000, Tomasz Figa a écrit :
> > > > > On Tue, Jul 04, 2023 at 12:00:38PM +0800, Hsia-Jun Li wrote:
> > > > > > From: "Hsia-Jun(Randy) Li" <[email protected]>
> > > > > >
> > > > > > Many drivers have to create its own buf_struct for a
> > > > > > vb2_queue to track such a state. Also driver has to
> > > > > > iterate over rdy_queue every times to find out a buffer
> > > > > > which is not sent to hardware(or firmware), this new
> > > > > > list just offers the driver a place to store the buffer
> > > > > > that hardware(firmware) has acknowledged.
> > > > > >
> > > > > > One important advance about this list, it doesn't like
> > > > > > rdy_queue which both bottom half of the user calling
> > > > > > could operate it, while the v4l2 worker would as well.
> > > > > > The v4l2 core could only operate this queue when its
> > > > > > v4l2_context is not running, the driver would only
> > > > > > access this new hw_queue in its own worker.
> > > > >
> > > > > Could you describe in what case such a list would be useful for a
> > > > > mem2mem driver?
> > > >
> > > > Today all driver must track buffers that are "owned by the hardware". This is a
> > > > concept dictated by the m2m framework and enforced through the ACTIVE flag. All
> > > > buffers from this list must be mark as done/error/queued after streamoff of the
> > > > respective queue in order to acknowledge that they are no longer in use by the
> > > > HW. Not doing so will warn:
> > > >
> > > > videobuf2_common: driver bug: stop_streaming operation is leaving buf ...
> > > >
> > > > Though, there is no queue to easily iterate them. All driver endup having their
> > > > own queue, or just leaving the buffers in the rdy_queue (which isn't better).
> > > >
> > >
> > > Thanks for the explanation. I see how it could be useful now.
> > >
> > > Although I guess this is a problem specifically for hardware (or
> > > firmware) which can internally queue more than 1 buffer, right?
> > > Otherwise the current buffer could just stay at the top of the
> > > rdy_queue until it's removed by the driver's completion handler,
> > > timeout/error handler or context destruction.
> >
> > Correct, its only an issue when you need to process multiple src buffers before
> > producing a dst buffer. If affects stateful decoder, stateful encoders and
> > deinterlacer as far as I'm aware.
>
> Is it actually necessary to keep those buffers in a list in that case, though?
> I can see that a deinterlacer would indeed need 2 input buffers to
> perform the deinterlacing operation, but those would be just known to
> the driver, since it's running the task currently.
> For a stateful decoder, wouldn't it just consume the bitstream buffer
> (producing something partially decoded to its own internal buffers)
> and return it shortly?

In practice, in stateful decoder, we pace the consumption of input buffers,
otherwise we just endup consuming the entire video into a ring buffer, which
makes operation like seeks quite heavy and cause CPU spikes.

That being said, I'm not sure how useful a list would be for bitstream buffers.
At the moment, in my current work, I'm leaving buffers in the ready queue, and
just tagging the one I have already copied into the ring buffer. And I remove
them from the ready list, when the related data has been decoded. This is when I
actually copy the timestamp from src to dst buffer. So in short, I don't use an
extra list, but use some marking on the buffers though, to remember which one
have already been copied. This is specific to ring buffer based codecs of
course.

The one where a second list helps is for display picture buffers. When a buffer
has been filled, if its in the ready queue, I currently remove that buffer and
put it in a custom list. It will then be removed when/if the firmware decides to
display it. It may also never be displayed, and reused by the firmware. I short,
these are the frame "owned" by the firmware and containing valid pixels. The rdy
list contains free pictures buffers, and the pixels are undefined.

Maybe, and I'm ready to try, I could also leave them in ready queue and opt for
marking and a counter. As I'm using a job_ready() function, its my driver that
decides if a device_run() should be executed or not. So what matters is
basically if there is a free buffer for a new decode operation, and a counter of
filled but not displayed buffer could probably do that.

> The most realistic scenario would be for stateful encoders which could
> keep some input buffers as reference frames for further encoding, but
> then would this patch actually work for them? It would make
> __v4l2_m2m_try_queue never add the context to the job_queue if there
> are some buffers in that hw_queue list.

Encoders have 3 set of buffers, despite m2m having two queues. OUTPUT buffers
are the pictures, there is a set of internal reconstruction buffers, and finally
the CAPTURE buffers are the bitstream. Bitstream buffers are subject to
reordering, so conceptually the firmware holds more then 1, and reconstruction
buffers are completely hidden.

>
> Maybe what I need here are actual patches modifying some existing
> drivers. Randy, would you be able to include that in the next version?
> Thanks.

Agreed.

>
> Best regards,
> Tomasz
>
> >
> > Nicolas
> >
> > >
> > > Best regards,
> > > Tomasz
> > >
> > > > Nicolas
> > > > >
> > > > > >
> > > > > > Signed-off-by: Hsia-Jun(Randy) Li <[email protected]>
> > > > > > ---
> > > > > > drivers/media/v4l2-core/v4l2-mem2mem.c | 25 +++++++++++++++++--------
> > > > > > include/media/v4l2-mem2mem.h | 10 +++++++++-
> > > > > > 2 files changed, 26 insertions(+), 9 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > > > index c771aba42015..b4151147d5bd 100644
> > > > > > --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > > > +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > > > @@ -321,15 +321,21 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> > > > > > goto job_unlock;
> > > > > > }
> > > > > >
> > > > > > - src = v4l2_m2m_next_src_buf(m2m_ctx);
> > > > > > - dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> > > > > > - if (!src && !m2m_ctx->out_q_ctx.buffered) {
> > > > > > - dprintk("No input buffers available\n");
> > > > > > - goto job_unlock;
> > > > > > + if (list_empty(&m2m_ctx->out_q_ctx.hw_queue)) {
> > > > > > + src = v4l2_m2m_next_src_buf(m2m_ctx);
> > > > > > +
> > > > > > + if (!src && !m2m_ctx->out_q_ctx.buffered) {
> > > > > > + dprintk("No input buffers available\n");
> > > > > > + goto job_unlock;
> > > > > > + }
> > > > > > }
> > > > > > - if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> > > > > > - dprintk("No output buffers available\n");
> > > > > > - goto job_unlock;
> > > > > > +
> > > > > > + if (list_empty(&m2m_ctx->cap_q_ctx.hw_queue)) {
> > > > > > + dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> > > > > > + if (!dst && !m2m_ctx->cap_q_ctx.buffered) {
> > > > > > + dprintk("No output buffers available\n");
> > > > > > + goto job_unlock;
> > > > > > + }
> > > > > > }
> > > > >
> > > > > src and dst would be referenced unitialized below if neither of the
> > > > > above ifs hits...
> > > > >
> > > > > Best regards,
> > > > > Tomasz
> > > > >
> > > > > >
> > > > > > m2m_ctx->new_frame = true;
> > > > > > @@ -896,6 +902,7 @@ int v4l2_m2m_streamoff(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
> > > > > > INIT_LIST_HEAD(&q_ctx->rdy_queue);
> > > > > > q_ctx->num_rdy = 0;
> > > > > > spin_unlock_irqrestore(&q_ctx->rdy_spinlock, flags);
> > > > > > + INIT_LIST_HEAD(&q_ctx->hw_queue);
> > > > > >
> > > > > > if (m2m_dev->curr_ctx == m2m_ctx) {
> > > > > > m2m_dev->curr_ctx = NULL;
> > > > > > @@ -1234,6 +1241,8 @@ struct v4l2_m2m_ctx *v4l2_m2m_ctx_init(struct v4l2_m2m_dev *m2m_dev,
> > > > > >
> > > > > > INIT_LIST_HEAD(&out_q_ctx->rdy_queue);
> > > > > > INIT_LIST_HEAD(&cap_q_ctx->rdy_queue);
> > > > > > + INIT_LIST_HEAD(&out_q_ctx->hw_queue);
> > > > > > + INIT_LIST_HEAD(&cap_q_ctx->hw_queue);
> > > > > > spin_lock_init(&out_q_ctx->rdy_spinlock);
> > > > > > spin_lock_init(&cap_q_ctx->rdy_spinlock);
> > > > > >
> > > > > > diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
> > > > > > index d6c8eb2b5201..2342656e582d 100644
> > > > > > --- a/include/media/v4l2-mem2mem.h
> > > > > > +++ b/include/media/v4l2-mem2mem.h
> > > > > > @@ -53,9 +53,16 @@ struct v4l2_m2m_dev;
> > > > > > * processed
> > > > > > *
> > > > > > * @q: pointer to struct &vb2_queue
> > > > > > - * @rdy_queue: List of V4L2 mem-to-mem queues
> > > > > > + * @rdy_queue: List of V4L2 mem-to-mem queues. If v4l2_m2m_buf_queue() is
> > > > > > + * called in struct vb2_ops->buf_queue(), the buffer enqueued
> > > > > > + * by user would be added to this list.
> > > > > > * @rdy_spinlock: spin lock to protect the struct usage
> > > > > > * @num_rdy: number of buffers ready to be processed
> > > > > > + * @hw_queue: A list for tracking the buffer is occupied by the hardware
> > > > > > + * (or device's firmware). A buffer could only be in either
> > > > > > + * this list or @rdy_queue.
> > > > > > + * Driver may choose not to use this list while uses its own
> > > > > > + * private data to do this work.
> > > > > > * @buffered: is the queue buffered?
> > > > > > *
> > > > > > * Queue for buffers ready to be processed as soon as this
> > > > > > @@ -68,6 +75,7 @@ struct v4l2_m2m_queue_ctx {
> > > > > > struct list_head rdy_queue;
> > > > > > spinlock_t rdy_spinlock;
> > > > > > u8 num_rdy;
> > > > > > + struct list_head hw_queue;
> > > > > > bool buffered;
> > > > > > };
> > > > > >
> > > > > > --
> > > > > > 2.17.1
> > > > > >
> > > >
> >


2023-07-28 17:30:40

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw

Le vendredi 28 juillet 2023 à 15:37 +0800, Hsia-Jun Li a écrit :
> > I think this is one reason to migrate to the stateless decoder design.
> >
> I didn't know such plan here. I don't think the current stateless API
> could export the reconstruction buffers for encoder or post-processing
> buffer for decoder to us.

Someone suggested introduce auxiliary queues in our meeting in Lyon a while ago,
but I bet everyone got too busy with finalizing APIs, fixing fluster tests etc.
The suggestion felt like it would be possible to add it after the fact. This was
also being discussed in the context of supporting multi-scalers (standalone our
inline with the codec, like VC8000D+). It could also cover for primary and
secondary buffers, along with encoder primary, and reconstruction buffers, but
also auxiliary reference data. This would also be needed to properly support
Vulkan Video fwiw, and could also help with a transition to DMABuf Heaps and
memory accounting.

I've also had corridor discussion around having multi-instance media constroller
devices. It wasn't clear how to bind the media instance to the video node
instances, but assuming there is a way, it would be a tad more flexible (but
massively more complex).

Nicolas


2023-08-03 17:11:41

by Randy Li

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw


On 2023/7/29 00:19, Nicolas Dufresne wrote:
> Le vendredi 28 juillet 2023 à 15:37 +0800, Hsia-Jun Li a écrit :
>>> I think this is one reason to migrate to the stateless decoder design.
>>>
>> I didn't know such plan here. I don't think the current stateless API
>> could export the reconstruction buffers for encoder or post-processing
>> buffer for decoder to us.
> Someone suggested introduce auxiliary queues in our meeting in Lyon a while ago,
> but I bet everyone got too busy with finalizing APIs, fixing fluster tests etc.
> The suggestion felt like it would be possible to add it after the fact. This was
> also being discussed in the context of supporting multi-scalers (standalone our
> inline with the codec, like VC8000D+). It could also cover for primary and
> secondary buffers, along with encoder primary, and reconstruction buffers, but
> also auxiliary reference data. This would also be needed to properly support
> Vulkan Video fwiw, and could also help with a transition to DMABuf Heaps and
> memory accounting.
>
> I've also had corridor discussion around having multi-instance media constroller
> devices. It wasn't clear how to bind the media instance to the video node
> instances, but assuming there is a way, it would be a tad more flexible (but
> massively more complex).

I think we should answer to those questions before we decided what we want:

A. Should a queue only has the buffers for the same format and sizes?

B. How does an application handle those drivers requests additional queue?

C. How to sync multiple buffers in a v4l2 job.

I asked the same question A when I discuss this with media: v4l2: Add
DELETE_BUF ioctl.

If we would not add extra queue here, how does the driver filter out the
most proper buffer for the current hardware output(CAPTURE) buffer.

If we have multiple queues in a direction, how to make driver select
between them?


The question B is the debt we made, some applications have gotten used
to the case they can't control the lifetime of reconstruction buffer in
encoding or turn the post-processing off when the display pipeline could
support tile format output.

We know allow the userspace could decide where we allocate those
buffers, but could the userspace decided not to handle their lifetime?


The question C may more be more related to the complex case like camera
senor and ISP. With this auxiliary queue, multiple video nodes are not
necessary anymore.

But ISP may not request all the data finish its path, ex. the ISP are
not satisfied with the focus point that its senor detected or the light
level, it may just drop the image data then shot again.

Also the poll event could only tell us which direction could do the
dequeue/enqueue work, it can't tell us which queue is ready. Should we
introduce something likes sync point(fence fd) here?


We may lead way to V4L3 as Tomasz suggested although I don't want to
take the risk to be. If we would make a V4L3 like thing, we have better
to decide it correct and could handle any future problem.

>
> Nicolas
>

2023-08-04 22:42:50

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: [PATCH 2/2] media: v4l2-mem2mem: add a list for buf used by hw

Hi Randy,

Le vendredi 04 août 2023 à 00:16 +0800, Randy Li a écrit :
> On 2023/7/29 00:19, Nicolas Dufresne wrote:
> > Le vendredi 28 juillet 2023 à 15:37 +0800, Hsia-Jun Li a écrit :
> > > > I think this is one reason to migrate to the stateless decoder design.
> > > >
> > > I didn't know such plan here. I don't think the current stateless API
> > > could export the reconstruction buffers for encoder or post-processing
> > > buffer for decoder to us.
> > Someone suggested introduce auxiliary queues in our meeting in Lyon a while ago,
> > but I bet everyone got too busy with finalizing APIs, fixing fluster tests etc.
> > The suggestion felt like it would be possible to add it after the fact. This was
> > also being discussed in the context of supporting multi-scalers (standalone our
> > inline with the codec, like VC8000D+). It could also cover for primary and
> > secondary buffers, along with encoder primary, and reconstruction buffers, but
> > also auxiliary reference data. This would also be needed to properly support
> > Vulkan Video fwiw, and could also help with a transition to DMABuf Heaps and
> > memory accounting.
> >
> > I've also had corridor discussion around having multi-instance media constroller
> > devices. It wasn't clear how to bind the media instance to the video node
> > instances, but assuming there is a way, it would be a tad more flexible (but
> > massively more complex).
>
> I think we should answer to those questions before we decided what we want:
>
> A. Should a queue only has the buffers for the same format and sizes?

I initially started answering these, but ended up concluding this is more some
sort of personal notes. I believe the discussion is diverging. Remember that in
an existing API, one cannot fix all theoretical issues in one go. In order to
move forward, you have to tackle very specific use case and find a way forward.
If you are to break compatibility as much as you suggest, you'd rather look into
writing a new API.

[...]

2023-08-20 11:56:17

by Hans Verkuil

[permalink] [raw]
Subject: Re: [PATCH 0/2] Improve V4L2 M2M job scheduler

On 04/07/2023 06:00, Hsia-Jun Li wrote:
> From: "Hsia-Jun(Randy) Li" <[email protected]>
>
> The first patch is an old patch, I resend it again.
> I want to make the work thats parses the bitstream
> to extract the sequence information or video resolution
> as a part of V4L2 schedule. Such a work would also
> consume the device's resources likes remote CPU
> time.
>
> Although reuse a flag which no current driver may
> not be a good idea. I could add a new flag for that
> if people like that.
>
> The second is a patch offering a generic solution
> for tracking buffers which have been pushed to
> hardware(or firmware). It didn't record which buffer
> that hardware(firmware) still holds for future
> decoding(likes the reference buffer), while it
> has been sent to the user(dequeue). We may need
> a flag for this work.

I am dropping this series from patchwork: clearly this generated
a lot of discussion, and I think that needs to come to a conclusion.

BTW, I believe that at minimum the codec-specific parts in
v4l2-mem2mem.c should be split off in their own source (v4l2-m2m-codec.c?).

I agree with Tomasz that mem2mem.c was originally for simple m2m devices,
and adding all sorts of codec specific code to that source doesn't make
it easier to follow.

Regards,

Hans

>
> Hsia-Jun(Randy) Li (1):
> media: v4l2-mem2mem: add a list for buf used by hw
>
> Randy Li (1):
> media: v4l2-mem2mem: allow device run without buf
>
> drivers/media/v4l2-core/v4l2-mem2mem.c | 30 +++++++++++++++++---------
> include/media/v4l2-mem2mem.h | 10 ++++++++-
> 2 files changed, 29 insertions(+), 11 deletions(-)
>