2023-06-22 13:34:38

by Benjamin Gaignard

Subject: [PATCH v3 00/11] Add DELETE_BUF ioctl

Unlike a resolution change on keyframes, a dynamic resolution change
on inter frames doesn't allow a stream off/on sequence, because all
previous references must be kept alive to decode inter frames.
This constraint causes two main problems:
- more memory consumption.
- more buffers in use.
To solve these issues, this series introduces the DELETE_BUF ioctl and
removes the 32-buffer limit per queue.
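
For illustration only, the two problems and the fix can be modelled with a toy per-queue buffer store. It grows past 32 entries and can free a single buffer by index, which is the operation DELETE_BUF exposes. The toy_* names are hypothetical and this is not the vb2 API. A plain growable array keeps the sketch short, whereas this series stores buffers with the kernel Xarray API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy model, not the vb2 API. */
struct toy_buf {
	size_t size;            /* bytes backing this buffer */
};

struct toy_queue {
	struct toy_buf **slots; /* slot i holds buffer index i, or NULL */
	unsigned int nslots;    /* allocated slot count, grows on demand */
};

/* Create a buffer of @size bytes in the first free slot, growing the
 * slot array (no fixed 32-entry cap) when none is free.
 * Returns the buffer index, or -1 on allocation failure. */
static int toy_create_buf(struct toy_queue *q, size_t size)
{
	unsigned int i;

	for (i = 0; i < q->nslots; i++)
		if (!q->slots[i])
			break;

	if (i == q->nslots) {
		unsigned int n = q->nslots ? q->nslots * 2 : 4;
		struct toy_buf **s = realloc(q->slots, n * sizeof(*s));

		if (!s)
			return -1;
		for (unsigned int j = q->nslots; j < n; j++)
			s[j] = NULL;
		q->slots = s;
		q->nslots = n;
	}

	q->slots[i] = malloc(sizeof(*q->slots[i]));
	if (!q->slots[i])
		return -1;
	q->slots[i]->size = size;
	return (int)i;
}

/* Free one buffer by index, leaving all other buffers (for example
 * references still needed by inter frames) untouched. */
static bool toy_delete_buf(struct toy_queue *q, unsigned int index)
{
	if (index >= q->nslots || !q->slots[index])
		return false;
	free(q->slots[index]);
	q->slots[index] = NULL;
	return true;
}

/* Total memory currently held by the queue. */
static size_t toy_queue_bytes(const struct toy_queue *q)
{
	size_t total = 0;

	for (unsigned int i = 0; i < q->nslots; i++)
		if (q->slots[i])
			total += q->slots[i]->size;
	return total;
}
```

Deleting one index frees only that buffer. Old-resolution buffers can therefore be released while the references still needed for inter frames stay alive, and a later creation reuses the freed index.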

VP9 conformance tests using fluster give a score of 210/305.
The 25 resize inter tests (vp90-2-21-resize_inter_* files) pass
but require the postprocessor.

Kernel branch is available here:
https://gitlab.collabora.com/benjamin.gaignard/for-upstream/-/commits/remove_vb2_queue_limit_v3

GStreamer branch that uses the DELETE_BUF ioctl to test dynamic resolution
change is here:
https://gitlab.freedesktop.org/benjamin.gaignard1/gstreamer/-/commits/VP9_drc

changes in version 3:
- Use the Xarray API to store allocated video buffers.
- No module parameter to limit the number of buffers per queue.
- Use Xarray inside the Verisilicon driver to store postprocessor buffers
and remove the VB2_MAX_FRAME limit.
- Allow the Verisilicon driver to change resolution while streaming.
- Various fixes to the Verisilicon VP9 code to improve the fluster score.

changes in version 2:
- Use a dynamic array rather than a list to keep track of allocated buffers.
Do not use the IDR interface because it is marked as deprecated in the
kernel documentation.
- Add a module parameter to limit the number of buffers per queue.
- Add DELETE_BUF ioctl and m2m helpers.

Benjamin Gaignard (11):
media: videobuf2: Access vb2_queue bufs array through helper functions
media: videobuf2: Use Xarray instead of static buffers array
media: videobuf2: Remove VB2_MAX_FRAME limit on buffer storage
media: videobuf2: Stop define VB2_MAX_FRAME as global
media: verisilicon: Refactor postprocessor to store more buffers
media: verisilicon: Store chroma and motion vectors offset
media: verisilicon: vp9: Use destination buffer height to compute
chroma offset
media: verisilicon: postproc: Fix down scale test
media: verisilicon: vp9: Allow to change resolution while streaming
media: v4l2: Add DELETE_BUF ioctl
media: v4l2: Add mem2mem helpers for DELETE_BUF ioctl

.../userspace-api/media/v4l/user-func.rst | 1 +
.../media/v4l/vidioc-delete-buf.rst | 51 ++++
.../media/common/videobuf2/videobuf2-core.c | 275 ++++++++++++++----
.../media/common/videobuf2/videobuf2-v4l2.c | 34 ++-
drivers/media/platform/amphion/vdec.c | 1 +
drivers/media/platform/amphion/vpu_dbg.c | 22 +-
.../platform/mediatek/jpeg/mtk_jpeg_core.c | 6 +-
.../vcodec/vdec/vdec_vp9_req_lat_if.c | 4 +-
drivers/media/platform/qcom/venus/hfi.h | 2 +
drivers/media/platform/st/sti/hva/hva-v4l2.c | 4 +
drivers/media/platform/verisilicon/hantro.h | 8 +-
.../platform/verisilicon/hantro_g2_vp9_dec.c | 10 +-
.../media/platform/verisilicon/hantro_hw.h | 4 +-
.../platform/verisilicon/hantro_postproc.c | 114 +++++---
.../media/platform/verisilicon/hantro_v4l2.c | 37 +--
drivers/media/test-drivers/vim2m.c | 1 +
drivers/media/test-drivers/visl/visl-dec.c | 28 +-
drivers/media/v4l2-core/v4l2-dev.c | 1 +
drivers/media/v4l2-core/v4l2-ioctl.c | 10 +
drivers/media/v4l2-core/v4l2-mem2mem.c | 20 ++
.../staging/media/atomisp/pci/atomisp_ioctl.c | 2 +-
drivers/staging/media/ipu3/ipu3-v4l2.c | 2 +
include/media/v4l2-ioctl.h | 4 +
include/media/v4l2-mem2mem.h | 12 +
include/media/videobuf2-core.h | 16 +-
include/media/videobuf2-v4l2.h | 15 +-
include/uapi/linux/videodev2.h | 2 +
27 files changed, 523 insertions(+), 163 deletions(-)
create mode 100644 Documentation/userspace-api/media/v4l/vidioc-delete-buf.rst

--
2.39.2



2023-06-22 13:38:02

by Benjamin Gaignard

Subject: [PATCH v3 06/11] media: verisilicon: Store chroma and motion vectors offset

Store the computed chroma and motion vector offsets, because they
depend on the width and height values, which change when the resolution
changes.

Signed-off-by: Benjamin Gaignard <[email protected]>
---
drivers/media/platform/verisilicon/hantro.h | 2 ++
drivers/media/platform/verisilicon/hantro_g2_vp9_dec.c | 6 ++++--
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/verisilicon/hantro.h b/drivers/media/platform/verisilicon/hantro.h
index bd0dca11b90a..16c7e9bafde3 100644
--- a/drivers/media/platform/verisilicon/hantro.h
+++ b/drivers/media/platform/verisilicon/hantro.h
@@ -320,6 +320,8 @@ struct hantro_vp9_decoded_buffer_info {
/* Info needed when the decoded frame serves as a reference frame. */
unsigned short width;
unsigned short height;
+ size_t chroma_offset;
+ size_t mv_offset;
u32 bit_depth : 4;
};

diff --git a/drivers/media/platform/verisilicon/hantro_g2_vp9_dec.c b/drivers/media/platform/verisilicon/hantro_g2_vp9_dec.c
index 6fc4b555517f..6db1c32fce4d 100644
--- a/drivers/media/platform/verisilicon/hantro_g2_vp9_dec.c
+++ b/drivers/media/platform/verisilicon/hantro_g2_vp9_dec.c
@@ -158,9 +158,11 @@ static void config_output(struct hantro_ctx *ctx,

chroma_addr = luma_addr + chroma_offset(ctx, dec_params);
hantro_write_addr(ctx->dev, G2_OUT_CHROMA_ADDR, chroma_addr);
+ dst->vp9.chroma_offset = chroma_offset(ctx, dec_params);

mv_addr = luma_addr + mv_offset(ctx, dec_params);
hantro_write_addr(ctx->dev, G2_OUT_MV_ADDR, mv_addr);
+ dst->vp9.mv_offset = mv_offset(ctx, dec_params);
}

struct hantro_vp9_ref_reg {
@@ -195,7 +197,7 @@ static void config_ref(struct hantro_ctx *ctx,
luma_addr = hantro_get_dec_buf_addr(ctx, &buf->base.vb.vb2_buf);
hantro_write_addr(ctx->dev, ref_reg->y_base, luma_addr);

- chroma_addr = luma_addr + chroma_offset(ctx, dec_params);
+ chroma_addr = luma_addr + buf->vp9.chroma_offset;
hantro_write_addr(ctx->dev, ref_reg->c_base, chroma_addr);
}

@@ -238,7 +240,7 @@ static void config_ref_registers(struct hantro_ctx *ctx,
config_ref(ctx, dst, &ref_regs[2], dec_params, dec_params->alt_frame_ts);

mv_addr = hantro_get_dec_buf_addr(ctx, &mv_ref->base.vb.vb2_buf) +
- mv_offset(ctx, dec_params);
+ mv_ref->vp9.mv_offset;
hantro_write_addr(ctx->dev, G2_REF_MV_ADDR(0), mv_addr);

hantro_reg_write(ctx->dev, &vp9_last_sign_bias,
--
2.39.2
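
The reasoning in this commit message can be illustrated with a small model. It assumes a simplified buffer layout (a luma plane, then a half-size 4:2:0 chroma plane, then motion vectors) and uses hypothetical toy_* names. It is not the hantro driver's actual offset computation, which may include alignment or bit-depth handling not shown here. The point is that a reference frame's offsets are fixed when that frame is decoded. After a resolution change they must therefore be read back from the buffer, not recomputed from the new frame size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified layout: luma plane first, chroma plane after it. */
static size_t toy_chroma_offset(unsigned int w, unsigned int h,
				unsigned int bytes_per_pixel)
{
	return (size_t)w * h * bytes_per_pixel;
}

/* Motion vectors assumed to follow a 4:2:0 chroma plane (half the luma size). */
static size_t toy_mv_offset(unsigned int w, unsigned int h,
			    unsigned int bytes_per_pixel)
{
	size_t luma = toy_chroma_offset(w, h, bytes_per_pixel);

	return luma + luma / 2;
}

/* Per-buffer info, mirroring the two fields the patch adds. */
struct toy_decoded_buf {
	unsigned short width, height;
	size_t chroma_offset;
	size_t mv_offset;
};

/* At decode time, compute the offsets from the frame size being decoded
 * (8-bit content assumed) and remember them in the destination buffer. */
static void toy_decode_into(struct toy_decoded_buf *dst,
			    unsigned short w, unsigned short h)
{
	dst->width = w;
	dst->height = h;
	dst->chroma_offset = toy_chroma_offset(w, h, 1);
	dst->mv_offset = toy_mv_offset(w, h, 1);
}

/* When the buffer is later used as a reference, read the stored offset
 * instead of recomputing it from the current (possibly new) frame size. */
static size_t toy_ref_chroma_addr(size_t luma_addr,
				  const struct toy_decoded_buf *ref)
{
	return luma_addr + ref->chroma_offset;
}
```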


2023-06-27 07:58:57

by Hsia-Jun Li

Subject: Re: [PATCH v3 00/11] Add DELETE_BUF ioctl


On 6/22/23 21:13, Benjamin Gaignard wrote:
> Unlike when resolution change on keyframes, dynamic resolution change
> on inter frames doesn't allow to do a stream off/on sequence because
> it is need to keep all previous references alive to decode inter frames.
> This constraint have two main problems:
> - more memory consumption.
> - more buffers in use.
> To solve these issue this series introduce DELETE_BUF ioctl and remove
> the 32 buffers limit per queue.

I know that VIDIOC_CREATE_BUFS allows creating a buffer with a different
size than the one the driver suggests in G_FMT.

But vb2_ops->queue_setup() can check whether the sizeimage values meet
its minimum requirement for the current format.

This leads to a problem: the driver needs to check the buffer size
before it makes the hardware use a buffer from the rdy_queue.


Consider such a case: we know an AV1 sequence (VP9 and VP8 don't have a
sequence header) would need a much larger buffer for the alternative
reference frame.

If one special buffer is created for the altref, the driver needs the
hardware to pick it from the rdy_queue first, or it would be a waste
to use it as a regular frame buffer.

Also, missing such a step would not solve the memory allocation problem.
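
The concern above can be sketched under stated assumptions. The toy_* names are hypothetical, a plain array stands in for the rdy_queue, and each buffer's size is known. The sketch shows the size check a driver would need before handing a queued buffer to the hardware. It also shows a best-fit choice as one possible way to avoid spending a large altref-sized buffer on a regular frame. Best-fit selection is an illustrative choice, not a mechanism proposed in this thread or an existing v4l2-mem2mem helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a driver's ready queue: buffers queued by
 * userspace, possibly created by VIDIOC_CREATE_BUFS with differing sizes. */
struct toy_ready_buf {
	unsigned int index;
	size_t size;
	int used;               /* already taken for a job */
};

/* First fit: return the first unused buffer at least @needed bytes long,
 * or NULL. Without this check a too-small buffer could reach the hardware. */
static struct toy_ready_buf *toy_pick_buf(struct toy_ready_buf *bufs,
					  size_t n, size_t needed)
{
	for (size_t i = 0; i < n; i++)
		if (!bufs[i].used && bufs[i].size >= needed)
			return &bufs[i];
	return NULL;
}

/* Best fit: prefer the smallest buffer that is large enough, so a big
 * buffer meant for an alternative reference frame is not consumed by a
 * regular frame while a smaller one would do. */
static struct toy_ready_buf *toy_pick_best_fit(struct toy_ready_buf *bufs,
					       size_t n, size_t needed)
{
	struct toy_ready_buf *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (bufs[i].used || bufs[i].size < needed)
			continue;
		if (!best || bufs[i].size < best->size)
			best = &bufs[i];
	}
	return best;
}
```

With a large buffer at the head of the queue, first fit hands it to a regular frame, while best fit leaves it available for the frame that actually needs it.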

--
Hsia-Jun(Randy) Li