2018-11-24 08:33:39

by Paul Kocialkowski

Subject: [PATCH v2 0/2] HEVC/H.265 stateless support for V4L2 and Cedrus

This series introduces the bits required to support HEVC/H.265 in both
the V4L2 framework and the Cedrus VPU driver for Allwinner devices.

A specific pixel format is introduced for the HEVC slice format and
controls are provided to pass the bitstream metadata to the decoder.
Some bitstream extensions are intentionally not supported at this point.

Since this is the first proposal for stateless HEVC/H.265 support in
V4L2, reviews and comments about the controls definitions are
particularly welcome.

On the Cedrus side, the H.265 implementation covers frame pictures
with both uni-directional and bi-directional prediction modes (P/B
slices). Field pictures (interlaced), scaling lists and 10-bit output
are not supported at this point.
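
As a reading aid for the driver patch, here is a minimal standalone
sketch of how cedrus_h265_setup() sizes the per-buffer MV col area. The
helper name and the two macro names are hypothetical; the 160-byte
per-CTB unit and the 1 KiB padding mirror
CEDRUS_H265_MV_COL_BUF_UNIT_CTB_SIZE and SZ_1K from the patch.

```c
#include <stddef.h>

/* Per-CTB unit size, mirroring CEDRUS_H265_MV_COL_BUF_UNIT_CTB_SIZE. */
#define MV_COL_BUF_UNIT_CTB_SIZE 160
/* Padding added so the address can be aligned (SZ_1K in the driver). */
#define MV_COL_BUF_ALIGN_PAD 1024

/* Same rounding as the kernel's DIV_ROUND_UP() helper. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: bytes of MV col buffer needed for one picture. */
static size_t mv_col_buf_unit_size(unsigned int width, unsigned int height,
				   unsigned int log2_min_cb_size_minus3,
				   unsigned int log2_diff_max_min_cb_size)
{
	/* The CTB size is the largest luma coding block size in the SPS. */
	unsigned int log2_ctb_size = log2_min_cb_size_minus3 + 3 +
				     log2_diff_max_min_cb_size;
	unsigned int ctb_size = 1U << log2_ctb_size;

	/* One unit per CTB covering the picture, plus alignment padding. */
	return (size_t)DIV_ROUND_UP(width, ctb_size) *
	       DIV_ROUND_UP(height, ctb_size) *
	       MV_COL_BUF_UNIT_CTB_SIZE + MV_COL_BUF_ALIGN_PAD;
}
```

For a 1920x1080 stream with 64x64 CTBs, this gives
30 * 17 * 160 + 1024 = 82624 bytes per capture buffer; the driver then
multiplies that by the number of capture buffers.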

This series is based upon the following series:
* media: cedrus: Add H264 decoding support
* vb2/cedrus: add tag support

Changes since v1:
* Added an H.265 capability to whitelist relevant platforms;
* Switched over to tags instead of buffer indices in the DPB;
* Declared variables in their reduced scope as suggested;
* Added the H.265/HEVC spec to the biblio;
* Used in-doc references to the spec and the required APIs;
* Removed debugging leftovers.

Cheers!

Paul Kocialkowski (2):
media: v4l: Add definitions for the HEVC slice format and controls
media: cedrus: Add HEVC/H.265 decoding support

Documentation/media/uapi/v4l/biblio.rst | 9 +
.../media/uapi/v4l/extended-controls.rst | 417 ++++++++++++++
.../media/uapi/v4l/pixfmt-compressed.rst | 15 +
.../media/uapi/v4l/vidioc-queryctrl.rst | 18 +
.../media/videodev2.h.rst.exceptions | 3 +
drivers/media/v4l2-core/v4l2-ctrls.c | 26 +
drivers/media/v4l2-core/v4l2-ioctl.c | 1 +
drivers/staging/media/sunxi/cedrus/Makefile | 2 +-
drivers/staging/media/sunxi/cedrus/cedrus.c | 22 +-
drivers/staging/media/sunxi/cedrus/cedrus.h | 18 +
.../staging/media/sunxi/cedrus/cedrus_dec.c | 9 +
.../staging/media/sunxi/cedrus/cedrus_h265.c | 543 ++++++++++++++++++
.../staging/media/sunxi/cedrus/cedrus_hw.c | 4 +
.../staging/media/sunxi/cedrus/cedrus_regs.h | 290 ++++++++++
.../staging/media/sunxi/cedrus/cedrus_video.c | 10 +
include/media/v4l2-ctrls.h | 6 +
include/uapi/linux/v4l2-controls.h | 155 +++++
include/uapi/linux/v4l2-controls.h.rej | 187 ------
include/uapi/linux/videodev2.h | 7 +
19 files changed, 1553 insertions(+), 189 deletions(-)
create mode 100644 drivers/staging/media/sunxi/cedrus/cedrus_h265.c
delete mode 100644 include/uapi/linux/v4l2-controls.h.rej

--
2.19.1



2018-11-24 08:33:47

by Paul Kocialkowski

Subject: [PATCH v2 2/2] media: cedrus: Add HEVC/H.265 decoding support

This introduces support for HEVC/H.265 to the Cedrus VPU driver, with
both uni-directional and bi-directional prediction modes supported.

Field-coded (interlaced) pictures, custom quantization matrices and
10-bit output are not supported at this point.
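
For reference, the reference picture list layout written to SRAM by
cedrus_h265_ref_pic_list_write() can be sketched in standalone C as
below: four 8-bit entries are packed per 32-bit word, and entries that
refer to long-term references get a flag bit. The helper name, the
boolean array and the 0x80 flag value are assumptions made for this
illustration; the driver uses VE_DEC_H265_SRAM_REF_PIC_LIST_LT_REF and
checks the DPB entry's rps field instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value, standing in for VE_DEC_H265_SRAM_REF_PIC_LIST_LT_REF. */
#define REF_PIC_LIST_LT_REF 0x80u

/*
 * Hypothetical helper: pack count list entries into 32-bit words, four
 * entries per word, lowest byte first. Returns the number of words
 * written to words[].
 */
static size_t pack_ref_pic_list(const uint8_t *list, size_t count,
				const bool *dpb_is_long_term,
				uint32_t *words)
{
	size_t i, num_words = 0;
	uint32_t reg = 0;

	for (i = 0; i < count; i++) {
		unsigned int shift = (i % 4) * 8;
		uint32_t value = list[i];

		/* Flag entries whose DPB slot holds a long-term reference. */
		if (dpb_is_long_term[list[i]])
			value |= REF_PIC_LIST_LT_REF;

		reg |= value << shift;

		/* Flush on a full word or on the last entry. */
		if ((i % 4) == 3 || i == count - 1) {
			words[num_words++] = reg;
			reg = 0;
		}
	}

	return num_words;
}
```

With five entries 0..4 and DPB slot 2 marked long-term, this produces
two words: 0x03820100 followed by 0x00000004.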

Signed-off-by: Paul Kocialkowski <[email protected]>
---
drivers/staging/media/sunxi/cedrus/Makefile | 2 +-
drivers/staging/media/sunxi/cedrus/cedrus.c | 22 +-
drivers/staging/media/sunxi/cedrus/cedrus.h | 18 +
.../staging/media/sunxi/cedrus/cedrus_dec.c | 9 +
.../staging/media/sunxi/cedrus/cedrus_h265.c | 543 ++++++++++++++++++
.../staging/media/sunxi/cedrus/cedrus_hw.c | 4 +
.../staging/media/sunxi/cedrus/cedrus_regs.h | 290 ++++++++++
.../staging/media/sunxi/cedrus/cedrus_video.c | 10 +
8 files changed, 896 insertions(+), 2 deletions(-)
create mode 100644 drivers/staging/media/sunxi/cedrus/cedrus_h265.c

diff --git a/drivers/staging/media/sunxi/cedrus/Makefile b/drivers/staging/media/sunxi/cedrus/Makefile
index aaf141fc58b6..186cb6d01b67 100644
--- a/drivers/staging/media/sunxi/cedrus/Makefile
+++ b/drivers/staging/media/sunxi/cedrus/Makefile
@@ -1,4 +1,4 @@
obj-$(CONFIG_VIDEO_SUNXI_CEDRUS) += sunxi-cedrus.o

sunxi-cedrus-y = cedrus.o cedrus_video.o cedrus_hw.o cedrus_dec.o \
- cedrus_mpeg2.o cedrus_h264.o
+ cedrus_mpeg2.o cedrus_h264.o cedrus_h265.o
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.c b/drivers/staging/media/sunxi/cedrus/cedrus.c
index 923aa7bd57f4..e1e610dbe804 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus.c
@@ -64,6 +64,24 @@ static const struct cedrus_control cedrus_controls[] = {
.codec = CEDRUS_CODEC_H264,
.required = true,
},
+ {
+ .id = V4L2_CID_MPEG_VIDEO_HEVC_SPS,
+ .elem_size = sizeof(struct v4l2_ctrl_hevc_sps),
+ .codec = CEDRUS_CODEC_H265,
+ .required = true,
+ },
+ {
+ .id = V4L2_CID_MPEG_VIDEO_HEVC_PPS,
+ .elem_size = sizeof(struct v4l2_ctrl_hevc_pps),
+ .codec = CEDRUS_CODEC_H265,
+ .required = true,
+ },
+ {
+ .id = V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS,
+ .elem_size = sizeof(struct v4l2_ctrl_hevc_slice_params),
+ .codec = CEDRUS_CODEC_H265,
+ .required = true,
+ },
};

#define CEDRUS_CONTROLS_COUNT ARRAY_SIZE(cedrus_controls)
@@ -304,6 +322,7 @@ static int cedrus_probe(struct platform_device *pdev)

dev->dec_ops[CEDRUS_CODEC_MPEG2] = &cedrus_dec_ops_mpeg2;
dev->dec_ops[CEDRUS_CODEC_H264] = &cedrus_dec_ops_h264;
+ dev->dec_ops[CEDRUS_CODEC_H265] = &cedrus_dec_ops_h265;

mutex_init(&dev->dev_mutex);

@@ -411,7 +430,8 @@ static const struct cedrus_variant sun8i_a33_cedrus_variant = {
};

static const struct cedrus_variant sun8i_h3_cedrus_variant = {
- .capabilities = CEDRUS_CAPABILITY_UNTILED,
+ .capabilities = CEDRUS_CAPABILITY_UNTILED |
+ CEDRUS_CAPABILITY_H265_DEC,
};

static const struct of_device_id cedrus_dt_match[] = {
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h b/drivers/staging/media/sunxi/cedrus/cedrus.h
index fe7e06267d92..1895108222c8 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus.h
+++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
@@ -27,10 +27,12 @@
#define CEDRUS_NAME "cedrus"

#define CEDRUS_CAPABILITY_UNTILED BIT(0)
+#define CEDRUS_CAPABILITY_H265_DEC BIT(1)

enum cedrus_codec {
CEDRUS_CODEC_MPEG2,
CEDRUS_CODEC_H264,
+ CEDRUS_CODEC_H265,
CEDRUS_CODEC_LAST,
};

@@ -65,6 +67,12 @@ struct cedrus_mpeg2_run {
const struct v4l2_ctrl_mpeg2_quantization *quantization;
};

+struct cedrus_h265_run {
+ const struct v4l2_ctrl_hevc_sps *sps;
+ const struct v4l2_ctrl_hevc_pps *pps;
+ const struct v4l2_ctrl_hevc_slice_params *slice_params;
+};
+
struct cedrus_run {
struct vb2_v4l2_buffer *src;
struct vb2_v4l2_buffer *dst;
@@ -72,6 +80,7 @@ struct cedrus_run {
union {
struct cedrus_h264_run h264;
struct cedrus_mpeg2_run mpeg2;
+ struct cedrus_h265_run h265;
};
};

@@ -108,6 +117,14 @@ struct cedrus_ctx {
void *pic_info_buf;
dma_addr_t pic_info_buf_dma;
} h264;
+ struct {
+ void *mv_col_buf;
+ dma_addr_t mv_col_buf_addr;
+ ssize_t mv_col_buf_size;
+ ssize_t mv_col_buf_unit_size;
+ void *neighbor_info_buf;
+ dma_addr_t neighbor_info_buf_addr;
+ } h265;
} codec;
};

@@ -151,6 +168,7 @@ struct cedrus_dev {

extern struct cedrus_dec_ops cedrus_dec_ops_mpeg2;
extern struct cedrus_dec_ops cedrus_dec_ops_h264;
+extern struct cedrus_dec_ops cedrus_dec_ops_h265;

static inline void cedrus_write(struct cedrus_dev *dev, u32 reg, u32 val)
{
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
index 2cd3a995e82e..d5ddd3938581 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
@@ -57,6 +57,15 @@ void cedrus_device_run(void *priv)
V4L2_CID_MPEG_VIDEO_H264_SPS);
break;

+ case V4L2_PIX_FMT_HEVC_SLICE:
+ run.h265.sps = cedrus_find_control_data(ctx,
+ V4L2_CID_MPEG_VIDEO_HEVC_SPS);
+ run.h265.pps = cedrus_find_control_data(ctx,
+ V4L2_CID_MPEG_VIDEO_HEVC_PPS);
+ run.h265.slice_params = cedrus_find_control_data(ctx,
+ V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS);
+ break;
+
default:
break;
}
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h265.c b/drivers/staging/media/sunxi/cedrus/cedrus_h265.c
new file mode 100644
index 000000000000..20ed0b5386a7
--- /dev/null
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_h265.c
@@ -0,0 +1,543 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Cedrus VPU driver
+ *
+ * Copyright (C) 2013 Jens Kuske <[email protected]>
+ * Copyright (C) 2018 Paul Kocialkowski <[email protected]>
+ * Copyright (C) 2018 Bootlin
+ */
+
+#include <linux/types.h>
+
+#include <media/videobuf2-dma-contig.h>
+
+#include "cedrus.h"
+#include "cedrus_hw.h"
+#include "cedrus_regs.h"
+
+/*
+ * Note: Neighbor info buffer size is apparently doubled for H6, which may be
+ * related to 10 bit H265 support.
+ */
+#define CEDRUS_H265_NEIGHBOR_INFO_BUF_SIZE (397 * SZ_1K)
+#define CEDRUS_H265_ENTRY_POINTS_BUF_SIZE (4 * SZ_1K)
+#define CEDRUS_H265_MV_COL_BUF_UNIT_CTB_SIZE 160
+
+struct cedrus_h265_sram_frame_info {
+ __le32 top_pic_order_cnt;
+ __le32 bottom_pic_order_cnt;
+ __le32 top_mv_col_buf_addr;
+ __le32 bottom_mv_col_buf_addr;
+ __le32 luma_addr;
+ __le32 chroma_addr;
+} __packed;
+
+struct cedrus_h265_sram_pred_weight {
+ __s8 delta_weight;
+ __s8 offset;
+} __packed;
+
+static enum cedrus_irq_status cedrus_h265_irq_status(struct cedrus_ctx *ctx)
+{
+ struct cedrus_dev *dev = ctx->dev;
+ u32 reg;
+
+ reg = cedrus_read(dev, VE_DEC_H265_STATUS);
+ reg &= VE_DEC_H265_STATUS_CHECK_MASK;
+
+ if (reg & VE_DEC_H265_STATUS_CHECK_ERROR ||
+ !(reg & VE_DEC_H265_STATUS_SUCCESS))
+ return CEDRUS_IRQ_ERROR;
+
+ return CEDRUS_IRQ_OK;
+}
+
+static void cedrus_h265_irq_clear(struct cedrus_ctx *ctx)
+{
+ struct cedrus_dev *dev = ctx->dev;
+
+ cedrus_write(dev, VE_DEC_H265_STATUS, VE_DEC_H265_STATUS_CHECK_MASK);
+}
+
+static void cedrus_h265_irq_disable(struct cedrus_ctx *ctx)
+{
+ struct cedrus_dev *dev = ctx->dev;
+ u32 reg = cedrus_read(dev, VE_DEC_H265_CTRL);
+
+ reg &= ~VE_DEC_H265_CTRL_IRQ_MASK;
+
+ cedrus_write(dev, VE_DEC_H265_CTRL, reg);
+}
+
+static void cedrus_h265_sram_write_offset(struct cedrus_dev *dev, u32 offset)
+{
+ cedrus_write(dev, VE_DEC_H265_SRAM_OFFSET, offset);
+}
+
+static void cedrus_h265_sram_write_data(struct cedrus_dev *dev, u32 *data,
+ unsigned int count)
+{
+ while (count--)
+ cedrus_write(dev, VE_DEC_H265_SRAM_DATA, *data++);
+}
+
+static inline dma_addr_t cedrus_h265_frame_info_mv_col_buf_addr(
+ struct cedrus_ctx *ctx, unsigned int index, unsigned int field)
+{
+ return ctx->codec.h265.mv_col_buf_addr + index *
+ ctx->codec.h265.mv_col_buf_unit_size +
+ field * ctx->codec.h265.mv_col_buf_unit_size / 2;
+}
+
+static void cedrus_h265_frame_info_write_single(struct cedrus_dev *dev,
+ unsigned int index,
+ bool field_pic,
+ u32 pic_order_cnt[],
+ dma_addr_t mv_col_buf_addr[],
+ dma_addr_t dst_luma_addr,
+ dma_addr_t dst_chroma_addr)
+{
+ u32 offset = VE_DEC_H265_SRAM_OFFSET_FRAME_INFO +
+ VE_DEC_H265_SRAM_OFFSET_FRAME_INFO_UNIT * index;
+ struct cedrus_h265_sram_frame_info frame_info = {
+ .top_pic_order_cnt = pic_order_cnt[0],
+ .bottom_pic_order_cnt = field_pic ? pic_order_cnt[1] :
+ pic_order_cnt[0],
+ .top_mv_col_buf_addr =
+ VE_DEC_H265_SRAM_DATA_ADDR_BASE(mv_col_buf_addr[0]),
+ .bottom_mv_col_buf_addr = field_pic ?
+ VE_DEC_H265_SRAM_DATA_ADDR_BASE(mv_col_buf_addr[1]) :
+ VE_DEC_H265_SRAM_DATA_ADDR_BASE(mv_col_buf_addr[0]),
+ .luma_addr = VE_DEC_H265_SRAM_DATA_ADDR_BASE(dst_luma_addr),
+ .chroma_addr = VE_DEC_H265_SRAM_DATA_ADDR_BASE(dst_chroma_addr),
+ };
+ unsigned int count = sizeof(frame_info) / sizeof(u32);
+
+ cedrus_h265_sram_write_offset(dev, offset);
+ cedrus_h265_sram_write_data(dev, (u32 *)&frame_info, count);
+}
+
+static void cedrus_h265_frame_info_write_dpb(struct cedrus_ctx *ctx,
+ const struct v4l2_hevc_dpb_entry *dpb,
+ u8 num_active_dpb_entries)
+{
+ struct cedrus_dev *dev = ctx->dev;
+ struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
+ unsigned int i;
+
+ for (i = 0; i < num_active_dpb_entries; i++) {
+ dma_addr_t dst_luma_addr, dst_chroma_addr;
+ dma_addr_t mv_col_buf_addr[2];
+ u32 pic_order_cnt[2];
+ int buffer_index = vb2_find_tag(cap_q, dpb[i].buffer_tag, 0);
+
+ dst_luma_addr = cedrus_dst_buf_addr(ctx, buffer_index, 0) -
+ PHYS_OFFSET;
+ dst_chroma_addr = cedrus_dst_buf_addr(ctx, buffer_index, 1) -
+ PHYS_OFFSET;
+ mv_col_buf_addr[0] = cedrus_h265_frame_info_mv_col_buf_addr(ctx,
+ buffer_index, 0) - PHYS_OFFSET;
+ pic_order_cnt[0] = dpb[i].pic_order_cnt[0];
+
+ if (dpb[i].field_pic) {
+ mv_col_buf_addr[1] =
+ cedrus_h265_frame_info_mv_col_buf_addr(ctx,
+ buffer_index, 1) - PHYS_OFFSET;
+ pic_order_cnt[1] = dpb[i].pic_order_cnt[1];
+ }
+
+ cedrus_h265_frame_info_write_single(dev, i, dpb[i].field_pic,
+ pic_order_cnt,
+ mv_col_buf_addr,
+ dst_luma_addr,
+ dst_chroma_addr);
+ }
+}
+
+static void cedrus_h265_ref_pic_list_write(struct cedrus_dev *dev,
+ const u8 list[],
+ u8 num_ref_idx_active,
+ const struct v4l2_hevc_dpb_entry *dpb,
+ u8 num_active_dpb_entries,
+ u32 sram_offset)
+{
+ unsigned int i;
+ u32 reg = 0;
+
+ cedrus_h265_sram_write_offset(dev, sram_offset);
+
+ for (i = 0; i < num_ref_idx_active; i++) {
+ unsigned int shift = (i % 4) * 8;
+ unsigned int index = list[i];
+ u8 value = list[i];
+
+ if (dpb[index].rps == V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR)
+ value |= VE_DEC_H265_SRAM_REF_PIC_LIST_LT_REF;
+
+ reg |= value << shift;
+
+ if ((i % 4) == 3 || i == (num_ref_idx_active - 1)) {
+ cedrus_h265_sram_write_data(dev, &reg, 1);
+ reg = 0;
+ }
+ }
+}
+
+static void cedrus_h265_pred_weight_write(struct cedrus_dev *dev,
+ const s8 delta_luma_weight[],
+ const s8 luma_offset[],
+ const s8 delta_chroma_weight[][2],
+ const s8 chroma_offset[][2],
+ u8 num_ref_idx_active,
+ u32 sram_luma_offset,
+ u32 sram_chroma_offset)
+{
+ struct cedrus_h265_sram_pred_weight pred_weight[2] = { 0 };
+ unsigned int i, j;
+
+ cedrus_h265_sram_write_offset(dev, sram_luma_offset);
+
+ for (i = 0; i < num_ref_idx_active; i++) {
+ unsigned int index = i % 2;
+
+ pred_weight[index].delta_weight = delta_luma_weight[i];
+ pred_weight[index].offset = luma_offset[i];
+
+ if (index == 1 || i == (num_ref_idx_active - 1))
+ cedrus_h265_sram_write_data(dev, (u32 *)&pred_weight,
+ 1);
+ }
+
+ cedrus_h265_sram_write_offset(dev, sram_chroma_offset);
+
+ for (i = 0; i < num_ref_idx_active; i++) {
+ for (j = 0; j < 2; j++) {
+ pred_weight[j].delta_weight = delta_chroma_weight[i][j];
+ pred_weight[j].offset = chroma_offset[i][j];
+ }
+
+ cedrus_h265_sram_write_data(dev, (u32 *)&pred_weight, 1);
+ }
+}
+
+static void cedrus_h265_setup(struct cedrus_ctx *ctx,
+ struct cedrus_run *run)
+{
+ struct cedrus_dev *dev = ctx->dev;
+ const struct v4l2_ctrl_hevc_sps *sps;
+ const struct v4l2_ctrl_hevc_pps *pps;
+ const struct v4l2_ctrl_hevc_slice_params *slice_params;
+ const struct v4l2_hevc_pred_weight_table *pred_weight_table;
+ dma_addr_t src_buf_addr;
+ dma_addr_t src_buf_end_addr;
+ dma_addr_t dst_luma_addr, dst_chroma_addr;
+ dma_addr_t mv_col_buf_addr[2];
+ u32 chroma_log2_weight_denom;
+ u32 output_pic_list_index;
+ u32 pic_order_cnt[2];
+ u32 reg;
+
+ sps = run->h265.sps;
+ pps = run->h265.pps;
+ slice_params = run->h265.slice_params;
+ pred_weight_table = &slice_params->pred_weight_table;
+
+ /* MV column buffer size and allocation. */
+ if (!ctx->codec.h265.mv_col_buf_size) {
+ unsigned int num_buffers =
+ run->dst->vb2_buf.vb2_queue->num_buffers;
+ unsigned int log2_max_luma_coding_block_size =
+ sps->log2_min_luma_coding_block_size_minus3 + 3 +
+ sps->log2_diff_max_min_luma_coding_block_size;
+ unsigned int ctb_size_luma =
+ 1 << log2_max_luma_coding_block_size;
+
+ /*
+ * Each CTB requires a MV col buffer with a specific unit size.
+ * Since the address is given with missing lsb bits, 1 KiB is
+ * added to each buffer to ensure proper alignment.
+ */
+ ctx->codec.h265.mv_col_buf_unit_size =
+ DIV_ROUND_UP(ctx->src_fmt.width, ctb_size_luma) *
+ DIV_ROUND_UP(ctx->src_fmt.height, ctb_size_luma) *
+ CEDRUS_H265_MV_COL_BUF_UNIT_CTB_SIZE + SZ_1K;
+
+ ctx->codec.h265.mv_col_buf_size = num_buffers *
+ ctx->codec.h265.mv_col_buf_unit_size;
+
+ ctx->codec.h265.mv_col_buf =
+ dma_alloc_coherent(dev->dev,
+ ctx->codec.h265.mv_col_buf_size,
+ &ctx->codec.h265.mv_col_buf_addr,
+ GFP_KERNEL);
+ if (!ctx->codec.h265.mv_col_buf) {
+ ctx->codec.h265.mv_col_buf_size = 0;
+ // TODO: Abort the process here.
+ return;
+ }
+ }
+
+ /* Activate H265 engine. */
+ cedrus_engine_enable(dev, CEDRUS_CODEC_H265);
+
+ /* Source offset and length in bits. */
+
+ reg = slice_params->data_bit_offset;
+ cedrus_write(dev, VE_DEC_H265_BITS_OFFSET, reg);
+
+ reg = slice_params->bit_size - slice_params->data_bit_offset;
+ cedrus_write(dev, VE_DEC_H265_BITS_LEN, reg);
+
+ /* Source beginning and end addresses. */
+
+ src_buf_addr = vb2_dma_contig_plane_dma_addr(&run->src->vb2_buf, 0) -
+ PHYS_OFFSET;
+
+ reg = VE_DEC_H265_BITS_ADDR_BASE(src_buf_addr);
+ reg |= VE_DEC_H265_BITS_ADDR_VALID_SLICE_DATA;
+ reg |= VE_DEC_H265_BITS_ADDR_LAST_SLICE_DATA;
+ reg |= VE_DEC_H265_BITS_ADDR_FIRST_SLICE_DATA;
+
+ cedrus_write(dev, VE_DEC_H265_BITS_ADDR, reg);
+
+ src_buf_end_addr = src_buf_addr +
+ DIV_ROUND_UP(slice_params->bit_size, 8);
+
+ reg = VE_DEC_H265_BITS_END_ADDR_BASE(src_buf_end_addr);
+ cedrus_write(dev, VE_DEC_H265_BITS_END_ADDR, reg);
+
+ /* Coding tree block address: start at the beginning. */
+ reg = VE_DEC_H265_DEC_CTB_ADDR_X(0) | VE_DEC_H265_DEC_CTB_ADDR_Y(0);
+ cedrus_write(dev, VE_DEC_H265_DEC_CTB_ADDR, reg);
+
+ cedrus_write(dev, VE_DEC_H265_TILE_START_CTB, 0);
+ cedrus_write(dev, VE_DEC_H265_TILE_END_CTB, 0);
+
+ /* Clear the number of correctly-decoded coding tree blocks. */
+ cedrus_write(dev, VE_DEC_H265_DEC_CTB_NUM, 0);
+
+ /* Initialize bitstream access. */
+ cedrus_write(dev, VE_DEC_H265_TRIGGER, VE_DEC_H265_TRIGGER_INIT_SWDEC);
+
+ /* Bitstream parameters. */
+
+ reg = VE_DEC_H265_DEC_NAL_HDR_NAL_UNIT_TYPE(slice_params->nal_unit_type) |
+ VE_DEC_H265_DEC_NAL_HDR_NUH_TEMPORAL_ID_PLUS1(slice_params->nuh_temporal_id_plus1);
+
+ cedrus_write(dev, VE_DEC_H265_DEC_NAL_HDR, reg);
+
+ reg = VE_DEC_H265_DEC_SPS_HDR_STRONG_INTRA_SMOOTHING_ENABLE_FLAG(sps->strong_intra_smoothing_enabled_flag) |
+ VE_DEC_H265_DEC_SPS_HDR_SPS_TEMPORAL_MVP_ENABLED_FLAG(sps->sps_temporal_mvp_enabled_flag) |
+ VE_DEC_H265_DEC_SPS_HDR_SAMPLE_ADAPTIVE_OFFSET_ENABLED_FLAG(sps->sample_adaptive_offset_enabled_flag) |
+ VE_DEC_H265_DEC_SPS_HDR_AMP_ENABLED_FLAG(sps->amp_enabled_flag) |
+ VE_DEC_H265_DEC_SPS_HDR_MAX_TRANSFORM_HIERARCHY_DEPTH_INTRA(sps->max_transform_hierarchy_depth_intra) |
+ VE_DEC_H265_DEC_SPS_HDR_MAX_TRANSFORM_HIERARCHY_DEPTH_INTER(sps->max_transform_hierarchy_depth_inter) |
+ VE_DEC_H265_DEC_SPS_HDR_LOG2_DIFF_MAX_MIN_TRANSFORM_BLOCK_SIZE(sps->log2_diff_max_min_luma_transform_block_size) |
+ VE_DEC_H265_DEC_SPS_HDR_LOG2_MIN_TRANSFORM_BLOCK_SIZE_MINUS2(sps->log2_min_luma_transform_block_size_minus2) |
+ VE_DEC_H265_DEC_SPS_HDR_LOG2_DIFF_MAX_MIN_LUMA_CODING_BLOCK_SIZE(sps->log2_diff_max_min_luma_coding_block_size) |
+ VE_DEC_H265_DEC_SPS_HDR_LOG2_MIN_LUMA_CODING_BLOCK_SIZE_MINUS3(sps->log2_min_luma_coding_block_size_minus3) |
+ VE_DEC_H265_DEC_SPS_HDR_BIT_DEPTH_CHROMA_MINUS8(sps->bit_depth_chroma_minus8) |
+ VE_DEC_H265_DEC_SPS_HDR_SEPARATE_COLOUR_PLANE_FLAG(sps->separate_colour_plane_flag) |
+ VE_DEC_H265_DEC_SPS_HDR_CHROMA_FORMAT_IDC(sps->chroma_format_idc);
+
+ cedrus_write(dev, VE_DEC_H265_DEC_SPS_HDR, reg);
+
+ reg = VE_DEC_H265_DEC_PCM_CTRL_PCM_ENABLED_FLAG(sps->pcm_enabled_flag) |
+ VE_DEC_H265_DEC_PCM_CTRL_PCM_LOOP_FILTER_DISABLED_FLAG(sps->pcm_loop_filter_disabled_flag) |
+ VE_DEC_H265_DEC_PCM_CTRL_LOG2_DIFF_MAX_MIN_PCM_LUMA_CODING_BLOCK_SIZE(sps->log2_diff_max_min_pcm_luma_coding_block_size) |
+ VE_DEC_H265_DEC_PCM_CTRL_LOG2_MIN_PCM_LUMA_CODING_BLOCK_SIZE_MINUS3(sps->log2_min_pcm_luma_coding_block_size_minus3) |
+ VE_DEC_H265_DEC_PCM_CTRL_PCM_SAMPLE_BIT_DEPTH_CHROMA_MINUS1(sps->pcm_sample_bit_depth_chroma_minus1) |
+ VE_DEC_H265_DEC_PCM_CTRL_PCM_SAMPLE_BIT_DEPTH_LUMA_MINUS1(sps->pcm_sample_bit_depth_luma_minus1);
+
+ cedrus_write(dev, VE_DEC_H265_DEC_PCM_CTRL, reg);
+
+ reg = VE_DEC_H265_DEC_PPS_CTRL0_PPS_CR_QP_OFFSET(pps->pps_cr_qp_offset) |
+ VE_DEC_H265_DEC_PPS_CTRL0_PPS_CB_QP_OFFSET(pps->pps_cb_qp_offset) |
+ VE_DEC_H265_DEC_PPS_CTRL0_INIT_QP_MINUS26(pps->init_qp_minus26) |
+ VE_DEC_H265_DEC_PPS_CTRL0_DIFF_CU_QP_DELTA_DEPTH(pps->diff_cu_qp_delta_depth) |
+ VE_DEC_H265_DEC_PPS_CTRL0_CU_QP_DELTA_ENABLED_FLAG(pps->cu_qp_delta_enabled_flag) |
+ VE_DEC_H265_DEC_PPS_CTRL0_TRANSFORM_SKIP_ENABLED_FLAG(pps->transform_skip_enabled_flag) |
+ VE_DEC_H265_DEC_PPS_CTRL0_CONSTRAINED_INTRA_PRED_FLAG(pps->constrained_intra_pred_flag) |
+ VE_DEC_H265_DEC_PPS_CTRL0_SIGN_DATA_HIDING_FLAG(pps->sign_data_hiding_enabled_flag);
+
+ cedrus_write(dev, VE_DEC_H265_DEC_PPS_CTRL0, reg);
+
+ reg = VE_DEC_H265_DEC_PPS_CTRL1_LOG2_PARALLEL_MERGE_LEVEL_MINUS2(pps->log2_parallel_merge_level_minus2) |
+ VE_DEC_H265_DEC_PPS_CTRL1_PPS_LOOP_FILTER_ACROSS_SLICES_ENABLED_FLAG(pps->pps_loop_filter_across_slices_enabled_flag) |
+ VE_DEC_H265_DEC_PPS_CTRL1_LOOP_FILTER_ACROSS_TILES_ENABLED_FLAG(pps->loop_filter_across_tiles_enabled_flag) |
+ VE_DEC_H265_DEC_PPS_CTRL1_ENTROPY_CODING_SYNC_ENABLED_FLAG(pps->entropy_coding_sync_enabled_flag) |
+ VE_DEC_H265_DEC_PPS_CTRL1_TILES_ENABLED_FLAG(0) |
+ VE_DEC_H265_DEC_PPS_CTRL1_TRANSQUANT_BYPASS_ENABLE_FLAG(pps->transquant_bypass_enabled_flag) |
+ VE_DEC_H265_DEC_PPS_CTRL1_WEIGHTED_BIPRED_FLAG(pps->weighted_bipred_flag) |
+ VE_DEC_H265_DEC_PPS_CTRL1_WEIGHTED_PRED_FLAG(pps->weighted_pred_flag);
+
+ cedrus_write(dev, VE_DEC_H265_DEC_PPS_CTRL1, reg);
+
+ reg = VE_DEC_H265_DEC_SLICE_HDR_INFO0_PICTURE_TYPE(slice_params->pic_struct) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO0_FIVE_MINUS_MAX_NUM_MERGE_CAND(slice_params->five_minus_max_num_merge_cand) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO0_NUM_REF_IDX_L1_ACTIVE_MINUS1(slice_params->num_ref_idx_l1_active_minus1) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO0_NUM_REF_IDX_L0_ACTIVE_MINUS1(slice_params->num_ref_idx_l0_active_minus1) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO0_COLLOCATED_REF_IDX(slice_params->collocated_ref_idx) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO0_COLLOCATED_FROM_L0_FLAG(slice_params->collocated_from_l0_flag) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO0_CABAC_INIT_FLAG(slice_params->cabac_init_flag) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO0_MVD_L1_ZERO_FLAG(slice_params->mvd_l1_zero_flag) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO0_SLICE_SAO_CHROMA_FLAG(slice_params->slice_sao_chroma_flag) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO0_SLICE_SAO_LUMA_FLAG(slice_params->slice_sao_luma_flag) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO0_SLICE_TEMPORAL_MVP_ENABLE_FLAG(slice_params->slice_temporal_mvp_enabled_flag) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO0_COLOUR_PLANE_ID(slice_params->colour_plane_id) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO0_SLICE_TYPE(slice_params->slice_type) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO0_DEPENDENT_SLICE_SEGMENT_FLAG(pps->dependent_slice_segment_flag) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO0_FIRST_SLICE_SEGMENT_IN_PIC_FLAG(1);
+
+ cedrus_write(dev, VE_DEC_H265_DEC_SLICE_HDR_INFO0, reg);
+
+ reg = VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_TC_OFFSET_DIV2(slice_params->slice_tc_offset_div2) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_BETA_OFFSET_DIV2(slice_params->slice_beta_offset_div2) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_DEBLOCKING_FILTER_DISABLED_FLAG(slice_params->slice_deblocking_filter_disabled_flag) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_LOOP_FILTER_ACROSS_SLICES_ENABLED_FLAG(slice_params->slice_loop_filter_across_slices_enabled_flag) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_POC_BIGEST_IN_RPS_ST(slice_params->num_rps_poc_st_curr_after == 0) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_CR_QP_OFFSET(slice_params->slice_cr_qp_offset) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_CB_QP_OFFSET(slice_params->slice_cb_qp_offset) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_QP_DELTA(slice_params->slice_qp_delta);
+
+ cedrus_write(dev, VE_DEC_H265_DEC_SLICE_HDR_INFO1, reg);
+
+ chroma_log2_weight_denom = pred_weight_table->luma_log2_weight_denom +
+ pred_weight_table->delta_chroma_log2_weight_denom;
+ reg = VE_DEC_H265_DEC_SLICE_HDR_INFO2_NUM_ENTRY_POINT_OFFSETS(0) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO2_CHROMA_LOG2_WEIGHT_DENOM(chroma_log2_weight_denom) |
+ VE_DEC_H265_DEC_SLICE_HDR_INFO2_LUMA_LOG2_WEIGHT_DENOM(pred_weight_table->luma_log2_weight_denom);
+
+ cedrus_write(dev, VE_DEC_H265_DEC_SLICE_HDR_INFO2, reg);
+
+ /* Decoded picture size. */
+
+ reg = VE_DEC_H265_DEC_PIC_SIZE_WIDTH(ctx->src_fmt.width) |
+ VE_DEC_H265_DEC_PIC_SIZE_HEIGHT(ctx->src_fmt.height);
+
+ cedrus_write(dev, VE_DEC_H265_DEC_PIC_SIZE, reg);
+
+ /* Scaling list */
+
+ reg = VE_DEC_H265_SCALING_LIST_CTRL0_DEFAULT;
+ cedrus_write(dev, VE_DEC_H265_SCALING_LIST_CTRL0, reg);
+
+ /* Neighbor information address. */
+ reg = VE_DEC_H265_NEIGHBOR_INFO_ADDR_BASE(ctx->codec.h265.neighbor_info_buf_addr);
+ cedrus_write(dev, VE_DEC_H265_NEIGHBOR_INFO_ADDR, reg);
+
+ /* Write decoded picture buffer in pic list. */
+ cedrus_h265_frame_info_write_dpb(ctx, slice_params->dpb,
+ slice_params->num_active_dpb_entries);
+
+ /* Output frame. */
+
+ output_pic_list_index = V4L2_HEVC_DPB_ENTRIES_NUM_MAX;
+ pic_order_cnt[0] = pic_order_cnt[1] = slice_params->slice_pic_order_cnt;
+ mv_col_buf_addr[0] = cedrus_h265_frame_info_mv_col_buf_addr(ctx,
+ run->dst->vb2_buf.index, 0) - PHYS_OFFSET;
+ mv_col_buf_addr[1] = cedrus_h265_frame_info_mv_col_buf_addr(ctx,
+ run->dst->vb2_buf.index, 1) - PHYS_OFFSET;
+ dst_luma_addr = cedrus_dst_buf_addr(ctx, run->dst->vb2_buf.index, 0) -
+ PHYS_OFFSET;
+ dst_chroma_addr = cedrus_dst_buf_addr(ctx, run->dst->vb2_buf.index, 1) -
+ PHYS_OFFSET;
+
+ cedrus_h265_frame_info_write_single(dev, output_pic_list_index,
+ slice_params->pic_struct != 0,
+ pic_order_cnt, mv_col_buf_addr,
+ dst_luma_addr, dst_chroma_addr);
+
+ cedrus_write(dev, VE_DEC_H265_OUTPUT_FRAME_IDX, output_pic_list_index);
+
+ /* Reference picture list 0 (for P/B frames). */
+ if (slice_params->slice_type != V4L2_HEVC_SLICE_TYPE_I) {
+ cedrus_h265_ref_pic_list_write(dev, slice_params->ref_idx_l0,
+ slice_params->num_ref_idx_l0_active_minus1 + 1,
+ slice_params->dpb, slice_params->num_active_dpb_entries,
+ VE_DEC_H265_SRAM_OFFSET_REF_PIC_LIST0);
+
+ if (pps->weighted_pred_flag || pps->weighted_bipred_flag)
+ cedrus_h265_pred_weight_write(dev,
+ pred_weight_table->delta_luma_weight_l0,
+ pred_weight_table->luma_offset_l0,
+ pred_weight_table->delta_chroma_weight_l0,
+ pred_weight_table->chroma_offset_l0,
+ slice_params->num_ref_idx_l0_active_minus1 + 1,
+ VE_DEC_H265_SRAM_OFFSET_PRED_WEIGHT_LUMA_L0,
+ VE_DEC_H265_SRAM_OFFSET_PRED_WEIGHT_CHROMA_L0);
+ }
+
+ /* Reference picture list 1 (for B frames). */
+ if (slice_params->slice_type == V4L2_HEVC_SLICE_TYPE_B) {
+ cedrus_h265_ref_pic_list_write(dev, slice_params->ref_idx_l1,
+ slice_params->num_ref_idx_l1_active_minus1 + 1,
+ slice_params->dpb,
+ slice_params->num_active_dpb_entries,
+ VE_DEC_H265_SRAM_OFFSET_REF_PIC_LIST1);
+
+ if (pps->weighted_bipred_flag)
+ cedrus_h265_pred_weight_write(dev,
+ pred_weight_table->delta_luma_weight_l1,
+ pred_weight_table->luma_offset_l1,
+ pred_weight_table->delta_chroma_weight_l1,
+ pred_weight_table->chroma_offset_l1,
+ slice_params->num_ref_idx_l1_active_minus1 + 1,
+ VE_DEC_H265_SRAM_OFFSET_PRED_WEIGHT_LUMA_L1,
+ VE_DEC_H265_SRAM_OFFSET_PRED_WEIGHT_CHROMA_L1);
+ }
+
+ /* Enable appropriate interrupts. */
+ cedrus_write(dev, VE_DEC_H265_CTRL, VE_DEC_H265_CTRL_IRQ_MASK);
+}
+
+static int cedrus_h265_start(struct cedrus_ctx *ctx)
+{
+ struct cedrus_dev *dev = ctx->dev;
+
+ /* The buffer size is calculated at setup time. */
+ ctx->codec.h265.mv_col_buf_size = 0;
+
+ ctx->codec.h265.neighbor_info_buf =
+ dma_alloc_coherent(dev->dev, CEDRUS_H265_NEIGHBOR_INFO_BUF_SIZE,
+ &ctx->codec.h265.neighbor_info_buf_addr,
+ GFP_KERNEL);
+ if (!ctx->codec.h265.neighbor_info_buf)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void cedrus_h265_stop(struct cedrus_ctx *ctx)
+{
+ struct cedrus_dev *dev = ctx->dev;
+
+ if (ctx->codec.h265.mv_col_buf_size > 0) {
+ dma_free_coherent(dev->dev, ctx->codec.h265.mv_col_buf_size,
+ ctx->codec.h265.mv_col_buf,
+ ctx->codec.h265.mv_col_buf_addr);
+
+ ctx->codec.h265.mv_col_buf_size = 0;
+ }
+
+ dma_free_coherent(dev->dev, CEDRUS_H265_NEIGHBOR_INFO_BUF_SIZE,
+ ctx->codec.h265.neighbor_info_buf,
+ ctx->codec.h265.neighbor_info_buf_addr);
+}
+
+static void cedrus_h265_trigger(struct cedrus_ctx *ctx)
+{
+ struct cedrus_dev *dev = ctx->dev;
+
+ cedrus_write(dev, VE_DEC_H265_TRIGGER, VE_DEC_H265_TRIGGER_DEC_SLICE);
+}
+
+struct cedrus_dec_ops cedrus_dec_ops_h265 = {
+ .irq_clear = cedrus_h265_irq_clear,
+ .irq_disable = cedrus_h265_irq_disable,
+ .irq_status = cedrus_h265_irq_status,
+ .setup = cedrus_h265_setup,
+ .start = cedrus_h265_start,
+ .stop = cedrus_h265_stop,
+ .trigger = cedrus_h265_trigger,
+};
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
index 7d5ca6ddf8e4..a8b693305f02 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
@@ -50,6 +50,10 @@ int cedrus_engine_enable(struct cedrus_dev *dev, enum cedrus_codec codec)
reg |= VE_MODE_DEC_H264;
break;

+ case CEDRUS_CODEC_H265:
+ reg |= VE_MODE_DEC_H265;
+ break;
+
default:
return -EINVAL;
}
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
index 6fe9896a506d..d87d13d6ed16 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
@@ -18,10 +18,17 @@
* * MC: Motion Compensation
* * STCD: Start Code Detect
* * SDRT: Scale Down and Rotate
+ * * WB: Writeback
+ * * BITS/BS: Bitstream
+ * * MB: Macroblock
+ * * CTU: Coding Tree Unit
+ * * CTB: Coding Tree Block
+ * * IDX: Index
*/

#define VE_ENGINE_DEC_MPEG 0x100
#define VE_ENGINE_DEC_H264 0x200
+#define VE_ENGINE_DEC_H265 0x500

#define VE_MODE 0x00

@@ -232,6 +239,289 @@
#define VE_DEC_MPEG_ROT_LUMA (VE_ENGINE_DEC_MPEG + 0xcc)
#define VE_DEC_MPEG_ROT_CHROMA (VE_ENGINE_DEC_MPEG + 0xd0)

+#define VE_DEC_H265_DEC_NAL_HDR (VE_ENGINE_DEC_H265 + 0x00)
+
+#define VE_DEC_H265_DEC_NAL_HDR_NUH_TEMPORAL_ID_PLUS1(v) \
+ (((v) << 6) & GENMASK(8, 6))
+#define VE_DEC_H265_DEC_NAL_HDR_NAL_UNIT_TYPE(v) \
+ ((v) & GENMASK(5, 0))
+
+#define VE_DEC_H265_DEC_SPS_HDR (VE_ENGINE_DEC_H265 + 0x04)
+
+#define VE_DEC_H265_DEC_SPS_HDR_STRONG_INTRA_SMOOTHING_ENABLE_FLAG(v) \
+ ((v) ? BIT(26) : 0)
+#define VE_DEC_H265_DEC_SPS_HDR_SPS_TEMPORAL_MVP_ENABLED_FLAG(v) \
+ ((v) ? BIT(25) : 0)
+#define VE_DEC_H265_DEC_SPS_HDR_SAMPLE_ADAPTIVE_OFFSET_ENABLED_FLAG(v) \
+ ((v) ? BIT(24) : 0)
+#define VE_DEC_H265_DEC_SPS_HDR_AMP_ENABLED_FLAG(v) \
+ ((v) ? BIT(23) : 0)
+#define VE_DEC_H265_DEC_SPS_HDR_MAX_TRANSFORM_HIERARCHY_DEPTH_INTRA(v) \
+ (((v) << 20) & GENMASK(22, 20))
+#define VE_DEC_H265_DEC_SPS_HDR_MAX_TRANSFORM_HIERARCHY_DEPTH_INTER(v) \
+ (((v) << 17) & GENMASK(19, 17))
+#define VE_DEC_H265_DEC_SPS_HDR_LOG2_DIFF_MAX_MIN_TRANSFORM_BLOCK_SIZE(v) \
+ (((v) << 15) & GENMASK(16, 15))
+#define VE_DEC_H265_DEC_SPS_HDR_LOG2_MIN_TRANSFORM_BLOCK_SIZE_MINUS2(v) \
+ (((v) << 13) & GENMASK(14, 13))
+#define VE_DEC_H265_DEC_SPS_HDR_LOG2_DIFF_MAX_MIN_LUMA_CODING_BLOCK_SIZE(v) \
+ (((v) << 11) & GENMASK(12, 11))
+#define VE_DEC_H265_DEC_SPS_HDR_LOG2_MIN_LUMA_CODING_BLOCK_SIZE_MINUS3(v) \
+ (((v) << 9) & GENMASK(10, 9))
+#define VE_DEC_H265_DEC_SPS_HDR_BIT_DEPTH_CHROMA_MINUS8(v) \
+ (((v) << 6) & GENMASK(8, 6))
+#define VE_DEC_H265_DEC_SPS_HDR_BIT_DEPTH_LUMA_MINUS8(v) \
+ (((v) << 3) & GENMASK(5, 3))
+#define VE_DEC_H265_DEC_SPS_HDR_SEPARATE_COLOUR_PLANE_FLAG(v) \
+ ((v) ? BIT(2) : 0)
+#define VE_DEC_H265_DEC_SPS_HDR_CHROMA_FORMAT_IDC(v) \
+ ((v) & GENMASK(1, 0))
+
+#define VE_DEC_H265_DEC_PIC_SIZE (VE_ENGINE_DEC_H265 + 0x08)
+
+#define VE_DEC_H265_DEC_PIC_SIZE_WIDTH(w) (((w) << 0) & GENMASK(13, 0))
+#define VE_DEC_H265_DEC_PIC_SIZE_HEIGHT(h) (((h) << 16) & GENMASK(29, 16))
+
+#define VE_DEC_H265_DEC_PCM_CTRL (VE_ENGINE_DEC_H265 + 0x0c)
+
+#define VE_DEC_H265_DEC_PCM_CTRL_PCM_ENABLED_FLAG(v) \
+ ((v) ? BIT(15) : 0)
+#define VE_DEC_H265_DEC_PCM_CTRL_PCM_LOOP_FILTER_DISABLED_FLAG(v) \
+ ((v) ? BIT(14) : 0)
+#define VE_DEC_H265_DEC_PCM_CTRL_LOG2_DIFF_MAX_MIN_PCM_LUMA_CODING_BLOCK_SIZE(v) \
+ (((v) << 10) & GENMASK(11, 10))
+#define VE_DEC_H265_DEC_PCM_CTRL_LOG2_MIN_PCM_LUMA_CODING_BLOCK_SIZE_MINUS3(v) \
+ (((v) << 8) & GENMASK(9, 8))
+#define VE_DEC_H265_DEC_PCM_CTRL_PCM_SAMPLE_BIT_DEPTH_CHROMA_MINUS1(v) \
+ (((v) << 4) & GENMASK(7, 4))
+#define VE_DEC_H265_DEC_PCM_CTRL_PCM_SAMPLE_BIT_DEPTH_LUMA_MINUS1(v) \
+ (((v) << 0) & GENMASK(3, 0))
+
+#define VE_DEC_H265_DEC_PPS_CTRL0 (VE_ENGINE_DEC_H265 + 0x10)
+
+#define VE_DEC_H265_DEC_PPS_CTRL0_PPS_CR_QP_OFFSET(v) \
+ (((v) << 24) & GENMASK(29, 24))
+#define VE_DEC_H265_DEC_PPS_CTRL0_PPS_CB_QP_OFFSET(v) \
+ (((v) << 16) & GENMASK(21, 16))
+#define VE_DEC_H265_DEC_PPS_CTRL0_INIT_QP_MINUS26(v) \
+ (((v) << 8) & GENMASK(14, 8))
+#define VE_DEC_H265_DEC_PPS_CTRL0_DIFF_CU_QP_DELTA_DEPTH(v) \
+ (((v) << 4) & GENMASK(5, 4))
+#define VE_DEC_H265_DEC_PPS_CTRL0_CU_QP_DELTA_ENABLED_FLAG(v) \
+ ((v) ? BIT(3) : 0)
+#define VE_DEC_H265_DEC_PPS_CTRL0_TRANSFORM_SKIP_ENABLED_FLAG(v) \
+ ((v) ? BIT(2) : 0)
+#define VE_DEC_H265_DEC_PPS_CTRL0_CONSTRAINED_INTRA_PRED_FLAG(v) \
+ ((v) ? BIT(1) : 0)
+#define VE_DEC_H265_DEC_PPS_CTRL0_SIGN_DATA_HIDING_FLAG(v) \
+ ((v) ? BIT(0) : 0)
+
+#define VE_DEC_H265_DEC_PPS_CTRL1 (VE_ENGINE_DEC_H265 + 0x14)
+
+#define VE_DEC_H265_DEC_PPS_CTRL1_LOG2_PARALLEL_MERGE_LEVEL_MINUS2(v) \
+ (((v) << 8) & GENMASK(10, 8))
+#define VE_DEC_H265_DEC_PPS_CTRL1_PPS_LOOP_FILTER_ACROSS_SLICES_ENABLED_FLAG(v) \
+ ((v) ? BIT(6) : 0)
+#define VE_DEC_H265_DEC_PPS_CTRL1_LOOP_FILTER_ACROSS_TILES_ENABLED_FLAG(v) \
+ ((v) ? BIT(5) : 0)
+#define VE_DEC_H265_DEC_PPS_CTRL1_ENTROPY_CODING_SYNC_ENABLED_FLAG(v) \
+ ((v) ? BIT(4) : 0)
+#define VE_DEC_H265_DEC_PPS_CTRL1_TILES_ENABLED_FLAG(v) \
+ ((v) ? BIT(3) : 0)
+#define VE_DEC_H265_DEC_PPS_CTRL1_TRANSQUANT_BYPASS_ENABLE_FLAG(v) \
+ ((v) ? BIT(2) : 0)
+#define VE_DEC_H265_DEC_PPS_CTRL1_WEIGHTED_BIPRED_FLAG(v) \
+ ((v) ? BIT(1) : 0)
+#define VE_DEC_H265_DEC_PPS_CTRL1_WEIGHTED_PRED_FLAG(v) \
+ ((v) ? BIT(0) : 0)
+
+#define VE_DEC_H265_SCALING_LIST_CTRL0 (VE_ENGINE_DEC_H265 + 0x18)
+
+#define VE_DEC_H265_SCALING_LIST_CTRL0_ENABLED_FLAG(v) \
+ ((v) ? BIT(31) : 0)
+#define VE_DEC_H265_SCALING_LIST_CTRL0_SRAM (0 << 30)
+#define VE_DEC_H265_SCALING_LIST_CTRL0_DEFAULT (1 << 30)
+
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO0 (VE_ENGINE_DEC_H265 + 0x20)
+
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO0_PICTURE_TYPE(v) \
+ (((v) << 28) & GENMASK(29, 28))
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO0_FIVE_MINUS_MAX_NUM_MERGE_CAND(v) \
+ (((v) << 24) & GENMASK(26, 24))
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO0_NUM_REF_IDX_L1_ACTIVE_MINUS1(v) \
+ (((v) << 20) & GENMASK(23, 20))
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO0_NUM_REF_IDX_L0_ACTIVE_MINUS1(v) \
+ (((v) << 16) & GENMASK(19, 16))
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO0_COLLOCATED_REF_IDX(v) \
+ (((v) << 12) & GENMASK(15, 12))
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO0_COLLOCATED_FROM_L0_FLAG(v) \
+ ((v) ? BIT(11) : 0)
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO0_CABAC_INIT_FLAG(v) \
+ ((v) ? BIT(10) : 0)
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO0_MVD_L1_ZERO_FLAG(v) \
+ ((v) ? BIT(9) : 0)
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO0_SLICE_SAO_CHROMA_FLAG(v) \
+ ((v) ? BIT(8) : 0)
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO0_SLICE_SAO_LUMA_FLAG(v) \
+ ((v) ? BIT(7) : 0)
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO0_SLICE_TEMPORAL_MVP_ENABLE_FLAG(v) \
+ ((v) ? BIT(6) : 0)
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO0_COLOUR_PLANE_ID(v) \
+ (((v) << 4) & GENMASK(5, 4))
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO0_SLICE_TYPE(v) \
+ (((v) << 2) & GENMASK(3, 2))
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO0_DEPENDENT_SLICE_SEGMENT_FLAG(v) \
+ ((v) ? BIT(1) : 0)
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO0_FIRST_SLICE_SEGMENT_IN_PIC_FLAG(v) \
+ ((v) ? BIT(0) : 0)
+
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO1 (VE_ENGINE_DEC_H265 + 0x24)
+
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_TC_OFFSET_DIV2(v) \
+ (((v) << 28) & GENMASK(31, 28))
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_BETA_OFFSET_DIV2(v) \
+ (((v) << 24) & GENMASK(27, 24))
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_DEBLOCKING_FILTER_DISABLED_FLAG(v) \
+ ((v) ? BIT(23) : 0)
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_LOOP_FILTER_ACROSS_SLICES_ENABLED_FLAG(v) \
+ ((v) ? BIT(22) : 0)
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_POC_BIGEST_IN_RPS_ST(v) \
+ ((v) ? BIT(21) : 0)
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_CR_QP_OFFSET(v) \
+ (((v) << 16) & GENMASK(20, 16))
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_CB_QP_OFFSET(v) \
+ (((v) << 8) & GENMASK(12, 8))
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_QP_DELTA(v) \
+ ((v) & GENMASK(6, 0))
+
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO2 (VE_ENGINE_DEC_H265 + 0x28)
+
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO2_NUM_ENTRY_POINT_OFFSETS(v) \
+ (((v) << 8) & GENMASK(21, 8))
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO2_CHROMA_LOG2_WEIGHT_DENOM(v) \
+ (((v) << 4) & GENMASK(6, 4))
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO2_LUMA_LOG2_WEIGHT_DENOM(v) \
+ (((v) << 0) & GENMASK(2, 0))
+
+#define VE_DEC_H265_DEC_CTB_ADDR (VE_ENGINE_DEC_H265 + 0x2c)
+
+#define VE_DEC_H265_DEC_CTB_ADDR_Y(y) (((y) << 16) & GENMASK(25, 16))
+#define VE_DEC_H265_DEC_CTB_ADDR_X(x) (((x) << 0) & GENMASK(9, 0))
+
+#define VE_DEC_H265_CTRL (VE_ENGINE_DEC_H265 + 0x30)
+
+#define VE_DEC_H265_CTRL_DDR_CONSISTENCY_EN BIT(31)
+#define VE_DEC_H265_CTRL_STCD_EN BIT(25)
+#define VE_DEC_H265_CTRL_EPTB_DEC_BYPASS_EN BIT(24)
+#define VE_DEC_H265_CTRL_TQ_BYPASS_EN BIT(12)
+#define VE_DEC_H265_CTRL_VLD_BYPASS_EN BIT(11)
+#define VE_DEC_H265_CTRL_NCRI_CACHE_DISABLE BIT(10)
+#define VE_DEC_H265_CTRL_ROTATE_SCALE_OUT_EN BIT(9)
+#define VE_DEC_H265_CTRL_MC_NO_WRITEBACK BIT(8)
+#define VE_DEC_H265_CTRL_VLD_DATA_REQ_IRQ_EN BIT(2)
+#define VE_DEC_H265_CTRL_ERROR_IRQ_EN BIT(1)
+#define VE_DEC_H265_CTRL_FINISH_IRQ_EN BIT(0)
+#define VE_DEC_H265_CTRL_IRQ_MASK \
+ (VE_DEC_H265_CTRL_FINISH_IRQ_EN | VE_DEC_H265_CTRL_ERROR_IRQ_EN | \
+ VE_DEC_H265_CTRL_VLD_DATA_REQ_IRQ_EN)
+
+#define VE_DEC_H265_TRIGGER (VE_ENGINE_DEC_H265 + 0x34)
+
+#define VE_DEC_H265_TRIGGER_STCD_VC1 (0x02 << 4)
+#define VE_DEC_H265_TRIGGER_STCD_AVS (0x01 << 4)
+#define VE_DEC_H265_TRIGGER_STCD_HEVC (0x00 << 4)
+#define VE_DEC_H265_TRIGGER_DEC_SLICE (0x08 << 0)
+#define VE_DEC_H265_TRIGGER_INIT_SWDEC (0x07 << 0)
+#define VE_DEC_H265_TRIGGER_BYTE_ALIGN (0x06 << 0)
+#define VE_DEC_H265_TRIGGER_GET_VLCUE (0x05 << 0)
+#define VE_DEC_H265_TRIGGER_GET_VLCSE (0x04 << 0)
+#define VE_DEC_H265_TRIGGER_FLUSH_BITS (0x03 << 0)
+#define VE_DEC_H265_TRIGGER_GET_BITS (0x02 << 0)
+#define VE_DEC_H265_TRIGGER_SHOW_BITS (0x01 << 0)
+
+#define VE_DEC_H265_STATUS (VE_ENGINE_DEC_H265 + 0x38)
+
+#define VE_DEC_H265_STATUS_STCD BIT(24)
+#define VE_DEC_H265_STATUS_STCD_BUSY BIT(21)
+#define VE_DEC_H265_STATUS_WB_BUSY BIT(20)
+#define VE_DEC_H265_STATUS_BS_DMA_BUSY BIT(19)
+#define VE_DEC_H265_STATUS_IQIT_BUSY BIT(18)
+#define VE_DEC_H265_STATUS_INTER_BUSY BIT(17)
+#define VE_DEC_H265_STATUS_MORE_DATA BIT(16)
+#define VE_DEC_H265_STATUS_VLD_BUSY BIT(14)
+#define VE_DEC_H265_STATUS_DEBLOCKING_BUSY BIT(13)
+#define VE_DEC_H265_STATUS_DEBLOCKING_DRAM_BUSY BIT(12)
+#define VE_DEC_H265_STATUS_INTRA_BUSY BIT(11)
+#define VE_DEC_H265_STATUS_SAO_BUSY BIT(10)
+#define VE_DEC_H265_STATUS_MVP_BUSY BIT(9)
+#define VE_DEC_H265_STATUS_SWDEC_BUSY BIT(8)
+#define VE_DEC_H265_STATUS_OVER_TIME BIT(3)
+#define VE_DEC_H265_STATUS_VLD_DATA_REQ BIT(2)
+#define VE_DEC_H265_STATUS_ERROR BIT(1)
+#define VE_DEC_H265_STATUS_SUCCESS BIT(0)
+#define VE_DEC_H265_STATUS_STCD_TYPE_MASK GENMASK(23, 22)
+#define VE_DEC_H265_STATUS_CHECK_MASK \
+ (VE_DEC_H265_STATUS_SUCCESS | VE_DEC_H265_STATUS_ERROR | \
+ VE_DEC_H265_STATUS_VLD_DATA_REQ)
+#define VE_DEC_H265_STATUS_CHECK_ERROR \
+ (VE_DEC_H265_STATUS_ERROR | VE_DEC_H265_STATUS_VLD_DATA_REQ)
+
+#define VE_DEC_H265_DEC_CTB_NUM (VE_ENGINE_DEC_H265 + 0x3c)
+
+#define VE_DEC_H265_BITS_ADDR (VE_ENGINE_DEC_H265 + 0x40)
+
+#define VE_DEC_H265_BITS_ADDR_FIRST_SLICE_DATA BIT(30)
+#define VE_DEC_H265_BITS_ADDR_LAST_SLICE_DATA BIT(29)
+#define VE_DEC_H265_BITS_ADDR_VALID_SLICE_DATA BIT(28)
+#define VE_DEC_H265_BITS_ADDR_BASE(a) (((a) >> 8) & GENMASK(27, 0))
+
+#define VE_DEC_H265_BITS_OFFSET (VE_ENGINE_DEC_H265 + 0x44)
+#define VE_DEC_H265_BITS_LEN (VE_ENGINE_DEC_H265 + 0x48)
+
+#define VE_DEC_H265_BITS_END_ADDR (VE_ENGINE_DEC_H265 + 0x4c)
+
+#define VE_DEC_H265_BITS_END_ADDR_BASE(a) ((a) >> 8)
+
+#define VE_DEC_H265_SDRT_CTRL (VE_ENGINE_DEC_H265 + 0x50)
+#define VE_DEC_H265_SDRT_LUMA_ADDR (VE_ENGINE_DEC_H265 + 0x54)
+#define VE_DEC_H265_SDRT_CHROMA_ADDR (VE_ENGINE_DEC_H265 + 0x58)
+
+#define VE_DEC_H265_OUTPUT_FRAME_IDX (VE_ENGINE_DEC_H265 + 0x5c)
+
+#define VE_DEC_H265_NEIGHBOR_INFO_ADDR (VE_ENGINE_DEC_H265 + 0x60)
+
+#define VE_DEC_H265_NEIGHBOR_INFO_ADDR_BASE(a) ((a) >> 8)
+
+#define VE_DEC_H265_ENTRY_POINT_OFFSET_ADDR (VE_ENGINE_DEC_H265 + 0x64)
+#define VE_DEC_H265_TILE_START_CTB (VE_ENGINE_DEC_H265 + 0x68)
+#define VE_DEC_H265_TILE_END_CTB (VE_ENGINE_DEC_H265 + 0x6c)
+
+#define VE_DEC_H265_LOW_ADDR (VE_ENGINE_DEC_H265 + 0x80)
+
+#define VE_DEC_H265_LOW_ADDR_PRIMARY_CHROMA(a) \
+ (((a) << 24) & GENMASK(31, 24))
+#define VE_DEC_H265_LOW_ADDR_SECONDARY_CHROMA(a) \
+ (((a) << 16) & GENMASK(23, 16))
+#define VE_DEC_H265_LOW_ADDR_ENTRY_POINTS_BUF(a) \
+ (((a) << 0) & GENMASK(7, 0))
+
+#define VE_DEC_H265_SRAM_OFFSET (VE_ENGINE_DEC_H265 + 0xe0)
+
+#define VE_DEC_H265_SRAM_OFFSET_PRED_WEIGHT_LUMA_L0 0x00
+#define VE_DEC_H265_SRAM_OFFSET_PRED_WEIGHT_CHROMA_L0 0x20
+#define VE_DEC_H265_SRAM_OFFSET_PRED_WEIGHT_LUMA_L1 0x60
+#define VE_DEC_H265_SRAM_OFFSET_PRED_WEIGHT_CHROMA_L1 0x80
+#define VE_DEC_H265_SRAM_OFFSET_FRAME_INFO 0x400
+#define VE_DEC_H265_SRAM_OFFSET_FRAME_INFO_UNIT 0x20
+#define VE_DEC_H265_SRAM_OFFSET_SCALING_LISTS 0x800
+#define VE_DEC_H265_SRAM_OFFSET_REF_PIC_LIST0 0xc00
+#define VE_DEC_H265_SRAM_OFFSET_REF_PIC_LIST1 0xc10
+
+#define VE_DEC_H265_SRAM_DATA (VE_ENGINE_DEC_H265 + 0xe4)
+
+#define VE_DEC_H265_SRAM_DATA_ADDR_BASE(a) ((a) >> 8)
+#define VE_DEC_H265_SRAM_REF_PIC_LIST_LT_REF BIT(7)
+
/* FIXME: Legacy below. */

#define VBV_SIZE (1024 * 1024)
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_video.c b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
index f104ac0f9ec3..74c29720aa32 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_video.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
@@ -41,6 +41,11 @@ static struct cedrus_format cedrus_formats[] = {
.pixelformat = V4L2_PIX_FMT_H264_SLICE,
.directions = CEDRUS_DECODE_SRC,
},
+ {
+ .pixelformat = V4L2_PIX_FMT_HEVC_SLICE,
+ .directions = CEDRUS_DECODE_SRC,
+ .capabilities = CEDRUS_CAPABILITY_H265_DEC,
+ },
{
.pixelformat = V4L2_PIX_FMT_SUNXI_TILED_NV12,
.directions = CEDRUS_DECODE_DST,
@@ -105,6 +110,7 @@ static void cedrus_prepare_format(struct v4l2_pix_format *pix_fmt)
switch (pix_fmt->pixelformat) {
case V4L2_PIX_FMT_MPEG2_SLICE:
case V4L2_PIX_FMT_H264_SLICE:
+ case V4L2_PIX_FMT_HEVC_SLICE:
/* Zero bytes per line for encoded source. */
bytesperline = 0;

@@ -455,6 +461,10 @@ static int cedrus_start_streaming(struct vb2_queue *vq, unsigned int count)
ctx->current_codec = CEDRUS_CODEC_H264;
break;

+ case V4L2_PIX_FMT_HEVC_SLICE:
+ ctx->current_codec = CEDRUS_CODEC_H265;
+ break;
+
default:
return -EINVAL;
}
--
2.19.1


2018-11-24 08:34:52

by Paul Kocialkowski

[permalink] [raw]
Subject: [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

This introduces the required definitions for HEVC decoding support with
stateless VPUs. The controls associated with the HEVC slice format provide
the metadata required for decoding slices extracted from the bitstream.

This interface comes with the following limitations:
* No custom quantization matrices (scaling lists);
* Support for a single temporal layer only;
* No slice entry point offsets support;
* No conformance window support;
* No VUI parameters support;
* No support for SPS extensions: range, multilayer, 3d, scc, 4 bits;
* No support for PPS extensions: range, multilayer, 3d, scc, 4 bits.

Signed-off-by: Paul Kocialkowski <[email protected]>
---
Documentation/media/uapi/v4l/biblio.rst | 9 +
.../media/uapi/v4l/extended-controls.rst | 417 ++++++++++++++++++
.../media/uapi/v4l/pixfmt-compressed.rst | 15 +
.../media/uapi/v4l/vidioc-queryctrl.rst | 18 +
.../media/videodev2.h.rst.exceptions | 3 +
drivers/media/v4l2-core/v4l2-ctrls.c | 26 ++
drivers/media/v4l2-core/v4l2-ioctl.c | 1 +
include/media/v4l2-ctrls.h | 6 +
include/uapi/linux/v4l2-controls.h | 155 +++++++
include/uapi/linux/v4l2-controls.h.rej | 187 --------
include/uapi/linux/videodev2.h | 7 +
11 files changed, 657 insertions(+), 187 deletions(-)
delete mode 100644 include/uapi/linux/v4l2-controls.h.rej
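
The SPS control carries raw syntax elements rather than derived values, so anything like the CTB size has to be computed by the consumer. Below is a minimal sketch of that derivation, following the CtbLog2SizeY definition in the H.265 specification. The struct is a hypothetical local subset of struct v4l2_ctrl_hevc_sps and the helper names are made up for illustration. It is not code from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of struct v4l2_ctrl_hevc_sps, for illustration only. */
struct sps_subset {
	uint16_t pic_width_in_luma_samples;
	uint16_t pic_height_in_luma_samples;
	uint8_t log2_min_luma_coding_block_size_minus3;
	uint8_t log2_diff_max_min_luma_coding_block_size;
};

/* CtbLog2SizeY = MinCbLog2SizeY + log2_diff_max_min_luma_coding_block_size. */
static inline unsigned int sps_ctb_log2_size(const struct sps_subset *sps)
{
	return sps->log2_min_luma_coding_block_size_minus3 + 3 +
	       sps->log2_diff_max_min_luma_coding_block_size;
}

/* Round a luma dimension up to a whole number of CTBs. */
static inline unsigned int sps_size_in_ctbs(unsigned int luma_samples,
					    unsigned int ctb_log2_size)
{
	unsigned int ctb_size;

	/* Guard the shift below against nonsensical input. */
	assert(ctb_log2_size < 16);

	ctb_size = 1u << ctb_log2_size;

	return (luma_samples + ctb_size - 1) >> ctb_log2_size;
}
```

With a 1920x1080 picture and 64x64 CTBs, this yields 30 CTB columns and 17 CTB rows, the last row being only partially covered by picture samples.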

diff --git a/Documentation/media/uapi/v4l/biblio.rst b/Documentation/media/uapi/v4l/biblio.rst
index 73aeb7ce47d2..59a98feca3a1 100644
--- a/Documentation/media/uapi/v4l/biblio.rst
+++ b/Documentation/media/uapi/v4l/biblio.rst
@@ -124,6 +124,15 @@ ITU H.264

:author: International Telecommunication Union (http://www.itu.ch)

+.. _hevc:
+
+ITU H.265/HEVC
+==============
+
+:title: ITU-T Rec. H.265 | ISO/IEC 23008-2 "High Efficiency Video Coding"
+
+:author: International Telecommunication Union (http://www.itu.ch), International Organisation for Standardisation (http://www.iso.ch)
+
.. _jfif:

JFIF
diff --git a/Documentation/media/uapi/v4l/extended-controls.rst b/Documentation/media/uapi/v4l/extended-controls.rst
index 87c0d151577f..906ff4f32634 100644
--- a/Documentation/media/uapi/v4l/extended-controls.rst
+++ b/Documentation/media/uapi/v4l/extended-controls.rst
@@ -2038,6 +2038,423 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
- ``flags``
-

+.. _v4l2-mpeg-hevc:
+
+``V4L2_CID_MPEG_VIDEO_HEVC_SPS (struct)``
+ Specifies the Sequence Parameter Set fields (as extracted from the
+ bitstream) for the associated HEVC slice data.
+ These bitstream parameters are defined according to :ref:`hevc`.
+ Refer to the specification for the documentation of these fields.
+
+.. c:type:: v4l2_ctrl_hevc_sps
+
+.. cssclass:: longtable
+
+.. flat-table:: struct v4l2_ctrl_hevc_sps
+ :header-rows: 0
+ :stub-columns: 0
+ :widths: 1 1 2
+
+ * - __u8
+ - ``chroma_format_idc``
+ -
+ * - __u8
+ - ``separate_colour_plane_flag``
+ -
+ * - __u16
+ - ``pic_width_in_luma_samples``
+ -
+ * - __u16
+ - ``pic_height_in_luma_samples``
+ -
+ * - __u8
+ - ``bit_depth_luma_minus8``
+ -
+ * - __u8
+ - ``bit_depth_chroma_minus8``
+ -
+ * - __u8
+ - ``log2_max_pic_order_cnt_lsb_minus4``
+ -
+ * - __u8
+ - ``sps_max_dec_pic_buffering_minus1``
+ -
+ * - __u8
+ - ``sps_max_num_reorder_pics``
+ -
+ * - __u8
+ - ``sps_max_latency_increase_plus1``
+ -
+ * - __u8
+ - ``log2_min_luma_coding_block_size_minus3``
+ -
+ * - __u8
+ - ``log2_diff_max_min_luma_coding_block_size``
+ -
+ * - __u8
+ - ``log2_min_luma_transform_block_size_minus2``
+ -
+ * - __u8
+ - ``log2_diff_max_min_luma_transform_block_size``
+ -
+ * - __u8
+ - ``max_transform_hierarchy_depth_inter``
+ -
+ * - __u8
+ - ``max_transform_hierarchy_depth_intra``
+ -
+ * - __u8
+ - ``scaling_list_enabled_flag``
+ -
+ * - __u8
+ - ``amp_enabled_flag``
+ -
+ * - __u8
+ - ``sample_adaptive_offset_enabled_flag``
+ -
+ * - __u8
+ - ``pcm_enabled_flag``
+ -
+ * - __u8
+ - ``pcm_sample_bit_depth_luma_minus1``
+ -
+ * - __u8
+ - ``pcm_sample_bit_depth_chroma_minus1``
+ -
+ * - __u8
+ - ``log2_min_pcm_luma_coding_block_size_minus3``
+ -
+ * - __u8
+ - ``log2_diff_max_min_pcm_luma_coding_block_size``
+ -
+ * - __u8
+ - ``pcm_loop_filter_disabled_flag``
+ -
+ * - __u8
+ - ``num_short_term_ref_pic_sets``
+ -
+ * - __u8
+ - ``long_term_ref_pics_present_flag``
+ -
+ * - __u8
+ - ``num_long_term_ref_pics_sps``
+ -
+ * - __u8
+ - ``sps_temporal_mvp_enabled_flag``
+ -
+ * - __u8
+ - ``strong_intra_smoothing_enabled_flag``
+ -
+
+``V4L2_CID_MPEG_VIDEO_HEVC_PPS (struct)``
+ Specifies the Picture Parameter Set fields (as extracted from the
+ bitstream) for the associated HEVC slice data.
+ These bitstream parameters are defined according to :ref:`hevc`.
+ Refer to the specification for the documentation of these fields.
+
+.. c:type:: v4l2_ctrl_hevc_pps
+
+.. cssclass:: longtable
+
+.. flat-table:: struct v4l2_ctrl_hevc_pps
+ :header-rows: 0
+ :stub-columns: 0
+ :widths: 1 1 2
+
+ * - __u8
+ - ``dependent_slice_segment_flag``
+ -
+ * - __u8
+ - ``output_flag_present_flag``
+ -
+ * - __u8
+ - ``num_extra_slice_header_bits``
+ -
+ * - __u8
+ - ``sign_data_hiding_enabled_flag``
+ -
+ * - __u8
+ - ``cabac_init_present_flag``
+ -
+ * - __s8
+ - ``init_qp_minus26``
+ -
+ * - __u8
+ - ``constrained_intra_pred_flag``
+ -
+ * - __u8
+ - ``transform_skip_enabled_flag``
+ -
+ * - __u8
+ - ``cu_qp_delta_enabled_flag``
+ -
+ * - __u8
+ - ``diff_cu_qp_delta_depth``
+ -
+ * - __s8
+ - ``pps_cb_qp_offset``
+ -
+ * - __s8
+ - ``pps_cr_qp_offset``
+ -
+ * - __u8
+ - ``pps_slice_chroma_qp_offsets_present_flag``
+ -
+ * - __u8
+ - ``weighted_pred_flag``
+ -
+ * - __u8
+ - ``weighted_bipred_flag``
+ -
+ * - __u8
+ - ``transquant_bypass_enabled_flag``
+ -
+ * - __u8
+ - ``tiles_enabled_flag``
+ -
+ * - __u8
+ - ``entropy_coding_sync_enabled_flag``
+ -
+ * - __u8
+ - ``num_tile_columns_minus1``
+ -
+ * - __u8
+ - ``num_tile_rows_minus1``
+ -
+ * - __u8
+ - ``column_width_minus1[20]``
+ -
+ * - __u8
+ - ``row_height_minus1[22]``
+ -
+ * - __u8
+ - ``loop_filter_across_tiles_enabled_flag``
+ -
+ * - __u8
+ - ``pps_loop_filter_across_slices_enabled_flag``
+ -
+ * - __u8
+ - ``deblocking_filter_override_enabled_flag``
+ -
+ * - __u8
+ - ``pps_disable_deblocking_filter_flag``
+ -
+ * - __s8
+ - ``pps_beta_offset_div2``
+ -
+ * - __s8
+ - ``pps_tc_offset_div2``
+ -
+ * - __u8
+ - ``lists_modification_present_flag``
+ -
+ * - __u8
+ - ``log2_parallel_merge_level_minus2``
+ -
+ * - __u8
+ - ``slice_segment_header_extension_present_flag``
+ -
+
+``V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS (struct)``
+ Specifies various slice-specific parameters, especially from the NAL unit
+ header, general slice segment header and weighted prediction parameter
+ parts of the bitstream.
+ These bitstream parameters are defined according to :ref:`hevc`.
+ Refer to the specification for the documentation of these fields.
+
+.. c:type:: v4l2_ctrl_hevc_slice_params
+
+.. cssclass:: longtable
+
+.. flat-table:: struct v4l2_ctrl_hevc_slice_params
+ :header-rows: 0
+ :stub-columns: 0
+ :widths: 1 1 2
+
+ * - __u32
+ - ``bit_size``
+ - Size (in bits) of the current slice data.
+ * - __u32
+ - ``data_bit_offset``
+ - Offset (in bits) to the video data in the current slice data.
+ * - __u8
+ - ``nal_unit_type``
+ -
+ * - __u8
+ - ``nuh_temporal_id_plus1``
+ -
+ * - __u8
+ - ``slice_type``
+ -
+ (V4L2_HEVC_SLICE_TYPE_I, V4L2_HEVC_SLICE_TYPE_P or
+ V4L2_HEVC_SLICE_TYPE_B).
+ * - __u8
+ - ``colour_plane_id``
+ -
+ * - __u16
+ - ``slice_pic_order_cnt``
+ -
+ * - __u8
+ - ``slice_sao_luma_flag``
+ -
+ * - __u8
+ - ``slice_sao_chroma_flag``
+ -
+ * - __u8
+ - ``slice_temporal_mvp_enabled_flag``
+ -
+ * - __u8
+ - ``num_ref_idx_l0_active_minus1``
+ -
+ * - __u8
+ - ``num_ref_idx_l1_active_minus1``
+ -
+ * - __u8
+ - ``mvd_l1_zero_flag``
+ -
+ * - __u8
+ - ``cabac_init_flag``
+ -
+ * - __u8
+ - ``collocated_from_l0_flag``
+ -
+ * - __u8
+ - ``collocated_ref_idx``
+ -
+ * - __u8
+ - ``five_minus_max_num_merge_cand``
+ -
+ * - __u8
+ - ``use_integer_mv_flag``
+ -
+ * - __s8
+ - ``slice_qp_delta``
+ -
+ * - __s8
+ - ``slice_cb_qp_offset``
+ -
+ * - __s8
+ - ``slice_cr_qp_offset``
+ -
+ * - __s8
+ - ``slice_act_y_qp_offset``
+ -
+ * - __s8
+ - ``slice_act_cb_qp_offset``
+ -
+ * - __s8
+ - ``slice_act_cr_qp_offset``
+ -
+ * - __u8
+ - ``slice_deblocking_filter_disabled_flag``
+ -
+ * - __s8
+ - ``slice_beta_offset_div2``
+ -
+ * - __s8
+ - ``slice_tc_offset_div2``
+ -
+ * - __u8
+ - ``slice_loop_filter_across_slices_enabled_flag``
+ -
+ * - __u8
+ - ``pic_struct``
+ -
+ * - struct :c:type:`v4l2_hevc_dpb_entry`
+ - ``dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX]``
+ - The decoded picture buffer, for meta-data about reference frames.
+ * - __u8
+ - ``num_active_dpb_entries``
+ - The number of entries in ``dpb``.
+ * - __u8
+ - ``ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX]``
+ - The list of L0 reference elements as indices in the DPB.
+ * - __u8
+ - ``ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX]``
+ - The list of L1 reference elements as indices in the DPB.
+ * - __u8
+ - ``num_rps_poc_st_curr_before``
+ - The number of reference pictures in the short-term set that come before
+ the current frame.
+ * - __u8
+ - ``num_rps_poc_st_curr_after``
+ - The number of reference pictures in the short-term set that come after
+ the current frame.
+ * - __u8
+ - ``num_rps_poc_lt_curr``
+ - The number of reference pictures in the long-term set.
+ * - struct :c:type:`v4l2_hevc_pred_weight_table`
+ - ``pred_weight_table``
+ - The prediction weight coefficients for inter-picture prediction.
+
+.. c:type:: v4l2_hevc_dpb_entry
+
+.. cssclass:: longtable
+
+.. flat-table:: struct v4l2_hevc_dpb_entry
+ :header-rows: 0
+ :stub-columns: 0
+ :widths: 1 1 2
+
+ * - __u32
+ - ``buffer_tag``
+ - The V4L2 buffer tag that matches the associated reference picture.
+ * - __u8
+ - ``rps``
+ - The reference set for the reference frame
+ (V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE,
+ V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER or
+ V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR)
+ * - __u8
+ - ``field_pic``
+ - Whether the reference is a field picture or a frame.
+ * - __u16
+ - ``pic_order_cnt[2]``
+ - The picture order count of the reference. Only the first element of the
+ array is used for frame pictures, while the first element identifies the
+ top field and the second the bottom field in field-coded pictures.
+
+.. c:type:: v4l2_hevc_pred_weight_table
+
+.. cssclass:: longtable
+
+.. flat-table:: struct v4l2_hevc_pred_weight_table
+ :header-rows: 0
+ :stub-columns: 0
+ :widths: 1 1 2
+
+ * - __u8
+ - ``luma_log2_weight_denom``
+ -
+ * - __s8
+ - ``delta_chroma_log2_weight_denom``
+ -
+ * - __s8
+ - ``delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX]``
+ -
+ * - __s8
+ - ``luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX]``
+ -
+ * - __s8
+ - ``delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2]``
+ -
+ * - __s8
+ - ``chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2]``
+ -
+ * - __s8
+ - ``delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX]``
+ -
+ * - __s8
+ - ``luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX]``
+ -
+ * - __s8
+ - ``delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2]``
+ -
+ * - __s8
+ - ``chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2]``
+ -
+
+
MFC 5.1 MPEG Controls
---------------------

diff --git a/Documentation/media/uapi/v4l/pixfmt-compressed.rst b/Documentation/media/uapi/v4l/pixfmt-compressed.rst
index f15fc1c8d479..572a43bfe9c9 100644
--- a/Documentation/media/uapi/v4l/pixfmt-compressed.rst
+++ b/Documentation/media/uapi/v4l/pixfmt-compressed.rst
@@ -131,6 +131,21 @@ Compressed Formats
- ``V4L2_PIX_FMT_HEVC``
- 'HEVC'
- HEVC/H.265 video elementary stream.
+ * .. _V4L2-PIX-FMT-HEVC-SLICE:
+
+ - ``V4L2_PIX_FMT_HEVC_SLICE``
+ - 'S265'
+ - HEVC parsed slice data, as extracted from the HEVC bitstream.
+ This format is adapted for stateless video decoders that implement an
+ HEVC pipeline (using the :ref:`codec` and :ref:`media-request-api`).
+ Metadata associated with the frame to decode is required to be passed
+ through the following controls:
+ * ``V4L2_CID_MPEG_VIDEO_HEVC_SPS``
+ * ``V4L2_CID_MPEG_VIDEO_HEVC_PPS``
+ * ``V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS``
+ See the :ref:`associated Codec Control IDs <v4l2-mpeg-hevc>`.
+ Buffers associated with this pixel format must contain the appropriate
+ number of coding tree units to decode a full corresponding frame.
* .. _V4L2-PIX-FMT-FWHT:

- ``V4L2_PIX_FMT_FWHT``
diff --git a/Documentation/media/uapi/v4l/vidioc-queryctrl.rst b/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
index 38a9c988124c..8e0cc836058d 100644
--- a/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
+++ b/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
@@ -466,6 +466,24 @@ See also the examples in :ref:`control`.
- n/a
- A struct :c:type:`v4l2_ctrl_h264_decode_param`, containing H264
decode parameters for stateless video decoders.
+ * - ``V4L2_CTRL_TYPE_HEVC_SPS``
+ - n/a
+ - n/a
+ - n/a
+ - A struct :c:type:`v4l2_ctrl_hevc_sps`, containing the HEVC Sequence
+ Parameter Set for stateless video decoders.
+ * - ``V4L2_CTRL_TYPE_HEVC_PPS``
+ - n/a
+ - n/a
+ - n/a
+ - A struct :c:type:`v4l2_ctrl_hevc_pps`, containing the HEVC Picture
+ Parameter Set for stateless video decoders.
+ * - ``V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS``
+ - n/a
+ - n/a
+ - n/a
+ - A struct :c:type:`v4l2_ctrl_hevc_slice_params`, containing HEVC
+ slice parameters for stateless video decoders.

.. tabularcolumns:: |p{6.6cm}|p{2.2cm}|p{8.7cm}|

diff --git a/Documentation/media/videodev2.h.rst.exceptions b/Documentation/media/videodev2.h.rst.exceptions
index 99f1bd2bc44c..27978d8b18f5 100644
--- a/Documentation/media/videodev2.h.rst.exceptions
+++ b/Documentation/media/videodev2.h.rst.exceptions
@@ -138,6 +138,9 @@ replace symbol V4L2_CTRL_TYPE_H264_PPS :c:type:`v4l2_ctrl_type`
replace symbol V4L2_CTRL_TYPE_H264_SCALING_MATRIX :c:type:`v4l2_ctrl_type`
replace symbol V4L2_CTRL_TYPE_H264_SLICE_PARAMS :c:type:`v4l2_ctrl_type`
replace symbol V4L2_CTRL_TYPE_H264_DECODE_PARAMS :c:type:`v4l2_ctrl_type`
+replace symbol V4L2_CTRL_TYPE_HEVC_SPS :c:type:`v4l2_ctrl_type`
+replace symbol V4L2_CTRL_TYPE_HEVC_PPS :c:type:`v4l2_ctrl_type`
+replace symbol V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS :c:type:`v4l2_ctrl_type`

# V4L2 capability defines
replace define V4L2_CAP_VIDEO_CAPTURE device-capabilities
diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c
index e96c453208e8..9af17815ecc3 100644
--- a/drivers/media/v4l2-core/v4l2-ctrls.c
+++ b/drivers/media/v4l2-core/v4l2-ctrls.c
@@ -913,6 +913,9 @@ const char *v4l2_ctrl_get_name(u32 id)
case V4L2_CID_MPEG_VIDEO_HEVC_SIZE_OF_LENGTH_FIELD: return "HEVC Size of Length Field";
case V4L2_CID_MPEG_VIDEO_REF_NUMBER_FOR_PFRAMES: return "Reference Frames for a P-Frame";
case V4L2_CID_MPEG_VIDEO_PREPEND_SPSPPS_TO_IDR: return "Prepend SPS and PPS to IDR";
+ case V4L2_CID_MPEG_VIDEO_HEVC_SPS: return "HEVC Sequence Parameter Set";
+ case V4L2_CID_MPEG_VIDEO_HEVC_PPS: return "HEVC Picture Parameter Set";
+ case V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS: return "HEVC Slice Parameters";

/* CAMERA controls */
/* Keep the order of the 'case's the same as in v4l2-controls.h! */
@@ -1320,6 +1323,15 @@ void v4l2_ctrl_fill(u32 id, const char **name, enum v4l2_ctrl_type *type,
case V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS:
*type = V4L2_CTRL_TYPE_H264_DECODE_PARAMS;
break;
+ case V4L2_CID_MPEG_VIDEO_HEVC_SPS:
+ *type = V4L2_CTRL_TYPE_HEVC_SPS;
+ break;
+ case V4L2_CID_MPEG_VIDEO_HEVC_PPS:
+ *type = V4L2_CTRL_TYPE_HEVC_PPS;
+ break;
+ case V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS:
+ *type = V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS;
+ break;
default:
*type = V4L2_CTRL_TYPE_INTEGER;
break;
@@ -1692,6 +1704,11 @@ static int std_validate(const struct v4l2_ctrl *ctrl, u32 idx,
case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
return 0;

+ case V4L2_CTRL_TYPE_HEVC_SPS:
+ case V4L2_CTRL_TYPE_HEVC_PPS:
+ case V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS:
+ return 0;
+
default:
return -EINVAL;
}
@@ -2287,6 +2304,15 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl,
case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
elem_size = sizeof(struct v4l2_ctrl_h264_decode_param);
break;
+ case V4L2_CTRL_TYPE_HEVC_SPS:
+ elem_size = sizeof(struct v4l2_ctrl_hevc_sps);
+ break;
+ case V4L2_CTRL_TYPE_HEVC_PPS:
+ elem_size = sizeof(struct v4l2_ctrl_hevc_pps);
+ break;
+ case V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS:
+ elem_size = sizeof(struct v4l2_ctrl_hevc_slice_params);
+ break;
default:
if (type < V4L2_CTRL_COMPOUND_TYPES)
elem_size = sizeof(s32);
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index aa63f1794272..7bec91c6effe 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -1321,6 +1321,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
case V4L2_PIX_FMT_VP8: descr = "VP8"; break;
case V4L2_PIX_FMT_VP9: descr = "VP9"; break;
case V4L2_PIX_FMT_HEVC: descr = "HEVC"; break; /* aka H.265 */
+ case V4L2_PIX_FMT_HEVC_SLICE: descr = "HEVC Parsed Slice Data"; break;
case V4L2_PIX_FMT_FWHT: descr = "FWHT"; break; /* used in vicodec */
case V4L2_PIX_FMT_CPIA1: descr = "GSPCA CPiA YUV"; break;
case V4L2_PIX_FMT_WNVA: descr = "WNVA"; break;
diff --git a/include/media/v4l2-ctrls.h b/include/media/v4l2-ctrls.h
index b4ca95710d2d..11664c5c3706 100644
--- a/include/media/v4l2-ctrls.h
+++ b/include/media/v4l2-ctrls.h
@@ -48,6 +48,9 @@ struct poll_table_struct;
* @p_h264_scal_mtrx: Pointer to a struct v4l2_ctrl_h264_scaling_matrix.
* @p_h264_slice_param: Pointer to a struct v4l2_ctrl_h264_slice_param.
* @p_h264_decode_param: Pointer to a struct v4l2_ctrl_h264_decode_param.
+ * @p_hevc_sps: Pointer to an HEVC sequence parameter set structure.
+ * @p_hevc_pps: Pointer to an HEVC picture parameter set structure.
+ * @p_hevc_slice_params: Pointer to an HEVC slice parameters structure.
* @p: Pointer to a compound value.
*/
union v4l2_ctrl_ptr {
@@ -64,6 +67,9 @@ union v4l2_ctrl_ptr {
struct v4l2_ctrl_h264_scaling_matrix *p_h264_scal_mtrx;
struct v4l2_ctrl_h264_slice_param *p_h264_slice_param;
struct v4l2_ctrl_h264_decode_param *p_h264_decode_param;
+ struct v4l2_ctrl_hevc_sps *p_hevc_sps;
+ struct v4l2_ctrl_hevc_pps *p_hevc_pps;
+ struct v4l2_ctrl_hevc_slice_params *p_hevc_slice_params;
void *p;
};

diff --git a/include/uapi/linux/v4l2-controls.h b/include/uapi/linux/v4l2-controls.h
index 628c0cdb51d9..5bbf63b2dad1 100644
--- a/include/uapi/linux/v4l2-controls.h
+++ b/include/uapi/linux/v4l2-controls.h
@@ -709,6 +709,9 @@ enum v4l2_cid_mpeg_video_hevc_size_of_length_field {
#define V4L2_CID_MPEG_VIDEO_HEVC_HIER_CODING_L6_BR (V4L2_CID_MPEG_BASE + 642)
#define V4L2_CID_MPEG_VIDEO_REF_NUMBER_FOR_PFRAMES (V4L2_CID_MPEG_BASE + 643)
#define V4L2_CID_MPEG_VIDEO_PREPEND_SPSPPS_TO_IDR (V4L2_CID_MPEG_BASE + 644)
+#define V4L2_CID_MPEG_VIDEO_HEVC_SPS (V4L2_CID_MPEG_BASE + 645)
+#define V4L2_CID_MPEG_VIDEO_HEVC_PPS (V4L2_CID_MPEG_BASE + 646)
+#define V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS (V4L2_CID_MPEG_BASE + 647)

/* MPEG-class control IDs specific to the CX2341x driver as defined by V4L2 */
#define V4L2_CID_MPEG_CX2341X_BASE (V4L2_CTRL_CLASS_MPEG | 0x1000)
@@ -1324,4 +1327,156 @@ struct v4l2_ctrl_h264_decode_param {
struct v4l2_h264_dpb_entry dpb[16];
};

+#define V4L2_HEVC_SLICE_TYPE_B 0
+#define V4L2_HEVC_SLICE_TYPE_P 1
+#define V4L2_HEVC_SLICE_TYPE_I 2
+
+struct v4l2_ctrl_hevc_sps {
+ /* ISO/IEC 23008-2, ITU-T Rec. H.265: Sequence parameter set */
+ __u8 chroma_format_idc;
+ __u8 separate_colour_plane_flag;
+ __u16 pic_width_in_luma_samples;
+ __u16 pic_height_in_luma_samples;
+ __u8 bit_depth_luma_minus8;
+ __u8 bit_depth_chroma_minus8;
+ __u8 log2_max_pic_order_cnt_lsb_minus4;
+ __u8 sps_max_dec_pic_buffering_minus1;
+ __u8 sps_max_num_reorder_pics;
+ __u8 sps_max_latency_increase_plus1;
+ __u8 log2_min_luma_coding_block_size_minus3;
+ __u8 log2_diff_max_min_luma_coding_block_size;
+ __u8 log2_min_luma_transform_block_size_minus2;
+ __u8 log2_diff_max_min_luma_transform_block_size;
+ __u8 max_transform_hierarchy_depth_inter;
+ __u8 max_transform_hierarchy_depth_intra;
+ __u8 scaling_list_enabled_flag;
+ __u8 amp_enabled_flag;
+ __u8 sample_adaptive_offset_enabled_flag;
+ __u8 pcm_enabled_flag;
+ __u8 pcm_sample_bit_depth_luma_minus1;
+ __u8 pcm_sample_bit_depth_chroma_minus1;
+ __u8 log2_min_pcm_luma_coding_block_size_minus3;
+ __u8 log2_diff_max_min_pcm_luma_coding_block_size;
+ __u8 pcm_loop_filter_disabled_flag;
+ __u8 num_short_term_ref_pic_sets;
+ __u8 long_term_ref_pics_present_flag;
+ __u8 num_long_term_ref_pics_sps;
+ __u8 sps_temporal_mvp_enabled_flag;
+ __u8 strong_intra_smoothing_enabled_flag;
+};
+
+struct v4l2_ctrl_hevc_pps {
+ /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture parameter set */
+ __u8 dependent_slice_segment_flag;
+ __u8 output_flag_present_flag;
+ __u8 num_extra_slice_header_bits;
+ __u8 sign_data_hiding_enabled_flag;
+ __u8 cabac_init_present_flag;
+ __s8 init_qp_minus26;
+ __u8 constrained_intra_pred_flag;
+ __u8 transform_skip_enabled_flag;
+ __u8 cu_qp_delta_enabled_flag;
+ __u8 diff_cu_qp_delta_depth;
+ __s8 pps_cb_qp_offset;
+ __s8 pps_cr_qp_offset;
+ __u8 pps_slice_chroma_qp_offsets_present_flag;
+ __u8 weighted_pred_flag;
+ __u8 weighted_bipred_flag;
+ __u8 transquant_bypass_enabled_flag;
+ __u8 tiles_enabled_flag;
+ __u8 entropy_coding_sync_enabled_flag;
+ __u8 num_tile_columns_minus1;
+ __u8 num_tile_rows_minus1;
+ __u8 column_width_minus1[20];
+ __u8 row_height_minus1[22];
+ __u8 loop_filter_across_tiles_enabled_flag;
+ __u8 pps_loop_filter_across_slices_enabled_flag;
+ __u8 deblocking_filter_override_enabled_flag;
+ __u8 pps_disable_deblocking_filter_flag;
+ __s8 pps_beta_offset_div2;
+ __s8 pps_tc_offset_div2;
+ __u8 lists_modification_present_flag;
+ __u8 log2_parallel_merge_level_minus2;
+ __u8 slice_segment_header_extension_present_flag;
+};
+
+#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
+#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER 0x02
+#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR 0x03
+
+#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX 16
+
+struct v4l2_hevc_dpb_entry {
+ __u32 buffer_tag;
+ __u8 rps;
+ __u8 field_pic;
+ __u16 pic_order_cnt[2];
+};
+
+struct v4l2_hevc_pred_weight_table {
+ __u8 luma_log2_weight_denom;
+ __s8 delta_chroma_log2_weight_denom;
+
+ __s8 delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+ __s8 luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+ __s8 delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
+ __s8 chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
+
+ __s8 delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+ __s8 luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+ __s8 delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
+ __s8 chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
+};
+
+struct v4l2_ctrl_hevc_slice_params {
+ __u32 bit_size;
+ __u32 data_bit_offset;
+
+ /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
+ __u8 nal_unit_type;
+ __u8 nuh_temporal_id_plus1;
+
+ /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
+ __u8 slice_type;
+ __u8 colour_plane_id;
+ __u16 slice_pic_order_cnt;
+ __u8 slice_sao_luma_flag;
+ __u8 slice_sao_chroma_flag;
+ __u8 slice_temporal_mvp_enabled_flag;
+ __u8 num_ref_idx_l0_active_minus1;
+ __u8 num_ref_idx_l1_active_minus1;
+ __u8 mvd_l1_zero_flag;
+ __u8 cabac_init_flag;
+ __u8 collocated_from_l0_flag;
+ __u8 collocated_ref_idx;
+ __u8 five_minus_max_num_merge_cand;
+ __u8 use_integer_mv_flag;
+ __s8 slice_qp_delta;
+ __s8 slice_cb_qp_offset;
+ __s8 slice_cr_qp_offset;
+ __s8 slice_act_y_qp_offset;
+ __s8 slice_act_cb_qp_offset;
+ __s8 slice_act_cr_qp_offset;
+ __u8 slice_deblocking_filter_disabled_flag;
+ __s8 slice_beta_offset_div2;
+ __s8 slice_tc_offset_div2;
+ __u8 slice_loop_filter_across_slices_enabled_flag;
+
+ /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
+ __u8 pic_struct;
+
+ /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
+ struct v4l2_hevc_dpb_entry dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+ __u8 num_active_dpb_entries;
+ __u8 ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+ __u8 ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+
+ __u8 num_rps_poc_st_curr_before;
+ __u8 num_rps_poc_st_curr_after;
+ __u8 num_rps_poc_lt_curr;
+
+ /* ISO/IEC 23008-2, ITU-T Rec. H.265: Weighted prediction parameter */
+ struct v4l2_hevc_pred_weight_table pred_weight_table;
+};
+
#endif
diff --git a/include/uapi/linux/v4l2-controls.h.rej b/include/uapi/linux/v4l2-controls.h.rej
deleted file mode 100644
index 1fbb7bf8daa7..000000000000
--- a/include/uapi/linux/v4l2-controls.h.rej
+++ /dev/null
@@ -1,187 +0,0 @@
---- include/uapi/linux/v4l2-controls.h
-+++ include/uapi/linux/v4l2-controls.h
-@@ -50,6 +50,8 @@
- #ifndef __LINUX_V4L2_CONTROLS_H
- #define __LINUX_V4L2_CONTROLS_H
-
-+#include <linux/types.h>
-+
- /* Control classes */
- #define V4L2_CTRL_CLASS_USER 0x00980000 /* Old-style 'user' controls */
- #define V4L2_CTRL_CLASS_MPEG 0x00990000 /* MPEG-compression controls */
-@@ -534,6 +536,12 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type {
- };
- #define V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER (V4L2_CID_MPEG_BASE+381)
- #define V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER_QP (V4L2_CID_MPEG_BASE+382)
-+#define V4L2_CID_MPEG_VIDEO_H264_SPS (V4L2_CID_MPEG_BASE+383)
-+#define V4L2_CID_MPEG_VIDEO_H264_PPS (V4L2_CID_MPEG_BASE+384)
-+#define V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX (V4L2_CID_MPEG_BASE+385)
-+#define V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS (V4L2_CID_MPEG_BASE+386)
-+#define V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS (V4L2_CID_MPEG_BASE+387)
-+
- #define V4L2_CID_MPEG_VIDEO_MPEG4_I_FRAME_QP (V4L2_CID_MPEG_BASE+400)
- #define V4L2_CID_MPEG_VIDEO_MPEG4_P_FRAME_QP (V4L2_CID_MPEG_BASE+401)
- #define V4L2_CID_MPEG_VIDEO_MPEG4_B_FRAME_QP (V4L2_CID_MPEG_BASE+402)
-@@ -1156,4 +1164,162 @@ struct v4l2_ctrl_mpeg2_quantization {
- __u8 chroma_non_intra_quantiser_matrix[64];
- };
-
-+/* Compounds controls */
-+
-+#define V4L2_H264_SPS_CONSTRAINT_SET0_FLAG 0x01
-+#define V4L2_H264_SPS_CONSTRAINT_SET1_FLAG 0x02
-+#define V4L2_H264_SPS_CONSTRAINT_SET2_FLAG 0x04
-+#define V4L2_H264_SPS_CONSTRAINT_SET3_FLAG 0x08
-+#define V4L2_H264_SPS_CONSTRAINT_SET4_FLAG 0x10
-+#define V4L2_H264_SPS_CONSTRAINT_SET5_FLAG 0x20
-+
-+#define V4L2_H264_SPS_FLAG_SEPARATE_COLOUR_PLANE 0x01
-+#define V4L2_H264_SPS_FLAG_QPPRIME_Y_ZERO_TRANSFORM_BYPASS 0x02
-+#define V4L2_H264_SPS_FLAG_DELTA_PIC_ORDER_ALWAYS_ZERO 0x04
-+#define V4L2_H264_SPS_FLAG_GAPS_IN_FRAME_NUM_VALUE_ALLOWED 0x08
-+#define V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY 0x10
-+#define V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD 0x20
-+#define V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE 0x40
-+
-+struct v4l2_ctrl_h264_sps {
-+ __u8 profile_idc;
-+ __u8 constraint_set_flags;
-+ __u8 level_idc;
-+ __u8 seq_parameter_set_id;
-+ __u8 chroma_format_idc;
-+ __u8 bit_depth_luma_minus8;
-+ __u8 bit_depth_chroma_minus8;
-+ __u8 log2_max_frame_num_minus4;
-+ __u8 pic_order_cnt_type;
-+ __u8 log2_max_pic_order_cnt_lsb_minus4;
-+ __u8 max_num_ref_frames;
-+ __u8 num_ref_frames_in_pic_order_cnt_cycle;
-+ __s32 offset_for_ref_frame[255];
-+ __s32 offset_for_non_ref_pic;
-+ __s32 offset_for_top_to_bottom_field;
-+ __u16 pic_width_in_mbs_minus1;
-+ __u16 pic_height_in_map_units_minus1;
-+ __u8 flags;
-+};
-+
-+#define V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE 0x0001
-+#define V4L2_H264_PPS_FLAG_BOTTOM_FIELD_PIC_ORDER_IN_FRAME_PRESENT 0x0002
-+#define V4L2_H264_PPS_FLAG_WEIGHTED_PRED 0x0004
-+#define V4L2_H264_PPS_FLAG_DEBLOCKING_FILTER_CONTROL_PRESENT 0x0008
-+#define V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED 0x0010
-+#define V4L2_H264_PPS_FLAG_REDUNDANT_PIC_CNT_PRESENT 0x0020
-+#define V4L2_H264_PPS_FLAG_TRANSFORM_8X8_MODE 0x0040
-+#define V4L2_H264_PPS_FLAG_PIC_SCALING_MATRIX_PRESENT 0x0080
-+
-+struct v4l2_ctrl_h264_pps {
-+ __u8 pic_parameter_set_id;
-+ __u8 seq_parameter_set_id;
-+ __u8 num_slice_groups_minus1;
-+ __u8 num_ref_idx_l0_default_active_minus1;
-+ __u8 num_ref_idx_l1_default_active_minus1;
-+ __u8 weighted_bipred_idc;
-+ __s8 pic_init_qp_minus26;
-+ __s8 pic_init_qs_minus26;
-+ __s8 chroma_qp_index_offset;
-+ __s8 second_chroma_qp_index_offset;
-+ __u8 flags;
-+};
-+
-+struct v4l2_ctrl_h264_scaling_matrix {
-+ __u8 scaling_list_4x4[6][16];
-+ __u8 scaling_list_8x8[6][64];
-+};
-+
-+struct v4l2_h264_weight_factors {
-+ __s8 luma_weight[32];
-+ __s8 luma_offset[32];
-+ __s8 chroma_weight[32][2];
-+ __s8 chroma_offset[32][2];
-+};
-+
-+struct v4l2_h264_pred_weight_table {
-+ __u8 luma_log2_weight_denom;
-+ __u8 chroma_log2_weight_denom;
-+ struct v4l2_h264_weight_factors weight_factors[2];
-+};
-+
-+#define V4L2_H264_SLICE_TYPE_P 0
-+#define V4L2_H264_SLICE_TYPE_B 1
-+#define V4L2_H264_SLICE_TYPE_I 2
-+#define V4L2_H264_SLICE_TYPE_SP 3
-+#define V4L2_H264_SLICE_TYPE_SI 4
-+
-+#define V4L2_H264_SLICE_FLAG_FIELD_PIC 0x01
-+#define V4L2_H264_SLICE_FLAG_BOTTOM_FIELD 0x02
-+#define V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED 0x04
-+#define V4L2_H264_SLICE_FLAG_SP_FOR_SWITCH 0x08
-+
-+struct v4l2_ctrl_h264_slice_param {
-+ /* Size in bytes, including header */
-+ __u32 size;
-+ /* Offset in bits to slice_data() from the beginning of this slice. */
-+ __u32 header_bit_size;
-+
-+ __u16 first_mb_in_slice;
-+ __u8 slice_type;
-+ __u8 pic_parameter_set_id;
-+ __u8 colour_plane_id;
-+ __u16 frame_num;
-+ __u16 idr_pic_id;
-+ __u16 pic_order_cnt_lsb;
-+ __s32 delta_pic_order_cnt_bottom;
-+ __s32 delta_pic_order_cnt0;
-+ __s32 delta_pic_order_cnt1;
-+ __u8 redundant_pic_cnt;
-+
-+ struct v4l2_h264_pred_weight_table pred_weight_table;
-+ /* Size in bits of dec_ref_pic_marking() syntax element. */
-+ __u32 dec_ref_pic_marking_bit_size;
-+ /* Size in bits of pic order count syntax. */
-+ __u32 pic_order_cnt_bit_size;
-+
-+ __u8 cabac_init_idc;
-+ __s8 slice_qp_delta;
-+ __s8 slice_qs_delta;
-+ __u8 disable_deblocking_filter_idc;
-+ __s8 slice_alpha_c0_offset_div2;
-+ __s8 slice_beta_offset_div2;
-+ __u32 slice_group_change_cycle;
-+
-+ __u8 num_ref_idx_l0_active_minus1;
-+ __u8 num_ref_idx_l1_active_minus1;
-+ /* Entries on each list are indices
-+ * into v4l2_ctrl_h264_decode_param.dpb[]. */
-+ __u8 ref_pic_list0[32];
-+ __u8 ref_pic_list1[32];
-+
-+ __u8 flags;
-+};
-+
-+#define V4L2_H264_DPB_ENTRY_FLAG_VALID 0x01
-+#define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE 0x02
-+#define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM 0x04
-+
-+struct v4l2_h264_dpb_entry {
-+ __u32 tag;
-+ __u16 frame_num;
-+ __u16 pic_num;
-+ /* Note that field is indicated by v4l2_buffer.field */
-+ __s32 top_field_order_cnt;
-+ __s32 bottom_field_order_cnt;
-+ __u8 flags; /* V4L2_H264_DPB_ENTRY_FLAG_* */
-+};
-+
-+struct v4l2_ctrl_h264_decode_param {
-+ __u32 num_slices;
-+ __u8 idr_pic_flag;
-+ __u8 nal_ref_idc;
-+ __s32 top_field_order_cnt;
-+ __s32 bottom_field_order_cnt;
-+ __u8 ref_pic_list_p0[32];
-+ __u8 ref_pic_list_b0[32];
-+ __u8 ref_pic_list_b1[32];
-+ struct v4l2_h264_dpb_entry dpb[16];
-+};
-+
- #endif
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index dd028e0bf306..26f5bec9e988 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -655,6 +655,7 @@ struct v4l2_pix_format {
#define V4L2_PIX_FMT_VP8 v4l2_fourcc('V', 'P', '8', '0') /* VP8 */
#define V4L2_PIX_FMT_VP9 v4l2_fourcc('V', 'P', '9', '0') /* VP9 */
#define V4L2_PIX_FMT_HEVC v4l2_fourcc('H', 'E', 'V', 'C') /* HEVC aka H.265 */
+#define V4L2_PIX_FMT_HEVC_SLICE v4l2_fourcc('S', '2', '6', '5') /* HEVC parsed slices */
#define V4L2_PIX_FMT_FWHT v4l2_fourcc('F', 'W', 'H', 'T') /* Fast Walsh Hadamard Transform (vicodec) */

/* Vendor-specific formats */
@@ -1637,6 +1638,9 @@ struct v4l2_ext_control {
struct v4l2_ctrl_h264_scaling_matrix __user *p_h264_scal_mtrx;
struct v4l2_ctrl_h264_slice_param __user *p_h264_slice_param;
struct v4l2_ctrl_h264_decode_param __user *p_h264_decode_param;
+ struct v4l2_ctrl_hevc_sps __user *p_hevc_sps;
+ struct v4l2_ctrl_hevc_pps __user *p_hevc_pps;
+ struct v4l2_ctrl_hevc_slice_params __user *p_hevc_slice_params;
void __user *ptr;
};
} __attribute__ ((packed));
@@ -1689,6 +1693,9 @@ enum v4l2_ctrl_type {
V4L2_CTRL_TYPE_H264_SCALING_MATRIX = 0x0107,
V4L2_CTRL_TYPE_H264_SLICE_PARAMS = 0x0108,
V4L2_CTRL_TYPE_H264_DECODE_PARAMS = 0x0109,
+ V4L2_CTRL_TYPE_HEVC_SPS = 0x0110,
+ V4L2_CTRL_TYPE_HEVC_PPS = 0x0111,
+ V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS = 0x0112,
};

/* Used in the VIDIOC_QUERYCTRL ioctl for querying controls */
--
2.19.1


2018-11-27 10:36:45

by Maxime Ripard

Subject: Re: [PATCH v2 2/2] media: cedrus: Add HEVC/H.265 decoding support

Hi!

On Fri, Nov 23, 2018 at 02:02:09PM +0100, Paul Kocialkowski wrote:
> This introduces support for HEVC/H.265 to the Cedrus VPU driver, with
> both uni-directional and bi-directional prediction modes supported.
>
> Field-coded (interlaced) pictures, custom quantization matrices and
> 10-bit output are not supported at this point.
>
> Signed-off-by: Paul Kocialkowski <[email protected]>

Output from checkpatch:
total: 0 errors, 68 warnings, 14 checks, 999 lines checked

> +/*
> + * Note: Neighbor info buffer size is apparently doubled for H6, which may be
> + * related to 10 bit H265 support.
> + */
> +#define CEDRUS_H265_NEIGHBOR_INFO_BUF_SIZE (397 * SZ_1K)
> +#define CEDRUS_H265_ENTRY_POINTS_BUF_SIZE (4 * SZ_1K)
> +#define CEDRUS_H265_MV_COL_BUF_UNIT_CTB_SIZE 160

It would be useful to document where these values come from.

> +static void cedrus_h265_sram_write_data(struct cedrus_dev *dev, u32 *data,

Since the data is opaque as far as this function is concerned, you should
take a void pointer here. That would avoid the casts you're doing at the
call sites.

> + unsigned int count)
> +{
> + while (count--)
> + cedrus_write(dev, VE_DEC_H265_SRAM_DATA, *data++);
> +}
> +
> +static inline dma_addr_t cedrus_h265_frame_info_mv_col_buf_addr(
> + struct cedrus_ctx *ctx, unsigned int index, unsigned int field)
> +{
> + return ctx->codec.h265.mv_col_buf_addr + index *
> + ctx->codec.h265.mv_col_buf_unit_size +
> + field * ctx->codec.h265.mv_col_buf_unit_size / 2;
> +}
> +
> +static void cedrus_h265_frame_info_write_single(struct cedrus_dev *dev,
> + unsigned int index,
> + bool field_pic,
> + u32 pic_order_cnt[],
> + dma_addr_t mv_col_buf_addr[],
> + dma_addr_t dst_luma_addr,
> + dma_addr_t dst_chroma_addr)
> +{
> + u32 offset = VE_DEC_H265_SRAM_OFFSET_FRAME_INFO +
> + VE_DEC_H265_SRAM_OFFSET_FRAME_INFO_UNIT * index;
> + struct cedrus_h265_sram_frame_info frame_info = {
> + .top_pic_order_cnt = pic_order_cnt[0],
> + .bottom_pic_order_cnt = field_pic ? pic_order_cnt[1] :
> + pic_order_cnt[0],
> + .top_mv_col_buf_addr =
> + VE_DEC_H265_SRAM_DATA_ADDR_BASE(mv_col_buf_addr[0]),
> + .bottom_mv_col_buf_addr = field_pic ?
> + VE_DEC_H265_SRAM_DATA_ADDR_BASE(mv_col_buf_addr[1]) :
> + VE_DEC_H265_SRAM_DATA_ADDR_BASE(mv_col_buf_addr[0]),
> + .luma_addr = VE_DEC_H265_SRAM_DATA_ADDR_BASE(dst_luma_addr),
> + .chroma_addr = VE_DEC_H265_SRAM_DATA_ADDR_BASE(dst_chroma_addr),
> + };
> + unsigned int count = sizeof(frame_info) / sizeof(u32);
> +
> + cedrus_h265_sram_write_offset(dev, offset);
> + cedrus_h265_sram_write_data(dev, (u32 *)&frame_info, count);

A generic write function usually takes its size in bytes, not in words.
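
Together with the void pointer suggested earlier, this could look
roughly like the following. It is an untested userspace sketch, not
driver code: fake_sram, fake_write_word() and struct fake_frame_info
are made-up stand-ins for the SRAM data register and the real frame
info structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the SRAM data register: logs each word. */
static uint32_t fake_sram[16];
static size_t fake_sram_pos;

static void fake_write_word(uint32_t value)
{
	assert(fake_sram_pos < 16);
	fake_sram[fake_sram_pos++] = value;
}

/* Opaque pointer plus size in bytes: callers need no cast and no division. */
static void sram_write_data(const void *data, size_t size)
{
	const unsigned char *bytes = data;
	uint32_t word;

	/* The register is 32 bits wide, so only whole words make sense. */
	assert(size % sizeof(word) == 0);

	while (size) {
		memcpy(&word, bytes, sizeof(word));
		fake_write_word(word);
		bytes += sizeof(word);
		size -= sizeof(word);
	}
}

/* Hypothetical frame info layout, cut down to three words for brevity. */
struct fake_frame_info {
	uint32_t top_pic_order_cnt;
	uint32_t bottom_pic_order_cnt;
	uint32_t luma_addr;
};
```

A caller then reduces to sram_write_data(&frame_info, sizeof(frame_info)).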

> +}
> +
> +static void cedrus_h265_frame_info_write_dpb(struct cedrus_ctx *ctx,
> + const struct v4l2_hevc_dpb_entry *dpb,
> + u8 num_active_dpb_entries)
> +{
> + struct cedrus_dev *dev = ctx->dev;
> + struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
> + unsigned int i;
> +
> + for (i = 0; i < num_active_dpb_entries; i++) {
> + dma_addr_t dst_luma_addr, dst_chroma_addr;
> + dma_addr_t mv_col_buf_addr[2];
> + u32 pic_order_cnt[2];
> + int buffer_index = vb2_find_tag(cap_q, dpb[i].buffer_tag, 0);
> +
> + dst_luma_addr = cedrus_dst_buf_addr(ctx, buffer_index, 0) -
> + PHYS_OFFSET;
> + dst_chroma_addr = cedrus_dst_buf_addr(ctx, buffer_index, 1) -
> + PHYS_OFFSET;
> + mv_col_buf_addr[0] = cedrus_h265_frame_info_mv_col_buf_addr(ctx,
> + buffer_index, 0) - PHYS_OFFSET;

The PHYS_OFFSET subtraction should be done inside
cedrus_h265_frame_info_mv_col_buf_addr().
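
As a rough, untested illustration of what I mean (FAKE_PHYS_OFFSET and
struct fake_h265_ctx are made-up stand-ins, and plain 32-bit integers
replace dma_addr_t so that this builds in userspace):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for PHYS_OFFSET and the h265 context fields. */
#define FAKE_PHYS_OFFSET 0x40000000u

struct fake_h265_ctx {
	uint32_t mv_col_buf_addr;
	uint32_t mv_col_buf_unit_size;
};

/* Returns the address as seen by the VPU, with the offset already removed. */
static uint32_t mv_col_buf_addr(const struct fake_h265_ctx *ctx,
				unsigned int index, unsigned int field)
{
	/* Only top (0) and bottom (1) fields exist. */
	assert(field < 2);

	return ctx->mv_col_buf_addr - FAKE_PHYS_OFFSET +
	       index * ctx->mv_col_buf_unit_size +
	       field * ctx->mv_col_buf_unit_size / 2;
}
```

That way none of the callers has to remember the subtraction.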

> + pic_order_cnt[0] = dpb[i].pic_order_cnt[0];
> +
> + if (dpb[i].field_pic) {
> + mv_col_buf_addr[1] =
> + cedrus_h265_frame_info_mv_col_buf_addr(ctx,
> + buffer_index, 1) - PHYS_OFFSET;
> + pic_order_cnt[1] = dpb[i].pic_order_cnt[1];
> + }
> +
> + cedrus_h265_frame_info_write_single(dev, i, dpb[i].field_pic,
> + pic_order_cnt,
> + mv_col_buf_addr,
> + dst_luma_addr,
> + dst_chroma_addr);
> + }
> +}
> +
> +static void cedrus_h265_ref_pic_list_write(struct cedrus_dev *dev,
> + const u8 list[],
> + u8 num_ref_idx_active,
> + const struct v4l2_hevc_dpb_entry *dpb,
> + u8 num_active_dpb_entries,
> + u32 sram_offset)
> +{
> + unsigned int i;
> + u32 reg = 0;
> +
> + cedrus_h265_sram_write_offset(dev, sram_offset);
> +
> + for (i = 0; i < num_ref_idx_active; i++) {
> + unsigned int shift = (i % 4) * 8;
> + unsigned int index = list[i];
> + u8 value = list[i];
> +
> + if (dpb[index].rps == V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR)
> + value |= VE_DEC_H265_SRAM_REF_PIC_LIST_LT_REF;
> +
> + reg |= value << shift;
> +
> + if ((i % 4) == 3 || i == (num_ref_idx_active - 1)) {
> + cedrus_h265_sram_write_data(dev, &reg, 1);
> + reg = 0;
> + }

A comment here explaining what you're doing with reg would be nice.
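
As far as I can tell from the loop, the logic is the one below, shown
as a standalone, untested sketch. FAKE_LT_REF is a placeholder since I
don't know the actual value of VE_DEC_H265_SRAM_REF_PIC_LIST_LT_REF,
and the words go to an output array instead of the SRAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_DPB_MAX 16
#define FAKE_LT_REF  0x80u /* placeholder, real flag value not known here */

/*
 * Pack one 8-bit entry per reference into 32-bit words, four entries per
 * word, lowest byte first. A word is flushed once it is full or once the
 * last entry was added, so a partially-filled final word is still written.
 * Returns the number of words written to out.
 */
static size_t pack_ref_list(const uint8_t *list, const uint8_t *is_long_term,
			    size_t count, uint32_t *out)
{
	uint32_t reg = 0;
	size_t i, n = 0;

	for (i = 0; i < count; i++) {
		uint8_t value = list[i];

		/* Each entry is an index into the DPB array. */
		assert(list[i] < FAKE_DPB_MAX);

		if (is_long_term[list[i]])
			value |= FAKE_LT_REF;

		reg |= (uint32_t)value << ((i % 4) * 8);

		if ((i % 4) == 3 || i == count - 1) {
			out[n++] = reg;
			reg = 0;
		}
	}

	return n;
}
```

A block comment along the lines of the one above pack_ref_list() is
what I have in mind.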

> + }
> +}
> +
> +static void cedrus_h265_pred_weight_write(struct cedrus_dev *dev,
> + const s8 delta_luma_weight[],
> + const s8 luma_offset[],
> + const s8 delta_chroma_weight[][2],
> + const s8 chroma_offset[][2],
> + u8 num_ref_idx_active,
> + u32 sram_luma_offset,
> + u32 sram_chroma_offset)
> +{
> + struct cedrus_h265_sram_pred_weight pred_weight[2] = { 0 };
> + unsigned int i, j;
> +
> + cedrus_h265_sram_write_offset(dev, sram_luma_offset);
> +
> + for (i = 0; i < num_ref_idx_active; i++) {
> + unsigned int index = i % 2;
> +
> + pred_weight[index].delta_weight = delta_luma_weight[i];
> + pred_weight[index].offset = luma_offset[i];
> +
> + if (index == 1 || i == (num_ref_idx_active - 1))
> + cedrus_h265_sram_write_data(dev, (u32 *)&pred_weight,
> + 1);
> + }
> +
> + cedrus_h265_sram_write_offset(dev, sram_chroma_offset);
> +
> + for (i = 0; i < num_ref_idx_active; i++) {
> + for (j = 0; j < 2; j++) {
> + pred_weight[j].delta_weight = delta_chroma_weight[i][j];
> + pred_weight[j].offset = chroma_offset[i][j];
> + }
> +
> + cedrus_h265_sram_write_data(dev, (u32 *)&pred_weight, 1);
> + }
> +}
> +
> +static void cedrus_h265_setup(struct cedrus_ctx *ctx,
> + struct cedrus_run *run)
> +{
> + struct cedrus_dev *dev = ctx->dev;
> + const struct v4l2_ctrl_hevc_sps *sps;
> + const struct v4l2_ctrl_hevc_pps *pps;
> + const struct v4l2_ctrl_hevc_slice_params *slice_params;
> + const struct v4l2_hevc_pred_weight_table *pred_weight_table;
> + dma_addr_t src_buf_addr;
> + dma_addr_t src_buf_end_addr;
> + dma_addr_t dst_luma_addr, dst_chroma_addr;
> + dma_addr_t mv_col_buf_addr[2];
> + u32 chroma_log2_weight_denom;
> + u32 output_pic_list_index;
> + u32 pic_order_cnt[2];
> + u32 reg;
> +
> + sps = run->h265.sps;
> + pps = run->h265.pps;
> + slice_params = run->h265.slice_params;
> + pred_weight_table = &slice_params->pred_weight_table;
> +
> + /* MV column buffer size and allocation. */
> + if (!ctx->codec.h265.mv_col_buf_size) {
> + unsigned int num_buffers =
> + run->dst->vb2_buf.vb2_queue->num_buffers;
> + unsigned int log2_max_luma_coding_block_size =
> + sps->log2_min_luma_coding_block_size_minus3 + 3 +
> + sps->log2_diff_max_min_luma_coding_block_size;
> + unsigned int ctb_size_luma =
> + 1 << log2_max_luma_coding_block_size;
> +
> + /*
> + * Each CTB requires a MV col buffer with a specific unit size.
> + * Since the address is given with missing lsb bits, 1 KiB is
> + * added to each buffer to ensure proper alignment.
> + */
> + ctx->codec.h265.mv_col_buf_unit_size =
> + DIV_ROUND_UP(ctx->src_fmt.width, ctb_size_luma) *
> + DIV_ROUND_UP(ctx->src_fmt.height, ctb_size_luma) *
> + CEDRUS_H265_MV_COL_BUF_UNIT_CTB_SIZE + SZ_1K;
> +
> + ctx->codec.h265.mv_col_buf_size = num_buffers *
> + ctx->codec.h265.mv_col_buf_unit_size;
> +
> + ctx->codec.h265.mv_col_buf =
> + dma_alloc_coherent(dev->dev,
> + ctx->codec.h265.mv_col_buf_size,
> + &ctx->codec.h265.mv_col_buf_addr,
> + GFP_KERNEL);
> + if (!ctx->codec.h265.mv_col_buf) {
> + ctx->codec.h265.mv_col_buf_size = 0;
> + // TODO: Abort the process here.
> + return;
> + }
> + }
> +
> + /* Activate H265 engine. */
> + cedrus_engine_enable(dev, CEDRUS_CODEC_H265);
> +
> + /* Source offset and length in bits. */
> +
> + reg = slice_params->data_bit_offset;
> + cedrus_write(dev, VE_DEC_H265_BITS_OFFSET, reg);
> +
> + reg = slice_params->bit_size - slice_params->data_bit_offset;
> + cedrus_write(dev, VE_DEC_H265_BITS_LEN, reg);
> +
> + /* Source beginning and end addresses. */
> +
> + src_buf_addr = vb2_dma_contig_plane_dma_addr(&run->src->vb2_buf, 0) -
> + PHYS_OFFSET;
> +
> + reg = VE_DEC_H265_BITS_ADDR_BASE(src_buf_addr);
> + reg |= VE_DEC_H265_BITS_ADDR_VALID_SLICE_DATA;
> + reg |= VE_DEC_H265_BITS_ADDR_LAST_SLICE_DATA;
> + reg |= VE_DEC_H265_BITS_ADDR_FIRST_SLICE_DATA;
> +
> + cedrus_write(dev, VE_DEC_H265_BITS_ADDR, reg);
> +
> + src_buf_end_addr = src_buf_addr +
> + DIV_ROUND_UP(slice_params->bit_size, 8);
> +
> + reg = VE_DEC_H265_BITS_END_ADDR_BASE(src_buf_end_addr);
> + cedrus_write(dev, VE_DEC_H265_BITS_END_ADDR, reg);
> +
> + /* Coding tree block address: start at the beginning. */
> + reg = VE_DEC_H265_DEC_CTB_ADDR_X(0) | VE_DEC_H265_DEC_CTB_ADDR_Y(0);
> + cedrus_write(dev, VE_DEC_H265_DEC_CTB_ADDR, reg);
> +
> + cedrus_write(dev, VE_DEC_H265_TILE_START_CTB, 0);
> + cedrus_write(dev, VE_DEC_H265_TILE_END_CTB, 0);
> +
> + /* Clear the number of correctly-decoded coding tree blocks. */
> + cedrus_write(dev, VE_DEC_H265_DEC_CTB_NUM, 0);
> +
> + /* Initialize bitstream access. */
> + cedrus_write(dev, VE_DEC_H265_TRIGGER, VE_DEC_H265_TRIGGER_INIT_SWDEC);
> +
> + /* Bitstream parameters. */
> +
> + reg = VE_DEC_H265_DEC_NAL_HDR_NAL_UNIT_TYPE(slice_params->nal_unit_type) |
> + VE_DEC_H265_DEC_NAL_HDR_NUH_TEMPORAL_ID_PLUS1(slice_params->nuh_temporal_id_plus1);
> +
> + cedrus_write(dev, VE_DEC_H265_DEC_NAL_HDR, reg);
> +
> + reg = VE_DEC_H265_DEC_SPS_HDR_STRONG_INTRA_SMOOTHING_ENABLE_FLAG(sps->strong_intra_smoothing_enabled_flag) |
> + VE_DEC_H265_DEC_SPS_HDR_SPS_TEMPORAL_MVP_ENABLED_FLAG(sps->sps_temporal_mvp_enabled_flag) |
> + VE_DEC_H265_DEC_SPS_HDR_SAMPLE_ADAPTIVE_OFFSET_ENABLED_FLAG(sps->sample_adaptive_offset_enabled_flag) |
> + VE_DEC_H265_DEC_SPS_HDR_AMP_ENABLED_FLAG(sps->amp_enabled_flag) |
> + VE_DEC_H265_DEC_SPS_HDR_MAX_TRANSFORM_HIERARCHY_DEPTH_INTRA(sps->max_transform_hierarchy_depth_intra) |
> + VE_DEC_H265_DEC_SPS_HDR_MAX_TRANSFORM_HIERARCHY_DEPTH_INTER(sps->max_transform_hierarchy_depth_inter) |
> + VE_DEC_H265_DEC_SPS_HDR_LOG2_DIFF_MAX_MIN_TRANSFORM_BLOCK_SIZE(sps->log2_diff_max_min_luma_transform_block_size) |
> + VE_DEC_H265_DEC_SPS_HDR_LOG2_MIN_TRANSFORM_BLOCK_SIZE_MINUS2(sps->log2_min_luma_transform_block_size_minus2) |
> + VE_DEC_H265_DEC_SPS_HDR_LOG2_DIFF_MAX_MIN_LUMA_CODING_BLOCK_SIZE(sps->log2_diff_max_min_luma_coding_block_size) |
> + VE_DEC_H265_DEC_SPS_HDR_LOG2_MIN_LUMA_CODING_BLOCK_SIZE_MINUS3(sps->log2_min_luma_coding_block_size_minus3) |
> + VE_DEC_H265_DEC_SPS_HDR_BIT_DEPTH_CHROMA_MINUS8(sps->bit_depth_chroma_minus8) |
> + VE_DEC_H265_DEC_SPS_HDR_SEPARATE_COLOUR_PLANE_FLAG(sps->separate_colour_plane_flag) |
> + VE_DEC_H265_DEC_SPS_HDR_CHROMA_FORMAT_IDC(sps->chroma_format_idc);
> +
> + cedrus_write(dev, VE_DEC_H265_DEC_SPS_HDR, reg);
> +
> + reg = VE_DEC_H265_DEC_PCM_CTRL_PCM_ENABLED_FLAG(sps->pcm_enabled_flag) |
> + VE_DEC_H265_DEC_PCM_CTRL_PCM_LOOP_FILTER_DISABLED_FLAG(sps->pcm_loop_filter_disabled_flag) |
> + VE_DEC_H265_DEC_PCM_CTRL_LOG2_DIFF_MAX_MIN_PCM_LUMA_CODING_BLOCK_SIZE(sps->log2_diff_max_min_pcm_luma_coding_block_size) |
> + VE_DEC_H265_DEC_PCM_CTRL_LOG2_MIN_PCM_LUMA_CODING_BLOCK_SIZE_MINUS3(sps->log2_min_pcm_luma_coding_block_size_minus3) |
> + VE_DEC_H265_DEC_PCM_CTRL_PCM_SAMPLE_BIT_DEPTH_CHROMA_MINUS1(sps->pcm_sample_bit_depth_chroma_minus1) |
> + VE_DEC_H265_DEC_PCM_CTRL_PCM_SAMPLE_BIT_DEPTH_LUMA_MINUS1(sps->pcm_sample_bit_depth_luma_minus1);
> +
> + cedrus_write(dev, VE_DEC_H265_DEC_PCM_CTRL, reg);
> +
> + reg = VE_DEC_H265_DEC_PPS_CTRL0_PPS_CR_QP_OFFSET(pps->pps_cr_qp_offset) |
> + VE_DEC_H265_DEC_PPS_CTRL0_PPS_CB_QP_OFFSET(pps->pps_cb_qp_offset) |
> + VE_DEC_H265_DEC_PPS_CTRL0_INIT_QP_MINUS26(pps->init_qp_minus26) |
> + VE_DEC_H265_DEC_PPS_CTRL0_DIFF_CU_QP_DELTA_DEPTH(pps->diff_cu_qp_delta_depth) |
> + VE_DEC_H265_DEC_PPS_CTRL0_CU_QP_DELTA_ENABLED_FLAG(pps->cu_qp_delta_enabled_flag) |
> + VE_DEC_H265_DEC_PPS_CTRL0_TRANSFORM_SKIP_ENABLED_FLAG(pps->transform_skip_enabled_flag) |
> + VE_DEC_H265_DEC_PPS_CTRL0_CONSTRAINED_INTRA_PRED_FLAG(pps->constrained_intra_pred_flag) |
> + VE_DEC_H265_DEC_PPS_CTRL0_SIGN_DATA_HIDING_FLAG(pps->sign_data_hiding_enabled_flag);
> +
> + cedrus_write(dev, VE_DEC_H265_DEC_PPS_CTRL0, reg);
> +
> + reg = VE_DEC_H265_DEC_PPS_CTRL1_LOG2_PARALLEL_MERGE_LEVEL_MINUS2(pps->log2_parallel_merge_level_minus2) |
> + VE_DEC_H265_DEC_PPS_CTRL1_PPS_LOOP_FILTER_ACROSS_SLICES_ENABLED_FLAG(pps->pps_loop_filter_across_slices_enabled_flag) |
> + VE_DEC_H265_DEC_PPS_CTRL1_LOOP_FILTER_ACROSS_TILES_ENABLED_FLAG(pps->loop_filter_across_tiles_enabled_flag) |
> + VE_DEC_H265_DEC_PPS_CTRL1_ENTROPY_CODING_SYNC_ENABLED_FLAG(pps->entropy_coding_sync_enabled_flag) |
> + VE_DEC_H265_DEC_PPS_CTRL1_TILES_ENABLED_FLAG(0) |
> + VE_DEC_H265_DEC_PPS_CTRL1_TRANSQUANT_BYPASS_ENABLE_FLAG(pps->transquant_bypass_enabled_flag) |
> + VE_DEC_H265_DEC_PPS_CTRL1_WEIGHTED_BIPRED_FLAG(pps->weighted_bipred_flag) |
> + VE_DEC_H265_DEC_PPS_CTRL1_WEIGHTED_PRED_FLAG(pps->weighted_pred_flag);
> +
> + cedrus_write(dev, VE_DEC_H265_DEC_PPS_CTRL1, reg);
> +
> + reg = VE_DEC_H265_DEC_SLICE_HDR_INFO0_PICTURE_TYPE(slice_params->pic_struct) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO0_FIVE_MINUS_MAX_NUM_MERGE_CAND(slice_params->five_minus_max_num_merge_cand) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO0_NUM_REF_IDX_L1_ACTIVE_MINUS1(slice_params->num_ref_idx_l1_active_minus1) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO0_NUM_REF_IDX_L0_ACTIVE_MINUS1(slice_params->num_ref_idx_l0_active_minus1) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO0_COLLOCATED_REF_IDX(slice_params->collocated_ref_idx) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO0_COLLOCATED_FROM_L0_FLAG(slice_params->collocated_from_l0_flag) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO0_CABAC_INIT_FLAG(slice_params->cabac_init_flag) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO0_MVD_L1_ZERO_FLAG(slice_params->mvd_l1_zero_flag) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO0_SLICE_SAO_CHROMA_FLAG(slice_params->slice_sao_chroma_flag) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO0_SLICE_SAO_LUMA_FLAG(slice_params->slice_sao_luma_flag) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO0_SLICE_TEMPORAL_MVP_ENABLE_FLAG(slice_params->slice_temporal_mvp_enabled_flag) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO0_COLOUR_PLANE_ID(slice_params->colour_plane_id) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO0_SLICE_TYPE(slice_params->slice_type) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO0_DEPENDENT_SLICE_SEGMENT_FLAG(pps->dependent_slice_segment_flag) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO0_FIRST_SLICE_SEGMENT_IN_PIC_FLAG(1);
> +
> + cedrus_write(dev, VE_DEC_H265_DEC_SLICE_HDR_INFO0, reg);
> +
> + reg = VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_TC_OFFSET_DIV2(slice_params->slice_tc_offset_div2) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_BETA_OFFSET_DIV2(slice_params->slice_beta_offset_div2) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_DEBLOCKING_FILTER_DISABLED_FLAG(slice_params->slice_deblocking_filter_disabled_flag) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_LOOP_FILTER_ACROSS_SLICES_ENABLED_FLAG(slice_params->slice_loop_filter_across_slices_enabled_flag) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_POC_BIGEST_IN_RPS_ST(slice_params->num_rps_poc_st_curr_after == 0) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_CR_QP_OFFSET(slice_params->slice_cr_qp_offset) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_CB_QP_OFFSET(slice_params->slice_cb_qp_offset) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_QP_DELTA(slice_params->slice_qp_delta);
> +
> + cedrus_write(dev, VE_DEC_H265_DEC_SLICE_HDR_INFO1, reg);
> +
> + chroma_log2_weight_denom = pred_weight_table->luma_log2_weight_denom +
> + pred_weight_table->delta_chroma_log2_weight_denom;
> + reg = VE_DEC_H265_DEC_SLICE_HDR_INFO2_NUM_ENTRY_POINT_OFFSETS(0) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO2_CHROMA_LOG2_WEIGHT_DENOM(chroma_log2_weight_denom) |
> + VE_DEC_H265_DEC_SLICE_HDR_INFO2_LUMA_LOG2_WEIGHT_DENOM(pred_weight_table->luma_log2_weight_denom);
> +
> + cedrus_write(dev, VE_DEC_H265_DEC_SLICE_HDR_INFO2, reg);
> +
> + /* Decoded picture size. */
> +
> + reg = VE_DEC_H265_DEC_PIC_SIZE_WIDTH(ctx->src_fmt.width) |
> + VE_DEC_H265_DEC_PIC_SIZE_HEIGHT(ctx->src_fmt.height);
> +
> + cedrus_write(dev, VE_DEC_H265_DEC_PIC_SIZE, reg);
> +
> + /* Scaling list */
> +
> + reg = VE_DEC_H265_SCALING_LIST_CTRL0_DEFAULT;
> + cedrus_write(dev, VE_DEC_H265_SCALING_LIST_CTRL0, reg);
> +
> + /* Neighbor information address. */
> + reg = VE_DEC_H265_NEIGHBOR_INFO_ADDR_BASE(ctx->codec.h265.neighbor_info_buf_addr);
> + cedrus_write(dev, VE_DEC_H265_NEIGHBOR_INFO_ADDR, reg);
> +
> + /* Write decoded picture buffer in pic list. */
> + cedrus_h265_frame_info_write_dpb(ctx, slice_params->dpb,
> + slice_params->num_active_dpb_entries);
> +
> + /* Output frame. */
> +
> + output_pic_list_index = V4L2_HEVC_DPB_ENTRIES_NUM_MAX;
> + pic_order_cnt[0] = pic_order_cnt[1] = slice_params->slice_pic_order_cnt;
> + mv_col_buf_addr[0] = cedrus_h265_frame_info_mv_col_buf_addr(ctx,
> + run->dst->vb2_buf.index, 0) - PHYS_OFFSET;
> + mv_col_buf_addr[1] = cedrus_h265_frame_info_mv_col_buf_addr(ctx,
> + run->dst->vb2_buf.index, 1) - PHYS_OFFSET;
> + dst_luma_addr = cedrus_dst_buf_addr(ctx, run->dst->vb2_buf.index, 0) -
> + PHYS_OFFSET;
> + dst_chroma_addr = cedrus_dst_buf_addr(ctx, run->dst->vb2_buf.index, 1) -
> + PHYS_OFFSET;
> +
> + cedrus_h265_frame_info_write_single(dev, output_pic_list_index,
> + slice_params->pic_struct != 0,
> + pic_order_cnt, mv_col_buf_addr,
> + dst_luma_addr, dst_chroma_addr);

You could pass only the run and slice_params pointers to that function.

> +
> + cedrus_write(dev, VE_DEC_H265_OUTPUT_FRAME_IDX, output_pic_list_index);
> +
> + /* Reference picture list 0 (for P/B frames). */
> + if (slice_params->slice_type != V4L2_HEVC_SLICE_TYPE_I) {
> + cedrus_h265_ref_pic_list_write(dev, slice_params->ref_idx_l0,
> + slice_params->num_ref_idx_l0_active_minus1 + 1,
> + slice_params->dpb, slice_params->num_active_dpb_entries,
> + VE_DEC_H265_SRAM_OFFSET_REF_PIC_LIST0);
> +

slice_params is enough.

> + if (pps->weighted_pred_flag || pps->weighted_bipred_flag)
> + cedrus_h265_pred_weight_write(dev,
> + pred_weight_table->delta_luma_weight_l0,
> + pred_weight_table->luma_offset_l0,
> + pred_weight_table->delta_chroma_weight_l0,
> + pred_weight_table->chroma_offset_l0,
> + slice_params->num_ref_idx_l0_active_minus1 + 1,
> + VE_DEC_H265_SRAM_OFFSET_PRED_WEIGHT_LUMA_L0,
> + VE_DEC_H265_SRAM_OFFSET_PRED_WEIGHT_CHROMA_L0);

Ditto: that function should only take the pred_weight_table and
slice_params pointers.
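
To illustrate the shape, here is an untested userspace sketch. The
fake_* structures are trimmed-down stand-ins for the uAPI ones, the
list selector argument is my own assumption, and summing the weights
merely stands in for the SRAM writes.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_REFS_MAX 16

/* Hypothetical, trimmed-down versions of the uAPI structures. */
struct fake_pred_weight_table {
	int8_t delta_luma_weight_l0[FAKE_REFS_MAX];
	int8_t delta_luma_weight_l1[FAKE_REFS_MAX];
};

struct fake_slice_params {
	uint8_t num_ref_idx_l0_active_minus1;
	uint8_t num_ref_idx_l1_active_minus1;
	struct fake_pred_weight_table pred_weight_table;
};

/*
 * Picks the arrays and the entry count for the requested list itself.
 * Summing the weights stands in for the actual SRAM writes.
 */
static int pred_weight_write(const struct fake_slice_params *slice_params,
			     unsigned int list)
{
	const struct fake_pred_weight_table *table =
		&slice_params->pred_weight_table;
	const int8_t *weights;
	unsigned int count, i;
	int sum = 0;

	/* Only reference lists 0 and 1 exist. */
	assert(list < 2);

	if (list == 0) {
		weights = table->delta_luma_weight_l0;
		count = slice_params->num_ref_idx_l0_active_minus1 + 1;
	} else {
		weights = table->delta_luma_weight_l1;
		count = slice_params->num_ref_idx_l1_active_minus1 + 1;
	}

	assert(count <= FAKE_REFS_MAX);

	for (i = 0; i < count; i++)
		sum += weights[i];

	return sum;
}
```

The call sites then shrink to one line each, and the l0/l1 selection
lives in a single place.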

> + }
> +
> + /* Reference picture list 1 (for B frames). */
> + if (slice_params->slice_type == V4L2_HEVC_SLICE_TYPE_B) {
> + cedrus_h265_ref_pic_list_write(dev, slice_params->ref_idx_l1,
> + slice_params->num_ref_idx_l1_active_minus1 + 1,
> + slice_params->dpb,
> + slice_params->num_active_dpb_entries,
> + VE_DEC_H265_SRAM_OFFSET_REF_PIC_LIST1);
> +
> + if (pps->weighted_bipred_flag)
> + cedrus_h265_pred_weight_write(dev,
> + pred_weight_table->delta_luma_weight_l1,
> + pred_weight_table->luma_offset_l1,
> + pred_weight_table->delta_chroma_weight_l1,
> + pred_weight_table->chroma_offset_l1,
> + slice_params->num_ref_idx_l1_active_minus1 + 1,
> + VE_DEC_H265_SRAM_OFFSET_PRED_WEIGHT_LUMA_L1,
> + VE_DEC_H265_SRAM_OFFSET_PRED_WEIGHT_CHROMA_L1);
> + }

Ditto

Looks good otherwise, thanks!
Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



2018-12-05 13:19:53

by Hans Verkuil

Subject: Re: [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

On 11/23/18 14:02, Paul Kocialkowski wrote:
> This introduces the required definitions for HEVC decoding support with
> stateless VPUs. The controls associated to the HEVC slice format provide
> the required meta-data for decoding slices extracted from the bitstream.
>
> This interface comes with the following limitations:
> * No custom quantization matrices (scaling lists);
> * Support for a single temporal layer only;
> * No slice entry point offsets support;
> * No conformance window support;
> * No VUI parameters support;
> * No support for SPS extensions: range, multilayer, 3d, scc, 4 bits;
> * No support for PPS extensions: range, multilayer, 3d, scc, 4 bits.

So if support for one or more of these items had to be added later,
would that just mean new controls, or would it affect existing controls?

>
> Signed-off-by: Paul Kocialkowski <[email protected]>
> ---
> Documentation/media/uapi/v4l/biblio.rst | 9 +
> .../media/uapi/v4l/extended-controls.rst | 417 ++++++++++++++++++
> .../media/uapi/v4l/pixfmt-compressed.rst | 15 +
> .../media/uapi/v4l/vidioc-queryctrl.rst | 18 +
> .../media/videodev2.h.rst.exceptions | 3 +
> drivers/media/v4l2-core/v4l2-ctrls.c | 26 ++
> drivers/media/v4l2-core/v4l2-ioctl.c | 1 +
> include/media/v4l2-ctrls.h | 6 +
> include/uapi/linux/v4l2-controls.h | 155 +++++++
> include/uapi/linux/v4l2-controls.h.rej | 187 --------

Huh? Why is a .rej file part of this patch?

> include/uapi/linux/videodev2.h | 7 +
> 11 files changed, 657 insertions(+), 187 deletions(-)
> delete mode 100644 include/uapi/linux/v4l2-controls.h.rej
>
> diff --git a/Documentation/media/uapi/v4l/biblio.rst b/Documentation/media/uapi/v4l/biblio.rst
> index 73aeb7ce47d2..59a98feca3a1 100644
> --- a/Documentation/media/uapi/v4l/biblio.rst
> +++ b/Documentation/media/uapi/v4l/biblio.rst
> @@ -124,6 +124,15 @@ ITU H.264
>
> :author: International Telecommunication Union (http://www.itu.ch)
>
> +.. _hevc:
> +
> +ITU H.265/HEVC
> +==============
> +
> +:title: ITU-T Rec. H.265 | ISO/IEC 23008-2 "High Efficiency Video Coding"
> +
> +:author: International Telecommunication Union (http://www.itu.ch), International Organisation for Standardisation (http://www.iso.ch)
> +
> .. _jfif:
>
> JFIF
> diff --git a/Documentation/media/uapi/v4l/extended-controls.rst b/Documentation/media/uapi/v4l/extended-controls.rst
> index 87c0d151577f..906ff4f32634 100644
> --- a/Documentation/media/uapi/v4l/extended-controls.rst
> +++ b/Documentation/media/uapi/v4l/extended-controls.rst
> @@ -2038,6 +2038,423 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
> - ``flags``
> -
>
> +.. _v4l2-mpeg-hevc:
> +
> +``V4L2_CID_MPEG_VIDEO_HEVC_SPS (struct)``
> + Specifies the Sequence Parameter Set fields (as extracted from the
> + bitstream) for the associated HEVC slice data.
> + These bitstream parameters are defined according to :ref:`hevc`.
> + Refer to the specification for the documentation of these fields.

Same as for H264: if possible, refer to the section(s) of the spec that deal
with these fields.

> +
> +.. c:type:: v4l2_ctrl_hevc_sps
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_ctrl_hevc_sps
> + :header-rows: 0
> + :stub-columns: 0
> + :widths: 1 1 2
> +
> + * - __u8
> + - ``chroma_format_idc``
> + -
> + * - __u8
> + - ``separate_colour_plane_flag``
> + -
> + * - __u16
> + - ``pic_width_in_luma_samples``
> + -
> + * - __u16
> + - ``pic_height_in_luma_samples``
> + -
> + * - __u8
> + - ``bit_depth_luma_minus8``
> + -
> + * - __u8
> + - ``bit_depth_chroma_minus8``
> + -
> + * - __u8
> + - ``log2_max_pic_order_cnt_lsb_minus4``
> + -
> + * - __u8
> + - ``sps_max_dec_pic_buffering_minus1``
> + -
> + * - __u8
> + - ``sps_max_num_reorder_pics``
> + -
> + * - __u8
> + - ``sps_max_latency_increase_plus1``
> + -
> + * - __u8
> + - ``log2_min_luma_coding_block_size_minus3``
> + -
> + * - __u8
> + - ``log2_diff_max_min_luma_coding_block_size``
> + -
> + * - __u8
> + - ``log2_min_luma_transform_block_size_minus2``
> + -
> + * - __u8
> + - ``log2_diff_max_min_luma_transform_block_size``
> + -
> + * - __u8
> + - ``max_transform_hierarchy_depth_inter``
> + -
> + * - __u8
> + - ``max_transform_hierarchy_depth_intra``
> + -
> + * - __u8
> + - ``scaling_list_enabled_flag``
> + -
> + * - __u8
> + - ``amp_enabled_flag``
> + -
> + * - __u8
> + - ``sample_adaptive_offset_enabled_flag``
> + -
> + * - __u8
> + - ``pcm_enabled_flag``
> + -
> + * - __u8
> + - ``pcm_sample_bit_depth_luma_minus1``
> + -
> + * - __u8
> + - ``pcm_sample_bit_depth_chroma_minus1``
> + -
> + * - __u8
> + - ``log2_min_pcm_luma_coding_block_size_minus3``
> + -
> + * - __u8
> + - ``log2_diff_max_min_pcm_luma_coding_block_size``
> + -
> + * - __u8
> + - ``pcm_loop_filter_disabled_flag``
> + -
> + * - __u8
> + - ``num_short_term_ref_pic_sets``
> + -
> + * - __u8
> + - ``long_term_ref_pics_present_flag``
> + -
> + * - __u8
> + - ``num_long_term_ref_pics_sps``
> + -
> + * - __u8
> + - ``sps_temporal_mvp_enabled_flag``
> + -
> + * - __u8
> + - ``strong_intra_smoothing_enabled_flag``
> + -
> +
> +``V4L2_CID_MPEG_VIDEO_HEVC_PPS (struct)``
> + Specifies the Picture Parameter Set fields (as extracted from the
> + bitstream) for the associated HEVC slice data.
> + These bitstream parameters are defined according to :ref:`hevc`.
> + Refer to the specification for the documentation of these fields.
> +
> +.. c:type:: v4l2_ctrl_hevc_pps
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_ctrl_hevc_pps
> + :header-rows: 0
> + :stub-columns: 0
> + :widths: 1 1 2
> +
> + * - __u8
> + - ``dependent_slice_segment_flag``
> + -
> + * - __u8
> + - ``output_flag_present_flag``
> + -
> + * - __u8
> + - ``num_extra_slice_header_bits``
> + -
> + * - __u8
> + - ``sign_data_hiding_enabled_flag``
> + -
> + * - __u8
> + - ``cabac_init_present_flag``
> + -
> + * - __s8
> + - ``init_qp_minus26``
> + -
> + * - __u8
> + - ``constrained_intra_pred_flag``
> + -
> + * - __u8
> + - ``transform_skip_enabled_flag``
> + -
> + * - __u8
> + - ``cu_qp_delta_enabled_flag``
> + -
> + * - __u8
> + - ``diff_cu_qp_delta_depth``
> + -
> + * - __s8
> + - ``pps_cb_qp_offset``
> + -
> + * - __s8
> + - ``pps_cr_qp_offset``
> + -
> + * - __u8
> + - ``pps_slice_chroma_qp_offsets_present_flag``
> + -
> + * - __u8
> + - ``weighted_pred_flag``
> + -
> + * - __u8
> + - ``weighted_bipred_flag``
> + -
> + * - __u8
> + - ``transquant_bypass_enabled_flag``
> + -
> + * - __u8
> + - ``tiles_enabled_flag``
> + -
> + * - __u8
> + - ``entropy_coding_sync_enabled_flag``
> + -
> + * - __u8
> + - ``num_tile_columns_minus1``
> + -
> + * - __u8
> + - ``num_tile_rows_minus1``
> + -
> + * - __u8
> + - ``column_width_minus1[20]``
> + -
> + * - __u8
> + - ``row_height_minus1[22]``
> + -
> + * - __u8
> + - ``loop_filter_across_tiles_enabled_flag``
> + -
> + * - __u8
> + - ``pps_loop_filter_across_slices_enabled_flag``
> + -
> + * - __u8
> + - ``deblocking_filter_override_enabled_flag``
> + -
> + * - __u8
> + - ``pps_disable_deblocking_filter_flag``
> + -
> + * - __s8
> + - ``pps_beta_offset_div2``
> + -
> + * - __s8
> + - ``pps_tc_offset_div2``
> + -
> + * - __u8
> + - ``lists_modification_present_flag``
> + -
> + * - __u8
> + - ``log2_parallel_merge_level_minus2``
> + -
> + * - __u8
> + - ``slice_segment_header_extension_present_flag``
> + -
> +
> +``V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS (struct)``
> + Specifies various slice-specific parameters, especially from the NAL unit
> + header, general slice segment header and weighted prediction parameter
> + parts of the bitstream.
> + These bitstream parameters are defined according to :ref:`hevc`.
> + Refer to the specification for the documentation of these fields.
> +
> +.. c:type:: v4l2_ctrl_hevc_slice_params
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_ctrl_hevc_slice_params
> + :header-rows: 0
> + :stub-columns: 0
> + :widths: 1 1 2
> +
> + * - __u32
> + - ``bit_size``
> + - Size (in bits) of the current slice data.
> + * - __u32
> + - ``data_bit_offset``
> + - Offset (in bits) to the video data in the current slice data.
> + * - __u8
> + - ``nal_unit_type``
> + -
> + * - __u8
> + - ``nuh_temporal_id_plus1``
> + -
> + * - __u8
> + - ``slice_type``
> + -
> + (V4L2_HEVC_SLICE_TYPE_I, V4L2_HEVC_SLICE_TYPE_P or
> + V4L2_HEVC_SLICE_TYPE_B).
> + * - __u8
> + - ``colour_plane_id``
> + -
> + * - __u16
> + - ``slice_pic_order_cnt``
> + -
> + * - __u8
> + - ``slice_sao_luma_flag``
> + -
> + * - __u8
> + - ``slice_sao_chroma_flag``
> + -
> + * - __u8
> + - ``slice_temporal_mvp_enabled_flag``
> + -
> + * - __u8
> + - ``num_ref_idx_l0_active_minus1``
> + -
> + * - __u8
> + - ``num_ref_idx_l1_active_minus1``
> + -
> + * - __u8
> + - ``mvd_l1_zero_flag``
> + -
> + * - __u8
> + - ``cabac_init_flag``
> + -
> + * - __u8
> + - ``collocated_from_l0_flag``
> + -
> + * - __u8
> + - ``collocated_ref_idx``
> + -
> + * - __u8
> + - ``five_minus_max_num_merge_cand``
> + -
> + * - __u8
> + - ``use_integer_mv_flag``
> + -
> + * - __s8
> + - ``slice_qp_delta``
> + -
> + * - __s8
> + - ``slice_cb_qp_offset``
> + -
> + * - __s8
> + - ``slice_cr_qp_offset``
> + -
> + * - __s8
> + - ``slice_act_y_qp_offset``
> + -
> + * - __s8
> + - ``slice_act_cb_qp_offset``
> + -
> + * - __s8
> + - ``slice_act_cr_qp_offset``
> + -
> + * - __u8
> + - ``slice_deblocking_filter_disabled_flag``
> + -
> + * - __s8
> + - ``slice_beta_offset_div2``
> + -
> + * - __s8
> + - ``slice_tc_offset_div2``
> + -
> + * - __u8
> + - ``slice_loop_filter_across_slices_enabled_flag``
> + -
> + * - __u8
> + - ``pic_struct``
> + -
> + * - struct :c:type:`v4l2_hevc_dpb_entry`
> + - ``dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX]``
> + - The decoded picture buffer, for meta-data about reference frames.
> + * - __u8
> + - ``num_active_dpb_entries``
> + - The number of entries in ``dpb``.
> + * - __u8
> + - ``ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX]``
> + - The list of L0 reference elements as indices in the DPB.
> + * - __u8
> + - ``ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX]``
> + - The list of L1 reference elements as indices in the DPB.
> + * - __u8
> + - ``num_rps_poc_st_curr_before``
> + - The number of reference pictures in the short-term set that come before
> + the current frame.
> + * - __u8
> + - ``num_rps_poc_st_curr_after``
> + - The number of reference pictures in the short-term set that come after
> + the current frame.
> + * - __u8
> + - ``num_rps_poc_lt_curr``
> + - The number of reference pictures in the long-term set.
> + * - struct :c:type:`v4l2_hevc_pred_weight_table`
> + - ``pred_weight_table``
> + - The prediction weight coefficients for inter-picture prediction.
> +
> +.. c:type:: v4l2_hevc_dpb_entry
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_hevc_dpb_entry
> + :header-rows: 0
> + :stub-columns: 0
> + :widths: 1 1 2
> +
> + * - __u32
> + - ``buffer_tag``

It's called 'tag' in v4l2_h264_dpb_entry. Probably a good idea to keep the same
terminology. I have no preference, as long as the names are consistent.

> + - The V4L2 buffer tag that matches the associated reference picture.
> + * - __u8
> + - ``rps``
> + - The reference set for the reference frame
> + (V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE,
> + V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER or
> + V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR)
> + * - __u8
> + - ``field_pic``
> + - Whether the reference is a field picture or a frame.
> + * - __u16
> + - ``pic_order_cnt[2]``
> + - The picture order count of the reference. Only the first element of the
> + array is used for frame pictures, while the first element identifies the
> + top field and the second the bottom field in field-coded pictures.
> +
> +.. c:type:: v4l2_hevc_pred_weight_table
> +
> +.. cssclass:: longtable
> +
> +.. flat-table:: struct v4l2_hevc_pred_weight_table
> + :header-rows: 0
> + :stub-columns: 0
> + :widths: 1 1 2
> +
> + * - __u8
> + - ``luma_log2_weight_denom``
> + -
> + * - __s8
> + - ``delta_chroma_log2_weight_denom``
> + -
> + * - __s8
> + - ``delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX]``
> + -
> + * - __s8
> + - ``luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX]``
> + -
> + * - __s8
> + - ``delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2]``
> + -
> + * - __s8
> + - ``chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2]``
> + -
> + * - __s8
> + - ``delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX]``
> + -
> + * - __s8
> + - ``luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX]``
> + -
> + * - __s8
> + - ``delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2]``
> + -
> + * - __s8
> + - ``chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2]``
> + -
> +
> +
> MFC 5.1 MPEG Controls
> ---------------------
>
> diff --git a/Documentation/media/uapi/v4l/pixfmt-compressed.rst b/Documentation/media/uapi/v4l/pixfmt-compressed.rst
> index f15fc1c8d479..572a43bfe9c9 100644
> --- a/Documentation/media/uapi/v4l/pixfmt-compressed.rst
> +++ b/Documentation/media/uapi/v4l/pixfmt-compressed.rst
> @@ -131,6 +131,21 @@ Compressed Formats
> - ``V4L2_PIX_FMT_HEVC``
> - 'HEVC'
> - HEVC/H.265 video elementary stream.
> + * .. _V4L2-PIX-FMT-HEVC-SLICE:
> +
> + - ``V4L2_PIX_FMT_HEVC_SLICE``
> + - 'S265'
> + - HEVC parsed slice data, as extracted from the HEVC bitstream.
> + This format is adapted for stateless video decoders that implement a
> + HEVC pipeline (using the :ref:`codec` and :ref:`media-request-api`).
> + Metadata associated with the frame to decode is required to be passed
> + through the following controls:
> + * ``V4L2_CID_MPEG_VIDEO_HEVC_SPS``
> + * ``V4L2_CID_MPEG_VIDEO_HEVC_PPS``
> + * ``V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS``
> + See the :ref:`associated Codec Control IDs <v4l2-mpeg-hevc>`.
> + Buffers associated with this pixel format must contain the appropriate
> + number of macroblocks to decode a full corresponding frame.
> * .. _V4L2-PIX-FMT-FWHT:
>
> - ``V4L2_PIX_FMT_FWHT``
> diff --git a/Documentation/media/uapi/v4l/vidioc-queryctrl.rst b/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
> index 38a9c988124c..8e0cc836058d 100644
> --- a/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
> +++ b/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
> @@ -466,6 +466,24 @@ See also the examples in :ref:`control`.
> - n/a
> - A struct :c:type:`v4l2_ctrl_h264_decode_param`, containing H264
> decode parameters for stateless video decoders.
> + * - ``V4L2_CTRL_TYPE_HEVC_SPS``
> + - n/a
> + - n/a
> + - n/a
> + - A struct :c:type:`v4l2_ctrl_hevc_sps`, containing HEVC Sequence
> + Parameter Set for stateless video decoders.
> + * - ``V4L2_CTRL_TYPE_HEVC_PPS``
> + - n/a
> + - n/a
> + - n/a
> + - A struct :c:type:`v4l2_ctrl_hevc_pps`, containing HEVC Picture
> + Parameter Set for stateless video decoders.
> + * - ``V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS``
> + - n/a
> + - n/a
> + - n/a
> + - A struct :c:type:`v4l2_ctrl_hevc_slice_params`, containing HEVC
> + slice parameters for stateless video decoders.
>
> .. tabularcolumns:: |p{6.6cm}|p{2.2cm}|p{8.7cm}|
>
> diff --git a/Documentation/media/videodev2.h.rst.exceptions b/Documentation/media/videodev2.h.rst.exceptions
> index 99f1bd2bc44c..27978d8b18f5 100644
> --- a/Documentation/media/videodev2.h.rst.exceptions
> +++ b/Documentation/media/videodev2.h.rst.exceptions
> @@ -138,6 +138,9 @@ replace symbol V4L2_CTRL_TYPE_H264_PPS :c:type:`v4l2_ctrl_type`
> replace symbol V4L2_CTRL_TYPE_H264_SCALING_MATRIX :c:type:`v4l2_ctrl_type`
> replace symbol V4L2_CTRL_TYPE_H264_SLICE_PARAMS :c:type:`v4l2_ctrl_type`
> replace symbol V4L2_CTRL_TYPE_H264_DECODE_PARAMS :c:type:`v4l2_ctrl_type`
> +replace symbol V4L2_CTRL_TYPE_HEVC_SPS :c:type:`v4l2_ctrl_type`
> +replace symbol V4L2_CTRL_TYPE_HEVC_PPS :c:type:`v4l2_ctrl_type`
> +replace symbol V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS :c:type:`v4l2_ctrl_type`
>
> # V4L2 capability defines
> replace define V4L2_CAP_VIDEO_CAPTURE device-capabilities
> diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c
> index e96c453208e8..9af17815ecc3 100644
> --- a/drivers/media/v4l2-core/v4l2-ctrls.c
> +++ b/drivers/media/v4l2-core/v4l2-ctrls.c
> @@ -913,6 +913,9 @@ const char *v4l2_ctrl_get_name(u32 id)
> case V4L2_CID_MPEG_VIDEO_HEVC_SIZE_OF_LENGTH_FIELD: return "HEVC Size of Length Field";
> case V4L2_CID_MPEG_VIDEO_REF_NUMBER_FOR_PFRAMES: return "Reference Frames for a P-Frame";
> case V4L2_CID_MPEG_VIDEO_PREPEND_SPSPPS_TO_IDR: return "Prepend SPS and PPS to IDR";
> + case V4L2_CID_MPEG_VIDEO_HEVC_SPS: return "HEVC Sequence Parameter Set";
> + case V4L2_CID_MPEG_VIDEO_HEVC_PPS: return "HEVC Picture Parameter Set";

Hmm. The H264 control descriptions just say "H264 SPS" and "H264 PPS". I like
the spelled-out names used here better. Can you update the H264 control
descriptions to match?

> + case V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS: return "HEVC Slice Parameters";
>
> /* CAMERA controls */
> /* Keep the order of the 'case's the same as in v4l2-controls.h! */
> @@ -1320,6 +1323,15 @@ void v4l2_ctrl_fill(u32 id, const char **name, enum v4l2_ctrl_type *type,
> case V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS:
> *type = V4L2_CTRL_TYPE_H264_DECODE_PARAMS;
> break;
> + case V4L2_CID_MPEG_VIDEO_HEVC_SPS:
> + *type = V4L2_CTRL_TYPE_HEVC_SPS;
> + break;
> + case V4L2_CID_MPEG_VIDEO_HEVC_PPS:
> + *type = V4L2_CTRL_TYPE_HEVC_PPS;
> + break;
> + case V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS:
> + *type = V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS;
> + break;
> default:
> *type = V4L2_CTRL_TYPE_INTEGER;
> break;
> @@ -1692,6 +1704,11 @@ static int std_validate(const struct v4l2_ctrl *ctrl, u32 idx,
> case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
> return 0;
>
> + case V4L2_CTRL_TYPE_HEVC_SPS:
> + case V4L2_CTRL_TYPE_HEVC_PPS:
> + case V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS:
> + return 0;
> +
> default:
> return -EINVAL;
> }
> @@ -2287,6 +2304,15 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl,
> case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
> elem_size = sizeof(struct v4l2_ctrl_h264_decode_param);
> break;
> + case V4L2_CTRL_TYPE_HEVC_SPS:
> + elem_size = sizeof(struct v4l2_ctrl_hevc_sps);
> + break;
> + case V4L2_CTRL_TYPE_HEVC_PPS:
> + elem_size = sizeof(struct v4l2_ctrl_hevc_pps);
> + break;
> + case V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS:
> + elem_size = sizeof(struct v4l2_ctrl_hevc_slice_params);
> + break;
> default:
> if (type < V4L2_CTRL_COMPOUND_TYPES)
> elem_size = sizeof(s32);
> diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
> index aa63f1794272..7bec91c6effe 100644
> --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> @@ -1321,6 +1321,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
> case V4L2_PIX_FMT_VP8: descr = "VP8"; break;
> case V4L2_PIX_FMT_VP9: descr = "VP9"; break;
> case V4L2_PIX_FMT_HEVC: descr = "HEVC"; break; /* aka H.265 */
> + case V4L2_PIX_FMT_HEVC_SLICE: descr = "HEVC Parsed Slice Data"; break;

H264 calls it "H.264 Parsed Slice". Again, please pick one or the other.
MPEG2 calls it "Parsed Slice Data" as well, so I'd say the H264 code should change.

> case V4L2_PIX_FMT_FWHT: descr = "FWHT"; break; /* used in vicodec */
> case V4L2_PIX_FMT_CPIA1: descr = "GSPCA CPiA YUV"; break;
> case V4L2_PIX_FMT_WNVA: descr = "WNVA"; break;
> diff --git a/include/media/v4l2-ctrls.h b/include/media/v4l2-ctrls.h
> index b4ca95710d2d..11664c5c3706 100644
> --- a/include/media/v4l2-ctrls.h
> +++ b/include/media/v4l2-ctrls.h
> @@ -48,6 +48,9 @@ struct poll_table_struct;
> * @p_h264_scal_mtrx: Pointer to a struct v4l2_ctrl_h264_scaling_matrix.
> * @p_h264_slice_param: Pointer to a struct v4l2_ctrl_h264_slice_param.
> * @p_h264_decode_param: Pointer to a struct v4l2_ctrl_h264_decode_param.
> + * @p_hevc_sps: Pointer to an HEVC sequence parameter set structure.
> + * @p_hevc_pps: Pointer to an HEVC picture parameter set structure.
> + * @p_hevc_slice_params: Pointer to an HEVC slice parameters structure.
> * @p: Pointer to a compound value.
> */
> union v4l2_ctrl_ptr {
> @@ -64,6 +67,9 @@ union v4l2_ctrl_ptr {
> struct v4l2_ctrl_h264_scaling_matrix *p_h264_scal_mtrx;
> struct v4l2_ctrl_h264_slice_param *p_h264_slice_param;
> struct v4l2_ctrl_h264_decode_param *p_h264_decode_param;
> + struct v4l2_ctrl_hevc_sps *p_hevc_sps;
> + struct v4l2_ctrl_hevc_pps *p_hevc_pps;
> + struct v4l2_ctrl_hevc_slice_params *p_hevc_slice_params;
> void *p;
> };
>
> diff --git a/include/uapi/linux/v4l2-controls.h b/include/uapi/linux/v4l2-controls.h
> index 628c0cdb51d9..5bbf63b2dad1 100644
> --- a/include/uapi/linux/v4l2-controls.h
> +++ b/include/uapi/linux/v4l2-controls.h
> @@ -709,6 +709,9 @@ enum v4l2_cid_mpeg_video_hevc_size_of_length_field {
> #define V4L2_CID_MPEG_VIDEO_HEVC_HIER_CODING_L6_BR (V4L2_CID_MPEG_BASE + 642)
> #define V4L2_CID_MPEG_VIDEO_REF_NUMBER_FOR_PFRAMES (V4L2_CID_MPEG_BASE + 643)
> #define V4L2_CID_MPEG_VIDEO_PREPEND_SPSPPS_TO_IDR (V4L2_CID_MPEG_BASE + 644)
> +#define V4L2_CID_MPEG_VIDEO_HEVC_SPS (V4L2_CID_MPEG_BASE + 645)
> +#define V4L2_CID_MPEG_VIDEO_HEVC_PPS (V4L2_CID_MPEG_BASE + 646)
> +#define V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS (V4L2_CID_MPEG_BASE + 647)
>
> /* MPEG-class control IDs specific to the CX2341x driver as defined by V4L2 */
> #define V4L2_CID_MPEG_CX2341X_BASE (V4L2_CTRL_CLASS_MPEG | 0x1000)
> @@ -1324,4 +1327,156 @@ struct v4l2_ctrl_h264_decode_param {
> struct v4l2_h264_dpb_entry dpb[16];
> };
>
> +#define V4L2_HEVC_SLICE_TYPE_B 0
> +#define V4L2_HEVC_SLICE_TYPE_P 1
> +#define V4L2_HEVC_SLICE_TYPE_I 2
> +
> +struct v4l2_ctrl_hevc_sps {
> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Sequence parameter set */
> + __u8 chroma_format_idc;
> + __u8 separate_colour_plane_flag;
> + __u16 pic_width_in_luma_samples;
> + __u16 pic_height_in_luma_samples;
> + __u8 bit_depth_luma_minus8;
> + __u8 bit_depth_chroma_minus8;
> + __u8 log2_max_pic_order_cnt_lsb_minus4;
> + __u8 sps_max_dec_pic_buffering_minus1;
> + __u8 sps_max_num_reorder_pics;
> + __u8 sps_max_latency_increase_plus1;
> + __u8 log2_min_luma_coding_block_size_minus3;
> + __u8 log2_diff_max_min_luma_coding_block_size;
> + __u8 log2_min_luma_transform_block_size_minus2;
> + __u8 log2_diff_max_min_luma_transform_block_size;
> + __u8 max_transform_hierarchy_depth_inter;
> + __u8 max_transform_hierarchy_depth_intra;
> + __u8 scaling_list_enabled_flag;
> + __u8 amp_enabled_flag;
> + __u8 sample_adaptive_offset_enabled_flag;
> + __u8 pcm_enabled_flag;
> + __u8 pcm_sample_bit_depth_luma_minus1;
> + __u8 pcm_sample_bit_depth_chroma_minus1;
> + __u8 log2_min_pcm_luma_coding_block_size_minus3;
> + __u8 log2_diff_max_min_pcm_luma_coding_block_size;
> + __u8 pcm_loop_filter_disabled_flag;
> + __u8 num_short_term_ref_pic_sets;
> + __u8 long_term_ref_pics_present_flag;
> + __u8 num_long_term_ref_pics_sps;
> + __u8 sps_temporal_mvp_enabled_flag;
> + __u8 strong_intra_smoothing_enabled_flag;

Same comment as for H264: keep the structs 4-byte aligned and ensure that
there are no holes.
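
To illustrate with a struct from this patch: the sketch below copies
the member layout of v4l2_hevc_dpb_entry into a userspace struct
(fixed-width types standing in for __u32/__u16/__u8) and counts the
bytes not covered by any named member. The struct names, the helper
functions and the padding member are made up for illustration, and
the asserted offsets assume a common ABI where uint32_t is 4-byte
aligned.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof, size_t */
#include <stdint.h>

/* Same member order as v4l2_hevc_dpb_entry in this patch. */
struct dpb_entry_as_posted {
	uint32_t buffer_tag;
	uint8_t  rps;
	uint8_t  field_pic;
	uint16_t pic_order_cnt[2];
};

/* Hypothetical variant with the tail padding made explicit. */
struct dpb_entry_padded {
	uint32_t buffer_tag;
	uint8_t  rps;
	uint8_t  field_pic;
	uint16_t pic_order_cnt[2];
	uint8_t  padding[2];
};

/* Bytes taken by the named members of the posted layout. */
#define DPB_MEMBER_BYTES \
	(sizeof(uint32_t) + 2 * sizeof(uint8_t) + 2 * sizeof(uint16_t))

/* Padding the compiler added to the posted layout. */
static size_t dpb_entry_as_posted_hole_bytes(void)
{
	return sizeof(struct dpb_entry_as_posted) - DPB_MEMBER_BYTES;
}

/* Padding left once the two tail bytes are declared explicitly. */
static size_t dpb_entry_padded_hole_bytes(void)
{
	return sizeof(struct dpb_entry_padded) - (DPB_MEMBER_BYTES + 2);
}

/* Compile-time checks that pin the layout down. */
static_assert(offsetof(struct dpb_entry_padded, pic_order_cnt) == 6,
	      "no hole before pic_order_cnt");
static_assert(offsetof(struct dpb_entry_padded, padding) == 10,
	      "no hole before padding");
static_assert(sizeof(struct dpb_entry_padded) == 12,
	      "no tail hole");
static_assert(sizeof(struct dpb_entry_padded) % 4 == 0,
	      "size is a multiple of 4 bytes");
```

On such an ABI the posted layout has 10 bytes of members in a
12-byte struct, so the first helper reports 2 hidden tail bytes and
the second reports 0.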

> +};
> +
> +struct v4l2_ctrl_hevc_pps {
> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture parameter set */
> + __u8 dependent_slice_segment_flag;
> + __u8 output_flag_present_flag;
> + __u8 num_extra_slice_header_bits;
> + __u8 sign_data_hiding_enabled_flag;
> + __u8 cabac_init_present_flag;
> + __s8 init_qp_minus26;
> + __u8 constrained_intra_pred_flag;
> + __u8 transform_skip_enabled_flag;
> + __u8 cu_qp_delta_enabled_flag;
> + __u8 diff_cu_qp_delta_depth;
> + __s8 pps_cb_qp_offset;
> + __s8 pps_cr_qp_offset;
> + __u8 pps_slice_chroma_qp_offsets_present_flag;
> + __u8 weighted_pred_flag;
> + __u8 weighted_bipred_flag;
> + __u8 transquant_bypass_enabled_flag;
> + __u8 tiles_enabled_flag;
> + __u8 entropy_coding_sync_enabled_flag;
> + __u8 num_tile_columns_minus1;
> + __u8 num_tile_rows_minus1;
> + __u8 column_width_minus1[20];
> + __u8 row_height_minus1[22];
> + __u8 loop_filter_across_tiles_enabled_flag;
> + __u8 pps_loop_filter_across_slices_enabled_flag;
> + __u8 deblocking_filter_override_enabled_flag;
> + __u8 pps_disable_deblocking_filter_flag;
> + __s8 pps_beta_offset_div2;
> + __s8 pps_tc_offset_div2;
> + __u8 lists_modification_present_flag;
> + __u8 log2_parallel_merge_level_minus2;
> + __u8 slice_segment_header_extension_present_flag;
> +};
> +
> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER 0x02
> +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR 0x03
> +
> +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX 16
> +
> +struct v4l2_hevc_dpb_entry {
> + __u32 buffer_tag;
> + __u8 rps;
> + __u8 field_pic;
> + __u16 pic_order_cnt[2];
> +};
> +
> +struct v4l2_hevc_pred_weight_table {
> + __u8 luma_log2_weight_denom;
> + __s8 delta_chroma_log2_weight_denom;
> +
> + __s8 delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> + __s8 luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> + __s8 delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> + __s8 chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> +
> + __s8 delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> + __s8 luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> + __s8 delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> + __s8 chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> +};
> +
> +struct v4l2_ctrl_hevc_slice_params {
> + __u32 bit_size;
> + __u32 data_bit_offset;
> +
> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
> + __u8 nal_unit_type;
> + __u8 nuh_temporal_id_plus1;
> +
> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> + __u8 slice_type;
> + __u8 colour_plane_id;
> + __u16 slice_pic_order_cnt;
> + __u8 slice_sao_luma_flag;
> + __u8 slice_sao_chroma_flag;
> + __u8 slice_temporal_mvp_enabled_flag;
> + __u8 num_ref_idx_l0_active_minus1;
> + __u8 num_ref_idx_l1_active_minus1;
> + __u8 mvd_l1_zero_flag;
> + __u8 cabac_init_flag;
> + __u8 collocated_from_l0_flag;
> + __u8 collocated_ref_idx;
> + __u8 five_minus_max_num_merge_cand;
> + __u8 use_integer_mv_flag;
> + __s8 slice_qp_delta;
> + __s8 slice_cb_qp_offset;
> + __s8 slice_cr_qp_offset;
> + __s8 slice_act_y_qp_offset;
> + __s8 slice_act_cb_qp_offset;
> + __s8 slice_act_cr_qp_offset;
> + __u8 slice_deblocking_filter_disabled_flag;
> + __s8 slice_beta_offset_div2;
> + __s8 slice_tc_offset_div2;
> + __u8 slice_loop_filter_across_slices_enabled_flag;
> +
> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
> + __u8 pic_struct;
> +
> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> + struct v4l2_hevc_dpb_entry dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> + __u8 num_active_dpb_entries;
> + __u8 ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> + __u8 ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> +
> + __u8 num_rps_poc_st_curr_before;
> + __u8 num_rps_poc_st_curr_after;
> + __u8 num_rps_poc_lt_curr;
> +
> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Weighted prediction parameter */
> + struct v4l2_hevc_pred_weight_table pred_weight_table;
> +};
> +
> #endif
> diff --git a/include/uapi/linux/v4l2-controls.h.rej b/include/uapi/linux/v4l2-controls.h.rej
> deleted file mode 100644
> index 1fbb7bf8daa7..000000000000
> --- a/include/uapi/linux/v4l2-controls.h.rej
> +++ /dev/null
> @@ -1,187 +0,0 @@
> ---- include/uapi/linux/v4l2-controls.h
> -+++ include/uapi/linux/v4l2-controls.h
> -@@ -50,6 +50,8 @@
> - #ifndef __LINUX_V4L2_CONTROLS_H
> - #define __LINUX_V4L2_CONTROLS_H
> -
> -+#include <linux/types.h>
> -+
> - /* Control classes */
> - #define V4L2_CTRL_CLASS_USER 0x00980000 /* Old-style 'user' controls */
> - #define V4L2_CTRL_CLASS_MPEG 0x00990000 /* MPEG-compression controls */
> -@@ -534,6 +536,12 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type {
> - };
> - #define V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER (V4L2_CID_MPEG_BASE+381)
> - #define V4L2_CID_MPEG_VIDEO_H264_HIERARCHICAL_CODING_LAYER_QP (V4L2_CID_MPEG_BASE+382)
> -+#define V4L2_CID_MPEG_VIDEO_H264_SPS (V4L2_CID_MPEG_BASE+383)
> -+#define V4L2_CID_MPEG_VIDEO_H264_PPS (V4L2_CID_MPEG_BASE+384)
> -+#define V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX (V4L2_CID_MPEG_BASE+385)
> -+#define V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS (V4L2_CID_MPEG_BASE+386)
> -+#define V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS (V4L2_CID_MPEG_BASE+387)
> -+
> - #define V4L2_CID_MPEG_VIDEO_MPEG4_I_FRAME_QP (V4L2_CID_MPEG_BASE+400)
> - #define V4L2_CID_MPEG_VIDEO_MPEG4_P_FRAME_QP (V4L2_CID_MPEG_BASE+401)
> - #define V4L2_CID_MPEG_VIDEO_MPEG4_B_FRAME_QP (V4L2_CID_MPEG_BASE+402)
> -@@ -1156,4 +1164,162 @@ struct v4l2_ctrl_mpeg2_quantization {
> - __u8 chroma_non_intra_quantiser_matrix[64];
> - };
> -
> -+/* Compounds controls */
> -+
> -+#define V4L2_H264_SPS_CONSTRAINT_SET0_FLAG 0x01
> -+#define V4L2_H264_SPS_CONSTRAINT_SET1_FLAG 0x02
> -+#define V4L2_H264_SPS_CONSTRAINT_SET2_FLAG 0x04
> -+#define V4L2_H264_SPS_CONSTRAINT_SET3_FLAG 0x08
> -+#define V4L2_H264_SPS_CONSTRAINT_SET4_FLAG 0x10
> -+#define V4L2_H264_SPS_CONSTRAINT_SET5_FLAG 0x20
> -+
> -+#define V4L2_H264_SPS_FLAG_SEPARATE_COLOUR_PLANE 0x01
> -+#define V4L2_H264_SPS_FLAG_QPPRIME_Y_ZERO_TRANSFORM_BYPASS 0x02
> -+#define V4L2_H264_SPS_FLAG_DELTA_PIC_ORDER_ALWAYS_ZERO 0x04
> -+#define V4L2_H264_SPS_FLAG_GAPS_IN_FRAME_NUM_VALUE_ALLOWED 0x08
> -+#define V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY 0x10
> -+#define V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD 0x20
> -+#define V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE 0x40
> -+
> -+struct v4l2_ctrl_h264_sps {
> -+ __u8 profile_idc;
> -+ __u8 constraint_set_flags;
> -+ __u8 level_idc;
> -+ __u8 seq_parameter_set_id;
> -+ __u8 chroma_format_idc;
> -+ __u8 bit_depth_luma_minus8;
> -+ __u8 bit_depth_chroma_minus8;
> -+ __u8 log2_max_frame_num_minus4;
> -+ __u8 pic_order_cnt_type;
> -+ __u8 log2_max_pic_order_cnt_lsb_minus4;
> -+ __u8 max_num_ref_frames;
> -+ __u8 num_ref_frames_in_pic_order_cnt_cycle;
> -+ __s32 offset_for_ref_frame[255];
> -+ __s32 offset_for_non_ref_pic;
> -+ __s32 offset_for_top_to_bottom_field;
> -+ __u16 pic_width_in_mbs_minus1;
> -+ __u16 pic_height_in_map_units_minus1;
> -+ __u8 flags;
> -+};
> -+
> -+#define V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE 0x0001
> -+#define V4L2_H264_PPS_FLAG_BOTTOM_FIELD_PIC_ORDER_IN_FRAME_PRESENT 0x0002
> -+#define V4L2_H264_PPS_FLAG_WEIGHTED_PRED 0x0004
> -+#define V4L2_H264_PPS_FLAG_DEBLOCKING_FILTER_CONTROL_PRESENT 0x0008
> -+#define V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED 0x0010
> -+#define V4L2_H264_PPS_FLAG_REDUNDANT_PIC_CNT_PRESENT 0x0020
> -+#define V4L2_H264_PPS_FLAG_TRANSFORM_8X8_MODE 0x0040
> -+#define V4L2_H264_PPS_FLAG_PIC_SCALING_MATRIX_PRESENT 0x0080
> -+
> -+struct v4l2_ctrl_h264_pps {
> -+ __u8 pic_parameter_set_id;
> -+ __u8 seq_parameter_set_id;
> -+ __u8 num_slice_groups_minus1;
> -+ __u8 num_ref_idx_l0_default_active_minus1;
> -+ __u8 num_ref_idx_l1_default_active_minus1;
> -+ __u8 weighted_bipred_idc;
> -+ __s8 pic_init_qp_minus26;
> -+ __s8 pic_init_qs_minus26;
> -+ __s8 chroma_qp_index_offset;
> -+ __s8 second_chroma_qp_index_offset;
> -+ __u8 flags;
> -+};
> -+
> -+struct v4l2_ctrl_h264_scaling_matrix {
> -+ __u8 scaling_list_4x4[6][16];
> -+ __u8 scaling_list_8x8[6][64];
> -+};
> -+
> -+struct v4l2_h264_weight_factors {
> -+ __s8 luma_weight[32];
> -+ __s8 luma_offset[32];
> -+ __s8 chroma_weight[32][2];
> -+ __s8 chroma_offset[32][2];
> -+};
> -+
> -+struct v4l2_h264_pred_weight_table {
> -+ __u8 luma_log2_weight_denom;
> -+ __u8 chroma_log2_weight_denom;
> -+ struct v4l2_h264_weight_factors weight_factors[2];
> -+};
> -+
> -+#define V4L2_H264_SLICE_TYPE_P 0
> -+#define V4L2_H264_SLICE_TYPE_B 1
> -+#define V4L2_H264_SLICE_TYPE_I 2
> -+#define V4L2_H264_SLICE_TYPE_SP 3
> -+#define V4L2_H264_SLICE_TYPE_SI 4
> -+
> -+#define V4L2_H264_SLICE_FLAG_FIELD_PIC 0x01
> -+#define V4L2_H264_SLICE_FLAG_BOTTOM_FIELD 0x02
> -+#define V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED 0x04
> -+#define V4L2_H264_SLICE_FLAG_SP_FOR_SWITCH 0x08
> -+
> -+struct v4l2_ctrl_h264_slice_param {
> -+ /* Size in bytes, including header */
> -+ __u32 size;
> -+ /* Offset in bits to slice_data() from the beginning of this slice. */
> -+ __u32 header_bit_size;
> -+
> -+ __u16 first_mb_in_slice;
> -+ __u8 slice_type;
> -+ __u8 pic_parameter_set_id;
> -+ __u8 colour_plane_id;
> -+ __u16 frame_num;
> -+ __u16 idr_pic_id;
> -+ __u16 pic_order_cnt_lsb;
> -+ __s32 delta_pic_order_cnt_bottom;
> -+ __s32 delta_pic_order_cnt0;
> -+ __s32 delta_pic_order_cnt1;
> -+ __u8 redundant_pic_cnt;
> -+
> -+ struct v4l2_h264_pred_weight_table pred_weight_table;
> -+ /* Size in bits of dec_ref_pic_marking() syntax element. */
> -+ __u32 dec_ref_pic_marking_bit_size;
> -+ /* Size in bits of pic order count syntax. */
> -+ __u32 pic_order_cnt_bit_size;
> -+
> -+ __u8 cabac_init_idc;
> -+ __s8 slice_qp_delta;
> -+ __s8 slice_qs_delta;
> -+ __u8 disable_deblocking_filter_idc;
> -+ __s8 slice_alpha_c0_offset_div2;
> -+ __s8 slice_beta_offset_div2;
> -+ __u32 slice_group_change_cycle;
> -+
> -+ __u8 num_ref_idx_l0_active_minus1;
> -+ __u8 num_ref_idx_l1_active_minus1;
> -+ /* Entries on each list are indices
> -+ * into v4l2_ctrl_h264_decode_param.dpb[]. */
> -+ __u8 ref_pic_list0[32];
> -+ __u8 ref_pic_list1[32];
> -+
> -+ __u8 flags;
> -+};
> -+
> -+#define V4L2_H264_DPB_ENTRY_FLAG_VALID 0x01
> -+#define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE 0x02
> -+#define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM 0x04
> -+
> -+struct v4l2_h264_dpb_entry {
> -+ __u32 tag;
> -+ __u16 frame_num;
> -+ __u16 pic_num;
> -+ /* Note that field is indicated by v4l2_buffer.field */
> -+ __s32 top_field_order_cnt;
> -+ __s32 bottom_field_order_cnt;
> -+ __u8 flags; /* V4L2_H264_DPB_ENTRY_FLAG_* */
> -+};
> -+
> -+struct v4l2_ctrl_h264_decode_param {
> -+ __u32 num_slices;
> -+ __u8 idr_pic_flag;
> -+ __u8 nal_ref_idc;
> -+ __s32 top_field_order_cnt;
> -+ __s32 bottom_field_order_cnt;
> -+ __u8 ref_pic_list_p0[32];
> -+ __u8 ref_pic_list_b0[32];
> -+ __u8 ref_pic_list_b1[32];
> -+ struct v4l2_h264_dpb_entry dpb[16];
> -+};
> -+
> - #endif
> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> index dd028e0bf306..26f5bec9e988 100644
> --- a/include/uapi/linux/videodev2.h
> +++ b/include/uapi/linux/videodev2.h
> @@ -655,6 +655,7 @@ struct v4l2_pix_format {
> #define V4L2_PIX_FMT_VP8 v4l2_fourcc('V', 'P', '8', '0') /* VP8 */
> #define V4L2_PIX_FMT_VP9 v4l2_fourcc('V', 'P', '9', '0') /* VP9 */
> #define V4L2_PIX_FMT_HEVC v4l2_fourcc('H', 'E', 'V', 'C') /* HEVC aka H.265 */
> +#define V4L2_PIX_FMT_HEVC_SLICE v4l2_fourcc('S', '2', '6', '5') /* HEVC parsed slices */
> #define V4L2_PIX_FMT_FWHT v4l2_fourcc('F', 'W', 'H', 'T') /* Fast Walsh Hadamard Transform (vicodec) */
>
> /* Vendor-specific formats */
> @@ -1637,6 +1638,9 @@ struct v4l2_ext_control {
> struct v4l2_ctrl_h264_scaling_matrix __user *p_h264_scal_mtrx;
> struct v4l2_ctrl_h264_slice_param __user *p_h264_slice_param;
> struct v4l2_ctrl_h264_decode_param __user *p_h264_decode_param;
> + struct v4l2_ctrl_hevc_sps __user *p_hevc_sps;
> + struct v4l2_ctrl_hevc_pps __user *p_hevc_pps;
> + struct v4l2_ctrl_hevc_slice_params __user *p_hevc_slice_params;
> void __user *ptr;
> };
> } __attribute__ ((packed));
> @@ -1689,6 +1693,9 @@ enum v4l2_ctrl_type {
> V4L2_CTRL_TYPE_H264_SCALING_MATRIX = 0x0107,
> V4L2_CTRL_TYPE_H264_SLICE_PARAMS = 0x0108,
> V4L2_CTRL_TYPE_H264_DECODE_PARAMS = 0x0109,
> + V4L2_CTRL_TYPE_HEVC_SPS = 0x0110,
> + V4L2_CTRL_TYPE_HEVC_PPS = 0x0111,
> + V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS = 0x0112,
> };
>
> /* Used in the VIDIOC_QUERYCTRL ioctl for querying controls */
>

Regards,

Hans

2018-12-05 21:00:40

by Jernej Škrabec

Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

Hi!

On Friday, 23 November 2018 at 14:02:08 CET, Paul Kocialkowski wrote:
> This introduces the required definitions for HEVC decoding support with
> stateless VPUs. The controls associated to the HEVC slice format provide
> the required meta-data for decoding slices extracted from the bitstream.
>
> This interface comes with the following limitations:
> * No custom quantization matrices (scaling lists);
> * Support for a single temporal layer only;
> * No slice entry point offsets support;
> * No conformance window support;
> * No VUI parameters support;
> * No support for SPS extensions: range, multilayer, 3d, scc, 4 bits;
> * No support for PPS extensions: range, multilayer, 3d, scc, 4 bits.
>
> Signed-off-by: Paul Kocialkowski <[email protected]>
> ---

<snip>

> diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c
> index e96c453208e8..9af17815ecc3 100644
> --- a/drivers/media/v4l2-core/v4l2-ctrls.c
> +++ b/drivers/media/v4l2-core/v4l2-ctrls.c
> @@ -913,6 +913,9 @@ const char *v4l2_ctrl_get_name(u32 id)
> case V4L2_CID_MPEG_VIDEO_HEVC_SIZE_OF_LENGTH_FIELD: return "HEVC Size of Length Field";
> case V4L2_CID_MPEG_VIDEO_REF_NUMBER_FOR_PFRAMES: return "Reference Frames for a P-Frame";
> case V4L2_CID_MPEG_VIDEO_PREPEND_SPSPPS_TO_IDR: return "Prepend SPS and PPS to IDR";
> + case V4L2_CID_MPEG_VIDEO_HEVC_SPS: return "HEVC Sequence Parameter Set";
> + case V4L2_CID_MPEG_VIDEO_HEVC_PPS: return "HEVC Picture Parameter Set";
> + case V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS: return "HEVC Slice Parameters";
>
> /* CAMERA controls */
> /* Keep the order of the 'case's the same as in v4l2-controls.h! */
> @@ -1320,6 +1323,15 @@ void v4l2_ctrl_fill(u32 id, const char **name, enum v4l2_ctrl_type *type,
> case V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS:
> *type = V4L2_CTRL_TYPE_H264_DECODE_PARAMS;
> break;
> + case V4L2_CID_MPEG_VIDEO_HEVC_SPS:
> + *type = V4L2_CTRL_TYPE_HEVC_SPS;
> + break;
> + case V4L2_CID_MPEG_VIDEO_HEVC_PPS:
> + *type = V4L2_CTRL_TYPE_HEVC_PPS;
> + break;
> + case V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS:
> + *type = V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS;
> + break;
> default:
> *type = V4L2_CTRL_TYPE_INTEGER;
> break;
> @@ -1692,6 +1704,11 @@ static int std_validate(const struct v4l2_ctrl *ctrl, u32 idx,
> case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
> return 0;
>
> + case V4L2_CTRL_TYPE_HEVC_SPS:
> + case V4L2_CTRL_TYPE_HEVC_PPS:
> + case V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS:
> + return 0;
> +
> default:
> return -EINVAL;
> }
> @@ -2287,6 +2304,15 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl,
> case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
> elem_size = sizeof(struct v4l2_ctrl_h264_decode_param);
> break;
> + case V4L2_CTRL_TYPE_HEVC_SPS:
> + elem_size = sizeof(struct v4l2_ctrl_hevc_sps);
> + break;
> + case V4L2_CTRL_TYPE_HEVC_PPS:
> + elem_size = sizeof(struct v4l2_ctrl_hevc_pps);
> + break;
> + case V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS:
> + elem_size = sizeof(struct v4l2_ctrl_hevc_slice_params);
> + break;
> default:
> if (type < V4L2_CTRL_COMPOUND_TYPES)
> elem_size = sizeof(s32);
> diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
> index aa63f1794272..7bec91c6effe 100644
> --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> @@ -1321,6 +1321,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
> case V4L2_PIX_FMT_VP8: descr = "VP8"; break;
> case V4L2_PIX_FMT_VP9: descr = "VP9"; break;
> case V4L2_PIX_FMT_HEVC: descr = "HEVC"; break; /* aka H.265 */
> + case V4L2_PIX_FMT_HEVC_SLICE: descr = "HEVC Parsed Slice Data"; break;
> case V4L2_PIX_FMT_FWHT: descr = "FWHT"; break; /* used in vicodec */
> case V4L2_PIX_FMT_CPIA1: descr = "GSPCA CPiA YUV"; break;
> case V4L2_PIX_FMT_WNVA: descr = "WNVA"; break;
> diff --git a/include/media/v4l2-ctrls.h b/include/media/v4l2-ctrls.h
> index b4ca95710d2d..11664c5c3706 100644
> --- a/include/media/v4l2-ctrls.h
> +++ b/include/media/v4l2-ctrls.h
> @@ -48,6 +48,9 @@ struct poll_table_struct;
> * @p_h264_scal_mtrx: Pointer to a struct v4l2_ctrl_h264_scaling_matrix.
> * @p_h264_slice_param: Pointer to a struct v4l2_ctrl_h264_slice_param.
> * @p_h264_decode_param: Pointer to a struct v4l2_ctrl_h264_decode_param.
> + * @p_hevc_sps: Pointer to an HEVC sequence parameter set structure.
> + * @p_hevc_pps: Pointer to an HEVC picture parameter set structure.
> + * @p_hevc_slice_params Pointer to an HEVC slice parameters structure.
> * @p: Pointer to a compound value.
> */
> union v4l2_ctrl_ptr {
> @@ -64,6 +67,9 @@ union v4l2_ctrl_ptr {
> struct v4l2_ctrl_h264_scaling_matrix *p_h264_scal_mtrx;
> struct v4l2_ctrl_h264_slice_param *p_h264_slice_param;
> struct v4l2_ctrl_h264_decode_param *p_h264_decode_param;
> + struct v4l2_ctrl_hevc_sps *p_hevc_sps;
> + struct v4l2_ctrl_hevc_pps *p_hevc_pps;
> + struct v4l2_ctrl_hevc_slice_params *p_hevc_slice_params;
> void *p;
> };
>
> diff --git a/include/uapi/linux/v4l2-controls.h b/include/uapi/linux/v4l2-controls.h
> index 628c0cdb51d9..5bbf63b2dad1 100644
> --- a/include/uapi/linux/v4l2-controls.h
> +++ b/include/uapi/linux/v4l2-controls.h
> @@ -709,6 +709,9 @@ enum v4l2_cid_mpeg_video_hevc_size_of_length_field {
> #define V4L2_CID_MPEG_VIDEO_HEVC_HIER_CODING_L6_BR (V4L2_CID_MPEG_BASE + 642)
> #define V4L2_CID_MPEG_VIDEO_REF_NUMBER_FOR_PFRAMES (V4L2_CID_MPEG_BASE + 643)
> #define V4L2_CID_MPEG_VIDEO_PREPEND_SPSPPS_TO_IDR (V4L2_CID_MPEG_BASE + 644)
> +#define V4L2_CID_MPEG_VIDEO_HEVC_SPS (V4L2_CID_MPEG_BASE + 645)
> +#define V4L2_CID_MPEG_VIDEO_HEVC_PPS (V4L2_CID_MPEG_BASE + 646)
> +#define V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS (V4L2_CID_MPEG_BASE + 647)
>
> /* MPEG-class control IDs specific to the CX2341x driver as defined by V4L2 */
> #define V4L2_CID_MPEG_CX2341X_BASE (V4L2_CTRL_CLASS_MPEG | 0x1000)
> @@ -1324,4 +1327,156 @@ struct v4l2_ctrl_h264_decode_param {
> struct v4l2_h264_dpb_entry dpb[16];
> };
>
> +#define V4L2_HEVC_SLICE_TYPE_B 0
> +#define V4L2_HEVC_SLICE_TYPE_P 1
> +#define V4L2_HEVC_SLICE_TYPE_I 2
> +
> +struct v4l2_ctrl_hevc_sps {
> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Sequence parameter set */
> + __u8 chroma_format_idc;
> + __u8 separate_colour_plane_flag;
> + __u16 pic_width_in_luma_samples;
> + __u16 pic_height_in_luma_samples;
> + __u8 bit_depth_luma_minus8;
> + __u8 bit_depth_chroma_minus8;
> + __u8 log2_max_pic_order_cnt_lsb_minus4;
> + __u8 sps_max_dec_pic_buffering_minus1;
> + __u8 sps_max_num_reorder_pics;
> + __u8 sps_max_latency_increase_plus1;
> + __u8 log2_min_luma_coding_block_size_minus3;
> + __u8 log2_diff_max_min_luma_coding_block_size;
> + __u8 log2_min_luma_transform_block_size_minus2;
> + __u8 log2_diff_max_min_luma_transform_block_size;
> + __u8 max_transform_hierarchy_depth_inter;
> + __u8 max_transform_hierarchy_depth_intra;
> + __u8 scaling_list_enabled_flag;
> + __u8 amp_enabled_flag;
> + __u8 sample_adaptive_offset_enabled_flag;
> + __u8 pcm_enabled_flag;
> + __u8 pcm_sample_bit_depth_luma_minus1;
> + __u8 pcm_sample_bit_depth_chroma_minus1;
> + __u8 log2_min_pcm_luma_coding_block_size_minus3;
> + __u8 log2_diff_max_min_pcm_luma_coding_block_size;
> + __u8 pcm_loop_filter_disabled_flag;
> + __u8 num_short_term_ref_pic_sets;
> + __u8 long_term_ref_pics_present_flag;
> + __u8 num_long_term_ref_pics_sps;
> + __u8 sps_temporal_mvp_enabled_flag;
> + __u8 strong_intra_smoothing_enabled_flag;
> +};
> +
> +struct v4l2_ctrl_hevc_pps {
> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture parameter set */
> + __u8 dependent_slice_segment_flag;
> + __u8 output_flag_present_flag;
> + __u8 num_extra_slice_header_bits;
> + __u8 sign_data_hiding_enabled_flag;
> + __u8 cabac_init_present_flag;
> + __s8 init_qp_minus26;
> + __u8 constrained_intra_pred_flag;
> + __u8 transform_skip_enabled_flag;
> + __u8 cu_qp_delta_enabled_flag;
> + __u8 diff_cu_qp_delta_depth;
> + __s8 pps_cb_qp_offset;
> + __s8 pps_cr_qp_offset;
> + __u8 pps_slice_chroma_qp_offsets_present_flag;
> + __u8 weighted_pred_flag;
> + __u8 weighted_bipred_flag;
> + __u8 transquant_bypass_enabled_flag;
> + __u8 tiles_enabled_flag;
> + __u8 entropy_coding_sync_enabled_flag;
> + __u8 num_tile_columns_minus1;
> + __u8 num_tile_rows_minus1;
> + __u8 column_width_minus1[20];
> + __u8 row_height_minus1[22];
> + __u8 loop_filter_across_tiles_enabled_flag;
> + __u8 pps_loop_filter_across_slices_enabled_flag;
> + __u8 deblocking_filter_override_enabled_flag;
> + __u8 pps_disable_deblocking_filter_flag;
> + __s8 pps_beta_offset_div2;
> + __s8 pps_tc_offset_div2;
> + __u8 lists_modification_present_flag;
> + __u8 log2_parallel_merge_level_minus2;
> + __u8 slice_segment_header_extension_present_flag;
> +};

Although scaling lists are not supported yet, I still think you should include
"scaling_list_data_present_flag" here for the sake of completeness, since you
already included "scaling_list_enabled_flag" in the SPS.

I didn't do any thorough review though, just noticed this bit.

Best regards,
Jernej

> +
> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER 0x02
> +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR 0x03
> +
> +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX 16
> +
> +struct v4l2_hevc_dpb_entry {
> + __u32 buffer_tag;
> + __u8 rps;
> + __u8 field_pic;
> + __u16 pic_order_cnt[2];
> +};
> +
> +struct v4l2_hevc_pred_weight_table {
> + __u8 luma_log2_weight_denom;
> + __s8 delta_chroma_log2_weight_denom;
> +
> + __s8 delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> + __s8 luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> + __s8 delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> + __s8 chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> +
> + __s8 delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> + __s8 luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> + __s8 delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> + __s8 chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> +};
> +
> +struct v4l2_ctrl_hevc_slice_params {
> + __u32 bit_size;
> + __u32 data_bit_offset;
> +
> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
> + __u8 nal_unit_type;
> + __u8 nuh_temporal_id_plus1;
> +
> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> + __u8 slice_type;
> + __u8 colour_plane_id;
> + __u16 slice_pic_order_cnt;
> + __u8 slice_sao_luma_flag;
> + __u8 slice_sao_chroma_flag;
> + __u8 slice_temporal_mvp_enabled_flag;
> + __u8 num_ref_idx_l0_active_minus1;
> + __u8 num_ref_idx_l1_active_minus1;
> + __u8 mvd_l1_zero_flag;
> + __u8 cabac_init_flag;
> + __u8 collocated_from_l0_flag;
> + __u8 collocated_ref_idx;
> + __u8 five_minus_max_num_merge_cand;
> + __u8 use_integer_mv_flag;
> + __s8 slice_qp_delta;
> + __s8 slice_cb_qp_offset;
> + __s8 slice_cr_qp_offset;
> + __s8 slice_act_y_qp_offset;
> + __s8 slice_act_cb_qp_offset;
> + __s8 slice_act_cr_qp_offset;
> + __u8 slice_deblocking_filter_disabled_flag;
> + __s8 slice_beta_offset_div2;
> + __s8 slice_tc_offset_div2;
> + __u8 slice_loop_filter_across_slices_enabled_flag;
> +
> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
> + __u8 pic_struct;
> +
> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> + struct v4l2_hevc_dpb_entry dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> + __u8 num_active_dpb_entries;
> + __u8 ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> + __u8 ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> +
> + __u8 num_rps_poc_st_curr_before;
> + __u8 num_rps_poc_st_curr_after;
> + __u8 num_rps_poc_lt_curr;
> +
> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Weighted prediction parameter */
> + struct v4l2_hevc_pred_weight_table pred_weight_table;
> +};
> +
> #endif




2018-12-12 12:54:08

by Paul Kocialkowski

Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

Hi,

On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
> Hi!
>
> On Friday, 23 November 2018 at 14:02:08 CET, Paul Kocialkowski wrote:
> > This introduces the required definitions for HEVC decoding support with
> > stateless VPUs. The controls associated to the HEVC slice format provide
> > the required meta-data for decoding slices extracted from the bitstream.
> >
> > This interface comes with the following limitations:
> > * No custom quantization matrices (scaling lists);
> > * Support for a single temporal layer only;
> > * No slice entry point offsets support;
> > * No conformance window support;
> > * No VUI parameters support;
> > * No support for SPS extensions: range, multilayer, 3d, scc, 4 bits;
> > * No support for PPS extensions: range, multilayer, 3d, scc, 4 bits.
> >
> > Signed-off-by: Paul Kocialkowski <[email protected]>
> > ---
>
> <snip>
>
> > diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c
> > index e96c453208e8..9af17815ecc3 100644
> > --- a/drivers/media/v4l2-core/v4l2-ctrls.c
> > +++ b/drivers/media/v4l2-core/v4l2-ctrls.c
> > @@ -913,6 +913,9 @@ const char *v4l2_ctrl_get_name(u32 id)
> > case V4L2_CID_MPEG_VIDEO_HEVC_SIZE_OF_LENGTH_FIELD: return "HEVC Size of Length Field";
> > case V4L2_CID_MPEG_VIDEO_REF_NUMBER_FOR_PFRAMES: return "Reference Frames for a P-Frame";
> > case V4L2_CID_MPEG_VIDEO_PREPEND_SPSPPS_TO_IDR: return "Prepend SPS and PPS to IDR";
> > + case V4L2_CID_MPEG_VIDEO_HEVC_SPS: return "HEVC Sequence Parameter Set";
> > + case V4L2_CID_MPEG_VIDEO_HEVC_PPS: return "HEVC Picture Parameter Set";
> > + case V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS: return "HEVC Slice Parameters";
> >
> > /* CAMERA controls */
> > /* Keep the order of the 'case's the same as in v4l2-controls.h! */
> > @@ -1320,6 +1323,15 @@ void v4l2_ctrl_fill(u32 id, const char **name, enum v4l2_ctrl_type *type,
> > case V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS:
> > *type = V4L2_CTRL_TYPE_H264_DECODE_PARAMS;
> > break;
> > + case V4L2_CID_MPEG_VIDEO_HEVC_SPS:
> > + *type = V4L2_CTRL_TYPE_HEVC_SPS;
> > + break;
> > + case V4L2_CID_MPEG_VIDEO_HEVC_PPS:
> > + *type = V4L2_CTRL_TYPE_HEVC_PPS;
> > + break;
> > + case V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS:
> > + *type = V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS;
> > + break;
> > default:
> > *type = V4L2_CTRL_TYPE_INTEGER;
> > break;
> > @@ -1692,6 +1704,11 @@ static int std_validate(const struct v4l2_ctrl *ctrl, u32 idx,
> > case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
> > return 0;
> >
> > + case V4L2_CTRL_TYPE_HEVC_SPS:
> > + case V4L2_CTRL_TYPE_HEVC_PPS:
> > + case V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS:
> > + return 0;
> > +
> > default:
> > return -EINVAL;
> > }
> > @@ -2287,6 +2304,15 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl,
> > case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
> > elem_size = sizeof(struct v4l2_ctrl_h264_decode_param);
> > break;
> > + case V4L2_CTRL_TYPE_HEVC_SPS:
> > + elem_size = sizeof(struct v4l2_ctrl_hevc_sps);
> > + break;
> > + case V4L2_CTRL_TYPE_HEVC_PPS:
> > + elem_size = sizeof(struct v4l2_ctrl_hevc_pps);
> > + break;
> > + case V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS:
> > + elem_size = sizeof(struct v4l2_ctrl_hevc_slice_params);
> > + break;
> > default:
> > if (type < V4L2_CTRL_COMPOUND_TYPES)
> > elem_size = sizeof(s32);
> > diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
> > index aa63f1794272..7bec91c6effe 100644
> > --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> > +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> > @@ -1321,6 +1321,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
> > case V4L2_PIX_FMT_VP8: descr = "VP8"; break;
> > case V4L2_PIX_FMT_VP9: descr = "VP9"; break;
> > case V4L2_PIX_FMT_HEVC: descr = "HEVC"; break; /* aka H.265 */
> > + case V4L2_PIX_FMT_HEVC_SLICE: descr = "HEVC Parsed Slice Data"; break;
> > case V4L2_PIX_FMT_FWHT: descr = "FWHT"; break; /* used in vicodec */
> > case V4L2_PIX_FMT_CPIA1: descr = "GSPCA CPiA YUV"; break;
> > case V4L2_PIX_FMT_WNVA: descr = "WNVA"; break;
> > diff --git a/include/media/v4l2-ctrls.h b/include/media/v4l2-ctrls.h
> > index b4ca95710d2d..11664c5c3706 100644
> > --- a/include/media/v4l2-ctrls.h
> > +++ b/include/media/v4l2-ctrls.h
> > @@ -48,6 +48,9 @@ struct poll_table_struct;
> > * @p_h264_scal_mtrx: Pointer to a struct v4l2_ctrl_h264_scaling_matrix.
> > * @p_h264_slice_param: Pointer to a struct v4l2_ctrl_h264_slice_param.
> > * @p_h264_decode_param: Pointer to a struct v4l2_ctrl_h264_decode_param.
> > + * @p_hevc_sps: Pointer to an HEVC sequence parameter set structure.
> > + * @p_hevc_pps: Pointer to an HEVC picture parameter set structure.
> > + * @p_hevc_slice_params Pointer to an HEVC slice parameters structure.
> > * @p: Pointer to a compound value.
> > */
> > union v4l2_ctrl_ptr {
> > @@ -64,6 +67,9 @@ union v4l2_ctrl_ptr {
> > struct v4l2_ctrl_h264_scaling_matrix *p_h264_scal_mtrx;
> > struct v4l2_ctrl_h264_slice_param *p_h264_slice_param;
> > struct v4l2_ctrl_h264_decode_param *p_h264_decode_param;
> > + struct v4l2_ctrl_hevc_sps *p_hevc_sps;
> > + struct v4l2_ctrl_hevc_pps *p_hevc_pps;
> > + struct v4l2_ctrl_hevc_slice_params *p_hevc_slice_params;
> > void *p;
> > };
> >
> > diff --git a/include/uapi/linux/v4l2-controls.h b/include/uapi/linux/v4l2-controls.h
> > index 628c0cdb51d9..5bbf63b2dad1 100644
> > --- a/include/uapi/linux/v4l2-controls.h
> > +++ b/include/uapi/linux/v4l2-controls.h
> > @@ -709,6 +709,9 @@ enum v4l2_cid_mpeg_video_hevc_size_of_length_field {
> > #define V4L2_CID_MPEG_VIDEO_HEVC_HIER_CODING_L6_BR (V4L2_CID_MPEG_BASE + 642)
> > #define V4L2_CID_MPEG_VIDEO_REF_NUMBER_FOR_PFRAMES (V4L2_CID_MPEG_BASE + 643)
> > #define V4L2_CID_MPEG_VIDEO_PREPEND_SPSPPS_TO_IDR (V4L2_CID_MPEG_BASE + 644)
> > +#define V4L2_CID_MPEG_VIDEO_HEVC_SPS (V4L2_CID_MPEG_BASE + 645)
> > +#define V4L2_CID_MPEG_VIDEO_HEVC_PPS (V4L2_CID_MPEG_BASE + 646)
> > +#define V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS (V4L2_CID_MPEG_BASE + 647)
> >
> > /* MPEG-class control IDs specific to the CX2341x driver as defined by V4L2 */
> > #define V4L2_CID_MPEG_CX2341X_BASE (V4L2_CTRL_CLASS_MPEG | 0x1000)
> > @@ -1324,4 +1327,156 @@ struct v4l2_ctrl_h264_decode_param {
> > struct v4l2_h264_dpb_entry dpb[16];
> > };
> >
> > +#define V4L2_HEVC_SLICE_TYPE_B 0
> > +#define V4L2_HEVC_SLICE_TYPE_P 1
> > +#define V4L2_HEVC_SLICE_TYPE_I 2
> > +
> > +struct v4l2_ctrl_hevc_sps {
> > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Sequence parameter set */
> > + __u8 chroma_format_idc;
> > + __u8 separate_colour_plane_flag;
> > + __u16 pic_width_in_luma_samples;
> > + __u16 pic_height_in_luma_samples;
> > + __u8 bit_depth_luma_minus8;
> > + __u8 bit_depth_chroma_minus8;
> > + __u8 log2_max_pic_order_cnt_lsb_minus4;
> > + __u8 sps_max_dec_pic_buffering_minus1;
> > + __u8 sps_max_num_reorder_pics;
> > + __u8 sps_max_latency_increase_plus1;
> > + __u8 log2_min_luma_coding_block_size_minus3;
> > + __u8 log2_diff_max_min_luma_coding_block_size;
> > + __u8 log2_min_luma_transform_block_size_minus2;
> > + __u8 log2_diff_max_min_luma_transform_block_size;
> > + __u8 max_transform_hierarchy_depth_inter;
> > + __u8 max_transform_hierarchy_depth_intra;
> > + __u8 scaling_list_enabled_flag;
> > + __u8 amp_enabled_flag;
> > + __u8 sample_adaptive_offset_enabled_flag;
> > + __u8 pcm_enabled_flag;
> > + __u8 pcm_sample_bit_depth_luma_minus1;
> > + __u8 pcm_sample_bit_depth_chroma_minus1;
> > + __u8 log2_min_pcm_luma_coding_block_size_minus3;
> > + __u8 log2_diff_max_min_pcm_luma_coding_block_size;
> > + __u8 pcm_loop_filter_disabled_flag;
> > + __u8 num_short_term_ref_pic_sets;
> > + __u8 long_term_ref_pics_present_flag;
> > + __u8 num_long_term_ref_pics_sps;
> > + __u8 sps_temporal_mvp_enabled_flag;
> > + __u8 strong_intra_smoothing_enabled_flag;
> > +};
> > +
> > +struct v4l2_ctrl_hevc_pps {
> > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture parameter set */
> > + __u8 dependent_slice_segment_flag;
> > + __u8 output_flag_present_flag;
> > + __u8 num_extra_slice_header_bits;
> > + __u8 sign_data_hiding_enabled_flag;
> > + __u8 cabac_init_present_flag;
> > + __s8 init_qp_minus26;
> > + __u8 constrained_intra_pred_flag;
> > + __u8 transform_skip_enabled_flag;
> > + __u8 cu_qp_delta_enabled_flag;
> > + __u8 diff_cu_qp_delta_depth;
> > + __s8 pps_cb_qp_offset;
> > + __s8 pps_cr_qp_offset;
> > + __u8 pps_slice_chroma_qp_offsets_present_flag;
> > + __u8 weighted_pred_flag;
> > + __u8 weighted_bipred_flag;
> > + __u8 transquant_bypass_enabled_flag;
> > + __u8 tiles_enabled_flag;
> > + __u8 entropy_coding_sync_enabled_flag;
> > + __u8 num_tile_columns_minus1;
> > + __u8 num_tile_rows_minus1;
> > + __u8 column_width_minus1[20];
> > + __u8 row_height_minus1[22];
> > + __u8 loop_filter_across_tiles_enabled_flag;
> > + __u8 pps_loop_filter_across_slices_enabled_flag;
> > + __u8 deblocking_filter_override_enabled_flag;
> > + __u8 pps_disable_deblocking_filter_flag;
> > + __s8 pps_beta_offset_div2;
> > + __s8 pps_tc_offset_div2;
> > + __u8 lists_modification_present_flag;
> > + __u8 log2_parallel_merge_level_minus2;
> > + __u8 slice_segment_header_extension_present_flag;
> > +};
>
> Although scaling lists are not supported yet, I still think you should include
> "scaling_list_data_present_flag" here for the sake of completeness, since you
> already included "scaling_list_enabled_flag" in the SPS.
>
> I didn't do any thorough review though, just noticed this bit.

Thanks for the suggestion! I decided to discard these
"scaling_list_data_present_flag" fields because I think it's best to
have a dedicated control for the scaling list (like in the current
H.264 proposal). With a dedicated control, scaling lists are no longer
attached to either the PPS or SPS so I don't think it makes sense to
have "scaling_list_data_present_flag" fields in these structures.

Drivers can simply infer whether custom scaling lists are used from the
presence of the optional control, and they don't need to know whether the
lists were originally extracted from the PPS or the SPS.
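
To illustrate the idea, here is a minimal sketch. It assumes a hypothetical
per-request structure whose scaling list pointer is NULL when userspace did
not set the optional control; none of the example_* names below exist in this
series, and only the 4x4 lists are shown. Without the control, the driver
falls back to a flat value of 16, which is the H.265 default for 4x4 lists.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical payload of a dedicated scaling list control. This layout is
 * an assumption made for illustration, not something defined by the patch.
 */
struct example_hevc_scaling_list {
	uint8_t list_4x4[6][16];
};

/*
 * Hypothetical per-request state: scaling_list is NULL when the optional
 * control was not provided by userspace for this request.
 */
struct example_hevc_run {
	const struct example_hevc_scaling_list *scaling_list;
};

/* Flat default value used by H.265 for 4x4 scaling lists. */
#define EXAMPLE_FLAT_SCALING_VALUE 16

/*
 * Fill out[] with the 4x4 scaling lists the hardware should use.
 * Returns 1 when custom lists were supplied, 0 when the default was used.
 * The driver never needs to know whether the lists came from the SPS or PPS.
 */
static int example_select_scaling_list_4x4(const struct example_hevc_run *run,
					   uint8_t out[6][16])
{
	if (run->scaling_list) {
		memcpy(out, run->scaling_list->list_4x4,
		       sizeof(run->scaling_list->list_4x4));
		return 1;
	}

	memset(out, EXAMPLE_FLAT_SCALING_VALUE, 6 * 16);
	return 0;
}
```

With this shape, the presence of the control is the only signal the driver
needs, so no "scaling_list_data_present_flag" has to travel in the SPS or PPS
structures.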

Does that make sense to you?

Cheers,

Paul

> Best regards,
> Jernej
>
> > +
> > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
> > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER 0x02
> > +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR 0x03
> > +
> > +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX 16
> > +
> > +struct v4l2_hevc_dpb_entry {
> > + __u32 buffer_tag;
> > + __u8 rps;
> > + __u8 field_pic;
> > + __u16 pic_order_cnt[2];
> > +};
> > +
> > +struct v4l2_hevc_pred_weight_table {
> > + __u8 luma_log2_weight_denom;
> > + __s8 delta_chroma_log2_weight_denom;
> > +
> > + __s8 delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > + __s8 luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > + __s8 delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > + __s8 chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > +
> > + __s8 delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > + __s8 luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > + __s8 delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > + __s8 chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > +};
> > +
> > +struct v4l2_ctrl_hevc_slice_params {
> > + __u32 bit_size;
> > + __u32 data_bit_offset;
> > +
> > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
> > + __u8 nal_unit_type;
> > + __u8 nuh_temporal_id_plus1;
> > +
> > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> > + __u8 slice_type;
> > + __u8 colour_plane_id;
> > + __u16 slice_pic_order_cnt;
> > + __u8 slice_sao_luma_flag;
> > + __u8 slice_sao_chroma_flag;
> > + __u8 slice_temporal_mvp_enabled_flag;
> > + __u8 num_ref_idx_l0_active_minus1;
> > + __u8 num_ref_idx_l1_active_minus1;
> > + __u8 mvd_l1_zero_flag;
> > + __u8 cabac_init_flag;
> > + __u8 collocated_from_l0_flag;
> > + __u8 collocated_ref_idx;
> > + __u8 five_minus_max_num_merge_cand;
> > + __u8 use_integer_mv_flag;
> > + __s8 slice_qp_delta;
> > + __s8 slice_cb_qp_offset;
> > + __s8 slice_cr_qp_offset;
> > + __s8 slice_act_y_qp_offset;
> > + __s8 slice_act_cb_qp_offset;
> > + __s8 slice_act_cr_qp_offset;
> > + __u8 slice_deblocking_filter_disabled_flag;
> > + __s8 slice_beta_offset_div2;
> > + __s8 slice_tc_offset_div2;
> > + __u8 slice_loop_filter_across_slices_enabled_flag;
> > +
> > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
> > + __u8 pic_struct;
> > +
> > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> > + struct v4l2_hevc_dpb_entry dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > + __u8 num_active_dpb_entries;
> > + __u8 ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > + __u8 ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > +
> > + __u8 num_rps_poc_st_curr_before;
> > + __u8 num_rps_poc_st_curr_after;
> > + __u8 num_rps_poc_lt_curr;
> > +
> > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Weighted prediction parameter */
> > + struct v4l2_hevc_pred_weight_table pred_weight_table;
> > +};
> > +
> > #endif
>
>
--
Paul Kocialkowski, Bootlin (formerly Free Electrons)
Embedded Linux and kernel engineering
https://bootlin.com


2019-01-07 03:59:24

by Randy Li

Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls


On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
> Hi,
>
> On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
>
>>> +
>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER 0x02
>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR 0x03
>>> +
>>> +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX 16
>>> +
>>> +struct v4l2_hevc_dpb_entry {
>>> + __u32 buffer_tag;
>>> + __u8 rps;
>>> + __u8 field_pic;
>>> + __u16 pic_order_cnt[2];
>>> +};

Please add a property for the reference index if rps is not used for that;
some devices would require it (not the Rockchip one). Rockchip's VDPU1 and
VDPU2 would require a similar property for AVC.

Please also add another buffer_tag referring to the memory that holds the
motion vectors of each frame. A better method might be to attach metadata to
each picture buffer: since the picture output is the same as the original,
the display won't care whether the motion vectors are written at the bottom
of the picture or somewhere else.


>>> +
>>> +struct v4l2_hevc_pred_weight_table {
>>> + __u8 luma_log2_weight_denom;
>>> + __s8 delta_chroma_log2_weight_denom;
>>> +
>>> + __s8 delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>> + __s8 luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>> + __s8 delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>> + __s8 chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>> +
>>> + __s8 delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>> + __s8 luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>> + __s8 delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>> + __s8 chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>> +};
>>> +
I think those properties are not necessary for Rockchip's device, although
that may not hold for others.
>>> +struct v4l2_ctrl_hevc_slice_params {
>>> + __u32 bit_size;
>>> + __u32 data_bit_offset;
>>> +
>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
>>> + __u8 nal_unit_type;
>>> + __u8 nuh_temporal_id_plus1;
>>> +
>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
>>> + __u8 slice_type;
>>> + __u8 colour_plane_id;
----------------------------------------------------------------------------
>>> + __u16 slice_pic_order_cnt;
>>> + __u8 slice_sao_luma_flag;
>>> + __u8 slice_sao_chroma_flag;
>>> + __u8 slice_temporal_mvp_enabled_flag;
>>> + __u8 num_ref_idx_l0_active_minus1;
>>> + __u8 num_ref_idx_l1_active_minus1;
Rockchip's decoder doesn't use this part.
>>> + __u8 mvd_l1_zero_flag;
>>> + __u8 cabac_init_flag;
>>> + __u8 collocated_from_l0_flag;
>>> + __u8 collocated_ref_idx;
>>> + __u8 five_minus_max_num_merge_cand;
>>> + __u8 use_integer_mv_flag;
>>> + __s8 slice_qp_delta;
>>> + __s8 slice_cb_qp_offset;
>>> + __s8 slice_cr_qp_offset;
>>> + __s8 slice_act_y_qp_offset;
>>> + __s8 slice_act_cb_qp_offset;
>>> + __s8 slice_act_cr_qp_offset;
>>> + __u8 slice_deblocking_filter_disabled_flag;
>>> + __s8 slice_beta_offset_div2;
>>> + __s8 slice_tc_offset_div2;
>>> + __u8 slice_loop_filter_across_slices_enabled_flag;
>>> +
>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
>>> + __u8 pic_struct;
I think the decoder doesn't care about this; it is used for display.
>>> +
>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
>>> + struct v4l2_hevc_dpb_entry dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>> + __u8 num_active_dpb_entries;
>>> + __u8 ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>> + __u8 ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>> +
>>> + __u8 num_rps_poc_st_curr_before;
>>> + __u8 num_rps_poc_st_curr_after;
>>> + __u8 num_rps_poc_lt_curr;
>>> +
>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Weighted prediction parameter */
>>> + struct v4l2_hevc_pred_weight_table pred_weight_table;
>>> +};
>>> +
>>> #endif
>>


2019-01-07 09:59:42

by Paul Kocialkowski

Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

Hi,

On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
> On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
> > Hi,
> >
> > On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
> >
> > > > +
> > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
> > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER 0x02
> > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR 0x03
> > > > +
> > > > +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX 16
> > > > +
> > > > +struct v4l2_hevc_dpb_entry {
> > > > + __u32 buffer_tag;
> > > > + __u8 rps;
> > > > + __u8 field_pic;
> > > > + __u16 pic_order_cnt[2];
> > > > +};
>
> Please add a property for reference index, if that rps is not used for
> this, some device would request that(not the rockchip one). And
> Rockchip's VDPU1 and VDPU2 for AVC would request a similar property.

What exactly is that reference index? Is it a bitstream element or
something deduced from the bitstream?

> Adding another buffer_tag for referring the memory of the motion vectors
> for each frames. Or a better method is add a meta data to echo picture
> buffer, since the picture output is just the same as the original,
> display won't care whether the motion vectors are written the button of
> picture or somewhere else.

The motion vectors are passed as part of the raw bitstream data, in the
slices. Is there a case where the motion vectors are coded differently?

> > > > +
> > > > +struct v4l2_hevc_pred_weight_table {
> > > > + __u8 luma_log2_weight_denom;
> > > > + __s8 delta_chroma_log2_weight_denom;
> > > > +
> > > > + __s8 delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > + __s8 luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > + __s8 delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > + __s8 chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > +
> > > > + __s8 delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > + __s8 luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > + __s8 delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > + __s8 chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > +};
> > > > +
> Those properties I think are not necessary are applying for the
> Rockchip's device, may not work for the others.

Yes, it's possible that some of the elements are not necessary for some
decoders. What we want is to cover all the elements that might be
required for a decoder.
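
For context on what the prediction weight table carries, here is a minimal sketch of how explicit weights are derived from the signalled deltas, following the weighted prediction semantics of the H.265 spec. The helper names are hypothetical and not part of the proposed API; the sketch also assumes the per-entry weight flags are set, since the proposed structure does not carry them.

```c
#include <stdint.h>

/*
 * Hypothetical helper: explicit luma weight for one reference entry,
 * i.e. (1 << luma_log2_weight_denom) + delta_luma_weight_lX[i].
 */
static int hevc_luma_weight(uint8_t luma_log2_weight_denom,
			    int8_t delta_luma_weight)
{
	return (1 << luma_log2_weight_denom) + delta_luma_weight;
}

/*
 * Hypothetical helper: chroma denominator, i.e.
 * luma_log2_weight_denom + delta_chroma_log2_weight_denom.
 */
static int hevc_chroma_log2_weight_denom(uint8_t luma_log2_weight_denom,
					 int8_t delta_chroma_log2_weight_denom)
{
	return luma_log2_weight_denom + delta_chroma_log2_weight_denom;
}
```

A decoder that derives these values in hardware only needs the raw deltas, which is one reason to pass the bitstream elements unmodified.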

> > > > +struct v4l2_ctrl_hevc_slice_params {
> > > > + __u32 bit_size;
> > > > + __u32 data_bit_offset;
> > > > +
> > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
> > > > + __u8 nal_unit_type;
> > > > + __u8 nuh_temporal_id_plus1;
> > > > +
> > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> > > > + __u8 slice_type;
> > > > + __u8 colour_plane_id;
> ----------------------------------------------------------------------------
> > > > + __u16 slice_pic_order_cnt;
> > > > + __u8 slice_sao_luma_flag;
> > > > + __u8 slice_sao_chroma_flag;
> > > > + __u8 slice_temporal_mvp_enabled_flag;
> > > > + __u8 num_ref_idx_l0_active_minus1;
> > > > + __u8 num_ref_idx_l1_active_minus1;
> Rockchip's decoder doesn't use this part.
> > > > + __u8 mvd_l1_zero_flag;
> > > > + __u8 cabac_init_flag;
> > > > + __u8 collocated_from_l0_flag;
> > > > + __u8 collocated_ref_idx;
> > > > + __u8 five_minus_max_num_merge_cand;
> > > > + __u8 use_integer_mv_flag;
> > > > + __s8 slice_qp_delta;
> > > > + __s8 slice_cb_qp_offset;
> > > > + __s8 slice_cr_qp_offset;
> > > > + __s8 slice_act_y_qp_offset;
> > > > + __s8 slice_act_cb_qp_offset;
> > > > + __s8 slice_act_cr_qp_offset;
> > > > + __u8 slice_deblocking_filter_disabled_flag;
> > > > + __s8 slice_beta_offset_div2;
> > > > + __s8 slice_tc_offset_div2;
> > > > + __u8 slice_loop_filter_across_slices_enabled_flag;
> > > > +
> > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
> > > > + __u8 pic_struct;
> I think the decoder doesn't care about this, it is used for display.

The purpose of this field is to indicate whether the current picture is
a progressive frame or an interlaced field picture, which is useful for
decoding.

At least our decoder has a register field to indicate frame/top
field/bottom field, so we certainly need to keep the info around.
Looking at the spec and the ffmpeg implementation, it looks like this
flag of the bitstream is the usual way to report field coding.
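
As a rough illustration of the frame/top field/bottom field distinction, the sketch below classifies pic_struct values the way the spec's picture timing SEI table and the ffmpeg parser appear to treat them. The enum and function names are hypothetical; they do not come from the patch or from the Cedrus driver.

```c
#include <stdint.h>

/* Hypothetical field type, similar in spirit to a frame/top/bottom register field. */
enum hevc_field_type {
	HEVC_FIELD_FRAME = 0,
	HEVC_FIELD_TOP,
	HEVC_FIELD_BOTTOM,
};

/*
 * Classify a pic_struct value from the picture timing SEI message.
 * Values 1, 9 and 11 describe a top field; 2, 10 and 12 a bottom field.
 * Every other value is treated as a frame in this sketch.
 */
static enum hevc_field_type hevc_field_type_from_pic_struct(uint8_t pic_struct)
{
	switch (pic_struct) {
	case 1:
	case 9:
	case 11:
		return HEVC_FIELD_TOP;
	case 2:
	case 10:
	case 12:
		return HEVC_FIELD_BOTTOM;
	default:
		return HEVC_FIELD_FRAME;
	}
}
```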

Cheers,

Paul

> > > > +
> > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> > > > + struct v4l2_hevc_dpb_entry dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > + __u8 num_active_dpb_entries;
> > > > + __u8 ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > + __u8 ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > +
> > > > + __u8 num_rps_poc_st_curr_before;
> > > > + __u8 num_rps_poc_st_curr_after;
> > > > + __u8 num_rps_poc_lt_curr;
> > > > +
> > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Weighted prediction parameter */
> > > > + struct v4l2_hevc_pred_weight_table pred_weight_table;
> > > > +};
> > > > +
> > > > #endif
--
Paul Kocialkowski, Bootlin (formerly Free Electrons)
Embedded Linux and kernel engineering
https://bootlin.com


2019-01-08 01:18:52

by Randy Li

[permalink] [raw]
Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls




> On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski <[email protected]> wrote:
>
> Hi,
>
>> On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
>>> On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
>>> Hi,
>>>
>>> On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
>>>
>>>>> +
>>>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
>>>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER 0x02
>>>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR 0x03
>>>>> +
>>>>> +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX 16
>>>>> +
>>>>> +struct v4l2_hevc_dpb_entry {
>>>>> + __u32 buffer_tag;
>>>>> + __u8 rps;
>>>>> + __u8 field_pic;
>>>>> + __u16 pic_order_cnt[2];
>>>>> +};
>>
>> Please add a property for reference index, if that rps is not used for
>> this, some device would request that(not the rockchip one). And
>> Rockchip's VDPU1 and VDPU2 for AVC would request a similar property.
>
> What exactly is that reference index? Is it a bitstream element or
> something deduced from the bitstream?
>
The picture order count (POC) for HEVC and frame_num for AVC. I think it is the number used in list0 (P and B slices) and list1 (B slices).
>> Adding another buffer_tag for referring the memory of the motion vectors
>> for each frames. Or a better method is add a meta data to echo picture
>> buffer, since the picture output is just the same as the original,
>> display won't care whether the motion vectors are written the button of
>> picture or somewhere else.
>
> The motion vectors are passed as part of the raw bitstream data, in the
> slices. Is there a case where the motion vectors are coded differently?
No, it is an additional cache for the decoder. Even FFmpeg keeps such data, so I think the Allwinner decoder must output it somewhere.
>
>>>>> +
>>>>> +struct v4l2_hevc_pred_weight_table {
>>>>> + __u8 luma_log2_weight_denom;
>>>>> + __s8 delta_chroma_log2_weight_denom;
>>>>> +
>>>>> + __s8 delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>> + __s8 luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>> + __s8 delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>> + __s8 chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>> +
>>>>> + __s8 delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>> + __s8 luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>> + __s8 delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>> + __s8 chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>> +};
>>>>> +
>> Those properties I think are not necessary are applying for the
>> Rockchip's device, may not work for the others.
>
> Yes, it's possible that some of the elements are not necessary for some
> decoders. What we want is to cover all the elements that might be
> required for a decoder.
I wonder whether Allwinner needs that; those SAO flags are usually ignored by decoder designs. But more is better than less: it is hard to extend a V4L2 structure in the future, and a new HEVC profile may bring new properties. It is still early days for HEVC.
>
>>>>> +struct v4l2_ctrl_hevc_slice_params {
>>>>> + __u32 bit_size;
>>>>> + __u32 data_bit_offset;
>>>>> +
>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
>>>>> + __u8 nal_unit_type;
>>>>> + __u8 nuh_temporal_id_plus1;
>>>>> +
>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
>>>>> + __u8 slice_type;
>>>>> + __u8 colour_plane_id;
>> ----------------------------------------------------------------------------
>>>>> + __u16 slice_pic_order_cnt;
>>>>> + __u8 slice_sao_luma_flag;
>>>>> + __u8 slice_sao_chroma_flag;
>>>>> + __u8 slice_temporal_mvp_enabled_flag;
>>>>> + __u8 num_ref_idx_l0_active_minus1;
>>>>> + __u8 num_ref_idx_l1_active_minus1;
>> Rockchip's decoder doesn't use this part.
>>>>> + __u8 mvd_l1_zero_flag;
>>>>> + __u8 cabac_init_flag;
>>>>> + __u8 collocated_from_l0_flag;
>>>>> + __u8 collocated_ref_idx;
>>>>> + __u8 five_minus_max_num_merge_cand;
>>>>> + __u8 use_integer_mv_flag;
>>>>> + __s8 slice_qp_delta;
>>>>> + __s8 slice_cb_qp_offset;
>>>>> + __s8 slice_cr_qp_offset;
>>>>> + __s8 slice_act_y_qp_offset;
>>>>> + __s8 slice_act_cb_qp_offset;
>>>>> + __s8 slice_act_cr_qp_offset;
>>>>> + __u8 slice_deblocking_filter_disabled_flag;
>>>>> + __s8 slice_beta_offset_div2;
>>>>> + __s8 slice_tc_offset_div2;
>>>>> + __u8 slice_loop_filter_across_slices_enabled_flag;
>>>>> +
>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
>>>>> + __u8 pic_struct;
>> I think the decoder doesn't care about this, it is used for display.
>
> The purpose of this field is to indicate whether the current picture is
> a progressive frame or an interlaced field picture, which is useful for
> decoding.
>
> At least our decoder has a register field to indicate frame/top
> field/bottom field, so we certainly need to keep the info around.
> Looking at the spec and the ffmpeg implementation, it looks like this
> flag of the bitstream is the usual way to report field coding.
It depends on whether the decoder cares about just the scan type or more. I would prefer general_interlaced_source_flag for just the scan type; it would be better than reading another SEI message.
>
> Cheers,
>
> Paul
Randy “ayaka” LI
>
>>>>> +
>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
>>>>> + struct v4l2_hevc_dpb_entry dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>> + __u8 num_active_dpb_entries;
>>>>> + __u8 ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>> + __u8 ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>> +
>>>>> + __u8 num_rps_poc_st_curr_before;
>>>>> + __u8 num_rps_poc_st_curr_after;
>>>>> + __u8 num_rps_poc_lt_curr;
>>>>> +
>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Weighted prediction parameter */
>>>>> + struct v4l2_hevc_pred_weight_table pred_weight_table;
>>>>> +};
>>>>> +
>>>>> #endif
> --
> Paul Kocialkowski, Bootlin (formerly Free Electrons)
> Embedded Linux and kernel engineering
> https://bootlin.com
>


2019-01-08 08:41:43

by Paul Kocialkowski

[permalink] [raw]
Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

Hi,

On Tue, 2019-01-08 at 09:16 +0800, Ayaka wrote:
>
> Sent from my iPad
>
> > On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski <[email protected]> wrote:
> >
> > Hi,
> >
> > > On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
> > > > On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
> > > > Hi,
> > > >
> > > > On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
> > > >
> > > > > > +
> > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
> > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER 0x02
> > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR 0x03
> > > > > > +
> > > > > > +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX 16
> > > > > > +
> > > > > > +struct v4l2_hevc_dpb_entry {
> > > > > > + __u32 buffer_tag;
> > > > > > + __u8 rps;
> > > > > > + __u8 field_pic;
> > > > > > + __u16 pic_order_cnt[2];
> > > > > > +};
> > >
> > > Please add a property for reference index, if that rps is not used for
> > > this, some device would request that(not the rockchip one). And
> > > Rockchip's VDPU1 and VDPU2 for AVC would request a similar property.
> >
> > What exactly is that reference index? Is it a bitstream element or
> > something deduced from the bitstream?
> >
> picture order count(POC) for HEVC and frame_num in AVC. I think it is
> the number used in list0(P slice and B slice) and list1(B slice).

The picture order count is already the last field of the DPB entry
structure. There is one for each field picture.
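
To make the per-field picture order counts concrete, here is a small sketch of looking up a DPB entry by POC. The structure is a local copy of the proposed layout using userspace fixed-width types, and the helper name is hypothetical. The sketch assumes that frame pictures only use pic_order_cnt[0], which the proposal does not state explicitly.

```c
#include <stddef.h>
#include <stdint.h>

#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX 16

/* Local copy of the proposed layout, for illustration only. */
struct v4l2_hevc_dpb_entry {
	uint32_t buffer_tag;
	uint8_t rps;
	uint8_t field_pic;
	uint16_t pic_order_cnt[2];
};

/*
 * Hypothetical helper: return the first active DPB entry whose picture
 * order count matches poc, or NULL if there is none. The second POC is
 * only compared for field pictures (assumption, see above).
 */
static const struct v4l2_hevc_dpb_entry *
hevc_dpb_find_by_poc(const struct v4l2_hevc_dpb_entry *dpb,
		     size_t num_active, uint16_t poc)
{
	size_t i;

	for (i = 0; i < num_active && i < V4L2_HEVC_DPB_ENTRIES_NUM_MAX; i++) {
		if (dpb[i].pic_order_cnt[0] == poc)
			return &dpb[i];
		if (dpb[i].field_pic && dpb[i].pic_order_cnt[1] == poc)
			return &dpb[i];
	}

	return NULL;
}
```

The matching entry's buffer_tag then identifies the capture buffer that holds the reference picture.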

> > > Adding another buffer_tag for referring the memory of the motion vectors
> > > for each frames. Or a better method is add a meta data to echo picture
> > > buffer, since the picture output is just the same as the original,
> > > display won't care whether the motion vectors are written the button of
> > > picture or somewhere else.
> >
> > The motion vectors are passed as part of the raw bitstream data, in the
> > slices. Is there a case where the motion vectors are coded differently?
> No, it is an additional cache for decoder, even FFmpeg having such
> data, I think allwinner must output it into somewhere.

Ah yes I see what you mean! This is handled internally by our driver
and not exposed to userspace. I don't think it would be a good idea to
expose this cache or request that userspace allocates it like a video
buffer.

> > > > > > +
> > > > > > +struct v4l2_hevc_pred_weight_table {
> > > > > > + __u8 luma_log2_weight_denom;
> > > > > > + __s8 delta_chroma_log2_weight_denom;
> > > > > > +
> > > > > > + __s8 delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > + __s8 luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > + __s8 delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > + __s8 chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > +
> > > > > > + __s8 delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > + __s8 luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > + __s8 delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > + __s8 chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > +};
> > > > > > +
> > > Those properties I think are not necessary are applying for the
> > > Rockchip's device, may not work for the others.
> >
> > Yes, it's possible that some of the elements are not necessary for some
> > decoders. What we want is to cover all the elements that might be
> > required for a decoder.
> I wonder whether allwinner need that, those sao flag usually ignored
> by decoder in design. But more is better than less, it is hard to
> extend a v4l2 structure in the future, maybe a new HEVC profile
> would bring a new property, it is still too early for HEVC.

Yes this is used by our decoder. The idea is to have all the basic
bitstream elements in the structures (even if some decoders don't use
them all) and add others for extension as separate controls later.

> > > > > > +struct v4l2_ctrl_hevc_slice_params {
> > > > > > + __u32 bit_size;
> > > > > > + __u32 data_bit_offset;
> > > > > > +
> > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
> > > > > > + __u8 nal_unit_type;
> > > > > > + __u8 nuh_temporal_id_plus1;
> > > > > > +
> > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> > > > > > + __u8 slice_type;
> > > > > > + __u8 colour_plane_id;
> > > ----------------------------------------------------------------------------
> > > > > > + __u16 slice_pic_order_cnt;
> > > > > > + __u8 slice_sao_luma_flag;
> > > > > > + __u8 slice_sao_chroma_flag;
> > > > > > + __u8 slice_temporal_mvp_enabled_flag;
> > > > > > + __u8 num_ref_idx_l0_active_minus1;
> > > > > > + __u8 num_ref_idx_l1_active_minus1;
> > > Rockchip's decoder doesn't use this part.
> > > > > > + __u8 mvd_l1_zero_flag;
> > > > > > + __u8 cabac_init_flag;
> > > > > > + __u8 collocated_from_l0_flag;
> > > > > > + __u8 collocated_ref_idx;
> > > > > > + __u8 five_minus_max_num_merge_cand;
> > > > > > + __u8 use_integer_mv_flag;
> > > > > > + __s8 slice_qp_delta;
> > > > > > + __s8 slice_cb_qp_offset;
> > > > > > + __s8 slice_cr_qp_offset;
> > > > > > + __s8 slice_act_y_qp_offset;
> > > > > > + __s8 slice_act_cb_qp_offset;
> > > > > > + __s8 slice_act_cr_qp_offset;
> > > > > > + __u8 slice_deblocking_filter_disabled_flag;
> > > > > > + __s8 slice_beta_offset_div2;
> > > > > > + __s8 slice_tc_offset_div2;
> > > > > > + __u8 slice_loop_filter_across_slices_enabled_flag;
> > > > > > +
> > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
> > > > > > + __u8 pic_struct;
> > > I think the decoder doesn't care about this, it is used for display.
> >
> > The purpose of this field is to indicate whether the current picture is
> > a progressive frame or an interlaced field picture, which is useful for
> > decoding.
> >
> > At least our decoder has a register field to indicate frame/top
> > field/bottom field, so we certainly need to keep the info around.
> > Looking at the spec and the ffmpeg implementation, it looks like this
> > flag of the bitstream is the usual way to report field coding.
> It depends whether the decoder cares about scan type or more, I
> wonder prefer general_interlaced_source_flag for just scan type, it
> would be better than reading another SEL.

Well we still need a way to indicate if the current data is top or
bottom field for interlaced. I don't think that knowing that the whole
video is interlaced would be precise enough.

Cheers,

Paul

> > > > > > +
> > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> > > > > > + struct v4l2_hevc_dpb_entry dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > + __u8 num_active_dpb_entries;
> > > > > > + __u8 ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > + __u8 ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > +
> > > > > > + __u8 num_rps_poc_st_curr_before;
> > > > > > + __u8 num_rps_poc_st_curr_after;
> > > > > > + __u8 num_rps_poc_lt_curr;
> > > > > > +
> > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Weighted prediction parameter */
> > > > > > + struct v4l2_hevc_pred_weight_table pred_weight_table;
> > > > > > +};
> > > > > > +
> > > > > > #endif
> > --
> > Paul Kocialkowski, Bootlin (formerly Free Electrons)
> > Embedded Linux and kernel engineering
> > https://bootlin.com
> >
--
Paul Kocialkowski, Bootlin (formerly Free Electrons)
Embedded Linux and kernel engineering
https://bootlin.com


2019-01-08 10:01:53

by Randy Li

[permalink] [raw]
Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls




> On Jan 8, 2019, at 4:38 PM, Paul Kocialkowski <[email protected]> wrote:
>
> Hi,
>
>> On Tue, 2019-01-08 at 09:16 +0800, Ayaka wrote:
>>
>> Sent from my iPad
>>
>>> On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>>>> On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
>>>>> On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
>>>>> Hi,
>>>>>
>>>>> On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
>>>>>
>>>>>>> +
>>>>>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
>>>>>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER 0x02
>>>>>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR 0x03
>>>>>>> +
>>>>>>> +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX 16
>>>>>>> +
>>>>>>> +struct v4l2_hevc_dpb_entry {
>>>>>>> + __u32 buffer_tag;
>>>>>>> + __u8 rps;
>>>>>>> + __u8 field_pic;
>>>>>>> + __u16 pic_order_cnt[2];
>>>>>>> +};
>>>>
>>>> Please add a property for reference index, if that rps is not used for
>>>> this, some device would request that(not the rockchip one). And
>>>> Rockchip's VDPU1 and VDPU2 for AVC would request a similar property.
>>>
>>> What exactly is that reference index? Is it a bitstream element or
>>> something deduced from the bitstream?
>>>
>> picture order count(POC) for HEVC and frame_num in AVC. I think it is
>> the number used in list0(P slice and B slice) and list1(B slice).
>
> The picture order count is already the last field of the DPB entry
> structure. There is one for each field picture.
As we are not sure whether there is a field-coded slice or CTU, I would hold off on this part and on anything else about fields.
>
>>>> Adding another buffer_tag for referring the memory of the motion vectors
>>>> for each frames. Or a better method is add a meta data to echo picture
>>>> buffer, since the picture output is just the same as the original,
>>>> display won't care whether the motion vectors are written the button of
>>>> picture or somewhere else.
>>>
>>> The motion vectors are passed as part of the raw bitstream data, in the
>>> slices. Is there a case where the motion vectors are coded differently?
>> No, it is an additional cache for decoder, even FFmpeg having such
>> data, I think allwinner must output it into somewhere.
>
> Ah yes I see what you mean! This is handled internally by our driver
> and not exposed to userspace. I don't think it would be a good idea to
> expose this cache or request that userspace allocates it like a video
> buffer.
>
No, usually the driver should allocate it, as userspace has no idea of the size each device needs.
But for advanced users, an application can fix a broken picture with proper data or analyze object motion from it.
So I would suggest attaching this information to a picture buffer as metadata.
>>>>>>> +
>>>>>>> +struct v4l2_hevc_pred_weight_table {
>>>>>>> + __u8 luma_log2_weight_denom;
>>>>>>> + __s8 delta_chroma_log2_weight_denom;
>>>>>>> +
>>>>>>> + __s8 delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>> + __s8 luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>> + __s8 delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>>>> + __s8 chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>>>> +
>>>>>>> + __s8 delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>> + __s8 luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>> + __s8 delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>>>> + __s8 chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>>>> +};
>>>>>>> +
>>>> Those properties I think are not necessary are applying for the
>>>> Rockchip's device, may not work for the others.
>>>
>>> Yes, it's possible that some of the elements are not necessary for some
>>> decoders. What we want is to cover all the elements that might be
>>> required for a decoder.
>> I wonder whether allwinner need that, those sao flag usually ignored
>> by decoder in design. But more is better than less, it is hard to
>> extend a v4l2 structure in the future, maybe a new HEVC profile
>> would bring a new property, it is still too early for HEVC.
>
> Yes this is used by our decoder. The idea is to have all the basic
> bitstream elements in the structures (even if some decoders don't use
> them all) and add others for extension as separate controls later.
>
>>>>>>> +struct v4l2_ctrl_hevc_slice_params {
>>>>>>> + __u32 bit_size;
>>>>>>> + __u32 data_bit_offset;
>>>>>>> +
>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
>>>>>>> + __u8 nal_unit_type;
>>>>>>> + __u8 nuh_temporal_id_plus1;
>>>>>>> +
>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
>>>>>>> + __u8 slice_type;
>>>>>>> + __u8 colour_plane_id;
>>>> ----------------------------------------------------------------------------
>>>>>>> + __u16 slice_pic_order_cnt;
>>>>>>> + __u8 slice_sao_luma_flag;
>>>>>>> + __u8 slice_sao_chroma_flag;
>>>>>>> + __u8 slice_temporal_mvp_enabled_flag;
>>>>>>> + __u8 num_ref_idx_l0_active_minus1;
>>>>>>> + __u8 num_ref_idx_l1_active_minus1;
>>>> Rockchip's decoder doesn't use this part.
>>>>>>> + __u8 mvd_l1_zero_flag;
>>>>>>> + __u8 cabac_init_flag;
>>>>>>> + __u8 collocated_from_l0_flag;
>>>>>>> + __u8 collocated_ref_idx;
>>>>>>> + __u8 five_minus_max_num_merge_cand;
>>>>>>> + __u8 use_integer_mv_flag;
>>>>>>> + __s8 slice_qp_delta;
>>>>>>> + __s8 slice_cb_qp_offset;
>>>>>>> + __s8 slice_cr_qp_offset;
>>>>>>> + __s8 slice_act_y_qp_offset;
>>>>>>> + __s8 slice_act_cb_qp_offset;
>>>>>>> + __s8 slice_act_cr_qp_offset;
>>>>>>> + __u8 slice_deblocking_filter_disabled_flag;
>>>>>>> + __s8 slice_beta_offset_div2;
>>>>>>> + __s8 slice_tc_offset_div2;
>>>>>>> + __u8 slice_loop_filter_across_slices_enabled_flag;
>>>>>>> +
>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
>>>>>>> + __u8 pic_struct;
>>>> I think the decoder doesn't care about this, it is used for display.
>>>
>>> The purpose of this field is to indicate whether the current picture is
>>> a progressive frame or an interlaced field picture, which is useful for
>>> decoding.
>>>
>>> At least our decoder has a register field to indicate frame/top
>>> field/bottom field, so we certainly need to keep the info around.
>>> Looking at the spec and the ffmpeg implementation, it looks like this
>>> flag of the bitstream is the usual way to report field coding.
>> It depends whether the decoder cares about scan type or more, I
>> wonder prefer general_interlaced_source_flag for just scan type, it
>> would be better than reading another SEL.
>
> Well we still need a way to indicate if the current data is top or
> bottom field for interlaced. I don't think that knowing that the whole
> video is interlaced would be precise enough.
>
> Cheers,
>
> Paul
>
>>>>>>> +
>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
>>>>>>> + struct v4l2_hevc_dpb_entry dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>> + __u8 num_active_dpb_entries;
>>>>>>> + __u8 ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>> + __u8 ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>> +
>>>>>>> + __u8 num_rps_poc_st_curr_before;
>>>>>>> + __u8 num_rps_poc_st_curr_after;
>>>>>>> + __u8 num_rps_poc_lt_curr;
>>>>>>> +
>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Weighted prediction parameter */
>>>>>>> + struct v4l2_hevc_pred_weight_table pred_weight_table;
>>>>>>> +};
>>>>>>> +
>>>>>>> #endif
>>> --
>>> Paul Kocialkowski, Bootlin (formerly Free Electrons)
>>> Embedded Linux and kernel engineering
>>> https://bootlin.com
>>>
> --
> Paul Kocialkowski, Bootlin (formerly Free Electrons)
> Embedded Linux and kernel engineering
> https://bootlin.com
>


2019-01-10 13:34:31

by Randy Li

[permalink] [raw]
Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

I forgot an important thing: the rkvdec and rk hevc decoders request the
CABAC table, scaling list, picture parameter set and reference picture
to be stored in one or several DMA buffers. I am not talking about
parsed data; the decoder requests the raw data.

For the PPS and RPS, it is possible to reuse the slice header and just let
the decoder know the offset in the bitstream buffer. I would suggest
adding three properties (including one for the SPS) for them. But I think
we need a method to mark an OUTPUT-side buffer for this auxiliary data.

On 1/8/19 6:00 PM, Ayaka wrote:
>
> Sent from my iPad
>
>> On Jan 8, 2019, at 4:38 PM, Paul Kocialkowski <[email protected]> wrote:
>>
>> Hi,
>>
>>> On Tue, 2019-01-08 at 09:16 +0800, Ayaka wrote:
>>>
>>> Sent from my iPad
>>>
>>>> On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>>>> On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
>>>>>> On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
>>>>>>
>>>>>>>> +
>>>>>>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
>>>>>>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER 0x02
>>>>>>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR 0x03
>>>>>>>> +
>>>>>>>> +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX 16
>>>>>>>> +
>>>>>>>> +struct v4l2_hevc_dpb_entry {
>>>>>>>> + __u32 buffer_tag;
>>>>>>>> + __u8 rps;
>>>>>>>> + __u8 field_pic;
>>>>>>>> + __u16 pic_order_cnt[2];
>>>>>>>> +};
>>>>> Please add a property for reference index, if that rps is not used for
>>>>> this, some device would request that(not the rockchip one). And
>>>>> Rockchip's VDPU1 and VDPU2 for AVC would request a similar property.
>>>> What exactly is that reference index? Is it a bitstream element or
>>>> something deduced from the bitstream?
>>>>
>>> picture order count(POC) for HEVC and frame_num in AVC. I think it is
>>> the number used in list0(P slice and B slice) and list1(B slice).
>> The picture order count is already the last field of the DPB entry
>> structure. There is one for each field picture.
> As we are not sure whether there is a field coded slice or CTU, I would hold this part and else about the field.
>>>>> Adding another buffer_tag for referring the memory of the motion vectors
>>>>> for each frames. Or a better method is add a meta data to echo picture
>>>>> buffer, since the picture output is just the same as the original,
>>>>> display won't care whether the motion vectors are written the button of
>>>>> picture or somewhere else.
>>>> The motion vectors are passed as part of the raw bitstream data, in the
>>>> slices. Is there a case where the motion vectors are coded differently?
>>> No, it is an additional cache for decoder, even FFmpeg having such
>>> data, I think allwinner must output it into somewhere.
>> Ah yes I see what you mean! This is handled internally by our driver
>> and not exposed to userspace. I don't think it would be a good idea to
>> expose this cache or request that userspace allocates it like a video
>> buffer.
>>
> No, usually the driver should allocate, as the user space have no idea on size of each devices.
> But for advantage user, application can fix a broken picture with a proper data or analysis a object motion from that.
> So I would suggest attaching this information to a picture buffer as a meta data.
>>>>>>>> +
>>>>>>>> +struct v4l2_hevc_pred_weight_table {
>>>>>>>> + __u8 luma_log2_weight_denom;
>>>>>>>> + __s8 delta_chroma_log2_weight_denom;
>>>>>>>> +
>>>>>>>> + __s8 delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>> + __s8 luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>> + __s8 delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>>>>> + __s8 chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>>>>> +
>>>>>>>> + __s8 delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>> + __s8 luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>> + __s8 delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>>>>> + __s8 chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>>>>> +};
>>>>>>>> +
>>>>> Those properties I think are not necessary are applying for the
>>>>> Rockchip's device, may not work for the others.
>>>> Yes, it's possible that some of the elements are not necessary for some
>>>> decoders. What we want is to cover all the elements that might be
>>>> required for a decoder.
>>> I wonder whether allwinner need that, those sao flag usually ignored
>>> by decoder in design. But more is better than less, it is hard to
>>> extend a v4l2 structure in the future, maybe a new HEVC profile
>>> would bring a new property, it is still too early for HEVC.
>> Yes this is used by our decoder. The idea is to have all the basic
>> bitstream elements in the structures (even if some decoders don't use
>> them all) and add others for extension as separate controls later.
>>
>>>>>>>> +struct v4l2_ctrl_hevc_slice_params {
>>>>>>>> + __u32 bit_size;
>>>>>>>> + __u32 data_bit_offset;
>>>>>>>> +
>>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
>>>>>>>> + __u8 nal_unit_type;
>>>>>>>> + __u8 nuh_temporal_id_plus1;
>>>>>>>> +
>>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
>>>>>>>> + __u8 slice_type;
>>>>>>>> + __u8 colour_plane_id;
>>>>> ----------------------------------------------------------------------------
>>>>>>>> + __u16 slice_pic_order_cnt;
>>>>>>>> + __u8 slice_sao_luma_flag;
>>>>>>>> + __u8 slice_sao_chroma_flag;
>>>>>>>> + __u8 slice_temporal_mvp_enabled_flag;
>>>>>>>> + __u8 num_ref_idx_l0_active_minus1;
>>>>>>>> + __u8 num_ref_idx_l1_active_minus1;
>>>>> Rockchip's decoder doesn't use this part.
>>>>>>>> + __u8 mvd_l1_zero_flag;
>>>>>>>> + __u8 cabac_init_flag;
>>>>>>>> + __u8 collocated_from_l0_flag;
>>>>>>>> + __u8 collocated_ref_idx;
>>>>>>>> + __u8 five_minus_max_num_merge_cand;
>>>>>>>> + __u8 use_integer_mv_flag;
>>>>>>>> + __s8 slice_qp_delta;
>>>>>>>> + __s8 slice_cb_qp_offset;
>>>>>>>> + __s8 slice_cr_qp_offset;
>>>>>>>> + __s8 slice_act_y_qp_offset;
>>>>>>>> + __s8 slice_act_cb_qp_offset;
>>>>>>>> + __s8 slice_act_cr_qp_offset;
>>>>>>>> + __u8 slice_deblocking_filter_disabled_flag;
>>>>>>>> + __s8 slice_beta_offset_div2;
>>>>>>>> + __s8 slice_tc_offset_div2;
>>>>>>>> + __u8 slice_loop_filter_across_slices_enabled_flag;
>>>>>>>> +
>>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
>>>>>>>> + __u8 pic_struct;
>>>>> I think the decoder doesn't care about this, it is used for display.
>>>> The purpose of this field is to indicate whether the current picture is
>>>> a progressive frame or an interlaced field picture, which is useful for
>>>> decoding.
>>>>
>>>> At least our decoder has a register field to indicate frame/top
>>>> field/bottom field, so we certainly need to keep the info around.
>>>> Looking at the spec and the ffmpeg implementation, it looks like this
>>>> flag of the bitstream is the usual way to report field coding.
>>> It depends whether the decoder cares about scan type or more, I
>>> wonder prefer general_interlaced_source_flag for just scan type, it
>>> would be better than reading another SEL.
>> Well we still need a way to indicate if the current data is top or
>> bottom field for interlaced. I don't think that knowing that the whole
>> video is interlaced would be precise enough.
>>
>> Cheers,
>>
>> Paul
>>
>>>>>>>> +
>>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
>>>>>>>> + struct v4l2_hevc_dpb_entry dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>> + __u8 num_active_dpb_entries;
>>>>>>>> + __u8 ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>> + __u8 ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>> +
>>>>>>>> + __u8 num_rps_poc_st_curr_before;
>>>>>>>> + __u8 num_rps_poc_st_curr_after;
>>>>>>>> + __u8 num_rps_poc_lt_curr;
>>>>>>>> +
>>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Weighted prediction parameter */
>>>>>>>> + struct v4l2_hevc_pred_weight_table pred_weight_table;
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> #endif
>>>> --
>>>> Paul Kocialkowski, Bootlin (formerly Free Electrons)
>>>> Embedded Linux and kernel engineering
>>>> https://bootlin.com
>>>>
>> --
>> Paul Kocialkowski, Bootlin (formerly Free Electrons)
>> Embedded Linux and kernel engineering
>> https://bootlin.com
>>

2019-01-24 10:27:57

by Paul Kocialkowski

[permalink] [raw]
Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

Hi,

On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> I forgot an important thing: the rkvdec and RK HEVC decoders request
> the CABAC table, scaling list, picture parameter set and reference
> picture stored in one or several DMA buffers. I am not talking about
> parsed data; the decoder requests the raw data.
>
> For the PPS and RPS, it is possible to reuse the slice header and just let
> the decoder know the offset from the bitstream buffer. I would suggest
> adding three properties (with SPS) for them. But I think we need a method to
> mark an OUTPUT-side buffer for that aux data.

I'm quite confused about the hardware implementation then. From what
you're saying, it seems that it takes the raw bitstream elements rather
than parsed elements. Is it really a stateless implementation?

The stateless implementation was designed with the idea that only the
raw slice data should be passed in bitstream form to the decoder. For
H.264, it seems that some decoders also need the slice header in raw
bitstream form (because they take the full slice NAL unit), see the
discussions in this thread:
media: docs-rst: Document m2m stateless video decoder interface
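
To make the split concrete, here is a minimal sketch of how userspace might
fill the two leading fields of the proposed slice params control once it has
parsed the slice header itself. The struct below only mirrors bit_size and
data_bit_offset from the patch; fill_slice_params() and its arguments are
hypothetical names used for illustration, and the exact semantics of the two
fields are an assumption here, not something settled in this thread.

```c
#include <stdint.h>

/* Only the two leading fields of the proposed control, for illustration. */
struct slice_params_sketch {
	uint32_t bit_size;        /* assumed: size of the slice, in bits */
	uint32_t data_bit_offset; /* assumed: offset to the coded slice data, in bits */
};

/*
 * Hypothetical helper: userspace has already parsed the slice header and
 * knows how many bits it occupies at the start of the buffer.
 * Returns 0 on success, -1 if the values cannot be represented or are
 * inconsistent (header longer than the slice itself).
 */
static int fill_slice_params(struct slice_params_sketch *p,
			     uint32_t slice_bytes, uint32_t header_bits)
{
	if (slice_bytes > UINT32_MAX / 8)
		return -1;
	if (header_bits > slice_bytes * 8)
		return -1;

	p->bit_size = slice_bytes * 8;
	p->data_bit_offset = header_bits;
	return 0;
}
```

With this kind of layout, a decoder that wants the full slice NAL unit and one
that only wants the slice data can both work from the same buffer.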

Can you detail exactly what the Rockchip decoder absolutely needs in
raw bitstream format?

Cheers,

Paul

> On 1/8/19 6:00 PM, Ayaka wrote:
> > Sent from my iPad
> >
> > > On Jan 8, 2019, at 4:38 PM, Paul Kocialkowski <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > > On Tue, 2019-01-08 at 09:16 +0800, Ayaka wrote:
> > > >
> > > > Sent from my iPad
> > > >
> > > > > On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski <[email protected]> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > > > On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
> > > > > > > On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
> > > > > > >
> > > > > > > > > +
> > > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
> > > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER 0x02
> > > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR 0x03
> > > > > > > > > +
> > > > > > > > > +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX 16
> > > > > > > > > +
> > > > > > > > > +struct v4l2_hevc_dpb_entry {
> > > > > > > > > + __u32 buffer_tag;
> > > > > > > > > + __u8 rps;
> > > > > > > > > + __u8 field_pic;
> > > > > > > > > + __u16 pic_order_cnt[2];
> > > > > > > > > +};
> > > > > > Please add a property for reference index, if that rps is not used for
> > > > > > this, some device would request that(not the rockchip one). And
> > > > > > Rockchip's VDPU1 and VDPU2 for AVC would request a similar property.
> > > > > What exactly is that reference index? Is it a bitstream element or
> > > > > something deduced from the bitstream?
> > > > >
> > > > picture order count(POC) for HEVC and frame_num in AVC. I think it is
> > > > the number used in list0(P slice and B slice) and list1(B slice).
> > > The picture order count is already the last field of the DPB entry
> > > structure. There is one for each field picture.
> > As we are not sure whether there is a field coded slice or CTU, I would hold this part and else about the field.
> > > > > > Adding another buffer_tag for referring the memory of the motion vectors
> > > > > > for each frames. Or a better method is add a meta data to echo picture
> > > > > > buffer, since the picture output is just the same as the original,
> > > > > > display won't care whether the motion vectors are written the button of
> > > > > > picture or somewhere else.
> > > > > The motion vectors are passed as part of the raw bitstream data, in the
> > > > > slices. Is there a case where the motion vectors are coded differently?
> > > > No, it is an additional cache for decoder, even FFmpeg having such
> > > > data, I think allwinner must output it into somewhere.
> > > Ah yes I see what you mean! This is handled internally by our driver
> > > and not exposed to userspace. I don't think it would be a good idea to
> > > expose this cache or request that userspace allocates it like a video
> > > buffer.
> > >
> > No, usually the driver should allocate, as the user space have no idea on size of each devices.
> > But for advantage user, application can fix a broken picture with a proper data or analysis a object motion from that.
> > So I would suggest attaching this information to a picture buffer as a meta data.
> > > > > > > > > +
> > > > > > > > > +struct v4l2_hevc_pred_weight_table {
> > > > > > > > > + __u8 luma_log2_weight_denom;
> > > > > > > > > + __s8 delta_chroma_log2_weight_denom;
> > > > > > > > > +
> > > > > > > > > + __s8 delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > > + __s8 luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > > + __s8 delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > > > > + __s8 chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > > > > +
> > > > > > > > > + __s8 delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > > + __s8 luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > > + __s8 delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > > > > + __s8 chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > > > > +};
> > > > > > > > > +
> > > > > > Those properties I think are not necessary are applying for the
> > > > > > Rockchip's device, may not work for the others.
> > > > > Yes, it's possible that some of the elements are not necessary for some
> > > > > decoders. What we want is to cover all the elements that might be
> > > > > required for a decoder.
> > > > I wonder whether allwinner need that, those sao flag usually ignored
> > > > by decoder in design. But more is better than less, it is hard to
> > > > extend a v4l2 structure in the future, maybe a new HEVC profile
> > > > would bring a new property, it is still too early for HEVC.
> > > Yes this is used by our decoder. The idea is to have all the basic
> > > bitstream elements in the structures (even if some decoders don't use
> > > them all) and add others for extension as separate controls later.
> > >
> > > > > > > > > +struct v4l2_ctrl_hevc_slice_params {
> > > > > > > > > + __u32 bit_size;
> > > > > > > > > + __u32 data_bit_offset;
> > > > > > > > > +
> > > > > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
> > > > > > > > > + __u8 nal_unit_type;
> > > > > > > > > + __u8 nuh_temporal_id_plus1;
> > > > > > > > > +
> > > > > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> > > > > > > > > + __u8 slice_type;
> > > > > > > > > + __u8 colour_plane_id;
> > > > > > ----------------------------------------------------------------------------
> > > > > > > > > + __u16 slice_pic_order_cnt;
> > > > > > > > > + __u8 slice_sao_luma_flag;
> > > > > > > > > + __u8 slice_sao_chroma_flag;
> > > > > > > > > + __u8 slice_temporal_mvp_enabled_flag;
> > > > > > > > > + __u8 num_ref_idx_l0_active_minus1;
> > > > > > > > > + __u8 num_ref_idx_l1_active_minus1;
> > > > > > Rockchip's decoder doesn't use this part.
> > > > > > > > > + __u8 mvd_l1_zero_flag;
> > > > > > > > > + __u8 cabac_init_flag;
> > > > > > > > > + __u8 collocated_from_l0_flag;
> > > > > > > > > + __u8 collocated_ref_idx;
> > > > > > > > > + __u8 five_minus_max_num_merge_cand;
> > > > > > > > > + __u8 use_integer_mv_flag;
> > > > > > > > > + __s8 slice_qp_delta;
> > > > > > > > > + __s8 slice_cb_qp_offset;
> > > > > > > > > + __s8 slice_cr_qp_offset;
> > > > > > > > > + __s8 slice_act_y_qp_offset;
> > > > > > > > > + __s8 slice_act_cb_qp_offset;
> > > > > > > > > + __s8 slice_act_cr_qp_offset;
> > > > > > > > > + __u8 slice_deblocking_filter_disabled_flag;
> > > > > > > > > + __s8 slice_beta_offset_div2;
> > > > > > > > > + __s8 slice_tc_offset_div2;
> > > > > > > > > + __u8 slice_loop_filter_across_slices_enabled_flag;
> > > > > > > > > +
> > > > > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
> > > > > > > > > + __u8 pic_struct;
> > > > > > I think the decoder doesn't care about this, it is used for display.
> > > > > The purpose of this field is to indicate whether the current picture is
> > > > > a progressive frame or an interlaced field picture, which is useful for
> > > > > decoding.
> > > > >
> > > > > At least our decoder has a register field to indicate frame/top
> > > > > field/bottom field, so we certainly need to keep the info around.
> > > > > Looking at the spec and the ffmpeg implementation, it looks like this
> > > > > flag of the bitstream is the usual way to report field coding.
> > > > It depends whether the decoder cares about scan type or more, I
> > > > wonder prefer general_interlaced_source_flag for just scan type, it
> > > > would be better than reading another SEL.
> > > Well we still need a way to indicate if the current data is top or
> > > bottom field for interlaced. I don't think that knowing that the whole
> > > video is interlaced would be precise enough.
> > >
> > > Cheers,
> > >
> > > Paul
> > >
> > > > > > > > > +
> > > > > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> > > > > > > > > + struct v4l2_hevc_dpb_entry dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > > + __u8 num_active_dpb_entries;
> > > > > > > > > + __u8 ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > > + __u8 ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > > +
> > > > > > > > > + __u8 num_rps_poc_st_curr_before;
> > > > > > > > > + __u8 num_rps_poc_st_curr_after;
> > > > > > > > > + __u8 num_rps_poc_lt_curr;
> > > > > > > > > +
> > > > > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Weighted prediction parameter */
> > > > > > > > > + struct v4l2_hevc_pred_weight_table pred_weight_table;
> > > > > > > > > +};
> > > > > > > > > +
> > > > > > > > > #endif
> > > > > --
> > > > > Paul Kocialkowski, Bootlin (formerly Free Electrons)
> > > > > Embedded Linux and kernel engineering
> > > > > https://bootlin.com
> > > > >
> > > --
> > > Paul Kocialkowski, Bootlin (formerly Free Electrons)
> > > Embedded Linux and kernel engineering
> > > https://bootlin.com
> > >
--
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


2019-01-24 10:38:20

by Paul Kocialkowski

[permalink] [raw]
Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

Hi,

On Tue, 2019-01-08 at 18:00 +0800, Ayaka wrote:
>
> Sent from my iPad
>
> > On Jan 8, 2019, at 4:38 PM, Paul Kocialkowski <[email protected]> wrote:
> >
> > Hi,
> >
> > > On Tue, 2019-01-08 at 09:16 +0800, Ayaka wrote:
> > >
> > > Sent from my iPad
> > >
> > > > On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > > > On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
> > > > > > On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
> > > > > >
> > > > > > > > +
> > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
> > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER 0x02
> > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR 0x03
> > > > > > > > +
> > > > > > > > +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX 16
> > > > > > > > +
> > > > > > > > +struct v4l2_hevc_dpb_entry {
> > > > > > > > + __u32 buffer_tag;
> > > > > > > > + __u8 rps;
> > > > > > > > + __u8 field_pic;
> > > > > > > > + __u16 pic_order_cnt[2];
> > > > > > > > +};
> > > > >
> > > > > Please add a property for reference index, if that rps is not used for
> > > > > this, some device would request that(not the rockchip one). And
> > > > > Rockchip's VDPU1 and VDPU2 for AVC would request a similar property.
> > > >
> > > > What exactly is that reference index? Is it a bitstream element or
> > > > something deduced from the bitstream?
> > > >
> > > picture order count(POC) for HEVC and frame_num in AVC. I think it is
> > > the number used in list0(P slice and B slice) and list1(B slice).
> >
> > The picture order count is already the last field of the DPB entry
> > structure. There is one for each field picture.
> As we are not sure whether there is a field coded slice or CTU, I
> would hold this part and else about the field.

I'm not sure what you meant here, sorry.

> > > > > Adding another buffer_tag for referring the memory of the motion vectors
> > > > > for each frames. Or a better method is add a meta data to echo picture
> > > > > buffer, since the picture output is just the same as the original,
> > > > > display won't care whether the motion vectors are written the button of
> > > > > picture or somewhere else.
> > > >
> > > > The motion vectors are passed as part of the raw bitstream data, in the
> > > > slices. Is there a case where the motion vectors are coded differently?
> > > No, it is an additional cache for decoder, even FFmpeg having such
> > > data, I think allwinner must output it into somewhere.
> >
> > Ah yes I see what you mean! This is handled internally by our driver
> > and not exposed to userspace. I don't think it would be a good idea to
> > expose this cache or request that userspace allocates it like a video
> > buffer.
> >
> No, usually the driver should allocate it, as userspace has no
> idea of the size needed by each device.
> But for advanced users, an application can fix a broken picture with
> proper data or analyze object motion from it.
> So I would suggest attaching this information to a picture buffer as
> metadata.

Right, the driver will allocate chunks of memory for the decoding
metadata used by the hardware decoder.

Well, I don't think V4L2 has any mechanism to expose this data for now
and since it's very specific to the hardware implementation, I guess
the interest in having that is generally pretty low.

That may be something that could be added later if someone wants to
work on it, but I think we are better off keeping this metadata hidden
in the driver for now.

> > > > > > > > +
> > > > > > > > +struct v4l2_hevc_pred_weight_table {
> > > > > > > > + __u8 luma_log2_weight_denom;
> > > > > > > > + __s8 delta_chroma_log2_weight_denom;
> > > > > > > > +
> > > > > > > > + __s8 delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > + __s8 luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > + __s8 delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > > > + __s8 chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > > > +
> > > > > > > > + __s8 delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > + __s8 luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > + __s8 delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > > > + __s8 chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > > > +};
> > > > > > > > +
> > > > > Those properties I think are not necessary are applying for the
> > > > > Rockchip's device, may not work for the others.
> > > >
> > > > Yes, it's possible that some of the elements are not necessary for some
> > > > decoders. What we want is to cover all the elements that might be
> > > > required for a decoder.
> > > I wonder whether allwinner need that, those sao flag usually ignored
> > > by decoder in design. But more is better than less, it is hard to
> > > extend a v4l2 structure in the future, maybe a new HEVC profile
> > > would bring a new property, it is still too early for HEVC.
> >
> > Yes this is used by our decoder. The idea is to have all the basic
> > bitstream elements in the structures (even if some decoders don't use
> > them all) and add others for extension as separate controls later.
> >
> > > > > > > > +struct v4l2_ctrl_hevc_slice_params {
> > > > > > > > + __u32 bit_size;
> > > > > > > > + __u32 data_bit_offset;
> > > > > > > > +
> > > > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
> > > > > > > > + __u8 nal_unit_type;
> > > > > > > > + __u8 nuh_temporal_id_plus1;
> > > > > > > > +
> > > > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> > > > > > > > + __u8 slice_type;
> > > > > > > > + __u8 colour_plane_id;
> > > > > ----------------------------------------------------------------------------
> > > > > > > > + __u16 slice_pic_order_cnt;
> > > > > > > > + __u8 slice_sao_luma_flag;
> > > > > > > > + __u8 slice_sao_chroma_flag;
> > > > > > > > + __u8 slice_temporal_mvp_enabled_flag;
> > > > > > > > + __u8 num_ref_idx_l0_active_minus1;
> > > > > > > > + __u8 num_ref_idx_l1_active_minus1;
> > > > > Rockchip's decoder doesn't use this part.
> > > > > > > > + __u8 mvd_l1_zero_flag;
> > > > > > > > + __u8 cabac_init_flag;
> > > > > > > > + __u8 collocated_from_l0_flag;
> > > > > > > > + __u8 collocated_ref_idx;
> > > > > > > > + __u8 five_minus_max_num_merge_cand;
> > > > > > > > + __u8 use_integer_mv_flag;
> > > > > > > > + __s8 slice_qp_delta;
> > > > > > > > + __s8 slice_cb_qp_offset;
> > > > > > > > + __s8 slice_cr_qp_offset;
> > > > > > > > + __s8 slice_act_y_qp_offset;
> > > > > > > > + __s8 slice_act_cb_qp_offset;
> > > > > > > > + __s8 slice_act_cr_qp_offset;
> > > > > > > > + __u8 slice_deblocking_filter_disabled_flag;
> > > > > > > > + __s8 slice_beta_offset_div2;
> > > > > > > > + __s8 slice_tc_offset_div2;
> > > > > > > > + __u8 slice_loop_filter_across_slices_enabled_flag;
> > > > > > > > +
> > > > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
> > > > > > > > + __u8 pic_struct;
> > > > > I think the decoder doesn't care about this, it is used for display.
> > > >
> > > > The purpose of this field is to indicate whether the current picture is
> > > > a progressive frame or an interlaced field picture, which is useful for
> > > > decoding.
> > > >
> > > > At least our decoder has a register field to indicate frame/top
> > > > field/bottom field, so we certainly need to keep the info around.
> > > > Looking at the spec and the ffmpeg implementation, it looks like this
> > > > flag of the bitstream is the usual way to report field coding.
> > > It depends whether the decoder cares about scan type or more, I
> > > wonder prefer general_interlaced_source_flag for just scan type, it
> > > would be better than reading another SEL.
> >
> > Well we still need a way to indicate if the current data is top or
> > bottom field for interlaced. I don't think that knowing that the whole
> > video is interlaced would be precise enough.
> >
> > Cheers,
> >
> > Paul
> >
> > > > > > > > +
> > > > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> > > > > > > > + struct v4l2_hevc_dpb_entry dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > + __u8 num_active_dpb_entries;
> > > > > > > > + __u8 ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > + __u8 ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > +
> > > > > > > > + __u8 num_rps_poc_st_curr_before;
> > > > > > > > + __u8 num_rps_poc_st_curr_after;
> > > > > > > > + __u8 num_rps_poc_lt_curr;
> > > > > > > > +
> > > > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Weighted prediction parameter */
> > > > > > > > + struct v4l2_hevc_pred_weight_table pred_weight_table;
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > #endif
> > > > --
> > > > Paul Kocialkowski, Bootlin (formerly Free Electrons)
> > > > Embedded Linux and kernel engineering
> > > > https://bootlin.com
> > > >
> > --
> > Paul Kocialkowski, Bootlin (formerly Free Electrons)
> > Embedded Linux and kernel engineering
> > https://bootlin.com
> >
--
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


2019-01-24 12:21:34

by Randy Li

[permalink] [raw]
Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls



> On Jan 24, 2019, at 6:36 PM, Paul Kocialkowski <[email protected]> wrote:
>
> Hi,
>
>> On Tue, 2019-01-08 at 18:00 +0800, Ayaka wrote:
>>
>> Sent from my iPad
>>
>>> On Jan 8, 2019, at 4:38 PM, Paul Kocialkowski <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>>> On Tue, 2019-01-08 at 09:16 +0800, Ayaka wrote:
>>>>
>>>> Sent from my iPad
>>>>
>>>>> On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>>> On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
>>>>>>> On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
>>>>>>>
>>>>>>>>> +
>>>>>>>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
>>>>>>>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER 0x02
>>>>>>>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR 0x03
>>>>>>>>> +
>>>>>>>>> +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX 16
>>>>>>>>> +
>>>>>>>>> +struct v4l2_hevc_dpb_entry {
>>>>>>>>> + __u32 buffer_tag;
>>>>>>>>> + __u8 rps;
>>>>>>>>> + __u8 field_pic;
>>>>>>>>> + __u16 pic_order_cnt[2];
>>>>>>>>> +};
>>>>>>
>>>>>> Please add a property for reference index, if that rps is not used for
>>>>>> this, some device would request that(not the rockchip one). And
>>>>>> Rockchip's VDPU1 and VDPU2 for AVC would request a similar property.
>>>>>
>>>>> What exactly is that reference index? Is it a bitstream element or
>>>>> something deduced from the bitstream?
>>>>>
>>>> picture order count(POC) for HEVC and frame_num in AVC. I think it is
>>>> the number used in list0(P slice and B slice) and list1(B slice).
>>>
>>> The picture order count is already the last field of the DPB entry
>>> structure. There is one for each field picture.
>> As we are not sure whether there is a field coded slice or CTU, I
>> would hold this part and else about the field.
>
> I'm not sure what you meant here, sorry.
As we discussed on IRC, I am not sure that field-coded pictures are supported in HEVC.
And I don't see why there would be two pic_order_cnt values: a picture can only be used as either a short-term or a long-term reference while decoding a given picture.
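To illustrate the reference-type point, here is a minimal sketch using the DPB entry layout from the patch, with <stdint.h> types standing in for the kernel __u8/__u16/__u32 types. count_rps() is a hypothetical helper, not part of the proposed API. Since rps holds a single value, each entry falls into exactly one of the three subsets:

```c
#include <stdint.h>

#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE	0x01
#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER	0x02
#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR		0x03

#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX		16

/* Layout copied from the patch, with userspace integer types. */
struct v4l2_hevc_dpb_entry {
	uint32_t buffer_tag;
	uint8_t rps;
	uint8_t field_pic;
	uint16_t pic_order_cnt[2];
};

/*
 * Hypothetical helper: count the active DPB entries that belong to the
 * given reference picture subset. num_active is clamped to the array size.
 */
static unsigned int count_rps(const struct v4l2_hevc_dpb_entry *dpb,
			      unsigned int num_active, uint8_t rps)
{
	unsigned int i, count = 0;

	if (num_active > V4L2_HEVC_DPB_ENTRIES_NUM_MAX)
		num_active = V4L2_HEVC_DPB_ENTRIES_NUM_MAX;

	for (i = 0; i < num_active; i++)
		if (dpb[i].rps == rps)
			count++;

	return count;
}
```

Presumably such counts would line up with num_rps_poc_st_curr_before, num_rps_poc_st_curr_after and num_rps_poc_lt_curr in the slice params, but that is my assumption.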
>
>>>>>> Adding another buffer_tag for referring the memory of the motion vectors
>>>>>> for each frames. Or a better method is add a meta data to echo picture
>>>>>> buffer, since the picture output is just the same as the original,
>>>>>> display won't care whether the motion vectors are written the button of
>>>>>> picture or somewhere else.
>>>>>
>>>>> The motion vectors are passed as part of the raw bitstream data, in the
>>>>> slices. Is there a case where the motion vectors are coded differently?
>>>> No, it is an additional cache for decoder, even FFmpeg having such
>>>> data, I think allwinner must output it into somewhere.
>>>
>>> Ah yes I see what you mean! This is handled internally by our driver
>>> and not exposed to userspace. I don't think it would be a good idea to
>>> expose this cache or request that userspace allocates it like a video
>>> buffer.
>>>
>> No, usually the driver should allocate, as the user space have no
>> idea on size of each devices.
>> But for advantage user, application can fix a broken picture with a
>> proper data or analysis a object motion from that.
>> So I would suggest attaching this information to a picture buffer as
>> a meta data.
>
> Right, the driver will allocate chunks of memory for the decoding
> metadata used by the hardware decoder.
>
> Well, I don't think V4L2 has any mechanism to expose this data for now
> and since it's very specific to the hardware implementation, I guess
> the interest in having that is generally pretty low.
>
> That's maybe something that could be added later if someone wants to
> work on it, but I think we are better off keeping this metadata hidden
> by the driver for now.
I am writing a V4L2 driver for Rockchip based on the vendor driver I previously sent to the mailing list. I think I will be able to offer a better way to describe the metadata after that. But it needs work in both drivers and userspace, so it will take some time.
>
>>>>>>>>> +
>>>>>>>>> +struct v4l2_hevc_pred_weight_table {
>>>>>>>>> + __u8 luma_log2_weight_denom;
>>>>>>>>> + __s8 delta_chroma_log2_weight_denom;
>>>>>>>>> +
>>>>>>>>> + __s8 delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>>> + __s8 luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>>> + __s8 delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>>>>>> + __s8 chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>>>>>> +
>>>>>>>>> + __s8 delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>>> + __s8 luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>>> + __s8 delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>>>>>> + __s8 chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>>>>>> +};
>>>>>>>>> +
>>>>>> Those properties I think are not necessary are applying for the
>>>>>> Rockchip's device, may not work for the others.
>>>>>
>>>>> Yes, it's possible that some of the elements are not necessary for some
>>>>> decoders. What we want is to cover all the elements that might be
>>>>> required for a decoder.
>>>> I wonder whether allwinner need that, those sao flag usually ignored
>>>> by decoder in design. But more is better than less, it is hard to
>>>> extend a v4l2 structure in the future, maybe a new HEVC profile
>>>> would bring a new property, it is still too early for HEVC.
>>>
>>> Yes this is used by our decoder. The idea is to have all the basic
>>> bitstream elements in the structures (even if some decoders don't use
>>> them all) and add others for extension as separate controls later.
>>>
>>>>>>>>> +struct v4l2_ctrl_hevc_slice_params {
>>>>>>>>> + __u32 bit_size;
>>>>>>>>> + __u32 data_bit_offset;
>>>>>>>>> +
>>>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
>>>>>>>>> + __u8 nal_unit_type;
>>>>>>>>> + __u8 nuh_temporal_id_plus1;
>>>>>>>>> +
>>>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
>>>>>>>>> + __u8 slice_type;
>>>>>>>>> + __u8 colour_plane_id;
>>>>>> ----------------------------------------------------------------------------
>>>>>>>>> + __u16 slice_pic_order_cnt;
>>>>>>>>> + __u8 slice_sao_luma_flag;
>>>>>>>>> + __u8 slice_sao_chroma_flag;
>>>>>>>>> + __u8 slice_temporal_mvp_enabled_flag;
>>>>>>>>> + __u8 num_ref_idx_l0_active_minus1;
>>>>>>>>> + __u8 num_ref_idx_l1_active_minus1;
>>>>>> Rockchip's decoder doesn't use this part.
>>>>>>>>> + __u8 mvd_l1_zero_flag;
>>>>>>>>> + __u8 cabac_init_flag;
>>>>>>>>> + __u8 collocated_from_l0_flag;
>>>>>>>>> + __u8 collocated_ref_idx;
>>>>>>>>> + __u8 five_minus_max_num_merge_cand;
>>>>>>>>> + __u8 use_integer_mv_flag;
>>>>>>>>> + __s8 slice_qp_delta;
>>>>>>>>> + __s8 slice_cb_qp_offset;
>>>>>>>>> + __s8 slice_cr_qp_offset;
>>>>>>>>> + __s8 slice_act_y_qp_offset;
>>>>>>>>> + __s8 slice_act_cb_qp_offset;
>>>>>>>>> + __s8 slice_act_cr_qp_offset;
>>>>>>>>> + __u8 slice_deblocking_filter_disabled_flag;
>>>>>>>>> + __s8 slice_beta_offset_div2;
>>>>>>>>> + __s8 slice_tc_offset_div2;
>>>>>>>>> + __u8 slice_loop_filter_across_slices_enabled_flag;
>>>>>>>>> +
>>>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
>>>>>>>>> + __u8 pic_struct;
>>>>>> I think the decoder doesn't care about this, it is used for display.
>>>>>
>>>>> The purpose of this field is to indicate whether the current picture is
>>>>> a progressive frame or an interlaced field picture, which is useful for
>>>>> decoding.
>>>>>
>>>>> At least our decoder has a register field to indicate frame/top
>>>>> field/bottom field, so we certainly need to keep the info around.
>>>>> Looking at the spec and the ffmpeg implementation, it looks like this
>>>>> flag of the bitstream is the usual way to report field coding.
>>>> It depends whether the decoder cares about scan type or more, I
>>>> wonder prefer general_interlaced_source_flag for just scan type, it
>>>> would be better than reading another SEL.
>>>
>>> Well we still need a way to indicate if the current data is top or
>>> bottom field for interlaced. I don't think that knowing that the whole
>>> video is interlaced would be precise enough.
>>>
>>> Cheers,
>>>
>>> Paul
>>>
>>>>>>>>> +
>>>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
>>>>>>>>> + struct v4l2_hevc_dpb_entry dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>>> + __u8 num_active_dpb_entries;
>>>>>>>>> + __u8 ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>>> + __u8 ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>>> +
>>>>>>>>> + __u8 num_rps_poc_st_curr_before;
>>>>>>>>> + __u8 num_rps_poc_st_curr_after;
>>>>>>>>> + __u8 num_rps_poc_lt_curr;
>>>>>>>>> +
>>>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Weighted prediction parameter */
>>>>>>>>> + struct v4l2_hevc_pred_weight_table pred_weight_table;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> #endif
>>>>> --
>>>>> Paul Kocialkowski, Bootlin (formerly Free Electrons)
>>>>> Embedded Linux and kernel engineering
>>>>> https://bootlin.com
>>>>>
>>> --
>>> Paul Kocialkowski, Bootlin (formerly Free Electrons)
>>> Embedded Linux and kernel engineering
>>> https://bootlin.com
>>>
> --
> Paul Kocialkowski, Bootlin
> Embedded Linux and kernel engineering
> https://bootlin.com
>


2019-01-24 12:24:21

by Randy Li

[permalink] [raw]
Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls




> On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski <[email protected]> wrote:
>
> Hi,
>
>> On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
>> I forget a important thing, for the rkvdec and rk hevc decoder, it would
>> requests cabac table, scaling list, picture parameter set and reference
>> picture storing in one or various of DMA buffers. I am not talking about
>> the data been parsed, the decoder would requests a raw data.
>>
>> For the pps and rps, it is possible to reuse the slice header, just let
>> the decoder know the offset from the bitstream bufer, I would suggest to
>> add three properties(with sps) for them. But I think we need a method to
>> mark a OUTPUT side buffer for those aux data.
>
> I'm quite confused about the hardware implementation then. From what
> you're saying, it seems that it takes the raw bitstream elements rather
> than parsed elements. Is it really a stateless implementation?
>
> The stateless implementation was designed with the idea that only the
> raw slice data should be passed in bitstream form to the decoder. For
> H.264, it seems that some decoders also need the slice header in raw
> bitstream form (because they take the full slice NAL unit), see the
> discussions in this thread:
> media: docs-rst: Document m2m stateless video decoder interface

Stateless just means that it won't track the previous result, but I
don't think you can define what data the hardware would need. Even if
you just build a DPB for the decoder, it is still stateless: parsing
less or more data from the bitstream doesn't stop a decoder from being
a stateless decoder.
>
> Can you detail exactly what the rockchip decoder absolutely needs in
> raw bitstream format?
>
> Cheers,
>
> Paul
>
>>> On 1/8/19 6:00 PM, Ayaka wrote:
>>> Sent from my iPad
>>>
>>>> On Jan 8, 2019, at 4:38 PM, Paul Kocialkowski <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>>> On Tue, 2019-01-08 at 09:16 +0800, Ayaka wrote:
>>>>>
>>>>> Sent from my iPad
>>>>>
>>>>>> On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>>> On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
>>>>>>>> On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
>>>>>>>>
>>>>>>>>>> +
>>>>>>>>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
>>>>>>>>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER 0x02
>>>>>>>>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR 0x03
>>>>>>>>>> +
>>>>>>>>>> +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX 16
>>>>>>>>>> +
>>>>>>>>>> +struct v4l2_hevc_dpb_entry {
>>>>>>>>>> + __u32 buffer_tag;
>>>>>>>>>> + __u8 rps;
>>>>>>>>>> + __u8 field_pic;
>>>>>>>>>> + __u16 pic_order_cnt[2];
>>>>>>>>>> +};
>>>>>>> Please add a property for reference index, if that rps is not used for
>>>>>>> this, some device would request that(not the rockchip one). And
>>>>>>> Rockchip's VDPU1 and VDPU2 for AVC would request a similar property.
>>>>>> What exactly is that reference index? Is it a bitstream element or
>>>>>> something deduced from the bitstream?
>>>>>>
>>>>> picture order count(POC) for HEVC and frame_num in AVC. I think it is
>>>>> the number used in list0(P slice and B slice) and list1(B slice).
>>>> The picture order count is already the last field of the DPB entry
>>>> structure. There is one for each field picture.
>>> As we are not sure whether there is a field coded slice or CTU, I would hold this part and else about the field.
>>>>>>> Adding another buffer_tag for referring the memory of the motion vectors
>>>>>>> for each frames. Or a better method is add a meta data to echo picture
>>>>>>> buffer, since the picture output is just the same as the original,
>>>>>>> display won't care whether the motion vectors are written the button of
>>>>>>> picture or somewhere else.
>>>>>> The motion vectors are passed as part of the raw bitstream data, in the
>>>>>> slices. Is there a case where the motion vectors are coded differently?
>>>>> No, it is an additional cache for decoder, even FFmpeg having such
>>>>> data, I think allwinner must output it into somewhere.
>>>> Ah yes I see what you mean! This is handled internally by our driver
>>>> and not exposed to userspace. I don't think it would be a good idea to
>>>> expose this cache or request that userspace allocates it like a video
>>>> buffer.
>>>>
>>> No, usually the driver should allocate, as the user space have no idea on size of each devices.
>>> But for advantage user, application can fix a broken picture with a proper data or analysis a object motion from that.
>>> So I would suggest attaching this information to a picture buffer as a meta data.
>>>>>>>>>> +
>>>>>>>>>> +struct v4l2_hevc_pred_weight_table {
>>>>>>>>>> + __u8 luma_log2_weight_denom;
>>>>>>>>>> + __s8 delta_chroma_log2_weight_denom;
>>>>>>>>>> +
>>>>>>>>>> + __s8 delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>>>> + __s8 luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>>>> + __s8 delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>>>>>>> + __s8 chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>>>>>>> +
>>>>>>>>>> + __s8 delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>>>> + __s8 luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>>>> + __s8 delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>>>>>>> + __s8 chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>>>>>>>>> +};
>>>>>>>>>> +
>>>>>>> Those properties I think are not necessary are applying for the
>>>>>>> Rockchip's device, may not work for the others.
>>>>>> Yes, it's possible that some of the elements are not necessary for some
>>>>>> decoders. What we want is to cover all the elements that might be
>>>>>> required for a decoder.
>>>>> I wonder whether allwinner need that, those sao flag usually ignored
>>>>> by decoder in design. But more is better than less, it is hard to
>>>>> extend a v4l2 structure in the future, maybe a new HEVC profile
>>>>> would bring a new property, it is still too early for HEVC.
>>>> Yes this is used by our decoder. The idea is to have all the basic
>>>> bitstream elements in the structures (even if some decoders don't use
>>>> them all) and add others for extension as separate controls later.
>>>>
>>>>>>>>>> +struct v4l2_ctrl_hevc_slice_params {
>>>>>>>>>> + __u32 bit_size;
>>>>>>>>>> + __u32 data_bit_offset;
>>>>>>>>>> +
>>>>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
>>>>>>>>>> + __u8 nal_unit_type;
>>>>>>>>>> + __u8 nuh_temporal_id_plus1;
>>>>>>>>>> +
>>>>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
>>>>>>>>>> + __u8 slice_type;
>>>>>>>>>> + __u8 colour_plane_id;
>>>>>>> ----------------------------------------------------------------------------
>>>>>>>>>> + __u16 slice_pic_order_cnt;
>>>>>>>>>> + __u8 slice_sao_luma_flag;
>>>>>>>>>> + __u8 slice_sao_chroma_flag;
>>>>>>>>>> + __u8 slice_temporal_mvp_enabled_flag;
>>>>>>>>>> + __u8 num_ref_idx_l0_active_minus1;
>>>>>>>>>> + __u8 num_ref_idx_l1_active_minus1;
>>>>>>> Rockchip's decoder doesn't use this part.
>>>>>>>>>> + __u8 mvd_l1_zero_flag;
>>>>>>>>>> + __u8 cabac_init_flag;
>>>>>>>>>> + __u8 collocated_from_l0_flag;
>>>>>>>>>> + __u8 collocated_ref_idx;
>>>>>>>>>> + __u8 five_minus_max_num_merge_cand;
>>>>>>>>>> + __u8 use_integer_mv_flag;
>>>>>>>>>> + __s8 slice_qp_delta;
>>>>>>>>>> + __s8 slice_cb_qp_offset;
>>>>>>>>>> + __s8 slice_cr_qp_offset;
>>>>>>>>>> + __s8 slice_act_y_qp_offset;
>>>>>>>>>> + __s8 slice_act_cb_qp_offset;
>>>>>>>>>> + __s8 slice_act_cr_qp_offset;
>>>>>>>>>> + __u8 slice_deblocking_filter_disabled_flag;
>>>>>>>>>> + __s8 slice_beta_offset_div2;
>>>>>>>>>> + __s8 slice_tc_offset_div2;
>>>>>>>>>> + __u8 slice_loop_filter_across_slices_enabled_flag;
>>>>>>>>>> +
>>>>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
>>>>>>>>>> + __u8 pic_struct;
>>>>>>> I think the decoder doesn't care about this, it is used for display.
>>>>>> The purpose of this field is to indicate whether the current picture is
>>>>>> a progressive frame or an interlaced field picture, which is useful for
>>>>>> decoding.
>>>>>>
>>>>>> At least our decoder has a register field to indicate frame/top
>>>>>> field/bottom field, so we certainly need to keep the info around.
>>>>>> Looking at the spec and the ffmpeg implementation, it looks like this
>>>>>> flag of the bitstream is the usual way to report field coding.
>>>>> It depends whether the decoder cares about scan type or more, I
>>>>> wonder prefer general_interlaced_source_flag for just scan type, it
>>>>> would be better than reading another SEL.
>>>> Well we still need a way to indicate if the current data is top or
>>>> bottom field for interlaced. I don't think that knowing that the whole
>>>> video is interlaced would be precise enough.
>>>>
>>>> Cheers,
>>>>
>>>> Paul
>>>>
>>>>>>>>>> +
>>>>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
>>>>>>>>>> + struct v4l2_hevc_dpb_entry dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>>>> + __u8 num_active_dpb_entries;
>>>>>>>>>> + __u8 ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>>>> + __u8 ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>>>>>>>>> +
>>>>>>>>>> + __u8 num_rps_poc_st_curr_before;
>>>>>>>>>> + __u8 num_rps_poc_st_curr_after;
>>>>>>>>>> + __u8 num_rps_poc_lt_curr;
>>>>>>>>>> +
>>>>>>>>>> + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Weighted prediction parameter */
>>>>>>>>>> + struct v4l2_hevc_pred_weight_table pred_weight_table;
>>>>>>>>>> +};
>>>>>>>>>> +
>>>>>>>>>> #endif
>>>>>> --
>>>>>> Paul Kocialkowski, Bootlin (formerly Free Electrons)
>>>>>> Embedded Linux and kernel engineering
>>>>>> https://bootlin.com
>>>>>>
>>>> --
>>>> Paul Kocialkowski, Bootlin (formerly Free Electrons)
>>>> Embedded Linux and kernel engineering
>>>> https://bootlin.com
>>>>
> --
> Paul Kocialkowski, Bootlin
> Embedded Linux and kernel engineering
> https://bootlin.com
>


2019-01-24 13:11:01

by Paul Kocialkowski

Subject: Re: [PATCH v2 2/2] media: cedrus: Add HEVC/H.265 decoding support

Hi,

On Tue, 2018-11-27 at 09:21 +0100, Maxime Ripard wrote:
> Hi!
>
> On Fri, Nov 23, 2018 at 02:02:09PM +0100, Paul Kocialkowski wrote:
> > This introduces support for HEVC/H.265 to the Cedrus VPU driver, with
> > both uni-directional and bi-directional prediction modes supported.
> >
> > Field-coded (interlaced) pictures, custom quantization matrices and
> > 10-bit output are not supported at this point.
> >
> > Signed-off-by: Paul Kocialkowski <[email protected]>
>
> Output from checkpatch:
> total: 0 errors, 68 warnings, 14 checks, 999 lines checked

Looks like many of the "line over 80 chars" warnings are due to macros.
I don't think it would be a good idea to break them down or to change
the macro names since they are directly inherited from the bitstream
elements.

What do you think?

> > +/*
> > + * Note: Neighbor info buffer size is apparently doubled for H6, which may be
> > + * related to 10 bit H265 support.
> > + */
> > +#define CEDRUS_H265_NEIGHBOR_INFO_BUF_SIZE (397 * SZ_1K)
> > +#define CEDRUS_H265_ENTRY_POINTS_BUF_SIZE (4 * SZ_1K)
> > +#define CEDRUS_H265_MV_COL_BUF_UNIT_CTB_SIZE 160
>
> Having some information on where this is coming from would be useful.

Yes, definitely.

> > +static void cedrus_h265_sram_write_data(struct cedrus_dev *dev, u32 *data,
>
> Since the data pointer is pretty much an opaque structure, you should
> have a void pointer here, that would avoid the type casting you're
> doing when calling that function.

Sure, that would make more sense.
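
To illustrate the void pointer suggestion, here is a minimal standalone
sketch. It is not the Cedrus code: struct mock_dev, mock_sram_write_data()
and struct mock_frame_info are hypothetical stand-ins, with a plain array
modelling the SRAM and a size given in bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_SRAM_WORDS 64

/* Hypothetical stand-in for the device: an array models the VPU SRAM. */
struct mock_dev {
	uint32_t sram[MOCK_SRAM_WORDS];
	size_t sram_pos;	/* index of the next word to write */
};

/*
 * Taking a const void pointer lets callers pass any structure without a
 * cast; the helper copies it out one 32-bit word at a time.
 * Assumption: size is in bytes and is a multiple of 4.
 */
static void mock_sram_write_data(struct mock_dev *dev, const void *data,
				 size_t size)
{
	const uint32_t *words = data;
	size_t count = size / sizeof(uint32_t);

	while (count-- && dev->sram_pos < MOCK_SRAM_WORDS)
		dev->sram[dev->sram_pos++] = *words++;
}

/* Example payload, loosely modelled on a per-frame info record. */
struct mock_frame_info {
	uint32_t top_pic_order_cnt;
	uint32_t bottom_pic_order_cnt;
};
```

A caller would then write mock_sram_write_data(&dev, &info, sizeof(info))
with no (u32 *) cast at the call site.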

[...]

> > + /* Output frame. */
> > +
> > + output_pic_list_index = V4L2_HEVC_DPB_ENTRIES_NUM_MAX;
> > + pic_order_cnt[0] = pic_order_cnt[1] = slice_params->slice_pic_order_cnt;
> > + mv_col_buf_addr[0] = cedrus_h265_frame_info_mv_col_buf_addr(ctx,
> > + run->dst->vb2_buf.index, 0) - PHYS_OFFSET;
> > + mv_col_buf_addr[1] = cedrus_h265_frame_info_mv_col_buf_addr(ctx,
> > + run->dst->vb2_buf.index, 1) - PHYS_OFFSET;
> > + dst_luma_addr = cedrus_dst_buf_addr(ctx, run->dst->vb2_buf.index, 0) -
> > + PHYS_OFFSET;
> > + dst_chroma_addr = cedrus_dst_buf_addr(ctx, run->dst->vb2_buf.index, 1) -
> > + PHYS_OFFSET;
> > +
> > + cedrus_h265_frame_info_write_single(dev, output_pic_list_index,
> > + slice_params->pic_struct != 0,
> > + pic_order_cnt, mv_col_buf_addr,
> > + dst_luma_addr, dst_chroma_addr);
>
> You can only pass the run and slice_params pointers to that function.

The point is to make it independent from the context, so that the same
function can be called with either the slice_params or the dpb info.
I don't think making two variants or even two wrappers would bring any
significant benefit.

> > +
> > + cedrus_write(dev, VE_DEC_H265_OUTPUT_FRAME_IDX, output_pic_list_index);
> > +
> > + /* Reference picture list 0 (for P/B frames). */
> > + if (slice_params->slice_type != V4L2_HEVC_SLICE_TYPE_I) {
> > + cedrus_h265_ref_pic_list_write(dev, slice_params->ref_idx_l0,
> > + slice_params->num_ref_idx_l0_active_minus1 + 1,
> > + slice_params->dpb, slice_params->num_active_dpb_entries,
> > + VE_DEC_H265_SRAM_OFFSET_REF_PIC_LIST0);
> > +
>
> slice_params is enough.

The rationale is similar to the one above: being able to use the same
helper with either L0 or L1, which implies passing the relevant
elements directly.

> > + if (pps->weighted_pred_flag || pps->weighted_bipred_flag)
> > + cedrus_h265_pred_weight_write(dev,
> > + pred_weight_table->delta_luma_weight_l0,
> > + pred_weight_table->luma_offset_l0,
> > + pred_weight_table->delta_chroma_weight_l0,
> > + pred_weight_table->chroma_offset_l0,
> > + slice_params->num_ref_idx_l0_active_minus1 + 1,
> > + VE_DEC_H265_SRAM_OFFSET_PRED_WEIGHT_LUMA_L0,
> > + VE_DEC_H265_SRAM_OFFSET_PRED_WEIGHT_CHROMA_L0);
>
> Ditto, that function should only take the pred_weight_table and
> slice_params pointers

And the same rationale applies here as well.

> > + }
> > +
> > + /* Reference picture list 1 (for B frames). */
> > + if (slice_params->slice_type == V4L2_HEVC_SLICE_TYPE_B) {
> > + cedrus_h265_ref_pic_list_write(dev, slice_params->ref_idx_l1,
> > + slice_params->num_ref_idx_l1_active_minus1 + 1,
> > + slice_params->dpb,
> > + slice_params->num_active_dpb_entries,
> > + VE_DEC_H265_SRAM_OFFSET_REF_PIC_LIST1);
> > +
> > + if (pps->weighted_bipred_flag)
> > + cedrus_h265_pred_weight_write(dev,
> > + pred_weight_table->delta_luma_weight_l1,
> > + pred_weight_table->luma_offset_l1,
> > + pred_weight_table->delta_chroma_weight_l1,
> > + pred_weight_table->chroma_offset_l1,
> > + slice_params->num_ref_idx_l1_active_minus1 + 1,
> > + VE_DEC_H265_SRAM_OFFSET_PRED_WEIGHT_LUMA_L1,
> > + VE_DEC_H265_SRAM_OFFSET_PRED_WEIGHT_CHROMA_L1);
> > + }
>
> Ditto
>
> Looks good otherwise, thanks!

Thanks for the review!

Cheers,

Paul

--
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


2019-01-25 10:11:07

by Maxime Ripard

Subject: Re: [PATCH v2 2/2] media: cedrus: Add HEVC/H.265 decoding support

Hi,

On Thu, Jan 24, 2019 at 02:10:25PM +0100, Paul Kocialkowski wrote:
> On Tue, 2018-11-27 at 09:21 +0100, Maxime Ripard wrote:
> > Hi!
> >
> > On Fri, Nov 23, 2018 at 02:02:09PM +0100, Paul Kocialkowski wrote:
> > > This introduces support for HEVC/H.265 to the Cedrus VPU driver, with
> > > both uni-directional and bi-directional prediction modes supported.
> > >
> > > Field-coded (interlaced) pictures, custom quantization matrices and
> > > 10-bit output are not supported at this point.
> > >
> > > Signed-off-by: Paul Kocialkowski <[email protected]>
> >
> > Output from checkpatch:
> > total: 0 errors, 68 warnings, 14 checks, 999 lines checked
>
> Looks like many of the "line over 80 chars" are due to macros. I don't
> think it would be a good idea to break them down or to change the
> macros names since they are directly inherited from the bitstream
> elements.
>
> What do you think?

Yeah, the 80-chars limit can be ignored. But there are more warnings
and checks that should be addressed.

> > > + /* Output frame. */
> > > +
> > > + output_pic_list_index = V4L2_HEVC_DPB_ENTRIES_NUM_MAX;
> > > + pic_order_cnt[0] = pic_order_cnt[1] = slice_params->slice_pic_order_cnt;
> > > + mv_col_buf_addr[0] = cedrus_h265_frame_info_mv_col_buf_addr(ctx,
> > > + run->dst->vb2_buf.index, 0) - PHYS_OFFSET;
> > > + mv_col_buf_addr[1] = cedrus_h265_frame_info_mv_col_buf_addr(ctx,
> > > + run->dst->vb2_buf.index, 1) - PHYS_OFFSET;
> > > + dst_luma_addr = cedrus_dst_buf_addr(ctx, run->dst->vb2_buf.index, 0) -
> > > + PHYS_OFFSET;
> > > + dst_chroma_addr = cedrus_dst_buf_addr(ctx, run->dst->vb2_buf.index, 1) -
> > > + PHYS_OFFSET;
> > > +
> > > + cedrus_h265_frame_info_write_single(dev, output_pic_list_index,
> > > + slice_params->pic_struct != 0,
> > > + pic_order_cnt, mv_col_buf_addr,
> > > + dst_luma_addr, dst_chroma_addr);
> >
> > You can only pass the run and slice_params pointers to that function.
>
> The point is to make it independent from the context, so that the same
> function can be called with either the slice_params or the dpb info.
> I don't think making two variants or even two wrappers would bring any
> significant benefit.

Then you can still pass the vb2 buffer pointer directly, which would
remove the mv_col_buf_addr, dst_luma_addr and dst_chroma_addr
arguments. The idea here is that the fewer arguments your function
has, the easier it is to understand.
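
As a rough sketch of what passing the buffer pointer could look like
(all names below are hypothetical and the addresses are plain integers,
this is not the driver's actual layout):

```c
#include <stdint.h>

/*
 * Hypothetical per-buffer state. In a real driver this would be looked
 * up from the vb2 buffer rather than stored in a flat structure.
 */
struct mock_buf {
	uint32_t luma_addr;
	uint32_t chroma_addr;
	uint32_t mv_col_addr[2];
};

/* Hypothetical record that would end up being written to the hardware. */
struct mock_frame_info {
	int field_pic;
	uint32_t pic_order_cnt[2];
	uint32_t mv_col_addr[2];
	uint32_t luma_addr;
	uint32_t chroma_addr;
};

/*
 * One buffer pointer replaces three address arguments. The helper stays
 * independent of the slice params and DPB entry types, so it can serve
 * both the output frame and the reference frames.
 */
static void mock_frame_info_fill(struct mock_frame_info *info, int field_pic,
				 const uint32_t pic_order_cnt[2],
				 const struct mock_buf *buf)
{
	info->field_pic = field_pic;
	info->pic_order_cnt[0] = pic_order_cnt[0];
	info->pic_order_cnt[1] = pic_order_cnt[1];
	info->mv_col_addr[0] = buf->mv_col_addr[0];
	info->mv_col_addr[1] = buf->mv_col_addr[1];
	info->luma_addr = buf->luma_addr;
	info->chroma_addr = buf->chroma_addr;
}
```

The caller still chooses where field_pic and pic_order_cnt come from, so
the same function works with either the slice params or a DPB entry.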

> > > +
> > > + cedrus_write(dev, VE_DEC_H265_OUTPUT_FRAME_IDX, output_pic_list_index);
> > > +
> > > + /* Reference picture list 0 (for P/B frames). */
> > > + if (slice_params->slice_type != V4L2_HEVC_SLICE_TYPE_I) {
> > > + cedrus_h265_ref_pic_list_write(dev, slice_params->ref_idx_l0,
> > > + slice_params->num_ref_idx_l0_active_minus1 + 1,
> > > + slice_params->dpb, slice_params->num_active_dpb_entries,
> > > + VE_DEC_H265_SRAM_OFFSET_REF_PIC_LIST0);
> > > +
> >
> > slice_params is enough.
>
> The rationale is similar to the one above: being able to use the same
> helper with either L0 or L1, which implies passing the relevant
> elements directly.

The DPB and num_active_dpb_entries will not change from one run to the
other though. And having intermediate functions is fine as well if
that makes things clearer.
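
Something along these lines, as a sketch only: the mock_ names are made
up, the structure keeps just the fields quoted above, and the "write" is
a copy into an array instead of SRAM accesses.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_DPB_ENTRIES_NUM_MAX 16

/* Reduced, hypothetical version of the slice params. */
struct mock_slice_params {
	uint8_t ref_idx_l0[MOCK_DPB_ENTRIES_NUM_MAX];
	uint8_t ref_idx_l1[MOCK_DPB_ENTRIES_NUM_MAX];
	uint8_t num_ref_idx_l0_active_minus1;
	uint8_t num_ref_idx_l1_active_minus1;
	uint8_t num_active_dpb_entries;
};

/*
 * Generic helper shared by both lists: copies the indices that point to
 * an active DPB entry and returns how many were written.
 */
static size_t mock_ref_pic_list_write(uint8_t *out, const uint8_t *list,
				      size_t num_ref_idx_active,
				      size_t num_active_dpb_entries)
{
	size_t i, n = 0;

	for (i = 0; i < num_ref_idx_active &&
		    i < MOCK_DPB_ENTRIES_NUM_MAX; i++) {
		if (list[i] >= num_active_dpb_entries)
			continue;	/* skip indices outside the DPB */
		out[n++] = list[i];
	}

	return n;
}

/* Thin wrappers: call sites only need to pass the slice params. */
static size_t mock_ref_pic_list0_write(uint8_t *out,
				       const struct mock_slice_params *p)
{
	return mock_ref_pic_list_write(out, p->ref_idx_l0,
				       p->num_ref_idx_l0_active_minus1 + 1,
				       p->num_active_dpb_entries);
}

static size_t mock_ref_pic_list1_write(uint8_t *out,
				       const struct mock_slice_params *p)
{
	return mock_ref_pic_list_write(out, p->ref_idx_l1,
				       p->num_ref_idx_l1_active_minus1 + 1,
				       p->num_active_dpb_entries);
}
```

The generic helper is kept for code sharing, while the two wrappers keep
the call sites short.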

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



2019-01-25 13:05:19

by Paul Kocialkowski

Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

Hi,

On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
>
> Sent from my iPad
>
> > On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski <[email protected]> wrote:
> >
> > Hi,
> >
> > > On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> > > I forget a important thing, for the rkvdec and rk hevc decoder, it would
> > > requests cabac table, scaling list, picture parameter set and reference
> > > picture storing in one or various of DMA buffers. I am not talking about
> > > the data been parsed, the decoder would requests a raw data.
> > >
> > > For the pps and rps, it is possible to reuse the slice header, just let
> > > the decoder know the offset from the bitstream bufer, I would suggest to
> > > add three properties(with sps) for them. But I think we need a method to
> > > mark a OUTPUT side buffer for those aux data.
> >
> > I'm quite confused about the hardware implementation then. From what
> > you're saying, it seems that it takes the raw bitstream elements rather
> > than parsed elements. Is it really a stateless implementation?
> >
> > The stateless implementation was designed with the idea that only the
> > raw slice data should be passed in bitstream form to the decoder. For
> > H.264, it seems that some decoders also need the slice header in raw
> > bitstream form (because they take the full slice NAL unit), see the
> > discussions in this thread:
> > media: docs-rst: Document m2m stateless video decoder interface
>
> Stateless just mean it won’t track the previous result, but I don’t
> think you can define what a date the hardware would need. Even you
> just build a dpb for the decoder, it is still stateless, but parsing
> less or more data from the bitstream doesn’t stop a decoder become a
> stateless decoder.

Yes, fair enough: the format in which the hardware decoder takes the
bitstream parameters does not make it stateless or stateful per se.
It's just that stateless decoders should have no particular reason to
parse the bitstream on their own, since the hardware can be designed
with registers for each relevant bitstream element to configure the
decoding pipeline. That's how GPU-based decoder implementations are
designed (VAAPI/VDPAU/NVDEC, etc).

So the format we have agreed on so far for the stateless interface is
to pass parsed elements via v4l2 control structures.
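
As a small illustration of "parsed elements", here is a sketch of how
userspace might extract the two-byte HEVC NAL unit header fields before
filling in a control structure. The field widths follow the H.265 NAL
unit header syntax; struct mock_hevc_nal_header and
mock_parse_nal_header() are hypothetical names, not part of the proposed
API.

```c
#include <stdint.h>

/* Hypothetical container for the parsed NAL unit header fields. */
struct mock_hevc_nal_header {
	uint8_t nal_unit_type;		/* 6 bits */
	uint8_t nuh_layer_id;		/* 6 bits */
	uint8_t nuh_temporal_id_plus1;	/* 3 bits */
};

/*
 * Layout of the two header bytes, most significant bit first:
 * forbidden_zero_bit (1), nal_unit_type (6), nuh_layer_id (6),
 * nuh_temporal_id_plus1 (3).
 * Returns 0 on success, -1 if forbidden_zero_bit is set.
 */
static int mock_parse_nal_header(const uint8_t bytes[2],
				 struct mock_hevc_nal_header *hdr)
{
	if (bytes[0] & 0x80)
		return -1;

	hdr->nal_unit_type = (bytes[0] >> 1) & 0x3f;
	hdr->nuh_layer_id = ((bytes[0] & 0x01) << 5) | (bytes[1] >> 3);
	hdr->nuh_temporal_id_plus1 = bytes[1] & 0x07;

	return 0;
}
```

The resulting values would then be copied to fields such as
nal_unit_type and nuh_temporal_id_plus1 in the slice params control,
with only the slice data itself left in bitstream form.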

If the hardware can only work by parsing the bitstream itself, I'm not
sure what the best solution would be. Reconstructing the bitstream in
the kernel is a pretty bad option, but so is parsing in the kernel or
having the data both in parsed and raw forms. Do you see another
possibility?

Cheers,

Paul

> > Can you detail exactly what the rockchip decoder absolutely needs in
> > raw bitstream format?
> >
> > Cheers,
> >
> > Paul
> >
> > > > On 1/8/19 6:00 PM, Ayaka wrote:
> > > > Sent from my iPad
> > > >
> > > > > On Jan 8, 2019, at 4:38 PM, Paul Kocialkowski <[email protected]> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > > On Tue, 2019-01-08 at 09:16 +0800, Ayaka wrote:
> > > > > >
> > > > > > Sent from my iPad
> > > > > >
> > > > > > > On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > > > On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
> > > > > > > > > On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
> > > > > > > > >
> > > > > > > > > > > +
> > > > > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
> > > > > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER 0x02
> > > > > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR 0x03
> > > > > > > > > > > +
> > > > > > > > > > > +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX 16
> > > > > > > > > > > +
> > > > > > > > > > > +struct v4l2_hevc_dpb_entry {
> > > > > > > > > > > + __u32 buffer_tag;
> > > > > > > > > > > + __u8 rps;
> > > > > > > > > > > + __u8 field_pic;
> > > > > > > > > > > + __u16 pic_order_cnt[2];
> > > > > > > > > > > +};
> > > > > > > > Please add a property for reference index, if that rps is not used for
> > > > > > > > this, some device would request that(not the rockchip one). And
> > > > > > > > Rockchip's VDPU1 and VDPU2 for AVC would request a similar property.
> > > > > > > What exactly is that reference index? Is it a bitstream element or
> > > > > > > something deduced from the bitstream?
> > > > > > >
> > > > > > picture order count(POC) for HEVC and frame_num in AVC. I think it is
> > > > > > the number used in list0(P slice and B slice) and list1(B slice).
> > > > > The picture order count is already the last field of the DPB entry
> > > > > structure. There is one for each field picture.
> > > > As we are not sure whether there is a field coded slice or CTU, I would hold this part and else about the field.
> > > > > > > > Adding another buffer_tag for referring the memory of the motion vectors
> > > > > > > > for each frames. Or a better method is add a meta data to echo picture
> > > > > > > > buffer, since the picture output is just the same as the original,
> > > > > > > > display won't care whether the motion vectors are written the button of
> > > > > > > > picture or somewhere else.
> > > > > > > The motion vectors are passed as part of the raw bitstream data, in the
> > > > > > > slices. Is there a case where the motion vectors are coded differently?
> > > > > > No, it is an additional cache for decoder, even FFmpeg having such
> > > > > > data, I think allwinner must output it into somewhere.
> > > > > Ah yes I see what you mean! This is handled internally by our driver
> > > > > and not exposed to userspace. I don't think it would be a good idea to
> > > > > expose this cache or request that userspace allocates it like a video
> > > > > buffer.
> > > > >
> > > > No, usually the driver should allocate, as the user space have no idea on size of each devices.
> > > > But for advantage user, application can fix a broken picture with a proper data or analysis a object motion from that.
> > > > So I would suggest attaching this information to a picture buffer as a meta data.
> > > > > > > > > > > +
> > > > > > > > > > > +struct v4l2_hevc_pred_weight_table {
> > > > > > > > > > > + __u8 luma_log2_weight_denom;
> > > > > > > > > > > + __s8 delta_chroma_log2_weight_denom;
> > > > > > > > > > > +
> > > > > > > > > > > + __s8 delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > > > > + __s8 luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > > > > + __s8 delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > > > > > > + __s8 chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > > > > > > +
> > > > > > > > > > > + __s8 delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > > > > + __s8 luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > > > > + __s8 delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > > > > > > + __s8 chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > > > > > > +};
> > > > > > > > > > > +
> > > > > > > > Those properties I think are not necessary are applying for the
> > > > > > > > Rockchip's device, may not work for the others.
> > > > > > > Yes, it's possible that some of the elements are not necessary for some
> > > > > > > decoders. What we want is to cover all the elements that might be
> > > > > > > required for a decoder.
> > > > > > I wonder whether allwinner need that, those sao flag usually ignored
> > > > > > by decoder in design. But more is better than less, it is hard to
> > > > > > extend a v4l2 structure in the future, maybe a new HEVC profile
> > > > > > would bring a new property, it is still too early for HEVC.
> > > > > Yes this is used by our decoder. The idea is to have all the basic
> > > > > bitstream elements in the structures (even if some decoders don't use
> > > > > them all) and add others for extension as separate controls later.
> > > > >
> > > > > > > > > > > +struct v4l2_ctrl_hevc_slice_params {
> > > > > > > > > > > + __u32 bit_size;
> > > > > > > > > > > + __u32 data_bit_offset;
> > > > > > > > > > > +
> > > > > > > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
> > > > > > > > > > > + __u8 nal_unit_type;
> > > > > > > > > > > + __u8 nuh_temporal_id_plus1;
> > > > > > > > > > > +
> > > > > > > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> > > > > > > > > > > + __u8 slice_type;
> > > > > > > > > > > + __u8 colour_plane_id;
> > > > > > > > ----------------------------------------------------------------------------
> > > > > > > > > > > + __u16 slice_pic_order_cnt;
> > > > > > > > > > > + __u8 slice_sao_luma_flag;
> > > > > > > > > > > + __u8 slice_sao_chroma_flag;
> > > > > > > > > > > + __u8 slice_temporal_mvp_enabled_flag;
> > > > > > > > > > > + __u8 num_ref_idx_l0_active_minus1;
> > > > > > > > > > > + __u8 num_ref_idx_l1_active_minus1;
> > > > > > > > Rockchip's decoder doesn't use this part.
> > > > > > > > > > > + __u8 mvd_l1_zero_flag;
> > > > > > > > > > > + __u8 cabac_init_flag;
> > > > > > > > > > > + __u8 collocated_from_l0_flag;
> > > > > > > > > > > + __u8 collocated_ref_idx;
> > > > > > > > > > > + __u8 five_minus_max_num_merge_cand;
> > > > > > > > > > > + __u8 use_integer_mv_flag;
> > > > > > > > > > > + __s8 slice_qp_delta;
> > > > > > > > > > > + __s8 slice_cb_qp_offset;
> > > > > > > > > > > + __s8 slice_cr_qp_offset;
> > > > > > > > > > > + __s8 slice_act_y_qp_offset;
> > > > > > > > > > > + __s8 slice_act_cb_qp_offset;
> > > > > > > > > > > + __s8 slice_act_cr_qp_offset;
> > > > > > > > > > > + __u8 slice_deblocking_filter_disabled_flag;
> > > > > > > > > > > + __s8 slice_beta_offset_div2;
> > > > > > > > > > > + __s8 slice_tc_offset_div2;
> > > > > > > > > > > + __u8 slice_loop_filter_across_slices_enabled_flag;
> > > > > > > > > > > +
> > > > > > > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
> > > > > > > > > > > + __u8 pic_struct;
> > > > > > > > I think the decoder doesn't care about this, it is used for display.
> > > > > > > The purpose of this field is to indicate whether the current picture is
> > > > > > > a progressive frame or an interlaced field picture, which is useful for
> > > > > > > decoding.
> > > > > > >
> > > > > > > At least our decoder has a register field to indicate frame/top
> > > > > > > field/bottom field, so we certainly need to keep the info around.
> > > > > > > Looking at the spec and the ffmpeg implementation, it looks like this
> > > > > > > flag of the bitstream is the usual way to report field coding.
> > > > > > It depends whether the decoder cares about scan type or more, I
> > > > > > wonder prefer general_interlaced_source_flag for just scan type, it
> > > > > > would be better than reading another SEL.
> > > > > Well we still need a way to indicate if the current data is top or
> > > > > bottom field for interlaced. I don't think that knowing that the whole
> > > > > video is interlaced would be precise enough.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Paul
> > > > >
> > > > > > > > > > > +
> > > > > > > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> > > > > > > > > > > + struct v4l2_hevc_dpb_entry dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > > > > + __u8 num_active_dpb_entries;
> > > > > > > > > > > + __u8 ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > > > > + __u8 ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > > > > +
> > > > > > > > > > > + __u8 num_rps_poc_st_curr_before;
> > > > > > > > > > > + __u8 num_rps_poc_st_curr_after;
> > > > > > > > > > > + __u8 num_rps_poc_lt_curr;
> > > > > > > > > > > +
> > > > > > > > > > > + /* ISO/IEC 23008-2, ITU-T Rec. H.265: Weighted prediction parameter */
> > > > > > > > > > > + struct v4l2_hevc_pred_weight_table pred_weight_table;
> > > > > > > > > > > +};
> > > > > > > > > > > +
> > > > > > > > > > > #endif
> > > > > > > --
> > > > > > > Paul Kocialkowski, Bootlin (formerly Free Electrons)
> > > > > > > Embedded Linux and kernel engineering
> > > > > > > https://bootlin.com
> > > > > > >
> > > > > --
> > > > > Paul Kocialkowski, Bootlin (formerly Free Electrons)
> > > > > Embedded Linux and kernel engineering
> > > > > https://bootlin.com
> > > > >
> > --
> > Paul Kocialkowski, Bootlin
> > Embedded Linux and kernel engineering
> > https://bootlin.com
> >
--
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


2019-01-29 07:54:12

by Alexandre Courbot

Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
<[email protected]> wrote:
>
> Hi,
>
> On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> >
> > Sent from my iPad
> >
> > > On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > > On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> > > > I forget a important thing, for the rkvdec and rk hevc decoder, it would
> > > > requests cabac table, scaling list, picture parameter set and reference
> > > > picture storing in one or various of DMA buffers. I am not talking about
> > > > the data been parsed, the decoder would requests a raw data.
> > > >
> > > > For the pps and rps, it is possible to reuse the slice header, just let
> > > > the decoder know the offset from the bitstream bufer, I would suggest to
> > > > add three properties(with sps) for them. But I think we need a method to
> > > > mark a OUTPUT side buffer for those aux data.
> > >
> > > I'm quite confused about the hardware implementation then. From what
> > > you're saying, it seems that it takes the raw bitstream elements rather
> > > than parsed elements. Is it really a stateless implementation?
> > >
> > > The stateless implementation was designed with the idea that only the
> > > raw slice data should be passed in bitstream form to the decoder. For
> > > H.264, it seems that some decoders also need the slice header in raw
> > > bitstream form (because they take the full slice NAL unit), see the
> > > discussions in this thread:
> > > media: docs-rst: Document m2m stateless video decoder interface
> >
> > Stateless just mean it won’t track the previous result, but I don’t
> > think you can define what a date the hardware would need. Even you
> > just build a dpb for the decoder, it is still stateless, but parsing
> > less or more data from the bitstream doesn’t stop a decoder become a
> > stateless decoder.
>
> Yes fair enough, the format in which the hardware decoder takes the
> bitstream parameters does not make it stateless or stateful per-se.
> It's just that stateless decoders should have no particular reason for
> parsing the bitstream on their own since the hardware can be designed
> with registers for each relevant bitstream element to configure the
> decoding pipeline. That's how GPU-based decoder implementations are
> implemented (VAAPI/VDPAU/NVDEC, etc).
>
> So the format we have agreed on so far for the stateless interface is
> to pass parsed elements via v4l2 control structures.
>
> If the hardware can only work by parsing the bitstream itself, I'm not
> sure what the best solution would be. Reconstructing the bitstream in
> the kernel is a pretty bad option, but so is parsing in the kernel or
> having the data both in parsed and raw forms. Do you see another
> possibility?

Is reconstructing the bitstream so bad? The v4l2 controls provide a
generic interface to an encoded format which the driver needs to
convert into a sequence that the hardware can understand. Typically
this is done by populating hardware-specific structures. Can't we
consider that in this specific instance, the hardware-specific
structure just happens to be identical to the original bitstream
format?

I agree that this is not strictly optimal for that particular
hardware, but such is the cost of abstractions, and in this specific
case I don't believe the cost would be particularly high?

2019-01-29 08:11:24

by Maxime Ripard

Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

On Tue, Jan 29, 2019 at 04:44:35PM +0900, Alexandre Courbot wrote:
> On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
> > On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> > >
> > > Sent from my iPad
> > >
> > > > On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > > On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> > > > > I forget a important thing, for the rkvdec and rk hevc decoder, it would
> > > > > requests cabac table, scaling list, picture parameter set and reference
> > > > > picture storing in one or various of DMA buffers. I am not talking about
> > > > > the data been parsed, the decoder would requests a raw data.
> > > > >
> > > > > For the pps and rps, it is possible to reuse the slice header, just let
> > > > > the decoder know the offset from the bitstream bufer, I would suggest to
> > > > > add three properties(with sps) for them. But I think we need a method to
> > > > > mark a OUTPUT side buffer for those aux data.
> > > >
> > > > I'm quite confused about the hardware implementation then. From what
> > > > you're saying, it seems that it takes the raw bitstream elements rather
> > > > than parsed elements. Is it really a stateless implementation?
> > > >
> > > > The stateless implementation was designed with the idea that only the
> > > > raw slice data should be passed in bitstream form to the decoder. For
> > > > H.264, it seems that some decoders also need the slice header in raw
> > > > bitstream form (because they take the full slice NAL unit), see the
> > > > discussions in this thread:
> > > > media: docs-rst: Document m2m stateless video decoder interface
> > >
> > > Stateless just mean it won’t track the previous result, but I don’t
> > > think you can define what a date the hardware would need. Even you
> > > just build a dpb for the decoder, it is still stateless, but parsing
> > > less or more data from the bitstream doesn’t stop a decoder become a
> > > stateless decoder.
> >
> > Yes fair enough, the format in which the hardware decoder takes the
> > bitstream parameters does not make it stateless or stateful per-se.
> > It's just that stateless decoders should have no particular reason for
> > parsing the bitstream on their own since the hardware can be designed
> > with registers for each relevant bitstream element to configure the
> > decoding pipeline. That's how GPU-based decoder implementations are
> > implemented (VAAPI/VDPAU/NVDEC, etc).
> >
> > So the format we have agreed on so far for the stateless interface is
> > to pass parsed elements via v4l2 control structures.
> >
> > If the hardware can only work by parsing the bitstream itself, I'm not
> > sure what the best solution would be. Reconstructing the bitstream in
> > the kernel is a pretty bad option, but so is parsing in the kernel or
> > having the data both in parsed and raw forms. Do you see another
> > possibility?
>
> Is reconstructing the bitstream so bad? The v4l2 controls provide a
> generic interface to an encoded format which the driver needs to
> convert into a sequence that the hardware can understand. Typically
> this is done by populating hardware-specific structures. Can't we
> consider that in this specific instance, the hardware-specific
> structure just happens to be identical to the original bitstream
> format?
>
> I agree that this is not strictly optimal for that particular
> hardware, but such is the cost of abstractions, and in this specific
> case I don't believe the cost would be particularly high?

I mean, that argument can be made for the rockchip driver as well. If
reconstructing the bitstream is something we can do, and if we don't
care about being suboptimal for one particular piece of hardware, then
why doesn't the rockchip driver just recreate the bitstream from that API?

After all, this is just a hardware-specific header that happens to be
identical to the original bitstream format.

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2019-01-29 09:41:29

by Tomasz Figa

Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

On Tue, Jan 29, 2019 at 5:09 PM Maxime Ripard <[email protected]> wrote:
>
> On Tue, Jan 29, 2019 at 04:44:35PM +0900, Alexandre Courbot wrote:
> > On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
> > > On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> > > >
> > > > Sent from my iPad
> > > >
> > > > > On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski <[email protected]> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > > On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> > > > > > I forget a important thing, for the rkvdec and rk hevc decoder, it would
> > > > > > requests cabac table, scaling list, picture parameter set and reference
> > > > > > picture storing in one or various of DMA buffers. I am not talking about
> > > > > > the data been parsed, the decoder would requests a raw data.
> > > > > >
> > > > > > For the pps and rps, it is possible to reuse the slice header, just let
> > > > > > the decoder know the offset from the bitstream bufer, I would suggest to
> > > > > > add three properties(with sps) for them. But I think we need a method to
> > > > > > mark a OUTPUT side buffer for those aux data.
> > > > >
> > > > > I'm quite confused about the hardware implementation then. From what
> > > > > you're saying, it seems that it takes the raw bitstream elements rather
> > > > > than parsed elements. Is it really a stateless implementation?
> > > > >
> > > > > The stateless implementation was designed with the idea that only the
> > > > > raw slice data should be passed in bitstream form to the decoder. For
> > > > > H.264, it seems that some decoders also need the slice header in raw
> > > > > bitstream form (because they take the full slice NAL unit), see the
> > > > > discussions in this thread:
> > > > > media: docs-rst: Document m2m stateless video decoder interface
> > > >
> > > > Stateless just mean it won’t track the previous result, but I don’t
> > > > think you can define what a date the hardware would need. Even you
> > > > just build a dpb for the decoder, it is still stateless, but parsing
> > > > less or more data from the bitstream doesn’t stop a decoder become a
> > > > stateless decoder.
> > >
> > > Yes fair enough, the format in which the hardware decoder takes the
> > > bitstream parameters does not make it stateless or stateful per-se.
> > > It's just that stateless decoders should have no particular reason for
> > > parsing the bitstream on their own since the hardware can be designed
> > > with registers for each relevant bitstream element to configure the
> > > decoding pipeline. That's how GPU-based decoder implementations are
> > > implemented (VAAPI/VDPAU/NVDEC, etc).
> > >
> > > So the format we have agreed on so far for the stateless interface is
> > > to pass parsed elements via v4l2 control structures.
> > >
> > > If the hardware can only work by parsing the bitstream itself, I'm not
> > > sure what the best solution would be. Reconstructing the bitstream in
> > > the kernel is a pretty bad option, but so is parsing in the kernel or
> > > having the data both in parsed and raw forms. Do you see another
> > > possibility?
> >
> > Is reconstructing the bitstream so bad? The v4l2 controls provide a
> > generic interface to an encoded format which the driver needs to
> > convert into a sequence that the hardware can understand. Typically
> > this is done by populating hardware-specific structures. Can't we
> > consider that in this specific instance, the hardware-specific
> > structure just happens to be identical to the original bitstream
> > format?
> >
> > I agree that this is not strictly optimal for that particular
> > hardware, but such is the cost of abstractions, and in this specific
> > case I don't believe the cost would be particularly high?
>
> I mean, that argument can be made for the rockchip driver as well. If
> reconstructing the bitstream is something we can do, and if we don't
> care about being suboptimal for one particular hardware, then why the
> rockchip driver doesn't just recreate the bitstream from that API?
>
> After all, this is just a hardware specific header that happens to be
> identical to the original bitstream format

I think in another thread (about H.264 I believe), we realized that it
could be a good idea to just include the slice NAL units in Annex B
format in the buffers, and that this should work for all the
hardware we could think of (given offsets to particular parts inside
of the buffer). Wouldn't something similar work here for HEVC?
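
To make the "offsets to particular parts inside of the buffer" idea
concrete, here is a minimal sketch in C of how NAL units in an Annex B
buffer could be located. It is not taken from any driver: struct nal_span,
find_start_code() and scan_nal_units() are hypothetical names. It assumes
the usual 00 00 01 start code and the two-byte HEVC NAL unit header, where
nal_unit_type sits in bits 1 to 6 of the first header byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One NAL unit located inside an Annex B byte stream (hypothetical type). */
struct nal_span {
	size_t offset;	/* offset of the first NAL header byte */
	size_t size;	/* NAL unit size in bytes, start code excluded */
	uint8_t type;	/* HEVC nal_unit_type */
};

/* Return the index of the next 00 00 01 at or after pos, or len if none. */
static size_t find_start_code(const uint8_t *buf, size_t len, size_t pos)
{
	for (; pos + 3 <= len; pos++)
		if (buf[pos] == 0 && buf[pos + 1] == 0 && buf[pos + 2] == 1)
			return pos;
	return len;
}

/* Fill out[] with up to max NAL units found in buf; return how many. */
static size_t scan_nal_units(const uint8_t *buf, size_t len,
			     struct nal_span *out, size_t max)
{
	size_t count = 0;
	size_t sc;

	assert(buf && out);

	sc = find_start_code(buf, len, 0);
	while (sc < len && count < max) {
		size_t begin = sc + 3;
		size_t next = find_start_code(buf, len, begin);
		size_t end = next;

		/*
		 * A NAL unit never ends with a zero byte, so trailing zeros
		 * belong to padding or to the next 4-byte start code.
		 */
		while (end > begin && buf[end - 1] == 0)
			end--;

		if (end > begin) {
			out[count].offset = begin;
			out[count].size = end - begin;
			out[count].type = (buf[begin] >> 1) & 0x3f;
			count++;
		}
		sc = next;
	}
	return count;
}
```

With a table like this, a driver could point the hardware at whichever
units it wants in raw form, while still receiving the parsed controls
for everything else.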

I don't really get the meaning of "raw" for "cabac table, scaling
list, picture parameter set and reference picture", since those are
parts of the bitstream, which need to be parsed to obtain them.

Best regards,
Tomasz

2019-01-29 21:42:04

by Nicolas Dufresne

Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
> On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
> <[email protected]> wrote:
> > Hi,
> >
> > On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> > > Sent from my iPad
> > >
> > > > On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > > On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> > > > > I forget a important thing, for the rkvdec and rk hevc decoder, it would
> > > > > requests cabac table, scaling list, picture parameter set and reference
> > > > > picture storing in one or various of DMA buffers. I am not talking about
> > > > > the data been parsed, the decoder would requests a raw data.
> > > > >
> > > > > For the pps and rps, it is possible to reuse the slice header, just let
> > > > > the decoder know the offset from the bitstream bufer, I would suggest to
> > > > > add three properties(with sps) for them. But I think we need a method to
> > > > > mark a OUTPUT side buffer for those aux data.
> > > >
> > > > I'm quite confused about the hardware implementation then. From what
> > > > you're saying, it seems that it takes the raw bitstream elements rather
> > > > than parsed elements. Is it really a stateless implementation?
> > > >
> > > > The stateless implementation was designed with the idea that only the
> > > > raw slice data should be passed in bitstream form to the decoder. For
> > > > H.264, it seems that some decoders also need the slice header in raw
> > > > bitstream form (because they take the full slice NAL unit), see the
> > > > discussions in this thread:
> > > > media: docs-rst: Document m2m stateless video decoder interface
> > >
> > > Stateless just mean it won’t track the previous result, but I don’t
> > > think you can define what a date the hardware would need. Even you
> > > just build a dpb for the decoder, it is still stateless, but parsing
> > > less or more data from the bitstream doesn’t stop a decoder become a
> > > stateless decoder.
> >
> > Yes fair enough, the format in which the hardware decoder takes the
> > bitstream parameters does not make it stateless or stateful per-se.
> > It's just that stateless decoders should have no particular reason for
> > parsing the bitstream on their own since the hardware can be designed
> > with registers for each relevant bitstream element to configure the
> > decoding pipeline. That's how GPU-based decoder implementations are
> > implemented (VAAPI/VDPAU/NVDEC, etc).
> >
> > So the format we have agreed on so far for the stateless interface is
> > to pass parsed elements via v4l2 control structures.
> >
> > If the hardware can only work by parsing the bitstream itself, I'm not
> > sure what the best solution would be. Reconstructing the bitstream in
> > the kernel is a pretty bad option, but so is parsing in the kernel or
> > having the data both in parsed and raw forms. Do you see another
> > possibility?
>
> Is reconstructing the bitstream so bad? The v4l2 controls provide a
> generic interface to an encoded format which the driver needs to
> convert into a sequence that the hardware can understand. Typically
> this is done by populating hardware-specific structures. Can't we
> consider that in this specific instance, the hardware-specific
> structure just happens to be identical to the original bitstream
> format?

At the maximum allowed bitrate for, let's say, HEVC (940MB/s iirc), yes, it
would be really, really bad. In the GStreamer project we have discussed for
a while (but have never done anything about) adding the ability to select,
through a bitmask, which parts of the stream need to be parsed, as
parsing itself was causing some overhead. Maybe a similar thing applies here,
though as per our new design, it's the fourcc that dictates the driver
behaviour, so we'd need yet another fourcc for drivers that want the full
bitstream (which seems odd if you have already parsed everything; I
think this needs some clarification).

>
> I agree that this is not strictly optimal for that particular
> hardware, but such is the cost of abstractions, and in this specific
> case I don't believe the cost would be particularly high?



2019-01-30 02:30:05

by Alexandre Courbot

Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

On Wed, Jan 30, 2019 at 6:41 AM Nicolas Dufresne <[email protected]> wrote:
>
> Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
> > On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
> > <[email protected]> wrote:
> > > Hi,
> > >
> > > On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> > > > Sent from my iPad
> > > >
> > > > > On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski <[email protected]> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > > On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> > > > > > I forget a important thing, for the rkvdec and rk hevc decoder, it would
> > > > > > requests cabac table, scaling list, picture parameter set and reference
> > > > > > picture storing in one or various of DMA buffers. I am not talking about
> > > > > > the data been parsed, the decoder would requests a raw data.
> > > > > >
> > > > > > For the pps and rps, it is possible to reuse the slice header, just let
> > > > > > the decoder know the offset from the bitstream bufer, I would suggest to
> > > > > > add three properties(with sps) for them. But I think we need a method to
> > > > > > mark a OUTPUT side buffer for those aux data.
> > > > >
> > > > > I'm quite confused about the hardware implementation then. From what
> > > > > you're saying, it seems that it takes the raw bitstream elements rather
> > > > > than parsed elements. Is it really a stateless implementation?
> > > > >
> > > > > The stateless implementation was designed with the idea that only the
> > > > > raw slice data should be passed in bitstream form to the decoder. For
> > > > > H.264, it seems that some decoders also need the slice header in raw
> > > > > bitstream form (because they take the full slice NAL unit), see the
> > > > > discussions in this thread:
> > > > > media: docs-rst: Document m2m stateless video decoder interface
> > > >
> > > > Stateless just mean it won’t track the previous result, but I don’t
> > > > think you can define what a date the hardware would need. Even you
> > > > just build a dpb for the decoder, it is still stateless, but parsing
> > > > less or more data from the bitstream doesn’t stop a decoder become a
> > > > stateless decoder.
> > >
> > > Yes fair enough, the format in which the hardware decoder takes the
> > > bitstream parameters does not make it stateless or stateful per-se.
> > > It's just that stateless decoders should have no particular reason for
> > > parsing the bitstream on their own since the hardware can be designed
> > > with registers for each relevant bitstream element to configure the
> > > decoding pipeline. That's how GPU-based decoder implementations are
> > > implemented (VAAPI/VDPAU/NVDEC, etc).
> > >
> > > So the format we have agreed on so far for the stateless interface is
> > > to pass parsed elements via v4l2 control structures.
> > >
> > > If the hardware can only work by parsing the bitstream itself, I'm not
> > > sure what the best solution would be. Reconstructing the bitstream in
> > > the kernel is a pretty bad option, but so is parsing in the kernel or
> > > having the data both in parsed and raw forms. Do you see another
> > > possibility?
> >
> > Is reconstructing the bitstream so bad? The v4l2 controls provide a
> > generic interface to an encoded format which the driver needs to
> > convert into a sequence that the hardware can understand. Typically
> > this is done by populating hardware-specific structures. Can't we
> > consider that in this specific instance, the hardware-specific
> > structure just happens to be identical to the original bitstream
> > format?
>
> At maximum allowed bitrate for let's say HEVC (940MB/s iirc), yes, it
> would be really really bad. In GStreamer project we have discussed for
> a while (but have never done anything about) adding the ability through
> a bitmask to select which part of the stream need to be parsed, as
> parsing itself was causing some overhead. Maybe similar thing applies,
> though as per our new design, it's the fourcc that dictate the driver
> behaviour, we'd need yet another fourcc for drivers that wants the full
> bitstream (which seems odd if you have already parsed everything, I
> think this need some clarification).

Note that I am not proposing to rebuild the *entire* bitstream
in-kernel. What I am saying is that if the hardware interprets some
structures (like SPS/PPS) in their raw format, this raw format could
be reconstructed from the structures passed by userspace at negligible
cost. Such manipulation would only happen on a small amount of data.
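
As a rough illustration of why re-serializing a few header fields is
cheap, here is a minimal bit writer sketch in C with the u(n) and ue(v)
(Exp-Golomb) codings used by H.264/HEVC headers. Everything named here
(struct bitwriter, struct pps_head, write_pps_head()) is hypothetical
and not the control structure from this patch. The field order is meant
to follow the first few PPS syntax elements and should be checked
against the spec. Emulation prevention bytes and the NAL header are
left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bitwriter {
	uint8_t *buf;
	size_t cap;	/* capacity of buf in bytes */
	size_t bitpos;	/* number of bits written so far */
};

/* Write the n low bits of val, MSB first. Returns -1 if buf is full. */
static int put_bits(struct bitwriter *bw, uint32_t val, unsigned int n)
{
	assert(n <= 32);
	if (bw->bitpos + n > bw->cap * 8)
		return -1;

	while (n--) {
		size_t byte = bw->bitpos / 8;
		unsigned int shift = 7 - (bw->bitpos % 8);

		if (shift == 7)
			bw->buf[byte] = 0;	/* first bit of a new byte */
		bw->buf[byte] |= (uint8_t)(((val >> n) & 1) << shift);
		bw->bitpos++;
	}
	return 0;
}

/* Unsigned Exp-Golomb: len zero bits, then (val + 1) in len + 1 bits. */
static int put_ue(struct bitwriter *bw, uint32_t val)
{
	uint64_t code = (uint64_t)val + 1;
	unsigned int len = 0;

	assert(val < UINT32_MAX);
	while ((code >> (len + 1)) != 0)
		len++;
	if (put_bits(bw, 0, len))
		return -1;
	return put_bits(bw, (uint32_t)code, len + 1);
}

/* Hypothetical subset of parsed PPS fields, as userspace might pass them. */
struct pps_head {
	uint32_t pps_id;
	uint32_t sps_id;
	uint8_t dependent_slice_segments_enabled_flag;
	uint8_t output_flag_present_flag;
	uint8_t num_extra_slice_header_bits;	/* 3 bits */
};

static int write_pps_head(struct bitwriter *bw, const struct pps_head *p)
{
	if (put_ue(bw, p->pps_id) || put_ue(bw, p->sps_id) ||
	    put_bits(bw, p->dependent_slice_segments_enabled_flag, 1) ||
	    put_bits(bw, p->output_flag_present_flag, 1) ||
	    put_bits(bw, p->num_extra_slice_header_bits, 3))
		return -1;
	return 0;
}
```

The work here is proportional to the number of header bits, not to the
size of the slice data, which is the point being made about cost.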

Exposing finer-grained driver requirements through a bitmask may
deserve more exploration. Maybe we could end up with a spectrum of
capabilities that would allow us to cover the range from fully
stateless to fully stateful IPs more smoothly. Right now we have two
specifications that only consider the extremes of that range.

2019-01-30 03:43:43

by Tomasz Figa

Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

On Wed, Jan 30, 2019 at 11:29 AM Alexandre Courbot
<[email protected]> wrote:
>
> On Wed, Jan 30, 2019 at 6:41 AM Nicolas Dufresne <[email protected]> wrote:
> >
> > Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
> > > On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
> > > <[email protected]> wrote:
> > > > Hi,
> > > >
> > > > On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> > > > > Sent from my iPad
> > > > >
> > > > > > On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski <[email protected]> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> > > > > > > I forget a important thing, for the rkvdec and rk hevc decoder, it would
> > > > > > > requests cabac table, scaling list, picture parameter set and reference
> > > > > > > picture storing in one or various of DMA buffers. I am not talking about
> > > > > > > the data been parsed, the decoder would requests a raw data.
> > > > > > >
> > > > > > > For the pps and rps, it is possible to reuse the slice header, just let
> > > > > > > the decoder know the offset from the bitstream bufer, I would suggest to
> > > > > > > add three properties(with sps) for them. But I think we need a method to
> > > > > > > mark a OUTPUT side buffer for those aux data.
> > > > > >
> > > > > > I'm quite confused about the hardware implementation then. From what
> > > > > > you're saying, it seems that it takes the raw bitstream elements rather
> > > > > > than parsed elements. Is it really a stateless implementation?
> > > > > >
> > > > > > The stateless implementation was designed with the idea that only the
> > > > > > raw slice data should be passed in bitstream form to the decoder. For
> > > > > > H.264, it seems that some decoders also need the slice header in raw
> > > > > > bitstream form (because they take the full slice NAL unit), see the
> > > > > > discussions in this thread:
> > > > > > media: docs-rst: Document m2m stateless video decoder interface
> > > > >
> > > > > Stateless just mean it won’t track the previous result, but I don’t
> > > > > think you can define what a date the hardware would need. Even you
> > > > > just build a dpb for the decoder, it is still stateless, but parsing
> > > > > less or more data from the bitstream doesn’t stop a decoder become a
> > > > > stateless decoder.
> > > >
> > > > Yes fair enough, the format in which the hardware decoder takes the
> > > > bitstream parameters does not make it stateless or stateful per-se.
> > > > It's just that stateless decoders should have no particular reason for
> > > > parsing the bitstream on their own since the hardware can be designed
> > > > with registers for each relevant bitstream element to configure the
> > > > decoding pipeline. That's how GPU-based decoder implementations are
> > > > implemented (VAAPI/VDPAU/NVDEC, etc).
> > > >
> > > > So the format we have agreed on so far for the stateless interface is
> > > > to pass parsed elements via v4l2 control structures.
> > > >
> > > > If the hardware can only work by parsing the bitstream itself, I'm not
> > > > sure what the best solution would be. Reconstructing the bitstream in
> > > > the kernel is a pretty bad option, but so is parsing in the kernel or
> > > > having the data both in parsed and raw forms. Do you see another
> > > > possibility?
> > >
> > > Is reconstructing the bitstream so bad? The v4l2 controls provide a
> > > generic interface to an encoded format which the driver needs to
> > > convert into a sequence that the hardware can understand. Typically
> > > this is done by populating hardware-specific structures. Can't we
> > > consider that in this specific instance, the hardware-specific
> > > structure just happens to be identical to the original bitstream
> > > format?
> >
> > At maximum allowed bitrate for let's say HEVC (940MB/s iirc), yes, it
> > would be really really bad. In GStreamer project we have discussed for
> > a while (but have never done anything about) adding the ability through
> > a bitmask to select which part of the stream need to be parsed, as
> > parsing itself was causing some overhead. Maybe similar thing applies,
> > though as per our new design, it's the fourcc that dictate the driver
> > behaviour, we'd need yet another fourcc for drivers that wants the full
> > bitstream (which seems odd if you have already parsed everything, I
> > think this need some clarification).
>
> Note that I am not proposing to rebuild the *entire* bitstream
> in-kernel. What I am saying is that if the hardware interprets some
> structures (like SPS/PPS) in their raw format, this raw format could
> be reconstructed from the structures passed by userspace at negligible
> cost. Such manipulation would only happen on a small amount of data.
>
> Exposing finer-grained driver requirements through a bitmask may
> deserve more exploring. Maybe we could end with a spectrum of
> capabilities that would allow us to cover the range from fully
> stateless to fully stateful IPs more smoothly. Right now we have two
> specifications that only consider the extremes of that range.

I gave it a bit more thought and if we combine what Nicolas suggested
about the bitmask control with the userspace providing the full
bitstream in the OUTPUT buffers, split into some logical units and
"tagged" with their type (e.g. SPS, PPS, slice, etc.), we could
potentially get an interface that would work for any kind of decoder I
can think of, actually eliminating the boundary between stateful and
stateless decoders.

For example, a fully stateful decoder would have the bitmask control
set to 0 and accept data from all the OUTPUT buffers as they come. A
decoder that doesn't do any parsing on its own would have all the
valid bits in the bitmask set and ignore the data in OUTPUT buffers
tagged as any kind of metadata. And then, we could have any cases in
between, including stateful decoders which just can't parse the stream
on their own, but still manage anything else themselves, or stateless
ones which can parse parts of the stream, like the rk3399 vdec, which
can parse the H.264 slice headers on its own.

That could potentially let us completely eliminate the distinction
between the stateful and stateless interfaces and just have one that
covers both.
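
A rough sketch in C of how such a bitmask could be interpreted follows.
No such control exists in V4L2 today; the enum unit_tag values, the
PARSED_BY_USERSPACE() macro and both helpers are hypothetical names
made up for this illustration. A set bit means userspace must parse
that unit type and pass it as a control; a cleared bit means the
decoder reads the tagged OUTPUT buffer itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tags for the logical units queued in OUTPUT buffers. */
enum unit_tag {
	UNIT_SPS = 0,
	UNIT_PPS,
	UNIT_SLICE_HEADER,
	UNIT_SLICE_DATA,
	UNIT_TAG_COUNT,
};

#define PARSED_BY_USERSPACE(tag)	(1u << (tag))
#define PARSED_MASK_ALL	(PARSED_BY_USERSPACE(UNIT_SPS) | \
			 PARSED_BY_USERSPACE(UNIT_PPS) | \
			 PARSED_BY_USERSPACE(UNIT_SLICE_HEADER))

/* Does userspace have to provide this unit in parsed (control) form? */
static bool userspace_must_parse(uint32_t mask, enum unit_tag tag)
{
	assert(tag < UNIT_TAG_COUNT);
	/* Slice data is always consumed in raw form by the decoder. */
	if (tag == UNIT_SLICE_DATA)
		return false;
	return (mask & PARSED_BY_USERSPACE(tag)) != 0;
}

/* Does the decoder read the raw buffer tagged with this unit type? */
static bool decoder_reads_raw(uint32_t mask, enum unit_tag tag)
{
	return !userspace_must_parse(mask, tag);
}
```

Under these assumptions, a fully stateful decoder would report a mask of
0, a decoder with no parsing ability would report PARSED_MASK_ALL, and
a decoder that parses slice headers itself would clear only the
UNIT_SLICE_HEADER bit.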

Thoughts?

Best regards,
Tomasz

2019-01-30 06:29:18

by Randy Li

Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls



Sent from my iPad

> On Jan 30, 2019, at 11:35 AM, Tomasz Figa <[email protected]> wrote:
>
> On Wed, Jan 30, 2019 at 11:29 AM Alexandre Courbot
> <[email protected]> wrote:
>>
>>> On Wed, Jan 30, 2019 at 6:41 AM Nicolas Dufresne <[email protected]> wrote:
>>>
>>>> Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
>>>> On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
>>>> <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>>> On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
>>>>>> Sent from my iPad
>>>>>>
>>>>>>> On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>> On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
>>>>>>>> I forget a important thing, for the rkvdec and rk hevc decoder, it would
>>>>>>>> requests cabac table, scaling list, picture parameter set and reference
>>>>>>>> picture storing in one or various of DMA buffers. I am not talking about
>>>>>>>> the data been parsed, the decoder would requests a raw data.
>>>>>>>>
>>>>>>>> For the pps and rps, it is possible to reuse the slice header, just let
>>>>>>>> the decoder know the offset from the bitstream bufer, I would suggest to
>>>>>>>> add three properties(with sps) for them. But I think we need a method to
>>>>>>>> mark a OUTPUT side buffer for those aux data.
>>>>>>>
>>>>>>> I'm quite confused about the hardware implementation then. From what
>>>>>>> you're saying, it seems that it takes the raw bitstream elements rather
>>>>>>> than parsed elements. Is it really a stateless implementation?
>>>>>>>
>>>>>>> The stateless implementation was designed with the idea that only the
>>>>>>> raw slice data should be passed in bitstream form to the decoder. For
>>>>>>> H.264, it seems that some decoders also need the slice header in raw
>>>>>>> bitstream form (because they take the full slice NAL unit), see the
>>>>>>> discussions in this thread:
>>>>>>> media: docs-rst: Document m2m stateless video decoder interface
>>>>>>
>>>>>> Stateless just mean it won’t track the previous result, but I don’t
>>>>>> think you can define what a date the hardware would need. Even you
>>>>>> just build a dpb for the decoder, it is still stateless, but parsing
>>>>>> less or more data from the bitstream doesn’t stop a decoder become a
>>>>>> stateless decoder.
>>>>>
>>>>> Yes fair enough, the format in which the hardware decoder takes the
>>>>> bitstream parameters does not make it stateless or stateful per-se.
>>>>> It's just that stateless decoders should have no particular reason for
>>>>> parsing the bitstream on their own since the hardware can be designed
>>>>> with registers for each relevant bitstream element to configure the
>>>>> decoding pipeline. That's how GPU-based decoder implementations are
>>>>> implemented (VAAPI/VDPAU/NVDEC, etc).
>>>>>
>>>>> So the format we have agreed on so far for the stateless interface is
>>>>> to pass parsed elements via v4l2 control structures.
>>>>>
>>>>> If the hardware can only work by parsing the bitstream itself, I'm not
>>>>> sure what the best solution would be. Reconstructing the bitstream in
>>>>> the kernel is a pretty bad option, but so is parsing in the kernel or
>>>>> having the data both in parsed and raw forms. Do you see another
>>>>> possibility?
>>>>
>>>> Is reconstructing the bitstream so bad? The v4l2 controls provide a
>>>> generic interface to an encoded format which the driver needs to
>>>> convert into a sequence that the hardware can understand. Typically
>>>> this is done by populating hardware-specific structures. Can't we
>>>> consider that in this specific instance, the hardware-specific
>>>> structure just happens to be identical to the original bitstream
>>>> format?
>>>
>>> At maximum allowed bitrate for let's say HEVC (940MB/s iirc), yes, it
>>> would be really really bad. In GStreamer project we have discussed for
>>> a while (but have never done anything about) adding the ability through
>>> a bitmask to select which part of the stream need to be parsed, as
>>> parsing itself was causing some overhead. Maybe similar thing applies,
>>> though as per our new design, it's the fourcc that dictate the driver
>>> behaviour, we'd need yet another fourcc for drivers that wants the full
>>> bitstream (which seems odd if you have already parsed everything, I
>>> think this need some clarification).
>>
>> Note that I am not proposing to rebuild the *entire* bitstream
>> in-kernel. What I am saying is that if the hardware interprets some
>> structures (like SPS/PPS) in their raw format, this raw format could
>> be reconstructed from the structures passed by userspace at negligible
>> cost. Such manipulation would only happen on a small amount of data.
>>
>> Exposing finer-grained driver requirements through a bitmask may
>> deserve more exploring. Maybe we could end with a spectrum of
>> capabilities that would allow us to cover the range from fully
>> stateless to fully stateful IPs more smoothly. Right now we have two
>> specifications that only consider the extremes of that range.
>
> I gave it a bit more thought and if we combine what Nicolas suggested
> about the bitmask control with the userspace providing the full
> bitstream in the OUTPUT buffers, split into some logical units and
> "tagged" with their type (e.g. SPS, PPS, slice, etc.), we could
> potentially get an interface that would work for any kind of decoder I
> can think of, actually eliminating the boundary between stateful and
> stateless decoders.
I agree with this idea; it is what I would call a memory region description. I am still struggling with the userspace side before I can post my driver demo.
>
> For example, a fully stateful decoder would have the bitmask control
> set to 0 and accept data from all the OUTPUT buffers as they come. A
> decoder that doesn't do any parsing on its own would have all the
> valid bits in the bitmask set and ignore the data in OUTPUT buffers
> tagged as any kind of metadata. And then, we could have any cases in
> between, including stateful decoders which just can't parse the stream
> on their own, but still manage anything else themselves, or stateless
> ones which can parse parts of the stream, like the rk3399 vdec can
> parse the H.264 slice headers on its own.
>
Actually no: the rkvdec and rkhevc can parse most, but not all, syntax sections.
Besides, the VP9 decoder of rkvdec won’t parse most of the syntax.

I talked to some Rockchip staff about the performance problem of reconstructing the bitstream after arguing with tfiga on IRC yesterday. 1 ms looks small next to a decoder that can decode a picture of a UHD 4K HEVC video in 9 ms, and that is enough for 60 fps. But what about higher frame rates like 120 fps or 240 fps, or 8K, which is used in Japanese broadcasting?

I will bring more details to FOSDEM 2019; I may be in the graphics devroom on Saturday.
> That could potentially let us completely eliminate the distinction
> between the stateful and stateless interfaces and just have one that
> covers both.
>
> Thoughts?
>
> Best regards,
> Tomasz


2019-01-30 07:04:41

by Randy Li

Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls



Sent from my iPad

> On Jan 30, 2019, at 5:41 AM, Nicolas Dufresne <[email protected]> wrote:
>
>> Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
>> On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
>> <[email protected]> wrote:
>>> Hi,
>>>
>>>> On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
>>>> Sent from my iPad
>>>>
>>>>> On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>> On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
>>>>>> I forget a important thing, for the rkvdec and rk hevc decoder, it would
>>>>>> requests cabac table, scaling list, picture parameter set and reference
>>>>>> picture storing in one or various of DMA buffers. I am not talking about
>>>>>> the data been parsed, the decoder would requests a raw data.
>>>>>>
>>>>>> For the pps and rps, it is possible to reuse the slice header, just let
>>>>>> the decoder know the offset from the bitstream bufer, I would suggest to
>>>>>> add three properties(with sps) for them. But I think we need a method to
>>>>>> mark a OUTPUT side buffer for those aux data.
>>>>>
>>>>> I'm quite confused about the hardware implementation then. From what
>>>>> you're saying, it seems that it takes the raw bitstream elements rather
>>>>> than parsed elements. Is it really a stateless implementation?
>>>>>
>>>>> The stateless implementation was designed with the idea that only the
>>>>> raw slice data should be passed in bitstream form to the decoder. For
>>>>> H.264, it seems that some decoders also need the slice header in raw
>>>>> bitstream form (because they take the full slice NAL unit), see the
>>>>> discussions in this thread:
>>>>> media: docs-rst: Document m2m stateless video decoder interface
>>>>
>>>> Stateless just mean it won’t track the previous result, but I don’t
>>>> think you can define what a date the hardware would need. Even you
>>>> just build a dpb for the decoder, it is still stateless, but parsing
>>>> less or more data from the bitstream doesn’t stop a decoder become a
>>>> stateless decoder.
>>>
>>> Yes fair enough, the format in which the hardware decoder takes the
>>> bitstream parameters does not make it stateless or stateful per-se.
>>> It's just that stateless decoders should have no particular reason for
>>> parsing the bitstream on their own since the hardware can be designed
>>> with registers for each relevant bitstream element to configure the
>>> decoding pipeline. That's how GPU-based decoder implementations are
>>> implemented (VAAPI/VDPAU/NVDEC, etc).
>>>
>>> So the format we have agreed on so far for the stateless interface is
>>> to pass parsed elements via v4l2 control structures.
>>>
>>> If the hardware can only work by parsing the bitstream itself, I'm not
>>> sure what the best solution would be. Reconstructing the bitstream in
>>> the kernel is a pretty bad option, but so is parsing in the kernel or
>>> having the data both in parsed and raw forms. Do you see another
>>> possibility?
>>
>> Is reconstructing the bitstream so bad? The v4l2 controls provide a
>> generic interface to an encoded format which the driver needs to
>> convert into a sequence that the hardware can understand. Typically
>> this is done by populating hardware-specific structures. Can't we
>> consider that in this specific instance, the hardware-specific
>> structure just happens to be identical to the original bitstream
>> format?
>
> At maximum allowed bitrate for let's say HEVC (940MB/s iirc), yes, it
Luckily, most hardware won’t be able to process such a big buffer.
Generally speaking, the register for the stream length in bytes is 24 bits wide.
> would be really really bad. In GStreamer project we have discussed for
> a while (but have never done anything about) adding the ability through
> a bitmask to select which part of the stream need to be parsed, as
> parsing itself was causing some overhead. Maybe similar thing applies,
> though as per our new design, it's the fourcc that dictate the driver
> behaviour, we'd need yet another fourcc for drivers that wants the full
> bitstream (which seems odd if you have already parsed everything, I
> think this need some clarification).
>
>>
>> I agree that this is not strictly optimal for that particular
>> hardware, but such is the cost of abstractions, and in this specific
>> case I don't believe the cost would be particularly high?


2019-01-30 07:24:24

by Tomasz Figa

Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

On Wed, Jan 30, 2019 at 3:28 PM Ayaka <[email protected]> wrote:
>
>
>
> Sent from my iPad
>
> > On Jan 30, 2019, at 11:35 AM, Tomasz Figa <[email protected]> wrote:
> >
> > On Wed, Jan 30, 2019 at 11:29 AM Alexandre Courbot
> > <[email protected]> wrote:
> >>
> >>> On Wed, Jan 30, 2019 at 6:41 AM Nicolas Dufresne <[email protected]> wrote:
> >>>
> >>>> Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
> >>>> On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
> >>>> <[email protected]> wrote:
> >>>>> Hi,
> >>>>>
> >>>>>> On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> >>>>>> Sent from my iPad
> >>>>>>
> >>>>>>> On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>>> On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> >>>>>>>> I forget a important thing, for the rkvdec and rk hevc decoder, it would
> >>>>>>>> requests cabac table, scaling list, picture parameter set and reference
> >>>>>>>> picture storing in one or various of DMA buffers. I am not talking about
> >>>>>>>> the data been parsed, the decoder would requests a raw data.
> >>>>>>>>
> >>>>>>>> For the pps and rps, it is possible to reuse the slice header, just let
> >>>>>>>> the decoder know the offset from the bitstream bufer, I would suggest to
> >>>>>>>> add three properties(with sps) for them. But I think we need a method to
> >>>>>>>> mark a OUTPUT side buffer for those aux data.
> >>>>>>>
> >>>>>>> I'm quite confused about the hardware implementation then. From what
> >>>>>>> you're saying, it seems that it takes the raw bitstream elements rather
> >>>>>>> than parsed elements. Is it really a stateless implementation?
> >>>>>>>
> >>>>>>> The stateless implementation was designed with the idea that only the
> >>>>>>> raw slice data should be passed in bitstream form to the decoder. For
> >>>>>>> H.264, it seems that some decoders also need the slice header in raw
> >>>>>>> bitstream form (because they take the full slice NAL unit), see the
> >>>>>>> discussions in this thread:
> >>>>>>> media: docs-rst: Document m2m stateless video decoder interface
> >>>>>>
> >>>>>> Stateless just mean it won’t track the previous result, but I don’t
> >>>>>> think you can define what a date the hardware would need. Even you
> >>>>>> just build a dpb for the decoder, it is still stateless, but parsing
> >>>>>> less or more data from the bitstream doesn’t stop a decoder become a
> >>>>>> stateless decoder.
> >>>>>
> >>>>> Yes fair enough, the format in which the hardware decoder takes the
> >>>>> bitstream parameters does not make it stateless or stateful per-se.
> >>>>> It's just that stateless decoders should have no particular reason for
> >>>>> parsing the bitstream on their own since the hardware can be designed
> >>>>> with registers for each relevant bitstream element to configure the
> >>>>> decoding pipeline. That's how GPU-based decoder implementations are
> >>>>> implemented (VAAPI/VDPAU/NVDEC, etc).
> >>>>>
> >>>>> So the format we have agreed on so far for the stateless interface is
> >>>>> to pass parsed elements via v4l2 control structures.
> >>>>>
> >>>>> If the hardware can only work by parsing the bitstream itself, I'm not
> >>>>> sure what the best solution would be. Reconstructing the bitstream in
> >>>>> the kernel is a pretty bad option, but so is parsing in the kernel or
> >>>>> having the data both in parsed and raw forms. Do you see another
> >>>>> possibility?
> >>>>
> >>>> Is reconstructing the bitstream so bad? The v4l2 controls provide a
> >>>> generic interface to an encoded format which the driver needs to
> >>>> convert into a sequence that the hardware can understand. Typically
> >>>> this is done by populating hardware-specific structures. Can't we
> >>>> consider that in this specific instance, the hardware-specific
> >>>> structure just happens to be identical to the original bitstream
> >>>> format?
> >>>
> >>> At maximum allowed bitrate for let's say HEVC (940MB/s iirc), yes, it
> >>> would be really really bad. In GStreamer project we have discussed for
> >>> a while (but have never done anything about) adding the ability through
> >>> a bitmask to select which part of the stream need to be parsed, as
> >>> parsing itself was causing some overhead. Maybe similar thing applies,
> >>> though as per our new design, it's the fourcc that dictate the driver
> >>> behaviour, we'd need yet another fourcc for drivers that wants the full
> >>> bitstream (which seems odd if you have already parsed everything, I
> >>> think this need some clarification).
> >>
> >> Note that I am not proposing to rebuild the *entire* bitstream
> >> in-kernel. What I am saying is that if the hardware interprets some
> >> structures (like SPS/PPS) in their raw format, this raw format could
> >> be reconstructed from the structures passed by userspace at negligible
> >> cost. Such manipulation would only happen on a small amount of data.
> >>
> >> Exposing finer-grained driver requirements through a bitmask may
> >> deserve more exploring. Maybe we could end with a spectrum of
> >> capabilities that would allow us to cover the range from fully
> >> stateless to fully stateful IPs more smoothly. Right now we have two
> >> specifications that only consider the extremes of that range.
> >
> > I gave it a bit more thought and if we combine what Nicolas suggested
> > about the bitmask control with the userspace providing the full
> > bitstream in the OUTPUT buffers, split into some logical units and
> > "tagged" with their type (e.g. SPS, PPS, slice, etc.), we could
> > potentially get an interface that would work for any kind of decoder I
> > can think of, actually eliminating the boundary between stateful and
> > stateless decoders.
> I agree with this idea; it is what I would call a memory region description. I am still struggling with the userspace side before I can post my driver demo.
> >
> > For example, a fully stateful decoder would have the bitmask control
> > set to 0 and accept data from all the OUTPUT buffers as they come. A
> > decoder that doesn't do any parsing on its own would have all the
> > valid bits in the bitmask set and ignore the data in OUTPUT buffers
> > tagged as any kind of metadata. And then, we could have any cases in
> > between, including stateful decoders which just can't parse the stream
> > on their own, but still manage anything else themselves, or stateless
> > ones which can parse parts of the stream, like the rk3399 vdec can
> > parse the H.264 slice headers on its own.
> >
> Actually no: the rkvdec and rkhevc can parse most, but not all, syntax sections.
> Besides, the VP9 decoder of rkvdec won’t parse most of the syntax.
>
> I talked to some Rockchip staff about the performance problem of reconstructing the bitstream after arguing with tfiga on IRC yesterday. 1 ms looks small next to a decoder that can decode a picture of a UHD 4K HEVC video in 9 ms, and that is enough for 60 fps. But what about higher frame rates like 120 fps or 240 fps, or 8K, which is used in Japanese broadcasting?

1 ms for a 500 MHz CPU (which is quite slow these days) is 500k
cycles. We don't have to reconstruct the whole bitstream, just the
parsed metadata and also we don't get a new PPS or SPS every frame.
Not sure where you have this 1 ms from. Most of the difference between
our structures and the bitstream is that the latter is packed and
could be variable length.

We actually have some sample bitstream assembly code for the rockchip encoder:

https://chromium.googlesource.com/chromiumos/third_party/libv4lplugins/+/5e6034258146af6be973fb6a5bb6b9d6e7489437/libv4l-rockchip_v2/libvepu/h264e/h264e.c#148
https://chromium.googlesource.com/chromiumos/third_party/libv4lplugins/+/5e6034258146af6be973fb6a5bb6b9d6e7489437/libv4l-rockchip_v2/libvepu/streams.c#36

Disassembling the stream_put_bits() gives 36 thumb2 instructions,
including 23 for the loop for each byte that is written.
stream_write_ue() is a bit more complicated, but in the worst case it
ends up with 4 calls to stream_put_bits(), each at most spanning 4
bytes for simplicity.
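
For readers who do not want to open the links, here is a minimal sketch of
what such bit-packing helpers look like. It is not the libvepu code: the
names bit_writer, bw_put_bits() and bw_put_ue() are made up for
illustration, the buffer is assumed to be zero-initialised, and the H.264
emulation-prevention bytes are left out on purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit writer; NOT the libvepu implementation. */
struct bit_writer {
    uint8_t *buf;    /* output buffer, assumed to start zeroed */
    size_t size;     /* capacity in bytes */
    size_t bit_pos;  /* number of bits written so far */
};

/* Append the low nbits of val, most significant bit first. */
static int bw_put_bits(struct bit_writer *bw, uint32_t val, unsigned int nbits)
{
    assert(nbits <= 32);
    if (bw->bit_pos + nbits > bw->size * 8)
        return -1; /* would overflow the buffer */

    while (nbits--) {
        if ((val >> nbits) & 1)
            bw->buf[bw->bit_pos / 8] |= 0x80 >> (bw->bit_pos % 8);
        bw->bit_pos++;
    }
    return 0;
}

/* Write v as an unsigned Exp-Golomb code, ue(v). */
static int bw_put_ue(struct bit_writer *bw, uint32_t v)
{
    uint32_t code, tmp;
    unsigned int len = 0;

    assert(v != UINT32_MAX); /* v + 1 must fit in 32 bits */
    code = v + 1;
    for (tmp = code; tmp; tmp >>= 1)
        len++; /* len = number of significant bits in v + 1 */

    /* Check the whole code fits, so a failure writes nothing. */
    if (bw->bit_pos + 2 * len - 1 > bw->size * 8)
        return -1;

    bw_put_bits(bw, 0, len - 1);       /* len - 1 leading zero bits */
    return bw_put_bits(bw, code, len); /* then v + 1 in len bits */
}
```

Writing the values 0, 1, 2 and 3 with bw_put_ue() yields the codes 1, 010,
011 and 00100, i.e. 12 bits in total. The per-field work is a handful of
shifts and ORs, which is the reason the estimate below stays small.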

Let's count the operations for SPS then:
(1) stream_put_bits() spanning 1 byte: 33 times
(2) stream_put_bits() spanning up to 3 bytes: 4 times
(3) stream_write_ue() up to 31 bits: 19 times

Adding it together:
(1) 33 * 36 +
(2) 4 * (36 + 2 * 23) +
(3) 19 * (4 * (36 + 3 * 23)) =

1188 + 328 + 7980 = 9496 ~= 10k instructions

The code above doesn't seem to contain any expensive instructions,
like division, so for a modern pipelined core (e.g. A53),
it could be safe to assume 1 instruction per cycle. At 500 MHz that
gives you 20 usecs.

SPS is the most complex header and for H.264 we just do PPS and some
slice headers. Let's round it up a bit and we could have around 100
usecs for the complete frame metadata.
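
The arithmetic above can be rechecked with a few lines of C. This is only
a sketch of the estimate, not a measurement: it reads the figures as 36
instructions for a call that touches one byte plus 23 for every additional
byte, and it keeps the one-instruction-per-cycle assumption. The function
names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PUT_BITS_BASE 36 /* instructions for a call touching one byte */
#define PUT_BITS_LOOP 23 /* extra instructions per additional byte */

/* Estimated instruction count of one stream_put_bits()-like call. */
static uint32_t put_bits_cost(unsigned int bytes_spanned)
{
    assert(bytes_spanned >= 1);
    return PUT_BITS_BASE + (bytes_spanned - 1) * PUT_BITS_LOOP;
}

/* Sum of the three SPS terms from the estimate. */
static uint32_t sps_cost(void)
{
    return 33 * put_bits_cost(1) +    /* (1) fields within one byte */
           4 * put_bits_cost(3) +     /* (2) fields spanning up to 3 bytes */
           19 * 4 * put_bits_cost(4); /* (3) ue(v) fields, 4 calls each */
}

/* Microseconds at the given clock, assuming 1 instruction per cycle. */
static double cost_usecs(uint32_t instructions, double clock_mhz)
{
    return instructions / clock_mhz;
}
```

sps_cost() comes out at 9496 instructions, and cost_usecs(9496, 500.0) at
just under 19 usecs, which matches the 20 usecs quoted above once the
instruction count is rounded up to 10k.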

>
> I will bring more details to FOSDEM 2019; I may be in the graphics devroom on Saturday.
> > That could potentially let us completely eliminate the distinction
> > between the stateful and stateless interfaces and just have one that
> > covers both.
> >
> > Thoughts?

Any thoughts on my proposal to make the interface more flexible? Any
specific examples of issues that we could encounter that would prevent
it from working efficiently with Rockchip (or other) hardware?
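
To make the proposal a bit more concrete, here is a rough sketch of how a
driver could interpret such a bitmask. Nothing like this exists in V4L2
today: the unit types, the NEED_PARSED() flags and the helper are all
hypothetical, and slice data is assumed to always be consumed in raw form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tags for the logical units queued in OUTPUT buffers. */
enum unit_type {
    UNIT_SPS,
    UNIT_PPS,
    UNIT_SLICE_HEADER,
    UNIT_SLICE_DATA,
};

/* Bit set: userspace must pass this unit in parsed form (controls). */
#define NEED_PARSED(t) (1u << (t))
#define NEED_PARSED_ALL (NEED_PARSED(UNIT_SPS) | \
                         NEED_PARSED(UNIT_PPS) | \
                         NEED_PARSED(UNIT_SLICE_HEADER))

/*
 * Does the driver feed the raw bytes of this unit to the hardware?
 * A mask of 0 models a fully stateful decoder, NEED_PARSED_ALL a
 * decoder that parses nothing on its own.
 */
static bool driver_reads_raw_unit(uint32_t parsed_mask, enum unit_type type)
{
    if (type == UNIT_SLICE_DATA)
        return true; /* compressed slice data is always raw */
    return !(parsed_mask & NEED_PARSED(type));
}
```

A decoder that parses slice headers itself but wants SPS and PPS from
controls would advertise NEED_PARSED(UNIT_SPS) | NEED_PARSED(UNIT_PPS);
the driver would then skip raw buffers tagged as SPS or PPS and pass
everything else through.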

Best regards,
Tomasz

2019-01-30 07:58:39

by Maxime Ripard

Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

On Wed, Jan 30, 2019 at 12:35:41PM +0900, Tomasz Figa wrote:
> On Wed, Jan 30, 2019 at 11:29 AM Alexandre Courbot
> <[email protected]> wrote:
> >
> > On Wed, Jan 30, 2019 at 6:41 AM Nicolas Dufresne <[email protected]> wrote:
> > >
> > > Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
> > > > On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
> > > > <[email protected]> wrote:
> > > > > Hi,
> > > > >
> > > > > On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> > > > > > Sent from my iPad
> > > > > >
> > > > > > > On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > > On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> > > > > > > > I forget a important thing, for the rkvdec and rk hevc decoder, it would
> > > > > > > > requests cabac table, scaling list, picture parameter set and reference
> > > > > > > > picture storing in one or various of DMA buffers. I am not talking about
> > > > > > > > the data been parsed, the decoder would requests a raw data.
> > > > > > > >
> > > > > > > > For the pps and rps, it is possible to reuse the slice header, just let
> > > > > > > > the decoder know the offset from the bitstream bufer, I would suggest to
> > > > > > > > add three properties(with sps) for them. But I think we need a method to
> > > > > > > > mark a OUTPUT side buffer for those aux data.
> > > > > > >
> > > > > > > I'm quite confused about the hardware implementation then. From what
> > > > > > > you're saying, it seems that it takes the raw bitstream elements rather
> > > > > > > than parsed elements. Is it really a stateless implementation?
> > > > > > >
> > > > > > > The stateless implementation was designed with the idea that only the
> > > > > > > raw slice data should be passed in bitstream form to the decoder. For
> > > > > > > H.264, it seems that some decoders also need the slice header in raw
> > > > > > > bitstream form (because they take the full slice NAL unit), see the
> > > > > > > discussions in this thread:
> > > > > > > media: docs-rst: Document m2m stateless video decoder interface
> > > > > >
> > > > > > Stateless just mean it won’t track the previous result, but I don’t
> > > > > > think you can define what a date the hardware would need. Even you
> > > > > > just build a dpb for the decoder, it is still stateless, but parsing
> > > > > > less or more data from the bitstream doesn’t stop a decoder become a
> > > > > > stateless decoder.
> > > > >
> > > > > Yes fair enough, the format in which the hardware decoder takes the
> > > > > bitstream parameters does not make it stateless or stateful per-se.
> > > > > It's just that stateless decoders should have no particular reason for
> > > > > parsing the bitstream on their own since the hardware can be designed
> > > > > with registers for each relevant bitstream element to configure the
> > > > > decoding pipeline. That's how GPU-based decoder implementations are
> > > > > implemented (VAAPI/VDPAU/NVDEC, etc).
> > > > >
> > > > > So the format we have agreed on so far for the stateless interface is
> > > > > to pass parsed elements via v4l2 control structures.
> > > > >
> > > > > If the hardware can only work by parsing the bitstream itself, I'm not
> > > > > sure what the best solution would be. Reconstructing the bitstream in
> > > > > the kernel is a pretty bad option, but so is parsing in the kernel or
> > > > > having the data both in parsed and raw forms. Do you see another
> > > > > possibility?
> > > >
> > > > Is reconstructing the bitstream so bad? The v4l2 controls provide a
> > > > generic interface to an encoded format which the driver needs to
> > > > convert into a sequence that the hardware can understand. Typically
> > > > this is done by populating hardware-specific structures. Can't we
> > > > consider that in this specific instance, the hardware-specific
> > > > structure just happens to be identical to the original bitstream
> > > > format?
> > >
> > > At maximum allowed bitrate for let's say HEVC (940MB/s iirc), yes, it
> > > would be really really bad. In GStreamer project we have discussed for
> > > a while (but have never done anything about) adding the ability through
> > > a bitmask to select which part of the stream need to be parsed, as
> > > parsing itself was causing some overhead. Maybe similar thing applies,
> > > though as per our new design, it's the fourcc that dictate the driver
> > > behaviour, we'd need yet another fourcc for drivers that wants the full
> > > bitstream (which seems odd if you have already parsed everything, I
> > > think this need some clarification).
> >
> > Note that I am not proposing to rebuild the *entire* bitstream
> > in-kernel. What I am saying is that if the hardware interprets some
> > structures (like SPS/PPS) in their raw format, this raw format could
> > be reconstructed from the structures passed by userspace at negligible
> > cost. Such manipulation would only happen on a small amount of data.
> >
> > Exposing finer-grained driver requirements through a bitmask may
> > deserve more exploring. Maybe we could end with a spectrum of
> > capabilities that would allow us to cover the range from fully
> > stateless to fully stateful IPs more smoothly. Right now we have two
> > specifications that only consider the extremes of that range.
>
> I gave it a bit more thought and if we combine what Nicolas suggested
> about the bitmask control with the userspace providing the full
> bitstream in the OUTPUT buffers, split into some logical units and
> "tagged" with their type (e.g. SPS, PPS, slice, etc.), we could
> potentially get an interface that would work for any kind of decoder I
> can think of, actually eliminating the boundary between stateful and
> stateless decoders.
>
> For example, a fully stateful decoder would have the bitmask control
> set to 0 and accept data from all the OUTPUT buffers as they come. A
> decoder that doesn't do any parsing on its own would have all the
> valid bits in the bitmask set and ignore the data in OUTPUT buffers
> tagged as any kind of metadata. And then, we could have any cases in
> between, including stateful decoders which just can't parse the stream
> on their own, but still manage anything else themselves, or stateless
> ones which can parse parts of the stream, like the rk3399 vdec can
> parse the H.264 slice headers on its own.
>
> That could potentially let us completely eliminate the distinction
> between the stateful and stateless interfaces and just have one that
> covers both.
>
> Thoughts?

If we have to provide the whole bitstream in the buffers, then it
entirely breaks the sole software stack we have running and working
currently, for a use case and a driver that hasn't seen a single line
of code.

Seriously, this is a *private* API that we did that way so that we can
change it and only then make it public. Why not do just that?

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



2019-01-30 09:55:08

by Randy Li

Subject: Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls



Sent from my iPad

> On Jan 30, 2019, at 3:17 PM, Tomasz Figa <[email protected]> wrote:
>
>> On Wed, Jan 30, 2019 at 3:28 PM Ayaka <[email protected]> wrote:
>>
>>
>>
>> Sent from my iPad
>>
>>> On Jan 30, 2019, at 11:35 AM, Tomasz Figa <[email protected]> wrote:
>>>
>>> On Wed, Jan 30, 2019 at 11:29 AM Alexandre Courbot
>>> <[email protected]> wrote:
>>>>
>>>>>> On Wed, Jan 30, 2019 at 6:41 AM Nicolas Dufresne <[email protected]> wrote:
>>>>>>
>>>>>> Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
>>>>>> On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
>>>>>> <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>>> On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
>>>>>>>> Sent from my iPad
>>>>>>>>
>>>>>>>>> On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>> On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
>>>>>>>>>> I forget a important thing, for the rkvdec and rk hevc decoder, it would
>>>>>>>>>> requests cabac table, scaling list, picture parameter set and reference
>>>>>>>>>> picture storing in one or various of DMA buffers. I am not talking about
>>>>>>>>>> the data been parsed, the decoder would requests a raw data.
>>>>>>>>>>
>>>>>>>>>> For the pps and rps, it is possible to reuse the slice header, just let
>>>>>>>>>> the decoder know the offset from the bitstream bufer, I would suggest to
>>>>>>>>>> add three properties(with sps) for them. But I think we need a method to
>>>>>>>>>> mark a OUTPUT side buffer for those aux data.
>>>>>>>>>
>>>>>>>>> I'm quite confused about the hardware implementation then. From what
>>>>>>>>> you're saying, it seems that it takes the raw bitstream elements rather
>>>>>>>>> than parsed elements. Is it really a stateless implementation?
>>>>>>>>>
>>>>>>>>> The stateless implementation was designed with the idea that only the
>>>>>>>>> raw slice data should be passed in bitstream form to the decoder. For
>>>>>>>>> H.264, it seems that some decoders also need the slice header in raw
>>>>>>>>> bitstream form (because they take the full slice NAL unit), see the
>>>>>>>>> discussions in this thread:
>>>>>>>>> media: docs-rst: Document m2m stateless video decoder interface
>>>>>>>>
>>>>>>>> Stateless just mean it won’t track the previous result, but I don’t
>>>>>>>> think you can define what a date the hardware would need. Even you
>>>>>>>> just build a dpb for the decoder, it is still stateless, but parsing
>>>>>>>> less or more data from the bitstream doesn’t stop a decoder become a
>>>>>>>> stateless decoder.
>>>>>>>
>>>>>>> Yes fair enough, the format in which the hardware decoder takes the
>>>>>>> bitstream parameters does not make it stateless or stateful per-se.
>>>>>>> It's just that stateless decoders should have no particular reason for
>>>>>>> parsing the bitstream on their own since the hardware can be designed
>>>>>>> with registers for each relevant bitstream element to configure the
>>>>>>> decoding pipeline. That's how GPU-based decoder implementations are
>>>>>>> implemented (VAAPI/VDPAU/NVDEC, etc).
>>>>>>>
>>>>>>> So the format we have agreed on so far for the stateless interface is
>>>>>>> to pass parsed elements via v4l2 control structures.
>>>>>>>
>>>>>>> If the hardware can only work by parsing the bitstream itself, I'm not
>>>>>>> sure what the best solution would be. Reconstructing the bitstream in
>>>>>>> the kernel is a pretty bad option, but so is parsing in the kernel or
>>>>>>> having the data both in parsed and raw forms. Do you see another
>>>>>>> possibility?
>>>>>>
>>>>>> Is reconstructing the bitstream so bad? The v4l2 controls provide a
>>>>>> generic interface to an encoded format which the driver needs to
>>>>>> convert into a sequence that the hardware can understand. Typically
>>>>>> this is done by populating hardware-specific structures. Can't we
>>>>>> consider that in this specific instance, the hardware-specific
>>>>>> structure just happens to be identical to the original bitstream
>>>>>> format?
>>>>>
>>>>> At maximum allowed bitrate for let's say HEVC (940MB/s iirc), yes, it
>>>>> would be really really bad. In GStreamer project we have discussed for
>>>>> a while (but have never done anything about) adding the ability through
>>>>> a bitmask to select which part of the stream need to be parsed, as
>>>>> parsing itself was causing some overhead. Maybe similar thing applies,
>>>>> though as per our new design, it's the fourcc that dictate the driver
>>>>> behaviour, we'd need yet another fourcc for drivers that wants the full
>>>>> bitstream (which seems odd if you have already parsed everything, I
>>>>> think this need some clarification).
>>>>
>>>> Note that I am not proposing to rebuild the *entire* bitstream
>>>> in-kernel. What I am saying is that if the hardware interprets some
>>>> structures (like SPS/PPS) in their raw format, this raw format could
>>>> be reconstructed from the structures passed by userspace at negligible
>>>> cost. Such manipulation would only happen on a small amount of data.
>>>>
>>>> Exposing finer-grained driver requirements through a bitmask may
>>>> deserve more exploring. Maybe we could end with a spectrum of
>>>> capabilities that would allow us to cover the range from fully
>>>> stateless to fully stateful IPs more smoothly. Right now we have two
>>>> specifications that only consider the extremes of that range.
>>>
>>> I gave it a bit more thought and if we combine what Nicolas suggested
>>> about the bitmask control with the userspace providing the full
>>> bitstream in the OUTPUT buffers, split into some logical units and
>>> "tagged" with their type (e.g. SPS, PPS, slice, etc.), we could
>>> potentially get an interface that would work for any kind of decoder I
>>> can think of, actually eliminating the boundary between stateful and
>>> stateless decoders.
>> I agree with this idea; it is what I would call a memory region description. I am still struggling with the userspace side before I can post my driver demo.
>>>
>>> For example, a fully stateful decoder would have the bitmask control
>>> set to 0 and accept data from all the OUTPUT buffers as they come. A
>>> decoder that doesn't do any parsing on its own would have all the
>>> valid bits in the bitmask set and ignore the data in OUTPUT buffers
>>> tagged as any kind of metadata. And then, we could have any cases in
>>> between, including stateful decoders which just can't parse the stream
>>> on their own, but still manage anything else themselves, or stateless
>>> ones which can parse parts of the stream, like the rk3399 vdec can
>>> parse the H.264 slice headers on its own.
>>>
>> Actually no: the rkvdec and rkhevc can parse most, but not all, syntax sections.
>> Besides, the VP9 decoder of rkvdec won’t parse most of the syntax.
>>
>> I talked to some Rockchip staff about the performance problem of reconstructing the bitstream after arguing with tfiga on IRC yesterday. 1 ms looks small next to a decoder that can decode a picture of a UHD 4K HEVC video in 9 ms, and that is enough for 60 fps. But what about higher frame rates like 120 fps or 240 fps, or 8K, which is used in Japanese broadcasting?
>
> 1 ms for a 500 MHz CPU (which is quite slow these days) is 500k
> cycles. We don't have to reconstruct the whole bitstream, just the
> parsed metadata and also we don't get a new PPS or SPS every frame.
> Not sure where you have this 1 ms from. Most of the difference between
> our structures and the bitstream is that the latter is packed and
> could be variable length.
You told me that number yesterday.
>
> We actually have some sample bitstream assembly code for the rockchip encoder:
>
> https://chromium.googlesource.com/chromiumos/third_party/libv4lplugins/+/5e6034258146af6be973fb6a5bb6b9d6e7489437/libv4l-rockchip_v2/libvepu/h264e/h264e.c#148
> https://chromium.googlesource.com/chromiumos/third_party/libv4lplugins/+/5e6034258146af6be973fb6a5bb6b9d6e7489437/libv4l-rockchip_v2/libvepu/streams.c#36
>
> Disassembling the stream_put_bits() gives 36 thumb2 instructions,
> including 23 for the loop for each byte that is written.
> stream_write_ue() is a bit more complicated, but in the worst case it
> ends up with 4 calls to stream_put_bits(), each at most spanning 4
> bytes for simplicity.
>
> Let's count the operations for SPS then:
> (1) stream_put_bits() spanning 1 byte: 33 times
> (2) stream_put_bits() spanning up to 3 bytes: 4 times
> (3) stream_write_ue() up to 31 bits: 19 times
>
> Adding it together:
> (1) 33 * 36 +
> (2) 4 * (36 + 2 * 23) +
> (3) 19 * (4 * (36 + 3 * 23)) =
>
> 1188 + 328 + 7980 = 9496 ~= 10k instructions
>
> The code above doesn't seem to contain any expensive instructions,
> like division, so for a modern pipelined out of order core (e.g. A53),
> it could be safe to assume 1 instruction per cycle. At 500 MHz that
> gives you 20 usecs.
>
> SPS is the most complex header and for H.264 we just do PPS and some
> slice headers. Let's round it up a bit and we could have around 100
> usecs for the complete frame metadata.
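
The two primitives counted in the estimate above, a fixed-width bit write and an unsigned Exp-Golomb write, can be sketched as follows. This is a minimal illustration, not the libvepu code linked above: the `bit_writer` structure and the `bw_*` names are hypothetical, and the loop works bit by bit for clarity rather than byte by byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical MSB-first bit writer (not the libvepu implementation). */
struct bit_writer {
	uint8_t *buf;
	size_t size;	/* capacity in bytes */
	size_t bit_pos;	/* number of bits written so far */
};

static void bw_init(struct bit_writer *bw, uint8_t *buf, size_t size)
{
	memset(buf, 0, size);
	bw->buf = buf;
	bw->size = size;
	bw->bit_pos = 0;
}

/* Append the low `count` bits of `value`, most significant bit first. */
static int bw_put_bits(struct bit_writer *bw, uint32_t value, unsigned int count)
{
	if (count > 32 || bw->bit_pos + count > bw->size * 8)
		return -1;

	while (count--) {
		if ((value >> count) & 1u)
			bw->buf[bw->bit_pos / 8] |= (uint8_t)(0x80u >> (bw->bit_pos % 8));
		bw->bit_pos++;
	}

	return 0;
}

/* Append `value` as ue(v): (len - 1) zero bits, then value + 1 in len bits. */
static int bw_write_ue(struct bit_writer *bw, uint32_t value)
{
	uint32_t code, tmp;
	unsigned int len = 0;

	if (value == UINT32_MAX)	/* value + 1 would not fit in 32 bits */
		return -1;

	code = value + 1;
	for (tmp = code; tmp; tmp >>= 1)
		len++;

	/* Check the full length up front so a failure writes nothing. */
	if (bw->bit_pos + 2 * len - 1 > bw->size * 8)
		return -1;

	bw_put_bits(bw, 0, len - 1);
	return bw_put_bits(bw, code, len);
}
```

Writing ue(0), ue(1) and ue(3) in sequence produces the bit strings `1`, `010` and `00100`. The cost per field grows with the code length, which is why the estimate treats `stream_write_ue()` as several `stream_put_bits()` calls in the worst case.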
The scaling list (CABAC table) address needs to be filled into the PPS header, which would require mapping and unmapping a 4K memory region repeatedly.

And have you ever thought about multiple sessions running at the same time? Besides, have you tried the Android CTS tests? There is a test that changes the resolution every 5 frames, which requires constructing a new SPS and PPS for H.264. It is the Android CTS that made me think this way.

Anyway, the problem here is that the V4L2 driver needs to wait until the previous picture is done before it can generate the registers for the next picture, which leads to more idle time for the device.
>>
>> I will bring more details at FOSDEM 2019; I may be in the graphics devroom on Saturday.
>>> That could potentially let us completely eliminate the distinction
>>> between the stateful and stateless interfaces and just have one that
>>> covers both.
>>>
>>> Thoughts?
>
> Any thoughts on my proposal to make the interface more flexible? Any
> specific examples of issues that we could encounter that would prevent
> it from working efficiently with Rockchip (or other) hardware?
>
> Best regards,
> Tomasz
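
The bitmask control proposed earlier in the thread can be sketched as follows. This is only an illustration of the described semantics under stated assumptions: the unit types, the `UNIT_BIT()` macro and the helper names are hypothetical and are not part of any existing V4L2 API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tags for the logical units carried in OUTPUT buffers. */
enum unit_type {
	UNIT_SPS,
	UNIT_PPS,
	UNIT_SLICE_HEADER,
	UNIT_SLICE_DATA,
};

#define UNIT_BIT(t)	(1u << (t))

/* Only metadata units can be delegated to userspace parsing. */
#define PARSED_MASK_VALID \
	(UNIT_BIT(UNIT_SPS) | UNIT_BIT(UNIT_PPS) | UNIT_BIT(UNIT_SLICE_HEADER))

/*
 * A set bit means the decoder cannot parse that unit type on its own,
 * so userspace has to pass the parsed metadata through controls.
 */
static bool userspace_must_parse(uint32_t mask, enum unit_type type)
{
	return (mask & PARSED_MASK_VALID & UNIT_BIT(type)) != 0;
}

/*
 * The driver reads the raw bytes of a tagged unit only when userspace
 * is not already supplying the parsed form of that unit type.
 */
static bool driver_consumes_raw_unit(uint32_t mask, enum unit_type type)
{
	return !userspace_must_parse(mask, type);
}
```

With a mask of 0 the driver consumes every unit as it comes, which matches the fully stateful case. With all valid bits set it consumes only slice data and ignores the tagged metadata units. A hypothetical in-between device that parses slice headers itself would set only the SPS and PPS bits.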