This series contains fixes and improvements for the hantro H264 decoder.
Patch 1 and 5 fixes the motion vector buffer offset calculation for field encoded
and monochrome content and makes it possible to decode a sample from [1].
Patch 2 removes overallocation for the motion vector buffer,
only half of the extra size is needed.
Patch 3 changes to use the same source for width and height as is used for
motion vector buffer offset calculation.
The RFC patches that added bits to handle field encoded content have
been dropped and will be resent in a separate series.
The following sample from [1] is now playable with this series
- H264_1080i-25-interlace_Kaesescheibchen.mkv
This series has been tested using ffmpeg v4l2 request hwaccel at [2]
[1] http://kwiboo.libreelec.tv/test/samples/
[2] https://github.com/Kwiboo/FFmpeg/compare/4.0.4-Leia-18.4...v4l2-request-hwaccel-4.0.4-hantro
Changes in v3:
- rebased on for-v5.5q
- drop RFC patches
- src_fmt instead of dst_fmt is used for width/height
- address feedback from Boris
Changes in v2:
- scaling list changes split to its own series
- address feedback from Philipp and Ezequiel
Regards,
Jonas
Jonas Karlman (5):
media: hantro: Fix H264 motion vector buffer offset
media: hantro: Reduce H264 extra space for motion vectors
media: hantro: Use output buffer width and height for H264 decoding
media: hantro: Remove now unused H264 pic_size
media: hantro: Set H264 FIELDPIC_FLAG_E flag correctly
.../staging/media/hantro/hantro_g1_h264_dec.c | 37 +++++++++++++------
drivers/staging/media/hantro/hantro_h264.c | 5 ---
drivers/staging/media/hantro/hantro_hw.h | 3 --
drivers/staging/media/hantro/hantro_v4l2.c | 20 +++++++++-
4 files changed, 43 insertions(+), 22 deletions(-)
--
2.17.1
A decoded 8-bit 4:2:0 frame need memory for up to 448 bytes per
macroblock and is laid out in memory as follow:
+---------------------------+
| Y-plane 256 bytes x MBs |
+---------------------------+
| UV-plane 128 bytes x MBs |
+---------------------------+
| MV buffer 64 bytes x MBs |
+---------------------------+
The motion vector buffer offset is currently correct for 4:2:0 because the
extra space for motion vectors is overallocated with an extra 64 bytes x MBs.
Wrong offset for both destination and motion vector buffer are used
for the bottom field of field encoded content, wrong offset is
also used for 4:0:0 (monochrome) content.
Fix this by setting the motion vector address to the expected 384 bytes x MBs
offset for 4:2:0 and 256 bytes x MBs offset for 4:0:0 content.
Also use correct destination and motion vector buffer offset
for the bottom field of field encoded content.
While at it also extend the check for 4:0:0 (monochrome) to include an
additional check for High Profile (100).
Fixes: dea0a82f3d22 ("media: hantro: Add support for H264 decoding on G1")
Signed-off-by: Jonas Karlman <[email protected]>
Reviewed-by: Boris Brezillon <[email protected]>
---
Changes in v3:
* address remarks from Boris
- use src_fmt instead of dst_fmt
Changes in v2:
* address remarks from Philipp and Ezequiel
- update commit message
- rename offset to bytes_per_mb
- remove MV_OFFSET macros
- move PIC_MB_WIDTH/HEIGHT_P change to separate patch
---
.../staging/media/hantro/hantro_g1_h264_dec.c | 31 +++++++++++++------
1 file changed, 22 insertions(+), 9 deletions(-)
diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
index 70a6b5b26477..30d977c3d529 100644
--- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
+++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
@@ -81,7 +81,7 @@ static void set_params(struct hantro_ctx *ctx)
reg |= G1_REG_DEC_CTRL4_CABAC_E;
if (sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE)
reg |= G1_REG_DEC_CTRL4_DIR_8X8_INFER_E;
- if (sps->chroma_format_idc == 0)
+ if (sps->profile_idc >= 100 && sps->chroma_format_idc == 0)
reg |= G1_REG_DEC_CTRL4_BLACKWHITE_E;
if (pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED)
reg |= G1_REG_DEC_CTRL4_WEIGHT_PRED_E;
@@ -234,6 +234,7 @@ static void set_buffers(struct hantro_ctx *ctx)
struct vb2_v4l2_buffer *src_buf, *dst_buf;
struct hantro_dev *vpu = ctx->dev;
dma_addr_t src_dma, dst_dma;
+ size_t offset = 0;
src_buf = hantro_get_src_buf(ctx);
dst_buf = hantro_get_dst_buf(ctx);
@@ -244,18 +245,30 @@ static void set_buffers(struct hantro_ctx *ctx)
/* Destination (decoded frame) buffer. */
dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
- vdpu_write_relaxed(vpu, dst_dma, G1_REG_ADDR_DST);
+ /* Adjust dma addr to start at second line for bottom field */
+ if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
+ offset = ALIGN(ctx->src_fmt.width, MB_DIM);
+ vdpu_write_relaxed(vpu, dst_dma + offset, G1_REG_ADDR_DST);
/* Higher profiles require DMV buffer appended to reference frames. */
if (ctrls->sps->profile_idc > 66 && ctrls->decode->nal_ref_idc) {
- size_t pic_size = ctx->h264_dec.pic_size;
- size_t mv_offset = round_up(pic_size, 8);
-
+ unsigned int bytes_per_mb = 384;
+
+ /* DMV buffer for monochrome start directly after Y-plane */
+ if (ctrls->sps->profile_idc >= 100 &&
+ ctrls->sps->chroma_format_idc == 0)
+ bytes_per_mb = 256;
+ offset = bytes_per_mb * MB_WIDTH(ctx->src_fmt.width) *
+ MB_HEIGHT(ctx->src_fmt.height);
+
+ /*
+ * DMV buffer is split in two for field encoded frames,
+ * adjust offset for bottom field
+ */
if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
- mv_offset += 32 * MB_WIDTH(ctx->dst_fmt.width);
-
- vdpu_write_relaxed(vpu, dst_dma + mv_offset,
- G1_REG_ADDR_DIR_MV);
+ offset += 32 * MB_WIDTH(ctx->src_fmt.width) *
+ MB_HEIGHT(ctx->src_fmt.height);
+ vdpu_write_relaxed(vpu, dst_dma + offset, G1_REG_ADDR_DIR_MV);
}
/* Auxiliary buffer prepared in hantro_g1_h264_dec_prepare_table(). */
--
2.17.1
pic_size in hantro_h264_dec_hw_ctx struct is no longer used,
lets remove it.
Signed-off-by: Jonas Karlman <[email protected]>
Reviewed-by: Boris Brezillon <[email protected]>
---
drivers/staging/media/hantro/hantro_h264.c | 5 -----
drivers/staging/media/hantro/hantro_hw.h | 3 ---
2 files changed, 8 deletions(-)
diff --git a/drivers/staging/media/hantro/hantro_h264.c b/drivers/staging/media/hantro/hantro_h264.c
index 694a330f508e..568640eab3a6 100644
--- a/drivers/staging/media/hantro/hantro_h264.c
+++ b/drivers/staging/media/hantro/hantro_h264.c
@@ -618,7 +618,6 @@ int hantro_h264_dec_init(struct hantro_ctx *ctx)
struct hantro_h264_dec_hw_ctx *h264_dec = &ctx->h264_dec;
struct hantro_aux_buf *priv = &h264_dec->priv;
struct hantro_h264_dec_priv_tbl *tbl;
- struct v4l2_pix_format_mplane pix_mp;
priv->cpu = dma_alloc_coherent(vpu->dev, sizeof(*tbl), &priv->dma,
GFP_KERNEL);
@@ -629,9 +628,5 @@ int hantro_h264_dec_init(struct hantro_ctx *ctx)
tbl = priv->cpu;
memcpy(tbl->cabac_table, h264_cabac_table, sizeof(tbl->cabac_table));
- v4l2_fill_pixfmt_mp(&pix_mp, ctx->dst_fmt.pixelformat,
- ctx->dst_fmt.width, ctx->dst_fmt.height);
- h264_dec->pic_size = pix_mp.plane_fmt[0].sizeimage;
-
return 0;
}
diff --git a/drivers/staging/media/hantro/hantro_hw.h b/drivers/staging/media/hantro/hantro_hw.h
index 69b88f4d3fb3..fa91dd1848b7 100644
--- a/drivers/staging/media/hantro/hantro_hw.h
+++ b/drivers/staging/media/hantro/hantro_hw.h
@@ -80,15 +80,12 @@ struct hantro_h264_dec_reflists {
* @dpb: DPB
* @reflists: P/B0/B1 reflists
* @ctrls: V4L2 controls attached to a run
- * @pic_size: Size in bytes of decoded picture, this is needed
- * to pass the location of motion vectors.
*/
struct hantro_h264_dec_hw_ctx {
struct hantro_aux_buf priv;
struct v4l2_h264_dpb_entry dpb[HANTRO_H264_DPB_SIZE];
struct hantro_h264_dec_reflists reflists;
struct hantro_h264_dec_ctrls ctrls;
- size_t pic_size;
};
/**
--
2.17.1
Hi Jonas,
On Thu, Nov 7, 2019 at 7:34 AM Jonas Karlman <[email protected]> wrote:
>
> A decoded 8-bit 4:2:0 frame need memory for up to 448 bytes per
> macroblock and is laid out in memory as follow:
>
> +---------------------------+
> | Y-plane 256 bytes x MBs |
> +---------------------------+
> | UV-plane 128 bytes x MBs |
> +---------------------------+
> | MV buffer 64 bytes x MBs |
> +---------------------------+
>
> The motion vector buffer offset is currently correct for 4:2:0 because the
> extra space for motion vectors is overallocated with an extra 64 bytes x MBs.
>
> Wrong offset for both destination and motion vector buffer are used
> for the bottom field of field encoded content, wrong offset is
> also used for 4:0:0 (monochrome) content.
>
> Fix this by setting the motion vector address to the expected 384 bytes x MBs
> offset for 4:2:0 and 256 bytes x MBs offset for 4:0:0 content.
>
> Also use correct destination and motion vector buffer offset
> for the bottom field of field encoded content.
>
> While at it also extend the check for 4:0:0 (monochrome) to include an
> additional check for High Profile (100).
>
> Fixes: dea0a82f3d22 ("media: hantro: Add support for H264 decoding on G1")
> Signed-off-by: Jonas Karlman <[email protected]>
> Reviewed-by: Boris Brezillon <[email protected]>
> ---
> Changes in v3:
> * address remarks from Boris
> - use src_fmt instead of dst_fmt
> Changes in v2:
> * address remarks from Philipp and Ezequiel
> - update commit message
> - rename offset to bytes_per_mb
> - remove MV_OFFSET macros
> - move PIC_MB_WIDTH/HEIGHT_P change to separate patch
> ---
> .../staging/media/hantro/hantro_g1_h264_dec.c | 31 +++++++++++++------
> 1 file changed, 22 insertions(+), 9 deletions(-)
>
First of all, thanks for the patches! Good to see more members of the
community contributing to the driver.
Please find my comments inline.
> diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> index 70a6b5b26477..30d977c3d529 100644
> --- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> +++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
> @@ -81,7 +81,7 @@ static void set_params(struct hantro_ctx *ctx)
> reg |= G1_REG_DEC_CTRL4_CABAC_E;
> if (sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE)
> reg |= G1_REG_DEC_CTRL4_DIR_8X8_INFER_E;
> - if (sps->chroma_format_idc == 0)
> + if (sps->profile_idc >= 100 && sps->chroma_format_idc == 0)
I'd rather make this a separate patch with proper explanation in commit message.
> reg |= G1_REG_DEC_CTRL4_BLACKWHITE_E;
> if (pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED)
> reg |= G1_REG_DEC_CTRL4_WEIGHT_PRED_E;
> @@ -234,6 +234,7 @@ static void set_buffers(struct hantro_ctx *ctx)
> struct vb2_v4l2_buffer *src_buf, *dst_buf;
> struct hantro_dev *vpu = ctx->dev;
> dma_addr_t src_dma, dst_dma;
> + size_t offset = 0;
>
> src_buf = hantro_get_src_buf(ctx);
> dst_buf = hantro_get_dst_buf(ctx);
> @@ -244,18 +245,30 @@ static void set_buffers(struct hantro_ctx *ctx)
>
> /* Destination (decoded frame) buffer. */
> dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
> - vdpu_write_relaxed(vpu, dst_dma, G1_REG_ADDR_DST);
> + /* Adjust dma addr to start at second line for bottom field */
> + if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
> + offset = ALIGN(ctx->src_fmt.width, MB_DIM);
Isn't ctx->src_fmt.width already aligned to MB_DIM?
Also, offset is in bytes, so should we rather use the bytesperline field?
> + vdpu_write_relaxed(vpu, dst_dma + offset, G1_REG_ADDR_DST);
>
> /* Higher profiles require DMV buffer appended to reference frames. */
> if (ctrls->sps->profile_idc > 66 && ctrls->decode->nal_ref_idc) {
> - size_t pic_size = ctx->h264_dec.pic_size;
> - size_t mv_offset = round_up(pic_size, 8);
> -
> + unsigned int bytes_per_mb = 384;
> +
> + /* DMV buffer for monochrome start directly after Y-plane */
> + if (ctrls->sps->profile_idc >= 100 &&
> + ctrls->sps->chroma_format_idc == 0)
> + bytes_per_mb = 256;
nit: Adding a blank line here would make it much easier to read.
> + offset = bytes_per_mb * MB_WIDTH(ctx->src_fmt.width) *
> + MB_HEIGHT(ctx->src_fmt.height);
It's kind of difficult to follow with this idea of bytes_per_mb IMHO.
Would it perhaps make sense to rewrite the code as below?
luma_size = ctx->src_fmt.planes[0].bytesperline * ctx->src_fmt.height;
if (ctrls->sps->profile_idc >= 100 &&
ctrls->sps->chroma_format_idc == 0)
chroma_size = 0;
else
chroma_size = ctx->src_fmt.planes[0].bytesperline *
ctx->src_fmt.height / 4;
offset = luma_size + chroma_size;
Also, the code only handles 4:2:0 and 4:0:0. How about 4:2:2?
Best regards,
Tomasz
> +
> + /*
> + * DMV buffer is split in two for field encoded frames,
> + * adjust offset for bottom field
> + */
> if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
> - mv_offset += 32 * MB_WIDTH(ctx->dst_fmt.width);
> -
> - vdpu_write_relaxed(vpu, dst_dma + mv_offset,
> - G1_REG_ADDR_DIR_MV);
> + offset += 32 * MB_WIDTH(ctx->src_fmt.width) *
> + MB_HEIGHT(ctx->src_fmt.height);
> + vdpu_write_relaxed(vpu, dst_dma + offset, G1_REG_ADDR_DIR_MV);
> }
>
> /* Auxiliary buffer prepared in hantro_g1_h264_dec_prepare_table(). */
> --
> 2.17.1
>