by Maxime Ripard

Subject: Re: [PATCH 1/7] media: cedrus: Disable engine after each slice decoding

Hi,

On Thu, May 30, 2019 at 11:15:10PM +0200, Jernej Skrabec wrote:
> libvdpau-sunxi always disables engine after each decoded slice.
> Do same in Cedrus driver.
>
> Presumably this also lowers power consumption which is always nice.
>
> Signed-off-by: Jernej Skrabec <[email protected]>

Is it fixing anything though?

I indeed saw that cedar did disable it every time, but I couldn't find
a reason why.

Also, the power management improvement would need to be measured. It
can even create the opposite situation, where the device draws more
current from being woken up than if it had just remained enabled.

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2019-06-03 12:25:49

by Maxime Ripard

Subject: Re: [PATCH 4/7] media: cedrus: Remove dst_bufs from context

On Thu, May 30, 2019 at 11:15:13PM +0200, Jernej Skrabec wrote:
> This array is just duplicated capture buffer queue. Remove it and adjust
> code to look into capture buffer queue instead.
>
> Signed-off-by: Jernej Skrabec <[email protected]>

Acked-by: Maxime Ripard <[email protected]>

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2019-06-03 12:27:08

Hi,

On Thu, May 30, 2019 at 11:15:12PM +0200, Jernej Skrabec wrote:
> It seems that for some H264 videos at least one bitstream parsing
> trigger must be called in order to be decoded correctly. There is no
> explanation why this helps, but it was observed that two sample videos
> with this fix are now decoded correctly and there is no regression with
> others.
>
> Signed-off-by: Jernej Skrabec <[email protected]>
> ---
> I have two samples which are fixed by this:
> http://jernej.libreelec.tv/videos/h264/test.mkv
> http://jernej.libreelec.tv/videos/h264/Dredd%20%E2%80%93%20DTS%20Sound%20Check%20DTS-HD%20MA%207.1.m2ts
>
> Although second one also needs support for multi-slice frames, which is not yet implemented here.
>
> .../staging/media/sunxi/cedrus/cedrus_h264.c | 22 ++++++++++++++++---
> 1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> index cc8d17f211a1..d0ee3f90ff46 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> @@ -6,6 +6,7 @@
> * Copyright (c) 2018 Bootlin
> */
>
> +#include <linux/delay.h>
> #include <linux/types.h>
>
> #include <media/videobuf2-dma-contig.h>
> @@ -289,6 +290,20 @@ static void cedrus_write_pred_weight_table(struct cedrus_ctx *ctx,
> }
> }

We should have a comment here explaining why that is needed

> +static void cedrus_skip_bits(struct cedrus_dev *dev, int num)
> +{
> + for (; num > 32; num -= 32) {
> + cedrus_write(dev, VE_H264_TRIGGER_TYPE, 0x3 | (32 << 8));

Using defines here would be great

> + while (cedrus_read(dev, VE_H264_STATUS) & (1 << 8))
> + udelay(1);
> + }

A new line here would be great

> + if (num > 0) {
> + cedrus_write(dev, VE_H264_TRIGGER_TYPE, 0x3 | (num << 8));
> + while (cedrus_read(dev, VE_H264_STATUS) & (1 << 8))
> + udelay(1);
> + }

Can't we make that a bit simpler by not duplicating the loop?

Something like:

	int skipped = 0;

	while (skipped < num) {
		/* The hardware skips at most 32 bits per trigger */
		int tmp = min(num - skipped, 32);

		cedrus_write(dev, VE_H264_TRIGGER_TYPE, 0x3 | (tmp << 8));
		while (cedrus_read(dev, VE_H264_STATUS) & (1 << 8))
			udelay(1);

		skipped += tmp;
	}
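
For reference, the single-loop idea can be tried outside the kernel with a small stand-in for the hardware. This is only a sketch under stated assumptions: the define names (SKIP_TRIGGER_CMD, SKIP_TRIGGER_LEN_SHIFT, SKIP_MAX_BITS), struct fake_ve and fake_trigger() are hypothetical and exist only for illustration; the patch itself uses the raw values 0x3, 8 and 32, and the real driver would poll VE_H264_STATUS after each trigger.

```c
#include <stdint.h>

/* Hypothetical names for the raw values used in the patch. */
#define SKIP_TRIGGER_CMD	0x3u
#define SKIP_TRIGGER_LEN_SHIFT	8
#define SKIP_MAX_BITS		32

/* Stand-in for the video engine: records the trigger values written. */
struct fake_ve {
	uint32_t triggers[16];
	unsigned int count;
	unsigned int bits_skipped;
};

static void fake_trigger(struct fake_ve *ve, uint32_t val)
{
	if (ve->count < 16)
		ve->triggers[ve->count] = val;
	ve->count++;
	/* The length field sits above the command bits. */
	ve->bits_skipped += val >> SKIP_TRIGGER_LEN_SHIFT;
}

/* Skip num bits in chunks of at most SKIP_MAX_BITS, using one loop. */
static void skip_bits(struct fake_ve *ve, int num)
{
	int skipped = 0;

	while (skipped < num) {
		int tmp = num - skipped;

		if (tmp > SKIP_MAX_BITS)
			tmp = SKIP_MAX_BITS;

		fake_trigger(ve, SKIP_TRIGGER_CMD |
				 ((uint32_t)tmp << SKIP_TRIGGER_LEN_SHIFT));
		/* The real driver would wait here until the status bit clears. */

		skipped += tmp;
	}
}
```

With num = 70 this issues three triggers, for 32, 32 and 6 bits, which matches what the two-step version in the patch would do.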

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2019-06-03 16:58:04

by Maxime Ripard

Subject: Re: [PATCH 7/7] media: cedrus: Improve H264 memory efficiency

On Thu, May 30, 2019 at 11:15:16PM +0200, Jernej Skrabec wrote:
> H264 decoder driver preallocated pretty big worst case mv col buffer
> pool. It turns out that pool is most of the time much bigger than it
> needs to be.
>
> Solution implemented here is to allocate memory only if capture buffer
> is actually used and only as much as it is really necessary.
>
> This is also preparation for 4K video decoding support, which will be
> implemented later.

What exactly is it doing to prepare for 4K?

> Signed-off-by: Jernej Skrabec <[email protected]>
> ---
> drivers/staging/media/sunxi/cedrus/cedrus.h | 4 -
> .../staging/media/sunxi/cedrus/cedrus_h264.c | 81 +++++++------------
> 2 files changed, 28 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h b/drivers/staging/media/sunxi/cedrus/cedrus.h
> index 16c1bdfd243a..fcbbbef65494 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus.h
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
> @@ -106,10 +106,6 @@ struct cedrus_ctx {
>
> union {
> struct {
> - void *mv_col_buf;
> - dma_addr_t mv_col_buf_dma;
> - ssize_t mv_col_buf_field_size;
> - ssize_t mv_col_buf_size;
> void *pic_info_buf;
> dma_addr_t pic_info_buf_dma;
> void *neighbor_info_buf;
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> index b2290f98d81a..758fd0049e8f 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> @@ -54,17 +54,14 @@ static void cedrus_h264_write_sram(struct cedrus_dev *dev,
> cedrus_write(dev, VE_AVC_SRAM_PORT_DATA, *buffer++);
> }
>
> -static dma_addr_t cedrus_h264_mv_col_buf_addr(struct cedrus_ctx *ctx,
> - unsigned int position,
> +static dma_addr_t cedrus_h264_mv_col_buf_addr(struct cedrus_buffer *buf,
> unsigned int field)
> {
> - dma_addr_t addr = ctx->codec.h264.mv_col_buf_dma;
> -
> - /* Adjust for the position */
> - addr += position * ctx->codec.h264.mv_col_buf_field_size * 2;
> + dma_addr_t addr = buf->extra_buf_dma;
>
> /* Adjust for the field */
> - addr += field * ctx->codec.h264.mv_col_buf_field_size;
> + if (field)
> + addr += buf->extra_buf_size / 2;
>
> return addr;
> }
> @@ -76,7 +73,6 @@ static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
> struct cedrus_h264_sram_ref_pic *pic)
> {
> struct vb2_buffer *vbuf = &buf->m2m_buf.vb.vb2_buf;
> - unsigned int position = buf->codec.h264.position;
>
> pic->top_field_order_cnt = cpu_to_le32(top_field_order_cnt);
> pic->bottom_field_order_cnt = cpu_to_le32(bottom_field_order_cnt);
> @@ -84,10 +80,8 @@ static void cedrus_fill_ref_pic(struct cedrus_ctx *ctx,
>
> pic->luma_ptr = cpu_to_le32(cedrus_buf_addr(vbuf, &ctx->dst_fmt, 0));
> pic->chroma_ptr = cpu_to_le32(cedrus_buf_addr(vbuf, &ctx->dst_fmt, 1));
> - pic->mv_col_top_ptr =
> - cpu_to_le32(cedrus_h264_mv_col_buf_addr(ctx, position, 0));
> - pic->mv_col_bot_ptr =
> - cpu_to_le32(cedrus_h264_mv_col_buf_addr(ctx, position, 1));
> + pic->mv_col_top_ptr = cpu_to_le32(cedrus_h264_mv_col_buf_addr(buf, 0));
> + pic->mv_col_bot_ptr = cpu_to_le32(cedrus_h264_mv_col_buf_addr(buf, 1));
> }
>
> static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
> @@ -142,6 +136,28 @@ static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
> output_buf = vb2_to_cedrus_buffer(&run->dst->vb2_buf);
> output_buf->codec.h264.position = position;
>
> + if (!output_buf->extra_buf_size) {
> + const struct v4l2_ctrl_h264_sps *sps = run->h264.sps;
> + unsigned int field_size;
> +
> + field_size = DIV_ROUND_UP(ctx->src_fmt.width, 16) *
> + DIV_ROUND_UP(ctx->src_fmt.height, 16) * 16;
> + if (!(sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE))
> + field_size = field_size * 2;
> + if (!(sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY))
> + field_size = field_size * 2;
> +
> + output_buf->extra_buf_size = field_size * 2;
> + output_buf->extra_buf =
> + dma_alloc_coherent(dev->dev,
> + output_buf->extra_buf_size,
> + &output_buf->extra_buf_dma,
> + GFP_KERNEL);
> +
> + if (!output_buf->extra_buf)
> + output_buf->extra_buf_size = 0;
> + }
> +

That also means that instead of allocating that buffer exactly once,
you now allocate it for each output buffer?

I guess that it will be cleaned up by your previous patch at
buffer_cleanup time, so after it's no longer a reference frame?

What is the average memory usage before and after that change during
playback, and what is the runtime cost of allocating it multiple times
instead of once?
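
As a rough aid for that comparison, the per-buffer size computed in the hunk above can be reproduced in plain C. This is a sketch of the arithmetic only, not a measurement: mv_col_buf_size() and its boolean parameters are hypothetical stand-ins for the V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE and V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY checks in the patch, and DIV_ROUND_UP is redefined locally.

```c
#include <stdbool.h>
#include <stddef.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/*
 * Size in bytes of the per-capture-buffer mv col buffer, following the
 * formula in the patch: 16 bytes per macroblock, doubled when
 * direct_8x8_inference is off, doubled again when the stream is not
 * frame-MBs-only, and doubled once more for the two fields.
 */
static size_t mv_col_buf_size(unsigned int width, unsigned int height,
			      bool direct_8x8_inference, bool frame_mbs_only)
{
	size_t field_size = (size_t)DIV_ROUND_UP(width, 16) *
			    DIV_ROUND_UP(height, 16) * 16;

	if (!direct_8x8_inference)
		field_size *= 2;
	if (!frame_mbs_only)
		field_size *= 2;

	return field_size * 2;
}
```

For a 1920x1080 stream this gives 261120 bytes per buffer when both flags are set, and 1044480 bytes when neither is. How that compares with the old preallocated pool in practice still depends on how many capture buffers are in use at once.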

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2019-06-03 17:31:27

Subject: Re: [PATCH 0/7] media: cedrus: Improvements/cleanup

Hi!

On Mon, Aug 12, 2019 at 02:12:21PM +0200, Hans Verkuil wrote:
> On 5/30/19 11:15 PM, Jernej Skrabec wrote:
> > Here is first batch of random Cedrus improvements/cleanups. Only patch 2
> > has a change which raises a question about H264 controls.
> >
> > Changes were tested on H3 SoC using modified ffmpeg and Kodi.
> >
> > Please take a look.
>
> This has been sitting in patchwork for quite some time. I've updated the
> status of the various patches and most needed extra work.
>
> It seems that patches 4/7 and 5/7 are OK. Maxime, can you please confirm
> that these two are still valid? They apply cleanly on the latest master
> at least, but since they are a bit old I prefer to have confirmation that
> it's OK to merge them.

Yes, you can definitely merge those.

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
