2023-10-19 09:39:36

by Paul Kocialkowski

Subject: V4L2 Encoders Pre-Processing Support Questions

Hello,

While working on the Allwinner Video Engine H.264 encoder, I found that it has
some pre-processing capabilities. This includes things like chroma
down-sampling, colorspace conversion and scaling.

For example, this means that you can feed the encoder with YUV 4:2:2 data and
it will downsample it to 4:2:0, since that's the only thing the hardware can
do. The same can happen when e.g. providing RGB source pictures, which will be
converted to YUV 4:2:0 internally.
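
As a rough illustration (assuming single-planar queues and an encoder that
exposes NV16 as an OUTPUT format), the negotiation would look like this, with
no indication of the internal down-sampling anywhere:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Sketch: feed 4:2:2 (NV16) to an encoder that can only produce 4:2:0. */
static int negotiate(int fd)
{
        struct v4l2_format fmt;

        /* Source (OUTPUT) queue: raw 4:2:2 pictures. */
        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
        fmt.fmt.pix.width = 1280;
        fmt.fmt.pix.height = 720;
        fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_NV16;
        if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0)
                return -1;

        /* Destination (CAPTURE) queue: H.264 bitstream. The chroma format
         * actually encoded (4:2:0 after down-sampling) shows up nowhere. */
        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_H264;
        return ioctl(fd, VIDIOC_S_FMT, &fmt);
}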

I was wondering how all of this is dealt with currently and whether this should
be a topic of attention. As far as I can see there is currently no practical way
for userspace to know that such downsampling will take place, although this is
useful to know.

Would it make sense to have an additional media entity between the source video
node and the encoder proc, with the actual pixel format configured on that
link? (This would still be a video-node-centric device, so userspace would not
be expected to configure that link.) But then what if the hardware can either
down-sample or keep the provided sub-sampling? How would userspace indicate
which behavior to select? It is maybe not great to let userspace configure the
pads when this is a video-node-centric driver.

Perhaps this could be a control or the driver could decide to pick the least
destructive sub-sampling available based on the selected codec profile
(but this is still a guess that may not match the use case). With a control
we probably don't need an extra media entity.
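
Something like the following, where the control ID and menu values are
entirely made up for the sake of illustration (nothing like this exists in the
uAPI today):

#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Invented control ID and values, for illustration only. */
#define V4L2_CID_CHROMA_SUBSAMPLING     (V4L2_CID_CODEC_BASE + 0x1000)
#define CHROMA_SUBSAMPLING_KEEP         0       /* keep source sub-sampling */
#define CHROMA_SUBSAMPLING_FORCE_420    1       /* down-sample to 4:2:0 */

static int force_420(int fd)
{
        struct v4l2_control ctrl = {
                .id = V4L2_CID_CHROMA_SUBSAMPLING,
                .value = CHROMA_SUBSAMPLING_FORCE_420,
        };

        return ioctl(fd, VIDIOC_S_CTRL, &ctrl);
}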

Another topic is scaling. We can generally support scaling by allowing a
different size for the coded queue after configuring the picture queue.
However, there would be some interaction with the selection rectangle, which
is used to set the cropping rectangle from the *source*. So the driver will
need to take this rectangle and scale it to match the coded size.
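
A sketch of what I mean, assuming a stateful encoder that accepts a CAPTURE
(coded) size different from the OUTPUT (picture) size:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Scale a 1920x1080 source crop down to a 1280x720 coded size. */
static int setup_scaling(int fd)
{
        struct v4l2_selection sel;
        struct v4l2_format fmt;

        /* Crop rectangle on the *source* queue, as currently specified. */
        memset(&sel, 0, sizeof(sel));
        sel.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
        sel.target = V4L2_SEL_TGT_CROP;
        sel.r.width = 1920;
        sel.r.height = 1080;
        if (ioctl(fd, VIDIOC_S_SELECTION, &sel) < 0)
                return -1;

        /* A different coded size implies that the driver must scale. */
        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_H264;
        fmt.fmt.pix.width = 1280;
        fmt.fmt.pix.height = 720;
        return ioctl(fd, VIDIOC_S_FMT, &fmt);
}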

The main inconsistency here is that the rectangle would no longer correspond
to what will be set in the bitstream, nor would the destination size, since by
definition it does not account for the cropping rectangle. It might be more
sensible to have the selection rectangle operate on the coded/destination
queue instead, but things are already specified to be the other way round.

Maybe a selection rectangle could be introduced for the coded queue too, which
would generally be propagated from the picture-side one, except in the case of
scaling where it would be used to clarify the actual final size (coded size
taking the cropping into account). In this case the source selection rectangle
would be understood as an actual source crop (which may not be supported by
the hardware) instead of an indication for the codec metadata crop fields. And
the coded queue dimensions would need to take this source cropping into
account, which is kinda contradictory with the current semantics. Perhaps we
could define that the source crop rectangle should be entirely ignored when
scaling is used, which would simplify things (although we would lose the
ability to support source cropping if the hardware can do it).

If operating on the source selection rectangle only (no second rectangle on the
coded queue) some cases would be impossible to reach, for instance going from
some aligned dimensions to unaligned ones (e.g. 1280x720 source scaled to
1920x1088 and we want the codec cropping fields to indicate 1920x1080).
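
For illustration, the hypothetical second rectangle could look like this
(V4L2_SEL_TGT_COMPOSE on the coded queue is not defined for encoders today,
this is just to make the idea concrete):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Hypothetical: signal the visible size on the coded (CAPTURE) queue,
 * with the picture queue at 1280x720 and the coded queue at 1920x1088. */
static int set_visible_size(int fd)
{
        struct v4l2_selection sel;

        memset(&sel, 0, sizeof(sel));
        sel.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        sel.target = V4L2_SEL_TGT_COMPOSE;      /* not defined for encoders */
        sel.r.width = 1920;
        sel.r.height = 1080;

        /* The driver would translate this rectangle into the bitstream
         * cropping metadata, e.g. the H.264 SPS frame cropping offsets. */
        return ioctl(fd, VIDIOC_S_SELECTION, &sel);
}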

Anyway just wanted to check if people have already thought about these topics,
but I'm mostly thinking out loud and I'm of course not saying we need to solve
these problems now.

Sorry again for the long email, I hope the points I'm making are somewhat
understandable.

Cheers,

Paul

--
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com



2023-10-20 17:57:01

by Nicolas Dufresne

Subject: Re: V4L2 Encoders Pre-Processing Support Questions

Hi Paul,

On Thursday 19 October 2023 at 11:39 +0200, Paul Kocialkowski wrote:
> Hello,
>
> While working on the Allwinner Video Engine H.264 encoder, I found that it has
> some pre-processing capabilities. This includes things like chroma
> down-sampling, colorspace conversion and scaling.

Similar with the Hantro H1.

>
> For example, this means that you can feed the encoder with YUV 4:2:2 data and
> it will downsample it to 4:2:0, since that's the only thing the hardware can
> do. The same can happen when e.g. providing RGB source pictures, which will
> be converted to YUV 4:2:0 internally.
>
> I was wondering how all of this is dealt with currently and whether this should
> be a topic of attention. As far as I can see there is currently no practical way
> for userspace to know that such downsampling will take place, although this is
> useful to know.

Userspace already knows that the driver will downsample through the selected
profile. The only issue would be if a user wants to force a profile with 422
support but still has their 422 data downsampled anyway. This is legal in the
spec, but I'd question whether it's worth supporting.
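
For illustration with the existing uAPI: selecting a 4:2:0-only profile such
as High via the profile control already tells userspace up front that 4:2:2
input will be down-sampled.

#include <sys/ioctl.h>
#include <linux/videodev2.h>

static int select_420_profile(int fd)
{
        /* High is a 4:2:0 profile: 4:2:2 input implies down-sampling. */
        struct v4l2_control ctrl = {
                .id = V4L2_CID_MPEG_VIDEO_H264_PROFILE,
                .value = V4L2_MPEG_VIDEO_H264_PROFILE_HIGH,
        };

        return ioctl(fd, VIDIOC_S_CTRL, &ctrl);
}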

>
> Would it make sense to have an additional media entity between the source video
> node and the encoder proc, with the actual pixel format configured on that
> link? (This would still be a video-node-centric device, so userspace would not
> be expected to configure that link.) But then what if the hardware can either
> down-sample or keep the provided sub-sampling? How would userspace indicate
> which behavior to select? It is maybe not great to let userspace configure the
> pads when this is a video-node-centric driver.
>
> Perhaps this could be a control or the driver could decide to pick the least
> destructive sub-sampling available based on the selected codec profile
> (but this is still a guess that may not match the use case). With a control
> we probably don't need an extra media entity.

Yes, for the cases not covered by the profile, I'd consider a control to force
downsampling. A menu control, so we can use the available menu items to
enumerate what is supported.
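
Enumeration could then use the standard control introspection (that part is
existing uAPI), for whatever menu control we end up with:

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* List the menu items of a (here: hypothetical) sub-sampling control. */
static void list_subsampling(int fd, unsigned int cid)
{
        struct v4l2_queryctrl qctrl;
        struct v4l2_querymenu qmenu;
        int i;

        memset(&qctrl, 0, sizeof(qctrl));
        qctrl.id = cid;
        if (ioctl(fd, VIDIOC_QUERYCTRL, &qctrl) < 0)
                return;

        for (i = qctrl.minimum; i <= qctrl.maximum; i++) {
                memset(&qmenu, 0, sizeof(qmenu));
                qmenu.id = cid;
                qmenu.index = i;
                if (ioctl(fd, VIDIOC_QUERYMENU, &qmenu) == 0)
                        printf("supported: %s\n", (const char *)qmenu.name);
        }
}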

>
> Another topic is scaling. We can generally support scaling by allowing a
> different size for the coded queue after configuring the picture queue.
> However, there would be some interaction with the selection rectangle, which
> is used to set the cropping rectangle from the *source*. So the driver will
> need to take this rectangle and scale it to match the coded size.
>
> The main inconsistency here is that the rectangle would no longer correspond
> to what will be set in the bitstream, nor would the destination size, since by
> definition it does not account for the cropping rectangle. It might be more
> sensible to have the selection rectangle operate on the coded/destination
> queue instead, but things are already specified to be the other way round.
>
> Maybe a selection rectangle could be introduced for the coded queue too, which
> would generally be propagated from the picture-side one, except in the case of
> scaling where it would be used to clarify the actual final size (coded size
> taking the cropping into account). In this case the source selection rectangle
> would be understood as an actual source crop (which may not be supported by
> the hardware) instead of an indication for the codec metadata crop fields. And
> the coded queue dimensions would need to take this source cropping into
> account, which is kinda contradictory with the current semantics. Perhaps we
> could define that the source crop rectangle should be entirely ignored when
> scaling is used, which would simplify things (although we would lose the
> ability to support source cropping if the hardware can do it).

Yes, we should use selection on both queues (fortunately there is a
v4l2_buf_type in that API). Otherwise we cannot model all the scaling and
cropping options. What the spec must do is define the configuration sequence,
so that a negotiation is possible. We need a convention regarding the order,
so that there is a way to converge with the driver, and also to conclude if
the driver cannot handle it.
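
For instance, the converge-or-conclude step could rely on VIDIOC_S_SELECTION
adjusting the rectangle in place (that is existing semantics):

#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Returns 0 if applied exactly, 1 if the driver adjusted the rectangle,
 * -1 if the driver cannot handle this selection target at all. */
static int set_and_check(int fd, struct v4l2_selection *sel)
{
        struct v4l2_rect want = sel->r;

        if (ioctl(fd, VIDIOC_S_SELECTION, sel) < 0)
                return -1;

        /* S_SELECTION adjusts the rectangle in place; compare to decide
         * whether the driver's compromise is acceptable. */
        return sel->r.width != want.width || sel->r.height != want.height;
}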

>
> If operating on the source selection rectangle only (no second rectangle on the
> coded queue) some cases would be impossible to reach, for instance going from
> some aligned dimensions to unaligned ones (e.g. 1280x720 source scaled to
> 1920x1088 and we want the codec cropping fields to indicate 1920x1080).
>
> Anyway just wanted to check if people have already thought about these topics,
> but I'm mostly thinking out loud and I'm of course not saying we need to solve
> these problems now.

We might find extra corner cases by implementing the spec, but I think the API
we have makes most of this possible already. Remember that we have the fwht sw
codec in the kernel for the purpose of developing this kind of feature. A
simple bob scaler can be added for testing scaling.

>
> Sorry again for the long email, I hope the points I'm making are somewhat
> understandable.
>
> Cheers,
>
> Paul
>

regards,
Nicolas

2023-10-25 09:03:09

by Paul Kocialkowski

Subject: Re: V4L2 Encoders Pre-Processing Support Questions

Hi Nicolas,

Thanks for your useful answer!

On Fri 20 Oct 23, 13:56, Nicolas Dufresne wrote:
> > For example, this means that you can feed the encoder with YUV 4:2:2 data
> > and it will downsample it to 4:2:0, since that's the only thing the hardware
> > can do. The same can happen when e.g. providing RGB source pictures, which
> > will be converted to YUV 4:2:0 internally.
> >
> > I was wondering how all of this is dealt with currently and whether this should
> > be a topic of attention. As far as I can see there is currently no practical way
> > for userspace to know that such downsampling will take place, although this is
> > useful to know.
>
> Userspace already knows that the driver will downsample through the selected
> profile. The only issue would be if a user wants to force a profile with 422
> support but still has their 422 data downsampled anyway. This is legal in the
> spec, but I'd question whether it's worth supporting.

Yeah indeed I think there's a distinction between selecting a profile that
allows 422 and ensuring that this is what the encoder selects. Not sure if 420
is always valid for any profile, but there's surely some overlap where both
could be selected in compliance with the profile.

> > Would it make sense to have an additional media entity between the source
> > video node and the encoder proc, with the actual pixel format configured on
> > that link? (This would still be a video-node-centric device, so userspace
> > would not be expected to configure that link.) But then what if the hardware
> > can either down-sample or keep the provided sub-sampling? How would userspace
> > indicate which behavior to select? It is maybe not great to let userspace
> > configure the pads when this is a video-node-centric driver.
> >
> > Perhaps this could be a control or the driver could decide to pick the least
> > destructive sub-sampling available based on the selected codec profile
> > (but this is still a guess that may not match the use case). With a control
> > we probably don't need an extra media entity.
>
> Yes, for the cases not covered by the profile, I'd consider a control to force
> downsampling. A menu control, so we can use the available menu items to
> enumerate what is supported.

Sounds good then.

> > Another topic is scaling. We can generally support scaling by allowing a
> > different size for the coded queue after configuring the picture queue.
> > However, there would be some interaction with the selection rectangle, which
> > is used to set the cropping rectangle from the *source*. So the driver will
> > need to take this rectangle and scale it to match the coded size.
> >
> > The main inconsistency here is that the rectangle would no longer correspond
> > to what will be set in the bitstream, nor would the destination size, since
> > by definition it does not account for the cropping rectangle. It might be
> > more sensible to have the selection rectangle operate on the
> > coded/destination queue instead, but things are already specified to be the
> > other way round.
> >
> > Maybe a selection rectangle could be introduced for the coded queue too,
> > which would generally be propagated from the picture-side one, except in the
> > case of scaling where it would be used to clarify the actual final size
> > (coded size taking the cropping into account). In this case the source
> > selection rectangle would be understood as an actual source crop (which may
> > not be supported by the hardware) instead of an indication for the codec
> > metadata crop fields. And the coded queue dimensions would need to take this
> > source cropping into account, which is kinda contradictory with the current
> > semantics. Perhaps we could define that the source crop rectangle should be
> > entirely ignored when scaling is used, which would simplify things (although
> > we would lose the ability to support source cropping if the hardware can do it).
>
> Yes, we should use selection on both queues (fortunately there is a
> v4l2_buf_type in that API). Otherwise we cannot model all the scaling and
> cropping options. What the spec must do is define the configuration sequence,
> so that a negotiation is possible. We need a convention regarding the order,
> so that there is a way to converge with the driver, and also to conclude if
> the driver cannot handle it.

Agreed. I'm just a bit worried that it's rather late to change the semantics
now that the source crop is defined in the stateful encoding uAPI, and that
its meaning would become unclear/different when a destination crop is added.

Cheers,

Paul

> > If operating on the source selection rectangle only (no second rectangle on the
> > coded queue) some cases would be impossible to reach, for instance going from
> > some aligned dimensions to unaligned ones (e.g. 1280x720 source scaled to
> > 1920x1088 and we want the codec cropping fields to indicate 1920x1080).
> >
> > Anyway just wanted to check if people have already thought about these topics,
> > but I'm mostly thinking out loud and I'm of course not saying we need to solve
> > these problems now.
>
> We might find extra corner cases by implementing the spec, but I think the API
> we have makes most of this possible already. Remember that we have the fwht sw
> codec in the kernel for the purpose of developing this kind of feature. A
> simple bob scaler can be added for testing scaling.
>
> >
> > Sorry again for the long email, I hope the points I'm making are somewhat
> > understandable.
> >
> > Cheers,
> >
> > Paul
> >
>
> regards,
> Nicolas
>

--
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com

