LinuxLists.cc - Stateless Encoding uAPI Discussion and Proposal

Subject: Re: Stateless Encoding uAPI Discussion and Proposal

On Tuesday 25 July 2023 at 11:09 +0200, Paul Kocialkowski wrote:
> Hi Nicolas,
>
> On Mon 24 Jul 23, 10:03, Nicolas Dufresne wrote:
> > On Friday 21 July 2023 at 20:19 +0200, Michael Grzeschik wrote:
> > > > As a result, we cannot expect that any given encoder is able to produce frames
> > > > for any set of headers. Reporting related constraints and limitations (beyond
> > > > profile/level) seems quite difficult and error-prone.
> > > >
> > > > So it seems that keeping header generation in-kernel only (close to where the
> > > > hardware is actually configured) is the safest approach.
> > >
> > > For the case with the rkvenc, the headers are also not created by the
> > > kernel driver. Instead we use the gst_h264_bit_writer_sps/pps functions
> > > that are part of the codecparsers module.
> >
> > One level of granularity we can add is split headers (like SPS/PPS) and
> > slice/frame headers.
>
> Do you mean asking the driver to return a buffer with only SPS/PPS and then
> return another buffer with the slice/frame header?
>
> Looks like there's already a control for it: V4L2_CID_MPEG_VIDEO_HEADER_MODE
> which takes either
> - V4L2_MPEG_VIDEO_HEADER_MODE_SEPARATE: looks like what you're suggesting
> - V4L2_MPEG_VIDEO_HEADER_MODE_JOINED_WITH_1ST_FRAME: usual case
>
> So that could certainly be supported to easily allow userspace to stuff extra
> NALUs in-between.

Good point, indeed.

>
> > It remains that in some cases, like HEVC, when the slice
> > header is byte aligned, it can be nice to be able to handle it at application
> > side in order to avoid limiting SVC support (and other creative features) by our
> > API/abstraction limitations.
>
> Do you see something in the headers that we expect the kernel to generate that
> would need specific changes to support features like SVC?

Getting the kernel to set the layer IDs, unless we have a full SVC configuration,
would just be extra indirection. That being said, if we mention HEVC, these IDs
can be modified in-place as they use a fixed number of bytes. If you can split
the headers apart, generating per-layer headers in the application makes a lot of
sense.
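
To make the in-place modification concrete, here is a minimal sketch of patching the two-byte HEVC NAL unit header. The function names are hypothetical, and the caller is assumed to pass a pointer to the first header byte, right after the start code or length prefix.

```c
#include <stdint.h>

/*
 * HEVC NAL unit header layout (2 bytes):
 *   forbidden_zero_bit(1) | nal_unit_type(6) |
 *   nuh_layer_id(6) | nuh_temporal_id_plus1(3)
 */
static void hevc_set_layer_ids(uint8_t hdr[2], unsigned int layer_id,
			       unsigned int temporal_id)
{
	/* Keep forbidden_zero_bit and nal_unit_type, replace the layer ID MSB. */
	hdr[0] = (uint8_t)((hdr[0] & 0xFE) | ((layer_id >> 5) & 0x01));
	/* Low 5 bits of the layer ID, then temporal ID plus one. */
	hdr[1] = (uint8_t)(((layer_id & 0x1F) << 3) | ((temporal_id + 1) & 0x07));
}

/* Read back the NAL unit type, which the patching above leaves untouched. */
static unsigned int hevc_nal_unit_type(const uint8_t hdr[2])
{
	return (hdr[0] >> 1) & 0x3F;
}
```

Since the header has a fixed size, no bit shifting of the payload is needed, unlike fields located further inside a slice header.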

Traditionally, slice headers are made by stateless accelerators, but not the
SPS/PPS and friends.

>
> From what I can see there's a svc_extension_flag that's only set for specific
> NALUs (prefix_nal_unit/slice_layer_extension) so these could be inserted by
> userspace.
>
> Also I'm not very knowledgeable about SVC so it's not very clear to me if it's
> possible to take an encoder that doesn't support SVC and turn the resulting
> stream into something SVC-ready by adding extra NAL units or if the encoder
> should be a lot more involved.

You can use any encoder to create a temporal SVC. It's only about the
referencing pattern, made so you can reduce the framerate (dividing by 2
usually).
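
As a sketch of such a referencing pattern, the function below assigns a temporal layer to each frame in a dyadic hierarchy, where dropping the highest layer halves the framerate. The function name and the dyadic layout are assumptions made for this example, other patterns are possible.

```c
/*
 * Temporal layer of frame `index` for a dyadic hierarchy with
 * `num_layers` layers (num_layers >= 1). With 3 layers the pattern
 * repeats every 4 frames: 0, 2, 1, 2.
 */
static unsigned int temporal_layer(unsigned int index, unsigned int num_layers)
{
	unsigned int period = 1u << (num_layers - 1);
	unsigned int pos = index % period;
	unsigned int layer = num_layers - 1;

	if (pos == 0)
		return 0;

	/* Each trailing zero bit in the position moves one layer down. */
	while ((pos & 1u) == 0) {
		pos >>= 1;
		layer--;
	}
	return layer;
}
```

A frame in a given layer would then only reference frames from the same or a lower layer, so that a receiver can discard the upper layers without breaking prediction.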

For spatial layers, the encoder needs scaling capabilities. I'm not totally sure
how multi-view works, but this is most likely just using the left eye as reference
(never having an I frame for the second eye).

>
> Also do you know if we have stateful codecs supporting SVC?

We don't at the moment, they all produce headers with the layer ID hardcoded to 0 as
far as I'm aware. The general plan (if it had continued) might have been to
offer a menu-based control, where drivers could offer a list of preset SVC
patterns, mimicking what browsers need:

https://www.w3.org/TR/webrtc-svc/

>
> > I think a certain level of "per CODEC" reasoning is
> > also needed. Just like, I would not want to have to ask the kernel to generate
> > user data SEI and other in-band data.
>
> Yeah it looks like there is definitely a need for adding extra NALUs from
> userspace without passing that data to the kernel.
>
> Cheers,
>
> Paul
>

2023-07-26 21:07:57

by Nicolas Dufresne

Subject: Re: Stateless Encoding uAPI Discussion and Proposal

Hi,

On Wednesday 26 July 2023 at 10:49 +0800, Hsia-Jun Li wrote:
> > I am strongly against this approach, instead I think we need to keep all
> > vendor-specific parts in the kernel driver and provide a clean unified userspace
> > API.
> >
> We are driving away vendor participation. Besides, the current design is
> a performance bottleneck.

I know you have been hammering this argument for many, many years. But in
concrete situations, we have conducted tests, and we outperform vendor stacks
that directly hit hardware registers with stateless CODECs. Also, Paul's
proposal is that fine-grained, close-to-the-metal tuning of the encoding
process should end up in the Linux kernel, so that it can benefit from the
natural hard real-time advantage of a hard IRQ. Just like anything else, we will
find a lot of common methods and shareable code which will benefit security
and quality, which is very unlike what we normally get from per-vendor BSPs. The
strategy is the same as everything else in Linux: vendors will adopt it if there
is a clear benefit. And better quality, ease of use and a good collection of mature
userspace software are what make the difference. It does take time, of course.

regards,
Nicolas

2023-07-27 03:28:38

by Hsia-Jun Li

Subject: Re: Stateless Encoding uAPI Discussion and Proposal

On 7/27/23 03:53, Nicolas Dufresne wrote:
> Hi,
>
> On Wednesday 26 July 2023 at 10:49 +0800, Hsia-Jun Li wrote:
>>> I am strongly against this approach, instead I think we need to keep all
>>> vendor-specific parts in the kernel driver and provide a clean unified userspace
>>> API.
>>>
>> We are driving away vendor participation. Besides, the current design is
>> a performance bottleneck.
>
> I know you have been hammering this argument for many, many years. But in
> concrete situations, we have conducted tests, and we outperform vendor stacks
> that directly hit hardware registers with stateless CODECs. Also, Paul's
> proposal is that fine-grained, close-to-the-metal tuning of the encoding
> process should end up in the Linux kernel, so that it can benefit from the
> natural hard real-time advantage of a hard IRQ. Just like anything else, we will
In real cases, especially in EDR/DVR and NVR devices, re-encoding can
happen occasionally. The important part is feeding the encoding statistics back to
the controller (userspace), and then userspace decides the next
operation (whether to re-encode this frame or not).

> find a lot of common methods and shareable code which will benefit security
Security for a vendor would only mean the protection of its
intellectual property. Also, userspace and the HAL are isolated in Android.
Security and quality are not a problem here, you can't even run
unverified code.
Or we just define an interface that only FOSS would use.
> and quality, which is very unlike what we normally get from per-vendor BSPs. The
> strategy is the same as everything else in Linux: vendors will adopt it if there
> is a clear benefit. And better quality, ease of use and a good collection of mature
Any vendor that would like to implement a DRM (digital rights, security) video
pipeline would not even think of this. There are not many vendors that
just sell plain video codec hardware.

In such a case, we can't even get involved in its memory management; they may
even drop the V4L2 framework.

Somebody may ask why a vendor would want a stateless codec when they could
have a dedicated core to run firmware. It is simple: if you
compare an ARM Cortex-R/M core to an ARM application core, which one
performs better? A remote processor could also make the memory
model (cache coherency) more complex. Besides, it is about cost.
> userspace software are what make the difference. It does take time, of course.

Anyway, leaving the registers and controls part aside, I think I can give some
input on the buffer management part here.

Please DO ***NOT*** make a standard that occupies a lot of memory behind
userspace's back, or a standard where the user has to handle holding the
reconstruction buffers with a strange mechanism (I mean the reconstruction buffer
lifetime would be managed by userspace manually).
>
> regards,
> Nicolas

--
Hsia-Jun(Randy) Li

2023-07-27 17:13:48

by Nicolas Dufresne

Subject: Re: Stateless Encoding uAPI Discussion and Proposal

On Thursday 27 July 2023 at 10:45 +0800, Hsia-Jun Li wrote:
>
> On 7/27/23 03:53, Nicolas Dufresne wrote:
> > Hi,
> >
> > On Wednesday 26 July 2023 at 10:49 +0800, Hsia-Jun Li wrote:
> > > > I am strongly against this approach, instead I think we need to keep all
> > > > vendor-specific parts in the kernel driver and provide a clean unified userspace
> > > > API.
> > > >
> > > We are driving away vendor participation. Besides, the current design is
> > > a performance bottleneck.
> >
> >

. . .

> Or we just define an interface that only FOSS would use.

We explicitly favour FOSS and make APIs that guarantee you can use the driver with
FOSS. This is not something we do in secret, this is fundamental to being a GPL
project. On the DRM side, where the API is a lot more flexible, they explicitly
reject drivers without an actual FOSS user. We don't strictly have to do that in
V4L2, because the API is done at a higher level. But if we were to come up with
a lower-level abstraction, we'd certainly have this rule.

. . .
>

>
> Please DO ***NOT*** make a standard that occupies a lot of memory behind
> userspace's back, or a standard where the user has to handle holding the
> reconstruction buffers with a strange mechanism (I mean the reconstruction buffer
> lifetime would be managed by userspace manually).

In all fairness, people have limited time, and build on top of existing
infrastructure. The reason reconstruction buffers won't be exposed is really
simple to understand. We don't have an API in the current framework to support all the
allocations happening in codec drivers. If we could not progress without that,
I'm sure finding a solution would become a priority. But the truth is that we can
live without it, and are aiming to move forward without it.

We can certainly start a thread on the subject, I even have plenty of ideas how
to introduce these without throwing away all the existing stuff. But only if
there is a clear intention to actually implement it. We have plenty on our plate
and exposing reconstruction buffers can certainly wait.

regards,
Nicolas
>

2023-08-09 15:08:31

Hi folks,

On Tue 11 Jul 23, 19:12, Paul Kocialkowski wrote:
> I am now working on a H.264 encoder driver for Allwinner platforms (currently
> focusing on the V3/V3s), which already provides some usable bitstream and will
> be published soon.

So I wanted to share an update on my side since I've been making progress on
the H.264 encoding work for Allwinner platforms. At this point the code supports
IDR, I and P frames, with a single reference. It also supports GOP (both closed
and open with IDR or I frame interval and explicit keyframe request) but uses
QP controls and does not yet provide rate control. I hope to be able to
implement rate-control before we can make a first public release of the code.

One of the main topics of concern now is how reference frames should be managed
and how it should interact with kernel-side GOP management and rate control.

Leaving GOP management to the kernel-side implies having it decide which frame
should be IDR, I or P (and B for encoders that can support it), while keeping
the possibility to request a keyframe (IDR) and configure GOP size. Now it seems
to me that this is already a good balance between giving userspace a decent
level of control while not having to specify the frame type explicitly for each
frame or maintain a GOP in userspace.
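
To illustrate what such a kernel-side decision could look like, here is a minimal sketch. The structure, field names and exact policy are hypothetical, only loosely modelled on the existing GOP size, I period and force key frame controls, and they are not taken from the actual driver.

```c
#include <stdbool.h>

enum frame_type { FRAME_IDR, FRAME_I, FRAME_P };

/* Hypothetical per-stream GOP state kept by the driver. */
struct gop_state {
	unsigned int gop_size;   /* distance between IDR frames, 0 for unlimited */
	unsigned int i_period;   /* distance between I frames, 0 to disable */
	unsigned int position;   /* frames produced since the last IDR */
	bool force_keyframe;     /* set by userspace, cleared once honoured */
};

/* Pick the type of the next frame and advance the GOP position. */
static enum frame_type gop_next_frame_type(struct gop_state *gop)
{
	enum frame_type type = FRAME_P;

	if (gop->force_keyframe || gop->position == 0 ||
	    (gop->gop_size && gop->position >= gop->gop_size))
		type = FRAME_IDR;
	else if (gop->i_period && gop->position % gop->i_period == 0)
		type = FRAME_I;

	if (type == FRAME_IDR) {
		/* An IDR starts a new GOP and satisfies any pending request. */
		gop->position = 0;
		gop->force_keyframe = false;
	}

	gop->position++;
	return type;
}
```

In this sketch a P frame can never be produced at the start of the stream, which is the kind of invalid situation that userspace would otherwise have to avoid by itself.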

Requesting the frame type explicitly seems more fragile as many situations will
be invalid (e.g. requesting a P frame at the beginning of the stream, etc) and
it generally requires userspace to know a lot about what the codec assumptions
are. Also for B frames the decision would need to be consistent with the fact
that a following frame (in display order) would need to be submitted earlier
than the current frame and inform the kernel so that the picture order count
(display order indication) can be maintained. This is not impossible or out of
reach, but it brings a lot of complexity for little advantage.

Leaving the decision to the kernel side with some hints (whether to force a
keyframe, whether to allow B frames) seems a lot easier, especially for B frames
since the kernel could just receive frames in-order and decide to hold one
so that it can use the next frame submitted as a forward reference for this
upcoming B frame. This requires flushing support but it's already well in place
for stateful encoders.
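
The holding behaviour could be sketched as follows, assuming at most one B frame between references. All names are hypothetical and the sketch only tracks display indices, not actual buffers.

```c
#include <stddef.h>

/* Hypothetical reordering state: at most one frame is held back. */
struct reorder {
	int held;      /* display index of the held frame, -1 if none */
	int started;   /* non-zero once the first frame was emitted */
};

/*
 * Queue one frame in display order. Writes between 0 and 2 display
 * indices to out[] in coding order and returns how many were written.
 */
static size_t reorder_push(struct reorder *r, int display_index, int out[2])
{
	if (!r->started) {
		/* First frame is intra-coded, nothing to hold. */
		r->started = 1;
		out[0] = display_index;
		return 1;
	}
	if (r->held < 0) {
		/* Hold this frame as a B frame candidate. */
		r->held = display_index;
		return 0;
	}
	out[0] = display_index;   /* forward reference, coded first (P) */
	out[1] = r->held;         /* held frame, coded after it (B) */
	r->held = -1;
	return 2;
}

/* Flush: a held frame has no forward reference left, code it as P. */
static size_t reorder_flush(struct reorder *r, int out[1])
{
	if (r->held < 0)
		return 0;
	out[0] = r->held;
	r->held = -1;
	return 1;
}
```

Frames 0, 1, 2, 3 submitted in display order would come out in coding order 0, 2, 1 and then 3 once flushed, which shows why flushing support is needed at the end of the stream.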

The next topic of interest is reference management. It seems pretty clear that
the decision of whether a frame should be a reference or not always needs to be
taken when encoding that frame. In H.264 the nal_ref_idc NAL unit header field
indicates whether a frame is marked as reference or not. IDR frames can
additionally be marked as long-term reference (if I understood correctly, the
frame will stay in the reference picture list until the next IDR frame).
Frames that are marked as reference are added to the l0/l1 lists implicitly
that way and are evicted mostly depending on the number of reference slots
available, or when a new GOP is started.
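
For reference, nal_ref_idc is carried in the one-byte H.264 NAL unit header that precedes each slice. The helpers below are a small sketch with hypothetical names showing how that byte is laid out.

```c
#include <stdint.h>

/*
 * H.264 NAL unit header byte:
 *   forbidden_zero_bit(1) | nal_ref_idc(2) | nal_unit_type(5)
 */
static uint8_t h264_nal_header(unsigned int nal_ref_idc,
			       unsigned int nal_unit_type)
{
	return (uint8_t)(((nal_ref_idc & 0x3) << 5) | (nal_unit_type & 0x1F));
}

/* A picture is marked as used for reference when nal_ref_idc is non-zero. */
static int h264_is_reference(uint8_t nal_header)
{
	return ((nal_header >> 5) & 0x3) != 0;
}
```

For example, an IDR slice (type 5) with nal_ref_idc set to 3 gives the familiar 0x65 byte, while a non-reference slice (type 1) with nal_ref_idc set to 0 gives 0x01.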

With the frame type decided by the kernel, it becomes nearly impossible for
userspace to keep track of the reference lists. Userspace would at least need
to know when an IDR frame is produced to flush the reference lists. In addition
it looks like most hardware doesn't have a way to explicitly discard previous
frames that were marked as reference from being used as reference for next
frames. All in all this means that we should expect little control over the
reference frames list.

As a result my updated proposal would be to have userspace only indicate whether
a submitted frame should be marked as a reference or not instead of submitting
an explicit list of previous buffers that should be used as reference, which
would be impossible to honor in many cases.

Additional information gathered:
- It seems likely that the Allwinner Video Engine only supports one reference
frame. There's a register for specifying the rec buffer of a second one but
I have never seen the proprietary blob use it. It might be as easy as
specifying a non-zero address there but it might also be ignored or require
some undocumented bit to use more than one reference. I haven't made any
attempt at using it yet.
- Contrary to what I said after Andrzej's talk at EOSS, most Allwinner platforms
do not support VP8 encode (despite Allwinner's proprietary blob having an
API for it). The only platform that advertises it is the A80 and this might
actually be a VP8-only Hantro H1. It seems that the API they developed in the
library stuck around even if no other platform can use it.

Sorry for the long email again, I'm trying to be a bit more explanatory than
just giving some bare conclusions that I drew on my own.

What do you think about these ideas?

Cheers,

Paul

>
> This is a very long email where I've tried to split things into distinct topics
> and explain a few concepts to make sure everyone is on the same page.
>
> # Bitstream Headers
>
> Stateless encoders typically do not generate all the bitstream headers and
> sometimes no header at all (e.g. Allwinner encoder does not even produce slice
> headers). There's often some hardware block that makes bit-level writing to the
> destination buffer easier (deals with alignment, etc).
>
> The values of the bitstream headers must be in line with how the compressed
> data bitstream is generated and generally follow the codec specification.
> Some encoders might allow configuring all the fields found in the headers,
> others may only allow configuring a few or have specific constraints regarding
> which values are allowed.
>
> As a result, we cannot expect that any given encoder is able to produce frames
> for any set of headers. Reporting related constraints and limitations (beyond
> profile/level) seems quite difficult and error-prone.
>
> So it seems that keeping header generation in-kernel only (close to where the
> hardware is actually configured) is the safest approach.
>
> # Codec Features
>
> Codecs have many variable features that can be enabled or not and specific
> configuration fields that can take various values. There is usually some
> top-level indication of profile/level that restricts what can be used.
>
> This is a very similar situation to stateful encoding, where codec-specific
> controls are used to report and set profile/level and configure these aspects.
> A particularly nice thing about it is that we can reuse these existing controls
> and add new ones in the future for features that are not yet covered.
>
> This approach feels more flexible than designing new structures with a selected
> set of parameters (that could match the existing controls) for each codec.
>
> # Reference and Reconstruction Management
>
> With stateless encoding, we need to tell the hardware which frames need to be
> used as references for encoding the current frame and make sure we have
> these references available as decoded frames in memory.
>
> Regardless of references, stateless encoders typically need some memory space to
> write the decoded (known as reconstructed) frame while it's being encoded.
>
> One question here is how many slots for decoded pictures should be allocated
> by the driver when starting to stream. There is usually a maximum number of
> reference frames that can be used at a time, although perhaps there is a use
> case for keeping more around and alternating between them for future references.
>
> Another question is how the driver should keep track of which frame will be used
> as a reference in the future and which one can be evicted from the pool of
> decoded pictures if it's not going to be used anymore.
>
> A restrictive approach would be to let the driver alone manage that, similarly
> to how stateful encoders behave. However it might provide extra flexibility
> (and memory gain) to allow userspace to configure the maximum number of possible
> reference frames. In that case it becomes necessary to indicate if a given
> frame will be used as a reference in the future (maybe using a buffer flag)
> and to indicate which previous reference frames (probably to be identified with
> the matching output buffer's timestamp) should be used for the current encode.
> This could be done with a new dedicated control (as a variable-sized array of
> timestamps). Note that userspace would have to update it for every frame or the
> reference frames will remain the same for future encodes.
>
> The driver will then make sure to keep the reconstructed buffer around, in one
> of the slots. When there's no slot left, the driver will drop the oldest
> reference it has (maybe with a bounce buffer to still allow it to be used as a
> reference for the current encode).
>
> With this behavior defined in the uAPI spec, userspace will also be able to
> keep track of which previous frame is no longer allowed as a reference.
>
> # Frame Types
>
> Stateless encoder drivers will typically instruct the hardware to encode either
> an intra-coded or an inter-coded frame. While a stream composed only of a single
> intra-coded frame followed by only inter-coded frames is possible, it's
> generally not desirable as it is not very robust against data loss and makes
> seeking difficult.
>
> As a result, the frame type is usually decided based on a given GOP size
> (the frequency at which a new intra-coded frame is produced) while intra-coded
> frames can also be explicitly requested. Stateful encoders implement
> these through dedicated controls:
> - V4L2_CID_MPEG_VIDEO_FORCE_KEY_FRAME
> - V4L2_CID_MPEG_VIDEO_GOP_SIZE
> - V4L2_CID_MPEG_VIDEO_H264_I_PERIOD
>
> It seems that reusing them would be possible, which would let the driver decide
> of the particular frame type.
>
> However it makes the reference frame management a bit trickier since reference
> frames might be requested from userspace for a frame that ends up being
> intra-coded. We can either allow this and silently ignore the info or expect
> that userspace keeps track of the GOP index and not send references on the first
> frame.
>
> In some codecs, there's also a notion of barrier key-frames (IDR frames in
> H.264) that strictly forbid using any past reference beyond the frame.
> There seems to be an assumption that the GOP start uses this kind of frame
> (and not any intra-coded frame), while the force key frame control does not
> particularly specify it.
>
> In that case we should flush the list of references and userspace should no
> longer provide references to them for future frames. This puts a requirement on
> userspace to keep track of GOP start in order to know when to flush its
> reference list. It could also check if V4L2_BUF_FLAG_KEYFRAME is set, but this
> could also indicate a general intra-coded frame that is not a barrier.
>
> So another possibility would be for userspace to explicitly indicate which
> frame type to use (in a codec-specific way) and act accordingly, leaving any
> notion of GOP up to userspace. I feel like this might be the easiest approach
> while giving an extra degree of control to userspace.
>
> # Rate Control
>
> Another important feature of encoders is the ability to control the amount of
> data produced following different rate control strategies. Stateful encoders
> typically do this in-firmware and expose controls for selecting the strategy
> and associated targets.
>
> It seems desirable to support both automatic and manual rate-control to
> userspace.
>
> Automatic control would be implemented kernel-side (with algos possibly shared
> across drivers) and reuse existing stateful controls. The advantage is
> simplicity (userspace does not need to carry its own rate-control
> implementation) and to ensure that there is a built-in mechanism for common
> strategies available for every driver (no mandatory dependency on a proprietary
> userspace stack). There may also be extra statistics or controls available to
> the driver that allow finer-grain control.
>
> Manual control allows userspace to get creative and requires the ability to set
> the quantization parameter (QP) directly for each frame (controls are already
> available as many stateful encoders also support it).
>
> # Regions of Interest
>
> Regions of interest (ROIs) allow specifying sub-regions of the frame that should
> be prioritized for quality. Stateless encoders typically support a limited
> number and allow setting specific QP values for these regions.
>
> While the QP value should be used directly in manual rate-control, we probably
> want to have some "level of importance" setting for kernel-side rate-control,
> along with the dimensions/position of each ROI. This could be expressed with
> a new structure containing all these elements and presented as a variable-sized
> array control with as many elements as the hardware can support.
>
> --
> Paul Kocialkowski, Bootlin
> Embedded Linux and kernel engineering
> https://bootlin.com

--
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com

2023-08-10 16:38:43

by Nicolas Dufresne

Subject: Re: Stateless Encoding uAPI Discussion and Proposal

On Thursday 10 August 2023 at 15:44 +0200, Paul Kocialkowski wrote:
> Hi folks,
>
> On Tue 11 Jul 23, 19:12, Paul Kocialkowski wrote:
> > I am now working on a H.264 encoder driver for Allwinner platforms (currently
> > focusing on the V3/V3s), which already provides some usable bitstream and will
> > be published soon.
>
> So I wanted to share an update on my side since I've been making progress on
> the H.264 encoding work for Allwinner platforms. At this point the code supports
> IDR, I and P frames, with a single reference. It also supports GOP (both closed
> and open with IDR or I frame interval and explicit keyframe request) but uses
> QP controls and does not yet provide rate control. I hope to be able to
> implement rate-control before we can make a first public release of the code.

Just a reminder that we will code review the API first; the supporting
implementation will just be a companion. So in this context, the sooner the better
for an RFC here.

>
> One of the main topics of concern now is how reference frames should be managed
> and how it should interact with kernel-side GOP management and rate control.

Maybe we need to have a discussion about kernel-side GOP management first?
While I think kernel-side rate control is unavoidable, I don't think stateless
encoders should have kernel-side GOP management.

>
> Leaving GOP management to the kernel-side implies having it decide which frame
> should be IDR, I or P (and B for encoders that can support it), while keeping
> the possibility to request a keyframe (IDR) and configure GOP size. Now it seems
> to me that this is already a good balance between giving userspace a decent
> level of control while not having to specify the frame type explicitly for each
> frame or maintain a GOP in userspace.

My expectation for a stateless encoder is to have to specify the frame type and
the associated references if the type requires it.
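
As a sketch of what specifying the frame type and its references might involve, the hypothetical structure below carries both, and a helper checks that they are consistent. This is not an existing V4L2 structure. Identifying references by timestamp follows the idea from the original proposal, and limiting P frames to exactly one reference and B frames to exactly two is a simplification made for this example.

```c
#include <stdint.h>

enum enc_frame_type { ENC_FRAME_IDR, ENC_FRAME_I, ENC_FRAME_P, ENC_FRAME_B };

/* Hypothetical per-frame parameters submitted by userspace. */
struct enc_frame_params {
	enum enc_frame_type type;
	unsigned int num_refs;
	uint64_t ref_ts[2];   /* timestamps identifying the reference buffers */
};

/* Returns 0 if the request is self-consistent, -1 otherwise. */
static int enc_frame_params_check(const struct enc_frame_params *p)
{
	switch (p->type) {
	case ENC_FRAME_IDR:
	case ENC_FRAME_I:
		/* Intra-coded frames take no reference. */
		return p->num_refs == 0 ? 0 : -1;
	case ENC_FRAME_P:
		return p->num_refs == 1 ? 0 : -1;
	case ENC_FRAME_B:
		return p->num_refs == 2 ? 0 : -1;
	}
	return -1;
}
```

A driver could reject an inconsistent request, such as a P frame without any reference at the start of the stream, at submission time.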

>
> Requesting the frame type explicitly seems more fragile as many situations will
> be invalid (e.g. requesting a P frame at the beginning of the stream, etc) and
> it generally requires userspace to know a lot about what the codec assumptions
> are. Also for B frames the decision would need to be consistent with the fact
> that a following frame (in display order) would need to be submitted earlier
> than the current frame and inform the kernel so that the picture order count
> (display order indication) can be maintained. This is not impossible or out of
> reach, but it brings a lot of complexity for little advantage.

We have had a lot more consistent results over the last decade with stateless
hardware codecs in contrast to stateful ones, where we end up with wide variation in
behaviour. This applies to Chromium, GStreamer and any active users of VA
encoders really. I'm strongly in favour of a stateless reference API, out of the
Linux kernel.

>
> Leaving the decision to the kernel side with some hints (whether to force a
> keyframe, whether to allow B frames) seems a lot easier, especially for B frames
> since the kernel could just receive frames in-order and decide to hold one
> so that it can use the next frame submitted as a forward reference for this
> upcoming B frame. This requires flushing support but it's already well in place
> for stateful encoders.

No, it's a lot harder for users. The placement of keyframes should be bound to
various image analyses and streaming conditions like scene change detection and
network traffic, but also, I strictly don't want to depend on the Linux kernel
when it's time to implement a custom reference tree. In general, stateful decoders
are never up to the game of modern RTP features and other fancy robust
referencing models. I overall have to disagree with your proposed approach. I
believe we have to create a stateless encoder interface and not completely
abstract this hardware behind our existing stateful interface. We should take
advantage of the nature of the hardware to make simpler and safer drivers.

>
> The next topic of interest is reference management. It seems pretty clear that
> the decision of whether a frame should be a reference or not always needs to be
> taken when encoding that frame. In H.264 the nal_ref_idc slice header element
> indicates whether a frame is marked as reference or not. IDR frames can
> additionally be marked as long-term reference (if I understood correctly, the
> frame will stay in the reference picture list until the next IDR frame).

This is incorrect. Any frame can be marked as a long-term reference, it does not
matter what type it is. From what I recall, marking of long-term references in the
bitstream uses an explicit index, so there is no specific rule on which one
gets evicted. Long-term references are of course limited as they occupy space in
the DPB. Also, each CODEC has different DPB semantics. For H.264, the DPB can run in two
modes. The first is a simple FIFO: in this case, any frame you encode and want
to keep as reference is pushed into the DPB (which has a fixed size minus the
long-term references). If full, the oldest frame is removed. It is not bound to
IDR or GOP. Though, an IDR will implicitly cause the decoder to evict everything
(including long-term references).
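
The FIFO mode can be sketched as below. The structure and names are hypothetical, the capacity is assumed not to exceed DPB_MAX, and long-term references are only counted, not tracked individually.

```c
#include <stddef.h>
#include <string.h>

#define DPB_MAX 16

/* Hypothetical model of an H.264 DPB running in sliding window mode. */
struct dpb {
	int short_term[DPB_MAX];   /* frame ids, oldest first */
	size_t num_short;
	size_t num_long;           /* slots taken by long-term references */
	size_t capacity;           /* total DPB size in frames, <= DPB_MAX */
};

/* Add a short-term reference; returns the evicted frame id or -1. */
static int dpb_push(struct dpb *d, int frame_id, int is_idr)
{
	int evicted = -1;

	if (is_idr) {
		/* An IDR drops everything, long-term references included. */
		d->num_short = 0;
		d->num_long = 0;
	}

	if (d->num_short + d->num_long >= d->capacity && d->num_short > 0) {
		/* Sliding window: the oldest short-term frame goes first. */
		evicted = d->short_term[0];
		memmove(&d->short_term[0], &d->short_term[1],
			(d->num_short - 1) * sizeof(d->short_term[0]));
		d->num_short--;
	}

	d->short_term[d->num_short++] = frame_id;
	return evicted;
}
```

With a capacity of 3 and one long-term reference, only two short-term frames fit, so pushing a third evicts the oldest one regardless of frame type or GOP position.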

The second mode uses the memory management commands. This is a series of
instructions that the encoder can send to the decoder. The specification is quite
complex; it is a common source of bugs in decoders and a place where stateless
hardware codecs perform more consistently in general. Through the commands, the
encoder ensures that the decoder's DPB representation stays in sync.

> Frames that are marked as reference are added to the l0/l1 lists implicitly
> that way and are evicted mostly depending on the number of reference slots
> available, or when a new GOP is started.

Be aware that "slots" is a hardware implementation detail. I think it can be
used for any MPEG CODEC, but be careful since slots in the AV1 specification have a
completely different meaning. Generalization of slots will create confusion.

>
> With the frame type decided by the kernel, it becomes nearly impossible for
> userspace to keep track of the reference lists. Userspace would at least need
> to know when an IDR frame is produced to flush the reference lists. In addition
> it looks like most hardware doesn't have a way to explicitly discard previous
> frames that were marked as reference from being used as reference for next
> frames. All in all this means that we should expect little control over the
> reference frames list.
>
> As a result my updated proposal would be to have userspace only indicate whether
> a submitted frame should be marked as a reference or not instead of submitting
> an explicit list of previous buffers that should be used as reference, which
> would be impossible to honor in many cases.
>
> Additional information gathered:
> - It seems likely that the Allwinner Video Engine only supports one reference
> frame. There's a register for specifying the rec buffer of a second one but
> I have never seen the proprietary blob use it. It might be as easy as
> specifying a non-zero address there but it might also be ignored or require
> some undocumented bit to use more than one reference. I haven't made any
> attempt at using it yet.

There is something in that fact that makes me think of the Hantro H1. The Hantro H1
also has a second reference, but no one ever uses it. We have on our todo list to
actually give this a look.

> - Contrary to what I said after Andrzej's talk at EOSS, most Allwinner platforms
> do not support VP8 encode (despite Allwinner's proprietary blob having an
> API for it). The only platform that advertises it is the A80 and this might
> actually be a VP8-only Hantro H1. It seems that the API they developed in the
> library stuck around even if no other platform can use it.

Thanks for letting us know. Our assumption is that a second hardware design is
unlikely as Google was giving it for free to any hardware makers that wanted it.

>
> Sorry for the long email again, I'm trying to be a bit more explanatory than
> just giving some bare conclusions that I drew on my own.
>
> What do you think about these ideas?

In general, we diverge on the direction we want the interface to take. What you
seem to describe now is just a normal stateful encoder interface with everything
needed to drive the stateless hardware implemented in the Linux kernel. There is
no parsing or other unsafety in encoders, so I don't have a strict no-go
argument against that, but for me, it means much more complex drivers and less
flexibility. The VA model has been working great for us in the past, giving us
the ability to implement new features, or even slightly off-spec features, while
the Linux kernel might not be the right place for these experimental methods.

Personally, I would rather discuss your uAPI RFC though; I think a lot of
other devs here would like to see what you have drafted.

Nicolas

>
> Cheers,
>
> Paul
>
> >
> > This is a very long email where I've tried to split things into distinct topics
> > and explain a few concepts to make sure everyone is on the same page.
> >
> > # Bitstream Headers
> >
> > Stateless encoders typically do not generate all the bitstream headers and
> > sometimes no header at all (e.g. Allwinner encoder does not even produce slice
> > headers). There's often some hardware block that makes bit-level writing to the
> > destination buffer easier (deals with alignment, etc).
> >
> > The values of the bitstream headers must be in line with how the compressed
> > data bitstream is generated and generally follow the codec specification.
> > Some encoders might allow configuring all the fields found in the headers,
> > others may only allow configuring a few or have specific constraints regarding
> > which values are allowed.
> >
> > As a result, we cannot expect that any given encoder is able to produce frames
> > for any set of headers. Reporting related constraints and limitations (beyond
> > profile/level) seems quite difficult and error-prone.
> >
> > So it seems that keeping header generation in-kernel only (close to where the
> > hardware is actually configured) is the safest approach.
> >
> > # Codec Features
> >
> > Codecs have many variable features that can be enabled or not and specific
> > configuration fields that can take various values. There is usually some
> > top-level indication of profile/level that restricts what can be used.
> >
> > This is a very similar situation to stateful encoding, where codec-specific
> > controls are used to report and set profile/level and configure these aspects.
> > A particularly nice thing about it is that we can reuse these existing controls
> > and add new ones in the future for features that are not yet covered.
> >
> > This approach feels more flexible than designing new structures with a selected
> > set of parameters (that could match the existing controls) for each codec.
> >
> > # Reference and Reconstruction Management
> >
> > With stateless encoding, we need to tell the hardware which frames need to be
> > > used as references for encoding the current frame and make sure we have
> > > these references available as decoded frames in memory.
> >
> > Regardless of references, stateless encoders typically need some memory space to
> > write the decoded (known as reconstructed) frame while it's being encoded.
> >
> > One question here is how many slots for decoded pictures should be allocated
> > by the driver when starting to stream. There is usually a maximum number of
> > reference frames that can be used at a time, although perhaps there is a use
> > > case for keeping more around and alternating between them for future references.
> >
> > Another question is how the driver should keep track of which frame will be used
> > as a reference in the future and which one can be evicted from the pool of
> > decoded pictures if it's not going to be used anymore.
> >
> > A restrictive approach would be to let the driver alone manage that, similarly
> > to how stateful encoders behave. However it might provide extra flexibility
> > (and memory gain) to allow userspace to configure the maximum number of possible
> > reference frames. In that case it becomes necessary to indicate if a given
> > frame will be used as a reference in the future (maybe using a buffer flag)
> > and to indicate which previous reference frames (probably to be identified with
> > the matching output buffer's timestamp) should be used for the current encode.
> > This could be done with a new dedicated control (as a variable-sized array of
> > timestamps). Note that userspace would have to update it for every frame or the
> > reference frames will remain the same for future encodes.
> >
> > The driver will then make sure to keep the reconstructed buffer around, in one
> > of the slots. When there's no slot left, the driver will drop the oldest
> > reference it has (maybe with a bounce buffer to still allow it to be used as a
> > reference for the current encode).
> >
> > With this behavior defined in the uAPI spec, userspace will also be able to
> > keep track of which previous frame is no longer allowed as a reference.
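
The slot behavior described above can be modeled in a few lines. This is only a sketch under stated assumptions, not driver code: `ReconstructionPool`, its method names and the choice of an `OrderedDict` are hypothetical, and references are identified by the output buffer timestamp as suggested in the proposal. References are checked before eviction, which stands in for the bounce buffer idea: the oldest frame can still be used for the encode that evicts it.

```python
from collections import OrderedDict


class ReconstructionPool:
    """Hypothetical model of the driver-side pool of reconstructed frames."""

    def __init__(self, num_slots):
        if num_slots < 1:
            raise ValueError("at least one slot is needed")
        self.num_slots = num_slots
        # Maps output buffer timestamp -> slot index, oldest entry first.
        self.slots = OrderedDict()

    def encode(self, timestamp, references=(), keep_as_reference=False):
        """Model one encode and return the evicted timestamp, if any."""
        # Every requested reference must still be held in a slot.
        missing = [ts for ts in references if ts not in self.slots]
        if missing:
            raise KeyError(f"references no longer available: {missing}")
        evicted = None
        if keep_as_reference:
            if len(self.slots) == self.num_slots:
                # No slot left: drop the oldest reference and reuse its slot.
                evicted, slot = self.slots.popitem(last=False)
            else:
                slot = len(self.slots)
            self.slots[timestamp] = slot
        return evicted

    def available(self):
        """Timestamps that userspace may still use as references."""
        return list(self.slots)
```

Because the eviction rule is deterministic, userspace could run the same model to know which earlier frames are still allowed as references, without any feedback from the driver.
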
> >
> > # Frame Types
> >
> > Stateless encoder drivers will typically instruct the hardware to encode either
> > an intra-coded or an inter-coded frame. While a stream composed only of a single
> > intra-coded frame followed by only inter-coded frames is possible, it's
> > generally not desirable as it is not very robust against data loss and makes
> > seeking difficult.
> >
> > As a result, the frame type is usually decided based on a given GOP size
> > (the frequency at which a new intra-coded frame is produced) while intra-coded
> > > frames can be explicitly requested on demand. Stateful encoders implement
> > these through dedicated controls:
> > - V4L2_CID_MPEG_VIDEO_FORCE_KEY_FRAME
> > - V4L2_CID_MPEG_VIDEO_GOP_SIZE
> > - V4L2_CID_MPEG_VIDEO_H264_I_PERIOD
> >
> > It seems that reusing them would be possible, which would let the driver decide
> > > the particular frame type.
> >
> > However it makes the reference frame management a bit trickier since reference
> > frames might be requested from userspace for a frame that ends up being
> > intra-coded. We can either allow this and silently ignore the info or expect
> > > that userspace keeps track of the GOP index and does not send references on the first
> > frame.
> >
> > In some codecs, there's also a notion of barrier key-frames (IDR frames in
> > H.264) that strictly forbid using any past reference beyond the frame.
> > There seems to be an assumption that the GOP start uses this kind of frame
> > (and not any intra-coded frame), while the force key frame control does not
> > particularly specify it.
> >
> > In that case we should flush the list of references and userspace should no
> > longer provide references to them for future frames. This puts a requirement on
> > userspace to keep track of GOP start in order to know when to flush its
> > reference list. It could also check if V4L2_BUF_FLAG_KEYFRAME is set, but this
> > could also indicate a general intra-coded frame that is not a barrier.
> >
> > So another possibility would be for userspace to explicitly indicate which
> > frame type to use (in a codec-specific way) and act accordingly, leaving any
> > notion of GOP up to userspace. I feel like this might be the easiest approach
> > while giving an extra degree of control to userspace.
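
To illustrate the last option, where userspace owns the notion of GOP, here is a minimal sketch. Everything in it is an assumption made for illustration: the `GopTracker` class, the two-value `FrameType` enum, and the default of keeping a single reference (mirroring the single-reference hardware discussed in this thread). It decides the frame type from the GOP index or a forced key frame, and flushes the reference list on every barrier frame.

```python
from enum import Enum


class FrameType(Enum):
    IDR = "idr"      # barrier key-frame: no reference may cross it
    INTER = "inter"  # inter-coded frame using earlier references


class GopTracker:
    """Hypothetical userspace helper deciding frame types and references."""

    def __init__(self, gop_size, max_refs=1):
        if gop_size < 1 or max_refs < 1:
            raise ValueError("gop_size and max_refs must be at least 1")
        self.gop_size = gop_size
        self.max_refs = max_refs
        self.index = 0         # position of the next frame within the GOP
        self.references = []   # timestamps that may still be referenced

    def next_frame(self, timestamp, force_key_frame=False):
        """Return (frame type, reference timestamps) for the next frame."""
        if force_key_frame or self.index == 0:
            frame_type = FrameType.IDR
            # Barrier frame: flush everything and restart the GOP.
            self.references.clear()
            self.index = 0
        else:
            frame_type = FrameType.INTER
        references = list(self.references)
        # Assumption: every frame is marked as a reference for later ones,
        # and only the most recent max_refs frames are kept.
        self.references.append(timestamp)
        self.references = self.references[-self.max_refs:]
        self.index = (self.index + 1) % self.gop_size
        return frame_type, references
```

With this split, the ambiguity around V4L2_BUF_FLAG_KEYFRAME does not arise, since userspace already knows which frames it requested as barriers.
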
> >
> > # Rate Control
> >
> > Another important feature of encoders is the ability to control the amount of
> > data produced following different rate control strategies. Stateful encoders
> > typically do this in-firmware and expose controls for selecting the strategy
> > and associated targets.
> >
> > > It seems desirable to expose both automatic and manual rate-control to
> > userspace.
> >
> > Automatic control would be implemented kernel-side (with algos possibly shared
> > > across drivers) and reuse existing stateful controls. The advantages are
> > > simplicity (userspace does not need to carry its own rate-control
> > > implementation) and the guarantee that there is a built-in mechanism for common
> > > strategies available for every driver (no mandatory dependency on a proprietary
> > userspace stack). There may also be extra statistics or controls available to
> > the driver that allow finer-grain control.
> >
> > Manual control allows userspace to get creative and requires the ability to set
> > > the quantization parameter (QP) directly for each frame (controls already
> > > exist, as many stateful encoders also support it).
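
As a rough idea of what manual control could look like on the userspace side, here is a deliberately naive sketch. It is not a real rate-control algorithm and not something proposed in this thread: the `NaiveRateControl` class, its parameters and the one-step adjustment rule are all assumptions. It only shows the feedback loop: compare the size of the last encoded frame with a per-frame bit budget, then pick the QP for the next frame, clamped to the 0 to 51 range used by 8-bit H.264.

```python
H264_QP_MIN = 0
H264_QP_MAX = 51


class NaiveRateControl:
    """Hypothetical per-frame QP selection driven by encoded frame sizes."""

    def __init__(self, bitrate, framerate, initial_qp=30, step=1):
        # Average bit budget for one frame at the target bitrate.
        self.target_bits = bitrate / framerate
        self.qp = initial_qp
        self.step = step

    def update(self, encoded_bytes):
        """Take the size of the last encoded frame, return the next QP."""
        bits = encoded_bytes * 8
        if bits > self.target_bits:
            # Over budget: quantize more coarsely.
            self.qp += self.step
        elif bits < self.target_bits:
            # Under budget: spend more bits on quality.
            self.qp -= self.step
        self.qp = max(H264_QP_MIN, min(H264_QP_MAX, self.qp))
        return self.qp
```

The value returned by `update()` is what userspace would then set through the existing per-frame QP controls before queuing the next frame.
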
> >
> > # Regions of Interest
> >
> > Regions of interest (ROIs) allow specifying sub-regions of the frame that should
> > be prioritized for quality. Stateless encoders typically support a limited
> > number and allow setting specific QP values for these regions.
> >
> > While the QP value should be used directly in manual rate-control, we probably
> > want to have some "level of importance" setting for kernel-side rate-control,
> > along with the dimensions/position of each ROI. This could be expressed with
> > a new structure containing all these elements and presented as a variable-sized
> > array control with as many elements as the hardware can support.
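
The shape of such a structure can be sketched as follows. This is an illustration under assumptions, not a uAPI draft: the `RegionOfInterest` field names, the `validate_rois` helper and the idea of carrying either a QP or an importance level depending on the rate-control mode are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RegionOfInterest:
    """Hypothetical ROI element; field names are illustrative only."""
    left: int
    top: int
    width: int
    height: int
    qp: Optional[int] = None          # used with manual rate-control
    importance: Optional[int] = None  # used with kernel-side rate-control


def validate_rois(rois: Sequence[RegionOfInterest], frame_width: int,
                  frame_height: int, max_rois: int, manual: bool) -> None:
    """Raise ValueError if the ROI array cannot be honored."""
    if len(rois) > max_rois:
        raise ValueError(f"hardware supports at most {max_rois} ROIs")
    for roi in rois:
        if roi.width <= 0 or roi.height <= 0:
            raise ValueError("ROI must have a non-zero size")
        if (roi.left < 0 or roi.top < 0
                or roi.left + roi.width > frame_width
                or roi.top + roi.height > frame_height):
            raise ValueError("ROI must lie within the frame")
        if manual and roi.qp is None:
            raise ValueError("manual rate-control needs an explicit QP")
        if not manual and roi.importance is None:
            raise ValueError("automatic rate-control needs an importance level")
```

The checks mirror what a driver would have to enforce anyway: the number of regions the hardware supports, and regions that stay inside the coded frame.
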
> >
> > --
> > Paul Kocialkowski, Bootlin
> > Embedded Linux and kernel engineering
> > https://bootlin.com
>
>
>

2023-08-11 21:21:41

Hi folks,

Just a quick message on this thread to let you know that we have just published
the code for the H.264 encoding extension to cedrus for the V3/V3s/S3.

You can find more details in the dedicated blog post:
- https://bootlin.com/blog/open-source-linux-kernel-support-for-the-allwinner-v3-v3s-s3-h-264-video-encoder/

And the code is at:
- https://github.com/bootlin/linux/tree/cedrus/h264-encoding
- https://github.com/bootlin/v4l2-cedrus-enc-test

As announced this doesn't really help advance our uAPI discussion here since
there is no rate-control yet and the stateful controls are reused for
controlling the encoding features (including things like GOP).

Cheers,

Paul

--
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com
