Hi,

On Tuesday, 9 October 2018 at 16:30 +0900, Tomasz Figa wrote:
> On Thu, Oct 4, 2018 at 9:46 PM Paul Kocialkowski <[email protected]> wrote:
> >
> > Hi,
> >
> > Here are a few minor suggestions about H.264 controls.
> >
> > On Thursday, 4 October 2018 at 17:11 +0900, Alexandre Courbot wrote:
> > > diff --git a/Documentation/media/uapi/v4l/extended-controls.rst b/Documentation/media/uapi/v4l/extended-controls.rst
> > > index a9252225b63e..9d06d853d4ff 100644
> > > --- a/Documentation/media/uapi/v4l/extended-controls.rst
> > > +++ b/Documentation/media/uapi/v4l/extended-controls.rst
> > > @@ -810,6 +810,31 @@ enum v4l2_mpeg_video_bitrate_mode -
> > > otherwise the decoder expects a single frame in per buffer.
> > > Applicable to the decoder, all codecs.
> > >
> > > +.. _v4l2-mpeg-h264:
> > > +
> > > +``V4L2_CID_MPEG_VIDEO_H264_SPS``
> > > + Instance of struct v4l2_ctrl_h264_sps, containing the SPS to use with
> > > + the next queued frame. Applicable to the H.264 stateless decoder.
> > > +
> > > +``V4L2_CID_MPEG_VIDEO_H264_PPS``
> > > + Instance of struct v4l2_ctrl_h264_pps, containing the PPS to use with
> > > + the next queued frame. Applicable to the H.264 stateless decoder.
> > > +
> > > +``V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX``
> >
> > For consistency with MPEG-2 and upcoming JPEG, I think we should call
> > this "H264_QUANTIZATION".
>
> I'd rather stay consistent with H.264 specification, which uses the
> wording as defined in Alex's patch. Otherwise it would be difficult to
> correlate between the controls and the specification, which is
> something that the userspace developer would definitely need to
> understand how to properly parse the stream and obtain the decoding
> parameters.

Okay, I agree this makes more sense than trying to keep the names
consistent across codecs.

> >
> > > + Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> > > + matrix to use when decoding the next queued frame. Applicable to the H.264
> > > + stateless decoder.
> > > +
> > > +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM``
> >
> > Ditto with "H264_SLICE_PARAMS".
> >
>
> It doesn't seem to be related to the spec in this case and "params"
> sounds better indeed.

Cheers,

Paul


2018-10-12 11:54:08

by Paul Kocialkowski

Subject: Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface

Hi,

On Tuesday, 9 October 2018 at 16:36 +0900, Tomasz Figa wrote:
> On Sat, Oct 6, 2018 at 2:09 AM Paul Kocialkowski <[email protected]> wrote:
> > Hi,
> >
> > On Thursday, 4 October 2018 at 14:10 -0400, Nicolas Dufresne wrote:
> > > On Thursday, 4 October 2018 at 14:47 +0200, Paul Kocialkowski wrote:
> > > > > + Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> > > > > + matrix to use when decoding the next queued frame. Applicable to the H.264
> > > > > + stateless decoder.
> > > > > +
> > > > > +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM``
> > > >
> > > > Ditto with "H264_SLICE_PARAMS".
> > > >
> > > > > + Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
> > > > > + entries as there are slices in the corresponding ``OUTPUT`` buffer.
> > > > > + Applicable to the H.264 stateless decoder.
> > > > > +
> > > > > +``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM``
> > > > > + Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
> > > > > + decoding parameters for a H.264 frame. Applicable to the H.264 stateless
> > > > > + decoder.
> > > >
> > > > Since we require all the macroblocks to decode one frame to be held in
> > > > the same OUTPUT buffer, it probably doesn't make sense to keep
> > > > DECODE_PARAM and SLICE_PARAM distinct.
> > > >
> > > > I would suggest merging both in "SLICE_PARAMS", similarly to what I
> > > > have proposed for H.265: https://patchwork.kernel.org/patch/10578023/
> > > >
> > > > What do you think?
> > >
> > > I don't understand why we add this arbitrary restriction of "all the
> > > macroblocks to decode one frame". The bitstream may contain multiple
> > > NALs per frame (e.g. slices), and stateless API shall pass each NAL
> > > separately imho. The driver can then decide to combine them if needed,
> > > or to keep them separate. I would expect most decoders to decode each
> > > slice independently from each other, even though they write into the
> > > same frame.
> >
> > Well, we sort of always assumed that there is a 1:1 correspondence
> > between request and output frame when implementing the software for
> > cedrus, which simplified both userspace and the driver. The approach we
> > have taken is to use one of the slice parameters for the whole series
> > of slices and just append the slice data.
> >
> > Now that you bring it up, I realize this is an unfortunate decision.
> > This may have been the cause of bugs and limitations with our driver
> > because the slice parameters may very well be distinct for each slice.
>
> I might be misunderstanding something, but, at least for the H.264
> API, there is no relation between the number of buffers/requests and
> number of slice parameters. The V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM
> is an array, with each element describing each slice in the OUTPUT
> buffer. So actually, it could be up to the userspace whether it wants to
> have 1 OUTPUT buffer per slice or all slices in 1 OUTPUT buffer - the
> former would have v4l2_ctrl_h264_decode_param::num_slices = 1 and only
> one valid element in V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS.

It seems that we have totally missed that when implementing H.264
support for Cedrus and did not do anything similar for other codecs.

I think grouping slices in the same OUTPUT buffer and providing offsets
for each slice from each element of an array of SLICE_PARAMS controls
makes a lot of sense.

This way we don't have to have one OUTPUT buffer per slice, which
simplifies things a lot. Also, not having one request per slice will
probably speed things up.
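
A minimal sketch of that layout, assuming a hypothetical per-slice entry that carries only an offset and a size (this is not the real struct v4l2_ctrl_h264_slice_param, and the helper name is made up for illustration). It shows how each array element can locate its slice inside a single OUTPUT buffer, independently of the order in which the slice data was appended:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-slice entry, NOT the real v4l2_ctrl_h264_slice_param. */
struct slice_entry {
	uint32_t offset; /* byte offset of the slice data in the OUTPUT buffer */
	uint32_t size;   /* size of the slice data in bytes */
};

/*
 * Check that every slice described by the array lies inside the buffer.
 * Slices may appear in any order, since each entry carries its own offset.
 */
static bool slices_fit_buffer(const struct slice_entry *slices,
			      size_t num_slices, size_t buf_size)
{
	for (size_t i = 0; i < num_slices; i++) {
		/* An empty slice is treated as invalid in this sketch. */
		if (slices[i].size == 0)
			return false;
		/* Written this way to avoid overflowing offset + size. */
		if (slices[i].offset > buf_size ||
		    slices[i].size > buf_size - slices[i].offset)
			return false;
	}
	return true;
}
```

With one slice per OUTPUT buffer the array simply has a single valid element; with all slices in one buffer it has one element per slice.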

However, I'm a bit confused when looking at the chromeos-4.4 code. It
seems that RK3288 only takes the parameters from the first slice (it's
not using DECODE_PARAM's num_slices). Also, RK3399 doesn't seem to use
the slice params at all. I'm really curious to understand how it works
for Rockchip. Perhaps someone has some insight about this?

Also, I'm not sure we have converged on a solution for the Rockchip VPU
requiring the start code before the coded data. Given that each slice
should be handled separately, does it mean the start code has to be
repeated for each?

> > Moreover, I suppose that just appending the slices data implies that
> > they are coded in the same order as the picture, which is probably
> > often the case but certainly not anything guaranteed.
>
> Again, at least in the H.264 API being proposed here, the order of
> slices is not specified by the order of slice data in the buffer. Each
> entry of the V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS array points to the
> specific offset within the buffer.

Sounds good to me.

Cheers,

Paul

--
Developer of free digital technology and hardware support.

Website: https://www.paulk.fr/
Coding blog: https://code.paulk.fr/
Git repositories: https://git.paulk.fr/ https://git.code.paulk.fr/


2018-10-12 12:28:53

by Paul Kocialkowski

Subject: Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface

Hi,

On Tuesday, 9 October 2018 at 14:58 +0900, Tomasz Figa wrote:
> Hi Paul,
>
> On Thu, Oct 4, 2018 at 9:40 PM Paul Kocialkowski <[email protected]> wrote:
> > Hi Alexandre,
> >
> > Thanks for submitting this second version of the RFC, it is very
> > appreciated! I will try to provide useful feedback here and hopefully
> > be more reactive than during v1 review!
> >
> > Most of it looks good to me, but there is a specific point I'd like to
> > keep discussing.
> >
> > On Thursday, 4 October 2018 at 17:11 +0900, Alexandre Courbot wrote:
> > > This patch documents the protocol that user-space should follow when
> > > communicating with stateless video decoders. It is based on the
> > > following references:
> > >
> > > * The current protocol used by Chromium (converted from config store to
> > > request API)
> > >
> > > * The submitted Cedrus VPU driver
> > >
> > > As such, some things may not be entirely consistent with the current
> > > state of drivers, so it would be great if all stakeholders could point
> > > out these inconsistencies. :)
> > >
> > > This patch is supposed to be applied on top of the Request API V18 as
> > > well as the memory-to-memory video decoder interface series by Tomasz
> > > Figa.
> > >
> > > Changes since V1:
> > >
> > > * Applied fixes received as feedback,
> > > * Moved controls descriptions to the extended controls file,
> > > * Document reference frame management and referencing (need Hans' feedback on
> > > that).
> > >
> > > Signed-off-by: Alexandre Courbot <[email protected]>
> > > ---
> > > .../media/uapi/v4l/dev-stateless-decoder.rst | 348 ++++++++++++++++++
> > > Documentation/media/uapi/v4l/devices.rst | 1 +
> > > .../media/uapi/v4l/extended-controls.rst | 25 ++
> > > .../media/uapi/v4l/pixfmt-compressed.rst | 54 ++-
> > > 4 files changed, 424 insertions(+), 4 deletions(-)
> > > create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > >
> > > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > new file mode 100644
> > > index 000000000000..e54246df18d0
> > > --- /dev/null
> > > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > @@ -0,0 +1,348 @@
> > > +.. -*- coding: utf-8; mode: rst -*-
> > > +
> > > +.. _stateless_decoder:
> > > +
> > > +**************************************************
> > > +Memory-to-memory Stateless Video Decoder Interface
> > > +**************************************************
> > > +
> > > +A stateless decoder is a decoder that works without retaining any kind of state
> > > +between processing frames. This means that each frame is decoded independently
> > > +of any previous and future frames, and that the client is responsible for
> > > +maintaining the decoding state and providing it to the driver. This is in
> > > +contrast to the stateful video decoder interface, where the hardware maintains
> > > +the decoding state and all the client has to do is to provide the raw encoded
> > > +stream.
> > > +
> > > +This section describes how user-space ("the client") is expected to communicate
> > > +with such decoders in order to successfully decode an encoded stream. Compared
> > > +to stateful codecs, the driver/client sequence is simpler, but the cost of this
> > > +simplicity is extra complexity in the client which must maintain a consistent
> > > +decoding state.
> > > +
> > > +Querying capabilities
> > > +=====================
> > > +
> > > +1. To enumerate the set of coded formats supported by the driver, the client
> > > + calls :c:func:`VIDIOC_ENUM_FMT` on the ``OUTPUT`` queue.
> > > +
> > > + * The driver must always return the full set of supported ``OUTPUT`` formats,
> > > + irrespective of the format currently set on the ``CAPTURE`` queue.
> > > +
> > > +2. To enumerate the set of supported raw formats, the client calls
> > > + :c:func:`VIDIOC_ENUM_FMT` on the ``CAPTURE`` queue.
> > > +
> > > + * The driver must return only the formats supported for the format currently
> > > + active on the ``OUTPUT`` queue.
> > > +
> > > + * Depending on the currently set ``OUTPUT`` format, the set of supported raw
> > > + formats may depend on the value of some controls (e.g. H264 or VP9
> > > + profile). The client is responsible for making sure that these controls
> > > + are set to the desired value before querying the ``CAPTURE`` queue.
> >
> > I still think we have a problem when enumerating CAPTURE formats, that
> > providing the profile/level information does not help solve.
> >
> > From previous emails on v1 (to which I failed to react), it seems
> > that the consensus was to set the profile/level indication beforehand
> > to reduce the subset of possible formats and return that as enumerated
> > possible formats.
>
> I think the consensus was to set all the parsed header controls
> and actually Alex seems to have mentioned it slightly further in his
> patch:
>
> + * In order to enumerate raw formats supported by a given coded format, the
> + client must thus set that coded format on the ``OUTPUT`` queue first, then
> + set any control listed on the format's description, and finally enumerate
> + the ``CAPTURE`` queue.
>
> > However, it does not really solve the issue here, given the following
> > distinct cases:
> >
> > 1. The VPU can only output the format for the decoded frame and that
> > format is not known until the first buffer metadata is passed.
>
> That's why I later suggested metadata (parsed header controls) and not
> just some selective controls, such as profiles.

Oh sorry, it seems that I misunderstood this part. I totally agree with
the approach of setting the required codec controls then.

I will look into how that can be managed with VAAPI. I recall that the
format negotiation part happens very early (no metadata passed yet) and
I don't think there's a way to change the format afterwards. But if
there's a problem there, it's definitely on VAAPI's side and that
should not contaminate our API.

> > Everything that is reported as supported at this point should be
> > understood as supported formats for the decoded bitstreams, but
> > userspace would have to pick the one matching the decoded format of the
> > bitstream to decode. I don't really see the point of trying to reduce
> > that list by providing the profile/level.
> >
> > 2. The VPU has some format conversion block in its pipeline and can
> > actually provide a range of different formats for CAPTURE buffers,
> > independently from the format of the decoded bitstream.
> >
> > Either way, I think (correct me if I'm wrong) that players do know the
> > format from the decoded bitstream here, so enumeration only makes sense
> > for case 2.
>
> Players don't know the format for the decoded bitstream, as I already
> explained before. From stream metadata they would only know whether
> the stream is YUV 4:2:0 vs 4:2:2, but wouldn't know the exact hardware
> constraints, e.g. whether NV12 or YUV420 is supported for given YUV
> 4:2:0 stream.

That's a good point, thanks!

> > Something we could do is to not enumerate any format for case 1., which
> > we would specify as an indication that only the decoded bitstream
> > format must be set. Then in case 2., we would enumerate the possible
> > formats.
> >
> > For case 1., having the driver expose the supported profiles ensures
> > that any format in a supported profile is valid although not
> > enumerated.
>
> Profile doesn't fully determine a specific pixel format, only the
> abstract format (see above).

From what I've understood, profile indicates which YUV sub-sampling
configuration is allowed, but there may be multiple ones. For instance,
the High 4:4:4 profile allows both 4:2:2 and 4:4:4 sub-sampling.

So I maintain that it doesn't make much sense to set the profile while
decoding. Instead, I think it should be a read-only control that
indicates what the hardware can do. Reading this control would allow
userspace to find out whether the current video can be decoded in
hardware or not (so this could be added to the Querying capabilities
part of the doc).

> > Alternatively, we could go with a control that indicates whether the
> > driver supports a format decorrelated from the decoded bitstream format
> > and still enumerate all formats in case 1., with the implication that
> > only the right one must be picked by userspace. Here again, I don't see
> > the point of reducing the list by setting the profile/level.
> >
> > So my goal here is to clearly enable userspace to distinguish between
> > the two situations.
> >
> > What do you think?
>
> Why would we need to create a control, if we already have the ENUM_FMT
> API existing exactly to achieve this? We already have this problem
> solved for stateful decoders and if we request the userspace to
> actually set all the necessary metadata beforehand, the resulting
> behavior (initialization sequence) would be much more consistent
> between these 2 APIs.

Let's drop this idea altogether; setting the required controls feels like
a much more generic and cleaner approach.

Cheers,

Paul

