LinuxLists.cc - [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

[permalink] [raw]

Subject: Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

Hi Alex,

+Maxime Ripard +Ezequiel Garcia +Nicolas Dufresne

On Fri, Aug 31, 2018 at 4:48 PM Alexandre Courbot <[email protected]> wrote:
>
> This patch documents the protocol that user-space should follow when
> communicating with stateless video decoders. It is based on the
> following references:
>
> * The current protocol used by Chromium (converted from config store to
> request API)
>
> * The submitted Cedrus VPU driver
>
> As such, some things may not be entirely consistent with the current
> state of drivers, so it would be great if all stakeholders could point
> out these inconsistencies. :)
>
> This patch is supposed to be applied on top of the Request API V18 as
> well as the memory-to-memory video decoder interface series by Tomasz
> Figa.
>
> It should be considered an early RFC.

Thanks a lot for working on this. I'll review this patch a bit later,
but let me post links to patches on which we had some earlier
discussions, to keep things together:

https://patchwork.kernel.org/patch/10462311/
https://patchwork.kernel.org/patch/10577969/

Also CCd folks who discussed there.

Best regards,
Tomasz

2018-09-10 07:18:51

[permalink] [raw]

Subject: Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

Hi Alex,

+Maxime Ripard +Ezequiel Garcia +Nicolas Dufresne

[Not snipping intentionally.]

On Fri, Aug 31, 2018 at 4:48 PM Alexandre Courbot <[email protected]> wrote:
>
> This patch documents the protocol that user-space should follow when
> communicating with stateless video decoders. It is based on the
> following references:
>
> * The current protocol used by Chromium (converted from config store to
> request API)
>
> * The submitted Cedrus VPU driver
>
> As such, some things may not be entirely consistent with the current
> state of drivers, so it would be great if all stakeholders could point
> out these inconsistencies. :)
>
> This patch is supposed to be applied on top of the Request API V18 as
> well as the memory-to-memory video decoder interface series by Tomasz
> Figa.
>
> It should be considered an early RFC.

Thanks a lot. I think this gives us a much better start already than
what we had with stateful API without any documentation. :)

Please see my comments inline.

>
> Signed-off-by: Alexandre Courbot <[email protected]>
> ---
> .../media/uapi/v4l/dev-stateless-decoder.rst | 413 ++++++++++++++++++
> Documentation/media/uapi/v4l/devices.rst | 1 +
> .../media/uapi/v4l/extended-controls.rst | 23 +
> 3 files changed, 437 insertions(+)
> create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
>
> diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> new file mode 100644
> index 000000000000..bf7b13a8ee16
> --- /dev/null
> +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> @@ -0,0 +1,413 @@
> +.. -*- coding: utf-8; mode: rst -*-
> +
> +.. _stateless_decoder:
> +
> +**************************************************
> +Memory-to-memory Stateless Video Decoder Interface
> +**************************************************
> +
> +A stateless decoder is a decoder that works without retaining any kind of state
> +between processing frames. This means that each frame is decoded independently
> +of any previous and future frames, and that the client is responsible for
> +maintaining the decoding state and providing it to the driver. This is in
> +contrast to the stateful video decoder interface, where the hardware maintains
> +the decoding state and all the client has to do is to provide the raw encoded
> +stream.
> +
> +This section describes how user-space ("the client") is expected to communicate
> +with such decoders in order to successfully decode an encoded stream. Compared
> +to stateful codecs, the driver/client protocol is simpler, but cost of this
> +simplicity is extra complexity in the client which must maintain the decoding
> +state.
> +
> +Querying capabilities
> +=====================
> +
> +1. To enumerate the set of coded formats supported by the driver, the client
> + calls :c:func:`VIDIOC_ENUM_FMT` on the ``OUTPUT`` queue.
> +
> + * The driver must always return the full set of supported formats for the
> + currently set ``OUTPUT`` format, irrespective of the format currently set
> + on the ``CAPTURE`` queue.
> +
> +2. To enumerate the set of supported raw formats, the client calls
> + :c:func:`VIDIOC_ENUM_FMT` on the ``CAPTURE`` queue.
> +
> + * The driver must return only the formats supported for the format currently
> + active on the ``OUTPUT`` queue.
> +
> + * In order to enumerate raw formats supported by a given coded format, the
> + client must thus set that coded format on the ``OUTPUT`` queue first, and
> + then enumerate the ``CAPTURE`` queue.

One thing that we might want to note here is that available CAPTURE
formats may depend on more factors than just current OUTPUT format.
Depending on encoding options of the stream being decoded (e.g. VP9
profile), the list of supported format might be limited to one of YUV
4:2:0, 4:2:2 or 4:4:4 family of formats, but might be any of them, if
the hardware supports conversion.

I was wondering whether we shouldn't require the client to set all the
necessary initial codec-specific controls before querying CAPTURE
formats, but since we don't have any compatibility constraints here,
as opposed to the stateful API, perhaps we could just make CAPTURE
queue completely independent and have a source change event raised if
the controls set later make existing format invalid.

I'd like to ask the userspace folks here (Nicolas?), whether:
1) we need to know the exact list of formats that are guaranteed to be
supported for playing back the whole video, or
2) we're okay with some rough approximation here, or
3) maybe we don't even need to enumerate formats on CAPTURE beforehand?

To be honest, I'm not sure 1) is even possible, since the resolution
(or some other stream parameters) could change mid-stream.

> +
> +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> + resolutions for a given format, passing desired pixel format in
> + :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> +
> + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``OUTPUT`` queue
> + must include all possible coded resolutions supported by the decoder
> + for given coded pixel format.
> +
> + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``CAPTURE`` queue
> + must include all possible frame buffer resolutions supported by the
> + decoder for given raw pixel format and coded format currently set on
> + ``OUTPUT`` queue.
> +
> + .. note::
> +
> + The client may derive the supported resolution range for a
> + combination of coded and raw format by setting width and height of
> + ``OUTPUT`` format to 0 and calculating the intersection of
> + resolutions returned from calls to :c:func:`VIDIOC_ENUM_FRAMESIZES`
> + for the given coded and raw formats.
> +
> +4. Supported profiles and levels for given format, if applicable, may be
> + queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> +
> +Initialization
> +==============
> +
> +1. *[optional]* Enumerate supported ``OUTPUT`` formats and resolutions. See
> + capability enumeration.
> +
> +2. Set the coded format on the ``OUTPUT`` queue via :c:func:`VIDIOC_S_FMT`
> +
> + * **Required fields:**
> +
> + ``type``
> + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> + ``pixelformat``
> + a coded pixel format
> +
> + ``width``, ``height``
> + parsed width and height of the coded format

Perhaps "coded width and height parsed from the stream" could be a bit
more clear?

> +
> + other fields
> + follow standard semantics
> +
> + .. note::
> +
> + Changing ``OUTPUT`` format may change currently set ``CAPTURE``
> + format. The driver will derive a new ``CAPTURE`` format from
> + ``OUTPUT`` format being set, including resolution, colorimetry
> + parameters, etc. If the client needs a specific ``CAPTURE`` format,
> + it must adjust it afterwards.
> +
> +3. *[optional]* Get minimum number of buffers required for ``OUTPUT``
> + queue via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to
> + use more buffers than minimum required by hardware/format.
> +
> + * **Required fields:**
> +
> + ``id``
> + set to ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> +
> + * **Return fields:**
> +
> + ``value``
> + required number of ``OUTPUT`` buffers for the currently set
> + format

I'm not very sure if this is useful for stateless API, but no strong opinion.

> +
> +4. Call :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` queue to get format for the
> + destination buffers parsed/decoded from the bitstream.
> +
> + * **Required fields:**
> +
> + ``type``
> + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> + * **Return fields:**
> +
> + ``width``, ``height``
> + frame buffer resolution for the decoded frames
> +
> + ``pixelformat``
> + pixel format for decoded frames
> +
> + ``num_planes`` (for _MPLANE ``type`` only)
> + number of planes for pixelformat
> +
> + ``sizeimage``, ``bytesperline``
> + as per standard semantics; matching frame buffer format
> +
> + .. note::
> +
> + The value of ``pixelformat`` may be any pixel format supported for the
> + ``OUTPUT`` format, based on the hardware capabilities. It is suggested
> + that driver chooses the preferred/optimal format for given configuration.
> + For example, a YUV format may be preferred over an RGB format, if
> + additional conversion step would be required.
> +
> +5. *[optional]* Enumerate ``CAPTURE`` formats via :c:func:`VIDIOC_ENUM_FMT` on
> + ``CAPTURE`` queue. The client may use this ioctl to discover which
> + alternative raw formats are supported for the current ``OUTPUT`` format and
> + select one of them via :c:func:`VIDIOC_S_FMT`.
> +
> + .. note::
> +
> + The driver will return only formats supported for the currently selected
> + ``OUTPUT`` format, even if more formats may be supported by the driver in
> + general.
> +
> + For example, a driver/hardware may support YUV and RGB formats for
> + resolutions 1920x1088 and lower, but only YUV for higher resolutions (due
> + to hardware limitations). After setting a resolution of 1920x1088 or lower
> + as the ``OUTPUT`` format, :c:func:`VIDIOC_ENUM_FMT` may return a set of
> + YUV and RGB pixel formats, but after setting a resolution higher than
> + 1920x1088, the driver will not return RGB, unsupported for this
> + resolution.
> +
> +6. *[optional]* Choose a different ``CAPTURE`` format than suggested via
> + :c:func:`VIDIOC_S_FMT` on ``CAPTURE`` queue. It is possible for the client to
> + choose a different format than selected/suggested by the driver in
> + :c:func:`VIDIOC_G_FMT`.
> +
> + * **Required fields:**
> +
> + ``type``
> + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> + ``pixelformat``
> + a raw pixel format
> +
> + .. note::
> +
> + Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available
> + formats after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> + out a set of allowed formats for given configuration, but not required,
> + if the client can accept the defaults.

V4L2_EVENT_SOURCE_CHANGE was not mentioned in earlier steps. I suppose
it's a leftover from the stateful API? :)

Still, I think we may eventually need source change events, because of
the reasons I mentioned above.

> +
> +7. *[optional]* Get minimum number of buffers required for ``CAPTURE`` queue via
> + :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use more buffers
> + than minimum required by hardware/format.
> +
> + * **Required fields:**
> +
> + ``id``
> + set to ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> +
> + * **Return fields:**
> +
> + ``value``
> + minimum number of buffers required to decode the stream parsed in this
> + initialization sequence.
> +
> + .. note::
> +
> + Note that the minimum number of buffers must be at least the number
> + required to successfully decode the current stream. This may for example
> + be the required DPB size for an H.264 stream given the parsed stream
> + configuration (resolution, level).

I'm not sure if this really makes sense for stateless API, because DPB
management is done by the client, so it already has all the data to
know how many buffers would be needed/optimal.

> +
> +8. Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on
> + ``OUTPUT`` queue.
> +
> + * **Required fields:**
> +
> + ``count``
> + requested number of buffers to allocate; greater than zero
> +
> + ``type``
> + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> + ``memory``
> + follows standard semantics
> +
> + ``sizeimage``
> + follows standard semantics; the client is free to choose any
> + suitable size, however, it may be subject to change by the
> + driver
> +
> + * **Return fields:**
> +
> + ``count``
> + actual number of buffers allocated
> +
> + * The driver must adjust count to minimum of required number of ``OUTPUT``
> + buffers for given format and count passed. The client must check this
> + value after the ioctl returns to get the number of buffers allocated.
> +
> + .. note::
> +
> + To allocate more than minimum number of buffers (for pipeline depth), use
> + G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to get minimum number of
> + buffers required by the driver/format, and pass the obtained value plus
> + the number of additional buffers needed in count to
> + :c:func:`VIDIOC_REQBUFS`.
> +
> +9. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> + ``CAPTURE`` queue.
> +
> + * **Required fields:**
> +
> + ``count``
> + requested number of buffers to allocate; greater than zero
> +
> + ``type``
> + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> + ``memory``
> + follows standard semantics
> +
> + * **Return fields:**
> +
> + ``count``
> + adjusted to allocated number of buffers
> +
> + * The driver must adjust count to minimum of required number of
> + destination buffers for given format and stream configuration and the
> + count passed. The client must check this value after the ioctl
> + returns to get the number of buffers allocated.
> +
> + .. note::
> +
> + To allocate more than minimum number of buffers (for pipeline
> + depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> + get minimum number of buffers required, and pass the obtained value
> + plus the number of additional buffers needed in count to
> + :c:func:`VIDIOC_REQBUFS`.
> +
> +10. Allocate requests (likely one per ``OUTPUT`` buffer) via
> + :c:func:`MEDIA_IOC_REQUEST_ALLOC` on the media device.
> +
> +11. Start streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
> + :c:func:`VIDIOC_STREAMON`.
> +
> +Decoding
> +========
> +
> +For each frame, the client is responsible for submitting a request to which the
> +following is attached:
> +
> +* Exactly one frame worth of encoded data in a buffer submitted to the
> + ``OUTPUT`` queue,

Just to make sure, in case of H.264, that would include all the slices
of the frame in one buffer, right?

> +* All the controls relevant to the format being decoded (see below for details).
> +
> +``CAPTURE`` buffers must not be part of the request, but must be queued
> +independently. The driver will pick one of the queued ``CAPTURE`` buffers and
> +decode the frame into it. Although the client has no control over which
> +``CAPTURE`` buffer will be used with a given ``OUTPUT`` buffer, it is guaranteed
> +that ``CAPTURE`` buffers will be returned in decode order (i.e. the same order
> +as ``OUTPUT`` buffers were submitted), so it is trivial to associate a dequeued
> +``CAPTURE`` buffer to its originating request and ``OUTPUT`` buffer.
> +
> +If the request is submitted without an ``OUTPUT`` buffer or if one of the
> +required controls are missing, then :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return
> +``-EINVAL``.

As per some of the other discussion threads we had before (and I
linked to in my previous reply), we might want one of the following:
1) precisely define the list of controls needed with the
fine-granularity of all possible stream feature combinations,
2) make only some very basic controls mandatory and never fail if
other controls are not specified.

IMHO 2) has a potential to lead to userspace relying on undefined
behavior with some controls not being set (for
laziness/simplicity/whatever excuse the author can think of), so I'd
personally go with 1)...

> Decoding errors are signaled by the ``CAPTURE`` buffers being
> +dequeued carrying the ``V4L2_BUF_FLAG_ERROR`` flag.
> +
> +The contents of source ``OUTPUT`` buffers, as well as the controls that must be
> +set on the request, depend on active coded pixel format and might be affected by
> +codec-specific extended controls, as stated in documentation of each format
> +individually.
> +
> +MPEG-2 buffer content and controls
> +----------------------------------
> +The following information is valid when the ``OUTPUT`` queue is set to the
> +``V4L2_PIX_FMT_MPEG2_SLICE`` format.

Perhaps we should document the controls there instead? I think that
was the conclusion from the discussions over the stateful API.

> +
> +The ``OUTPUT`` buffer must contain all the macroblock slices of a given frame,
> +i.e. if a frame requires several macroblock slices to be entirely decoded, then
> +all these slices must be provided. In addition, the following controls must be
> +set on the request:
> +
> +V4L2_CID_MPEG_VIDEO_MPEG2_SLICE_PARAMS
> + Slice parameters (one per slice) for the current frame.
> +

How do we know how many slices are included in current frame?

> +Optional controls:
> +
> +V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION
> + Quantization matrices for the current frame.

What happens if it's not specified?

> +
> +H.264 buffer content and controls
> +---------------------------------
> +The following information is valid when the ``OUTPUT`` queue is set to the
> +``V4L2_PIX_FMT_H264_SLICE`` format.

Ditto.

> +
> +The ``OUTPUT`` buffer must contain all the macroblock slices of a given frame,
> +i.e. if a frame requires several macroblock slices to be entirely decoded, then
> +all these slices must be provided. In addition, the following controls must be
> +set on the request:
> +
> +V4L2_CID_MPEG_VIDEO_H264_SPS
> + Instance of struct v4l2_ctrl_h264_sps, containing the SPS of to use with the
> + frame.
> +
> +V4L2_CID_MPEG_VIDEO_H264_PPS
> + Instance of struct v4l2_ctrl_h264_pps, containing the PPS of to use with the
> + frame.
> +
> +V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX
> + Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> + matrix to use when decoding the frame.
> +
> +V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM
> + Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
> + entries as there are slices in the corresponding ``OUTPUT`` buffer.
> +
> +V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM
> + Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
> + decoding parameters for a H.264 frame.
> +
> +Seek
> +====
> +In order to seek, the client just needs to submit requests using input buffers
> +corresponding to the new stream position. It must however be aware that
> +resolution may have changed and follow the dynamic resolution change protocol in

nit: We tend to call it "sequence" rather than "protocol" in other documents.

> +that case. Also depending on the codec used, picture parameters (e.g. SPS/PPS
> +for H.264) may have changed and the client is responsible for making sure that
> +a valid state is sent to the kernel.
> +
> +The client is then free to ignore any returned ``CAPTURE`` buffer that comes
> +from the pre-seek position.
> +
> +Pause
> +=====
> +
> +In order to pause, the client should just cease queuing buffers onto the
> +``OUTPUT`` queue. This is different from the general V4L2 API definition of
> +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue.
> +Without source bitstream data, there is no data to process and the hardware
> +remains idle.

This behavior is by design of memory-to-memory devices, so I'm not
sure we really need this section. Perhaps we could move it to a
separate document which explains m2m basics.

> +
> +Dynamic resolution change
> +=========================
> +
> +If the client detects a resolution change in the stream, it may need to
> +reallocate the ``CAPTURE`` buffers to fit the new size.
> +
> +1. Wait until all submitted requests have completed and dequeue the
> + corresponding output buffers.
> +
> +2. Call :c:func:`VIDIOC_STREAMOFF` on the ``CAPTURE`` queue.
> +
> +3. Free all ``CAPTURE`` buffers by calling :c:func:`VIDIOC_REQBUFS` on the
> + ``CAPTURE`` queue with a buffer count of zero.
> +
> +4. Set the new format and resolution on the ``CAPTURE`` queue.
> +
> +5. Allocate new ``CAPTURE`` buffers for the new resolution.
> +
> +6. Call :c:func:`VIDIOC_STREAMON` on the ``CAPTURE`` queue to resume the stream.
> +
> +The client can then start queueing new ``CAPTURE`` buffers and submit requests
> +to decode the next buffers at the new resolution.

There is some inconsistency here. In initialization sequence, we set
OUTPUT format to coded width and height and had the driver set a sane
default CAPTURE format. What happens to OUTPUT format? It definitely
wouldn't make sense if it stayed at the initial coded size.

> +
> +Drain
> +=====
> +
> +In order to drain the stream on a stateless decoder, the client just needs to
> +wait until all the submitted requests are completed. There is no need to send a
> +``V4L2_DEC_CMD_STOP`` command since requests are processed sequentially by the
> +driver.

Is there a need to include this section? I feel like it basically says
"Drain sequence: There is no drain sequence." ;)

> +
> +End of stream
> +=============
> +
> +If the decoder encounters an end of stream marking in the stream, the
> +driver must send a ``V4L2_EVENT_EOS`` event to the client after all frames
> +are decoded and ready to be dequeued on the ``CAPTURE`` queue, with the
> +:c:type:`v4l2_buffer` ``flags`` set to ``V4L2_BUF_FLAG_LAST``. This
> +behavior is identical to the drain sequence triggered by the client via
> +``V4L2_DEC_CMD_STOP``.

The client parses the stream, so it should be also responsible for end
of stream handling. There isn't anything to be signaled by the driver
here (nor the driver could actually signal), so I'd just remove this
section completely.

> diff --git a/Documentation/media/uapi/v4l/devices.rst b/Documentation/media/uapi/v4l/devices.rst
> index 1822c66c2154..a8e568eda7d8 100644
> --- a/Documentation/media/uapi/v4l/devices.rst
> +++ b/Documentation/media/uapi/v4l/devices.rst
> @@ -16,6 +16,7 @@ Interfaces
> dev-osd
> dev-codec
> dev-decoder
> + dev-stateless-decoder
> dev-encoder
> dev-effect
> dev-raw-vbi
> diff --git a/Documentation/media/uapi/v4l/extended-controls.rst b/Documentation/media/uapi/v4l/extended-controls.rst
> index a9252225b63e..c0411ebf4c12 100644
> --- a/Documentation/media/uapi/v4l/extended-controls.rst
> +++ b/Documentation/media/uapi/v4l/extended-controls.rst
> @@ -810,6 +810,29 @@ enum v4l2_mpeg_video_bitrate_mode -
> otherwise the decoder expects a single frame in per buffer.
> Applicable to the decoder, all codecs.
>
> +``V4L2_CID_MPEG_VIDEO_H264_SPS``
> + Instance of struct v4l2_ctrl_h264_sps, containing the SPS of to use with
> + the next queued frame. Applicable to the H.264 stateless decoder.
> +
> +``V4L2_CID_MPEG_VIDEO_H264_PPS``
> + Instance of struct v4l2_ctrl_h264_pps, containing the PPS of to use with
> + the next queued frame. Applicable to the H.264 stateless decoder.
> +
> +``V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX``
> + Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> + matrix to use when decoding the next queued frame. Applicable to the H.264
> + stateless decoder.
> +
> +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM``
> + Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
> + entries as there are slices in the corresponding ``OUTPUT`` buffer.
> + Applicable to the H.264 stateless decoder.
> +
> +``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM``
> + Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
> + decoding parameters for a H.264 frame. Applicable to the H.264 stateless
> + decoder.
> +

This seems to be roughly the same as in "H.264 buffer content and
controls". IMHO we should just keep the controls described here and
make the other file just cross reference to these descriptions.

Also, I guess this would eventually end up in the patch that adds
those controls and is just included here for RFC purposes, right?

Best regards,
Tomasz

2018-09-10 11:27:23

[permalink] [raw]

Subject: Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

Hi Alexandre,

Thank you very much for working on this, much appreciated!

On 08/31/2018 09:47 AM, Alexandre Courbot wrote:
> This patch documents the protocol that user-space should follow when
> communicating with stateless video decoders. It is based on the
> following references:
>
> * The current protocol used by Chromium (converted from config store to
> request API)
>
> * The submitted Cedrus VPU driver
>
> As such, some things may not be entirely consistent with the current
> state of drivers, so it would be great if all stakeholders could point
> out these inconsistencies. :)
>
> This patch is supposed to be applied on top of the Request API V18 as
> well as the memory-to-memory video decoder interface series by Tomasz
> Figa.
>
> It should be considered an early RFC.
>
> Signed-off-by: Alexandre Courbot <[email protected]>
> ---
> .../media/uapi/v4l/dev-stateless-decoder.rst | 413 ++++++++++++++++++
> Documentation/media/uapi/v4l/devices.rst | 1 +
> .../media/uapi/v4l/extended-controls.rst | 23 +
> 3 files changed, 437 insertions(+)
> create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
>
> diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> new file mode 100644
> index 000000000000..bf7b13a8ee16
> --- /dev/null
> +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> @@ -0,0 +1,413 @@
> +.. -*- coding: utf-8; mode: rst -*-
> +
> +.. _stateless_decoder:
> +
> +**************************************************
> +Memory-to-memory Stateless Video Decoder Interface
> +**************************************************
> +
> +A stateless decoder is a decoder that works without retaining any kind of state
> +between processing frames. This means that each frame is decoded independently
> +of any previous and future frames, and that the client is responsible for
> +maintaining the decoding state and providing it to the driver. This is in
> +contrast to the stateful video decoder interface, where the hardware maintains
> +the decoding state and all the client has to do is to provide the raw encoded
> +stream.
> +
> +This section describes how user-space ("the client") is expected to communicate
> +with such decoders in order to successfully decode an encoded stream. Compared
> +to stateful codecs, the driver/client protocol is simpler, but cost of this
> +simplicity is extra complexity in the client which must maintain the decoding
> +state.
> +
> +Querying capabilities
> +=====================
> +
> +1. To enumerate the set of coded formats supported by the driver, the client
> + calls :c:func:`VIDIOC_ENUM_FMT` on the ``OUTPUT`` queue.
> +
> + * The driver must always return the full set of supported formats for the
> + currently set ``OUTPUT`` format, irrespective of the format currently set
> + on the ``CAPTURE`` queue.
> +
> +2. To enumerate the set of supported raw formats, the client calls
> + :c:func:`VIDIOC_ENUM_FMT` on the ``CAPTURE`` queue.
> +
> + * The driver must return only the formats supported for the format currently
> + active on the ``OUTPUT`` queue.
> +
> + * In order to enumerate raw formats supported by a given coded format, the
> + client must thus set that coded format on the ``OUTPUT`` queue first, and
> + then enumerate the ``CAPTURE`` queue.
> +
> +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> + resolutions for a given format, passing desired pixel format in
> + :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> +
> + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``OUTPUT`` queue
> + must include all possible coded resolutions supported by the decoder
> + for given coded pixel format.
> +
> + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``CAPTURE`` queue
> + must include all possible frame buffer resolutions supported by the
> + decoder for given raw pixel format and coded format currently set on
> + ``OUTPUT`` queue.
> +
> + .. note::
> +
> + The client may derive the supported resolution range for a
> + combination of coded and raw format by setting width and height of
> + ``OUTPUT`` format to 0 and calculating the intersection of
> + resolutions returned from calls to :c:func:`VIDIOC_ENUM_FRAMESIZES`
> + for the given coded and raw formats.
> +
> +4. Supported profiles and levels for given format, if applicable, may be
> + queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> +
> +Initialization
> +==============
> +
> +1. *[optional]* Enumerate supported ``OUTPUT`` formats and resolutions. See
> + capability enumeration.
> +
> +2. Set the coded format on the ``OUTPUT`` queue via :c:func:`VIDIOC_S_FMT`
> +
> + * **Required fields:**
> +
> + ``type``
> + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> + ``pixelformat``
> + a coded pixel format
> +
> + ``width``, ``height``
> + parsed width and height of the coded format
> +
> + other fields
> + follow standard semantics
> +
> + .. note::
> +
> + Changing ``OUTPUT`` format may change currently set ``CAPTURE``
> + format. The driver will derive a new ``CAPTURE`` format from
> + ``OUTPUT`` format being set, including resolution, colorimetry
> + parameters, etc. If the client needs a specific ``CAPTURE`` format,
> + it must adjust it afterwards.
> +
> +3. *[optional]* Get minimum number of buffers required for ``OUTPUT``
> + queue via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to
> + use more buffers than minimum required by hardware/format.
> +
> + * **Required fields:**
> +
> + ``id``
> + set to ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> +
> + * **Return fields:**
> +
> + ``value``
> + required number of ``OUTPUT`` buffers for the currently set
> + format
> +
> +4. Call :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` queue to get format for the
> + destination buffers parsed/decoded from the bitstream.
> +
> + * **Required fields:**
> +
> + ``type``
> + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> + * **Return fields:**
> +
> + ``width``, ``height``
> + frame buffer resolution for the decoded frames
> +
> + ``pixelformat``
> + pixel format for decoded frames
> +
> + ``num_planes`` (for _MPLANE ``type`` only)
> + number of planes for pixelformat
> +
> + ``sizeimage``, ``bytesperline``
> + as per standard semantics; matching frame buffer format
> +
> + .. note::
> +
> + The value of ``pixelformat`` may be any pixel format supported for the
> + ``OUTPUT`` format, based on the hardware capabilities. It is suggested
> + that driver chooses the preferred/optimal format for given configuration.
> + For example, a YUV format may be preferred over an RGB format, if
> + additional conversion step would be required.
> +
> +5. *[optional]* Enumerate ``CAPTURE`` formats via :c:func:`VIDIOC_ENUM_FMT` on
> + ``CAPTURE`` queue. The client may use this ioctl to discover which
> + alternative raw formats are supported for the current ``OUTPUT`` format and
> + select one of them via :c:func:`VIDIOC_S_FMT`.
> +
> + .. note::
> +
> + The driver will return only formats supported for the currently selected
> + ``OUTPUT`` format, even if more formats may be supported by the driver in
> + general.
> +
> + For example, a driver/hardware may support YUV and RGB formats for
> + resolutions 1920x1088 and lower, but only YUV for higher resolutions (due
> + to hardware limitations). After setting a resolution of 1920x1088 or lower
> + as the ``OUTPUT`` format, :c:func:`VIDIOC_ENUM_FMT` may return a set of
> + YUV and RGB pixel formats, but after setting a resolution higher than
> + 1920x1088, the driver will not return RGB, unsupported for this
> + resolution.
> +
> +6. *[optional]* Choose a different ``CAPTURE`` format than suggested via
> + :c:func:`VIDIOC_S_FMT` on ``CAPTURE`` queue. It is possible for the client to
> + choose a different format than selected/suggested by the driver in
> + :c:func:`VIDIOC_G_FMT`.
> +
> + * **Required fields:**
> +
> + ``type``
> + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> + ``pixelformat``
> + a raw pixel format
> +
> + .. note::
> +
> + Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available
> + formats after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> + out a set of allowed formats for given configuration, but not required,
> + if the client can accept the defaults.
> +
> +7. *[optional]* Get minimum number of buffers required for ``CAPTURE`` queue via
> + :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use more buffers
> + than minimum required by hardware/format.
> +
> + * **Required fields:**
> +
> + ``id``
> + set to ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> +
> + * **Return fields:**
> +
> + ``value``
> + minimum number of buffers required to decode the stream parsed in this
> + initialization sequence.
> +
> + .. note::
> +
> + Note that the minimum number of buffers must be at least the number
> + required to successfully decode the current stream. This may for example
> + be the required DPB size for an H.264 stream given the parsed stream
> + configuration (resolution, level).
> +
> +8. Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on
> + ``OUTPUT`` queue.
> +
> + * **Required fields:**
> +
> + ``count``
> + requested number of buffers to allocate; greater than zero
> +
> + ``type``
> + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> + ``memory``
> + follows standard semantics
> +
> + ``sizeimage``
> + follows standard semantics; the client is free to choose any
> + suitable size, however, it may be subject to change by the
> + driver
> +
> + * **Return fields:**
> +
> + ``count``
> + actual number of buffers allocated
> +
> + * The driver must adjust count to minimum of required number of ``OUTPUT``
> + buffers for given format and count passed. The client must check this
> + value after the ioctl returns to get the number of buffers allocated.
> +
> + .. note::
> +
> + To allocate more than minimum number of buffers (for pipeline depth), use
> + G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to get minimum number of
> + buffers required by the driver/format, and pass the obtained value plus
> + the number of additional buffers needed in count to
> + :c:func:`VIDIOC_REQBUFS`.
> +
> +9. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> + ``CAPTURE`` queue.
> +
> + * **Required fields:**
> +
> + ``count``
> + requested number of buffers to allocate; greater than zero
> +
> + ``type``
> + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> + ``memory``
> + follows standard semantics
> +
> + * **Return fields:**
> +
> + ``count``
> + adjusted to allocated number of buffers
> +
> + * The driver must adjust count to minimum of required number of
> + destination buffers for given format and stream configuration and the
> + count passed. The client must check this value after the ioctl
> + returns to get the number of buffers allocated.
> +
> + .. note::
> +
> + To allocate more than minimum number of buffers (for pipeline
> + depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> + get minimum number of buffers required, and pass the obtained value
> + plus the number of additional buffers needed in count to
> + :c:func:`VIDIOC_REQBUFS`.
> +
> +10. Allocate requests (likely one per ``OUTPUT`` buffer) via
> + :c:func:`MEDIA_IOC_REQUEST_ALLOC` on the media device.
> +
> +11. Start streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
> + :c:func:`VIDIOC_STREAMON`.

If I am not mistaken, steps 1-11 are the same as for stateful codecs. It is better
to just refer to that instead of copying them.

> +
> +Decoding
> +========
> +
> +For each frame, the client is responsible for submitting a request to which the
> +following is attached:
> +
> +* Exactly one frame worth of encoded data in a buffer submitted to the
> + ``OUTPUT`` queue,
> +* All the controls relevant to the format being decoded (see below for details).
> +
> +``CAPTURE`` buffers must not be part of the request, but must be queued
> +independently. The driver will pick one of the queued ``CAPTURE`` buffers and
> +decode the frame into it. Although the client has no control over which
> +``CAPTURE`` buffer will be used with a given ``OUTPUT`` buffer, it is guaranteed
> +that ``CAPTURE`` buffers will be returned in decode order (i.e. the same order
> +as ``OUTPUT`` buffers were submitted), so it is trivial to associate a dequeued
> +``CAPTURE`` buffer to its originating request and ``OUTPUT`` buffer.
> +
> +If the request is submitted without an ``OUTPUT`` buffer or if one of the
> +required controls are missing, then :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return
> +``-EINVAL``.

Not entirely true: if buffers are missing, then ENOENT is returned. Missing required
controls or more than one OUTPUT buffer will result in EINVAL. This per the latest
Request API changes.

Decoding errors are signaled by the ``CAPTURE`` buffers being
> +dequeued carrying the ``V4L2_BUF_FLAG_ERROR`` flag.

Add here that if the reference frame had an error, then all other frames that refer
to it should also set the ERROR flag. It is up to userspace to decide whether or
not to drop them (part of the frame might still be valid).

I am not sure whether this should be documented, but there are some additional
restrictions w.r.t. reference frames:

Since decoders need access to the decoded reference frames there are some corner
cases that need to be checked:

1) V4L2_MEMORY_USERPTR cannot be used for the capture queue: the driver does not
know when a malloced but dequeued buffer is freed, so the reference frame
could suddenly be gone.

2) V4L2_MEMORY_DMABUF can be used, but drivers should check that the dma buffer is
still available AND increase the dmabuf refcount while it is used by the HW.

3) What to do if userspace has requeued a buffer containing a reference frame,
and you want to decode a B/P-frame that refers to that buffer? We need to
check against that: I think that when you queue a capture buffer whose index
is used in a pending request as a reference frame, than that should fail with
an error. And trying to queue a request referring to a buffer that has been
requeued should also fail.

We might need to add some support for this in v4l2-mem2mem.c or vb2.

We will have similar (but not quite identical) issues with stateless encoders.

> +
> +The contents of source ``OUTPUT`` buffers, as well as the controls that must be
> +set on the request, depend on active coded pixel format and might be affected by
> +codec-specific extended controls, as stated in documentation of each format
> +individually.
> +
> +MPEG-2 buffer content and controls
> +----------------------------------
> +The following information is valid when the ``OUTPUT`` queue is set to the
> +``V4L2_PIX_FMT_MPEG2_SLICE`` format.
> +
> +The ``OUTPUT`` buffer must contain all the macroblock slices of a given frame,
> +i.e. if a frame requires several macroblock slices to be entirely decoded, then
> +all these slices must be provided. In addition, the following controls must be
> +set on the request:
> +
> +V4L2_CID_MPEG_VIDEO_MPEG2_SLICE_PARAMS
> + Slice parameters (one per slice) for the current frame.
> +
> +Optional controls:
> +
> +V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION
> + Quantization matrices for the current frame.
> +
> +H.264 buffer content and controls
> +---------------------------------
> +The following information is valid when the ``OUTPUT`` queue is set to the
> +``V4L2_PIX_FMT_H264_SLICE`` format.
> +
> +The ``OUTPUT`` buffer must contain all the macroblock slices of a given frame,
> +i.e. if a frame requires several macroblock slices to be entirely decoded, then
> +all these slices must be provided. In addition, the following controls must be
> +set on the request:
> +
> +V4L2_CID_MPEG_VIDEO_H264_SPS
> + Instance of struct v4l2_ctrl_h264_sps, containing the SPS of to use with the
> + frame.
> +
> +V4L2_CID_MPEG_VIDEO_H264_PPS
> + Instance of struct v4l2_ctrl_h264_pps, containing the PPS of to use with the
> + frame.
> +
> +V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX
> + Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> + matrix to use when decoding the frame.
> +
> +V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM
> + Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
> + entries as there are slices in the corresponding ``OUTPUT`` buffer.
> +
> +V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM
> + Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
> + decoding parameters for a H.264 frame.
> +
> +Seek
> +====
> +In order to seek, the client just needs to submit requests using input buffers
> +corresponding to the new stream position. It must however be aware that
> +resolution may have changed and follow the dynamic resolution change protocol in
> +that case. Also depending on the codec used, picture parameters (e.g. SPS/PPS
> +for H.264) may have changed and the client is responsible for making sure that
> +a valid state is sent to the kernel.
> +
> +The client is then free to ignore any returned ``CAPTURE`` buffer that comes
> +from the pre-seek position.
> +
> +Pause
> +=====
> +
> +In order to pause, the client should just cease queuing buffers onto the
> +``OUTPUT`` queue. This is different from the general V4L2 API definition of
> +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue.
> +Without source bitstream data, there is no data to process and the hardware
> +remains idle.
> +
> +Dynamic resolution change
> +=========================
> +
> +If the client detects a resolution change in the stream, it may need to
> +reallocate the ``CAPTURE`` buffers to fit the new size.
> +
> +1. Wait until all submitted requests have completed and dequeue the
> + corresponding output buffers.
> +
> +2. Call :c:func:`VIDIOC_STREAMOFF` on the ``CAPTURE`` queue.
> +
> +3. Free all ``CAPTURE`` buffers by calling :c:func:`VIDIOC_REQBUFS` on the
> + ``CAPTURE`` queue with a buffer count of zero.
> +
> +4. Set the new format and resolution on the ``CAPTURE`` queue.
> +
> +5. Allocate new ``CAPTURE`` buffers for the new resolution.
> +
> +6. Call :c:func:`VIDIOC_STREAMON` on the ``CAPTURE`` queue to resume the stream.
> +
> +The client can then start queueing new ``CAPTURE`` buffers and submit requests
> +to decode the next buffers at the new resolution.
> +
> +Drain
> +=====
> +
> +In order to drain the stream on a stateless decoder, the client just needs to
> +wait until all the submitted requests are completed. There is no need to send a
> +``V4L2_DEC_CMD_STOP`` command since requests are processed sequentially by the
> +driver.
> +
> +End of stream
> +=============
> +
> +If the decoder encounters an end of stream marking in the stream, the
> +driver must send a ``V4L2_EVENT_EOS`` event to the client after all frames
> +are decoded and ready to be dequeued on the ``CAPTURE`` queue, with the
> +:c:type:`v4l2_buffer` ``flags`` set to ``V4L2_BUF_FLAG_LAST``. This
> +behavior is identical to the drain sequence triggered by the client via
> +``V4L2_DEC_CMD_STOP``.
> diff --git a/Documentation/media/uapi/v4l/devices.rst b/Documentation/media/uapi/v4l/devices.rst
> index 1822c66c2154..a8e568eda7d8 100644
> --- a/Documentation/media/uapi/v4l/devices.rst
> +++ b/Documentation/media/uapi/v4l/devices.rst
> @@ -16,6 +16,7 @@ Interfaces
> dev-osd
> dev-codec
> dev-decoder
> + dev-stateless-decoder
> dev-encoder
> dev-effect
> dev-raw-vbi
> diff --git a/Documentation/media/uapi/v4l/extended-controls.rst b/Documentation/media/uapi/v4l/extended-controls.rst
> index a9252225b63e..c0411ebf4c12 100644
> --- a/Documentation/media/uapi/v4l/extended-controls.rst
> +++ b/Documentation/media/uapi/v4l/extended-controls.rst
> @@ -810,6 +810,29 @@ enum v4l2_mpeg_video_bitrate_mode -
> otherwise the decoder expects a single frame in per buffer.
> Applicable to the decoder, all codecs.
>
> +``V4L2_CID_MPEG_VIDEO_H264_SPS``
> + Instance of struct v4l2_ctrl_h264_sps, containing the SPS of to use with
> + the next queued frame. Applicable to the H.264 stateless decoder.
> +
> +``V4L2_CID_MPEG_VIDEO_H264_PPS``
> + Instance of struct v4l2_ctrl_h264_pps, containing the PPS of to use with
> + the next queued frame. Applicable to the H.264 stateless decoder.
> +
> +``V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX``
> + Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> + matrix to use when decoding the next queued frame. Applicable to the H.264
> + stateless decoder.
> +
> +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM``
> + Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
> + entries as there are slices in the corresponding ``OUTPUT`` buffer.
> + Applicable to the H.264 stateless decoder.
> +
> +``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM``
> + Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
> + decoding parameters for a H.264 frame. Applicable to the H.264 stateless
> + decoder.
> +
> ``V4L2_CID_MPEG_VIDEO_H264_VUI_SAR_ENABLE (boolean)``
> Enable writing sample aspect ratio in the Video Usability
> Information. Applicable to the H264 encoder.
>

Regards,

Hans

2018-09-10 11:59:16

[permalink] [raw]

Subject: Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

On 09/10/2018 01:25 PM, Hans Verkuil wrote:
>> +
>> +Decoding
>> +========
>> +
>> +For each frame, the client is responsible for submitting a request to which the
>> +following is attached:
>> +
>> +* Exactly one frame worth of encoded data in a buffer submitted to the
>> + ``OUTPUT`` queue,
>> +* All the controls relevant to the format being decoded (see below for details).
>> +
>> +``CAPTURE`` buffers must not be part of the request, but must be queued
>> +independently. The driver will pick one of the queued ``CAPTURE`` buffers and
>> +decode the frame into it. Although the client has no control over which
>> +``CAPTURE`` buffer will be used with a given ``OUTPUT`` buffer, it is guaranteed
>> +that ``CAPTURE`` buffers will be returned in decode order (i.e. the same order
>> +as ``OUTPUT`` buffers were submitted), so it is trivial to associate a dequeued
>> +``CAPTURE`` buffer to its originating request and ``OUTPUT`` buffer.
>> +
>> +If the request is submitted without an ``OUTPUT`` buffer or if one of the
>> +required controls are missing, then :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return
>> +``-EINVAL``.
>
> Not entirely true: if buffers are missing, then ENOENT is returned. Missing required
> controls or more than one OUTPUT buffer will result in EINVAL. This per the latest
> Request API changes.
>
> Decoding errors are signaled by the ``CAPTURE`` buffers being
>> +dequeued carrying the ``V4L2_BUF_FLAG_ERROR`` flag.
>
> Add here that if the reference frame had an error, then all other frames that refer
> to it should also set the ERROR flag. It is up to userspace to decide whether or
> not to drop them (part of the frame might still be valid).
>
> I am not sure whether this should be documented, but there are some additional
> restrictions w.r.t. reference frames:
>
> Since decoders need access to the decoded reference frames there are some corner
> cases that need to be checked:
>
> 1) V4L2_MEMORY_USERPTR cannot be used for the capture queue: the driver does not
> know when a malloced but dequeued buffer is freed, so the reference frame
> could suddenly be gone.
>
> 2) V4L2_MEMORY_DMABUF can be used, but drivers should check that the dma buffer is
> still available AND increase the dmabuf refcount while it is used by the HW.
>
> 3) What to do if userspace has requeued a buffer containing a reference frame,
> and you want to decode a B/P-frame that refers to that buffer? We need to
> check against that: I think that when you queue a capture buffer whose index
> is used in a pending request as a reference frame, than that should fail with
> an error. And trying to queue a request referring to a buffer that has been
> requeued should also fail.
>
> We might need to add some support for this in v4l2-mem2mem.c or vb2.
>
> We will have similar (but not quite identical) issues with stateless encoders.

Related to this is the question whether buffer indices that are used to refer
to reference frames should refer to the capture or output queue.

Using capture indices works if you never queue more than one request at a time:
you know exactly what the capture buffer index is of the decoded I-frame, and
you can use that in the following requests.

But if multiple requests are queued, then you don't necessarily know to which
capture buffer an I-frame will be decoded, so then you can't provide this index
to following B/P-frames. This puts restrictions on userspace: you can only
queue B/P-frames once you have decoded the I-frame. This might be perfectly
acceptable, though.

Using output buffer indices will work (the driver will know to which capture
buffer index the I-frames mapped), but it requires that the output buffer that
contained a reference frame isn't requeued, since that means that the driver
will lose this mapping. I think this will make things too confusing, though.

A third option is that you don't refer to reference frames by buffer index,
but instead by some other counter (sequence counter, perhaps?).

Regards,

Hans

2018-09-10 12:26:31

by Nicolas Dufresne

[permalink] [raw]

Subject: Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

Le lundi 10 septembre 2018 à 16:17 +0900, Tomasz Figa a écrit :
> Hi Alex,
>
> +Maxime Ripard +Ezequiel Garcia +Nicolas Dufresne
>
> [Not snipping intentionally.]
>
> On Fri, Aug 31, 2018 at 4:48 PM Alexandre Courbot <[email protected]> wrote:
> >
> > This patch documents the protocol that user-space should follow when
> > communicating with stateless video decoders. It is based on the
> > following references:
> >
> > * The current protocol used by Chromium (converted from config store to
> > request API)
> >
> > * The submitted Cedrus VPU driver
> >
> > As such, some things may not be entirely consistent with the current
> > state of drivers, so it would be great if all stakeholders could point
> > out these inconsistencies. :)
> >
> > This patch is supposed to be applied on top of the Request API V18 as
> > well as the memory-to-memory video decoder interface series by Tomasz
> > Figa.
> >
> > It should be considered an early RFC.
>
> Thanks a lot. I think this gives us a much better start already than
> what we had with stateful API without any documentation. :)
>
> Please see my comments inline.
>
> >
> > Signed-off-by: Alexandre Courbot <[email protected]>
> > ---
> > .../media/uapi/v4l/dev-stateless-decoder.rst | 413 ++++++++++++++++++
> > Documentation/media/uapi/v4l/devices.rst | 1 +
> > .../media/uapi/v4l/extended-controls.rst | 23 +
> > 3 files changed, 437 insertions(+)
> > create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > new file mode 100644
> > index 000000000000..bf7b13a8ee16
> > --- /dev/null
> > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > @@ -0,0 +1,413 @@
> > +.. -*- coding: utf-8; mode: rst -*-
> > +
> > +.. _stateless_decoder:
> > +
> > +**************************************************
> > +Memory-to-memory Stateless Video Decoder Interface
> > +**************************************************
> > +
> > +A stateless decoder is a decoder that works without retaining any kind of state
> > +between processing frames. This means that each frame is decoded independently
> > +of any previous and future frames, and that the client is responsible for
> > +maintaining the decoding state and providing it to the driver. This is in
> > +contrast to the stateful video decoder interface, where the hardware maintains
> > +the decoding state and all the client has to do is to provide the raw encoded
> > +stream.
> > +
> > +This section describes how user-space ("the client") is expected to communicate
> > +with such decoders in order to successfully decode an encoded stream. Compared
> > +to stateful codecs, the driver/client protocol is simpler, but cost of this
> > +simplicity is extra complexity in the client which must maintain the decoding
> > +state.
> > +
> > +Querying capabilities
> > +=====================
> > +
> > +1. To enumerate the set of coded formats supported by the driver, the client
> > + calls :c:func:`VIDIOC_ENUM_FMT` on the ``OUTPUT`` queue.
> > +
> > + * The driver must always return the full set of supported formats for the
> > + currently set ``OUTPUT`` format, irrespective of the format currently set
> > + on the ``CAPTURE`` queue.
> > +
> > +2. To enumerate the set of supported raw formats, the client calls
> > + :c:func:`VIDIOC_ENUM_FMT` on the ``CAPTURE`` queue.
> > +
> > + * The driver must return only the formats supported for the format currently
> > + active on the ``OUTPUT`` queue.
> > +
> > + * In order to enumerate raw formats supported by a given coded format, the
> > + client must thus set that coded format on the ``OUTPUT`` queue first, and
> > + then enumerate the ``CAPTURE`` queue.
>
> One thing that we might want to note here is that available CAPTURE
> formats may depend on more factors than just current OUTPUT format.
> Depending on encoding options of the stream being decoded (e.g. VP9
> profile), the list of supported format might be limited to one of YUV
> 4:2:0, 4:2:2 or 4:4:4 family of formats, but might be any of them, if
> the hardware supports conversion.
>
> I was wondering whether we shouldn't require the client to set all the
> necessary initial codec-specific controls before querying CAPTURE
> formats, but since we don't have any compatibility constraints here,
> as opposed to the stateful API, perhaps we could just make CAPTURE
> queue completely independent and have a source change event raised if
> the controls set later make existing format invalid.
>
> I'd like to ask the userspace folks here (Nicolas?), whether:
> 1) we need to know the exact list of formats that are guaranteed to be
> supported for playing back the whole video, or
> 2) we're okay with some rough approximation here, or
> 3) maybe we don't even need to enumerate formats on CAPTURE beforehand?

While Gst do try and make an initial list of formats before setting the
capture sink format (for stateful case), I don't think this is really
required. So specially for stateless, feel free to require a format,
level, profile, tier, etc. before one can enumerate the formats. I will
likely drop that initial probe in a near future, it also speed up the
startup time.

>
> To be honest, I'm not sure 1) is even possible, since the resolution
> (or some other stream parameters) could change mid-stream.
>
> > +
> > +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> > + resolutions for a given format, passing desired pixel format in
> > + :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> > +
> > + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``OUTPUT`` queue
> > + must include all possible coded resolutions supported by the decoder
> > + for given coded pixel format.
> > +
> > + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``CAPTURE`` queue
> > + must include all possible frame buffer resolutions supported by the
> > + decoder for given raw pixel format and coded format currently set on
> > + ``OUTPUT`` queue.
> > +
> > + .. note::
> > +
> > + The client may derive the supported resolution range for a
> > + combination of coded and raw format by setting width and height of
> > + ``OUTPUT`` format to 0 and calculating the intersection of
> > + resolutions returned from calls to :c:func:`VIDIOC_ENUM_FRAMESIZES`
> > + for the given coded and raw formats.
> > +
> > +4. Supported profiles and levels for given format, if applicable, may be
> > + queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> > +
> > +Initialization
> > +==============
> > +
> > +1. *[optional]* Enumerate supported ``OUTPUT`` formats and resolutions. See
> > + capability enumeration.
> > +
> > +2. Set the coded format on the ``OUTPUT`` queue via :c:func:`VIDIOC_S_FMT`
> > +
> > + * **Required fields:**
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > + ``pixelformat``
> > + a coded pixel format
> > +
> > + ``width``, ``height``
> > + parsed width and height of the coded format
>
> Perhaps "coded width and height parsed from the stream" could be a bit
> more clear?
>
> > +
> > + other fields
> > + follow standard semantics
> > +
> > + .. note::
> > +
> > + Changing ``OUTPUT`` format may change currently set ``CAPTURE``
> > + format. The driver will derive a new ``CAPTURE`` format from
> > + ``OUTPUT`` format being set, including resolution, colorimetry
> > + parameters, etc. If the client needs a specific ``CAPTURE`` format,
> > + it must adjust it afterwards.
> > +
> > +3. *[optional]* Get minimum number of buffers required for ``OUTPUT``
> > + queue via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to
> > + use more buffers than minimum required by hardware/format.
> > +
> > + * **Required fields:**
> > +
> > + ``id``
> > + set to ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> > +
> > + * **Return fields:**
> > +
> > + ``value``
> > + required number of ``OUTPUT`` buffers for the currently set
> > + format
>
> I'm not very sure if this is useful for stateless API, but no strong opinion.
>
> > +
> > +4. Call :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` queue to get format for the
> > + destination buffers parsed/decoded from the bitstream.
> > +
> > + * **Required fields:**
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > + * **Return fields:**
> > +
> > + ``width``, ``height``
> > + frame buffer resolution for the decoded frames
> > +
> > + ``pixelformat``
> > + pixel format for decoded frames
> > +
> > + ``num_planes`` (for _MPLANE ``type`` only)
> > + number of planes for pixelformat
> > +
> > + ``sizeimage``, ``bytesperline``
> > + as per standard semantics; matching frame buffer format
> > +
> > + .. note::
> > +
> > + The value of ``pixelformat`` may be any pixel format supported for the
> > + ``OUTPUT`` format, based on the hardware capabilities. It is suggested
> > + that driver chooses the preferred/optimal format for given configuration.
> > + For example, a YUV format may be preferred over an RGB format, if
> > + additional conversion step would be required.
> > +
> > +5. *[optional]* Enumerate ``CAPTURE`` formats via :c:func:`VIDIOC_ENUM_FMT` on
> > + ``CAPTURE`` queue. The client may use this ioctl to discover which
> > + alternative raw formats are supported for the current ``OUTPUT`` format and
> > + select one of them via :c:func:`VIDIOC_S_FMT`.
> > +
> > + .. note::
> > +
> > + The driver will return only formats supported for the currently selected
> > + ``OUTPUT`` format, even if more formats may be supported by the driver in
> > + general.
> > +
> > + For example, a driver/hardware may support YUV and RGB formats for
> > + resolutions 1920x1088 and lower, but only YUV for higher resolutions (due
> > + to hardware limitations). After setting a resolution of 1920x1088 or lower
> > + as the ``OUTPUT`` format, :c:func:`VIDIOC_ENUM_FMT` may return a set of
> > + YUV and RGB pixel formats, but after setting a resolution higher than
> > + 1920x1088, the driver will not return RGB, unsupported for this
> > + resolution.
> > +
> > +6. *[optional]* Choose a different ``CAPTURE`` format than suggested via
> > + :c:func:`VIDIOC_S_FMT` on ``CAPTURE`` queue. It is possible for the client to
> > + choose a different format than selected/suggested by the driver in
> > + :c:func:`VIDIOC_G_FMT`.
> > +
> > + * **Required fields:**
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > + ``pixelformat``
> > + a raw pixel format
> > +
> > + .. note::
> > +
> > + Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available
> > + formats after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> > + out a set of allowed formats for given configuration, but not required,
> > + if the client can accept the defaults.
>
> V4L2_EVENT_SOURCE_CHANGE was not mentioned in earlier steps. I suppose
> it's a leftover from the stateful API? :)
>
> Still, I think we may eventually need source change events, because of
> the reasons I mentioned above.
>
> > +
> > +7. *[optional]* Get minimum number of buffers required for ``CAPTURE`` queue via
> > + :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use more buffers
> > + than minimum required by hardware/format.
> > +
> > + * **Required fields:**
> > +
> > + ``id``
> > + set to ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> > +
> > + * **Return fields:**
> > +
> > + ``value``
> > + minimum number of buffers required to decode the stream parsed in this
> > + initialization sequence.
> > +
> > + .. note::
> > +
> > + Note that the minimum number of buffers must be at least the number
> > + required to successfully decode the current stream. This may for example
> > + be the required DPB size for an H.264 stream given the parsed stream
> > + configuration (resolution, level).
>
> I'm not sure if this really makes sense for stateless API, because DPB
> management is done by the client, so it already has all the data to
> know how many buffers would be needed/optimal.
>
> > +
> > +8. Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on
> > + ``OUTPUT`` queue.
> > +
> > + * **Required fields:**
> > +
> > + ``count``
> > + requested number of buffers to allocate; greater than zero
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > + ``memory``
> > + follows standard semantics
> > +
> > + ``sizeimage``
> > + follows standard semantics; the client is free to choose any
> > + suitable size, however, it may be subject to change by the
> > + driver
> > +
> > + * **Return fields:**
> > +
> > + ``count``
> > + actual number of buffers allocated
> > +
> > + * The driver must adjust count to minimum of required number of ``OUTPUT``
> > + buffers for given format and count passed. The client must check this
> > + value after the ioctl returns to get the number of buffers allocated.
> > +
> > + .. note::
> > +
> > + To allocate more than minimum number of buffers (for pipeline depth), use
> > + G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to get minimum number of
> > + buffers required by the driver/format, and pass the obtained value plus
> > + the number of additional buffers needed in count to
> > + :c:func:`VIDIOC_REQBUFS`.
> > +
> > +9. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> > + ``CAPTURE`` queue.
> > +
> > + * **Required fields:**
> > +
> > + ``count``
> > + requested number of buffers to allocate; greater than zero
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > + ``memory``
> > + follows standard semantics
> > +
> > + * **Return fields:**
> > +
> > + ``count``
> > + adjusted to allocated number of buffers
> > +
> > + * The driver must adjust count to minimum of required number of
> > + destination buffers for given format and stream configuration and the
> > + count passed. The client must check this value after the ioctl
> > + returns to get the number of buffers allocated.
> > +
> > + .. note::
> > +
> > + To allocate more than minimum number of buffers (for pipeline
> > + depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> > + get minimum number of buffers required, and pass the obtained value
> > + plus the number of additional buffers needed in count to
> > + :c:func:`VIDIOC_REQBUFS`.
> > +
> > +10. Allocate requests (likely one per ``OUTPUT`` buffer) via
> > + :c:func:`MEDIA_IOC_REQUEST_ALLOC` on the media device.
> > +
> > +11. Start streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
> > + :c:func:`VIDIOC_STREAMON`.
> > +
> > +Decoding
> > +========
> > +
> > +For each frame, the client is responsible for submitting a request to which the
> > +following is attached:
> > +
> > +* Exactly one frame worth of encoded data in a buffer submitted to the
> > + ``OUTPUT`` queue,
>
> Just to make sure, in case of H.264, that would include all the slices
> of the frame in one buffer, right?
>
> > +* All the controls relevant to the format being decoded (see below for details).
> > +
> > +``CAPTURE`` buffers must not be part of the request, but must be queued
> > +independently. The driver will pick one of the queued ``CAPTURE`` buffers and
> > +decode the frame into it. Although the client has no control over which
> > +``CAPTURE`` buffer will be used with a given ``OUTPUT`` buffer, it is guaranteed
> > +that ``CAPTURE`` buffers will be returned in decode order (i.e. the same order
> > +as ``OUTPUT`` buffers were submitted), so it is trivial to associate a dequeued
> > +``CAPTURE`` buffer to its originating request and ``OUTPUT`` buffer.
> > +
> > +If the request is submitted without an ``OUTPUT`` buffer or if one of the
> > +required controls are missing, then :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return
> > +``-EINVAL``.
>
> As per some of the other discussion threads we had before (and I
> linked to in my previous reply), we might want one of the following:
> 1) precisely define the list of controls needed with the
> fine-granularity of all possible stream feature combinations,
> 2) make only some very basic controls mandatory and never fail if
> other controls are not specified.
>
> IMHO 2) has a potential to lead to userspace relying on undefined
> behavior with some controls not being set (for
> laziness/simplicity/whatever excuse the author can think of), so I'd
> personally go with 1)...
>
> > Decoding errors are signaled by the ``CAPTURE`` buffers being
> > +dequeued carrying the ``V4L2_BUF_FLAG_ERROR`` flag.
> > +
> > +The contents of source ``OUTPUT`` buffers, as well as the controls that must be
> > +set on the request, depend on active coded pixel format and might be affected by
> > +codec-specific extended controls, as stated in documentation of each format
> > +individually.
> > +
> > +MPEG-2 buffer content and controls
> > +----------------------------------
> > +The following information is valid when the ``OUTPUT`` queue is set to the
> > +``V4L2_PIX_FMT_MPEG2_SLICE`` format.
>
> Perhaps we should document the controls there instead? I think that
> was the conclusion from the discussions over the stateful API.
>
> > +
> > +The ``OUTPUT`` buffer must contain all the macroblock slices of a given frame,
> > +i.e. if a frame requires several macroblock slices to be entirely decoded, then
> > +all these slices must be provided. In addition, the following controls must be
> > +set on the request:
> > +
> > +V4L2_CID_MPEG_VIDEO_MPEG2_SLICE_PARAMS
> > + Slice parameters (one per slice) for the current frame.
> > +
>
> How do we know how many slices are included in current frame?
>
> > +Optional controls:
> > +
> > +V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION
> > + Quantization matrices for the current frame.
>
> What happens if it's not specified?
>
> > +
> > +H.264 buffer content and controls
> > +---------------------------------
> > +The following information is valid when the ``OUTPUT`` queue is set to the
> > +``V4L2_PIX_FMT_H264_SLICE`` format.
>
> Ditto.
>
> > +
> > +The ``OUTPUT`` buffer must contain all the macroblock slices of a given frame,
> > +i.e. if a frame requires several macroblock slices to be entirely decoded, then
> > +all these slices must be provided. In addition, the following controls must be
> > +set on the request:
> > +
> > +V4L2_CID_MPEG_VIDEO_H264_SPS
> > + Instance of struct v4l2_ctrl_h264_sps, containing the SPS of to use with the
> > + frame.
> > +
> > +V4L2_CID_MPEG_VIDEO_H264_PPS
> > + Instance of struct v4l2_ctrl_h264_pps, containing the PPS of to use with the
> > + frame.
> > +
> > +V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX
> > + Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> > + matrix to use when decoding the frame.
> > +
> > +V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM
> > + Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
> > + entries as there are slices in the corresponding ``OUTPUT`` buffer.
> > +
> > +V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM
> > + Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
> > + decoding parameters for a H.264 frame.
> > +
> > +Seek
> > +====
> > +In order to seek, the client just needs to submit requests using input buffers
> > +corresponding to the new stream position. It must however be aware that
> > +resolution may have changed and follow the dynamic resolution change protocol in
>
> nit: We tend to call it "sequence" rather than "protocol" in other documents.
>
> > +that case. Also depending on the codec used, picture parameters (e.g. SPS/PPS
> > +for H.264) may have changed and the client is responsible for making sure that
> > +a valid state is sent to the kernel.
> > +
> > +The client is then free to ignore any returned ``CAPTURE`` buffer that comes
> > +from the pre-seek position.
> > +
> > +Pause
> > +=====
> > +
> > +In order to pause, the client should just cease queuing buffers onto the
> > +``OUTPUT`` queue. This is different from the general V4L2 API definition of
> > +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue.
> > +Without source bitstream data, there is no data to process and the hardware
> > +remains idle.
>
> This behavior is by design of memory-to-memory devices, so I'm not
> sure we really need this section. Perhaps we could move it to a
> separate document which explains m2m basics.
>
> > +
> > +Dynamic resolution change
> > +=========================
> > +
> > +If the client detects a resolution change in the stream, it may need to
> > +reallocate the ``CAPTURE`` buffers to fit the new size.
> > +
> > +1. Wait until all submitted requests have completed and dequeue the
> > + corresponding output buffers.
> > +
> > +2. Call :c:func:`VIDIOC_STREAMOFF` on the ``CAPTURE`` queue.
> > +
> > +3. Free all ``CAPTURE`` buffers by calling :c:func:`VIDIOC_REQBUFS` on the
> > + ``CAPTURE`` queue with a buffer count of zero.
> > +
> > +4. Set the new format and resolution on the ``CAPTURE`` queue.
> > +
> > +5. Allocate new ``CAPTURE`` buffers for the new resolution.
> > +
> > +6. Call :c:func:`VIDIOC_STREAMON` on the ``CAPTURE`` queue to resume the stream.
> > +
> > +The client can then start queueing new ``CAPTURE`` buffers and submit requests
> > +to decode the next buffers at the new resolution.
>
> There is some inconsistency here. In initialization sequence, we set
> OUTPUT format to coded width and height and had the driver set a sane
> default CAPTURE format. What happens to OUTPUT format? It definitely
> wouldn't make sense if it stayed at the initial coded size.
>
> > +
> > +Drain
> > +=====
> > +
> > +In order to drain the stream on a stateless decoder, the client just needs to
> > +wait until all the submitted requests are completed. There is no need to send a
> > +``V4L2_DEC_CMD_STOP`` command since requests are processed sequentially by the
> > +driver.
>
> Is there a need to include this section? I feel like it basically says
> "Drain sequence: There is no drain sequence." ;)
>
> > +
> > +End of stream
> > +=============
> > +
> > +If the decoder encounters an end of stream marking in the stream, the
> > +driver must send a ``V4L2_EVENT_EOS`` event to the client after all frames
> > +are decoded and ready to be dequeued on the ``CAPTURE`` queue, with the
> > +:c:type:`v4l2_buffer` ``flags`` set to ``V4L2_BUF_FLAG_LAST``. This
> > +behavior is identical to the drain sequence triggered by the client via
> > +``V4L2_DEC_CMD_STOP``.
>
> The client parses the stream, so it should be also responsible for end
> of stream handling. There isn't anything to be signaled by the driver
> here (nor the driver could actually signal), so I'd just remove this
> section completely.
>
> > diff --git a/Documentation/media/uapi/v4l/devices.rst b/Documentation/media/uapi/v4l/devices.rst
> > index 1822c66c2154..a8e568eda7d8 100644
> > --- a/Documentation/media/uapi/v4l/devices.rst
> > +++ b/Documentation/media/uapi/v4l/devices.rst
> > @@ -16,6 +16,7 @@ Interfaces
> > dev-osd
> > dev-codec
> > dev-decoder
> > + dev-stateless-decoder
> > dev-encoder
> > dev-effect
> > dev-raw-vbi
> > diff --git a/Documentation/media/uapi/v4l/extended-controls.rst b/Documentation/media/uapi/v4l/extended-controls.rst
> > index a9252225b63e..c0411ebf4c12 100644
> > --- a/Documentation/media/uapi/v4l/extended-controls.rst
> > +++ b/Documentation/media/uapi/v4l/extended-controls.rst
> > @@ -810,6 +810,29 @@ enum v4l2_mpeg_video_bitrate_mode -
> > otherwise the decoder expects a single frame in per buffer.
> > Applicable to the decoder, all codecs.
> >
> > +``V4L2_CID_MPEG_VIDEO_H264_SPS``
> > + Instance of struct v4l2_ctrl_h264_sps, containing the SPS of to use with
> > + the next queued frame. Applicable to the H.264 stateless decoder.
> > +
> > +``V4L2_CID_MPEG_VIDEO_H264_PPS``
> > + Instance of struct v4l2_ctrl_h264_pps, containing the PPS of to use with
> > + the next queued frame. Applicable to the H.264 stateless decoder.
> > +
> > +``V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX``
> > + Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> > + matrix to use when decoding the next queued frame. Applicable to the H.264
> > + stateless decoder.
> > +
> > +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM``
> > + Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
> > + entries as there are slices in the corresponding ``OUTPUT`` buffer.
> > + Applicable to the H.264 stateless decoder.
> > +
> > +``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM``
> > + Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
> > + decoding parameters for a H.264 frame. Applicable to the H.264 stateless
> > + decoder.
> > +
>
> This seems to be roughly the same as in "H.264 buffer content and
> controls". IMHO we should just keep the controls described here and
> make the other file just cross reference to these descriptions.
>
> Also, I guess this would eventually end up in the patch that adds
> those controls and is just included here for RFC purposes, right?
>
> Best regards,
> Tomasz

Attachments:

signature.asc (201.00 B)
This is a digitally signed message part

2018-09-10 12:51:43

[permalink] [raw]

Subject: Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

On 09/10/2018 01:57 PM, Hans Verkuil wrote:
> On 09/10/2018 01:25 PM, Hans Verkuil wrote:
>>> +
>>> +Decoding
>>> +========
>>> +
>>> +For each frame, the client is responsible for submitting a request to which the
>>> +following is attached:
>>> +
>>> +* Exactly one frame worth of encoded data in a buffer submitted to the
>>> + ``OUTPUT`` queue,
>>> +* All the controls relevant to the format being decoded (see below for details).
>>> +
>>> +``CAPTURE`` buffers must not be part of the request, but must be queued
>>> +independently. The driver will pick one of the queued ``CAPTURE`` buffers and
>>> +decode the frame into it. Although the client has no control over which
>>> +``CAPTURE`` buffer will be used with a given ``OUTPUT`` buffer, it is guaranteed
>>> +that ``CAPTURE`` buffers will be returned in decode order (i.e. the same order
>>> +as ``OUTPUT`` buffers were submitted), so it is trivial to associate a dequeued
>>> +``CAPTURE`` buffer to its originating request and ``OUTPUT`` buffer.
>>> +
>>> +If the request is submitted without an ``OUTPUT`` buffer or if one of the
>>> +required controls are missing, then :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return
>>> +``-EINVAL``.
>>
>> Not entirely true: if buffers are missing, then ENOENT is returned. Missing required
>> controls or more than one OUTPUT buffer will result in EINVAL. This per the latest
>> Request API changes.
>>
>> Decoding errors are signaled by the ``CAPTURE`` buffers being
>>> +dequeued carrying the ``V4L2_BUF_FLAG_ERROR`` flag.
>>
>> Add here that if the reference frame had an error, then all other frames that refer
>> to it should also set the ERROR flag. It is up to userspace to decide whether or
>> not to drop them (part of the frame might still be valid).
>>
>> I am not sure whether this should be documented, but there are some additional
>> restrictions w.r.t. reference frames:
>>
>> Since decoders need access to the decoded reference frames there are some corner
>> cases that need to be checked:
>>
>> 1) V4L2_MEMORY_USERPTR cannot be used for the capture queue: the driver does not
>> know when a malloced but dequeued buffer is freed, so the reference frame
>> could suddenly be gone.
>>
>> 2) V4L2_MEMORY_DMABUF can be used, but drivers should check that the dma buffer is
>> still available AND increase the dmabuf refcount while it is used by the HW.
>>
>> 3) What to do if userspace has requeued a buffer containing a reference frame,
>> and you want to decode a B/P-frame that refers to that buffer? We need to
>> check against that: I think that when you queue a capture buffer whose index
>> is used in a pending request as a reference frame, than that should fail with
>> an error. And trying to queue a request referring to a buffer that has been
>> requeued should also fail.
>>
>> We might need to add some support for this in v4l2-mem2mem.c or vb2.
>>
>> We will have similar (but not quite identical) issues with stateless encoders.
>
> Related to this is the question whether buffer indices that are used to refer
> to reference frames should refer to the capture or output queue.
>
> Using capture indices works if you never queue more than one request at a time:
> you know exactly what the capture buffer index is of the decoded I-frame, and
> you can use that in the following requests.
>
> But if multiple requests are queued, then you don't necessarily know to which
> capture buffer an I-frame will be decoded, so then you can't provide this index
> to following B/P-frames. This puts restrictions on userspace: you can only
> queue B/P-frames once you have decoded the I-frame. This might be perfectly
> acceptable, though.
>
> Using output buffer indices will work (the driver will know to which capture
> buffer index the I-frames mapped), but it requires that the output buffer that
> contained a reference frame isn't requeued, since that means that the driver
> will lose this mapping. I think this will make things too confusing, though.
>
> A third option is that you don't refer to reference frames by buffer index,
> but instead by some other counter (sequence counter, perhaps?).

Definitely not sequence number, since it has the same problem as buffer index.

This could be a value relative to the frame you are trying to decode: i.e. 'use
the capture buffer from X frames ago'. This makes it index independent and is
still easy to keep track of inside the driver and application.

Regards,

Hans

2018-09-11 06:10:14

[permalink] [raw]

Subject: Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

On Mon, Sep 10, 2018 at 4:17 PM Tomasz Figa <[email protected]> wrote:
>
> Hi Alex,
>
> +Maxime Ripard +Ezequiel Garcia +Nicolas Dufresne
>
> [Not snipping intentionally.]
>
> On Fri, Aug 31, 2018 at 4:48 PM Alexandre Courbot <[email protected]> wrote:
> >
> > This patch documents the protocol that user-space should follow when
> > communicating with stateless video decoders. It is based on the
> > following references:
> >
> > * The current protocol used by Chromium (converted from config store to
> > request API)
> >
> > * The submitted Cedrus VPU driver
> >
> > As such, some things may not be entirely consistent with the current
> > state of drivers, so it would be great if all stakeholders could point
> > out these inconsistencies. :)
> >
> > This patch is supposed to be applied on top of the Request API V18 as
> > well as the memory-to-memory video decoder interface series by Tomasz
> > Figa.
> >
> > It should be considered an early RFC.
>
> Thanks a lot. I think this gives us a much better start already than
> what we had with stateful API without any documentation. :)
>
> Please see my comments inline.
>
> >
> > Signed-off-by: Alexandre Courbot <[email protected]>
> > ---
> > .../media/uapi/v4l/dev-stateless-decoder.rst | 413 ++++++++++++++++++
> > Documentation/media/uapi/v4l/devices.rst | 1 +
> > .../media/uapi/v4l/extended-controls.rst | 23 +
> > 3 files changed, 437 insertions(+)
> > create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > new file mode 100644
> > index 000000000000..bf7b13a8ee16
> > --- /dev/null
> > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > @@ -0,0 +1,413 @@
> > +.. -*- coding: utf-8; mode: rst -*-
> > +
> > +.. _stateless_decoder:
> > +
> > +**************************************************
> > +Memory-to-memory Stateless Video Decoder Interface
> > +**************************************************
> > +
> > +A stateless decoder is a decoder that works without retaining any kind of state
> > +between processing frames. This means that each frame is decoded independently
> > +of any previous and future frames, and that the client is responsible for
> > +maintaining the decoding state and providing it to the driver. This is in
> > +contrast to the stateful video decoder interface, where the hardware maintains
> > +the decoding state and all the client has to do is to provide the raw encoded
> > +stream.
> > +
> > +This section describes how user-space ("the client") is expected to communicate
> > +with such decoders in order to successfully decode an encoded stream. Compared
> > +to stateful codecs, the driver/client protocol is simpler, but cost of this
> > +simplicity is extra complexity in the client which must maintain the decoding
> > +state.
> > +
> > +Querying capabilities
> > +=====================
> > +
> > +1. To enumerate the set of coded formats supported by the driver, the client
> > + calls :c:func:`VIDIOC_ENUM_FMT` on the ``OUTPUT`` queue.
> > +
> > + * The driver must always return the full set of supported formats for the
> > + currently set ``OUTPUT`` format, irrespective of the format currently set
> > + on the ``CAPTURE`` queue.
> > +
> > +2. To enumerate the set of supported raw formats, the client calls
> > + :c:func:`VIDIOC_ENUM_FMT` on the ``CAPTURE`` queue.
> > +
> > + * The driver must return only the formats supported for the format currently
> > + active on the ``OUTPUT`` queue.
> > +
> > + * In order to enumerate raw formats supported by a given coded format, the
> > + client must thus set that coded format on the ``OUTPUT`` queue first, and
> > + then enumerate the ``CAPTURE`` queue.
>
> One thing that we might want to note here is that available CAPTURE
> formats may depend on more factors than just current OUTPUT format.
> Depending on encoding options of the stream being decoded (e.g. VP9
> profile), the list of supported format might be limited to one of YUV
> 4:2:0, 4:2:2 or 4:4:4 family of formats, but might be any of them, if
> the hardware supports conversion.
>
> I was wondering whether we shouldn't require the client to set all the
> necessary initial codec-specific controls before querying CAPTURE
> formats, but since we don't have any compatibility constraints here,
> as opposed to the stateful API, perhaps we could just make CAPTURE
> queue completely independent and have a source change event raised if
> the controls set later make existing format invalid.
>
> I'd like to ask the userspace folks here (Nicolas?), whether:
> 1) we need to know the exact list of formats that are guaranteed to be
> supported for playing back the whole video, or
> 2) we're okay with some rough approximation here, or
> 3) maybe we don't even need to enumerate formats on CAPTURE beforehand?

Since user-space will need to parse all the profile and other
information and submit it to the kernel anyway, it seems to me that
requiring it to submit that information before setting the CAPTURE
format makes the most sense. Otherwise all clients should listen for
events and be capable of renegotiating the format, which would
complicate them some more.

>
> To be honest, I'm not sure 1) is even possible, since the resolution
> (or some other stream parameters) could change mid-stream.

Even in that case, the client is supposed to handle that change, so I
don't think this makes a big difference compared to initial
negotiation?

>
> > +
> > +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> > + resolutions for a given format, passing desired pixel format in
> > + :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> > +
> > + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``OUTPUT`` queue
> > + must include all possible coded resolutions supported by the decoder
> > + for given coded pixel format.
> > +
> > + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``CAPTURE`` queue
> > + must include all possible frame buffer resolutions supported by the
> > + decoder for given raw pixel format and coded format currently set on
> > + ``OUTPUT`` queue.
> > +
> > + .. note::
> > +
> > + The client may derive the supported resolution range for a
> > + combination of coded and raw format by setting width and height of
> > + ``OUTPUT`` format to 0 and calculating the intersection of
> > + resolutions returned from calls to :c:func:`VIDIOC_ENUM_FRAMESIZES`
> > + for the given coded and raw formats.
> > +
> > +4. Supported profiles and levels for given format, if applicable, may be
> > + queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> > +
> > +Initialization
> > +==============
> > +
> > +1. *[optional]* Enumerate supported ``OUTPUT`` formats and resolutions. See
> > + capability enumeration.
> > +
> > +2. Set the coded format on the ``OUTPUT`` queue via :c:func:`VIDIOC_S_FMT`
> > +
> > + * **Required fields:**
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > + ``pixelformat``
> > + a coded pixel format
> > +
> > + ``width``, ``height``
> > + parsed width and height of the coded format
>
> Perhaps "coded width and height parsed from the stream" could be a bit
> more clear?

Indeed, fixed.

>
> > +
> > + other fields
> > + follow standard semantics
> > +
> > + .. note::
> > +
> > + Changing ``OUTPUT`` format may change currently set ``CAPTURE``
> > + format. The driver will derive a new ``CAPTURE`` format from
> > + ``OUTPUT`` format being set, including resolution, colorimetry
> > + parameters, etc. If the client needs a specific ``CAPTURE`` format,
> > + it must adjust it afterwards.
> > +
> > +3. *[optional]* Get minimum number of buffers required for ``OUTPUT``
> > + queue via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to
> > + use more buffers than minimum required by hardware/format.
> > +
> > + * **Required fields:**
> > +
> > + ``id``
> > + set to ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> > +
> > + * **Return fields:**
> > +
> > + ``value``
> > + required number of ``OUTPUT`` buffers for the currently set
> > + format
>
> I'm not very sure if this is useful for stateless API, but no strong opinion.

It probably isn't. Removed that part.

>
> > +
> > +4. Call :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` queue to get format for the
> > + destination buffers parsed/decoded from the bitstream.
> > +
> > + * **Required fields:**
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > + * **Return fields:**
> > +
> > + ``width``, ``height``
> > + frame buffer resolution for the decoded frames
> > +
> > + ``pixelformat``
> > + pixel format for decoded frames
> > +
> > + ``num_planes`` (for _MPLANE ``type`` only)
> > + number of planes for pixelformat
> > +
> > + ``sizeimage``, ``bytesperline``
> > + as per standard semantics; matching frame buffer format
> > +
> > + .. note::
> > +
> > + The value of ``pixelformat`` may be any pixel format supported for the
> > + ``OUTPUT`` format, based on the hardware capabilities. It is suggested
> > + that driver chooses the preferred/optimal format for given configuration.
> > + For example, a YUV format may be preferred over an RGB format, if
> > + additional conversion step would be required.
> > +
> > +5. *[optional]* Enumerate ``CAPTURE`` formats via :c:func:`VIDIOC_ENUM_FMT` on
> > + ``CAPTURE`` queue. The client may use this ioctl to discover which
> > + alternative raw formats are supported for the current ``OUTPUT`` format and
> > + select one of them via :c:func:`VIDIOC_S_FMT`.
> > +
> > + .. note::
> > +
> > + The driver will return only formats supported for the currently selected
> > + ``OUTPUT`` format, even if more formats may be supported by the driver in
> > + general.
> > +
> > + For example, a driver/hardware may support YUV and RGB formats for
> > + resolutions 1920x1088 and lower, but only YUV for higher resolutions (due
> > + to hardware limitations). After setting a resolution of 1920x1088 or lower
> > + as the ``OUTPUT`` format, :c:func:`VIDIOC_ENUM_FMT` may return a set of
> > + YUV and RGB pixel formats, but after setting a resolution higher than
> > + 1920x1088, the driver will not return RGB, unsupported for this
> > + resolution.
> > +
> > +6. *[optional]* Choose a different ``CAPTURE`` format than suggested via
> > + :c:func:`VIDIOC_S_FMT` on ``CAPTURE`` queue. It is possible for the client to
> > + choose a different format than selected/suggested by the driver in
> > + :c:func:`VIDIOC_G_FMT`.
> > +
> > + * **Required fields:**
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > + ``pixelformat``
> > + a raw pixel format
> > +
> > + .. note::
> > +
> > + Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available
> > + formats after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> > + out a set of allowed formats for given configuration, but not required,
> > + if the client can accept the defaults.
>
> V4L2_EVENT_SOURCE_CHANGE was not mentioned in earlier steps. I suppose
> it's a leftover from the stateful API? :)

Correct. :)

>
> Still, I think we may eventually need source change events, because of
> the reasons I mentioned above.

I'd personally prefer to do without, but if they become a requirement
(which I am not convinced of yet) we will indeed have no choice.

>
> > +
> > +7. *[optional]* Get minimum number of buffers required for ``CAPTURE`` queue via
> > + :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use more buffers
> > + than minimum required by hardware/format.
> > +
> > + * **Required fields:**
> > +
> > + ``id``
> > + set to ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> > +
> > + * **Return fields:**
> > +
> > + ``value``
> > + minimum number of buffers required to decode the stream parsed in this
> > + initialization sequence.
> > +
> > + .. note::
> > +
> > + Note that the minimum number of buffers must be at least the number
> > + required to successfully decode the current stream. This may for example
> > + be the required DPB size for an H.264 stream given the parsed stream
> > + configuration (resolution, level).
>
> I'm not sure if this really makes sense for stateless API, because DPB
> management is done by the client, so it already has all the data to
> know how many buffers would be needed/optimal.

Yeah, this is probably not useful here.

>
> > +
> > +8. Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on
> > + ``OUTPUT`` queue.
> > +
> > + * **Required fields:**
> > +
> > + ``count``
> > + requested number of buffers to allocate; greater than zero
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > + ``memory``
> > + follows standard semantics
> > +
> > + ``sizeimage``
> > + follows standard semantics; the client is free to choose any
> > + suitable size, however, it may be subject to change by the
> > + driver
> > +
> > + * **Return fields:**
> > +
> > + ``count``
> > + actual number of buffers allocated
> > +
> > + * The driver must adjust count to minimum of required number of ``OUTPUT``
> > + buffers for given format and count passed. The client must check this
> > + value after the ioctl returns to get the number of buffers allocated.
> > +
> > + .. note::
> > +
> > + To allocate more than minimum number of buffers (for pipeline depth), use
> > + G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to get minimum number of
> > + buffers required by the driver/format, and pass the obtained value plus
> > + the number of additional buffers needed in count to
> > + :c:func:`VIDIOC_REQBUFS`.
> > +
> > +9. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> > + ``CAPTURE`` queue.
> > +
> > + * **Required fields:**
> > +
> > + ``count``
> > + requested number of buffers to allocate; greater than zero
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > + ``memory``
> > + follows standard semantics
> > +
> > + * **Return fields:**
> > +
> > + ``count``
> > + adjusted to allocated number of buffers
> > +
> > + * The driver must adjust count to minimum of required number of
> > + destination buffers for given format and stream configuration and the
> > + count passed. The client must check this value after the ioctl
> > + returns to get the number of buffers allocated.
> > +
> > + .. note::
> > +
> > + To allocate more than minimum number of buffers (for pipeline
> > + depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> > + get minimum number of buffers required, and pass the obtained value
> > + plus the number of additional buffers needed in count to
> > + :c:func:`VIDIOC_REQBUFS`.
> > +
> > +10. Allocate requests (likely one per ``OUTPUT`` buffer) via
> > + :c:func:`MEDIA_IOC_REQUEST_ALLOC` on the media device.
> > +
> > +11. Start streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
> > + :c:func:`VIDIOC_STREAMON`.
> > +
> > +Decoding
> > +========
> > +
> > +For each frame, the client is responsible for submitting a request to which the
> > +following is attached:
> > +
> > +* Exactly one frame worth of encoded data in a buffer submitted to the
> > + ``OUTPUT`` queue,
>
> Just to make sure, in case of H.264, that would include all the slices
> of the frame in one buffer, right?

Yes. I thought "one frame worth of encoded data" already carried that
meaning, but do you think this should be said more explicitly?

>
> > +* All the controls relevant to the format being decoded (see below for details).
> > +
> > +``CAPTURE`` buffers must not be part of the request, but must be queued
> > +independently. The driver will pick one of the queued ``CAPTURE`` buffers and
> > +decode the frame into it. Although the client has no control over which
> > +``CAPTURE`` buffer will be used with a given ``OUTPUT`` buffer, it is guaranteed
> > +that ``CAPTURE`` buffers will be returned in decode order (i.e. the same order
> > +as ``OUTPUT`` buffers were submitted), so it is trivial to associate a dequeued
> > +``CAPTURE`` buffer to its originating request and ``OUTPUT`` buffer.
> > +
> > +If the request is submitted without an ``OUTPUT`` buffer or if one of the
> > +required controls are missing, then :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return
> > +``-EINVAL``.
>
> As per some of the other discussion threads we had before (and I
> linked to in my previous reply), we might want one of the following:
> 1) precisely define the list of controls needed with the
> fine-granularity of all possible stream feature combinations,
> 2) make only some very basic controls mandatory and never fail if
> other controls are not specified.
>
> IMHO 2) has a potential to lead to userspace relying on undefined
> behavior with some controls not being set (for
> laziness/simplicity/whatever excuse the author can think of), so I'd
> personally go with 1)...

Strongly agree here. Undefined behavior is the devil.

The list may become very exhaustive though. Finding a way to present
it clearly and as concisely as possible within the formatting options
allowed by Sphinx may be a challenge.

>
> > Decoding errors are signaled by the ``CAPTURE`` buffers being
> > +dequeued carrying the ``V4L2_BUF_FLAG_ERROR`` flag.
> > +
> > +The contents of source ``OUTPUT`` buffers, as well as the controls that must be
> > +set on the request, depend on active coded pixel format and might be affected by
> > +codec-specific extended controls, as stated in documentation of each format
> > +individually.
> > +
> > +MPEG-2 buffer content and controls
> > +----------------------------------
> > +The following information is valid when the ``OUTPUT`` queue is set to the
> > +``V4L2_PIX_FMT_MPEG2_SLICE`` format.
>
> Perhaps we should document the controls there instead? I think that
> was the conclusion from the discussions over the stateful API.

Do you mean, on the format page? Sounds reasonable.

>
> > +
> > +The ``OUTPUT`` buffer must contain all the macroblock slices of a given frame,
> > +i.e. if a frame requires several macroblock slices to be entirely decoded, then
> > +all these slices must be provided. In addition, the following controls must be
> > +set on the request:
> > +
> > +V4L2_CID_MPEG_VIDEO_MPEG2_SLICE_PARAMS
> > + Slice parameters (one per slice) for the current frame.
> > +
>
> How do we know how many slices are included in current frame?

Mmm, indeed. It seems like the Cedrus driver assumes only one slice
per frame. Paul, is that correct?

>
> > +Optional controls:
> > +
> > +V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION
> > + Quantization matrices for the current frame.
>
> What happens if it's not specified?

Cedrus uses a default quantization matrix in that case. I am not
particularly knowledgeable about the MPEG-2 spec, but maybe this is
because streams are not required to include quantization matrices?

>
> > +
> > +H.264 buffer content and controls
> > +---------------------------------
> > +The following information is valid when the ``OUTPUT`` queue is set to the
> > +``V4L2_PIX_FMT_H264_SLICE`` format.
>
> Ditto.
>
> > +
> > +The ``OUTPUT`` buffer must contain all the macroblock slices of a given frame,
> > +i.e. if a frame requires several macroblock slices to be entirely decoded, then
> > +all these slices must be provided. In addition, the following controls must be
> > +set on the request:
> > +
> > +V4L2_CID_MPEG_VIDEO_H264_SPS
> > + Instance of struct v4l2_ctrl_h264_sps, containing the SPS of to use with the
> > + frame.
> > +
> > +V4L2_CID_MPEG_VIDEO_H264_PPS
> > + Instance of struct v4l2_ctrl_h264_pps, containing the PPS of to use with the
> > + frame.
> > +
> > +V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX
> > + Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> > + matrix to use when decoding the frame.
> > +
> > +V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM
> > + Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
> > + entries as there are slices in the corresponding ``OUTPUT`` buffer.
> > +
> > +V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM
> > + Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
> > + decoding parameters for a H.264 frame.
> > +
> > +Seek
> > +====
> > +In order to seek, the client just needs to submit requests using input buffers
> > +corresponding to the new stream position. It must however be aware that
> > +resolution may have changed and follow the dynamic resolution change protocol in
>
> nit: We tend to call it "sequence" rather than "protocol" in other documents.

Replaced in the document.

>
> > +that case. Also depending on the codec used, picture parameters (e.g. SPS/PPS
> > +for H.264) may have changed and the client is responsible for making sure that
> > +a valid state is sent to the kernel.
> > +
> > +The client is then free to ignore any returned ``CAPTURE`` buffer that comes
> > +from the pre-seek position.
> > +
> > +Pause
> > +=====
> > +
> > +In order to pause, the client should just cease queuing buffers onto the
> > +``OUTPUT`` queue. This is different from the general V4L2 API definition of
> > +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue.
> > +Without source bitstream data, there is no data to process and the hardware
> > +remains idle.
>
> This behavior is by design of memory-to-memory devices, so I'm not
> sure we really need this section. Perhaps we could move it to a
> separate document which explains m2m basics.
>
> > +
> > +Dynamic resolution change
> > +=========================
> > +
> > +If the client detects a resolution change in the stream, it may need to
> > +reallocate the ``CAPTURE`` buffers to fit the new size.
> > +
> > +1. Wait until all submitted requests have completed and dequeue the
> > + corresponding output buffers.
> > +
> > +2. Call :c:func:`VIDIOC_STREAMOFF` on the ``CAPTURE`` queue.
> > +
> > +3. Free all ``CAPTURE`` buffers by calling :c:func:`VIDIOC_REQBUFS` on the
> > + ``CAPTURE`` queue with a buffer count of zero.
> > +
> > +4. Set the new format and resolution on the ``CAPTURE`` queue.
> > +
> > +5. Allocate new ``CAPTURE`` buffers for the new resolution.
> > +
> > +6. Call :c:func:`VIDIOC_STREAMON` on the ``CAPTURE`` queue to resume the stream.
> > +
> > +The client can then start queueing new ``CAPTURE`` buffers and submit requests
> > +to decode the next buffers at the new resolution.
>
> There is some inconsistency here. In initialization sequence, we set
> OUTPUT format to coded width and height and had the driver set a sane
> default CAPTURE format. What happens to OUTPUT format? It definitely
> wouldn't make sense if it stayed at the initial coded size.

You're right. We need to set the resolution on OUTPUT first, and maybe
reallocate input buffers as well since a resolution increase may mean
they are now too small to contain one encoded frame.

Actually, in the case of stateless codecs a resolution change may mean
re-doing initialization from step #2. I suppose a resolution change
can only happen on an IDR frame, so all the user-maintained state
could be dropped anyway.

>
> > +
> > +Drain
> > +=====
> > +
> > +In order to drain the stream on a stateless decoder, the client just needs to
> > +wait until all the submitted requests are completed. There is no need to send a
> > +``V4L2_DEC_CMD_STOP`` command since requests are processed sequentially by the
> > +driver.
>
> Is there a need to include this section? I feel like it basically says
> "Drain sequence: There is no drain sequence." ;)

This is mostly to match what we have in the stateful doc. Not saying
anything may make it look like this has not been thoroughly tought. :)

>
> > +
> > +End of stream
> > +=============
> > +
> > +If the decoder encounters an end of stream marking in the stream, the
> > +driver must send a ``V4L2_EVENT_EOS`` event to the client after all frames
> > +are decoded and ready to be dequeued on the ``CAPTURE`` queue, with the
> > +:c:type:`v4l2_buffer` ``flags`` set to ``V4L2_BUF_FLAG_LAST``. This
> > +behavior is identical to the drain sequence triggered by the client via
> > +``V4L2_DEC_CMD_STOP``.
>
> The client parses the stream, so it should be also responsible for end
> of stream handling. There isn't anything to be signaled by the driver
> here (nor the driver could actually signal), so I'd just remove this
> section completely.

Indeed.

>
> > diff --git a/Documentation/media/uapi/v4l/devices.rst b/Documentation/media/uapi/v4l/devices.rst
> > index 1822c66c2154..a8e568eda7d8 100644
> > --- a/Documentation/media/uapi/v4l/devices.rst
> > +++ b/Documentation/media/uapi/v4l/devices.rst
> > @@ -16,6 +16,7 @@ Interfaces
> > dev-osd
> > dev-codec
> > dev-decoder
> > + dev-stateless-decoder
> > dev-encoder
> > dev-effect
> > dev-raw-vbi
> > diff --git a/Documentation/media/uapi/v4l/extended-controls.rst b/Documentation/media/uapi/v4l/extended-controls.rst
> > index a9252225b63e..c0411ebf4c12 100644
> > --- a/Documentation/media/uapi/v4l/extended-controls.rst
> > +++ b/Documentation/media/uapi/v4l/extended-controls.rst
> > @@ -810,6 +810,29 @@ enum v4l2_mpeg_video_bitrate_mode -
> > otherwise the decoder expects a single frame in per buffer.
> > Applicable to the decoder, all codecs.
> >
> > +``V4L2_CID_MPEG_VIDEO_H264_SPS``
> > + Instance of struct v4l2_ctrl_h264_sps, containing the SPS of to use with
> > + the next queued frame. Applicable to the H.264 stateless decoder.
> > +
> > +``V4L2_CID_MPEG_VIDEO_H264_PPS``
> > + Instance of struct v4l2_ctrl_h264_pps, containing the PPS of to use with
> > + the next queued frame. Applicable to the H.264 stateless decoder.
> > +
> > +``V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX``
> > + Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> > + matrix to use when decoding the next queued frame. Applicable to the H.264
> > + stateless decoder.
> > +
> > +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM``
> > + Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
> > + entries as there are slices in the corresponding ``OUTPUT`` buffer.
> > + Applicable to the H.264 stateless decoder.
> > +
> > +``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM``
> > + Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
> > + decoding parameters for a H.264 frame. Applicable to the H.264 stateless
> > + decoder.
> > +
>
> This seems to be roughly the same as in "H.264 buffer content and
> controls". IMHO we should just keep the controls described here and
> make the other file just cross reference to these descriptions.
>
> Also, I guess this would eventually end up in the patch that adds
> those controls and is just included here for RFC purposes, right?

Yes, this is here to make things easier to follow, and also because I
don't think that the patch adding these controls had documentation?
(sorry if it had!)

Thanks for the review!
Alex.

2018-09-11 08:40:58

[permalink] [raw]

Subject: Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

Hi Hans, thanks for the comments!

On Mon, Sep 10, 2018 at 8:25 PM Hans Verkuil <[email protected]> wrote:
>
> Hi Alexandre,
>
> Thank you very much for working on this, much appreciated!

It's still quite rough around the edges (and everywhere else
actually), but that will do to get things started...

>
> On 08/31/2018 09:47 AM, Alexandre Courbot wrote:
> > This patch documents the protocol that user-space should follow when
> > communicating with stateless video decoders. It is based on the
> > following references:
> >
> > * The current protocol used by Chromium (converted from config store to
> > request API)
> >
> > * The submitted Cedrus VPU driver
> >
> > As such, some things may not be entirely consistent with the current
> > state of drivers, so it would be great if all stakeholders could point
> > out these inconsistencies. :)
> >
> > This patch is supposed to be applied on top of the Request API V18 as
> > well as the memory-to-memory video decoder interface series by Tomasz
> > Figa.
> >
> > It should be considered an early RFC.
> >
> > Signed-off-by: Alexandre Courbot <[email protected]>
> > ---
> > .../media/uapi/v4l/dev-stateless-decoder.rst | 413 ++++++++++++++++++
> > Documentation/media/uapi/v4l/devices.rst | 1 +
> > .../media/uapi/v4l/extended-controls.rst | 23 +
> > 3 files changed, 437 insertions(+)
> > create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > new file mode 100644
> > index 000000000000..bf7b13a8ee16
> > --- /dev/null
> > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > @@ -0,0 +1,413 @@
> > +.. -*- coding: utf-8; mode: rst -*-
> > +
> > +.. _stateless_decoder:
> > +
> > +**************************************************
> > +Memory-to-memory Stateless Video Decoder Interface
> > +**************************************************
> > +
> > +A stateless decoder is a decoder that works without retaining any kind of state
> > +between processing frames. This means that each frame is decoded independently
> > +of any previous and future frames, and that the client is responsible for
> > +maintaining the decoding state and providing it to the driver. This is in
> > +contrast to the stateful video decoder interface, where the hardware maintains
> > +the decoding state and all the client has to do is to provide the raw encoded
> > +stream.
> > +
> > +This section describes how user-space ("the client") is expected to communicate
> > +with such decoders in order to successfully decode an encoded stream. Compared
> > +to stateful codecs, the driver/client protocol is simpler, but cost of this
> > +simplicity is extra complexity in the client which must maintain the decoding
> > +state.
> > +
> > +Querying capabilities
> > +=====================
> > +
> > +1. To enumerate the set of coded formats supported by the driver, the client
> > + calls :c:func:`VIDIOC_ENUM_FMT` on the ``OUTPUT`` queue.
> > +
> > + * The driver must always return the full set of supported formats for the
> > + currently set ``OUTPUT`` format, irrespective of the format currently set
> > + on the ``CAPTURE`` queue.
> > +
> > +2. To enumerate the set of supported raw formats, the client calls
> > + :c:func:`VIDIOC_ENUM_FMT` on the ``CAPTURE`` queue.
> > +
> > + * The driver must return only the formats supported for the format currently
> > + active on the ``OUTPUT`` queue.
> > +
> > + * In order to enumerate raw formats supported by a given coded format, the
> > + client must thus set that coded format on the ``OUTPUT`` queue first, and
> > + then enumerate the ``CAPTURE`` queue.
> > +
> > +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> > + resolutions for a given format, passing desired pixel format in
> > + :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> > +
> > + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``OUTPUT`` queue
> > + must include all possible coded resolutions supported by the decoder
> > + for given coded pixel format.
> > +
> > + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``CAPTURE`` queue
> > + must include all possible frame buffer resolutions supported by the
> > + decoder for given raw pixel format and coded format currently set on
> > + ``OUTPUT`` queue.
> > +
> > + .. note::
> > +
> > + The client may derive the supported resolution range for a
> > + combination of coded and raw format by setting width and height of
> > + ``OUTPUT`` format to 0 and calculating the intersection of
> > + resolutions returned from calls to :c:func:`VIDIOC_ENUM_FRAMESIZES`
> > + for the given coded and raw formats.
> > +
> > +4. Supported profiles and levels for given format, if applicable, may be
> > + queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> > +
> > +Initialization
> > +==============
> > +
> > +1. *[optional]* Enumerate supported ``OUTPUT`` formats and resolutions. See
> > + capability enumeration.
> > +
> > +2. Set the coded format on the ``OUTPUT`` queue via :c:func:`VIDIOC_S_FMT`
> > +
> > + * **Required fields:**
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > + ``pixelformat``
> > + a coded pixel format
> > +
> > + ``width``, ``height``
> > + parsed width and height of the coded format
> > +
> > + other fields
> > + follow standard semantics
> > +
> > + .. note::
> > +
> > + Changing ``OUTPUT`` format may change currently set ``CAPTURE``
> > + format. The driver will derive a new ``CAPTURE`` format from
> > + ``OUTPUT`` format being set, including resolution, colorimetry
> > + parameters, etc. If the client needs a specific ``CAPTURE`` format,
> > + it must adjust it afterwards.
> > +
> > +3. *[optional]* Get minimum number of buffers required for ``OUTPUT``
> > + queue via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to
> > + use more buffers than minimum required by hardware/format.
> > +
> > + * **Required fields:**
> > +
> > + ``id``
> > + set to ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> > +
> > + * **Return fields:**
> > +
> > + ``value``
> > + required number of ``OUTPUT`` buffers for the currently set
> > + format
> > +
> > +4. Call :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` queue to get format for the
> > + destination buffers parsed/decoded from the bitstream.
> > +
> > + * **Required fields:**
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > + * **Return fields:**
> > +
> > + ``width``, ``height``
> > + frame buffer resolution for the decoded frames
> > +
> > + ``pixelformat``
> > + pixel format for decoded frames
> > +
> > + ``num_planes`` (for _MPLANE ``type`` only)
> > + number of planes for pixelformat
> > +
> > + ``sizeimage``, ``bytesperline``
> > + as per standard semantics; matching frame buffer format
> > +
> > + .. note::
> > +
> > + The value of ``pixelformat`` may be any pixel format supported for the
> > + ``OUTPUT`` format, based on the hardware capabilities. It is suggested
> > + that driver chooses the preferred/optimal format for given configuration.
> > + For example, a YUV format may be preferred over an RGB format, if
> > + additional conversion step would be required.
> > +
> > +5. *[optional]* Enumerate ``CAPTURE`` formats via :c:func:`VIDIOC_ENUM_FMT` on
> > + ``CAPTURE`` queue. The client may use this ioctl to discover which
> > + alternative raw formats are supported for the current ``OUTPUT`` format and
> > + select one of them via :c:func:`VIDIOC_S_FMT`.
> > +
> > + .. note::
> > +
> > + The driver will return only formats supported for the currently selected
> > + ``OUTPUT`` format, even if more formats may be supported by the driver in
> > + general.
> > +
> > + For example, a driver/hardware may support YUV and RGB formats for
> > + resolutions 1920x1088 and lower, but only YUV for higher resolutions (due
> > + to hardware limitations). After setting a resolution of 1920x1088 or lower
> > + as the ``OUTPUT`` format, :c:func:`VIDIOC_ENUM_FMT` may return a set of
> > + YUV and RGB pixel formats, but after setting a resolution higher than
> > + 1920x1088, the driver will not return RGB, unsupported for this
> > + resolution.
> > +
> > +6. *[optional]* Choose a different ``CAPTURE`` format than suggested via
> > + :c:func:`VIDIOC_S_FMT` on ``CAPTURE`` queue. It is possible for the client to
> > + choose a different format than selected/suggested by the driver in
> > + :c:func:`VIDIOC_G_FMT`.
> > +
> > + * **Required fields:**
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > + ``pixelformat``
> > + a raw pixel format
> > +
> > + .. note::
> > +
> > + Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available
> > + formats after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> > + out a set of allowed formats for given configuration, but not required,
> > + if the client can accept the defaults.
> > +
> > +7. *[optional]* Get minimum number of buffers required for ``CAPTURE`` queue via
> > + :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use more buffers
> > + than minimum required by hardware/format.
> > +
> > + * **Required fields:**
> > +
> > + ``id``
> > + set to ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> > +
> > + * **Return fields:**
> > +
> > + ``value``
> > + minimum number of buffers required to decode the stream parsed in this
> > + initialization sequence.
> > +
> > + .. note::
> > +
> > + Note that the minimum number of buffers must be at least the number
> > + required to successfully decode the current stream. This may for example
> > + be the required DPB size for an H.264 stream given the parsed stream
> > + configuration (resolution, level).
> > +
> > +8. Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on
> > + ``OUTPUT`` queue.
> > +
> > + * **Required fields:**
> > +
> > + ``count``
> > + requested number of buffers to allocate; greater than zero
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > + ``memory``
> > + follows standard semantics
> > +
> > + ``sizeimage``
> > + follows standard semantics; the client is free to choose any
> > + suitable size, however, it may be subject to change by the
> > + driver
> > +
> > + * **Return fields:**
> > +
> > + ``count``
> > + actual number of buffers allocated
> > +
> > + * The driver must adjust count to minimum of required number of ``OUTPUT``
> > + buffers for given format and count passed. The client must check this
> > + value after the ioctl returns to get the number of buffers allocated.
> > +
> > + .. note::
> > +
> > + To allocate more than minimum number of buffers (for pipeline depth), use
> > + G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to get minimum number of
> > + buffers required by the driver/format, and pass the obtained value plus
> > + the number of additional buffers needed in count to
> > + :c:func:`VIDIOC_REQBUFS`.
> > +
> > +9. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> > + ``CAPTURE`` queue.
> > +
> > + * **Required fields:**
> > +
> > + ``count``
> > + requested number of buffers to allocate; greater than zero
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > + ``memory``
> > + follows standard semantics
> > +
> > + * **Return fields:**
> > +
> > + ``count``
> > + adjusted to allocated number of buffers
> > +
> > + * The driver must adjust count to minimum of required number of
> > + destination buffers for given format and stream configuration and the
> > + count passed. The client must check this value after the ioctl
> > + returns to get the number of buffers allocated.
> > +
> > + .. note::
> > +
> > + To allocate more than minimum number of buffers (for pipeline
> > + depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> > + get minimum number of buffers required, and pass the obtained value
> > + plus the number of additional buffers needed in count to
> > + :c:func:`VIDIOC_REQBUFS`.
> > +
> > +10. Allocate requests (likely one per ``OUTPUT`` buffer) via
> > + :c:func:`MEDIA_IOC_REQUEST_ALLOC` on the media device.
> > +
> > +11. Start streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
> > + :c:func:`VIDIOC_STREAMON`.
>
> If I am not mistaken, steps 1-11 are the same as for stateful codecs. It is better
> to just refer to that instead of copying them.

They are similar but not exactly identical, particularly after Tomasz'
comments. It may be difficult to avoid some duplication here.

>
> > +
> > +Decoding
> > +========
> > +
> > +For each frame, the client is responsible for submitting a request to which the
> > +following is attached:
> > +
> > +* Exactly one frame worth of encoded data in a buffer submitted to the
> > + ``OUTPUT`` queue,
> > +* All the controls relevant to the format being decoded (see below for details).
> > +
> > +``CAPTURE`` buffers must not be part of the request, but must be queued
> > +independently. The driver will pick one of the queued ``CAPTURE`` buffers and
> > +decode the frame into it. Although the client has no control over which
> > +``CAPTURE`` buffer will be used with a given ``OUTPUT`` buffer, it is guaranteed
> > +that ``CAPTURE`` buffers will be returned in decode order (i.e. the same order
> > +as ``OUTPUT`` buffers were submitted), so it is trivial to associate a dequeued
> > +``CAPTURE`` buffer to its originating request and ``OUTPUT`` buffer.
> > +
> > +If the request is submitted without an ``OUTPUT`` buffer or if one of the
> > +required controls are missing, then :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return
> > +``-EINVAL``.
>
> Not entirely true: if buffers are missing, then ENOENT is returned. Missing required
> controls or more than one OUTPUT buffer will result in EINVAL. This per the latest
> Request API changes.

Fixed that, thanks.

>
> Decoding errors are signaled by the ``CAPTURE`` buffers being
> > +dequeued carrying the ``V4L2_BUF_FLAG_ERROR`` flag.
>
> Add here that if the reference frame had an error, then all other frames that refer
> to it should also set the ERROR flag. It is up to userspace to decide whether or
> not to drop them (part of the frame might still be valid).

Noted and added.

>
> I am not sure whether this should be documented, but there are some additional
> restrictions w.r.t. reference frames:
>
> Since decoders need access to the decoded reference frames there are some corner
> cases that need to be checked:
>
> 1) V4L2_MEMORY_USERPTR cannot be used for the capture queue: the driver does not
> know when a malloced but dequeued buffer is freed, so the reference frame
> could suddenly be gone.

It it is confirmed that we cannot use USERPTR buffers as CAPTURE then
this probably needs to be documented. I wonder if there isn't a way to
avoid this by having vb2 keep a reference to the pages in such a way
that they would not be recycled after after userspace calls free() on
the buffer. Is that possible with user-allocated memory?

Not that I think that forbidding USERPTR buffers in this scenario
would be a big problem.

>
> 2) V4L2_MEMORY_DMABUF can be used, but drivers should check that the dma buffer is
> still available AND increase the dmabuf refcount while it is used by the HW.

Yeah, with DMABUF the above problem can easily be avoided at least.

>
> 3) What to do if userspace has requeued a buffer containing a reference frame,
> and you want to decode a B/P-frame that refers to that buffer? We need to
> check against that: I think that when you queue a capture buffer whose index
> is used in a pending request as a reference frame, than that should fail with
> an error. And trying to queue a request referring to a buffer that has been
> requeued should also fail.
>
> We might need to add some support for this in v4l2-mem2mem.c or vb2.

Sounds good, and we should document this as well.

2018-09-11 08:41:09

[permalink] [raw]

Subject: Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

On Mon, Sep 10, 2018 at 9:49 PM Hans Verkuil <[email protected]> wrote:
>
> On 09/10/2018 01:57 PM, Hans Verkuil wrote:
> > On 09/10/2018 01:25 PM, Hans Verkuil wrote:
> >>> +
> >>> +Decoding
> >>> +========
> >>> +
> >>> +For each frame, the client is responsible for submitting a request to which the
> >>> +following is attached:
> >>> +
> >>> +* Exactly one frame worth of encoded data in a buffer submitted to the
> >>> + ``OUTPUT`` queue,
> >>> +* All the controls relevant to the format being decoded (see below for details).
> >>> +
> >>> +``CAPTURE`` buffers must not be part of the request, but must be queued
> >>> +independently. The driver will pick one of the queued ``CAPTURE`` buffers and
> >>> +decode the frame into it. Although the client has no control over which
> >>> +``CAPTURE`` buffer will be used with a given ``OUTPUT`` buffer, it is guaranteed
> >>> +that ``CAPTURE`` buffers will be returned in decode order (i.e. the same order
> >>> +as ``OUTPUT`` buffers were submitted), so it is trivial to associate a dequeued
> >>> +``CAPTURE`` buffer to its originating request and ``OUTPUT`` buffer.
> >>> +
> >>> +If the request is submitted without an ``OUTPUT`` buffer or if one of the
> >>> +required controls are missing, then :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return
> >>> +``-EINVAL``.
> >>
> >> Not entirely true: if buffers are missing, then ENOENT is returned. Missing required
> >> controls or more than one OUTPUT buffer will result in EINVAL. This per the latest
> >> Request API changes.
> >>
> >> Decoding errors are signaled by the ``CAPTURE`` buffers being
> >>> +dequeued carrying the ``V4L2_BUF_FLAG_ERROR`` flag.
> >>
> >> Add here that if the reference frame had an error, then all other frames that refer
> >> to it should also set the ERROR flag. It is up to userspace to decide whether or
> >> not to drop them (part of the frame might still be valid).
> >>
> >> I am not sure whether this should be documented, but there are some additional
> >> restrictions w.r.t. reference frames:
> >>
> >> Since decoders need access to the decoded reference frames there are some corner
> >> cases that need to be checked:
> >>
> >> 1) V4L2_MEMORY_USERPTR cannot be used for the capture queue: the driver does not
> >> know when a malloced but dequeued buffer is freed, so the reference frame
> >> could suddenly be gone.
> >>
> >> 2) V4L2_MEMORY_DMABUF can be used, but drivers should check that the dma buffer is
> >> still available AND increase the dmabuf refcount while it is used by the HW.
> >>
> >> 3) What to do if userspace has requeued a buffer containing a reference frame,
> >> and you want to decode a B/P-frame that refers to that buffer? We need to
> >> check against that: I think that when you queue a capture buffer whose index
> >> is used in a pending request as a reference frame, than that should fail with
> >> an error. And trying to queue a request referring to a buffer that has been
> >> requeued should also fail.
> >>
> >> We might need to add some support for this in v4l2-mem2mem.c or vb2.
> >>
> >> We will have similar (but not quite identical) issues with stateless encoders.
> >
> > Related to this is the question whether buffer indices that are used to refer
> > to reference frames should refer to the capture or output queue.
> >
> > Using capture indices works if you never queue more than one request at a time:
> > you know exactly what the capture buffer index is of the decoded I-frame, and
> > you can use that in the following requests.
> >
> > But if multiple requests are queued, then you don't necessarily know to which
> > capture buffer an I-frame will be decoded, so then you can't provide this index
> > to following B/P-frames. This puts restrictions on userspace: you can only
> > queue B/P-frames once you have decoded the I-frame. This might be perfectly
> > acceptable, though.

IIUC at the moment we are indeed using CAPTURE buffer indexes, e.g:

.. flat-table:: struct v4l2_ctrl_mpeg2_slice_params
..
- ``backward_ref_index``
- Index for the V4L2 buffer to use as backward reference, used
with
B-coded and P-coded frames.

So I wonder how is the user-space currently exercising Cedrus doing
here? Waiting for each frame used as a reference to be dequeued?

> >
> > Using output buffer indices will work (the driver will know to which capture
> > buffer index the I-frames mapped), but it requires that the output buffer that
> > contained a reference frame isn't requeued, since that means that the driver
> > will lose this mapping. I think this will make things too confusing, though.
> >
> > A third option is that you don't refer to reference frames by buffer index,
> > but instead by some other counter (sequence counter, perhaps?).
>
> Definitely not sequence number, since it has the same problem as buffer index.
>
> This could be a value relative to the frame you are trying to decode: i.e. 'use
> the capture buffer from X frames ago'. This makes it index independent and is
> still easy to keep track of inside the driver and application.

Sounds like that would work, although the driver would need to
maintain a history of processed frames indexes. We will still need to
make sure that frames referenced have not been re-queued in the
meantime.

Drivers (or the to-be-designed codec framework?) will also need to be
handle the case where user-space did not allocate enough buffers to
hold all the reference frames and the frame to be decoded...

Sorry, I have more questions than answers I'm afraid.

2018-09-11 08:50:52

[permalink] [raw]

Subject: Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

On 09/11/18 10:40, Alexandre Courbot wrote:
>> I am not sure whether this should be documented, but there are some additional
>> restrictions w.r.t. reference frames:
>>
>> Since decoders need access to the decoded reference frames there are some corner
>> cases that need to be checked:
>>
>> 1) V4L2_MEMORY_USERPTR cannot be used for the capture queue: the driver does not
>> know when a malloced but dequeued buffer is freed, so the reference frame
>> could suddenly be gone.
>
> It it is confirmed that we cannot use USERPTR buffers as CAPTURE then
> this probably needs to be documented. I wonder if there isn't a way to
> avoid this by having vb2 keep a reference to the pages in such a way
> that they would not be recycled after after userspace calls free() on
> the buffer. Is that possible with user-allocated memory?

vb2 keeps a reference while the buffer is queued, but that reference is
released once the buffer is dequeued. Correctly, IMHO. If you provide
USERPTR, than userspace is responsible for the memory. Changing this
would require changing the API, since USERPTR has always worked like
this.

>
> Not that I think that forbidding USERPTR buffers in this scenario
> would be a big problem.

I think it is perfectly OK to forbid this. Ideally I would like to have
some test in v4l2-compliance or (even better) v4l2-mem2mem.c for this,
but it is actually not that easy to identify drivers like this.

Suggestions for this on a postcard...

>
>>
>> 2) V4L2_MEMORY_DMABUF can be used, but drivers should check that the dma buffer is
>> still available AND increase the dmabuf refcount while it is used by the HW.
>
> Yeah, with DMABUF the above problem can easily be avoided at least.
>
>>
>> 3) What to do if userspace has requeued a buffer containing a reference frame,
>> and you want to decode a B/P-frame that refers to that buffer? We need to
>> check against that: I think that when you queue a capture buffer whose index
>> is used in a pending request as a reference frame, than that should fail with
>> an error. And trying to queue a request referring to a buffer that has been
>> requeued should also fail.
>>
>> We might need to add some support for this in v4l2-mem2mem.c or vb2.
>
> Sounds good, and we should document this as well.
>

Right. And test it!

Regards,

Hans

2018-09-11 09:09:39

[permalink] [raw]

Subject: Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

On 09/11/18 10:40, Alexandre Courbot wrote:
> On Mon, Sep 10, 2018 at 9:49 PM Hans Verkuil <[email protected]> wrote:
>>
>> On 09/10/2018 01:57 PM, Hans Verkuil wrote:
>>> On 09/10/2018 01:25 PM, Hans Verkuil wrote:
>>>>> +
>>>>> +Decoding
>>>>> +========
>>>>> +
>>>>> +For each frame, the client is responsible for submitting a request to which the
>>>>> +following is attached:
>>>>> +
>>>>> +* Exactly one frame worth of encoded data in a buffer submitted to the
>>>>> + ``OUTPUT`` queue,
>>>>> +* All the controls relevant to the format being decoded (see below for details).
>>>>> +
>>>>> +``CAPTURE`` buffers must not be part of the request, but must be queued
>>>>> +independently. The driver will pick one of the queued ``CAPTURE`` buffers and
>>>>> +decode the frame into it. Although the client has no control over which
>>>>> +``CAPTURE`` buffer will be used with a given ``OUTPUT`` buffer, it is guaranteed
>>>>> +that ``CAPTURE`` buffers will be returned in decode order (i.e. the same order
>>>>> +as ``OUTPUT`` buffers were submitted), so it is trivial to associate a dequeued
>>>>> +``CAPTURE`` buffer to its originating request and ``OUTPUT`` buffer.
>>>>> +
>>>>> +If the request is submitted without an ``OUTPUT`` buffer or if one of the
>>>>> +required controls are missing, then :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return
>>>>> +``-EINVAL``.
>>>>
>>>> Not entirely true: if buffers are missing, then ENOENT is returned. Missing required
>>>> controls or more than one OUTPUT buffer will result in EINVAL. This per the latest
>>>> Request API changes.
>>>>
>>>> Decoding errors are signaled by the ``CAPTURE`` buffers being
>>>>> +dequeued carrying the ``V4L2_BUF_FLAG_ERROR`` flag.
>>>>
>>>> Add here that if the reference frame had an error, then all other frames that refer
>>>> to it should also set the ERROR flag. It is up to userspace to decide whether or
>>>> not to drop them (part of the frame might still be valid).
>>>>
>>>> I am not sure whether this should be documented, but there are some additional
>>>> restrictions w.r.t. reference frames:
>>>>
>>>> Since decoders need access to the decoded reference frames there are some corner
>>>> cases that need to be checked:
>>>>
>>>> 1) V4L2_MEMORY_USERPTR cannot be used for the capture queue: the driver does not
>>>> know when a malloced but dequeued buffer is freed, so the reference frame
>>>> could suddenly be gone.
>>>>
>>>> 2) V4L2_MEMORY_DMABUF can be used, but drivers should check that the dma buffer is
>>>> still available AND increase the dmabuf refcount while it is used by the HW.
>>>>
>>>> 3) What to do if userspace has requeued a buffer containing a reference frame,
>>>> and you want to decode a B/P-frame that refers to that buffer? We need to
>>>> check against that: I think that when you queue a capture buffer whose index
>>>> is used in a pending request as a reference frame, than that should fail with
>>>> an error. And trying to queue a request referring to a buffer that has been
>>>> requeued should also fail.
>>>>
>>>> We might need to add some support for this in v4l2-mem2mem.c or vb2.
>>>>
>>>> We will have similar (but not quite identical) issues with stateless encoders.
>>>
>>> Related to this is the question whether buffer indices that are used to refer
>>> to reference frames should refer to the capture or output queue.
>>>
>>> Using capture indices works if you never queue more than one request at a time:
>>> you know exactly what the capture buffer index is of the decoded I-frame, and
>>> you can use that in the following requests.
>>>
>>> But if multiple requests are queued, then you don't necessarily know to which
>>> capture buffer an I-frame will be decoded, so then you can't provide this index
>>> to following B/P-frames. This puts restrictions on userspace: you can only
>>> queue B/P-frames once you have decoded the I-frame. This might be perfectly
>>> acceptable, though.
>
> IIUC at the moment we are indeed using CAPTURE buffer indexes, e.g:
>
> .. flat-table:: struct v4l2_ctrl_mpeg2_slice_params
> ..
> - ``backward_ref_index``
> - Index for the V4L2 buffer to use as backward reference, used
> with
> B-coded and P-coded frames.
>
> So I wonder how is the user-space currently exercising Cedrus doing
> here? Waiting for each frame used as a reference to be dequeued?

No, the assumption is (if I understand correctly) that userspace won't
touch the memory of the dequeued reference buffer so HW can just point
to it.

Paul, please correct me if I am wrong.

What does chromeOS do?

The cedrus approach sounds reasonable, although it should be documented
that userspace shouldn't modify the data in a reference frame.

>
>>>
>>> Using output buffer indices will work (the driver will know to which capture
>>> buffer index the I-frames mapped), but it requires that the output buffer that
>>> contained a reference frame isn't requeued, since that means that the driver
>>> will lose this mapping. I think this will make things too confusing, though.
>>>
>>> A third option is that you don't refer to reference frames by buffer index,
>>> but instead by some other counter (sequence counter, perhaps?).
>>
>> Definitely not sequence number, since it has the same problem as buffer index.
>>
>> This could be a value relative to the frame you are trying to decode: i.e. 'use
>> the capture buffer from X frames ago'. This makes it index independent and is
>> still easy to keep track of inside the driver and application.
>
> Sounds like that would work, although the driver would need to
> maintain a history of processed frames indexes. We will still need to
> make sure that frames referenced have not been re-queued in the
> meantime.

Definitely, but that is all internal administration. Although I think
we would want to have some helper code for this.

After thinking about this some more I rather like this. This makes it
independent of buffer index numbers.

An alternative approach would be that the applications specifies a cookie
(u32) value to an output buffer which is then copied to the corresponding
capture buffer. That cookie is then used to refer to reference frames.

Basically userspace gives names to buffers.

The problem is that there really is no space left in v4l2_buffer, unless we
decide to turn the timecode field into a union with a 'u32 cookie' and add
a COOKIE buffer flag. Nobody is using the timecode, so that should be possible.

But if we go there, then perhaps we should just bite the bullet and create
a struct v4l2_ext_buffer that is clean:

https://git.linuxtv.org/hverkuil/media_tree.git/commit/?h=v4l2-buffer&id=a95549df06d9900f3559afdbb9da06bd4b22d1f3

> Drivers (or the to-be-designed codec framework?) will also need to be
> handle the case where user-space did not allocate enough buffers to
> hold all the reference frames and the frame to be decoded...

Is that a driver problem? That sounds more like something that userspace
has to handle. The driver just has to check that the referenced buffers
have not been requeued by userspace and return an error if they have.

It's a stateless codec, so if the driver detects a crappy state, then
what can it do but return an error? Not the driver's problem.

Regards,

Hans

2018-09-18 08:03:38

[permalink] [raw]

Subject: Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

Hi Hans, sorry for the late reply.

On Tue, Sep 11, 2018 at 6:09 PM Hans Verkuil <[email protected]> wrote:
>
> On 09/11/18 10:40, Alexandre Courbot wrote:
> > On Mon, Sep 10, 2018 at 9:49 PM Hans Verkuil <[email protected]> wrote:
> >>
> >> On 09/10/2018 01:57 PM, Hans Verkuil wrote:
> >>> On 09/10/2018 01:25 PM, Hans Verkuil wrote:
> >>>>> +
> >>>>> +Decoding
> >>>>> +========
> >>>>> +
> >>>>> +For each frame, the client is responsible for submitting a request to which the
> >>>>> +following is attached:
> >>>>> +
> >>>>> +* Exactly one frame worth of encoded data in a buffer submitted to the
> >>>>> + ``OUTPUT`` queue,
> >>>>> +* All the controls relevant to the format being decoded (see below for details).
> >>>>> +
> >>>>> +``CAPTURE`` buffers must not be part of the request, but must be queued
> >>>>> +independently. The driver will pick one of the queued ``CAPTURE`` buffers and
> >>>>> +decode the frame into it. Although the client has no control over which
> >>>>> +``CAPTURE`` buffer will be used with a given ``OUTPUT`` buffer, it is guaranteed
> >>>>> +that ``CAPTURE`` buffers will be returned in decode order (i.e. the same order
> >>>>> +as ``OUTPUT`` buffers were submitted), so it is trivial to associate a dequeued
> >>>>> +``CAPTURE`` buffer to its originating request and ``OUTPUT`` buffer.
> >>>>> +
> >>>>> +If the request is submitted without an ``OUTPUT`` buffer or if one of the
> >>>>> +required controls are missing, then :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return
> >>>>> +``-EINVAL``.
> >>>>
> >>>> Not entirely true: if buffers are missing, then ENOENT is returned. Missing required
> >>>> controls or more than one OUTPUT buffer will result in EINVAL. This per the latest
> >>>> Request API changes.
> >>>>
> >>>> Decoding errors are signaled by the ``CAPTURE`` buffers being
> >>>>> +dequeued carrying the ``V4L2_BUF_FLAG_ERROR`` flag.
> >>>>
> >>>> Add here that if the reference frame had an error, then all other frames that refer
> >>>> to it should also set the ERROR flag. It is up to userspace to decide whether or
> >>>> not to drop them (part of the frame might still be valid).
> >>>>
> >>>> I am not sure whether this should be documented, but there are some additional
> >>>> restrictions w.r.t. reference frames:
> >>>>
> >>>> Since decoders need access to the decoded reference frames there are some corner
> >>>> cases that need to be checked:
> >>>>
> >>>> 1) V4L2_MEMORY_USERPTR cannot be used for the capture queue: the driver does not
> >>>> know when a malloced but dequeued buffer is freed, so the reference frame
> >>>> could suddenly be gone.
> >>>>
> >>>> 2) V4L2_MEMORY_DMABUF can be used, but drivers should check that the dma buffer is
> >>>> still available AND increase the dmabuf refcount while it is used by the HW.
> >>>>
> >>>> 3) What to do if userspace has requeued a buffer containing a reference frame,
> >>>> and you want to decode a B/P-frame that refers to that buffer? We need to
> >>>> check against that: I think that when you queue a capture buffer whose index
> >>>> is used in a pending request as a reference frame, than that should fail with
> >>>> an error. And trying to queue a request referring to a buffer that has been
> >>>> requeued should also fail.
> >>>>
> >>>> We might need to add some support for this in v4l2-mem2mem.c or vb2.
> >>>>
> >>>> We will have similar (but not quite identical) issues with stateless encoders.
> >>>
> >>> Related to this is the question whether buffer indices that are used to refer
> >>> to reference frames should refer to the capture or output queue.
> >>>
> >>> Using capture indices works if you never queue more than one request at a time:
> >>> you know exactly what the capture buffer index is of the decoded I-frame, and
> >>> you can use that in the following requests.
> >>>
> >>> But if multiple requests are queued, then you don't necessarily know to which
> >>> capture buffer an I-frame will be decoded, so then you can't provide this index
> >>> to following B/P-frames. This puts restrictions on userspace: you can only
> >>> queue B/P-frames once you have decoded the I-frame. This might be perfectly
> >>> acceptable, though.
> >
> > IIUC at the moment we are indeed using CAPTURE buffer indexes, e.g:
> >
> > .. flat-table:: struct v4l2_ctrl_mpeg2_slice_params
> > ..
> > - ``backward_ref_index``
> > - Index for the V4L2 buffer to use as backward reference, used
> > with
> > B-coded and P-coded frames.
> >
> > So I wonder how is the user-space currently exercising Cedrus doing
> > here? Waiting for each frame used as a reference to be dequeued?
>
> No, the assumption is (if I understand correctly) that userspace won't
> touch the memory of the dequeued reference buffer so HW can just point
> to it.
>
> Paul, please correct me if I am wrong.
>
> What does chromeOS do?

At the moment Chrome OS (using the config store) queues the OUTPUT and
CAPTURE buffers together, i.e. in the same request. The CAPTURE buffer
is not tied to the request in any way, but what seems to matter here
is the queue order. If drivers process CAPTURE drivers sequentially,
then you can know which CAPTURE buffer will be used for the request.

The corollary of that is that CAPTURE buffers cannot be re-queued
until they are not referenced anymore, something the Chrome OS
userspace also takes care of. Would it be a problem to make this the
default expectation instead of having the kernel check and reorder
CAPTURE buffers? The worst that can happen AFAICT is is frame
corruption, and processing queued CAPTURE buffers sequentially would
allow us to use the V4L2 buffer ID to reference frames. That's still
the most intuitive way to do, using relative frame indexes (i.e. X
frames ago) adds complexity and the potential for misuse and bugs.

>
> The cedrus approach sounds reasonable, although it should be documented
> that userspace shouldn't modify the data in a reference frame.
>
> >
> >>>
> >>> Using output buffer indices will work (the driver will know to which capture
> >>> buffer index the I-frames mapped), but it requires that the output buffer that
> >>> contained a reference frame isn't requeued, since that means that the driver
> >>> will lose this mapping. I think this will make things too confusing, though.
> >>>
> >>> A third option is that you don't refer to reference frames by buffer index,
> >>> but instead by some other counter (sequence counter, perhaps?).
> >>
> >> Definitely not sequence number, since it has the same problem as buffer index.
> >>
> >> This could be a value relative to the frame you are trying to decode: i.e. 'use
> >> the capture buffer from X frames ago'. This makes it index independent and is
> >> still easy to keep track of inside the driver and application.
> >
> > Sounds like that would work, although the driver would need to
> > maintain a history of processed frames indexes. We will still need to
> > make sure that frames referenced have not been re-queued in the
> > meantime.
>
> Definitely, but that is all internal administration. Although I think
> we would want to have some helper code for this.
>
> After thinking about this some more I rather like this. This makes it
> independent of buffer index numbers.
>
> An alternative approach would be that the applications specifies a cookie
> (u32) value to an output buffer which is then copied to the corresponding
> capture buffer. That cookie is then used to refer to reference frames.
>
> Basically userspace gives names to buffers.
>
> The problem is that there really is no space left in v4l2_buffer, unless we
> decide to turn the timecode field into a union with a 'u32 cookie' and add
> a COOKIE buffer flag. Nobody is using the timecode, so that should be possible.
>
> But if we go there, then perhaps we should just bite the bullet and create
> a struct v4l2_ext_buffer that is clean:
>
> https://git.linuxtv.org/hverkuil/media_tree.git/commit/?h=v4l2-buffer&id=a95549df06d9900f3559afdbb9da06bd4b22d1f3
>
> > Drivers (or the to-be-designed codec framework?) will also need to be
> > handle the case where user-space did not allocate enough buffers to
> > hold all the reference frames and the frame to be decoded...
>
> Is that a driver problem? That sounds more like something that userspace
> has to handle. The driver just has to check that the referenced buffers
> have not been requeued by userspace and return an error if they have.
>
> It's a stateless codec, so if the driver detects a crappy state, then
> what can it do but return an error? Not the driver's problem.
>
> Regards,
>
> Hans

2018-10-03 10:01:40

[permalink] [raw]

Subject: Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

On Tue, Sep 11, 2018 at 3:09 PM Alexandre Courbot <[email protected]> wrote:
> On Mon, Sep 10, 2018 at 4:17 PM Tomasz Figa <[email protected]> wrote:
> > On Fri, Aug 31, 2018 at 4:48 PM Alexandre Courbot <[email protected]> wrote:
[snip]
> > > +Querying capabilities
> > > +=====================
> > > +
> > > +1. To enumerate the set of coded formats supported by the driver, the client
> > > + calls :c:func:`VIDIOC_ENUM_FMT` on the ``OUTPUT`` queue.
> > > +
> > > + * The driver must always return the full set of supported formats for the
> > > + currently set ``OUTPUT`` format, irrespective of the format currently set
> > > + on the ``CAPTURE`` queue.
> > > +
> > > +2. To enumerate the set of supported raw formats, the client calls
> > > + :c:func:`VIDIOC_ENUM_FMT` on the ``CAPTURE`` queue.
> > > +
> > > + * The driver must return only the formats supported for the format currently
> > > + active on the ``OUTPUT`` queue.
> > > +
> > > + * In order to enumerate raw formats supported by a given coded format, the
> > > + client must thus set that coded format on the ``OUTPUT`` queue first, and
> > > + then enumerate the ``CAPTURE`` queue.
> >
> > One thing that we might want to note here is that available CAPTURE
> > formats may depend on more factors than just current OUTPUT format.
> > Depending on encoding options of the stream being decoded (e.g. VP9
> > profile), the list of supported format might be limited to one of YUV
> > 4:2:0, 4:2:2 or 4:4:4 family of formats, but might be any of them, if
> > the hardware supports conversion.
> >
> > I was wondering whether we shouldn't require the client to set all the
> > necessary initial codec-specific controls before querying CAPTURE
> > formats, but since we don't have any compatibility constraints here,
> > as opposed to the stateful API, perhaps we could just make CAPTURE
> > queue completely independent and have a source change event raised if
> > the controls set later make existing format invalid.
> >
> > I'd like to ask the userspace folks here (Nicolas?), whether:
> > 1) we need to know the exact list of formats that are guaranteed to be
> > supported for playing back the whole video, or
> > 2) we're okay with some rough approximation here, or
> > 3) maybe we don't even need to enumerate formats on CAPTURE beforehand?
>
> Since user-space will need to parse all the profile and other
> information and submit it to the kernel anyway, it seems to me that
> requiring it to submit that information before setting the CAPTURE
> format makes the most sense. Otherwise all clients should listen for
> events and be capable of renegotiating the format, which would
> complicate them some more.

I would indeed lean towards just setting the relevant controls here at
initialization and be done without any source change events.

>
> >
> > To be honest, I'm not sure 1) is even possible, since the resolution
> > (or some other stream parameters) could change mid-stream.
>
> Even in that case, the client is supposed to handle that change, so I
> don't think this makes a big difference compared to initial
> negotiation?

Indeed.

[snip]
> > > +6. *[optional]* Choose a different ``CAPTURE`` format than suggested via
> > > + :c:func:`VIDIOC_S_FMT` on ``CAPTURE`` queue. It is possible for the client to
> > > + choose a different format than selected/suggested by the driver in
> > > + :c:func:`VIDIOC_G_FMT`.
> > > +
> > > + * **Required fields:**
> > > +
> > > + ``type``
> > > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > > +
> > > + ``pixelformat``
> > > + a raw pixel format
> > > +
> > > + .. note::
> > > +
> > > + Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available
> > > + formats after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> > > + out a set of allowed formats for given configuration, but not required,
> > > + if the client can accept the defaults.
> >
> > V4L2_EVENT_SOURCE_CHANGE was not mentioned in earlier steps. I suppose
> > it's a leftover from the stateful API? :)
>
> Correct. :)
>
> >
> > Still, I think we may eventually need source change events, because of
> > the reasons I mentioned above.
>
> I'd personally prefer to do without, but if they become a requirement
> (which I am not convinced of yet) we will indeed have no choice.
>

Agreed.

[snip]
> > > +Decoding
> > > +========
> > > +
> > > +For each frame, the client is responsible for submitting a request to which the
> > > +following is attached:
> > > +
> > > +* Exactly one frame worth of encoded data in a buffer submitted to the
> > > + ``OUTPUT`` queue,
> >
> > Just to make sure, in case of H.264, that would include all the slices
> > of the frame in one buffer, right?
>
> Yes. I thought "one frame worth of encoded data" already carried that
> meaning, but do you think this should be said more explicitly?

No strong opinion. Perhaps we could further specify this in the
description of each format.

[snip]
> > > Decoding errors are signaled by the ``CAPTURE`` buffers being
> > > +dequeued carrying the ``V4L2_BUF_FLAG_ERROR`` flag.
> > > +
> > > +The contents of source ``OUTPUT`` buffers, as well as the controls that must be
> > > +set on the request, depend on active coded pixel format and might be affected by
> > > +codec-specific extended controls, as stated in documentation of each format
> > > +individually.
> > > +
> > > +MPEG-2 buffer content and controls
> > > +----------------------------------
> > > +The following information is valid when the ``OUTPUT`` queue is set to the
> > > +``V4L2_PIX_FMT_MPEG2_SLICE`` format.
> >
> > Perhaps we should document the controls there instead? I think that
> > was the conclusion from the discussions over the stateful API.
>
> Do you mean, on the format page? Sounds reasonable.
>

Yep.

[snip]
> > > +Dynamic resolution change
> > > +=========================
> > > +
> > > +If the client detects a resolution change in the stream, it may need to
> > > +reallocate the ``CAPTURE`` buffers to fit the new size.
> > > +
> > > +1. Wait until all submitted requests have completed and dequeue the
> > > + corresponding output buffers.
> > > +
> > > +2. Call :c:func:`VIDIOC_STREAMOFF` on the ``CAPTURE`` queue.
> > > +
> > > +3. Free all ``CAPTURE`` buffers by calling :c:func:`VIDIOC_REQBUFS` on the
> > > + ``CAPTURE`` queue with a buffer count of zero.
> > > +
> > > +4. Set the new format and resolution on the ``CAPTURE`` queue.
> > > +
> > > +5. Allocate new ``CAPTURE`` buffers for the new resolution.
> > > +
> > > +6. Call :c:func:`VIDIOC_STREAMON` on the ``CAPTURE`` queue to resume the stream.
> > > +
> > > +The client can then start queueing new ``CAPTURE`` buffers and submit requests
> > > +to decode the next buffers at the new resolution.
> >
> > There is some inconsistency here. In initialization sequence, we set
> > OUTPUT format to coded width and height and had the driver set a sane
> > default CAPTURE format. What happens to OUTPUT format? It definitely
> > wouldn't make sense if it stayed at the initial coded size.
>
> You're right. We need to set the resolution on OUTPUT first, and maybe
> reallocate input buffers as well since a resolution increase may mean
> they are now too small to contain one encoded frame.
>
> Actually, in the case of stateless codecs a resolution change may mean
> re-doing initialization from step #2. I suppose a resolution change
> can only happen on an IDR frame, so all the user-maintained state
> could be dropped anyway.
>

One could reinitialize, which would work, but could be costly.
Technically one could just use bigger frame buffers from the start,
which would make a resolution change just a no-op . The only thing
that would change would be visible size, but that's fully handled by
the user and nothing a stateless driver should be concerned with.

> >
> > > +
> > > +Drain
> > > +=====
> > > +
> > > +In order to drain the stream on a stateless decoder, the client just needs to
> > > +wait until all the submitted requests are completed. There is no need to send a
> > > +``V4L2_DEC_CMD_STOP`` command since requests are processed sequentially by the
> > > +driver.
> >
> > Is there a need to include this section? I feel like it basically says
> > "Drain sequence: There is no drain sequence." ;)
>
> This is mostly to match what we have in the stateful doc. Not saying
> anything may make it look like this has not been thoroughly tought. :)
>

I guess we can keep it indeed. Leaving to other to comment.

Best regards,
Tomasz

2018-10-03 10:13:20

[permalink] [raw]

Subject: Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

On Tue, Sep 11, 2018 at 5:48 PM Hans Verkuil <[email protected]> wrote:
>
> On 09/11/18 10:40, Alexandre Courbot wrote:
> >> I am not sure whether this should be documented, but there are some additional
> >> restrictions w.r.t. reference frames:
> >>
> >> Since decoders need access to the decoded reference frames there are some corner
> >> cases that need to be checked:
> >>
> >> 1) V4L2_MEMORY_USERPTR cannot be used for the capture queue: the driver does not
> >> know when a malloced but dequeued buffer is freed, so the reference frame
> >> could suddenly be gone.
> >
> > It it is confirmed that we cannot use USERPTR buffers as CAPTURE then
> > this probably needs to be documented. I wonder if there isn't a way to
> > avoid this by having vb2 keep a reference to the pages in such a way
> > that they would not be recycled after after userspace calls free() on
> > the buffer. Is that possible with user-allocated memory?
>
> vb2 keeps a reference while the buffer is queued, but that reference is
> released once the buffer is dequeued. Correctly, IMHO. If you provide
> USERPTR, than userspace is responsible for the memory. Changing this
> would require changing the API, since USERPTR has always worked like
> this.

That would be a userspace bug wouldn't it? The next try to get user
pages would fail in that case and we could just fail such decode
request, couldn't we?

(Personally I'm not a big fan of USERPTR, though.)

>
> >
> > Not that I think that forbidding USERPTR buffers in this scenario
> > would be a big problem.
>
> I think it is perfectly OK to forbid this. Ideally I would like to have
> some test in v4l2-compliance or (even better) v4l2-mem2mem.c for this,
> but it is actually not that easy to identify drivers like this.
>
> Suggestions for this on a postcard...
>
> >
> >>
> >> 2) V4L2_MEMORY_DMABUF can be used, but drivers should check that the dma buffer is
> >> still available AND increase the dmabuf refcount while it is used by the HW.
> >
> > Yeah, with DMABUF the above problem can easily be avoided at least.
> >
> >>
> >> 3) What to do if userspace has requeued a buffer containing a reference frame,
> >> and you want to decode a B/P-frame that refers to that buffer? We need to
> >> check against that: I think that when you queue a capture buffer whose index
> >> is used in a pending request as a reference frame, than that should fail with
> >> an error. And trying to queue a request referring to a buffer that has been
> >> requeued should also fail.
> >>
> >> We might need to add some support for this in v4l2-mem2mem.c or vb2.
> >
> > Sounds good, and we should document this as well.
> >
>
> Right. And test it!

I'm not convinced that we should be enforcing this. Moreover,
requeuing a buffer containing a reference frame for a pending request
is not bound to be an error. It might be a legit case when the same
entry in the reference list is replaced with a different key frame
decoded into the same buffer as the reference list entry pointed until
now.

Best regards,
Tomasz

2018-10-03 10:14:16

[permalink] [raw]

Subject: Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

On Tue, Sep 18, 2018 at 5:02 PM Alexandre Courbot <[email protected]> wrote:
>
> Hi Hans, sorry for the late reply.
>
> On Tue, Sep 11, 2018 at 6:09 PM Hans Verkuil <[email protected]> wrote:
> >
> > On 09/11/18 10:40, Alexandre Courbot wrote:
> > > On Mon, Sep 10, 2018 at 9:49 PM Hans Verkuil <[email protected]> wrote:
> > >>
> > >> On 09/10/2018 01:57 PM, Hans Verkuil wrote:
> > >>> On 09/10/2018 01:25 PM, Hans Verkuil wrote:
> > >>>>> +
> > >>>>> +Decoding
> > >>>>> +========
> > >>>>> +
> > >>>>> +For each frame, the client is responsible for submitting a request to which the
> > >>>>> +following is attached:
> > >>>>> +
> > >>>>> +* Exactly one frame worth of encoded data in a buffer submitted to the
> > >>>>> + ``OUTPUT`` queue,
> > >>>>> +* All the controls relevant to the format being decoded (see below for details).
> > >>>>> +
> > >>>>> +``CAPTURE`` buffers must not be part of the request, but must be queued
> > >>>>> +independently. The driver will pick one of the queued ``CAPTURE`` buffers and
> > >>>>> +decode the frame into it. Although the client has no control over which
> > >>>>> +``CAPTURE`` buffer will be used with a given ``OUTPUT`` buffer, it is guaranteed
> > >>>>> +that ``CAPTURE`` buffers will be returned in decode order (i.e. the same order
> > >>>>> +as ``OUTPUT`` buffers were submitted), so it is trivial to associate a dequeued
> > >>>>> +``CAPTURE`` buffer to its originating request and ``OUTPUT`` buffer.
> > >>>>> +
> > >>>>> +If the request is submitted without an ``OUTPUT`` buffer or if one of the
> > >>>>> +required controls are missing, then :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return
> > >>>>> +``-EINVAL``.
> > >>>>
> > >>>> Not entirely true: if buffers are missing, then ENOENT is returned. Missing required
> > >>>> controls or more than one OUTPUT buffer will result in EINVAL. This per the latest
> > >>>> Request API changes.
> > >>>>
> > >>>> Decoding errors are signaled by the ``CAPTURE`` buffers being
> > >>>>> +dequeued carrying the ``V4L2_BUF_FLAG_ERROR`` flag.
> > >>>>
> > >>>> Add here that if the reference frame had an error, then all other frames that refer
> > >>>> to it should also set the ERROR flag. It is up to userspace to decide whether or
> > >>>> not to drop them (part of the frame might still be valid).
> > >>>>
> > >>>> I am not sure whether this should be documented, but there are some additional
> > >>>> restrictions w.r.t. reference frames:
> > >>>>
> > >>>> Since decoders need access to the decoded reference frames there are some corner
> > >>>> cases that need to be checked:
> > >>>>
> > >>>> 1) V4L2_MEMORY_USERPTR cannot be used for the capture queue: the driver does not
> > >>>> know when a malloced but dequeued buffer is freed, so the reference frame
> > >>>> could suddenly be gone.
> > >>>>
> > >>>> 2) V4L2_MEMORY_DMABUF can be used, but drivers should check that the dma buffer is
> > >>>> still available AND increase the dmabuf refcount while it is used by the HW.
> > >>>>
> > >>>> 3) What to do if userspace has requeued a buffer containing a reference frame,
> > >>>> and you want to decode a B/P-frame that refers to that buffer? We need to
> > >>>> check against that: I think that when you queue a capture buffer whose index
> > >>>> is used in a pending request as a reference frame, than that should fail with
> > >>>> an error. And trying to queue a request referring to a buffer that has been
> > >>>> requeued should also fail.
> > >>>>
> > >>>> We might need to add some support for this in v4l2-mem2mem.c or vb2.
> > >>>>
> > >>>> We will have similar (but not quite identical) issues with stateless encoders.
> > >>>
> > >>> Related to this is the question whether buffer indices that are used to refer
> > >>> to reference frames should refer to the capture or output queue.
> > >>>
> > >>> Using capture indices works if you never queue more than one request at a time:
> > >>> you know exactly what the capture buffer index is of the decoded I-frame, and
> > >>> you can use that in the following requests.
> > >>>
> > >>> But if multiple requests are queued, then you don't necessarily know to which
> > >>> capture buffer an I-frame will be decoded, so then you can't provide this index
> > >>> to following B/P-frames. This puts restrictions on userspace: you can only
> > >>> queue B/P-frames once you have decoded the I-frame. This might be perfectly
> > >>> acceptable, though.
> > >
> > > IIUC at the moment we are indeed using CAPTURE buffer indexes, e.g:
> > >
> > > .. flat-table:: struct v4l2_ctrl_mpeg2_slice_params
> > > ..
> > > - ``backward_ref_index``
> > > - Index for the V4L2 buffer to use as backward reference, used
> > > with
> > > B-coded and P-coded frames.
> > >
> > > So I wonder how is the user-space currently exercising Cedrus doing
> > > here? Waiting for each frame used as a reference to be dequeued?
> >
> > No, the assumption is (if I understand correctly) that userspace won't
> > touch the memory of the dequeued reference buffer so HW can just point
> > to it.
> >
> > Paul, please correct me if I am wrong.
> >
> > What does chromeOS do?
>
> At the moment Chrome OS (using the config store) queues the OUTPUT and
> CAPTURE buffers together, i.e. in the same request. The CAPTURE buffer
> is not tied to the request in any way, but what seems to matter here
> is the queue order. If drivers process CAPTURE drivers sequentially,
> then you can know which CAPTURE buffer will be used for the request.
>
> The corollary of that is that CAPTURE buffers cannot be re-queued
> until they are not referenced anymore, something the Chrome OS
> userspace also takes care of. Would it be a problem to make this the
> default expectation instead of having the kernel check and reorder
> CAPTURE buffers? The worst that can happen AFAICT is is frame
> corruption, and processing queued CAPTURE buffers sequentially would
> allow us to use the V4L2 buffer ID to reference frames. That's still
> the most intuitive way to do, using relative frame indexes (i.e. X
> frames ago) adds complexity and the potential for misuse and bugs.

+1

The stateless API delegates the reference frame management to the
client and I don't see why we should be protecting the client from
itself. In particular, as I suggested in another email, there can be
valid cases where requeuing CAPTURE buffers while still on the
reference list is okay.

Best regards,
Tomasz

2018-10-04 11:51:26

by Paul Kocialkowski

[permalink] [raw]

Subject: Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

Hi,

Le mercredi 03 octobre 2018 à 19:13 +0900, Tomasz Figa a écrit :
> On Tue, Sep 18, 2018 at 5:02 PM Alexandre Courbot <[email protected]> wrote:
> > Hi Hans, sorry for the late reply.
> >
> > On Tue, Sep 11, 2018 at 6:09 PM Hans Verkuil <[email protected]> wrote:
> > > On 09/11/18 10:40, Alexandre Courbot wrote:
> > > > On Mon, Sep 10, 2018 at 9:49 PM Hans Verkuil <[email protected]> wrote:
> > > > > On 09/10/2018 01:57 PM, Hans Verkuil wrote:
> > > > > > On 09/10/2018 01:25 PM, Hans Verkuil wrote:
> > > > > > > > +
> > > > > > > > +Decoding
> > > > > > > > +========
> > > > > > > > +
> > > > > > > > +For each frame, the client is responsible for submitting a request to which the
> > > > > > > > +following is attached:
> > > > > > > > +
> > > > > > > > +* Exactly one frame worth of encoded data in a buffer submitted to the
> > > > > > > > + ``OUTPUT`` queue,
> > > > > > > > +* All the controls relevant to the format being decoded (see below for details).
> > > > > > > > +
> > > > > > > > +``CAPTURE`` buffers must not be part of the request, but must be queued
> > > > > > > > +independently. The driver will pick one of the queued ``CAPTURE`` buffers and
> > > > > > > > +decode the frame into it. Although the client has no control over which
> > > > > > > > +``CAPTURE`` buffer will be used with a given ``OUTPUT`` buffer, it is guaranteed
> > > > > > > > +that ``CAPTURE`` buffers will be returned in decode order (i.e. the same order
> > > > > > > > +as ``OUTPUT`` buffers were submitted), so it is trivial to associate a dequeued
> > > > > > > > +``CAPTURE`` buffer to its originating request and ``OUTPUT`` buffer.
> > > > > > > > +
> > > > > > > > +If the request is submitted without an ``OUTPUT`` buffer or if one of the
> > > > > > > > +required controls are missing, then :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return
> > > > > > > > +``-EINVAL``.
> > > > > > >
> > > > > > > Not entirely true: if buffers are missing, then ENOENT is returned. Missing required
> > > > > > > controls or more than one OUTPUT buffer will result in EINVAL. This per the latest
> > > > > > > Request API changes.
> > > > > > >
> > > > > > > Decoding errors are signaled by the ``CAPTURE`` buffers being
> > > > > > > > +dequeued carrying the ``V4L2_BUF_FLAG_ERROR`` flag.
> > > > > > >
> > > > > > > Add here that if the reference frame had an error, then all other frames that refer
> > > > > > > to it should also set the ERROR flag. It is up to userspace to decide whether or
> > > > > > > not to drop them (part of the frame might still be valid).
> > > > > > >
> > > > > > > I am not sure whether this should be documented, but there are some additional
> > > > > > > restrictions w.r.t. reference frames:
> > > > > > >
> > > > > > > Since decoders need access to the decoded reference frames there are some corner
> > > > > > > cases that need to be checked:
> > > > > > >
> > > > > > > 1) V4L2_MEMORY_USERPTR cannot be used for the capture queue: the driver does not
> > > > > > > know when a malloced but dequeued buffer is freed, so the reference frame
> > > > > > > could suddenly be gone.
> > > > > > >
> > > > > > > 2) V4L2_MEMORY_DMABUF can be used, but drivers should check that the dma buffer is
> > > > > > > still available AND increase the dmabuf refcount while it is used by the HW.
> > > > > > >
> > > > > > > 3) What to do if userspace has requeued a buffer containing a reference frame,
> > > > > > > and you want to decode a B/P-frame that refers to that buffer? We need to
> > > > > > > check against that: I think that when you queue a capture buffer whose index
> > > > > > > is used in a pending request as a reference frame, than that should fail with
> > > > > > > an error. And trying to queue a request referring to a buffer that has been
> > > > > > > requeued should also fail.
> > > > > > >
> > > > > > > We might need to add some support for this in v4l2-mem2mem.c or vb2.
> > > > > > >
> > > > > > > We will have similar (but not quite identical) issues with stateless encoders.
> > > > > >
> > > > > > Related to this is the question whether buffer indices that are used to refer
> > > > > > to reference frames should refer to the capture or output queue.
> > > > > >
> > > > > > Using capture indices works if you never queue more than one request at a time:
> > > > > > you know exactly what the capture buffer index is of the decoded I-frame, and
> > > > > > you can use that in the following requests.
> > > > > >
> > > > > > But if multiple requests are queued, then you don't necessarily know to which
> > > > > > capture buffer an I-frame will be decoded, so then you can't provide this index
> > > > > > to following B/P-frames. This puts restrictions on userspace: you can only
> > > > > > queue B/P-frames once you have decoded the I-frame. This might be perfectly
> > > > > > acceptable, though.
> > > >
> > > > IIUC at the moment we are indeed using CAPTURE buffer indexes, e.g:
> > > >
> > > > .. flat-table:: struct v4l2_ctrl_mpeg2_slice_params
> > > > ..
> > > > - ``backward_ref_index``
> > > > - Index for the V4L2 buffer to use as backward reference, used
> > > > with
> > > > B-coded and P-coded frames.
> > > >
> > > > So I wonder how is the user-space currently exercising Cedrus doing
> > > > here? Waiting for each frame used as a reference to be dequeued?
> > >
> > > No, the assumption is (if I understand correctly) that userspace won't
> > > touch the memory of the dequeued reference buffer so HW can just point
> > > to it.
> > >
> > > Paul, please correct me if I am wrong.
> > >
> > > What does chromeOS do?
> >
> > At the moment Chrome OS (using the config store) queues the OUTPUT and
> > CAPTURE buffers together, i.e. in the same request. The CAPTURE buffer
> > is not tied to the request in any way, but what seems to matter here
> > is the queue order. If drivers process CAPTURE drivers sequentially,
> > then you can know which CAPTURE buffer will be used for the request.
> >
> > The corollary of that is that CAPTURE buffers cannot be re-queued
> > until they are not referenced anymore, something the Chrome OS
> > userspace also takes care of. Would it be a problem to make this the
> > default expectation instead of having the kernel check and reorder
> > CAPTURE buffers? The worst that can happen AFAICT is is frame
> > corruption, and processing queued CAPTURE buffers sequentially would
> > allow us to use the V4L2 buffer ID to reference frames. That's still
> > the most intuitive way to do, using relative frame indexes (i.e. X
> > frames ago) adds complexity and the potential for misuse and bugs.
>
> +1
>
> The stateless API delegates the reference frame management to the
> client and I don't see why we should be protecting the client from
> itself. In particular, as I suggested in another email, there can be
> valid cases where requeuing CAPTURE buffers while still on the
> reference list is okay.

I agree that it's best to delegate this to the client, to alleviate the
complexity of dealing with relative indexes in the driver. From what
I've seen, players take care of allocating enough buffers so that re-
queuing CAPTURE buffers is not a problem in practice (they are no
longer used as reference when re-queued). What I've seen is that
buffers are simply rotated at each frame and that just works with
enough buffers allocated.

From that perspective, it would probably also make sense to ask that
userspace provides CAPTURE buffers indexes for references (and thus
stays in charge of the CAPTURE-OUTPUT association).

We could also have the driver keep that association and ask userspace
to provide OUTPUT buffers indexes for references. This seems more
consistent from an API perspective (requests associate OUTPUT buffers
with metadata, so that's the unit that naturally identifies a frame).

However, I'm pretty sure that userspace would also have to keep the
association one way or another to decide which buffer can be reused
safely, so it seems to me like keeping CAPTURE indexes would reduce the
driver overhead without really complexifying userspace much.

What do you think?

Cheers,

Paul

--
Developer of free digital technology and hardware support.

Website: https://www.paulk.fr/
Coding blog: https://code.paulk.fr/
Git repositories: https://git.paulk.fr/ https://git.code.paulk.fr/

Attachments:

signature.asc (849.00 B)
This is a digitally signed message part

2018-10-08 10:09:16