2022-09-12 15:48:32

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: [PATCH V2] [media] v4l2: Add AV1 pixel format

Hi Shi,

thanks for the patches, check inline for some comments. Generally speaking, we
don't usually add formats ahead of time unless we have a good rationale to do
so. Should we expect a companion series against the amlogic decoder driver that
enables this?

Le mardi 30 août 2022 à 09:40 +0800, Shi Hao a écrit :
> From: "hao.shi" <[email protected]>
>
> Add AV1 compressed pixel format. It is the more common format.
>
> Signed-off-by: Hao Shi <[email protected]>
> ---
> .../userspace-api/media/v4l/pixfmt-compressed.rst | 9 +++++++++
> drivers/media/v4l2-core/v4l2-ioctl.c | 1 +
> include/uapi/linux/videodev2.h | 1 +
> 3 files changed, 11 insertions(+)
>
> diff --git a/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst b/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
> index 506dd3c98884..5bdeeebdf9f5 100644
> --- a/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
> +++ b/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
> @@ -232,6 +232,15 @@ Compressed Formats
> Metadata associated with the frame to decode is required to be passed
> through the ``V4L2_CID_STATELESS_FWHT_PARAMS`` control.
> See the :ref:`associated Codec Control ID <codec-stateless-fwht>`.
> + * .. _V4L2-PIX-FMT-AV1:
> +
> + - ``V4L2_PIX_FMT_AV1``
> + - 'AV1'
> + - AV1 Access Unit. The decoder expects one Access Unit per buffer.

I believe this is using MPEG LA terminology. Did you mean a Temporal Unit (TU)?
In AV1 a TU represents one displayable picture, just like an AU in H.264 (if you
ignore interlaced video).
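For reference, TU boundaries are straightforward to detect in the bitstream: each TU starts with a temporal-delimiter OBU. The sketch below is illustrative, not part of the patch; the helper names are made up, but the header layout and OBU type values follow the AV1 specification (section 5.3.2).

```c
#include <stdint.h>

/* AV1 OBU header, one byte (plus an optional extension byte):
 *   bit 7:    obu_forbidden_bit
 *   bits 6-3: obu_type
 *   bit 2:    obu_extension_flag
 *   bit 1:    obu_has_size_field
 *   bit 0:    obu_reserved_1bit
 */
enum {
	OBU_SEQUENCE_HEADER    = 1,
	OBU_TEMPORAL_DELIMITER = 2,
	OBU_FRAME_HEADER       = 3,
	OBU_TILE_GROUP         = 4,
	OBU_FRAME              = 6,
};

static inline int obu_type(uint8_t hdr)
{
	return (hdr >> 3) & 0x0f;
}

/* A temporal delimiter marks the start of a new Temporal Unit. */
static inline int is_tu_start(uint8_t hdr)
{
	return obu_type(hdr) == OBU_TEMPORAL_DELIMITER;
}
```

In practice a temporal delimiter OBU with a size field is the two bytes `0x12 0x00`, so `is_tu_start(0x12)` is true.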

> + The encoder generates one Access Unit per buffer. This format is
> + adapted for stateful video decoders. AV1 (AOMedia Video 1) is an
> + open video coding format. It was developed as a successor to VP9
> + by the Alliance for Open Media (AOMedia).
>
> .. raw:: latex
>
> diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
> index c314025d977e..fc0f43228546 100644
> --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> @@ -1497,6 +1497,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
> case V4L2_PIX_FMT_MT21C: descr = "Mediatek Compressed Format"; break;
> case V4L2_PIX_FMT_QC08C: descr = "QCOM Compressed 8-bit Format"; break;
> case V4L2_PIX_FMT_QC10C: descr = "QCOM Compressed 10-bit Format"; break;
> + case V4L2_PIX_FMT_AV1: descr = "AV1"; break;
> default:
> if (fmt->description[0])
> return;
> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> index 01e630f2ec78..c5ea9f38d807 100644
> --- a/include/uapi/linux/videodev2.h
> +++ b/include/uapi/linux/videodev2.h
> @@ -738,6 +738,7 @@ struct v4l2_pix_format {
> #define V4L2_PIX_FMT_FWHT_STATELESS v4l2_fourcc('S', 'F', 'W', 'H') /* Stateless FWHT (vicodec) */
> #define V4L2_PIX_FMT_H264_SLICE v4l2_fourcc('S', '2', '6', '4') /* H264 parsed slices */
> #define V4L2_PIX_FMT_HEVC_SLICE v4l2_fourcc('S', '2', '6', '5') /* HEVC parsed slices */
> +#define V4L2_PIX_FMT_AV1 v4l2_fourcc('A', 'V', '1', '0') /* AV1 */
>
> /* Vendor-specific formats */
> #define V4L2_PIX_FMT_CPIA1 v4l2_fourcc('C', 'P', 'I', 'A') /* cpia1 YUV */
>
> base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
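As a sanity check on the proposed fourcc: `v4l2_fourcc()` packs its four ASCII characters little-endian, so 'A','V','1','0' resolves to 0x30315641. The standalone macro definition below mirrors the one in include/uapi/linux/videodev2.h and is reproduced here only for illustration.

```c
#include <stdint.h>

/* Little-endian four-character code, as in include/uapi/linux/videodev2.h. */
#define v4l2_fourcc(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* The format proposed by this patch. */
#define V4L2_PIX_FMT_AV1 v4l2_fourcc('A', 'V', '1', '0') /* AV1 */
```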


2022-11-29 11:20:56

by Hsia-Jun Li

[permalink] [raw]
Subject: Re: [PATCH V2] [media] v4l2: Add AV1 pixel format

Hello

I think we need to add an extra event for VP9 and AV1, which support
frame scaling, meaning the frame width and height can differ from the
previous frame or a reference frame.

That is more likely for VP9, as VP9 has no sequence
header.

On 9/12/22 23:45, Nicolas Dufresne wrote:
> Hi Shi,
>
> thanks for the patches, check inline for some comments. Generally speaking, we
> don't usually add formats ahead of time unless we have a good rationale to do
> so. Should be expect a companion series against the amlogic decoder driver that
> enables this ?
>
> Le mardi 30 août 2022 à 09:40 +0800, Shi Hao a écrit :
>> From: "hao.shi" <[email protected]>
>>
>> Add AV1 compressed pixel format. It is the more common format.
>>
>> Signed-off-by: Hao Shi <[email protected]>
>> ---
>> .../userspace-api/media/v4l/pixfmt-compressed.rst | 9 +++++++++
>> drivers/media/v4l2-core/v4l2-ioctl.c | 1 +
>> include/uapi/linux/videodev2.h | 1 +
>> 3 files changed, 11 insertions(+)
>>
>> diff --git a/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst b/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
>> index 506dd3c98884..5bdeeebdf9f5 100644
>> --- a/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
>> +++ b/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
>> @@ -232,6 +232,15 @@ Compressed Formats
>> Metadata associated with the frame to decode is required to be passed
>> through the ``V4L2_CID_STATELESS_FWHT_PARAMS`` control.
>> See the :ref:`associated Codec Control ID <codec-stateless-fwht>`.
>> + * .. _V4L2-PIX-FMT-AV1:
>> +
>> + - ``V4L2_PIX_FMT_AV1``
>> + - 'AV1'
>> + - AV1 Access Unit. The decoder expects one Access Unit per buffer.
>
> I believe this is using a MPEG LA terminology. Did you mean a Temporal Unit (TU)
> ? In AV1 a TU represent 1 displayable picture, just like AU in H.264 (if you
> ignore interlaced video).
I think it should be a complete tile group OBU. From the spec, we have
the term 'frame'.

Currently, AV1 doesn't support interlacing.
>
>> + The encoder generates one Access Unit per buffer. This format is
>> + adapted for stateful video decoders. AV1 (AOMedia Video 1) is an
>> + open video coding format. It was developed as a successor to VP9
>> + by the Alliance for Open Media (AOMedia).
>>
>> .. raw:: latex
>>
>> diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
>> index c314025d977e..fc0f43228546 100644
>> --- a/drivers/media/v4l2-core/v4l2-ioctl.c
>> +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
>> @@ -1497,6 +1497,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
>> case V4L2_PIX_FMT_MT21C: descr = "Mediatek Compressed Format"; break;
>> case V4L2_PIX_FMT_QC08C: descr = "QCOM Compressed 8-bit Format"; break;
>> case V4L2_PIX_FMT_QC10C: descr = "QCOM Compressed 10-bit Format"; break;
>> + case V4L2_PIX_FMT_AV1: descr = "AV1"; break;
>> default:
>> if (fmt->description[0])
>> return;
>> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
>> index 01e630f2ec78..c5ea9f38d807 100644
>> --- a/include/uapi/linux/videodev2.h
>> +++ b/include/uapi/linux/videodev2.h
>> @@ -738,6 +738,7 @@ struct v4l2_pix_format {
>> #define V4L2_PIX_FMT_FWHT_STATELESS v4l2_fourcc('S', 'F', 'W', 'H') /* Stateless FWHT (vicodec) */
>> #define V4L2_PIX_FMT_H264_SLICE v4l2_fourcc('S', '2', '6', '4') /* H264 parsed slices */
>> #define V4L2_PIX_FMT_HEVC_SLICE v4l2_fourcc('S', '2', '6', '5') /* HEVC parsed slices */
>> +#define V4L2_PIX_FMT_AV1 v4l2_fourcc('A', 'V', '1', '0') /* AV1 */
>>
>> /* Vendor-specific formats */
>> #define V4L2_PIX_FMT_CPIA1 v4l2_fourcc('C', 'P', 'I', 'A') /* cpia1 YUV */
>>
>> base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
>
>

--
Hsia-Jun(Randy) Li

2022-12-06 18:40:39

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: [PATCH V2] [media] v4l2: Add AV1 pixel format

Le mardi 29 novembre 2022 à 18:32 +0800, Hsia-Jun Li a écrit :
> Hello
>
> I think we need to add an extra event for VP9 and AV1 which support
> frame scaling, which means its frame width and height could be different
> to the previous frame or reference frame.
>
> That would be more possible for the VP9 as there is not a sequence
> header for VP9.

The solution is unlikely to be in the form of an event, but yes, to complete VP9
support (and improve AV1 support) a mechanism needs to be designed and specified
to handle inter-frame resolution changes.

The reason I say "improve" AV1 is that the VP9 bitstream does not signal SVC
spatial streams (the most common use of inter-frame resolution changes). With
SVC streams, the smaller images are always decode-only (never displayed). This
can be at least partially supported as long as the maximum image dimension is
signalled by the bitstream. That is the case for AV1, but not VP9.

Stateless decoders are not affected, because userspace is aware of frames being
decoded but not displayed. It is also aware that these frames are reference
frames. On stateful decoders, however, userspace usually does not have this
knowledge. I think one way to solve this would be for drivers to be able to
mark a buffer done with a flag telling userspace that it is not to be displayed.
For the SVC case, the dimensions and stride are irrelevant.

For true inter-frame resolution changes, as VP9 supports (though rarely used),
this needs more APIs. It was suggested to extend CREATE_BUFS, which allows
allocation with a different FMT, with a DELETE_BUFS ioctl, so that userspace
can smoothly handle the allocation transition. For VP9 it might also be
required to support super-frames; VP9 super-frames are the ancestor of the AV1
TU, and only the last frame of a super-frame is ever to be displayed. A newly
introduced AV1 format might also require a complete TU, rather than frames;
this needs strict documentation. Decoding frames would mean that undisplayed
frames and frames of different sizes get delivered, and we don't have a method
to communicate these frame dimensions and strides at the moment.
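For context, the super-frame index mentioned above sits at the end of a VP9 chunk, bracketed by a repeated marker byte. The helper below is a hedged sketch (the function name and exact bounds checks are illustrative, not from the thread); the index layout follows Annex B of the VP9 specification: the last byte is 0b110xxyyy, where xx is bytes_per_framesize minus one and yyy is frames_in_superframe minus one.

```c
#include <stdint.h>
#include <stddef.h>

/* Parse the superframe index at the END of a VP9 chunk.
 * Returns the number of frames (0 if the chunk is not a superframe)
 * and fills sizes[] with each frame's byte size (at most 8 frames). */
static int vp9_parse_superframe(const uint8_t *data, size_t len,
				uint32_t sizes[8])
{
	uint8_t marker;
	size_t index_len, mag, frames, i, j;
	const uint8_t *idx;

	if (len < 1)
		return 0;
	marker = data[len - 1];
	if ((marker & 0xe0) != 0xc0)
		return 0;                       /* not a superframe */
	frames = (marker & 0x07) + 1;
	mag = ((marker >> 3) & 0x03) + 1;       /* bytes per size entry */
	index_len = 2 + mag * frames;
	/* The same marker byte must open the index. */
	if (len < index_len || data[len - index_len] != marker)
		return 0;
	idx = data + len - index_len + 1;
	for (i = 0; i < frames; i++) {
		uint32_t sz = 0;
		for (j = 0; j < mag; j++)       /* little-endian sizes */
			sz |= (uint32_t)*idx++ << (8 * j);
		sizes[i] = sz;
	}
	return (int)frames;
}
```

Only the last of the listed frames is ever displayed, which is exactly the decode-only ambiguity discussed in this thread.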

Nicolas



>
> On 9/12/22 23:45, Nicolas Dufresne wrote:
> > Hi Shi,
> >
> > thanks for the patches, check inline for some comments. Generally speaking, we
> > don't usually add formats ahead of time unless we have a good rationale to do
> > so. Should be expect a companion series against the amlogic decoder driver that
> > enables this ?
> >
> > Le mardi 30 août 2022 à 09:40 +0800, Shi Hao a écrit :
> > > From: "hao.shi" <[email protected]>
> > >
> > > Add AV1 compressed pixel format. It is the more common format.
> > >
> > > Signed-off-by: Hao Shi <[email protected]>
> > > ---
> > > .../userspace-api/media/v4l/pixfmt-compressed.rst | 9 +++++++++
> > > drivers/media/v4l2-core/v4l2-ioctl.c | 1 +
> > > include/uapi/linux/videodev2.h | 1 +
> > > 3 files changed, 11 insertions(+)
> > >
> > > diff --git a/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst b/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
> > > index 506dd3c98884..5bdeeebdf9f5 100644
> > > --- a/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
> > > +++ b/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
> > > @@ -232,6 +232,15 @@ Compressed Formats
> > > Metadata associated with the frame to decode is required to be passed
> > > through the ``V4L2_CID_STATELESS_FWHT_PARAMS`` control.
> > > See the :ref:`associated Codec Control ID <codec-stateless-fwht>`.
> > > + * .. _V4L2-PIX-FMT-AV1:
> > > +
> > > + - ``V4L2_PIX_FMT_AV1``
> > > + - 'AV1'
> > > + - AV1 Access Unit. The decoder expects one Access Unit per buffer.
> >
> > I believe this is using a MPEG LA terminology. Did you mean a Temporal Unit (TU)
> > ? In AV1 a TU represent 1 displayable picture, just like AU in H.264 (if you
> > ignore interlaced video).
> I think it should be a complete tile group obu. From the spec, we have
> the term 'frame'.
>
> Currently, AV1 doesn't support interlace.
> >
> > > + The encoder generates one Access Unit per buffer. This format is
> > > + adapted for stateful video decoders. AV1 (AOMedia Video 1) is an
> > > + open video coding format. It was developed as a successor to VP9
> > > + by the Alliance for Open Media (AOMedia).
> > >
> > > .. raw:: latex
> > >
> > > diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
> > > index c314025d977e..fc0f43228546 100644
> > > --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> > > +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> > > @@ -1497,6 +1497,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
> > > case V4L2_PIX_FMT_MT21C: descr = "Mediatek Compressed Format"; break;
> > > case V4L2_PIX_FMT_QC08C: descr = "QCOM Compressed 8-bit Format"; break;
> > > case V4L2_PIX_FMT_QC10C: descr = "QCOM Compressed 10-bit Format"; break;
> > > + case V4L2_PIX_FMT_AV1: descr = "AV1"; break;
> > > default:
> > > if (fmt->description[0])
> > > return;
> > > diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> > > index 01e630f2ec78..c5ea9f38d807 100644
> > > --- a/include/uapi/linux/videodev2.h
> > > +++ b/include/uapi/linux/videodev2.h
> > > @@ -738,6 +738,7 @@ struct v4l2_pix_format {
> > > #define V4L2_PIX_FMT_FWHT_STATELESS v4l2_fourcc('S', 'F', 'W', 'H') /* Stateless FWHT (vicodec) */
> > > #define V4L2_PIX_FMT_H264_SLICE v4l2_fourcc('S', '2', '6', '4') /* H264 parsed slices */
> > > #define V4L2_PIX_FMT_HEVC_SLICE v4l2_fourcc('S', '2', '6', '5') /* HEVC parsed slices */
> > > +#define V4L2_PIX_FMT_AV1 v4l2_fourcc('A', 'V', '1', '0') /* AV1 */
> > >
> > > /* Vendor-specific formats */
> > > #define V4L2_PIX_FMT_CPIA1 v4l2_fourcc('C', 'P', 'I', 'A') /* cpia1 YUV */
> > >
> > > base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
> >
> >
>

2022-12-07 07:32:46

by Hsia-Jun Li

[permalink] [raw]
Subject: Re: [PATCH V2] [media] v4l2: Add AV1 pixel format



On 12/7/22 02:03, Nicolas Dufresne wrote:
> CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
>
>
> Le mardi 29 novembre 2022 à 18:32 +0800, Hsia-Jun Li a écrit :
>> Hello
>>
>> I think we need to add an extra event for VP9 and AV1 which support
>> frame scaling, which means its frame width and height could be different
>> to the previous frame or reference frame.
>>
>> That would be more possible for the VP9 as there is not a sequence
>> header for VP9.
>
> The solution is unlikely in the form of an event, but yes, to complete VP9
> support (and improve AV1 support) a mechanism need to be designed and specified
> to handle inter-frame resolution changes.
>
> Why I say improve AV1, this is because VP9 bitstream does not signal SVC spatial
> streams (the most common use of inter-frame resolution changes). With SVC
> streams, the smaller images are alway decode-only (never displayed). This can be
> at least partially supported as long as the maximum image dimension is signalled
> by the bitstream. This is the case for AV1, but not VP9.
>
> Stateless decoders are not affected, because userspace is aware of frames being
> decoded, but not displayed. It is also aware that these frames are reference
> frames. While on stateless decoder, userspace usually does not have this
> knowledge. I think one way to solve this, would be for drivers to be able to
> mark a buffer done, with a flag telling userspace that its not to be displayed.
> For the SVC case, the dimensions and stride are irrelevant.
>
> For true inter-resolution changes, like VP9 supports (though rarely used), this
> needs more APIs. It was suggested to extend CREATE_BUFS, which allow allocation
> with different FMT, with a DELETE_BUFS ioctl, so that userspace can smoothly
> handle the allocation transition.
This could only solve the problem of never-displayed graphics buffers
like the golden frame or alternate reference frame.

As for the timestamp tracking problem in V4L2, maybe we could start
a new thread or move it to GStreamer:
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1619

My idea here is an event attached to a buffer, or just using the new request
support on the CAPTURE side. I know you worry about V4L2 events: they are
out of band, and more events could lead to the problems we suffered from in
OpenMAX. If we can guarantee an order between events and buffers, it won't be
a problem.
> For VP9 also, it might be required to support
> super-frame, VP9 supper frames are the ancestor of AV1 TU, and only the last
> frame of a super-frame is every to be displayed. A newly introduced AV1 format
> might also requires complete TU, rather then frames, this needs strict
> documentation.
I don't think the temporal unit is a good idea here.
Most hardware can only decode one frame at a time, or less, like a
tile (like a slice in the ITU codecs).

Considering the MPEG-TS case,
https://aomediacodec.github.io/av1-mpeg2-ts/
a Decodable Frame Group could be a better idea.
A Temporal Unit would lead to larger delay.

> Decoding frames would mean that un-display and frame of different
> sizes get delivered, and we don't have a method to communicate these frame
> dimension and strides at the moment.
>
> Nicolas
>
>
>
>>
>> On 9/12/22 23:45, Nicolas Dufresne wrote:
>>> Hi Shi,
>>>
>>> thanks for the patches, check inline for some comments. Generally speaking, we
>>> don't usually add formats ahead of time unless we have a good rationale to do
>>> so. Should be expect a companion series against the amlogic decoder driver that
>>> enables this ?
>>>
>>> Le mardi 30 août 2022 à 09:40 +0800, Shi Hao a écrit :
>>>> From: "hao.shi" <[email protected]>
>>>>
>>>> Add AV1 compressed pixel format. It is the more common format.
>>>>
>>>> Signed-off-by: Hao Shi <[email protected]>
>>>> ---
>>>> .../userspace-api/media/v4l/pixfmt-compressed.rst | 9 +++++++++
>>>> drivers/media/v4l2-core/v4l2-ioctl.c | 1 +
>>>> include/uapi/linux/videodev2.h | 1 +
>>>> 3 files changed, 11 insertions(+)
>>>>
>>>> diff --git a/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst b/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
>>>> index 506dd3c98884..5bdeeebdf9f5 100644
>>>> --- a/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
>>>> +++ b/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
>>>> @@ -232,6 +232,15 @@ Compressed Formats
>>>> Metadata associated with the frame to decode is required to be passed
>>>> through the ``V4L2_CID_STATELESS_FWHT_PARAMS`` control.
>>>> See the :ref:`associated Codec Control ID <codec-stateless-fwht>`.
>>>> + * .. _V4L2-PIX-FMT-AV1:
>>>> +
>>>> + - ``V4L2_PIX_FMT_AV1``
>>>> + - 'AV1'
>>>> + - AV1 Access Unit. The decoder expects one Access Unit per buffer.
>>>
>>> I believe this is using a MPEG LA terminology. Did you mean a Temporal Unit (TU)
>>> ? In AV1 a TU represent 1 displayable picture, just like AU in H.264 (if you
>>> ignore interlaced video).
>> I think it should be a complete tile group obu. From the spec, we have
>> the term 'frame'.
>>
>> Currently, AV1 doesn't support interlace.
>>>
>>>> + The encoder generates one Access Unit per buffer. This format is
>>>> + adapted for stateful video decoders. AV1 (AOMedia Video 1) is an
>>>> + open video coding format. It was developed as a successor to VP9
>>>> + by the Alliance for Open Media (AOMedia).
>>>>
>>>> .. raw:: latex
>>>>
>>>> diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
>>>> index c314025d977e..fc0f43228546 100644
>>>> --- a/drivers/media/v4l2-core/v4l2-ioctl.c
>>>> +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
>>>> @@ -1497,6 +1497,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
>>>> case V4L2_PIX_FMT_MT21C: descr = "Mediatek Compressed Format"; break;
>>>> case V4L2_PIX_FMT_QC08C: descr = "QCOM Compressed 8-bit Format"; break;
>>>> case V4L2_PIX_FMT_QC10C: descr = "QCOM Compressed 10-bit Format"; break;
>>>> + case V4L2_PIX_FMT_AV1: descr = "AV1"; break;
>>>> default:
>>>> if (fmt->description[0])
>>>> return;
>>>> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
>>>> index 01e630f2ec78..c5ea9f38d807 100644
>>>> --- a/include/uapi/linux/videodev2.h
>>>> +++ b/include/uapi/linux/videodev2.h
>>>> @@ -738,6 +738,7 @@ struct v4l2_pix_format {
>>>> #define V4L2_PIX_FMT_FWHT_STATELESS v4l2_fourcc('S', 'F', 'W', 'H') /* Stateless FWHT (vicodec) */
>>>> #define V4L2_PIX_FMT_H264_SLICE v4l2_fourcc('S', '2', '6', '4') /* H264 parsed slices */
>>>> #define V4L2_PIX_FMT_HEVC_SLICE v4l2_fourcc('S', '2', '6', '5') /* HEVC parsed slices */
>>>> +#define V4L2_PIX_FMT_AV1 v4l2_fourcc('A', 'V', '1', '0') /* AV1 */
>>>>
>>>> /* Vendor-specific formats */
>>>> #define V4L2_PIX_FMT_CPIA1 v4l2_fourcc('C', 'P', 'I', 'A') /* cpia1 YUV */
>>>>
>>>> base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
>>>
>>>
>>
>

--
Hsia-Jun(Randy) Li

2022-12-07 17:44:25

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: [PATCH V2] [media] v4l2: Add AV1 pixel format

Le mercredi 07 décembre 2022 à 15:18 +0800, Hsia-Jun Li a écrit :
>
> On 12/7/22 02:03, Nicolas Dufresne wrote:
> >
> >
> > Le mardi 29 novembre 2022 à 18:32 +0800, Hsia-Jun Li a écrit :
> > > Hello
> > >
> > > I think we need to add an extra event for VP9 and AV1 which support
> > > frame scaling, which means its frame width and height could be different
> > > to the previous frame or reference frame.
> > >
> > > That would be more possible for the VP9 as there is not a sequence
> > > header for VP9.
> >
> > The solution is unlikely in the form of an event, but yes, to complete VP9
> > support (and improve AV1 support) a mechanism need to be designed and specified
> > to handle inter-frame resolution changes.
> >
> > Why I say improve AV1, this is because VP9 bitstream does not signal SVC spatial
> > streams (the most common use of inter-frame resolution changes). With SVC
> > streams, the smaller images are alway decode-only (never displayed). This can be
> > at least partially supported as long as the maximum image dimension is signalled
> > by the bitstream. This is the case for AV1, but not VP9.
> >
> > Stateless decoders are not affected, because userspace is aware of frames being
> > decoded, but not displayed. It is also aware that these frames are reference
> > frames. While on stateless decoder, userspace usually does not have this
> > knowledge. I think one way to solve this, would be for drivers to be able to
> > mark a buffer done, with a flag telling userspace that its not to be displayed.
> > For the SVC case, the dimensions and stride are irrelevant.
> >
> > For true inter-resolution changes, like VP9 supports (though rarely used), this
> > needs more APIs. It was suggested to extend CREATE_BUFS, which allow allocation
> > with different FMT, with a DELETE_BUFS ioctl, so that userspace can smoothly
> > handle the allocation transition.
> This could only solve the problem of never display graphics buffers
> likes golden frame or alternative reference frame.
>
> About the topic timestamp tracking problem in v4l2, maybe we could start
> a new thread or move them to Gstreamer.
> https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1619
>
> My idea here is event attached to buffer or just using the new request
> supported in CAPTURE side. I know you worry about the v4l2 event, it is
> out of band, more event could lead to the problem we suffer from
> OpenMAX. If we could have an order between event and buffer, it won't be
> a problem.
> For VP9 also, it might be required to support
> > super-frame, VP9 supper frames are the ancestor of AV1 TU, and only the last
> > frame of a super-frame is every to be displayed. A newly introduced AV1 format
> > might also requires complete TU, rather then frames, this needs strict
> > documentation.
> I don't think the temporal unit is a good idea here.
> Most of hardware could only decode a frame once or less likes a
> tile(likes slice in ITU codecs).
>
> Considering the MPEG-TS case,
> https://aomediacodec.github.io/av1-mpeg2-ts/
> Decodable Frame Group could be more a better idea.
> Temporal Unit would lead to larger delay.

This is off topic for the thread, but this one could be fixed by setting a flag
on the capture buffer, something like:

V4L2_BUF_FLAG_DECODE_ONLY

That's similar to how it works in other CODEC APIs. The downside is that the
driver needs to remember whether this is a reference frame when userspace
queues back that decode-only frame, so it is not overwritten. Userspace is not
aware of the reference state, hence can't be made responsible. I suspect a lot
of the drivers out there use secondary buffers, meaning the references are not
the CAPTURE buffers. This use case needs to be thought through too. Perhaps
other drivers use internally allocated memory whenever they are about to
produce a decode-only frame, but that seems to require some firmware feature
that is likely uncommon. Please do your research, compare various drivers, and
propose an API in the form of an RFC so we can discuss it independently from
this AV1 pixel format thread.

>
> Decoding frames would mean that un-display and frame of different
> > sizes get delivered, and we don't have a method to communicate these frame
> > dimension and strides at the moment.
> >
> > Nicolas
> >
> >
> >
> > >
> > > On 9/12/22 23:45, Nicolas Dufresne wrote:
> > > > Hi Shi,
> > > >
> > > > thanks for the patches, check inline for some comments. Generally speaking, we
> > > > don't usually add formats ahead of time unless we have a good rationale to do
> > > > so. Should be expect a companion series against the amlogic decoder driver that
> > > > enables this ?
> > > >
> > > > Le mardi 30 août 2022 à 09:40 +0800, Shi Hao a écrit :
> > > > > From: "hao.shi" <[email protected]>
> > > > >
> > > > > Add AV1 compressed pixel format. It is the more common format.
> > > > >
> > > > > Signed-off-by: Hao Shi <[email protected]>
> > > > > ---
> > > > > .../userspace-api/media/v4l/pixfmt-compressed.rst | 9 +++++++++
> > > > > drivers/media/v4l2-core/v4l2-ioctl.c | 1 +
> > > > > include/uapi/linux/videodev2.h | 1 +
> > > > > 3 files changed, 11 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst b/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
> > > > > index 506dd3c98884..5bdeeebdf9f5 100644
> > > > > --- a/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
> > > > > +++ b/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
> > > > > @@ -232,6 +232,15 @@ Compressed Formats
> > > > > Metadata associated with the frame to decode is required to be passed
> > > > > through the ``V4L2_CID_STATELESS_FWHT_PARAMS`` control.
> > > > > See the :ref:`associated Codec Control ID <codec-stateless-fwht>`.
> > > > > + * .. _V4L2-PIX-FMT-AV1:
> > > > > +
> > > > > + - ``V4L2_PIX_FMT_AV1``
> > > > > + - 'AV1'
> > > > > + - AV1 Access Unit. The decoder expects one Access Unit per buffer.
> > > >
> > > > I believe this is using a MPEG LA terminology. Did you mean a Temporal Unit (TU)
> > > > ? In AV1 a TU represent 1 displayable picture, just like AU in H.264 (if you
> > > > ignore interlaced video).
> > > I think it should be a complete tile group obu. From the spec, we have
> > > the term 'frame'.
> > >
> > > Currently, AV1 doesn't support interlace.
> > > >
> > > > > + The encoder generates one Access Unit per buffer. This format is
> > > > > + adapted for stateful video decoders. AV1 (AOMedia Video 1) is an
> > > > > + open video coding format. It was developed as a successor to VP9
> > > > > + by the Alliance for Open Media (AOMedia).
> > > > >
> > > > > .. raw:: latex
> > > > >
> > > > > diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
> > > > > index c314025d977e..fc0f43228546 100644
> > > > > --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> > > > > +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> > > > > @@ -1497,6 +1497,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
> > > > > case V4L2_PIX_FMT_MT21C: descr = "Mediatek Compressed Format"; break;
> > > > > case V4L2_PIX_FMT_QC08C: descr = "QCOM Compressed 8-bit Format"; break;
> > > > > case V4L2_PIX_FMT_QC10C: descr = "QCOM Compressed 10-bit Format"; break;
> > > > > + case V4L2_PIX_FMT_AV1: descr = "AV1"; break;
> > > > > default:
> > > > > if (fmt->description[0])
> > > > > return;
> > > > > diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> > > > > index 01e630f2ec78..c5ea9f38d807 100644
> > > > > --- a/include/uapi/linux/videodev2.h
> > > > > +++ b/include/uapi/linux/videodev2.h
> > > > > @@ -738,6 +738,7 @@ struct v4l2_pix_format {
> > > > > #define V4L2_PIX_FMT_FWHT_STATELESS v4l2_fourcc('S', 'F', 'W', 'H') /* Stateless FWHT (vicodec) */
> > > > > #define V4L2_PIX_FMT_H264_SLICE v4l2_fourcc('S', '2', '6', '4') /* H264 parsed slices */
> > > > > #define V4L2_PIX_FMT_HEVC_SLICE v4l2_fourcc('S', '2', '6', '5') /* HEVC parsed slices */
> > > > > +#define V4L2_PIX_FMT_AV1 v4l2_fourcc('A', 'V', '1', '0') /* AV1 */
> > > > >
> > > > > /* Vendor-specific formats */
> > > > > #define V4L2_PIX_FMT_CPIA1 v4l2_fourcc('C', 'P', 'I', 'A') /* cpia1 YUV */
> > > > >
> > > > > base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
> > > >
> > > >
> > >
> >
>

2022-12-08 03:12:23

by Hsia-Jun Li

[permalink] [raw]
Subject: Re: [PATCH V2] [media] v4l2: Add AV1 pixel format



On 12/8/22 01:16, Nicolas Dufresne wrote:
>
>
> Le mercredi 07 décembre 2022 à 15:18 +0800, Hsia-Jun Li a écrit :
>>
>> On 12/7/22 02:03, Nicolas Dufresne wrote:
>>>
>>>
>>> Le mardi 29 novembre 2022 à 18:32 +0800, Hsia-Jun Li a écrit :
>>>> Hello
>>>>
>>>> I think we need to add an extra event for VP9 and AV1 which support
>>>> frame scaling, which means its frame width and height could be different
>>>> to the previous frame or reference frame.
>>>>
>>>> That would be more possible for the VP9 as there is not a sequence
>>>> header for VP9.
>>>
>>> The solution is unlikely in the form of an event, but yes, to complete VP9
>>> support (and improve AV1 support) a mechanism need to be designed and specified
>>> to handle inter-frame resolution changes.
>>>
>>> Why I say improve AV1, this is because VP9 bitstream does not signal SVC spatial
>>> streams (the most common use of inter-frame resolution changes). With SVC
>>> streams, the smaller images are alway decode-only (never displayed). This can be

We expect to get all the results from the different layers of an SVC stream.
Which layer is displayed is the user's decision.

1. The golden frame would usually be higher resolution or better quality,
but we would only display frames of lower resolution.

2. A higher resolution or quality layer would have a longer interval; the user
may just display a lower layer in the real-time case (like video conferencing).

>>> at least partially supported as long as the maximum image dimension is signalled
>>> by the bitstream. This is the case for AV1, but not VP9.
>>>
>>> Stateless decoders are not affected, because userspace is aware of frames being
>>> decoded, but not displayed. It is also aware that these frames are reference
>>> frames. While on stateless decoder, userspace usually does not have this
>>> knowledge. I think one way to solve this, would be for drivers to be able to
>>> mark a buffer done, with a flag telling userspace that its not to be displayed.
>>> For the SVC case, the dimensions and stride are irrelevant.
>>>
>>> For true inter-resolution changes, like VP9 supports (though rarely used), this
>>> needs more APIs. It was suggested to extend CREATE_BUFS, which allow allocation
>>> with different FMT, with a DELETE_BUFS ioctl, so that userspace can smoothly
>>> handle the allocation transition.
>> This could only solve the problem of never display graphics buffers
>> likes golden frame or alternative reference frame.
>>
>> About the topic timestamp tracking problem in v4l2, maybe we could start
>> a new thread or move them to Gstreamer.
>> https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1619
>>
>> My idea here is an event attached to a buffer, or just using the new
>> request support on the CAPTURE side. I know you worry about the v4l2
>> event being out of band; more events could lead to the problem we
>> suffered from in OpenMAX. If we could have an ordering between events
>> and buffers, it wouldn't be a problem.
>>> For VP9 also, it might be required to support
>>> super-frames; VP9 super-frames are the ancestor of the AV1 TU, and only the
>>> last frame of a super-frame is ever to be displayed. A newly introduced AV1
>>> format might also require a complete TU, rather than frames; this needs
>>> strict documentation.
>> I don't think the temporal unit is a good idea here.
>> Most hardware could only decode one frame at a time, or less, like a
>> tile (like a slice in ITU codecs).
>>
>> Considering the MPEG-TS case,
>> https://aomediacodec.github.io/av1-mpeg2-ts/
>> A Decodable Frame Group could be a better idea here.
>> A Temporal Unit would lead to a larger delay.
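As a point of reference for the frame-vs-TU granularity being debated above,
the frame boundaries inside an AV1 temporal unit can be found by walking OBU
headers. A minimal sketch, assuming the low-overhead bitstream format where
every OBU carries a size field (counting OBU_FRAME and OBU_FRAME_HEADER as
one frame each is a simplification of the spec):

```c
#include <stddef.h>
#include <stdint.h>

enum { OBU_FRAME_HEADER = 3, OBU_FRAME = 6 };

/* Decode an unsigned leb128 value; returns bytes consumed, 0 on error. */
static size_t read_leb128(const uint8_t *p, size_t len, uint64_t *out)
{
	uint64_t value = 0;
	size_t i;

	for (i = 0; i < 8 && i < len; i++) {
		value |= (uint64_t)(p[i] & 0x7f) << (7 * i);
		if (!(p[i] & 0x80)) {
			*out = value;
			return i + 1;
		}
	}
	return 0;
}

/* Count frames (OBU_FRAME + OBU_FRAME_HEADER) in one buffer; returns -1 on
 * malformed data or on OBUs without a size field (unsupported here). */
static int count_frames(const uint8_t *data, size_t len)
{
	size_t pos = 0;
	int frames = 0;

	while (pos < len) {
		uint8_t hdr = data[pos++];
		int type = (hdr >> 3) & 0xf;
		int has_extension = (hdr >> 2) & 1;
		int has_size = (hdr >> 1) & 1;
		uint64_t payload;
		size_t n;

		if (has_extension)
			pos++;	/* skip the temporal/spatial id byte */
		if (!has_size || pos >= len)
			return -1;
		n = read_leb128(data + pos, len - pos, &payload);
		if (!n || pos + n + payload > len)
			return -1;
		pos += n + payload;
		if (type == OBU_FRAME || type == OBU_FRAME_HEADER)
			frames++;
	}
	return frames;
}
```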
>
> This is off topic for the thread, but this one could be fixed by setting a flag
> on the capture buffer, something like:
>
> V4L2_BUF_FLAG_DECODE_ONLY
>
> That's similar to how it works with other CODEC APIs. The downside is that the
> driver needs to remember whether this is a reference frame when userspace
> queues back that decode-only frame, so it's not overwritten. Userspace is not
> aware of the reference state, hence can't be made responsible. I suspect a lot
> of the drivers out there use secondary buffers, meaning the references are not
> the CAPTURE buffers.
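To make the flag proposal concrete: if such a flag existed, userspace's
CAPTURE dequeue path might gate display on it. This is a hypothetical sketch;
V4L2_BUF_FLAG_DECODE_ONLY does not exist in mainline videodev2.h, so both the
flag value and the helper name below are illustrative assumptions.

```c
#include <stdint.h>

/* Hypothetical flag value, picked from unused buffer-flag space purely for
 * illustration; not a real V4L2 flag. */
#define BUF_FLAG_DECODE_ONLY_SKETCH	0x08000000u
/* Mirrors the real V4L2_BUF_FLAG_ERROR value. */
#define BUF_FLAG_ERROR_SKETCH		0x00000040u

/* Decide whether a dequeued CAPTURE buffer should be sent to display.
 * Decode-only frames are skipped for display but still queued back so the
 * decoder can keep using them as references. */
static int should_display(uint32_t flags)
{
	if (flags & BUF_FLAG_ERROR_SKETCH)
		return 0;
	if (flags & BUF_FLAG_DECODE_ONLY_SKETCH)
		return 0;
	return 1;
}
```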
Drivers certainly could allocate their internal buffers. But I believe
Android won't like this idea. They would like you to allocate buffers
somewhere else and then import them into the driver.

Besides, when you decode a secure (DRM) stream, you won't want to leak
any data from it, while for normal streams you don't want to occupy
that limited amount of secure memory. I would like to let the
userspace control the allocation of those internal buffers.

This use case needs to be thought through too.
> Perhaps another driver uses internally allocated memory whenever it's about
> to produce a decode-only frame, but that seems to require some firmware
> feature that is likely uncommon. Please do your research, compare various
> drivers, and propose an API in the form of an RFC so we can discuss that
> independently from this AV1 pixel format thread.
My proposal for solving the timestamp tracking issue is to make v4l2
events ordered relative to buffers.

It would come after I refresh the v4l2 pixel format extension API.
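One way to picture that ordering proposal: if each event carried the sequence
number of the buffer it must precede, userspace could drain the due events
before handling a dequeued buffer. A sketch of just the pure ordering logic
(the sequence-carrying event itself is hypothetical, not a current v4l2 API):

```c
#include <stddef.h>
#include <stdint.h>

/* Given pending event sequence numbers (sorted ascending, in delivery
 * order) and the sequence number of the buffer just dequeued, return how
 * many leading events must be handled before processing this buffer. */
static size_t events_due_before(const uint32_t *ev_seq, size_t n_ev,
				uint32_t buf_seq)
{
	size_t i = 0;

	while (i < n_ev && ev_seq[i] <= buf_seq)
		i++;
	return i;
}
```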
>
>>
>>> Decoding frames would mean that un-displayed frames and frames of different
>>> sizes get delivered, and we don't have a method to communicate these frame
>>> dimensions and strides at the moment.
>>>
>>> Nicolas

--
Hsia-Jun(Randy) Li

2022-12-15 21:10:25

by Nicolas Dufresne

Subject: Re: [PATCH V2] [media] v4l2: Add AV1 pixel format

On Thursday, December 8, 2022 at 10:39 +0800, Hsia-Jun Li wrote:
>
> On 12/8/22 01:16, Nicolas Dufresne wrote:
> >
> >
> > > On Wednesday, December 7, 2022 at 15:18 +0800, Hsia-Jun Li wrote:
> > >
> > > On 12/7/22 02:03, Nicolas Dufresne wrote:
> > > >
> > > >
> > > > > On Tuesday, November 29, 2022 at 18:32 +0800, Hsia-Jun Li wrote:
> > > > > Hello
> > > > >
> > > > > I think we need to add an extra event for VP9 and AV1, which support
> > > > > frame scaling, meaning a frame's width and height can differ from
> > > > > the previous frame or the reference frame.
> > > > >
> > > > > That is more likely for VP9, as there is no sequence header in VP9.
> > > >
> > > > The solution is unlikely in the form of an event, but yes, to complete VP9
> > > > support (and improve AV1 support) a mechanism needs to be designed and
> > > > specified to handle inter-frame resolution changes.
> > > >
> > > > Why do I say improve AV1? Because the VP9 bitstream does not signal SVC spatial
> > > > streams (the most common use of inter-frame resolution changes). With SVC
> > > > streams, the smaller images are always decode-only (never displayed). This can be
>
> We expect to get all the results from the different layers of an SVC stream.
> Which layer gets displayed is the user's decision.

This is detached from reality. It's the conferencing server that measures each
participant's bandwidth and decides how many layers each one should get, to
avoid overloading the network. The selection happens in the compressed domain.
Perhaps someone could be creative and make up a use case for what you describe,
but this isn't used in practice. Adding multi-resolution output requires
massive API additions in stateful decoders (which this hidden RFC does not
cover).

>
> 1. The golden frame would usually be of higher resolution or better quality,
> but we would only display frames of lower resolution.
>
> 2. A higher-resolution or higher-quality layer would have a longer interval;
> the user may display just a lower layer in a real-time case (like video
> conferencing).
>
> > > > at least partially supported as long as the maximum image dimension is signalled
> > > > by the bitstream. This is the case for AV1, but not VP9.
> > > >
> > > > Stateless decoders are not affected, because userspace is aware of frames being
> > > > decoded, but not displayed. It is also aware that these frames are reference
> > > > frames. With a stateful decoder, userspace usually does not have this
> > > > knowledge. I think one way to solve this would be for drivers to be able to
> > > > mark a buffer done with a flag telling userspace that it's not to be displayed.
> > > > For the SVC case, the dimensions and stride are irrelevant.
> > > >
> > > > For true inter-resolution changes, like VP9 supports (though rarely used), this
> > > > needs more APIs. It was suggested to extend CREATE_BUFS, which allows
> > > > allocation with a different FMT, with a DELETE_BUFS ioctl, so that userspace
> > > > can smoothly handle the allocation transition.
> > > This could only solve the problem of never-displayed graphics buffers
> > > like the golden frame or the alternate reference frame.
> > >
> > > About the topic timestamp tracking problem in v4l2, maybe we could start
> > > a new thread or move them to Gstreamer.
> > > https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1619
> > >
> > > My idea here is an event attached to a buffer, or just using the new
> > > request support on the CAPTURE side. I know you worry about the v4l2
> > > event being out of band; more events could lead to the problem we
> > > suffered from in OpenMAX. If we could have an ordering between events
> > > and buffers, it wouldn't be a problem.
> > > > For VP9 also, it might be required to support
> > > > super-frames; VP9 super-frames are the ancestor of the AV1 TU, and only the
> > > > last frame of a super-frame is ever to be displayed. A newly introduced AV1
> > > > format might also require a complete TU, rather than frames; this needs
> > > > strict documentation.
> > > I don't think the temporal unit is a good idea here.
> > > Most hardware could only decode one frame at a time, or less, like a
> > > tile (like a slice in ITU codecs).
> > >
> > > Considering the MPEG-TS case,
> > > https://aomediacodec.github.io/av1-mpeg2-ts/
> > > A Decodable Frame Group could be a better idea here.
> > > A Temporal Unit would lead to a larger delay.
> >
> > This is off topic for the thread, but this one could be fixed by setting a flag
> > on the capture buffer, something like:
> >
> > V4L2_BUF_FLAG_DECODE_ONLY
> >
> > That's similar to how it works with other CODEC APIs. The downside is that the
> > driver needs to remember whether this is a reference frame when userspace
> > queues back that decode-only frame, so it's not overwritten. Userspace is not
> > aware of the reference state, hence can't be made responsible. I suspect a lot
> > of the drivers out there use secondary buffers, meaning the references are not
> > the CAPTURE buffers.
> Drivers certainly could allocate their internal buffers. But I believe
> Android won't like this idea. They would like you to allocate buffers
> somewhere else and then import them into the driver.
>
> Besides, when you decode a secure (DRM) stream, you won't want to leak
> any data from it, while for normal streams you don't want to occupy
> that limited amount of secure memory. I would like to let the
> userspace control the allocation of those internal buffers.
>
> This use case needs to be thought through too.
> > Perhaps another driver uses internally allocated memory whenever it's about
> > to produce a decode-only frame, but that seems to require some firmware
> > feature that is likely uncommon. Please do your research, compare various
> > drivers, and propose an API in the form of an RFC so we can discuss that
> > independently from this AV1 pixel format thread.
> My proposal for solving the timestamp tracking issue is to make v4l2
> events ordered relative to buffers.
>
> It would come after I refresh the v4l2 pixel format extension API.
> >
> > >
> > > > Decoding frames would mean that un-displayed frames and frames of different
> > > > sizes get delivered, and we don't have a method to communicate these frame
> > > > dimensions and strides at the moment.
> > > >
> > > > Nicolas

2022-12-16 03:14:55

by Hsia-Jun Li

Subject: Re: [PATCH V2] [media] v4l2: Add AV1 pixel format



On 12/16/22 04:18, Nicolas Dufresne wrote:
>
>
> On Thursday, December 8, 2022 at 10:39 +0800, Hsia-Jun Li wrote:
>>
>> On 12/8/22 01:16, Nicolas Dufresne wrote:
>>>
>>>
>>> On Wednesday, December 7, 2022 at 15:18 +0800, Hsia-Jun Li wrote:
>>>>
>>>> On 12/7/22 02:03, Nicolas Dufresne wrote:
>>>>>
>>>>>
>>>>> On Tuesday, November 29, 2022 at 18:32 +0800, Hsia-Jun Li wrote:
>>>>>> Hello
>>>>>>
>>>>>> I think we need to add an extra event for VP9 and AV1, which support
>>>>>> frame scaling, meaning a frame's width and height can differ from
>>>>>> the previous frame or the reference frame.
>>>>>>
>>>>>> That is more likely for VP9, as there is no sequence header in VP9.
>>>>>
>>>>> The solution is unlikely in the form of an event, but yes, to complete VP9
>>>>> support (and improve AV1 support) a mechanism needs to be designed and
>>>>> specified to handle inter-frame resolution changes.
>>>>>
>>>>> Why do I say improve AV1? Because the VP9 bitstream does not signal SVC spatial
>>>>> streams (the most common use of inter-frame resolution changes). With SVC
>>>>> streams, the smaller images are always decode-only (never displayed). This can be
>>
>> We expect to get all the results from the different layers of an SVC stream.
>> Which layer gets displayed is the user's decision.
>
> This is detached from reality. It's the conferencing server that measures
> each participant's bandwidth and decides how many layers each one should
> get, to avoid overloading
I think the majority would prefer the P2P mode with forwarding. Server-side
negotiation and re-encoding are still in use, but they add lag that people
don't want.
> the network. The selection happens in the compressed domain. Perhaps someone
> could be creative and make up a use case for what you describe, but this
> isn't used in practice.
I know of at least two commercial software products that support this.

Consider a simpler case, not even SVC. Can VP9 and AV1 support a current
frame with a lower resolution and worse quality than its reference frame?
I think the answer is certainly yes; that is the idea of the golden frame.
So "the smaller images are always decode-only (never displayed)" is not
right.
Only the inverted case is a practice of SVC.

> Adding multi-resolution output requires massive API additions in
> stateful decoders (which this hidden RFC does not cover).
Yes, it would be better to start a new topic on that.
>
>>
>> 1. The golden frame would usually be of higher resolution or better quality,
>> but we would only display frames of lower resolution.
>>
>> 2. A higher-resolution or higher-quality layer would have a longer interval;
>> the user may display just a lower layer in a real-time case (like video
>> conferencing).
>>
>>>>> at least partially supported as long as the maximum image dimension is signalled
>>>>> by the bitstream. This is the case for AV1, but not VP9.
>>>>>
>>>>> Stateless decoders are not affected, because userspace is aware of frames being
>>>>> decoded, but not displayed. It is also aware that these frames are reference
>>>>> frames. With a stateful decoder, userspace usually does not have this
>>>>> knowledge. I think one way to solve this would be for drivers to be able to
>>>>> mark a buffer done with a flag telling userspace that it's not to be displayed.
>>>>> For the SVC case, the dimensions and stride are irrelevant.
>>>>>
>>>>> For true inter-resolution changes, like VP9 supports (though rarely used), this
>>>>> needs more APIs. It was suggested to extend CREATE_BUFS, which allows
>>>>> allocation with a different FMT, with a DELETE_BUFS ioctl, so that userspace
>>>>> can smoothly handle the allocation transition.
>>>> This could only solve the problem of never-displayed graphics buffers
>>>> like the golden frame or the alternate reference frame.
>>>>
>>>> About the topic timestamp tracking problem in v4l2, maybe we could start
>>>> a new thread or move them to Gstreamer.
>>>> https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1619
>>>>
>>>> My idea here is an event attached to a buffer, or just using the new
>>>> request support on the CAPTURE side. I know you worry about the v4l2
>>>> event being out of band; more events could lead to the problem we
>>>> suffered from in OpenMAX. If we could have an ordering between events
>>>> and buffers, it wouldn't be a problem.
>>>>> For VP9 also, it might be required to support
>>>>> super-frames; VP9 super-frames are the ancestor of the AV1 TU, and only the
>>>>> last frame of a super-frame is ever to be displayed. A newly introduced AV1
>>>>> format might also require a complete TU, rather than frames; this needs
>>>>> strict documentation.
>>>> I don't think the temporal unit is a good idea here.
>>>> Most hardware could only decode one frame at a time, or less, like a
>>>> tile (like a slice in ITU codecs).
>>>>
>>>> Considering the MPEG-TS case,
>>>> https://aomediacodec.github.io/av1-mpeg2-ts/
>>>> A Decodable Frame Group could be a better idea here.
>>>> A Temporal Unit would lead to a larger delay.
>>>
>>> This is off topic for the thread, but this one could be fixed by setting a flag
>>> on the capture buffer, something like:
>>>
>>> V4L2_BUF_FLAG_DECODE_ONLY
>>>
>>> That's similar to how it works with other CODEC APIs. The downside is that the
>>> driver needs to remember whether this is a reference frame when userspace
>>> queues back that decode-only frame, so it's not overwritten. Userspace is not
>>> aware of the reference state, hence can't be made responsible. I suspect a lot
>>> of the drivers out there use secondary buffers, meaning the references are not
>>> the CAPTURE buffers.
>> Drivers certainly could allocate their internal buffers. But I believe
>> Android won't like this idea. They would like you to allocate buffers
>> somewhere else and then import them into the driver.
>>
>> Besides, when you decode a secure (DRM) stream, you won't want to leak
>> any data from it, while for normal streams you don't want to occupy
>> that limited amount of secure memory. I would like to let the
>> userspace control the allocation of those internal buffers.
>>
>> This use case needs to be thought through too.
>>> Perhaps another driver uses internally allocated memory whenever it's about
>>> to produce a decode-only frame, but that seems to require some firmware
>>> feature that is likely uncommon. Please do your research, compare various
>>> drivers, and propose an API in the form of an RFC so we can discuss that
>>> independently from this AV1 pixel format thread.
>> My proposal for solving the timestamp tracking issue is to make v4l2
>> events ordered relative to buffers.
>>
>> It would come after I refresh the v4l2 pixel format extension API.
>>>
>>>>
>>>>> Decoding frames would mean that un-displayed frames and frames of different
>>>>> sizes get delivered, and we don't have a method to communicate these frame
>>>>> dimensions and strides at the moment.
>>>>>
>>>>> Nicolas

--
Hsia-Jun(Randy) Li