LinuxLists.cc - [PATCH 0/2] Better support for complex pixel formats

Hi Miquel,

On Fri, Feb 02, 2024 at 10:26:01AM +0100, Miquel Raynal wrote:
> [email protected] wrote on Fri, 2 Feb 2024 10:55:22 +0200:
>
> > On Thu, 01 Feb 2024 18:31:32 +0100
> > Louis Chauvet <[email protected]> wrote:
> >
> > > Change the composition algorithm to iterate over pixels instead of lines.
> > > It allows a simpler management of rotation and pixel access for complex formats.
> > >
> > > This new algorithm allows read_pixel function to have access to x/y
> > > coordinates and make it possible to read the correct thing in a block
> > > when block_w and block_h are not 1.
> > > The iteration pixel-by-pixel in the same method also allows a simpler
> > > management of rotation with drm_rect_* helpers. This way it's not needed
> > > anymore to have misterious switch-case distributed in multiple places.
> >
> > Hi,
> >
> > there was a very good reason to write this code using lines:
> > performance. Before lines, it was indeed operating on individual pixels.
> >
> > Please, include performance measurements before and after this series
> > to quantify the impact on the previously already supported pixel
> > formats, particularly the 32-bit-per-pixel RGB variants.
> >
> > VKMS will be used more and more in CI for userspace projects, and
> > performance actually matters there.
> >
> > I'm worrying that this performance degradation here is significant. I
> > believe it is possible to keep blending with lines, if you add new line
> > getters for reading from rotated, sub-sampled etc. images. That way you
> > don't have to regress the most common formats' performance.
>
> While I understand performance is important and should be taken into
> account seriously, I cannot understand how broken testing could be
> considered better. Fast but inaccurate will always be significantly
> less attractive to my eyes.

AFAIK, neither the cover letter nor the commit log claimed it was fixing
something broken, just that it was "better" (according to what
criteria?).

If something is truly broken, it must be stated what exactly is so we
can all come up with a solution that will satisfy everyone.

> I am in favor of making this working first, and then improving the code
> for faster results. Maybe the line-driven approach can be dedicated to
> "simpler" formats where more complex corner cases do not happen.

Which ones?

Maxime

Attachments:

(No filename) (2.38 kB)
signature.asc (235.00 B)
Download all attachments

2024-02-02 10:14:57

by Maíra Canal

by Arthur Grillo

[permalink] [raw]

Subject: Re: [PATCH 2/2] drm/vkms: Use a simpler composition function

On 07/02/24 13:03, Louis Chauvet wrote:
> Hello Pekka, Arthur,
>
> [...]
>
>>>> Would it be possible to have a standardised benchmark specifically
>>>> for performance rather than correctness, in IGT or where-ever it
>>>> would make sense? Then it would be simple to tell contributors to
>>>> run this and report the numbers before and after.
>>>>
>>>> I would propose this kind of KMS layout:
>>>>
>>>> - CRTC size 3841 x 2161
>>>> - primary plane, XRGB8888, 3639 x 2161 @ 101,0
>>>> - overlay A, XBGR2101010, 3033 x 1777 @ 201,199
>>>> - overlay B, ARGB8888, 1507 x 1400 @ 1800,250
>>>>
>>>> The sizes and positions are deliberately odd to try to avoid happy
>>>> alignment accidents. The planes are big, which should let the pixel
>>>> operations easily dominate performance measurement. There are
>>>> different pixel formats, both opaque and semi-transparent. There is
>>>> lots of plane overlap. The planes also do not cover the whole CRTC
>>>> leaving the background visible a bit.
>>>>
>>>> There should be two FBs per each plane, flipped alternatingly each
>>>> frame. Writeback should be active. Run this a number of frames, say,
>>>> 100, and measure the kernel CPU time taken. It's supposed to take at
>>>> least several seconds in total.
>>>>
>>>> I think something like this should be the base benchmark. One can
>>>> add more to it, like rotated planes, YUV planes, etc. or switch
>>>> settings on the existing planes. Maybe even FB_DAMAGE_CLIPS. Maybe
>>>> one more overlay that is very tall and thin.
>>>>
>>>> Just an idea, what do you all think?
>>>
>>> Hi Pekka,
>>>
>>> I just finished writing this proposal using IGT.
>>>
>>> I got pretty interesting results:
>>>
>>> The mentioned commit 8356b97906503a02125c8d03c9b88a61ea46a05a took
>>> around 13 seconds. While drm-misc/drm-misc-next took 36 seconds.
>>>
>>> I'm currently bisecting to be certain that the change to the
>>> pixel-by-pixel is the culprit, but I don't see why it wouldn't be.
>>>
>>> I just need to do some final touches on the benchmark code and it
>>> will be ready for revision.
>>
>> Awesome, thank you very much for doing that!
>> pq
>
> I also think it's a good benchmarks for classic configurations. The odd
> size is a very nice idea to verify the corner cases of line-by-line
> algorithms.
>
> When this is ready, please share the test, so I can check if my patch is
> as performant as before.
>
> Thank you for this work.
>
> Have a nice day,
> Louis Chauvet
>

Just sent the benchmark for revision:
https://lore.kernel.org/r/[email protected]

2024-02-08 09:40:44

by Pekka Paalanen

[permalink] [raw]

Subject: Re: [PATCH 2/2] drm/vkms: Use a simpler composition function

On Wed, 7 Feb 2024 16:49:56 +0100
Louis Chauvet <[email protected]> wrote:

> Hello Pekka, Arthur, Maxime,

Hi all

> > > > >>>>>> Change the composition algorithm to iterate over pixels instead of lines.
> > > > >>>>>> It allows a simpler management of rotation and pixel access for complex formats.
> > > > >>>>>>
> > > > >>>>>> This new algorithm allows read_pixel function to have access to x/y
> > > > >>>>>> coordinates and make it possible to read the correct thing in a block
> > > > >>>>>> when block_w and block_h are not 1.
> > > > >>>>>> The iteration pixel-by-pixel in the same method also allows a simpler
> > > > >>>>>> management of rotation with drm_rect_* helpers. This way it's not needed
> > > > >>>>>> anymore to have misterious switch-case distributed in multiple places.
> > > > >>>>>
> > > > >>>>> Hi,
> > > > >>>>>
> > > > >>>>> there was a very good reason to write this code using lines:
> > > > >>>>> performance. Before lines, it was indeed operating on individual pixels.
> > > > >>>>>
> > > > >>>>> Please, include performance measurements before and after this series
> > > > >>>>> to quantify the impact on the previously already supported pixel
> > > > >>>>> formats, particularly the 32-bit-per-pixel RGB variants.
> > > > >>>>>
> > > > >>>>> VKMS will be used more and more in CI for userspace projects, and
> > > > >>>>> performance actually matters there.
> > > > >>>>>
> > > > >>>>> I'm worrying that this performance degradation here is significant. I
> > > > >>>>> believe it is possible to keep blending with lines, if you add new line
> > > > >>>>> getters for reading from rotated, sub-sampled etc. images. That way you
> > > > >>>>> don't have to regress the most common formats' performance.
>
> I tested, and yes, it's significant for most of the tests. None of them
> timed out on my machine, but I agree that I have to improve this. Do you
> know which tests are the more "heavy"?

I don't, but considering that various userspace projects (e.g. Wayland
compositors) want to use VKMS more and more in their own CI, looking
only at IGT is not enough. Every second saved per run is a tiny bit of
data center energy saved, or developers waiting less for results.

I do have some expectations that for each KMS property, Wayland
compositors tend to use the "normal" property value more than any other
value. So if you test different pixel formats, you probably set
rotation to normal, since it's completely orthogonal in userspace. And
then you would test different rotations with just one pixel format.

At least I would personally leave it to IGT to test all the possible
combinations of pixel formats + rotations + odd sizes + odd positions.
Wayland compositor CI wants to test the compositor internals, not VKMS
internals.

> > > > >>>> While I understand performance is important and should be taken into
> > > > >>>> account seriously, I cannot understand how broken testing could be
> > > > >>>> considered better. Fast but inaccurate will always be significantly
> > > > >>>> less attractive to my eyes.
> > > > >>>
> > > > >>> AFAIK, neither the cover letter nor the commit log claimed it was fixing
> > > > >>> something broken, just that it was "better" (according to what
> > > > >>> criteria?).
>
> Sorry Maxime for this little missunderstanding, I will improve the commit
> message and cover letter for the v2.
>
> > > > >> Today's RGB implementation is only optimized in the line-by-line case
> > > > >> when there is no rotation. The logic is bit convoluted and may possibly
> > > > >> be slightly clarified with a per-format read_line() implementation,
> > > > >> at a very light performance cost. Such an improvement would definitely
> > > > >> benefit to the clarity of the code, especially when transformations
> > > > >> (especially the rotations) come into play because they would be clearly
> > > > >> handled differently instead of being "hidden" in the optimized logic.
> > > > >> Performances would not change much as this path is not optimized today
> > > > >> anyway (the pixel-oriented logic is already used in the rotation case).
>
> [...]
>
> > > > > I think it would, if I understand what you mean. Ever since I proposed
> > > > > a line-by-line algorithm to improve the performance, I was thinking of
> > > > > per-format read_line() functions that would be selected outside of any
> > > > > loops.
>
> [...]
>
> > > > > I haven't looked at VKMS in a long time, and I am disappointed to find
> > > > > that vkms_compose_row() is calling plane->pixel_read() pixel-by-pixel.
> > > > > The reading vfunc should be called with many pixels at a time when the
> > > > > source FB layout allows it. The whole point of the line-based functions
> > > > > was that they repeat the innermost loop in every function body to make
> > > > > the per-pixel overhead as small as possible. The VKMS implementations
> > > > > benchmarked before and after the original line-based algorithm showed
> > > > > that calling a function pointer per-pixel is relatively very expensive.
> > > > > Or maybe it was a switch-case.
>
> [...]
>
> > > > But, I agree with Miquel that the rotation logic is easier to implement
> > > > in a pixel-based way. So going pixel-by-pixel only when rotation occurs
> > > > would be great.
> > >
> > > Yes, and I think that can very well be done in the line-based framework
> > > still that existed in the old days before any rotation support was
> > > added. Essentially a plug-in line-getter function that then calls a
> > > format-specific line-getter pixel-by-pixel while applying the rotation.
> > > It would be simple, it would leave unrotated performance unharmed (use
> > > format-specific line-getter directly with lines), but it might be
> > > somewhat less performant for rotated KMS planes. I suspect that might
> > > be a good compromise.
> > >
> > > Format-specific line-getters could also be parameterized by
> > > pixel-to-pixel offset in bytes. Then they could directly traverse FB
> > > rows forward and backward, and even FB columns. It may or may not have
> > > a penalty compared to the original line-getters, so it would have to
> > > be benchmarked.
> >
> > Oh, actually, since the byte offset depends on format, it might be
> > better to parametrize by direction and compute the offset in the
> > format-specific line-getter before the loop.
> >
>
> I'm currently working on this implementation. The algorithm would look
> like:
>
> void blend(...) {
> for(int y = 0; y < height; y++) {
> for(int plane = 0; plane < nb_planes; plane++) {
> if(planes[plane].read_line && planes[plane].rotation == DRM_ROTATION_0) {

I would try to drop the rotation check here completely. Instead, when
choosing the function pointer to call here, outside of *all* loops, you
would check the rotation property. If rotation is a no-op, pick the
read_line function directly. If rotation/reflection is needed, pick a
rotation function that will then call read_line function pixel-by-pixel.

So planes[plane] would have two vfuncs, one with a plain read_line that
assumes normal orientation and can return a line of arbitrary length
from arbitrary x,y position, and another vfunc that this loop here will
call which is either some rotation handling function or just the same
function as the first vfunc.

The two function pointers might well need different signatures, meaning
you need a simple wrapper for the rotation=normal case too.

I believe that could result in cleaner code.

> [...] /* Small common logic to manage REFLECT_X/Y and translations */
> planes[plane].read_line(....);
> } else {
> [...] /* v1 of my patch, pixel by pixel read */
> }
> }
> }
> }
>
> where read_line is:
> void read_line(frame_info *src, int y, int x_start, int x_stop, pixel_argb16 *dts[])
> - y is the line to read (so the caller need to compute the correct offset)
> - x_start/x_stop are the start and stop column, but they may be not
> ordered properly (i.e DRM_REFLECT_X will make x_start greater than
> x_stop)
> - src/dst are source and destination buffers

This sounds ok. An alternative would be something like

enum direction {
RIGHT,
LEFT,
UP,
DOWN,
};

void read_line(frame_info *src, int start_x, int start_y, enum direction dir,
int count_pixels, pixel_argb16 *dst);

Based on dir, before the inner loop this function would compute the
byte offset between the pixels to be read. If the format is multiplanar
YUV, it can compute the offset per plane. And the starting pointers per
pixel plane, of course, and one end pointer for the loop stop condition
maybe from dst.

This might make all the other directions than RIGHT much faster than
calling read_line one pixel at a time to achieve the same.

Would need to benchmark if this is significantly slower than your
suggestion for dir=RIGHT, though. If it's roughly the same, then it
would probably be worth it.

> This way:
> - It's simple to read for the general case (usage of drm_rect_* instead of
> manually rewriting the logic)
> - Each pixel format can be quickly implemented with "pixel-by-pixel"
> methods
> - The performances should be good if no rotation is implied for some
> formats
>
> I also created some helpers for conversions between formats to avoid code
> duplication between pixel and line algorithms (and also between argb and
> xrgb variants).
>
> The only flaw with this, is that most of the read_line functions will
> look like:
>
> void read_line(...) {
> int increment = x_start < x_stop ? 1: -1;
> while(x_start != x_stop) {
> out += 1;
> [...] /* color conversion */
> x_start += increment;
> }
> }
>
> But as Pekka explained, it's probably the most efficient way to do it.

Yes, I expect them to look roughly like that. It's necessary for moving
as much of the setup computations and real function calls out of the
inner-most loop as possible. The middle (over KMS planes) and outer
(over y) loops are less sensitive to wasted cycles on redundant
computations.

> Is there a way to save the output of vkms to a folder (something like
> "one image per frame")? It's a bit complex to debug graphics without
> seeing them.
>
> I have something which (I think) should work, but some tests are failing
> and I don't find why in my code (I don't find the reason why the they are
> failing and the hexdump I added to debug seems normal).
>
> I think my issue is a simple mistake in the "translation managment" or
> maybe just a wrong offset, but I don't see my error in the code. I think a
> quick look on the final image would help me a lot.

I don't know anything about the kernel unit testing frameworks, maybe
they could help?

But if you drive the test through UAPI from userspace, use a writeback
connector to fetch the resulting image. I'm sure IGT has code doing
that somewhere, although many IGT tests rely on CRC instead because CRC
is more widely available in hardware.

Arthur's new benchmark seems to be using writeback, you just need to
make it save to file:
https://lore.kernel.org/all/[email protected]/

Thanks,
pq

Attachments:

(No filename) (849.00 B)
OpenPGP digital signature

2024-02-15 18:24:01

by Arthur Grillo

[permalink] [raw]

Subject: Re: [PATCH 2/2] drm/vkms: Use a simpler composition function

On 08/02/24 06:39, Pekka Paalanen wrote:
> On Wed, 7 Feb 2024 16:49:56 +0100
> Louis Chauvet <[email protected]> wrote:
>
>> Hello Pekka, Arthur, Maxime,
>
> Hi all
>
>>>>>>>>>>> Change the composition algorithm to iterate over pixels instead of lines.
>>>>>>>>>>> It allows a simpler management of rotation and pixel access for complex formats.
>>>>>>>>>>>
>>>>>>>>>>> This new algorithm allows read_pixel function to have access to x/y
>>>>>>>>>>> coordinates and make it possible to read the correct thing in a block
>>>>>>>>>>> when block_w and block_h are not 1.
>>>>>>>>>>> The iteration pixel-by-pixel in the same method also allows a simpler
>>>>>>>>>>> management of rotation with drm_rect_* helpers. This way it's not needed
>>>>>>>>>>> anymore to have misterious switch-case distributed in multiple places.
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> there was a very good reason to write this code using lines:
>>>>>>>>>> performance. Before lines, it was indeed operating on individual pixels.
>>>>>>>>>>
>>>>>>>>>> Please, include performance measurements before and after this series
>>>>>>>>>> to quantify the impact on the previously already supported pixel
>>>>>>>>>> formats, particularly the 32-bit-per-pixel RGB variants.
>>>>>>>>>>
>>>>>>>>>> VKMS will be used more and more in CI for userspace projects, and
>>>>>>>>>> performance actually matters there.
>>>>>>>>>>
>>>>>>>>>> I'm worrying that this performance degradation here is significant. I
>>>>>>>>>> believe it is possible to keep blending with lines, if you add new line
>>>>>>>>>> getters for reading from rotated, sub-sampled etc. images. That way you
>>>>>>>>>> don't have to regress the most common formats' performance.
>>
>> I tested, and yes, it's significant for most of the tests. None of them
>> timed out on my machine, but I agree that I have to improve this. Do you
>> know which tests are the more "heavy"?
>
> I don't, but considering that various userspace projects (e.g. Wayland
> compositors) want to use VKMS more and more in their own CI, looking
> only at IGT is not enough. Every second saved per run is a tiny bit of
> data center energy saved, or developers waiting less for results.
>
> I do have some expectations that for each KMS property, Wayland
> compositors tend to use the "normal" property value more than any other
> value. So if you test different pixel formats, you probably set
> rotation to normal, since it's completely orthogonal in userspace. And
> then you would test different rotations with just one pixel format.
>
> At least I would personally leave it to IGT to test all the possible
> combinations of pixel formats + rotations + odd sizes + odd positions.
> Wayland compositor CI wants to test the compositor internals, not VKMS
> internals.
>
>>>>>>>>> While I understand performance is important and should be taken into
>>>>>>>>> account seriously, I cannot understand how broken testing could be
>>>>>>>>> considered better. Fast but inaccurate will always be significantly
>>>>>>>>> less attractive to my eyes.
>>>>>>>>
>>>>>>>> AFAIK, neither the cover letter nor the commit log claimed it was fixing
>>>>>>>> something broken, just that it was "better" (according to what
>>>>>>>> criteria?).
>>
>> Sorry Maxime for this little missunderstanding, I will improve the commit
>> message and cover letter for the v2.
>>
>>>>>>> Today's RGB implementation is only optimized in the line-by-line case
>>>>>>> when there is no rotation. The logic is bit convoluted and may possibly
>>>>>>> be slightly clarified with a per-format read_line() implementation,
>>>>>>> at a very light performance cost. Such an improvement would definitely
>>>>>>> benefit to the clarity of the code, especially when transformations
>>>>>>> (especially the rotations) come into play because they would be clearly
>>>>>>> handled differently instead of being "hidden" in the optimized logic.
>>>>>>> Performances would not change much as this path is not optimized today
>>>>>>> anyway (the pixel-oriented logic is already used in the rotation case).
>>
>> [...]
>>
>>>>>> I think it would, if I understand what you mean. Ever since I proposed
>>>>>> a line-by-line algorithm to improve the performance, I was thinking of
>>>>>> per-format read_line() functions that would be selected outside of any
>>>>>> loops.
>>
>> [...]
>>
>>>>>> I haven't looked at VKMS in a long time, and I am disappointed to find
>>>>>> that vkms_compose_row() is calling plane->pixel_read() pixel-by-pixel.
>>>>>> The reading vfunc should be called with many pixels at a time when the
>>>>>> source FB layout allows it. The whole point of the line-based functions
>>>>>> was that they repeat the innermost loop in every function body to make
>>>>>> the per-pixel overhead as small as possible. The VKMS implementations
>>>>>> benchmarked before and after the original line-based algorithm showed
>>>>>> that calling a function pointer per-pixel is relatively very expensive.
>>>>>> Or maybe it was a switch-case.
>>
>> [...]
>>
>>>>> But, I agree with Miquel that the rotation logic is easier to implement
>>>>> in a pixel-based way. So going pixel-by-pixel only when rotation occurs
>>>>> would be great.
>>>>
>>>> Yes, and I think that can very well be done in the line-based framework
>>>> still that existed in the old days before any rotation support was
>>>> added. Essentially a plug-in line-getter function that then calls a
>>>> format-specific line-getter pixel-by-pixel while applying the rotation.
>>>> It would be simple, it would leave unrotated performance unharmed (use
>>>> format-specific line-getter directly with lines), but it might be
>>>> somewhat less performant for rotated KMS planes. I suspect that might
>>>> be a good compromise.
>>>>
>>>> Format-specific line-getters could also be parameterized by
>>>> pixel-to-pixel offset in bytes. Then they could directly traverse FB
>>>> rows forward and backward, and even FB columns. It may or may not have
>>>> a penalty compared to the original line-getters, so it would have to
>>>> be benchmarked.
>>>
>>> Oh, actually, since the byte offset depends on format, it might be
>>> better to parametrize by direction and compute the offset in the
>>> format-specific line-getter before the loop.
>>>
>>
>> I'm currently working on this implementation. The algorithm would look
>> like:
>>
>> void blend(...) {
>> for(int y = 0; y < height; y++) {
>> for(int plane = 0; plane < nb_planes; plane++) {
>> if(planes[plane].read_line && planes[plane].rotation == DRM_ROTATION_0) {
>
> I would try to drop the rotation check here completely. Instead, when
> choosing the function pointer to call here, outside of *all* loops, you
> would check the rotation property. If rotation is a no-op, pick the
> read_line function directly. If rotation/reflection is needed, pick a
> rotation function that will then call read_line function pixel-by-pixel.
>
> So planes[plane] would have two vfuncs, one with a plain read_line that
> assumes normal orientation and can return a line of arbitrary length
> from arbitrary x,y position, and another vfunc that this loop here will
> call which is either some rotation handling function or just the same
> function as the first vfunc.
>
> The two function pointers might well need different signatures, meaning
> you need a simple wrapper for the rotation=normal case too.
>
> I believe that could result in cleaner code.
>
>> [...] /* Small common logic to manage REFLECT_X/Y and translations */
>> planes[plane].read_line(....);
>> } else {
>> [...] /* v1 of my patch, pixel by pixel read */
>> }
>> }
>> }
>> }
>>
>> where read_line is:
>> void read_line(frame_info *src, int y, int x_start, int x_stop, pixel_argb16 *dts[])
>> - y is the line to read (so the caller need to compute the correct offset)
>> - x_start/x_stop are the start and stop column, but they may be not
>> ordered properly (i.e DRM_REFLECT_X will make x_start greater than
>> x_stop)
>> - src/dst are source and destination buffers
>
> This sounds ok. An alternative would be something like
>
> enum direction {
> RIGHT,
> LEFT,
> UP,
> DOWN,
> };
>
> void read_line(frame_info *src, int start_x, int start_y, enum direction dir,
> int count_pixels, pixel_argb16 *dst);
>
> Based on dir, before the inner loop this function would compute the
> byte offset between the pixels to be read. If the format is multiplanar
> YUV, it can compute the offset per plane. And the starting pointers per
> pixel plane, of course, and one end pointer for the loop stop condition
> maybe from dst.
>
> This might make all the other directions than RIGHT much faster than
> calling read_line one pixel at a time to achieve the same.
>
> Would need to benchmark if this is significantly slower than your
> suggestion for dir=RIGHT, though. If it's roughly the same, then it
> would probably be worth it.
>
>
>> This way:
>> - It's simple to read for the general case (usage of drm_rect_* instead of
>> manually rewriting the logic)
>> - Each pixel format can be quickly implemented with "pixel-by-pixel"
>> methods
>> - The performances should be good if no rotation is implied for some
>> formats
>>
>> I also created some helpers for conversions between formats to avoid code
>> duplication between pixel and line algorithms (and also between argb and
>> xrgb variants).
>>
>> The only flaw with this, is that most of the read_line functions will
>> look like:
>>
>> void read_line(...) {
>> int increment = x_start < x_stop ? 1: -1;
>> while(x_start != x_stop) {
>> out += 1;
>> [...] /* color conversion */
>> x_start += increment;
>> }
>> }
>>
>> But as Pekka explained, it's probably the most efficient way to do it.
>
> Yes, I expect them to look roughly like that. It's necessary for moving
> as much of the setup computations and real function calls out of the
> inner-most loop as possible. The middle (over KMS planes) and outer
> (over y) loops are less sensitive to wasted cycles on redundant
> computations.
>
>> Is there a way to save the output of vkms to a folder (something like
>> "one image per frame")? It's a bit complex to debug graphics without
>> seeing them.
>>
>> I have something which (I think) should work, but some tests are failing
>> and I don't find why in my code (I don't find the reason why the they are
>> failing and the hexdump I added to debug seems normal).
>>
>> I think my issue is a simple mistake in the "translation managment" or
>> maybe just a wrong offset, but I don't see my error in the code. I think a
>> quick look on the final image would help me a lot.
>
> I don't know anything about the kernel unit testing frameworks, maybe
> they could help?
>
> But if you drive the test through UAPI from userspace, use a writeback
> connector to fetch the resulting image. I'm sure IGT has code doing
> that somewhere, although many IGT tests rely on CRC instead because CRC
> is more widely available in hardware.
>
> Arthur's new benchmark seems to be using writeback, you just need to
> make it save to file:
> https://lore.kernel.org/all/[email protected]/

Hi,

I just found that the IGT's test kms_writeback has a flag '--dump' that
does that, maybe you can use it or copy the logic to the benchmark.

Best Regards,
~Arthur Grillo

>
>
> Thanks,
> pq