2024-02-23 11:38:31

by Louis Chauvet

Subject: [PATCH v2 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading

This patchset is the second version of [1]. It is almost a complete
rewrite to use a line-by-line algorithm for the composition.
It can be divided into three parts:
- PATCH 1 to 4: no functional change is intended, only some formatting and
documentation
(PATCH 2 is taken from [2])
- PATCH 5: main patch for this series, it reintroduces the
line-by-line algorithm
- PATCH 6 to 9: taken from Arthur's series [2], sometimes adapted
to use the line-by-line algorithm.

PATCH 5 aims to restore the line-by-line pixel reading algorithm. It
was introduced in 8ba1648567e2 ("drm: vkms: Refactor the plane composer to
accept new formats") but removed in 322d716a3e8a ("drm/vkms: isolate pixel
conversion functionality") in an over-simplification effort.
At that time, nobody noticed the performance impact of this commit. After
the first iteration of my series, people noticed a performance impact, and
it was indeed the case. Pekka suggested reimplementing the line-by-line
algorithm.

Experiments on my side showed a great improvement with the line-by-line
algorithm, and the performance is the same as with the original
line-by-line algorithm. I focused my effort on making the code work for all
the rotations and translations. Using the drm_rect_* helpers avoids
reimplementing existing logic.

The only "complex" part remaining is the clipping of the coordinates to
avoid reading/writing outside of src/dst. Thus I added a lot of comments
to help anyone who wants to add features (framebuffer resizing, for
example).
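
As an illustration of the clipping described above, here is a minimal
sketch in plain C. It is not code from the series: struct span and
clip_span() are hypothetical names, and only the horizontal case is shown.
It clamps the span of pixels to read on one line so that it stays inside a
buffer of a given width.

```c
/* Hypothetical type: a run of pixels to read on one line. */
struct span {
	int start; /* index of the first pixel to read */
	int count; /* number of pixels to read, 0 if nothing is visible */
};

/*
 * Clamp the span [start, start + count) to [0, limit), where limit stands
 * for the width of the source buffer (an assumption of this sketch).
 */
static struct span clip_span(int start, int count, int limit)
{
	struct span s = { start, count };

	/* Drop the pixels located before the beginning of the buffer. */
	if (s.start < 0) {
		s.count += s.start;
		s.start = 0;
	}
	/* Drop the pixels located after the end of the buffer. */
	if (s.start + s.count > limit)
		s.count = limit - s.start;
	/* A span entirely outside of the buffer contains no pixel. */
	if (s.count < 0)
		s.count = 0;
	return s;
}
```

For instance, a 10-pixel span starting at -3 in an 8-pixel-wide buffer is
reduced to 7 pixels starting at 0.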

The YUV part is not mandatory for this series, but as my first effort was
to help the integration of YUV, I decided to rebase Arthur's series on
mine to help. I took [3], [4], [5] and [6] and adapted them to use the
line-by-line reading. If I did something wrong here, please let me
know.

My series was mainly tested with:
- kms_plane (for color conversions)
- kms_rotation_crc (for rotations of planes)
- kms_cursor_crc (for translations)
The benchmark used to measure the improvement was run with:
- kms_fb_stress

[1]: https://lore.kernel.org/r/[email protected]
[2]: https://lore.kernel.org/all/[email protected]/
[3]: https://lore.kernel.org/all/[email protected]/
[4]: https://lore.kernel.org/all/[email protected]/
[5]: https://lore.kernel.org/all/[email protected]/
[6]: https://lore.kernel.org/all/[email protected]/

To: Rodrigo Siqueira <[email protected]>
To: Melissa Wen <[email protected]>
To: Maíra Canal <[email protected]>
To: Haneen Mohammed <[email protected]>
To: Daniel Vetter <[email protected]>
To: Maarten Lankhorst <[email protected]>
To: Maxime Ripard <[email protected]>
To: Thomas Zimmermann <[email protected]>
To: David Airlie <[email protected]>
To: [email protected]
To: Jonathan Corbet <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Louis Chauvet <[email protected]>

Note: after my changes, these tests seem to pass, so [7] may need
updating (I did not check, it was maybe already the case):
- kms_cursor_legacy@flip-vs-cursor-atomic
- kms_pipe_crc_basic@nonblocking-crc
- kms_pipe_crc_basic@nonblocking-crc-frame-sequence
- kms_writeback@writeback-pixel-formats
- kms_writeback@writeback-invalid-parameters
- kms_flip@flip-vs-absolute-wf_vblank-interruptible
And these tests pass; I did not investigate why the runners fail:
- kms_flip@flip-vs-expired-vblank-interruptible
- kms_flip@flip-vs-expired-vblank
- kms_flip@plain-flip-fb-recreate
- kms_flip@plain-flip-fb-recreate-interruptible
- kms_flip@plain-flip-ts-check-interruptible
- kms_cursor_legacy@cursorA-vs-flipA-toggle
- kms_pipe_crc_basic@nonblocking-crc
- kms_prop_blob@invalid-get-prop
- kms_flip@flip-vs-absolute-wf_vblank-interruptible
- kms_invalid_mode@zero-hdisplay
- kms_invalid_mode@bad-vtotal
- kms_cursor_crc.* (everything is SUCCEED or SKIP, but no fails)

[7]: https://lore.kernel.org/all/[email protected]/

Changes in v2:
- Rebased the series on top of drm-misc/drm-misc-next
- Extract the typedef for pixel_read/pixel_write
- Introduce the line-by-line algorithm per pixel format
- Add some documentation for existing and new code
- Port the series [1] to use line-by-line algorithm
- Link to v1: https://lore.kernel.org/r/[email protected]

---
Arthur Grillo (5):
drm/vkms: Use drm_frame directly
drm/vkms: Add YUV support
drm/vkms: Add range and encoding properties to pixel_read function
drm/vkms: Drop YUV formats TODO
drm/vkms: Create KUnit tests for YUV conversions

Louis Chauvet (4):
drm/vkms: Code formatting
drm/vkms: write/update the documentation for pixel conversion and pixel write functions
drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions
drm/vkms: Re-introduce line-per-line composition algorithm

Documentation/gpu/vkms.rst | 3 +-
drivers/gpu/drm/vkms/Makefile | 1 +
drivers/gpu/drm/vkms/tests/.kunitconfig | 4 +
drivers/gpu/drm/vkms/tests/Makefile | 3 +
drivers/gpu/drm/vkms/tests/vkms_format_test.c | 155 +++++++
drivers/gpu/drm/vkms/vkms_composer.c | 233 ++++++++---
drivers/gpu/drm/vkms/vkms_crtc.c | 6 +-
drivers/gpu/drm/vkms/vkms_drv.c | 3 +-
drivers/gpu/drm/vkms/vkms_drv.h | 56 ++-
drivers/gpu/drm/vkms/vkms_formats.c | 565 +++++++++++++++++++++-----
drivers/gpu/drm/vkms/vkms_formats.h | 13 +-
drivers/gpu/drm/vkms/vkms_plane.c | 50 ++-
drivers/gpu/drm/vkms/vkms_writeback.c | 14 +-
13 files changed, 916 insertions(+), 190 deletions(-)
---
base-commit: aa1267e673fe5307cf00d02add4017d2878598b6
change-id: 20240201-yuv-1337d90d9576

Best regards,
--
Louis Chauvet <[email protected]>



2024-02-23 11:38:34

by Louis Chauvet

Subject: [PATCH v2 2/9] drm/vkms: Use drm_frame directly

From: Arthur Grillo <[email protected]>

Remove intermediary variables and access the variables directly from
drm_frame. These changes should be a no-op.

Signed-off-by: Arthur Grillo <[email protected]>
Signed-off-by: Louis Chauvet <[email protected]>
---
drivers/gpu/drm/vkms/vkms_drv.h | 3 ---
drivers/gpu/drm/vkms/vkms_formats.c | 12 +++++++-----
drivers/gpu/drm/vkms/vkms_plane.c | 3 ---
drivers/gpu/drm/vkms/vkms_writeback.c | 5 -----
4 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 8f5710debb1e..b4b357447292 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -31,9 +31,6 @@ struct vkms_frame_info {
struct drm_rect rotated;
struct iosys_map map[DRM_FORMAT_MAX_PLANES];
unsigned int rotation;
- unsigned int offset;
- unsigned int pitch;
- unsigned int cpp;
};

struct pixel_argb_u16 {
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 36046b12f296..172830a3936a 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -11,8 +11,10 @@

static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
{
- return frame_info->offset + (y * frame_info->pitch)
- + (x * frame_info->cpp);
+ struct drm_framebuffer *fb = frame_info->fb;
+
+ return fb->offsets[0] + (y * fb->pitches[0])
+ + (x * fb->format->cpp[0]);
}

/*
@@ -131,12 +133,12 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
u8 *src_pixels = get_packed_src_addr(frame_info, y);
int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);

- for (size_t x = 0; x < limit; x++, src_pixels += frame_info->cpp) {
+ for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
int x_pos = get_x_position(frame_info, limit, x);

if (drm_rotation_90_or_270(frame_info->rotation))
src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
- + frame_info->cpp * y;
+ + frame_info->fb->format->cpp[0] * y;

plane->pixel_read(src_pixels, &out_pixels[x_pos]);
}
@@ -223,7 +225,7 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels);

- for (size_t x = 0; x < x_limit; x++, dst_pixels += frame_info->cpp)
+ for (size_t x = 0; x < x_limit; x++, dst_pixels += frame_info->fb->format->cpp[0])
wb->pixel_write(dst_pixels, &in_pixels[x]);
}

diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 90c09046e0af..d5203f531d96 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -124,9 +124,6 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
drm_rect_height(&frame_info->rotated), frame_info->rotation);

- frame_info->offset = fb->offsets[0];
- frame_info->pitch = fb->pitches[0];
- frame_info->cpp = fb->format->cpp[0];
vkms_plane_state->pixel_read = get_pixel_conversion_function(fmt);
}

diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index bc724cbd5e3a..c8582df1f739 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -149,11 +149,6 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
crtc_state->active_writeback = active_wb;
crtc_state->wb_pending = true;
spin_unlock_irq(&output->composer_lock);
-
- wb_frame_info->offset = fb->offsets[0];
- wb_frame_info->pitch = fb->pitches[0];
- wb_frame_info->cpp = fb->format->cpp[0];
-
drm_writeback_queue_job(wb_conn, connector_state);
active_wb->pixel_write = get_pixel_write_function(wb_format);
drm_rect_init(&wb_frame_info->src, 0, 0, crtc_width, crtc_height);

--
2.43.0


2024-02-23 11:38:56

by Louis Chauvet

Subject: [PATCH v2 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions

Add some documentation on pixel conversion functions.
Update outdated comments for pixel_write functions.

Signed-off-by: Louis Chauvet <[email protected]>
---
drivers/gpu/drm/vkms/vkms_composer.c | 4 +++
drivers/gpu/drm/vkms/vkms_drv.h | 13 ++++++++
drivers/gpu/drm/vkms/vkms_formats.c | 58 ++++++++++++++++++++++++++++++------
3 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index c6d9b4a65809..5b341222d239 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -189,6 +189,10 @@ static void blend(struct vkms_writeback_job *wb,

size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;

+ /*
+ * The planes are composed line-by-line. It is a necessary complexity to avoid poor
+ * blending performance.
+ */
for (size_t y = 0; y < crtc_y_limit; y++) {
fill_background(&background_color, output_buffer);

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index b4b357447292..18086423a3a7 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -25,6 +25,17 @@

#define VKMS_LUT_SIZE 256

+/**
+ * struct vkms_frame_info - structure to store the state of a frame
+ *
+ * @fb: backing drm framebuffer
+ * @src: source rectangle of this frame in the source framebuffer
+ * @dst: destination rectangle in the crtc buffer
+ * @map: see drm_shadow_plane_state@data
+ * @rotation: rotation applied to the source.
+ *
+ * @src and @dst should have the same size modulo the rotation.
+ */
struct vkms_frame_info {
struct drm_framebuffer *fb;
struct drm_rect src, dst;
@@ -52,6 +63,8 @@ struct vkms_writeback_job {
* vkms_plane_state - Driver specific plane state
* @base: base plane state
* @frame_info: data required for composing computation
+ * @pixel_read: function to read a pixel in this plane. The creator of a vkms_plane_state must
+ * ensure that this pointer is valid
*/
struct vkms_plane_state {
struct drm_shadow_plane_state base;
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 172830a3936a..cb7a49b7c8e7 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -9,6 +9,17 @@

#include "vkms_formats.h"

+/**
+ * pixel_offset() - Get the offset of the block containing the pixel at coordinates x/y
+ * in the first plane
+ *
+ * @frame_info: Buffer metadata
+ * @x: The x coordinate of the wanted pixel in the buffer
+ * @y: The y coordinate of the wanted pixel in the buffer
+ *
+ * The caller must be aware that this offset is not always a pointer to a pixel. If individual
+ * pixel values are needed, they have to be extracted from the resulting block.
+ */
static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
{
struct drm_framebuffer *fb = frame_info->fb;
@@ -17,12 +28,13 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
+ (x * fb->format->cpp[0]);
}

-/*
- * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
+/**
+ * packed_pixels_addr() - Get the pointer to the block containing the pixel at the given
+ * coordinates
*
* @frame_info: Buffer metadata
- * @x: The x(width) coordinate of the 2D buffer
- * @y: The y(Heigth) coordinate of the 2D buffer
+ * @x: The x(width) coordinate inside the plane
+ * @y: The y(height) coordinate inside the plane
*
* Takes the information stored in the frame_info, a pair of coordinates, and
* returns the address of the first color channel.
@@ -53,6 +65,13 @@ static int get_x_position(const struct vkms_frame_info *frame_info, int limit, i
return x;
}

+/*
+ * The following functions take pixel data from the buffer and convert them to the format
+ * ARGB16161616 in out_pixel.
+ *
+ * They are used in the `vkms_compose_row` function to handle multiple formats.
+ */
+
static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
{
/*
@@ -145,12 +164,11 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
}

/*
- * The following functions take an line of argb_u16 pixels from the
- * src_buffer, convert them to a specific format, and store them in the
- * destination.
+ * The following functions take one argb_u16 pixel and convert it to a specific format. The
+ * result is stored in @dst_pixels.
*
- * They are used in the `compose_active_planes` to convert and store a line
- * from the src_buffer to the writeback buffer.
+ * They are used in the `vkms_writeback_row` to convert and store a pixel from the src_buffer to
+ * the writeback buffer.
*/
static void argb_u16_to_ARGB8888(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel)
{
@@ -216,6 +234,14 @@ static void argb_u16_to_RGB565(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel)
*pixels = cpu_to_le16(r << 11 | g << 5 | b);
}

+/**
+ * Generic loop for all supported writeback format. It is executed just after the blending to
+ * write a line in the writeback buffer.
+ *
+ * @wb: Job where to insert the final image
+ * @src_buffer: Line to write
+ * @y: Row to write in the writeback buffer
+ */
void vkms_writeback_row(struct vkms_writeback_job *wb,
const struct line_buffer *src_buffer, int y)
{
@@ -229,6 +255,13 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
wb->pixel_write(dst_pixels, &in_pixels[x]);
}

+/**
+ * Retrieve the correct read_pixel function for a specific format.
+ * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
+ * pointer is valid before using it in a vkms_plane_state.
+ *
+ * @format: 4cc of the format
+ */
void *get_pixel_conversion_function(u32 format)
{
switch (format) {
@@ -247,6 +280,13 @@ void *get_pixel_conversion_function(u32 format)
}
}

+/**
+ * Retrieve the correct write_pixel function for a specific format.
+ * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
+ * pointer is valid before using it in a vkms_writeback_job.
+ *
+ * @format: 4cc of the format
+ */
void *get_pixel_write_function(u32 format)
{
switch (format) {

--
2.43.0


2024-02-23 11:39:03

by Louis Chauvet

Subject: [PATCH v2 4/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions

Introduce two typedefs: pixel_read_t and pixel_write_t. They allow the
compiler to check that the passed functions take the correct arguments.
Such typedefs will help ensure consistency across the code base if
these prototypes are updated.

Introduce a check around the get_pixel_*_functions to avoid using a
NULL pointer as a function.

Add documentation for those typedefs.
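
To illustrate the idea, here is a minimal user-space sketch (not the driver
code). The enum sketch_format values and read_one_pixel() are hypothetical,
and the kernel types are replaced with <stdint.h> equivalents. It shows a
typed getter that returns NULL for an unsupported format and a caller that
checks the pointer before using it.

```c
#include <stddef.h>
#include <stdint.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Typed callback: the compiler rejects functions with another prototype. */
typedef void (*pixel_read_t)(const uint8_t *src_pixels,
			     struct pixel_argb_u16 *out_pixel);

/* Hypothetical format identifiers, standing in for the DRM fourcc codes. */
enum sketch_format { SKETCH_FMT_XRGB8888, SKETCH_FMT_UNKNOWN };

/* XRGB8888 is stored as B, G, R, X in memory; 8 bits are scaled to 16. */
static void xrgb8888_to_argb_u16(const uint8_t *src_pixels,
				 struct pixel_argb_u16 *out_pixel)
{
	out_pixel->a = 0xffff;
	out_pixel->r = (uint16_t)(src_pixels[2] * 257);
	out_pixel->g = (uint16_t)(src_pixels[1] * 257);
	out_pixel->b = (uint16_t)(src_pixels[0] * 257);
}

/* Returns NULL for unsupported formats, the caller must check it. */
static pixel_read_t get_pixel_read_function(enum sketch_format format)
{
	switch (format) {
	case SKETCH_FMT_XRGB8888:
		return xrgb8888_to_argb_u16;
	default:
		return NULL;
	}
}

/* Returns 0 on success, -1 if the format is not supported. */
static int read_one_pixel(enum sketch_format format, const uint8_t *src,
			  struct pixel_argb_u16 *out)
{
	pixel_read_t pixel_read = get_pixel_read_function(format);

	if (!pixel_read)
		return -1;

	pixel_read(src, out);
	return 0;
}
```

With a `void *` return type, assigning a function with the wrong prototype
would compile silently; with the typedef it is a compiler diagnostic.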

Signed-off-by: Louis Chauvet <[email protected]>
---
drivers/gpu/drm/vkms/vkms_drv.h | 23 +++++++++++++++++++++--
drivers/gpu/drm/vkms/vkms_formats.c | 8 ++++----
drivers/gpu/drm/vkms/vkms_formats.h | 4 ++--
drivers/gpu/drm/vkms/vkms_plane.c | 9 ++++++++-
drivers/gpu/drm/vkms/vkms_writeback.c | 9 ++++++++-
5 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 18086423a3a7..886c885c8cf5 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -53,12 +53,31 @@ struct line_buffer {
struct pixel_argb_u16 *pixels;
};

+/**
+ * typedef pixel_write_t - These functions are used to read a pixel from a
+ * `struct pixel_argb_u16*`, convert it to a specific format and write it in the @dst_pixels
+ * buffer.
+ *
+ * @dst_pixels: destination address to write the pixel
+ * @in_pixel: pixel to write
+ */
+typedef void (*pixel_write_t)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel);
+
struct vkms_writeback_job {
struct iosys_map data[DRM_FORMAT_MAX_PLANES];
struct vkms_frame_info wb_frame_info;
- void (*pixel_write)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel);
+ pixel_write_t pixel_write;
};

+/**
+ * typedef pixel_read_t - These functions are used to read a pixel in the source frame,
+ * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
+ *
+ * @src_pixels: Pointer to the pixel to read
+ * @out_pixel: Pointer to write the converted pixel
+ */
+typedef void (*pixel_read_t)(u8 *src_pixels, struct pixel_argb_u16 *out_pixel);
+
/**
* vkms_plane_state - Driver specific plane state
* @base: base plane state
@@ -69,7 +88,7 @@ struct vkms_writeback_job {
struct vkms_plane_state {
struct drm_shadow_plane_state base;
struct vkms_frame_info *frame_info;
- void (*pixel_read)(u8 *src_buffer, struct pixel_argb_u16 *out_pixel);
+ pixel_read_t pixel_read;
};

struct vkms_plane {
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index cb7a49b7c8e7..1f5aeba57ad6 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -262,7 +262,7 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
*
* @format: 4cc of the format
*/
-void *get_pixel_conversion_function(u32 format)
+pixel_read_t get_pixel_read_function(u32 format)
{
switch (format) {
case DRM_FORMAT_ARGB8888:
@@ -276,7 +276,7 @@ void *get_pixel_conversion_function(u32 format)
case DRM_FORMAT_RGB565:
return &RGB565_to_argb_u16;
default:
- return NULL;
+ return (pixel_read_t)NULL;
}
}

@@ -287,7 +287,7 @@ void *get_pixel_conversion_function(u32 format)
*
* @format: 4cc of the format
*/
-void *get_pixel_write_function(u32 format)
+pixel_write_t get_pixel_write_function(u32 format)
{
switch (format) {
case DRM_FORMAT_ARGB8888:
@@ -301,6 +301,6 @@ void *get_pixel_write_function(u32 format)
case DRM_FORMAT_RGB565:
return &argb_u16_to_RGB565;
default:
- return NULL;
+ return (pixel_write_t)NULL;
}
}
diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
index cf59c2ed8e9a..3ecea4563254 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.h
+++ b/drivers/gpu/drm/vkms/vkms_formats.h
@@ -5,8 +5,8 @@

#include "vkms_drv.h"

-void *get_pixel_conversion_function(u32 format);
+pixel_read_t get_pixel_read_function(u32 format);

-void *get_pixel_write_function(u32 format);
+pixel_write_t get_pixel_write_function(u32 format);

#endif /* _VKMS_FORMATS_H_ */
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index d5203f531d96..f68b1b03d632 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -106,6 +106,13 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
return;

fmt = fb->format->format;
+ pixel_read_t pixel_read = get_pixel_read_function(fmt);
+
+ if (!pixel_read) {
+ DRM_WARN("Pixel format is not supported by VKMS planes. State is unchanged\n");
+ return;
+ }
+
vkms_plane_state = to_vkms_plane_state(new_state);
shadow_plane_state = &vkms_plane_state->base;

@@ -124,7 +131,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
drm_rect_height(&frame_info->rotated), frame_info->rotation);

- vkms_plane_state->pixel_read = get_pixel_conversion_function(fmt);
+ vkms_plane_state->pixel_read = pixel_read;
}

static int vkms_plane_atomic_check(struct drm_plane *plane,
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index c8582df1f739..c92b9f06c4a4 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -140,6 +140,13 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
if (!conn_state)
return;

+ pixel_write_t pixel_write = get_pixel_write_function(wb_format);
+
+ if (!pixel_write) {
+ DRM_WARN("Pixel format is not supported by VKMS writeback. State is unchanged\n");
+ return;
+ }
+
vkms_set_composer(&vkmsdev->output, true);

active_wb = conn_state->writeback_job->priv;
@@ -150,7 +157,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
crtc_state->wb_pending = true;
spin_unlock_irq(&output->composer_lock);
drm_writeback_queue_job(wb_conn, connector_state);
- active_wb->pixel_write = get_pixel_write_function(wb_format);
+ active_wb->pixel_write = pixel_write;
drm_rect_init(&wb_frame_info->src, 0, 0, crtc_width, crtc_height);
drm_rect_init(&wb_frame_info->dst, 0, 0, crtc_width, crtc_height);
}

--
2.43.0


2024-02-23 11:39:39

by Louis Chauvet

Subject: [PATCH v2 8/9] drm/vkms: Drop YUV formats TODO

From: Arthur Grillo <[email protected]>

VKMS has support for YUV formats now. Remove the task from the TODO
list.

Signed-off-by: Arthur Grillo <[email protected]>
Signed-off-by: Louis Chauvet <[email protected]>
---
Documentation/gpu/vkms.rst | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
index ba04ac7c2167..13b866c3617c 100644
--- a/Documentation/gpu/vkms.rst
+++ b/Documentation/gpu/vkms.rst
@@ -122,8 +122,7 @@ There's lots of plane features we could add support for:

- Scaling.

-- Additional buffer formats, especially YUV formats for video like NV12.
- Low/high bpp RGB formats would also be interesting.
+- Additional buffer formats. Low/high bpp RGB formats would be interesting.

- Async updates (currently only possible on cursor plane using the legacy
cursor api).

--
2.43.0


2024-02-23 11:39:56

by Louis Chauvet

Subject: [PATCH v2 7/9] drm/vkms: Add range and encoding properties to pixel_read function

From: Arthur Grillo <[email protected]>

Create range and encoding properties. This should be a no-op, as none of
the conversion functions need those properties.

Signed-off-by: Arthur Grillo <[email protected]>
[Louis Chauvet: retained only relevant parts]
Signed-off-by: Louis Chauvet <[email protected]>
---
drivers/gpu/drm/vkms/vkms_plane.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 427ca67c60ce..95dfde297377 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -228,5 +228,14 @@ struct vkms_plane *vkms_plane_init(struct vkms_device *vkmsdev,
drm_plane_create_rotation_property(&plane->base, DRM_MODE_ROTATE_0,
DRM_MODE_ROTATE_MASK | DRM_MODE_REFLECT_MASK);

+ drm_plane_create_color_properties(&plane->base,
+ BIT(DRM_COLOR_YCBCR_BT601) |
+ BIT(DRM_COLOR_YCBCR_BT709) |
+ BIT(DRM_COLOR_YCBCR_BT2020),
+ BIT(DRM_COLOR_YCBCR_LIMITED_RANGE) |
+ BIT(DRM_COLOR_YCBCR_FULL_RANGE),
+ DRM_COLOR_YCBCR_BT601,
+ DRM_COLOR_YCBCR_FULL_RANGE);
+
return plane;
}

--
2.43.0


2024-02-23 11:39:57

by Louis Chauvet

Subject: [PATCH v2 5/9] drm/vkms: Re-introduce line-per-line composition algorithm

Re-introduce a line-by-line composition algorithm for each pixel format.
This improves performance by not requiring an indirection per pixel
read. This patch focuses on the readability of the code.

Line-by-line composition was introduced by [1] but rewritten back to a
pixel-by-pixel algorithm in [2]. At that time, nobody noticed the impact
on performance, and it was merged.

This patch is almost a revert of [2], but in addition efforts have been
made to increase readability and maintainability of the rotation handling.
The blend function is now divided into two parts:
- Transformation of coordinates from the output coordinate space to the
source coordinate space
- Line conversion and blending

Most of the complexity of the rotation management is avoided by using
drm_rect_* helpers. The remaining complexity is around the clipping, to
avoid reading/writing outside source/destination buffers.

The pixel conversion is now done line-by-line, so the pixel_read_t callback
was replaced with the read_pixel_line_t callback. This way the indirection
is only required once per line and per plane, instead of once per pixel and
per plane.
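
The difference can be sketched in user-space C as follows. This is an
illustration only: read_line_fn, xrgb8888_read_line() and compose_line()
are hypothetical names, not the callbacks of this patch. The per-pixel
helper is a plain inline function, and the only call through a function
pointer happens once for the whole line.

```c
#include <stdint.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Hypothetical per-line callback: converts count pixels in one call. */
typedef void (*read_line_fn)(const uint8_t *src, int count,
			     struct pixel_argb_u16 *out);

/* Direct helper, expected to be inlined in the loop below. */
static inline void xrgb8888_pixel(const uint8_t *src,
				  struct pixel_argb_u16 *out)
{
	out->a = 0xffff;
	out->r = (uint16_t)(src[2] * 257);
	out->g = (uint16_t)(src[1] * 257);
	out->b = (uint16_t)(src[0] * 257);
}

/* The loop over the pixels lives inside the format-specific function. */
static void xrgb8888_read_line(const uint8_t *src, int count,
			       struct pixel_argb_u16 *out)
{
	for (int i = 0; i < count; i++, src += 4)
		xrgb8888_pixel(src, &out[i]);
}

/* One indirect call per line, instead of one per pixel. */
static void compose_line(read_line_fn read_line, const uint8_t *src,
			 int count, struct pixel_argb_u16 *out)
{
	read_line(src, count, out);
}
```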

The read_line_t callbacks are very similar for most pixel formats, but this
duplication is required to avoid a performance impact. Some helpers were
created to avoid code repetition:
- get_step_1x1: get the step in bytes to reach the next pixel block in a
certain direction
- *_to_argb_u16: helpers to perform color conversion. They should be
inlined by the compiler, and they are used to avoid repetition between
multiple variants of the same format (argb/xrgb and maybe in the
future for formats like bgr formats).
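
As a rough illustration of the step idea (the real get_step_1x1 signature
is not shown here, so get_step() and read_line_u8() below are assumptions),
the byte step to the next pixel depends only on the reading direction, the
pitch and the number of bytes per pixel:

```c
#include <stddef.h>
#include <stdint.h>

enum pixel_read_direction { READ_UP, READ_DOWN, READ_LEFT, READ_RIGHT };

/*
 * Hypothetical helper: byte step to reach the next pixel when reading in
 * the given direction, for a format with one pixel per block.
 */
static int get_step(enum pixel_read_direction direction, int pitch, int cpp)
{
	switch (direction) {
	case READ_RIGHT:
		return cpp;
	case READ_LEFT:
		return -cpp;
	case READ_DOWN:
		return pitch;
	case READ_UP:
		return -pitch;
	}
	return 0;
}

/*
 * Read count one-byte pixels starting at byte offset in base, moving by
 * step bytes between two pixels. The caller must ensure that every access
 * stays inside the buffer.
 */
static void read_line_u8(const uint8_t *base, ptrdiff_t offset, int step,
			 int count, uint8_t *out)
{
	for (int i = 0; i < count; i++)
		out[i] = base[offset + (ptrdiff_t)i * step];
}
```

The same loop then handles every rotation: only the starting offset and
the step change.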

This new algorithm was tested with:
- kms_plane (for color conversions)
- kms_rotation_crc (for rotations of planes)
- kms_cursor_crc (for translations of planes)
The performance gain was measured with:
- kms_fb_stress

[1]: commit 8ba1648567e2 ("drm: vkms: Refactor the plane composer to accept
new formats")
https://lore.kernel.org/all/[email protected]/
[2]: commit 322d716a3e8a ("drm/vkms: isolate pixel conversion
functionality")
https://lore.kernel.org/all/[email protected]/

Signed-off-by: Louis Chauvet <[email protected]>
---
drivers/gpu/drm/vkms/vkms_composer.c | 219 +++++++++++++++++++++++-------
drivers/gpu/drm/vkms/vkms_drv.h | 25 +++-
drivers/gpu/drm/vkms/vkms_formats.c | 253 ++++++++++++++++++++++-------------
drivers/gpu/drm/vkms/vkms_formats.h | 2 +-
drivers/gpu/drm/vkms/vkms_plane.c | 8 +-
5 files changed, 350 insertions(+), 157 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 5b341222d239..e555bf9c1aee 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -24,9 +24,10 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)

/**
* pre_mul_alpha_blend - alpha blending equation
- * @frame_info: Source framebuffer's metadata
* @stage_buffer: The line with the pixels from src_plane
* @output_buffer: A line buffer that receives all the blends output
+ * @x_start: The start offset to avoid useless copy
+ * @pixel_count: The number of pixels to blend
*
* Using the information from the `frame_info`, this blends only the
* necessary pixels from the `stage_buffer` to the `output_buffer`
@@ -37,51 +38,23 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
* drm_plane_create_blend_mode_property(). Also, this formula assumes a
* completely opaque background.
*/
-static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
- struct line_buffer *stage_buffer,
- struct line_buffer *output_buffer)
+static void pre_mul_alpha_blend(
+ struct line_buffer *stage_buffer,
+ struct line_buffer *output_buffer,
+ int x_start,
+ int pixel_count)
{
- int x_dst = frame_info->dst.x1;
- struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
- struct pixel_argb_u16 *in = stage_buffer->pixels;
- int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
- stage_buffer->n_pixels);
-
- for (int x = 0; x < x_limit; x++) {
- out[x].a = (u16)0xffff;
- out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
- out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
- out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
+ struct pixel_argb_u16 *out = &output_buffer->pixels[x_start];
+ struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start];
+
+ for (int i = 0; i < pixel_count; i++) {
+ out[i].a = (u16)0xffff;
+ out[i].r = pre_mul_blend_channel(in[i].r, out[i].r, in[i].a);
+ out[i].g = pre_mul_blend_channel(in[i].g, out[i].g, in[i].a);
+ out[i].b = pre_mul_blend_channel(in[i].b, out[i].b, in[i].a);
}
}

-static int get_y_pos(struct vkms_frame_info *frame_info, int y)
-{
- if (frame_info->rotation & DRM_MODE_REFLECT_Y)
- return drm_rect_height(&frame_info->rotated) - y - 1;
-
- switch (frame_info->rotation & DRM_MODE_ROTATE_MASK) {
- case DRM_MODE_ROTATE_90:
- return frame_info->rotated.x2 - y - 1;
- case DRM_MODE_ROTATE_270:
- return y + frame_info->rotated.x1;
- default:
- return y;
- }
-}
-
-static bool check_limit(struct vkms_frame_info *frame_info, int pos)
-{
- if (drm_rotation_90_or_270(frame_info->rotation)) {
- if (pos >= 0 && pos < drm_rect_width(&frame_info->rotated))
- return true;
- } else {
- if (pos >= frame_info->rotated.y1 && pos < frame_info->rotated.y2)
- return true;
- }
-
- return false;
-}

static void fill_background(const struct pixel_argb_u16 *background_color,
struct line_buffer *output_buffer)
@@ -163,6 +136,37 @@ static void apply_lut(const struct vkms_crtc_state *crtc_state, struct line_buff
}
}

+/**
+ * direction_for_rotation() - Helper to get the correct reading direction for a specific rotation
+ *
+ * @rotation: rotation to analyze
+ */
+enum pixel_read_direction direction_for_rotation(unsigned int rotation)
+{
+ if (rotation & DRM_MODE_ROTATE_0) {
+ if (rotation & DRM_MODE_REFLECT_X)
+ return READ_LEFT;
+ else
+ return READ_RIGHT;
+ } else if (rotation & DRM_MODE_ROTATE_90) {
+ if (rotation & DRM_MODE_REFLECT_Y)
+ return READ_UP;
+ else
+ return READ_DOWN;
+ } else if (rotation & DRM_MODE_ROTATE_180) {
+ if (rotation & DRM_MODE_REFLECT_X)
+ return READ_RIGHT;
+ else
+ return READ_LEFT;
+ } else if (rotation & DRM_MODE_ROTATE_270) {
+ if (rotation & DRM_MODE_REFLECT_Y)
+ return READ_DOWN;
+ else
+ return READ_UP;
+ }
+ return READ_RIGHT;
+}
+
/**
* blend - blend the pixels from all planes and compute crc
* @wb: The writeback frame buffer metadata
@@ -183,11 +187,11 @@ static void blend(struct vkms_writeback_job *wb,
{
struct vkms_plane_state **plane = crtc_state->active_planes;
u32 n_active_planes = crtc_state->num_active_planes;
- int y_pos;

const struct pixel_argb_u16 background_color = { .a = 0xffff };

size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
+ size_t crtc_x_limit = crtc_state->base.crtc->mode.hdisplay;

/*
* The planes are composed line-by-line. It is a necessary complexity to avoid poor
@@ -198,22 +202,133 @@ static void blend(struct vkms_writeback_job *wb,

/* The active planes are composed associatively in z-order. */
for (size_t i = 0; i < n_active_planes; i++) {
- y_pos = get_y_pos(plane[i]->frame_info, y);
+ struct vkms_plane_state *current_plane = plane[i];

- if (!check_limit(plane[i]->frame_info, y_pos))
+ /* Avoid rendering useless lines */
+ if (y < current_plane->frame_info->dst.y1 ||
+ y >= current_plane->frame_info->dst.y2) {
continue;
-
- vkms_compose_row(stage_buffer, plane[i], y_pos);
- pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
- output_buffer);
+ }
+
+ /*
+ * src_px is the line to copy. The initial coordinates are inside the
+ * destination framebuffer, and then drm_rect_* helpers are used to
+ * compute the correct position into the source framebuffer.
+ */
+ struct drm_rect src_px = DRM_RECT_INIT(
+ current_plane->frame_info->dst.x1, y,
+ drm_rect_width(&current_plane->frame_info->dst), 1);
+ struct drm_rect tmp_src;
+
+ drm_rect_fp_to_int(&tmp_src, &current_plane->frame_info->src);
+
+ /*
+ * [1]: Clamping src_px to the crtc_x_limit to avoid writing outside of the
+ * destination buffer
+ */
+ src_px.x2 = min_t(int, src_px.x2, (int)crtc_x_limit);
+
+ /*
+ * Transform the x/y coordinates from the crtc into coordinates for the
+ * src buffer.
+ *
+ * - Cancel the offset of the dst buffer.
+ * - Invert the rotation. This assumes that
+ * dst = drm_rect_rotate(src, rotation) (dst and src have the
+ * same size, but can be rotated).
+ * - Apply the offset of the source rectangle to the coordinate.
+ */
+ drm_rect_translate(&src_px, -current_plane->frame_info->dst.x1,
+ -current_plane->frame_info->dst.y1);
+ drm_rect_rotate_inv(&src_px,
+ drm_rect_width(&tmp_src),
+ drm_rect_height(&tmp_src),
+ current_plane->frame_info->rotation);
+ drm_rect_translate(&src_px, tmp_src.x1, tmp_src.y1);
+
+ /* Get the correct reading direction in the source buffer. */
+
+ enum pixel_read_direction direction =
+ direction_for_rotation(current_plane->frame_info->rotation);
+
+ int x_start = src_px.x1;
+ int y_start = src_px.y1;
+ int pixel_count;
+ /* [2]: Compute and clamp the number of pixels to read */
+ if (direction == READ_RIGHT || direction == READ_LEFT) {
+ /*
+ * In horizontal reading, the src_px width is the number of pixels to
+ * read
+ */
+ pixel_count = drm_rect_width(&src_px);
+ if (x_start < 0) {
+ pixel_count += x_start;
+ x_start = 0;
+ }
+ if (x_start + pixel_count > current_plane->frame_info->fb->width) {
+ pixel_count =
+ (int)current_plane->frame_info->fb->width - x_start;
+ }
+ } else {
+ /*
+ * In vertical reading, the src_px height is the number of pixels to
+ * read
+ */
+ pixel_count = drm_rect_height(&src_px);
+ if (y_start < 0) {
+ pixel_count += y_start;
+ y_start = 0;
+ }
+ if (y_start + pixel_count > current_plane->frame_info->fb->height) {
+ pixel_count =
+ (int)current_plane->frame_info->fb->height - y_start;
+ }
+ }
+
+ if (pixel_count <= 0) {
+ /* Nothing to read, so avoid multiple function calls for nothing */
+ continue;
+ }
+
+ /*
+ * Modify the starting point to take into account the rotation
+ *
+ * src_px is the top-left corner, so when reading READ_LEFT or READ_UP, it
+ * must be changed to the top-right/bottom-left corner.
+ */
+ if (direction == READ_LEFT) {
+ // x_start is now the right point
+ x_start += pixel_count - 1;
+ } else if (direction == READ_UP) {
+ // y_start is now the bottom point
+ y_start += pixel_count - 1;
+ }
+
+ /*
+ * Perform the conversion and the blending
+ *
+ * Here we know that the read line (x_start, y_start, pixel_count) is
+ * inside the source buffer [2] and we don't write outside the stage
+ * buffer [1]
+ */
+ current_plane->pixel_read_line(
+ current_plane->frame_info,
+ x_start,
+ y_start,
+ direction,
+ pixel_count,
+ &stage_buffer->pixels[current_plane->frame_info->dst.x1]);
+
+ pre_mul_alpha_blend(stage_buffer, output_buffer,
+ current_plane->frame_info->dst.x1,
+ pixel_count);
}

apply_lut(crtc_state, output_buffer);

*crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
-
if (wb)
- vkms_writeback_row(wb, output_buffer, y_pos);
+ vkms_writeback_row(wb, output_buffer, y);
}
}

@@ -224,7 +339,7 @@ static int check_format_funcs(struct vkms_crtc_state *crtc_state,
u32 n_active_planes = crtc_state->num_active_planes;

for (size_t i = 0; i < n_active_planes; i++)
- if (!planes[i]->pixel_read)
+ if (!planes[i]->pixel_read_line)
return -1;

if (active_wb && !active_wb->pixel_write)
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 886c885c8cf5..ccc5be009f15 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -39,7 +39,6 @@
struct vkms_frame_info {
struct drm_framebuffer *fb;
struct drm_rect src, dst;
- struct drm_rect rotated;
struct iosys_map map[DRM_FORMAT_MAX_PLANES];
unsigned int rotation;
};
@@ -69,14 +68,27 @@ struct vkms_writeback_job {
pixel_write_t pixel_write;
};

+enum pixel_read_direction {
+ READ_UP,
+ READ_DOWN,
+ READ_LEFT,
+ READ_RIGHT
+};
+
/**
- * typedef pixel_read_t - These functions are used to read a pixel in the source frame,
+ * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
* convert it to `struct pixel_argb_u16` and write it to @out_pixel.
*
- * @src_pixels: Pointer to the pixel to read
- * @out_pixel: Pointer to write the converted pixel
+ * @frame_info: Frame used as source for the pixel value
+ * @x_start: X (width) coordinate of the first pixel to read in the source buffer
+ * @y_start: Y (height) coordinate of the first pixel to read in the source buffer
+ * @direction: Direction to follow in the source buffer when reading the line
+ * @count: Number of pixels to read
+ * @out_pixel: Pointer where to write the pixel values. @count pixels will be written,
+ * starting at out_pixel[0].
*/
-typedef void (*pixel_read_t)(u8 *src_pixels, struct pixel_argb_u16 *out_pixel);
+typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
+ pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);

/**
* vkms_plane_state - Driver specific plane state
@@ -88,7 +100,7 @@ typedef void (*pixel_read_t)(u8 *src_pixels, struct pixel_argb_u16 *out_pixel);
struct vkms_plane_state {
struct drm_shadow_plane_state base;
struct vkms_frame_info *frame_info;
- pixel_read_t pixel_read;
+ pixel_read_line_t pixel_read_line;
};

struct vkms_plane {
@@ -193,7 +205,6 @@ int vkms_verify_crc_source(struct drm_crtc *crtc, const char *source_name,
/* Composer Support */
void vkms_composer_worker(struct work_struct *work);
void vkms_set_composer(struct vkms_output *out, bool enabled);
-void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y);
void vkms_writeback_row(struct vkms_writeback_job *wb, const struct line_buffer *src_buffer, int y);

/* Writeback */
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 1f5aeba57ad6..46daea6d3ee9 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -11,21 +11,29 @@

/**
* packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
- * in the first plane
*
* @frame_info: Buffer metadata
* @x: The x coordinate of the wanted pixel in the buffer
* @y: The y coordinate of the wanted pixel in the buffer
+ * @plane_index: The index of the plane to use
*
* The caller must be aware that this offset is not always a pointer to a pixel. If individual
* pixel values are needed, they have to be extracted from the resulting block.
*/
-static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
+static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int x, int y,
+ size_t plane_index)
{
struct drm_framebuffer *fb = frame_info->fb;
-
- return fb->offsets[0] + (y * fb->pitches[0])
- + (x * fb->format->cpp[0]);
+ const struct drm_format_info *format = frame_info->fb->format;
+ /* Directly using x and y to multiply pitches and format->cpp is not sufficient because
+ * in some formats a block can represent multiple pixels.
+ *
+ * Dividing y by the block height and x by the block width gives the offset of the block
+ * containing the pixel.
+ */
+ return fb->offsets[plane_index] +
+ (y / drm_format_info_block_height(format, plane_index)) * fb->pitches[plane_index] +
+ (x / drm_format_info_block_width(format, plane_index)) * format->char_per_block[plane_index];
}

/**
@@ -35,44 +43,56 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
* @frame_info: Buffer metadata
* @x: The x(width) coordinate inside the plane
* @y: The y(height) coordinate inside the plane
+ * @plane_index: The index of the plane
*
- * Takes the information stored in the frame_info, a pair of coordinates, and
- * returns the address of the first color channel.
- * This function assumes the channels are packed together, i.e. a color channel
- * comes immediately after another in the memory. And therefore, this function
- * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
+ * Takes the information stored in the frame_info, a pair of coordinates, and returns the address
+ * of the block containing this pixel.
+ * The caller must be aware that this pointer is sometimes not directly a pixel, it needs some
+ * additional work to extract pixel color from this block.
*/
static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
- int x, int y)
+ int x, int y, size_t plane_index)
{
- size_t offset = pixel_offset(frame_info, x, y);
-
- return (u8 *)frame_info->map[0].vaddr + offset;
+ return (u8 *)frame_info->map[0].vaddr + packed_pixels_offset(frame_info, x, y, plane_index);
}

-static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
+/**
+ * get_step_1x1() - Common helper to compute the correct step value between each pixel to read in a
+ * certain direction.
+ * Use this only with formats where blockh == blockw == 1. If @direction is not a valid
+ * pixel_read_direction, the returned step is 0, so do not rely on it as a loop variant.
+ *
+ * @fb: Framebuffer to iterate on
+ * @direction: Direction of the reading
+ * @plane_index: Index of the plane to read
+ */
+static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction direction,
+ int plane_index)
{
- int x_src = frame_info->src.x1 >> 16;
- int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
-
- return packed_pixels_addr(frame_info, x_src, y_src);
+ switch (direction) {
+ default:
+ DRM_ERROR("Invalid direction for pixel reading: %d\n", direction);
+ return 0;
+ case READ_RIGHT:
+ return fb->format->char_per_block[plane_index];
+ case READ_LEFT:
+ return -fb->format->char_per_block[plane_index];
+ case READ_DOWN:
+ return (int)fb->pitches[plane_index];
+ case READ_UP:
+ return -(int)fb->pitches[plane_index];
+ }
}

-static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
-{
- if (frame_info->rotation & (DRM_MODE_REFLECT_X | DRM_MODE_ROTATE_270))
- return limit - x - 1;
- return x;
-}

/*
- * The following functions take pixel data from the buffer and convert them to the format
+ * The following functions take pixel data (a, r, g, b, pixel, ...), convert them to the format
* ARGB16161616 in out_pixel.
*
- * They are used in the `vkms_compose_row` function to handle multiple formats.
+ * They are used in the `read_line` functions to avoid duplicate work for some pixel formats.
*/

-static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
+static void ARGB8888_to_argb_u16(struct pixel_argb_u16 *out_pixel, int a, int r, int g, int b)
{
/*
* The 257 is the "conversion ratio". This number is obtained by the
@@ -80,48 +100,26 @@ static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixe
* the best color value in a pixel format with more possibilities.
* A similar idea applies to others RGB color conversions.
*/
- out_pixel->a = (u16)src_pixels[3] * 257;
- out_pixel->r = (u16)src_pixels[2] * 257;
- out_pixel->g = (u16)src_pixels[1] * 257;
- out_pixel->b = (u16)src_pixels[0] * 257;
-}
-
-static void XRGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
-{
- out_pixel->a = (u16)0xffff;
- out_pixel->r = (u16)src_pixels[2] * 257;
- out_pixel->g = (u16)src_pixels[1] * 257;
- out_pixel->b = (u16)src_pixels[0] * 257;
+ out_pixel->a = (u16)a * 257;
+ out_pixel->r = (u16)r * 257;
+ out_pixel->g = (u16)g * 257;
+ out_pixel->b = (u16)b * 257;
}

-static void ARGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
+static void ARGB16161616_to_argb_u16(struct pixel_argb_u16 *out_pixel, int a, int r, int g, int b)
{
- u16 *pixels = (u16 *)src_pixels;
-
- out_pixel->a = le16_to_cpu(pixels[3]);
- out_pixel->r = le16_to_cpu(pixels[2]);
- out_pixel->g = le16_to_cpu(pixels[1]);
- out_pixel->b = le16_to_cpu(pixels[0]);
-}
-
-static void XRGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
-{
- u16 *pixels = (u16 *)src_pixels;
-
- out_pixel->a = (u16)0xffff;
- out_pixel->r = le16_to_cpu(pixels[2]);
- out_pixel->g = le16_to_cpu(pixels[1]);
- out_pixel->b = le16_to_cpu(pixels[0]);
+ out_pixel->a = le16_to_cpu(a);
+ out_pixel->r = le16_to_cpu(r);
+ out_pixel->g = le16_to_cpu(g);
+ out_pixel->b = le16_to_cpu(b);
}

-static void RGB565_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
+static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixel)
{
- u16 *pixels = (u16 *)src_pixels;
-
s64 fp_rb_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(31));
s64 fp_g_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(63));

- u16 rgb_565 = le16_to_cpu(*pixels);
+ u16 rgb_565 = le16_to_cpu(*pixel);
s64 fp_r = drm_int2fixp((rgb_565 >> 11) & 0x1f);
s64 fp_g = drm_int2fixp((rgb_565 >> 5) & 0x3f);
s64 fp_b = drm_int2fixp(rgb_565 & 0x1f);
@@ -132,34 +130,105 @@ static void RGB565_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
}

-/**
- * vkms_compose_row - compose a single row of a plane
- * @stage_buffer: output line with the composed pixels
- * @plane: state of the plane that is being composed
- * @y: y coordinate of the row
+/*
+ * The following functions are read_line function for each pixel format supported by VKMS.
*
- * This function composes a single row of a plane. It gets the source pixels
- * through the y coordinate (see get_packed_src_addr()) and goes linearly
- * through the source pixel, reading the pixels and converting it to
- * ARGB16161616 (see the pixel_read() callback). For rotate-90 and rotate-270,
- * the source pixels are not traversed linearly. The source pixels are queried
- * on each iteration in order to traverse the pixels vertically.
+ * They read a line starting at the point @x_start,@y_start following the @direction. The result
+ * is stored in @out_pixel and in the format ARGB16161616.
+ *
+ * These functions are very similar, but the duplication is required for performance reasons. Past
+ * experiments showed that a generic loop significantly reduces performance [1].
+ *
+ * [1]: https://lore.kernel.org/dri-devel/[email protected]/
*/
-void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y)
+
+static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+ enum pixel_read_direction direction, int count,
+ struct pixel_argb_u16 out_pixel[])
+{
+ u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+
+ int step = get_step_1x1(frame_info->fb, direction, 0);
+
+ while (count) {
+ u8 *px = (u8 *)src_pixels;
+
+ ARGB8888_to_argb_u16(out_pixel, px[3], px[2], px[1], px[0]);
+ out_pixel += 1;
+ src_pixels += step;
+ count--;
+ }
+}
+
+static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+ enum pixel_read_direction direction, int count,
+ struct pixel_argb_u16 out_pixel[])
+{
+ u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+
+ int step = get_step_1x1(frame_info->fb, direction, 0);
+
+ while (count) {
+ u8 *px = (u8 *)src_pixels;
+
+ ARGB8888_to_argb_u16(out_pixel, 255, px[2], px[1], px[0]);
+ out_pixel += 1;
+ src_pixels += step;
+ count--;
+ }
+}
+
+static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+ enum pixel_read_direction direction, int count,
+ struct pixel_argb_u16 out_pixel[])
+{
+ u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+
+ int step = get_step_1x1(frame_info->fb, direction, 0);
+
+ while (count) {
+ u16 *px = (u16 *)src_pixels;
+
+ ARGB16161616_to_argb_u16(out_pixel, px[3], px[2], px[1], px[0]);
+ out_pixel += 1;
+ src_pixels += step;
+ count--;
+ }
+}
+
+static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+ enum pixel_read_direction direction, int count,
+ struct pixel_argb_u16 out_pixel[])
+{
+ u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+
+ int step = get_step_1x1(frame_info->fb, direction, 0);
+
+ while (count) {
+ u16 *px = (u16 *)src_pixels;
+
+ ARGB16161616_to_argb_u16(out_pixel, 0xFFFF, px[2], px[1], px[0]);
+ out_pixel += 1;
+ src_pixels += step;
+ count--;
+ }
+}
+
+static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+ enum pixel_read_direction direction, int count,
+ struct pixel_argb_u16 out_pixel[])
{
- struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
- struct vkms_frame_info *frame_info = plane->frame_info;
- u8 *src_pixels = get_packed_src_addr(frame_info, y);
- int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
+ u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);

- for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
- int x_pos = get_x_position(frame_info, limit, x);
+ int step = get_step_1x1(frame_info->fb, direction, 0);

- if (drm_rotation_90_or_270(frame_info->rotation))
- src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
- + frame_info->fb->format->cpp[0] * y;
+ while (count) {
+ u16 *px = (u16 *)src_pixels;

- plane->pixel_read(src_pixels, &out_pixels[x_pos]);
+ RGB565_to_argb_u16(out_pixel, px);
+ out_pixel += 1;
+ src_pixels += step;
+ count--;
}
}

@@ -247,7 +316,7 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
{
struct vkms_frame_info *frame_info = &wb->wb_frame_info;
int x_dst = frame_info->dst.x1;
- u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
+ u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y, 0);
struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels);

@@ -256,27 +325,27 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
}

/**
- * Retrieve the correct read_pixel function for a specific format.
+ * Retrieve the correct read_line function for a specific format.
* The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
* pointer is valid before using it in a vkms_plane_state.
*
* @format: 4cc of the format
*/
-pixel_read_t get_pixel_read_function(u32 format)
+pixel_read_line_t get_pixel_read_line_function(u32 format)
{
switch (format) {
case DRM_FORMAT_ARGB8888:
- return &ARGB8888_to_argb_u16;
+ return &ARGB8888_read_line;
case DRM_FORMAT_XRGB8888:
- return &XRGB8888_to_argb_u16;
+ return &XRGB8888_read_line;
case DRM_FORMAT_ARGB16161616:
- return &ARGB16161616_to_argb_u16;
+ return &ARGB16161616_read_line;
case DRM_FORMAT_XRGB16161616:
- return &XRGB16161616_to_argb_u16;
+ return &XRGB16161616_read_line;
case DRM_FORMAT_RGB565:
- return &RGB565_to_argb_u16;
+ return &RGB565_read_line;
default:
- return (pixel_read_t)NULL;
+ return (pixel_read_line_t)NULL;
}
}

diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
index 3ecea4563254..8d2bef95ff79 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.h
+++ b/drivers/gpu/drm/vkms/vkms_formats.h
@@ -5,7 +5,7 @@

#include "vkms_drv.h"

-pixel_read_t get_pixel_read_function(u32 format);
+pixel_read_line_t get_pixel_read_line_function(u32 format);

pixel_write_t get_pixel_write_function(u32 format);

diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index f68b1b03d632..58c1c74742b5 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -106,9 +106,9 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
return;

fmt = fb->format->format;
- pixel_read_t pixel_read = get_pixel_read_function(fmt);
+ pixel_read_line_t pixel_read_line = get_pixel_read_line_function(fmt);

- if (!pixel_read) {
+ if (!pixel_read_line) {
DRM_WARN("Pixel format is not supported by VKMS planes. State is inchanged\n");
return;
}
@@ -128,10 +128,8 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
DRM_MODE_REFLECT_X |
DRM_MODE_REFLECT_Y);

- drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
- drm_rect_height(&frame_info->rotated), frame_info->rotation);

- vkms_plane_state->pixel_read = pixel_read;
+ vkms_plane_state->pixel_read_line = pixel_read_line;
}

static int vkms_plane_atomic_check(struct drm_plane *plane,

--
2.43.0


2024-02-23 11:40:30

by Louis Chauvet

[permalink] [raw]
Subject: [PATCH v2 9/9] drm/vkms: Create KUnit tests for YUV conversions

From: Arthur Grillo <[email protected]>

Create KUnit tests to test the conversion between YUV and RGB. Test each
conversion and range combination with some common colors.
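
The checks accept a difference of up to 257 per 16-bit channel, which is
one 8-bit step once a value is expanded to 16 bits (0xff * 257 = 0xffff).
A minimal user-space sketch of this comparison follows. The names
expand_u8(), abs_diff_u16(), channel_close() and tolerance_examples() are
hypothetical and only used for illustration; they are not part of this
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Expand an 8-bit channel to 16 bits: 0x00 -> 0x0000, 0xff -> 0xffff. */
static uint16_t expand_u8(uint8_t v)
{
	return (uint16_t)(v * 257);
}

/* Hypothetical user-space stand-in for the kernel's abs_diff() macro. */
static uint16_t abs_diff_u16(uint16_t a, uint16_t b)
{
	return a > b ? (uint16_t)(a - b) : (uint16_t)(b - a);
}

/* A channel matches when it is at most one 8-bit step (257) away. */
static bool channel_close(uint16_t got, uint16_t expected)
{
	return abs_diff_u16(got, expected) <= 257;
}

/* A few worked cases. */
static void tolerance_examples(void)
{
	assert(expand_u8(0xff) == 0xffff);
	/* 0x8080 against 0x8000: off by 128, accepted */
	assert(channel_close(expand_u8(0x80), 0x8000));
	/* 0xfefe against 0xffff: off by exactly 257, accepted */
	assert(channel_close(expand_u8(0xfe), 0xffff));
	/* 0xfdfd against 0xffff: off by 514, rejected */
	assert(!channel_close(expand_u8(0xfd), 0xffff));
}
```

In other words, a result is accepted as long as it would round to the
expected value or to one of its two neighbours in 8-bit precision.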

Signed-off-by: Arthur Grillo <[email protected]>
[Louis Chauvet: fix minor formatting issues (whitespace, double line)]
Signed-off-by: Louis Chauvet <[email protected]>
---
drivers/gpu/drm/vkms/Makefile | 1 +
drivers/gpu/drm/vkms/tests/.kunitconfig | 4 +
drivers/gpu/drm/vkms/tests/Makefile | 3 +
drivers/gpu/drm/vkms/tests/vkms_format_test.c | 155 ++++++++++++++++++++++++++
drivers/gpu/drm/vkms/vkms_formats.c | 9 +-
drivers/gpu/drm/vkms/vkms_formats.h | 5 +
6 files changed, 175 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
index 1b28a6a32948..8d3e46dde635 100644
--- a/drivers/gpu/drm/vkms/Makefile
+++ b/drivers/gpu/drm/vkms/Makefile
@@ -9,3 +9,4 @@ vkms-y := \
vkms_writeback.o

obj-$(CONFIG_DRM_VKMS) += vkms.o
+obj-$(CONFIG_DRM_VKMS_KUNIT_TESTS) += tests/
diff --git a/drivers/gpu/drm/vkms/tests/.kunitconfig b/drivers/gpu/drm/vkms/tests/.kunitconfig
new file mode 100644
index 000000000000..70e378228cbd
--- /dev/null
+++ b/drivers/gpu/drm/vkms/tests/.kunitconfig
@@ -0,0 +1,4 @@
+CONFIG_KUNIT=y
+CONFIG_DRM=y
+CONFIG_DRM_VKMS=y
+CONFIG_DRM_VKMS_KUNIT_TESTS=y
diff --git a/drivers/gpu/drm/vkms/tests/Makefile b/drivers/gpu/drm/vkms/tests/Makefile
new file mode 100644
index 000000000000..2d1df668569e
--- /dev/null
+++ b/drivers/gpu/drm/vkms/tests/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_DRM_VKMS_KUNIT_TESTS) += vkms_format_test.o
diff --git a/drivers/gpu/drm/vkms/tests/vkms_format_test.c b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
new file mode 100644
index 000000000000..cb6d32ff115d
--- /dev/null
+++ b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
@@ -0,0 +1,155 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include <kunit/test.h>
+
+#include <drm/drm_fixed.h>
+#include <drm/drm_fourcc.h>
+#include <drm/drm_print.h>
+
+#include "../../drm_crtc_internal.h"
+
+#include "../vkms_drv.h"
+#include "../vkms_formats.h"
+
+#define TEST_BUFF_SIZE 50
+
+struct yuv_u8_to_argb_u16_case {
+ enum drm_color_encoding encoding;
+ enum drm_color_range range;
+ size_t n_colors;
+ struct format_pair {
+ char *name;
+ struct pixel_yuv_u8 yuv;
+ struct pixel_argb_u16 argb;
+ } colors[TEST_BUFF_SIZE];
+};
+
+static struct yuv_u8_to_argb_u16_case yuv_u8_to_argb_u16_cases[] = {
+ {
+ .encoding = DRM_COLOR_YCBCR_BT601,
+ .range = DRM_COLOR_YCBCR_FULL_RANGE,
+ .n_colors = 6,
+ .colors = {
+ {"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
+ {"gray", {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
+ {"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
+ {"red", {0x4c, 0x55, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
+ {"green", {0x96, 0x2c, 0x15}, {0x0000, 0x0000, 0xffff, 0x0000}},
+ {"blue", {0x1d, 0xff, 0x6b}, {0x0000, 0x0000, 0x0000, 0xffff}},
+ },
+ },
+ {
+ .encoding = DRM_COLOR_YCBCR_BT601,
+ .range = DRM_COLOR_YCBCR_LIMITED_RANGE,
+ .n_colors = 6,
+ .colors = {
+ {"white", {0xeb, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
+ {"gray", {0x7e, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
+ {"black", {0x10, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
+ {"red", {0x51, 0x5a, 0xf0}, {0x0000, 0xffff, 0x0000, 0x0000}},
+ {"green", {0x91, 0x36, 0x22}, {0x0000, 0x0000, 0xffff, 0x0000}},
+ {"blue", {0x29, 0xf0, 0x6e}, {0x0000, 0x0000, 0x0000, 0xffff}},
+ },
+ },
+ {
+ .encoding = DRM_COLOR_YCBCR_BT709,
+ .range = DRM_COLOR_YCBCR_FULL_RANGE,
+ .n_colors = 6,
+ .colors = {
+ {"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
+ {"gray", {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
+ {"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
+ {"red", {0x35, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
+ {"green", {0xb6, 0x1e, 0x0c}, {0x0000, 0x0000, 0xffff, 0x0000}},
+ {"blue", {0x12, 0xff, 0x74}, {0x0000, 0x0000, 0x0000, 0xffff}},
+ },
+ },
+ {
+ .encoding = DRM_COLOR_YCBCR_BT709,
+ .range = DRM_COLOR_YCBCR_LIMITED_RANGE,
+ .n_colors = 6,
+ .colors = {
+ {"white", {0xeb, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
+ {"gray", {0x7e, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
+ {"black", {0x10, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
+ {"red", {0x3f, 0x66, 0xf0}, {0x0000, 0xffff, 0x0000, 0x0000}},
+ {"green", {0xad, 0x2a, 0x1a}, {0x0000, 0x0000, 0xffff, 0x0000}},
+ {"blue", {0x20, 0xf0, 0x76}, {0x0000, 0x0000, 0x0000, 0xffff}},
+ },
+ },
+ {
+ .encoding = DRM_COLOR_YCBCR_BT2020,
+ .range = DRM_COLOR_YCBCR_FULL_RANGE,
+ .n_colors = 6,
+ .colors = {
+ {"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
+ {"gray", {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
+ {"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
+ {"red", {0x43, 0x5c, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
+ {"green", {0xad, 0x24, 0x0b}, {0x0000, 0x0000, 0xffff, 0x0000}},
+ {"blue", {0x0f, 0xff, 0x76}, {0x0000, 0x0000, 0x0000, 0xffff}},
+ },
+ },
+ {
+ .encoding = DRM_COLOR_YCBCR_BT2020,
+ .range = DRM_COLOR_YCBCR_LIMITED_RANGE,
+ .n_colors = 6,
+ .colors = {
+ {"white", {0xeb, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
+ {"gray", {0x7e, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
+ {"black", {0x10, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
+ {"red", {0x4a, 0x61, 0xf0}, {0x0000, 0xffff, 0x0000, 0x0000}},
+ {"green", {0xa4, 0x2f, 0x19}, {0x0000, 0x0000, 0xffff, 0x0000}},
+ {"blue", {0x1d, 0xf0, 0x77}, {0x0000, 0x0000, 0x0000, 0xffff}},
+ },
+ },
+};
+
+static void vkms_format_test_yuv_u8_to_argb_u16(struct kunit *test)
+{
+ const struct yuv_u8_to_argb_u16_case *param = test->param_value;
+ struct pixel_argb_u16 argb;
+
+ for (size_t i = 0; i < param->n_colors; i++) {
+ const struct format_pair *color = &param->colors[i];
+
+ yuv_u8_to_argb_u16(&argb, &color->yuv, param->encoding, param->range);
+
+ KUNIT_EXPECT_LE_MSG(test, abs_diff(argb.a, color->argb.a), 257,
+ "On the A channel of the color %s expected 0x%04x, got 0x%04x",
+ color->name, color->argb.a, argb.a);
+ KUNIT_EXPECT_LE_MSG(test, abs_diff(argb.r, color->argb.r), 257,
+ "On the R channel of the color %s expected 0x%04x, got 0x%04x",
+ color->name, color->argb.r, argb.r);
+ KUNIT_EXPECT_LE_MSG(test, abs_diff(argb.g, color->argb.g), 257,
+ "On the G channel of the color %s expected 0x%04x, got 0x%04x",
+ color->name, color->argb.g, argb.g);
+ KUNIT_EXPECT_LE_MSG(test, abs_diff(argb.b, color->argb.b), 257,
+ "On the B channel of the color %s expected 0x%04x, got 0x%04x",
+ color->name, color->argb.b, argb.b);
+ }
+}
+
+static void vkms_format_test_yuv_u8_to_argb_u16_case_desc(struct yuv_u8_to_argb_u16_case *t,
+ char *desc)
+{
+ snprintf(desc, KUNIT_PARAM_DESC_SIZE, "%s - %s",
+ drm_get_color_encoding_name(t->encoding), drm_get_color_range_name(t->range));
+}
+
+KUNIT_ARRAY_PARAM(yuv_u8_to_argb_u16, yuv_u8_to_argb_u16_cases,
+ vkms_format_test_yuv_u8_to_argb_u16_case_desc);
+
+static struct kunit_case vkms_format_test_cases[] = {
+ KUNIT_CASE_PARAM(vkms_format_test_yuv_u8_to_argb_u16, yuv_u8_to_argb_u16_gen_params),
+ {}
+};
+
+static struct kunit_suite vkms_format_test_suite = {
+ .name = "vkms-format",
+ .test_cases = vkms_format_test_cases,
+};
+
+kunit_test_suite(vkms_format_test_suite);
+
+MODULE_LICENSE("GPL");
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 515c80866a58..20dd23ce9051 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -7,6 +7,8 @@
#include <drm/drm_rect.h>
#include <drm/drm_fixed.h>

+#include <kunit/visibility.h>
+
#include "vkms_formats.h"

/**
@@ -175,8 +177,10 @@ static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r,
*b = clamp(b_16, 0, 0xffff) >> 8;
}

-static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
- enum drm_color_encoding encoding, enum drm_color_range range)
+VISIBLE_IF_KUNIT void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16,
+ const struct pixel_yuv_u8 *yuv_u8,
+ enum drm_color_encoding encoding,
+ enum drm_color_range range)
{
static const s16 bt601_full[3][3] = {
{ 256, 0, 359 },
@@ -237,6 +241,7 @@ static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pix
argb_u16->g = g * 257;
argb_u16->b = b * 257;
}
+EXPORT_SYMBOL_IF_KUNIT(yuv_u8_to_argb_u16);

/*
* The following functions are read_line function for each pixel format supported by VKMS.
diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
index 5a3a9e1328d8..4245a5c5e956 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.h
+++ b/drivers/gpu/drm/vkms/vkms_formats.h
@@ -13,4 +13,9 @@ struct pixel_yuv_u8 {
u8 y, u, v;
};

+#if IS_ENABLED(CONFIG_KUNIT)
+void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
+ enum drm_color_encoding encoding, enum drm_color_range range);
+#endif
+
#endif /* _VKMS_FORMATS_H_ */

--
2.43.0


2024-02-23 11:45:58

by Louis Chauvet

[permalink] [raw]
Subject: [PATCH v2 6/9] drm/vkms: Add YUV support

From: Arthur Grillo <[email protected]>

Add support for the YUV formats below:

- NV12
- NV16
- NV24
- NV21
- NV61
- NV42
- YUV420
- YUV422
- YUV444
- YVU420
- YVU422
- YVU444

The conversion matrices of each encoding and range were obtained by
rounding the values of the original conversion matrices multiplied by
2^8. This is done to avoid the use of fixed point operations.
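
As an illustration of this scaling (a minimal sketch, not the code from
this patch): the BT.601 full-range Cr-to-R coefficient 1.402 becomes
round(1.402 * 2^8) = 359, which is the value found in the first row of
bt601_full. The helper names scale_coeff() and red_from_ycr() are
hypothetical, and the rounding and clamping shown here are assumptions
that may differ from what ycbcr2rgb() does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scale a real coefficient by 2^8 and round to nearest. */
static int16_t scale_coeff(double c)
{
	/* The scaled value must fit in a signed 16-bit integer. */
	assert(c > -128.0 && c < 128.0);
	return (int16_t)(c * 256.0 + (c >= 0 ? 0.5 : -0.5));
}

/*
 * Hypothetical helper: red channel of a full-range pixel, using only
 * integer operations. m_y and m_cr are coefficients already scaled by 2^8.
 */
static uint8_t red_from_ycr(int16_t m_y, int16_t m_cr, uint8_t y, uint8_t cr)
{
	int32_t r = (int32_t)m_y * y + (int32_t)m_cr * ((int32_t)cr - 128);

	if (r < 0)
		r = 0;
	r = (r + 128) >> 8;	/* drop the 2^8 scale, rounding to nearest */
	return r > 255 ? 255 : (uint8_t)r;
}
```

With these helpers, white (Y = 0xff, Cr = 0x80) gives 255 and black
(Y = 0x00, Cr = 0x80) gives 0 for the red channel, without any fixed
point or floating point operation in the per-pixel path.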

Signed-off-by: Arthur Grillo <[email protected]>
[Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
callbacks for yuv formats]
Signed-off-by: Louis Chauvet <[email protected]>
---
drivers/gpu/drm/vkms/vkms_composer.c | 2 +-
drivers/gpu/drm/vkms/vkms_drv.h | 6 +-
drivers/gpu/drm/vkms/vkms_formats.c | 289 +++++++++++++++++++++++++++++++++--
drivers/gpu/drm/vkms/vkms_formats.h | 4 +
drivers/gpu/drm/vkms/vkms_plane.c | 14 +-
5 files changed, 295 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index e555bf9c1aee..54fc5161d565 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
* buffer [1]
*/
current_plane->pixel_read_line(
- current_plane->frame_info,
+ current_plane,
x_start,
y_start,
direction,
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index ccc5be009f15..a4f6456cb971 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -75,6 +75,8 @@ enum pixel_read_direction {
READ_RIGHT
};

+struct vkms_plane_state;
+
/**
* typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
* convert it to `struct pixel_argb_u16` and write it to @out_pixel.
@@ -87,8 +89,8 @@ enum pixel_read_direction {
* @out_pixel: Pointer where to write the pixel values. @count pixels will be written,
* starting at out_pixel[0].
*/
-typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
- pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
+typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
+ enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);

/**
* vkms_plane_state - Driver specific plane state
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 46daea6d3ee9..515c80866a58 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
*/
return fb->offsets[plane_index] +
(y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
- (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
+ (x / drm_format_info_block_height(format, plane_index)) *
+ format->char_per_block[plane_index];
}

/**
@@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
}
}

+/**
+ * get_subsampling() - Get the subsampling value on a specific direction
+ */
+static int get_subsampling(const struct drm_format_info *format,
+ enum pixel_read_direction direction)
+{
+ if (direction == READ_LEFT || direction == READ_RIGHT)
+ return format->hsub;
+ else if (direction == READ_DOWN || direction == READ_UP)
+ return format->vsub;
+ return 1;
+}
+
+/**
+ * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
+ */
+static int get_subsampling_offset(const struct drm_format_info *format,
+ enum pixel_read_direction direction, int x_start, int y_start)
+{
+ if (direction == READ_RIGHT || direction == READ_LEFT)
+ return x_start;
+ else if (direction == READ_DOWN || direction == READ_UP)
+ return y_start;
+ return 0;
+}
+

/*
* The following functions take pixel data (a, r, g, b, pixel, ...), convert them to the format
@@ -130,6 +157,87 @@ static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
}

+static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
+{
+ s32 y_16, cb_16, cr_16;
+ s32 r_16, g_16, b_16;
+
+ y_16 = y - y_offset;
+ cb_16 = cb - 128;
+ cr_16 = cr - 128;
+
+ r_16 = m[0][0] * y_16 + m[0][1] * cb_16 + m[0][2] * cr_16;
+ g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
+ b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;
+
+ *r = clamp(r_16, 0, 0xffff) >> 8;
+ *g = clamp(g_16, 0, 0xffff) >> 8;
+ *b = clamp(b_16, 0, 0xffff) >> 8;
+}
+
+static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
+ enum drm_color_encoding encoding, enum drm_color_range range)
+{
+ static const s16 bt601_full[3][3] = {
+ { 256, 0, 359 },
+ { 256, -88, -183 },
+ { 256, 454, 0 },
+ };
+ static const s16 bt601[3][3] = {
+ { 298, 0, 409 },
+ { 298, -100, -208 },
+ { 298, 516, 0 },
+ };
+ static const s16 rec709_full[3][3] = {
+ { 256, 0, 408 },
+ { 256, -48, -120 },
+ { 256, 476, 0 },
+ };
+ static const s16 rec709[3][3] = {
+ { 298, 0, 459 },
+ { 298, -55, -136 },
+ { 298, 541, 0 },
+ };
+ static const s16 bt2020_full[3][3] = {
+ { 256, 0, 377 },
+ { 256, -42, -146 },
+ { 256, 482, 0 },
+ };
+ static const s16 bt2020[3][3] = {
+ { 298, 0, 430 },
+ { 298, -48, -167 },
+ { 298, 548, 0 },
+ };
+
+ u8 r = 0;
+ u8 g = 0;
+ u8 b = 0;
+ bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
+ unsigned int y_offset = full ? 0 : 16;
+
+ switch (encoding) {
+ case DRM_COLOR_YCBCR_BT601:
+ ycbcr2rgb(full ? bt601_full : bt601,
+ yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
+ break;
+ case DRM_COLOR_YCBCR_BT709:
+ ycbcr2rgb(full ? rec709_full : rec709,
+ yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
+ break;
+ case DRM_COLOR_YCBCR_BT2020:
+ ycbcr2rgb(full ? bt2020_full : bt2020,
+ yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
+ break;
+ default:
+ pr_warn_once("Not supported color encoding\n");
+ break;
+ }
+
+ argb_u16->r = r * 257;
+ argb_u16->g = g * 257;
+ argb_u16->b = b * 257;
+}
+
/*
* The following functions are read_line function for each pixel format supported by VKMS.
*
@@ -142,13 +250,13 @@ static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
* [1]: https://lore.kernel.org/dri-devel/[email protected]/
*/

-static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+static void ARGB8888_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
enum pixel_read_direction direction, int count,
struct pixel_argb_u16 out_pixel[])
{
- u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+ u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);

- int step = get_step_1x1(frame_info->fb, direction, 0);
+ int step = get_step_1x1(plane->frame_info->fb, direction, 0);

while (count) {
u8 *px = (u8 *)src_pixels;
@@ -160,13 +268,13 @@ static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start,
}
}

-static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+static void XRGB8888_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
enum pixel_read_direction direction, int count,
struct pixel_argb_u16 out_pixel[])
{
- u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+ u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);

- int step = get_step_1x1(frame_info->fb, direction, 0);
+ int step = get_step_1x1(plane->frame_info->fb, direction, 0);

while (count) {
u8 *px = (u8 *)src_pixels;
@@ -178,13 +286,13 @@ static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start,
}
}

-static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+static void ARGB16161616_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
enum pixel_read_direction direction, int count,
struct pixel_argb_u16 out_pixel[])
{
- u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+ u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);

- int step = get_step_1x1(frame_info->fb, direction, 0);
+ int step = get_step_1x1(plane->frame_info->fb, direction, 0);

while (count) {
u16 *px = (u16 *)src_pixels;
@@ -196,13 +304,13 @@ static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_sta
}
}

-static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+static void XRGB16161616_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
enum pixel_read_direction direction, int count,
struct pixel_argb_u16 out_pixel[])
{
- u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+ u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);

- int step = get_step_1x1(frame_info->fb, direction, 0);
+ int step = get_step_1x1(plane->frame_info->fb, direction, 0);

while (count) {
u16 *px = (u16 *)src_pixels;
@@ -214,13 +322,13 @@ static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_sta
}
}

-static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
+static void RGB565_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
enum pixel_read_direction direction, int count,
struct pixel_argb_u16 out_pixel[])
{
- u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
+ u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);

- int step = get_step_1x1(frame_info->fb, direction, 0);
+ int step = get_step_1x1(plane->frame_info->fb, direction, 0);

while (count) {
u16 *px = (u16 *)src_pixels;
@@ -232,6 +340,139 @@ static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, in
}
}

+static void semi_planar_yuv_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
+ enum pixel_read_direction direction, int count,
+ struct pixel_argb_u16 out_pixel[])
+{
+ u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
+ u8 *uv_plane = packed_pixels_addr(plane->frame_info,
+ x_start / plane->frame_info->fb->format->hsub,
+ y_start / plane->frame_info->fb->format->vsub,
+ 1);
+ struct pixel_yuv_u8 yuv_u8;
+ int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
+ int step_uv = get_step_1x1(plane->frame_info->fb, direction, 1);
+ int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
+ int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
+ x_start, y_start); // 0
+
+ for (int i = 0; i < count; i++) {
+ yuv_u8.y = y_plane[0];
+ yuv_u8.u = uv_plane[0];
+ yuv_u8.v = uv_plane[1];
+
+ yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
+ plane->base.base.color_range);
+ out_pixel += 1;
+ y_plane += step_y;
+ if ((i + subsampling_offset + 1) % subsampling == 0)
+ uv_plane += step_uv;
+ }
+}
+
+static void semi_planar_yvu_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
+ enum pixel_read_direction direction, int count,
+ struct pixel_argb_u16 out_pixel[])
+{
+ u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
+ u8 *vu_plane = packed_pixels_addr(plane->frame_info,
+ x_start / plane->frame_info->fb->format->hsub,
+ y_start / plane->frame_info->fb->format->vsub,
+ 1);
+ struct pixel_yuv_u8 yuv_u8;
+ int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
+ int step_vu = get_step_1x1(plane->frame_info->fb, direction, 1);
+ int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
+ int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
+ x_start, y_start);
+ for (int i = 0; i < count; i++) {
+ yuv_u8.y = y_plane[0];
+ yuv_u8.u = vu_plane[1];
+ yuv_u8.v = vu_plane[0];
+
+ yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
+ plane->base.base.color_range);
+ out_pixel += 1;
+ y_plane += step_y;
+ if ((i + subsampling_offset + 1) % subsampling == 0)
+ vu_plane += step_vu;
+ }
+}
+
+static void planar_yuv_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
+ enum pixel_read_direction direction, int count,
+ struct pixel_argb_u16 out_pixel[])
+{
+ u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
+ u8 *u_plane = packed_pixels_addr(plane->frame_info,
+ x_start / plane->frame_info->fb->format->hsub,
+ y_start / plane->frame_info->fb->format->vsub,
+ 1);
+ u8 *v_plane = packed_pixels_addr(plane->frame_info,
+ x_start / plane->frame_info->fb->format->hsub,
+ y_start / plane->frame_info->fb->format->vsub,
+ 2);
+ struct pixel_yuv_u8 yuv_u8;
+ int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
+ int step_u = get_step_1x1(plane->frame_info->fb, direction, 1);
+ int step_v = get_step_1x1(plane->frame_info->fb, direction, 2);
+ int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
+ int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
+ x_start, y_start);
+
+ for (int i = 0; i < count; i++) {
+ yuv_u8.y = *y_plane;
+ yuv_u8.u = *u_plane;
+ yuv_u8.v = *v_plane;
+
+ yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
+ plane->base.base.color_range);
+ out_pixel += 1;
+ y_plane += step_y;
+ if ((i + subsampling_offset + 1) % subsampling == 0) {
+ u_plane += step_u;
+ v_plane += step_v;
+ }
+ }
+}
+
+static void planar_yvu_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
+ enum pixel_read_direction direction, int count,
+ struct pixel_argb_u16 out_pixel[])
+{
+ u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
+ u8 *v_plane = packed_pixels_addr(plane->frame_info,
+ x_start / plane->frame_info->fb->format->hsub,
+ y_start / plane->frame_info->fb->format->vsub,
+ 1);
+ u8 *u_plane = packed_pixels_addr(plane->frame_info,
+ x_start / plane->frame_info->fb->format->hsub,
+ y_start / plane->frame_info->fb->format->vsub,
+ 2);
+ struct pixel_yuv_u8 yuv_u8;
+ int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
+ int step_u = get_step_1x1(plane->frame_info->fb, direction, 1);
+ int step_v = get_step_1x1(plane->frame_info->fb, direction, 2);
+ int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
+ int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
+ x_start, y_start);
+
+ for (int i = 0; i < count; i++) {
+ yuv_u8.y = *y_plane;
+ yuv_u8.u = *u_plane;
+ yuv_u8.v = *v_plane;
+
+ yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
+ plane->base.base.color_range);
+ out_pixel += 1;
+ y_plane += step_y;
+ if ((i + subsampling_offset + 1) % subsampling == 0) {
+ u_plane += step_u;
+ v_plane += step_v;
+ }
+ }
+}
+
/*
* The following functions take one argb_u16 pixel and convert it to a specific format. The
* result is stored in @dst_pixels.
@@ -344,6 +585,22 @@ pixel_read_line_t get_pixel_read_line_function(u32 format)
return &XRGB16161616_read_line;
case DRM_FORMAT_RGB565:
return &RGB565_read_line;
+ case DRM_FORMAT_NV12:
+ case DRM_FORMAT_NV16:
+ case DRM_FORMAT_NV24:
+ return &semi_planar_yuv_read_line;
+ case DRM_FORMAT_NV21:
+ case DRM_FORMAT_NV61:
+ case DRM_FORMAT_NV42:
+ return &semi_planar_yvu_read_line;
+ case DRM_FORMAT_YUV420:
+ case DRM_FORMAT_YUV422:
+ case DRM_FORMAT_YUV444:
+ return &planar_yuv_read_line;
+ case DRM_FORMAT_YVU420:
+ case DRM_FORMAT_YVU422:
+ case DRM_FORMAT_YVU444:
+ return &planar_yvu_read_line;
default:
return (pixel_read_line_t)NULL;
}
diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
index 8d2bef95ff79..5a3a9e1328d8 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.h
+++ b/drivers/gpu/drm/vkms/vkms_formats.h
@@ -9,4 +9,8 @@ pixel_read_line_t get_pixel_read_line_function(u32 format);

pixel_write_t get_pixel_write_function(u32 format);

+struct pixel_yuv_u8 {
+ u8 y, u, v;
+};
+
#endif /* _VKMS_FORMATS_H_ */
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 58c1c74742b5..427ca67c60ce 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -17,7 +17,19 @@ static const u32 vkms_formats[] = {
DRM_FORMAT_XRGB8888,
DRM_FORMAT_XRGB16161616,
DRM_FORMAT_ARGB16161616,
- DRM_FORMAT_RGB565
+ DRM_FORMAT_RGB565,
+ DRM_FORMAT_NV12,
+ DRM_FORMAT_NV16,
+ DRM_FORMAT_NV24,
+ DRM_FORMAT_NV21,
+ DRM_FORMAT_NV61,
+ DRM_FORMAT_NV42,
+ DRM_FORMAT_YUV420,
+ DRM_FORMAT_YUV422,
+ DRM_FORMAT_YUV444,
+ DRM_FORMAT_YVU420,
+ DRM_FORMAT_YVU422,
+ DRM_FORMAT_YVU444
};

static struct drm_plane_state *

--
2.43.0
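
The chroma stepping used by the YUV read_line callbacks in this patch can be modeled outside the kernel. The sketch below is only an illustration under simplifying assumptions: it covers a left-to-right read, counts chroma samples instead of bytes, and the function names chroma_indices_read_right() and stepping_matches_division() are hypothetical. It reuses the condition (i + subsampling_offset + 1) % subsampling == 0 with subsampling_offset equal to x_start, and checks that luma pixel x ends up paired with chroma sample x / hsub.

```c
#include <assert.h>

/*
 * Model of the chroma stepping for a left-to-right read: out[i] receives the
 * index of the chroma sample used for luma pixel x_start + i.
 */
static void chroma_indices_read_right(int x_start, int count, int hsub, int out[])
{
	/* Same starting point as the x_start / hsub passed to packed_pixels_addr(). */
	int chroma = x_start / hsub;

	for (int i = 0; i < count; i++) {
		out[i] = chroma;
		/* Same condition as in the patch, with subsampling_offset == x_start. */
		if ((i + x_start + 1) % hsub == 0)
			chroma += 1;
	}
}

/* Returns 1 when every luma pixel x is paired with chroma sample x / hsub. */
static int stepping_matches_division(int x_start, int count, int hsub)
{
	int idx[64];

	assert(count <= 64);
	chroma_indices_read_right(x_start, count, hsub, idx);
	for (int i = 0; i < count; i++)
		if (idx[i] != (x_start + i) / hsub)
			return 0;
	return 1;
}
```

For example, with hsub = 2, x_start = 1 and count = 5, the luma pixels 1 to 5 use chroma samples 0, 1, 1, 2, 2. The other read directions are not modeled here.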


2024-02-23 11:46:34

by Thomas Zimmermann

Subject: Re: [PATCH v2 6/9] drm/vkms: Add YUV support



Am 23.02.24 um 12:37 schrieb Louis Chauvet:
> From: Arthur Grillo <[email protected]>
>
> Add support for the YUV formats below:
>
> - NV12
> - NV16
> - NV24
> - NV21
> - NV61
> - NV42
> - YUV420
> - YUV422
> - YUV444
> - YVU420
> - YVU422
> - YVU444
>
> The conversion matrices of each encoding and range were obtained by
> rounding the values of the original conversion matrices multiplied by
> 2^8. This is done to avoid the use of fixed point operations.
>
> Signed-off-by: Arthur Grillo <[email protected]>
> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
> callbacks for yuv formats]
> Signed-off-by: Louis Chauvet <[email protected]>
> ---
> drivers/gpu/drm/vkms/vkms_composer.c | 2 +-
> drivers/gpu/drm/vkms/vkms_drv.h | 6 +-
> drivers/gpu/drm/vkms/vkms_formats.c | 289 +++++++++++++++++++++++++++++++++--
> drivers/gpu/drm/vkms/vkms_formats.h | 4 +
> drivers/gpu/drm/vkms/vkms_plane.c | 14 +-
> 5 files changed, 295 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index e555bf9c1aee..54fc5161d565 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
> * buffer [1]
> */
> current_plane->pixel_read_line(
> - current_plane->frame_info,
> + current_plane,
> x_start,
> y_start,
> direction,
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index ccc5be009f15..a4f6456cb971 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -75,6 +75,8 @@ enum pixel_read_direction {
> READ_RIGHT
> };
>
> +struct vkms_plane_state;
> +
> /**
> <<<<<<< HEAD

Noise

> * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
> @@ -87,8 +89,8 @@ enum pixel_read_direction {
> * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
> * x_end.
> */
> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
> - pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
> + enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>
> /**
> * vkms_plane_state - Driver specific plane state
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 46daea6d3ee9..515c80866a58 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
> */
> return fb->offsets[plane_index] +
> (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
> - (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
> + (x / drm_format_info_block_height(format, plane_index)) *
> + format->char_per_block[plane_index];
> }
>
> /**
> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
> }
> }
>
> +/**
> + * get_subsampling() - Get the subsampling value on a specific direction
> + */
> +static int get_subsampling(const struct drm_format_info *format,
> + enum pixel_read_direction direction)
> +{
> + if (direction == READ_LEFT || direction == READ_RIGHT)
> + return format->hsub;
> + else if (direction == READ_DOWN || direction == READ_UP)
> + return format->vsub;
> + return 1;
> +}
> +
> +/**
> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
> + */
> +static int get_subsampling_offset(const struct drm_format_info *format,
> + enum pixel_read_direction direction, int x_start, int y_start)
> +{
> + if (direction == READ_RIGHT || direction == READ_LEFT)
> + return x_start;
> + else if (direction == READ_DOWN || direction == READ_UP)
> + return y_start;
> + return 0;
> +}
> +
>
> /*
> * The following functions take pixel data (a, r, g, b, pixel, ...), convert them to the format
> @@ -130,6 +157,87 @@ static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
> out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
> }
>
> +static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
> +{
> + s32 y_16, cb_16, cr_16;
> + s32 r_16, g_16, b_16;
> +
> + y_16 = y - y_offset;
> + cb_16 = cb - 128;
> + cr_16 = cr - 128;
> +
> + r_16 = m[0][0] * y_16 + m[0][1] * cb_16 + m[0][2] * cr_16;
> + g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
> + b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;
> +
> + *r = clamp(r_16, 0, 0xffff) >> 8;
> + *g = clamp(g_16, 0, 0xffff) >> 8;
> + *b = clamp(b_16, 0, 0xffff) >> 8;
> +}
> +
> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
> + enum drm_color_encoding encoding, enum drm_color_range range)
> +{
> + static const s16 bt601_full[3][3] = {
> + { 256, 0, 359 },
> + { 256, -88, -183 },
> + { 256, 454, 0 },
> + };
> + static const s16 bt601[3][3] = {
> + { 298, 0, 409 },
> + { 298, -100, -208 },
> + { 298, 516, 0 },
> + };
> + static const s16 rec709_full[3][3] = {
> + { 256, 0, 408 },
> + { 256, -48, -120 },
> + { 256, 476, 0 },
> + };
> + static const s16 rec709[3][3] = {
> + { 298, 0, 459 },
> + { 298, -55, -136 },
> + { 298, 541, 0 },
> + };
> + static const s16 bt2020_full[3][3] = {
> + { 256, 0, 377 },
> + { 256, -42, -146 },
> + { 256, 482, 0 },
> + };
> + static const s16 bt2020[3][3] = {
> + { 298, 0, 430 },
> + { 298, -48, -167 },
> + { 298, 548, 0 },
> + };
> +
> + u8 r = 0;
> + u8 g = 0;
> + u8 b = 0;
> + bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
> + unsigned int y_offset = full ? 0 : 16;
> +
> + switch (encoding) {
> + case DRM_COLOR_YCBCR_BT601:
> + ycbcr2rgb(full ? bt601_full : bt601,
> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> + break;
> + case DRM_COLOR_YCBCR_BT709:
> + ycbcr2rgb(full ? rec709_full : rec709,
> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> + break;
> + case DRM_COLOR_YCBCR_BT2020:
> + ycbcr2rgb(full ? bt2020_full : bt2020,
> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> + break;
> + default:
> + pr_warn_once("Not supported color encoding\n");
> + break;
> + }
> +
> + argb_u16->r = r * 257;
> + argb_u16->g = g * 257;
> + argb_u16->b = b * 257;
> +}
> +
> /*
> * The following functions are read_line function for each pixel format supported by VKMS.
> *
> @@ -142,13 +250,13 @@ static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
> * [1]: https://lore.kernel.org/dri-devel/[email protected]/
> */
>
> -static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void ARGB8888_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> enum pixel_read_direction direction, int count,
> struct pixel_argb_u16 out_pixel[])
> {
> - u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> + u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>
> - int step = get_step_1x1(frame_info->fb, direction, 0);
> + int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>
> while (count) {
> u8 *px = (u8 *)src_pixels;
> @@ -160,13 +268,13 @@ static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start,
> }
> }
>
> -static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void XRGB8888_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> enum pixel_read_direction direction, int count,
> struct pixel_argb_u16 out_pixel[])
> {
> - u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> + u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>
> - int step = get_step_1x1(frame_info->fb, direction, 0);
> + int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>
> while (count) {
> u8 *px = (u8 *)src_pixels;
> @@ -178,13 +286,13 @@ static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start,
> }
> }
>
> -static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void ARGB16161616_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> enum pixel_read_direction direction, int count,
> struct pixel_argb_u16 out_pixel[])
> {
> - u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> + u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>
> - int step = get_step_1x1(frame_info->fb, direction, 0);
> + int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>
> while (count) {
> u16 *px = (u16 *)src_pixels;
> @@ -196,13 +304,13 @@ static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_sta
> }
> }
>
> -static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void XRGB16161616_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> enum pixel_read_direction direction, int count,
> struct pixel_argb_u16 out_pixel[])
> {
> - u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> + u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>
> - int step = get_step_1x1(frame_info->fb, direction, 0);
> + int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>
> while (count) {
> u16 *px = (u16 *)src_pixels;
> @@ -214,13 +322,13 @@ static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_sta
> }
> }
>
> -static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void RGB565_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> enum pixel_read_direction direction, int count,
> struct pixel_argb_u16 out_pixel[])
> {
> - u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> + u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>
> - int step = get_step_1x1(frame_info->fb, direction, 0);
> + int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>
> while (count) {
> u16 *px = (u16 *)src_pixels;
> @@ -232,6 +340,139 @@ static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, in
> }
> }
>
> +static void semi_planar_yuv_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> +{
> + u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> + u8 *uv_plane = packed_pixels_addr(plane->frame_info,
> + x_start / plane->frame_info->fb->format->hsub,
> + y_start / plane->frame_info->fb->format->vsub,
> + 1);
> + struct pixel_yuv_u8 yuv_u8;
> + int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> + int step_uv = get_step_1x1(plane->frame_info->fb, direction, 1);
> + int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> + int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> + x_start, y_start); // 0
> +
> + for (int i = 0; i < count; i++) {
> + yuv_u8.y = y_plane[0];
> + yuv_u8.u = uv_plane[0];
> + yuv_u8.v = uv_plane[1];
> +
> + yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> + plane->base.base.color_range);
> + out_pixel += 1;
> + y_plane += step_y;
> + if ((i + subsampling_offset + 1) % subsampling == 0)
> + uv_plane += step_uv;
> + }
> +}
> +
> +static void semi_planar_yvu_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> +{
> + u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> + u8 *vu_plane = packed_pixels_addr(plane->frame_info,
> + x_start / plane->frame_info->fb->format->hsub,
> + y_start / plane->frame_info->fb->format->vsub,
> + 1);
> + struct pixel_yuv_u8 yuv_u8;
> + int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> + int step_vu = get_step_1x1(plane->frame_info->fb, direction, 1);
> + int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> + int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> + x_start, y_start);
> + for (int i = 0; i < count; i++) {
> + yuv_u8.y = y_plane[0];
> + yuv_u8.u = vu_plane[1];
> + yuv_u8.v = vu_plane[0];
> +
> + yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> + plane->base.base.color_range);
> + out_pixel += 1;
> + y_plane += step_y;
> + if ((i + subsampling_offset + 1) % subsampling == 0)
> + vu_plane += step_vu;
> + }
> +}
> +
> +static void planar_yuv_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> +{
> + u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> + u8 *u_plane = packed_pixels_addr(plane->frame_info,
> + x_start / plane->frame_info->fb->format->hsub,
> + y_start / plane->frame_info->fb->format->vsub,
> + 1);
> + u8 *v_plane = packed_pixels_addr(plane->frame_info,
> + x_start / plane->frame_info->fb->format->hsub,
> + y_start / plane->frame_info->fb->format->vsub,
> + 2);
> + struct pixel_yuv_u8 yuv_u8;
> + int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> + int step_u = get_step_1x1(plane->frame_info->fb, direction, 1);
> + int step_v = get_step_1x1(plane->frame_info->fb, direction, 2);
> + int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> + int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> + x_start, y_start);
> +
> + for (int i = 0; i < count; i++) {
> + yuv_u8.y = *y_plane;
> + yuv_u8.u = *u_plane;
> + yuv_u8.v = *v_plane;
> +
> + yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> + plane->base.base.color_range);
> + out_pixel += 1;
> + y_plane += step_y;
> + if ((i + subsampling_offset + 1) % subsampling == 0) {
> + u_plane += step_u;
> + v_plane += step_v;
> + }
> + }
> +}
> +
> +static void planar_yvu_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> +{
> + u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> + u8 *v_plane = packed_pixels_addr(plane->frame_info,
> + x_start / plane->frame_info->fb->format->hsub,
> + y_start / plane->frame_info->fb->format->vsub,
> + 1);
> + u8 *u_plane = packed_pixels_addr(plane->frame_info,
> + x_start / plane->frame_info->fb->format->hsub,
> + y_start / plane->frame_info->fb->format->vsub,
> + 2);
> + struct pixel_yuv_u8 yuv_u8;
> + int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> + int step_u = get_step_1x1(plane->frame_info->fb, direction, 1);
> + int step_v = get_step_1x1(plane->frame_info->fb, direction, 2);
> + int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> + int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> + x_start, y_start);
> +
> + for (int i = 0; i < count; i++) {
> + yuv_u8.y = *y_plane;
> + yuv_u8.u = *u_plane;
> + yuv_u8.v = *v_plane;
> +
> + yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> + plane->base.base.color_range);
> + out_pixel += 1;
> + y_plane += step_y;
> + if ((i + subsampling_offset + 1) % subsampling == 0) {
> + u_plane += step_u;
> + v_plane += step_v;
> + }
> + }
> +}
> +
> /*
> * The following functions take one argb_u16 pixel and convert it to a specific format. The
> * result is stored in @dst_pixels.
> @@ -344,6 +585,22 @@ pixel_read_line_t get_pixel_read_line_function(u32 format)
> return &XRGB16161616_read_line;
> case DRM_FORMAT_RGB565:
> return &RGB565_read_line;
> + case DRM_FORMAT_NV12:
> + case DRM_FORMAT_NV16:
> + case DRM_FORMAT_NV24:
> + return &semi_planar_yuv_read_line;
> + case DRM_FORMAT_NV21:
> + case DRM_FORMAT_NV61:
> + case DRM_FORMAT_NV42:
> + return &semi_planar_yvu_read_line;
> + case DRM_FORMAT_YUV420:
> + case DRM_FORMAT_YUV422:
> + case DRM_FORMAT_YUV444:
> + return &planar_yuv_read_line;
> + case DRM_FORMAT_YVU420:
> + case DRM_FORMAT_YVU422:
> + case DRM_FORMAT_YVU444:
> + return &planar_yvu_read_line;
> default:
> return (pixel_read_line_t)NULL;
> }
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> index 8d2bef95ff79..5a3a9e1328d8 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.h
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -9,4 +9,8 @@ pixel_read_line_t get_pixel_read_line_function(u32 format);
>
> pixel_write_t get_pixel_write_function(u32 format);
>
> +struct pixel_yuv_u8 {
> + u8 y, u, v;
> +};
> +
> #endif /* _VKMS_FORMATS_H_ */
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index 58c1c74742b5..427ca67c60ce 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -17,7 +17,19 @@ static const u32 vkms_formats[] = {
> DRM_FORMAT_XRGB8888,
> DRM_FORMAT_XRGB16161616,
> DRM_FORMAT_ARGB16161616,
> - DRM_FORMAT_RGB565
> + DRM_FORMAT_RGB565,
> + DRM_FORMAT_NV12,
> + DRM_FORMAT_NV16,
> + DRM_FORMAT_NV24,
> + DRM_FORMAT_NV21,
> + DRM_FORMAT_NV61,
> + DRM_FORMAT_NV42,
> + DRM_FORMAT_YUV420,
> + DRM_FORMAT_YUV422,
> + DRM_FORMAT_YUV444,
> + DRM_FORMAT_YVU420,
> + DRM_FORMAT_YVU422,
> + DRM_FORMAT_YVU444
> };
>
> static struct drm_plane_state *
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)


2024-02-23 11:55:07

by Maíra Canal

Subject: Re: [PATCH v2 5/9] drm/vkms: Re-introduce line-per-line composition algorithm

Hi Louis,

On 2/23/24 08:37, Louis Chauvet wrote:
> Re-introduce a line-by-line composition algorithm for each pixel format.
> This allows more performance by not requiring an indirection per pixel
> read. This patch is focussed on readability of the code.

s/focussed/focused

>
> Line-by-line composition was introduced by [1] but rewritten back to
> pixel-by-pixel algorithm in [2]. At this time, nobody noticed the impact
> on performance, and it was merged.
>
> This patch is almost a revert of [2], but in addition efforts have been
> made to increase readability and maintenability of the rotation handling.

s/maintenability/maintainability

> The blend function is now divided in two parts:
> - Transformation of coordinates from the output referential to the source
> referential
> - Line conversion and blending
>
> Most of the complexity of the rotation management is avoided by using
> drm_rect_* helpers. The remaning complexity is around the clipping, to

s/remaning/remaining

> avoid reading/writing oudside source/destination buffers.

s/oudside/outside

>
> The pixel conversion is now done line-by-line, so the read_pixel_t was
> replaced with read_pixel_line_t callback. This way the indirection is only
> required once per line and per plane, instead of once per pixel and per
> plane.
>
> The read_line_t callbacks are very similar for most pixel format, but it
> is required to avoid performance impact. Some helpers were created to
> avoid code repetition:
> - get_step_1x1: get the step in byte to reach next pixel block in a
> certain direction
> - *_to_argb_u16: helpers to perform colors conversion. They should be
> inlined by the compiler, and they are used to avoid repetition between
> multiple variants of the same format (argb/xrgb and maybe in the
> future for formats like bgr formats).
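For readers following the description above, here is a minimal userspace sketch of the idea: the byte step between pixels is computed once per line, then a tight loop converts `count` pixels. The names `struct frame`, `step_1x1()` and `xrgb8888_read_line()` are hypothetical stand-ins and not the functions from the patch; the buffer is assumed to use 1x1 blocks and little-endian XRGB8888 byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel types named in the patch. */
enum read_direction { READ_UP, READ_DOWN, READ_LEFT, READ_RIGHT };

struct argb_u16 { uint16_t a, r, g, b; };

struct frame {
	const uint8_t *data; /* first byte of the buffer */
	int pitch;           /* bytes per row */
	int cpp;             /* bytes per pixel (1x1 blocks only) */
};

/* Byte step to the next pixel in the given direction; 0 if invalid. */
static int step_1x1(const struct frame *f, enum read_direction dir)
{
	switch (dir) {
	case READ_RIGHT: return f->cpp;
	case READ_LEFT:  return -f->cpp;
	case READ_DOWN:  return f->pitch;
	case READ_UP:    return -f->pitch;
	}
	return 0;
}

/* Read `count` XRGB8888 pixels starting at (x, y), walking in `dir`. */
static void xrgb8888_read_line(const struct frame *f, int x, int y,
			       enum read_direction dir, int count,
			       struct argb_u16 *out)
{
	ptrdiff_t off = (ptrdiff_t)y * f->pitch + (ptrdiff_t)x * f->cpp;
	int step = step_1x1(f, dir); /* resolved once per line */

	assert(step != 0);
	for (int i = 0; i < count; i++) {
		const uint8_t *px = f->data + off;

		out[i].a = 0xffff; /* X channel is ignored, alpha is opaque */
		out[i].r = (uint16_t)(px[2] * 257);
		out[i].g = (uint16_t)(px[1] * 257);
		out[i].b = (uint16_t)(px[0] * 257);
		off += step;
	}
}
```

The caller is expected to have clipped `count` so that every visited pixel lies inside the buffer; the sketch does no bounds checking of its own.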
>
> This new algorithm was tested with:
> - kms_plane (for color conversions)
> - kms_rotation_crc (for rotations of planes)
> - kms_cursor_crc (for translations of planes)
> The performance gain was mesured with:
> - kms_fb_stress
>
> [1]: commit 8ba1648567e2 ("drm: vkms: Refactor the plane composer to accept
> new formats")
> https://lore.kernel.org/all/[email protected]/
> [2]: commit 322d716a3e8a ("drm/vkms: isolate pixel conversion
> functionality")
> https://lore.kernel.org/all/[email protected]/
>
> Signed-off-by: Louis Chauvet <[email protected]>
> ---

[...]

>
> +enum pixel_read_direction {
> + READ_UP,
> + READ_DOWN,
> + READ_LEFT,
> + READ_RIGHT
> +};
> +
> /**
> - * typedef pixel_read_t - These functions are used to read a pixel in the source frame,
> +<<<<<<< HEAD

This doesn't compile.

Best Regards,
- Maíra

> + * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
> * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
> *
> - * @src_pixels: Pointer to the pixel to read
> - * @out_pixel: Pointer to write the converted pixel
> + * @frame_info: Frame used as source for the pixel value
> + * @x_start: X (width) coordinate of the first pixel to read
> + * @y_start: Y (height) coordinate of the first pixel to read
> + * @direction: Direction in which the pixels are read, starting from (@x_start, @y_start)
> + * @count: Number of pixels to read
> + * @out_pixel: Pointer where to write the pixel values. @count pixels will be written.
> */
> -typedef void (*pixel_read_t)(u8 *src_pixels, struct pixel_argb_u16 *out_pixel);
> +typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
> + pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>
> /**
> * vkms_plane_state - Driver specific plane state
> @@ -88,7 +100,7 @@ typedef void (*pixel_read_t)(u8 *src_pixels, struct pixel_argb_u16 *out_pixel);
> struct vkms_plane_state {
> struct drm_shadow_plane_state base;
> struct vkms_frame_info *frame_info;
> - pixel_read_t pixel_read;
> + pixel_read_line_t pixel_read_line;
> };
>
> struct vkms_plane {
> @@ -193,7 +205,6 @@ int vkms_verify_crc_source(struct drm_crtc *crtc, const char *source_name,
> /* Composer Support */
> void vkms_composer_worker(struct work_struct *work);
> void vkms_set_composer(struct vkms_output *out, bool enabled);
> -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y);
> void vkms_writeback_row(struct vkms_writeback_job *wb, const struct line_buffer *src_buffer, int y);
>
> /* Writeback */
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 1f5aeba57ad6..46daea6d3ee9 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -11,21 +11,29 @@
>
> /**
> * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> - * in the first plane
> *
> * @frame_info: Buffer metadata
> * @x: The x coordinate of the wanted pixel in the buffer
> * @y: The y coordinate of the wanted pixel in the buffer
> + * @plane_index: The index of the plane to use
> *
> * The caller must be aware that this offset is not always a pointer to a pixel. If individual
> * pixel values are needed, they have to be extracted from the resulting block.
> */
> -static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> +static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int x, int y,
> + size_t plane_index)
> {
> struct drm_framebuffer *fb = frame_info->fb;
> -
> - return fb->offsets[0] + (y * fb->pitches[0])
> - + (x * fb->format->cpp[0]);
> + const struct drm_format_info *format = frame_info->fb->format;
> + /* Directly using x and y to multiply pitches and format->cpp is not sufficient because
> + * in some formats a block can represent multiple pixels.
> + *
> + * Dividing x by the block width and y by the block height gives the correct offset of
> + * the block containing the pixel.
> + */
> + return fb->offsets[plane_index] +
> + (y / drm_format_info_block_height(format, plane_index)) * fb->pitches[plane_index] +
> + (x / drm_format_info_block_width(format, plane_index)) * format->char_per_block[plane_index];
> }
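As an illustration of the block arithmetic described in the comment, here is a small userspace sketch. `struct plane_layout` and `block_offset()` are hypothetical names; the fields only mirror the per-plane quantities the patch reads from the framebuffer and its format description. It divides x by the block width and y by the block height before scaling by the bytes per block and the pitch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified description of one plane of a framebuffer. */
struct plane_layout {
	size_t offset;           /* byte offset of the plane in the buffer */
	size_t pitch;            /* bytes per row of blocks */
	unsigned block_w;        /* pixels per block, horizontally */
	unsigned block_h;        /* pixels per block, vertically */
	unsigned char_per_block; /* bytes per block */
};

/* Byte offset of the block that contains pixel (x, y). */
static size_t block_offset(const struct plane_layout *p, unsigned x, unsigned y)
{
	assert(p->block_w > 0 && p->block_h > 0);
	return p->offset
	       + (size_t)(y / p->block_h) * p->pitch
	       + (size_t)(x / p->block_w) * p->char_per_block;
}
```

With 1x1 blocks this reduces to the usual `offset + y * pitch + x * cpp`. With a 2x1 block, two horizontally adjacent pixels map to the same offset, which is why the caller still has to extract the pixel from the block.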
>
> /**
> @@ -35,44 +43,56 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
> * @frame_info: Buffer metadata
> * @x: The x(width) coordinate inside the plane
> * @y: The y(height) coordinate inside the plane
> + * @plane_index: The index of the plane
> *
> - * Takes the information stored in the frame_info, a pair of coordinates, and
> - * returns the address of the first color channel.
> - * This function assumes the channels are packed together, i.e. a color channel
> - * comes immediately after another in the memory. And therefore, this function
> - * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> + * Takes the information stored in the frame_info, a pair of coordinates, and returns the address
> + * of the block containing this pixel.
> + * The caller must be aware that this pointer is sometimes not directly a pixel, it needs some
> + * additional work to extract pixel color from this block.
> */
> static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> - int x, int y)
> + int x, int y, size_t plane_index)
> {
> - size_t offset = pixel_offset(frame_info, x, y);
> -
> - return (u8 *)frame_info->map[0].vaddr + offset;
> + return (u8 *)frame_info->map[0].vaddr + packed_pixels_offset(frame_info, x, y, plane_index);
> }
>
> -static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> +/**
> + * get_step_1x1() - Common helper to compute the correct step value between each pixel to read in a
> + * certain direction.
> + * This must be used only with formats where blockh == blockw == 1.
> + * When direction is not a valid pixel_read_direction, the returned step is 0, so you
> + * must not rely on this result to create a loop variant.
> + *
> + * @fb: Framebuffer to iterate on
> + * @direction: Direction of the reading
> + * @plane_index: Index of the plane to read
> + */
> +static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction direction,
> + int plane_index)
> {
> - int x_src = frame_info->src.x1 >> 16;
> - int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
> -
> - return packed_pixels_addr(frame_info, x_src, y_src);
> + switch (direction) {
> + default:
> + DRM_ERROR("Invalid direction for pixel reading: %d\n", direction);
> + return 0;
> + case READ_RIGHT:
> + return fb->format->char_per_block[plane_index];
> + case READ_LEFT:
> + return -fb->format->char_per_block[plane_index];
> + case READ_DOWN:
> + return (int)fb->pitches[plane_index];
> + case READ_UP:
> + return -(int)fb->pitches[plane_index];
> + }
> }
>
> -static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
> -{
> - if (frame_info->rotation & (DRM_MODE_REFLECT_X | DRM_MODE_ROTATE_270))
> - return limit - x - 1;
> - return x;
> -}
>
> /*
> - * The following functions take pixel data from the buffer and convert them to the format
> + * The following functions take pixel data (a, r, g, b, pixel, ...), convert them to the format
> * ARGB16161616 in out_pixel.
> *
> - * They are used in the `vkms_compose_row` function to handle multiple formats.
> + * They are used in the `read_line`s functions to avoid duplicate work for some pixel formats.
> */
>
> -static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> +static void ARGB8888_to_argb_u16(struct pixel_argb_u16 *out_pixel, int a, int r, int g, int b)
> {
> /*
> * The 257 is the "conversion ratio". This number is obtained by the
> @@ -80,48 +100,26 @@ static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixe
> * the best color value in a pixel format with more possibilities.
> * A similar idea applies to others RGB color conversions.
> */
> - out_pixel->a = (u16)src_pixels[3] * 257;
> - out_pixel->r = (u16)src_pixels[2] * 257;
> - out_pixel->g = (u16)src_pixels[1] * 257;
> - out_pixel->b = (u16)src_pixels[0] * 257;
> -}
> -
> -static void XRGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> -{
> - out_pixel->a = (u16)0xffff;
> - out_pixel->r = (u16)src_pixels[2] * 257;
> - out_pixel->g = (u16)src_pixels[1] * 257;
> - out_pixel->b = (u16)src_pixels[0] * 257;
> + out_pixel->a = (u16)a * 257;
> + out_pixel->r = (u16)r * 257;
> + out_pixel->g = (u16)g * 257;
> + out_pixel->b = (u16)b * 257;
> }
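A short worked example of the "conversion ratio" mentioned in the comment: since 65535 / 255 = 257 exactly, multiplying an 8-bit channel by 257 maps 0 to 0 and 255 to 65535, and is the same as repeating the byte in both halves of the 16-bit value. The helper name `expand_u8_to_u16()` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Expand an 8-bit channel to 16 bits: v * 257 == (v << 8) | v. */
static uint16_t expand_u8_to_u16(unsigned v)
{
	assert(v <= 0xff);
	return (uint16_t)(v * 257u);
}
```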
>
> -static void ARGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> +static void ARGB16161616_to_argb_u16(struct pixel_argb_u16 *out_pixel, int a, int r, int g, int b)
> {
> - u16 *pixels = (u16 *)src_pixels;
> -
> - out_pixel->a = le16_to_cpu(pixels[3]);
> - out_pixel->r = le16_to_cpu(pixels[2]);
> - out_pixel->g = le16_to_cpu(pixels[1]);
> - out_pixel->b = le16_to_cpu(pixels[0]);
> -}
> -
> -static void XRGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> -{
> - u16 *pixels = (u16 *)src_pixels;
> -
> - out_pixel->a = (u16)0xffff;
> - out_pixel->r = le16_to_cpu(pixels[2]);
> - out_pixel->g = le16_to_cpu(pixels[1]);
> - out_pixel->b = le16_to_cpu(pixels[0]);
> + out_pixel->a = le16_to_cpu(a);
> + out_pixel->r = le16_to_cpu(r);
> + out_pixel->g = le16_to_cpu(g);
> + out_pixel->b = le16_to_cpu(b);
> }
>
> -static void RGB565_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> +static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixel)
> {
> - u16 *pixels = (u16 *)src_pixels;
> -
> s64 fp_rb_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(31));
> s64 fp_g_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(63));
>
> - u16 rgb_565 = le16_to_cpu(*pixels);
> + u16 rgb_565 = le16_to_cpu(*pixel);
> s64 fp_r = drm_int2fixp((rgb_565 >> 11) & 0x1f);
> s64 fp_g = drm_int2fixp((rgb_565 >> 5) & 0x3f);
> s64 fp_b = drm_int2fixp(rgb_565 & 0x1f);
> @@ -132,34 +130,105 @@ static void RGB565_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
> }
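The RGB565 case cannot use a single integer ratio, because 65535 / 31 and 65535 / 63 are not integers. The sketch below is a userspace approximation that uses rounded integer division instead of the drm_fixp_* helpers, so individual results may differ from the kernel code in the last bit. The names are hypothetical, alpha is assumed to be fully opaque, and the input is assumed to be already in CPU byte order.

```c
#include <assert.h>
#include <stdint.h>

struct argb_u16 { uint16_t a, r, g, b; };

/* Scale v in [0, max] to [0, 65535], rounding to nearest. */
static uint16_t scale_to_u16(uint32_t v, uint32_t max)
{
	assert(max > 0 && v <= max);
	return (uint16_t)((v * 65535u + max / 2) / max);
}

/* Split a 16-bit RGB565 value and expand each channel to 16 bits. */
static struct argb_u16 rgb565_to_argb_u16(uint16_t rgb_565)
{
	struct argb_u16 out;

	out.a = 0xffff; /* assumption: RGB565 has no alpha, treat as opaque */
	out.r = scale_to_u16((rgb_565 >> 11) & 0x1f, 31);
	out.g = scale_to_u16((rgb_565 >> 5) & 0x3f, 63);
	out.b = scale_to_u16(rgb_565 & 0x1f, 31);
	return out;
}
```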
>
> -/**
> - * vkms_compose_row - compose a single row of a plane
> - * @stage_buffer: output line with the composed pixels
> - * @plane: state of the plane that is being composed
> - * @y: y coordinate of the row
> +/*
> + * The following functions are the read_line functions for each pixel format supported by VKMS.
> *
> - * This function composes a single row of a plane. It gets the source pixels
> - * through the y coordinate (see get_packed_src_addr()) and goes linearly
> - * through the source pixel, reading the pixels and converting it to
> - * ARGB16161616 (see the pixel_read() callback). For rotate-90 and rotate-270,
> - * the source pixels are not traversed linearly. The source pixels are queried
> - * on each iteration in order to traverse the pixels vertically.
> + * They read a line starting at the point @x_start,@y_start following the @direction. The result
> + * is stored in @out_pixel and in the format ARGB16161616.
> + *
> + * These functions are very similar, but the duplication is required for performance reasons. In
> + * the past, some experiments were done, and a generic loop reduced performance significantly [1].
> + *
> + * [1]: https://lore.kernel.org/dri-devel/[email protected]/
> */
> -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y)
> +
> +static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> +{
> + u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +
> + int step = get_step_1x1(frame_info->fb, direction, 0);
> +
> + while (count) {
> + u8 *px = (u8 *)src_pixels;
> +
> + ARGB8888_to_argb_u16(out_pixel, px[3], px[2], px[1], px[0]);
> + out_pixel += 1;
> + src_pixels += step;
> + count--;
> + }
> +}
> +
> +static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> +{
> + u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +
> + int step = get_step_1x1(frame_info->fb, direction, 0);
> +
> + while (count) {
> + u8 *px = (u8 *)src_pixels;
> +
> + ARGB8888_to_argb_u16(out_pixel, 255, px[2], px[1], px[0]);
> + out_pixel += 1;
> + src_pixels += step;
> + count--;
> + }
> +}
> +
> +static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> +{
> + u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +
> + int step = get_step_1x1(frame_info->fb, direction, 0);
> +
> + while (count) {
> + u16 *px = (u16 *)src_pixels;
> +
> + ARGB16161616_to_argb_u16(out_pixel, px[3], px[2], px[1], px[0]);
> + out_pixel += 1;
> + src_pixels += step;
> + count--;
> + }
> +}
> +
> +static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> +{
> + u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +
> + int step = get_step_1x1(frame_info->fb, direction, 0);
> +
> + while (count) {
> + u16 *px = (u16 *)src_pixels;
> +
> + ARGB16161616_to_argb_u16(out_pixel, 0xFFFF, px[2], px[1], px[0]);
> + out_pixel += 1;
> + src_pixels += step;
> + count--;
> + }
> +}
> +
> +static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> {
> - struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> - struct vkms_frame_info *frame_info = plane->frame_info;
> - u8 *src_pixels = get_packed_src_addr(frame_info, y);
> - int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
> + u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
>
> - for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
> - int x_pos = get_x_position(frame_info, limit, x);
> + int step = get_step_1x1(frame_info->fb, direction, 0);
>
> - if (drm_rotation_90_or_270(frame_info->rotation))
> - src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
> - + frame_info->fb->format->cpp[0] * y;
> + while (count) {
> + u16 *px = (u16 *)src_pixels;
>
> - plane->pixel_read(src_pixels, &out_pixels[x_pos]);
> + RGB565_to_argb_u16(out_pixel, px);
> + out_pixel += 1;
> + src_pixels += step;
> + count--;
> }
> }
>
> @@ -247,7 +316,7 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
> {
> struct vkms_frame_info *frame_info = &wb->wb_frame_info;
> int x_dst = frame_info->dst.x1;
> - u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> + u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y, 0);
> struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels);
>
> @@ -256,27 +325,27 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
> }
>
> /**
> - * Retrieve the correct read_pixel function for a specific format.
> + * Retrieve the correct read_line function for a specific format.
> * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
> * pointer is valid before using it in a vkms_plane_state.
> *
> * @format: 4cc of the format
> */
> -pixel_read_t get_pixel_read_function(u32 format)
> +pixel_read_line_t get_pixel_read_line_function(u32 format)
> {
> switch (format) {
> case DRM_FORMAT_ARGB8888:
> - return &ARGB8888_to_argb_u16;
> + return &ARGB8888_read_line;
> case DRM_FORMAT_XRGB8888:
> - return &XRGB8888_to_argb_u16;
> + return &XRGB8888_read_line;
> case DRM_FORMAT_ARGB16161616:
> - return &ARGB16161616_to_argb_u16;
> + return &ARGB16161616_read_line;
> case DRM_FORMAT_XRGB16161616:
> - return &XRGB16161616_to_argb_u16;
> + return &XRGB16161616_read_line;
> case DRM_FORMAT_RGB565:
> - return &RGB565_to_argb_u16;
> + return &RGB565_read_line;
> default:
> - return (pixel_read_t)NULL;
> + return (pixel_read_line_t)NULL;
> }
> }
>
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> index 3ecea4563254..8d2bef95ff79 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.h
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -5,7 +5,7 @@
>
> #include "vkms_drv.h"
>
> -pixel_read_t get_pixel_read_function(u32 format);
> +pixel_read_line_t get_pixel_read_line_function(u32 format);
>
> pixel_write_t get_pixel_write_function(u32 format);
>
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index f68b1b03d632..58c1c74742b5 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -106,9 +106,9 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> return;
>
> fmt = fb->format->format;
> - pixel_read_t pixel_read = get_pixel_read_function(fmt);
> + pixel_read_line_t pixel_read_line = get_pixel_read_line_function(fmt);
>
> - if (!pixel_read) {
> + if (!pixel_read_line) {
> DRM_WARN("Pixel format is not supported by VKMS planes. State is unchanged\n");
> return;
> }
> @@ -128,10 +128,8 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> DRM_MODE_REFLECT_X |
> DRM_MODE_REFLECT_Y);
>
> - drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
> - drm_rect_height(&frame_info->rotated), frame_info->rotation);
>
> - vkms_plane_state->pixel_read = pixel_read;
> + vkms_plane_state->pixel_read_line = pixel_read_line;
> }
>
> static int vkms_plane_atomic_check(struct drm_plane *plane,
>

2024-02-23 11:58:10

by Maíra Canal

Subject: Re: [PATCH v2 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading

Hi Louis,

On 2/23/24 08:37, Louis Chauvet wrote:
> This patchset is the second version of [1]. It is almost a complete
> rewrite to use a line-by-line algorithm for the composition.
> It can be divided into three parts:
> - PATCH 1 to 4: no functional change is intended, only some formatting and
> documenting
> (PATCH 2 is taken from [2])
> - PATCH 5: main patch for this series, it reintroduces the
> line-by-line algorithm
> - PATCH 6 to 9: taken from Arthur's series [2], sometimes adapted
> to use the line-by-line algorithm.
>
> The PATCH 5 aims to restore the line-by-line pixel reading algorithm. It
> was introduced in 8ba1648567e2 ("drm: vkms: Refactor the plane composer to
> accept new formats") but removed in 322d716a3e8a ("drm/vkms: isolate pixel
> conversion functionality") in an over-simplification effort.
> At this time, nobody noticed the performance impact of this commit. After
> the first iteration of my series, people noticed a performance impact, and it
> was the case. Pekka suggested to reimplement the line-by-line algorithm.
>
> Experiments on my side showed a great improvement for the line-by-line
> algorithm, and the performance is the same as with the original line-by-line
> algorithm. I focused my effort on making the code work for all the
> rotations and translations. Using the drm_rect_* helpers avoids
> reimplementing existing logic.
>
> The only "complex" part remaining is the clipping of the coordinates to
> avoid reading/writing outside of src/dst. Thus I added a lot of comments
> to help anyone who wants to add features later (framebuffer resizing
> for example).
>
> The YUV part is not mandatory for this series, but as my first effort was
> to help the integration of YUV, I decided to rebase Arthur's series on
> mine to help. I took [3], [4], [5] and [6] and adapted them to use the
> line-by-line reading. If I did something wrong here, please let me
> know.
>
> My series was mainly tested with:
> - kms_plane (for color conversions)
> - kms_rotation_crc (for rotations of planes)
> - kms_cursor_crc (for translations)
> The benchmark used to measure the improvement was done with:
> - kms_fb_stress
>
> [1]: https://lore.kernel.org/r/[email protected]
> [2]: https://lore.kernel.org/all/[email protected]/
> [3]: https://lore.kernel.org/all/[email protected]/
> [4]: https://lore.kernel.org/all/[email protected]/
> [5]: https://lore.kernel.org/all/[email protected]/
> [6]: https://lore.kernel.org/all/[email protected]/
>
> To: Rodrigo Siqueira <[email protected]>
> To: Melissa Wen <[email protected]>
> To: Maíra Canal <[email protected]>
> To: Haneen Mohammed <[email protected]>
> To: Daniel Vetter <[email protected]>
> To: Maarten Lankhorst <[email protected]>
> To: Maxime Ripard <[email protected]>
> To: Thomas Zimmermann <[email protected]>
> To: David Airlie <[email protected]>
> To: [email protected]
> To: Jonathan Corbet <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Louis Chauvet <[email protected]>
>
> Note: after my changes, these tests seem to pass, so [7] may need
> updating (I did not check, it was maybe already the case):
> - kms_cursor_legacy@flip-vs-cursor-atomic
> - kms_pipe_crc_basic@nonblocking-crc
> - kms_pipe_crc_basic@nonblocking-crc-frame-sequence
> - kms_writeback@writeback-pixel-formats
> - kms_writeback@writeback-invalid-parameters
> - kms_flip@flip-vs-absolute-wf_vblank-interruptible
> And these tests pass; I did not investigate why the runners fail:
> - kms_flip@flip-vs-expired-vblank-interruptible
> - kms_flip@flip-vs-expired-vblank
> - kms_flip@plain-flip-fb-recreate
> - kms_flip@plain-flip-fb-recreate-interruptible
> - kms_flip@plain-flip-ts-check-interruptible
> - kms_cursor_legacy@cursorA-vs-flipA-toggle
> - kms_pipe_crc_basic@nonblocking-crc
> - kms_prop_blob@invalid-get-prop
> - kms_flip@flip-vs-absolute-wf_vblank-interruptible
> - kms_invalid_mode@zero-hdisplay
> - kms_invalid_mode@bad-vtotal
> - kms_cursor_crc.* (everything is SUCCEED or SKIP, but no fails)

This is great news! Could you just adjust the series to fix the
compile errors?

Best Regards,
- Maíra

>
> [7]: https://lore.kernel.org/all/[email protected]/
>
> Changes in v2:
> - Rebased the series on top of drm-misc/drm-misc-net
> - Extract the typedef for pixel_read/pixel_write
> - Introduce the line-by-line algorithm per pixel format
> - Add some documentation for existing and new code
> - Port the series [1] to use line-by-line algorithm
> - Link to v1: https://lore.kernel.org/r/[email protected]
>
> ---
> Arthur Grillo (5):
> drm/vkms: Use drm_frame directly
> drm/vkms: Add YUV support
> drm/vkms: Add range and encoding properties to pixel_read function
> drm/vkms: Drop YUV formats TODO
> drm/vkms: Create KUnit tests for YUV conversions
>
> Louis Chauvet (4):
> drm/vkms: Code formatting
> drm/vkms: write/update the documentation for pixel conversion and pixel write functions
> drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions
> drm/vkms: Re-introduce line-per-line composition algorithm
>
> Documentation/gpu/vkms.rst | 3 +-
> drivers/gpu/drm/vkms/Makefile | 1 +
> drivers/gpu/drm/vkms/tests/.kunitconfig | 4 +
> drivers/gpu/drm/vkms/tests/Makefile | 3 +
> drivers/gpu/drm/vkms/tests/vkms_format_test.c | 155 +++++++
> drivers/gpu/drm/vkms/vkms_composer.c | 233 ++++++++---
> drivers/gpu/drm/vkms/vkms_crtc.c | 6 +-
> drivers/gpu/drm/vkms/vkms_drv.c | 3 +-
> drivers/gpu/drm/vkms/vkms_drv.h | 56 ++-
> drivers/gpu/drm/vkms/vkms_formats.c | 565 +++++++++++++++++++++-----
> drivers/gpu/drm/vkms/vkms_formats.h | 13 +-
> drivers/gpu/drm/vkms/vkms_plane.c | 50 ++-
> drivers/gpu/drm/vkms/vkms_writeback.c | 14 +-
> 13 files changed, 916 insertions(+), 190 deletions(-)
> ---
> base-commit: aa1267e673fe5307cf00d02add4017d2878598b6
> change-id: 20240201-yuv-1337d90d9576
>
> Best regards,

2024-02-26 11:37:55

by Pekka Paalanen

Subject: Re: [PATCH v2 4/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions

On Fri, 23 Feb 2024 12:37:24 +0100
Louis Chauvet <[email protected]> wrote:

> Introduce two typedefs: pixel_read_t and pixel_write_t. It allows the
> compiler to check if the passed functions take the correct arguments.
> Such typedefs will help ensure consistency across the code base in
> case these prototypes are updated.
>
> Introduce a check around the get_pixel_*_functions to avoid using a
> NULL pointer as a function.
>
> Add documentation for those typedefs.
>
> Signed-off-by: Louis Chauvet <[email protected]>
> ---
> drivers/gpu/drm/vkms/vkms_drv.h | 23 +++++++++++++++++++++--
> drivers/gpu/drm/vkms/vkms_formats.c | 8 ++++----
> drivers/gpu/drm/vkms/vkms_formats.h | 4 ++--
> drivers/gpu/drm/vkms/vkms_plane.c | 9 ++++++++-
> drivers/gpu/drm/vkms/vkms_writeback.c | 9 ++++++++-
> 5 files changed, 43 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index 18086423a3a7..886c885c8cf5 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -53,12 +53,31 @@ struct line_buffer {
> struct pixel_argb_u16 *pixels;
> };
>
> +/**
> + * typedef pixel_write_t - These functions are used to read a pixel from a
> + * `struct pixel_argb_u16*`, convert it in a specific format and write it in the @dst_pixels
> + * buffer.
> + *
> + * @dst_pixel: destination address to write the pixel
> + * @in_pixel: pixel to write
> + */
> +typedef void (*pixel_write_t)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel);

There are some inconsistencies in pixel_write_t and pixel_read_t. At
this point of the series they still operate on a single pixel, but you
use dst_pixels and src_pixels, plural. Yet the documentation correctly
talks about processing a single pixel.

I would also expect the source to be always const, but that's a whole
other patch to change.
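To make the two remarks concrete, here is a userspace sketch of what singular parameter names and a const-qualified source could look like. The typedef names `pixel_read_fn` and `pixel_write_fn` and the two XRGB8888 helpers are hypothetical, chosen only for illustration; they are not what the series defines.

```c
#include <assert.h>
#include <stdint.h>

struct pixel_argb_u16 { uint16_t a, r, g, b; };

/* Hypothetical spellings: singular names, const on the side that is only read. */
typedef void (*pixel_read_fn)(const uint8_t *src_pixel, struct pixel_argb_u16 *out_pixel);
typedef void (*pixel_write_fn)(uint8_t *dst_pixel, const struct pixel_argb_u16 *in_pixel);

/* Read one little-endian XRGB8888 pixel; the X byte is ignored. */
static void xrgb8888_read(const uint8_t *src_pixel, struct pixel_argb_u16 *out_pixel)
{
	assert(src_pixel && out_pixel);
	out_pixel->a = 0xffff;
	out_pixel->r = (uint16_t)(src_pixel[2] * 257);
	out_pixel->g = (uint16_t)(src_pixel[1] * 257);
	out_pixel->b = (uint16_t)(src_pixel[0] * 257);
}

/* Write one XRGB8888 pixel, rounding each channel to the nearest 8-bit value. */
static void xrgb8888_write(uint8_t *dst_pixel, const struct pixel_argb_u16 *in_pixel)
{
	assert(dst_pixel && in_pixel);
	dst_pixel[3] = 0xff;
	dst_pixel[2] = (uint8_t)((in_pixel->r + 128) / 257);
	dst_pixel[1] = (uint8_t)((in_pixel->g + 128) / 257);
	dst_pixel[0] = (uint8_t)((in_pixel->b + 128) / 257);
}
```

With the const qualifiers in place, a callback that accidentally writes through its source pointer is rejected by the compiler.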

> +
> struct vkms_writeback_job {
> struct iosys_map data[DRM_FORMAT_MAX_PLANES];
> struct vkms_frame_info wb_frame_info;
> - void (*pixel_write)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel);
> + pixel_write_t pixel_write;
> };
>
> +/**
> + * typedef pixel_read_t - These functions are used to read a pixel in the source frame,
> + * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
> + *
> + * @src_pixels: Pointer to the pixel to read
> + * @out_pixel: Pointer to write the converted pixel
> + */
> +typedef void (*pixel_read_t)(u8 *src_pixels, struct pixel_argb_u16 *out_pixel);
> +
> /**
> * vkms_plane_state - Driver specific plane state
> * @base: base plane state
> @@ -69,7 +88,7 @@ struct vkms_writeback_job {
> struct vkms_plane_state {
> struct drm_shadow_plane_state base;
> struct vkms_frame_info *frame_info;
> - void (*pixel_read)(u8 *src_buffer, struct pixel_argb_u16 *out_pixel);
> + pixel_read_t pixel_read;
> };
>
> struct vkms_plane {
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index cb7a49b7c8e7..1f5aeba57ad6 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -262,7 +262,7 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
> *
> * @format: 4cc of the format
> */
> -void *get_pixel_conversion_function(u32 format)
> +pixel_read_t get_pixel_read_function(u32 format)
> {
> switch (format) {
> case DRM_FORMAT_ARGB8888:
> @@ -276,7 +276,7 @@ void *get_pixel_conversion_function(u32 format)
> case DRM_FORMAT_RGB565:
> return &RGB565_to_argb_u16;
> default:
> - return NULL;
> + return (pixel_read_t)NULL;
> }
> }
>
> @@ -287,7 +287,7 @@ void *get_pixel_conversion_function(u32 format)
> *
> * @format: 4cc of the format
> */
> -void *get_pixel_write_function(u32 format)
> +pixel_write_t get_pixel_write_function(u32 format)
> {
> switch (format) {
> case DRM_FORMAT_ARGB8888:
> @@ -301,6 +301,6 @@ void *get_pixel_write_function(u32 format)
> case DRM_FORMAT_RGB565:
> return &argb_u16_to_RGB565;
> default:
> - return NULL;
> + return (pixel_write_t)NULL;
> }
> }
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> index cf59c2ed8e9a..3ecea4563254 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.h
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -5,8 +5,8 @@
>
> #include "vkms_drv.h"
>
> -void *get_pixel_conversion_function(u32 format);
> +pixel_read_t get_pixel_read_function(u32 format);
>
> -void *get_pixel_write_function(u32 format);
> +pixel_write_t get_pixel_write_function(u32 format);
>
> #endif /* _VKMS_FORMATS_H_ */
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index d5203f531d96..f68b1b03d632 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -106,6 +106,13 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> return;
>
> fmt = fb->format->format;
> + pixel_read_t pixel_read = get_pixel_read_function(fmt);
> +
> + if (!pixel_read) {
> + DRM_WARN("Pixel format is not supported by VKMS planes. State is unchanged\n");

DRM_WARN() is the kernel equivalent to userspace assert(), right?
In that failing the check means an internal invariant was violated,
which means a code bug in kernel?

Maybe this could be more specific about what invariant was violated?
E.g. atomic check should have rejected this attempt already.
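A userspace sketch of the split being suggested here, under the assumption that the check phase is the only place allowed to fail: the check rejects unsupported formats, so a missing reader in the update phase can only mean that an internal invariant was broken. All names (`enum fmt`, `get_read_line()`, `plane_check()`, `plane_update()`) are hypothetical and the format identifiers are not real DRM fourcc values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical format identifiers, not real DRM fourcc values. */
enum fmt { FMT_XRGB8888, FMT_RGB565, FMT_UNSUPPORTED };

typedef int (*read_line_fn)(void);

/* Placeholder readers: they only return the bits per pixel. */
static int read_xrgb8888(void) { return 32; }
static int read_rgb565(void) { return 16; }

/* Returns NULL for formats without a reader. */
static read_line_fn get_read_line(enum fmt f)
{
	switch (f) {
	case FMT_XRGB8888: return read_xrgb8888;
	case FMT_RGB565:   return read_rgb565;
	default:           return NULL;
	}
}

/* Check phase: may fail, so unsupported formats are rejected here. */
static int plane_check(enum fmt f)
{
	return get_read_line(f) ? 0 : -EINVAL;
}

/* Update phase: cannot fail; a NULL reader means the check was bypassed. */
static read_line_fn plane_update(enum fmt f)
{
	read_line_fn fn = get_read_line(f);

	assert(fn && "format should have been rejected by plane_check()");
	return fn;
}
```

In this arrangement the message attached to the update-phase failure can name the violated invariant directly, instead of only saying that the format is unsupported.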


Thanks,
pq

> + return;
> + }
> +
> vkms_plane_state = to_vkms_plane_state(new_state);
> shadow_plane_state = &vkms_plane_state->base;
>
> @@ -124,7 +131,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
> drm_rect_height(&frame_info->rotated), frame_info->rotation);
>
> - vkms_plane_state->pixel_read = get_pixel_conversion_function(fmt);
> + vkms_plane_state->pixel_read = pixel_read;
> }
>
> static int vkms_plane_atomic_check(struct drm_plane *plane,
> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> index c8582df1f739..c92b9f06c4a4 100644
> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> @@ -140,6 +140,13 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
> if (!conn_state)
> return;
>
> + pixel_write_t pixel_write = get_pixel_write_function(wb_format);
> +
> + if (!pixel_write) {
> + DRM_WARN("Pixel format is not supported by VKMS writeback. State is unchanged\n");
> + return;
> + }
> +
> vkms_set_composer(&vkmsdev->output, true);
>
> active_wb = conn_state->writeback_job->priv;
> @@ -150,7 +157,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
> crtc_state->wb_pending = true;
> spin_unlock_irq(&output->composer_lock);
> drm_writeback_queue_job(wb_conn, connector_state);
> - active_wb->pixel_write = get_pixel_write_function(wb_format);
> + active_wb->pixel_write = pixel_write;
> drm_rect_init(&wb_frame_info->src, 0, 0, crtc_width, crtc_height);
> drm_rect_init(&wb_frame_info->dst, 0, 0, crtc_width, crtc_height);
> }
>



2024-02-26 11:38:12

by Pekka Paalanen

Subject: Re: [PATCH v2 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions

On Fri, 23 Feb 2024 12:37:23 +0100
Louis Chauvet <[email protected]> wrote:

> Add some documentation on pixel conversion functions.
> Update of outdated comments for pixel_write functions.
>
> Signed-off-by: Louis Chauvet <[email protected]>
> ---
> drivers/gpu/drm/vkms/vkms_composer.c | 4 +++
> drivers/gpu/drm/vkms/vkms_drv.h | 13 ++++++++
> drivers/gpu/drm/vkms/vkms_formats.c | 58 ++++++++++++++++++++++++++++++------
> 3 files changed, 66 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index c6d9b4a65809..5b341222d239 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -189,6 +189,10 @@ static void blend(struct vkms_writeback_job *wb,
>
> size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
>
> + /*
> + * The planes are composed line-by-line. It is a necessary complexity to avoid poor
> + * blending performance.
> + */
> for (size_t y = 0; y < crtc_y_limit; y++) {
> fill_background(&background_color, output_buffer);
>
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index b4b357447292..18086423a3a7 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -25,6 +25,17 @@
>
> #define VKMS_LUT_SIZE 256
>
> +/**
> + * struct vkms_frame_info - structure to store the state of a frame
> + *
> + * @fb: backing drm framebuffer
> + * @src: source rectangle of this frame in the source framebuffer
> + * @dst: destination rectangle in the crtc buffer
> + * @map: see drm_shadow_plane_state@data
> + * @rotation: rotation applied to the source.
> + *
> + * @src and @dst should have the same size modulo the rotation.
> + */
> struct vkms_frame_info {
> struct drm_framebuffer *fb;
> struct drm_rect src, dst;
> @@ -52,6 +63,8 @@ struct vkms_writeback_job {
> * vkms_plane_state - Driver specific plane state
> * @base: base plane state
> * @frame_info: data required for composing computation
> + * @pixel_read: function to read a pixel in this plane. The creator of a vkms_plane_state must
> + * ensure that this pointer is valid
> */
> struct vkms_plane_state {
> struct drm_shadow_plane_state base;
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 172830a3936a..cb7a49b7c8e7 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -9,6 +9,17 @@
>
> #include "vkms_formats.h"
>
> +/**
> + * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> + * in the first plane
> + *
> + * @frame_info: Buffer metadata
> + * @x: The x coordinate of the wanted pixel in the buffer
> + * @y: The y coordinate of the wanted pixel in the buffer
> + *
> + * The caller must be aware that this offset is not always a pointer to a pixel. If individual
> + * pixel values are needed, they have to be extracted from the resulting block.

Just wondering how the caller will be able to extract the right pixel
from the block without re-using the knowledge already used in this
function. I'd also expect the function to round down x,y to be
divisible by block dimensions, but that's not visible in this email.
Then the caller needs the remainder from the round-down, too?
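
A minimal sketch of the round-down and remainder being asked about,
assuming a simplified layout description. The struct plane_layout,
struct block_pos and block_position() names are hypothetical and are
not part of the patch; they only show what a caller would need back
from such a helper to pick one pixel out of a multi-pixel block.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-plane data found in drm_framebuffer
 * and drm_format_info. Not the kernel structures. */
struct plane_layout {
	size_t offset;               /* byte offset of the plane in the buffer */
	size_t pitch;                /* bytes per row of blocks */
	unsigned int block_w;        /* pixels per block, horizontally */
	unsigned int block_h;        /* pixels per block, vertically */
	unsigned int char_per_block; /* bytes per block */
};

/* Result: where the block starts, and where the pixel sits inside it. */
struct block_pos {
	size_t offset;      /* byte offset of the block containing (x, y) */
	unsigned int rem_x; /* x position of the pixel inside the block */
	unsigned int rem_y; /* y position of the pixel inside the block */
};

static struct block_pos block_position(const struct plane_layout *l,
				       unsigned int x, unsigned int y)
{
	struct block_pos pos = {
		/* Integer division rounds x/y down to the block grid. */
		.offset = l->offset +
			  (size_t)(y / l->block_h) * l->pitch +
			  (size_t)(x / l->block_w) * l->char_per_block,
		/* The remainders are what the caller needs for extraction. */
		.rem_x = x % l->block_w,
		.rem_y = y % l->block_h,
	};

	return pos;
}
```

With an 8x1 block of one byte (a 1 bpp layout, for instance), pixel
x=19 lands in the third block of its row with rem_x=3, so the caller
knows which bit of that byte to read without redoing the division.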

> + */
> static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> {
> struct drm_framebuffer *fb = frame_info->fb;
> @@ -17,12 +28,13 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
> + (x * fb->format->cpp[0]);
> }
>
> -/*
> - * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
> +/**
> + * packed_pixels_addr() - Get the pointer to the block containing the pixel at the given
> + * coordinates
> *
> * @frame_info: Buffer metadata
> - * @x: The x(width) coordinate of the 2D buffer
> - * @y: The y(Heigth) coordinate of the 2D buffer
> + * @x: The x(width) coordinate inside the plane
> + * @y: The y(height) coordinate inside the plane
> *
> * Takes the information stored in the frame_info, a pair of coordinates, and
> * returns the address of the first color channel.
> @@ -53,6 +65,13 @@ static int get_x_position(const struct vkms_frame_info *frame_info, int limit, i
> return x;
> }
>
> +/*
> + * The following functions take pixel data from the buffer and convert them to the format
> + * ARGB16161616 in out_pixel.
> + *
> + * They are used in the `vkms_compose_row` function to handle multiple formats.
> + */
> +
> static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> {
> /*
> @@ -145,12 +164,11 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
> }
>
> /*
> - * The following functions take an line of argb_u16 pixels from the
> - * src_buffer, convert them to a specific format, and store them in the
> - * destination.
> + * The following functions take one argb_u16 pixel and convert it to a specific format. The
> + * result is stored in @dst_pixels.
> *
> - * They are used in the `compose_active_planes` to convert and store a line
> - * from the src_buffer to the writeback buffer.
> + * They are used in the `vkms_writeback_row` to convert and store a pixel from the src_buffer to
> + * the writeback buffer.
> */
> static void argb_u16_to_ARGB8888(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel)
> {
> @@ -216,6 +234,14 @@ static void argb_u16_to_RGB565(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel)
> *pixels = cpu_to_le16(r << 11 | g << 5 | b);
> }
>
> +/**
> + * Generic loop for all supported writeback format. It is executed just after the blending to
> + * write a line in the writeback buffer.
> + *
> + * @wb: Job where to insert the final image
> + * @src_buffer: Line to write
> + * @y: Row to write in the writeback buffer
> + */
> void vkms_writeback_row(struct vkms_writeback_job *wb,
> const struct line_buffer *src_buffer, int y)
> {
> @@ -229,6 +255,13 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
> wb->pixel_write(dst_pixels, &in_pixels[x]);
> }
>
> +/**
> + * Retrieve the correct read_pixel function for a specific format.
> + * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
> + * pointer is valid before using it in a vkms_plane_state.
> + *
> + * @format: 4cc of the format

Since there are many different 4cc style pixel format definition tables
in existence with conflicting definitions, it would not hurt to be more
specific that this is about DRM_FORMAT_* or drm_fourcc.h.

> + */
> void *get_pixel_conversion_function(u32 format)
> {
> switch (format) {
> @@ -247,6 +280,13 @@ void *get_pixel_conversion_function(u32 format)
> }
> }
>
> +/**
> + * Retrieve the correct write_pixel function for a specific format.
> + * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
> + * pointer is valid before using it in a vkms_writeback_job.
> + *
> + * @format: 4cc of the format

This too.

> + */
> void *get_pixel_write_function(u32 format)
> {
> switch (format) {
>

I couldn't check if the docs are correct since the patch context is not
wide enough, but they all sound plausible to me.


Thanks,
pq



2024-02-26 11:38:40

by Pekka Paalanen

Subject: Re: [PATCH v2 5/9] drm/vkms: Re-introduce line-per-line composition algorithm

On Fri, 23 Feb 2024 12:37:25 +0100
Louis Chauvet <[email protected]> wrote:

> Re-introduce a line-by-line composition algorithm for each pixel format.
> This allows more performance by not requiring an indirection per pixel
> read. This patch is focussed on readability of the code.
>
> Line-by-line composition was introduced by [1] but rewritten back to
> pixel-by-pixel algorithm in [2]. At this time, nobody noticed the impact
> on performance, and it was merged.
>
> This patch is almost a revert of [2], but in addition efforts have been
> made to increase readability and maintainability of the rotation handling.
> The blend function is now divided in two parts:
> - Transformation of coordinates from the output referential to the source
> referential
> - Line conversion and blending
>
> Most of the complexity of the rotation management is avoided by using
> drm_rect_* helpers. The remaining complexity is around the clipping, to
> avoid reading/writing outside source/destination buffers.
>
> The pixel conversion is now done line-by-line, so the pixel_read_t was
> replaced with pixel_read_line_t callback. This way the indirection is only
> required once per line and per plane, instead of once per pixel and per
> plane.
>
> The pixel_read_line_t callbacks are very similar for most pixel formats, but
> this duplication is required to avoid a performance impact. Some helpers were created to
> avoid code repetition:
> - get_step_1x1: get the step in byte to reach next pixel block in a
> certain direction
> - *_to_argb_u16: helpers to perform colors conversion. They should be
> inlined by the compiler, and they are used to avoid repetition between
> multiple variants of the same format (argb/xrgb and maybe in the
> future for formats like bgr formats).
>
> This new algorithm was tested with:
> - kms_plane (for color conversions)
> - kms_rotation_crc (for rotations of planes)
> - kms_cursor_crc (for translations of planes)
> The performance gain was measured with:
> - kms_fb_stress
>
> [1]: commit 8ba1648567e2 ("drm: vkms: Refactor the plane composer to accept
> new formats")
> https://lore.kernel.org/all/[email protected]/
> [2]: commit 322d716a3e8a ("drm/vkms: isolate pixel conversion
> functionality")
> https://lore.kernel.org/all/[email protected]/
>
> Signed-off-by: Louis Chauvet <[email protected]>
> ---
> drivers/gpu/drm/vkms/vkms_composer.c | 219 +++++++++++++++++++++++-------
> drivers/gpu/drm/vkms/vkms_drv.h | 25 +++-
> drivers/gpu/drm/vkms/vkms_formats.c | 253 ++++++++++++++++++++++-------------
> drivers/gpu/drm/vkms/vkms_formats.h | 2 +-
> drivers/gpu/drm/vkms/vkms_plane.c | 8 +-
> 5 files changed, 350 insertions(+), 157 deletions(-)
>
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index 5b341222d239..e555bf9c1aee 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -24,9 +24,10 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>
> /**
> * pre_mul_alpha_blend - alpha blending equation
> - * @frame_info: Source framebuffer's metadata
> * @stage_buffer: The line with the pixels from src_plane
> * @output_buffer: A line buffer that receives all the blends output
> + * @x_start: The start offset to avoid useless copy
> + * @pixel_count: The number of pixels to blend
> *
> * Using the information from the `frame_info`, this blends only the
> * necessary pixels from the `stage_buffer` to the `output_buffer`
> @@ -37,51 +38,23 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
> * drm_plane_create_blend_mode_property(). Also, this formula assumes a
> * completely opaque background.
> */
> -static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> - struct line_buffer *stage_buffer,
> - struct line_buffer *output_buffer)
> +static void pre_mul_alpha_blend(
> + struct line_buffer *stage_buffer,
> + struct line_buffer *output_buffer,
> + int x_start,
> + int pixel_count)
> {
> - int x_dst = frame_info->dst.x1;
> - struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
> - struct pixel_argb_u16 *in = stage_buffer->pixels;
> - int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> - stage_buffer->n_pixels);
> -
> - for (int x = 0; x < x_limit; x++) {
> - out[x].a = (u16)0xffff;
> - out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> - out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> - out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
> + struct pixel_argb_u16 *out = &output_buffer->pixels[x_start];
> + struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start];

Input buffers and pointers should be const.

> +
> + for (int i = 0; i < pixel_count; i++) {
> + out[i].a = (u16)0xffff;
> + out[i].r = pre_mul_blend_channel(in[i].r, out[i].r, in[i].a);
> + out[i].g = pre_mul_blend_channel(in[i].g, out[i].g, in[i].a);
> + out[i].b = pre_mul_blend_channel(in[i].b, out[i].b, in[i].a);
> }
> }

Somehow the hunk above does not feel like it is part of "re-introduce
line-per-line composition algorithm". This function was already running
line-by-line. Would it be easy enough to collect this and directly
related changes into a separate patch?

>
> -static int get_y_pos(struct vkms_frame_info *frame_info, int y)
> -{
> - if (frame_info->rotation & DRM_MODE_REFLECT_Y)
> - return drm_rect_height(&frame_info->rotated) - y - 1;
> -
> - switch (frame_info->rotation & DRM_MODE_ROTATE_MASK) {
> - case DRM_MODE_ROTATE_90:
> - return frame_info->rotated.x2 - y - 1;
> - case DRM_MODE_ROTATE_270:
> - return y + frame_info->rotated.x1;
> - default:
> - return y;
> - }
> -}
> -
> -static bool check_limit(struct vkms_frame_info *frame_info, int pos)
> -{
> - if (drm_rotation_90_or_270(frame_info->rotation)) {
> - if (pos >= 0 && pos < drm_rect_width(&frame_info->rotated))
> - return true;
> - } else {
> - if (pos >= frame_info->rotated.y1 && pos < frame_info->rotated.y2)
> - return true;
> - }
> -
> - return false;
> -}
>
> static void fill_background(const struct pixel_argb_u16 *background_color,
> struct line_buffer *output_buffer)
> @@ -163,6 +136,37 @@ static void apply_lut(const struct vkms_crtc_state *crtc_state, struct line_buff
> }
> }
>
> +/**
> + * direction_for_rotation() - Helper to get the correct reading direction for a specific rotation
> + *
> + * @rotation: rotation to analyze

This is KMS plane rotation property, right?

So the KMS plane has been rotated by this, and what we want to find is
the read direction on the attached FB so that reading returns pixels in
the CRTC line/scanout order, right?

Maybe extend the doc to explain that.

> + */
> +enum pixel_read_direction direction_for_rotation(unsigned int rotation)
> +{
> + if (rotation & DRM_MODE_ROTATE_0) {
> + if (rotation & DRM_MODE_REFLECT_X)
> + return READ_LEFT;
> + else
> + return READ_RIGHT;
> + } else if (rotation & DRM_MODE_ROTATE_90) {
> + if (rotation & DRM_MODE_REFLECT_Y)
> + return READ_UP;
> + else
> + return READ_DOWN;
> + } else if (rotation & DRM_MODE_ROTATE_180) {
> + if (rotation & DRM_MODE_REFLECT_X)
> + return READ_RIGHT;
> + else
> + return READ_LEFT;
> + } else if (rotation & DRM_MODE_ROTATE_270) {
> + if (rotation & DRM_MODE_REFLECT_Y)
> + return READ_DOWN;
> + else
> + return READ_UP;
> + }
> + return READ_RIGHT;
> +}
> +
> /**
> * blend - blend the pixels from all planes and compute crc
> * @wb: The writeback frame buffer metadata
> @@ -183,11 +187,11 @@ static void blend(struct vkms_writeback_job *wb,
> {
> struct vkms_plane_state **plane = crtc_state->active_planes;
> u32 n_active_planes = crtc_state->num_active_planes;
> - int y_pos;
>
> const struct pixel_argb_u16 background_color = { .a = 0xffff };
>
> size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> + size_t crtc_x_limit = crtc_state->base.crtc->mode.hdisplay;

I wonder why these are size_t, which forces the casts below...

>
> /*
> * The planes are composed line-by-line. It is a necessary complexity to avoid poor
> @@ -198,22 +202,133 @@ static void blend(struct vkms_writeback_job *wb,
>
> /* The active planes are composed associatively in z-order. */
> for (size_t i = 0; i < n_active_planes; i++) {
> - y_pos = get_y_pos(plane[i]->frame_info, y);
> + struct vkms_plane_state *current_plane = plane[i];
>
> - if (!check_limit(plane[i]->frame_info, y_pos))
> + /* Avoid rendering useless lines */
> + if (y < current_plane->frame_info->dst.y1 ||
> + y >= current_plane->frame_info->dst.y2) {
> continue;
> -
> - vkms_compose_row(stage_buffer, plane[i], y_pos);
> - pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
> - output_buffer);
> + }
> +
> + /*
> + * src_px is the line to copy. The initial coordinates are inside the
> + * destination framebuffer, and then drm_rect_* helpers are used to
> + * compute the correct position into the source framebuffer.
> + */
> + struct drm_rect src_px = DRM_RECT_INIT(
> + current_plane->frame_info->dst.x1, y,
> + drm_rect_width(&current_plane->frame_info->dst), 1);
> + struct drm_rect tmp_src;
> +
> + drm_rect_fp_to_int(&tmp_src, &current_plane->frame_info->src);
> +
> + /*
> + * [1]: Clamping src_px to the crtc_x_limit to avoid writing outside of the
> + * destination buffer
> + */
> + src_px.x2 = min_t(int, src_px.x2, (int)crtc_x_limit);

Up to and including this point, it would be better if src_px was called
dst_px, because only the below computation converts it into actual
src_px.

> +
> + /*
> + * Transform the x/y coordinates from the crtc into coordinates for
> + * the src buffer.
> + *
> + * - Cancel the offset of the dst buffer.
> + * - Invert the rotation. This assumes that
> + * dst = drm_rect_rotate(src, rotation) (dst and src have the
> + * same size, but can be rotated).
> + * - Apply the offset of the source rectangle to the coordinate.
> + */
> + drm_rect_translate(&src_px, -current_plane->frame_info->dst.x1,
> + -current_plane->frame_info->dst.y1);
> + drm_rect_rotate_inv(&src_px,
> + drm_rect_width(&tmp_src),
> + drm_rect_height(&tmp_src),
> + current_plane->frame_info->rotation);
> + drm_rect_translate(&src_px, tmp_src.x1, tmp_src.y1);
> +
> + /* Get the correct reading direction in the source buffer. */
> +
> + enum pixel_read_direction direction =
> + direction_for_rotation(current_plane->frame_info->rotation);
> +
> + int x_start = src_px.x1;
> + int y_start = src_px.y1;
> + int pixel_count;
> + /* [2]: Compute and clamp the number of pixel to read */
> + if (direction == READ_RIGHT || direction == READ_LEFT) {
> + /*
> + * In horizontal reading, the src_px width is the number of pixel to
> + * read
> + */
> + pixel_count = drm_rect_width(&src_px);
> + if (x_start < 0) {
> + pixel_count += x_start;
> + x_start = 0;
> + }
> + if (x_start + pixel_count > current_plane->frame_info->fb->width) {
> + pixel_count =
> + (int)current_plane->frame_info->fb->width - x_start;
> + }
> + } else {
> + /*
> + * In vertical reading, the src_px height is the number of pixel to
> + * read
> + */
> + pixel_count = drm_rect_height(&src_px);
> + if (y_start < 0) {
> + pixel_count += y_start;
> + y_start = 0;
> + }
> + if (y_start + pixel_count > current_plane->frame_info->fb->height) {
> + pixel_count =
> + (int)current_plane->frame_info->fb->height - y_start;
> + }
> + }
> +
> + if (pixel_count <= 0) {
> + /* Nothing to read, so avoid multiple function calls for nothing */
> + continue;
> + }
> +
> + /*
> + * Modify the starting point to take in account the rotation
> + *
> + * src_px is the top-left corner, so when reading READ_LEFT or READ_UP, it
> + * must be changed to the top-right/bottom-left corner.
> + */
> + if (direction == READ_LEFT) {
> + // x_start is now the right point
> + x_start += pixel_count - 1;
> + } else if (direction == READ_UP) {
> + // y_start is now the bottom point
> + y_start += pixel_count - 1;
> + }
> +
> + /*
> + * Perform the conversion and the blending
> + *
> + * Here we know that the read line (x_start, y_start, pixel_count) is
> + * inside the source buffer [2] and we don't write outside the stage
> + * buffer [1]
> + */
> + current_plane->pixel_read_line(
> + current_plane->frame_info,
> + x_start,
> + y_start,
> + direction,
> + pixel_count,
> + &stage_buffer->pixels[current_plane->frame_info->dst.x1]);
> +
> + pre_mul_alpha_blend(stage_buffer, output_buffer,
> + current_plane->frame_info->dst.x1,
> + pixel_count);
> }

I stared at the above algorithm for a while, and I couldn't find
anything obviously wrong, so good work.
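
For reference, the clamping in step [2] reduces to a one-dimensional
problem: keep the part of a run of pixels that falls inside
[0, limit). A minimal sketch of that step follows, where struct run
and clamp_run() are hypothetical names used only for illustration.
The limit would be the framebuffer width for horizontal reads and
the framebuffer height for vertical reads.

```c
/* A run of `count` consecutive pixels beginning at `start`. */
struct run {
	int start;
	int count;
};

/*
 * Clamp a run to the half-open interval [0, limit).
 * A returned count of 0 means there is nothing to read.
 */
static struct run clamp_run(int start, int count, int limit)
{
	struct run r;

	/* Drop the pixels that lie before the beginning of the buffer. */
	if (start < 0) {
		count += start;
		start = 0;
	}

	/* Drop the pixels that lie past the end of the buffer. */
	if (start + count > limit)
		count = limit - start;

	/* A run entirely outside the buffer ends up empty. */
	if (count < 0)
		count = 0;

	r.start = start;
	r.count = count;
	return r;
}
```

Writing it once with an explicit limit argument makes it harder for
the horizontal and vertical branches to disagree about which
framebuffer dimension bounds the read.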

>
> apply_lut(crtc_state, output_buffer);
>
> *crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
> -
> if (wb)
> - vkms_writeback_row(wb, output_buffer, y_pos);
> + vkms_writeback_row(wb, output_buffer, y);
> }
> }
>
> @@ -224,7 +339,7 @@ static int check_format_funcs(struct vkms_crtc_state *crtc_state,
> u32 n_active_planes = crtc_state->num_active_planes;
>
> for (size_t i = 0; i < n_active_planes; i++)
> - if (!planes[i]->pixel_read)
> + if (!planes[i]->pixel_read_line)
> return -1;
>
> if (active_wb && !active_wb->pixel_write)
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index 886c885c8cf5..ccc5be009f15 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -39,7 +39,6 @@
> struct vkms_frame_info {
> struct drm_framebuffer *fb;
> struct drm_rect src, dst;
> - struct drm_rect rotated;
> struct iosys_map map[DRM_FORMAT_MAX_PLANES];
> unsigned int rotation;
> };
> @@ -69,14 +68,27 @@ struct vkms_writeback_job {
> pixel_write_t pixel_write;
> };
>
> +enum pixel_read_direction {
> + READ_UP,
> + READ_DOWN,
> + READ_LEFT,
> + READ_RIGHT

When I saw these in code, I got a little confused. Does READ_LEFT mean
read towards left, or read starting from left? It's very common to
express reading directions as left-to-right and right-to-left rather
than "left arrow".

There are many choices how to improve this, e.g. upward, leftward,
right-to-left, positive-x, negative-y.

> +};
> +
> /**
> - * typedef pixel_read_t - These functions are used to read a pixel in the source frame,
> +<<<<<<< HEAD
> + * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
> * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
> *
> - * @src_pixels: Pointer to the pixel to read
> - * @out_pixel: Pointer to write the converted pixel
> + * @frame_info: Frame used as source for the pixel value
> + * @y: Y (height) coordinate in the source buffer
> + * @x_start: X (width) coordinate of the first pixel to copy
> + * @x_end: X (width) coordinate of the last pixel to copy
> + * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
> + * x_end.
> */
> -typedef void (*pixel_read_t)(u8 *src_pixels, struct pixel_argb_u16 *out_pixel);
> +typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum

const frame_info I presume.


> + pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>
> /**
> * vkms_plane_state - Driver specific plane state
> @@ -88,7 +100,7 @@ typedef void (*pixel_read_t)(u8 *src_pixels, struct pixel_argb_u16 *out_pixel);
> struct vkms_plane_state {
> struct drm_shadow_plane_state base;
> struct vkms_frame_info *frame_info;
> - pixel_read_t pixel_read;
> + pixel_read_line_t pixel_read_line;
> };
>
> struct vkms_plane {
> @@ -193,7 +205,6 @@ int vkms_verify_crc_source(struct drm_crtc *crtc, const char *source_name,
> /* Composer Support */
> void vkms_composer_worker(struct work_struct *work);
> void vkms_set_composer(struct vkms_output *out, bool enabled);
> -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y);
> void vkms_writeback_row(struct vkms_writeback_job *wb, const struct line_buffer *src_buffer, int y);
>
> /* Writeback */
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 1f5aeba57ad6..46daea6d3ee9 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -11,21 +11,29 @@
>
> /**
> * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> - * in the first plane
> *
> * @frame_info: Buffer metadata
> * @x: The x coordinate of the wanted pixel in the buffer
> * @y: The y coordinate of the wanted pixel in the buffer
> + * @plane_index: The index of the plane to use
> *
> * The caller must be aware that this offset is not always a pointer to a pixel. If individual
> * pixel values are needed, they have to be extracted from the resulting block.
> */
> -static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> +static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int x, int y,
> + size_t plane_index)
> {
> struct drm_framebuffer *fb = frame_info->fb;
> -
> - return fb->offsets[0] + (y * fb->pitches[0])
> - + (x * fb->format->cpp[0]);
> + const struct drm_format_info *format = frame_info->fb->format;
> + /* Directly using x and y to multiply pitches and format->cpp is not sufficient because
> + * in some formats a block can represent multiple pixels.
> + *
> + * Dividing x and y by the block size allows to extract the correct offset of the block
> + * containing the pixel.
> + */
> + return fb->offsets[plane_index] +
> + (y / drm_format_info_block_height(format, plane_index)) * fb->pitches[plane_index] +
> + (x / drm_format_info_block_width(format, plane_index)) * format->char_per_block[plane_index];

These changes do not seem like they belong with "re-introduce
line-per-line composition algorithm" but some other patch.


> }
>
> /**
> @@ -35,44 +43,56 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
> * @frame_info: Buffer metadata
> * @x: The x(width) coordinate inside the plane
> * @y: The y(height) coordinate inside the plane
> + * @plane_index: The index of the plane
> *
> - * Takes the information stored in the frame_info, a pair of coordinates, and
> - * returns the address of the first color channel.
> - * This function assumes the channels are packed together, i.e. a color channel
> - * comes immediately after another in the memory. And therefore, this function
> - * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> + * Takes the information stored in the frame_info, a pair of coordinates, and returns the address
> + * of the block containing this pixel.
> + * The caller must be aware that this pointer is sometimes not directly a pixel, it needs some
> + * additional work to extract pixel color from this block.
> */
> static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> - int x, int y)
> + int x, int y, size_t plane_index)
> {
> - size_t offset = pixel_offset(frame_info, x, y);
> -
> - return (u8 *)frame_info->map[0].vaddr + offset;
> + return (u8 *)frame_info->map[0].vaddr + packed_pixels_offset(frame_info, x, y, plane_index);

This too.


> }
>
> -static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> +/**
> + * get_step_1x1() - Common helper to compute the correct step value between each pixel to read in a
> + * certain direction.
> + * This must be used only with format where blockh == blockw == 1.
> + * In the case when direction is not a valid pixel_read_direction, the returned step is 0, so you
> + * must not rely on this result to create a loop variant.
> + *
> + * @fb Framebuffer to iter on
> + * @direction Direction of the reading
> + */
> +static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction direction,
> + int plane_index)
> {
> - int x_src = frame_info->src.x1 >> 16;
> - int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
> -
> - return packed_pixels_addr(frame_info, x_src, y_src);
> + switch (direction) {
> + default:
> + DRM_ERROR("Invalid direction for pixel reading: %d\n", direction);
> + return 0;

What I'd do here is move the default: section outside of the switch
completely. Then the compiler can warn if any enum value is not handled
here. Since every case in the switch is a return statement, falling out
of the switch block is the default case.

Maybe the enum variable containing an illegal value could be handled
more harshly so that callers could rely on this function always
returning a good value?

Just like passing in fb=NULL is handled by the kernel as an OOPS.
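
A minimal sketch of the suggested shape, assuming a simplified
signature: step_for_direction() and its bytes_per_pixel/pitch
parameters are hypothetical stand-ins for get_step_1x1() and the
framebuffer fields. With no default: label, a compiler building with
-Wswitch (part of -Wall in GCC and Clang) can warn when a new enum
value is added but not handled.

```c
enum pixel_read_direction {
	READ_UP,
	READ_DOWN,
	READ_LEFT,
	READ_RIGHT
};

/*
 * Byte step between two consecutive pixels read in `direction`,
 * for formats where one block is exactly one pixel.
 */
static int step_for_direction(enum pixel_read_direction direction,
			      int bytes_per_pixel, int pitch)
{
	/* No default: every enumerator must have its own case. */
	switch (direction) {
	case READ_RIGHT:
		return bytes_per_pixel;
	case READ_LEFT:
		return -bytes_per_pixel;
	case READ_DOWN:
		return pitch;
	case READ_UP:
		return -pitch;
	}

	/*
	 * Only reached when `direction` holds a value outside the enum.
	 * This sketch returns 0; a stricter variant could treat it as a
	 * fatal programming error so callers never see a zero step.
	 */
	return 0;
}
```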

> + case READ_RIGHT:
> + return fb->format->char_per_block[plane_index];
> + case READ_LEFT:
> + return -fb->format->char_per_block[plane_index];
> + case READ_DOWN:
> + return (int)fb->pitches[plane_index];
> + case READ_UP:
> + return -(int)fb->pitches[plane_index];
> + }
> }
>
> -static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
> -{
> - if (frame_info->rotation & (DRM_MODE_REFLECT_X | DRM_MODE_ROTATE_270))
> - return limit - x - 1;
> - return x;
> -}
>
> /*
> - * The following functions take pixel data from the buffer and convert them to the format
> + * The following functions take pixel data (a, r, g, b, pixel, ...), convert them to the format
> * ARGB16161616 in out_pixel.
> *
> - * They are used in the `vkms_compose_row` function to handle multiple formats.
> + * They are used in the `read_line`s functions to avoid duplicate work for some pixel formats.
> */
>
> -static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> +static void ARGB8888_to_argb_u16(struct pixel_argb_u16 *out_pixel, int a, int r, int g, int b)

The function name ARGB8888_to_argb_u16() is confusing. It's not taking
in ARGB8888 pixels but separate a,r,g,b ints. The only assumption it
needs from the pixel format is the 8888 part.
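
A minimal sketch of a helper named after what it actually consumes,
i.e. four 8-bit channels in no particular memory layout. The name
argb_u16_from_u8888() is only an illustration, not something defined
by this patch, and the struct is redeclared here with stdint types so
the example stands alone.

```c
#include <stdint.h>

/* Same channel layout as the VKMS internal pixel, redeclared for the sketch. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/*
 * Expand four 8-bit channels to 16 bits each. Multiplying by 257
 * (0x0101) replicates the byte into both halves of the result, so
 * 0x00 maps to 0x0000 and 0xff maps to 0xffff.
 */
static struct pixel_argb_u16 argb_u16_from_u8888(uint8_t a, uint8_t r,
						 uint8_t g, uint8_t b)
{
	struct pixel_argb_u16 out = {
		.a = (uint16_t)(a * 257),
		.r = (uint16_t)(r * 257),
		.g = (uint16_t)(g * 257),
		.b = (uint16_t)(b * 257),
	};

	return out;
}
```

The same helper can then serve ARGB8888 and XRGB8888 readers alike,
with the XRGB reader passing 0xff as the alpha argument.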

> {
> /*
> * The 257 is the "conversion ratio". This number is obtained by the
> @@ -80,48 +100,26 @@ static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixe
> * the best color value in a pixel format with more possibilities.
> * A similar idea applies to others RGB color conversions.
> */
> - out_pixel->a = (u16)src_pixels[3] * 257;
> - out_pixel->r = (u16)src_pixels[2] * 257;
> - out_pixel->g = (u16)src_pixels[1] * 257;
> - out_pixel->b = (u16)src_pixels[0] * 257;
> -}
> -
> -static void XRGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> -{
> - out_pixel->a = (u16)0xffff;
> - out_pixel->r = (u16)src_pixels[2] * 257;
> - out_pixel->g = (u16)src_pixels[1] * 257;
> - out_pixel->b = (u16)src_pixels[0] * 257;
> + out_pixel->a = (u16)a * 257;
> + out_pixel->r = (u16)r * 257;
> + out_pixel->g = (u16)g * 257;
> + out_pixel->b = (u16)b * 257;
> }
>
> -static void ARGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> +static void ARGB16161616_to_argb_u16(struct pixel_argb_u16 *out_pixel, int a, int r, int g, int b)
> {
> - u16 *pixels = (u16 *)src_pixels;
> -
> - out_pixel->a = le16_to_cpu(pixels[3]);
> - out_pixel->r = le16_to_cpu(pixels[2]);
> - out_pixel->g = le16_to_cpu(pixels[1]);
> - out_pixel->b = le16_to_cpu(pixels[0]);
> -}
> -
> -static void XRGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> -{
> - u16 *pixels = (u16 *)src_pixels;
> -
> - out_pixel->a = (u16)0xffff;
> - out_pixel->r = le16_to_cpu(pixels[2]);
> - out_pixel->g = le16_to_cpu(pixels[1]);
> - out_pixel->b = le16_to_cpu(pixels[0]);
> + out_pixel->a = le16_to_cpu(a);
> + out_pixel->r = le16_to_cpu(r);
> + out_pixel->g = le16_to_cpu(g);
> + out_pixel->b = le16_to_cpu(b);
> }
>
> -static void RGB565_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> +static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixel)

This function OTOH is taking in literally DRM_FORMAT_RGB565, so its
name is good.

> {
> - u16 *pixels = (u16 *)src_pixels;
> -
> s64 fp_rb_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(31));
> s64 fp_g_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(63));
>
> - u16 rgb_565 = le16_to_cpu(*pixels);
> + u16 rgb_565 = le16_to_cpu(*pixel);
> s64 fp_r = drm_int2fixp((rgb_565 >> 11) & 0x1f);
> s64 fp_g = drm_int2fixp((rgb_565 >> 5) & 0x3f);
> s64 fp_b = drm_int2fixp(rgb_565 & 0x1f);
> @@ -132,34 +130,105 @@ static void RGB565_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
> }
>
> -/**
> - * vkms_compose_row - compose a single row of a plane
> - * @stage_buffer: output line with the composed pixels
> - * @plane: state of the plane that is being composed
> - * @y: y coordinate of the row
> +/*
> + * The following functions are read_line function for each pixel format supported by VKMS.
> *
> - * This function composes a single row of a plane. It gets the source pixels
> - * through the y coordinate (see get_packed_src_addr()) and goes linearly
> - * through the source pixel, reading the pixels and converting it to
> - * ARGB16161616 (see the pixel_read() callback). For rotate-90 and rotate-270,
> - * the source pixels are not traversed linearly. The source pixels are queried
> - * on each iteration in order to traverse the pixels vertically.
> + * They read a line starting at the point @x_start,@y_start following the @direction. The result
> + * is stored in @out_pixel and in the format ARGB16161616.
> + *
> + * Those function are very similar, but it is required for performance reason. In the past, some
> + * experiment were done, and with a generic loop the performance are very reduced [1].
> + *
> + * [1]: https://lore.kernel.org/dri-devel/[email protected]/
> */
> -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y)
> +
> +static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> +{
> + u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +
> + int step = get_step_1x1(frame_info->fb, direction, 0);
> +
> + while (count) {
> + u8 *px = (u8 *)src_pixels;
> +
> + ARGB8888_to_argb_u16(out_pixel, px[3], px[2], px[1], px[0]);
> + out_pixel += 1;
> + src_pixels += step;
> + count--;

By the way, you could eliminate decrementing 'count' if you computed the
end address once and used while (out_pixel < end).
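
A minimal stand-alone sketch of that loop shape (user-space C, not the VKMS code). The function name, the local struct, and the flattened parameters are assumptions for illustration; in the driver, the source pointer and step would come from packed_pixels_addr() and get_step_1x1().

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the VKMS struct pixel_argb_u16 (layout assumed). */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/*
 * Hypothetical ARGB8888 line reader: src points at the first pixel and step
 * is the signed byte distance between two consecutive pixels to read.
 */
static void argb8888_read_line_sketch(const uint8_t *src, ptrdiff_t step,
				      size_t count,
				      struct pixel_argb_u16 *out_pixel)
{
	/* Compute the end of the output once; no counter to decrement. */
	struct pixel_argb_u16 *end = out_pixel + count;

	while (out_pixel < end) {
		out_pixel->a = (uint16_t)(src[3] * 257u);
		out_pixel->r = (uint16_t)(src[2] * 257u);
		out_pixel->g = (uint16_t)(src[1] * 257u);
		out_pixel->b = (uint16_t)(src[0] * 257u);
		out_pixel++;
		src += step;
	}
}
```

The loop condition now depends only on the output pointer, so each iteration updates two pointers instead of two pointers and a counter.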

Thanks,
pq


> + }
> +}
> +
> +static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> +{
> + u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +
> + int step = get_step_1x1(frame_info->fb, direction, 0);
> +
> + while (count) {
> + u8 *px = (u8 *)src_pixels;
> +
> + ARGB8888_to_argb_u16(out_pixel, 255, px[2], px[1], px[0]);
> + out_pixel += 1;
> + src_pixels += step;
> + count--;
> + }
> +}
> +
> +static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> +{
> + u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +
> + int step = get_step_1x1(frame_info->fb, direction, 0);
> +
> + while (count) {
> + u16 *px = (u16 *)src_pixels;
> +
> + ARGB16161616_to_argb_u16(out_pixel, px[3], px[2], px[1], px[0]);
> + out_pixel += 1;
> + src_pixels += step;
> + count--;
> + }
> +}
> +
> +static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> +{
> + u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +
> + int step = get_step_1x1(frame_info->fb, direction, 0);
> +
> + while (count) {
> + u16 *px = (u16 *)src_pixels;
> +
> + ARGB16161616_to_argb_u16(out_pixel, 0xFFFF, px[2], px[1], px[0]);
> + out_pixel += 1;
> + src_pixels += step;
> + count--;
> + }
> +}
> +
> +static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> {
> - struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> - struct vkms_frame_info *frame_info = plane->frame_info;
> - u8 *src_pixels = get_packed_src_addr(frame_info, y);
> - int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
> + u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
>
> - for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
> - int x_pos = get_x_position(frame_info, limit, x);
> + int step = get_step_1x1(frame_info->fb, direction, 0);
>
> - if (drm_rotation_90_or_270(frame_info->rotation))
> - src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
> - + frame_info->fb->format->cpp[0] * y;
> + while (count) {
> + u16 *px = (u16 *)src_pixels;
>
> - plane->pixel_read(src_pixels, &out_pixels[x_pos]);
> + RGB565_to_argb_u16(out_pixel, px);
> + out_pixel += 1;
> + src_pixels += step;
> + count--;
> }
> }
>
> @@ -247,7 +316,7 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
> {
> struct vkms_frame_info *frame_info = &wb->wb_frame_info;
> int x_dst = frame_info->dst.x1;
> - u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> + u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y, 0);
> struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels);
>
> @@ -256,27 +325,27 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
> }
>
> /**
> - * Retrieve the correct read_pixel function for a specific format.
> + * Retrieve the correct read_line function for a specific format.
> * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
> * pointer is valid before using it in a vkms_plane_state.
> *
> * @format: 4cc of the format
> */
> -pixel_read_t get_pixel_read_function(u32 format)
> +pixel_read_line_t get_pixel_read_line_function(u32 format)
> {
> switch (format) {
> case DRM_FORMAT_ARGB8888:
> - return &ARGB8888_to_argb_u16;
> + return &ARGB8888_read_line;
> case DRM_FORMAT_XRGB8888:
> - return &XRGB8888_to_argb_u16;
> + return &XRGB8888_read_line;
> case DRM_FORMAT_ARGB16161616:
> - return &ARGB16161616_to_argb_u16;
> + return &ARGB16161616_read_line;
> case DRM_FORMAT_XRGB16161616:
> - return &XRGB16161616_to_argb_u16;
> + return &XRGB16161616_read_line;
> case DRM_FORMAT_RGB565:
> - return &RGB565_to_argb_u16;
> + return &RGB565_read_line;
> default:
> - return (pixel_read_t)NULL;
> + return (pixel_read_line_t)NULL;
> }
> }
>
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> index 3ecea4563254..8d2bef95ff79 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.h
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -5,7 +5,7 @@
>
> #include "vkms_drv.h"
>
> -pixel_read_t get_pixel_read_function(u32 format);
> +pixel_read_line_t get_pixel_read_line_function(u32 format);
>
> pixel_write_t get_pixel_write_function(u32 format);
>
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index f68b1b03d632..58c1c74742b5 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -106,9 +106,9 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> return;
>
> fmt = fb->format->format;
> - pixel_read_t pixel_read = get_pixel_read_function(fmt);
> + pixel_read_line_t pixel_read_line = get_pixel_read_line_function(fmt);
>
> - if (!pixel_read) {
> + if (!pixel_read_line) {
> DRM_WARN("Pixel format is not supported by VKMS planes. State is inchanged\n");
> return;
> }
> @@ -128,10 +128,8 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> DRM_MODE_REFLECT_X |
> DRM_MODE_REFLECT_Y);
>
> - drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
> - drm_rect_height(&frame_info->rotated), frame_info->rotation);
>
> - vkms_plane_state->pixel_read = pixel_read;
> + vkms_plane_state->pixel_read_line = pixel_read_line;
> }
>
> static int vkms_plane_atomic_check(struct drm_plane *plane,
>



2024-02-26 12:23:55

by Pekka Paalanen

Subject: Re: [PATCH v2 7/9] drm/vkms: Add range and encoding properties to pixel_read function

On Fri, 23 Feb 2024 12:37:27 +0100
Louis Chauvet <[email protected]> wrote:

> From: Arthur Grillo <[email protected]>
>
> Create range and encoding properties. This should be noop, as none of
> the conversion functions need those properties.

None of the conversion functions need these? How can one say so?
The previous patch is already making use of them, AFAICT.

How is this a noop? Is it not exposing new UAPI from VKMS?


Thanks,
pq

>
> Signed-off-by: Arthur Grillo <[email protected]>
> [Louis Chauvet: retained only relevant parts]
> Signed-off-by: Louis Chauvet <[email protected]>
> ---
> drivers/gpu/drm/vkms/vkms_plane.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index 427ca67c60ce..95dfde297377 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -228,5 +228,14 @@ struct vkms_plane *vkms_plane_init(struct vkms_device *vkmsdev,
> drm_plane_create_rotation_property(&plane->base, DRM_MODE_ROTATE_0,
> DRM_MODE_ROTATE_MASK | DRM_MODE_REFLECT_MASK);
>
> + drm_plane_create_color_properties(&plane->base,
> + BIT(DRM_COLOR_YCBCR_BT601) |
> + BIT(DRM_COLOR_YCBCR_BT709) |
> + BIT(DRM_COLOR_YCBCR_BT2020),
> + BIT(DRM_COLOR_YCBCR_LIMITED_RANGE) |
> + BIT(DRM_COLOR_YCBCR_FULL_RANGE),
> + DRM_COLOR_YCBCR_BT601,
> + DRM_COLOR_YCBCR_FULL_RANGE);
> +
> return plane;
> }
>



2024-02-26 12:53:21

by Pekka Paalanen

Subject: Re: [PATCH v2 6/9] drm/vkms: Add YUV support

On Fri, 23 Feb 2024 12:37:26 +0100
Louis Chauvet <[email protected]> wrote:

> From: Arthur Grillo <[email protected]>
>
> Add support to the YUV formats below:
>
> - NV12
> - NV16
> - NV24
> - NV21
> - NV61
> - NV42
> - YUV420
> - YUV422
> - YUV444
> - YVU420
> - YVU422
> - YVU444
>
> The conversion matrices of each encoding and range were obtained by
> rounding the values of the original conversion matrices multiplied by
> 2^8. This is done to avoid the use of fixed point operations.
>
> Signed-off-by: Arthur Grillo <[email protected]>
> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
> callbacks for yuv formats]
> Signed-off-by: Louis Chauvet <[email protected]>
> ---
> drivers/gpu/drm/vkms/vkms_composer.c | 2 +-
> drivers/gpu/drm/vkms/vkms_drv.h | 6 +-
> drivers/gpu/drm/vkms/vkms_formats.c | 289 +++++++++++++++++++++++++++++++++--
> drivers/gpu/drm/vkms/vkms_formats.h | 4 +
> drivers/gpu/drm/vkms/vkms_plane.c | 14 +-
> 5 files changed, 295 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index e555bf9c1aee..54fc5161d565 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
> * buffer [1]
> */
> current_plane->pixel_read_line(
> - current_plane->frame_info,
> + current_plane,
> x_start,
> y_start,
> direction,
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index ccc5be009f15..a4f6456cb971 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -75,6 +75,8 @@ enum pixel_read_direction {
> READ_RIGHT
> };
>
> +struct vkms_plane_state;
> +
> /**
> <<<<<<< HEAD
> * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
> @@ -87,8 +89,8 @@ enum pixel_read_direction {
> * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
> * x_end.
> */
> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
> - pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
> + enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);

This is the second or third time in this one series changing this type.
Could you not do the change once, in its own patch if possible?

>
> /**
> * vkms_plane_state - Driver specific plane state
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 46daea6d3ee9..515c80866a58 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
> */
> return fb->offsets[plane_index] +
> (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
> - (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
> + (x / drm_format_info_block_height(format, plane_index)) *
> + format->char_per_block[plane_index];

Shouldn't this be in the patch that added this code in the first place?

> }
>
> /**
> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
> }
> }
>
> +/**
> + * get_subsampling() - Get the subsampling value on a specific direction

subsampling divisor

> + */
> +static int get_subsampling(const struct drm_format_info *format,
> + enum pixel_read_direction direction)
> +{
> + if (direction == READ_LEFT || direction == READ_RIGHT)
> + return format->hsub;
> + else if (direction == READ_DOWN || direction == READ_UP)
> + return format->vsub;
> + return 1;

In this and the below function, personally I'd prefer switch-case, with
a cannot-happen-scream after the switch, so the compiler can warn about
unhandled enum values.
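
A minimal stand-alone sketch of that switch-case shape (user-space C). The enum is a local stand-in whose enumerator order is assumed, the function name is hypothetical, and hsub/vsub are passed as plain ints instead of a struct drm_format_info. In the kernel, the fallback after the switch would more likely be a WARN_ONCE() than the fprintf() used here.

```c
#include <stdio.h>

/* Local stand-in for the VKMS enum; the enumerator order is an assumption. */
enum pixel_read_direction {
	READ_UP,
	READ_DOWN,
	READ_LEFT,
	READ_RIGHT
};

static int get_subsampling_sketch(int hsub, int vsub,
				  enum pixel_read_direction direction)
{
	/*
	 * No default label on purpose: with -Wall (which enables -Wswitch),
	 * GCC and Clang warn when an enumerator is not handled here.
	 */
	switch (direction) {
	case READ_LEFT:
	case READ_RIGHT:
		return hsub;
	case READ_UP:
	case READ_DOWN:
		return vsub;
	}

	/* Cannot happen unless an out-of-range value was passed in. */
	fprintf(stderr, "invalid pixel_read_direction %d\n", (int)direction);
	return 1;
}
```

If a fifth direction is ever added to the enum, the compiler flags this function at build time instead of it silently returning 1.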

> +}
> +
> +/**
> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
> + */
> +static int get_subsampling_offset(const struct drm_format_info *format,
> + enum pixel_read_direction direction, int x_start, int y_start)

'start' values as "increments" for a pixel counter? Is something
misnamed here?

Is it an increment or an offset?
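
One possible reading of the quoted code, sketched below for the case of increasing, non-negative coordinates only: the start coordinate acts as a phase inside the subsampling period, which is an offset rather than an increment. Both function names are hypothetical, and the expression is copied from the read-line loops later in the quoted patch.

```c
/* Index of the chroma sample that covers luma coordinate x. */
static int chroma_index(int x, int subsampling)
{
	return x / subsampling;
}

/*
 * Same test as in the quoted loops: nonzero when the chroma pointer must
 * advance after the i-th luma pixel of a line starting at coordinate start.
 */
static int chroma_advances_after(int i, int start, int subsampling)
{
	return (i + start + 1) % subsampling == 0;
}
```

With a divisor of 2 and start = 1, the very first pixel already ends a chroma sample, so the pointer advances after one luma pixel instead of two. Whether the same expression is right when reading towards decreasing coordinates is not covered by this sketch.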

> +{
> + if (direction == READ_RIGHT || direction == READ_LEFT)
> + return x_start;
> + else if (direction == READ_DOWN || direction == READ_UP)
> + return y_start;
> + return 0;
> +}
> +
>
> /*
> * The following functions take pixel data (a, r, g, b, pixel, ...), convert them to the format
> @@ -130,6 +157,87 @@ static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
> out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
> }
>
> +static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
> +{
> + s32 y_16, cb_16, cr_16;
> + s32 r_16, g_16, b_16;
> +
> + y_16 = y - y_offset;
> + cb_16 = cb - 128;
> + cr_16 = cr - 128;
> +
> + r_16 = m[0][0] * y_16 + m[0][1] * cb_16 + m[0][2] * cr_16;
> + g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
> + b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;
> +
> + *r = clamp(r_16, 0, 0xffff) >> 8;
> + *g = clamp(g_16, 0, 0xffff) >> 8;
> + *b = clamp(b_16, 0, 0xffff) >> 8;
> +}
> +
> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
> + enum drm_color_encoding encoding, enum drm_color_range range)
> +{
> + static const s16 bt601_full[3][3] = {
> + { 256, 0, 359 },
> + { 256, -88, -183 },
> + { 256, 454, 0 },
> + };
> + static const s16 bt601[3][3] = {
> + { 298, 0, 409 },
> + { 298, -100, -208 },
> + { 298, 516, 0 },
> + };
> + static const s16 rec709_full[3][3] = {
> + { 256, 0, 408 },
> + { 256, -48, -120 },
> + { 256, 476, 0 },
> + };
> + static const s16 rec709[3][3] = {
> + { 298, 0, 459 },
> + { 298, -55, -136 },
> + { 298, 541, 0 },
> + };
> + static const s16 bt2020_full[3][3] = {
> + { 256, 0, 377 },
> + { 256, -42, -146 },
> + { 256, 482, 0 },
> + };
> + static const s16 bt2020[3][3] = {
> + { 298, 0, 430 },
> + { 298, -48, -167 },
> + { 298, 548, 0 },
> + };
> +
> + u8 r = 0;
> + u8 g = 0;
> + u8 b = 0;
> + bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
> + unsigned int y_offset = full ? 0 : 16;
> +
> + switch (encoding) {
> + case DRM_COLOR_YCBCR_BT601:
> + ycbcr2rgb(full ? bt601_full : bt601,

Evaluating all these conditionals again for every pixel is probably
inefficient. Just like with the line reading functions, you could pick
the matrix in advance.
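
A minimal stand-alone sketch of picking the matrix once (user-space C). The struct and function names are hypothetical, and only the two BT.601 matrices from the quoted patch are included to keep it short; where the result would be stored in VKMS is left open.

```c
#include <stdint.h>

/* BT.601 matrices as they appear in the quoted patch (scaled by 2^8). */
static const int16_t bt601_full[3][3] = {
	{ 256,   0,  359 },
	{ 256, -88, -183 },
	{ 256, 454,    0 },
};

static const int16_t bt601_limited[3][3] = {
	{ 298,    0,  409 },
	{ 298, -100, -208 },
	{ 298,  516,    0 },
};

/* Hypothetical container for everything the per-pixel code needs. */
struct yuv_conversion_sketch {
	const int16_t (*matrix)[3];
	int y_offset;
};

/* Resolve matrix and luma offset once, outside the per-pixel loop. */
static struct yuv_conversion_sketch pick_bt601_conversion(int full_range)
{
	struct yuv_conversion_sketch c;

	c.matrix = full_range ? bt601_full : bt601_limited;
	c.y_offset = full_range ? 0 : 16;
	return c;
}
```

The per-pixel conversion then only reads c.matrix and c.y_offset, with no switch on encoding or range inside the loop.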

> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> + break;
> + case DRM_COLOR_YCBCR_BT709:
> + ycbcr2rgb(full ? rec709_full : rec709,
> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> + break;
> + case DRM_COLOR_YCBCR_BT2020:
> + ycbcr2rgb(full ? bt2020_full : bt2020,
> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> + break;
> + default:
> + pr_warn_once("Not supported color encoding\n");
> + break;
> + }
> +
> + argb_u16->r = r * 257;
> + argb_u16->g = g * 257;
> + argb_u16->b = b * 257;

I wonder. Using 8-bit fixed point precision seems quite coarse for
8-bit pixel formats, and it's going to be insufficient for higher bit
depths. Was supporting e.g. 10-bit YUV considered? There are even
deeper formats, too, like DRM_FORMAT_P016.
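
To make the precision question concrete, a small user-space sketch (hypothetical helper names, floating point used only for illustration since kernel code would not do this at run time). It rounds a real coefficient to a fixed-point integer with a chosen number of fractional bits; 1.402 and -0.344136 are the usual BT.601 Cr-to-R and Cb-to-G coefficients.

```c
#include <stdint.h>

/* Round a real coefficient to fixed point with frac_bits fractional bits. */
static int32_t to_fixed(double coeff, unsigned int frac_bits)
{
	double scaled = coeff * (double)(1u << frac_bits);

	/* Round half away from zero, then truncate. */
	return (int32_t)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
}

/* Value actually represented by a fixed-point coefficient. */
static double from_fixed(int32_t fixed, unsigned int frac_bits)
{
	return (double)fixed / (double)(1u << frac_bits);
}
```

With 8 fractional bits, to_fixed(1.402, 8) gives 359 and to_fixed(-0.344136, 8) gives -88, matching the bt601_full matrix in the quoted patch. from_fixed(359, 8) is about 1.40234, while 16 fractional bits would store 91881, about 1.401993.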

> +}
> +
> /*
> * The following functions are read_line function for each pixel format supported by VKMS.
> *
> @@ -142,13 +250,13 @@ static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
> * [1]: https://lore.kernel.org/dri-devel/[email protected]/
> */
>
> -static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void ARGB8888_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> enum pixel_read_direction direction, int count,
> struct pixel_argb_u16 out_pixel[])
> {
> - u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> + u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>
> - int step = get_step_1x1(frame_info->fb, direction, 0);
> + int step = get_step_1x1(plane->frame_info->fb, direction, 0);

These are the kind of changes I would not expect to see in a patch
adding YUV support. There are a lot of them, too.

>
> while (count) {
> u8 *px = (u8 *)src_pixels;
> @@ -160,13 +268,13 @@ static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start,
> }
> }
>
> -static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void XRGB8888_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> enum pixel_read_direction direction, int count,
> struct pixel_argb_u16 out_pixel[])
> {
> - u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> + u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>
> - int step = get_step_1x1(frame_info->fb, direction, 0);
> + int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>
> while (count) {
> u8 *px = (u8 *)src_pixels;
> @@ -178,13 +286,13 @@ static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start,
> }
> }
>
> -static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void ARGB16161616_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> enum pixel_read_direction direction, int count,
> struct pixel_argb_u16 out_pixel[])
> {
> - u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> + u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>
> - int step = get_step_1x1(frame_info->fb, direction, 0);
> + int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>
> while (count) {
> u16 *px = (u16 *)src_pixels;
> @@ -196,13 +304,13 @@ static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_sta
> }
> }
>
> -static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void XRGB16161616_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> enum pixel_read_direction direction, int count,
> struct pixel_argb_u16 out_pixel[])
> {
> - u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> + u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>
> - int step = get_step_1x1(frame_info->fb, direction, 0);
> + int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>
> while (count) {
> u16 *px = (u16 *)src_pixels;
> @@ -214,13 +322,13 @@ static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_sta
> }
> }
>
> -static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void RGB565_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> enum pixel_read_direction direction, int count,
> struct pixel_argb_u16 out_pixel[])
> {
> - u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> + u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>
> - int step = get_step_1x1(frame_info->fb, direction, 0);
> + int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>
> while (count) {
> u16 *px = (u16 *)src_pixels;
> @@ -232,6 +340,139 @@ static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, in
> }
> }
>
> +static void semi_planar_yuv_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> +{
> + u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> + u8 *uv_plane = packed_pixels_addr(plane->frame_info,
> + x_start / plane->frame_info->fb->format->hsub,
> + y_start / plane->frame_info->fb->format->vsub,
> + 1);
> + struct pixel_yuv_u8 yuv_u8;
> + int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> + int step_uv = get_step_1x1(plane->frame_info->fb, direction, 1);
> + int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> + int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> + x_start, y_start); // 0
> +
> + for (int i = 0; i < count; i++) {
> + yuv_u8.y = y_plane[0];
> + yuv_u8.u = uv_plane[0];
> + yuv_u8.v = uv_plane[1];
> +
> + yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> + plane->base.base.color_range);

Oh, so this was the reason to change the read-line function signature.
Maybe just stash a pointer to the right matrix and the right y_offset
in frame_info instead?

> + out_pixel += 1;
> + y_plane += step_y;
> + if ((i + subsampling_offset + 1) % subsampling == 0)
> + uv_plane += step_uv;
> + }
> +}
> +
> +static void semi_planar_yvu_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> +{
> + u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> + u8 *vu_plane = packed_pixels_addr(plane->frame_info,
> + x_start / plane->frame_info->fb->format->hsub,
> + y_start / plane->frame_info->fb->format->vsub,
> + 1);
> + struct pixel_yuv_u8 yuv_u8;
> + int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> + int step_vu = get_step_1x1(plane->frame_info->fb, direction, 1);
> + int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> + int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> + x_start, y_start);
> + for (int i = 0; i < count; i++) {
> + yuv_u8.y = y_plane[0];
> + yuv_u8.u = vu_plane[1];
> + yuv_u8.v = vu_plane[0];

You could swap matrix columns instead of writing this whole new
function for UV vs. VU. Just an idea.
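
A minimal stand-alone sketch of the column swap (user-space C, hypothetical function names). Swapping the Cb and Cr columns of a matrix lets the same multiply consume the chroma values in (Cr, Cb) order, so one read-line function could serve both orderings if the swapped matrix is prepared in advance.

```c
#include <stdint.h>

/* Copy a 3x3 matrix while exchanging its second and third columns. */
static void swap_chroma_columns(const int16_t in[3][3], int16_t out[3][3])
{
	for (int row = 0; row < 3; row++) {
		out[row][0] = in[row][0];
		out[row][1] = in[row][2];
		out[row][2] = in[row][1];
	}
}

/* One output channel: dot product of a matrix row with (y, c1, c2). */
static int32_t matrix_row_dot(const int16_t row[3],
			      int32_t y, int32_t c1, int32_t c2)
{
	return row[0] * y + row[1] * c1 + row[2] * c2;
}
```

For any input, row i of the original matrix applied to (y, cb, cr) gives the same value as row i of the swapped matrix applied to (y, cr, cb).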


Thanks,
pq

> +
> + yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> + plane->base.base.color_range);
> + out_pixel += 1;
> + y_plane += step_y;
> + if ((i + subsampling_offset + 1) % subsampling == 0)
> + vu_plane += step_vu;
> + }
> +}
> +
> +static void planar_yuv_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> +{
> + u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> + u8 *u_plane = packed_pixels_addr(plane->frame_info,
> + x_start / plane->frame_info->fb->format->hsub,
> + y_start / plane->frame_info->fb->format->vsub,
> + 1);
> + u8 *v_plane = packed_pixels_addr(plane->frame_info,
> + x_start / plane->frame_info->fb->format->hsub,
> + y_start / plane->frame_info->fb->format->vsub,
> + 2);
> + struct pixel_yuv_u8 yuv_u8;
> + int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> + int step_u = get_step_1x1(plane->frame_info->fb, direction, 1);
> + int step_v = get_step_1x1(plane->frame_info->fb, direction, 2);
> + int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> + int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> + x_start, y_start);
> +
> + for (int i = 0; i < count; i++) {
> + yuv_u8.y = *y_plane;
> + yuv_u8.u = *u_plane;
> + yuv_u8.v = *v_plane;
> +
> + yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> + plane->base.base.color_range);
> + out_pixel += 1;
> + y_plane += step_y;
> + if ((i + subsampling_offset + 1) % subsampling == 0) {
> + u_plane += step_u;
> + v_plane += step_v;
> + }
> + }
> +}
> +
> +static void planar_yvu_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> + enum pixel_read_direction direction, int count,
> + struct pixel_argb_u16 out_pixel[])
> +{
> + u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> + u8 *v_plane = packed_pixels_addr(plane->frame_info,
> + x_start / plane->frame_info->fb->format->hsub,
> + y_start / plane->frame_info->fb->format->vsub,
> + 1);
> + u8 *u_plane = packed_pixels_addr(plane->frame_info,
> + x_start / plane->frame_info->fb->format->hsub,
> + y_start / plane->frame_info->fb->format->vsub,
> + 2);
> + struct pixel_yuv_u8 yuv_u8;
> + int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> + int step_u = get_step_1x1(plane->frame_info->fb, direction, 1);
> + int step_v = get_step_1x1(plane->frame_info->fb, direction, 2);
> + int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> + int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> + x_start, y_start);
> +
> + for (int i = 0; i < count; i++) {
> + yuv_u8.y = *y_plane;
> + yuv_u8.u = *u_plane;
> + yuv_u8.v = *v_plane;
> +
> + yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> + plane->base.base.color_range);
> + out_pixel += 1;
> + y_plane += step_y;
> + if ((i + subsampling_offset + 1) % subsampling == 0) {
> + u_plane += step_u;
> + v_plane += step_v;
> + }
> + }
> +}
> +
> /*
> * The following functions take one argb_u16 pixel and convert it to a specific format. The
> * result is stored in @dst_pixels.
> @@ -344,6 +585,22 @@ pixel_read_line_t get_pixel_read_line_function(u32 format)
> return &XRGB16161616_read_line;
> case DRM_FORMAT_RGB565:
> return &RGB565_read_line;
> + case DRM_FORMAT_NV12:
> + case DRM_FORMAT_NV16:
> + case DRM_FORMAT_NV24:
> + return &semi_planar_yuv_read_line;
> + case DRM_FORMAT_NV21:
> + case DRM_FORMAT_NV61:
> + case DRM_FORMAT_NV42:
> + return &semi_planar_yvu_read_line;
> + case DRM_FORMAT_YUV420:
> + case DRM_FORMAT_YUV422:
> + case DRM_FORMAT_YUV444:
> + return &planar_yuv_read_line;
> + case DRM_FORMAT_YVU420:
> + case DRM_FORMAT_YVU422:
> + case DRM_FORMAT_YVU444:
> + return &planar_yvu_read_line;
> default:
> return (pixel_read_line_t)NULL;
> }
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> index 8d2bef95ff79..5a3a9e1328d8 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.h
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -9,4 +9,8 @@ pixel_read_line_t get_pixel_read_line_function(u32 format);
>
> pixel_write_t get_pixel_write_function(u32 format);
>
> +struct pixel_yuv_u8 {
> + u8 y, u, v;
> +};
> +
> #endif /* _VKMS_FORMATS_H_ */
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index 58c1c74742b5..427ca67c60ce 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -17,7 +17,19 @@ static const u32 vkms_formats[] = {
> DRM_FORMAT_XRGB8888,
> DRM_FORMAT_XRGB16161616,
> DRM_FORMAT_ARGB16161616,
> - DRM_FORMAT_RGB565
> + DRM_FORMAT_RGB565,
> + DRM_FORMAT_NV12,
> + DRM_FORMAT_NV16,
> + DRM_FORMAT_NV24,
> + DRM_FORMAT_NV21,
> + DRM_FORMAT_NV61,
> + DRM_FORMAT_NV42,
> + DRM_FORMAT_YUV420,
> + DRM_FORMAT_YUV422,
> + DRM_FORMAT_YUV444,
> + DRM_FORMAT_YVU420,
> + DRM_FORMAT_YVU422,
> + DRM_FORMAT_YVU444
> };
>
> static struct drm_plane_state *
>



2024-02-26 16:39:31

by Arthur Grillo

Subject: Re: [PATCH v2 9/9] drm/vkms: Create KUnit tests for YUV conversions



On 23/02/24 08:37, Louis Chauvet wrote:
> From: Arthur Grillo <[email protected]>
>
> Create KUnit tests to test the conversion between YUV and RGB. Test each
> conversion and range combination with some common colors.
>
> Signed-off-by: Arthur Grillo <[email protected]>
> [Louis Chauvet: fix minor formatting issues (whitespace, double line)]
> Signed-off-by: Louis Chauvet <[email protected]>
> ---
> drivers/gpu/drm/vkms/Makefile | 1 +
> drivers/gpu/drm/vkms/tests/.kunitconfig | 4 +
> drivers/gpu/drm/vkms/tests/Makefile | 3 +
> drivers/gpu/drm/vkms/tests/vkms_format_test.c | 155 ++++++++++++++++++++++++++
> drivers/gpu/drm/vkms/vkms_formats.c | 9 +-
> drivers/gpu/drm/vkms/vkms_formats.h | 5 +
> 6 files changed, 175 insertions(+), 2 deletions(-)

You need to add the CONFIG_DRM_VKMS_KUNIT_TESTS config to
drivers/gpu/drm/vkms/Kconfig, like my previous patch did.

Best Regards,
~Arthur Grillo

>
> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
> index 1b28a6a32948..8d3e46dde635 100644
> --- a/drivers/gpu/drm/vkms/Makefile
> +++ b/drivers/gpu/drm/vkms/Makefile
> @@ -9,3 +9,4 @@ vkms-y := \
> vkms_writeback.o
>
> obj-$(CONFIG_DRM_VKMS) += vkms.o
> +obj-$(CONFIG_DRM_VKMS_KUNIT_TESTS) += tests/
> diff --git a/drivers/gpu/drm/vkms/tests/.kunitconfig b/drivers/gpu/drm/vkms/tests/.kunitconfig
> new file mode 100644
> index 000000000000..70e378228cbd
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/tests/.kunitconfig
> @@ -0,0 +1,4 @@
> +CONFIG_KUNIT=y
> +CONFIG_DRM=y
> +CONFIG_DRM_VKMS=y
> +CONFIG_DRM_VKMS_KUNIT_TESTS=y
> diff --git a/drivers/gpu/drm/vkms/tests/Makefile b/drivers/gpu/drm/vkms/tests/Makefile
> new file mode 100644
> index 000000000000..2d1df668569e
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/tests/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +obj-$(CONFIG_DRM_VKMS_KUNIT_TESTS) += vkms_format_test.o
> diff --git a/drivers/gpu/drm/vkms/tests/vkms_format_test.c b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> new file mode 100644
> index 000000000000..cb6d32ff115d
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> @@ -0,0 +1,155 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +
> +#include <kunit/test.h>
> +
> +#include <drm/drm_fixed.h>
> +#include <drm/drm_fourcc.h>
> +#include <drm/drm_print.h>
> +
> +#include "../../drm_crtc_internal.h"
> +
> +#include "../vkms_drv.h"
> +#include "../vkms_formats.h"
> +
> +#define TEST_BUFF_SIZE 50
> +
> +struct yuv_u8_to_argb_u16_case {
> + enum drm_color_encoding encoding;
> + enum drm_color_range range;
> + size_t n_colors;
> + struct format_pair {
> + char *name;
> + struct pixel_yuv_u8 yuv;
> + struct pixel_argb_u16 argb;
> + } colors[TEST_BUFF_SIZE];
> +};
> +
> +static struct yuv_u8_to_argb_u16_case yuv_u8_to_argb_u16_cases[] = {
> + {
> + .encoding = DRM_COLOR_YCBCR_BT601,
> + .range = DRM_COLOR_YCBCR_FULL_RANGE,
> + .n_colors = 6,
> + .colors = {
> + {"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
> + {"gray", {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
> + {"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
> + {"red", {0x4c, 0x55, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
> + {"green", {0x96, 0x2c, 0x15}, {0x0000, 0x0000, 0xffff, 0x0000}},
> + {"blue", {0x1d, 0xff, 0x6b}, {0x0000, 0x0000, 0x0000, 0xffff}},
> + },
> + },
> + {
> + .encoding = DRM_COLOR_YCBCR_BT601,
> + .range = DRM_COLOR_YCBCR_LIMITED_RANGE,
> + .n_colors = 6,
> + .colors = {
> + {"white", {0xeb, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
> + {"gray", {0x7e, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
> + {"black", {0x10, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
> + {"red", {0x51, 0x5a, 0xf0}, {0x0000, 0xffff, 0x0000, 0x0000}},
> + {"green", {0x91, 0x36, 0x22}, {0x0000, 0x0000, 0xffff, 0x0000}},
> + {"blue", {0x29, 0xf0, 0x6e}, {0x0000, 0x0000, 0x0000, 0xffff}},
> + },
> + },
> + {
> + .encoding = DRM_COLOR_YCBCR_BT709,
> + .range = DRM_COLOR_YCBCR_FULL_RANGE,
> + .n_colors = 4,
> + .colors = {
> + {"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
> + {"gray", {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
> + {"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
> + {"red", {0x35, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
> + {"green", {0xb6, 0x1e, 0x0c}, {0x0000, 0x0000, 0xffff, 0x0000}},
> + {"blue", {0x12, 0xff, 0x74}, {0x0000, 0x0000, 0x0000, 0xffff}},
> + },
> + },
> + {
> + .encoding = DRM_COLOR_YCBCR_BT709,
> + .range = DRM_COLOR_YCBCR_LIMITED_RANGE,
> + .n_colors = 4,
> + .colors = {
> + {"white", {0xeb, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
> + {"gray", {0x7e, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
> + {"black", {0x10, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
> + {"red", {0x3f, 0x66, 0xf0}, {0x0000, 0xffff, 0x0000, 0x0000}},
> + {"green", {0xad, 0x2a, 0x1a}, {0x0000, 0x0000, 0xffff, 0x0000}},
> + {"blue", {0x20, 0xf0, 0x76}, {0x0000, 0x0000, 0x0000, 0xffff}},
> + },
> + },
> + {
> + .encoding = DRM_COLOR_YCBCR_BT2020,
> + .range = DRM_COLOR_YCBCR_FULL_RANGE,
> + .n_colors = 4,
> + .colors = {
> + {"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
> + {"gray", {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
> + {"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
> + {"red", {0x43, 0x5c, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
> + {"green", {0xad, 0x24, 0x0b}, {0x0000, 0x0000, 0xffff, 0x0000}},
> + {"blue", {0x0f, 0xff, 0x76}, {0x0000, 0x0000, 0x0000, 0xffff}},
> + },
> + },
> + {
> + .encoding = DRM_COLOR_YCBCR_BT2020,
> + .range = DRM_COLOR_YCBCR_LIMITED_RANGE,
> + .n_colors = 4,
> + .colors = {
> + {"white", {0xeb, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
> + {"gray", {0x7e, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
> + {"black", {0x10, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
> + {"red", {0x4a, 0x61, 0xf0}, {0x0000, 0xffff, 0x0000, 0x0000}},
> + {"green", {0xa4, 0x2f, 0x19}, {0x0000, 0x0000, 0xffff, 0x0000}},
> + {"blue", {0x1d, 0xf0, 0x77}, {0x0000, 0x0000, 0x0000, 0xffff}},
> + },
> + },
> +};
> +
> +static void vkms_format_test_yuv_u8_to_argb_u16(struct kunit *test)
> +{
> + const struct yuv_u8_to_argb_u16_case *param = test->param_value;
> + struct pixel_argb_u16 argb;
> +
> + for (size_t i = 0; i < param->n_colors; i++) {
> + const struct format_pair *color = &param->colors[i];
> +
> + yuv_u8_to_argb_u16(&argb, &color->yuv, param->encoding, param->range);
> +
> + KUNIT_EXPECT_LE_MSG(test, abs_diff(argb.a, color->argb.a), 257,
> + "On the A channel of the color %s expected 0x%04x, got 0x%04x",
> + color->name, color->argb.a, argb.a);
> + KUNIT_EXPECT_LE_MSG(test, abs_diff(argb.r, color->argb.r), 257,
> + "On the R channel of the color %s expected 0x%04x, got 0x%04x",
> + color->name, color->argb.r, argb.r);
> + KUNIT_EXPECT_LE_MSG(test, abs_diff(argb.g, color->argb.g), 257,
> + "On the G channel of the color %s expected 0x%04x, got 0x%04x",
> + color->name, color->argb.g, argb.g);
> + KUNIT_EXPECT_LE_MSG(test, abs_diff(argb.b, color->argb.b), 257,
> + "On the B channel of the color %s expected 0x%04x, got 0x%04x",
> + color->name, color->argb.b, argb.b);
> + }
> +}
> +
> +static void vkms_format_test_yuv_u8_to_argb_u16_case_desc(struct yuv_u8_to_argb_u16_case *t,
> + char *desc)
> +{
> + snprintf(desc, KUNIT_PARAM_DESC_SIZE, "%s - %s",
> + drm_get_color_encoding_name(t->encoding), drm_get_color_range_name(t->range));
> +}
> +
> +KUNIT_ARRAY_PARAM(yuv_u8_to_argb_u16, yuv_u8_to_argb_u16_cases,
> + vkms_format_test_yuv_u8_to_argb_u16_case_desc);
> +
> +static struct kunit_case vkms_format_test_cases[] = {
> + KUNIT_CASE_PARAM(vkms_format_test_yuv_u8_to_argb_u16, yuv_u8_to_argb_u16_gen_params),
> + {}
> +};
> +
> +static struct kunit_suite vkms_format_test_suite = {
> + .name = "vkms-format",
> + .test_cases = vkms_format_test_cases,
> +};
> +
> +kunit_test_suite(vkms_format_test_suite);
> +
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 515c80866a58..20dd23ce9051 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -7,6 +7,8 @@
> #include <drm/drm_rect.h>
> #include <drm/drm_fixed.h>
>
> +#include <kunit/visibility.h>
> +
> #include "vkms_formats.h"
>
> /**
> @@ -175,8 +177,10 @@ static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r,
> *b = clamp(b_16, 0, 0xffff) >> 8;
> }
>
> -static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
> - enum drm_color_encoding encoding, enum drm_color_range range)
> +VISIBLE_IF_KUNIT void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16,
> + const struct pixel_yuv_u8 *yuv_u8,
> + enum drm_color_encoding encoding,
> + enum drm_color_range range)
> {
> static const s16 bt601_full[3][3] = {
> { 256, 0, 359 },
> @@ -237,6 +241,7 @@ static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pix
> argb_u16->g = g * 257;
> argb_u16->b = b * 257;
> }
> +EXPORT_SYMBOL_IF_KUNIT(yuv_u8_to_argb_u16);
>
> /*
> * The following functions are read_line function for each pixel format supported by VKMS.
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> index 5a3a9e1328d8..4245a5c5e956 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.h
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -13,4 +13,9 @@ struct pixel_yuv_u8 {
> u8 y, u, v;
> };
>
> +#if IS_ENABLED(CONFIG_KUNIT)
> +void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
> + enum drm_color_encoding encoding, enum drm_color_range range);
> +#endif
> +
> #endif /* _VKMS_FORMATS_H_ */
>
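
For readers following the test above: the context lines of vkms_formats.c show each 8-bit channel being scaled to 16 bits with a multiplication by 257 (`argb_u16->g = g * 257`), so a tolerance of 257 in the KUnit expectations amounts to one 8-bit step. A minimal standalone sketch of that arithmetic follows; the helper names `expand_u8_to_u16` and `within_one_u8_step` are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: expand an 8-bit channel to 16 bits (0xff maps to 0xffff). */
static uint16_t expand_u8_to_u16(uint8_t v)
{
	return (uint16_t)(v * 257u);
}

/* Hypothetical helper: true when two 16-bit channels differ by at most one 8-bit step. */
static bool within_one_u8_step(uint16_t got, uint16_t expected)
{
	uint16_t diff = got > expected ? got - expected : expected - got;

	return diff <= 257;
}
```

Under this assumption, an 8-bit result of 0x80 expands to 0x8080, which is within tolerance of an expected 0x8000 even though the values are not equal.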

2024-02-27 15:04:53

by Louis Chauvet

[permalink] [raw]
Subject: Re: [PATCH v2 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions

[...]

> > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > index 172830a3936a..cb7a49b7c8e7 100644
> > --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > @@ -9,6 +9,17 @@
> >
> > #include "vkms_formats.h"
> >
> > +/**
> > + * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> > + * in the first plane
> > + *
> > + * @frame_info: Buffer metadata
> > + * @x: The x coordinate of the wanted pixel in the buffer
> > + * @y: The y coordinate of the wanted pixel in the buffer
> > + *
> > + * The caller must be aware that this offset is not always a pointer to a pixel. If individual
> > + * pixel values are needed, they have to be extracted from the resulting block.
>
> Just wondering how the caller will be able to extract the right pixel
> from the block without re-using the knowledge already used in this
> function. I'd also expect the function to round down x,y to be
> divisible by block dimensions, but that's not visible in this email.
> Then the caller needs the remainder from the round-down, too?

You are right, the current implementation only works when block_h ==
block_w == 1. I think I wrote the documentation for PATCHv2 5/9, but when
backporting this comment to PATCHv2 3/9 I forgot to update it.
The new comment will be:

* pixels_offset() - Get the offset of a given pixel data at coordinate
* x/y in the first plane
[...]
* The caller must ensure that the framebuffer associated with this
* request uses a pixel format where block_h == block_w == 1.
* If this requirement is not fulfilled, the resulting offset can be
* completely wrong.

And yes, even after PATCHv2 5/9 it is not clear what the offset is. Would
it be better to replace the last sentence with the following? (I will make
the same update to the last sentence of packed_pixels_addr.)

[...]
* The returned offset corresponds to the offset of the block containing the pixel at coordinates
* x/y.
* The caller must use this offset with care, as for formats with block_h != 1 or block_w != 1
* the requested pixel value may have to be extracted from the block, even if the pixels are
* individually addressable.
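
To make the block/pixel distinction concrete, here is a minimal standalone sketch of the arithmetic discussed above: x/y are rounded down to the containing block, and the remainder is kept so that a caller could extract the pixel from that block. The names `block_layout`, `block_pos` and `locate_block` are hypothetical and only mirror the kind of fields a DRM format description provides; this is not the code from the series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified description of one plane of a framebuffer. */
struct block_layout {
	size_t offset;          /* start of the plane in the buffer, in bytes */
	size_t pitch;           /* bytes per row of blocks */
	size_t bytes_per_block; /* size of one block, in bytes */
	int block_w;            /* pixels per block, horizontally */
	int block_h;            /* pixels per block, vertically */
};

/* Byte offset of a block plus the pixel position inside that block. */
struct block_pos {
	size_t offset;
	int rem_x;
	int rem_y;
};

static struct block_pos locate_block(const struct block_layout *l, int x, int y)
{
	struct block_pos pos;

	assert(x >= 0 && y >= 0 && l->block_w > 0 && l->block_h > 0);

	/* x is divided by the block width, y by the block height. */
	pos.offset = l->offset +
		     (size_t)(y / l->block_h) * l->pitch +
		     (size_t)(x / l->block_w) * l->bytes_per_block;
	/* The remainders tell the caller which pixel of the block was requested. */
	pos.rem_x = x % l->block_w;
	pos.rem_y = y % l->block_h;

	return pos;
}
```

With block_w == block_h == 1 both remainders are always zero and the offset points directly at the pixel, which is the only case the stricter comment above allows.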

> > + */
> > static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> > {
> > struct drm_framebuffer *fb = frame_info->fb;
> > @@ -17,12 +28,13 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
> > + (x * fb->format->cpp[0]);
> > }
> >

[...]

> > +/**
> > + * Retrieve the correct read_pixel function for a specific format.
> > + * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
> > + * pointer is valid before using it in a vkms_plane_state.
> > + *
> > + * @format: 4cc of the format
>
> Since there are many different 4cc style pixel format definition tables
> in existence with conflicting definitions, it would not hurt to be more
> specific that this is about DRM_FORMAT_* or drm_fourcc.h.

Is this better?

@format: DRM_FORMAT_* value for which to obtain a conversion function (see [drm_fourcc.h])

> > + */
> > void *get_pixel_conversion_function(u32 format)
> > {
> > switch (format) {
> > @@ -247,6 +280,13 @@ void *get_pixel_conversion_function(u32 format)
> > }
> > }
> >
> > +/**
> > + * Retrieve the correct write_pixel function for a specific format.
> > + * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
> > + * pointer is valid before using it in a vkms_writeback_job.
> > + *
> > + * @format: 4cc of the format
>
> This too.

Ack, I will use the same as above

> > + */
> > void *get_pixel_write_function(u32 format)
> > {
> > switch (format) {
> >
>
> I couldn't check if the docs are correct since the patch context is not
> wide enough, but they all sound plausible to me.

I checked again, I don't see other errors than your first comment.

>
> Thanks,
> pq

Kind regards,
Louis Chauvet

--
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2024-02-27 15:05:01

by Louis Chauvet

[permalink] [raw]
Subject: Re: [PATCH v2 9/9] drm/vkms: Create KUnit tests for YUV conversions

On 26/02/24 - 13:39, Arthur Grillo wrote:
>
>
> On 23/02/24 08:37, Louis Chauvet wrote:
> > From: Arthur Grillo <[email protected]>
> >
> > Create KUnit tests to test the conversion between YUV and RGB. Test each
> > conversion and range combination with some common colors.
> >
> > Signed-off-by: Arthur Grillo <[email protected]>
> > [Louis Chauvet: fix minor formatting issues (whitespace, double line)]
> > Signed-off-by: Louis Chauvet <[email protected]>
> > ---
> > drivers/gpu/drm/vkms/Makefile | 1 +
> > drivers/gpu/drm/vkms/tests/.kunitconfig | 4 +
> > drivers/gpu/drm/vkms/tests/Makefile | 3 +
> > drivers/gpu/drm/vkms/tests/vkms_format_test.c | 155 ++++++++++++++++++++++++++
> > drivers/gpu/drm/vkms/vkms_formats.c | 9 +-
> > drivers/gpu/drm/vkms/vkms_formats.h | 5 +
> > 6 files changed, 175 insertions(+), 2 deletions(-)
>
> You need to add the CONFIG_DRM_VKMS_KUNIT_TESTS config to
> drivers/gpu/drm/vkms/Kconfig, like my previous patch did.

I don't know how I merged your patch, but I missed the Kconfig file,
it was not intended, sorry.

Kind regards,
Louis Chauvet

[...]

--
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2024-02-27 15:05:56

by Louis Chauvet

[permalink] [raw]
Subject: Re: [PATCH v2 7/9] drm/vkms: Add range and encoding properties to pixel_read function

(same as for PATCHv2 6/9, I took the patch from Arthur with no
modifications)

On 26/02/24 - 14:23, Pekka Paalanen wrote:
> On Fri, 23 Feb 2024 12:37:27 +0100
> Louis Chauvet <[email protected]> wrote:
>
> > From: Arthur Grillo <[email protected]>
> >
> > Create range and encoding properties. This should be noop, as none of
> > the conversion functions need those properties.
>
> None of the conversion function needs this? How can one say so?
> The previous patch is making use of them already, AFAICT?

It's my fault, I mixed up the commits (in Arthur's series, "Add range..." came
before "Add YUV support"), but to me it makes no sense to have the color
property without the support in the driver.

Maybe it's better just to merge "Add range..." with "Add YUV support"?

> How is this a noop? Is it not exposing new UAPI from VKMS?

It's not a no-op from userspace, but from the driver side, yes.

Kind regards,
Louis Chauvet

> Thanks,
> pq
>
> >
> > Signed-off-by: Arthur Grillo <[email protected]>
> > [Louis Chauvet: retained only relevant parts]
> > Signed-off-by: Louis Chauvet <[email protected]>
> > ---
> > drivers/gpu/drm/vkms/vkms_plane.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> > index 427ca67c60ce..95dfde297377 100644
> > --- a/drivers/gpu/drm/vkms/vkms_plane.c
> > +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> > @@ -228,5 +228,14 @@ struct vkms_plane *vkms_plane_init(struct vkms_device *vkmsdev,
> > drm_plane_create_rotation_property(&plane->base, DRM_MODE_ROTATE_0,
> > DRM_MODE_ROTATE_MASK | DRM_MODE_REFLECT_MASK);
> >
> > + drm_plane_create_color_properties(&plane->base,
> > + BIT(DRM_COLOR_YCBCR_BT601) |
> > + BIT(DRM_COLOR_YCBCR_BT709) |
> > + BIT(DRM_COLOR_YCBCR_BT2020),
> > + BIT(DRM_COLOR_YCBCR_LIMITED_RANGE) |
> > + BIT(DRM_COLOR_YCBCR_FULL_RANGE),
> > + DRM_COLOR_YCBCR_BT601,
> > + DRM_COLOR_YCBCR_FULL_RANGE);
> > +
> > return plane;
> > }
> >
>



--
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2024-02-27 15:09:55

by Louis Chauvet

[permalink] [raw]
Subject: Re: [PATCH v2 5/9] drm/vkms: Re-introduce line-per-line composition algorithm

[...]

> > -static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> > - struct line_buffer *stage_buffer,
> > - struct line_buffer *output_buffer)
> > +static void pre_mul_alpha_blend(
> > + struct line_buffer *stage_buffer,
> > + struct line_buffer *output_buffer,
> > + int x_start,
> > + int pixel_count)
> > {
> > - int x_dst = frame_info->dst.x1;
> > - struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
> > - struct pixel_argb_u16 *in = stage_buffer->pixels;
> > - int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > - stage_buffer->n_pixels);
> > -
> > - for (int x = 0; x < x_limit; x++) {
> > - out[x].a = (u16)0xffff;
> > - out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> > - out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> > - out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
> > + struct pixel_argb_u16 *out = &output_buffer->pixels[x_start];
> > + struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start];
>
> Input buffers and pointers should be const.

They will be const in v4.
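
As a rough illustration of that change, here is a standalone sketch of a line blend whose input is only read and is therefore const. It is a sketch under assumptions: `blend_line` is a hypothetical name, the structure is redeclared locally, and the channel formula (premultiplied source over destination, rounded to nearest, clamped) is my assumption of what pre_mul_blend_channel() computes, not a copy of the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local redeclaration for the sketch; the driver has its own definition. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Assumed formula: premultiplied src over dst, rounded to nearest, clamped to 16 bits. */
static uint16_t pre_mul_blend_channel(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint64_t v = (uint64_t)src * 0xffff + (uint64_t)dst * (0xffffu - alpha);

	v = (v + 0xffff / 2) / 0xffff;
	return v > 0xffff ? 0xffff : (uint16_t)v;
}

/* Blend `count` pixels of `in` over `out`; `in` is never written, hence const. */
static void blend_line(const struct pixel_argb_u16 *in,
		       struct pixel_argb_u16 *out, size_t count)
{
	assert(in && out);

	for (size_t i = 0; i < count; i++) {
		out[i].a = 0xffff;
		out[i].r = pre_mul_blend_channel(in[i].r, out[i].r, in[i].a);
		out[i].g = pre_mul_blend_channel(in[i].g, out[i].g, in[i].a);
		out[i].b = pre_mul_blend_channel(in[i].b, out[i].b, in[i].a);
	}
}
```

A fully opaque input pixel replaces the output pixel, and a fully transparent one leaves the output colour channels unchanged.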

> > +
> > + for (int i = 0; i < pixel_count; i++) {
> > + out[i].a = (u16)0xffff;
> > + out[i].r = pre_mul_blend_channel(in[i].r, out[i].r, in[i].a);
> > + out[i].g = pre_mul_blend_channel(in[i].g, out[i].g, in[i].a);
> > + out[i].b = pre_mul_blend_channel(in[i].b, out[i].b, in[i].a);
> > }
> > }
>
> Somehow the hunk above does not feel like it is part of "re-introduce
> line-per-line composition algorithm". This function was already running
> line-by-line. Would it be easy enough to collect this and directly
> related changes into a separate patch?

It is not directly related to the reintroduction of the line-by-line
algorithm, but as part of the simplification and maintainability effort I
changed the function a bit to avoid having multiple places computing the
x_start/pixel_count values. I don't see much value in extracting it; it
would just move these few lines into the caller.

[...]

> > +/**
> > + * direction_for_rotation() - Helper to get the correct reading direction for a specific rotation
> > + *
> > + * @rotation: rotation to analyze
>
> This is KMS plane rotation property, right?
>
> So the KMS plane has been rotated by this, and what we want to find is
> the read direction on the attached FB so that reading returns pixels in
> the CRTC line/scanout order, right?
>
> Maybe extend the doc to explain that.

Is it better?

* direction_for_rotation() - Get the correct reading direction for a given rotation
*
* This function will use the @rotation parameter to compute the correct reading direction to read
* a line from the source buffer.
* For example, if the buffer is reflected on the X axis, the pixels must be read from right to left.
* @rotation: Rotation to analyze. It corresponds to the field @frame_info.rotation.
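
A minimal standalone sketch of the mapping this comment describes, mirroring the quoted function below. The `ROT_*`/`REFLECT_*` macros are local stand-ins that I assume match the DRM_MODE_ROTATE_*/DRM_MODE_REFLECT_* bit layout, and the explicit direction names and `direction_for` are hypothetical, chosen for illustration only.

```c
#include <assert.h>

/* Local stand-ins, assumed to mirror DRM_MODE_ROTATE_* and DRM_MODE_REFLECT_*. */
#define ROT_0     (1u << 0)
#define ROT_90    (1u << 1)
#define ROT_180   (1u << 2)
#define ROT_270   (1u << 3)
#define REFLECT_X (1u << 4)
#define REFLECT_Y (1u << 5)

/* Hypothetical names spelling out the reading direction in the source buffer. */
enum read_direction {
	READ_LEFT_TO_RIGHT,
	READ_RIGHT_TO_LEFT,
	READ_TOP_TO_BOTTOM,
	READ_BOTTOM_TO_TOP,
};

/* Direction in which the source must be read to produce one output line. */
static enum read_direction direction_for(unsigned int rotation)
{
	if (rotation & ROT_0)
		return (rotation & REFLECT_X) ? READ_RIGHT_TO_LEFT : READ_LEFT_TO_RIGHT;
	if (rotation & ROT_90)
		return (rotation & REFLECT_Y) ? READ_BOTTOM_TO_TOP : READ_TOP_TO_BOTTOM;
	if (rotation & ROT_180)
		return (rotation & REFLECT_X) ? READ_LEFT_TO_RIGHT : READ_RIGHT_TO_LEFT;
	if (rotation & ROT_270)
		return (rotation & REFLECT_Y) ? READ_TOP_TO_BOTTOM : READ_BOTTOM_TO_TOP;

	/* No rotation bit set: fall back to the unrotated direction. */
	return READ_LEFT_TO_RIGHT;
}
```

For instance, a plane rotated by 180 degrees and reflected on X ends up being read left to right again, because the two transformations cancel out horizontally.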

> > + */
> > +enum pixel_read_direction direction_for_rotation(unsigned int rotation)
> > +{
> > + if (rotation & DRM_MODE_ROTATE_0) {
> > + if (rotation & DRM_MODE_REFLECT_X)
> > + return READ_LEFT;
> > + else
> > + return READ_RIGHT;
> > + } else if (rotation & DRM_MODE_ROTATE_90) {
> > + if (rotation & DRM_MODE_REFLECT_Y)
> > + return READ_UP;
> > + else
> > + return READ_DOWN;
> > + } else if (rotation & DRM_MODE_ROTATE_180) {
> > + if (rotation & DRM_MODE_REFLECT_X)
> > + return READ_RIGHT;
> > + else
> > + return READ_LEFT;
> > + } else if (rotation & DRM_MODE_ROTATE_270) {
> > + if (rotation & DRM_MODE_REFLECT_Y)
> > + return READ_DOWN;
> > + else
> > + return READ_UP;
> > + }
> > + return READ_RIGHT;
> > +}
> > +
> > /**
> > * blend - blend the pixels from all planes and compute crc
> > * @wb: The writeback frame buffer metadata
> > @@ -183,11 +187,11 @@ static void blend(struct vkms_writeback_job *wb,
> > {
> > struct vkms_plane_state **plane = crtc_state->active_planes;
> > u32 n_active_planes = crtc_state->num_active_planes;
> > - int y_pos;
> >
> > const struct pixel_argb_u16 background_color = { .a = 0xffff };
> >
> > size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> > + size_t crtc_x_limit = crtc_state->base.crtc->mode.hdisplay;
>
> Wonder why these were size_t, causing needs to cast below...

For crtc_x_limit I just copied the crtc_y_limit. I will change both to u16
(the type of h/vdisplay).

> >
> > /*
> > * The planes are composed line-by-line. It is a necessary complexity to avoid poor
> > @@ -198,22 +202,133 @@ static void blend(struct vkms_writeback_job *wb,
> >
> > /* The active planes are composed associatively in z-order. */
> > for (size_t i = 0; i < n_active_planes; i++) {
> > - y_pos = get_y_pos(plane[i]->frame_info, y);
> > + struct vkms_plane_state *current_plane = plane[i];
> >
> > - if (!check_limit(plane[i]->frame_info, y_pos))
> > + /* Avoid rendering useless lines */
> > + if (y < current_plane->frame_info->dst.y1 ||
> > + y >= current_plane->frame_info->dst.y2) {
> > continue;
> > -
> > - vkms_compose_row(stage_buffer, plane[i], y_pos);
> > - pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
> > - output_buffer);
> > + }
> > +
> > + /*
> > + * src_px is the line to copy. The initial coordinates are inside the
> > + * destination framebuffer, and then drm_rect_* helpers are used to
> > + * compute the correct position into the source framebuffer.
> > + */
> > + struct drm_rect src_px = DRM_RECT_INIT(
> > + current_plane->frame_info->dst.x1, y,
> > + drm_rect_width(&current_plane->frame_info->dst), 1);
> > + struct drm_rect tmp_src;
> > +
> > + drm_rect_fp_to_int(&tmp_src, &current_plane->frame_info->src);
> > +
> > + /*
> > + * [1]: Clamping src_px to the crtc_x_limit to avoid writing outside of the
> > + * destination buffer
> > + */
> > + src_px.x2 = min_t(int, src_px.x2, (int)crtc_x_limit);
>
> Up to and including this point, it would be better if src_px was called
> dst_px, because only the below computation converts it into actual
> src_px.

I agree, it will be changed for the v4. I will also change the name to
`dst_line` and `src_line`.

> > +
> > + /*
> > + * Transform the coordinate x/y from the crtc to coordinates into
> > + * coordinates for the src buffer.
> > + *
> > + * - Cancel the offset of the dst buffer.
> > + * - Invert the rotation. This assumes that
> > + * dst = drm_rect_rotate(src, rotation) (dst and src have the
> > + * same size, but can be rotated).
> > + * - Apply the offset of the source rectangle to the coordinate.
> > + */
> > + drm_rect_translate(&src_px, -current_plane->frame_info->dst.x1,
> > + -current_plane->frame_info->dst.y1);
> > + drm_rect_rotate_inv(&src_px,
> > + drm_rect_width(&tmp_src),
> > + drm_rect_height(&tmp_src),
> > + current_plane->frame_info->rotation);
> > + drm_rect_translate(&src_px, tmp_src.x1, tmp_src.y1);
> > +
> > + /* Get the correct reading direction in the source buffer. */
> > +
> > + enum pixel_read_direction direction =
> > + direction_for_rotation(current_plane->frame_info->rotation);
> > +
> > + int x_start = src_px.x1;
> > + int y_start = src_px.y1;
> > + int pixel_count;
> > + /* [2]: Compute and clamp the number of pixel to read */
> > + if (direction == READ_RIGHT || direction == READ_LEFT) {
> > + /*
> > + * In horizontal reading, the src_px width is the number of pixel to
> > + * read
> > + */
> > + pixel_count = drm_rect_width(&src_px);
> > + if (x_start < 0) {
> > + pixel_count += x_start;
> > + x_start = 0;
> > + }
> > + if (x_start + pixel_count > current_plane->frame_info->fb->width) {
> > + pixel_count =
> > + (int)current_plane->frame_info->fb->width - x_start;
> > + }
> > + } else {
> > + /*
> > + * In vertical reading, the src_px height is the number of pixel to
> > + * read
> > + */
> > + pixel_count = drm_rect_height(&src_px);
> > + if (y_start < 0) {
> > + pixel_count += y_start;
> > + y_start = 0;
> > + }
> > + if (y_start + pixel_count > current_plane->frame_info->fb->height) {
> > + pixel_count =
> > + (int)current_plane->frame_info->fb->width - y_start;
> > + }
> > + }
> > +
> > + if (pixel_count <= 0) {
> > + /* Nothing to read, so avoid multiple function calls for nothing */
> > + continue;
> > + }
> > +
> > + /*
> > + * Modify the starting point to take in account the rotation
> > + *
> > + * src_px is the top-left corner, so when reading READ_LEFT or READ_TOP, it
> > + * must be changed to the top-right/bottom-left corner.
> > + */
> > + if (direction == READ_LEFT) {
> > + // x_start is now the right point
> > + x_start += pixel_count - 1;
> > + } else if (direction == READ_UP) {
> > + // y_start is now the bottom point
> > + y_start += pixel_count - 1;
> > + }
> > +
> > + /*
> > + * Perform the conversion and the blending
> > + *
> > + * Here we know that the read line (x_start, y_start, pixel_count) is
> > + * inside the source buffer [2] and we don't write outside the stage
> > + * buffer [1]
> > + */
> > + current_plane->pixel_read_line(
> > + current_plane->frame_info,
> > + x_start,
> > + y_start,
> > + direction,
> > + pixel_count,
> > + &stage_buffer->pixels[current_plane->frame_info->dst.x1]);
> > +
> > + pre_mul_alpha_blend(stage_buffer, output_buffer,
> > + current_plane->frame_info->dst.x1,
> > + pixel_count);
> > }
>
> I stared at the above algorithm for a while, and I couldn't find
> anything obviously wrong, so good work.

Thanks for your review, I spent a lot of time writing this and thinking about
all the edge cases.

One thing I forgot is to clamp dst.x1 of the destination buffer. It will
be fixed in my v4.
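
The kind of clamping involved can be sketched in isolation as follows. This is only an illustration of the idea, assuming a one-dimensional span of pixels that must stay inside [0, limit); `clamp_span` is a hypothetical helper, not a function from the series.

```c
#include <assert.h>

/*
 * Hypothetical helper: clamp the span [*start, *start + *count) to [0, limit).
 * Returns how many pixels were dropped at the beginning of the span, so that a
 * caller can skip the same number of pixels on the other side of the copy.
 */
static int clamp_span(int *start, int *count, int limit)
{
	int skipped = 0;

	assert(start && count && limit >= 0);

	if (*start < 0) {
		/* Drop the part of the span that lies before position 0. */
		skipped = -*start;
		*count += *start;
		*start = 0;
	}
	if (*start + *count > limit)
		*count = limit - *start;
	if (*count < 0)
		*count = 0;

	return skipped;
}
```

A span that ends up with a count of zero has nothing to copy, which corresponds to the `pixel_count <= 0` early exit in the quoted code.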

[...]

> > +enum pixel_read_direction {
> > + READ_UP,
> > + READ_DOWN,
> > + READ_LEFT,
> > + READ_RIGHT
>
> When I saw these in code, I got a little confused. Does READ_LEFT mean
> read towards left, or read starting from left? It's very common to
> express reading directions as left-to-right and right-to-left rather
> than "left arrow".
>
> There are many choices how to improve this, e.g. upward, leftward,
> right-to-left, positive-x, negative-y.

I will change it to: READ_LEFT_TO_RIGHT, READ_RIGHT_TO_LEFT, ...

> > +};
> > +
> > /**
> > - * typedef pixel_read_t - These functions are used to read a pixel in the source frame,
> > +<<<<<<< HEAD
> > + * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
> > * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
> > *
> > - * @src_pixels: Pointer to the pixel to read
> > - * @out_pixel: Pointer to write the converted pixel
> > + * @frame_info: Frame used as source for the pixel value
> > + * @y: Y (height) coordinate in the source buffer
> > + * @x_start: X (width) coordinate of the first pixel to copy
> > + * @x_end: X (width) coordinate of the last pixel to copy
> > + * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
> > + * x_end.
> > */
> > -typedef void (*pixel_read_t)(u8 *src_pixels, struct pixel_argb_u16 *out_pixel);
> > +typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
>
> const frame_info I presume.

I agree.

> > + pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
> >
> > /**
> > * vkms_plane_state - Driver specific plane state
> > @@ -88,7 +100,7 @@ typedef void (*pixel_read_t)(u8 *src_pixels, struct pixel_argb_u16 *out_pixel);
> > struct vkms_plane_state {
> > struct drm_shadow_plane_state base;
> > struct vkms_frame_info *frame_info;
> > - pixel_read_t pixel_read;
> > + pixel_read_line_t pixel_read_line;
> > };
> >
> > struct vkms_plane {
> > @@ -193,7 +205,6 @@ int vkms_verify_crc_source(struct drm_crtc *crtc, const char *source_name,
> > /* Composer Support */
> > void vkms_composer_worker(struct work_struct *work);
> > void vkms_set_composer(struct vkms_output *out, bool enabled);
> > -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y);
> > void vkms_writeback_row(struct vkms_writeback_job *wb, const struct line_buffer *src_buffer, int y);
> >
> > /* Writeback */
> > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > index 1f5aeba57ad6..46daea6d3ee9 100644
> > --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > @@ -11,21 +11,29 @@
> >
> > /**
> > * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> > - * in the first plane
> > *
> > * @frame_info: Buffer metadata
> > * @x: The x coordinate of the wanted pixel in the buffer
> > * @y: The y coordinate of the wanted pixel in the buffer
> > + * @plane_index: The index of the plane to use
> > *
> > * The caller must be aware that this offset is not always a pointer to a pixel. If individual
> > * pixel values are needed, they have to be extracted from the resulting block.
> > */
> > -static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> > +static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int x, int y,
> > + size_t plane_index)
> > {
> > struct drm_framebuffer *fb = frame_info->fb;
> > -
> > - return fb->offsets[0] + (y * fb->pitches[0])
> > - + (x * fb->format->cpp[0]);
> > + const struct drm_format_info *format = frame_info->fb->format;
> > + /* Directly using x and y to multiply pitches and format->ccp is not sufficient because
> > + * in some formats a block can represent multiple pixels.
> > + *
> > + * Dividing x and y by the block size allows to extract the correct offset of the block
> > + * containing the pixel.
> > + */
> > + return fb->offsets[plane_index] +
> > + (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
> > + (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
>
> These changes do not seem like they belong with "re-introduce
> line-per-line composition algorithm" but some other patch.

I will extract this change and the next one into another commit:
"drm/vkms: Update pixel accessors to support packed pixel formats and
multi-plane"

> > }
> >
> > /**
> > @@ -35,44 +43,56 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
> > * @frame_info: Buffer metadata
> > * @x: The x(width) coordinate inside the plane
> > * @y: The y(height) coordinate inside the plane
> > + * @plane_index: The index of the plane
> > *
> > - * Takes the information stored in the frame_info, a pair of coordinates, and
> > - * returns the address of the first color channel.
> > - * This function assumes the channels are packed together, i.e. a color channel
> > - * comes immediately after another in the memory. And therefore, this function
> > - * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> > + * Takes the information stored in the frame_info, a pair of coordinates, and returns the address
> > + * of the block containing this pixel.
> > + * The caller must be aware that this pointer is sometimes not directly a pixel, it needs some
> > + * additional work to extract pixel color from this block.
> > */
> > static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> > - int x, int y)
> > + int x, int y, size_t plane_index)
> > {
> > - size_t offset = pixel_offset(frame_info, x, y);
> > -
> > - return (u8 *)frame_info->map[0].vaddr + offset;
> > + return (u8 *)frame_info->map[0].vaddr + packed_pixels_offset(frame_info, x, y, plane_index);
>
> This too.

It will be in the same commit as above.

> > }
> >
> > -static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> > +/**
> > + * get_step_1x1() - Common helper to compute the correct step value between each pixel to read in a
> > + * certain direction.
> > + * This must be used only with format where blockh == blockw == 1.
> > + * In the case when direction is not a valid pixel_read_direction, the returned step is 0, so you
> > + * must not rely on this result to create a loop variant.
> > + *
> > + * @fb Framebuffer to iter on
> > + * @direction Direction of the reading
> > + */
> > +static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction direction,
> > + int plane_index)
> > {
> > - int x_src = frame_info->src.x1 >> 16;
> > - int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
> > -
> > - return packed_pixels_addr(frame_info, x_src, y_src);
> > + switch (direction) {
> > + default:
> > + DRM_ERROR("Invalid direction for pixel reading: %d\n", direction);
> > + return 0;
>
> What I'd do here is move the default: section outside of the switch
> completely. Then the compiler can warn if any enum value is not handled
> here. Since every case in the switch is a return statement, falling out
> of the switch block is the default case.

Oh, I did not know that gcc can warn about unhandled enum values in a
switch; I will definitely do it for the v4.
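
A minimal sketch of the pattern suggested above, written as plain userspace C. The function name `get_step`, its `cpp` and `pitch` parameters and the enum values are illustrative assumptions; the real helper reads these values from a struct drm_framebuffer. With no `default:` label, gcc's -Wswitch (enabled by -Wall) reports any enum value that is missing from the switch, and the code after the switch handles values outside the enum.

```c
#include <stdio.h>

/* Mirrors the enum used by the patch; the numeric values are illustrative. */
enum pixel_read_direction {
	READ_UP,
	READ_DOWN,
	READ_LEFT,
	READ_RIGHT
};

/*
 * Hypothetical stand-in for get_step_1x1(): cpp (bytes per pixel) and pitch
 * (bytes per line) are passed in directly.
 */
static int get_step(enum pixel_read_direction direction, int cpp, int pitch)
{
	/* No default: label, so the compiler can flag a missing case. */
	switch (direction) {
	case READ_RIGHT:
		return cpp;
	case READ_LEFT:
		return -cpp;
	case READ_DOWN:
		return pitch;
	case READ_UP:
		return -pitch;
	}

	/* Only reached when direction holds a value outside the enum. */
	fprintf(stderr, "Invalid direction for pixel reading: %d\n", (int)direction);
	return 0;
}
```

Every case returns, so falling out of the switch plays the role of the former default branch.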

> Maybe the enum variable containing an illegal value could be handled
> more harshly so that callers could rely on this function always
> returning a good value?
>
> Just like passing in fb=NULL is handled by the kernel as an OOPS.

I don't think it's a good idea to OOPS inside a driver. An error here is
maybe dangerous, but is not fatal to the kernel. Maybe you know how to do
a "local" OOPS to break only this driver and not the whole kernel?

For the v4 I will keep a DRM_ERROR and return 0.

> > + case READ_RIGHT:
> > + return fb->format->char_per_block[plane_index];
> > + case READ_LEFT:
> > + return -fb->format->char_per_block[plane_index];
> > + case READ_DOWN:
> > + return (int)fb->pitches[plane_index];
> > + case READ_UP:
> > + return -(int)fb->pitches[plane_index];
> > + }
> > }
> >
> > -static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
> > -{
> > - if (frame_info->rotation & (DRM_MODE_REFLECT_X | DRM_MODE_ROTATE_270))
> > - return limit - x - 1;
> > - return x;
> > -}
> >
> > /*
> > - * The following functions take pixel data from the buffer and convert them to the format
> > + * The following functions take pixel data (a, r, g, b, pixel, ...), convert them to the format
> > * ARGB16161616 in out_pixel.
> > *
> > - * They are used in the `vkms_compose_row` function to handle multiple formats.
> > + * They are used in the `read_line`s functions to avoid duplicate work for some pixel formats.
> > */
> >
> > -static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> > +static void ARGB8888_to_argb_u16(struct pixel_argb_u16 *out_pixel, int a, int r, int g, int b)
>
> The function name ARGB8888_to_argb_u16() is confusing. It's not taking
> in ARGB8888 pixels but separate a,r,g,b ints. The only assumption it
> needs from the pixel format is the 8888 part.

I don't really know how to name it. What I like about ARGB8888 is that it
makes clear that the values are 8 bits wide and in ARGB order.
Do you think that `argb_u8_to_argb_u16`, with a new structure
pixel_argb_u8, would be better (like pixel_yuv_u8 in PATCH 6/9)?

If so, I will introduce the argb_u8 structure in another commit.

[...]

> > + * The following functions are read_line function for each pixel format supported by VKMS.
> > *
> > - * This function composes a single row of a plane. It gets the source pixels
> > - * through the y coordinate (see get_packed_src_addr()) and goes linearly
> > - * through the source pixel, reading the pixels and converting it to
> > - * ARGB16161616 (see the pixel_read() callback). For rotate-90 and rotate-270,
> > - * the source pixels are not traversed linearly. The source pixels are queried
> > - * on each iteration in order to traverse the pixels vertically.
> > + * They read a line starting at the point @x_start,@y_start following the @direction. The result
> > + * is stored in @out_pixel and in the format ARGB16161616.
> > + *
> > + * Those function are very similar, but it is required for performance reason. In the past, some
> > + * experiment were done, and with a generic loop the performance are very reduced [1].
> > + *
> > + * [1]: https://lore.kernel.org/dri-devel/[email protected]/
> > */
> > -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y)
> > +
> > +static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> > + enum pixel_read_direction direction, int count,
> > + struct pixel_argb_u16 out_pixel[])
> > +{
> > + u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> > +
> > + int step = get_step_1x1(frame_info->fb, direction, 0);
> > +
> > + while (count) {
> > + u8 *px = (u8 *)src_pixels;
> > +
> > + ARGB8888_to_argb_u16(out_pixel, px[3], px[2], px[1], px[0]);
> > + out_pixel += 1;
> > + src_pixels += step;
> > + count--;
>
> btw. you could eliminate decrementing 'count' if you computed end
> address and used while (out_pixel < end).

Yes, you are right, but after thinking about it, neither
while (out_pixel < end) nor while (count) conveys "this loop will copy
`count` pixels". I think a for-loop is more understandable here: there is
no ambiguity in the number of pixels written, and it is less error-prone.
I will replace
	while (count)
by
	for (int i = 0; i < count; i++)
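
A minimal sketch of such a for-loop based reader, in plain userspace C. The function `argb8888_read_line` and its `src`/`step` parameters are simplified assumptions (the real callback takes a frame_info and a direction); the byte order and the multiplication by 257 to widen 8 bits to 16 bits follow the quoted code.

```c
#include <stddef.h>
#include <stdint.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/*
 * Hypothetical simplified read_line: src points at the first pixel to read,
 * step is the signed byte distance to the next pixel (what get_step_1x1()
 * would return for the chosen direction).
 */
static void argb8888_read_line(const uint8_t *src, int step, int count,
			       struct pixel_argb_u16 out_pixel[])
{
	for (int i = 0; i < count; i++) {
		/* Index from the start so no out-of-range pointer is formed. */
		const uint8_t *px = src + (ptrdiff_t)i * step;

		/* In memory the bytes are B, G, R, A; 0xff * 257 == 0xffff. */
		out_pixel[i].a = (uint16_t)(px[3] * 257);
		out_pixel[i].r = (uint16_t)(px[2] * 257);
		out_pixel[i].g = (uint16_t)(px[1] * 257);
		out_pixel[i].b = (uint16_t)(px[0] * 257);
	}
}
```

The loop bound states directly that exactly `count` pixels are written, for both positive and negative steps.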

Kind regards,
Louis Chauvet

[...]

--
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2024-02-27 15:10:10

by Louis Chauvet

Subject: Re: [PATCH v2 4/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions

On 26/02/24 - 13:36, Pekka Paalanen wrote:
> On Fri, 23 Feb 2024 12:37:24 +0100
> Louis Chauvet <[email protected]> wrote:
>
> > Introduce two typedefs: pixel_read_t and pixel_write_t. It allows the
> > compiler to check if the passed functions take the correct arguments.
> > Such typedefs will help ensuring consistency across the code base in
> > case of update of these prototypes.
> >
> > Introduce a check around the get_pixel_*_functions to avoid using a
> > nullptr as a function.
> >
> > Document for those typedefs.
> >
> > Signed-off-by: Louis Chauvet <[email protected]>
> > ---
> > drivers/gpu/drm/vkms/vkms_drv.h | 23 +++++++++++++++++++++--
> > drivers/gpu/drm/vkms/vkms_formats.c | 8 ++++----
> > drivers/gpu/drm/vkms/vkms_formats.h | 4 ++--
> > drivers/gpu/drm/vkms/vkms_plane.c | 9 ++++++++-
> > drivers/gpu/drm/vkms/vkms_writeback.c | 9 ++++++++-
> > 5 files changed, 43 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> > index 18086423a3a7..886c885c8cf5 100644
> > --- a/drivers/gpu/drm/vkms/vkms_drv.h
> > +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> > @@ -53,12 +53,31 @@ struct line_buffer {
> > struct pixel_argb_u16 *pixels;
> > };
> >
> > +/**
> > + * typedef pixel_write_t - These functions are used to read a pixel from a
> > + * `struct pixel_argb_u16*`, convert it in a specific format and write it in the @dst_pixels
> > + * buffer.
> > + *
> > + * @dst_pixel: destination address to write the pixel
> > + * @in_pixel: pixel to write
> > + */
> > +typedef void (*pixel_write_t)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel);
>
> There are some inconsistencies in pixel_write_t and pixel_read_t. At
> this point of the series they still operate on a single pixel, but you
> use dst_pixels and src_pixels, plural. Yet the documentation correctly
> talks about processing a single pixel.

I will fix this for v4.

> I would also expect the source to be always const, but that's a whole
> another patch to change.

The v4 will contain a new patch, "drm/vkms: Use const pointer for
pixel_read and pixel_write functions".

[...]

> > diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> > index d5203f531d96..f68b1b03d632 100644
> > --- a/drivers/gpu/drm/vkms/vkms_plane.c
> > +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> > @@ -106,6 +106,13 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> > return;
> >
> > fmt = fb->format->format;
> > + pixel_read_t pixel_read = get_pixel_read_function(fmt);
> > +
> > + if (!pixel_read) {
> > + DRM_WARN("Pixel format is not supported by VKMS planes. State is inchanged\n");
>
> DRM_WARN() is the kernel equivalent to userspace assert(), right?

DRM_WARN is just a standard printk(KERN_WARNING ...) (hidden behind DRM
internal macros).

> In that failing the check means an internal invariant was violated,
> which means a code bug in kernel?
>
> Maybe this could be more specific about what invariant was violated?
> E.g. atomic check should have rejected this attempt already.

I'm not an expert (yet) in DRM, so please correct me:
When atomic_update is called, has the new state always been validated by
atomic_check beforehand? Is there no way to pass something not validated by
atomic_check to atomic_update? If this is the case, yes, it should be an
ERROR and not a WARN, since it means an invalid format passed the
atomic_check verification.

If so, is this better?

	if (!pixel_read) {
		/*
		 * This is a bug as the vkms_plane_atomic_check must forbid all unsupported formats.
		 */
		DRM_ERROR("Pixel format %p4cc is not supported by VKMS planes.\n", &fmt);
		return;
	}

I will put the same code in vkms_writeback.c.
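
A minimal sketch of this lookup-then-check pattern in plain userspace C. The format codes, `plane_update()` and its `slot` parameter are hypothetical and only stand in for the DRM fourcc values and the plane state; the point shown is that the function pointer is stored only after the lookup result has been checked.

```c
#include <stddef.h>
#include <stdint.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

typedef void (*pixel_read_t)(const uint8_t *src_pixel, struct pixel_argb_u16 *out_pixel);

/* Illustrative format codes, not the real DRM fourcc values. */
enum { FMT_ARGB8888 = 1, FMT_XRGB8888 = 2, FMT_UNKNOWN = 99 };

static void argb8888_to_argb_u16(const uint8_t *src_pixel, struct pixel_argb_u16 *out_pixel)
{
	out_pixel->a = (uint16_t)(src_pixel[3] * 257);
	out_pixel->r = (uint16_t)(src_pixel[2] * 257);
	out_pixel->g = (uint16_t)(src_pixel[1] * 257);
	out_pixel->b = (uint16_t)(src_pixel[0] * 257);
}

static void xrgb8888_to_argb_u16(const uint8_t *src_pixel, struct pixel_argb_u16 *out_pixel)
{
	out_pixel->a = 0xffff; /* X byte is ignored, pixel is opaque */
	out_pixel->r = (uint16_t)(src_pixel[2] * 257);
	out_pixel->g = (uint16_t)(src_pixel[1] * 257);
	out_pixel->b = (uint16_t)(src_pixel[0] * 257);
}

/* Returns NULL for formats this sketch does not know about. */
static pixel_read_t get_pixel_read_function(uint32_t format)
{
	switch (format) {
	case FMT_ARGB8888:
		return &argb8888_to_argb_u16;
	case FMT_XRGB8888:
		return &xrgb8888_to_argb_u16;
	default:
		return NULL;
	}
}

/* Returns 0 on success, -1 (leaving *slot untouched) on unknown format. */
static int plane_update(uint32_t format, pixel_read_t *slot)
{
	pixel_read_t pixel_read = get_pixel_read_function(format);

	if (!pixel_read)
		return -1;

	*slot = pixel_read;
	return 0;
}
```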

[...]

Kind regards,
Louis Chauvet



2024-02-27 15:10:41

by Louis Chauvet

Subject: Re: [PATCH v2 6/9] drm/vkms: Add YUV support

Hi Pekka,

For all the comments related to the conversion part, maybe Arthur has an
opinion; I took his patch as a "black box" (I did not want to
break (and debug) it).

On 26/02/24 - 14:19, Pekka Paalanen wrote:
> On Fri, 23 Feb 2024 12:37:26 +0100
> Louis Chauvet <[email protected]> wrote:
>
> > From: Arthur Grillo <[email protected]>
> >
> > Add support to the YUV formats below:
> >
> > - NV12
> > - NV16
> > - NV24
> > - NV21
> > - NV61
> > - NV42
> > - YUV420
> > - YUV422
> > - YUV444
> > - YVU420
> > - YVU422
> > - YVU444
> >
> > The conversion matrices of each encoding and range were obtained by
> > rounding the values of the original conversion matrices multiplied by
> > 2^8. This is done to avoid the use of fixed point operations.
> >
> > Signed-off-by: Arthur Grillo <[email protected]>
> > [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
> > callbacks for yuv formats]
> > Signed-off-by: Louis Chauvet <[email protected]>
> > ---
> > drivers/gpu/drm/vkms/vkms_composer.c | 2 +-
> > drivers/gpu/drm/vkms/vkms_drv.h | 6 +-
> > drivers/gpu/drm/vkms/vkms_formats.c | 289 +++++++++++++++++++++++++++++++++--
> > drivers/gpu/drm/vkms/vkms_formats.h | 4 +
> > drivers/gpu/drm/vkms/vkms_plane.c | 14 +-
> > 5 files changed, 295 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> > index e555bf9c1aee..54fc5161d565 100644
> > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> > @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
> > * buffer [1]
> > */
> > current_plane->pixel_read_line(
> > - current_plane->frame_info,
> > + current_plane,
> > x_start,
> > y_start,
> > direction,
> > diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> > index ccc5be009f15..a4f6456cb971 100644
> > --- a/drivers/gpu/drm/vkms/vkms_drv.h
> > +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> > @@ -75,6 +75,8 @@ enum pixel_read_direction {
> > READ_RIGHT
> > };
> >
> > +struct vkms_plane_state;
> > +
> > /**
> > <<<<<<< HEAD
> > * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
> > @@ -87,8 +89,8 @@ enum pixel_read_direction {
> > * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
> > * x_end.
> > */
> > -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
> > - pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
> > +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
> > + enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>
> This is the second or third time in this one series changing this type.
> Could you not do the change once, in its own patch if possible?

Sorry, this is not an actual change, just wrong formatting (missed when
rebasing).

Do you think it makes sense to re-order my patches and put this
typedef at the end? This way it is never updated.

> >
> > /**
> > * vkms_plane_state - Driver specific plane state
> > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > index 46daea6d3ee9..515c80866a58 100644
> > --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
> > */
> > return fb->offsets[plane_index] +
> > (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
> > - (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
> > + (x / drm_format_info_block_height(format, plane_index)) *
> > + format->char_per_block[plane_index];
>
> Shouldn't this be in the patch that added this code in the first place?

Same as above, this is wrong formatting. I will remove this change and keep
everything on one line (even if it's more than 100 chars, it is easier to
read).

> > }
> >
> > /**
> > @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
> > }
> > }
> >
> > +/**
> > + * get_subsampling() - Get the subsampling value on a specific direction
>
> subsampling divisor

Thanks for this precision.

> > + */
> > +static int get_subsampling(const struct drm_format_info *format,
> > + enum pixel_read_direction direction)
> > +{
> > + if (direction == READ_LEFT || direction == READ_RIGHT)
> > + return format->hsub;
> > + else if (direction == READ_DOWN || direction == READ_UP)
> > + return format->vsub;
> > + return 1;
>
> In this and the below function, personally I'd prefer switch-case, with
> a cannot-happen-scream after the switch, so the compiler can warn about
> unhandled enum values.

As for the previous patch, I did not know about this compiler feature,
thanks!

> > +}
> > +
> > +/**
> > + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
> > + */
> > +static int get_subsampling_offset(const struct drm_format_info *format,
> > + enum pixel_read_direction direction, int x_start, int y_start)
>
> 'start' values as "increments" for a pixel counter? Is something
> misnamed here?
>
> Is it an increment or an offset?

I don't really know how to name the function; I'm open to suggestions.
x_start and y_start really are the coordinates of the starting reading point.

To explain what it does:

When using subsampling, you have to advance through planes[1..4] at a
different "speed" than plane[0]. But I can't rely only on
"read_pixel_count % subsampling == 0", because the increments on
planes[1..4] may then not be aligned with the buffer (if hsub=2 and the
start pixel is 1, I need to increment planes[1..4] only for x=2,4,6...,
not 1,3,5...).

A way to ensure this is to add an "offset" to count, which ensures that
count % subsampling == 0 on the correct pixel.

I made an error: the switch case must be as follows (as count always
counts up, for "inverted" reading directions a negative number ensures
that % subsampling == 0 on the correct pixel):

	switch (direction) {
	case READ_UP:
		return -y_start;
	case READ_DOWN:
		return y_start;
	case READ_LEFT:
		return -x_start;
	case READ_RIGHT:
		return x_start;
	}
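
A small worked sketch of the alignment problem, restricted to the READ_RIGHT case (the inverted directions are deliberately left out). The helper `chroma_index_after()` and its `use_offset` flag are hypothetical; the loop condition is the one from the quoted read_line functions. It returns the chroma column reached after reading `count` luma pixels, which should equal (x_start + count) / hsub.

```c
/*
 * Hypothetical helper: simulate the chroma pointer while reading `count`
 * luma pixels to the right, starting at x_start, with horizontal
 * subsampling divisor hsub. With use_offset == 0 the start position is
 * ignored, which is the misaligned behaviour described above.
 */
static int chroma_index_after(int x_start, int hsub, int count, int use_offset)
{
	int chroma = x_start / hsub;                    /* starting chroma column */
	int subsampling_offset = use_offset ? x_start : 0;

	for (int i = 0; i < count; i++) {
		/* luma pixel x_start + i would be read here */
		if ((i + subsampling_offset + 1) % hsub == 0)
			chroma++;                       /* next luma pixel uses a new chroma sample */
	}
	return chroma;
}
```

With hsub=2 and x_start=1, one pixel read with the offset moves the chroma column from 0 to 1 (the next luma pixel is x=2); without the offset it stays at 0.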

> > +{
> > + if (direction == READ_RIGHT || direction == READ_LEFT)
> > + return x_start;
> > + else if (direction == READ_DOWN || direction == READ_UP)
> > + return y_start;
> > + return 0;
> > +}
> > +

[...]

> > +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
> > + enum drm_color_encoding encoding, enum drm_color_range range)
> > +{
> > + static const s16 bt601_full[3][3] = {
> > + { 256, 0, 359 },
> > + { 256, -88, -183 },
> > + { 256, 454, 0 },
> > + };

[...]

> > +
> > + u8 r = 0;
> > + u8 g = 0;
> > + u8 b = 0;
> > + bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
> > + unsigned int y_offset = full ? 0 : 16;
> > +
> > + switch (encoding) {
> > + case DRM_COLOR_YCBCR_BT601:
> > + ycbcr2rgb(full ? bt601_full : bt601,
>
> Doing all these conditional again pixel by pixel is probably
> inefficient. Just like with the line reading functions, you could pick
> the matrix in advance.

I don't think the performance impact is huge (it's only a pair of ifs), but
yes, it's an easy optimization.

I will create a conversion_matrix structure:

	struct conversion_matrix {
		s16 matrix[3][3];
		u16 y_offset;
	};

I will create a `get_conversion_matrix_to_argb_u16` function to get this
structure from a format+encoding+range.

I will also add a `conversion_matrix` field to struct vkms_plane_state so
that this matrix is looked up only once per plane setup.
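
A minimal userspace sketch of this idea, assuming the matrix is chosen once by the caller and then reused for every pixel of a line. Only the BT.601 full-range matrix quoted in the patch is filled in; `convert_line()`, the three separate input arrays, the subtraction of 128 from the chroma samples and the clamping are assumptions of this sketch, not the patch's code.

```c
#include <stdint.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

struct conversion_matrix {
	int16_t matrix[3][3];
	uint16_t y_offset;
};

/* BT.601 full-range coefficients scaled by 2^8, as quoted in the patch. */
static const struct conversion_matrix bt601_full = {
	.matrix = {
		{ 256,   0,  359 },
		{ 256, -88, -183 },
		{ 256, 454,    0 },
	},
	.y_offset = 0,
};

static uint16_t clamp_u8_to_u16(int32_t v)
{
	if (v < 0)
		v = 0;
	if (v > 255)
		v = 255;
	return (uint16_t)(v * 257); /* widen 8 bits to 16 bits */
}

/* The matrix is picked once by the caller, not once per pixel. */
static void convert_line(const struct conversion_matrix *m, const uint8_t *y,
			 const uint8_t *u, const uint8_t *v, int count,
			 struct pixel_argb_u16 out_pixel[])
{
	for (int i = 0; i < count; i++) {
		int32_t fy = (int32_t)y[i] - m->y_offset;
		int32_t fu = (int32_t)u[i] - 128; /* assumed chroma centering */
		int32_t fv = (int32_t)v[i] - 128;
		int32_t rgb[3];

		for (int c = 0; c < 3; c++)
			rgb[c] = (m->matrix[c][0] * fy + m->matrix[c][1] * fu +
				  m->matrix[c][2] * fv) / 256;

		out_pixel[i].a = 0xffff;
		out_pixel[i].r = clamp_u8_to_u16(rgb[0]);
		out_pixel[i].g = clamp_u8_to_u16(rgb[1]);
		out_pixel[i].b = clamp_u8_to_u16(rgb[2]);
	}
}
```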


> > + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> > + break;
> > + case DRM_COLOR_YCBCR_BT709:
> > + ycbcr2rgb(full ? rec709_full : rec709,
> > + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> > + break;
> > + case DRM_COLOR_YCBCR_BT2020:
> > + ycbcr2rgb(full ? bt2020_full : bt2020,
> > + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> > + break;
> > + default:
> > + pr_warn_once("Not supported color encoding\n");
> > + break;
> > + }
> > +
> > + argb_u16->r = r * 257;
> > + argb_u16->g = g * 257;
> > + argb_u16->b = b * 257;
>
> I wonder. Using 8-bit fixed point precision seems quite coarse for
> 8-bit pixel formats, and it's going to be insufficient for higher bit
> depths. Was supporting e.g. 10-bit YUV considered? There is even
> deeper, too, like DRM_FORMAT_P016.

Good point. As I explained above, I took the conversion part as a
"black box" to avoid breaking (and debugging) stuff. I think it's easy to
switch to an s32 matrix in 16.16 fixed point (or anything with more than
16 bits in the fractional part).

Maybe Arthur has an opinion on this?

Just to be sure, doesn't the DRM subsystem already have such matrices
somewhere? It would be nice to avoid duplicating them.

> > +}
> > +
> > /*
> > * The following functions are read_line function for each pixel format supported by VKMS.
> > *
> > @@ -142,13 +250,13 @@ static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
> > * [1]: https://lore.kernel.org/dri-devel/[email protected]/
> > */
> >
> > -static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> > +static void ARGB8888_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> > enum pixel_read_direction direction, int count,
> > struct pixel_argb_u16 out_pixel[])
> > {
> > - u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> > + u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> >
> > - int step = get_step_1x1(frame_info->fb, direction, 0);
> > + int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>
> These are the kind of changes I would not expect to see in a patch
> adding YUV support. There are a lot of them, too.

I will move this change directly into PATCH 5/9.

[...]

> > +static void semi_planar_yuv_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> > + enum pixel_read_direction direction, int count,
> > + struct pixel_argb_u16 out_pixel[])
> > +{
> > + u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> > + u8 *uv_plane = packed_pixels_addr(plane->frame_info,
> > + x_start / plane->frame_info->fb->format->hsub,
> > + y_start / plane->frame_info->fb->format->vsub,
> > + 1);
> > + struct pixel_yuv_u8 yuv_u8;
> > + int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> > + int step_uv = get_step_1x1(plane->frame_info->fb, direction, 1);
> > + int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> > + int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> > + x_start, y_start); // 0
> > +
> > + for (int i = 0; i < count; i++) {
> > + yuv_u8.y = y_plane[0];
> > + yuv_u8.u = uv_plane[0];
> > + yuv_u8.v = uv_plane[1];
> > +
> > + yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> > + plane->base.base.color_range);
>
> Oh, so this was the reason to change the read-line function signature.
> Maybe just stash a pointer to the right matrix and the right y_offset
> in frame_info instead?

Yes, that's why I changed the signature. I think I will keep this signature
and put the conversion_matrix inside the vkms_plane_state; for me it makes
more sense to have pixel_read_line and conversion_matrix in the same
structure.

> > + out_pixel += 1;
> > + y_plane += step_y;
> > + if ((i + subsampling_offset + 1) % subsampling == 0)
> > + uv_plane += step_uv;
> > + }
> > +}
> > +
> > +static void semi_planar_yvu_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> > + enum pixel_read_direction direction, int count,
> > + struct pixel_argb_u16 out_pixel[])
> > +{
> > + u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> > + u8 *vu_plane = packed_pixels_addr(plane->frame_info,
> > + x_start / plane->frame_info->fb->format->hsub,
> > + y_start / plane->frame_info->fb->format->vsub,
> > + 1);
> > + struct pixel_yuv_u8 yuv_u8;
> > + int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> > + int step_vu = get_step_1x1(plane->frame_info->fb, direction, 1);
> > + int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> > + int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> > + x_start, y_start);
> > + for (int i = 0; i < count; i++) {
> > + yuv_u8.y = y_plane[0];
> > + yuv_u8.u = vu_plane[1];
> > + yuv_u8.v = vu_plane[0];
>
> You could swap matrix columns instead of writing this whole new
> function for UV vs. VU. Just an idea.

I was not happy with this duplication too, but I did not think about
switching columns. That's a good idea, thanks!
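
A minimal sketch of the column-swap idea in plain C. The helpers `swap_uv_columns()` and `apply_row()` are hypothetical names; the sketch only shows that swapping columns 1 and 2 of the matrix gives the same result as swapping the two chroma inputs, so a single read_line function can serve both UV and VU layouts.

```c
#include <stdint.h>

/* Swap the U and V columns (indices 1 and 2) of a 3x3 matrix in place. */
static void swap_uv_columns(int16_t m[3][3])
{
	for (int row = 0; row < 3; row++) {
		int16_t tmp = m[row][1];

		m[row][1] = m[row][2];
		m[row][2] = tmp;
	}
}

/* One output channel: dot product of a matrix row with (y, c0, c1). */
static int32_t apply_row(const int16_t row[3], int32_t y, int32_t c0, int32_t c1)
{
	return row[0] * y + row[1] * c0 + row[2] * c1;
}
```

A VU-ordered format would then read its two chroma bytes in memory order and use the swapped matrix, instead of needing a duplicated function.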

Kind regards,
Louis Chauvet

[...]


2024-02-27 20:01:50

by Arthur Grillo

Subject: Re: [PATCH v2 6/9] drm/vkms: Add YUV support



On 27/02/24 12:02, Louis Chauvet wrote:
> Hi Pekka,
>
> For all the comments related to the conversion part, maybe Arthur has an
> opinion; I took his patch as a "black box" (I did not want to
> break (and debug) it).
>
> On 26/02/24 - 14:19, Pekka Paalanen wrote:
>> On Fri, 23 Feb 2024 12:37:26 +0100
>> Louis Chauvet <[email protected]> wrote:
>>
>>> From: Arthur Grillo <[email protected]>
>>>
>>> Add support to the YUV formats below:
>>>
>>> - NV12
>>> - NV16
>>> - NV24
>>> - NV21
>>> - NV61
>>> - NV42
>>> - YUV420
>>> - YUV422
>>> - YUV444
>>> - YVU420
>>> - YVU422
>>> - YVU444
>>>
>>> The conversion matrices of each encoding and range were obtained by
>>> rounding the values of the original conversion matrices multiplied by
>>> 2^8. This is done to avoid the use of fixed point operations.
>>>
>>> Signed-off-by: Arthur Grillo <[email protected]>
>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
>>> callbacks for yuv formats]
>>> Signed-off-by: Louis Chauvet <[email protected]>
>>> ---
>>> drivers/gpu/drm/vkms/vkms_composer.c | 2 +-
>>> drivers/gpu/drm/vkms/vkms_drv.h | 6 +-
>>> drivers/gpu/drm/vkms/vkms_formats.c | 289 +++++++++++++++++++++++++++++++++--
>>> drivers/gpu/drm/vkms/vkms_formats.h | 4 +
>>> drivers/gpu/drm/vkms/vkms_plane.c | 14 +-
>>> 5 files changed, 295 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>> index e555bf9c1aee..54fc5161d565 100644
>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
>>> * buffer [1]
>>> */
>>> current_plane->pixel_read_line(
>>> - current_plane->frame_info,
>>> + current_plane,
>>> x_start,
>>> y_start,
>>> direction,
>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
>>> index ccc5be009f15..a4f6456cb971 100644
>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
>>> @@ -75,6 +75,8 @@ enum pixel_read_direction {
>>> READ_RIGHT
>>> };
>>>
>>> +struct vkms_plane_state;
>>> +
>>> /**
>>> <<<<<<< HEAD
>>> * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
>>> @@ -87,8 +89,8 @@ enum pixel_read_direction {
>>> * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
>>> * x_end.
>>> */
>>> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
>>> - pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>>> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
>>> + enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>>
>> This is the second or third time in this one series changing this type.
>> Could you not do the change once, in its own patch if possible?
>
> Sorry, this is not an actual change, just wrong formatting (missed when
> rebasing).
>
> Do you think it makes sense to re-order my patches and put this
> typedef at the end? This way it is never updated.
>
>>>
>>> /**
>>> * vkms_plane_state - Driver specific plane state
>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>> index 46daea6d3ee9..515c80866a58 100644
>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
>>> */
>>> return fb->offsets[plane_index] +
>>> (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
>>> - (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
>>> + (x / drm_format_info_block_height(format, plane_index)) *
>>> + format->char_per_block[plane_index];
>>
>> Shouldn't this be in the patch that added this code in the first place?
>
> Same as above, this is wrong formatting. I will remove this change and keep
> everything on one line (even if it's more than 100 chars, it is easier to
> read).
>
>>> }
>>>
>>> /**
>>> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
>>> }
>>> }
>>>
>>> +/**
>>> + * get_subsampling() - Get the subsampling value on a specific direction
>>
>> subsampling divisor
>
> Thanks for this precision.
>
>>> + */
>>> +static int get_subsampling(const struct drm_format_info *format,
>>> + enum pixel_read_direction direction)
>>> +{
>>> + if (direction == READ_LEFT || direction == READ_RIGHT)
>>> + return format->hsub;
>>> + else if (direction == READ_DOWN || direction == READ_UP)
>>> + return format->vsub;
>>> + return 1;
>>
>> In this and the below function, personally I'd prefer switch-case, with
>> a cannot-happen-scream after the switch, so the compiler can warn about
>> unhandled enum values.
>
> As for the previous patch, I did not know about this compiler feature,
> thanks!
>
>>> +}
>>> +
>>> +/**
>>> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
>>> + */
>>> +static int get_subsampling_offset(const struct drm_format_info *format,
>>> + enum pixel_read_direction direction, int x_start, int y_start)
>>
>> 'start' values as "increments" for a pixel counter? Is something
>> misnamed here?
>>
>> Is it an increment or an offset?
>
> I don't really know how to name the function; I'm open to suggestions.
> x_start and y_start really are the coordinates of the starting reading point.
>
> To explain what it does:
>
> When using subsampling, you have to advance through planes[1..4] at a
> different "speed" than plane[0]. But I can't rely only on
> "read_pixel_count % subsampling == 0", because the increments on
> planes[1..4] may then not be aligned with the buffer (if hsub=2 and the
> start pixel is 1, I need to increment planes[1..4] only for x=2,4,6...,
> not 1,3,5...).
>
> A way to ensure this is to add an "offset" to count, which ensures that
> count % subsampling == 0 on the correct pixel.
>
> I made an error: the switch case must be as follows (as count always
> counts up, for "inverted" reading directions a negative number ensures
> that % subsampling == 0 on the correct pixel):
>
> 	switch (direction) {
> 	case READ_UP:
> 		return -y_start;
> 	case READ_DOWN:
> 		return y_start;
> 	case READ_LEFT:
> 		return -x_start;
> 	case READ_RIGHT:
> 		return x_start;
> 	}
>
>>> +{
>>> + if (direction == READ_RIGHT || direction == READ_LEFT)
>>> + return x_start;
>>> + else if (direction == READ_DOWN || direction == READ_UP)
>>> + return y_start;
>>> + return 0;
>>> +}
>>> +
>
> [...]
>
>>> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
>>> + enum drm_color_encoding encoding, enum drm_color_range range)
>>> +{
>>> + static const s16 bt601_full[3][3] = {
>>> + { 256, 0, 359 },
>>> + { 256, -88, -183 },
>>> + { 256, 454, 0 },
>>> + };
>
> [...]
>
>>> +
>>> + u8 r = 0;
>>> + u8 g = 0;
>>> + u8 b = 0;
>>> + bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
>>> + unsigned int y_offset = full ? 0 : 16;
>>> +
>>> + switch (encoding) {
>>> + case DRM_COLOR_YCBCR_BT601:
>>> + ycbcr2rgb(full ? bt601_full : bt601,
>>
>> Doing all these conditional again pixel by pixel is probably
>> inefficient. Just like with the line reading functions, you could pick
>> the matrix in advance.
>
> I don't think the performance impact is huge (it's only a pair of ifs), but
> yes, it's an easy optimization.
>
> I will create a conversion_matrix structure:
>
> 	struct conversion_matrix {
> 		s16 matrix[3][3];
> 		u16 y_offset;
> 	};
>
> I will create a `get_conversion_matrix_to_argb_u16` function to get this
> structure from a format+encoding+range.
>
> I will also add a `conversion_matrix` field to struct vkms_plane_state so
> that this matrix is looked up only once per plane setup.
>
>
>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>> + break;
>>> + case DRM_COLOR_YCBCR_BT709:
>>> + ycbcr2rgb(full ? rec709_full : rec709,
>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>> + break;
>>> + case DRM_COLOR_YCBCR_BT2020:
>>> + ycbcr2rgb(full ? bt2020_full : bt2020,
>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>> + break;
>>> + default:
>>> + pr_warn_once("Not supported color encoding\n");
>>> + break;
>>> + }
>>> +
>>> + argb_u16->r = r * 257;
>>> + argb_u16->g = g * 257;
>>> + argb_u16->b = b * 257;
>>
>> I wonder. Using 8-bit fixed point precision seems quite coarse for
>> 8-bit pixel formats, and it's going to be insufficient for higher bit
>> depths. Was supporting e.g. 10-bit YUV considered? There is even
>> deeper, too, like DRM_FORMAT_P016.
>
> Good point. As I explained above, I took the conversion part as a
> "black box" to avoid breaking (and debugging) stuff. I think it's easy to
> switch to an s32 matrix in 16.16 fixed point (or anything with more than
> 16 bits in the fractional part).
>
> Maybe Arthur has an opinion on this?

Yeah, I don't see why we couldn't do that either. The 8-bit precision was
sufficient for those formats but, as Pekka rightly noted, it could be a
problem for higher bit depths. I just need to make my terrible python
script spit out those values XD.
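
As an illustration of what such a generator computes, here is a small sketch in C (userspace, since it uses floating point). The helper `to_fixed()` is hypothetical, and the floating-point inputs used with it are the commonly published BT.601 full-range coefficients (1.402, 0.344136, 0.714136, 1.772), which is an assumption about where the table comes from. Rounded with 8 fractional bits they reproduce the bt601_full values quoted above.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a floating-point matrix coefficient to
 * fixed point with frac_bits fractional bits, rounding half away from zero.
 */
static int64_t to_fixed(double coeff, unsigned int frac_bits)
{
	double scaled = coeff * (double)((int64_t)1 << frac_bits);

	return (int64_t)(scaled + (scaled >= 0 ? 0.5 : -0.5));
}
```

For example, to_fixed(1.402, 8) gives 359 as in the current table, while to_fixed(1.402, 16) gives 91881, which keeps eight more bits of the coefficient.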

> Just to be sure, doesn't the DRM subsystem already have such matrices
> somewhere? It would be nice to avoid duplicating them.

To my knowledge, such matrices do not exist in DRM; I think those normally
live in the hardware itself (*please* correct me if I'm wrong).

However, v4l2 has a similar table in
drivers/media/common/v4l2-tpg/v4l2-tpg-core.c (actually, I based my code
on it); unfortunately it's only 8-bit too.

Best Regards,
~Arthur Grillo

>
>>> +}
>>> +
>>> /*
>>> * The following functions are read_line function for each pixel format supported by VKMS.
>>> *
>>> @@ -142,13 +250,13 @@ static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
>>> * [1]: https://lore.kernel.org/dri-devel/[email protected]/
>>> */
>>>
>>> -static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
>>> +static void ARGB8888_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
>>> enum pixel_read_direction direction, int count,
>>> struct pixel_argb_u16 out_pixel[])
>>> {
>>> - u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
>>> + u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>>>
>>> - int step = get_step_1x1(frame_info->fb, direction, 0);
>>> + int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>>
>> These are the kind of changes I would not expect to see in a patch
>> adding YUV support. There are a lot of them, too.
>
> I will put this change directly in PATCHv2 5/9.
>
> [...]
>
>>> +static void semi_planar_yuv_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
>>> + enum pixel_read_direction direction, int count,
>>> + struct pixel_argb_u16 out_pixel[])
>>> +{
>>> + u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>>> + u8 *uv_plane = packed_pixels_addr(plane->frame_info,
>>> + x_start / plane->frame_info->fb->format->hsub,
>>> + y_start / plane->frame_info->fb->format->vsub,
>>> + 1);
>>> + struct pixel_yuv_u8 yuv_u8;
>>> + int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
>>> + int step_uv = get_step_1x1(plane->frame_info->fb, direction, 1);
>>> + int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
>>> + int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
>>> + x_start, y_start); // 0
>>> +
>>> + for (int i = 0; i < count; i++) {
>>> + yuv_u8.y = y_plane[0];
>>> + yuv_u8.u = uv_plane[0];
>>> + yuv_u8.v = uv_plane[1];
>>> +
>>> + yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
>>> + plane->base.base.color_range);
>>
>> Oh, so this was the reason to change the read-line function
>> signature. Maybe just stash a pointer to the right matrix and the
>> right y_offset in frame_info instead?
>
> Yes, that's why I changed the signature. I think I will keep this
> signature and put the conversion_matrix inside the vkms_plane_state;
> for me it makes more sense to have pixel_read_line and
> conversion_matrix in the same structure.
>
>>> + out_pixel += 1;
>>> + y_plane += step_y;
>>> + if ((i + subsampling_offset + 1) % subsampling == 0)
>>> + uv_plane += step_uv;
>>> + }
>>> +}
>>> +
>>> +static void semi_planar_yvu_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
>>> + enum pixel_read_direction direction, int count,
>>> + struct pixel_argb_u16 out_pixel[])
>>> +{
>>> + u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>>> + u8 *vu_plane = packed_pixels_addr(plane->frame_info,
>>> + x_start / plane->frame_info->fb->format->hsub,
>>> + y_start / plane->frame_info->fb->format->vsub,
>>> + 1);
>>> + struct pixel_yuv_u8 yuv_u8;
>>> + int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
>>> + int step_vu = get_step_1x1(plane->frame_info->fb, direction, 1);
>>> + int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
>>> + int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
>>> + x_start, y_start);
>>> + for (int i = 0; i < count; i++) {
>>> + yuv_u8.y = y_plane[0];
>>> + yuv_u8.u = vu_plane[1];
>>> + yuv_u8.v = vu_plane[0];
>>
>> You could swap matrix columns instead of writing this whole new
>> function for UV vs. VU. Just an idea.
>
> I was not happy with this duplication either, but I did not think about
> switching columns. That's a good idea, thanks!
>
> Kind regards, Louis Chauvet
>
> [...]
>

2024-02-29 01:52:34

by Arthur Grillo

Subject: Re: [PATCH v2 6/9] drm/vkms: Add YUV support



On 27/02/24 17:01, Arthur Grillo wrote:
>
>
> On 27/02/24 12:02, Louis Chauvet wrote:
>> Hi Pekka,
>>
>> For all the comments related to the conversion part, maybe Arthur has an
>> opinion on them; I took his patch as a "black box" (I did not want to
>> break (and debug) it).
>>
>> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :
>>> On Fri, 23 Feb 2024 12:37:26 +0100
>>> Louis Chauvet <[email protected]> wrote:
>>>
>>>> From: Arthur Grillo <[email protected]>
>>>>
>>>> Add support for the YUV formats below:
>>>>
>>>> - NV12
>>>> - NV16
>>>> - NV24
>>>> - NV21
>>>> - NV61
>>>> - NV42
>>>> - YUV420
>>>> - YUV422
>>>> - YUV444
>>>> - YVU420
>>>> - YVU422
>>>> - YVU444
>>>>
>>>> The conversion matrices of each encoding and range were obtained by
>>>> rounding the values of the original conversion matrices multiplied by
>>>> 2^8. This is done to avoid the use of floating point operations.
>>>>
>>>> Signed-off-by: Arthur Grillo <[email protected]>
>>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
>>>> callbacks for yuv formats]
>>>> Signed-off-by: Louis Chauvet <[email protected]>
>>>> ---
>>>> drivers/gpu/drm/vkms/vkms_composer.c | 2 +-
>>>> drivers/gpu/drm/vkms/vkms_drv.h | 6 +-
>>>> drivers/gpu/drm/vkms/vkms_formats.c | 289 +++++++++++++++++++++++++++++++++--
>>>> drivers/gpu/drm/vkms/vkms_formats.h | 4 +
>>>> drivers/gpu/drm/vkms/vkms_plane.c | 14 +-
>>>> 5 files changed, 295 insertions(+), 20 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>>> index e555bf9c1aee..54fc5161d565 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>>> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
>>>> * buffer [1]
>>>> */
>>>> current_plane->pixel_read_line(
>>>> - current_plane->frame_info,
>>>> + current_plane,
>>>> x_start,
>>>> y_start,
>>>> direction,
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
>>>> index ccc5be009f15..a4f6456cb971 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
>>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
>>>> @@ -75,6 +75,8 @@ enum pixel_read_direction {
>>>> READ_RIGHT
>>>> };
>>>>
>>>> +struct vkms_plane_state;
>>>> +
>>>> /**
>>>> <<<<<<< HEAD
>>>> * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
>>>> @@ -87,8 +89,8 @@ enum pixel_read_direction {
>>>> * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
>>>> * x_end.
>>>> */
>>>> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
>>>> - pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>>>> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
>>>> + enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>>>
>>> This is the second or third time in this one series changing this type.
>>> Could you not do the change once, in its own patch if possible?
>>
>> Sorry, this is not a change here, but wrong formatting (missed when
>> rebasing).
>>
>> Do you think that it makes sense to re-order my patches and put this
>> typedef at the end? This way it is never updated.
>>
>>>>
>>>> /**
>>>> * vkms_plane_state - Driver specific plane state
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> index 46daea6d3ee9..515c80866a58 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
>>>> */
>>>> return fb->offsets[plane_index] +
>>>> (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
>>>> - (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
>>>> + (x / drm_format_info_block_height(format, plane_index)) *
>>>> + format->char_per_block[plane_index];
>>>
>>> Shouldn't this be in the patch that added this code in the first place?
>>
>> Same as above: wrong formatting. I will remove this change and keep
>> everything on one line (even if it's more than 100 chars, it is easier to
>> read).
>>
>>>> }
>>>>
>>>> /**
>>>> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
>>>> }
>>>> }
>>>>
>>>> +/**
>>>> + * get_subsampling() - Get the subsampling value on a specific direction
>>>
>>> subsampling divisor
>>
>> Thanks for the clarification.
>>
>>>> + */
>>>> +static int get_subsampling(const struct drm_format_info *format,
>>>> + enum pixel_read_direction direction)
>>>> +{
>>>> + if (direction == READ_LEFT || direction == READ_RIGHT)
>>>> + return format->hsub;
>>>> + else if (direction == READ_DOWN || direction == READ_UP)
>>>> + return format->vsub;
>>>> + return 1;
>>>
>>> In this and the below function, personally I'd prefer switch-case, with
>>> a cannot-happen-scream after the switch, so the compiler can warn about
>>> unhandled enum values.
>>
>> As for the previous patch, I did not know about this compiler feature,
>> thanks!
>>
>>>> +}
>>>> +
>>>> +/**
>>>> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
>>>> + */
>>>> +static int get_subsampling_offset(const struct drm_format_info *format,
>>>> + enum pixel_read_direction direction, int x_start, int y_start)
>>>
>>> 'start' values as "increments" for a pixel counter? Is something
>>> misnamed here?
>>>
>>> Is it an increment or an offset?
>>
>> I don't really know how to name the function. I'm open to suggestions.
>> x_start and y_start really are the coordinates of the starting reading point.
>>
>> To explain what it does:
>>
>> When using subsampling, you have to read the next pixel of planes[1..4]
>> not at the same "speed" as plane[0]. But I can't rely only on
>> "read_pixel_count % subsampling == 0", because it means that the pixel
>> incrementation on planes[1..4] may not be aligned with the buffer (if
>> hsub=2 and the start pixel is 1, I need to increment planes[1..4] only
>> for x=2,4,6... not 1,3,5...).
>>
>> A way to ensure this is to add an "offset" to count, which ensures that
>> count % subsampling == 0 on the correct pixel.
>>
>> I made an error; the switch case must be as follows (as count is always
>> counting up, for "inverted" reading directions a negative number ensures
>> that count % subsampling == 0 on the correct pixel):
>>
>> switch (direction) {
>> case READ_UP:
>> return -y_start;
>> case READ_DOWN:
>> return y_start;
>> case READ_LEFT:
>> return -x_start;
>> case READ_RIGHT:
>> return x_start;
>> }
>>
>>>> +{
>>>> + if (direction == READ_RIGHT || direction == READ_LEFT)
>>>> + return x_start;
>>>> + else if (direction == READ_DOWN || direction == READ_UP)
>>>> + return y_start;
>>>> + return 0;
>>>> +}
>>>> +
>>
>> [...]
>>
>>>> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
>>>> + enum drm_color_encoding encoding, enum drm_color_range range)
>>>> +{
>>>> + static const s16 bt601_full[3][3] = {
>>>> + { 256, 0, 359 },
>>>> + { 256, -88, -183 },
>>>> + { 256, 454, 0 },
>>>> + };
>>
>> [...]
>>
>>>> +
>>>> + u8 r = 0;
>>>> + u8 g = 0;
>>>> + u8 b = 0;
>>>> + bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
>>>> + unsigned int y_offset = full ? 0 : 16;
>>>> +
>>>> + switch (encoding) {
>>>> + case DRM_COLOR_YCBCR_BT601:
>>>> + ycbcr2rgb(full ? bt601_full : bt601,
>>>
>>> Doing all these conditional again pixel by pixel is probably
>>> inefficient. Just like with the line reading functions, you could pick
>>> the matrix in advance.
>>
>> I don't think the performance impact is huge (it's only a pair of ifs), but
>> yes, it's an easy optimization.
>>
>> I will create a conversion_matrix structure:
>>
>> struct conversion_matrix {
>> s16 matrix[3][3];
>> u16 y_offset;
>> };
>>
>> I will create a `get_conversion_matrix_to_argb_u16` function to get this
>> structure from a format+encoding+range.
>>
>> I will also add a field `conversion_matrix` in struct vkms_plane_state to
>> get this matrix only once per plane setup.
>>
>>
>>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>> + break;
>>>> + case DRM_COLOR_YCBCR_BT709:
>>>> + ycbcr2rgb(full ? rec709_full : rec709,
>>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>> + break;
>>>> + case DRM_COLOR_YCBCR_BT2020:
>>>> + ycbcr2rgb(full ? bt2020_full : bt2020,
>>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>> + break;
>>>> + default:
>>>> + pr_warn_once("Not supported color encoding\n");
>>>> + break;
>>>> + }
>>>> +
>>>> + argb_u16->r = r * 257;
>>>> + argb_u16->g = g * 257;
>>>> + argb_u16->b = b * 257;
>>>
>>> I wonder. Using 8-bit fixed point precision seems quite coarse for
>>> 8-bit pixel formats, and it's going to be insufficient for higher bit
>>> depths. Was supporting e.g. 10-bit YUV considered? There is even
>>> deeper, too, like DRM_FORMAT_P016.
>>
>> It's a good point. As I explained above, I took the conversion part as a
>> "black box" to avoid breaking (and debugging) stuff. I think it's easy to
>> switch to an s32 matrix in 16.16 fixed point (or anything with more than
>> 16 bits in the fractional part).
>>
>> Maybe Arthur has an opinion on this?
>
> Yeah, I also don't see why we couldn't do that. The 8-bit precision was
> sufficient for those formats, but as Pekka rightly noted, this could be a
> problem for higher bit depths. I just need to make my terrible Python
> script spit out those values XD.

Finally, I got it working with 32-bit precision.

I basically threw all my untrusted python code away, and started using
the colour python framework suggested by Sebastian[1]. After knowing the
right values (and staring at numbers for hours), I found that with a
little bit of rounding, the conversion works.

Also, while at it, I renamed rec709 to bt709 to follow the pattern and
added "_limited" to the limited-range matrices.

While using the library, I noticed that the Y value of the "red" test
case was wrong (0x35 instead of 0x36).

[1]: https://lore.kernel.org/all/20240115150600.GC160656@toolbox/

Best Regards,
~Arthur Grillo

---

diff --git a/drivers/gpu/drm/vkms/tests/vkms_format_test.c b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
index f66584549827..4cee3c2d8d84 100644
--- a/drivers/gpu/drm/vkms/tests/vkms_format_test.c
+++ b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
@@ -59,7 +59,7 @@ static struct yuv_u8_to_argb_u16_case yuv_u8_to_argb_u16_cases[] = {
{"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
{"gray", {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
{"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
- {"red", {0x35, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
+ {"red", {0x36, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
{"green", {0xb6, 0x1e, 0x0c}, {0x0000, 0x0000, 0xffff, 0x0000}},
{"blue", {0x12, 0xff, 0x74}, {0x0000, 0x0000, 0x0000, 0xffff}},
},
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index e06bbd7c0a67..043f23dbf80d 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -121,10 +121,12 @@ static void RGB565_to_argb_u16(u8 **src_pixels, struct pixel_argb_u16 *out_pixel
out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
}

-static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
+#define BIT_DEPTH 32
+
+static void ycbcr2rgb(const s64 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
{
- s32 y_16, cb_16, cr_16;
- s32 r_16, g_16, b_16;
+ s64 y_16, cb_16, cr_16;
+ s64 r_16, g_16, b_16;

y_16 = y - y_offset;
cb_16 = cb - 128;
@@ -134,9 +136,18 @@ static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r,
g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;

- *r = clamp(r_16, 0, 0xffff) >> 8;
- *g = clamp(g_16, 0, 0xffff) >> 8;
- *b = clamp(b_16, 0, 0xffff) >> 8;
+ // rounding the values
+ r_16 = r_16 + (1LL << (BIT_DEPTH - 4));
+ g_16 = g_16 + (1LL << (BIT_DEPTH - 4));
+ b_16 = b_16 + (1LL << (BIT_DEPTH - 4));
+
+ r_16 = clamp(r_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
+ g_16 = clamp(g_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
+ b_16 = clamp(b_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
+
+ *r = r_16 >> BIT_DEPTH;
+ *g = g_16 >> BIT_DEPTH;
+ *b = b_16 >> BIT_DEPTH;
}

VISIBLE_IF_KUNIT void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16,
@@ -144,35 +155,40 @@ VISIBLE_IF_KUNIT void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16,
enum drm_color_encoding encoding,
enum drm_color_range range)
{
- static const s16 bt601_full[3][3] = {
- {256, 0, 359},
- {256, -88, -183},
- {256, 454, 0},
+ static const s64 bt601_full[3][3] = {
+ {4294967296, 0, 6021544149},
+ {4294967296, -1478054095, -3067191994},
+ {4294967296, 7610682049, 0},
};
- static const s16 bt601[3][3] = {
- {298, 0, 409},
- {298, -100, -208},
- {298, 516, 0},
+
+ static const s64 bt601_limited[3][3] = {
+ {5020601039, 0, 6881764740},
+ {5020601039, -1689204679, -3505362278},
+ {5020601039, 8697922339, 0},
};
- static const s16 rec709_full[3][3] = {
- {256, 0, 408},
- {256, -48, -120},
- {256, 476, 0 },
+
+ static const s64 bt709_full[3][3] = {
+ {4294967296, 0, 6763714498},
+ {4294967296, -804551626, -2010578443},
+ {4294967296, 7969741314, 0},
};
- static const s16 rec709[3][3] = {
- {298, 0, 459},
- {298, -55, -136},
- {298, 541, 0},
+
+ static const s64 bt709_limited[3][3] = {
+ {5020601039, 0, 7729959424},
+ {5020601039, -919487572, -2297803934},
+ {5020601039, 9108275786, 0},
};
- static const s16 bt2020_full[3][3] = {
- {256, 0, 377},
- {256, -42, -146},
- {256, 482, 0},
+
+ static const s64 bt2020_full[3][3] = {
+ {4294967296, 0, 6333358775},
+ {4294967296, -706750298, -2453942994},
+ {4294967296, 8080551471, 0},
};
- static const s16 bt2020[3][3] = {
- {298, 0, 430},
- {298, -48, -167},
- {298, 548, 0},
+
+ static const s64 bt2020_limited[3][3] = {
+ {5020601039, 0, 7238124312},
+ {5020601039, -807714626, -2804506279},
+ {5020601039, 9234915964, 0},
};

u8 r = 0;
@@ -183,15 +199,15 @@ VISIBLE_IF_KUNIT void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16,

switch (encoding) {
case DRM_COLOR_YCBCR_BT601:
- ycbcr2rgb(full ? bt601_full : bt601,
+ ycbcr2rgb(full ? bt601_full : bt601_limited,
yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
break;
case DRM_COLOR_YCBCR_BT709:
- ycbcr2rgb(full ? rec709_full : rec709,
+ ycbcr2rgb(full ? bt709_full : bt709_limited,
yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
break;
case DRM_COLOR_YCBCR_BT2020:
- ycbcr2rgb(full ? bt2020_full : bt2020,
+ ycbcr2rgb(full ? bt2020_full : bt2020_limited,
yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
break;
default:

---

>
>> Just to be sure, the DRM subsystem doesn't have such matrices somewhere? It
>> would be nice to avoid duplicating them.
>
> To my knowledge, they do not exist in DRM; I think those are normally
> in the hardware itself (*please* correct me if I'm wrong).
>
> But v4l2 has a similar table in
> drivers/media/common/v4l2-tpg/v4l2-tpg-core.c (actually, I started my
> code based on this); unfortunately it's only 8-bit too.
>
> Best Regards,
> ~Arthur Grillo
>
>>
>>>> +}
>>>> +
>>>> /*
>>>> * The following functions are read_line function for each pixel format supported by VKMS.
>>>> *
>>>> @@ -142,13 +250,13 @@ static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
>>>> * [1]: https://lore.kernel.org/dri-devel/[email protected]/
>>>> */
>>>>
>>>> -static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
>>>> +static void ARGB8888_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
>>>> enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[])
>>>> {
>>>> - u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
>>>> + u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>>>>
>>>> - int step = get_step_1x1(frame_info->fb, direction, 0);
>>>> + int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>>>
>>> These are the kind of changes I would not expect to see in a patch
>>> adding YUV support. There are a lot of them, too.
>>
>> I will put this change directly in PATCHv2 5/9.
>>
>> [...]
>>
>>>> +static void semi_planar_yuv_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
>>>> + enum pixel_read_direction direction, int count,
>>>> + struct pixel_argb_u16 out_pixel[])
>>>> +{
>>>> + u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>>>> + u8 *uv_plane = packed_pixels_addr(plane->frame_info,
>>>> + x_start / plane->frame_info->fb->format->hsub,
>>>> + y_start / plane->frame_info->fb->format->vsub,
>>>> + 1);
>>>> + struct pixel_yuv_u8 yuv_u8;
>>>> + int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
>>>> + int step_uv = get_step_1x1(plane->frame_info->fb, direction, 1);
>>>> + int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
>>>> + int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
>>>> + x_start, y_start); // 0
>>>> +
>>>> + for (int i = 0; i < count; i++) {
>>>> + yuv_u8.y = y_plane[0];
>>>> + yuv_u8.u = uv_plane[0];
>>>> + yuv_u8.v = uv_plane[1];
>>>> +
>>>> + yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
>>>> + plane->base.base.color_range);
>>>
>>> Oh, so this was the reason to change the read-line function
>>> signature. Maybe just stash a pointer to the right matrix and the
>>> right y_offset in frame_info instead?
>>
>> Yes, that's why I changed the signature. I think I will keep this
>> signature and put the conversion_matrix inside the vkms_plane_state;
>> for me it makes more sense to have pixel_read_line and
>> conversion_matrix in the same structure.
>>
>>>> + out_pixel += 1;
>>>> + y_plane += step_y;
>>>> + if ((i + subsampling_offset + 1) % subsampling == 0)
>>>> + uv_plane += step_uv;
>>>> + }
>>>> +}
>>>> +
>>>> +static void semi_planar_yvu_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
>>>> + enum pixel_read_direction direction, int count,
>>>> + struct pixel_argb_u16 out_pixel[])
>>>> +{
>>>> + u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>>>> + u8 *vu_plane = packed_pixels_addr(plane->frame_info,
>>>> + x_start / plane->frame_info->fb->format->hsub,
>>>> + y_start / plane->frame_info->fb->format->vsub,
>>>> + 1);
>>>> + struct pixel_yuv_u8 yuv_u8;
>>>> + int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
>>>> + int step_vu = get_step_1x1(plane->frame_info->fb, direction, 1);
>>>> + int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
>>>> + int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
>>>> + x_start, y_start);
>>>> + for (int i = 0; i < count; i++) {
>>>> + yuv_u8.y = y_plane[0];
>>>> + yuv_u8.u = vu_plane[1];
>>>> + yuv_u8.v = vu_plane[0];
>>>
>>> You could swap matrix columns instead of writing this whole new
>>> function for UV vs. VU. Just an idea.
>>
>> I was not happy with this duplication either, but I did not think about
>> switching columns. That's a good idea, thanks!
>>
>> Kind regards, Louis Chauvet
>>
>> [...]
>>

2024-02-29 08:49:02

by Pekka Paalanen

Subject: Re: [PATCH v2 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions

On Tue, 27 Feb 2024 16:02:10 +0100
Louis Chauvet <[email protected]> wrote:

> [...]
>
> > > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > > index 172830a3936a..cb7a49b7c8e7 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > > @@ -9,6 +9,17 @@
> > >
> > > #include "vkms_formats.h"
> > >
> > > +/**
> > > + * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> > > + * in the first plane
> > > + *
> > > + * @frame_info: Buffer metadata
> > > + * @x: The x coordinate of the wanted pixel in the buffer
> > > + * @y: The y coordinate of the wanted pixel in the buffer
> > > + *
> > > + * The caller must be aware that this offset is not always a pointer to a pixel. If individual
> > > + * pixel values are needed, they have to be extracted from the resulting block.
> >
> > Just wondering how the caller will be able to extract the right pixel
> > from the block without re-using the knowledge already used in this
> > function. I'd also expect the function to round down x,y to be
> > divisible by block dimensions, but that's not visible in this email.
> > Then the caller needs the remainder from the round-down, too?
>
> You are right, the current implementation only works when block_h ==
> block_w == 1. I think I wrote the documentation for PATCHv2 5/9, but when
> backporting this comment to PATCHv2 3/9 I forgot to update it.
> The new comment will be:
>
> * pixels_offset() - Get the offset of a given pixel data at coordinate
> * x/y in the first plane
> [...]
> * The caller must ensure that the framebuffer associated with this
> * request uses a pixel format where block_h == block_w == 1.
> * If this requirement is not fulfilled, the resulting offset can be
> * completely wrong.

Hi Louis,

if there is no plan for how non-1x1 blocks would work yet, then I think
the above wording is fine. In my mind, the below wording would
encourage callers to seek out and try arbitrary tricks to make things
work for non-1x1 without rewriting the function to actually work.

I believe something would need to change in the function signature to
make it properly usable for non-1x1 blocks, but I too cannot suggest
anything off-hand.

>
> And yes, even after PATCHv2 5/9 it is not clear what the offset is. Would
> it be better to replace the last sentence with this? (I will do the same
> update for the last sentence of packed_pixels_addr)
>
> [...]
> * The returned offset corresponds to the offset of the block containing the pixel at coordinates
> * x/y.
> * The caller must use this offset with care, as for formats with block_h != 1 or block_w != 1
> * the requested pixel value may have to be extracted from the block, even if they are
> * individually addressable.
>
> > > + */
> > > static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> > > {
> > > struct drm_framebuffer *fb = frame_info->fb;
> > > @@ -17,12 +28,13 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
> > > + (x * fb->format->cpp[0]);
> > > }
> > >
>
> [...]
>
> > > +/**
> > > + * Retrieve the correct read_pixel function for a specific format.
> > > + * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
> > > + * pointer is valid before using it in a vkms_plane_state.
> > > + *
> > > + * @format: 4cc of the format
> >
> > Since there are many different 4cc style pixel format definition tables
> > in existence with conflicting definitions, it would not hurt to be more
> > specific that this is about DRM_FORMAT_* or drm_fourcc.h.
>
> Is this better?
>
> @format: DRM_FORMAT_* value for which to obtain a conversion function (see [drm_fourcc.h])

Much better!


Thanks,
pq

> > > + */
> > > void *get_pixel_conversion_function(u32 format)
> > > {
> > > switch (format) {
> > > @@ -247,6 +280,13 @@ void *get_pixel_conversion_function(u32 format)
> > > }
> > > }
> > >
> > > +/**
> > > + * Retrieve the correct write_pixel function for a specific format.
> > > + * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
> > > + * pointer is valid before using it in a vkms_writeback_job.
> > > + *
> > > + * @format: 4cc of the format
> >
> > This too.
>
> Ack, I will use the same as above
>
> > > + */
> > > void *get_pixel_write_function(u32 format)
> > > {
> > > switch (format) {
> > >
> >
> > I couldn't check if the docs are correct since the patch context is not
> > wide enough, but they all sound plausible to me.
>
> I checked again, I don't see other errors than your first comment.
>
> >
> > Thanks,
> > pq
>
> Kind regards,
> Louis Chauvet
>
> --
> Louis Chauvet, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com



2024-02-29 09:07:56

by Pekka Paalanen

Subject: Re: [PATCH v2 4/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions

On Tue, 27 Feb 2024 16:02:13 +0100
Louis Chauvet <[email protected]> wrote:

> Le 26/02/24 - 13:36, Pekka Paalanen a écrit :
> > On Fri, 23 Feb 2024 12:37:24 +0100
> > Louis Chauvet <[email protected]> wrote:
> >
> > > Introduce two typedefs: pixel_read_t and pixel_write_t. It allows the
> > > compiler to check if the passed functions take the correct arguments.
> > > Such typedefs will help ensuring consistency across the code base in
> > > case of update of these prototypes.
> > >
> > > Introduce a check around the get_pixel_*_functions to avoid using a
> > > nullptr as a function.
> > >
> > > Document for those typedefs.
> > >
> > > Signed-off-by: Louis Chauvet <[email protected]>
> > > ---
> > > drivers/gpu/drm/vkms/vkms_drv.h | 23 +++++++++++++++++++++--
> > > drivers/gpu/drm/vkms/vkms_formats.c | 8 ++++----
> > > drivers/gpu/drm/vkms/vkms_formats.h | 4 ++--
> > > drivers/gpu/drm/vkms/vkms_plane.c | 9 ++++++++-
> > > drivers/gpu/drm/vkms/vkms_writeback.c | 9 ++++++++-
> > > 5 files changed, 43 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> > > index 18086423a3a7..886c885c8cf5 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_drv.h
> > > +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> > > @@ -53,12 +53,31 @@ struct line_buffer {
> > > struct pixel_argb_u16 *pixels;
> > > };
> > >
> > > +/**
> > > + * typedef pixel_write_t - These functions are used to read a pixel from a
> > > + * `struct pixel_argb_u16*`, convert it in a specific format and write it in the @dst_pixels
> > > + * buffer.
> > > + *
> > > + * @dst_pixel: destination address to write the pixel
> > > + * @in_pixel: pixel to write
> > > + */
> > > +typedef void (*pixel_write_t)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel);
> >
> > There are some inconsistencies in pixel_write_t and pixel_read_t. At
> > this point of the series they still operate on a single pixel, but you
> > use dst_pixels and src_pixels, plural. Yet the documentation correctly
> > talks about processing a single pixel.
>
> I will fix this for v4.
>
> > I would also expect the source to be always const, but that's a whole
> > other patch to change.
>
> The v4 will contain a new patch "drm/vkms: Use const pointer for
> pixel_read and pixel_write functions"

Sounds good!

>
> [...]
>
> > > diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> > > index d5203f531d96..f68b1b03d632 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_plane.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> > > @@ -106,6 +106,13 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> > > return;
> > >
> > > fmt = fb->format->format;
> > > + pixel_read_t pixel_read = get_pixel_read_function(fmt);
> > > +
> > > + if (!pixel_read) {
> > > + DRM_WARN("Pixel format is not supported by VKMS planes. State is inchanged\n");
> >
> > DRM_WARN() is the kernel equivalent to userspace assert(), right?
>
> DRM_WARN is just a standard printk(KERN_WARNING ...) (hidden
> behind DRM internal macros).

My concern here is whether hitting this causes additional breakage of
the UAPI contract. For example, the UAPI contract expects that the old
FB is unreffed and the new FB is reffed by the plane in question. If
this early return causes that FB swap to be skipped, it could cause
secondary unexpected failures or misbehaviour for userspace later. That
could mislead debugging to think that there is a userspace bug.

Even if you cannot actually read FB due to an internal bug, it would be
good to still apply all the state changes that the UAPI contract
mandates.

Unless this is the kind of kernel bug that explodes very
verbosely, but DRM_WARN is not that.

> > In that failing the check means an internal invariant was violated,
> > which means a code bug in kernel?
> >
> > Maybe this could be more specific about what invariant was violated?
> > E.g. atomic check should have rejected this attempt already.
>
> I'm not an expert (yet) in DRM, so please correct me:
> When atomic_update is called, the new state is always validated by
> atomic_check before? There is no way to pass something not validated by
> atomic_check to atomic_update? If this is the case, yes, it should be an
> ERROR and not a WARN as an invalid format passed the atomic_check
> verification.

I only know about the UAPI, I'm not familiar with kernel internals.

We see that atomic_update returns void, so I think it simply cannot
return errors. To my understanding, atomic_check needs to ensure that
atomic_update cannot fail. There is even UAPI to exercise atomic_check
alone: the atomic commit TEST_ONLY flag. Userspace trusts that flag, and
will not expect an identical atomic commit to fail without TEST_ONLY
when it succeeded with TEST_ONLY.

> If so, is this better?
>
> if (!pixel_read) {
> /*
> * This is a bug as the vkms_plane_atomic_check must forbid all unsupported formats.
> */
> DRM_ERROR("Pixel format %4cc is not supported by VKMS planes.\n", fmt);
> return;
> }
>
> I will put the same code in vkms_writeback.c.

Maybe maintainers can comment whether even DRM_ERROR is strong enough.

As for the message, what you wrote in the comment is the most important
part that I'd put in the log. It explains what's going on, while that
"format not supported" is a detail without context.


Thanks,
pq



2024-02-29 12:37:17

by Pekka Paalanen

Subject: Re: [PATCH v2 7/9] drm/vkms: Add range and encoding properties to pixel_read function

On Tue, 27 Feb 2024 16:02:10 +0100
Louis Chauvet <[email protected]> wrote:

> (same as for PATCHv2 6/9, I took the patch from Arthur with no
> modifications)
>
> Le 26/02/24 - 14:23, Pekka Paalanen a écrit :
> > On Fri, 23 Feb 2024 12:37:27 +0100
> > Louis Chauvet <[email protected]> wrote:
> >
> > > From: Arthur Grillo <[email protected]>
> > >
> > > Create range and encoding properties. This should be noop, as none of
> > > the conversion functions need those properties.
> >
> > > None of the conversion functions need this? How can one say so?
> > The previous patch is making use of them already, AFAICT?
>
> It's my fault, I mixed the commits (in Arthur's series, "Add range..." was
> before "Add YUV support"), but for me it makes no sense to have the color
> property without the support in the driver.

Ah, so if there was no YUV support, these properties would never affect
anything. Ok, I see where that is coming from.

>
> Maybe it's better just to merge "Add range..." with "Add YUV support"?
>
> > How is this a noop? Is it not exposing new UAPI from VKMS?
>
> It's not a no-op from userspace, but from the driver side, yes.

If it is all already hooked up and handled in the driver, then just say
that?

"Now that the driver internally handles these quantization ranges and YUV
encoding matrices, expose the UAPI for setting them."

And fix the commit summary line too, nothing "pixel_read" here.


Thanks,
pq

>
> Kind regards,
> Louis Chauvet
>
> > Thanks,
> > pq
> >
> > >
> > > Signed-off-by: Arthur Grillo <[email protected]>
> > > [Louis Chauvet: retained only relevant parts]
> > > Signed-off-by: Louis Chauvet <[email protected]>
> > > ---
> > > drivers/gpu/drm/vkms/vkms_plane.c | 9 +++++++++
> > > 1 file changed, 9 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> > > index 427ca67c60ce..95dfde297377 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_plane.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> > > @@ -228,5 +228,14 @@ struct vkms_plane *vkms_plane_init(struct vkms_device *vkmsdev,
> > > drm_plane_create_rotation_property(&plane->base, DRM_MODE_ROTATE_0,
> > > DRM_MODE_ROTATE_MASK | DRM_MODE_REFLECT_MASK);
> > >
> > > + drm_plane_create_color_properties(&plane->base,
> > > + BIT(DRM_COLOR_YCBCR_BT601) |
> > > + BIT(DRM_COLOR_YCBCR_BT709) |
> > > + BIT(DRM_COLOR_YCBCR_BT2020),
> > > + BIT(DRM_COLOR_YCBCR_LIMITED_RANGE) |
> > > + BIT(DRM_COLOR_YCBCR_FULL_RANGE),
> > > + DRM_COLOR_YCBCR_BT601,
> > > + DRM_COLOR_YCBCR_FULL_RANGE);
> > > +
> > > return plane;
> > > }
> > >
> >
>
>
>



2024-02-29 17:57:37

by Arthur Grillo

Subject: Re: [PATCH v2 6/9] drm/vkms: Add YUV support



On 29/02/24 09:12, Pekka Paalanen wrote:
> On Wed, 28 Feb 2024 22:52:09 -0300
> Arthur Grillo <[email protected]> wrote:
>
>> On 27/02/24 17:01, Arthur Grillo wrote:
>>>
>>>
>>> On 27/02/24 12:02, Louis Chauvet wrote:
>>>> Hi Pekka,
>>>>
>>>> For all the comments related to the conversion part, maybe Arthur has an
>>>> opinion on them; I took his patch as a "black box" (I did not want to
>>>> break (and debug) it).
>>>>
>>>> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :
>>>>> On Fri, 23 Feb 2024 12:37:26 +0100
>>>>> Louis Chauvet <[email protected]> wrote:
>>>>>
>>>>>> From: Arthur Grillo <[email protected]>
>>>>>>
>>>>>> Add support for the YUV formats below:
>>>>>>
>>>>>> - NV12
>>>>>> - NV16
>>>>>> - NV24
>>>>>> - NV21
>>>>>> - NV61
>>>>>> - NV42
>>>>>> - YUV420
>>>>>> - YUV422
>>>>>> - YUV444
>>>>>> - YVU420
>>>>>> - YVU422
>>>>>> - YVU444
>>>>>>
>>>>>> The conversion matrices of each encoding and range were obtained by
>>>>>> rounding the values of the original conversion matrices multiplied by
>>>>>> 2^8. This is done to avoid the use of fixed point operations.
>>>>>>
>>>>>> Signed-off-by: Arthur Grillo <[email protected]>
>>>>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
>>>>>> callbacks for yuv formats]
>>>>>> Signed-off-by: Louis Chauvet <[email protected]>
>>>>>> ---
>>>>>> drivers/gpu/drm/vkms/vkms_composer.c | 2 +-
>>>>>> drivers/gpu/drm/vkms/vkms_drv.h | 6 +-
>>>>>> drivers/gpu/drm/vkms/vkms_formats.c | 289 +++++++++++++++++++++++++++++++++--
>>>>>> drivers/gpu/drm/vkms/vkms_formats.h | 4 +
>>>>>> drivers/gpu/drm/vkms/vkms_plane.c | 14 +-
>>>>>> 5 files changed, 295 insertions(+), 20 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>>>>> index e555bf9c1aee..54fc5161d565 100644
>>>>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>>>>> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
>>>>>> * buffer [1]
>>>>>> */
>>>>>> current_plane->pixel_read_line(
>>>>>> - current_plane->frame_info,
>>>>>> + current_plane,
>>>>>> x_start,
>>>>>> y_start,
>>>>>> direction,
>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>> index ccc5be009f15..a4f6456cb971 100644
>>>>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>> @@ -75,6 +75,8 @@ enum pixel_read_direction {
>>>>>> READ_RIGHT
>>>>>> };
>>>>>>
>>>>>> +struct vkms_plane_state;
>>>>>> +
>>>>>> /**
>>>>>> <<<<<<< HEAD
>>>>>> * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
>>>>>> @@ -87,8 +89,8 @@ enum pixel_read_direction {
>>>>>> * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
>>>>>> * x_end.
>>>>>> */
>>>>>> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
>>>>>> - pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>>>>>> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
>>>>>> + enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>>>>>
>>>>> This is the second or third time in this one series changing this type.
>>>>> Could you not do the change once, in its own patch if possible?
>>>>
>>>> Sorry, this is not a change here, but a wrong formatting (missed when
>>>> rebasing).
>>>>
>>>> Do you think that it makes sense to re-order my patches and put this
>>>> typedef at the end? This way it is never updated.
>
> I'm not sure, I haven't checked how it would change your patches. The
> intermediate changes might get a lot uglier?
>
> Just try to fold changes so that you don't need to change something
> twice over the series unless there is a good reason to. "How hard would
> it be to review this?" is my measuring stick.
>
>
>>>>
>>>>>>
>>>>>> /**
>>>>>> * vkms_plane_state - Driver specific plane state
>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>> index 46daea6d3ee9..515c80866a58 100644
>>>>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
>>>>>> */
>>>>>> return fb->offsets[plane_index] +
>>>>>> (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
>>>>>> - (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
>>>>>> + (x / drm_format_info_block_height(format, plane_index)) *
>>>>>> + format->char_per_block[plane_index];
>>>>>
>>>>> Shouldn't this be in the patch that added this code in the first place?
>>>>
>>>> Same as above, a wrong formatting, I will remove this change and keep
>>>> everything on one line (even if it's more than 100 chars, it is easier to
>>>> read).
>
> Personally I agree that readability is more important than strict line
> length limits. I'm not sure how the kernel rolls there.
>
>>>>
>>>>>> }
>>>>>>
>>>>>> /**
>>>>>> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> +/**
>>>>>> + * get_subsampling() - Get the subsampling value on a specific direction
>>>>>
>>>>> subsampling divisor
>>>>
>>>> Thanks for the clarification.
>>>>
>>>>>> + */
>>>>>> +static int get_subsampling(const struct drm_format_info *format,
>>>>>> + enum pixel_read_direction direction)
>>>>>> +{
>>>>>> + if (direction == READ_LEFT || direction == READ_RIGHT)
>>>>>> + return format->hsub;
>>>>>> + else if (direction == READ_DOWN || direction == READ_UP)
>>>>>> + return format->vsub;
>>>>>> + return 1;
>>>>>
>>>>> In this and the below function, personally I'd prefer switch-case, with
>>>>> a cannot-happen-scream after the switch, so the compiler can warn about
>>>>> unhandled enum values.
>>>>
>>>> As for the previous patch, I did not know about this compiler feature,
>>>> thanks!
>>>>
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
>>>>>> + */
>>>>>> +static int get_subsampling_offset(const struct drm_format_info *format,
>>>>>> + enum pixel_read_direction direction, int x_start, int y_start)
>>>>>
>>>>> 'start' values as "increments" for a pixel counter? Is something
>>>>> misnamed here?
>>>>>
>>>>> Is it an increment or an offset?
>>>>
>>>> I don't really know how to name the function. I'm open to suggestions
>>>> x_start and y_start are really the coordinate of the starting reading point.
>
> It looks like it's an offset, so "offset" and "start" are good words.
> Then the only misleading piece is the doc:
>
> "Get the subsampling offset to use when incrementing the pixel counter"
>
> This sounds like the offset is used when incrementing a counter, that
> is, the counter is incremented by the offset each time. That's my problem
> with this.
>
> Fix just the doc, and it's good, I think.
>
>>>>
>>>> To explain what it does:
>>>>
>>>> When using subsampling, you have to read the next pixel of planes[1..4]
>>>> not at the same "speed" as plane[0]. But I can't only rely on
>>>> "read_pixel_count % subsampling == 0", because it means that the pixel
>>>> incrementation on planes[1..4] may not be aligned with the buffer (if
>>>> hsub=2 and the start pixel is 1, I need to increment planes[1..4] only
>>>> for x=2,4,6... not 1,3,5...).
>>>>
>>>> A way to ensure this is to add an "offset" to count, which ensures that
>>>> count % subsampling == 0 on the correct pixel.
>
> Yes, I think I did get that feeling from the code eventually somehow,
> but it wouldn't hurt to explain it in the comment.
>
> "An offset for keeping the chroma siting consistent regardless of
> x_start and y_start" maybe?
>
>>>>
>>>> I made an error; the switch case must be as follows (as count is always
>>>> counting up, for an "inverted" reading direction a negative number ensures
>>>> that count % subsampling == 0 on the correct pixel):
>>>>
>>>> switch (direction) {
>>>> case READ_UP:
>>>> return -y_start;
>>>> case READ_DOWN:
>>>> return y_start;
>>>> case READ_LEFT:
>>>> return -x_start;
>>>> case READ_RIGHT:
>>>> return x_start;
>>>> }
>
> Yes, the inverted reading directions are different indeed. I did not
> think through if this works also for sub-sampling divisors > 2 which I
> don't think are ever used.
>
> Does IGT find this mistake? If not, maybe IGT should be extended.
>
>>>>
>>>>>> +{
>>>>>> + if (direction == READ_RIGHT || direction == READ_LEFT)
>>>>>> + return x_start;
>>>>>> + else if (direction == READ_DOWN || direction == READ_UP)
>>>>>> + return y_start;
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>
>>>> [...]
>>>>
>>>>>> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
>>>>>> + enum drm_color_encoding encoding, enum drm_color_range range)
>>>>>> +{
>>>>>> + static const s16 bt601_full[3][3] = {
>>>>>> + { 256, 0, 359 },
>>>>>> + { 256, -88, -183 },
>>>>>> + { 256, 454, 0 },
>>>>>> + };
>>>>
>>>> [...]
>>>>
>>>>>> +
>>>>>> + u8 r = 0;
>>>>>> + u8 g = 0;
>>>>>> + u8 b = 0;
>>>>>> + bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
>>>>>> + unsigned int y_offset = full ? 0 : 16;
>>>>>> +
>>>>>> + switch (encoding) {
>>>>>> + case DRM_COLOR_YCBCR_BT601:
>>>>>> + ycbcr2rgb(full ? bt601_full : bt601,
>>>>>
>>>>> Doing all these conditional again pixel by pixel is probably
>>>>> inefficient. Just like with the line reading functions, you could pick
>>>>> the matrix in advance.
>>>>
>>>> I don't think the performance impact is huge (it's only a pair of if), but
>>>> yes, it's an easy optimization.
>>>>
>>>> I will create a conversion_matrix structure:
>>>>
>>>> struct conversion_matrix {
>>>> s16 matrix[3][3];
>>>> u16 y_offset;
>>>> }
>
> When defining such a struct type, it would be good to document the
> matrix layout (which one is row, which one is column), and what the s16
> mean (fixed point?).
>
> Try to not mix signed and unsigned types, too. The C implicit type
> promotion rules can be surprising. Just make everything signed while
> computing, and convert to/from unsigned only for storage.
>
>>>>
>>>> I will create a `get_conversion_matrix_to_argb_u16` function to get this
>>>> structure from a format+encoding+range.
>>>>
>>>> I will also add a field `conversion_matrix` in struct vkms_plane_state to
>>>> get this matrix only once per plane setup.
>
> Alright. Let's see how that works.
>
>>>>
>>>>
>>>>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>>>> + break;
>>>>>> + case DRM_COLOR_YCBCR_BT709:
>>>>>> + ycbcr2rgb(full ? rec709_full : rec709,
>>>>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>>>> + break;
>>>>>> + case DRM_COLOR_YCBCR_BT2020:
>>>>>> + ycbcr2rgb(full ? bt2020_full : bt2020,
>>>>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>>>> + break;
>>>>>> + default:
>>>>>> + pr_warn_once("Not supported color encoding\n");
>>>>>> + break;
>>>>>> + }
>>>>>> +
>>>>>> + argb_u16->r = r * 257;
>>>>>> + argb_u16->g = g * 257;
>>>>>> + argb_u16->b = b * 257;
>>>>>
>>>>> I wonder. Using 8-bit fixed point precision seems quite coarse for
>>>>> 8-bit pixel formats, and it's going to be insufficient for higher bit
>>>>> depths. Was supporting e.g. 10-bit YUV considered? There is even
>>>>> deeper, too, like DRM_FORMAT_P016.
>>>>
>>>> It's a good point, as I explained above, I took the conversion part as a
>>>> "black box" to avoid breaking (and debugging) stuff. I think it's easy to
>>>> switch to an s32 matrix in 16.16 fixed point (or anything with more than 16 bits in
>>>> the fractional part).
>>>>
>>>> Maybe Arthur has an opinion on this?
>>>
>>> Yeah, I don't see why we couldn't do that either. The 8-bit precision was
>>> sufficient for those formats, but as Pekka rightly noted this could be a
>>> problem for higher bit depths. I just need to make my terrible python
>>> script spit those values XD.
>>
>> Finally, I got it working with 32-bit precision.
>>
>> I basically threw all my untrusted python code away, and started using
>> the colour python framework suggested by Sebastian[1]. After knowing the
>> right values (and staring at numbers for hours), I found that with a
>> little bit of rounding, the conversion works.
>>
>> Also, while at it, I changed the name rec709 to bt709 to follow the
>> pattern and added "_full" to the full ranges matrices.
>>
>> While using the library, I noticed that the red component is wrong on
>> the color red in one test case.
>>
>> [1]: https://lore.kernel.org/all/20240115150600.GC160656@toolbox/
>
> That all sounds good. I wish the kernel code contained comments
> explaining how exactly you computed those matrices with python/colour.
> If the python snippets are not too long, including them verbatim as
> code comments would be really nice for both reviewers and posterity.
>
> The same for the VKMS unit tests, too, how you got the expected result
> values.

Sure, I can do that; the Python code is short enough to include
there.

>
>>
>> Best Regards,
>> ~Arthur Grillo
>>
>> ---
>>
>> diff --git a/drivers/gpu/drm/vkms/tests/vkms_format_test.c b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
>> index f66584549827..4cee3c2d8d84 100644
>> --- a/drivers/gpu/drm/vkms/tests/vkms_format_test.c
>> +++ b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
>> @@ -59,7 +59,7 @@ static struct yuv_u8_to_argb_u16_case yuv_u8_to_argb_u16_cases[] = {
>> {"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
>> {"gray", {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
>> {"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
>> - {"red", {0x35, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
>> + {"red", {0x36, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
>> {"green", {0xb6, 0x1e, 0x0c}, {0x0000, 0x0000, 0xffff, 0x0000}},
>> {"blue", {0x12, 0xff, 0x74}, {0x0000, 0x0000, 0x0000, 0xffff}},
>> },
>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>> index e06bbd7c0a67..043f23dbf80d 100644
>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>> @@ -121,10 +121,12 @@ static void RGB565_to_argb_u16(u8 **src_pixels, struct pixel_argb_u16 *out_pixel
>> out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
>> }
>>
>> -static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
>> +#define BIT_DEPTH 32
>> +
>> +static void ycbcr2rgb(const s64 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
>> {
>> - s32 y_16, cb_16, cr_16;
>> - s32 r_16, g_16, b_16;
>> + s64 y_16, cb_16, cr_16;
>> + s64 r_16, g_16, b_16;
>>
>> y_16 = y - y_offset;
>> cb_16 = cb - 128;
>> @@ -134,9 +136,18 @@ static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r,
>> g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
>> b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;
>>
>> - *r = clamp(r_16, 0, 0xffff) >> 8;
>> - *g = clamp(g_16, 0, 0xffff) >> 8;
>> - *b = clamp(b_16, 0, 0xffff) >> 8;
>> + // rounding the values
>> + r_16 = r_16 + (1LL << (BIT_DEPTH - 4));
>> + g_16 = g_16 + (1LL << (BIT_DEPTH - 4));
>> + b_16 = b_16 + (1LL << (BIT_DEPTH - 4));
>> +
>> + r_16 = clamp(r_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
>> + g_16 = clamp(g_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
>> + b_16 = clamp(b_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
>
> Where do the BIT_DEPTH - 4 and BIT_DEPTH + 8 come from?

Basically, the numbers are in this form in hex:

0xsspppppppp

In the end, we only want the 's' bits.

The matrix multiplication does not give perfect results, which makes some
of the KUnit tests fail because the values end up a little bit
off. KUnit expects 0xfe, but this function returns 0xfd.

I noticed that before shifting the values to get the 's' bits, the
values were a lot closer to what is expected, something like:

0xfdfe287312
    ^
So the rounding part adds 1 to the marked 'f' to round the values up a bit
(drm_fixed.h does something similar in drm_fixp2int_round).
Like this:

  0xfdfe287312
+ 0x0010000000
  ------------
  0xfe0e287312

That's why the BIT_DEPTH - 4.

After that, the values need to be clamped to avoid wrong results when
shifting this s64 and casting it to u8. We clamp it to the minimum
allowed value, 0, and to the maximum allowed value, which in this case
is all of the (BIT_DEPTH + 8) bits set to 1. The '+ 8' accounts for
the size of the 's' bits.

After writing this, I think that maybe it would be good to add this
explanation as a comment on the code.
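
The rounding and clamping described above can be sketched as a standalone
function. This is only an illustration, not the code from the patch:
fixed_to_u8 and FRAC_BITS are hypothetical names (FRAC_BITS plays the role of
BIT_DEPTH), and standard C types stand in for the kernel's s64 and u8.

```c
#include <stdint.h>

#define FRAC_BITS 32 /* assumption: same value as BIT_DEPTH in the diff above */

/*
 * Convert a fixed-point channel value laid out as 0xsspppppppp
 * (8 integer bits 's', FRAC_BITS fractional bits 'p') to an 8-bit value.
 */
static uint8_t fixed_to_u8(int64_t v)
{
	const int64_t max = ((int64_t)1 << (FRAC_BITS + 8)) - 1;

	/* Add 1 to the top hex digit of the fractional part, as in the diff. */
	v += (int64_t)1 << (FRAC_BITS - 4);

	/* Clamp so that the shift and the cast below cannot wrap around. */
	if (v < 0)
		v = 0;
	if (v > max)
		v = max;

	/* Keep only the 's' bits. */
	return (uint8_t)(v >> FRAC_BITS);
}
```

With the value from the example, fixed_to_u8(0xfdfe287312) gives 0xfe. In this
sketch the added bias is 1/16 of a unit, so only values whose fractional part
is at least 15/16 are bumped up; rounding to nearest would add
1 << (FRAC_BITS - 1) instead.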

>
>> +
>> + *r = r_16 >> BIT_DEPTH;
>> + *g = g_16 >> BIT_DEPTH;
>> + *b = b_16 >> BIT_DEPTH;
>> }
>
> ...
>
>>>
>>>> Just to be sure, the DRM subsystem doesn't have such a matrix somewhere? It
>>>> would be nice to avoid duplicating them.
>>>
>>> To my knowledge, it does not exist in DRM; I think those are normally
>>> in the hardware itself (*please* correct me if I'm wrong).
>
> I couldn't find a matrix type either on a quick glance, but there is
> drm_fixed.h for a couple different fixed point formats, it seems. FWIW.
> drm_fixed.h didn't feel very appealing for this here.

Yes, when developing this code I started using drm_fixed.h, but after a
few iterations it was more of a hindrance than a help.

Best Regards,
~Arthur Grillo

>
>>>
>>> But, v4l2 has a similar table on
>>> drivers/media/common/v4l2-tpg/v4l2-tpg-core.c (Actually, I started my
>>> code based on this), unfortunately it's only 8-bit too.
>
> Thanks,
> pq

2024-02-29 12:12:59

by Pekka Paalanen

Subject: Re: [PATCH v2 6/9] drm/vkms: Add YUV support

On Wed, 28 Feb 2024 22:52:09 -0300
Arthur Grillo <[email protected]> wrote:

> On 27/02/24 17:01, Arthur Grillo wrote:
> >
> >
> > On 27/02/24 12:02, Louis Chauvet wrote:
> >> Hi Pekka,
> >>
> >> For all the comments related to the conversion part, maybe Arthur has an
> >> opinion on them; I took his patch as a "black box" (I did not want to
> >> break (and debug) it).
> >>
> >> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :
> >>> On Fri, 23 Feb 2024 12:37:26 +0100
> >>> Louis Chauvet <[email protected]> wrote:
> >>>
> >>>> From: Arthur Grillo <[email protected]>
> >>>>
> >>>> Add support for the YUV formats below:
> >>>>
> >>>> - NV12
> >>>> - NV16
> >>>> - NV24
> >>>> - NV21
> >>>> - NV61
> >>>> - NV42
> >>>> - YUV420
> >>>> - YUV422
> >>>> - YUV444
> >>>> - YVU420
> >>>> - YVU422
> >>>> - YVU444
> >>>>
> >>>> The conversion matrices of each encoding and range were obtained by
> >>>> rounding the values of the original conversion matrices multiplied by
> >>>> 2^8. This is done to avoid the use of fixed point operations.
> >>>>
> >>>> Signed-off-by: Arthur Grillo <[email protected]>
> >>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
> >>>> callbacks for yuv formats]
> >>>> Signed-off-by: Louis Chauvet <[email protected]>
> >>>> ---
> >>>> drivers/gpu/drm/vkms/vkms_composer.c | 2 +-
> >>>> drivers/gpu/drm/vkms/vkms_drv.h | 6 +-
> >>>> drivers/gpu/drm/vkms/vkms_formats.c | 289 +++++++++++++++++++++++++++++++++--
> >>>> drivers/gpu/drm/vkms/vkms_formats.h | 4 +
> >>>> drivers/gpu/drm/vkms/vkms_plane.c | 14 +-
> >>>> 5 files changed, 295 insertions(+), 20 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> >>>> index e555bf9c1aee..54fc5161d565 100644
> >>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> >>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> >>>> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
> >>>> * buffer [1]
> >>>> */
> >>>> current_plane->pixel_read_line(
> >>>> - current_plane->frame_info,
> >>>> + current_plane,
> >>>> x_start,
> >>>> y_start,
> >>>> direction,
> >>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> >>>> index ccc5be009f15..a4f6456cb971 100644
> >>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> >>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> >>>> @@ -75,6 +75,8 @@ enum pixel_read_direction {
> >>>> READ_RIGHT
> >>>> };
> >>>>
> >>>> +struct vkms_plane_state;
> >>>> +
> >>>> /**
> >>>> <<<<<<< HEAD
> >>>> * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
> >>>> @@ -87,8 +89,8 @@ enum pixel_read_direction {
> >>>> * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
> >>>> * x_end.
> >>>> */
> >>>> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
> >>>> - pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
> >>>> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
> >>>> + enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
> >>>
> >>> This is the second or third time in this one series changing this type.
> >>> Could you not do the change once, in its own patch if possible?
> >>
> >> Sorry, this is not a change here, but a wrong formatting (missed when
> >> rebasing).
> >>
> >> Do you think that it makes sense to re-order my patches and put this
> >> typedef at the end? This way it is never updated.

I'm not sure, I haven't checked how it would change your patches. The
intermediate changes might get a lot uglier?

Just try to fold changes so that you don't need to change something
twice over the series unless there is a good reason to. "How hard would
it be to review this?" is my measuring stick.


> >>
> >>>>
> >>>> /**
> >>>> * vkms_plane_state - Driver specific plane state
> >>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> >>>> index 46daea6d3ee9..515c80866a58 100644
> >>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> >>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> >>>> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
> >>>> */
> >>>> return fb->offsets[plane_index] +
> >>>> (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
> >>>> - (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
> >>>> + (x / drm_format_info_block_height(format, plane_index)) *
> >>>> + format->char_per_block[plane_index];
> >>>
> >>> Shouldn't this be in the patch that added this code in the first place?
> >>
> >> Same as above, a wrong formatting, I will remove this change and keep
> >> everything on one line (even if it's more than 100 chars, it is easier to
> >> read).

Personally I agree that readability is more important than strict line
length limits. I'm not sure how the kernel rolls there.

> >>
> >>>> }
> >>>>
> >>>> /**
> >>>> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
> >>>> }
> >>>> }
> >>>>
> >>>> +/**
> >>>> + * get_subsampling() - Get the subsampling value on a specific direction
> >>>
> >>> subsampling divisor
> >>
> >> Thanks for the clarification.
> >>
> >>>> + */
> >>>> +static int get_subsampling(const struct drm_format_info *format,
> >>>> + enum pixel_read_direction direction)
> >>>> +{
> >>>> + if (direction == READ_LEFT || direction == READ_RIGHT)
> >>>> + return format->hsub;
> >>>> + else if (direction == READ_DOWN || direction == READ_UP)
> >>>> + return format->vsub;
> >>>> + return 1;
> >>>
> >>> In this and the below function, personally I'd prefer switch-case, with
> >>> a cannot-happen-scream after the switch, so the compiler can warn about
> >>> unhandled enum values.
> >>
> >> As for the previous patch, I did not know about this compiler feature,
> >> thanks!
> >>
> >>>> +}
> >>>> +
> >>>> +/**
> >>>> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
> >>>> + */
> >>>> +static int get_subsampling_offset(const struct drm_format_info *format,
> >>>> + enum pixel_read_direction direction, int x_start, int y_start)
> >>>
> >>> 'start' values as "increments" for a pixel counter? Is something
> >>> misnamed here?
> >>>
> >>> Is it an increment or an offset?
> >>
> >> I don't really know how to name the function. I'm open to suggestions
> >> x_start and y_start are really the coordinate of the starting reading point.

It looks like it's an offset, so "offset" and "start" are good words.
Then the only misleading piece is the doc:

"Get the subsampling offset to use when incrementing the pixel counter"

This sounds like the offset is used when incrementing a counter, that
is, the counter is incremented by the offset each time. That's my problem
with this.

Fix just the doc, and it's good, I think.

> >>
> >> To explain what it does:
> >>
> >> When using subsampling, you have to read the next pixel of planes[1..4]
> >> not at the same "speed" as plane[0]. But I can't only rely on
> >> "read_pixel_count % subsampling == 0", because it means that the pixel
> >> incrementation on planes[1..4] may not be aligned with the buffer (if
> >> hsub=2 and the start pixel is 1, I need to increment planes[1..4] only
> >> for x=2,4,6... not 1,3,5...).
> >>
> >> A way to ensure this is to add an "offset" to count, which ensures that
> >> count % subsampling == 0 on the correct pixel.

Yes, I think I did get that feeling from the code eventually somehow,
but it wouldn't hurt to explain it in the comment.

"An offset for keeping the chroma siting consistent regardless of
x_start and y_start" maybe?

> >>
> >> I made an error; the switch case must be as follows (as count is always
> >> counting up, for an "inverted" reading direction a negative number ensures
> >> that count % subsampling == 0 on the correct pixel):
> >>
> >> switch (direction) {
> >> case READ_UP:
> >> return -y_start;
> >> case READ_DOWN:
> >> return y_start;
> >> case READ_LEFT:
> >> return -x_start;
> >> case READ_RIGHT:
> >> return x_start;
> >> }

Yes, the inverted reading directions are different indeed. I did not
think through if this works also for sub-sampling divisors > 2 which I
don't think are ever used.

Does IGT find this mistake? If not, maybe IGT should be extended.
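
The alignment property discussed here can be checked with a small standalone
model. It is a sketch under stated assumptions, not the VKMS code:
coordinate_at and offset_marks_aligned_pixels are hypothetical helpers, the
format argument is dropped, and READ_UP/READ_LEFT are assumed to walk towards
smaller coordinates. The switch without a default also lets the compiler
(-Wswitch in GCC and Clang) flag an enum value that is not handled.

```c
#include <stdbool.h>

enum pixel_read_direction { READ_UP, READ_DOWN, READ_LEFT, READ_RIGHT };

/* Offset as proposed in the corrected switch quoted above. */
static int get_subsampling_offset(enum pixel_read_direction direction,
				  int x_start, int y_start)
{
	switch (direction) {
	case READ_UP:
		return -y_start;
	case READ_DOWN:
		return y_start;
	case READ_LEFT:
		return -x_start;
	case READ_RIGHT:
		return x_start;
	}
	/* Only reached with an invalid enum value. */
	return 0;
}

/* Hypothetical helper: coordinate of the count-th pixel read from start. */
static int coordinate_at(enum pixel_read_direction direction, int start, int count)
{
	switch (direction) {
	case READ_UP:
	case READ_LEFT:
		return start - count; /* assumed: towards smaller coordinates */
	case READ_DOWN:
	case READ_RIGHT:
		return start + count;
	}
	return start;
}

/*
 * Hypothetical check: true if (count + offset) % sub == 0 holds exactly on
 * the pixels whose coordinate is a multiple of sub, for the first n pixels.
 */
static bool offset_marks_aligned_pixels(enum pixel_read_direction direction,
					int start, int offset, int sub, int n)
{
	for (int count = 0; count < n; count++) {
		bool marked = (count + offset) % sub == 0;
		bool aligned = coordinate_at(direction, start, count) % sub == 0;

		if (marked != aligned)
			return false;
	}
	return true;
}
```

In this model, reading left from x = 5 with a divisor of 4 passes with the
offset -5 and fails with +5. With a divisor of 2 both signs pass, because
x_start and -x_start have the same parity, so a test limited to a divisor of 2
may not be able to tell the two versions apart.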

> >>
> >>>> +{
> >>>> + if (direction == READ_RIGHT || direction == READ_LEFT)
> >>>> + return x_start;
> >>>> + else if (direction == READ_DOWN || direction == READ_UP)
> >>>> + return y_start;
> >>>> + return 0;
> >>>> +}
> >>>> +
> >>
> >> [...]
> >>
> >>>> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
> >>>> + enum drm_color_encoding encoding, enum drm_color_range range)
> >>>> +{
> >>>> + static const s16 bt601_full[3][3] = {
> >>>> + { 256, 0, 359 },
> >>>> + { 256, -88, -183 },
> >>>> + { 256, 454, 0 },
> >>>> + };
> >>
> >> [...]
> >>
> >>>> +
> >>>> + u8 r = 0;
> >>>> + u8 g = 0;
> >>>> + u8 b = 0;
> >>>> + bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
> >>>> + unsigned int y_offset = full ? 0 : 16;
> >>>> +
> >>>> + switch (encoding) {
> >>>> + case DRM_COLOR_YCBCR_BT601:
> >>>> + ycbcr2rgb(full ? bt601_full : bt601,
> >>>
> >>> Doing all these conditional again pixel by pixel is probably
> >>> inefficient. Just like with the line reading functions, you could pick
> >>> the matrix in advance.
> >>
> >> I don't think the performance impact is huge (it's only a pair of if), but
> >> yes, it's an easy optimization.
> >>
> >> I will create a conversion_matrix structure:
> >>
> >> struct conversion_matrix {
> >> s16 matrix[3][3];
> >> u16 y_offset;
> >> };

When defining such a struct type, it would be good to document the
matrix layout (which index is the row, which is the column), and what
the s16 values mean (fixed point?).

Try to not mix signed and unsigned types, too. The C implicit type
promotion rules can be surprising. Just make everything signed while
computing, and convert to/from unsigned only for storage.
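
A minimal sketch of what such a documented, all-signed structure could look like, in standalone C. The layout comment, the `int32_t` field types, `clamp_s32()` and `ycbcr_to_rgb_u8()` are assumptions for illustration, not the structure that ended up in the driver. The coefficient values are the bt601_full ones quoted earlier in this thread.

```c
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only:
 * matrix[row][col], where row selects the output channel (0 = R, 1 = G,
 * 2 = B) and col selects the input channel (0 = Y, 1 = Cb, 2 = Cr).
 * Coefficients are signed fixed point with 8 fractional bits (1.0 == 256).
 */
struct conversion_matrix {
	int32_t matrix[3][3];
	int32_t y_offset; /* signed on purpose: it is subtracted from Y */
};

/* BT.601 full-range coefficients as quoted in the patch. */
static const struct conversion_matrix bt601_full = {
	.matrix = {
		{ 256,   0,  359 },
		{ 256, -88, -183 },
		{ 256, 454,    0 },
	},
	.y_offset = 0,
};

static int32_t clamp_s32(int32_t v, int32_t lo, int32_t hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/* Convert one pixel; everything stays signed until the final store. */
static void ycbcr_to_rgb_u8(const struct conversion_matrix *m,
			    uint8_t y, uint8_t cb, uint8_t cr,
			    uint8_t *r, uint8_t *g, uint8_t *b)
{
	int32_t in[3] = {
		(int32_t)y - m->y_offset,
		(int32_t)cb - 128,
		(int32_t)cr - 128,
	};
	int32_t out[3];

	for (int row = 0; row < 3; row++) {
		int32_t acc = 0;

		for (int col = 0; col < 3; col++)
			acc += m->matrix[row][col] * in[col];
		/* Clamp in fixed point, then drop the 8 fractional bits. */
		out[row] = clamp_s32(acc, 0, 0xffff) >> 8;
	}

	*r = (uint8_t)out[0];
	*g = (uint8_t)out[1];
	*b = (uint8_t)out[2];
}
```

The clamp-then-shift step mirrors the original `clamp(r_16, 0, 0xffff) >> 8` from the quoted patch. Keeping `y_offset` signed avoids an unsigned subtraction wrapping around when Y is smaller than the offset.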

> >>
> >> I will create a `get_conversion_matrix_to_argb_u16` function to get this
> >> structure from a format+encoding+range.
> >>
> >> I will also add a field `conversion_matrix` in struct vkms_plane_state to
> >> get this matrix only once per plane setup.

Alright. Let's see how that works.

> >>
> >>
> >>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> >>>> + break;
> >>>> + case DRM_COLOR_YCBCR_BT709:
> >>>> + ycbcr2rgb(full ? rec709_full : rec709,
> >>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> >>>> + break;
> >>>> + case DRM_COLOR_YCBCR_BT2020:
> >>>> + ycbcr2rgb(full ? bt2020_full : bt2020,
> >>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> >>>> + break;
> >>>> + default:
> >>>> + pr_warn_once("Not supported color encoding\n");
> >>>> + break;
> >>>> + }
> >>>> +
> >>>> + argb_u16->r = r * 257;
> >>>> + argb_u16->g = g * 257;
> >>>> + argb_u16->b = b * 257;
> >>>
> >>> I wonder. Using 8-bit fixed point precision seems quite coarse for
> >>> 8-bit pixel formats, and it's going to be insufficient for higher bit
> >>> depths. Was supporting e.g. 10-bit YUV considered? There is even
> >>> deeper, too, like DRM_FORMAT_P016.
> >>
> >> It's a good point. As I explained above, I took the conversion part as a
> >> "black box" to avoid breaking (and debugging) stuff. I think it's easy to
> >> switch to an s32 matrix in 16.16 fixed point (or anything with more than
> >> 16 bits in the fractional part).
> >>
> >> Maybe Arthur has an opinion on this?
> >
> > > Yeah, I don't see why we couldn't do that either. The 8-bit precision was
> > > sufficient for those formats, but as Pekka rightly noted, this could be a
> > > problem for higher bit depths. I just need to make my terrible Python
> > > script spit out those values XD.
>
> Finally, I got it working with 32-bit precision.
>
> I basically threw all my untrusted python code away, and started using
> the colour python framework suggested by Sebastian[1]. After knowing the
> right values (and staring at numbers for hours), I found that with a
> little bit of rounding, the conversion works.
>
> Also, while at it, I changed the name rec709 to bt709 to follow the
> pattern and added "_full" to the full ranges matrices.
>
> While using the library, I noticed that the red component is wrong on
> the color red in one test case.
>
> [1]: https://lore.kernel.org/all/20240115150600.GC160656@toolbox/

That all sounds good. I wish the kernel code contained comments
explaining how exactly you computed those matrices with python/colour.
If the python snippets are not too long, including them verbatim as
code comments would be really nice for both reviewers and posterity.

The same for the VKMS unit tests, too, how you got the expected result
values.
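
As an illustration of the kind of derivation that could be recorded next to the matrices, here is a sketch in standalone C. It does not use the python `colour` package mentioned above; it uses the textbook Kr/Kb formulas for full-range YCbCr instead, and `round_to_int()` and `full_range_matrix()` are hypothetical names. With the BT.601 luma coefficients and 8 fractional bits it reproduces the bt601_full values quoted earlier in this thread.

```c
/* Round to the nearest integer without needing libm. */
static int round_to_int(double v)
{
	return (int)(v >= 0 ? v + 0.5 : v - 0.5);
}

/*
 * Build a full-range YCbCr -> RGB matrix from the luma coefficients
 * Kr and Kb (Kg = 1 - Kr - Kb), scaled by 2^frac_bits:
 *
 *   R = Y                               + 2 (1 - Kr) Cr
 *   G = Y - 2 Kb (1 - Kb) / Kg Cb - 2 Kr (1 - Kr) / Kg Cr
 *   B = Y + 2 (1 - Kb) Cb
 *
 * Rows are R, G, B and columns are Y, Cb, Cr.
 */
static void full_range_matrix(double kr, double kb, int frac_bits,
			      int m[3][3])
{
	double kg = 1.0 - kr - kb;
	double scale = (double)(1 << frac_bits);

	m[0][0] = round_to_int(scale);
	m[0][1] = 0;
	m[0][2] = round_to_int(scale * 2.0 * (1.0 - kr));

	m[1][0] = round_to_int(scale);
	m[1][1] = round_to_int(-scale * 2.0 * kb * (1.0 - kb) / kg);
	m[1][2] = round_to_int(-scale * 2.0 * kr * (1.0 - kr) / kg);

	m[2][0] = round_to_int(scale);
	m[2][1] = round_to_int(scale * 2.0 * (1.0 - kb));
	m[2][2] = 0;
}
```

Calling `full_range_matrix(0.299, 0.114, 8, m)` gives the BT.601 full-range matrix. Limited-range matrices need an additional scaling of the Y and chroma columns, which this sketch leaves out.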

>
> Best Regards,
> ~Arthur Grillo
>
> ---
>
> diff --git a/drivers/gpu/drm/vkms/tests/vkms_format_test.c b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> index f66584549827..4cee3c2d8d84 100644
> --- a/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> +++ b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> @@ -59,7 +59,7 @@ static struct yuv_u8_to_argb_u16_case yuv_u8_to_argb_u16_cases[] = {
> {"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
> {"gray", {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
> {"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
> - {"red", {0x35, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
> + {"red", {0x36, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
> {"green", {0xb6, 0x1e, 0x0c}, {0x0000, 0x0000, 0xffff, 0x0000}},
> {"blue", {0x12, 0xff, 0x74}, {0x0000, 0x0000, 0x0000, 0xffff}},
> },
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index e06bbd7c0a67..043f23dbf80d 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -121,10 +121,12 @@ static void RGB565_to_argb_u16(u8 **src_pixels, struct pixel_argb_u16 *out_pixel
> out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
> }
>
> -static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
> +#define BIT_DEPTH 32
> +
> +static void ycbcr2rgb(const s64 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
> {
> - s32 y_16, cb_16, cr_16;
> - s32 r_16, g_16, b_16;
> + s64 y_16, cb_16, cr_16;
> + s64 r_16, g_16, b_16;
>
> y_16 = y - y_offset;
> cb_16 = cb - 128;
> @@ -134,9 +136,18 @@ static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r,
> g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
> b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;
>
> - *r = clamp(r_16, 0, 0xffff) >> 8;
> - *g = clamp(g_16, 0, 0xffff) >> 8;
> - *b = clamp(b_16, 0, 0xffff) >> 8;
> + // rounding the values
> + r_16 = r_16 + (1LL << (BIT_DEPTH - 4));
> + g_16 = g_16 + (1LL << (BIT_DEPTH - 4));
> + b_16 = b_16 + (1LL << (BIT_DEPTH - 4));
> +
> + r_16 = clamp(r_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
> + g_16 = clamp(g_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
> + b_16 = clamp(b_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);

Where do the BIT_DEPTH - 4 and BIT_DEPTH + 8 come from?

> +
> + *r = r_16 >> BIT_DEPTH;
> + *g = g_16 >> BIT_DEPTH;
> + *b = b_16 >> BIT_DEPTH;
> }

..

> >
> >> Just to be sure, the DRM subsystem doesn't have such a matrix somewhere?
> >> It would be nice to avoid duplicating them.
> >
> > To my knowledge it does not exist in DRM; I think those are normally
> > in the hardware itself (*please* correct me if I'm wrong).

I couldn't find a matrix type either on a quick glance, but there is
drm_fixed.h for a couple different fixed point formats, it seems. FWIW.
drm_fixed.h didn't feel very appealing for this here.

> >
> > But v4l2 has a similar table in
> > drivers/media/common/v4l2-tpg/v4l2-tpg-core.c (actually, I started my
> > code based on it); unfortunately it's only 8-bit too.

Thanks,
pq



2024-02-29 10:21:41

by Pekka Paalanen

Subject: Re: [PATCH v2 5/9] drm/vkms: Re-introduce line-per-line composition algorithm

On Tue, 27 Feb 2024 16:02:09 +0100
Louis Chauvet <[email protected]> wrote:

> [...]
>
> > > -static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> > > - struct line_buffer *stage_buffer,
> > > - struct line_buffer *output_buffer)
> > > +static void pre_mul_alpha_blend(
> > > + struct line_buffer *stage_buffer,
> > > + struct line_buffer *output_buffer,
> > > + int x_start,
> > > + int pixel_count)
> > > {
> > > - int x_dst = frame_info->dst.x1;
> > > - struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
> > > - struct pixel_argb_u16 *in = stage_buffer->pixels;
> > > - int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > > - stage_buffer->n_pixels);
> > > -
> > > - for (int x = 0; x < x_limit; x++) {
> > > - out[x].a = (u16)0xffff;
> > > - out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> > > - out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> > > - out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
> > > + struct pixel_argb_u16 *out = &output_buffer->pixels[x_start];
> > > + struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start];
> >
> > Input buffers and pointers should be const.
>
> They will be const in v4.
>
> > > +
> > > + for (int i = 0; i < pixel_count; i++) {
> > > + out[i].a = (u16)0xffff;
> > > + out[i].r = pre_mul_blend_channel(in[i].r, out[i].r, in[i].a);
> > > + out[i].g = pre_mul_blend_channel(in[i].g, out[i].g, in[i].a);
> > > + out[i].b = pre_mul_blend_channel(in[i].b, out[i].b, in[i].a);
> > > }
> > > }
> >
> > Somehow the hunk above does not feel like it is part of "re-introduce
> > line-per-line composition algorithm". This function was already running
> > line-by-line. Would it be easy enough to collect this and directly
> > related changes into a separate patch?
>
> It is not directly related to the reintroduction of the line-by-line
> algorithm, but as part of the simplification and maintainability effort, I
> changed the function a bit to avoid having multiple places computing the
> x_start/pixel_count values. I don't see the benefit of extracting it; it
> would just move those few lines to the call site.

It does make review more difficult, because it makes the patch bigger
and is not explained in the commit message. It is a surprise to a
reviewer, who then needs to think about what this means and whether it
belongs here.

If you explain it in the commit message and note it in the commit
summary line, I think it would become fairly obvious that this patch is
doing two things rather than one.

Therefore, *if* it is easy to extract as a separate patch, then it
would be nice to do so. However, if doing so would require you to write
a bunch of temporary code that the next patch would just rewrite again,
then doing so would be counter-productive.

Patch split is about finding a good trade-off to make things easy for
reviewers:

- Smaller patches are better as long as they are self-standing and
understandable in isolation, and of course do not regress anything.

- Rewriting the same thing multiple times in the same series is extra
work for a reviewer and therefore best avoided.

- The simpler the semantic change, the bigger a patch can be and still
be easy to review.

And all the patch writing rules specific to the kernel project that I
don't know about.

> [...]
>
> > > +/**
> > > + * direction_for_rotation() - Helper to get the correct reading direction for a specific rotation
> > > + *
> > > + * @rotation: rotation to analyze
> >
> > This is KMS plane rotation property, right?
> >
> > So the KMS plane has been rotated by this, and what we want to find is
> > the read direction on the attached FB so that reading returns pixels in
> > the CRTC line/scanout order, right?
> >
> > Maybe extend the doc to explain that.
>
> Is it better?
>
> * direction_for_rotation() - Get the correct reading direction for a given rotation
> *
> * This function will use the @rotation parameter to compute the correct reading direction to read
> * a line from the source buffer.
> * For example, if the buffer is reflected on the X axis, the pixels must be read from right to left.
> * @rotation: Rotation to analyze. It corresponds to the field @frame_info.rotation.

I think it is important to define what determines the correct result.
In this case, we want the reading to produce pixels in the CRTC scanout
line order, I believe. If you don't say "CRTC", the reader does not
know what "the correct reading direction" should match to.

> > > + */
> > > +enum pixel_read_direction direction_for_rotation(unsigned int rotation)
> > > +{
> > > + if (rotation & DRM_MODE_ROTATE_0) {
> > > + if (rotation & DRM_MODE_REFLECT_X)
> > > + return READ_LEFT;
> > > + else
> > > + return READ_RIGHT;
> > > + } else if (rotation & DRM_MODE_ROTATE_90) {
> > > + if (rotation & DRM_MODE_REFLECT_Y)
> > > + return READ_UP;
> > > + else
> > > + return READ_DOWN;
> > > + } else if (rotation & DRM_MODE_ROTATE_180) {
> > > + if (rotation & DRM_MODE_REFLECT_X)
> > > + return READ_RIGHT;
> > > + else
> > > + return READ_LEFT;
> > > + } else if (rotation & DRM_MODE_ROTATE_270) {
> > > + if (rotation & DRM_MODE_REFLECT_Y)
> > > + return READ_DOWN;
> > > + else
> > > + return READ_UP;
> > > + }
> > > + return READ_RIGHT;
> > > +}
> > > +
> > > /**
> > > * blend - blend the pixels from all planes and compute crc
> > > * @wb: The writeback frame buffer metadata
> > > @@ -183,11 +187,11 @@ static void blend(struct vkms_writeback_job *wb,
> > > {
> > > struct vkms_plane_state **plane = crtc_state->active_planes;
> > > u32 n_active_planes = crtc_state->num_active_planes;
> > > - int y_pos;
> > >
> > > const struct pixel_argb_u16 background_color = { .a = 0xffff };
> > >
> > > size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> > > + size_t crtc_x_limit = crtc_state->base.crtc->mode.hdisplay;
> >
> > Wonder why these were size_t, causing needs to cast below...
>
> For crtc_x_limit I just copied the crtc_y_limit. I will change both to u16
> (the type of h/vdisplay).

Don't go unsigned, that can cause unexpected results when mixed in
computations with signed variables.

Oh, the cast was probably not about size but signedness. Indeed, size_t
is unsigned.

I don't see a reason to use a 16-bit size either, it just exposes the
computations to under/overflows that you would then need to check for.
s32 should be as fast as any, and perhaps enough bits to never
under/overflow in these computations, but please verify that.
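
A small standalone sketch of the pitfall, with hypothetical function names. The explicit cast in the first function spells out what the usual arithmetic conversions do implicitly on common platforms when an `int` is compared with a `size_t`.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Mixed comparison: the signed x is converted to size_t before the
 * comparison, so a negative x turns into a huge positive number.
 */
static int below_limit_mixed(int x, size_t limit)
{
	return (size_t)x < limit;
}

/* Same check with both operands signed: a negative x behaves as expected. */
static int below_limit_signed(int32_t x, int32_t limit)
{
	return x < limit;
}
```

With `x = -1` and a limit of 1920, the mixed version reports that x is not below the limit, while the signed version reports that it is.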

> > >
> > > /*
> > > * The planes are composed line-by-line. It is a necessary complexity to avoid poor
> > > @@ -198,22 +202,133 @@ static void blend(struct vkms_writeback_job *wb,
> > >
> > > /* The active planes are composed associatively in z-order. */
> > > for (size_t i = 0; i < n_active_planes; i++) {
> > > - y_pos = get_y_pos(plane[i]->frame_info, y);
> > > + struct vkms_plane_state *current_plane = plane[i];
> > >
> > > - if (!check_limit(plane[i]->frame_info, y_pos))
> > > + /* Avoid rendering useless lines */
> > > + if (y < current_plane->frame_info->dst.y1 ||
> > > + y >= current_plane->frame_info->dst.y2) {
> > > continue;
> > > -
> > > - vkms_compose_row(stage_buffer, plane[i], y_pos);
> > > - pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
> > > - output_buffer);
> > > + }
> > > +
> > > + /*
> > > + * src_px is the line to copy. The initial coordinates are inside the
> > > + * destination framebuffer, and then drm_rect_* helpers are used to
> > > + * compute the correct position into the source framebuffer.
> > > + */
> > > + struct drm_rect src_px = DRM_RECT_INIT(
> > > + current_plane->frame_info->dst.x1, y,
> > > + drm_rect_width(&current_plane->frame_info->dst), 1);
> > > + struct drm_rect tmp_src;
> > > +
> > > + drm_rect_fp_to_int(&tmp_src, &current_plane->frame_info->src);
> > > +
> > > + /*
> > > + * [1]: Clamping src_px to the crtc_x_limit to avoid writing outside of the
> > > + * destination buffer
> > > + */
> > > + src_px.x2 = min_t(int, src_px.x2, (int)crtc_x_limit);
> >
> > Up to and including this point, it would be better if src_px was called
> > dst_px, because only the below computation converts it into actual
> > src_px.
>
> I agree, it will be changed for the v4. I will also change the name to
> `dst_line` and `src_line`.

Alright.

..


> > > }
> > >
> > > -static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> > > +/**
> > > + * get_step_1x1() - Common helper to compute the correct step value between each pixel to read in a
> > > + * certain direction.
> > > + * This must be used only with format where blockh == blockw == 1.
> > > + * In the case when direction is not a valid pixel_read_direction, the returned step is 0, so you
> > > + * must not rely on this result to create a loop variant.
> > > + *
> > > + * @fb Framebuffer to iter on
> > > + * @direction Direction of the reading
> > > + */
> > > +static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction direction,
> > > + int plane_index)
> > > {
> > > - int x_src = frame_info->src.x1 >> 16;
> > > - int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
> > > -
> > > - return packed_pixels_addr(frame_info, x_src, y_src);
> > > + switch (direction) {
> > > + default:
> > > + DRM_ERROR("Invalid direction for pixel reading: %d\n", direction);
> > > + return 0;
> >
> > What I'd do here is move the default: section outside of the switch
> > completely. Then the compiler can warn if any enum value is not handled
> > here. Since every case in the switch is a return statement, falling out
> > of the switch block is the default case.
>
> Oh, I did not know that gcc can warn when using enums, I will definitely
> do it for the v4.
>
> > Maybe the enum variable containing an illegal value could be handled
> > more harshly so that callers could rely on this function always
> > returning a good value?
> >
> > Just like passing in fb=NULL is handled by the kernel as an OOPS.
>
> I don't think it's a good idea to OOPS inside a driver.

Everyone already does that. Most functions that do not expect to be called
with NULL never check the arguments for NULL. They just OOPS on
dereference if someone passes in NULL. And for a good reason: adding
all those checks is both code churn and it casts doubt: "maybe it is
legal and expected to call this function with NULL sometimes, what good
does that do?".

> An error here is
> maybe dangerous, but is not fatal to the kernel. Maybe you know how to do
> a "local" OOPS to break only this driver and not the whole kernel?

I don't know what the best practices are in the kernel.

> For the v4 I will keep a DRM_ERROR and return 0.

Does that require the caller to check for 0? Could the 0 cause
something else to end up in an endless loop? If it does return 0, how
should a caller handle this case that "cannot" ever happen? Why have
code for something that cannot happen?

Of course it's a trade-off between correctness and limping along
injured, and the kernel tends to strongly lean toward the latter for the
obvious reasons.
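
For reference, a standalone sketch of the "no default label" pattern suggested above. The framebuffer is reduced to two hypothetical parameters, `bytes_per_pixel` and `pitch`, so that the example compiles on its own; it is not the driver function.

```c
enum pixel_read_direction { READ_UP, READ_DOWN, READ_LEFT, READ_RIGHT };

static int get_step_1x1_sketch(enum pixel_read_direction direction,
			       int bytes_per_pixel, int pitch)
{
	/*
	 * No default: label here, so GCC's -Wswitch (part of -Wall) can
	 * report an enum value that is missing from the switch.
	 */
	switch (direction) {
	case READ_RIGHT:
		return bytes_per_pixel;
	case READ_LEFT:
		return -bytes_per_pixel;
	case READ_DOWN:
		return pitch;
	case READ_UP:
		return -pitch;
	}

	/* Only reached if `direction` holds a value outside the enum. */
	return 0;
}
```

Whether the fall-through path should return 0, log an error, or be treated more harshly is exactly the open question in this part of the thread; the sketch only shows where that code would live.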

> > > + case READ_RIGHT:
> > > + return fb->format->char_per_block[plane_index];
> > > + case READ_LEFT:
> > > + return -fb->format->char_per_block[plane_index];
> > > + case READ_DOWN:
> > > + return (int)fb->pitches[plane_index];
> > > + case READ_UP:
> > > + return -(int)fb->pitches[plane_index];
> > > + }
> > > }
> > >
> > > -static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
> > > -{
> > > - if (frame_info->rotation & (DRM_MODE_REFLECT_X | DRM_MODE_ROTATE_270))
> > > - return limit - x - 1;
> > > - return x;
> > > -}
> > >
> > > /*
> > > - * The following functions take pixel data from the buffer and convert them to the format
> > > + * The following functions take pixel data (a, r, g, b, pixel, ...), convert them to the format
> > > * ARGB16161616 in out_pixel.
> > > *
> > > - * They are used in the `vkms_compose_row` function to handle multiple formats.
> > > + * They are used in the `read_line`s functions to avoid duplicate work for some pixel formats.
> > > */
> > >
> > > -static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> > > +static void ARGB8888_to_argb_u16(struct pixel_argb_u16 *out_pixel, int a, int r, int g, int b)
> >
> > The function name ARGB8888_to_argb_u16() is confusing. It's not taking
> > in ARGB8888 pixels but separate a,r,g,b ints. The only assumption it
> > needs from the pixel format is the 8888 part.
>
> I don't really know how to name it. What I like with ARGB8888 is that it's
> clear that the values are 8 bits and in argb format.

I could even propose

static struct pixel_argb_u16
argb_u16_from_u8888(int a, int r, int g, int b)

perhaps. Yes, returning a struct by value. I think it would fit, and
these are supposed to get fully inlined anyway, too.

cf. argb_u16_from_u2101010().

Not a big deal though, I think I'm getting a little bit too involved to
see what would be the most intuitively understandable naming scheme for
someone not familiar with the code.

> Do you think that `argb_u8_to_argb_u16`, with a new structure
> pixel_argb_u8, would be better? (like PATCH 6/9 with pixel_yuv_u8).
>
> If so, I will introduce the argb_u8 structure in another commit.

How would you handle 10-bpc formats? Is there a need for
proliferation of bit-depth-specific struct types?

> [...]
>
> > > + * The following functions are read_line function for each pixel format supported by VKMS.
> > > *
> > > - * This function composes a single row of a plane. It gets the source pixels
> > > - * through the y coordinate (see get_packed_src_addr()) and goes linearly
> > > - * through the source pixel, reading the pixels and converting it to
> > > - * ARGB16161616 (see the pixel_read() callback). For rotate-90 and rotate-270,
> > > - * the source pixels are not traversed linearly. The source pixels are queried
> > > - * on each iteration in order to traverse the pixels vertically.
> > > + * They read a line starting at the point @x_start,@y_start following the @direction. The result
> > > + * is stored in @out_pixel and in the format ARGB16161616.
> > > + *
> > > + * Those function are very similar, but it is required for performance reason. In the past, some
> > > + * experiment were done, and with a generic loop the performance are very reduced [1].
> > > + *
> > > + * [1]: https://lore.kernel.org/dri-devel/[email protected]/
> > > */
> > > -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y)
> > > +
> > > +static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> > > + enum pixel_read_direction direction, int count,
> > > + struct pixel_argb_u16 out_pixel[])
> > > +{
> > > + u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> > > +
> > > + int step = get_step_1x1(frame_info->fb, direction, 0);
> > > +
> > > + while (count) {
> > > + u8 *px = (u8 *)src_pixels;
> > > +
> > > + ARGB8888_to_argb_u16(out_pixel, px[3], px[2], px[1], px[0]);
> > > + out_pixel += 1;
> > > + src_pixels += step;
> > > + count--;
> >
> > btw. you could eliminate decrementing 'count' if you computed end
> > address and used while (out_pixel < end).
>
> Yes, you are right, but after thinking about it, neither
> while (out_pixel < end) nor while (count) conveys "this loop will copy
> `count` pixels". I think a for-loop is more understandable here: there is
> no ambiguity in the number of pixels written, and it is less error-prone.
> I will replace
> while (count)
> by
> for (int i = 0; i < count; i++)

I agree that a for-loop is the most obvious way of saying it, but I
also think while (out_pixel < end) is very close too, and so is while (count).
None of those would make me think twice.

However, I'm thinking of performance here. After all, this is the
hottest code path there is in VKMS. Is the compiler smart enough to
eliminate count-- or i to reduce the number of CPU cycles?
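
For comparison, a standalone sketch of the end-pointer form of the loop. The struct and helper names are simplified stand-ins, not the VKMS definitions, and the byte order follows the quoted `px[3], px[2], px[1], px[0]` call (a, r, g, b).

```c
#include <stdint.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Expand one 8-bit channel to 16 bits (0xff maps to 0xffff). */
static uint16_t u8_to_u16(uint8_t v)
{
	return (uint16_t)(v * 257);
}

/*
 * Read `count` pixels starting at `src`, moving `step` bytes between two
 * pixels (step may be negative). The loop is bounded by an end pointer on
 * the output side instead of a counter.
 */
static void read_line_end_pointer(const uint8_t *src, int step, int count,
				  struct pixel_argb_u16 *out)
{
	const struct pixel_argb_u16 *end = out + count;

	while (out < end) {
		out->a = u8_to_u16(src[3]);
		out->r = u8_to_u16(src[2]);
		out->g = u8_to_u16(src[1]);
		out->b = u8_to_u16(src[0]);
		out++;
		src += step;
	}
}
```

Whether this form, `while (count)`, or a for-loop produces the fastest code is something to measure with the actual compiler and flags; the sketch makes no claim about that.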


Thanks,
pq



2024-03-02 14:15:00

by Arthur Grillo

Subject: Re: [PATCH v2 6/9] drm/vkms: Add YUV support



On 01/03/24 08:53, Pekka Paalanen wrote:
> On Thu, 29 Feb 2024 14:57:06 -0300
> Arthur Grillo <[email protected]> wrote:
>
>> On 29/02/24 09:12, Pekka Paalanen wrote:
>>> On Wed, 28 Feb 2024 22:52:09 -0300
>>> Arthur Grillo <[email protected]> wrote:
>>>
>>>> On 27/02/24 17:01, Arthur Grillo wrote:
>>>>>
>>>>>
>>>>> On 27/02/24 12:02, Louis Chauvet wrote:
>>>>>> Hi Pekka,
>>>>>>
>>>>>> For all the comments related to the conversion part, maybe Arthur has an
>>>>>> opinion on them; I took his patch as a "black box" (I did not want to
>>>>>> break (and debug) it).
>>>>>>
>>>>>> On 26/02/24 at 14:19, Pekka Paalanen wrote:
>>>>>>> On Fri, 23 Feb 2024 12:37:26 +0100
>>>>>>> Louis Chauvet <[email protected]> wrote:
>>>>>>>
>>>>>>>> From: Arthur Grillo <[email protected]>
>>>>>>>>
>>>>>>>> Add support for the YUV formats below:
>>>>>>>>
>>>>>>>> - NV12
>>>>>>>> - NV16
>>>>>>>> - NV24
>>>>>>>> - NV21
>>>>>>>> - NV61
>>>>>>>> - NV42
>>>>>>>> - YUV420
>>>>>>>> - YUV422
>>>>>>>> - YUV444
>>>>>>>> - YVU420
>>>>>>>> - YVU422
>>>>>>>> - YVU444
>>>>>>>>
>>>>>>>> The conversion matrices of each encoding and range were obtained by
>>>>>>>> rounding the values of the original conversion matrices multiplied by
>>>>>>>> 2^8. This is done to avoid the use of fixed point operations.
>>>>>>>>
>>>>>>>> Signed-off-by: Arthur Grillo <[email protected]>
>>>>>>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
>>>>>>>> callbacks for yuv formats]
>>>>>>>> Signed-off-by: Louis Chauvet <[email protected]>
>>>>>>>> ---
>>>>>>>> drivers/gpu/drm/vkms/vkms_composer.c | 2 +-
>>>>>>>> drivers/gpu/drm/vkms/vkms_drv.h | 6 +-
>>>>>>>> drivers/gpu/drm/vkms/vkms_formats.c | 289 +++++++++++++++++++++++++++++++++--
>>>>>>>> drivers/gpu/drm/vkms/vkms_formats.h | 4 +
>>>>>>>> drivers/gpu/drm/vkms/vkms_plane.c | 14 +-
>>>>>>>> 5 files changed, 295 insertions(+), 20 deletions(-)
>
> ...
>
>>>> diff --git a/drivers/gpu/drm/vkms/tests/vkms_format_test.c b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
>>>> index f66584549827..4cee3c2d8d84 100644
>>>> --- a/drivers/gpu/drm/vkms/tests/vkms_format_test.c
>>>> +++ b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
>>>> @@ -59,7 +59,7 @@ static struct yuv_u8_to_argb_u16_case yuv_u8_to_argb_u16_cases[] = {
>>>> {"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
>>>> {"gray", {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
>>>> {"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
>>>> - {"red", {0x35, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
>>>> + {"red", {0x36, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
>>>> {"green", {0xb6, 0x1e, 0x0c}, {0x0000, 0x0000, 0xffff, 0x0000}},
>>>> {"blue", {0x12, 0xff, 0x74}, {0x0000, 0x0000, 0x0000, 0xffff}},
>>>> },
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> index e06bbd7c0a67..043f23dbf80d 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> @@ -121,10 +121,12 @@ static void RGB565_to_argb_u16(u8 **src_pixels, struct pixel_argb_u16 *out_pixel
>>>> out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
>>>> }
>>>>
>>>> -static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
>>>> +#define BIT_DEPTH 32
>>>> +
>>>> +static void ycbcr2rgb(const s64 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
>>>> {
>>>> - s32 y_16, cb_16, cr_16;
>>>> - s32 r_16, g_16, b_16;
>>>> + s64 y_16, cb_16, cr_16;
>>>> + s64 r_16, g_16, b_16;
>>>>
>>>> y_16 = y - y_offset;
>>>> cb_16 = cb - 128;
>>>> @@ -134,9 +136,18 @@ static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r,
>>>> g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
>>>> b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;
>>>>
>>>> - *r = clamp(r_16, 0, 0xffff) >> 8;
>>>> - *g = clamp(g_16, 0, 0xffff) >> 8;
>>>> - *b = clamp(b_16, 0, 0xffff) >> 8;
>>>> + // rounding the values
>>>> + r_16 = r_16 + (1LL << (BIT_DEPTH - 4));
>>>> + g_16 = g_16 + (1LL << (BIT_DEPTH - 4));
>>>> + b_16 = b_16 + (1LL << (BIT_DEPTH - 4));
>>>> +
>>>> + r_16 = clamp(r_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
>>>> + g_16 = clamp(g_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
>>>> + b_16 = clamp(b_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
>>>
>>> Where do the BIT_DEPTH - 4 and BIT_DEPTH + 8 come from?
>>
>> Basically, the numbers are in this form in hex:
>>
>> 0xsspppppppp
>>
>> In the end, we only want the 's' bits.
>>
>> The matrix multiplication is not giving us perfect results, making some
>> of the KUnit tests fail. This is because the values end up a little bit
>> off: KUnit expects 0xfe, but this function is returning 0xfd.
>>
>> I noticed that before shifting the values to get the 's' bytes, the
>> values were a lot closer to what is expected, something like:
>>
>> 0xfdfe287312
>> ^
>> So the rounding part adds 1 to this marked 'f' to round a bit the values
>> (drm_fixed.h does something similar on drm_fixp2int_round).
>> Like that:
>>
>> 0xfdfe287312
>> + 0x0010000000
>> ------------
>> 0xfe0e287312
>>
>> That's why the BIT_DEPTH - 4.
>
> I have a hard time deciphering this. There is some sort of strange
> combination of UNORM and fixed-point going on here, where you process
> the range 0.0 - 255.0 including 32-bit fraction. All variables being
> named "_16" does not help, I've no idea what that refers to.

Totally forgot to rename that, sorry.

>
> Usually when you have unsigned pixel format type, it's UNORM, that is
> an unsigned integer representation that maps to [0.0, 1.0]. When
> converting UNORM properly to e.g. fixed-point, you don't have to
> consider the UNORM bit depth when computing in fixed-point.
>
> There is a catch: since 0xff maps to 1.0, the divisor is 0xff, and not
> a bit shift by 8. This must be taken into account when converting
> between different depths of UNORM, or between UNORM and fixed-point.
> Converting between different depths of fixed-point does not have this
> problem.
>
> If you want proper rounding, meaning that 0.5 rounds up to 1.0 and
> 0.4999 rounds down to 0.0 when rounding to integers, you have to add
> 0.5 before truncating.
>
> So in this case you need to add 0x0100_0000 / 2 = 0x0080_0000, not
> 0x0010_0000.

Thanks for the explanations, I will try to take all this into account.
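
A short standalone sketch of the UNORM point made above, with hypothetical helper names. It shows the exact 8-to-16-bit expansion (the `* 257` already used in the quoted patch) and a rounded 16-to-8-bit conversion, next to a plain shift for comparison.

```c
#include <stdint.h>

/* UNORM8 -> UNORM16: exact, because 0xffff / 0xff == 257. */
static uint16_t unorm8_to_unorm16(uint8_t v)
{
	return (uint16_t)(v * 257);
}

/*
 * UNORM16 -> UNORM8 with round-to-nearest: v * 255 / 65535, where adding
 * 32767 before the integer division implements the "add one half" step
 * (the divisor is odd, so there are no exact ties).
 */
static uint8_t unorm16_to_unorm8(uint16_t v)
{
	return (uint8_t)(((uint32_t)v * 255u + 32767u) / 65535u);
}

/* A plain shift treats the divisor as 256 and always truncates. */
static uint8_t shift16_to_8(uint16_t v)
{
	return (uint8_t)(v >> 8);
}
```

The two conversions round-trip for every 8-bit value. They differ from the shift for inputs such as 0x00ff, which is about 0.99 on the 8-bit scale: the rounded conversion gives 1 and the shift gives 0.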

>
> I don't understand what drm_fixp2int_round() is even doing. The offset
> is not 0.5, it's 0.0000076.
>
>> After that, the values need to be clamped to not get wrong results when
>> shifting this s64 and casting it to u8. We clamp it to the minimum
>> allowed value, 0, and to the maximum allowed value, which in this case
>> has all (BIT_DEPTH + 8) bits set to 1. The '+ 8' accounts for the size
>> of the 's' bits.
>
> Ok. You could also shift with >> BIT_DEPTH first, and then clamp to 0,
> 255.

Great idea! This makes more sense.
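
A minimal sketch of the "round, shift, then clamp" order discussed above, in standalone C. `FRAC_BITS` and `fixed_to_u8()` are hypothetical names; the sketch assumes 32 fractional bits, like BIT_DEPTH in the quoted patch, and relies on right-shifting a negative signed value being an arithmetic shift, as it is with GCC and Clang.

```c
#include <stdint.h>

#define FRAC_BITS 32 /* assumed number of fractional bits */

/*
 * Convert a signed fixed-point value to an 8-bit channel:
 * add one half, drop the fraction, then clamp to [0, 255].
 */
static uint8_t fixed_to_u8(int64_t v)
{
	int64_t half = (int64_t)1 << (FRAC_BITS - 1);
	int64_t rounded = (v + half) >> FRAC_BITS;

	if (rounded < 0)
		return 0;
	if (rounded > 255)
		return 255;
	return (uint8_t)rounded;
}
```

With 32 fractional bits, one half is `1LL << 31`. An offset of `1LL << (FRAC_BITS - 4)` corresponds to 1/16 = 0.0625, so it only rounds up values whose fraction is 0.9375 or more. The example value 0xfdfe287312 from the explanation above becomes 0xfe with either offset, but a value such as 253.5 would not.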

Best Regards,
~Arthur Grillo

>
>
> Thanks,
> pq
>
>> After writing this, I think that maybe it would be good to add this
>> explanation as a comment on the code.
>>
>>>
>>>> +
>>>> + *r = r_16 >> BIT_DEPTH;
>>>> + *g = g_16 >> BIT_DEPTH;
>>>> + *b = b_16 >> BIT_DEPTH;
>>>> }

2024-03-04 15:28:46

by Louis Chauvet

Subject: Re: [PATCH v2 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions

On 29/02/24 at 10:48, Pekka Paalanen wrote:
> On Tue, 27 Feb 2024 16:02:10 +0100
> Louis Chauvet <[email protected]> wrote:
>
> > [...]
> >
> > > > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > > > index 172830a3936a..cb7a49b7c8e7 100644
> > > > --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > > > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > > > @@ -9,6 +9,17 @@
> > > >
> > > > #include "vkms_formats.h"
> > > >
> > > > +/**
> > > > + * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> > > > + * in the first plane
> > > > + *
> > > > + * @frame_info: Buffer metadata
> > > > + * @x: The x coordinate of the wanted pixel in the buffer
> > > > + * @y: The y coordinate of the wanted pixel in the buffer
> > > > + *
> > > > + * The caller must be aware that this offset is not always a pointer to a pixel. If individual
> > > > + * pixel values are needed, they have to be extracted from the resulting block.
> > >
> > > Just wondering how the caller will be able to extract the right pixel
> > > from the block without re-using the knowledge already used in this
> > > function. I'd also expect the function to round down x,y to be
> > > divisible by block dimensions, but that's not visible in this email.
> > > Then the caller needs the remainder from the round-down, too?
> >
> > You are right, the current implementation is only working when block_h ==
> > block_w == 1. I think I wrote the documentation for PATCHv2 5/9, but when
> > backporting this comment for PATCHv2 3/9 I forgot to update it.
> > The new comment will be:
> >
> > * pixels_offset() - Get the offset of a given pixel data at coordinate
> > * x/y in the first plane
> > [...]
> > * The caller must ensure that the framebuffer associated with this
> > * request uses a pixel format where block_h == block_w == 1.
> > * If this requirement is not fulfilled, the resulting offset can be
> > * completely wrong.
>
> Hi Louis,

Hi Pekka,

> if there is no plan for how non-1x1 blocks would work yet, then I think
> the above wording is fine. In my mind, the below wording would
> encourage callers to seek out and try arbitrary tricks to make things
> work for non-1x1 without rewriting the function to actually work.
>
> I believe something would need to change in the function signature to
> make it properly usable for non-1x1 blocks, but I too cannot suggest
> anything off-hand.

I already made the change to support non-1x1 blocks in PATCHv2 5/9
(I will extract this modification into "drm/vkms: Update pixels accessor to
support packed and multi-plane formats"); this function is now able
to return the pointer to the start of a block. But as stated in the
comment, the caller must manually extract the correct pixel values (if the
format is 2x2, the pointer will point to the first byte of this block, and
the caller must do some computation to access the bottom-right value).
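
A minimal sketch of the block lookup described above, not the code from the
series: it returns the byte offset of the block containing (x, y) together
with the position of the pixel inside that block, which is the remainder
Pekka asks about. The names sketch_format, sketch_block_pos and
sketch_block_offset are hypothetical, and block_pitch is assumed to be the
number of bytes used by one row of blocks.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-plane format metadata. */
struct sketch_format {
	unsigned int block_w;        /* pixels per block, horizontally */
	unsigned int block_h;        /* pixels per block, vertically */
	unsigned int char_per_block; /* bytes used by one block */
};

/* Result of the lookup: where the block starts and where the pixel sits in it. */
struct sketch_block_pos {
	size_t offset; /* byte offset of the block containing (x, y) */
	int rem_x;     /* x position of the pixel inside that block */
	int rem_y;     /* y position of the pixel inside that block */
};

/*
 * Assumptions: x and y are non-negative, and block_pitch is the byte size of
 * one full row of blocks (not of one row of pixels).
 */
static struct sketch_block_pos sketch_block_offset(const struct sketch_format *fmt,
						   size_t block_pitch, int x, int y)
{
	struct sketch_block_pos pos;
	int block_x = x / (int)fmt->block_w; /* round down to the block grid */
	int block_y = y / (int)fmt->block_h;

	pos.rem_x = x % (int)fmt->block_w;
	pos.rem_y = y % (int)fmt->block_h;
	pos.offset = (size_t)block_y * block_pitch +
		     (size_t)block_x * fmt->char_per_block;
	return pos;
}
```

With a 2x2 format, 8 bytes per block and a block_pitch of 64, the pixel at
(5, 3) lands in the block at offset 80 and is the bottom-right pixel of that
block (rem_x == 1, rem_y == 1).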

> >
> > And yes, even after PATCHv2 5/9 it is not clear what is the offset. Is
> > this better to replace the last sentence? (I will do the same update for
> > the last sentence of packed_pixels_addr)
> >
> > [...]
> > * The returned offset corresponds to the offset of the block containing the pixel at coordinates
> > * x/y.
> > * The caller must use this offset with care, as for formats with block_h != 1 or block_w != 1
> > * the requested pixel value may have to be extracted from the block, even if the pixels are
> > * individually addressable.
> >
> > > > + */
> > > > static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> > > > {
> > > > struct drm_framebuffer *fb = frame_info->fb;
> > > > @@ -17,12 +28,13 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
> > > > + (x * fb->format->cpp[0]);
> > > > }
> > > >
> >
> > [...]
> >
> > > > +/**
> > > > + * Retrieve the correct read_pixel function for a specific format.
> > > > + * The returned pointer is NULL for unsupported pixel formats. The caller must ensure that the
> > > > + * pointer is valid before using it in a vkms_plane_state.
> > > > + *
> > > > + * @format: 4cc of the format
> > >
> > > Since there are many different 4cc style pixel format definition tables
> > > in existence with conflicting definitions, it would not hurt to be more
> > > specific that this is about DRM_FORMAT_* or drm_fourcc.h.
> >
> > Is this better?
> >
> > @format: DRM_FORMAT_* value for which to obtain a conversion function (see [drm_fourcc.h])
>
> Much better!
>
>
> Thanks,
> pq

[...]

Kind regards,
Louis Chauvet

--
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2024-03-04 15:29:44

by Louis Chauvet

Subject: Re: [PATCH v2 7/9] drm/vkms: Add range and encoding properties to pixel_read function

On 29/02/24 - 14:24, Pekka Paalanen wrote:
> On Tue, 27 Feb 2024 16:02:10 +0100
> Louis Chauvet <[email protected]> wrote:
>
> > (same as for PATCHv2 6/9, I took the patch from Arthur with no
> > modifications)
> >
> > On 26/02/24 - 14:23, Pekka Paalanen wrote:
> > > On Fri, 23 Feb 2024 12:37:27 +0100
> > > Louis Chauvet <[email protected]> wrote:
> > >
> > > > From: Arthur Grillo <[email protected]>
> > > >
> > > > Create range and encoding properties. This should be noop, as none of
> > > > the conversion functions need those properties.
> > >
> > > None of the conversion function needs this? How can one say so?
> > > The previous patch is making use of them already, AFAICT?
> >
> > It's my fault, I mixed the commits (in Arthur's series, "Add range..." was
> > before "Add YUV support"), but for me it makes no sense to have the color
> > property without the support in the driver.
>
> Ah, so if there was no YUV support, these properties would never affect
> anything. Ok, I see where that is coming from.
>
> >
> > Maybe it's better just to merge "Add range..." with "Add YUV support"?
> >
> > > How is this a noop? Is it not exposing new UAPI from VKMS?
> >
> > It's not a no-op from userspace, but from the driver side, yes.
>
> If it all is already hooked up and handled in the driver, then say just
> that?
>
> "Now that the driver internally handles these quantization ranges and YUV
> encoding matrices, expose the UAPI for setting them."
>
> And fix the commit summary line too, nothing "pixel_read" here.

Ack

Kind regards,
Louis Chauvet

> Thanks,
> pq
>
> >
> > Kind regards,
> > Louis Chauvet
> >
> > > Thanks,
> > > pq
> > >
> > > >
> > > > Signed-off-by: Arthur Grillo <[email protected]>
> > > > [Louis Chauvet: retained only relevant parts]
> > > > Signed-off-by: Louis Chauvet <[email protected]>
> > > > ---
> > > > drivers/gpu/drm/vkms/vkms_plane.c | 9 +++++++++
> > > > 1 file changed, 9 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> > > > index 427ca67c60ce..95dfde297377 100644
> > > > --- a/drivers/gpu/drm/vkms/vkms_plane.c
> > > > +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> > > > @@ -228,5 +228,14 @@ struct vkms_plane *vkms_plane_init(struct vkms_device *vkmsdev,
> > > > drm_plane_create_rotation_property(&plane->base, DRM_MODE_ROTATE_0,
> > > > DRM_MODE_ROTATE_MASK | DRM_MODE_REFLECT_MASK);
> > > >
> > > > + drm_plane_create_color_properties(&plane->base,
> > > > + BIT(DRM_COLOR_YCBCR_BT601) |
> > > > + BIT(DRM_COLOR_YCBCR_BT709) |
> > > > + BIT(DRM_COLOR_YCBCR_BT2020),
> > > > + BIT(DRM_COLOR_YCBCR_LIMITED_RANGE) |
> > > > + BIT(DRM_COLOR_YCBCR_FULL_RANGE),
> > > > + DRM_COLOR_YCBCR_BT601,
> > > > + DRM_COLOR_YCBCR_FULL_RANGE);
> > > > +
> > > > return plane;
> > > > }
> > > >
> > >
> >
> >
> >
>



--
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2024-03-04 15:29:45

by Louis Chauvet

Subject: Re: [PATCH v2 4/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions

On 29/02/24 - 11:07, Pekka Paalanen wrote:
> On Tue, 27 Feb 2024 16:02:13 +0100
> Louis Chauvet <[email protected]> wrote:
>
> > On 26/02/24 - 13:36, Pekka Paalanen wrote:
> > > On Fri, 23 Feb 2024 12:37:24 +0100
> > > Louis Chauvet <[email protected]> wrote:
> > >
> > > > Introduce two typedefs: pixel_read_t and pixel_write_t. It allows the
> > > > compiler to check if the passed functions take the correct arguments.
> > > > Such typedefs will help ensuring consistency across the code base in
> > > > case of update of these prototypes.
> > > >
> > > > Introduce a check around the get_pixel_*_functions to avoid using a
> > > > nullptr as a function.
> > > >
> > > > Document for those typedefs.
> > > >
> > > > Signed-off-by: Louis Chauvet <[email protected]>
> > > > ---
> > > > drivers/gpu/drm/vkms/vkms_drv.h | 23 +++++++++++++++++++++--
> > > > drivers/gpu/drm/vkms/vkms_formats.c | 8 ++++----
> > > > drivers/gpu/drm/vkms/vkms_formats.h | 4 ++--
> > > > drivers/gpu/drm/vkms/vkms_plane.c | 9 ++++++++-
> > > > drivers/gpu/drm/vkms/vkms_writeback.c | 9 ++++++++-
> > > > 5 files changed, 43 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> > > > index 18086423a3a7..886c885c8cf5 100644
> > > > --- a/drivers/gpu/drm/vkms/vkms_drv.h
> > > > +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> > > > @@ -53,12 +53,31 @@ struct line_buffer {
> > > > struct pixel_argb_u16 *pixels;
> > > > };
> > > >
> > > > +/**
> > > > + * typedef pixel_write_t - These functions are used to read a pixel from a
> > > > + * `struct pixel_argb_u16*`, convert it in a specific format and write it in the @dst_pixels
> > > > + * buffer.
> > > > + *
> > > > + * @dst_pixel: destination address to write the pixel
> > > > + * @in_pixel: pixel to write
> > > > + */
> > > > +typedef void (*pixel_write_t)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel);
> > >
> > > There are some inconsistencies in pixel_write_t and pixel_read_t. At
> > > this point of the series they still operate on a single pixel, but you
> > > use dst_pixels and src_pixels, plural. Yet the documentation correctly
> > > talks about processing a single pixel.
> >
> > I will fix this for v4.
> >
> > > I would also expect the source to be always const, but that's a whole
> > > another patch to change.
> >
> > The v4 will contains a new patch "drm/vkms: Use const pointer for
> > pixel_read and pixel_write functions"
>
> Sounds good!
>
> >
> > [...]
> >
> > > > diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> > > > index d5203f531d96..f68b1b03d632 100644
> > > > --- a/drivers/gpu/drm/vkms/vkms_plane.c
> > > > +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> > > > @@ -106,6 +106,13 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> > > > return;
> > > >
> > > > fmt = fb->format->format;
> > > > + pixel_read_t pixel_read = get_pixel_read_function(fmt);
> > > > +
> > > > + if (!pixel_read) {
> > > > + DRM_WARN("Pixel format is not supported by VKMS planes. State is inchanged\n");
> > >
> > > DRM_WARN() is the kernel equivalent to userspace assert(), right?
> >
> > For the DRM_WARN it is just a standard printk(KERN_WARNING ...) (hidden
> > behind drm internal macros).
>
> My concern here is that does hitting this cause additional breakage of
> the UAPI contract? For example, the UAPI contract expects that the old
> FB is unreffed and the new FB is reffed by the plane in question. If
> this early return causes that FB swap to be skipped, it could cause
> secondary unexpected failures or misbehaviour for userspace later. That
> could mislead debugging to think that there is a userspace bug.
>
> Even if you cannot actually read FB due to an internal bug, it would be
> good to still apply all the state changes that the UAPI contract
> mandates.
>
> Unless, this is a kernel bug kind of thing which explodes very
> verbosely, but DRM_WARN is not that.

You are right. In this case, maybe I can just create a dummy
"pixel_read" which always returns black? (The v4 will have it so you can
see how it looks.)

This way, I can:
- keep the check and avoid null function pointers (and OOPS);
- log (WARN, DRM_WARN, DRM_ERROR or whatever is suggested by DRM maintainers
to warn very loudly);
- apply the requested change from userspace (and not break the UAPI
contract).

The resulting behavior will be "the virtual plane is black", which is, I
think, not very important. As the primary usage of VKMS is testing, it
will just break all the tests, and a quick look at the kernel logs will
show this bug.
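
The fallback idea above could look roughly like the following sketch. It is
a userspace mock, not the v4 code: sketch_pixel_read_t, the
SKETCH_FORMAT_XRGB8888 constant and all sketch_* functions are hypothetical,
and "black" is assumed to mean opaque black. The point is only that the
lookup never hands back a NULL callback, so the plane update can always
proceed.

```c
#include <stddef.h>
#include <stdint.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Hypothetical callback type: convert one source pixel to ARGB16161616. */
typedef void (*sketch_pixel_read_t)(const uint8_t *src_pixel,
				    struct pixel_argb_u16 *out_pixel);

#define SKETCH_FORMAT_XRGB8888 1u /* placeholder value, not a real fourcc */

/* Example of a supported format: little-endian XRGB8888 (B, G, R, X in memory). */
static void sketch_read_xrgb8888(const uint8_t *src_pixel,
				 struct pixel_argb_u16 *out_pixel)
{
	out_pixel->a = 0xffff;
	out_pixel->r = (uint16_t)(src_pixel[2] * 257);
	out_pixel->g = (uint16_t)(src_pixel[1] * 257);
	out_pixel->b = (uint16_t)(src_pixel[0] * 257);
}

/* Fallback: ignore the source and always produce opaque black. */
static void sketch_read_black(const uint8_t *src_pixel,
			      struct pixel_argb_u16 *out_pixel)
{
	(void)src_pixel;
	out_pixel->a = 0xffff;
	out_pixel->r = 0;
	out_pixel->g = 0;
	out_pixel->b = 0;
}

/* Returns NULL for formats this mock does not know about. */
static sketch_pixel_read_t sketch_get_pixel_read(uint32_t format)
{
	switch (format) {
	case SKETCH_FORMAT_XRGB8888:
		return sketch_read_xrgb8888;
	default:
		return NULL;
	}
}

/*
 * Never returns NULL. A kernel version would log loudly in the fallback
 * branch, since reaching it means atomic check let a bad format through.
 */
static sketch_pixel_read_t sketch_pixel_read_or_black(uint32_t format)
{
	sketch_pixel_read_t fn = sketch_get_pixel_read(format);

	return fn ? fn : sketch_read_black;
}
```

The caller stores the returned pointer unconditionally, so the rest of the
state update is applied whether or not the format was recognized.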

> > > In that failing the check means an internal invariant was violated,
> > > which means a code bug in kernel?
> > >
> > > Maybe this could be more specific about what invariant was violated?
> > > E.g. atomic check should have rejected this attempt already.
> >
> > I'm not an expert (yet) in DRM, so please correct me:
> > When atomic_update is called, the new state is always validated by
> > atomic_check before? There is no way to pass something not validated by
> > atomic_check to atomic_update? If this is the case, yes, it should be an
> > ERROR and not a WARN as an invalid format passed the atomic_check
> > verification.
>
> I only know about the UAPI, I'm not familiar with kernel internals.
>
> We see that atomic_update returns void, so I think it simply cannot
> return errors. To my understanding, atomic_check needs to ensure that
> atomic_update cannot fail. There is even UAPI to exercise atomic_check
> alone: the atomic commit TEST_ONLY flag. Userspace trusts that flag, and
> will not expect an identical atomic commit to fail without TEST_ONLY
> when it succeeded with TEST_ONLY.

That is my understanding of the UAPI/DRM internals too. Is my suggestion
above sufficient? It will always succeed, with no kernel OOPS.

> > If so, is this better?
> >
> > if (!pixel_read) {
> > /*
> > * This is a bug as the vkms_plane_atomic_check must forbid all unsupported formats.
> > */
> > DRM_ERROR("Pixel format %4cc is not supported by VKMS planes.\n", fmt);
> > return;
> > }
> >
> > I will put the same code in vkms_writeback.c.
>
> Maybe maintainers can comment whether even DRM_ERROR is strong enough.
>
> As for the message, what you wrote in the comment is the most important
> part that I'd put in the log. It explains what's going on, while that
> "format not supported" is a detail without context.
>

Is something like this better?

/*
* This is a bug in vkms_plane_atomic_check. All the supported
* formats must:
* - Be listed in vkms_formats
* - Have a pixel_read_line callback
*/
WARN(true, "Pixel format %p4cc is not supported by VKMS planes. This is a kernel bug. Atomic check must forbid this configuration.\n", &fmt);

> Thanks,
> pq

[...]

Kind regards,
Louis Chauvet

--
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2024-03-04 15:29:50

by Louis Chauvet

Subject: Re: [PATCH v2 5/9] drm/vkms: Re-introduce line-per-line composition algorithm

On 29/02/24 - 12:21, Pekka Paalanen wrote:
> On Tue, 27 Feb 2024 16:02:09 +0100
> Louis Chauvet <[email protected]> wrote:
>
> > [...]
> >
> > > > -static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> > > > - struct line_buffer *stage_buffer,
> > > > - struct line_buffer *output_buffer)
> > > > +static void pre_mul_alpha_blend(
> > > > + struct line_buffer *stage_buffer,
> > > > + struct line_buffer *output_buffer,
> > > > + int x_start,
> > > > + int pixel_count)
> > > > {
> > > > - int x_dst = frame_info->dst.x1;
> > > > - struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
> > > > - struct pixel_argb_u16 *in = stage_buffer->pixels;
> > > > - int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > > > - stage_buffer->n_pixels);
> > > > -
> > > > - for (int x = 0; x < x_limit; x++) {
> > > > - out[x].a = (u16)0xffff;
> > > > - out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> > > > - out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> > > > - out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
> > > > + struct pixel_argb_u16 *out = &output_buffer->pixels[x_start];
> > > > + struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start];
> > >
> > > Input buffers and pointers should be const.
> >
> > They will be const in v4.
> >
> > > > +
> > > > + for (int i = 0; i < pixel_count; i++) {
> > > > + out[i].a = (u16)0xffff;
> > > > + out[i].r = pre_mul_blend_channel(in[i].r, out[i].r, in[i].a);
> > > > + out[i].g = pre_mul_blend_channel(in[i].g, out[i].g, in[i].a);
> > > > + out[i].b = pre_mul_blend_channel(in[i].b, out[i].b, in[i].a);
> > > > }
> > > > }
> > >
> > > Somehow the hunk above does not feel like it is part of "re-introduce
> > > line-per-line composition algorithm". This function was already running
> > > line-by-line. Would it be easy enough to collect this and directly
> > > related changes into a separate patch?
> >
> > It is not directly related to the reintroduction of the line-by-line
> > algorithm, but as part of the simplification and maintainability effort, I
> > changed the function a bit to avoid having multiple places computing the
> > x_start/pixel_count values. I don't see an interest in extracting it, it
> > will be just a translation of the few lines into the calling place.
>
> It does make review more difficult, because it makes the patch bigger
> and is not explained in the commit message. It is a surprise to a
> reviewer, who then needs to think what this means and does it belong
> here.
>
> If you explain it in the commit message and note it in the commit
> summary line, I think it would become fairly obvious that this patch is
> doing two things rather than one.
>
> Therefore, *if* it is easy to extract as a separate patch, then it
> would be nice to do so. However, if doing so would require you to write
> a bunch of temporary code that the next patch would just rewrite again,
> then doing so would be counter-productive.
>
> Patch split is about finding a good trade-off to make things easy for
> reviewers:
>
> - Smaller patches are better as long as they are self-standing and
> understandable in isolation, and of course do not regress anything.
>
> - Rewriting the same thing multiple times in the same series is extra
> work for a reviewer and therefore best avoided.
>
> - The simpler the semantic change, the bigger a patch can be and still
> be easy to review.
>
> And all the patch writing rules specific to the kernel project that I
> don't know about.

I will extract it into "drm/vkms: Avoid computing blending limits inside the
blend function". It's not very relevant by itself, but it makes the main
patch easier to read.

> > [...]
> >
> > > > +/**
> > > > + * direction_for_rotation() - Helper to get the correct reading direction for a specific rotation
> > > > + *
> > > > + * @rotation: rotation to analyze
> > >
> > > This is KMS plane rotation property, right?
> > >
> > > So the KMS plane has been rotated by this, and what we want to find is
> > > the read direction on the attached FB so that reading returns pixels in
> > > the CRTC line/scanout order, right?
> > >
> > > Maybe extend the doc to explain that.
> >
> > Is it better?
> >
> > * direction_for_rotation() - Get the correct reading direction for a given rotation
> > *
> > * This function will use the @rotation parameter to compute the correct reading direction to read
> > * a line from the source buffer.
> > * For example, if the buffer is reflected on X axis, the pixel must be read from right to left.
> > * @rotation: Rotation to analyze. It corresponds to the field @frame_info.rotation.
>
> I think it is important to define what determines the correct result.
> In this case, we want the reading to produce pixels in the CRTC scanout
> line order, I believe. If you don't say "CRTC", the reader does not
> know what "the correct reading direction" should match to.

Is this a better explanation?

* This function will use the @rotation setting of a source plane to compute the reading
* direction in this plane which corresponds to a left to right writing in the CRTC.
* For example, if the buffer is reflected on X axis, the pixels must be read from right to left
* to be written from left to right on the CRTC.

> > > > + */
> > > > +enum pixel_read_direction direction_for_rotation(unsigned int rotation)
> > > > +{
> > > > + if (rotation & DRM_MODE_ROTATE_0) {
> > > > + if (rotation & DRM_MODE_REFLECT_X)
> > > > + return READ_LEFT;
> > > > + else
> > > > + return READ_RIGHT;
> > > > + } else if (rotation & DRM_MODE_ROTATE_90) {
> > > > + if (rotation & DRM_MODE_REFLECT_Y)
> > > > + return READ_UP;
> > > > + else
> > > > + return READ_DOWN;
> > > > + } else if (rotation & DRM_MODE_ROTATE_180) {
> > > > + if (rotation & DRM_MODE_REFLECT_X)
> > > > + return READ_RIGHT;
> > > > + else
> > > > + return READ_LEFT;
> > > > + } else if (rotation & DRM_MODE_ROTATE_270) {
> > > > + if (rotation & DRM_MODE_REFLECT_Y)
> > > > + return READ_DOWN;
> > > > + else
> > > > + return READ_UP;
> > > > + }
> > > > + return READ_RIGHT;
> > > > +}
> > > > +
> > > > /**
> > > > * blend - blend the pixels from all planes and compute crc
> > > > * @wb: The writeback frame buffer metadata
> > > > @@ -183,11 +187,11 @@ static void blend(struct vkms_writeback_job *wb,
> > > > {
> > > > struct vkms_plane_state **plane = crtc_state->active_planes;
> > > > u32 n_active_planes = crtc_state->num_active_planes;
> > > > - int y_pos;
> > > >
> > > > const struct pixel_argb_u16 background_color = { .a = 0xffff };
> > > >
> > > > size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> > > > + size_t crtc_x_limit = crtc_state->base.crtc->mode.hdisplay;
> > >
> > > Wonder why these were size_t, causing needs to cast below...
> >
> > For crtc_x_limit I just copied the crtc_y_limit. I will change both to u16
> > (the type of h/vdisplay).
>
> Don't go unsigned, that can cause unexpected results when mixed in
> computations with signed variables.

I will replace them with int.

> Oh, the cast was probably not about size but signedness. Indeed, size_t
> is unsigned.
>
> I don't see a reason to use a 16-bit size either, it just exposes the
> computations to under/overflows that would then be needed to check for.
> s32 should be as fast as any, and perhaps enough bits to never
> under/overflow in these computations, but please verify that.

I just suggested u16 because it's the type of vdisplay/hdisplay. It was
not for performance reasons.
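
To illustrate the signedness concern raised above, here is a small
standalone sketch (the sketch_* helper names are hypothetical). A plane can
start left of the CRTC, so its x1 coordinate can be negative; once it is
compared against an unsigned limit, the negative value is converted to a
huge positive one and the comparison gives the wrong answer.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Limit stored as size_t. The cast below is written explicitly, but it is
 * the same conversion the compiler applies implicitly when an int is
 * compared with a size_t: -10 becomes a very large positive value.
 */
static bool sketch_starts_before_limit_unsigned(int dst_x1, size_t crtc_x_limit)
{
	return (size_t)dst_x1 < crtc_x_limit;
}

/* Limit stored as int: the comparison behaves as expected for negative x1. */
static bool sketch_starts_before_limit_signed(int dst_x1, int crtc_x_limit)
{
	return dst_x1 < crtc_x_limit;
}
```

For dst_x1 == -10 and a limit of 1920, the unsigned variant reports that the
plane does not start before the limit, while the signed variant reports that
it does.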

> > > >
> > > > /*
> > > > * The planes are composed line-by-line. It is a necessary complexity to avoid poor
> > > > @@ -198,22 +202,133 @@ static void blend(struct vkms_writeback_job *wb,
> > > >
> > > > /* The active planes are composed associatively in z-order. */
> > > > for (size_t i = 0; i < n_active_planes; i++) {
> > > > - y_pos = get_y_pos(plane[i]->frame_info, y);
> > > > + struct vkms_plane_state *current_plane = plane[i];
> > > >
> > > > - if (!check_limit(plane[i]->frame_info, y_pos))
> > > > + /* Avoid rendering useless lines */
> > > > + if (y < current_plane->frame_info->dst.y1 ||
> > > > + y >= current_plane->frame_info->dst.y2) {
> > > > continue;
> > > > -
> > > > - vkms_compose_row(stage_buffer, plane[i], y_pos);
> > > > - pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
> > > > - output_buffer);
> > > > + }
> > > > +
> > > > + /*
> > > > + * src_px is the line to copy. The initial coordinates are inside the
> > > > + * destination framebuffer, and then drm_rect_* helpers are used to
> > > > + * compute the correct position into the source framebuffer.
> > > > + */
> > > > + struct drm_rect src_px = DRM_RECT_INIT(
> > > > + current_plane->frame_info->dst.x1, y,
> > > > + drm_rect_width(&current_plane->frame_info->dst), 1);
> > > > + struct drm_rect tmp_src;
> > > > +
> > > > + drm_rect_fp_to_int(&tmp_src, &current_plane->frame_info->src);
> > > > +
> > > > + /*
> > > > + * [1]: Clamping src_px to the crtc_x_limit to avoid writing outside of the
> > > > + * destination buffer
> > > > + */
> > > > + src_px.x2 = min_t(int, src_px.x2, (int)crtc_x_limit);
> > >
> > > Up to and including this point, it would be better if src_px was called
> > > dst_px, because only the below computation converts it into actual
> > > src_px.
> >
> > I agree, it will be changed for the v4. I will also change the name to
> > `dst_line` and `src_line`.
>
> Alright.
>
> ...
>
>
> > > > }
> > > >
> > > > -static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> > > > +/**
> > > > + * get_step_1x1() - Common helper to compute the correct step value between each pixel to read in a
> > > > + * certain direction.
> > > > + * This must be used only with format where blockh == blockw == 1.
> > > > + * In the case when direction is not a valid pixel_read_direction, the returned step is 0, so you
> > > > + * must not rely on this result to create a loop variant.
> > > > + *
> > > > + * @fb Framebuffer to iter on
> > > > + * @direction Direction of the reading
> > > > + */
> > > > +static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction direction,
> > > > + int plane_index)
> > > > {
> > > > - int x_src = frame_info->src.x1 >> 16;
> > > > - int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
> > > > -
> > > > - return packed_pixels_addr(frame_info, x_src, y_src);
> > > > + switch (direction) {
> > > > + default:
> > > > + DRM_ERROR("Invalid direction for pixel reading: %d\n", direction);
> > > > + return 0;
> > >
> > > What I'd do here is move the default: section outside of the switch
> > > completely. Then the compiler can warn if any enum value is not handled
> > > here. Since every case in the switch is a return statement, falling out
> > > of the switch block is the default case.
> >
> > Oh, I did not know that gcc can warn when using enums, I will definitely
> > do it for the v4.
> >
> > > Maybe the enum variable containing an illegal value could be handled
> > > more harshly so that callers could rely on this function always
> > > returning a good value?
> > >
> > > Just like passing in fb=NULL is handled by the kernel as an OOPS.
> >
> > I don't think it's a good idea to OOPS inside a driver.
>
> Everyone already do that. Most functions that do not expect to be called
> with NULL never check the arguments for NULL. They just OOPS on
> dereference if someone passes in NULL. And for a good reason: adding
> all those checks is both code churn and it casts doubt: "maybe it is
> legal and expected to call this function with NULL sometimes, what good
> does that do?".

I agree that adding something like

if (!direction_is_valid) pr_err("Invalid direction")

is useless, but as I already have the switch, it costs nothing to warn if
something has gone wrong. I will just replace this simple DRM_ERROR with a
WARN_ONCE to be more verbose about "it is a bug".

> > An error here is
> > maybe dangerous, but is not fatal to the kernel. Maybe you know how to do
> > a "local" OOPS to break only this driver and not the whole kernel?
>
> I don't know what the best practices are in the kernel.
>
> > For the v4 I will keep a DRM_ERROR and return 0.
>
> Does that require the caller to check for 0? Could the 0 cause
> something else to end up in an endless loop? If it does return 0, how
> should a caller handle this case that "cannot" ever happen? Why have
> code for something that cannot happen?

I have to return something, otherwise the compiler will complain about it.

To avoid surprising future developers, I added this information in the
comment. This way the user doesn't have to read the code to understand how
much they can rely on this value.

If the caller can trust the direction, they don't have to worry about this.
If they can't trust the direction, they know that the returned value can be
zero, and thus can't be used as a loop variant.

The zero is also nice because it does not interfere with the normal
behavior of this function. If the returned value is not zero, it's the
correct step to use from one pixel to another.
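
A sketch of the shape discussed above, with the default handling moved
after the switch. It is a userspace mock with hypothetical names
(sketch_read_direction, sketch_get_step_1x1) and it takes the bytes per
pixel and the pitch directly instead of a framebuffer. Because the switch
has no default label, GCC's -Wswitch (enabled by -Wall) can report an enum
value that is not handled, and an invalid value still falls through to the
documented 0.

```c
enum sketch_read_direction {
	SKETCH_READ_RIGHT,
	SKETCH_READ_LEFT,
	SKETCH_READ_DOWN,
	SKETCH_READ_UP,
};

/*
 * Byte step between two consecutive pixels read in @direction, for formats
 * where one block is exactly one pixel. Returns 0 for an invalid direction,
 * so the result must not be trusted to make a loop progress unless the
 * caller knows the direction is valid.
 */
static int sketch_get_step_1x1(int bytes_per_pixel, int pitch,
			       enum sketch_read_direction direction)
{
	switch (direction) {
	case SKETCH_READ_RIGHT:
		return bytes_per_pixel;
	case SKETCH_READ_LEFT:
		return -bytes_per_pixel;
	case SKETCH_READ_DOWN:
		return pitch;
	case SKETCH_READ_UP:
		return -pitch;
	}

	/*
	 * Only reached when @direction holds a value outside the enum. A
	 * kernel version could emit a WARN_ONCE() here before returning.
	 */
	return 0;
}
```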

> Of course it's a trade-off between correctness and limping along
> injured, and the kernel tends to strongly lean toward the latter for the
> obvious reasons.
>
> > > > + case READ_RIGHT:
> > > > + return fb->format->char_per_block[plane_index];
> > > > + case READ_LEFT:
> > > > + return -fb->format->char_per_block[plane_index];
> > > > + case READ_DOWN:
> > > > + return (int)fb->pitches[plane_index];
> > > > + case READ_UP:
> > > > + return -(int)fb->pitches[plane_index];
> > > > + }
> > > > }
> > > >
> > > > -static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
> > > > -{
> > > > - if (frame_info->rotation & (DRM_MODE_REFLECT_X | DRM_MODE_ROTATE_270))
> > > > - return limit - x - 1;
> > > > - return x;
> > > > -}
> > > >
> > > > /*
> > > > - * The following functions take pixel data from the buffer and convert them to the format
> > > > + * The following functions take pixel data (a, r, g, b, pixel, ...), convert them to the format
> > > > * ARGB16161616 in out_pixel.
> > > > *
> > > > - * They are used in the `vkms_compose_row` function to handle multiple formats.
> > > > + * They are used in the `read_line`s functions to avoid duplicate work for some pixel formats.
> > > > */
> > > >
> > > > -static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> > > > +static void ARGB8888_to_argb_u16(struct pixel_argb_u16 *out_pixel, int a, int r, int g, int b)
> > >
> > > The function name ARGB8888_to_argb_u16() is confusing. It's not taking
> > > in ARGB8888 pixels but separate a,r,g,b ints. The only assumption it
> > > needs from the pixel format is the 8888 part.
> >
> > I don't really know how to name it. What I like with ARGB8888 is that it's
> > clear that the values are 8 bits and in argb format.
>
> I could even propose
>
> static struct pixel_argb_u16
> argb_u16_from_u8888(int a, int r, int g, int b)
>
> perhaps. Yes, returning a struct by value. I think it would fit, and
> these are supposed to get fully inlined anyway, too.
>
> c.f argb_u16_from_u2101010().

I can't find this function, but I get and like the idea; I will change the
callback to this in the v4.
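
A possible shape for the helper proposed above, as a standalone sketch. The
signature follows Pekka's suggestion; the body is an assumption: each 8-bit
channel is widened to 16 bits by multiplying by 257, which maps 0x00 to
0x0000 and 0xff to 0xffff. The struct is redefined here only to keep the
example self-contained.

```c
#include <stdint.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/*
 * Build an ARGB16161616 pixel from four 8-bit channel values and return it
 * by value. 257 == 0xffff / 0xff, so the full 8-bit range is stretched
 * exactly onto the full 16-bit range. Inputs are assumed to be in 0..255.
 */
static inline struct pixel_argb_u16 argb_u16_from_u8888(int a, int r, int g, int b)
{
	struct pixel_argb_u16 out;

	out.a = (uint16_t)(a * 257);
	out.r = (uint16_t)(r * 257);
	out.g = (uint16_t)(g * 257);
	out.b = (uint16_t)(b * 257);
	return out;
}
```

A read_line implementation would then write
`*out_pixel = argb_u16_from_u8888(px[3], px[2], px[1], px[0]);` for
ARGB8888, and the same helper could serve any format whose channels are
8 bits wide.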

> Not a big deal though, I think I'm getting a little bit too involved to
> see what would be the most intuitively understandable naming scheme for
> someone not familiar with the code.
>
> > Do you think that `argb_u8_to_argb_u16`, with a new structure
> > pixel_argb_u8 will be better? (like PATCH 6/9 with pixel_yuv_u8).
> >
> > If so, I will introduce the argb_u8 structure in an other commit.
>
> How would you handle 10-bpc formats? Is there a need for
> proliferation of bit-depth-specific struct types?

No, I don't think it's good to multiply things. I will patch Arthur's
patches to avoid the pixel_yuv_u8 structure.

> > [...]
> >
> > > > + * The following functions are read_line function for each pixel format supported by VKMS.
> > > > *
> > > > - * This function composes a single row of a plane. It gets the source pixels
> > > > - * through the y coordinate (see get_packed_src_addr()) and goes linearly
> > > > - * through the source pixel, reading the pixels and converting it to
> > > > - * ARGB16161616 (see the pixel_read() callback). For rotate-90 and rotate-270,
> > > > - * the source pixels are not traversed linearly. The source pixels are queried
> > > > - * on each iteration in order to traverse the pixels vertically.
> > > > + * They read a line starting at the point @x_start,@y_start following the @direction. The result
> > > > + * is stored in @out_pixel and in the format ARGB16161616.
> > > > + *
> > > > + * Those function are very similar, but it is required for performance reason. In the past, some
> > > > + * experiment were done, and with a generic loop the performance are very reduced [1].
> > > > + *
> > > > + * [1]: https://lore.kernel.org/dri-devel/[email protected]/
> > > > */
> > > > -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y)
> > > > +
> > > > +static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> > > > + enum pixel_read_direction direction, int count,
> > > > + struct pixel_argb_u16 out_pixel[])
> > > > +{
> > > > + u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> > > > +
> > > > + int step = get_step_1x1(frame_info->fb, direction, 0);
> > > > +
> > > > + while (count) {
> > > > + u8 *px = (u8 *)src_pixels;
> > > > +
> > > > + ARGB8888_to_argb_u16(out_pixel, px[3], px[2], px[1], px[0]);
> > > > + out_pixel += 1;
> > > > + src_pixels += step;
> > > > + count--;
> > >
> > > btw. you could eliminate decrementing 'count' if you computed end
> > > address and used while (out_pixel < end).
> >
> > Yes, you are right, but after thinking about it, neither out_pixel < end
> > nor while (count) conveys "this loop will copy `count` pixels". I
> > think a for-loop here is more understandable. There is no ambiguity in the
> > number of pixels written and it is less error-prone. I will replace
> > while (count)
> > by
> > for(int i = 0; i < count; i++)
>
> I agree that a for-loop is the most obvious way of saying it, but I
> also think while (out_pixel < end) is very close too, and so is while (count).
> None of those would make me think twice.
>
> However, I'm thinking of performance here. After all, this is the
> hottest code path there is in VKMS. Is the compiler smart enough to
> eliminate count-- or i to reduce the number of CPU cycles?

You are probably right, I will change it to out_pixel < end.
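
For reference, the end-pointer form of the loop could look like the sketch
below. It is a userspace mock, not the v4 code: sketch_argb8888_read_line is
a hypothetical name, the source pointer and byte step are passed in directly
instead of being derived from frame_info, and the 8-bit to 16-bit widening
by 257 is an assumption.

```c
#include <stdint.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/*
 * Read @count little-endian ARGB8888 pixels starting at @src_pixels, moving
 * by @step bytes between pixels (negative to read right to left), and store
 * them as ARGB16161616 in @out_pixel.
 */
static void sketch_argb8888_read_line(const uint8_t *src_pixels, int step, int count,
				      struct pixel_argb_u16 out_pixel[])
{
	/* Computed once; the loop then has no separate counter to update. */
	struct pixel_argb_u16 *end = out_pixel + count;

	while (out_pixel < end) {
		out_pixel->a = (uint16_t)(src_pixels[3] * 257);
		out_pixel->r = (uint16_t)(src_pixels[2] * 257);
		out_pixel->g = (uint16_t)(src_pixels[1] * 257);
		out_pixel->b = (uint16_t)(src_pixels[0] * 257);
		out_pixel += 1;
		src_pixels += step;
	}
}
```

Whether this is measurably faster than a counted loop depends on the
compiler and would need to be benchmarked.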

> Thanks,
> pq

Kind regards,
Louis Chauvet

--
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2024-03-04 15:43:57

by Arthur Grillo

Subject: Re: [PATCH v2 6/9] drm/vkms: Add YUV support



On 04/03/24 12:28, Louis Chauvet wrote:
> On 29/02/24 - 14:12, Pekka Paalanen wrote:
>> On Wed, 28 Feb 2024 22:52:09 -0300
>> Arthur Grillo <[email protected]> wrote:
>>
>>> On 27/02/24 17:01, Arthur Grillo wrote:
>>>>
>>>>
>>>> On 27/02/24 12:02, Louis Chauvet wrote:
>>>>> Hi Pekka,
>>>>>
>>>>> For all the comment related to the conversion part, maybe Arthur have an
>>>>> opinion on it, I took his patch as a "black box" (I did not want to
>>>>> break (and debug) it).
>>>>>
>>>>> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :
>>>>>> On Fri, 23 Feb 2024 12:37:26 +0100
>>>>>> Louis Chauvet <[email protected]> wrote:
>>>>>>
>>>>>>> From: Arthur Grillo <[email protected]>
>>>>>>>
>>>>>>> Add support to the YUV formats bellow:
>>>>>>>
>>>>>>> - NV12
>>>>>>> - NV16
>>>>>>> - NV24
>>>>>>> - NV21
>>>>>>> - NV61
>>>>>>> - NV42
>>>>>>> - YUV420
>>>>>>> - YUV422
>>>>>>> - YUV444
>>>>>>> - YVU420
>>>>>>> - YVU422
>>>>>>> - YVU444
>>>>>>>
>>>>>>> The conversion matrices of each encoding and range were obtained by
>>>>>>> rounding the values of the original conversion matrices multiplied by
>>>>>>> 2^8. This is done to avoid the use of fixed point operations.
>>>>>>>
>>>>>>> Signed-off-by: Arthur Grillo <[email protected]>
>>>>>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
>>>>>>> callbacks for yuv formats]
>>>>>>> Signed-off-by: Louis Chauvet <[email protected]>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/vkms/vkms_composer.c | 2 +-
>>>>>>> drivers/gpu/drm/vkms/vkms_drv.h | 6 +-
>>>>>>> drivers/gpu/drm/vkms/vkms_formats.c | 289 +++++++++++++++++++++++++++++++++--
>>>>>>> drivers/gpu/drm/vkms/vkms_formats.h | 4 +
>>>>>>> drivers/gpu/drm/vkms/vkms_plane.c | 14 +-
>>>>>>> 5 files changed, 295 insertions(+), 20 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>>>>>> index e555bf9c1aee..54fc5161d565 100644
>>>>>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>>>>>> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
>>>>>>> * buffer [1]
>>>>>>> */
>>>>>>> current_plane->pixel_read_line(
>>>>>>> - current_plane->frame_info,
>>>>>>> + current_plane,
>>>>>>> x_start,
>>>>>>> y_start,
>>>>>>> direction,
>>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>>> index ccc5be009f15..a4f6456cb971 100644
>>>>>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>>> @@ -75,6 +75,8 @@ enum pixel_read_direction {
>>>>>>> READ_RIGHT
>>>>>>> };
>>>>>>>
>>>>>>> +struct vkms_plane_state;
>>>>>>> +
>>>>>>> /**
>>>>>>> <<<<<<< HEAD
>>>>>>> * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
>>>>>>> @@ -87,8 +89,8 @@ enum pixel_read_direction {
>>>>>>> * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
>>>>>>> * x_end.
>>>>>>> */
>>>>>>> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
>>>>>>> - pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>>>>>>> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
>>>>>>> + enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>>>>>>
>>>>>> This is the second or third time in this one series changing this type.
>>>>>> Could you not do the change once, in its own patch if possible?
>>>>>
>>>>> Sorry, this is not a change here, but a wrong formatting (missed when
>>>>> rebasing).
>>>>>
>>>>> Do you think that it make sense to re-order my patches and put this
>>>>> typedef at the end? This way it is never updated.
>>
>> I'm not sure, I haven't checked how it would change your patches. The
>> intermediate changes might get a lot uglier?
>>
>> Just try to fold changes so that you don't need to change something
>> twice over the series unless there is a good reason to. "How hard would
>> it be to review this?" is my measure stick.
>
> It will not be uglier, it was just the order I did things. I first cleaned
> the code and created this typedef (PATCHv2 4/9), and then rewrote the
> composition, for which I had to change the typedef.
>
> I also wanted to make my series easy to understand and make clear what is
> my "main contribution" and what are "quality stuff, not related to my
> contribution":
> - Prepare things (document existing state, format, typedef)
> - Big change (and update related doc, typedef)
> - Rebase some other stuff on my big change (YUV)
>
> So yes, some parts are changed twice in preparation step and the "big
> change".
>
>>
>>>>>
>>>>>>>
>>>>>>> /**
>>>>>>> * vkms_plane_state - Driver specific plane state
>>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>>> index 46daea6d3ee9..515c80866a58 100644
>>>>>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>>> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
>>>>>>> */
>>>>>>> return fb->offsets[plane_index] +
>>>>>>> (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
>>>>>>> - (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
>>>>>>> + (x / drm_format_info_block_height(format, plane_index)) *
>>>>>>> + format->char_per_block[plane_index];
>>>>>>
>>>>>> Shouldn't this be in the patch that added this code in the first place?
>>>>>
>>>>> Same as above, a wrong formatting, I will remove this change and keep
>>>>> everything on one line (even if it's more than 100 chars, it is easier to
>>>>> read).
>>
>> Personally I agree that readability is more important than strict line
>> length limits. I'm not sure how the kernel rolls there.
>>
>>>>>
>>>>>>> }
>>>>>>>
>>>>>>> /**
>>>>>>> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> +/**
>>>>>>> + * get_subsampling() - Get the subsampling value on a specific direction
>>>>>>
>>>>>> subsampling divisor
>>>>>
>>>>> Thanks for this precision.
>>>>>
>>>>>>> + */
>>>>>>> +static int get_subsampling(const struct drm_format_info *format,
>>>>>>> + enum pixel_read_direction direction)
>>>>>>> +{
>>>>>>> + if (direction == READ_LEFT || direction == READ_RIGHT)
>>>>>>> + return format->hsub;
>>>>>>> + else if (direction == READ_DOWN || direction == READ_UP)
>>>>>>> + return format->vsub;
>>>>>>> + return 1;
>>>>>>
>>>>>> In this and the below function, personally I'd prefer switch-case, with
>>>>>> a cannot-happen-scream after the switch, so the compiler can warn about
>>>>>> unhandled enum values.
>>>>>
>>>>> As for the previous patch, I did not know about this compiler feature,
>>>>> thanks!
>>>>>
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
>>>>>>> + */
>>>>>>> +static int get_subsampling_offset(const struct drm_format_info *format,
>>>>>>> + enum pixel_read_direction direction, int x_start, int y_start)
>>>>>>
>>>>>> 'start' values as "increments" for a pixel counter? Is something
>>>>>> misnamed here?
>>>>>>
>>>>>> Is it an increment or an offset?
>>>>>
>>>>> I don't really know how to name the function. I'm open to suggestions
>>>>> x_start and y_start are really the coordinate of the starting reading point.
>>
>> I looks like it's an offset, so "offset" and "start" are good words.
>> Then the only misleading piece is the doc:
>>
>> "Get the subsampling offset to use when incrementing the pixel counter"
>>
>> This sounds like the offset is used when incrementing a counter, that
>> is, counter is increment by offset each time. That's my problem with
>> this.
>>
>> Fix just the doc, and it's good, I think.
>>
>>>>>
>>>>> To explain what it does:
>>>>>
>>>>> When using subsampling, you have to read the next pixel of planes[1..4]
>>>>> not at the same "speed" as plane[0]. But I can't only rely on
>>>>> "read_pixel_count % subsampling == 0", because it means that the pixel
>>>>> incrementation on planes[1..4] may not be aligned with the buffer (if
>>>>> hsub=2 and the start pixel is 1, I need to increment planes[1..4] only
>>>>> for x=2,4,6... not 1,3,5...).
>>>>>
>>>>> A way to ensure this is to add an "offset" to count, which ensure that the
>>>>> count % subsampling == 0 on the correct pixel.
>>
>> Yes, I think I did get that feeling from the code eventually somehow,
>> but it wouldn't hurt to explain it in the comment.
>>
>> "An offset for keeping the chroma siting consistent regardless of
>> x_start and y_start" maybe?
>
> It is better yes, thanks!
>
>>>>>
>>>>> I made an error, the switch case must be (as count is always counting up,
>>>>> for "inverted" reading direction a negative number ensure that
>>>>> %subsampling == 0 on the correct pixel):
>>>>>
>>>>> switch (direction) {
>>>>> case READ_UP:
>>>>> return -y_start;
>>>>> case READ_DOWN:
>>>>> return y_start;
>>>>> case READ_LEFT:
>>>>> return -x_start;
>>>>> case READ_RIGHT:
>>>>> return x_start;
>>>>> }
>>
>> Yes, the inverted reading directions are different indeed. I did not
>> think through if this works also for sub-sampling divisors > 2 which I
>> don't think are ever used.
>
> I chose those values because they should work with any sub-sampling
> divisor.
>
> hsub/vsub = 4 is used with DRM_FORMAT_YUV410/YVU410/YUV411/YVU411.
>
>>
>> Does IGT find this mistake? If not, maybe IGT should be extended.
>
> No, for three reasons:
> - The original version works fine for NV12/16/24 and YUV with *sub <= 2
> (x+n%2 == x-n%2). It only breaks for *sub > 2.
> - YUV410/... are not supported by VKMS
> - IGT does not test different colors for rotations/translations (at least
> for the tests I tried). I will see if it's possible to add things in
> kms_rotation_crc/kms_cursor_crc to test more colors format (at least
> one RGB and one YUV).
>
>>>>>
>>>>>>> +{
>>>>>>> + if (direction == READ_RIGHT || direction == READ_LEFT)
>>>>>>> + return x_start;
>>>>>>> + else if (direction == READ_DOWN || direction == READ_UP)
>>>>>>> + return y_start;
>>>>>>> + return 0;
>>>>>>> +}
>>>>>>> +
>>>>>
>>>>> [...]
>>>>>
>>>>>>> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
>>>>>>> + enum drm_color_encoding encoding, enum drm_color_range range)
>>>>>>> +{
>>>>>>> + static const s16 bt601_full[3][3] = {
>>>>>>> + { 256, 0, 359 },
>>>>>>> + { 256, -88, -183 },
>>>>>>> + { 256, 454, 0 },
>>>>>>> + };
>>>>>
>>>>> [...]
>>>>>
>>>>>>> +
>>>>>>> + u8 r = 0;
>>>>>>> + u8 g = 0;
>>>>>>> + u8 b = 0;
>>>>>>> + bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
>>>>>>> + unsigned int y_offset = full ? 0 : 16;
>>>>>>> +
>>>>>>> + switch (encoding) {
>>>>>>> + case DRM_COLOR_YCBCR_BT601:
>>>>>>> + ycbcr2rgb(full ? bt601_full : bt601,
>>>>>>
>>>>>> Doing all these conditional again pixel by pixel is probably
>>>>>> inefficient. Just like with the line reading functions, you could pick
>>>>>> the matrix in advance.
>>>>>
>>>>> I don't think the performance impact is huge (it's only a pair of if), but
>>>>> yes, it's an easy optimization.
>>>>>
>>>>> I will create a conversion_matrix structure:
>>>>>
>>>>> struct conversion_matrix {
>>>>> s16 matrix[3][3];
>>>>> u16 y_offset;
>>>>> }
>>
>> When defining such a struct type, it would be good to document the
>> matrix layout (which one is row, which one is column), and what the s16
>> mean (fixed point?).
>
> Ack
>
>> Try to not mix signed and unsigned types, too. The C implicit type
>> promotion rules can be surprising. Just make everything signed while
>> computing, and convert to/from unsigned only for storage.
>
> Ack, I will change to signed type.
>
>>>>>
>>>>> I will create a `get_conversion_matrix_to_argb_u16` function to get this
>>>>> structure from a format+encoding+range.
>>>>>
>>>>> I will also add a field `conversion_matrix` in struct vkms_plane_state to
>>>>> get this matrix only once per plane setup.
>>
>> Alright. Let's see how that works.
>>
>>>>>
>>>>>
>>>>>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>>>>> + break;
>>>>>>> + case DRM_COLOR_YCBCR_BT709:
>>>>>>> + ycbcr2rgb(full ? rec709_full : rec709,
>>>>>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>>>>> + break;
>>>>>>> + case DRM_COLOR_YCBCR_BT2020:
>>>>>>> + ycbcr2rgb(full ? bt2020_full : bt2020,
>>>>>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>>>>> + break;
>>>>>>> + default:
>>>>>>> + pr_warn_once("Not supported color encoding\n");
>>>>>>> + break;
>>>>>>> + }
>>>>>>> +
>>>>>>> + argb_u16->r = r * 257;
>>>>>>> + argb_u16->g = g * 257;
>>>>>>> + argb_u16->b = b * 257;
>>>>>>
>>>>>> I wonder. Using 8-bit fixed point precision seems quite coarse for
>>>>>> 8-bit pixel formats, and it's going to be insufficient for higher bit
>>>>>> depths. Was supporting e.g. 10-bit YUV considered? There is even
>>>>>> deeper, too, like DRM_FORMAT_P016.
>>>>>
>>>>> It's a good point, as I explained above, I took the conversion part as a
>>>>> "black box" to avoid breaking (and debugging) stuff. I think it's easy to
>>>>> switch to s32 bits matrix with 16.16 bits (or anything with more than 16 bits in
>>>>> the float part).
>>>>>
>>>>> Maybe Arthur have an opinion on this?
>>>>
>>>> Yeah, I too don't see why not we could do that. The 8-bit precision was
>>>> sufficient for those formats, but as well noted by Pekka this could be a
>>>> problem for higher bit depths. I just need to make my terrible python
>>>> script spit those values XD.
>>>
>>> Finally, I got it working with 32-bit precision.
>>>
>>> I basically threw all my untrusted python code away, and started using
>>> the colour python framework suggested by Sebastian[1]. After knowing the
>>> right values (and staring at numbers for hours), I found that with a
>>> little bit of rounding, the conversion works.
>>>
>>> Also, while at it, I changed the name rec709 to bt709 to follow the
>>> pattern and added "_full" to the full ranges matrices.
>>>
>>> While using the library, I noticed that the red component is wrong on
>>> the color red in one test case.
>>>
>>> [1]: https://lore.kernel.org/all/20240115150600.GC160656@toolbox/
>>
>> That all sounds good. I wish the kernel code contained comments
>> explaining how exactly you computed those matrices with python/colour.
>> If the python snippets are not too long, including them verbatim as
>> code comments would be really nice for both reviewers and posterity.
>>
>> The same for the VKMS unit tests, too, how you got the expected result
>> values.
>
> I edited the YUV support to have those s64 values.
>
> @arthur, I will submit a v4 with this:
> - matrix selection in plane_atomic_update (so it's selected only once)
> - s64 numbers for matrix
> - avoiding multiple loop implementation by switching matrix columns

This looks good to me.

>
> Regarding the YUV part, I don't feel comfortable addressing Pekka's
> comments, would you mind doing it?

I'm already doing that. How do you want me to send those changes? Should I
reply to your series, like I did before?

Best Regards,
~Arthur Grillo

>
> Kind regards,
> Louis Chauvet
>
> [...]
>

2024-03-04 15:51:02

by Louis Chauvet

[permalink] [raw]
Subject: Re: [PATCH v2 6/9] drm/vkms: Add YUV support

[...]

> > @arthur, I will submit a v4 with this:
> > - matrix selection in plane_atomic_update (so it's selected only once)
> > - s64 numbers for matrix
> > - avoiding multiple loop implementation by switching matrix columns
>
> This looks good to me.
>
> >
> > Regarding the YUV part, I don't feel comfortable addressing Pekka's
> > comments, would you mind doing it?
>
> I'm already doing that. How do you want me to send those changes? Should I
> reply to your series, like I did before?

Yes, simply reply to my series, so I can rebase everything on the
line-by-line work.

Kind regards,
Louis Chauvet

> Best Regards,
> ~Arthur Grillo
>
> >
> > Kind regards,
> > Louis Chauvet
> >
> > [...]
> >

--
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2024-03-04 15:58:20

by Louis Chauvet

[permalink] [raw]
Subject: Re: [PATCH v2 6/9] drm/vkms: Add YUV support

Le 29/02/24 - 14:12, Pekka Paalanen a écrit :
> On Wed, 28 Feb 2024 22:52:09 -0300
> Arthur Grillo <[email protected]> wrote:
>
> > On 27/02/24 17:01, Arthur Grillo wrote:
> > >
> > >
> > > On 27/02/24 12:02, Louis Chauvet wrote:
> > >> Hi Pekka,
> > >>
> > >> For all the comment related to the conversion part, maybe Arthur have an
> > >> opinion on it, I took his patch as a "black box" (I did not want to
> > >> break (and debug) it).
> > >>
> > >> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :
> > >>> On Fri, 23 Feb 2024 12:37:26 +0100
> > >>> Louis Chauvet <[email protected]> wrote:
> > >>>
> > >>>> From: Arthur Grillo <[email protected]>
> > >>>>
> > >>>> Add support to the YUV formats bellow:
> > >>>>
> > >>>> - NV12
> > >>>> - NV16
> > >>>> - NV24
> > >>>> - NV21
> > >>>> - NV61
> > >>>> - NV42
> > >>>> - YUV420
> > >>>> - YUV422
> > >>>> - YUV444
> > >>>> - YVU420
> > >>>> - YVU422
> > >>>> - YVU444
> > >>>>
> > >>>> The conversion matrices of each encoding and range were obtained by
> > >>>> rounding the values of the original conversion matrices multiplied by
> > >>>> 2^8. This is done to avoid the use of fixed point operations.
> > >>>>
> > >>>> Signed-off-by: Arthur Grillo <[email protected]>
> > >>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
> > >>>> callbacks for yuv formats]
> > >>>> Signed-off-by: Louis Chauvet <[email protected]>
> > >>>> ---
> > >>>> drivers/gpu/drm/vkms/vkms_composer.c | 2 +-
> > >>>> drivers/gpu/drm/vkms/vkms_drv.h | 6 +-
> > >>>> drivers/gpu/drm/vkms/vkms_formats.c | 289 +++++++++++++++++++++++++++++++++--
> > >>>> drivers/gpu/drm/vkms/vkms_formats.h | 4 +
> > >>>> drivers/gpu/drm/vkms/vkms_plane.c | 14 +-
> > >>>> 5 files changed, 295 insertions(+), 20 deletions(-)
> > >>>>
> > >>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> > >>>> index e555bf9c1aee..54fc5161d565 100644
> > >>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > >>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> > >>>> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
> > >>>> * buffer [1]
> > >>>> */
> > >>>> current_plane->pixel_read_line(
> > >>>> - current_plane->frame_info,
> > >>>> + current_plane,
> > >>>> x_start,
> > >>>> y_start,
> > >>>> direction,
> > >>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> > >>>> index ccc5be009f15..a4f6456cb971 100644
> > >>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> > >>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> > >>>> @@ -75,6 +75,8 @@ enum pixel_read_direction {
> > >>>> READ_RIGHT
> > >>>> };
> > >>>>
> > >>>> +struct vkms_plane_state;
> > >>>> +
> > >>>> /**
> > >>>> <<<<<<< HEAD
> > >>>> * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
> > >>>> @@ -87,8 +89,8 @@ enum pixel_read_direction {
> > >>>> * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
> > >>>> * x_end.
> > >>>> */
> > >>>> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
> > >>>> - pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
> > >>>> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
> > >>>> + enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
> > >>>
> > >>> This is the second or third time in this one series changing this type.
> > >>> Could you not do the change once, in its own patch if possible?
> > >>
> > >> Sorry, this is not a change here, but a wrong formatting (missed when
> > >> rebasing).
> > >>
> > >> Do you think that it make sense to re-order my patches and put this
> > >> typedef at the end? This way it is never updated.
>
> I'm not sure, I haven't checked how it would change your patches. The
> intermediate changes might get a lot uglier?
>
> Just try to fold changes so that you don't need to change something
> twice over the series unless there is a good reason to. "How hard would
> it be to review this?" is my measure stick.

It will not be uglier, it was just the order I did things. I first cleaned
the code and created this typedef (PATCHv2 4/9), and then rewrote the
composition, for which I had to change the typedef.

I also wanted the series to be easy to understand, and to make clear what is
my "main contribution" and what is "quality stuff, not related to my
contribution":
- Prepare things (document existing state, format, typedef)
- Big change (and update related doc, typedef)
- Rebase some other stuff on my big change (YUV)

So yes, some parts are changed twice: once in the preparation step and once
in the "big change".

>
> > >>
> > >>>>
> > >>>> /**
> > >>>> * vkms_plane_state - Driver specific plane state
> > >>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > >>>> index 46daea6d3ee9..515c80866a58 100644
> > >>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > >>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > >>>> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
> > >>>> */
> > >>>> return fb->offsets[plane_index] +
> > >>>> (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
> > >>>> - (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
> > >>>> + (x / drm_format_info_block_height(format, plane_index)) *
> > >>>> + format->char_per_block[plane_index];
> > >>>
> > >>> Shouldn't this be in the patch that added this code in the first place?
> > >>
> > >> Same as above, a wrong formatting, I will remove this change and keep
> > >> everything on one line (even if it's more than 100 chars, it is easier to
> > >> read).
>
> Personally I agree that readability is more important than strict line
> length limits. I'm not sure how the kernel rolls there.
>
> > >>
> > >>>> }
> > >>>>
> > >>>> /**
> > >>>> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
> > >>>> }
> > >>>> }
> > >>>>
> > >>>> +/**
> > >>>> + * get_subsampling() - Get the subsampling value on a specific direction
> > >>>
> > >>> subsampling divisor
> > >>
> > >> Thanks for this precision.
> > >>
> > >>>> + */
> > >>>> +static int get_subsampling(const struct drm_format_info *format,
> > >>>> + enum pixel_read_direction direction)
> > >>>> +{
> > >>>> + if (direction == READ_LEFT || direction == READ_RIGHT)
> > >>>> + return format->hsub;
> > >>>> + else if (direction == READ_DOWN || direction == READ_UP)
> > >>>> + return format->vsub;
> > >>>> + return 1;
> > >>>
> > >>> In this and the below function, personally I'd prefer switch-case, with
> > >>> a cannot-happen-scream after the switch, so the compiler can warn about
> > >>> unhandled enum values.
> > >>
> > >> As for the previous patch, I did not know about this compiler feature,
> > >> thanks!
> > >>
> > >>>> +}
> > >>>> +
> > >>>> +/**
> > >>>> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
> > >>>> + */
> > >>>> +static int get_subsampling_offset(const struct drm_format_info *format,
> > >>>> + enum pixel_read_direction direction, int x_start, int y_start)
> > >>>
> > >>> 'start' values as "increments" for a pixel counter? Is something
> > >>> misnamed here?
> > >>>
> > >>> Is it an increment or an offset?
> > >>
> > >> I don't really know how to name the function. I'm open to suggestions
> > >> x_start and y_start are really the coordinate of the starting reading point.
>
> I looks like it's an offset, so "offset" and "start" are good words.
> Then the only misleading piece is the doc:
>
> "Get the subsampling offset to use when incrementing the pixel counter"
>
> This sounds like the offset is used when incrementing a counter, that
> is, counter is increment by offset each time. That's my problem with
> this.
>
> Fix just the doc, and it's good, I think.
>
> > >>
> > >> To explain what it does:
> > >>
> > >> When using subsampling, you have to read the next pixel of planes[1..4]
> > >> not at the same "speed" as plane[0]. But I can't only rely on
> > >> "read_pixel_count % subsampling == 0", because it means that the pixel
> > >> incrementation on planes[1..4] may not be aligned with the buffer (if
> > >> hsub=2 and the start pixel is 1, I need to increment planes[1..4] only
> > >> for x=2,4,6... not 1,3,5...).
> > >>
> > >> A way to ensure this is to add an "offset" to count, which ensure that the
> > >> count % subsampling == 0 on the correct pixel.
>
> Yes, I think I did get that feeling from the code eventually somehow,
> but it wouldn't hurt to explain it in the comment.
>
> "An offset for keeping the chroma siting consistent regardless of
> x_start and y_start" maybe?

It is better yes, thanks!

> > >>
> > >> I made an error, the switch case must be (as count is always counting up,
> > >> for "inverted" reading direction a negative number ensure that
> > >> %subsampling == 0 on the correct pixel):
> > >>
> > >> switch (direction) {
> > >> case READ_UP:
> > >> return -y_start;
> > >> case READ_DOWN:
> > >> return y_start;
> > >> case READ_LEFT:
> > >> return -x_start;
> > >> case READ_RIGHT:
> > >> return x_start;
> > >> }
>
> Yes, the inverted reading directions are different indeed. I did not
> think through if this works also for sub-sampling divisors > 2 which I
> don't think are ever used.

I chose those values because they should work with any sub-sampling
divisor.

hsub/vsub = 4 is used with DRM_FORMAT_YUV410/YVU410/YUV411/YVU411.

>
> Does IGT find this mistake? If not, maybe IGT should be extended.

No, for three reasons:
- The original version works fine for NV12/16/24 and YUV with *sub <= 2
((x + n) % 2 == (x - n) % 2). It only breaks for *sub > 2.
- YUV410/... are not supported by VKMS
- IGT does not test different colors for rotations/translations (at least
for the tests I tried). I will see if it's possible to add things in
kms_rotation_crc/kms_cursor_crc to test more color formats (at least
one RGB and one YUV).
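
To illustrate the first point, here is a small user-space sketch of why the sign of the offset is invisible with a divisor of 2 but matters with a divisor of 4. The helper names are hypothetical and the boundary test is a simplification of what the read_line callbacks do, so treat it as an assumption rather than the driver logic.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true when the pixel counter `i`, shifted by
 * `offset`, lands on a multiple of the subsampling divisor.
 */
static bool on_chroma_boundary(int i, int offset, int divisor)
{
	assert(divisor > 0);
	/* In C, a % n == 0 holds exactly when n divides a, even for a < 0. */
	return (i + offset) % divisor == 0;
}

/*
 * Count the positions in [0, count) where using +start and -start as the
 * offset disagree about where the chroma boundary is.
 */
static int count_sign_mismatches(int divisor, int start, int count)
{
	int mismatches = 0;

	for (int i = 0; i < count; i++) {
		if (on_chroma_boundary(i, start, divisor) !=
		    on_chroma_boundary(i, -start, divisor))
			mismatches++;
	}
	return mismatches;
}
```

With a divisor of 2, count_sign_mismatches() returns 0 for any start, which is why tests limited to *sub <= 2 cannot catch a wrong sign. With a divisor of 4 and start = 1, the two offsets place the boundary on different pixels.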

> > >>
> > >>>> +{
> > >>>> + if (direction == READ_RIGHT || direction == READ_LEFT)
> > >>>> + return x_start;
> > >>>> + else if (direction == READ_DOWN || direction == READ_UP)
> > >>>> + return y_start;
> > >>>> + return 0;
> > >>>> +}
> > >>>> +
> > >>
> > >> [...]
> > >>
> > >>>> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
> > >>>> + enum drm_color_encoding encoding, enum drm_color_range range)
> > >>>> +{
> > >>>> + static const s16 bt601_full[3][3] = {
> > >>>> + { 256, 0, 359 },
> > >>>> + { 256, -88, -183 },
> > >>>> + { 256, 454, 0 },
> > >>>> + };
> > >>
> > >> [...]
> > >>
> > >>>> +
> > >>>> + u8 r = 0;
> > >>>> + u8 g = 0;
> > >>>> + u8 b = 0;
> > >>>> + bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
> > >>>> + unsigned int y_offset = full ? 0 : 16;
> > >>>> +
> > >>>> + switch (encoding) {
> > >>>> + case DRM_COLOR_YCBCR_BT601:
> > >>>> + ycbcr2rgb(full ? bt601_full : bt601,
> > >>>
> > >>> Doing all these conditional again pixel by pixel is probably
> > >>> inefficient. Just like with the line reading functions, you could pick
> > >>> the matrix in advance.
> > >>
> > >> I don't think the performance impact is huge (it's only a pair of if), but
> > >> yes, it's an easy optimization.
> > >>
> > >> I will create a conversion_matrix structure:
> > >>
> > >> struct conversion_matrix {
> > >> s16 matrix[3][3];
> > >> u16 y_offset;
> > >> }
>
> When defining such a struct type, it would be good to document the
> matrix layout (which one is row, which one is column), and what the s16
> mean (fixed point?).

Ack

> Try to not mix signed and unsigned types, too. The C implicit type
> promotion rules can be surprising. Just make everything signed while
> computing, and convert to/from unsigned only for storage.

Ack, I will change to signed type.
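
A possible shape for the documented structure, as a user-space sketch. The bt601_full coefficients are the ones from the patch above; the field documentation, the signed y_offset, the truncating division and the helper names (clamp_u8, apply_row, yuv_to_rgb) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of a documented conversion matrix (layout is an assumption):
 * @matrix: matrix[row][col]; rows produce R, G, B and columns weight
 *          Y, U (Cb), V (Cr). Values are signed fixed point with
 *          8 fractional bits, so 256 means 1.0.
 * @y_offset: subtracted from Y before multiplying (0 for full range,
 *            16 for limited range). Signed, to avoid mixed-sign math.
 */
struct conversion_matrix {
	int16_t matrix[3][3];
	int16_t y_offset;
};

/* BT.601 full-range coefficients quoted in the patch under review. */
static const struct conversion_matrix bt601_full = {
	.matrix = {
		{ 256,   0,  359 },
		{ 256, -88, -183 },
		{ 256, 454,    0 },
	},
	.y_offset = 0,
};

struct rgb_u8 {
	uint8_t r, g, b;
};

/* Convert back to unsigned only at the very end, for storage. */
static uint8_t clamp_u8(int32_t v)
{
	if (v < 0)
		return 0;
	if (v > 255)
		return 255;
	return (uint8_t)v;
}

/* One matrix row times (y, u, v); the division truncates toward zero. */
static int32_t apply_row(const int16_t row[3], int32_t y, int32_t u, int32_t v)
{
	return (row[0] * y + row[1] * u + row[2] * v) / 256;
}

static struct rgb_u8 yuv_to_rgb(const struct conversion_matrix *m,
				uint8_t y, uint8_t u, uint8_t v)
{
	/* Widen to signed 32 bits before any arithmetic. */
	int32_t fy = (int32_t)y - m->y_offset;
	int32_t fu = (int32_t)u - 128;
	int32_t fv = (int32_t)v - 128;
	struct rgb_u8 out;

	out.r = clamp_u8(apply_row(m->matrix[0], fy, fu, fv));
	out.g = clamp_u8(apply_row(m->matrix[1], fy, fu, fv));
	out.b = clamp_u8(apply_row(m->matrix[2], fy, fu, fv));
	return out;
}
```

All intermediate values stay signed, so a negative chroma contribution cannot be silently promoted to a large unsigned number.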

> > >>
> > >> I will create a `get_conversion_matrix_to_argb_u16` function to get this
> > >> structure from a format+encoding+range.
> > >>
> > >> I will also add a field `conversion_matrix` in struct vkms_plane_state to
> > >> get this matrix only once per plane setup.
>
> Alright. Let's see how that works.
>
> > >>
> > >>
> > >>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> > >>>> + break;
> > >>>> + case DRM_COLOR_YCBCR_BT709:
> > >>>> + ycbcr2rgb(full ? rec709_full : rec709,
> > >>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> > >>>> + break;
> > >>>> + case DRM_COLOR_YCBCR_BT2020:
> > >>>> + ycbcr2rgb(full ? bt2020_full : bt2020,
> > >>>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> > >>>> + break;
> > >>>> + default:
> > >>>> + pr_warn_once("Not supported color encoding\n");
> > >>>> + break;
> > >>>> + }
> > >>>> +
> > >>>> + argb_u16->r = r * 257;
> > >>>> + argb_u16->g = g * 257;
> > >>>> + argb_u16->b = b * 257;
> > >>>
> > >>> I wonder. Using 8-bit fixed point precision seems quite coarse for
> > >>> 8-bit pixel formats, and it's going to be insufficient for higher bit
> > >>> depths. Was supporting e.g. 10-bit YUV considered? There is even
> > >>> deeper, too, like DRM_FORMAT_P016.
> > >>
> > >> It's a good point, as I explained above, I took the conversion part as a
> > >> "black box" to avoid breaking (and debugging) stuff. I think it's easy to
> > >> switch to s32 bits matrix with 16.16 bits (or anything with more than 16 bits in
> > >> the float part).
> > >>
> > >> Maybe Arthur have an opinion on this?
> > >
> > > Yeah, I too don't see why not we could do that. The 8-bit precision was
> > > sufficient for those formats, but as well noted by Pekka this could be a
> > > problem for higher bit depths. I just need to make my terrible python
> > > script spit those values XD.
> >
> > Finally, I got it working with 32-bit precision.
> >
> > I basically threw all my untrusted python code away, and started using
> > the colour python framework suggested by Sebastian[1]. After knowing the
> > right values (and staring at numbers for hours), I found that with a
> > little bit of rounding, the conversion works.
> >
> > Also, while at it, I changed the name rec709 to bt709 to follow the
> > pattern and added "_full" to the full ranges matrices.
> >
> > While using the library, I noticed that the red component is wrong on
> > the color red in one test case.
> >
> > [1]: https://lore.kernel.org/all/20240115150600.GC160656@toolbox/
>
> That all sounds good. I wish the kernel code contained comments
> explaining how exactly you computed those matrices with python/colour.
> If the python snippets are not too long, including them verbatim as
> code comments would be really nice for both reviewers and posterity.
>
> The same for the VKMS unit tests, too, how you got the expected result
> values.

I edited the YUV support to have those s64 values.
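
As a rough illustration of what the wider coefficients buy, the sketch below compares the 8-bit coefficient 359/256 from the original matrix with the BT.601 value 1.402 stored with 32 fractional bits in a signed 64-bit integer. The 32.32 layout, the rounding mode and all helper names are assumptions for this example, not necessarily what the next version will use.

```c
#include <assert.h>
#include <stdint.h>

#define FP_FRAC_BITS 32
#define FP_ONE ((int64_t)1 << FP_FRAC_BITS)

/* Build a 32.32 fixed-point value from the rational num/den. */
static int64_t fp_from_ratio(int64_t num, int64_t den)
{
	assert(den != 0);
	return num * FP_ONE / den;
}

/* Round a 32.32 fixed-point value to the nearest integer, half away from zero. */
static int64_t fp_round(int64_t x)
{
	int64_t half = FP_ONE / 2;

	return x >= 0 ? (x + half) / FP_ONE : -((-x + half) / FP_ONE);
}

/* Scale by the 8-bit coefficient 359/256 (truncating division). */
static int64_t scale_q8(int64_t value)
{
	return 359 * value / 256;
}

/* Scale by 1.402 kept with 32 fractional bits. */
static int64_t scale_q32(int64_t value)
{
	return fp_round(fp_from_ratio(1402, 1000) * value);
}
```

For an 8-bit chroma excursion the two versions agree, but for a 16-bit value such as 32639 the 8-bit coefficient is already off by 11 units, which is the kind of error Pekka was worried about for deeper formats.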

@arthur, I will submit a v4 with this:
- matrix selection in plane_atomic_update (so it's selected only once)
- s64 numbers for matrix
- avoiding multiple loop implementation by switching matrix columns
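
The first and last items could look roughly like the user-space sketch below: the matrix is picked once, and for YVU formats its U and V columns are swapped so that the same loop can consume the chroma samples in storage order. select_matrix(), swap_uv_columns() and red_of() are hypothetical names. The full-range values come from the patch, while the limited-range values are the commonly used BT.601 coefficients scaled by 2^8, added here as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* matrix[row][col]: rows give R, G, B; columns weight Y, U, V (8 fractional bits). */
struct conversion_matrix {
	int16_t matrix[3][3];
	int16_t y_offset;
};

/* From the patch under review. */
static const struct conversion_matrix bt601_full = {
	.matrix = { { 256, 0, 359 }, { 256, -88, -183 }, { 256, 454, 0 } },
	.y_offset = 0,
};

/* Commonly used limited-range BT.601 values (assumption, not from this thread). */
static const struct conversion_matrix bt601_limited = {
	.matrix = { { 298, 0, 409 }, { 298, -100, -208 }, { 298, 516, 0 } },
	.y_offset = 16,
};

/* Swap the U and V columns so YVU data can go through the YUV loop. */
static void swap_uv_columns(struct conversion_matrix *m)
{
	for (int row = 0; row < 3; row++) {
		int16_t tmp = m->matrix[row][1];

		m->matrix[row][1] = m->matrix[row][2];
		m->matrix[row][2] = tmp;
	}
}

/* Meant to run once per plane update, not once per pixel. */
static struct conversion_matrix select_matrix(bool full_range, bool is_yvu)
{
	struct conversion_matrix m = full_range ? bt601_full : bt601_limited;

	if (is_yvu)
		swap_uv_columns(&m);
	return m;
}

/* Red channel from luma and the two chroma samples in storage order. */
static int32_t red_of(const struct conversion_matrix *m,
		      int32_t y, int32_t c1, int32_t c2)
{
	return (m->matrix[0][0] * (y - m->y_offset) +
		m->matrix[0][1] * (c1 - 128) +
		m->matrix[0][2] * (c2 - 128)) / 256;
}
```

Feeding (Y, V, U) to the swapped matrix gives the same result as feeding (Y, U, V) to the original one, so only one loop per plane layout is needed.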

Regarding the YUV part, I don't feel comfortable addressing Pekka's
comments, would you mind doing it?

Kind regards,
Louis Chauvet

[...]

--
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2024-03-04 17:29:49

by Arthur Grillo

[permalink] [raw]
Subject: Re: [PATCH v2 6/9] drm/vkms: Add YUV support



On 04/03/24 12:48, Louis Chauvet wrote:
> [...]
>
>>> @arthur, I will submit a v4 with this:
>>> - matrix selection in plane_atomic_update (so it's selected only once)
>>> - s64 numbers for matrix
>>> - avoiding multiple loop implementation by switching matrix columns
>>
>> This looks good to me.
>>
>>>
>>> Regarding the YUV part, I don't feel comfortable addressing Pekka's
>>> comments, would you mind doing it?
>>
>> I'm already doing that. How do you want me to send those changes? Should I
>> reply to your series, like I did before?
>
> Yes, simply reply to my series, so I can rebase everything on the
> line-by-line work.

OK, I will do that.

Best Regards,
~Arthur Grillo

> Kind regards,
> Louis Chauvet
>
>> Best Regards,
>> ~Arthur Grillo
>>
>>>
>>> Kind regards,
>>> Louis Chauvet
>>>
>>> [...]
>>>
>

2024-03-01 11:53:48

by Pekka Paalanen

Subject: Re: [PATCH v2 6/9] drm/vkms: Add YUV support

On Thu, 29 Feb 2024 14:57:06 -0300
Arthur Grillo <[email protected]> wrote:

> On 29/02/24 09:12, Pekka Paalanen wrote:
> > On Wed, 28 Feb 2024 22:52:09 -0300
> > Arthur Grillo <[email protected]> wrote:
> >
> >> On 27/02/24 17:01, Arthur Grillo wrote:
> >>>
> >>>
> >>> On 27/02/24 12:02, Louis Chauvet wrote:
> >>>> Hi Pekka,
> >>>>
> >>>> For all the comment related to the conversion part, maybe Arthur have an
> >>>> opinion on it, I took his patch as a "black box" (I did not want to
> >>>> break (and debug) it).
> >>>>
> >>>> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :
> >>>>> On Fri, 23 Feb 2024 12:37:26 +0100
> >>>>> Louis Chauvet <[email protected]> wrote:
> >>>>>
> >>>>>> From: Arthur Grillo <[email protected]>
> >>>>>>
> >>>>>> Add support for the YUV formats below:
> >>>>>>
> >>>>>> - NV12
> >>>>>> - NV16
> >>>>>> - NV24
> >>>>>> - NV21
> >>>>>> - NV61
> >>>>>> - NV42
> >>>>>> - YUV420
> >>>>>> - YUV422
> >>>>>> - YUV444
> >>>>>> - YVU420
> >>>>>> - YVU422
> >>>>>> - YVU444
> >>>>>>
> >>>>>> The conversion matrices of each encoding and range were obtained by
> >>>>>> rounding the values of the original conversion matrices multiplied by
> >>>>>> 2^8. This is done to avoid the use of fixed point operations.
> >>>>>>
> >>>>>> Signed-off-by: Arthur Grillo <[email protected]>
> >>>>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
> >>>>>> callbacks for yuv formats]
> >>>>>> Signed-off-by: Louis Chauvet <[email protected]>
> >>>>>> ---
> >>>>>> drivers/gpu/drm/vkms/vkms_composer.c | 2 +-
> >>>>>> drivers/gpu/drm/vkms/vkms_drv.h | 6 +-
> >>>>>> drivers/gpu/drm/vkms/vkms_formats.c | 289 +++++++++++++++++++++++++++++++++--
> >>>>>> drivers/gpu/drm/vkms/vkms_formats.h | 4 +
> >>>>>> drivers/gpu/drm/vkms/vkms_plane.c | 14 +-
> >>>>>> 5 files changed, 295 insertions(+), 20 deletions(-)

..

> >> diff --git a/drivers/gpu/drm/vkms/tests/vkms_format_test.c b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> >> index f66584549827..4cee3c2d8d84 100644
> >> --- a/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> >> +++ b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> >> @@ -59,7 +59,7 @@ static struct yuv_u8_to_argb_u16_case yuv_u8_to_argb_u16_cases[] = {
> >> {"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
> >> {"gray", {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
> >> {"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
> >> - {"red", {0x35, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
> >> + {"red", {0x36, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
> >> {"green", {0xb6, 0x1e, 0x0c}, {0x0000, 0x0000, 0xffff, 0x0000}},
> >> {"blue", {0x12, 0xff, 0x74}, {0x0000, 0x0000, 0x0000, 0xffff}},
> >> },
> >> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> >> index e06bbd7c0a67..043f23dbf80d 100644
> >> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> >> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> >> @@ -121,10 +121,12 @@ static void RGB565_to_argb_u16(u8 **src_pixels, struct pixel_argb_u16 *out_pixel
> >> out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
> >> }
> >>
> >> -static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
> >> +#define BIT_DEPTH 32
> >> +
> >> +static void ycbcr2rgb(const s64 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
> >> {
> >> - s32 y_16, cb_16, cr_16;
> >> - s32 r_16, g_16, b_16;
> >> + s64 y_16, cb_16, cr_16;
> >> + s64 r_16, g_16, b_16;
> >>
> >> y_16 = y - y_offset;
> >> cb_16 = cb - 128;
> >> @@ -134,9 +136,18 @@ static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r,
> >> g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
> >> b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;
> >>
> >> - *r = clamp(r_16, 0, 0xffff) >> 8;
> >> - *g = clamp(g_16, 0, 0xffff) >> 8;
> >> - *b = clamp(b_16, 0, 0xffff) >> 8;
> >> + // rounding the values
> >> + r_16 = r_16 + (1LL << (BIT_DEPTH - 4));
> >> + g_16 = g_16 + (1LL << (BIT_DEPTH - 4));
> >> + b_16 = b_16 + (1LL << (BIT_DEPTH - 4));
> >> +
> >> + r_16 = clamp(r_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
> >> + g_16 = clamp(g_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
> >> + b_16 = clamp(b_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
> >
> > Where do the BIT_DEPTH - 4 and BIT_DEPTH + 8 come from?
>
> Basically, the numbers are in this form in hex:
>
> 0xsspppppppp
>
> In the end, we only want the 's' bits.
>
> The matrix multiplication is not giving us perfect results, making some
> of the KUnit tests fail. This is because the values end up a little bit
> off. KUnit expects 0xfe, but this function is returning 0xfd.
>
> I noticed that before shifting the values to get the 's' bytes the
> values were a lot closer to what is expected, something like:
>
> 0xfdfe287312
>     ^
> So the rounding part adds 1 to this marked 'f' to round a bit the values
> (drm_fixed.h does something similar on drm_fixp2int_round).
> Like that:
>
>   0xfdfe287312
> + 0x0010000000
>   ------------
>   0xfe0e287312
>
> That's why the BIT_DEPTH - 4.

I have a hard time deciphering this. There is some sort of strange
combination of UNORM and fixed-point going on here, where you process
the range 0.0 - 255.0 including 32-bit fraction. All variables being
named "_16" does not help, I've no idea what that refers to.

Usually when you have unsigned pixel format type, it's UNORM, that is
an unsigned integer representation that maps to [0.0, 1.0]. When
converting UNORM properly to e.g. fixed-point, you don't have to
consider the UNORM bit depth when computing in fixed-point.

There is a catch: since 0xff maps to 1.0, the divisor is 0xff, and not
a bit shift by 8. This must be taken into account when converting
between different depths of UNORM, or between UNORM and fixed-point.
Converting between different depths of fixed-point does not have this
problem.
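
To make the divisor point concrete, here is a minimal sketch of converting
between 8-bit and 16-bit UNORM. The function names are hypothetical. Because
0xff must map to 0xffff (both mean 1.0), the scale factor is 0xffff / 0xff =
257, not a shift by 8.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit UNORM to 16-bit UNORM: scale by 0xffff / 0xff = 257, which is exact. */
static uint16_t unorm8_to_unorm16(uint8_t v)
{
	return (uint16_t)(v * 257u);
}

/* 16-bit UNORM to 8-bit UNORM: scale by 0xff / 0xffff, rounded to nearest. */
static uint8_t unorm16_to_unorm8(uint16_t v)
{
	return (uint8_t)(((uint32_t)v * 0xffu + 0x7fffu) / 0xffffu);
}
```

A plain `v << 8` would turn 0xff into 0xff00, which is slightly less than 1.0
in 16-bit UNORM; the multiplication by 257 avoids that.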

If you want proper rounding, meaning that 0.5 rounds up to 1.0 and
0.4999 rounds down to 0.0 when rounding to integers, you have to add
0.5 before truncating.

So in this case, with 32 fractional bits, you need to add
0x1_0000_0000 / 2 = 0x8000_0000, not 0x1000_0000.

I don't understand what drm_fixp2int_round() is even doing. The offset
is not 0.5, it's 0.0000076.

> After that, the values need to be clamped to not get wrong results when
> shifting this s64 and casting it to u8. We clamp it to the minimum
> allowed value: 0, and to the maximum allowed value, which in this case
> is all the (BIT_DEPTH + 8) bits set to 1, The '+ 8' is to account for
> the size of the 's' bits.

Ok. You could also shift with >> BIT_DEPTH first, and then clamp to 0,
255.
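
The rounding and clamping discussed above can be sketched as follows, assuming
a signed 64-bit value with 32 fractional bits as in the patch. The names
`FRAC_BITS` and `fixp_to_u8_round` are hypothetical and this is not the code
from the series. The negative check is done before the shift so that the
sketch never right-shifts a negative value.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 32 /* number of fractional bits, like BIT_DEPTH in the patch */

/* Round a fixed-point value to the nearest integer and clamp it to 0..255. */
static uint8_t fixp_to_u8_round(int64_t v)
{
	/* Adding 0.5, i.e. half of (1 << FRAC_BITS), turns truncation into rounding. */
	v += (int64_t)1 << (FRAC_BITS - 1);

	/* Anything still negative rounds to a negative integer: clamp to 0. */
	if (v < 0)
		return 0;

	/* Drop the fraction, then clamp the integer part to the u8 range. */
	v >>= FRAC_BITS;
	return v > 255 ? 255 : (uint8_t)v;
}
```

With the value from the example above, 0xfdfe287312 has a fraction above one
half, so it rounds to 0xfe, while 0xfd7fffffff stays at 0xfd.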


Thanks,
pq

> After writing this, I think that maybe it would be good to add this
> explanation as a comment on the code.
>
> >
> >> +
> >> + *r = r_16 >> BIT_DEPTH;
> >> + *g = g_16 >> BIT_DEPTH;
> >> + *b = b_16 >> BIT_DEPTH;
> >> }



2024-03-05 09:50:36

by Pekka Paalanen

Subject: Re: [PATCH v2 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions

On Mon, 4 Mar 2024 16:28:30 +0100
Louis Chauvet <[email protected]> wrote:

> Le 29/02/24 - 10:48, Pekka Paalanen a écrit :
> > On Tue, 27 Feb 2024 16:02:10 +0100
> > Louis Chauvet <[email protected]> wrote:
> >
> > > [...]
> > >
> > > > > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > > > > index 172830a3936a..cb7a49b7c8e7 100644
> > > > > --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > > > > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > > > > @@ -9,6 +9,17 @@
> > > > >
> > > > > #include "vkms_formats.h"
> > > > >
> > > > > +/**
> > > > > + * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> > > > > + * in the first plane
> > > > > + *
> > > > > + * @frame_info: Buffer metadata
> > > > > + * @x: The x coordinate of the wanted pixel in the buffer
> > > > > + * @y: The y coordinate of the wanted pixel in the buffer
> > > > > + *
> > > > > + * The caller must be aware that this offset is not always a pointer to a pixel. If individual
> > > > > + * pixel values are needed, they have to be extracted from the resulting block.
> > > >
> > > > Just wondering how the caller will be able to extract the right pixel
> > > > from the block without re-using the knowledge already used in this
> > > > function. I'd also expect the function to round down x,y to be
> > > > divisible by block dimensions, but that's not visible in this email.
> > > > Then the caller needs the remainder from the round-down, too?
> > >
> > > You are right, the current implementation is only working when block_h ==
> > > block_w == 1. I think I wrote the documentation for PATCHv2 5/9, but when
> > > backporting this comment for PATCHv2 3/9 I forgot to update it.
> > > The new comment will be:
> > >
> > > * pixels_offset() - Get the offset of a given pixel data at coordinate
> > > * x/y in the first plane
> > > [...]
> > > * The caller must ensure that the framebuffer associated with this
> > > * request uses a pixel format where block_h == block_w == 1.
> > > * If this requirement is not fulfilled, the resulting offset can be
> > > * completly wrong.
> >
> > Hi Louis,
>
> Hi Pekka,
>
> > if there is no plan for how non-1x1 blocks would work yet, then I think
> > the above wording is fine. In my mind, the below wording would
> > encourage callers to seek out and try arbitrary tricks to make things
> > work for non-1x1 without rewriting the function to actually work.
> >
> > I believe something would need to change in the function signature to
> > make it properly usable for non-1x1 blocks, but I too cannot suggest
> > anything off-hand.
>
> I already made the change to support non-1x1 blocks in Patchv2 5/9
> (I will extract this modification in "drm/vkms: Update pixels accessor to
> support packed and multi-plane formats"), this function is now able
> to extract the pointer to the start of a block. But as stated in the
> comment, the caller must manually extract the correct pixel values (if the
> format is 2x2, the pointer will point to the first byte of this block, the
> caller must do some computation to access the bottom-right value).

Patchv2 5/9 is not enough.

"Manually extract the correct pixels" is the thing I have a problem
with here. The caller should not need to re-do any semantic
calculations this function already did. Most likely this function
should return the remainders from the x,y coordinate division, so that
the caller can extract the right pixels from the block, or something
else equivalent.

That same semantic division should not be done in two different places.
It is too easy for someone later to come and change one site while
missing the other.

I have a hard time finding in "[PATCH v2 6/9] drm/vkms: Add YUV
support" how you actually handle blocks bigger than 1x1. I see
get_subsampling() which returns format->{hsub,vsub}, and I see
get_subsampling_offset() which combined with remainder-division gates U
and V plane pixel pointer increments.

However, I do not see you ever using
drm_format_info_block_{width,height}() anywhere else. That makes me
think you have no code to actually handle non-1x1 block formats, which
means that you cannot get the function signature of
packed_pixels_offset() right in this series either. It would be better
to not even pretend the function works for non-1x1 blocks until you
have code handling at least one such format.

All of the YUV formats that patch 6 adds support for use 1x1 blocks in
all their planes.


Thanks,
pq

> > >
> > > And yes, even after PATCHv2 5/9 it is not clear what is the offset. Is
> > > this better to replace the last sentence? (I will do the same update for
> > > the last sentence of packed_pixels_addr)
> > >
> > > [...]
> > > * The returned offset correspond to the offset of the block containing the pixel at coordinates
> > > * x/y.
> > > * The caller must use this offset with care, as for formats with block_h != 1 or block_w != 1
> > > * the requested pixel value may have to be extracted from the block, even if they are
> > > * individually adressable.
> > >
> > > > > + */
> > > > > static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> > > > > {
> > > > > struct drm_framebuffer *fb = frame_info->fb;
> > > > > @@ -17,12 +28,13 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
> > > > > + (x * fb->format->cpp[0]);
> > > > > }
> > > > >



2024-03-05 09:50:59

by Pekka Paalanen

Subject: Re: [PATCH v2 4/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions

On Mon, 4 Mar 2024 16:28:32 +0100
Louis Chauvet <[email protected]> wrote:

> Le 29/02/24 - 11:07, Pekka Paalanen a écrit :
> > On Tue, 27 Feb 2024 16:02:13 +0100
> > Louis Chauvet <[email protected]> wrote:
> >
> > > Le 26/02/24 - 13:36, Pekka Paalanen a écrit :
> > > > On Fri, 23 Feb 2024 12:37:24 +0100
> > > > Louis Chauvet <[email protected]> wrote:
> > > >
> > > > > Introduce two typedefs: pixel_read_t and pixel_write_t. It allows the
> > > > > compiler to check if the passed functions take the correct arguments.
> > > > > Such typedefs will help ensuring consistency across the code base in
> > > > > case of update of these prototypes.
> > > > >
> > > > > Introduce a check around the get_pixel_*_functions to avoid using a
> > > > > nullptr as a function.
> > > > >
> > > > > Document for those typedefs.
> > > > >
> > > > > Signed-off-by: Louis Chauvet <[email protected]>
> > > > > ---
> > > > > drivers/gpu/drm/vkms/vkms_drv.h | 23 +++++++++++++++++++++--
> > > > > drivers/gpu/drm/vkms/vkms_formats.c | 8 ++++----
> > > > > drivers/gpu/drm/vkms/vkms_formats.h | 4 ++--
> > > > > drivers/gpu/drm/vkms/vkms_plane.c | 9 ++++++++-
> > > > > drivers/gpu/drm/vkms/vkms_writeback.c | 9 ++++++++-
> > > > > 5 files changed, 43 insertions(+), 10 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> > > > > index 18086423a3a7..886c885c8cf5 100644
> > > > > --- a/drivers/gpu/drm/vkms/vkms_drv.h
> > > > > +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> > > > > @@ -53,12 +53,31 @@ struct line_buffer {
> > > > > struct pixel_argb_u16 *pixels;
> > > > > };
> > > > >
> > > > > +/**
> > > > > + * typedef pixel_write_t - These functions are used to read a pixel from a
> > > > > + * `struct pixel_argb_u16*`, convert it in a specific format and write it in the @dst_pixels
> > > > > + * buffer.
> > > > > + *
> > > > > + * @dst_pixel: destination address to write the pixel
> > > > > + * @in_pixel: pixel to write
> > > > > + */
> > > > > +typedef void (*pixel_write_t)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel);
> > > >
> > > > There are some inconsistencies in pixel_write_t and pixel_read_t. At
> > > > this point of the series they still operate on a single pixel, but you
> > > > use dst_pixels and src_pixels, plural. Yet the documentation correctly
> > > > talks about processing a single pixel.
> > >
> > > I will fix this for v4.
> > >
> > > > I would also expect the source to be always const, but that's a whole
> > > > another patch to change.
> > >
> > > The v4 will contains a new patch "drm/vkms: Use const pointer for
> > > pixel_read and pixel_write functions"
> >
> > Sounds good!
> >
> > >
> > > [...]
> > >
> > > > > diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> > > > > index d5203f531d96..f68b1b03d632 100644
> > > > > --- a/drivers/gpu/drm/vkms/vkms_plane.c
> > > > > +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> > > > > @@ -106,6 +106,13 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> > > > > return;
> > > > >
> > > > > fmt = fb->format->format;
> > > > > + pixel_read_t pixel_read = get_pixel_read_function(fmt);
> > > > > +
> > > > > + if (!pixel_read) {
> > > > > + DRM_WARN("Pixel format is not supported by VKMS planes. State is inchanged\n");
> > > >
> > > > DRM_WARN() is the kernel equivalent to userspace assert(), right?
> > >
> > > For the DRM_WARN it is just a standard printk(KERN_WARNING, ...) (hidden
> > > behind drm internal macros).
> >
> > My concern here is that does hitting this cause additional breakage of
> > the UAPI contract? For example, the UAPI contract expects that the old
> > FB is unreffed and the new FB is reffed by the plane in question. If
> > this early return causes that FB swap to be skipped, it could cause
> > secondary unexpected failures or misbehaviour for userspace later. That
> > could mislead debugging to think that there is a userspace bug.
> >
> > Even if you cannot actually read FB due to an internal bug, it would be
> > good to still apply all the state changes that the UAPI contract
> > mandates.
> >
> > Unless, this is a kernel bug kind of thing which explodes very
> > verbosely, but DRM_WARN is not that.
>
> You are right. In this case I maybe can just create a dummy
> "pixel_read" which always return black? (The v4 will have it so you can
> see how it look)
>
> This way, I can:
> - keep the check and avoid null function pointers (and OOPS);
> - log (WARN, DRM_WARN, DRM_ERROR or whatever suggested by DRM maintainers
> to warn very loudly);
> - Apply the requested change from userspace (and don't break the UAPI
> contract).
>
> The resulting behavior will be "the virtual plane is black", which is, I
> think, not very important. As the primary usage of VKMS is testing, it
> will just break all the tests, and a quick look at the kernel logs will
> show this bug.

That's fine by me. After all, atomic check should have already
prevented this, and this can only happen due to a kernel bug AFAIU.
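
A minimal userspace sketch of the fallback idea, under stated assumptions: the
format code `DEMO_FMT_XRGB8888`, the lookup `lookup_pixel_read` and the wrapper
`pixel_read_or_black` are hypothetical, and `fprintf` stands in for whatever
warning macro the maintainers prefer. It only illustrates "never hand a NULL
callback to the composer, return a reader that produces opaque black instead".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

typedef void (*pixel_read_t)(const uint8_t *src, struct pixel_argb_u16 *out);

enum demo_format { DEMO_FMT_XRGB8888 = 1 }; /* hypothetical format code */

/* Little-endian XRGB8888: bytes are B, G, R, X; widen each channel to 16 bits. */
static void read_xrgb8888(const uint8_t *src, struct pixel_argb_u16 *out)
{
	out->a = 0xffff;
	out->r = (uint16_t)(src[2] * 257u);
	out->g = (uint16_t)(src[1] * 257u);
	out->b = (uint16_t)(src[0] * 257u);
}

/* Fallback reader: ignores the source and always yields opaque black. */
static void read_black(const uint8_t *src, struct pixel_argb_u16 *out)
{
	(void)src;
	out->a = 0xffff;
	out->r = 0;
	out->g = 0;
	out->b = 0;
}

/* Returns NULL for formats without a reader. */
static pixel_read_t lookup_pixel_read(unsigned int fmt)
{
	switch (fmt) {
	case DEMO_FMT_XRGB8888:
		return read_xrgb8888;
	default:
		return NULL;
	}
}

/* Never returns NULL: an unknown format is logged and replaced by read_black. */
static pixel_read_t pixel_read_or_black(unsigned int fmt)
{
	pixel_read_t fn = lookup_pixel_read(fmt);

	if (!fn) {
		fprintf(stderr, "format %u has no reader; atomic check should have rejected it\n", fmt);
		return read_black;
	}
	return fn;
}
```

With this shape the state update can still be applied in full, and the bug
shows up as a black plane plus a log line rather than as a skipped update.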


> > > > In that failing the check means an internal invariant was violated,
> > > > which means a code bug in kernel?
> > > >
> > > > Maybe this could be more specific about what invariant was violated?
> > > > E.g. atomic check should have rejected this attempt already.
> > >
> > > I'm not an expert (yet) in DRM, so please correct me:
> > > When atomic_update is called, the new state is always validated by
> > > atomic_check before? There is no way to pass something not validated by
> > > atomic_check to atomic_update? If this is the case, yes, it should be an
> > > ERROR and not a WARN as an invalid format passed the atomic_check
> > > verification.
> >
> > I only know about the UAPI, I'm not familiar with kernel internals.
> >
> > We see that atomic_update returns void, so I think it simply cannot
> > return errors. To my understanding, atomic_check needs to ensure that
> > atomic_update cannot fail. There is even UAPI to exercise atomic_check
> > alone: the atomic commit TEST_ONLY flag. Userspace trusts that flag, and
> > will not expect an identical atomic commit to fail without TEST_ONLY
> > when it succeeded with TEST_ONLY.
>
> That is my understanding of the UAPI/DRM internals too. Is my suggestion
> above sufficient? It will always succeed, with no kernel OOPS.
>
> > > If so, is this better?
> > >
> > > if (!pixel_read) {
> > > /*
> > > * This is a bug as the vkms_plane_atomic_check must forbid all unsupported formats.
> > > */
> > > DRM_ERROR("Pixel format %4cc is not supported by VKMS planes.\n", fmt);
> > > return;
> > > }
> > >
> > > I will put the same code in vkms_writeback.c.
> >
> > Maybe maintainers can comment whether even DRM_ERROR is strong enough.
> >
> > As for the message, what you wrote in the comment is the most important
> > part that I'd put in the log. It explains what's going on, while that
> > "format not supported" is a detail without context.
> >
>
> Is something like this better?
>
> /*
> * This is a bug in vkms_plane_atomic_check. All the supported
> * format must:
> * - Be listed in vkms_formats
> * - Have a pixel_read_line callback
> */
> WARN(true, "Pixel format %4cc is not supported by VKMS planes. This is a kernel bug. Atomic check must forbid this configuration.\n", fmt)
>

Sure.


Thanks,
pq



2024-03-06 17:30:14

by Louis Chauvet

Subject: Re: [PATCH v2 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions

[...]

> > > > > > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > > > > > @@ -9,6 +9,17 @@
> > > > > >
> > > > > > #include "vkms_formats.h"
> > > > > >
> > > > > > +/**
> > > > > > + * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> > > > > > + * in the first plane
> > > > > > + *
> > > > > > + * @frame_info: Buffer metadata
> > > > > > + * @x: The x coordinate of the wanted pixel in the buffer
> > > > > > + * @y: The y coordinate of the wanted pixel in the buffer
> > > > > > + *
> > > > > > + * The caller must be aware that this offset is not always a pointer to a pixel. If individual
> > > > > > + * pixel values are needed, they have to be extracted from the resulting block.
> > > > >
> > > > > Just wondering how the caller will be able to extract the right pixel
> > > > > from the block without re-using the knowledge already used in this
> > > > > function. I'd also expect the function to round down x,y to be
> > > > > divisible by block dimensions, but that's not visible in this email.
> > > > > Then the caller needs the remainder from the round-down, too?
> > > >
> > > > You are right, the current implementation is only working when block_h ==
> > > > block_w == 1. I think I wrote the documentation for PATCHv2 5/9, but when
> > > > backporting this comment for PATCHv2 3/9 I forgot to update it.
> > > > The new comment will be:
> > > >
> > > > * pixels_offset() - Get the offset of a given pixel data at coordinate
> > > > * x/y in the first plane
> > > > [...]
> > > > * The caller must ensure that the framebuffer associated with this
> > > > * request uses a pixel format where block_h == block_w == 1.
> > > > * If this requirement is not fulfilled, the resulting offset can be
> > > > * completly wrong.
> > >
> > > Hi Louis,
> >
> > Hi Pekka,
> >
> > > if there is no plan for how non-1x1 blocks would work yet, then I think
> > > the above wording is fine. In my mind, the below wording would
> > > encourage callers to seek out and try arbitrary tricks to make things
> > > work for non-1x1 without rewriting the function to actually work.
> > >
> > > I believe something would need to change in the function signature to
> > > make it properly usable for non-1x1 blocks, but I too cannot suggest
> > > anything off-hand.
> >
> > I already made the change to support non-1x1 blocks in Patchv2 5/9
> > (I will extract this modification in "drm/vkms: Update pixels accessor to
> > support packed and multi-plane formats"), this function is now able
> > to extract the pointer to the start of a block. But as stated in the
> > comment, the caller must manually extract the correct pixel values (if the
> > format is 2x2, the pointer will point to the first byte of this block, the
> > caller must do some computation to access the bottom-right value).
>
> Patchv2 5/9 is not enough.
>
> "Manually extract the correct pixels" is the thing I have a problem
> with here. The caller should not need to re-do any semantic
> calculations this function already did. Most likely this function
> should return the remainders from the x,y coordinate division, so that
> the caller can extract the right pixels from the block, or something
> else equivalent.
>
> That same semantic division should not be done in two different places.
> It is too easy for someone later to come and change one site while
> missing the other.

I did not notice this, and I agree, thanks for this feedback. For the v5 I
will change it and update the function signature to:

static void packed_pixels_offset(const struct vkms_frame_info *frame_info, int x, int y,
size_t plane_index, size_t *offset, size_t *rem_x, size_t *rem_y)

where rem_x and rem_y are those remainders.
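
As an illustration only, here is a standalone sketch of the "offset plus
remainders" idea. The struct `block_layout` and the function `block_offset`
are hypothetical stand-ins for the framebuffer metadata and the real helper;
they are not the code planned for v5. The coordinates are assumed to be
non-negative, i.e. already clipped by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified description of one plane of a block format. */
struct block_layout {
	size_t pitch;           /* bytes per row of blocks */
	size_t block_w;         /* pixels per block, horizontally */
	size_t block_h;         /* pixels per block, vertically */
	size_t bytes_per_block;
};

/*
 * Compute the byte offset of the block containing pixel (x, y), and the
 * position of that pixel inside the block, in a single place.
 */
static void block_offset(const struct block_layout *l, int x, int y,
			 size_t *offset, size_t *rem_x, size_t *rem_y)
{
	size_t ux = (size_t)x;
	size_t uy = (size_t)y;

	*offset = (uy / l->block_h) * l->pitch + (ux / l->block_w) * l->bytes_per_block;
	*rem_x = ux % l->block_w;
	*rem_y = uy % l->block_h;
}
```

For a layout with 8 pixels per one-byte block and a 4-byte pitch, pixel
(13, 2) lands in byte 9 with rem_x = 5, so the caller knows which pixel of the
block to extract without redoing the division.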

> I have a hard time finding in "[PATCH v2 6/9] drm/vkms: Add YUV
> support" how you actually handle blocks bigger than 1x1. I see
> get_subsampling() which returns format->{hsub,vsub}, and I see
> get_subsampling_offset() which combined with remainder-division gates U
> and V plane pixel pointer increments.
>
> However, I do not see you ever using
> drm_format_info_block_{width,height}() anywhere else. That makes me
> think you have no code to actually handle non-1x1 block formats, which
> means that you cannot get the function signature of
> packed_pixels_offset() right in this series either. It would be better
> to not even pretend the function works for non-1x1 blocks until you
> have code handling at least one such format.
>
> All of the YUV formats that patch 6 adds support for use 1x1 blocks in
> all their planes.

Yes, none of the supported formats have block_h != 1 or block_w != 1, so
there is no need for the drm_format_info_block*() helpers.

I wrote the code for DRM_FORMAT_R*. They are packed, with block_w != 1. I
will add this patch in the next revision. I also wrote the IGT test for
DRM_FORMAT_R1 [1]. Everything will be in the v5 (I will send it when you have the
time to review the v4).

For information, I also have a series ready for adding more RGB variants
(I introduced a macro to make it easier and avoid copy/pasting the same
loop). I have not sent it yet, because I really want this series merged
first. I also have the work for the writeback "line-by-line" algorithm
ready (I just need to rebase it, but it will be fast).

[1]: https://lore.kernel.org/igt-dev/[email protected]

Kind regards,
Louis Chauvet

[...]

--
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2024-03-06 17:30:27

by Louis Chauvet

Subject: Re: [PATCH v2 5/9] drm/vkms: Re-introduce line-per-line composition algorithm

Le 05/03/24 - 12:10, Pekka Paalanen a écrit :
> On Mon, 4 Mar 2024 16:28:33 +0100
> Louis Chauvet <[email protected]> wrote:
>
> > Le 29/02/24 - 12:21, Pekka Paalanen a écrit :
> > > On Tue, 27 Feb 2024 16:02:09 +0100
> > > Louis Chauvet <[email protected]> wrote:
> > >
> > > > [...]
> > > >
> > > > > > -static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> > > > > > - struct line_buffer *stage_buffer,
> > > > > > - struct line_buffer *output_buffer)
> > > > > > +static void pre_mul_alpha_blend(
> > > > > > + struct line_buffer *stage_buffer,
> > > > > > + struct line_buffer *output_buffer,
> > > > > > + int x_start,
> > > > > > + int pixel_count)
> > > > > > {
> > > > > > - int x_dst = frame_info->dst.x1;
> > > > > > - struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
> > > > > > - struct pixel_argb_u16 *in = stage_buffer->pixels;
> > > > > > - int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > > > > > - stage_buffer->n_pixels);
> > > > > > -
> > > > > > - for (int x = 0; x < x_limit; x++) {
> > > > > > - out[x].a = (u16)0xffff;
> > > > > > - out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> > > > > > - out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> > > > > > - out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
> > > > > > + struct pixel_argb_u16 *out = &output_buffer->pixels[x_start];
> > > > > > + struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start];
> > > > >
> > > > > Input buffers and pointers should be const.
> > > >
> > > > They will be const in v4.
> > > >
> > > > > > +
> > > > > > + for (int i = 0; i < pixel_count; i++) {
> > > > > > + out[i].a = (u16)0xffff;
> > > > > > + out[i].r = pre_mul_blend_channel(in[i].r, out[i].r, in[i].a);
> > > > > > + out[i].g = pre_mul_blend_channel(in[i].g, out[i].g, in[i].a);
> > > > > > + out[i].b = pre_mul_blend_channel(in[i].b, out[i].b, in[i].a);
> > > > > > }
> > > > > > }
> > > > >
> > > > > Somehow the hunk above does not feel like it is part of "re-introduce
> > > > > line-per-line composition algorithm". This function was already running
> > > > > line-by-line. Would it be easy enough to collect this and directly
> > > > > related changes into a separate patch?
> > > >
> > > > It is not directly related to the reintroduction of the line-by-line
> > > > algorithm, but as part of the simplification and maintainability effort, I
> > > > changed the function a bit to avoid having multiple places computing the
> > > > x_start/pixel_count values. I don't see the benefit of extracting it, it
> > > > would just move a few lines into the caller.
> > >
> > > It does make review more difficult, because it makes the patch bigger
> > > and is not explained in the commit message. It is a surprise to a
> > > reviewer, who then needs to think what this means and does it belong
> > > here.
> > >
> > > If you explain it in the commit message and note it in the commit
> > > summary line, I think it would become fairly obvious that this patch is
> > > doing two things rather than one.
> > >
> > > Therefore, *if* it is easy to extract as a separate patch, then it
> > > would be nice to do so. However, if doing so would require you to write
> > > a bunch of temporary code that the next patch would just rewrite again,
> > > then doing so would be counter-productive.
> > >
> > > Patch split is about finding a good trade-off to make things easy for
> > > reviewers:
> > >
> > > - Smaller patches are better as long as they are self-standing and
> > > understandable in isolation, and of course do not regress anything.
> > >
> > > - Rewriting the same thing multiple times in the same series is extra
> > > work for a reviewer and therefore best avoided.
> > >
> > > - The simpler the semantic change, the bigger a patch can be and still
> > > be easy to review.
> > >
> > > And all the patch writing rules specific to the kernel project that I
> > > don't know about.
> >
> > I will extract it in "drm/vkms: Avoid computing blending limits inside the
> > blend function". It's not very relevant by itself, but it make the main
> > patch easier to read.
>
> Thank you.
>
>
> > > > [...]
> > > >
> > > > > > +/**
> > > > > > + * direction_for_rotation() - Helper to get the correct reading direction for a specific rotation
> > > > > > + *
> > > > > > + * @rotation: rotation to analyze
> > > > >
> > > > > This is KMS plane rotation property, right?
> > > > >
> > > > > So the KMS plane has been rotated by this, and what we want to find is
> > > > > the read direction on the attached FB so that reading returns pixels in
> > > > > the CRTC line/scanout order, right?
> > > > >
> > > > > Maybe extend the doc to explain that.
> > > >
> > > > Is it better?
> > > >
> > > > * direction_for_rotation() - Get the correct reading direction for a given rotation
> > > > *
> > > > * This function will use the @rotation parameter to compute the correct reading direction to read
> > > > * a line from the source buffer.
> > > > * For example, if the buffer is reflected on X axis, the pixel must be read from right to left.
> > > > * @rotation: Rotation to analyze. It corresponds to the field @frame_info.rotation.
> > >
> > > I think it is important to define what determines the correct result.
> > > In this case, we want the reading to produce pixels in the CRTC scanout
> > > line order, I believe. If you don't say "CRTC", the reader does not
> > > know what "the correct reading direction" should match to.
> >
> > Is this a better explanation?
> >
> > * This function will use the @rotation setting of a source plane to compute the reading
> > * direction in this plane which correspond to a left to right writing in the CRTC.
> > * For example, if the buffer is reflected on X axis, the pixel must be read from right to left
> > * to be written from left to right on the CRTC.
>
> Perfect!
>
>
> >
> > > > > > + */
> > > > > > +enum pixel_read_direction direction_for_rotation(unsigned int rotation)
> > > > > > +{
> > > > > > + if (rotation & DRM_MODE_ROTATE_0) {
> > > > > > + if (rotation & DRM_MODE_REFLECT_X)
> > > > > > + return READ_LEFT;
> > > > > > + else
> > > > > > + return READ_RIGHT;
> > > > > > + } else if (rotation & DRM_MODE_ROTATE_90) {
> > > > > > + if (rotation & DRM_MODE_REFLECT_Y)
> > > > > > + return READ_UP;
> > > > > > + else
> > > > > > + return READ_DOWN;
> > > > > > + } else if (rotation & DRM_MODE_ROTATE_180) {
> > > > > > + if (rotation & DRM_MODE_REFLECT_X)
> > > > > > + return READ_RIGHT;
> > > > > > + else
> > > > > > + return READ_LEFT;
> > > > > > + } else if (rotation & DRM_MODE_ROTATE_270) {
> > > > > > + if (rotation & DRM_MODE_REFLECT_Y)
> > > > > > + return READ_DOWN;
> > > > > > + else
> > > > > > + return READ_UP;
> > > > > > + }
> > > > > > + return READ_RIGHT;
> > > > > > +}
> > > > > > +
> > > > > > /**
> > > > > > * blend - blend the pixels from all planes and compute crc
> > > > > > * @wb: The writeback frame buffer metadata
> > > > > > @@ -183,11 +187,11 @@ static void blend(struct vkms_writeback_job *wb,
> > > > > > {
> > > > > > struct vkms_plane_state **plane = crtc_state->active_planes;
> > > > > > u32 n_active_planes = crtc_state->num_active_planes;
> > > > > > - int y_pos;
> > > > > >
> > > > > > const struct pixel_argb_u16 background_color = { .a = 0xffff };
> > > > > >
> > > > > > size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> > > > > > + size_t crtc_x_limit = crtc_state->base.crtc->mode.hdisplay;
> > > > >
> > > > > Wonder why these were size_t, causing needs to cast below...
> > > >
> > > > For crtc_x_limit I just copied the crtc_y_limit. I will change both to u16
> > > > (the type of h/vdisplay).
> > >
> > > Don't go unsigned, that can cause unexpected results when mixed in
> > > computations with signed variables.
> >
> > I will replace them with int.
> >
> > > Oh, the cast was probably not about size but signedness. Indeed, size_t
> > > is unsigned.
> > >
> > > I don't see a reason to use a 16-bit size either, it just exposes the
> > > computations to under/overflows that would then be needed to check for.
> > > s32 should be as fast as any, and perhaps enough bits to never
> > > under/overflow in these computations, but please verify that.
> >
> > I just suggested u16 because it's the type of vdisplay/hdisplay. It was
> > not for performance reason.
>
> Right. It's not uncommon store a value in a storage efficient type that
> may also disallow illegal values, and then use a different type while
> actually computing with it in order to not provoke too obscure C
> language rules most people never heard of, to avoid over/underflows, or
> to just avoid undefined behaviour.
>
> ...
>
> > > > > > +static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction direction,
> > > > > > + int plane_index)
> > > > > > {
> > > > > > - int x_src = frame_info->src.x1 >> 16;
> > > > > > - int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
> > > > > > -
> > > > > > - return packed_pixels_addr(frame_info, x_src, y_src);
> > > > > > + switch (direction) {
> > > > > > + default:
> > > > > > + DRM_ERROR("Invalid direction for pixel reading: %d\n", direction);
> > > > > > + return 0;
> > > > >
> > > > > What I'd do here is move the default: section outside of the switch
> > > > > completely. Then the compiler can warn if any enum value is not handled
> > > > > here. Since every case in the switch is a return statement, falling out
> > > > > of the switch block is the default case.
> > > >
> > > > Hoo, I did not know that gcc can warn when using enums, I will definitly
> > > > do it for the v4.
> > > >
> > > > > Maybe the enum variable containing an illegal value could be handled
> > > > > more harshly so that callers could rely on this function always
> > > > > returning a good value?
> > > > >
> > > > > Just like passing in fb=NULL is handled by the kernel as an OOPS.
> > > >
> > > > I don't think it's a good idea to OOPS inside a driver.
> > >
> > > Everyone already do that. Most functions that do not expect to be called
> > > with NULL never check the arguments for NULL. They just OOPS on
> > > dereference if someone passes in NULL. And for a good reason: adding
> > > all those checks is both code churn and it casts doubt: "maybe it is
> > > legal and expected to call this function with NULL sometimes, what good
> > > does that do?".
> >
> > I agree that adding something like
> >
> > if (direction_is_valid) pr_err("Invalid direction")
> >
> > is useless, but as I already have the switch, it cost nothing to warn if
> > something gone wrong. I will just replace this simple DRM_ERROR with a
> > WARN_ONCE to be more verbose about "it is a bug".
>
> Sounds good to me, and I hope kernel maintainers would agree.
>
>
> > > > An error here is
> > > > maybe dangerous, but is not fatal to the kernel. Maybe you know how to do
> > > > a "local" OOPS to break only this driver and not the whole kernel?
> > >
> > > I don't know what the best practices are in the kernel.
> > >
> > > > For the v4 I will keep a DRM_ERROR and return 0.
> > >
> > > Does that require the caller to check for 0? Could the 0 cause
> > > something else to end up in an endless loop? If it does return 0, how
> > > should a caller handle this case that "cannot" ever happen? Why have
> > > code for something that cannot happen?
> >
> > I have to return something, otherwise the compiler will complain about it.
> >
> > To avoid for future developers surprise, I added this information in the
> > comment. This way the user don't have to read the code to understand how
> > much he can rely on this value.
> >
> > If the caller can trust his direction, he don't have to worry about this.
> > If he can't trust his direction, he know that the returned value can be
> > zero, and thus can't be used for a loop variant.
>
> There should not be "untrusted" values to begin with at this point,
> anything that comes from outside of the kernel should have already been
> sanitised. This is about kernel bugs though. Bugs cannot be predicted,
> nor can anyone guarantee to write bug-free code. Hence, the direction
> value is always "somewhat untrusted". We're being paranoid about bugs
> that might happen and trying to ensure the kernel can limp along
> regardless, while also trying to minimise the amount of code that
> "cannot" ever be reached.
>
> > The zero is also nice because it does not interfere with the normal
> > behavior of this function. If the returned value is not zero, it's the
> > correct step to use from one pixel to an other.
>
> If you expect the caller needing to check for the "cannot happen" case,
> returning a unique error value is fine. If you expect the caller to
> never need to think of the "cannot happen" case, you should return a
> value that is "safe", if such value exists. "Safe" here means using it
> will not result in grave bugs like bad memory access, but it also won't
> produce expected results unless by accident.

That is my issue: in my initial draft, I had a `return 1` (so that I could
use it as a loop variant), but after thinking about it, if the start pixel
is the last one of the plane, it will access memory outside the buffer.
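
To make the trade-off concrete, here is a minimal userspace sketch, not the
actual VKMS code. The enum values mirror the ones in the patch; the
`get_step_sketch()` and `sum_bytes()` helpers, and the use of `fprintf()` in
place of WARN_ONCE(), are assumptions made only for illustration. It shows
the switch without a `default:` label (so the compiler can flag a missing
enum value) and why 0 is a "safe" fallback while 1 is not.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the enum introduced by the patch. */
enum pixel_read_direction {
	READ_RIGHT,
	READ_LEFT,
	READ_DOWN,
	READ_UP,
};

/*
 * Hypothetical helper: byte step from one pixel to the next for a format
 * with 1x1 blocks. cpp is the number of bytes per pixel, pitch the number
 * of bytes per line.
 *
 * There is no "default:" label, so -Wswitch (enabled by -Wall in GCC and
 * Clang) can report an enum value that is not handled.
 */
static int get_step_sketch(enum pixel_read_direction direction, int cpp, int pitch)
{
	switch (direction) {
	case READ_RIGHT:
		return cpp;
	case READ_LEFT:
		return -cpp;
	case READ_DOWN:
		return pitch;
	case READ_UP:
		return -pitch;
	}

	/* Only reached on a bug; kernel code would use WARN_ONCE() here. */
	fprintf(stderr, "Invalid direction for pixel reading: %d\n", (int)direction);
	return 0;
}

/*
 * Hypothetical consumer: sum `count` bytes, `step` bytes apart. With
 * step == 0 the same byte is read `count` times: the result is wrong,
 * but no address other than `start` is dereferenced.
 */
static unsigned int sum_bytes(const unsigned char *start, int step, int count)
{
	unsigned int sum = 0;

	for (int i = 0; i < count; i++)
		sum += start[(ptrdiff_t)i * step];

	return sum;
}
```

With a step of 0, a read that starts on the last byte of a buffer stays on
that byte; with a step of 1 the same call would walk past the end.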

> This getting perhaps a bit too philosophical, so don't mind about this
> too much if it feels strange.

Maybe yes, I was a bit paranoid, I can just return 0 and remove the comment.

>
> > > Of course it's a trade-off between correctness and limping along
> > > injured, and the kernel tends to strongly lean toward the latter for the
> > > obvious reasons.
> > >
> > > > > > + case READ_RIGHT:
> > > > > > + return fb->format->char_per_block[plane_index];
> > > > > > + case READ_LEFT:
> > > > > > + return -fb->format->char_per_block[plane_index];
> > > > > > + case READ_DOWN:
> > > > > > + return (int)fb->pitches[plane_index];
> > > > > > + case READ_UP:
> > > > > > + return -(int)fb->pitches[plane_index];
> > > > > > + }
> > > > > > }
> > > > > >
> > > > > > -static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
> > > > > > -{
> > > > > > - if (frame_info->rotation & (DRM_MODE_REFLECT_X | DRM_MODE_ROTATE_270))
> > > > > > - return limit - x - 1;
> > > > > > - return x;
> > > > > > -}
> > > > > >
> > > > > > /*
> > > > > > - * The following functions take pixel data from the buffer and convert them to the format
> > > > > > + * The following functions take pixel data (a, r, g, b, pixel, ...), convert them to the format
> > > > > > * ARGB16161616 in out_pixel.
> > > > > > *
> > > > > > - * They are used in the `vkms_compose_row` function to handle multiple formats.
> > > > > > + * They are used in the `read_line`s functions to avoid duplicate work for some pixel formats.
> > > > > > */
> > > > > >
> > > > > > -static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> > > > > > +static void ARGB8888_to_argb_u16(struct pixel_argb_u16 *out_pixel, int a, int r, int g, int b)
> > > > >
> > > > > The function name ARGB8888_to_argb_u16() is confusing. It's not taking
> > > > > in ARGB8888 pixels but separate a,r,g,b ints. The only assumption it
> > > > > needs from the pixel format is the 8888 part.
> > > >
> > > > I don't realy know how to name it. What I like with ARGB8888 is that it's
> > > > clear that the values are 8 bits and in argb format.
> > >
> > > I could even propose
> > >
> > > static struct pixel_argb_u16
> > > argb_u16_from_u8888(int a, int r, int g, int b)
> > >
> > > perhaps. Yes, returning a struct by value. I think it would fit, and
> > > these are supposed to get fully inlined anyway, too.
> > >
> > > c.f argb_u16_from_u2101010().
> >
> > I don't find this method, but I got and like the idea, I will change the
> > callback to this in the v4.
>
> I mean, there is no support for 10-bpc formats in VKMS yet IIRC, but
> there should be one day, so thinking about how that would fit in the
> naming scheme is nice.
>
> > > Not a big deal though, I think I'm getting a little bit too involved to
> > > see what would be the most intuitively understandable naming scheme for
> > > someone not familiar with the code.
> > >
> > > > Do you think that `argb_u8_to_argb_u16`, with a new structure
> > > > pixel_argb_u8 will be better? (like PATCH 6/9 with pixel_yuv_u8).
> > > >
> > > > If so, I will introduce the argb_u8 structure in an other commit.
> > >
> > > How would you handle 10-bpc formats? Is there a need for
> > > proliferation of bit-depth-specific struct types?
> >
> > No, I don't think it's good to multiply things. I will patch Arthur's
> > patches to avoid the pixel_yuv_u8 structure.
> >
> > > > [...]
> > > >
> > > > > > + * The following functions are read_line function for each pixel format supported by VKMS.
> > > > > > *
> > > > > > - * This function composes a single row of a plane. It gets the source pixels
> > > > > > - * through the y coordinate (see get_packed_src_addr()) and goes linearly
> > > > > > - * through the source pixel, reading the pixels and converting it to
> > > > > > - * ARGB16161616 (see the pixel_read() callback). For rotate-90 and rotate-270,
> > > > > > - * the source pixels are not traversed linearly. The source pixels are queried
> > > > > > - * on each iteration in order to traverse the pixels vertically.
> > > > > > + * They read a line starting at the point @x_start,@y_start following the @direction. The result
> > > > > > + * is stored in @out_pixel and in the format ARGB16161616.
> > > > > > + *
> > > > > > + * Those function are very similar, but it is required for performance reason. In the past, some
> > > > > > + * experiment were done, and with a generic loop the performance are very reduced [1].
> > > > > > + *
> > > > > > + * [1]: https://lore.kernel.org/dri-devel/[email protected]/
> > > > > > */
> > > > > > -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y)
> > > > > > +
> > > > > > +static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> > > > > > + enum pixel_read_direction direction, int count,
> > > > > > + struct pixel_argb_u16 out_pixel[])
> > > > > > +{
> > > > > > + u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> > > > > > +
> > > > > > + int step = get_step_1x1(frame_info->fb, direction, 0);
> > > > > > +
> > > > > > + while (count) {
> > > > > > + u8 *px = (u8 *)src_pixels;
> > > > > > +
> > > > > > + ARGB8888_to_argb_u16(out_pixel, px[3], px[2], px[1], px[0]);
> > > > > > + out_pixel += 1;
> > > > > > + src_pixels += step;
> > > > > > + count--;
> > > > >
> > > > > btw. you could eliminate decrementing 'count' if you computed end
> > > > > address and used while (out_pixel < end).
> > > >
> > > > Yes, you are right, but after thinking about it, neither out_pixel < end
> > > > and while (count) are conveying "this loop will copy `count` pixels. I
> > > > think a for-loop here is more understandable. There is no ambiguity in the
> > > > number of pixels written and less error-prone. I will replace
> > > > while (count)
> > > > by
> > > > for(int i = 0; i < count; i++)
> > >
> > > I agree that a for-loop is the most obvious way of saying it, but I
> > > also think while (out_pixel < end) is very close too, and so is while (count).
> > > None of those would make me think twice.
> > >
> > > However, I'm thinking of performance here. After all, this is the
> > > hottest code path there is in VKMS. Is the compiler smart enough to
> > > eliminate count-- or i to reduce the number of CPU cycles?
> >
> > You are proably right, I will change it to out_pixel < end.
>
> Don't trust my word without benchmarking it. ;-)

I did not notice a change with kms_fb_stress. There may be a small
improvement, but it is completely hidden in the DRM overhead.
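
For reference, the `out_pixel < end` form of the loop could look like the
sketch below. It is a userspace approximation, not the code from the series:
`read_line_sketch()` is a hypothetical name, `argb_u16_from_u8888()` follows
the naming proposed earlier in this thread, and the `* 257` scaling from 8 to
16 bits is an assumption about the conversion.

```c
#include <stdint.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Expand 8-bit channels to 16 bits (assumed scaling: 0xff * 257 == 0xffff). */
static struct pixel_argb_u16 argb_u16_from_u8888(int a, int r, int g, int b)
{
	struct pixel_argb_u16 out = {
		.a = (uint16_t)(a * 257),
		.r = (uint16_t)(r * 257),
		.g = (uint16_t)(g * 257),
		.b = (uint16_t)(b * 257),
	};

	return out;
}

/*
 * Read `count` ARGB8888 pixels starting at src_pixels, moving `step` bytes
 * between two pixels, and store them in out_pixel. The end pointer replaces
 * the separate `count--` bookkeeping.
 */
static void read_line_sketch(const uint8_t *src_pixels, int step, int count,
			     struct pixel_argb_u16 out_pixel[])
{
	struct pixel_argb_u16 *end = out_pixel + count;

	while (out_pixel < end) {
		/* In memory, ARGB8888 is stored as the bytes B, G, R, A. */
		*out_pixel = argb_u16_from_u8888(src_pixels[3], src_pixels[2],
						 src_pixels[1], src_pixels[0]);
		out_pixel += 1;
		src_pixels += step;
	}
}
```

Calling it with a step of -4 from the last pixel of a line reads the same
data in the opposite direction, which is what READ_LEFT needs.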

Kind regards,
Louis Chauvet

>
> Thanks,
> pq



--
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2024-03-06 20:10:26

by Arthur Grillo

Subject: Re: [PATCH v2 6/9] drm/vkms: Add YUV support



On 04/03/24 13:51, Arthur Grillo wrote:
>
>
> On 04/03/24 12:48, Louis Chauvet wrote:
[...]
>>>
>>>> Regarding the YUV part, I don't feel confortable adressing Pekka's
>>>> comments, would you mind doing it?
>>>
>>> I'm already doing that, how do you want me to send those changes? I reply to
>>> your series, like a did before?
>>
>> Yes, simply reply to my series, so I can rebase everything on the
>> line-by-line work.
>
> OK, I will do that.

Hi,

I know that I said that, but it would be very difficult to do that with my
b4 workflow. So, I sent a separate series based on the v4:

https://lore.kernel.org/all/[email protected]/

I hope that it does not make things difficult for you.

Best Regards,
~Arthur Grillo

>
> Best Regards,
> ~Arthur Grillo
>
>> Kind regards,
>> Louis Chauvet
>>
>>> Best Regards,
>>> ~Arthur Grillo
>>>
>>>>
>>>> Kind regards,
>>>> Louis Chauvet
>>>>
>>>> [...]
>>>>
>>

2024-03-07 00:03:51

by Louis Chauvet

Subject: Re: [PATCH v2 6/9] drm/vkms: Add YUV support

Le 06/03/24 - 17:09, Arthur Grillo a écrit :
>
>
> On 04/03/24 13:51, Arthur Grillo wrote:
> >
> >
> > On 04/03/24 12:48, Louis Chauvet wrote:
> [...]
> >>>
> >>>> Regarding the YUV part, I don't feel confortable adressing Pekka's
> >>>> comments, would you mind doing it?
> >>>
> >>> I'm already doing that, how do you want me to send those changes? I reply to
> >>> your series, like a did before?
> >>
> >> Yes, simply reply to my series, so I can rebase everything on the
> >> line-by-line work.
> >
> > OK, I will do that.
>
> Hi,
>
> I know that I said that, but it would be very difficult to that with my
> b4 workflow. So, I sent a separate series based on the v4:
>
> https://lore.kernel.org/all/[email protected]/
>
> I hope that it does not difficult things for you.

Thanks for this work!

I completely understand, and a "real" patch is even better as I
can fetch it through patchwork. The v5 is (almost, see my comment)
ready, but I want to wait for Pekka's comments/replies on the v4 before
sending it.

Kind regards,
Louis Chauvet

> Best Regards,
> ~Arthur Grillo
>
> >
> > Best Regards,
> > ~Arthur Grillo
> >
> >> Kind regards,
> >> Louis Chauvet
> >>
> >>> Best Regards,
> >>> ~Arthur Grillo
> >>>
> >>>>
> >>>> Kind regards,
> >>>> Louis Chauvet
> >>>>
> >>>> [...]
> >>>>
> >>

--
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2024-03-07 08:42:17

by Pekka Paalanen

Subject: Re: [PATCH v2 3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions

On Wed, 6 Mar 2024 18:29:53 +0100
Louis Chauvet <[email protected]> wrote:

> [...]
>
> > > > > > > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > > > > > > @@ -9,6 +9,17 @@
> > > > > > >
> > > > > > > #include "vkms_formats.h"
> > > > > > >
> > > > > > > +/**
> > > > > > > + * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> > > > > > > + * in the first plane
> > > > > > > + *
> > > > > > > + * @frame_info: Buffer metadata
> > > > > > > + * @x: The x coordinate of the wanted pixel in the buffer
> > > > > > > + * @y: The y coordinate of the wanted pixel in the buffer
> > > > > > > + *
> > > > > > > + * The caller must be aware that this offset is not always a pointer to a pixel. If individual
> > > > > > > + * pixel values are needed, they have to be extracted from the resulting block.
> > > > > >
> > > > > > Just wondering how the caller will be able to extract the right pixel
> > > > > > from the block without re-using the knowledge already used in this
> > > > > > function. I'd also expect the function to round down x,y to be
> > > > > > divisible by block dimensions, but that's not visible in this email.
> > > > > > Then the caller needs the remainder from the round-down, too?
> > > > >
> > > > > You are right, the current implementation is only working when block_h ==
> > > > > block_w == 1. I think I wrote the documentation for PATCHv2 5/9, but when
> > > > > backporting this comment for PATCHv2 3/9 I forgot to update it.
> > > > > The new comment will be:
> > > > >
> > > > > * pixels_offset() - Get the offset of a given pixel data at coordinate
> > > > > * x/y in the first plane
> > > > > [...]
> > > > > * The caller must ensure that the framebuffer associated with this
> > > > > * request uses a pixel format where block_h == block_w == 1.
> > > > > * If this requirement is not fulfilled, the resulting offset can be
> > > > > * completly wrong.
> > > >
> > > > Hi Louis,
> > >
> > > Hi Pekka,
> > >
> > > > if there is no plan for how non-1x1 blocks would work yet, then I think
> > > > the above wording is fine. In my mind, the below wording would
> > > > encourage callers to seek out and try arbitrary tricks to make things
> > > > work for non-1x1 without rewriting the function to actually work.
> > > >
> > > > I believe something would need to change in the function signature to
> > > > make it properly usable for non-1x1 blocks, but I too cannot suggest
> > > > anything off-hand.
> > >
> > > I already made the change to support non-1x1 blocks in Patchv2 5/9
> > > (I will extract this modification in "drm/vkms: Update pixels accessor to
> > > support packed and multi-plane formats"), this function is now able
> > > to extract the pointer to the start of a block. But as stated in the
> > > comment, the caller must manually extract the correct pixel values (if the
> > > format is 2x2, the pointer will point to the first byte of this block, the
> > > caller must do some computation to access the bottom-right value).
> >
> > Patchv2 5/9 is not enough.
> >
> > "Manually extract the correct pixels" is the thing I have a problem
> > with here. The caller should not need to re-do any semantic
> > calculations this function already did. Most likely this function
> > should return the remainders from the x,y coordinate division, so that
> > the caller can extract the right pixels from the block, or something
> > else equivalent.
> >
> > That same semantic division should not be done in two different places.
> > It is too easy for someone later to come and change one site while
> > missing the other.
>
> I did not notice this, and I agree, thanks for this feedback. For the v5 I
> will change it and update the function signature to:
>
> static void packed_pixels_offset(const struct vkms_frame_info *frame_info, int x, int y,
> size_t plane_index, size_t *offset, size_t *rem_x, size_t *rem_y)
>
> where rem_x and rem_y are those reminder.

Ok, that's a start.

Why size_t? It's unsigned. You'll probably be mixing signed and
unsigned variables in computations again.
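
A signed variant of the idea could look like the sketch below. It is only an
illustration under assumptions: `struct plane_layout_sketch` and
`block_offset_sketch()` are hypothetical names, the layout is reduced to a
handful of ints, and x and y are assumed to be non-negative. The point is
that the division and the remainder are computed in one place, with plain
`int` everywhere.

```c
/* Hypothetical, simplified description of one plane of a framebuffer. */
struct plane_layout_sketch {
	int offset;		/* byte offset of the plane in the buffer */
	int block_pitch;	/* bytes between two consecutive rows of blocks */
	int block_w;		/* width of a block, in pixels */
	int block_h;		/* height of a block, in pixels */
	int char_per_block;	/* size of a block, in bytes */
};

/*
 * Compute the byte offset of the block containing the pixel (x, y), and the
 * position of that pixel inside the block (rem_x, rem_y). The caller does
 * not need to redo any division to find the pixel in the block.
 */
static void block_offset_sketch(const struct plane_layout_sketch *p, int x, int y,
				int *offset, int *rem_x, int *rem_y)
{
	int block_x = x / p->block_w;
	int block_y = y / p->block_h;

	*rem_x = x % p->block_w;
	*rem_y = y % p->block_h;
	*offset = p->offset + block_y * p->block_pitch +
		  block_x * p->char_per_block;
}
```

For an R1-like layout (assumed here to pack 8 pixels in one byte) with 16
bytes per line, the pixel (13, 2) lands in the byte at offset 33, at
position 5 inside the block.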

> > I have a hard time finding in "[PATCH v2 6/9] drm/vkms: Add YUV
> > support" how you actually handle blocks bigger than 1x1. I see
> > get_subsampling() which returns format->{hsub,vsub}, and I see
> > get_subsampling_offset() which combined with remainder-division gates U
> > and V plane pixel pointer increments.
> >
> > However, I do not see you ever using
> > drm_format_info_block_{width,height}() anywhere else. That makes me
> > think you have no code to actually handle non-1x1 block formats, which
> > means that you cannot get the function signature of
> > packed_pixels_offset() right in this series either. It would be better
> > to not even pretend the function works for non-1x1 blocks until you
> > have code handling at least one such format.
> >
> > All of the YUV formats that patch 6 adds support for use 1x1 blocks all
> > all their planes.
>
> Yes, none of the supported format have block_h != block_w != 1, so there
> is no need to drm_format_info_block*() helpers.
>
> I wrote the code for DRM_FORMAT_R*. They are packed, with block_w != 1. I
> will add this patch in the next revision. I also wrote the IGT test for
> DRM_FORMAT_R1 [1].

Excellent!

> Everything will be in the v5 (I will send it when you have the
> time to review the v4).

I'm too busy this week, I think. Maybe next.

Why should I review v4 when I already know you will be changing things
again? I'd probably flag the same things I've already said.


Thanks,
pq

> For information, I also have a series ready for adding more RGB variants
> (I introduced a macro to make it easier and avoid copy/pasting the same
> loop). I don't send them yet, because I realy want this series merged
> first. I also have the work for the writeback "line-by-line" algorithm
> ready (I just need to rebase it, but it will be fast).
>
> [1]: https://lore.kernel.org/igt-dev/[email protected]
>
> Kind regards,
> Louis Chauvet
>
> [...]
>



2024-03-05 10:10:26

by Pekka Paalanen

Subject: Re: [PATCH v2 5/9] drm/vkms: Re-introduce line-per-line composition algorithm

On Mon, 4 Mar 2024 16:28:33 +0100
Louis Chauvet <[email protected]> wrote:

> Le 29/02/24 - 12:21, Pekka Paalanen a écrit :
> > On Tue, 27 Feb 2024 16:02:09 +0100
> > Louis Chauvet <[email protected]> wrote:
> >
> > > [...]
> > >
> > > > > -static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> > > > > - struct line_buffer *stage_buffer,
> > > > > - struct line_buffer *output_buffer)
> > > > > +static void pre_mul_alpha_blend(
> > > > > + struct line_buffer *stage_buffer,
> > > > > + struct line_buffer *output_buffer,
> > > > > + int x_start,
> > > > > + int pixel_count)
> > > > > {
> > > > > - int x_dst = frame_info->dst.x1;
> > > > > - struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
> > > > > - struct pixel_argb_u16 *in = stage_buffer->pixels;
> > > > > - int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > > > > - stage_buffer->n_pixels);
> > > > > -
> > > > > - for (int x = 0; x < x_limit; x++) {
> > > > > - out[x].a = (u16)0xffff;
> > > > > - out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> > > > > - out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> > > > > - out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
> > > > > + struct pixel_argb_u16 *out = &output_buffer->pixels[x_start];
> > > > > + struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start];
> > > >
> > > > Input buffers and pointers should be const.
> > >
> > > They will be const in v4.
> > >
> > > > > +
> > > > > + for (int i = 0; i < pixel_count; i++) {
> > > > > + out[i].a = (u16)0xffff;
> > > > > + out[i].r = pre_mul_blend_channel(in[i].r, out[i].r, in[i].a);
> > > > > + out[i].g = pre_mul_blend_channel(in[i].g, out[i].g, in[i].a);
> > > > > + out[i].b = pre_mul_blend_channel(in[i].b, out[i].b, in[i].a);
> > > > > }
> > > > > }
> > > >
> > > > Somehow the hunk above does not feel like it is part of "re-introduce
> > > > line-per-line composition algorithm". This function was already running
> > > > line-by-line. Would it be easy enough to collect this and directly
> > > > related changes into a separate patch?
> > >
> > > It is not directly related to the reintroduction of line-by-line
> > > algorithm, but in the simplification and maintenability effort, I
> > > changed a bit the function to avoid having multiple place computing the
> > > x_start/pixel_count values. I don't see an interrest to extract it, it
> > > will be just a translation of the few lines into the calling place.
> >
> > It does make review more difficult, because it makes the patch bigger
> > and is not explained in the commit message. It is a surprise to a
> > reviewer, who then needs to think what this means and does it belong
> > here.
> >
> > If you explain it in the commit message and note it in the commit
> > summary line, I think it would become fairly obvious that this patch is
> > doing two things rather than one.
> >
> > Therefore, *if* it is easy to extract as a separate patch, then it
> > would be nice to do so. However, if doing so would require you to write
> > a bunch of temporary code that the next patch would just rewrite again,
> > then doing so would be counter-productive.
> >
> > Patch split is about finding a good trade-off to make things easy for
> > reviewers:
> >
> > - Smaller patches are better as long as they are self-standing and
> > understandable in isolation, and of course do not regress anything.
> >
> > - Rewriting the same thing multiple times in the same series is extra
> > work for a reviewer and therefore best avoided.
> >
> > - The simpler the semantic change, the bigger a patch can be and still
> > be easy to review.
> >
> > And all the patch writing rules specific to the kernel project that I
> > don't know about.
>
> I will extract it in "drm/vkms: Avoid computing blending limits inside the
> blend function". It's not very relevant by itself, but it make the main
> patch easier to read.

Thank you.


> > > [...]
> > >
> > > > > +/**
> > > > > + * direction_for_rotation() - Helper to get the correct reading direction for a specific rotation
> > > > > + *
> > > > > + * @rotation: rotation to analyze
> > > >
> > > > This is KMS plane rotation property, right?
> > > >
> > > > So the KMS plane has been rotated by this, and what we want to find is
> > > > the read direction on the attached FB so that reading returns pixels in
> > > > the CRTC line/scanout order, right?
> > > >
> > > > Maybe extend the doc to explain that.
> > >
> > > Is it better?
> > >
> > > * direction_for_rotation() - Get the correct reading direction for a given rotation
> > > *
> > > * This function will use the @rotation parameter to compute the correct reading direction to read
> > > * a line from the source buffer.
> > > * For example, if the buffer is reflected on X axis, the pixel must be read from right to left.
> > > * @rotation: Rotation to analyze. It correspond the the field @frame_info.rotation.
> >
> > I think it is important to define what determines the correct result.
> > In this case, we want the reading to produce pixels in the CRTC scanout
> > line order, I believe. If you don't say "CRTC", the reader does not
> > know what "the correct reading direction" should match to.
>
> Is this a better explanation?
>
> * This function will use the @rotation setting of a source plane to compute the reading
> * direction in this plane which correspond to a left to right writing in the CRTC.
> * For example, if the buffer is reflected on X axis, the pixel must be read from right to left
> * to be written from left to right on the CRTC.

Perfect!


>
> > > > > + */
> > > > > +enum pixel_read_direction direction_for_rotation(unsigned int rotation)
> > > > > +{
> > > > > + if (rotation & DRM_MODE_ROTATE_0) {
> > > > > + if (rotation & DRM_MODE_REFLECT_X)
> > > > > + return READ_LEFT;
> > > > > + else
> > > > > + return READ_RIGHT;
> > > > > + } else if (rotation & DRM_MODE_ROTATE_90) {
> > > > > + if (rotation & DRM_MODE_REFLECT_Y)
> > > > > + return READ_UP;
> > > > > + else
> > > > > + return READ_DOWN;
> > > > > + } else if (rotation & DRM_MODE_ROTATE_180) {
> > > > > + if (rotation & DRM_MODE_REFLECT_X)
> > > > > + return READ_RIGHT;
> > > > > + else
> > > > > + return READ_LEFT;
> > > > > + } else if (rotation & DRM_MODE_ROTATE_270) {
> > > > > + if (rotation & DRM_MODE_REFLECT_Y)
> > > > > + return READ_DOWN;
> > > > > + else
> > > > > + return READ_UP;
> > > > > + }
> > > > > + return READ_RIGHT;
> > > > > +}
> > > > > +
> > > > > /**
> > > > > * blend - blend the pixels from all planes and compute crc
> > > > > * @wb: The writeback frame buffer metadata
> > > > > @@ -183,11 +187,11 @@ static void blend(struct vkms_writeback_job *wb,
> > > > > {
> > > > > struct vkms_plane_state **plane = crtc_state->active_planes;
> > > > > u32 n_active_planes = crtc_state->num_active_planes;
> > > > > - int y_pos;
> > > > >
> > > > > const struct pixel_argb_u16 background_color = { .a = 0xffff };
> > > > >
> > > > > size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> > > > > + size_t crtc_x_limit = crtc_state->base.crtc->mode.hdisplay;
> > > >
> > > > Wonder why these were size_t, causing needs to cast below...
> > >
> > > For crtc_x_limit I just copied the crtc_y_limit. I will change both to u16
> > > (the type of h/vdisplay).
> >
> > Don't go unsigned, that can cause unexpected results when mixed in
> > computations with signed variables.
>
> I will replace them with int.
>
> > Oh, the cast was probably not about size but signedness. Indeed, size_t
> > is unsigned.
> >
> > I don't see a reason to use a 16-bit size either, it just exposes the
> > computations to under/overflows that would then be needed to check for.
> > s32 should be as fast as any, and perhaps enough bits to never
> > under/overflow in these computations, but please verify that.
>
> I just suggested u16 because it's the type of vdisplay/hdisplay. It was
> not for performance reason.

Right. It's not uncommon to store a value in a storage efficient type that
may also disallow illegal values, and then use a different type while
actually computing with it in order to not provoke too obscure C
language rules most people never heard of, to avoid over/underflows, or
to just avoid undefined behaviour.
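
A small userspace sketch of the kind of surprise meant here; the helper
names are hypothetical and the values are arbitrary. It contrasts a
difference stored back in a 16-bit unsigned type with the same difference
kept in an int, and a limit check done in size_t with the same check done
in int.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The difference is computed as int, then wraps when stored as uint16_t. */
static uint16_t diff_u16(uint16_t a, uint16_t b)
{
	return (uint16_t)(a - b);
}

/* Same computation kept in a signed type: negative results survive. */
static int diff_int(uint16_t a, uint16_t b)
{
	return (int)a - (int)b;
}

/*
 * What "x < limit" does implicitly when limit is a size_t: x is converted
 * to size_t first, so a negative x becomes a huge positive value.
 */
static bool below_limit_unsigned(int x, size_t limit)
{
	return (size_t)x < limit;
}

/* With both operands signed, the comparison behaves as expected. */
static bool below_limit_signed(int x, int limit)
{
	return x < limit;
}
```

With a = 100 and b = 200, diff_u16() gives 65436 while diff_int() gives
-100; with x = -1, the unsigned check reports that x is not below 1080
while the signed check reports that it is.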

..

> > > > > +static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction direction,
> > > > > + int plane_index)
> > > > > {
> > > > > - int x_src = frame_info->src.x1 >> 16;
> > > > > - int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
> > > > > -
> > > > > - return packed_pixels_addr(frame_info, x_src, y_src);
> > > > > + switch (direction) {
> > > > > + default:
> > > > > + DRM_ERROR("Invalid direction for pixel reading: %d\n", direction);
> > > > > + return 0;
> > > >
> > > > What I'd do here is move the default: section outside of the switch
> > > > completely. Then the compiler can warn if any enum value is not handled
> > > > here. Since every case in the switch is a return statement, falling out
> > > > of the switch block is the default case.
> > >
> > > Oh, I did not know that gcc can warn when using enums, I will definitely
> > > do it for the v4.
> > >
> > > > Maybe the enum variable containing an illegal value could be handled
> > > > more harshly so that callers could rely on this function always
> > > > returning a good value?
> > > >
> > > > Just like passing in fb=NULL is handled by the kernel as an OOPS.
> > >
> > > I don't think it's a good idea to OOPS inside a driver.
> >
> > Everyone already does that. Most functions that do not expect to be called
> > with NULL never check the arguments for NULL. They just OOPS on
> > dereference if someone passes in NULL. And for a good reason: adding
> > all those checks is both code churn and it casts doubt: "maybe it is
> > legal and expected to call this function with NULL sometimes, what good
> > does that do?".
>
> I agree that adding something like
>
> if (!direction_is_valid) pr_err("Invalid direction")
>
> is useless, but as I already have the switch, it costs nothing to warn if
> something has gone wrong. I will just replace this simple DRM_ERROR with a
> WARN_ONCE to be more verbose about "it is a bug".

Sounds good to me, and I hope kernel maintainers would agree.
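
For reference, a rough userspace sketch of the shape discussed above,
with the fallback moved out of the switch. The enum ordering, the
get_step() name and the plain int parameters are assumptions made for
illustration. fprintf() stands in for the kernel's WARN_ONCE(), and
assert() is only a userspace precondition check.

```c
#include <assert.h>
#include <stdio.h>

/* Enumerator order is an assumption, only the names come from the patch. */
enum pixel_read_direction {
	READ_LEFT,
	READ_RIGHT,
	READ_UP,
	READ_DOWN,
};

/*
 * Byte offset from one pixel to the next in the given direction.
 * There is no default: label, so GCC and Clang (-Wswitch, enabled by
 * -Wall) can warn if a new enumerator is added and not handled here.
 */
static int get_step(enum pixel_read_direction direction,
		    int char_per_block, int pitch)
{
	assert(char_per_block > 0 && pitch > 0);

	switch (direction) {
	case READ_RIGHT:
		return char_per_block;
	case READ_LEFT:
		return -char_per_block;
	case READ_DOWN:
		return pitch;
	case READ_UP:
		return -pitch;
	}

	/* Only reached with an out-of-range value, i.e. a bug in the caller. */
	fprintf(stderr, "Invalid direction for pixel reading: %d\n",
		(int)direction);
	return 0;
}
```

Every case returns, so falling out of the switch is the "cannot happen"
path, and the compiler no longer needs a default: to be satisfied.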


> > > An error here is
> > > maybe dangerous, but is not fatal to the kernel. Maybe you know how to do
> > > a "local" OOPS to break only this driver and not the whole kernel?
> >
> > I don't know what the best practices are in the kernel.
> >
> > > For the v4 I will keep a DRM_ERROR and return 0.
> >
> > Does that require the caller to check for 0? Could the 0 cause
> > something else to end up in an endless loop? If it does return 0, how
> > should a caller handle this case that "cannot" ever happen? Why have
> > code for something that cannot happen?
>
> I have to return something, otherwise the compiler will complain about it.
>
> To avoid surprising future developers, I added this information in the
> comment. This way the user doesn't have to read the code to understand how
> much he can rely on this value.
>
> If the caller can trust his direction, he doesn't have to worry about this.
> If he can't trust his direction, he knows that the returned value can be
> zero, and thus can't be used for a loop variant.

There should not be "untrusted" values to begin with at this point,
anything that comes from outside of the kernel should have already been
sanitised. This is about kernel bugs though. Bugs cannot be predicted,
nor can anyone guarantee to write bug-free code. Hence, the direction
value is always "somewhat untrusted". We're being paranoid about bugs
that might happen and trying to ensure the kernel can limp along
regardless, while also trying to minimise the amount of code that
"cannot" ever be reached.

> The zero is also nice because it does not interfere with the normal
> behavior of this function. If the returned value is not zero, it's the
> correct step to use from one pixel to another.

If you expect the caller needing to check for the "cannot happen" case,
returning a unique error value is fine. If you expect the caller to
never need to think of the "cannot happen" case, you should return a
value that is "safe", if such value exists. "Safe" here means using it
will not result in grave bugs like bad memory access, but it also won't
produce expected results unless by accident.
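
A small sketch of what I mean by "safe", under the assumption that the
loop is bounded by the number of outputs and not by the source pointer.
copy_strided() is a hypothetical helper, not something from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy `count` bytes from src to dst, advancing src by `step` bytes after
 * each one. The loop is bounded by `count` alone, so a bogus step of 0
 * cannot make it spin forever or walk outside the source buffer: it just
 * repeats the first byte, which is wrong output but not a memory bug.
 */
static size_t copy_strided(const uint8_t *src, int step, size_t count,
			   uint8_t *dst)
{
	size_t i;

	assert(src && dst);
	for (i = 0; i < count; i++) {
		dst[i] = *src;
		src += step;
	}
	return i;
}
```

A loop written as "advance src until it reaches some end address" would
behave very differently with a step of 0, which is why the choice of the
fallback value and the shape of the caller's loop go together.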

This is getting perhaps a bit too philosophical, so don't mind about this
too much if it feels strange.

> > Of course it's a trade-off between correctness and limping along
> > injured, and the kernel tends to strongly lean toward the latter for the
> > obvious reasons.
> >
> > > > > + case READ_RIGHT:
> > > > > + return fb->format->char_per_block[plane_index];
> > > > > + case READ_LEFT:
> > > > > + return -fb->format->char_per_block[plane_index];
> > > > > + case READ_DOWN:
> > > > > + return (int)fb->pitches[plane_index];
> > > > > + case READ_UP:
> > > > > + return -(int)fb->pitches[plane_index];
> > > > > + }
> > > > > }
> > > > >
> > > > > -static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
> > > > > -{
> > > > > - if (frame_info->rotation & (DRM_MODE_REFLECT_X | DRM_MODE_ROTATE_270))
> > > > > - return limit - x - 1;
> > > > > - return x;
> > > > > -}
> > > > >
> > > > > /*
> > > > > - * The following functions take pixel data from the buffer and convert them to the format
> > > > > + * The following functions take pixel data (a, r, g, b, pixel, ..), convert them to the format
> > > > > * ARGB16161616 in out_pixel.
> > > > > *
> > > > > - * They are used in the `vkms_compose_row` function to handle multiple formats.
> > > > > + * They are used in the `read_line`s functions to avoid duplicate work for some pixel formats.
> > > > > */
> > > > >
> > > > > -static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
> > > > > +static void ARGB8888_to_argb_u16(struct pixel_argb_u16 *out_pixel, int a, int r, int g, int b)
> > > >
> > > > The function name ARGB8888_to_argb_u16() is confusing. It's not taking
> > > > in ARGB8888 pixels but separate a,r,g,b ints. The only assumption it
> > > > needs from the pixel format is the 8888 part.
> > >
> > > I don't really know how to name it. What I like with ARGB8888 is that it's
> > > clear that the values are 8 bits and in argb format.
> >
> > I could even propose
> >
> > static struct pixel_argb_u16
> > argb_u16_from_u8888(int a, int r, int g, int b)
> >
> > perhaps. Yes, returning a struct by value. I think it would fit, and
> > these are supposed to get fully inlined anyway, too.
> >
> > cf. argb_u16_from_u2101010().
>
> I can't find this function, but I get and like the idea, I will change the
> callback to this in the v4.

I mean, there is no support for 10-bpc formats in VKMS yet IIRC, but
there should be one day, so thinking about how that would fit in the
naming scheme is nice.
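
To make the proposal concrete, a minimal userspace sketch of the
by-value variant. The field order of struct pixel_argb_u16 and the
multiplication by 257 are my assumptions about how 8-bit channels get
expanded to 16 bits, so please check them against the real code.

```c
#include <stdint.h>

/* Assumed layout, four 16-bit channels. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/*
 * Build an ARGB16161616 pixel from four 8-bit channel values given as
 * ints. The name only promises "8 bits per channel", not any particular
 * memory layout of the source pixel.
 *
 * 0xff * 257 == 0xffff, so the full 8-bit range maps onto the full
 * 16-bit range.
 */
static inline struct pixel_argb_u16
argb_u16_from_u8888(int a, int r, int g, int b)
{
	struct pixel_argb_u16 out;

	out.a = (uint16_t)(a * 257);
	out.r = (uint16_t)(r * 257);
	out.g = (uint16_t)(g * 257);
	out.b = (uint16_t)(b * 257);
	return out;
}
```

A 10-bpc sibling would follow the same pattern with a different
expansion for the 10-bit channels and the 2-bit alpha.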

> > Not a big deal though, I think I'm getting a little bit too involved to
> > see what would be the most intuitively understandable naming scheme for
> > someone not familiar with the code.
> >
> > > Do you think that `argb_u8_to_argb_u16`, with a new structure
> > > pixel_argb_u8 will be better? (like PATCH 6/9 with pixel_yuv_u8).
> > >
> > > If so, I will introduce the argb_u8 structure in another commit.
> >
> > How would you handle 10-bpc formats? Is there a need for
> > proliferation of bit-depth-specific struct types?
>
> No, I don't think it's good to multiply things. I will patch Arthur's
> patches to avoid the pixel_yuv_u8 structure.
>
> > > [...]
> > >
> > > > > + * The following functions are read_line function for each pixel format supported by VKMS.
> > > > > *
> > > > > - * This function composes a single row of a plane. It gets the source pixels
> > > > > - * through the y coordinate (see get_packed_src_addr()) and goes linearly
> > > > > - * through the source pixel, reading the pixels and converting it to
> > > > > - * ARGB16161616 (see the pixel_read() callback). For rotate-90 and rotate-270,
> > > > > - * the source pixels are not traversed linearly. The source pixels are queried
> > > > > - * on each iteration in order to traverse the pixels vertically.
> > > > > + * They read a line starting at the point @x_start,@y_start following the @direction. The result
> > > > > + * is stored in @out_pixel and in the format ARGB16161616.
> > > > > + *
> > > > > + * Those function are very similar, but it is required for performance reason. In the past, some
> > > > > + * experiment were done, and with a generic loop the performance are very reduced [1].
> > > > > + *
> > > > > + * [1]: https://lore.kernel.org/dri-devel/[email protected]/
> > > > > */
> > > > > -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y)
> > > > > +
> > > > > +static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> > > > > + enum pixel_read_direction direction, int count,
> > > > > + struct pixel_argb_u16 out_pixel[])
> > > > > +{
> > > > > + u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> > > > > +
> > > > > + int step = get_step_1x1(frame_info->fb, direction, 0);
> > > > > +
> > > > > + while (count) {
> > > > > + u8 *px = (u8 *)src_pixels;
> > > > > +
> > > > > + ARGB8888_to_argb_u16(out_pixel, px[3], px[2], px[1], px[0]);
> > > > > + out_pixel += 1;
> > > > > + src_pixels += step;
> > > > > + count--;
> > > >
> > > > btw. you could eliminate decrementing 'count' if you computed end
> > > > address and used while (out_pixel < end).
> > >
> > > Yes, you are right, but after thinking about it, neither
> > > while (out_pixel < end) nor while (count) conveys "this loop will copy
> > > `count` pixels". I think a for-loop here is more understandable. There is
> > > no ambiguity in the number of pixels written and it is less error-prone.
> > > I will replace
> > > while (count)
> > > by
> > > for(int i = 0; i < count; i++)
> >
> > I agree that a for-loop is the most obvious way of saying it, but I
> > also think while (out_pixel < end) is very close too, and so is while (count).
> > None of those would make me think twice.
> >
> > However, I'm thinking of performance here. After all, this is the
> > hottest code path there is in VKMS. Is the compiler smart enough to
> > eliminate count-- or i to reduce the number of CPU cycles?
>
> You are probably right, I will change it to out_pixel < end.

Don't trust my word without benchmarking it. ;-)
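
For the benchmark, the variant I had in mind looks roughly like the
sketch below. It is a userspace approximation: the signature is
simplified (raw source pointer and step passed in directly), the struct
layout and the multiplication by 257 are assumptions, and the lower-case
function name is mine.

```c
#include <stdint.h>

/* Assumed layout, four 16-bit channels. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/*
 * Read `count` ARGB8888 pixels starting at src_pixels, moving by `step`
 * bytes between pixels, and store them as ARGB16161616 in out_pixel.
 * The loop is bounded by the end of the output, so no separate counter
 * is decremented in the body.
 */
static void argb8888_read_line(const uint8_t *src_pixels, int step, int count,
			       struct pixel_argb_u16 out_pixel[])
{
	struct pixel_argb_u16 *end = out_pixel + count;

	while (out_pixel < end) {
		/* Little-endian ARGB8888 in memory: B, G, R, A. */
		out_pixel->a = (uint16_t)(src_pixels[3] * 257);
		out_pixel->r = (uint16_t)(src_pixels[2] * 257);
		out_pixel->g = (uint16_t)(src_pixels[1] * 257);
		out_pixel->b = (uint16_t)(src_pixels[0] * 257);
		out_pixel += 1;
		src_pixels += step;
	}
}
```

Whether this is measurably faster than the for-loop is exactly what the
benchmark should tell; the compiler may well generate the same code for
both.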


Thanks,
pq

