Date: Tue, 27 Feb 2024 16:02:09 +0100
From: Louis Chauvet
To: Arthur Grillo
Cc: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
 Daniel Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
 David Airlie, Jonathan Corbet, pekka.paalanen@haloniitty.fi,
 dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
 jeremie.dautheribes@bootlin.com, miquel.raynal@bootlin.com,
 thomas.petazzoni@bootlin.com, seanpaul@google.com, marcheu@google.com,
 nicolejadeyee@google.com
Subject: Re: [PATCH v3 5/9] drm/vkms: Re-introduce line-per-line composition algorithm
References: <20240226-yuv-v3-0-ff662f0994db@bootlin.com>
 <20240226-yuv-v3-5-ff662f0994db@bootlin.com>
 <3d34fda7-1a95-472d-b059-eec7cb280c35@riseup.net>
In-Reply-To: <3d34fda7-1a95-472d-b059-eec7cb280c35@riseup.net>

On 26/02/24 - 11:14, Arthur Grillo wrote:
>
>
> On 26/02/24 05:46, Louis Chauvet wrote:
> > Re-introduce a line-by-line composition algorithm for each pixel format.
> > This allows more performance by not requiring an indirection per pixel
> > read. This patch is focused on readability of the code.
> >
> > Line-by-line composition was introduced by [1] but rewritten back to
> > pixel-by-pixel algorithm in [2]. At this time, nobody noticed the impact
> > on performance, and it was merged.
> >
> > This patch is almost a revert of [2], but in addition efforts have been
> > made to increase readability and maintainability of the rotation handling.
> > The blend function is now divided in two parts:
> > - Transformation of coordinates from the output referential to the source
> >   referential
> > - Line conversion and blending
> >
> > Most of the complexity of the rotation management is avoided by using
> > drm_rect_* helpers. The remaining complexity is around the clipping, to
> > avoid reading/writing outside source/destination buffers.
> >
> > The pixel conversion is now done line-by-line, so the read_pixel_t was
> > replaced with read_pixel_line_t callback. This way the indirection is only
> > required once per line and per plane, instead of once per pixel and per
> > plane.
> >
> > The read_line_t callbacks are very similar for most pixel format, but it
> > is required to avoid performance impact. Some helpers were created to
> > avoid code repetition:
> > - get_step_1x1: get the step in byte to reach next pixel block in a
> >   certain direction
> > - *_to_argb_u16: helpers to perform colors conversion. They should be
> >   inlined by the compiler, and they are used to avoid repetition between
> >   multiple variants of the same format (argb/xrgb and maybe in the
> >   future for formats like bgr formats).
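
To make the "one indirection per line" idea and the two helpers above
concrete, here is a rough, standalone sketch. It is not the code from this
patch: every name in it (step_1x1, xrgb8888_to_argb16, read_line_fn,
xrgb8888_read_line, struct argb16, enum read_dir) is hypothetical, and it
assumes a little-endian XRGB8888 buffer with one pixel per block. The
function pointer is resolved once per line; the per-pixel work inside the
loop is a plain inlinable call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; these are not the vkms definitions. */
struct argb16 { uint16_t a, r, g, b; };
enum read_dir { DIR_RIGHT, DIR_LEFT, DIR_DOWN, DIR_UP };

/* Byte offset to the next pixel in a given direction (1x1 pixel blocks). */
static ptrdiff_t step_1x1(enum read_dir dir, size_t pitch, size_t cpp)
{
	switch (dir) {
	case DIR_RIGHT: return (ptrdiff_t)cpp;
	case DIR_LEFT:  return -(ptrdiff_t)cpp;
	case DIR_DOWN:  return (ptrdiff_t)pitch;
	case DIR_UP:    return -(ptrdiff_t)pitch;
	}
	return 0;
}

/*
 * Assumed layout: little-endian XRGB8888, so bytes are B, G, R, X.
 * Multiplying by 257 widens 8 bits to 16 bits (0xff becomes 0xffff).
 */
static inline struct argb16 xrgb8888_to_argb16(const uint8_t *px)
{
	struct argb16 out = {
		.a = 0xffff,
		.r = (uint16_t)(px[2] * 257),
		.g = (uint16_t)(px[1] * 257),
		.b = (uint16_t)(px[0] * 257),
	};
	return out;
}

/* One indirect call through this pointer converts a whole line. */
typedef void (*read_line_fn)(const uint8_t *src, enum read_dir dir,
			     size_t pitch, size_t count, struct argb16 *out);

static void xrgb8888_read_line(const uint8_t *src, enum read_dir dir,
			       size_t pitch, size_t count, struct argb16 *out)
{
	ptrdiff_t step = step_1x1(dir, pitch, 4);

	assert(src && out);
	for (size_t i = 0; i < count; i++)
		out[i] = xrgb8888_to_argb16(src + (ptrdiff_t)i * step);
}
```

A caller would pick the read_line_fn once per plane from the pixel format,
then call it once per output line with the starting pixel and direction
implied by the rotation. A pixel-by-pixel design pays the same indirect
call for every pixel instead.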
> >
> > This new algorithm was tested with:
> > - kms_plane (for color conversions)
> > - kms_rotation_crc (for rotations of planes)
> > - kms_cursor_crc (for translations of planes)
> > The performance gain was mesured with:
> > - kms_fb_stress
> >
> > [1]: commit 8ba1648567e2 ("drm: vkms: Refactor the plane composer to accept
> >      new formats")
> >      https://lore.kernel.org/all/20220905190811.25024-7-igormtorrente@gmail.com/
> > [2]: commit 322d716a3e8a ("drm/vkms: isolate pixel conversion
> >      functionality")
> >      https://lore.kernel.org/all/20230418130525.128733-2-mcanal@igalia.com/
> >
> > Signed-off-by: Louis Chauvet
> > ---
> >  drivers/gpu/drm/vkms/vkms_composer.c | 219 +++++++++++++++++++++++-------
> >  drivers/gpu/drm/vkms/vkms_drv.h      |  24 +++-
> >  drivers/gpu/drm/vkms/vkms_formats.c  | 253 ++++++++++++++++++++++-------------
> >  drivers/gpu/drm/vkms/vkms_formats.h  |   2 +-
> >  drivers/gpu/drm/vkms/vkms_plane.c    |   8 +-
> >  5 files changed, 349 insertions(+), 157 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> > index 5b341222d239..e555bf9c1aee 100644
> > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> > @@ -24,9 +24,10 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
> >
> >  /**
> >   * pre_mul_alpha_blend - alpha blending equation
> > - * @frame_info: Source framebuffer's metadata
> >   * @stage_buffer: The line with the pixels from src_plane
> >   * @output_buffer: A line buffer that receives all the blends output
> > + * @x_start: The start offset to avoid useless copy
> > + * @count: The number of byte to copy
> >   *
> >   * Using the information from the `frame_info`, this blends only the
> >   * necessary pixels from the `stage_buffer` to the `output_buffer`
> > @@ -37,51 +38,23 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
> >   * drm_plane_create_blend_mode_property(). Also, this formula assumes a
> >   * completely opaque background.
> >   */
> > -static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> > -				struct line_buffer *stage_buffer,
> > -				struct line_buffer *output_buffer)
> > +static void pre_mul_alpha_blend(
> > +	struct line_buffer *stage_buffer,
> > +	struct line_buffer *output_buffer,
> > +	int x_start,
> > +	int pixel_count)
> >  {
> > -	int x_dst = frame_info->dst.x1;
> > -	struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
> > -	struct pixel_argb_u16 *in = stage_buffer->pixels;
> > -	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > -			    stage_buffer->n_pixels);
> > -
> > -	for (int x = 0; x < x_limit; x++) {
> > -		out[x].a = (u16)0xffff;
> > -		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> > -		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> > -		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
> > +	struct pixel_argb_u16 *out = &output_buffer->pixels[x_start];
> > +	struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start];
> > +
> > +	for (int i = 0; i < pixel_count; i++) {
> > +		out[i].a = (u16)0xffff;
> > +		out[i].r = pre_mul_blend_channel(in[i].r, out[i].r, in[i].a);
> > +		out[i].g = pre_mul_blend_channel(in[i].g, out[i].g, in[i].a);
> > +		out[i].b = pre_mul_blend_channel(in[i].b, out[i].b, in[i].a);
> >  	}
> >  }
> >
> > -static int get_y_pos(struct vkms_frame_info *frame_info, int y)
> > -{
> > -	if (frame_info->rotation & DRM_MODE_REFLECT_Y)
> > -		return drm_rect_height(&frame_info->rotated) - y - 1;
> > -
> > -	switch (frame_info->rotation & DRM_MODE_ROTATE_MASK) {
> > -	case DRM_MODE_ROTATE_90:
> > -		return frame_info->rotated.x2 - y - 1;
> > -	case DRM_MODE_ROTATE_270:
> > -		return y + frame_info->rotated.x1;
> > -	default:
> > -		return y;
> > -	}
> > -}
> > -
> > -static bool check_limit(struct vkms_frame_info *frame_info, int pos)
> > -{
> > -	if (drm_rotation_90_or_270(frame_info->rotation)) {
> > -		if (pos >= 0 && pos < drm_rect_width(&frame_info->rotated))
> > -			return true;
> > -	} else {
> > -		if (pos >= frame_info->rotated.y1 && pos < frame_info->rotated.y2)
> > -			return true;
> > -	}
> > -
> > -	return false;
> > -}
> >
> >  static void fill_background(const struct pixel_argb_u16 *background_color,
> >  			    struct line_buffer *output_buffer)
> > @@ -163,6 +136,37 @@ static void apply_lut(const struct vkms_crtc_state *crtc_state, struct line_buff
> >  	}
> >  }
> >
> > +/**
> > + * direction_for_rotation() - Helper to get the correct reading direction for a specific rotation
> > + *
> > + * @rotation: rotation to analyze
> > + */
> > +enum pixel_read_direction direction_for_rotation(unsigned int rotation)
> > +{
> > +	if (rotation & DRM_MODE_ROTATE_0) {
> > +		if (rotation & DRM_MODE_REFLECT_X)
> > +			return READ_LEFT;
> > +		else
> > +			return READ_RIGHT;
> > +	} else if (rotation & DRM_MODE_ROTATE_90) {
> > +		if (rotation & DRM_MODE_REFLECT_Y)
> > +			return READ_UP;
> > +		else
> > +			return READ_DOWN;
> > +	} else if (rotation & DRM_MODE_ROTATE_180) {
> > +		if (rotation & DRM_MODE_REFLECT_X)
> > +			return READ_RIGHT;
> > +		else
> > +			return READ_LEFT;
> > +	} else if (rotation & DRM_MODE_ROTATE_270) {
> > +		if (rotation & DRM_MODE_REFLECT_Y)
> > +			return READ_DOWN;
> > +		else
> > +			return READ_UP;
> > +	}
> > +	return READ_RIGHT;
> > +}
> > +
> >  /**
> >   * blend - blend the pixels from all planes and compute crc
> >   * @wb: The writeback frame buffer metadata
> > @@ -183,11 +187,11 @@ static void blend(struct vkms_writeback_job *wb,
> >  {
> >  	struct vkms_plane_state **plane = crtc_state->active_planes;
> >  	u32 n_active_planes = crtc_state->num_active_planes;
> > -	int y_pos;
> >
> >  	const struct pixel_argb_u16 background_color = { .a = 0xffff };
> >
> >  	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> > +	size_t crtc_x_limit = crtc_state->base.crtc->mode.hdisplay;
> >
> >  	/*
> >  	 * The planes are composed line-by-line. It is a necessary complexity to avoid poor
> > @@ -198,22 +202,133 @@ static void blend(struct vkms_writeback_job *wb,
> >
> >  		/* The active planes are composed associatively in z-order. */
> >  		for (size_t i = 0; i < n_active_planes; i++) {
> > -			y_pos = get_y_pos(plane[i]->frame_info, y);
> > +			struct vkms_plane_state *current_plane = plane[i];
> >
> > -			if (!check_limit(plane[i]->frame_info, y_pos))
> > +			/* Avoid rendering useless lines */
> > +			if (y < current_plane->frame_info->dst.y1 ||
> > +			    y >= current_plane->frame_info->dst.y2) {
> >  				continue;
> > -
> > -			vkms_compose_row(stage_buffer, plane[i], y_pos);
> > -			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
> > -					    output_buffer);
> > +			}
> > +
> > +			/*
> > +			 * src_px is the line to copy. The initial coordinates are inside the
>
> So maybe is better to rename to src_line?

Good idea!

Kind regards,
Louis Chauvet

[...]

--
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com