Received: by 2002:a05:7412:bbc7:b0:fc:a2b0:25d7 with SMTP id kh7csp929406rdb; Fri, 2 Feb 2024 08:11:25 -0800 (PST) X-Google-Smtp-Source: AGHT+IEku2XiV64vSDKRRj1OKWhmXGWN/7COVY7srn9tOqTmkTZ1CguMI5AexmbGtqIyTcuOdTmC X-Received: by 2002:a17:90a:eac4:b0:296:207b:c917 with SMTP id ev4-20020a17090aeac400b00296207bc917mr4658086pjb.23.1706890285440; Fri, 02 Feb 2024 08:11:25 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706890285; cv=pass; d=google.com; s=arc-20160816; b=eqYcbt8LGjUvyhVo0N4UNDKcAcmS0V7F4hZO80hHv6u0o/kZn42LOmpn6c6R1qd7eQ NyRItFf9QFjfA1UhdG/a2g85JxTTCN630dSnF+PHs9sYWOVebIQFSY5Wt/Skv7AOvQiT 5T7gw5EobaX0allcALxUXMm9y9vbJq0fgDQnafaXSBUvpvhguvDro9t3phaGh4tRxPw/ 3P1CbMC038PrPPw1zbXT2Q5O8CDwdIRCzy/uKgm64N7ScRvyr/U+3pDAgTdNFg7GRRxm 3Y3Jx+Ez3IlTn6Y/CQgsJZEIsnWEL+g9uKy+StTuo85VhEmOxhn1bpraW+d2GhdSK3Mb pLyg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:organization:references :in-reply-to:message-id:subject:cc:to:from:date:dkim-signature; bh=Weks7RHR+sDFkVJXBmvPTUMZqebWWfxJm3w8ueDMk3U=; fh=qzGZD6k+ooUqqZrdAmQ6GpuKIxMHFWoQzIgqkI9gtFg=; b=uqQ1YLHBJhiw5MXiB69iGBsQF3YscrCkDX4foUmSbUiu5wAKa1a8SoOjIbmN69FsWb 1NRJqZM3Yo7pgkPST2o/uJ+ircDxqse3pFM5bjIJkrp6b9gsfQbNIS776O5EcES6KPYN VPfqV1pxFDvwNKZWVZko8FvYe/og6/YOJuiU7NErAwH8psjqBzL57OYdzn2D/jewxCil k6ZsYYSNXU8QFHie0+NwZMDgxhJeoKakzAYnlakdjiDLoPlSwJYaqxYwKQyYMcHFbNkS +Hp3XZhq9UG4YQ1/qMhAkkG8lXqXoR/yqPxH6GxBNUsQWZIr8UtsJG3I2BBeb/OI5Vyg bu7Q==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@bootlin.com header.s=gm1 header.b=WsZhLkJX; arc=pass (i=1 spf=pass spfdomain=bootlin.com dkim=pass dkdomain=bootlin.com dmarc=pass fromdomain=bootlin.com); spf=pass (google.com: domain of linux-kernel+bounces-50101-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-50101-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=bootlin.com X-Forwarded-Encrypted: i=1; AJvYcCU1oEZwW5hFNUu6q+Y85UISohMU2kTEutYvK7Gyt6caT4GkoddcjpNc6ZhJTzAZ8sHstym57ShcbZ4lihVT7hNx69ZrWijnthFaCBDMvA== Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id e89-20020a17090a6fe200b0029625680676si136093pjk.175.2024.02.02.08.11.25 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 02 Feb 2024 08:11:25 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-50101-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@bootlin.com header.s=gm1 header.b=WsZhLkJX; arc=pass (i=1 spf=pass spfdomain=bootlin.com dkim=pass dkdomain=bootlin.com dmarc=pass fromdomain=bootlin.com); spf=pass (google.com: domain of linux-kernel+bounces-50101-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-50101-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=bootlin.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 62EB6283009 for ; Fri, 2 Feb 2024 16:10:59 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 9EBC8148FE7; Fri, 2 Feb 2024 16:07:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bootlin.com header.i=@bootlin.com header.b="WsZhLkJX" Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6E9391487F9 for ; Fri, 2 Feb 2024 16:07:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.70.183.197 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706890064; cv=none; b=D8ENW/y7obaa/iKIr/E0IGL8aFiTXhmjShk4L8QmMEe0+uNMiNHhoQEP9sHGukc1RrN62LiWcXEMIT41mPDjhxfDsNJFqhi5hYpmGuxKGp7MQY3rFizb9j0ByYxKSXlokkKPmNmyAAis3Mpol6/rSpodYh1k5O35NzxwIN+EHf4= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706890064; c=relaxed/simple; bh=A+3EyYofSc6kXr8hjf253jMxqdG64P1Y+Y7oC43yiIk=; h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=OzFOKidkQTUWQLQn6fLstsjiNhcgRwi++aEshuE1ZY6pL4hGArAs1MV5jSiFaKrnyts3iXogZZCFviN8rzXW3vbPSUMtd///4k73tvNrFSfrJkjvMNeamItcAyQAAHjlk7ktLZbcQpswnv1poO4bQhiKU4eFnkwFgeF3JQtkrOM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=bootlin.com; spf=pass smtp.mailfrom=bootlin.com; dkim=pass (2048-bit key) header.d=bootlin.com header.i=@bootlin.com header.b=WsZhLkJX; arc=none smtp.client-ip=217.70.183.197 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=bootlin.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bootlin.com Received: by mail.gandi.net (Postfix) with ESMTPSA id E9E481C0007; Fri, 2 Feb 2024 16:07:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bootlin.com; s=gm1; t=1706890059; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Weks7RHR+sDFkVJXBmvPTUMZqebWWfxJm3w8ueDMk3U=; b=WsZhLkJX3vM9eQMZ1GukgFiUom/09EVzNiqk3KL+QzdjdrLrXimwRw/s37xRgMYJYX1gI9 dZMX3zOblbYq0GKGxg2OB8G2is7xlDwbQ0bcxxu4vXh7ub/eCeprtxpZHeUFFWG5AYNckW ZiYaCUmKxdauSz3v7JTsnBUxOcTzq6wPwIV+D1Lp6EVXOUmILgu3COUzA5VWMtzPAT8tj2 TGq8vilM9QZpTDpf35HGRcj8r1XGyzf/uxyE7lSawtYw+xVK0Yz3DcrTkTmS7ghVDEmz5g 2YJxzDZ0hsjhhF/cKg3wYy2OTbEvnhOBt1086/TGASZCS3uFaqFd/boCscFJUw== Date: Fri, 2 Feb 2024 17:07:34 +0100 From: Miquel Raynal To: Pekka Paalanen Cc: Maxime Ripard , Louis Chauvet , Rodrigo Siqueira , Melissa Wen , =?UTF-8?B?TWHDrXJh?= Canal , Haneen Mohammed , Daniel Vetter , Maarten Lankhorst , Thomas Zimmermann , David Airlie , marcheu@google.com, seanpaul@google.com, nicolejadeyee@google.com, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, thomas.petazzoni@bootlin.com, Arthur Grillo Subject: Re: [PATCH 2/2] drm/vkms: Use a simpler composition function Message-ID: <20240202170734.3176dfe4@xps-13> In-Reply-To: <20240202174913.789a9db9@eldfell> References: <20240201-yuv-v1-0-3ca376f27632@bootlin.com> <20240201-yuv-v1-2-3ca376f27632@bootlin.com> <20240202105522.43128e19@eldfell> <20240202102601.70b6d49c@xps-13> <3nofkwzgnf4yva2wfogdbii47ohpi2wm5vp6aijtg3emxyoowt@twyreqz7ai3g> <20240202131322.5471e184@xps-13> <20240202174913.789a9db9@eldfell> Organization: Bootlin X-Mailer: Claws Mail 4.0.0 (GTK+ 3.24.33; x86_64-pc-linux-gnu) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-GND-Sasl: miquel.raynal@bootlin.com Hi Pekka, pekka.paalanen@haloniitty.fi wrote on Fri, 2 Feb 2024 17:49:13 +0200: > On Fri, 2 Feb 2024 13:13:22 +0100 > Miquel Raynal wrote: >=20 > > Hello Maxime, > >=20 > > + Arthur > >=20 > > mripard@kernel.org wrote on Fri, 2 Feb 2024 10:53:37 +0100: > > =20 > > > Hi Miquel, > > >=20 > > > On Fri, Feb 02, 2024 at 10:26:01AM +0100, Miquel Raynal wrote: =20 > > > > pekka.paalanen@haloniitty.fi wrote on Fri, 2 Feb 2024 10:55:22 +020= 0: > > > > =20 > > > > > On Thu, 01 Feb 2024 18:31:32 +0100 > > > > > Louis Chauvet wrote: > > > > > =20 > > > > > > Change the composition algorithm to iterate over pixels instead= of lines. > > > > > > It allows a simpler management of rotation and pixel access for= complex formats. > > > > > >=20 > > > > > > This new algorithm allows read_pixel function to have access to= x/y > > > > > > coordinates and make it possible to read the correct thing in a= block > > > > > > when block_w and block_h are not 1. > > > > > > The iteration pixel-by-pixel in the same method also allows a s= impler > > > > > > management of rotation with drm_rect_* helpers. This way it's n= ot needed > > > > > > anymore to have misterious switch-case distributed in multiple = places. =20 > > > > >=20 > > > > > Hi, > > > > >=20 > > > > > there was a very good reason to write this code using lines: > > > > > performance. Before lines, it was indeed operating on individual = pixels. > > > > >=20 > > > > > Please, include performance measurements before and after this se= ries > > > > > to quantify the impact on the previously already supported pixel > > > > > formats, particularly the 32-bit-per-pixel RGB variants. > > > > >=20 > > > > > VKMS will be used more and more in CI for userspace projects, and > > > > > performance actually matters there. > > > > >=20 > > > > > I'm worrying that this performance degradation here is significan= t. I > > > > > believe it is possible to keep blending with lines, if you add ne= w line > > > > > getters for reading from rotated, sub-sampled etc. images. That w= ay you > > > > > don't have to regress the most common formats' performance. = =20 > > > >=20 > > > > While I understand performance is important and should be taken into > > > > account seriously, I cannot understand how broken testing could be > > > > considered better. Fast but inaccurate will always be significantly > > > > less attractive to my eyes. =20 > > >=20 > > > AFAIK, neither the cover letter nor the commit log claimed it was fix= ing > > > something broken, just that it was "better" (according to what > > > criteria?). =20 > >=20 > > Better is probably too vague and I agree the "fixing" part is not > > clearly explained in the commit log. The cover-letter however states: > > =20 > > > Patch 2/2: This patch is more complex. My main target was to solve is= sues > > > I found in [1], but as it was very complex to do it "in place", I cho= ose > > > to rework the composition function. =20 > > ... =20 > > > [1]: https://lore.kernel.org/dri-devel/20240110-vkms-yuv-v2-0-952fcaa= 5a193@riseup.net/ =20 > >=20 > > If you follow this link you will find all the feedback and especially > > the "broken" parts. Just to be clear, writing bugs is totally expected > > and review/testing is supposed to help on this regard. I am not blaming > > the author in any way, just focusing on getting this code in a more > > readable shape and hopefully reinforce the testing procedure. > > =20 > > > If something is truly broken, it must be stated what exactly is so we > > > can all come up with a solution that will satisfy everyone. =20 > >=20 > > Maybe going through the series pointed above will give more context > > but AFAIU: the YUV composition is not totally right (and the tests used > > to validate it need to be more complex as well in order to fail). > >=20 > > Here is a proposal. > >=20 > > Today's RGB implementation is only optimized in the line-by-line case > > when there is no rotation. The logic is bit convoluted and may possibly > > be slightly clarified with a per-format read_line() implementation, > > at a very light performance cost. Such an improvement would definitely > > benefit to the clarity of the code, especially when transformations > > (especially the rotations) come into play because they would be clearly > > handled differently instead of being "hidden" in the optimized logic. > > Performances would not change much as this path is not optimized today > > anyway (the pixel-oriented logic is already used in the rotation case). > >=20 > > Arthur's YUV implementation is indeed well optimized but the added > > complexity probably lead to small mistakes in the logic. The > > per-format read_line() implementation mentioned above could be > > extended to the YUV format as well, which would leverage Arthur's > > proposal by re-using his optimized version. Louis will help on this > > regard. However, for more complex cases such as when there is a > > rotation, it will be easier (and not sub-optimized compared to the RGB > > case) to also fallback to a pixel-oriented processing. > >=20 > > Would this approach make sense? =20 >=20 > Hi, >=20 > I think it would, if I understand what you mean. Ever since I proposed > a line-by-line algorithm to improve the performance, I was thinking of > per-format read_line() functions that would be selected outside of any > loops. Extending that to support YUV is only natural. I can imagine > rotation complicates things, and I won't oppose that resulting in a > much heavier read_line() implementation used in those cases. They might > perhaps call the original read_line() implementations pixel-by-pixel or > plane-by-plane (i.e. YUV planes) per pixel. Chroma-siting complicates > things even further. That way one could compose any > rotation-format-siting combination by chaining function pointers. I'll let Louis also validate but on my side I feel like I totally agree with your feedback. > I haven't looked at VKMS in a long time, and I am disappointed to find > that vkms_compose_row() is calling plane->pixel_read() pixel-by-pixel. > The reading vfunc should be called with many pixels at a time when the > source FB layout allows it. The whole point of the line-based functions > was that they repeat the innermost loop in every function body to make > the per-pixel overhead as small as possible. The VKMS implementations > benchmarked before and after the original line-based algorithm showed > that calling a function pointer per-pixel is relatively very expensive. > Or maybe it was a switch-case. Indeed, since your initial feedback Louis made a couple of comparisons and the time penalty is between +5% and +60% depending on the case, AFAIR. > Sorry, I didn't realize the optimization had already been lost. No problem, actually I also lost myself in my first answer as I initially thought the (mainline) RGB logic was also broken in edge cases, which it was not, only the YUV logic suffered from some limitations. > Btw. I'd suggest renaming vkms_compose_row() to vkms_fetch_row() since > it's not composing anything and the name mislead me. Makes sense. > I think if you inspect the compositing code as of revision > 8356b97906503a02125c8d03c9b88a61ea46a05a you'll get a better feeling of > what it was supposed to be. Excellent, thanks a lot! Miqu=C3=A8l