Received: by 2002:ab2:3319:0:b0:1ef:7a0f:c32d with SMTP id i25csp695639lqc; Fri, 8 Mar 2024 09:04:11 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCXG8f4MqnRbtQ4nvgwc1iDzL2ffbDntsNAMeyGf85n16N0AYs3bzURp6IjbcuF/8iyR4WHhIlxn2izTdVmB5Hfm/ZOzhzg8h/Yy2aHBsA== X-Google-Smtp-Source: AGHT+IFZwnyOwJjCT6Ob2H3oKeQwqXOO7/iUDRxqOpwhpv0b3PJ7BsQF4/uj0yg0ldfxNB7svnoS X-Received: by 2002:a05:620a:1912:b0:788:245f:b4bc with SMTP id bj18-20020a05620a191200b00788245fb4bcmr14407688qkb.36.1709917450821; Fri, 08 Mar 2024 09:04:10 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709917450; cv=pass; d=google.com; s=arc-20160816; b=hj+BEzFftlZFUQohrJ6UF6pQfb9MXJbm1Hwrn0Zp4w8dGwb9YGyaU+1nB4sYhZxS7i DN+o6MpO4tmk2KveXRqboNtJLA5pVir8EzlZw0TP/dALH4guEWnYfAc7EtOl1hF9JVnp rCdhPqutRXbP6mS2F1GgXtlzfw+Zi5vODjWdWScKRgPx7ruB2iHpUCuqd0Ooe2s4D7Yo XnRRDCNBiXH93+zU7Fry1eBihnMj1h5QvkD+H6BqzK2BeWjaNwNHAgKzAZgwZUpF6/Qu 81PmMnvqkh9nZnToDUVIkHt1zwNS73owcI60XJgAZ+2z/Dx7lqdqIjvWmSmrqUhcbnX+ voTw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:list-unsubscribe:list-subscribe:list-id:precedence :references:message-id:subject:cc:to:from:date; bh=z97iX4igXTUSeclWco5Hu3/BkwYPrDrNaZEKY+N3Z+w=; fh=u273BXd66lADKMsuJIrvkfZupy/DvJiYlYU2zqpb7FE=; b=EDbbQKfYOSP33mIGD03ye6xC64aW5jobDNG4RjaABOVvozruujY+kzB2XnG53PBY/j DGihTHbilBTueD9q0t01fOM55sqejIGn/CM3goi0hXP5SSGPxU9AjWgeZ9OGLn7VMBTp Mbfbl1Y3P+my5Q8aFrq1nHbOhcuU+ISJa6YGaAcrfXxETZJIVvy7jtOt9Nj5Hq7nIakL PyrO208xDa24TVBwZGeIg9vB3CpeiULWjg+qQPptR9W2XltGDMJCQ66p3xX9x0LMJZLv D6eFsCoeCXQO0GCWejXMYnIYOEJQ3XLGxq8viJQifbqwQ/aMyRGz/sZwx5p6ea//CziF I4YQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=arm.com dmarc=pass fromdomain=arm.com); spf=pass (google.com: domain of linux-kernel+bounces-97347-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-97347-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id ou24-20020a05620a621800b00787c6ff2d1dsi18508683qkn.212.2024.03.08.09.04.10 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 08 Mar 2024 09:04:10 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-97347-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=arm.com dmarc=pass fromdomain=arm.com); spf=pass (google.com: domain of linux-kernel+bounces-97347-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-97347-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id E077F1C22A6A for ; Fri, 8 Mar 2024 17:03:49 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 3525320B11; Fri, 8 Mar 2024 17:02:31 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 47E3D15C8 for ; Fri, 8 Mar 2024 17:02:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709917350; cv=none; b=c4rBEXK8yJpjfFh/LJgAv0XmnsshHAygtQ9ghikpnv0NZqm+JVWHoXVOuvGV2jHIC21OIL+gRtBTOSTqwvfdRbhOVDrGv4zrbxaFOzuEzYoqp9Ux0g0HCc94Dg/MiAT3Fisx2JcuBpDDUyQSqRgdChqd0YgILzQkFPx1roQMOsg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709917350; c=relaxed/simple; bh=nBf0mh1Mx2uJcthMaU8Eokutt3sv+hG4RQslq3NtpVM=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=qCmz0uOpNRVxUi7te3QWbI1YdrHNI+devU1AqRlHogNHMAXBL9JZXhDKA14Dp+sgb2ndbN/qiNgZGxyKoKE/LppUc8D6oqeDnm5QFHZJPiomN43TzBt9EClAIcK9oUZ3pkkBYD/x3nwgTzRaaJTzR7Cl4KzFdd0jXsb1iRVCdZw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 0F1DEC15 for ; Fri, 8 Mar 2024 09:03:04 -0800 (PST) Received: from e110455-lin.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 32F323F73F for ; Fri, 8 Mar 2024 09:02:27 -0800 (PST) Date: Fri, 8 Mar 2024 17:02:15 +0000 From: Liviu Dudau To: Boris Brezillon Cc: =?utf-8?Q?Adri=C3=A1n?= Larumbe , Steven Price , Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , David Airlie , Daniel Vetter , kernel@collabora.com, linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, Mihail Atanassov Subject: Re: [PATCH] drm/panthor: Add support for performance counters Message-ID: References: <20240305165820.585245-1-adrian.larumbe@collabora.com> <20240308161515.1d74fd55@collabora.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20240308161515.1d74fd55@collabora.com> On Fri, Mar 08, 2024 at 04:15:15PM +0100, Boris Brezillon wrote: > On Fri, 8 Mar 2024 13:52:18 +0000 > Liviu Dudau wrote: > > > Hi Adrián, > > > > Thanks for the patch and appologies for taking a bit longer to respond, > > I was trying to gather some internal Arm feedback before replying. > > > > On Tue, Mar 05, 2024 at 04:58:16PM +0000, Adrián Larumbe wrote: > > > This brings in support for Panthor's HW performance counters and querying > > > them from UM through a specific ioctl(). The code is inspired by existing > > > functionality for the Panfrost driver, with some noteworthy differences: > > > > > > - Sample size is now reported by the firmware rather than having to reckon > > > it by hand > > > - Counter samples are chained in a ring buffer that can be accessed > > > concurrently, but only from threads within a single context (this is > > > because of a HW limitation). > > > - List of enabled counters must be explicitly told from UM > > > - Rather than allocating the BO that will contain the perfcounter values > > > in the render context's address space, the samples ring buffer is mapped > > > onto the MCU's VM. > > > - If more than one thread within the same context tries to dump a sample, > > > then the kernel will copy the same frame to every single thread that was > > > able to join the dump queue right before the FW finished processing the > > > sample request. > > > - UM must provide a BO handle for retrieval of perfcnt values rather > > > than passing a user virtual address. > > > > > > The reason multicontext access to the driver's perfcnt ioctl interface > > > isn't tolerated is because toggling a different set of counters than the > > > current one implies a counter reset, which also messes up with the ring > > > buffer's extraction and insertion pointers. This is an unfortunate > > > hardware limitation. > > > > There are a few issues that we would like to improve with this patch. > > > > I'm not sure what user space app has been used for testing this (I'm guessing > > gputop from igt?) but whatever is used it needs to understand the counters > > being exposed. > > We are using perfetto to expose perfcounters. > > > In your patch there is no information given to the user space > > about the layout of the counters and / or their order. Where is this being > > planned to be defined? > > That's done on purpose. We want to keep the kernel side as dumb as > possible so we don't have to maintain a perfcounter database there. All > the kernel needs to know is how much data it should transfer pass to > userspace, and that's pretty much it. I was not thinking about a perfcounter database but hints about counter value size. In the future we might have 64bits counters and you will not be able to tell only from the sample size if you're talking with an old firmware or not. (Read: future GPUs are likely to go bigger on number of perf counters before they gain higher definition). > > > Long term, I think it would be good to have details > > about the counters available. > > The perfcounter definitions are currently declared in mesa (see the G57 > perfcounter definitions for instance [1]). Mesa also contains a perfetto > datasource for Panfrost [2]. Sorry, I've missed that perfetto got updated. > > > > > The other issue we can see is with the perfcnt_process_sample() and its > > handling of threshold event and overflows. If the userspace doesn't sample > > quick enough and we cross the threshold we are going to lose samples and > > there is no mechanism to communicate to user space that the values they > > are getting have gaps. I believe the default mode for the firmware is to > > give you counter values relative to the last read value, so if you drop > > samples you're not going to make any sense of the data. > > If we get relative values, that's indeed a problem. I thought we were > getting absolute values though, in which case, if you miss two 32-bit > wraparounds, either your sampling rate is very slow, or events occur at > a high rate. First CSF GPUs have some erratas around perf counters that firmware tries to hide. I need to dig a bit deeper into what firmware does for each GPU before confirming the counting mode. The main issue with the code is that we should not be dropping samples at all, even if they are absolute values, as there will be gaps in the analysis. Looking at perfcnt_process_sample(), it does not increase the sampling rate if we reach the threshold, nor does it tell the user space to do so. From our DDK experience, if you've reached the threshold with the app sampling then you're likely to overflow as well, missing samples. > > > > > The third topic that is worth discussing is the runtime PM. Currently your > > patch will increment the runtime PM usage count when the performance > > counter dump is enabled, which means you will not be able to instrument > > your power saving modes. It might not be a concern for the current users > > of the driver, but it is worth discussing how to enable that workflow > > for future. > > I guess we could add a flags field to drm_panthor_perfcnt_config and > declare a DRM_PANTHOR_PERFCNT_CFG_ALLOW_GPU_SUSPEND flag to support this > use case. Yes, sounds like a good plan. Doesn't have to be included in this patch. Best regards, Liviu > > [1]https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/panfrost/perf/G57.xml?ref_type=heads > [2]https://gitlab.freedesktop.org/mesa/mesa/-/tree/main/src/panfrost/ds?ref_type=heads -- ==================== | I would like to | | fix the world, | | but they're not | | giving me the | \ source code! / --------------- ¯\_(ツ)_/¯