Received: by 2002:ab2:6d45:0:b0:1fb:d597:ff75 with SMTP id d5csp350951lqr; Wed, 5 Jun 2024 07:58:01 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCXNgKXgy8AnOBp0aaDAEAl9nSHcumVGx8p4ZgJTInMfsRMIJMf10cQiC+0a0awTSZyRVtw8UrJizs1A+JF79L0NCBJriFOZfRLZgpkxJg== X-Google-Smtp-Source: AGHT+IE/l2MYo9zEK/Gywl5Y1OdZy0JY6O5jplHkaBSD1Qdbw+St1/1Oqk/A2rPY4oMRV592dOyp X-Received: by 2002:a17:903:120c:b0:1f6:9181:47ac with SMTP id d9443c01a7336-1f6a590aa53mr31829005ad.1.1717599481176; Wed, 05 Jun 2024 07:58:01 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1717599481; cv=pass; d=google.com; s=arc-20160816; b=tzibM9D2fYPyJ3PAq9ACZ5eGK1Dn0EeVZmzj16kzYG8jUDY5ZblYRV6EVDpLjxQ8jL Ymn40B9xv/XbtovuDnNVesapL9O6QoICazUdkLACC+OlFLoT13c6VgMxTuI5kdQnvOlN TvkXffXfyQB/w2KXItFgTq0fjOXI9IKZA+sxAiWCxjuKyNiq7MOqGnKfYD3dYjbi5Koh WWCyNQxWXFjfeDgNI6cpR/v+gwDEHPdEpviXERumT4yHgT/azr4G1ojYefTnF2EqP21D VpIHw5mnBomN74b4zTXR7gMVhqAKo6HubM+1gt14Zrei/ACiC22dJvFQEQQ2OXelq8+J w55A== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:list-unsubscribe:list-subscribe :list-id:precedence:dkim-signature; bh=9CGxyo2iRHclTYc36k+KgY5ZaavZB+Qy7vQTt8qCvZg=; fh=jav7JN3JBsPxRIzesCQ8LlPlJ6UJQ5GwGNKlQCYkgyM=; b=NYH0ON2/CTINFycJmFZJZ2CODdz/wGIJ93EFdaHTU6vHK5on6xbQBZzplirGSQtc+C uT2uaweRw1gl+SOj0AiMMhMYCer4+hKQpVsK8CZtifj2k2PDry+L81DkVsFkAXrNAQ/7 KAVD4yoZNoSkSufgBSDvoXpaHkEK7c4CBmrBnCoOn2PArn5pFkwf5x6ySRoBY02F0rOv oWFachoSmGvd0JrEWofIFr/ChURB2TZU1VisooQsIBQtzwKY1uZfb7Ec3oRdkVS0P3WY TNxaoXPevlUm3DkUIq8zq+GBaGlAzBKPczSMh9R91vB/dHOKdrAJN27hOPN+uY+yBfVd PzIw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=IItoganz; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-202763-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-202763-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [147.75.48.161]) by mx.google.com with ESMTPS id d9443c01a7336-1f6323dcfefsi26079835ad.278.2024.06.05.07.58.00 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Jun 2024 07:58:01 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-202763-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) client-ip=147.75.48.161; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=IItoganz; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-202763-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-202763-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id 1588EB23224 for ; Wed, 5 Jun 2024 14:17:56 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id E289D1AAA5; Wed, 5 Jun 2024 14:17:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="IItoganz" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C230219D889; Wed, 5 Jun 2024 14:17:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717597066; cv=none; b=H/aYyp3D9bsGBXwbZQo6qfvKl55pqufClvXIggqltCWiDo2dK/JD9WrNkxE0ei8aV/xoOMqSFfBs39oJiWGgbILfSDEI/SjQqi7wPhPLHpzFBot4B9hz9mgHJrepYlgc3tu87zWqKO80YBlkl6obM6/Gs5PxphTZwQcXvmK5fU4= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717597066; c=relaxed/simple; bh=XEx/RmLIAYzNyNyRLPZfdvmzntgVOQycajzJ1rdomLo=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=leO6HSKmax8X/6KjeU8FRozdFp//xZq1V3leg0C/bgvFz4urJQTAL/2y1LzHeCV4PbAxoDDASOyIgHGeYgJKVCzu5hahdi1nloJCjoJBu7LXzHpbaLu+ukoCLoEtDEHTtpNQ5BAvuZ+1xyxR16is+L4YgM4JTSos7/zjHY0dVqE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=IItoganz; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 55F0EC32782; Wed, 5 Jun 2024 14:17:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1717597066; bh=XEx/RmLIAYzNyNyRLPZfdvmzntgVOQycajzJ1rdomLo=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=IItoganzLnx0lSwTSZIHuhQhh9O5JJ0bM1Bsa8/zvIK1CJb/oftCjbeBNCPDaBnbP KEzP7mM7r2ThKIKnYWXAHR7m6RCdj9mhakmpBTif8brruizBiJpiJC/02N2Q+nk13h OK/gnO7gOLQ6/EXHHDyzXJsnfn3ZXawD9oyfHJigMWU5eHacEYHM4q+ptgc5jhxKwm stHox8ATBR4qHK9fIoLHPOgJlfbU88Czk5JH6BGUn/b2mzHHkjZuUbjaNEluygb4ZS guYBQ92b0jm6f4XOKVP2CXrO8owT1sr7aLJ8GOBM68pdDudVnCnSvzNhwxnI8zqwjc Ef6uIAPZg6zrA== Received: by mail-oo1-f44.google.com with SMTP id 006d021491bc7-5ba8e1b9018so18225eaf.0; Wed, 05 Jun 2024 07:17:46 -0700 (PDT) X-Forwarded-Encrypted: i=1; AJvYcCWLMHaBf4+2j0qKO1cRSanQqR8u+IZ+Tvl6I1vBgW7JFBEQ6P/4VxVWpfZ9F5B5iwB1u0nKttXBzQIO4Q0lU3CMryxIS9RU/bAJfj8UTv6nXXie48O07n+sC8fOrENmHQxG34ZWd9E= X-Gm-Message-State: AOJu0Yw7aMw8R0JEjOcQWJeyPyNNpLocp1hozhDwWTKVHxcni7T5vff4 REoHpnixSHSgtPbdRZkA5pnD0ydKZ8HQ6P8l/YjkNh6xgiuXIx/wKxXwutOynGyU2dpa0+8RRpc MCTlh7KvWl/NHBYqbzFJjZkkrGdI= X-Received: by 2002:a4a:a7c9:0:b0:5ba:6669:ba6e with SMTP id 006d021491bc7-5ba78ff8903mr2780302eaf.1.1717597065543; Wed, 05 Jun 2024 07:17:45 -0700 (PDT) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <20240603081331.3829278-1-beata.michalska@arm.com> <20240603081331.3829278-2-beata.michalska@arm.com> <20240603114811.oio3uemniib5uaa2@vireshk-i7> In-Reply-To: From: "Rafael J. Wysocki" Date: Wed, 5 Jun 2024 16:17:32 +0200 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 1/1] cpufreq: Rewire arch specific feedback for cpuinfo/scaling_cur_freq To: Beata Michalska Cc: "Rafael J. Wysocki" , Viresh Kumar , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, len.brown@intel.com, ionela.voinescu@arm.com, vanshikonda@os.amperecomputing.com, sumitg@nvidia.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, Jun 5, 2024 at 10:36=E2=80=AFAM Beata Michalska wrote: > > On Mon, Jun 03, 2024 at 03:43:12PM +0200, Rafael J. Wysocki wrote: > > On Mon, Jun 3, 2024 at 1:48=E2=80=AFPM Viresh Kumar wrote: > > > > > > Hi Beata, > > > > > > Thanks for taking this forward. > > > > > > On 03-06-24, 09:13, Beata Michalska wrote: > > > > Some architectures provide a way to determine an average frequency = over > > > > a certain period of time, based on available performance monitors (= AMU on > > > > ARM or APERF/MPERf on x86). With those at hand, enroll arch_freq_ge= t_on_cpu > > > > into cpuinfo_cur_freq policy sysfs attribute handler, which is expe= cted to > > > > represent the current frequency of a given CPU,as obtained by the h= ardware. > > > > This is the type of feedback that counters do provide. > > > > > > Please add blank line between paragraphs, it makes it easier to read > > > them. > > > > > > > At the same time, keep the scaling_cur_freq attribute align with th= e docs > > > > and make it provide most recently requested frequency, still allowi= ng to > > > > fallback to using arch_freq_get_on_cpu for cases when cpuinfo_cur_f= req is > > > > not available. > > > > > > Please split this patch into two parts, they are very distinct change= s > > > and should be kept separate. > > > > > > > Signed-off-by: Beata Michalska > > > > --- > > > > drivers/cpufreq/cpufreq.c | 8 ++++++-- > > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > > > > index a45aac17c20f..3b0eabe4a983 100644 > > > > --- a/drivers/cpufreq/cpufreq.c > > > > +++ b/drivers/cpufreq/cpufreq.c > > > > @@ -758,7 +758,8 @@ static ssize_t show_scaling_cur_freq(struct cpu= freq_policy *policy, char *buf) > > > > ssize_t ret; > > > > unsigned int freq; > > > > > > > > - freq =3D arch_freq_get_on_cpu(policy->cpu); > > > > + freq =3D !cpufreq_driver->get ? arch_freq_get_on_cpu(policy->= cpu) > > > > + : 0; > > > > > > This is getting trickier than I thought as I dived into more details > > > of all the changes to the file. > > > > > > Rafael, > > > > > > We probably need to decide on a policy for these two files, it is > > > getting a bit confusing. > > > > > > cpuinfo_cur_freq: > > > > > > The purpose of this file is abundantly clear. This returns the best > > > possible guess of the current hardware frequency. It should rely on > > > arch_freq_get_on_cpu() or ->get() to get the value. > > > > Let me quote the documentation: > > > > "This is expected to be the frequency the hardware actually runs at. > > If that frequency cannot be determined, this attribute should not be > > present." > > > > In my reading, this has nothing to do with arch_freq_get_on_cpu(), at > > least on x86. > My reading on this (and I might be wrong) is that for x86, > arch_freq_get_on_cpu() is utilizing the APERF/MPERF registers to get > a rough/average frequency and as such it does, in a way, get a hw feedbac= k > and it does somewhat fall under "frequency the hardware actually runs at"= . But it is not expected to somewhat fall under that definition, it is expected to be exactly the frequency the CPU is running at when the file is accessed. In the past, there was hardware that could provide that information. > Because it is an average value, it might not provide an instant view on > current frequency, but either way, the value provided here migh be a bit = off > anyway. Well, not just a bit. It may be off completely. > But then, we could adjust the timewidnow being used to make it more > accurate. Not on x86 AFAICS because the time window is the scheduler tick period. On x86 it is better to use a utility like turbostat to measure frequency instead of the cpufreq sysfs (or /proc/cpuinfo for that matter). > I might be looking at it the wrong way though. > > > > > Perhaps we can > > > make this available all the time, instead of conditionally on ->get() > > > callback (which isn't present for intel-pstate for example). > > > > We could, but then on x86 there is no expectation that this file will > > be present and changing this may introduce significant confusion > > because of the way it is documented (which would need to be changed, > > but people might be forgiven for failing to notice the change of > > interpretation of this file). > > > > > scaling_cur_freq: > > > > > > This should better reflect the last requested frequency, but since a > > > significant time now it is trying to show what cpuinfo_cur_freq shows= . > > > > Well, not really. > > > > > commit c034b02e213d ("cpufreq: expose scaling_cur_freq sysfs file for= set_policy() drivers") > > > commit f8475cef9008 ("x86: use common aperfmperf_khz_on_cpu() to calc= ulate KHz using APERF/MPERF") > > > > "In the majority of cases, this is the frequency of the last P-state > > requested by the scaling driver from the hardware using the scaling > > interface provided by it, which may or may not reflect the frequency > > the CPU is actually running at (due to hardware design and other > > limitations). > > > > Some architectures (e.g. x86) may attempt to provide information more > > precisely reflecting the current CPU frequency through this attribute, > > but that still may not be the exact current CPU frequency as seen by > > the hardware at the moment." > > > > So the problem is that on Intel x86 with HWP and intel_pstate in the > > active mode, say, "the frequency of the last P-state requested by the > > scaling driver from the hardware" is actually never known, so exposing > > it via scaling_cur_freq is not possible. > > > > Moreover, because cpuinfo_cur_freq is not present at all in that case, > > scaling_cur_freq is the only way to allow user space to get an idea > > about the CPU current frequency. I don't think it can be changed now > > without confusing users. > > > What's your take on leaving the scaling_cur_freq to resolve to > arch_freq_get_on_cpu() when cpuinfo_cur_freq is not present ? IIUC that's what happens right now on x86 and I don't see major problems with it. > This way nothing will change for the intel_pstate + HWP > but it will allow using the info provided by arch_freq_get_on_cpu() for > cpuinfo_cur_freq if one is provided and it will automatically ignore it f= or > scaling_cur_freq ? Though I guess it falls under "making all confused" th= at you > have described donw the line. Well, it does. > > > What should we do ? I wonder if we will break some userspace tools > > > (which may have started relying on these changes). > > > > We will. > > > > IIUC, it is desirable to expose "the frequency of the last P-state > > requested by the scaling driver from the hardware" via > > scaling_cur_freq on ARM, but it is also desirable to expose an > > approximation of the actual current CPU frequency, so the only way to > > do that without confusing the heck out of everybody downstream would > > be to introduce a new attribute for this purpose and document it > > precisely. > > What would introducing new attribute mean for scaling_cur_freq on x86 the= n ? Nothing to start with, but later it might be possible to make it only work for drivers that don't implement ->setpolicy(), ie. when policy->cur is well defined, so scaling_cur_freq, if it were present, would only return the last frequency requested by the driver. > I do assume we would leave cpuinfo_cur_freq untouched without calling > arch_freq_get_on_cpu() (as it is now). Then keeping scaling_cur_freq to a= ctually > use it - what would be the 3rd attribute meaning, if: > - cpuinfo_cur_freq -> actual freq as seen by the hardware (but not the > counters?) This is to be used only when the hardware can tell what frequency the CPU is running at when it is asked for that value. > - scaling_cur_freq -> last requested frequency and for some cases, feedba= ck from > counters Yes, but see above - I would prefer to limit it to the last requested frequency in the future. > - whatever_name_we_will_come_with -> average frequency based on counters = ? Yes. > not always available None of them would be always present, but at least one of them would be present on any system. > It is still bit confusing. Fair enough.