Message-ID: <60fe6b16-0fc6-6ac4-f8fe-87ae9b6592c0@arm.com>
Date: Thu, 23 Mar 2023 17:29:52 +0100
Subject: Re: [RFC PATCH 0/1] sched/pelt: Change PELT halflife at runtime
To: Qais Yousef, Vincent Guittot
Cc: Peter Zijlstra, Kajetan Puchalski, Jian-Min Liu, Ingo Molnar,
 Morten Rasmussen, Vincent Donnefort, Quentin Perret, Patrick Bellasi,
 Abhijeet Dharmapurikar, Qais Yousef,
 linux-kernel@vger.kernel.org, Jonathan JMChen
References: <20221108194843.i4qckcu7zwqstyis@airbuntu>
 <424e2c81-987d-f10e-106d-8b4c611768bc@arm.com>
 <20230223153700.55zydy7jyfwidkis@airbuntu>
 <20230301172458.intrgsirjauzqmo3@airbuntu>
From: Dietmar Eggemann
In-Reply-To: <20230301172458.intrgsirjauzqmo3@airbuntu>

On 01/03/2023 18:24, Qais Yousef wrote:
> On 03/01/23 11:39, Vincent Guittot wrote:
>> On Thu, 23 Feb 2023 at 16:37, Qais Yousef wrote:
>>>
>>> On 02/09/23 17:16, Vincent Guittot wrote:
>>>
>>>> I don't see how util_est_faster can help this 1ms task here? It's
>>>> most probably never preempted during this 1ms. For such an Android
>>>> graphics-pipeline short task, hasn't uclamp_min been designed for
>>>> this, and isn't it a better solution?
>>>
>>> uclamp_min is being used in UI and is helping there. But your mileage
>>> might vary with adoption still.
>>>
>>> The major motivation behind this is to help things like gaming, as the
>>> original thread started. It can help UI and other use cases too. The
>>> Android framework has a lot of context on the type of workload, which
>>> can help it make a decision about when this helps. And OEMs have the
>>> chance to tune and apply it based on the characteristics of their
>>> device.
>>>
>>>> IIUC how util_est_faster works, it removes the waiting time when
>>>> sharing CPU time with other tasks. So as long as there is no (runnable
>>>> but not running) time, the result is the same as with current
>>>> util_est. util_est_faster makes a difference only when the task
>>>> alternates between runnable and running slices.
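[Editor's sketch] The util_est behaviour discussed in this thread can be made concrete with a small model. This is a hedged, simplified rendering of mainline's util_est (the EWMA update done at dequeue plus the max taken at enqueue); the standalone function names are illustrative, not the exact kernel code:

```c
#include <assert.h>

/*
 * Simplified model of util_est. On dequeue, the task's last util_avg
 * sample is folded into a slow-moving EWMA with weight 1/4
 * (a shift of 2 in the mainline implementation).
 */
static unsigned long util_est_ewma_update(unsigned long ewma,
                                          unsigned long last)
{
        long delta = (long)last - (long)ewma;

        /* ewma += (last - ewma) / 4 */
        return (unsigned long)((long)ewma + delta / 4);
}

/*
 * At enqueue, the estimate is max(EWMA, last sample): for a periodic
 * task this is the "instantaneous signal at enqueue" mentioned below,
 * available before PELT has ramped up again.
 */
static unsigned long util_est(unsigned long ewma, unsigned long enqueued)
{
        return ewma > enqueued ? ewma : enqueued;
}
```

The EWMA rises slowly across activations, while the max with the last enqueued sample gives the task its previous utilization back immediately on wakeup.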
>>>> Have you considered using the runnable_avg metric in the increase of
>>>> CPU frequency? It takes into account the runnable slice and not only
>>>> the running time, and it increases faster than util_avg when tasks
>>>> compete for the same CPU.
>>>
>>> Just to understand why we're heading in this direction now.
>>>
>>> AFAIU the desired outcome is to have faster ramp-up time (and on HMP,
>>> faster up-migration), which are both tied to the utilization signal.
>>>
>>> Wouldn't making the util response time faster help not just ramp-up,
>>> but ramp-down too?
>>>
>>> If we improve util response time, couldn't this mean we can remove
>>> util_est, or am I missing something?
>>
>> Not sure, because you still have a ramping step, whereas util_est
>> directly gives you the final tager.

util_est gives us an instantaneous signal at enqueue for periodic tasks,
something PELT will never be able to do.

> I didn't get you. tager?
>
>>
>>> Currently we have a util response which is tweaked by util_est and then
>>> tweaked further by schedutil with that 25% margin when mapping util to
>>> frequency.
>>
>> The 25% is not related to the ramping time but to the fact that you
>> always need some margin to cover unexpected events and estimation
>> error.

> At the moment we have
>
>   util_avg -> util_est -> (util_est_faster) -> util_map_freq ->
>   schedutil filter ==> current frequency selection
>
> I think we have too many transformations before deciding the current
> frequency. Which makes it hard to tweak the system response.

To me it looks more like this:

    max(max(util_avg, util_est), runnable_avg) -> schedutil's rate limit* -> freq. selection
                                 ^^^^^^^^^^^^
                                 new proposal to factor in root cfs_rq contention

Like Vincent mentioned, util_map_freq() (now: map_util_perf()) is only
there to create the safety margin used by schedutil & EAS.

* The schedutil up/down filter thing was already NAK'ed back in Nov 2016.
IMHO, this is where util_est was initially discussed as an alternative.
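[Editor's sketch] The 25% safety margin referred to above corresponds to the kernel's map_util_perf() helper, which inflates utilization by a quarter of itself before it is scaled to a frequency. The helper below matches that arithmetic; the surrounding next_freq() is a simplified stand-in for schedutil's real get_next_freq(), which additionally clamps the result and respects policy limits:

```c
#include <assert.h>

/* The ~25% safety margin: util * 1.25 via integer arithmetic. */
static unsigned long map_util_perf(unsigned long util)
{
        return util + (util >> 2);
}

/*
 * Simplified stand-in for schedutil's frequency selection: scale the
 * inflated utilization (out of max_cap, usually 1024) to the policy's
 * maximum frequency.
 */
static unsigned long next_freq(unsigned long util, unsigned long max_cap,
                               unsigned long max_freq)
{
        return map_util_perf(util) * max_freq / max_cap;
}
```

So a CPU at 50% utilization (512/1024) already requests 62.5% of max frequency, which is the headroom for "unexpected events and estimation error" Vincent describes.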
We have it in mainline as well, but with one value (default 10ms) for both
directions. There was discussion to map it to the driver's
translation_latency instead.

On Pixel 7 you use 0.5ms up and `5/20/20ms` down for `little/medium/big`.

So on `up` your rate limit is as small as possible (only respecting the
driver's translation_latency), but on `down` you use much more than that.

Why exactly do you have this higher value on `down`? My hunch is scenarios
in which the CPU (all CPUs in the freq. domain) goes idle, so util_est is 0
and the blocked utilization is decaying (too fast, 4ms (250Hz) versus
20ms?). So you don't want to ramp up the frequency again when the CPU
wakes up within those 20ms?

>>> I think if we can allow improving the general util response time by
>>> tweaking the PELT halflife, we can potentially remove util_est and
>>> potentially that magic 25% margin too.
>>>
>>> Why is the approach of further tweaking util_est better?
>>
>> Note that in this case it doesn't really tweak util_est; Dietmar has
>> taken runnable_avg into account to increase the frequency in case of
>> contention.
>>
>> Also, IIUC Dietmar's results, the problem seems more linked to the
>> selection of a higher frequency than to increasing the utilization;
>> the runnable_avg tests give similar perf results to a shorter halflife,
>> with better power consumption.

> Does it ramp down faster too?

Not sure why you're interested in this? It can't be related to the
`driving DVFS` functionality discussed above.