Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp367341pxj; Thu, 10 Jun 2021 02:46:51 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyI/Jk3crKB9DgHIQCcHPzSLF8G7YAxVamLJR7xjEtSrEJzs3chrqwOc8OLgyZQe4xix/Kw X-Received: by 2002:a05:6402:54f:: with SMTP id i15mr3747256edx.339.1623318410928; Thu, 10 Jun 2021 02:46:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1623318410; cv=none; d=google.com; s=arc-20160816; b=q61ZomcwK8L1rV0prlDIH2Kly+2o2vFlpntfmMteblIht85m/PiS/moMp3e8ycqYU/ HLPK67tOGje00IcXvhr18MlRC457Li0jeD/9AGGRyhHC8SwCzmqifsd4Fx+dOc3KQAw5 /qFx5xVw5ZPkMMBhrY0VVA4vYTSTxBTWkB3+tjDcKoOk1nWsXvnZHVkQAyNnL3Y+nfg4 7DReSwWfsr6/gNag2TRfEBBNOAIx4u8MmBx+wrEO1N8yvSEKlpqohutGOp9VnspMjaxT WHeeaUiTQQms4yZT1+mD2yTcLAr9UsVA7RGgpHbKRE/Yw1gA20ySmRhTxP8XUWdrE8d2 gehw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=gM3UcJPTrjoyJ/3bbIZ5GW9fDf24Q/ItJx+WrrjC9IQ=; b=NoXcGAVaAvjl162PBQExkcTEGqZS+D+nxcy4uAyE/bsUoxaXfLLFao+jxu7lxj2lri xdbMBpDD9OAE9lEZ5IHHwTt3n0J5FgR8BdldmsqmoiqVDZkjlB1fAThcpU3Fa0t3NByu 1kEFVxQfnDuz+1kj42kQMtL+naqEVqXl0t0PWrqYmNpUG5EFytDL3bm2H53ayqOfXJZC mvl2E4+dCYPjONPvbDrbQeLiYOd0LbUg+EAU+qAFqGp9Cy6ce5JoXN2v9g7rPkH1pld1 FeP7nmgJot8RWGl2cHeh+J0YcOL3UwHjctpv06kPU8aodmZjRLSUX4F/xGFFkmAafsKq NS3w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=heoi6mxm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id z7si1859034edr.263.2021.06.10.02.46.27; Thu, 10 Jun 2021 02:46:50 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=heoi6mxm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230120AbhFJJrK (ORCPT + 99 others); Thu, 10 Jun 2021 05:47:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39148 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229895AbhFJJrK (ORCPT ); Thu, 10 Jun 2021 05:47:10 -0400 Received: from mail-lf1-x129.google.com (mail-lf1-x129.google.com [IPv6:2a00:1450:4864:20::129]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 015CDC061574 for ; Thu, 10 Jun 2021 02:45:14 -0700 (PDT) Received: by mail-lf1-x129.google.com with SMTP id n12so2207937lft.10 for ; Thu, 10 Jun 2021 02:45:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=gM3UcJPTrjoyJ/3bbIZ5GW9fDf24Q/ItJx+WrrjC9IQ=; b=heoi6mxmEBqhWrlDhMrvH6O3fXZ/YJ5SbtfhGjdzNiLXqg7aqvdCht7zC3L4fa9xWu lDDg/bSCOPXLxd8WNefLjvO9V8dwZcjym6BeG3EmytKXmxiliBNP8kmUkNS6sw5Qf3Ym QSE8kqBHxdxp1pIDP/Wo47qnX35kVAwzRihBH37jphK8ji/bDLpVS8VvHN+A8+D4XmgI t3OCgyKgTyDAGc9TbX3FSqkR/atzyQibCqgn2kroY534Hw6johZaqKorH66Jlap5y7nq ZldJ5kLcoKbFTab00vwoABQmXu8+eZky3l3DoK5kviMZSTsPyqR7gYnrWHzi5OfD6+0A kEbw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=gM3UcJPTrjoyJ/3bbIZ5GW9fDf24Q/ItJx+WrrjC9IQ=; b=eJBFR5603xAGu44I9ShHwuKbBPw2X+V/rHs8UpP8YVNqxH9BU1jWFi2gcEAk/chi8C s9rxBTI4JoTKKbvW2rXRqJ1Uzpj2VS7gtefDODiaOs5+EJ2FtEd0BLn0MHtoittRj0cV fdPXa0uKIaWXvvAarTpCtDXQRzabvjYd9dZtlY+4qL76Ojsvh/YcKIqjypfBsmQ9xgMH WYFHix5YGuTVJfPbgDi+chpDbAZX+bnigIG1lVTxYwWDz68PAI53jDMlFPoLuumxNg0n IpX6Lbji3XZsmxPW4J19Ae/TIzYULnXH4FdLb9vly3dlS2etl18XJUR5XYRy9Pqv5Tl+ OT4w== X-Gm-Message-State: AOAM532KaoyIymIPpMLJUCxbSS+fB/y/fP0qww2UEZ4f2QpoLvvFNFRA 5Oa2ftqocmxjfG2kFbFbfaxe7ILkoz2t7Z5CwHy/6Q== X-Received: by 2002:a05:6512:3483:: with SMTP id v3mr1420859lfr.154.1623318312275; Thu, 10 Jun 2021 02:45:12 -0700 (PDT) MIME-Version: 1.0 References: <20210604080954.13915-1-lukasz.luba@arm.com> <20210604080954.13915-2-lukasz.luba@arm.com> <8f4156a7-46ca-361d-bcb7-1cbdc860ef37@arm.com> In-Reply-To: <8f4156a7-46ca-361d-bcb7-1cbdc860ef37@arm.com> From: Vincent Guittot Date: Thu, 10 Jun 2021 11:45:00 +0200 Message-ID: Subject: Re: [PATCH v2 1/2] sched/fair: Take thermal pressure into account while estimating energy To: Lukasz Luba Cc: linux-kernel , "open list:THERMAL" , Peter Zijlstra , "Rafael J. Wysocki" , Viresh Kumar , Quentin Perret , Dietmar Eggemann , Vincent Donnefort , Beata Michalska , Ingo Molnar , Juri Lelli , Steven Rostedt , segall@google.com, Mel Gorman , Daniel Bristot de Oliveira Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 10 Jun 2021 at 11:36, Lukasz Luba wrote: > > > > On 6/10/21 10:11 AM, Vincent Guittot wrote: > > On Thu, 10 Jun 2021 at 10:42, Lukasz Luba wrote: > >> > >> > >> > >> On 6/10/21 8:59 AM, Vincent Guittot wrote: > >>> On Fri, 4 Jun 2021 at 10:10, Lukasz Luba wrote: > >>>> > >>>> Energy Aware Scheduling (EAS) needs to be able to predict the frequency > >>>> requests made by the SchedUtil governor to properly estimate energy used > >>>> in the future. It has to take into account CPUs utilization and forecast > >>>> Performance Domain (PD) frequency. There is a corner case when the max > >>>> allowed frequency might be reduced due to thermal. SchedUtil is aware of > >>>> that reduced frequency, so it should be taken into account also in EAS > >>>> estimations. > >>>> > >>>> SchedUtil, as a CPUFreq governor, knows the maximum allowed frequency of > >>>> a CPU, thanks to cpufreq_driver_resolve_freq() and internal clamping > >>>> to 'policy::max'. SchedUtil is responsible to respect that upper limit > >>>> while setting the frequency through CPUFreq drivers. This effective > >>>> frequency is stored internally in 'sugov_policy::next_freq' and EAS has > >>>> to predict that value. > >>>> > >>>> In the existing code the raw value of arch_scale_cpu_capacity() is used > >>>> for clamping the returned CPU utilization from effective_cpu_util(). > >>>> This patch fixes issue with too big single CPU utilization, by introducing > >>>> clamping to the allowed CPU capacity. The allowed CPU capacity is a CPU > >>>> capacity reduced by thermal pressure signal. We rely on this load avg > >>>> geometric series in similar way as other mechanisms in the scheduler. > >>>> > >>>> Thanks to knowledge about allowed CPU capacity, we don't get too big value > >>>> for a single CPU utilization, which is then added to the util sum. The > >>>> util sum is used as a source of information for estimating whole PD energy. > >>>> To avoid wrong energy estimation in EAS (due to capped frequency), make > >>>> sure that the calculation of util sum is aware of allowed CPU capacity. > >>>> > >>>> Signed-off-by: Lukasz Luba > >>>> --- > >>>> kernel/sched/fair.c | 17 ++++++++++++++--- > >>>> 1 file changed, 14 insertions(+), 3 deletions(-) > >>>> > >>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > >>>> index 161b92aa1c79..1aeddecabc20 100644 > >>>> --- a/kernel/sched/fair.c > >>>> +++ b/kernel/sched/fair.c > >>>> @@ -6527,6 +6527,7 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd) > >>>> struct cpumask *pd_mask = perf_domain_span(pd); > >>>> unsigned long cpu_cap = arch_scale_cpu_capacity(cpumask_first(pd_mask)); > >>>> unsigned long max_util = 0, sum_util = 0; > >>>> + unsigned long _cpu_cap = cpu_cap; > >>>> int cpu; > >>>> > >>>> /* > >>>> @@ -6558,14 +6559,24 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd) > >>>> cpu_util_next(cpu, p, -1) + task_util_est(p); > >>>> } > >>>> > >>>> + /* > >>>> + * Take the thermal pressure from non-idle CPUs. They have > >>>> + * most up-to-date information. For idle CPUs thermal pressure > >>>> + * signal is not updated so often. > >>> > >>> What do you mean by "not updated so often" ? Do you have a value ? > >>> > >>> Thermal pressure is updated at the same rate as other PELT values of > >>> an idle CPU. Why is it a problem there ? > >>> > >> > >> > >> For idle CPU the value is updated 'remotely' by some other CPU > >> running nohz_idle_balance(). That goes into > >> update_blocked_averages() if the flags and checks are OK inside > >> update_nohz_stats(). Sometimes this is not called > >> because other_have_blocked() returned false. It can happen for a long > > > > So i miss that you were in a loop and the below was called for each > > cpu and _cpu_cap was overwritten > > > > + if (!idle_cpu(cpu)) > > + _cpu_cap = cpu_cap - thermal_load_avg(cpu_rq(cpu)); > > > > But that also means that if the 1st cpus of the pd are idle, they will > > use original capacity whereas the other ones will remove the thermal > > pressure. Isn't this a problem ? You don't use the same capacity for > > all cpus in the performance domain regarding the thermal pressure? > > True, but in the experiments for idle CPUs I haven't > observed that they still have some big util (bigger than _cpu_cap). > It decayed already, so it's not a problem for idle CPUs. > > Although, it might be my test case which didn't trigger something. > Is it worth to add the loop above this one, to be 100% sure and > get a thermal pressure signal from some running CPU? > Then apply the same value always inside the 2nd loop? Either it's a problem and you must make sure to use the same capacity for all cpus of a PD Or it's not but in this case you don't need if (!idle_cpu(cpu))