Subject: Re: [PATCH v2 1/2] sched/fair: Take thermal pressure into account while estimating energy
To: Vincent Guittot
Cc: linux-kernel, "open list:THERMAL", Peter Zijlstra,
    "Rafael J. Wysocki", Viresh Kumar, Quentin Perret, Dietmar Eggemann,
    Vincent Donnefort, Beata Michalska, Ingo Molnar, Juri Lelli,
    Steven Rostedt, segall@google.com, Mel Gorman,
    Daniel Bristot de Oliveira
References: <20210604080954.13915-1-lukasz.luba@arm.com> <20210604080954.13915-2-lukasz.luba@arm.com>
From: Lukasz Luba
Date: Thu, 10 Jun 2021 09:42:51 +0100
X-Mailing-List: linux-kernel@vger.kernel.org

On 6/10/21 8:59 AM, Vincent Guittot wrote:
> On Fri, 4 Jun 2021 at 10:10, Lukasz Luba wrote:
>>
>> Energy Aware Scheduling (EAS) needs to be able to predict the frequency
>> requests made by the SchedUtil governor to properly estimate energy used
>> in the future. It has to take into account CPUs utilization and forecast
>> Performance Domain (PD) frequency. There is a corner case when the max
>> allowed frequency might be reduced due to thermal. SchedUtil is aware of
>> that reduced frequency, so it should be taken into account also in EAS
>> estimations.
>>
>> SchedUtil, as a CPUFreq governor, knows the maximum allowed frequency of
>> a CPU, thanks to cpufreq_driver_resolve_freq() and internal clamping
>> to 'policy::max'. SchedUtil is responsible to respect that upper limit
>> while setting the frequency through CPUFreq drivers. This effective
>> frequency is stored internally in 'sugov_policy::next_freq' and EAS has
>> to predict that value.
>>
>> In the existing code the raw value of arch_scale_cpu_capacity() is used
>> for clamping the returned CPU utilization from effective_cpu_util().
>> This patch fixes issue with too big single CPU utilization, by introducing
>> clamping to the allowed CPU capacity. The allowed CPU capacity is a CPU
>> capacity reduced by thermal pressure signal.
>> We rely on this load avg
>> geometric series in similar way as other mechanisms in the scheduler.
>>
>> Thanks to knowledge about allowed CPU capacity, we don't get too big value
>> for a single CPU utilization, which is then added to the util sum. The
>> util sum is used as a source of information for estimating whole PD energy.
>> To avoid wrong energy estimation in EAS (due to capped frequency), make
>> sure that the calculation of util sum is aware of allowed CPU capacity.
>>
>> Signed-off-by: Lukasz Luba
>> ---
>>  kernel/sched/fair.c | 17 ++++++++++++++---
>>  1 file changed, 14 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 161b92aa1c79..1aeddecabc20 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6527,6 +6527,7 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
>>         struct cpumask *pd_mask = perf_domain_span(pd);
>>         unsigned long cpu_cap = arch_scale_cpu_capacity(cpumask_first(pd_mask));
>>         unsigned long max_util = 0, sum_util = 0;
>> +       unsigned long _cpu_cap = cpu_cap;
>>         int cpu;
>>
>>         /*
>> @@ -6558,14 +6559,24 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
>>                                 cpu_util_next(cpu, p, -1) + task_util_est(p);
>>                 }
>>
>> +               /*
>> +                * Take the thermal pressure from non-idle CPUs. They have
>> +                * most up-to-date information. For idle CPUs thermal pressure
>> +                * signal is not updated so often.
>
> What do you mean by "not updated so often" ? Do you have a value ?
>
> Thermal pressure is updated at the same rate as other PELT values of
> an idle CPU. Why is it a problem there ?
>

For an idle CPU the value is updated 'remotely' by some other CPU
running nohz_idle_balance(). That goes into update_blocked_averages()
if the flags and checks inside update_nohz_stats() are OK. Sometimes
this is not called because others_have_blocked() returned false. This
can happen for a CPU that has been idle for a long time, for which all
the signals checked in that function are 0 [1].
As a result, we don't check the new value that the thermal
cpufreq_cooling code has stored for the thermal pressure [2]. We should
feed that value into the 'signal' machinery inside
__update_blocked_others() [3]. Unfortunately, in a corner case there's
a flag (rq->has_blocked_load) which blocks the check of the raw thermal
value and prevents feeding it into the thermal pressure signal (since
it's a long-idle CPU, there is no load) [4].

This has implications for this patch, because I cannot e.g. take the
first CPU from the PD mask and blindly check its thermal pressure, since
that CPU may have been idle for a long time. I don't want to have two
loops, with the first one just for taking the latest thermal pressure
for the PD. Thus, I want to re-use the existing loop to take the latest
information from a non-idle CPU and use it.

Regards,
Lukasz

[1] https://elixir.bootlin.com/linux/latest/source/kernel/sched/fair.c#L7909
[2] https://elixir.bootlin.com/linux/latest/source/drivers/thermal/cpufreq_cooling.c#L494
[3] https://elixir.bootlin.com/linux/latest/source/kernel/sched/fair.c#L7958
[4] https://elixir.bootlin.com/linux/latest/source/kernel/sched/fair.c#L8433
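
The single-loop idea from the reply can be illustrated with a small userspace sketch. This is not the patch and not kernel code: struct cpu_sample, min_ul() and pd_sum_util() are hypothetical names invented for illustration, and the per-CPU values are plain fields rather than scheduler signals. It assumes only what the reply states, namely that the thermal pressure is read from non-idle CPUs inside the loop that already sums utilization, and that each CPU's utilization is clamped to the capacity left after thermal capping.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot; not a kernel structure. */
struct cpu_sample {
	unsigned long util;             /* estimated CPU utilization */
	unsigned long thermal_pressure; /* capacity lost to thermal capping */
	bool idle;                      /* idle CPUs may hold a stale value */
};

static inline unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Sum the utilization of one performance domain in a single pass.
 * The thermal pressure is refreshed only from non-idle CPUs, on the
 * assumption that their value is the most recent one. CPUs visited
 * before the first non-idle CPU are clamped to the uncapped capacity,
 * which is a simplification of this sketch.
 */
unsigned long pd_sum_util(const struct cpu_sample *cpus, size_t n,
			  unsigned long cpu_cap)
{
	unsigned long thermal_pressure = 0;
	unsigned long sum_util = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long allowed_cap;

		if (!cpus[i].idle)
			thermal_pressure = cpus[i].thermal_pressure;

		/* Guard against underflow if pressure exceeds capacity. */
		allowed_cap = thermal_pressure < cpu_cap ?
			      cpu_cap - thermal_pressure : 0;

		sum_util += min_ul(cpus[i].util, allowed_cap);
	}

	return sum_util;
}
```

With a capacity of 1024 and one busy CPU reporting a thermal pressure of 300, the busy CPU's contribution is limited to 724, while the values held by idle CPUs are never consulted. Keeping everything in one pass avoids the extra loop over the PD mask that the reply wants to avoid.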