Received: by 10.213.65.68 with SMTP id h4csp1273614imn; Wed, 21 Mar 2018 06:57:42 -0700 (PDT) X-Google-Smtp-Source: AG47ELsyt3GGIi3Bw7zuRUsMsT/rAHtK3ChR2kwa93W6zUAzDpk7RNkA0w0LjVsoyI48WH8JBqtq X-Received: by 2002:a17:902:a705:: with SMTP id w5-v6mr21299864plq.299.1521640662341; Wed, 21 Mar 2018 06:57:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1521640662; cv=none; d=google.com; s=arc-20160816; b=cjXt8cD4M3kRbzsVLcYrmbxl4DrnLLUg90nXbAbmOtENempQd+vwJRfyVNkA8EXtsh KGm0nrbeOGE/Z8o+f6svJNPuISoUfzHDGZOuvcs62iL9lOrRqi7AIdXwa96+jmiwGJE/ 18vXyNLjGfRop6zEMF227PhqfmdsSU9dwoX4DQu6BwbF7z5EbdPqb8NrOC0DesQgm+I2 QwJNlokuHOQ8QU8kSu8UoJOJuye4HtefHjyGbei+F+SUPgoT+lqotp+NaCadkTZ63esd Gq7gUBXijmMsaWIVfFbYwaRkGBLK2l/lreBTMmrcfFG9u1oDauUWy+MiuBS0vJQ3Obdg 65RQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:arc-authentication-results; bh=5W2kQ6gKL3H5WyTTvmSI7eIV1Wb+FgZg191HZ5djFFs=; b=veKVPqpwunZ1CQRW3GoL8rx+XWdenm8GKSF32/03Y9WGO1XpiC4lcc09VmYAqhMUc3 WfRiJR1BQISYDZ/D5HLuH41QrickrgitQKSSKxiteFkLzwZDyQZ96Andt8KA9iJN3iNd a1qKRWGs81FRIYM7BguiWlmV1BScCFfQjL66IUSGHyHQtqdR4RKNnAFuQXWriM4RU1Ug oe7KTN/93pEgMyg+fzzxB8j7rMUJUGiC0wAJUPhWeEXlrB50UbwbT910fj3Y2ZePvO8v JeUN5+4pxlEAjrO134nkYIgvN5qTei3L8UoOUfDDn4Yu8ZI9amr9YykNVUsYbCG8SsX9 pfJQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id j10si3144905pfe.276.2018.03.21.06.57.27; Wed, 21 Mar 2018 06:57:42 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752252AbeCUN4G (ORCPT + 99 others); Wed, 21 Mar 2018 09:56:06 -0400 Received: from usa-sjc-mx-foss1.foss.arm.com ([217.140.101.70]:53524 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751736AbeCUN4E (ORCPT ); Wed, 21 Mar 2018 09:56:04 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 169E71529; Wed, 21 Mar 2018 06:56:04 -0700 (PDT) Received: from queper01-VirtualBox (usa-sjc-mx-foss1.foss.arm.com [217.140.101.70]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 94ECD3F25D; Wed, 21 Mar 2018 06:56:00 -0700 (PDT) Date: Wed, 21 Mar 2018 13:55:58 +0000 From: Quentin Perret To: Juri Lelli Cc: Patrick Bellasi , Dietmar Eggemann , linux-kernel@vger.kernel.org, Peter Zijlstra , Thara Gopinath , linux-pm@vger.kernel.org, Morten Rasmussen , Chris Redpath , Valentin Schneider , "Rafael J . Wysocki" , Greg Kroah-Hartman , Vincent Guittot , Viresh Kumar , Todd Kjos , Joel Fernandes Subject: Re: [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper function Message-ID: <20180321135557.GB1373@queper01-VirtualBox> References: <20180320094312.24081-1-dietmar.eggemann@arm.com> <20180320094312.24081-5-dietmar.eggemann@arm.com> <20180321090430.GA6913@localhost.localdomain> <20180321122621.GA13951@e110439-lin> <20180321125925.GB15165@localhost.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180321125925.GB15165@localhost.localdomain> User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wednesday 21 Mar 2018 at 13:59:25 (+0100), Juri Lelli wrote: > On 21/03/18 12:26, Patrick Bellasi wrote: > > On 21-Mar 10:04, Juri Lelli wrote: > > > Hi, > > > > > > On 20/03/18 09:43, Dietmar Eggemann wrote: > > > > From: Quentin Perret > > > > > > > > In preparation for the definition of an energy-aware wakeup path, a > > > > helper function is provided to estimate the consequence on system energy > > > > when a specific task wakes-up on a specific CPU. compute_energy() > > > > estimates the OPPs to be reached by all frequency domains and estimates > > > > the consumption of each online CPU according to its energy model and its > > > > percentage of busy time. > > > > > > > > Cc: Ingo Molnar > > > > Cc: Peter Zijlstra > > > > Signed-off-by: Quentin Perret > > > > Signed-off-by: Dietmar Eggemann > > > > --- > > > > kernel/sched/fair.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > 1 file changed, 81 insertions(+) > > > > > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > > > index 6c72a5e7b1b0..76bd46502486 100644 > > > > --- a/kernel/sched/fair.c > > > > +++ b/kernel/sched/fair.c > > > > @@ -6409,6 +6409,30 @@ static inline int cpu_overutilized(int cpu) > > > > } > > > > > > > > /* > > > > + * Returns the util of "cpu" if "p" wakes up on "dst_cpu". > > > > + */ > > > > +static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu) > > > > +{ > > > > + unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg; > > > > > > What about other classes? Shouldn't we now also take into account > > > DEADLINE (as schedutil does)? > > > > Good point, although that would likely require to factor out from > > schedutil the utilization aggregation function, isn't it? > > Maybe, or simply use getter methods and aggregate again here. I agree with you both, taking DL into account here is most likely the right thing to do. Thinking about this, there are other places in those patches where we should really use capacity_of() instead of capacity_orig_of() (in task_fits() in patch 5/6 for ex) in order to avoid CPUs under heavy RT pressure. I'll try to improve the integration with other scheduling classes in v2. > > > > > > BTW, we now also have a getter method in sched/sched.h; it takes > > > UTIL_EST into account, though. Do we need to take that into account when > > > estimating energy consumption? Yes, I think that using UTIL_EST makes a lot of sense for energy calculation. This is what is used for frequency selection (with schedutil obviously) and this is also our best guess on how much time a task will spend on a CPU. V2 will be rebased on the latest tip/sched/core and I'll make sure to integrate things better with util_est. > > > > Actually I think that this whole function can be written "just" as: > > > > ---8<--- > > unsigned long util = cpu_util_wake(cpu); > > > > if (cpu != dst_cpu) > > return util; > > > > return min(util + task_util(p), capacity_orig_of(cpu)); > > ---8<--- > > > > which will reuse existing functions as well as getting for free other > > stuff (like the CPU util_est). > > > > Considering your observation above, it makes also easy to add into > > util the DEADLINE and RT utilizations, just before returning the > > value. > > Well, for RT we should problably consider the fact that schedutil is > going to select max OPP... Right, but I need to think about the right place to put that, and how to compute the energy accurately in this case. Some modification might also be required in find_cap_state() (patch 5/6). > > Apart from that I guess it could work like you said. > > > > > > > + unsigned long capacity = capacity_orig_of(cpu); > > > > + > > > > + /* > > > > + * If p is where it should be, or if it has no impact on cpu, there is > > > > + * not much to do. > > > > + */ > > > > + if ((task_cpu(p) == dst_cpu) || (cpu != task_cpu(p) && cpu != dst_cpu)) > > > > + goto clamp_util; > > > > + > > > > + if (dst_cpu == cpu) > > > > + util += task_util(p); > > > > + else > > > > + util = max_t(long, util - task_util(p), 0); > > > > + > > > > +clamp_util: > > > > + return (util >= capacity) ? capacity : util; > > > > +} > > > > + > > > > +/* > > > > * Disable WAKE_AFFINE in the case where task @p doesn't fit in the > > > > * capacity of either the waking CPU @cpu or the previous CPU @prev_cpu. > > > > * > > > > @@ -6432,6 +6456,63 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu) > > > > return !util_fits_capacity(task_util(p), min_cap); > > > > } > > > > > > > > +static struct capacity_state *find_cap_state(int cpu, unsigned long util) > > > > +{ > > > > + struct sched_energy_model *em = *per_cpu_ptr(energy_model, cpu); > > > > + struct capacity_state *cs = NULL; > > > > + int i; > > > > + > > > > + /* > > > > + * As the goal is to estimate the OPP reached for a specific util > > > > + * value, mimic the behaviour of schedutil with a 1.25 coefficient > > > > + */ > > > > + util += util >> 2; > > > > > > What about other governors (ondemand for example). Is this supposed to > > > work only when schedutil is in use (if so we should probably make it > > > conditional on that)? > > > > Yes, I would say that EAS mostly makes sense when you have a "minimum" > > control on OPPs... otherwise all the energy estimations are really > > fuzzy. > > Make sense to me. Shouldn't we then make all this conditional on using > schedutil? So, in theory, EAS could make sense even for other governors than schedutil. Even with the performance governor it is probably more energy efficient (although users using "performance" probably don't care about energy, but that's just an example) to place small tasks onto little CPUs up to a certain point given by the energy model. The ideal solution would be to change the behaviour of find_cap_state() depending on the governor being used, but I don't know if this extra complexity is worth it really. I'm happy to make all this conditional on schedutil as a first step and we can see later if that makes sense to extend EAS to other use-cases. > > > > > > Also, even when schedutil is in use, shouldn't we ask it for a util > > > "computation" instead of replicating its _current_ heuristic? > > > > Are you proposing to have the 1.25 factor only here and remove it from > > schedutil? > > I'm only saying that we shouldn't probably have two places where we add > this 1.25 factor to utilization before using it, as in the future one of > the two might modify that 1.25 to something else and then we'll have > problems. So, maybe a common wrapper that adds such factor? Ok, I can definitely factorize this code between schedutil and EAS. And BTW, would it make sense to make schedutil use "capacity_margin" instead of an arbitrary value ? The semantics feels pretty close. Out of curiosity, what was the reason to use C=1.25 in the first place ? > > > > > > I fear the two might diverge in the future. > > > > That could be avoided by factoring out from schedutil the > > "compensation" factor into a proper function to be used by all the > > interested playes, isn't it? > > And I should have read till the end before writing the above paragraph > it seems. :) > > Thanks, > > - Juri Thank you very much for the feedback !