Date: Wed, 18 Apr 2018 17:19:28 +0800
From: Leo Yan
To: Quentin Perret
Cc: Dietmar Eggemann, linux-kernel@vger.kernel.org, Peter Zijlstra,
    Thara Gopinath, linux-pm@vger.kernel.org, Morten Rasmussen,
    Chris Redpath, Patrick Bellasi, Valentin Schneider,
    "Rafael J. Wysocki", Greg Kroah-Hartman, Vincent Guittot,
    Viresh Kumar, Todd Kjos, Joel Fernandes, Juri Lelli,
    Steve Muckle, Eduardo Valentin
Subject: Re: [RFC PATCH v2 4/6] sched/fair: Introduce an energy estimation helper function
Message-ID: <20180418091928.GA15682@leoy-ThinkPad-X240s>
References: <20180406153607.17815-1-dietmar.eggemann@arm.com>
    <20180406153607.17815-5-dietmar.eggemann@arm.com>
    <20180417152213.GC18509@leoy-ThinkPad-X240s>
    <20180418081339.GB3943@e108498-lin.cambridge.arm.com>
In-Reply-To: <20180418081339.GB3943@e108498-lin.cambridge.arm.com>

On Wed, Apr 18, 2018 at 09:13:39AM +0100, Quentin Perret wrote:
> On Tuesday 17 Apr 2018 at 23:22:13 (+0800), Leo Yan wrote:
> > On Fri, Apr 06, 2018 at 04:36:05PM +0100, Dietmar Eggemann wrote:
> > > From: Quentin Perret
> > >
> > > In preparation for the definition of an energy-aware wakeup path, a
> > > helper function is provided to estimate the consequence on system energy
> > > when a specific task wakes-up on a specific CPU. compute_energy()
> > > estimates the OPPs to be reached by all frequency domains and estimates
> > > the consumption of each online CPU according to its energy model and its
> > > percentage of busy time.
> > >
> > > Cc: Ingo Molnar
> > > Cc: Peter Zijlstra
> > > Signed-off-by: Quentin Perret
> > > Signed-off-by: Dietmar Eggemann
> > > ---
> > >  include/linux/sched/energy.h | 20 +++++++++++++
> > >  kernel/sched/fair.c          | 68 ++++++++++++++++++++++++++++++++++++++++++++
> > >  kernel/sched/sched.h         |  2 +-
> > >  3 files changed, 89 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/sched/energy.h b/include/linux/sched/energy.h
> > > index 941071eec013..b4110b145228 100644
> > > --- a/include/linux/sched/energy.h
> > > +++ b/include/linux/sched/energy.h
> > > @@ -27,6 +27,24 @@ static inline bool sched_energy_enabled(void)
> > >  	return static_branch_unlikely(&sched_energy_present);
> > >  }
> > >
> > > +static inline
> > > +struct capacity_state *find_cap_state(int cpu, unsigned long util)
> > > +{
> > > +	struct sched_energy_model *em = *per_cpu_ptr(energy_model, cpu);
> > > +	struct capacity_state *cs = NULL;
> > > +	int i;
> > > +
> > > +	util += util >> 2;
> > > +
> > > +	for (i = 0; i < em->nr_cap_states; i++) {
> > > +		cs = &em->cap_states[i];
> > > +		if (cs->cap >= util)
> > > +			break;
> > > +	}
> > > +
> > > +	return cs;
> >
> > It is possible for 'cs' to be returned as NULL here.
>
> Only if em->nr_cap_states == 0, and that shouldn't be possible if
> sched_energy_present == True, so this code should be safe :-)

You are right. Thanks for the explanation.
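For my own notes, here is a tiny user-space sketch of that (invented
capacity values, not the kernel code): the loop always leaves 'cs'
pointing at a valid state as long as nr_cap_states >= 1, and falls back
to the highest state when even the top capacity cannot cover the
requested util plus the ~25% headroom:

  #include <stdio.h>

  struct capacity_state { unsigned long cap, power; };

  /* Hypothetical 3-state energy model, values invented for the example. */
  static struct capacity_state cap_states[] = {
          { .cap =  512, .power = 100 },
          { .cap =  768, .power = 200 },
          { .cap = 1024, .power = 400 },
  };

  static struct capacity_state *find_cap_state_sketch(unsigned long util)
  {
          struct capacity_state *cs = NULL;
          int i;

          util += util >> 2;      /* ~25% headroom, as in the patch */

          for (i = 0; i < 3; i++) {
                  cs = &cap_states[i];
                  if (cs->cap >= util)
                          break;
          }

          return cs;              /* never NULL when the model has >= 1 state */
  }

  int main(void)
  {
          /* util 600 -> 750 after headroom -> the 768 state is picked */
          printf("cap=%lu power=%lu\n",
                 find_cap_state_sketch(600)->cap,
                 find_cap_state_sketch(600)->power);
          return 0;
  }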
> > > +}
> > > +
> > >  static inline struct cpumask *freq_domain_span(struct freq_domain *fd)
> > >  {
> > >  	return &fd->span;
> > > @@ -42,6 +60,8 @@ struct freq_domain;
> > >  static inline bool sched_energy_enabled(void) { return false; }
> > >  static inline struct cpumask
> > >  	*freq_domain_span(struct freq_domain *fd) { return NULL; }
> > > +static inline struct capacity_state
> > > +	*find_cap_state(int cpu, unsigned long util) { return NULL; }
> > >  static inline void init_sched_energy(void) { }
> > >  #define for_each_freq_domain(fdom) for (; fdom; fdom = NULL)
> > >  #endif
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 6960e5ef3c14..8cb9fb04fff2 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -6633,6 +6633,74 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
> > >  }
> > >
> > >  /*
> > > + * Returns the util of "cpu" if "p" wakes up on "dst_cpu".
> > > + */
> > > +static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
> > > +{
> > > +	unsigned long util, util_est;
> > > +	struct cfs_rq *cfs_rq;
> > > +
> > > +	/* Task is where it should be, or has no impact on cpu */
> > > +	if ((task_cpu(p) == dst_cpu) || (cpu != task_cpu(p) && cpu != dst_cpu))
> > > +		return cpu_util(cpu);
> > > +
> > > +	cfs_rq = &cpu_rq(cpu)->cfs;
> > > +	util = READ_ONCE(cfs_rq->avg.util_avg);
> > > +
> > > +	if (dst_cpu == cpu)
> > > +		util += task_util(p);
> > > +	else
> > > +		util = max_t(long, util - task_util(p), 0);
> >
> > I tried to understand the logic here; the code below is clearer to me:
> >
> >	int prev_cpu = task_cpu(p);
> >
> >	cfs_rq = &cpu_rq(cpu)->cfs;
> >	util = READ_ONCE(cfs_rq->avg.util_avg);
> >
> >	/* Bail out if src and dst CPUs are the same one */
> >	if (prev_cpu == cpu && dst_cpu == cpu)
> >		return util;
> >
> >	/* Remove task utilization for src CPU */
> >	if (cpu == prev_cpu)
> >		util = max_t(long, util - task_util(p), 0);
> >
> >	/* Add task utilization for dst CPU */
> >	if (dst_cpu == cpu)
> >		util += task_util(p);
> >
> > BTW, the CPU utilization is a decayed value and task_util() is not a
> > decayed value, so 'util - task_util(p)' calculates a smaller value than
> > the prev CPU's pure utilization, right?
>
> task_util() is the raw PELT signal, without UTIL_EST, so I think it's
> fine to do `util - task_util()`.
>
> > Another question: can we reuse the function cpu_util_wake() and just
> > compensate the task util for the dst cpu?
>
> Well it's not that simple. cpu_util_wake() will give you the max between
> the util_avg and the util_est value, so which task_util() should you add
> to it? The util_avg or the util_est value?

If the 'UTIL_EST' feature is enabled, then add the task's util_est value;
otherwise add the task's util_avg value.

I think cpu_util_wake() has similar logic to the code here, but it merely
returns the CPU level util; here we need to accumulate CPU level util +
task level util. So it seems to me the logic is:
cpu_util_wake() + task_util_wake()

> Here we are trying to predict what will be the cpu_util signal in the
> future, so the only always-correct implementation of this function has
> to predict what will be the CPU util_avg and util_est signals in
> parallel and take the max of the two.

I totally agree with this; I just wanted to check whether we can reuse
existing code, so we have more consistent logic across the scheduler.
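To double check my reading of that "predict both signals in parallel and
take the max at the end" logic, here is a tiny user-space sketch (invented
PELT numbers, not kernel code) for the dst_cpu == cpu case:

  #include <stdio.h>

  #define max(a, b)       ((a) > (b) ? (a) : (b))

  int main(void)
  {
          /* Invented values for one CPU and the waking task. */
          unsigned long cpu_avg = 300, cpu_est = 350;
          unsigned long task_avg = 80, task_est = 120;

          /* Add the task's contribution to each signal separately... */
          unsigned long util = cpu_avg + task_avg;        /* 380 */
          unsigned long util_est = cpu_est + task_est;    /* 470 */

          /* ...and only take the max of the two predictions at the end. */
          printf("predicted util: %lu\n", max(util, util_est));
          return 0;
  }

Taking the max first (as a naive cpu_util_wake() + task util would do) and
then adding a single task contribution can give a different prediction,
which I think is the point you are making.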
> > > +	if (sched_feat(UTIL_EST)) {
> > > +		util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
> > > +		if (dst_cpu == cpu)
> > > +			util_est += _task_util_est(p);
> > > +		else
> > > +			util_est = max_t(long, util_est - _task_util_est(p), 0);
> > > +		util = max(util, util_est);
> > > +	}
> > > +
> > > +	return min_t(unsigned long, util, capacity_orig_of(cpu));
> > > +}
> > > +
> > > +/*
> > > + * Estimates the system level energy assuming that p wakes-up on dst_cpu.
> > > + *
> > > + * compute_energy() is safe to call only if an energy model is available for
> > > + * the platform, which is when sched_energy_enabled() is true.
> > > + */
> > > +static unsigned long compute_energy(struct task_struct *p, int dst_cpu)
> > > +{
> > > +	unsigned long util, max_util, sum_util;
> > > +	struct capacity_state *cs;
> > > +	unsigned long energy = 0;
> > > +	struct freq_domain *fd;
> > > +	int cpu;
> > > +
> > > +	for_each_freq_domain(fd) {
> > > +		max_util = sum_util = 0;
> > > +		for_each_cpu_and(cpu, freq_domain_span(fd), cpu_online_mask) {
> > > +			util = cpu_util_next(cpu, p, dst_cpu);
> > > +			util += cpu_util_dl(cpu_rq(cpu));
> > > +			max_util = max(util, max_util);
> > > +			sum_util += util;
> > > +		}
> > > +
> > > +		/*
> > > +		 * Here we assume that the capacity states of CPUs belonging to
> > > +		 * the same frequency domains are shared. Hence, we look at the
> > > +		 * capacity state of the first CPU and re-use it for all.
> > > +		 */
> > > +		cpu = cpumask_first(freq_domain_span(fd));
> > > +		cs = find_cap_state(cpu, max_util);
> > > +		energy += cs->power * sum_util / cs->cap;
> > > +	}
> >
> > This means all CPUs will be iterated for the calculation; the complexity
> > is O(n)...
> >
> > > +	return energy;
> > > +}
> > > +
> > > +/*
> > >   * select_task_rq_fair: Select target runqueue for the waking task in domains
> > >   * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
> > >   * SD_BALANCE_FORK, or SD_BALANCE_EXEC.
> > > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > > index 5d552c0d7109..6eb38f41d5d9 100644
> > > --- a/kernel/sched/sched.h
> > > +++ b/kernel/sched/sched.h
> > > @@ -2156,7 +2156,7 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
> > >  # define arch_scale_freq_invariant()	false
> > >  #endif
> > >
> > > -#ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
> > > +#ifdef CONFIG_SMP
> > >  static inline unsigned long cpu_util_dl(struct rq *rq)
> > >  {
> > >  	return (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
> > > --
> > > 2.11.0
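As a footnote on the energy term itself, here is a back-of-the-envelope
sketch with invented power/capacity numbers (not from any real energy
model), just to spell out energy += cs->power * sum_util / cs->cap:

  #include <stdio.h>

  int main(void)
  {
          /* Selected capacity state of one frequency domain (made up). */
          unsigned long cs_power = 400, cs_cap = 1024;

          /* Sum of the predicted utils of the two CPUs in the domain. */
          unsigned long sum_util = 300 + 150;

          /*
           * 400 * 450 / 1024 = 175 with integer division, i.e. the
           * domain's power at that OPP scaled by its predicted busy ratio.
           */
          printf("energy term: %lu\n", cs_power * sum_util / cs_cap);
          return 0;
  }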