Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S964934AbbLOKVb (ORCPT ); Tue, 15 Dec 2015 05:21:31 -0500
Received: from mail-lf0-f50.google.com ([209.85.215.50]:36237 "EHLO
	mail-lf0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S964803AbbLOKV3 (ORCPT );
	Tue, 15 Dec 2015 05:21:29 -0500
MIME-Version: 1.0
In-Reply-To: <20151214002658.GD28098@intel.com>
References: <1448372970-8764-1-git-send-email-vincent.guittot@linaro.org>
	<20151214002658.GD28098@intel.com>
From: Vincent Guittot
Date: Tue, 15 Dec 2015 11:21:08 +0100
Message-ID:
Subject: Re: [PATCH] sched/fair: update scale invariance of pelt
To: Yuyang Du
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen,
	Linaro Kernel Mailman List, Dietmar Eggemann, Paul Turner,
	Benjamin Segall
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6617
Lines: 155

On 14 December 2015 at 01:26, Yuyang Du wrote:
> Hi Vincent,
>
> I don't quite catch what this is doing; maybe I need more time
> to ramp up on gory details like this.
>
> Do you scale or not scale? You seem to have removed the scaling, but
> added it back after "Remainder of delta accrued against u_0".

I'm scaling the time before feeding it into the PELT algorithm. My
reply to Morten's comment tries to explain in more depth what I'm
trying to achieve.

Thanks,
Vincent

>
> Thanks,
> Yuyang
>
> On Tue, Nov 24, 2015 at 02:49:30PM +0100, Vincent Guittot wrote:
>> The current implementation of load tracking invariance scales the load
>> tracking value with the current frequency and uarch performance (only for
>> utilization) of the CPU.
>>
>> One main result of the current formula is that the figures are capped by
>> the current capacity of the CPU. This limitation is the main reason for not
>> including the uarch invariance (arch_scale_cpu_capacity) in the calculation
>> of load_avg, because capping the load can generate erroneous system load
>> statistics, as described in this example [1].
>>
>> Instead of scaling the complete output of the PELT algorithm, we should
>> only scale the running time by the current capacity of the CPU. It seems
>> more correct to only scale the running time because the non-running time
>> of a task (sleeping or waiting for a runqueue) is the same whatever the
>> current frequency and the compute capacity of the CPU.
>>
>> One main advantage of this change is that the load of a task can then
>> reach the max value whatever the current frequency and the uarch of the
>> CPU on which it runs. It will just take more time at a lower frequency
>> than at max frequency, or on a "little" CPU compared to a "big" one. The
>> load and the utilization stay invariant across the system, so we can
>> still compare them between CPUs, but with a wider range of values.
>>
>> With this change, we don't have to test whether a CPU is overloaded in
>> order to pick one metric (util) or another (load), as all metrics are
>> always valid.
>>
>> I have put below some examples of the duration needed to reach some
>> typical load values, according to the capacity of the CPU, with the
>> current implementation and with this patch.
>>
>> Util (%)      max capacity   half capacity (mainline)   half capacity (w/ patch)
>> 972 (95%)     138ms          not reachable              276ms
>> 486 (47.5%)   30ms           138ms                      60ms
>> 256 (25%)     13ms           32ms                       26ms
>>
>> We can see that at half capacity we need twice the duration of the max
>> capacity case with this patch, whereas we have a non-linear increase of
>> the duration with the current implementation.
>>
>> [1] https://lkml.org/lkml/2014/12/18/128
>>
>> Signed-off-by: Vincent Guittot
>> ---
>>  kernel/sched/fair.c | 28 +++++++++++++---------------
>>  1 file changed, 13 insertions(+), 15 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 824aa9f..f2a18e1 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2560,10 +2560,9 @@ static __always_inline int
>>  __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>  		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
>>  {
>> -	u64 delta, scaled_delta, periods;
>> +	u64 delta, periods;
>>  	u32 contrib;
>> -	unsigned int delta_w, scaled_delta_w, decayed = 0;
>> -	unsigned long scale_freq, scale_cpu;
>> +	unsigned int delta_w, decayed = 0;
>>
>>  	delta = now - sa->last_update_time;
>>  	/*
>> @@ -2584,8 +2583,10 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>  		return 0;
>>  	sa->last_update_time = now;
>>
>> -	scale_freq = arch_scale_freq_capacity(NULL, cpu);
>> -	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
>> +	if (running) {
>> +		delta = cap_scale(delta, arch_scale_freq_capacity(NULL, cpu));
>> +		delta = cap_scale(delta, arch_scale_cpu_capacity(NULL, cpu));
>> +	}
>>
>>  	/* delta_w is the amount already accumulated against our next period */
>>  	delta_w = sa->period_contrib;
>> @@ -2601,16 +2602,15 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>  		 * period and accrue it.
>>  		 */
>>  		delta_w = 1024 - delta_w;
>> -		scaled_delta_w = cap_scale(delta_w, scale_freq);
>>  		if (weight) {
>> -			sa->load_sum += weight * scaled_delta_w;
>> +			sa->load_sum += weight * delta_w;
>>  			if (cfs_rq) {
>>  				cfs_rq->runnable_load_sum +=
>> -						weight * scaled_delta_w;
>> +						weight * delta_w;
>>  			}
>>  		}
>>  		if (running)
>> -			sa->util_sum += scaled_delta_w * scale_cpu;
>> +			sa->util_sum += delta_w << SCHED_CAPACITY_SHIFT;
>>
>>  		delta -= delta_w;
>>
>> @@ -2627,25 +2627,23 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>
>>  		/* Efficiently calculate \sum (1..n_period) 1024*y^i */
>>  		contrib = __compute_runnable_contrib(periods);
>> -		contrib = cap_scale(contrib, scale_freq);
>>  		if (weight) {
>>  			sa->load_sum += weight * contrib;
>>  			if (cfs_rq)
>>  				cfs_rq->runnable_load_sum += weight * contrib;
>>  		}
>>  		if (running)
>> -			sa->util_sum += contrib * scale_cpu;
>> +			sa->util_sum += contrib << SCHED_CAPACITY_SHIFT;
>>  	}
>>
>>  	/* Remainder of delta accrued against u_0` */
>> -	scaled_delta = cap_scale(delta, scale_freq);
>>  	if (weight) {
>> -		sa->load_sum += weight * scaled_delta;
>> +		sa->load_sum += weight * delta;
>>  		if (cfs_rq)
>> -			cfs_rq->runnable_load_sum += weight * scaled_delta;
>> +			cfs_rq->runnable_load_sum += weight * delta;
>>  	}
>>  	if (running)
>> -		sa->util_sum += scaled_delta * scale_cpu;
>> +		sa->util_sum += delta << SCHED_CAPACITY_SHIFT;
>>
>>  	sa->period_contrib += delta;
>>
>> --
>> 1.9.1
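
As an illustration of the figures in the table above, here is a minimal
user-space sketch (not kernel code and not part of the patch) of the
closed-form geometric model PELT follows, assuming a task that runs
continuously and ignoring intra-period remainders. The helper names
time_mainline() and time_patch() are hypothetical, introduced only for this
example; it reproduces the table's values to within a millisecond of
rounding.

/*
 * util(t) = scale * (1 - y^t), with y^32 = 0.5 and t in ms of running time.
 *
 * - mainline: the per-period contribution is scaled, so the signal is
 *   capped at the CPU capacity and 972 is never reachable at half capacity;
 * - with the patch: the running time is scaled instead, so the signal still
 *   converges to 1024, it just takes proportionally more wall-clock time.
 *
 * Compile with: gcc pelt_model.c -lm
 */
#include <math.h>
#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024.0

/* per-millisecond decay factor used by PELT: y^32 = 0.5 */
static const double y = 0.97857206208770013;

/* wall-clock ms to reach 'util' when the contribution is scaled (mainline) */
static double time_mainline(double util, double capacity)
{
	double frac = util / capacity;		/* fraction of the capped maximum */

	if (frac >= 1.0)
		return INFINITY;		/* capped: not reachable */
	return log(1.0 - frac) / log(y);
}

/* wall-clock ms to reach 'util' when the running time is scaled (patch) */
static double time_patch(double util, double capacity)
{
	double t = log(1.0 - util / SCHED_CAPACITY_SCALE) / log(y);

	/* at half capacity, "PELT time" advances at half the wall-clock rate */
	return t * SCHED_CAPACITY_SCALE / capacity;
}

int main(void)
{
	static const double targets[] = { 972.0, 486.0, 256.0 };

	printf("util   max capacity   half cap (mainline)   half cap (patch)\n");
	for (int i = 0; i < 3; i++) {
		double u = targets[i];

		printf("%4.0f   %9.0f ms   %16.0f ms   %13.0f ms\n",
		       u,
		       time_patch(u, 1024.0),	/* same as mainline at max capacity */
		       time_mainline(u, 512.0),
		       time_patch(u, 512.0));
	}
	return 0;
}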