Subject: Re: [PATCH RFC] sched/fair: let cpu's cfs_rq to reflect task migration
From: Dietmar Eggemann
To: Morten Rasmussen
Cc: Yuyang Du, Leo Yan, Steve Muckle, Peter Zijlstra, Ingo Molnar,
    Vincent Guittot, linux-kernel@vger.kernel.org, eas-dev@lists.linaro.org
Date: Wed, 6 Apr 2016 19:53:53 +0100
Message-ID: <57055B41.6000906@arm.com>
In-Reply-To: <20160406083702.GE18516@e105550-lin.cambridge.arm.com>

On 06/04/16 09:37, Morten Rasmussen wrote:
> On Tue, Apr 05, 2016 at 06:00:40PM +0100, Dietmar Eggemann wrote:
>> @@ -2893,8 +2906,12 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>>  	se->avg.last_update_time = cfs_rq->avg.last_update_time;
>>  	cfs_rq->avg.load_avg += se->avg.load_avg;
>>  	cfs_rq->avg.load_sum += se->avg.load_sum;
>> -	cfs_rq->avg.util_avg += se->avg.util_avg;
>> -	cfs_rq->avg.util_sum += se->avg.util_sum;
>> +
>> +	if (!entity_is_task(se))
>> +		return;
>> +
>> +	rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
>> +	rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
>
> To me it seems that you cannot be sure that the rq_of(cfs_rq)->cfs.avg
> time stamp is aligned with the se->avg time stamp, which is necessary
> before you can add/subtract two geometric series without introducing an
> error.
>
> attach_entity_load_avg() is called (through a couple of other functions)
> from the for_each_sched_entity() loop in enqueue_task_fair(), which works
> its way towards the root cfs_rq, i.e. rq_of(cfs_rq)->cfs. So in the loop
> iteration where you attach the task sched_entity, we haven't yet visited
> and updated rq_of(cfs_rq)->cfs.avg.
>
> If you just add the task contribution and later discover a time delta
> when you update rq_of(cfs_rq)->cfs.avg, you end up decaying the task
> contribution even though it was already up to date, so its util
> contribution to rq_of(cfs_rq)->cfs.avg ends up smaller than it should
> be.
>
> Am I missing something?

No, that's definitely wrong in the current patch.
I could defer the attach into the update_cfs_rq_load_avg() call for the
root cfs_rq if '&rq_of(cfs_rq)->cfs != cfs_rq' in
attach_entity_load_avg(). Something like this (only lightly tested!):

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 51d675715776..d31d9cd453a1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2856,6 +2856,16 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 	decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
 		scale_load_down(cfs_rq->load.weight), cfs_rq->curr != NULL, cfs_rq);

+	if (cfs_rq->added_util_avg) {
+		sa->util_avg += cfs_rq->added_util_avg;
+		cfs_rq->added_util_avg = 0;
+	}
+
+	if (cfs_rq->added_util_sum) {
+		sa->util_sum += cfs_rq->added_util_sum;
+		cfs_rq->added_util_sum = 0;
+	}
+
 #ifndef CONFIG_64BIT
 	smp_wmb();
 	cfs_rq->load_last_update_time_copy = sa->last_update_time;
@@ -2910,8 +2920,13 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 	if (!entity_is_task(se))
 		return;

-	rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
-	rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
+	if (&rq_of(cfs_rq)->cfs == cfs_rq) {
+		rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
+		rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
+	} else {
+		rq_of(cfs_rq)->cfs.added_util_avg = se->avg.util_avg;
+		rq_of(cfs_rq)->cfs.added_util_sum = se->avg.util_sum;
+	}
 }

 static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
@@ -4344,8 +4359,11 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		cfs_rq = cfs_rq_of(se);
 		cfs_rq->h_nr_running++;

-		if (cfs_rq_throttled(cfs_rq))
+		if (cfs_rq_throttled(cfs_rq)) {
+			rq_of(cfs_rq)->cfs.added_util_avg = 0;
+			rq_of(cfs_rq)->cfs.added_util_sum = 0;
 			break;
+		}

 		update_load_avg(se, 1);
 		update_cfs_shares(cfs_rq);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b2ff5a2bd6df..f0ea3a7eaf07 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -391,6 +391,8 @@ struct cfs_rq {
 	unsigned long tg_load_avg_contrib;
 #endif
 	atomic_long_t removed_load_avg, removed_util_avg;
+	u32 added_util_sum;
+	unsigned long added_util_avg;
 #ifndef CONFIG_64BIT
 	u64 load_last_update_time_copy;

But attach_entity_load_avg() is not only called in
enqueue_entity_load_avg() for migrated tasks. It is also called in
attach_task_cfs_rq(), reached from switched_to_fair() and
task_move_group_fair(), where we can't assume that
enqueue_entity_load_avg() is followed by a call to
update_cfs_rq_load_avg() the way it is in enqueue_task_fair():

  for_each_sched_entity(se)
    enqueue_entity()
      enqueue_entity_load_avg()
        update_cfs_rq_load_avg(now, cfs_rq)
        if (migrated)
          attach_entity_load_avg()

  for_each_sched_entity(se)
    update_load_avg()
      update_cfs_rq_load_avg(now, cfs_rq)

Not sure if we can just update the root cfs_rq to
se->avg.last_update_time before we add se->avg.util_[avg/sum] to
rq_of(cfs_rq)->cfs.avg.util_[avg/sum] in attach_entity_load_avg()?

cfs_rq throttling has to be considered as well ...