From: Vincent Guittot
Date: Thu, 7 Apr 2016 15:04:27 +0200
Subject: Re: [PATCH RFC] sched/fair: let cpu's cfs_rq to reflect task migration
To: Dietmar Eggemann
Cc: Morten Rasmussen, Yuyang Du, Leo Yan, Steve Muckle, Peter Zijlstra,
 Ingo Molnar, linux-kernel, eas-dev@lists.linaro.org

Hi Dietmar,

On 6 April 2016 at 20:53, Dietmar Eggemann wrote:
> On 06/04/16 09:37, Morten Rasmussen wrote:
>> On Tue, Apr 05, 2016 at 06:00:40PM +0100, Dietmar Eggemann wrote:
>>> @@ -2893,8 +2906,12 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>>  	se->avg.last_update_time = cfs_rq->avg.last_update_time;
>>>  	cfs_rq->avg.load_avg += se->avg.load_avg;
>>>  	cfs_rq->avg.load_sum += se->avg.load_sum;
>>> -	cfs_rq->avg.util_avg += se->avg.util_avg;
>>> -	cfs_rq->avg.util_sum += se->avg.util_sum;
>>> +
>>> +	if (!entity_is_task(se))
>>> +		return;
>>> +
>>> +	rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
>>> +	rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
>>
>> To me it seems that you cannot be sure that the rq_of(cfs_rq)->cfs.avg
>> timestamp is aligned with the se->avg timestamp, which is necessary
>> before you can add/subtract two geometric series without introducing
>> an error.
>>
>> attach_entity_load_avg() is called (through a couple of other
>> functions) from the for_each_sched_entity() loop in
>> enqueue_task_fair(), which works its way towards the root cfs_rq,
>> i.e. rq_of(cfs_rq)->cfs. So in the loop iteration where you attach the
>> task sched_entity, we haven't yet visited and updated
>> rq_of(cfs_rq)->cfs.avg.
>>
>> If you just add the task contribution and later discover that there is
>> a time delta when you update rq_of(cfs_rq)->cfs.avg, you end up
>> decaying the task contribution, which was already up to date, and its
>> util contribution to rq_of(cfs_rq)->cfs.avg ends up being smaller than
>> it should be.
>>
>> Am I missing something?
>
> No, that's definitely wrong in the current patch. I could defer the
> attach into the update_cfs_rq_load_avg() call for the root cfs_rq if
> '&rq_of(cfs_rq)->cfs != cfs_rq' in attach_entity_load_avg():
>
> Something like this (only lightly tested!):
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 51d675715776..d31d9cd453a1 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2856,6 +2856,16 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>  	decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
>  		scale_load_down(cfs_rq->load.weight), cfs_rq->curr != NULL, cfs_rq);
>
> +	if (cfs_rq->added_util_avg) {
> +		sa->util_avg += cfs_rq->added_util_avg;
> +		cfs_rq->added_util_avg = 0;
> +	}
> +
> +	if (cfs_rq->added_util_sum) {
> +		sa->util_sum += cfs_rq->added_util_sum;
> +		cfs_rq->added_util_sum = 0;
> +	}
> +
>  #ifndef CONFIG_64BIT
>  	smp_wmb();
>  	cfs_rq->load_last_update_time_copy = sa->last_update_time;
> @@ -2910,8 +2920,13 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  	if (!entity_is_task(se))
>  		return;
>
> -	rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
> -	rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
> +	if (&rq_of(cfs_rq)->cfs == cfs_rq) {
> +		rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
> +		rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
> +	} else {
> +		rq_of(cfs_rq)->cfs.added_util_avg = se->avg.util_avg;
> +		rq_of(cfs_rq)->cfs.added_util_sum = se->avg.util_sum;
> +	}
>  }

Don't you also need a similar deferral for the detach path?
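Something like this, maybe? Completely untested sketch; instead of adding
a second pair of fields I reuse the root cfs_rq's existing
removed_util_avg accumulator here, which update_cfs_rq_load_avg() already
folds in, but a dedicated field would work the same way:

static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	/* load_avg/load_sum handling unchanged ... */

	if (!entity_is_task(se))
		return;

	if (&rq_of(cfs_rq)->cfs == cfs_rq) {
		/* The se sits on the root cfs_rq itself: subtract directly. */
		rq_of(cfs_rq)->cfs.avg.util_avg = max_t(long,
			rq_of(cfs_rq)->cfs.avg.util_avg - se->avg.util_avg, 0);
		rq_of(cfs_rq)->cfs.avg.util_sum = max_t(s32,
			rq_of(cfs_rq)->cfs.avg.util_sum - se->avg.util_sum, 0);
	} else {
		/*
		 * Defer the subtraction; it is applied by the next
		 * update_cfs_rq_load_avg() on the root cfs_rq.
		 */
		atomic_long_add(se->avg.util_avg,
				&rq_of(cfs_rq)->cfs.removed_util_avg);
	}
}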
>
>  static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> @@ -4344,8 +4359,11 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  		cfs_rq = cfs_rq_of(se);
>  		cfs_rq->h_nr_running++;
>
> -		if (cfs_rq_throttled(cfs_rq))
> +		if (cfs_rq_throttled(cfs_rq)) {
> +			rq_of(cfs_rq)->cfs.added_util_avg = 0;
> +			rq_of(cfs_rq)->cfs.added_util_sum = 0;
>  			break;
> +		}
>
>  		update_load_avg(se, 1);
>  		update_cfs_shares(cfs_rq);
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index b2ff5a2bd6df..f0ea3a7eaf07 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -391,6 +391,8 @@ struct cfs_rq {
>  	unsigned long tg_load_avg_contrib;
>  #endif
>  	atomic_long_t removed_load_avg, removed_util_avg;
> +	u32 added_util_sum;
> +	unsigned long added_util_avg;
>  #ifndef CONFIG_64BIT
>  	u64 load_last_update_time_copy;
>
> But attach_entity_load_avg() is not only called in
> enqueue_entity_load_avg() for migrated tasks but also in
> attach_task_cfs_rq(), which is called from switched_to_fair() and
> task_move_group_fair(), where we can't assume that a call to
> update_cfs_rq_load_avg() follows the attach, like it does in
>
> enqueue_task_fair():
>
>     for_each_sched_entity(se)
>         enqueue_entity()
>             enqueue_entity_load_avg()
>                 update_cfs_rq_load_avg(now, cfs_rq)
>                 if (migrated) attach_entity_load_avg()
>
>     for_each_sched_entity(se)
>         update_load_avg()
>             update_cfs_rq_load_avg(now, cfs_rq)
>
> Not sure if we can just update the root cfs_rq to
> se->avg.last_update_time before we add se->avg.util_[avg/sum] to
> rq_of(cfs_rq)->cfs.avg.util_[avg/sum] in attach_entity_load_avg()?
>
> cfs_rq throttling has to be considered as well ...

IIUC this new proposal, the utilization of a task will be accounted in the
utilization of the root cfs_rq anyway, via the
tsk->se->cfs_rq->tg->se[cpu]->... chain down to the root cfs_rq. On top of
that, you directly add the utilization of the newly enqueued task to the
root cfs_rq. Doesn't the task's utilization then end up being accounted
twice at the root level?
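As an aside, the error Morten described from misaligned timestamps is easy
to put numbers on, using only the standard PELT constants: the signal
decays by a factor y per 1024us period, with y^32 = 1/2. So if, say, the
root cfs_rq's last_update_time lags the task's se by 32 periods at attach
time and we add se->avg.util_avg = 256 directly, the next
__update_load_avg() on the root decays the whole signal by y^32, and the
freshly attached, already up-to-date contribution ends up as 128 instead
of 256. That's why the add really has to happen only once the root cfs_rq
has been brought up to date.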