Date: Mon, 4 Apr 2016 10:01:08 +0100
From: Morten Rasmussen
To: Leo Yan
Cc: Ingo Molnar, Peter Zijlstra, Dietmar Eggemann, Vincent Guittot,
	Steve Muckle, linux-kernel@vger.kernel.org, eas-dev@lists.linaro.org
Subject: Re: [PATCH RFC] sched/fair: let cpu's cfs_rq to reflect task migration
Message-ID: <20160404090108.GB18516@e105550-lin.cambridge.arm.com>
References: <1459528717-17339-1-git-send-email-leo.yan@linaro.org>
In-Reply-To: <1459528717-17339-1-git-send-email-leo.yan@linaro.org>

On Sat, Apr 02, 2016 at 12:38:37AM +0800, Leo Yan wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0fe30e6..10ca1a9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2825,12 +2825,24 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
>  static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>  {
>  	struct sched_avg *sa = &cfs_rq->avg;
> +	struct sched_avg *cpu_sa = NULL;
>  	int decayed, removed = 0;
> +	int cpu = cpu_of(rq_of(cfs_rq));
> +
> +	if (&cpu_rq(cpu)->cfs != cfs_rq)
> +		cpu_sa = &cpu_rq(cpu)->cfs.avg;
>
>  	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
>  		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
>  		sa->load_avg = max_t(long, sa->load_avg - r, 0);
>  		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
> +
> +		if (cpu_sa) {
> +			cpu_sa->load_avg = max_t(long, cpu_sa->load_avg - r, 0);
> +			cpu_sa->load_sum = max_t(s64,
> +				cpu_sa->load_sum - r * LOAD_AVG_MAX, 0);

I think this is not quite right. 'r' is the load contribution that was
removed from a group cfs_rq. Unless the task is the only task in the
group and the group shares are at their default, this value is
different from the task's load contribution at root level
(task_h_load()).

And what about nested groups? I think you may end up removing the
contribution of a task from the root cfs_rq multiple times if the task
is in a nested group.

You will need to either redesign group scheduling so that the load
trickles down automatically with the right contribution, or maintain a
new sum at the root that bypasses the existing design. I'm not sure the
latter makes sense for load, though. It would make more sense for
utilization only.

> @@ -2919,6 +2939,13 @@ skip_aging:
>  	cfs_rq->avg.load_sum += se->avg.load_sum;
>  	cfs_rq->avg.util_avg += se->avg.util_avg;
>  	cfs_rq->avg.util_sum += se->avg.util_sum;
> +
> +	if (&cpu_rq(cpu)->cfs != cfs_rq) {
> +		cpu_rq(cpu)->cfs.avg.load_avg += se->avg.load_avg;
> +		cpu_rq(cpu)->cfs.avg.load_sum += se->avg.load_sum;
> +		cpu_rq(cpu)->cfs.avg.util_avg += se->avg.util_avg;
> +		cpu_rq(cpu)->cfs.avg.util_sum += se->avg.util_sum;
> +	}

Same problem as above. You should be adding the right amount of task
contribution here, something similar to task_h_load(). The problem with
using task_h_load() is that it is not updated immediately either, so
maintaining a stable signal based on it might be difficult.
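
To make the distinction concrete, the kind of scaling I mean would look
roughly like the untested sketch below. It reuses the existing h_load
machinery in fair.c (update_cfs_rq_h_load(), cfs_rq->h_load,
cfs_rq_load_avg()) the same way task_h_load() does; entity_h_load() is
just a made-up name for illustration, not an existing function:

	/*
	 * Illustration only: scale a sched entity's load_avg by the
	 * ratio of its cfs_rq's hierarchical load (h_load) to the
	 * cfs_rq's own load, like task_h_load() does for a task. The
	 * result approximates the entity's contribution as seen by the
	 * root cfs_rq.
	 */
	static unsigned long entity_h_load(struct sched_entity *se)
	{
		struct cfs_rq *cfs_rq = cfs_rq_of(se);

		update_cfs_rq_h_load(cfs_rq);
		return div64_ul(se->avg.load_avg * cfs_rq->h_load,
				cfs_rq_load_avg(cfs_rq) + 1);
	}

The root cfs_rq would then be adjusted by that value rather than by the
raw 'r' or se->avg.load_avg. But, as said above, h_load is only
refreshed lazily, so this doesn't give you an immediately up-to-date
signal either.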