Message-ID: <55A3F8A9.2060807@arm.com>
Date: Mon, 13 Jul 2015 18:43:05 +0100
From: Dietmar Eggemann
To: Rabin Vincent
Cc: Yuyang Du, Morten Rasmussen, Mike Galbraith, Peter Zijlstra, "mingo@redhat.com", "linux-kernel@vger.kernel.org", Paul Turner, Ben Segall
Subject: Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
In-Reply-To: <20150707111757.GA24839@axis.com>

On 07/07/15 12:17, Rabin Vincent wrote:
> On Mon, Jul 06, 2015 at 07:36:56PM +0200, Dietmar Eggemann wrote:
>> Rabin, could you share the content of your
>> /sys/fs/cgroup/cpu/system.slice directory and of /proc/cgroups ?
>
> Here's /proc/cgroups,
>
> # cat /proc/cgroups
> #subsys_name hierarchy num_cgroups enabled
> cpu 2 98 1
> cpuacct 2 98 1
>
> and the contents of /sys/fs/cgroup/cpu/system.slice are available here:
> https://drive.google.com/file/d/0B4tMLbMvJ-l6ZVBvZ09QOE15MU0/view
>
> /Rabin
>

So why not maintain a runnable signal for the task group se's, at least
to figure out whether the 118 is coming from blocked load?

-- >8 --

Subject: [PATCH] sched: Maintain a runnable version of tg->load_avg and cfs_rq->tg_load_contrib

Including blocked load in the load average contribution of sched
entities (se->avg.load_avg_contrib) representing task groups can lead
to scenarios where the imbalance is greater than sum(task_h_load(p))
for all tasks p on the src rq.

To avoid this, use cfs_rq->runnable_tg_load_contrib and
tg->runnable_load_avg to calculate se->avg.load_avg_contrib for sched
entities representing task groups.

Both runnable-based values are updated in lockstep with the existing
values.

The existing tg->load_avg and cfs_rq->tg_load_contrib are still used to
calculate the task group weight.
Signed-off-by: Dietmar Eggemann
---
 kernel/sched/fair.c  | 11 ++++++++---
 kernel/sched/sched.h |  2 ++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 587a2f67ceb1..f2cfbaaf5700 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2647,7 +2647,7 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
 						 int force_update)
 {
 	struct task_group *tg = cfs_rq->tg;
-	long tg_contrib;
+	long tg_contrib, runnable_tg_contrib;
 
 	tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
 	tg_contrib -= cfs_rq->tg_load_contrib;
@@ -2655,9 +2655,14 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
 	if (!tg_contrib)
 		return;
 
+	runnable_tg_contrib = cfs_rq->runnable_load_avg;
+	runnable_tg_contrib -= cfs_rq->runnable_tg_load_contrib;
+
 	if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
 		atomic_long_add(tg_contrib, &tg->load_avg);
 		cfs_rq->tg_load_contrib += tg_contrib;
+		atomic_long_add(runnable_tg_contrib, &tg->runnable_load_avg);
+		cfs_rq->runnable_tg_load_contrib += runnable_tg_contrib;
 	}
 }
 
@@ -2690,9 +2695,9 @@ static inline void __update_group_entity_contrib(struct sched_entity *se)
 
 	u64 contrib;
 
-	contrib = cfs_rq->tg_load_contrib * tg->shares;
+	contrib = cfs_rq->runnable_tg_load_contrib * tg->shares;
 	se->avg.load_avg_contrib = div_u64(contrib,
-				     atomic_long_read(&tg->load_avg) + 1);
+				     atomic_long_read(&tg->runnable_load_avg) + 1);
 
 	/*
 	 * For group entities we need to compute a correction term in the case
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 84d48790bb6d..eed74e5efe91 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -245,6 +245,7 @@ struct task_group {
 
 #ifdef CONFIG_SMP
 	atomic_long_t load_avg;
+	atomic_long_t runnable_load_avg;
 	atomic_t runnable_avg;
 #endif
 #endif
@@ -386,6 +387,7 @@ struct cfs_rq {
 	/* Required to track per-cpu representation of a task_group */
 	u32 tg_runnable_contrib;
 	unsigned long tg_load_contrib;
+	unsigned long runnable_tg_load_contrib;
 
 	/*
 	 * h_load = weight * f(tg)
-- 
1.9.1
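To make the effect of the patch concrete, here is a minimal single-threaded user-space model in C. It is a sketch, not kernel code: the structs model_tg and model_cfs_rq and the helpers update_tg_load_contrib(), group_contrib_total() and group_contrib_runnable() are hypothetical names. They mirror only the fields and arithmetic in the diff, with atomic_long_t replaced by plain long, the kernel's abs() by labs(), div_u64() by plain integer division, and no PELT decay.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of struct task_group (atomics replaced by longs). */
struct model_tg {
	long load_avg;          /* sum of per-CPU tg_load_contrib (runnable + blocked) */
	long runnable_load_avg; /* sum of per-CPU runnable_tg_load_contrib */
	unsigned long shares;
};

/* Hypothetical model of the per-CPU struct cfs_rq of a task group. */
struct model_cfs_rq {
	struct model_tg *tg;
	unsigned long runnable_load_avg;
	unsigned long blocked_load_avg;
	unsigned long tg_load_contrib;          /* last value pushed to tg->load_avg */
	unsigned long runnable_tg_load_contrib; /* last value pushed to tg->runnable_load_avg */
};

/*
 * Mirrors the patched __update_cfs_rq_tg_load_contrib(): the runnable
 * delta is propagated only when the total (runnable + blocked) delta is
 * forced or exceeds 1/8 of the last propagated total.
 */
static void update_tg_load_contrib(struct model_cfs_rq *cfs_rq, int force_update)
{
	struct model_tg *tg = cfs_rq->tg;
	long tg_contrib, runnable_tg_contrib;

	assert(tg != NULL);

	tg_contrib = (long)(cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg);
	tg_contrib -= (long)cfs_rq->tg_load_contrib;
	if (!tg_contrib)
		return;

	runnable_tg_contrib = (long)cfs_rq->runnable_load_avg;
	runnable_tg_contrib -= (long)cfs_rq->runnable_tg_load_contrib;

	if (force_update || labs(tg_contrib) > (long)(cfs_rq->tg_load_contrib / 8)) {
		tg->load_avg += tg_contrib;
		cfs_rq->tg_load_contrib += tg_contrib; /* unsigned wrap-around is intended */
		tg->runnable_load_avg += runnable_tg_contrib;
		cfs_rq->runnable_tg_load_contrib += runnable_tg_contrib;
	}
}

/* Pre-patch group entity contribution: blocked load included. */
static unsigned long group_contrib_total(const struct model_cfs_rq *cfs_rq)
{
	unsigned long long contrib =
		(unsigned long long)cfs_rq->tg_load_contrib * cfs_rq->tg->shares;

	return (unsigned long)(contrib / (unsigned long long)(cfs_rq->tg->load_avg + 1));
}

/* Patched group entity contribution: runnable load only. */
static unsigned long group_contrib_runnable(const struct model_cfs_rq *cfs_rq)
{
	unsigned long long contrib =
		(unsigned long long)cfs_rq->runnable_tg_load_contrib * cfs_rq->tg->shares;

	return (unsigned long)(contrib /
			       (unsigned long long)(cfs_rq->tg->runnable_load_avg + 1));
}
```

As an illustration with made-up numbers: take shares = 1024, one CPU whose group cfs_rq has 10 units runnable and 990 blocked, and a second CPU with 100 units runnable and nothing blocked. After a forced update on both, the pre-patch formula gives the first CPU's group entity 930 and the second 93. The runnable-only formula gives 92 and 922. If the second CPU's runnable load then drops from 100 to 90, the change stays below the tg_load_contrib / 8 threshold (12), so neither group sum moves. A drop to 80 propagates both deltas together, which is what "updated in lockstep" means here.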