Message-ID: <55A3F8A9.2060807@arm.com>
Date: Mon, 13 Jul 2015 18:43:05 +0100
From: Dietmar Eggemann <dietmar.eggemann@arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: Rabin Vincent <rabin.vincent@axis.com>
CC: Yuyang Du <yuyang.du@intel.com>,
        Morten Rasmussen <Morten.Rasmussen@arm.com>,
        Mike Galbraith <umgwanakikbuti@gmail.com>,
        Peter Zijlstra <peterz@infradead.org>,
        "mingo@redhat.com" <mingo@redhat.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Paul Turner <pjt@google.com>, Ben Segall <bsegall@google.com>
Subject: Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
References: <20150701145551.GA15690@axis.com> <20150701204404.GH25159@twins.programming.kicks-ass.net> <20150701232511.GA5197@intel.com> <1435824347.5351.18.camel@gmail.com> <20150702010539.GB5197@intel.com> <20150702114032.GA7598@e105550-lin.cambridge.arm.com> <20150702193702.GD5197@intel.com> <20150703093441.GA15477@e105550-lin.cambridge.arm.com> <20150705201241.GE5197@intel.com> <559ABCB8.6020209@arm.com> <20150707111757.GA24839@axis.com>
In-Reply-To: <20150707111757.GA24839@axis.com>
Content-Type: text/plain; charset=WINDOWS-1252
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3782
Lines: 114

On 07/07/15 12:17, Rabin Vincent wrote:
> On Mon, Jul 06, 2015 at 07:36:56PM +0200, Dietmar Eggemann wrote:
>> Rabin, could you share the content of your
>> /sys/fs/cgroup/cpu/system.slice directory and of /proc/cgroups ?
> 
> Here's /proc/cgroups,
> 
> # cat /proc/cgroups 
> #subsys_name	hierarchy	num_cgroups	enabled
> cpu	2	98	1
> cpuacct	2	98	1
> 
> and the contents of /sys/fs/cgroup/cpu/system.slice are available here:
> https://drive.google.com/file/d/0B4tMLbMvJ-l6ZVBvZ09QOE15MU0/view
> 
> /Rabin
> 

So why not maintain a runnable signal for the task group se's as well?
At the very least, it would let us figure out whether the 118 is coming
from blocked load.
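
To illustrate the effect, here is a minimal user-space sketch of the
division done in __update_group_entity_contrib(). It is not kernel code:
group_se_contrib() is a hypothetical helper name, and the load values in
the note below are made up rather than taken from Rabin's system. It only
shows how the group se's contribution changes depending on whether
blocked load is part of the inputs.

```c
#include <stdint.h>

/*
 * Hypothetical user-space model of the group entity contribution:
 *
 *   contrib = cfs_rq_contrib * shares / (tg_load + 1)
 *
 * cfs_rq_contrib: this CPU's share of the task group load
 * tg_load:        the task group load summed over all CPUs
 * shares:         the task group's shares (e.g. 1024)
 *
 * The "+ 1" avoids a division by zero when the group has no load.
 */
static inline unsigned long group_se_contrib(unsigned long cfs_rq_contrib,
					     unsigned long tg_load,
					     unsigned long shares)
{
	/* Widen before multiplying so the product cannot overflow on 32-bit. */
	uint64_t contrib = (uint64_t)cfs_rq_contrib * shares;

	return (unsigned long)(contrib / ((uint64_t)tg_load + 1));
}
```

Assume a group with shares = 1024 on two CPUs: CPU0 has only blocked
load (500, nothing runnable) and CPU1 has runnable load 100. With blocked
load included, CPU0's group se gets group_se_contrib(500, 600, 1024) = 851
although there is no task on CPU0 that could be pulled. With
runnable-only inputs it gets group_se_contrib(0, 100, 1024) = 0.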

-- >8 --
Subject: [PATCH] sched: Maintain a runnable version of tg->load_avg and
 cfs_rq->tg_load_contrib

Including blocked load in the load average contribution of sched
entities (se->avg.load_avg_contrib) representing task groups can
lead to scenarios where the imbalance is greater than
sum(task_h_load(p)) for all tasks p on the src rq.

To avoid this, use cfs_rq->runnable_tg_load_contrib and
tg->runnable_load_avg to calculate se->avg.load_avg_contrib for
sched entities representing task groups.

Both runnable-based values are updated at the same time as the
existing values.

The existing tg->load_avg and cfs_rq->tg_load_contrib are still
used to calculate task group weight.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/fair.c  | 11 ++++++++---
 kernel/sched/sched.h |  2 ++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 587a2f67ceb1..f2cfbaaf5700 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2647,7 +2647,7 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
 						 int force_update)
 {
 	struct task_group *tg = cfs_rq->tg;
-	long tg_contrib;
+	long tg_contrib, runnable_tg_contrib;
 
 	tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
 	tg_contrib -= cfs_rq->tg_load_contrib;
@@ -2655,9 +2655,14 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
 	if (!tg_contrib)
 		return;
 
+	runnable_tg_contrib = cfs_rq->runnable_load_avg;
+	runnable_tg_contrib -= cfs_rq->runnable_tg_load_contrib;
+
 	if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
 		atomic_long_add(tg_contrib, &tg->load_avg);
 		cfs_rq->tg_load_contrib += tg_contrib;
+		atomic_long_add(runnable_tg_contrib, &tg->runnable_load_avg);
+		cfs_rq->runnable_tg_load_contrib += runnable_tg_contrib;
 	}
 }
 
@@ -2690,9 +2695,9 @@ static inline void __update_group_entity_contrib(struct sched_entity *se)
 
 	u64 contrib;
 
-	contrib = cfs_rq->tg_load_contrib * tg->shares;
+	contrib = cfs_rq->runnable_tg_load_contrib * tg->shares;
 	se->avg.load_avg_contrib = div_u64(contrib,
-				     atomic_long_read(&tg->load_avg) + 1);
+				atomic_long_read(&tg->runnable_load_avg) + 1);
 
 	/*
 	 * For group entities we need to compute a correction term in the case
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 84d48790bb6d..eed74e5efe91 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -245,6 +245,7 @@ struct task_group {
 
 #ifdef	CONFIG_SMP
 	atomic_long_t load_avg;
+	atomic_long_t runnable_load_avg;
 	atomic_t runnable_avg;
 #endif
 #endif
@@ -386,6 +387,7 @@ struct cfs_rq {
 	/* Required to track per-cpu representation of a task_group */
 	u32 tg_runnable_contrib;
 	unsigned long tg_load_contrib;
+	unsigned long runnable_tg_load_contrib;
 
 	/*
 	 *   h_load = weight * f(tg)
-- 
1.9.1
