Date: Tue, 30 Jun 2015 16:30:58 +0200
From: Rabin Vincent
To: mingo@redhat.com, peterz@infradead.org
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
Message-ID: <20150630143057.GA31689@axis.com>

Hi,

We're seeing a livelock where two CPUs both loop with interrupts
disabled in pick_next_task_fair() / idle_balance() and continuously
pull all tasks from each other.  This has been observed on a 3.18
kernel, but the relevant code paths appear to be the same in current
mainline.

The problem situation appears to be this:

 cpu0                               cpu1

 pick_next_task
  take rq0->lock
  pick_next_task_fair
   running = 0
   idle_balance()
    drop rq0->lock
                                    wakeup A,B on cpu0
                                    pick_next_task
                                     take rq1->lock
                                     pick_next_task_fair
                                      running = 0
                                      idle_balance()
                                       drop rq1->lock
                                       load_balance()
                                        busiest = rq0
                                        take rq0->lock
                                        detach A,B
                                        drop rq0->lock
                                        take rq1->lock
                                        attach A,B
                                        pulled_task = 2
                                        drop rq1->lock
    load_balance()
     busiest = rq1
     take rq1->lock
     detach A,B
     drop rq1->lock
     take rq0->lock
     attach A,B
     pulled_task = 2
     drop rq0->lock
                                      running = 0
                                      idle_balance()
                                       busiest = rq0, pull A,B, etc.
   running = 0
   load_balance()
    busiest = rq1, pull A,B, etc.

And this goes on, with interrupts disabled on both CPUs, for at least
100 ms (which is when a hardware watchdog hits).

The conditions needed, apart from the right timing, are:

 - cgroups.  One of the tasks, say A, needs to be in a CPU cgroup.
   When the problem occurs, A's ->se has a load_avg_contrib of zero
   and task_h_load(A) is zero, but A's se->parent->parent has a
   (relatively) high load_avg_contrib.  cpu0's cfs_rq therefore has a
   relatively high runnable_load_avg, so find_busiest_group() detects
   an imbalance, and detach_tasks() detaches all tasks (see the
   sketch after this list).

 - PREEMPT=n.  Otherwise, the code under #ifdef CONFIG_PREEMPT in
   detach_tasks() would ensure that we only ever pull a maximum of
   one task during idle balancing.

 - cpu0 and cpu1 are threads on the same core (cpus_share_cache() ==
   true).  Otherwise, cpu1 would not be able to wake up tasks on cpu0
   while cpu0 has interrupts disabled, since an IPI would be
   required.  Turning off the default TTWU_QUEUE feature would have
   the same effect.
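To make the "detach_tasks() detaches all tasks" condition concrete,
here is a minimal userspace sketch of that loop's termination logic
(detach_tasks_sketch() and its parameters are invented for
illustration; this is not the kernel function): when task_h_load() is
zero for every candidate, env->imbalance never decreases, so only
running out of tasks (or the CONFIG_PREEMPT break) ends the loop.

#include <stdio.h>

struct task { long h_load; };   /* stand-in for task_h_load(p) */

static int detach_tasks_sketch(const struct task *tasks, int nr,
                               long imbalance, int preempt_break)
{
        int detached = 0;

        for (int i = 0; i < nr && imbalance > 0; i++) {
                long load = tasks[i].h_load;

                /* skip tasks that would overshoot the imbalance;
                 * never taken when load == 0 */
                if (load / 2 > imbalance)
                        continue;

                detached++;
                imbalance -= load;      /* no progress when load == 0 */

                /* the code under #ifdef CONFIG_PREEMPT: stop after
                 * one task during NEWIDLE balancing */
                if (preempt_break)
                        break;
        }
        return detached;
}

int main(void)
{
        struct task ab[2] = { { 0 }, { 0 } };   /* A, B: task_h_load() == 0 */

        /* the group se's load_avg_contrib makes find_busiest_group()
         * report a large imbalance despite the zero per-task loads */
        printf("PREEMPT=n: detached %d of 2\n",
               detach_tasks_sketch(ab, 2, 1024, 0));
        printf("PREEMPT=y: detached %d of 2\n",
               detach_tasks_sketch(ab, 2, 1024, 1));
        return 0;
}

Built with any C compiler, this prints "detached 2 of 2" for the
PREEMPT=n case, i.e. the runqueue is emptied completely, which is
what lets the two CPUs ping-pong the tasks indefinitely.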
I see two simple ways to prevent the livelock.  One is to just remove
the #ifdef:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3d57cc0..74c94dc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5937,7 +5937,6 @@ static int detach_tasks(struct lb_env *env)
 		detached++;
 		env->imbalance -= load;
 
-#ifdef CONFIG_PREEMPT
 		/*
 		 * NEWIDLE balancing is a source of latency, so preemptible
 		 * kernels will stop after the first task is detached to minimize
@@ -5945,7 +5944,6 @@ static int detach_tasks(struct lb_env *env)
 		 */
 		if (env->idle == CPU_NEWLY_IDLE)
 			break;
-#endif
 
 		/*
 		 * We only want to steal up to the prescribed amount of

Or this should work too:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3d57cc0..13358cf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7487,6 +7487,8 @@ static int idle_balance(struct rq *this_rq)
 	 */
 	if (this_rq->cfs.h_nr_running && !pulled_task)
 		pulled_task = 1;
+	else if (!this_rq->cfs.h_nr_running && pulled_task)
+		pulled_task = 0;
 
 out:
 	/* Move the next balance forward */

But it seems that the real problem is that detach_tasks() can remove
all tasks when cgroups are involved, as described above?

Thanks.

/Rabin