Date: Thu, 9 Jul 2015 08:15:49 +0200
From: Stefan Ekenberg
To: Yuyang Du
Cc: Peter Zijlstra, Rabin Vincent, Mike Galbraith, "mingo@redhat.com", "linux-kernel@vger.kernel.org", Paul Turner, Ben Segall, Morten Rasmussen
Subject: Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
Message-ID: <20150709061540.GA1289@axis.com>
In-Reply-To: <20150705221151.GF5197@intel.com>
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

I tested the patch on a setup with 7 devices, all running the same troublesome
use-case in parallel (the same use-case we used to produce the crash dumps).
This use-case was previously able to reproduce the problem about 21 times
during 24 hours. With the patch included, the setup ran perfectly for
48 hours. So to summarize: patch tested OK.
Tested-by: Stefan Ekenberg

On Mon, Jul 06, 2015 at 12:11:51AM +0200, Yuyang Du wrote:
> On Fri, Jul 03, 2015 at 06:39:28PM +0200, Peter Zijlstra wrote:
> > On Thu, Jul 02, 2015 at 07:25:11AM +0800, Yuyang Du wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 40a7fcb..f7cc1ef 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -5898,6 +5898,10 @@ static int detach_tasks(struct lb_env *env)
> > >  		return 0;
> > >
> > >  	while (!list_empty(tasks)) {
> > > +
> > > +		if (env->idle == CPU_NEWLY_IDLE && env->src_rq->nr_running <= 1)
> >
> > Should we make that ->idle != CPU_NOT_IDLE ?
>
> I think including CPU_IDLE is good.
>
> --
> Subject: [PATCH] sched: Avoid pulling all tasks in idle balancing
>
> In idle balancing, where a CPU going idle pulls tasks from another CPU,
> a livelock may happen if the CPU pulls all tasks from another, makes
> it idle, and this iterates. So just avoid this.
>
> Reported-by: Rabin Vincent
> Signed-off-by: Yuyang Du
> ---
>  kernel/sched/fair.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 40a7fcb..769d591 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5898,6 +5898,13 @@ static int detach_tasks(struct lb_env *env)
>  		return 0;
>
>  	while (!list_empty(tasks)) {
> +		/*
> +		 * We don't want to steal all, otherwise we may be treated likewise,
> +		 * which could at worst lead to a livelock crash.
> +		 */
> +		if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
> +			break;
> +
>  		p = list_first_entry(tasks, struct task_struct, se.group_node);
>
>  		env->loop++;