Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753315AbbDBJQY (ORCPT ); Thu, 2 Apr 2015 05:16:24 -0400 Received: from foss.arm.com ([217.140.101.70]:39813 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753070AbbDBJQV (ORCPT ); Thu, 2 Apr 2015 05:16:21 -0400 Date: Thu, 2 Apr 2015 10:17:04 +0100 From: Morten Rasmussen To: Jason Low Cc: Preeti U Murthy , "peterz@infradead.org" , "mingo@kernel.org" , Daniel Lezcano , "riel@redhat.com" , "vincent.guittot@linaro.org" , "srikar@linux.vnet.ibm.com" , "pjt@google.com" , "benh@kernel.crashing.org" , "efault@gmx.de" , "linux-kernel@vger.kernel.org" , "iamjoonsoo.kim@lge.com" , "svaidy@linux.vnet.ibm.com" , "tim.c.chen@linux.intel.com" Subject: Re: sched: Improve load balancing in the presence of idle CPUs Message-ID: <20150402091704.GZ18994@e105550-lin.cambridge.arm.com> References: <1427741729.5694.24.camel@j-VirtualBox> <551A5CCE.70008@linux.vnet.ibm.com> <1427828056.2492.24.camel@j-VirtualBox> <551B9514.80701@linux.vnet.ibm.com> <20150401170418.GX18994@e105550-lin.cambridge.arm.com> <1427954347.2556.43.camel@j-VirtualBox> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1427954347.2556.43.camel@j-VirtualBox> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3925 Lines: 104 On Thu, Apr 02, 2015 at 06:59:07AM +0100, Jason Low wrote: > On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote: > > On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote: > > > > I am sorry I don't quite get this. Can you please elaborate? > > > > I think the scenario is that we are in nohz_idle_balance() and decide to > > bail out because we have pulled some tasks, but before leaving > > nohz_idle_balance() we want to check if more balancing is necessary > > using nohz_kick_needed() and potentially kick somebody to continue. > > Also, below is an example patch. > > (Without the conversion to idle_cpu(), the check for rq->idle_balance > would not be accurate anymore) > > --- > kernel/sched/fair.c | 17 ++++++++++------- > 1 files changed, 10 insertions(+), 7 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index fdae26e..7749a14 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -7620,6 +7620,8 @@ out: > } > > #ifdef CONFIG_NO_HZ_COMMON > +static inline bool nohz_kick_needed(struct rq *rq); > + > /* > * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the > * rebalancing for all the cpus for whom scheduler ticks are stopped. > @@ -7629,6 +7631,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) > int this_cpu = this_rq->cpu; > struct rq *rq; > int balance_cpu; > + bool done_balancing = false; > > if (idle != CPU_IDLE || > !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu))) > @@ -7644,7 +7647,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) > * balancing owner will pick it up. > */ > if (need_resched()) > - break; > + goto end; > > rq = cpu_rq(balance_cpu); > > @@ -7663,9 +7666,12 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) > if (time_after(this_rq->next_balance, rq->next_balance)) > this_rq->next_balance = rq->next_balance; > } > + done_balancing = true; > nohz.next_balance = this_rq->next_balance; > end: > clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)); > + if (!done_balancing && nohz_kick_needed(this_rq)) > + nohz_balancer_kick(); > } > > /* > @@ -7687,7 +7693,7 @@ static inline bool nohz_kick_needed(struct rq *rq) > int nr_busy, cpu = rq->cpu; > bool kick = false; > > - if (unlikely(rq->idle_balance)) > + if (unlikely(idle_cpu(cpu))) > return false; > > /* > @@ -7757,16 +7763,13 @@ static void run_rebalance_domains(struct softirq_action *h) > enum cpu_idle_type idle = this_rq->idle_balance ? > CPU_IDLE : CPU_NOT_IDLE; > > + rebalance_domains(this_rq, idle); > /* > * If this cpu has a pending nohz_balance_kick, then do the > * balancing on behalf of the other idle cpus whose ticks are > - * stopped. Do nohz_idle_balance *before* rebalance_domains to > - * give the idle cpus a chance to load balance. Else we may > - * load balance only within the local sched_domain hierarchy > - * and abort nohz_idle_balance altogether if we pull some load. > + * stopped. > */ > nohz_idle_balance(this_rq, idle); > - rebalance_domains(this_rq, idle); > } I think this should reduce the latency Preeti is seeing and avoid unnecessary wake-ups, however, it may not be quite as aggressive in spreading tasks quickly. It will stop the chain-of-kicks as soon as the balancer cpu has pulled only one task. The source cpu may still be having two tasks and other cpus may still have more than two tasks running. Depending on how bad it is, we could consider kicking another cpu if the imbalance is still significant after the balancer cpu has pulled a task. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/