From: Vincent Guittot
Date: Thu, 27 Aug 2015 13:21:16 +0200
Subject: Re: [PATCH v2] sched: fix nohz.next_balance update
To: Peter Zijlstra, Ingo Molnar, linux-kernel, Preeti U Murthy, Jason Low
Cc: Linaro Kernel Mailman List, Vincent Guittot
In-Reply-To: <1438595750-20455-1-git-send-email-vincent.guittot@linaro.org>
References: <1438595750-20455-1-git-send-email-vincent.guittot@linaro.org>

Hi,

On 3 August 2015 at 11:55, Vincent Guittot wrote:
> Since commit d4573c3e1c99 ("sched: Improve load balancing in the presence
> of idle CPUs"), the ILB CPU starts with the idle load balancing of other
> idle CPUs and finishes with itself in order to speed up the spread of tasks
> in all idle CPUs.
>
> The this_rq->next_balance is still used in nohz_idle_balance as an
> intermediate step to gather the shortest next balance before updating
> nohz.next_balance. But the former has not been updated yet and is likely to
> be set with the current jiffies. As a result, the nohz.next_balance will be
> set with current jiffies instead of the real next balance date. This
> generates spurious kicks of nohz idle balance.
>
> nohz_idle_balance must set the nohz.next_balance without taking into
> account this_rq->next_balance which is not updated yet. Then, this_rq will
> update nohz.next_balance with its next_balance once updated and if necessary.
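
For anyone skimming the thread, the problem described above can be modelled in a few lines of userspace C. This is only a rough sketch and not kernel code: HZ is an assumed value, time_after() is re-declared locally on the model of the kernel macro, and nohz_next_old() / nohz_next_new() are hypothetical helper names that do not exist in fair.c.

```c
#include <assert.h>
#include <stddef.h>

#define HZ 250UL	/* assumed tick rate, for this model only */

/* Wraparound-safe "a is after b", modelled on the kernel's time_after(). */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/*
 * Old behaviour (hypothetical helper): the running minimum is seeded from
 * this_rq->next_balance, which has not been refreshed yet and is typically
 * equal to the current jiffies.
 */
static unsigned long nohz_next_old(unsigned long this_rq_next_balance,
				   const unsigned long *idle_next, size_t n)
{
	unsigned long next = this_rq_next_balance;
	size_t i;

	assert(idle_next != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (time_after(next, idle_next[i]))
			next = idle_next[i];
	return next;
}

/*
 * Patched behaviour (hypothetical helper): seed the minimum from a local
 * far-future value and keep the current nohz.next_balance if no idle CPU
 * supplied an earlier date.
 */
static unsigned long nohz_next_new(unsigned long jiffies_now,
				   unsigned long current_nohz_next,
				   const unsigned long *idle_next, size_t n)
{
	unsigned long next = jiffies_now + 60 * HZ;
	int update = 0;
	size_t i;

	assert(idle_next != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (time_after(next, idle_next[i])) {
			next = idle_next[i];
			update = 1;
		}
	}
	return update ? next : current_nohz_next;
}
```

With a stale this_rq->next_balance equal to the current jiffies, the old variant returns the current jiffies whatever the other idle CPUs report, while the new variant returns the earliest date they reported. In the patch itself, this_rq then lowers nohz.next_balance from rebalance_domains() once its own next_balance has been refreshed.
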
>
> Signed-off-by: Vincent Guittot
> ---
>
> change since v1:
> - add #ifdef CONFIG_NO_HZ_COMMON for accessing nohz structure
> - fix some typos
>
>  kernel/sched/fair.c | 35 +++++++++++++++++++++++++++++++----
>  1 file changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 587a2f6..581378a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7779,8 +7779,23 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
>  	 * When the cpu is attached to null domain for ex, it will not be
>  	 * updated.
>  	 */
> -	if (likely(update_next_balance))
> +	if (likely(update_next_balance)) {
>  		rq->next_balance = next_balance;
> +
> +#ifdef CONFIG_NO_HZ_COMMON
> +		/*
> +		 * If this cpu has been elected to perform the nohz idle
> +		 * balance. Other idle cpus have already rebalanced with
> +		 * nohz_idle_balance and the nohz.next_balance has been
> +		 * updated accordingly. This cpu is now running the idle load
> +		 * balance for itself and we need to update the
> +		 * nohz.next_balance accordingly.
> +		 */
> +		if ((idle == CPU_IDLE) &&
> +		    time_after(nohz.next_balance, rq->next_balance))
> +			nohz.next_balance = rq->next_balance;
> +#endif
> +	}
>  }
>
>  #ifdef CONFIG_NO_HZ_COMMON
> @@ -7793,6 +7808,9 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  	int this_cpu = this_rq->cpu;
>  	struct rq *rq;
>  	int balance_cpu;
> +	/* Earliest time when we have to do rebalance again */
> +	unsigned long next_balance = jiffies + 60*HZ;
> +	int update_next_balance = 0;
>
>  	if (idle != CPU_IDLE ||
>  	    !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
> @@ -7824,10 +7842,19 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  			rebalance_domains(rq, CPU_IDLE);
>  		}
>
> -		if (time_after(this_rq->next_balance, rq->next_balance))
> -			this_rq->next_balance = rq->next_balance;
> +		if (time_after(next_balance, rq->next_balance)) {
> +			next_balance = rq->next_balance;
> +			update_next_balance = 1;
> +		}
>  	}
> -	nohz.next_balance = this_rq->next_balance;
> +
> +	/*
> +	 * next_balance will be updated only when there is a need.
> +	 * When the cpu is attached to null domain for ex, it will not be
> +	 * updated.
> +	 */
> +	if (likely(update_next_balance))
> +		nohz.next_balance = next_balance;
>  end:
>  	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
>  }
> --
> 1.9.1
>

Gentle ping

Regards,