Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755569Ab3FENbV (ORCPT ); Wed, 5 Jun 2013 09:31:21 -0400
Received: from mail-we0-f169.google.com ([74.125.82.169]:32780 "EHLO
	mail-we0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753464Ab3FENbR (ORCPT );
	Wed, 5 Jun 2013 09:31:17 -0400
Date: Wed, 5 Jun 2013 15:31:13 +0200
From: Frederic Weisbecker
To: Vincent Guittot
Cc: Peter Zijlstra, linux-kernel,
	"linaro-kernel@lists.linaro.org", Ingo Molnar
Subject: Re: [PATCH] sched: fix clear NOHZ_BALANCE_KICK
Message-ID: <20130605133110.GA26600@somewhere>
References: <1369927385-7801-1-git-send-email-vincent.guittot@linaro.org>
	<20130603224836.GA9388@somewhere>
	<20130604093611.GJ8923@twins.programming.kicks-ass.net>
	<20130604102620.GB14012@somewhere>
	<20130604111900.GB14973@somewhere>
	<20130604144451.GJ14973@somewhere>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4022
Lines: 88

On Tue, Jun 04, 2013 at 05:29:39PM +0200, Vincent Guittot wrote:
> On 4 June 2013 16:44, Frederic Weisbecker wrote:
> > On Tue, Jun 04, 2013 at 01:48:47PM +0200, Vincent Guittot wrote:
> >> On 4 June 2013 13:19, Frederic Weisbecker wrote:
> >> > On Tue, Jun 04, 2013 at 01:11:47PM +0200, Vincent Guittot wrote:
> >> >> On 4 June 2013 12:26, Frederic Weisbecker wrote:
> >> >> > On Tue, Jun 04, 2013 at 11:36:11AM +0200, Peter Zijlstra wrote:
> >> >> >>
> >> >> >> The best I can seem to come up with is something like the below; but I think
> >> >> >> its ghastly. Surely we can do something saner with that bit.
> >> >> >>
> >> >> >> Having to clear it at 3 different places is just wrong.
> >> >> >
> >> >> > We could clear the flag early in scheduler_ipi() and set some
> >> >> > specific value in rq->idle_balance that tells we want nohz idle
> >> >> > balancing from the softirq, something like this untested:
> >> >>
> >> >> I'm not sure that we can have less than 2 places to clear it: cancel
> >> >> place or acknowledge place otherwise we can face a situation where
> >> >> idle load balance will be triggered 2 consecutive times because
> >> >> NOHZ_BALANCE_KICK will be cleared before the idle load balance has
> >> >> been done and had a chance to migrate tasks.
> >> >
> >> > I guess it depends what is the minimum value of rq->next_balance, it seems
> >> > to be large enough to avoid this kind of incident. Although I don't
> >> > know well the whole logic with rq->next_balance and ilb trigger so I must
> >> > defer to you.
> >>
> >> In the trace that was showing the issue, i can see that both CPU0 and
> >> CPU1 were trying to trig ILB almost simultaneously and the
> >> test_and_set NOHZ_BALANCE_KICK filters one request so i would say that
> >> clearing the bit before the end of the idle load balance sequence can
> >> generate such sequence
> >
> > I see.
> >
> >>
> >> In the sequence below, i have minimized the clear of NOHZ_BALANCE_KICK
> >> in 2 places : acknowledge and cancel.
> >> I have reused part of the proposal from peter which clears the bit if
> >> the condition doesn't match but i have reordered the tests to done
> >> that only if all other condition are matching
> >>
> >> static inline bool got_nohz_idle_kick(void)
> >> {
> >> -	int cpu = smp_processor_id();
> >> -	return idle_cpu(cpu) && test_bit(NOHZ_BALANCE_KICK, nohz_flags(cpu));
> >> +	bool nohz_kick = test_bit(NOHZ_BALANCE_KICK, nohz_flags(cpu));
> >> +
> >> +	if (!nohz_kick)
> >> +		return false;
> >> +
> >> +	if (idle_cpu(cpu) && !need_resched())
> >> +		return true;
> >> +
> >> +	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(cpu));
> >> +	return false;
> >> }
> >>
> >> #else /* CONFIG_NO_HZ_COMMON */
> >> @@ -1393,8 +1401,9 @@ static void sched_ttwu_pending(void)
> >>
> >> void scheduler_ipi(void)
> >> {
> >> -	if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick()
> >> -			&& !tick_nohz_full_cpu(smp_processor_id()))
> >> +	if (llist_empty(&this_rq()->wake_list)
> >> +			&& !tick_nohz_full_cpu(smp_processor_id())
> >> +			&& !got_nohz_idle_kick())
> >> 		return;
> >
> > But we still need got_nohz_idle_kick() to be the first check, don't we? Otherwise
> > if we run an "idle -> quick task slice -> idle" sequence we may keep the flag
> > but lose the notifying IPI in between.
>
> I'm not sure to catch the sequence you are describing above: "idle ->
> quick task slice -> idle".
> In addition, got_nohz_idle_kick must be the last tested condition (in
> my proposal) in order to clear NOHZ_BALANCE_KICK only if we are sure
> that we are going to return without possibility to trig the Idle load
> balance

Right, sorry for the confusion.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/