Date: Tue, 18 Oct 2016 13:09:54 +0200
From: Peter Zijlstra
To: Vincent Guittot
Cc: Matt Fleming, Wanpeng Li, Ingo Molnar, "linux-kernel@vger.kernel.org",
    Mike Galbraith, Yuyang Du, Dietmar Eggemann
Subject: Re: [PATCH] sched/fair: Do not decay new task load on first enqueue
Message-ID: <20161018110954.GX3142@twins.programming.kicks-ass.net>
References: <20160928101422.GR5016@twins.programming.kicks-ass.net>
 <20160928193731.GD16071@codeblueprint.co.uk>
 <20161010100107.GZ16071@codeblueprint.co.uk>
 <20161010173440.GA28945@linaro.org>
 <20161011102453.GA16071@codeblueprint.co.uk>
 <20161011185759.GD16071@codeblueprint.co.uk>

On Wed, Oct 12, 2016 at 09:41:36AM +0200, Vincent Guittot wrote:
> ok. In fact, I have noticed another regression with tip/sched/core and
> hackbench while looking at yours. I have bisected it to:
>
>   10e2f1acd010 ("sched/core: Rewrite and improve select_idle_siblings")
>
> hackbench -P -g 1
>
>          v4.8        tip/sched/core   tip/sched/core + revert of
>                                       10e2f1acd010 and 1b568f0aabf2
>   min    0.051       0.052            0.049
>   avg    0.057 (0%)  0.062 (-7%)      0.056 (+1%)
>   max    0.070       0.073            0.067
>   stdev  +/-8%       +/-10%           +/-9%
>
> The issue seems to be that it prevents some migrations at wakeup at
> the end of the hackbench test, so the last tasks compete for the
> same CPU while other CPUs in the same MC domain are idle.
> I haven't had time to look more deeply yet at which part of the patch
> causes the regression.

So select_idle_cpu(), which does the LLC-wide CPU scan, is now throttled
by a comparison between avg_cost and avg_idle; avg_cost is a historical
measure of how costly it was to scan the entire LLC domain, and avg_idle
is our current idle-time guestimate (also a historical average).

The problem was that a number of workloads were spending quite a lot of
time here scanning CPUs while they could be doing useful work (esp.
since newer parts have silly numbers of CPUs per LLC).

The toggle is a heuristic with a random number in it.. we could see if
there's anything better we can do. I know some people take the toggle
out entirely, but that will regress other workloads.