Date: Tue, 25 Apr 2017 11:49:41 -0700
From: Tejun Heo
To: Vincent Guittot
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Linus Torvalds,
    Mike Galbraith, Paul Turner, Chris Mason, kernel-team@fb.com
Subject: Re: [PATCH 2/2] sched/fair: Always propagate runnable_load_avg
Message-ID: <20170425184941.GB15593@wtj.duckdns.org>
References: <20170424201344.GA14169@wtj.duckdns.org> <20170424201444.GC14169@wtj.duckdns.org>
User-Agent: Mutt/1.8.0 (2017-02-23)

Hello,

On Tue, Apr 25, 2017 at 02:59:18PM +0200, Vincent Guittot wrote:
> >> So you are changing the purpose of propagate_entity_load_avg which
> >> aims to propagate load_avg/util_avg changes only when a task migrates
> >> and you also want to propagate the enqueue/dequeue in the parent
> >> cfs_rq->runnable_load_avg

Yeah, it always propagates runnable_load_avg, and load_avg/util_avg too on migrations.

> > In fact you want that sched_entity load_avg reflects
> > cfs_rq->runnable_load_avg and not cfs_rq->avg.load_avg

Yes, that's how it gets changed. The load balancer assumes that the root's runnable_load_avg is the total sum of all currently active tasks. Nesting cfs_rq's shouldn't change that, and how it should be mapped is clearly defined (scaled recursively till it reaches the root), which is what the code calculates.
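The recursive scaling can be sketched as a toy model (plain Python, not
the kernel code; the tree structure, class, and function names here are
purely illustrative assumptions):

```python
# Toy model of how a nested cfs_rq's runnable_load_avg maps to the root:
# each task's load is scaled by the group entity's share at every level
# on the way up. Structure and names are illustrative only.

class CfsRq:
    def __init__(self, task_load=0.0):
        self.task_load = task_load  # load of tasks queued directly here
        self.children = []          # list of (group_share, child CfsRq)

    def add_group(self, share, child):
        self.children.append((share, child))
        return child

def runnable_load_avg(rq):
    """Root-visible runnable load: direct task load plus each child
    cfs_rq's load scaled by its group entity's share, recursively."""
    total = rq.task_load
    for share, child in rq.children:
        total += share * runnable_load_avg(child)
    return total

root = CfsRq(task_load=2.0)
grp = root.add_group(0.5, CfsRq(task_load=4.0))  # group entity, half weight
grp.add_group(0.25, CfsRq(task_load=8.0))        # nested group, quarter weight

# 2.0 + 0.5 * (4.0 + 0.25 * 8.0) = 5.0
print(runnable_load_avg(root))
```

In this model, once every task dequeues (all task_load drop to 0) the
root's runnable_load_avg immediately drops to 0 as well, which is the
invariant the test discussion below is probing.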
The change in cfs_rq->avg.load_avg's behavior is to reflect that immediate propagation, as load_avg and runnable_load_avg are tightly coupled. While it does change a nested cfs_rq's load_avg behavior, it sheds off the extra layer of averaging and directly reflects the scaled load averages of its members, which are already time-averaged. I could have missed something but couldn't spot anything which can break from this.

> I have run a quick test with your patches and schbench on my platform.
> I haven't been able to reproduce your regression but my platform is
> quite different from yours (only 8 cores without SMT).
> But most importantly, the parent cfs_rq->runnable_load_avg never
> reaches 0 (or almost 0) when it is idle. Instead, it still has a
> runnable_load_avg (this is not due to rounding computation) whereas
> runnable_load_avg should be 0

Heh, let me try that out. Probably a silly mistake somewhere.

> Just to be curious, is your regression still there if you disable
> SMT/hyperthreading on your platform?

Will try that too. I can't see why HT would change it, because I see single-CPU queues misevaluated. Just in case: you need to tune the test params so that it doesn't load the machine too much, and so that there are some non-CPU-intensive workloads going on to perturb things a bit. Anyways, I'm gonna try disabling HT.

Thanks.

-- 
tejun