Date: Tue, 18 Oct 2016 14:07:44 +0200
From: Peter Zijlstra
To: Dietmar Eggemann
Cc: Vincent Guittot, Joseph Salisbury, Ingo Molnar, Linus Torvalds,
    Thomas Gleixner, LKML, Mike Galbraith, omer.akram@canonical.com
Subject: Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes
Message-ID: <20161018120744.GZ3142@twins.programming.kicks-ass.net>

On Tue, Oct 18, 2016 at 12:15:11PM +0100, Dietmar Eggemann wrote:
> On 18/10/16 10:07, Peter Zijlstra wrote:
> > On Mon, Oct 17, 2016 at 11:52:39PM +0100, Dietmar Eggemann wrote:

> > On IRC you mentioned that adding list_add_leaf_cfs_rq() to
> > online_fair_sched_group() cures this, this would actually match with
> > unregister_fair_sched_group() doing list_del_leaf_cfs_rq() and avoid
> > a few instructions on the enqueue path, so that's all good.
>
> Yes, I was able to recreate a similar problem (not related to the cpu
> masks) on ARM64 (6 logical cpus). I created 100 2nd-level tg's but only
> put one task (no cpu affinity, so it could run on multiple cpus) in one
> of these tg's (mainly to see the related cfs_rq's in /proc/sched_debug).
>
> I get a remaining .tg_load_avg : 49898 for cfs_rq[x]:/tg_1

Ah, and since all those CPUs are online, we decay all that load away.
OK, makes sense now.

> > I'm just not immediately seeing how that cures things. The only
> > relevant user of the leaf_cfs_rq list seems to be
> > update_blocked_averages(), which is called from the balance code
> > (idle_balance() and rebalance_domains()). But neither should call
> > that for offline (or !present) CPUs.
>
> Assuming this is load from the 99 2nd-level tg's which never had a task
> running, putting list_add_leaf_cfs_rq() into online_fair_sched_group()
> for all cpus makes sure that all the 'blocked load' gets decayed.
>
> Doing what Vincent just suggested, not initializing tg se's w/ 1024 but
> w/ 0 instead, prevents this from being necessary.

Indeed. I just worry about the cases where we do not propagate the load
up, e.g. the stuff fixed by:

  1476695653-12309-5-git-send-email-vincent.guittot@linaro.org

If we hit an intermediary cgroup with 0 load, we might get some
interactivity issues.

But it could be I got lost again :-)
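
For reference, a rough sketch of the online_fair_sched_group() change
being discussed; helper names are assumed from kernel/sched/fair.c
around v4.8, and this is an illustration, not the actual patch:

	/*
	 * Sketch only: hook each group cfs_rq onto the leaf_cfs_rq list
	 * when the task group comes online, so update_blocked_averages()
	 * also decays the blocked load of groups that never had a task
	 * enqueued on that CPU.
	 */
	void online_fair_sched_group(struct task_group *tg)
	{
		struct sched_entity *se;
		struct rq *rq;
		int i;

		for_each_possible_cpu(i) {
			rq = cpu_rq(i);
			se = tg->se[i];

			raw_spin_lock_irq(&rq->lock);
			post_init_entity_util_avg(se);
			list_add_leaf_cfs_rq(tg->cfs_rq[i]);	/* proposed addition */
			raw_spin_unlock_irq(&rq->lock);
		}
	}

The alternative Vincent suggested would instead leave a group se's
load_avg at 0 at init time (rather than starting it at 1024), which
avoids needing the list addition but runs into the 0-load intermediary
cgroup concern above.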