Date: Thu, 14 Apr 2016 02:44:12 +0800
From: Yuyang Du <yuyang.du@intel.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>,
        Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>,
        linux-kernel <linux-kernel@vger.kernel.org>,
        Benjamin Segall <bsegall@google.com>, Paul Turner <pjt@google.com>,
        Morten Rasmussen <morten.rasmussen@arm.com>,
        Juri Lelli <juri.lelli@arm.com>
Subject: Re: [PATCH 2/4] sched/fair: Drop out incomplete current period when
 sched averages accrue
Message-ID: <20160413184412.GO8697@intel.com>
References: <1460327765-18024-1-git-send-email-yuyang.du@intel.com>
 <1460327765-18024-3-git-send-email-yuyang.du@intel.com>
 <570E61FE.4060000@arm.com>
 <CAKfTPtCc3t+g7hESEJ53EnwzvOGr+oE_NF=eOvq0YVo-YXN=HQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAKfTPtCc3t+g7hESEJ53EnwzvOGr+oE_NF=eOvq0YVo-YXN=HQ@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2137
Lines: 60

On Wed, Apr 13, 2016 at 05:28:18PM +0200, Vincent Guittot wrote:
> > For a periodic task, the signals really get much more unstable. Even for
> > a steady state (load/util related) periodic task there is a meander
> > pattern which depends on if we for instance hit a dequeue (decay +
> > accrue) or an enqueue (decay only) after the 1ms has elapsed.
> >
> > IMHO, 1ms is too big to create signals describing task and cpu load/util
> > signals given the current scheduler dynamics. We simply see too many
> > signal driving points (e.g. enqueue/dequeue) bailing out of
> > __update_load_avg().

By "bailing out", do you mean returning without an update because the delta
is less than 1ms?

> > Examples of 1 periodic task pinned to a cpu on an ARM64 system, HZ=250
> > in steady state:
> >
> > (1) task runtime = 100us period = 200us
> >
> >   pelt          load/util signal
> >
> >   1us:          488-491
> >
> >   1ms:          483-534

100us/200us = 50%, so the util should center around 512. In this regard the
1ms case seems better, but its variance is undesirable.
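
As a rough cross-check of the 512 figure, here is a minimal sketch of an
idealized, continuously updated decaying average with the PELT half-life
(32 periods of 1024us). It is not __update_load_avg(): the function names and
the per-microsecond model are assumptions made for illustration, so it will
not reproduce the measured ranges quoted above.

```python
SCALE = 1024          # full-scale util value
PERIOD_US = 1024      # PELT period length in microseconds
HALF_LIFE_PERIODS = 32  # the signal halves every 32 periods

def expected_util(runtime_us, period_us, scale=SCALE):
    """Duty cycle scaled to the util range (hypothetical helper)."""
    return scale * runtime_us / period_us

def steady_state_range(runtime_us, period_us, scale=SCALE):
    """Trough and peak of an idealized per-microsecond decaying average
    for a strictly periodic task (assumed model, not the kernel code)."""
    # Per-microsecond decay factor d, so that d ** (32 * 1024) == 0.5.
    d = 0.5 ** (1.0 / (HALF_LIFE_PERIODS * PERIOD_US))
    # Peak is reached at the end of the running phase.
    peak = scale * (1 - d ** runtime_us) / (1 - d ** period_us)
    # Trough is reached at the end of the sleeping phase.
    trough = peak * d ** (period_us - runtime_us)
    return trough, peak

if __name__ == "__main__":
    for runtime_us, period_us in ((100, 200), (100, 1000)):
        low, high = steady_state_range(runtime_us, period_us)
        print(runtime_us, period_us,
              expected_util(runtime_us, period_us), low, high)
```

In this idealized model the trough and peak always bracket the duty-cycle
value, which is why 512 is the natural center for the 100us/200us case.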

> > We get ~2 dequeues (load/util example: 493->504) and ~2 enqueues
> > (load/util example: 496->483) in the meander pattern in the 1ms case.
> >
> > (2) task runtime = 100us period = 1000us
> >
> >   pelt          load/util signal
> >
> >   1us:          103-105
> >
> >   1ms:           84-145
> >
> > We get ~3-4 dequeues (load/util example: 104->124->134->140) and ~16-20
> > enqueues (load/util example: 137->134->...->99->97) in the meander
> > pattern in the 1ms case.

The same as above.

> 
> yes, similarly i have some use cases with 2ms running task in a period
> of 5.12ms. it will be seen either as a 1ms running task or a 2ms
> running tasks depending on how the running is synced with the 1ms
> boundary
> 
> so the load will vary between 197-215 up to 396-423 depending of when
> the 1ms boundary occurs in the 2ms running
> 

Same as above, and this time the util is "expected" to be 2/5.242*1024=391
averaged over all the samples. We solve the over-decay problem, but the
precision loss is a new problem.
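
To make the boundary effect concrete, here is a small sketch under the
assumption that only complete 1024us periods are accounted, so a run is
"seen" as the number of period boundaries it crosses. The helper names are
hypothetical and this is only one reading of the 1ms-granularity behaviour,
not the patch itself. It also redoes the arithmetic above.

```python
PERIOD_US = 1024  # PELT period length in microseconds

def expected_util(runtime_ms, period_ms, scale=1024):
    """Duty cycle scaled to the util range (hypothetical helper)."""
    return scale * runtime_ms / period_ms

def periods_seen(start_us, runtime_us, period_us=PERIOD_US):
    """Number of period boundaries crossed by a run that starts at
    start_us and lasts runtime_us (assumed accounting model)."""
    end_us = start_us + runtime_us
    return end_us // period_us - start_us // period_us

if __name__ == "__main__":
    # The figure used above: 2 / 5.242 * 1024.
    print(round(expected_util(2, 5.242)))
    # With the 5.12ms period from the quoted text the result differs.
    print(round(expected_util(2, 5.12)))
    # A 2000us run crosses one or two boundaries depending on its offset.
    print(sorted({periods_seen(off, 2000) for off in range(PERIOD_US)}))
```

Depending on the offset of the run within a period, the same 2ms of running
is accounted as either one or two full periods, which matches the two load
ranges reported in the quoted text.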

Let me see if we can get to a 2-level period scheme :)