Date: Tue, 28 Mar 2017 14:50:22 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: Yuyang Du <yuyang.du@intel.com>
Cc: mingo@kernel.org, linux-kernel@vger.kernel.org, pjt@google.com,
        bsegall@google.com, morten.rasmussen@arm.com,
        vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
        matt@codeblueprint.co.uk, umgwanakikbuti@gmail.com
Subject: Re: [RESEND PATCH 2/2] sched/fair: Optimize __update_sched_avg()
Message-ID: <20170328125022.75wlcrsnhobysxbj@hirez.programming.kicks-ass.net>
References: <1486935863-25251-1-git-send-email-yuyang.du@intel.com>
 <1486935863-25251-3-git-send-email-yuyang.du@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1486935863-25251-3-git-send-email-yuyang.du@intel.com>
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1750
Lines: 54

This Changelog being so impenetrable is what makes me skip over it;
I'll put it on the 'look-at-later' pile, and that just never happens :/

On Mon, Feb 13, 2017 at 05:44:23AM +0800, Yuyang Du wrote:
> __update_load_avg() has the following steps:
> 
>   1. add the remainder of the last incomplete period
>   2. decay old sum
>   3. accumulate new sum in full periods since last_update_time
>   4. accumulate the current incomplete period
>   5. update averages
> 
> However, there is no need to separately compute steps 1, 3, and 4.
> 
> Illustation:
> 
>              c1          c3           c4
>              ^           ^            ^
>              |           |            |
>            |<->|<----------------->|<--->|
>    ... |---x---|------| ... |------|-----x (now)
> 
> c1, c3, and c4 are the accumulated (meanwhile decayed) contributions
> in timing in steps 1, 3, and 4 respectively.
> 
> With them, the accumulated contribution to load_sum, for example, is:
> 
> contrib = c1 * weight * freq_scaled;
> contrib += c3 * weight * freq_scaled;
> contrib += c4 * weight * freq_scaled;
> 
> Obviously, we can optimize the above and they equate to:
> 
> contrib = c1 + c3 + c4;
> contrib *= weight * freq_scaled;

But that's not at all what's happening;

The equation is something like:

                       1 (p+1)/32            p+1 1 n/32
 load = (load' + c1) * -^          + 1024 * \Sum -^     + c4
                       2                     n=1 2

                                     <---------------->
				            c3


And then its 'obvious' you cannot do c1+c3+c4 anything.


The decay factor of each part (c1,c3,4) is different, so unless you
explain how that works, instead of hand-wave a bit, this isn't making
any sense.