Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933016AbdC1Mvs (ORCPT ); Tue, 28 Mar 2017 08:51:48 -0400 Received: from merlin.infradead.org ([205.233.59.134]:47050 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S932908AbdC1Mvp (ORCPT ); Tue, 28 Mar 2017 08:51:45 -0400 Date: Tue, 28 Mar 2017 14:50:22 +0200 From: Peter Zijlstra To: Yuyang Du Cc: mingo@kernel.org, linux-kernel@vger.kernel.org, pjt@google.com, bsegall@google.com, morten.rasmussen@arm.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, matt@codeblueprint.co.uk, umgwanakikbuti@gmail.com Subject: Re: [RESEND PATCH 2/2] sched/fair: Optimize __update_sched_avg() Message-ID: <20170328125022.75wlcrsnhobysxbj@hirez.programming.kicks-ass.net> References: <1486935863-25251-1-git-send-email-yuyang.du@intel.com> <1486935863-25251-3-git-send-email-yuyang.du@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1486935863-25251-3-git-send-email-yuyang.du@intel.com> User-Agent: NeoMutt/20170113 (1.7.2) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1750 Lines: 54 This Changelog being so impenetrable is what makes me skip over it; I'll put it on the 'look-at-later' pile, and that just never happens :/ On Mon, Feb 13, 2017 at 05:44:23AM +0800, Yuyang Du wrote: > __update_load_avg() has the following steps: > > 1. add the remainder of the last incomplete period > 2. decay old sum > 3. accumulate new sum in full periods since last_update_time > 4. accumulate the current incomplete period > 5. update averages > > However, there is no need to separately compute steps 1, 3, and 4. > > Illustation: > > c1 c3 c4 > ^ ^ ^ > | | | > |<->|<----------------->|<--->| > ... |---x---|------| ... |------|-----x (now) > > c1, c3, and c4 are the accumulated (meanwhile decayed) contributions > in timing in steps 1, 3, and 4 respectively. > > With them, the accumulated contribution to load_sum, for example, is: > > contrib = c1 * weight * freq_scaled; > contrib += c3 * weight * freq_scaled; > contrib += c4 * weight * freq_scaled; > > Obviously, we can optimize the above and they equate to: > > contrib = c1 + c3 + c4; > contrib *= weight * freq_scaled; But that's not at all what's happening; The equation is something like: 1 (p+1)/32 p+1 1 n/32 load = (load' + c1) * -^ + 1024 * \Sum -^ + c4 2 n=1 2 <----------------> c3 And then its 'obvious' you cannot do c1+c3+c4 anything. The decay factor of each part (c1,c3,4) is different, so unless you explain how that works, instead of hand-wave a bit, this isn't making any sense.