Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754857AbdC2ILE (ORCPT ); Wed, 29 Mar 2017 04:11:04 -0400 Received: from mga09.intel.com ([134.134.136.24]:63782 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754222AbdC2IJg (ORCPT ); Wed, 29 Mar 2017 04:09:36 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.36,240,1486454400"; d="scan'208";a="839583339" Date: Wed, 29 Mar 2017 08:04:42 +0800 From: Yuyang Du To: Peter Zijlstra Cc: mingo@kernel.org, linux-kernel@vger.kernel.org, pjt@google.com, bsegall@google.com, morten.rasmussen@arm.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, matt@codeblueprint.co.uk, umgwanakikbuti@gmail.com Subject: Re: [RESEND PATCH 2/2] sched/fair: Optimize __update_sched_avg() Message-ID: <20170329000442.GC2459@ydu19desktop> References: <1486935863-25251-1-git-send-email-yuyang.du@intel.com> <1486935863-25251-3-git-send-email-yuyang.du@intel.com> <20170328144625.GX3093@worktop> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170328144625.GX3093@worktop> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2899 Lines: 95 Hi Peter, On Tue, Mar 28, 2017 at 04:46:25PM +0200, Peter Zijlstra wrote: > On Mon, Feb 13, 2017 at 05:44:23AM +0800, Yuyang Du wrote: > > __update_load_avg() has the following steps: > > > > 1. add the remainder of the last incomplete period > > 2. decay old sum > > 3. accumulate new sum in full periods since last_update_time > > 4. accumulate the current incomplete period > > 5. update averages > > > > However, there is no need to separately compute steps 1, 3, and 4. > > > > Illustation: > > > > c1 c3 c4 > > ^ ^ ^ > > | | | > > |<->|<----------------->|<--->| > > ... |---x---|------| ... |------|-----x (now) > > > > c1, c3, and c4 are the accumulated (meanwhile decayed) contributions > > in timing in steps 1, 3, and 4 respectively. > > > > With them, the accumulated contribution to load_sum, for example, is: > > > > contrib = c1 * weight * freq_scaled; > > contrib += c3 * weight * freq_scaled; > > contrib += c4 * weight * freq_scaled; > > > > Obviously, we can optimize the above and they equate to: > > > > contrib = c1 + c3 + c4; > > contrib *= weight * freq_scaled; > > > > So I figured out what it is you're doing, how's this? I still need to > rewrite the Changelog to make this cleared, Yes, you need to, and let me do it too and learn how you will rewrite it. Subject: [PATCH] sched/fair: Optimize __update_sched_avg() In __update_load_avg(), the flow of periods and update times are illustrated as: t1 t3 t4 ^ ^ ^ | | | |<->|<----------------->|<--->| ... |---x---|------| ... |------|-----x (now) in which, t1 is the remainder of the last incomplete period, t2 is the full periods since the last_update_time, and t3 is the current period. These times altogether make contributions to the load_sum with the following 5 steps: step 1: t1 is decayed as c1 step 2: last sum is decayed step 3: t3 has accumulated c3 step 4: t4 is c4 step 5: average the sum These contributions are further scaled with weight and frequency: contrib = c1 * weight * freq_scaled; contrib += c3 * weight * freq_scaled; contrib += c4 * weight * freq_scaled; Obviously, they equate to: contrib = c1 + c3 + c4; contrib *= weight * freq_scaled; By doing so, we save instructions, and clean the codes. With that, c1 must be additionally decayed as opposed to doing it in step 2, but these two approaches are about the same if you compare decay_load() with a round of contrib scaling. Code size comparison (with allyesconfig): Before: kernel/sched/built-in.o 1213304 After : kernel/sched/built-in.o 1212536 (-768B) > but I think the code now has > understandable comments. Yes, you made it. It's much better now. Thanks, Yuyang