Date: Fri, 31 Mar 2017 09:01:41 +0200
From: Peter Zijlstra
To: Paul Turner
Cc: Yuyang Du, Ingo Molnar, LKML, Benjamin Segall, Morten Rasmussen,
    Vincent Guittot, Dietmar Eggemann, Matt Fleming, Mike Galbraith
Subject: Re: [RESEND PATCH 2/2] sched/fair: Optimize __update_sched_avg()
Message-ID: <20170331070141.ygghjaamv6bswbyl@hirez.programming.kicks-ass.net>
References: <1486935863-25251-1-git-send-email-yuyang.du@intel.com>
 <1486935863-25251-3-git-send-email-yuyang.du@intel.com>
 <20170328144625.GX3093@worktop>
 <20170329000442.GC2459@ydu19desktop>
 <20170329104126.lg6ismevfbqywpcj@hirez.programming.kicks-ass.net>
 <20170330121658.6mo7datma4ssw7st@hirez.programming.kicks-ass.net>
 <20170330141428.deiduft5btwid6ov@hirez.programming.kicks-ass.net>

On Thu, Mar 30, 2017 at 03:02:47PM -0700, Paul Turner wrote:
> On Thu, Mar 30, 2017 at 7:14 AM, Peter Zijlstra wrote:
> > On Thu, Mar 30, 2017 at 02:16:58PM +0200, Peter Zijlstra wrote:
> >> On Thu, Mar 30, 2017 at 04:21:08AM -0700, Paul Turner wrote:
> >
> >> > > +
> >> > > +	if (unlikely(periods >= LOAD_AVG_MAX_N))
> >> > >  		return LOAD_AVG_MAX;
> >
> >> >
> >> > Is this correct in the iterated periods > LOAD_AVG_MAX_N case?
> >> > I don't think the decay above is guaranteed to return these to zero.
> >>
> >> Ah!
> >>
> >> Indeed, so decay_load() needs LOAD_AVG_PERIOD * 63 before it truncates
> >> to 0, because every LOAD_AVG_PERIOD we halve the value; lose 1 bit; so
> >> 63 of those and we're 0.
> >>
> >> But __accumulate_sum() OTOH returns LOAD_AVG_MAX after only
> >> LOAD_AVG_MAX_N, which is < LOAD_AVG_PERIOD * 63.
> >>
> >> So yes, combined we exceed LOAD_AVG_MAX, which is bad. Let me think
> >> what to do about that.
> >
> > So at the very least it should be decay_load(LOAD_AVG_MAX, 1) (aka
> > LOAD_AVG_MAX - 1024), but that still doesn't account for the !0
> > decay_load() of the first segment.
> >
> > I'm thinking that we can compute the middle segment by taking the max
> > value and chopping off the ends, like:
> >
> >            p
> > c2 = 1024 \Sum y^n
> >            n=1
> >
> >             inf        inf
> >   = 1024 ( \Sum y^n - \Sum y^n - y^0 )
> >             n=0        n=p
> >
>
> So this is endemic to what I think is a deeper problem:

I think not.

  47742*e((345/32)*l(.5))
  27.12819932019487579284

So even without weight, 345 periods isn't enough to flatten
LOAD_AVG_MAX.
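As a quick sanity check of that "chop both ends off the infinite
series" idea, a minimal userspace sketch -- illustration only: y and
load_avg_max are recomputed in floating point here rather than taken
from the kernel's fixed-point tables, and note that removing the
n >= p tail and the y^0 head leaves the n = 1 .. p-1 segment:

/* Illustration only, not kernel code; link with -lm. */
#include <math.h>
#include <stdio.h>

int main(void)
{
	int p = 345, n;
	double y, load_avg_max, direct = 0.0, chopped;

	y = pow(0.5, 1.0 / 32.0);		/* y^32 == 0.5 */
	load_avg_max = 1024.0 / (1.0 - y);	/* 1024 * \Sum_{n=0..inf} y^n;
						   the kernel's fixed-point
						   LOAD_AVG_MAX is 47742 */

	/* middle segment, term by term: 1024 * \Sum_{n=1..p-1} y^n */
	for (n = 1; n < p; n++)
		direct += 1024.0 * pow(y, n);

	/*
	 * Same segment from the closed form: whole series, minus the tail
	 * (everything decayed by p periods -- the load_avg_max * pow(y, p)
	 * term plays the role of decay_load(LOAD_AVG_MAX, p)), minus the
	 * head (y^0 * 1024).
	 */
	chopped = load_avg_max - load_avg_max * pow(y, p) - 1024.0;

	printf("p=%d direct=%.2f chopped=%.2f\n", p, direct, chopped);
	return 0;
}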
> The previous rounds of optimization folded weight into load_sum. I
> think this introduced a number of correctness problems:

I'll argue with you on correctness; but these might well be problems
indeed.

> a) the load_avg is no longer independent of weight; a high weight
> entity can linger for eons. [63 LOAD_AVG_PERIODS]

Certainly longer than before, agreed.

> b) it breaks the dynamic response of load_avg on a weight change.
> While nice is not common, there's a case where this is really
> important, which is cgroups with a low number of threads running.
> E.g. when we transition from 1->2 threads we immediately halve the
> weight, but because of the folding it takes a very long time to be
> numerically correct again.

I think this is a matter of semantics, not correctness. We did have
that weight in the past, so our past average including it isn't
incorrect per se. Similarly, our vruntime includes increments based on
prior weight.

Now; you're arguing that this is undesired, and this might well be.

> c) It doesn't work with scale_load_down and fractional shares below
> SCHED_LOAD_SCALE [we multiply in a zero -> zero rq load]

Yes, I noticed this, and this is indeed undesired.

> d) It makes doing stability/clipping above a nightmare.

Unsure what exactly you're referring to; the scale_load_down() boundary
at 0? How is this a separate point from c) then?

> I think it's actually *costing* us cycles, since we end up multiplying
> in the weight every partial [load_sum] update, but we only actually
> need to compute it when updating load_avg [which is per period
> overflow].

So let's pull it out again -- but I don't think we need to undo all of
Yuyang's patches for that (a toy sketch of the two shapes is at the end
of this mail).

So please, look at the patch I proposed for the problem you spotted.
Let's fix the current state and take it from there, ok?
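To make the two shapes concrete, a toy userspace sketch -- names,
fields, and numbers are made up for illustration; this is not the
actual struct sched_avg / accumulation code:

/* Illustration only; not the actual kernel struct sched_avg code. */
#include <stdint.h>
#include <stdio.h>

#define TOY_LOAD_AVG_MAX	47742u

struct toy_avg {
	uint64_t load_sum;
	uint64_t load_avg;
};

/*
 * Folded shape: weight is multiplied into load_sum on every partial
 * accumulation, so we pay the multiply each time and load_sum is no
 * longer weight-independent.
 */
static void accumulate_folded(struct toy_avg *sa, uint32_t weight, uint32_t contrib)
{
	sa->load_sum += (uint64_t)weight * contrib;
	sa->load_avg = sa->load_sum / TOY_LOAD_AVG_MAX;
}

/*
 * Unfolded shape: load_sum stays weight-free; the weight is applied
 * only when the average is recomputed (once per period rollover), so
 * a weight change shows up in load_avg immediately.
 */
static void accumulate_unfolded(struct toy_avg *sa, uint32_t weight, uint32_t contrib)
{
	sa->load_sum += contrib;
	sa->load_avg = ((uint64_t)weight * sa->load_sum) / TOY_LOAD_AVG_MAX;
}

int main(void)
{
	struct toy_avg a = { 0, 0 }, b = { 0, 0 };

	/* same numbers here; the difference is when the multiply happens */
	accumulate_folded(&a, 1024, 1024);
	accumulate_unfolded(&b, 1024, 1024);

	printf("folded=%llu unfolded=%llu\n",
	       (unsigned long long)a.load_avg, (unsigned long long)b.load_avg);
	return 0;
}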