Date: Wed, 12 Apr 2017 17:44:47 +0200
From: Peter Zijlstra
To: Vincent Guittot
Cc: mingo@kernel.org, linux-kernel@vger.kernel.org, dietmar.eggemann@arm.com,
	Morten.Rasmussen@arm.com, yuyang.du@intel.com, pjt@google.com,
	bsegall@google.com
Subject: Re: [PATCH v2] sched/fair: update scale invariance of PELT
Message-ID: <20170412154447.coqnzhlhimz5pc3l@hirez.programming.kicks-ass.net>
In-Reply-To: <20170412145047.GA19363@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 12, 2017 at 04:50:47PM +0200, Vincent Guittot wrote:
> On Wednesday 12 Apr 2017 at 13:28:58 (+0200), Peter Zijlstra wrote:

> > 
> > |---------|---------| (wall-time)
> > ----****------------- F=100%
> > ----******----------- F= 66%
> > |--------------|----| (fudge-time)
> 
> It has been a bit hard for me to catch the diagram above because you scale the
> idle time to get same ratio at 100% and 66% whereas I don't scale idle
> time but only running time.

Ah, so below I wrote that we then scale each window back to equal size,
so the absolute size in wall-time becomes immaterial.

> > (explicitly not used 50%, because then the second window would have
> > collapsed to 0, imagine the joy if you go lower still)
> 
> The second window can't collapse because we are working on delta time not
> absolute wall-time and the delta is for only 1 type at a time: running or idle

Right, but consider what happens when F drops too low, idle goes away
from where there would've been some at F=1. At that point things become
unrecoverable afaict.

> > So in fudge-time the first window has 6/15 == 4/10 for the max-freq /
> > wall-time combo.
> > 
> > > 
> > > Then l = p' - p''. The lost idle time is tracked to apply the same amount of decay
> > > window when the task is sleeping
> > > 
> > > so at the end we have a number of decay window of p''+l = p' so we still have
> > > the same amount of decay window than previously.
> > 
> > Now, we have to stretch time back to equal window size, and while you do
> > that for the active windows, we have to do manual compensation for idle
> > windows (which is somewhat ugleh) and is where the lost-time comes from.
> 
> We can't stretch idle time because there is no relation between the idle time
> and the current capacity.

Brain melts..

> > Also, this all feels entirely yucky, because as per the above, if we'd
> > ran at 33%, we'd have ended up with a negative time window.
> 
> Not sure to catch how we can end up with negative window. We are working with
> delta time not absolute time.

|---------|---------|---------| F=100%
--****------------------------

|--------------|----|---------| F= 66%
--******----------------------

|-------------------|---------| F= 50%
--********--------------------

|-----------------------------| F= 33%
--************----------------

So what happens is that when the (wall) time for a window goes negative
it simply moves the next window along, until that too is compressed
etc..

So in the above figure, the right most edge of F=33% contains 2 whole
periods of idle time, both contracted to measure 0 (wall) time.

The only thing you have to recover them from is the lost idle time
measure.

> > Not to mention that this only seems to work for low utilization. Once
> > you hit higher utilization scenarios, where there isn't much idle time
> > to compensate for the stretching, things go wobbly. Although both
> > scenarios might end up being the same.
> 
> During the running phase, we calculate how much idle time has disappeared
> because we are running at lower frequency and we compensate it once back to
> idle.
> 
> > 
> > And instead of resurrecting 0 sized windows, you throw them out, which
> 
> I don't catch point above

It might've been slightly inaccurate. But the point remains that you
destroy time. Not all accrued lost idle time is recovered.

+	if (sa->util_sum < (LOAD_AVG_MAX * 1000)) {
+		/*
+		 * Add the idle time stolen by running at lower compute
+		 * capacity
+		 */
+		delta += sa->stolen_idle_time;
+	}
+	sa->stolen_idle_time = 0;

See here, stolen_idle_time is reset regardless. Time is non-continuous
at that point.

I still have to draw me more interesting cases, I'm not convinced I
fully understand things.
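
For reference, the window widths in the F=100% / 66% / 50% / 33% figure
above can be reproduced with a small sketch. This is only one reading of
the figure, not code from the patch: fudge_windows() and its parameters
are hypothetical names, the frequency is passed as a fraction num/den of
max, and the stretched running window is assumed to simply eat the
windows that follow it until nothing is left to eat.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patch under discussion.
 *
 * Lay out 'nr' windows of 'period' units each when running at num/den
 * of the maximum frequency. The first window is stretched so that the
 * running/window ratio matches the one seen at max frequency; the
 * stretch is then taken out of the following windows, each of which is
 * clamped at 0 instead of going negative.
 *
 * Returns the part of the stretch that did not fit into the 'nr'
 * windows considered (0 when everything fit).
 */
int fudge_windows(int period, int nr, int num, int den, int *width)
{
	int excess, take, i;

	assert(nr > 0 && num > 0 && den >= num);

	/* e.g. period = 10 at F = 2/3 gives 15, so 6/15 == 4/10 */
	width[0] = period * den / num;
	excess = width[0] - period;

	for (i = 1; i < nr; i++) {
		/* a window can shrink to 0 but never below */
		take = excess < period ? excess : period;
		width[i] = period - take;
		excess -= take;
	}

	return excess;
}
```

With period = 10 and nr = 3 this gives 15/5/10 for F = 2/3, 20/0/10 for
F = 1/2 and 30/0/0 for F = 1/3, which matches the four rows drawn above.
The windows that come out as 0 are the ones that can only be recovered
through the lost idle time accounting.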