Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965223AbcCJDop (ORCPT ); Wed, 9 Mar 2016 22:44:45 -0500 Received: from mail-lb0-f170.google.com ([209.85.217.170]:35699 "EHLO mail-lb0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934844AbcCJDom (ORCPT ); Wed, 9 Mar 2016 22:44:42 -0500 MIME-Version: 1.0 In-Reply-To: References: <2495375.dFbdlAZmA6@vostro.rjw.lan> <56D8AEB7.2050100@linaro.org> <36459679.vzZnOsAVeg@vostro.rjw.lan> <20160308112759.GF6356@twins.programming.kicks-ass.net> <20160308192640.GD6344@twins.programming.kicks-ass.net> <20160309163930.GP6356@twins.programming.kicks-ass.net> From: Vincent Guittot Date: Thu, 10 Mar 2016 10:44:21 +0700 Message-ID: Subject: Re: [PATCH 6/6] cpufreq: schedutil: New governor based on scheduler utilization data To: "Rafael J. Wysocki" Cc: Peter Zijlstra , "Rafael J. Wysocki" , Steve Muckle , Linux PM list , Juri Lelli , ACPI Devel Maling List , Linux Kernel Mailing List , Srinivas Pandruvada , Viresh Kumar , Michael Turquette , Ingo Molnar Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5818 Lines: 176 On 10 March 2016 at 06:28, Rafael J. Wysocki wrote: > On Wed, Mar 9, 2016 at 5:39 PM, Peter Zijlstra wrote: >> On Tue, Mar 08, 2016 at 09:05:50PM +0100, Rafael J. Wysocki wrote: >>> >> This means that on platforms where the utilization is frequency >>> >> invariant we should use >>> >> >>> >> next_freq = a * x >>> >> >>> >> (where x is given by (2) above) and for platforms where the >>> >> utilization is not frequency invariant >>> >> >>> >> next_freq = a * x * current_freq / max_freq >>> >> >>> >> and all boils down to finding a. >>> > >>> > Right. >>> >>> However, that doesn't seem to be in agreement with the Steve's results >>> posted earlier in this thread. >> >> I could not make anything of those numbers. >> >>> Also theoretically, with frequency invariant, the only way you can get >>> to 100% utilization is by running at the max frequency, so the closer >>> to 100% you get, the faster you need to run to get any further. That >>> indicates nonlinear to me. >> >> I'm not seeing that, you get that by using a > 1. No need for >> non-linear. > > OK > >>> >> Now, it seems reasonable for a to be something like (1 + 1/n) * >>> >> max_freq, so for non-frequency invariant we get >>> >> >>> >> nex_freq = (1 + 1/n) * current_freq * x > > (*) (see below) > >>> > This seems like a big leap; where does: >>> > >>> > (1 + 1/n) * max_freq >>> > >>> > come from? And what is 'n'? >> >>> a = max_freq gives next_freq = max_freq for x = 1, >> >> next_freq = a * x * current_freq / max_freq >> >> [ a := max_freq, x := 1 ] -> >> >> = max_freq * 1 * current_freq / max_freq >> = current_freq >> >> != max_freq >> >> But I think I see what you're saying; because at x = 1, >> current_frequency must be max_frequency. Per your earlier point. > > Correct. > >>> but with that choice of a you may never get to x = 1 with frequency >>> invariant because of the feedback effect mentioned above, so the 1/n >>> produces the extra boost needed for that (n is a positive integer). >> >> OK, so that gets us: >> >> a = (1 + 1/n) ; n > 0 >> >> [ I would not have chosen (1 + 1/n), but lets stick to that ] > > Well, what would you choose then? :-) > >> So for n = 4 that gets you: a = 1.25, which effectively gets you an 80% >> utilization tipping point. That is, 1.25 * .8 = 1, iow. you'll pick the >> next frequency (assuming RELATION_L like selection). >> >> Together this gets you: >> >> next_freq = (1 + 1/n) * max_freq * x * current_freq / max_freq >> = (1 + 1/n) * x * current_freq > > That seems to be what I said above (*), isn't it? > >> Again, with n = 4, x > .8 will result in a next_freq > current_freq, and >> hence (RELATION_L) pick a higher one. > > OK > >>> Quite frankly, to me it looks like linear really is a better >>> approximation for "raw" utilization. That is, for frequency invariant >>> x we should take: >>> >>> next_freq = a * x * max_freq / current_freq >> >> (its very confusing how you use 'x' for both invariant and >> non-invariant). >> >> That doesn't make sense, remember: >> >> util = \Sum_i u_i * freq_i / max_freq (1) >> >> Which for systems where freq_i is constant reduces to: >> >> util = util_raw * current_freq / max_freq (2) >> >> But you cannot reverse this. IOW you cannot try and divide out >> current_freq on a frequency invariant metric. > > I see. > >> So going by: >> >> next_freq = (1 + 1/n) * max_freq * util (3) > > I think that should be > > next_freq = (1 + 1/n) * max_freq * util / max > > (where max is the second argument of cpufreq_update_util) or the > dimensions on both sides don't match. > >> if we substitute (2) into (3) we get: >> >> = (1 + 1/n) * max_freq * util_raw * current_freq / max_freq >> = (1 + 1/n) * current_freq * util_raw (4) >> >> Which gets you two formula with the same general behaviour. As (2) is >> the only approximation of (1) we can make. > > OK > > So since utilization is not frequency invariant in the current > mainline (or linux-next for that matter) AFAIC, I'm going to use the > following in the next version of the schedutil patch series: > > next_freq = 1.25 * current_freq * util_raw / max > > where util_raw and max are what I get from cpufreq_update_util(). > > 1.25 is for the 80% tipping point which I think is reasonable. We have the arch_scale_freq_capacity function that is arch dependent and can be used to merge the 2 formula that were described by peter above. By default, arch_scale_freq_capacity return SCHED_CAPACITY_SCALE which is max capacity but when arch_scale_freq_capacity is defined by an architecture, arch_scale_freq_capacity returns current_freq * max_capacity/max_freq so can't we use arch_scale_freq in your formula ? Taking your formula above it becomes: next_freq = 1.25 * current_freq * util / arch_scale_freq_capacity() Without invariance feature, we have the same formula than above : next_freq = 1.25 * current_freq * util_raw / max because SCHED_CAPACITY_SCALE is max capacity With invariance feature, we have next_freq = 1.25 * current_freq * util / (current_freq*max_capacity/max_freq) = 1.25 * util * max_freq / max which is the formula that has to be used with frequency invariant utilization. so we have one formula that works for both configuration (this is not really optimized for invariant system because we multiply then divide by current_freq in 2 different places but it's better than a wrong formula) Now, arch_scale_freq_capacity is available in kernel/sched/sched.h header file which can only be accessed by scheduler code... May be we can pass arch_scale_freq_capacity value instead of max one as a parameter of update_util function prototype Vincent