From: Andres Oportus
Date: Sat, 1 Apr 2017 16:29:42 -0700
Subject: Re: [RFC/RFT][PATCH] cpufreq: schedutil: Reduce frequencies slower
To: Linux PM, LKML

On Sat, Apr 1, 2017 at 1:39 PM, Andres Oportus wrote:
> Hi Rafael, Juri,
>
> On Fri, Mar 31, 2017 at 2:51 PM, Rafael J. Wysocki wrote:
>>
>> On Friday, March 31, 2017 11:22:23 AM Juri Lelli wrote:
>> > Hi Rafael,
>>
>> Hi Juri,
>>
>> > On 30/03/17 23:36, Rafael J. Wysocki wrote:
>> > > From: Rafael J. Wysocki
>> > >
>> > > The schedutil governor reduces frequencies too fast in some
>> > > situations, which causes undesirable performance drops to appear.
>> > >
>> > > To address that issue, make schedutil reduce the frequency more
>> > > slowly by setting it to the average of the value chosen during the
>> > > previous iteration of governor computations and the new one coming
>> > > from its frequency selection formula.
>> > >
>> >
>> > I'm curious to test this out on Pixel phones once back in the office,
>> > but I've already got some considerations about this patch. Please
>> > find them inline below.
>> >
>> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=194963
>> > > Reported-by: John
>> > > Signed-off-by: Rafael J. Wysocki
>> > > ---
>> > >
>> > > This addresses a practical issue, but one in the "responsiveness"
>> > > or "interactivity" category, which is quite hard to represent
>> > > quantitatively.
>> > >
>> > > As reported by John in BZ194963, schedutil does not ramp up
>> > > P-states quickly enough, which causes audio issues to appear in his
>> > > gaming setup. At least it is evidently worse than ondemand in this
>> > > respect, and the patch below helps.
>> > >
>> >
>> > Might this also be a PELT problem?
>>
>> I don't think so.
>>
>> As mentioned below, intel_pstate had it too and it doesn't use PELT. :-)
>>
>> This appears to be a general issue with load-based (or
>> utilization-based) frequency selection algorithms using periodic
>> sampling. Roughly, if something unanticipated is going to happen
>> shortly (such as a burst of audio activity in a game), it may take a
>> whole period to notice what's going on, and the frequency set for that
>> period can make the difference between sufficient and insufficient
>> provisioning.
>>
>> What the patch does is increase the likelihood that the frequency in
>> question will be sufficient to avoid noticeable effects (such as audio
>> cracks), and it tends to do the trick most of the time.
>>
>> [Of course, you may argue that this is related to the rate limiting in
>> schedutil and intel_pstate, but then PELT itself is sampled
>> periodically AFAICS.]
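
For reference, the averaging being discussed boils down to something like
the sketch below (illustrative names, not the actual schedutil code; per
the $subject it would only kick in when the newly computed frequency is
lower than the previous one):

    /*
     * Sketch only: slow down frequency reductions by moving to the
     * midpoint between the previously selected frequency and the newly
     * computed one, instead of jumping straight down.
     */
    static unsigned int filter_next_freq(unsigned int prev_freq,
                                         unsigned int next_freq)
    {
            /* Ramping up (or no change): use the new value as is. */
            if (next_freq >= prev_freq)
                    return next_freq;

            /* Ramping down: average with the previous selection. */
            return (prev_freq + next_freq) / 2;
    }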
>>
>> > > The patch essentially repeats the trick added some time ago to the
>> > > load-based P-state selection algorithm in intel_pstate, which
>> > > allowed us to make it viable for performance-oriented users, and
>> > > which is to reduce frequencies at a slower pace.
>> > >
>> > > The reason why I chose the average is that it is computationally
>> > > cheap and pretty much the maximum reasonable slowdown, and the idea
>> > > is that in case there's something about to run that we don't know
>> > > about yet, it is better to stay at a higher level for a while
>> > > longer, to avoid having to get up from the floor every time.
>> > >
>> >
>> > Another approach we have been playing with on Android (to solve what
>> > seem to me to be similar issues) is to have decoupled up and down
>> > frequency change thresholds. With this you can decide how quickly to
>> > react to a sudden increase in utilization and how much "hysteresis"
>> > you want before slowing down. Every platform can also be tuned easily
>> > as needed (instead of having the same filter applied to every
>> > platform).
>> >
>> > We actually came to the conclusion recently that the up threshold is
>> > probably not really needed (and it is in fact set to very small
>> > values in practice). Once one is confident that the utilization
>> > signal is not too jumpy, responding quickly to a request for
>> > additional capacity seems the right thing to do (especially for
>> > RT/DL policies).
>> >
>> > What's your opinion?
>>
>> As I said, noticing increased load may take a whole period, and it
>> looks like what happens during that period may be quite important.
>>
>> To me, thresholds have the problem that, from the algorithm's
>> perspective, they are constant values set externally. This means they
>> likely need to be tuned once in a while by whatever entity set them (it
>> is difficult to imagine that the same values will always be suitable
>> for every workload), and that means an additional layer of (dynamic)
>> control on top of the governor.
>>
>> > > But technically speaking it is a filter. :-)
>> > >
>> > > So among other things I'm wondering if that leads to substantial
>> > > increases in energy consumption anywhere.
>> > >
>> >
>> > Having a tunable might help get the tradeoff right for different
>> > platforms, maybe?
>>
>> It might, but it would mean additional computational cost (at least one
>> more integer multiplication AFAICS).
>>
>> > As we discussed at the last LPC, having an energy model handy and
>> > using it to decide how quickly to ramp up or slow down seems the
>> > desirable long-term solution, but we probably need something (as you
>> > are proposing) until we get there.
>>
>> Well, we definitely need something to address real use cases, like the
>> one that I responded to with this patch. :-)
>
> I don't know the history/intent behind schedutil rate limiting, but if
> we make it apply only to "down" transitions, as Juri mentioned, we would
> not be adding a new tunable but rather restricting the current one
> (maybe some renaming would be in order if this is done). This would
> provide hysteresis to mitigate the problem without hard-coding the
> amount of hysteresis, which may not suit all platforms. I also agree
> that "it is difficult to imagine that the same values will always be
> suitable for every workload", but without any value to control the
> whole system, we get nothing in between.
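
To make the "down only" idea concrete, I mean something like the sketch
below. It borrows the schedutil field names (last_freq_update_time,
freq_update_delay_ns, next_freq) for illustration only; the stand-in
struct and the extra next_freq argument are mine, and this is not a
tested patch:

    /* Minimal stand-in for the real struct sugov_policy (sketch only). */
    struct sugov_policy {
            u64 last_freq_update_time;
            s64 freq_update_delay_ns;
            unsigned int next_freq;
    };

    /*
     * Sketch only: apply the rate limit to frequency reductions while
     * letting increases through immediately, so the existing
     * rate_limit_us tunable acts purely as "down" hysteresis.
     */
    static bool sugov_should_update_freq(struct sugov_policy *sg_policy,
                                         u64 time, unsigned int next_freq)
    {
            s64 delta_ns = time - sg_policy->last_freq_update_time;

            /* Ramping up: react right away. */
            if (next_freq > sg_policy->next_freq)
                    return true;

            /* Ramping down (or no change): honor the hysteresis window. */
            return delta_ns >= sg_policy->freq_update_delay_ns;
    }

With something like this, rate_limit_us would stop delaying ramp-ups
entirely and would only control how long a higher frequency is held
before a reduction is allowed.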
> Ultimately I also think we should solve the hysteresis problem at the
> root, i.e. in the input to the governor, in the form of a util/load
> signal that has not only hysteresis and the energy model but also any
> other behavioral inputs built in.
>
> Thanks,
> Andres
>>
>>
>> Thanks,
>> Rafael
>>
>