Date: Wed, 15 Mar 2017 16:24:14 +0000
From: Juri Lelli
To: Joel Fernandes
Cc: Patrick Bellasi, "Joel Fernandes (Google)",
 Linux Kernel Mailing List, linux-pm@vger.kernel.org,
 Ingo Molnar, Peter Zijlstra, "Rafael J. Wysocki", Andres Oportus
Subject: Re: [RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks
Message-ID: <20170315162414.GI31499@e106622-lin>
References: <1488292722-19410-1-git-send-email-patrick.bellasi@arm.com>
 <1488292722-19410-6-git-send-email-patrick.bellasi@arm.com>
 <20170315114052.GB18557@e110439-lin>
 <20170315144449.GH31499@e106622-lin>

On 15/03/17 09:13, Joel Fernandes wrote:
> On Wed, Mar 15, 2017 at 7:44 AM, Juri Lelli wrote:
> > Hi Joel,
> >
> > On 15/03/17 05:59, Joel Fernandes wrote:
> >> On Wed, Mar 15, 2017 at 4:40 AM, Patrick Bellasi wrote:
> >> > On 13-Mar 03:08, Joel Fernandes (Google) wrote:
> >> >> Hi Patrick,
> >> >>
> >> >> On Tue, Feb 28, 2017 at 6:38 AM, Patrick Bellasi wrote:
> >> >> > Currently schedutil enforces a maximum OPP when RT/DL tasks are
> >> >> > RUNNABLE. Such a mandatory policy can be made more tunable from
> >> >> > userspace, thus allowing, for example, the definition of a
> >> >> > reasonable max capacity (i.e. frequency) required for the
> >> >> > execution of a specific RT/DL workload. This will contribute to
> >> >> > making the RT class more "friendly" for power/energy-sensitive
> >> >> > applications.
> >> >> >
> >> >> > This patch extends the usage of capacity_{min,max} to the RT/DL
> >> >> > classes. Whenever a task in these classes is RUNNABLE, the
> >> >> > capacity required is defined by the constraints of the control
> >> >> > group that task belongs to.
> >> >> >
> >> >>
> >> >> We briefly discussed at Linaro Connect that this works well for
> >> >> sporadic RT tasks that run briefly and then sleep for long periods
> >> >> of time - so this patch is certainly good, but it's only a partial
> >> >> solution to the problem of frequent and short sleepers, and
> >> >> something is required to keep the boost active for short
> >> >> non-RUNNABLE periods as well. The behavior with many periodic RT
> >> >> tasks is that they periodically sleep for short intervals and run
> >> >> for short intervals. In this case, removing the clamp (or the
> >> >> boost, as in schedtune v2) on dequeue essentially means that,
> >> >> during a narrow window, cpufreq can drop the frequency only to
> >> >> make it go back up again.
> >> >>
> >> >> Currently for schedtune v2, I am working on prototyping something
> >> >> like the following for Android (sketched in code below):
> >> >> - if an RT task is enqueued, introduce the boost.
> >> >> - when the task is dequeued, start a timer for a "minimum deboost
> >> >>   delay time" before taking out the boost.
> >> >> - if the task is enqueued again before the timer fires, cancel
> >> >>   the timer.
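[A minimal sketch of the deboost-delay idea above, for illustration only:
apply_boost() and remove_boost() are hypothetical hooks into the boosting
mechanism, and the delay value is an assumed tunable; only the hrtimer
usage reflects the actual kernel API.]

#include <linux/hrtimer.h>
#include <linux/ktime.h>

/* Hypothetical hooks into the boosting mechanism. */
extern void apply_boost(void);
extern void remove_boost(void);

static struct hrtimer deboost_timer;
/* Assumed tunable: how long to keep the boost after the last dequeue. */
static u64 deboost_delay_ns = 3 * NSEC_PER_MSEC;

/* Timer fired: the task stayed asleep long enough, drop the boost. */
static enum hrtimer_restart deboost_timer_fn(struct hrtimer *timer)
{
	remove_boost();
	return HRTIMER_NORESTART;
}

static void boost_on_enqueue(void)
{
	/* Re-enqueued before the timer fired: keep the boost in place. */
	hrtimer_try_to_cancel(&deboost_timer);
	apply_boost();
}

static void boost_on_dequeue(void)
{
	/* Defer the deboost so a short sleep doesn't drop the frequency. */
	hrtimer_start(&deboost_timer, ns_to_ktime(deboost_delay_ns),
		      HRTIMER_MODE_REL);
}

static void boost_timer_init(void)
{
	hrtimer_init(&deboost_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	deboost_timer.function = deboost_timer_fn;
}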
> >> >>
> >> >> I don't think any "fix" to this particular issue should be in the
> >> >> schedutil governor; it should be sorted out before going to cpufreq
> >> >> itself (that is, before making the request). What do you think
> >> >> about this?
> >> >
> >> > My short observations are:
> >> >
> >> > 1) for certain RT tasks, which have a quite "predictable" activation
> >> >    pattern, we should definitely try to use DEADLINE... which will
> >> >    factor out all "boosting potential races", since the bandwidth
> >> >    requirements are well defined at task description time.
> >>
> >> I don't immediately see how DEADLINE can fix this: when a task is
> >> dequeued after the end of its current runtime, its bandwidth will be
> >> subtracted from the active running bandwidth. This is what drives the
> >> DL part of the capacity request. In this case, don't we run into the
> >> same issue as with the boost removal on dequeue?
> >>
> >
> > Unfortunately, I still have to post the set of patches (based on
> > Luca's reclaiming set) that introduces driving of clock frequency from
> > DEADLINE, so I guess everything we discuss about how DEADLINE might
> > help here might be difficult to follow. :(
> >
> > I should definitely fix that.
>
> I fully understand. Sorry to be discussing this too soon here...
>

No problem. I just thought I should clarify before people go "WTH are
these guys talking about?!" :)

> > However, trying to quickly summarize how that would work (for whoever
> > is already somewhat familiar with the reclaiming bits):
> >
> >  - a task's utilization contribution is accounted for (at rq level) as
> >    soon as it wakes up for the first time in a new period
> >  - its contribution is then removed after the 0-lag time (or when the
> >    task gets throttled)
> >  - frequency transitions are triggered accordingly
> >
> > So, I don't see why triggering a go-down request after the 0-lag time
> > has expired, and quickly reacting to tasks waking up, would create
> > problems in your case.
>
> In my experience, the 'reacting to tasks' bit doesn't work very well.

Hmm... but in this case we won't be 'reacting', we will be
'anticipating' tasks' needs, right?

> For short-running periodic tasks, we need to set the frequency to
> something and not ramp it down too quickly (for example, runtime 1.5 ms
> and period 3 ms). In this case the 0-lag time would be < 3 ms. I guess
> if we're going to use the 0-lag time, then we'd need to set the runtime
> and period to be higher than exactly matching the task's? So would we
> be assigning the same bandwidth, but for R/T instead of r/t (where r
> and R are the runtimes, t and T are the periods, and R > r and T > t)?
>

In general, I guess, you could let the Period be the task's period and
set the Runtime to be somewhat greater than the task's runtime (so as to
account for system overhead and such, e.g., HW limits regarding
frequency switching).
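[For completeness, a user-space sketch of the over-provisioning suggested
above. The syscall wrapper and struct layout follow the sched_setattr(2)
man-page example; the 2 ms runtime is an assumed padding of the measured
1.5 ms, not a value from the thread.]

#define _GNU_SOURCE
#include <linux/sched.h>	/* SCHED_DEADLINE */
#include <linux/types.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Not exported by glibc; layout as in the sched_setattr(2) man page. */
struct sched_attr {
	__u32 size;
	__u32 sched_policy;
	__u64 sched_flags;
	__s32 sched_nice;
	__u32 sched_priority;
	/* SCHED_DEADLINE fields, all in nanoseconds. */
	__u64 sched_runtime;
	__u64 sched_deadline;
	__u64 sched_period;
};

static int sched_setattr(pid_t pid, const struct sched_attr *attr,
			 unsigned int flags)
{
	return syscall(SYS_sched_setattr, pid, attr, flags);
}

int main(void)
{
	/*
	 * Task measured at ~1.5 ms of work every 3 ms.  Reserve R = 2 ms
	 * (R > r) over the task's own period T = 3 ms so the reservation
	 * absorbs system overhead and frequency-switching delays.  The
	 * reserved bandwidth is R/T = 2/3, and the task's utilization is
	 * only removed from the rq at its 0-lag time after it blocks.
	 */
	struct sched_attr attr = {
		.size		= sizeof(attr),
		.sched_policy	= SCHED_DEADLINE,
		.sched_runtime	= 2000000,	/* R = 2 ms */
		.sched_deadline	= 3000000,	/* D = 3 ms */
		.sched_period	= 3000000,	/* T = 3 ms */
	};

	if (sched_setattr(0, &attr, 0)) {
		perror("sched_setattr");
		return 1;
	}

	/* ... periodic real-time work loop would go here ... */
	return 0;
}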