Date: Mon, 20 Mar 2017 18:08:37 +0000
From: Patrick Bellasi
To: Tejun Heo
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ingo Molnar, Peter Zijlstra
Subject: Re: [RFC v3 1/5] sched/core: add capacity constraints to CPU controller
Message-ID: <20170320180837.GB28391@e110439-lin>
References: <1488292722-19410-1-git-send-email-patrick.bellasi@arm.com> <1488292722-19410-2-git-send-email-patrick.bellasi@arm.com> <20170320171511.GB3623@htj.duckdns.org>
In-Reply-To: <20170320171511.GB3623@htj.duckdns.org>

On 20-Mar 13:15, Tejun Heo wrote:
> Hello,
>
> On Tue, Feb 28, 2017 at 02:38:38PM +0000, Patrick Bellasi wrote:
> > This patch extends the CPU controller by adding a couple of new
> > attributes, capacity_min and capacity_max, which can be used to enforce
> > bandwidth boosting and capping. More specifically:
> >
> > - capacity_min: defines the minimum capacity which should be granted
> >                 (by schedutil) when a task in this group is running,
> >                 i.e. the task will run at least at that capacity
> >
> > - capacity_max: defines the maximum capacity which can be granted
> >                 (by schedutil) when a task in this group is running,
> >                 i.e. the task can run up to that capacity
>
> cpu.capacity.min and cpu.capacity.max are the more conventional names.

Ok, should be an easy renaming.

> I'm not sure about the name capacity as it doesn't encode what it does
> and is difficult to tell apart from cpu bandwidth limits. I think
> it'd be better to represent what it controls more explicitly.

In scheduler jargon, capacity represents the amount of computation
that a CPU can provide. It is usually defined to be 1024 for the
biggest CPU (on non-SMP systems) running at the highest OPP (i.e.
maximum frequency).

It's true that it somewhat overlaps with the concept of "bandwidth".
However, the main difference is that "bandwidth" is not frequency
(and architecture) scaled.
For example, assuming we have only one CPU with these two OPPs:

   OPP | Frequency | Capacity
     1 |    500MHz |      512
     2 |      1GHz |     1024

a task running 60% of the time on that CPU when configured to run at
500MHz is, from the bandwidth standpoint, using 60% of the bandwidth
but, from the capacity standpoint, using only 30% of the available
capacity (60% of 512/1024).

IOW, bandwidth is purely time based, while capacity factors in both
frequency and architectural differences.
Thus, while a "bandwidth" constraint limits the amount of time a task
can use a CPU, independently of the "actual computation" performed,
the new "capacity" constraints let us enforce how much "actual
computation" a task can perform in the "unit of time".

> > These attributes:
> > a) are tunable at all hierarchy levels, i.e. root group too
>
> This usually is problematic because there should be a non-cgroup way
> of configuring the feature in case cgroup isn't configured or used,
> and it becomes awkward to have two separate mechanisms configuring the
> same thing. Maybe the feature is cgroup specific enough that it makes
> sense here but this needs more explanation / justification.

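As a side note on the OPP example earlier in this message, the bandwidth
vs capacity arithmetic can be sketched as follows. This is only an
illustration: the table and the helper name capacity_pct() are
hypothetical and are not taken from the patch or from the scheduler
code.

```python
# Hypothetical OPP table from the example above: frequency (MHz) -> capacity.
OPP_CAPACITY = {500: 512, 1000: 1024}

# Capacity of the biggest CPU at its highest OPP, as described above.
MAX_CAPACITY = 1024


def capacity_pct(running_time_pct, freq_mhz):
    """Scale a purely time-based utilization by the capacity of the OPP.

    running_time_pct is the "bandwidth" view (share of time spent
    running); the return value is the "capacity" view, i.e. the share of
    the maximum computation the CPU could have provided.
    """
    return running_time_pct * OPP_CAPACITY[freq_mhz] / MAX_CAPACITY


# A task running 60% of the time at 500MHz uses 60% bandwidth but only
# 30% of the available capacity.
print(capacity_pct(60, 500))   # 30.0
print(capacity_pct(60, 1000))  # 60.0
```

At the highest OPP the two views coincide, which is why the difference
only shows up once frequency (or architecture) scaling is in play.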
In the previous proposal I exposed global tunables under procfs, e.g.:

   /proc/sys/kernel/sched_capacity_min
   /proc/sys/kernel/sched_capacity_max

which can be used to define tunable root constraints when CGroups are
not available, and which become RO when CGroups are.

Could this eventually be an acceptable option?

In any case, I think this feature will mainly target CGroup based
systems. Indeed, one of the main goals is to collect "application
specific" information from "informed run-times". Being "application
specific" means that we need a way to classify applications depending
on the runtime context... and that capability in Linux is ultimately
provided via the CGroup interface.

> > b) allow to create subgroups of tasks which are not violating the
> > capacity constraints defined by the parent group.
> > Thus, tasks on a subgroup can only be more boosted and/or more
>
> For both limits and protections, the parent caps the maximum the
> children can get. At least that's what memcg does for memory.low.
> Doing that makes sense for memcg because for memory the parent can
> still do protections regardless of what its children are doing and it
> makes delegation safe by default.

Just to be clearer, the current proposal enforces:

- capacity_max_child <= capacity_max_parent

  Since, if a task is constrained to get only up to a certain amount
  of capacity, then its children cannot use more than that... at most
  they can be further constrained.

- capacity_min_child >= capacity_min_parent

  Since, if a task has been boosted to run at least that fast, then
  its children cannot be constrained to go slower without eventually
  impacting parent performance.

> I understand why you would want a property like capacity to be the
> other direction as that way you get more specific as you walk down the
> tree for both limits and protections;

Right, the protection schema is defined in such a way that it never
affects parent constraints.

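The two hierarchy rules above can be sketched as a small validity
check. This is a minimal illustration, not the code from the patch: the
Group class and is_valid() are hypothetical names, and the 0..1024
range and the min <= max check within a single group are assumptions
added for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Group:
    """Hypothetical stand-in for a task group with capacity constraints."""
    capacity_min: int
    capacity_max: int
    parent: Optional["Group"] = None


def is_valid(group):
    """Check a group's constraints against its parent.

    Rules from the proposal:
      capacity_max_child <= capacity_max_parent
      capacity_min_child >= capacity_min_parent
    Assumption (not stated in this thread): values lie in 0..1024 and
    capacity_min <= capacity_max within a group.
    """
    if not 0 <= group.capacity_min <= group.capacity_max <= 1024:
        return False
    parent = group.parent
    if parent is None:
        return True
    return (group.capacity_max <= parent.capacity_max
            and group.capacity_min >= parent.capacity_min)


root = Group(capacity_min=0, capacity_max=1024)
boosted = Group(capacity_min=512, capacity_max=1024, parent=root)
capped = Group(capacity_min=0, capacity_max=800, parent=root)

print(is_valid(boosted))                          # True
print(is_valid(Group(256, 1024, parent=boosted))) # False: less boosted than parent
print(is_valid(Group(0, 900, parent=capped)))     # False: less capped than parent
```

With these rules a child can only be more boosted and/or more capped
than its parent, so a child's settings never loosen what the parent
asked for.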
> however, I think we need to
> think a bit more about it and ensure that the resulting interface
> isn't confusing.

Sure.

> Would it work for capacity to behave the other
> direction - ie. a parent's min restricting the highest min that its
> descendants can get? It's completely fine if that's weird.

I thought about that possibility, but it did not convince me from the
use-cases standpoint, at least for the ones I've considered.

The reason is that capacity_min is used to implement a concept of
"boosting" where, let's say, we want to "run a task faster than a
minimum frequency". Assume that this constraint has been defined
because we know that this task, and likely all its descendant threads,
needs at least that capacity level to perform according to
expectations.

In that case, "refining down the hierarchy" can require boosting some
threads further, but likely not less.

Does this make sense?

To me this seems to match quite well at least Android/ChromeOS
specific use-cases. I'm not sure whether there are other, different
use-cases, for example in the domain of managed containers.

> Thanks.
>
> --
> tejun

--
#include

Patrick Bellasi