Date: Thu, 27 Oct 2016 21:14:39 +0100
From: Patrick Bellasi <patrick.bellasi@arm.com>
To: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@kernel.org>,
        Peter Zijlstra <peterz@infradead.org>,
        Vincent Guittot <vincent.guittot@linaro.org>,
        Steve Muckle <steve.muckle@linaro.org>, Leo Yan <leo.yan@linaro.org>,
        Viresh Kumar <viresh.kumar@linaro.org>,
        "Rafael J . Wysocki" <rjw@rjwysocki.net>, Todd Kjos <tkjos@google.com>,
        Srinath Sridharan <srinathsr@google.com>,
        Andres Oportus <andresoportus@google.com>,
        Juri Lelli <juri.lelli@arm.com>,
        Morten Rasmussen <morten.rasmussen@arm.com>,
        Dietmar Eggemann <dietmar.eggemann@arm.com>,
        Chris Redpath <chris.redpath@arm.com>,
        Robin Randhawa <robin.randhawa@arm.com>, Li Zefan <lizefan@huawei.com>,
        Johannes Weiner <hannes@cmpxchg.org>, Ingo Molnar <mingo@redhat.com>
Subject: Re: [RFC v2 5/8] sched/tune: add initial support for CGroups based
 boosting
Message-ID: <20161027201439.GA3668@derkdell>
References: <20161027174108.31139-1-patrick.bellasi@arm.com>
 <20161027174108.31139-6-patrick.bellasi@arm.com>
 <20161027183017.GA15876@htj.duckdns.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161027183017.GA15876@htj.duckdns.org>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5593
Lines: 129

On 27-Oct 14:30, Tejun Heo wrote:
> Hello, Patrick.

Hi Tejun,

> On Thu, Oct 27, 2016 at 06:41:05PM +0100, Patrick Bellasi wrote:
> > To support task performance boosting, using a single knob has the
> > advantage of being a simple solution, both from the implementation and
> > the usability standpoint.  However, on a real system it can be difficult
> > to identify a single value for the knob which fits the needs of multiple
> > different tasks. For example, some kernel threads and/or user-space
> > background services are better managed the "standard" way, while we
> > still want to be able to boost the performance of specific workloads.
> > 
> > To improve the flexibility of the task boosting mechanism, this patch
> > is the first of a small series which extends the previous
> > implementation with "per task group" support.
> > 
> > This first patch introduces just the basic CGroups support: a new
> > "schedtune" CGroups controller is added, which allows configuring
> > different boost values for different groups of tasks.
> > To keep the implementation simple while still supporting an effective
> > boosting strategy, the new controller:
> >   1. allows only a two layer hierarchy
> >   2. supports only a limited number of boost groups
> > 
> > A two layer hierarchy allows placing each task either:
> >   a) in the root control group
> >      thus being subject to a system-wide boosting value
> >   b) in a child of the root group
> >      thus being subject to the specific boost value defined by that
> >      "boost group"
> > 
> > The limited number of "boost groups" supported is mainly motivated by
> > the observation that in a real system it could be useful to have only
> > a few classes of tasks which deserve different treatment.
> > For example, background vs foreground or interactive vs low-priority.
> > 
> > As an additional benefit, a limited number of boost groups also allows
> > a simpler implementation, especially for the code required to compute
> > the boost value for CPUs which have RUNNABLE tasks belonging to
> > different boost groups.
> 
> So, skipping on the actual details of boosting mechanism, in terms of
> cgroup support, it should be integrated into the existing cpu
> controller and have proper support for hierarchy.

I have a couple of concerns/questions about both of these points.

First, regarding the integration with the cpu controller: don't we risk
overloading the semantics of the cpu controller?

Right now this controller is devoted to tracking the bandwidth that a
group of tasks can consume and/or to distributing the available
bandwidth among the tasks in that group.
Boosting is a different concept: it's somewhat related to CPU bandwidth,
but it targets a completely different goal, i.e. biasing schedutil
and (in the future) scheduler decisions.

I also wonder how confusing and complex it could be to configure a
system where the groups of tasks with different bandwidth requirements
do not overlap with the groups with different boosting requirements.

For example, let's assume we have three tasks, A, B and C, and we want:

   Bandwidth:  10% for A and B,  20% for C
   Boost:      10% for A,         0% for B and C

IMO, configuring such a set of constraints would be quite complex if
we expose the boost value through the cpu controller.
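To make the conflict concrete, here is a small, hypothetical C model of
the example above. It assumes that "10% for A and B" is a single cap
shared by A and B, and that the boost value would be just one more
attribute of a flat cpu controller group. struct task_req and
flat_cpu_cgroup_conflict() are made-up names for illustration, not
kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of one task's requirements: the bandwidth group
 * it must share a CPU cap with, and the boost it should receive.
 */
struct task_req {
	char name;
	int bw_group;   /* tasks with the same id share one bandwidth cap */
	int bw_pct;     /* cap of that bandwidth group, in percent */
	int boost_pct;  /* desired boost, in percent */
};

/*
 * Assumption: if boost were an attribute of a flat (single-level) cpu
 * controller group, all tasks sharing a bandwidth cap would sit in the
 * same cgroup and therefore share a single boost value.
 * Return true if the requirements cannot be expressed that way.
 */
static bool flat_cpu_cgroup_conflict(const struct task_req *t, size_t n)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (t[i].bw_group == t[j].bw_group &&
			    t[i].boost_pct != t[j].boost_pct)
				return true;
	return false;
}

/* The constraints from the example: A and B share 10%, C has 20%. */
static const struct task_req example[] = {
	{ 'A', 0, 10, 10 },
	{ 'B', 0, 10,  0 },
	{ 'C', 1, 20,  0 },
};
```

With these inputs the check reports a conflict, since A and B must share
one bandwidth group while asking for different boost values. A nested
layout (a parent group carrying the shared 10% cap, with A and B in two
children carrying the boost values) could express it, but only by adding
groups that exist just to hold a boost value.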

> Note that hierarchy
> support doesn't necessarily mean that boosting itself has to be
> hierarchical.

I actually considered such a design initially, however...

> It can be, for example, something along the lines of
> "the descendants are allowed up to this level of boosting" so that the
> hierarchy just serves to assign the appropriate boosting values to the
> groups of tasks.

... the current "single layer hierarchy" has been proposed instead for
two main reasons.

First, we were not able to come up with realistic use-cases where we
need this "up to this level" semantic.
For boosting purposes, tasks are grouped based on their role and/or
importance in the system. This property is usually defined in
"absolute" rather than "relative" terms.
Does it make sense to say that task A can be boosted only as much as
task B? In our experience, probably never.
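To make the difference between the two semantics explicit, here is a
hypothetical sketch. It reads "the descendants are allowed up to this
level of boosting" as "every ancestor caps the value its descendants
can request"; this is only one possible interpretation, and
struct boost_group, flat_boost() and capped_boost() are illustrative
names, not existing kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical boost group node (not the SchedTune data structure). */
struct boost_group {
	const struct boost_group *parent; /* NULL for the root group */
	int boost;                        /* configured boost, in percent */
};

/* Flat semantic: a task simply gets the value of its own group. */
static int flat_boost(const struct boost_group *bg)
{
	return bg->boost;
}

/*
 * Capped ("up to this level") semantic, as interpreted here: the
 * effective value is the minimum along the path to the root, so
 * resolving it requires walking every ancestor.
 */
static int capped_boost(const struct boost_group *bg)
{
	int eff = bg->boost;

	for (const struct boost_group *p = bg->parent; p; p = p->parent)
		if (p->boost < eff)
			eff = p->boost;
	return eff;
}
```

With the root at 100, a "foreground" group at 30 and one of its
children asking for 60, the child gets 30 under the capped semantic but
60 under the flat one: exactly the relative "A only as much as B"
relationship questioned above.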

The second reason is mainly related to the need for an efficient,
low-overhead implementation. The currently defined semantic for CPU
boosting requires performing certain operations at each task enqueue
and dequeue event. Some of these operations are part of the hot path
in the scheduler code. The flat hierarchy allows us to use per-CPU data
structures and algorithms which aim at keeping the overhead of the
required accounting low.
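As a rough illustration of why a flat hierarchy with a fixed number of
boost groups keeps the hot path cheap, here is a hypothetical sketch:
each CPU keeps a small array of RUNNABLE-task counters, one per boost
group (index 0 being the root group), and the CPU boost is assumed to be
the maximum boost among the groups that currently have RUNNABLE tasks on
that CPU. All names (group_boost, struct cpu_boost, boost_enqueue(),
boost_dequeue(), NR_CPUS_SKETCH, BOOSTGROUPS_COUNT) are made up; plain
arrays stand in for real per-CPU variables, locking is omitted, and
boost values are assumed to be in the 0..100 range.

```c
#include <assert.h>

/* Hypothetical limits, chosen only for the sketch. */
#define NR_CPUS_SKETCH     4
#define BOOSTGROUPS_COUNT  4  /* root group + a few child groups */

/* Configured boost of each group, in percent; index 0 is the root. */
static int group_boost[BOOSTGROUPS_COUNT];

/* Per-CPU state: RUNNABLE tasks per boost group and the resulting max. */
struct cpu_boost {
	unsigned int tasks[BOOSTGROUPS_COUNT];
	int max_boost;
};

static struct cpu_boost cpu_boosts[NR_CPUS_SKETCH];

/* Recompute the CPU boost as the max over groups with RUNNABLE tasks. */
static void cpu_boost_update(int cpu)
{
	struct cpu_boost *cb = &cpu_boosts[cpu];
	int max = 0;

	for (int g = 0; g < BOOSTGROUPS_COUNT; g++)
		if (cb->tasks[g] && group_boost[g] > max)
			max = group_boost[g];
	cb->max_boost = max;
}

/* Enqueue hook: bounded by BOOSTGROUPS_COUNT, no hierarchy walk. */
static void boost_enqueue(int cpu, int group)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	assert(group >= 0 && group < BOOSTGROUPS_COUNT);
	/* The max can only change when a group becomes active. */
	if (++cpu_boosts[cpu].tasks[group] == 1)
		cpu_boost_update(cpu);
}

/* Dequeue hook. */
static void boost_dequeue(int cpu, int group)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	assert(group >= 0 && group < BOOSTGROUPS_COUNT);
	assert(cpu_boosts[cpu].tasks[group] > 0);
	/* The max can only change when a group becomes idle. */
	if (--cpu_boosts[cpu].tasks[group] == 0)
		cpu_boost_update(cpu);
}
```

The maximum only needs recomputing when a group's counter goes from 0
to 1 or from 1 to 0, and the scan is bounded by the number of boost
groups rather than by the depth of a cgroup hierarchy. Changing a
group's boost value would additionally require refreshing max_boost on
every CPU, which this sketch does not handle.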

As a final remark, I would like to point out that Google is currently
using SchedTune in Android to classify tasks by "importance" and feed
this information into the scheduler. So far, in doing this, we have not
spotted limitations related to the use of a flat hierarchy.

That said, I welcome this discussion, which is actually the main goal
of this RFC. My only suggestion is that we first think about use-cases
and then introduce a more complex solution, but only if we convince
ourselves that it brings more benefits than burdens in terms of code
maintainability.

Is your request for "proper support for hierarchy" somehow related to
the requirements of the "unified hierarchy"? Or do you also see other,
more functional/semantic aspects?


> Thanks.

If you are going to attend LPC next week, I hope we can have a chat on
these topics.

Cheers Patrick

-- 
#include <best/regards.h>

Patrick Bellasi