Date: Thu, 10 Nov 2016 11:59:18 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: luca abeni <luca.abeni@unitn.it>
Cc: Ingo Molnar <mingo@kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
        Juri Lelli <juri.lelli@gmail.com>,
        Steven Rostedt <rostedt@goodmis.org>,
        Claudio Scordino <claudio@evidence.eu.com>,
        Tommaso Cucinotta <tommaso.cucinotta@sssup.it>,
        Daniel Bistrot de Oliveira <danielbristot@gmail.com>,
        Henrik Austad <henrik@austad.us>, linux-kernel@vger.kernel.org
Subject: Re: [RFD] sched/deadline: Support single CPU affinity
Message-ID: <20161110105918.GT3142@twins.programming.kicks-ass.net>
References: <20161110080807.GD11311@worktop.programming.kicks-ass.net>
 <20161110100602.2781af72@sweethome>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161110100602.2781af72@sweethome>
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5057
Lines: 119

On Thu, Nov 10, 2016 at 10:06:02AM +0100, luca abeni wrote:
> Hi Peter,
> 
> On Thu, 10 Nov 2016 09:08:07 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Add support for single CPU affinity to SCHED_DEADLINE; the supposed
> > reason for wanting single CPU affinity is better QoS than provided by
> > G-EDF.
> This looks very interesting, thanks for sharing!
> I have some "theoretical" comments; I'll look at the code later.
> 
> > Therefore the aim is to provide harder guarantees, similar to UP, for
> > single CPU affine tasks. This then leads to a mixed criticality
> > scheduling requirement for the CPU scheduler. G-EDF like for the
> > non-affine (global) tasks and UP like for the single CPU tasks.
> The goal for single CPU affine tasks is clear; which kind of guarantees
> do you want to provide for the other (global) tasks? The tardiness
> guarantees?

I wanted to provide as close to regular G-EDF guarantees as possible. So
yes, the bounded tardiness, but also try and meet deadlines when
possible.

> 
> [...]
> > MIXED CRITICALITY SCHEDULING
> > 
> > Since we want to provide better guarantees for single CPU affine
> > tasks than the G-EDF scheduler provides for the single CPU tasks, we
> > need to somehow alter the scheduling algorithm.
> > 
> > The trivial layered EDF/G-EDF approach is obviously flawed in that it
> > will result in many unnecessary deadline misses. The trivial example
> > is having a single CPU task with a deadline after a runnable global
> > task. By always running single CPU tasks over global tasks we can
> > make the global task miss its deadline even though we could easily
> > have ran both within the alloted time.

> Ok; the layered approach clearly causes some unneeded deadline misses
> on global tasks... But those tasks risk to miss deadlines anyway (with
> the corrent scheduler, they are guaranteed to have a limited tardiness,
> not to respect all of the deadlines). I think this is related to the
> question about which guarantees are provided to the global tasks.

Right, so I wanted to limit the impact. Basically, given a stronger
admission test for the GEDF scheduler that would guarantee deadlines, I
would like the new scheduling function to still guarantee all deadlines.

That is, I didn't want to let the scheduling function be the weak spot,
only the admission test.

> > Therefore we must use a more complicated scheme. By adding a second
> > measure present in the sporadic task model to the scheduling function
> > we can try and distinguish between the constraints of handling the
> > two cases in a single scheduler.
> > 
> > We define the time to fail as:
> > 
> >   ttf(t) := t_d - t_b; where
> > 
> > 	t_d is t's absolute deadline
> > 	t_b is t's remaining budget

> Ok; I think this is similar to what is called "pseudo-deadline" in some
> papers.
> If I understand well, it is equal to the current time + the task
> "laxity" (or slack time). So, scheduling the task with the smaller ttf
> is equivalent to the "least laxity first" (LLF) algorithm.
> Giving precedence to tasks with 0 laxity is a technique that is often
> used to improve the schedulability on multi-processor systems.

Right, similar to LLF. Initially I was using the term laxity here, but
Hendrik convinced me that this is called time-to-fail. I'll let him
explain :-)

But yes, a pure TTF based scheduler should be equivalent to LLF which
itself is fairly similar to EDF in that its optimal for UP etc.. (I
think).

> > This is the last possible moment we must schedule this task such that
> > it can complete its work and not miss its deadline.
> > 
> > If we then augment the regular EDF rules by, for local tasks,
> > considering the time to fail and let this measure override the
> > regular EDF pick when the time to fail can be overran by the EDF pick.
> > 
> > That is, when:
> > 
> > 	now + left_b > min(ttf)

> Ok, this looks interesting... I have to better understand this rule. If
> I am not wrong, this is equivalent to

> 	now + left_budget > min(now + laxity) =>
> 	=> left_budget > min(laxity)

> So, if I understand well when the remaining budget of the current task
> is larger than the minimum laxity you switch from EDF to LLF (which is
> equivalent to local time to fail)? Is this understanding correct?

I think so, but I don't have the exact laxity definitions at hand, I'll
need to go dig out that paper. I should have it somewhere.

> I _suspect_ that the migration rules must also be changed (when a task
> migrates from a runqueue, it currently selects as a target the runqueue
> having the largest earliest deadline... Maybe it should also consider
> the presence of rq-local tasks - or the LLF-ordered heap - on the
> target?)

Quite possible, I didn't think about that.

> Did you find this scheduling strategy on some paper? Otherwise, I think
> we need to develop some theoretical analysis for it... (which is of
> course another interesting thing! :)

No, I cooked this up myself. There might of course still be a paper on
this, but if so, I'm blissfully unaware.