Date: Thu, 10 Nov 2016 13:56:46 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: Henrik Austad <henrik@austad.us>
Cc: Ingo Molnar <mingo@kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
        Juri Lelli <juri.lelli@gmail.com>, Luca Abeni <luca.abeni@unitn.it>,
        Steven Rostedt <rostedt@goodmis.org>,
        Claudio Scordino <claudio@evidence.eu.com>,
        Tommaso Cucinotta <tommaso.cucinotta@sssup.it>,
        Daniel Bistrot de Oliveira <danielbristot@gmail.com>,
        linux-kernel@vger.kernel.org
Subject: Re: [RFD] sched/deadline: Support single CPU affinity
Message-ID: <20161110125646.GS3117@twins.programming.kicks-ass.net>
References: <20161110080807.GD11311@worktop.programming.kicks-ass.net>
 <20161110122100.GA30974@sisyphus.home.austad.us>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161110122100.GA30974@sisyphus.home.austad.us>
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3979
Lines: 89

On Thu, Nov 10, 2016 at 01:21:00PM +0100, Henrik Austad wrote:
> On Thu, Nov 10, 2016 at 09:08:07AM +0100, Peter Zijlstra wrote:

> > We define the time to fail as:
> > 
> >   ttf(t) := t_d - t_b; where
> > 
> > 	t_d is t's absolute deadline
> > 	t_b is t's remaining budget
> > 
> > This is the last possible moment we must schedule this task such that it can
> > complete its work and not miss its deadline.
> 
> To elaborate a bit on this (this is a modified LLF approach if my memory 
> serves):
> 
> You have the dynamic time-to-failure (TtF), i.e. as the task progresses 
> (scheduled to run), the relative time-to-failure will remain constant. This 
> can be used to compare thasks to a running task and should minimize the 
> number of calculations required.
> 
> Then you have the static Time-of-failure (ToF, which is the absoulte time 
> when a task will no longer be able to meet its deadline. This is what you 
> use for keeping a sorted list of tasks in the runqueue. As this is a fixed 
> point in time, you do not have to dynamically update or do crazy 
> calculation when inserting/removing threads from the rq.
> 
> > If we then augment the regular EDF rules by, for local tasks, considering the
> > time to fail and let this measure override the regular EDF pick when the
> > time to fail can be overran by the EDF pick.
> 
> Then, if you do this - do you need to constrict this to a local CPU? I 
> *think* you could do this in a global scheduler if you use ToF/TtF for all 
> deadline-tasks, I think you should be able to meet deadlines.
> 
> I had a rant about this way back [1,2 Sec 11.4], I need to sit down and 
> re-read most of it, it has been a few too many years, but the idea was to 
> minimize the number of context-switches (which LLF is prone to get a lot 
> of) as well as minimize the computational overhead by avoiding 
> re-calculating time-of-failre/time-to-failre a lot.
> 
> > That is, when:
> > 
> > 	now + left_b > min(ttf)
> 
> Why not just  use ttf/tof for all deadline-tasks? We have all the 
> information available anyway and it would probably make the internal logic 
> easier?

So the point is that the total task set isn't able to meet its deadlines
per-se. Therefore, no matter which scheduling function you pick, be it
EDF, LLF, TTF or EDZL, you'll miss deadlines.

Consider the trivial example of 2 CPU and 3 tasks with u=2/3. That's
just not going to work (with static job priority schedulers, so lets not
do Pfair etc. ;-).

The point here is to allow the local tasks to operate UP-like and
therefore get UP-like guarantees. We know all these scheduling functions
(EDF, LLF, TTF, EDZL) are optimal on UP.

At the same time, we'd like the global task set to not unduly suffer,
that is, if the total task set is schedulable under G-EDF, then we'd
like to actually achieve this.

So we have to distinguish between the local and global tasks from
principle, as we have different criticality requirements for them. We
want to always meet the local deadlines, but the global tasks are
subject to tardiness depending on the AC function.

Now, mixed criticality in general is 'proven' NP hard. But since the
sporadic task model has at least 2 variables and we need to distinguish
between 2 levels only, I feel this should be tractable.

And this is where and why I introduced TTF, its a second measure, next
to the exiting earlier deadline, to differentiate the criticality
levels. This then also means we should only use TTF for local tasks,
otherwise it cannot differentiate.

Furthermore, as you mentioned, we don't immediately use the 0-laxity
point as per EDZL (I've only read the link Tommaso provided, didn't get
the article per paywall), since 0-laxity is a very fragile point, _any_
system disturbance after that will mean a fail, also, getting it
requires a timer. So I slightly relaxed that to the last natural
scheduling point before that, which is where the:

  now + leftmost_budget > min(ttf)

constraint comes from.