2010-07-09 14:38:44

by Dario Faggioli

Subject: periods and deadlines in SCHED_DEADLINE

Hi all,

So, talking to Peter and Thomas at the OSPERT workshop in Brussels [1],
the so called "sporadic task model" came up many, many times! Such a
model comprises three parameters: wcet (or better budget, or better
still runtime :-D), deadline and period, where obviously deadline and
period could be different. However, SCHED_DEADLINE, until now, has been
using just the deadline, assuming that the period is always equal to it.

Now, Peter said something about starting to use the period as well, but
we didn't have the time to discuss it. Ironically, just a couple of days
earlier I had gotten _the_same_ feature request from Harald from
Ericsson (in Cc), asking for the very same thing.
So, this mail is to try to find a consensus (or just to gather your
opinions) on how to include this feature in the scheduler.

Since I'm about to release a new version, I would like, if possible, to
take these requests into account...

Here's the issue:
- using periods for calculating the tasks' bandwidth and then using
deadlines for scheduling the tasks is going to work, but the
admission control test that you would need for ensuring everybody
will make their deadline is waaay more complex than Sum_i(BW_i)<1, even
for uniprocessors/partitioning. That one instead would give you just
a very basic guarantee that the design is not completely broken
(formally, I think I should say it is only a necessary
condition :-)).
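
As a toy illustration (plain Python with invented helper names, not the
SCHED_DEADLINE code), the simple necessary-condition test amounts to:

```python
# Minimal sketch of the necessary-condition admission test discussed
# above: each task declares runtime Q and period P, and we only check
# Sum_i(Q_i/P_i) < 1. Passing it does NOT guarantee deadlines are met
# when deadline < period; failing it proves the set is overloaded.

def utilization(tasks):
    """Total bandwidth of a task set given as (runtime, period) pairs."""
    return sum(q / p for (q, p) in tasks)

def admit(tasks, new_task):
    """Necessary (not sufficient) test: admit only if total bw stays < 1."""
    return utilization(tasks + [new_task]) < 1.0

tasks = [(4, 10)]                    # one task using 0.4 bandwidth
print(admit(tasks, (4, 10)))         # True: 0.8 < 1, yet tasks with
                                     # deadlines shorter than their
                                     # periods may still miss deadlines
```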

Therefore, here is my proposal:
- if the programmer specifies both period and deadline, I act as above,
running the _only_necessary_ test in the kernel, assuming that
userspace performed some other kind of (correct!) admission test on
its own, or that it is cool with missing some deadlines... What do
you think?
- do you think it could be useful to have a different syscall to deal
with the period parameter (if it's different from deadline), e.g.,
something useful to make the task periodic as you have (if I remember
well) in Xenomai or RTAI?
If you think it's worth doing that, do you think the
task_wait_interval() syscall that we already added could/should do
the job?

Basically, from the scheduling point of view, what could happen is
that I'm still _NOT_ going to allow a task with runtime Q_i, deadline
D_i and period P_i to use more bandwidth than Q_i/P_i, and I'm still
using D for scheduling, but passing the simple in-kernel admission test
Sum_i(Q_i/P_i)<1 won't guarantee that the task will always finish its
jobs before D.

Please, if you find this interesting, provide me with your comments;
otherwise, excuse me for bothering you. :-)

Thanks and regards,
Dario

[1] http://www.artist-embedded.org/artist/Overview,1909.html

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2010-07-09 14:19:13

by Peter Zijlstra

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:

> - using periods for calculating the tasks' bandwidth and then using
> deadlines for scheduling the tasks is going to work, but the
> admission control test that you would need for ensuring anybody
> will make its deadline is waaay more complex than Sum_i(BW_i)<1, even
> for uniprocessors/partitionig. That one instead would gives you just
> a very basic guarantee that the design in not completely broken
> (formally, I think I should say it is only a necessary
> condition :-)).

Happen to have a paper handy that explains all this in a concise way?

2010-07-09 14:24:59

by Peter Zijlstra

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
> Basically, from the scheduling point of view, what it could happen is
> that I'm still _NOT_ going to allow a task with runtime Q_i, deadline
> D_i and period P_i to use more bandwidth than Q_i/P_i, I'm still using D
> for scheduling but the passing of the simple in-kernel admission test
> Sum_i(Q_i/P_i)<1 won't guarantee that the task will always finish its
> jobs before D.

But the tardiness would still be bounded, right? So its a valid Soft-RT
model?

2010-07-09 14:30:15

by Peter Zijlstra

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
>
> Therefore, here it is my proposal:
> - if the programmer specify both period and deadline, I act as above,
> running the _only_necessary_ test in the kernel, assuming that
> userspace performed some other kind of (correct!) admission test by
> its own, or that it is cool with missing some deadlines... What do
> you think?

It would be good to provide room in the interface to improve upon this
situation.

One thing we could do, although this would make the proposed scheduler a
wee bit more complex, is split between hard and soft realtime. Only
accept P==rel_D for hard, and schedule the soft tasks in the hard slack
or something like that.

That way we can later extend the hard admission tests to accept more.

2010-07-09 14:32:52

by Peter Zijlstra

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
> - do you think it could be useful to have a different syscall to deal
> with the period parameter (if it's different from deadline), e.g.,
> something useful to make the task periodic as you have (if I remember
> well) in Xenomai or RTAI?
> If you think it's worth doing that, do you think the
> task_wait_interval() syscall that we already added could/should do
> the job?

Again, I'm afraid I need some extra education here in order to make a
sensible comment.

What are the exact semantics of this extra proposed syscall?

What exactly are the benefits over not having it, and simply rely on the
task to not wake up more often, but if it does have it run into the lack
of budget and sort it that way?

2010-07-09 15:31:45

by Bjoern Brandenburg

Subject: Re: periods and deadlines in SCHED_DEADLINE


On Jul 9, 2010, at 4:18 PM, Peter Zijlstra wrote:

> On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
>
>> - using periods for calculating the tasks' bandwidth and then using
>> deadlines for scheduling the tasks is going to work, but the
>> admission control test that you would need for ensuring anybody
>> will make its deadline is waaay more complex than Sum_i(BW_i)<1, even
>> for uniprocessors/partitionig. That one instead would gives you just
>> a very basic guarantee that the design in not completely broken
>> (formally, I think I should say it is only a necessary
>> condition :-)).
>
> Happen to have a paper handy that explains all this in a concise way?
>

Sounds confusing, but this is actually not that complicated.

- If the period exceeds the deadline of a task, then it is said to have a constrained deadline, and is said to be a constrained task.

- The density of a task is the ratio budget/min(period, relative deadline). The density of a task is at least its utilization (budget/period).

- There exists a simple *sufficient* (but not necessary) test for uniprocessor EDF for constrained tasks: if the sum of all task densities is at most one (on each processor), then all jobs will meet their deadlines. (Of course, this assumes that the budget is only replenished at the beginning of each period and does not take self-suspensions due to I/O etc. into account.)

- More accurate tests exist. A very recent paper (presented this Wednesday at ECRTS) by Masrur et al. provides a summary of the state of the art and a new constant-time admission test.

Constant-Time Admission Control for Partitioned EDF
Alejandro Masrur, Samarjit Chakraborty, and Georg Färber
Proc. ECRTS 2010, pp 34-43
ftp://ftp.rcs.ei.tum.de/pub/papers/rtsg/edffast.pdf

As a side note, almost all global EDF hard real-time admission tests can handle tasks with constrained deadlines transparently. However, as far as I can tell, they do not apply to SCHED_DEADLINE.
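
The density test above is easy to spell out numerically. A few lines of
illustrative Python (invented names, not kernel code); tuples are
(budget C, period P, relative deadline D):

```python
# Illustrative check of the sufficient uniprocessor EDF test described
# above: if Sum_i(C_i / min(P_i, D_i)) <= 1, all constrained tasks
# provably meet their deadlines. Failing the test proves nothing.

def density(task):
    c, p, d = task
    return c / min(p, d)

def density_test(tasks):
    """Sufficient (not necessary): True means provably schedulable."""
    return sum(density(t) for t in tasks) <= 1.0

# A constrained task set that passes the density test:
ok = [(2, 10, 5), (3, 20, 10)]      # densities 0.4 + 0.3 = 0.7
print(density_test(ok))             # True

# Failing the test is inconclusive: the set may still be schedulable.
maybe = [(4, 10, 5), (4, 10, 5)]    # densities 0.8 + 0.8 = 1.6
print(density_test(maybe))          # False
```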

- Björn

PS: My responses will be delayed, I'm off to the airport...

2010-07-09 16:35:21

by Peter Zijlstra

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Fri, 2010-07-09 at 16:51 +0200, Bjoern Brandenburg wrote:
> On Jul 9, 2010, at 4:18 PM, Peter Zijlstra wrote:
>
> > On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
> >
> >> - using periods for calculating the tasks' bandwidth and then using
> >> deadlines for scheduling the tasks is going to work, but the
> >> admission control test that you would need for ensuring anybody
> >> will make its deadline is waaay more complex than Sum_i(BW_i)<1, even
> >> for uniprocessors/partitionig. That one instead would gives you just
> >> a very basic guarantee that the design in not completely broken
> >> (formally, I think I should say it is only a necessary
> >> condition :-)).
> >
> > Happen to have a paper handy that explains all this in a concise way?
> >
>
> Sounds confusing, but this is actually not that complicated.

Indeed, reading the referenced paper did clear things up, thanks!

I think the easiest path for now would indeed be to split between hard
and soft rt tasks, and limit hard to d==p, and later worry about
supporting d<p for hard.

It will very much depend on how we're going to go about doing things
with that 'load-balancer' thingy anyway.

The idea is that we approximate G-EDF by moving tasks around, but Dario
told me the admission tests are still assuming P-EDF.

Add to that the interesting problems of task affinity and we might soon
all have a head-ache ;-)

One thing we can do is limit the task affinity to either 1 cpu or all
cpus in the load-balance domain. Since no applications exist yet, we
can disallow things to make life easier.

If we only allow pinned tasks and free tasks, splitting the admission
test in two would suffice, I think: if we keep one per-cpu utilization
measure and use the maximum of these over all cpus to seed the global
utilization measure, things ought to work out.

If we do hard and soft independently, we would need to split those into
2 again, resulting in 4 sets of measures.
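
One possible reading of the pinned/free split above, sketched in
illustrative Python. The exact charging rules here are an assumption on
my part, not something settled in the thread:

```python
# Toy sketch of ONE possible reading of the split admission test:
# pinned tasks are tracked per CPU, while free (run-anywhere) tasks
# share a global measure seeded with the worst per-CPU pinned load.
# This is a guess at the intended semantics, purely for illustration.

def admit_pinned(pinned, cpu, u, cap=1.0):
    """A pinned task must fit in its own CPU's utilization measure."""
    return pinned.get(cpu, 0.0) + u <= cap

def admit_free(pinned, free_total, u, ncpus, cap=1.0):
    """A free task is charged against the maximum per-CPU pinned
    utilization plus the total utilization of all free tasks."""
    worst_pinned = max(pinned.values(), default=0.0)
    return worst_pinned + free_total + u <= ncpus * cap

pinned = {0: 0.5, 1: 0.2}
print(admit_pinned(pinned, 0, 0.4))              # True: 0.9 <= 1
print(admit_free(pinned, 0.8, 0.3, ncpus=2))     # True: 0.5+0.8+0.3 <= 2
```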



2010-07-10 07:08:41

by Dario Faggioli

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Fri, 2010-07-09 at 16:51 +0200, Bjoern Brandenburg wrote:
> > Happen to have a paper handy that explains all this in a concise way?
> >
>
> Sounds confusing, but this is actually not that complicated.
>
> - If the period exceeds the deadline of a task, then it is said to have a constrained deadline, and is said to be a constrained task.
>
> - The density of a task is the ratio budget/min(period, relative deadline). The density of a task is at least its utilization (budget/period).
>
> - There exists a simple *sufficient* (but not necessary) test for uniprocessor EDF for constrained tasks: if the sum of all task densities is at most one (on each processor), then all jobs will meet their deadlines. (Of course, this assumes that the budget is only replenished at the beginning of each period and does not take self-suspensions due to I/O etc. into account.)
>
Right, I think Bjoern made it much clearer than you'll find it in many
of the existing papers! :-)

Just to make a simple UP example, if you have two tasks with the
following parameters (where T_i=(C_i, P_i, D_i)):

T_1 = (4, 10, 5)
T_2 = (4, 10, 5)

Then the test Sum_i(C_i/P_i)<1 will say "go!" (4/10+4/10<1), but
one of the tasks can miss its deadline by 3 time units in the worst case
(whether it does, and if so which one, depends on the order in which
they arrive).

Using D_i instead of P_i in the test will make things better in this
case (4/5+4/5>1 ==> "no go!"), but it's overly pessimistic (i.e., only
sufficient). In fact, consider this task set:

T_1 = (1, 10, 2)
T_2 = (2, 20, 4)
T_3 = (3, 50, 6)

It has small utilization (Sum_i(C_i/P_i)=0.26) and *is schedulable*,
even though 1/2+2/4+3/6=1.5>1.
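
Both examples can be checked mechanically. Illustrative Python (tuples
are (C_i, P_i, D_i); not kernel code):

```python
# Numeric check of the two examples above: utilization uses C/P,
# density uses C/min(P, D).

def util(tasks):
    return sum(c / p for (c, p, d) in tasks)

def dens(tasks):
    return sum(c / min(p, d) for (c, p, d) in tasks)

set1 = [(4, 10, 5), (4, 10, 5)]
print(util(set1))            # 0.8 -> utilization test says "go!", yet
                             #        a deadline can be missed by 3
print(dens(set1))            # 1.6 -> density test correctly says "no go"

set2 = [(1, 10, 2), (2, 20, 4), (3, 50, 6)]
print(round(util(set2), 2))  # 0.26 -> tiny utilization
print(dens(set2))            # 1.5 -> density test says "no go", even
                             #        though this set IS schedulable
```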

Hope this helped in clarifying things even more! :-D
Btw, another nice survey about schedulability tests for global EDF that
I know of (and you'll forgive me if I point you to something from my
institution :-)) is this: http://retis.sssup.it/~marko/papers/ICPP09.pdf.

> As a side note, almost all global EDF hard real-time admission tests can handle tasks with constrained deadlines transparently. However, as far as I can tell, they do not apply to SCHED_DEADLINE.
>
This is the only part I am not sure I got... Can you explain what you
mean by "they do not apply"?

> PS: My responses will be delayed, I'm off to the airport...
>
Yep, so will mine be, including this one! :-P

Thanks and regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]



2010-07-10 07:16:54

by Luca Abeni

Subject: Re: periods and deadlines in SCHED_DEADLINE

Hi all,

first of all, thanks for including me in these emails, and sorry for the
delay...

On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
> Hi all,
>
> So, talking to Peter and Thomas at the OSPERT workshop in Brussels [1],
> the so called "sporaidic task model" came out many many times!
I assume here you are simply talking about tasks with relative deadline
different from period, right? (the term "sporadic" is more often
associated with non-periodic activation patterns).

[...]
> - do you think it could be useful to have a different syscall to deal
> with the period parameter (if it's different from deadline), e.g.,
> something useful to make the task periodic as you have (if I remember
> well) in Xenomai or RTAI?
Maybe I am confused because I missed the initial part of the discussion,
but here I think there is a risk of mixing two different concepts: the
"reservation period" (that is, the period P used to postpone the
scheduling deadline when the budget reaches 0), and the "task
period" (which has to do with the periodicity of task activations). For
implementing a periodic behaviour in the task (this is, AFAIK, what the
RTAI-like APIs provide), no new syscall is needed: clock_nanosleep() is
enough. See http://www.disi.unitn.it/~abeni/RTOS/rtapi.pdf for a
(stupid) example.
The reservation period, on the other hand, is a scheduling parameter,
and I think that setting it with extended versions of sched_setparam(),
sched_setscheduler() and similar is ok.


> If you think it's worth doing that, do you think the
> task_wait_interval() syscall that we already added could/should do
> the job?
I do not remember what task_wait_interval() does :)
Is it the syscall you added to indicate the end of a job?


> Basically, from the scheduling point of view, what it could happen is
> that I'm still _NOT_ going to allow a task with runtime Q_i, deadline
> D_i and period P_i to use more bandwidth than Q_i/P_i, I'm still using D
> for scheduling but the passing of the simple in-kernel admission test
> Sum_i(Q_i/P_i)<1 won't guarantee that the task will always finish its
> jobs before D.
I think if you want a different P_i and D_i you can use D_i for
generating new scheduling deadlines on task arrivals as "d = t + D_i",
and P_i to postpone the scheduling deadlines as "d = d + P_i" when the
budget is 0.
Depending on the replenishment amount you use, you might need to modify
the admission test as "Sum_i(Q_i/min{P_i,D_i}) < 1" or not (if you
always replenish to Q_i, then you need a modified admission test;
otherwise, you can compute the replenishment amount so that the
admission test is unaffected).
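
The two rules above can be sketched as a toy replenishment routine
(illustrative Python with invented names; the real logic would live in
the scheduler):

```python
# Toy sketch of the two rules above, with a separate relative deadline
# D and reservation period P per task. All names are hypothetical.

class Task:
    def __init__(self, runtime, deadline, period):
        self.Q = runtime        # budget granted per replenishment
        self.D = deadline       # relative deadline (for scheduling)
        self.P = period         # reservation period (for bandwidth)
        self.budget = runtime
        self.abs_deadline = 0.0

def on_arrival(task, now):
    """New job arrives at time t: scheduling deadline is d = t + D_i."""
    task.abs_deadline = now + task.D
    task.budget = task.Q

def on_budget_exhausted(task):
    """Budget hit 0: postpone by the period, d = d + P_i, and refill."""
    task.abs_deadline += task.P
    task.budget = task.Q

t = Task(runtime=4, deadline=5, period=10)
on_arrival(t, now=0.0)
print(t.abs_deadline)     # 5.0
on_budget_exhausted(t)
print(t.abs_deadline)     # 15.0
```

Note that, as Luca says, always refilling to Q_i like this would require
the modified Sum_i(Q_i/min{P_i,D_i}) < 1 admission test.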


Thanks,
Luca

2010-07-10 07:20:36

by Luca Abeni

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Fri, 2010-07-09 at 16:24 +0200, Peter Zijlstra wrote:
> On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
> > Basically, from the scheduling point of view, what it could happen is
> > that I'm still _NOT_ going to allow a task with runtime Q_i, deadline
> > D_i and period P_i to use more bandwidth than Q_i/P_i, I'm still using D
> > for scheduling but the passing of the simple in-kernel admission test
> > Sum_i(Q_i/P_i)<1 won't guarantee that the task will always finish its
> > jobs before D.
>
> But the tardiness would still be bounded, right? So its a valid Soft-RT
> model?
I think that if Sum_i(Q_i/P_i)<1 but Sum_i(Q_i/min{P_i,D_i})>=1 then you
can have sporadic deadline misses, but it should still be possible to
compute an upper bound for the tardiness.
But this is just a feeling, I have no proof... :)


Luca

2010-07-10 07:50:39

by Dario Faggioli

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Fri, 2010-07-09 at 16:32 +0200, Peter Zijlstra wrote:
> On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
> > - do you think it could be useful to have a different syscall to deal
> > with the period parameter (if it's different from deadline), e.g.,
> > something useful to make the task periodic as you have (if I remember
> > well) in Xenomai or RTAI?
> > If you think it's worth doing that, do you think the
> > task_wait_interval() syscall that we already added could/should do
> > the job?
>
> Again, I'm afraid I need some extra education here in order to make a
> sensible comment.
>
Hey, fine, where's the problem? :-P

> What are the exact semantics of this extra proposed syscall?
>
Right now, it is:
task_wait_interval(t) --> "wake me up at the first instant after t when
you can give me my full runtime"

> What exactly are the benefits over not having it, and simply rely on the
> task to not wake up more often, but if it does have it run into the lack
> of budget and sort it that way?
>
What you're saying obviously will always work, and it is actually a
quite common usage pattern (we use it like that a lot! :-)).

The new syscall might help when it is important for a task to
synchronize with the budget provisioning mechanism. It might be
uncommon, but there could be situations --more in hard than in soft
scenarios-- where you want to be sure that your next job (and all the
subsequent ones, if you behave well) will get its full runtime, even if
this means waiting a little bit.

What I was wondering is whether this semantic should be modified by the
introduction of the "period", but I also agree with Luca that we must do
our best to avoid confusion!

Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]



2010-07-10 09:01:43

by Dario Faggioli

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Fri, 2010-07-09 at 18:35 +0200, Peter Zijlstra wrote:
> I think the easiest path for now would indeed be to split between hard
> and soft rt tasks, and limit hard to d==p, and later worry about
> supporting d<p for hard.
>
Mmm... I see... Are you thinking of another scheduling class? Or maybe
just another queue with "higher priority" inside the same scheduling
class (sched_dl.c)?

I mean I can do both, so I prefer to do what you like most in the first
place, instead of having to do it twice!! :-P

Maybe having two policies inside the same class (maybe handled in
separate queues/rb-trees) might save a lot of code duplication.
If we want to go like this, suggestions on the name(s) of the new (or of
both) policy(-ies) are more than welcome. :-D

> It will very much depend on how we're going to go about doing things
> with that 'load-balancer' thingy anyway.
>
Agree. The "load-balancer" right now pushes/pulls tasks to/from the
various runqueues --just as happens in sched_rt-- to, say, approximate
G-EDF. Code is in the git... I just need some time to clean it up a
little more and post the patches, but it's already working, at
least... :-)

> The idea is that we approximate G-EDF by moving tasks around, but Dario
> told me the admission tests are still assuming P-EDF.
>
Yep, as said above, that's what we've done until now. Regarding
"partitioned admission", let me try to explain.

You asked me to use sched_dl_runtime_us/sched_dl_period_us to let people
decide how much bandwidth should be devoted to EDF tasks. This obviously
yields _only_one_ bandwidth value, which is then used as the utilization
cap on *each* CPU, mainly for consistency with
sched_rt_{runtime,period}_us. At that time I was using such a value as
the "overall EDF bandwidth", but I changed to your suggested semantic.

Obviously this works perfectly as long as tasks stay on the CPU where
they were created, and if they're manually migrated (by explicitly
changing their affinity) I can easily check whether there's enough
bandwidth on the target CPU, and if so, move the task and its bandwidth
there. That's how things were before the 'load-balancer' (and still
are, if you set the affinities of tasks so as to have a fully
partitioned setup).

With global scheduling in place, we have a new situation. A task is
forked on a CPU (say 0), and I allow that if there's enough bandwidth
for it on that processor (and, if so, I also consume that amount of bw).
When the task is dynamically migrated to CPU 1 I have two choices:
(a) I move the bandwidth it occupies from 0 to 1 or,
(b) I leave it (the bw, not the task) where it is, on 0.

If I go for (b) and the scheduler wants to move a 0.2 task from CPU 0
(loaded up to 1.0) to CPU 1 (loaded up to 0.9), I get a "transitory"
situation with load 0.8 on CPU 0 and load 1.1 on CPU 1 --which I really
don't like--, but at least I'm still enforcing Sum_i(EDFtask_i)<1.9.
Moreover, if a new 0.1 task is forked by someone on CPU 1 (regardless
of whether it finds 1.1 or 1.0 load there), it will fail, even if there
is room for it in the system (on CPU 1) --which I really don't like!!
This is what I meant, as I said to Peter in Brussels, by a "still
partitioned" admission test.

If I go for (a), and again the scheduler tries to move a 0.2 task from
CPU 0 (loaded up to 1.0) to CPU 1 (loaded up to 0.9), I again have two
choices: failing or permitting this. Failing would mean another
limitation to global scheduling --which I really don't like-- but
allowing it would mean that another task of 0.2 can be created on CPU
0, so that I end up with bw(CPU0)+bw(CPU1)=1+1.1=2.1 --which I really
don't like!! :-O :-O

More-moreover, if my bw limit is 0.7 on each of the 2 CPUs I have,
keeping the bandwidths separated forbids me from creating a 0.2 task if
both CPUs are loaded up to 0.6, while it probably could be scheduled,
since we have global EDF! :-O

If you look at the code, you'll find (b) implemented right now, but, as
you might have understood, it's something I really don't like! :-(

If we want something better, I cannot think of anything that doesn't
include having a global (per-domain should be fine as well) mechanism
for bandwidth accounting...
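
The 0.7-per-CPU example above, replayed in illustrative Python (invented
helper names; exact fractions just avoid floating-point noise):

```python
# Illustrative model of per-CPU pools vs one per-domain pool, using
# the 0.7-per-CPU example from the mail. Helper names are made up;
# real accounting would live in the scheduler, not here.

from fractions import Fraction as F

def admit_per_cpu(load, cpu, u, cap=F(7, 10)):
    """Separated per-CPU pools: the new task must fit on one CPU."""
    return load[cpu] + u <= cap

def admit_domain(load, u, ncpus=2, cap=F(7, 10)):
    """One per-domain pool: only the sum across all CPUs matters."""
    return sum(load.values()) + u <= ncpus * cap

load = {0: F(6, 10), 1: F(6, 10)}   # both CPUs loaded to 0.6, cap 0.7

# A 0.2 task fails per-CPU admission on either CPU, yet the domain as
# a whole (1.2 used out of 1.4) still has room for it under G-EDF:
print(admit_per_cpu(load, 0, F(2, 10)))   # False
print(admit_domain(load, F(2, 10)))       # True
```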

> Add to that the interesting problems of task affinity and we might soon
> all have a head-ache ;-)
>
We right now support affinity, i.e., tasks will be pushed/pulled to/by
CPUs where they can run. I'm not aware of any academic work that
analyzes such a situation, but this doesn't mean we can't figure
something out... Just to give people an example of "why real-time
scheduling theory still matters"!! ;-P ;-P

> One thing we can do is limit the task affinity to either 1 cpu or all
> cpus in the load-balance domain. Since there don't yet exist any
> applications we can disallow things to make life easier.
>
> If we only allow pinned tasks and free tasks, splitting the admission
> test in two would suffice I think, if keep one per-cpu utilization
> measure and use the maximum of these over all cpus to start the global
> utilization measure, things ought to work out.
>
Ok, that seems possible to me, but since I have to write the code, you
must tell me what you want the semantics of the (syswide and per-group)
sched_dl_{runtime,period} to become and how I should treat them! :-)

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]



2010-07-10 09:14:39

by Dario Faggioli

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Fri, 2010-07-09 at 16:30 +0200, Peter Zijlstra wrote:
> On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
> >
> > Therefore, here it is my proposal:
> > - if the programmer specify both period and deadline, I act as above,
> > running the _only_necessary_ test in the kernel, assuming that
> > userspace performed some other kind of (correct!) admission test by
> > its own, or that it is cool with missing some deadlines... What do
> > you think?
>
> It would be good to provide room in the interface to improve upon this
> situation.
>
Well, the room is already there; the problem is deciding how to use it.

> One thing we could do, although this would make the proposed scheduler a
> wee bit more complex, is split between hard and soft realtime. Only
> accept P==rel_D for hard, and schedule the soft tasks in the hard slack
> or something like that.
>
As said in the other mail, this could be done; just let me know, as
precisely as you can, how you think it should happen (scheduling class,
scheduling policy, two (or more) separate queues, etc.).

Btw, Dhaval introduced me to the obscure world of IRC... I am (if
everything worked! :-P) dariof on #sched and/or #sched-rt, so feel free
to contact me there too; if I'm at the PC, I'll be glad to discuss
there as well! :-D

Thanks and regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]



2010-07-10 09:20:59

by Dario Faggioli

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Sat, 2010-07-10 at 09:09 +0200, Luca Abeni wrote:
> > - do you think it could be useful to have a different syscall to deal
> > with the period parameter (if it's different from deadline), e.g.,
> > something useful to make the task periodic as you have (if I remember
> > well) in Xenomai or RTAI?
> Maybe I am confused because I missed the initial part of the discussion,
> but here I think there is the risk to mix two different concepts: the
> "reservation period" (that is, the period P used to postpone the
> scheduling deadline when the budget arrives to 0), and the "task
> period" (which has to do with the periodicity of tasks activations).
>
Agree, this is also what I fear...

> For
> implementing a periodic behaviour in the task (this is, AFAIK, what RTAI
> similar API provide), no new syscall is needed: clock_nanosleep() is
> enough. See http://www.disi.unitn.it/~abeni/RTOS/rtapi.pdf for a
> (stupid) example.
>
Yep, agree again, just wanted to see what other folks' thoughts
were. :-)

> > If you think it's worth doing that, do you think the
> > task_wait_interval() syscall that we already added could/should do
> > the job?
> I do not remember what task_wait_interval() does :)
> Is it the syscall you added to indicate the end of a job?
>
Yep. And it can be useful for that purpose, or not be used at all.

> I think if you want a different P_i and D_i you can use D_i for
> generating new scheduling deadlines on task arrivals as "d = t + D_i",
> and P_i to postpone the scheduling deadlines as "d = d + P_i" when the
> budget is 0.
>
Yes, that's exactly what we wanted to do, considering it's also a very
small and easy-to-achieve behavioural change...

> Depending on the replenishment amount you use, you might need to modify
> the admission test as "Sum_i(Q_i/min{P_i,D_i}) < 1" or not (if you
> always replenish to Q_i, then you need a modified admission test;
> otherwise, you can compute the replenishment amount so that the
> admission test is unaffected).
>
Well, as I tried to point out in the other e-mails, there are also
other problems with the admission test... Let's see if we find a
consensus on how to proceed... :-)

Thanks and regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]



2010-07-10 10:28:21

by Peter Zijlstra

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Sat, 2010-07-10 at 11:01 +0200, Raistlin wrote:
> On Fri, 2010-07-09 at 18:35 +0200, Peter Zijlstra wrote:
> > I think the easiest path for now would indeed be to split between hard
> > and soft rt tasks, and limit hard to d==p, and later worry about
> > supporting d<p for hard.
> >
> Mmm... I see... Are you thinking of another scheduling class? Or maybe
> just another queue with "higher priority" inside the same scheduling
> class (sched_dl.c)?

Inside the same class, since, as you say, that would allow sharing lots
of things; also, conceptually it makes sense, as the admission tests
would really have to share a lot of data between them anyway.

> Maybe having two policies inside the same class (maybe handled in
> separate queues/rb-trees) might save a lot of code duplication.
> If we want to go like this, suggestions on the name(s) of the new (or of
> both) policy(-ies) are more than welcome. :-D

Right, so that's a good point. I'm wondering if we should use two
separate policies, or use one policy, SCHED_DEADLINE, and use flags to
distinguish between the two uses.

Anyway, that part is the easy part to implement and shouldn't take more
than a few minutes to flip between one and the other.

> > The idea is that we approximate G-EDF by moving tasks around, but Dario
> > told me the admission tests are still assuming P-EDF.
> >
> Yep, as said above, that's what we've done since now. Regarding
> "partitioned admission", let me try to explain this.
>
> You asked me to use sched_dl_runtime_us/sched_dl_period_us to let people
> decide how much bandwidth should be devoted to EDF tasks. This obviously
> yields to _only_one_ bandwidth value that is then utilized as the
> utilization cap on *each* CPU, mainly for consistency reasons with
> sched_rt_{runtime,period}_us. At that time I was using such value as the
> "overall EDF bandwidth", but I changed to your suggested semantic.

But if you have a per-cpu bandwidth, and the number of cpus, you also
have the total amount of bandwidth available to G-EDF, no?

> With global scheduling in place, we have this new situation. A task is
> forked on a CPU (say 0), and I allow that if there's enough bandwidth
> for it on that processor (and obviously, if yes, I also consume such
> amount of bw). When the task is dynamically migrated to CPU 1 I have two
> choices:
> (a) I move the bandwidth it occupies from 0 to 1 or,
> (b) I leave it (the bw, not the task) where it is, on 0.

Well, typically G-EDF doesn't really care about what cpu runs what, as
long as the admission thing is respected and we maintain the
smp-invariant of running the n<=m highest 'prio' tasks on m cpus.

So it really doesn't matter how we specify the group budget, one global
clock or one clock per cpu, if we have the number of cpus involved we
can convert between those.

(c) use a 'global' bw pool and be blissfully ignorant of the
per-cpu things?

> If we want something better I cannot think of anything that doesn't
> include having a global (per-domain should be fine as well) mechanism
> for bandwidth accounting...

Right, per root_domain bandwidth accounting for admission should be
perfectly fine.

> > Add to that the interesting problems of task affinity and we might soon
> > all have a head-ache ;-)
> >
> We right now support affinity, i.e., tasks will be pushed/pulled to/by
> CPUs where they can run. I'm not aware of any academic work that
> analyzes such a situation, but this doesn't mean we can't figure
> something out... Just to give people an example of "why real-time
> scheduling theory still matters"!! ;-P ;-P

Hehe, I wouldn't at all mind disallowing random affinity masks and only
dealing with 1 cpu or 'all' cpus for now.

But yeah, if someone can come up with something clever, I'm all ears ;-)

> > One thing we can do is limit the task affinity to either 1 cpu or all
> > cpus in the load-balance domain. Since there don't yet exist any
> > applications we can disallow things to make life easier.
> >
> > If we only allow pinned tasks and free tasks, splitting the admission
> > test in two would suffice I think; if we keep one per-cpu utilization
> > measure and use the maximum of these over all cpus to start the global
> > utilization measure, things ought to work out.
> >
> Ok, that seems possible to me, but since I have to write the code you
> must tell me what you want the semantic of (syswide and per-group)
> sched_dl_{runtime,period} to become and how should I treat them! :-)

Right, so for the system-wide and group bandwidth limits I think we
should present them as if there's one clock, and let the scheduler sort
out how many cpus are available to make it happen.

So we specify bandwidth as if we were looking at our watch, and say,
this here group can consume 30 seconds every 2 minutes. If the
load-balance domains happen to be larger than 1 cpu, hooray we can run
more tasks and the total available bandwidth simply gets multiplied by
the number of available cpus.

Makes sense?


2010-07-10 10:36:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: periods and deadlines in SCHED_DEADLINE

On Sat, 2010-07-10 at 09:11 +0200, Luca Abeni wrote:
> On Fri, 2010-07-09 at 16:24 +0200, Peter Zijlstra wrote:
> > On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
> > > Basically, from the scheduling point of view, what could happen is
> > > that I'm still _NOT_ going to allow a task with runtime Q_i, deadline
> > > D_i and period P_i to use more bandwidth than Q_i/P_i, I'm still using D
> > > for scheduling but the passing of the simple in-kernel admission test
> > > Sum_i(Q_i/P_i)<1 won't guarantee that the task will always finish its
> > > jobs before D.
> >
> > But the tardiness would still be bounded, right? So its a valid Soft-RT
> > model?

> I think that if Sum_i(Q_i/P_i)<1 but Sum_i(Q_i/min{P_i,D_i})>=1 then you
> can have sporadic deadline misses, but it should still be possible to
> compute an upper bound for the tardiness.
> But this is just a feeling, I have no proof... :)

The paper referenced by Bjoern yesterday mentioned that Baruah et al.
did that proof.

"For the considered case di < pi, Baruah et al. [7] showed
that the complexity increases considerably. However, assuming
the processor utilization to be strictly less than 1,
Baruah et al. [6, 7] proved that if a deadline is missed,
this happens within a maximum time upper bound which
can be computed."


[6] S. Baruah, A. Mok, and L. Rosier. Preemptively scheduling
hard-real-time sporadic tasks on one processor. Proceedings
of the 11th IEEE Real-Time Systems Symposium, December 1990.

[7] S. Baruah, L. Rosier, and R. Howell. Algorithms and
complexity concerning the preemptive scheduling of periodic
real-time tasks on one processor. Real-Time Systems,
2(4):301–324, November 1990.


It also jibes well with what I remember from reading through Jim's
papers on Soft-RT G-EDF.

2010-07-10 14:49:23

by Dario Faggioli

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Sat, 2010-07-10 at 12:28 +0200, Peter Zijlstra wrote:
> > Mmm... I see... Are you thinking of another scheduling class? Or maybe
> > just another queue with "higher priority" inside the same scheduling
> > class (sched_dl.c)?
>
> Inside the same class, since as you say that would allow sharing lots of
> things, also conceptually it makes sense as the admission tests would
> really have to share a lot of data between them anyway.
>
Ok, fine. So my plan is to get what I have out as fast as I can, so
that you can see the progress we've made, and then start working on
this new thing...

> Right, so that's a good point, I'm wondering if we should use two
> separate policies or use the one policy, SCHED_DEADLINE, and use flags
> to distinguish between these two uses.
>
> Anyway, that part is the easy part to implement and shouldn't take more
> than a few minutes to flip between one and the other.
>
Yes, that will need very little effort to change.

> But if you have a per-cpu bandwidth, and the number of cpus, you also
> have the total amount of bandwidth available to G-EDF, no?
>
That's certainly true. If you like, I can add something like that quite
easily... I'll try to do it on a per-domain basis, instead of making it
truly global.

> So it really doesn't matter how we specify the group budget, one global
> clock or one clock per cpu, if we have the number of cpus involved we
> can convert between those.
>
> (c) use a 'global' bw pool and be blissfully ignorant of the
> per-cpu things?
>
Ok, if having a global pool is not a problem, I think I'll keep the per
CPU bw specification as well as the per CPU bw availability. Not only
because I like multiplications more than divisions, but because I think
they could be useful if someone wants to achieve partitioned behaviour
by setting affinities accordingly.

> > Ok, that seems possible to me, but since I have to write the code you
> > must tell me what you want the semantic of (syswide and per-group)
> > sched_dl_{runtime,period} to become and how should I treat them! :-)
>
> Right, so for the system-wide and group bandwidth limits I think we
> should present them as if there's one clock, and let the scheduler sort
> out how many cpus are available to make it happen.
>
As said above, I agree and I'll do exactly that...

> So we specify bandwidth as if we were looking at our watch, and say,
> this here group can consume 30 seconds every 2 minutes. If the
> load-balance domains happen to be larger than 1 cpu, hooray we can run
> more tasks and the total available bandwidth simply gets multiplied by
> the number of available cpus.
>
> Makes sense?
>
Yep. Thanks! :-)

Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]



2010-07-10 15:11:43

by Peter Zijlstra

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Sat, 2010-07-10 at 09:50 +0200, Raistlin wrote:

> Hey, fine, where's the problem? :-P

We're talking about it.. the exact semantics and the reasons
therefore ;-)

> > What are the exact semantics of this extra proposed syscall?
> >
> Right now, it is:
> task_wait_interval(t) --> "wake me up at the first instant after t when
> you can give me my full runtime"
>
> > What exactly are the benefits over not having it, and simply rely on the
> > task to not wake up more often, but if it does have it run into the lack
> > of budget and sort it that way?
> >
> What you're saying obviously will always work, and it is actually a
> quite common usage pattern (we use it like that a lot! :-)).
>
> The new syscall might help when it is important for a task to
> synchronize with the budget provisioning mechanism. It might be
> uncommon, but there could be situations --more in hard than in soft
> scenarios-- where you want to be sure that your next job (and all the
> subsequent ones, if you behave well) will get its full runtime, even if
> this means waiting a little bit.
>
> What I was wondering is whether these semantics should be modified by
> the introduction of the "period", but I also agree with Luca that we
> must do our best to avoid confusion!

Right, so I would actually expect RT job release to be triggered by
external events (say interrupts) more than on their own. And when it's
an external event I don't really see the use of this new syscall.

I guess I'm asking for what reason RT tasks would ever be
self-releasing; it seems odd...

2010-07-10 17:19:13

by Harald Gustafsson

Subject: Re: periods and deadlines in SCHED_DEADLINE

2010/7/9 Peter Zijlstra <[email protected]>:
> One thing we could do, although this would make the proposed scheduler a
> wee bit more complex, is split between hard and soft realtime. Only
> accept P==rel_D for hard, and schedule the soft tasks in the hard slack
> or something like that.
>
> That way we can later extend the hard admission tests to accept more.

Sorry for jumping in a bit late. I'm not that happy with this
suggestion, if I understand you correctly. The main reason for having
deadlines shorter than the periods is for tasks that need a short
response time and are likely among the most important in the system,
so that they are scheduled as fast as possible. Hence, if they get
scheduled after the tasks with deadline=period, that defeats the
purpose of having a short deadline. Quite often these tasks are short
and hence only need a low bandwidth, i.e. a long period between
activations relative to the deadline and runtime.

But other use cases also exist, with longer-running tasks (e.g. around
5-10 ms) per period (e.g. around 20 ms). You might have several such
tasks running, but as a system designer you know that their
activation phases will allow them to be scheduled interleaved. This can
be, for example, because you know that the interrupt patterns waking
the tasks are interleaved. The admission test would be even more
complex if we also needed to take into account the phases of task
periods. Hence I think some of these things need to be left to the
system designer, without being hindered by admission into the highest
hard deadline scheduling policy. As you might have understood, I'm
mostly talking about embedded systems, which have some tasks that are
central parts of the running system but which might also run more
generic software in parallel.

Did I get your proposal correct? Do you agree that tasks that need to
set a deadline < period usually do so because they need to be
scheduled quickly?

2010-07-10 17:29:49

by Harald Gustafsson

Subject: Re: periods and deadlines in SCHED_DEADLINE

> Right, so I would actually expect RT job release to be triggered by
> external events (say interrupts) more than on their own. And when it's
> an external event I don't really see the use of this new syscall.

I agree that the common usage would be to wake up on an external
event, rather than sleeping for a specified time period that is longer
than the set period. I would assume that the period would be set
to match some generic operational unit of the task, like a frame rate
or packet rate, etc.

2010-07-10 18:32:50

by Peter Zijlstra

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Sat, 2010-07-10 at 19:19 +0200, Harald Gustafsson wrote:
> 2010/7/9 Peter Zijlstra <[email protected]>:
> > One thing we could do, although this would make the proposed scheduler a
> > wee bit more complex, is split between hard and soft realtime. Only
> > accept P==rel_D for hard, and schedule the soft tasks in the hard slack
> > or something like that.
> >
> > That way we can later extend the hard admission tests to accept more.
>
> Sorry for jumping in a bit late. I'm not that happy with this
> suggestion, if I understand you correctly. The main reason for having
> deadlines shorter than the periods is for tasks that need a short
> response time and are likely among the most important in the system,
> so that they are scheduled as fast as possible. Hence, if they get
> scheduled after the tasks with deadline=period, that defeats the
> purpose of having a short deadline. Quite often these tasks are short
> and hence only need a low bandwidth, i.e. a long period between
> activations relative to the deadline and runtime.

> Did I get your proposal correct? Do you agree that tasks that need to
> set a deadline < period usually do so because they need to be
> scheduled quickly?

Sure, but 'quickly' doesn't convey whether it's soft or hard RT you're
interested in. The soft scheme would still have a bound on the tardiness
and is quite sufficient for a large number of workloads (but of course
there are plenty of hard workloads too).

> But other use cases also exist, with longer-running tasks (e.g. around
> 5-10 ms) per period (e.g. around 20 ms). You might have several such
> tasks running, but as a system designer you know that their
> activation phases will allow them to be scheduled interleaved. This can
> be, for example, because you know that the interrupt patterns waking
> the tasks are interleaved. The admission test would be even more
> complex if we also needed to take into account the phases of task
> periods. Hence I think some of these things need to be left to the
> system designer, without being hindered by admission into the highest
> hard deadline scheduling policy. As you might have understood, I'm
> mostly talking about embedded systems, which have some tasks that are
> central parts of the running system but which might also run more
> generic software in parallel.

That is a very delicate point, the whole reason SCHED_FIFO and friends
suck so much is that they don't provide any kind of isolation, and thus,
as an Operating-System abstraction they're an utter failure.

If you take out admission control you end up with a similar situation.

In general the sporadic task model schedulers don't need to be
privileged because it does provide isolation. But the moment you allow
by-passing the admission control everything goes out the window. So even
a simple privileged flag telling the admission control to stuff it would
render the whole system unusable, you'd have to start failing everything
not having that flag, simply because the admission control is rendered
useless.

So I would like to take the stand that the mainline scheduler will not
allow such a knob and people will have to help out with improving the
admission controller.

Embedded people can of course easily hack in whatever they well fancy,
and adding the 'yes_I_really_want_this_anyway' flag or even taking out
admission control all together is something the GPL allows them to do.

2010-07-10 20:08:49

by Harald Gustafsson

Subject: Re: periods and deadlines in SCHED_DEADLINE

2010/7/10 Peter Zijlstra <[email protected]>:
> Sure, but 'quickly' doesn't convey whether it's soft or hard RT you're
> interested in. The soft scheme would still have a bound on the tardiness
> and is quite sufficient for a large number of workloads (but of course
> there are plenty of hard workloads too).

It's soft RT (since ways of recovering exist), but with a high
requirement that failures be rare, e.g. once every 3000 periods, due
to QoS.

>> But other use cases also exist, with longer-running tasks (e.g. around
>> 5-10 ms) per period (e.g. around 20 ms). You might have several such
>> tasks running, but as a system designer you know that their
>> activation phases will allow them to be scheduled interleaved. This can
>> be, for example, because you know that the interrupt patterns waking
>> the tasks are interleaved. The admission test would be even more
>> complex if we also needed to take into account the phases of task
>> periods. Hence I think some of these things need to be left to the
>> system designer, without being hindered by admission into the highest
>> hard deadline scheduling policy. As you might have understood, I'm
>> mostly talking about embedded systems, which have some tasks that are
>> central parts of the running system but which might also run more
>> generic software in parallel.
>
> That is a very delicate point, the whole reason SCHED_FIFO and friends
> suck so much is that they don't provide any kind of isolation, and thus,
> as an Operating-System abstraction they're an utter failure.
>
> If you take out admission control you end up with a similar situation.

OK, I see your point, and I also want to keep the isolation; it's just
that I thought the complexity might be too large to be accepted
by mainline. Let's work towards a solution with good admission
control, i.e. a more complex admission control that handles this.

> In general the sporadic task model schedulers don't need to be
> privileged because it does provide isolation. But the moment you allow
> by-passing the admission control everything goes out the window. So even
> a simple privileged flag telling the admission control to stuff it would
> render the whole system unusable, you'd have to start failing everything
> not having that flag, simply because the admission control is rendered
> useless.

Yes, that's true if you have any truly hard RT tasks in the system.
Could we have a way of making tasks with deadline=period also go into
the soft deadline RT policy, and not just always run before any
deadline<period tasks? Maybe by utilizing the flags field. In this way
we rather demote all tasks than elevate some tasks above others, and
the system designer could make sure to use hard RT only when needed
(and supported by the admission control). The fact that a task can
easily be admission controlled is maybe not sufficient reason for the
user to want it as a hard RT task, running before tasks that need
complex admission control. Also, without such a notion the behavior
might change with new kernel releases as the admission control is
developed, i.e. suddenly some tasks would be scheduled as hard that
were previously scheduled as soft deadline RT.

> So I would like to take the stand that the mainline scheduler will not
> allow such a knob and people will have to help out with improving the
> admission controller.

OK, now I agree, let's keep the hard RT, but allow soft deadline
scheduling also.

> Embedded people can of course easily hack in whatever they well fancy,
> and adding the 'yes_I_really_want_this_anyway' flag or even taking out
> admission control all together is something the GPL allows them to do.

Not an option I would like to pursue; it should be possible to get a
working solution without this.

2010-07-10 21:52:46

by Dario Faggioli

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Sat, 2010-07-10 at 22:08 +0200, Harald Gustafsson wrote:
> > That is a very delicate point, the whole reason SCHED_FIFO and friends
> > suck so much is that they don't provide any kind of isolation, and thus,
> > as an Operating-System abstraction they're an utter failure.
> >
> > If you take out admission control you end up with a similar situation.
>
> OK, I see your point, and I also want to keep the isolation; it's just
> that I thought the complexity might be too large to be accepted
> by mainline. Let's work towards a solution with good admission
> control, i.e. a more complex admission control that handles this.
>
Indeed. I think things might be done step by step, relaxing the
constraints as we find better solutions.

> > Embedded people can of course easily hack in whatever they well fancy,
> > and adding the 'yes_I_really_want_this_anyway' flag or even taking out
> > admission control all together is something the GPL allows them to do.
>
> Not an option I would like to pursue, it should be possible to get a
> working solution without this.
>
Yeah, I see your point and agree with it. Btw, I think that, even in the
configuration described by Peter, if you --as an embedded system
engineer-- have full control of your device/product, you can avoid
having any hard-rt tasks. Then, if you only have soft ones, you'll get
the benefit of being able to set D!=P without suffering any
interference... Am I right?

I think this could be a viable solution, at least until we have
something better to relax the assumptions of the schedulability test
for hard tasks, couldn't it?

Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]



2010-07-11 05:41:34

by Harald Gustafsson

Subject: Re: periods and deadlines in SCHED_DEADLINE

2010/7/10 Raistlin <[email protected]>:
> Indeed. I think things might be done step by step, relaxing the
> constraints as we find better solutions.
Yes, I definitely think the step-by-step approach is best here, so
that eventually it's possible to use hard deadline RT also for more
complex task activation patterns.

>> > Embedded people can of course easily hack in whatever they well fancy,
>> > and adding the 'yes_I_really_want_this_anyway' flag or even taking out
>> > admission control all together is something the GPL allows them to do.
>>
>> Not an option I would like to pursue, it should be possible to get a
>> working solution without this.
>>
> Yeah, I see your point and agree with it. Btw, I think that, even in the
> configuration described by Peter, if you --as an embedded system
> engineer-- have full control of your device/product, you can avoid
> having any hard-rt tasks. Then, if you only have soft ones, you'll get
> the benefit of being able to set D!=P without suffering any
> interference... Am I right?

Yes, this is exactly what I mean. As long as the interface gives the
user control to demote a task to soft RT, even if the admission
control would allow it into the hard RT policy. Then later, when we
have managed to include more complex admission controls in the kernel,
the system designer, once his system setup is handled, could move over
to always using hard RT. This would also mean that people would start
using the DL scheduler and have a motivation for improving the
admission control, because the benefit of reliable isolation between
tasks is preferable.

> I think this could be a viable solution, at least until we have
> something better to relax the assumptions of the schedulability test
> for hard tasks, couldn't it?
Yes, I agree.

Regards,
Harald

2010-07-11 06:13:07

by Bjoern Brandenburg

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Jul 10, 2010, at 12:36 PM, Peter Zijlstra wrote:
> On Sat, 2010-07-10 at 09:11 +0200, Luca Abeni wrote:
>> On Fri, 2010-07-09 at 16:24 +0200, Peter Zijlstra wrote:
>>> On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
>>>> Basically, from the scheduling point of view, what could happen is
>>>> that I'm still _NOT_ going to allow a task with runtime Q_i, deadline
>>>> D_i and period P_i to use more bandwidth than Q_i/P_i, I'm still using D
>>>> for scheduling but the passing of the simple in-kernel admission test
>>>> Sum_i(Q_i/P_i)<1 won't guarantee that the task will always finish its
>>>> jobs before D.
>>>
>>> But the tardiness would still be bounded, right? So its a valid Soft-RT
>>> model?
>
>> I think that if Sum_i(Q_i/P_i)<1 but Sum_i(Q_i/min{P_i,D_i})>=1 then you
>> can have sporadic deadline misses, but it should still be possible to
>> compute an upper bound for the tardiness.
>> But this is just a feeling, I have no proof... :)
>
> The paper referenced by Bjoern yesterday mentioned that Baruah et al.
> did that proof.
>
> "For the considered case di < pi, Baruah et al. [7] showed
> that the complexity increases considerably. However, assuming
> the processor utilization to be strictly less than 1,
> Baruah et al. [6, 7] proved that if a deadline is missed,
> this happens within a maximum time upper bound which
> can be computed."
>
>
> [6] S. Baruah, A. Mok, and L. Rosier. Preemptively scheduling
> hard-real-time sporadic tasks on one processor. Proceedings
> of the 11th IEEE Real-Time Systems Symposium, December 1990.
>
> [7] S. Baruah, L. Rosier, and R. Howell. Algorithms and
> complexity concerning the preemptive scheduling of periodic
> real-time tasks on one processor. Real-Time Systems,
> 2(4):301–324, November 1990.

These are two different things.

"Bounded tardiness" means "is there a constant x such that _every deadline_ is missed by at most x" and is commonly used in the context of global multiprocessor schedulers.

Baruah et al. showed that _on a uniprocessor_ the first deadline miss occurs within a bounded-length interval if the system is not fully utilized, which yields a slow but accurate admissions test. This result doesn't say by how much the first deadline is missed (if it is missed), nor whether or not there will be later deadline misses with ever-growing tardiness. (I don't think that's the case, but Baruah et al.'s work was not concerned with this question.)
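Restating the two results in the thread's notation (Q_i, P_i as in the admission test, f_J and d_J for a job J's completion time and absolute deadline):

```latex
% Bounded tardiness (the soft-RT, global-scheduling notion):
\exists\, x \ge 0 \;\text{such that for every job } J:\quad
\mathrm{tardiness}(J) \;=\; \max(0,\, f_J - d_J) \;\le\; x

% Baruah et al. (uniprocessor): if
\sum_i \frac{Q_i}{P_i} < 1
% then the first deadline miss, if any, occurs within a computable
% bound, which yields an exact (if slow) admission test; it says
% nothing about the magnitude of misses.
```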

>
>
> It also jives well with that I remember from reading through Jim's
> papers on Soft-RT G-EDF.
>

Guaranteed bounded tardiness, as long as the system is not over-utilized, for the case P_i!=D_i follows from Leontyev & Anderson's work on window-constrained scheduling policies [1]: each priority point (i.e., deadline) lies within the window [release, release + max(D_i)] and is thus window-constrained.

- Björn

[1] H. Leontyev and J. Anderson, "Generalized Tardiness Bounds for Global Multiprocessor Scheduling", Proceedings of the 28th IEEE Real-Time Systems Symposium, pp. 413-422, December 2007. http://www.cs.unc.edu/~anderson/papers/rtss07b.pdf

---
Björn B. Brandenburg
Ph.D. Student
Dept. of Computer Science
University of North Carolina at Chapel Hill
http://www.cs.unc.edu/~bbb



2010-07-11 06:15:59

by Bjoern Brandenburg

Subject: Re: periods and deadlines in SCHED_DEADLINE


On Jul 10, 2010, at 9:50 AM, Raistlin wrote:

>>
>> What are the exact semantics of this extra proposed syscall?
>>
> Right now, it is:
> task_wait_interval(t) --> "wake me up at the first instant after t when
> you can give me my full runtime"
>
>> What exactly are the benefits over not having it, and simply rely on the
>> task to not wake up more often, but if it does have it run into the lack
>> of budget and sort it that way?
>>
> What you're saying obviously will always work, and it is actually a
> quite common usage pattern (we use it like that a lot! :-)).
>
> The new syscall might help when it is important for a task to
> synchronize with the budget provisioning mechanism. It might be
> uncommon, but there could be situations --more in hard than in soft
> scenarios-- where you want to be sure that your next job (and all the
> subsequent ones, if you behave well) will get its full runtime, even if
> this means waiting a little bit.

Isn't this basically sched_yield? Don't run me now, give lower-priority work a chance to complete, let me run again when my budget is replenished.

Otherwise, what are the semantics of sched_yield under EDF? Cycle through equal-deadline jobs? Given sporadic task activations, this is probably not very useful.

- Björn

2010-07-11 06:43:39

by Bjoern Brandenburg

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Jul 10, 2010, at 11:01 AM, Raistlin wrote:
> On Fri, 2010-07-09 at 18:35 +0200, Peter Zijlstra wrote:
>> It will very much depend on how we're going to go about doing things
>> with that 'load-balancer' thingy anyway.
>>
> Agree. The "load-balancer" right now pushes/pulls tasks to/from the
> various runqueues --just as the same thing happens in sched-rt-- to,
> say, approximate G-EDF. Code is on the git... I just need some time to
> clean up a little bit more and post the patches, but it's already
> working at least... :-)
>
>> The idea is that we approximate G-EDF by moving tasks around, but Dario
>> told me the admission tests are still assuming P-EDF.
>>
> Yep, as said above, that's what we've done until now. Regarding
> "partitioned admission", let me try to explain this.
>
> You asked me to use sched_dl_runtime_us/sched_dl_period_us to let people
> decide how much bandwidth should be devoted to EDF tasks. This obviously
> yields _only_one_ bandwidth value, which is then used as the
> utilization cap on *each* CPU, mainly for consistency with
> sched_rt_{runtime,period}_us. At that time I was using that value as the
> "overall EDF bandwidth", but I changed to your suggested semantics.
>
> Obviously this works perfectly as long as tasks stay on the CPU were
> they are created, and if they're manually migrated (by explicitly
> changing their affinity) I can easily check if there's enough bandwidth
> on the target CPU, and if yes move the task and its bandwidth there.
> That's how things were before the 'load-balancer' (and still are, if
> you set the affinities of tasks so as to have a fully partitioned setup).
>
> With global scheduling in place, we have this new situation. A task is
> forked on a CPU (say 0), and I allow that if there's enough bandwidth
> for it on that processor (and obviously, if yes, I also consume such
> amount of bw). When the task is dynamically migrated to CPU 1 I have two
> choices:
> (a) I move the bandwidth it occupies from 0 to 1 or,
> (b) I leave it (the bw, not the task) where it is, on 0.
>
[...]
>
> If we want something better I cannot think of anything that doesn't
> include having a global (per-domain should be fine as well) mechanism
> for bandwidth accounting...


Now I am getting quite confused. In particular, all of the admission test discussion yesterday was P-EDF-specific, but it seems that now the discussion is about G-EDF.

Is the kernel implementing G-EDF or P-EDF? These are two different schedulers, with different admission tests. You can't just magically apply a test for P-EDF on each CPU and make claims about schedulability under G-EDF. Tracking per-CPU budgets under G-EDF is also meaningless.

Either this is a P-EDF implementation, in which case you need to track per-CPU utilization and move budgets around when you migrate tasks (i.e., (a) above) and you perform a P-EDF admissions test prior to _each_ migration (can you move the task as desired without causing some deadline to be missed on the target processor?), or it is a G-EDF implementation, in which case you only track total global utilization and perform G-EDF admissions tests on task creation (and no tests on task migration).

If you want to do G-EDF with limited and different budgets on each CPU (e.g., G-EDF tasks may only run for 100 out of 1000 ms on CPU 0, but for 400 out of 1000 ms on CPU 1), then you are entering the domain of restricted-supply scheduling, which is significantly more complicated to analyze (see [1,2]).

As far as I know there is no existing analysis for "almost G-EDF", i.e., the case where each task may only migrate among a subset of the processors (= affinity masks), except for the special case of clustered EDF (C-EDF), wherein the subsets of processors are non-overlapping.

- Björn

[1] I. Shin, A. Easwaran, I. Lee, Hierarchical Scheduling Framework for Virtual Clustering of Multiprocessors, in: Proceedings of the 20th Euromicro Conference on Real-Time Systems, 181–190, 2008.
[2] H. Leontyev and J. Anderson, Generalized Tardiness Bounds for Global Multiprocessor Scheduling, Real-Time Systems, Volume 44, Number 1, pages 26-71, February 2010. http://www.cs.unc.edu/~anderson/papers/rtj09.pdf


---
Björn B. Brandenburg
Ph.D. Student
Dept. of Computer Science
University of North Carolina at Chapel Hill
http://www.cs.unc.edu/~bbb



2010-07-11 06:46:45

by Bjoern Brandenburg

Subject: Re: periods and deadlines in SCHED_DEADLINE


On Jul 10, 2010, at 9:08 AM, Raistlin wrote:

>>
>> As a side note, almost all global EDF hard real-time admission tests can handle tasks with constrained deadlines transparently. However, as far as I can tell, they do not apply to SCHED_DEADLINE.
>>
> This is the only part I am not sure I got... Can you explain what you
> mean by "they do not apply"?

I was under the impression that the kernel implementation is a P-EDF implementation. Now it seems to me that it is in fact a "P-EDF with frequent migrations" implementation. I'd be hesitant to just assume that it "approximates G-EDF" sufficiently well to apply any of the published G-EDF tests.

- Björn

---
Björn B. Brandenburg
Ph.D. Student
Dept. of Computer Science
University of North Carolina at Chapel Hill
http://www.cs.unc.edu/~bbb



2010-07-11 07:33:24

by Bjoern Brandenburg

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Jul 10, 2010, at 10:08 PM, Harald Gustafsson wrote:
> 2010/7/10 Peter Zijlstra <[email protected]>
>>
>>
>> That is a very delicate point, the whole reason SCHED_FIFO and friends
>> suck so much is that they don't provide any kind of isolation, and thus,
>> as an Operating-System abstraction they're an utter failure.
>>
>> If you take out admission control you end up with a similar situation.
>
> OK, I see your point, and I also want to keep the isolation; it's just
> that I thought that the complexity might be too large to be accepted
> by mainline. Let's work towards a solution with good admission
> control, i.e. having more complex admission control to handle this.
>
>> In general the sporadic task model schedulers don't need to be
>> privileged because it does provide isolation. But the moment you allow
>> by-passing the admission control everything goes out the window. So even
>> a simple privileged flag telling the admission control to stuff it would
>> render the whole system unusable, you'd have to start failing everything
>> not having that flag, simply because the admission control is rendered
>> useless.
>
> Yes, that's true if you have any truly hard RT tasks in the system.
> Could we have a way of making tasks with deadline=period also go into
> the soft deadline RT policy, and not just always run before any
> deadline<period tasks? Maybe utilizing the flags field. In this way we
> rather demote all tasks than elevate some tasks above others, and the
> system designer could make sure to use hard RT only when needed (and
> supported by the admission control). The fact that a task can easily
> be admission controlled is perhaps not sufficient reason for the user
> to want it as a hard RT task, running before tasks that need complex
> admission control. Also, without such a notion the behavior might
> change with new kernel releases as the admission control is developed,
> i.e., suddenly some tasks would be scheduled as hard that were
> previously scheduled as soft deadline RT.

Trying to infer whether a task is "hard" or "soft" from task parameters is not a good idea, IMO. It's much better to make this an explicit part of the task model that is configured via sched_setparam. By default, tasks should be marked "soft" (this leaves more wiggle room to the kernel); users who care can change the flag to "hard".

Taking a step back, I think the problem here is that we are trying to shove too many concepts and properties into a single scheduler. Hard (no tardiness) is different from soft (bounded tardiness) is different from global is different from partitioned.

From my point of view, it makes no sense to support hard deadlines under G-EDF (this is backed up by our schedulability studies [1]). Hard deadlines are best served by a P-EDF implementation (that only migrates on task creation/admission).

Also, it makes little sense to worry about supporting both hard and soft in a P-EDF implementation. Just schedule everything by its deadline and both hard and soft tasks will be fine (if the admissions test was passed).

In my opinion, it is much more useful to have a two-level scheduling framework: a truly partitioned EDF implementation at the top level to support "hard" tasks, and a truly global EDF implementation at the second level for "soft" tasks, with the property that the second-level implementation is only invoked if the first level is idle. This nicely matches Linux's scheduling class hierarchy. This leaves open the question of how to avoid starving the second level if the first level has "large" budgets (e.g., wcet 1 hour, period 2 hours), but this can be done with some additional budget tracking and job slicing. This is explained in detail in [2] (please forgive me for promoting my own work, but I think it is relevant).

Here is what I think could work for the first iteration:

1) Set period, relative deadline, budget, and the hard-soft flag with sched_setparam.
2) For hard tasks, the kernel finds the CPU with the lowest total utilization that is marked in the affinity masks and that passes a P-EDF admissions test.
3) Hard tasks are scheduled by a P-EDF implementation as described in [2].
4) For soft tasks, the kernel requires that _all_ CPUs are marked in the affinity mask.
5) Soft tasks are scheduled with G-EDF as described in [2].
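Step 2) above could look roughly like the following. This is an illustrative user-space sketch, not kernel code; the function name, the utilization array, and the simple Sum(u_i) <= 1 admission test are all assumptions for the sake of the example:

```c
#define NCPUS 4

/* Among the CPUs allowed by the affinity mask, pick the one with the
 * lowest total utilization that still passes a P-EDF admission test
 * (here simply: total utilization stays at or below 1). */
static int pick_cpu_for_hard_task(const double util[NCPUS],
				  unsigned int affinity_mask,
				  double task_util)
{
	int best = -1;

	for (int cpu = 0; cpu < NCPUS; cpu++) {
		if (!(affinity_mask & (1u << cpu)))
			continue;		/* not in the mask */
		if (util[cpu] + task_util > 1.0)
			continue;		/* fails P-EDF admission */
		if (best < 0 || util[cpu] < util[best])
			best = cpu;
	}
	return best;	/* -1 means: reject the task */
}
```

The "lowest utilization first" choice is a worst-fit heuristic; other placement heuristics (first-fit, best-fit) would also satisfy step 2.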

Under this approach, both the P-EDF and G-EDF scheduling classes become simpler to implement since they only deal with one kind of task. In particular, load balancing under P-EDF is only required at admission time, the kernel is not required to infer anything, the policy is well-defined and predictable, and manual partitioning is available for "hard" tasks.

I assume that this works across 2-8 cores with shared caches, but (of course) G-EDF is not going to scale to largish numbers of cores and NUMA platforms. Thus, in later iterations 4) should be relaxed to allow clustered EDF.

If, however, you are more interested in supporting both hard and soft under a single G-EDF scheduler, then I recommend taking a look at [3].

- Björn

[1] A. Bastoni, B. Brandenburg, and J. Anderson, An Empirical Comparison of Global, Partitioned, and Clustered Multiprocessor Real-Time Schedulers, in submission, May 2010. http://www.cs.unc.edu/~anderson/papers/rtss10c.pdf

[2] B. Brandenburg and J. Anderson, Integrating Hard/Soft Real-Time Tasks and Best-Effort Jobs on Multiprocessors, Proceedings of the 19th Euromicro Conference on Real-Time Systems, pp. 61-70, July 2007. http://www.cs.unc.edu/~anderson/papers/ecrts07b.pdf

[3] H. Leontyev and J. Anderson, A Unified Hard/Soft Real-Time Schedulability Test for Global EDF Multiprocessor Scheduling, Proceedings of the 29th IEEE Real-Time Systems Symposium, pp. 375-384, December 2008. http://www.cs.unc.edu/~anderson/papers/rtss08a.pdf

---
Björn B. Brandenburg
Ph.D. Student
Dept. of Computer Science
University of North Carolina at Chapel Hill
http://www.cs.unc.edu/~bbb



2010-07-12 10:21:50

by Harald Gustafsson

Subject: Re: periods and deadlines in SCHED_DEADLINE

2010/7/11 Bjoern Brandenburg <[email protected]>:
> Trying to infer whether a task is "hard" or "soft" from task parameters is not a good idea, IMO. It's much better to make this an explicit part of the task model that is configured via sched_setparam. By default, tasks should be marked "soft" (this leaves more wiggle room to the kernel); users who care can change the flag to "hard".

Yes, this was along the lines of my suggestions also.

> Taking a step back, I think the problem here is that we are trying to shove too many concepts and properties into a single scheduler. Hard (no tardiness) is different from soft (bounded tardiness) is different from global is different from partitioned.
>
> From my point of view, it makes no sense to support hard deadlines under G-EDF (this is backed up by our schedulability studies [1]). Hard deadlines are best served by a P-EDF implementation (that only migrates on task creation/admission).
>
> Also, it makes little sense to worry about supporting both hard and soft in a P-EDF implementation. Just schedule everything by its deadline and both hard and soft tasks will be fine (if the admissions test was passed).
>
> In my opinion, it is much more useful to have a two-level scheduling framework: a truly partitioned EDF implementation at the top level to support "hard" tasks, and a truly global EDF implementation at the second level for "soft" tasks, with the property that the second level implementation is only invoked if the first level is idle. This nicely matches Linux's scheduling class hierarchy.

Would this prevent soft RT tasks from being pinned to a specific CPU,
i.e., having an affinity set of one CPU? I'm asking since we would
like to use soft deadline RT (bounded tardiness, e.g. with relative
deadlines shorter than periods), but with threads set to run on a
specific CPU. This is because we do an analysis of data dependencies
etc. in user space applications (or offline) and distribute jobs among
threads on each CPU. At the same time we would like isolation between
such applications running in parallel and other CPU activities.

I'm not saying that all soft RT needs to be pinned, or that some such
tasks can't be made to meet hard admission control, but we would need
the possibility. If I had to prioritize between soft deadline RT and
CPU affinity, soft deadline RT would win, since it is still quite
likely that the tasks get scheduled on individual cores when needed.

When you say truly global EDF implementation does that mean the
scheduler only considers relative deadline and may "migrate" a thread
among all the CPUs at each scheduling instance, disregarding cache
effects, power management (e.g. waking an idle core), etc? Or can
these things be factored into the relative deadlines (maybe they
already are by the user)?

/Harald

2010-08-02 19:35:21

by Peter Zijlstra

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Sun, 2010-07-11 at 09:32 +0200, Bjoern Brandenburg wrote:

> Trying to infer whether a task is "hard" or "soft" from task
> parameters is not a good idea, IMO. It's much better to make this an
> explicit part of the task model that is configured via sched_setparam.
> By default, tasks should be marked "soft" (this leaves more wiggle
> room to the kernel); users who care can change the flag to "hard".

I think we're in violent agreement here ;-) and I was convinced that was
what we were talking about. The question was only how to represent that
in the sched_param_ex structure, the options were:

struct sched_param_ex params;

params.flags |= SF_SOFT;
sched_setscheduler_ex( .policy = SCHED_DEADLINE, .param = &params);

vs

sched_setscheduler_ex( .policy = SCHED_DEADLINE_{SOFT,HARD},
.param = &params);

> Taking a step back, I think the problem here is that we are trying to
> shove too many concepts and properties into a single scheduler. Hard
> (no tardiness) is different from soft (bounded tardiness) is different
> from global is different from partitioned.
>
> From my point of view, it makes no sense to support hard deadlines
> under G-EDF (this is backed up by our schedulability studies [1]).
> Hard deadlines are best served by a P-EDF implementation (that only
> migrates on task creation/admission).
>
The problem is more that we need to support things like cpu affinity and
cpusets within the context of a 'global' scheduler.

Using cpusets we can partition the 'load-balancer' and create clusters
(root-domains in linux scheduler speak).

Using cpu affinity we can limit tasks to a subset of their cluster's
cpus.

Esp. the latter is very hard to do, and I think we can get away with
only allowing a single cpu or the full cluster (it's a new policy, so
there is no existing userspace to break).

This ends up meaning we need to support both P-EDF and G-EDF for soft,
and since we want to re-use pretty much all the code and only have a
different admission test for hard (initially), it would end up also
being P/G-EDF for hard (even though as you rightly point out, hard G-EDF
is pretty pointless -- but since the policy doesn't promise EDF, we
could later improve it to be PD^2 or whatever, at which point global
hard does start to make sense).

(which I guess would suggest we use different policies instead of a
flag, since that would make most sense if we end up replacing the hard
part with another policy)

So what I want to have is a sporadic task scheduler, not an EDF
scheduler (hence also the request to s/SCHED_EDF/SCHED_DEADLINE/ --
avoiding the obvious SCHED_SPORADIC in order to avoid confusion with the
POSIX thing).

EDF is just the easiest of the many different ways to schedule a
sporadic task set.

2010-08-03 08:16:22

by Peter Zijlstra

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Sun, 2010-07-11 at 08:46 +0200, Bjoern Brandenburg wrote:
> I'd be hesitant to just assume that it "approximates G-EDF"
> sufficiently well to apply any of the published G-EDF tests.

OK, suppose that for each cpu we keep the earliest and next-earliest
deadline in a table. Then on wakeup (job release) we pick the cpu with
the currently last deadline to preempt (we push the task).

On sleep (job completion) we look for the earliest among all
next-earliest deadlines to select the next runnable task (we pull the
task).

If we serialize all this using one big lock around this [ {earliest,
next-earliest} ] table, we've basically implemented G-EDF, agreed?

Now replace that global lock with an algorithm that looks at the table,
finds the last-earliest or earliest-next-earliest in a lock-less
fashion, then locks the target cpu's rq->lock, verifies the result and
either continues or tries again.

So we replace the global lock with cmpxchg like loops using 2 per-cpu
locks. Our current SCHED_FIFO balancer does just this and is found to be
a very good approximation of global-fifo (iirc there's one funny case,
but I can't remember, maybe Steve or Gregory can remember the details).

2010-08-03 09:41:31

by Peter Zijlstra

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Sun, 2010-07-11 at 08:42 +0200, Bjoern Brandenburg wrote:
>
> If you want to do G-EDF with limited and different budgets on each CPU
> (e.g., G-EDF tasks may only run for 100 out of 1000 ms on CPU 0, but
> for 400 out of 1000 ms on CPU 1), then you are entering the domain of
> restricted-supply scheduling, which is significantly more complicated
> to analyze (see [1,2]).

Without having looked at the refs, won't the soft case still have
bounded tardiness? Since the boundedness property mostly depends on
u<=1, that is, as long as we can always run everything within the
available time we won't start drifting.

> As far as I know there is no existing analysis for "almost G-EDF",
> i.e., the case where each task may only migrate among a subset of the
> processors (= affinity masks), except for the special case of
> clustered EDF (C-EDF), wherein the subsets of processors are
> non-overlapping.

Right, affinity masks are a pain, hence I proposed to limit that to
either 1 cpu (yielding fully partitioned) or the full cluster.

That will leave us with only having to stack a partitioned and global
scheduler on top of each other, and per the previous point, I think that
ought to work out trivially for soft, hard otoh will get 'interesting'.

2010-08-03 09:46:30

by Peter Zijlstra

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Sun, 2010-07-11 at 08:42 +0200, Bjoern Brandenburg wrote:
> If you want to do G-EDF with limited and different budgets on each CPU
> (e.g., G-EDF tasks may only run for 100 out of 1000 ms on CPU 0, but
> for 400 out of 1000 ms on CPU 1), then you are entering the domain of
> restricted-supply scheduling, which is significantly more complicated
> to analyze (see [1,2]).

Would making the thing homogenious by assuming the worst for all cpus
make the analysis easier? That is, in the above example, only allow the
G-EDF scheduler to run for 100 out of 1000 ms on both cpus.

2010-08-03 12:02:33

by Gregory Haskins

Subject: Re: periods and deadlines in SCHED_DEADLINE

Hello Peter, Steven, Thomas, et al. !

It's been a while.

>>> On 8/3/2010 at 04:16 AM, in message <1280823360.1923.419.camel@laptop>, Peter
Zijlstra <[email protected]> wrote:

[snip]

>
> So we replace the global lock with cmpxchg like loops using 2 per-cpu
> locks. Our current SCHED_FIFO balancer does just this and is found to be
> a very good approximation of global-fifo (iirc there's one funny case,
> but I can't remember, maybe Steve or Gregory can remember the details).

The only thing I remember at the moment is that Steven and I found one last remaining bug in an extreme corner
case where priority order may be violated. Sadly, I can no longer recall the exact specifics and would have to dig
into the code to remember. The good news is I believe the issue is fixable. It's just a matter of impl+test, which
neither of us has ever found the time for. Perhaps this will be the catalyst ;)

-Greg

2010-08-04 04:38:16

by Andrea Bastoni

Subject: Re: periods and deadlines in SCHED_DEADLINE

On 08/03/2010 05:46 AM, Peter Zijlstra wrote:
> On Sun, 2010-07-11 at 08:42 +0200, Bjoern Brandenburg wrote:
>> If you want to do G-EDF with limited and different budgets on each CPU
>> (e.g., G-EDF tasks may only run for 100 out of 1000 ms on CPU 0, but
>> for 400 out of 1000 ms on CPU 1), then you are entering the domain of
>> restricted-supply scheduling, which is significantly more complicated
>> to analyze (see [1,2]).
>
> Would making the thing homogenious by assuming the worst for all cpus
> make the analysis easier? That is, in the above example, only allow the
> G-EDF scheduler to run for 100 out of 1000 ms on both cpus.

Both [1,2] (and also [4]) assume a more general model than one based on the worst case for all
CPUs; the analysis (based on these papers) will therefore likely be more pessimistic, but not
necessarily easier.

- Andrea

[4] E. Bini, M. Bertogna, S. Baruah, Virtual Multiprocessor Platforms: Specification and Use, in:
Proceedings of the 30th IEEE Real-Time Systems Symposium, pp. 437-446, 2009.

--
Andrea Bastoni
Visiting Ph.D. Student
Dept. of Computer Science
University of North Carolina at Chapel Hill
http://www.sprg.uniroma2.it/home/bastoni/

2010-08-04 04:38:27

by Andrea Bastoni

Subject: Re: periods and deadlines in SCHED_DEADLINE

On 08/03/2010 05:41 AM, Peter Zijlstra wrote:
> On Sun, 2010-07-11 at 08:42 +0200, Bjoern Brandenburg wrote:
>>
>> If you want to do G-EDF with limited and different budgets on each CPU
>> (e.g., G-EDF tasks may only run for 100 out of 1000 ms on CPU 0, but
>> for 400 out of 1000 ms on CPU 1), then you are entering the domain of
>> restricted-supply scheduling, which is significantly more complicated
>> to analyze (see [1,2]).
>
> Without having looked at the refs, won't the soft case still have
> bounded tardiness? Since the boundedness property mostly depends on
> u<=1, that is, as long as we can always run everything within the
> available time we won't start drifting.

Yes, the soft case will still have bounded tardiness (see [2]), although the reason is more
related to the fact that priorities are defined by deadlines than to U<=1.

Anyway, both hard and soft real-time cases become very difficult to analyze if limited/different
budgets are allowed on each CPU.

>> As far as I know there is no existing analysis for "almost G-EDF",
>> i.e., the case where each task may only migrate among a subset of the
>> processors (= affinity masks), except for the special case of
>> clustered EDF (C-EDF), wherein the subsets of processors are
>> non-overlapping.
>
> Right, affinity masks are a pain, hence I proposed to limit that to
> either 1 cpu (yielding fully partitioned) or the full cluster.

I'm not sure I get what you mean by "full cluster". With G-EDF-like scheduling policies it only
makes sense to cluster cores around some memory level (cache Lx, NUMA node...), as the idea is
to reduce the cost of a task migration among cores. Depending on the workload, a higher (lower)
level of clustering may perform better.

A "full cluster" therefore should be created around some memory level. But if a socket has, for
example, two levels of cache (L2 + L3), and a "full cluster" forces selecting all cores in the
socket (first hierarchy level in cpusets), we miss the possibility of clustering the cores that
share the L2 (and that configuration may give better performance). In these clusters the
processors are _non-overlapping_.

Instead, if you want to use cpusets + affinity to define possibly _overlapping_ clusters (or
containers, or servers) to support different budgets on each CPU (something similar to cgroup,
see [1,3]), forcing only two configurations (single cpu/full cluster) may be restrictive.

Thanks,

- Andrea

[3] H. Leontyev and J. Anderson, "A Hierarchical Multiprocessor Bandwidth Reservation Scheme
with Timing Guarantees", Real-Time Systems, Volume 43, Number 1, pp. 60-92, September 2009.
http://www.cs.unc.edu/~anderson/papers/rtj09b.pdf

--
Andrea Bastoni
Visiting Ph.D. Student
Dept. of Computer Science
University of North Carolina at Chapel Hill
http://www.sprg.uniroma2.it/home/bastoni/

2010-08-04 05:02:21

by Bjoern Brandenburg

Subject: Re: periods and deadlines in SCHED_DEADLINE


On Aug 3, 2010, at 5:46 AM, Peter Zijlstra <[email protected]> wrote:

> On Sun, 2010-07-11 at 08:42 +0200, Bjoern Brandenburg wrote:
>> If you want to do G-EDF with limited and different budgets on each CPU
>> (e.g., G-EDF tasks may only run for 100 out of 1000 ms on CPU 0, but
>> for 400 out of 1000 ms on CPU 1), then you are entering the domain of
>> restricted-supply scheduling, which is significantly more complicated
>> to analyze (see [1,2]).
>
> Would making the thing homogenious by assuming the worst for all cpus
> make the analysis easier? That is, in the above example, only allow the
> G-EDF scheduler to run for 100 out of 1000 ms on both cpus.

It would, but that severely limits your non-G-EDF tasks. What if you want to provision a P-EDF task with a period of ten milliseconds? If you allow a G-EDF slice of guaranteed 100 ms, then your P-EDF task might miss a deadline if it arrives at the wrong time. If you let it preempt the G-EDF slice, then you get "out of sync" and the analysis can become tricky. If you scale down the G-EDF time slice length to less than ten ms, then overheads increase needlessly on all processors. EDF-HSB addresses specifically this problem (for the case where G-EDF is only used for soft tasks):

B. Brandenburg and J. Anderson, “Integrating Hard/Soft Real-Time Tasks and Best-Effort Jobs on Multiprocessors”, Proceedings of the 19th Euromicro Conference on Real-Time Systems (ECRTS 2007), pp. 61-70. IEEE, July 2007. http://www.cs.unc.edu/~anderson/papers/ecrts07b.pdf

- Björn-

2010-08-04 05:18:00

by Bjoern Brandenburg

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Aug 3, 2010, at 5:41 AM, Peter Zijlstra <[email protected]> wrote:

> On Sun, 2010-07-11 at 08:42 +0200, Bjoern Brandenburg wrote:
>>
>> If you want to do G-EDF with limited and different budgets on each CPU
>> (e.g., G-EDF tasks may only run for 100 out of 1000 ms on CPU 0, but
>> for 400 out of 1000 ms on CPU 1), then you are entering the domain of
>> restricted-supply scheduling, which is significantly more complicated
>> to analyze (see [1,2]).
>
> Without having looked at the refs, won't the soft case still have
> bounded tardiness? Since the boundedness property mostly depends on
> u<=1, that is, as long as we can always run everything within the
> available time we won't start drifting.

No, not in all cases. Suppose you have a soft task with a utilization of 0.2 and two G-EDF reservations of utilization 0.15 each (the rest of the CPU time is allocated to partitioned hard tasks), reservation periods and task period are equal, and no CPU is overutilized. If the reservations become available at different times during each period, then the soft task has bounded tardiness, but if they become available in parallel, then tardiness can grow unboundedly because it can only execute on one CPU at a time.
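The counterexample can be checked numerically with a small sketch (illustrative per-period accounting only, not a full schedulability analysis): a task needing 2 ms per 10 ms period (utilization 0.2) served by two reservations supplying 1.5 ms per period each (utilization 0.15). Since the task can use only one CPU at a time, parallel supply caps its usable service at 1.5 ms per period:

```c
/* Track the task's unfinished work ("backlog") across periods.
 * parallel == 1: both reservations supply time in the same window,
 * so only the larger one is usable; parallel == 0: the windows are
 * disjoint and both supplies can be consumed back-to-back. */
static double backlog_after(int periods, double demand,
			    double supply_a, double supply_b, int parallel)
{
	double backlog = 0.0;

	for (int k = 0; k < periods; k++) {
		double usable = parallel
			? (supply_a > supply_b ? supply_a : supply_b)
			: supply_a + supply_b;

		backlog += demand;	/* new job released */
		backlog -= usable;	/* service received this period */
		if (backlog < 0.0)
			backlog = 0.0;	/* cannot run ahead of releases */
	}
	return backlog;
}
```

With disjoint windows the backlog stays at zero; with parallel windows it grows by 0.5 ms every period, i.e., tardiness is unbounded even though no CPU is overutilized.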

Hennadiy Leontyev has analyzed this extensively in the bounded-tardiness context and provides some counterexamples in his dissertation. For hard real-time, the MPR papers that Andrea pointed out are the state of the art afaik. (I stand by my opinion that global hard real-time is less pressing for the Linux kernel.)

>
>> As far as I know there is no existing analysis for "almost G-EDF",
>> i.e., the case where each task may only migrate among a subset of the
>> processors (= affinity masks), except for the special case of
>> clustered EDF (C-EDF), wherein the subsets of processors are
>> non-overlapping.
>
> Right, affinity masks are a pain, hence I proposed to limit that to
> either 1 cpu (yielding fully partitioned) or the full cluster.

Yes, I agree.

> That will leave us with only having to stack a partitioned and global
> scheduler on top of each other, and per the previous point, I think that
> ought to work out trivially for soft, hard otoh will get 'interesting'.

As I mentioned in my other reply, this is exactly what EDF-HSB (http://www.cs.unc.edu/~anderson/papers/ecrts07b.pdf) does.

- Björn

2010-08-04 05:34:43

by Bjoern Brandenburg

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Aug 2, 2010, at 3:34 PM, Peter Zijlstra <[email protected]> wrote:

> On Sun, 2010-07-11 at 09:32 +0200, Bjoern Brandenburg wrote:
>
>> Trying to infer whether a task is "hard" or "soft" from task
>> parameters is not a good idea, IMO. It's much better to make this an
>> explicit part of the task model that is configured via sched_setparam.
>> By default, tasks should be marked "soft" (this leaves more wiggle
>> room to the kernel); users who care can change the flag to "hard".
>
> I think we're in violent agreement here ;-) and I was convinced that was
> what we were talking about. The question was only how to represent that
> in the sched_param_ex structure, the options were:
>
> struct sched_param_ex params;
>
> params.flags |= SF_SOFT;
> sched_setscheduler_ex( .policy = SCHED_DEADLINE, .param = &params);
>
> vs
>
> sched_setscheduler_ex( .policy = SCHED_DEADLINE_{SOFT,HARD},
> .param = &params);

From my point of view I don't really see a difference -- the first approach probably lends itself better to adding additional variants, but makes it harder to tweak a particular feature once it's used in userspace.

>
>> Taking a step back, I think the problem here is that we are trying to
>> shove too many concepts and properties into a single scheduler. Hard
>> (no tardiness) is different from soft (bounded tardiness) is different
>> from global is different from partitioned.
>>
>> From my point of view, it makes no sense to support hard deadlines
>> under G-EDF (this is backed up by our schedulability studies [1]).
>> Hard deadlines are best served by a P-EDF implementation (that only
>> migrates on task creation/admission).
>>
> The problem is more that we need to support things like cpu affinity and
> cpusets within the context of a 'global' scheduler.
>
> Using cpusets we can partition the 'load-balancer' and create clusters
> (root-domains in linux scheduler speak).
>
> Using cpu affinity we can limit tasks to a subset of their cluster's
> cpus.

Neither makes much sense in the context of a truly global scheduler. Creating non-overlapping clusters makes sense to create clustered schedulers.

>
> Esp. the latter is very hard to do, and I think we can get away with
> only allowing a single cpu or the full cluster (its a new policy, so
> there is no existing userspace to break).

It is probably a good idea to keep it as limited as possible in the beginning. There is no supporting theory afaik for overlapping clusters or non-uniform CPU affinities (i.e., nobody's looked at a case in which CPU affinities can vary from task to task).

>
> This ends up meaning we need to support both P-EDF and G-EDF for soft,
> and since we want to re-use pretty much all the code and only have a
> different admission test for hard (initially), it would end up also
> being P/G-EDF for hard

Couldn't you simply reject tasks that call setscheduler() with a combination of 'hard' and a CPU affinity mask with more than one CPU allowed?

> (even though as you rightly point out, hard G-EDF
> is pretty pointless -- but since the policy doesn't promise EDF, we
> could later improve it to be PD^2 or whatever, at which point global
> hard does start to make sense).
>
> (which I guess would suggest we use different policies instead of a
> flag, since that would make most sense if we end up replacing the hard
> part with another policy)

I don't think the implemented policy is just a hidden implementation detail that can be transparently changed in a future kernel version. Sure, other sporadic schedulers exist, but a lot of things in real-time systems depend on the scheduling policy (e.g., retry bounds for lock-free data structures, locking protocols, overhead accounting, etc.) so that changing it without revalidating/testing everything is not really an option. (At least not for applications that could be considered 'almost hard real-time').

>
> So what I want to have is a sporadic task scheduler, not an EDF
> scheduler (hence also the request to s/SCHED_EDF/SCHED_DEADLINE/ --
> avoiding the obvious SCHED_SPORADIC in order to avoid confusion with the
> POSIX thing).
>
> EDF is just the easiest of the many different ways to schedule a
> sporadic task set.

In theory I fully agree, but I strongly suspect that applications are going to assume a specific implementation anyway in practice. Changing the implemented scheduling policy for sporadic tasks would essentially force all existing embedded apps that make use of this API back into testing, which likely means that newer kernel versions could not be adopted for older projects (which may be the case anyway). In light of this, it may not be unreasonable to promise a specific policy.

However, a generic SCHED_DEADLINE does not preclude the option of supporting specific policies later, so this might not be a big deal in the end.


- Björn-

2010-08-04 05:55:30

by Bjoern Brandenburg

Subject: Re: periods and deadlines in SCHED_DEADLINE

On Jul 12, 2010, at 6:21 AM, Harald Gustafsson <[email protected]> wrote:

> 2010/7/11 Bjoern Brandenburg <[email protected]>:
>> Trying to infer whether a task is "hard" or "soft" from task parameters is not a good idea, IMO. It's much better to make this an explicit part of the task model that is configured via sched_setparam. By default, tasks should be marked "soft" (this leaves more wiggle room to the kernel); users who care can change the flag to "hard".
>
> Yes, this was along the lines of my suggestions also.
>
>> Taking a step back, I think the problem here is that we are trying to shove too many concepts and properties into a single scheduler. Hard (no tardiness) is different from soft (bounded tardiness) is different from global is different from partitioned.
>>
>> From my point of view, it makes no sense to support hard deadlines under G-EDF (this is backed up by our schedulability studies [1]). Hard deadlines are best served by a P-EDF implementation (that only migrates on task creation/admission).
>>
>> Also, it makes little sense to worry about supporting both hard and soft in a P-EDF implementation. Just schedule everything by its deadline and both hard and soft tasks will be fine (if the admissions test was passed).
>>
>> In my opinion, it is much more useful to have a two-level scheduling framework: a truly partitioned EDF implementation at the top level to support "hard" tasks, and a truly global EDF implementation at the second level for "soft" tasks, with the property that the second level implementation is only invoked if the first level is idle. This nicely matches Linux's scheduling class hierarchy.
>
> Would this prevent soft RT tasks from being pinned to a specific CPU,
> i.e. having an affinity set of one CPU? I'm asking since we would like
> to use soft deadline RT (bounded tardiness, e.g. with relative
> deadlines shorter than periods), but with threads set to run on a
> specific CPU. This is because we do an analysis of data dependencies
> etc. in user space applications (or offline) and distribute jobs among
> threads on each CPU. At the same time we would like isolation between
> such parallel-running applications and other CPU activities.

I think this is asking the wrong question. In the context of a global scheduler, it doesn't make sense to pin threads to individual CPUs. However, nobody is forcing you to use a global scheduler—just use partitioned EDF for your soft tasks, too, and you get bounded tardiness (the bound happens to be zero for implicit-deadline tasks). The appeal of using a global scheduler at all is that some workloads have bounded tardiness under global scheduling but not under partitioned scheduling, whereas the reverse is not true.

>
> I'm not saying that all soft RT needs to be pinned or that some such
> tasks can't be made to meet hard admission control, but we would need
> the possibility. If I had to prioritize between soft deadline RT and
> CPU affinity, soft deadline RT would win, since it is still quite
> likely that the tasks get scheduled on individual cores when needed.
>
> When you say truly global EDF implementation does that mean the
> scheduler only considers relative deadline and may "migrate" a thread
> among all the CPUs at each scheduling instance, disregarding cache
> effects, power management (e.g. waking an idle core), etc?

Global EDF does not consider cache affinity (but a reasonable implementation will avoid needless migrations). Applications that require strong cache affinity are better served by partitioned schedulers if it is possible to partition the workload.

Power management is likely an area that will require some additional thought. However, at ECRTS, Thomas was strongly advocating race-to-idle, and this is what G-EDF does anyway: use all cores to get everything out of the way, asap.

> Or can
> these things be factored into the relative deadlines (maybe they
> already are by the user)?

I'm not sure what you mean by that. Relative deadlines represent an application's timeliness requirements and nothing else. I guess the user could tweak parameters to force certain scheduling decisions, but I wouldn't recommend encouraging that in the kernel API.

- Björn



2010-08-04 06:30:26

by Bjoern Brandenburg

[permalink] [raw]
Subject: Re: periods and deadlines in SCHED_DEADLINE

On Aug 3, 2010, at 4:16 AM, Peter Zijlstra <[email protected]> wrote:

> On Sun, 2010-07-11 at 08:46 +0200, Bjoern Brandenburg wrote:
>> I'd be hesitant to just assume that it "approximates G-EDF"
>> sufficiently well to apply any of the published G-EDF tests.
>
> OK, suppose that for each cpu we keep the earliest and next-earliest
> deadline in a table. Then on wakeup (job release) we pick the cpu with
> the currently last deadline to preempt (we push the task).
>
> On sleep (job completion) we look for the earliest among all
> next-earliest deadlines to select the next runnable task (we pull the
> task).
>
> If we serialize all this using one big lock around this [ {earliest,
> next-earliest} ] table, we've basically implemented G-EDF, agreed?

Yes, agreed. (Assuming that the next-earliest field is always kept up-to-date by finding the next-earliest when the task is pulled.)

>
> Now replace that global lock with an algorithm that looks at the table,
> finds the last-earliest or earliest-next-earliest in a lock-less
> fashion, then locks the target cpu's rq->lock, verifies the result and
> either continues or tries again.

Can this lead to tasks bouncing back-and-forth? Under a strict interpretation of G-EDF, each job arrival should cause at most one migration. Can you bound the maximum number of times that the retry-loop is taken per scheduling decision? Can you prove that the lock-less traversal of the table yields a consistent snapshot, or is it possible to accidentally miss a priority inversion due to concurrent job arrivals?

In practice, repeated retries are probably not much of a problem, but not having a firm bound would violate strict validation rules (you can't prove it terminates), and would also violate academic real-time rules (again, you ought to be able to prove it correct). I realize that these rules may not be something that has a high priority for Linux, but on the other hand some properties such as the max number of migrations may be implicitly assumed in schedulability tests.

I'm not saying that the proposed implementation is not compatible with published analysis, but I'd be cautious to simply assume that it is. Some of the questions that were raised in this thread make it sound like the border between global and partitioned isn't clearly drawn in the implementation yet (e.g., handling of proc affinity masks), so my opinion may change when the code stabilizes. (This isn't meant as a criticism of Dario et al.'s good work; this is just something very hard to get right, and especially so on the first attempt.)

> So we replace the global lock with cmpxchg like loops using 2 per-cpu
> locks. Our current SCHED_FIFO balancer does just this and is found to be
> a very good approximation of global-fifo (iirc there's one funny case,
> but I can't remember, maybe Steve or Gregory can remember the details).

Going back to Dario's original comments, when combined with proc. affinities/partitioning you'd either have to move budget allocations from CPU to CPU or track a global utilization sum for admission test purposes.

- Björn

2010-08-04 07:14:31

by Peter Zijlstra

[permalink] [raw]
Subject: Re: periods and deadlines in SCHED_DEADLINE

On Tue, 2010-08-03 at 23:52 -0400, Andrea Bastoni wrote:
> Instead, if you want to use the cpuset + affinity to define possibly _overlapping_ clusters (or
> containers, or servers) to support different budgets on each CPU (something similar to cgroup,
> see [1,3]), forcing only two configuration (single cpu/full cluster) may be restrictive.

cpusets doesn't allow overlapping load-balance domains as it stands
today. In that case it would end up being a single large domain.

With cpu affinity we can of course create whatever we want, hence my
suggestion to limit allowed affinity masks to 1 cpu or the full
load-balance domain.