2008-10-28 16:13:53

by Henrik Austad

Subject: Rearranging layout of code in the scheduler

Hello,

Before I dive in, I should probably justify my motivations for writing
this email. I'm working away on implementing an EDF scheduler for real
time tasks in the kernel. This again leads to hacking at the existing
source as I'm not about to toss out the entire scheduler - just replace
(by some Kconfig switch) the RR/FIFO classes. As to why I'm looking at
EDF, I think the answer to that is a bit too long (and not appropriate
for this email anyway) so I'll leave that part out.

However, what I do mean to discuss is the current state of the scheduler
code. Working through the code, I must say I'm really impressed. The
code is clean, it is well thought out and the new approach with
sched_class and sched_entity makes it very modular. However, digging
deeper, I find myself turning more and more desperate, looking over my
shoulder for a way out.

Now, I'm in no doubt that the code *is* modular, that it *is* clean and
tidy, but coming from outside it is not that easy to grasp it all. And it
is not just the sheer size and complexity of the scheduler itself, but
also a lot to do with how the code is arranged.

For instance, functions like free_fair_sched_group,
alloc_fair_sched_group etc. belong, IMHO, in sched_fair.c and not in
sched.c. The same goes for several rt functions and structs.

So, if one drew up a list of all events that would cause functions in
sched.c to be called, this could be used to make a minimal 'interface',
and then let the scheduler call the appropriate function for the given
class in the same fashion sched_tick is used today.

What I would like is to rip out all the *actual* scheduling logic, put
it in sched_[fair|rt].c, and let sched.c be purely event-driven
(which would be a nice design goal in itself). This would also lead to
sched_[fair|rt].h, where the structs, macros, defines etc. can be
defined. Today these are defined just about everywhere, making the
code unnecessarily complicated (I'm not going to say messy since I'm not
*that* senior to kernel coding :-))

Why not use the sched_class for all that it's worth - make the different
classes implement a set of functions and let sched.c be oblivious to
what's going on other than turning the machinery around?
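The "oblivious core" idea could be sketched as an ops table per class that the core walks in priority order. This is a hypothetical illustration, not the actual kernel sched_class layout; all names here are made up:

```c
#include <stddef.h>

/* Hypothetical sketch: each class fills in an ops table, and the
 * core scheduler only turns the machinery around.  Names are
 * illustrative, not the real kernel API. */
struct task { int id; };
struct rq { struct task *fair_head; struct task *rt_head; };

struct sched_ops {
    struct task *(*pick_next)(struct rq *rq);
};

static struct task *pick_rt(struct rq *rq)   { return rq->rt_head; }
static struct task *pick_fair(struct rq *rq) { return rq->fair_head; }

static const struct sched_ops rt_ops   = { pick_rt };
static const struct sched_ops fair_ops = { pick_fair };

/* The core is oblivious: walk the classes in priority order and
 * take the first runnable task any class offers. */
static struct task *core_pick_next(struct rq *rq,
                                   const struct sched_ops *classes[],
                                   size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct task *t = classes[i]->pick_next(rq);
        if (t)
            return t;
    }
    return NULL; /* idle */
}
```

With this shape, swapping RR/FIFO for EDF would mean swapping one ops table in the array, with sched.c untouched.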

Is this something worth pursuing? I mean, the scheduler *does* work, and
if it ain't broken, don't fix it. However, I have a strong feeling that
this can be done a lot cleaner, not to mention, changing from one type
of scheduler to another will be much easier. :-)

--
med Vennlig Hilsen - Yours Sincerely
Henrik Austad



2008-10-28 16:50:55

by Peter Zijlstra

Subject: Re: Rearranging layout of code in the scheduler

On Tue, 2008-10-28 at 16:34 +0100, Henrik Austad wrote:
> Hello,
>
> Before I dive in, I should probably justify my motivations for writing
> this email. I'm working away on implementing an EDF scheduler for real
> time tasks in the kernel. This again leads to hacking at the existing
> source as I'm not about to toss out the entire scheduler - just replace
> (by some Kconfig switch) the RR/FIFO classes. As to why I'm looking at
> EDF, I think the answer to that is a bit too long (and not appropriate
> for this email anyway) so I'll leave that part out.

You and a few other folks. The most interesting part of EDF is not the
actual scheduler itself (although there are fun issues with that as
well), but extending the Priority Inheritance framework to deal with all
the fun cases that come with EDF.

> However, what I do mean to discuss is the current state of the scheduler
> code. Working through the code, I must say I'm really impressed. The
> code is clean, it is well thought out and the new approach with
> sched_class and sched_entity makes it very modular. However, digging
> deeper, I find myself turning more and more desperate, looking over my
> shoulder for a way out.
>
> Now, I'm in no doubt that the code *is* modular, that it *is* clean and
> tidy, but coming from outside, it is not that easy to grasp it all. And,
> it is not just the sheer size and complexity of the scheduler itself,
> but also a lot with how the code is arranged.
>
> For instance, functions like free_fair_sched_group,
> alloc_fair_sched_group etc - does IMHO belong in sched_fair.c and not in
> sched.c. The same goes for several rt-functions and structs.
>
> So, if one drew up a list over all events that would cause functions in
> sched.c to be called, this could be used to make a minimized 'interface'
> and then let the scheduler call the appropriate function for the given
> class in the same fashion sched_tick is used today.

I'd start out small by moving the functions to the right file. After
that you could look at providing methods in the sched_class.

> What I would like, is to rip out all the *actual* scheduling logic and
> put this in sched_[fair|rt].c and let sched be purely event-driven
> (which would be a nice design goal in itself). This would also lead to
> sched_[fair|rt].h, where the structs, macros, defines etc can be
> defined. Today these are defined in just about everywhere, making the
> code unnecessary complicated (I'm not going to say messy since I'm not
> *that* senior to kernel coding :-))

You might need to be careful there, or introduce sched_(fair|rt).h for
those.

> Why not use the sched_class for all that it's worth - make the different
> classes implement a set of functions and let sched.c be oblivious to
> what's going on other than turning the machinery around?

Sounds good, it's been on the agenda for a while, but nobody ever got
around to it.

Other cleanups that can be done are:
 - get rid of all the load balance iterator stuff and move
   that all into sched_fair

 - extract the common sched_entity members and create:

   struct {
           struct sched_entity_common common;
           union {
                   struct sched_entity fair;
                   struct sched_rt_entity rt;
           }
   }

> Is this something worth pursuing? I mean, the scheduler *does* work, and
> if it ain't broken, don't fix it. However, I have a strong feeling that
> this can be done a lot cleaner, not to mention, changing from one type
> of scheduler to another will be much easier. :-)

Well, adding a sched_class, no need to replace anything besides that.


2008-10-28 19:42:09

by Henrik Austad

Subject: Re: Rearranging layout of code in the scheduler

On Tuesday 28 October 2008 17:50:27 Peter Zijlstra wrote:
> On Tue, 2008-10-28 at 16:34 +0100, Henrik Austad wrote:
> > Hello,
> >
> > Before I dive in, I should probably justify my motivations for writing
> > this email. I'm working away on implementing an EDF scheduler for real
> > time tasks in the kernel. This again leads to hacking at the existing
> > source as I'm not about to toss out the entire scheduler - just replace
> > (by some Kconfig switch) the RR/FIFO classes. As to why I'm looking at
> > EDF, I think the answer to that is a bit too long (and not appropriate
> > for this email anyway) so I'll leave that part out.
>
> You and a few other folks. The most interesting part of EDF is not the
> actual scheduler itself (although there are fun issues with that as
> well), but extending the Priority Inheritance framework to deal with all
> the fun cases that come with EDF.

well, yes, you trade priority inversion for deadline inversion. Probably a
few other exotic issues as well :-)

> > However, what I do mean to discuss is the current state of the scheduler
> > code. Working through the code, I must say I'm really impressed. The
> > code is clean, it is well thought out and the new approach with
> > sched_class and sched_entity makes it very modular. However, digging
> > deeper, I find myself turning more and more desperate, looking over my
> > shoulder for a way out.
> >
> > Now, I'm in no doubt that the code *is* modular, that it *is* clean and
> > tidy, but coming from outside, it is not that easy to grasp it all. And,
> > it is not just the sheer size and complexity of the scheduler itself,
> > but also a lot with how the code is arranged.
> >
> > For instance, functions like free_fair_sched_group,
> > alloc_fair_sched_group etc - does IMHO belong in sched_fair.c and not in
> > sched.c. The same goes for several rt-functions and structs.
> >
> > So, if one drew up a list over all events that would cause functions in
> > sched.c to be called, this could be used to make a minimized 'interface'
> > and then let the scheduler call the appropriate function for the given
> > class in the same fashion sched_tick is used today.
>
> I'd start out small by moving the functions to the right file. After
> that you could look at providing methods in the sched_class.

Yes, I was thinking something along those lines, slowly moving things out. I
just wanted to clear the air before I got started in case someone had some
real issues with this approach. After all, if it's never going to get merged,
there's no point doing it.

> > What I would like, is to rip out all the *actual* scheduling logic and
> > put this in sched_[fair|rt].c and let sched be purely event-driven
> > (which would be a nice design goal in itself). This would also lead to
> > sched_[fair|rt].h, where the structs, macros, defines etc can be
> > defined. Today these are defined in just about everywhere, making the
> > code unnecessary complicated (I'm not going to say messy since I'm not
> > *that* senior to kernel coding :-))
>
> You might need to be careful there, or introduce sched_(fair|rt).h for
> those.

This is a final goal, the light at the end of the tunnel if you like.

> > Why not use the sched_class for all that it's worth - make the different
> > classes implement a set of functions and let sched.c be oblivious to
> > what's going on other than turning the machinery around?
>
> Sounds good, its been on the agenda for a while, but nobody ever got
> around to it.
>
> Other cleanups that can be done are:
>  - get rid of all the load balance iterator stuff and move
>    that all into sched_fair

Ah, yes. I can give that a go.

> - extract the common sched_entity members and create:
>
>   struct {
>           struct sched_entity_common common;
>           union {
>                   struct sched_entity fair;
>                   struct sched_rt_entity rt;
>           }
>   }
>
> > Is this something worth pursuing? I mean, the scheduler *does* work, and
> > if it ain't broken, don't fix it. However, I have a strong feeling that
> > this can be done a lot cleaner, not to mention, changing from one type
> > of scheduler to another will be much easier. :-)
>
> Well, adding a sched_class, no need to replace anything besides that.

I wasn't really aiming at replacing anything (besides the rt-stuff, but as
said earlier, that's not the issue now). I just want to move things around
to make it a bit clearer.

--
med Vennlig Hilsen - Yours Sincerely
Henrik Austad



2008-10-30 17:17:22

by Peter Zijlstra

Subject: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

On Thu, 2008-10-30 at 17:49 +0100, [email protected] wrote:
> Quoting Peter Zijlstra <[email protected]>:
> >> Before I dive in, I should probably justify my motivations for writing
> >> this email. I'm working away on implementing an EDF scheduler for real
> >> time tasks in the kernel. This again leads to hacking at the existing
> >> source as I'm not about to toss out the entire scheduler - just replace
> >> (by some Kconfig switch) the RR/FIFO classes. As to why I'm looking at
> >> EDF, I think the answer to that is a bit too long (and not appropriate
> >> for this email anyway) so I'll leave that part out.
> Well, I understand that, but it could be interesting... At least to me. :-)
>
> > You and a few other folks.
> Yes, here we are! :-)
>
> We also have some code, but it still is highly experimental and we are
> deeply rearranging it.
>
> > The most interesting part of EDF is not the
> > actual scheduler itself (although there are fun issues with that as
> > well), but extending the Priority Inheritance framework to deal with all
> > the fun cases that come with EDF.
> The main problem is that, especially to deal with SMP systems, we also
> need to investigate theoretical issues and find out what the best
> approach could be.
>
> > Well, adding a sched_class, no need to replace anything besides that.
> >
> I'm not saying anything in possible sched.c and sched_{fair|rt}.c code
> rearranging, I also only wonder why replacing fixed priority RT
> scheduling with EDF.
>
> I think they both could be in... Maybe we can discuss on where, I mean
> on which position, in the linked list of scheduling classes, to put
> each of them.

Right, ideally I'd like to see 2 EDF classes on top of FIFO, so that we
end up with the following classes

  hedf    - hard EDF
  sedf    - soft EDF (bounded tardiness)
  fifo/rr - the current static priority scheduler
  fair    - the current proportional fair scheduler
  idle    - the idle scheduler

The two edf classes must share some state, so that the sedf class knows
about the utilisation consumed by hedf, and the main difference between
these two classes is the schedulability test.
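That shared state could be as small as a pair of utilization counters consulted by both admission tests. A sketch, with placeholder bounds (the 0.5 hard limit is only an example figure, and all names are illustrative):

```c
#include <stdbool.h>

/* Sketch of the shared-state idea: hedf and sedf differ mainly in
 * their admission test, and sedf accounts for the utilization
 * already reserved by hedf.  Bounds and names are illustrative. */
struct dl_shared {
    double hedf_util;      /* utilization reserved by hard EDF */
    double sedf_util;      /* utilization used by soft EDF */
};

static bool hedf_admit(struct dl_shared *s, double u)
{
    if (s->hedf_util + u > 0.5)        /* example hard bound */
        return false;
    s->hedf_util += u;
    return true;
}

static bool sedf_admit(struct dl_shared *s, double u)
{
    /* sedf sees what hedf consumed and may fill up to 100% */
    if (s->hedf_util + s->sedf_util + u > 1.0)
        return false;
    s->sedf_util += u;
    return true;
}
```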

[ NOTE: read EDF as any deadline scheduler, so it might as well be a
pfair PD^2 scheduler. ]

The few problems this gives are things like kstopmachine and the
migration threads, which should run at the max priority available on the
system.

[ NOTE: although possibly we could make an exception for the migration
threads, as we generally don't need to migrate running RT tasks]

Perhaps we can introduce another class on top of hedf which will run
just these two tasks and is not exposed to userspace (yes, I understand
it will ruin just about any schedulability analysis).

Which leaves us with the big issue of priority inversion ;-)

We can do deadline inheritance and bandwidth inheritance by changing
plist to a rb-tree/binary heap and mapping the static priority levels
somewhere at the back and also propagating the actual task responsible
for the boost down the chain (so as to be able to do bandwidth
inheritance).
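One way to picture the "static levels at the back" mapping is a single 64-bit sort key where any absolute deadline orders before any static priority. This encoding is purely illustrative, not a proposed kernel interface:

```c
#include <stdbool.h>

/* Illustrative key mapping for a PI rb-tree/heap: deadlines occupy
 * the low end of the key space, and the static priority levels are
 * mapped "at the back".  Not a proposed ABI. */
typedef unsigned long long u64;

#define STATIC_PRIO_BASE (~0ULL - 200)   /* top of the key space */

static u64 pi_key_deadline(u64 abs_deadline)
{
    /* clamp so deadlines never collide with the static range */
    return abs_deadline < STATIC_PRIO_BASE ? abs_deadline
                                           : STATIC_PRIO_BASE - 1;
}

static u64 pi_key_static(int prio)       /* 0 = highest static prio */
{
    return STATIC_PRIO_BASE + (u64)prio;
}

/* smaller key == boosted first: any deadline beats any static prio */
static bool pi_key_before(u64 a, u64 b) { return a < b; }
```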

From what I gather the sssup folks are doing that, although they
reported that DI between disjoint schedule domains (partitions) posed an
interesting problem.

Personally I'd like to see the full priority inversion issue solved by
something like the proxy execution protocol, however the SMP extension
thereof seems to be a tad expensive - found a book on graph theory, all
that remains is finding time to read it :-)

The advantage of proxy execution is that its fully invariant to the
schedule function and thus even works for proportional fair schedulers
and any kind of scheduler hierarchy.

2008-10-30 17:48:58

by Peter Zijlstra

Subject: Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

On Thu, 2008-10-30 at 18:17 +0100, Peter Zijlstra wrote:

> Personally I'd like to see the full priority inversion issue solved by
> something like the proxy execution protocol, however the SMP extension
> thereof seems to be a tad expensive - found a book on graph theory, all
> that remains is finding time to read it :-)
>
> The advantage of proxy execution is that its fully invariant to the
> schedule function and thus even works for proportional fair schedulers
> and any kind of scheduler hierarchy.

Basically the problem that I'm looking at is:

Given a directed acyclic graph G, with entry nodes E (those without an
inbound edge) and exit nodes X (those without an outbound edge), then,
given an exit node x of X, split the graph into G1 and G2 so that
G1 contains x and all paths leading to it.
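A sketch of one way to compute that split: G1 is exactly the set of nodes from which x is reachable, found by walking edges backwards from x. Adjacency-matrix representation and size bound are only for brevity of illustration:

```c
#include <stdbool.h>

/* Split a DAG at exit node x: mark every node that can reach x
 * (reverse reachability); the marked nodes form G1, the rest G2.
 * edge[u][v] means there is an edge u -> v. */
#define MAXN 16

static void mark_reaching(bool edge[MAXN][MAXN], int n, int v,
                          bool in_g1[MAXN])
{
    if (in_g1[v])
        return;
    in_g1[v] = true;
    for (int u = 0; u < n; u++)
        if (edge[u][v])           /* u reaches x via v */
            mark_reaching(edge, n, u, in_g1);
}

/* in_g1[i] ends up true exactly for the nodes of G1. */
static void split_graph(bool edge[MAXN][MAXN], int n, int x,
                        bool in_g1[MAXN])
{
    for (int i = 0; i < n; i++)
        in_g1[i] = false;
    mark_reaching(edge, n, x, in_g1);
}
```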

2008-10-30 17:49:47

by Dario Faggioli

Subject: Re: Rearranging layout of code in the scheduler

Quoting Peter Zijlstra <[email protected]>:
>> Before I dive in, I should probably justify my motivations for writing
>> this email. I'm working away on implementing an EDF scheduler for real
>> time tasks in the kernel. This again leads to hacking at the existing
>> source as I'm not about to toss out the entire scheduler - just replace
>> (by some Kconfig switch) the RR/FIFO classes. As to why I'm looking at
>> EDF, I think the answer to that is a bit too long (and not appropriate
>> for this email anyway) so I'll leave that part out.
Well, I understand that, but it could be interesting... At least to me. :-)

> You and a few other folks.
Yes, here we are! :-)

We also have some code, but it still is highly experimental and we are
deeply rearranging it.

> The most interesting part of EDF is not the
> actual scheduler itself (although there are fun issues with that as
> well), but extending the Priority Inheritance framework to deal with all
> the fun cases that come with EDF.
The main problem is that, especially to deal with SMP systems, we also
need to investigate theoretical issues and find out what the best
approach could be.

> Well, adding a sched_class, no need to replace anything besides that.
>
I'm not saying anything against the possible sched.c and sched_{fair|rt}.c
code rearranging; I only wonder why you would replace fixed priority RT
scheduling with EDF.

I think they both could be in... Maybe we can discuss where, I mean at
which position in the linked list of scheduling classes, to put each of
them.

Regards,
Dario

Thanks for the Cc, Peter. I also added Fabio and Michael who, you
know, are working on this thing. :-)


2008-10-30 21:45:34

by Henrik Austad

Subject: Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

On Thursday 30 October 2008 18:17:14 Peter Zijlstra wrote:
> On Thu, 2008-10-30 at 17:49 +0100, [email protected] wrote:
> > Quoting Peter Zijlstra <[email protected]>:
> > >> Before I dive in, I should probably justify my motivations for writing
> > >> this email. I'm working away on implementing an EDF scheduler for real
> > >> time tasks in the kernel. This again leads to hacking at the existing
> > >> source as I'm not about to toss out the entire scheduler - just
> > >> replace (by some Kconfig switch) the RR/FIFO classes. As to why I'm
> > >> looking at EDF, I think the answer to that is a bit too long (and not
> > >> appropriate for this email anyway) so I'll leave that part out.
> >
> > Well, I understand that, but it could be interesting... At least to me.
> > :-)

ok, simply put:
 * give each task a relative deadline (will probably introduce a new
   syscall, please don't shoot me).
 * when the task enters TASK_RUNNING, set the absolute deadline to
   time_now + rel_deadline.
 * insert the task in the rq, sorted by abs_deadline
 * pick the leftmost task and run it
 * when the task is done, pick the next task

that's it.

Then, of course, you have to add all the logic to make the thing work :)
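On UP, the steps above could be sketched like this, with a sorted singly-linked list standing in for the rb-tree a real implementation would use (all names illustrative):

```c
#include <stddef.h>

/* Minimal UP sketch of the EDF steps above.  The runqueue is a
 * singly-linked list kept sorted by absolute deadline; a real
 * implementation would use an rb-tree. */
typedef unsigned long long u64;

struct edf_task {
    u64 abs_deadline;            /* = now + rel_deadline at wakeup */
    struct edf_task *next;
};

/* On entering TASK_RUNNING: compute abs deadline, insert sorted. */
static void edf_enqueue(struct edf_task **rq, struct edf_task *t,
                        u64 now, u64 rel_deadline)
{
    t->abs_deadline = now + rel_deadline;
    while (*rq && (*rq)->abs_deadline <= t->abs_deadline)
        rq = &(*rq)->next;
    t->next = *rq;
    *rq = t;
}

/* Pick leftmost == earliest absolute deadline. */
static struct edf_task *edf_pick_next(struct edf_task *rq)
{
    return rq;
}
```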

> >
> > > You and a few other folks.
> >
> > Yes, here we are! :-)
> >
> > We also have some code, but it still is highly experimental and we are
> > deeply rearranging it.

I have a very clear idea about *what* the scheduler should do, the problem is
how to get it to work :-)

Things would be a lot easier if the code in the scheduler was a bit more
separated. I have started separating things and moving them into separate
files. I'll ship off a few patches for comments if that's interesting?

> >
> > > The most interesting part of EDF is not the
> > > actual scheduler itself (although there are fun issues with that as
> > > well), but extending the Priority Inheritance framework to deal with
> > > all the fun cases that come with EDF.

Well, I find EDF interesting because it is so blissfully simple. :-)

> > The main problem is that, especially to deal with SMP systems, we also
> > need to investigate theoretical issues and find out what the best
> > approach could be.

Yes, well, EDF is not optimal for SMP systems, only for single core. However,
you can make a pretty good attempt by assigning tasks to cores in a greedy
fashion (simply put the next task on the CPU with the lowest load).

As a further optimization, I guess you could do the whole sched-domain thing
to minimize the search space.
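The greedy placement described above might look like this, taking "load" as summed utilization (an illustrative sketch only):

```c
/* Greedy partitioning heuristic: place each new task on the CPU
 * with the lowest summed utilization.  Illustrative only. */
static int pick_cpu(const double load[], int ncpus)
{
    int best = 0;
    for (int cpu = 1; cpu < ncpus; cpu++)
        if (load[cpu] < load[best])
            best = cpu;
    return best;
}

/* Assign a task with utilization `util`; returns the chosen CPU. */
static int assign_task(double load[], int ncpus, double util)
{
    int cpu = pick_cpu(load, ncpus);
    load[cpu] += util;
    return cpu;
}
```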

> > > Well, adding a sched_class, no need to replace anything besides that.
> >
> > I'm not saying anything in possible sched.c and sched_{fair|rt}.c code
> > rearranging, I also only wonder why replacing fixed priority RT
> > scheduling with EDF.
> >
> > I think they both could be in... Maybe we can discuss on where, I mean
> > on which position, in the linked list of scheduling classes, to put
> > each of them.

No. You should have *either* FIFO/RR *or* EDF, not both at the same time. If
you absolutely require both, you should at least separate them on a per-core
basis. If you mix them, they need to be aware of each other in order to make
the right decision, and that is not good.

> Right, ideally I'd like to see 2 EDF classes on top of FIFO, so that we
> end up with the following classes
>
> hedf - hard EDF
> sedf - soft EDF (bounded tardiness)
> fifo/rr - the current static priority scheduler
> fair - the current proportional fair scheduler
> idle - the idle scheduler
>
> The two edf classes must share some state, so that the sedf class knows
> about the utilisation consumed by hedf, and the main difference between
> these two classes is the schedulability test.

Well.. why not just treat *all* RT-tasks as *either* FIFO/RR or EDF? Having
fifo and edf together will complicate things. And people looking for edf
will not use fifo/rr anyway (famous last words).

Furthermore, hard/firm/soft could be treated as one class, but misses should
be handled differently:
* Soft: nothing. Scheduled at best effort; when the deadline passes,
  prioritize other tasks to avoid a cascading effect of deadline misses.
* Firm: keep some statistics that the user can modify, and a possible
  event handler for when limits are exceeded.
* Hard: immediately call a user-registered function, or send a signal to
  notify the task.
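The three miss policies above could be sketched as one class with a per-task reaction to a missed deadline, along these lines (handler and signal plumbing stubbed out, all names hypothetical):

```c
#include <stddef.h>

/* Per-task reaction to a missed deadline: soft demotes, firm
 * accounts, hard notifies immediately.  Illustrative sketch. */
enum dl_policy { DL_SOFT, DL_FIRM, DL_HARD };

struct dl_stats { unsigned long misses; };

static int on_deadline_miss(enum dl_policy p, struct dl_stats *st,
                            void (*user_handler)(void))
{
    switch (p) {
    case DL_SOFT:               /* best effort: just deprioritize */
        return 0;
    case DL_FIRM:               /* account; a limit check could fire
                                 * an event handler here */
        st->misses++;
        return 0;
    case DL_HARD:               /* notify the task immediately */
        if (user_handler)
            user_handler();
        return -1;
    }
    return 0;
}
```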

> [ NOTE: read EDF as any deadline scheduler, so it might as well be
> pfair PD^2 scheduler. ]

Well, the nice thing about EDF is that it is provably optimal for any
feasible schedule; IOW, if all the tasks are schedulable, you can have 100%
utilization of the CPU.

On a multicore, EDF is not optimal, as rearranging the tasks becomes NP-hard
(basically a knapsack problem) for several knapsacks :-)

Besides that, EDF is the simplest, most brain-dead scheduler you can imagine.
Basically you add the deadline to the tasks, put them in a sorted list
and pick the leftmost task every time until it completes.

> The few problems this gives are things like kstopmachine and the
> migration threads, which should run at the max priority available on the
> system.
>
> [ NOTE: although possibly we could make an exception for the migration
> threads, as we generally don't need to migrate running RT tasks]
>
> Perhaps we can introduce another class on top of hedf which will run
> just these two tasks and is not exposed to userspace (yes, I understand
> it will ruin just about any schedulability analysis).
>
> Which leaves us with the big issue of priority inversion ;-)

Couldn't the above idea solve a bit of this? I have some papers on deadline
inheritance lying around somewhere; I can have a look at them, I think there
was a fairly elegant solution to some of these issues in one of them.

>
> We can do deadline inheritance and bandwidth inheritance by changing
> plist to a rb-tree/binary heap and mapping the static priority levels
> somewhere at the back and also propagating the actual task responsible
> for the boost down the chain (so as to be able to do bandwidth
> inheritance).

IMHO, you are complicating things unnecessarily. EDF is simple. Why not go
for a *very* basic system, and not offer things that will be very difficult
to guarantee?

> From what I gather the sssup folks are doing that, although they
> reported that DI between disjoint schedule domains (partitions) posed an
> interesting problem.
>
> Personally I'd like to see the full priority inversion issue solved by
> something like the proxy execution protocol, however the SMP extension
> thereof seems to be a tad expensive - found a book on graph theory, all
> that remains is finding time to read it :-)
>
> The advantage of proxy execution is that its fully invariant to the
> schedule function and thus even works for proportional fair schedulers
> and any kind of scheduler hierarchy.



--
-> henrik

2008-10-31 09:03:51

by Peter Zijlstra

Subject: Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

On Thu, 2008-10-30 at 22:44 +0100, Henrik Austad wrote:
> On Thursday 30 October 2008 18:17:14 Peter Zijlstra wrote:
> > On Thu, 2008-10-30 at 17:49 +0100, [email protected] wrote:
> > > Quoting Peter Zijlstra <[email protected]>:
> > > >> Before I dive in, I should probably justify my motivations for writing
> > > >> this email. I'm working away on implementing an EDF scheduler for real
> > > >> time tasks in the kernel. This again leads to hacking at the existing
> > > >> source as I'm not about to toss out the entire scheduler - just
> > > >> replace (by some Kconfig switch) the RR/FIFO classes. As to why I'm
> > > >> looking at EDF, I think the answer to that is a bit too long (and not
> > > >> appropriate for this email anyway) so I'll leave that part out.
> > >
> > > Well, I understand that, but it could be interesting... At least to me.
> > > :-)
>
> ok, simply put:
> * give each task a relative deadline (will probably introduce a new syscall,
> please don't shoot me).

We call that sys_sched_setscheduler2().

> * when the task enters TASK_RUNNING, set abosolute deadline to time_now +
> rel_deadline.
> * insert task in rq, sorted by abs_deadline
> * pick leftmost task and run it
> * when task is done, pick next task
>
> that's it.
>
> Then, of course, you have to add all the logic to make the thing work :)

Well, yes, I know how to do EDF, and it's trivially simple - on UP.
Deadline scheduling on SMP otoh is not.

> > >
> > > > You and a few other folks.
> > >
> > > Yes, here we are! :-)
> > >
> > > We also have some code, but it still is highly experimental and we are
> > > deeply rearranging it.
>
> I have a very clear idea about *what* the scheduler should do, the problem is
> how to get it to work :-)
>
> Things would be a lot easier if code in the scheduler was a bit 'more
> separated. I have started separating things and moving it to separate files.
> I'll ship off a few patches for comments if it's interesting?

Sure, but implementing an EDF class isn't really all that hard - esp if
you only want UP.

The real fun is in the PI stuff and schedulability tests on SMP.

> > >
> > > > The most interesting part of EDF is not the
> > > > actual scheduler itself (although there are fun issues with that as
> > > > well), but extending the Priority Inheritance framework to deal with
> > > > all the fun cases that come with EDF.
>
> Well, I find EDF intersting because it is so blissfully simple. :-)

Yes it is, until you do SMP :-)

> > > The main problem is that, especially to deal with SMP systems, we also
> > > need to investigate theoretical issues and find out what the best
> > > approach could be.
>
> Yes, well, EDF is not optimal for SMP systems, only for single core. However,
> you can do a pretty good attempt by assigning tasks to cores in a greedy
> fashion (simply put the next task at the CPU with the lowest load).
>
> As a further optimization, I guess you could do the whole sced-domain thing to
> minimize the search space.

The problem with greedy binpacking heuristics is that your schedulability
tests are out the window, making the whole thing useless.

> > > > Well, adding a sched_class, no need to replace anything besides that.
> > >
> > > I'm not saying anything in possible sched.c and sched_{fair|rt}.c code
> > > rearranging, I also only wonder why replacing fixed priority RT
> > > scheduling with EDF.
> > >
> > > I think they both could be in... Maybe we can discuss on where, I mean
> > > on which position, in the linked list of scheduling classes, to put
> > > each of them.
>
> No. You should have *either* FIFO/RR *or* EDF, not both at the same time. If
> you absolutely require both, you should at least separate them on a per-core
> basis. If you mix them, they need to be aware of the other in order to make
> the right descision, and that is not good.

We _have_ to have both. It's that simple.

POSIX mandates we have SCHED_FIFO/RR, there is tons and tons of
userspace that uses it, we cannot just replace it with a deadline
scheduler.

Also, start a linux-rt kernel and look at all the RT threads you have
(and that's only going to get worse when we go to threads per interrupt
handler instead of threads per interrupt source).

Who is going to assign useful and meaningful periods, deadlines and
execution times to all those threads?

So the only other option is to add a deadline scheduler and allow people
to use it.

When you do that, you'll see that you have to add it above the FIFO/RR
class because otherwise your schedulability is out the window again.

> > Right, ideally I'd like to see 2 EDF classes on top of FIFO, so that we
> > end up with the following classes
> >
> > hedf - hard EDF
> > sedf - soft EDF (bounded tardiness)
> > fifo/rr - the current static priority scheduler
> > fair - the current proportional fair scheduler
> > idle - the idle scheduler
> >
> > The two edf classes must share some state, so that the sedf class knows
> > about the utilisation consumed by hedf, and the main difference between
> > these two classes is the schedulability test.
>
> Well.. why not just treat *all* RT-tasks as *either* FIFO/RR or EDF? Having
> fifo and edf together will complicate things. And, people looking for edf,
> will not use fifo/rr anyway (famous last words).

errr, not so, see above. FIFO/RR are static priority schedulers, whereas
deadline schedulers are job-static or dynamic priority schedulers. You
cannot simply map FIFO onto them; you need more information, and who is
going to provide that?

Also, with the above mentioned requirement that we must have FIFO/RR,
there really is no choice.

> Furthermore, hard/firm/soft could be treated as one class, but it should be
> treated differently at missed deadlines.
> * Soft: nothing. Scheduled at best effort, when deadline passes, prioritize
> other tasks to avoid cascading effect of deadlinemissies
> * Firm: keep some statistics that the user can modify and a possible
> event-handler when limits are exceeded
> * Hard: immediatly call a user registred function, or send signal to notify
> task.

Thing is, you have to run hard tasks first, and schedule weaker forms
in their slack time, otherwise you cannot guarantee anything.

And the easiest way to do that slack-time stealing is with such a
hierarchy. Sure, you can share a lot of code etc., but you need separate
classes.

> > [ NOTE: read EDF as any deadline scheduler, so it might as well be
> > pfair PD^2 scheduler. ]
>
> Well, the nice thing about EDF, is that it is provable optimal for any
> feasible schedule, IOW, if all the tasks are schedulable, you can have 100%
> utilization of the CPU.

On UP - which is not interesting on a general purpose kernel that runs
on machines with up to 4096 CPUs.

> On a multicore, EDF is not optimal, as rearranging the tasks becomes NP-hard
> (basically a knapsack problem) for several backpacks :-)

That is partitioned EDF, or PEDF for short. There are many other EDF
variants that do not suffer from this, for example global EDF (GEDF).

GEDF can guarantee a utilization bound of 50% for hard-rt and is soft-rt
up to 100%.

Then there are the pfair class of scheduling algorithms which can
theoretically yield up to 100% utilization on SMP systems.

> Besides that, EDF is the simplest, most brain-dead scheduler you can imagine.
> Basically you want to add the deadline to the tasks, put it in a sorted list
> and pick the leftmost task every time until it completes.

Sure, and all that is useless without schedulability tests.

> > The few problems this gives are things like kstopmachine and the
> > migration threads, which should run at the max priority available on the
> > system.
> >
> > [ NOTE: although possibly we could make an exception for the migration
> > threads, as we generally don't need to migrate running RT tasks]
> >
> > Perhaps we can introduce another class on top of hedf which will run
> > just these two tasks and is not exposed to userspace (yes, I understand
> > it will ruin just about any schedulability analysis).
> >
> > Which leaves us with the big issue of priority inversion ;-)
>
> Couldn't the above idea solve a bit of this? I have some papers on deadline
> inheritance lying around somewhere; I can have a look at that, I think there
> was a fairly elegant solution to some of these issues there.

Yes, it's brilliant, on UP :-)

But it should work on SMP as well with a bit more effort. But you really
need bandwidth inheritance as well, which is yet another bit of effort.

But the Pisa folks mentioned some issues wrt partitioned EDF; Michael,
was that because the deadline clock wasn't synchronized or something? -
Could you explain that again, my brain seems to have mis-filed it.

This is relevant, because even if you do GEDF or better, Linux allows
you to break your system into partitions using cpusets.

> > We can do deadline inheritance and bandwidth inheritance by changing
> > plist to a rb-tree/binary heap and mapping the static priority levels
> > somewhere at the back and also propagating the actual task responsible
> > for the boost down the chain (so as to be able to do bandwidth
> > inheritance).
>
> IMHO, you are complicating things unnecessarily. EDF is simple. Why not go for a
> *very* basic system, and not offer things that will be very difficult to
> guarantee.

If you want a system that can guarantee anything, you must solve the
priority inversion issue, and for deadline scheduling that means both DI
and BI, even if you do PEDF.

Because even if you do not allow your tasks to migrate, there will be
shared resources (if not in userspace, then at least in the kernel), and
as long as you have that....

2008-10-31 12:11:25

by Henrik Austad

[permalink] [raw]
Subject: Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

On Fri, Oct 31, 2008 at 10:03:52AM +0100, Peter Zijlstra wrote:
> On Thu, 2008-10-30 at 22:44 +0100, Henrik Austad wrote:
> > On Thursday 30 October 2008 18:17:14 Peter Zijlstra wrote:
> > > On Thu, 2008-10-30 at 17:49 +0100, [email protected] wrote:
> > > > Quoting Peter Zijlstra <[email protected]>:
> > > > >> Before I dive in, I should probably justify my motivations for writing
> > > > >> this email. I'm working away on implementing an EDF scheduler for real
> > > > >> time tasks in the kernel. This again leads to hacking at the existing
> > > > >> source as I'm not about to toss out the entire scheduler - just
> > > > >> replace (by some Kconfig switch) the RR/FIFO classes. As to why I'm
> > > > >> looking at EDF, I think the answer to that is a bit too long (and not
> > > > >> appropriate for this email anyway) so I'll leave that part out.
> > > >
> > > > Well, I understand that, but it could be interesting... At least to me.
> > > > :-)
> >
> > ok, simply put:
> > * give each task a relative deadline (will probably introduce a new syscall,
> > please don't shoot me).
>
> We call that sys_sched_setscheduler2().

Ah, ok, I thought introducing new syscalls was *really* frowned upon.

>
> > * when the task enters TASK_RUNNING, set the absolute deadline to time_now +
> > rel_deadline.
> > * insert task in rq, sorted by abs_deadline
> > * pick leftmost task and run it
> > * when task is done, pick next task
> >
> > that's it.
> >
> > Then, of course, you have to add all the logic to make the thing work :)
>
> Well, yes, I know how to do EDF, and it's trivially simple - on UP.
> Deadline scheduling on SMP otoh is not.

True.
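(For concreteness, the quoted recipe really is that small. A toy userspace sketch of it, with entirely made-up names and nothing kernel-specific:)

```c
#include <stddef.h>

/* Toy illustration of the UP-EDF recipe quoted above; every name here
 * is invented. A real implementation would keep the runqueue in an
 * rb-tree keyed on the absolute deadline rather than scanning an array. */
struct toy_task {
	long long rel_deadline;	/* set once, e.g. via the proposed syscall */
	long long abs_deadline;	/* recomputed on every wakeup */
	int runnable;
};

/* Step 2 of the recipe: on entering TASK_RUNNING,
 * abs_deadline = time_now + rel_deadline. */
static void toy_wakeup(struct toy_task *t, long long now)
{
	t->abs_deadline = now + t->rel_deadline;
	t->runnable = 1;
}

/* Steps 3-5: "sorted by abs_deadline, pick leftmost" collapses to
 * picking the runnable task with the earliest absolute deadline. */
static struct toy_task *toy_pick_next(struct toy_task *tasks, size_t n)
{
	struct toy_task *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!tasks[i].runnable)
			continue;
		if (!best || tasks[i].abs_deadline < best->abs_deadline)
			best = &tasks[i];
	}
	return best;
}
```

When a task completes, clear `runnable` and call `toy_pick_next()` again; that is the entire scheduling decision, which is why all the hard work ends up in admission control and priority inheritance instead.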

> > > >
> > > > > You and a few other folks.
> > > >
> > > > Yes, here we are! :-)
> > > >
> > > > We also have some code, but it still is highly experimental and we are
> > > > deeply rearranging it.
> >
> > I have a very clear idea about *what* the scheduler should do, the problem is
> > how to get it to work :-)
> >
> > Things would be a lot easier if the code in the scheduler were a bit more
> > separated. I have started separating things and moving them to separate files.
> > I'll ship off a few patches for comments if that's interesting?
>
> Sure, but implementing an EDF class isn't really all that hard - esp if
> you only want UP.
>
> The real fun is in the PI stuff and schedulability tests on SMP.

As a start, that is at least the approach I would like to take. Once you have a
proven, functional EDF on a single core, you can extend that to handle several
cores, if you really want to.

> > > > > The most interesting part of EDF is not the
> > > > > actual scheduler itself (although there are fun issues with that as
> > > > > well), but extending the Priority Inheritance framework to deal with
> > > > > all the fun cases that come with EDF.
> >
> > Well, I find EDF interesting because it is so blissfully simple. :-)
>
> Yes it is, until you do SMP :-)

Well, I have a simple solution to that too ;)

> > > > The main problem is that, especially to deal with SMP systems, we also
> > > > need to investigate theoretical issues and find out what the best
> > > > approach could be.
> >
> > Yes, well, EDF is not optimal for SMP systems, only for single core. However,
> > you can do a pretty good attempt by assigning tasks to cores in a greedy
> > fashion (simply put the next task at the CPU with the lowest load).
> >
> > As a further optimization, I guess you could do the whole sched-domain thing to
> > minimize the search space.
>
> The problem with greedy binpacking heuristics is that your schedulability
> tests are out the window, making the whole thing useless.

Well, not really. I mean, to be optimal you should also consider WCET, but
then that's really not interesting, as IMHO that's the userspace programmer's
responsibility. If the user wants to add tasks that sum up to 210% utilization,
there's really not much we can do anyway. You certainly wouldn't want the kernel
to stop accepting new jobs.

So, keep the kernel logic as simple as possible and move the job to the user.
By keeping the kernel logic simple, we make the job easier for the end users. A
very complex EDF scheduler will make testing very difficult.

If, on the other hand, we *know* that the scheduler is not optimal, but that it
behaves in a predictable manner, the end users have a simpler task of finding
out why something bad happened.

Because, no matter *what* you do, and *how* you implement it, with *whatever*
features, there will be cases when things fall apart, and having a simple,
predictable scheduler will be necessary to figure it out.

> > > > > Well, adding a sched_class, no need to replace anything besides that.
> > > >
> > > > I'm not saying anything against possible sched.c and sched_{fair|rt}.c code
> > > > rearranging; I only wonder why replace fixed priority RT
> > > > scheduling with EDF.
> > > >
> > > > I think they both could be in... Maybe we can discuss on where, I mean
> > > > on which position, in the linked list of scheduling classes, to put
> > > > each of them.
> >
> > No. You should have *either* FIFO/RR *or* EDF, not both at the same time. If
> > you absolutely require both, you should at least separate them on a per-core
> > basis. If you mix them, they need to be aware of each other in order to make
> > the right decision, and that is not good.
>
> We _have_ to have both. It's that simple.

No, we do not. Or, at least not at the same time (see below)

> POSIX mandates we have SCHED_FIFO/RR, there is tons and tons of userspace that
> uses it, we cannot just replace it with a deadline scheduler.

I didn't mean to rip the whole fifo/rr out of the kernel, but adding a switch at
compile-time so that you could choose *either* normal, static RT *or* EDF. Then
we could, at least for the first few versions, have it depend on !SMP to avoid
the whole SMP-non-optimal-mess.

> Also, start a linux-rt kernel and look at all the RT threads you have (and
> that's only going to get worse when we go to threads per interrupt handler
> instead of threads per interrupt source).

Yes, that would certainly be a problem. But if we can configure in either EDF
or FIFO/RR, adding this to the kernel RT threads should be possible.

> Who is going to assign useful and meaningful periods, deadlines and execution
> times to all those threads?

well, periods are not that difficult. Basically you treat everything as
asynchronous events and put tasks in the runqueue when they become runnable. And
the application itself should set a relative deadline via the syscall to tell
the kernel "when I become runnable, I must finish before time_now +
rel_deadline".

RT tasks that run forever will certainly be a problem, but what about something
like marking them as soft/firm and, every time a deadline is missed, having the
trigger function set a new deadline? It would mean you have to rewrite a whole
lot of apps, though.

> So the only other option is to add a deadline scheduler and allow people to
> use it.
>
> When you do that, you'll see that you have to add it above the FIFO/RR class
> because otherwise your schedulability is out the window again.
>
> > > Right, ideally I'd like to see 2 EDF classes on top of FIFO, so that we
> > > end up with the following classes
> > >
> > > hedf - hard EDF
> > > sedf - soft EDF (bounded tardiness)
> > > fifo/rr - the current static priority scheduler
> > > fair - the current proportional fair scheduler
> > > idle - the idle scheduler
> > >
> > > The two edf classes must share some state, so that the sedf class knows
> > > about the utilisation consumed by hedf, and the main difference between
> > > these two classes is the schedulability test.
> >
> > Well.. why not just treat *all* RT-tasks as *either* FIFO/RR or EDF? Having
> > fifo and edf together will complicate things. And, people looking for edf,
> > will not use fifo/rr anyway (famous last words).
>
> errr, not so, see above. FIFO/RR are static priority schedulers, whereas
> deadline schedulers are job-static or dynamic priority schedulers. You
> cannot simply map FIFO onto them, you need more information and who is
> going to provide that?

I'm not going to map anything anywhere. EDF is very different from priority-based
scheduling, and we shouldn't try to map one back onto the other (priorities,
strictly speaking, are the result of a set of deadlines being analyzed and mapped
to priority levels so that the deadlines are met).

> Also, with the above mentioned requirement that we must have FIFO/RR,
> there really is no choice.
>
> > Furthermore, hard/firm/soft could be treated as one class, but it should be
> > treated differently at missed deadlines.
> > * Soft: nothing. Scheduled at best effort; when the deadline passes, prioritize
> > other tasks to avoid a cascading effect of deadline misses
> > * Firm: keep some statistics that the user can modify and a possible
> > event-handler when limits are exceeded
> > * Hard: immediately call a user-registered function, or send a signal to notify the
> > task.
>
> Thing is, you have to run hard tasks first, and schedule the weaker forms
> in their slack time, otherwise you cannot guarantee anything.

Well, then you suddenly introduce priorities to the deadlines, and that is not
good. A hard task is not more important than a soft one, but the effect of missing
the deadline is. If the schedule is infeasible, it really doesn't matter what
you do, as you will miss deadlines, and if you prioritize hard tasks, you will
end up starving firm and soft tasks.

Before you go on and tell me how wrong I am, note that I don't disagree with
you; I think choosing hrt before the others is the best solution from an
implementation point of view.

> And the easiest way to do that slack time stealing, is with such a
> hierarchy. Sure, you can share a lot of code etc. but you need separate
> classes.
>
> > > [ NOTE: read EDF as any deadline scheduler, so it might as well be
> > > pfair PD^2 scheduler. ]
> >
> > Well, the nice thing about EDF is that it is provably optimal for any
> > feasible schedule; IOW, if all the tasks are schedulable, you can have 100%
> > utilization of the CPU.
>
> On UP - which is not interesting on a general purpose kernel that runs
> on machines with up to 4096 CPUs.

But, and pardon my ignorance, will an EDF scheduler be interesting for such a
large system? From what I've gathered, small systems are the ones that could
benefit from EDF, as you can analyze and predict behaviour and then, since
EDF is optimal, tune the CPU frequency down and still know that things will work.

Some embedded people can probably provide a lot better input here than me, as
this is just a general idea I snapped up 'somewhere' (where somewhere is an
element of the set of all places I've been the last 6 months).

> > On a multicore, EDF is not optimal, as rearranging the tasks becomes NP-hard
> > (basically a knapsack problem) for several backpacks :-)
>
> That is partitioned EDF, or PEDF for short. There are many other EDF
> variants that do not suffer this, for example global EDF (GEDF).
>
> GEDF can guarantee a utilization bound of 50% for hard-rt and is soft-rt
> up to 100%.
>
> Then there are the pfair class of scheduling algorithms which can
> theoretically yield up to 100% utilization on SMP systems.

Do you know about any practical attempts at this, and what kind of results they
produced?

>
> > Besides that, EDF is the simplest, most brain-dead scheduler you can imagine.
> > Basically you want to add the deadline to the tasks, put it in a sorted list
> > and pick the leftmost task every time until it completes.
>
> Sure, and all that is useless without schedulability tests.

Yes, but should the kernel do the schedulability test? Or should the ball be
passed on to userspace? To analyze the schedulability, you would need the worst
case execution time (WCET) of the process, and if the kernel/scheduler should
start trying to estimate that...

So, as a start, why not just 'ignore' WCET in the first versions, and that can
be added later on, if necessary.
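For what it's worth, the simplest form of the test in question, on UP with implicit deadlines, is just a utilization sum over the user-supplied parameters (with a budget standing in for the WCET). A sketch with invented names:

```c
#include <stddef.h>

/* Classic UP EDF admission test for implicit-deadline periodic tasks:
 * the set is schedulable iff sum(budget_i / period_i) <= 1.
 * Hypothetical struct; a kernel version would additionally have to
 * handle deadline < period, blocking terms and overflow. */
struct toy_params {
	unsigned long long budget;	/* per-period runtime, from userspace */
	unsigned long long period;
};

/* Returns 1 if the task set passes the utilization test. Uses scaled
 * integer arithmetic (utilization scaled by 1<<20) to avoid floating
 * point, as kernel code would. */
static int toy_admit(const struct toy_params *p, size_t n)
{
	unsigned long long util = 0;

	for (size_t i = 0; i < n; i++)
		util += (p[i].budget << 20) / p[i].period;
	return util <= (1ULL << 20);
}
```

The point of doing this in the kernel is only to reject (or throttle) task sets that cannot work; it never needs to estimate WCET itself.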

> > > The few problems this gives are things like kstopmachine and the
> > > migration threads, which should run at the max priority available on the
> > > system.
> > >
> > > [ NOTE: although possibly we could make an exception for the migration
> > > threads, as we generally don't need to migrate running RT tasks]
> > >
> > > Perhaps we can introduce another class on top of hedf which will run
> > > just these two tasks and is not exposed to userspace (yes, I understand
> > > it will ruin just about any schedulability analysis).

Or, have the task run with a minimal deadline, then set it to TASK_INTERRUPTIBLE,
use a timer to wake it again shortly, run it, yield, and so on. Make a
periodic tick that will preempt whatever task is running?


> > >
> > > Which leaves us with the big issue of priority inversion ;-)
> >
> > Couldn't the above idea solve a bit of this? I have some papers on deadline
> > inheritance lying around somewhere; I can have a look at that, I think there
> > was a fairly elegant solution to some of these issues there.
>
> Yes, its brilliant, on UP :-)
>
> But it should work on SMP as well with a bit more effort. But you really
> need bandwidth inheritance as well, which is yet another bit of effort.
>
> But the Pisa folks mentioned some issues wrt partitioned EDF; Michael,
> was that because the deadline clock wasn't synchronized or something? -
> Could you explain that again, my brain seems to have mis-filed it.
>
> This is relevant, because even if you do GEDF or better, linux allows
> you to break your system into partitions using cpusets.
>
> > > We can do deadline inheritance and bandwidth inheritance by changing
> > > plist to a rb-tree/binary heap and mapping the static priority levels
> > > somewhere at the back and also propagating the actual task responsible
> > > for the boost down the chain (so as to be able to do bandwidth
> > > inheritance).
> >
> > IMHO, you are complicating things unnecessarily. EDF is simple. Why not go for a
> > *very* basic system, and not offer things that will be very difficult to
> > guarantee.
>
> If you want a system that can guarantee anything, you must solve the
> priority inversion issue, and for deadline scheduling that means both DI
> and BI, even if you do PEDF.
>
> Because even if you do not allow your tasks to migrate, there will be
> shared resources (if not in userspace, then at least in the kernel), and
> as long as you have that....

A lot of good points, and I certainly see your side of it. However (and yes, I
have to argue a bit more ;)), I don't think an EDF-scheduler should contain a
lot of features.

If you want to use EDF, why not give the user a list of consequences like
- Only a single core
- No other RT scheduler; if other userspace programs break, so be it, the user
has been warned.
- Best effort only
- Provide handlers for a given set of signals that will be sent to any
application missing a deadline
- no cpu-scaling
- ... keep going, basically strip away every piece of dynamic behaviour and
complex scheduling code

henrik

2008-10-31 13:30:28

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

On Fri, 2008-10-31 at 13:09 +0100, Henrik Austad wrote:

> Ah, ok, I thought introducing new syscalls was *really* frowned upon.

We prefer not to, but sometimes there just isn't any other option.
If we want to extend struct sched_param, we need 2 new syscalls.
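Purely as a hypothetical sketch (neither this struct nor these syscall names exist anywhere; they only illustrate why extending the fixed-size sched_param forces a new syscall pair), the extension could look something like:

```c
#include <stdint.h>

/* Hypothetical extended parameter block for a deadline class. The
 * existing sys_sched_setscheduler() takes a fixed-size struct
 * sched_param, so growing it means a new syscall pair rather than a
 * silent ABI change. */
struct sched_param2 {
	int      sched_priority;	/* kept for FIFO/RR compatibility */
	uint64_t sched_runtime;		/* execution budget per period, ns */
	uint64_t sched_deadline;	/* relative deadline, ns */
	uint64_t sched_period;		/* period, ns */
};

/* The imagined pair, mirroring sched_setscheduler()/sched_getparam():
 *   int sched_setscheduler2(pid_t pid, int policy,
 *                           const struct sched_param2 *param);
 *   int sched_getparam2(pid_t pid, struct sched_param2 *param);
 */
```

A sane task set would keep runtime <= deadline <= period for each task.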

> > Sure, but implementing an EDF class isn't really all that hard - esp if
> > you only want UP.
> >
> > The real fun is in the PI stuff and schedulability tests on SMP.
>
> As a start, that is the approach at least I would like to take. Once you have a
> proven, functional EDF on a single core, you can extend that to handle several
> cores, if you really want to.

Well, you're of course free to do so, but I don't think it's a very
interesting thing to do.

> > > > > The main problem is that, especially to deal with SMP systems, we also
> > > > > need to investigate theoretical issues and find out what the best
> > > > > approach could be.
> > >
> > > Yes, well, EDF is not optimal for SMP systems, only for single core. However,
> > > you can do a pretty good attempt by assigning tasks to cores in a greedy
> > > fashion (simply put the next task at the CPU with the lowest load).
> > >
> > > As a further optimization, I guess you could do the whole sced-domain thing to
> > > minimize the search space.
> >
> > The problem with greedy binpacking heuristics is that your schedulablity
> > test are out the window, making the whole thing useless.
>
> Well, not really. I mean, to be optimal, you should also consider WCET, but
> then, that's really not interesting as IMHO that's the userspace-programmer's
> responsibility. If the user wants to add tasks that sum up to 210% utilization,
> it's really not much we can do anyway. You certainly wouldn't want the kernel to
> stop accepting new jobs.
>
> So, keep the kernel logic as simple as possible and move the job to the user.
> By keeping the kernel logic simple - we make the job easier for the end-users. A
> very complex EDF-scheduler will make the testing very difficult.
>
> If, on the other hand, we *know* that the scheduler is not optimal, but that it
> behaves in a predictable manner, the end users have a simpler task of finding
> out why something bad happened.
>
> Because, no matter *what* you do, and *how* you implement it, with *whatever*
> features, there will be cases when things fall apart, and having a simple,
> predictable scheduler will be necessary to figure it out.

I agree that the scheduler should be simple, and even something like
PD^2 is relatively simple.

But I disagree that we should not do schedulability tests. Doing those,
and esp. enforcing tasks to their given limits increases the QoS for
others in the presence of faulty/malicious tasks.

Also, WCET is still the user's responsibility.

If for each deadline task you specify a period, a deadline and a budget, then
the WCET computation is reflected in the budget.

By enforcing the schedulability test and execution budget you raise the
quality of service, because even in the presence of a mis-behaving task
only that task will be impacted. The other tasks will still meet their
deadlines.
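A minimal sketch of that enforcement idea (all names invented, nothing here is kernel code): a misbehaving task burns through its budget and gets throttled until its next period, so the remaining tasks still meet their deadlines.

```c
/* Toy per-task accounting for budget enforcement. */
struct toy_se {
	long long budget;	/* allowed runtime per period, in ns */
	long long runtime;	/* runtime left in the current period */
	int throttled;
};

/* Called from the periodic tick while the task runs: account execution
 * time, and once the budget is exhausted throttle (dequeue) the task
 * instead of letting it overrun into other tasks' time. */
static void toy_update_curr(struct toy_se *se, long long delta_exec)
{
	se->runtime -= delta_exec;
	if (se->runtime <= 0)
		se->throttled = 1;
}

/* Called at each period boundary: replenish the budget so the task may
 * run again, with a fresh deadline. */
static void toy_replenish(struct toy_se *se)
{
	se->runtime = se->budget;
	se->throttled = 0;
}
```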

> > > No. You should have *either* FIFO/RR *or* EDF, not both at the same time. If
> > > you absolutely require both, you should at least separate them on a per-core
> > > basis. If you mix them, they need to be aware of each other in order to make
> > > the right decision, and that is not good.
> >
> > We _have_ to have both. Its that simple.
>
> No, we do not. Or, at least not at the same time (see below)
>
> > POSIX mandates we have SCHED_FIFO/RR, there is tons and tons of userspace that
> > uses it, we cannot just replace it with a deadline scheduler.
>
> I didn't mean to rip the whole fifo/rr out of the kernel, but adding a switch at
> compile-time so that you could choose *either* normal, static RT *or* EDF. Then
> we could, at least for the first few versions, have it depend on !SMP to avoid
> the whole SMP-non-optimal-mess.

But _why_? why not leave FIFO/RR in? There is absolutely no downside to
keeping it around.

> > Thing is, you have to run hard tasks first, and schedule the weaker forms
> > in their slack time, otherwise you cannot guarantee anything.
>
> Well, then you suddenly introduce priorities to the deadlines, and that is not
> good. A hard task is not more important than a soft, but the effect of missing
> the deadline is. If the schedule is infeasible, it really doesn't matter what
> you do, as you will miss deadlines, and if you prioritize hard tasks, you will
> end up starving firm and soft
>
> Before you go on and tell me how wrong I am, note that I don't disagree with
> you, I think choosing hrt before the others, is the best solution from an
> implementation point of view.

That is, if you make the soft-deadline class aware of the hard-deadline
class's tasks and schedulability constraints, then you can keep the
soft-rt class schedulable too.

So srt is in no way less important, it just has fewer restrictions on
the schedule; therefore we can run it in the hrt slack/idle time.

And adding the schedulability test in the kernel avoids these starvation
issues, because you simply cannot create them.
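That slack-time hierarchy falls out of the existing pick_next_task() pattern almost for free: classes are tried in a fixed order, so a lower class only gets CPU time when every class above it has nothing runnable. A toy version, with hypothetical types (the real loop lives in kernel/sched.c):

```c
#include <stddef.h>

/* Toy sched_class walk: highest class first, so sedf naturally runs in
 * hedf's slack, fifo/rr in sedf's, and so on down to idle. */
struct toy_class {
	const char *name;
	void *(*pick_next)(void);	/* NULL means nothing runnable */
};

/* Stub classes for illustration only. */
static void *toy_nothing(void) { return NULL; }

static int toy_idle_task;
static void *toy_idle_pick(void) { return &toy_idle_task; }

/* Walk the hierarchy: hedf, sedf, fifo/rr, fair, idle. */
static void *toy_pick_next_task(const struct toy_class *classes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		void *p = classes[i].pick_next();
		if (p)
			return p;
	}
	return NULL;	/* unreachable while an idle class exists */
}
```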

> > On UP - which is not interesting on a general purpose kernel that runs
> > on machines with up to 4096 CPUs.
>
> But, and pardon my ignorance, will an EDF-scheduler be intersting for such a
> large system? From what I've gathered, small systems are the ones that could
> benefit from an EDF as you can analyze and predict behaviour, and then, since
> EDF is optimal, tune the CPU-freq down and still know that things will work.
>
> Some embedded people can probably provide a lot better input here than me, as
> this is just a general idea I snapped up 'somewhere' (where somewhere is an
> element of the set of all places I've been the last 6 months).

Not that large indeed, but people are interested in running RT workloads
on machines in the 32/64 scale.

And even the embedded folks are now staring quad core arm11 chips in the
face, wondering how to do things.

> > Then there are the pfair class of scheduling algorithms which can
> > theoretically yield up to 100% utilization on SMP systems.
>
> Do you know about any practical attempts at this, and what kind of results
> they produced?

Fairly decent, http://www.cs.unc.edu/~anderson/papers/rtss08b.pdf

> > > Besides that, EDF is the simplest, most brain-dead scheduler you can imagine.
> > > Basically you want to add the deadline to the tasks, put it in a sorted list
> > > and pick the leftmost task every time until it completes.
> >
> > Sure, and all that is useless without schedulability tests.
>
> Yes, but should the kernel do the schedulability test? Or should the ball be
> passed on to userspace? To analyze the schedulability, you would need the worst
> case execution time (WCET) of the process, and if the kernel/scheduler should
> start trying to estimate that...
>
> So, as a start, why not just 'ignore' WCET in the first versions, and that can
> be added later on, if necessary.

Like said above, WCET is represented in the execution budget.

> A lot of good points, and I certainly see your side of it. However (and yes, I
> have to argue a bit more ;)), I don't think an EDF-scheduler should contain a
> lot of features.
>
> If you want to use EDF, why not give the user a list of consequences like
> - Only a single core

There won't be a single core machine left soon ;-)

> - No other RT-scheduler, if other userspace program breaks, so be it, the user
> has been warned.

That's a no go, and I don't see why you would need that.

> - Best effort only

That's pretty useless imho. Best-effort and RT are a bit contradictory.

> - Provide handlers for a given set of signals that will be sent to any
> application missing a deadline

Yeah, the idea was to send SIGXCPU to tasks who exceed their budget (and
thus will miss their deadline).

> - no cpu-scaling
> - ... keep going, basically strip away every piece of dynamic behaviour and
> complex scheduling code

I'm thinking there's little useful left after all that ;-)

2008-10-31 15:05:44

by Henrik Austad

[permalink] [raw]
Subject: Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

On Friday 31 October 2008 14:30:08 Peter Zijlstra wrote:
> On Fri, 2008-10-31 at 13:09 +0100, Henrik Austad wrote:
> > Ah, ok, I thought introducing new syscalls was *really* frowned upon.
>
> We prefer not to, but sometimes there just isn't any other option.
> If we want to extend struct sched_param, we need 2 new syscalls.
>
> > > Sure, but implementing an EDF class isn't really all that hard - esp if
> > > you only want UP.
> > >
> > > The real fun is in the PI stuff and schedulability tests on SMP.
> >
> > As a start, that is the approach at least I would like to take. Once you
> > have a proven, functional EDF on a single core, you can extend that to
> > handle several cores, if you really want to.
>
> Well, you're of course free to do so, but I don't think its a very
> interesting thing to do.
>
> > > > > > The main problem is that, especially to deal with SMP systems, we
> > > > > > also need to investigate theoretical issues and find out what the
> > > > > > best approach could be.
> > > >
> > > > Yes, well, EDF is not optimal for SMP systems, only for single core.
> > > > However, you can do a pretty good attempt by assigning tasks to cores
> > > > in a greedy fashion (simply put the next task at the CPU with the
> > > > lowest load).
> > > >
> > > > As a further optimization, I guess you could do the whole sched-domain
> > > > thing to minimize the search space.
> > >
> > > The problem with greedy binpacking heuristics is that your
> > > schedulability tests are out the window, making the whole thing useless.
> >
> > Well, not really. I mean, to be optimal, you should also consider WCET,
> > but then, that's really not interesting as IMHO that's the
> > userspace-programmer's responsibility. If the user wants to add tasks
> > that sum up to 210% utilization, it's really not much we can do anyway.
> > You certainly wouldn't want the kernel to stop accepting new jobs.
> >
> > So, keep the kernel logic as simple as possible and move the job to the
> > user. By keeping the kernel logic simple - we make the job easier for the
> > end-users. A very complex EDF-scheduler will make the testing very
> > difficult.
> >
> > If, on the other hand, we *know* that the scheduler is not optimal, but
> > that it behaves in a predictable manner, the end users have a simpler
> > task of finding out why something bad happened.
> >
> > Because, no matter *what* you do, and *how* you implement it, with
> > *whatever* features, there will be cases when things fall apart, and
> > having a simple, predictable scheduler will be necessary to figure it
> > out.
>
> I agree that the scheduler should be simple, and even something like
> PD^2 is relatively simple.
>
> But I disagree that we should not do schedulability tests. Doing those,
> and esp. enforcing tasks to their given limits increases the QoS for
> others in the presence of faulty/malicious tasks.
>
> Also, WCET is still the users responsibility.
>
> If for each deadline task you specify a period, a deadline and a budget.
> Then the WCET computation is reflected in the budget.
>
> By enforcing the schedulability test and execution budget you raise the
> quality of service, because even in the presence of a mis-behaving task
> only that task will be impacted. The other tasks will still meet their
> deadlines.

Ah, ok. I see.

>
> > > > No. You should have *either* FIFO/RR *or* EDF, not both at the same
> > > > time. If you absolutely require both, you should at least separate
> > > > them on a per-core basis. If you mix them, they need to be aware of
> > > > each other in order to make the right decision, and that is not good.
> > >
> > > We _have_ to have both. Its that simple.
> >
> > No, we do not. Or, at least not at the same time (see below)
> >
> > > POSIX mandates we have SCHED_FIFO/RR, there is tons and tons of
> > > userspace that uses it, we cannot just replace it with a deadline
> > > scheduler.
> >
> > I didn't mean to rip the whole fifo/rr out of the kernel, but adding a
> > switch at compile-time so that you could choose *either* normal, static
> > RT *or* EDF. Then we could, at least for the first few versions, have it
> > depend on !SMP to avoid the whole SMP-non-optimal-mess.
>
> But _why_? why not leave FIFO/RR in? There is absolutely no downside to
> keeping it around.

My motivation was the increased complexity in meeting deadlines etc. By having
only EDF (or only RR/FIFO) things would be a lot simpler. Life is not simple,
it seems :-)

>
> > > Thing is, you have to run hard tasks first, and schedule the weaker forms
> > > in their slack time, otherwise you cannot guarantee anything.
> >
> > Well, then you suddenly introduce priorities to the deadlines, and that
> > is not good. A hard task is not more important than a soft, but the
> > effect of missing the deadline is. If the schedule is infeasible, it
> > really doesn't matter what you do, as you will miss deadlines, and if you
> > prioritize hard tasks, you will end up starving firm and soft
> >
> > Before you go on and tell me how wrong I am, note that I don't disagree
> > with you, I think choosing hrt before the others, is the best solution
> > from an implementation point of view.
>
> This is, if you make the soft-deadline class aware of the hard-deadline
> class's tasks and schedulability contraints, then you can keep the
> soft-rt class schedulable too.
>
> So srt is in no way less important, its just has less restrictions on
> the schedule, therefore we can run it in the hrt slack/idle time.
>
> And adding the schedulability test in the kernel avoids these starvation
> issues, because you just cannot.
>
> > > On UP - which is not interesting on a general purpose kernel that runs
> > > on machines with up to 4096 CPUs.
> >
> > But, and pardon my ignorance, will an EDF scheduler be interesting for
> > such a large system? From what I've gathered, small systems are the ones
> > that could benefit from EDF, as you can analyze and predict behaviour,
> > and then, since EDF is optimal, tune the CPU frequency down and still know
> > that things will work.
> >
> > Some embedded people can probably provide a lot better input here than
> > me, as this is just a general idea I snapped up 'somewhere' (where
> > somewhere is an element of the set of all places I've been the last 6
> > months).
>
> Not that large indeed, but people are interested in running RT workloads
> on machines in the 32/64 scale.
>
> And even the embedded folks are now staring quad core arm11 chips in the
> face, wondering how to do things.

Hmm, I must admit, that is a very good point.

>
> > > Then there are the pfair class of scheduling algorithms which can
> > > theoretically yield up to 100% utilization on SMP systems.
> >
> > Do you know about any practical attempts at this, and what kind of
> > results they produced?
>
> Fairly decent, http://www.cs.unc.edu/~anderson/papers/rtss08b.pdf

Thanks!

>
> > > > Besides that, EDF is the simplest, most brain-dead scheduler you can
> > > > imagine. Basically you want to add the deadline to the tasks, put it
> > > > in a sorted list and pick the leftmost task every time until it
> > > > completes.
> > >
> > > Sure, and all that is useless without schedulability tests.
> >
> > Yes, but should the kernel do the schedulability test? Or should the ball
> > be passed on to userspace? To analyze the schedulability, you would need
> > the worst case execution time (WCET) of the process, and if the
> > kernel/scheduler should start trying to estimate that...
> >
> > So, as a start, why not just 'ignore' WCET in the first versions, and
> > that can be added later on, if necessary.
>
> Like said above, WCET is represented in the execution budget.
>
> > A lot of good points, and I certainly see your side of it. However (and
> > yes, I have to argue a bit more ;)), I don't think an EDF-scheduler
> > should contain a lot of features.
> >
> > If you want to use the EDF, why not give the user a list of consequences
> > like - Only a single core
>
> There won't be a single core machine left soon ;-)
>
> > - No other RT-scheduler; if another userspace program breaks, so be it,
> > the user has been warned.
>
> That's a no go, and I don't see why you would need that.
>
> > - Best effort only
>
> That's pretty useless imho. Best-effort and RT are a bit contradictory.
>
> > - Provide handlers for a given set of signals that will be sent to any
> > application missing a deadline
>
> Yeah, the idea was to send SIGXCPU to tasks who exceed their budget (and
> thus will miss their deadline).
>
> > - no cpu-scaling
> > - ... keep going, basically strip away every piece of dynamic behaviour
> > and complex scheduling code
>
> I'm thinking there's little useful left after all that ;-)

bah, you just ripped all my arguments to shreds and convinced me I'm wrong. I
guess it's back to the drawing board then, but now I have a better
understanding at least.

Thanks for all the feedback guys, especially Peter! I really appreciate this!

--
med Vennlig Hilsen - Yours Sincerely
Henrik Austad

2008-10-31 18:09:30

by Dario Faggioli

[permalink] [raw]
Subject: Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

On Thu, 30 October 2008 10:44 pm, Henrik Austad wrote:
>> As to why I'm looking at EDF, I think the answer to that is a bit too
>> long (and not appropriate for this email anyway) so I'll leave that
>> part out.
>> >
>> > Well, I understand that, but it could be interesting... At least to
>> > me.
>
> ok, simply put:
> * give each task a relative deadline (will probably introduce a new
>   syscall, please don't shoot me).
> * when the task enters TASK_RUNNING, set absolute deadline to time_now +
>   rel_deadline.
> * insert task in rq, sorted by abs_deadline
> * pick leftmost task and run it
> * when task is done, pick next task
>
> that's it.
>
Ok, that is how EDF works, and I know it... I was asking something
different... But never mind! :-D

>> > > The most interesting part of EDF is not the
>> > > actual scheduler itself (although there are fun issues with that as
>> > > well), but extending the Priority Inheritance framework to deal with
>> > > all the fun cases that come with EDF.
>
> Well, I find EDF interesting because it is so blissfully simple. :-)
>
I agree, EDF is very simple and has a lot of very nice properties... The
problem is deciding how to assign a deadline to a task if it is not a
classical soft or hard real-time one! :-O

But you're not talking about things like that, are you?

> Yes, well, EDF is not optimal for SMP systems, only for single core.
> However,
> you can do a pretty good attempt by assigning tasks to cores in a greedy
> fashion (simply put the next task at the CPU with the lowest load).
>
I definitely agree that hard real-time workloads are better handled by
partitioned EDF, but for soft ones, it would be sad to suffer from the
possible CPU utilization loss it entails.
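The greedy assignment Henrik describes (put the next task on the CPU with the
lowest load) is essentially worst-fit partitioning by utilization. A minimal
sketch, with hypothetical names and a plain utilization array standing in for
per-CPU runqueue state:

```c
#include <assert.h>

/* Assign a task with utilization task_util (WCET/period) to the CPU
 * currently carrying the lowest total utilization, and account for it.
 * Returns the chosen CPU index. */
static int greedy_assign_cpu(double *cpu_load, int nr_cpus, double task_util)
{
    int cpu = 0;

    for (int i = 1; i < nr_cpus; i++)
        if (cpu_load[i] < cpu_load[cpu])
            cpu = i;
    cpu_load[cpu] += task_util;
    return cpu;
}
```

This keeps each CPU's utilization roughly balanced, but as Dario notes it
gives no answer for resources shared across partitions.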

Also, what about resources shared by different tasks in different
CPUs/partitions? And if you avoid sharing resources between tasks in
different partitions (is that acceptable?), what about system resources,
which are shared by _all_ tasks in the system by definition?

Sorry about asking so many questions, but these are the issues we are
trying to address, and I'm quite interested in knowing if you have any
ideas about them. :-)

> No. You should have *either* FIFO/RR *or* EDF, not both at the same time.
>
Oh... Why?

> If you absolutely require both, you should at least separate them on a
> per-core basis. If you mix them, they need to be aware of each other in
> order to make the right decision, and that is not good.
>
Well, obviously it's something that we have to think about carefully, but I
can't see any harmful situation in having both a deadline-based and a fixed
priority-based scheduling class from which to pick tasks in (sorry) priority
order.

> Well.. why not just treat *all* RT-tasks as *either* FIFO/RR or EDF?
> Having fifo and edf together will complicate things. And, people looking
> for edf will not use fifo/rr anyway (famous last words).
>
Ok, maybe it's a matter of personal feelings, but I think that such a
design, even if more complicated, could be very nice and useful.

>> Which leaves us with the big issue of priority inversion ;-)
>
> Couldn't the above idea solve a bit of this? I have some papers on deadline
> inheritance lying around somewhere; I can have a look at them, I think they
> offered a fairly elegant solution to some of these issues.
>
Well, I personally think that partitioning _raises_ issues about resource
sharing instead of lightening them... In an OS like Linux, at least...
:-O

Regards,
Dario Faggioli

PS. Sorry for the webmail... I'm abroad and I've not my laptop with me :-(

2008-10-31 18:17:48

by Dario Faggioli

[permalink] [raw]
Subject: Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

On Thu, 30 October 2008 6:17 pm, Peter Zijlstra wrote:
> Right, ideally I'd like to see 2 EDF classes on top of FIFO, so that we
> end up with the following classes
>
> hedf - hard EDF
> sedf - soft EDF (bounded tardiness)
> fifo/rr - the current static priority scheduler
> fair - the current proportional fair scheduler
> idle - the idle scheduler
>
Oh, so two classes? Well, yes, could be nice.

> The two edf classes must share some state, so that the sedf class knows
> about the utilisation consumed by hedf, and the main difference between
> these two classes is the schedulability test.
>
Yep. Actually I think the schedulability test could be an issue as well,
especially if we want a group/hierarchical approach, since the known
hierarchical admission tests are quite complex to implement and probably
time consuming if performed on-line in a highly dynamic system (with
respect to task arrival and departure).
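For the flat (non-hierarchical) case, the shared state Peter mentions between
the two EDF classes could be as small as one utilization sum per class, with
the soft class admitting only into the bandwidth the hard class left over.
This is a sketch of one plausible admission test on UP (the classic EDF
utilization bound, sum of budget/period <= 1); the struct and function names
are assumptions, not anything in the kernel:

```c
#include <assert.h>

struct edf_class_state {
    double util;  /* sum of budget_i / period_i admitted so far */
};

/* Try to admit a task (given budget and period) into the soft class.
 * The hard class's utilization is counted but never touched, so sedf
 * can only consume what hedf leaves free.  Returns 1 on success. */
static int sedf_admit(const struct edf_class_state *hedf,
                      struct edf_class_state *sedf,
                      double budget, double period)
{
    double u = budget / period;

    if (hedf->util + sedf->util + u > 1.0)
        return 0;  /* task set would be infeasible: reject */
    sedf->util += u;
    return 1;
}
```

An on-line hierarchical test, as Dario notes, would be considerably more
involved than this single comparison.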

> The few problems this gives are things like kstopmachine and the
> migration threads, which should run at the max priority available on the
> system.
>
Yeah, that's exactly what we were thinking too.

Actually, I was also thinking that having fixed priority scheduling
_before_ EDF could be of some benefit if you have to be sure that a task
is going to be executed at a very precise instant in time, but I have no
precise idea about that yet.

> Perhaps we can introduce another class on top of hedf which will run
> just these two tasks and is not exposed to userspace (yes, I understand
> it will ruin just about any schedulability analysis).

Well, it could be a solution... And if this is only used for that kind of
special task, maybe it is not impossible to bound or account for their
scheduling contribution... But I'm just shooting in the dark here, sorry
about that! :-P

> We can do deadline inheritance and bandwidth inheritance by changing
> plist to a rb-tree/binary heap and mapping the static priority levels
> somewhere at the back and also propagating the actual task responsible
> for the boost down the chain (so as to be able to do bandwidth
> inheritance).
>
> From what I gather the sssup folks are doing that, although they
> reported that DI between disjoint schedule domains (partitions) posed an
> interesting problem.
>
Yes, that's right, this is what we are investigating and trying to do in
these days (... Or weeks... Or months!).

> Personally I'd like to see the full priority inversion issue solved by
> something like the proxy execution protocol, however the SMP extension
> thereof seems to be a tad expensive - found a book on graph theory, all
> that remains is finding time to read it :-)
>
Wow... So, good luck for that! :-)

Maybe it's my fault, but I see some issues with proxy execution and
similar protocols.
That is, if you have, let's say, task A blocked on task B, blocked on task
C, and you are using proxy execution, that means that you have not
dequeued A and B when they blocked, but that you, for example, filled in a
pointer that reminds you, when you schedule them, that you have to
actually run C, am I right?

Now, what happens if C blocks on a non-rt mutex lock, or if it simply goes
to sleep? Is it acceptable to track the blocking chain in order to
actually dequeue A and B as well, and to requeue them again when C wakes
up?

Forgive me if that's a stupid point... :-(
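The chain walk Dario describes might be sketched like this. Every name and
field here is hypothetical; the NULL return models exactly his question, where
the end of the chain is itself not runnable (blocked on a non-rt lock, or
asleep), so there is nothing to proxy-run:

```c
#include <stddef.h>
#include <assert.h>

struct ptask {
    int runnable;              /* 0 if sleeping or blocked outside rt */
    struct ptask *blocked_on;  /* owner of the lock we wait for, or NULL */
};

/* Follow the blocking chain from t to the task that should actually run
 * on t's behalf (the proxy).  Returns NULL if the chain ends in a task
 * that is not runnable, i.e. the case Dario asks about. */
static struct ptask *proxy_find(struct ptask *t)
{
    while (t->blocked_on)
        t = t->blocked_on;
    return t->runnable ? t : NULL;
}
```

In the NULL case an implementation would presumably have to dequeue the whole
chain and requeue it when the sleeper wakes, which is the bookkeeping cost
Dario is worried about.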

> The advantage of proxy execution is that its fully invariant to the
> schedule function and thus even works for proportional fair schedulers
> and any kind of scheduler hierarchy.
>
Yes, I agree and I like it very much too. If you go for it, you could also
add bandwidth inheritance (e.g., for group scheduling) and things like
that almost for free (if wanted! :-))

Regards,
Dario Faggioli