LinuxLists.cc - Rearranging layout of code in the scheduler

2008-10-28 16:13:53

Subject: Rearranging layout of code in the scheduler

Hello,

Before I dive in, I should probably justify my motivations for writing
this email. I'm working away on implementing an EDF scheduler for real
time tasks in the kernel. This again leads to hacking at the existing
source as I'm not about to toss out the entire scheduler - just replace
(by some Kconfig switch) the RR/FIFO classes. As to why I'm looking at
EDF, I think the answer to that is a bit too long (and not appropriate
for this email anyway) so I'll leave that part out.

However, what I do mean to discuss is the current state of the scheduler
code. Working through the code, I must say I'm really impressed. The
code is clean, it is well thought out and the new approach with
sched_class and sched_entity makes it very modular. However, digging
deeper, I find myself turning more and more desperate, looking over my
shoulder for a way out.

Now, I'm in no doubt that the code *is* modular, that it *is* clean and
tidy, but coming from outside, it is not that easy to grasp it all. And,
it is not just the sheer size and complexity of the scheduler itself,
but also a lot with how the code is arranged.

For instance, functions like free_fair_sched_group,
alloc_fair_sched_group etc - does IMHO belong in sched_fair.c and not in
sched.c. The same goes for several rt-functions and structs.

So, if one drew up a list over all events that would cause functions in
sched.c to be called, this could be used to make a minimized 'interface'
and then let the scheduler call the appropriate function for the given
class in the same fashion sched_tick is used today.

What I would like, is to rip out all the *actual* scheduling logic and
put this in sched_[fair|rt].c and let sched be purely event-driven
(which would be a nice design goal in itself). This would also lead to
sched_[fair|rt].h, where the structs, macros, defines etc can be
defined. Today these are defined in just about everywhere, making the
code unnecessary complicated (I'm not going to say messy since I'm not
*that* senior to kernel coding :-))

Why not use the sched_class for all that it's worth - make the different
classes implement a set of functions and let sched.c be oblivious to
what's going on other than turning the machinery around?

Is this something worth pursuing? I mean, the scheduler *does* work, and
if it ain't broken, don't fix it. However, I have a strong feeling that
this can be done a lot cleaner, not to mention, changing from one type
of scheduler to another will be much easier. :-)

--
med Vennlig Hilsen - Yours Sincerely
Henrik Austad

Attachments:

(No filename) (2.49 kB)
signature.asc (189.00 B)
This is a digitally signed message part. Download all attachments

2008-10-28 16:50:55

by Peter Zijlstra

[permalink] [raw]

Subject: Re: Rearranging layout of code in the scheduler

On Tue, 2008-10-28 at 16:34 +0100, Henrik Austad wrote:
> Hello,
>
> Before I dive in, I should probably justify my motivations for writing
> this email. I'm working away on implementing an EDF scheduler for real
> time tasks in the kernel. This again leads to hacking at the existing
> source as I'm not about to toss out the entire scheduler - just replace
> (by some Kconfig switch) the RR/FIFO classes. As to why I'm looking at
> EDF, I think the answer to that is a bit too long (and not appropriate
> for this email anyway) so I'll leave that part out.

You and a few other folks. The most interesting part of EDF is not the
actual scheduler itself (although there are fun issues with that as
well), but extending the Priority Inheritance framework to deal with all
the fun cases that come with EDF.

> However, what I do mean to discuss is the current state of the scheduler
> code. Working through the code, I must say I'm really impressed. The
> code is clean, it is well thought out and the new approach with
> sched_class and sched_entity makes it very modular. However, digging
> deeper, I find myself turning more and more desperate, looking over my
> shoulder for a way out.
>
> Now, I'm in no doubt that the code *is* modular, that it *is* clean and
> tidy, but coming from outside, it is not that easy to grasp it all. And,
> it is not just the sheer size and complexity of the scheduler itself,
> but also a lot with how the code is arranged.
>
> For instance, functions like free_fair_sched_group,
> alloc_fair_sched_group etc - does IMHO belong in sched_fair.c and not in
> sched.c. The same goes for several rt-functions and structs.
>
> So, if one drew up a list over all events that would cause functions in
> sched.c to be called, this could be used to make a minimized 'interface'
> and then let the scheduler call the appropriate function for the given
> class in the same fashion sched_tick is used today.

I'd start out small by moving the functions to the right file. After
that you could look at providing methods in the sched_class.

> What I would like, is to rip out all the *actual* scheduling logic and
> put this in sched_[fair|rt].c and let sched be purely event-driven
> (which would be a nice design goal in itself). This would also lead to
> sched_[fair|rt].h, where the structs, macros, defines etc can be
> defined. Today these are defined in just about everywhere, making the
> code unnecessary complicated (I'm not going to say messy since I'm not
> *that* senior to kernel coding :-))

You might need to be careful there, or introduce sched_(fair|rt).h for
those.

> Why not use the sched_class for all that it's worth - make the different
> classes implement a set of functions and let sched.c be oblivious to
> what's going on other than turning the machinery around?

Sounds good, its been on the agenda for a while, but nobody ever got
around to it.

Other cleanups that can be done are:
- get rid of all the load balance iterator stuff and move
that all into sched_fair

- extract the common sched_entity members and create:

struct {
struct sched_entity_common common;
union {
struct sched_entity fair;
struct sched_rt_entity rt;
}
}

> Is this something worth pursuing? I mean, the scheduler *does* work, and
> if it ain't broken, don't fix it. However, I have a strong feeling that
> this can be done a lot cleaner, not to mention, changing from one type
> of scheduler to another will be much easier. :-)

Well, adding a sched_class, no need to replace anything besides that.

2008-10-28 19:42:09

On Gio, 30 Ottobre 2008 10:44 pm, Henrik Austad wrote:
>> As to why I'm
>> looking at EDF, I think the answer to that is a bit too long (and
>> not
>> appropriate for this email anyway) so I'll leave that part out.
>> >
>> > Well, I understand that, but it could be interesting... At least to
>> > me.
>
> ok, simply put:
> * give each task a relative deadline (will probably introduce a new
> syscall,
> please don't shoot me).
> * when the task enters TASK_RUNNING, set abosolute deadline to time_now +
> rel_deadline.
> * insert task in rq, sorted by abs_deadline
> * pick leftmost task and run it
> * when task is done, pick next task
>
> that's it.
>
Ok, that is how EDF work, and I know it... I was asking something
different... But nevermind! :-D

>> > > The most interesting part of EDF is not the
>> > > actual scheduler itself (although there are fun issues with that as
>> > > well), but extending the Priority Inheritance framework to deal with
>> > > all the fun cases that come with EDF.
>
> Well, I find EDF intersting because it is so blissfully simple. :-)
>
I agree, EDF is very simple and has a lot of very nice properties...
Problem is do decide how to assign a deadline to a task, if it is not a
classical soft or hard real-time one! :-O

But you're not talking about things like that, aren't you?

> Yes, well, EDF is not optimal for SMP systems, only for single core.
> However,
> you can do a pretty good attempt by assigning tasks to cores in a greedy
> fashion (simply put the next task at the CPU with the lowest load).
>
I definitely agree that hard real time workloads are better handled by
partitioned EDF, but for soft ones, it was sad to suffer from the possible
CPU utilization loss it entails.

Also, what about resources shared by different tasks in different
CPU/partitions? And if you avoid sharing resources between tasks in
different partitions (is that acceptable?), what about system resources,
that are shared by _all_ tasks in the system by definition?

Sorry about asking so much questions, but these are the issues we are
trying to address, and I'm quite interested in knowing if you have any
idea about them. :-)

> No. You should have *either* FIFO/RR *or* EDF, not both at the same time.
>
Oh... Why?

> If
> you absolutely require both, you should at least separate them on a
> per-core
> basis. If you mix them, they need to be aware of the other in order to
> make
> the right descision, and that is not good.
>
Well, obvioulsy it's something that we have to think carefully, but I
can't see any harmful situation in having a deadline based and a fixed
priority based scheduling from where to pick task in (sorry) priority
order.

> Well.. why not just treat *all* RT-tasks as *either* FIFO/RR or EDF?
> Having
> fifo and edf together will complicate things. And, people looking for edf,
> will not use fifo/rr anyway (famous last words).
>
Ok, maybe it's a matter of personal feelings, but I think that such a
design, even more complicated, could be very nice and useful.

>> Which leaves us with the big issue of priority inversion ;-)
>
> Couldn't above idea solve a bit of this? I have some papers on deadline
> inheritance laying aorund somewhere, I can have a look at that, I think it
> was a fairly elegant solution to some of these issues there.
>
Well, I personally think that partitioning _raises_ issues about resource
sharing instead of lightening them... In an OS like Linux is, at least...
:-O

Regards,
Dario Faggioli

PS. Sorry for the webmail... I'm abroad and I've not my laptop with me :-(

2008-10-31 18:17:48

by Dario Faggioli

[permalink] [raw]

Subject: Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

On Gio, 30 Ottobre 2008 6:17 pm, Peter Zijlstra wrote:
> Right, ideally I'd like to see 2 EDF classes on top of FIFO, so that we
> end up with the following classes
>
> hedf - hard EDF
> sedf - soft EDF (bounded tardiness)
> fifo/rr - the current static priority scheduler
> fair - the current proportional fair scheduler
> idle - the idle scheduler
>
Oh, so two classes? Well, yes, could be nice.

> The two edf classes must share some state, so that the sedf class knows
> about the utilisation consumed by hedf, and the main difference between
> these two classes is the schedulability test.
>
Yep. Actually I think that schedulability test could be an issue as well,
especially if we like group/hierarchical approach, since the known
hierarchical admission tests are quite complex to implement and, probably,
time consuming if performed on-line in an highly dynamic (with respec to
to task arrival and leaving) system.

> The few problems this gives are things like kstopmachine and the
> migration threads, which should run at the max priority available on the
> system.
>
Yeah, that's exactly what we was thinking too.

Actually, I was also thinking that having fixed priority scheduling
_before_ EDF could be of some benefit if you have to be sure that a task
is going to be executed at a very precise instant in time, but have not a
precise about that yet.

> Perhaps we can introduce another class on top of hedf which will run
> just these two tasks and is not exposed to userspace (yes, I understand
> it will ruin just about any schedulability analysis).
Well, could be a solution... And if this is only used for such kind of
special tasks, maybe it is not impossible to bound or account for their
scheduling contribution... But I'm just shooting inthe dark here, sorry
about that! :-P

> We can do deadline inheritance and bandwidth inheritance by changing
> plist to a rb-tree/binary heap and mapping the static priority levels
> somewhere at the back and also propagating the actual task responsible
> for the boost down the chain (so as to be able to do bandwidth
> inheritance).
>
> From what I gather the sssup folks are doing that, although they
> reported that DI between disjoint schedule domains (partitions) posed an
> interesting problem.
>
Yes, that's right, this is what we are investigating and trying to do in
these days (... Or weeks... Or months!).

> Personally I'd like to see the full priority inversion issue solved by
> something like the proxy execution protocol, however the SMP extension
> thereof seems to be a tad expensive - found a book on graph theory, all
> that remains is finding time to read it :-)
>
Wow... So, good luck for that! :-)

Maybe it's my fault, but I see some issues with proxy execution and
similar protocols.
That is, if you have, let's say, task A blocked on task B, blocked on task
C, and you are using proxy execution, that means that you have not
dequeued A and B when they blocked, but that you, for example, filled a
pointer that reminds you, when you schedule them, that you have to
actually run C, am I right?

Now, what happens if C blocks on a non-rt mutex lock, or if it simply go
to sleep? Is that acceptable to track the blocking chain in order to
actually dequeue also A and B, and to requeue them again when C will wake
up as well?

Forgive if that's a stupid point... :-(

> The advantage of proxy execution is that its fully invariant to the
> schedule function and thus even works for proportional fair schedulers
> and any kind of scheduler hierarchy.
>
Yes, I agree and I like it very much too. If you go for it, you could also
add bandwidth inheritance (e.g., for group scheduling) and things like
that almost for free (if wanted! :-))

Regards,
Dario Faggioli