2003-11-23 11:58:01

by Nick Piggin

Subject: [RFC] generalise scheduling classes

Hi everyone,
We still don't have an HT aware scheduler, which is unfortunate because
weird stuff like that looks like it will only become more common in future.

I made a patch on top of my recent NUMA/SMP scheduling stuff to implement
generalised scheduling classes. With this modification we can allow
architectures to control scheduling policy in a much finer way.
Hyperthreading should be no problem, hierarchical (NUMA) nodes should
be doable as well.

I'm not exactly sure how architecture specific code is supposed to be
handled, I'll have to have a look at some examples. Basically architectures
build up their own scheduling "classes".

I have supplied a default function to build up the classes if none is
supplied. It builds them so functionality should be similar to the
previous standard local / remote behaviour.

Haven't done much testing yet, just asking for comments. Will these
classes be sufficient for everyone?

Class is struct sched_class in include/linux/sched.h
Default classes are built by arch_init_sched_classes in kernel/sched.c
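
As a rough illustration of the default local / remote setup described above, here is a minimal userspace sketch of a two-level hierarchy. It is not taken from the patch: the names (struct toy_domain, toy_init_domains), the CPU and node counts, and the interval values are all hypothetical, and the real struct sched_class and arch_init_sched_classes may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_CPUS       8   /* assumed machine size, for illustration */
#define TOY_CPUS_PER_NODE 4
#define TOY_NR_NODES      (TOY_NR_CPUS / TOY_CPUS_PER_NODE)

/* Hypothetical stand-in for one level of the scheduling hierarchy. */
struct toy_domain {
    uint32_t span;              /* bitmask of CPUs covered by this level */
    unsigned balance_interval;  /* ticks between balance attempts (made up) */
    struct toy_domain *parent;  /* next wider level, NULL at the top */
};

static struct toy_domain toy_node_domain[TOY_NR_NODES];
static struct toy_domain toy_root_domain;
static struct toy_domain *toy_cpu_domain[TOY_NR_CPUS];

/* Build the default layout: one "local" domain per node under one "remote" root. */
static void toy_init_domains(void)
{
    toy_root_domain.span = (1u << TOY_NR_CPUS) - 1;
    toy_root_domain.balance_interval = 100;   /* balance across nodes rarely */
    toy_root_domain.parent = NULL;

    for (int node = 0; node < TOY_NR_NODES; node++) {
        struct toy_domain *d = &toy_node_domain[node];

        d->span = ((1u << TOY_CPUS_PER_NODE) - 1) << (node * TOY_CPUS_PER_NODE);
        d->balance_interval = 10;             /* balance within a node often */
        d->parent = &toy_root_domain;
    }

    /* Each CPU starts at the narrowest domain that contains it. */
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        toy_cpu_domain[cpu] = &toy_node_domain[cpu / TOY_CPUS_PER_NODE];
}

/* Number of levels a balancer would walk when starting from this CPU. */
static int toy_domain_depth(int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);

    int depth = 0;
    for (struct toy_domain *d = toy_cpu_domain[cpu]; d != NULL; d = d->parent)
        depth++;
    return depth;
}
```

An architecture that wanted an extra level (HT siblings, say) would, under these assumptions, insert another toy_domain between each CPU and its node domain rather than change the walking code.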

http://www.kerneltrap.org/~npiggin/w23/
The patch in question is this one
http://www.kerneltrap.org/~npiggin/w23/broken-out/sched-domain.patch

Best regards,
Nick



2003-11-23 12:02:10

by Ingo Molnar

Subject: Re: [RFC] generalise scheduling classes


On Sun, 23 Nov 2003, Nick Piggin wrote:

> We still don't have an HT aware scheduler, [...]

uhm, have you seen my HT scheduler patches, in particular the HT scheduler
in Fedora Core 1, which is on top of a pretty recent 2.6 scheduler? Works
pretty well.

Ingo

2003-11-23 12:18:45

by Nick Piggin

Subject: Re: [RFC] generalise scheduling classes



Ingo Molnar wrote:

>On Sun, 23 Nov 2003, Nick Piggin wrote:
>
>
>>We still don't have an HT aware scheduler, [...]
>>
>
>uhm, have you seen my HT scheduler patches, in particular the HT scheduler
>in Fedora Core 1, which is on top of a pretty recent 2.6 scheduler? Works
>pretty well.
>

No, I have seen it. Sorry, I know you have done so and it looks good. I
wouldn't be averse to it being included, although Linus seems to be.
The changes I have made nearly give you it for free anyway.

I just meant that there is not one in Linus' tree yet.

Nick


2003-11-23 12:23:06

by Ingo Molnar

Subject: Re: [RFC] generalise scheduling classes


On Sun, 23 Nov 2003, Nick Piggin wrote:

> I just meant that there is not one in Linus' tree yet.

yes, because when i wrote it we were already in a feature freeze, and the
changes are intrusive. And being the scheduler maintainer i'm supposed to
show a certain level of self restraint :-)

Ingo

2003-11-23 16:27:28

by Martin J. Bligh

Subject: Re: [RFC] generalise scheduling classes

>> We still don't have an HT aware scheduler, [...]
>
> uhm, have you seen my HT scheduler patches, in particular the HT scheduler
> in Fedora Core 1, which is on top of a pretty recent 2.6 scheduler? Works
> pretty well.

Do you have a pointer to an updated patch? I haven't seen a version of
that for a while, and would like to play with it.

Thanks,

M.

2003-11-23 21:38:29

by William Lee Irwin III

Subject: Re: [RFC] generalise scheduling classes

On Sun, Nov 23, 2003 at 10:57:54PM +1100, Nick Piggin wrote:
> Class is struct sched_class in include/linux/sched.h
> Default classes are built by arch_init_sched_classes in kernel/sched.c
> http://www.kerneltrap.org/~npiggin/w23/
> The patch in question is this one
> http://www.kerneltrap.org/~npiggin/w23/broken-out/sched-domain.patch

There's a small terminological oddity in that "class" is usually meant
to describe policies governing a task, and "domain" system partitions
like the bits in your patch (I don't recall if they're meant to be
logical or physical). e.g. usage elsewhere would say that there is an
"interactive class", a "timesharing class", a "realtime class", and so
on. Apart from that (and I suppose it's a minor concern), this appears
relatively innocuous.


-- wli

2003-11-24 01:10:14

by Anton Blanchard

Subject: Re: [RFC] generalise scheduling classes


> We still don't have an HT aware scheduler, which is unfortunate because
> weird stuff like that looks like it will only become more common in
> future.

Yep. Look at POWER5, 2 cores on a die sharing a l2 cache and 2 threads
on each core. On top of that you have the higher level NUMA
characteristics of the machine. So we need SMT as well as (potentially)
2 levels of NUMA. The overhead of enabling multi levels of NUMA may
outweigh the gains, we need to do some analysis.

Looks like a lot of the other architectures are going multi core multi
thread...

(HT is an intel trademark for what boils down to being SMT)

Anton

2003-11-24 02:19:58

by Nick Piggin

Subject: Re: [RFC] generalise scheduling classes



William Lee Irwin III wrote:

>On Sun, Nov 23, 2003 at 10:57:54PM +1100, Nick Piggin wrote:
>
>>Class is struct sched_class in include/linux/sched.h
>>Default classes are built by arch_init_sched_classes in kernel/sched.c
>>http://www.kerneltrap.org/~npiggin/w23/
>>The patch in question is this one
>>http://www.kerneltrap.org/~npiggin/w23/broken-out/sched-domain.patch
>>
>
>There's a small terminological oddity in that "class" is usually meant
>to describe policies governing a task, and "domain" system partitions
>like the bits in your patch (I don't recall if they're meant to be
>logical or physical). e.g. usage elsewhere would say that there is an
>"interactive class", a "timesharing class", a "realtime class", and so
>on. Apart from that (and I suppose it's a minor concern), this appears
>relatively innocuous.
>

Yeah as you see from the name of the patch as well I got a bit muddled.
I think I'd better change it to sched_domain. Good point.


2003-11-24 02:26:45

by Nick Piggin

Subject: Re: [RFC] generalise scheduling classes



Anton Blanchard wrote:

>>We still don't have an HT aware scheduler, which is unfortunate because
>>weird stuff like that looks like it will only become more common in
>>future.
>>
>
>Yep. Look at POWER5, 2 cores on a die sharing a l2 cache and 2 threads
>on each core. On top of that you have the higher level NUMA
>characteristics of the machine. So we need SMT as well as (potentially)
>2 levels of NUMA. The overhead of enabling multi levels of NUMA may
>outweigh the gains, we need to do some analysis.
>

Technically the scheduler knows nothing about NUMA. Previously it had
local and remote domains corresponding to intra- and inter-node CPU sets.
All it did was to do remote balancing a little more gently. But we'll call
it NUMA scheduling.

What you want for POWER5 is very aggressive sharing at the SMT level and
possibly even the chip level if they share l2. Less aggressive for node
local and then even less for remote.
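
One way to picture "more aggressive at the narrow levels, gentler at the wide ones" is a per-level policy table. The sketch below is only an assumption about how such tuning could be expressed: the level names, the interval and threshold numbers, and lvl_should_balance are invented for illustration and are not measured values or code from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical levels for a POWER5-like machine, narrowest first. */
enum lvl_level { LVL_SMT, LVL_CHIP, LVL_NODE, LVL_REMOTE, LVL_NR_LEVELS };

struct lvl_policy {
    unsigned interval_ticks;  /* how often balancing is attempted */
    unsigned imbalance_pct;   /* busiest/local load ratio (%) needed to migrate */
};

/* Made-up numbers: frequent, low-threshold balancing between SMT siblings,
 * rare, high-threshold balancing between remote nodes. */
static const struct lvl_policy lvl_policy_table[LVL_NR_LEVELS] = {
    [LVL_SMT]    = { 1,  110 },
    [LVL_CHIP]   = { 4,  125 },
    [LVL_NODE]   = { 16, 150 },
    [LVL_REMOTE] = { 64, 200 },
};

/* Decide whether to pull work at this level on this tick. */
static bool lvl_should_balance(enum lvl_level level, unsigned long tick,
                               unsigned local_load, unsigned busiest_load)
{
    assert(level < LVL_NR_LEVELS);

    const struct lvl_policy *p = &lvl_policy_table[level];

    if (tick % p->interval_ticks != 0)
        return false;   /* not this level's turn yet */

    /* Migrate only if the busiest group exceeds us by the level's margin. */
    return busiest_load * 100u > local_load * p->imbalance_pct;
}
```

With these assumed numbers a load of 5 against 4 is enough to trigger a move between SMT siblings, while the same imbalance across remote nodes is ignored.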

SGI I think have differing distances between NUMA nodes and they expressed
possible interest in a multi level system.

I can't give you good benchmark numbers because I only have the NUMAQ at
OSDL to test on - it's only got 2 levels anyway. I should think that
overheads are quite minor considering it is in slow paths (balancing).


2003-11-24 02:39:08

by Davide Libenzi

Subject: Re: [RFC] generalise scheduling classes

[Cc list trimmed. There was the whole world]

On Mon, 24 Nov 2003, Nick Piggin wrote:

> Technically the scheduler knows nothing about NUMA. Previously it had
> local and remote domains corresponding to intra- and inter-node CPU sets.
> All it did was to do remote balancing a little more gently. But we'll call
> it NUMA scheduling.

One patch I did ages ago was using an NxN topology matrix storing distances
(read move weights) from each CPU: mat[i][j] == distance/weight i <-> j
At that time the matrix was hard-wired since there was no topology API. Maybe
now it can be built a little bit more wisely using HT and NUMA topology info.
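
A minimal sketch of the matrix idea, assuming a 4-CPU box with two HT sibling pairs on separate nodes. The distances, the mat_ names, and the selection rule (pull only when the load gap exceeds the move weight, preferring the closest source) are illustrative assumptions, not the original patch.

```c
#include <assert.h>

#define MAT_NR_CPUS 4   /* assumed machine size, for illustration */

/* mat[i][j] == distance/weight of moving a task between CPU i and CPU j.
 * Assumed layout: CPUs 0-1 and 2-3 are HT siblings (weight 1);
 * the two pairs sit on different nodes (weight 10). */
static const unsigned mat_dist[MAT_NR_CPUS][MAT_NR_CPUS] = {
    {  0,  1, 10, 10 },
    {  1,  0, 10, 10 },
    { 10, 10,  0,  1 },
    { 10, 10,  1,  0 },
};

/* Pick a CPU to pull work from for 'dst', or -1 if no move is worth it.
 * A source qualifies only if its load exceeds ours by more than the move
 * weight; among qualifying sources the closest wins, ties go to the busier. */
static int mat_pick_source(int dst, const unsigned load[MAT_NR_CPUS])
{
    assert(dst >= 0 && dst < MAT_NR_CPUS);

    int best = -1;
    for (int src = 0; src < MAT_NR_CPUS; src++) {
        if (src == dst || load[src] <= load[dst] + mat_dist[dst][src])
            continue;   /* imbalance too small to justify the move cost */

        if (best < 0 ||
            mat_dist[dst][src] < mat_dist[dst][best] ||
            (mat_dist[dst][src] == mat_dist[dst][best] &&
             load[src] > load[best]))
            best = src;
    }
    return best;
}
```

A flat matrix like this handles irregular distances directly, at the cost of an O(N) scan per decision and O(N^2) storage.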



- Davide


2003-11-24 23:01:59

by Bill Davidsen

Subject: Re: [RFC] generalise scheduling classes

In article <[email protected]>,
Nick Piggin <[email protected]> wrote:

| We still don't have an HT aware scheduler, which is unfortunate because
| weird stuff like that looks like it will only become more common in future.

The idea is hardly new, in the late 60's GE (still a mainframe vendor at
that time) was looking at two execution units on a single memory path.
They decided it would have problems with memory bandwidth, what else is
new?


--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2003-11-25 01:46:53

by Nick Piggin

Subject: Re: [RFC] generalise scheduling classes



bill davidsen wrote:

>In article <[email protected]>,
>Nick Piggin <[email protected]> wrote:
>
>| We still don't have an HT aware scheduler, which is unfortunate because
>| weird stuff like that looks like it will only become more common in future.
>
>The idea is hardly new, in the late 60's GE (still a mainframe vendor at
>that time) was looking at two execution units on a single memory path.
>They decided it would have problems with memory bandwidth, what else is
>new?
>

I don't think I said new, but I guess they (SMT, NUMA, CMP) are newish
for architectures supported by the Linux kernel. OK NUMA has been around for
a while, but the scheduler apparently doesn't work so well for atypical
new NUMAs like Opteron.


2003-11-25 16:34:25

by Bill Davidsen

Subject: Re: [RFC] generalise scheduling classes

On Tue, 25 Nov 2003, Nick Piggin wrote:

>
>
> bill davidsen wrote:
>
> >In article <[email protected]>,
> >Nick Piggin <[email protected]> wrote:
> >
> >| We still don't have an HT aware scheduler, which is unfortunate because
> >| weird stuff like that looks like it will only become more common in future.
> >
> >The idea is hardly new, in the late 60's GE (still a mainframe vendor at
> >that time) was looking at two execution units on a single memory path.
> >They decided it would have problems with memory bandwidth, what else is
> >new?
> >
>
> I don't think I said new, but I guess they (SMT, NUMA, CMP) are newish
> for architectures supported by Linux Kernel. OK NUMA has been around for
> a while, but the scheduler apparently doesn't work so well for atypical
> new NUMAs like Opteron.

You didn't say new, I wasn't correcting you, just thought that the
historical perspective might be interesting. I would love to try the new
scheduler, but my test computer is not pleased with Fedora.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.