2003-01-16 18:26:00

by Martin J. Bligh

Subject: [PATCH] (0/3) NUMA aware scheduler

Following is a sequence of patches to add NUMA awareness to the scheduler.
These have been submitted to you several times before, but in my opinion
were structured in such a way as to make them too invasive to non-NUMA machines.
I proposed a new scheme of working in "concentric circles", which this set
follows (Erich did most of the hard work of restructuring), and is now
completely non-invasive to non-NUMA systems. It has no effect whatsoever
on standard machines. This can be seen by code inspection, and has been
checked by benchmarking.
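A minimal sketch of how a "concentric circles" layout can stay invisible to non-NUMA builds, assuming the inner circle (CPUs within a node) is balanced on every rebalance tick and the outer circle (across nodes) only on every Nth tick. NODE_BALANCE_RATIO, do_cross_node_balance() and balance_scope() are hypothetical names chosen for illustration, not code from these patches.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical "concentric circles" sketch: inner circle = CPUs within one
 * node, balanced every tick; outer circle = across nodes, balanced only on
 * every NODE_BALANCE_RATIO-th tick. The ratio is an assumed value.
 */
#define NODE_BALANCE_RATIO 10

#ifdef CONFIG_NUMA
static bool do_cross_node_balance(unsigned long tick)
{
	/* Outer circle: reach across nodes only occasionally. */
	return tick % NODE_BALANCE_RATIO == 0;
}
#else
static bool do_cross_node_balance(unsigned long tick)
{
	/* Non-NUMA build: there is no outer circle, so this is constant
	 * false and the compiler can discard the cross-node path. */
	(void)tick;
	return false;
}
#endif

/* Circles to balance on this tick: 1 = local only, 2 = local + cross-node. */
static int balance_scope(unsigned long tick)
{
	int scope = do_cross_node_balance(tick) ? 2 : 1;

	assert(scope == 1 || scope == 2);
	return scope;
}
```

Built without CONFIG_NUMA, balance_scope() always returns 1, which is the sense in which a design like this leaves standard machines untouched. Built with -DCONFIG_NUMA, balance_scope(10) returns 2 and balance_scope(3) returns 1.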

These patches are the culmination of work by Erich Focht, Michael Hohnbaum
and myself. We've also incorporated feedback from Christoph and Robert Love.
I believe these are now ready for mainline acceptance. I've tested them on
NUMA-Q, standard SMP and UP. Erich has run them on the NEC ia64 NUMA machine.

Benchmarks on a 16-way NUMA-Q machine with 16GB of RAM

Kernbench: (average of 5 kernel compiles)
                   Elapsed    User       System     CPU
2.5.58             20.012s    191.81s    48.37s     1200.6%
2.5.58-numasched   19.57s     187.264s   42.186s    1171.8%

NUMA schedbench 64: (64 processes running memory allocation fairly heavily)
                   Elapsed    TotalUser   TotalSys
2.5.48             608.81     9418.37     26.74
2.5.58-numasched   230.49     3613.47     15.57


2003-01-16 19:30:47

by Martin J. Bligh

Subject: Re: [PATCH] (0/3) NUMA aware scheduler

> These patches are the culmination of work by Erich Focht, Michael Hohnbaum
> and myself. We've also incorporated feedback from Christoph and Robert Love.

There are also some bits of code embedded in here from Christoph.

M.

2003-01-16 19:43:05

by Andrew Theurer

Subject: Re: [PATCH] (0/3) NUMA aware scheduler

> Following is a sequence of patches to add NUMA awareness to the scheduler.
> These have been submitted to you several times before, but in my opinion
> were structured in such a way as to make them too invasive to non-NUMA machines.
> I proposed a new scheme of working in "concentric circles", which this set
> follows (Erich did most of the hard work of restructuring), and is now
> completely non-invasive to non-NUMA systems. It has no effect whatsoever
> on standard machines. This can be seen by code inspection, and has been
> checked by benchmarking.

FYI, I have used a topology to map HT-aware processors (in this case P4) to
a NUMA topology while using this scheduler. This was done to help address
the same problems that Ingo's shared runqueue implementation fixed. The
topology is quite simple. Sibling logical procs are members of a node.
Number of nodes = number of physical procs.
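A minimal sketch of the topology described above, assuming sibling logical processors are numbered contiguously (CPUs 0 and 1 on package 0, 2 and 3 on package 1, and so on). Real firmware does not guarantee that numbering, and ht_cpu_to_node() and ht_nr_nodes() are hypothetical helpers, not the actual topology patch.

```c
#include <assert.h>

/*
 * Hypothetical HT-as-NUMA mapping: each physical package becomes one
 * "node" whose CPUs are its sibling logical processors.
 * ASSUMPTION: siblings are numbered contiguously.
 */
static int ht_cpu_to_node(int cpu, int threads_per_core)
{
	assert(threads_per_core > 0 && cpu >= 0);
	return cpu / threads_per_core;
}

/* Number of nodes equals the number of physical processors. */
static int ht_nr_nodes(int nr_logical_cpus, int threads_per_core)
{
	assert(threads_per_core > 0);
	return nr_logical_cpus / threads_per_core;
}
```

On the 8-logical-processor P4 box mentioned here (2 threads per core), this yields 4 nodes, with logical CPUs 4 and 5 both mapping to node 2.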

This primarily avoids sharing cpu cores (and thus avoids resource contention)
under light load. In my case, with 4 tasks on an 8-logical-proc system, we want
to load balance the tasks across nodes/cores for better performance. For my
test, I did a make -j4 on a 2.4.18 kernel. Results are:
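A minimal sketch of node-first placement for the 4-tasks-on-8-logical-procs case, assuming the contiguous sibling numbering above and plain per-CPU run-queue lengths as the load metric. pick_cpu() and place_tasks() are hypothetical illustrations of the policy, not the patch's sched_best_cpu().

```c
#include <assert.h>
#include <limits.h>

#define NR_CPUS 8
#define THREADS_PER_CORE 2
#define NR_NODES (NR_CPUS / THREADS_PER_CORE)

/* Pick the least-loaded node first, then the least-loaded CPU within it.
 * Ties go to the lowest index. ASSUMPTION: siblings numbered contiguously. */
static int pick_cpu(const int nr_running[NR_CPUS])
{
	int best_node = 0, best_node_load = INT_MAX;

	for (int node = 0; node < NR_NODES; node++) {
		int load = 0;

		for (int t = 0; t < THREADS_PER_CORE; t++)
			load += nr_running[node * THREADS_PER_CORE + t];
		if (load < best_node_load) {
			best_node_load = load;
			best_node = node;
		}
	}

	int best_cpu = best_node * THREADS_PER_CORE, best_load = INT_MAX;

	for (int t = 0; t < THREADS_PER_CORE; t++) {
		int cpu = best_node * THREADS_PER_CORE + t;

		if (nr_running[cpu] < best_load) {
			best_load = nr_running[cpu];
			best_cpu = cpu;
		}
	}
	return best_cpu;
}

/* Place nr_tasks one at a time, recording the chosen CPU of each in out[]. */
static void place_tasks(int nr_tasks, int nr_running[NR_CPUS], int out[])
{
	assert(nr_tasks >= 0);
	for (int i = 0; i < nr_tasks; i++) {
		int cpu = pick_cpu(nr_running);

		nr_running[cpu]++;
		out[i] = cpu;
	}
}
```

Starting from idle run queues, the four tasks land on CPUs 0, 2, 4 and 6, one per physical core. A flat "least-loaded CPU" picker with the same tie-breaking would choose 0, 1, 2 and 3 and put two pairs of tasks on shared cores.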

stock sched, no numa:  56.523 elapsed, 202.899 user, 18.266 sys, 390.6%
numa sched, ht topo:   53.088 elapsed, 189.424 user, 18.36 sys,  391%

~6.5% better. These results are the average of 10 kernel compiles.
* I did make one minor change to sched_best_cpu(). The first test case was
eliminated, and that change is currently under discussion.
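The "~6.5% better" figure depends on which ratio is used. The sketch below, with hypothetical helper names, computes both common definitions from the elapsed times quoted above; which one the original figure used is inferred from the arithmetic, not stated in the mail.

```c
#include <assert.h>

/* How much faster the new run is, in percent: old/new - 1. */
static double speedup_pct(double old_s, double new_s)
{
	assert(new_s > 0.0);
	return (old_s / new_s - 1.0) * 100.0;
}

/* How much of the old elapsed time was saved, in percent: (old - new) / old. */
static double time_saved_pct(double old_s, double new_s)
{
	assert(old_s > 0.0);
	return (old_s - new_s) / old_s * 100.0;
}
```

speedup_pct(56.523, 53.088) comes to about 6.47, which matches the ~6.5% quoted, while time_saved_pct(56.523, 53.088) gives about 6.08, i.e. roughly 6.1% less elapsed time.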

I did this mainly to demonstrate that a numa scheduler's policies may be
able to help HT systems, and to capture wider interest in the numa scheduler.
By no means is P4 HT required to use this. This is simply a numa topology
implementation. I would like some feedback on any interest in this.

One of the reasons we probably have not had much interest in numa patches is
that numa systems are not that prevalent. However, numa-like qualities are
showing up in commonly available systems, and I believe we can take
advantage of the policies that patches such as the numa scheduler provide.
Does anyone have any other ideas where numa-like qualities lie? x86-64?

-Andrew Theurer

P.S. I am working on a topology patch to send out. It's quite hackish right
now.


2003-01-16 20:19:50

by Linus Torvalds

Subject: Re: [PATCH] (0/3) NUMA aware scheduler


Applied.

I also have to say that I hope this means that the HT-specific scheduler
stuff will go away. HT _should_ be just another NUMA issue, and right now
the two seem to be just slightly different ways of covering the same
needs.

However, I'm going away for two weeks starting tomorrow, so even if there
is some experimental HT/NUMA patch, I don't want it at this point. The
NUMA scheduler merge is more of a "get the infrastructure in place" thing
for me right now.

Linus

2003-01-16 22:34:32

by Martin J. Bligh

Subject: Re: [PATCH] (0/3) NUMA aware scheduler

> Applied.

Thank you!

> I also have to say that I hope this means that the HT-specific scheduler
> stuff will go away. HT _should_ be just another NUMA issue, and right now
> the two seem to be just slightly different ways of covering the same
> needs.

Yup, Andrew Theurer from our performance team has been working on this.
Initial results look encouraging.

> However, I'm going away for two weeks starting tomorrow, so even if there
> is some experimental HT/NUMA patch, I don't want it at this point. The
> NUMA scheduler merge is more of a "get the infrastructure in place" thing
> for me right now.

Absolutely. Hopefully by the time you return we'll have a structure for
hyperthreading in place that's reasonably tuned ;-)

There's some more tuning and tweaking we could do to the NUMA machines as
well (I'm looking at how to implement Ingo's feedback), but I'm convinced
the infrastructure is correct.

Thanks,

M.

2003-01-20 19:38:41

by Pavel Machek

Subject: Re: [PATCH] (0/3) NUMA aware scheduler

Hi!

> One of the reasons we probably have not had much interest in numa patches is
> that numa systems are not that prevalent. However, numa-like qualities are
> showing up in commonly available systems, and I believe we can take
> advantage of the policies that patches such as the numa scheduler provide.
> Does anyone have any other ideas where numa-like qualities lie? x86-64?

Yep, x86-64 SMP systems are in fact NUMA systems that don't penalize
remote memory *that* badly.
Pavel
--
Worst form of spam? Adding advertisement signatures à la sourceforge.net.
What goes next? Inserting advertisements *into* email?