2002-10-09 23:56:50

by Michael Hohnbaum

Subject: [RFC] Simple NUMA Scheduler - rev 2

Attached is an updated version of my simple NUMA scheduler patch
which applies against 2.5.41. This patch does two things:

* When balancing loads between runqueues, it favors balancing
between runqueues on the same node.
* An additional balance action happens at exec time to help
minimize balancing later. This results in less movement
of tasks between nodes.
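
To illustrate the first point, here is a minimal sketch of node-preferring balancing. It is not taken from the patch: the fixed 2-node topology, the cpu_to_node_sketch() helper, the find_busiest_cpu() name, and the "more than one task longer" threshold are all assumptions made for the example.

```c
#include <assert.h>

#define NR_CPUS        8
#define CPUS_PER_NODE  4   /* assumed topology: 2 nodes x 4 CPUs */

/* Hypothetical helper: map a CPU number to its node. */
static int cpu_to_node_sketch(int cpu)
{
    return cpu / CPUS_PER_NODE;
}

/*
 * Pick the runqueue to pull tasks from on behalf of this_cpu.
 * A runqueue is a candidate only if it is more than one task longer
 * than ours (assumed threshold). The longest candidate on our own
 * node wins; a remote runqueue is used only if no local candidate
 * exists. Returns -1 when nothing is worth balancing.
 */
static int find_busiest_cpu(const int nr_running[NR_CPUS], int this_cpu)
{
    int this_node, best_local = -1, best_remote = -1;

    assert(this_cpu >= 0 && this_cpu < NR_CPUS);
    this_node = cpu_to_node_sketch(this_cpu);

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpu == this_cpu)
            continue;
        if (nr_running[cpu] <= nr_running[this_cpu] + 1)
            continue;   /* imbalance too small to bother */

        if (cpu_to_node_sketch(cpu) == this_node) {
            if (best_local < 0 || nr_running[cpu] > nr_running[best_local])
                best_local = cpu;
        } else {
            if (best_remote < 0 || nr_running[cpu] > nr_running[best_remote])
                best_remote = cpu;
        }
    }
    return best_local >= 0 ? best_local : best_remote;
}
```

With runqueue lengths {1, 4, 1, 1, 9, 1, 1, 1}, CPU 0 pulls from CPU 1 on its own node even though CPU 4 on the other node is busier; it crosses nodes only when no local runqueue qualifies.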

Since the last version, the only significant change has been to
modify the exec time load balancing to keep track of the last cpu
that was chosen by the exec load balance and start looking on the
next node for the shortest runqueue. This results in a more balanced
distribution of tasks across nodes and processors, but on 32-bit
NUMA systems has the disadvantage of no longer favoring node 0. Due
to the bulk of the kernel's memory residing on node 0, the prior
behavior that favored node 0 actually resulted in slightly better
performance for the 32-bit NUMA systems I've been working with.
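
The rotation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch code: the 4-node by 2-CPU topology, the last_exec_cpu variable, and the pick_exec_cpu() name are hypothetical, and locking is ignored.

```c
#include <assert.h>

#define NR_NODES       4
#define CPUS_PER_NODE  2
#define NR_CPUS        (NR_NODES * CPUS_PER_NODE)

/* CPU chosen by the previous exec-time balance (hypothetical state).
 * Initialised to the last CPU so the first search starts on node 0. */
static int last_exec_cpu = NR_CPUS - 1;

/*
 * Choose a CPU for a task that is calling exec. The search starts on
 * the node after the one picked last time and wraps around all CPUs,
 * keeping the strictly shortest runqueue. On ties the earliest CPU in
 * scan order wins, so equally loaded nodes are used in rotation
 * instead of always falling back to node 0.
 */
static int pick_exec_cpu(const int nr_running[NR_CPUS])
{
    int start_node = (last_exec_cpu / CPUS_PER_NODE + 1) % NR_NODES;
    int start_cpu = start_node * CPUS_PER_NODE;
    int best = start_cpu;

    for (int i = 1; i < NR_CPUS; i++) {
        int cpu = (start_cpu + i) % NR_CPUS;

        if (nr_running[cpu] < nr_running[best])
            best = cpu;
    }

    assert(best >= 0 && best < NR_CPUS);
    last_exec_cpu = best;
    return best;
}
```

On an idle machine, successive calls in this sketch land on CPUs 0, 2, 4, and 6 (one per node) before any node receives a second task, which is the more even spread described above, at the cost of no longer preferring node 0.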

There have been a few other minor tweaks and extensive testing.

This is not nearly as full-featured as Erich Focht's NUMA scheduler,
but it is also less intrusive. Our performance testing shows no effect
on non-NUMA SMP machines, better performance on NUMA machines, and
a more even distribution of load across nodes.

While much more can be done to support NUMA scheduling, this relatively
small patch is a good start and provides significant improvements.

Kernbench numbers:

stock 2.5.41:  312.49user 123.99system 34.86elapsed
numasched:     293.36user 126.47system 33.69elapsed

Also attached are the results of running Erich's numa_test with
parameters of 4, 8, 12, 16, 24, 32, 48, 64, and 96. These show
the load across nodes and the percent of time that each process
spent on each node.

Comments?
--

Michael Hohnbaum 503-578-5486
[email protected] T/L 775-5486


Attachments:
sched41rev1.patch (7.06 kB)
numa_test.41_rev1 (19.47 kB)

2002-10-12 08:03:04

by Erich Focht

Subject: Re: [RFC] Simple NUMA Scheduler - rev 2

Hi Martin,

what is the cache_decay_ticks value for these results? Is it the
default (i.e. 0)?

Erich

On Friday 11 October 2002 23:26, Martin J. Bligh wrote:
> > Comments?
>
> OK, that's the first one that looks better to me across the board in both
> kernbench and Erich's tests - congrats .... do_schedule seems to be a little
> slow, not sure if that's fixable or not. Will test Erich's new stuff this
> evening (I hope).
>
> (lower numbers are better).
>
> Kernbench:
>                            Elapsed      User    System       CPU
> 2.5.41-mm3                 19.946s   192.44s    44.15s     1186%
> 2.5.41-mm3-sched41rev1      20.07s  190.058s   43.434s   1163.2%
>
> Schedbench 4:
>                             Elapsed  TotalUser   TotalSys    AvgUser
> 2.5.41-mm3                    33.80      48.12     135.24       0.74
> 2.5.41-mm3-sched41rev1        22.42      37.08      89.70       0.65
>
> Schedbench 8:
>                             Elapsed  TotalUser   TotalSys    AvgUser
> 2.5.41-mm3                    46.92      81.47     375.44       1.65
> 2.5.41-mm3-sched41rev1        31.09      45.17     248.81       1.59
>
> Schedbench 16:
>                             Elapsed  TotalUser   TotalSys    AvgUser
> 2.5.41-mm3                    64.81      82.92    1037.24       5.55
> 2.5.41-mm3-sched41rev1        51.79      63.72     828.76       4.58
>
> Schedbench 32:
>                             Elapsed  TotalUser   TotalSys    AvgUser
> 2.5.41-mm3                    95.36     223.18    3052.02      12.57
> 2.5.41-mm3-sched41rev1        55.42     123.27    1773.77       7.83
>
> Schedbench 64:
>                             Elapsed  TotalUser   TotalSys    AvgUser
> 2.5.41-mm3                   156.15     638.75    9994.56      27.37
> 2.5.41-mm3-sched41rev1        57.92     256.19    3707.48      16.90
>
> ----------------------