2002-10-14 11:02:03

by Erich Focht

Subject: [PATCH] node affine NUMA scheduler 1/5

I am resending these because, for some unknown reason, they don't seem
to have made it into either the MARC archives or those at
http://www.cs.helsinki.fi

---------- Resent Message ----------

Subject: [PATCH] node affine NUMA scheduler 1/5
Date: Fri, 11 Oct 2002 19:54:30 +0200

Hi,

here comes the complete set of patches for the node affine NUMA
scheduler. It's made of several building blocks and one can make
several flavors of NUMA schedulers out of the patches.

The patches are:

01-numa_sched_core-2.5.39-10.patch :
Provides basic NUMA functionality. It implements CPU pools
with all the mess needed to initialize them. It also has a
node aware find_busiest_queue() which first scans its own
node for more loaded CPUs. If no steal candidate is found on
its own node, it finds the most loaded node and tries to steal
a task from it. Steals from remote nodes are delayed in order
to achieve equal node load. These delays can be extended
to cope with multi-level node hierarchies (that patch is not
included).

02-numa_sched_ilb-2.5.39-10.patch :
This patch provides simple initial load balancing during exec().
It is node aware and will select the least loaded node. Also it
does a round-robin initial node selection to distribute the load
better across the nodes.

03-node_affine-2.5.39-10.patch :
This is the heart of the node affine part of the patch. Tasks
are assigned a homenode during initial load balancing and they
are attracted to the homenode.

04-alloc_on_homenode.patch :
Coupling with the memory allocator: for user tasks allocate memory
from the homenode, no matter on which node the task is scheduled.

05-dynamic_homenode-2.5.39-10.patch :
Dynamic homenode selection. When pages are allocated or freed
they are tracked. The homenode is recalculated dynamically and
set to the node where most of the memory of the task is allocated.
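
To illustrate two of the mechanisms described above (the node aware
queue search of patch 01 and the homenode recalculation of patch 05),
here is a minimal user-space sketch. It is not taken from the patches.
All names (cpu_to_node_sketch, find_busiest_cpu_sketch,
recalc_homenode_sketch), the fixed 2 nodes x 2 CPUs topology and the
plain integer load arrays are assumptions made for illustration; the
real code works on runqueues and also applies the remote steal delays,
which are left out here.

```c
#include <stddef.h>

/* Hypothetical topology for the sketch: 2 nodes with 2 CPUs each. */
#define NR_NODES      2
#define CPUS_PER_NODE 2
#define NR_CPUS       (NR_NODES * CPUS_PER_NODE)

/* Assumed CPU numbering: CPUs of one node are contiguous. */
static int cpu_to_node_sketch(int cpu)
{
    return cpu / CPUS_PER_NODE;
}

/* Busiest CPU on 'node' whose load exceeds min_load, or -1 if none. */
static int busiest_cpu_on_node(const int load[NR_CPUS], int node, int min_load)
{
    int best = -1, best_load = min_load;

    for (int cpu = node * CPUS_PER_NODE; cpu < (node + 1) * CPUS_PER_NODE; cpu++) {
        if (load[cpu] > best_load) {
            best_load = load[cpu];
            best = cpu;
        }
    }
    return best;
}

/* Sum of the loads of all CPUs on 'node'. */
static int node_load(const int load[NR_CPUS], int node)
{
    int sum = 0;

    for (int cpu = node * CPUS_PER_NODE; cpu < (node + 1) * CPUS_PER_NODE; cpu++)
        sum += load[cpu];
    return sum;
}

/*
 * Pick a CPU to steal from: scan the own node first, and only if no
 * more loaded CPU is found there, look at the most loaded remote node.
 * Returns -1 if there is no steal candidate.
 */
int find_busiest_cpu_sketch(const int load[NR_CPUS], int this_cpu)
{
    int this_node = cpu_to_node_sketch(this_cpu);
    int cpu = busiest_cpu_on_node(load, this_node, load[this_cpu]);

    if (cpu >= 0)
        return cpu;

    int busiest_node = -1, max_load = -1;
    for (int node = 0; node < NR_NODES; node++) {
        if (node != this_node && node_load(load, node) > max_load) {
            max_load = node_load(load, node);
            busiest_node = node;
        }
    }
    if (busiest_node < 0)
        return -1;
    return busiest_cpu_on_node(load, busiest_node, load[this_cpu]);
}

/*
 * Homenode = node holding most of the task's pages. Ties keep the
 * current homenode (an assumption of this sketch).
 */
int recalc_homenode_sketch(const long pages[NR_NODES], int current_homenode)
{
    int best = current_homenode;

    for (int node = 0; node < NR_NODES; node++) {
        if (pages[node] > pages[best])
            best = node;
    }
    return best;
}
```

In this sketch a CPU only becomes a steal candidate if its load is
higher than that of the stealing CPU, so an idle remote node never
attracts a steal attempt.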

Meaningful combinations of patches are:

A : numa scheduler (node aware, with initial load balancing) :
01 + 02

B : node affine scheduler :
01 + 02 + 03 (+04)

C : node affine scheduler with dynamic homenode selection :
01 + 02 + 03 + 05 ( !exclude 04 !)

The best results should be provided by C as it incorporates most of
the features.

The patches should run on ia32 NUMAQ and ia64 Azusa (with the topology
patches applied). Other architectures just need a build_node() call
similar to the one in arch/i386/kernel/smpboot.c. The issues with NUMAQ
(uninitialized platform specific stuff) should be solved.

Comments, flames, etc... welcome ;-)

Best regards,
Erich


Attachments:
01-numa_sched_core-2.5.39-10.patch (20.06 kB)