2005-03-16 18:37:27

by Manfred Spraul

Subject: Re: Fw: [PATCH] NUMA Slab Allocator

Hi Christoph,

Do you have profile data from your modification? Which percentage of the
allocations is node-local, which percentage is from foreign nodes?
Preferably per-cache. It shouldn't be difficult to add statistics
counters to your patch.
And: Can you estimate which percentage is really accessed node-local
and which percentage consists of long-living structures that are accessed
from all cpus in the system?
I had discussions with guys from IBM and SGI regarding a NUMA allocator,
and we decided that we need profile data before we can decide if we need
one:
- A node-local allocator reduces the inter-node traffic, because the
callers get node-local memory
- A node-local allocator increases the inter-node traffic, because
objects that are kfree'd on the wrong node must be returned to their
home node.

> static inline void __cache_free (kmem_cache_t *cachep, void* objp)
> {
> struct array_cache *ac = ac_data(cachep);
>+ struct slab *slabp;
>
> check_irq_off();
> objp = cache_free_debugcheck(cachep, objp, __builtin_return_address(0));
>
>- if (likely(ac->avail < ac->limit)) {
>+ /* Make sure we are not freeing a object from another
>+ * node to the array cache on this cpu.
>+ */
>+ slabp = GET_PAGE_SLAB(virt_to_page(objp));
>
>
This lookup is quite slow and should be performed only for NUMA builds,
not for non-NUMA builds. Some kind of wrapper is required.
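Something along these lines would do, as a minimal sketch (the helper name
is invented; GET_PAGE_SLAB(), virt_to_page(), numa_node_id() and
slabp->nodeid are as used by the quoted patch):

/* Sketch only: hide the wrong-node test behind a helper so that
 * non-NUMA builds pay nothing for the page lookup.
 */
#ifdef CONFIG_NUMA
static inline int slab_obj_is_foreign(void *objp)
{
	struct slab *slabp = GET_PAGE_SLAB(virt_to_page(objp));

	return slabp->nodeid != numa_node_id();
}
#else
static inline int slab_obj_is_foreign(void *objp)
{
	return 0;	/* constant, the compiler removes the whole branch */
}
#endif

__cache_free() could then test if (unlikely(slab_obj_is_foreign(objp))),
and UP/SMP builds would keep the current fast path unchanged.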

>+ if(unlikely(slabp->nodeid != numa_node_id())) {
>+ STATS_INC_FREEMISS(cachep);
>+ int nodeid = slabp->nodeid;
>+ spin_lock(&(cachep->nodelists[nodeid])->list_lock);
>
>
This line is very dangerous: every free of a wrong-node object causes a
spin_lock operation. I fear that the cache line traffic for the spinlock
might kill the performance for some workloads. I personally think that
batching is required, i.e. each cpu stores wrong-node objects in a
separate per-cpu array, and then the objects are returned as a block to
their home node.
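A rough sketch of what I mean (the foreign array in kmem_cache_t, the
batch size and the helper name are made up; free_block() and
cachep->spinlock are as in the current slab.c):

/* Sketch: collect wrong-node objects per cpu and return them in one
 * locked pass instead of taking a lock on every single free.
 * Called like __cache_free(), i.e. with interrupts disabled.
 */
#define FOREIGN_BATCH 16

struct foreign_cache {
	unsigned int avail;
	void *objs[FOREIGN_BATCH];
};

static void cache_free_foreign(kmem_cache_t *cachep, void *objp)
{
	struct foreign_cache *fc = &cachep->foreign[smp_processor_id()];

	fc->objs[fc->avail++] = objp;
	if (fc->avail == FOREIGN_BATCH) {
		/* one spin_lock/unlock per FOREIGN_BATCH objects */
		spin_lock(&cachep->spinlock);
		free_block(cachep, fc->objs, fc->avail);
		spin_unlock(&cachep->spinlock);
		fc->avail = 0;
	}
}

Whether the flush takes one global lock (as above) or walks the per-node
lists under per-node locks is exactly the trade-off discussed below.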

>-/*
>- * NUMA: different approach needed if the spinlock is moved into
>- * the l3 structure
>
>
You have moved the cache spinlock into the l3 structure. Have you
compared both approaches?
A global spinlock has the advantage that batching is possible in
free_block: Acquire global spinlock, return objects to all nodes in the
system, release spinlock. A node-local spinlock would mean less
contention [multiple spinlocks instead of one global lock], but far more
spin_lock/unlock calls.

IIRC the conclusion from our discussion was that there are at least
four possible implementations:
- your version
- Add a second per-cpu array for off-node allocations. __cache_free
batches, free_block then returns. Global spinlock or per-node spinlock.
A patch with a global spinlock is in
http://www.colorfullife.com/~manfred/Linux-kernel/slab/patch-slab-numa-2.5.66;
per-node spinlocks would require a restructuring of free_block.
- Add a per-node array for each cpu for wrong-node allocations. Allows
very fast batch return: each array contains memory just from one node,
useful if per-node spinlocks are used.
- do nothing. Least overhead within slab.

I'm fairly certain that "do nothing" is the right answer for some
caches. For example the dentry-cache: The object lifetime is seconds to
minutes, the objects are stored in a global hashtable. They will be
touched from all cpus in the system, thus guaranteeing that
kmem_cache_alloc returns node-local memory won't help. But the added
overhead within slab.c will hurt.

--
Manfred


2005-03-16 18:56:59

by Martin J. Bligh

Subject: Re: Fw: [PATCH] NUMA Slab Allocator

> Do you have profile data from your modification? Which percentage of the allocations is node-local, which percentage is from foreign nodes? Preferably per-cache. It shouldn't be difficult to add statistics counters to your patch.
> And: Can you estimate which percentage is really accessed node-local and which percentage consists of long-living structures that are accessed from all cpus in the system?
> I had discussions with guys from IBM and SGI regarding a numa allocator, and we decided that we need profile data before we can decide if we need one:
> - A node-local allocator reduces the inter-node traffic, because the callers get node-local memory
> - A node-local allocator increases the inter-node traffic, because objects that are kfree'd on the wrong node must be returned to their home node.

One of the big problems is that much of the slab data really is more global
(ie dentry, inodes, etc). Some of it is more localized (typically the
kmalloc style stuff). I can't really generate any data easily, as most
of my NUMA boxes are either small Opterons / midsized PPC64, which have
a fairly low NUMA factor, or large ia32, which only has kernel mem on
node 0 ;-(

> IIRC the conclusion from our discussion was, that there are at least four possible implementations:
> - your version
> - Add a second per-cpu array for off-node allocations. __cache_free batches, free_block then returns. Global spinlock or per-node spinlock. A patch with a global spinlock is in
> http://www.colorfullife.com/~manfred/Linux-kernel/slab/patch-slab-numa-2.5.66
> per-node spinlocks would require a restructuring of free_block.
> - Add a per-node array for each cpu for wrong-node allocations. Allows very fast batch return: each array contains memory just from one node, useful if per-node spinlocks are used.
> - do nothing. Least overhead within slab.
>
> I'm fairly certain that "do nothing" is the right answer for some caches.
> For example the dentry-cache: The object lifetime is seconds to minutes,
> the objects are stored in a global hashtable. They will be touched from
> all cpus in the system, thus guaranteeing that kmem_cache_alloc returns
> node-local memory won't help. But the added overhead within slab.c will hurt.

That'd be my inclination .... but OTOH, we do that for pagecache OK. Dunno,
I'm torn. Depends if there's locality on the file access or not, I guess.
Is there any *harm* in doing it node local .... perhaps creating a node
mem pressure imbalance (OTOH, there's loads of stuff that does that anyway ;-))

The other thing that needs serious thought is how we balance reclaim pressure.

M.

2005-03-16 19:10:55

by Manfred Spraul

Subject: Re: Fw: [PATCH] NUMA Slab Allocator

Martin J. Bligh wrote:

>That'd be my inclination .... but OTOH, we do that for pagecache OK.
>
The page cache doesn't have a global hash table.

> Dunno,
>I'm torn. Depends if there's locality on the file access or not, I guess.
>Is there any *harm* in doing it node local .... perhaps creating a node
>mem pressure imbalance (OTOH, there's loads of stuff that does that anyway ;-))
>
>
>
The harm is slower kmem_cache_free and a lower hit ratio for the per-cpu
caches: kmem_cache_free must identify and return wrong node objects, and
due to these returns, the per-cpu array is more often empty in
kmem_cache_alloc.

IIRC someone from SGI wrote that they have seen bad performance in
fork-bomb tests on large cpu count systems, which might be caused by
inter-node traffic on the mm_struct structure, and that they think a
NUMA-aware allocator would help. As far as I know no tests were done to
verify that assumption.

--
Manfred

2005-03-30 05:31:31

by Christoph Lameter

Subject: API changes to the slab allocator for NUMA memory allocation

The patch makes the following function calls available to allocate memory on
a specific node without changing the basic operation of the slab
allocator:

kmem_cache_alloc_node(kmem_cache_t *cachep, unsigned int flags, int node);
kmalloc_node(size_t size, unsigned int flags, int node);

These are similar to the existing node-blind functions:

kmem_cache_alloc(kmem_cache_t *cachep, unsigned int flags);
kmalloc(size, flags);

The implementation of kmalloc_node is a slight variation on the old
kmalloc function. kmem_cache_alloc_node was changed to pass the flags and
the node information through the existing layers of the slab allocator
(which led to some minor rearrangements). The functions at the lowest
layer (kmem_getpages, cache_grow) are already node-aware.
Also, __alloc_percpu can now call kmalloc_node.
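As a usage illustration (a hypothetical caller, not part of this patch):
a subsystem that keeps one control structure per node could allocate each
of them on the node that will use it:

#include <linux/init.h>
#include <linux/nodemask.h>
#include <linux/slab.h>

/* Hypothetical example of the new API; error unwinding trimmed. */
struct node_ctl {
	spinlock_t lock;
	struct list_head queue;
};

static struct node_ctl *node_ctl[MAX_NUMNODES];

static int __init node_ctl_init(void)
{
	int node;

	for_each_online_node(node) {
		node_ctl[node] = kmalloc_node(sizeof(struct node_ctl),
						GFP_KERNEL, node);
		if (!node_ctl[node])
			return -ENOMEM;
		spin_lock_init(&node_ctl[node]->lock);
		INIT_LIST_HEAD(&node_ctl[node]->queue);
	}
	return 0;
}

Passing -1 as the node falls back to the node-blind behaviour, which is
what the kmalloc/kmem_cache_alloc wrappers do.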

This patch is necessary for the pageset localization patch posted
after this patch. The pageset patch also contains results of an
AIM7 benchmark that exercises this patch.

Patch against 2.6.11.6-bk3

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.11/include/linux/slab.h
===================================================================
--- linux-2.6.11.orig/include/linux/slab.h 2005-03-29 15:02:20.000000000 -0800
+++ linux-2.6.11/include/linux/slab.h 2005-03-29 18:17:19.000000000 -0800
@@ -61,15 +61,6 @@ extern kmem_cache_t *kmem_cache_create(c
void (*)(void *, kmem_cache_t *, unsigned long));
extern int kmem_cache_destroy(kmem_cache_t *);
extern int kmem_cache_shrink(kmem_cache_t *);
-extern void *kmem_cache_alloc(kmem_cache_t *, unsigned int __nocast);
-#ifdef CONFIG_NUMA
-extern void *kmem_cache_alloc_node(kmem_cache_t *, int);
-#else
-static inline void *kmem_cache_alloc_node(kmem_cache_t *cachep, int node)
-{
- return kmem_cache_alloc(cachep, GFP_KERNEL);
-}
-#endif
extern void kmem_cache_free(kmem_cache_t *, void *);
extern unsigned int kmem_cache_size(kmem_cache_t *);

@@ -80,9 +71,23 @@ struct cache_sizes {
kmem_cache_t *cs_dmacachep;
};
extern struct cache_sizes malloc_sizes[];
-extern void *__kmalloc(size_t, unsigned int __nocast);

-static inline void *kmalloc(size_t size, unsigned int __nocast flags)
+extern void *__kmalloc_node(size_t, unsigned int __nocast, int node);
+#ifdef CONFIG_NUMA
+extern void *kmem_cache_alloc_node(kmem_cache_t *, unsigned int __nocast, int);
+#define kmem_cache_alloc(cachep, flags) kmem_cache_alloc_node(cachep, flags, -1)
+#else
+extern void *kmem_cache_alloc(kmem_cache_t *, unsigned int __nocast);
+#define kmem_cache_alloc_node(cachep, flags, node) kmem_cache_alloc(cachep, flags)
+#endif
+
+#define __kmalloc(size, flags) __kmalloc_node(size, flags, -1)
+#define kmalloc(size, flags) kmalloc_node(size, flags, -1)
+
+/*
+ * Allocating memory on a specific node.
+ */
+static inline void *kmalloc_node(size_t size, unsigned int flags, int node)
{
if (__builtin_constant_p(size)) {
int i = 0;
@@ -98,11 +103,11 @@ static inline void *kmalloc(size_t size,
__you_cannot_kmalloc_that_much();
}
found:
- return kmem_cache_alloc((flags & GFP_DMA) ?
+ return kmem_cache_alloc_node((flags & GFP_DMA) ?
malloc_sizes[i].cs_dmacachep :
- malloc_sizes[i].cs_cachep, flags);
+ malloc_sizes[i].cs_cachep, flags, node);
}
- return __kmalloc(size, flags);
+ return __kmalloc_node(size, flags, node);
}

extern void *kcalloc(size_t, size_t, unsigned int __nocast);
Index: linux-2.6.11/mm/slab.c
===================================================================
--- linux-2.6.11.orig/mm/slab.c 2005-03-29 15:02:20.000000000 -0800
+++ linux-2.6.11/mm/slab.c 2005-03-29 15:02:27.000000000 -0800
@@ -676,7 +676,7 @@ static struct array_cache *alloc_arrayca
kmem_cache_t *cachep;
cachep = kmem_find_general_cachep(memsize, GFP_KERNEL);
if (cachep)
- nc = kmem_cache_alloc_node(cachep, cpu_to_node(cpu));
+ nc = kmem_cache_alloc_node(cachep, GFP_KERNEL, cpu_to_node(cpu));
}
if (!nc)
nc = kmalloc(memsize, GFP_KERNEL);
@@ -1988,7 +1988,7 @@ bad:
#define check_slabp(x,y) do { } while(0)
#endif

-static void *cache_alloc_refill(kmem_cache_t *cachep, unsigned int __nocast flags)
+static void *cache_alloc_refill(kmem_cache_t *cachep, unsigned int __nocast flags, int node)
{
int batchcount;
struct kmem_list3 *l3;
@@ -2070,7 +2070,7 @@ alloc_done:

if (unlikely(!ac->avail)) {
int x;
- x = cache_grow(cachep, flags, -1);
+ x = cache_grow(cachep, flags, node);

// cache_grow can reenable interrupts, then ac could change.
ac = ac_data(cachep);
@@ -2140,7 +2140,7 @@ cache_alloc_debugcheck_after(kmem_cache_
#endif


-static inline void *__cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
+static inline void *__cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags, int node)
{
unsigned long save_flags;
void* objp;
@@ -2156,7 +2156,7 @@ static inline void *__cache_alloc(kmem_c
objp = ac_entry(ac)[--ac->avail];
} else {
STATS_INC_ALLOCMISS(cachep);
- objp = cache_alloc_refill(cachep, flags);
+ objp = cache_alloc_refill(cachep, flags, node);
}
local_irq_restore(save_flags);
objp = cache_alloc_debugcheck_after(cachep, flags, objp, __builtin_return_address(0));
@@ -2167,7 +2167,6 @@ static inline void *__cache_alloc(kmem_c
* NUMA: different approach needed if the spinlock is moved into
* the l3 structure
*/
-
static void free_block(kmem_cache_t *cachep, void **objpp, int nr_objects)
{
int i;
@@ -2295,20 +2294,6 @@ static inline void __cache_free(kmem_cac
}

/**
- * kmem_cache_alloc - Allocate an object
- * @cachep: The cache to allocate from.
- * @flags: See kmalloc().
- *
- * Allocate an object from this cache. The flags are only relevant
- * if the cache has no available objects.
- */
-void *kmem_cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
-{
- return __cache_alloc(cachep, flags);
-}
-EXPORT_SYMBOL(kmem_cache_alloc);
-
-/**
* kmem_ptr_validate - check if an untrusted pointer might
* be a slab entry.
* @cachep: the cache we're checking against
@@ -2350,7 +2335,23 @@ out:
return 0;
}

-#ifdef CONFIG_NUMA
+#ifndef CONFIG_NUMA
+/**
+ * kmem_cache_alloc - Allocate an object
+ * @cachep: The cache to allocate from.
+ * @flags: See kmalloc().
+ *
+ * Allocate an object from this cache. The flags are only relevant
+ * if the cache has no available objects.
+ */
+void *kmem_cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
+{
+ return __cache_alloc(cachep, flags, -1);
+}
+EXPORT_SYMBOL(kmem_cache_alloc);
+
+#else
+
/**
* kmem_cache_alloc_node - Allocate an object on the specified node
* @cachep: The cache to allocate from.
@@ -2361,13 +2362,16 @@ out:
* and can sleep. And it will allocate memory on the given node, which
* can improve the performance for cpu bound structures.
*/
-void *kmem_cache_alloc_node(kmem_cache_t *cachep, int nodeid)
+void *kmem_cache_alloc_node(kmem_cache_t *cachep, unsigned int flags, int nodeid)
{
int loop;
void *objp;
struct slab *slabp;
kmem_bufctl_t next;

+ if (nodeid < 0)
+ return __cache_alloc(cachep, flags, -1);
+
for (loop = 0;;loop++) {
struct list_head *q;

@@ -2393,7 +2397,7 @@ void *kmem_cache_alloc_node(kmem_cache_t
spin_unlock_irq(&cachep->spinlock);

local_irq_disable();
- if (!cache_grow(cachep, GFP_KERNEL, nodeid)) {
+ if (!cache_grow(cachep, flags, nodeid)) {
local_irq_enable();
return NULL;
}
@@ -2429,7 +2433,7 @@ got_slabp:
list3_data(cachep)->free_objects--;
spin_unlock_irq(&cachep->spinlock);

- objp = cache_alloc_debugcheck_after(cachep, GFP_KERNEL, objp,
+ objp = cache_alloc_debugcheck_after(cachep, flags, objp,
__builtin_return_address(0));
return objp;
}
@@ -2441,6 +2445,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_node);
* kmalloc - allocate memory
* @size: how many bytes of memory are required.
* @flags: the type of memory to allocate.
+ * @node: the node from which to allocate or -1
*
* kmalloc is the normal method of allocating memory
* in the kernel.
@@ -2458,16 +2463,16 @@ EXPORT_SYMBOL(kmem_cache_alloc_node);
* platforms. For example, on i386, it means that the memory must come
* from the first 16MB.
*/
-void *__kmalloc(size_t size, unsigned int __nocast flags)
+void *__kmalloc_node(size_t size, unsigned int __nocast flags, int node)
{
kmem_cache_t *cachep;

cachep = kmem_find_general_cachep(size, flags);
if (unlikely(cachep == NULL))
return NULL;
- return __cache_alloc(cachep, flags);
+ return __cache_alloc(cachep, flags, node);
}
-EXPORT_SYMBOL(__kmalloc);
+EXPORT_SYMBOL(__kmalloc_node);

#ifdef CONFIG_SMP
/**
@@ -2489,9 +2494,7 @@ void *__alloc_percpu(size_t size, size_t
for (i = 0; i < NR_CPUS; i++) {
if (!cpu_possible(i))
continue;
- pdata->ptrs[i] = kmem_cache_alloc_node(
- kmem_find_general_cachep(size, GFP_KERNEL),
- cpu_to_node(i));
+ pdata->ptrs[i] = kmalloc_node(size, GFP_KERNEL, cpu_to_node(i));

if (!pdata->ptrs[i])
goto unwind_oom;

2005-03-30 06:42:10

by Manfred Spraul

Subject: Re: API changes to the slab allocator for NUMA memory allocation

Christoph Lameter wrote:

>The patch makes the following function calls available to allocate memory on
>a specific node without changing the basic operation of the slab
>allocator:
>
> kmem_cache_alloc_node(kmem_cache_t *cachep, unsigned int flags, int node);
> kmalloc_node(size_t size, unsigned int flags, int node);
>
>
>
I intentionally didn't add a kmalloc_node() function:
kmalloc is just a wrapper around
kmem_find_general_cachep+kmem_cache_alloc. It exists only for
efficiency. The _node functions are slow, thus a wrapper is IMHO not
required. kmalloc_node(size, flags, node) is identical to
kmem_cache_alloc_node(kmem_find_general_cachep(size, flags), flags, node). What
about making kmem_find_general_cachep() public again and removing
kmalloc_node()?

And I don't know if it's a good idea to make kmalloc() a special case of
kmalloc_node(): It adds one parameter to every kmalloc call and
kmem_cache_alloc call, and virtually everyone passes -1. Does it increase
the .text size?

--
Manfred

2005-03-30 15:56:26

by Christoph Lameter

Subject: Re: API changes to the slab allocator for NUMA memory allocation

On Wed, 30 Mar 2005, Manfred Spraul wrote:

> >The patch makes the following function calls available to allocate memory on
> >a specific node without changing the basic operation of the slab
> >allocator:
> >
> > kmem_cache_alloc_node(kmem_cache_t *cachep, unsigned int flags, int node);
> > kmalloc_node(size_t size, unsigned int flags, int node);

> I intentionally didn't add a kmalloc_node() function:
> kmalloc is just a wrapper around
> kmem_find_general_cachep+kmem_cache_alloc. It exists only for
> efficiency. The _node functions are slow, thus a wrapper is IMHO not
> required. kmalloc_node(size, flags, node) is identical to
> kmem_cache_alloc_node(kmem_find_general_cachep(size, flags), flags, node). What
> about making kmem_find_general_cachep() public again and removing
> kmalloc_node()?

kmalloc is the function in use by most kernel code, and kmalloc_node makes
node-specific allocations just as easy. Yes, the node functions are slow at
this point, but we will submit additional patches that address those
issues. The patch makes it easy for a variety of kernel modules to use
node-specific memory allocations. With this patch we will be able to
submit patches that enhance the speed of the slab allocator and, at the
same time, patches that make subsystems use node-specific memory.

> And I don't know if it's a good idea to make kmalloc() a special case of
> kmalloc_node(): It adds one parameter to every kmalloc call and
> kmem_cache_alloc call, virtually everyone passes -1. Does it increase
> the .text size?

The -1 is optimized away in the non-NUMA case. In the NUMA case it is an
additional parameter that is passed to kmem_cache_alloc. So it is one
additional register load, which allows us to avoid having a separate
function for non-node-specific allocations.
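To illustrate with the constant-size kmalloc() path (illustration only,
following the macros from the patch; example() is a hypothetical caller):

#include <linux/slab.h>

/* What a !CONFIG_NUMA build sees, given the definitions in the patch:
 *	#define kmem_cache_alloc_node(cachep, flags, node) \
 *		kmem_cache_alloc(cachep, flags)
 *	#define kmalloc(size, flags) kmalloc_node(size, flags, -1)
 */
void *example(void)
{
	/* Written as ... */
	return kmalloc(64, GFP_KERNEL);
	/* ... the -1 flows into the inline kmalloc_node(), where the
	 * kmem_cache_alloc_node() macro above discards it, so what is
	 * finally emitted is a plain
	 *	kmem_cache_alloc(malloc_sizes[i].cs_cachep, GFP_KERNEL);
	 * with no extra argument.  Only CONFIG_NUMA builds pass the
	 * node down as an extra register argument.
	 */
}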

2005-03-30 17:56:12

by Manfred Spraul

Subject: Re: API changes to the slab allocator for NUMA memory allocation

// $Header$
// Kernel Version:
// VERSION = 2
// PATCHLEVEL = 6
// SUBLEVEL = 11
// EXTRAVERSION = -mm2
--- 2.6/include/linux/slab.h 2005-03-12 01:05:50.000000000 +0100
+++ build-2.6/include/linux/slab.h 2005-03-30 19:33:56.158317105 +0200
@@ -62,16 +62,9 @@
extern int kmem_cache_destroy(kmem_cache_t *);
extern int kmem_cache_shrink(kmem_cache_t *);
extern void *kmem_cache_alloc(kmem_cache_t *, int);
-#ifdef CONFIG_NUMA
-extern void *kmem_cache_alloc_node(kmem_cache_t *, int);
-#else
-static inline void *kmem_cache_alloc_node(kmem_cache_t *cachep, int node)
-{
- return kmem_cache_alloc(cachep, GFP_KERNEL);
-}
-#endif
extern void kmem_cache_free(kmem_cache_t *, void *);
extern unsigned int kmem_cache_size(kmem_cache_t *);
+extern kmem_cache_t * kmem_find_general_cachep(size_t size, int gfpflags);

/* Size description struct for general caches. */
struct cache_sizes {
@@ -109,6 +102,20 @@
extern void kfree(const void *);
extern unsigned int ksize(const void *);

+#ifdef CONFIG_NUMA
+extern void *kmem_cache_alloc_node(kmem_cache_t *, int);
+extern void *kmalloc_node(size_t size, int flags, int node);
+#else
+static inline void *kmem_cache_alloc_node(kmem_cache_t *cachep, int node)
+{
+ return kmem_cache_alloc(cachep, GFP_KERNEL);
+}
+static inline void *kmalloc_node(size_t size, int flags, int node)
+{
+ return kmalloc(size, flags);
+}
+#endif
+
extern int FASTCALL(kmem_cache_reap(int));
extern int FASTCALL(kmem_ptr_validate(kmem_cache_t *cachep, void *ptr));

--- 2.6/mm/slab.c 2005-03-12 12:36:34.000000000 +0100
+++ build-2.6/mm/slab.c 2005-03-30 19:44:15.428757330 +0200
@@ -586,7 +586,7 @@
return cachep->array[smp_processor_id()];
}

-static inline kmem_cache_t * kmem_find_general_cachep (size_t size, int gfpflags)
+static inline kmem_cache_t *__find_general_cachep(size_t size, int gfpflags)
{
struct cache_sizes *csizep = malloc_sizes;

@@ -610,6 +610,12 @@
return csizep->cs_cachep;
}

+kmem_cache_t *kmem_find_general_cachep(size_t size, int gfpflags)
+{
+ return __find_general_cachep(size, gfpflags);
+}
+EXPORT_SYMBOL(kmem_find_general_cachep);
+
/* Cal the num objs, wastage, and bytes left over for a given slab size. */
static void cache_estimate (unsigned long gfporder, size_t size, size_t align,
int flags, size_t *left_over, unsigned int *num)
@@ -2200,7 +2197,7 @@
list_del(&slabp->list);
objnr = (objp - slabp->s_mem) / cachep->objsize;
check_slabp(cachep, slabp);
-#if 0 /* disabled, not compatible with leak detection */
+#if DEBUG
if (slab_bufctl(slabp)[objnr] != BUFCTL_ALLOC) {
printk(KERN_ERR "slab: double free detected in cache "
"'%s', objp %p\n", cachep->name, objp);
@@ -2450,6 +2447,16 @@
}
EXPORT_SYMBOL(kmem_cache_alloc_node);

+void *kmalloc_node(size_t size, int flags, int node)
+{
+ kmem_cache_t *cachep;
+
+ cachep = kmem_find_general_cachep(size, flags);
+ if (unlikely(cachep == NULL))
+ return NULL;
+ return kmem_cache_alloc_node(cachep, node);
+}
+EXPORT_SYMBOL(kmalloc_node);
#endif

/**
@@ -2477,7 +2484,12 @@
{
kmem_cache_t *cachep;

- cachep = kmem_find_general_cachep(size, flags);
+ /* If you want to save a few bytes .text space: replace
+ * __ with kmem_.
+ * Then kmalloc uses the uninlined functions instead of the inline
+ * functions.
+ */
+ cachep = __find_general_cachep(size, flags);
if (unlikely(cachep == NULL))
return NULL;
return __cache_alloc(cachep, flags);


Attachments:
patch-slab-numa-experimental (3.35 kB)

2005-03-30 18:15:52

by Christoph Lameter

Subject: Re: API changes to the slab allocator for NUMA memory allocation

On Wed, 30 Mar 2005, Manfred Spraul wrote:

> Correct, I was thinking about the NUMA case.
> You've decided to add one register load to every call of kmalloc. On
> i386, kmalloc_node() is a 24-byte function. I'd bet that adding the node
> parameter to every call of kmalloc causes a .text increase larger than
> 240 bytes. And I have not yet considered that you have increased the
> number of conditional branches in every kmalloc(32,GFP_KERNEL) call by
> 33%, i.e. from 3 to 4 conditional branch instructions.
> I'd add an explicit kmalloc_node function. Attached is a prototype
> patch. You'd have to reintroduce the flags field to
> kmem_cache_alloc_node() and update kmalloc_node.
> The patch was manually edited, I hope it applies to a recent tree ;-)
>
> What do you think?

Looks fine to me.