2006-09-12 14:43:45

by Ravikiran G Thirumalai

[permalink] [raw]
Subject: [patch] slab: Do not use mempolicy for kmalloc_node

The slab should follow the specified memory policy for kmalloc allocations,
which it does. However, for kmalloc_node allocations, slab should
serve the object from the requested node irrespective of memory policy.
This seems to be broken in slab code. Following patch fixes this.

Patch abstacts out a __cache_alloc_mempolicy function to be used when
mempolicy is to be applied.

Signed-off-by: Alok N Kataria <[email protected]>
Signed-off-by: Ravikiran Thirumalai <[email protected]>
Signed-off-by: Shai Fultheim <[email protected]>

Index: linux-2.6.18-rc3/mm/slab.c
===================================================================
--- linux-2.6.18-rc3.orig/mm/slab.c 2006-08-04 10:01:46.000000000 -0700
+++ linux-2.6.18-rc3/mm/slab.c 2006-08-08 12:05:21.000000000 -0700
@@ -2963,19 +2963,12 @@ static void *cache_alloc_debugcheck_afte
#define cache_alloc_debugcheck_after(a,b,objp,d) (objp)
#endif

-static inline void *____cache_alloc(struct kmem_cache *cachep, gfp_t flags)
+static inline void *
+____cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
void *objp;
struct array_cache *ac;

-#ifdef CONFIG_NUMA
- if (unlikely(current->flags & (PF_SPREAD_SLAB | PF_MEMPOLICY))) {
- objp = alternate_node_alloc(cachep, flags);
- if (objp != NULL)
- return objp;
- }
-#endif
-
check_irq_off();
ac = cpu_cache_get(cachep);
if (likely(ac->avail)) {
@@ -2989,6 +2982,28 @@ static inline void *____cache_alloc(stru
return objp;
}

+#ifdef CONFIG_NUMA
+static inline void *__cache_alloc_mempolicy(struct kmem_cache *cachep, gfp_t flags)
+{
+ void *objp;
+
+ if (unlikely(current->flags & (PF_SPREAD_SLAB | PF_MEMPOLICY))) {
+ objp = alternate_node_alloc(cachep, flags);
+ if (objp != NULL)
+ return objp;
+ }
+
+ return ____cache_alloc(cachep, flags);
+}
+#else
+static inline void *__cache_alloc_mempolicy(struct kmem_cache *cachep, gfp_t flags)
+{
+ return ____cache_alloc(cachep, flags);
+}
+#endif
+
+
+
static __always_inline void *__cache_alloc(struct kmem_cache *cachep,
gfp_t flags, void *caller)
{
@@ -2998,7 +3013,7 @@ static __always_inline void *__cache_all
cache_alloc_debugcheck_before(cachep, flags);

local_irq_save(save_flags);
- objp = ____cache_alloc(cachep, flags);
+ objp = __cache_alloc_mempolicy(cachep, flags);
local_irq_restore(save_flags);
objp = cache_alloc_debugcheck_after(cachep, flags, objp,
caller);
@@ -3303,8 +3318,9 @@ void *kmem_cache_alloc_node(struct kmem_
cache_alloc_debugcheck_before(cachep, flags);
local_irq_save(save_flags);

- if (nodeid == -1 || nodeid == numa_node_id() ||
- !cachep->nodelists[nodeid])
+ if (nodeid == -1 || !cachep->nodelists[nodeid])
+ ptr = __cache_alloc_mempolicy(cachep, flags);
+ else if (nodeid == numa_node_id())
ptr = ____cache_alloc(cachep, flags);
else
ptr = __cache_alloc_node(cachep, flags, nodeid);


2006-09-12 17:37:20

by Christoph Lameter

[permalink] [raw]
Subject: Re: [patch] slab: Do not use mempolicy for kmalloc_node

On Tue, 12 Sep 2006, Ravikiran G Thirumalai wrote:

> The slab should follow the specified memory policy for kmalloc allocations,
> which it does. However, for kmalloc_node allocations, slab should
> serve the object from the requested node irrespective of memory policy.
> This seems to be broken in slab code. Following patch fixes this.

This is not complete. Please see the discussion on GFP_THISNODE and the
related patch to fix this issue
http://marc.theaimsgroup.com/?l=linux-mm&m=115505682122540&w=2

2006-09-12 19:50:55

by Ravikiran G Thirumalai

[permalink] [raw]
Subject: Re: [patch] slab: Do not use mempolicy for kmalloc_node

On Tue, Sep 12, 2006 at 10:36:54AM -0700, Christoph Lameter wrote:
> On Tue, 12 Sep 2006, Ravikiran G Thirumalai wrote:
>
> ...
> This is not complete. Please see the discussion on GFP_THISNODE and the
> related patch to fix this issue
> http://marc.theaimsgroup.com/?l=linux-mm&m=115505682122540&w=2

Hmm, I see, but with the above patch, if we ignore mempolicy for
__GFP_THISNODE slab caches at alternate_node_alloc (which is pretty much
all the slab caches) then we would be ignoring memplocies altogether no?

Thanks,
Kiran

2006-09-12 19:52:33

by Christoph Lameter

[permalink] [raw]
Subject: Re: [patch] slab: Do not use mempolicy for kmalloc_node

On Tue, 12 Sep 2006, Ravikiran G Thirumalai wrote:

> On Tue, Sep 12, 2006 at 10:36:54AM -0700, Christoph Lameter wrote:
> > On Tue, 12 Sep 2006, Ravikiran G Thirumalai wrote:
> >
> > ...
> > This is not complete. Please see the discussion on GFP_THISNODE and the
> > related patch to fix this issue
> > http://marc.theaimsgroup.com/?l=linux-mm&m=115505682122540&w=2
>
> Hmm, I see, but with the above patch, if we ignore mempolicy for
> __GFP_THISNODE slab caches at alternate_node_alloc (which is pretty much
> all the slab caches) then we would be ignoring memplocies altogether no?

We are implementing memory policies in the slab layer. I.e. we
are taking slab objects round robin from the per node lists of the
slab.

2006-09-13 22:12:46

by Ravikiran G Thirumalai

[permalink] [raw]
Subject: Re: [patch] slab: Do not use mempolicy for kmalloc_node

On Tue, Sep 12, 2006 at 12:52:14PM -0700, Christoph Lameter wrote:
> On Tue, 12 Sep 2006, Ravikiran G Thirumalai wrote:
>
> > On Tue, Sep 12, 2006 at 10:36:54AM -0700, Christoph Lameter wrote:
> > > On Tue, 12 Sep 2006, Ravikiran G Thirumalai wrote:
> > >
> > > ...
> > > This is not complete. Please see the discussion on GFP_THISNODE and the
> > > related patch to fix this issue
> > > http://marc.theaimsgroup.com/?l=linux-mm&m=115505682122540&w=2
> >
> > Hmm, I see, but with the above patch, if we ignore mempolicy for
> > __GFP_THISNODE slab caches at alternate_node_alloc (which is pretty much
> > all the slab caches) then we would be ignoring memplocies altogether no?
>
> We are implementing memory policies in the slab layer. I.e. we
> are taking slab objects round robin from the per node lists of the
> slab.

Christoph,
As discussed offline, cpuset constraints and mempolicy constraints still
get applied to kmalloc_node in current mainline as well as the patch pointed
above. Here is the fix we agreed upon. Please ack it if you can :)

Thanks,
Kiran


Slab should follow the specified cpuset constraints/mem policy constraints
for kmalloc allocations, which it does. However, for kmalloc_node
allocations, slab should serve the object from the requested node
irrespective of memory policy. This seems to be broken in slab code.
Following patch fixes this.

Patch just moves out the cpuset/mempolicy base allocation from ____cache_alloc
to __cache_alloc. __cache_alloc is used for general purpose allocation,
and cpuset/mempolicy constraints should be considered there. Whereas,
____cache_alloc should always be used to allocate objects from the
array_cache of the executing CPU

Signed-off-by: Alok N Kataria <[email protected]>
Signed-off-by: Ravikiran Thirumalai <[email protected]>
Signed-off-by: Shai Fultheim <[email protected]>

Index: linux-2.6.18-rc6/mm/slab.c
===================================================================
--- linux-2.6.18-rc6.orig/mm/slab.c 2006-09-13 14:19:25.000000000 -0700
+++ linux-2.6.18-rc6/mm/slab.c 2006-09-13 14:21:19.000000000 -0700
@@ -2963,19 +2963,11 @@ static void *cache_alloc_debugcheck_afte
#define cache_alloc_debugcheck_after(a,b,objp,d) (objp)
#endif

+/* Allocate object from the array cache of the executing cpu */
static inline void *____cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
void *objp;
struct array_cache *ac;
-
-#ifdef CONFIG_NUMA
- if (unlikely(current->flags & (PF_SPREAD_SLAB | PF_MEMPOLICY))) {
- objp = alternate_node_alloc(cachep, flags);
- if (objp != NULL)
- return objp;
- }
-#endif
-
check_irq_off();
ac = cpu_cache_get(cachep);
if (likely(ac->avail)) {
@@ -2989,15 +2981,29 @@ static inline void *____cache_alloc(stru
return objp;
}

+/*
+ * Allocate object from the appropriate node as per mempolicy/cpuset
+ * constraints
+ */
static __always_inline void *__cache_alloc(struct kmem_cache *cachep,
gfp_t flags, void *caller)
{
unsigned long save_flags;
void *objp;
-
cache_alloc_debugcheck_before(cachep, flags);
-
local_irq_save(save_flags);
+
+#ifdef CONFIG_NUMA
+ if (unlikely(current->flags & (PF_SPREAD_SLAB | PF_MEMPOLICY))) {
+ objp = alternate_node_alloc(cachep, flags);
+ if (objp != NULL) {
+ local_irq_restore(save_flags);
+ prefetchw(objp);
+ return objp;
+ }
+ }
+#endif
+
objp = ____cache_alloc(cachep, flags);
local_irq_restore(save_flags);
objp = cache_alloc_debugcheck_after(cachep, flags, objp,
@@ -3303,9 +3309,10 @@ void *kmem_cache_alloc_node(struct kmem_
cache_alloc_debugcheck_before(cachep, flags);
local_irq_save(save_flags);

- if (nodeid == -1 || nodeid == numa_node_id() ||
- !cachep->nodelists[nodeid])
+ if (nodeid == numa_node_id())
ptr = ____cache_alloc(cachep, flags);
+ else if (nodeid == -1 || !cachep->nodelists[nodeid])
+ ptr = __cache_alloc(cachep, flags, __builtin_return_address(0));
else
ptr = __cache_alloc_node(cachep, flags, nodeid);
local_irq_restore(save_flags);

2006-09-13 22:28:15

by Christoph Lameter

[permalink] [raw]
Subject: Re: [patch] slab: Do not use mempolicy for kmalloc_node

I wish first of all that the description would accurately describe the
problem.

On Wed, 13 Sep 2006, Ravikiran G Thirumalai wrote:


> +/* Allocate object from the array cache of the executing cpu */

I am not sure what this adds.

> +/*
> + * Allocate object from the appropriate node as per mempolicy/cpuset
> + * constraints
> + */

We only do this under certain conditions. The main purpose of this
function is to allocate an object without having specified a node.

> unsigned long save_flags;
> void *objp;
> -
> cache_alloc_debugcheck_before(cachep, flags);
> -
> local_irq_save(save_flags);

Extra material.

> +
> +#ifdef CONFIG_NUMA
> + if (unlikely(current->flags & (PF_SPREAD_SLAB | PF_MEMPOLICY))) {
> + objp = alternate_node_alloc(cachep, flags);
> + if (objp != NULL) {
> + local_irq_restore(save_flags);
> + prefetchw(objp);
> + return objp;
> + }
> + }
> +#endif
> +

Ok.

> @@ -3303,9 +3309,10 @@ void *kmem_cache_alloc_node(struct kmem_
> cache_alloc_debugcheck_before(cachep, flags);
> local_irq_save(save_flags);
>
> - if (nodeid == -1 || nodeid == numa_node_id() ||
> - !cachep->nodelists[nodeid])
> + if (nodeid == numa_node_id())
> ptr = ____cache_alloc(cachep, flags);
> + else if (nodeid == -1 || !cachep->nodelists[nodeid])
> + ptr = __cache_alloc(cachep, flags, __builtin_return_address(0));
> else

We are still allocating according to policy if nodeid == -1 or if we have
an invalid node? I thought we agreed that kmalloc_node should never
obey memory policies?

Simply move the #ifdef CONFIG_NUMA block as we agreed last night. And fix
the description to specify under what conditions kmalloc_node will obey
memory policies.

2006-09-13 23:35:50

by Ravikiran G Thirumalai

[permalink] [raw]
Subject: Re: [patch] slab: Do not use mempolicy for kmalloc_node

On Wed, Sep 13, 2006 at 03:28:01PM -0700, Christoph Lameter wrote:
> I wish first of all that the description would accurately describe the
> problem.

Well, I thought the subject itself described the problem in one line :), I
guess I should be more verbose? OK.

>
> On Wed, 13 Sep 2006, Ravikiran G Thirumalai wrote:
>
>
> > +/* Allocate object from the array cache of the executing cpu */
>
> I am not sure what this adds.

Comments :). Well the idea was to describe the difference between
__cache_alloc and ____cache_alloc, I mean, these are two similar sounding
functions with extra underscores. OK, maybe I was being too verbose here?

>
> > +/*
> > + * Allocate object from the appropriate node as per mempolicy/cpuset
> > + * constraints
> > + */
>
> We only do this under certain conditions. The main purpose of this
> function is to allocate an object without having specified a node.

Yes, the conditions being cpuset constraints or a mempolicy being in place.
Again, since objects can be allocated off other nodes under certain
conditions, I thought it was good to document it...

>
> > unsigned long save_flags;
> > void *objp;
> > -
> > cache_alloc_debugcheck_before(cachep, flags);
> > -
> > local_irq_save(save_flags);
>
> Extra material.

OK

>
> > +
> > +#ifdef CONFIG_NUMA
> > + if (unlikely(current->flags & (PF_SPREAD_SLAB | PF_MEMPOLICY))) {
> > + objp = alternate_node_alloc(cachep, flags);
> > + if (objp != NULL) {
> > + local_irq_restore(save_flags);
> > + prefetchw(objp);
> > + return objp;
> > + }
> > + }
> > +#endif
> > +
>
> Ok.
>
> > @@ -3303,9 +3309,10 @@ void *kmem_cache_alloc_node(struct kmem_
> > cache_alloc_debugcheck_before(cachep, flags);
> > local_irq_save(save_flags);
> >
> > - if (nodeid == -1 || nodeid == numa_node_id() ||
> > - !cachep->nodelists[nodeid])
> > + if (nodeid == numa_node_id())
> > ptr = ____cache_alloc(cachep, flags);
> > + else if (nodeid == -1 || !cachep->nodelists[nodeid])
> > + ptr = __cache_alloc(cachep, flags, __builtin_return_address(0));
> > else
>
> We are still allocating according to policy if nodeid == -1 or if we have
> an invalid node? I thought we agreed that kmalloc_node should never
> obey memory policies?
>
> Simply move the #ifdef CONFIG_NUMA block as we agreed last night. And fix
> the description to specify under what conditions kmalloc_node will obey
> memory policies.

This is the case when we are requesting an object from a non existent
node/invalid node. So we have 2 choices, either to spread the allocations
as per the cpuset constraints (the same treatment as kmalloc), or to
allocate from the requesting node, either ways we are not strictly
confirming to the user's choice of node (which we cannot). I cannot see
major advantage or disadvantage either ways, so I chose to keep the policy
in current mainline code -- spread according to the policy set.

2006-09-13 23:49:10

by Christoph Lameter

[permalink] [raw]
Subject: Re: [patch] slab: Do not use mempolicy for kmalloc_node

On Wed, 13 Sep 2006, Ravikiran G Thirumalai wrote:

> > We only do this under certain conditions. The main purpose of this
> > function is to allocate an object without having specified a node.
>
> Yes, the conditions being cpuset constraints or a mempolicy being in place.
> Again, since objects can be allocated off other nodes under certain
> conditions, I thought it was good to document it...

This is only true for the CONFIG_NUMA case.

> This is the case when we are requesting an object from a non existent
> node/invalid node. So we have 2 choices, either to spread the allocations
> as per the cpuset constraints (the same treatment as kmalloc), or to
> allocate from the requesting node, either ways we are not strictly
> confirming to the user's choice of node (which we cannot). I cannot see
> major advantage or disadvantage either ways, so I chose to keep the policy
> in current mainline code -- spread according to the policy set.

The two cases were your patch still applied memory policies were:

1. nodeid = -1. This is one particular case that we wanted to fix because
it means use numa_node_id().

2. The case where the nodelist does not yet exist.

AFAIK this situation only occurs on boot strap when we are actually
attempting to allocate from a different node than what we are running on.
Falling back to the local node is the right thing to do because we have
that already working. A process that is running on a node must always have
the nodelists for all caches allocated. The cpuup callbacks take care of that.

kmalloc_node needs work like page_alloc_node. page_alloc_node() never
consults memory policies and thus one would not expect kmalloc_node to do
so either.

2006-09-13 23:57:13

by Ravikiran G Thirumalai

[permalink] [raw]
Subject: Re: [patch] slab: Do not use mempolicy for kmalloc_node

On Wed, Sep 13, 2006 at 04:48:58PM -0700, Christoph Lameter wrote:
> On Wed, 13 Sep 2006, Ravikiran G Thirumalai wrote:
>
> The two cases were your patch still applied memory policies were:
>
> 1. nodeid = -1. This is one particular case that we wanted to fix because
> it means use numa_node_id().

OK, I did not realise nodeid = -1 _should_ imply current node. Not using
mempolicy makes sense then.

>
> 2. The case where the nodelist does not yet exist.
>
> AFAIK this situation only occurs on boot strap when we are actually
> attempting to allocate from a different node than what we are running on.
> Falling back to the local node is the right thing to do because we have
> that already working. A process that is running on a node must always have
> the nodelists for all caches allocated. The cpuup callbacks take care of that.
>
> kmalloc_node needs work like page_alloc_node. page_alloc_node() never
> consults memory policies and thus one would not expect kmalloc_node to do
> so either.

OK.