2007-11-07 01:13:23

by Christoph Lameter

[permalink] [raw]
Subject: [patch 02/23] SLUB: Rename NUMA defrag_ratio to remote_node_defrag_ratio

We need the defrag ratio for the non NUMA situation now. The NUMA defrag works
by allocating objects from partial slabs on remote nodes. Rename it to

remote_node_defrag_ratio

to be clear about this.

[This patch is already in mm]

Reviewed-by: Rik van Riel <[email protected]>
Signed-off-by: Christoph Lameter <[email protected]>
---
include/linux/slub_def.h | 5 ++++-
mm/slub.c | 17 +++++++++--------
2 files changed, 13 insertions(+), 9 deletions(-)

Index: linux-2.6/include/linux/slub_def.h
===================================================================
--- linux-2.6.orig/include/linux/slub_def.h 2007-11-06 12:34:13.000000000 -0800
+++ linux-2.6/include/linux/slub_def.h 2007-11-06 12:36:28.000000000 -0800
@@ -60,7 +60,10 @@ struct kmem_cache {
#endif

#ifdef CONFIG_NUMA
- int defrag_ratio;
+ /*
+ * Defragmentation by allocating from a remote node.
+ */
+ int remote_node_defrag_ratio;
struct kmem_cache_node *node[MAX_NUMNODES];
#endif
#ifdef CONFIG_SMP
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2007-11-06 12:36:16.000000000 -0800
+++ linux-2.6/mm/slub.c 2007-11-06 12:37:25.000000000 -0800
@@ -1345,7 +1345,8 @@ static unsigned long get_any_partial(str
* expensive if we do it every time we are trying to find a slab
* with available objects.
*/
- if (!s->defrag_ratio || get_cycles() % 1024 > s->defrag_ratio)
+ if (!s->remote_node_defrag_ratio ||
+ get_cycles() % 1024 > s->remote_node_defrag_ratio)
return 0;

zonelist = &NODE_DATA(slab_node(current->mempolicy))
@@ -2363,7 +2364,7 @@ static int kmem_cache_open(struct kmem_c

s->refcount = 1;
#ifdef CONFIG_NUMA
- s->defrag_ratio = 100;
+ s->remote_node_defrag_ratio = 100;
#endif
if (!init_kmem_cache_nodes(s, gfpflags & ~SLUB_DMA))
goto error;
@@ -4005,21 +4006,21 @@ static ssize_t free_calls_show(struct km
SLAB_ATTR_RO(free_calls);

#ifdef CONFIG_NUMA
-static ssize_t defrag_ratio_show(struct kmem_cache *s, char *buf)
+static ssize_t remote_node_defrag_ratio_show(struct kmem_cache *s, char *buf)
{
- return sprintf(buf, "%d\n", s->defrag_ratio / 10);
+ return sprintf(buf, "%d\n", s->remote_node_defrag_ratio / 10);
}

-static ssize_t defrag_ratio_store(struct kmem_cache *s,
+static ssize_t remote_node_defrag_ratio_store(struct kmem_cache *s,
const char *buf, size_t length)
{
int n = simple_strtoul(buf, NULL, 10);

if (n < 100)
- s->defrag_ratio = n * 10;
+ s->remote_node_defrag_ratio = n * 10;
return length;
}
-SLAB_ATTR(defrag_ratio);
+SLAB_ATTR(remote_node_defrag_ratio);
#endif

static struct attribute * slab_attrs[] = {
@@ -4050,7 +4051,7 @@ static struct attribute * slab_attrs[] =
&cache_dma_attr.attr,
#endif
#ifdef CONFIG_NUMA
- &defrag_ratio_attr.attr,
+ &remote_node_defrag_ratio_attr.attr,
#endif
NULL
};

--


2007-11-08 14:50:55

by mel

[permalink] [raw]
Subject: Re: [patch 02/23] SLUB: Rename NUMA defrag_ratio to remote_node_defrag_ratio

On (06/11/07 17:11), Christoph Lameter didst pronounce:
> We need the defrag ratio for the non NUMA situation now. The NUMA defrag works
> by allocating objects from partial slabs on remote nodes. Rename it to
>
> remote_node_defrag_ratio
>

I'm not too keen on the defrag name here largely because I cannot tell what
it has to do with defragmention or ratios. It's really about working out
when it is better to pack objects into a remote slab than reclaim objects
from a local slab, right? It's also not clear what it is a ratio of what to
what. I thought it might be clock cycles but that isn't very clear either.
If we are renaming this can it be something like remote_packing_cost_limit ?

> to be clear about this.
>
> [This patch is already in mm]
>
> Reviewed-by: Rik van Riel <[email protected]>
> Signed-off-by: Christoph Lameter <[email protected]>
> ---
> include/linux/slub_def.h | 5 ++++-
> mm/slub.c | 17 +++++++++--------
> 2 files changed, 13 insertions(+), 9 deletions(-)
>
> Index: linux-2.6/include/linux/slub_def.h
> ===================================================================
> --- linux-2.6.orig/include/linux/slub_def.h 2007-11-06 12:34:13.000000000 -0800
> +++ linux-2.6/include/linux/slub_def.h 2007-11-06 12:36:28.000000000 -0800
> @@ -60,7 +60,10 @@ struct kmem_cache {
> #endif
>
> #ifdef CONFIG_NUMA
> - int defrag_ratio;
> + /*
> + * Defragmentation by allocating from a remote node.
> + */
> + int remote_node_defrag_ratio;

How about

/*
* When packing objects into slabs, it may become necessary to
* reclaim objects on a local slab or allocate from a remote node.
* The remote_packing_cost_limit is the maximum cost of remote
* accesses that should be paid before it becomes worthwhile to
* reclaim instead
*/
int remote_packing_cost_limit;

?

I still don't see what get_cycles() has to do with anything but this
could be because my understanding of SLUB sucks.

> struct kmem_cache_node *node[MAX_NUMNODES];
> #endif
> #ifdef CONFIG_SMP
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c 2007-11-06 12:36:16.000000000 -0800
> +++ linux-2.6/mm/slub.c 2007-11-06 12:37:25.000000000 -0800
> @@ -1345,7 +1345,8 @@ static unsigned long get_any_partial(str
> * expensive if we do it every time we are trying to find a slab
> * with available objects.
> */
> - if (!s->defrag_ratio || get_cycles() % 1024 > s->defrag_ratio)
> + if (!s->remote_node_defrag_ratio ||
> + get_cycles() % 1024 > s->remote_node_defrag_ratio)

I cannot figure out what the number of cycles currently showing on the TSC
have to do with a ratio :(. I could semi-understand if we were counting up
how many cycles were being spent trying to pack objects but that does not
appear to be the case. The comment didn't help a whole lot either. It felt
like a cost for packing, not a ratio

> return 0;
>
> zonelist = &NODE_DATA(slab_node(current->mempolicy))
> @@ -2363,7 +2364,7 @@ static int kmem_cache_open(struct kmem_c
>
> s->refcount = 1;
> #ifdef CONFIG_NUMA
> - s->defrag_ratio = 100;
> + s->remote_node_defrag_ratio = 100;
> #endif
> if (!init_kmem_cache_nodes(s, gfpflags & ~SLUB_DMA))
> goto error;
> @@ -4005,21 +4006,21 @@ static ssize_t free_calls_show(struct km
> SLAB_ATTR_RO(free_calls);
>
> #ifdef CONFIG_NUMA
> -static ssize_t defrag_ratio_show(struct kmem_cache *s, char *buf)
> +static ssize_t remote_node_defrag_ratio_show(struct kmem_cache *s, char *buf)
> {
> - return sprintf(buf, "%d\n", s->defrag_ratio / 10);
> + return sprintf(buf, "%d\n", s->remote_node_defrag_ratio / 10);
> }
>
> -static ssize_t defrag_ratio_store(struct kmem_cache *s,
> +static ssize_t remote_node_defrag_ratio_store(struct kmem_cache *s,
> const char *buf, size_t length)
> {
> int n = simple_strtoul(buf, NULL, 10);
>
> if (n < 100)
> - s->defrag_ratio = n * 10;
> + s->remote_node_defrag_ratio = n * 10;
> return length;
> }
> -SLAB_ATTR(defrag_ratio);
> +SLAB_ATTR(remote_node_defrag_ratio);
> #endif
>
> static struct attribute * slab_attrs[] = {
> @@ -4050,7 +4051,7 @@ static struct attribute * slab_attrs[] =
> &cache_dma_attr.attr,
> #endif
> #ifdef CONFIG_NUMA
> - &defrag_ratio_attr.attr,
> + &remote_node_defrag_ratio_attr.attr,
> #endif
> NULL
> };
>
> --
>

--
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-11-08 17:26:51

by Matt Mackall

[permalink] [raw]
Subject: Re: [patch 02/23] SLUB: Rename NUMA defrag_ratio to remote_node_defrag_ratio

On Thu, Nov 08, 2007 at 02:50:44PM +0000, Mel Gorman wrote:
> > - if (!s->defrag_ratio || get_cycles() % 1024 > s->defrag_ratio)
> > + if (!s->remote_node_defrag_ratio ||
> > + get_cycles() % 1024 > s->remote_node_defrag_ratio)
>
> I cannot figure out what the number of cycles currently showing on the TSC
> have to do with a ratio :(. I could semi-understand if we were counting up
> how many cycles were being spent trying to pack objects but that does not
> appear to be the case. The comment didn't help a whole lot either. It felt
> like a cost for packing, not a ratio

It's just a random number generator. And a bad one: lots of arches
return 0. And I believe at least one of them has some NUMA support.

--
Mathematics is the supreme nostalgia of our time.

2007-11-08 18:56:37

by Christoph Lameter

[permalink] [raw]
Subject: Re: [patch 02/23] SLUB: Rename NUMA defrag_ratio to remote_node_defrag_ratio

On Thu, 8 Nov 2007, Mel Gorman wrote:

> On (06/11/07 17:11), Christoph Lameter didst pronounce:
> > We need the defrag ratio for the non NUMA situation now. The NUMA defrag works
> > by allocating objects from partial slabs on remote nodes. Rename it to
> >
> > remote_node_defrag_ratio
> >
>
> I'm not too keen on the defrag name here largely because I cannot tell what
> it has to do with defragmention or ratios. It's really about working out
> when it is better to pack objects into a remote slab than reclaim objects
> from a local slab, right? It's also not clear what it is a ratio of what to
> what. I thought it might be clock cycles but that isn't very clear either.
> If we are renaming this can it be something like remote_packing_cost_limit ?

In a NUMA situation we have a choice between

1. Allocating a page from the local node (which consumes more memory and
is advantageous performance wise.

2. Not allocating from the local node but see if any other node has
available partially allocated slabs. If we allocate from them then
we save memory and reduce the amount of partial slabs on the remote
node. Thus the fragmentation ratio is reduced.

> How about
>
> /*
> * When packing objects into slabs, it may become necessary to
> * reclaim objects on a local slab or allocate from a remote node.
> * The remote_packing_cost_limit is the maximum cost of remote
> * accesses that should be paid before it becomes worthwhile to
> * reclaim instead
> */
> int remote_packing_cost_limit;
>
> ?

That is not what this is about. And the functionality has been in SLUB
since the beginning.

2007-11-08 19:16:43

by Christoph Lameter

[permalink] [raw]
Subject: Re: [patch 02/23] SLUB: Rename NUMA defrag_ratio to remote_node_defrag_ratio

On Thu, 8 Nov 2007, Matt Mackall wrote:

> > I cannot figure out what the number of cycles currently showing on the TSC
> > have to do with a ratio :(. I could semi-understand if we were counting up
> > how many cycles were being spent trying to pack objects but that does not
> > appear to be the case. The comment didn't help a whole lot either. It felt
> > like a cost for packing, not a ratio
>
> It's just a random number generator. And a bad one: lots of arches
> return 0. And I believe at least one of them has some NUMA support.

Do we have a better one? Something with minimal processing overhead? I'd
be glad to switch it.

2007-11-08 19:48:03

by Matt Mackall

[permalink] [raw]
Subject: Re: [patch 02/23] SLUB: Rename NUMA defrag_ratio to remote_node_defrag_ratio

On Thu, Nov 08, 2007 at 11:16:33AM -0800, Christoph Lameter wrote:
> On Thu, 8 Nov 2007, Matt Mackall wrote:
>
> > > I cannot figure out what the number of cycles currently showing on the TSC
> > > have to do with a ratio :(. I could semi-understand if we were counting up
> > > how many cycles were being spent trying to pack objects but that does not
> > > appear to be the case. The comment didn't help a whole lot either. It felt
> > > like a cost for packing, not a ratio
> >
> > It's just a random number generator. And a bad one: lots of arches
> > return 0. And I believe at least one of them has some NUMA support.
>
> Do we have a better one? Something with minimal processing overhead? I'd
> be glad to switch it.

Not really. drivers/char/random.c does:

__get_cpu_var(trickle_count)++ & 0xfff

for a similar purpose.

--
Mathematics is the supreme nostalgia of our time.

2007-11-08 20:01:35

by Christoph Lameter

[permalink] [raw]
Subject: Re: [patch 02/23] SLUB: Rename NUMA defrag_ratio to remote_node_defrag_ratio

On Thu, 8 Nov 2007, Matt Mackall wrote:

> Not really. drivers/char/random.c does:
>
> __get_cpu_var(trickle_count)++ & 0xfff

That is incremented on each call to add_timer_randomness. Not a high
enough resolution there. I guess I am stuck with get_cycles().

2007-11-08 20:10:57

by mel

[permalink] [raw]
Subject: Re: [patch 02/23] SLUB: Rename NUMA defrag_ratio to remote_node_defrag_ratio

On (08/11/07 10:56), Christoph Lameter didst pronounce:
> On Thu, 8 Nov 2007, Mel Gorman wrote:
>
> > On (06/11/07 17:11), Christoph Lameter didst pronounce:
> > > We need the defrag ratio for the non NUMA situation now. The NUMA defrag works
> > > by allocating objects from partial slabs on remote nodes. Rename it to
> > >
> > > remote_node_defrag_ratio
> > >
> >
> > I'm not too keen on the defrag name here largely because I cannot tell what
> > it has to do with defragmention or ratios. It's really about working out
> > when it is better to pack objects into a remote slab than reclaim objects
> > from a local slab, right? It's also not clear what it is a ratio of what to
> > what. I thought it might be clock cycles but that isn't very clear either.
> > If we are renaming this can it be something like remote_packing_cost_limit ?
>
> In a NUMA situation we have a choice between
>
> 1. Allocating a page from the local node (which consumes more memory and
> is advantageous performance wise.
>
> 2. Not allocating from the local node but see if any other node has
> available partially allocated slabs. If we allocate from them then
> we save memory and reduce the amount of partial slabs on the remote
> node. Thus the fragmentation ratio is reduced.
>

Ok, I get the logic somewhat now, thanks.

> > How about
> >
> > /*
> > * When packing objects into slabs, it may become necessary to
> > * reclaim objects on a local slab or allocate from a remote node.
> > * The remote_packing_cost_limit is the maximum cost of remote
> > * accesses that should be paid before it becomes worthwhile to
> > * reclaim instead
> > */
> > int remote_packing_cost_limit;
> >
> > ?
>
> That is not what this is about. And the functionality has been in SLUB
> since the beginning.
>

Yeah, my understanding of SLUB is crap. Sorry for the noise.

--
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-11-08 21:04:17

by Matt Mackall

[permalink] [raw]
Subject: Re: [patch 02/23] SLUB: Rename NUMA defrag_ratio to remote_node_defrag_ratio

On Thu, Nov 08, 2007 at 12:01:24PM -0800, Christoph Lameter wrote:
> On Thu, 8 Nov 2007, Matt Mackall wrote:
>
> > Not really. drivers/char/random.c does:
> >
> > __get_cpu_var(trickle_count)++ & 0xfff
>
> That is incremented on each call to add_timer_randomness. Not a high
> enough resolution there. I guess I am stuck with get_cycles().

I'm not suggesting you use trickle_count, silly. I'm suggesting you
use a similar approach.

But perhaps I should just add a lightweight RNG to random.c and be
done with it.

--
Mathematics is the supreme nostalgia of our time.

2007-11-08 21:28:42

by Christoph Lameter

[permalink] [raw]
Subject: Re: [patch 02/23] SLUB: Rename NUMA defrag_ratio to remote_node_defrag_ratio

On Thu, 8 Nov 2007, Matt Mackall wrote:

> But perhaps I should just add a lightweight RNG to random.c and be
> done with it.

It would be appreciated.

2007-11-09 00:01:47

by Matt Mackall

[permalink] [raw]
Subject: Re: [patch 02/23] SLUB: Rename NUMA defrag_ratio to remote_node_defrag_ratio

On Thu, Nov 08, 2007 at 01:28:31PM -0800, Christoph Lameter wrote:
> On Thu, 8 Nov 2007, Matt Mackall wrote:
>
> > But perhaps I should just add a lightweight RNG to random.c and be
> > done with it.
>
> It would be appreciated.

As someone pointed out privately, there's a random32() in lib/random32.c.

Unfortunately, this function is too heavy for many fast-path uses and
too weak for most other uses. I'll see if I can come up with something
better..

--
Mathematics is the supreme nostalgia of our time.