2012-10-13 16:51:28

by Richard Kennedy

Subject: [PATCH 0/2] RFC SLUB: increase range of kmalloc slab sizes

This patch increases the range of slab sizes available to kmalloc, adding
slabs halfway between the existing power-of-two sized ones, allowing slightly
more efficient use of memory.
Most of the new slabs already exist as kmem_cache slabs, so only the 1.5k, 3k & 6k
ones are entirely new.
The code in get_slab_index() is simple, optimizes well and only adds a few
instructions to the code path of dynamically sized kmallocs.
It also simplifies the slab initialisation code, as it removes the special case of the two
odd-sized slabs (96 & 192) and the need for the size_index array.

I have been running this on an x86_64 desktop machine for a few weeks without any problems,
and have not measured any significant performance difference, nothing above the noise anyway.

The new slabs (1.5k, 3k, 6k) get used by several hundred objects on desktop workloads, so this
patch has a small but useful impact on memory usage.
As the other new slabs are aliased to existing slabs, it's difficult to measure any differences.

The code should correctly support KMALLOC_MIN_SIZE and therefore work on architectures other
than x86_64, but I don't have any hardware to test it on. So if anyone feels like testing this patch
I will be interested in the results.

The patches are against v3.6.
I have only tested this on x86_64 with gcc 4.7.2.

The first patch just tidies up hardcoded constants in resiliency_test(), replacing them
with calls to kmalloc_index() so that it will still work after the kmalloc_caches array gets reordered.

The second patch adds the new slabs, updates the kmalloc code and kmem_cache_init().

This version is a drop-in replacement for the existing code, but I could make it a config option if
you prefer.

regards
Richard

Richard Kennedy (2):
SLUB: remove hard coded magic numbers from resiliency_test
SLUB: increase the range of slab sizes available to kmalloc, allowing
a somewhat more efficient use of memory.

include/linux/slub_def.h | 95 +++++++++++++-------------
mm/slub.c | 174 ++++++++++++++++++-----------------------------
2 files changed, 114 insertions(+), 155 deletions(-)

--
1.7.11.7


2012-10-13 16:51:35

by Richard Kennedy

Subject: [PATCH 1/2] SLUB: remove hard coded magic numbers from resiliency_test

Use the always inlined function kmalloc_index to translate
sizes to indexes, so that we don't have to have the slab indexes
hard coded in two places.



Signed-off-by: Richard Kennedy <[email protected]>
---
mm/slub.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 2fdd96f..804ac42 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4418,7 +4418,7 @@ static void resiliency_test(void)
printk(KERN_ERR "\n1. kmalloc-16: Clobber Redzone/next pointer"
" 0x12->0x%p\n\n", p + 16);

- validate_slab_cache(kmalloc_caches[4]);
+ validate_slab_cache(kmalloc_caches[kmalloc_index(16)]);

/* Hmmm... The next two are dangerous */
p = kzalloc(32, GFP_KERNEL);
@@ -4428,7 +4428,7 @@ static void resiliency_test(void)
printk(KERN_ERR
"If allocated object is overwritten then not detectable\n\n");

- validate_slab_cache(kmalloc_caches[5]);
+ validate_slab_cache(kmalloc_caches[kmalloc_index(32)]);
p = kzalloc(64, GFP_KERNEL);
p += 64 + (get_cycles() & 0xff) * sizeof(void *);
*p = 0x56;
@@ -4436,27 +4436,27 @@ static void resiliency_test(void)
p);
printk(KERN_ERR
"If allocated object is overwritten then not detectable\n\n");
- validate_slab_cache(kmalloc_caches[6]);
+ validate_slab_cache(kmalloc_caches[kmalloc_index(64)]);

printk(KERN_ERR "\nB. Corruption after free\n");
p = kzalloc(128, GFP_KERNEL);
kfree(p);
*p = 0x78;
printk(KERN_ERR "1. kmalloc-128: Clobber first word 0x78->0x%p\n\n", p);
- validate_slab_cache(kmalloc_caches[7]);
+ validate_slab_cache(kmalloc_caches[kmalloc_index(128)]);

p = kzalloc(256, GFP_KERNEL);
kfree(p);
p[50] = 0x9a;
printk(KERN_ERR "\n2. kmalloc-256: Clobber 50th byte 0x9a->0x%p\n\n",
p);
- validate_slab_cache(kmalloc_caches[8]);
+ validate_slab_cache(kmalloc_caches[kmalloc_index(256)]);

p = kzalloc(512, GFP_KERNEL);
kfree(p);
p[512] = 0xab;
printk(KERN_ERR "\n3. kmalloc-512: Clobber redzone 0xab->0x%p\n\n", p);
- validate_slab_cache(kmalloc_caches[9]);
+ validate_slab_cache(kmalloc_caches[kmalloc_index(512)]);
}
#else
#ifdef CONFIG_SYSFS
--
1.7.11.7

2012-10-13 16:51:42

by Richard Kennedy

Subject: [PATCH 2/2] SLUB: increase the range of slab sizes available to kmalloc, allowing a somewhat more efficient use of memory.

This patch adds new slabs sized at 1.5 * 2^n,
i.e. 24, 48, 96, 192, 384, 768, 1.5k, 3k & 6k.
Most of these already exist as kmem_cache slabs; only 1.5k, 3k & 6k are entirely new.

There is no extra overhead for statically sized kmalloc, and only a small change for dynamically sized ones.
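
For example (a sketch of the constant folding, assuming KMALLOC_MIN_SIZE == 8;
not the actual generated code):

	void *p = kmalloc(600, GFP_KERNEL);
	/*
	 * 600 is a compile-time constant, so the whole if-chain in
	 * kmalloc_index() folds to the constant 14 (the 768-byte slab)
	 * and this becomes an allocation from kmalloc_caches[14] --
	 * a single constant array lookup, the same cost as before.
	 */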

Minimal code changes:
 - a new list of slab sizes
 - new kmalloc functions
 - re-factored slub initialisation

Other than that the core slub code remains unchanged.

No measurable significant performance difference, nothing above the noise anyway.

Only tested on x86_64, so may not work on other architectures.
The code should support the use of KMALLOC_MIN_SIZE, but I have no hardware to test this on.

compiled with gcc 4.7.2
patch against 3.6

Signed-off-by: Richard Kennedy <[email protected]>
---
include/linux/slub_def.h | 95 +++++++++++++--------------
mm/slub.c | 162 ++++++++++++++++++-----------------------------
2 files changed, 108 insertions(+), 149 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index df448ad..3c36009 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -143,62 +143,63 @@ struct kmem_cache {
#endif

/*
- * We keep the general caches in an array of slab caches that are used for
- * 2^x bytes of allocations.
+ * the table of slab sizes, should handle all of the arch cases
+ * -- but only tested on x86_64 -- caveat emptor
*/
-extern struct kmem_cache *kmalloc_caches[SLUB_PAGE_SHIFT];

-/*
- * Sorry that the following has to be that ugly but some versions of GCC
- * have trouble with constant propagation and loops.
+static const short __slab_sizes[] = {0, 8, 12, 16, 24, 32, 48, 64, 96,
+ 128, 192, 256, 384, 512, 768, 1024,
+ 1536, 2048, 3072, 4096, 6144, 8192};
+
+/* very ugly, but gcc will optimize these away.
+ * the comment on the original function says :-
+ * > Sorry that the following has to be that ugly but some versions of GCC
+ * > have trouble with constant propagation and loops.
+ * so a simple loop just won't work.
+ *
+ * Don't allow anything smaller than, or not aligned to KMALLOC_MIN_SIZE
*/
-static __always_inline int kmalloc_index(size_t size)
+static __always_inline short __test_slab_size(size_t size, short index)
{
- if (!size)
+ if (__slab_sizes[index] % KMALLOC_MIN_SIZE ||
+ __slab_sizes[index] < size)
return 0;
+ return 1;
+}
+
+static __always_inline short kmalloc_index(size_t size)
+{
+ if (!size) return 0;
+ if (__test_slab_size(size, 1)) return 1;
+ if (__test_slab_size(size, 2)) return 2;
+ if (__test_slab_size(size, 3)) return 3;
+ if (__test_slab_size(size, 4)) return 4;
+ if (__test_slab_size(size, 5)) return 5;
+ if (__test_slab_size(size, 6)) return 6;
+ if (__test_slab_size(size, 7)) return 7;
+ if (__test_slab_size(size, 8)) return 8;
+ if (__test_slab_size(size, 9)) return 9;
+ if (__test_slab_size(size, 10)) return 10;
+ if (__test_slab_size(size, 11)) return 11;
+ if (__test_slab_size(size, 12)) return 12;
+ if (__test_slab_size(size, 13)) return 13;
+ if (__test_slab_size(size, 14)) return 14;
+ if (__test_slab_size(size, 15)) return 15;
+ if (__test_slab_size(size, 16)) return 16;
+ if (__test_slab_size(size, 17)) return 17;
+ if (__test_slab_size(size, 18)) return 18;
+ if (__test_slab_size(size, 19)) return 19;
+ if (__test_slab_size(size, 20)) return 20;
+ if (__test_slab_size(size, 21)) return 21;

- if (size <= KMALLOC_MIN_SIZE)
- return KMALLOC_SHIFT_LOW;
-
- if (KMALLOC_MIN_SIZE <= 32 && size > 64 && size <= 96)
- return 1;
- if (KMALLOC_MIN_SIZE <= 64 && size > 128 && size <= 192)
- return 2;
- if (size <= 8) return 3;
- if (size <= 16) return 4;
- if (size <= 32) return 5;
- if (size <= 64) return 6;
- if (size <= 128) return 7;
- if (size <= 256) return 8;
- if (size <= 512) return 9;
- if (size <= 1024) return 10;
- if (size <= 2 * 1024) return 11;
- if (size <= 4 * 1024) return 12;
-/*
- * The following is only needed to support architectures with a larger page
- * size than 4k. We need to support 2 * PAGE_SIZE here. So for a 64k page
- * size we would have to go up to 128k.
- */
- if (size <= 8 * 1024) return 13;
- if (size <= 16 * 1024) return 14;
- if (size <= 32 * 1024) return 15;
- if (size <= 64 * 1024) return 16;
- if (size <= 128 * 1024) return 17;
- if (size <= 256 * 1024) return 18;
- if (size <= 512 * 1024) return 19;
- if (size <= 1024 * 1024) return 20;
- if (size <= 2 * 1024 * 1024) return 21;
BUG();
- return -1; /* Will never be reached */
+ return -1;
+}

/*
- * What we really wanted to do and cannot do because of compiler issues is:
- * int i;
- * for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++)
- * if (size <= (1 << i))
- * return i;
+ * We keep the general caches in an array of slab caches
*/
-}
+extern struct kmem_cache *kmalloc_caches[ARRAY_SIZE(__slab_sizes)];

/*
* Find the slab cache for a given combination of allocation flags and size.
@@ -208,7 +209,7 @@ static __always_inline int kmalloc_index(size_t size)
*/
static __always_inline struct kmem_cache *kmalloc_slab(size_t size)
{
- int index = kmalloc_index(size);
+ short index = kmalloc_index(size);

if (index == 0)
return NULL;
diff --git a/mm/slub.c b/mm/slub.c
index 804ac42..5949f78 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3223,13 +3223,13 @@ EXPORT_SYMBOL(kmem_cache_destroy);
* Kmalloc subsystem
*******************************************************************/

-struct kmem_cache *kmalloc_caches[SLUB_PAGE_SHIFT];
+struct kmem_cache *kmalloc_caches[ARRAY_SIZE(__slab_sizes)];
EXPORT_SYMBOL(kmalloc_caches);

static struct kmem_cache *kmem_cache;

#ifdef CONFIG_ZONE_DMA
-static struct kmem_cache *kmalloc_dma_caches[SLUB_PAGE_SHIFT];
+static struct kmem_cache *kmalloc_dma_caches[ARRAY_SIZE(__slab_sizes)];
#endif

static int __init setup_slub_min_order(char *str)
@@ -3291,55 +3291,36 @@ panic:
return NULL;
}

-/*
- * Conversion table for small slabs sizes / 8 to the index in the
- * kmalloc array. This is necessary for slabs < 192 since we have non power
- * of two cache sizes there. The size of larger slabs can be determined using
- * fls.
- */
-static s8 size_index[24] = {
- 3, /* 8 */
- 4, /* 16 */
- 5, /* 24 */
- 5, /* 32 */
- 6, /* 40 */
- 6, /* 48 */
- 6, /* 56 */
- 6, /* 64 */
- 1, /* 72 */
- 1, /* 80 */
- 1, /* 88 */
- 1, /* 96 */
- 7, /* 104 */
- 7, /* 112 */
- 7, /* 120 */
- 7, /* 128 */
- 2, /* 136 */
- 2, /* 144 */
- 2, /* 152 */
- 2, /* 160 */
- 2, /* 168 */
- 2, /* 176 */
- 2, /* 184 */
- 2 /* 192 */
-};
-
-static inline int size_index_elem(size_t bytes)
+static __always_inline short get_slab_index(size_t size)
{
- return (bytes - 1) / 8;
+ short index;
+
+ /*
+ * All slabs must be aligned to KMALLOC_MIN_SIZE
+ * so disallow the half sized slab between MIN & 2*MIN
+ */
+ if (size <= KMALLOC_MIN_SIZE * 2) {
+ if (size > KMALLOC_MIN_SIZE)
+ return kmalloc_index(KMALLOC_MIN_SIZE * 2);
+ return kmalloc_index(KMALLOC_MIN_SIZE);
+ }
+
+ size--;
+ index = fls(size);
+ if (test_bit(index - 2, &size))
+ return (index - 3) * 2 + 1;
+ return (index - 3) * 2;
}

+
static struct kmem_cache *get_slab(size_t size, gfp_t flags)
{
- int index;
+ short index;

- if (size <= 192) {
- if (!size)
- return ZERO_SIZE_PTR;
+ if (!size)
+ return ZERO_SIZE_PTR;

- index = size_index[size_index_elem(size)];
- } else
- index = fls(size - 1);
+ index = get_slab_index(size);

#ifdef CONFIG_ZONE_DMA
if (unlikely((flags & SLUB_DMA)))
@@ -3715,6 +3696,20 @@ void __init kmem_cache_init(void)
struct kmem_cache *temp_kmem_cache_node;
unsigned long kmalloc_size;

+ /*
+ * do a simple smoke test to verify that get_slab_index is ok
+ * and matches __slab_sizes correctly.
+ * note i == 0 is unused
+ */
+ for (i = 1; i < ARRAY_SIZE(__slab_sizes); i++) {
+ short index;
+ if (__slab_sizes[i] % KMALLOC_MIN_SIZE)
+ continue;
+ index = get_slab_index(__slab_sizes[i]);
+ BUG_ON(index != i);
+ }
+
+
if (debug_guardpage_minorder())
slub_max_order = 0;

@@ -3782,64 +3777,29 @@ void __init kmem_cache_init(void)
BUILD_BUG_ON(KMALLOC_MIN_SIZE > 256 ||
(KMALLOC_MIN_SIZE & (KMALLOC_MIN_SIZE - 1)));

- for (i = 8; i < KMALLOC_MIN_SIZE; i += 8) {
- int elem = size_index_elem(i);
- if (elem >= ARRAY_SIZE(size_index))
- break;
- size_index[elem] = KMALLOC_SHIFT_LOW;
- }

- if (KMALLOC_MIN_SIZE == 64) {
- /*
- * The 96 byte size cache is not used if the alignment
- * is 64 byte.
- */
- for (i = 64 + 8; i <= 96; i += 8)
- size_index[size_index_elem(i)] = 7;
- } else if (KMALLOC_MIN_SIZE == 128) {
- /*
- * The 192 byte sized cache is not used if the alignment
- * is 128 byte. Redirect kmalloc to use the 256 byte cache
- * instead.
- */
- for (i = 128 + 8; i <= 192; i += 8)
- size_index[size_index_elem(i)] = 8;
- }
-
- /* Caches that are not of the two-to-the-power-of size */
- if (KMALLOC_MIN_SIZE <= 32) {
- kmalloc_caches[1] = create_kmalloc_cache("kmalloc-96", 96, 0);
- caches++;
- }
-
- if (KMALLOC_MIN_SIZE <= 64) {
- kmalloc_caches[2] = create_kmalloc_cache("kmalloc-192", 192, 0);
- caches++;
- }
-
- for (i = KMALLOC_SHIFT_LOW; i < SLUB_PAGE_SHIFT; i++) {
- kmalloc_caches[i] = create_kmalloc_cache("kmalloc", 1 << i, 0);
+ for (i = 1; i < ARRAY_SIZE(__slab_sizes); i++) {
+ if (__slab_sizes[i] % KMALLOC_MIN_SIZE) {
+ kmalloc_caches[i] = 0;
+ continue;
+ }
+ kmalloc_caches[i] = create_kmalloc_cache("kmalloc",
+ __slab_sizes[i], 0);
caches++;
}

slab_state = UP;

/* Provide the correct kmalloc names now that the caches are up */
- if (KMALLOC_MIN_SIZE <= 32) {
- kmalloc_caches[1]->name = kstrdup(kmalloc_caches[1]->name, GFP_NOWAIT);
- BUG_ON(!kmalloc_caches[1]->name);
- }
-
- if (KMALLOC_MIN_SIZE <= 64) {
- kmalloc_caches[2]->name = kstrdup(kmalloc_caches[2]->name, GFP_NOWAIT);
- BUG_ON(!kmalloc_caches[2]->name);
- }

- for (i = KMALLOC_SHIFT_LOW; i < SLUB_PAGE_SHIFT; i++) {
- char *s = kasprintf(GFP_NOWAIT, "kmalloc-%d", 1 << i);
-
- BUG_ON(!s);
- kmalloc_caches[i]->name = s;
+ for (i = 1; i < ARRAY_SIZE(kmalloc_caches); i++) {
+ struct kmem_cache *slab = kmalloc_caches[i];
+ if (slab && slab->size) {
+ char *name = kasprintf(GFP_NOWAIT, "kmalloc-%d",
+ slab->object_size);
+ BUG_ON(!name);
+ slab->name = name;
+ }
}

#ifdef CONFIG_SMP
@@ -3847,16 +3807,14 @@ void __init kmem_cache_init(void)
#endif

#ifdef CONFIG_ZONE_DMA
- for (i = 0; i < SLUB_PAGE_SHIFT; i++) {
- struct kmem_cache *s = kmalloc_caches[i];
-
- if (s && s->size) {
- char *name = kasprintf(GFP_NOWAIT,
- "dma-kmalloc-%d", s->object_size);
-
+ for (i = 1; i < ARRAY_SIZE(kmalloc_caches); i++) {
+ struct kmem_cache *slab = kmalloc_caches[i];
+ if (slab && slab->size) {
+ char *name = kasprintf(GFP_NOWAIT, "dma-kmalloc-%d",
+ slab->object_size);
BUG_ON(!name);
kmalloc_dma_caches[i] = create_kmalloc_cache(name,
- s->object_size, SLAB_CACHE_DMA);
+ slab->object_size, SLAB_CACHE_DMA);
}
}
#endif
--
1.7.11.7

2012-10-13 23:46:21

by Andi Kleen

Subject: Re: [PATCH 0/2] RFC SLUB: increase range of kmalloc slab sizes

Richard Kennedy <[email protected]> writes:

> This patch increases the range of slab sizes available to kmalloc, adding
> slabs halfway between the existing power-of-two sized ones, allowing slightly
> more efficient use of memory.
> Most of the new slabs already exist as kmem_cache slabs, so only the 1.5k, 3k & 6k
> ones are entirely new.

I'm not sure what order slab/slub use by default these days, but for
order 0 none of your new sizes sound like a winner:

4K / 1.5K = 2 = 4K / 2K
4K / 3K   = 1 = 4K / 4K
8K / 6K   = 1 = 8K / 8K
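
A quick way to see it (a rough sketch, ignoring SLUB's per-slab metadata and
assuming a 4K base page):

#include <stdio.h>

int main(void)
{
	/* the three entirely new sizes and their power-of-two neighbours */
	const int new_sizes[]  = { 1536, 3072, 6144 };
	const int pow2_sizes[] = { 2048, 4096, 8192 };
	int order, i;

	for (order = 0; order <= 3; order++) {
		int slab_bytes = 4096 << order;

		printf("order %d (%5d bytes):\n", order, slab_bytes);
		for (i = 0; i < 3; i++)
			printf("  %d bytes: %d per slab  vs  %d bytes: %d per slab\n",
			       new_sizes[i], slab_bytes / new_sizes[i],
			       pow2_sizes[i], slab_bytes / pow2_sizes[i]);
	}
	return 0;
}

At order 0 none of the new sizes packs more objects per slab than the next
power of two; they only start to win at higher orders (order 1 for 1.5k,
order 2 for 3k, order 3 for 6k).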

I think you need better data that it actually saves memory with some
reproducible workloads.

Revisiting the sizes is a good idea -- the original Bonwick slab paper
explicitly recommended against powers of two -- but I think it needs a
more data-driven process to actually select better ones than what you
used.

Most likely the best fits are different between 32-bit and 64-bit,
and will also change occasionally as kernel data structures grow
(or, rarely, shrink).

In fact I suspect it would be an interesting option for feedback
control for embedded kernels - measure the workload, then reboot/recompile with
slab sizes that fit the workload well.

So I think before trying any of this you need a good slab profiler
and a good set of workloads.

-Andi

--
[email protected] -- Speaking for myself only

Subject: Re: [PATCH 2/2] SLUB: increase the range of slab sizes available to kmalloc, allowing a somewhat more efficient use of memory.

On Sat, 13 Oct 2012, Richard Kennedy wrote:

> -extern struct kmem_cache *kmalloc_caches[SLUB_PAGE_SHIFT];
>
> -/*
> - * Sorry that the following has to be that ugly but some versions of GCC
> - * have trouble with constant propagation and loops.
> +static const short __slab_sizes[] = {0, 8, 12, 16, 24, 32, 48, 64, 96,
> + 128, 192, 256, 384, 512, 768, 1024,
> + 1536, 2048, 3072, 4096, 6144, 8192};
> +

Urg. No thanks. What is the exact benefit of this patch?

Subject: Re: [PATCH 1/2] SLUB: remove hard coded magic numbers from resiliency_test

On Sat, 13 Oct 2012, Richard Kennedy wrote:

> Use the always inlined function kmalloc_index to translate
> sizes to indexes, so that we don't have to have the slab indexes
> hard coded in two places.

Acked-by: Christoph Lameter <[email protected]>

2012-10-16 00:53:20

by David Rientjes

Subject: Re: [PATCH 1/2] SLUB: remove hard coded magic numbers from resiliency_test

On Mon, 15 Oct 2012, Christoph Lameter wrote:

> > Use the always inlined function kmalloc_index to translate
> > sizes to indexes, so that we don't have to have the slab indexes
> > hard coded in two places.
>
> Acked-by: Christoph Lameter <[email protected]>
>

Shouldn't this be using get_slab() instead?

Subject: Re: [PATCH 1/2] SLUB: remove hard coded magic numbers from resiliency_test

On Mon, 15 Oct 2012, David Rientjes wrote:

> On Mon, 15 Oct 2012, Christoph Lameter wrote:
>
> > > Use the always inlined function kmalloc_index to translate
> > > sizes to indexes, so that we don't have to have the slab indexes
> > > hard coded in two places.
> >
> > Acked-by: Christoph Lameter <[email protected]>
> >
>
> Shouldn't this be using get_slab() instead?

Right, that would be even better.