2013-09-06 05:57:48

by Joonsoo Kim

Subject: [REPOST PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab

* THIS IS JUST A REPOST, AT THE MAINTAINER'S REQUEST *

* Changes from the original post
Correct the placement of the results.
Attach more results on cache misses and elapsed time for a hackbench test.

-----------------------------------------------------
This patchset implements byte sized indexes for the freelist of a slab.

Currently, the freelist of a slab consists of unsigned int sized indexes.
Most slabs have fewer than 256 objects, so much of this space is wasted.
To reduce this overhead, this patchset implements byte sized indexes for
the freelist of a slab. With it, we can save 3 bytes per object.

This introduces one likely branch into the functions used for setting/getting
objects to/from the freelist, but we may get more benefit from this change.
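
To illustrate the idea outside the kernel, here is a minimal user-space
sketch (not the actual patch; struct fake_slab and the fixed 256-object
cutoff are just stand-ins for what the real code does with cachep->num):

#include <stdio.h>
#include <stdlib.h>

struct fake_slab {
	unsigned int nr_objs;	/* objects per slab */
	void *freelist;		/* array of free-object indexes */
};

static int can_byte_index(unsigned int nr_objs)
{
	return nr_objs <= 256;	/* every index fits in an unsigned char */
}

static unsigned int get_free_obj(struct fake_slab *s, unsigned int idx)
{
	if (can_byte_index(s->nr_objs))	/* the "likely" branch */
		return ((unsigned char *)s->freelist)[idx];
	return ((unsigned int *)s->freelist)[idx];
}

static void set_free_obj(struct fake_slab *s, unsigned int idx,
			 unsigned int val)
{
	if (can_byte_index(s->nr_objs))
		((unsigned char *)s->freelist)[idx] = (unsigned char)val;
	else
		((unsigned int *)s->freelist)[idx] = val;
}

int main(void)
{
	struct fake_slab s = { .nr_objs = 128, .freelist = NULL };
	unsigned int i;

	s.freelist = malloc(s.nr_objs);	/* 1 byte per index instead of 4 */
	for (i = 0; i < s.nr_objs; i++)
		set_free_obj(&s, i, i);
	printf("free object at slot 100: %u\n", get_free_obj(&s, 100));
	free(s.freelist);
	return 0;
}

The real patches make the same decision based on cachep->num, as shown
in patch 3/4.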

Below are some 'cat /proc/slabinfo' numbers for the baseline, my previous
posting, and this patchset.


* Before *
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
kmalloc-512 527 600 512 8 1 : tunables 54 27 0 : slabdata 75 75 0
kmalloc-256 210 210 256 15 1 : tunables 120 60 0 : slabdata 14 14 0
kmalloc-192 1040 1040 192 20 1 : tunables 120 60 0 : slabdata 52 52 0
kmalloc-96 750 750 128 30 1 : tunables 120 60 0 : slabdata 25 25 0
kmalloc-64 2773 2773 64 59 1 : tunables 120 60 0 : slabdata 47 47 0
kmalloc-128 660 690 128 30 1 : tunables 120 60 0 : slabdata 23 23 0
kmalloc-32 11200 11200 32 112 1 : tunables 120 60 0 : slabdata 100 100 0
kmem_cache 197 200 192 20 1 : tunables 120 60 0 : slabdata 10 10 0

* After my previous posting(overload struct slab over struct page) *
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
kmalloc-512 525 640 512 8 1 : tunables 54 27 0 : slabdata 80 80 0
kmalloc-256 210 210 256 15 1 : tunables 120 60 0 : slabdata 14 14 0
kmalloc-192 1016 1040 192 20 1 : tunables 120 60 0 : slabdata 52 52 0
kmalloc-96 560 620 128 31 1 : tunables 120 60 0 : slabdata 20 20 0
kmalloc-64 2148 2280 64 60 1 : tunables 120 60 0 : slabdata 38 38 0
kmalloc-128 647 682 128 31 1 : tunables 120 60 0 : slabdata 22 22 0
kmalloc-32 11360 11413 32 113 1 : tunables 120 60 0 : slabdata 101 101 0
kmem_cache 197 200 192 20 1 : tunables 120 60 0 : slabdata 10 10 0

kmem_caches whose objects are 128 bytes or smaller hold one more object
per slab. You can see this in the objperslab column.

We can improve further with this patchset.

* My previous posting + this patchset *
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
kmalloc-512 521 648 512 8 1 : tunables 54 27 0 : slabdata 81 81 0
kmalloc-256 208 208 256 16 1 : tunables 120 60 0 : slabdata 13 13 0
kmalloc-192 1029 1029 192 21 1 : tunables 120 60 0 : slabdata 49 49 0
kmalloc-96 529 589 128 31 1 : tunables 120 60 0 : slabdata 19 19 0
kmalloc-64 2142 2142 64 63 1 : tunables 120 60 0 : slabdata 34 34 0
kmalloc-128 660 682 128 31 1 : tunables 120 60 0 : slabdata 22 22 0
kmalloc-32 11716 11780 32 124 1 : tunables 120 60 0 : slabdata 95 95 0
kmem_cache 197 210 192 21 1 : tunables 120 60 0 : slabdata 10 10 0

kmem_caches whose objects are 256 bytes or smaller hold one or more extra
objects per slab than before. In the case of kmalloc-32, we get 11 more
objects, so 352 bytes (11 * 32) are saved per slab, which is roughly a 9%
saving of memory. Of course, this percentage decreases as the number of
objects in a slab decreases.
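
For kmalloc-32 the arithmetic is simply:

  objects per 4096 byte slab:  113 -> 124  (11 more objects)
  memory recovered per slab:   11 * 32 = 352 bytes
  fraction of the page:        352 / 4096 = ~8.6%, i.e. roughly 9%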



Here are the performance results on my 4-CPU machine.

* Before *

Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

238,309,671 cache-misses ( +- 0.40% )

12.010172090 seconds time elapsed ( +- 0.21% )

* After my previous posting *

Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

229,945,138 cache-misses ( +- 0.23% )

11.627897174 seconds time elapsed ( +- 0.14% )

* My previous posting + this patchset *

Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

218,640,472 cache-misses ( +- 0.42% )

11.504999837 seconds time elapsed ( +- 0.21% )


Cache misses are reduced by each patchset, roughly 5% per step.
Elapsed time also improves by 3.1% and 4.2% relative to the baseline, respectively.

I think both patchsets deserve to be merged, since they reduce memory usage and
also improve performance. :)

Please let me know the experts' opinions :)
Thanks.

This patchset is based on Christoph's idea.
https://lkml.org/lkml/2013/8/23/315

Patches are on top of my previous posting.
https://lkml.org/lkml/2013/8/22/137

Joonsoo Kim (4):
slab: factor out calculate nr objects in cache_estimate
slab: introduce helper functions to get/set free object
slab: introduce byte sized index for the freelist of a slab
slab: make more slab management structure off the slab

mm/slab.c | 138 +++++++++++++++++++++++++++++++++++++++++++++----------------
1 file changed, 103 insertions(+), 35 deletions(-)

--
1.7.9.5


2013-09-06 05:57:49

by Joonsoo Kim

Subject: [REPOST PATCH 1/4] slab: factor out calculate nr objects in cache_estimate

This logic is not simple to understand, so factor it out into a separate
function to improve readability. Additionally, the following patch uses this
change to let the freelist use a differently sized index depending on the
number of objects.
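
As a concrete example, take the kmalloc-32 numbers from the cover letter
(4096 byte slab, 32 byte objects, unsigned int indexes, small alignment
assumed): the initial guess is 4096 / (32 + 4) = 113 objects, and the
remaining 4096 - 113 * 32 = 480 bytes still cover the aligned
113 * 4 = 452 byte freelist, so nothing is subtracted and
calculate_nr_objs() returns 113, which matches objperslab = 113 in the
cover letter's "previous posting" table.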

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index f3868fe..9d4bad5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -565,9 +565,31 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
return cachep->array[smp_processor_id()];
}

-static size_t slab_mgmt_size(size_t nr_objs, size_t align)
+static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
+ size_t idx_size, size_t align)
{
- return ALIGN(nr_objs * sizeof(unsigned int), align);
+ int nr_objs;
+ size_t freelist_size;
+
+ /*
+ * Ignore padding for the initial guess. The padding
+ * is at most @align-1 bytes, and @buffer_size is at
+ * least @align. In the worst case, this result will
+ * be one greater than the number of objects that fit
+ * into the memory allocation when taking the padding
+ * into account.
+ */
+ nr_objs = slab_size / (buffer_size + idx_size);
+
+ /*
+ * This calculated number will be either the right
+ * amount, or one greater than what we want.
+ */
+ freelist_size = slab_size - nr_objs * buffer_size;
+ if (freelist_size < ALIGN(nr_objs * idx_size, align))
+ nr_objs--;
+
+ return nr_objs;
}

/*
@@ -600,28 +622,12 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
nr_objs = slab_size / buffer_size;

} else {
- /*
- * Ignore padding for the initial guess. The padding
- * is at most @align-1 bytes, and @buffer_size is at
- * least @align. In the worst case, this result will
- * be one greater than the number of objects that fit
- * into the memory allocation when taking the padding
- * into account.
- */
- nr_objs = (slab_size) / (buffer_size + sizeof(unsigned int));
-
- /*
- * This calculated number will be either the right
- * amount, or one greater than what we want.
- */
- if (slab_mgmt_size(nr_objs, align) + nr_objs*buffer_size
- > slab_size)
- nr_objs--;
-
- mgmt_size = slab_mgmt_size(nr_objs, align);
+ nr_objs = calculate_nr_objs(slab_size, buffer_size,
+ sizeof(unsigned int), align);
+ mgmt_size = ALIGN(nr_objs * sizeof(unsigned int), align);
}
*num = nr_objs;
- *left_over = slab_size - nr_objs*buffer_size - mgmt_size;
+ *left_over = slab_size - (nr_objs * buffer_size) - mgmt_size;
}

#if DEBUG
--
1.7.9.5

2013-09-06 05:57:59

by Joonsoo Kim

Subject: [REPOST PATCH 4/4] slab: make more slab management structure off the slab

Now that the freelist used for slab management has shrunk, an on-slab
management structure can waste a large amount of space when the objects
in the slab are large.

Consider a slab with 128 byte sized objects. If on-slab management is used,
31 objects fit in the slab. The freelist for this case takes 31 bytes, so
97 bytes, that is, more than 75% of the object size, are wasted.

In the 64 byte object case, no space is wasted with on-slab management.
So lower the threshold for off-slab management to 128 bytes.
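
The numbers work out as follows, assuming a 4096 byte page and byte sized
indexes: 4096 / (128 + 1) = 31 objects fit, 31 * 128 = 3968 bytes go to the
objects and 31 bytes to the freelist, which leaves 4096 - 3968 - 31 = 97
bytes unused, more than 75% of a single 128 byte object. With 64 byte
objects, 63 objects plus a 63 byte freelist fill the page almost exactly,
so keeping the freelist on-slab costs nothing there.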

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index bd366e5..d01a2f0 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2277,7 +2277,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
* it too early on. Always use on-slab management when
* SLAB_NOLEAKTRACE to avoid recursive calls into kmemleak)
*/
- if ((size >= (PAGE_SIZE >> 3)) && !slab_early_init &&
+ if ((size >= (PAGE_SIZE >> 5)) && !slab_early_init &&
!(flags & SLAB_NOLEAKTRACE))
/*
* Size is large, assume best to place the slab management obj
--
1.7.9.5

2013-09-06 05:58:27

by Joonsoo Kim

Subject: [REPOST PATCH 3/4] slab: introduce byte sized index for the freelist of a slab

Currently, the freelist of a slab consists of unsigned int sized indexes.
Most slabs have fewer than 256 objects, since the page order is restricted
to at most 1 in the default configuration. For example, consider a slab
consisting of 32 byte sized objects on two contiguous pages. In this case,
256 objects are possible, and this number fits into byte sized indexes.
256 objects is the maximum possible value in the default configuration,
since 32 bytes is the minimum object size in SLAB (8192 / 32 = 256).
Therefore, if we use byte sized indexes, we can save 3 bytes per object.

This introduces one likely branch into the functions used for setting/getting
objects to/from the freelist, but we may get more benefit from this change.
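
As a concrete example, take the kmalloc-64 numbers from the cover letter:
with unsigned int indexes a 4096 byte slab holds 4096 / (64 + 4) = 60
objects, while with byte indexes it holds 4096 / (64 + 1) = 63, since the
leftover 4096 - 63 * 64 = 64 bytes are enough for the 63 byte freelist.
That is three extra objects per page, matching objperslab going from 60
to 63 in the cover letter.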

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index a0e49bb..bd366e5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -565,8 +565,16 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
return cachep->array[smp_processor_id()];
}

-static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
- size_t idx_size, size_t align)
+static inline bool can_byte_index(int nr_objs)
+{
+ if (likely(nr_objs <= (sizeof(unsigned char) << 8)))
+ return true;
+
+ return false;
+}
+
+static int __calculate_nr_objs(size_t slab_size, size_t buffer_size,
+ unsigned int idx_size, size_t align)
{
int nr_objs;
size_t freelist_size;
@@ -592,6 +600,29 @@ static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
return nr_objs;
}

+static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
+ size_t align)
+{
+ int nr_objs;
+ int byte_nr_objs;
+
+ nr_objs = __calculate_nr_objs(slab_size, buffer_size,
+ sizeof(unsigned int), align);
+ if (!can_byte_index(nr_objs))
+ return nr_objs;
+
+ byte_nr_objs = __calculate_nr_objs(slab_size, buffer_size,
+ sizeof(unsigned char), align);
+ /*
+ * nr_objs can be larger when using byte index,
+ * so that it cannot be indexed by byte index.
+ */
+ if (can_byte_index(byte_nr_objs))
+ return byte_nr_objs;
+ else
+ return nr_objs;
+}
+
/*
* Calculate the number of objects and left-over bytes for a given buffer size.
*/
@@ -618,13 +649,18 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
* correct alignment when allocated.
*/
if (flags & CFLGS_OFF_SLAB) {
- mgmt_size = 0;
nr_objs = slab_size / buffer_size;
+ mgmt_size = 0;

} else {
- nr_objs = calculate_nr_objs(slab_size, buffer_size,
- sizeof(unsigned int), align);
- mgmt_size = ALIGN(nr_objs * sizeof(unsigned int), align);
+ nr_objs = calculate_nr_objs(slab_size, buffer_size, align);
+ if (can_byte_index(nr_objs)) {
+ mgmt_size =
+ ALIGN(nr_objs * sizeof(unsigned char), align);
+ } else {
+ mgmt_size =
+ ALIGN(nr_objs * sizeof(unsigned int), align);
+ }
}
*num = nr_objs;
*left_over = slab_size - (nr_objs * buffer_size) - mgmt_size;
@@ -2012,7 +2048,10 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
* looping condition in cache_grow().
*/
offslab_limit = size;
- offslab_limit /= sizeof(unsigned int);
+ if (can_byte_index(num))
+ offslab_limit /= sizeof(unsigned char);
+ else
+ offslab_limit /= sizeof(unsigned int);

if (num > offslab_limit)
break;
@@ -2253,8 +2292,13 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
if (!cachep->num)
return -E2BIG;

- freelist_size =
- ALIGN(cachep->num * sizeof(unsigned int), cachep->align);
+ if (can_byte_index(cachep->num)) {
+ freelist_size = ALIGN(cachep->num * sizeof(unsigned char),
+ cachep->align);
+ } else {
+ freelist_size = ALIGN(cachep->num * sizeof(unsigned int),
+ cachep->align);
+ }

/*
* If the slab has been placed off-slab, and we have enough space then
@@ -2267,7 +2311,10 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)

if (flags & CFLGS_OFF_SLAB) {
/* really off slab. No need for manual alignment */
- freelist_size = cachep->num * sizeof(unsigned int);
+ if (can_byte_index(cachep->num))
+ freelist_size = cachep->num * sizeof(unsigned char);
+ else
+ freelist_size = cachep->num * sizeof(unsigned int);

#ifdef CONFIG_PAGE_POISONING
/* If we're going to use the generic kernel_map_pages()
@@ -2545,15 +2592,22 @@ static struct freelist *alloc_slabmgmt(struct kmem_cache *cachep,
return freelist;
}

-static inline unsigned int get_free_obj(struct page *page, unsigned int idx)
+static inline unsigned int get_free_obj(struct kmem_cache *cachep,
+ struct page *page, unsigned int idx)
{
- return ((unsigned int *)page->freelist)[idx];
+ if (likely(can_byte_index(cachep->num)))
+ return ((unsigned char *)page->freelist)[idx];
+ else
+ return ((unsigned int *)page->freelist)[idx];
}

-static inline void set_free_obj(struct page *page,
+static inline void set_free_obj(struct kmem_cache *cachep, struct page *page,
unsigned int idx, unsigned int val)
{
- ((unsigned int *)(page->freelist))[idx] = val;
+ if (likely(can_byte_index(cachep->num)))
+ ((unsigned char *)(page->freelist))[idx] = (unsigned char)val;
+ else
+ ((unsigned int *)(page->freelist))[idx] = val;
}

static void cache_init_objs(struct kmem_cache *cachep,
@@ -2598,7 +2652,7 @@ static void cache_init_objs(struct kmem_cache *cachep,
if (cachep->ctor)
cachep->ctor(objp);
#endif
- set_free_obj(page, i, i);
+ set_free_obj(cachep, page, i, i);
}
}

@@ -2615,9 +2669,11 @@ static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
int nodeid)
{
+ unsigned int index;
void *objp;

- objp = index_to_obj(cachep, page, get_free_obj(page, page->active));
+ index = get_free_obj(cachep, page, page->active);
+ objp = index_to_obj(cachep, page, index);
page->active++;
#if DEBUG
WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
@@ -2638,7 +2694,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,

/* Verify double free bug */
for (i = page->active; i < cachep->num; i++) {
- if (get_free_obj(page, i) == objnr) {
+ if (get_free_obj(cachep, page, i) == objnr) {
printk(KERN_ERR "slab: double free detected in cache "
"'%s', objp %p\n", cachep->name, objp);
BUG();
@@ -2646,7 +2702,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
}
#endif
page->active--;
- set_free_obj(page, page->active, objnr);
+ set_free_obj(cachep, page, page->active, objnr);
}

/*
@@ -4220,7 +4276,7 @@ static void handle_slab(unsigned long *n, struct kmem_cache *c,

for (j = page->active; j < c->num; j++) {
/* Skip freed item */
- if (get_free_obj(page, j) == i) {
+ if (get_free_obj(c, page, j) == i) {
active = false;
break;
}
--
1.7.9.5

2013-09-06 05:58:53

by Joonsoo Kim

Subject: [REPOST PATCH 2/4] slab: introduce helper functions to get/set free object

In the following patches, getting/setting free objects from the freelist
is changed so that a simple cast no longer works. Therefore, introduce
helper functions.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index 9d4bad5..a0e49bb 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2545,9 +2545,15 @@ static struct freelist *alloc_slabmgmt(struct kmem_cache *cachep,
return freelist;
}

-static inline unsigned int *slab_freelist(struct page *page)
+static inline unsigned int get_free_obj(struct page *page, unsigned int idx)
{
- return (unsigned int *)(page->freelist);
+ return ((unsigned int *)page->freelist)[idx];
+}
+
+static inline void set_free_obj(struct page *page,
+ unsigned int idx, unsigned int val)
+{
+ ((unsigned int *)(page->freelist))[idx] = val;
}

static void cache_init_objs(struct kmem_cache *cachep,
@@ -2592,7 +2598,7 @@ static void cache_init_objs(struct kmem_cache *cachep,
if (cachep->ctor)
cachep->ctor(objp);
#endif
- slab_freelist(page)[i] = i;
+ set_free_obj(page, i, i);
}
}

@@ -2611,7 +2617,7 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
{
void *objp;

- objp = index_to_obj(cachep, page, slab_freelist(page)[page->active]);
+ objp = index_to_obj(cachep, page, get_free_obj(page, page->active));
page->active++;
#if DEBUG
WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
@@ -2632,7 +2638,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,

/* Verify double free bug */
for (i = page->active; i < cachep->num; i++) {
- if (slab_freelist(page)[i] == objnr) {
+ if (get_free_obj(page, i) == objnr) {
printk(KERN_ERR "slab: double free detected in cache "
"'%s', objp %p\n", cachep->name, objp);
BUG();
@@ -2640,7 +2646,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
}
#endif
page->active--;
- slab_freelist(page)[page->active] = objnr;
+ set_free_obj(page, page->active, objnr);
}

/*
@@ -4214,7 +4220,7 @@ static void handle_slab(unsigned long *n, struct kmem_cache *c,

for (j = page->active; j < c->num; j++) {
/* Skip freed item */
- if (slab_freelist(page)[j] == i) {
+ if (get_free_obj(page, j) == i) {
active = false;
break;
}
--
1.7.9.5

Subject: Re: [REPOST PATCH 1/4] slab: factor out calculate nr objects in cache_estimate

On Fri, 6 Sep 2013, Joonsoo Kim wrote:

> }
> *num = nr_objs;
> - *left_over = slab_size - nr_objs*buffer_size - mgmt_size;
> + *left_over = slab_size - (nr_objs * buffer_size) - mgmt_size;
> }

What is the point of this change? Drop it.

Subject: Re: [REPOST PATCH 2/4] slab: introduce helper functions to get/set free object

On Fri, 6 Sep 2013, Joonsoo Kim wrote:

> In the following patches, to get/set free objects from the freelist
> is changed so that simple casting doesn't work for it. Therefore,
> introduce helper functions.

Acked-by: Christoph Lameter <[email protected]>

Subject: Re: [REPOST PATCH 3/4] slab: introduce byte sized index for the freelist of a slab

On Fri, 6 Sep 2013, Joonsoo Kim wrote:

> Currently, the freelist of a slab consist of unsigned int sized indexes.
> Most of slabs have less number of objects than 256, since restriction
> for page order is at most 1 in default configuration. For example,
> consider a slab consisting of 32 byte sized objects on two continous
> pages. In this case, 256 objects is possible and these number fit to byte
> sized indexes. 256 objects is maximum possible value in default
> configuration, since 32 byte is minimum object size in the SLAB.
> (8192 / 32 = 256). Therefore, if we use byte sized index, we can save
> 3 bytes for each object.

Ok then why is the patch making slab do either byte sized or int sized
indexes? Seems that you could do a clean cutover?


As you said: The minimum object size is 32 bytes for slab. 32 * 256 =
8k. So we are fine unless the page size is > 8k. This is true for IA64 and
powerpc only I believe. The page size can be determined at compile time
and depending on that page size you could then choose a different size for
the indexes. Or the alternative is to increase the minimum slab object size.
A 16k page size would require a 64 byte minimum allocation. But that's no
good I guess. Byte sized or short int sized index support would be enough.
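
Roughly something like this would do (untested sketch just to show the
idea; the freelist_idx_t name is made up here):

#if PAGE_SHIFT > 13	/* pages > 8k can exceed 256 of the 32 byte min objects */
typedef unsigned short freelist_idx_t;
#else
typedef unsigned char freelist_idx_t;
#endif

Then get_free_obj()/set_free_obj() always cast page->freelist to
freelist_idx_t * and there is no runtime branch.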

> This introduce one likely branch to functions used for setting/getting
> objects to/from the freelist, but we may get more benefits from
> this change.

Lets not do that.

Subject: Re: [REPOST PATCH 4/4] slab: make more slab management structure off the slab

On Fri, 6 Sep 2013, Joonsoo Kim wrote:

> In a 64 byte sized slab case, no space is wasted if we use on-slab.
> So set off-slab determining constraint to 128 bytes.

Acked-by: Christoph Lameter <[email protected]>

2013-09-09 04:32:07

by Joonsoo Kim

Subject: Re: [REPOST PATCH 3/4] slab: introduce byte sized index for the freelist of a slab

On Fri, Sep 06, 2013 at 03:58:18PM +0000, Christoph Lameter wrote:
> On Fri, 6 Sep 2013, Joonsoo Kim wrote:
>
> > Currently, the freelist of a slab consist of unsigned int sized indexes.
> > Most of slabs have less number of objects than 256, since restriction
> > for page order is at most 1 in default configuration. For example,
> > consider a slab consisting of 32 byte sized objects on two continous
> > pages. In this case, 256 objects is possible and these number fit to byte
> > sized indexes. 256 objects is maximum possible value in default
> > configuration, since 32 byte is minimum object size in the SLAB.
> > (8192 / 32 = 256). Therefore, if we use byte sized index, we can save
> > 3 bytes for each object.
>
> Ok then why is the patch making slab do either byte sized or int sized
> indexes? Seems that you could do a clean cutover?
>
>
> As you said: The mininum object size is 32 bytes for slab. 32 * 256 =
> 8k. So we are fine unless the page size is > 8k. This is true for IA64 and
> powerpc only I believe. The page size can be determined at compile time
> and depending on that page size you could then choose a different size for
> the indexes. Or the alternative is to increase the minimum slab object size.
> A 16k page size would require a 64 byte minimum allocation. But thats no
> good I guess. byte sized or short int sized index support would be enough.

Sorry for the misleading commit message.

32 bytes is not the minimum object size; it is the minimum *kmalloc* object
size in the default configuration. There are some slabs whose object size is
less than 32 bytes. If we have an 8 byte sized kmem_cache, it has 512 objects
in a 4K page.

Moreover, slab_max_order can be configured at boot time, so we can't know
at compile time how many objects a certain slab holds. Therefore we can't
decide the size of the index at compile time.

I think that byte and short int sized index support would be enough, but
the choice should be made at runtime.

>
> > This introduce one likely branch to functions used for setting/getting
> > objects to/from the freelist, but we may get more benefits from
> > this change.
>
> Lets not do that.

IMHO, this is the best we can do. Do you have any better idea?

Thanks.

2013-09-09 04:32:22

by Joonsoo Kim

Subject: Re: [REPOST PATCH 1/4] slab: factor out calculate nr objects in cache_estimate

On Fri, Sep 06, 2013 at 03:48:04PM +0000, Christoph Lameter wrote:
> On Fri, 6 Sep 2013, Joonsoo Kim wrote:
>
> > }
> > *num = nr_objs;
> > - *left_over = slab_size - nr_objs*buffer_size - mgmt_size;
> > + *left_over = slab_size - (nr_objs * buffer_size) - mgmt_size;
> > }
>
> What is the point of this change? Drop it.

Okay. I will drop it.


Subject: Re: [REPOST PATCH 3/4] slab: introduce byte sized index for the freelist of a slab

On Mon, 9 Sep 2013, Joonsoo Kim wrote:

> 32 byte is not minimum object size, minimum *kmalloc* object size
> in default configuration. There are some slabs that their object size is
> less than 32 byte. If we have a 8 byte sized kmem_cache, it has 512 objects
> in 4K page.

As far as I can recall only SLUB supports 8 byte objects. SLAB's minimum
has always been 32 bytes.

> Moreover, we can configure slab_max_order in boot time so that we can't know
> how many object are in a certain slab in compile time. Therefore we can't
> decide the size of the index in compile time.

You can ignore the slab_max_order if necessary.

> I think that byte and short int sized index support would be enough, but
> it should be determined at runtime.

On x86 f.e. it would add useless branching. The branches are never taken.
You only need these if you do bad things to the system like requiring
large contiguous allocs.

2013-09-10 05:43:27

by Joonsoo Kim

Subject: Re: [REPOST PATCH 3/4] slab: introduce byte sized index for the freelist of a slab

On Mon, Sep 09, 2013 at 02:44:03PM +0000, Christoph Lameter wrote:
> On Mon, 9 Sep 2013, Joonsoo Kim wrote:
>
> > 32 byte is not minimum object size, minimum *kmalloc* object size
> > in default configuration. There are some slabs that their object size is
> > less than 32 byte. If we have a 8 byte sized kmem_cache, it has 512 objects
> > in 4K page.
>
> As far as I can recall only SLUB supports 8 byte objects. SLABs mininum
> has always been 32 bytes.

No.
There are many slabs whose object size is less than 32 bytes.
And I can also create an 8 byte sized slab in my kernel with SLAB.

js1304@js1304-P5Q-DELUXE:~/Projects/remote_git/linux$ sudo cat /proc/slabinfo | awk '{if($4 < 32) print $0}'
slabinfo - version: 2.1
ecryptfs_file_cache 0 0 16 240 1 : tunables 120 60 8 : slabdata 0 0 0
jbd2_revoke_table_s 2 240 16 240 1 : tunables 120 60 8 : slabdata 1 1 0
journal_handle 0 0 24 163 1 : tunables 120 60 8 : slabdata 0 0 0
revoke_table 0 0 16 240 1 : tunables 120 60 8 : slabdata 0 0 0
scsi_data_buffer 0 0 24 163 1 : tunables 120 60 8 : slabdata 0 0 0
fsnotify_event_holder 0 0 24 163 1 : tunables 120 60 8 : slabdata 0 0 0
numa_policy 3 163 24 163 1 : tunables 120 60 8 : slabdata 1 1 0

>
> > Moreover, we can configure slab_max_order in boot time so that we can't know
> > how many object are in a certain slab in compile time. Therefore we can't
> > decide the size of the index in compile time.
>
> You can ignore the slab_max_order if necessary.
>
> > I think that byte and short int sized index support would be enough, but
> > it should be determined at runtime.
>
> On x86 f.e. it would add useless branching. The branches are never taken.
> You only need these if you do bad things to the system like requiring
> large contiguous allocs.

As I said before, since there is a possibility that some runtime loaded modules
use an 8 byte sized slab, we can't determine the index size at compile time.
Otherwise we would have to always use a short int sized index, and I think that
is worse than adding a branch.

Thanks.

Subject: Re: [REPOST PATCH 3/4] slab: introduce byte sized index for the freelist of a slab

On Tue, 10 Sep 2013, Joonsoo Kim wrote:

> On Mon, Sep 09, 2013 at 02:44:03PM +0000, Christoph Lameter wrote:
> > On Mon, 9 Sep 2013, Joonsoo Kim wrote:
> >
> > > 32 byte is not minimum object size, minimum *kmalloc* object size
> > > in default configuration. There are some slabs that their object size is
> > > less than 32 byte. If we have a 8 byte sized kmem_cache, it has 512 objects
> > > in 4K page.
> >
> > As far as I can recall only SLUB supports 8 byte objects. SLABs mininum
> > has always been 32 bytes.
>
> No.
> There are many slabs that their object size are less than 32 byte.
> And I can also create a 8 byte sized slab in my kernel with SLAB.

Well the minimum size for the kmalloc array is 32 bytes. These are custom
slabs. KMALLOC_SHIFT_LOW is set to 5 in include/linux/slab.h.

Ok so there are some slabs like that. Hmmm.. We have sizes 16 and 24 in
your list. 16*256 is still 4096. So this would still work fine if we would
forbid a size of 8 or increase that by default to 16.

> > On x86 f.e. it would add useless branching. The branches are never taken.
> > You only need these if you do bad things to the system like requiring
> > large contiguous allocs.
>
> As I said before, since there is a possibility that some runtime loaded modules
> use a 8 byte sized slab, we can't determine index size in compile time. Otherwise
> we should always use short int sized index and I think that it is worse than
> adding a branch.

We can enforce a minimum slab size and an order limit so that it fits. And
then there would be no additional branching.

2013-09-11 01:04:16

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [REPOST PATCH 3/4] slab: introduce byte sized index for the freelist of a slab

On Tue, Sep 10, 2013 at 09:25:05PM +0000, Christoph Lameter wrote:
> On Tue, 10 Sep 2013, Joonsoo Kim wrote:
>
> > On Mon, Sep 09, 2013 at 02:44:03PM +0000, Christoph Lameter wrote:
> > > On Mon, 9 Sep 2013, Joonsoo Kim wrote:
> > >
> > > > 32 byte is not minimum object size, minimum *kmalloc* object size
> > > > in default configuration. There are some slabs that their object size is
> > > > less than 32 byte. If we have a 8 byte sized kmem_cache, it has 512 objects
> > > > in 4K page.
> > >
> > > As far as I can recall only SLUB supports 8 byte objects. SLABs mininum
> > > has always been 32 bytes.
> >
> > No.
> > There are many slabs that their object size are less than 32 byte.
> > And I can also create a 8 byte sized slab in my kernel with SLAB.
>
> Well the minimum size for the kmalloc array is 32 bytes. These are custom
> slabs. KMALLOC_SHIFT_LOW is set to 5 in include/linux/slab.h.
>
> Ok so there are some slabs like that. Hmmm.. We have sizes 16 and 24 in
> your list. 16*256 is still 4096. So this would still work fine if we would
> forbid a size of 8 or increase that by default to 16.
>
> > > On x86 f.e. it would add useless branching. The branches are never taken.
> > > You only need these if you do bad things to the system like requiring
> > > large contiguous allocs.
> >
> > As I said before, since there is a possibility that some runtime loaded modules
> > use a 8 byte sized slab, we can't determine index size in compile time. Otherwise
> > we should always use short int sized index and I think that it is worse than
> > adding a branch.
>
> We can enforce a mininum slab size and an order limit so that it fits. And
> then there would be no additional branching.
>

Okay. I will respin this patchset following your suggestion.

Anyway, could you review my previous patchset, that is, 'overload struct slab
over struct page to reduce memory usage'? I'm not sure whether your answer is
ack or not.

Thanks.

Subject: Re: [REPOST PATCH 3/4] slab: introduce byte sized index for the freelist of a slab

On Wed, 11 Sep 2013, Joonsoo Kim wrote:

> Anyway, could you review my previous patchset, that is, 'overload struct slab
> over struct page to reduce memory usage'? I'm not sure whether your answer is
> ack or not.

I scanned over it before but I was not able to see if it was correct on
first glance. I will look at it again.

2013-09-12 06:52:05

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [REPOST PATCH 3/4] slab: introduce byte sized index for the freelist of a slab

On Wed, Sep 11, 2013 at 02:22:25PM +0000, Christoph Lameter wrote:
> On Wed, 11 Sep 2013, Joonsoo Kim wrote:
>
> > Anyway, could you review my previous patchset, that is, 'overload struct slab
> > over struct page to reduce memory usage'? I'm not sure whether your answer is
> > ack or not.
>
> I scanned over it before but I was not able to see if it was correct on
> first glance. I will look at it again.

Sounds good! Thanks :)