2013-10-16 08:51:27

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 00/15] slab: overload struct slab over struct page to reduce memory usage

There is two main topics in this patchset. One is to reduce memory usage
and the other is to change a management method of free objects of a slab.

The SLAB allocate a struct slab for each slab. The size of this structure
except bufctl array is 40 bytes on 64 bits machine. We can reduce memory
waste and cache footprint if we overload struct slab over struct page.

And this patchset change a management method of free objects of a slab.
Current free objects management method of the slab is weird, because
it touch random position of the array of kmem_bufctl_t when we try to
get free object. See following example.

struct slab's free = 6
kmem_bufctl_t array: 1 END 5 7 0 4 3 2

To get free objects, we access this array with following index pattern.
6 -> 3 -> 7 -> 2 -> 5 -> 4 -> 0 -> 1 -> END

If we have many objects, this array would be larger and be not in the same
cache line. It is not good for performance.

We can do same thing through more easy way, like as the stack.
This patchset implement it and remove complex code for above algorithm.
This makes slab code much cleaner.

Below is some numbers of 'cat /proc/slabinfo'.

* Before *
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
kmalloc-512 527 600 512 8 1 : tunables 54 27 0 : slabdata 75 75 0
kmalloc-256 210 210 256 15 1 : tunables 120 60 0 : slabdata 14 14 0
kmalloc-192 1040 1040 192 20 1 : tunables 120 60 0 : slabdata 52 52 0
kmalloc-96 750 750 128 30 1 : tunables 120 60 0 : slabdata 25 25 0
kmalloc-64 2773 2773 64 59 1 : tunables 120 60 0 : slabdata 47 47 0
kmalloc-128 660 690 128 30 1 : tunables 120 60 0 : slabdata 23 23 0
kmalloc-32 11200 11200 32 112 1 : tunables 120 60 0 : slabdata 100 100 0
kmem_cache 197 200 192 20 1 : tunables 120 60 0 : slabdata 10 10 0

* After *
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
kmalloc-512 525 640 512 8 1 : tunables 54 27 0 : slabdata 80 80 0
kmalloc-256 210 210 256 15 1 : tunables 120 60 0 : slabdata 14 14 0
kmalloc-192 1016 1040 192 20 1 : tunables 120 60 0 : slabdata 52 52 0
kmalloc-96 560 620 128 31 1 : tunables 120 60 0 : slabdata 20 20 0
kmalloc-64 2148 2280 64 60 1 : tunables 120 60 0 : slabdata 38 38 0
kmalloc-128 647 682 128 31 1 : tunables 120 60 0 : slabdata 22 22 0
kmalloc-32 11360 11413 32 113 1 : tunables 120 60 0 : slabdata 101 101 0
kmem_cache 197 200 192 20 1 : tunables 120 60 0 : slabdata 10 10 0

kmem_caches consisting of objects less than or equal to 128 byte have one more
objects in a slab. You can see it at objperslab.


Here are the performance results on my 4 cpus machine.

* Before *

Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

238,309,671 cache-misses ( +- 0.40% )

12.010172090 seconds time elapsed ( +- 0.21% )

* After *

Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

229,945,138 cache-misses ( +- 0.23% )

11.627897174 seconds time elapsed ( +- 0.14% )

cache-misses are reduced by this patchset, roughly 5%.
And elapsed times are also improved by 3.1% to baseline.

I think that this patchsets deserve to be merged, since it reduces memory usage and
also improves performance. :)
Please let me know expert's opinion.

Thanks.

This patchset is based on v3.12-rc5.

Joonsoo Kim (15):
slab: correct pfmemalloc check
slab: change return type of kmem_getpages() to struct page
slab: remove colouroff in struct slab
slab: remove nodeid in struct slab
slab: remove cachep in struct slab_rcu
slab: overloading the RCU head over the LRU for RCU free
slab: use well-defined macro, virt_to_slab()
slab: use __GFP_COMP flag for allocating slab pages
slab: change the management method of free objects of the slab
slab: remove kmem_bufctl_t
slab: remove SLAB_LIMIT
slab: replace free and inuse in struct slab with newly introduced
active
slab: use struct page for slab management
slab: remove useless statement for checking pfmemalloc
slab: rename slab_bufctl to slab_freelist

include/linux/mm_types.h | 24 +-
include/linux/slab.h | 9 +-
include/linux/slab_def.h | 4 +-
mm/slab.c | 565 ++++++++++++++++++----------------------------
4 files changed, 244 insertions(+), 358 deletions(-)

--
1.7.9.5


2013-10-16 08:44:18

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 01/15] slab: correct pfmemalloc check

We checked pfmemalloc by slab unit, not page unit. You can see this
in is_slab_pfmemalloc(). So other pages don't need to be set/cleared
pfmemalloc.

And, therefore we should check pfmemalloc in page flag of first page,
but current implementation don't do that. virt_to_head_page(obj) just
return 'struct page' of that object, not one of first page, since the SLAB
don't use __GFP_COMP when CONFIG_MMU. To get 'struct page' of first page,
we first get a slab and try to get it via virt_to_head_page(slab->s_mem).

Cc: Mel Gorman <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index 2580db0..0b4ddaf 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -930,7 +930,8 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
{
if (unlikely(pfmemalloc_active)) {
/* Some pfmemalloc slabs exist, check if this is one */
- struct page *page = virt_to_head_page(objp);
+ struct slab *slabp = virt_to_slab(objp);
+ struct page *page = virt_to_head_page(slabp->s_mem);
if (PageSlabPfmemalloc(page))
set_obj_pfmemalloc(&objp);
}
@@ -1776,7 +1777,7 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
__SetPageSlab(page + i);

if (page->pfmemalloc)
- SetPageSlabPfmemalloc(page + i);
+ SetPageSlabPfmemalloc(page);
}
memcg_bind_pages(cachep, cachep->gfporder);

@@ -1809,9 +1810,10 @@ static void kmem_freepages(struct kmem_cache *cachep, void *addr)
else
sub_zone_page_state(page_zone(page),
NR_SLAB_UNRECLAIMABLE, nr_freed);
+
+ __ClearPageSlabPfmemalloc(page);
while (i--) {
BUG_ON(!PageSlab(page));
- __ClearPageSlabPfmemalloc(page);
__ClearPageSlab(page);
page++;
}
--
1.7.9.5

2013-10-16 08:44:29

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 05/15] slab: remove cachep in struct slab_rcu

We can get cachep using page in struct slab_rcu, so remove it.

Acked-by: Christoph Lameter <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index 71ba8f5..7e1aabe 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -204,7 +204,6 @@ typedef unsigned int kmem_bufctl_t;
*/
struct slab_rcu {
struct rcu_head head;
- struct kmem_cache *cachep;
struct page *page;
};

@@ -1824,7 +1823,7 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
static void kmem_rcu_free(struct rcu_head *head)
{
struct slab_rcu *slab_rcu = (struct slab_rcu *)head;
- struct kmem_cache *cachep = slab_rcu->cachep;
+ struct kmem_cache *cachep = slab_rcu->page->slab_cache;

kmem_freepages(cachep, slab_rcu->page);
if (OFF_SLAB(cachep))
@@ -2052,7 +2051,6 @@ static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
struct slab_rcu *slab_rcu;

slab_rcu = (struct slab_rcu *)slabp;
- slab_rcu->cachep = cachep;
slab_rcu->page = page;
call_rcu(&slab_rcu->head, kmem_rcu_free);
} else {
--
1.7.9.5

2013-10-16 08:44:27

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 07/15] slab: use well-defined macro, virt_to_slab()

This is trivial change, just use well-defined macro.

Acked-by: Christoph Lameter <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index 84c4ed6..f9e676e 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2865,7 +2865,6 @@ static inline void verify_redzone_free(struct kmem_cache *cache, void *obj)
static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
unsigned long caller)
{
- struct page *page;
unsigned int objnr;
struct slab *slabp;

@@ -2873,9 +2872,7 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,

objp -= obj_offset(cachep);
kfree_debugcheck(objp);
- page = virt_to_head_page(objp);
-
- slabp = page->slab_page;
+ slabp = virt_to_slab(objp);

if (cachep->flags & SLAB_RED_ZONE) {
verify_redzone_free(cachep, objp);
@@ -3087,7 +3084,7 @@ static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
struct slab *slabp;
unsigned objnr;

- slabp = virt_to_head_page(objp)->slab_page;
+ slabp = virt_to_slab(objp);
objnr = (unsigned)(objp - slabp->s_mem) / cachep->size;
slab_bufctl(slabp)[objnr] = BUFCTL_ACTIVE;
}
--
1.7.9.5

2013-10-16 08:44:47

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 15/15] slab: rename slab_bufctl to slab_freelist

Now, bufctl is not proper name to this array.
So change it.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index fbb594f..af2db76 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2550,7 +2550,7 @@ static struct freelist *alloc_slabmgmt(struct kmem_cache *cachep,
return freelist;
}

-static inline unsigned int *slab_bufctl(struct page *page)
+static inline unsigned int *slab_freelist(struct page *page)
{
return (unsigned int *)(page->freelist);
}
@@ -2597,7 +2597,7 @@ static void cache_init_objs(struct kmem_cache *cachep,
if (cachep->ctor)
cachep->ctor(objp);
#endif
- slab_bufctl(page)[i] = i;
+ slab_freelist(page)[i] = i;
}
}

@@ -2616,7 +2616,7 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
{
void *objp;

- objp = index_to_obj(cachep, page, slab_bufctl(page)[page->active]);
+ objp = index_to_obj(cachep, page, slab_freelist(page)[page->active]);
page->active++;
#if DEBUG
WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
@@ -2637,7 +2637,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,

/* Verify double free bug */
for (i = page->active; i < cachep->num; i++) {
- if (slab_bufctl(page)[i] == objnr) {
+ if (slab_freelist(page)[i] == objnr) {
printk(KERN_ERR "slab: double free detected in cache "
"'%s', objp %p\n", cachep->name, objp);
BUG();
@@ -2645,7 +2645,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
}
#endif
page->active--;
- slab_bufctl(page)[page->active] = objnr;
+ slab_freelist(page)[page->active] = objnr;
}

/*
@@ -4218,7 +4218,7 @@ static void handle_slab(unsigned long *n, struct kmem_cache *c,

for (j = page->active; j < c->num; j++) {
/* Skip freed item */
- if (slab_bufctl(page)[j] == i) {
+ if (slab_freelist(page)[j] == i) {
active = false;
break;
}
--
1.7.9.5

2013-10-16 08:45:26

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 13/15] slab: use struct page for slab management

Now, there are a few field in struct slab, so we can overload these
over struct page. This will save some memory and reduce cache footprint.

After this change, slabp_cache and slab_size no longer related to
a struct slab, so rename them as freelist_cache and freelist_size.

These changes are just mechanical ones and there is no functional change.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8b85d8c..4e17190 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -42,18 +42,22 @@ struct page {
/* First double word block */
unsigned long flags; /* Atomic flags, some possibly
* updated asynchronously */
- struct address_space *mapping; /* If low bit clear, points to
- * inode address_space, or NULL.
- * If page mapped as anonymous
- * memory, low bit is set, and
- * it points to anon_vma object:
- * see PAGE_MAPPING_ANON below.
- */
+ union {
+ struct address_space *mapping; /* If low bit clear, points to
+ * inode address_space, or NULL.
+ * If page mapped as anonymous
+ * memory, low bit is set, and
+ * it points to anon_vma object:
+ * see PAGE_MAPPING_ANON below.
+ */
+ void *s_mem; /* slab first object */
+ };
+
/* Second double word */
struct {
union {
pgoff_t index; /* Our offset within mapping. */
- void *freelist; /* slub/slob first free object */
+ void *freelist; /* sl[aou]b first free object */
bool pfmemalloc; /* If set by the page allocator,
* ALLOC_NO_WATERMARKS was set
* and the low watermark was not
@@ -109,6 +113,7 @@ struct page {
};
atomic_t _count; /* Usage count, see below. */
};
+ unsigned int active; /* SLAB */
};
};

diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index e9346b4..09bfffb 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -27,8 +27,8 @@ struct kmem_cache {

size_t colour; /* cache colouring range */
unsigned int colour_off; /* colour offset */
- struct kmem_cache *slabp_cache;
- unsigned int slab_size;
+ struct kmem_cache *freelist_cache;
+ unsigned int freelist_size;

/* constructor func */
void (*ctor)(void *obj);
diff --git a/mm/slab.c b/mm/slab.c
index 2ec2336..0e7f2e7 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -164,21 +164,6 @@
static bool pfmemalloc_active __read_mostly;

/*
- * struct slab
- *
- * Manages the objs in a slab. Placed either at the beginning of mem allocated
- * for a slab, or allocated from an general cache.
- * Slabs are chained into three list: fully used, partial, fully free slabs.
- */
-struct slab {
- struct {
- struct list_head list;
- void *s_mem; /* including colour offset */
- unsigned int active; /* num of objs active in slab */
- };
-};
-
-/*
* struct array_cache
*
* Purpose:
@@ -405,18 +390,10 @@ static inline struct kmem_cache *virt_to_cache(const void *obj)
return page->slab_cache;
}

-static inline struct slab *virt_to_slab(const void *obj)
-{
- struct page *page = virt_to_head_page(obj);
-
- VM_BUG_ON(!PageSlab(page));
- return page->slab_page;
-}
-
-static inline void *index_to_obj(struct kmem_cache *cache, struct slab *slab,
+static inline void *index_to_obj(struct kmem_cache *cache, struct page *page,
unsigned int idx)
{
- return slab->s_mem + cache->size * idx;
+ return page->s_mem + cache->size * idx;
}

/*
@@ -426,9 +403,9 @@ static inline void *index_to_obj(struct kmem_cache *cache, struct slab *slab,
* reciprocal_divide(offset, cache->reciprocal_buffer_size)
*/
static inline unsigned int obj_to_index(const struct kmem_cache *cache,
- const struct slab *slab, void *obj)
+ const struct page *page, void *obj)
{
- u32 offset = (obj - slab->s_mem);
+ u32 offset = (obj - page->s_mem);
return reciprocal_divide(offset, cache->reciprocal_buffer_size);
}

@@ -590,7 +567,7 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)

static size_t slab_mgmt_size(size_t nr_objs, size_t align)
{
- return ALIGN(sizeof(struct slab)+nr_objs*sizeof(unsigned int), align);
+ return ALIGN(nr_objs * sizeof(unsigned int), align);
}

/*
@@ -609,7 +586,6 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
* on it. For the latter case, the memory allocated for a
* slab is used for:
*
- * - The struct slab
* - One unsigned int for each object
* - Padding to respect alignment of @align
* - @buffer_size bytes for each object
@@ -632,8 +608,7 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
* into the memory allocation when taking the padding
* into account.
*/
- nr_objs = (slab_size - sizeof(struct slab)) /
- (buffer_size + sizeof(unsigned int));
+ nr_objs = (slab_size) / (buffer_size + sizeof(unsigned int));

/*
* This calculated number will be either the right
@@ -773,11 +748,11 @@ static struct array_cache *alloc_arraycache(int node, int entries,
return nc;
}

-static inline bool is_slab_pfmemalloc(struct slab *slabp)
+static inline bool is_slab_pfmemalloc(struct page *page)
{
- struct page *page = virt_to_page(slabp->s_mem);
+ struct page *mem_page = virt_to_page(page->s_mem);

- return PageSlabPfmemalloc(page);
+ return PageSlabPfmemalloc(mem_page);
}

/* Clears pfmemalloc_active if no slabs have pfmalloc set */
@@ -785,23 +760,23 @@ static void recheck_pfmemalloc_active(struct kmem_cache *cachep,
struct array_cache *ac)
{
struct kmem_cache_node *n = cachep->node[numa_mem_id()];
- struct slab *slabp;
+ struct page *page;
unsigned long flags;

if (!pfmemalloc_active)
return;

spin_lock_irqsave(&n->list_lock, flags);
- list_for_each_entry(slabp, &n->slabs_full, list)
- if (is_slab_pfmemalloc(slabp))
+ list_for_each_entry(page, &n->slabs_full, lru)
+ if (is_slab_pfmemalloc(page))
goto out;

- list_for_each_entry(slabp, &n->slabs_partial, list)
- if (is_slab_pfmemalloc(slabp))
+ list_for_each_entry(page, &n->slabs_partial, lru)
+ if (is_slab_pfmemalloc(page))
goto out;

- list_for_each_entry(slabp, &n->slabs_free, list)
- if (is_slab_pfmemalloc(slabp))
+ list_for_each_entry(page, &n->slabs_free, lru)
+ if (is_slab_pfmemalloc(page))
goto out;

pfmemalloc_active = false;
@@ -841,8 +816,8 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
*/
n = cachep->node[numa_mem_id()];
if (!list_empty(&n->slabs_free) && force_refill) {
- struct slab *slabp = virt_to_slab(objp);
- ClearPageSlabPfmemalloc(virt_to_head_page(slabp->s_mem));
+ struct page *page = virt_to_head_page(objp);
+ ClearPageSlabPfmemalloc(virt_to_head_page(page->s_mem));
clear_obj_pfmemalloc(&objp);
recheck_pfmemalloc_active(cachep, ac);
return objp;
@@ -874,9 +849,9 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
{
if (unlikely(pfmemalloc_active)) {
/* Some pfmemalloc slabs exist, check if this is one */
- struct slab *slabp = virt_to_slab(objp);
- struct page *page = virt_to_head_page(slabp->s_mem);
- if (PageSlabPfmemalloc(page))
+ struct page *page = virt_to_head_page(objp);
+ struct page *mem_page = virt_to_head_page(page->s_mem);
+ if (PageSlabPfmemalloc(mem_page))
set_obj_pfmemalloc(&objp);
}

@@ -1633,7 +1608,7 @@ static noinline void
slab_out_of_memory(struct kmem_cache *cachep, gfp_t gfpflags, int nodeid)
{
struct kmem_cache_node *n;
- struct slab *slabp;
+ struct page *page;
unsigned long flags;
int node;

@@ -1652,15 +1627,15 @@ slab_out_of_memory(struct kmem_cache *cachep, gfp_t gfpflags, int nodeid)
continue;

spin_lock_irqsave(&n->list_lock, flags);
- list_for_each_entry(slabp, &n->slabs_full, list) {
+ list_for_each_entry(page, &n->slabs_full, lru) {
active_objs += cachep->num;
active_slabs++;
}
- list_for_each_entry(slabp, &n->slabs_partial, list) {
- active_objs += slabp->active;
+ list_for_each_entry(page, &n->slabs_partial, lru) {
+ active_objs += page->active;
active_slabs++;
}
- list_for_each_entry(slabp, &n->slabs_free, list)
+ list_for_each_entry(page, &n->slabs_free, lru)
num_slabs++;

free_objects += n->free_objects;
@@ -1746,6 +1721,8 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
BUG_ON(!PageSlab(page));
__ClearPageSlabPfmemalloc(page);
__ClearPageSlab(page);
+ page_mapcount_reset(page);
+ page->mapping = NULL;

memcg_release_pages(cachep, cachep->gfporder);
if (current->reclaim_state)
@@ -1910,19 +1887,19 @@ static void check_poison_obj(struct kmem_cache *cachep, void *objp)
/* Print some data about the neighboring objects, if they
* exist:
*/
- struct slab *slabp = virt_to_slab(objp);
+ struct page *page = virt_to_head_page(objp);
unsigned int objnr;

- objnr = obj_to_index(cachep, slabp, objp);
+ objnr = obj_to_index(cachep, page, objp);
if (objnr) {
- objp = index_to_obj(cachep, slabp, objnr - 1);
+ objp = index_to_obj(cachep, page, objnr - 1);
realobj = (char *)objp + obj_offset(cachep);
printk(KERN_ERR "Prev obj: start=%p, len=%d\n",
realobj, size);
print_objinfo(cachep, objp, 2);
}
if (objnr + 1 < cachep->num) {
- objp = index_to_obj(cachep, slabp, objnr + 1);
+ objp = index_to_obj(cachep, page, objnr + 1);
realobj = (char *)objp + obj_offset(cachep);
printk(KERN_ERR "Next obj: start=%p, len=%d\n",
realobj, size);
@@ -1933,11 +1910,12 @@ static void check_poison_obj(struct kmem_cache *cachep, void *objp)
#endif

#if DEBUG
-static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slabp)
+static void slab_destroy_debugcheck(struct kmem_cache *cachep,
+ struct page *page)
{
int i;
for (i = 0; i < cachep->num; i++) {
- void *objp = index_to_obj(cachep, slabp, i);
+ void *objp = index_to_obj(cachep, page, i);

if (cachep->flags & SLAB_POISON) {
#ifdef CONFIG_DEBUG_PAGEALLOC
@@ -1962,7 +1940,8 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slab
}
}
#else
-static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slabp)
+static void slab_destroy_debugcheck(struct kmem_cache *cachep,
+ struct page *page)
{
}
#endif
@@ -1976,11 +1955,12 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slab
* Before calling the slab must have been unlinked from the cache. The
* cache-lock is not held/needed.
*/
-static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
+static void slab_destroy(struct kmem_cache *cachep, struct page *page)
{
- struct page *page = virt_to_head_page(slabp->s_mem);
+ struct freelist *freelist;

- slab_destroy_debugcheck(cachep, slabp);
+ freelist = page->freelist;
+ slab_destroy_debugcheck(cachep, page);
if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {
struct rcu_head *head;

@@ -1998,11 +1978,11 @@ static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
}

/*
- * From now on, we don't use slab management
+ * From now on, we don't use freelist
* although actual page can be freed in rcu context
*/
if (OFF_SLAB(cachep))
- kmem_cache_free(cachep->slabp_cache, slabp);
+ kmem_cache_free(cachep->freelist_cache, freelist);
}

/**
@@ -2039,7 +2019,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
* use off-slab slabs. Needed to avoid a possible
* looping condition in cache_grow().
*/
- offslab_limit = size - sizeof(struct slab);
+ offslab_limit = size;
offslab_limit /= sizeof(unsigned int);

if (num > offslab_limit)
@@ -2162,7 +2142,7 @@ static int __init_refok setup_cpu_cache(struct kmem_cache *cachep, gfp_t gfp)
int
__kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
{
- size_t left_over, slab_size, ralign;
+ size_t left_over, freelist_size, ralign;
gfp_t gfp;
int err;
size_t size = cachep->size;
@@ -2281,22 +2261,21 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
if (!cachep->num)
return -E2BIG;

- slab_size = ALIGN(cachep->num * sizeof(unsigned int)
- + sizeof(struct slab), cachep->align);
+ freelist_size =
+ ALIGN(cachep->num * sizeof(unsigned int), cachep->align);

/*
* If the slab has been placed off-slab, and we have enough space then
* move it on-slab. This is at the expense of any extra colouring.
*/
- if (flags & CFLGS_OFF_SLAB && left_over >= slab_size) {
+ if (flags & CFLGS_OFF_SLAB && left_over >= freelist_size) {
flags &= ~CFLGS_OFF_SLAB;
- left_over -= slab_size;
+ left_over -= freelist_size;
}

if (flags & CFLGS_OFF_SLAB) {
/* really off slab. No need for manual alignment */
- slab_size =
- cachep->num * sizeof(unsigned int) + sizeof(struct slab);
+ freelist_size = cachep->num * sizeof(unsigned int);

#ifdef CONFIG_PAGE_POISONING
/* If we're going to use the generic kernel_map_pages()
@@ -2313,7 +2292,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
if (cachep->colour_off < cachep->align)
cachep->colour_off = cachep->align;
cachep->colour = left_over / cachep->colour_off;
- cachep->slab_size = slab_size;
+ cachep->freelist_size = freelist_size;
cachep->flags = flags;
cachep->allocflags = __GFP_COMP;
if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA))
@@ -2322,7 +2301,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
cachep->reciprocal_buffer_size = reciprocal_value(size);

if (flags & CFLGS_OFF_SLAB) {
- cachep->slabp_cache = kmalloc_slab(slab_size, 0u);
+ cachep->freelist_cache = kmalloc_slab(freelist_size, 0u);
/*
* This is a possibility for one of the malloc_sizes caches.
* But since we go off slab only for object size greater than
@@ -2330,7 +2309,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
* this should not happen at all.
* But leave a BUG_ON for some lucky dude.
*/
- BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));
+ BUG_ON(ZERO_OR_NULL_PTR(cachep->freelist_cache));
}

err = setup_cpu_cache(cachep, gfp);
@@ -2436,7 +2415,7 @@ static int drain_freelist(struct kmem_cache *cache,
{
struct list_head *p;
int nr_freed;
- struct slab *slabp;
+ struct page *page;

nr_freed = 0;
while (nr_freed < tofree && !list_empty(&n->slabs_free)) {
@@ -2448,18 +2427,18 @@ static int drain_freelist(struct kmem_cache *cache,
goto out;
}

- slabp = list_entry(p, struct slab, list);
+ page = list_entry(p, struct page, lru);
#if DEBUG
- BUG_ON(slabp->active);
+ BUG_ON(page->active);
#endif
- list_del(&slabp->list);
+ list_del(&page->lru);
/*
* Safe to drop the lock. The slab is no longer linked
* to the cache.
*/
n->free_objects -= cache->num;
spin_unlock_irq(&n->list_lock);
- slab_destroy(cache, slabp);
+ slab_destroy(cache, page);
nr_freed++;
}
out:
@@ -2542,18 +2521,18 @@ int __kmem_cache_shutdown(struct kmem_cache *cachep)
* descriptors in kmem_cache_create, we search through the malloc_sizes array.
* If we are creating a malloc_sizes cache here it would not be visible to
* kmem_find_general_cachep till the initialization is complete.
- * Hence we cannot have slabp_cache same as the original cache.
+ * Hence we cannot have freelist_cache same as the original cache.
*/
-static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
+static struct freelist *alloc_slabmgmt(struct kmem_cache *cachep,
struct page *page, int colour_off,
gfp_t local_flags, int nodeid)
{
- struct slab *slabp;
+ struct freelist *freelist;
void *addr = page_address(page);

if (OFF_SLAB(cachep)) {
/* Slab management obj is off-slab. */
- slabp = kmem_cache_alloc_node(cachep->slabp_cache,
+ freelist = kmem_cache_alloc_node(cachep->freelist_cache,
local_flags, nodeid);
/*
* If the first object in the slab is leaked (it's allocated
@@ -2561,31 +2540,31 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
* kmemleak does not treat the ->s_mem pointer as a reference
* to the object. Otherwise we will not report the leak.
*/
- kmemleak_scan_area(&slabp->list, sizeof(struct list_head),
+ kmemleak_scan_area(&page->lru, sizeof(struct list_head),
local_flags);
- if (!slabp)
+ if (!freelist)
return NULL;
} else {
- slabp = addr + colour_off;
- colour_off += cachep->slab_size;
+ freelist = addr + colour_off;
+ colour_off += cachep->freelist_size;
}
- slabp->active = 0;
- slabp->s_mem = addr + colour_off;
- return slabp;
+ page->active = 0;
+ page->s_mem = addr + colour_off;
+ return freelist;
}

-static inline unsigned int *slab_bufctl(struct slab *slabp)
+static inline unsigned int *slab_bufctl(struct page *page)
{
- return (unsigned int *) (slabp + 1);
+ return (unsigned int *)(page->freelist);
}

static void cache_init_objs(struct kmem_cache *cachep,
- struct slab *slabp)
+ struct page *page)
{
int i;

for (i = 0; i < cachep->num; i++) {
- void *objp = index_to_obj(cachep, slabp, i);
+ void *objp = index_to_obj(cachep, page, i);
#if DEBUG
/* need to poison the objs? */
if (cachep->flags & SLAB_POISON)
@@ -2621,7 +2600,7 @@ static void cache_init_objs(struct kmem_cache *cachep,
if (cachep->ctor)
cachep->ctor(objp);
#endif
- slab_bufctl(slabp)[i] = i;
+ slab_bufctl(page)[i] = i;
}
}

@@ -2635,13 +2614,13 @@ static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
}
}

-static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
+static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
int nodeid)
{
void *objp;

- objp = index_to_obj(cachep, slabp, slab_bufctl(slabp)[slabp->active]);
- slabp->active++;
+ objp = index_to_obj(cachep, page, slab_bufctl(page)[page->active]);
+ page->active++;
#if DEBUG
WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
#endif
@@ -2649,10 +2628,10 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
return objp;
}

-static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
+static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
void *objp, int nodeid)
{
- unsigned int objnr = obj_to_index(cachep, slabp, objp);
+ unsigned int objnr = obj_to_index(cachep, page, objp);
#if DEBUG
unsigned int i;

@@ -2660,16 +2639,16 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);

/* Verify double free bug */
- for (i = slabp->active; i < cachep->num; i++) {
- if (slab_bufctl(slabp)[i] == objnr) {
+ for (i = page->active; i < cachep->num; i++) {
+ if (slab_bufctl(page)[i] == objnr) {
printk(KERN_ERR "slab: double free detected in cache "
"'%s', objp %p\n", cachep->name, objp);
BUG();
}
}
#endif
- slabp->active--;
- slab_bufctl(slabp)[slabp->active] = objnr;
+ page->active--;
+ slab_bufctl(page)[page->active] = objnr;
}

/*
@@ -2677,11 +2656,11 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
* for the slab allocator to be able to lookup the cache and slab of a
* virtual address for kfree, ksize, and slab debugging.
*/
-static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
- struct page *page)
+static void slab_map_pages(struct kmem_cache *cache, struct page *page,
+ struct freelist *freelist)
{
page->slab_cache = cache;
- page->slab_page = slab;
+ page->freelist = freelist;
}

/*
@@ -2691,7 +2670,7 @@ static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
static int cache_grow(struct kmem_cache *cachep,
gfp_t flags, int nodeid, struct page *page)
{
- struct slab *slabp;
+ struct freelist *freelist;
size_t offset;
gfp_t local_flags;
struct kmem_cache_node *n;
@@ -2738,14 +2717,14 @@ static int cache_grow(struct kmem_cache *cachep,
goto failed;

/* Get slab management. */
- slabp = alloc_slabmgmt(cachep, page, offset,
+ freelist = alloc_slabmgmt(cachep, page, offset,
local_flags & ~GFP_CONSTRAINT_MASK, nodeid);
- if (!slabp)
+ if (!freelist)
goto opps1;

- slab_map_pages(cachep, slabp, page);
+ slab_map_pages(cachep, page, freelist);

- cache_init_objs(cachep, slabp);
+ cache_init_objs(cachep, page);

if (local_flags & __GFP_WAIT)
local_irq_disable();
@@ -2753,7 +2732,7 @@ static int cache_grow(struct kmem_cache *cachep,
spin_lock(&n->list_lock);

/* Make slab active. */
- list_add_tail(&slabp->list, &(n->slabs_free));
+ list_add_tail(&page->lru, &(n->slabs_free));
STATS_INC_GROWN(cachep);
n->free_objects += cachep->num;
spin_unlock(&n->list_lock);
@@ -2808,13 +2787,13 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
unsigned long caller)
{
unsigned int objnr;
- struct slab *slabp;
+ struct page *page;

BUG_ON(virt_to_cache(objp) != cachep);

objp -= obj_offset(cachep);
kfree_debugcheck(objp);
- slabp = virt_to_slab(objp);
+ page = virt_to_head_page(objp);

if (cachep->flags & SLAB_RED_ZONE) {
verify_redzone_free(cachep, objp);
@@ -2824,10 +2803,10 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
if (cachep->flags & SLAB_STORE_USER)
*dbg_userword(cachep, objp) = (void *)caller;

- objnr = obj_to_index(cachep, slabp, objp);
+ objnr = obj_to_index(cachep, page, objp);

BUG_ON(objnr >= cachep->num);
- BUG_ON(objp != index_to_obj(cachep, slabp, objnr));
+ BUG_ON(objp != index_to_obj(cachep, page, objnr));

if (cachep->flags & SLAB_POISON) {
#ifdef CONFIG_DEBUG_PAGEALLOC
@@ -2886,7 +2865,7 @@ retry:

while (batchcount > 0) {
struct list_head *entry;
- struct slab *slabp;
+ struct page *page;
/* Get slab alloc is to come from. */
entry = n->slabs_partial.next;
if (entry == &n->slabs_partial) {
@@ -2896,7 +2875,7 @@ retry:
goto must_grow;
}

- slabp = list_entry(entry, struct slab, list);
+ page = list_entry(entry, struct page, lru);
check_spinlock_acquired(cachep);

/*
@@ -2904,23 +2883,23 @@ retry:
* there must be at least one object available for
* allocation.
*/
- BUG_ON(slabp->active >= cachep->num);
+ BUG_ON(page->active >= cachep->num);

- while (slabp->active < cachep->num && batchcount--) {
+ while (page->active < cachep->num && batchcount--) {
STATS_INC_ALLOCED(cachep);
STATS_INC_ACTIVE(cachep);
STATS_SET_HIGH(cachep);

- ac_put_obj(cachep, ac, slab_get_obj(cachep, slabp,
+ ac_put_obj(cachep, ac, slab_get_obj(cachep, page,
node));
}

/* move slabp to correct slabp list: */
- list_del(&slabp->list);
- if (slabp->active == cachep->num)
- list_add(&slabp->list, &n->slabs_full);
+ list_del(&page->lru);
+ if (page->active == cachep->num)
+ list_add(&page->list, &n->slabs_full);
else
- list_add(&slabp->list, &n->slabs_partial);
+ list_add(&page->list, &n->slabs_partial);
}

must_grow:
@@ -3175,7 +3154,7 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
int nodeid)
{
struct list_head *entry;
- struct slab *slabp;
+ struct page *page;
struct kmem_cache_node *n;
void *obj;
int x;
@@ -3195,24 +3174,24 @@ retry:
goto must_grow;
}

- slabp = list_entry(entry, struct slab, list);
+ page = list_entry(entry, struct page, lru);
check_spinlock_acquired_node(cachep, nodeid);

STATS_INC_NODEALLOCS(cachep);
STATS_INC_ACTIVE(cachep);
STATS_SET_HIGH(cachep);

- BUG_ON(slabp->active == cachep->num);
+ BUG_ON(page->active == cachep->num);

- obj = slab_get_obj(cachep, slabp, nodeid);
+ obj = slab_get_obj(cachep, page, nodeid);
n->free_objects--;
/* move slabp to correct slabp list: */
- list_del(&slabp->list);
+ list_del(&page->lru);

- if (slabp->active == cachep->num)
- list_add(&slabp->list, &n->slabs_full);
+ if (page->active == cachep->num)
+ list_add(&page->lru, &n->slabs_full);
else
- list_add(&slabp->list, &n->slabs_partial);
+ list_add(&page->lru, &n->slabs_partial);

spin_unlock(&n->list_lock);
goto done;
@@ -3362,21 +3341,21 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,

for (i = 0; i < nr_objects; i++) {
void *objp;
- struct slab *slabp;
+ struct page *page;

clear_obj_pfmemalloc(&objpp[i]);
objp = objpp[i];

- slabp = virt_to_slab(objp);
+ page = virt_to_head_page(objp);
n = cachep->node[node];
- list_del(&slabp->list);
+ list_del(&page->lru);
check_spinlock_acquired_node(cachep, node);
- slab_put_obj(cachep, slabp, objp, node);
+ slab_put_obj(cachep, page, objp, node);
STATS_DEC_ACTIVE(cachep);
n->free_objects++;

/* fixup slab chains */
- if (slabp->active == 0) {
+ if (page->active == 0) {
if (n->free_objects > n->free_limit) {
n->free_objects -= cachep->num;
/* No need to drop any previously held
@@ -3385,16 +3364,16 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
* a different cache, refer to comments before
* alloc_slabmgmt.
*/
- slab_destroy(cachep, slabp);
+ slab_destroy(cachep, page);
} else {
- list_add(&slabp->list, &n->slabs_free);
+ list_add(&page->lru, &n->slabs_free);
}
} else {
/* Unconditionally move a slab to the end of the
* partial list on free - maximum time for the
* other objects to be freed, too.
*/
- list_add_tail(&slabp->list, &n->slabs_partial);
+ list_add_tail(&page->lru, &n->slabs_partial);
}
}
}
@@ -3434,10 +3413,10 @@ free_done:

p = n->slabs_free.next;
while (p != &(n->slabs_free)) {
- struct slab *slabp;
+ struct page *page;

- slabp = list_entry(p, struct slab, list);
- BUG_ON(slabp->active);
+ page = list_entry(p, struct page, lru);
+ BUG_ON(page->active);

i++;
p = p->next;
@@ -4041,7 +4020,7 @@ out:
#ifdef CONFIG_SLABINFO
void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
{
- struct slab *slabp;
+ struct page *page;
unsigned long active_objs;
unsigned long num_objs;
unsigned long active_slabs = 0;
@@ -4061,22 +4040,22 @@ void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
check_irq_on();
spin_lock_irq(&n->list_lock);

- list_for_each_entry(slabp, &n->slabs_full, list) {
- if (slabp->active != cachep->num && !error)
+ list_for_each_entry(page, &n->slabs_full, lru) {
+ if (page->active != cachep->num && !error)
error = "slabs_full accounting error";
active_objs += cachep->num;
active_slabs++;
}
- list_for_each_entry(slabp, &n->slabs_partial, list) {
- if (slabp->active == cachep->num && !error)
+ list_for_each_entry(page, &n->slabs_partial, lru) {
+ if (page->active == cachep->num && !error)
error = "slabs_partial accounting error";
- if (!slabp->active && !error)
+ if (!page->active && !error)
error = "slabs_partial accounting error";
- active_objs += slabp->active;
+ active_objs += page->active;
active_slabs++;
}
- list_for_each_entry(slabp, &n->slabs_free, list) {
- if (slabp->active && !error)
+ list_for_each_entry(page, &n->slabs_free, lru) {
+ if (page->active && !error)
error = "slabs_free accounting error";
num_slabs++;
}
@@ -4229,19 +4208,20 @@ static inline int add_caller(unsigned long *n, unsigned long v)
return 1;
}

-static void handle_slab(unsigned long *n, struct kmem_cache *c, struct slab *s)
+static void handle_slab(unsigned long *n, struct kmem_cache *c,
+ struct page *page)
{
void *p;
int i, j;

if (n[0] == n[1])
return;
- for (i = 0, p = s->s_mem; i < c->num; i++, p += c->size) {
+ for (i = 0, p = page->s_mem; i < c->num; i++, p += c->size) {
bool active = true;

- for (j = s->active; j < c->num; j++) {
+ for (j = page->active; j < c->num; j++) {
/* Skip freed item */
- if (slab_bufctl(s)[j] == i) {
+ if (slab_bufctl(page)[j] == i) {
active = false;
break;
}
@@ -4273,7 +4253,7 @@ static void show_symbol(struct seq_file *m, unsigned long address)
static int leaks_show(struct seq_file *m, void *p)
{
struct kmem_cache *cachep = list_entry(p, struct kmem_cache, list);
- struct slab *slabp;
+ struct page *page;
struct kmem_cache_node *n;
const char *name;
unsigned long *x = m->private;
@@ -4297,10 +4277,10 @@ static int leaks_show(struct seq_file *m, void *p)
check_irq_on();
spin_lock_irq(&n->list_lock);

- list_for_each_entry(slabp, &n->slabs_full, list)
- handle_slab(x, cachep, slabp);
- list_for_each_entry(slabp, &n->slabs_partial, list)
- handle_slab(x, cachep, slabp);
+ list_for_each_entry(page, &n->slabs_full, lru)
+ handle_slab(x, cachep, page);
+ list_for_each_entry(page, &n->slabs_partial, lru)
+ handle_slab(x, cachep, page);
spin_unlock_irq(&n->list_lock);
}
name = cachep->name;
--
1.7.9.5

2013-10-16 08:45:24

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 14/15] slab: remove useless statement for checking pfmemalloc

Now, virt_to_page(page->s_mem) is same as the page,
because slab use this structure for management.
So remove useless statement.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index 0e7f2e7..fbb594f 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -750,9 +750,7 @@ static struct array_cache *alloc_arraycache(int node, int entries,

static inline bool is_slab_pfmemalloc(struct page *page)
{
- struct page *mem_page = virt_to_page(page->s_mem);
-
- return PageSlabPfmemalloc(mem_page);
+ return PageSlabPfmemalloc(page);
}

/* Clears pfmemalloc_active if no slabs have pfmalloc set */
@@ -817,7 +815,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
n = cachep->node[numa_mem_id()];
if (!list_empty(&n->slabs_free) && force_refill) {
struct page *page = virt_to_head_page(objp);
- ClearPageSlabPfmemalloc(virt_to_head_page(page->s_mem));
+ ClearPageSlabPfmemalloc(page);
clear_obj_pfmemalloc(&objp);
recheck_pfmemalloc_active(cachep, ac);
return objp;
@@ -850,8 +848,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
if (unlikely(pfmemalloc_active)) {
/* Some pfmemalloc slabs exist, check if this is one */
struct page *page = virt_to_head_page(objp);
- struct page *mem_page = virt_to_head_page(page->s_mem);
- if (PageSlabPfmemalloc(mem_page))
+ if (PageSlabPfmemalloc(page))
set_obj_pfmemalloc(&objp);
}

--
1.7.9.5

2013-10-16 08:44:23

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 06/15] slab: overloading the RCU head over the LRU for RCU free

With build-time size checking, we can overload the RCU head over the LRU
of struct page to free pages of a slab in rcu context. This really help to
implement to overload the struct slab over the struct page and this
eventually reduce memory usage and cache footprint of the SLAB.

Acked-by: Christoph Lameter <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d9851ee..8b85d8c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -130,6 +130,9 @@ struct page {

struct list_head list; /* slobs list of pages */
struct slab *slab_page; /* slab fields */
+ struct rcu_head rcu_head; /* Used by SLAB
+ * when destroying via RCU
+ */
};

/* Remainder is not double word aligned */
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 74f1058..c2bba24 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -53,7 +53,14 @@
* }
* rcu_read_unlock();
*
- * See also the comment on struct slab_rcu in mm/slab.c.
+ * This is useful if we need to approach a kernel structure obliquely,
+ * from its address obtained without the usual locking. We can lock
+ * the structure to stabilize it and check it's still at the given address,
+ * only if we can be sure that the memory has not been meanwhile reused
+ * for some other kind of object (which our subsystem's lock might corrupt).
+ *
+ * rcu_read_lock before reading the address, then rcu_read_unlock after
+ * taking the spinlock within the structure expected at that address.
*/
#define SLAB_DESTROY_BY_RCU 0x00080000UL /* Defer freeing slabs to RCU */
#define SLAB_MEM_SPREAD 0x00100000UL /* Spread some memory over cpuset */
diff --git a/mm/slab.c b/mm/slab.c
index 7e1aabe..84c4ed6 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -189,25 +189,6 @@ typedef unsigned int kmem_bufctl_t;
#define SLAB_LIMIT (((kmem_bufctl_t)(~0U))-3)

/*
- * struct slab_rcu
- *
- * slab_destroy on a SLAB_DESTROY_BY_RCU cache uses this structure to
- * arrange for kmem_freepages to be called via RCU. This is useful if
- * we need to approach a kernel structure obliquely, from its address
- * obtained without the usual locking. We can lock the structure to
- * stabilize it and check it's still at the given address, only if we
- * can be sure that the memory has not been meanwhile reused for some
- * other kind of object (which our subsystem's lock might corrupt).
- *
- * rcu_read_lock before reading the address, then rcu_read_unlock after
- * taking the spinlock within the structure expected at that address.
- */
-struct slab_rcu {
- struct rcu_head head;
- struct page *page;
-};
-
-/*
* struct slab
*
* Manages the objs in a slab. Placed either at the beginning of mem allocated
@@ -215,14 +196,11 @@ struct slab_rcu {
* Slabs are chained into three list: fully used, partial, fully free slabs.
*/
struct slab {
- union {
- struct {
- struct list_head list;
- void *s_mem; /* including colour offset */
- unsigned int inuse; /* num of objs active in slab */
- kmem_bufctl_t free;
- };
- struct slab_rcu __slab_cover_slab_rcu;
+ struct {
+ struct list_head list;
+ void *s_mem; /* including colour offset */
+ unsigned int inuse; /* num of objs active in slab */
+ kmem_bufctl_t free;
};
};

@@ -1509,6 +1487,8 @@ void __init kmem_cache_init(void)
{
int i;

+ BUILD_BUG_ON(sizeof(((struct page *)NULL)->lru) <
+ sizeof(struct rcu_head));
kmem_cache = &kmem_cache_boot;
setup_node_pointer(kmem_cache);

@@ -1822,12 +1802,13 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)

static void kmem_rcu_free(struct rcu_head *head)
{
- struct slab_rcu *slab_rcu = (struct slab_rcu *)head;
- struct kmem_cache *cachep = slab_rcu->page->slab_cache;
+ struct kmem_cache *cachep;
+ struct page *page;

- kmem_freepages(cachep, slab_rcu->page);
- if (OFF_SLAB(cachep))
- kmem_cache_free(cachep->slabp_cache, slab_rcu);
+ page = container_of(head, struct page, rcu_head);
+ cachep = page->slab_cache;
+
+ kmem_freepages(cachep, page);
}

#if DEBUG
@@ -2048,16 +2029,27 @@ static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)

slab_destroy_debugcheck(cachep, slabp);
if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {
- struct slab_rcu *slab_rcu;
+ struct rcu_head *head;
+
+ /*
+ * RCU free overloads the RCU head over the LRU.
+ * slab_page has been overloeaded over the LRU,
+ * however it is not used from now on so that
+ * we can use it safely.
+ */
+ head = (void *)&page->rcu_head;
+ call_rcu(head, kmem_rcu_free);

- slab_rcu = (struct slab_rcu *)slabp;
- slab_rcu->page = page;
- call_rcu(&slab_rcu->head, kmem_rcu_free);
} else {
kmem_freepages(cachep, page);
- if (OFF_SLAB(cachep))
- kmem_cache_free(cachep->slabp_cache, slabp);
}
+
+ /*
+ * From now on, we don't use slab management
+ * although actual page can be freed in rcu context
+ */
+ if (OFF_SLAB(cachep))
+ kmem_cache_free(cachep->slabp_cache, slabp);
}

/**
--
1.7.9.5

2013-10-16 08:46:35

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 11/15] slab: remove SLAB_LIMIT

It's useless now, so remove it.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index 6ced1cc..c271d5b 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -163,8 +163,6 @@
*/
static bool pfmemalloc_active __read_mostly;

-#define SLAB_LIMIT (((unsigned int)(~0U))-1)
-
/*
* struct slab
*
@@ -626,8 +624,6 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
mgmt_size = 0;
nr_objs = slab_size / buffer_size;

- if (nr_objs > SLAB_LIMIT)
- nr_objs = SLAB_LIMIT;
} else {
/*
* Ignore padding for the initial guess. The padding
@@ -648,9 +644,6 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
> slab_size)
nr_objs--;

- if (nr_objs > SLAB_LIMIT)
- nr_objs = SLAB_LIMIT;
-
mgmt_size = slab_mgmt_size(nr_objs, align);
}
*num = nr_objs;
--
1.7.9.5

2013-10-16 08:46:33

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 12/15] slab: replace free and inuse in struct slab with newly introduced active

Now, free in struct slab is same meaning as inuse.
So, remove both and replace them with active.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index c271d5b..2ec2336 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -174,8 +174,7 @@ struct slab {
struct {
struct list_head list;
void *s_mem; /* including colour offset */
- unsigned int inuse; /* num of objs active in slab */
- unsigned int free;
+ unsigned int active; /* num of objs active in slab */
};
};

@@ -1658,7 +1657,7 @@ slab_out_of_memory(struct kmem_cache *cachep, gfp_t gfpflags, int nodeid)
active_slabs++;
}
list_for_each_entry(slabp, &n->slabs_partial, list) {
- active_objs += slabp->inuse;
+ active_objs += slabp->active;
active_slabs++;
}
list_for_each_entry(slabp, &n->slabs_free, list)
@@ -2451,7 +2450,7 @@ static int drain_freelist(struct kmem_cache *cache,

slabp = list_entry(p, struct slab, list);
#if DEBUG
- BUG_ON(slabp->inuse);
+ BUG_ON(slabp->active);
#endif
list_del(&slabp->list);
/*
@@ -2570,9 +2569,8 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
slabp = addr + colour_off;
colour_off += cachep->slab_size;
}
- slabp->inuse = 0;
+ slabp->active = 0;
slabp->s_mem = addr + colour_off;
- slabp->free = 0;
return slabp;
}

@@ -2642,12 +2640,11 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
{
void *objp;

- slabp->inuse++;
- objp = index_to_obj(cachep, slabp, slab_bufctl(slabp)[slabp->free]);
+ objp = index_to_obj(cachep, slabp, slab_bufctl(slabp)[slabp->active]);
+ slabp->active++;
#if DEBUG
WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
#endif
- slabp->free++;

return objp;
}
@@ -2663,7 +2660,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);

/* Verify double free bug */
- for (i = slabp->free; i < cachep->num; i++) {
+ for (i = slabp->active; i < cachep->num; i++) {
if (slab_bufctl(slabp)[i] == objnr) {
printk(KERN_ERR "slab: double free detected in cache "
"'%s', objp %p\n", cachep->name, objp);
@@ -2671,9 +2668,8 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
}
}
#endif
- slabp->free--;
- slab_bufctl(slabp)[slabp->free] = objnr;
- slabp->inuse--;
+ slabp->active--;
+ slab_bufctl(slabp)[slabp->active] = objnr;
}

/*
@@ -2908,9 +2904,9 @@ retry:
* there must be at least one object available for
* allocation.
*/
- BUG_ON(slabp->inuse >= cachep->num);
+ BUG_ON(slabp->active >= cachep->num);

- while (slabp->inuse < cachep->num && batchcount--) {
+ while (slabp->active < cachep->num && batchcount--) {
STATS_INC_ALLOCED(cachep);
STATS_INC_ACTIVE(cachep);
STATS_SET_HIGH(cachep);
@@ -2921,7 +2917,7 @@ retry:

/* move slabp to correct slabp list: */
list_del(&slabp->list);
- if (slabp->free == cachep->num)
+ if (slabp->active == cachep->num)
list_add(&slabp->list, &n->slabs_full);
else
list_add(&slabp->list, &n->slabs_partial);
@@ -3206,14 +3202,14 @@ retry:
STATS_INC_ACTIVE(cachep);
STATS_SET_HIGH(cachep);

- BUG_ON(slabp->inuse == cachep->num);
+ BUG_ON(slabp->active == cachep->num);

obj = slab_get_obj(cachep, slabp, nodeid);
n->free_objects--;
/* move slabp to correct slabp list: */
list_del(&slabp->list);

- if (slabp->free == cachep->num)
+ if (slabp->active == cachep->num)
list_add(&slabp->list, &n->slabs_full);
else
list_add(&slabp->list, &n->slabs_partial);
@@ -3380,7 +3376,7 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
n->free_objects++;

/* fixup slab chains */
- if (slabp->inuse == 0) {
+ if (slabp->active == 0) {
if (n->free_objects > n->free_limit) {
n->free_objects -= cachep->num;
/* No need to drop any previously held
@@ -3441,7 +3437,7 @@ free_done:
struct slab *slabp;

slabp = list_entry(p, struct slab, list);
- BUG_ON(slabp->inuse);
+ BUG_ON(slabp->active);

i++;
p = p->next;
@@ -4066,22 +4062,22 @@ void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
spin_lock_irq(&n->list_lock);

list_for_each_entry(slabp, &n->slabs_full, list) {
- if (slabp->inuse != cachep->num && !error)
+ if (slabp->active != cachep->num && !error)
error = "slabs_full accounting error";
active_objs += cachep->num;
active_slabs++;
}
list_for_each_entry(slabp, &n->slabs_partial, list) {
- if (slabp->inuse == cachep->num && !error)
- error = "slabs_partial inuse accounting error";
- if (!slabp->inuse && !error)
- error = "slabs_partial/inuse accounting error";
- active_objs += slabp->inuse;
+ if (slabp->active == cachep->num && !error)
+ error = "slabs_partial accounting error";
+ if (!slabp->active && !error)
+ error = "slabs_partial accounting error";
+ active_objs += slabp->active;
active_slabs++;
}
list_for_each_entry(slabp, &n->slabs_free, list) {
- if (slabp->inuse && !error)
- error = "slabs_free/inuse accounting error";
+ if (slabp->active && !error)
+ error = "slabs_free accounting error";
num_slabs++;
}
free_objects += n->free_objects;
@@ -4243,7 +4239,7 @@ static void handle_slab(unsigned long *n, struct kmem_cache *c, struct slab *s)
for (i = 0, p = s->s_mem; i < c->num; i++, p += c->size) {
bool active = true;

- for (j = s->free; j < c->num; j++) {
+ for (j = s->active; j < c->num; j++) {
/* Skip freed item */
if (slab_bufctl(s)[j] == i) {
active = false;
--
1.7.9.5

2013-10-16 08:47:48

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 08/15] slab: use __GFP_COMP flag for allocating slab pages

If we use 'struct page' of first page as 'struct slab', there is no
advantage not to use __GFP_COMP. So use __GFP_COMP flag for all the cases.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index f9e676e..75c6082 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1718,15 +1718,6 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
{
struct page *page;
int nr_pages;
- int i;
-
-#ifndef CONFIG_MMU
- /*
- * Nommu uses slab's for process anonymous memory allocations, and thus
- * requires __GFP_COMP to properly refcount higher order allocations
- */
- flags |= __GFP_COMP;
-#endif

flags |= cachep->allocflags;
if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
@@ -1750,12 +1741,9 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
else
add_zone_page_state(page_zone(page),
NR_SLAB_UNRECLAIMABLE, nr_pages);
- for (i = 0; i < nr_pages; i++) {
- __SetPageSlab(page + i);
-
- if (page->pfmemalloc)
- SetPageSlabPfmemalloc(page);
- }
+ __SetPageSlab(page);
+ if (page->pfmemalloc)
+ SetPageSlabPfmemalloc(page);
memcg_bind_pages(cachep, cachep->gfporder);

if (kmemcheck_enabled && !(cachep->flags & SLAB_NOTRACK)) {
@@ -1775,8 +1763,7 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
*/
static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
{
- unsigned long i = (1 << cachep->gfporder);
- const unsigned long nr_freed = i;
+ const unsigned long nr_freed = (1 << cachep->gfporder);

kmemcheck_free_shadow(page, cachep->gfporder);

@@ -1787,12 +1774,9 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
sub_zone_page_state(page_zone(page),
NR_SLAB_UNRECLAIMABLE, nr_freed);

+ BUG_ON(!PageSlab(page));
__ClearPageSlabPfmemalloc(page);
- while (i--) {
- BUG_ON(!PageSlab(page));
- __ClearPageSlab(page);
- page++;
- }
+ __ClearPageSlab(page);

memcg_release_pages(cachep, cachep->gfporder);
if (current->reclaim_state)
@@ -2362,7 +2346,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
cachep->colour = left_over / cachep->colour_off;
cachep->slab_size = slab_size;
cachep->flags = flags;
- cachep->allocflags = 0;
+ cachep->allocflags = __GFP_COMP;
if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA))
cachep->allocflags |= GFP_DMA;
cachep->size = size;
@@ -2729,17 +2713,8 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
struct page *page)
{
- int nr_pages;
-
- nr_pages = 1;
- if (likely(!PageCompound(page)))
- nr_pages <<= cache->gfporder;
-
- do {
- page->slab_cache = cache;
- page->slab_page = slab;
- page++;
- } while (--nr_pages);
+ page->slab_cache = cache;
+ page->slab_page = slab;
}

/*
--
1.7.9.5

2013-10-16 08:47:47

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 09/15] slab: change the management method of free objects of the slab

Current free objects management method of the slab is weird, because
it touch random position of the array of kmem_bufctl_t when we try to
get free object. See following example.

struct slab's free = 6
kmem_bufctl_t array: 1 END 5 7 0 4 3 2

To get free objects, we access this array with following pattern.
6 -> 3 -> 7 -> 2 -> 5 -> 4 -> 0 -> 1 -> END

If we have many objects, this array would be larger and be not in the same
cache line. It is not good for performance.

We can do same thing through more easy way, like as the stack.
Only thing we have to do is to maintain stack top to free object. I use
free field of struct slab for this purpose. After that, if we need to get
an object, we can get it at stack top and manipulate top pointer.
That's all. This method already used in array_cache management.
Following is an access pattern when we use this method.

struct slab's free = 0
kmem_bufctl_t array: 6 3 7 2 5 4 0 1

To get free objects, we access this array with following pattern.
0 -> 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7

This may help cache line footprint if slab has many objects, and,
in addition, this makes code much much simpler.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index 75c6082..05fe37e 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -183,9 +183,6 @@ static bool pfmemalloc_active __read_mostly;
*/

typedef unsigned int kmem_bufctl_t;
-#define BUFCTL_END (((kmem_bufctl_t)(~0U))-0)
-#define BUFCTL_FREE (((kmem_bufctl_t)(~0U))-1)
-#define BUFCTL_ACTIVE (((kmem_bufctl_t)(~0U))-2)
#define SLAB_LIMIT (((kmem_bufctl_t)(~0U))-3)

/*
@@ -2653,9 +2650,8 @@ static void cache_init_objs(struct kmem_cache *cachep,
if (cachep->ctor)
cachep->ctor(objp);
#endif
- slab_bufctl(slabp)[i] = i + 1;
+ slab_bufctl(slabp)[i] = i;
}
- slab_bufctl(slabp)[i - 1] = BUFCTL_END;
}

static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
@@ -2671,16 +2667,14 @@ static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
int nodeid)
{
- void *objp = index_to_obj(cachep, slabp, slabp->free);
- kmem_bufctl_t next;
+ void *objp;

slabp->inuse++;
- next = slab_bufctl(slabp)[slabp->free];
+ objp = index_to_obj(cachep, slabp, slab_bufctl(slabp)[slabp->free]);
#if DEBUG
- slab_bufctl(slabp)[slabp->free] = BUFCTL_FREE;
WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
#endif
- slabp->free = next;
+ slabp->free++;

return objp;
}
@@ -2689,19 +2683,23 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
void *objp, int nodeid)
{
unsigned int objnr = obj_to_index(cachep, slabp, objp);
-
#if DEBUG
+ kmem_bufctl_t i;
+
/* Verify that the slab belongs to the intended node */
WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);

- if (slab_bufctl(slabp)[objnr] + 1 <= SLAB_LIMIT + 1) {
- printk(KERN_ERR "slab: double free detected in cache "
- "'%s', objp %p\n", cachep->name, objp);
- BUG();
+ /* Verify double free bug */
+ for (i = slabp->free; i < cachep->num; i++) {
+ if (slab_bufctl(slabp)[i] == objnr) {
+ printk(KERN_ERR "slab: double free detected in cache "
+ "'%s', objp %p\n", cachep->name, objp);
+ BUG();
+ }
}
#endif
- slab_bufctl(slabp)[objnr] = slabp->free;
- slabp->free = objnr;
+ slabp->free--;
+ slab_bufctl(slabp)[slabp->free] = objnr;
slabp->inuse--;
}

@@ -2862,9 +2860,6 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
BUG_ON(objnr >= cachep->num);
BUG_ON(objp != index_to_obj(cachep, slabp, objnr));

-#ifdef CONFIG_DEBUG_SLAB_LEAK
- slab_bufctl(slabp)[objnr] = BUFCTL_FREE;
-#endif
if (cachep->flags & SLAB_POISON) {
#ifdef CONFIG_DEBUG_PAGEALLOC
if ((cachep->size % PAGE_SIZE)==0 && OFF_SLAB(cachep)) {
@@ -2881,33 +2876,9 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
return objp;
}

-static void check_slabp(struct kmem_cache *cachep, struct slab *slabp)
-{
- kmem_bufctl_t i;
- int entries = 0;
-
- /* Check slab's freelist to see if this obj is there. */
- for (i = slabp->free; i != BUFCTL_END; i = slab_bufctl(slabp)[i]) {
- entries++;
- if (entries > cachep->num || i >= cachep->num)
- goto bad;
- }
- if (entries != cachep->num - slabp->inuse) {
-bad:
- printk(KERN_ERR "slab: Internal list corruption detected in "
- "cache '%s'(%d), slabp %p(%d). Tainted(%s). Hexdump:\n",
- cachep->name, cachep->num, slabp, slabp->inuse,
- print_tainted());
- print_hex_dump(KERN_ERR, "", DUMP_PREFIX_OFFSET, 16, 1, slabp,
- sizeof(*slabp) + cachep->num * sizeof(kmem_bufctl_t),
- 1);
- BUG();
- }
-}
#else
#define kfree_debugcheck(x) do { } while(0)
#define cache_free_debugcheck(x,objp,z) (objp)
-#define check_slabp(x,y) do { } while(0)
#endif

static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags,
@@ -2957,7 +2928,6 @@ retry:
}

slabp = list_entry(entry, struct slab, list);
- check_slabp(cachep, slabp);
check_spinlock_acquired(cachep);

/*
@@ -2975,11 +2945,10 @@ retry:
ac_put_obj(cachep, ac, slab_get_obj(cachep, slabp,
node));
}
- check_slabp(cachep, slabp);

/* move slabp to correct slabp list: */
list_del(&slabp->list);
- if (slabp->free == BUFCTL_END)
+ if (slabp->free == cachep->num)
list_add(&slabp->list, &n->slabs_full);
else
list_add(&slabp->list, &n->slabs_partial);
@@ -3054,16 +3023,6 @@ static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
*dbg_redzone1(cachep, objp) = RED_ACTIVE;
*dbg_redzone2(cachep, objp) = RED_ACTIVE;
}
-#ifdef CONFIG_DEBUG_SLAB_LEAK
- {
- struct slab *slabp;
- unsigned objnr;
-
- slabp = virt_to_slab(objp);
- objnr = (unsigned)(objp - slabp->s_mem) / cachep->size;
- slab_bufctl(slabp)[objnr] = BUFCTL_ACTIVE;
- }
-#endif
objp += obj_offset(cachep);
if (cachep->ctor && cachep->flags & SLAB_POISON)
cachep->ctor(objp);
@@ -3269,7 +3228,6 @@ retry:

slabp = list_entry(entry, struct slab, list);
check_spinlock_acquired_node(cachep, nodeid);
- check_slabp(cachep, slabp);

STATS_INC_NODEALLOCS(cachep);
STATS_INC_ACTIVE(cachep);
@@ -3278,12 +3236,11 @@ retry:
BUG_ON(slabp->inuse == cachep->num);

obj = slab_get_obj(cachep, slabp, nodeid);
- check_slabp(cachep, slabp);
n->free_objects--;
/* move slabp to correct slabp list: */
list_del(&slabp->list);

- if (slabp->free == BUFCTL_END)
+ if (slabp->free == cachep->num)
list_add(&slabp->list, &n->slabs_full);
else
list_add(&slabp->list, &n->slabs_partial);
@@ -3445,11 +3402,9 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
n = cachep->node[node];
list_del(&slabp->list);
check_spinlock_acquired_node(cachep, node);
- check_slabp(cachep, slabp);
slab_put_obj(cachep, slabp, objp, node);
STATS_DEC_ACTIVE(cachep);
n->free_objects++;
- check_slabp(cachep, slabp);

/* fixup slab chains */
if (slabp->inuse == 0) {
@@ -4308,12 +4263,23 @@ static inline int add_caller(unsigned long *n, unsigned long v)
static void handle_slab(unsigned long *n, struct kmem_cache *c, struct slab *s)
{
void *p;
- int i;
+ int i, j;
+
if (n[0] == n[1])
return;
for (i = 0, p = s->s_mem; i < c->num; i++, p += c->size) {
- if (slab_bufctl(s)[i] != BUFCTL_ACTIVE)
+ bool active = true;
+
+ for (j = s->free; j < c->num; j++) {
+ /* Skip freed item */
+ if (slab_bufctl(s)[j] == i) {
+ active = false;
+ break;
+ }
+ }
+ if (!active)
continue;
+
if (!add_caller(n, (unsigned long)*dbg_userword(c, p)))
return;
}
--
1.7.9.5

2013-10-16 08:47:46

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 10/15] slab: remove kmem_bufctl_t

Now, we changed the management method of free objects of the slab and
there is no need to use special value, BUFCTL_END, BUFCTL_FREE and
BUFCTL_ACTIVE. So remove them.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index 05fe37e..6ced1cc 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -163,27 +163,7 @@
*/
static bool pfmemalloc_active __read_mostly;

-/*
- * kmem_bufctl_t:
- *
- * Bufctl's are used for linking objs within a slab
- * linked offsets.
- *
- * This implementation relies on "struct page" for locating the cache &
- * slab an object belongs to.
- * This allows the bufctl structure to be small (one int), but limits
- * the number of objects a slab (not a cache) can contain when off-slab
- * bufctls are used. The limit is the size of the largest general cache
- * that does not use off-slab slabs.
- * For 32bit archs with 4 kB pages, is this 56.
- * This is not serious, as it is only for large objects, when it is unwise
- * to have too many per slab.
- * Note: This limit can be raised by introducing a general cache whose size
- * is less than 512 (PAGE_SIZE<<3), but greater than 256.
- */
-
-typedef unsigned int kmem_bufctl_t;
-#define SLAB_LIMIT (((kmem_bufctl_t)(~0U))-3)
+#define SLAB_LIMIT (((unsigned int)(~0U))-1)

/*
* struct slab
@@ -197,7 +177,7 @@ struct slab {
struct list_head list;
void *s_mem; /* including colour offset */
unsigned int inuse; /* num of objs active in slab */
- kmem_bufctl_t free;
+ unsigned int free;
};
};

@@ -613,7 +593,7 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)

static size_t slab_mgmt_size(size_t nr_objs, size_t align)
{
- return ALIGN(sizeof(struct slab)+nr_objs*sizeof(kmem_bufctl_t), align);
+ return ALIGN(sizeof(struct slab)+nr_objs*sizeof(unsigned int), align);
}

/*
@@ -633,7 +613,7 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
* slab is used for:
*
* - The struct slab
- * - One kmem_bufctl_t for each object
+ * - One unsigned int for each object
* - Padding to respect alignment of @align
* - @buffer_size bytes for each object
*
@@ -658,7 +638,7 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
* into account.
*/
nr_objs = (slab_size - sizeof(struct slab)) /
- (buffer_size + sizeof(kmem_bufctl_t));
+ (buffer_size + sizeof(unsigned int));

/*
* This calculated number will be either the right
@@ -2068,7 +2048,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
* looping condition in cache_grow().
*/
offslab_limit = size - sizeof(struct slab);
- offslab_limit /= sizeof(kmem_bufctl_t);
+ offslab_limit /= sizeof(unsigned int);

if (num > offslab_limit)
break;
@@ -2309,7 +2289,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
if (!cachep->num)
return -E2BIG;

- slab_size = ALIGN(cachep->num * sizeof(kmem_bufctl_t)
+ slab_size = ALIGN(cachep->num * sizeof(unsigned int)
+ sizeof(struct slab), cachep->align);

/*
@@ -2324,7 +2304,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
if (flags & CFLGS_OFF_SLAB) {
/* really off slab. No need for manual alignment */
slab_size =
- cachep->num * sizeof(kmem_bufctl_t) + sizeof(struct slab);
+ cachep->num * sizeof(unsigned int) + sizeof(struct slab);

#ifdef CONFIG_PAGE_POISONING
/* If we're going to use the generic kernel_map_pages()
@@ -2603,9 +2583,9 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
return slabp;
}

-static inline kmem_bufctl_t *slab_bufctl(struct slab *slabp)
+static inline unsigned int *slab_bufctl(struct slab *slabp)
{
- return (kmem_bufctl_t *) (slabp + 1);
+ return (unsigned int *) (slabp + 1);
}

static void cache_init_objs(struct kmem_cache *cachep,
@@ -2684,7 +2664,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
{
unsigned int objnr = obj_to_index(cachep, slabp, objp);
#if DEBUG
- kmem_bufctl_t i;
+ unsigned int i;

/* Verify that the slab belongs to the intended node */
WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
--
1.7.9.5

2013-10-16 08:50:42

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 04/15] slab: remove nodeid in struct slab

We can get nodeid using address translation, so this field is not useful.
Therefore, remove it.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index 34eb115..71ba8f5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -222,7 +222,6 @@ struct slab {
void *s_mem; /* including colour offset */
unsigned int inuse; /* num of objs active in slab */
kmem_bufctl_t free;
- unsigned short nodeid;
};
struct slab_rcu __slab_cover_slab_rcu;
};
@@ -1099,8 +1098,7 @@ static void drain_alien_cache(struct kmem_cache *cachep,

static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
{
- struct slab *slabp = virt_to_slab(objp);
- int nodeid = slabp->nodeid;
+ int nodeid = page_to_nid(virt_to_page(objp));
struct kmem_cache_node *n;
struct array_cache *alien = NULL;
int node;
@@ -1111,7 +1109,7 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
* Make sure we are not freeing a object from another node to the array
* cache on this cpu.
*/
- if (likely(slabp->nodeid == node))
+ if (likely(nodeid == node))
return 0;

n = cachep->node[node];
@@ -2630,7 +2628,6 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
}
slabp->inuse = 0;
slabp->s_mem = addr + colour_off;
- slabp->nodeid = nodeid;
slabp->free = 0;
return slabp;
}
@@ -2707,7 +2704,7 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
next = slab_bufctl(slabp)[slabp->free];
#if DEBUG
slab_bufctl(slabp)[slabp->free] = BUFCTL_FREE;
- WARN_ON(slabp->nodeid != nodeid);
+ WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
#endif
slabp->free = next;

@@ -2721,7 +2718,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,

#if DEBUG
/* Verify that the slab belongs to the intended node */
- WARN_ON(slabp->nodeid != nodeid);
+ WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);

if (slab_bufctl(slabp)[objnr] + 1 <= SLAB_LIMIT + 1) {
printk(KERN_ERR "slab: double free detected in cache "
--
1.7.9.5

2013-10-16 08:51:25

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 02/15] slab: change return type of kmem_getpages() to struct page

It is more understandable that kmem_getpages() return struct page.
And, with this, we can reduce one translation from virt addr to page and
makes better code than before. Below is a change of this patch.

* Before
text data bss dec hex filename
22123 23434 4 45561 b1f9 mm/slab.o

* After
text data bss dec hex filename
22074 23434 4 45512 b1c8 mm/slab.o

And this help following patch to remove struct slab's colouroff.

Acked-by: Christoph Lameter <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index 0b4ddaf..7d79bd7 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -205,7 +205,7 @@ typedef unsigned int kmem_bufctl_t;
struct slab_rcu {
struct rcu_head head;
struct kmem_cache *cachep;
- void *addr;
+ struct page *page;
};

/*
@@ -1737,7 +1737,8 @@ slab_out_of_memory(struct kmem_cache *cachep, gfp_t gfpflags, int nodeid)
* did not request dmaable memory, we might get it, but that
* would be relatively rare and ignorable.
*/
-static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
+static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
+ int nodeid)
{
struct page *page;
int nr_pages;
@@ -1790,16 +1791,15 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
kmemcheck_mark_unallocated_pages(page, nr_pages);
}

- return page_address(page);
+ return page;
}

/*
* Interface to system's page release.
*/
-static void kmem_freepages(struct kmem_cache *cachep, void *addr)
+static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
{
unsigned long i = (1 << cachep->gfporder);
- struct page *page = virt_to_page(addr);
const unsigned long nr_freed = i;

kmemcheck_free_shadow(page, cachep->gfporder);
@@ -1821,7 +1821,7 @@ static void kmem_freepages(struct kmem_cache *cachep, void *addr)
memcg_release_pages(cachep, cachep->gfporder);
if (current->reclaim_state)
current->reclaim_state->reclaimed_slab += nr_freed;
- free_memcg_kmem_pages((unsigned long)addr, cachep->gfporder);
+ __free_memcg_kmem_pages(page, cachep->gfporder);
}

static void kmem_rcu_free(struct rcu_head *head)
@@ -1829,7 +1829,7 @@ static void kmem_rcu_free(struct rcu_head *head)
struct slab_rcu *slab_rcu = (struct slab_rcu *)head;
struct kmem_cache *cachep = slab_rcu->cachep;

- kmem_freepages(cachep, slab_rcu->addr);
+ kmem_freepages(cachep, slab_rcu->page);
if (OFF_SLAB(cachep))
kmem_cache_free(cachep->slabp_cache, slab_rcu);
}
@@ -2048,7 +2048,7 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slab
*/
static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
{
- void *addr = slabp->s_mem - slabp->colouroff;
+ struct page *page = virt_to_head_page(slabp->s_mem);

slab_destroy_debugcheck(cachep, slabp);
if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {
@@ -2056,10 +2056,10 @@ static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)

slab_rcu = (struct slab_rcu *)slabp;
slab_rcu->cachep = cachep;
- slab_rcu->addr = addr;
+ slab_rcu->page = page;
call_rcu(&slab_rcu->head, kmem_rcu_free);
} else {
- kmem_freepages(cachep, addr);
+ kmem_freepages(cachep, page);
if (OFF_SLAB(cachep))
kmem_cache_free(cachep->slabp_cache, slabp);
}
@@ -2604,11 +2604,12 @@ int __kmem_cache_shutdown(struct kmem_cache *cachep)
* kmem_find_general_cachep till the initialization is complete.
* Hence we cannot have slabp_cache same as the original cache.
*/
-static struct slab *alloc_slabmgmt(struct kmem_cache *cachep, void *objp,
- int colour_off, gfp_t local_flags,
- int nodeid)
+static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
+ struct page *page, int colour_off,
+ gfp_t local_flags, int nodeid)
{
struct slab *slabp;
+ void *addr = page_address(page);

if (OFF_SLAB(cachep)) {
/* Slab management obj is off-slab. */
@@ -2625,12 +2626,12 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep, void *objp,
if (!slabp)
return NULL;
} else {
- slabp = objp + colour_off;
+ slabp = addr + colour_off;
colour_off += cachep->slab_size;
}
slabp->inuse = 0;
slabp->colouroff = colour_off;
- slabp->s_mem = objp + colour_off;
+ slabp->s_mem = addr + colour_off;
slabp->nodeid = nodeid;
slabp->free = 0;
return slabp;
@@ -2741,12 +2742,9 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
* virtual address for kfree, ksize, and slab debugging.
*/
static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
- void *addr)
+ struct page *page)
{
int nr_pages;
- struct page *page;
-
- page = virt_to_page(addr);

nr_pages = 1;
if (likely(!PageCompound(page)))
@@ -2764,7 +2762,7 @@ static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
* kmem_cache_alloc() when there are no active objs left in a cache.
*/
static int cache_grow(struct kmem_cache *cachep,
- gfp_t flags, int nodeid, void *objp)
+ gfp_t flags, int nodeid, struct page *page)
{
struct slab *slabp;
size_t offset;
@@ -2807,18 +2805,18 @@ static int cache_grow(struct kmem_cache *cachep,
* Get mem for the objs. Attempt to allocate a physical page from
* 'nodeid'.
*/
- if (!objp)
- objp = kmem_getpages(cachep, local_flags, nodeid);
- if (!objp)
+ if (!page)
+ page = kmem_getpages(cachep, local_flags, nodeid);
+ if (!page)
goto failed;

/* Get slab management. */
- slabp = alloc_slabmgmt(cachep, objp, offset,
+ slabp = alloc_slabmgmt(cachep, page, offset,
local_flags & ~GFP_CONSTRAINT_MASK, nodeid);
if (!slabp)
goto opps1;

- slab_map_pages(cachep, slabp, objp);
+ slab_map_pages(cachep, slabp, page);

cache_init_objs(cachep, slabp);

@@ -2834,7 +2832,7 @@ static int cache_grow(struct kmem_cache *cachep,
spin_unlock(&n->list_lock);
return 1;
opps1:
- kmem_freepages(cachep, objp);
+ kmem_freepages(cachep, page);
failed:
if (local_flags & __GFP_WAIT)
local_irq_disable();
@@ -3250,18 +3248,20 @@ retry:
* We may trigger various forms of reclaim on the allowed
* set and go into memory reserves if necessary.
*/
+ struct page *page;
+
if (local_flags & __GFP_WAIT)
local_irq_enable();
kmem_flagcheck(cache, flags);
- obj = kmem_getpages(cache, local_flags, numa_mem_id());
+ page = kmem_getpages(cache, local_flags, numa_mem_id());
if (local_flags & __GFP_WAIT)
local_irq_disable();
- if (obj) {
+ if (page) {
/*
* Insert into the appropriate per node queues
*/
- nid = page_to_nid(virt_to_page(obj));
- if (cache_grow(cache, flags, nid, obj)) {
+ nid = page_to_nid(page);
+ if (cache_grow(cache, flags, nid, page)) {
obj = ____cache_alloc_node(cache,
flags | GFP_THISNODE, nid);
if (!obj)
--
1.7.9.5

2013-10-16 08:51:24

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 03/15] slab: remove colouroff in struct slab

Now there is no user colouroff, so remove it.

Acked-by: Christoph Lameter <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index 7d79bd7..34eb115 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -219,7 +219,6 @@ struct slab {
union {
struct {
struct list_head list;
- unsigned long colouroff;
void *s_mem; /* including colour offset */
unsigned int inuse; /* num of objs active in slab */
kmem_bufctl_t free;
@@ -2630,7 +2629,6 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
colour_off += cachep->slab_size;
}
slabp->inuse = 0;
- slabp->colouroff = colour_off;
slabp->s_mem = addr + colour_off;
slabp->nodeid = nodeid;
slabp->free = 0;
--
1.7.9.5

Subject: Re: [PATCH v2 01/15] slab: correct pfmemalloc check

On Wed, 16 Oct 2013, Joonsoo Kim wrote:

> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -930,7 +930,8 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
> {
> if (unlikely(pfmemalloc_active)) {
> /* Some pfmemalloc slabs exist, check if this is one */
> - struct page *page = virt_to_head_page(objp);
> + struct slab *slabp = virt_to_slab(objp);
> + struct page *page = virt_to_head_page(slabp->s_mem);
> if (PageSlabPfmemalloc(page))

I hope the compiler optimizes this code correctly because virt_to_slab
already does one virt_to_head_page()?

Otherwise this looks fine.

2013-10-16 20:35:00

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] slab: overload struct slab over struct page to reduce memory usage

On Wed, 16 Oct 2013 17:43:57 +0900 Joonsoo Kim <[email protected]> wrote:

> There is two main topics in this patchset. One is to reduce memory usage
> and the other is to change a management method of free objects of a slab.
>
> The SLAB allocate a struct slab for each slab. The size of this structure
> except bufctl array is 40 bytes on 64 bits machine. We can reduce memory
> waste and cache footprint if we overload struct slab over struct page.

Seems a good idea from a quick look.

A thought: when we do things like this - adding additional
interpretations to `struct page', we need to bear in mind that other
unrelated code can inspect that pageframe. It is not correct to assume
that because slab "owns" this page, no other code will be looking at it
and interpreting its contents.

One example is mm/memory-failure.c:memory_failure(). It starts with a
raw pfn, uses that to get at the `struct page', then starts playing
around with it. Will that code still work correctly when some of the
page's fields have been overlayed with slab-specific contents?

And memory_failure() is just one example - another is compact_zone()
and there may well be others.

This issue hasn't been well thought through. Given a random struct
page, there isn't any protocol to determine what it actually *is*.
It's a plain old variant record, but it lacks the agreed-upon tag field
which tells users which variant is currently in use.

2013-10-17 05:27:13

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH v2 01/15] slab: correct pfmemalloc check

On Wed, Oct 16, 2013 at 03:27:54PM +0000, Christoph Lameter wrote:
> On Wed, 16 Oct 2013, Joonsoo Kim wrote:
>
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -930,7 +930,8 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
> > {
> > if (unlikely(pfmemalloc_active)) {
> > /* Some pfmemalloc slabs exist, check if this is one */
> > - struct page *page = virt_to_head_page(objp);
> > + struct slab *slabp = virt_to_slab(objp);
> > + struct page *page = virt_to_head_page(slabp->s_mem);
> > if (PageSlabPfmemalloc(page))
>
> I hope the compiler optimizes this code correctly because virt_to_slab
> already does one virt_to_head_page()?

It should not.
objp could be in a different page with slabp->s_mem's,
so virt_to_head_page() should be called twice.

Anyway, after implementing struct slab overloading, one call site is
removed by [14/15] in this patchset, so there is no issue.

Thanks.

>
> Otherwise this looks fine.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2013-10-17 06:01:33

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] slab: overload struct slab over struct page to reduce memory usage

On Wed, Oct 16, 2013 at 01:34:57PM -0700, Andrew Morton wrote:
> On Wed, 16 Oct 2013 17:43:57 +0900 Joonsoo Kim <[email protected]> wrote:
>
> > There is two main topics in this patchset. One is to reduce memory usage
> > and the other is to change a management method of free objects of a slab.
> >
> > The SLAB allocate a struct slab for each slab. The size of this structure
> > except bufctl array is 40 bytes on 64 bits machine. We can reduce memory
> > waste and cache footprint if we overload struct slab over struct page.
>
> Seems a good idea from a quick look.

Thanks :)

>
> A thought: when we do things like this - adding additional
> interpretations to `struct page', we need to bear in mind that other
> unrelated code can inspect that pageframe. It is not correct to assume
> that because slab "owns" this page, no other code will be looking at it
> and interpreting its contents.
>
> One example is mm/memory-failure.c:memory_failure(). It starts with a
> raw pfn, uses that to get at the `struct page', then starts playing
> around with it. Will that code still work correctly when some of the
> page's fields have been overlayed with slab-specific contents?

Yes, it would work correctly since the SLUB already overload many fields
of struct page with slab-specific contents. One exception is mapping field
which isn't overloaded by the SLUB. But I guess there is no problem,
because the code inspecting random struct page may be already check
PageSlab() or PageLRU() and then skip it if so.

>
> And memory_failure() is just one example - another is compact_zone()
> and there may well be others.
>
> This issue hasn't been well thought through. Given a random struct
> page, there isn't any protocol to determine what it actually *is*.
> It's a plain old variant record, but it lacks the agreed-upon tag field
> which tells users which variant is currently in use.

With PageSlab(), we can determine that this page is the slab page.
Do we need more? Is there another user who overload struct page?

Thanks.

> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2013-10-17 07:20:13

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] slab: overload struct slab over struct page to reduce memory usage

On 10/16/13 10:34 PM, Andrew Morton wrote:
> On Wed, 16 Oct 2013 17:43:57 +0900 Joonsoo Kim <[email protected]> wrote:
>
>> There is two main topics in this patchset. One is to reduce memory usage
>> and the other is to change a management method of free objects of a slab.
>>
>> The SLAB allocate a struct slab for each slab. The size of this structure
>> except bufctl array is 40 bytes on 64 bits machine. We can reduce memory
>> waste and cache footprint if we overload struct slab over struct page.
> Seems a good idea from a quick look.

Indeed.

Christoph, I'd like to pick this up and queue for linux-next. Any
objections or comments to the patches?

Pekka

Subject: Re: [PATCH v2 00/15] slab: overload struct slab over struct page to reduce memory usage

On Thu, 17 Oct 2013, Pekka Enberg wrote:

> On 10/16/13 10:34 PM, Andrew Morton wrote:
> > On Wed, 16 Oct 2013 17:43:57 +0900 Joonsoo Kim <[email protected]>
> > wrote:
> >
> > > There is two main topics in this patchset. One is to reduce memory usage
> > > and the other is to change a management method of free objects of a slab.
> > >
> > > The SLAB allocate a struct slab for each slab. The size of this structure
> > > except bufctl array is 40 bytes on 64 bits machine. We can reduce memory
> > > waste and cache footprint if we overload struct slab over struct page.
> > Seems a good idea from a quick look.
>
> Indeed.
>
> Christoph, I'd like to pick this up and queue for linux-next. Any
> objections or comments to the patches?

I think this is fine. I have looked through the whole set repeatedly and
like the overall approach but I have I have only commented in detail on a
the beginning part of it. There was always something coming up. Sigh.

Subject: Re: [PATCH v2 04/15] slab: remove nodeid in struct slab

On Wed, 16 Oct 2013, Joonsoo Kim wrote:

> We can get nodeid using address translation, so this field is not useful.
> Therefore, remove it.

Acked-by: Christoph Lameter <[email protected]>

Subject: Re: [PATCH v2 08/15] slab: use __GFP_COMP flag for allocating slab pages

On Wed, 16 Oct 2013, Joonsoo Kim wrote:

> If we use 'struct page' of first page as 'struct slab', there is no
> advantage not to use __GFP_COMP. So use __GFP_COMP flag for all the cases.

Yes this is going to make the allocators behave in the same way. We could
actually put some of the page allocator related functionality in
slab_common.c

Acked-by: Christoph Lameter <[email protected]>

Subject: Re: [PATCH v2 11/15] slab: remove SLAB_LIMIT

On Wed, 16 Oct 2013, Joonsoo Kim wrote:

> It's useless now, so remove it.

Acked-by: Christoph Lameter <[email protected]>

Subject: Re: [PATCH v2 13/15] slab: use struct page for slab management

On Wed, 16 Oct 2013, Joonsoo Kim wrote:

> - * see PAGE_MAPPING_ANON below.
> - */
> + union {
> + struct address_space *mapping; /* If low bit clear, points to
> + * inode address_space, or NULL.
> + * If page mapped as anonymous
> + * memory, low bit is set, and
> + * it points to anon_vma object:
> + * see PAGE_MAPPING_ANON below.
> + */
> + void *s_mem; /* slab first object */
> + };

The overloading of mapping has caused problems in the past since slab
pages are (or are they no longer?) used for DMA to disk. At that point the
I/O subsystem may be expecting a mapping in the page struct if this field
is not NULL.

Subject: Re: [PATCH v2 00/15] slab: overload struct slab over struct page to reduce memory usage

On Wed, 16 Oct 2013, Andrew Morton wrote:

> This issue hasn't been well thought through. Given a random struct
> page, there isn't any protocol to determine what it actually *is*.
> It's a plain old variant record, but it lacks the agreed-upon tag field
> which tells users which variant is currently in use.

This issue has bitten us when SLUB was first introduced. We found out the
hard way f.e. that the mapping field had meaning in various contexts for a
slab page.

2013-10-18 15:05:16

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH v2 13/15] slab: use struct page for slab management

2013/10/18 Christoph Lameter <[email protected]>:
> On Wed, 16 Oct 2013, Joonsoo Kim wrote:
>
>> - * see PAGE_MAPPING_ANON below.
>> - */
>> + union {
>> + struct address_space *mapping; /* If low bit clear, points to
>> + * inode address_space, or NULL.
>> + * If page mapped as anonymous
>> + * memory, low bit is set, and
>> + * it points to anon_vma object:
>> + * see PAGE_MAPPING_ANON below.
>> + */
>> + void *s_mem; /* slab first object */
>> + };
>
> The overloading of mapping has caused problems in the past since slab
> pages are (or are they no longer?) used for DMA to disk. At that point the
> I/O subsystem may be expecting a mapping in the page struct if this field
> is not NULL.

I search the history of struct page and find that the SLUB use mapping field
in past (2007 year). At that time, you inserted VM_BUG_ON(PageSlab(page))
('b5fab14') into page_mapping() function to find remaining use. Recently,
I never hear that this is triggered and 6 years have passed since inserting
VM_BUG_ON(), so I guess there is no problem to use it.
If this argument is reasonable, please give me an ACK :)

Thanks.

2013-10-18 15:13:36

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH v2 08/15] slab: use __GFP_COMP flag for allocating slab pages

2013/10/18 Christoph Lameter <[email protected]>:
> On Wed, 16 Oct 2013, Joonsoo Kim wrote:
>
>> If we use 'struct page' of first page as 'struct slab', there is no
>> advantage not to use __GFP_COMP. So use __GFP_COMP flag for all the cases.
>
> Yes this is going to make the allocators behave in the same way. We could
> actually put some of the page allocator related functionality in
> slab_common.c

Okay. After merging this, I will try to clean-up that.

> Acked-by: Christoph Lameter <[email protected]>

Thanks.

2013-10-19 22:39:59

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] slab: overload struct slab over struct page to reduce memory usage

Andrew Morton <[email protected]> writes:
>
> One example is mm/memory-failure.c:memory_failure(). It starts with a
> raw pfn, uses that to get at the `struct page', then starts playing
> around with it. Will that code still work correctly when some of the
> page's fields have been overlayed with slab-specific contents?

As long as PageSlab() works correctly memory_failure should be happy.

>
> This issue hasn't been well thought through. Given a random struct
> page, there isn't any protocol to determine what it actually *is*.
> It's a plain old variant record, but it lacks the agreed-upon tag field
> which tells users which variant is currently in use.

PageSlab() should work for this right?

For the generic case it may not though.

-Andi

--
[email protected] -- Speaking for myself only

2013-10-19 22:41:16

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] slab: overload struct slab over struct page to reduce memory usage

Joonsoo Kim <[email protected]> writes:

> There is two main topics in this patchset. One is to reduce memory usage
> and the other is to change a management method of free objects of a slab.

I did a quick read over the whole patchset and it looks good to me. I
especially like how much code you remove. And of course the benchmarks
are looking good to. Thanks for cleaning up this old code.

Acked-by: Andi Kleen <[email protected]>

-Andi
--
[email protected] -- Speaking for myself only

Subject: Re: [PATCH v2 13/15] slab: use struct page for slab management

On Sat, 19 Oct 2013, JoonSoo Kim wrote:

> I search the history of struct page and find that the SLUB use mapping field
> in past (2007 year). At that time, you inserted VM_BUG_ON(PageSlab(page))
> ('b5fab14') into page_mapping() function to find remaining use. Recently,
> I never hear that this is triggered and 6 years have passed since inserting
> VM_BUG_ON(), so I guess there is no problem to use it.
> If this argument is reasonable, please give me an ACK :)
>
> Thanks.

Acked-by: Christoph Lameter <[email protected]>

2013-10-30 08:27:45

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH v2 13/15] slab: use struct page for slab management

Hello, Pekka.

There are two problems with this patch.

One is that this makes kmemleak warning WHEN CONFIG_DEBUG_KMEMLEAK,
because of false kmemleak_scan_area() call.

Another is about non-existing 'struct freelist'. It is not really
matter, since I just use a pointer to struct freelist. Therefore,
my compiler doesn't complain anything and generated code works fine :(

Following patch fixes these problems.
If you want an incremental patch against original patchset,
I can do it. Please let me know what you want.

Thanks.

Changes from v2 to v2-FIX.
1. remove kmemleak_scan_area callsite, since it is useless now.
2. %s/'struct freelist *'/'void *'


-----------------------------8<-------------------------------
>From 6d11304824a3b8c3bf7574323a3e55471cc26937 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <[email protected]>
Date: Wed, 28 Aug 2013 16:30:27 +0900
Subject: [PATCH v2-FIX 13/15] slab: use struct page for slab management

Now, there are a few field in struct slab, so we can overload these
over struct page. This will save some memory and reduce cache footprint.

After this change, slabp_cache and slab_size no longer related to
a struct slab, so rename them as freelist_cache and freelist_size.

These changes are just mechanical ones and there is no functional change.

Acked-by: Andi Kleen <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8b85d8c..4e17190 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -42,18 +42,22 @@ struct page {
/* First double word block */
unsigned long flags; /* Atomic flags, some possibly
* updated asynchronously */
- struct address_space *mapping; /* If low bit clear, points to
- * inode address_space, or NULL.
- * If page mapped as anonymous
- * memory, low bit is set, and
- * it points to anon_vma object:
- * see PAGE_MAPPING_ANON below.
- */
+ union {
+ struct address_space *mapping; /* If low bit clear, points to
+ * inode address_space, or NULL.
+ * If page mapped as anonymous
+ * memory, low bit is set, and
+ * it points to anon_vma object:
+ * see PAGE_MAPPING_ANON below.
+ */
+ void *s_mem; /* slab first object */
+ };
+
/* Second double word */
struct {
union {
pgoff_t index; /* Our offset within mapping. */
- void *freelist; /* slub/slob first free object */
+ void *freelist; /* sl[aou]b first free object */
bool pfmemalloc; /* If set by the page allocator,
* ALLOC_NO_WATERMARKS was set
* and the low watermark was not
@@ -109,6 +113,7 @@ struct page {
};
atomic_t _count; /* Usage count, see below. */
};
+ unsigned int active; /* SLAB */
};
};

diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index e9346b4..09bfffb 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -27,8 +27,8 @@ struct kmem_cache {

size_t colour; /* cache colouring range */
unsigned int colour_off; /* colour offset */
- struct kmem_cache *slabp_cache;
- unsigned int slab_size;
+ struct kmem_cache *freelist_cache;
+ unsigned int freelist_size;

/* constructor func */
void (*ctor)(void *obj);
diff --git a/mm/slab.c b/mm/slab.c
index 2ec2336..da01fc70 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -164,21 +164,6 @@
static bool pfmemalloc_active __read_mostly;

/*
- * struct slab
- *
- * Manages the objs in a slab. Placed either at the beginning of mem allocated
- * for a slab, or allocated from an general cache.
- * Slabs are chained into three list: fully used, partial, fully free slabs.
- */
-struct slab {
- struct {
- struct list_head list;
- void *s_mem; /* including colour offset */
- unsigned int active; /* num of objs active in slab */
- };
-};
-
-/*
* struct array_cache
*
* Purpose:
@@ -405,18 +390,10 @@ static inline struct kmem_cache *virt_to_cache(const void *obj)
return page->slab_cache;
}

-static inline struct slab *virt_to_slab(const void *obj)
-{
- struct page *page = virt_to_head_page(obj);
-
- VM_BUG_ON(!PageSlab(page));
- return page->slab_page;
-}
-
-static inline void *index_to_obj(struct kmem_cache *cache, struct slab *slab,
+static inline void *index_to_obj(struct kmem_cache *cache, struct page *page,
unsigned int idx)
{
- return slab->s_mem + cache->size * idx;
+ return page->s_mem + cache->size * idx;
}

/*
@@ -426,9 +403,9 @@ static inline void *index_to_obj(struct kmem_cache *cache, struct slab *slab,
* reciprocal_divide(offset, cache->reciprocal_buffer_size)
*/
static inline unsigned int obj_to_index(const struct kmem_cache *cache,
- const struct slab *slab, void *obj)
+ const struct page *page, void *obj)
{
- u32 offset = (obj - slab->s_mem);
+ u32 offset = (obj - page->s_mem);
return reciprocal_divide(offset, cache->reciprocal_buffer_size);
}

@@ -590,7 +567,7 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)

static size_t slab_mgmt_size(size_t nr_objs, size_t align)
{
- return ALIGN(sizeof(struct slab)+nr_objs*sizeof(unsigned int), align);
+ return ALIGN(nr_objs * sizeof(unsigned int), align);
}

/*
@@ -609,7 +586,6 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
* on it. For the latter case, the memory allocated for a
* slab is used for:
*
- * - The struct slab
* - One unsigned int for each object
* - Padding to respect alignment of @align
* - @buffer_size bytes for each object
@@ -632,8 +608,7 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
* into the memory allocation when taking the padding
* into account.
*/
- nr_objs = (slab_size - sizeof(struct slab)) /
- (buffer_size + sizeof(unsigned int));
+ nr_objs = (slab_size) / (buffer_size + sizeof(unsigned int));

/*
* This calculated number will be either the right
@@ -773,11 +748,11 @@ static struct array_cache *alloc_arraycache(int node, int entries,
return nc;
}

-static inline bool is_slab_pfmemalloc(struct slab *slabp)
+static inline bool is_slab_pfmemalloc(struct page *page)
{
- struct page *page = virt_to_page(slabp->s_mem);
+ struct page *mem_page = virt_to_page(page->s_mem);

- return PageSlabPfmemalloc(page);
+ return PageSlabPfmemalloc(mem_page);
}

/* Clears pfmemalloc_active if no slabs have pfmalloc set */
@@ -785,23 +760,23 @@ static void recheck_pfmemalloc_active(struct kmem_cache *cachep,
struct array_cache *ac)
{
struct kmem_cache_node *n = cachep->node[numa_mem_id()];
- struct slab *slabp;
+ struct page *page;
unsigned long flags;

if (!pfmemalloc_active)
return;

spin_lock_irqsave(&n->list_lock, flags);
- list_for_each_entry(slabp, &n->slabs_full, list)
- if (is_slab_pfmemalloc(slabp))
+ list_for_each_entry(page, &n->slabs_full, lru)
+ if (is_slab_pfmemalloc(page))
goto out;

- list_for_each_entry(slabp, &n->slabs_partial, list)
- if (is_slab_pfmemalloc(slabp))
+ list_for_each_entry(page, &n->slabs_partial, lru)
+ if (is_slab_pfmemalloc(page))
goto out;

- list_for_each_entry(slabp, &n->slabs_free, list)
- if (is_slab_pfmemalloc(slabp))
+ list_for_each_entry(page, &n->slabs_free, lru)
+ if (is_slab_pfmemalloc(page))
goto out;

pfmemalloc_active = false;
@@ -841,8 +816,8 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
*/
n = cachep->node[numa_mem_id()];
if (!list_empty(&n->slabs_free) && force_refill) {
- struct slab *slabp = virt_to_slab(objp);
- ClearPageSlabPfmemalloc(virt_to_head_page(slabp->s_mem));
+ struct page *page = virt_to_head_page(objp);
+ ClearPageSlabPfmemalloc(virt_to_head_page(page->s_mem));
clear_obj_pfmemalloc(&objp);
recheck_pfmemalloc_active(cachep, ac);
return objp;
@@ -874,9 +849,9 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
{
if (unlikely(pfmemalloc_active)) {
/* Some pfmemalloc slabs exist, check if this is one */
- struct slab *slabp = virt_to_slab(objp);
- struct page *page = virt_to_head_page(slabp->s_mem);
- if (PageSlabPfmemalloc(page))
+ struct page *page = virt_to_head_page(objp);
+ struct page *mem_page = virt_to_head_page(page->s_mem);
+ if (PageSlabPfmemalloc(mem_page))
set_obj_pfmemalloc(&objp);
}

@@ -1633,7 +1608,7 @@ static noinline void
slab_out_of_memory(struct kmem_cache *cachep, gfp_t gfpflags, int nodeid)
{
struct kmem_cache_node *n;
- struct slab *slabp;
+ struct page *page;
unsigned long flags;
int node;

@@ -1652,15 +1627,15 @@ slab_out_of_memory(struct kmem_cache *cachep, gfp_t gfpflags, int nodeid)
continue;

spin_lock_irqsave(&n->list_lock, flags);
- list_for_each_entry(slabp, &n->slabs_full, list) {
+ list_for_each_entry(page, &n->slabs_full, lru) {
active_objs += cachep->num;
active_slabs++;
}
- list_for_each_entry(slabp, &n->slabs_partial, list) {
- active_objs += slabp->active;
+ list_for_each_entry(page, &n->slabs_partial, lru) {
+ active_objs += page->active;
active_slabs++;
}
- list_for_each_entry(slabp, &n->slabs_free, list)
+ list_for_each_entry(page, &n->slabs_free, lru)
num_slabs++;

free_objects += n->free_objects;
@@ -1746,6 +1721,8 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
BUG_ON(!PageSlab(page));
__ClearPageSlabPfmemalloc(page);
__ClearPageSlab(page);
+ page_mapcount_reset(page);
+ page->mapping = NULL;

memcg_release_pages(cachep, cachep->gfporder);
if (current->reclaim_state)
@@ -1910,19 +1887,19 @@ static void check_poison_obj(struct kmem_cache *cachep, void *objp)
/* Print some data about the neighboring objects, if they
* exist:
*/
- struct slab *slabp = virt_to_slab(objp);
+ struct page *page = virt_to_head_page(objp);
unsigned int objnr;

- objnr = obj_to_index(cachep, slabp, objp);
+ objnr = obj_to_index(cachep, page, objp);
if (objnr) {
- objp = index_to_obj(cachep, slabp, objnr - 1);
+ objp = index_to_obj(cachep, page, objnr - 1);
realobj = (char *)objp + obj_offset(cachep);
printk(KERN_ERR "Prev obj: start=%p, len=%d\n",
realobj, size);
print_objinfo(cachep, objp, 2);
}
if (objnr + 1 < cachep->num) {
- objp = index_to_obj(cachep, slabp, objnr + 1);
+ objp = index_to_obj(cachep, page, objnr + 1);
realobj = (char *)objp + obj_offset(cachep);
printk(KERN_ERR "Next obj: start=%p, len=%d\n",
realobj, size);
@@ -1933,11 +1910,12 @@ static void check_poison_obj(struct kmem_cache *cachep, void *objp)
#endif

#if DEBUG
-static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slabp)
+static void slab_destroy_debugcheck(struct kmem_cache *cachep,
+ struct page *page)
{
int i;
for (i = 0; i < cachep->num; i++) {
- void *objp = index_to_obj(cachep, slabp, i);
+ void *objp = index_to_obj(cachep, page, i);

if (cachep->flags & SLAB_POISON) {
#ifdef CONFIG_DEBUG_PAGEALLOC
@@ -1962,7 +1940,8 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slab
}
}
#else
-static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slabp)
+static void slab_destroy_debugcheck(struct kmem_cache *cachep,
+ struct page *page)
{
}
#endif
@@ -1976,11 +1955,12 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slab
* Before calling the slab must have been unlinked from the cache. The
* cache-lock is not held/needed.
*/
-static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
+static void slab_destroy(struct kmem_cache *cachep, struct page *page)
{
- struct page *page = virt_to_head_page(slabp->s_mem);
+ void *freelist;

- slab_destroy_debugcheck(cachep, slabp);
+ freelist = page->freelist;
+ slab_destroy_debugcheck(cachep, page);
if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU)) {
struct rcu_head *head;

@@ -1998,11 +1978,11 @@ static void slab_destroy(struct kmem_cache *cachep, struct slab *slabp)
}

/*
- * From now on, we don't use slab management
+ * From now on, we don't use freelist
* although actual page can be freed in rcu context
*/
if (OFF_SLAB(cachep))
- kmem_cache_free(cachep->slabp_cache, slabp);
+ kmem_cache_free(cachep->freelist_cache, freelist);
}

/**
@@ -2039,7 +2019,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
* use off-slab slabs. Needed to avoid a possible
* looping condition in cache_grow().
*/
- offslab_limit = size - sizeof(struct slab);
+ offslab_limit = size;
offslab_limit /= sizeof(unsigned int);

if (num > offslab_limit)
@@ -2162,7 +2142,7 @@ static int __init_refok setup_cpu_cache(struct kmem_cache *cachep, gfp_t gfp)
int
__kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
{
- size_t left_over, slab_size, ralign;
+ size_t left_over, freelist_size, ralign;
gfp_t gfp;
int err;
size_t size = cachep->size;
@@ -2281,22 +2261,21 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
if (!cachep->num)
return -E2BIG;

- slab_size = ALIGN(cachep->num * sizeof(unsigned int)
- + sizeof(struct slab), cachep->align);
+ freelist_size =
+ ALIGN(cachep->num * sizeof(unsigned int), cachep->align);

/*
* If the slab has been placed off-slab, and we have enough space then
* move it on-slab. This is at the expense of any extra colouring.
*/
- if (flags & CFLGS_OFF_SLAB && left_over >= slab_size) {
+ if (flags & CFLGS_OFF_SLAB && left_over >= freelist_size) {
flags &= ~CFLGS_OFF_SLAB;
- left_over -= slab_size;
+ left_over -= freelist_size;
}

if (flags & CFLGS_OFF_SLAB) {
/* really off slab. No need for manual alignment */
- slab_size =
- cachep->num * sizeof(unsigned int) + sizeof(struct slab);
+ freelist_size = cachep->num * sizeof(unsigned int);

#ifdef CONFIG_PAGE_POISONING
/* If we're going to use the generic kernel_map_pages()
@@ -2313,7 +2292,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
if (cachep->colour_off < cachep->align)
cachep->colour_off = cachep->align;
cachep->colour = left_over / cachep->colour_off;
- cachep->slab_size = slab_size;
+ cachep->freelist_size = freelist_size;
cachep->flags = flags;
cachep->allocflags = __GFP_COMP;
if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA))
@@ -2322,7 +2301,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
cachep->reciprocal_buffer_size = reciprocal_value(size);

if (flags & CFLGS_OFF_SLAB) {
- cachep->slabp_cache = kmalloc_slab(slab_size, 0u);
+ cachep->freelist_cache = kmalloc_slab(freelist_size, 0u);
/*
* This is a possibility for one of the malloc_sizes caches.
* But since we go off slab only for object size greater than
@@ -2330,7 +2309,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
* this should not happen at all.
* But leave a BUG_ON for some lucky dude.
*/
- BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));
+ BUG_ON(ZERO_OR_NULL_PTR(cachep->freelist_cache));
}

err = setup_cpu_cache(cachep, gfp);
@@ -2436,7 +2415,7 @@ static int drain_freelist(struct kmem_cache *cache,
{
struct list_head *p;
int nr_freed;
- struct slab *slabp;
+ struct page *page;

nr_freed = 0;
while (nr_freed < tofree && !list_empty(&n->slabs_free)) {
@@ -2448,18 +2427,18 @@ static int drain_freelist(struct kmem_cache *cache,
goto out;
}

- slabp = list_entry(p, struct slab, list);
+ page = list_entry(p, struct page, lru);
#if DEBUG
- BUG_ON(slabp->active);
+ BUG_ON(page->active);
#endif
- list_del(&slabp->list);
+ list_del(&page->lru);
/*
* Safe to drop the lock. The slab is no longer linked
* to the cache.
*/
n->free_objects -= cache->num;
spin_unlock_irq(&n->list_lock);
- slab_destroy(cache, slabp);
+ slab_destroy(cache, page);
nr_freed++;
}
out:
@@ -2542,50 +2521,42 @@ int __kmem_cache_shutdown(struct kmem_cache *cachep)
* descriptors in kmem_cache_create, we search through the malloc_sizes array.
* If we are creating a malloc_sizes cache here it would not be visible to
* kmem_find_general_cachep till the initialization is complete.
- * Hence we cannot have slabp_cache same as the original cache.
+ * Hence we cannot have freelist_cache same as the original cache.
*/
-static struct slab *alloc_slabmgmt(struct kmem_cache *cachep,
+static void *alloc_slabmgmt(struct kmem_cache *cachep,
struct page *page, int colour_off,
gfp_t local_flags, int nodeid)
{
- struct slab *slabp;
+ void *freelist;
void *addr = page_address(page);

if (OFF_SLAB(cachep)) {
/* Slab management obj is off-slab. */
- slabp = kmem_cache_alloc_node(cachep->slabp_cache,
+ freelist = kmem_cache_alloc_node(cachep->freelist_cache,
local_flags, nodeid);
- /*
- * If the first object in the slab is leaked (it's allocated
- * but no one has a reference to it), we want to make sure
- * kmemleak does not treat the ->s_mem pointer as a reference
- * to the object. Otherwise we will not report the leak.
- */
- kmemleak_scan_area(&slabp->list, sizeof(struct list_head),
- local_flags);
- if (!slabp)
+ if (!freelist)
return NULL;
} else {
- slabp = addr + colour_off;
- colour_off += cachep->slab_size;
+ freelist = addr + colour_off;
+ colour_off += cachep->freelist_size;
}
- slabp->active = 0;
- slabp->s_mem = addr + colour_off;
- return slabp;
+ page->active = 0;
+ page->s_mem = addr + colour_off;
+ return freelist;
}

-static inline unsigned int *slab_bufctl(struct slab *slabp)
+static inline unsigned int *slab_bufctl(struct page *page)
{
- return (unsigned int *) (slabp + 1);
+ return (unsigned int *)(page->freelist);
}

static void cache_init_objs(struct kmem_cache *cachep,
- struct slab *slabp)
+ struct page *page)
{
int i;

for (i = 0; i < cachep->num; i++) {
- void *objp = index_to_obj(cachep, slabp, i);
+ void *objp = index_to_obj(cachep, page, i);
#if DEBUG
/* need to poison the objs? */
if (cachep->flags & SLAB_POISON)
@@ -2621,7 +2592,7 @@ static void cache_init_objs(struct kmem_cache *cachep,
if (cachep->ctor)
cachep->ctor(objp);
#endif
- slab_bufctl(slabp)[i] = i;
+ slab_bufctl(page)[i] = i;
}
}

@@ -2635,13 +2606,13 @@ static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
}
}

-static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
+static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
int nodeid)
{
void *objp;

- objp = index_to_obj(cachep, slabp, slab_bufctl(slabp)[slabp->active]);
- slabp->active++;
+ objp = index_to_obj(cachep, page, slab_bufctl(page)[page->active]);
+ page->active++;
#if DEBUG
WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
#endif
@@ -2649,10 +2620,10 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp,
return objp;
}

-static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
+static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
void *objp, int nodeid)
{
- unsigned int objnr = obj_to_index(cachep, slabp, objp);
+ unsigned int objnr = obj_to_index(cachep, page, objp);
#if DEBUG
unsigned int i;

@@ -2660,16 +2631,16 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);

/* Verify double free bug */
- for (i = slabp->active; i < cachep->num; i++) {
- if (slab_bufctl(slabp)[i] == objnr) {
+ for (i = page->active; i < cachep->num; i++) {
+ if (slab_bufctl(page)[i] == objnr) {
printk(KERN_ERR "slab: double free detected in cache "
"'%s', objp %p\n", cachep->name, objp);
BUG();
}
}
#endif
- slabp->active--;
- slab_bufctl(slabp)[slabp->active] = objnr;
+ page->active--;
+ slab_bufctl(page)[page->active] = objnr;
}

/*
@@ -2677,11 +2648,11 @@ static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
* for the slab allocator to be able to lookup the cache and slab of a
* virtual address for kfree, ksize, and slab debugging.
*/
-static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
- struct page *page)
+static void slab_map_pages(struct kmem_cache *cache, struct page *page,
+ void *freelist)
{
page->slab_cache = cache;
- page->slab_page = slab;
+ page->freelist = freelist;
}

/*
@@ -2691,7 +2662,7 @@ static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
static int cache_grow(struct kmem_cache *cachep,
gfp_t flags, int nodeid, struct page *page)
{
- struct slab *slabp;
+ void *freelist;
size_t offset;
gfp_t local_flags;
struct kmem_cache_node *n;
@@ -2738,14 +2709,14 @@ static int cache_grow(struct kmem_cache *cachep,
goto failed;

/* Get slab management. */
- slabp = alloc_slabmgmt(cachep, page, offset,
+ freelist = alloc_slabmgmt(cachep, page, offset,
local_flags & ~GFP_CONSTRAINT_MASK, nodeid);
- if (!slabp)
+ if (!freelist)
goto opps1;

- slab_map_pages(cachep, slabp, page);
+ slab_map_pages(cachep, page, freelist);

- cache_init_objs(cachep, slabp);
+ cache_init_objs(cachep, page);

if (local_flags & __GFP_WAIT)
local_irq_disable();
@@ -2753,7 +2724,7 @@ static int cache_grow(struct kmem_cache *cachep,
spin_lock(&n->list_lock);

/* Make slab active. */
- list_add_tail(&slabp->list, &(n->slabs_free));
+ list_add_tail(&page->lru, &(n->slabs_free));
STATS_INC_GROWN(cachep);
n->free_objects += cachep->num;
spin_unlock(&n->list_lock);
@@ -2808,13 +2779,13 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
unsigned long caller)
{
unsigned int objnr;
- struct slab *slabp;
+ struct page *page;

BUG_ON(virt_to_cache(objp) != cachep);

objp -= obj_offset(cachep);
kfree_debugcheck(objp);
- slabp = virt_to_slab(objp);
+ page = virt_to_head_page(objp);

if (cachep->flags & SLAB_RED_ZONE) {
verify_redzone_free(cachep, objp);
@@ -2824,10 +2795,10 @@ static void *cache_free_debugcheck(struct kmem_cache *cachep, void *objp,
if (cachep->flags & SLAB_STORE_USER)
*dbg_userword(cachep, objp) = (void *)caller;

- objnr = obj_to_index(cachep, slabp, objp);
+ objnr = obj_to_index(cachep, page, objp);

BUG_ON(objnr >= cachep->num);
- BUG_ON(objp != index_to_obj(cachep, slabp, objnr));
+ BUG_ON(objp != index_to_obj(cachep, page, objnr));

if (cachep->flags & SLAB_POISON) {
#ifdef CONFIG_DEBUG_PAGEALLOC
@@ -2886,7 +2857,7 @@ retry:

while (batchcount > 0) {
struct list_head *entry;
- struct slab *slabp;
+ struct page *page;
/* Get slab alloc is to come from. */
entry = n->slabs_partial.next;
if (entry == &n->slabs_partial) {
@@ -2896,7 +2867,7 @@ retry:
goto must_grow;
}

- slabp = list_entry(entry, struct slab, list);
+ page = list_entry(entry, struct page, lru);
check_spinlock_acquired(cachep);

/*
@@ -2904,23 +2875,23 @@ retry:
* there must be at least one object available for
* allocation.
*/
- BUG_ON(slabp->active >= cachep->num);
+ BUG_ON(page->active >= cachep->num);

- while (slabp->active < cachep->num && batchcount--) {
+ while (page->active < cachep->num && batchcount--) {
STATS_INC_ALLOCED(cachep);
STATS_INC_ACTIVE(cachep);
STATS_SET_HIGH(cachep);

- ac_put_obj(cachep, ac, slab_get_obj(cachep, slabp,
+ ac_put_obj(cachep, ac, slab_get_obj(cachep, page,
node));
}

/* move slabp to correct slabp list: */
- list_del(&slabp->list);
- if (slabp->active == cachep->num)
- list_add(&slabp->list, &n->slabs_full);
+ list_del(&page->lru);
+ if (page->active == cachep->num)
+ list_add(&page->list, &n->slabs_full);
else
- list_add(&slabp->list, &n->slabs_partial);
+ list_add(&page->list, &n->slabs_partial);
}

must_grow:
@@ -3175,7 +3146,7 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
int nodeid)
{
struct list_head *entry;
- struct slab *slabp;
+ struct page *page;
struct kmem_cache_node *n;
void *obj;
int x;
@@ -3195,24 +3166,24 @@ retry:
goto must_grow;
}

- slabp = list_entry(entry, struct slab, list);
+ page = list_entry(entry, struct page, lru);
check_spinlock_acquired_node(cachep, nodeid);

STATS_INC_NODEALLOCS(cachep);
STATS_INC_ACTIVE(cachep);
STATS_SET_HIGH(cachep);

- BUG_ON(slabp->active == cachep->num);
+ BUG_ON(page->active == cachep->num);

- obj = slab_get_obj(cachep, slabp, nodeid);
+ obj = slab_get_obj(cachep, page, nodeid);
n->free_objects--;
/* move slabp to correct slabp list: */
- list_del(&slabp->list);
+ list_del(&page->lru);

- if (slabp->active == cachep->num)
- list_add(&slabp->list, &n->slabs_full);
+ if (page->active == cachep->num)
+ list_add(&page->lru, &n->slabs_full);
else
- list_add(&slabp->list, &n->slabs_partial);
+ list_add(&page->lru, &n->slabs_partial);

spin_unlock(&n->list_lock);
goto done;
@@ -3362,21 +3333,21 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,

for (i = 0; i < nr_objects; i++) {
void *objp;
- struct slab *slabp;
+ struct page *page;

clear_obj_pfmemalloc(&objpp[i]);
objp = objpp[i];

- slabp = virt_to_slab(objp);
+ page = virt_to_head_page(objp);
n = cachep->node[node];
- list_del(&slabp->list);
+ list_del(&page->lru);
check_spinlock_acquired_node(cachep, node);
- slab_put_obj(cachep, slabp, objp, node);
+ slab_put_obj(cachep, page, objp, node);
STATS_DEC_ACTIVE(cachep);
n->free_objects++;

/* fixup slab chains */
- if (slabp->active == 0) {
+ if (page->active == 0) {
if (n->free_objects > n->free_limit) {
n->free_objects -= cachep->num;
/* No need to drop any previously held
@@ -3385,16 +3356,16 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
* a different cache, refer to comments before
* alloc_slabmgmt.
*/
- slab_destroy(cachep, slabp);
+ slab_destroy(cachep, page);
} else {
- list_add(&slabp->list, &n->slabs_free);
+ list_add(&page->lru, &n->slabs_free);
}
} else {
/* Unconditionally move a slab to the end of the
* partial list on free - maximum time for the
* other objects to be freed, too.
*/
- list_add_tail(&slabp->list, &n->slabs_partial);
+ list_add_tail(&page->lru, &n->slabs_partial);
}
}
}
@@ -3434,10 +3405,10 @@ free_done:

p = n->slabs_free.next;
while (p != &(n->slabs_free)) {
- struct slab *slabp;
+ struct page *page;

- slabp = list_entry(p, struct slab, list);
- BUG_ON(slabp->active);
+ page = list_entry(p, struct page, lru);
+ BUG_ON(page->active);

i++;
p = p->next;
@@ -4041,7 +4012,7 @@ out:
#ifdef CONFIG_SLABINFO
void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
{
- struct slab *slabp;
+ struct page *page;
unsigned long active_objs;
unsigned long num_objs;
unsigned long active_slabs = 0;
@@ -4061,22 +4032,22 @@ void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
check_irq_on();
spin_lock_irq(&n->list_lock);

- list_for_each_entry(slabp, &n->slabs_full, list) {
- if (slabp->active != cachep->num && !error)
+ list_for_each_entry(page, &n->slabs_full, lru) {
+ if (page->active != cachep->num && !error)
error = "slabs_full accounting error";
active_objs += cachep->num;
active_slabs++;
}
- list_for_each_entry(slabp, &n->slabs_partial, list) {
- if (slabp->active == cachep->num && !error)
+ list_for_each_entry(page, &n->slabs_partial, lru) {
+ if (page->active == cachep->num && !error)
error = "slabs_partial accounting error";
- if (!slabp->active && !error)
+ if (!page->active && !error)
error = "slabs_partial accounting error";
- active_objs += slabp->active;
+ active_objs += page->active;
active_slabs++;
}
- list_for_each_entry(slabp, &n->slabs_free, list) {
- if (slabp->active && !error)
+ list_for_each_entry(page, &n->slabs_free, lru) {
+ if (page->active && !error)
error = "slabs_free accounting error";
num_slabs++;
}
@@ -4229,19 +4200,20 @@ static inline int add_caller(unsigned long *n, unsigned long v)
return 1;
}

-static void handle_slab(unsigned long *n, struct kmem_cache *c, struct slab *s)
+static void handle_slab(unsigned long *n, struct kmem_cache *c,
+ struct page *page)
{
void *p;
int i, j;

if (n[0] == n[1])
return;
- for (i = 0, p = s->s_mem; i < c->num; i++, p += c->size) {
+ for (i = 0, p = page->s_mem; i < c->num; i++, p += c->size) {
bool active = true;

- for (j = s->active; j < c->num; j++) {
+ for (j = page->active; j < c->num; j++) {
/* Skip freed item */
- if (slab_bufctl(s)[j] == i) {
+ if (slab_bufctl(page)[j] == i) {
active = false;
break;
}
@@ -4273,7 +4245,7 @@ static void show_symbol(struct seq_file *m, unsigned long address)
static int leaks_show(struct seq_file *m, void *p)
{
struct kmem_cache *cachep = list_entry(p, struct kmem_cache, list);
- struct slab *slabp;
+ struct page *page;
struct kmem_cache_node *n;
const char *name;
unsigned long *x = m->private;
@@ -4297,10 +4269,10 @@ static int leaks_show(struct seq_file *m, void *p)
check_irq_on();
spin_lock_irq(&n->list_lock);

- list_for_each_entry(slabp, &n->slabs_full, list)
- handle_slab(x, cachep, slabp);
- list_for_each_entry(slabp, &n->slabs_partial, list)
- handle_slab(x, cachep, slabp);
+ list_for_each_entry(page, &n->slabs_full, lru)
+ handle_slab(x, cachep, page);
+ list_for_each_entry(page, &n->slabs_partial, lru)
+ handle_slab(x, cachep, page);
spin_unlock_irq(&n->list_lock);
}
name = cachep->name;
--
1.7.9.5

2013-10-30 08:42:21

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH v2 13/15] slab: use struct page for slab management

On 10/30/2013 10:28 AM, Joonsoo Kim wrote:
> If you want an incremental patch against original patchset,
> I can do it. Please let me know what you want.

Yes, please. Incremental is much easier to deal with if we want this to
end up in v3.13.

Pekka

2013-10-30 10:03:50

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 16/15] slab: fix to calm down kmemleak warning

After using struct page as slab management, we should not call
kmemleak_scan_area(), since struct page isn't the tracking object of
kmemleak. Without this patch and if CONFIG_DEBUG_KMEMLEAK is enabled,
so many kmemleak warnings are printed.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index af2db76..a8a9349 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2531,14 +2531,6 @@ static struct freelist *alloc_slabmgmt(struct kmem_cache *cachep,
/* Slab management obj is off-slab. */
freelist = kmem_cache_alloc_node(cachep->freelist_cache,
local_flags, nodeid);
- /*
- * If the first object in the slab is leaked (it's allocated
- * but no one has a reference to it), we want to make sure
- * kmemleak does not treat the ->s_mem pointer as a reference
- * to the object. Otherwise we will not report the leak.
- */
- kmemleak_scan_area(&page->lru, sizeof(struct list_head),
- local_flags);
if (!freelist)
return NULL;
} else {
--
1.7.9.5

2013-10-30 10:03:58

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 17/15] slab: replace non-existing 'struct freelist *' with 'void *'

There is no 'strcut freelist', but codes use pointer to 'struct freelist'.
Although compiler doesn't complain anything about this wrong usage and
codes work fine, but fixing it is better.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/slab.c b/mm/slab.c
index a8a9349..a983e30 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1954,7 +1954,7 @@ static void slab_destroy_debugcheck(struct kmem_cache *cachep,
*/
static void slab_destroy(struct kmem_cache *cachep, struct page *page)
{
- struct freelist *freelist;
+ void *freelist;

freelist = page->freelist;
slab_destroy_debugcheck(cachep, page);
@@ -2520,11 +2520,11 @@ int __kmem_cache_shutdown(struct kmem_cache *cachep)
* kmem_find_general_cachep till the initialization is complete.
* Hence we cannot have freelist_cache same as the original cache.
*/
-static struct freelist *alloc_slabmgmt(struct kmem_cache *cachep,
+static void *alloc_slabmgmt(struct kmem_cache *cachep,
struct page *page, int colour_off,
gfp_t local_flags, int nodeid)
{
- struct freelist *freelist;
+ void *freelist;
void *addr = page_address(page);

if (OFF_SLAB(cachep)) {
@@ -2646,7 +2646,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
* virtual address for kfree, ksize, and slab debugging.
*/
static void slab_map_pages(struct kmem_cache *cache, struct page *page,
- struct freelist *freelist)
+ void *freelist)
{
page->slab_cache = cache;
page->freelist = freelist;
@@ -2659,7 +2659,7 @@ static void slab_map_pages(struct kmem_cache *cache, struct page *page,
static int cache_grow(struct kmem_cache *cachep,
gfp_t flags, int nodeid, struct page *page)
{
- struct freelist *freelist;
+ void *freelist;
size_t offset;
gfp_t local_flags;
struct kmem_cache_node *n;
--
1.7.9.5

2013-10-30 10:05:55

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH v2 13/15] slab: use struct page for slab management

On Wed, Oct 30, 2013 at 10:42:14AM +0200, Pekka Enberg wrote:
> On 10/30/2013 10:28 AM, Joonsoo Kim wrote:
> >If you want an incremental patch against original patchset,
> >I can do it. Please let me know what you want.
>
> Yes, please. Incremental is much easier to deal with if we want this
> to end up in v3.13.

Okay. I just sent them.

Thanks.