This patch set refactors some code in slub.
Two patches were submitted a week ago but received neither an ack nor a nack
from the slub maintainer, so I am re-sending them.
https://lkml.org/lkml/2012/5/10/273
https://lkml.org/lkml/2012/5/10/275
Among these, the page-flag patch is a very simple change.
The last one is the main target of this patch set.
It depends on 'slub: change cmpxchg_double_slab in
unfreeze_partials to __cmpxchg_double_slab', so I am sending them all at once.
Joonsoo Kim (4):
slub: change cmpxchg_double_slab in get_freelist() to
__cmpxchg_double_slab
slub: change cmpxchg_double_slab in unfreeze_partials to
__cmpxchg_double_slab
slub: use __SetPageSlab function to set PG_slab flag
slub: refactoring unfreeze_partials()
mm/slub.c | 54 +++++++++++++++++-------------------------------------
1 file changed, 17 insertions(+), 37 deletions(-)
--
1.7.9.5
get_freelist() is only called by __slab_alloc() with interrupts disabled,
so __cmpxchg_double_slab() is suitable.
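For reference, the practical difference between the two helpers is in the
slow (non-cmpxchg16b) fallback path: cmpxchg_double_slab() saves and restores
the interrupt flags around the slab bit lock, while __cmpxchg_double_slab()
asserts that the caller already runs with interrupts off. A simplified sketch
of that fallback path (illustrative only, not the exact mm/slub.c source; the
real helpers also try a lockless cmpxchg_double() first when the hardware
supports it, and the sketch_ names are hypothetical):

/*
 * Simplified sketch, not the exact mm/slub.c source: both helpers
 * compare page->freelist/page->counters against the expected values
 * and install the new ones on a match.  They differ only in who is
 * responsible for disabling interrupts around the bit spinlock.
 */
static inline bool sketch___cmpxchg_double_slab(struct page *page,
		void *freelist_old, unsigned long counters_old,
		void *freelist_new, unsigned long counters_new)
{
	VM_BUG_ON(!irqs_disabled());	/* caller must run irqs-off */

	slab_lock(page);
	if (page->freelist == freelist_old &&
	    page->counters == counters_old) {
		page->freelist = freelist_new;
		page->counters = counters_new;
		slab_unlock(page);
		return true;
	}
	slab_unlock(page);
	return false;
}

static inline bool sketch_cmpxchg_double_slab(struct page *page,
		void *freelist_old, unsigned long counters_old,
		void *freelist_new, unsigned long counters_new)
{
	unsigned long flags;
	bool ret;

	local_irq_save(flags);		/* redundant when the caller... */
	ret = sketch___cmpxchg_double_slab(page, freelist_old,
			counters_old, freelist_new, counters_new);
	local_irq_restore(flags);	/* ...already runs irqs-off */
	return ret;
}

Since __slab_alloc() already runs with interrupts disabled, the extra
save/restore in the non-__ variant is pure overhead here.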
Acked-by: David Rientjes <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
diff --git a/mm/slub.c b/mm/slub.c
index 0c3105c..d28bc45 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2179,7 +2179,7 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
new.inuse = page->objects;
new.frozen = freelist != NULL;
- } while (!cmpxchg_double_slab(s, page,
+ } while (!__cmpxchg_double_slab(s, page,
freelist, counters,
NULL, new.counters,
"get_freelist"));
--
1.7.9.5
unfreeze_partials() is only called with interrupts disabled,
so __cmpxchg_double_slab() is suitable.
Signed-off-by: Joonsoo Kim <[email protected]>
diff --git a/mm/slub.c b/mm/slub.c
index d28bc45..c38efce 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1935,7 +1935,7 @@ static void unfreeze_partials(struct kmem_cache *s)
l = m;
}
- } while (!cmpxchg_double_slab(s, page,
+ } while (!__cmpxchg_double_slab(s, page,
old.freelist, old.counters,
new.freelist, new.counters,
"unfreezing slab"));
--
1.7.9.5
To set a page flag, using SetPageXXXX() or __SetPageXXXX() is more
understandable and maintainable, so change it.
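For context, __SetPageSlab() is generated by the __SETPAGEFLAG() macro in
include/linux/page-flags.h and reduces to a non-atomic bit set, so it is
equivalent to the open-coded OR it replaces. A simplified sketch of what the
macro expands to (hedged paraphrase, not the verbatim header):

/*
 * Simplified from include/linux/page-flags.h: __SETPAGEFLAG(Slab, slab)
 * generates the non-atomic setter below, so __SetPageSlab(page) does
 * the same thing as the open-coded "page->flags |= 1 << PG_slab".
 * (The non-__ SetPageSlab() variant would use the atomic set_bit().)
 */
static inline void __SetPageSlab(struct page *page)
{
	__set_bit(PG_slab, &page->flags);	/* non-atomic RMW */
}

The non-atomic variant is fine here because new_slab() still owns the freshly
allocated page exclusively.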
Signed-off-by: Joonsoo Kim <[email protected]>
diff --git a/mm/slub.c b/mm/slub.c
index c38efce..69342fd 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1369,7 +1369,7 @@ static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
inc_slabs_node(s, page_to_nid(page), page->objects);
page->slab = s;
- page->flags |= 1 << PG_slab;
+ __SetPageSlab(page);
start = page_address(page);
--
1.7.9.5
The current implementation of unfreeze_partials() is complicated, but the
benefit from it is insignificant. In addition, the large amount of code in
the do {} while loop raises the failure rate of cmpxchg_double_slab.
The current implementation, which tests the status of the cpu partial slab
and acquires list_lock inside the do {} while loop, lets us skip acquiring
list_lock when the front of the cpu partial slab is to be discarded,
but that is a rare case.
And when add_partial has been performed and cmpxchg_double_slab then fails,
remove_partial must be called to undo it.
I think these are disadvantages of the current implementation,
so I refactored unfreeze_partials().
Minimizing the code in the do {} while loop reduces the failure rate
of cmpxchg_double_slab. Below is the output of 'slabinfo -r kmalloc-256'
when './perf stat -r 33 hackbench 50 process 4000 > /dev/null' is run.
** before **
Cmpxchg_double Looping
------------------------
Locked Cmpxchg Double redos 182685
Unlocked Cmpxchg Double redos 0
** after **
Cmpxchg_double Looping
------------------------
Locked Cmpxchg Double redos 177995
Unlocked Cmpxchg Double redos 1
We can see that the cmpxchg_double_slab failure rate improves slightly.
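The underlying effect can be illustrated outside the kernel as well. Below is
a minimal standalone C demo (illustrative only, nothing to do with slub
itself; file and symbol names are hypothetical): two threads increment a
shared counter with a cmpxchg retry loop, and the thread that does extra work
between the load and the compare-and-swap redos far more often.

/* cas_retry_demo.c - illustrative userspace sketch, not kernel code.
 * Build: cc -std=c11 -pthread cas_retry_demo.c -o cas_retry_demo
 * Thread 1 widens the load-to-CAS window with busy work, analogous to
 * doing list manipulation inside the do {} while cmpxchg loop, and so
 * sees many more "redos" than thread 0. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned long counter;
static unsigned long redos[2];

static void *worker(void *arg)
{
	int id = *(int *)arg;

	for (int i = 0; i < 1000000; i++) {
		for (;;) {
			unsigned long old = atomic_load(&counter);

			if (id == 1)	/* extra work inside the window */
				for (volatile int spin = 0; spin < 50; spin++)
					;
			if (atomic_compare_exchange_strong(&counter,
							   &old, old + 1))
				break;
			redos[id]++;	/* CAS failed, loop and retry */
		}
	}
	return NULL;
}

int main(void)
{
	pthread_t t[2];
	int ids[2] = { 0, 1 };

	for (int i = 0; i < 2; i++)
		pthread_create(&t[i], NULL, worker, &ids[i]);
	for (int i = 0; i < 2; i++)
		pthread_join(t[i], NULL);

	printf("small-window redos: %lu, wide-window redos: %lu\n",
	       redos[0], redos[1]);
	return 0;
}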
Below is the output of './perf stat -r 30 hackbench 50 process 4000 > /dev/null'.
** before **
Performance counter stats for './hackbench 50 process 4000' (30 runs):
108517.190463 task-clock # 7.926 CPUs utilized ( +- 0.24% )
2,919,550 context-switches # 0.027 M/sec ( +- 3.07% )
100,774 CPU-migrations # 0.929 K/sec ( +- 4.72% )
124,201 page-faults # 0.001 M/sec ( +- 0.15% )
401,500,234,387 cycles # 3.700 GHz ( +- 0.24% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
250,576,913,354 instructions # 0.62 insns per cycle ( +- 0.13% )
45,934,956,860 branches # 423.297 M/sec ( +- 0.14% )
188,219,787 branch-misses # 0.41% of all branches ( +- 0.56% )
13.691837307 seconds time elapsed ( +- 0.24% )
** after **
Performance counter stats for './hackbench 50 process 4000' (30 runs):
107784.479767 task-clock # 7.928 CPUs utilized ( +- 0.22% )
2,834,781 context-switches # 0.026 M/sec ( +- 2.33% )
93,083 CPU-migrations # 0.864 K/sec ( +- 3.45% )
123,967 page-faults # 0.001 M/sec ( +- 0.15% )
398,781,421,836 cycles # 3.700 GHz ( +- 0.22% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
250,189,160,419 instructions # 0.63 insns per cycle ( +- 0.09% )
45,855,370,128 branches # 425.436 M/sec ( +- 0.10% )
169,881,248 branch-misses # 0.37% of all branches ( +- 0.43% )
13.596272341 seconds time elapsed ( +- 0.22% )
No regression is found; rather, we see a slightly better result.
Signed-off-by: Joonsoo Kim <[email protected]>
diff --git a/mm/slub.c b/mm/slub.c
index 69342fd..eebb6d0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1882,18 +1882,24 @@ redo:
/* Unfreeze all the cpu partial slabs */
static void unfreeze_partials(struct kmem_cache *s)
{
- struct kmem_cache_node *n = NULL;
+ struct kmem_cache_node *n = NULL, *n2 = NULL;
struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
struct page *page, *discard_page = NULL;
while ((page = c->partial)) {
- enum slab_modes { M_PARTIAL, M_FREE };
- enum slab_modes l, m;
struct page new;
struct page old;
c->partial = page->next;
- l = M_FREE;
+
+ n2 = get_node(s, page_to_nid(page));
+ if (n != n2) {
+ if (n)
+ spin_unlock(&n->list_lock);
+
+ n = n2;
+ spin_lock(&n->list_lock);
+ }
do {
@@ -1906,43 +1912,17 @@ static void unfreeze_partials(struct kmem_cache *s)
new.frozen = 0;
- if (!new.inuse && (!n || n->nr_partial > s->min_partial))
- m = M_FREE;
- else {
- struct kmem_cache_node *n2 = get_node(s,
- page_to_nid(page));
-
- m = M_PARTIAL;
- if (n != n2) {
- if (n)
- spin_unlock(&n->list_lock);
-
- n = n2;
- spin_lock(&n->list_lock);
- }
- }
-
- if (l != m) {
- if (l == M_PARTIAL) {
- remove_partial(n, page);
- stat(s, FREE_REMOVE_PARTIAL);
- } else {
- add_partial(n, page,
- DEACTIVATE_TO_TAIL);
- stat(s, FREE_ADD_PARTIAL);
- }
-
- l = m;
- }
-
} while (!__cmpxchg_double_slab(s, page,
old.freelist, old.counters,
new.freelist, new.counters,
"unfreezing slab"));
- if (m == M_FREE) {
+ if (unlikely(!new.inuse && n->nr_partial > s->min_partial)) {
page->next = discard_page;
discard_page = page;
+ } else {
+ add_partial(n, page, DEACTIVATE_TO_TAIL);
+ stat(s, FREE_ADD_PARTIAL);
}
}
--
1.7.9.5
On Fri, 18 May 2012, Joonsoo Kim wrote:
> To set a page flag, using SetPageXXXX() or __SetPageXXXX() is more
> understandable and maintainable, so change it.
Acked-by: Christoph Lameter <[email protected]>
On Fri, 18 May 2012, Joonsoo Kim wrote:
> Two patches were submitted a week ago but received neither an ack nor a nack
> from the slub maintainer, so I am re-sending them.
Could you combine the first two patches into one? They do the same thing.
2012/5/18 Christoph Lameter <[email protected]>:
> On Fri, 18 May 2012, Joonsoo Kim wrote:
>
>> Two patches were submitted a week ago but received neither an ack nor a nack
>> from the slub maintainer, so I am re-sending them.
>
> Could you combine the first two patches into one? They do the same thing.
>
Of course. I will re-send a combined patch soon.
Thank you.
get_freelist() is only called by __slab_alloc() with interrupts disabled,
so __cmpxchg_double_slab() is suitable.
unfreeze_partials() is only called with interrupts disabled,
so __cmpxchg_double_slab() is suitable.
Signed-off-by: Joonsoo Kim <[email protected]>
diff --git a/mm/slub.c b/mm/slub.c
index 0c3105c..c38efce 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1935,7 +1935,7 @@ static void unfreeze_partials(struct kmem_cache *s)
l = m;
}
- } while (!cmpxchg_double_slab(s, page,
+ } while (!__cmpxchg_double_slab(s, page,
old.freelist, old.counters,
new.freelist, new.counters,
"unfreezing slab"));
@@ -2179,7 +2179,7 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
new.inuse = page->objects;
new.frozen = freelist != NULL;
- } while (!cmpxchg_double_slab(s, page,
+ } while (!__cmpxchg_double_slab(s, page,
freelist, counters,
NULL, new.counters,
"get_freelist"));
--
1.7.9.5
On Fri, 18 May 2012, Joonsoo Kim wrote:
> get_freelist() is only called by __slab_alloc() with interrupts disabled,
> so __cmpxchg_double_slab() is suitable.
>
> unfreeze_partials() is only called with interrupts disabled,
> so __cmpxchg_double_slab() is suitable.
Combine these sentences as well.
Acked-by: Christoph Lameter <[email protected]>
get_freelist() and unfreeze_partials() are only called with interrupts disabled,
so __cmpxchg_double_slab() is suitable.
Acked-by: Christoph Lameter <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
diff --git a/mm/slub.c b/mm/slub.c
index 0c3105c..c38efce 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1935,7 +1935,7 @@ static void unfreeze_partials(struct kmem_cache *s)
l = m;
}
- } while (!cmpxchg_double_slab(s, page,
+ } while (!__cmpxchg_double_slab(s, page,
old.freelist, old.counters,
new.freelist, new.counters,
"unfreezing slab"));
@@ -2179,7 +2179,7 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
new.inuse = page->objects;
new.frozen = freelist != NULL;
- } while (!cmpxchg_double_slab(s, page,
+ } while (!__cmpxchg_double_slab(s, page,
freelist, counters,
NULL, new.counters,
"get_freelist"));
--
1.7.9.5
On Fri, 18 May 2012, Joonsoo Kim wrote:
> I think these are disadvantages of the current implementation,
> so I refactored unfreeze_partials().
The reason the current implementation is so complex is to avoid races. The
state of the list and the state of the partial pages must be consistent at
all times.
> Minimizing the code in the do {} while loop reduces the failure rate
> of cmpxchg_double_slab. Below is the output of 'slabinfo -r kmalloc-256'
> when './perf stat -r 33 hackbench 50 process 4000 > /dev/null' is run.
Looks good. If I can convince myself that this does not open up any
new races then I may ack it.
On Thu, 17 May 2012, Christoph Lameter wrote:
> On Fri, 18 May 2012, Joonsoo Kim wrote:
>
> > get_freelist() is only called by __slab_alloc() with interrupts disabled,
> > so __cmpxchg_double_slab() is suitable.
> >
> > unfreeze_partials() is only called with interrupts disabled,
> > so __cmpxchg_double_slab() is suitable.
>
> Combine these sentences as well.
>
> Acked-by: Christoph Lameter <[email protected]>
You should add a comment on top of get_freelist() and unfreeze_partials()
that they now *require* interrupts to be disabled.
On Fri, 18 May 2012, Joonsoo Kim wrote:
> To set a page flag, using SetPageXXXX() or __SetPageXXXX() is more
> understandable and maintainable, so change it.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
>
> diff --git a/mm/slub.c b/mm/slub.c
> index c38efce..69342fd 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1369,7 +1369,7 @@ static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
>
> inc_slabs_node(s, page_to_nid(page), page->objects);
> page->slab = s;
> - page->flags |= 1 << PG_slab;
> + __SetPageSlab(page);
>
> start = page_address(page);
Applied
get_freelist() and unfreeze_partials() are only called with interrupts disabled,
so __cmpxchg_double_slab() is suitable.
Acked-by: Christoph Lameter <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
According to the comment from Pekka, add some comments.
diff --git a/mm/slub.c b/mm/slub.c
index 0c3105c..d7f8291 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1879,7 +1879,11 @@ redo:
}
}
-/* Unfreeze all the cpu partial slabs */
+/*
+ * Unfreeze all the cpu partial slabs.
+ *
+ * This function must be called with interrupts disabled.
+ */
static void unfreeze_partials(struct kmem_cache *s)
{
struct kmem_cache_node *n = NULL;
@@ -1935,7 +1939,7 @@ static void unfreeze_partials(struct kmem_cache *s)
l = m;
}
- } while (!cmpxchg_double_slab(s, page,
+ } while (!__cmpxchg_double_slab(s, page,
old.freelist, old.counters,
new.freelist, new.counters,
"unfreezing slab"));
@@ -2163,6 +2167,8 @@ static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
* The page is still frozen if the return value is not NULL.
*
* If this function returns NULL then the page has been unfrozen.
+ *
+ * This function must be called with interrupts disabled.
*/
static inline void *get_freelist(struct kmem_cache *s, struct page *page)
{
@@ -2179,7 +2185,7 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
new.inuse = page->objects;
new.frozen = freelist != NULL;
- } while (!cmpxchg_double_slab(s, page,
+ } while (!__cmpxchg_double_slab(s, page,
freelist, counters,
NULL, new.counters,
"get_freelist"));
--
1.7.9.5
2012/5/18 Christoph Lameter <[email protected]>:
> The reason the current implementation is so complex is to avoid races. The
> state of the list and the state of the partial pages must be consistent at
> all times.
OK. I got it.
> Looks good. If I can convince myself that this does not open up any
> new races then I may ack it.
Thanks.
On Thu, 17 May 2012, Christoph Lameter wrote:
> On Fri, 18 May 2012, Joonsoo Kim wrote:
>
>> I think these are disadvantages of the current implementation,
>> so I refactored unfreeze_partials().
>
> The reason the current implementation is so complex is to avoid races. The
> state of the list and the state of the partial pages must be consistent at
> all times.
>
>> Minimizing the code in the do {} while loop reduces the failure rate
>> of cmpxchg_double_slab. Below is the output of 'slabinfo -r kmalloc-256'
>> when './perf stat -r 33 hackbench 50 process 4000 > /dev/null' is run.
>
> Looks good. If I can convince myself that this does not open up any
> new races then I may ack it.
This is a reminder mail.
Would you give me some comments on this, please?
On Fri, 18 May 2012, Joonsoo Kim wrote:
> Minimizing the code in the do {} while loop reduces the failure rate
> of cmpxchg_double_slab. Below is the output of 'slabinfo -r kmalloc-256'
> when './perf stat -r 33 hackbench 50 process 4000 > /dev/null' is run.
Ok. This works because the pages are frozen and the node lock has been
taken, so the only concurrency to worry about is freeing of objects via
slab_free(). The cmpxchg is safe for that.
Acked-by: Christoph Lameter <[email protected]>
On Fri, 18 May 2012, Joonsoo Kim wrote:
> get_freelist() and unfreeze_partials() are only called with interrupts disabled,
> so __cmpxchg_double_slab() is suitable.
>
> Acked-by: Christoph Lameter <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>
Applied, thanks!
> ---
> According to the comment from Pekka, add some comments.
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 0c3105c..d7f8291 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1879,7 +1879,11 @@ redo:
> }
> }
>
> -/* Unfreeze all the cpu partial slabs */
> +/*
> + * Unfreeze all the cpu partial slabs.
> + *
> + * This function must be called with interrupts disabled.
> + */
> static void unfreeze_partials(struct kmem_cache *s)
> {
> struct kmem_cache_node *n = NULL;
> @@ -1935,7 +1939,7 @@ static void unfreeze_partials(struct kmem_cache *s)
> l = m;
> }
>
> - } while (!cmpxchg_double_slab(s, page,
> + } while (!__cmpxchg_double_slab(s, page,
> old.freelist, old.counters,
> new.freelist, new.counters,
> "unfreezing slab"));
> @@ -2163,6 +2167,8 @@ static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
> * The page is still frozen if the return value is not NULL.
> *
> * If this function returns NULL then the page has been unfrozen.
> + *
> + * This function must be called with interrupts disabled.
> */
> static inline void *get_freelist(struct kmem_cache *s, struct page *page)
> {
> @@ -2179,7 +2185,7 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
> new.inuse = page->objects;
> new.frozen = freelist != NULL;
>
> - } while (!cmpxchg_double_slab(s, page,
> + } while (!__cmpxchg_double_slab(s, page,
> freelist, counters,
> NULL, new.counters,
> "get_freelist"));
> --
> 1.7.9.5
>
>