Commit 0ad9500e16fe24aa55809a2b00e0d2d0e658fc71 ('slub: prefetch
next freelist pointer in slab_alloc') added a prefetch instruction to
the allocation fast path.
The same benefit is also available in the allocation slow path, although
it accounts for only a small portion of overall allocations. Nevertheless
we can still get some benefit from it, so prefetch the next freelist
pointer in __slab_alloc() as well.
Signed-off-by: Joonsoo Kim <[email protected]>
diff --git a/mm/slub.c b/mm/slub.c
index f96d8bc..92f1c0e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2248,6 +2248,7 @@ load_freelist:
VM_BUG_ON(!c->page->frozen);
c->freelist = get_freepointer(s, freelist);
c->tid = next_tid(c->tid);
+ prefetch_freepointer(s, c->freelist);
local_irq_restore(flags);
return freelist;
--
1.7.9.5
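[Editorial note, not part of the patch: the effect relied on here is the
classic software-prefetch pattern for linked-list traversal: while the
current node is being handled, issue a prefetch for the next node so its
cache line is already in flight by the time it is dereferenced. Below is a
minimal user-space analogy using __builtin_prefetch instead of the kernel's
prefetch()/prefetch_freepointer() helpers; struct obj and pop() are made-up
names for illustration only.]

#include <stdio.h>
#include <stdlib.h>

struct obj {
	struct obj *next;	/* free-list link, like SLUB's freelist pointer */
	long payload;
};

/* Pop one object from a singly-linked free list, prefetching the next
 * link so the following pop finds it already in cache. */
static struct obj *pop(struct obj **freelist)
{
	struct obj *object = *freelist;

	if (!object)
		return NULL;

	*freelist = object->next;
	__builtin_prefetch(*freelist);	/* analogous to prefetch_freepointer() */
	return object;
}

int main(void)
{
	struct obj *freelist = NULL;

	/* Build a small free list. */
	for (int i = 0; i < 8; i++) {
		struct obj *o = malloc(sizeof(*o));
		o->payload = i;
		o->next = freelist;
		freelist = o;
	}

	struct obj *o;
	while ((o = pop(&freelist))) {
		printf("%ld\n", o->payload);
		free(o);
	}
	return 0;
}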
In the current implementation, oldpage is not touched after unfreezing,
so it remains non-NULL. When this_cpu_cmpxchg() is then called with this
stale oldpage, it will usually fail.
We can set oldpage to NULL after unfreezing, because unfreeze_partials()
guarantees that all cpu partial slabs have been removed from the cpu
partial list. With that, this_cpu_cmpxchg() can be expected to succeed
most of the time.
Signed-off-by: Joonsoo Kim <[email protected]>
diff --git a/mm/slub.c b/mm/slub.c
index 92f1c0e..531d8ed 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1968,6 +1968,7 @@ int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
local_irq_save(flags);
unfreeze_partials(s);
local_irq_restore(flags);
+ oldpage = NULL;
pobjects = 0;
pages = 0;
stat(s, CPU_PARTIAL_DRAIN);
--
1.7.9.5
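[Editorial note, not part of the patch: this_cpu_cmpxchg() only stores the
new value if the per-cpu slot still holds the expected old value, so passing
a stale expected value guarantees a failed first attempt and an extra retry.
A user-space sketch with GCC/C11-style atomics shows the same behaviour; the
variable names are illustrative only.]

#include <stdio.h>

int main(void)
{
	int slot = 0;		/* stands in for the per-cpu partial slot after draining */
	int expected = 42;	/* stale 'oldpage': what we *think* is there */

	/* CAS with a stale expected value fails and does not store; on
	 * failure, 'expected' is refreshed to the actual current value. */
	if (!__atomic_compare_exchange_n(&slot, &expected, 7, 0,
					 __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
		printf("first cmpxchg failed, slot still %d (expected refreshed to %d)\n",
		       slot, expected);

	/* Retrying with the correct expected value (NULL/0 after the drain) succeeds. */
	expected = 0;
	if (__atomic_compare_exchange_n(&slot, &expected, 7, 0,
					 __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
		printf("second cmpxchg succeeded, slot now %d\n", slot);

	return 0;
}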
In some cases of __slab_free(), we need to take a lock to manipulate the
partial list. If the free attempt made while holding the lock fails, the
lock is no longer needed, for the following reasons.
Case 1. prior is NULL and kmem_cache_debug(s) is true
In this case, another free happened before our free succeeded.
When the slab is full (prior is NULL), the only possible operation on it
is slab_free(). So in this case we can assume another free happened.
That free may have frozen the slab, so the lock is not needed anymore.
Case 2. inuse is 0
In this case, acquire_slab() happened before our free succeeded.
We hold the last object of the slab, so no other operation on this slab
is possible except acquire_slab().
acquire_slab() freezes the slab, so the lock is not needed anymore.
The above two reasons explain why we do not need the lock when the free
attempt made while holding it fails.
So, when cmpxchg_double_slab() fails, it is better to release the lock.
This may reduce lock contention.
It also simplifies the logic somewhat, since the 'was_frozen with a lock'
case can no longer occur. Remove that handling.
Signed-off-by: Joonsoo Kim <[email protected]>
diff --git a/mm/slub.c b/mm/slub.c
index 531d8ed..3e0b9db 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2438,7 +2438,6 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
void *prior;
void **object = (void *)x;
int was_frozen;
- int inuse;
struct page new;
unsigned long counters;
struct kmem_cache_node *n = NULL;
@@ -2450,13 +2449,17 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
return;
do {
+ if (unlikely(n)) {
+ spin_unlock_irqrestore(&n->list_lock, flags);
+ n = NULL;
+ }
prior = page->freelist;
counters = page->counters;
set_freepointer(s, object, prior);
new.counters = counters;
was_frozen = new.frozen;
new.inuse--;
- if ((!new.inuse || !prior) && !was_frozen && !n) {
+ if ((!new.inuse || !prior) && !was_frozen) {
if (!kmem_cache_debug(s) && !prior)
@@ -2481,7 +2484,6 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
}
}
- inuse = new.inuse;
} while (!cmpxchg_double_slab(s, page,
prior, counters,
@@ -2507,25 +2509,17 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
return;
}
+ if (unlikely(!new.inuse && n->nr_partial > s->min_partial))
+ goto slab_empty;
+
/*
- * was_frozen may have been set after we acquired the list_lock in
- * an earlier loop. So we need to check it here again.
+ * Objects left in the slab. If it was not on the partial list before
+ * then add it.
*/
- if (was_frozen)
- stat(s, FREE_FROZEN);
- else {
- if (unlikely(!inuse && n->nr_partial > s->min_partial))
- goto slab_empty;
-
- /*
- * Objects left in the slab. If it was not on the partial list before
- * then add it.
- */
- if (unlikely(!prior)) {
- remove_full(s, page);
- add_partial(n, page, DEACTIVATE_TO_TAIL);
- stat(s, FREE_ADD_PARTIAL);
- }
+ if (kmem_cache_debug(s) && unlikely(!prior)) {
+ remove_full(s, page);
+ add_partial(n, page, DEACTIVATE_TO_TAIL);
+ stat(s, FREE_ADD_PARTIAL);
}
spin_unlock_irqrestore(&n->list_lock, flags);
return;
--
1.7.9.5
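[Editorial note, not part of the patch: the reworked loop follows a "drop
the lock before retrying" pattern: if the previous iteration took the node
lock but cmpxchg_double_slab() failed, the lock is released at the top of
the next iteration before the slab state is re-read. Below is a simplified
user-space analogue with pthreads and C11 atomics; struct node, slow_free()
and the single atomic 'state' word are illustrative stand-ins, not the
kernel data structures.]

#include <pthread.h>
#include <stdatomic.h>

struct node {
	pthread_mutex_t list_lock;
	/* ... partial list ... */
};

static _Atomic long state;	/* stands in for page->freelist/counters */

/* Retry loop in the style of the reworked __slab_free(): take the list
 * lock only when this attempt may need to touch the partial list, and
 * release it again if the compare-and-swap of the slab state fails. */
static void slow_free(struct node *n_node)
{
	struct node *n = NULL;
	long old, new;

	do {
		if (n) {	/* previous attempt held the lock but lost the race */
			pthread_mutex_unlock(&n->list_lock);
			n = NULL;
		}
		old = atomic_load(&state);	/* re-read state, like page->counters */
		new = old - 1;			/* new.inuse-- */
		if (new == 0) {			/* may need to move the slab between lists */
			n = n_node;
			pthread_mutex_lock(&n->list_lock);
		}
	} while (!atomic_compare_exchange_weak(&state, &old, new));

	if (n) {
		/* ... add to / remove from the partial list under the lock ... */
		pthread_mutex_unlock(&n->list_lock);
	}
}

int main(void)
{
	struct node n_node = { .list_lock = PTHREAD_MUTEX_INITIALIZER };

	atomic_store(&state, 1);
	slow_free(&n_node);	/* frees the "last object", so the lock is taken */
	return 0;
}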
Commit 0ad9500e16fe24aa55809a2b00e0d2d0e658fc71 ('slub: prefetch
next freelist pointer in slab_alloc') added a prefetch instruction to
the allocation fast path.
The same benefit is also available in the allocation slow path, although
it accounts for only a small portion of overall allocations. Nevertheless
we can still get some benefit from it, so prefetch the next freelist
pointer in __slab_alloc() as well.
Cc: Eric Dumazet <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
Add 'Cc: Eric Dumazet <[email protected]>'
diff --git a/mm/slub.c b/mm/slub.c
index f96d8bc..92f1c0e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2248,6 +2248,7 @@ load_freelist:
VM_BUG_ON(!c->page->frozen);
c->freelist = get_freepointer(s, freelist);
c->tid = next_tid(c->tid);
+ prefetch_freepointer(s, c->freelist);
local_irq_restore(flags);
return freelist;
--
1.7.9.5
On Sat, 23 Jun 2012, Joonsoo Kim wrote:
> In the current implementation, oldpage is not touched after unfreezing,
> so it remains non-NULL. When this_cpu_cmpxchg() is then called with this
> stale oldpage, it will usually fail.
>
> We can set oldpage to NULL after unfreezing, because unfreeze_partials()
> guarantees that all cpu partial slabs have been removed from the cpu
> partial list. With that, this_cpu_cmpxchg() can be expected to succeed
> most of the time.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 92f1c0e..531d8ed 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1968,6 +1968,7 @@ int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
> local_irq_save(flags);
> unfreeze_partials(s);
> local_irq_restore(flags);
> + oldpage = NULL;
> pobjects = 0;
> pages = 0;
> stat(s, CPU_PARTIAL_DRAIN);
Applied, thanks!