2021-02-02 08:10:14

by Abel Wu

Subject: [PATCH] mm/slub: embed __slab_alloc to its caller

Since slab_alloc_node() is the only caller of __slab_alloc(), embed
__slab_alloc() to its caller to save function call overhead. This
will also expand the caller's code block size a bit, but hackbench
tests on both host and guest didn't show a difference w/ or w/o
this patch.

Also rename ___slab_alloc() to __slab_alloc().

Signed-off-by: Abel Wu <[email protected]>
---
mm/slub.c | 46 ++++++++++++++++------------------------------
1 file changed, 16 insertions(+), 30 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 7ecbbbe5bc0c..0f69d2d0471a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2654,10 +2654,9 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
* we need to allocate a new slab. This is the slowest path since it involves
* a call to the page allocator and the setup of a new slab.
*
- * Version of __slab_alloc to use when we know that interrupts are
- * already disabled (which is the case for bulk allocation).
+ * Must be called with interrupts disabled.
*/
-static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
+static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
unsigned long addr, struct kmem_cache_cpu *c)
{
void *freelist;
@@ -2758,31 +2757,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
return freelist;
}

-/*
- * Another one that disabled interrupt and compensates for possible
- * cpu changes by refetching the per cpu area pointer.
- */
-static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
- unsigned long addr, struct kmem_cache_cpu *c)
-{
- void *p;
- unsigned long flags;
-
- local_irq_save(flags);
-#ifdef CONFIG_PREEMPTION
- /*
- * We may have been preempted and rescheduled on a different
- * cpu before disabling interrupts. Need to reload cpu area
- * pointer.
- */
- c = this_cpu_ptr(s->cpu_slab);
-#endif
-
- p = ___slab_alloc(s, gfpflags, node, addr, c);
- local_irq_restore(flags);
- return p;
-}
-
/*
* If the object has been wiped upon free, make sure it's fully initialized by
* zeroing out freelist pointer.
@@ -2854,7 +2828,19 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
object = c->freelist;
page = c->page;
if (unlikely(!object || !page || !node_match(page, node))) {
+ unsigned long flags;
+
+ local_irq_save(flags);
+#ifdef CONFIG_PREEMPTION
+ /*
+ * We may have been preempted and rescheduled on a different
+ * cpu before disabling interrupts. Need to reload cpu area
+ * pointer.
+ */
+ c = this_cpu_ptr(s->cpu_slab);
+#endif
object = __slab_alloc(s, gfpflags, node, addr, c);
+ local_irq_restore(flags);
} else {
void *next_object = get_freepointer_safe(s, object);

@@ -3299,7 +3285,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
* We may have removed an object from c->freelist using
* the fastpath in the previous iteration; in that case,
* c->tid has not been bumped yet.
- * Since ___slab_alloc() may reenable interrupts while
+ * Since __slab_alloc() may reenable interrupts while
* allocating memory, we should bump c->tid now.
*/
c->tid = next_tid(c->tid);
@@ -3308,7 +3294,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
* Invoking slow path likely have side-effect
* of re-populating per CPU c->freelist
*/
- p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE,
+ p[i] = __slab_alloc(s, flags, NUMA_NO_NODE,
_RET_IP_, c);
if (unlikely(!p[i]))
goto error;
--
2.27.0


by Christoph Lameter

Subject: Re: [PATCH] mm/slub: embed __slab_alloc to its caller

On Tue, 2 Feb 2021, Abel Wu wrote:

> Since slab_alloc_node() is the only caller of __slab_alloc(), embed
> __slab_alloc() to its caller to save function call overhead. This
> will also expand the caller's code block size a bit, but hackbench
> tests on both host and guest didn't show a difference w/ or w/o
> this patch.

slab_alloc_node is an always_inline function. It is intentional that only
the fast path was inlined and not the slow path.

2021-02-03 01:46:52

by Abel Wu

Subject: Re: [PATCH] mm/slub: embed __slab_alloc to its caller

> On Feb 2, 2021, at 6:11 PM, Christoph Lameter <[email protected]> wrote:
>
> On Tue, 2 Feb 2021, Abel Wu wrote:
>
>> Since slab_alloc_node() is the only caller of __slab_alloc(), embed
>> __slab_alloc() to its caller to save function call overhead. This
>> will also expand the caller's code block size a bit, but hackbench
>> tests on both host and guest didn't show a difference w/ or w/o
>> this patch.
>
> slab_alloc_node is an always_inline function. It is intentional that only
> the fast path was inlined and not the slow path.

Oh I got it. Thanks for your excellent explanation.

Best Regards,
Abel

2021-02-05 13:13:35

by Vlastimil Babka

Subject: Re: [PATCH] mm/slub: embed __slab_alloc to its caller

On 2/3/21 2:41 AM, Abel Wu wrote:
>> On Feb 2, 2021, at 6:11 PM, Christoph Lameter <[email protected]> wrote:
>>
>> On Tue, 2 Feb 2021, Abel Wu wrote:
>>
>>> Since slab_alloc_node() is the only caller of __slab_alloc(), embed
>>> __slab_alloc() to its caller to save function call overhead. This
>>> will also expand the caller's code block size a bit, but hackbench
>>> tests on both host and guest didn't show a difference w/ or w/o
>>> this patch.
>>
>> slab_alloc_node is an always_inline function. It is intentional that only
>> the fast path was inlined and not the slow path.
>
> Oh I got it. Thanks for your excellent explanation.

BTW, there's a script in the Linux source to nicely see the effect of such changes:

./scripts/bloat-o-meter slub.o.before mm/slub.o
add/remove: 0/1 grow/shrink: 9/0 up/down: 1660/-1130 (530)
Function                         old     new   delta
__slab_alloc                     127    1130   +1003
__kmalloc_track_caller           877     965     +88
__kmalloc                        878     966     +88
kmem_cache_alloc                 778     862     +84
__kmalloc_node_track_caller      996    1080     +84
kmem_cache_alloc_node_trace      813     896     +83
kmem_cache_alloc_node            800     881     +81
kmem_cache_alloc_trace           786     862     +76
__kmalloc_node                   998    1071     +73
___slab_alloc                   1130       -   -1130
Total: Before=57782, After=58312, chg +0.92%

And yeah, bloating all the entry points wouldn't be nice.
Thanks,
Vlastimil