2019-04-04 09:24:31

by Vlastimil Babka

Subject: [RFC 0/2] add static key for slub_debug

Hi,

I looked a bit at SLUB debugging capabilities and the first thing I noticed is
that there's no static key guarding the runtime enablement, as is common for
similar debugging functionalities, so here's an RFC to add it. It can be
further improved if there's interest.

It's true that in the alloc fast path the debugging check overhead is AFAICS
amortized by the per-cpu cache, i.e. when the allocation is served from there,
no debugging functionality is performed. IMHO that's kind of a weakness,
especially for SLAB_STORE_USER, so I might also look at doing something about
it, and then the static key might be more critical for overhead reduction.
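
For illustration, a much simplified sketch of the alloc side structure I mean
(not the actual kernel code: the tid/cmpxchg_double handling is omitted and
the _sketch name is made up, though the helpers it calls are the existing
ones):

/*
 * When the per-cpu freelist has an object, the allocation never reaches
 * the debug hooks; those run only from the __slab_alloc() slow path
 * (alloc_debug_processing()).
 */
static void *slab_alloc_node_sketch(struct kmem_cache *s, gfp_t gfpflags,
				    int node, unsigned long addr)
{
	struct kmem_cache_cpu *c = raw_cpu_ptr(s->cpu_slab);
	void *object = c->freelist;

	if (unlikely(!object || !node_match(c->page, node)))
		/* slow path: this is where debugging can happen */
		object = __slab_alloc(s, gfpflags, node, addr, c);
	else
		/* fast path: pop from the per-cpu freelist, no debug hooks */
		c->freelist = get_freepointer(s, object);

	return object;
}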

In the freeing fast path I quickly checked the stats and it seems that in
do_slab_free(), the "if (likely(page == c->page))" is not as likely as it
declares: in the majority of cases, freeing doesn't happen on an object
that belongs to the page currently cached. So a static key in the slow path
__slab_free() should be useful immediately.
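
And the corresponding, equally simplified sketch of the free side, to show
where the slow path __slab_free() (and thus a static key guarded
kmem_cache_debug() check there) comes in; again the _sketch name is made up
and the lockless tid/cmpxchg details are left out:

/*
 * Only when the freed object belongs to the currently cached page do we
 * take the lockless fast path; otherwise __slab_free() runs, operating
 * on the page's freelist, and that's where the debug checks live.
 */
static void do_slab_free_sketch(struct kmem_cache *s, struct page *page,
				void *head, void *tail, int cnt,
				unsigned long addr)
{
	struct kmem_cache_cpu *c = raw_cpu_ptr(s->cpu_slab);
	void *tail_obj = tail ? : head;

	if (likely(page == c->page)) {
		/* fast path: push onto the per-cpu freelist */
		set_freepointer(s, tail_obj, c->freelist);
		c->freelist = head;
	} else {
		/* slow path: per-page handling, debug checks possible */
		__slab_free(s, page, head, tail_obj, cnt, addr);
	}
}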

Vlastimil Babka (2):
mm, slub: introduce static key for slub_debug
mm, slub: add missing kmem_cache_debug() checks

mm/slub.c | 31 +++++++++++++++++++++++++++++--
1 file changed, 29 insertions(+), 2 deletions(-)

--
2.21.0


2019-04-04 09:17:14

by Vlastimil Babka

Subject: [RFC 1/2] mm, slub: introduce static key for slub_debug

One advantage of CONFIG_SLUB_DEBUG is that a generic distro kernel can be built
with it, but it stays inactive until enabled at boot or at runtime, without
rebuilding the kernel. With a static key, we can minimize the overhead of
checking whether slub_debug is enabled, as we do for e.g. page_owner.

For now and for simplicity, the static key stays enabled for the whole uptime
once activated for any cache, although some per-cache debugging options can
also be disabled at runtime. This can be improved if there's interest.
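
For reference, the page_owner pattern this mimics looks roughly as follows
(paraphrased from include/linux/page_owner.h and mm/page_owner.c, not a
verbatim copy):

DEFINE_STATIC_KEY_FALSE(page_owner_inited);

static inline void set_page_owner(struct page *page, unsigned int order,
				  gfp_t gfp_mask)
{
	/* compiles to a NOP until the key is enabled during boot */
	if (static_branch_unlikely(&page_owner_inited))
		__set_page_owner(page, order, gfp_mask);
}

with a one-way static_branch_enable(&page_owner_inited) once the feature is
requested on the kernel command line.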

Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/slub.c | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index d30ede89f4a6..398e53e16e2e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -115,10 +115,21 @@
* the fast path and disables lockless freelists.
*/

+#ifdef CONFIG_SLUB_DEBUG
+#ifdef CONFIG_SLUB_DEBUG_ON
+DEFINE_STATIC_KEY_TRUE(slub_debug_enabled);
+#else
+DEFINE_STATIC_KEY_FALSE(slub_debug_enabled);
+#endif
+#endif
+
static inline int kmem_cache_debug(struct kmem_cache *s)
{
#ifdef CONFIG_SLUB_DEBUG
- return unlikely(s->flags & SLAB_DEBUG_FLAGS);
+ if (static_branch_unlikely(&slub_debug_enabled))
+ return s->flags & SLAB_DEBUG_FLAGS;
+ else
+ return 0;
#else
return 0;
#endif
@@ -1287,6 +1298,9 @@ static int __init setup_slub_debug(char *str)
if (*str == ',')
slub_debug_slabs = str + 1;
out:
+ if (slub_debug)
+ static_branch_enable(&slub_debug_enabled);
+
return 1;
}

@@ -5193,6 +5207,7 @@ static ssize_t red_zone_store(struct kmem_cache *s,
s->flags &= ~SLAB_RED_ZONE;
if (buf[0] == '1') {
s->flags |= SLAB_RED_ZONE;
+ static_branch_enable(&slub_debug_enabled);
}
calculate_sizes(s, -1);
return length;
@@ -5213,6 +5228,7 @@ static ssize_t poison_store(struct kmem_cache *s,
s->flags &= ~SLAB_POISON;
if (buf[0] == '1') {
s->flags |= SLAB_POISON;
+ static_branch_enable(&slub_debug_enabled);
}
calculate_sizes(s, -1);
return length;
@@ -5234,6 +5250,7 @@ static ssize_t store_user_store(struct kmem_cache *s,
if (buf[0] == '1') {
s->flags &= ~__CMPXCHG_DOUBLE;
s->flags |= SLAB_STORE_USER;
+ static_branch_enable(&slub_debug_enabled);
}
calculate_sizes(s, -1);
return length;
--
2.21.0

2019-04-04 09:23:50

by Vlastimil Babka

Subject: [RFC 2/2] mm, slub: add missing kmem_cache_debug() checks

Some debugging checks in SLUB are not hidden behind a kmem_cache_debug() check.
Add the check so that those places can also benefit from the reduced overhead
thanks to the static key added by the previous patch.

Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/slub.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 398e53e16e2e..9d1b0e5e8593 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1086,6 +1086,13 @@ static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects)
static void setup_object_debug(struct kmem_cache *s, struct page *page,
void *object)
{
+ /*
+ * __OBJECT_POISON implies SLAB_POISON which is covered by
+ * kmem_cache_debug()
+ */
+ if (!kmem_cache_debug(s))
+ return;
+
if (!(s->flags & (SLAB_STORE_USER|SLAB_RED_ZONE|__OBJECT_POISON)))
return;

@@ -1095,6 +1102,9 @@ static void setup_object_debug(struct kmem_cache *s, struct page *page,

static void setup_page_debug(struct kmem_cache *s, void *addr, int order)
{
+ if (!kmem_cache_debug(s))
+ return;
+
if (!(s->flags & SLAB_POISON))
return;

@@ -1734,7 +1744,7 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
int order = compound_order(page);
int pages = 1 << order;

- if (s->flags & SLAB_CONSISTENCY_CHECKS) {
+ if (kmem_cache_debug(s) && s->flags & SLAB_CONSISTENCY_CHECKS) {
void *p;

slab_pad_check(s, page);
--
2.21.0

by Christopher Lameter

Subject: Re: [RFC 0/2] add static key for slub_debug

On Thu, 4 Apr 2019, Vlastimil Babka wrote:

> I looked a bit at SLUB debugging capabilities and the first thing I noticed is
> that there's no static key guarding the runtime enablement, as is common for
> similar debugging functionalities, so here's an RFC to add it. It can be
> further improved if there's interest.

Well the runtime enablement is per slab cache and static keys are global.

Adding a static key adds code to the critical paths. Since the flags for a
kmem_cache have to be inspected anyway, there may not be that much of a
benefit.

> It's true that in the alloc fast path the debugging check overhead is AFAICS
> amortized by the per-cpu cache, i.e. when the allocation is served from there,
> no debugging functionality is performed. IMHO that's kind of a weakness,
> especially for SLAB_STORE_USER, so I might also look at doing something about
> it, and then the static key might be more critical for overhead reduction.

Moving debugging out of the per cpu fastpath allows that fastpath to be
much simpler and faster.

SLAB_STORE_USER is mostly used only for debugging in which case we are
less concerned with performance.

If you want to use SLAB_STORE_USER in the fastpath then we have to do some
major redesign there.

> In the freeing fast path I quickly checked the stats and it seems that in
> do_slab_free(), the "if (likely(page == c->page))" is not as likely as it
> declares: in the majority of cases, freeing doesn't happen on an object
> that belongs to the page currently cached. So a static key in the slow path
> __slab_free() should be useful immediately.

Right. The freeing logic is actually a weakness in terms of performance for
SLUB due to the need to operate on a per-page queue immediately.

by Christopher Lameter

Subject: Re: [RFC 2/2] mm, slub: add missing kmem_cache_debug() checks

On Thu, 4 Apr 2019, Vlastimil Babka wrote:

> Some debugging checks in SLUB are not hidden behind a kmem_cache_debug() check.
> Add the check so that those places can also benefit from the reduced overhead
> thanks to the static key added by the previous patch.

Hmmm... I would not expect too much of a benefit from these changes since
most of the stuff is actually not on the hot path.

2019-04-04 22:01:07

by Vlastimil Babka

Subject: Re: [RFC 0/2] add static key for slub_debug

On 4/4/19 5:57 PM, Christopher Lameter wrote:
> On Thu, 4 Apr 2019, Vlastimil Babka wrote:
>
>> I looked a bit at SLUB debugging capabilities and the first thing I noticed is
>> that there's no static key guarding the runtime enablement, as is common for
>> similar debugging functionalities, so here's an RFC to add it. It can be
>> further improved if there's interest.
>
> Well the runtime enablement is per slab cache and static keys are global.

Sure, but the most common scenario worth optimizing for is that no slab
cache has debugging enabled, and thus the global static key is disabled.

Once it becomes enabled, the flags are checked for each cache, no change
there.

> Adding a static key adds code to the critical paths.

It's effectively a NOP as long as it's disabled. When enabled, the NOP is
patched at runtime into a jump (unconditional!) to the flags check.

> Since the flags for a
> kmem_cache have to be inspected anyway, there may not be that much of a
> benefit.

The point is that as long as it's disabled (the common case), no flag
check (most likely involving a conditional jump) is being executed at
all (unlike now). A NOP is obviously cheaper than a flag check. For the
(uncommon) case with debugging enabled, it adds an unconditional jump,
which is also rather cheap. So the tradeoff looks good.

>> It's true that in the alloc fast path the debugging check overhead is AFAICS
>> amortized by the per-cpu cache, i.e. when the allocation is served from there,
>> no debugging functionality is performed. IMHO that's kind of a weakness,
>> especially for SLAB_STORE_USER, so I might also look at doing something about
>> it, and then the static key might be more critical for overhead reduction.
>
> Moving debugging out of the per cpu fastpath allows that fastpath to be
> much simpler and faster.
>
> SLAB_STORE_USER is mostly used only for debugging in which case we are
> less concerned with performance.

Agreed, so it would be nice if we could do e.g. SLAB_STORE_USER for all
allocations in such a case.

> If you want to use SLAB_STORE_USER in the fastpath then we have to do some
> major redesign there.

Sure. Just saying that the benefit of a static key in the alloc path is
currently limited, as the debugging itself is limited due to the alloc fast
path being effective. But there's still an immediate benefit in the free path.

>> In the freeing fast path I quickly checked the stats and it seems that in
>> do_slab_free(), the "if (likely(page == c->page))" is not as likely as it
>> declares: in the majority of cases, freeing doesn't happen on an object
>> that belongs to the page currently cached. So a static key in the slow path
>> __slab_free() should be useful immediately.
>
> Right. The freeing logic is actually a weakness in terms of performance for
> SLUB due to the need to operate on a per-page queue immediately.
>