2011-05-24 11:55:25

by James Morris

[permalink] [raw]
Subject: SLUB regression in current Linus

I started experiencing a boot hang in 2.6.39+, and bisected it to this
change:

commit 442b06bcea23a01934d3da7ec5898fa154a6cafb
Author: Christoph Lameter <[email protected]>
Date: Tue May 17 16:29:31 2011 -0500

slub: Remove node check in slab_free


Reverting the patch appears to fix the hang for me, although I'm not sure
what the actual problem is.

This is on a quad-core Opteron (1352). Let me know if you need any further
info.



- James
--
James Morris
<[email protected]>


2011-05-24 12:07:16

by James Morris

[permalink] [raw]
Subject: Re: SLUB regression in current Linus

On Tue, 24 May 2011, James Morris wrote:

> Reverting the patch appears to fix the hang for me, although I'm not sure
> what the actual problem is.

Disabling slub debugging also fixes it.


- James
--
James Morris
<[email protected]>

Subject: Re: SLUB regression in current Linus

On Tue, 24 May 2011, James Morris wrote:

> On Tue, 24 May 2011, James Morris wrote:
>
> > Reverting the patch appears to fix the hang for me, although I'm not sure
> > what the actual problem is.
>
> Disabling slub debugging also fixes it.

Boots fine here with and without debugging. Could you send me the kernel
configuration file?

2011-05-24 22:15:37

by James Morris

[permalink] [raw]
Subject: Re: SLUB regression in current Linus

On Tue, 24 May 2011, Christoph Lameter wrote:

> On Tue, 24 May 2011, James Morris wrote:
>
> > On Tue, 24 May 2011, James Morris wrote:
> >
> > > Reverting the patch appears to fix the hang for me, although I'm not sure
> > > what the actual problem is.
> >
> > Disabling slub debugging also fixes it.
>
> Boots fine here with and without debugging. Could you send me the kernel
> configuration file?
>

Attached.

It turned out the system was still unstable with the attached config
(e.g. spontaneous reboot).


--
James Morris
<[email protected]>


Attachments:
conf-test (51.32 kB)

2011-05-24 23:03:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: SLUB regression in current Linus

On Tue, May 24, 2011 at 4:52 AM, James Morris <[email protected]> wrote:
>
> Reverting the patch appears to fix the hang for me, although I'm not sure
> what the actual problem is.
>
> This is on a quad-core Opteron (1352). Let me know if you need any further
> info.

That whole "deactivate_slab()" + "c->page = NULL" that that patch does
looks bogus.

Look at __slab_alloc: we have:


page = c->page;
if (!page)
goto new_slab;

slab_lock(page);
if (unlikely(!node_match(c, node)))
goto another_slab;

and let's assume we have two users racing on that "c->page". The
"slab_lock()" is going to work for one of them, right?

Ok, so the one it works for will then hit

if (kmem_cache_debug(s))
goto debug;

and thus get to the new "deactivate_slab(s,c) + c->page = NULL" and
then unlock the page.

In the meantime, the one that wasn't able to lock the page will now go
forward, but will not have "node_match()" any more, so it does that
"goto another_slab".

Which does "deactivate_slab(s,c)" again, and now c->page is NULL, so
that totally breaks.

What am I missing?

That patch seems to be just broken piece-of-s%^!

Christoph, Pekka, please tell me why I shouldn't immediately revert
it. What am I missing?

Linus

2011-05-25 05:22:24

by Pekka Enberg

[permalink] [raw]
Subject: Re: SLUB regression in current Linus

On Wed, May 25, 2011 at 2:03 AM, Linus Torvalds
<[email protected]> wrote:
> On Tue, May 24, 2011 at 4:52 AM, James Morris <[email protected]> wrote:
>>
>> Reverting the patch appears to fix the hang for me, although I'm not sure
>> what the actual problem is.
>>
>> This is on a quad-core Opteron (1352). Let me know if you need any further
>> info.
>
> That whole "deactivate_slab()" + "c->page = NULL" that that patch does
> looks bogus.
>
> Look at __slab_alloc: we have:
>
>
> ? ? ? ?page = c->page;
> ? ? ? ?if (!page)
> ? ? ? ? ? ? ? ?goto new_slab;
>
> ? ? ? ?slab_lock(page);
> ? ? ? ?if (unlikely(!node_match(c, node)))
> ? ? ? ? ? ? ? ?goto another_slab;
>
> and let's assume we have two users racing on that "c->page". The
> "slab_lock()" is going to work for one of them, right?
>
> Ok, so the one it works for will then hit
>
> ? ? ? ?if (kmem_cache_debug(s))
> ? ? ? ? ? ? ? ?goto debug;
>
> and thus get to the new "deactivate_slab(s,c) + c->page = NULL" and
> then unlock the page.
>
> In the meantime, the one that wasn't able to lock the page will now go
> forward, but will not have "node_match()" any more, so it does that
> "goto another_slab".
>
> Which does "deactivate_slab(s,c)" again, and now c->page is NULL, so
> that totally breaks.
>
> What am I missing?
>
> That patch seems to be just broken piece-of-s%^!
>
> Christoph, Pekka, please tell me why I shouldn't immediately revert
> it. What am I missing?

It's safe to revert it, yes. Christoph? AFAICT Linus is correct here.

2011-05-25 07:40:59

by Ingo Molnar

[permalink] [raw]
Subject: Re: SLUB regression in current Linus


* James Morris <[email protected]> wrote:

> I started experiencing a boot hang in 2.6.39+, and bisected it to this
> change:

i too was seeing various instabilities and SLUB corruption in -tip testing.
Every 5th randconfig kernel crashes.

I started testing the revert.

Thanks,

Ingo

2011-05-25 08:58:30

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH] Revert "slub: Remove node check in slab_free"


* Ingo Molnar <[email protected]> wrote:

>
> * James Morris <[email protected]> wrote:
>
> > I started experiencing a boot hang in 2.6.39+, and bisected it to this
> > change:
>
> i too was seeing various instabilities and SLUB corruption in -tip testing.
> Every 5th randconfig kernel crashes.
>
> I started testing the revert.

Stability of -tip has increased dramatically (36 test iterations and
no crash), so yes this revert resolves the crashes.

I've attached the changelogified revert patch below. James, since i
was seeing these sporadically you've saved me a couple of hours of
rather painful bisection work! :-)

Thanks,

Ingo

----------------->
>From 6ac0730862b6dd1b45bc86e8e61e4026293b09f9 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Wed, 25 May 2011 09:37:47 +0200
Subject: [PATCH] Revert "slub: Remove node check in slab_free"

This reverts commit 442b06bcea23a01934d3da7ec5898fa154a6cafb.

As Linus explained it's broken. Quoting Linus:

That whole "deactivate_slab()" + "c->page = NULL" that this patch
does looks bogus.

Look at __slab_alloc(), we have:

page = c->page;
if (!page)
goto new_slab;

slab_lock(page);
if (unlikely(!node_match(c, node)))
goto another_slab;

and let's assume we have two users racing on that "c->page". The
"slab_lock()" is going to work for one of them, right?

Ok, so the one it works for will then hit:

if (kmem_cache_debug(s))
goto debug;

and thus get to the new "deactivate_slab(s,c) + c->page = NULL" and
then unlock the page.

In the meantime, the one that wasn't able to lock the page will now
go forward, but will not have "node_match()" any more, so it does
that "goto another_slab".

Which does "deactivate_slab(s,c)" again, and now c->page is NULL, so
that totally breaks.

Reported-and-bisected-by: James Morris <[email protected]>
Analyzed-by: Linus Torvalds <[email protected]>
Acked-by: Pekka Enberg <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
mm/slub.c | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 4ea7f1a..ed1281b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1881,8 +1881,6 @@ debug:

page->inuse++;
page->freelist = get_freepointer(s, object);
- deactivate_slab(s, c);
- c->page = NULL;
c->node = NUMA_NO_NODE;
goto unlock_out;
}
@@ -2114,7 +2112,7 @@ redo:
tid = c->tid;
barrier();

- if (likely(page == c->page)) {
+ if (likely(page == c->page && c->node != NUMA_NO_NODE)) {
set_freepointer(s, object, c->freelist);

if (unlikely(!irqsafe_cpu_cmpxchg_double(

2011-05-25 11:13:29

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH] Revert "slub: Remove node check in slab_free"

On Wed, May 25, 2011 at 11:58 AM, Ingo Molnar <[email protected]> wrote:
>> i too was seeing various instabilities and SLUB corruption in -tip testing.
>> Every 5th randconfig kernel crashes.
>>
>> I started testing the revert.
>
> Stability of -tip has increased dramatically (36 test iterations and
> no crash), so yes this revert resolves the crashes.
>
> I've attached the changelogified revert patch below. James, since i
> was seeing these sporadically you've saved me a couple of hours of
> rather painful bisection work! :-)

Thank you Ingo!

2011-05-25 12:42:53

by Jens Axboe

[permalink] [raw]
Subject: Re: SLUB regression in current Linus

On 2011-05-25 09:40, Ingo Molnar wrote:
>
> * James Morris <[email protected]> wrote:
>
>> I started experiencing a boot hang in 2.6.39+, and bisected it to this
>> change:
>
> i too was seeing various instabilities and SLUB corruption in -tip testing.
> Every 5th randconfig kernel crashes.
>
> I started testing the revert.

My test machine insta-crashes on boot. Lets please get this reverted.

--
Jens Axboe

Subject: Re: SLUB regression in current Linus

On Tue, 24 May 2011, Linus Torvalds wrote:

> Look at __slab_alloc: we have:
>
>
> page = c->page;
> if (!page)
> goto new_slab;
>
> slab_lock(page);
> if (unlikely(!node_match(c, node)))
> goto another_slab;
>
> and let's assume we have two users racing on that "c->page". The
> "slab_lock()" is going to work for one of them, right?

There cannot be two users racing through this code segment since we have
interrupts disabled and c is pointing to a per cpu structure. c->page
points to a page that can only be allocated from from the current
processor (from the freelist in c->freelist) but it can be freed to from
multiple cpus (via the page->freelist). The code that you are discussing
is copying the freed objects from the page->freelist to the per cpu
freelist and it needs to lock out the slab_free path to do that.

Subject: Re: SLUB regression in current Linus

On Wed, 25 May 2011, James Morris wrote:

> It turned out the system was still unstable with the attached config
> (e.g. spontaneous reboot).

Ahh. Thank you.

Here is the fix:

Subject: slub: Fix double bit unlock in debug mode

Commit 442b06bcea23a01934d3da7ec5898fa154a6cafb added a deactivate_slab()
in the debug case in __slab_alloc(). deactivate_slab() unlocks the current
slab used for allocation. Going to the label unlock_out: does it again.

So simply return the object. In the debug case we do not need all the other
processing that unlock_out: does.

Signed-off-by: Christoph Lameter <[email protected]>

---
mm/slub.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2011-05-25 09:41:27.000000000 -0500
+++ linux-2.6/mm/slub.c 2011-05-25 09:41:39.000000000 -0500
@@ -1884,7 +1884,8 @@ debug:
deactivate_slab(s, c);
c->page = NULL;
c->node = NUMA_NO_NODE;
- goto unlock_out;
+ local_irq_restore(flags);
+ return object;
}

/*


Attachments:
conf-test (51.32 kB)

2011-05-25 14:55:10

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Revert "slub: Remove node check in slab_free"

On Wed, May 25, 2011 at 1:58 AM, Ingo Molnar <[email protected]> wrote:
>
> As Linus explained it's broken. Quoting Linus:

Actually, apparently Linus' explanation doesn't hold water.

There does seem to be something very broken in that commit, and a
revert is appropriate, but as Christoph says, "c" is a local per-cpu
thing that is protected by interrupts being disabled. So the case I
thought looked bad cannot happen - it's not actually protected by the
page lock.

That said, we have one bisect that pinpointed that commit, and you
reporting that stability improves dramatically by reverting it, so I
definitely will revert it. I would like to be able to figure out what
the problem with it is, though.

(Not that that will hold up the revert).

One thing that I don't understand is why the debug case does *any* of
that at all: it used to only do "c->node = NUMA_NO_NODE;" and even
that looks bogus. A normal successful allocation doesn't do it, so why
does the debug allocation? The normal allocation path just does

c->freelist = get_freepointer(s, object);
page->inuse = page->objects;
page->freelist = NULL;

so why does the debug path do something totally different:

page->inuse++;
page->freelist = get_freepointer(s, object);
deactivate_slab(s, c);
c->page = NULL;
c->node = NUMA_NO_NODE;

(with that "decativate_slab + c->page = NULL" being the new part to
the offending commit).

The code is also ugly as hell. "deactivate_slab()" already sets
c->page to NULL, and it probably should have set c->node too.
Whatever.

Christoph, please look into this. With the kind of confirmation we
have, there is no question that commit gets reverted. So I'll revert
it soon, but I'd still like to know what is going on.

Linus

2011-05-25 14:58:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: SLUB regression in current Linus

On Wed, May 25, 2011 at 7:47 AM, Christoph Lameter <[email protected]> wrote:
>
> So simply return the object. In the debug case we do not need all the other
> processing that unlock_out: does.

So that patch looks like it should explain things.

But exactly why don't we need the ->tid update and the ALLOC_SLOWPATH
stats? Both of them would seem to be equally valid for the debug case.

Linus

Subject: Re: SLUB regression in current Linus

On Wed, 25 May 2011, Linus Torvalds wrote:

> On Wed, May 25, 2011 at 7:47 AM, Christoph Lameter <[email protected]> wrote:
> >
> > So simply return the object. In the debug case we do not need all the other
> > processing that unlock_out: does.
>
> So that patch looks like it should explain things.
>
> But exactly why don't we need the ->tid update and the ALLOC_SLOWPATH
> stats? Both of them would seem to be equally valid for the debug case.

We always fall back to the slow path in the debug case. So the tid update
is useless.

ALLOC_SLOWPATH would just be incremented for all allocations. Also a
pretty useless thing.

2011-05-25 15:42:36

by Eric Dumazet

[permalink] [raw]
Subject: Re: SLUB regression in current Linus

Le mercredi 25 mai 2011 à 09:47 -0500, Christoph Lameter a écrit :
> On Wed, 25 May 2011, James Morris wrote:
>
> > It turned out the system was still unstable with the attached config
> > (e.g. spontaneous reboot).
>
> Ahh. Thank you.
>
> Here is the fix:
>
> Subject: slub: Fix double bit unlock in debug mode
>
> Commit 442b06bcea23a01934d3da7ec5898fa154a6cafb added a deactivate_slab()
> in the debug case in __slab_alloc(). deactivate_slab() unlocks the current
> slab used for allocation. Going to the label unlock_out: does it again.
>
> So simply return the object. In the debug case we do not need all the other
> processing that unlock_out: does.
>
> Signed-off-by: Christoph Lameter <[email protected]>
>
> ---
> mm/slub.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c 2011-05-25 09:41:27.000000000 -0500
> +++ linux-2.6/mm/slub.c 2011-05-25 09:41:39.000000000 -0500
> @@ -1884,7 +1884,8 @@ debug:
> deactivate_slab(s, c);
> c->page = NULL;

is this c->page = NULL; really necessary ?

Thanks !

> c->node = NUMA_NO_NODE;
> - goto unlock_out;
> + local_irq_restore(flags);
> + return object;
> }
>
> /*

2011-05-25 15:45:47

by Eric Dumazet

[permalink] [raw]
Subject: Re: SLUB regression in current Linus

Le mercredi 25 mai 2011 à 10:04 -0500, Christoph Lameter a écrit :
> On Wed, 25 May 2011, Linus Torvalds wrote:
>
> > On Wed, May 25, 2011 at 7:47 AM, Christoph Lameter <[email protected]> wrote:
> > >
> > > So simply return the object. In the debug case we do not need all the other
> > > processing that unlock_out: does.
> >
> > So that patch looks like it should explain things.
> >
> > But exactly why don't we need the ->tid update and the ALLOC_SLOWPATH
> > stats? Both of them would seem to be equally valid for the debug case.
>
> We always fall back to the slow path in the debug case. So the tid update
> is useless.

tid change is updated anyway at the end of deactivate_slab()

Subject: Re: SLUB regression in current Linus

On Wed, 25 May 2011, Eric Dumazet wrote:

> > Index: linux-2.6/mm/slub.c
> > ===================================================================
> > --- linux-2.6.orig/mm/slub.c 2011-05-25 09:41:27.000000000 -0500
> > +++ linux-2.6/mm/slub.c 2011-05-25 09:41:39.000000000 -0500
> > @@ -1884,7 +1884,8 @@ debug:
> > deactivate_slab(s, c);
> > c->page = NULL;
>
> is this c->page = NULL; really necessary ?

No. deactivate_slab() already sets it to NULL.

Subject: Re: SLUB regression in current Linus

On Wed, 25 May 2011, Eric Dumazet wrote:

> > We always fall back to the slow path in the debug case. So the tid update
> > is useless.
>
> tid change is updated anyway at the end of deactivate_slab()

That too is useless in the debug case since we already disable
interrupts and do not need tid based synchronization.

2011-05-25 23:53:21

by David Rientjes

[permalink] [raw]
Subject: Re: SLUB regression in current Linus

On Wed, 25 May 2011, Jens Axboe wrote:

> >> I started experiencing a boot hang in 2.6.39+, and bisected it to this
> >> change:
> >
> > i too was seeing various instabilities and SLUB corruption in -tip testing.
> > Every 5th randconfig kernel crashes.
> >
> > I started testing the revert.
>
> My test machine insta-crashes on boot. Lets please get this reverted.
>

Linus merged a71ae47a2cbf ("slub: Fix double bit unlock in debug mode")
that should fix the issue.