2007-11-03 17:24:27

by Hugh Dickins

Subject: [PATCH 1/2] slub: fix leakage

Slub has been quite leaky under load. Taking mm_struct as an example, in
a loop of swapping kernel builds, after the first iteration slabinfo shows:
Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg
mm_struct 55 840 73.7K 18/7/4 4 0 38 62 A
but Objects and Partials steadily creep up - after the 340th iteration:
mm_struct 110 840 188.4K 46/36/4 4 0 78 49 A

The culprit turns out to be __slab_alloc(), where it copes with the race
that another task has assigned the cpu slab while we were allocating one.
Don't rush off to load_freelist there: that assumes c->freelist is empty,
and will lose all of its free slots when c->page->freelist is not empty.
Instead just do a local allocation from c->freelist when it has one.

Which fixes the leakage: Objects and Partials then remain stable.
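
(For reference, the load_freelist path being avoided does roughly this in
2.6.24-rc1 - a simplified sketch, not the verbatim source:

load_freelist:
	/* c->page is locked; refill the per-cpu freelist from it */
	object = c->page->freelist;
	c->freelist = object[c->offset];	/* clobbers c->freelist: any
						   free objects it still held
						   are leaked */
	c->page->inuse = s->objects;
	c->page->freelist = NULL;
	slab_unlock(c->page);
	return object;

so jumping there with a non-empty c->freelist throws those objects away.)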

Signed-off-by: Hugh Dickins <[email protected]>
---
I recommend this for 2.6.24-rc2 and 2.6.23-stable.

mm/slub.c | 5 +++++
1 file changed, 5 insertions(+)

--- 2.6.24-rc1/mm/slub.c 2007-10-24 07:16:04.000000000 +0100
+++ linux/mm/slub.c 2007-11-03 13:22:31.000000000 +0000
@@ -1525,6 +1525,11 @@ new_slab:
 				 * want the current one since its cache hot
 				 */
 				discard_slab(s, new);
+				if (c->freelist) {
+					object = c->freelist;
+					c->freelist = object[c->offset];
+					return object;
+				}
 				slab_lock(c->page);
 				goto load_freelist;
 			}


2007-11-03 17:24:38

by Hugh Dickins

Subject: [PATCH 2/2] slub: fix Objects count

The count of active Objects shown by Slub's slabinfo is too approximate,
because each cpu slab is counted as all in use, even when lots are free.
That makes tracing leaks harder than it need be.

Add a free count into kmem_cache_cpu (which doesn't enlarge it on 64-bit),
to keep that count in the hot and dirty per-cpu cacheline.
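
(Why it doesn't enlarge the struct on 64-bit - a sketch of the resulting
layout, assuming the usual LP64 sizes:

struct kmem_cache_cpu {
	void **freelist;	/* 8 bytes */
	struct page *page;	/* 8 bytes */
	unsigned int free;	/* 4 bytes - the new count */
	int node;		/* 4 bytes */
	unsigned int offset;	/* 4 bytes */
	unsigned int objsize;	/* 4 bytes */
};				/* 32 bytes either way: without "free",
				   28 bytes round up to 32 for pointer
				   alignment */
)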

Signed-off-by: Hugh Dickins <[email protected]>
---
I recommend this for 2.6.24-rc2, and 2.6.23-stable. But perhaps you've
intentionally avoided this little extra overhead (which isn't necessary
for any functional correctness). I do think it helps to see accurate
numbers there - which Slab gives, doesn't it?

I believe I've seen a futures patch go by in which you fit something
else into kmem_cache_cpu here; then another in which you remove more.
Not sure where you stand now!

include/linux/slub_def.h | 1 +
mm/slub.c | 9 +++++++--
2 files changed, 8 insertions(+), 2 deletions(-)

--- 2.6.24-rc1+/include/linux/slub_def.h 2007-10-24 07:16:02.000000000 +0100
+++ linux/include/linux/slub_def.h 2007-11-03 13:22:48.000000000 +0000
@@ -14,6 +14,7 @@
 struct kmem_cache_cpu {
 	void **freelist;
 	struct page *page;
+	unsigned int free;
 	int node;
 	unsigned int offset;
 	unsigned int objsize;
--- 2.6.24-rc1+/mm/slub.c 2007-11-03 13:22:31.000000000 +0000
+++ linux/mm/slub.c 2007-11-03 13:22:48.000000000 +0000
@@ -1392,6 +1392,7 @@ static void deactivate_slab(struct kmem_
 		page->freelist = object;
 		page->inuse--;
 	}
+	c->free = 0;
 	c->page = NULL;
 	unfreeze_slab(s, page);
 }
@@ -1483,8 +1484,8 @@ load_freelist:
 	if (unlikely(SlabDebug(c->page)))
 		goto debug;

-	object = c->page->freelist;
 	c->freelist = object[c->offset];
+	c->free = s->objects - c->page->inuse - 1;
 	c->page->inuse = s->objects;
 	c->page->freelist = NULL;
 	c->node = page_to_nid(c->page);
@@ -1528,6 +1529,7 @@ new_slab:
 				if (c->freelist) {
 					object = c->freelist;
 					c->freelist = object[c->offset];
+					c->free--;
 					return object;
 				}
 				slab_lock(c->page);
@@ -1580,6 +1582,7 @@ static void __always_inline *slab_alloc(
 	else {
 		object = c->freelist;
 		c->freelist = object[c->offset];
+		c->free--;
 	}
 	local_irq_restore(flags);

@@ -1685,6 +1688,7 @@ static void __always_inline slab_free(st
 	if (likely(page == c->page && c->node >= 0)) {
 		object[c->offset] = c->freelist;
 		c->freelist = object;
+		c->free++;
 	} else
 		__slab_free(s, page, x, addr, c->offset);

@@ -1866,6 +1870,7 @@ static void init_kmem_cache_cpu(struct k
 {
 	c->page = NULL;
 	c->freelist = NULL;
+	c->free = 0;
 	c->node = 0;
 	c->offset = s->offset / sizeof(void *);
 	c->objsize = s->objsize;
@@ -3534,7 +3539,7 @@ static unsigned long slab_objects(struct
 				int x = 0;

 				if (flags & SO_OBJECTS)
-					x = page->inuse;
+					x = page->inuse - c->free;
 				else
 					x = 1;
 				total += x;

2007-11-03 17:28:16

by Oliver Pinter

Subject: Re: [PATCH 1/2] slub: fix leakage

Q: Is it needed also for 2.6.22-stable?

On 11/3/07, Hugh Dickins <[email protected]> wrote:
> Slub has been quite leaky under load. Taking mm_struct as an example, in
> a loop of swapping kernel builds, after the first iteration slabinfo shows:
> Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg
> mm_struct 55 840 73.7K 18/7/4 4 0 38 62 A
> but Objects and Partials steadily creep up - after the 340th iteration:
> mm_struct 110 840 188.4K 46/36/4 4 0 78 49 A
>
> The culprit turns out to be __slab_alloc(), where it copes with the race
> that another task has assigned the cpu slab while we were allocating one.
> Don't rush off to load_freelist there: that assumes c->freelist is empty,
> and will lose all of its free slots when c->page->freelist is not empty.
> Instead just do a local allocation from c->freelist when it has one.
>
> Which fixes the leakage: Objects and Partials then remain stable.
>
> Signed-off-by: Hugh Dickins <[email protected]>
> ---
> I recommend this for 2.6.24-rc2 and 2.6.23-stable.
>
> mm/slub.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> --- 2.6.24-rc1/mm/slub.c 2007-10-24 07:16:04.000000000 +0100
> +++ linux/mm/slub.c 2007-11-03 13:22:31.000000000 +0000
> @@ -1525,6 +1525,11 @@ new_slab:
>  				 * want the current one since its cache hot
>  				 */
>  				discard_slab(s, new);
> +				if (c->freelist) {
> +					object = c->freelist;
> +					c->freelist = object[c->offset];
> +					return object;
> +				}
>  				slab_lock(c->page);
>  				goto load_freelist;
>  			}


--
Thanks,
Oliver

2007-11-03 17:38:33

by Willy Tarreau

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, Nov 03, 2007 at 05:10:54PM +0000, Hugh Dickins wrote:
> Slub has been quite leaky under load. Taking mm_struct as an example, in
> a loop of swapping kernel builds, after the first iteration slabinfo shows:
> Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg
> mm_struct 55 840 73.7K 18/7/4 4 0 38 62 A
> but Objects and Partials steadily creep up - after the 340th iteration:
> mm_struct 110 840 188.4K 46/36/4 4 0 78 49 A
>
> The culprit turns out to be __slab_alloc(), where it copes with the race
> that another task has assigned the cpu slab while we were allocating one.
> Don't rush off to load_freelist there: that assumes c->freelist is empty,
> and will lose all of its free slots when c->page->freelist is not empty.
> Instead just do a local allocation from c->freelist when it has one.
>
> Which fixes the leakage: Objects and Partials then remain stable.
>
> Signed-off-by: Hugh Dickins <[email protected]>
> ---
> I recommend this for 2.6.24-rc2 and 2.6.23-stable.

Hugh, you should also have CC'd linux-stable for this, otherwise there's
a risk that the patch remains unnoticed by the stable team (CC'd).

> mm/slub.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> --- 2.6.24-rc1/mm/slub.c 2007-10-24 07:16:04.000000000 +0100
> +++ linux/mm/slub.c 2007-11-03 13:22:31.000000000 +0000
> @@ -1525,6 +1525,11 @@ new_slab:
>  				 * want the current one since its cache hot
>  				 */
>  				discard_slab(s, new);
> +				if (c->freelist) {
> +					object = c->freelist;
> +					c->freelist = object[c->offset];
> +					return object;
> +				}
>  				slab_lock(c->page);
>  				goto load_freelist;
>  			}

Regards,
Willy

2007-11-03 17:51:17

by Hugh Dickins

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, 3 Nov 2007, Olivér Pintér wrote:
> Q: Is it needed also for 2.6.22-stable?

I guess so: though SLUB wasn't on by default in 2.6.22; and it being
only a slow leak rather than a corruption, I was less inclined to
agitate about it for releases further back.

But your question makes me realize I never even looked at 2.6.23 or
2.6.22 hereabouts, just assumed they were the same; let alone patch
or build or test them. The patches reject as such because quite a
lot has changed around (there was no struct kmem_cache_cpu in either).

A hurried look suggests that the leakage problem was there in both,
but let's wait to hear Christoph's expert opinion.

Hugh

2007-11-03 17:56:49

by Hugh Dickins

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, 3 Nov 2007, Willy Tarreau wrote:
> On Sat, Nov 03, 2007 at 05:10:54PM +0000, Hugh Dickins wrote:
> > I recommend this for 2.6.24-rc2 and 2.6.23-stable.
>
> Hugh, you should also have CC'd linux-stable for this, otherwise there's
> a risk that the patch remains unnoticed by the stable team (CC'd).

That was intentional, actually: I thought Christoph ought to have
a say in it first. Then he or I would alert the stable team.

And it turns out, see other mail, that the patches wouldn't even
apply as is to -stable: let's forget 2/2 so far as -stable goes,
and I'll now look into versions of 1/2 for 2.6.23 and 2.6.22
(checking whether the problem really happens there or not).

Hugh

2007-11-03 17:59:49

by Willy Tarreau

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, Nov 03, 2007 at 05:55:50PM +0000, Hugh Dickins wrote:
> On Sat, 3 Nov 2007, Willy Tarreau wrote:
> > On Sat, Nov 03, 2007 at 05:10:54PM +0000, Hugh Dickins wrote:
> > > I recommend this for 2.6.24-rc2 and 2.6.23-stable.
> >
> > Hugh, you should also have CC'd linux-stable for this, otherwise there's
> > a risk that the patch remains unnoticed by the stable team (CC'd).
>
> That was intentional, actually: I thought Christoph ought to have
> a say in it first. Then he or I would alert the stable team.

OK, it's clearer then. Sorry for the noise.

Willy

2007-11-03 18:27:06

by Christoph Lameter

Subject: Re: [PATCH 2/2] slub: fix Objects count

On Sat, 3 Nov 2007, Hugh Dickins wrote:

> The count of active Objects shown by Slub's slabinfo is too approximate,
> because each cpu slab is counted as all in use, even when lots are free.
> That makes tracing leaks harder than it need be.

True but that is the way it is for performance reasons.

> Add a free count into kmem_cache_cpu (which doesn't enlarge it on 64-bit),
> to keep that count in the hot and dirty per-cpu cacheline.

Adds too much overhead to the fast paths and will make the current
optimizations in mm impossible.

2007-11-03 18:31:33

by Christoph Lameter

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, 3 Nov 2007, Hugh Dickins wrote:

> The culprit turns out to be __slab_alloc(), where it copes with the race
> that another task has assigned the cpu slab while we were allocating one.
> Don't rush off to load_freelist there: that assumes c->freelist is empty,
> and will lose all of its free slots when c->page->freelist is not empty.
> Instead just do a local allocation from c->freelist when it has one.

Hmmm.. Right. This will require some fixes to the optimizations in mm.

Acked-by: Christoph Lameter <[email protected]>

2007-11-03 18:48:20

by Christoph Lameter

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, 3 Nov 2007, Hugh Dickins wrote:

> Which fixes the leakage: Objects and Partials then remain stable.

Well this code is just an optimization for a rare case. Your patch may not
handle the debug situation the right way. We could just remove it.



SLUB: Fix memory leak by not reusing cpu_slab

Fix the memory leak that may occur when we attempt to reuse a cpu_slab
that was allocated while we reenabled interrupts in order to be able to
grow a slab cache. The per cpu freelist may contain objects and in that
situation we may overwrite the per cpu freelist pointer, losing objects.
This only occurs if we find that the concurrently allocated slab fits
our allocation needs.

If we simply always deactivate the slab then the freelist will be properly
reintegrated and the memory leak will go away.
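
(For reference, the reintegration happens in deactivate_slab(), which in
2.6.24-rc1 does roughly this - a sketch, not the verbatim source:

static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
{
	struct page *page = c->page;

	/* merge the per-cpu freelist back into the page's own freelist */
	while (c->freelist) {
		void **object = c->freelist;

		c->freelist = c->freelist[c->offset];
		object[c->offset] = page->freelist;
		page->freelist = object;
		page->inuse--;
	}
	c->page = NULL;
	unfreeze_slab(s, page);
}

so no object can be stranded on a forgotten per-cpu freelist.)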

Signed-off-by: Christoph Lameter <[email protected]>

---
mm/slub.c | 20 +-------------------
1 file changed, 1 insertion(+), 19 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2007-11-03 11:41:58.000000000 -0700
+++ linux-2.6/mm/slub.c 2007-11-03 11:42:12.000000000 -0700
@@ -1511,26 +1511,8 @@ new_slab:

 	if (new) {
 		c = get_cpu_slab(s, smp_processor_id());
-		if (c->page) {
-			/*
-			 * Someone else populated the cpu_slab while we
-			 * enabled interrupts, or we have gotten scheduled
-			 * on another cpu. The page may not be on the
-			 * requested node even if __GFP_THISNODE was
-			 * specified. So we need to recheck.
-			 */
-			if (node_match(c, node)) {
-				/*
-				 * Current cpuslab is acceptable and we
-				 * want the current one since its cache hot
-				 */
-				discard_slab(s, new);
-				slab_lock(c->page);
-				goto load_freelist;
-			}
-			/* New slab does not fit our expectations */
+		if (c->page)
 			flush_slab(s, c);
-		}
 		slab_lock(new);
 		SetSlabFrozen(new);
 		c->page = new;

2007-11-03 18:51:38

by Christoph Lameter

Subject: Re: [PATCH 1/2] slub: fix leakage

The fix against mm:


SLUB: Fix memory leak by not reusing cpu_slab

Fix the memory leak that may occur when we attempt to reuse a cpu_slab
that was allocated while we reenabled interrupts in order to be able to
grow a slab cache. The per cpu freelist may contain objects and in that
situation we may overwrite the per cpu freelist pointer, losing objects.
This only occurs if we find that the concurrently allocated slab fits
our allocation needs.

If we simply always deactivate the slab then the freelist will be properly
reintegrated and the memory leak will go away.

Signed-off-by: Christoph Lameter <[email protected]>

---
mm/slub.c | 19 +------------------
1 file changed, 1 insertion(+), 18 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2007-11-03 11:49:20.000000000 -0700
+++ linux-2.6/mm/slub.c 2007-11-03 11:49:29.000000000 -0700
@@ -1529,25 +1529,8 @@ static noinline unsigned long get_new_sl
 		return 0;

 	*pc = c = get_cpu_slab(s, smp_processor_id());
-	if (c->page) {
-		/*
-		 * Someone else populated the cpu_slab while we
-		 * enabled interrupts, or we have gotten scheduled
-		 * on another cpu. The page may not be on the
-		 * requested node even if __GFP_THISNODE was
-		 * specified. So we need to recheck.
-		 */
-		if (node_match(c, node)) {
-			/*
-			 * Current cpuslab is acceptable and we
-			 * want the current one since its cache hot
-			 */
-			discard_slab(s, page);
-			return slab_lock(c->page);
-		}
-		/* New slab does not fit our expectations */
+	if (c->page)
 		flush_slab(s, c);
-	}
 	c->page = page;
 	return slab_lock(page) | FROZEN;
 }

2007-11-03 18:53:17

by Hugh Dickins

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, 3 Nov 2007, Hugh Dickins wrote:
> On Sat, 3 Nov 2007, Olivér Pintér wrote:
> > Q: Is it needed also for 2.6.22-stable?
>
> I guess so: though SLUB wasn't on by default in 2.6.22; and it being
> only a slow leak rather than a corruption, I was less inclined to
> agitate about it for releases further back.
>
> But your question makes me realize I never even looked at 2.6.23 or
> 2.6.22 hereabouts, just assumed they were the same; let alone patch
> or build or test them. The patches reject as such because quite a
> lot has changed around (there was no struct kmem_cache_cpu in either).
>
> A hurried look suggests that the leakage problem was there in both,
> but let's wait to hear Christoph's expert opinion.

Okay, here's a version for 2.6.23 and 2.6.22...
Christoph, you've now Acked the 2.6.24 one, thanks:
do you agree this patch below should go to -stable?


Slub has been quite leaky under load. Taking mm_struct as an example, in
a loop of swapping kernel builds, after the first iteration slabinfo shows:
Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg
mm_struct 55 840 73.7K 18/7/4 4 0 38 62 A
but Objects and Partials steadily creep up - after the 340th iteration:
mm_struct 110 840 188.4K 46/36/4 4 0 78 49 A
(example taken from 2.6.24-rc1: YMMV).

The culprit turns out to be __slab_alloc(), where it copes with the race
that another task has assigned the cpu slab while we were allocating one.
Don't rush off to load_freelist there: that assumes page->lockless_freelist
is empty, and will lose all its free slots when page->freelist is not empty.
Instead just do a local allocation from lockless_freelist when it has one.

Which fixes the leakage: Objects and Partials then remain stable.

Signed-off-by: Hugh Dickins <[email protected]>
---
Version of patch suitable and recommended for both 2.6.23-stable and
2.6.22-stable. I've not run tests on either to observe the mounting
leakage; but a version of the patch below, with a printk announcing
when a non-empty freelist would overwrite a non-empty lockless_freelist,
does indeed trigger in both (though notably less frequently than in
2.6.24-rc1 - something else seems to be making it more likely now).
But please wait for Christoph's Ack before committing to -stable.

mm/slub.c | 6 ++++++
1 file changed, 6 insertions(+)

--- 2.6.23/mm/slub.c 2007-10-09 21:31:38.000000000 +0100
+++ linux/mm/slub.c 2007-11-03 18:23:07.000000000 +0000
@@ -1517,6 +1517,12 @@ new_slab:
 				 */
 				discard_slab(s, page);
 				page = s->cpu_slab[cpu];
+				if (page->lockless_freelist) {
+					object = page->lockless_freelist;
+					page->lockless_freelist =
+						object[page->offset];
+					return object;
+				}
 				slab_lock(page);
 				goto load_freelist;
 			}

2007-11-03 19:05:26

by Hugh Dickins

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, 3 Nov 2007, Christoph Lameter wrote:
> On Sat, 3 Nov 2007, Hugh Dickins wrote:
>
> > Which fixes the leakage: Objects and Partials then remain stable.
>
> Well this code is just an optimization for a rare case.
> Your patch may not handle the debug situation the right way.

Oh? How?

> We could just remove it.

Hmm, that does seem a possibility. It is going to increase the
number of partials in use, but they should go away later once
they fall out of use (in a repetitive test like mine).

I'll give it a try overnight and report back then.

Hugh

2007-11-03 19:10:44

by Hugh Dickins

Subject: Re: [PATCH 2/2] slub: fix Objects count

On Sat, 3 Nov 2007, Christoph Lameter wrote:
> On Sat, 3 Nov 2007, Hugh Dickins wrote:
>
> > The count of active Objects shown by Slub's slabinfo is too approximate,
> > because each cpu slab is counted as all in use, even when lots are free.
> > That makes tracing leaks harder than it need be.
>
> True but that is the way it is for performance reasons.

I was afraid you might say something like that.
Perhaps it'll be a patch I need to use in my own builds.
Though I'd have thought others would want that accuracy too.
Didn't SLAB give it? (The "r*gr*ss**n" word!)

> > Add a free count into kmem_cache_cpu (which doesn't enlarge it on 64-bit),
> > to keep that count in the hot and dirty per-cpu cacheline.
>
> > Adds too much overhead to the fast paths

You've come to that conclusion very quickly!
Any numbers to back it up?

> and will make the current optimizations in mm impossible.

I'll have to wait and see what those are: you move too fast for me.

Hugh

2007-11-03 19:26:18

by Christoph Lameter

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, 3 Nov 2007, Hugh Dickins wrote:

> On Sat, 3 Nov 2007, Christoph Lameter wrote:
> > On Sat, 3 Nov 2007, Hugh Dickins wrote:
> >
> > > Which fixes the leakage: Objects and Partials then remain stable.
> >
> > Well this code is just an optimization for a rare case.
> > Your patch may not handle the debug situation the right way.
>
> Oh? How?

If SLAB_DEBUG is set then your patch does not do the proper checks for the
object, tracing information is not written and the poisoning is not done.
See alloc_debug_processing().
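
(Roughly, the debug branch of __slab_alloc() in 2.6.24-rc1 - a sketch, not
verbatim - which allocating straight from c->freelist would bypass:

debug:
	object = c->page->freelist;
	/* consistency checks, poisoning and tracking happen here */
	if (!alloc_debug_processing(s, c->page, object, addr))
		goto another_slab;

	/* debug caches hand out objects one at a time from the page
	   freelist instead of loading the per-cpu freelist */
	c->page->inuse++;
	c->page->freelist = object[c->offset];
	slab_unlock(c->page);
	return object;
)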

2007-11-03 19:33:18

by Christoph Lameter

Subject: Re: [PATCH 2/2] slub: fix Objects count

On Sat, 3 Nov 2007, Hugh Dickins wrote:

> I was afraid you might say something like that.
> Perhaps it'll be a patch I need to use in my own builds.
> Though I'd have thought others would want that accuracy too.
> Didn't SLAB give it? (The "r*gr*ss**n" word!)

Slab also only counts objects that are not in the queues. See free_block()
f.e.

We could improve the situation by flushing all cpu slabs before counts are
determined.

Which can be done manually. Run

slabinfo -s

and then look at the numbers.


> > Adds too much overhead
>
> You've come to that conclusion very quickly!

I have just spent a few weeks optimizing the fast and slow paths and there
is some additional overhead that I am still trying to eliminate.

> Any numbers to back it up?

The performance in the fast paths depends on updating only a single word
for an allocation. Adding another counter makes that impossible.

See the recent post on SLUB regression on SMP.
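
(The fast path in question is essentially this - a sketch of the 2.6.24-rc1
slab_alloc() hot path, not verbatim:

	object = c->freelist;
	c->freelist = object[c->offset];	/* the single word update */

and the patch turns that one update into two by adding c->free--.)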

2007-11-03 19:36:19

by Hugh Dickins

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, 3 Nov 2007, Christoph Lameter wrote:
> On Sat, 3 Nov 2007, Hugh Dickins wrote:
> > On Sat, 3 Nov 2007, Christoph Lameter wrote:
> > > On Sat, 3 Nov 2007, Hugh Dickins wrote:
> > >
> > > > Which fixes the leakage: Objects and Partials then remain stable.
> > >
> > > Well this code is just an optimization for a rare case.
> > > Your patch may not handle the debug situation the right way.
> >
> > Oh? How?
>
> If SLAB_DEBUG is set then your patch does not do the proper checks for the
> object, tracing information is not written and the poisoning is not done.
> See alloc_debug_processing().

Yup, you're right, thanks.

I'll follow up on that version I CC'ed to stable,
to stop it and say you'll supply a corrected version.

Hugh

2007-11-03 19:40:48

by Hugh Dickins

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, 3 Nov 2007, Hugh Dickins wrote:
> On Sat, 3 Nov 2007, Hugh Dickins wrote:
> > On Sat, 3 Nov 2007, Olivér Pintér wrote:
> > > Q: Is it needed also for 2.6.22-stable?
>
> Okay, here's a version for 2.6.23 and 2.6.22...
> Christoph, you've now Acked the 2.6.24 one, thanks:
> do you agree this patch below should go to -stable?

Later Christoph noticed that I'm not handling the SlabDebug case right.
So stable should ignore my patch, and he will come up with another.

Hugh

2007-11-03 19:47:51

by Christoph Lameter

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, 3 Nov 2007, Hugh Dickins wrote:

> Later Christoph noticed that I'm not handling the SlabDebug case right.
> So stable should ignore my patch, and he will come up with another.

Hmmm? I thought you wanted to test the patch provided?

2007-11-03 19:53:19

by Hugh Dickins

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, 3 Nov 2007, Christoph Lameter wrote:
> On Sat, 3 Nov 2007, Hugh Dickins wrote:
>
> > Later Christoph noticed that I'm not handling the SlabDebug case right.
> > So stable should ignore my patch, and he will come up with another.
>
> Hmmm? I thought you wanted to test the patch provided?

Yes. Sounds like you see a contradiction there - I don't see it.

Hugh

2007-11-03 19:54:34

by Christoph Lameter

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, 3 Nov 2007, Hugh Dickins wrote:

> On Sat, 3 Nov 2007, Christoph Lameter wrote:
> > On Sat, 3 Nov 2007, Hugh Dickins wrote:
> >
> > > Later Christoph noticed that I'm not handling the SlabDebug case right.
> > > So stable should ignore my patch, and he will come up with another.
> >
> > Hmmm? I thought you wanted to test the patch provided?
>
> Yes. Sounds like you see a contradiction there - I don't see it.

Maybe language issues. I read this as meaning that there is no fix
available yet.

2007-11-03 20:03:56

by Hugh Dickins

Subject: Re: [PATCH 2/2] slub: fix Objects count

On Sat, 3 Nov 2007, Christoph Lameter wrote:
> On Sat, 3 Nov 2007, Hugh Dickins wrote:
>
> > I was afraid you might say something like that.
> > Perhaps it'll be a patch I need to use in my own builds.
> > Though I'd have thought others would want that accuracy too.
> > Didn't SLAB give it? (The "r*gr*ss**n" word!)
>
> Slab also only counts objects that are not in the queues. See free_block()
> f.e.

I'll take your word for it, and apologize for my slur on slub!
(Slub has a great deal to admire in it, I should say.)

>
> We could improve the situation by flushing all cpu slabs before counts are
> determined.
>
> Which can be done manually. Run
>
> slabinfo -s
>
> and then look at the numbers.

Mmm, I'd been doing slabinfo -v sometimes. These are fine in some
situations, but it's always better when the observer can avoid
interfering with the observed. Impossible, we know, but...

Also, many caches too quickly re-equip themselves
with cpu slabs which again obscure the numbers.

> > > Adds too much overhead
> >
> > You've come to that conclusion very quickly!
>
> I have just spent a few weeks optimizing the fast and slow paths and there
> is some additional overhead that I am still trying to eliminate.
>
> > Any numbers to back it up?
>
> The performance in the fast paths depends on updating only a single word
> for an allocation. Adding another counter makes that impossible.

Gosh, that's a tighter corner than any I've been in.

>
> See the recent post on SLUB regression on SMP.

I'll have to read up on that, thanks for the pointer.

Hugh

2007-11-03 20:11:55

by Hugh Dickins

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, 3 Nov 2007, Christoph Lameter wrote:
> On Sat, 3 Nov 2007, Hugh Dickins wrote:
> > On Sat, 3 Nov 2007, Christoph Lameter wrote:
> > > On Sat, 3 Nov 2007, Hugh Dickins wrote:
> > >
> > > > Later Christoph noticed that I'm not handling the SlabDebug case right.
> > > > So stable should ignore my patch, and he will come up with another.
> > >
> > > Hmmm? I thought you wanted to test the patch provided?
> >
> > Yes. Sounds like you see a contradiction there - I don't see it.
>
> Maybe language issues. I read this as meaning that there is no fix
> available yet.

Neither of us has yet posted a correct patch which applies to 2.6.23
and 2.6.22. I'm testing your 2.6.24-rc patch overnight, and if that's
fine then one of us will post the version for -stable. I thought I'd
better leave that to you, after I've reported back.

Hugh

2007-11-03 20:27:11

by Christoph Lameter

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, 3 Nov 2007, Hugh Dickins wrote:

> Neither of us has yet posted a correct patch which applies to 2.6.23
> and 2.6.22. I'm testing your 2.6.24-rc patch overnight, and if that's
> fine then one of us will post the version for -stable. I thought I'd
> better leave that to you, after I've reported back.

Ok.

2007-11-04 11:19:51

by Hugh Dickins

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sat, 3 Nov 2007, Christoph Lameter wrote:
> On Sat, 3 Nov 2007, Hugh Dickins wrote:
>
> > Neither of us has yet posted a correct patch which applies to 2.6.23
> > and 2.6.22. I'm testing your 2.6.24-rc patch overnight, and if that's
> > fine then one of us will post the version for -stable. I thought I'd
> > better leave that to you, after I've reported back.
>
> Ok.

That testing went fine, as you'd expect. Your diffstat is certainly
nicer than mine (corrected for SlabDebug) would be. I expect you'll
go ahead with yours.

But I remain slightly uneasy about it: I do think your original
instinct, in putting in the code you're now removing, was good.

In a low memory situation, when several tasks pile up to allocate
the same resource, we'd usually free back all but the first, rather
than depleting free memory even more than necessary. That you were
doing before, now you take the simpler way out and don't bother.

I've no evidence that this is a significant issue: just mention
it in case it gives you second thoughts e.g. was there a concrete
scenario, other than instinct, which led you to put in that code
originally?

Hugh

2007-11-05 18:45:56

by Christoph Lameter

Subject: Re: [PATCH 1/2] slub: fix leakage

On Sun, 4 Nov 2007, Hugh Dickins wrote:

> That testing went fine, as you'd expect. Your diffstat is certainly
> nicer than mine (corrected for SlabDebug) would be. I expect you'll
> go ahead with yours.

Thanks for testing.

> But I remain slightly uneasy about it: I do think your original
> instinct, in putting in the code you're now removing, was good.

Well yes. I tend to think too much about performance.

> In a low memory situation, when several tasks pile up to allocate
> the same resource, we'd usually free back all but the first, rather
> than depleting free memory even more than necessary. That you were
> doing before, now you take the simpler way out and don't bother.

Hmmm... But even without the patch: All tasks had to allocate their
own slabs via the page allocator first. Most of those were then thrown
away immediately. Now we are flushing the current cpu slab. Which means
that this is also going back to the page allocator if it's empty. It is
likely that the push back in the situation you mention will put a slab
with only one object allocated onto the partial lists. This can have two
beneficial effects:

1. We can avoid going back to the page allocator for a while since we will
find the almost free slab if the current slab is exhausted.

2. If the object that was allocated in the flushed slab was a short lived
use and is freed soon, then the slab will go back to the page allocator
very fast.

> I've no evidence that this is a significant issue: just mention
> it in case it gives you second thoughts e.g. was there a concrete
> scenario, other than instinct, which led you to put in that code
> originally?

The intent was to use objects that were cache hot as much as possible. Use
of the newly allocated slab means we are likely accessing a cache cold
page.

However, given that it took us pretty long to find that issue, I would
think that this is not that important a code path. So the removal
seems to be the right way to go.

2007-11-05 19:10:49

by Hugh Dickins

Subject: Re: [PATCH 1/2] slub: fix leakage

On Mon, 5 Nov 2007, Christoph Lameter wrote:
> On Sun, 4 Nov 2007, Hugh Dickins wrote:
>
> > In a low memory situation, when several tasks pile up to allocate
> > the same resource, we'd usually free back all but the first, rather
> > than depleting free memory even more than necessary. That you were
> > doing before, now you take the simpler way out and don't bother.
>
> Hmmm... But even without the patch: All tasks had to allocate their
> own slabs via the page allocator first. Most of those were then thrown
> away immediately. Now we are flushing the current cpu slab. Which means
> that this is also going back to the page allocator if it's empty.

Perfectly possible, but not the likely case.

> It is
> likely that the push back in the situation you mention will put a slab
> with only one object allocated onto the partial lists. This can have two
> beneficial effects:
>
> 1. We can avoid going back to the page allocator for a while since we will
> find the almost free slab if the current slab is exhausted.

Well, yes, but we don't usually make that argument for allocating
more than we need - especially not when memory is low ;)

>
> 2. If the object that was allocated in the flushed slab was a short lived
> use and is freed soon, then the slab will go back to the page allocator
> very fast.
>
> > I've no evidence that this is a significant issue: just mention
> > it in case it gives you second thoughts e.g. was there a concrete
> > scenario, other than instinct, which led you to put in that code
> > originally?
>
> The intent was to use objects that were cache hot as much as possible. Use
> of the newly allocated slab means we are likely accessing a cache cold
> page.

Ah yes. And that's certainly no argument for retaining the code,
I'm sure it's not a case we need to optimize for.

>
> However, given that it took us pretty long to find that issue I would
> think that this is not that of an important code path. So the removal
> seems to be the right way to go.

Okay, I wanted to make the point, but I've no wish to hold up your fix
(and removing code, particularly code that has given trouble, is always
welcome). Please go ahead - thanks.

Hugh

2007-11-05 19:15:59

by Christoph Lameter

Subject: Re: [PATCH 1/2] slub: fix leakage

On Mon, 5 Nov 2007, Hugh Dickins wrote:

> Okay, I wanted to make the point, but I've no wish to hold up your fix
> (and removing code, particularly code that has given trouble, is always
> welcome). Please go ahead - thanks.

Here is the fix against 2.6.23:

SLUB: Fix memory leak by not reusing cpu_slab

Fix the memory leak that may occur when we attempt to reuse a cpu_slab
that was allocated while we reenabled interrupts in order to be able to
grow a slab cache. The per cpu freelist may contain objects and in that
situation we may overwrite the per cpu freelist pointer, losing objects.
This only occurs if we find that the concurrently allocated slab fits
our allocation needs.

If we simply always deactivate the slab then the freelist will be properly
reintegrated and the memory leak will go away.

Signed-off-by: Christoph Lameter <[email protected]>

---
mm/slub.c | 20 +-------------------
1 file changed, 1 insertion(+), 19 deletions(-)

Index: linux-2.6.23/mm/slub.c
===================================================================
--- linux-2.6.23.orig/mm/slub.c 2007-10-09 13:31:38.000000000 -0700
+++ linux-2.6.23/mm/slub.c 2007-11-05 11:09:49.000000000 -0800
@@ -1501,28 +1501,8 @@ new_slab:
 	page = new_slab(s, gfpflags, node);
 	if (page) {
 		cpu = smp_processor_id();
-		if (s->cpu_slab[cpu]) {
-			/*
-			 * Someone else populated the cpu_slab while we
-			 * enabled interrupts, or we have gotten scheduled
-			 * on another cpu. The page may not be on the
-			 * requested node even if __GFP_THISNODE was
-			 * specified. So we need to recheck.
-			 */
-			if (node == -1 ||
-				page_to_nid(s->cpu_slab[cpu]) == node) {
-				/*
-				 * Current cpuslab is acceptable and we
-				 * want the current one since its cache hot
-				 */
-				discard_slab(s, page);
-				page = s->cpu_slab[cpu];
-				slab_lock(page);
-				goto load_freelist;
-			}
-			/* New slab does not fit our expectations */
+		if (s->cpu_slab[cpu])
 			flush_slab(s, s->cpu_slab[cpu], cpu);
-		}
 		slab_lock(page);
 		SetSlabFrozen(page);
 		s->cpu_slab[cpu] = page;

2007-11-05 19:24:06

by Christoph Lameter

Subject: Re: [PATCH 1/2] slub: fix leakage

Fix against 2.6.22 (the .23 fix also applies with some offsets)


SLUB: Fix memory leak by not reusing cpu_slab

Fix the memory leak that may occur when we attempt to reuse a cpu_slab
that was allocated while we reenabled interrupts in order to be able to
grow a slab cache. The per cpu freelist may contain objects and in that
situation we may overwrite the per cpu freelist pointer, losing objects.
This only occurs if we find that the concurrently allocated slab fits
our allocation needs.

If we simply always deactivate the slab then the freelist will be properly
reintegrated and the memory leak will go away.

Signed-off-by: Christoph Lameter <[email protected]>

---
mm/slub.c | 20 +-------------------
1 file changed, 1 insertion(+), 19 deletions(-)

Index: linux-2.6.22/mm/slub.c
===================================================================
--- linux-2.6.22.orig/mm/slub.c 2007-07-08 16:32:17.000000000 -0700
+++ linux-2.6.22/mm/slub.c 2007-11-05 11:19:15.000000000 -0800
@@ -1431,28 +1431,8 @@ new_slab:
 	page = new_slab(s, gfpflags, node);
 	if (page) {
 		cpu = smp_processor_id();
-		if (s->cpu_slab[cpu]) {
-			/*
-			 * Someone else populated the cpu_slab while we
-			 * enabled interrupts, or we have gotten scheduled
-			 * on another cpu. The page may not be on the
-			 * requested node even if __GFP_THISNODE was
-			 * specified. So we need to recheck.
-			 */
-			if (node == -1 ||
-				page_to_nid(s->cpu_slab[cpu]) == node) {
-				/*
-				 * Current cpuslab is acceptable and we
-				 * want the current one since its cache hot
-				 */
-				discard_slab(s, page);
-				page = s->cpu_slab[cpu];
-				slab_lock(page);
-				goto load_freelist;
-			}
-			/* New slab does not fit our expectations */
+		if (s->cpu_slab[cpu])
 			flush_slab(s, s->cpu_slab[cpu], cpu);
-		}
 		slab_lock(page);
 		SetSlabFrozen(page);
 		s->cpu_slab[cpu] = page;

2007-11-05 19:27:18

by Linus Torvalds

Subject: Re: [PATCH 1/2] slub: fix leakage



On Mon, 5 Nov 2007, Christoph Lameter wrote:
>
> Here is the fix against 2.6.23:

Will you send the equivalent one for the current -git tree?

Linus

2007-11-05 19:32:15

by Christoph Lameter

Subject: Re: [PATCH 1/2] slub: fix leakage

On Mon, 5 Nov 2007, Linus Torvalds wrote:

> > Here is the fix against 2.6.23:
>
> Will you send the equivalent one for the current -git tree?

I already sent that on Saturday (also sent a version against mm which is
a bit different now). Here is the version against git again:


SLUB: Fix memory leak by not reusing cpu_slab

Fix the memory leak that may occur when we attempt to reuse a cpu_slab
that was allocated while we reenabled interrupts in order to be able to
grow a slab cache. The per cpu freelist may contain objects and in that
situation we may overwrite the per cpu freelist pointer, losing objects.
This only occurs if we find that the concurrently allocated slab fits
our allocation needs.

If we simply always deactivate the slab then the freelist will be properly
reintegrated and the memory leak will go away.

Signed-off-by: Christoph Lameter <[email protected]>

---
mm/slub.c | 20 +-------------------
1 file changed, 1 insertion(+), 19 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2007-11-03 11:41:58.000000000 -0700
+++ linux-2.6/mm/slub.c 2007-11-03 11:42:12.000000000 -0700
@@ -1511,26 +1511,8 @@ new_slab:

 	if (new) {
 		c = get_cpu_slab(s, smp_processor_id());
-		if (c->page) {
-			/*
-			 * Someone else populated the cpu_slab while we
-			 * enabled interrupts, or we have gotten scheduled
-			 * on another cpu. The page may not be on the
-			 * requested node even if __GFP_THISNODE was
-			 * specified. So we need to recheck.
-			 */
-			if (node_match(c, node)) {
-				/*
-				 * Current cpuslab is acceptable and we
-				 * want the current one since its cache hot
-				 */
-				discard_slab(s, new);
-				slab_lock(c->page);
-				goto load_freelist;
-			}
-			/* New slab does not fit our expectations */
+		if (c->page)
 			flush_slab(s, c);
-		}
 		slab_lock(new);
 		SetSlabFrozen(new);
 		c->page = new;

2007-11-05 19:37:26

by Linus Torvalds

Subject: Re: [PATCH 1/2] slub: fix leakage



On Mon, 5 Nov 2007, Christoph Lameter wrote:
>
> > Will you send the equivalent one for the current -git tree?
>
> I already sent that on Saturday (also sent a version against mm which is
> a bit different now).

With all the back-and-forth with you and Hugh, I wanted to make sure I
knew what to apply ;)

> Here is the version against git again:

Thanks,

Linus

2007-11-06 06:16:08

by Jeff Chua

Subject: Re: [PATCH 1/2] slub: fix leakage

On Nov 6, 2007 2:45 AM, Christoph Lameter <[email protected]> wrote:

> 1. We can avoid going back to the page allocator for a while since we will
> find the almost free slab if the current slab is exhausted.

Does this impact SLAB as well? I'm running out of memory with kernels
2.6.21, 2.6.22 and 2.6.23, so I'm stuck with 2.6.20-15 for systems
running Oracle. It only happens under lots of activity.

Jeff.

2007-11-06 07:59:37

by Hugh Dickins

Subject: Re: [PATCH 1/2] slub: fix leakage

On Tue, 6 Nov 2007, Jeff Chua wrote:
> On Nov 6, 2007 2:45 AM, Christoph Lameter <[email protected]> wrote:
>
> > 1. We can avoid going back to the page allocator for a while since we will
> > find the almost free slab if the current slab is exhausted.
>
> Does this impact SLAB as well?

No, it's in code peculiar to SLUB.

Of course, there might be a similar bug somewhere in SLAB.
More likely to be something else though.

> I'm getting out of memory with kernel
> 2.6.21, 2.6.22 and 2.6.23, so I'm stuck with 2.6.20-15 for systems
> running Oracle. It only happens with lots of activities.

It would help everybody if you could get more info on this.
Give 2.6.24-rc2 a try when it appears, if you can.

Do you think it's the SLAB which is growing? If /proc/meminfo's
Slab count goes up and up, then use /proc/slabinfo to see what
it is that's leaking. Then there's CONFIG_DEBUG_SLAB_LEAK.
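
For example, something like:

	watch -d 'grep Slab /proc/meminfo'
	sort -rn -k2 /proc/slabinfo | head	# biggest active_objs first

(the exact columns depend on your slabinfo version).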

With 2.6.22 or 2.6.23 you could CONFIG_SLUB instead, apply
the leak fix from this thread, and see if SLUB behaves the same
(using Documentation/vm/slabinfo.c instead of /proc/slabinfo); but
chances are the problem's above it, and it will behave the same.

Best start a new thread when you can report back.

Thanks,
Hugh