2011-03-11 23:05:10

by Vince Weaver

[permalink] [raw]
Subject: perf: kernel memory leak when inherit enabled

Hello

While trying to use perf events with inherit enabled to profile some
multi-threaded BLAS routines (using PAPI) I ended up out-of-memorying my
machine. It turns out you can quickly leak gigabytes of kernel memory
that isn't freed when the process exits.

I've attached a test case that reproduces this.

It leaks on 2.6.37 and 2.6.35.9. It does not on 2.6.32.

I would like to try on current git and maybe even bisect, but my devel
machine currently is broken with the ".size expression does not evaluate
to a constant" binutils nonsense.

Unfortunately I won't have much chance to test any more until Monday, but
I thought I'd send this in case it's something obvious that can be easily
reproduced and fixed.

Thanks,

Vince
[email protected]


Attachments:
pe_inherit_memleak.c (1.71 kB)

2011-03-14 22:27:43

by Vince Weaver

[permalink] [raw]
Subject: Re: perf: kernel memory leak when inherit enabled

On Fri, 11 Mar 2011, Vince Weaver wrote:
>
> While trying to use perf events with inherit enabled to profile some
> multi-threaded BLAS routines (using PAPI) I ended up out-of-memorying my
> machine. It turns out you can quickly leak gigabytes of kernel memory
> that isn't freed when the process exits.

I've bisected this. There's a whole day I'll never see again. binutils
2.21 and gcc-4.5 for the lose :(

Anyway this memory leak with inherit was introduced in
4fd38e4595e

Vince

2011-03-14 22:32:56

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: perf: kernel memory leak when inherit enabled

Em Mon, Mar 14, 2011 at 06:27:19PM -0400, Vince Weaver escreveu:
> On Fri, 11 Mar 2011, Vince Weaver wrote:
> >
> > While trying to use perf events with inherit enabled to profile some
> > multi-threaded BLAS routines (using PAPI) I ended up out-of-memorying my
> > machine. It turns out you can quickly leak gigabytes of kernel memory
> > that isn't freed when the process exits.
>
> I've bisected this. There's a whole day I'll never see again. binutils
> 2.21 and gcc-4.5 for the lose :(
>
> Anyway this memory leak with inherit was introduced in
> 4fd38e4595e


commit 4fd38e4595e2f6c9d27732c042a0e16b2753049c
Author: Peter Zijlstra <[email protected]>
Date: Thu May 6 17:31:38 2010 +0200

perf: Fix exit() vs PERF_FORMAT_GROUP

Both Stephane and Corey reported that PERF_FORMAT_GROUP didn't work
as expected if the task the counters were attached to quit before
the read() call.

The cause is that we unconditionally destroy the grouping when we
remove counters from their context. Fix this by only doing this when
we free the counter itself.

Reported-by: Corey Ashford <[email protected]>
Reported-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <1273160566.5605.404.camel@twins>
Signed-off-by: Ingo Molnar <[email protected]>

- Arnaldo

2011-03-15 00:28:26

by Vince Weaver

[permalink] [raw]
Subject: Re: perf: kernel memory leak when inherit enabled

On Mon, 14 Mar 2011, Arnaldo Carvalho de Melo wrote:

> Em Mon, Mar 14, 2011 at 06:27:19PM -0400, Vince Weaver escreveu:
> > On Fri, 11 Mar 2011, Vince Weaver wrote:
> > >
> > > While trying to use perf events with inherit enabled to profile some
> > > multi-threaded BLAS routines (using PAPI) I ended up out-of-memorying my
> > > machine. It turns out you can quickly leak gigabytes of kernel memory
> > > that isn't freed when the process exits.
> >
> > I've bisected this. There's a whole day I'll never see again. binutils
> > 2.21 and gcc-4.5 for the lose :(
> >
> > Anyway this memory leak with inherit was introduced in
> > 4fd38e4595e
>
>
> commit 4fd38e4595e2f6c9d27732c042a0e16b2753049c
> Author: Peter Zijlstra <[email protected]>
> Date: Thu May 6 17:31:38 2010 +0200
>
> perf: Fix exit() vs PERF_FORMAT_GROUP

This changeset was reverted already in
e3174cfd2a1e28fff774681f00a0eef3d31da970
yet somehow that didn't fix the inherit mem-leak.

Vince
[email protected]

2011-03-15 02:26:46

by Vince Weaver

[permalink] [raw]
Subject: Re: perf: kernel memory leak when inherit enabled

On Mon, 14 Mar 2011, Vince Weaver wrote:

> On Mon, 14 Mar 2011, Arnaldo Carvalho de Melo wrote:
>
> > Em Mon, Mar 14, 2011 at 06:27:19PM -0400, Vince Weaver escreveu:
> > > On Fri, 11 Mar 2011, Vince Weaver wrote:
> > > >
> > > > While trying to use perf events with inherit enabled to profile some
> > > > multi-threaded BLAS routines (using PAPI) I ended up out-of-memorying my
> > > > machine. It turns out you can quickly leak gigabytes of kernel memory
> > > > that isn't freed when the process exits.
> > >
> > > I've bisected this. There's a whole day I'll never see again. binutils
> > > 2.21 and gcc-4.5 for the lose :(
> > >
> > > Anyway this memory leak with inherit was introduced in
> > > 4fd38e4595e
>
> This changeset was reverted already in
> e3174cfd2a1e28fff774681f00a0eef3d31da970
> yet somehow that didn't fix the inherit mem-leak.

I see, a new fix was made immediately after the revert,
050735b08ca8a016bbace4445fa025b88fee770b
which probably immediately re-introduced the problem, which is why
git bisect didn't catch this. I'm away from my test machine so I'll
have to wait until tomorrow before I can investigate more.

Vince
[email protected]

2011-03-15 09:08:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: perf: kernel memory leak when inherit enabled

On Mon, 2011-03-14 at 18:27 -0400, Vince Weaver wrote:
> On Fri, 11 Mar 2011, Vince Weaver wrote:
> >
> > While trying to use perf events with inherit enabled to profile some
> > multi-threaded BLAS routines (using PAPI) I ended up out-of-memorying my
> > machine. It turns out you can quickly leak gigabytes of kernel memory
> > that isn't freed when the process exits.
>
> I've bisected this. There's a whole day I'll never see again. binutils
> 2.21 and gcc-4.5 for the lose :(
>
> Anyway this memory leak with inherit was introduced in
> 4fd38e4595e

Thanks, managed to get some time yesterday and got as far as to see that
our perf_event_context refcounting is indeed screwy, but didn't get
around to actually catching the culprit.

2011-03-15 13:41:33

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH] perf: Fix tear-down of inherited group events

Subject: perf: Fix tear-down of inherited group events
From: Peter Zijlstra <[email protected]>
Date: Tue Mar 15 14:37:10 CET 2011

When destroying inherited events, we need to destroy groups too,
otherwise the event iteration in perf_event_exit_task_context() will
miss group siblings and we leak events with all the consequences.

Reported-by: Vince Weaver <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: [email protected] # .35+
---
kernel/perf_event.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -6722,17 +6722,20 @@ __perf_event_exit_task(struct perf_event
struct perf_event_context *child_ctx,
struct task_struct *child)
{
- struct perf_event *parent_event;
+ if (child_event->parent) {
+ raw_spin_lock_irq(&child_ctx->lock);
+ perf_group_detach(child_event);
+ raw_spin_unlock_irq(&child_ctx->lock);
+ }

perf_remove_from_context(child_event);

- parent_event = child_event->parent;
/*
- * It can happen that parent exits first, and has events
+ * It can happen that the parent exits first, and has events
* that are still around due to the child reference. These
- * events need to be zapped - but otherwise linger.
+ * events need to be zapped.
*/
- if (parent_event) {
+ if (child_event->parent) {
sync_child_event(child_event, child);
free_event(child_event);
}

2011-03-15 16:09:44

by Vince Weaver

[permalink] [raw]
Subject: Re: [PATCH] perf: Fix tear-down of inherited group events

On Tue, 15 Mar 2011, Peter Zijlstra wrote:

> Subject: perf: Fix tear-down of inherited group events
> From: Peter Zijlstra <[email protected]>
> Date: Tue Mar 15 14:37:10 CET 2011
>
> When destroying inherited events, we need to destroy groups too,
> otherwise the event iteration in perf_event_exit_task_context() will
> miss group siblings and we leak events with all the consequences.

Thanks for the fix! I can verify that when applied against current
linus-git kernel that my original test case no longer leaks memory.

I've also run the full PAPI regression tests, plus the BLAS/PAPI benchmark
code that originally showed the problem and everything checks out fine.

It's a shame this fix didn't make it in before 2.6.38.

Tested-by: Vince Weaver <[email protected]>

Vince
[email protected]

2011-03-15 16:41:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf: Fix tear-down of inherited group events

On Tue, 2011-03-15 at 12:09 -0400, Vince Weaver wrote:
> On Tue, 15 Mar 2011, Peter Zijlstra wrote:
>
> > Subject: perf: Fix tear-down of inherited group events
> > From: Peter Zijlstra <[email protected]>
> > Date: Tue Mar 15 14:37:10 CET 2011
> >
> > When destroying inherited events, we need to destroy groups too,
> > otherwise the event iteration in perf_event_exit_task_context() will
> > miss group siblings and we leak events with all the consequences.
>
> Thanks for the fix! I can verify that when applied against current
> linus-git kernel that my original test case no longer leaks memory.
>
> I've also run the full PAPI regression tests, plus the BLAS/PAPI benchmark
> code that originally showed the problem and everything checks out fine.
>
> It's a shame this fix didn't make it in before 2.6.38.

It sure is, sadly there's still only 24h in a day, less if you consider
this .jp earthquake speeding up our planets spin :-)

> Tested-by: Vince Weaver <[email protected]>

Thanks Vince!

2011-03-16 14:00:06

by Peter Zijlstra

[permalink] [raw]
Subject: [tip:perf/urgent] perf: Fix tear-down of inherited group events

Commit-ID: 38b435b16c36b0d863efcf3f07b34a6fac9873fd
Gitweb: http://git.kernel.org/tip/38b435b16c36b0d863efcf3f07b34a6fac9873fd
Author: Peter Zijlstra <[email protected]>
AuthorDate: Tue, 15 Mar 2011 14:37:10 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 16 Mar 2011 14:04:14 +0100

perf: Fix tear-down of inherited group events

When destroying inherited events, we need to destroy groups too,
otherwise the event iteration in perf_event_exit_task_context() will
miss group siblings and we leak events with all the consequences.

Reported-and-tested-by: Vince Weaver <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: <[email protected]> # .35+
LKML-Reference: <1300196470.2203.61.camel@twins>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/perf_event.c | 13 ++++++++-----
1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 533f715..3472bb1 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -6722,17 +6722,20 @@ __perf_event_exit_task(struct perf_event *child_event,
struct perf_event_context *child_ctx,
struct task_struct *child)
{
- struct perf_event *parent_event;
+ if (child_event->parent) {
+ raw_spin_lock_irq(&child_ctx->lock);
+ perf_group_detach(child_event);
+ raw_spin_unlock_irq(&child_ctx->lock);
+ }

perf_remove_from_context(child_event);

- parent_event = child_event->parent;
/*
- * It can happen that parent exits first, and has events
+ * It can happen that the parent exits first, and has events
* that are still around due to the child reference. These
- * events need to be zapped - but otherwise linger.
+ * events need to be zapped.
*/
- if (parent_event) {
+ if (child_event->parent) {
sync_child_event(child_event, child);
free_event(child_event);
}