2021-10-18 07:59:36

by quanyang wang

[permalink] [raw]
Subject: [V2][PATCH] cgroup: fix memory leak caused by missing cgroup_bpf_offline

From: Quanyang Wang <[email protected]>

When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running
the command as below:

$mount -t cgroup -o none,name=foo cgroup cgroup/
$umount cgroup/

unreferenced object 0xc3585c40 (size 64):
comm "mount", pid 425, jiffies 4294959825 (age 31.990s)
hex dump (first 32 bytes):
01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(.........
00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC......
backtrace:
[<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c
[<1f03679c>] cgroup_setup_root+0x174/0x37c
[<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0
[<f85b12fd>] vfs_get_tree+0x24/0x108
[<f55aec5c>] path_mount+0x384/0x988
[<e2d5e9cd>] do_mount+0x64/0x9c
[<208c9cfe>] sys_mount+0xfc/0x1f4
[<06dd06e0>] ret_fast_syscall+0x0/0x48
[<a8308cb3>] 0xbeb4daa8

This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce
memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data
is allocated by the function percpu_ref_init in cgroup_bpf_inherit which
is called by cgroup_setup_root when mounting, but not freed along with
root_cgrp when umounting. Adding cgroup_bpf_offline which calls
percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in
umount path.

This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime
of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a
cleanup that frees the resources which are allocated by cgroup_bpf_inherit
in cgroup_setup_root.

And inside cgroup_bpf_offline, cgroup_get() is at the beginning and
cgroup_put is at the end of cgroup_bpf_release which is called by
cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of
cgroup's refcount.

Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
Signed-off-by: Quanyang Wang <[email protected]>
---
V1 ---> V2:
1. As per Daniel's suggestion, add description to commit msg about the
balance of cgroup's refcount in cgroup_bpf_offline.
2. As per Michal's suggestion, add tag "Fixes: 4bfc0bb2c60e" and add
description about it.
3. Fix indentation on the percpu_ref_is_dying line.
---
kernel/cgroup/cgroup.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 570b0c97392a..ea08f01d0111 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2187,8 +2187,10 @@ static void cgroup_kill_sb(struct super_block *sb)
* And don't kill the default root.
*/
if (list_empty(&root->cgrp.self.children) && root != &cgrp_dfl_root &&
- !percpu_ref_is_dying(&root->cgrp.self.refcnt))
+ !percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
+ cgroup_bpf_offline(&root->cgrp);
percpu_ref_kill(&root->cgrp.self.refcnt);
+ }
cgroup_put(&root->cgrp);
kernfs_kill_sb(sb);
}
--
2.25.1


2021-10-18 09:04:09

by Ming Lei

[permalink] [raw]
Subject: Re: [V2][PATCH] cgroup: fix memory leak caused by missing cgroup_bpf_offline

On Mon, Oct 18, 2021 at 03:56:23PM +0800, [email protected] wrote:
> From: Quanyang Wang <[email protected]>
>
> When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running
> the command as below:
>
> $mount -t cgroup -o none,name=foo cgroup cgroup/
> $umount cgroup/
>
> unreferenced object 0xc3585c40 (size 64):
> comm "mount", pid 425, jiffies 4294959825 (age 31.990s)
> hex dump (first 32 bytes):
> 01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(.........
> 00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC......
> backtrace:
> [<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c
> [<1f03679c>] cgroup_setup_root+0x174/0x37c
> [<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0
> [<f85b12fd>] vfs_get_tree+0x24/0x108
> [<f55aec5c>] path_mount+0x384/0x988
> [<e2d5e9cd>] do_mount+0x64/0x9c
> [<208c9cfe>] sys_mount+0xfc/0x1f4
> [<06dd06e0>] ret_fast_syscall+0x0/0x48
> [<a8308cb3>] 0xbeb4daa8
>
> This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce
> memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data
> is allocated by the function percpu_ref_init in cgroup_bpf_inherit which
> is called by cgroup_setup_root when mounting, but not freed along with
> root_cgrp when umounting. Adding cgroup_bpf_offline which calls
> percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in
> umount path.
>
> This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime
> of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a
> cleanup that frees the resources which are allocated by cgroup_bpf_inherit
> in cgroup_setup_root.
>
> And inside cgroup_bpf_offline, cgroup_get() is at the beginning and
> cgroup_put is at the end of cgroup_bpf_release which is called by
> cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of
> cgroup's refcount.
>
> Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path")

If I understand correctly, cgroup_bpf_release() won't be called without
your patch. So anything allocated in cgroup_bpf_inherit() will be
leaked?

If that is true, 'Fixes: 2b0d3d3e4fcf' looks misleading, cause people has to
backport your patch if 4bfc0bb2c60e is applied. Meantime, this fix isn't
needed if 4bfc0bb2c60e isn't merged.


Thanks,
Ming

2021-10-18 10:08:30

by quanyang wang

[permalink] [raw]
Subject: Re: [V2][PATCH] cgroup: fix memory leak caused by missing cgroup_bpf_offline

Hi Ming,

On 10/18/21 5:02 PM, Ming Lei wrote:
> On Mon, Oct 18, 2021 at 03:56:23PM +0800, [email protected] wrote:
>> From: Quanyang Wang <[email protected]>
>>
>> When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running
>> the command as below:
>>
>> $mount -t cgroup -o none,name=foo cgroup cgroup/
>> $umount cgroup/
>>
>> unreferenced object 0xc3585c40 (size 64):
>> comm "mount", pid 425, jiffies 4294959825 (age 31.990s)
>> hex dump (first 32 bytes):
>> 01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(.........
>> 00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC......
>> backtrace:
>> [<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c
>> [<1f03679c>] cgroup_setup_root+0x174/0x37c
>> [<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0
>> [<f85b12fd>] vfs_get_tree+0x24/0x108
>> [<f55aec5c>] path_mount+0x384/0x988
>> [<e2d5e9cd>] do_mount+0x64/0x9c
>> [<208c9cfe>] sys_mount+0xfc/0x1f4
>> [<06dd06e0>] ret_fast_syscall+0x0/0x48
>> [<a8308cb3>] 0xbeb4daa8
>>
>> This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce
>> memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data
>> is allocated by the function percpu_ref_init in cgroup_bpf_inherit which
>> is called by cgroup_setup_root when mounting, but not freed along with
>> root_cgrp when umounting. Adding cgroup_bpf_offline which calls
>> percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in
>> umount path.
>>
>> This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime
>> of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a
>> cleanup that frees the resources which are allocated by cgroup_bpf_inherit
>> in cgroup_setup_root.
>>
>> And inside cgroup_bpf_offline, cgroup_get() is at the beginning and
>> cgroup_put is at the end of cgroup_bpf_release which is called by
>> cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of
>> cgroup's refcount.
>>
>> Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
>
> If I understand correctly, cgroup_bpf_release() won't be called without
> your patch. So anything allocated in cgroup_bpf_inherit() will be
> leaked?
No, for now cgroup_bpf_release is called to release bpf.refcnt.data of
the cgroup which is not root_cgroup. Only root_cgroup's bpf data is leaked.

For non-root cgroup:
cgroup_mkdir
-> cgroup_create
-->cgroup_bpf_inherit(cgrp_A)
cgroup_rmdir
->cgroup_destroy_locked()
-->cgroup_bpf_offline(cgrp_A)
So for non-root cgroup, there is no memory leak.


For root cgroup:
cgroup_setup_root
->cgroup_bpf_inherit(root_cgrp)
cgroup_kill_sb:
-> (Here should be call cgroup_bpf_offline, or else leak occurs)

Thanks,
Quanyang


>
> If that is true, 'Fixes: 2b0d3d3e4fcf' looks misleading, cause people has to
> backport your patch if 4bfc0bb2c60e is applied. Meantime, this fix isn't
> needed if 4bfc0bb2c60e isn't merged.
>
>
> Thanks,
> Ming
>

2021-10-18 13:02:35

by Ming Lei

[permalink] [raw]
Subject: Re: [V2][PATCH] cgroup: fix memory leak caused by missing cgroup_bpf_offline

On Mon, Oct 18, 2021 at 06:06:28PM +0800, Quanyang Wang wrote:
> Hi Ming,
>
> On 10/18/21 5:02 PM, Ming Lei wrote:
> > On Mon, Oct 18, 2021 at 03:56:23PM +0800, [email protected] wrote:
> > > From: Quanyang Wang <[email protected]>
> > >
> > > When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running
> > > the command as below:
> > >
> > > $mount -t cgroup -o none,name=foo cgroup cgroup/
> > > $umount cgroup/
> > >
> > > unreferenced object 0xc3585c40 (size 64):
> > > comm "mount", pid 425, jiffies 4294959825 (age 31.990s)
> > > hex dump (first 32 bytes):
> > > 01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(.........
> > > 00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC......
> > > backtrace:
> > > [<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c
> > > [<1f03679c>] cgroup_setup_root+0x174/0x37c
> > > [<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0
> > > [<f85b12fd>] vfs_get_tree+0x24/0x108
> > > [<f55aec5c>] path_mount+0x384/0x988
> > > [<e2d5e9cd>] do_mount+0x64/0x9c
> > > [<208c9cfe>] sys_mount+0xfc/0x1f4
> > > [<06dd06e0>] ret_fast_syscall+0x0/0x48
> > > [<a8308cb3>] 0xbeb4daa8
> > >
> > > This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce
> > > memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data
> > > is allocated by the function percpu_ref_init in cgroup_bpf_inherit which
> > > is called by cgroup_setup_root when mounting, but not freed along with
> > > root_cgrp when umounting. Adding cgroup_bpf_offline which calls
> > > percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in
> > > umount path.
> > >
> > > This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime
> > > of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a
> > > cleanup that frees the resources which are allocated by cgroup_bpf_inherit
> > > in cgroup_setup_root.
> > >
> > > And inside cgroup_bpf_offline, cgroup_get() is at the beginning and
> > > cgroup_put is at the end of cgroup_bpf_release which is called by
> > > cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of
> > > cgroup's refcount.
> > >
> > > Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
> >
> > If I understand correctly, cgroup_bpf_release() won't be called without
> > your patch. So anything allocated in cgroup_bpf_inherit() will be
> > leaked?
> No, for now cgroup_bpf_release is called to release bpf.refcnt.data of the
> cgroup which is not root_cgroup. Only root_cgroup's bpf data is leaked.

You mean that cgroup_bpf_inherit() allocates nothing for root_cgroup?

If yes, I agree you can add 'Fixes: 2b0d3d3e4fcf', otherwise please
remove it.


Thanks,
Ming

2021-10-19 10:43:41

by quanyang wang

[permalink] [raw]
Subject: Re: [V2][PATCH] cgroup: fix memory leak caused by missing cgroup_bpf_offline

Hi Ming,

On 10/18/21 8:59 PM, Ming Lei wrote:
> On Mon, Oct 18, 2021 at 06:06:28PM +0800, Quanyang Wang wrote:
>> Hi Ming,
>>
>> On 10/18/21 5:02 PM, Ming Lei wrote:
>>> On Mon, Oct 18, 2021 at 03:56:23PM +0800, [email protected] wrote:
>>>> From: Quanyang Wang <[email protected]>
>>>>
>>>> When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running
>>>> the command as below:
>>>>
>>>> $mount -t cgroup -o none,name=foo cgroup cgroup/
>>>> $umount cgroup/
>>>>
>>>> unreferenced object 0xc3585c40 (size 64):
>>>> comm "mount", pid 425, jiffies 4294959825 (age 31.990s)
>>>> hex dump (first 32 bytes):
>>>> 01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(.........
>>>> 00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC......
>>>> backtrace:
>>>> [<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c
>>>> [<1f03679c>] cgroup_setup_root+0x174/0x37c
>>>> [<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0
>>>> [<f85b12fd>] vfs_get_tree+0x24/0x108
>>>> [<f55aec5c>] path_mount+0x384/0x988
>>>> [<e2d5e9cd>] do_mount+0x64/0x9c
>>>> [<208c9cfe>] sys_mount+0xfc/0x1f4
>>>> [<06dd06e0>] ret_fast_syscall+0x0/0x48
>>>> [<a8308cb3>] 0xbeb4daa8
>>>>
>>>> This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce
>>>> memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data
>>>> is allocated by the function percpu_ref_init in cgroup_bpf_inherit which
>>>> is called by cgroup_setup_root when mounting, but not freed along with
>>>> root_cgrp when umounting. Adding cgroup_bpf_offline which calls
>>>> percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in
>>>> umount path.
>>>>
>>>> This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime
>>>> of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a
>>>> cleanup that frees the resources which are allocated by cgroup_bpf_inherit
>>>> in cgroup_setup_root.
>>>>
>>>> And inside cgroup_bpf_offline, cgroup_get() is at the beginning and
>>>> cgroup_put is at the end of cgroup_bpf_release which is called by
>>>> cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of
>>>> cgroup's refcount.
>>>>
>>>> Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
>>>
>>> If I understand correctly, cgroup_bpf_release() won't be called without
>>> your patch. So anything allocated in cgroup_bpf_inherit() will be
>>> leaked?
>> No, for now cgroup_bpf_release is called to release bpf.refcnt.data of the
>> cgroup which is not root_cgroup. Only root_cgroup's bpf data is leaked.
>
> You mean that cgroup_bpf_inherit() allocates nothing for root_cgroup?
Yes, cgroup_bpf_inherit allocates something for root_cgroup.

The earlier commit 4bfc0bb2c60e ("bpf: decouple the lifetime of
cgroup_bpf from cgroup itself") introduces an imbalance that call
cgroup_bpf_inherit(&root_cgroup) but not call
cgroup_bpf_offline(&root_cgroup). But there was no memory leak here.

When the commit 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of
percpu_ref in fast path") applies, some data is allocated for
root_cgroup and not released with root_cgroup, and memory leak is observed.

So I add 2 "Fixes tags" here to indicate that 2 commits introduce two
different issues.

But it seems that 2 "Fixes tags" is misleading now.
So maybe just fix earlier commit 4bfc0bb2c60e which introduces imbalance?

Thanks,
Quanyang
>
> If yes, I agree you can add 'Fixes: 2b0d3d3e4fcf', otherwise please
> remove it.
>
>
> Thanks,
> Ming
>

2021-10-19 10:45:12

by quanyang wang

[permalink] [raw]
Subject: Re: [V2][PATCH] cgroup: fix memory leak caused by missing cgroup_bpf_offline

Hi Ming,

On 10/18/21 8:59 PM, Ming Lei wrote:
> On Mon, Oct 18, 2021 at 06:06:28PM +0800, Quanyang Wang wrote:
>> Hi Ming,
>>
>> On 10/18/21 5:02 PM, Ming Lei wrote:
>>> On Mon, Oct 18, 2021 at 03:56:23PM +0800, [email protected] wrote:
>>>> From: Quanyang Wang <[email protected]>
>>>>
>>>> When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running
>>>> the command as below:
>>>>
>>>> $mount -t cgroup -o none,name=foo cgroup cgroup/
>>>> $umount cgroup/
>>>>
>>>> unreferenced object 0xc3585c40 (size 64):
>>>> comm "mount", pid 425, jiffies 4294959825 (age 31.990s)
>>>> hex dump (first 32 bytes):
>>>> 01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(.........
>>>> 00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC......
>>>> backtrace:
>>>> [<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c
>>>> [<1f03679c>] cgroup_setup_root+0x174/0x37c
>>>> [<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0
>>>> [<f85b12fd>] vfs_get_tree+0x24/0x108
>>>> [<f55aec5c>] path_mount+0x384/0x988
>>>> [<e2d5e9cd>] do_mount+0x64/0x9c
>>>> [<208c9cfe>] sys_mount+0xfc/0x1f4
>>>> [<06dd06e0>] ret_fast_syscall+0x0/0x48
>>>> [<a8308cb3>] 0xbeb4daa8
>>>>
>>>> This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce
>>>> memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data
>>>> is allocated by the function percpu_ref_init in cgroup_bpf_inherit which
>>>> is called by cgroup_setup_root when mounting, but not freed along with
>>>> root_cgrp when umounting. Adding cgroup_bpf_offline which calls
>>>> percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in
>>>> umount path.
>>>>
>>>> This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime
>>>> of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a
>>>> cleanup that frees the resources which are allocated by cgroup_bpf_inherit
>>>> in cgroup_setup_root.
>>>>
>>>> And inside cgroup_bpf_offline, cgroup_get() is at the beginning and
>>>> cgroup_put is at the end of cgroup_bpf_release which is called by
>>>> cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of
>>>> cgroup's refcount.
>>>>
>>>> Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
>>>
>>> If I understand correctly, cgroup_bpf_release() won't be called without
>>> your patch. So anything allocated in cgroup_bpf_inherit() will be
>>> leaked?
>> No, for now cgroup_bpf_release is called to release bpf.refcnt.data of the
>> cgroup which is not root_cgroup. Only root_cgroup's bpf data is leaked.
>
> You mean that cgroup_bpf_inherit() allocates nothing for root_cgroup?
>
> If yes, I agree you can add 'Fixes: 2b0d3d3e4fcf', otherwise please
> remove it.
>
>
> Thanks,
> Ming
>

2021-10-19 17:12:50

by Michal Koutný

[permalink] [raw]
Subject: Re: [V2][PATCH] cgroup: fix memory leak caused by missing cgroup_bpf_offline

Hi.

On Tue, Oct 19, 2021 at 06:41:14PM +0800, Quanyang Wang <[email protected]> wrote:
> So I add 2 "Fixes tags" here to indicate that 2 commits introduce two
> different issues.

AFAIU, both the changes are needed to cause the leak, a single patch
alone won't cause the issue. Is that correct? (Perhaps not as I realize,
see below.)

But on second thought, the problem is the missing percpu_ref_exit() in
the (root) cgroup release path and percpu counter would allocate the
percpu_count_ptr anyway, so 4bfc0bb2c60e is only making the leak more
visible. Is this correct?

I agree the commit 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of
percpu_ref in fast path") alone did nothing wrong.

[On a related (but independent) note, there seems to be an optimization
opportunity in not dealing with cgroup_bpf at all on the non-default
hierarchies.]

Regards,
Michal

2021-10-20 02:20:24

by Ming Lei

[permalink] [raw]
Subject: Re: [V2][PATCH] cgroup: fix memory leak caused by missing cgroup_bpf_offline

On Tue, Oct 19, 2021 at 07:10:26PM +0200, Michal Koutn? wrote:
> Hi.
>
> On Tue, Oct 19, 2021 at 06:41:14PM +0800, Quanyang Wang <[email protected]> wrote:
> > So I add 2 "Fixes tags" here to indicate that 2 commits introduce two
> > different issues.
>
> AFAIU, both the changes are needed to cause the leak, a single patch
> alone won't cause the issue. Is that correct? (Perhaps not as I realize,
> see below.)
>
> But on second thought, the problem is the missing percpu_ref_exit() in
> the (root) cgroup release path and percpu counter would allocate the
> percpu_count_ptr anyway, so 4bfc0bb2c60e is only making the leak more
> visible. Is this correct?
>
> I agree the commit 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of
> percpu_ref in fast path") alone did nothing wrong.

If only precpu_ref data is leaked, it is fine to add "Fixes: 2b0d3d3e4fcf",
I thought cgroup_bpf_release() needs to release more for root cgroup, but
looks not true.


Thanks,
Ming

2021-10-20 05:19:11

by quanyang wang

[permalink] [raw]
Subject: Re: [V2][PATCH] cgroup: fix memory leak caused by missing cgroup_bpf_offline

Hi Michal,

On 10/20/21 1:10 AM, Michal Koutný wrote:
> Hi.
>
> On Tue, Oct 19, 2021 at 06:41:14PM +0800, Quanyang Wang <[email protected]> wrote:
>> So I add 2 "Fixes tags" here to indicate that 2 commits introduce two
>> different issues.
>
> AFAIU, both the changes are needed to cause the leak, a single patch
> alone won't cause the issue. Is that correct? (Perhaps not as I realize,
> see below.)
Yes, I back to the earlier commit 4bfc0bb2c60e and no memory leak is
observed.
>
> But on second thought, the problem is the missing percpu_ref_exit() in
> the (root) cgroup release path and percpu counter would allocate the
> percpu_count_ptr anyway, so 4bfc0bb2c60e is only making the leak more
> visible. Is this correct?
No, the earlier commit 4bfc0bb2c60e introduces a imbalance and the later
commit 2b0d3d3e4fcf introduces a visible leak.

Thanks,
Quanyang
>
> I agree the commit 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of
> percpu_ref in fast path") alone did nothing wrong.
>
> [On a related (but independent) note, there seems to be an optimization
> opportunity in not dealing with cgroup_bpf at all on the non-default
> hierarchies.]
>
> Regards,
> Michal
>

2021-10-20 05:23:57

by quanyang wang

[permalink] [raw]
Subject: Re: [V2][PATCH] cgroup: fix memory leak caused by missing cgroup_bpf_offline

Hi Ming,

On 10/20/21 10:17 AM, Ming Lei wrote:
> On Tue, Oct 19, 2021 at 07:10:26PM +0200, Michal Koutný wrote:
>> Hi.
>>
>> On Tue, Oct 19, 2021 at 06:41:14PM +0800, Quanyang Wang <[email protected]> wrote:
>>> So I add 2 "Fixes tags" here to indicate that 2 commits introduce two
>>> different issues.
>>
>> AFAIU, both the changes are needed to cause the leak, a single patch
>> alone won't cause the issue. Is that correct? (Perhaps not as I realize,
>> see below.)
>>
>> But on second thought, the problem is the missing percpu_ref_exit() in
>> the (root) cgroup release path and percpu counter would allocate the
>> percpu_count_ptr anyway, so 4bfc0bb2c60e is only making the leak more
>> visible. Is this correct?
>>
>> I agree the commit 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of
>> percpu_ref in fast path") alone did nothing wrong.
>
> If only precpu_ref data is leaked, it is fine to add "Fixes: 2b0d3d3e4fcf",
> I thought cgroup_bpf_release() needs to release more for root cgroup, but
> looks not true.
For now, I can only observe that precpu_ref data is leaked when running
ltp testsuite.
Thanks,
Quanyang
>
>
> Thanks,
> Ming
>

2021-10-20 17:30:14

by Michal Koutný

[permalink] [raw]
Subject: Re: [V2][PATCH] cgroup: fix memory leak caused by missing cgroup_bpf_offline

On Wed, Oct 20, 2021 at 01:22:06PM +0800, Quanyang Wang <[email protected]> wrote:
> > If only precpu_ref data is leaked, it is fine to add "Fixes: 2b0d3d3e4fcf",
> > I thought cgroup_bpf_release() needs to release more for root cgroup, but
> > looks not true.
> For now, I can only observe that precpu_ref data is leaked when running ltp
> testsuite.

I assume you refer to ref->data. I considered the ref->percpu_count_ptr
allocated with __alloc_percpu_gfp(). Could it be that kmemleak won't
detect leaked percpu allocations?

(The patch you sent resolves this as well, I'm just curious.)

Michal

2021-10-22 01:33:34

by Roman Gushchin

[permalink] [raw]
Subject: Re: [V2][PATCH] cgroup: fix memory leak caused by missing cgroup_bpf_offline

On Mon, Oct 18, 2021 at 03:56:23PM +0800, [email protected] wrote:
> From: Quanyang Wang <[email protected]>
>
> When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running
> the command as below:
>
> $mount -t cgroup -o none,name=foo cgroup cgroup/
> $umount cgroup/
>
> unreferenced object 0xc3585c40 (size 64):
> comm "mount", pid 425, jiffies 4294959825 (age 31.990s)
> hex dump (first 32 bytes):
> 01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(.........
> 00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC......
> backtrace:
> [<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c
> [<1f03679c>] cgroup_setup_root+0x174/0x37c
> [<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0
> [<f85b12fd>] vfs_get_tree+0x24/0x108
> [<f55aec5c>] path_mount+0x384/0x988
> [<e2d5e9cd>] do_mount+0x64/0x9c
> [<208c9cfe>] sys_mount+0xfc/0x1f4
> [<06dd06e0>] ret_fast_syscall+0x0/0x48
> [<a8308cb3>] 0xbeb4daa8
>
> This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce
> memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data
> is allocated by the function percpu_ref_init in cgroup_bpf_inherit which
> is called by cgroup_setup_root when mounting, but not freed along with
> root_cgrp when umounting. Adding cgroup_bpf_offline which calls
> percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in
> umount path.
>
> This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime
> of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a
> cleanup that frees the resources which are allocated by cgroup_bpf_inherit
> in cgroup_setup_root.
>
> And inside cgroup_bpf_offline, cgroup_get() is at the beginning and
> cgroup_put is at the end of cgroup_bpf_release which is called by
> cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of
> cgroup's refcount.
>
> Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
> Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
> Signed-off-by: Quanyang Wang <[email protected]>
> ---
> V1 ---> V2:
> 1. As per Daniel's suggestion, add description to commit msg about the
> balance of cgroup's refcount in cgroup_bpf_offline.
> 2. As per Michal's suggestion, add tag "Fixes: 4bfc0bb2c60e" and add
> description about it.
> 3. Fix indentation on the percpu_ref_is_dying line.

Acked-by: Roman Gushchin <[email protected]>

The fix looks correct, two fixes tag are fine too, if only it won't
confuse scripts picking up patches for stable backports.

In fact, it's a very cold path, which is arguably never hit in the real
life. On cgroup v2 it's not an issue. I'm not sure we need a stable
backport at all, only if it creates a noise for some automation tests.

Quanyang, out of curiosity, how did you find it?

Anyway, thanks for catching and fixing it!

Roman

2021-10-22 11:09:16

by quanyang wang

[permalink] [raw]
Subject: Re: [V2][PATCH] cgroup: fix memory leak caused by missing cgroup_bpf_offline

Hi Michal,

On 10/21/21 1:28 AM, Michal Koutný wrote:
> On Wed, Oct 20, 2021 at 01:22:06PM +0800, Quanyang Wang <[email protected]> wrote:
>>> If only precpu_ref data is leaked, it is fine to add "Fixes: 2b0d3d3e4fcf",
>>> I thought cgroup_bpf_release() needs to release more for root cgroup, but
>>> looks not true.
>> For now, I can only observe that precpu_ref data is leaked when running ltp
>> testsuite.
>
> I assume you refer to ref->data. I considered the ref->percpu_count_ptr
> allocated with __alloc_percpu_gfp(). Could it be that kmemleak won't
> detect leaked percpu allocations?
Yes, kmemleak can't detect percpu allocations. I find some message about
this:

commit f528f0b8e53d
Author: Catalin Marinas <[email protected]>
Date: Mon Sep 26 17:12:53 2011 +0100

kmemleak: Handle percpu memory allocation

This patch adds kmemleak callbacks from the percpu allocator, reducing a
number of false positives caused by kmemleak not scanning such memory
blocks. The percpu chunks are never reported as leaks because of current
kmemleak limitations with the __percpu pointer not pointing directly to
the actual chunks.

Thanks,
Quanyang
>
> (The patch you sent resolves this as well, I'm just curious.)
>
> Michal
>

2021-10-22 11:59:29

by quanyang wang

[permalink] [raw]
Subject: Re: [V2][PATCH] cgroup: fix memory leak caused by missing cgroup_bpf_offline

Hi Roman,

On 10/22/21 9:30 AM, Roman Gushchin wrote:
> On Mon, Oct 18, 2021 at 03:56:23PM +0800, [email protected] wrote:
>> From: Quanyang Wang <[email protected]>
>>
>> When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running
>> the command as below:
>>
>> $mount -t cgroup -o none,name=foo cgroup cgroup/
>> $umount cgroup/
>>
>> unreferenced object 0xc3585c40 (size 64):
>> comm "mount", pid 425, jiffies 4294959825 (age 31.990s)
>> hex dump (first 32 bytes):
>> 01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(.........
>> 00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC......
>> backtrace:
>> [<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c
>> [<1f03679c>] cgroup_setup_root+0x174/0x37c
>> [<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0
>> [<f85b12fd>] vfs_get_tree+0x24/0x108
>> [<f55aec5c>] path_mount+0x384/0x988
>> [<e2d5e9cd>] do_mount+0x64/0x9c
>> [<208c9cfe>] sys_mount+0xfc/0x1f4
>> [<06dd06e0>] ret_fast_syscall+0x0/0x48
>> [<a8308cb3>] 0xbeb4daa8
>>
>> This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce
>> memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data
>> is allocated by the function percpu_ref_init in cgroup_bpf_inherit which
>> is called by cgroup_setup_root when mounting, but not freed along with
>> root_cgrp when umounting. Adding cgroup_bpf_offline which calls
>> percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in
>> umount path.
>>
>> This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime
>> of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a
>> cleanup that frees the resources which are allocated by cgroup_bpf_inherit
>> in cgroup_setup_root.
>>
>> And inside cgroup_bpf_offline, cgroup_get() is at the beginning and
>> cgroup_put is at the end of cgroup_bpf_release which is called by
>> cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of
>> cgroup's refcount.
>>
>> Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
>> Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
>> Signed-off-by: Quanyang Wang <[email protected]>
>> ---
>> V1 ---> V2:
>> 1. As per Daniel's suggestion, add description to commit msg about the
>> balance of cgroup's refcount in cgroup_bpf_offline.
>> 2. As per Michal's suggestion, add tag "Fixes: 4bfc0bb2c60e" and add
>> description about it.
>> 3. Fix indentation on the percpu_ref_is_dying line.
>
> Acked-by: Roman Gushchin <[email protected]>
>
> The fix looks correct, two fixes tag are fine too, if only it won't
> confuse scripts picking up patches for stable backports.
>
> In fact, it's a very cold path, which is arguably never hit in the real
> life. On cgroup v2 it's not an issue. I'm not sure we need a stable
> backport at all, only if it creates a noise for some automation tests.
>
> Quanyang, out of curiosity, how did you find it?
I ran ltp testsuite to find this.

./runltp -f controllers -s cgroup

Thanks,
Quanyang
>
> Anyway, thanks for catching and fixing it!
>
> Roman
>

2021-10-22 21:31:55

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [V2][PATCH] cgroup: fix memory leak caused by missing cgroup_bpf_offline

On Fri, Oct 22, 2021 at 4:56 AM Quanyang Wang
<[email protected]> wrote:
> >> Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
> >> Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
> >> Signed-off-by: Quanyang Wang <[email protected]>
> >> ---
> >> V1 ---> V2:
> >> 1. As per Daniel's suggestion, add description to commit msg about the
> >> balance of cgroup's refcount in cgroup_bpf_offline.
> >> 2. As per Michal's suggestion, add tag "Fixes: 4bfc0bb2c60e" and add
> >> description about it.
> >> 3. Fix indentation on the percpu_ref_is_dying line.
> >
> > Acked-by: Roman Gushchin <[email protected]>
> >
> > The fix looks correct, two fixes tag are fine too, if only it won't
> > confuse scripts picking up patches for stable backports.
> >
> > In fact, it's a very cold path, which is arguably never hit in the real
> > life. On cgroup v2 it's not an issue. I'm not sure we need a stable
> > backport at all, only if it creates a noise for some automation tests.
> >
> > Quanyang, out of curiosity, how did you find it?
> I ran ltp testsuite to find this.
>
> ./runltp -f controllers -s cgroup
>
> Thanks,
> Quanyang
> >
> > Anyway, thanks for catching and fixing it!

Applied to bpf tree. Thanks everyone!