2013-04-02 07:35:55

by Zefan Li

[permalink] [raw]
Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

If memcg_init_kmem() returns -errno when a memcg is being created,
mem_cgroup_css_online() will decrement memcg and its parent's refcnt,
(but strangely there's no mem_cgroup_put() for mem_cgroup_get() called
in memcg_propagate_kmem()).

But then cgroup core will call mem_cgroup_css_free() to do cleanup...

Signed-off-by: Li Zefan <[email protected]>
---
mm/memcontrol.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e80504e..e927fc6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6247,16 +6247,7 @@ mem_cgroup_css_online(struct cgroup *cont)

error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
mutex_unlock(&memcg_create_mutex);
- if (error) {
- /*
- * We call put now because our (and parent's) refcnts
- * are already in place. mem_cgroup_put() will internally
- * call __mem_cgroup_free, so return directly
- */
- mem_cgroup_put(memcg);
- if (parent->use_hierarchy)
- mem_cgroup_put(parent);
- }
+
return error;
}

--
1.8.0.2


2013-04-02 08:04:30

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On 2013/4/2 15:35, Li Zefan wrote:
> If memcg_init_kmem() returns -errno when a memcg is being created,
> mem_cgroup_css_online() will decrement memcg and its parent's refcnt,

> (but strangely there's no mem_cgroup_put() for mem_cgroup_get() called
> in memcg_propagate_kmem()).

The comment in memcg_propagate_kmem() suggests it knows mem_cgroup_css_free()
will be called in failure, while mem_cgroup_css_online() doesn't know.

>
> But then cgroup core will call mem_cgroup_css_free() to do cleanup...
>
> Signed-off-by: Li Zefan <[email protected]>
> ---
> mm/memcontrol.c | 11 +----------
> 1 file changed, 1 insertion(+), 10 deletions(-)

2013-04-02 08:06:37

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On 04/02/2013 12:03 PM, Li Zefan wrote:
> On 2013/4/2 15:35, Li Zefan wrote:
>> If memcg_init_kmem() returns -errno when a memcg is being created,
>> mem_cgroup_css_online() will decrement memcg and its parent's refcnt,
>
>> (but strangely there's no mem_cgroup_put() for mem_cgroup_get() called
>> in memcg_propagate_kmem()).
>
> The comment in memcg_propagate_kmem() suggests it knows mem_cgroup_css_free()
> will be called in failure, while mem_cgroup_css_online() doesn't know.
>
This is a bit suspicious. At first your analysis seems fair, but I've
extensively tested memcg teardown process with kmemcg (and even
uncovered some bugs at that), and it works when and how expected.

Also, note that this teardown code long predates kmemcg.

I am not saying your are wrong - on the contrary, you seem to be right,
but I think this one needs to be handled with extra care. I will run
some tests, take a look, and get back to you.

2013-04-02 08:34:59

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On 2013/4/2 16:07, Glauber Costa wrote:
> On 04/02/2013 12:03 PM, Li Zefan wrote:
>> On 2013/4/2 15:35, Li Zefan wrote:
>>> If memcg_init_kmem() returns -errno when a memcg is being created,
>>> mem_cgroup_css_online() will decrement memcg and its parent's refcnt,
>>
>>> (but strangely there's no mem_cgroup_put() for mem_cgroup_get() called
>>> in memcg_propagate_kmem()).
>>
>> The comment in memcg_propagate_kmem() suggests it knows mem_cgroup_css_free()
>> will be called in failure, while mem_cgroup_css_online() doesn't know.
>>
> This is a bit suspicious. At first your analysis seems fair, but I've
> extensively tested memcg teardown process with kmemcg (and even
> uncovered some bugs at that), and it works when and how expected.
>

Because this bug is in a failure path, and seems the only way to get into
this path is -ENOMEM.

> Also, note that this teardown code long predates kmemcg.
>

Maybe this bug was introduced when ss->create() was changed to ss->css_alloc()
and ss->css_online(), and before that change ss->destroy() won't be called
if ss->create() failed.

> I am not saying your are wrong - on the contrary, you seem to be right,
> but I think this one needs to be handled with extra care. I will run
> some tests, take a look, and get back to you.
>

2013-04-02 08:42:15

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On 04/02/2013 12:34 PM, Li Zefan wrote:
> On 2013/4/2 16:07, Glauber Costa wrote:
>> On 04/02/2013 12:03 PM, Li Zefan wrote:
>>> On 2013/4/2 15:35, Li Zefan wrote:
>>>> If memcg_init_kmem() returns -errno when a memcg is being created,
>>>> mem_cgroup_css_online() will decrement memcg and its parent's refcnt,
>>>
>>>> (but strangely there's no mem_cgroup_put() for mem_cgroup_get() called
>>>> in memcg_propagate_kmem()).
>>>
>>> The comment in memcg_propagate_kmem() suggests it knows mem_cgroup_css_free()
>>> will be called in failure, while mem_cgroup_css_online() doesn't know.
>>>
>> This is a bit suspicious. At first your analysis seems fair, but I've
>> extensively tested memcg teardown process with kmemcg (and even
>> uncovered some bugs at that), and it works when and how expected.
>>
>
> Because this bug is in a failure path, and seems the only way to get into
> this path is -ENOMEM.
>
Yes, but I tend to test that with manually introduced error codes.

For what is worth, I just did it. And indeed, by ignoring kmemcg
initialization and failing with ENOMEM here triggers a bug. Your patch
fixes it. I tested both failing all non-root, and letting the first
succeed and failing the second if the parent is use_hierarchy. Both
cases have a bug initially that you fix.

If that is allowed to proceed, kmemcg initialization and teardown works
as expected.

>> Also, note that this teardown code long predates kmemcg.
>>
>
> Maybe this bug was introduced when ss->create() was changed to ss->css_alloc()
> and ss->css_online(), and before that change ss->destroy() won't be called
> if ss->create() failed.
>
Yes, this is possible,

2013-04-02 08:43:09

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On 04/02/2013 11:35 AM, Li Zefan wrote:
> If memcg_init_kmem() returns -errno when a memcg is being created,
> mem_cgroup_css_online() will decrement memcg and its parent's refcnt,
> (but strangely there's no mem_cgroup_put() for mem_cgroup_get() called
> in memcg_propagate_kmem()).
>
> But then cgroup core will call mem_cgroup_css_free() to do cleanup...
>

Not a kmemcg bug, but a real bug. Tested by forcing an ENOMEM condition
to happen manually, and Li patch fixes it.
> Signed-off-by: Li Zefan <[email protected]>

Acked-by: Glauber Costa <[email protected]>

2013-04-02 12:16:06

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On Tue 02-04-13 15:35:28, Li Zefan wrote:
[...]
> @@ -6247,16 +6247,7 @@ mem_cgroup_css_online(struct cgroup *cont)
>
> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> mutex_unlock(&memcg_create_mutex);
> - if (error) {
> - /*
> - * We call put now because our (and parent's) refcnts
> - * are already in place. mem_cgroup_put() will internally
> - * call __mem_cgroup_free, so return directly
> - */
> - mem_cgroup_put(memcg);
> - if (parent->use_hierarchy)
> - mem_cgroup_put(parent);
> - }
> +
> return error;
> }

The mem_cgroup_put(parent) part is incorrect because mem_cgroup_put goes
up the hierarchy already but I do not think mem_cgroup_put(memcg) should
go away as well. Who is going to free the last reference then?

Maybe I am missing something but we have:
cgroup_create
css = ss->css_alloc(cgrp)
mem_cgroup_css_alloc
atomic_set(&memcg->refcnt, 1)
online_css(ss, cgrp)
mem_cgroup_css_online
error = memcg_init_kmem # fails
goto err_destroy
err_destroy:
cgroup_destroy_locked(cgrp)
offline_css
mem_cgroup_css_offline

no mem_cgroup_put on the way.

--
Michal Hocko
SUSE Labs

2013-04-02 12:21:45

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On 04/02/2013 04:16 PM, Michal Hocko wrote:
> On Tue 02-04-13 15:35:28, Li Zefan wrote:
> [...]
>> @@ -6247,16 +6247,7 @@ mem_cgroup_css_online(struct cgroup *cont)
>>
>> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
>> mutex_unlock(&memcg_create_mutex);
>> - if (error) {
>> - /*
>> - * We call put now because our (and parent's) refcnts
>> - * are already in place. mem_cgroup_put() will internally
>> - * call __mem_cgroup_free, so return directly
>> - */
>> - mem_cgroup_put(memcg);
>> - if (parent->use_hierarchy)
>> - mem_cgroup_put(parent);
>> - }
>> +
>> return error;
>> }
>
> The mem_cgroup_put(parent) part is incorrect because mem_cgroup_put goes
> up the hierarchy already but I do not think mem_cgroup_put(memcg) should
> go away as well. Who is going to free the last reference then?
>
> Maybe I am missing something but we have:
> cgroup_create
> css = ss->css_alloc(cgrp)
> mem_cgroup_css_alloc
> atomic_set(&memcg->refcnt, 1)
> online_css(ss, cgrp)
> mem_cgroup_css_online
> error = memcg_init_kmem # fails
> goto err_destroy
> err_destroy:
> cgroup_destroy_locked(cgrp)
> offline_css
> mem_cgroup_css_offline
>
> no mem_cgroup_put on the way.
>

static void mem_cgroup_css_free(struct cgroup *cont)
{
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);

kmem_cgroup_destroy(memcg);

mem_cgroup_put(memcg);
}

kernel/cgroup.c:
err_free_all:
for_each_subsys(root, ss) {
if (cgrp->subsys[ss->subsys_id])
ss->css_free(cgrp);
}

As I've said, this error path predates kmemcg for a long time. I wasn't
still able to identify precisely why it was working - assuming it was
indeed working (I remember having tested it with handcrafted manual
errors, but memory can always fail...).

But our best theory so far is that the cgroup rework started calling the
free function that wasn't called before.

2013-04-02 13:32:32

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On Tue 02-04-13 16:22:23, Glauber Costa wrote:
> On 04/02/2013 04:16 PM, Michal Hocko wrote:
> > On Tue 02-04-13 15:35:28, Li Zefan wrote:
> > [...]
> >> @@ -6247,16 +6247,7 @@ mem_cgroup_css_online(struct cgroup *cont)
> >>
> >> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> >> mutex_unlock(&memcg_create_mutex);
> >> - if (error) {
> >> - /*
> >> - * We call put now because our (and parent's) refcnts
> >> - * are already in place. mem_cgroup_put() will internally
> >> - * call __mem_cgroup_free, so return directly
> >> - */
> >> - mem_cgroup_put(memcg);
> >> - if (parent->use_hierarchy)
> >> - mem_cgroup_put(parent);
> >> - }
> >> +
> >> return error;
> >> }
> >
> > The mem_cgroup_put(parent) part is incorrect because mem_cgroup_put goes
> > up the hierarchy already but I do not think mem_cgroup_put(memcg) should
> > go away as well. Who is going to free the last reference then?
> >
> > Maybe I am missing something but we have:
> > cgroup_create
> > css = ss->css_alloc(cgrp)
> > mem_cgroup_css_alloc
> > atomic_set(&memcg->refcnt, 1)
> > online_css(ss, cgrp)
> > mem_cgroup_css_online
> > error = memcg_init_kmem # fails
> > goto err_destroy
> > err_destroy:
> > cgroup_destroy_locked(cgrp)
> > offline_css
> > mem_cgroup_css_offline
> >
> > no mem_cgroup_put on the way.
> >
>
> static void mem_cgroup_css_free(struct cgroup *cont)
> {
> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>
> kmem_cgroup_destroy(memcg);
>
> mem_cgroup_put(memcg);
> }
>
> kernel/cgroup.c:
> err_free_all:
> for_each_subsys(root, ss) {
> if (cgrp->subsys[ss->subsys_id])
> ss->css_free(cgrp);
> }

But we do not get to that path after online_css fails because that one
jumps to err_destroy. So this is not it. Maybe css_free gets called from
cgroup_diput but I got lost in the indirection.
--
Michal Hocko
SUSE Labs

2013-04-02 13:36:21

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On 04/02/2013 05:32 PM, Michal Hocko wrote:
> On Tue 02-04-13 16:22:23, Glauber Costa wrote:
>> On 04/02/2013 04:16 PM, Michal Hocko wrote:
>>> On Tue 02-04-13 15:35:28, Li Zefan wrote:
>>> [...]
>>>> @@ -6247,16 +6247,7 @@ mem_cgroup_css_online(struct cgroup *cont)
>>>>
>>>> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
>>>> mutex_unlock(&memcg_create_mutex);
>>>> - if (error) {
>>>> - /*
>>>> - * We call put now because our (and parent's) refcnts
>>>> - * are already in place. mem_cgroup_put() will internally
>>>> - * call __mem_cgroup_free, so return directly
>>>> - */
>>>> - mem_cgroup_put(memcg);
>>>> - if (parent->use_hierarchy)
>>>> - mem_cgroup_put(parent);
>>>> - }
>>>> +
>>>> return error;
>>>> }
>>>
>>> The mem_cgroup_put(parent) part is incorrect because mem_cgroup_put goes
>>> up the hierarchy already but I do not think mem_cgroup_put(memcg) should
>>> go away as well. Who is going to free the last reference then?
>>>
>>> Maybe I am missing something but we have:
>>> cgroup_create
>>> css = ss->css_alloc(cgrp)
>>> mem_cgroup_css_alloc
>>> atomic_set(&memcg->refcnt, 1)
>>> online_css(ss, cgrp)
>>> mem_cgroup_css_online
>>> error = memcg_init_kmem # fails
>>> goto err_destroy
>>> err_destroy:
>>> cgroup_destroy_locked(cgrp)
>>> offline_css
>>> mem_cgroup_css_offline
>>>
>>> no mem_cgroup_put on the way.
>>>
>>
>> static void mem_cgroup_css_free(struct cgroup *cont)
>> {
>> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>>
>> kmem_cgroup_destroy(memcg);
>>
>> mem_cgroup_put(memcg);
>> }
>>
>> kernel/cgroup.c:
>> err_free_all:
>> for_each_subsys(root, ss) {
>> if (cgrp->subsys[ss->subsys_id])
>> ss->css_free(cgrp);
>> }
>
> But we do not get to that path after online_css fails because that one
> jumps to err_destroy. So this is not it. Maybe css_free gets called from
> cgroup_diput but I got lost in the indirection.
>

Yes, it is called from diput:

call_rcu(&cgrp->rcu_head, cgroup_free_rcu);

2013-04-02 14:16:49

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On Tue 02-04-13 14:16:00, Michal Hocko wrote:
> On Tue 02-04-13 15:35:28, Li Zefan wrote:
> [...]
> > @@ -6247,16 +6247,7 @@ mem_cgroup_css_online(struct cgroup *cont)
> >
> > error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> > mutex_unlock(&memcg_create_mutex);
> > - if (error) {
> > - /*
> > - * We call put now because our (and parent's) refcnts
> > - * are already in place. mem_cgroup_put() will internally
> > - * call __mem_cgroup_free, so return directly
> > - */
> > - mem_cgroup_put(memcg);
> > - if (parent->use_hierarchy)
> > - mem_cgroup_put(parent);
> > - }
> > +
> > return error;
> > }
>
> The mem_cgroup_put(parent) part is incorrect because mem_cgroup_put goes
> up the hierarchy already but I do not think mem_cgroup_put(memcg) should
> go away as well. Who is going to free the last reference then?
>
> Maybe I am missing something but we have:

OK, I was missing something but "there is one reference without put"
still holds...

cgroup_create
css = ss->css_alloc(cgrp)
mem_cgroup_css_alloc
atomic_set(&memcg->refcnt, 1)
online_css(ss, cgrp)
mem_cgroup_css_online
memcg_init_kmem
mem_cgroup_get # refcnt = 2
memcg_update_all_caches
memcg_update_cache_size # fails with ENOMEM
goto err_destroy
err_destroy:
cgroup_destroy_locked(cgrp)
offline_css
mem_cgroup_css_offline


There is one mem_cgroup_put from mem_cgroup_css_free from cgroup_diput
but besides that I do not see any put after the patch is applied. So I
think you really need to drop only the mem_cgroup_put on parent part.
--
Michal Hocko
SUSE Labs

2013-04-02 14:20:17

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On 04/02/2013 06:16 PM, Michal Hocko wrote:
> mem_cgroup_css_online
> memcg_init_kmem
> mem_cgroup_get # refcnt = 2
> memcg_update_all_caches
> memcg_update_cache_size # fails with ENOMEM

Here is the thing: this one in kmem only happens for kmem enabled
memcgs. For those, we tend to do a get once, and put only when the last
kmem reference is gone.

For non-kmem memcgs, refcnt will be 1 here, and will be balanced out by
the mem_cgroup_put() in css_free.

2013-04-02 14:28:29

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On Tue 02-04-13 18:20:56, Glauber Costa wrote:
> On 04/02/2013 06:16 PM, Michal Hocko wrote:
> > mem_cgroup_css_online
> > memcg_init_kmem
> > mem_cgroup_get # refcnt = 2
> > memcg_update_all_caches
> > memcg_update_cache_size # fails with ENOMEM
>
> Here is the thing: this one in kmem only happens for kmem enabled
> memcgs. For those, we tend to do a get once, and put only when the last
> kmem reference is gone.
>
> For non-kmem memcgs, refcnt will be 1 here, and will be balanced out by
> the mem_cgroup_put() in css_free.

So we need this, right?
---
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f608546..2ef875d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
ret = memcg_update_cache_sizes(memcg);
mutex_unlock(&set_limit_mutex);
out:
+ if (ret)
+ mem_cgroup_put(memcg);
return ret;
}
#endif /* CONFIG_MEMCG_KMEM */
@@ -6417,16 +6419,6 @@ mem_cgroup_css_online(struct cgroup *cont)

error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
mutex_unlock(&memcg_create_mutex);
- if (error) {
- /*
- * We call put now because our (and parent's) refcnts
- * are already in place. mem_cgroup_put() will internally
- * call __mem_cgroup_free, so return directly
- */
- mem_cgroup_put(memcg);
- if (parent->use_hierarchy)
- mem_cgroup_put(parent);
- }
return error;
}

--
Michal Hocko
SUSE Labs

2013-04-02 14:32:51

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On 04/02/2013 06:28 PM, Michal Hocko wrote:
> On Tue 02-04-13 18:20:56, Glauber Costa wrote:
>> On 04/02/2013 06:16 PM, Michal Hocko wrote:
>>> mem_cgroup_css_online
>>> memcg_init_kmem
>>> mem_cgroup_get # refcnt = 2
>>> memcg_update_all_caches
>>> memcg_update_cache_size # fails with ENOMEM
>>
>> Here is the thing: this one in kmem only happens for kmem enabled
>> memcgs. For those, we tend to do a get once, and put only when the last
>> kmem reference is gone.
>>
>> For non-kmem memcgs, refcnt will be 1 here, and will be balanced out by
>> the mem_cgroup_put() in css_free.
>
> So we need this, right?
> ---
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f608546..2ef875d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
> ret = memcg_update_cache_sizes(memcg);
> mutex_unlock(&set_limit_mutex);
> out:
> + if (ret)
> + mem_cgroup_put(memcg);
> return ret;
> }
> #endif /* CONFIG_MEMCG_KMEM */
> @@ -6417,16 +6419,6 @@ mem_cgroup_css_online(struct cgroup *cont)
>
> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> mutex_unlock(&memcg_create_mutex);
> - if (error) {
> - /*
> - * We call put now because our (and parent's) refcnts
> - * are already in place. mem_cgroup_put() will internally
> - * call __mem_cgroup_free, so return directly
> - */
> - mem_cgroup_put(memcg);
> - if (parent->use_hierarchy)
> - mem_cgroup_put(parent);
> - }
> return error;
> }
>
>
Yes, indeed you are very right - and thanks for looking at such depth.

2013-04-02 15:04:28

by Michal Hocko

[permalink] [raw]
Subject: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On Tue 02-04-13 18:33:30, Glauber Costa wrote:
> On 04/02/2013 06:28 PM, Michal Hocko wrote:
> > On Tue 02-04-13 18:20:56, Glauber Costa wrote:
> >> On 04/02/2013 06:16 PM, Michal Hocko wrote:
> >>> mem_cgroup_css_online
> >>> memcg_init_kmem
> >>> mem_cgroup_get # refcnt = 2
> >>> memcg_update_all_caches
> >>> memcg_update_cache_size # fails with ENOMEM
> >>
> >> Here is the thing: this one in kmem only happens for kmem enabled
> >> memcgs. For those, we tend to do a get once, and put only when the last
> >> kmem reference is gone.
> >>
> >> For non-kmem memcgs, refcnt will be 1 here, and will be balanced out by
> >> the mem_cgroup_put() in css_free.
> >
> > So we need this, right?
> > ---
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index f608546..2ef875d 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
> > ret = memcg_update_cache_sizes(memcg);
> > mutex_unlock(&set_limit_mutex);
> > out:
> > + if (ret)
> > + mem_cgroup_put(memcg);
> > return ret;
> > }
> > #endif /* CONFIG_MEMCG_KMEM */
> > @@ -6417,16 +6419,6 @@ mem_cgroup_css_online(struct cgroup *cont)
> >
> > error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> > mutex_unlock(&memcg_create_mutex);
> > - if (error) {
> > - /*
> > - * We call put now because our (and parent's) refcnts
> > - * are already in place. mem_cgroup_put() will internally
> > - * call __mem_cgroup_free, so return directly
> > - */
> > - mem_cgroup_put(memcg);
> > - if (parent->use_hierarchy)
> > - mem_cgroup_put(parent);
> > - }
> > return error;
> > }
> >
> >
> Yes, indeed you are very right - and thanks for looking at such depth.

So what about the patch bellow? It seems that I provoked all this mess
but my brain managed to push it away so I do not remember why I thought
the parent needs reference drop... It is "only" 3.9 thing fortunately.
---
>From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
From: Li Zefan <[email protected]>
Date: Tue, 2 Apr 2013 16:37:39 +0200
Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
fails

mem_cgroup_css_online is called with memcg with refcnt = 1 and it
expects that mem_cgroup_css_free will drop this last reference.
This doesn't hold when memcg_init_kmem fails though and a reference is
dropped for both memcg and its parent explicitly if it returns with an
error.

This is not correct for two reasons. Firstly mem_cgroup_put on parent is
excessive because mem_cgroup_put is hierarchy aware and secondly only
memcg_propagate_kmem takes an additional reference.

The first one is a real use-after-free bug introduced by e4715f01
(memcg: avoid dangling reference count in creation failure)

The later one is non-issue right now because the only implementation
of init_cgroup seems to be tcp_init_cgroup which doesn't fail
but it is better to make the error handling saner and move the
mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.

Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
mm/memcontrol.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f608546..cf9ba7e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
ret = memcg_update_cache_sizes(memcg);
mutex_unlock(&set_limit_mutex);
out:
+ if (ret)
+ mem_cgroup_put(memcg);
return ret;
}
#endif /* CONFIG_MEMCG_KMEM */
@@ -6417,16 +6419,7 @@ mem_cgroup_css_online(struct cgroup *cont)

error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
mutex_unlock(&memcg_create_mutex);
- if (error) {
- /*
- * We call put now because our (and parent's) refcnts
- * are already in place. mem_cgroup_put() will internally
- * call __mem_cgroup_free, so return directly
- */
- mem_cgroup_put(memcg);
- if (parent->use_hierarchy)
- mem_cgroup_put(parent);
- }
+
return error;
}

--
1.7.10.4

--
1.7.10.4
--
Michal Hocko
SUSE Labs

2013-04-03 03:44:00

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On 2013/4/2 21:32, Michal Hocko wrote:
> On Tue 02-04-13 16:22:23, Glauber Costa wrote:
>> On 04/02/2013 04:16 PM, Michal Hocko wrote:
>>> On Tue 02-04-13 15:35:28, Li Zefan wrote:
>>> [...]
>>>> @@ -6247,16 +6247,7 @@ mem_cgroup_css_online(struct cgroup *cont)
>>>>
>>>> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
>>>> mutex_unlock(&memcg_create_mutex);
>>>> - if (error) {
>>>> - /*
>>>> - * We call put now because our (and parent's) refcnts
>>>> - * are already in place. mem_cgroup_put() will internally
>>>> - * call __mem_cgroup_free, so return directly
>>>> - */
>>>> - mem_cgroup_put(memcg);
>>>> - if (parent->use_hierarchy)
>>>> - mem_cgroup_put(parent);
>>>> - }
>>>> +
>>>> return error;
>>>> }
>>>
>>> The mem_cgroup_put(parent) part is incorrect because mem_cgroup_put goes
>>> up the hierarchy already but I do not think mem_cgroup_put(memcg) should
>>> go away as well. Who is going to free the last reference then?
>>>
>>> Maybe I am missing something but we have:
>>> cgroup_create
>>> css = ss->css_alloc(cgrp)
>>> mem_cgroup_css_alloc
>>> atomic_set(&memcg->refcnt, 1)
>>> online_css(ss, cgrp)
>>> mem_cgroup_css_online
>>> error = memcg_init_kmem # fails
>>> goto err_destroy
>>> err_destroy:
>>> cgroup_destroy_locked(cgrp)
>>> offline_css
>>> mem_cgroup_css_offline
>>>
>>> no mem_cgroup_put on the way.
>>>
>>
>> static void mem_cgroup_css_free(struct cgroup *cont)
>> {
>> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>>
>> kmem_cgroup_destroy(memcg);
>>
>> mem_cgroup_put(memcg);
>> }
>>
>> kernel/cgroup.c:
>> err_free_all:
>> for_each_subsys(root, ss) {
>> if (cgrp->subsys[ss->subsys_id])
>> ss->css_free(cgrp);
>> }
>
> But we do not get to that path after online_css fails because that one
> jumps to err_destroy. So this is not it. Maybe css_free gets called from
> cgroup_diput but I got lost in the indirection.
>

Just like rmdir a cgroup. cgroup_destroy_locked() will be called, and it
makes cgroup dentry go down to 0, and then cgroup_diput() is called, which
calls mem_cgroup_css_free().

2013-04-03 03:49:51

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

>> Yes, indeed you are very right - and thanks for looking at such depth.
>
> So what about the patch bellow? It seems that I provoked all this mess
> but my brain managed to push it away so I do not remember why I thought
> the parent needs reference drop... It is "only" 3.9 thing fortunately.
> ---
>>From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
> From: Li Zefan <[email protected]>
> Date: Tue, 2 Apr 2013 16:37:39 +0200
> Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
> fails
>
> mem_cgroup_css_online is called with memcg with refcnt = 1 and it
> expects that mem_cgroup_css_free will drop this last reference.
> This doesn't hold when memcg_init_kmem fails though and a reference is
> dropped for both memcg and its parent explicitly if it returns with an
> error.
>
> This is not correct for two reasons. Firstly mem_cgroup_put on parent is
> excessive because mem_cgroup_put is hierarchy aware and secondly only
> memcg_propagate_kmem takes an additional reference.
>
> The first one is a real use-after-free bug introduced by e4715f01
> (memcg: avoid dangling reference count in creation failure)
>
> The later one is non-issue right now because the only implementation
> of init_cgroup seems to be tcp_init_cgroup which doesn't fail
> but it is better to make the error handling saner and move the
> mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.
>
> Signed-off-by: Li Zefan <[email protected]>
> Signed-off-by: Michal Hocko <[email protected]>
> ---
> mm/memcontrol.c | 13 +++----------
> 1 file changed, 3 insertions(+), 10 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f608546..cf9ba7e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
> ret = memcg_update_cache_sizes(memcg);
> mutex_unlock(&set_limit_mutex);
> out:
> + if (ret)
> + mem_cgroup_put(memcg);

Correct me if I'm wrong, but I think:

When memcg_propagate_kmem() calls mem_cgroup_get(), it's because the kmemcg
is active by inheritance. Then when memcg_update_cache_sizes() fails, leading
to mem_cgroup_css_free() is called by cgroup core:

static void mem_cgroup_css_free(struct cgroup *cont)
{
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);

kmem_cgroup_destroy(memcg);

mem_cgroup_put(memcg);
}

static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
{
mem_cgroup_sockets_destroy(memcg);

memcg_kmem_mark_dead(memcg);

if (res_counter_read_u64(&memcg->kmem, RES_USAGE) != 0)
return;

if (memcg_kmem_test_and_clear_dead(memcg))
mem_cgroup_put(memcg); <------- !!!!!!!!!
}

> return ret;
> }
> #endif /* CONFIG_MEMCG_KMEM */
> @@ -6417,16 +6419,7 @@ mem_cgroup_css_online(struct cgroup *cont)
>
> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> mutex_unlock(&memcg_create_mutex);
> - if (error) {
> - /*
> - * We call put now because our (and parent's) refcnts
> - * are already in place. mem_cgroup_put() will internally
> - * call __mem_cgroup_free, so return directly
> - */
> - mem_cgroup_put(memcg);
> - if (parent->use_hierarchy)
> - mem_cgroup_put(parent);
> - }
> +
> return error;
> }
>
>

2013-04-03 07:43:13

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On Wed 03-04-13 11:49:29, Li Zefan wrote:
> >> Yes, indeed you are very right - and thanks for looking at such depth.
> >
> > So what about the patch bellow? It seems that I provoked all this mess
> > but my brain managed to push it away so I do not remember why I thought
> > the parent needs reference drop... It is "only" 3.9 thing fortunately.
> > ---
> >>From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
> > From: Li Zefan <[email protected]>
> > Date: Tue, 2 Apr 2013 16:37:39 +0200
> > Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
> > fails
> >
> > mem_cgroup_css_online is called with memcg with refcnt = 1 and it
> > expects that mem_cgroup_css_free will drop this last reference.
> > This doesn't hold when memcg_init_kmem fails though and a reference is
> > dropped for both memcg and its parent explicitly if it returns with an
> > error.
> >
> > This is not correct for two reasons. Firstly mem_cgroup_put on parent is
> > excessive because mem_cgroup_put is hierarchy aware and secondly only
> > memcg_propagate_kmem takes an additional reference.
> >
> > The first one is a real use-after-free bug introduced by e4715f01
> > (memcg: avoid dangling reference count in creation failure)
> >
> > The later one is non-issue right now because the only implementation
> > of init_cgroup seems to be tcp_init_cgroup which doesn't fail
> > but it is better to make the error handling saner and move the
> > mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.
> >
> > Signed-off-by: Li Zefan <[email protected]>
> > Signed-off-by: Michal Hocko <[email protected]>
> > ---
> > mm/memcontrol.c | 13 +++----------
> > 1 file changed, 3 insertions(+), 10 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index f608546..cf9ba7e 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
> > ret = memcg_update_cache_sizes(memcg);
> > mutex_unlock(&set_limit_mutex);
> > out:
> > + if (ret)
> > + mem_cgroup_put(memcg);
>
> Correct me if I'm wrong, but I think:
>
> When memcg_propagate_kmem() calls mem_cgroup_get(), it's because the kmemcg
> is active by inheritance. Then when memcg_update_cache_sizes() fails, leading
> to mem_cgroup_css_free() is called by cgroup core:
>
> static void mem_cgroup_css_free(struct cgroup *cont)
> {
> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>
> kmem_cgroup_destroy(memcg);
>
> mem_cgroup_put(memcg);
> }
>
> static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
> {
> mem_cgroup_sockets_destroy(memcg);
>
> memcg_kmem_mark_dead(memcg);
>
> if (res_counter_read_u64(&memcg->kmem, RES_USAGE) != 0)
> return;
>
> if (memcg_kmem_test_and_clear_dead(memcg))
> mem_cgroup_put(memcg); <------- !!!!!!!!!
> }

But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
error path.

--
Michal Hocko
SUSE Labs

2013-04-03 07:50:23

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On 2013/4/3 15:43, Michal Hocko wrote:
> On Wed 03-04-13 11:49:29, Li Zefan wrote:
>>>> Yes, indeed you are very right - and thanks for looking at such depth.
>>>
>>> So what about the patch bellow? It seems that I provoked all this mess
>>> but my brain managed to push it away so I do not remember why I thought
>>> the parent needs reference drop... It is "only" 3.9 thing fortunately.
>>> ---
>>> >From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
>>> From: Li Zefan <[email protected]>
>>> Date: Tue, 2 Apr 2013 16:37:39 +0200
>>> Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
>>> fails
>>>
>>> mem_cgroup_css_online is called with memcg with refcnt = 1 and it
>>> expects that mem_cgroup_css_free will drop this last reference.
>>> This doesn't hold when memcg_init_kmem fails though and a reference is
>>> dropped for both memcg and its parent explicitly if it returns with an
>>> error.
>>>
>>> This is not correct for two reasons. Firstly mem_cgroup_put on parent is
>>> excessive because mem_cgroup_put is hierarchy aware and secondly only
>>> memcg_propagate_kmem takes an additional reference.
>>>
>>> The first one is a real use-after-free bug introduced by e4715f01
>>> (memcg: avoid dangling reference count in creation failure)
>>>
>>> The later one is non-issue right now because the only implementation
>>> of init_cgroup seems to be tcp_init_cgroup which doesn't fail
>>> but it is better to make the error handling saner and move the
>>> mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.
>>>
>>> Signed-off-by: Li Zefan <[email protected]>
>>> Signed-off-by: Michal Hocko <[email protected]>
>>> ---
>>> mm/memcontrol.c | 13 +++----------
>>> 1 file changed, 3 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index f608546..cf9ba7e 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
>>> ret = memcg_update_cache_sizes(memcg);
>>> mutex_unlock(&set_limit_mutex);
>>> out:
>>> + if (ret)
>>> + mem_cgroup_put(memcg);
>>
>> Correct me if I'm wrong, but I think:
>>
>> When memcg_propagate_kmem() calls mem_cgroup_get(), it's because the kmemcg
>> is active by inheritance. Then when memcg_update_cache_sizes() fails, leading
>> to mem_cgroup_css_free() is called by cgroup core:
>>
>> static void mem_cgroup_css_free(struct cgroup *cont)
>> {
>> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>>
>> kmem_cgroup_destroy(memcg);
>>
>> mem_cgroup_put(memcg);
>> }
>>
>> static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
>> {
>> mem_cgroup_sockets_destroy(memcg);
>>
>> memcg_kmem_mark_dead(memcg);
>>
>> if (res_counter_read_u64(&memcg->kmem, RES_USAGE) != 0)
>> return;
>>
>> if (memcg_kmem_test_and_clear_dead(memcg))
>> mem_cgroup_put(memcg); <------- !!!!!!!!!
>> }
>
> But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
> error path.
>

But memcg_kmem_mark_dead() checks the ACCOUNT flag not the ACCOUNTED flag.
Am I missing something?

2013-04-03 08:08:16

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On 04/02/2013 07:04 PM, Michal Hocko wrote:
> On Tue 02-04-13 18:33:30, Glauber Costa wrote:
>> On 04/02/2013 06:28 PM, Michal Hocko wrote:
>>> On Tue 02-04-13 18:20:56, Glauber Costa wrote:
>>>> On 04/02/2013 06:16 PM, Michal Hocko wrote:
>>>>> mem_cgroup_css_online
>>>>> memcg_init_kmem
>>>>> mem_cgroup_get # refcnt = 2
>>>>> memcg_update_all_caches
>>>>> memcg_update_cache_size # fails with ENOMEM
>>>>
>>>> Here is the thing: this one in kmem only happens for kmem enabled
>>>> memcgs. For those, we tend to do a get once, and put only when the last
>>>> kmem reference is gone.
>>>>
>>>> For non-kmem memcgs, refcnt will be 1 here, and will be balanced out by
>>>> the mem_cgroup_put() in css_free.
>>>
>>> So we need this, right?
>>> ---
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index f608546..2ef875d 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
>>> ret = memcg_update_cache_sizes(memcg);
>>> mutex_unlock(&set_limit_mutex);
>>> out:
>>> + if (ret)
>>> + mem_cgroup_put(memcg);
>>> return ret;
>>> }
>>> #endif /* CONFIG_MEMCG_KMEM */
>>> @@ -6417,16 +6419,6 @@ mem_cgroup_css_online(struct cgroup *cont)
>>>
>>> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
>>> mutex_unlock(&memcg_create_mutex);
>>> - if (error) {
>>> - /*
>>> - * We call put now because our (and parent's) refcnts
>>> - * are already in place. mem_cgroup_put() will internally
>>> - * call __mem_cgroup_free, so return directly
>>> - */
>>> - mem_cgroup_put(memcg);
>>> - if (parent->use_hierarchy)
>>> - mem_cgroup_put(parent);
>>> - }
>>> return error;
>>> }
>>>
>>>
>> Yes, indeed you are very right - and thanks for looking at such depth.
>
> So what about the patch bellow? It seems that I provoked all this mess
> but my brain managed to push it away so I do not remember why I thought
> the parent needs reference drop... It is "only" 3.9 thing fortunately.
> ---

Li being fine with it, I am fine with it.

2013-04-03 08:18:47

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On Wed 03-04-13 15:49:06, Li Zefan wrote:
> On 2013/4/3 15:43, Michal Hocko wrote:
> > On Wed 03-04-13 11:49:29, Li Zefan wrote:
> >>>> Yes, indeed you are very right - and thanks for looking at such depth.
> >>>
> >>> So what about the patch bellow? It seems that I provoked all this mess
> >>> but my brain managed to push it away so I do not remember why I thought
> >>> the parent needs reference drop... It is "only" 3.9 thing fortunately.
> >>> ---
> >>> >From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
> >>> From: Li Zefan <[email protected]>
> >>> Date: Tue, 2 Apr 2013 16:37:39 +0200
> >>> Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
> >>> fails
> >>>
> >>> mem_cgroup_css_online is called with memcg with refcnt = 1 and it
> >>> expects that mem_cgroup_css_free will drop this last reference.
> >>> This doesn't hold when memcg_init_kmem fails though and a reference is
> >>> dropped for both memcg and its parent explicitly if it returns with an
> >>> error.
> >>>
> >>> This is not correct for two reasons. Firstly mem_cgroup_put on parent is
> >>> excessive because mem_cgroup_put is hierarchy aware and secondly only
> >>> memcg_propagate_kmem takes an additional reference.
> >>>
> >>> The first one is a real use-after-free bug introduced by e4715f01
> >>> (memcg: avoid dangling reference count in creation failure)
> >>>
> >>> The later one is non-issue right now because the only implementation
> >>> of init_cgroup seems to be tcp_init_cgroup which doesn't fail
> >>> but it is better to make the error handling saner and move the
> >>> mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.
> >>>
> >>> Signed-off-by: Li Zefan <[email protected]>
> >>> Signed-off-by: Michal Hocko <[email protected]>
> >>> ---
> >>> mm/memcontrol.c | 13 +++----------
> >>> 1 file changed, 3 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >>> index f608546..cf9ba7e 100644
> >>> --- a/mm/memcontrol.c
> >>> +++ b/mm/memcontrol.c
> >>> @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
> >>> ret = memcg_update_cache_sizes(memcg);
> >>> mutex_unlock(&set_limit_mutex);
> >>> out:
> >>> + if (ret)
> >>> + mem_cgroup_put(memcg);
> >>
> >> Correct me if I'm wrong, but I think:
> >>
> >> When memcg_propagate_kmem() calls mem_cgroup_get(), it's because the kmemcg
> >> is active by inheritance. Then when memcg_update_cache_sizes() fails, leading
> >> to mem_cgroup_css_free() is called by cgroup core:
> >>
> >> static void mem_cgroup_css_free(struct cgroup *cont)
> >> {
> >> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> >>
> >> kmem_cgroup_destroy(memcg);
> >>
> >> mem_cgroup_put(memcg);
> >> }
> >>
> >> static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
> >> {
> >> mem_cgroup_sockets_destroy(memcg);
> >>
> >> memcg_kmem_mark_dead(memcg);
> >>
> >> if (res_counter_read_u64(&memcg->kmem, RES_USAGE) != 0)
> >> return;
> >>
> >> if (memcg_kmem_test_and_clear_dead(memcg))
> >> mem_cgroup_put(memcg); <------- !!!!!!!!!
> >> }
> >
> > But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
> > error path.
> >
>
> But memcg_kmem_mark_dead() checks the ACCOUNT flag not the ACCOUNTED flag.
> Am I missing something?
>

Dang. You are right! Glauber, is there any reason why
memcg_kmem_mark_dead checks only KMEM_ACCOUNTED_ACTIVE rather than
KMEM_ACCOUNTED_MASK?

This all is very confusing to say the least.

Anyway, this all means that Li's first patch is correct. I am not sure I
like it though. I think that the refcount cleanup should be done as
close to where it has been taken as possible otherwise we will end up in
this "chase the nasty details" again and again. There are definitely two
bugs here. The one introduced by e4715f01 and the other one introduced
even earlier (I haven't checked that history yet). I think we should do
something like the 2 follow up patches but if you guys think that the smaller
patch from Li is more appropriate then I will not block it.

--
Michal Hocko
SUSE Labs

2013-04-03 08:30:20

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On 04/03/2013 12:18 PM, Michal Hocko wrote:
> Dang. You are right! Glauber, is there any reason why
> memcg_kmem_mark_dead checks only KMEM_ACCOUNTED_ACTIVE rather than
> KMEM_ACCOUNTED_MASK?
>
> This all is very confusing to say the least.
Yes, it is.

In kmemcg we need to differentiate between "active" and "activated"
states due to static branches management. This is only important in the
first activation, to make sure the static branches patching are
synchronized.

>From this point on, the ACTIVE flag is the one we should be looking at.

Again, I fully agree it is complicated, but being that a property of the
static branches (we tried to fix it in the static branches itself but
without a lot of luck, since by their design they patch one site at a
time). I tried to overcome this by testing handcrafted errors and
documenting the states as well as I could.

But that not always work *that* well. Maybe we can use the results of
this discussion to document the tear down process a bit more?

2013-04-03 08:39:09

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

>>> But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
>>> error path.
>>>
>>
>> But memcg_kmem_mark_dead() checks the ACCOUNT flag not the ACCOUNTED flag.
>> Am I missing something?
>>
>
> Dang. You are right! Glauber, is there any reason why
> memcg_kmem_mark_dead checks only KMEM_ACCOUNTED_ACTIVE rather than
> KMEM_ACCOUNTED_MASK?
>
> This all is very confusing to say the least.
>
> Anyway, this all means that Li's first patch is correct. I am not sure I
> like it though. I think that the refcount cleanup should be done as
> close to where it has been taken as possible otherwise we will end up in
> this "chase the nasty details" again and again. There are definitely two
> bugs here. The one introduced by e4715f01 and the other one introduced
> even earlier (I haven't checked that history yet). I think we should do
> something like the 2 follow up patches but if you guys think that the smaller
> patch from Li is more appropriate then I will not block it.
>

Or we can queue my patch for 3.9, and then see if we want to change the
tear down process, and if yes we make the change for 3.10.

2013-04-03 08:51:03

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails

On Wed 03-04-13 16:37:53, Li Zefan wrote:
> >>> But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
> >>> error path.
> >>>
> >>
> >> But memcg_kmem_mark_dead() checks the ACCOUNT flag not the ACCOUNTED flag.
> >> Am I missing something?
> >>
> >
> > Dang. You are right! Glauber, is there any reason why
> > memcg_kmem_mark_dead checks only KMEM_ACCOUNTED_ACTIVE rather than
> > KMEM_ACCOUNTED_MASK?
> >
> > This all is very confusing to say the least.
> >
> > Anyway, this all means that Li's first patch is correct. I am not sure I
> > like it though. I think that the refcount cleanup should be done as
> > close to where it has been taken as possible otherwise we will end up in
> > this "chase the nasty details" again and again. There are definitely two
> > bugs here. The one introduced by e4715f01 and the other one introduced
> > even earlier (I haven't checked that history yet). I think we should do
> > something like the 2 follow up patches but if you guys think that the smaller
> > patch from Li is more appropriate then I will not block it.
> >
>
> Or we can queue my patch for 3.9, and then see if we want to change the
> tear down process, and if yes we make the change for 3.10.

OK, I thought it would be easier but I always end up with something
similar to your patch. So feel free to add my Acked-by and parts of my
changelog that fit (namely obvious bug introduced by e4715f01 and
documentnation of the clean-up path). I have a split up version in case
others like it more - will follow.

Thanks!
--
Michal Hocko
SUSE Labs

2013-04-03 08:54:08

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 1/2] Revert "memcg: avoid dangling reference count in creation failure."

This reverts commit e4715f01be697a3730c78f8ffffb595591d6a88c

mem_cgroup_put is hierarchy aware so mem_cgroup_put(memcg) already drops
an additional reference from all parents so the additional
mem_cgrroup_put(parent) potentially causes use-after-free.

Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
mm/memcontrol.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f608546..6de6d70 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6424,8 +6424,6 @@ mem_cgroup_css_online(struct cgroup *cont)
* call __mem_cgroup_free, so return directly
*/
mem_cgroup_put(memcg);
- if (parent->use_hierarchy)
- mem_cgroup_put(parent);
}
return error;
}
--
1.7.10.4

2013-04-03 08:54:06

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 2/2] memcg, kmem: clean up reference count handling on the error path

mem_cgroup_css_online calls mem_cgroup_put if memcg_init_kmem
fails. This is not correct because only memcg_propagate_kmem takes an
additional reference while mem_cgroup_sockets_init is allowed to fail as
well (although no current implementation fails) but it doesn't take any
reference. This all suggests that it should be memcg_propagate_kmem that
should clean up after itself so this patch moves mem_cgroup_put over
there.
Unfortunately this is not that easy (as pointed out by Li Zefan) because
memcg_kmem_mark_dead marks the group dead (KMEM_ACCOUNTED_DEAD) if it
is marked active (KMEM_ACCOUNTED_ACTIVE) which is the case even if
memcg_propagate_kmem fails so the additional reference is dropped in
that case in kmem_cgroup_destroy which means that the reference would be
dropped two times.

The easiest way then would be to simply remove mem_cgrroup_put from
mem_cgroup_css_online and rely on kmem_cgroup_destroy doing the right
thing.

Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
mm/memcontrol.c | 8 --------
1 file changed, 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6de6d70..65b2850 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6417,14 +6417,6 @@ mem_cgroup_css_online(struct cgroup *cont)

error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
mutex_unlock(&memcg_create_mutex);
- if (error) {
- /*
- * We call put now because our (and parent's) refcnts
- * are already in place. mem_cgroup_put() will internally
- * call __mem_cgroup_free, so return directly
- */
- mem_cgroup_put(memcg);
- }
return error;
}

--
1.7.10.4

2013-04-03 09:48:58

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 2/2] memcg, kmem: clean up reference count handling on the error path

On Wed 03-04-13 10:53:54, Michal Hocko wrote:
> mem_cgroup_css_online calls mem_cgroup_put if memcg_init_kmem
> fails. This is not correct because only memcg_propagate_kmem takes an
> additional reference while mem_cgroup_sockets_init is allowed to fail as
> well (although no current implementation fails) but it doesn't take any
> reference. This all suggests that it should be memcg_propagate_kmem that
> should clean up after itself so this patch moves mem_cgroup_put over
> there.
> Unfortunately this is not that easy (as pointed out by Li Zefan) because
> memcg_kmem_mark_dead marks the group dead (KMEM_ACCOUNTED_DEAD) if it
> is marked active (KMEM_ACCOUNTED_ACTIVE) which is the case even if
> memcg_propagate_kmem fails so the additional reference is dropped in
> that case in kmem_cgroup_destroy which means that the reference would be
> dropped two times.
>
> The easiest way then would be to simply remove mem_cgrroup_put from
> mem_cgroup_css_online and rely on kmem_cgroup_destroy doing the right
> thing.

Forgot to mention that this one could be marked for stable for 3.8

> Signed-off-by: Li Zefan <[email protected]>
> Signed-off-by: Michal Hocko <[email protected]>
> ---
> mm/memcontrol.c | 8 --------
> 1 file changed, 8 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6de6d70..65b2850 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6417,14 +6417,6 @@ mem_cgroup_css_online(struct cgroup *cont)
>
> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> mutex_unlock(&memcg_create_mutex);
> - if (error) {
> - /*
> - * We call put now because our (and parent's) refcnts
> - * are already in place. mem_cgroup_put() will internally
> - * call __mem_cgroup_free, so return directly
> - */
> - mem_cgroup_put(memcg);
> - }
> return error;
> }
>
> --
> 1.7.10.4
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

--
Michal Hocko
SUSE Labs