2014-02-11 08:06:04

by Zefan Li

[permalink] [raw]
Subject: [PATCH] cgroup: protect modifications to cgroup_idr with cgroup_mutex

Setup cgroupfs like this:
# mount -t cgroup -o cpuacct xxx /cgroup
# mkdir /cgroup/sub1
# mkdir /cgroup/sub2

Then run these two commands:
# for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /mnt/sub1/tmp; } &
# for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /mnt/sub2/tmp; } &

After seconds you may see this warning:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0()
idr_remove called for id=6 which is not allocated.
...
Call Trace:
[<ffffffff8156063c>] dump_stack+0x7a/0x96
[<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0
[<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50
[<ffffffff81300aa7>] sub_remove+0x87/0x1b0
[<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0
[<ffffffff81300bf5>] idr_remove+0x25/0xd0
[<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0
[<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0
[<ffffffff8107cdbc>] process_one_work+0x26c/0x550
[<ffffffff8107eefe>] worker_thread+0x12e/0x3b0
[<ffffffff81085f96>] kthread+0xe6/0xf0
[<ffffffff81570bac>] ret_from_fork+0x7c/0xb0
---[ end trace 2d1577ec10cf80d0 ]---

It's because allocating/removing cgroup ID is not properly synchronized.

The bug was introduced when we converted cgroup_ida to cgroup_idr.
While synchronization is already done inside ida_simple_{get,remove}(),
users are responsible for concurrent calls to idr_{alloc,remove}().

Fixes: 4e96ee8e981b ("cgroup: convert cgroup_ida to cgroup_idr")
Cc: <[email protected]> #3.12+
Reported-by: Michal Hocko <[email protected]>
Signed-off-by: Li Zefan <[email protected]>
---
include/linux/cgroup.h | 2 ++
kernel/cgroup.c | 30 ++++++++++++++++--------------
2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 5c09759..9450f02 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -166,6 +166,8 @@ struct cgroup {
*
* The ID of the root cgroup is always 0, and a new cgroup
* will be assigned with a smallest available ID.
+ *
+ * Allocating/Removing ID must be protected by cgroup_mutex.
*/
int id;

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e2f46ba..4bbaadd 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -886,7 +886,9 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode)
* per-subsystem and moved to css->id so that lookups are
* successful until the target css is released.
*/
+ mutex_lock(&cgroup_mutex);
idr_remove(&cgrp->root->cgroup_idr, cgrp->id);
+ mutex_unlock(&cgroup_mutex);
cgrp->id = -1;

call_rcu(&cgrp->rcu_head, cgroup_free_rcu);
@@ -4173,14 +4175,6 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
rcu_assign_pointer(cgrp->name, name);

/*
- * Temporarily set the pointer to NULL, so idr_find() won't return
- * a half-baked cgroup.
- */
- cgrp->id = idr_alloc(&root->cgroup_idr, NULL, 1, 0, GFP_KERNEL);
- if (cgrp->id < 0)
- goto err_free_name;
-
- /*
* Only live parents can have children. Note that the liveliness
* check isn't strictly necessary because cgroup_mkdir() and
* cgroup_rmdir() are fully synchronized by i_mutex; however, do it
@@ -4189,9 +4183,17 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
*/
if (!cgroup_lock_live_group(parent)) {
err = -ENODEV;
- goto err_free_id;
+ goto err_free_name;
}

+ /*
+ * Temporarily set the pointer to NULL, so idr_find() won't return
+ * a half-baked cgroup.
+ */
+ cgrp->id = idr_alloc(&root->cgroup_idr, NULL, 1, 0, GFP_KERNEL);
+ if (cgrp->id < 0)
+ goto err_unlock;
+
/* Grab a reference on the superblock so the hierarchy doesn't
* get deleted on unmount if there are child cgroups. This
* can be done outside cgroup_mutex, since the sb can't
@@ -4221,7 +4223,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
*/
err = cgroup_create_file(dentry, S_IFDIR | mode, sb);
if (err < 0)
- goto err_unlock;
+ goto err_free_id;
lockdep_assert_held(&dentry->d_inode->i_mutex);

cgrp->serial_nr = cgroup_serial_nr_next++;
@@ -4257,12 +4259,12 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,

return 0;

-err_unlock:
- mutex_unlock(&cgroup_mutex);
- /* Release the reference count that we took on the superblock */
- deactivate_super(sb);
err_free_id:
idr_remove(&root->cgroup_idr, cgrp->id);
+ /* Release the reference count that we took on the superblock */
+ deactivate_super(sb);
+err_unlock:
+ mutex_unlock(&cgroup_mutex);
err_free_name:
kfree(rcu_dereference_raw(cgrp->name));
err_free_cgrp:
--
1.8.0.2


2014-02-11 10:20:42

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] cgroup: protect modifications to cgroup_idr with cgroup_mutex

Hi Li,
good work in reproducing the issue so quickly!
I have tried to backport this patch to 3.12 kernel but the code has
changed since then.
The only two instances of idr_remove which are called outside of
cgroup_mutex seem to be:
- cgroup_create calling it from err_free_id: path
- css_free_work_fn
mem_cgroup_css_free
__mem_cgroup_free
free_css_id

The second one takes ss->id_lock spinlock which should be sufficient
to exclude get_new_cssid but cgroup_mount and cgroup_create don't use
id_lock. They do hold cgroup_mutex though. So I guess I need something
like the following (I will have it tested):
---
Date: Tue, 11 Feb 2014 16:05:46 +0800
From: Li Zefan <[email protected]>
Subject: [PATCH - for 3.12] cgroup: protect modifications to cgroup_idr with cgroup_mutex

Setup cgroupfs like this:
# mount -t cgroup -o cpuacct xxx /cgroup
# mkdir /cgroup/sub1
# mkdir /cgroup/sub2

Then run these two commands:
# for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /mnt/sub1/tmp; } &
# for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /mnt/sub2/tmp; } &

After seconds you may see this warning:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0()
idr_remove called for id=6 which is not allocated.
...
Call Trace:
[<ffffffff8156063c>] dump_stack+0x7a/0x96
[<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0
[<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50
[<ffffffff81300aa7>] sub_remove+0x87/0x1b0
[<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0
[<ffffffff81300bf5>] idr_remove+0x25/0xd0
[<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0
[<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0
[<ffffffff8107cdbc>] process_one_work+0x26c/0x550
[<ffffffff8107eefe>] worker_thread+0x12e/0x3b0
[<ffffffff81085f96>] kthread+0xe6/0xf0
[<ffffffff81570bac>] ret_from_fork+0x7c/0xb0
---[ end trace 2d1577ec10cf80d0 ]---

It's because allocating/removing cgroup ID is not properly synchronized.

The bug was introduced when we converted cgroup_ida to cgroup_idr.
While synchronization is already done inside ida_simple_{get,remove}(),
users are responsible for concurrent calls to idr_{alloc,remove}().

[[email protected]: ported to 3.12]
Fixes: 4e96ee8e981b ("cgroup: convert cgroup_ida to cgroup_idr")
Cc: <[email protected]> #3.12+
Reported-by: Michal Hocko <[email protected]>
Signed-off-by: Li Zefan <[email protected]>
---
include/linux/cgroup.h | 2 ++
kernel/cgroup.c | 25 ++++++++++++++-----------
2 files changed, 16 insertions(+), 11 deletions(-)

--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -169,6 +169,8 @@ struct cgroup {
*
* The ID of the root cgroup is always 0, and a new cgroup
* will be assigned with a smallest available ID.
+ *
+ * Allocating/Removing ID must be protected by cgroup_mutex.
*/
int id;

--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4410,14 +4410,6 @@ static long cgroup_create(struct cgroup
rcu_assign_pointer(cgrp->name, name);

/*
- * Temporarily set the pointer to NULL, so idr_find() won't return
- * a half-baked cgroup.
- */
- cgrp->id = idr_alloc(&root->cgroup_idr, NULL, 1, 0, GFP_KERNEL);
- if (cgrp->id < 0)
- goto err_free_name;
-
- /*
* Only live parents can have children. Note that the liveliness
* check isn't strictly necessary because cgroup_mkdir() and
* cgroup_rmdir() are fully synchronized by i_mutex; however, do it
@@ -4426,9 +4418,17 @@ static long cgroup_create(struct cgroup
*/
if (!cgroup_lock_live_group(parent)) {
err = -ENODEV;
- goto err_free_id;
+ goto err_free_name;
}

+ /*
+ * Temporarily set the pointer to NULL, so idr_find() won't return
+ * a half-baked cgroup.
+ */
+ cgrp->id = idr_alloc(&root->cgroup_idr, NULL, 1, 0, GFP_KERNEL);
+ if (cgrp->id < 0)
+ goto err_unlock;
+
/* Grab a reference on the superblock so the hierarchy doesn't
* get deleted on unmount if there are child cgroups. This
* can be done outside cgroup_mutex, since the sb can't
@@ -4542,11 +4542,11 @@ err_free_all:
ss->css_free(css);
}
}
+ idr_remove(&root->cgroup_idr, cgrp->id);
+err_unlock:
mutex_unlock(&cgroup_mutex);
/* Release the reference count that we took on the superblock */
deactivate_super(sb);
-err_free_id:
- idr_remove(&root->cgroup_idr, cgrp->id);
err_free_name:
kfree(rcu_dereference_raw(cgrp->name));
err_free_cgrp:
@@ -5622,9 +5622,12 @@ void free_css_id(struct cgroup_subsys *s

rcu_assign_pointer(id->css, NULL);
rcu_assign_pointer(css->id, NULL);
+
+ mutex_lock(&cgroup_mutex);
spin_lock(&ss->id_lock);
idr_remove(&ss->idr, id->id);
spin_unlock(&ss->id_lock);
+ mutex_unlock(&cgroup_mutex);
kfree_rcu(id, rcu_head);
}
EXPORT_SYMBOL_GPL(free_css_id);
--
Michal Hocko
SUSE Labs

2014-02-11 15:36:35

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] cgroup: protect modifications to cgroup_idr with cgroup_mutex

On Tue, Feb 11, 2014 at 04:05:46PM +0800, Li Zefan wrote:
> Setup cgroupfs like this:
> # mount -t cgroup -o cpuacct xxx /cgroup
> # mkdir /cgroup/sub1
> # mkdir /cgroup/sub2
>
> Then run these two commands:
> # for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /mnt/sub1/tmp; } &
> # for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /mnt/sub2/tmp; } &
>
> After seconds you may see this warning:
>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0()
> idr_remove called for id=6 which is not allocated.
> ...
> Call Trace:
> [<ffffffff8156063c>] dump_stack+0x7a/0x96
> [<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0
> [<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50
> [<ffffffff81300aa7>] sub_remove+0x87/0x1b0
> [<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0
> [<ffffffff81300bf5>] idr_remove+0x25/0xd0
> [<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0
> [<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0
> [<ffffffff8107cdbc>] process_one_work+0x26c/0x550
> [<ffffffff8107eefe>] worker_thread+0x12e/0x3b0
> [<ffffffff81085f96>] kthread+0xe6/0xf0
> [<ffffffff81570bac>] ret_from_fork+0x7c/0xb0
> ---[ end trace 2d1577ec10cf80d0 ]---
>
> It's because allocating/removing cgroup ID is not properly synchronized.
>
> The bug was introduced when we converted cgroup_ida to cgroup_idr.
> While synchronization is already done inside ida_simple_{get,remove}(),
> users are responsible for concurrent calls to idr_{alloc,remove}().
>
> Fixes: 4e96ee8e981b ("cgroup: convert cgroup_ida to cgroup_idr")
> Cc: <[email protected]> #3.12+
> Reported-by: Michal Hocko <[email protected]>
> Signed-off-by: Li Zefan <[email protected]>

Jesus, you're right. I missed that completely. It conflicts with the
return value fix in cgroup/for-3.14-fixes but easy to fix up.
Applying to cgroup/for-3.14-fixes.

Thanks!

--
tejun

2014-02-11 15:41:12

by Tejun Heo

[permalink] [raw]
Subject: [PATCH cgroup/for-3.14-fixes] cgroup: protect modifications to cgroup_idr with cgroup_mutex

Hello,

This is the slightly modified patch that I applied to
cgroup/for-3.14-fixes.

Thanks a lot!

-------- 8< --------
>From 0ab02ca8f887908152d1a96db5130fc661d36a1e Mon Sep 17 00:00:00 2001
From: Li Zefan <[email protected]>
Date: Tue, 11 Feb 2014 16:05:46 +0800

Setup cgroupfs like this:
# mount -t cgroup -o cpuacct xxx /cgroup
# mkdir /cgroup/sub1
# mkdir /cgroup/sub2

Then run these two commands:
# for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /mnt/sub1/tmp; } &
# for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /mnt/sub2/tmp; } &

After seconds you may see this warning:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0()
idr_remove called for id=6 which is not allocated.
...
Call Trace:
[<ffffffff8156063c>] dump_stack+0x7a/0x96
[<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0
[<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50
[<ffffffff81300aa7>] sub_remove+0x87/0x1b0
[<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0
[<ffffffff81300bf5>] idr_remove+0x25/0xd0
[<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0
[<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0
[<ffffffff8107cdbc>] process_one_work+0x26c/0x550
[<ffffffff8107eefe>] worker_thread+0x12e/0x3b0
[<ffffffff81085f96>] kthread+0xe6/0xf0
[<ffffffff81570bac>] ret_from_fork+0x7c/0xb0
---[ end trace 2d1577ec10cf80d0 ]---

It's because allocating/removing cgroup ID is not properly synchronized.

The bug was introduced when we converted cgroup_ida to cgroup_idr.
While synchronization is already done inside ida_simple_{get,remove}(),
users are responsible for concurrent calls to idr_{alloc,remove}().

tj: Refreshed on top of b58c89986a77 ("cgroup: fix error return from
cgroup_create()").

Fixes: 4e96ee8e981b ("cgroup: convert cgroup_ida to cgroup_idr")
Cc: <[email protected]> #3.12+
Reported-by: Michal Hocko <[email protected]>
Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/cgroup.h | 2 ++
kernel/cgroup.c | 34 ++++++++++++++++++----------------
2 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 5c09759..9450f02 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -166,6 +166,8 @@ struct cgroup {
*
* The ID of the root cgroup is always 0, and a new cgroup
* will be assigned with a smallest available ID.
+ *
+ * Allocating/Removing ID must be protected by cgroup_mutex.
*/
int id;

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 3edf716..52719ce 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -886,7 +886,9 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode)
* per-subsystem and moved to css->id so that lookups are
* successful until the target css is released.
*/
+ mutex_lock(&cgroup_mutex);
idr_remove(&cgrp->root->cgroup_idr, cgrp->id);
+ mutex_unlock(&cgroup_mutex);
cgrp->id = -1;

call_rcu(&cgrp->rcu_head, cgroup_free_rcu);
@@ -4168,16 +4170,6 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
rcu_assign_pointer(cgrp->name, name);

/*
- * Temporarily set the pointer to NULL, so idr_find() won't return
- * a half-baked cgroup.
- */
- cgrp->id = idr_alloc(&root->cgroup_idr, NULL, 1, 0, GFP_KERNEL);
- if (cgrp->id < 0) {
- err = -ENOMEM;
- goto err_free_name;
- }
-
- /*
* Only live parents can have children. Note that the liveliness
* check isn't strictly necessary because cgroup_mkdir() and
* cgroup_rmdir() are fully synchronized by i_mutex; however, do it
@@ -4186,7 +4178,17 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
*/
if (!cgroup_lock_live_group(parent)) {
err = -ENODEV;
- goto err_free_id;
+ goto err_free_name;
+ }
+
+ /*
+ * Temporarily set the pointer to NULL, so idr_find() won't return
+ * a half-baked cgroup.
+ */
+ cgrp->id = idr_alloc(&root->cgroup_idr, NULL, 1, 0, GFP_KERNEL);
+ if (cgrp->id < 0) {
+ err = -ENOMEM;
+ goto err_unlock;
}

/* Grab a reference on the superblock so the hierarchy doesn't
@@ -4218,7 +4220,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
*/
err = cgroup_create_file(dentry, S_IFDIR | mode, sb);
if (err < 0)
- goto err_unlock;
+ goto err_free_id;
lockdep_assert_held(&dentry->d_inode->i_mutex);

cgrp->serial_nr = cgroup_serial_nr_next++;
@@ -4254,12 +4256,12 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,

return 0;

-err_unlock:
- mutex_unlock(&cgroup_mutex);
- /* Release the reference count that we took on the superblock */
- deactivate_super(sb);
err_free_id:
idr_remove(&root->cgroup_idr, cgrp->id);
+ /* Release the reference count that we took on the superblock */
+ deactivate_super(sb);
+err_unlock:
+ mutex_unlock(&cgroup_mutex);
err_free_name:
kfree(rcu_dereference_raw(cgrp->name));
err_free_cgrp:
--
1.8.5.3

2014-02-11 16:35:04

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH cgroup/for-3.14-fixes] cgroup: protect modifications to cgroup_idr with cgroup_mutex

On Tue 11-02-14 10:41:05, Tejun Heo wrote:
[...]
> @@ -4254,12 +4256,12 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
>
> return 0;
>
> -err_unlock:
> - mutex_unlock(&cgroup_mutex);
> - /* Release the reference count that we took on the superblock */
> - deactivate_super(sb);
> err_free_id:
> idr_remove(&root->cgroup_idr, cgrp->id);
> + /* Release the reference count that we took on the superblock */
> + deactivate_super(sb);
> +err_unlock:
> + mutex_unlock(&cgroup_mutex);
> err_free_name:
> kfree(rcu_dereference_raw(cgrp->name));
> err_free_cgrp:

Do I have to change deactivate_super vs. mutex_unlock ordering in my
backport for 3.12 as well?
--
Michal Hocko
SUSE Labs

2014-02-12 01:51:33

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH] cgroup: protect modifications to cgroup_idr with cgroup_mutex

On 2014/2/11 18:20, Michal Hocko wrote:
> Hi Li,
> good work in reproducing the issue so quickly!
> I have tried to backport this patch to 3.12 kernel but the code has
> changed since then.
> The only two instances of idr_remove which are called outside of
> cgroup_mutex seem to be:
> - cgroup_create calling it from err_free_id: path
> - css_free_work_fn
> mem_cgroup_css_free
> __mem_cgroup_free
> free_css_id
>
> The second one takes ss->id_lock spinlock which should be sufficient
> to exclude get_new_cssid but cgroup_mount and cgroup_create don't use
> id_lock. They do hold cgroup_mutex though. So I guess I need something
> like the following (I will have it tested):

I don't think you need to do anything with ss->idr.

cgroup_create() calls alloc_css_id() -> get_new_cssid(), and get_new_cssid()
uses id_lock.

cgroup_mount() won't touch ss->idr, because the css_id for root cgroup is
always 0.

2014-02-12 02:17:03

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH cgroup/for-3.14-fixes] cgroup: protect modifications to cgroup_idr with cgroup_mutex

On 2014/2/12 0:26, Michal Hocko wrote:
> On Tue 11-02-14 10:41:05, Tejun Heo wrote:
> [...]
>> @@ -4254,12 +4256,12 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
>>
>> return 0;
>>
>> -err_unlock:
>> - mutex_unlock(&cgroup_mutex);
>> - /* Release the reference count that we took on the superblock */
>> - deactivate_super(sb);
>> err_free_id:
>> idr_remove(&root->cgroup_idr, cgrp->id);
>> + /* Release the reference count that we took on the superblock */
>> + deactivate_super(sb);
>> +err_unlock:
>> + mutex_unlock(&cgroup_mutex);
>> err_free_name:
>> kfree(rcu_dereference_raw(cgrp->name));
>> err_free_cgrp:
>
> Do I have to change deactivate_super vs. mutex_unlock ordering in my
> backport for 3.12 as well?
>

Your change is wrong that you shouldn't drop sb refcnt in err_unlock path.

But you made me think if it's OK to hold cgroup_mutex while calling deactivate_super(),
and the answer is NO! deactive_super() may call cgroup_kill_sb() which will
acquire cgroup_mutex.

I'll update the patch.

Thank Tejun we won't be entangled with vfs internal anymore after coverting
to kernfs.

2014-02-12 02:32:33

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH cgroup/for-3.14-fixes] cgroup: protect modifications to cgroup_idr with cgroup_mutex

On 2014/2/12 10:15, Li Zefan wrote:
> On 2014/2/12 0:26, Michal Hocko wrote:
>> On Tue 11-02-14 10:41:05, Tejun Heo wrote:
>> [...]
>>> @@ -4254,12 +4256,12 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
>>>
>>> return 0;
>>>
>>> -err_unlock:
>>> - mutex_unlock(&cgroup_mutex);
>>> - /* Release the reference count that we took on the superblock */
>>> - deactivate_super(sb);
>>> err_free_id:
>>> idr_remove(&root->cgroup_idr, cgrp->id);
>>> + /* Release the reference count that we took on the superblock */
>>> + deactivate_super(sb);
>>> +err_unlock:
>>> + mutex_unlock(&cgroup_mutex);
>>> err_free_name:
>>> kfree(rcu_dereference_raw(cgrp->name));
>>> err_free_cgrp:
>>
>> Do I have to change deactivate_super vs. mutex_unlock ordering in my
>> backport for 3.12 as well?
>>
>
> Your change is wrong that you shouldn't drop sb refcnt in err_unlock path.
>
> But you made me think if it's OK to hold cgroup_mutex while calling deactivate_super(),
> and the answer is NO! deactive_super() may call cgroup_kill_sb() which will
> acquire cgroup_mutex.
>
> I'll update the patch.
>
> Thank Tejun we won't be entangled with vfs internal anymore after coverting
> to kernfs.
>

On second thought, it should be safe to call deactivate_super() before
releasing cgroup_mutex, as cgroup_create() is called through vfs, so vfs
should guanrantee the superblock won't disapear, so this deactivate_super()
won't drop sb refcnt to 0.

Still this is just my guess without diving into vfs code, and we'd better
not depend on it even my guess is correct.

2014-02-12 09:12:48

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] cgroup: protect modifications to cgroup_idr with cgroup_mutex

Li has pointed out that my previous backport was not correct because
err_unlock label releases a reference to supperblock which was not taken
before idr_alloc. I've also removed cgroup_mutex from free_css_id as per
Li.
Fixed in this version.

---
Date: Tue, 11 Feb 2014 16:05:46 +0800
From: Li Zefan <[email protected]>
Subject: [PATCH - for 3.12] cgroup: protect modifications to cgroup_idr with cgroup_mutex

Setup cgroupfs like this:
# mount -t cgroup -o cpuacct xxx /cgroup
# mkdir /cgroup/sub1
# mkdir /cgroup/sub2

Then run these two commands:
# for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /mnt/sub1/tmp; } &
# for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /mnt/sub2/tmp; } &

After seconds you may see this warning:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0()
idr_remove called for id=6 which is not allocated.
...
Call Trace:
[<ffffffff8156063c>] dump_stack+0x7a/0x96
[<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0
[<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50
[<ffffffff81300aa7>] sub_remove+0x87/0x1b0
[<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0
[<ffffffff81300bf5>] idr_remove+0x25/0xd0
[<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0
[<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0
[<ffffffff8107cdbc>] process_one_work+0x26c/0x550
[<ffffffff8107eefe>] worker_thread+0x12e/0x3b0
[<ffffffff81085f96>] kthread+0xe6/0xf0
[<ffffffff81570bac>] ret_from_fork+0x7c/0xb0
---[ end trace 2d1577ec10cf80d0 ]---

It's because allocating/removing cgroup ID is not properly synchronized.

The bug was introduced when we converted cgroup_ida to cgroup_idr.
While synchronization is already done inside ida_simple_{get,remove}(),
users are responsible for concurrent calls to idr_{alloc,remove}().

[[email protected]: ported to 3.12]
Fixes: 4e96ee8e981b ("cgroup: convert cgroup_ida to cgroup_idr")
Cc: <[email protected]> #3.12+
Reported-by: Michal Hocko <[email protected]>
Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
include/linux/cgroup.h | 2 ++
kernel/cgroup.c | 23 ++++++++++++-----------
2 files changed, 14 insertions(+), 11 deletions(-)

--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -169,6 +169,8 @@ struct cgroup {
*
* The ID of the root cgroup is always 0, and a new cgroup
* will be assigned with a smallest available ID.
+ *
+ * Allocating/Removing ID must be protected by cgroup_mutex.
*/
int id;

--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4410,14 +4410,6 @@ static long cgroup_create(struct cgroup
rcu_assign_pointer(cgrp->name, name);

/*
- * Temporarily set the pointer to NULL, so idr_find() won't return
- * a half-baked cgroup.
- */
- cgrp->id = idr_alloc(&root->cgroup_idr, NULL, 1, 0, GFP_KERNEL);
- if (cgrp->id < 0)
- goto err_free_name;
-
- /*
* Only live parents can have children. Note that the liveliness
* check isn't strictly necessary because cgroup_mkdir() and
* cgroup_rmdir() are fully synchronized by i_mutex; however, do it
@@ -4426,7 +4418,7 @@ static long cgroup_create(struct cgroup
*/
if (!cgroup_lock_live_group(parent)) {
err = -ENODEV;
- goto err_free_id;
+ goto err_free_name;
}

/* Grab a reference on the superblock so the hierarchy doesn't
@@ -4436,6 +4428,14 @@ static long cgroup_create(struct cgroup
* fs */
atomic_inc(&sb->s_active);

+ /*
+ * Temporarily set the pointer to NULL, so idr_find() won't return
+ * a half-baked cgroup.
+ */
+ cgrp->id = idr_alloc(&root->cgroup_idr, NULL, 1, 0, GFP_KERNEL);
+ if (cgrp->id < 0)
+ goto err_unlock;
+
init_cgroup_housekeeping(cgrp);

dentry->d_fsdata = cgrp;
@@ -4542,11 +4542,11 @@ err_free_all:
ss->css_free(css);
}
}
+ idr_remove(&root->cgroup_idr, cgrp->id);
+err_unlock:
mutex_unlock(&cgroup_mutex);
/* Release the reference count that we took on the superblock */
deactivate_super(sb);
-err_free_id:
- idr_remove(&root->cgroup_idr, cgrp->id);
err_free_name:
kfree(rcu_dereference_raw(cgrp->name));
err_free_cgrp:
--
Michal Hocko
SUSE Labs

2014-02-12 09:27:32

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH] cgroup: protect modifications to cgroup_idr with cgroup_mutex

On 2014/2/12 17:12, Michal Hocko wrote:
> Li has pointed out that my previous backport was not correct because
> err_unlock label releases a reference to supperblock which was not taken
> before idr_alloc. I've also removed cgroup_mutex from free_css_id as per
> Li.
> Fixed in this version.
>

Looks good to me!

Are you going to send it to stable mailing list after the fix hits mainline
or use it internally?

It has small conflicts with this patch that will also be backported to 3.12:

https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/commit/kernel/cgroup.c?h=for-3.14-fixes&id=b58c89986a77a23658682a100eb15d8edb571ebb

2014-02-12 09:38:36

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] cgroup: protect modifications to cgroup_idr with cgroup_mutex

On Wed 12-02-14 17:26:40, Li Zefan wrote:
> On 2014/2/12 17:12, Michal Hocko wrote:
> > Li has pointed out that my previous backport was not correct because
> > err_unlock label releases a reference to supperblock which was not taken
> > before idr_alloc. I've also removed cgroup_mutex from free_css_id as per
> > Li.
> > Fixed in this version.
> >
>
> Looks good to me!

Thanks for double checking.

> Are you going to send it to stable mailing list after the fix hits mainline
> or use it internally?

I plan to have it tested internally and then send it later on when
stable FAILED to apply mail arrives. Unless you want to send it yourself
of course.

> It has small conflicts with this patch that will also be backported to 3.12:
>
> https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/commit/kernel/cgroup.c?h=for-3.14-fixes&id=b58c89986a77a23658682a100eb15d8edb571ebb

Thanks for the heads up. Should be trivial to resolve.
--
Michal Hocko
SUSE Labs