LinuxLists.cc - [PATCH] cgroup: Fix root_count when mount fails due to busy subsystem

2009-01-23 00:49:40

Subject: [PATCH] cgroup: Fix root_count when mount fails due to busy subsystem

cgroup: Fix root_count when mount fails due to busy subsystem

root_count was being incremented in cgroup_get_sb() after all error
checking was complete, but decremented in cgroup_kill_sb(), which can
be called on a superblock that we gave up on due to an error. This
patch changes cgroup_kill_sb() to only decrement root_count if the
root was previously linked into the list of roots.

Signed-off-by: Paul Menage <[email protected]>

---

I was actually surprised to find that list_del() doesn't crash when
run on an unattached list_head structure.

kernel/cgroup.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index adcd0bb..9ce27e8 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1115,8 +1115,10 @@ static void cgroup_kill_sb(struct super_block *sb) {
}
write_unlock(&css_set_lock);

- list_del(&root->root_list);
- root_count--;
+ if (!list_empty(&root->root_list)) {
+ list_del(&root->root_list);
+ root_count--;
+ }

mutex_unlock(&cgroup_mutex);

2009-01-23 02:20:29

by Serge E. Hallyn

[permalink] [raw]

Subject: Re: [PATCH] cgroup: Fix root_count when mount fails due to busy subsystem

Quoting Paul Menage ([email protected]):
> cgroup: Fix root_count when mount fails due to busy subsystem
>
> root_count was being incremented in cgroup_get_sb() after all error
> checking was complete, but decremented in cgroup_kill_sb(), which can
> be called on a superblock that we gave up on due to an error. This
> patch changes cgroup_kill_sb() to only decrement root_count if the
> root was previously linked into the list of roots.
>
> Signed-off-by: Paul Menage <[email protected]>

Tested-by: Serge Hallyn <[email protected]>

thanks,
-serge

>
> ---
>
> I was actually surprised to find that list_del() doesn't crash when
> run on an unattached list_head structure.
>
> kernel/cgroup.c | 6 ++++--
> 1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index adcd0bb..9ce27e8 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -1115,8 +1115,10 @@ static void cgroup_kill_sb(struct super_block *sb) {
> }
> write_unlock(&css_set_lock);
>
> - list_del(&root->root_list);
> - root_count--;
> + if (!list_empty(&root->root_list)) {
> + list_del(&root->root_list);
> + root_count--;
> + }
>
> mutex_unlock(&cgroup_mutex);
>

2009-01-23 10:23:54

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH] cgroup: Fix root_count when mount fails due to busy subsystem

2009-01-23 16:59:24

by Paul Menage

[permalink] [raw]

Subject: Re: [PATCH] cgroup: Fix root_count when mount fails due to busy subsystem

On Fri, Jan 23, 2009 at 2:22 AM, Ingo Molnar <[email protected]> wrote:
>> --- a/kernel/cgroup.c
>> +++ b/kernel/cgroup.c
>> @@ -1115,8 +1115,10 @@ static void cgroup_kill_sb(struct super_block *sb) {
>> }
>> write_unlock(&css_set_lock);
>>
>> - list_del(&root->root_list);
>> - root_count--;
>> + if (!list_empty(&root->root_list)) {
>> + list_del(&root->root_list);
>> + root_count--;
>> + }
>
> That's ugly. It is _much_ cleaner to always keep the link head consistent
> - i.e. initialize it with INIT_LIST_HEAD()

It is initialized with INIT_LIST_HEAD().

> and then remove from it via
> list_del_init().

There's not much point doing list_del_init() rather than list_del()
here since we're about to delete the root.

>
> That way the error path will do the right thing automatically, and there's
> no need for that ugly "if !list_empty" construct either.

The important part here is avoiding decrementing root_count.

So the code could equally be:

if (!list_empty(&root->root_list)) {
root_count--;
}
list_del(&root->root_list);

but what I have in this patch seems more straightforward. It's
actually how the code used to be before it was removed as a
"redundant" check by a patch that I unfortunately didn't get a chance
to read properly (or Ack) because I was too snowed under with other
work.

Paul

2009-01-23 17:50:57

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH] cgroup: Fix root_count when mount fails due to busy subsystem

* Paul Menage <[email protected]> wrote:

> On Fri, Jan 23, 2009 at 2:22 AM, Ingo Molnar <[email protected]> wrote:
> >> --- a/kernel/cgroup.c
> >> +++ b/kernel/cgroup.c
> >> @@ -1115,8 +1115,10 @@ static void cgroup_kill_sb(struct super_block *sb) {
> >> }
> >> write_unlock(&css_set_lock);
> >>
> >> - list_del(&root->root_list);
> >> - root_count--;
> >> + if (!list_empty(&root->root_list)) {
> >> + list_del(&root->root_list);
> >> + root_count--;
> >> + }
> >
> > That's ugly. It is _much_ cleaner to always keep the link head consistent
> > - i.e. initialize it with INIT_LIST_HEAD()
>
> It is initialized with INIT_LIST_HEAD().
>
> > and then remove from it via
> > list_del_init().
>
> There's not much point doing list_del_init() rather than list_del()
> here since we're about to delete the root.
>
> >
> > That way the error path will do the right thing automatically, and there's
> > no need for that ugly "if !list_empty" construct either.
>
> The important part here is avoiding decrementing root_count.

that should be done via proper placement of err_* labels. An assymetric
exit path like the one you did is always the sign of bad code structure.

Ingo

2009-01-23 18:11:57

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH] cgroup: Fix root_count when mount fails due to busy subsystem

* Paul Menage <[email protected]> wrote:

> cgroup: Fix root_count when mount fails due to busy subsystem
>
> root_count was being incremented in cgroup_get_sb() after all error
> checking was complete, but decremented in cgroup_kill_sb(), which can be
> called on a superblock that we gave up on due to an error. This patch
> changes cgroup_kill_sb() to only decrement root_count if the root was
> previously linked into the list of roots.

i'm wondering, what happens in the buggy case: does cgroup_kill_sb() get
called twice (if yes, why?), or do we call cgroup_kill_sb() on a not yet
added sb and hence root_count has not been elevated yet? (if yes, which
codepath does this?)

The error handling in cgroup_get_sb() definitely seems a bit twisted -
find below a few error path and other cleanups.

Thanks,

Ingo

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index c298310..28d1b67 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1025,18 +1026,12 @@ static int cgroup_get_sb(struct file_system_type *fs_type,
* have some link structures left over
*/
ret = allocate_cg_links(css_set_count, &tmp_cg_links);
- if (ret) {
- mutex_unlock(&cgroup_mutex);
- mutex_unlock(&inode->i_mutex);
- goto drop_new_super;
- }
+ if (ret)
+ goto drop_new_super_unlock;

ret = rebind_subsystems(root, root->subsys_bits);
- if (ret == -EBUSY) {
- mutex_unlock(&cgroup_mutex);
- mutex_unlock(&inode->i_mutex);
+ if (ret == -EBUSY)
goto free_cg_links;
- }

/* EBUSY should be the only error here */
BUG_ON(ret);
@@ -1075,18 +1070,24 @@ static int cgroup_get_sb(struct file_system_type *fs_type,

free_cg_links:
free_cg_links(&tmp_cg_links);
+
+ drop_new_super_unlock:
+ mutex_unlock(&cgroup_mutex);
+ mutex_unlock(&inode->i_mutex);
drop_new_super:
up_write(&sb->s_umount);
deactivate_super(sb);
+
return ret;
}

-static void cgroup_kill_sb(struct super_block *sb) {
+static void cgroup_kill_sb(struct super_block *sb)
+{
struct cgroupfs_root *root = sb->s_fs_info;
struct cgroup *cgrp = &root->top_cgroup;
- int ret;
- struct cg_cgroup_link *link;
struct cg_cgroup_link *saved_link;
+ struct cg_cgroup_link *link;
+ int ret;

BUG_ON(!root);

2009-01-23 18:32:24

by Paul Menage

[permalink] [raw]

Subject: Re: [PATCH] cgroup: Fix root_count when mount fails due to busy subsystem

On Fri, Jan 23, 2009 at 10:10 AM, Ingo Molnar <[email protected]> wrote:
>
> * Paul Menage <[email protected]> wrote:
>
>> cgroup: Fix root_count when mount fails due to busy subsystem
>>
>> root_count was being incremented in cgroup_get_sb() after all error
>> checking was complete, but decremented in cgroup_kill_sb(), which can be
>> called on a superblock that we gave up on due to an error. This patch
>> changes cgroup_kill_sb() to only decrement root_count if the root was
>> previously linked into the list of roots.
>
> i'm wondering, what happens in the buggy case: does cgroup_kill_sb() get
> called twice (if yes, why?),

No.

> or do we call cgroup_kill_sb() on a not yet
> added sb and hence root_count has not been elevated yet?

Right.

> (if yes, which
> codepath does this?)

It's via the call to deactivate_super().

The code could be restructured such that:

- we don't set sb->s_fs_info until we've linked the new root into the root_list
- do any necessary cleanup for a failed root in cgroup_get_sb()
- have cgroup_kill_sb() handle either no root or a fully-initialized root

But then you're replacing "only decrement root_count if root was
linked in to list" with "only do root cleanup if root was atached to
sb" in cgroup_kill_sb(). I don't see that one is much cleaner than the
other.

For 2.6.29, we should fix this by reverting the broken part of the
patch that made it into 2.6.29-rcX

Paul

2009-01-23 18:38:17

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH] cgroup: Fix root_count when mount fails due to busy subsystem

* Paul Menage <[email protected]> wrote:

> On Fri, Jan 23, 2009 at 10:10 AM, Ingo Molnar <[email protected]> wrote:
> >
> > * Paul Menage <[email protected]> wrote:
> >
> >> cgroup: Fix root_count when mount fails due to busy subsystem
> >>
> >> root_count was being incremented in cgroup_get_sb() after all error
> >> checking was complete, but decremented in cgroup_kill_sb(), which can be
> >> called on a superblock that we gave up on due to an error. This patch
> >> changes cgroup_kill_sb() to only decrement root_count if the root was
> >> previously linked into the list of roots.
> >
> > i'm wondering, what happens in the buggy case: does cgroup_kill_sb() get
> > called twice (if yes, why?),
>
> No.
>
> > or do we call cgroup_kill_sb() on a not yet
> > added sb and hence root_count has not been elevated yet?
>
> Right.
>
> > (if yes, which
> > codepath does this?)
>
> It's via the call to deactivate_super().

Which exact call chain is that?

> The code could be restructured such that:
>
> - we don't set sb->s_fs_info until we've linked the new root into the root_list
> - do any necessary cleanup for a failed root in cgroup_get_sb()
> - have cgroup_kill_sb() handle either no root or a fully-initialized root
>
> But then you're replacing "only decrement root_count if root was linked
> in to list" with "only do root cleanup if root was atached to sb" in
> cgroup_kill_sb(). I don't see that one is much cleaner than the other.

Agreed, that's not an improvement.

> For 2.6.29, we should fix this by reverting the broken part of the patch
> that made it into 2.6.29-rcX

Agreed too - i withdraw my objection.

Nevertheless my observation remains: kernel/cgroup.c has a complex looking
error paths which should be cleaned up. (independently of this issue)

Ingo

2009-01-23 18:42:59

by Paul Menage

[permalink] [raw]

Subject: Re: [PATCH] cgroup: Fix root_count when mount fails due to busy subsystem

On Fri, Jan 23, 2009 at 10:36 AM, Ingo Molnar <[email protected]> wrote:
>>
>> It's via the call to deactivate_super().
>
> Which exact call chain is that?

cgroup_get_sb() -> *failure* -> deactivate_super() -> VFS stuff
(sorry, no access to a source tree right now) -> cgroup_kill_sb()

Paul