2008-08-16 02:32:30

by Munehiro Ikeda

Subject: [PATCH] cgroup: memory.force_empty can make system slowdown

The cgroup memory controller has a control file, "memory.force_empty",
that resets the usage charged to a cgroup. The accounting shouldn't
be reset while one or more processes are attached to the cgroup (at
least for the memory controller, IMHO), so mem_cgroup_force_empty()
is implemented to return -EBUSY and do nothing in that case.
However, the cgroup at the hierarchy root appears to be an unintended
exception. Even if processes are attached to the root cgroup (which is
the "default" cgroup for processes), force-empty can be started by
writing anything to memory.force_empty, and it will never finish.

The following patch prevents this issue.

This patch is for the cgroup infrastructure code. The issue could also
be addressed in the memory controller code, namely by changing
mem_cgroup_force_empty() to check the CSS_ROOT bit of css->flags.
I believe the cgroup->count approach in the patch below is rather
generic and reasonable. How does that sound?

Paul, Balbir?



Signed-off-by: Munehiro "Muuhh" Ikeda <[email protected]>

diff -uNrp linux-2.6.27-rc3.orig/kernel/cgroup.c linux-2.6.27-rc3/kernel/cgroup.c
--- linux-2.6.27-rc3.orig/kernel/cgroup.c 2008-08-12 21:55:39.000000000 -0400
+++ linux-2.6.27-rc3/kernel/cgroup.c 2008-08-15 20:52:52.000000000 -0400
@@ -2264,8 +2264,10 @@ static void init_cgroup_css(struct cgrou
 	css->cgroup = cgrp;
 	atomic_set(&css->refcnt, 0);
 	css->flags = 0;
-	if (cgrp == dummytop)
+	if (cgrp == dummytop) {
 		set_bit(CSS_ROOT, &css->flags);
+		atomic_set(&css->cgroup->count, 1);
+	}
 	BUG_ON(cgrp->subsys[ss->subsys_id]);
 	cgrp->subsys[ss->subsys_id] = css;
 }



--
IKEDA, Munehiro
NEC Corporation of America
[email protected]


2008-08-17 01:17:30

by Li Zefan

Subject: Re: [PATCH] cgroup: memory.force_empty can make system slowdown

IKEDA, Munehiro wrote:
> Cgroup's memory controller has a control file "memory.force_empty"
> to reset usage account charged to a cgroup. The account shouldn't
> be reset if one or more processes are attached to the cgroup (at
> least for memory controller, IMHO). So mem_cgroup_force_empty()
> is implemented to return -EBUSY and do nothing if so.
> However, cgroup on hierarchy root faultily might be a exception.
> Even if processes are attached to root cgroup (which is a "default"
> cgroup for processes), forcing-empty can run by writing something to
> memory.force_empty and it'll never end.
>

I found this bug last week, and I've made patches to fix it, but then
I was on vacation. I'll send the patches out soon.

> Following patch prevents this issue.
>
> This patch is for cgroup infrastructure code. The issue can be
> measured by modifying memory controller code also, namely to change
> mem_cgroup_force_empty() to see CSS_ROOT bit of css->flags.
> I believe cgroup->count approach like the patch below is rather
> generic and reasonable, how does that sound?
>

It's ok for the top_cgroup's count to be 0, due to the top_cgroup hack.
With this patch, the top cgroup's count will always be >0, even if it
has no tasks in it, so writing to top_cgroup's force_empty will always
return -EBUSY.
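
To make the concern above concrete, here is a minimal user-space sketch,
not kernel code. It assumes force_empty refuses whenever the cgroup's
count is >0, as described earlier in the thread. The names model_cgroup
and model_force_empty are hypothetical stand-ins, and the count is a
plain int rather than an atomic_t.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields discussed in this thread. */
struct model_cgroup {
	int count;	/* models cgroup->count */
	long usage;	/* models the usage charged to the cgroup */
};

/*
 * Models the described check: refuse with -EBUSY while count > 0,
 * otherwise reset the charged usage.
 */
static int model_force_empty(struct model_cgroup *cg)
{
	assert(cg != NULL);
	if (cg->count > 0)
		return -EBUSY;
	cg->usage = 0;
	return 0;
}
```

If the top cgroup's count is pinned at 1 at init time, as the patch does,
this model returns -EBUSY for it no matter how many tasks it holds.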


2008-08-17 03:11:51

by Li Zefan

Subject: Re: [PATCH] cgroup: memory.force_empty can make system slowdown

Li Zefan wrote:
> IKEDA, Munehiro wrote:
>> Cgroup's memory controller has a control file "memory.force_empty"
>> to reset usage account charged to a cgroup. The account shouldn't
>> be reset if one or more processes are attached to the cgroup (at
>> least for memory controller, IMHO). So mem_cgroup_force_empty()
>> is implemented to return -EBUSY and do nothing if so.
>> However, cgroup on hierarchy root faultily might be a exception.
>> Even if processes are attached to root cgroup (which is a "default"
>> cgroup for processes), forcing-empty can run by writing something to
>> memory.force_empty and it'll never end.
>>
>
> I found this bug last week, and I've made patches to fix it, but then
> I was on vacation. I'll send the patches out soon.
>
>> Following patch prevents this issue.
>>
>> This patch is for cgroup infrastructure code. The issue can be
>> measured by modifying memory controller code also, namely to change
>> mem_cgroup_force_empty() to see CSS_ROOT bit of css->flags.
>> I believe cgroup->count approach like the patch below is rather
>> generic and reasonable, how does that sound?
>>
>
> It's ok for the top_group's count to be 0 due to the top_cgroup hack.
> With this patch, the top cgroup's count will be always >0, even if it
> has no tasks in it, so writing to top_cgroup's force_empty will always
> return -EBUSY.
>

I thought cgrp->css_sets would be empty when there are no tasks in the top cgroup,
but I was wrong: init_css_set's refcount is always >0,
so cgroup_task_count() won't return 0 for the top cgroup:

# mount -t cgroup -o debug xxx /mnt
# mkdir /mnt/sub
# for pid in `cat /mnt/tasks`; do echo $pid > /mnt/sub/tasks; done
# cat /mnt/tasks
# cat /mnt/debug.taskcount
3
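
The reason the count stays nonzero can be sketched in user space as
follows. This is only a model, under the assumption that the task count
is computed by summing the refcounts of the css_sets linked to the
cgroup. The names model_css_set and model_task_count are hypothetical,
and the refcount value used in the note below is illustrative, not taken
from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a css_set: only its refcount matters here. */
struct model_css_set {
	int refcount;
};

/*
 * Models a task count computed as the sum of the refcounts of all
 * css_sets linked to a cgroup.
 */
static int model_task_count(const struct model_css_set *const *sets, size_t n)
{
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		assert(sets[i] != NULL);
		count += sets[i]->refcount;
	}
	return count;
}
```

In this model, a set that always stays linked to the top cgroup and
always holds at least one reference (the role init_css_set plays above)
keeps the sum above 0 even when no tasks remain.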