2009-11-24 06:00:04

by Daisuke Nishimura

[permalink] [raw]
Subject: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy

task_in_mem_cgroup(), which is called by select_bad_process() to check whether
a task can be a candidate for being oom-killed from memcg's limit, checks
"curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).

But this check return true(it's false positive) when:

<some path>/00 use_hierarchy == 0 <- hitting limit
<some path>/00/aa use_hierarchy == 1 <- "curr"

This leads to killing an innocent task in 00/aa. This patch is a fix for this
bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
should print information of mem_cgroup which the task being killed, not current,
belongs to.

Signed-off-by: Daisuke Nishimura <[email protected]>
---
The bug exists and should be fixed in 2.6.31.y too.
I'll post a patch for -stable later.

mm/memcontrol.c | 4 ++--
mm/oom_kill.c | 13 +++++++------
2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ea00a93..d02f9f8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -783,7 +783,7 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
task_unlock(task);
if (!curr)
return 0;
- if (curr->use_hierarchy)
+ if (mem->use_hierarchy)
ret = css_is_ancestor(&curr->css, &mem->css);
else
ret = (curr == mem);
@@ -1032,7 +1032,7 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
static char memcg_name[PATH_MAX];
int ret;

- if (!memcg)
+ if (!memcg || !p)
return;


diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ab04537..be56461 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -356,7 +356,8 @@ static void dump_tasks(const struct mem_cgroup *mem)
} while_each_thread(g, p);
}

-static void dump_header(gfp_t gfp_mask, int order, struct mem_cgroup *mem)
+static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
+ struct mem_cgroup *mem)
{
pr_warning("%s invoked oom-killer: gfp_mask=0x%x, order=%d, "
"oom_adj=%d\n",
@@ -365,7 +366,7 @@ static void dump_header(gfp_t gfp_mask, int order, struct mem_cgroup *mem)
cpuset_print_task_mems_allowed(current);
task_unlock(current);
dump_stack();
- mem_cgroup_print_oom_info(mem, current);
+ mem_cgroup_print_oom_info(mem, p);
show_mem();
if (sysctl_oom_dump_tasks)
dump_tasks(mem);
@@ -440,7 +441,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
struct task_struct *c;

if (printk_ratelimit())
- dump_header(gfp_mask, order, mem);
+ dump_header(p, gfp_mask, order, mem);

/*
* If the task is already exiting, don't alarm the sysadmin or kill
@@ -576,7 +577,7 @@ retry:
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!p) {
read_unlock(&tasklist_lock);
- dump_header(gfp_mask, order, NULL);
+ dump_header(NULL, gfp_mask, order, NULL);
panic("Out of memory and no killable processes...\n");
}

@@ -644,7 +645,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
return;

if (sysctl_panic_on_oom == 2) {
- dump_header(gfp_mask, order, NULL);
+ dump_header(NULL, gfp_mask, order, NULL);
panic("out of memory. Compulsory panic_on_oom is selected.\n");
}

@@ -663,7 +664,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,

case CONSTRAINT_NONE:
if (sysctl_panic_on_oom) {
- dump_header(gfp_mask, order, NULL);
+ dump_header(NULL, gfp_mask, order, NULL);
panic("out of memory. panic_on_oom is selected\n");
}
/* Fall-through */
--
1.5.6.1


2009-11-24 07:31:31

by Daisuke Nishimura

[permalink] [raw]
Subject: [BUGFIX][PATCH -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy

task_in_mem_cgroup(), which is called by select_bad_process() to check whether
a task can be a candidate for being oom-killed from memcg's limit, checks
"curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).

But this check return true(it's false positive) when:

<some path>/00 use_hierarchy == 0 <- hitting limit
<some path>/00/aa use_hierarchy == 1 <- "curr"

This leads to killing an innocent task in 00/aa. This patch is a fix for this
bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
should print information of mem_cgroup which the task being killed, not current,
belongs to.

Signed-off-by: Daisuke Nishimura <[email protected]>
---
mm/memcontrol.c | 2 +-
mm/oom_kill.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fd4529d..3acc226 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -496,7 +496,7 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
task_unlock(task);
if (!curr)
return 0;
- if (curr->use_hierarchy)
+ if (mem->use_hierarchy)
ret = css_is_ancestor(&curr->css, &mem->css);
else
ret = (curr == mem);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index a7b2460..ed452e9 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -400,7 +400,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
cpuset_print_task_mems_allowed(current);
task_unlock(current);
dump_stack();
- mem_cgroup_print_oom_info(mem, current);
+ mem_cgroup_print_oom_info(mem, p);
show_mem();
if (sysctl_oom_dump_tasks)
dump_tasks(mem);
--
1.5.6.1

2009-11-24 13:31:55

by Balbir Singh

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy

On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
<[email protected]> wrote:
> task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> a task can be a candidate for being oom-killed from memcg's limit, checks
> "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
>
> But this check return true(it's false positive) when:
>
> ? ? ? ?<some path>/00 ? ? ? ? ?use_hierarchy == 0 ? ? ?<- hitting limit
> ? ? ? ? ?<some path>/00/aa ? ? use_hierarchy == 1 ? ? ?<- "curr"
>
> This leads to killing an innocent task in 00/aa. This patch is a fix for this
> bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> should print information of mem_cgroup which the task being killed, not current,
> belongs to.
>

Quick Question: What happens if <some path>/00 has no tasks in it
after your patches?

Balbir

2009-11-24 14:56:28

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy

On Tue, 24 Nov 2009 19:01:54 +0530
Balbir Singh <[email protected]> wrote:

> On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
> <[email protected]> wrote:
> > task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> > a task can be a candidate for being oom-killed from memcg's limit, checks
> > "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> >
> > But this check return true(it's false positive) when:
> >
> >        <some path>/00          use_hierarchy == 0      <- hitting limit
> >          <some path>/00/aa     use_hierarchy == 1      <- "curr"
> >
> > This leads to killing an innocent task in 00/aa. This patch is a fix for this
> > bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> > should print information of mem_cgroup which the task being killed, not current,
> > belongs to.
> >
>
> Quick Question: What happens if <some path>/00 has no tasks in it
> after your patches?
>
Nothing would happen because <some path>/00 never hit its limit.

The bug that this patch fixes is:

- create a dir <some path>/00 and set some limits.
- create a sub dir <some path>/00/aa w/o any limits, and enable hierarchy.
- run some programs in both in 00 and 00/aa. programs in 00 should be
big enough to cause oom by its limit.
- when oom happens by 00's limit, tasks in 00/aa can also be killed.


Regards,
Daisuke Nishimura.

2009-11-24 17:04:06

by Balbir Singh

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy

* Daisuke Nishimura <[email protected]> [2009-11-24 23:00:29]:

> On Tue, 24 Nov 2009 19:01:54 +0530
> Balbir Singh <[email protected]> wrote:
>
> > On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
> > <[email protected]> wrote:
> > > task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> > > a task can be a candidate for being oom-killed from memcg's limit, checks
> > > "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> > >
> > > But this check return true(it's false positive) when:
> > >
> > > ? ? ? ?<some path>/00 ? ? ? ? ?use_hierarchy == 0 ? ? ?<- hitting limit
> > > ? ? ? ? ?<some path>/00/aa ? ? use_hierarchy == 1 ? ? ?<- "curr"
> > >
> > > This leads to killing an innocent task in 00/aa. This patch is a fix for this
> > > bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> > > should print information of mem_cgroup which the task being killed, not current,
> > > belongs to.
> > >
> >
> > Quick Question: What happens if <some path>/00 has no tasks in it
> > after your patches?
> >
> Nothing would happen because <some path>/00 never hit its limit.

Why not? I am talking of a scenario where <some path>/00 is set to a
limit (similar to your example) and hits its limit, but the groups
under it have no limits, but tasks. Shouldn't we be scanning
<some path>/00/aa as well?

>
> The bug that this patch fixes is:
>
> - create a dir <some path>/00 and set some limits.
> - create a sub dir <some path>/00/aa w/o any limits, and enable hierarchy.
> - run some programs in both in 00 and 00/aa. programs in 00 should be
> big enough to cause oom by its limit.
> - when oom happens by 00's limit, tasks in 00/aa can also be killed.
>

To be honest, the last part is fair, specifically if 00/aa has a task
that is really the heaviest task as per the oom logic. no? Are you
suggesting that only tasks in <some path>/00 should be selected by the
oom logic?

--
Balbir

2009-11-24 23:56:34

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy

On Tue, 24 Nov 2009 22:34:02 +0530, Balbir Singh <[email protected]> wrote:
> * Daisuke Nishimura <[email protected]> [2009-11-24 23:00:29]:
>
> > On Tue, 24 Nov 2009 19:01:54 +0530
> > Balbir Singh <[email protected]> wrote:
> >
> > > On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
> > > <[email protected]> wrote:
> > > > task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> > > > a task can be a candidate for being oom-killed from memcg's limit, checks
> > > > "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> > > >
> > > > But this check return true(it's false positive) when:
> > > >
> > > >        <some path>/00          use_hierarchy == 0      <- hitting limit
> > > >          <some path>/00/aa     use_hierarchy == 1      <- "curr"
> > > >
> > > > This leads to killing an innocent task in 00/aa. This patch is a fix for this
> > > > bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> > > > should print information of mem_cgroup which the task being killed, not current,
> > > > belongs to.
> > > >
> > >
> > > Quick Question: What happens if <some path>/00 has no tasks in it
> > > after your patches?
> > >
> > Nothing would happen because <some path>/00 never hit its limit.
>
> Why not? I am talking of a scenario where <some path>/00 is set to a
> limit (similar to your example) and hits its limit, but the groups
> under it have no limits, but tasks. Shouldn't we be scanning
> <some path>/00/aa as well?
>
> >
> > The bug that this patch fixes is:
> >
> > - create a dir <some path>/00 and set some limits.
> > - create a sub dir <some path>/00/aa w/o any limits, and enable hierarchy.
> > - run some programs in both in 00 and 00/aa. programs in 00 should be
> > big enough to cause oom by its limit.
> > - when oom happens by 00's limit, tasks in 00/aa can also be killed.
> >
>
> To be honest, the last part is fair, specifically if 00/aa has a task
> that is really the heaviest task as per the oom logic. no? Are you
> suggesting that only tasks in <some path>/00 should be selected by the
> oom logic?
>
All of your comments would be rational if hierarchy is enabled in 00(it's
also enabled in 00/aa automatically in this case).
I'm saying about the case where it's disabled in 00 but enabled in 00/aa.

In this scenario, charges by tasks in 00/aa is(and should not be) charged to 00.
And oom caused by 00's limit should not affect the task in 00/aa.


Regards,
Daisuke Nishimura.

2009-11-25 00:03:44

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy

On Tue, 24 Nov 2009 16:28:54 +0900
Daisuke Nishimura <[email protected]> wrote:

> task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> a task can be a candidate for being oom-killed from memcg's limit, checks
> "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
>
> But this check return true(it's false positive) when:
>
> <some path>/00 use_hierarchy == 0 <- hitting limit
> <some path>/00/aa use_hierarchy == 1 <- "curr"
>
> This leads to killing an innocent task in 00/aa. This patch is a fix for this
> bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> should print information of mem_cgroup which the task being killed, not current,
> belongs to.
>
> Signed-off-by: Daisuke Nishimura <[email protected]>
> ---
> mm/memcontrol.c | 2 +-
> mm/oom_kill.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fd4529d..3acc226 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -496,7 +496,7 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
> task_unlock(task);
> if (!curr)
> return 0;
> - if (curr->use_hierarchy)
> + if (mem->use_hierarchy)
> ret = css_is_ancestor(&curr->css, &mem->css);
> else
> ret = (curr == mem);

Hmm. Maybe not-expected behavior...could you add comment ?

Acked-by: KAMEZAWA Hiroyuki <[email protected]>
(*) I'm sorry I can't work enough in these days.

Thanks,
-Kame

2009-11-25 00:10:50

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy

On Tue, 24 Nov 2009 22:34:02 +0530
Balbir Singh <[email protected]> wrote:

> * Daisuke Nishimura <[email protected]> [2009-11-24 23:00:29]:
>
> > On Tue, 24 Nov 2009 19:01:54 +0530
> > Balbir Singh <[email protected]> wrote:
> >
> > > On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
> > > <[email protected]> wrote:
> > > > task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> > > > a task can be a candidate for being oom-killed from memcg's limit, checks
> > > > "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> > > >
> > > > But this check return true(it's false positive) when:
> > > >
> > > >        <some path>/00          use_hierarchy == 0      <- hitting limit
> > > >          <some path>/00/aa     use_hierarchy == 1      <- "curr"
> > > >
> > > > This leads to killing an innocent task in 00/aa. This patch is a fix for this
> > > > bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> > > > should print information of mem_cgroup which the task being killed, not current,
> > > > belongs to.
> > > >
> > >
> > > Quick Question: What happens if <some path>/00 has no tasks in it
> > > after your patches?
> > >
> > Nothing would happen because <some path>/00 never hit its limit.
>
> Why not? I am talking of a scenario where <some path>/00 is set to a
> limit (similar to your example) and hits its limit, but the groups
> under it have no limits, but tasks. Shouldn't we be scanning
> <some path>/00/aa as well?
>
No. <some path>/00 == use_hierarchy=0 means _all_ children's accounting
information is never added up to <some path>/00.

If there is no task in <some path>/00, it means <some path>/00 contains only
file cache and not-migrated rss. To hit limit, the admin has to make
memory.(memsw).limit_in_bytes smaller. But in this case, oom is not called.
-ENOMEM is returned to users. IIUC.




> >
> > The bug that this patch fixes is:
> >
> > - create a dir <some path>/00 and set some limits.
> > - create a sub dir <some path>/00/aa w/o any limits, and enable hierarchy.
> > - run some programs in both in 00 and 00/aa. programs in 00 should be
> > big enough to cause oom by its limit.
> > - when oom happens by 00's limit, tasks in 00/aa can also be killed.
> >
>
> To be honest, the last part is fair, specifically if 00/aa has a task
> that is really the heaviest task as per the oom logic. no? Are you
> suggesting that only tasks in <some path>/00 should be selected by the
> oom logic?
>
<some path>/00 and <some path>/00/aa has completely different accounting set.
There are no hierarchy relationship. The directory tree shows "virtual"
hierarchy but in reality, their relationship is horizontal rather than hierarchycal.

So, killing tasks only in <some path>/00 is better.

Thanks,
-Kame


2009-11-25 03:29:58

by Balbir Singh

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy

* [email protected] <[email protected]> [2009-11-25 08:49:10]:

> On Tue, 24 Nov 2009 22:34:02 +0530, Balbir Singh <[email protected]> wrote:
> > * Daisuke Nishimura <[email protected]> [2009-11-24 23:00:29]:
> >
> > > On Tue, 24 Nov 2009 19:01:54 +0530
> > > Balbir Singh <[email protected]> wrote:
> > >
> > > > On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
> > > > <[email protected]> wrote:
> > > > > task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> > > > > a task can be a candidate for being oom-killed from memcg's limit, checks
> > > > > "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> > > > >
> > > > > But this check return true(it's false positive) when:
> > > > >
> > > > > ? ? ? ?<some path>/00 ? ? ? ? ?use_hierarchy == 0 ? ? ?<- hitting limit
> > > > > ? ? ? ? ?<some path>/00/aa ? ? use_hierarchy == 1 ? ? ?<- "curr"
> > > > >
> > > > > This leads to killing an innocent task in 00/aa. This patch is a fix for this
> > > > > bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> > > > > should print information of mem_cgroup which the task being killed, not current,
> > > > > belongs to.
> > > > >
> > > >
> > > > Quick Question: What happens if <some path>/00 has no tasks in it
> > > > after your patches?
> > > >
> > > Nothing would happen because <some path>/00 never hit its limit.
> >
> > Why not? I am talking of a scenario where <some path>/00 is set to a
> > limit (similar to your example) and hits its limit, but the groups
> > under it have no limits, but tasks. Shouldn't we be scanning
> > <some path>/00/aa as well?
> >
> > >
> > > The bug that this patch fixes is:
> > >
> > > - create a dir <some path>/00 and set some limits.
> > > - create a sub dir <some path>/00/aa w/o any limits, and enable hierarchy.
> > > - run some programs in both in 00 and 00/aa. programs in 00 should be
> > > big enough to cause oom by its limit.
> > > - when oom happens by 00's limit, tasks in 00/aa can also be killed.
> > >
> >
> > To be honest, the last part is fair, specifically if 00/aa has a task
> > that is really the heaviest task as per the oom logic. no? Are you
> > suggesting that only tasks in <some path>/00 should be selected by the
> > oom logic?
> >
> All of your comments would be rational if hierarchy is enabled in 00(it's
> also enabled in 00/aa automatically in this case).
> I'm saying about the case where it's disabled in 00 but enabled in 00/aa.
>

OK, I misunderstood the example then, so even though hierarchy is
disabled, we kill a task in 00/aa when 00 hits the limit. Thanks for
clarifying.

> In this scenario, charges by tasks in 00/aa is(and should not be) charged to 00.
> And oom caused by 00's limit should not affect the task in 00/aa.
>
>
> Regards,
> Daisuke Nishimura.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>

--
Balbir

2009-11-25 04:08:49

by Balbir Singh

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy

* [email protected] <[email protected]> [2009-11-24 16:28:54]:

> task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> a task can be a candidate for being oom-killed from memcg's limit, checks
> "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
>
> But this check return true(it's false positive) when:
>
> <some path>/00 use_hierarchy == 0 <- hitting limit
> <some path>/00/aa use_hierarchy == 1 <- "curr"
>
> This leads to killing an innocent task in 00/aa. This patch is a fix for this
> bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> should print information of mem_cgroup which the task being killed, not current,
> belongs to.
>
> Signed-off-by: Daisuke Nishimura <[email protected]>
> ---
> mm/memcontrol.c | 2 +-
> mm/oom_kill.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fd4529d..3acc226 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -496,7 +496,7 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
> task_unlock(task);
> if (!curr)
> return 0;
> - if (curr->use_hierarchy)
> + if (mem->use_hierarchy)
> ret = css_is_ancestor(&curr->css, &mem->css);
> else
> ret = (curr == mem);
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index a7b2460..ed452e9 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -400,7 +400,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> cpuset_print_task_mems_allowed(current);
> task_unlock(current);
> dump_stack();
> - mem_cgroup_print_oom_info(mem, current);
> + mem_cgroup_print_oom_info(mem, p);
> show_mem();
> if (sysctl_oom_dump_tasks)
> dump_tasks(mem);
>


Reviewed-by: Balbir Singh <[email protected]>


--
Balbir

2009-11-25 04:09:55

by Balbir Singh

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy

* [email protected] <[email protected]> [2009-11-24 14:57:59]:

> task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> a task can be a candidate for being oom-killed from memcg's limit, checks
> "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
>
> But this check return true(it's false positive) when:
>
> <some path>/00 use_hierarchy == 0 <- hitting limit
> <some path>/00/aa use_hierarchy == 1 <- "curr"
>
> This leads to killing an innocent task in 00/aa. This patch is a fix for this
> bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> should print information of mem_cgroup which the task being killed, not current,
> belongs to.
>
> Signed-off-by: Daisuke Nishimura <[email protected]>


Acked-by: Balbir Singh <[email protected]>

--
Balbir

2009-11-25 05:42:36

by Daisuke Nishimura

[permalink] [raw]
Subject: [BUGFIX][PATCH v2 -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy

> Hmm. Maybe not-expected behavior...could you add comment ?
>
How about this ?

> Acked-by: KAMEZAWA Hiroyuki <[email protected]>
> (*) I'm sorry I can't work enough in these days.
>

BTW, this patch conflict with oom-dump-stack-and-vm-state-when-oom-killer-panics.patch
in current mmotm(that's why I post mmotm version separately), so this bug will not be fixed
till 2.6.33 in linus-tree.
So I think this patch should go in 2.6.32.y too.

===
From: Daisuke Nishimura <[email protected]>

task_in_mem_cgroup(), which is called by select_bad_process() to check whether
a task can be a candidate for being oom-killed from memcg's limit, checks
"curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).

But this check return true(it's false positive) when:

<some path>/00 use_hierarchy == 0 <- hitting limit
<some path>/00/aa use_hierarchy == 1 <- "curr"

This leads to killing an innocent task in 00/aa. This patch is a fix for this
bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
should print information of mem_cgroup which the task being killed, not current,
belongs to.

Signed-off-by: Daisuke Nishimura <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Reviewed-by: Balbir Singh <[email protected]>
---
mm/memcontrol.c | 8 +++++++-
mm/oom_kill.c | 2 +-
2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fd4529d..566925e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -496,7 +496,13 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
task_unlock(task);
if (!curr)
return 0;
- if (curr->use_hierarchy)
+ /*
+ * We should check use_hierarchy of "mem" not "curr". Because checking
+ * use_hierarchy of "curr" here make this function true if hierarchy is
+ * enabled in "curr" and "curr" is a child of "mem" in *cgroup*
+ * hierarchy(even if use_hierarchy is disabled in "mem").
+ */
+ if (mem->use_hierarchy)
ret = css_is_ancestor(&curr->css, &mem->css);
else
ret = (curr == mem);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index a7b2460..ed452e9 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -400,7 +400,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
cpuset_print_task_mems_allowed(current);
task_unlock(current);
dump_stack();
- mem_cgroup_print_oom_info(mem, current);
+ mem_cgroup_print_oom_info(mem, p);
show_mem();
if (sysctl_oom_dump_tasks)
dump_tasks(mem);
--
1.5.6.1

2009-11-25 05:53:23

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH v2 -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy

On Wed, 25 Nov 2009 14:32:18 +0900
Daisuke Nishimura <[email protected]> wrote:

> > Hmm. Maybe not-expected behavior...could you add comment ?
> >
> How about this ?
>
seems nice. Thank you very much.

Regards,
-Kame

2009-11-25 20:46:41

by Andrew Morton

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH v2 -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy

On Wed, 25 Nov 2009 14:32:18 +0900
Daisuke Nishimura <[email protected]> wrote:

> > Hmm. Maybe not-expected behavior...could you add comment ?
> >
> How about this ?
>
> > Acked-by: KAMEZAWA Hiroyuki <[email protected]>
> > (*) I'm sorry I can't work enough in these days.
> >
>
> BTW, this patch conflict with oom-dump-stack-and-vm-state-when-oom-killer-panics.patch
> in current mmotm(that's why I post mmotm version separately), so this bug will not be fixed
> till 2.6.33 in linus-tree.
> So I think this patch should go in 2.6.32.y too.

I don't actually have a 2.6.33 version of this patch yet.

2009-11-26 00:19:48

by Daisuke Nishimura

[permalink] [raw]
Subject: [BUGFIX][PATCH v2 -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy

On Wed, 25 Nov 2009 12:45:51 -0800, Andrew Morton <[email protected]> wrote:
> On Wed, 25 Nov 2009 14:32:18 +0900
> Daisuke Nishimura <[email protected]> wrote:
>
> > > Hmm. Maybe not-expected behavior...could you add comment ?
> > >
> > How about this ?
> >
> > > Acked-by: KAMEZAWA Hiroyuki <[email protected]>
> > > (*) I'm sorry I can't work enough in these days.
> > >
> >
> > BTW, this patch conflict with oom-dump-stack-and-vm-state-when-oom-killer-panics.patch
> > in current mmotm(that's why I post mmotm version separately), so this bug will not be fixed
> > till 2.6.33 in linus-tree.
> > So I think this patch should go in 2.6.32.y too.
>
> I don't actually have a 2.6.33 version of this patch yet.

I add comments as I did in for-stable version and attach the updated patch
for-mmotm to this mail.

It can be applied on current mmotm(2009-11-24-16-47).

===
From: Daisuke Nishimura <[email protected]>

task_in_mem_cgroup(), which is called by select_bad_process() to check whether
a task can be a candidate for being oom-killed from memcg's limit, checks
"curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).

But this check return true(it's false positive) when:

<some path>/aa use_hierarchy == 0 <- hitting limit
<some path>/aa/00 use_hierarchy == 1 <- the task belongs to

This leads to killing an innocent task in aa/00. This patch is a fix for this
bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
should print information of mem_cgroup which the task being killed, not current,
belongs to.

Signed-off-by: Daisuke Nishimura <[email protected]>
Acked-by: Balbir Singh <[email protected]>
---
mm/memcontrol.c | 10 ++++++++--
mm/oom_kill.c | 13 +++++++------
2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 661b8c6..951c103 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -759,7 +759,13 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
task_unlock(task);
if (!curr)
return 0;
- if (curr->use_hierarchy)
+ /*
+ * We should check use_hierarchy of "mem" not "curr". Because checking
+ * use_hierarchy of "curr" here make this function true if hierarchy is
+ * enabled in "curr" and "curr" is a child of "mem" in *cgroup*
+ * hierarchy(even if use_hierarchy is disabled in "mem").
+ */
+ if (mem->use_hierarchy)
ret = css_is_ancestor(&curr->css, &mem->css);
else
ret = (curr == mem);
@@ -1008,7 +1014,7 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
static char memcg_name[PATH_MAX];
int ret;

- if (!memcg)
+ if (!memcg || !p)
return;


diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ab04537..be56461 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -356,7 +356,8 @@ static void dump_tasks(const struct mem_cgroup *mem)
} while_each_thread(g, p);
}

-static void dump_header(gfp_t gfp_mask, int order, struct mem_cgroup *mem)
+static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
+ struct mem_cgroup *mem)
{
pr_warning("%s invoked oom-killer: gfp_mask=0x%x, order=%d, "
"oom_adj=%d\n",
@@ -365,7 +366,7 @@ static void dump_header(gfp_t gfp_mask, int order, struct mem_cgroup *mem)
cpuset_print_task_mems_allowed(current);
task_unlock(current);
dump_stack();
- mem_cgroup_print_oom_info(mem, current);
+ mem_cgroup_print_oom_info(mem, p);
show_mem();
if (sysctl_oom_dump_tasks)
dump_tasks(mem);
@@ -440,7 +441,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
struct task_struct *c;

if (printk_ratelimit())
- dump_header(gfp_mask, order, mem);
+ dump_header(p, gfp_mask, order, mem);

/*
* If the task is already exiting, don't alarm the sysadmin or kill
@@ -576,7 +577,7 @@ retry:
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!p) {
read_unlock(&tasklist_lock);
- dump_header(gfp_mask, order, NULL);
+ dump_header(NULL, gfp_mask, order, NULL);
panic("Out of memory and no killable processes...\n");
}

@@ -644,7 +645,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
return;

if (sysctl_panic_on_oom == 2) {
- dump_header(gfp_mask, order, NULL);
+ dump_header(NULL, gfp_mask, order, NULL);
panic("out of memory. Compulsory panic_on_oom is selected.\n");
}

@@ -663,7 +664,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,

case CONSTRAINT_NONE:
if (sysctl_panic_on_oom) {
- dump_header(gfp_mask, order, NULL);
+ dump_header(NULL, gfp_mask, order, NULL);
panic("out of memory. panic_on_oom is selected\n");
}
/* Fall-through */
--
1.5.6.1

2009-12-17 00:59:08

by Daisuke Nishimura

[permalink] [raw]
Subject: [BUGFIX][PATCH v2 -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy

Stable team.

Cay you pick this up for 2.6.32.y(and 2.6.31.y if it will be released) ?

This is a for-stable version of a bugfix patch that corresponds to the
upstream commmit d31f56dbf8bafaacb0c617f9a6f137498d5c7aed.

===
From: Daisuke Nishimura <[email protected]>

task_in_mem_cgroup(), which is called by select_bad_process() to check whether
a task can be a candidate for being oom-killed from memcg's limit, checks
"curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).

But this check return true(it's false positive) when:

<some path>/00 use_hierarchy == 0 <- hitting limit
<some path>/00/aa use_hierarchy == 1 <- "curr"

This leads to killing an innocent task in 00/aa. This patch is a fix for this
bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
should print information of mem_cgroup which the task being killed, not current,
belongs to.

Signed-off-by: Daisuke Nishimura <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Reviewed-by: Balbir Singh <[email protected]>
---
mm/memcontrol.c | 8 +++++++-
mm/oom_kill.c | 2 +-
2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fd4529d..566925e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -496,7 +496,13 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
task_unlock(task);
if (!curr)
return 0;
- if (curr->use_hierarchy)
+ /*
+ * We should check use_hierarchy of "mem" not "curr". Because checking
+ * use_hierarchy of "curr" here make this function true if hierarchy is
+ * enabled in "curr" and "curr" is a child of "mem" in *cgroup*
+ * hierarchy(even if use_hierarchy is disabled in "mem").
+ */
+ if (mem->use_hierarchy)
ret = css_is_ancestor(&curr->css, &mem->css);
else
ret = (curr == mem);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index a7b2460..ed452e9 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -400,7 +400,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
cpuset_print_task_mems_allowed(current);
task_unlock(current);
dump_stack();
- mem_cgroup_print_oom_info(mem, current);
+ mem_cgroup_print_oom_info(mem, p);
show_mem();
if (sysctl_oom_dump_tasks)
dump_tasks(mem);
--
1.5.6.1

2010-01-04 22:30:27

by Greg KH

[permalink] [raw]
Subject: Re: [stable] [BUGFIX][PATCH v2 -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy

On Thu, Dec 17, 2009 at 09:47:24AM +0900, Daisuke Nishimura wrote:
> Stable team.
>
> Cay you pick this up for 2.6.32.y(and 2.6.31.y if it will be released) ?
>
> This is a for-stable version of a bugfix patch that corresponds to the
> upstream commmit d31f56dbf8bafaacb0c617f9a6f137498d5c7aed.

I've applied it to the .32-stable tree, but it does not apply to .31.
Care to provide a version of the patch for that kernel if you want it
applied there?

thanks,

greg k-h

2010-01-05 03:27:50

by Daisuke Nishimura

[permalink] [raw]
Subject: [stable][BUGFIX][PATCH v3] memcg: avoid oom-killing innocent task in case of use_hierarchy

On Mon, 4 Jan 2010 14:28:19 -0800
Greg KH <[email protected]> wrote:

> On Thu, Dec 17, 2009 at 09:47:24AM +0900, Daisuke Nishimura wrote:
> > Stable team.
> >
> > Cay you pick this up for 2.6.32.y(and 2.6.31.y if it will be released) ?
> >
> > This is a for-stable version of a bugfix patch that corresponds to the
> > upstream commmit d31f56dbf8bafaacb0c617f9a6f137498d5c7aed.
>
> I've applied it to the .32-stable tree, but it does not apply to .31.
> Care to provide a version of the patch for that kernel if you want it
> applied there?
>
hmm, strange. I can apply it onto 2.6.31.9. It might conflict with other patches
in 2.6.31.y queue ?
Anyway, I've attached the patch that is rebased on 2.6.31.9. Please tell me if you
have any problem with it.

v3: rebased on 2.6.31.9
===
>From 14cd608eef94c851460d3d56e0c676d17ecc64f2 Mon Sep 17 00:00:00 2001
From: Daisuke Nishimura <[email protected]>
Date: Tue, 5 Jan 2010 12:15:42 +0900
Subject: [PATCH] memcg: avoid oom-killing innocent task in case of use_hierarchy

task_in_mem_cgroup(), which is called by select_bad_process() to check whether
a task can be a candidate for being oom-killed from memcg's limit, checks
"curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).

But this check return true(it's false positive) when:

<some path>/00 use_hierarchy == 0 <- hitting limit
<some path>/00/aa use_hierarchy == 1 <- "curr"

This leads to killing an innocent task in 00/aa. This patch is a fix for this
bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
should print information of mem_cgroup which the task being killed, not current,
belongs to.

Signed-off-by: Daisuke Nishimura <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Reviewed-by: Balbir Singh <[email protected]>
---
mm/memcontrol.c | 8 +++++++-
mm/oom_kill.c | 2 +-
2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fd4529d..566925e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -496,7 +496,13 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
task_unlock(task);
if (!curr)
return 0;
- if (curr->use_hierarchy)
+ /*
+ * We should check use_hierarchy of "mem" not "curr". Because checking
+ * use_hierarchy of "curr" here make this function true if hierarchy is
+ * enabled in "curr" and "curr" is a child of "mem" in *cgroup*
+ * hierarchy(even if use_hierarchy is disabled in "mem").
+ */
+ if (mem->use_hierarchy)
ret = css_is_ancestor(&curr->css, &mem->css);
else
ret = (curr == mem);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index a7b2460..ed452e9 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -400,7 +400,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
cpuset_print_task_mems_allowed(current);
task_unlock(current);
dump_stack();
- mem_cgroup_print_oom_info(mem, current);
+ mem_cgroup_print_oom_info(mem, p);
show_mem();
if (sysctl_oom_dump_tasks)
dump_tasks(mem);
--
1.6.3.3

2010-01-05 19:33:44

by Greg KH

[permalink] [raw]
Subject: Re: [stable] [BUGFIX][PATCH v3] memcg: avoid oom-killing innocent task in case of use_hierarchy

On Tue, Jan 05, 2010 at 12:26:33PM +0900, Daisuke Nishimura wrote:
> On Mon, 4 Jan 2010 14:28:19 -0800
> Greg KH <[email protected]> wrote:
>
> > On Thu, Dec 17, 2009 at 09:47:24AM +0900, Daisuke Nishimura wrote:
> > > Stable team.
> > >
> > > Cay you pick this up for 2.6.32.y(and 2.6.31.y if it will be released) ?
> > >
> > > This is a for-stable version of a bugfix patch that corresponds to the
> > > upstream commmit d31f56dbf8bafaacb0c617f9a6f137498d5c7aed.
> >
> > I've applied it to the .32-stable tree, but it does not apply to .31.
> > Care to provide a version of the patch for that kernel if you want it
> > applied there?
> >
> hmm, strange. I can apply it onto 2.6.31.9. It might conflict with other patches
> in 2.6.31.y queue ?
> Anyway, I've attached the patch that is rebased on 2.6.31.9. Please tell me if you
> have any problem with it.
>
> v3: rebased on 2.6.31.9

This version worked, thanks for regenerating it.

greg k-h