2010-06-01 05:47:06

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 1/5] oom: make oom_unkillable() helper function


This patch series was made on top yesterday oom pile
=====================================================================

Now, sysctl_oom_kill_allocating_task case and CONSTRAINT_MEMORY_POLICY
case don't call select_bad_process(). then, oom_kill_process() need
very similar check for distinguish unkillable tasks.

This patch unify such two logic.

Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/oom_kill.c | 62 ++++++++++++++++++++++++--------------------------------
1 files changed, 27 insertions(+), 35 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f6aa3fc..bac4515 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -250,6 +250,21 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
}
#endif

+int oom_unkillable(struct task_struct *p, struct mem_cgroup *mem)
+{
+ /* skip the init task and kthreads */
+ if (is_global_init(p) || (p->flags & PF_KTHREAD))
+ return 1;
+
+ if (mem && !task_in_mem_cgroup(p, mem))
+ return 1;
+
+ if (p->signal->oom_adj == OOM_DISABLE)
+ return 1;
+
+ return 0;
+}
+
/*
* Simple selection loop. We chose the process with the highest
* number of 'points'. We expect the caller will lock the tasklist.
@@ -268,10 +283,7 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
for_each_process(p) {
unsigned long points;

- /* skip the init task and kthreads */
- if (is_global_init(p) || (p->flags & PF_KTHREAD))
- continue;
- if (mem && !task_in_mem_cgroup(p, mem))
+ if (oom_unkillable(p, mem))
continue;

/*
@@ -304,9 +316,6 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
*ppoints = ULONG_MAX;
}

- if (p->signal->oom_adj == OOM_DISABLE)
- continue;
-
points = badness(p, uptime.tv_sec);
if (points > *ppoints || !chosen) {
chosen = p;
@@ -386,20 +395,18 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
* flag though it's unlikely that we select a process with CAP_SYS_RAW_IO
* set.
*/
-static void __oom_kill_task(struct task_struct *p, int verbose)
+static int __oom_kill_process(struct task_struct *p, struct mem_cgroup *mem,
+ int verbose)
{
- if (is_global_init(p)) {
- WARN_ON(1);
- printk(KERN_WARNING "tried to kill init!\n");
- return;
- }
+ if (oom_unkillable(p, mem))
+ return 1;

p = find_lock_task_mm(p);
if (!p) {
WARN_ON(1);
printk(KERN_WARNING "tried to kill an mm-less task %d (%s)!\n",
task_pid_nr(p), p->comm);
- return;
+ return 1;
}

if (verbose)
@@ -420,22 +427,6 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
set_tsk_thread_flag(p, TIF_MEMDIE);

force_sig(SIGKILL, p);
-}
-
-static int oom_kill_task(struct task_struct *p)
-{
- /* WARNING: mm may not be dereferenced since we did not obtain its
- * value from get_task_mm(p). This is OK since all we need to do is
- * compare mm to q->mm below.
- *
- * Furthermore, even if mm contains a non-NULL value, p->mm may
- * change to NULL at any time since we do not hold task_lock(p).
- * However, this is of no concern to us.
- */
- if (!p->mm || p->signal->oom_adj == OOM_DISABLE)
- return 1;
-
- __oom_kill_task(p, 1);

return 0;
}
@@ -454,7 +445,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
* its children or threads, just set TIF_MEMDIE so it can die quickly
*/
if (p->flags & PF_EXITING) {
- __oom_kill_task(p, 0);
+ __oom_kill_process(p, mem, 0);
return 0;
}

@@ -465,12 +456,13 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
list_for_each_entry(c, &p->children, sibling) {
if (c->mm == p->mm)
continue;
- if (mem && !task_in_mem_cgroup(c, mem))
- continue;
- if (!oom_kill_task(c))
+
+ /* Ok, Kill the child */
+ if (!__oom_kill_process(c, mem, 1))
return 0;
}
- return oom_kill_task(p);
+
+ return __oom_kill_process(p, mem, 1);
}

#ifdef CONFIG_CGROUP_MEM_RES_CTLR
--
1.6.5.2



2010-06-01 05:48:11

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 2/5] oom: remove warning for in mm-less task __oom_kill_process()

If the race of mm detach in task exiting vs oom is happen,
find_lock_task_mm() can be return NULL.

So, the warning is pointless. remove it.


Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/oom_kill.c | 6 +-----
1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index bac4515..70e1a85 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -402,12 +402,8 @@ static int __oom_kill_process(struct task_struct *p, struct mem_cgroup *mem,
return 1;

p = find_lock_task_mm(p);
- if (!p) {
- WARN_ON(1);
- printk(KERN_WARNING "tried to kill an mm-less task %d (%s)!\n",
- task_pid_nr(p), p->comm);
+ if (!p)
return 1;
- }

if (verbose)
printk(KERN_ERR "Killed process %d (%s) "
--
1.6.5.2


2010-06-01 05:49:21

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 3/5] oom: Fix child process iteration properly

Oleg pointed out that current oom child process iterating logic is wrong.

> list_for_each_entry(p->children) can only see the tasks forked
> by p, it can't see other children forked by its sub-threads.

This patch fixes it.

Reported-by: Oleg Nesterov <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/oom_kill.c | 34 ++++++++++++++++++++--------------
1 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 70e1a85..1bdf27d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -88,6 +88,7 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
{
unsigned long points, cpu_time, run_time;
struct task_struct *c;
+ struct task_struct *t = p;
struct task_struct *child;
int oom_adj = p->signal->oom_adj;
struct task_cputime task_time;
@@ -125,14 +126,16 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
* child is eating the vast majority of memory, adding only half
* to the parents will make the child our kill candidate of choice.
*/
- list_for_each_entry(c, &p->children, sibling) {
- child = find_lock_task_mm(c);
- if (child) {
- if (child->mm != p->mm)
- points += child->mm->total_vm/2 + 1;
- task_unlock(child);
+ do {
+ list_for_each_entry(c, &t->children, sibling) {
+ child = find_lock_task_mm(c);
+ if (child) {
+ if (child->mm != p->mm)
+ points += child->mm->total_vm/2 + 1;
+ task_unlock(child);
+ }
}
- }
+ } while_each_thread(p, t);

/*
* CPU time is in tens of seconds and run time is in thousands
@@ -432,6 +435,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
const char *message)
{
struct task_struct *c;
+ struct task_struct *t = p;

if (printk_ratelimit())
dump_header(p, gfp_mask, order, mem);
@@ -449,14 +453,16 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
message, task_pid_nr(p), p->comm, points);

/* Try to kill a child first */
- list_for_each_entry(c, &p->children, sibling) {
- if (c->mm == p->mm)
- continue;
+ do {
+ list_for_each_entry(c, &t->children, sibling) {
+ if (c->mm == p->mm)
+ continue;

- /* Ok, Kill the child */
- if (!__oom_kill_process(c, mem, 1))
- return 0;
- }
+ /* Ok, Kill the child */
+ if (!__oom_kill_process(c, mem, 1))
+ return 0;
+ }
+ } while_each_thread(p, t);

return __oom_kill_process(p, mem, 1);
}
--
1.6.5.2


2010-06-01 05:50:33

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 4/5] oom-kill: give the dying task a higher priority (v4)

From: Luis Claudio R. Goncalves <[email protected]>

In a system under heavy load it was observed that even after the
oom-killer selects a task to die, the task may take a long time to die.

Right before sending a SIGKILL to the task selected by the oom-killer
this task has it's priority increased so that it can exit() exit soon,
freeing memory. That is accomplished by:

/*
* We give our sacrificial lamb high priority and access to
* all the memory it needs. That way it should be able to
* exit() and clear out its resources quickly...
*/
p->rt.time_slice = HZ;
set_tsk_thread_flag(p, TIF_MEMDIE);

It sounds plausible giving the dying task an even higher priority to be
sure it will be scheduled sooner and free the desired memory. It was
suggested on LKML using SCHED_FIFO:1, the lowest RT priority so that
this
task won't interfere with any running RT task.

Another good suggestion, implemented here, was to avoid boosting the
dying
task priority in case of mem_cgroup OOM.

Signed-off-by: Luis Claudio R. Goncalves <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]> [rebase
on top my patches]
---
mm/oom_kill.c | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index b1df1d9..cbad4d4 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -427,6 +427,18 @@ static int __oom_kill_process(struct task_struct *p, struct mem_cgroup *mem,

force_sig(SIGKILL, p);

+ /*
+ * If this is a system OOM (not a memcg OOM), speed up the recovery
+ * by boosting the dying task priority to the lowest FIFO priority.
+ * That helps with the recovery and avoids interfering with RT tasks.
+ */
+ if (mem == NULL) {
+ struct sched_param param;
+
+ param.sched_priority = 1;
+ sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
+ }
+
return 0;
}

--
1.6.5.2


2010-06-01 05:51:55

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 5/5] oom: dump_tasks() use find_lock_task_mm() too

dump_task() should have the same process iteration logic as
select_bad_process().

It is needed for protecting from task exiting race.


Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/oom_kill.c | 31 +++++++++++++------------------
1 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index cbad4d4..a8af9e8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -344,35 +344,30 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
*/
static void dump_tasks(const struct mem_cgroup *mem)
{
- struct task_struct *g, *p;
+ struct task_struct *p;
+ struct task_struct *task;

printk(KERN_INFO "[ pid ] uid tgid total_vm rss cpu oom_adj "
"name\n");
- do_each_thread(g, p) {
+
+ for_each_process(p) {
struct mm_struct *mm;

- if (mem && !task_in_mem_cgroup(p, mem))
+ if (is_global_init(p) || (p->flags & PF_KTHREAD))
continue;
- if (!thread_group_leader(p))
+ if (mem && !task_in_mem_cgroup(p, mem))
continue;

- task_lock(p);
- mm = p->mm;
- if (!mm) {
- /*
- * total_vm and rss sizes do not exist for tasks with no
- * mm so there's no need to report them; they can't be
- * oom killed anyway.
- */
- task_unlock(p);
+ task = find_lock_task_mm(p);
+ if (!task)
continue;
- }
+
printk(KERN_INFO "[%5d] %5d %5d %8lu %8lu %3d %3d %s\n",
- p->pid, __task_cred(p)->uid, p->tgid, mm->total_vm,
- get_mm_rss(mm), (int)task_cpu(p), p->signal->oom_adj,
+ task->pid, __task_cred(task)->uid, task->tgid, task->mm->total_vm,
+ get_mm_rss(task->mm), (int)task_cpu(task), task->signal->oom_adj,
p->comm);
- task_unlock(p);
- } while_each_thread(g, p);
+ task_unlock(task);
+ }
}

static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
--
1.6.5.2


2010-06-01 07:20:48

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH 2/5] oom: remove warning for in mm-less task __oom_kill_process()

On Tue, 1 Jun 2010, KOSAKI Motohiro wrote:

> If the race of mm detach in task exiting vs oom is happen,
> find_lock_task_mm() can be return NULL.
>
> So, the warning is pointless. remove it.
>

This is already removed with my patch
oom-remove-unnecessary-code-and-cleanup.patch from my oom killer rewrite.

2010-06-01 19:28:55

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/5] oom: Fix child process iteration properly

On 06/01, KOSAKI Motohiro wrote:
>
> @@ -88,6 +88,7 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
> {
> unsigned long points, cpu_time, run_time;
> struct task_struct *c;
> + struct task_struct *t = p;

This initialization should be moved down to

> + do {
> + list_for_each_entry(c, &t->children, sibling) {
> + child = find_lock_task_mm(c);
> + if (child) {
> + if (child->mm != p->mm)
> + points += child->mm->total_vm/2 + 1;
> + task_unlock(child);
> + }
> }
> - }
> + } while_each_thread(p, t);

this loop. We have "p = find_lock_task_mm(p)" in between which can change p.

Apart from this, I think the whole series is nice.

Oleg.

2010-06-02 13:54:06

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 5/5] oom: dump_tasks() use find_lock_task_mm() too

> dump_task() should have the same process iteration logic as
> select_bad_process().
>
> It is needed for protecting from task exiting race.
>
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>

sorry, this patch made one warning.
incremental patch is here.



Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/oom_kill.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index b06f8d1..f33a1b8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -334,8 +334,6 @@ static void dump_tasks(const struct mem_cgroup *mem)
"name\n");

for_each_process(p) {
- struct mm_struct *mm;
-
if (is_global_init(p) || (p->flags & PF_KTHREAD))
continue;
if (mem && !task_in_mem_cgroup(p, mem))
--
1.6.5.2


2010-06-02 13:54:08

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 3/5] oom: Fix child process iteration properly

> On 06/01, KOSAKI Motohiro wrote:
> >
> > @@ -88,6 +88,7 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
> > {
> > unsigned long points, cpu_time, run_time;
> > struct task_struct *c;
> > + struct task_struct *t = p;
>
> This initialization should be moved down to
>
> > + do {
> > + list_for_each_entry(c, &t->children, sibling) {
> > + child = find_lock_task_mm(c);
> > + if (child) {
> > + if (child->mm != p->mm)
> > + points += child->mm->total_vm/2 + 1;
> > + task_unlock(child);
> > + }
> > }
> > - }
> > + } while_each_thread(p, t);
>
> this loop. We have "p = find_lock_task_mm(p)" in between which can change p.
>
> Apart from this, I think the whole series is nice.

Nich catch!

simple incremental patch is here.

========================================================
Subject: [PATCH] Fix oom: Fix child process iteration properly

p can be changed by find_lock_task_mm()

Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/oom_kill.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 9631f1b..9e7f0f9 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -88,7 +88,7 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
{
unsigned long points, cpu_time, run_time;
struct task_struct *c;
- struct task_struct *t = p;
+ struct task_struct *t;
struct task_struct *child;
int oom_adj = p->signal->oom_adj;
struct task_cputime task_time;
@@ -126,6 +126,7 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
* child is eating the vast majority of memory, adding only half
* to the parents will make the child our kill candidate of choice.
*/
+ t = p;
do {
list_for_each_entry(c, &t->children, sibling) {
child = find_lock_task_mm(c);
--
1.6.5.2




2010-06-02 15:03:15

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 5/5] oom: dump_tasks() use find_lock_task_mm() too

Hi, Kosaki.

On Tue, Jun 01, 2010 at 02:51:49PM +0900, KOSAKI Motohiro wrote:
> dump_task() should have the same process iteration logic as
> select_bad_process().
>
> It is needed for protecting from task exiting race.
>
>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> mm/oom_kill.c | 31 +++++++++++++------------------
> 1 files changed, 13 insertions(+), 18 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index cbad4d4..a8af9e8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -344,35 +344,30 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
> */
> static void dump_tasks(const struct mem_cgroup *mem)
> {
> - struct task_struct *g, *p;
> + struct task_struct *p;
> + struct task_struct *task;
>
> printk(KERN_INFO "[ pid ] uid tgid total_vm rss cpu oom_adj "
> "name\n");
> - do_each_thread(g, p) {
> +
> + for_each_process(p) {
> struct mm_struct *mm;
>
> - if (mem && !task_in_mem_cgroup(p, mem))
> + if (is_global_init(p) || (p->flags & PF_KTHREAD))

select_bad_process needs is_global_init check to not select init as victim.
But in this case, it is just for dumping information of tasks.

> continue;
> - if (!thread_group_leader(p))
> + if (mem && !task_in_mem_cgroup(p, mem))
> continue;
>
> - task_lock(p);
> - mm = p->mm;
> - if (!mm) {
> - /*
> - * total_vm and rss sizes do not exist for tasks with no
> - * mm so there's no need to report them; they can't be
> - * oom killed anyway.
> - */

Please, do not remove the comment for mm newbies unless you think it's useless.

> - task_unlock(p);
> + task = find_lock_task_mm(p);
> + if (!task)
> continue;
> - }
> +
> printk(KERN_INFO "[%5d] %5d %5d %8lu %8lu %3d %3d %s\n",
> - p->pid, __task_cred(p)->uid, p->tgid, mm->total_vm,
> - get_mm_rss(mm), (int)task_cpu(p), p->signal->oom_adj,
> + task->pid, __task_cred(task)->uid, task->tgid, task->mm->total_vm,
> + get_mm_rss(task->mm), (int)task_cpu(task), task->signal->oom_adj,
> p->comm);
> - task_unlock(p);
> - } while_each_thread(g, p);
> + task_unlock(task);
> + }
> }
>
> static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
> --
> 1.6.5.2
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Kind regards,
Minchan Kim

2010-06-03 00:06:14

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 5/5] oom: dump_tasks() use find_lock_task_mm() too

Hi

> > @@ -344,35 +344,30 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
> > */
> > static void dump_tasks(const struct mem_cgroup *mem)
> > {
> > - struct task_struct *g, *p;
> > + struct task_struct *p;
> > + struct task_struct *task;
> >
> > printk(KERN_INFO "[ pid ] uid tgid total_vm rss cpu oom_adj "
> > "name\n");
> > - do_each_thread(g, p) {
> > +
> > + for_each_process(p) {
> > struct mm_struct *mm;
> >
> > - if (mem && !task_in_mem_cgroup(p, mem))
> > + if (is_global_init(p) || (p->flags & PF_KTHREAD))
>
> select_bad_process needs is_global_init check to not select init as victim.
> But in this case, it is just for dumping information of tasks.

But dumping oom unrelated process is useless and making confusion.
Do you have any suggestion? Instead, adding unkillable field?


>
> > continue;
> > - if (!thread_group_leader(p))
> > + if (mem && !task_in_mem_cgroup(p, mem))
> > continue;
> >
> > - task_lock(p);
> > - mm = p->mm;
> > - if (!mm) {
> > - /*
> > - * total_vm and rss sizes do not exist for tasks with no
> > - * mm so there's no need to report them; they can't be
> > - * oom killed anyway.
> > - */
>
> Please, do not remove the comment for mm newbies unless you think it's useless.

How is this?

task = find_lock_task_mm(p);
if (!task)
/*
* Probably oom vs task-exiting race was happen and ->mm
* have been detached. thus there's no need to report them;
* they can't be oom killed anyway.
*/
continue;


2010-06-03 00:32:49

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 5/5] oom: dump_tasks() use find_lock_task_mm() too

On Thu, Jun 3, 2010 at 9:06 AM, KOSAKI Motohiro
<[email protected]> wrote:
> Hi
>
>> > @@ -344,35 +344,30 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
>> >   */
>> >  static void dump_tasks(const struct mem_cgroup *mem)
>> >  {
>> > -   struct task_struct *g, *p;
>> > +   struct task_struct *p;
>> > +   struct task_struct *task;
>> >
>> >     printk(KERN_INFO "[ pid ]   uid  tgid total_vm      rss cpu oom_adj "
>> >            "name\n");
>> > -   do_each_thread(g, p) {
>> > +
>> > +   for_each_process(p) {
>> >             struct mm_struct *mm;
>> >
>> > -           if (mem && !task_in_mem_cgroup(p, mem))
>> > +           if (is_global_init(p) || (p->flags & PF_KTHREAD))
>>
>> select_bad_process needs is_global_init check to not select init as victim.
>> But in this case, it is just for dumping information of tasks.
>
> But dumping oom unrelated process is useless and making confusion.
> Do you have any suggestion? Instead, adding unkillable field?

I think it's not unrelated. Of course, init process doesn't consume
lots of memory but might consume more memory than old as time goes by
or some BUG although it is unlikely.

I think whether we print information of init or not isn't a big deal.
But we have been done it until now and you are trying to change it.
At least, we need some description why you want to remove it.
Making confusion? Hmm.. I don't think it make many people confusion.

>
>
>>
>> >                     continue;
>> > -           if (!thread_group_leader(p))
>> > +           if (mem && !task_in_mem_cgroup(p, mem))
>> >                     continue;
>> >
>> > -           task_lock(p);
>> > -           mm = p->mm;
>> > -           if (!mm) {
>> > -                   /*
>> > -                    * total_vm and rss sizes do not exist for tasks with no
>> > -                    * mm so there's no need to report them; they can't be
>> > -                    * oom killed anyway.
>> > -                    */
>>
>> Please, do not remove the comment for mm newbies unless you think it's useless.
>
> How is this?
>
>               task = find_lock_task_mm(p);
>               if (!task)
>                        /*
>                         * Probably oom vs task-exiting race was happen and ->mm
>                         * have been detached. thus there's no need to report them;
>                         * they can't be oom killed anyway.
>                         */
>                        continue;
>

Looks good to adding story about racing. but my point was "total_vm
and rss sizes do not exist for tasks with no mm". But I don't want to
bother you due to trivial.
It depends on you. :)

Thanks, Kosaki.


--
Kind regards,
Minchan Kim

2010-06-03 00:41:56

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 5/5] oom: dump_tasks() use find_lock_task_mm() too

> On Thu, Jun 3, 2010 at 9:06 AM, KOSAKI Motohiro
> <[email protected]> wrote:
> > Hi
> >
> >> > @@ -344,35 +344,30 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
> >> >   */
> >> >  static void dump_tasks(const struct mem_cgroup *mem)
> >> >  {
> >> > -   struct task_struct *g, *p;
> >> > +   struct task_struct *p;
> >> > +   struct task_struct *task;
> >> >
> >> >     printk(KERN_INFO "[ pid ]   uid  tgid total_vm      rss cpu oom_adj "
> >> >            "name\n");
> >> > -   do_each_thread(g, p) {
> >> > +
> >> > +   for_each_process(p) {
> >> >             struct mm_struct *mm;
> >> >
> >> > -           if (mem && !task_in_mem_cgroup(p, mem))
> >> > +           if (is_global_init(p) || (p->flags & PF_KTHREAD))
> >>
> >> select_bad_process needs is_global_init check to not select init as victim.
> >> But in this case, it is just for dumping information of tasks.
> >
> > But dumping oom unrelated process is useless and making confusion.
> > Do you have any suggestion? Instead, adding unkillable field?
>
> I think it's not unrelated. Of course, init process doesn't consume
> lots of memory but might consume more memory than old as time goes by
> or some BUG although it is unlikely.
>
> I think whether we print information of init or not isn't a big deal.
> But we have been done it until now and you are trying to change it.
> At least, we need some description why you want to remove it.
> Making confusion? Hmm.. I don't think it make many people confusion.

Hm. ok, I'll change logic as you said.



> >> > -           mm = p->mm;
> >> > -           if (!mm) {
> >> > -                   /*
> >> > -                    * total_vm and rss sizes do not exist for tasks with no
> >> > -                    * mm so there's no need to report them; they can't be
> >> > -                    * oom killed anyway.
> >> > -                    */
> >>
> >> Please, do not remove the comment for mm newbies unless you think it's useless.
> >
> > How is this?
> >
> >               task = find_lock_task_mm(p);
> >               if (!task)
> >                        /*
> >                         * Probably oom vs task-exiting race was happen and ->mm
> >                         * have been detached. thus there's no need to report them;
> >                         * they can't be oom killed anyway.
> >                         */
> >                        continue;
> >
>
> Looks good to adding story about racing. but my point was "total_vm
> and rss sizes do not exist for tasks with no mm". But I don't want to
> bother you due to trivial.
> It depends on you. :)


old ->mm check have two intention.

a) the task is kernel thread?
b) the task have alredy detached ->mm

but a) is not strictly correct check because we should think use_mm().
therefore we appended PF_KTHREAD check. then, here find_lock_task_mm()
focus exiting race, I think.



2010-06-03 00:46:28

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 5/5] oom: dump_tasks() use find_lock_task_mm() too

On Thu, Jun 3, 2010 at 9:41 AM, KOSAKI Motohiro
<[email protected]> wrote:
>> On Thu, Jun 3, 2010 at 9:06 AM, KOSAKI Motohiro
>> <[email protected]> wrote:
>> >> > -           mm = p->mm;
>> >> > -           if (!mm) {
>> >> > -                   /*
>> >> > -                    * total_vm and rss sizes do not exist for tasks with no
>> >> > -                    * mm so there's no need to report them; they can't be
>> >> > -                    * oom killed anyway.
>> >> > -                    */
>> >>
>> >> Please, do not remove the comment for mm newbies unless you think it's useless.
>> >
>> > How is this?
>> >
>> >               task = find_lock_task_mm(p);
>> >               if (!task)
>> >                        /*
>> >                         * Probably oom vs task-exiting race was happen and ->mm
>> >                         * have been detached. thus there's no need to report them;
>> >                         * they can't be oom killed anyway.
>> >                         */
>> >                        continue;
>> >
>>
>> Looks good to adding story about racing. but my point was "total_vm
>> and rss sizes do not exist for tasks with no mm". But I don't want to
>> bother you due to trivial.
>> It depends on you. :)
>
>
> old ->mm check have two intention.
>
>   a) the task is kernel thread?
>   b) the task have alredy detached ->mm
> but a) is not strictly correct check because we should think use_mm().
> therefore we appended PF_KTHREAD check. then, here find_lock_task_mm()
> focus exiting race, I think.
>

No objection.

--
Kind regards,
Minchan Kim