2016-03-08 13:12:37

Subject: [PATCH -mm 0/2] oom_reaper: missing parts

Hi Andrew,
the following two leftovers are missing from your tree
right now. Could you add them please?

Thanks to Tetsuo for pointing it out http://lkml.kernel.org/r/[email protected]

2016-03-08 13:12:46

by Michal Hocko

Subject: [PATCH 1/2] mm-oom_reaper-report-success-failure-fix-fix

From: Michal Hocko <[email protected]>

typo fix

Signed-off-by: Michal Hocko <[email protected]>
---
mm/oom_kill.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 09e6f3211f1c..70fff7e3b1a7 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -480,7 +480,7 @@ static bool __oom_reap_task(struct task_struct *tsk)
 		}
 	}
 	tlb_finish_mmu(&tlb, 0, -1);
-	pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lulB\n",
+	pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
 			task_pid_nr(tsk), tsk->comm,
 			K(get_mm_counter(mm, MM_ANONPAGES)),
 			K(get_mm_counter(mm, MM_FILEPAGES)),
--
2.7.0
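
To illustrate why the one-character fix matters, here is a minimal
userspace sketch. It assumes 4 KiB pages (PAGE_SHIFT of 12), and the two
helpers format_shmem_fixed() and format_shmem_typo() are hypothetical names
that do not exist in the kernel. K() is modeled on the pages-to-kB macro
used in the hunk. With the typo, "%lu" is followed by the literal
characters "lB", so the value is printed with the wrong unit suffix.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: 4 KiB pages; the kernel derives PAGE_SHIFT per architecture. */
#define PAGE_SHIFT 12
/* Pages -> kB conversion, modeled on the K() helper used in the hunk. */
#define K(x) ((unsigned long)(x) << (PAGE_SHIFT - 10))

/* Hypothetical helper: formats the shmem-rss field with the fixed specifier. */
static int format_shmem_fixed(char *buf, size_t len, unsigned long pages)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "shmem-rss:%lukB", K(pages));
}

/* Same field with the typo: "%lu" followed by the literal text "lB". */
static int format_shmem_typo(char *buf, size_t len, unsigned long pages)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "shmem-rss:%lulB", K(pages));
}
```

For two pages, the fixed helper writes "shmem-rss:8kB" while the typo
variant writes "shmem-rss:8lB".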

2016-03-08 13:12:54

by Michal Hocko

Subject: [PATCH 2/2] oom-clear-tif_memdie-after-oom_reaper-managed-to-unmap-the-address-space-fix

From: Michal Hocko <[email protected]>

fix a left over

Tetsuo Handa <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
mm/oom_kill.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 70fff7e3b1a7..b6228643367b 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -569,7 +569,7 @@ static int __init oom_init(void)
 }
 subsys_initcall(oom_init)
 #else
-static void wake_oom_reaper(struct task_struct *mm)
+static void wake_oom_reaper(struct task_struct *tsk)
 {
 }
 #endif
--
2.7.0

2016-03-08 13:18:32

by Michal Hocko

Subject: Re: [PATCH -mm 0/2] oom_reaper: missing parts

On Tue 08-03-16 14:12:15, Michal Hocko wrote:
> Hi Andrew,
> the following two leftovers are missing from your tree
> right now. Could you add them please?
>
> Thanks to Tetsuo for pointing it out http://lkml.kernel.org/r/[email protected]

And I failed to notice this was a private email.

--
Michal Hocko
SUSE Labs

2016-03-09 21:21:50

by Andrew Morton

Subject: Re: [PATCH 2/2] oom-clear-tif_memdie-after-oom_reaper-managed-to-unmap-the-address-space-fix

On Tue, 8 Mar 2016 14:12:17 +0100 Michal Hocko <[email protected]> wrote:

> From: Michal Hocko <[email protected]>
>
> fix a left over
>
> Tetsuo Handa <[email protected]>
> Signed-off-by: Michal Hocko <[email protected]>
> ---
> mm/oom_kill.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 70fff7e3b1a7..b6228643367b 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -569,7 +569,7 @@ static int __init oom_init(void)
> }
> subsys_initcall(oom_init)
> #else
> -static void wake_oom_reaper(struct task_struct *mm)
> +static void wake_oom_reaper(struct task_struct *tsk)
> {
> }
> #endif

Thanks.

I found the below patch lying around but I didn't queue it properly.
Is it legit?

From: Johannes Weiner <[email protected]>
Subject: oom-clear-tif_memdie-after-oom_reaper-managed-to-unmap-the-address-space-fix

When the OOM killer scans tasks and encounters a PF_EXITING one, it
force-selects that one regardless of the score. Is there a possibility
that the task might hang after it has set PF_EXITING? In that case the
OOM killer should be able to move on to the next task.

Frankly, I don't even know why we check for exiting tasks in the OOM
killer. We've tried direct reclaim at least 15 times by the time we
decide the system is OOM, there was plenty of time to exit and free
memory; and a task might exit voluntarily right after we issue a kill.
This is testing pure noise.

Cc: Tetsuo Handa <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Sasha Levin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

mm/oom_kill.c | 3 ---
1 file changed, 3 deletions(-)

diff -puN mm/oom_kill.c~oom-clear-tif_memdie-after-oom_reaper-managed-to-unmap-the-address-space-fix mm/oom_kill.c
--- a/mm/oom_kill.c~oom-clear-tif_memdie-after-oom_reaper-managed-to-unmap-the-address-space-fix
+++ a/mm/oom_kill.c
@@ -292,9 +292,6 @@ enum oom_scan_t oom_scan_process_thread(
 	if (oom_task_origin(task))
 		return OOM_SCAN_SELECT;

-	if (task_will_free_mem(task) && !is_sysrq_oom(oc))
-		return OOM_SCAN_ABORT;
-
 	return OOM_SCAN_OK;
 }

_

2016-03-09 22:23:32

by Tetsuo Handa

Subject: Re: [PATCH 2/2] oom-clear-tif_memdie-after-oom_reaper-managed-to-unmap-the-address-space-fix

Andrew Morton wrote:
> I found the below patch lying around but I didn't queue it properly.
> Is it legit?

I think that patch needs its description updated.
It is not testing pure noise; it is causing a possible livelock.
http://lkml.kernel.org/r/[email protected]

>
>
> From: Johannes Weiner <[email protected]>
> Subject: oom-clear-tif_memdie-after-oom_reaper-managed-to-unmap-the-address-space-fix
>
> When the OOM killer scans tasks and encounters a PF_EXITING one, it
> force-selects that one regardless of the score. Is there a possibility
> that the task might hang after it has set PF_EXITING? In that case the
> OOM killer should be able to move on to the next task.
>
> Frankly, I don't even know why we check for exiting tasks in the OOM
> killer. We've tried direct reclaim at least 15 times by the time we
> decide the system is OOM, there was plenty of time to exit and free
> memory; and a task might exit voluntarily right after we issue a kill.
> This is testing pure noise.
>
> Cc: Tetsuo Handa <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Kirill A. Shutemov <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Sasha Levin <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
>
> mm/oom_kill.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff -puN mm/oom_kill.c~oom-clear-tif_memdie-after-oom_reaper-managed-to-unmap-the-address-space-fix mm/oom_kill.c
> --- a/mm/oom_kill.c~oom-clear-tif_memdie-after-oom_reaper-managed-to-unmap-the-address-space-fix
> +++ a/mm/oom_kill.c
> @@ -292,9 +292,6 @@ enum oom_scan_t oom_scan_process_thread(
>  	if (oom_task_origin(task))
>  		return OOM_SCAN_SELECT;
>
> -	if (task_will_free_mem(task) && !is_sysrq_oom(oc))
> -		return OOM_SCAN_ABORT;
> -
>  	return OOM_SCAN_OK;
>  }
>
> _
>
>

2016-03-09 22:30:15

by Johannes Weiner

Subject: Re: [PATCH 2/2] oom-clear-tif_memdie-after-oom_reaper-managed-to-unmap-the-address-space-fix

On Wed, Mar 09, 2016 at 01:21:42PM -0800, Andrew Morton wrote:
> I found the below patch lying around but I didn't queue it properly.
> Is it legit?

Yeah. Michal suggested this should be its own patch, which I agree
with. The subject would then be:

Subject: mm: oom_kill: don't ignore oom score on exiting tasks

> From: Johannes Weiner <[email protected]>
> Subject: oom-clear-tif_memdie-after-oom_reaper-managed-to-unmap-the-address-space-fix
>
> When the OOM killer scans tasks and encounters a PF_EXITING one, it
> force-selects that one regardless of the score. Is there a possibility
> that the task might hang after it has set PF_EXITING? In that case the
> OOM killer should be able to move on to the next task.
>
> Frankly, I don't even know why we check for exiting tasks in the OOM
> killer. We've tried direct reclaim at least 15 times by the time we
> decide the system is OOM, there was plenty of time to exit and free
> memory; and a task might exit voluntarily right after we issue a kill.
> This is testing pure noise.

Signed-off-by: Johannes Weiner <[email protected]>

2016-03-09 22:48:48

by Johannes Weiner

Subject: Re: [PATCH 2/2] oom-clear-tif_memdie-after-oom_reaper-managed-to-unmap-the-address-space-fix

On Thu, Mar 10, 2016 at 07:21:58AM +0900, Tetsuo Handa wrote:
> Andrew Morton wrote:
> > I found the below patch lying around but I didn't queue it properly.
> > Is it legit?
>
> I think that patch needs its description updated.
> It is not testing pure noise; it is causing a possible livelock.
> http://lkml.kernel.org/r/[email protected]

Sorry, I completely missed that. We're drowning in OOM killer fixes!

However, I disagree with your changelog. The scenario you describe is
real, but the fact that the hung task is exiting is also noise. The underlying
problem is that the OOM victim is hung. Instead of OOM_SCAN_ABORT, the
OOM killer could also select some other non-exiting task that has the
mmap_sem held for reading. This patch doesn't fix that bug.

2016-03-09 23:09:01

by Andrew Morton

Subject: Re: [PATCH 2/2] oom-clear-tif_memdie-after-oom_reaper-managed-to-unmap-the-address-space-fix

On Wed, 9 Mar 2016 17:48:29 -0500 Johannes Weiner <[email protected]> wrote:

> However, I disagree with your changelog.

What text would you prefer?

2016-03-10 00:45:21

by Johannes Weiner

Subject: Re: [PATCH 2/2] oom-clear-tif_memdie-after-oom_reaper-managed-to-unmap-the-address-space-fix

On Wed, Mar 09, 2016 at 03:08:53PM -0800, Andrew Morton wrote:
> On Wed, 9 Mar 2016 17:48:29 -0500 Johannes Weiner <[email protected]> wrote:
>
> > However, I disagree with your changelog.
>
> What text would you prefer?

I'd just keep the one you had initially. Or better, this modified
version:

When the OOM killer scans tasks and encounters a PF_EXITING one, it
force-selects that task regardless of the score. The problem is that
if that task got stuck waiting for some state the allocation site is
holding, the OOM reaper can not move on to the next best victim.

Frankly, I don't even know why we check for exiting tasks in the OOM
killer. We've tried direct reclaim at least 15 times by the time we
decide the system is OOM, there was plenty of time to exit and free
memory; and a task might exit voluntarily right after we issue a kill.
This is testing pure noise. Remove it.
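
The behaviour this changelog describes can be sketched with a small
userspace model. This is an illustration only, not the kernel's
select_bad_process(): struct model_task, its fields, and
model_select_victim() are hypothetical names, the "exiting" flag stands in
for task_will_free_mem(), and returning NULL stands in for the
OOM_SCAN_ABORT path. With the check enabled, a single exiting task stops
the scan and nothing is killed; without it, the highest-scoring task is
chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a task; all fields are hypothetical. */
struct model_task {
	const char *comm;
	unsigned long points;	/* badness score */
	bool exiting;		/* models task_will_free_mem() */
};

/*
 * Returns the chosen victim, or NULL when the scan aborts to wait for
 * an exiting task. check_exiting == true models the behaviour before
 * the patch; false models the behaviour after it.
 */
static const struct model_task *
model_select_victim(const struct model_task *tasks, size_t n,
		    bool check_exiting)
{
	const struct model_task *chosen = NULL;

	assert(tasks || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (check_exiting && tasks[i].exiting)
			return NULL;	/* abort: wait, kill nothing */
		if (!chosen || tasks[i].points > chosen->points)
			chosen = &tasks[i];
	}
	return chosen;
}
```

If the exiting task in this model never finishes exiting, every scan with
the check enabled returns NULL again, which is the stall the changelog
warns about.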

2016-03-10 11:19:07

by Tetsuo Handa

Subject: Re: [PATCH 2/2] oom-clear-tif_memdie-after-oom_reaper-managed-to-unmap-the-address-space-fix

Johannes Weiner wrote:
> On Wed, Mar 09, 2016 at 03:08:53PM -0800, Andrew Morton wrote:
> > On Wed, 9 Mar 2016 17:48:29 -0500 Johannes Weiner <[email protected]> wrote:
> >
> > > However, I disagree with your changelog.
> >
> > What text would you prefer?
>
> I'd just keep the one you had initially. Or better, this modified
> version:
>
> When the OOM killer scans tasks and encounters a PF_EXITING one, it
> force-selects that task regardless of the score. The problem is that
> if that task got stuck waiting for some state the allocation site is
> holding, the OOM reaper can not move on to the next best victim.
>

There is no guarantee that the OOM reaper is woken up.
There are shortcuts, which I don't like.

> Frankly, I don't even know why we check for exiting tasks in the OOM
> killer. We've tried direct reclaim at least 15 times by the time we
> decide the system is OOM, there was plenty of time to exit and free
> memory; and a task might exit voluntarily right after we issue a kill.
> This is testing pure noise. Remove it.
>

My concern is how optimistic it is to wait blindly and forever for a
task_will_free_mem() or TIF_MEMDIE task
( http://lkml.kernel.org/r/[email protected] ).
We have

  do_exit() {
    exit_signals(); /* sets PF_EXITING */
    /* (1) start */
    exit_mm() {
      mm_release() {
        exit_robust_list() {
          get_user() {
            __do_page_fault() {
              /* (1) end */
              down_read(&current->mm->mmap_sem);
              handle_mm_fault() {
                kmalloc(GFP_KERNEL) {
                  out_of_memory() {
                    if (current->mm &&
                        (fatal_signal_pending(current) || task_will_free_mem(current))) {
                      mark_oom_victim(current); /* sets TIF_MEMDIE */
                      return true;
                    }
                  }
                }
              }
              up_read(&current->mm->mmap_sem);
              /* (2) start */
            }
          }
        }
      }
      /* (2) end */
      down_read(&current->mm->mmap_sem);
      up_read(&current->mm->mmap_sem);
      current->mm = NULL;
      exit_oom_victim();
    }
  }

sequence. We will hit a silent OOM livelock if somebody sharing the mm calls
down_write_killable(&current->mm->mmap_sem) and then kmalloc(GFP_KERNEL) (for mmap() etc.)
at (1) or (2). This happens because we fail to send SIGKILL to the task that is taking
or has already taken down_write_killable(&current->mm->mmap_sem), and because we return
OOM_SCAN_ABORT without testing whether down_read(&victim->mm->mmap_sem) will succeed.
Since the OOM reaper is not invoked when the shortcut is used, nobody can unlock.

Doing

-	if (task_will_free_mem(task) && !is_sysrq_oom(oc))
+	if (task_will_free_mem(task) && !is_sysrq_oom(oc) && can_lock_mm_for_read(task))
 		return OOM_SCAN_ABORT;

and

 	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
-		if (!is_sysrq_oom(oc))
+		if (!is_sysrq_oom(oc) && can_lock_mm_for_read(task))
 			return OOM_SCAN_ABORT;
 	}

is too hasty a decision, because can_lock_mm_for_read(task) might become true
if we waited for a moment. Doing

-	if (task_will_free_mem(task) && !is_sysrq_oom(oc))
+	if (task_will_free_mem(task) && !is_sysrq_oom(oc) && we_havent_waited_enough_period(task))
 		return OOM_SCAN_ABORT;

and

 	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
-		if (!is_sysrq_oom(oc))
+		if (!is_sysrq_oom(oc) && we_havent_waited_enough_period(task))
 			return OOM_SCAN_ABORT;
 	}

is timeout-based unlocking, which Michal does not like. Doing

-	if (task_will_free_mem(task) && !is_sysrq_oom(oc))
+	if (task_will_free_mem(task) && !is_sysrq_oom(oc) && should_oom_scan_abort(task))
 		return OOM_SCAN_ABORT;

and

 	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
-		if (!is_sysrq_oom(oc))
+		if (!is_sysrq_oom(oc) && should_oom_scan_abort(task))
 			return OOM_SCAN_ABORT;
 	}

is counter-based unlocking; I don't know what Michal thinks of it.

This situation is similar to the question of when to declare OOM in the OOM detection rework.
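
One possible shape for the counter-based variant is sketched below as a
userspace model. Everything here is an assumption made for illustration:
the thread does not define should_oom_scan_abort(), so struct model_victim,
its abort_count field, the MODEL_MAX_ABORTS threshold, and
model_should_oom_scan_abort() are all hypothetical. The idea shown is only
that the scan defers to a given victim a bounded number of times and then
stops returning "abort", so the OOM killer can move on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical threshold: how many scans may defer to one victim. */
#define MODEL_MAX_ABORTS 3

/* Hypothetical per-victim state; not a kernel structure. */
struct model_victim {
	unsigned int abort_count;	/* scans that already waited on it */
};

/*
 * Hypothetical stand-in for should_oom_scan_abort(): keep waiting for
 * the victim only a bounded number of times, then let the scan go on.
 */
static bool model_should_oom_scan_abort(struct model_victim *v)
{
	assert(v);
	if (v->abort_count >= MODEL_MAX_ABORTS)
		return false;	/* waited enough: do not abort the scan */
	v->abort_count++;
	return true;		/* keep waiting for this victim */
}
```

With the threshold of 3 assumed here, the first three calls for a victim
return true and every later call returns false.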