From: Michal Hocko <[email protected]>

Tetsuo has pointed out that since 27ae357fa82b ("mm, oom: fix concurrent
munlock and oom reaper unmap, v3") we have strong synchronization
between the oom_killer and the victim's exit because both have to take
the oom_lock. Therefore the original heuristic of sleeping for a short
time in out_of_memory no longer serves its original purpose.

Moreover, Tetsuo has noticed that the short sleep can be more harmful
than useful. Hammering the system with many processes can lead to
starvation: the task holding the oom_lock can block for a long time
(minutes) and stall any further progress, because the oom_reaper
depends on the oom_lock as well.

Drop the short sleep from out_of_memory when we hold the lock. Keep the
sleep when the trylock fails, to throttle the concurrent OOM paths a bit.
This should be solved in a more reasonable way (e.g. a sleep proportional
to the time spent in active reclaim), but that is a much more complex
thing to achieve. This is a quick fixup to remove stale code.

Reported-by: Tetsuo Handa <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
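For readers who want to see the resulting control flow in isolation, here
is a minimal userspace sketch, not the kernel code. It assumes oom_lock can
be modelled as a simple flag with trylock semantics. Every name ending in
_model, the two counters, and short_sleep() are made up for illustration.
The only behaviour taken from the description above is that the lock holder
no longer sleeps, while a caller that loses the trylock still backs off
briefly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Models oom_lock as a flag; the kernel uses a sleeping mutex instead. */
atomic_flag oom_lock_model = ATOMIC_FLAG_INIT;

/* Counters that let a caller observe which path was taken. */
int kills_attempted;
int throttled_sleeps;

/* Analogue of mutex_trylock(): true when the lock was acquired. */
bool oom_trylock(void)
{
	return !atomic_flag_test_and_set(&oom_lock_model);
}

void oom_unlock(void)
{
	atomic_flag_clear(&oom_lock_model);
}

/* Hypothetical stand-in for out_of_memory(): just records the attempt. */
bool out_of_memory_model(void)
{
	kills_attempted++;
	return true;
}

/* Roughly one millisecond, standing in for a one-jiffy sleep. */
void short_sleep(void)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000L };

	nanosleep(&ts, NULL);
}

/*
 * Returns true if this caller ran the modelled OOM killer, false if it
 * lost the trylock and was throttled instead.
 */
bool alloc_may_oom_model(void)
{
	bool ret;

	if (!oom_trylock()) {
		/* Someone else is handling the OOM: back off briefly. */
		throttled_sleeps++;
		short_sleep();
		return false;
	}

	/* Lock holder: do the work and release the lock without sleeping. */
	ret = out_of_memory_model();
	oom_unlock();
	return ret;
}
```

Calling alloc_may_oom_model() while the flag is free takes the kill path;
calling it while another context holds the flag takes the throttled path
and leaves kills_attempted unchanged.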
mm/oom_kill.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8ba6cb88cf58..ed9d473c571e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1077,15 +1077,9 @@ bool out_of_memory(struct oom_control *oc)
 		dump_header(oc, NULL);
 		panic("Out of memory and no killable processes...\n");
 	}
-	if (oc->chosen && oc->chosen != (void *)-1UL) {
+	if (oc->chosen && oc->chosen != (void *)-1UL)
 		oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
 				 "Memory cgroup out of memory");
-		/*
-		 * Give the killed process a good chance to exit before trying
-		 * to allocate memory again.
-		 */
-		schedule_timeout_killable(1);
-	}
 	return !!oc->chosen;
 }
--
2.18.0
On Mon, 9 Jul 2018, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> Tetsuo has pointed out that since 27ae357fa82b ("mm, oom: fix concurrent
> munlock and oom reaper unmap, v3") we have strong synchronization
> between the oom_killer and the victim's exit because both have to take
> the oom_lock. Therefore the original heuristic of sleeping for a short
> time in out_of_memory no longer serves its original purpose.
>
> Moreover, Tetsuo has noticed that the short sleep can be more harmful
> than useful. Hammering the system with many processes can lead to
> starvation: the task holding the oom_lock can block for a long time
> (minutes) and stall any further progress, because the oom_reaper
> depends on the oom_lock as well.
>
> Drop the short sleep from out_of_memory when we hold the lock. Keep the
> sleep when the trylock fails, to throttle the concurrent OOM paths a bit.
> This should be solved in a more reasonable way (e.g. a sleep proportional
> to the time spent in active reclaim), but that is a much more complex
> thing to achieve. This is a quick fixup to remove stale code.
>
> Reported-by: Tetsuo Handa <[email protected]>
> Signed-off-by: Michal Hocko <[email protected]>
This reminds me:
mm/oom_kill.c
54) int sysctl_oom_dump_tasks = 1;
55)
56) DEFINE_MUTEX(oom_lock);
57)
58) #ifdef CONFIG_NUMA
Would you mind documenting oom_lock to specify what it's protecting?
On Mon 09-07-18 15:49:53, David Rientjes wrote:
> On Mon, 9 Jul 2018, Michal Hocko wrote:
>
> > From: Michal Hocko <[email protected]>
> >
> > Tetsuo has pointed out that since 27ae357fa82b ("mm, oom: fix concurrent
> > munlock and oom reaper unmap, v3") we have strong synchronization
> > between the oom_killer and the victim's exit because both have to take
> > the oom_lock. Therefore the original heuristic of sleeping for a short
> > time in out_of_memory no longer serves its original purpose.
> >
> > Moreover, Tetsuo has noticed that the short sleep can be more harmful
> > than useful. Hammering the system with many processes can lead to
> > starvation: the task holding the oom_lock can block for a long time
> > (minutes) and stall any further progress, because the oom_reaper
> > depends on the oom_lock as well.
> >
> > Drop the short sleep from out_of_memory when we hold the lock. Keep the
> > sleep when the trylock fails, to throttle the concurrent OOM paths a bit.
> > This should be solved in a more reasonable way (e.g. a sleep proportional
> > to the time spent in active reclaim), but that is a much more complex
> > thing to achieve. This is a quick fixup to remove stale code.
> >
> > Reported-by: Tetsuo Handa <[email protected]>
> > Signed-off-by: Michal Hocko <[email protected]>
>
> This reminds me:
>
> mm/oom_kill.c
>
> 54) int sysctl_oom_dump_tasks = 1;
> 55)
> 56) DEFINE_MUTEX(oom_lock);
> 57)
> 58) #ifdef CONFIG_NUMA
>
> Would you mind documenting oom_lock to specify what it's protecting?
What do you think about the following?
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ed9d473c571e..32e6f7becb40 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -53,6 +53,14 @@ int sysctl_panic_on_oom;
 int sysctl_oom_kill_allocating_task;
 int sysctl_oom_dump_tasks = 1;
 
+/*
+ * Serializes oom killer invocations (out_of_memory()) from all contexts to
+ * prevent from over eager oom killing (e.g. when the oom killer is invoked
+ * from different domains).
+ *
+ * oom_killer_disable() relies on this lock to stabilize oom_killer_disabled
+ * and mark_oom_victim
+ */
 DEFINE_MUTEX(oom_lock);
 
 #ifdef CONFIG_NUMA
--
Michal Hocko
SUSE Labs
On Tue, 10 Jul 2018, Michal Hocko wrote:
> What do you think about the following?
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index ed9d473c571e..32e6f7becb40 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -53,6 +53,14 @@ int sysctl_panic_on_oom;
> int sysctl_oom_kill_allocating_task;
> int sysctl_oom_dump_tasks = 1;
>
> +/*
> + * Serializes oom killer invocations (out_of_memory()) from all contexts to
> + * prevent from over eager oom killing (e.g. when the oom killer is invoked
> + * from different domains).
> + *
> + * oom_killer_disable() relies on this lock to stabilize oom_killer_disabled
> + * and mark_oom_victim
> + */
> DEFINE_MUTEX(oom_lock);
>
> #ifdef CONFIG_NUMA
I think it's better, thanks. However, does it address the question about
why __oom_reap_task_mm() needs oom_lock protection? Perhaps it would be
helpful to mention synchronization between reaping triggered from
oom_reaper and by exit_mmap().
On Tue, 10 Jul 2018, David Rientjes wrote:
> I think it's better, thanks. However, does it address the question about
> why __oom_reap_task_mm() needs oom_lock protection? Perhaps it would be
> helpful to mention synchronization between reaping triggered from
> oom_reaper and by exit_mmap().
>
Actually, can't we remove the need to take oom_lock in exit_mmap() if
__oom_reap_task_mm() can do a test and set on MMF_UNSTABLE and, if already
set, bail out immediately?
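
To make the test-and-set idea concrete, here is a minimal userspace sketch
of the claim step only. It assumes C11 atomics as a stand-in for the
kernel's test_and_set_bit(). struct mm_model, the model_* functions, and
the bit position are hypothetical and chosen purely for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit position chosen for illustration only; not the kernel's value. */
#define MMF_UNSTABLE_BIT 0

struct mm_model {
	atomic_ulong flags;	/* stands in for mm->flags */
	int reap_count;		/* how many times the mm was actually reaped */
};

/* Userspace analogue of test_and_set_bit(): returns the old bit value. */
bool model_test_and_set_bit(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Returns true if this caller did the reaping, false if it bailed out. */
bool model_oom_reap_task_mm(struct mm_model *mm)
{
	/* Whoever sets the bit first owns the reaping; others bail out. */
	if (model_test_and_set_bit(MMF_UNSTABLE_BIT, &mm->flags))
		return false;

	mm->reap_count++;	/* placeholder for the actual unmapping */
	return true;
}
```

The sketch only shows that exactly one of two callers wins the claim. It
does not make the loser wait for the winner to finish, so it does not by
itself show that exit_mmap() could safely continue tearing down the
address space while the other side is still reaping.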
On Tue 10-07-18 14:12:28, David Rientjes wrote:
> On Tue, 10 Jul 2018, David Rientjes wrote:
>
> > I think it's better, thanks. However, does it address the question about
> > why __oom_reap_task_mm() needs oom_lock protection? Perhaps it would be
> > helpful to mention synchronization between reaping triggered from
> > oom_reaper and by exit_mmap().
> >
>
> Actually, can't we remove the need to take oom_lock in exit_mmap() if
> __oom_reap_task_mm() can do a test and set on MMF_UNSTABLE and, if already
> set, bail out immediately?
I think we do not really depend on oom_lock anymore in
__oom_reap_task_mm. The race it was originally added for (mmget_not_zero
vs. the exit path) is no longer a problem. I haven't had a chance to
evaluate it more deeply, though; there are just too many things going on
in parallel. Tetsuo proposed some patches to remove the lock, but those
patches had other problems. If we have a simple patch to remove the
oom_lock from the oom reaper then I will review it. I am not sure I can
come up with a patch myself in a few days.
--
Michal Hocko
SUSE Labs