2015-06-18 09:58:14

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 0/2] oom: sysrq+f shouldn't not panic the system + cleanup

Hi,
I have split the patch sent previously http://marc.info/?l=linux-mm&m=143323521519798&w=2
into two parts. The first patch prevents from the panic when OOM
killer is sysrq triggered. This is an obvious bug fix and hopefuly not
controversial.

I still believe that combining the regular and the sysrq triggered OOM
paths is ugly, error prone and it deserves a split up which is done in
the second patch. There are no functional changes introduced there.
I have dropped __oom_kill_process part because this one turned out
to be harmless for for the sysrq+f path - I couldn't have found any
interruptible sleep after exit_signals.
I find the resulting code easier to follow (35 (+), 22 (-) sounds like a
reasonable code overhead for that purpose).


2015-06-18 09:58:03

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 1/2] oom: Do not panic when OOM killer is sysrq triggered

OOM killer might be triggered externally via sysrq+f. This is supposed
to kill a task no matter what e.g. a task is selected even though there
is an OOM victim on the way to exit. This is a big hammer for an admin
to help to resolve a memory short condition when the system is not able
to cope with it on its own in a reasonable time frame (e.g. when the
system is trashing or the OOM killer cannot make sufficient progress)

E.g. it doesn't make any sense to obey panic_on_oom setting because
a) administrator could have used other sysrqs to achieve the
panic/reboot and b) the policy would break an existing usecase to
kill a memory hog which would be recoverable unlike the panic which
might be configured for the real OOM condition.

It also doesn't make much sense to panic the system when there is no
OOM killable task because administrator might choose to do additional
steps before rebooting/panicing the system.

While we are there also add a comment explaining why
sysctl_oom_kill_allocating_task doesn't apply to sysrq triggered OOM
killer even though there is no explicit check and we subtly rely
on current->mm being NULL for the context from which it is triggered.

Also be more explicit about sysrq+f behavior in the documentation.

Requested-by: David Rientjes <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
Documentation/sysrq.txt | 5 ++++-
mm/oom_kill.c | 21 +++++++++++++++++----
2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt
index 0e307c94809a..a5dd88b0aede 100644
--- a/Documentation/sysrq.txt
+++ b/Documentation/sysrq.txt
@@ -75,7 +75,10 @@ On other - If you know of the key combos for other architectures, please

'e' - Send a SIGTERM to all processes, except for init.

-'f' - Will call oom_kill to kill a memory hog process.
+'f' - Will call oom_kill to kill a memory hog process. Please note that
+ an ongoing OOM killer is ignored and a task is killed even though
+ there was an oom victim selected already. panic_on_oom is ignored
+ and the system doesn't panic if there are no oom killable tasks.

'g' - Used by kgdb (kernel debugger)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index dff991e0681e..0c312eaac834 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -687,8 +687,14 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
constraint = constrained_alloc(zonelist, gfp_mask, nodemask,
&totalpages);
mpol_mask = (constraint == CONSTRAINT_MEMORY_POLICY) ? nodemask : NULL;
- check_panic_on_oom(constraint, gfp_mask, order, mpol_mask, NULL);
+ /* Ignore panic_on_oom when the OOM killer is sysrq triggered */
+ if (!force_kill)
+ check_panic_on_oom(constraint, gfp_mask, order, mpol_mask, NULL);

+ /*
+ * not affecting force_kill because sysrq triggered OOM killer runs from
+ * the workqueue context so current->mm will be NULL
+ */
if (sysctl_oom_kill_allocating_task && current->mm &&
!oom_unkillable_task(current, NULL, nodemask) &&
current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
@@ -700,10 +706,17 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
}

p = select_bad_process(&points, totalpages, mpol_mask, force_kill);
- /* Found nothing?!?! Either we hang forever, or we panic. */
+ /*
+ * Found nothing?!?! Either we hang forever, or we panic.
+ * Do not panic when the OOM killer is sysrq triggered.
+ */
if (!p) {
- dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
- panic("Out of memory and no killable processes...\n");
+ if (!force_kill) {
+ dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
+ panic("Out of memory and no killable processes...\n");
+ } else {
+ pr_info("Forced out of memory. No killable task found...\n");
+ }
}
if (p != (void *)-1UL) {
oom_kill_process(p, gfp_mask, order, points, totalpages, NULL,
--
2.1.4

2015-06-18 09:59:04

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 2/2] oom: split out forced OOM killer

The forced OOM killing is currently wired into out_of_memory() call
even though their objective is different which makes the code ugly
and harder to follow. Generic out_of_memory path has to deal with
configuration settings and heuristics which are completely irrelevant
to the forced OOM killer (e.g. sysctl_oom_kill_allocating_task or
OOM killer prevention for already dying tasks). All of them are
either relying on explicit force_kill check or indirectly by checking
current->mm which is always NULL for sysrq+f. This is not nice, hard
to follow and error prone.

Let's pull forced OOM killer code out into a separate function
(force_out_of_memory) which is really trivial now.
As a bonus we can clearly state that this is a forced OOM killer
in the OOM message which is helpful to distinguish it from the
regular OOM killer.

Signed-off-by: Michal Hocko <[email protected]>
---
drivers/tty/sysrq.c | 3 +--
include/linux/oom.h | 3 ++-
mm/oom_kill.c | 57 ++++++++++++++++++++++++++++++++---------------------
mm/page_alloc.c | 2 +-
4 files changed, 39 insertions(+), 26 deletions(-)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 3a42b7187b8e..06a95a8ed701 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -356,8 +356,7 @@ static struct sysrq_key_op sysrq_term_op = {
static void moom_callback(struct work_struct *ignored)
{
mutex_lock(&oom_lock);
- if (!out_of_memory(node_zonelist(first_memory_node, GFP_KERNEL),
- GFP_KERNEL, 0, NULL, true))
+ if (!force_out_of_memory())
pr_info("OOM request ignored because killer is disabled\n");
mutex_unlock(&oom_lock);
}
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 7deecb7bca5e..061e0ffd3493 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -70,8 +70,9 @@ extern enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
unsigned long totalpages, const nodemask_t *nodemask,
bool force_kill);

+extern bool force_out_of_memory(void);
extern bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
- int order, nodemask_t *mask, bool force_kill);
+ int order, nodemask_t *mask);

extern void exit_oom_victim(void);

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0c312eaac834..050936f35944 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -635,12 +635,38 @@ int unregister_oom_notifier(struct notifier_block *nb)
EXPORT_SYMBOL_GPL(unregister_oom_notifier);

/**
- * __out_of_memory - kill the "best" process when we run out of memory
+ * force_out_of_memory - forces OOM killer
+ *
+ * External trigger for the OOM killer. The system doesn't have to be under
+ * OOM condition (e.g. sysrq+f).
+ */
+bool force_out_of_memory(void)
+{
+ struct zonelist *zonelist = node_zonelist(first_memory_node, GFP_KERNEL);
+ struct task_struct *p;
+ unsigned long totalpages;
+ unsigned int points;
+
+ if (oom_killer_disabled)
+ return false;
+
+ constrained_alloc(zonelist, GFP_KERNEL, NULL, &totalpages);
+ p = select_bad_process(&points, totalpages, NULL, true);
+ if (p != (void *)-1UL)
+ oom_kill_process(p, GFP_KERNEL, 0, points, totalpages, NULL,
+ NULL, "Forced out of memory killer");
+ else
+ pr_warn("Forced out of memory. No killable task found...\n");
+
+ return true;
+}
+
+/**
+ * out_of_memory - kill the "best" process when we run out of memory
* @zonelist: zonelist pointer
* @gfp_mask: memory allocation flags
* @order: amount of memory being requested as a power of 2
* @nodemask: nodemask passed to page allocator
- * @force_kill: true if a task must be killed, even if others are exiting
*
* If we run out of memory, we have the choice between either
* killing a random task (bad), letting the system crash (worse)
@@ -648,7 +674,7 @@ EXPORT_SYMBOL_GPL(unregister_oom_notifier);
* don't have to be perfect here, we just have to be good.
*/
bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
- int order, nodemask_t *nodemask, bool force_kill)
+ int order, nodemask_t *nodemask)
{
const nodemask_t *mpol_mask;
struct task_struct *p;
@@ -687,14 +713,8 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
constraint = constrained_alloc(zonelist, gfp_mask, nodemask,
&totalpages);
mpol_mask = (constraint == CONSTRAINT_MEMORY_POLICY) ? nodemask : NULL;
- /* Ignore panic_on_oom when the OOM killer is sysrq triggered */
- if (!force_kill)
- check_panic_on_oom(constraint, gfp_mask, order, mpol_mask, NULL);
+ check_panic_on_oom(constraint, gfp_mask, order, mpol_mask, NULL);

- /*
- * not affecting force_kill because sysrq triggered OOM killer runs from
- * the workqueue context so current->mm will be NULL
- */
if (sysctl_oom_kill_allocating_task && current->mm &&
!oom_unkillable_task(current, NULL, nodemask) &&
current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
@@ -705,18 +725,11 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
goto out;
}

- p = select_bad_process(&points, totalpages, mpol_mask, force_kill);
- /*
- * Found nothing?!?! Either we hang forever, or we panic.
- * Do not panic when the OOM killer is sysrq triggered.
- */
+ p = select_bad_process(&points, totalpages, mpol_mask, false);
+ /* Found nothing?!?! Either we hang forever, or we panic. */
if (!p) {
- if (!force_kill) {
- dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
- panic("Out of memory and no killable processes...\n");
- } else {
- pr_info("Forced out of memory. No killable task found...\n");
- }
+ dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
+ panic("Out of memory and no killable processes...\n");
}
if (p != (void *)-1UL) {
oom_kill_process(p, gfp_mask, order, points, totalpages, NULL,
@@ -747,7 +760,7 @@ void pagefault_out_of_memory(void)
if (!mutex_trylock(&oom_lock))
return;

- if (!out_of_memory(NULL, 0, 0, NULL, false)) {
+ if (!out_of_memory(NULL, 0, 0, NULL)) {
/*
* There shouldn't be any user tasks runnable while the
* OOM killer is disabled, so the current task has to
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1f9ffbb087cb..014806d13138 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2731,7 +2731,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
goto out;
}
/* Exhausted what can be done so it's blamo time */
- if (out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask, false)
+ if (out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask)
|| WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
*did_some_progress = 1;
out:
--
2.1.4

2015-06-18 19:21:33

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH 1/2] oom: Do not panic when OOM killer is sysrq triggered

On Thu, 18 Jun 2015, Michal Hocko wrote:

> OOM killer might be triggered externally via sysrq+f. This is supposed

I'm not sure what you mean by externally? Perhaps "explicitly"?

> to kill a task no matter what e.g. a task is selected even though there
> is an OOM victim on the way to exit. This is a big hammer for an admin
> to help to resolve a memory short condition when the system is not able
> to cope with it on its own in a reasonable time frame (e.g. when the
> system is trashing or the OOM killer cannot make sufficient progress)
>
> E.g. it doesn't make any sense to obey panic_on_oom setting because
> a) administrator could have used other sysrqs to achieve the
> panic/reboot and b) the policy would break an existing usecase to
> kill a memory hog which would be recoverable unlike the panic which
> might be configured for the real OOM condition.
>
> It also doesn't make much sense to panic the system when there is no
> OOM killable task because administrator might choose to do additional
> steps before rebooting/panicing the system.
>

s/panicing/panicking/

> While we are there also add a comment explaining why
> sysctl_oom_kill_allocating_task doesn't apply to sysrq triggered OOM
> killer even though there is no explicit check and we subtly rely
> on current->mm being NULL for the context from which it is triggered.
>
> Also be more explicit about sysrq+f behavior in the documentation.
>
> Requested-by: David Rientjes <[email protected]>
> Signed-off-by: Michal Hocko <[email protected]>
> ---
> Documentation/sysrq.txt | 5 ++++-
> mm/oom_kill.c | 21 +++++++++++++++++----
> 2 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt
> index 0e307c94809a..a5dd88b0aede 100644
> --- a/Documentation/sysrq.txt
> +++ b/Documentation/sysrq.txt
> @@ -75,7 +75,10 @@ On other - If you know of the key combos for other architectures, please
>
> 'e' - Send a SIGTERM to all processes, except for init.
>
> -'f' - Will call oom_kill to kill a memory hog process.
> +'f' - Will call oom_kill to kill a memory hog process. Please note that
> + an ongoing OOM killer is ignored and a task is killed even though
> + there was an oom victim selected already. panic_on_oom is ignored
> + and the system doesn't panic if there are no oom killable tasks.

"an ongoing OOM killer" could probably be reworded to "parallel oom
killings".

>
> 'g' - Used by kgdb (kernel debugger)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index dff991e0681e..0c312eaac834 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -687,8 +687,14 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> constraint = constrained_alloc(zonelist, gfp_mask, nodemask,
> &totalpages);
> mpol_mask = (constraint == CONSTRAINT_MEMORY_POLICY) ? nodemask : NULL;
> - check_panic_on_oom(constraint, gfp_mask, order, mpol_mask, NULL);
> + /* Ignore panic_on_oom when the OOM killer is sysrq triggered */
> + if (!force_kill)
> + check_panic_on_oom(constraint, gfp_mask, order, mpol_mask, NULL);

I don't think the comment is necessary, it should be clear from the code
that this only executes when force_kill == true.

You may want to reconsider my suggestion of renaming the formal as
"sysrq".

>
> + /*
> + * not affecting force_kill because sysrq triggered OOM killer runs from
> + * the workqueue context so current->mm will be NULL
> + */

Unnecessary comment, nobody is reading this code with the short circuit in
mind.

> if (sysctl_oom_kill_allocating_task && current->mm &&
> !oom_unkillable_task(current, NULL, nodemask) &&
> current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
> @@ -700,10 +706,17 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> }
>
> p = select_bad_process(&points, totalpages, mpol_mask, force_kill);
> - /* Found nothing?!?! Either we hang forever, or we panic. */
> + /*
> + * Found nothing?!?! Either we hang forever, or we panic.
> + * Do not panic when the OOM killer is sysrq triggered.
> + */

Again, it's clear what a conditional does in C code.

> if (!p) {
> - dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
> - panic("Out of memory and no killable processes...\n");
> + if (!force_kill) {
> + dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
> + panic("Out of memory and no killable processes...\n");
> + } else {
> + pr_info("Forced out of memory. No killable task found...\n");
> + }

This line could probably be reworded to specify that an oom kill was
requested by a specific process and there was nothing avilable to kill.
I'm not sure that "forced" implies that it was process triggered.

> }
> if (p != (void *)-1UL) {
> oom_kill_process(p, gfp_mask, order, points, totalpages, NULL,

2015-06-18 19:27:33

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH 2/2] oom: split out forced OOM killer

On Thu, 18 Jun 2015, Michal Hocko wrote:

> The forced OOM killing is currently wired into out_of_memory() call
> even though their objective is different which makes the code ugly
> and harder to follow. Generic out_of_memory path has to deal with
> configuration settings and heuristics which are completely irrelevant
> to the forced OOM killer (e.g. sysctl_oom_kill_allocating_task or
> OOM killer prevention for already dying tasks). All of them are
> either relying on explicit force_kill check or indirectly by checking
> current->mm which is always NULL for sysrq+f. This is not nice, hard
> to follow and error prone.
>
> Let's pull forced OOM killer code out into a separate function
> (force_out_of_memory) which is really trivial now.
> As a bonus we can clearly state that this is a forced OOM killer
> in the OOM message which is helpful to distinguish it from the
> regular OOM killer.
>

Ok, so this patch reverts _everything_ in the first patch other than the
documentation. Just start with this patch instead, sheesh.

> Signed-off-by: Michal Hocko <[email protected]>
> ---
> drivers/tty/sysrq.c | 3 +--
> include/linux/oom.h | 3 ++-
> mm/oom_kill.c | 57 ++++++++++++++++++++++++++++++++---------------------
> mm/page_alloc.c | 2 +-
> 4 files changed, 39 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> index 3a42b7187b8e..06a95a8ed701 100644
> --- a/drivers/tty/sysrq.c
> +++ b/drivers/tty/sysrq.c
> @@ -356,8 +356,7 @@ static struct sysrq_key_op sysrq_term_op = {
> static void moom_callback(struct work_struct *ignored)
> {
> mutex_lock(&oom_lock);
> - if (!out_of_memory(node_zonelist(first_memory_node, GFP_KERNEL),
> - GFP_KERNEL, 0, NULL, true))
> + if (!force_out_of_memory())
> pr_info("OOM request ignored because killer is disabled\n");
> mutex_unlock(&oom_lock);
> }
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index 7deecb7bca5e..061e0ffd3493 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -70,8 +70,9 @@ extern enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
> unsigned long totalpages, const nodemask_t *nodemask,
> bool force_kill);
>
> +extern bool force_out_of_memory(void);
> extern bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> - int order, nodemask_t *mask, bool force_kill);
> + int order, nodemask_t *mask);
>
> extern void exit_oom_victim(void);
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 0c312eaac834..050936f35944 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -635,12 +635,38 @@ int unregister_oom_notifier(struct notifier_block *nb)
> EXPORT_SYMBOL_GPL(unregister_oom_notifier);
>
> /**
> - * __out_of_memory - kill the "best" process when we run out of memory
> + * force_out_of_memory - forces OOM killer

... to kill a process.

> + *
> + * External trigger for the OOM killer. The system doesn't have to be under
> + * OOM condition (e.g. sysrq+f).
> + */

I'm still not sure what you mean by external. I assume you're referring
to induced by userspace rather than the kernel. I think you should use
the word "explicit".

> +bool force_out_of_memory(void)
> +{
> + struct zonelist *zonelist = node_zonelist(first_memory_node, GFP_KERNEL);
> + struct task_struct *p;
> + unsigned long totalpages;
> + unsigned int points;
> +
> + if (oom_killer_disabled)
> + return false;
> +
> + constrained_alloc(zonelist, GFP_KERNEL, NULL, &totalpages);
> + p = select_bad_process(&points, totalpages, NULL, true);
> + if (p != (void *)-1UL)
> + oom_kill_process(p, GFP_KERNEL, 0, points, totalpages, NULL,
> + NULL, "Forced out of memory killer");
> + else
> + pr_warn("Forced out of memory. No killable task found...\n");

Please consider the review from the first patch about this line.

> +
> + return true;
> +}
> +
> +/**
> + * out_of_memory - kill the "best" process when we run out of memory
> * @zonelist: zonelist pointer
> * @gfp_mask: memory allocation flags
> * @order: amount of memory being requested as a power of 2
> * @nodemask: nodemask passed to page allocator
> - * @force_kill: true if a task must be killed, even if others are exiting
> *
> * If we run out of memory, we have the choice between either
> * killing a random task (bad), letting the system crash (worse)
> @@ -648,7 +674,7 @@ EXPORT_SYMBOL_GPL(unregister_oom_notifier);
> * don't have to be perfect here, we just have to be good.
> */
> bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> - int order, nodemask_t *nodemask, bool force_kill)
> + int order, nodemask_t *nodemask)
> {
> const nodemask_t *mpol_mask;
> struct task_struct *p;
> @@ -687,14 +713,8 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> constraint = constrained_alloc(zonelist, gfp_mask, nodemask,
> &totalpages);
> mpol_mask = (constraint == CONSTRAINT_MEMORY_POLICY) ? nodemask : NULL;
> - /* Ignore panic_on_oom when the OOM killer is sysrq triggered */
> - if (!force_kill)
> - check_panic_on_oom(constraint, gfp_mask, order, mpol_mask, NULL);
> + check_panic_on_oom(constraint, gfp_mask, order, mpol_mask, NULL);
>
> - /*
> - * not affecting force_kill because sysrq triggered OOM killer runs from
> - * the workqueue context so current->mm will be NULL
> - */
> if (sysctl_oom_kill_allocating_task && current->mm &&
> !oom_unkillable_task(current, NULL, nodemask) &&
> current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
> @@ -705,18 +725,11 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> goto out;
> }
>
> - p = select_bad_process(&points, totalpages, mpol_mask, force_kill);
> - /*
> - * Found nothing?!?! Either we hang forever, or we panic.
> - * Do not panic when the OOM killer is sysrq triggered.
> - */
> + p = select_bad_process(&points, totalpages, mpol_mask, false);
> + /* Found nothing?!?! Either we hang forever, or we panic. */
> if (!p) {
> - if (!force_kill) {
> - dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
> - panic("Out of memory and no killable processes...\n");
> - } else {
> - pr_info("Forced out of memory. No killable task found...\n");
> - }
> + dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
> + panic("Out of memory and no killable processes...\n");
> }
> if (p != (void *)-1UL) {
> oom_kill_process(p, gfp_mask, order, points, totalpages, NULL,
> @@ -747,7 +760,7 @@ void pagefault_out_of_memory(void)
> if (!mutex_trylock(&oom_lock))
> return;
>
> - if (!out_of_memory(NULL, 0, 0, NULL, false)) {
> + if (!out_of_memory(NULL, 0, 0, NULL)) {
> /*
> * There shouldn't be any user tasks runnable while the
> * OOM killer is disabled, so the current task has to
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1f9ffbb087cb..014806d13138 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2731,7 +2731,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> goto out;
> }
> /* Exhausted what can be done so it's blamo time */
> - if (out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask, false)
> + if (out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask)
> || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
> *did_some_progress = 1;
> out:

2015-06-19 07:09:53

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/2] oom: Do not panic when OOM killer is sysrq triggered

On Thu 18-06-15 12:21:22, David Rientjes wrote:
> On Thu, 18 Jun 2015, Michal Hocko wrote:
>
> > OOM killer might be triggered externally via sysrq+f. This is supposed
>
> I'm not sure what you mean by externally? Perhaps "explicitly"?

OK, explicitly is better.

[...]

> s/panicing/panicking/

Fixed

> > While we are there also add a comment explaining why
> > sysctl_oom_kill_allocating_task doesn't apply to sysrq triggered OOM
> > killer even though there is no explicit check and we subtly rely
> > on current->mm being NULL for the context from which it is triggered.
> >
> > Also be more explicit about sysrq+f behavior in the documentation.
> >
> > Requested-by: David Rientjes <[email protected]>
> > Signed-off-by: Michal Hocko <[email protected]>
> > ---
> > Documentation/sysrq.txt | 5 ++++-
> > mm/oom_kill.c | 21 +++++++++++++++++----
> > 2 files changed, 21 insertions(+), 5 deletions(-)
> >
> > diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt
> > index 0e307c94809a..a5dd88b0aede 100644
> > --- a/Documentation/sysrq.txt
> > +++ b/Documentation/sysrq.txt
> > @@ -75,7 +75,10 @@ On other - If you know of the key combos for other architectures, please
> >
> > 'e' - Send a SIGTERM to all processes, except for init.
> >
> > -'f' - Will call oom_kill to kill a memory hog process.
> > +'f' - Will call oom_kill to kill a memory hog process. Please note that
> > + an ongoing OOM killer is ignored and a task is killed even though
> > + there was an oom victim selected already. panic_on_oom is ignored
> > + and the system doesn't panic if there are no oom killable tasks.
>
> "an ongoing OOM killer" could probably be reworded to "parallel oom
> killings".

OK

> >
> > 'g' - Used by kgdb (kernel debugger)
> >
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index dff991e0681e..0c312eaac834 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -687,8 +687,14 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> > constraint = constrained_alloc(zonelist, gfp_mask, nodemask,
> > &totalpages);
> > mpol_mask = (constraint == CONSTRAINT_MEMORY_POLICY) ? nodemask : NULL;
> > - check_panic_on_oom(constraint, gfp_mask, order, mpol_mask, NULL);
> > + /* Ignore panic_on_oom when the OOM killer is sysrq triggered */
> > + if (!force_kill)
> > + check_panic_on_oom(constraint, gfp_mask, order, mpol_mask, NULL);
>
> I don't think the comment is necessary, it should be clear from the code
> that this only executes when force_kill == true.

OK, I will remove the comment.

> You may want to reconsider my suggestion of renaming the formal as
> "sysrq".

I consider the naming clear enough

> >
> > + /*
> > + * not affecting force_kill because sysrq triggered OOM killer runs from
> > + * the workqueue context so current->mm will be NULL
> > + */
>
> Unnecessary comment, nobody is reading this code with the short circuit in
> mind.

I find this comment helpful because it is pointing a non trivial fact
that requires wandering the code otherwise. So I will keep it.

> > if (sysctl_oom_kill_allocating_task && current->mm &&
> > !oom_unkillable_task(current, NULL, nodemask) &&
> > current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
> > @@ -700,10 +706,17 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> > }
> >
> > p = select_bad_process(&points, totalpages, mpol_mask, force_kill);
> > - /* Found nothing?!?! Either we hang forever, or we panic. */
> > + /*
> > + * Found nothing?!?! Either we hang forever, or we panic.
> > + * Do not panic when the OOM killer is sysrq triggered.
> > + */
>
> Again, it's clear what a conditional does in C code.

OK, I will remove it.

> > if (!p) {
> > - dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
> > - panic("Out of memory and no killable processes...\n");
> > + if (!force_kill) {
> > + dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
> > + panic("Out of memory and no killable processes...\n");
> > + } else {
> > + pr_info("Forced out of memory. No killable task found...\n");
> > + }
>
> This line could probably be reworded to specify that an oom kill was
> requested by a specific process and there was nothing avilable to kill.
> I'm not sure that "forced" implies that it was process triggered.

s@Forced@Sysrq triggered@ ?

>
> > }
> > if (p != (void *)-1UL) {
> > oom_kill_process(p, gfp_mask, order, points, totalpages, NULL,

---
diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt
index a5dd88b0aede..7664e93411d2 100644
--- a/Documentation/sysrq.txt
+++ b/Documentation/sysrq.txt
@@ -76,7 +76,7 @@ On other - If you know of the key combos for other architectures, please
'e' - Send a SIGTERM to all processes, except for init.

'f' - Will call oom_kill to kill a memory hog process. Please note that
- an ongoing OOM killer is ignored and a task is killed even though
+ parallel OOM killer is ignored and a task is killed even though
there was an oom victim selected already. panic_on_oom is ignored
and the system doesn't panic if there are no oom killable tasks.

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0c312eaac834..f2737d66f66a 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -687,7 +687,6 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
constraint = constrained_alloc(zonelist, gfp_mask, nodemask,
&totalpages);
mpol_mask = (constraint == CONSTRAINT_MEMORY_POLICY) ? nodemask : NULL;
- /* Ignore panic_on_oom when the OOM killer is sysrq triggered */
if (!force_kill)
check_panic_on_oom(constraint, gfp_mask, order, mpol_mask, NULL);

@@ -706,16 +705,13 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
}

p = select_bad_process(&points, totalpages, mpol_mask, force_kill);
- /*
- * Found nothing?!?! Either we hang forever, or we panic.
- * Do not panic when the OOM killer is sysrq triggered.
- */
+ /* Found nothing?!?! Either we hang forever, or we panic. */
if (!p) {
if (!force_kill) {
dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
panic("Out of memory and no killable processes...\n");
} else {
- pr_info("Forced out of memory. No killable task found...\n");
+ pr_info("Sysrq triggered out of memory. No killable task found...\n");
}
}
if (p != (void *)-1UL) {
--
Michal Hocko
SUSE Labs

2015-06-19 07:13:24

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 2/2] oom: split out forced OOM killer

On Thu 18-06-15 12:27:12, David Rientjes wrote:
> On Thu, 18 Jun 2015, Michal Hocko wrote:
>
> > The forced OOM killing is currently wired into out_of_memory() call
> > even though their objective is different which makes the code ugly
> > and harder to follow. Generic out_of_memory path has to deal with
> > configuration settings and heuristics which are completely irrelevant
> > to the forced OOM killer (e.g. sysctl_oom_kill_allocating_task or
> > OOM killer prevention for already dying tasks). All of them are
> > either relying on explicit force_kill check or indirectly by checking
> > current->mm which is always NULL for sysrq+f. This is not nice, hard
> > to follow and error prone.
> >
> > Let's pull forced OOM killer code out into a separate function
> > (force_out_of_memory) which is really trivial now.
> > As a bonus we can clearly state that this is a forced OOM killer
> > in the OOM message which is helpful to distinguish it from the
> > regular OOM killer.
> >
>
> Ok, so this patch reverts _everything_ in the first patch other than the
> documentation. Just start with this patch instead, sheesh.

The ordering is intentional. Clean up on top of the fix. And considering
how much you "loved" the previous attempt of the cleanup I had even
stronger reason to put this on top of the fix.

> > Signed-off-by: Michal Hocko <[email protected]>
> > ---
> > drivers/tty/sysrq.c | 3 +--
> > include/linux/oom.h | 3 ++-
> > mm/oom_kill.c | 57 ++++++++++++++++++++++++++++++++---------------------
> > mm/page_alloc.c | 2 +-
> > 4 files changed, 39 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> > index 3a42b7187b8e..06a95a8ed701 100644
> > --- a/drivers/tty/sysrq.c
> > +++ b/drivers/tty/sysrq.c
> > @@ -356,8 +356,7 @@ static struct sysrq_key_op sysrq_term_op = {
> > static void moom_callback(struct work_struct *ignored)
> > {
> > mutex_lock(&oom_lock);
> > - if (!out_of_memory(node_zonelist(first_memory_node, GFP_KERNEL),
> > - GFP_KERNEL, 0, NULL, true))
> > + if (!force_out_of_memory())
> > pr_info("OOM request ignored because killer is disabled\n");
> > mutex_unlock(&oom_lock);
> > }
> > diff --git a/include/linux/oom.h b/include/linux/oom.h
> > index 7deecb7bca5e..061e0ffd3493 100644
> > --- a/include/linux/oom.h
> > +++ b/include/linux/oom.h
> > @@ -70,8 +70,9 @@ extern enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
> > unsigned long totalpages, const nodemask_t *nodemask,
> > bool force_kill);
> >
> > +extern bool force_out_of_memory(void);
> > extern bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> > - int order, nodemask_t *mask, bool force_kill);
> > + int order, nodemask_t *mask);
> >
> > extern void exit_oom_victim(void);
> >
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 0c312eaac834..050936f35944 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -635,12 +635,38 @@ int unregister_oom_notifier(struct notifier_block *nb)
> > EXPORT_SYMBOL_GPL(unregister_oom_notifier);
> >
> > /**
> > - * __out_of_memory - kill the "best" process when we run out of memory
> > + * force_out_of_memory - forces OOM killer
>
> ... to kill a process.

OK

> > + *
> > + * External trigger for the OOM killer. The system doesn't have to be under
> > + * OOM condition (e.g. sysrq+f).
> > + */
>
> I'm still not sure what you mean by external. I assume you're referring
> to induced by userspace rather than the kernel. I think you should use
> the word "explicit".

OK
---
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5af8b5e44b27..7783a3760c56 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -635,9 +635,9 @@ int unregister_oom_notifier(struct notifier_block *nb)
EXPORT_SYMBOL_GPL(unregister_oom_notifier);

/**
- * force_out_of_memory - forces OOM killer
+ * force_out_of_memory - forces OOM killer to kill a process
*
- * External trigger for the OOM killer. The system doesn't have to be under
+ * Explicitly trigger the OOM killer. The system doesn't have to be under
* OOM condition (e.g. sysrq+f).
*/
bool force_out_of_memory(void)
--
Michal Hocko
SUSE Labs