2015-06-02 08:53:37

by Michal Hocko

[permalink] [raw]
Subject: [PATCH] oom: split out forced OOM killer

OOM killer might be triggered externally via sysrq+f. This is supposed
to kill a task no matter what e.g. a task is selected even though there
is an OOM victim on the way to exit. This is a big hammer for an admin
to help to resolve a memory short condition when the system is not able
to cope with it on its own in a reasonable time frame (e.g. when the
system is trashing or the OOM killer cannot make sufficient progress).

The forced OOM killing is currently wired into out_of_memory()
call which is kind of ugly because generic out_of_memory path
has to deal with configuration settings and heuristics which
are completely irrelevant to the forced OOM killer (e.g.
sysctl_oom_kill_allocating_task or OOM killer prevention for already
dying tasks). Some of those will not apply to sysrq because the handler
runs from the worker context.
check_panic_on_oom on the other hand will work and that is kind of
unexpected because sysrq+f should be usable to kill a mem hog whether
the global OOM policy is to panic or not.
It also doesn't make much sense to panic the system when no task cannot
be killed because admin has a separate sysrq for that purpose.

Let's pull forced OOM killer code out into a separate function
(force_out_of_memory) which is really trivial now. Also extract the core
of oom_kill_process into __oom_kill_process which doesn't do any
OOM prevention heuristics.
As a bonus we can clearly state that this is a forced OOM killer in the
OOM message which is helpful to distinguish it from the regular OOM
killer.

Signed-off-by: Michal Hocko <[email protected]>
---
drivers/tty/sysrq.c | 3 +--
include/linux/oom.h | 3 ++-
mm/oom_kill.c | 73 +++++++++++++++++++++++++++++++++++++++--------------
mm/page_alloc.c | 2 +-
4 files changed, 58 insertions(+), 23 deletions(-)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 3a42b7187b8e..06a95a8ed701 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -356,8 +356,7 @@ static struct sysrq_key_op sysrq_term_op = {
static void moom_callback(struct work_struct *ignored)
{
mutex_lock(&oom_lock);
- if (!out_of_memory(node_zonelist(first_memory_node, GFP_KERNEL),
- GFP_KERNEL, 0, NULL, true))
+ if (!force_out_of_memory())
pr_info("OOM request ignored because killer is disabled\n");
mutex_unlock(&oom_lock);
}
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 7deecb7bca5e..061e0ffd3493 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -70,8 +70,9 @@ extern enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
unsigned long totalpages, const nodemask_t *nodemask,
bool force_kill);

+extern bool force_out_of_memory(void);
extern bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
- int order, nodemask_t *mask, bool force_kill);
+ int order, nodemask_t *mask);

extern void exit_oom_victim(void);

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index dff991e0681e..8fea31e17461 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -487,7 +487,7 @@ void oom_killer_enable(void)
* Must be called while holding a reference to p, which will be released upon
* returning.
*/
-void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
+static void __oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
unsigned int points, unsigned long totalpages,
struct mem_cgroup *memcg, nodemask_t *nodemask,
const char *message)
@@ -500,19 +500,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
DEFAULT_RATELIMIT_BURST);

- /*
- * If the task is already exiting, don't alarm the sysadmin or kill
- * its children or threads, just set TIF_MEMDIE so it can die quickly
- */
- task_lock(p);
- if (p->mm && task_will_free_mem(p)) {
- mark_oom_victim(p);
- task_unlock(p);
- put_task_struct(p);
- return;
- }
- task_unlock(p);
-
if (__ratelimit(&oom_rs))
dump_header(p, gfp_mask, order, memcg, nodemask);

@@ -597,6 +584,28 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
}
#undef K

+void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
+ unsigned int points, unsigned long totalpages,
+ struct mem_cgroup *memcg, nodemask_t *nodemask,
+ const char *message)
+{
+ /*
+ * If the task is already exiting, don't alarm the sysadmin or kill
+ * its children or threads, just set TIF_MEMDIE so it can die quickly
+ */
+ task_lock(p);
+ if (p->mm && task_will_free_mem(p)) {
+ mark_oom_victim(p);
+ task_unlock(p);
+ put_task_struct(p);
+ return;
+ }
+ task_unlock(p);
+
+ __oom_kill_process(p, gfp_mask, order, points, totalpages, memcg,
+ nodemask, message);
+}
+
/*
* Determines whether the kernel must panic because of the panic_on_oom sysctl.
*/
@@ -635,12 +644,38 @@ int unregister_oom_notifier(struct notifier_block *nb)
EXPORT_SYMBOL_GPL(unregister_oom_notifier);

/**
- * __out_of_memory - kill the "best" process when we run out of memory
+ * force_out_of_memory - forces OOM killer
+ *
+ * External trigger for the OOM killer. The system doesn't have to be under
+ * OOM condition (e.g. sysrq+f).
+ */
+bool force_out_of_memory(void)
+{
+ struct zonelist *zonelist = node_zonelist(first_memory_node, GFP_KERNEL);
+ struct task_struct *p;
+ unsigned long totalpages;
+ unsigned int points;
+
+ if (oom_killer_disabled)
+ return false;
+
+ constrained_alloc(zonelist, GFP_KERNEL, NULL, &totalpages);
+ p = select_bad_process(&points, totalpages, NULL, true);
+ if (p != (void *)-1UL)
+ __oom_kill_process(p, GFP_KERNEL, 0, points, totalpages, NULL,
+ NULL, "Forced out of memory killer");
+ else
+ pr_warn("Forced out of memory. No killable task found...\n");
+
+ return true;
+}
+
+/**
+ * out_of_memory - kill the "best" process when we run out of memory
* @zonelist: zonelist pointer
* @gfp_mask: memory allocation flags
* @order: amount of memory being requested as a power of 2
* @nodemask: nodemask passed to page allocator
- * @force_kill: true if a task must be killed, even if others are exiting
*
* If we run out of memory, we have the choice between either
* killing a random task (bad), letting the system crash (worse)
@@ -648,7 +683,7 @@ EXPORT_SYMBOL_GPL(unregister_oom_notifier);
* don't have to be perfect here, we just have to be good.
*/
bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
- int order, nodemask_t *nodemask, bool force_kill)
+ int order, nodemask_t *nodemask)
{
const nodemask_t *mpol_mask;
struct task_struct *p;
@@ -699,7 +734,7 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
goto out;
}

- p = select_bad_process(&points, totalpages, mpol_mask, force_kill);
+ p = select_bad_process(&points, totalpages, mpol_mask, false);
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!p) {
dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
@@ -734,7 +769,7 @@ void pagefault_out_of_memory(void)
if (!mutex_trylock(&oom_lock))
return;

- if (!out_of_memory(NULL, 0, 0, NULL, false)) {
+ if (!out_of_memory(NULL, 0, 0, NULL)) {
/*
* There shouldn't be any user tasks runnable while the
* OOM killer is disabled, so the current task has to
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1f9ffbb087cb..014806d13138 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2731,7 +2731,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
goto out;
}
/* Exhausted what can be done so it's blamo time */
- if (out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask, false)
+ if (out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask)
|| WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
*did_some_progress = 1;
out:
--
2.1.4


2015-06-04 22:59:15

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] oom: split out forced OOM killer

On Tue, 2 Jun 2015, Michal Hocko wrote:

> OOM killer might be triggered externally via sysrq+f. This is supposed
> to kill a task no matter what e.g. a task is selected even though there
> is an OOM victim on the way to exit. This is a big hammer for an admin
> to help to resolve a memory short condition when the system is not able
> to cope with it on its own in a reasonable time frame (e.g. when the
> system is trashing or the OOM killer cannot make sufficient progress).
>
> The forced OOM killing is currently wired into out_of_memory()
> call which is kind of ugly because generic out_of_memory path
> has to deal with configuration settings and heuristics which
> are completely irrelevant to the forced OOM killer (e.g.
> sysctl_oom_kill_allocating_task or OOM killer prevention for already
> dying tasks). Some of those will not apply to sysrq because the handler
> runs from the worker context.
> check_panic_on_oom on the other hand will work and that is kind of
> unexpected because sysrq+f should be usable to kill a mem hog whether
> the global OOM policy is to panic or not.
> It also doesn't make much sense to panic the system when no task cannot
> be killed because admin has a separate sysrq for that purpose.
>
> Let's pull forced OOM killer code out into a separate function
> (force_out_of_memory) which is really trivial now. Also extract the core
> of oom_kill_process into __oom_kill_process which doesn't do any
> OOM prevention heuristics.
> As a bonus we can clearly state that this is a forced OOM killer in the
> OOM message which is helpful to distinguish it from the regular OOM
> killer.
>

I'm not sure what the benefit of this is, and it's adding more code.
Having multiple pathways and requirements, such as constrained_alloc(), to
oom kill a process isn't any clearer, in my opinion. It also isn't
intended to be optimized since the oom killer called from the page
allocator and from sysrq aren't fastpaths. To me, this seems like only a
source code level change and doesn't make anything more clear but rather
adds more code and obfuscates the entry path.

2015-06-05 11:29:54

by Austin S Hemmelgarn

[permalink] [raw]
Subject: Re: [PATCH] oom: split out forced OOM killer

On 2015-06-04 18:59, David Rientjes wrote:
> On Tue, 2 Jun 2015, Michal Hocko wrote:
>
>> OOM killer might be triggered externally via sysrq+f. This is supposed
>> to kill a task no matter what e.g. a task is selected even though there
>> is an OOM victim on the way to exit. This is a big hammer for an admin
>> to help to resolve a memory short condition when the system is not able
>> to cope with it on its own in a reasonable time frame (e.g. when the
>> system is trashing or the OOM killer cannot make sufficient progress).
>>
>> The forced OOM killing is currently wired into out_of_memory()
>> call which is kind of ugly because generic out_of_memory path
>> has to deal with configuration settings and heuristics which
>> are completely irrelevant to the forced OOM killer (e.g.
>> sysctl_oom_kill_allocating_task or OOM killer prevention for already
>> dying tasks). Some of those will not apply to sysrq because the handler
>> runs from the worker context.
>> check_panic_on_oom on the other hand will work and that is kind of
>> unexpected because sysrq+f should be usable to kill a mem hog whether
>> the global OOM policy is to panic or not.
>> It also doesn't make much sense to panic the system when no task cannot
>> be killed because admin has a separate sysrq for that purpose.
>>
>> Let's pull forced OOM killer code out into a separate function
>> (force_out_of_memory) which is really trivial now. Also extract the core
>> of oom_kill_process into __oom_kill_process which doesn't do any
>> OOM prevention heuristics.
>> As a bonus we can clearly state that this is a forced OOM killer in the
>> OOM message which is helpful to distinguish it from the regular OOM
>> killer.
>>
>
> I'm not sure what the benefit of this is, and it's adding more code.
> Having multiple pathways and requirements, such as constrained_alloc(), to
> oom kill a process isn't any clearer, in my opinion. It also isn't
> intended to be optimized since the oom killer called from the page
> allocator and from sysrq aren't fastpaths. To me, this seems like only a
> source code level change and doesn't make anything more clear but rather
> adds more code and obfuscates the entry path.

At the very least, it does make the semantics of sysrq-f much nicer for
admins (especially the bit where it ignores the panic_on_oom setting, if
the admin wants the system to panic, he'll use sysrq-c). There have
been times I've had to hit sysrq-f multiple times to get to actually
kill anything, and this looks to me like it would eliminate that rather
annoying issue as well.



Attachments:
smime.p7s (2.90 kB)
S/MIME Cryptographic Signature

2015-06-08 17:59:43

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] oom: split out forced OOM killer

On Fri, 5 Jun 2015, Austin S Hemmelgarn wrote:

> > I'm not sure what the benefit of this is, and it's adding more code.
> > Having multiple pathways and requirements, such as constrained_alloc(), to
> > oom kill a process isn't any clearer, in my opinion. It also isn't
> > intended to be optimized since the oom killer called from the page
> > allocator and from sysrq aren't fastpaths. To me, this seems like only a
> > source code level change and doesn't make anything more clear but rather
> > adds more code and obfuscates the entry path.
>
> At the very least, it does make the semantics of sysrq-f much nicer for admins
> (especially the bit where it ignores the panic_on_oom setting, if the admin
> wants the system to panic, he'll use sysrq-c). There have been times I've had
> to hit sysrq-f multiple times to get to actually kill anything, and this looks
> to me like it would eliminate that rather annoying issue as well.
>

Are you saying there's a functional change with this patch/

2015-06-08 18:59:16

by Austin S Hemmelgarn

[permalink] [raw]
Subject: Re: [PATCH] oom: split out forced OOM killer

On 2015-06-08 13:59, David Rientjes wrote:
> On Fri, 5 Jun 2015, Austin S Hemmelgarn wrote:
>
>>> I'm not sure what the benefit of this is, and it's adding more code.
>>> Having multiple pathways and requirements, such as constrained_alloc(), to
>>> oom kill a process isn't any clearer, in my opinion. It also isn't
>>> intended to be optimized since the oom killer called from the page
>>> allocator and from sysrq aren't fastpaths. To me, this seems like only a
>>> source code level change and doesn't make anything more clear but rather
>>> adds more code and obfuscates the entry path.
>>
>> At the very least, it does make the semantics of sysrq-f much nicer for admins
>> (especially the bit where it ignores the panic_on_oom setting, if the admin
>> wants the system to panic, he'll use sysrq-c). There have been times I've had
>> to hit sysrq-f multiple times to get to actually kill anything, and this looks
>> to me like it would eliminate that rather annoying issue as well.
>>
>
> Are you saying there's a functional change with this patch/
>
I believe so (haven't actually read the patch itself, just the
changelog), although it is only a change for certain configurations to a
very specific and (I hope infrequently) used piece of functionality.
Like I said above, if I wanted to crash my system, I'd be using sysrq-c;
and if I'm using sysrq-f, I want _some_ task to die _now_.


Attachments:
smime.p7s (2.90 kB)
S/MIME Cryptographic Signature

2015-06-08 19:41:57

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] oom: split out forced OOM killer

On Mon, 8 Jun 2015, Austin S Hemmelgarn wrote:

> I believe so (haven't actually read the patch itself, just the changelog),
> although it is only a change for certain configurations to a very specific and
> (I hope infrequently) used piece of functionality. Like I said above, if I
> wanted to crash my system, I'd be using sysrq-c; and if I'm using sysrq-f, I
> want _some_ task to die _now_.
>

This patch is not a functional change, so I don't interpret your feedback
as any support of it being merged.

That said, you raise an interesting point of whether sysrq+f should ever
trigger a panic due to panic_on_oom. The case can be made that it should
ignore panic_on_oom and require the use of another sysrq to panic the
machine instead. Sysrq+f could then be used to oom kill a process,
regardless of panic_on_oom, and the panic only occurs if userspace did not
trigger the kill or the kill itself will fail.

I think we should pursue that direction.

This patch also changes the text which is output to the kernel log on
panic, which we use to parse for machines that have crashed due to no
killable memcg processes, so NACK on this patch. There's also no reason
to add more source code to try to make things cleaner when it just
obfuscates the oom killer code more than it needs to (we don't need to
optimize or have multiple entry points).

2015-06-08 21:06:32

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] oom: split out forced OOM killer

On Mon 08-06-15 12:41:48, David Rientjes wrote:
> On Mon, 8 Jun 2015, Austin S Hemmelgarn wrote:
>
> > I believe so (haven't actually read the patch itself, just the changelog),
> > although it is only a change for certain configurations to a very specific and
> > (I hope infrequently) used piece of functionality. Like I said above, if I
> > wanted to crash my system, I'd be using sysrq-c; and if I'm using sysrq-f, I
> > want _some_ task to die _now_.
> >
>
> This patch is not a functional change, so I don't interpret your feedback
> as any support of it being merged.

David, have you actually read the patch? The changelog is mentioning this:
"
check_panic_on_oom on the other hand will work and that is kind of
unexpected because sysrq+f should be usable to kill a mem hog whether
the global OOM policy is to panic or not.
It also doesn't make much sense to panic the system when no task cannot
be killed because admin has a separate sysrq for that purpose.
"
and the patch exludes panic_on_oom from the sysrq path.

> That said, you raise an interesting point of whether sysrq+f should ever
> trigger a panic due to panic_on_oom. The case can be made that it should
> ignore panic_on_oom and require the use of another sysrq to panic the
> machine instead. Sysrq+f could then be used to oom kill a process,
> regardless of panic_on_oom, and the panic only occurs if userspace did not
> trigger the kill or the kill itself will fail.

Why would it panic the system if there is no killable task? Shoudln't
be admin able to do additional steps after the explicit oom killer failed
and only then panic by sysrq?

> I think we should pursue that direction.
>
> This patch also changes the text which is output to the kernel log on
> panic, which we use to parse for machines that have crashed due to no
> killable memcg processes, so NACK on this patch.

Could you point to the code snippet which does that? Because the only
change to the output is for the forced oom killer.

> There's also no reason
> to add more source code to try to make things cleaner when it just
> obfuscates the oom killer code more than it needs to (we don't need to
> optimize or have multiple entry points).

I am not sure I understand your objection here. The forced oom killer
path is now clear (__oom_kill_process does the dirty job while
oom_kill_process can do heuristics to prevent from pointless killing),
easier to follow and thus more maintainable from my POV.
I could understand your objection if this has added a lot of code but
4 files changed, 58 insertions(+), 23 deletions(-)

which seems appropriate to me.

--
Michal Hocko
SUSE Labs

2015-06-08 23:06:15

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] oom: split out forced OOM killer

On Mon, 8 Jun 2015, Michal Hocko wrote:

> > This patch is not a functional change, so I don't interpret your feedback
> > as any support of it being merged.
>
> David, have you actually read the patch? The changelog is mentioning this:
> "
> check_panic_on_oom on the other hand will work and that is kind of
> unexpected because sysrq+f should be usable to kill a mem hog whether
> the global OOM policy is to panic or not.
> It also doesn't make much sense to panic the system when no task cannot
> be killed because admin has a separate sysrq for that purpose.
> "
> and the patch exludes panic_on_oom from the sysrq path.
>

Yes, and that's why I believe we should pursue that direction without the
associated "cleanup" that adds 35 lines of code to supress a panic. In
other words, there's no reason to combine a patch that suppresses the
panic even with panic_on_oom, which I support, and a "cleanup" that I
believe just obfuscates the code.

It's a one-liner change: just test for force_kill and suppress the panic;
we don't need 35 new lines that create even more unique entry paths.

> > That said, you raise an interesting point of whether sysrq+f should ever
> > trigger a panic due to panic_on_oom. The case can be made that it should
> > ignore panic_on_oom and require the use of another sysrq to panic the
> > machine instead. Sysrq+f could then be used to oom kill a process,
> > regardless of panic_on_oom, and the panic only occurs if userspace did not
> > trigger the kill or the kill itself will fail.
>
> Why would it panic the system if there is no killable task? Shoudln't
> be admin able to do additional steps after the explicit oom killer failed
> and only then panic by sysrq?
>

Today it panics, I don't think it should panic when there are no killable
processes because it's inherently racy with userspace. It's similar to
suppressing panic_on_oom for sysrq+f, but for a different reason, so it
should probably be a separate patch with its own changelog (and update to
documentation for both patches to make this explicit).

2015-06-09 09:37:07

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] oom: split out forced OOM killer

On Mon 08-06-15 16:06:07, David Rientjes wrote:
> On Mon, 8 Jun 2015, Michal Hocko wrote:
>
> > > This patch is not a functional change, so I don't interpret your feedback
> > > as any support of it being merged.
> >
> > David, have you actually read the patch? The changelog is mentioning this:
> > "
> > check_panic_on_oom on the other hand will work and that is kind of
> > unexpected because sysrq+f should be usable to kill a mem hog whether
> > the global OOM policy is to panic or not.
> > It also doesn't make much sense to panic the system when no task cannot
> > be killed because admin has a separate sysrq for that purpose.
> > "
> > and the patch exludes panic_on_oom from the sysrq path.
> >
>
> Yes, and that's why I believe we should pursue that direction without the
> associated "cleanup" that adds 35 lines of code to supress a panic. In
> other words, there's no reason to combine a patch that suppresses the
> panic even with panic_on_oom, which I support, and a "cleanup" that I
> believe just obfuscates the code.
>
> It's a one-liner change: just test for force_kill and suppress the panic;
> we don't need 35 new lines that create even more unique entry paths.

I completely detest yet another check in out_of_memory. And there is
even no reason to do that. Forced kill and genuine oom have different
objectives and combining those two just makes the code harder to read
(one has to go to check the syrq callback to realize that the forced
path is triggered from the workqueue context and that current->mm !=
NULL check will prevent some heuristics. This is just too ugly to
live). So why the heck are you pushing for keeping everything in a
single path?

That being said, I have no problem to do 3 patches, where two of them
would add force check for check_panic_on_oom and panic on no killable
task and only then pull out force_out_of_memory to make it readable
again and drop force checks but I do not see much point in this
juggling.

> > > That said, you raise an interesting point of whether sysrq+f should ever
> > > trigger a panic due to panic_on_oom. The case can be made that it should
> > > ignore panic_on_oom and require the use of another sysrq to panic the
> > > machine instead. Sysrq+f could then be used to oom kill a process,
> > > regardless of panic_on_oom, and the panic only occurs if userspace did not
> > > trigger the kill or the kill itself will fail.
> >
> > Why would it panic the system if there is no killable task? Shoudln't
> > be admin able to do additional steps after the explicit oom killer failed
> > and only then panic by sysrq?
> >
>
> Today it panics, I don't think it should panic when there are no killable
> processes because it's inherently racy with userspace. It's similar to
> suppressing panic_on_oom for sysrq+f, but for a different reason, so it
> should probably be a separate patch with its own changelog (and update to
> documentation for both patches to make this explicit).

I have no problem to be more explicit about the behavior of course. I
can fold it to the original patch.
---
>From b2e611dd5d2589eb82c800e326a9c12da33958d5 Mon Sep 17 00:00:00 2001
From: Michal Hocko <[email protected]>
Date: Tue, 9 Jun 2015 08:49:09 +0200
Subject: [PATCH] Update the sysrq+f documentnation.

Requested-by: David Rientjes <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
Documentation/sysrq.txt | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt
index 0e307c94809a..a5dd88b0aede 100644
--- a/Documentation/sysrq.txt
+++ b/Documentation/sysrq.txt
@@ -75,7 +75,10 @@ On other - If you know of the key combos for other architectures, please

'e' - Send a SIGTERM to all processes, except for init.

-'f' - Will call oom_kill to kill a memory hog process.
+'f' - Will call oom_kill to kill a memory hog process. Please note that
+ an ongoing OOM killer is ignored and a task is killed even though
+ there was an oom victim selected already. panic_on_oom is ignored
+ and the system doesn't panic if there are no oom killable tasks.

'g' - Used by kgdb (kernel debugger)

--
2.1.4


--
Michal Hocko
SUSE Labs

2015-06-09 22:45:46

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] oom: split out forced OOM killer

On Tue, 9 Jun 2015, Michal Hocko wrote:

> > Yes, and that's why I believe we should pursue that direction without the
> > associated "cleanup" that adds 35 lines of code to supress a panic. In
> > other words, there's no reason to combine a patch that suppresses the
> > panic even with panic_on_oom, which I support, and a "cleanup" that I
> > believe just obfuscates the code.
> >
> > It's a one-liner change: just test for force_kill and suppress the panic;
> > we don't need 35 new lines that create even more unique entry paths.
>
> I completely detest yet another check in out_of_memory. And there is
> even no reason to do that. Forced kill and genuine oom have different
> objectives and combining those two just makes the code harder to read
> (one has to go to check the syrq callback to realize that the forced
> path is triggered from the workqueue context and that current->mm !=
> NULL check will prevent some heuristics. This is just too ugly to
> live). So why the heck are you pushing for keeping everything in a
> single path?
>

Perhaps if you renamed "force_kill" to "sysrq" it would make more sense to
you?

I don't think the oom killer needs multiple entry points that duplicates
code and adds more than twice the lines it removes. It would make sense
if that was an optimization in a hot path, or a warm path, or even a
luke-warm path, but not an icy cold path like the oom killer.
check_panic_on_oom() can simply do

if (sysrq)
return;

It's not hard and it's very clear. We don't need 35 more lines of code to
do this.

2015-06-10 07:37:35

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] oom: split out forced OOM killer

On Tue 09-06-15 15:45:35, David Rientjes wrote:
> On Tue, 9 Jun 2015, Michal Hocko wrote:
>
> > > Yes, and that's why I believe we should pursue that direction without the
> > > associated "cleanup" that adds 35 lines of code to supress a panic. In
> > > other words, there's no reason to combine a patch that suppresses the
> > > panic even with panic_on_oom, which I support, and a "cleanup" that I
> > > believe just obfuscates the code.
> > >
> > > It's a one-liner change: just test for force_kill and suppress the panic;
> > > we don't need 35 new lines that create even more unique entry paths.
> >
> > I completely detest yet another check in out_of_memory. And there is
> > even no reason to do that. Forced kill and genuine oom have different
> > objectives and combining those two just makes the code harder to read
> > (one has to go to check the syrq callback to realize that the forced
> > path is triggered from the workqueue context and that current->mm !=
> > NULL check will prevent some heuristics. This is just too ugly to
> > live). So why the heck are you pushing for keeping everything in a
> > single path?
> >
>
> Perhaps if you renamed "force_kill" to "sysrq" it would make more sense to
> you?

The naming is not _the_ problem.

> I don't think the oom killer needs multiple entry points that duplicates
> code and adds more than twice the lines it removes. It would make sense
> if that was an optimization in a hot path, or a warm path, or even a
> luke-warm path, but not an icy cold path like the oom killer.

This is not trying to optimize for speed. It is a clean up for
_readability_ and _maintainability_ which is considerably better after
the patch because responsibilities of both paths are clear and sysrq
path doesn't have to care about whatever special handling the oom path
wants to care. It is _that_ simple.

> check_panic_on_oom() can simply do

>
> if (sysrq)
> return;

and then do the same thing for panic on no killable task and then for
all other cases which are of no relevance for the sysrq path which we
come up later potentially.

This level of argumentation is just ridiculous. You are blocking a
useful cleanup which also fixes a really non-intuitive behavior. I admit
that nobody was complaining about this behavior so this is nothing
urgent but if we go with panic_on_oom_timeout proposal posted in other
email thread then I expect panic_on_oom would be usable much more and
then it would matter much more.

> It's not hard and it's very clear. We don't need 35 more lines of code to
> do this.

Sure we do not _need_ it and we definitely can _clutter_ the code even
more.

I do not think your objections are justified. It is natural and a good
practice to split code paths which have different requirements rather
than differentiate them with multiple checks in the common path (some of
them even very subtle). It is a common practice to split up common
infrastructure in helper functions and reuse them when needed. But I
guess I do not have teach you this trivial things...

</bunfight> from my side

Andrew do whatever you like with the patch but I find the level of
argumentation in this thread as not reasonable (I would even consider it
trolling at some parts) and not sufficient for a nack.
--
Michal Hocko
SUSE Labs