2003-02-28 01:01:57

by Dan Kegel

Subject: Protecting processes from the OOM killer

For a while now, I've been trying to figure out how
to make the oom killer not kill important processes.

How about rewarding processes that have an
RSS limit if they stay well below it?
The operator can then mark processes that are important
by using 'ulimit -m'.
(This is orthogonal to Rik's recent patch.)

--- oom_kill.c.orig 2002-09-26 17:31:12.000000000 -0700
+++ oom_kill.c 2003-02-27 16:59:46.000000000 -0800
@@ -86,6 +90,18 @@
 		points *= 2;
 
 	/*
+	 * Processes which *have* an RSS limit, but which are under half of it,
+	 * are behaving well, so halve their badness points.
+	 * Do it again if they're under a quarter of their RSS limit.
+	 */
+	if (p->rlim[RLIMIT_RSS].rlim_max != ULONG_MAX) {
+		if (p->mm->rss < (p->rlim[RLIMIT_RSS].rlim_max >> (PAGE_SHIFT+1)))
+			points /= 2;
+		if (p->mm->rss < (p->rlim[RLIMIT_RSS].rlim_max >> (PAGE_SHIFT+2)))
+			points /= 2;
+	}
+
+	/*
 	 * Superuser processes are usually more important, so we make it
 	 * less likely that we kill those.
 	 */
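
For readers who want to try the arithmetic outside the kernel, here is a minimal user-space sketch of the adjustment the patch makes. It is not kernel code: SKETCH_PAGE_SHIFT and adjust_points() are hypothetical names, and a 4 KiB page size is assumed. As in the patch, the RSS is counted in pages and the limit in bytes.

```c
#include <limits.h>

/* Hypothetical stand-in for the kernel's PAGE_SHIFT; 12 assumes 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/*
 * points:          badness score computed so far
 * rss_pages:       resident set size in pages (like p->mm->rss)
 * rss_limit_bytes: hard RSS limit in bytes (like rlim_max);
 *                  ULONG_MAX means "no limit", as in the patch
 */
static unsigned long adjust_points(unsigned long points,
                                   unsigned long rss_pages,
                                   unsigned long rss_limit_bytes)
{
    if (rss_limit_bytes != ULONG_MAX) {
        /* Under half of the limit: halve the score. */
        if (rss_pages < (rss_limit_bytes >> (SKETCH_PAGE_SHIFT + 1)))
            points /= 2;
        /* Under a quarter of the limit: halve it again. */
        if (rss_pages < (rss_limit_bytes >> (SKETCH_PAGE_SHIFT + 2)))
            points /= 2;
    }
    return points;
}
```

With a 64 MiB limit (16384 pages under the assumed page size), a process using 5000 pages keeps half its score and one using 1000 pages keeps a quarter. A limit of zero never earns a reduction, because no RSS is below zero pages.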

--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045


2003-02-28 12:27:15

by Alan

Subject: Re: Protecting processes from the OOM killer

On Fri, 2003-02-28 at 01:21, Dan Kegel wrote:
> For a while now, I've been trying to figure out how
> to make the oom killer not kill important processes.

How about by not allowing your system to excessively overcommit.
Everything else is armwaving "works half the time" stuff. By the time
the OOM kicks in the game is already over. The rlimit one doesn't deal
with things like fork explosions where you have lots of processes
all under 1/4 of the rlimit range who cumulatively overcommit. In
fact you now pick harder on other tasks...
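
The fork-explosion objection can be made concrete with a small sketch. The struct and function names below are hypothetical and the workload is invented for illustration; the function only counts how many processes would sit under a quarter of the limit and sums their resident pages.

```c
#include <stddef.h>

/* Hypothetical summary of a set of processes sharing one RSS limit. */
struct fork_summary {
    size_t rewarded;               /* processes under 1/4 of the limit */
    unsigned long total_rss_pages; /* combined resident set, in pages */
};

/* rss_pages[i] is the RSS of process i; limit_pages is the per-process limit. */
static struct fork_summary summarize(const unsigned long *rss_pages, size_t n,
                                     unsigned long limit_pages)
{
    struct fork_summary s = {0, 0};
    for (size_t i = 0; i < n; i++) {
        if (rss_pages[i] < limit_pages / 4)
            s.rewarded++;
        s.total_rss_pages += rss_pages[i];
    }
    return s;
}
```

For example, 100 children of 4000 pages each against a 16384-page limit all count as well behaved, yet together they hold 400000 pages, about 1.5 GiB if pages are 4 KiB. Every one of them would have its score quartered, so unrelated tasks without a limit become the likelier victims.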

2003-02-28 14:09:13

by Ville Herva

Subject: Re: Protecting processes from the OOM killer

On Fri, Feb 28, 2003 at 01:40:19PM +0000, you [Alan Cox] wrote:
>
> How about by not allowing your system to excessively overcommit.
> Everything else is armwaving "works half the time" stuff.

Which invites the question: the strict overcommit stuff from -ac (the 'echo
{2,3} > /proc/sys/vm/overcommit_memory' stuff) hasn't found its way to
mainline yet, has it? I wonder if it would be compatible with up-to-date
-aa vm...


-- v --


2003-02-28 14:24:55

by Alan

Subject: Re: Protecting processes from the OOM killer

On Fri, 2003-02-28 at 14:19, Ville Herva wrote:
> On Fri, Feb 28, 2003 at 01:40:19PM +0000, you [Alan Cox] wrote:
> >
> > How about by not allowing your system to excessively overcommit.
> > Everything else is armwaving "works half the time" stuff.
>
> Which invites the question: the strict overcommit stuff from -ac (the 'echo
> {2,3} > /proc/sys/vm/overcommit_memory' stuff) hasn't found its way to
> mainline yet, has it? I wonder if it would be compatible with up-to-date
> -aa vm...

Marcelo didn't want it for base. It's in 2.5 and in -ac. There is no
longer any rmap requirement on the code so it should "just work" with
the -aa changes too.
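
To check which mode a given machine is running in, the sysctl file mentioned above can be read from a program. The following is a rough sketch; the function names are hypothetical, and accepting the values 0 through 3 is an assumption based on the modes quoted in this thread, so a particular kernel may support fewer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of the sysctl file; returns the mode, or -1 if unusable.
 * The accepted range 0..3 is an assumption taken from this thread. */
static int parse_overcommit_mode(const char *text)
{
    char *end;
    long v = strtol(text, &end, 10);
    if (end == text || v < 0 || v > 3)
        return -1;
    return (int)v;
}

/* Read the mode from a path such as /proc/sys/vm/overcommit_memory.
 * Returns -1 if the file cannot be opened or parsed. */
static int read_overcommit_mode(const char *path)
{
    char buf[32];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_overcommit_mode(buf);
}
```

A caller would pass "/proc/sys/vm/overcommit_memory" and treat -1 as "unknown", which is also what a kernel without the file would produce.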

2003-02-28 15:48:36

by Dan Kegel

Subject: Re: Protecting processes from the OOM killer

Alan Cox wrote:
> On Fri, 2003-02-28 at 01:21, Dan Kegel wrote:
>
>>For a while now, I've been trying to figure out how
>>to make the oom killer not kill important processes.
>
>
> How about by not allowing your system to excessively overcommit.

(I'm using 2.4.18; is
http://www.kernel.org/pub/linux/kernel/people/rml/vm/strict-overcommit/v2.4/vm-strict-overcommit-rml-2.4.18-1.patch
still the appropriate patch for that?)

> Everything else is armwaving "works half the time" stuff. By the time
> the OOM kicks in the game is already over.

Even with overcommit disallowed, the OOM killer is going to run
when my users try to run too big a job, so I would still like
the OOM killer to behave "well".

> The rlimit one doesn't deal
> with things like fork explosions where you have lots of processes
> all under 1/4 of the rlimit range who cumulatively overcommit. In
> fact you now pick harder on other tasks...

We do not see fork explosions in our workload, but if we did,
we could abuse the RSS limit for now by setting it to zero except for
the processes we wanted to protect from the OOM killer.
If that works in practice the same idea could be done without the abuse;
the RSS limit is just a handy knob.
- Dan
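
The RSS-limit knob that 'ulimit -m' turns can also be set from a launcher program before it execs the real job. Below is a minimal sketch using the standard setrlimit() and getrlimit() calls; the helper names are hypothetical. Note that an unprivileged process that lowers its hard limit cannot raise it again.

```c
#include <sys/resource.h>

/* Set both the soft and the hard RSS limit to `bytes`.
 * Returns 0 on success, -1 on error (errno is set by setrlimit). */
static int set_rss_limit(rlim_t bytes)
{
    struct rlimit rl;
    rl.rlim_cur = bytes;
    rl.rlim_max = bytes;
    return setrlimit(RLIMIT_RSS, &rl);
}

/* Fetch the hard RSS limit, the value the proposed patch inspects.
 * Returns 0 on success, -1 on error. */
static int get_rss_limit(rlim_t *hard)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_RSS, &rl) != 0)
        return -1;
    *hard = rl.rlim_max;
    return 0;
}
```

Under the scheme described above, a protected process would get a generous limit it stays well below, while everything else would be given a limit of zero, which can never qualify for the reduction.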

--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045

2003-02-28 22:03:45

by James Antill

Subject: Re: Protecting processes from the OOM killer

Dan Kegel writes:

> Alan Cox wrote:
> > On Fri, 2003-02-28 at 01:21, Dan Kegel wrote:
> >
> > Everything else is armwaving "works half the time" stuff. By the time
> > the OOM kicks in the game is already over.
>
> Even with overcommit disallowed, the OOM killer is going to run
> when my users try to run too big a job, so I would still like
> the OOM killer to behave "well".

If OOM is called you've overcommitted memory, so this isn't true
... no overcommit == NULL from malloc() etc.

--
# James Antill
:0:
* ^From: .*james@and\.org
/dev/null

2003-03-03 14:35:03

by Jesse Pollard

Subject: Re: Protecting processes from the OOM killer

On Friday 28 February 2003 10:08 am, Dan Kegel wrote:
> Alan Cox wrote:
snip
> > Everything else is armwaving "works half the time" stuff. By the time
> > the OOM kicks in the game is already over.
>
> Even with overcommit disallowed, the OOM killer is going to run
> when my users try to run too big a job, so I would still like
> the OOM killer to behave "well".

Shouldn't - the process the user tries to run will not be started since
it must reserve the space first. malloc will fail immediately, allowing the
process to handle the event gracefully and exit.

Anything else is a bug in the application.
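
As a rough illustration of handling the failure gracefully, here is a sketch of an allocation wrapper that checks for NULL instead of assuming success. The function name and the message text are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Reserve and zero a working buffer for a job.
 * Returns NULL (after printing a diagnostic) if the memory cannot be
 * reserved, so the caller can exit cleanly instead of crashing later.
 */
static void *alloc_job_buffer(size_t nbytes)
{
    void *buf = malloc(nbytes);
    if (buf == NULL) {
        fprintf(stderr, "cannot reserve %zu bytes; refusing to start job\n",
                nbytes);
        return NULL;
    }
    /* With strict accounting the reservation is already backed, so this
     * touch should be safe; with overcommit allowed, this is where the
     * shortage could first show up. */
    memset(buf, 0, nbytes);
    return buf;
}
```

A caller would check the return value, release anything it already holds, and exit with a non-zero status when the buffer is NULL.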

--
-------------------------------------------------------------------------
Jesse I Pollard, II

Any opinions expressed are solely my own.

2003-03-03 15:09:14

by Alan

Subject: Re: Protecting processes from the OOM killer

On Mon, 2003-03-03 at 14:45, Jesse Pollard wrote:
> Shouldn't - the process the user tries to run will not be started since
> it must reserve the space first. malloc will fail immediately, allowing the
> process to handle the event gracefully and exit.
>
> Anything else is a bug in the application.

The one case you can't cover cleanly in C is a stack grow exceeding memory
usage. At that point it requires a tiny bit of magic. You can do it, but
the overcommit blocker has to armwave a little for the kernel and other
things so I've never seen it happen in a normal situation.

2003-03-03 16:20:05

by Dan Kegel

Subject: Re: Protecting processes from the OOM killer

Jesse Pollard wrote:
> On Friday 28 February 2003 10:08 am, Dan Kegel wrote:
>
>>Alan Cox wrote:
>
> snip
>
>>>Everything else is armwaving "works half the time" stuff. By the time
>>>the OOM kicks in the game is already over.
>>
>>Even with overcommit disallowed, the OOM killer is going to run
>>when my users try to run too big a job, so I would still like
>>the OOM killer to behave "well".
>
>
> Shouldn't - the process the user tries to run will not be started since
> it must reserve the space first. malloc will fail immediately, allowing the
> process to handle the event gracefully and exit.

I thought of that about five minutes after I hit 'send'.

I have a feeling that there might still be a few cases
not perfectly covered by the strict overcommit patch.
Say, memory allocations due to incoming network traffic.
I guess if memory runs out during incoming traffic, the kernel
should simply drop the traffic. Until all those situations
are nicely ironed out, there's still some chance the OOM killer
might run even on a strict overcommit system.

But enough talking; I need to go try it.
- Dan

--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045