2002-09-11 18:09:38

by Jim Sibley

[permalink] [raw]
Subject: Killing/balancing processes when overcommited

I have run into a situation in a multi-user Linux environment in which, when
memory is exhausted, random things happen. The best case is that the
"offending" user's task is killed. Just as likely, another user's task is
killed. In some cases, important tasks such as the telnet daemon are
killed. In extreme cases, something is killed that is critical to the
overall well-being of the system, causing random loops, kernel panics, or
system "autism" (where the system does no useful work and responds to no
external intervention other than a reboot).

Since Rik van Riel is listed as the author of the oom_kill module (which is
still the one being used in Linux 2.5), I contacted him and he suggested I
bring it to the list at vger.kernel.org.

We are running Linux in a multi-user, SMP, large-memory environment (an LPAR
on a zSeries with 2 GB of real memory, but this could just as well happen on
any other hardware platform). We have several Linux systems running with
4-8 swap volumes, or 10-18 GB of real+swap memory, and we have run Linux
systems with over 40 GB of real+swap for significant periods, so the paging
mechanism seems to be quite robust in general.

However, when the memory+swap space becomes full, we start getting random
problems. I've turned on the debugging in oom_kill and have watched how the
"kill" scores are calculated, and it seems rather random to me. When memory
is exhausted, Linux needs some attention. In a "well tuned" system,
we are safe, but when the system accidentally (or deliberately) becomes
"detuned", oom_kill is entered and arbitrarily kills a process.

Essentially, Linux has no real conception of "importance" to the
installation. In a single user environment, this is a moot point and
oom_kill works well enough.

In a multi-user environment oom_kill needs to be more selective. There is
no good algorithmic method for deciding which process to terminate with
extreme prejudice:

1 - CPU usage may not be a good measure - the user causing the system to
"become detuned" may use little CPU. The assumption that the memory
offender is consuming CPU, such as being stuck in a loop, is neither
necessary nor sufficient. If the system has been running for quite a while
(days, weeks, months), important tasks may have accumulated a relatively
large amount of CPU time and become higher-profile targets for oom_kill.

2 - Large memory tasks may not be a good measure, as some important tasks
often have large memory and working-set requirements, such as a database
server.

3 - Measuring memory by task rather than by the total memory usage of the
user is misleading, because a single task which uses a moderate amount of
memory scores higher than a user with a lot of small tasks that together
use more memory.

4 - Niceness is not really useful in a multi-user environment. Each user
gets to determine how nice he is. In a commercial environment, business
needs determine the "niceness". A long-running process that uses a lot of
resources may be very important to the business, but could run at a lower
priority to allow better interactive response for all users. Killing it
just because it has been "niced" is not adequate.

5 - Other numerical limits tend to be arbitrary. Resources should be
allocated by installation need: used by the installation's most important
users when they need them, and by others when the resources are available.

Since algorithmic methods don't really do the job, I suggest that there
should be some way for the installation to designate importance (not just
for oom_kill, but for overall use of resources).

For example, a file in /etc could be read that lists processes by userid or
group in order of importance, or gives them a weight. When oom_kill or other
code needs to decide whether to limit resources or kill a task, users or
groups with the lowest priority would be the most restricted or the first
killed (a sketch of what such a file might look like follows the example
below).

Suppose an installation decided to replace a lot of desktop Linuxes with
thin clients and a large central Linux server (hardware platform is up to
the installation) for running servers, client processes, data base
services, data storage and sharing, and backup.

You might see the installation's priority list like this (lowest value is
most important, highest value is least important). Kill priority would be
the importance of keeping the process or task running. I've also added a
resource priority to give an indication of who should get resources first
(such as CPU or devices).

group                   resource priority   kill priority
system                  0                   0   - never kill
support                 1                   1
payroll                 2                   2
production              3                   3
general user            4                   4
production background   5                   3   <- make sure testing and
                                                   general user are killed
                                                   BEFORE production
testing                 6                   5

Note that in the example above, production background has the second-lowest
resource priority, but a more important kill priority ("we don't care how
long it takes, but it must complete").

In a system with sufficient resources, everyone would get what they needed.
As resources become limited, payroll gets resources first and testing gets
the least. In the extreme case, when the system is overwhelmed, testing is
the first to be removed.
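
Encoded in the kind of /etc file suggested above, that list might look
something like the following (the filename and the column layout are purely
hypothetical, not an existing interface):

# /etc/oom_priorities (hypothetical)
# group                 resource-priority   kill-priority (0 = never kill)
system                  0                   0
support                 1                   1
payroll                 2                   2
production              3                   3
general_user            4                   4
production_background   5                   3
testing                 6                   5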

This approach also has the advantage in a multi-user environment that the
system administrator would get phone calls and complaints from the less
important users before the important processes are jeopardized, and would
hopefully have time to react.


Regards, Jim
Linux S/390-zSeries Support, SEEL, IBM Silicon Valley Labs
t/l 543-4021, 408-463-4021, [email protected]
*** Grace Happens ***




2002-09-11 21:39:26

by Alan

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

On Wed, 2002-09-11 at 19:08, Jim Sibley wrote:
> I have run into a situation in a multi-user Linux environment that when
> memory is exhausted, random things happen. The best case is that the
> "offending" user's task is killed. Just as likely, another user's task is

The best case is that you don't allow overcommit. 2.4 supports that in
the Red Hat and -ac trees. Robert Love forward-ported the changes to
2.5.x. There is an outstanding need to add an additional "root factor"
so root can get some memory other people cannot, but otherwise it seems
to work well.
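
For reference, strict (no-overcommit) accounting on kernels that carry these
patches is selected through sysctls; the values below are the ones that ended
up in later mainline kernels, so treat them as an approximation rather than
the exact interface of the -ac/Red Hat trees of the time:

echo 2 > /proc/sys/vm/overcommit_memory    # refuse allocations beyond the commit limit
echo 50 > /proc/sys/vm/overcommit_ratio    # commit limit = swap + 50% of RAM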

2002-09-12 05:04:45

by jurriaan

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

From: Jim Sibley <[email protected]>
Date: Wed, Sep 11, 2002 at 11:08:43AM -0700
> 1 - cpu usage may not be a good measure
> 2 - Large memory tasks may not be a good measure
> 3 - Measuring memory by task is misleading
> 4 - Niceness is not really useful in a multi-user environment.
> 5 - Other numerical limits tend to be arbitrary.

I was just thinking (feel free to point out the errors of my ways):

what if we used the time a program was started as a guide? The last
programs started are killed off first.

That would mean that init survives to the last, as would the daemons
that are started when booting.

Alternatively, suppose we get a very large pid-space, and at the end of
booting there's something like

echo "5000" > /proc/sys/minimum-pid-from-here-on

Then, you could do:

echo "5000" > proc/sys/oom_lowest_pid_to_try_killing_first

in other words, protect a part of pid-space against oom-killing.

How this all works with threads, forks, child-processes etc etc is
beyond me - I'm just thinking a bit.

Jurriaan
--
A black cat crossing your path signifies that the animal is going somewhere.
Groucho Marx (1890-1977)
GNU/Linux 2.4.19-ac4 SMP/ReiserFS 2x1402 bogomips load av: 1.57 1.36 0.87

2002-09-12 07:01:41

by Tim Connors

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

In linux.kernel, you wrote:
>
> group                   resource priority   kill priority
> system                  0                   0   - never kill
> support                 1                   1
> payroll                 2                   2
> production              3                   3
> general user            4                   4
> production background   5                   3   <- make sure testing and
>                                                    general user are killed
>                                                    BEFORE production
> testing                 6                   5
>
> Note that in the example above, production has the second lowest resource
> priority, but a higher kill priority ("we don't care how long it takes, but
> it must complete").
>
> In a system with sufficient resources, everyone would get what they needed.
> As resources become limit, payroll gets resources first and testing gets
> the least. In the extreme case, when the system is overwhelmed, testing is
> the first to be removed.

You seem to have just described a combination of forced niceness
(from login scripts) and ulimit. See man ulimit for how to limit the number
of processes, memory, etc., so people can't fork()-bomb you out of
existence.

--
TimC -- http://astronomy.swin.edu.au/staff/tconnors/

Conclusion to my thesis --
"It is trivial to show that it is clearly obvious that this is not
woofly"

2002-09-12 07:20:51

by Giuliano Pochini

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited


On 11-Sep-2002 Jim Sibley wrote:
> I have run into a situation in a multi-user Linux environment that when
> memory is exhausted, random things happen. [...] In a "well tuned" system,
> we are safe, but when the system accidentally (or deliberately) becomes
> "detuned", oom_kill is entered and arbitrarily kills a process.

It's not just difficult to make the kernel choose the right processes
to kill - it's impossible. Imagine that when it goes OOM the system
stops and asks you which processes have to be killed. What do you
kill? I think the only way to save the system is to tell the kernel
which are the processes that must not be killed, except in extreme
conditions. Probably we do need an oomd that the sysadmin can
configure as he likes.
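
As a very rough illustration of the idea, a userspace oomd might be little
more than a loop over /proc/meminfo plus a kill policy; everything below
(the threshold, the single expendable UID, and the name oomd itself) is an
assumption made for the sake of the sketch, not an existing tool:

/* oomd.c - minimal sketch of a configurable userspace OOM daemon */
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static long meminfo_kb(const char *tag)
{
        FILE *f = fopen("/proc/meminfo", "r");
        char line[256];
        long kb = -1;

        if (!f)
                return -1;
        while (fgets(line, sizeof(line), f)) {
                if (strncmp(line, tag, strlen(tag)) == 0) {
                        sscanf(line + strlen(tag), "%ld", &kb);
                        break;
                }
        }
        fclose(f);
        return kb;
}

/* Send sig to every process whose real uid (from /proc/<pid>/status) matches. */
static void signal_uid(int uid, int sig)
{
        DIR *proc = opendir("/proc");
        struct dirent *de;

        if (!proc)
                return;
        while ((de = readdir(proc)) != NULL) {
                char path[64], line[256];
                int puid = -1;
                FILE *f;

                if (!isdigit((unsigned char)de->d_name[0]))
                        continue;
                snprintf(path, sizeof(path), "/proc/%s/status", de->d_name);
                f = fopen(path, "r");
                if (!f)
                        continue;
                while (fgets(line, sizeof(line), f))
                        if (sscanf(line, "Uid: %d", &puid) == 1)
                                break;
                fclose(f);
                if (puid == uid)
                        kill(atoi(de->d_name), sig);
        }
        closedir(proc);
}

int main(void)
{
        const long threshold_kb = 16 * 1024;   /* assumed: act below 16 MB free */
        const int expendable_uid = 500;        /* assumed: the "testing" users  */

        for (;;) {
                long free_kb = meminfo_kb("MemFree:") + meminfo_kb("SwapFree:");

                if (free_kb > 0 && free_kb < threshold_kb)
                        signal_uid(expendable_uid, SIGTERM);
                sleep(5);
        }
}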


Bye.

2002-09-12 08:22:34

by Helge Hafting

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

Jurriaan wrote:
>
> From: Jim Sibley <[email protected]>
> Date: Wed, Sep 11, 2002 at 11:08:43AM -0700
> > 1 - cpu usage may not be a good measure
> > 2 - Large memory tasks may not be a good measure
> > 3 - Measuring memory by task is misleading
> > 4 - Niceness is not really useful in a multi-user environment.
> > 5 - Other numerical limits tend to be arbitrary.
>
> I was just think (feel free to point out the errors of my way):
>
> what if we used the time a program was started as a guide? The last
> programs started are killed of first.
>
> That would mean that init survives to the last, as would the daemons
> that are started when booting.

And if one of your daemons has a slow memory leak then this happens:
You go OOM after a while (hours, days) - a user program is killed.
Buth the leaky dameon is running, so after a shorter time you go OOM
again.
Another user program is killed. This goes on for a while, it becomes
hard
to log in to fix things because the freshly logged in administrator
has a very new process and is the first to go!

After a while, all user programs are gone and daemons die one by one
until the offending one goes. Or perhaps the offending damon
don't leak anymore - it might be sshd but there is not enough memory
to log in so it don't get to leak any more.
>
> Alternatively, suppose we get a very large pid-space, and at the end of
> booting there's something like
>
> echo "5000" > /proc/sys/minimum-pid-from-here-on
>
> Then, you could do:
>
> echo "5000" > proc/sys/oom_lowest_pid_to_try_killing_first

Again, a bad daemon (pid < 5000) will slowly take out everything else,
with login impossible in the meantime.

> in other words, protect a part of pid-space against oom-killing.

Any way you protect a bunch of processes might fail if the bad one
is among them. Also, the OOM killer will have to fall back
to the standard heuristic whenever there is only protected
processes left.

Helge Hafting

2002-09-12 15:57:57

by Rik van Riel

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited

On Thu, 12 Sep 2002, Giuliano Pochini wrote:
> On 11-Sep-2002 Jim Sibley wrote:
> > I have run into a situation in a multi-user Linux environment that when
> > memory is exhausted, random things happen. [...] In a "well tuned" system,
> > we are safe, but when the system accidentally (or deliberately) becomes
> > "detuned", oom_kill is entered and arbitrarily kills a process.
>
> It's not just difficult to make the kernel choose the right processes
> to kill - it's impossible.

This assumes there is only 1 "good" process to kill. In reality
there will often be a number of acceptable candidates, so we just
need to identify one of those ;)

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Spamtraps of the month: [email protected] [email protected]

2002-09-12 18:11:16

by Jim Sibley

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited


Agreed, and I think it's up to the installation to decide which process
that is.

Regards, Jim
Linux S/390-zSeries Support, SEEL, IBM Silicon Valley Labs
t/l 543-4021, 408-463-4021, [email protected]
*** Grace Happens ***




2002-09-12 18:26:58

by Thunder from the hill

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited

Hi,

On Thu, 12 Sep 2002, Giuliano Pochini wrote:
> It's not just difficult to make the kernel choose the right processes
> to kill - it's impossible.

Not quite. But it's expensive. It adds 4 bytes per task, plus a second
OOM killer.

> Imagine that when it goes oom the system stops and asks you what
> processes have to be killed. What do you kill ?

Rather whom would you ask?

> Probably we do need an oomd that the sysadmin can configure as he likes.

That's bad, it could get killed. ;-)

Mostly the mem eaters are those that hang in a malloc() deadloop.

char *x = NULL;

/*
* We need this variable, so if we don't get it, we reallocate it
* regardless of what happened.
*/
do {
x = malloc(X_SIZE);
} while (!x);

That's possibly a candidate.

So if we just count how often per second that stubborn process calls
malloc(), we'll catch the right guy most of the time. If no process is
over the threshold, do the usual OOM killing...
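
A rough sketch of that rate check, purely as an illustration: none of these
names or fields exist in the kernel, and counting user-space malloc() calls
would in practice mean counting something like brk/mmap requests or
allocation page faults.

struct alloc_rate {
        unsigned long count;            /* allocations seen this second        */
        unsigned long last_reset;       /* jiffies-style stamp of last reset   */
};

/* Bump the counter on every allocation request; flag tasks above threshold. */
static int is_malloc_looper(struct alloc_rate *r, unsigned long now,
                            unsigned long hz, unsigned long threshold)
{
        if (now - r->last_reset >= hz) {        /* a second has passed: reset */
                r->count = 0;
                r->last_reset = now;
        }
        return ++r->count > threshold;
}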

Thunder
--
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-

2002-09-12 18:58:34

by Jim Sibley

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited


The mem eaters may not be the ones really "causing the problem" as
determined by the installation. The case where I discovered this was when
someone was asking for a lot of small chunks of memory (legitimately). So
you would need a history and the total memory usage to identify who this is.
And this is not really just limited to memory.

I still favor an installation file in /etc specifying the order in which
things are to be killed. Any algorithmic assumptions are bound to fail at
some point, to the dissatisfaction of the installation.

And this is not just limited to memory exhaustion. For example, if I exceed
the maximum number of files, I can't log on to fix the problem. If the
installation could set some priorities, they could say who to sacrifice in
order to keep others running.

Regards, Jim
Linux S/390-zSeries Support, SEEL, IBM Silicon Valley Labs
t/l 543-4021, 408-463-4021, [email protected]
*** Grace Happens ***



2002-09-12 19:04:30

by Thunder from the hill

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited

Hi,

On Thu, 12 Sep 2002, Jim Sibley wrote:
> And this is not just limited to memory exhaustion. For example, if I exceed
> the maximum number of files, I can't log on to fix the problem. If the
> installation could set some priorities, they could say who to sacrifice in
> order to keep others running.

These problems can be solved via ulimit. I was referring to things like
rsyncd which was blowing up under certain situations, but runs under a
trusted account (say UID=0). In order to condemn it you'd need the setup
I've mentioned.

Thunder
--
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-

2002-09-12 19:04:40

by Rik van Riel

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited

On Thu, 12 Sep 2002, Jim Sibley wrote:

> I still favor an installation file in /etc specifying the order in which
> things are to be killed. Any algorithmic assumptions are bound to fail at
> some point to the dissatisfaction of the installation.

That's all fine and well, but somebody will have to implement this. ;)

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Spamtraps of the month: [email protected] [email protected]

2002-09-12 20:30:00

by Alan

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited

On Thu, 2002-09-12 at 20:08, Thunder from the hill wrote:

> These problems can be solved via ulimit. I was referring to things like
> rsyncd which was blowing up under certain situations, but runs under a
> trusted account (say UID=0). In order to condemn it you'd need the setup
> I've mentioned.

Ulimit won't help you one iota

2002-09-12 20:40:13

by Thunder from the hill

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited

Hi,

On 12 Sep 2002, Alan Cox wrote:
> On Thu, 2002-09-12 at 20:08, Thunder from the hill wrote:
> > These problems can be solved via ulimit. I was referring to things
> > like rsyncd which was blowing up under certain situations, but runs
> > under a trusted account (say UID=0). In order to condemn it you'd need
> > the setup I've mentioned.
>
> Ulimit won't help you one iota

Why so pessimistic? You can ban users using ulimit, as you know. (You will
always remember when you wake up and your memory is ulimited to 1MB.)

Thunder
--
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-

2002-09-12 20:50:25

by Alan

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited

On Thu, 2002-09-12 at 21:43, Thunder from the hill wrote:
> > Ulimit won't help you one iota
>
> Why so pessimistic? You can ban users using ulimit, as you know. (You will
> always remember when you wake up and your memory is ulimited to 1MB.)

Because I've run real world large systems before. Ulimit is at best a
handy little toy for stopping web server accidents.

2002-09-12 21:10:04

by Thunder from the hill

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited

Hi,

On 12 Sep 2002, Alan Cox wrote:
> Because I've run real world large systems before. Ulimit is at best a
> handy little toy for stopping web server accidents.

Only if you assume that a bunch of users tries very hard to use up all the
resources...

Thunder
--
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-

2002-09-12 21:19:56

by Jesse Pollard

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

On Thursday 12 September 2002 03:43 pm, Thunder from the hill wrote:
> Hi,
>
> On 12 Sep 2002, Alan Cox wrote:
> > On Thu, 2002-09-12 at 20:08, Thunder from the hill wrote:
> > > These problems can be solved via ulimit. I was referring to things
> > > like rsyncd which was blowing up under certain situations, but runs
> > > under a trusted account (say UID=0). In order to condemn it you'd need
> > > the setup I've mentioned.
> >
> > Ulimit won't help you one iota
>
> Why so pessimistic? You can ban users using ulimit, as you know. (You will
> always remember when you wake up and your memory is ulimited to 1MB.)
>
> Thunder

ulimit is a per-login limit, not a global per-user limit. The sum of all of a
user's logins can still exceed the available memory. Even a large number of
simultaneous network connections (telnetd/sshd) can drive a system OOM.

Now, which of these processes should be killed?
--
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2002-09-12 21:21:54

by Jesse Pollard

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

On Thursday 12 September 2002 04:15 pm, Thunder from the hill wrote:
> Hi,
>
> On 12 Sep 2002, Alan Cox wrote:
> > Because I've run real world large systems before. Ulimit is at best a
> > handy little toy for stopping web server accidents.
>
> Only if you assume that a bunch of users tries very hard to use up all the
> resources...
>
> Thunder

It only takes one.
--
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2002-09-12 21:51:34

by Thunder from the hill

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

Hi,

On Thu, 12 Sep 2002, Jesse Pollard wrote:
> ulimit is a per login limit, not a global per user limit.

I see... Wondrous that I'd forgotten that.

> Now, which of these processes should be killed?

...the last of the user who has the most processes?

I still don't think that a whitelist could be that good. In any case, it
doesn't conflict with my suggestion: first kill processes that are likely
deadlooping on malloc(), then the unlisted ones, and then the rest. All
under the cover of overcommitment...

...Linux, the best-tinkered OOM-killing operating system...

Thunder
--
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-

2002-09-12 23:07:14

by Thunder from the hill

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited

Hi,

On 13 Sep 2002, Alan Cox wrote:
> You've never run a large machine have you

I've not set up one for years, that's why I've mixed up ulimit and another
bitches brew. See the other mail.

Thunder
--
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-

2002-09-12 23:03:25

by Alan

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited

On Thu, 2002-09-12 at 22:15, Thunder from the hill wrote:
> Hi,
>
> On 12 Sep 2002, Alan Cox wrote:
> > Because I've run real world large systems before. Ulimit is at best a
> > handy little toy for stopping web server accidents.
>
> Only if you assume that a bunch of users tries very hard to use up all the
> resources...

You've never run a large machine have you

2002-09-13 07:46:40

by Giuliano Pochini

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited


>> Now, which of these processes should be killed?
>
> ...the last of the user who has the most processes?

No, the last one is likely to be the sysadmin who
logged in to try to fix the situation.


Bye.

2002-09-13 08:00:39

by Giuliano Pochini

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited


> I still favor an installation file in /etc specifying the order in which
> things are to be killed. Any algorithmic assumptions are bound to fail at
> some point to the dissatisfaction of the installation.

I agree, I don't see any other solution. BTW, the thing is not
simple. The oom killer should be able to comply with instructions
like:

if (oom)
        kill netscape if it uses more than 80MB             {stdprio 10}
                // sometimes it starts sucking memory endlessly
        kill make and its children if overall they use      {stdprio 7}
                more than ... [cpu files memory]
                // ever tried to make -j bzImage on a 64MB box?
        kill httpd if it's swapping too much                {stdprio 3}
        ...


Well, it's not simple. It must be planned carefully, or it
will end up being as ineffective as the current killer is.


Bye.

2002-09-13 07:57:40

by Denis Vlasenko

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

On 11 September 2002 16:08, Jim Sibley wrote:
> group                   resource priority   kill priority
> system                  0                   0   - never kill
> support                 1                   1
> payroll                 2                   2
> production              3                   3
> general user            4                   4
> production background   5                   3
  ^^^ make sure testing and general user are killed BEFORE production
> testing                 6                   5


I like this. Maybe map it to the user's gid and provide a /proc interface?

Let's say on your server you allocated gids this way:
0 - system
100 - support
110 - payroll
120 - production
200 - general user
130 - production background
500 - testing

# echo "0 100 110 120 200 130 500" >/proc/resourceprio
# echo "0 100 110 120 130 200 500" >/proc/killprio
--
vda

2002-09-13 08:12:58

by Giuliano Pochini

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited


>> Probably we do need an oomd that the sysadmin can configure as he likes.
>
> That's bad, it could get killed. ;-)

Not if it's the only killer.

> Mostly the mem eaters are those who hang in an malloc() deadloop.

And what about a make -j ? The offender is not always one memory hog.


Bye.

2002-09-13 10:12:27

by Thunder from the hill

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

Hi,

On Fri, 13 Sep 2002, Giuliano Pochini wrote:
> > ...the last of the user who has the most processes?
>
> No, the last one it's likely to be the sysadmin that
> logged in to try to fix the situation.

Not exactly.

if (we run oom) {
        if (we find a malloc() eater) {
                kill it;
        } else if (there's an ->user<- who forked lots of processes) {
                kill some;
        } else {
                kill randomly, based on some table, or whatever...;
        }
}

Means that in the second stage we only kill processes where (task->euid) is
non-zero. Malloc eaters are likely to be system jobs (such as data servers),
so we had better not check the UID for those.

Thunder
--
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-

2002-09-13 10:17:09

by Thunder from the hill

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited

Hi,

On Fri, 13 Sep 2002, Giuliano Pochini wrote:
> And what about a make -j ? The offender is not always one memory hog.

Thou shalt not keep forking, for thou wilt not be able to nourish thy
children, so thy children shall die...

Thunder
--
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-

2002-09-13 10:50:26

by Helge Hafting

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

Giuliano Pochini wrote:
>
> > I still favor an installation file in /etc specifying the order in which
> > things are to be killed. Any algorithmic assumptions are bound to fail at
> > some point to the dissatisfaction of the installation.
>
> I agree, I don't see any other solution. btw the thing is not
> simple. The oom killer should be able to comply instructions
> like:
>
> if (oom)
> kill netscape if it uses more than 80MB {stdprio 10}
> //sometimes if start sucking memory endlessly
> kill make and its childs if overall they use {stdprio 7}
> more than ...[cpu files memory]
> //ever tried to make -j bzImage on a 64MB box ?
> kill httpd if it's swapping too much {stdprio 3}
> ...

This is hard to set up, and it has some weaknesses:
1. You worry only about apps you _know_. But the guy who got
his netscape or make -j killed will rename his copies of these
apps to something else, so your carefully set up oom killer
won't know what is running. (How much memory is the "mybrowser"
app supposed to use?) Or he'll get another software package
that you haven't heard of.

2. Lots and lots of people running netscapes using only 70M each
will still be too much. Think of a university with xterms, where
they all go to cnn.com or something for the latest news about
some large event.

Even nice, well-behaved apps are bad when there are unusually
many of them. There may be no particular "offender" to point a
finger at, even for a clueful administrator. The perfect OOM
killer cannot be written - even a human can't do that job well
under all circumstances.

Helge Hafting

2002-09-13 12:53:37

by Jesse Pollard

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

On Friday 13 September 2002 07:53 am, Denis Vlasenko wrote:
> On 11 September 2002 16:08, Jim Sibley wrote:
> > group                   resource priority   kill priority
> > system                  0                   0   - never kill
> > support                 1                   1
> > payroll                 2                   2
> > production              3                   3
> > general user            4                   4
> > production background   5                   3
>
>   ^^^ make sure testing and general user are killed BEFORE production
>
> > testing                 6                   5
>
> I like this. Maybe map it to user gid and provide /proc interface?
>
> Let's say on your server you allocated gids this way:
> 0 - system
> 100 - support
> 110 - payroll
> 120 - production
> 200 - general user
> 130 - production background
> 500 - testing
>
> # echo "0 100 110 120 200 130 500" >/proc/resourceprio
> # echo "0 100 110 120 130 200 500" >/proc/killprio

Don't base it on gid. Remember, a user can be a member of multiple
gids for file access. At this point you may get a payroll/production
conflict, or a production/production background conflict.

You really have to use a resource accounting structure that allows
one and only one id per process. A user may (like groups) have
access to multiple resource accounts, but a given process should
only have one.
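
A sketch of the kind of structure that implies; the names and fields below
are illustrative assumptions, not anything that exists in the kernel:

/* One resource account per process, independent of file-access groups. */
struct resource_account {
        int acct_id;            /* exactly one account per process         */
        int resource_prio;      /* who gets CPU/devices first (0 = first)  */
        int kill_prio;          /* who is sacrificed first under OOM       */
};

/* A user may be permitted several accounts, but each process carries one. */
struct task_accounting {
        struct resource_account *acct;  /* inherited across fork()         */
};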
--
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2002-09-13 12:57:42

by Giuliano Pochini

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited


> This is hard to setup, and has the some weaknesses:
> 1. You worry only about apps you _know_. But the guy who got
> his netscape or make -j killed will rename his
> copies of these apps to something else so your carefully
> set up oom killer won't know what is running.
> (How much memory is the "mybrowser" app supposed to use?)
> Or he'll get another software package that you haven't heard of.
>
> 2. Lots and lots of people running netscapes using
> only 70M each will still be too much. Think of
> a university with xterms and then they all
> goes to cnn.com or something for the latest news
> about some large event.
>
> Even nice well-behaved apps
> is bad when there is unusually many of them. [...]

That's obvious. The point is that the sysadmin should be
able to hint the oom killer as much as possible.
The current linux/mm/oom_kill.c:badness() takes many factors
into account. The sysadmin should be able to affect the
badness calculation on a per-process, per-user, or similar basis.
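
A rough sketch of that kind of hook; badness() in linux/mm/oom_kill.c is
real, but the per-uid weight, its range, and the way it is applied here are
assumptions for illustration only:

#define OOM_WEIGHT_DEFAULT 100          /* percent; <100 protects, >100 exposes */

/* Would be filled in from whatever /proc or /etc interface the admin uses. */
static int uid_oom_weight(int uid)
{
        return OOM_WEIGHT_DEFAULT;
}

/* Scale the score badness() computed for a task by its owner's weight. */
static int weighted_badness(int base_points, int uid)
{
        return base_points * uid_oom_weight(uid) / 100;
}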



Bye.

2002-09-13 16:36:04

by Gerhard Mack

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

On Fri, 13 Sep 2002, Giuliano Pochini wrote:

>
> > This is hard to setup, and has the some weaknesses:
> > 1. You worry only about apps you _know_. But the guy who got
> > his netscape or make -j killed will rename his
> > copies of these apps to something else so your carefully
> > set up oom killer won't know what is running.
> > (How much memory is the "mybrowser" app supposed to use?)
> > Or he'll get another software package that you haven't heard of.
> >
> > 2. Lots and lots of people running netscapes using
> > only 70M each will still be too much. Think of
> > a university with xterms and then they all
> > goes to cnn.com or something for the latest news
> > about some large event.
> >
> > Even nice well-behaved apps
> > is bad when there is unusually many of them. [...]
>
> That's obvious. The point is that the sysadmin should be
> able to hint the oom killer as much as possible.
> The current linux/mm/oom_kill.c:badness() takes into account
> many factors. The sysadmin should be able to affect the
> badness calculation on process/user/something basis.

I think what is really needed is a daemon to handle complex decisions
like that, with the kernel OOM killer as a fallback.

Gerhard

--
Gerhard Mack

[email protected]

<>< As a computer I find your faith in technology amusing.

2002-09-13 20:23:42

by Timothy D. Witham

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited

On Fri, 2002-09-13 at 01:05, Giuliano Pochini wrote:
>
> > I still favor an installation file in /etc specifying the order in which
> > things are to be killed. Any algorithmic assumptions are bound to fail at
> > some point to the dissatisfaction of the installation.
>

There is another solution, and that is to never allocate memory unless
you have swap space behind it. Yes, the issue is that you need to have a lot
of disk allocated to swap, but on a big machine you will have that space.

This way the offending process that asks for more memory will be
the one that gets killed. Even if the first couple of victims aren't the
misbehaving process, eventually it will ask for more memory and suffer
execution.

Of course you don't allocate swap every time somebody asks for
a byte, but you do it on some big boundary, and you do it when somebody
crosses that line.

Tim

> I agree, I don't see any other solution. btw the thing is not
> simple. The oom killer should be able to comply instructions
> like:
>
> if (oom)
> kill netscape if it uses more than 80MB {stdprio 10}
> //sometimes if start sucking memory endlessly
> kill make and its childs if overall they use {stdprio 7}
> more than ...[cpu files memory]
> //ever tried to make -j bzImage on a 64MB box ?
> kill httpd if it's swapping too much {stdprio 3}
> ...
>
>
> Well, it's not simple. It must be planned carefully, or it
> will and up being as uneffective as the current killer is.
>
>
> Bye.
>
--
Timothy D. Witham - Lab Director - [email protected]
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office) (503)-702-2871 (cell)
(503)-626-2436 (fax)

2002-09-13 21:08:53

by Jim Sibley

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited



Tim wrote:
> There is another solution. And that is never allocate memory unless
> you have swap space. Yes, the issue is that you need to have lots of
> disk allocated to swap but on a big machine you will have that space.

How do you predict whether a program is going to ask for more memory? Maybe
it only needs additional memory for a short time, and is a good citizen that
gives it back?

How much disk needs to be allocated for swap? In 32 bit, each logged in user
is limited to 2 GB, so do we need 2 GB for each logged on user? 250 users, 500
GB of disk?

In a 64 bit system, how much swap would you reserve?

Actually, another OS took this approach in the early 70's and this was quickly
junked when they found out how much disk they really had to keep in reserve
for paging.

> This way the offending process that asks for more memory will be
> the one that gets killed. Even if the 1st couple of ones aren't the
> misbehaving process eventually it will ask for more memory and suffer
> process execution.

It may not be the "offending process" that is asking for more memory. How do
you judge "offending?"






2002-09-13 22:31:55

by Timothy D. Witham

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited


On Fri, 2002-09-13 at 14:13, Jim Sibley wrote:
>
>
> Tim wrote:
> > There is another solution. And that is never >allocate memory unless
> >you have swap space. Yes, the issue is that you >need to have lots of
> >disk allocated to swap but on a big machine you >will have that space.
>
> How do you predict if a program is going to ask for more memory? Maybe it only
> needs additional memory for a short time and is a good citizen and gives it
> back?
>

Well, it's been a while so the details are fuzzy, but you have a pointer
to how much space you have left in your allocated swap; when you get
memory you decrement it, and when you release memory you increment it,
so that it indicates how much you have left. If you try to malloc
more than you have freed, you go and reserve another chunk of swap.
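
A sketch of that accounting, with all names and the chunk size being
assumptions; the point is only the decrement-on-allocate,
increment-on-free, reserve-another-chunk-on-exhaustion flow described here:

#define SWAP_CHUNK_KB (64 * 1024)       /* reserve backing store in 64 MB chunks */

static long reserved_free_kb;           /* swap reserved but not yet used        */

/* Stand-in: claim kb of real swap; return 0 when no swap is left to reserve. */
static int reserve_swap_chunk(long kb)
{
        return kb > 0;
}

/* Charge an allocation against the reservation; refuse it if swap runs out. */
static int charge_allocation(long kb)
{
        while (reserved_free_kb < kb) {
                if (!reserve_swap_chunk(SWAP_CHUNK_KB))
                        return 0;
                reserved_free_kb += SWAP_CHUNK_KB;
        }
        reserved_free_kb -= kb;
        return 1;
}

/* Credit a release so the space can be handed out again. */
static void credit_free(long kb)
{
        reserved_free_kb += kb;
}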

> How much disk needs to be allocated for swap? In 32 bit, each logged in user
> is limited to 2 GB, so do we need 2 GB for each logged on user? 250 users, 500
> GB of disk?
>

It turns out that for high-load database machines this is about
3 to 4 times the actual physical memory.


> In a 64 bit system, how much swap would you reserve?
>

Same as for 32-bit machines; the 2 GB limit doesn't apply.


> Actually, another OS took this approach in the early 70's and this was quickly
> junked when they found out how much disk they really had to keep in reserve
> for paging.
>

But there are many current OS's that do this and are quite successful.

> > This way the offending process that asks for >more memory will be
> >the one that gets killed. Even if the 1st couple >of ones aren't the
> >misbehaving process eventually it will ask for >more memory and suffer
> >process execution.
>
> It may not be the "offending process" that is asking for more memory. How do
> you judge "offending?"
>

In this case the offense is asking for more memory, so it is the
process that asks for more memory that goes away. Again, sometimes
it will be an innocent bystander, but hopefully it will eventually
be the process that is causing the problem.

Many databases program for this condition. The stuff that absolutely
must be up and running preallocates all of the memory that it needs
at startup. Any new memory requests that are denied might cause a
transaction to fail but they won't bring down the whole database.
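
As an illustration of that pattern (the pool size and the names are
arbitrary assumptions), the critical part of such a program might do
something like this at startup:

#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define CRITICAL_POOL_BYTES (64UL * 1024 * 1024)   /* arbitrary example size */

static char *critical_pool;

/* Grab and touch the must-have memory up front; fail at startup, not later. */
int init_critical_pool(void)
{
        critical_pool = malloc(CRITICAL_POOL_BYTES);
        if (critical_pool == NULL)
                return -1;
        memset(critical_pool, 0, CRITICAL_POOL_BYTES);  /* force real backing now    */
        mlock(critical_pool, CRITICAL_POOL_BYTES);      /* best effort; needs privilege */
        return 0;
}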

Tim
>
>
>
>
--
Timothy D. Witham - Lab Director - [email protected]
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office) (503)-702-2871 (cell)
(503)-626-2436 (fax)

2002-09-13 22:38:34

by Timothy D. Witham

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited

On Fri, 2002-09-13 at 15:31, Timothy D. Witham wrote:
>
> On Fri, 2002-09-13 at 14:13, Jim Sibley wrote:
> >
> >
> > Tim wrote:
> > > There is another solution. And that is never >allocate memory unless
> > >you have swap space. Yes, the issue is that you >need to have lots of
> > >disk allocated to swap but on a big machine you >will have that space.
> >
> > How do you predict if a program is going to ask for more memory? Maybe it only
> > needs additional memory for a short time and is a good citizen and gives it
> > back?
> >
>
> Well its been a bit so the details are fuzzy but you have a pointer

Counter, I meant to say "a counter as to " instead of "a pointer to".

:-)

> to how much space you have left in you allocated swap and when you get
> memory you decrement space and when you release memory in increment it
> so that it indicates how much you have left. If you try and malloc
> more than you free then you go and reserve another chunk of swap.
>
> > How much disk needs to be allocated for swap? In 32 bit, each logged in user
> > is limited to 2 GB, so do we need 2 GB for each logged on user? 250 users, 500
> > GB of disk?
> >
>
> It turns out that for the high load database machines this is about
> 3 to 4 times the actual physical memory.
>
>
> > In a 64 bit system, how much swap would you reserve?
> >
>
> Same as 32 bit machines doesn't apply.
>
>
> > Actually, another OS took this approach in the early 70's and this was quickly
> > junked when they found out how much disk they really had to keep in reserve
> > for paging.
> >
>
> But there are many current OS's that do this and are quite successful.
>
> > > This way the offending process that asks for >more memory will be
> > >the one that gets killed. Even if the 1st couple >of ones aren't the
> > >misbehaving process eventually it will ask for >more memory and suffer
> > >process execution.
> >
> > It may not be the "offending process" that is asking for more memory. How do
> > you judge "offending?"
> >
>
> In this case the offense is asking for more memory. So it is the
> process that asks for more memory that goes away. Again sometimes
> it will be an innocent bystander but hopefully it will eventually
> be the process that is causing the problem.
>
> Many databases program for this condition. The stuff that absolutely
> must be up and running preallocates all of the memory that it needs
> at startup. Any new memory requests that are denied might cause a
> transaction to fail but they won't bring down the whole database.
>
> Tim
> >
> >
> >
> >
> --
> Timothy D. Witham - Lab Director - [email protected]
> Open Source Development Lab Inc - A non-profit corporation
> 15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
> (503)-626-2455 x11 (office) (503)-702-2871 (cell)
> (503)-626-2436 (fax)
>
--
Timothy D. Witham - Lab Director - [email protected]
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office) (503)-702-2871 (cell)
(503)-626-2436 (fax)

2002-09-13 22:40:24

by Rik van Riel

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited

On 13 Sep 2002, Timothy D. Witham wrote:

> In this case the offense is asking for more memory. So it is the
> process that asks for more memory that goes away. Again sometimes it
> will be an innocent bystander but hopefully it will eventually be the
> process that is causing the problem.

If you kill the process that requests memory, the sequence often
goes as follows:

1) memory is exhausted

2) the network driver can't allocate memory and
spits out a message

3) syslogd and/or klogd get killed

Clearly you want to be a bit smarter about which process to kill.

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Spamtraps of the month: [email protected] [email protected]

2002-09-13 23:12:27

by Timothy D. Witham

[permalink] [raw]
Subject: RE: Killing/balancing processes when overcommited


On Fri, 2002-09-13 at 15:44, Rik van Riel wrote:
> On 13 Sep 2002, Timothy D. Witham wrote:
>
> > In this case the offense is asking for more memory. So it is the
> > process that asks for more memory that goes away. Again sometimes it
> > will be an innocent bystander but hopefully it will eventually be the
> > process that is causing the problem.
>
> If you kill the process that requests memory, the sequence often
> goes as follows:
>
> 1) memory is exhausted
>
> 2) the network driver can't allocate memory and
> spits out a message
>
> 3) syslogd and/or klogd get killed
>
> Clearly you want to be a bit smarter about which process to kill.
>


Right, you need to hold back a reserve for the kernel/root items to
ensure that this sort of thing doesn't happen.


> regards,
>
> Rik
> --
> Bravely reimplemented by the knights who say "NIH".
>
> http://www.surriel.com/ http://distro.conectiva.com/
>
> Spamtraps of the month: [email protected] [email protected]
>
--
Timothy D. Witham - Lab Director - [email protected]
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office) (503)-702-2871 (cell)
(503)-626-2436 (fax)

2002-09-14 00:18:35

by Jim Sibley

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

Actually, the "offense is not asking for memory". The issue is
gracefully responding to an exhausted resource in some kind of
predetermined way - memory being just one example, but the that started
this thread.

Any algorithm that bases the solution on the developer's notion of
"niceness" and "offense" may not solve the problem the user installation
is trying to solve: gracefully shutting down work (or ungracefully if
necessary) based on the installation's priorities and needs rather than
randomly killing. Hopefully, the system can survive past the peak that is
aggravating the problem, or at least let someone add the resources
needed. In the particular case of "out of memory", add swap space.

I'd rather be able to choose to lose the online cafeteria menu before I
lose the emergency dispatch system. I'd much rather take action well
before any of the critical system functions are sacrificed. To me,
letting someone in the wheel group log on to fix the problem is quite high
on my priority list. But by Tim's definition, he would be the offender,
because he would be asking for memory.

I have to beg off this discussion for a week as I'll be out of the country.
I shall return.

2002-09-16 07:24:42

by Helge Hafting

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

Rik van Riel wrote:

> If you kill the process that requests memory, the sequence often
> goes as follows:
>
> 1) memory is exhausted
>
> 2) the network driver can't allocate memory and
> spits out a message
>
> 3) syslogd and/or klogd get killed
>
> Clearly you want to be a bit smarter about which process to kill.

Ill-implemented klogd/syslogd. Pre-allocating a little memory
is one way to go, or drop messages until allocation
becomes possible again. Then log a complaint about
messages missing due to a temporary OOM.
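
A minimal sketch of that fallback, with queue_message() standing in for the
real syslog path; the names and the strdup() stand-in are assumptions made
only to keep the example self-contained:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static unsigned long dropped;

/* Stand-in for the real queueing path; returns 0 when memory cannot be had. */
static int queue_message(const char *msg)
{
        char *copy = strdup(msg);

        if (copy == NULL)
                return 0;
        printf("log: %s\n", copy);
        free(copy);
        return 1;
}

void log_message(const char *msg)
{
        /* Once memory is available again, account for what was lost. */
        if (dropped) {
                char note[64];

                snprintf(note, sizeof(note), "dropped %lu messages during OOM", dropped);
                if (queue_message(note))
                        dropped = 0;
        }
        if (!queue_message(msg))
                dropped++;
}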

Helge Hafting

2002-09-16 13:58:55

by Rik van Riel

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

On Mon, 16 Sep 2002, Helge Hafting wrote:
> Rik van Riel wrote:
>
> > 1) memory is exhausted
> > 2) the network driver can't allocate memory and
> > spits out a message
> > 3) syslogd and/or klogd get killed
> >
> > Clearly you want to be a bit smarter about which process to kill.
>
> Ill-implemented klogd/syslogd. Pre-allocating a little memory
> is one way to go, or drop messages until allocation
> becomes possible again. Then log a complaint about
> messages missing due to a temporary OOM.

No. This has absolutely nothing to do with it.

In this case, "allocating memory" simply means that klogd/syslogd
page faults on something it already allocated, say a piece of the
executable or a swapped-out buffer.

Simple page faults like this can also trigger an OOM-killing.

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Spamtraps of the month: [email protected] [email protected]

2002-09-16 18:49:53

by Timothy D. Witham

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited


On Mon, 2002-09-16 at 07:03, Rik van Riel wrote:
> On Mon, 16 Sep 2002, Helge Hafting wrote:
> > Rik van Riel wrote:
> >
> > > 1) memory is exhausted
> > > 2) the network driver can't allocate memory and
> > > spits out a message
> > > 3) syslogd and/or klogd get killed
> > >
> > > Clearly you want to be a bit smarter about which process to kill.
> >
> > Ill-implemented klogd/syslogd. Pre-allocating a little memory
> > is one way to go, or drop messages until allocation
> > becomes possible again. Then log a complaint about
> > messages missing due to a temporary OOM.
>
> No. This has absolutely nothing to do with it.
>
> In this case, "allocating memory" simply means that klogd/syslogd
> page faults on something it already allocated, say a piece of the
> executable or a swapped-out buffer.
>
> Simple page faults like this can also trigger an OOM-killing.
>

Not in what I had described. Unless the page fault was for a
new page (just malloc'ed) it wouldn't result in the killing of
the process.

Tim

> Rik
> --
> Bravely reimplemented by the knights who say "NIH".
>
> http://www.surriel.com/ http://distro.conectiva.com/
>
> Spamtraps of the month: [email protected] [email protected]
>
--
Timothy D. Witham - Lab Director - [email protected]
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office) (503)-702-2871 (cell)
(503)-626-2436 (fax)

2002-09-16 19:06:35

by Rik van Riel

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

On 16 Sep 2002, Timothy D. Witham wrote:
> On Mon, 2002-09-16 at 07:03, Rik van Riel wrote:

> > > > 1) memory is exhausted
> > > > 2) the network driver can't allocate memory and
> > > > spits out a message
> > > > 3) syslogd and/or klogd get killed

> Not in what I had described. Unless the page fault was for a new page
> (just malloc'ed) it wouldn't result in the killing of the process.

Unfortunately they do. Reality doesn't quite match your
description.

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Spamtraps of the month: [email protected] [email protected]

2002-09-16 20:28:03

by Timothy D. Witham

[permalink] [raw]
Subject: Re: Killing/balancing processes when overcommited

On Mon, 2002-09-16 at 12:11, Rik van Riel wrote:
> On 16 Sep 2002, Timothy D. Witham wrote:
> > On Mon, 2002-09-16 at 07:03, Rik van Riel wrote:
>
> > > > > 1) memory is exhausted
> > > > > 2) the network driver can't allocate memory and
> > > > > spits out a message
> > > > > 3) syslogd and/or klogd get killed
>
> > Not in what I had described. Unless the page fault was for a new page
> > (just malloc'ed) it wouldn't result in the killing of the process.
>
> Unfortunately they do. Reality doesn't quite match your
> description.
>

I wasn't talking about the current situation; I was talking about
the pre-allocation method.

Tim

> Rik
> --
> Bravely reimplemented by the knights who say "NIH".
>
> http://www.surriel.com/ http://distro.conectiva.com/
>
> Spamtraps of the month: [email protected] [email protected]
--
Timothy D. Witham - Lab Director - [email protected]
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office) (503)-702-2871 (cell)
(503)-626-2436 (fax)