2001-03-27 11:00:44

by Rogier Wolff

Subject: OOM killer???


Just a quick bug-report:

One of our machines just started spewing:

Out of Memory: Killed process 117 (sendmail).
Out of Memory: Killed process 117 (sendmail).
Out of Memory: Killed process 117 (sendmail).
Out of Memory: Killed process 117 (sendmail).
Out of Memory: Killed process 117 (sendmail).
Out of Memory: Killed process 117 (sendmail).
Out of Memory: Killed process 117 (sendmail).
Out of Memory: Killed process 117 (sendmail).
Out of Memory: Killed process 117 (sendmail).

What we did to run it out of memory, I don't know. But I do know that
it shouldn't be killing one process more than once... (the process
should not exist after one try...)

Kernel 2.4.0.

Roger.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots.
* There are also old, bald pilots.


2001-03-27 12:15:46

by Jonathan Morton

Subject: Re: OOM killer???

>Out of Memory: Killed process 117 (sendmail).
>
>What we did to run it out of memory, I don't know. But I do know that
>it shouldn't be killing one process more than once... (the process
>should not exist after one try...)

This is a known bug in the Out-of-Memory handler, where it does not count the buffer and cache memory as "free" (it should), causing premature OOM killing. It is, however, normal for the OOM killer to attempt to kill a process more than once - it takes a few scheduler cycles for the SIGKILL to actually reach the process and take effect.

Also, it probably shouldn't have killed Sendmail, since that is usually a long-running, low-UID (and important) process. The OOM-kill selector is another thing that wants fixing, and my patch contains a *very rough* beginning to this.

The following patch should solve your problem for now, until a more detailed fix (which also clears up many other problems) is available in the stable kernel.

Alan and/or Linus may wish to apply this patch too...

(excerpt from my original patch from Saturday follows)

--- start ---
diff -u linux-2.4.1.orig/mm/oom_kill.c linux/mm/oom_kill.c
--- linux-2.4.1.orig/mm/oom_kill.c Tue Nov 14 18:56:46 2000
+++ linux/mm/oom_kill.c Sat Mar 24 20:35:20 2001
@@ -76,7 +76,9 @@
 	run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10);

 	points /= int_sqrt(cpu_time);
-	points /= int_sqrt(int_sqrt(run_time));
+
+	/* Long-running processes are *very* important, so don't take the 4th root */
+	points /= run_time;

 	/*
 	 * Niced processes are most likely less important, so double
@@ -93,6 +95,10 @@
 				p->uid == 0 || p->euid == 0)
 		points /= 4;

+	/* Much the same goes for processes with low UIDs */
+	if(p->uid < 100 || p->euid < 100)
+		points /= 2;
+
 	/*
 	 * We don't want to kill a process with direct hardware access.
 	 * Not only could that mess up the hardware, but usually users
@@ -192,12 +198,20 @@
 int out_of_memory(void)
 {
 	struct sysinfo swp_info;
+	long free;

 	/* Enough free memory? Not OOM. */
-	if (nr_free_pages() > freepages.min)
+	free = nr_free_pages();
+	if (free > freepages.min)
+		return 0;
+
+	if (free + nr_inactive_clean_pages() > freepages.low)
 		return 0;

-	if (nr_free_pages() + nr_inactive_clean_pages() > freepages.low)
+	/* Buffers and caches can be freed up (Jonathan "Chromatix" Morton) */
+	free += atomic_read(&buffermem_pages);
+	free += atomic_read(&page_cache_size);
+	if (free > freepages.low)
 		return 0;

 	/* Enough swap space left? Not OOM. */
--- end ---
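
For readers who want to follow the reworked check without a kernel tree,
the three early exits of the patched out_of_memory() can be sketched in
user space roughly as below. This is only a sketch: struct mem_counters
and enough_memory() are hypothetical names, the fields stand in for the
kernel counters named in the comments, and the swap check that follows in
the real function is left out.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the counters the patch consults (in pages). */
struct mem_counters {
    long free_pages;           /* stands in for nr_free_pages() */
    long inactive_clean_pages; /* stands in for nr_inactive_clean_pages() */
    long buffer_pages;         /* stands in for buffermem_pages */
    long page_cache_pages;     /* stands in for page_cache_size */
    long min;                  /* stands in for freepages.min */
    long low;                  /* stands in for freepages.low */
};

/* Mirrors the three "not OOM" exits of the patched check. */
static bool enough_memory(const struct mem_counters *m)
{
    long free = m->free_pages;

    if (free > m->min)
        return true;
    if (free + m->inactive_clean_pages > m->low)
        return true;

    /* Buffers and page cache can be reclaimed, so count them as free. */
    free += m->buffer_pages + m->page_cache_pages;
    return free > m->low;
}
```

With 10 free pages, a low mark of 128 and 1000 pages of buffers and cache,
the unpatched test would report OOM while this version does not.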

--------------------------------------------------------------
from: Jonathan "Chromatix" Morton
mail: [email protected] (not for attachments)
big-mail: [email protected]
uni-mail: [email protected]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----


2001-03-27 13:37:31

by Martin Dalecki

Subject: Re: OOM killer???

Jonathan Morton wrote:
>
> >Out of Memory: Killed process 117 (sendmail).
> >
> >What we did to run it out of memory, I don't know. But I do know that
> >it shouldn't be killing one process more than once... (the process
> >should not exist after one try...)
>
> This is a known bug in the Out-of-Memory handler, where it does not count the buffer and cache memory as "free" (it should), causing premature OOM killing. It is, however, normal for the OOM killer to attempt to kill a process more than once - it takes a few scheduler cycles for the SIGKILL to actually reach the process and take effect.
>
> Also, it probably shouldn't have killed Sendmail, since that is usually a long-running, low-UID (and important) process. The OOM-kill selector is another thing that wants fixing, and my patch contains a *very rough* beginning to this.
>
> The following patch should solve your problem for now, until a more detailed fix (which also clears up many other problems) is available in the stable kernel.
>
> Alan and/or Linus may wish to apply this patch too...
>
> (excerpt from my original patch from Saturday follows)
>
> --- start ---
> diff -u linux-2.4.1.orig/mm/oom_kill.c linux/mm/oom_kill.c
> --- linux-2.4.1.orig/mm/oom_kill.c Tue Nov 14 18:56:46 2000
> +++ linux/mm/oom_kill.c Sat Mar 24 20:35:20 2001
> @@ -76,7 +76,9 @@
> run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10);
>
> points /= int_sqrt(cpu_time);
> - points /= int_sqrt(int_sqrt(run_time));
> +
> + /* Long-running processes are *very* important, so don't take the 4th root */
> + points /= run_time;
>
> /*
> * Niced processes are most likely less important, so double
> @@ -93,6 +95,10 @@
> p->uid == 0 || p->euid == 0)
> points /= 4;
>
> + /* Much the same goes for processes with low UIDs */
> + if(p->uid < 100 || p->euid < 100)
> + points /= 2;
> +

Please change the 100 to 500 - this would make it consistent with
the useradd command, which starts adding new users at UID 500.
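
To make the effect of the cutoff concrete, here is a small user-space
sketch of the two UID adjustments from the quoted hunk, with the limit
passed in as a parameter so 100 and 500 can be compared. The function
name adjust_for_uid() and the low_uid_limit parameter are hypothetical,
and the capability test of the real code is omitted.

```c
/* Hypothetical helper: apply the root and low-UID discounts to a score. */
static unsigned long adjust_for_uid(unsigned long points,
                                    unsigned int uid, unsigned int euid,
                                    unsigned int low_uid_limit)
{
    /* Root is favoured most (capability check of the real code omitted). */
    if (uid == 0 || euid == 0)
        points /= 4;

    /* The patch then halves the score for any low UID, root included. */
    if (uid < low_uid_limit || euid < low_uid_limit)
        points /= 2;

    return points;
}
```

A process running as UID 250 keeps its full score with a limit of 100 and
has it halved with a limit of 500.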

2001-03-27 14:13:53

by Jonathan Morton

Subject: Re: OOM killer???

>Please change the 100 to 500 - this would make it consistent with
>the useradd command, which starts adding new users at UID 500.

Depends on which distribution you're using. In my experience, almost all
the really important stuff happens below 100. In any case, the
OOM-kill-selection algorithm in this patch is *not* final. See my
accompanying mail.
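
One detail of a rough selector like this is worth spelling out. Because
run_time is computed with a right shift, it is 0 for any process younger
than 2^(SHIFT_HZ + 10) jiffies, so a plain "points /= run_time" would
divide by zero for such a process. A guarded version could look like the
sketch below; scale_by_run_time() is a hypothetical name, and treating a
run time of 0 as 1 is an assumption about the intended behaviour, not
something taken from the patch.

```c
/* Hypothetical helper: divide the score by the run time, treating 0 as 1. */
static unsigned long scale_by_run_time(unsigned long points,
                                       unsigned long run_time)
{
    if (run_time == 0)
        run_time = 1;   /* young process: leave the score unchanged */
    return points / run_time;
}
```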



2001-03-27 15:33:53

by Jonathan Lundell

Subject: Re: OOM killer???

Martin Dalecki <[email protected]> writes:

>Please change the 100 to 500 - this would make it consistent with
>the useradd command, which starts adding new users at UID 500.

It's probably best to keep it somewhere <500, so that one can have "static" (<500) UIDs of either flavor: OOM-killable or not. 100 seems like "enough" non-killable users to me, but that may be a lack of imagination on my part.

--
/Jonathan Lundell.

2001-03-27 16:09:03

by Greg Ingram

Subject: Config bug? In 2.2.19 CONFIG_RTL8139 depends on CONFIG_EXPERIMENTAL


In 2.2.19 CONFIG_RTL8139 depends on CONFIG_EXPERIMENTAL. The RTL8139
driver is not labelled as experimental. Is this an error?

- Greg


2001-03-27 16:15:53

by Jeff Garzik

Subject: Re: Config bug? In 2.2.19 CONFIG_RTL8139 depends on CONFIG_EXPERIMENTAL

Greg Ingram wrote:
>
> In 2.2.19 CONFIG_RTL8139 depends on CONFIG_EXPERIMENTAL. The RTL8139
> driver is not labelled as experimental. Is this an error?

Yeah, add '(EXPERIMENTAL)' to the text. Send a patch to Alan if you
want...

--
Jeff Garzik | May you have warm words on a cold evening,
Building 1024 | a full moon on a dark night,
MandrakeSoft | and a smooth road all the way to your door.

2001-03-27 16:38:13

by Greg Ingram

Subject: [PATCH] 2.2.19 drivers/net/Config.in


On Tue, 27 Mar 2001, Jeff Garzik wrote:

> Greg Ingram wrote:
> >
> > In 2.2.19 CONFIG_RTL8139 depends on CONFIG_EXPERIMENTAL. The RTL8139
> > driver is not labelled as experimental. Is this an error?
>
> Yeah, add '(EXPERIMENTAL)' to the text. Send a patch to Alan if you
> want...

Okay. I really thought it would be the other way around, that is, that
the driver is no longer experimental. Anyway, I also tagged the 8139too
driver as experimental. Patch follows.

- Greg

--- linux/drivers/net/Config.in.orig Tue Mar 27 10:26:52 2001
+++ linux/drivers/net/Config.in Tue Mar 27 10:27:39 2001
@@ -98,10 +98,10 @@
tristate 'NI6510 support' CONFIG_NI65
fi
if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
- tristate 'RealTek 8129/8139 (not 8019/8029!) support' CONFIG_RTL8139
+ tristate 'RealTek 8129/8139 (not 8019/8029!) support (EXPERIMENTAL)' CONFIG_RTL8139
fi
if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
- tristate 'Alternative RealTek 8139 driver (8139too) support' CONFIG_RTL8139TOO
+ tristate 'Alternative RealTek 8139 driver (8139too) support (EXPERIMENTAL)' CONFIG_RTL8139TOO
if [ "$CONFIG_RTL8139TOO" != "n" ]; then
bool ' Use PIO instead of MMIO' CONFIG_8139TOO_PIO
bool ' Support for automatic channel equalization' CONFIG_8139TOO_TUNE_TWISTER

2001-03-27 18:09:48

by Ingo Oeser

Subject: Re: OOM killer???

On Tue, Mar 27, 2001 at 03:24:16PM +0200, Martin Dalecki wrote:
> > @@ -93,6 +95,10 @@
> > p->uid == 0 || p->euid == 0)
> > points /= 4;
> >
> > + /* Much the same goes for processes with low UIDs */
> > + if(p->uid < 100 || p->euid < 100)
> > + points /= 2;
> > +
>
> Please change the 100 to 500 - this would make it consistent with
> the useradd command, which starts adding new users at UID 500.

No, useradd usually reads /etc/login.defs to select the range.
The oom-killer should have configurables for that, to allow the
policy decisions in USER space -- where they belong -- not in KERNEL space.

If we use my OOM killer API, this patch would be a module and
could have module parameters to select that.

Jonathan: I URGE you to apply my patch before adding OOM killer
stuff. What's wrong with it, that you cannot use it? ;-)

It is easy to add configurables to a module and play with them
WITHOUT recompiling.

Dynamic sysctl tables would also be possible, IF we had a value
that is DEFINED to be invalid for sysctl(2) and only valid for /proc.

It is also better to include the egid in the decision. There
are daemons that I definitely want to be killed on a workstation,
but not on a server.

E.g. my important matlab calculation, which runs in user mode,
should not be killed. But killing a local webserver, which serves
my help system, is ok (because I will not lose work, and might
get it over the net, if there is a problem).

So as Rik stated: The OOM killer cannot suit all people, so it
has to be configurable, to be OOM kill, not overkill ;-)
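
A minimal sketch of what such configurables might look like, written as
plain user-space C so it can be tried outside the kernel. Everything here
is hypothetical: struct oom_policy, its fields, and the divisors are
placeholders, not the API of any existing patch. In a module, the fields
could be filled from module parameters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tunables for the victim selection. */
struct oom_policy {
    unsigned int low_uid_limit;      /* effective UIDs below this are favoured */
    const unsigned int *spare_gids;  /* effective GIDs the admin wants spared */
    size_t n_spare_gids;
};

static bool gid_is_spared(const struct oom_policy *pol, unsigned int egid)
{
    for (size_t i = 0; i < pol->n_spare_gids; i++)
        if (pol->spare_gids[i] == egid)
            return true;
    return false;
}

/* Apply the policy to a raw score; lower means less likely to be killed. */
static unsigned long apply_policy(const struct oom_policy *pol,
                                  unsigned long points,
                                  unsigned int euid, unsigned int egid)
{
    if (euid < pol->low_uid_limit)
        points /= 2;    /* placeholder weight */
    if (gid_is_spared(pol, egid))
        points /= 4;    /* placeholder weight */
    return points;
}
```

A workstation and a server would then differ only in the table they load,
not in the kernel they run.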

Thanks & Regards

Ingo Oeser
--
10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag>
<<<<<<<<<<<< been there and had much fun >>>>>>>>>>>>

2001-03-27 18:39:18

by Jonathan Morton

Subject: Re: OOM killer???

>If we use my OOM killer API, this patch would be a module and
>could have module parameters to select that.
>
>Jonathan: I URGE you to apply my patch before adding OOM killer
> stuff. What's wrong with it, that you cannot use it? ;-)
>
>It is easy to add configurables to a module and play with them
>WITHOUT recompiling.

Thanks for reminding me - I'll look into it on the plane and see what I can
do with it.

>e.g. My important matlab calculation, which runs in user mode
>should not be killed. But killing a local webserver, which serves
>my help system is ok (because I will not lose work, and might
>get it over the net, if there is a problem).
>
>So as Rik stated: The OOM killer cannot suit all people, so it
>has to be configurable, to be OOM kill, not overkill ;-)

Yes, configurability is probably a very good idea. However, it would be
best to include a good set of general parameters in the kernel itself, so
the set of average systems needs as little tweaking as possible. One
cannot expect every sysadmin to be familiar with these arcane (and rarely
actually used) parameters, so being able to select "server", "batch",
"workstation", "embedded" and so on would help massively.

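The named presets could be as simple as a lookup table. The sketch below
only illustrates the shape of the idea: struct oom_profile, its fields,
and every number in the table are made-up placeholders, not tuned values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical preset: one row of selector parameters per system type. */
struct oom_profile {
    const char *name;
    unsigned int low_uid_limit;   /* placeholder value */
    unsigned int daemon_divisor;  /* placeholder: how strongly to spare daemons */
};

static const struct oom_profile profiles[] = {
    { "server",      500, 8 },
    { "batch",       100, 2 },
    { "workstation", 100, 1 },
    { "embedded",    100, 4 },
};

/* Look a preset up by name; returns NULL for an unknown name. */
static const struct oom_profile *find_profile(const char *name)
{
    for (size_t i = 0; i < sizeof(profiles) / sizeof(profiles[0]); i++)
        if (strcmp(profiles[i].name, name) == 0)
            return &profiles[i];
    return NULL;
}
```

A sysadmin would then pick one name, and only those who need more would
touch the individual parameters.
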


2001-03-27 19:20:30

by Martin Dalecki

Subject: Re: OOM killer???

Ingo Oeser wrote:
>
> On Tue, Mar 27, 2001 at 03:24:16PM +0200, Martin Dalecki wrote:
> > > @@ -93,6 +95,10 @@
> > > p->uid == 0 || p->euid == 0)
> > > points /= 4;
> > >
> > > + /* Much the same goes for processes with low UIDs */
> > > + if(p->uid < 100 || p->euid < 100)
> > > + points /= 2;
> > > +
> >
> > Please change the 100 to 500 - this would make it consistent with
> > the useradd command, which starts adding new users at UID 500.
>
> No, useradd usually reads /etc/login.defs to select the range.
> The oom-killer should have configurables for that, to allow the
> policy decisions in USER space -- where it belongs -- not in KERNEL space

OK, sysctl would be more appropriate.

> If we use my OOM killer API, this patch would be a module and
> could have module parameters to select that.
>
> Jonathan: I URGE you to apply my patch before adding OOM killer
> stuff. What's wrong with it, that you cannot use it? ;-)
>
> It is easy to add configurables to a module and play with them
> WITHOUT recompiling.

It's total overkill and therefore not a good design.

> Dynamic sysctl tables would also be possible, IF we had a value
> that is DEFINED to be invalid for sysctl(2) and only valid for /proc.
>
> It is also better to include the egid in the decision. There
> are daemons that I definitely want to be killed on a workstation,
> but not on a server.
>
> E.g. my important matlab calculation, which runs in user mode,
> should not be killed. But killing a local webserver, which serves
> my help system, is ok (because I will not lose work, and might
> get it over the net, if there is a problem).
>
> So as Rik stated: The OOM killer cannot suit all people, so it
> has to be configurable, to be OOM kill, not overkill ;-)

Irony: Why then not store this information permanently - inside
the UID of the application?

2001-03-27 19:59:00

by Andreas Dilger

Subject: Re: OOM killer???

Martin Dalecki writes:
> Ingo Oeser wrote:
> > So as Rik stated: The OOM killer cannot suit all people, so it
> > has to be configurable, to be OOM kill, not overkill ;-)
>
> Irony: Why then not store this information permanently - inside
> the UID of the application?

Because in some cases (large companies and such) the UID is centrally
controlled across all machines in the company, so there are > 100 (or
500 or 1000) "system" UIDs. At one company I did work for, there were
dozens (maybe > 100) oracle instances alone (each with different UID
and passwords for security), and lots more "system" application UIDs,
each unique.

Encoding more information into the UID is getting back to the bad old
days of "uid 0 can do anything", rather than the capability model we
are working towards. Even so, encoding process killability info in the
UID is _still_ not putting policy in user space, because if you don't
like how the OOM killer works you still need to recompile and reboot.

Having a configurable OOM killer is not overkill, IMHO, because it is
only called in very rare cases (i.e. OOM is hopefully a rare event),
so it is definitely not on the fast path. I'm sure people will agree
that spending a few extra cycles to kill the correct process is far
better than killing a lot of incorrect processes quickly.

Every time this subject comes up, I point to AIX and SIGDANGER - a signal
sent to processes when the system gets OOM. If the process has registered
a SIGDANGER handler, it has a chance to free cache and such (or do a clean
shutdown), otherwise the default signal handler will kill the process.

SIGDANGER would fix the original problem (killing a numerical methods
application that has been running for weeks) perfectly - the application
can freely allocate cache memory to speed up the calculations. When the
system gets OOM (for whatever reason), it sends SIGDANGER to applications
first and they can free buffers or do a safe shutdown, and this may get
the system out of the OOM case without having to kill anything.
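
From the application side, this could look like the following sketch.
Linux has no SIGDANGER, so SIGUSR1 is used as a stand-in (an assumption
made purely for illustration), and the function names are hypothetical.
The handler only sets a flag; the cache is dropped later from the main
loop, because free() is not safe to call inside a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>

/* Assumption: Linux has no SIGDANGER, so SIGUSR1 stands in for it here. */
#define SIGDANGER_STANDIN SIGUSR1

static volatile sig_atomic_t memory_low;  /* set by the handler */

static void on_danger(int sig)
{
    (void)sig;
    memory_low = 1;  /* only set a flag: free() is not async-signal-safe */
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_danger_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_danger;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGDANGER_STANDIN, &sa, NULL);
}

/* Call from the main loop: drop the cache if a warning arrived.
 * Returns 1 if the cache was released, 0 otherwise. */
static int release_cache_if_warned(void **cache)
{
    if (!memory_low)
        return 0;
    memory_low = 0;
    free(*cache);
    *cache = NULL;
    return 1;
}
```

A long-running calculation would call release_cache_if_warned() once per
iteration and simply continue more slowly without its cache.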

Granted, I'm not against fixing the VM to reduce OOM conditions in the
first place. Having SIGDANGER still gives the application a chance to
save itself before it is killed, which none of the OOM changes have
addressed at all. It is _still_ possible to get a system into OOM from
network buffers and such, regardless of whether an application's
malloc() call returns NULL or not.

Also, having a SIGDANGER handler _could_ reduce a process's "badness"
value when looking for processes to SIGKILL, when calling all of the
SIGDANGER handlers has not freed enough memory to get out of OOM. This
assumes that programs which register SIGDANGER handlers are important,
rather than malicious (in which case your system has other problems).
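
As a rough sketch of that last idea: if the selector knew which processes
catch the warning, it could scale their score down before picking a
victim. The struct, the has_danger_handler flag, and the divisor of 4 are
all hypothetical placeholders.

```c
#include <stdbool.h>

struct candidate {
    int pid;
    unsigned long badness;    /* score from the usual heuristics */
    bool has_danger_handler;  /* hypothetical: process catches the warning */
};

#define DANGER_HANDLER_DIVISOR 4  /* placeholder weight */

static unsigned long effective_badness(const struct candidate *c)
{
    return c->has_danger_handler ? c->badness / DANGER_HANDLER_DIVISOR
                                 : c->badness;
}

/* Return the index of the candidate with the highest effective badness,
 * or -1 if there are no candidates. */
static int pick_victim(const struct candidate *c, int n)
{
    int worst = -1;
    unsigned long worst_score = 0;

    for (int i = 0; i < n; i++) {
        unsigned long s = effective_badness(&c[i]);

        if (worst < 0 || s > worst_score) {
            worst = i;
            worst_score = s;
        }
    }
    return worst;
}
```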

Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert

2001-03-27 21:14:57

by Andreas Rogge

Subject: Re: OOM killer???

--On Tuesday, March 27, 2001 12:55:50 -0700 Andreas Dilger
<[email protected]> wrote:

> Every time this subject comes up, I point to AIX and SIGDANGER - a signal
> sent to processes when the system gets OOM. If the process has registered
> a SIGDANGER handler, it has a chance to free cache and such (or do a clean
> shutdown), otherwise the default signal handler will kill the process.

Having a SIGDANGER would be a fine thing, but this will need patching in all
current daemons and there has to be a possibility to configure the behaviour
of the process when receiving a SIGDANGER. E.g. it is a good idea to kill
apache on a workstation, but a very bad idea to kill apache on a webserver.
Generally I'd like to see such an implementation, but wouldn't it be better
to have a pre-selection of the processes getting SIGDANGER?

For example: if OOM occurs, send SIGDANGER to all non-root processes with a
nice level of n or higher (where n should be discussed).

This would make it easy to "configure" SIGDANGER-unaware applications - in
the meantime, until all applications are SIGDANGER-aware - to deal with
OOM situations. You just do a "nice -n -1 httpd" and your httpd won't
get killed when OOM occurs.

IMO this would dramatically improve the OOM-Problems right now.
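
Expressed as code, the pre-selection is a one-line predicate. This is a
sketch only; gets_sigdanger() is a hypothetical name and the threshold n
is left as a parameter, since its value is still open.

```c
#include <stdbool.h>

/* Warn non-root processes whose nice value is at least n. */
static bool gets_sigdanger(unsigned int uid, int nice_value, int n)
{
    return uid != 0 && nice_value >= n;
}
```

With n = 0, an httpd started with "nice -n -1" falls below the threshold
and is left alone, while an ordinary user process at nice 0 is warned.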

--
Andreas Rogge <[email protected]>
Available on IRCnet:#linux.de as Dyson

2001-03-29 09:30:38

by Dr. Michael Weller

Subject: Re: OOM killer???

On Wed, 28 Mar 2001, Andreas Dilger wrote:

> Szaka writes:
> > On Tue, 27 Mar 2001, Andreas Dilger wrote:
> > > Every time this subject comes up, I point to AIX and SIGDANGER - a signal
> > > sent to processes when the system gets OOM.
> >
> > And every time the SIGDANGER comes up, the issue that AIX provides
> > *both* early and late allocation mechanism even on per-process basis
> > that can be controlled by *both* the programmer and the admin is
> > completely ignored. Linux supports none of these...

Maybe some details here would be helpful. Are you referring to
madvise()/mlock()? Maybe malloc and friends could be modified to reserve
newly allocated pages. For things like fork()/exec() a new flag is
probably required to ensure this, or the new process could actually
run a syscall first to ensure all memory can be accessed.

> If Linux provided both of those, then people would probably already be
> happy.

Probably.

>
> > ...with the current model it's quite possible the handler code is still
> > sitting on the disk and no memory will be available to page it in.
>
> Actually, I see SIGDANGER being useful at the time we hit freepages.min
> (or maybe even freepages.low), rather than actual OOM so that the apps
> have a chance to DO something about the memory shortage, rather than just
> waiting around to die.
>
> On AIX, if a process gets SIGDANGER without a registered signal handler,
> the process is terminated. It would be nice to send SIGDANGER to

Not right. AIX has two low-level marks; if I understand the thread right,
we would use freepages.low and freepages.min.

If freepages.low is hit, a SIGDANGER is sent to all processes. The signal
is *ignored by default*. In the handler, an easy to use syscall (no
odd /proc file parsing; really, I don't like this bloatage, if I could I
would disable it for its security issues in certain places, at least in
the past, but it is no longer possible to live w/o it) can be used to
learn about the pages left. If it's not clear that SIGDANGER is only
triggered by oom we should have a nice syscall/libcall
like is_paging_space_low().

If freepages.min is reached, AIX starts to kill processes (just like the OOM
killer). It uses some heuristics which might be better than ours, but I
doubt it. Probably background (no ctty) and long-running processes should
be preserved if possible. Short-running, interactive programs can
typically just be restarted, but losing one week's computation is bad
(now, a sensible application should actually checkpoint itself).
AIX also often kills the wrong process. A typical situation here: user A
runs a long-running experiment, user B logs in, doesn't check the load of
the machine, starts a too-large application, and user A's application dies
while B's survives for half an hour and then dies too.

The OOM killer of AIX honours catching SIGDANGER. It kills those procs
last. A nice trick is to make a no_op handler for it. This will have your
proc die last.
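
The scheme described above, with two marks and handler-catching processes
killed last, can be sketched as follows. This is my reading of the
behaviour put into hypothetical user-space code, not AIX source; the enum,
struct, and function names are invented for the example.

```c
enum mem_action { MEM_OK, MEM_WARN, MEM_KILL };

/* Above low: do nothing. At or below low: send the warning signal.
 * At or below min: start killing. */
static enum mem_action classify(long free_pages, long low, long min)
{
    if (free_pages <= min)
        return MEM_KILL;
    if (free_pages <= low)
        return MEM_WARN;
    return MEM_OK;
}

struct proc {
    int pid;
    int catches_danger;  /* nonzero if a SIGDANGER handler is installed */
};

/* Prefer a victim that does not catch the warning; fall back to the
 * first process if all of them do. Returns -1 for an empty table. */
static int choose_victim(const struct proc *p, int n)
{
    for (int i = 0; i < n; i++)
        if (!p[i].catches_danger)
            return p[i].pid;
    return n > 0 ? p[0].pid : -1;
}
```

The fallback branch is what makes the no_op handler trick work only as
long as few processes use it.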

Michael.

--

Michael Weller: [email protected], [email protected],
or even [email protected]. If you encounter an eowmob account on
any machine in the net, it's very likely it's me.

2001-03-29 11:03:13

by Guest section DW

Subject: Re: OOM killer???

On Thu, Mar 29, 2001 at 11:29:34AM +0200, Dr. Michael Weller wrote:
> On Wed, 28 Mar 2001, Andreas Dilger wrote:
>
> > Szaka writes:
> > > On Tue, 27 Mar 2001, Andreas Dilger wrote:
> > > > Every time this subject comes up, I point to AIX and SIGDANGER - a signal
> > > > sent to processes when the system gets OOM.
> > >
> > > And every time the SIGDANGER comes up, the issue that AIX provides
> > > *both* early and late allocation mechanism even on per-process basis
> > > that can be controlled by *both* the programmer and the admin is
> > > completely ignored. Linux supports none of these...
>
> > If Linux provided both of those, then people would probably already be
> > happy.
>
> Probably.

Two things are wrong.
1. Linux has an OOM killer.
2. The OOM killer has a bad behaviour.

Presently, with the proper kind of load, one can see a process killed
by OOM almost daily. That is totally unacceptable.
People are working on refining the algorithm so that blatant idiocies
where processes are killed while there are plenty of resources
are avoided. Good. Suppose it is done. Then one thing is still wrong.

1. Linux has an OOM killer.

A system with an OOM killer is unreliable. Linux must have a reliable
mode of operation, and that must be the default mode.

Now you assume that adding SIGDANGER would make people happy.
But it would be a rather unimportant addition.
It might help in some cases, but it falls in the category
of improving the OOM killer a little.

People will be happy when Linux is reliable by default.

Andries

[Never use planes where the company's engineers spend their
time designing algorithms for selecting which passenger
must be thrown out when the plane is overloaded.]

2001-03-29 12:00:34

by Sean Hunter

Subject: Re: OOM killer???

On Thu, Mar 29, 2001 at 01:01:54PM +0200, Guest section DW wrote:
> [Never use planes where the company's engineers spend their
> time designing algorithms for selecting which passenger
> must be thrown out when the plane is overloaded.]

This is (as far as I can see) a fantastically specious argument. A plane is
designed to function in an entirely constrained mode of operation and in an
entirely well-understood and circumscribed problem space, whereas a linux host
is a general-purpose device which can be used for many different applications.

The reason the aero engineers don't need to select a passenger to throw out
when the plane is overloaded is simply that the plane operators do not allow
the plane to become overloaded. If I put 100 people in a trilander it may
take off, but won't fly, and will probably crash. The plane's designers don't
have to do anything about that - we do.

Furthermore, why do you suppose an aeroplane has more than one altimeter,
artificial horizon and compass? Do you think it's because they are unable to
make one of each that is reliable? Or do you think it's because they are
concerned about what happens if one fails _however unlikely that is_?

In fact, aeroplane engineers do design in ways of mitigating the effects of all
kinds of failures, including lessening the impact of a crash (directly
analogous to our OOM killer). For example, they provide means of jettisoning
fuel prior to crash landing to attempt to minimise explosions.

Risk management is about lessening impact as well as lessening probability. If
something is important, you don't only make it work as well as you can, you
mitigate the effect of failure. A reliable system is not just a strong belt,
it is belt, braces, suspenders and bicycle clips.

I have seen the OOM killer in operation three times on our production servers.
In each case it kept the machine alive in the face of hostile runaway
processes. I don't want to see things killed, but if that is the only way to
keep the host alive, I vote to keep it alive.

When I'm on a plane, I want more than one engine _and_ lifejackets.

Sean



2001-03-29 12:58:19

by Guest section DW

Subject: Re: OOM killer???

On Thu, Mar 29, 2001 at 01:02:38PM +0100, Sean Hunter wrote:

> The reason the aero engineers don't need to select a passenger to throw out
> when the plane is overloaded is simply that the plane operators do not allow
> the plane to become overloaded.

Yes. But today Linux willingly overcommits. It would be better if
the default was not to.

> Furthermore, why do you suppose an aeroplane has more than one altimeter,
> artificial horizon and compass? Do you think it's because they are unable to
> make one of each that is reliable? Or do you think it's because they are
> concerned about what happens if one fails _however unlikely that is_.

Unix V6 did not overcommit, and panicked if it was out of swap
because that was a cannot-happen situation.
If you argue that we must design things so that there is no overcommit
and still have an OOM killer just in case, I have no objections at all.
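
What "no overcommit" means in practice is that memory is accounted for at
allocation time, so the failure shows up as a refused request instead of a
later kill. A toy user-space model of that accounting, with hypothetical
names and no claim to match any kernel's implementation:

```c
#include <stdbool.h>

struct commit_state {
    unsigned long total_pages;      /* RAM plus swap, in pages */
    unsigned long committed_pages;  /* pages already promised to processes */
};

/* Reserve pages up front; refuse instead of overcommitting.
 * Keeps committed_pages <= total_pages at all times. */
static bool reserve_pages(struct commit_state *s, unsigned long pages)
{
    if (pages > s->total_pages - s->committed_pages)
        return false;               /* caller sees an allocation failure */
    s->committed_pages += pages;
    return true;
}
```

Under this model a process that was granted its pages can always touch
them, which is the reliability property being asked for.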

2001-03-29 13:45:05

by Szabolcs Szakacsits

Subject: Re: OOM killer???


On Thu, 29 Mar 2001, Dr. Michael Weller wrote:
> On Wed, 28 Mar 2001, Andreas Dilger wrote:
> > Szaka writes:
> > > And every time the SIGDANGER comes up, the issue that AIX provides
> > > *both* early and late allocation mechanism even on per-process basis
> > > that can be controlled by *both* the programmer and the admin is
> > > completely ignored. Linux supports none of these...
> Maybe some details here would be helpful.

http://www.unet.univie.ac.at/aix/aixbman/baseadmn/pag_space_under.htm

> > > ...with the current model it's quite possible the handler code is still
> > > sitting on the disk and no memory will be available to page it in.
> > Actually, I see SIGDANGER being useful at the time we hit freepages.min

The point is AIX *can* guarantee [even for an ordinary process] that
your signal handler will be executed, Linux can *not*. It doesn't matter
where the different oom watermarks are, there would always be
situations where, by the time your handler gets control, it's already far
too late [because between sending SIGDANGER and the app getting control
(you can't schedule e.g. 1000 apps at the same time) the system ran into
oom and killed just your app (and e.g. the other 999 buggy mem-leaking apps
registered a no-op SIGDANGER handler); hope you get the picture even if
the example is highly unrealistic].

> > (or maybe even freepages.low), rather than actual OOM so that the apps
> > have a chance to DO something about the memory shortage,

Primarily *users* should have a chance to control this thing, not
developers and the kernel. The latter should provide a way to control things
and have a reasonable default [Linux already has the default but not the
controls]. Guess what one wants to be killed if he runs Oracle in
production and DB2, Informix, Sybase, etc. in trial. Now only the kernel
decides. With SIGDANGER, again only developers and the kernel would decide
[now forget about resource management that would prevent running all of them
on the same box, Linux users want to fully utilize the box ;)].

So again, IMHO to address this long-standing problem, Linux needs
- optional non-overcommit, with per-process granularity would be nice
[and leave the default just as-is now]; the oom killer could also weigh
this info
- reserved/guaranteed superuser memory [otherwise in non-overcommit mode
in user space oom(=system oom), oom killer would just take action]
- and as a last chance not to deadlock, advisory oom killer with
reasonable default [the current default is pretty fine - apart from
its current bugs]
- a HOWTO about preventing OOM, keeping your important processes from
being killed, etc.
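
The first two items of the list can be illustrated with a toy accounting
model. This is a simplified user-space sketch with hypothetical names: the
strict flag stands for a per-process non-overcommit setting, the reserve
size is a placeholder, and allocations made in overcommit mode are simply
not counted here.

```c
#include <stdbool.h>

struct vm_budget {
    unsigned long total_pages;         /* RAM plus swap, in pages */
    unsigned long committed_pages;     /* pages promised to strict processes */
    unsigned long root_reserve_pages;  /* placeholder: kept back for uid 0 */
};

/* strict: the process asked for non-overcommit accounting.
 * Assumes root_reserve_pages <= total_pages. */
static bool may_allocate(struct vm_budget *b, unsigned long pages,
                         unsigned int uid, bool strict)
{
    unsigned long limit = b->total_pages;

    if (!strict)
        return true;   /* overcommit mode: always granted, as today */
    if (uid != 0)
        limit -= b->root_reserve_pages;
    if (b->committed_pages > limit || pages > limit - b->committed_pages)
        return false;
    b->committed_pages += pages;
    return true;
}
```

The reserve is what lets the superuser still log in and clean up when
ordinary strict-mode processes have used up their share.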

Later on virtual swap space [the degree of memory overcommitment] and
SIGDANGER maybe would be useful but I don't think so at present.

BTW, the issue is far more difficult than some people, the "let's fix
malloc and its friends" crowd, think.

> If freepages.min is reached, AIX starts to kill processes (just like OOM
> killer). It uses some heuristics which might be better than ours, but I
> doubt it.

If every process runs in non-overcommit mode AIX kills init first :)

Szaka

2001-03-29 15:02:52

by Dr. Michael Weller

Subject: Re: OOM killer???

On Thu, 29 Mar 2001, Szabolcs Szakacsits wrote:

>
> On Thu, 29 Mar 2001, Dr. Michael Weller wrote:
> > On Wed, 28 Mar 2001, Andreas Dilger wrote:
> > > Szaka writes:
> > > > And every time the SIGDANGER comes up, the issue that AIX provides
> > > > *both* early and late allocation mechanism even on per-process basis
> > > > that can be controlled by *both* the programmer and the admin is
> > > > completely ignored. Linux supports none of these...
> > Maybe some details here would be helpful.
>
> http://www.unet.univie.ac.at/aix/aixbman/baseadmn/pag_space_under.htm

I read it, it agrees with my memory.

>
> > > > ...with the current model it's quite possible the handler code is still
> > > > sitting on the disk and no memory will be available to page it in.
> > > Actually, I see SIGDANGER being useful at the time we hit freepages.min
>
> The point is AIX *can* guarantee [even for an ordinary process] that
> your signal handler will be executed, Linux can *not*. It doesn't matter

No it can't... and the reason is...

> where the different oom watermarks are, there would be always such
> situations when your handler would get the control it's already far too
> late [because between sending SIGDANGER and app getting the control (you
> can't schedule e.g. 1000 apps at the same time) the system run into oom
> and killed just your app (and e.g. the other 999 buggy mem leaking app

..just that. The system may run OOM hard before your SIGDANGER handler is
called,

Of course not if there are 999 apps w/o a sigdanger handler. I doubt the
kernel needs to schedule them to learn SIGDANGER is ignored. These won't
be touched. But if there are 999 leaky apps with a SIGDANGER handler, you
might still be killed.

Note that there are nasty users like me, which provide a no_op function
as SIGDANGER handler. In my first hand AIX experience, this suffices
to get your process killed last (after init, for example). If you want to
honour it only when the process really freed memory, you need to store the
memory used when entering SIGDANGER (which can probably be done).
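
[A minimal sketch of that bookkeeping, in Python for brevity. The names
`Proc`, `send_sigdanger`, `earned_protection` and the 5% threshold are
illustrative assumptions, not AIX or Linux behaviour. The idea is only that a
snapshot taken when the warning is sent lets a no-op handler be told apart
from one that really released memory.]

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    rss_pages: int            # pages in use right now
    has_danger_handler: bool  # does the process catch SIGDANGER?
    rss_at_signal: int = 0    # snapshot taken when SIGDANGER was sent


def send_sigdanger(procs):
    """Record the memory use of every handler-owning process at signal time."""
    for p in procs:
        if p.has_danger_handler:
            p.rss_at_signal = p.rss_pages


def earned_protection(p, min_freed_fraction=0.05):
    """True only if the process really shrank after the warning."""
    if not p.has_danger_handler or p.rss_at_signal == 0:
        return False
    freed = p.rss_at_signal - p.rss_pages
    return freed >= min_freed_fraction * p.rss_at_signal
```

With this rule a process that installs an empty handler gains nothing, since
its usage after the signal equals the snapshot.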

Just a routine to trigger SIGDANGER in all apps at mem low would probably
not bloat the kernel much.

About killing, from my first hand experience AIX just kills random
processes (it only distinguishes SIGDANGER ignored or not). Maybe those
using much memory first. This is by no means better than the OOM killer
approach. For example, it kills your 14 days running large simulation if
Joe Blow user's Netscape goes havoc.

AIX does not 'strictly' guarantee that SIGDANGER reaches you. It can't.
It is only likely since few processes catch SIGDANGER, and those that do
it will probably not allocate much memory while doing so. (but the single
page they might allocate for some local stack variable in the SIGDANGER
handler can kill it).

Joe blow user can code a SIGDANGER exploiting prog that will kill the
whole concept by allocating memory in SIGDANGER.

Actually, the OOM Killer could probably be coded to send an early warning
too.. That is the idea of the external implementation.. more flexibility.

I personally agree with you though, I don't like external kernel servers
like this one. I also dislike kerneld and modules (except for first time
installation of a distrib). I still always compile monolithic kernels for
my servers. I can't sleep well otherwise, sorry.

> registered a no-op SIGDANGER handler), hope you get the picture even
> the example is highly unrealistic].
>
> > > (or maybe even freepages.low), rather than actual OOM so that the apps
> > > have a chance to DO something about the memory shortage,
>
> Primarily *users* should have a chance to control this thing, not

I agree, and like SIGDANGER as an early warning. But it doesn't solve
the issue finally, only helps soothing.

About this early allocation myth: Did you actually read the page?
The fact it's controlled by a silly environment variable shows it
is a mere user space issue.

So what does it do? If you do malloc (that is, sbrk or mmap) it checks
if enough paging space is there or not. Then you need to allocate (=dirty)
the pages. Just access one byte in every page (4096 bytes) of it. If someone
else allocates memory while you do it, you might get killed doing so or get
SIGDANGER (which you should catch so you can return 'failed' in that case
and free the new block). Same for mmaps not covered by a file
(PRIVATE or anonymous map).

How simple. If you are that concerned about this: You can easily do it in
your application. Talk to the glibc people (not kernel) to add it for your
convenience. Someone could also make a new malloc implementation to be linked
to your app (c.f. electric fence) to overload the standard one. A simple
user space issue. IMHO, lean syscalls to dirty/alloc many pages and esp.
to learn about memory left (I don't want to parse /proc/meminfo all the
time) would be helpful though (but not strictly required).
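
[A minimal user-space sketch of the "allocate, then dirty every page" idea,
in Python for brevity. `eager_alloc` is a hypothetical helper name. It maps
anonymous memory and writes one byte per page so that each page is really
backed before the caller relies on it. It does not handle the failure case
described above (being killed, or receiving SIGDANGER, while touching the
pages).]

```python
import mmap

PAGE = mmap.PAGESIZE  # 4096 on most systems, but not assumed here


def eager_alloc(nbytes):
    """Map anonymous memory and dirty one byte in every page of it."""
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    # Round the request up to a whole number of pages.
    length = ((nbytes + PAGE - 1) // PAGE) * PAGE
    buf = mmap.mmap(-1, length)  # -1: anonymous mapping, no file behind it
    for off in range(0, length, PAGE):
        buf[off] = 0  # a write, so the page must actually be provided
    return buf
```

The loop is what makes this slow for large requests: every page takes its
own fault, which is why a single call to dirty many pages is asked for.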

Except when the kernel does lazy erasing of new pages, this allocation
will be slow though (can't be done in a single timeslice). It would be
good if pages could be allocated without really clearing them to zero
(this would be done on first access). This would speed up this feature. One
would need to see how much this is a real-life issue though. Probably the
application would dirty the pages shortly after anyway.

What does AIX do about stack overflows? Applications forking and then
dirtying their shared data pages madly? OOps.. nothing.. Why? It cannot be
done!

The real advantage of the AIX implementation is: it covers the easy to
implement cases which are luckily those common people consider. The
difficult cases are not handled, but these concern only the real hackers
which know it can't be done.

Believe me. Without the early killing bug, you'd now praise how cleverly
OOM killer selects apps to kill compared to aix with no, or only well
behaved, SIGDANGER handlers.

Personally, I'd still prefer the (kernel wise seen) simple AIX
in kernel implementation, at least as an option to the external OOM
killer.

Michael (which uses AIX still about as much as linux).

--

Michael Weller: [email protected], [email protected],
or even [email protected]. If you encounter an eowmob account on
any machine in the net, it's very likely it's me.

2001-03-29 15:43:52

by David Konerding

[permalink] [raw]
Subject: Re: OOM killer???

I would tend to agree, I'm not a fan of the OOM killer's behavior. The OOM is forced
because of the policy of overcommitting of memory. The reason for that policy is
based on an observation:
many programs allocate far more memory than they ever use, and most people don't want
their program to crash just because it malloc()s memory it isn't going to use.
Having to activate the pages for memory that never gets used feels wasteful to some
people. On the other hand, having my two-week-running scientific app killed because
it did a whole bunch of disk reads that fill up 600MB of my 768MB RAM, then tried to
allocate a whole bunch more memory (that it *will* use) is obviously a fault of the
OOM killer policy. At the very least the entire disk cache, or a goodly portion of
it, should be drained before ever invoking the OOM. I'm more than glad to pay the
performance hit of no disk buffers for the privilege of having my job finish.

Now, if you're going to implement OOM, when it is absolutely necessary, at the very
least, move the policy implementation out of the kernel. One of the general
philosophies of Linux has been to move policy out of the kernel. In this case, you'd
just have a root owned process with locked pages that can't be pre-empted, which
implemented the policy. You'll never come up with an OOM policy that will fit
everybody's needs unless it can be tuned for particular system's usage, and it's
going to be far easier to come up with that policy if it's not in the kernel.



Guest section DW wrote:


> Two things are wrong.

> 1. Linux has an OOM killer.
> 2. The OOM killer has a bad behaviour.
>
> Presently, with the proper kind of load, one can see a process killed
> by OOM almost daily. That is totally unacceptable.
> People are working on refining the algorithm so that blatant idiocies
> where processes are killed while there is plenty of resources
> are avoided. Good. Suppose it done. Then one thing is wrong.
>
> 1. Linux has an OOM killer.
>
> A system with an OOM killer is unreliable. Linux must have a reliable
> mode of operation, and that must be the default mode.
>
> Now you assume that adding SIGDANGER would make people happy.
> But it would be a rather unimportant addition.
> It might help in some cases, but it falls in the category
> of improving the OOM killer a little.
>
> People will be happy when Linux is reliable by default.
>
> Andries
>
> [Never use planes where the company's engineers spend their
> time designing algorithms for selecting which passenger
> must be thrown out when the plane is overloaded.]
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2001-03-29 16:21:04

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: OOM killer???


On Thu, 29 Mar 2001, Dr. Michael Weller wrote:
> On Thu, 29 Mar 2001, Szabolcs Szakacsits wrote:
> > The point is AIX *can* guarantee [even for an ordinary process] that
> > your signal handler will be executed, Linux can *not*. It doesn't matter
> No it can't... and the reason is...

So AIX is buggy in eager mode not reserving a couple of extra pages [per
process] to be able to run the handler. What AIX version(s) do you use?
Anyway, as you probably noticed at present I'm not a big supporter of
introducing SIGDANGER, too many things can be messed up for little
or no gain.

> Note that there are nasty users like me, which provide a no_op function
> as SIGDANGER handler.

For example this.

> Joe blow user can code a SIGDANGER exploiting prog that will kill the
> whole concept by allocating memory in SIGDANGER.

And this. Moreover it needn't be malicious, people happily write
signal handlers that would blow things up without even realising it ...

And the admin still has no control over things ;) Sure, these could be
worked around but I feel it just isn't worth the added
complexity.

> About this early alloction myths: Did you actually read the page?
> The fact its controlled by a silly environment variable shows it
> is a mere user space issue.

This is my question as well ;) Although I didn't read the AIX source but
guessed kernel sets a bit in the task structure for eager mode during
the exec() syscall and takes care about everything, at least this is
what the document suggests ;) [see the bottom of the page]

Szaka

2001-03-29 16:25:35

by Jesse Pollard

[permalink] [raw]
Subject: Re: OOM killer???

Guest section DW <[email protected]>:
>
> On Thu, Mar 29, 2001 at 01:02:38PM +0100, Sean Hunter wrote:
>
> > The reason the aero engineers don't need to select a passenger to throw out
> > when the plane is overloaded is simply that the plane operators do not allow
> > the plane to become overloaded.
>
> Yes. But today Linux willingly overcommits. It would be better if
> the default was not to.

Preferably, the default should be a configure option, with runtime
alterations.

> > Furthermore, why do you suppose an aeroplane has more than one altimeter,
> > artificial horizon and compass? Do you think it's because they are unable to
> > make one of each that is reliable? Or do you think it's because they are
> > concerned about what happens if one fails _however unlikely that is_.
>
> Unix V6 did not overcommit, and panicked if it was out of swap
> because that was a cannot happen situation.

Ummm... no. The user got "ENOMEM" or "insufficient memory for fork", or
"swap error". The system didn't panic unless there was an I/O error on
the swap device.

> If you argue that we must design things so that there is no overcommit
> and still have an OOM killer just in case, I have no objections at all.

good.

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-03-29 16:42:23

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: OOM killer???


On Thu, 29 Mar 2001, Dr. Michael Weller wrote:

> Applications forking and then dirtying their shared data pages
> madly? OOps.. nothing.. Why? It cannot be done!

In eager mode, Solaris, Tru64, Irix, and last year's non-overcommit patch
for Linux by Eduardo Horvath can do it (you get ENOMEM at fork).

Szaka

2001-03-29 17:22:53

by Stephen Satchell

[permalink] [raw]
Subject: Re: OOM killer???

At 07:41 AM 3/29/01 -0800, David Konerding wrote:
>Now, if you're going to implement OOM, when it is absolutely necessary, at
>the very
>least, move the policy implementation out of the kernel. One of the general
>philosophies of Linux has been to move policy out of the kernel. In this
>case, you'd
>just have a root owned process with locked pages that can't be pre-empted,
>which
>implemented the policy. You'll never come up with an OOM policy that will fit
>everybody's needs unless it can be tuned for particular system's usage,
>and it's
>going to be far easier to come up with that policy if it's not in the kernel.

SUMMARY OF COMMENT: We need kernel support for such a userland
process. At a minimum, I believe we would need a means to steer term
signals and kill signals to the correct process in a process tree, a means
for processes to receive notification that the system is in trouble, and
that the policy be set and the OOM killer be implemented in a daemon that
accepts input directly from the admin in a config file, from the memory
system via suitable interfaces, and from the processes via communications
(probably through the process control block). SIGDANGER needs to be
defined, but never raised by the kernel. There also needs to be a means
for the OOM daemon to request the release of non-critical cache buffers.

Comment follows:

I'm in basic agreement with your sentiments, but I'm concerned about how a
userland policy system will work without some support within the
kernel. The support also needs to be well-enough defined that applications
can be written to work with the policy manager.

Let's start with Oracle, as one of the examples that keeps being brought
up. How does Oracle deal with the problem? Part of the problem is that we
have a late-commit policy and no way to clean up when the processes that
are running exceed the capacity of the machine they are running on. The
AIX solution at first glance seems reasonable: give the running processes
a chance to "become part of the solution" by freeing memory space they have
reserved but are not using, or that can be decommissioned [cached
information] without destroying the process work. The policy
implementation can by default reward those processes by lowering their
chances of being killed if the condition is not corrected.

Another characteristic of Oracle (shared by other mission-critical systems)
is that the thing is not implemented as a single process, but as a
collection of running processes working together. Arbitrarily killing one
process can corrupt the process work of several processes. Currently there
is no mechanism for a process to inform the system that any kills should
really be directed at *this* parent, so that the whole thing shuts down
reasonably. If such a mechanism were to be provided, any signals to ANY
process would be steered to the "top" process. To prevent any subtle
attacks, we would have to define bounds on which process is identified as
the "top" process. (Non-root process in the process tree would probably be
fine for non-root subprocesses; any process in the process tree except init
would be suitable for root subprocesses.)
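
[A rough sketch of the steering rule in the paragraph above, in Python for
brevity. `steer_target`, `ancestors`, and the `parent_of` / `uid_of` tables
are hypothetical stand-ins for kernel process state. One possible reading of
the bounds is modelled: a process nominates an ancestor as its "top"
process, init is never a valid target, and a non-root process may not
redirect a kill to a root-owned ancestor.]

```python
INIT_PID = 1


def ancestors(pid, parent_of):
    """Yield the ancestors of pid, nearest first, stopping before init."""
    seen = set()
    pid = parent_of.get(pid)
    while pid is not None and pid != INIT_PID and pid not in seen:
        seen.add(pid)
        yield pid
        pid = parent_of.get(pid)


def steer_target(victim, nominee, parent_of, uid_of):
    """Return the pid that should receive a signal aimed at victim."""
    # The nominee must be a real ancestor; init is excluded by ancestors().
    if nominee is None or nominee not in ancestors(victim, parent_of):
        return victim
    # A non-root process may not point kills at a root-owned process.
    if uid_of[victim] != 0 and uid_of[nominee] == 0:
        return victim
    return nominee
```

Falling back to the original victim whenever the nomination is invalid keeps
the mechanism from being used to shield a process entirely.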

Several people have identified that one of the current versions of the OOM
killer doesn't cause the release of cache buffers within the Linux
kernel. I've seen mention of patches to correct this problem, but if you
move the implementation of memory overcommitment recovery from the kernel
to userland, you will need to have some way for that userland daemon to
tell the various subsystems to release cached information that can be
safely released. This lets the daemon use a structured recovery technique
where it does (A), checks to see if that opens up enough room, does (B),
checks to see if that opens enough room, and so forth. Note that some
method needs to tell malloc() to fail all subsequent memory requests so
that when the daemon takes a corrective action there isn't further
overcommitment.
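
[The "do (A), check, do (B), check" recovery loop can be sketched as follows,
in Python for brevity. `recover` and the action names in the usage note are
hypothetical. How a daemon would actually ask subsystems to drop caches, or
learn the free page count, is exactly the missing kernel interface discussed
here, so both are passed in as plain callables.]

```python
def recover(free_pages, target_pages, actions):
    """Apply actions in order, stopping as soon as enough memory is free.

    free_pages:   callable returning the current number of free pages
    target_pages: free page count at which recovery is considered done
    actions:      list of (name, callable), least drastic first
    Returns the names of the actions that were actually taken.
    """
    taken = []
    for name, act in actions:
        if free_pages() >= target_pages:
            break  # the previous step opened up enough room
        act()
        taken.append(name)
    return taken
```

A daemon might order the list as "release caches", "raise SIGDANGER",
"SIGTERM the chosen process", "SIGKILL it", so that killing is only reached
when the gentler steps did not free enough.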

Which brings up another point: why SIGKILL? SIGTERM would appear to be
the proper signal at a "yellow alert" so that the process has a chance of
going through an orderly shutdown -- which might include check-pointing
that week-long calculation of the 6,839,294,763,900,034th prime
number. This is especially important for process sets -- the first time
that the top-level ORACLE module finds out there is trouble is to get a
SIGCHLD signal from a troop when it isn't expecting one?

Finally, I want to bring up a sore subject: beancounting. When I was
taking CS 306 (Operating System Principles) at UIUC in 1972 one issue that
came up was the management of overcommitment. Our term project, a system
resource manager, had to deal with a load that included just the behavior
that has been discussed here: programs that reserve resources that they
never use. In our resource monitors, we were expected to keep track of
allocations, deallocations, and actual usage such that we could CONTROL the
overcommitment of resources, and therefore avoid deadlock. From my
reading of the threads and a glance at the source, the problem is that
processes can ask for and receive resources virtually without limit...as
long as nobody actually uses those resources. It's only when the processes
try to use the resources that the system has promised the processes it can
use that the problem rears its ugly head.

Not only should there be beancounting, but there needs to be policy input
to the kernel on when to fail malloc() et al. If I want to avoid all
overcommitment, I should be able to set a value in a file in the /proc
filesystem to zero to say "0 percent overcommitment" -- which means fail
malloc() calls when you reach a calculated high-water mark. Higher values
stored in this file mean higher levels of overcommitment are allowed in
memory allocation calls. The default at boot would be zero; the
distributions could then decide how to set the overcommitment value in the
start-up scripts. The userland policy process could even tweak this
overcommitment value on the fly if so desired, to tune the system to
current demand and to the admin's inputs.
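
[A toy model of the proposed percentage knob, in Python for brevity.
`BeanCounter` and its fields are hypothetical and describe this proposal
only, not any existing /proc file. The high-water mark is assumed to be the
backing store (RAM plus swap) scaled by the overcommit percentage; a
reservation that would cross it fails, which is where malloc() would return
an error.]

```python
class BeanCounter:
    def __init__(self, backing_pages, overcommit_percent=0):
        self.backing_pages = backing_pages            # RAM + swap, in pages
        self.overcommit_percent = overcommit_percent  # 0 means no overcommit
        self.committed = 0                            # pages promised so far

    def limit(self):
        """High-water mark: backing store plus the allowed overcommit."""
        return self.backing_pages * (100 + self.overcommit_percent) // 100

    def reserve(self, pages):
        """Count an allocation; False means the caller should see ENOMEM."""
        if self.committed + pages > self.limit():
            return False
        self.committed += pages
        return True

    def release(self, pages):
        """Count a deallocation."""
        self.committed = max(0, self.committed - pages)
```

Because the percentage is read on every reservation, a policy process could
change it on the fly, as suggested above, without touching existing
allocations.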

This helps separate the prevention measure (failing malloc() calls) from the
recovery measure (killing processes).

I see no way that the beancounting can be relegated to a userland process
-- it needs to be in the kernel. To avoid excess bloat in the kernel, the
kernel should only count the beans and trigger the userland process when
thresholds are exceeded by the system. In this manner, no OOM killer code
need be in the kernel at all. No OOM killer registered? We then revert to
Version 7 action: panic.

Which leads to my final point: I believe that the SIGDANGER signal should
be defined in Linux. The signal would not be raised by the kernel in any
way -- that's left to the userland OOM daemon. The response to SIGDANGER
would be described. The default action would be to ignore SIGDANGER. One
comment is that a denial of service could be launched by a process defining
a SIGDANGER handler that would call malloc() -- I've already mentioned the
requirement that the userland daemon have a way of causing all calls to
malloc() to fail. The Linux definition would differ from the AIX
definition, but the net result would be the same and I believe that the
Linux definition can be written such that existing AIX-based handlers will
work with minimum modification.

I submit this as a strawman suggestion -- I'm not married to any of the
ideas. Feel free to suggest alternatives that solve the problem.

Stephen Satchell



2001-03-29 17:59:24

by David Lang

[permalink] [raw]
Subject: Re: OOM killer???

One of the key places where memory is 'allocated' but not used is in
the copy-on-write conditions (fork, clone, etc). Most of the time very
little of the 'duplicate' memory is ever changed (in fact, most of the time
the program that forks then executes some other program). On a lot of
production boxes, no-overcommit would mean a _very_ significant additional
overhead in memory (think of a busy Apache server: it forks a bunch of
processes, but currently most of that memory is COW and never actually
needs to be duplicated).

David Lang


On Thu, 29 Mar 2001, David Konerding wrote:

> I would tend to agree, I'm not a fan of the OOM killer's behavior. The OOM is forced
> because of the policy of overcommitting of memory. The reason for that policy is
> based on an observation:
> many programs allocate far more memory than they ever use, and most people don't want
> their program to crash just because it malloc()s memory it isn't going to use.
> Having to activate the pages for memory that never gets used feels wasteful to some
> people. On the other hand, having my two-week-running scientific app killed because
> it did a whole bunch of disk reads that fill up 600MB of my 768MB RAM, then tried to
> allocate a whole bunch more memory (that it *will* use) is obviously a fault of the
> OOM killer policy. At the very least the entire disk cache, or a goodly portion of
> it, should be drained before ever invoking the OOM. I'm more than glad to pay the
> performance hit of no disk buffers for the privilege of having my job finish.
>
> Now, if you're going to implement OOM, when it is absolutely necessary, at the very
> least, move the policy implementation out of the kernel. One of the general
> philosophies of Linux has been to move policy out of the kernel. In this case, you'd
> just have a root owned process with locked pages that can't be pre-empted, which
> implemented the policy. You'll never come up with an OOM policy that will fit
> everybody's needs unless it can be tuned for particular system's usage, and it's
> going to be far easier to come up with that policy if it's not in the kernel.
>
>
>
> Guest section DW wrote:
>
>
> > Two things are wrong.
>
> > 1. Linux has an OOM killer.
> > 2. The OOM killer has a bad behaviour.
> >
> > Presently, with the proper kind of load, one can see a process killed
> > by OOM almost daily. That is totally unacceptable.
> > People are working on refining the algorithm so that blatant idiocies
> > where processes are killed while there is plenty of resources
> > are avoided. Good. Suppose it done. Then one thing is wrong.
> >
> > 1. Linux has an OOM killer.
> >
> > A system with an OOM killer is unreliable. Linux must have a reliable
> > mode of operation, and that must be the default mode.
> >
> > Now you assume that adding SIGDANGER would make people happy.
> > But it would be a rather unimportant addition.
> > It might help in some cases, but it falls in the category
> > of improving the OOM killer a little.
> >
> > People will be happy when Linux is reliable by default.
> >
> > Andries
> >
> > [Never use planes where the company's engineers spend their
> > time designing algorithms for selecting which passenger
> > must be thrown out when the plane is overloaded.]

2001-03-29 19:23:14

by Jesse Pollard

[permalink] [raw]
Subject: Re: OOM killer???

David Lang <[email protected]>:
>one of the key places where the memory is 'allocated' but not used is in
>the copy on write conditions (fork, clone, etc) most of the time very
>little of the 'duplicate' memory is ever changed (in fact most of the time
>the program that forks then executes some other program) on a lot of
>production boxes this would be a _very_ significant additional overhead in
>memory (think a busy apache server, it forks a bunch of processes, but
>currently most of that memory is COW and never actually needs to be
>duplicated)

So? If the requirement is no-overcommit, then assume it WILL be overwritten.
Allocate sufficient swap for the requirement.

Now, it shouldn't be necessary to include the text segment - after all
this should be marked RX.

Actually just X would do, but on Intel systems that also means R. and if W
is set it also means RWX. I hope that Intel gets a better clue about memory
protection sometime soon.
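
[To put rough numbers on the two positions, a small arithmetic sketch in
Python. `fork_commit` and the figures in the usage note are made up for
illustration. Strict accounting reserves every writable page once per child,
while text pages stay shared and are left out; copy-on-write only ever
copies the pages a child really writes to.]

```python
def fork_commit(writable_pages, children, touched_percent):
    """Return (strictly reserved pages, pages actually copied by COW)."""
    strict = writable_pages * children
    copied = writable_pages * children * touched_percent // 100
    return strict, copied
```

For a hypothetical server with 2000 writable pages, 50 children, and 5% of
those pages touched per child, strict accounting reserves 100000 pages while
only 5000 are ever copied. The difference is the swap that no-overcommit
asks the admin to provide.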

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-03-30 02:48:28

by Michael Peddemors

[permalink] [raw]
Subject: Re: OOM killer???

Looking over the last few weeks of postings, there are just WAY too many
conflicting ways that people want the OOM to work.. Although an
incredible amount of good work has gone into this, people are definitely
not happy about the benefits of OOM ... About 10 different approaches
are being made to change the rule based systems pertaining to WHEN the
OOM will fire, but in the end, still not everyone will be happy..

The old way of having a machine crash when you asked too much of it
seemed acceptable.. People then either chose to run the mem hog, and
upgrade their boxes, or they stopped running the memhog.

Maybe its time to hear from on high whether OOM capability outweighs the
problems that seem to be associated with it?

Either that or we should be able to allow the end user/and the distros
to decide what gets killed by OOM.

Some people might like to just say to heck with it, and not let anything
get killed, while others may just want to protect certain things..

Speaking completely where I don't belong, and don't know enough about..
Just a thought, can we come up with a new form of execute bit that would
define whether OOM should affect the process? Then the app can be made
to be killable if not as root etc... As well as being applied simply at
a per application basis, because we KNOW that not all applications will
ever be completely system friendly..
There will always be useful programs that might not fit into a OOM model
well.

Linus has been notably quiet of late about this issue, given the amount
of contention on this issue..

On 29 Mar 2001 07:41:44 -0800, David Konerding wrote:

> Now, if you're going to implement OOM, when it is absolutely necessary, at the very
> least, move the policy implementation out of the kernel. One of the general
> philosophies of Linux has been to move policy out of the kernel. In this case, you'd
> just have a root owned process with locked pages that can't be pre-empted, which
> implemented the policy. You'll never come up with an OOM policy that will fit
> everybody's needs unless it can be tuned for particular system's usage, and it's
> going to be far easier to come up with that policy if it's not in the kernel.

--
"Catch the Magic of Linux..."
--------------------------------------------------------
Michael Peddemors - Senior Consultant
LinuxAdministration - Internet Services
NetworkServices - Programming - Security
WizardInternet Services http://www.wizard.ca
Linux Support Specialist - http://www.linuxmagic.com
--------------------------------------------------------
(604)589-0037 Beautiful British Columbia, Canada

2001-03-30 14:50:35

by J. Scott Kasten

[permalink] [raw]
Subject: Re: OOM killer???


Just to throw my own observations into the war, I have to agree with David
K. here. This needs to be some sort of module and/or interface. Get the
policy into a replaceable user space module.

One of the hot areas for the kernel right now is for embedded systems.
They need an entirely different strategy than for a desktop. I'm working
on such a thing now where we don't even have an enabled swap space and the
OOM is causing us no end of trouble as we start dipping below 1MB "free"
system memory.

On 29 Mar 2001, Michael Peddemors wrote:

> Looking over the last few weeks of postings, there are just WAY too many
> conflicting ways that people want the OOM to work.. Although an
> incredible amount of good work has gone into this, people are definitely
> not happy about the benefits of OOM ... About 10 different approaches
> are being made to change the rule based systems pertaining to WHEN the
> OOM will fire, but in the end, still not everyone will be happy..

<SNIP>

> On 29 Mar 2001 07:41:44 -0800, David Konerding wrote:
>
> > Now, if you're going to implement OOM, when it is absolutely necessary, at the very
> > least, move the policy implementation out of the kernel. One of the general
> > philosophies of Linux has been to move policy out of the kernel. In this case, you'd
> > just have a root owned process with locked pages that can't be pre-empted, which
> > implemented the policy. You'll never come up with an OOM policy that will fit
> > everybody's needs unless it can be tuned for particular system's usage, and it's
> > going to be far easier to come up with that policy if it's not in the kernel.