2000-11-06 08:56:00

by Szabolcs Szakacsits

Subject: Re: Looking for better VM


On Wed, 1 Nov 2000, Rik van Riel wrote:

> I'm definitely looking forward to an "OOM killer showdown"
> where we can compare how the different OOM tactics work.

Since people must live with Linux's overcommitting behaviour, the winner
would be the one that can be tuned at runtime. The best general-purpose
killer could be set as the default. The OOM-Killer-API patch is a
long-awaited, promising first step.

Please also consider giving Linux application developers a chance to
write reliable software: do not kill a well-behaved app with the same
probability as apps that keep asking for more memory in the long run
(i.e. apps that don't touch the memory they requested, have no memory
handling of their own, etc.).
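
To make the point concrete: under overcommit a NULL check on malloc() is
not enough, because the shortage can show up later, when the pages are
first touched. A minimal sketch of what a careful application might do,
assuming a hypothetical helper named alloc_committed() (it is not a
library function): allocate, check, and touch the memory up front so a
shortage surfaces at a known point.

```c
#include <stdlib.h>

/* Hypothetical helper (not a library function): allocate `size` bytes and
 * write to every one of them, so the pages are really backed before the
 * caller starts to depend on them. Returns NULL if malloc() refuses. */
void *alloc_committed(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return NULL;            /* a failure the program can still handle */

    /* Volatile stores cannot be optimized away or merged into calloc(). */
    volatile unsigned char *q = p;
    for (size_t i = 0; i < size; i++)
        q[i] = 0;               /* with overcommit, a shortage surfaces here */
    return p;
}
```

A daemon could call alloc_committed() for its working buffers at startup
and refuse to start if it returns NULL. This does not protect it from
being picked by an OOM killer later; it only moves its own failure to a
predictable place.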

> Not because I think it matters all that much on most systems
> (good admins put in enough memory&swap),

Good admins cannot put in enough memory and swap, because Linux
overcommits memory. On a non-overcommitting system (Solaris, Tru64,
Irix, NetBSD, etc.; usually they also support non-overcommit) they can
add extra memory and swap, at an average additional cost of 5-10%, to
handle the same workload as Linux, and they can log in and kill the
offending processes if the system runs out of user memory [falling back
to process killing if the memory reserved for root is also exhausted,
but this basically never happens compared to the frequent crash
complaints about Linux]. A Linux admin can set up memory quotas, which
is on average more expensive [see: (almost?) all Linux distributions
come with a default config that any user can easily crash] than buying
cheap extra disk/RAM and using non-overcommit VM handling (at least the
default setup can't be crashed by a user). Or the Linux admin can pray,
or hope for some black magic, which seems to be very often the case and
ends in disappointment and anger.

> but simply because
> it appears there has been amazingly little research on this
> subject and it's completely unknown which approach will work

There has been a lot of research; this is the reason most Unices support
both non-overcommit and overcommit memory handling and default to
non-overcommit [think of reliability and high availability].

Szaka


2000-11-06 18:57:01

by Rik van Riel

Subject: Re: Looking for better VM

On Mon, 6 Nov 2000, Szabolcs Szakacsits wrote:
> On Wed, 1 Nov 2000, Rik van Riel wrote:
>
> > but simply because
> > it appears there has been amazingly little research on this
> > subject and it's completely unknown which approach will work
>
> There has been a lot of research; this is the reason most Unices support
> both non-overcommit and overcommit memory handling and default to
> non-overcommit [think of reliability and high availability].

It's a shame you didn't take the trouble to actually
go out and see that non-overcommit doesn't solve the
"out of memory" deadlock problem.

[if you want an explanation, look in the archives,
we've explained this a dozen times now]

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

http://www.conectiva.com/ http://www.surriel.com/

2000-11-08 11:24:12

by Szabolcs Szakacsits

Subject: Re: Looking for better VM


On Mon, 6 Nov 2000, Rik van Riel wrote:
> On Mon, 6 Nov 2000, Szabolcs Szakacsits wrote:
> > On Wed, 1 Nov 2000, Rik van Riel wrote:
> > > but simply because
> > > it appears there has been amazingly little research on this
> > > subject and it's completely unknown which approach will work
> > There has been a lot of research; this is the reason most Unices support
> > both non-overcommit and overcommit memory handling and default to
> > non-overcommit [think of reliability and high availability].
> It's a shame you didn't take the trouble to actually
> go out and see that non-overcommit doesn't solve the
> "out of memory" deadlock problem.

Read my *entire* email again and please try to understand. No deadlock
at all, since the kernel *falls back* to process killing if the memory
reserved for *root* is also exhausted.

You could ask: what's the point of non-overcommit if we use process
killing in the end? The answer is that in *practice* this almost never
happens; root can always clean up and no processes are lost [just as
when a disk is "full" except for the area reserved for root]. See?
Humans get a chance against hard-wired AI.

I also didn't say non-overcommit should be the default. A patch,
http://www.cs.helsinki.fi/linux/linux-kernel/2000-13/1208.html,
developed for 2.3.99-pre3 by Eduardo Horvath and unfortunately
ignored completely, implemented it this way.

And with a runtime-tunable OOM killer, Linux really would beat the
competitors [where it is quite behind at present] in this area. See?
Humans get a chance against hard-wired AI again.

Believe me, there are people [don't read only kernel lists] who want
a reliable and controllable system where the kernel doesn't play
Russian roulette.

[For those who missed my first email: forget about mem quotas and the
non-scalable "add GBs of swap" in this discussion.]

> [if you want an explanation, look in the archives,
> we've explained this a dozen times now]

I've been reading the list much longer than you and am really pissed off
that after so many years of discussions, this problem and user
requirements^Wwishes are still not understood. You think in black and
white, but the world is colorful.

Szaka

2000-11-08 13:53:52

by Rik van Riel

Subject: Re: Looking for better VM

On Wed, 8 Nov 2000, Szabolcs Szakacsits wrote:
> On Mon, 6 Nov 2000, Rik van Riel wrote:
> > On Mon, 6 Nov 2000, Szabolcs Szakacsits wrote:
> > > On Wed, 1 Nov 2000, Rik van Riel wrote:
> > > > but simply because
> > > > it appears there has been amazingly little research on this
> > > > subject and it's completely unknown which approach will work
> > > There has been a lot of research; this is the reason most Unices support
> > > both non-overcommit and overcommit memory handling and default to
> > > non-overcommit [think of reliability and high availability].
> > It's a shame you didn't take the trouble to actually
> > go out and see that non-overcommit doesn't solve the
> > "out of memory" deadlock problem.
>
> Read my *entire* email again and please try to understand. No deadlock
> at all, since the kernel *falls back* to process killing if the memory
> reserved for *root* is also exhausted.
>
> You could ask: what's the point of non-overcommit if we use process
> killing in the end? The answer is that in *practice* this almost never
> happens; root can always clean up and no processes are lost [just as
> when a disk is "full" except for the area reserved for root]. See?
> Humans get a chance against hard-wired AI.
>
> I also didn't say non-overcommit should be the default. A patch,
> http://www.cs.helsinki.fi/linux/linux-kernel/2000-13/1208.html,
> developed for 2.3.99-pre3 by Eduardo Horvath and unfortunately
> ignored completely, implemented it this way.

OK. This is a lot more reasonable. I'm actually looking
into putting non-overcommit as a configurable option in
the kernel.

However, this does not save you from the fact that the
system is essentially deadlocked when nothing can get
more memory and nothing goes away. Non-overcommit won't
give you any extra reliability unless your applications
are very well behaved ... in which case you don't need
non-overcommit.

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

http://www.conectiva.com/ http://www.surriel.com/

2000-11-08 16:37:30

by Mikulas Patocka

Subject: Re: Looking for better VM

Hi.

> > I also didn't say non-overcommit should be the default. A patch,
> > http://www.cs.helsinki.fi/linux/linux-kernel/2000-13/1208.html,
> > developed for 2.3.99-pre3 by Eduardo Horvath and unfortunately
> > ignored completely, implemented it this way.
>
> OK. This is a lot more reasonable. I'm actually looking
> into putting non-overcommit as a configurable option in
> the kernel.
>
> However, this does not save you from the fact that the
> system is essentially deadlocked when nothing can get
> more memory and nothing goes away. Non-overcommit won't
> give you any extra reliability unless your applications
> are very well behaved ... in which case you don't need
> non-overcommit.

BTW, why does your OOM killer in 2.4 try to kill the process that mmapped
the most memory? mmap is harmless. mmap on files can't eat memory and swap.

Imagine a case: you have a database server that mmaps a whole 2G file but
doesn't have much anonymous memory. You have an offending process that
does while (1) malloc(1000) and fills up 512M of swap. Your OOM killer would
kill the server first...
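
The difference can be sketched in a few lines of userspace C. This is only
an illustration, not the selection code in 2.4: the struct, the function
names and the page counts are all hypothetical. It compares scoring by
total mapped size with scoring by anonymous memory only (shared memory
segments would have to be counted with the anonymous pages, since they do
consume RAM and swap).

```c
#include <stddef.h>

/* Hypothetical per-process accounting, in pages. */
struct proc_mem {
    const char *name;
    unsigned long anon_pages;  /* heap, stack: needs RAM or swap */
    unsigned long file_pages;  /* file-backed mmap: can be written back */
};

/* Score by everything the process has mapped. */
unsigned long score_total(const struct proc_mem *p)
{
    return p->anon_pages + p->file_pages;
}

/* Score by anonymous memory only. */
unsigned long score_anon(const struct proc_mem *p)
{
    return p->anon_pages;
}

/* Return the index of the highest-scoring process, or -1 if n == 0. */
int pick_victim(const struct proc_mem *procs, size_t n,
                unsigned long (*score)(const struct proc_mem *))
{
    int victim = -1;
    unsigned long best = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long s = score(&procs[i]);
        if (victim < 0 || s > best) {
            best = s;
            victim = (int)i;
        }
    }
    return victim;
}
```

With a server that has a 2G file mapped (524288 pages of 4K) plus a little
anonymous memory, and a leaker holding 512M of anonymous memory (131072
pages), score_total picks the server and score_anon picks the leaker.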

Mikulas

2000-11-08 17:05:23

by Christoph Rohland

Subject: Re: Looking for better VM

Hi Mikulas,

On Wed, 8 Nov 2000, Mikulas Patocka wrote:
> BTW, why does your OOM killer in 2.4 try to kill the process that mmapped
> the most memory? mmap is harmless. mmap on files can't eat memory and
> swap.

Be careful: They may have shm segments mmapped!

Greetings
Christoph

2000-11-08 19:53:17

by Ingo Oeser

Subject: Re: Looking for better VM

On Wed, Nov 08, 2000 at 05:36:40PM +0100, Mikulas Patocka wrote:
> BTW, why does your OOM killer in 2.4 try to kill the process that mmapped
> the most memory? mmap is harmless. mmap on files can't eat memory and swap.

Don't complain, build your own and test it ;-)

Apply my patch

http://www.tu-chemnitz.de/~ioe/oom_kill_api.patch

and install your own OOM handler using install_oom_killer()
from <linux/swap.h>. It has all the needed documentation inline,
which will be built along with the kernel-api-book.
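
The general shape of such a hook can be sketched in userspace C. The real
signature of install_oom_killer() is not shown in this thread, so every
name below (oom_handler_fn, set_oom_handler, run_oom_handler and the
counters) is hypothetical and stands in for whatever the patch defines.

```c
#include <stddef.h>

/* Hypothetical handler type: called when the system is out of memory. */
typedef void (*oom_handler_fn)(void);

int default_calls;  /* how often the built-in policy ran */
int custom_calls;   /* how often the example custom policy ran */

static oom_handler_fn current_handler;  /* NULL means "use the default" */

static void default_oom_handler(void)
{
    default_calls++;
}

/* Example replacement policy; a real one would select and kill a task. */
void example_custom_handler(void)
{
    custom_calls++;
}

/* Install a new handler and return the previous one so it can be restored. */
oom_handler_fn set_oom_handler(oom_handler_fn fn)
{
    oom_handler_fn old = current_handler;
    current_handler = fn;
    return old;
}

/* What the allocation path would call once memory is exhausted. */
void run_oom_handler(void)
{
    if (current_handler)
        current_handler();
    else
        default_oom_handler();
}
```

Returning the old handler lets an experimental policy be swapped in and
out at runtime without a reboot, which is what makes a side-by-side
comparison of tactics practical.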

Have fun researching in this area.

PS: Applies cleanly since oom_kill.c exists and also against
2.4.0-test11-pre1.

Regards

Ingo Oeser
--
To the systems programmer, users and applications
serve only to provide a test load.
<esc>:x

2000-11-09 00:08:59

by Rik van Riel

Subject: Re: Looking for better VM

On Wed, 8 Nov 2000, Mikulas Patocka wrote:

> BTW, why does your OOM killer in 2.4 try to kill the process that mmapped
> the most memory? mmap is harmless. mmap on files can't eat memory and
> swap.

Because the thing is too stupid to take that into
consideration? :)

Btw, if your mmap()ed file still takes 1GB of memory,
you have 1GB of freeable memory left and you shouldn't
be out of memory ... or should you??

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

http://www.conectiva.com/ http://www.surriel.com/

2000-11-09 16:20:59

by Szabolcs Szakacsits

Subject: [PATCH] Reserve VM for root (was: Re: Looking for better VM)


On Wed, 8 Nov 2000, Rik van Riel wrote:

> OK. This is a lot more reasonable.

Just the same as what was in my first email.

> I'm actually looking into putting non-overcommit as a configurable
> option in the kernel.

Nice to hear. Please make it a boot-time option, not a compile-time
one. A control for what percentage the kernel can overcommit would
also be nice; this is how modern Unices do it.
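
A percentage control could look roughly like the sketch below. The formula
is an assumption made for illustration (nothing in this thread specifies
one): the commit limit is RAM plus swap, raised by the given percentage.
Both function names are hypothetical.

```c
/* Hypothetical: largest total commitment allowed, in pages.
 * percent = 0 means strict non-overcommit; 50 allows 1.5x (RAM + swap).
 * Dividing before multiplying avoids overflow and rounds the extra
 * allowance down. */
unsigned long commit_limit(unsigned long ram_pages, unsigned long swap_pages,
                           unsigned int percent)
{
    unsigned long backing = ram_pages + swap_pages;
    return backing + backing / 100 * percent;
}

/* Returns 1 if `request` more pages fit under `limit`, 0 otherwise. */
int may_commit(unsigned long committed, unsigned long request,
               unsigned long limit)
{
    return request <= limit && committed <= limit - request;
}
```

With 800 pages of RAM and 200 of swap, a setting of 0 gives a limit of
1000 pages and a setting of 50 gives 1500.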

> However, this does not save you from the fact that the
> system is essentially deadlocked when nothing can get
> more memory and nothing goes away.

I've also never said the OOM killer should be disabled. In theory the
non-overcommitting systems deadlock and Linux survives. Ironically,
in practice it's usually just the opposite. Any user can
deadlock/crash Linux [default install, no quotas] but not a
non-overcommitting system [root can clean up]. Here is example code
"simulating" a leaking daemon that will "deadlock" Linux even with
your OOM killer patch [which is anyway *MUCH* better than the effectively
non-existent one in 2.2.x kernels]:

main() { while(1) if (fork()) malloc(1); }

With the patch below I could ssh to the host and killall the offending
processes. To enable reserving VM space for root do

echo -1 > /proc/sys/vm/overcommit_memory

The number of reserved pages can be tuned via /proc/sys/vm/reserved,
default is 5% of the RAM (note, RAM won't be reserved, but VM).
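
The decision made by the patch in this message can be restated as a small
userspace sketch. The function names and the OVERCOMMIT_* constants are
hypothetical (the patch itself compares against the literal values 1 and
-1), and free_pages stands in for the kernel's own estimate of free RAM,
cache and swap.

```c
/* Hypothetical names for the values of /proc/sys/vm/overcommit_memory. */
enum {
    OVERCOMMIT_HEURISTIC = 0,   /* allow if the request fits in free VM */
    OVERCOMMIT_ALWAYS    = 1,   /* never refuse */
    OVERCOMMIT_RESERVE   = -1   /* as 0, but keep a reserve for root */
};

/* Default reserve: 5% of RAM, expressed in pages. */
unsigned long default_reserved_pages(unsigned long total_ram_pages)
{
    return total_ram_pages / 20;
}

/* Returns 1 if a request for `pages` may proceed, 0 if it must fail. */
int vm_enough_memory_sketch(int mode, unsigned int uid, long free_pages,
                            long reserved_pages, long pages)
{
    if (mode == OVERCOMMIT_ALWAYS)
        return 1;
    /* Non-root users are refused once free VM dips into the reserve. */
    if (mode == OVERCOMMIT_RESERVE && uid != 0 && free_pages < reserved_pages)
        return 0;
    return free_pages > pages;
}
```

On a 128 MB machine with 4 KB pages (32768 pages) the default reserve is
1638 pages, about 6.4 MB of VM. Once free VM drops below that, requests
from ordinary users fail while root can still start a shell and killall.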

BTW, I wanted to take a look at the frequently mentioned beancounter patch,
here is the current state,
http://www.asp-linux.com/en/products/ubpatch.shtml
"Sorry, due to growing expenses for support of public version of ASPcomplete
we do not provide sources till first official release."

Szaka


diff -ur linux.orig/include/linux/sysctl.h linux/include/linux/sysctl.h
--- linux.orig/include/linux/sysctl.h Thu Nov 9 08:20:19 2000
+++ linux/include/linux/sysctl.h Thu Nov 9 06:30:11 2000
@@ -122,7 +122,8 @@
 	VM_PAGECACHE=7,		/* struct: Set cache memory thresholds */
 	VM_PAGERDAEMON=8,	/* struct: Control kswapd behaviour */
 	VM_PGT_CACHE=9,		/* struct: Set page table cache parameters */
-	VM_PAGE_CLUSTER=10	/* int: set number of pages to swap together */
+	VM_PAGE_CLUSTER=10,	/* int: set number of pages to swap together */
+	VM_RESERVED=11		/* int: number of pages reserved for root */
 };


diff -ur linux.orig/kernel/sysctl.c linux/kernel/sysctl.c
--- linux.orig/kernel/sysctl.c Thu Nov 9 08:20:19 2000
+++ linux/kernel/sysctl.c Thu Nov 9 06:27:33 2000
@@ -37,6 +37,7 @@
 extern int bdf_prm[], bdflush_min[], bdflush_max[];
 extern char binfmt_java_interpreter[], binfmt_java_appletviewer[];
 extern int sysctl_overcommit_memory;
+extern int vm_reserved;
 extern int nr_queued_signals, max_queued_signals;

 #ifdef CONFIG_KMOD
@@ -259,6 +260,8 @@
 	 &pgt_cache_water, 2*sizeof(int), 0600, NULL, &proc_dointvec},
 	{VM_PAGE_CLUSTER, "page-cluster",
 	 &page_cluster, sizeof(int), 0600, NULL, &proc_dointvec},
+	{VM_RESERVED, "reserved",
+	 &vm_reserved, sizeof(int), 0600, NULL, &proc_dointvec},
 	{0}
 };

diff -ur linux.orig/mm/mmap.c linux/mm/mmap.c
--- linux.orig/mm/mmap.c Thu Nov 9 08:20:19 2000
+++ linux/mm/mmap.c Thu Nov 9 08:17:10 2000
@@ -40,6 +40,7 @@
 kmem_cache_t *vm_area_cachep;

 int sysctl_overcommit_memory;
+int vm_reserved;

 /* Check that a process has enough memory to allocate a
  * new virtual mapping.
@@ -59,7 +60,7 @@
 	long free;

 	/* Sometimes we want to use more memory than we have. */
-	if (sysctl_overcommit_memory)
+	if (sysctl_overcommit_memory == 1)
 		return 1;

 	free = buffermem >> PAGE_SHIFT;
@@ -67,6 +68,8 @@
 	free += nr_free_pages;
 	free += nr_swap_pages;
 	free -= (page_cache.min_percent + buffer_mem.min_percent + 2)*num_physpages/100;
+	if (sysctl_overcommit_memory == -1 && current->uid && free < vm_reserved)
+		return 0;
 	return free > pages;
 }

@@ -872,6 +875,11 @@

 void __init vma_init(void)
 {
+	struct sysinfo i;
+
+	si_meminfo(&i);
+	vm_reserved = (i.totalram >> PAGE_SHIFT) / 20;
+
 	vm_area_cachep = kmem_cache_create("vm_area_struct",
 					   sizeof(struct vm_area_struct),
 					   0, SLAB_HWCACHE_ALIGN,

2000-11-10 10:38:58

by Andrey Savochkin

Subject: Re: Reserve VM for root (was: Re: Looking for better VM)

Hello,

On Thu, Nov 09, 2000 at 06:30:32PM +0100, Szabolcs Szakacsits wrote:
> BTW, I wanted to take a look at the frequently mentioned beancounter patch,
> here is the current state,
> http://www.asp-linux.com/en/products/ubpatch.shtml
> "Sorry, due to growing expenses for support of public version of ASPcomplete
> we do not provide sources till first official release."

That's not a place where I keep my code (and has never been :-)

ftp://ftp.sw.com.sg/pub/Linux/people/saw/kernel/user_beancounter/UserBeancounter.html
is the right place (but it has some availability problems :-(

As for memory management, it provides a simple variant of service level
support for
- in-core memory (as opposed to swap)
- total "virtual" memory.
The latter results in accounting of how much memory is consumed by each
subject of accounting, and in an OOM killer.
The OOM killer takes into account the guarantees given to each subject when
it selects the victim. In the patch on the ftp site the selection code is
very simple and taken from some old OOM patches.
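
To illustrate what "taking guarantees into account" can mean, here is a
minimal userspace sketch. It is not the selection code from the patch,
which is not shown in this thread; the struct, the function names and the
policy (pick the subject furthest over its guarantee) are assumptions.

```c
#include <stddef.h>

/* Hypothetical accounting subject, for example a user or a service. */
struct subject {
    const char *name;
    unsigned long used_pages;        /* memory currently charged to it */
    unsigned long guaranteed_pages;  /* amount the subject was promised */
};

/* Pages used beyond the guarantee; 0 if the subject is within it. */
unsigned long excess_pages(const struct subject *s)
{
    return s->used_pages > s->guaranteed_pages
         ? s->used_pages - s->guaranteed_pages : 0;
}

/* Pick the subject furthest over its guarantee; -1 if none is over. */
int pick_subject(const struct subject *subj, size_t n)
{
    int victim = -1;
    unsigned long worst = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long e = excess_pages(&subj[i]);
        if (e > worst) {
            worst = e;
            victim = (int)i;
        }
    }
    return victim;
}
```

Under this policy a subject that stays within its guarantee is never
selected, however large its absolute usage is.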

BTW, I've redone the memory accounting code to significantly improve its
performance (or, in other words, to reduce the performance penalty
imposed by the accounting). But this new code isn't integrated into the
complete user beancounter patch.

Best regards
Andrey

2000-11-13 23:09:40

by Szabolcs Szakacsits

Subject: Re: user beancounter (was: Reserve VM for root)


On Fri, 10 Nov 2000, Andrey Savochkin wrote:

> On Thu, Nov 09, 2000 at 06:30:32PM +0100, Szabolcs Szakacsits wrote:
> > BTW, I wanted to take a look at the frequently mentioned beancounter patch,
> > here is the current state,
> > http://www.asp-linux.com/en/products/ubpatch.shtml
> > "Sorry, due to growing expenses for support of public version of ASPcomplete
> > we do not provide sources till first official release."
>
> That's not a place where I keep my code (and has never been :-)

Sorry, I was misguided by your earlier message at
http://boudicca.tux.org/hypermail/linux-kernel/2000week30/0114.html
where you wrote
"Patch web page is http://www.asplinux.com.sg/install/ubpatch.html"

They are the same sites [mirrors in .us, .sg, .kr and .ru].

> ftp://ftp.sw.com.sg/pub/Linux/people/saw/kernel/user_beancounter/UserBeancounter.html
> is the right place (but it has some availability problems :-(

I've also tried two other ftp sites, none of them were available, just
as at present ...

> As for memory management, it provides a simple variant of service level
> support for
[...]

Thanks for the info. User beancounter is definitely needed, but it's
a 2.5 issue and people have problems now. Ironically, it seems disks
will soon be as fast as RAM, many think the maximum supported swap space
is still 128 MB and set up their systems accordingly, app
requirements (multimedia, etc.) grow rapidly, and users run out of
memory much more easily than before. For many, quotas aren't a solution
because of performance or other reasons, and Linux doesn't give them any
chance to survive such a situation.

Szaka