2004-09-30 13:46:45

by Andries E. Brouwer

Subject: [PATCH] overcommit symbolic constants

Played a bit with overcommit the past hour.
Am not entirely satisfied with the no overcommit mode 2 -
programs segfault when the system is close to that boundary.
So, instead of the somewhat larger patch that I planned to send,
just symbolic names for the modes.

Andries

diff -uprN -X /linux/dontdiff a/arch/arm/mm/init.c b/arch/arm/mm/init.c
--- a/arch/arm/mm/init.c 2004-08-26 22:05:07.000000000 +0200
+++ b/arch/arm/mm/init.c 2004-09-30 15:34:51.000000000 +0200
@@ -590,7 +590,7 @@ void __init mem_init(void)
* anywhere without overcommit, so turn
* it on by default.
*/
- sysctl_overcommit_memory = 1;
+ sysctl_overcommit_memory = OVERCOMMIT_ALWAYS;
}
}

diff -uprN -X /linux/dontdiff a/arch/arm26/mm/init.c b/arch/arm26/mm/init.c
--- a/arch/arm26/mm/init.c 2004-08-26 22:05:07.000000000 +0200
+++ b/arch/arm26/mm/init.c 2004-09-30 15:34:51.000000000 +0200
@@ -376,7 +376,7 @@ void __init mem_init(void)
* Turn on overcommit on tiny machines
*/
if (PAGE_SIZE >= 16384 && num_physpages <= 128) {
- sysctl_overcommit_memory = 1;
+ sysctl_overcommit_memory = OVERCOMMIT_ALWAYS;
printk("Turning on overcommit\n");
}
}
diff -uprN -X /linux/dontdiff a/Documentation/vm/overcommit-accounting b/Documentation/vm/overcommit-accounting
--- a/Documentation/vm/overcommit-accounting 2003-12-18 03:58:28.000000000 +0100
+++ b/Documentation/vm/overcommit-accounting 2004-09-30 15:34:51.000000000 +0200
@@ -1,4 +1,4 @@
-The Linux kernel supports three overcommit handling modes
+The Linux kernel supports the following overcommit handling modes

0 - Heuristic overcommit handling. Obvious overcommits of
address space are refused. Used for a typical system. It
@@ -7,10 +7,10 @@ The Linux kernel supports three overcomm
allocate slighly more memory in this mode. This is the
default.

-1 - No overcommit handling. Appropriate for some scientific
+1 - Always overcommit. Appropriate for some scientific
applications.

-2 - (NEW) strict overcommit. The total address space commit
+2 - Don't overcommit. The total address space commit
for the system is not permitted to exceed swap + a
configurable percentage (default is 50) of physical RAM.
Depending on the percentage you use, in most situations
@@ -27,7 +27,7 @@ Gotchas

The C language stack growth does an implicit mremap. If you want absolute
guarantees and run close to the edge you MUST mmap your stack for the
-largest size you think you will need. For typical stack usage is does
+largest size you think you will need. For typical stack usage this does
not matter much but it's a corner case if you really really care

In mode 2 the MAP_NORESERVE flag is ignored.
diff -uprN -X /linux/dontdiff a/include/linux/mman.h b/include/linux/mman.h
--- a/include/linux/mman.h 2003-12-18 03:58:15.000000000 +0100
+++ b/include/linux/mman.h 2004-09-30 15:34:51.000000000 +0200
@@ -10,6 +10,9 @@
#define MREMAP_MAYMOVE 1
#define MREMAP_FIXED 2

+#define OVERCOMMIT_GUESS 0
+#define OVERCOMMIT_ALWAYS 1
+#define OVERCOMMIT_NEVER 2
extern int sysctl_overcommit_memory;
extern int sysctl_overcommit_ratio;
extern atomic_t vm_committed_space;
diff -uprN -X /linux/dontdiff a/mm/mmap.c b/mm/mmap.c
--- a/mm/mmap.c 2004-08-26 22:05:44.000000000 +0200
+++ b/mm/mmap.c 2004-09-30 15:35:27.000000000 +0200
@@ -54,7 +54,7 @@ pgprot_t protection_map[16] = {
__S000, __S001, __S010, __S011, __S100, __S101, __S110, __S111
};

-int sysctl_overcommit_memory = 0; /* default is heuristic overcommit */
+int sysctl_overcommit_memory = OVERCOMMIT_GUESS; /* heuristic overcommit */
int sysctl_overcommit_ratio = 50; /* default is 50% */
int sysctl_max_map_count = DEFAULT_MAX_MAP_COUNT;
atomic_t vm_committed_space = ATOMIC_INIT(0);
@@ -882,7 +882,7 @@ munmap_back:
return -ENOMEM;

if (accountable && (!(flags & MAP_NORESERVE) ||
- sysctl_overcommit_memory > 1)) {
+ sysctl_overcommit_memory == OVERCOMMIT_NEVER)) {
if (vm_flags & VM_SHARED) {
/* Check memory availability in shmem_file_setup? */
vm_flags |= VM_ACCOUNT;
diff -uprN -X /linux/dontdiff a/mm/nommu.c b/mm/nommu.c
--- a/mm/nommu.c 2004-08-26 22:05:44.000000000 +0200
+++ b/mm/nommu.c 2004-09-30 15:34:51.000000000 +0200
@@ -30,7 +30,7 @@ unsigned long max_mapnr;
unsigned long num_physpages;
unsigned long askedalloc, realalloc;
atomic_t vm_committed_space = ATOMIC_INIT(0);
-int sysctl_overcommit_memory; /* default is heuristic overcommit */
+int sysctl_overcommit_memory = OVERCOMMIT_GUESS; /* heuristic overcommit */
int sysctl_overcommit_ratio = 50; /* default is 50% */

int sysctl_max_map_count = DEFAULT_MAX_MAP_COUNT;
diff -uprN -X /linux/dontdiff a/security/commoncap.c b/security/commoncap.c
--- a/security/commoncap.c 2004-06-24 17:11:21.000000000 +0200
+++ b/security/commoncap.c 2004-09-30 15:34:51.000000000 +0200
@@ -314,10 +314,10 @@ int cap_vm_enough_memory(long pages)
/*
* Sometimes we want to use more memory than we have
*/
- if (sysctl_overcommit_memory == 1)
+ if (sysctl_overcommit_memory == OVERCOMMIT_ALWAYS)
return 0;

- if (sysctl_overcommit_memory == 0) {
+ if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
unsigned long n;

free = get_page_cache_size();
diff -uprN -X /linux/dontdiff a/security/dummy.c b/security/dummy.c
--- a/security/dummy.c 2004-08-26 22:05:50.000000000 +0200
+++ b/security/dummy.c 2004-09-30 15:34:51.000000000 +0200
@@ -121,10 +121,10 @@ static int dummy_vm_enough_memory(long p
/*
* Sometimes we want to use more memory than we have
*/
- if (sysctl_overcommit_memory == 1)
+ if (sysctl_overcommit_memory == OVERCOMMIT_ALWAYS)
return 0;

- if (sysctl_overcommit_memory == 0) {
+ if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
free = get_page_cache_size();
free += nr_free_pages();
free += nr_swap_pages;
diff -uprN -X /linux/dontdiff a/security/selinux/hooks.c b/security/selinux/hooks.c
--- a/security/selinux/hooks.c 2004-08-26 22:05:50.000000000 +0200
+++ b/security/selinux/hooks.c 2004-09-30 15:34:51.000000000 +0200
@@ -1548,10 +1548,10 @@ static int selinux_vm_enough_memory(long
/*
* Sometimes we want to use more memory than we have
*/
- if (sysctl_overcommit_memory == 1)
+ if (sysctl_overcommit_memory == OVERCOMMIT_ALWAYS)
return 0;

- if (sysctl_overcommit_memory == 0) {
+ if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
free = get_page_cache_size();
free += nr_free_pages();
free += nr_swap_pages;
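
The mode numbering that the patch names can be sketched from userspace. This is only an illustration, assuming a Linux system that exposes the setting at /proc/sys/vm/overcommit_memory. The three constants mirror the ones added to include/linux/mman.h; overcommit_mode_name() and read_overcommit_mode() are hypothetical helpers, not kernel functions.

```c
#include <stdio.h>

/* Same values as the constants the patch adds to include/linux/mman.h. */
#define OVERCOMMIT_GUESS  0
#define OVERCOMMIT_ALWAYS 1
#define OVERCOMMIT_NEVER  2

/* Hypothetical helper: map a mode number to its symbolic name. */
static const char *overcommit_mode_name(int mode)
{
    switch (mode) {
    case OVERCOMMIT_GUESS:
        return "OVERCOMMIT_GUESS";
    case OVERCOMMIT_ALWAYS:
        return "OVERCOMMIT_ALWAYS";
    case OVERCOMMIT_NEVER:
        return "OVERCOMMIT_NEVER";
    default:
        return "unknown";
    }
}

/*
 * Hypothetical helper: read the mode from a sysctl file.
 * Returns the number found, or -1 if the file cannot be opened or parsed.
 */
static int read_overcommit_mode(const char *path)
{
    FILE *f = fopen(path, "r");
    int mode;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

Passing "/proc/sys/vm/overcommit_memory" to read_overcommit_mode() and the result to overcommit_mode_name() gives the name of the mode currently in effect.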


2004-09-30 13:59:51

by Alan

Subject: Re: [PATCH] overcommit symbolic constants

On Thu, 2004-09-30 at 14:41, [email protected] wrote:
> Played a bit with overcommit the past hour.
> Am not entirely satisfied with the no overcommit mode 2 -
> programs segfault when the system is close to that boundary.

Not really a surprise. Very few programs handle stack growth faults.
Hence the docs comment about mmapping stacks privately for critical
code.

2004-09-30 14:19:33

by Andries E. Brouwer

Subject: Re: [PATCH] overcommit symbolic constants

On Thu, Sep 30, 2004 at 01:53:12PM +0100, Alan Cox wrote:
> On Thu, 2004-09-30 at 14:41, [email protected] wrote:
> > Played a bit with overcommit the past hour.
> > Am not entirely satisfied with the no overcommit mode 2 -
> > programs segfault when the system is close to that boundary.
>
> Not really a surprise. Very few programs handle stack growth faults.
> Hence the docs comment about mmapping stacks privately for critical
> code.

Most utilities do not expect to be oom-killed, but they do not
expect to be killed by segfault because of stack shortage either.
So avoiding the oom-kill and getting segfaults is no improvement
in my eyes.

A few days ago I remarked that 2 is no good when there is no swap.
OK. So, more modest aim - tighten things only in case there is
plenty of swap. I like to return NULL for malloc(), that is
something a good program tests for. I hate to fail a stack grow.
So, must play a bit more, see whether I can find a mode much
stricter than 0 that is still suitable as a general working
environment for everybody.


Andries
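
The difference between the two failure modes can be illustrated with a short sketch: a failed malloc() is an ordinary return value that a program can react to, whereas a failed stack grow leaves nothing to test. The helper alloc_with_fallback() is hypothetical and shows just one possible reaction, retrying with a smaller request.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: try to allocate `want` bytes, halving the request
 * after each failure until it would drop below `min`.
 * On success returns the buffer and stores its size in *got.
 * If nothing of at least `min` bytes can be had, returns NULL with *got = 0.
 */
static void *alloc_with_fallback(size_t want, size_t min, size_t *got)
{
    size_t size;

    if (min == 0)
        min = 1;        /* guarantees the loop below terminates */

    for (size = want; size >= min; size /= 2) {
        void *p = malloc(size);

        if (p != NULL) {
            *got = size;
            return p;
        }
    }
    *got = 0;
    return NULL;
}
```

A caller checks the result for NULL and otherwise works with however many bytes *got reports, then releases the buffer with free().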


2004-09-30 14:26:07

by Hugh Dickins

Subject: Re: [PATCH] overcommit symbolic constants

On Thu, 30 Sep 2004 [email protected] wrote:
> Played a bit with overcommit the past hour.
> Am not entirely satisfied with the no overcommit mode 2 -
> programs segfault when the system is close to that boundary.
> So, instead of the somewhat larger patch that I planned to send,
> just symbolic names for the modes.

Big improvement. And thank you for stamping on that irritating
oxymoronic "strict overcommit". Could we add this patch in too?

Signed-off-by: Hugh Dickins <[email protected]>

--- 2.6.9-rc3/Documentation/sysctl/vm.txt 2004-08-14 06:39:04.000000000 +0100
+++ linux/Documentation/sysctl/vm.txt 2004-09-30 15:23:22.340731368 +0100
@@ -47,7 +47,7 @@ of free memory left when userspace reque
When this flag is 1, the kernel pretends there is always enough
memory until it actually runs out.

-When this flag is 2, the kernel uses a "strict overcommit"
+When this flag is 2, the kernel uses a "never overcommit"
policy that attempts to prevent any overcommit of memory.

This feature can be very useful because there are a lot of
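
For a sense of scale, the mode 2 limit described in Documentation/vm/overcommit-accounting (swap plus a configurable percentage of physical RAM, 50 by default) can be written out as arithmetic. This is a sketch of the documented formula only; commit_limit_pages() is a hypothetical helper, and the kernel's own computation may differ in detail.

```c
/*
 * Hypothetical helper: commit limit in pages under mode 2, following the
 * documented rule "swap + ratio percent of physical RAM".
 */
static unsigned long long commit_limit_pages(unsigned long long ram_pages,
                                             unsigned long long swap_pages,
                                             unsigned int ratio_percent)
{
    return swap_pages + ram_pages * ratio_percent / 100;
}
```

With 1000 pages of RAM, no swap, and the default ratio of 50, the limit comes out at 500 pages, half of physical memory, which is consistent with the earlier remark in this thread that mode 2 is no good when there is no swap.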

2004-09-30 14:30:21

by Alan

Subject: Re: [PATCH] overcommit symbolic constants

> A few days ago I remarked that 2 is no good when there is no swap.
> OK. So, more modest aim - tighten things only in case there is
> plenty of swap. I like to return NULL for malloc(), that is
> something a good program tests for. I hate to fail a stack grow.
> So, must play a bit more, see whether I can find a mode much
> stricter than 0 that is still suitable as a general working
> environment for everybody.

What might work (if you've not already tried it) is to make the initial
stack something like 1 or 4Mbytes. Don't map the pages but install a vma
of that size. That would pre-reserve address space and perhaps avoid
this. I guess if that works then make it a /proc/sys tunable for
guaranteed stack.

2004-09-30 20:27:08

by Andries E. Brouwer

Subject: Re: [PATCH] overcommit symbolic constants

On Thu, Sep 30, 2004 at 02:27:44PM +0100, Alan Cox wrote:

> What might work (if you've not already tried it) is to make the initial
> stack something like 1 or 4Mbytes. Don't map the pages but install a vma
> of that size. That would pre-reserve address space and perhaps avoid
> this. I guess if that works then make it a /proc/sys tunable for
> guaranteed stack.

A good and simple idea.
Yes, works entirely satisfactorily in my first few tests.

Andries