2005-12-30 07:44:12

by Barry K. Nathan

Subject: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

This patch adds strict VM overcommit accounting to the mainline 2.4
kernel, thus allowing overcommit to be truly disabled. This feature
has been in 2.4-ac, Red Hat Enterprise Linux 3 (RHEL 3) vendor kernels,
and 2.6 for a long while.

Most of the code was merged from 2.4.27-pre2-pac1, along with a hunk
from 2.4.22-ac4 for fs/exec.c that was missing from the -pac kernels.
(Without that hunk, the patch was seriously broken.) I also added a
fork-related accounting fix from the RHEL 3 kernel -- it turns out that
the exact same fix is also present in 2.6. Finally, I backported changes
to move_vma() from 2.6, so that do_munmap() does not need a 4th argument;
this avoids disrupting arch and driver code.

This patch is against 2.4.33-pre1, although it applies to 2.4.32 as well,
and that is where I did most of my testing. I'm hoping that it's not too
late to add this to mainline 2.4, although I realize that it might be.


Known problems/quirks of this patch:

+ The documentation (from -pac) mentions an overcommit mode 3, which,
as far as I can tell, does not actually exist. This can be fixed
in a future patch, and I'll probably do that soon.

+ Like -ac/-pac, but unlike RHEL 3 or mainline 2.6, this patch does not
add Committed_AS to /proc/meminfo. IMO it's an insanely useful bit of
information to have, so I'll have another patch very soon to add this.

Signed-off-by: Barry K. Nathan <[email protected]>


diff -ruN linux-2.4.33-pre1/Documentation/vm/overcommit-accounting linux-2.4.33-pre1-memA/Documentation/vm/overcommit-accounting
--- linux-2.4.33-pre1/Documentation/vm/overcommit-accounting Wed Dec 31 16:00:00 1969
+++ linux-2.4.33-pre1-memA/Documentation/vm/overcommit-accounting Thu Dec 29 20:39:30 2005
@@ -0,0 +1,70 @@
+* This describes the overcommit management facility in the latest kernel
+ tree (FIXME: actually it also describes the stuff that isn't yet done)
+
+The Linux kernel supports four overcommit handling modes:
+
+0 - Heuristic overcommit handling. Obvious overcommits of
+ address space are refused. Used for a typical system. It
+ ensures a seriously wild allocation fails while allowing
+ overcommit to reduce swap usage
+
+1 - No overcommit handling. Appropriate for some scientific
+ applications
+
+2 - (NEW) strict overcommit. The total address space commit
+ for the system is not permitted to exceed swap + half ram.
+ In almost all situations this means a process will not be
+ killed while accessing pages but only by malloc failures
+ that are reported back by the kernel mmap/brk code.
+
+3 - (NEW) paranoid overcommit. The total address space commit
+ for the system is not permitted to exceed swap. The machine
+ will never kill a process accessing pages it has mapped
+ except due to a bug (i.e. report it!)
+
+Gotchas
+-------
+
+The C language stack growth does an implicit mremap. If you want absolute
+guarantees and run close to the edge you MUST mmap your stack for the
+largest size you think you will need. For typical stack usage it does
+not matter much, but it's a corner case if you really, really care
+
+In modes 2 and 3 the MAP_NORESERVE flag is ignored.
+
+
+How It Works
+------------
+
+The overcommit is based on the following rules
+
+For a file backed map
+ SHARED or READ-only - 0 cost (the file is the map not swap)
+ PRIVATE WRITABLE - size of mapping per instance
+
+For an anonymous or /dev/zero map
+ SHARED - size of mapping
+ PRIVATE READ-only - 0 cost (but of little use)
+ PRIVATE WRITABLE - size of mapping per instance
+
+Additional accounting
+ Pages made writable copies by mmap
+ shmfs memory drawn from the same pool
+
+Status
+------
+
+o We account mmap memory mappings
+o We account mprotect changes in commit
+o We account mremap changes in size
+o We account brk
+o We account munmap
+o We report the commit status in /proc
+o Account and check on fork
+o Review stack handling/building on exec
+o SHMfs accounting
+o Implement actual limit enforcement
+
+To Do
+-----
+o Account ptrace pages (this is hard)
diff -ruN linux-2.4.33-pre1/arch/alpha/mm/fault.c linux-2.4.33-pre1-memA/arch/alpha/mm/fault.c
--- linux-2.4.33-pre1/arch/alpha/mm/fault.c Thu Nov 28 15:53:08 2002
+++ linux-2.4.33-pre1-memA/arch/alpha/mm/fault.c Thu Dec 29 20:39:30 2005
@@ -122,8 +122,6 @@
goto bad_area;
if (vma->vm_start <= address)
goto good_area;
- if (!(vma->vm_flags & VM_GROWSDOWN))
- goto bad_area;
if (expand_stack(vma, address))
goto bad_area;
/*
diff -ruN linux-2.4.33-pre1/arch/arm/mm/fault-common.c linux-2.4.33-pre1-memA/arch/arm/mm/fault-common.c
--- linux-2.4.33-pre1/arch/arm/mm/fault-common.c Mon Aug 25 04:44:39 2003
+++ linux-2.4.33-pre1-memA/arch/arm/mm/fault-common.c Thu Dec 29 20:39:30 2005
@@ -254,7 +254,7 @@
goto survive;

check_stack:
- if (vma->vm_flags & VM_GROWSDOWN && !expand_stack(vma, addr))
+ if (!expand_stack(vma, addr))
goto good_area;
out:
return fault;
diff -ruN linux-2.4.33-pre1/arch/i386/mm/fault.c linux-2.4.33-pre1-memA/arch/i386/mm/fault.c
--- linux-2.4.33-pre1/arch/i386/mm/fault.c Sat Aug 7 16:26:04 2004
+++ linux-2.4.33-pre1-memA/arch/i386/mm/fault.c Thu Dec 29 20:39:30 2005
@@ -76,9 +76,7 @@
return 1;

check_stack:
- if (!(vma->vm_flags & VM_GROWSDOWN))
- goto bad_area;
- if (expand_stack(vma, start) == 0)
+ if (!expand_stack(vma, start))
goto good_area;

bad_area:
diff -ruN linux-2.4.33-pre1/arch/ia64/mm/fault.c linux-2.4.33-pre1-memA/arch/ia64/mm/fault.c
--- linux-2.4.33-pre1/arch/ia64/mm/fault.c Wed Dec 28 05:06:03 2005
+++ linux-2.4.33-pre1-memA/arch/ia64/mm/fault.c Thu Dec 29 20:39:30 2005
@@ -154,8 +154,6 @@

check_expansion:
if (!(prev_vma && (prev_vma->vm_flags & VM_GROWSUP) && (address == prev_vma->vm_end))) {
- if (!(vma->vm_flags & VM_GROWSDOWN))
- goto bad_area;
if (rgn_index(address) != rgn_index(vma->vm_start)
|| rgn_offset(address) >= RGN_MAP_LIMIT)
goto bad_area;
diff -ruN linux-2.4.33-pre1/arch/mips/mm/fault.c linux-2.4.33-pre1-memA/arch/mips/mm/fault.c
--- linux-2.4.33-pre1/arch/mips/mm/fault.c Mon Aug 25 04:44:40 2003
+++ linux-2.4.33-pre1-memA/arch/mips/mm/fault.c Thu Dec 29 20:39:30 2005
@@ -112,8 +112,6 @@
goto bad_area;
if (vma->vm_start <= address)
goto good_area;
- if (!(vma->vm_flags & VM_GROWSDOWN))
- goto bad_area;
if (expand_stack(vma, address))
goto bad_area;
/*
diff -ruN linux-2.4.33-pre1/arch/mips64/mm/fault.c linux-2.4.33-pre1-memA/arch/mips64/mm/fault.c
--- linux-2.4.33-pre1/arch/mips64/mm/fault.c Wed Feb 18 05:36:30 2004
+++ linux-2.4.33-pre1-memA/arch/mips64/mm/fault.c Thu Dec 29 20:39:30 2005
@@ -135,8 +135,6 @@
goto bad_area;
if (vma->vm_start <= address)
goto good_area;
- if (!(vma->vm_flags & VM_GROWSDOWN))
- goto bad_area;
if (expand_stack(vma, address))
goto bad_area;
/*
diff -ruN linux-2.4.33-pre1/arch/ppc/mm/fault.c linux-2.4.33-pre1-memA/arch/ppc/mm/fault.c
--- linux-2.4.33-pre1/arch/ppc/mm/fault.c Fri Nov 28 10:26:19 2003
+++ linux-2.4.33-pre1-memA/arch/ppc/mm/fault.c Thu Dec 29 20:39:30 2005
@@ -141,42 +141,40 @@
goto bad_area;
if (vma->vm_start <= address)
goto good_area;
- if (!(vma->vm_flags & VM_GROWSDOWN))
- goto bad_area;
- if (!is_write)
- goto bad_area;
-
- /*
- * N.B. The rs6000/xcoff ABI allows programs to access up to
- * a few hundred bytes below the stack pointer.
- * The kernel signal delivery code writes up to about 1.5kB
- * below the stack pointer (r1) before decrementing it.
- * The exec code can write slightly over 640kB to the stack
- * before setting the user r1. Thus we allow the stack to
- * expand to 1MB without further checks.
- */
- if (address + 0x100000 < vma->vm_end) {
- /* get user regs even if this fault is in kernel mode */
- struct pt_regs *uregs = current->thread.regs;
- if (uregs == NULL)
- goto bad_area;
-
- /*
- * A user-mode access to an address a long way below
- * the stack pointer is only valid if the instruction
- * is one which would update the stack pointer to the
- * address accessed if the instruction completed,
- * i.e. either stwu rs,n(r1) or stwux rs,r1,rb
- * (or the byte, halfword, float or double forms).
- *
- * If we don't check this then any write to the area
- * between the last mapped region and the stack will
- * expand the stack rather than segfaulting.
- */
- if (address + 2048 < uregs->gpr[1]
- && (!user_mode(regs) || !store_updates_sp(regs)))
- goto bad_area;
- }
+ if (!is_write)
+ goto bad_area;
+
+ /*
+ * N.B. The rs6000/xcoff ABI allows programs to access up to
+ * a few hundred bytes below the stack pointer.
+ * The kernel signal delivery code writes up to about 1.5kB
+ * below the stack pointer (r1) before decrementing it.
+ * The exec code can write slightly over 640kB to the stack
+ * before setting the user r1. Thus we allow the stack to
+ * expand to 1MB without further checks.
+ */
+ if (address + 0x100000 < vma->vm_end) {
+ /* get user regs even if this fault is in kernel mode */
+ struct pt_regs *uregs = current->thread.regs;
+ if (uregs == NULL)
+ goto bad_area;
+
+ /*
+ * A user-mode access to an address a long way below
+ * the stack pointer is only valid if the instruction
+ * is one which would update the stack pointer to the
+ * address accessed if the instruction completed,
+ * i.e. either stwu rs,n(r1) or stwux rs,r1,rb
+ * (or the byte, halfword, float or double forms).
+ *
+ * If we don't check this then any write to the area
+ * between the last mapped region and the stack will
+ * expand the stack rather than segfaulting.
+ */
+ if (address + 2048 < uregs->gpr[1]
+ && (!user_mode(regs) || !store_updates_sp(regs)))
+ goto bad_area;
+ }
if (expand_stack(vma, address))
goto bad_area;

diff -ruN linux-2.4.33-pre1/arch/sh/mm/fault.c linux-2.4.33-pre1-memA/arch/sh/mm/fault.c
--- linux-2.4.33-pre1/arch/sh/mm/fault.c Mon Aug 25 04:44:40 2003
+++ linux-2.4.33-pre1-memA/arch/sh/mm/fault.c Thu Dec 29 20:39:30 2005
@@ -76,8 +76,6 @@
return 1;

check_stack:
- if (!(vma->vm_flags & VM_GROWSDOWN))
- goto bad_area;
if (expand_stack(vma, start) == 0)
goto good_area;

diff -ruN linux-2.4.33-pre1/arch/sparc/mm/fault.c linux-2.4.33-pre1-memA/arch/sparc/mm/fault.c
--- linux-2.4.33-pre1/arch/sparc/mm/fault.c Sat Aug 7 16:26:04 2004
+++ linux-2.4.33-pre1-memA/arch/sparc/mm/fault.c Thu Dec 29 20:39:30 2005
@@ -268,8 +268,6 @@
goto bad_area;
if(vma->vm_start <= address)
goto good_area;
- if(!(vma->vm_flags & VM_GROWSDOWN))
- goto bad_area;
if(expand_stack(vma, address))
goto bad_area;
/*
@@ -515,8 +513,6 @@
goto bad_area;
if(vma->vm_start <= address)
goto good_area;
- if(!(vma->vm_flags & VM_GROWSDOWN))
- goto bad_area;
if(expand_stack(vma, address))
goto bad_area;
good_area:
diff -ruN linux-2.4.33-pre1/arch/sparc64/mm/fault.c linux-2.4.33-pre1-memA/arch/sparc64/mm/fault.c
--- linux-2.4.33-pre1/arch/sparc64/mm/fault.c Wed Dec 28 05:05:54 2005
+++ linux-2.4.33-pre1-memA/arch/sparc64/mm/fault.c Thu Dec 29 20:39:30 2005
@@ -380,8 +380,6 @@

if (vma->vm_start <= address)
goto good_area;
- if (!(vma->vm_flags & VM_GROWSDOWN))
- goto bad_area;
if (!(fault_code & FAULT_CODE_WRITE)) {
/* Non-faulting loads shouldn't expand stack. */
insn = get_fault_insn(regs, insn);
diff -ruN linux-2.4.33-pre1/fs/exec.c linux-2.4.33-pre1-memA/fs/exec.c
--- linux-2.4.33-pre1/fs/exec.c Wed Dec 28 05:06:01 2005
+++ linux-2.4.33-pre1-memA/fs/exec.c Thu Dec 29 20:39:30 2005
@@ -340,6 +340,12 @@
if (!mpnt)
return -ENOMEM;

+ if (!vm_enough_memory((STACK_TOP - (PAGE_MASK & (unsigned long) bprm->p))>>PAGE_SHIFT))
+ {
+ kmem_cache_free(vm_area_cachep, mpnt);
+ return -ENOMEM;
+ }
+
down_write(&current->mm->mmap_sem);
{
mpnt->vm_mm = current->mm;
diff -ruN linux-2.4.33-pre1/fs/proc/proc_misc.c linux-2.4.33-pre1-memA/fs/proc/proc_misc.c
--- linux-2.4.33-pre1/fs/proc/proc_misc.c Sat Aug 7 16:26:06 2004
+++ linux-2.4.33-pre1-memA/fs/proc/proc_misc.c Thu Dec 29 20:39:30 2005
@@ -158,7 +158,11 @@
struct sysinfo i;
int len;
int pg_size ;
+ int committed;

+ /* FIXME: needs to be in headers */
+ extern atomic_t vm_committed_space;
+
/*
* display in kilobytes.
*/
@@ -167,6 +171,7 @@
si_meminfo(&i);
si_swapinfo(&i);
pg_size = page_cache_size - i.bufferram;
+ committed = atomic_read(&vm_committed_space);

len = sprintf(page, " total: used: free: shared: buffers: cached:\n"
"Mem: %8Lu %8Lu %8Lu %8Lu %8Lu %8Lu\n"
diff -ruN linux-2.4.33-pre1/include/asm-x86_64/page.h linux-2.4.33-pre1-memA/include/asm-x86_64/page.h
--- linux-2.4.33-pre1/include/asm-x86_64/page.h Mon Aug 25 04:44:44 2003
+++ linux-2.4.33-pre1-memA/include/asm-x86_64/page.h Thu Dec 29 20:39:30 2005
@@ -148,7 +148,6 @@
#define VM_DATA_DEFAULT_FLAGS \
((current->thread.flags & THREAD_IA32) ? vm_data_default_flags32 : \
vm_data_default_flags)
-#define VM_STACK_FLAGS vm_stack_flags

#endif /* __KERNEL__ */

diff -ruN linux-2.4.33-pre1/include/linux/mm.h linux-2.4.33-pre1-memA/include/linux/mm.h
--- linux-2.4.33-pre1/include/linux/mm.h Wed Dec 28 05:06:01 2005
+++ linux-2.4.33-pre1-memA/include/linux/mm.h Thu Dec 29 20:39:30 2005
@@ -104,8 +104,12 @@
#define VM_DONTEXPAND 0x00040000 /* Cannot expand with mremap() */
#define VM_RESERVED 0x00080000 /* Don't unmap it from swap_out */

-#ifndef VM_STACK_FLAGS
-#define VM_STACK_FLAGS 0x00000177
+#define VM_ACCOUNT 0x00100000 /* Memory is a vm accounted object */
+
+#ifdef ARCH_STACK_GROWSUP
+#define VM_STACK_FLAGS (VM_DATA_DEFAULT_FLAGS|VM_GROWSUP|VM_ACCOUNT)
+#else
+#define VM_STACK_FLAGS (VM_DATA_DEFAULT_FLAGS|VM_GROWSDOWN|VM_ACCOUNT)
#endif

#define VM_READHINTMASK (VM_SEQ_READ | VM_RAND_READ)
@@ -639,49 +643,9 @@

return gfp_mask;
}
-
-/* vma is the first one with address < vma->vm_end,
- * and even address < vma->vm_start. Have to extend vma. */
-static inline int expand_stack(struct vm_area_struct * vma, unsigned long address)
-{
- unsigned long grow;

- /*
- * vma->vm_start/vm_end cannot change under us because the caller
- * is required to hold the mmap_sem in read mode. We need the
- * page_table_lock lock to serialize against concurrent expand_stacks.
- */
- address &= PAGE_MASK;
- spin_lock(&vma->vm_mm->page_table_lock);
-
- /* already expanded while we were spinning? */
- if (vma->vm_start <= address) {
- spin_unlock(&vma->vm_mm->page_table_lock);
- return 0;
- }
-
- grow = (vma->vm_start - address) >> PAGE_SHIFT;
- if (vma->vm_end - address > current->rlim[RLIMIT_STACK].rlim_cur ||
- ((vma->vm_mm->total_vm + grow) << PAGE_SHIFT) > current->rlim[RLIMIT_AS].rlim_cur) {
- spin_unlock(&vma->vm_mm->page_table_lock);
- return -ENOMEM;
- }
-
- if ((vma->vm_flags & VM_LOCKED) &&
- ((vma->vm_mm->locked_vm + grow) << PAGE_SHIFT) > current->rlim[RLIMIT_MEMLOCK].rlim_cur) {
- spin_unlock(&vma->vm_mm->page_table_lock);
- return -ENOMEM;
- }
-
-
- vma->vm_start = address;
- vma->vm_pgoff -= grow;
- vma->vm_mm->total_vm += grow;
- if (vma->vm_flags & VM_LOCKED)
- vma->vm_mm->locked_vm += grow;
- spin_unlock(&vma->vm_mm->page_table_lock);
- return 0;
-}
+/* Do stack extension */
+extern int expand_stack(struct vm_area_struct * vma, unsigned long address);

/* Look up the first VMA which satisfies addr < vm_end, NULL if none. */
extern struct vm_area_struct * find_vma(struct mm_struct * mm, unsigned long addr);
diff -ruN linux-2.4.33-pre1/include/linux/mman.h linux-2.4.33-pre1-memA/include/linux/mman.h
--- linux-2.4.33-pre1/include/linux/mman.h Tue Mar 14 18:21:56 2000
+++ linux-2.4.33-pre1-memA/include/linux/mman.h Thu Dec 29 20:39:30 2005
@@ -6,4 +6,8 @@
#define MREMAP_MAYMOVE 1
#define MREMAP_FIXED 2

+extern int vm_enough_memory(long pages);
+extern void vm_unacct_memory(long pages);
+extern void vm_validate_enough(char *x);
+
#endif /* _LINUX_MMAN_H */
diff -ruN linux-2.4.33-pre1/include/linux/sysctl.h linux-2.4.33-pre1-memA/include/linux/sysctl.h
--- linux-2.4.33-pre1/include/linux/sysctl.h Thu Dec 29 20:35:57 2005
+++ linux-2.4.33-pre1-memA/include/linux/sysctl.h Thu Dec 29 20:39:30 2005
@@ -159,6 +159,7 @@
VM_LAPTOP_MODE=21, /* kernel in laptop flush mode */
VM_BLOCK_DUMP=22, /* dump fs activity to log */
VM_ANON_LRU=23, /* immediatly insert anon pages in the vm page lru */
+ VM_OVERCOMMIT_RATIO=24, /* percent of RAM to allow overcommit in */
};


diff -ruN linux-2.4.33-pre1/kernel/fork.c linux-2.4.33-pre1-memA/kernel/fork.c
--- linux-2.4.33-pre1/kernel/fork.c Wed Dec 28 05:06:01 2005
+++ linux-2.4.33-pre1-memA/kernel/fork.c Thu Dec 29 20:39:30 2005
@@ -22,6 +22,7 @@
#include <linux/namespace.h>
#include <linux/personality.h>
#include <linux/compiler.h>
+#include <linux/mman.h>

#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -146,6 +147,7 @@
{
struct vm_area_struct * mpnt, *tmp, **pprev;
int retval;
+ unsigned long charge;

flush_cache_mm(current->mm);
mm->locked_vm = 0;
@@ -174,6 +176,13 @@
retval = -ENOMEM;
if(mpnt->vm_flags & VM_DONTCOPY)
continue;
+ charge = 0;
+ if(mpnt->vm_flags & VM_ACCOUNT) {
+ unsigned int len = (mpnt->vm_end - mpnt->vm_start) >> PAGE_SHIFT;
+ if(!vm_enough_memory(len))
+ goto fail_nomem;
+ charge = len;
+ }
tmp = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
if (!tmp)
goto fail_nomem;
@@ -217,10 +226,12 @@
}
retval = 0;
build_mmap_rb(mm);
-
-fail_nomem:
+out:
flush_tlb_mm(current->mm);
return retval;
+fail_nomem:
+ vm_unacct_memory(charge);
+ goto out;
}

spinlock_t mmlist_lock __cacheline_aligned = SPIN_LOCK_UNLOCKED;
diff -ruN linux-2.4.33-pre1/kernel/sysctl.c linux-2.4.33-pre1-memA/kernel/sysctl.c
--- linux-2.4.33-pre1/kernel/sysctl.c Thu Dec 29 20:35:57 2005
+++ linux-2.4.33-pre1-memA/kernel/sysctl.c Thu Dec 29 20:39:30 2005
@@ -45,6 +45,7 @@
extern int C_A_D;
extern int bdf_prm[], bdflush_min[], bdflush_max[];
extern int sysctl_overcommit_memory;
+extern int sysctl_overcommit_ratio;
extern int max_threads;
extern atomic_t nr_queued_signals;
extern int max_queued_signals;
@@ -301,6 +302,8 @@
&bdflush_min, &bdflush_max},
{VM_OVERCOMMIT_MEMORY, "overcommit_memory", &sysctl_overcommit_memory,
sizeof(sysctl_overcommit_memory), 0644, NULL, &proc_dointvec},
+ {VM_OVERCOMMIT_RATIO, "overcommit_ratio", &sysctl_overcommit_ratio,
+ sizeof(sysctl_overcommit_ratio), 0644, NULL, &proc_dointvec},
{VM_PAGERDAEMON, "kswapd",
&pager_daemon, sizeof(pager_daemon_t), 0644, NULL, &proc_dointvec},
{VM_PGT_CACHE, "pagetable_cache",
diff -ruN linux-2.4.33-pre1/mm/mlock.c linux-2.4.33-pre1-memA/mm/mlock.c
--- linux-2.4.33-pre1/mm/mlock.c Mon Sep 17 15:30:23 2001
+++ linux-2.4.33-pre1-memA/mm/mlock.c Thu Dec 29 20:39:30 2005
@@ -198,6 +198,7 @@
unsigned long lock_limit;
int error = -ENOMEM;

+ vm_validate_enough("entering sys_mlock");
down_write(&current->mm->mmap_sem);
len = PAGE_ALIGN(len + (start & ~PAGE_MASK));
start &= PAGE_MASK;
@@ -220,6 +221,7 @@
error = do_mlock(start, len, 1);
out:
up_write(&current->mm->mmap_sem);
+ vm_validate_enough("exiting sys_mlock");
return error;
}

@@ -227,11 +229,13 @@
{
int ret;

+ vm_validate_enough("entering sys_munlock");
down_write(&current->mm->mmap_sem);
len = PAGE_ALIGN(len + (start & ~PAGE_MASK));
start &= PAGE_MASK;
ret = do_mlock(start, len, 0);
up_write(&current->mm->mmap_sem);
+ vm_validate_enough("exiting sys_munlock");
return ret;
}

@@ -268,6 +272,8 @@
unsigned long lock_limit;
int ret = -EINVAL;

+ vm_validate_enough("entering sys_mlockall");
+
down_write(&current->mm->mmap_sem);
if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE)))
goto out;
@@ -287,15 +293,18 @@
ret = do_mlockall(flags);
out:
up_write(&current->mm->mmap_sem);
+ vm_validate_enough("exiting sys_mlockall");
return ret;
}

asmlinkage long sys_munlockall(void)
{
int ret;
+ vm_validate_enough("entering sys_munlockall");

down_write(&current->mm->mmap_sem);
ret = do_mlockall(0);
up_write(&current->mm->mmap_sem);
+ vm_validate_enough("exiting sys_munlockall");
return ret;
}
diff -ruN linux-2.4.33-pre1/mm/mmap.c linux-2.4.33-pre1-memA/mm/mmap.c
--- linux-2.4.33-pre1/mm/mmap.c Wed Dec 28 05:06:01 2005
+++ linux-2.4.33-pre1-memA/mm/mmap.c Thu Dec 29 20:39:30 2005
@@ -1,8 +1,25 @@
/*
* linux/mm/mmap.c
- *
* Written by obz.
+ *
+ * Address space accounting code <[email protected]>
+ * (c) Copyright 2002 Red Hat Inc, All Rights Reserved
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
+
#include <linux/slab.h>
#include <linux/shm.h>
#include <linux/mman.h>
@@ -45,8 +62,10 @@
__S000, __S001, __S010, __S011, __S100, __S101, __S110, __S111
};

-int sysctl_overcommit_memory;
+int sysctl_overcommit_memory = 0; /* default is heuristic overcommit */
+int sysctl_overcommit_ratio = 50; /* default is 50% */
int max_map_count = DEFAULT_MAX_MAP_COUNT;
+atomic_t vm_committed_space = ATOMIC_INIT(0);

/* Check that a process has enough memory to allocate a
* new virtual mapping.
@@ -56,42 +75,107 @@
/* Stupid algorithm to decide if we have enough memory: while
* simple, it hopefully works in most obvious cases.. Easy to
* fool it, but this should catch most mistakes.
- */
- /* 23/11/98 NJC: Somewhat less stupid version of algorithm,
+ *
+ * 23/11/98 NJC: Somewhat less stupid version of algorithm,
* which tries to do "TheRightThing". Instead of using half of
* (buffers+cache), use the minimum values. Allow an extra 2%
* of num_physpages for safety margin.
+ *
+ * 2002/02/26 Alan Cox: Added two new modes that do real accounting
*/
+ unsigned long free, allowed;
+ struct sysinfo i;

- unsigned long free;
+ atomic_add(pages, &vm_committed_space);

/* Sometimes we want to use more memory than we have. */
- if (sysctl_overcommit_memory)
- return 1;
+ if (sysctl_overcommit_memory == 1)
+ return 1;
+
+ if (sysctl_overcommit_memory == 0)
+ {
+ /* The page cache contains buffer pages these days.. */
+ free = page_cache_size;
+ free += nr_free_pages();
+ free += nr_swap_pages;
+
+ /*
+ * This double-counts: the nrpages are both in the page-cache
+ * and in the swapper space. At the same time, this compensates
+ * for the swap-space over-allocation (ie "nr_swap_pages" being
+ * too small.
+ */
+ free += swapper_space.nrpages;
+
+ /*
+ * The code below doesn't account for free space in the inode
+ * and dentry slab cache, slab cache fragmentation, inodes and
+ * dentries which will become freeable under VM load, etc.
+ * Lets just hope all these (complex) factors balance out...
+ */
+ free += (dentry_stat.nr_unused * sizeof(struct dentry)) >> PAGE_SHIFT;
+ free += (inodes_stat.nr_unused * sizeof(struct inode)) >> PAGE_SHIFT;
+
+ if(free > pages)
+ return 1;
+ atomic_sub(pages, &vm_committed_space);
+ return 0;
+ }

- /* The page cache contains buffer pages these days.. */
- free = page_cache_size;
- free += nr_free_pages();
- free += nr_swap_pages;
+ /* FIXME - need to add arch hooks to get the bits we need
+ without the higher overhead crap */
+ si_meminfo(&i);
+ allowed = i.totalram * sysctl_overcommit_ratio / 100;
+ allowed += total_swap_pages;
+
+ if(atomic_read(&vm_committed_space) < allowed)
+ return 1;

- /*
- * This double-counts: the nrpages are both in the page-cache
- * and in the swapper space. At the same time, this compensates
- * for the swap-space over-allocation (ie "nr_swap_pages" being
- * too small.
- */
- free += swapper_space.nrpages;
+ atomic_sub(pages, &vm_committed_space);
+ return 0;
+}

- /*
- * The code below doesn't account for free space in the inode
- * and dentry slab cache, slab cache fragmentation, inodes and
- * dentries which will become freeable under VM load, etc.
- * Lets just hope all these (complex) factors balance out...
- */
- free += (dentry_stat.nr_unused * sizeof(struct dentry)) >> PAGE_SHIFT;
- free += (inodes_stat.nr_unused * sizeof(struct inode)) >> PAGE_SHIFT;
+void vm_unacct_memory(long pages)
+{
+ atomic_sub(pages, &vm_committed_space);
+}
+
+/*
+ * Don't even bother telling me the locking is wrong - its a test
+ * routine and uniprocessor is quite sufficient..
+ *
+ * To enable this debugging you must tweak the #if below, and build
+ * with no SYS5 shared memory (thats not validated yet) and non SMP
+ */
+
+void vm_validate_enough(char *x)
+{
+#if 0
+ unsigned long count = 0UL;
+ struct mm_struct *mm;
+ struct vm_area_struct *vma;
+ struct list_head *mmp;
+ unsigned long flags;

- return free > pages;
+ spin_lock_irqsave(&mmlist_lock, flags);
+
+ list_for_each(mmp, &init_mm.mmlist)
+ {
+ mm = list_entry(mmp, struct mm_struct, mmlist);
+ for(vma = mm->mmap; vma!=NULL; vma=vma->vm_next)
+ {
+ if(vma->vm_flags & VM_ACCOUNT)
+ count += (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+ }
+ }
+ if(count != atomic_read(&vm_committed_space))
+ {
+ printk("MM crappo accounting %s: %lu %ld.\n",
+ x, count, atomic_read(&vm_committed_space));
+ atomic_set(&vm_committed_space, count);
+ }
+ spin_unlock_irqrestore(&mmlist_lock, flags);
+#endif
}

/* Remove one vm structure from the inode's i_mapping address space. */
@@ -168,6 +252,7 @@
}

/* Check against rlimit.. */
+ /* FIXME: - this seems to be checked in do_brk.. */
rlim = current->rlim[RLIMIT_DATA].rlim_cur;
if (rlim < RLIM_INFINITY && brk - mm->start_data > rlim)
goto out;
@@ -176,10 +261,6 @@
if (find_vma_intersection(mm, oldbrk, newbrk+PAGE_SIZE))
goto out;

- /* Check if we have enough memory.. */
- if (!vm_enough_memory((newbrk-oldbrk) >> PAGE_SHIFT))
- goto out;
-
/* Ok, looks good - let it rip. */
if (do_brk(oldbrk, newbrk-oldbrk) != oldbrk)
goto out;
@@ -400,7 +481,9 @@
int correct_wcount = 0;
int error;
rb_node_t ** rb_link, * rb_parent;
+ unsigned long charged = 0;

+ vm_validate_enough("entering do_mmap_pgoff");
if (file) {
if (!file->f_op || !file->f_op->mmap)
return -ENODEV;
@@ -500,11 +583,15 @@
> current->rlim[RLIMIT_AS].rlim_cur)
return -ENOMEM;

- /* Private writable mapping? Check memory availability.. */
- if ((vm_flags & (VM_SHARED | VM_WRITE)) == VM_WRITE &&
- !(flags & MAP_NORESERVE) &&
- !vm_enough_memory(len >> PAGE_SHIFT))
- return -ENOMEM;
+ if (!(flags & MAP_NORESERVE) || sysctl_overcommit_memory > 1) {
+ if ((vm_flags & (VM_SHARED|VM_WRITE)) == VM_WRITE) {
+ /* Private writable mapping: check memory availability */
+ charged = len >> PAGE_SHIFT;
+ if (!vm_enough_memory(charged))
+ return -ENOMEM;
+ vm_flags |= VM_ACCOUNT;
+ }
+ }

/* Can we just expand an old anonymous mapping? */
if (!file && !(vm_flags & VM_SHARED) && rb_parent)
@@ -516,8 +603,9 @@
* not unmapped, but the maps are removed from the list.
*/
vma = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
+ error = -ENOMEM;
if (!vma)
- return -ENOMEM;
+ goto unacct_error;

vma->vm_mm = mm;
vma->vm_start = addr;
@@ -569,8 +657,7 @@
* to update the tree pointers.
*/
addr = vma->vm_start;
- stale_vma = find_vma_prepare(mm, addr, &prev,
- &rb_link, &rb_parent);
+ stale_vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent);
/*
* Make sure the lowlevel driver did its job right.
*/
@@ -591,6 +678,7 @@
mm->locked_vm += len >> PAGE_SHIFT;
make_pages_present(addr, addr + len);
}
+ vm_validate_enough("out from do_mmap_pgoff");
return addr;

unmap_and_free_vma:
@@ -603,6 +691,10 @@
zap_page_range(mm, vma->vm_start, vma->vm_end - vma->vm_start);
free_vma:
kmem_cache_free(vm_area_cachep, vma);
+unacct_error:
+ if(charged)
+ vm_unacct_memory(charged);
+ vm_validate_enough("error path from do_mmap_pgoff");
return error;
}

@@ -745,6 +837,130 @@
return NULL;
}

+/* vma is the first one with address < vma->vm_end,
+ * and even address < vma->vm_start. Have to extend vma. */
+
+#ifdef ARCH_STACK_GROWSUP
+static inline int expand_stack(struct vm_area_struct * vma, unsigned long address)
+{
+ unsigned long grow;
+
+ if (!(vma->vm_flags & VM_GROWSUP))
+ return -EFAULT;
+
+ vm_validate_enough("entering expand_stack");
+
+ /*
+ * vma->vm_start/vm_end cannot change under us because the caller
+ * is required to hold the mmap_sem in read mode. We need the
+ * page_table_lock lock to serialize against concurrent expand_stacks.
+ */
+ spin_lock(&vma->vm_mm->page_table_lock);
+
+ address += 4 + PAGE_SIZE - 1;
+ address &= PAGE_MASK;
+
+ /* already expanded while we were spinning? */
+ if (vma->vm_start <= address) {
+ spin_unlock(&vma->vm_mm->page_table_lock);
+ return 0;
+ }
+
+ grow = (address - vma->vm_end) >> PAGE_SHIFT;
+
+ /* Overcommit... */
+ if (!vm_enough_memory(grow)) {
+ spin_unlock(&vma->vm_mm->page_table_lock);
+ return -ENOMEM;
+ }
+
+ if (address - vma->vm_start > current->rlim[RLIMIT_STACK].rlim_cur ||
+ ((vma->vm_mm->total_vm + grow) << PAGE_SHIFT) > current->rlim[RLIMIT_AS].rlim_cur) {
+ spin_unlock(&vma->vm_mm->page_table_lock);
+ vm_unacct_memory(grow);
+ vm_validate_enough("exiting expand_stack - FAIL");
+ return -ENOMEM;
+ }
+
+ if ((vma->vm_flags & VM_LOCKED) &&
+ ((vma->vm_mm->locked_vm + grow) << PAGE_SHIFT) > current->rlim[RLIMIT_MEMLOCK].rlim_cur) {
+ spin_unlock(&vma->vm_mm->page_table_lock);
+ vm_unacct_memory(grow);
+ vm_validate_enough("exiting expand_stack - FAIL");
+ return -ENOMEM;
+ }
+
+
+ vma->vm_end = address;
+ vma->vm_mm->total_vm += grow;
+ if (vma->vm_flags & VM_LOCKED)
+ vma->vm_mm->locked_vm += grow;
+ spin_unlock(&vma->vm_mm->page_table_lock);
+ vm_validate_enough("exiting expand_stack");
+ return 0;
+}
+#else
+
+int expand_stack(struct vm_area_struct * vma, unsigned long address)
+{
+ unsigned long grow;
+
+ if (!(vma->vm_flags & VM_GROWSDOWN))
+ return -EFAULT;
+
+ vm_validate_enough("entering expand_stack");
+
+ /*
+ * vma->vm_start/vm_end cannot change under us because the caller
+ * is required to hold the mmap_sem in read mode. We need the
+ * page_table_lock lock to serialize against concurrent expand_stacks.
+ */
+ address &= PAGE_MASK;
+ spin_lock(&vma->vm_mm->page_table_lock);
+
+ /* already expanded while we were spinning? */
+ if (vma->vm_start <= address) {
+ spin_unlock(&vma->vm_mm->page_table_lock);
+ return 0;
+ }
+
+ grow = (vma->vm_start - address) >> PAGE_SHIFT;
+
+ /* Overcommit... */
+ if (!vm_enough_memory(grow)) {
+ spin_unlock(&vma->vm_mm->page_table_lock);
+ return -ENOMEM;
+ }
+
+ if (vma->vm_end - address > current->rlim[RLIMIT_STACK].rlim_cur ||
+ ((vma->vm_mm->total_vm + grow) << PAGE_SHIFT) > current->rlim[RLIMIT_AS].rlim_cur) {
+ spin_unlock(&vma->vm_mm->page_table_lock);
+ vm_unacct_memory(grow);
+ vm_validate_enough("exiting expand_stack - FAIL");
+ return -ENOMEM;
+ }
+
+ if ((vma->vm_flags & VM_LOCKED) &&
+ ((vma->vm_mm->locked_vm + grow) << PAGE_SHIFT) > current->rlim[RLIMIT_MEMLOCK].rlim_cur) {
+ spin_unlock(&vma->vm_mm->page_table_lock);
+ vm_unacct_memory(grow);
+ vm_validate_enough("exiting expand_stack - FAIL");
+ return -ENOMEM;
+ }
+
+
+ vma->vm_start = address;
+ vma->vm_pgoff -= grow;
+ vma->vm_mm->total_vm += grow;
+ if (vma->vm_flags & VM_LOCKED)
+ vma->vm_mm->locked_vm += grow;
+ spin_unlock(&vma->vm_mm->page_table_lock);
+ vm_validate_enough("exiting expand_stack");
+ return 0;
+}
+
+#endif
+
struct vm_area_struct * find_extend_vma(struct mm_struct * mm, unsigned long addr)
{
struct vm_area_struct * vma;
@@ -800,6 +1016,8 @@
area->vm_mm->total_vm -= len >> PAGE_SHIFT;
if (area->vm_flags & VM_LOCKED)
area->vm_mm->locked_vm -= len >> PAGE_SHIFT;
+ if (area->vm_flags & VM_ACCOUNT)
+ vm_unacct_memory(len >> PAGE_SHIFT);

/* Unmapping the whole area. */
if (addr == area->vm_start && end == area->vm_end) {
@@ -931,6 +1149,8 @@
{
struct vm_area_struct *mpnt, *prev, **npp, *free, *extra;

+ vm_validate_enough("entering do_munmap");
+
if ((addr & ~PAGE_MASK) || addr >= TASK_SIZE || len > TASK_SIZE-addr)
return -EINVAL;

@@ -997,6 +1217,7 @@
(file = mpnt->vm_file) != NULL) {
atomic_dec(&file->f_dentry->d_inode->i_writecount);
}
+
remove_shared_vm_struct(mpnt);
mm->map_count--;

@@ -1016,6 +1237,7 @@
kmem_cache_free(vm_area_cachep, extra);

free_pgtables(mm, prev, addr, addr+len);
+ vm_validate_enough("exit -ok- do_munmap");

return 0;
}
@@ -1052,6 +1274,9 @@
unsigned long flags;
rb_node_t ** rb_link, * rb_parent;

+ vm_validate_enough("entering do_brk");
+
+
len = PAGE_ALIGN(len);
if (!len)
return addr;
@@ -1097,7 +1322,7 @@
if (!vm_enough_memory(len >> PAGE_SHIFT))
return -ENOMEM;

- flags = VM_DATA_DEFAULT_FLAGS | mm->def_flags;
+ flags = VM_DATA_DEFAULT_FLAGS | VM_ACCOUNT | mm->def_flags;

/* Can we just expand an old anonymous mapping? */
if (rb_parent && vma_merge(mm, prev, rb_parent, addr, addr + len, flags))
@@ -1108,8 +1333,11 @@
*/
vma = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
if (!vma)
+ {
+ /* We accounted this address space - undo it */
+ vm_unacct_memory(len >> PAGE_SHIFT);
return -ENOMEM;
-
+ }
vma->vm_mm = mm;
vma->vm_start = addr;
vma->vm_end = addr + len;
@@ -1128,6 +1356,9 @@
mm->locked_vm += len >> PAGE_SHIFT;
make_pages_present(addr, addr + len);
}
+
+ vm_validate_enough("exiting do_brk");
+
return addr;
}

@@ -1169,6 +1400,10 @@
unsigned long end = mpnt->vm_end;
unsigned long size = end - start;

+ /* If the VMA has been charged for, account for its removal */
+ if (mpnt->vm_flags & VM_ACCOUNT)
+ vm_unacct_memory(size >> PAGE_SHIFT);
+
if (mpnt->vm_ops) {
if (mpnt->vm_ops->close)
mpnt->vm_ops->close(mpnt);
@@ -1187,8 +1422,9 @@
BUG();

clear_page_tables(mm, FIRST_USER_PGD_NR, USER_PTRS_PER_PGD);
-
flush_tlb_mm(mm);
+ vm_validate_enough("exiting exit_mmap");
+
}

/* Insert vm structure into process list sorted by address
diff -ruN linux-2.4.33-pre1/mm/mprotect.c linux-2.4.33-pre1-memA/mm/mprotect.c
--- linux-2.4.33-pre1/mm/mprotect.c Fri Nov 28 10:26:21 2003
+++ linux-2.4.33-pre1-memA/mm/mprotect.c Thu Dec 29 20:39:30 2005
@@ -2,6 +2,23 @@
* linux/mm/mprotect.c
*
* (C) Copyright 1994 Linus Torvalds
+ *
+ * Address space accounting code <[email protected]>
+ * (c) Copyright 2002 Red Hat Inc, All Rights Reserved
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <linux/slab.h>
#include <linux/smp_lock.h>
@@ -241,11 +258,28 @@
{
pgprot_t newprot;
int error;
+ unsigned long charged = 0;

if (newflags == vma->vm_flags) {
*pprev = vma;
return 0;
}
+
+ /*
+ * If we make a private mapping writable we increase our commit;
+ * but (without finer accounting) cannot reduce our commit if we
+ * make it unwritable again.
+ *
+ * FIXME? We haven't defined a VM_NORESERVE flag, so mprotecting
+ * a MAP_NORESERVE private mapping to writable will now reserve.
+ */
+ if ((newflags & VM_WRITE) &&
+ !(vma->vm_flags & (VM_ACCOUNT|VM_WRITE|VM_SHARED))) {
+ charged = (end - start) >> PAGE_SHIFT;
+ if (!vm_enough_memory(charged))
+ return -ENOMEM;
+ newflags |= VM_ACCOUNT;
+ }
newprot = protection_map[newflags & 0xf];
if (start == vma->vm_start) {
if (end == vma->vm_end)
@@ -256,10 +290,10 @@
error = mprotect_fixup_end(vma, pprev, start, newflags, newprot);
else
error = mprotect_fixup_middle(vma, pprev, start, end, newflags, newprot);
-
- if (error)
+ if (error) {
+ vm_unacct_memory(charged);
return error;
-
+ }
change_protection(start, end, newprot);
return 0;
}
@@ -270,6 +304,8 @@
struct vm_area_struct * vma, * next, * prev;
int error = -EINVAL;

+ vm_validate_enough("entering mprotect");
+
if (start & ~PAGE_MASK)
return -EINVAL;
len = PAGE_ALIGN(len);
@@ -333,5 +369,6 @@
}
out:
up_write(&current->mm->mmap_sem);
+ vm_validate_enough("exiting mprotect");
return error;
}
diff -ruN linux-2.4.33-pre1/mm/mremap.c linux-2.4.33-pre1-memA/mm/mremap.c
--- linux-2.4.33-pre1/mm/mremap.c Wed Dec 28 05:06:01 2005
+++ linux-2.4.33-pre1-memA/mm/mremap.c Thu Dec 29 20:39:30 2005
@@ -2,6 +2,23 @@
* linux/mm/remap.c
*
* (C) Copyright 1996 Linus Torvalds
+ *
+ * Address space accounting code <[email protected]>
+ * (c) Copyright 2002 Red Hat Inc, All Rights Reserved
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/

#include <linux/slab.h>
@@ -13,8 +30,6 @@
#include <asm/uaccess.h>
#include <asm/pgalloc.h>

-extern int vm_enough_memory(long pages);
-
static inline pte_t *get_one_pte(struct mm_struct *mm, unsigned long addr)
{
pgd_t * pgd;
@@ -133,6 +148,9 @@
struct mm_struct * mm = vma->vm_mm;
struct vm_area_struct * new_vma, * next, * prev;
int allocated_vma;
+ unsigned long excess = 0;
+ int split = 0;
+

new_vma = NULL;
next = find_vma_prev(mm, new_addr, &prev);
@@ -187,7 +205,7 @@
*new_vma = *vma;
new_vma->vm_start = new_addr;
new_vma->vm_end = new_addr+new_len;
- new_vma->vm_pgoff += (addr-vma->vm_start) >> PAGE_SHIFT;
+ new_vma->vm_pgoff += (addr - vma->vm_start) >> PAGE_SHIFT;
new_vma->vm_raend = 0;
if (new_vma->vm_file)
get_file(new_vma->vm_file);
@@ -197,9 +215,35 @@
}

/* XXX: possible errors masked, mapping might remain */
- do_munmap(current->mm, addr, old_len);
+ /* Conceal VM_ACCOUNT so old reservation is not undone */
+ if (vma->vm_flags & VM_ACCOUNT) {
+ vma->vm_flags &= ~VM_ACCOUNT;
+ excess = vma->vm_end - vma->vm_start - old_len;
+ if (addr > vma->vm_start &&
+ addr + old_len < vma->vm_end)
+ split = 1;
+ }

+ /*
+ * if we failed to move page tables we still do total_vm
+ * increment since do_munmap() will decrement it by
+ * old_len == new_len
+ */
current->mm->total_vm += new_len >> PAGE_SHIFT;
+
+ if (do_munmap(mm, addr, old_len) < 0) {
+ /* OOM: unable to split vma, just get accounts right */
+ vm_unacct_memory(excess >> PAGE_SHIFT);
+ excess = 0;
+ }
+
+ /* Restore VM_ACCOUNT if one or two pieces of vma left */
+ if (excess) {
+ vma->vm_flags |= VM_ACCOUNT;
+ if (split)
+ vma->vm_next->vm_flags |= VM_ACCOUNT;
+ }
+
if (vm_locked) {
current->mm->locked_vm += new_len >> PAGE_SHIFT;
if (new_len > old_len)
@@ -227,6 +271,7 @@
{
struct vm_area_struct *vma;
unsigned long ret = -EINVAL;
+ unsigned long charged = 0;

if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
goto out;
@@ -281,6 +326,7 @@
/*
* Always allow a shrinking remap: that just unmaps
* the unnecessary pages..
+ * do_munmap does all the needed commit accounting
*/
if (old_len >= new_len) {
ret = do_munmap(current->mm, addr+new_len, old_len - new_len);
@@ -316,11 +362,12 @@
if ((current->mm->total_vm << PAGE_SHIFT) + (new_len - old_len)
> current->rlim[RLIMIT_AS].rlim_cur)
goto out;
- /* Private writable mapping? Check memory availability.. */
- if ((vma->vm_flags & (VM_SHARED | VM_WRITE)) == VM_WRITE &&
- !(flags & MAP_NORESERVE) &&
- !vm_enough_memory((new_len - old_len) >> PAGE_SHIFT))
- goto out;
+
+ if (vma->vm_flags & VM_ACCOUNT) {
+ charged = (new_len - old_len) >> PAGE_SHIFT;
+ if(!vm_enough_memory(charged))
+ goto out_nc;
+ }

/* old_len exactly to the end of the area..
* And we're not relocating the area.
@@ -344,6 +391,7 @@
addr + new_len);
}
ret = addr;
+ vm_validate_enough("mremap path1");
goto out;
}
}
@@ -367,6 +415,12 @@
ret = move_vma(vma, addr, old_len, new_len, new_addr);
}
out:
+ if(ret & ~PAGE_MASK)
+ {
+ vm_unacct_memory(charged);
+ vm_validate_enough("mremap error path");
+ }
+out_nc:
return ret;
}

@@ -376,8 +430,10 @@
{
unsigned long ret;

+ vm_validate_enough("entry to mremap");
down_write(&current->mm->mmap_sem);
ret = do_mremap(addr, old_len, new_len, flags, new_addr);
up_write(&current->mm->mmap_sem);
+ vm_validate_enough("exit from mremap");
return ret;
}
diff -ruN linux-2.4.33-pre1/mm/shmem.c linux-2.4.33-pre1-memA/mm/shmem.c
--- linux-2.4.33-pre1/mm/shmem.c Wed Dec 28 05:05:58 2005
+++ linux-2.4.33-pre1-memA/mm/shmem.c Thu Dec 29 20:39:30 2005
@@ -24,6 +24,7 @@
#include <linux/devfs_fs_kernel.h>
#include <linux/fs.h>
#include <linux/mm.h>
+#include <linux/mman.h>
#include <linux/file.h>
#include <linux/swap.h>
#include <linux/pagemap.h>
@@ -395,10 +396,20 @@
{
struct inode *inode = dentry->d_inode;
struct page *page = NULL;
+ long change = 0;
int error;

- if (attr->ia_valid & ATTR_SIZE) {
- if (attr->ia_size < inode->i_size) {
+ if ((attr->ia_valid & ATTR_SIZE) && (attr->ia_size <= SHMEM_MAX_BYTES)) {
+ /*
+ * Account swap file usage based on new file size,
+ * but just let vmtruncate fail on out-of-range sizes.
+ */
+ change = VM_ACCT(attr->ia_size) - VM_ACCT(inode->i_size);
+ if (change > 0) {
+ if (!vm_enough_memory(change))
+ return -ENOMEM;
+ } else if (attr->ia_size < inode->i_size) {
+ vm_unacct_memory(-change);
/*
* If truncating down to a partial page, then
* if that page is already allocated, hold it
@@ -432,6 +443,8 @@
error = inode_setattr(inode, attr);
if (page)
page_cache_release(page);
+ if (error)
+ vm_unacct_memory(change);
return error;
}

@@ -444,6 +457,7 @@
spin_lock(&shmem_ilock);
list_del(&info->list);
spin_unlock(&shmem_ilock);
+ vm_unacct_memory(VM_ACCT(inode->i_size));
inode->i_size = 0;
shmem_truncate(inode);
}
@@ -974,6 +988,7 @@
loff_t pos;
unsigned long written;
ssize_t err;
+ loff_t maxpos;

if ((ssize_t) count < 0)
return -EINVAL;
@@ -990,6 +1005,15 @@
if (err || !count)
goto out;

+ maxpos = inode->i_size;
+ if (maxpos < pos + count) {
+ maxpos = pos + count;
+ if (!vm_enough_memory(VM_ACCT(maxpos) - VM_ACCT(inode->i_size))) {
+ err = -ENOMEM;
+ goto out;
+ }
+ }
+
remove_suid(inode);
inode->i_ctime = inode->i_mtime = CURRENT_TIME;

@@ -998,12 +1022,15 @@
unsigned long bytes, index, offset;
char *kaddr;
int left;
+ int deactivate = 1;

offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
index = pos >> PAGE_CACHE_SHIFT;
bytes = PAGE_CACHE_SIZE - offset;
- if (bytes > count)
+ if (bytes > count) {
bytes = count;
+ deactivate = 0;
+ }

/*
* We don't hold page lock across copy from user -
@@ -1028,6 +1055,12 @@
flush_dcache_page(page);
SetPageDirty(page);
SetPageReferenced(page);
+#ifdef PG_inactive_dirty
+ if (deactivate)
+ deactivate_page(page);
+ else
+ mark_page_accessed(page);
+#endif
page_cache_release(page);

if (left) {
@@ -1041,6 +1074,10 @@
*ppos = pos;
if (written)
err = written;
+
+ /* Short writes give back address space */
+ if (inode->i_size != maxpos)
+ vm_unacct_memory(VM_ACCT(maxpos) - VM_ACCT(inode->i_size));
out:
up(&inode->i_sem);
return err;
@@ -1364,8 +1401,13 @@
memcpy(info, symname, len);
inode->i_op = &shmem_symlink_inline_operations;
} else {
+ if (!vm_enough_memory(VM_ACCT(1))) {
+ iput(inode);
+ return -ENOMEM;
+ }
error = shmem_getpage(inode, 0, &page, SGP_WRITE);
if (error) {
+ vm_unacct_memory(VM_ACCT(1));
iput(inode);
return error;
}
@@ -1684,7 +1726,6 @@
struct inode *inode;
struct dentry *dentry, *root;
struct qstr this;
- int vm_enough_memory(long pages);

if (IS_ERR(shm_mnt))
return (void *)shm_mnt;
@@ -1695,13 +1736,14 @@
if (!vm_enough_memory(VM_ACCT(size)))
return ERR_PTR(-ENOMEM);

+ error = -ENOMEM;
this.name = name;
this.len = strlen(name);
this.hash = 0; /* will go */
root = shm_mnt->mnt_root;
dentry = d_alloc(root, &this);
if (!dentry)
- return ERR_PTR(-ENOMEM);
+ goto put_memory;

error = -ENFILE;
file = get_empty_filp();
@@ -1726,6 +1768,8 @@
put_filp(file);
put_dentry:
dput(dentry);
+put_memory:
+ vm_unacct_memory(VM_ACCT(size));
return ERR_PTR(error);
}


2005-12-30 17:51:23

by Willy Tarreau

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

On Thu, Dec 29, 2005 at 11:44:01PM -0800, Barry K. Nathan wrote:
> This patch adds strict VM overcommit accounting to the mainline 2.4
> kernel, thus allowing overcommit to be truly disabled. This feature
> has been in 2.4-ac, Red Hat Enterprise Linux 3 (RHEL 3) vendor kernels,
> and 2.6 for a long while.

Many thanks, I'm impatient to try it! I tried to backport it in the
past but failed miserably, as I don't understand those areas well. I'm
interested in checking that a buggy service cannot eat all the RAM and
bring the machine down.

Cheers,
Willy

2005-12-30 18:17:54

by Arjan van de Ven

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

On Fri, 2005-12-30 at 18:48 +0100, Willy Tarreau wrote:
> On Thu, Dec 29, 2005 at 11:44:01PM -0800, Barry K. Nathan wrote:
> > This patch adds strict VM overcommit accounting to the mainline 2.4
> > kernel, thus allowing overcommit to be truly disabled. This feature
> > has been in 2.4-ac, Red Hat Enterprise Linux 3 (RHEL 3) vendor kernels,
> > and 2.6 for a long while.
>
> Many thanks, I'm impatient to try it ! I tried to backport it in the
> past but miserably failed as I don't understand those areas well. I'm
> interested in checking that a buggy service cannot eat all the RAM and
> bring the machine to death.

that's what rlimit is for though... overcommit accounting doesn't help
you a lot there.


Also I think, to be honest, that this is a feature that is getting
unsuitable for the "bugfixes only" 2.4 kernel series....


2005-12-30 18:36:11

by Willy Tarreau

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

On Fri, Dec 30, 2005 at 07:17:46PM +0100, Arjan van de Ven wrote:
> On Fri, 2005-12-30 at 18:48 +0100, Willy Tarreau wrote:
> > On Thu, Dec 29, 2005 at 11:44:01PM -0800, Barry K. Nathan wrote:
> > > This patch adds strict VM overcommit accounting to the mainline 2.4
> > > kernel, thus allowing overcommit to be truly disabled. This feature
> > > has been in 2.4-ac, Red Hat Enterprise Linux 3 (RHEL 3) vendor kernels,
> > > and 2.6 for a long while.
> >
> > Many thanks, I'm impatient to try it ! I tried to backport it in the
> > past but miserably failed as I don't understand those areas well. I'm
> > interested in checking that a buggy service cannot eat all the RAM and
> > bring the machine to death.
>
> that's what rlimit is for though... overcommit accounting doesn't help
> you a lot there.

Not always. When you have buggy apache modules eating lots of memory and
you have tons of processes, rlimit will be of limited help.

> Also I think, to be honest, that this is a feature that is getting
> unsuitable for the "bugfixes only" 2.4 kernel series....

Agreed, it really is too late IMHO, because there's a non-zero risk of
introducing new bugs with it. It would have been cool a few months
earlier. That won't stop me from trying it in my own tree, however ;-)

Willy

2005-12-30 19:37:39

by Barry K. Nathan

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

On 12/30/05, Willy Tarreau <[email protected]> wrote:
> On Fri, Dec 30, 2005 at 07:17:46PM +0100, Arjan van de Ven wrote:
[snip discussion not directly related to the timing of my patch submission]
> > Also I think, to be honest, that this is a feature that is getting
> > unsuitable for the "bugfixes only" 2.4 kernel series....
>
> Agreed, it really is too late IMHO, because there's a non-null risk of
> introducing new bugs with it. It would have been cool a few months
> earlier. That won't stop me from trying it in my own tree however ;-)

Yeah, I know it's a little bit late. I wish I had been able to get
this done a few months ago... :(

Oh well, even if it doesn't get into the tree, at least it looks like
I might not be the only person to benefit from this patch. :) (BTW,
you'll probably also want the patch I just posted, which adds
Committed_AS to /proc/meminfo.)

--
-Barry K. Nathan <[email protected]>

2005-12-30 20:07:41

by Al Boldi

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

On Thu, Dec 29, 2005 at 11:44:01PM -0800, Barry K. Nathan wrote:
> This patch adds strict VM overcommit accounting to the mainline 2.4
> kernel, thus allowing overcommit to be truly disabled. This feature
> has been in 2.4-ac, Red Hat Enterprise Linux 3 (RHEL 3) vendor kernels,
> and 2.6 for a long while.

Thanks a lot!

> +3 - (NEW) paranoid overcommit The total address space commit
> + for the system is not permitted to exceed swap. The machine
> + will never kill a process accessing pages it has mapped
> + except due to a bug (ie report it!)

This one isn't in 2.6, which is critical for a stable system.

Thanks!

--
Al

2005-12-30 20:18:32

by Barry K. Nathan

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

On 12/30/05, Al Boldi <[email protected]> wrote:
> Thanks a lot!
>
> > +3 - (NEW) paranoid overcommit The total address space commit
> > + for the system is not permitted to exceed swap. The machine
> > + will never kill a process accessing pages it has mapped
> > + except due to a bug (ie report it!)
>
> This one isn't in 2.6, which is critical for a stable system.

I mentioned in my original post (maybe I wasn't clear enough) that
this is only in the documentation and not in the actual code (i.e. the
code has no advantage over 2.6 in this regard). This error comes from
the 2.4-ac/-pac kernels, from which this documentation was originally
taken. Yes, I need to fix the documentation; I just haven't gotten to it yet.

I think you can get paranoid overcommit with either my patch or 2.6 by
setting /proc/sys/vm/overcommit_memory to 2 *and*
/proc/sys/vm/overcommit_ratio to 0, however.
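For reference, the settings described above can also be expressed as a
persistent sysctl fragment (a sketch; the sysctl is named
vm.overcommit_memory, and vm.overcommit_ratio is the percentage of RAM
added to swap when computing the commit limit in mode 2):

```
vm.overcommit_memory = 2
vm.overcommit_ratio = 0
```

With no swap configured, these values make the commit limit 0 kB, i.e.
effectively no new commit is allowed.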

--
-Barry K. Nathan <[email protected]>

2005-12-30 20:52:37

by Al Boldi

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

Barry K. Nathan wrote:
> On 12/30/05, Al Boldi <[email protected]> wrote:
> > Thanks a lot!
> >
> > > +3 - (NEW) paranoid overcommit The total address space commit
> > > + for the system is not permitted to exceed swap. The machine
> > > + will never kill a process accessing pages it has mapped
> > > + except due to a bug (ie report it!)
> >
> > This one isn't in 2.6, which is critical for a stable system.
>
> I think you can get paranoid overcommit with either my patch or 2.6 by
> setting /proc/sys/vm/overcommit_memory to 2 *and*
> /proc/sys/vm/overcommit_ratio to 0, however.

Not really in 2.6.
And even if this were made to work, what would it imply for a system
running without swap?

Thanks!

--
Al

2005-12-30 20:54:19

by Willy Tarreau

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

On Fri, Dec 30, 2005 at 11:37:38AM -0800, Barry K. Nathan wrote:
> On 12/30/05, Willy Tarreau <[email protected]> wrote:
> > On Fri, Dec 30, 2005 at 07:17:46PM +0100, Arjan van de Ven wrote:
> [snip discussion not directly related to the timing of my patch submission]
> > > Also I think, to be honest, that this is a feature that is getting
> > > unsuitable for the "bugfixes only" 2.4 kernel series....
> >
> > Agreed, it really is too late IMHO, because there's a non-null risk of
> > introducing new bugs with it. It would have been cool a few months
> > earlier. That won't stop me from trying it in my own tree however ;-)
>
> Yeah, I know it's a little bit late. I wish I had been able to get
> this done a few months ago... :(

And I wish I could stay awake 24 hours a day, 365 days a year...
Seriously, this work will not be lost, because 2.6 has it. However,
if it works well, I intend to link to it from my interesting-patches
page (when I finally find time to put it online).

> Oh well, even if it doesn't get into the tree, at least it looks like
> I might not be the only person to benefit from this patch. :)

Every patch that can enhance long-term stability will interest people
who manage remote systems and who at least run softdog to have a chance
of reaching the box after an accident.

> (BTW,
> you'll probably also want the patch I just posted, which adds
> Committed_AS to /proc/meminfo.)

I've caught it, thanks.

> -Barry K. Nathan <[email protected]>

Cheers,
Willy

2005-12-30 21:44:30

by Willy Tarreau

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

On Fri, Dec 30, 2005 at 11:51:58PM +0300, Al Boldi wrote:
> Barry K. Nathan wrote:
> > On 12/30/05, Al Boldi <[email protected]> wrote:
> > > Thanks a lot!
> > >
> > > > +3 - (NEW) paranoid overcommit The total address space commit
> > > > + for the system is not permitted to exceed swap. The machine
> > > > + will never kill a process accessing pages it has mapped
> > > > + except due to a bug (ie report it!)
> > >
> > > This one isn't in 2.6, which is critical for a stable system.
> >
> > I think you can get paranoid overcommit with either my patch or 2.6 by
> > setting /proc/sys/vm/overcommit_memory to 2 *and*
> > /proc/sys/vm/overcommit_ratio to 0, however.
>
> Not really in 2.6.
> And even if this were made to work, what would it imply to a system running
> w/o swap?

I can clearly answer this one:

root@pcw:vm# swapoff -a
root@pcw:vm# free
total used free shared buffers cached
Mem: 1031752 84992 946760 0 3144 38564
-/+ buffers/cache: 43284 988468
Swap: 0 0 0
root@pcw:vm# echo 2 > overcommit_memory
root@pcw:vm# echo 0 > overcommit_ratio
root@pcw:vm# uname -a
bash: fork: Cannot allocate memory
root@pcw:vm# echo 10 > overcommit_ratio
root@pcw:vm# uname -a
Linux pcw 2.4.33-pre1 #1 SMP Fri Dec 30 21:52:06 CET 2005 i686 unknown
root@pcw:vm#

So, as one would expect, if you're limited to allocating
(swap size) + 0% of RAM, then you have a limit of 0 kB, so
malloc() always fails. I'll do some crash tests with mmap, shm
and a ratio of 100 to see how it behaves, but at the moment it's
clearly good: multi-process malloc cannot allocate more than
specified, and that's good.

> Thanks!
>
> --
> Al

Regards,
Willy

2005-12-31 00:53:25

by Alan

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

On Gwe, 2005-12-30 at 23:06 +0300, Al Boldi wrote:
> > +3 - (NEW) paranoid overcommit The total address space commit
> > + for the system is not permitted to exceed swap. The machine
> > + will never kill a process accessing pages it has mapped
> > + except due to a bug (ie report it!)
>
> This one isn't in 2.6, which is critical for a stable system.

Actually, it is.

In the 2.4 case we took "50% RAM + swap" as the safe, sane 'never
OOM kill' default, and to all intents and purposes it works. We also
had a 100% paranoia mode.

When it was ported to 2.6 (not by me), whoever did it very sensibly made
the percentage tunable and removed "mode 3", since it's just mode 2 with
0% RAM and can be set that way.

Alan

2005-12-31 04:59:50

by Al Boldi

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

Alan Cox wrote:
> On Gwe, 2005-12-30 at 23:06 +0300, Al Boldi wrote:
> > > +3 - (NEW) paranoid overcommit The total address space commit
> > > + for the system is not permitted to exceed swap. The machine
> > > + will never kill a process accessing pages it has mapped
> > > + except due to a bug (ie report it!)
> >
> > This one isn't in 2.6, which is critical for a stable system.
>
> Actually it is
>
> In the 2.4 case we took "50% RAM + swap" as the safe sane world 'never
> OOM kill' and to all intents and purposes it works. We also had a 100%
> paranoia mode.
>
> When it was ported to 2.6 (not by me) whoever did it very sensibly made
> the percentage tunable and removed "mode 3" since it's just mode 2 with
> 0% RAM and can be set that way.

Only, doesn't this imply that you cannot control overcommit unless backed by
swap? I.e., without swap the kernel cannot use all of RAM, because it would
overcommit no matter what, thus invoking the OOM killer.

Which raises an important question: what does overcommit have to do with
limiting access to physical RAM?

Thanks!

--
Al

2005-12-31 07:40:50

by Willy Tarreau

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

On Sat, Dec 31, 2005 at 07:59:02AM +0300, Al Boldi wrote:
> Alan Cox wrote:
> > On Gwe, 2005-12-30 at 23:06 +0300, Al Boldi wrote:
> > > > +3 - (NEW) paranoid overcommit The total address space commit
> > > > + for the system is not permitted to exceed swap. The machine
> > > > + will never kill a process accessing pages it has mapped
> > > > + except due to a bug (ie report it!)
> > >
> > > This one isn't in 2.6, which is critical for a stable system.
> >
> > Actually it is
> >
> > In the 2.4 case we took "50% RAM + swap" as the safe sane world 'never
> > OOM kill' and to all intents and purposes it works. We also had a 100%
> > paranoia mode.
> >
> > When it was ported to 2.6 (not by me) whoever did it very sensibly made
> > the percentage tunable and removed "mode 3" since it's just mode 2
> > with 0% RAM and can be set that way.
>
> Only, doesn't this imply that you cannot control overcommit unless backed by
> swap? I.e., without swap the kernel cannot use all of RAM, because it would
> overcommit no matter what, thus invoking the OOM killer.
>
> Which raises an important question: what does overcommit have to do with
> limiting access to physical RAM?

As shown in my previous mail, it allows malloc() to return NULL. I've
also successfully verified that it allows mmap() to fail if there is
not enough memory. I disabled swap, and set the overcommit_ratio to 95
and could not kill the system. Above this, it becomes tricky. At 97, I
see the last malloc() calls take a very long time, and at 98, the
system still hangs. But 95% without swap seems stable here.

> Thanks!
>
> --
> Al

Cheers,
Willy

2005-12-31 14:03:22

by Al Boldi

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

Willy Tarreau wrote:
> On Sat, Dec 31, 2005 at 07:59:02AM +0300, Al Boldi wrote:
> > Alan Cox wrote:
> > > On Gwe, 2005-12-30 at 23:06 +0300, Al Boldi wrote:
> > > > > +3 - (NEW) paranoid overcommit The total address space commit
> > > > > + for the system is not permitted to exceed swap. The machine
> > > > > + will never kill a process accessing pages it has mapped
> > > > > + except due to a bug (ie report it!)
> > > >
> > > > This one isn't in 2.6, which is critical for a stable system.
> > >
> > > Actually it is
> > >
> > > In the 2.4 case we took "50% RAM + swap" as the safe sane world
> > > 'never OOM kill' and to all intents and purposes it works. We also had
> > > a 100% paranoia mode.
> > >
> > > When it was ported to 2.6 (not by me) whoever did it very sensibly
> > > made the percentage tunable and removed "mode 3" since it's just
> > > mode 2 with 0% RAM and can be set that way.
> >
> > Only, doesn't this imply that you cannot control overcommit unless
> > backed by swap? I.e., without swap the kernel cannot use all of RAM,
> > because it would overcommit no matter what, thus invoking the OOM killer.
> >
> > Which raises an important question: what does overcommit have to do
> > with limiting access to physical RAM?
>
> As shown in my previous mail, it allows malloc() to return NULL. I've
> also successfully verified that it allows mmap() to fail if there is
> not enough memory. I disabled swap, and set the overcommit_ratio to 95
> and could not kill the system. Above this, it becomes tricky. At 97, I
> see the last malloc() calls take a very long time, and at 98, the
> system still hangs. But 95% without swap seems stable here.

Thanks for confirming this! And I agree that this patch and 2.6 offer an
important and necessary workaround to inhibit the OOM killer, but it's no
more than a workaround.

And so the question remains: why should overcommit come into play at all
when dealing with physical RAM only?

Shouldn't it be possible to disable overcommit completely, thus giving kswapd
a break from running wild trying to find something to swap/page? That is
the reason why the system gets unstable going over 95% in your example.

Thanks!

--
Al

2005-12-31 14:21:50

by Willy Tarreau

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

On Sat, Dec 31, 2005 at 05:02:20PM +0300, Al Boldi wrote:
> Willy Tarreau wrote:
(...)
> > As shown in my previous mail, it allows malloc() to return NULL. I've
> > also successfully verified that it allows mmap() to fail if there is
> > not enough memory. I disabled swap, and set the overcommit_ratio to 95
> > and could not kill the system. Above this, it becomes tricky. At 97, I
> > see the last malloc() calls take a very long time, and at 98, the
> > system still hangs. But 95% without swap seems stable here.
>
> Thanks, for confirming this! And I agree that this patch and 2.6 offer an
> important and necessary workaround to inhibit OOM-killer, but it's no more
> than a workaround.
>
> And so the question remains: Why should overcommit come into play at all
> when dealing with physical RAM only?

It is very important when you have many processes wasting lots of unused
memory. For instance, when firefox allocates 10 MB and uses only 3, then
the remaining 7 MB can be used by another process. But if firefox finally
tries to use them and there is no memory left, then chances are that some
processes will be killed. I believe the same problem happens after a fork(),
because data gets COWed, but I'm not certain about this.

I'd bet that using a heavy GUI under X with no swap and an overcommit_ratio
set around 95%, you could get occasional malloc() failures. But once
again, I may be wrong.

> Shouldn't it be possible to disable overcommit completely, thus giving kswapd
> a break from running wild trying to find something to swap/page, which is
> the reason why the system gets unstable going over 95% in your example.

I think it's going unstable above 95% because there are lots of other
areas consuming memory. I don't know, for example, whether dentry caches,
network buffers, various hash tables, etc. are accounted for.

> Thanks!
>
> --
> Al

Willy

2005-12-31 14:26:31

by Arjan van de Ven

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

On Sat, 2005-12-31 at 17:02 +0300, Al Boldi wrote:

> Shouldn't it be possible to disable overcommit completely, thus giving kswapd
> a break from running wild trying to find something to swap/page, which is
> the reason why the system gets unstable going over 95% in your example.

Shared mappings make this impractical. To disable overcommit completely,
each process would need to account for all its own shared libraries, e.g.
each process gets glibc added, etc. You'll find that on any
non-extremely-stripped system you then end up with much more memory
needed than you have RAM.


2005-12-31 15:01:53

by Willy Tarreau

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

On Sat, Dec 31, 2005 at 03:26:18PM +0100, Arjan van de Ven wrote:
> On Sat, 2005-12-31 at 17:02 +0300, Al Boldi wrote:
>
> > Shouldn't it be possible to disable overcommit completely, thus giving kswapd
> > a break from running wild trying to find something to swap/page, which is
> > the reason why the system gets unstable going over 95% in your example.
>
> shared mappings make this impractical. To disable overcommit completely,
> each process would need to account for all its own shared libraries, eg
> each process gets glibc added etc. You'll find that on any
> non-extremely-stripped system you then end up with much more memory
> needed than you have ram.

Arjan, is this true even for read-only mappings such as shared libs?
It seems to me that precisely those can be mapped once, because they
are areas the process cannot extend. Am I wrong?

Willy

2005-12-31 15:17:14

by Arjan van de Ven

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

On Sat, 2005-12-31 at 15:58 +0100, Willy Tarreau wrote:
> On Sat, Dec 31, 2005 at 03:26:18PM +0100, Arjan van de Ven wrote:
> > On Sat, 2005-12-31 at 17:02 +0300, Al Boldi wrote:
> >
> > > Shouldn't it be possible to disable overcommit completely, thus giving kswapd
> > > a break from running wild trying to find something to swap/page, which is
> > > the reason why the system gets unstable going over 95% in your example.
> >
> > shared mappings make this impractical. To disable overcommit completely,
> > each process would need to account for all its own shared libraries, eg
> > each process gets glibc added etc. You'll find that on any
> > non-extremely-stripped system you then end up with much more memory
> > needed than you have ram.
>
> Arjan, is this true even for read-only mappings such as shared libs ?

Shared libs aren't read-only! They have all the relocations applied,
for example. (The fact that glibc mprotects them read-only again
afterwards doesn't change things; part of the memory has already been
written to.) And mprotect can be called again... what would mprotect
need to do if it's asked to make memory writable and the overcommit
accounting says "no space"?


2005-12-31 17:37:43

by Al Boldi

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

Arjan van de Ven wrote:
> On Sat, 2005-12-31 at 17:02 +0300, Al Boldi wrote:
> > Shouldn't it be possible to disable overcommit completely, thus giving
> > kswapd a break from running wild trying to find something to swap/page,
> > which is the reason why the system gets unstable going over 95% in your
> > example.
>
> shared mappings make this impractical. To disable overcommit completely,
> each process would need to account for all its own shared libraries, eg
> each process gets glibc added etc. You'll find that on any
> non-extremely-stripped system you then end up with much more memory
> needed than you have ram.

Are you implying shared maps are implemented by way of overcommitting?

Really, overcommit is an add-on feature like swapping, only overcommit is
free because it's a liar. So removing an add-on feature should not affect
the underlying system in any way, such as shared mappings or swapping.

It should be possible to let swapping handle all memory requests
exceeding physical RAM. Overcommit should be a tuning option for those who
like to live on the edge, because it really is a gamble.

In the case where swap = physical RAM and overcommit_ratio = 0, the kernel is
in effect hiding the fact that it is overcommitting.

Can you see the overhead involved here?

Thanks for your input!

--
Al

2006-01-01 09:12:47

by Arjan van de Ven

Subject: Re: [PATCH] strict VM overcommit accounting for 2.4.32/2.4.33-pre1

On Sat, 2005-12-31 at 20:36 +0300, Al Boldi wrote:
> Arjan van de Ven wrote:
> > On Sat, 2005-12-31 at 17:02 +0300, Al Boldi wrote:
> > > Shouldn't it be possible to disable overcommit completely, thus giving
> > > kswapd a break from running wild trying to find something to swap/page,
> > > which is the reason why the system gets unstable going over 95% in your
> > > example.
> >
> > shared mappings make this impractical. To disable overcommit completely,
> > each process would need to account for all its own shared libraries, eg
> > each process gets glibc added etc. You'll find that on any
> > non-extremely-stripped system you then end up with much more memory
> > needed than you have ram.
>
> Are you implying shared maps are implemented by way of overcommitting?

Yes. Using even one shared page is already overcommitting, because in
principle each user can cause a COW on that page and trigger a memory
allocation for it in the future (just like "traditional" overcommit can
cause a page fault with an allocation).


> Really, overcommit is an add-on feature like swapping, only overcommit is
> free because it's a lier. So removing an add-on feature should not affect
> the underlying system in any way, such as shared mappings or swapping.

Then I think you're misunderstanding how the Linux VM works.


> It should be possible to allow swapping to handle all memory requests
> exceeding physical RAM. OverCommit should be a tuning option for those who
> like to live on the edge, because it really is a gamble.

But it's a worthwhile gamble: you avoid reserving gobs of memory you're
not using 99.9999999999999% of the time.

>
> In the case where swap = physical RAM and overcommit_ratio = 0, the kernel is
> in effect hiding the fact that it is overcommitting.

swap==ram is not really relevant; you'll need a LOT more to cover the
shared maps...