2002-07-11 22:26:00

by Robert Love

[permalink] [raw]
Subject: [PATCH] strict VM overcommit

The attached patch implements strict VM overcommit on top of the rmap
VM.

The basis for this is Alan Cox's work in 2.4-ac. This is a port of the
strict VM overcommit out of 2.4-ac and into the standard kernel with the
following changes:

- one or two bugfixes (have sent/will send to Alan)
- some cleanups, mostly for coding style
- I did not bring over the debugging code
- new overcommit policy for swapless machines

So what is strict VM overcommit? We introduce new overcommit policies
that attempt never to succeed an allocation that cannot be fulfilled by
the backing store and consequently never OOM. This is achieved through
strict accounting of the committed address space and a policy to
allow/refuse allocations based on that accounting.

In the strictest of modes, it should be impossible to allocate more
memory than available and impossible to OOM. All memory failures should
be pushed down to the allocation routines -- malloc, mmap, etc.

The new modes are available via sysctl (same as before). See
Documentation/vm/overcommit-accounting for more information.
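
For orientation, here is a minimal sketch in C of the accounting idea
(names and types are illustrative only, not the patch's actual code):

/* Illustrative sketch -- not the patch itself.  In the strict modes,
 * every address-space commitment is accounted, and a request is
 * refused up front if the backing store (RAM plus swap) could not
 * satisfy it in full. */
static long committed;          /* pages of address space granted */

int may_commit(long pages, long ram_pages, long swap_pages)
{
        if (committed + pages > ram_pages + swap_pages)
                return 0;       /* caller returns ENOMEM to user space */
        committed += pages;
        return 1;
}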

Again, Alan deserves the credit for the design of all this.

The patch is against 2.4.19-pre7-rmap13b but should apply to later
releases with little trouble.

Enjoy,

Robert Love


Attachments:
vm-strict-overcommit-rml-2.4.19-pre7-rmap-1.patch (28.96 kB)

2002-07-12 17:27:53

by Robert Love

[permalink] [raw]
Subject: [PATCH] strict VM overcommit for stock 2.4

A version of Alan's strict VM overcommit for the stock VM is available
for 2.4.19-rc1 at:

ftp://ftp.kernel.org/pub/linux/kernel/people/rml/vm/strict-overcommit/2.4/vm-strict-overcommit-rml-2.4.19-rc1-1.patch

This is the same code I posted yesterday (see "[PATCH] strict VM
overcommit for" from 20020711) except for the stock non-rmap VM in 2.4.

Hugh Dickins sent me a few fixes, mostly for shmfs accounting, that he
recently discovered... that code is not yet merged but will be, probably
after this weekend.

I still encourage testing and comments.

Robert Love

2002-07-18 16:17:42

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4


On 12 Jul 2002, Robert Love wrote:

> I still encourage testing and comments.

Quickly looking through the patch I can't see what prevents total loss of
control at constant memory pressure. For more please see:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0108.2/0310.html

Szaka

2002-07-18 16:28:23

by Robert Love

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On Thu, 2002-07-18 at 08:22, Szakacsits Szabolcs wrote:

> Quickly looking through the patch I can't see what prevents total loss of
> control at constant memory pressure. For more please see:

I do not see anything in this email related to the issue at hand.

First, if the VM is broken, that is an orthogonal issue that needs to be
fixed separately.

Specifically, what livelock situation are you insinuating? If we only
allow allocations that are met by the backing store, we cannot get
anywhere near OOM.

Robert Love

2002-07-18 17:31:15

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4


On 18 Jul 2002, Robert Love wrote:

> I do not see anything in this email related to the issue at hand.

You solve a problem and introduce a potentially more serious one.
Strict overcommit is necessary but not sufficient.

> Specifically, what livelock situation are you insinuating? If we only
> allow allocations that are met by the backing store, we cannot get
> anywhere near OOM.

This is what I would do first [make sure you don't hit any resource,
malloc, kernel memory mapping, etc limits -- this is a simulation that
must eat all available memory continually]:
main(){void *x;while(1)if(x=malloc(4096))memset(x,666,4096);}
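
Expanded with the headers a modern compiler wants, the same test reads
(behavior unchanged):

#include <stdlib.h>
#include <string.h>

int main(void)
{
        void *x;
        for (;;)
                if ((x = malloc(4096)))
                        memset(x, 666, 4096);   /* touch the page so it is really consumed */
}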

When the above used up all the memory try to ssh/login to the box as
root and clean up the mess. Can you do it?

Szaka

2002-07-18 17:39:48

by Robert Love

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On Thu, 2002-07-18 at 09:36, Szakacsits Szabolcs wrote:

> This is what I would do first [make sure you don't hit any resource,
> malloc, kernel memory mapping, etc limits -- this is a simulation that
> must eat all available memory continually]:
> main(){void *x;while(1)if(x=malloc(4096))memset(x,666,4096);}
>
> When the above used up all the memory try to ssh/login to the box as
> root and clean up the mess. Can you do it?

Three points:

- with strict overcommit and the "allocations must meet backing store"
rule (policy #3) the above can never use all physical memory

- if your point is that a rogue user can use all of the system's memory,
then you need per-user resource accounting.

- the point of this patch is to not use MORE memory than the system
has. I say nothing else except that I am trying to avoid OOM and push
the allocation failures into the allocations themselves. Assuming the
accounting is correct (and it seems to be) then Alan and I have
succeeded.
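
(The policies are selected at runtime through /proc; a hedged sketch,
assuming the mode numbers documented in
Documentation/vm/overcommit-accounting, with "3" being the policy
referenced above:)

#include <stdio.h>

int main(void)
{
        /* Sketch only: switch the running kernel to the strict policy. */
        FILE *f = fopen("/proc/sys/vm/overcommit_memory", "w");
        if (!f) {
                perror("fopen");
                return 1;
        }
        fprintf(f, "3\n");
        fclose(f);
        return 0;
}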

Robert Love

2002-07-18 18:20:55

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4


On 18 Jul 2002, Robert Love wrote:
> On Thu, 2002-07-18 at 09:36, Szakacsits Szabolcs wrote:
>
> > This is what I would do first [make sure you don't hit any resource,
> > malloc, kernel memory mapping, etc limits -- this is a simulation that
> > must eat all available memory continually]:
> > main(){void *x;while(1)if(x=malloc(4096))memset(x,666,4096);}
> >
> > When the above used up all the memory try to ssh/login to the box as
> > root and clean up the mess. Can you do it?
>
> - with strict overcommit and the "allocations must meet backing store"
> rule (policy #3) the above can never use all physical memory

So you can't do it: if this user can't get more memory, neither can root.

> - if your point is that a rogue user can use all of the system's memory,
> then you need per-user resource accounting.

I explicitly mentioned above, "make sure you don't hit any resource
... limit".

> - the point of this patch is to not use MORE memory than the system
> has.

I had my own [not finished] non-overcommit patch based on Eduardo
Horvath's from 2000, so no need to explain what it means :) Actually
the basics of your patch look very similar to Eduardo's.

> I say nothing else except that I am trying to avoid OOM and push
> the allocation failures into the allocations themselves. Assuming the
> accounting is correct (and it seems to be) then Alan and I have
> succeeded.

And my point (you asked for comments) was that this is only (the
harder) part of the solution making Linux a more reliable (no OOM
killing *and* root always has control) and cost-effective platform
(no need for occasionally very complex and continuous resource limit
setup/adjusting, especially for inexpert home/etc users).

Szaka

2002-07-18 18:26:09

by Robert Love

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On Thu, 2002-07-18 at 10:25, Szakacsits Szabolcs wrote:

> And my point (you asked for comments) was that this is only (the
> harder) part of the solution making Linux a more reliable (no OOM
> killing *and* root always has control) and cost-effective platform
> (no need for occasionally very complex and continuous resource limit
> setup/adjusting, especially for inexpert home/etc users).

I understand your point, and you are entirely right.

But it is a _completely_ unrelated issue. The goal here is to not
overcommit memory and I think we succeeded.

An orthogonal issue is per-user resource limits and this may need to be
coupled with that. It is not a problem I am trying to solve, however.

Robert Love

2002-07-18 18:29:14

by Robert Love

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On Thu, 2002-07-18 at 10:31, Szakacsits Szabolcs wrote:

> Ahh, I figured out your target, embedded devices. Yes it's good for
> that but not enough for general purpose.

I think this applies to more than just embedded devices. Further, it
applies to even the case you are talking about because the issues are
_orthogonal_.

If you also have an issue with root vs non-root users then you need
resource limits. You still need this too.

Robert Love

2002-07-18 18:26:58

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4


On Thu, 18 Jul 2002, Szakacsits Szabolcs wrote:
> And my point (you asked for comments) was that this is only (the
> harder) part of the solution making Linux a more reliable (no OOM
> killing *and* root always has control) and cost-effective platform
> (no need for occasionally very complex and continuous resource limit
> setup/adjusting, especially for inexpert home/etc users).

Ahh, I figured out your target, embedded devices. Yes it's good for
that but not enough for general purpose.

Szaka

2002-07-18 18:44:52

by Alan

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On Thu, 2002-07-18 at 18:31, Szakacsits Szabolcs wrote:
>
> On Thu, 18 Jul 2002, Szakacsits Szabolcs wrote:
> > And my point (you asked for comments) was that this is only (the
> > harder) part of the solution making Linux a more reliable (no OOM
> > killing *and* root always has control) and cost-effective platform
> > (no need for occasionally very complex and continuous resource limit
> > setup/adjusting, especially for inexpert home/etc users).
>
> Ahh, I figured out your target, embedded devices. Yes it's good for
> that but not enough for general purpose.

Adjusting the percentages to have a root-only zone is doable. It helps
in some conceivable cases but not all. Do people think it's important? If
so I'll add it.
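
A root-only zone along those lines might look like the following
(an illustrative sketch, not Alan's code; the 5 MB reserve is a
made-up example):

#define PAGE_SIZE       4096
#define ROOT_RESERVE    ((5 * 1024 * 1024) / PAGE_SIZE) /* pages kept for root */

/* Sketch: non-root commitments stop short of the limit, leaving a
 * margin that only root may consume. */
int may_commit(long pages, long limit, long committed, int is_root)
{
        long reserve = is_root ? 0 : ROOT_RESERVE;
        return committed + pages <= limit - reserve;
}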

2002-07-18 18:45:33

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4


On 18 Jul 2002, Robert Love wrote:
> An orthogonal issue is per-user resource limits and this may need to be
> coupled with that. It is not a problem I am trying to solve, however.

About 99% of the people don't know about, don't understand or don't
care about resource limits. But they do care about cleaning up when
the mess comes. Adding reserved root memory would be a couple of lines;
you can get ideas from the patch here:
http://mlf.linux.rulez.org/mlf/ezaz/reserved_root_memory.html

Surprisingly, it is visited through Google, and people are asking for
2.4 patches, hint ;)

Szaka

2002-07-18 18:56:19

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4


On 18 Jul 2002, Alan Cox wrote:
> Adjusting the percentages to have a root-only zone is doable. It helps
> in some conceivable cases but not all. Do people think it's important? If
> so I'll add it.

"Why isn't in the kernel?" was the other FAQ I got besides "when it
will be ported to 2.4?" [about the reserved root vm patch I mentioned
in my other email].

Szaka

2002-07-18 18:50:18

by Robert Love

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On Thu, 2002-07-18 at 12:58, Alan Cox wrote:

> Adjusting the percentages to have a root-only zone is doable. It helps
> in some conceivable cases but not all. Do people think it's important? If
> so I'll add it.

Changing the rules would be easy, but you would need to make the
accounting check for root vs non-root and keep track accordingly.
Admittedly not hard but not entirely pretty either.

I still contend the issues are not related. It would make more sense to
me to do resource limits to solve this problem - rlimits are something
Rik has on his TODO and supposedly easy to add to rmap.

That way people can use strict overcommit, rlimits, neither, or both to
meet their needs.

Robert Love

2002-07-18 18:51:39

by Richard B. Johnson

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On 18 Jul 2002, Robert Love wrote:

> On Thu, 2002-07-18 at 10:25, Szakacsits Szabolcs wrote:
>
> > And my point (you asked for comments) was that this is only (the
> > harder) part of the solution making Linux a more reliable (no OOM
> > killing *and* root always has control) and cost-effective platform
> > (no need for occasionally very complex and continuous resource limit
> > setup/adjusting, especially for inexpert home/etc users).
>
> I understand your point, and you are entirely right.
>
> But it is a _completely_ unrelated issue. The goal here is to not
> overcommit memory and I think we succeeded.
>

Let's see, I have 30 network daemons that are all sleeping, each
requested and got 200 MB of memory to work with. I've got 10 NFS
daemons that allocated their worst-case 228 MB data-buffers. They
are all sleeping. I have 6 gettys, sleeping on terminals, they
all requested and got 32 MB. I am now trying to log in, but
/bin/login fails to exec because there is no memory.

What should have happened is that each of the tasks needs only about
4k until they actually access something. Since they can't possibly
access everything at once, we need to fault in pages as needed,
not all at once. This is what 'overcommit' is, and it is necessary.

If the machine was set up with the correct amount of swap, and
if resource limits are correctly in place, even a 16 megabyte RAM
machine will not fail due to OOM.

If you have 'fixed' something so that no RAM ever has to be paged
you have a badly broken system.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

Windows-2000/Professional isn't.


2002-07-18 19:00:41

by Robert Love

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On Thu, 2002-07-18 at 11:56, Richard B. Johnson wrote:

> What should have happened is that each of the tasks needs only about
> 4k until they actually access something. Since they can't possibly
> access everything at once, we need to fault in pages as needed,
> not all at once. This is what 'overcommit' is, and it is necessary.

Then do not enable strict overcommit, Dick.

> If you have 'fixed' something so that no RAM ever has to be paged
> you have a badly broken system.

That is not the intention of Alan's or my work at all.

Robert Love


2002-07-18 19:07:30

by Robert Love

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On Thu, 2002-07-18 at 11:56, Richard B. Johnson wrote:

> What should have happened is that each of the tasks needs only about
> 4k until they actually access something. Since they can't possibly
> access everything at once, we need to fault in pages as needed,
> not all at once. This is what 'overcommit' is, and it is necessary.

I should also mention this is demand paging, not overcommit.

Overcommit is the property of succeeding more allocations than there is
memory in the address space. The idea being that allocations are lazy,
things often do not use their full allocations, etc. etc. as you
mentioned.

It is typically a good thing since it lowers VM pressure.

It is not always a good thing, for numerous reasons, and it becomes
important in those scenarios to ensure that all allocations can be met
by the backing store and consequently we never find ourselves with more
memory committed than available and thus never OOM.
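
The distinction is easy to see from user space; a sketch (the 1 GB
figure is arbitrary):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        /* Under overcommit the request is granted immediately and pages
         * are consumed only on first touch; under strict accounting a
         * request the backing store cannot cover fails up front. */
        char *p = malloc(1UL << 30);    /* ask for 1 GB */
        if (!p) {
                puts("refused up front -- strict accounting");
                return 1;
        }
        puts("granted -- pages appear only when touched");
        return 0;
}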

This has nothing to do with paging and resource limits as you say. Btw,
without this it is possible to OOM any machine. OOM is a by-product of
allowing overcommit and poor accounting (and perhaps poor
software/users), not an incorrectly configured machine.

Robert Love

2002-07-18 19:14:04

by Richard B. Johnson

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On 18 Jul 2002, Robert Love wrote:

> On Thu, 2002-07-18 at 11:56, Richard B. Johnson wrote:
>
> > What should have happened is that each of the tasks needs only about
> > 4k until they actually access something. Since they can't possibly
> > access everything at once, we need to fault in pages as needed,
> > not all at once. This is what 'overcommit' is, and it is necessary.
>
> Then do not enable strict overcommit, Dick.
>
> > If you have 'fixed' something so that no RAM ever has to be paged
> > you have a badly broken system.
>
> That is not the intention of Alan's or my work at all.
>
> Robert Love

Okay then. When would it be useful? I read that it would be useful
in embedded systems, but everything that will ever run on embedded
systems is known at compile time, or is uploaded by something written
by an intelligent developer, so I don't think it's useful there. I
'do' embedded systems and have never encountered OOM.

I also read about some 'home users' not knowing how to set up
their systems. I don't think one CPU cycle should be wasted to
protect them, well maybe 10, but that's it.

I keep seeing the same thing about protecting root against fork and
malloc bombs and I get rather "malloc()" about it. All distributions
I have seen, so far, come with `gcc` and `make`. The kiddies can
crap all over their kernels to their heart's content. I don't think
Linux should be reduced to the lowest common denominator.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

Windows-2000/Professional isn't.


2002-07-18 19:20:17

by Robert Love

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On Thu, 2002-07-18 at 12:19, Richard B. Johnson wrote:

> Okay then. When would it be useful? I read that it would be useful
> in embedded systems, but everything that will ever run on embedded
> systems is known at compile time, or is uploaded by something written
> by an intelligent developer, so I don't think it's useful there. I
> 'do' embedded systems and have never encountered OOM.

I work for an embedded systems company and our customers do have OOM
problems. The problem is not so much that they _do_ OOM but that they
_can_ - killing a random process is the last thing they want.

Same issue with HA etc... it's not preventing OOM so much as being
prepared for it, by pushing the failures into the allocation routines
and out from the page access.

Certainly Alan and RedHat found a need for it, too. It should be pretty
clear why this is an issue...

> I keep seeing the same thing about protecting root against fork and
> malloc bombs and I get rather "malloc()" about it. All distributions
> I have seen, so far, come with `gcc` and `make`. The kiddies can
> crap all over their kernels to their heart's content. I don't think
> Linux should be reduced to the lowest common denominator.

This is the argument I was making before -- I do not think strict
overcommit should solve this problem (nor can it fully). This is a
problem to be solved by per-user resource limits.

It is not an issue I care much for either, but this is more than just a
"kiddies" issue. Unbounded memory growth can happen without evil
intentions, and in places like, e.g., a university shell server, it is
important to protect against.

Robert Love


2002-07-18 19:22:27

by John Stoffel

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4


Szakacsits> About 99% of the people don't know about, don't understand
Szakacsits> or don't care about resource limits. But they do care
Szakacsits> about cleaning up when the mess comes. Adding reserved root
Szakacsits> memory would be a couple of lines

So what does this buy you when root itself runs the box into the
ground? Or if a dumb user decides to run his process as root, and it
takes down the system?

You're arguing for the wrong thing here. What Robert is doing is
making sure that when a process asks for memory, it can only succeed
when there is physical memory available.

Linux currently runs in over-commit mode, since it actually makes a lot
of sense. Most processes ask for potentially huge amounts of memory,
but never use it. So if I have 10mb of RAM, and process A asks for
5mb, and process B asks for 5mb, I'm OK. If process B asks for 6mb
then one of two things happens:

Over commit mode:
process B succeeds.

Strict overcommit mode:
process B gets a malloc failure and can't proceed.

Even if A and B only want to use 2mb of RAM each, and the system would
have 6mb free, they could *ask* for the extra RAM and overcommit the
system and hit the OOM situation.

DEC OSF/1 had a toggle way back when in the early 90s to turn this
feature on and off. Generally, being a school, we turned it off
(i.e. allowed lazy allocation) but for some core servers, we turned it
on to make sure the system was more stable.

In any case, what you're asking for is a *stupid user safety buffer*
and that's not sane. As I said before, keeping around a few MB for
root doesn't do shit when a root process runs and pushes the system
into OOM.

John



2002-07-18 19:29:03

by Richard B. Johnson

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On 18 Jul 2002, Robert Love wrote:

> On Thu, 2002-07-18 at 11:56, Richard B. Johnson wrote:
>
> > What should have happened is that each of the tasks needs only about
> > 4k until they actually access something. Since they can't possibly
> > access everything at once, we need to fault in pages as needed,
> > not all at once. This is what 'overcommit' is, and it is necessary.
>
> I should also mention this is demand paging, not overcommit.
>
> Overcommit is the property of succeeding more allocations than there is
> memory in the address space. The idea being that allocations are lazy,
> things often do not use their full allocations, etc. etc. as you
> mentioned.
>
> It is typically a good thing since it lowers VM pressure.
>
> It is not always a good thing, for numerous reasons, and it becomes
> important in those scenarios to ensure that all allocations can be met
> by the backing store and consequently we never find ourselves with more
> memory committed than available and thus never OOM.
>
> This has nothing to do with paging and resource limits as you say. Btw,
> without this it is possible to OOM any machine. OOM is a by-product of
> allowing overcommit and poor accounting (and perhaps poor
> software/users), not an incorrectly configured machine.

It has everything to do with demand-paging. Since on single-CPU
machines, there is only one task executing at any one time, that
single task can own and use every bit of RAM on the whole machine
if virtual memory works correctly. For performance reasons, it
may not actually use all the RAM but, in principle, it is possible.

If you don't allow that, the single task can use only the RAM that
was not allocated to other tasks. At the time an allocation is made,
the kernel cannot know what resources may be available when the task
requesting the allocation actually starts to use those allocated
resources. Instead, the kernel allocates resources based upon what
it 'knows' at the present time. Since it can't see the future any more
than you or I, the fact that N processes just called exit() before
the requesting task touched a single page can't be known.

FYI multiple CPU machines have compounded the problems because there
can be several things happening at the same time. Although the MM
is locked so it's single-threaded, you have a before/after resource
history condition that can't be anticipated.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

Windows-2000/Professional isn't.


2002-07-18 19:35:51

by Alan

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

> Same issue with HA etc... it's not preventing OOM so much as being
> prepared for it, by pushing the failures into the allocation routines
> and out from the page access.
>
> Certainly Alan and RedHat found a need for it, too. It should be pretty
> clear why this is an issue...

The code was written initially because we had large customers with a
direct requirement for the facility. It is also very relevant to
embedded systems where you want controlled failure.

2002-07-18 19:38:28

by Daniel Gryniewicz

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On Thu, 2002-07-18 at 15:35, Richard B. Johnson wrote:
> On 18 Jul 2002, Robert Love wrote:
> > [...]
>
> It has everything to do with demand-paging. Since on single-CPU
> machines, there is only one task executing at any one time, that
> single task can own and use every bit of RAM on the whole machine
> if virtual memory works correctly. For performance reasons, it
> may not actually use all the RAM but, in principle, it is possible.
>
> If you don't allow that, the single task can use only the RAM that
> was not allocated to other tasks. At the time an allocation is made,
> the kernel cannot know what resources may be available when the task
> requesting the allocation actually starts to use those allocated
> resources. Instead, the kernel allocates resources based upon what
> it 'knows' at the present time. Since it can't see the future any more
> than you or I, the fact that N processes just called exit() before
> the requesting task touched a single page can't be known.
>
> FYI multiple CPU machines have compounded the problems because there
> can be several things happening at the same time. Although the MM
> is locked so it's single-threaded, you have a before/after resource
> history condition that can't be anticipated.
>
> Cheers,
> Dick Johnson
>

Is it possible that you're confusing "backing store" with "physical
RAM"? I was under the impression that strict overcommit used both RAM
and SWAP when deciding whether an allocation should succeed. If you've
exceeded all of RAM and all of swap, you are OOM. Period.

Daniel

--
Recursion n.:
See Recursion.
-- Random Shack Data Processing Dictionary


2002-07-18 19:38:15

by Alan

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On Thu, 2002-07-18 at 19:52, Robert Love wrote:
> On Thu, 2002-07-18 at 12:58, Alan Cox wrote:
>
> > Adjusting the percentages to have a root-only zone is doable. It helps
> > in some conceivable cases but not all. Do people think it's important? If
> > so I'll add it.
>
> Changing the rules would be easy, but you would need to make the
> accounting check for root vs non-root and keep track accordingly.
> Admittedly not hard but not entirely pretty either.
>
> I still contend the issues are not related. It would make more sense to
> me to do resource limits to solve this problem - rlimits are something
> Rik has on his TODO and supposedly easy to add to rmap.

rmap supports rlimit AS which gives you paging control. Neither of them
support workload management or partitioned accounting of any kind. That
would need the beancounter patches resurrecting.
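
For reference, rlimit AS is the standard setrlimit() interface; a
minimal sketch (the 64 MB cap is an arbitrary example):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
        /* Sketch: cap this process's address space at 64 MB.  Once the
         * cap is reached, mmap/brk -- and hence malloc -- fail with
         * ENOMEM. */
        struct rlimit rl = { 64UL << 20, 64UL << 20 };  /* soft, hard */
        if (setrlimit(RLIMIT_AS, &rl) != 0) {
                perror("setrlimit");
                return 1;
        }
        return 0;
}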

2002-07-18 20:17:09

by Richard B. Johnson

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On 18 Jul 2002, Daniel Gryniewicz wrote:

> On Thu, 2002-07-18 at 15:35, Richard B. Johnson wrote:
> > [...]
>
> Is it possible that you're confusing "backing store" with "physical
> RAM"? I was under the impression that strict overcommit used both RAM
> and SWAP when deciding whether an allocation should succeed. If you've
> exceeded all of RAM and all of swap, you are OOM. Period.
>
> Daniel

No. And I'm not confused. Consider Virtual RAM as all the real RAM
and all the backing store. If I consider this an absolute limit,
then I am not able to fully use all the system resources.

Let's say the system has 20 'units' of a resource (virtual RAM).

Example:

Task (1) allocates 10 units, actually uses 1 so far.
Task (2) allocates 10 units, actually uses 2 so far.
Task (3) wants to allocate 7 units. It can't so it exits in error.
Task (1) uses all its units then exits normally.
Task (2) uses 1 more unit then exits normally.

So you forced a task to terminate in error because you established
hard limits on your resource.

What could (should?) have happened, is that all the memory allocations
could have succeeded, even though they exceeded the 20 units of resource.
From the history, we see that Task (1) actually used 10. Task (2)
actually used 3 units. This means that there were really 20 - 13 = 7 units
available when task 3 requested 7 units. But, the system 'knew' what
its commitment was so it refused.

Now, since the system is dynamic, it is possible that Task (1) and
Task (2) might not even exist by the time Task (3) actually wants to
use its memory. In that case, there would have been 20 units available,
but we will never know because Task 3 exited in error.

Now, what this should point out is that a complete VM system, although
it can't anticipate the future, can put things off until the future
where things may be better. This is the true 'fix' of OOM (if it
needs fixing).

Let's say I make a fork bomb, as root, no protection. The MM knows
that it can't give me any more RAM right now so I am put to sleep.
Other tasks run fine, at full speed. As they exit, the fork-bomb
may get their memory. Since the MM knows how much resource could ever
become available, as a single task exceeds this limit, the MM
knows that it cannot get any more in the future so the MM knows it's
a bomb. If the system doesn't kill the bomb, eventually, I may fail the
system because there may be no resource available to log-in and kill the
bomb, but as long as there is one task running, connected to a terminal,
that task can be used to kill the fork-bomb.

The problem with putting memory allocation off to the future, as
I see it, is the existing paging code wasn't designed for it. If
a task that page-faulted could also sleep, the problem could be
solved.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Windows-2000/Professional isn't.

2002-07-18 20:40:41

by Daniel Gryniewicz

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On Thu, 2002-07-18 at 16:23, Richard B. Johnson wrote:
> On 18 Jul 2002, Daniel Gryniewicz wrote:
> > [...]
>
> No. And I'm not confused. Consider Virtual RAM as all the real RAM
> and all the backing store. If I consider this an absolute limit,
> then I am not able to fully use all the system resources.
>
> [...]
>
> The problem with putting memory allocation off to the future, as
> I see it, is the existing paging code wasn't designed for it. If
> a task that page-faulted could also sleep, the problem could be
> solved.
>

So don't turn on strict overcommit. What you describe is what we have
now, and OOM is the result of allowing requests for more than we
actually have based on the assumption processes won't use it all. If
they do, you have problems. If you don't want those problems, you turn
on strict overcommit, and live with the allocation failures.

Daniel
--
Recursion n.:
See Recursion.
-- Random Shack Data Processing Dictionary


2002-07-18 22:05:37

by Adrian Bunk

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit

On 11 Jul 2002, Robert Love wrote:

>...
> In the strictest of modes, it should be impossible to allocate more
> memory than available and impossible to OOM. All memory failures should
> be pushed down to the allocation routines -- malloc, mmap, etc.
>...

Out of interest:

How is it assured that it's impossible to OOM when the amount of memory
shrinks?

IOW:
- allocate very much memory
- "swapoff -a"

> Enjoy,
>
> Robert Love

cu
Adrian

--

You only think this is a free country. Like the US the UK spends a lot of
time explaining its a free country because its a police state.
Alan Cox




2002-07-18 22:26:28

by Robert Love

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit

On Thu, 2002-07-18 at 15:08, Adrian Bunk wrote:

> Out of interest:
>
> How is it assured that it's impossible to OOM when the amount of memory
> shrinks?
>
> IOW:
> - allocate very much memory
> - "swapoff -a"

Well, seriously: don't do that.

But `swapoff' will not succeed if there is not enough swap or physical
memory to move the pages to... if it does succeed, then there is enough
storage elsewhere. At that point, you are not OOM but you may now have
more address space allocated than the strict accounting would typically
allow - thus no allocations will succeed so you should not be able to
OOM.

Robert Love

2002-07-18 22:27:49

by Robert Love

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On Thu, 2002-07-18 at 15:24, Rik van Riel wrote:

> I see no reason to not merge this (useful) part. Not only
> is it useful on its own, it's also a necessary ingredient
> of any "complete solution" to control per-user resource
> limits.

I am glad we agree here - resource limits and strict overcommit are two
separate solutions to various problems. Some they solve individually,
others they solve together.

I may use one, the other, both, or neither. A clean abstract solution
allows this.

Robert Love

2002-07-18 22:22:08

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On Thu, 18 Jul 2002, Szakacsits Szabolcs wrote:

> And my point was that this is only part of the solution
> making Linux a more reliable

I see no reason to not merge this (useful) part. Not only
is it useful on its own, it's also a necessary ingredient
of any "complete solution" to control per-user resource
limits.

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-07-18 22:21:08

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On Thu, 18 Jul 2002, Szakacsits Szabolcs wrote:
> On Thu, 18 Jul 2002, Szakacsits Szabolcs wrote:
> > And my point (you asked for comments) was that this is only (the
> > harder) part of the solution making Linux a more reliable (no OOM
> > killing *and* root always has control) and cost-effective platform
> > (no need for occasionally very complex and continuous resource limit
> > setup/adjusting, especially for inexpert home/etc users).
>
> Ahh, I figured out your target, embedded devices. Yes it's good for
> that but not enough for general purpose.

However, you NEED this patch in order to implement something
that is good enough for general purpose (i.e. per-user resource
accounting).

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-07-19 06:47:39

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4


On Thu, 18 Jul 2002 [email protected] wrote:
> Szakacsits> About 99% of the people don't know about, don't understand
> Szakacsits> or don't care about resource limits. But they do care
> Szakacsits> about cleaning up when the mess comes. Adding reserved root
> Szakacsits> memory would be a couple of lines
>
> So what does this buy you when root itself runs the box into the
> ground? Or if a dumb user decides to run his process as root, and it
> takes down the system?

You would be able to point out to them that running stuff as root is
the worst scenario from a security and reliability point of view. You
can argue about security now but not reliability, because it doesn't
matter who owns the "runaway" processes; the end result is either
uncontrolled process killing (default kernel) or livelock (strict
overcommit patch).

You can't solve everybody's problems of course, but you can educate
them; however, at present the kernel lacks the features to do so [and
for a moment *please* ignore resource control/accounting with all
its benefits and deficiencies on Linux; there are lots of ways to do
resource control and Linux is quite immature here at present].

> You're arguing for the wrong thing here.

How about consulting some Sun or ex-DEC engineers about why they have
had this feature for (internet) decades? Because by default they use
strict overcommit, and that's shooting yourself in the foot without
reserved root VM on a general purpose system.

Szaka

2002-07-19 07:12:25

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4


On 18 Jul 2002, Alan Cox wrote:
> Adjusting the percentages to have a root-only zone is doable. It helps
> in some conceivable cases but not all.

For 2.2 kernels I found that 5 MB, reserved from swap until it was
needed, was enough to ssh to the box and fix whatever was going on
(whatever: real-world cases like slashdot effects, exploits from
packetstorm and other home-made test cases that heavily overcommitted
memory). In any case, the amount reserved was controllable via /proc.

And I do know it doesn't solve all cases, but covering 99% of the real
world issues isn't a bad start at all, imho.

Szaka

2002-07-19 08:25:22

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4


On 18 Jul 2002, Robert Love wrote:
> Btw, without this it is possible to OOM any machine. OOM is a
> by-product of allowing overcommit and poor accounting (and perhaps
> poor software/users), not an incorrectly configured machine.

Very well said. Now I will try to explain again what's missing from
the patch: livelock is a by-product of allowing strict VM overcommit
and poor accounting (and perhaps poor software/users), not an
incorrectly configured machine.

So where is the solution for the "poor accounting (and perhaps poor
software/users), not an incorrectly configured machine" users? These
are part of life, and please don't claim all your work was perfect at
the first shot and automatically adapted to all changing environments
without your ever touching it again on a general purpose system. Even
if it were true, not everybody is a supergenius.

So which one is better? An OOM killer that considers root-owned
processes in making its decision, or strict VM overcommit that doesn't
distinguish root and non-root users and potentially will livelock [if
you don't have some custom solution, like the "trigger OOM handler
through sysrq" patch posted here a year ago]?

For embedded systems the latter, and for general purpose systems the
former, is better on average; however, this is not linux-embedded, and
later on people using Linux for general purposes could get the
impression that strict VM overcommit is useful for them and potentially
would end up in a worse situation than without it (see the example I
sent: the default kernel OOM-killed the bad process; with your patch I
had to reset the box).

*However* distinguishing root and non-root users also in strict VM
overcommit would make a significant difference for general purpose
systems, this was always my point.

Can you see the non-orthogonality now?

Szaka

2002-07-19 08:42:52

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4


On Thu, 18 Jul 2002, Rik van Riel wrote:
> However, you NEED this patch in order to implement something
> that is good enough for general purpose

Yes Rik, apparently you missed when I wrote: "Strict overcommit is
necessary but not sufficient".

Szaka

2002-07-19 10:59:19

by Adrian Bunk

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit

On 18 Jul 2002, Robert Love wrote:

> > Out of interest:
> >
> > How is it assured that it's impossible to OOM when the amount of memory
> > shrinks?
> >
> > IOW:
> > - allocate very much memory
> > - "swapoff -a"
>
> Well, seriously: don't do that.
>
> But `swapoff' will not succeed if there is not enough swap or physical
> memory to move the pages to... if it does succeed, then there is enough
> storage elsewhere. At that point, you are not OOM but you may now have
> more address space allocated than the strict accounting would typically
> allow - thus no allocations will succeed so you should not be able to
> OOM.

"thus no allocations will succeed" seems to be a synonymous for "the
machine is more or less dead"?

And this might be a real problem:

If you have an X session with many open programs and leave it with an
xlock overnight, it's quite usual that all your applications are swapped
out the next morning. A convenient way to get all your applications
swapped in again is a "swapoff -a; swapon -a". You might argue that this
is "wrong", but as far as I know it's the best way to get all your
applications swapped in again (and I know several people doing it this
way).

If "no allocations will succeed" the first command that will fail is the
"swapon -a"...

> Robert Love

cu
Adrian

--

You only think this is a free country. Like the US the UK spends a lot of
time explaining its a free country because its a police state.
Alan Cox

2002-07-19 18:03:55

by Robert Love

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit for stock 2.4

On Fri, 2002-07-19 at 00:30, Szakacsits Szabolcs wrote:

> *However* distinguishing root and non-root users also in strict VM
> overcommit would make a significant difference for general purpose
> systems, this was always my point.
>
> Can you see the non-orthogonality now?

Nope, I still disagree and there is no point going back and forth.

We both agree that there are situations where both resource accounting
(or some sort of root protection like you want) and strict overcommit
are required.

I contend there are situations where only one or the other is needed.

More importantly, I argue the two things should be kept separate.
Putting some root safety net into strict accounting is a hack (how much
of a net? etc.). You want to keep users from ruining things? Get
per-user resource limits. You want to keep the machine from
overcommitting memory and thus not OOMing? Get strict accounting. You
want both? Use both.

I provided the first piece.

Robert Love

2002-07-20 00:31:09

by Alan Cox

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit

>
> How is it assured that it's impossible to OOM when the amount of memory
> shrinks?
>
> IOW:
> - allocate very much memory
> - "swapoff -a"

Make swapoff -a return -ENOMEM

I've not done this on the basis that this is root-specific stupidity
and generally shouldn't be protected against.

2002-07-21 10:41:38

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit


On Fri, 19 Jul 2002, Alan Cox wrote:
> > How is it assured that it's impossible to OOM when the amount of memory
> > shrinks?
> > IOW:
> > - allocate very much memory
> > - "swapoff -a"
>
> Make swapoff -a return -ENOMEM
>
> I've not done this on the basis that this is root-specific stupidity
> and generally shouldn't be protected against.

Recommended reading: MIT's Magazine of Innovation, Technology Review,
August 2002 issue, cover story: Why Software Is So Bad?

Next you might read: "... prominent, leading Linux kernel developer
publicly labels users stupid instead of handling a special case
[that is ironically used as a workaround for one of the many system
software deficiencies] in which case the system software would hang
using a new feature the developer is about to add and admitted to be
paid for ..."

Adrian deserves thanks for spotting and reporting the issue
[and there *are* other use cases for the above-mentioned swapoff -a,
some also to overcome kernel bugs].

With all respect, Alan, the criticism isn't personal but a reaction to
a trendy phenomenon that should be addressed if developers care about
user issues.

Szaka

2002-07-21 12:17:21

by Alan

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit

On Sun, 2002-07-21 at 10:10, Szakacsits Szabolcs wrote:
>
> On Fri, 19 Jul 2002, Alan Cox wrote:
> > > How is it assured that it's impossible to OOM when the amount of memory
> > > shrinks?
> > > IOW:
> > > - allocate very much memory
> > > - "swapoff -a"
> >
> > Make swapoff -a return -ENOMEM
> >
> > I've not done this on the basis that this is root-specific stupidity
> > and generally shouldn't be protected against.
>
> Recommended reading: MIT's Magazine of Innovation, Technology Review,
> August 2002 issue, cover story: Why Software Is So Bad?
>
> Next you might read: "... prominent, leading Linux kernel developer
> publicly labels users stupid instead of handling a special case

I would suggest you do something quite different. Go and read what K&R
had to say about the design of Unix. One of the design goals of Unix is
that the system does not think it knows better than the administrator.
That is one of the reasons unix works well and is so flexible.

Alan

2002-07-21 12:43:45

by Adrian Bunk

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit

On 21 Jul 2002, Alan Cox wrote:

> On Sun, 2002-07-21 at 10:10, Szakacsits Szabolcs wrote:
> > On Fri, 19 Jul 2002, Alan Cox wrote:
> > > Make swapoff -a return -ENOMEM
> > >
> > > I've not done this on the basis that this is root-specific stupidity
> > > and generally shouldn't be protected against.
> >
> > Recommended reading: MIT's Magazine of Innovation, Technology Review,
> > August 2002 issue, cover story: Why Software Is So Bad?
> >
> > Next you might read: "... prominent, leading Linux kernel developer
> > publicly labels users stupid instead of handling a special case
>
> I would suggest you do something quite different. Go and read what K&R
> had to say about the design of Unix. One of the design goals of Unix is
> that the system does not think it knows better than the administrator.
> That is one of the reasons unix works well and is so flexible.

The problem is that at the time K&R said this only real men (tm) were
administrators of UNIX systems. Nowadays clueless people like me are
administrators of their Linux system at home. ;-)

With enough stupidity root can always trash his system, but if, as
Robert says, the state of the system will be that "no allocations will
succeed", which seems to be synonymous with "the system is practically
dead", it is IMHO a good idea to let "swapoff -a" return -ENOMEM.

> Alan

cu
Adrian

--

You only think this is a free country. Like the US the UK spends a lot of
time explaining its a free country because its a police state.
Alan Cox




2002-07-21 14:30:37

by Jos Hulzink

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit

On Sun, 21 Jul 2002, Adrian Bunk wrote:

> On 21 Jul 2002, Alan Cox wrote:
>
> > I would suggest you do something quite different. Go and read what K&R
> > had to say about the design of Unix. One of the design goals of Unix is
> > that the system does not think it knows better than the administrator.
> > That is one of the reasons unix works well and is so flexible.
>
> The problem is that at the time K&R said this only real men (tm) were
> administrators of UNIX systems. Nowadays clueless people like me are
> administrators of their Linux system at home. ;-)
>
> With enough stupidity root can always trash his system, but if, as
> Robert says, the state of the system will be that "no allocations will
> succeed", which seems to be synonymous with "the system is practically
> dead", it is IMHO a good idea to let "swapoff -a" return -ENOMEM.
>

Maybe it is an option to add the --I_know_Im_stupid option to the swapoff
command line? (Also known as the --force flag.) This way we can both
return an error when the OS lacks memory and force a swapoff.

Agreed, the system is practically dead when no allocations will succeed,
but maybe killing user tasks when root needs memory or something is an
option... (Better a few angry users than a crashed server; besides, it is
not something that should happen every day.)

Jos

2002-07-21 15:31:19

by Alan

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit

On Sun, 2002-07-21 at 13:46, Adrian Bunk wrote:
> With enough stupidity root can always trash his system, but if, as
> Robert says, the state of the system will be that "no allocations will
> succeed", which seems to be synonymous with "the system is practically
> dead", it is IMHO a good idea to let "swapoff -a" return -ENOMEM.

In the overcommit mode I already suggested that. An administrator can
turn off overcommit protection if he really, really needs to swapoff
regardless of the consequences (e.g. a failing swap disk).

2002-07-21 16:54:29

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit


On 21 Jul 2002, Alan Cox wrote:

> One of the design goals of Unix is that the system does not think
> it knows better than the administrator.

What about the many hundred counter-examples (e.g. umount gives EBUSY,
kill can't kill processes in uninterruptible sleep, etc, etc)? Why the
system knows better than the admin in these cases? Why not just destroy
the data or crash the system, as you suggest in your case? Why this
inconsistency?

Szaka

2002-07-21 17:46:31

by Alan

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit

On Sun, 2002-07-21 at 16:23, Szakacsits Szabolcs wrote:
> What about the many hundred counter-examples (e.g. umount gives EBUSY,

umount -f.

> kill can't kill processes in uninterruptible sleep, etc, etc)? Why the

In these cases the kernel infrastructure doesn't support the ability to
recover from such a state, very different from stopping a user doing
something it can handle perfectly well.

You'll find plenty of people who believe the umount behaviour is
incorrect (and it should just GC them) as well as the fact that
uninterruptible sleep is a bad idea.

2002-07-22 09:37:47

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit


On 21 Jul 2002, Alan Cox wrote:
> On Sun, 2002-07-21 at 16:23, Szakacsits Szabolcs wrote:
> > What about the many hundred counter-examples (e.g. umount gives EBUSY,
>
> umount -f.

That wasn't the question. *Why* does umount (without -f) know better than
the admin? You answered *how* to unconditionally umount.

> > kill can't kill processes in uninterruptible sleep, etc.)? Why does the
> In these cases the kernel infrastructure doesn't support the ability to
> recover from such a state,

Again you answered something else, but ironically you just refuted
yourself: there are cases when the system knows better than the admin.

> very different from stopping a user doing something it can handle
> perfectly well.

What the patch claims is no OOM. In the swapoff case there are potential
OOMs. That is called a bug (the feature does not follow the behavior that
was specified when the admin turned it on). Why do you call this bug a
perfectly handled case? What differentiates this case from all the others
where the system knows better than to destroy your data without at least
a "force" operation, for example?

Szaka

2002-07-22 09:34:39

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit


On Sun, 21 Jul 2002, Thunder from the hill wrote:

> > (e.g. umount gives EBUSY,
> Simply because you _will_ lose data if you umount a device that's being
> scribbled on.

Potentially you _will_ lose data in the swapoff case. The kernel could
know when it is safe to do, but there is no way for the admin to
calculate it.

Szaka

2002-07-22 10:20:25

by Thunder from the hill

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit

Hi,

On Mon, 22 Jul 2002, Szakacsits Szabolcs wrote:
> Potentially you _will_ lose data in the swapoff case. The kernel could
> know when it is safe to do, but there is no way for the admin to
> calculate it.

The difference is between definitely losing data and potentially losing
data.

Regards,
Thunder

2002-07-22 12:45:42

by Alan

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit

On Mon, 2002-07-22 at 13:45, Hugh Dickins wrote:
> In strict no-overcommit mode, it should probably decide in advance
> whether to embark on swapping off: I think you suggested that
> earlier in the thread, that it's acceptable to switch overcommit
> mode temporarily to achieve whichever behaviour is desirable?

Yes. I have no problem with

#swapoff -a
swapoff: out of memory
#vmctl overcommit 1
#swapoff -a
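
A minimal C sketch of the same switch, assuming the policy lives at
/proc/sys/vm/overcommit_memory (the sysctl interface described in
Documentation/vm/overcommit-accounting) and that mode 1 permits
overcommit; treat both as assumptions:

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/vm/overcommit_memory", "w");

	if (!f) {
		perror("overcommit_memory");
		return 1;
	}
	/* mode 1: always allow overcommit, so swapoff can proceed */
	fprintf(f, "1\n");
	return fclose(f) ? 1 : 0;
}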

2002-07-22 11:44:30

by Alan

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit

On Mon, 2002-07-22 at 09:08, Szakacsits Szabolcs wrote:
> > > kill can't kill processes in uninterruptible sleep, etc.)? Why does the
> > In these cases the kernel infrastructure doesn't support the ability to
> > recover from such a state,
>
> Again you answered something else, but ironically you just refuted
> yourself: there are cases when the system knows better than the admin.

And purists consider those flaws.

> What the patch claims is no OOM. In the swapoff case there are potential
> OOMs. That is called a bug (the feature does not follow the behavior that
> was specified when the admin turned it on). Why do you call this bug a
> perfectly handled case? What differentiates this case from all the others
> where the system knows better than to destroy your data without at least
> a "force" operation, for example?

Let's put this bluntly. Your swap disk is losing sectors left, right and
centre. You propose a system where the kernel says "sorry, might cause an
OOM" and I lose everything as the disk goes down. Letting the admin set
policy means I can swapoff, maybe lose a program or two to OOM, but not
lose the entire system in the process.

It's quite clear that being able to override the kernel's assumptions
about what is right is sensible. It always has been.

2002-07-22 12:42:32

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit

On 22 Jul 2002, Alan Cox wrote:
>
> Let's put this bluntly. Your swap disk is losing sectors left, right and
> centre. You propose a system where the kernel says "sorry, might cause an
> OOM" and I lose everything as the disk goes down. Letting the admin set
> policy means I can swapoff, maybe lose a program or two to OOM, but not
> lose the entire system in the process.
>
> It's quite clear that being able to override the kernel's assumptions
> about what is right is sensible. It always has been.

Suggested compromise: swapoff (in loose overcommit-permitted mode)
should always swap off as much as it can, a small margin short of
causing OOM, but should then give up with ENOMEM (leaving the whole
swap area available again, for consistency). Seeing its failure,
the admin can then choose processes to kill (overriding the kernel's
assumptions about what is right to kill), and try swapoff again.

At present it never gives up: I do intend to fix that.
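
A standalone sketch of that give-up behaviour in userspace C; the loop
shape, the safety margin and all counts are assumptions, not the actual
try_to_unuse() code:

#include <errno.h>
#include <stdio.h>

#define SAFETY_MARGIN 64		/* pages kept free; value invented */

static long free_pages = 300;		/* pretend VM state */
static long pages_in_area = 500;	/* pages still in the swap area */

static int swapoff_model(void)
{
	while (pages_in_area > 0) {
		if (free_pages <= SAFETY_MARGIN)
			return -ENOMEM;	/* stop a margin short of OOM */
		free_pages--;		/* one swap page brought back in */
		pages_in_area--;
	}
	return 0;			/* area fully swapped off */
}

int main(void)
{
	int ret = swapoff_model();

	/* on failure the whole area stays enabled, as suggested above */
	printf("swapoff: %s (%ld pages left in the area)\n",
	       ret ? "out of memory" : "done", pages_in_area);
	return 0;
}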

In strict no-overcommit mode, it should probably decide in advance
whether to embark on swapping off: I think you suggested that
earlier in the thread, that it's acceptable to switch overcommit
mode temporarily to achieve whichever behaviour is desirable?
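
A matching sketch of the decide-in-advance test for strict mode; the
identifiers and figures are invented, though the patch does keep a
counter of the committed address space along these lines:

#include <errno.h>
#include <stdio.h>

/* pretend VM state, in pages; all values invented */
static long free_pages = 1000;
static long total_swap_pages = 4000;
static long committed_pages = 4200;

/* refuse up front if removing the area would leave commitments unbacked */
static int swapoff_precheck(long area_pages)
{
	long remaining = free_pages + total_swap_pages - area_pages;

	return committed_pages > remaining ? -ENOMEM : 0;
}

int main(void)
{
	printf("swapoff of a 4000-page area: %s\n",
	       swapoff_precheck(4000) ? "refused (ENOMEM)" : "allowed");
	return 0;
}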

Hugh

2002-07-22 01:09:16

by Thunder from the hill

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit

Hi,

On Sun, 21 Jul 2002, Szakacsits Szabolcs wrote:
> What about the many hundred counter-examples

These cases are different.

> (e.g. umount gives EBUSY,

Simply because you _will_ lose data if you umount a device that's being
scribbled on.

> kill can't kill processes in uninterruptible sleep

Because uninterruptible sleep means the process is waiting for data.
If you destroy the process and kill an interrupt handler, you _will_
crash.

> etc.)? Why does the system know better than the admin in these cases? Why
> not just destroy the data and crash the system, as you suggest in your
> case?

This case is different. If you swapoff /dev/scsi/path/to/dead/disk, your
system will likely live on. Possibly you'll have some tasks killed, but
we're still up.

Alan was referring to cases where it's unlikely that we die of it; you're
referring to cases where it's clear that the system won't get through.

Regards,
Thunder

2002-07-22 00:45:21

by Thunder from the hill

[permalink] [raw]
Subject: Re: [PATCH] strict VM overcommit

Hi,

On Sun, 21 Jul 2002, Jos Hulzink wrote:
> Maybe it is an option to add an --I_know_Im_stupid flag to the swapoff
> command line (also known as the --force flag)? This way we can both
> return an error when the OS lacks memory and still force a swapoff.

What's wrong with the current behavior? If the system can't live without
swap, why force it dead?

Regards,
Thunder