2005-12-14 07:50:38

by Matthew Dobson

Subject: [RFC][PATCH 0/6] Critical Page Pool

Here is the latest version of the Critical Page Pool patches. Besides
bugfixes, I've removed all the slab cleanup work from the series. Also,
since one of the main questions about the patch series seems to revolve
around how to appropriately size the pool, I've added some basic statistics
about the critical page pool, viewable by reading
/proc/sys/vm/critical_pages. The code now exports how many pages were
requested, how many pages are currently in use, and the maximum number of
pages that were ever in use.

The overall purpose of this patch series is to allow a system administrator
to reserve a number of pages in a 'critical pool' that is set aside for
situations when the system is 'in emergency'. It is up to the individual
administrator to determine when his/her system is 'in emergency'. This is
not meant to (necessarily) anticipate OOM situations, though that is
certainly one possible use. The purpose this was originally designed for
is to allow the networking code to keep functioning despite the system
losing its (potentially networked) swap device, and thus temporarily
putting the system under extreme memory pressure.

Any comments about the code or the overall design are very welcome.
Patches against 2.6.15-rc5.

-Matt


2005-12-14 07:52:51

by Matthew Dobson

Subject: [RFC][PATCH 1/6] Create Critical Page Pool

Create the basic Critical Page Pool. Any allocation specifying
__GFP_CRITICAL will, as a last resort before failing the allocation, try to
get a page from the critical pool. For now, only singleton (order 0) pages
are supported.

-Matt


Attachments:
critical_pool.patch (10.75 kB)

2005-12-14 07:54:45

by Matthew Dobson

Subject: [RFC][PATCH 2/6] in_emergency Trigger

Create the 'in_emergency' trigger, to allow userspace to turn access to the
critical pool on and off. The rationale behind this is to ensure that the
critical pool stays full for *actual* emergency situations, and isn't used
for transient, low-mem situations.

-Matt


Attachments:
emergency_trigger.patch (4.81 kB)

2005-12-14 07:56:41

by Matthew Dobson

Subject: [RFC][PATCH 3/6] Slab Prep: get/return_object

Create 2 helper functions in mm/slab.c: get_object() and return_object().
These functions reduce some existing duplicated code in the slab allocator
and will be used when adding Critical Page Pool support to the slab allocator.

-Matt


Attachments:
slab_prep-get_return_object.patch (3.82 kB)

2005-12-14 07:58:11

by Matthew Dobson

Subject: [RFC][PATCH 4/6] Slab Prep: slab_destruct()

Create a helper function for slab_destroy() called slab_destruct(). Remove
some ifdefs inside functions and generally make the slab destroying code
more readable prior to slab support for the Critical Page Pool.

-Matt


Attachments:
slab_prep-slab_destruct.patch (2.04 kB)

2005-12-14 07:59:33

by Matthew Dobson

Subject: [RFC][PATCH 5/6] Slab Prep: Move cache_grow()

Move cache_grow() a few lines further down in mm/slab.c to gain access to a
couple debugging functions that will be used by the next patch. Also,
rename a goto label and fixup a couple comments.

-Matt


Attachments:
slab_prep-cache_grow.patch (5.55 kB)

2005-12-14 08:02:59

by Matthew Dobson

Subject: [RFC][PATCH 6/6] Critical Page Pool: Slab Support

Finally, add support for the Critical Page Pool to the Slab Allocator. We
need the slab allocator to be at least marginally aware of the existence of
critical pages, or else we leave open the possibility of non-critical slab
allocations stealing objects from 'critical' slabs. We add a separate,
node-unspecific list to kmem_cache_t called slabs_crit. We keep all
partial and full critical slabs on this list. We don't keep empty critical
slabs around, in the interest of giving this memory back to the VM ASAP in
what is typically a high memory pressure situation.

-Matt


Attachments:
slab_support.patch (7.06 kB)

2005-12-14 08:19:36

by Pekka Enberg

Subject: Re: [RFC][PATCH 3/6] Slab Prep: get/return_object

Hi Matt,

On 12/14/05, Matthew Dobson <[email protected]> wrote:
> Create 2 helper functions in mm/slab.c: get_object() and return_object().
> These functions reduce some existing duplicated code in the slab allocator
> and will be used when adding Critical Page Pool support to the slab allocator.

May I suggest different naming, slab_get_obj and slab_put_obj ?

Pekka

2005-12-14 08:37:42

by Pekka Enberg

Subject: Re: [RFC][PATCH 4/6] Slab Prep: slab_destruct()

On 12/14/05, Matthew Dobson <[email protected]> wrote:
> Create a helper function for slab_destroy() called slab_destruct(). Remove
> some ifdefs inside functions and generally make the slab destroying code
> more readable prior to slab support for the Critical Page Pool.

Looks good. How about calling it slab_destroy_objs instead?

Pekka

2005-12-14 10:48:42

by Andrea Arcangeli

Subject: Re: [RFC][PATCH 1/6] Create Critical Page Pool

Hi Matthew,

On Tue, Dec 13, 2005 at 11:52:46PM -0800, Matthew Dobson wrote:
> Create the basic Critical Page Pool. Any allocation specifying
> __GFP_CRITICAL will, as a last resort before failing the allocation, try to
> get a page from the critical pool. For now, only singleton (order 0) pages
> are supported.

Hmm sorry, but this design looks wrong to me. Since the caller has to
use __GFP_CRITICAL anyway, why don't you build this critical pool
_outside_ the page allocator exactly like the mempool does?

Then you will also get a huge advantage: it allows creating more than
one critical pool without having to add a __GFP_CRITICAL2 next month.

So IMHO, if anything, you should create something like a mempool (if the
mempool isn't good enough already for your usage), so more subsystems
can register their critical pools. Call it criticalpool.c or similar, but
I wouldn't mess with __GFP_* and page_alloc.c, and the sysctl should be
in the user subsystem, not global.

Or perhaps you can share the mempool code and extend the mempool API to
refill itself automatically as soon as pages are being released.

You may still need a single hook in the __free_pages path, to refill
pools transparently from any freeing (not only the freeing of your
subsystem), but such a hook is acceptable. You may need to set
priorities in the criticalpool.c API as well, to choose which pool to
refill first, or whether to refill them in round robin when they have
the same priority.

I would touch page_alloc.c only with regard to the prioritized pool
refilling with a registration hook, and I would definitely not use a
global pool, nor a __GFP_ bitflag for it.

Then each slab will be allowed to have its own critical pool too, not
a global one. A global one driven by the __GFP_CRITICAL flag will
quickly become useless as soon as you have more than one subsystem using
it, plus it unnecessarily messes with the page_alloc.c APIs, where the
only thing you care about is catching the freeing operation with a hook.

2005-12-14 11:41:08

by Pavel Machek

Subject: Re: [RFC][PATCH 0/6] Critical Page Pool

Hi!

> The overall purpose of this patch series is to allow a system administrator
> to reserve a number of pages in a 'critical pool' that is set aside for
> situations when the system is 'in emergency'. It is up to the individual
> administrator to determine when his/her system is 'in emergency'. This is
> not meant to (necessarily) anticipate OOM situations, though that is
> certainly one possible use. The purpose this was originally designed for
> is to allow the networking code to keep functioning despite the system
> losing its (potentially networked) swap device, and thus temporarily
> putting the system under extreme memory pressure.

I don't see how this can ever work.

How can _userspace_ know about what allocations are critical to the
kernel?!

And as you noticed, it does not work for your original usage case,
because reserved memory pool would have to be "sum of all network
interface bandwidths * amount of time expected to survive without
network" which is way too much.

If you want few emergency pages for some strange hack you are doing
(swapping over network?), just put swap into ramdisk and swapon() it
when you are in emergency, or use memory hotplug and plug few more
gigabytes into your machine. But don't go introducing infrastructure
that _can't_ be used right.
Pavel
--
Thanks, Sharp!

2005-12-14 12:01:54

by Andrea Arcangeli

Subject: Re: [RFC][PATCH 0/6] Critical Page Pool

On Wed, Dec 14, 2005 at 11:08:41AM +0100, Pavel Machek wrote:
> because reserved memory pool would have to be "sum of all network
> interface bandwidths * amount of time expected to survive without
> network" which is way too much.

Yes, a global pool isn't really useful. A per-subsystem pool would be
more reasonable...

> gigabytes into your machine. But don't go introducing infrastructure
> that _can't_ be used right.

Agreed, the current design of the patch can't be used right.

2005-12-14 13:04:22

by Alan

Subject: Re: [RFC][PATCH 0/6] Critical Page Pool

On Mer, 2005-12-14 at 13:01 +0100, Andrea Arcangeli wrote:
> On Wed, Dec 14, 2005 at 11:08:41AM +0100, Pavel Machek wrote:
> > because reserved memory pool would have to be "sum of all network
> > interface bandwidths * amount of time expected to survive without
> > network" which is way too much.
>
> Yes, a global pool isn't really useful. A per-subsystem pool would be
> more reasonable...


The whole extra critical level seems dubious in itself. In 2.0/2.2 days
there were a set of patches that just dropped incoming memory on sockets
when the memory was tight unless they were marked as critical (ie NFS
swap). It worked rather well. The rest of the changes beyond that seem
excessive.

2005-12-14 13:31:13

by Rik van Riel

Subject: Re: [RFC][PATCH 1/6] Create Critical Page Pool

On Tue, 13 Dec 2005, Matthew Dobson wrote:

> Create the basic Critical Page Pool. Any allocation specifying
> __GFP_CRITICAL will, as a last resort before failing the allocation, try
> to get a page from the critical pool. For now, only singleton (order 0)
> pages are supported.

How are you going to limit the number of GFP_CRITICAL
allocations to something smaller than the number of
pages in the pool?

Unless you can do that, all guarantees are off...

--
All Rights Reversed

2005-12-14 15:55:33

by Matthew Dobson

Subject: Re: [RFC][PATCH 0/6] Critical Page Pool

Pavel Machek wrote:
> Hi!
>
>
>>The overall purpose of this patch series is to allow a system administrator
>>to reserve a number of pages in a 'critical pool' that is set aside for
>>situations when the system is 'in emergency'. It is up to the individual
>>administrator to determine when his/her system is 'in emergency'. This is
>>not meant to (necessarily) anticipate OOM situations, though that is
>>certainly one possible use. The purpose this was originally designed for
>>is to allow the networking code to keep functioning despite the system
>>losing its (potentially networked) swap device, and thus temporarily
>>putting the system under extreme memory pressure.
>
>
> I don't see how this can ever work.
>
> How can _userspace_ know about what allocations are critical to the
> kernel?!

Well, it isn't userspace that is determining *which* allocations are
critical to the kernel. That is statically determined at compile time by
using the flag __GFP_CRITICAL on specific *kernel* allocations. Sridhar,
cc'd on this mail, has a set of patches that sprinkle the __GFP_CRITICAL
flag throughout the networking code to take advantage of this pool.
Userspace is in charge of determining *when* we're in an emergency
situation (and thus when the critical pool should be used), but not
*which* allocations are critical to surviving that emergency.


> And as you noticed, it does not work for your original usage case,
> because reserved memory pool would have to be "sum of all network
> interface bandwidths * amount of time expected to survive without
> network" which is way too much.

Well, I never suggested it didn't work for my original usage case. The
discussion we had is that it would be incredibly difficult to offer a
100% iron-clad guarantee that the pool would NEVER run out of pages. But
we can size the pool, especially given a decent workload approximation,
so as to make failure far less likely.


> If you want few emergency pages for some strange hack you are doing
> (swapping over network?), just put swap into ramdisk and swapon() it
> when you are in emergency, or use memory hotplug and plug few more
> gigabytes into your machine. But don't go introducing infrastructure
> that _can't_ be used right.

Well, that's basically the point of posting these patches as an RFC. I'm
not quite so delusional as to think they're going to get picked up right
now. I was, however, hoping for feedback to figure out how to design
infrastructure that *can* be used right, as well as trying to find other
potential users of such a feature.

Thanks!

-Matt

2005-12-14 16:03:15

by Matthew Dobson

Subject: Re: [RFC][PATCH 0/6] Critical Page Pool

Andrea Arcangeli wrote:
> On Wed, Dec 14, 2005 at 11:08:41AM +0100, Pavel Machek wrote:
>
>>because reserved memory pool would have to be "sum of all network
>>interface bandwidths * amount of time expected to survive without
>>network" which is way too much.
>
>
> Yes, a global pool isn't really useful. A per-subsystem pool would be
> more reasonable...

Which is an idea that I toyed with, as well. The problem that I ran into
is how to tag an allocation as belonging to a specific subsystem. For
example, in our code we need networking to use the critical pool. How do
we let __alloc_pages() know what allocations belong to networking?
Networking needs named slab allocations, kmalloc allocations, and whole
page allocations to function. Should each subsystem get its own GFP flag
(GFP_NETWORKING, GFP_SCSI, GFP_SOUND, GFP_TERMINAL, ad nauseam)? Should we
create these pools dynamically and pass a reference to which pool each
specific allocation uses (thus adding a parameter to all memory allocation
functions in the kernel)? I realize that per-subsystem pools would be
better, but I thought about this for a while and couldn't come up with a
reasonable way to do it.


>>gigabytes into your machine. But don't go introducing infrastructure
>>that _can't_ be used right.
>
>
> Agreed, the current design of the patch can't be used right.

Well, it can for our use, but I recognize that isn't going to be a huge
selling point! :) As I mentioned in my reply to Pavel, I'd really like to
find a way to design something that WOULD be generally useful.

Thanks!

-Matt

2005-12-14 16:26:15

by Matthew Dobson

Subject: Re: [RFC][PATCH 1/6] Create Critical Page Pool

Rik van Riel wrote:
> On Tue, 13 Dec 2005, Matthew Dobson wrote:
>
>
>>Create the basic Critical Page Pool. Any allocation specifying
>>__GFP_CRITICAL will, as a last resort before failing the allocation, try
>>to get a page from the critical pool. For now, only singleton (order 0)
>>pages are supported.
>
>
> How are you going to limit the number of GFP_CRITICAL
> allocations to something smaller than the number of
> pages in the pool ?

We can't.


> Unless you can do that, all guarantees are off...

Well, I was careful not to use the word guarantee in my post. ;) The idea
is not to offer a 100% guarantee that the pool will never be exhausted.
The idea is to offer a pool that, sized appropriately, offers a very good
chance of surviving your emergency situation. The definitions of what a
critical allocation is and what the emergency situation is are left
intentionally somewhat vague, so as to offer more flexibility. For our
use, certain networking allocations are critical and our emergency
situation is a 2 minute window of potential extreme memory pressure. For
others it could be something completely different, but the expectation is
that the emergency situation would be of finite duration, since the pool
is a fixed size.

Thanks!

-Matt

2005-12-14 16:26:47

by Matthew Dobson

Subject: Re: [RFC][PATCH 3/6] Slab Prep: get/return_object

Pekka Enberg wrote:
> Hi Matt,
>
> On 12/14/05, Matthew Dobson <[email protected]> wrote:
>
>>Create 2 helper functions in mm/slab.c: get_object() and return_object().
>>These functions reduce some existing duplicated code in the slab allocator
>>and will be used when adding Critical Page Pool support to the slab allocator.
>
>
> May I suggest different naming, slab_get_obj and slab_put_obj ?
>
> Pekka

Sure. Those sound much better than mine. :)

-Matt

2005-12-14 16:30:35

by Matthew Dobson

Subject: Re: [RFC][PATCH 4/6] Slab Prep: slab_destruct()

Pekka Enberg wrote:
> On 12/14/05, Matthew Dobson <[email protected]> wrote:
>
>>Create a helper function for slab_destroy() called slab_destruct(). Remove
>>some ifdefs inside functions and generally make the slab destroying code
>>more readable prior to slab support for the Critical Page Pool.
>
>
> Looks good. How about calling it slab_destroy_objs instead?
>
> Pekka

I called it slab_destruct() because it's the part of the old slab_destroy()
that called the slab destructor to destroy the slab's objects.
slab_destroy_objs() is reasonable as well, though, and I can live with that.

Thanks!

-Matt

2005-12-14 16:37:21

by Matthew Dobson

Subject: Re: [RFC][PATCH 0/6] Critical Page Pool

Alan Cox wrote:
> On Mer, 2005-12-14 at 13:01 +0100, Andrea Arcangeli wrote:
>
>>On Wed, Dec 14, 2005 at 11:08:41AM +0100, Pavel Machek wrote:
>>
>>>because reserved memory pool would have to be "sum of all network
>>>interface bandwidths * amount of time expected to survive without
>>>network" which is way too much.
>>
>>Yes, a global pool isn't really useful. A per-subsystem pool would be
>>more reasonable...
>
>
>
> The whole extra critical level seems dubious in itself. In 2.0/2.2 days
> there were a set of patches that just dropped incoming memory on sockets
> when the memory was tight unless they were marked as critical (ie NFS
> swap). It worked rather well. The rest of the changes beyond that seem
> excessive.

Actually, Sridhar's code (mentioned earlier in this thread) *does* drop
incoming packets that are not 'critical', but unfortunately you need to
completely copy the packet into kernel memory before you can do any
processing on it to determine whether or not it's 'critical', and thus
accept or reject it. If network traffic is coming in at a good clip and
the system is already under memory pressure, it's going to be difficult to
receive all these packets, which was the inspiration for this patchset.

Thanks!

-Matt

2005-12-14 19:17:32

by Alan

Subject: Re: [RFC][PATCH 0/6] Critical Page Pool

On Mer, 2005-12-14 at 08:37 -0800, Matthew Dobson wrote:
> Actually, Sridhar's code (mentioned earlier in this thread) *does* drop
> incoming packets that are not 'critical', but unfortunately you need to

I realise that, but if you look at the previous history in 2.0 and 2.2
this was all that was ever needed. It thus raises the question: why all
the extra support and logic this time around?

2005-12-15 03:35:58

by Matt Mackall

Subject: Re: [RFC][PATCH 1/6] Create Critical Page Pool

On Wed, Dec 14, 2005 at 08:26:09AM -0800, Matthew Dobson wrote:
> Rik van Riel wrote:
> > On Tue, 13 Dec 2005, Matthew Dobson wrote:
> >
> >
> >>Create the basic Critical Page Pool. Any allocation specifying
> >>__GFP_CRITICAL will, as a last resort before failing the allocation, try
> >>to get a page from the critical pool. For now, only singleton (order 0)
> >>pages are supported.
> >
> >
> > How are you going to limit the number of GFP_CRITICAL
> > allocations to something smaller than the number of
> > pages in the pool ?
>
> We can't.
>
>
> > Unless you can do that, all guarantees are off...
>
> Well, I was careful not to use the word guarantee in my post. ;) The idea
> is not to offer a 100% guarantee that the pool will never be exhausted.
> The idea is to offer a pool that, sized appropriately, offers a very good
> chance of surviving your emergency situation. The definition of what is a
> critical allocation and what the emergency situation is left intentionally
> somewhat vague, so as to offer more flexibility. For our use, certain
> networking allocations are critical and our emergency situation is a 2
> minute window of potential extreme memory pressure. For others it could be
> something completely different, but the expectation is that the emergency
> situation would be of a finite time, since the pool is a fixed size.

What's your plan for handling the no-room-to-receive-ACKs problem?

Without addressing this, this is a non-starter for most of the network
OOM problems I care about.

--
Mathematics is the supreme nostalgia of our time.

2005-12-15 16:26:27

by Pavel Machek

Subject: Re: [RFC][PATCH 0/6] Critical Page Pool

Hi!

> > I don't see how this can ever work.
> >
> > How can _userspace_ know about what allocations are critical to the
> > kernel?!
>
> Well, it isn't userspace that is determining *which* allocations are
> critical to the kernel. That is statically determined at compile time by
> using the flag __GFP_CRITICAL on specific *kernel* allocations. Sridhar,
> cc'd on this mail, has a set of patches that sprinkle the __GFP_CRITICAL
> flag throughout the networking code to take advantage of this pool.
> Userspace is in charge of determining *when* we're in an emergency
> situation, and should thus use the critical pool, but not *which*

It still is not too reliable. If your userspace tool is swapped out
(etc.), it may not get a chance to wake up.

> > And as you noticed, it does not work for your original usage case,
> > because reserved memory pool would have to be "sum of all network
> > interface bandwidths * amount of time expected to survive without
> > network" which is way too much.
>
> Well, I never suggested it didn't work for my original usage case. The
> discussion we had is that it would be incredibly difficult to 100%
> iron-clad guarantee that the pool would NEVER run out of pages. But we can
> size the pool, especially given a decent workload approximation, so as to
> make failure far less likely.

Perhaps you should add a file in Documentation/ explaining that it is
not reliable?

> > If you want few emergency pages for some strange hack you are doing
> > (swapping over network?), just put swap into ramdisk and swapon() it
> > when you are in emergency, or use memory hotplug and plug few more
> > gigabytes into your machine. But don't go introducing infrastructure
> > that _can't_ be used right.
>
> Well, that's basically the point of posting these patches as an RFC. I'm
> not quite so delusional as to think they're going to get picked up right
> now. I was, however, hoping for feedback to figure out how to design
> infrastructure that *can* be used right, as well as trying to find other
> potential users of such a feature.

Well, we don't usually take infrastructure that has no in-kernel
users; an example user would indeed be nice.
Pavel
--
Thanks, Sharp!

2005-12-15 16:27:40

by Pavel Machek

Subject: Re: [RFC][PATCH 0/6] Critical Page Pool

Hi!

> > The whole extra critical level seems dubious in itself. In 2.0/2.2 days
> > there were a set of patches that just dropped incoming memory on sockets
> > when the memory was tight unless they were marked as critical (ie NFS
> > swap). It worked rather well. The rest of the changes beyond that seem
> > excessive.
>
> Actually, Sridhar's code (mentioned earlier in this thread) *does* drop
> incoming packets that are not 'critical', but unfortunately you need to
> completely copy the packet into kernel memory before you can do any
> processing on it to determine whether or not it's 'critical', and thus
> accept or reject it. If network traffic is coming in at a good clip and
> the system is already under memory pressure, it's going to be difficult to
> receive all these packets, which was the inspiration for this patchset.

You should be able to do all this with a single, MTU-sized buffer.

Receive a packet into the buffer. If it is wanted, pass it up; otherwise
drop it. Yes, it may drop some "important" packets, but that's okay,
packet loss is expected on networks.
Pavel
--
Thanks, Sharp!

2005-12-15 21:51:18

by Matthew Dobson

Subject: Re: [RFC][PATCH 0/6] Critical Page Pool

Pavel Machek wrote:
>>>And as you noticed, it does not work for your original usage case,
>>>because reserved memory pool would have to be "sum of all network
>>>interface bandwidths * amount of time expected to survive without
>>>network" which is way too much.
>>
>>Well, I never suggested it didn't work for my original usage case. The
>>discussion we had is that it would be incredibly difficult to 100%
>>iron-clad guarantee that the pool would NEVER run out of pages. But we can
>>size the pool, especially given a decent workload approximation, so as to
>>make failure far less likely.
>
>
> Perhaps you should add file in Documentation/ explaining it is not
> reliable?

That's a good suggestion. I will rework the patch's additions to
Documentation/sysctl/vm.txt to be more clear about exactly what we're
providing.


>>>If you want few emergency pages for some strange hack you are doing
>>>(swapping over network?), just put swap into ramdisk and swapon() it
>>>when you are in emergency, or use memory hotplug and plug few more
>>>gigabytes into your machine. But don't go introducing infrastructure
>>>that _can't_ be used right.
>>
>>Well, that's basically the point of posting these patches as an RFC. I'm
>>not quite so delusional as to think they're going to get picked up right
>>now. I was, however, hoping for feedback to figure out how to design
>>infrastructure that *can* be used right, as well as trying to find other
>>potential users of such a feature.
>
>
> Well, we don't usually take infrastructure that has no in-kernel
> users, and example user would indeed be nice.
> Pavel

Understood. I certainly wouldn't expect otherwise. I'll see if I can get
Sridhar to post his networking changes that take advantage of this.

Thanks!

-Matt

2005-12-16 05:03:46

by Sridhar Samudrala

Subject: Re: [RFC][PATCH 0/6] Critical Page Pool

Matthew Dobson wrote:

>Pavel Machek wrote:
>
>
>>>>And as you noticed, it does not work for your original usage case,
>>>>because reserved memory pool would have to be "sum of all network
>>>>interface bandwidths * amount of time expected to survive without
>>>>network" which is way too much.
>>>>
>>>>
>>>Well, I never suggested it didn't work for my original usage case. The
>>>discussion we had is that it would be incredibly difficult to 100%
>>>iron-clad guarantee that the pool would NEVER run out of pages. But we can
>>>size the pool, especially given a decent workload approximation, so as to
>>>make failure far less likely.
>>>
>>>
>>Perhaps you should add file in Documentation/ explaining it is not
>>reliable?
>>
>>
>
>That's a good suggestion. I will rework the patch's additions to
>Documentation/sysctl/vm.txt to be more clear about exactly what we're
>providing.
>
>
>
>
>>>>If you want few emergency pages for some strange hack you are doing
>>>>(swapping over network?), just put swap into ramdisk and swapon() it
>>>>when you are in emergency, or use memory hotplug and plug few more
>>>>gigabytes into your machine. But don't go introducing infrastructure
>>>>that _can't_ be used right.
>>>>
>>>>
>>>Well, that's basically the point of posting these patches as an RFC. I'm
>>>not quite so delusional as to think they're going to get picked up right
>>>now. I was, however, hoping for feedback to figure out how to design
>>>infrastructure that *can* be used right, as well as trying to find other
>>>potential users of such a feature.
>>>
>>>
>>Well, we don't usually take infrastructure that has no in-kernel
>>users, and example user would indeed be nice.
>> Pavel
>>
>>
>
>Understood. I certainly wouldn't expect otherwise. I'll see if I can get
>Sridhar to post his networking changes that take advantage of this.
>
>
I posted these patches yesterday on lkml and netdev; here is a
link to the thread.
http://thread.gmane.org/gmane.linux.kernel/357835

Thanks
Sridhar