2001-10-05 11:08:26

by Krzysztof Rusocki

[permalink] [raw]
Subject: %u-order allocation failed


Hi,

After some simple bash fork bombing (about 200 forks) on my UP Celeron/96MB
I get quite a lot of %u-order allocation failures, but only when swap is
turned off.

When it's turned on, processes keep forking for some time
until I get messages like 'fork: Resource temporarily unavailable' or
'cannot redirect /dev/null: too many open files in system' (or similar)
and also 'cannot load libdl.so blah blah return code 23' (I don't remember
the exact message)... load goes up to about 700 but _none_ of the processes
get killed. The machine is almost unresponsive during that time... I barely
managed to Alt+SysRQ+UB ...

As mentioned in some other mail - no highmem, no lvm, md as module (unused).
2.4.10-xfs cvs co 25th September (not 12th :/ - info in previous mail was
incorrect)

When swap was off, at first I got some
0-order (gfp=0x1d2/0) from c012ac08 (_alloc_pages+24)
beside it, in a few seconds also noticed
0-order (gfp=0x1f0/0) from c012ac08
0-order (gfp=0xf0/0) from c012ac08
in random order...
I also saw a really small number of
1-order (gfp=0x1f0/0) from c012ac08

During that time almost all processes were killed by the VM, the machine
was more responsive so I could freely do Alt+SysRQ+K and everything
went back to normal...

I'm not familiar with the Linux VM... so, is this normal behaviour? Or (if not),
what's happening when such messages are printed by my kernel?

Cheers,
Krzysztof

PS
lkml people - please CC me, I'm not subscribed to the list.


2001-10-05 11:59:31

by Rik van Riel

[permalink] [raw]
Subject: Re: %u-order allocation failed

On Fri, 5 Oct 2001, Krzysztof Rusocki wrote:

> After some simple bash fork bombing (about 200 forks) on my UP Celeron/96MB
> I get quite a lot of %u-order allocation failures, but only when swap is
> turned off.

> I'm not familiar with the Linux VM... so, is this normal behaviour? Or (if not),
> what's happening when such messages are printed by my kernel?

This is perfectly normal behaviour:

1) on your system, you have no process limit configured for
yourself so you can start processes until all resources
(memory, file descriptors, ...) are used

2) when all processes are used, there really is no way the
kernel can buy you more hardware on ebay and install it
on the fly ... all it can do is start failing allocations

On production systems, good admins set up per-user limits for
the various resources so no single user is able to run the
system into the ground.
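
As a minimal sketch of such a limit (the helper name and the value are made up
for illustration), it can be set with 'ulimit -u', via pam_limits in
/etc/security/limits.conf (e.g. "* hard nproc 256"), or from C with setrlimit():

/*
 * Sketch only: cap the number of processes the calling user may own.
 * The function name and the value passed in are illustrative.
 */
#include <sys/resource.h>

static int cap_user_processes(rlim_t max_procs)
{
	struct rlimit rl;

	rl.rlim_cur = max_procs;		/* soft limit */
	rl.rlim_max = max_procs;		/* hard limit */
	return setrlimit(RLIMIT_NPROC, &rl);	/* 0 on success, -1 on error */
}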

regards,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ (volunteers needed)

http://www.surriel.com/ http://distro.conectiva.com/

2001-10-05 20:18:38

by Seth Mos

[permalink] [raw]
Subject: Re: %u-order allocation failed

On Fri, 5 Oct 2001, Rik van Riel wrote:

> On Fri, 5 Oct 2001, Krzysztof Rusocki wrote:
>
> > After some simple bash fork bombing (about 200 forks) on my UP Celeron/96MB
> > I get quite a lot of %u-order allocation failures, but only when swap is
> > turned off.
>
> > I'm not familiar with the Linux VM... so, is this normal behaviour? Or (if not),
> > what's happening when such messages are printed by my kernel?
>
> This is perfectly normal behaviour:
>
> 1) on your system, you have no process limit configured for
> yourself so you can start processes until all resources
> (memory, file descriptors, ...) are used

Fair enough.

> 2) when all processes are used, there really is no way the
> kernel can buy you more hardware on ebay and install it
> on the fly ... all it can do is start failing allocations

So it needs a handbrake in case of an emergency? The box at work deadlocks
or crashes. I can hardly call that normal operational behaviour.

I have a Dell PE 2500 (Serverworks LE) with 2GB RAM and two 1.13GHz
processors. If I disable HIGHMEM (4GB) support the box does not produce
these allocation messages and does not deadlock or die under the same
load or worse. What I used was mongo.pl with 5 processes (it does not
matter if the fs is ext2, reiserfs or xfs) and the box dies within
minutes/seconds after starting the benchmark.
This happens using either 2.4.10-xfs or 2.4.11-pre3-xfs.

Using a single process hides the issue.

> On production systems, good admins setup per-user limits for
> the various resources so no single user is able to run the
> system into the ground.

The system is beefy enough to tolerate something as mundane as this. It
should definitely not die.

Cheers
Seth

2001-10-05 20:23:08

by Rik van Riel

[permalink] [raw]
Subject: Re: %u-order allocation failed

On Fri, 5 Oct 2001, Seth Mos wrote:

> This happens using either 2.4.10-xfs or 2.4.11-pre3-xfs.

Ohh duh, IIRC there are a bunch of highmem bugs in
-linus which are fixed in -ac.

Can you reproduce the bug with an -ac kernel ?

regards,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ (volunteers needed)

http://www.surriel.com/ http://distro.conectiva.com/

2001-10-05 20:31:38

by Seth Mos

[permalink] [raw]
Subject: Re: %u-order allocation failed

On Fri, 5 Oct 2001, Rik van Riel wrote:

> On Fri, 5 Oct 2001, Seth Mos wrote:
>
> > This happens using either 2.4.10-xfs or 2.4.11-pre3-xfs.
>
> Ohh duh, IIRC there are a bunch of highmem bugs in
> -linus which are fixed in -ac.

Fitting XFS onto a -ac kernel should be fun :-(

I will try this over the weekend or get a redhat kernel going which is
also -ac based. That would come in handy for other people using XFS since
a lot are using highmem in combination with this fs.

> Can you reproduce the bug with an -ac kernel ?

I am not that good/fast at patching. Expect something over the weekend :-)

Bye
Seth

2001-10-05 20:43:49

by Steve Lord

[permalink] [raw]
Subject: Re: %u-order allocation failed

> On Fri, 5 Oct 2001, Rik van Riel wrote:
>
> > On Fri, 5 Oct 2001, Seth Mos wrote:
> >
> > > This happens using either 2.4.10-xfs or 2.4.11-pre3-xfs.
> >
> > Ohh duh, IIRC there are a bunch of highmem bugs in
> > -linus which are fixed in -ac.
>
> Fitting XFS onto a -ac kernel should be fun :-(

It's not that simple - I tried before I got dragged kicking and
screaming back into some Irix stuff. Just running mongo on ext2
on a HIGHMEM ac kernel should show if things are better there - since
the problems seem to be fairly filesystem independent.

Steve


>
> I will try this over the weekend or get a redhat kernel going which is
> also -ac based. That would come in handy for other people using XFS since
> a lot are using highmem in combination with this fs.
>
> > Can you reproduce the bug with an -ac kernel ?
>
> I am not that good/fast at patching. Expect something over the weekend :-)
>
> Bye
> Seth


2001-10-05 21:09:00

by Seth Mos

[permalink] [raw]
Subject: Re: %u-order allocation failed

On Fri, 5 Oct 2001, Steve Lord wrote:

> > On Fri, 5 Oct 2001, Rik van Riel wrote:
> >
> > > On Fri, 5 Oct 2001, Seth Mos wrote:
> > >
> > > > This happens using either 2.4.10-xfs or 2.4.11-pre3-xfs.
> > >
> > > Ohh duh, IIRC there are a bunch of highmem bugs in
> > > -linus which are fixed in -ac.
> >
> > Fitting XFS onto a -ac kernel should be fun :-(
>
> It's not that simple - I tried before I got dragged kicking and
> screaming back into some Irix stuff. Just running mongo on ext2
> on a HIGHMEM ac kernel should show if things are better there - since
> the problems seem to be fairly filesystem independent.

I don't have a HIGHMEM box without XFS filesystems. So I have to merge
both -ac and the xfs tree to test it. I can reformat the box of course but
that would mean next week. If I can gain a day and spare a reformat I am
willing to make that sacrifice.


2001-10-05 22:06:22

by David Schwartz

[permalink] [raw]
Subject: Re: %u-order allocation failed


>The system is beefy enough to tolerate something as mundane as this. It should
>definitely not die.

A fork bomb with no limits attempts to create an infinite number of
processes. No system can be that beefy.

DS


2001-10-05 22:16:02

by Seth Mos

[permalink] [raw]
Subject: Re: %u-order allocation failed

On Fri, 5 Oct 2001, David Schwartz wrote:

>
> >The system is beefy enough to tolerate something as mundane as this. It should
> >definitely not die.
>
> A fork bomb with no limits attempts to create an infinite number of
> processes. No system can be that beefy.

I was referring to the mundane load of mongo.pl with 5 processes. Something
the systems should withstand. If you have more than 10GB of database to
access you would want it to work. I am not talking about a lot of
processes but a lot of disk IO.

I have just one box running SMP with highmem and that one is acting
funny. All the other SMP or uniprocessor servers have absolutely no
problems.

Disable highmem and the problem goes away while halving your RAM. That is
not very efficient, is it?

Cheers


2001-10-06 14:00:49

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

> > After some simple bash fork bombing (about 200 forks) on my UP Celeron/96MB
> > I get quite a lot of %u-order allocation failures, but only when swap is
> > turned off.
>
> > I'm not familiar with the Linux VM... so, is this normal behaviour? Or (if not),
> > what's happening when such messages are printed by my kernel?
>
> This is perfectly normal behaviour:
>
> 1) on your system, you have no process limit configured for
> yourself so you can start processes until all resources
> (memory, file descriptors, ...) are used
>
> 2) when all processes are used, there really is no way the
> kernel can buy you more hardware on ebay and install it
> on the fly ... all it can do is start failing allocations
>
> On production systems, good admins setup per-user limits for
> the various resources so no single user is able to run the
> system into the ground.

No, it's not normal. It is a long-standing bug - I think from the 2.2 kernels.
Note that without swap and with a certain memory allocation strategy
(when a process in a loop allocates one anonymous page, one file cache page
and so on...) this bug can be triggered even when there is half of memory
free.

Buddy allocator is broken - kill it. Or at least do not misuse it for
anything except kernel or driver initialization.

Mikulas

2001-10-06 14:04:09

by Rik van Riel

[permalink] [raw]
Subject: Re: %u-order allocation failed

On Sat, 6 Oct 2001, Mikulas Patocka wrote:

> Buddy allocator is broken - kill it. Or at least do not misuse it for
> anything except kernel or driver initialization.

Please send patches to get rid of the buddy allocator while
still making it possible to allocate contiguous chunks of
memory.

If you have any idea on how to fix things, this would be a
good time to let us know.

cheers,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ (volunteers needed)

http://www.surriel.com/ http://distro.conectiva.com/

2001-10-06 14:44:38

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

Here goes the fix. (note that I didn't try to compile it so there may be
bugs, but you see the point).

diff -u -r linux-orig/include/asm-i386/processor.h linux/include/asm-i386/processor.h
--- linux-orig/include/asm-i386/processor.h Sat Oct 6 16:21:50 2001
+++ linux/include/asm-i386/processor.h Sat Oct 6 16:31:15 2001
@@ -448,7 +448,7 @@
#define KSTK_ESP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)))[1022])

#define THREAD_SIZE (2*PAGE_SIZE)
-#define alloc_task_struct() ((struct task_struct *) __get_free_pages(GFP_KERNEL,1))
+#define alloc_task_struct() ((struct task_struct *) __get_free_pages(GFP_KERNEL | __GFP_VMALLOC,1))
#define free_task_struct(p) free_pages((unsigned long) (p), 1)
#define get_task_struct(tsk) atomic_inc(&virt_to_page(tsk)->count)

diff -u -r linux-orig/include/linux/mm.h linux/include/linux/mm.h
--- linux-orig/include/linux/mm.h Sat Oct 6 16:21:59 2001
+++ linux/include/linux/mm.h Sat Oct 6 16:28:12 2001
@@ -550,6 +550,7 @@
#define __GFP_IO 0x40 /* Can start low memory physical IO? */
#define __GFP_HIGHIO 0x80 /* Can start high mem physical IO? */
#define __GFP_FS 0x100 /* Can call down to low-level FS? */
+#define __GFP_VMALLOC 0x200 /* Can vmalloc pages if buddy allocator fails */

#define GFP_NOHIGHIO (__GFP_HIGH | __GFP_WAIT | __GFP_IO)
#define GFP_NOIO (__GFP_HIGH | __GFP_WAIT)
diff -u -r linux-orig/mm/page_alloc.c linux/mm/page_alloc.c
--- linux-orig/mm/page_alloc.c Sat Oct 6 16:21:47 2001
+++ linux/mm/page_alloc.c Sat Oct 6 16:36:28 2001
@@ -18,6 +18,7 @@
#include <linux/bootmem.h>
#include <linux/slab.h>
#include <linux/compiler.h>
+#include <linux/vmalloc.h>

int nr_swap_pages;
int nr_active_pages;
@@ -421,9 +422,9 @@
struct page * page;

page = alloc_pages(gfp_mask, order);
- if (!page)
- return 0;
- return (unsigned long) page_address(page);
+ if (page) return (unsigned long) page_address(page);
+ if (gfp_mask & __GFP_VMALLOC) return (unsigned long)__vmalloc(PAGE_SIZE << order, gfp_mask, PAGE_KERNEL);
+ return 0;
}

unsigned long get_zeroed_page(unsigned int gfp_mask)
@@ -447,6 +448,10 @@

void free_pages(unsigned long addr, unsigned int order)
{
+ if (addr >= VMALLOC_START && addr < VMALLOC_END) {
+ vfree((void *)addr);
+ return;
+ }
if (addr != 0)
__free_pages(virt_to_page(addr), order);
}

2001-10-06 15:31:23

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

diff -u -r linux-orig/fs/select.c linux/fs/select.c
--- linux-orig/fs/select.c Sat Oct 6 16:20:45 2001
+++ linux/fs/select.c Sat Oct 6 16:54:44 2001
@@ -236,7 +236,7 @@

static void *select_bits_alloc(int size)
{
- return kmalloc(6 * size, GFP_KERNEL);
+ return kmalloc(6 * size, GFP_KERNEL | __GFP_VMALLOC);
}

static void select_bits_free(void *bits, int size)
@@ -438,7 +438,7 @@
if (nfds != 0) {
fds = (struct pollfd **)kmalloc(
(1 + (nfds - 1) / POLLFD_PER_PAGE) * sizeof(struct pollfd *),
- GFP_KERNEL);
+ GFP_KERNEL | __GFP_VMALLOC);
if (fds == NULL)
goto out;
}
diff -u -r linux-orig/include/asm-i386/processor.h linux/include/asm-i386/processor.h
--- linux-orig/include/asm-i386/processor.h Sat Oct 6 16:21:50 2001
+++ linux/include/asm-i386/processor.h Sat Oct 6 16:31:15 2001
@@ -448,7 +448,7 @@
#define KSTK_ESP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)))[1022])

#define THREAD_SIZE (2*PAGE_SIZE)
-#define alloc_task_struct() ((struct task_struct *) __get_free_pages(GFP_KERNEL,1))
+#define alloc_task_struct() ((struct task_struct *) __get_free_pages(GFP_KERNEL | __GFP_VMALLOC,1))
#define free_task_struct(p) free_pages((unsigned long) (p), 1)
#define get_task_struct(tsk) atomic_inc(&virt_to_page(tsk)->count)

diff -u -r linux-orig/include/linux/mm.h linux/include/linux/mm.h
--- linux-orig/include/linux/mm.h Sat Oct 6 16:21:59 2001
+++ linux/include/linux/mm.h Sat Oct 6 16:28:12 2001
@@ -550,6 +550,7 @@
#define __GFP_IO 0x40 /* Can start low memory physical IO? */
#define __GFP_HIGHIO 0x80 /* Can start high mem physical IO? */
#define __GFP_FS 0x100 /* Can call down to low-level FS? */
+#define __GFP_VMALLOC 0x200 /* Can vmalloc pages if buddy allocator fails */

#define GFP_NOHIGHIO (__GFP_HIGH | __GFP_WAIT | __GFP_IO)
#define GFP_NOIO (__GFP_HIGH | __GFP_WAIT)
diff -u -r linux-orig/mm/page_alloc.c linux/mm/page_alloc.c
--- linux-orig/mm/page_alloc.c Sat Oct 6 16:21:47 2001
+++ linux/mm/page_alloc.c Sat Oct 6 16:36:28 2001
@@ -18,6 +18,7 @@
#include <linux/bootmem.h>
#include <linux/slab.h>
#include <linux/compiler.h>
+#include <linux/vmalloc.h>

int nr_swap_pages;
int nr_active_pages;
@@ -421,9 +422,9 @@
struct page * page;

page = alloc_pages(gfp_mask, order);
- if (!page)
- return 0;
- return (unsigned long) page_address(page);
+ if (page) return (unsigned long) page_address(page);
+ if (gfp_mask & __GFP_VMALLOC) return (unsigned long)__vmalloc(PAGE_SIZE << order, gfp_mask, PAGE_KERNEL);
+ return 0;
}

unsigned long get_zeroed_page(unsigned int gfp_mask)
@@ -447,6 +448,10 @@

void free_pages(unsigned long addr, unsigned int order)
{
+ if (addr >= VMALLOC_START && addr < VMALLOC_END) {
+ vfree((void *)addr);
+ return;
+ }
if (addr != 0)
__free_pages(virt_to_page(addr), order);
}
diff -u -r linux-orig/mm/slab.c linux/mm/slab.c
--- linux-orig/mm/slab.c Sat Oct 6 16:21:48 2001
+++ linux/mm/slab.c Sat Oct 6 17:04:37 2001
@@ -73,6 +73,7 @@
#include <linux/interrupt.h>
#include <linux/init.h>
#include <linux/compiler.h>
+#include <linux/vmalloc.h>
#include <asm/uaccess.h>

/*
@@ -1536,10 +1537,14 @@
cache_sizes_t *csizep = cache_sizes;

for (; csizep->cs_size; csizep++) {
+ void *p;
if (size > csizep->cs_size)
continue;
- return __kmem_cache_alloc(flags & GFP_DMA ?
- csizep->cs_dmacachep : csizep->cs_cachep, flags);
+ if ((p = __kmem_cache_alloc(flags & GFP_DMA ?
+ csizep->cs_dmacachep : csizep->cs_cachep, flags & ~__GFP_VMALLOC)))
+ return p;
+ if (flags & __GFP_VMALLOC) return __vmalloc(size, flags, PAGE_KERNEL);
+ return NULL;
}
return NULL;
}
@@ -1580,6 +1585,10 @@

if (!objp)
return;
+ if ((unsigned long)objp >= VMALLOC_START && (unsigned long)objp < VMALLOC_END) {
+ vfree(objp);
+ return;
+ }
local_irq_save(flags);
CHECK_PAGE(virt_to_page(objp));
c = GET_PAGE_CACHE(virt_to_page(objp));

2001-10-06 16:58:32

by Rik van Riel

[permalink] [raw]
Subject: Re: %u-order allocation failed

diff -u -r linux-orig/include/asm-i386/processor.h linux/include/asm-i386/processor.h
--- linux-orig/include/asm-i386/processor.h Sat Oct 6 16:21:50 2001
+++ linux/include/asm-i386/processor.h Sat Oct 6 16:31:15 2001
@@ -448,7 +448,7 @@
#define KSTK_ESP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)))[1022])

#define THREAD_SIZE (2*PAGE_SIZE)
-#define alloc_task_struct() ((struct task_struct *) __get_free_pages(GFP_KERNEL,1))
+#define alloc_task_struct() ((struct task_struct *) __get_free_pages(GFP_KERNEL | __GFP_VMALLOC,1))
#define free_task_struct(p) free_pages((unsigned long) (p), 1)
#define get_task_struct(tsk) atomic_inc(&virt_to_page(tsk)->count)

diff -u -r linux-orig/include/linux/mm.h linux/include/linux/mm.h
--- linux-orig/include/linux/mm.h Sat Oct 6 16:21:59 2001
+++ linux/include/linux/mm.h Sat Oct 6 16:28:12 2001
@@ -550,6 +550,7 @@
#define __GFP_IO 0x40 /* Can start low memory physical IO? */
#define __GFP_HIGHIO 0x80 /* Can start high mem physical IO? */
#define __GFP_FS 0x100 /* Can call down to low-level FS? */
+#define __GFP_VMALLOC 0x200 /* Can vmalloc pages if buddy allocator fails */

#define GFP_NOHIGHIO (__GFP_HIGH | __GFP_WAIT | __GFP_IO)
#define GFP_NOIO (__GFP_HIGH | __GFP_WAIT)
diff -u -r linux-orig/mm/page_alloc.c linux/mm/page_alloc.c
--- linux-orig/mm/page_alloc.c Sat Oct 6 16:21:47 2001
+++ linux/mm/page_alloc.c Sat Oct 6 16:36:28 2001
@@ -18,6 +18,7 @@
#include <linux/bootmem.h>
#include <linux/slab.h>
#include <linux/compiler.h>
+#include <linux/vmalloc.h>

int nr_swap_pages;
int nr_active_pages;
@@ -421,9 +422,9 @@
struct page * page;

page = alloc_pages(gfp_mask, order);
- if (!page)
- return 0;
- return (unsigned long) page_address(page);
+ if (page) return (unsigned long) page_address(page);
+ if (gfp_mask & __GFP_VMALLOC) return (unsigned long)__vmalloc(PAGE_SIZE << order, gfp_mask, PAGE_KERNEL);
+ return 0;
}

unsigned long get_zeroed_page(unsigned int gfp_mask)
@@ -447,6 +448,10 @@

void free_pages(unsigned long addr, unsigned int order)
{
+ if (addr >= VMALLOC_START && addr < VMALLOC_END) {
+ vfree((void *)addr);
+ return;
+ }
if (addr != 0)
__free_pages(virt_to_page(addr), order);
}

So what are you going to do when your 64MB of vmalloc space
runs out ?

regards,

Rik

2001-10-06 17:48:48

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

On Sat, 6 Oct 2001, Rik van Riel wrote:

> On Sat, 6 Oct 2001, Mikulas Patocka wrote:
> > On Sat, 6 Oct 2001, Rik van Riel wrote:
> > > On Sat, 6 Oct 2001, Mikulas Patocka wrote:
> > >
> > > > Buddy allocator is broken - kill it. Or at least do not misuse it for
> > > > anything except kernel or driver initialization.
> > >
> > > Please send patches to get rid of the buddy allocator while
> > > still making it possible to allocate contiguous chunks of
> > > memory.
> > >
> > > If you have any idea on how to fix things, this would be a
> > > good time to let us know.
> >
> > Here goes the fix. (note that I didn't try to compile it so there may be
> > bugs, but you see the point).
>
> So what are you going to do when your 64MB of vmalloc space
> runs out ?

Make larger vmalloc space :-) Virtual memory costs very little.
Besides 64M / 8k = 8192 - so it runs out at 8192 processes.

Of course vmalloc space can overflow - but it overflows only when the
machine is overloaded with too many processes, too many processes with
many filedescriptors etc. On the other hand, the buddy allocator fails
*RANDOMLY*. Totally randomly, depending on cache access patterns and
page allocation times.

Mikulas

Subject: Re: %u-order allocation failed



--On Saturday, 06 October, 2001 4:44 PM +0200 Mikulas Patocka
<[email protected]> wrote:

> Here goes the fix. (note that I didn't try to compile it so there may be
> bugs, but you see the point).

(seems to replace high order allocations by vmalloc)

And how does vmalloc allocate physically (as opposed to virtually)
contiguous memory? I can't clearly recall it being IRQ safe either
(for GFP_ATOMIC).

--
Alex Bligh

2001-10-06 18:16:33

by Anton Blanchard

[permalink] [raw]
Subject: Re: %u-order allocation failed


> Of course vmalloc space can overflow - but it overflows only when the
> machine is overloaded with too many processes, too many processes with
> many filedescriptors etc. On the other hand, the buddy allocator fails
> *RANDOMLY*. Totally randomly, depending on cache access patterns and
> page allocation times.

vmalloc space is also much worse for TLB usage when the main kernel mapping
uses large hardware PTEs. Ingo and davem pointed this out to me recently
when I wanted to allocate the pagecache hash using vmalloc (at the
moment it maxes out at order 10 which is much too small for machines
with large memory).

If you could get away with a single page stack, then you could allocate
the task struct separately and avoid any order 1 allocation. But you
would probably need interrupt stacks to get away with a single page
stack.

Anton

2001-10-06 19:05:52

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

--- linux-orig/mm/vmalloc.c Sat Oct 6 16:21:47 2001
+++ linux/mm/vmalloc.c Sat Oct 6 21:01:00 2001
@@ -170,6 +170,9 @@
{
unsigned long addr;
struct vm_struct **p, *tmp, *area;
+ int align = 0;
+
+ if (size > PAGE_SIZE && !(size & (size - 1))) align = size - 1;

area = (struct vm_struct *) kmalloc(sizeof(*area), GFP_KERNEL);
if (!area)
@@ -183,6 +186,7 @@
if (size + addr <= (unsigned long) tmp->addr)
break;
addr = tmp->size + (unsigned long) tmp->addr;
+ addr = (addr + align) & ~align;
if (addr > VMALLOC_END-size)
goto out;
}

2001-10-06 19:07:32

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

> > Of course vmalloc space can overflow - but it overflows only when the
> > machine is overloaded with too many processes, too many processes with
> > many filedescriptors etc. On the other hand, the buddy allocator fails
> > *RANDOMLY*. Totally randomly, depending on cache access patterns and
> > page allocation times.
>
> vmalloc space is also much worse for TLB usage when the main kernel mapping
> uses large hardware PTEs. Ingo and davem pointed this out to me recently
> when I wanted to allocate the pagecache hash using vmalloc (at the
> moment it maxes out at order 10 which is much too small for machines
> with large memory).

OK, but my patch uses vmalloc only as a fallback when the buddy allocator
fails. The probability that buddy fails is small. It is slower, but only
with very small probability.

It is perfectly OK to have a bit slower access to task_struct with
probability 1/1000000.

But it is ***BAD*BUG*** if allocation of task_struct fails with
probability 1/1000000.

> If you could get away with a single page stack, then you could allocate
> the task struct separately and avoid any order 1 allocation. But you
> would probably need interrupt stacks to get away with a single page
> stack.

Yes, but there are still other dangerous usages of kmalloc and
__get_free_pages. (The most offending one is in select.c)


It is sad that the core VM developers did not write any documentation that
explains that high-order allocations can fail at any time and that the caller
must not abort its operation when that happens. Instead - they are trying to
make high-order allocations fail less often :-/ How should a random
Joe-driver-developer know that kmalloc(4096) is safe and kmalloc(4097) is
not?
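
(For the reason behind those two numbers - my summary of how the 2.4 generic
slab caches behave, not something stated in this thread:

    kmalloc(4096) -> size-4096 cache -> order-0 slabs: one page is enough
    kmalloc(4097) -> size-8192 cache -> order-1 slabs: needs two physically
                     contiguous pages, which is exactly what the buddy
                     allocator may fail to find under fragmentation.)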

Now, parts of the kernel written by people who know about the buddy allocator
(page/buffer/dentry/inode hash allocations, file descriptor array
allocation) are written correctly, with the assumption that a high-order
allocation may fail.

Other parts of the kernel, written by people who do not know about the buddy
allocator (task_struct allocation, select and probably a lot of drivers),
assume that a high-order allocation always succeeds. task_struct and select
can be fixed easily, but cleaning up the shit in drivers will be a real pain
and it will probably never be finished :-(

Mikulas



2001-10-06 19:13:02

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

> --On Saturday, 06 October, 2001 4:44 PM +0200 Mikulas Patocka
> <[email protected]> wrote:
>
> > Here goes the fix. (note that I didn't try to compile it so there may be
> > bugs, but you see the point).
>
> (seems to replace high order allocations by vmalloc)
>
> & how does vmalloc allocate physically (as opposed to virtually)
> contiguous memory; can't clearly recall it being IRQ safe either
> (for GFP_ATOMIC).

It uses vmalloc only when __GFP_VMALLOC flag is given - and so it is
expected to not use __GFP_VMALLOC flag in IRQ.

NOTE: no allocations in IRQ are safe. Not only high-order ones.
Allocation in IRQ may fail any time and you must recover without loss of
functionality (network can lose packets any time, if you are doing some
general device driver, you must preallocate all buffers in process
context).

Mikulas

2001-10-06 19:22:02

by Arjan van de Ven

[permalink] [raw]
Subject: Re: %u-order allocation failed

In article <Pine.LNX.3.96.1011006210743.7808D-100000@artax.karlin.mff.cuni.cz> you wrote:

> NOTE: no allocations in IRQ are safe. Not only high-order ones.
> Allocation in IRQ may fail any time and you must recover without loss of
> functionality (network can lose packets any time, if you are doing some
> general device driver, you must preallocate all buffers in process
> context).

how again do you deal with calling vfree() on the ones where you used
vmalloc instead of the buddy allocator ?

2001-10-06 20:13:24

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: %u-order allocation failed

>
>OK, but my patch uses vmalloc only as a fallback when buddy fails. The
>probability that buddy fails is small. It is slower but with very small
>probability.
>
>It is perfectly OK to have a bit slower access to task_struct with
>probability 1/1000000.
>
>But it is ***BAD*BUG*** if allocation of task_struct fails with
>probability 1/1000000.

I missed the beginning of the thread, sorry if that question was
already answered,

What about all the code that still considers kmalloc'ed memory
safe for use with virt_to_bus and friends, and physically contiguous
for DMA? In some cases (non-PCI devices, embedded
platforms, etc...), the pci_consistent API is not an option.
That means that __GFP_VMALLOC can't be part of GFP_KERNEL or
many drivers will break in horrible ways (random memory corruption).

Ben.


2001-10-06 21:09:45

by Alan

[permalink] [raw]
Subject: Re: %u-order allocation failed

> It is perfectly OK to have a bit slower access to task_struct with
> probability 1/1000000.

Except that you added a bug where some old driver code would crash the
machine by doing so.

> Yes, but there are still other dangerous usages of kmalloc and
> __get_free_pages. (The most offending one is in select.c)

Nothing dangerous there. The -ac VM isn't triggering these cases.

> not abort his operation when it happens. Instead - they are trying to make
> high-order allocations fail less often :-/ How should random
> Joe-driver-developer know, that kmalloc(4096) is safe and kmalloc(4097) is
> not?

4096 is not safe - there is no safe size for a kmalloc, you can always run
out of memory - deal with it.

Alan

2001-10-06 22:33:23

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

> > It is perfectly OK to have a bit slower access to task_struct with
> > probability 1/1000000.
>
> Except that you added a bug where some old driver code would crash the
> machine by doing so.

?

> > Yes, but there are still other dangerous usages of kmalloc and
> > __get_free_pages. (The most offending one is in select.c)
>
> Nothing dangerous there. The -ac VM isn't triggering these cases.

Sorry, but it can be triggered by _ANY_ VM since the buddy allocator was
introduced. You have no guarantee that you will find two or more consecutive
free pages. And if you don't, poll() fails.

> > not abort his operation when it happens. Instead - they are trying to make
> > high-order allocations fail less often :-/ How should random
> > Joe-driver-developer know, that kmalloc(4096) is safe and kmalloc(4097) is
> > not?
>
> 4096 is not safe - there is no safe size for a kmalloc, you can always run
> out of memory - deal with it.

This is not about running out of memory. It is about free space
fragmentation. Consider this:

You have no swap.
A program allocates one file cache page, one anon page, one cache page, one
anon page and so on. The memory will look like:

cache page
anon page
cache page
anon page
cache page
anon page
etc.

Now some driver wants to allocate 4097 bytes and it CAN'T. Even when half
the memory is free.
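
As a user-space illustration of that pattern (my sketch, assuming any large
file is available to read; not code from this thread):

/*
 * Sketch only: with no swap, each iteration pins one anonymous page and
 * pulls one file page into the page cache, so free memory tends to end up
 * as isolated single pages - bad news for order-1 (two-page) requests.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/usr/share/dict/words", "r");	/* any big file will do */
	char filebuf[4096];
	char *anon;

	if (!f)
		return 1;
	for (;;) {
		anon = malloc(4096);
		if (!anon)
			break;			/* out of memory: stop */
		memset(anon, 1, 4096);		/* touch -> anonymous page */
		if (fread(filebuf, 1, sizeof(filebuf), f) < sizeof(filebuf))
			rewind(f);		/* keep pulling in page cache pages */
	}
	return 0;
}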

Mikulas


2001-10-06 22:33:53

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

> >OK, but my patch uses vmalloc only as a fallback when buddy fails. The
> >probability that buddy fails is small. It is slower but with very small
> >probability.
> >
> >It is perfectly OK to have a bit slower access to task_struct with
> >probability 1/1000000.
> >
> >But it is ***BAD*BUG*** if allocation of task_struct fails with
> >probability 1/1000000.
>
> I missed the beginning of the thread, sorry if that question was
> already answered,
>
> What about all the code that still considers kmalloc'ed memory
> safe for use with virt_to_bus and friends, and physically contiguous
> for DMA? In some cases (non-PCI devices, embedded
> platforms, etc...), the pci_consistent API is not an option.
> That means that __GFP_VMALLOC can't be part of GFP_KERNEL or
> many drivers will break in horrible ways (random memory corruption).

You are right. Code that allocates more than a page and expects it to be
physically contiguous is broken by design. Either rewrite the driver or
allocate memory at boot. It will be very hard to audit all drivers for this.

Mikulas

2001-10-06 22:36:23

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

> In article <Pine.LNX.3.96.1011006210743.7808D-100000@artax.karlin.mff.cuni.cz> you wrote:
>
> > NOTE: no allocations in IRQ are safe. Not only high-order ones.
> > Allocation in IRQ may fail any time and you must recover without loss of
> > functionality (network can lose packets any time, if you are doing some
> > general device driver, you must preallocate all buffers in process
> > context).
>
> how again do you deal with calling vfree() on the ones where you used
> vmalloc instead of the buddy allocator ?

It's in the patch: if someone calls free_pages on vmalloc'ed memory,
it will be freed with vfree instead of __free_pages.

Of course you can't allocate memory in process context and free it in
interrupt context - which you could do without __GFP_VMALLOC.

Mikulas

2001-10-06 22:42:33

by Alan

[permalink] [raw]
Subject: Re: %u-order allocation failed

> > Nothing dangerous there. The -ac VM isn't triggering these cases.
>
> Sorry, but it can be triggered by _ANY_ VM since buddy allocator was
> introduced. You have no guarantee, that you find two or more consecutive
> free pages. And if you don't, poll() fails.

The two page case isn't one you need to worry about. To all intents and
purposes it does not happen, and if you do the maths it isn't going to fail
in any interesting ways. Once you go to the 4 page set the odds get a lot
longer and then rapidly get very bad indeed.

Alan

2001-10-06 22:58:37

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

> > > Nothing dangerous there. The -ac VM isn't triggering these cases.
> >
> > Sorry, but it can be triggered by _ANY_ VM since buddy allocator was
> > introduced. You have no guarantee, that you find two or more consecutive
> > free pages. And if you don't, poll() fails.
>
> The two page case isnt one you need to worry about. To all intents and
> purposes it does not happen,

How do you know it? I showed a simple case where it may happen.

> and if you do the maths it isnt going to
> fail in any interesting ways. Once you go to the 4 page set the odds get
> a lot longer and then rapidly get very bad indeed,

I hope you don't want to compute the probability that the server will or won't
crash (yes, crash, because when poll in the main loop fails, the server
process has not many choices - it can only terminate itself). This reminds
me of some Microsoft announcement saying that Windows NT is 3 times more
stable than Windows 95 :-)

And it does happen - see this:
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0012.3/0711.html
Maybe the probability was reduced somehow, but the problem is still there.

Mikulas

Subject: Re: %u-order allocation failed

Mikulas,

> It uses vmalloc only when __GFP_VMALLOC flag is given - and so it is
> expected to not use __GFP_VMALLOC flag in IRQ.

Ah OK. If your point is that people use GFP_ATOMIC when it's
not needed, and demand physically contiguous memory when only
virtually contiguous memory is needed, in several places in
the kernel, then you are correct. [I am not convinced that
vmalloc() is the best way to fix it though.]

Most of the order>0 users of __get_free_pages() don't
'need' to do that. For instance I was convinced that networking
code needed this for larger than 4k packets (pre-fragmentation
or post-prefragmentation) until someone pointed out that
the kiovec stuff was there, waiting to be used, if someone
made the code changes. But the code changes are non-trivial.

Note also that something (not sure what) has made fragmentation
increasingly prevalent over the years since the buddy allocator
was originally put in. (see my earlier patch for measuring
fragmentation). There is currently /no/ intelligence in there
to defragment stuff, and the 'light touch' patches (ideas I had
and posted here) don't appear to work. If we want __get_free_pages
to allocate order>0 this is possible to do reliably if we
have some intelligent form of page out which attempts
to defragment as it runs, or else run a defragmenter. It's also possible
to allocate order>0 GFP_ATOMIC far more reliably than at
present if we had a target for defragmentation under normal
operation, just like we retain a target for pages reserved
for atomic allocation.

The very original buddy code (circa 94/95, which I wrote) maintained
that there should be (from memory) at least one entry on a high
order list (I think it was the 64k list), which gave you a few
guaranteed 8k allocations (which was what I was interested in). It's
trivial to patch this into __get_free_pages though I haven't
tried this (i.e. rather than just look at total free pages,
look at the existence of a page on either the order=4, 5, 6...
queues). Note you will use memory less efficiently if you do
this. In times of cheaper memory costs, it might be worth
testing this approach again.
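
A hypothetical sketch of that check against 2.4's free lists (mine, untested,
only meant to show the shape of the idea):

/*
 * Hypothetical sketch (not a patch from this thread): report whether any
 * block still sits on a "high" order free list of this zone.  A low-order
 * allocation path could refuse to dip below this reserve and trigger
 * reclaim instead, keeping a few contiguous blocks in hand for order>0
 * (including GFP_ATOMIC) callers.
 */
#include <linux/mmzone.h>
#include <linux/list.h>

static int high_order_reserve_present(zone_t *zone)
{
	int order;

	for (order = 4; order < MAX_ORDER; order++)	/* 64k blocks and up */
		if (!list_empty(&zone->free_area[order].free_list))
			return 1;
	return 0;
}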

--
Alex Bligh

Subject: Re: %u-order allocation failed



--On Sunday, 07 October, 2001 12:31 AM +0200 Mikulas Patocka
<[email protected]> wrote:

> Sorry, but it can be triggered by _ANY_ VM since buddy allocator was
> introduced.

Just for info, this was circa 1.0.6 :-) (patches were available
since 0.99.xxx). And before it was introduced, rather a lot
of other things would consistently fail, for instance anything
that reassembled packets whose total size was >4k. And currently
they still need that.

Kernel memory is a limited resource. Contiguous kernel memory
more so. Things that need it need to better deal with the
lack of it, esp. in transient situations (such as by working
round the absence of it, e.g. kiovec in net code, or by
causing some freeing and retrying). And, when contiguous
kernel memory is short, the allocator could do with some
intelligent page freeing to reduce fragmentation.

--
Alex Bligh

Subject: Re: %u-order allocation failed



--On Sunday, 07 October, 2001 12:58 AM +0200 Mikulas Patocka
<[email protected]> wrote:

> How do you know it? I showed a simple case where it may happen.

Do you know that two order=0 allocations with the same GFP_ value
would not also have failed?

--
Alex Bligh

2001-10-07 01:24:09

by Rik van Riel

[permalink] [raw]
Subject: Re: %u-order allocation failed

On Sun, 7 Oct 2001, Mikulas Patocka wrote:

> You are right. Code that allocates more than a page and expects it to be
> physically contiguous is broken by design. Either rewrite the driver or
> allocate memory at boot. It will be very hard to audit all drivers for this.

Better buy us all new hardware, then ;)

Some devices really do want physically contiguous buffers
for DMA...


Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ (volunteers needed)

http://www.surriel.com/ http://distro.conectiva.com/

2001-10-07 09:40:45

by Alan

[permalink] [raw]
Subject: Re: %u-order allocation failed

> Here goes the fix. (note that I didn't try to compile it so there may be
> bugs, but you see the point).

It isn't a fix.

> kmalloc should be fixed too (used badly for example in select.c - and yes
> - I have seen real world bugreports for poll randomly failing with
> ENOMEM), but it will be hard to audit all drivers that they do not try to
> use dma on kmallocated memory.

So you run out of blocks of vmalloc address space instead. The same problem
still occurs and always will.

2001-10-07 11:12:22

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: %u-order allocation failed

>
>You are right. Code that allocates more than a page and expects it to be
>physically contiguous is broken by design. Either rewrite the driver or
>allocate memory at boot. It will be very hard to audit all drivers for this.

Well, the problem here is not code. Some pieces of hardware just can't
scatter-gather, or in some cases they can, but the scatter/gather list
itself has to be contiguous and can be larger than a page.

The fact that kmalloc returns physically contiguous memory is a feature
and can't be modified that easily. If you intend to do so, then you need
different GFP flags, for example a GFP_CONTIGUOUS flag, and then make
sure that drivers allocating DMA memory use that new flag.

Ben.


2001-10-07 12:28:22

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

On Sun, 7 Oct 2001, Alan Cox wrote:

> > Here goes the fix. (note that I didn't try to compile it so there may be
> > bugs, but you see the point).
>
> It isnt a fix
>
> > kmalloc should be fixed too (used badly for example in select.c - and yes
> > - I have seen real world bugreports for poll randomly failing with
> > ENOMEM), but it will be hard to audit all drivers that they do not try to
> > use dma on kmallocated memory.
>
> So you run out of blocks of vmalloc address space instead. The same problem
> still occurs and always will

I already said it in mail to Rik:

Yes - you can run out of vmalloc space. But you run out of it only when
you create too many processes (8192), load too many modules etc. If
someone needs to put such a heavy load on Linux, we can expect that he is
not a luser and he knows how to increase the size of the vmalloc space.

But - you run out of high-order pages randomly. You don't have to overflow
any resource - just map a file, touch all of it the first time and then
periodically touch every second page of it. Or: periodically allocate one
anon page and one cache page - read() (without readahead) does exactly
that.

You can't run out of vmalloc space just by mapping files and touching
pages.

The probability math is fine - only if you are sure that pages are
allocated and freed randomly. But they are not.

Mikulas

2001-10-07 14:08:00

by Alan

[permalink] [raw]
Subject: Re: %u-order allocation failed

> Yes - you can run out of vmalloc space. But you run out of it only when
> you create too many processes (8192), load too many modules etc. If
> someone needs to put such heavy load on linux, we can expect that he is
> not a luser and he knows how to increase size of vmalloc space.

Not just that - you get fragmentation of it which leads you back to the
same situation as kmalloc except that with the guard pages you fragment the
address space more.
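
(If I remember 2.4's get_vm_area correctly, every vmalloc area also gets a
one-page guard appended, so the arithmetic for 8k stacks becomes roughly:

    8k stack + 4k guard = 12k of address space per process
    64M / 12k ~= 5461 processes, not the 64M / 8k = 8192 quoted earlier

and the 4k holes between areas are themselves unusable fragments.)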

Alan

2001-10-07 15:42:30

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

> > Yes - you can run out of vmalloc space. But you run out of it only when
> > you create too many processes (8192), load too many modules etc. If
> > someone needs to put such heavy load on linux, we can expect that he is
> > not a luser and he knows how to increase size of vmalloc space.
>
> Not just that - you get fragmentation of it which leads you back to the
> same situation as kmalloc except that with the guard pages you fragment the
> address space more.

So - for example, if you have 500 processes, each with an 8k stack (plus
one page for vmalloc alignment). Please tell me some alloc/free strategy
that fills up and fragments the 64M vmalloc space.

You can't find it.

The difference between memory and vmalloc space is this: you fill up the
whole memory with cache => memory fragments. You don't fill up the whole
vmalloc space with anything => vmalloc space doesn't fragment.

Mikulas

2001-10-07 18:41:14

by ebiederman

[permalink] [raw]
Subject: Re: %u-order allocation failed

Alex Bligh - linux-kernel <[email protected]> writes:

> Mikulas,
>
> > It uses vmalloc only when __GFP_VMALLOC flag is given - and so it is
> > expected to not use __GFP_VMALLOC flag in IRQ.
>
> Ah OK. If your point is that people use GFP_ATOMIC when it's
> not needed, and demand physically contiguous memory when only
> virtually contiguous memory is needed, in several places in
> the kernel, then you are correct. [I am not convinced that
> vmalloc() is the best way to fix it though.]
>
> Most of the order>0 users of __get_free_pages() don't
> 'need' to do that. For instance I was convinced that networking
> code needed this for larger than 4k packets (pre-fragmentation
> or post-prefragmentation) until someone pointed out that
> the kiovec stuff was there, waiting to be used, if someone
> made the code changes. But the code changes are non-trivial.

The zero copy stuff introduced in 2.4.4 allows for skb fragments.
I haven't seen any of the network drivers using it on their receive
path but it should be possible.

> Note also that something (not sure what) has made fragmentation
> increasingly prevalent over the years since the buddy allocator
> was originally put in.

Actually it seems to be situations like the stack now being two pages

> (see my earlier patch for measuring
> fragmentation). There is currently /no/ intelligence in there
> to defragment stuff, and the 'light touch' patches (ideas I had
> and posted here) don't appear to work. If we want __get_free_pages
> to allocate order>0 this is possible to do reliably if we
> have some intelligent form of page out which attempts
> to defragment as it runs, or else run a defragmenter. It's also possible
> to do allocate order>0 GFP_ATOMIC far more reliably than at
> present if we had a target for defragmentation under normal
> operation, just like we retain a target for pages reserved
> for atomic allocation.
>
> The very original buddy code (circa 94/95 which I wrote) maintained
> that there should be (from memory) at least one entry on a high
> order list (I think it was the 64k list), which gave you a few
> guaranteed 8k allocations (which was I was interested in). It's
> trivial to patch this into __get_free_pages though I haven't
> tried this (i.e. rather than just look at total free pages,
> look at the existance of a page on either the order=4, 5, 6...
> queues). Note you will use memory less efficiently if you do
> this. In times of cheaper memory costs, it might be worth
> testing this approach again.
>
> --
> Alex Bligh

2001-10-07 18:43:33

by ebiederman

[permalink] [raw]
Subject: Re: %u-order allocation failed

Alex Bligh - linux-kernel <[email protected]> writes:

> Mikulas,
>
> > It uses vmalloc only when __GFP_VMALLOC flag is given - and so it is
> > expected to not use __GFP_VMALLOC flag in IRQ.
>
> Ah OK. If your point is that people use GFP_ATOMIC when it's
> not needed, and demand physically contiguous memory when only
> virtually contiguous memory is needed, in several places in
> the kernel, then you are correct. [I am not convinced that
> vmalloc() is the best way to fix it though.]
>
> Most of the order>0 users of __get_free_pages() don't
> 'need' to do that. For instance I was convinced that networking
> code needed this for larger than 4k packets (pre-fragmentation
> or post-prefragmentation) until someone pointed out that
> the kiovec stuff was there, waiting to be used, if someone
> made the code changes. But the code changes are non-trivial.

The zero copy stuff introduced in 2.4.4 allows for skb fragments.
I haven't seen any of the network drivers using it on their receive
path but it should be possible.

> Note also that something (not sure what) has made fragmentation
> increasingly prevalent over the years since the buddy allocator
> was originally put in.

Actually it seems to be situations like the stack now being two
contiguous pages instead of one, where the demand for contiguous
memory has increased instead of the amount of fragmentation having
increased.

Eric

2001-10-07 20:43:47

by Pavel Machek

[permalink] [raw]
Subject: Re: %u-order allocation failed

Hi!

> > So what are you going to do when your 64MB of vmalloc space
> > runs out ?
>
> Make larger vmalloc space :-) Virtual memory costs very little.
> Besides 64M / 8k = 8192 - so it runs out at 8192 processes.

Hard to do on a machine with 1GB RAM... There, virtual memory costs
*very* much.
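
(Rough arithmetic for the default i386 3G/1G split, as I understand it: the
kernel's 1GB of virtual space holds the direct mapping of at most ~896MB of
RAM plus ~128MB reserved for vmalloc/ioremap/fixmaps, so on a 1GB box every
MB added to the vmalloc arena is a MB of RAM pushed out of the direct
mapping.)
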
Pavel
--
I'm [email protected]. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [email protected]

2001-10-07 21:57:22

by Alan

[permalink] [raw]
Subject: Re: %u-order allocation failed

> The difference between memory and vmalloc space is this: you fill up the
> whole memory with cache => memory fragments. You don't fill up the whole
> vmalloc space with anything => vmalloc space doesn't fragment.

vmalloc space fragments. You fragment address space rather than pages, that's
all. Same problem.

Subject: Re: %u-order allocation failed



--On Sunday, October 07, 2001 12:30 PM -0600 "Eric W. Biederman"
<[email protected]> wrote:

>> Note also that something (not sure what) has made fragmentation
>> increasingly prevalent over the years since the buddy allocator
>> was originally put in.
>
> Actually it seems to be situations like the stack now being two pages

Instrumentation posted here before appears to correlate fragmentation
being /caused/ with I/O activity (a single bonnie process and thus a
single 8k stack frame). My own guess is that it is due to
a different persistence of various caches.

I haven't seen anyone before blaming stack frame allocation
as a /cause/ of fragmentation - I've heard people say they
notice fragmentation more as stack frame allocs start to
fail - but that's a symptom.

--
Alex Bligh

Subject: Re: %u-order allocation failed



--On Sunday, October 07, 2001 11:01 PM +0100 Alan Cox
<[email protected]> wrote:

> vmalloc space fragments. You fragment address space rather than pages
> thats all. Same problem

Actually fragmented virtual space is theoretically
worse, as you have now lost a possible weapon to
defragment stuff (indirection on mapping to physical RAM -
i.e. you could no longer move or swap out physical RAM and
keep the virtual address mapping the same).

--
Alex Bligh

2001-10-08 22:24:45

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

On Sun, 7 Oct 2001, Alan Cox wrote:

> > The difference between memory and vmalloc space is this: you fill up the
> > whole memory with cache => memory fragments. You don't fill up the whole
> > vmalloc space with anything => vmalloc space doesn't fragment.
>
> vmalloc space fragments. You fragment address space rather than pages thats
> all. Same problem

If you have more than half of virtual space free, you can always find two
consecutive free pages. Period.
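
(The reasoning, restated in my own words: pair the N page slots of the arena
as (1,2), (3,4), ..., giving N/2 disjoint pairs. If no pair were entirely
free, each pair would contain at most one free page, so at most N/2 pages
could be free - contradicting "more than half free". Hence some pair, i.e.
two consecutive, aligned pages, must be free.)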

You can fill up half of virtual space if you start 4096 processes or load
many modules of total size 32M. Is it clear? Do you realize that no one
will ever hit this limit in a typical Linux configuration?

Mikulas

2001-10-08 22:36:55

by David Lang

[permalink] [raw]
Subject: Re: %u-order allocation failed

Only 4096 processes? That sounds low to me (I realize that some of my configs
are not typical, but this isn't that unusual on servers).

Does this limit go up if you raise the max number of processes/threads?

David Lang

On Tue, 9 Oct 2001, Mikulas Patocka wrote:

> Date: Tue, 9 Oct 2001 00:21:04 +0200 (CEST)
> From: Mikulas Patocka <[email protected]>
> To: Alan Cox <[email protected]>
> Cc: Rik van Riel <[email protected]>,
> Krzysztof Rusocki <[email protected]>, [email protected],
> [email protected]
> Subject: Re: %u-order allocation failed
>
> On Sun, 7 Oct 2001, Alan Cox wrote:
>
> > > The difference between memory and vmalloc space is this: you fill up the
> > > whole memory with cache => memory fragments. You don't fill up the whole
> > > vmalloc space with anything => vmalloc space doesn't fragment.
> >
> > vmalloc space fragments. You fragment address space rather than pages thats
> > all. Same problem
>
> If you have more than half of virtual space free, you can always find two
> consecutive free pages. Period.
>
> You can fill up half of virtual space if you start 4096 processes or load
> many modules of total size 32M. Is it clear? Do you realize that no one
> will ever hit this limit in typical linux configuration?
>
> Mikulas
>

Subject: Re: %u-order allocation failed



--On Tuesday, 09 October, 2001 12:21 AM +0200 Mikulas Patocka
<[email protected]> wrote:

> If you have more than half of virtual space free, you can always find two
> consecutive free pages. Period.

Now calculate the probability of not being able to do this in physical
space, assuming even page dispersion, and many pages free. You will
find it is very small. This may give you a clue as to what the problem
actually is.

--
Alex Bligh

2001-10-08 23:32:32

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

On Mon, 8 Oct 2001, Alex Bligh - linux-kernel wrote:

> --On Tuesday, 09 October, 2001 12:21 AM +0200 Mikulas Patocka
> <[email protected]> wrote:
>
> > If you have more than half of virtual space free, you can always find two
> > consecutive free pages. Period.
>
> Now calculate the probability of not being able to do this in physical
> space, assuming even page dispersion, and many pages free. You will
> find it is very small. This may give you a clue as to what the problem
> actually is.

My patch is not providing "very small probability". It is providing _zero_
probability that fork fails (assuming that more than half of the vmalloc
space is free).

I'm just tired of this stupid flamewar.

Linus, what do you think: is it OK if fork randomly fails with very small
probability or not?

Are you going to accept patch that maps task_struct into virtual space if
buddy allocator fails or not?

Mikulas

2001-10-08 23:39:52

by Alan

[permalink] [raw]
Subject: Re: %u-order allocation failed

> Linus, what do you think: is it OK if fork randomly fails with very small
> probability or not?

Your code doesn't change that behaviour. Not one iota. Do the mathematics,
work out the failure probabilities for page pairs. Now remember that the
vmalloc one has guard pages too.

You are trying to solve a non-problem with a non-solution.

Alan

2001-10-08 23:47:02

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

On Tue, 9 Oct 2001, Alan Cox wrote:

> > Linus, what do you think: is it OK if fork randomly fails with very small
> > probability or not?
>
> Your code doesnt change that behaviour. Not one iota. Do the mathematics,
> work out the failure probabilities for page pairs. Now remember that the
> vmalloc one has guard pages too.
>
> You are trying to solve a non problem with a non solution

I asked Linus, not you :-/

It's up to him, if he wants "stability-based-on-probability" algorithms in
Linux or not.

Mikulas

2001-10-08 23:49:32

by Linus Torvalds

[permalink] [raw]
Subject: Re: %u-order allocation failed


On Tue, 9 Oct 2001, Mikulas Patocka wrote:
>
> Linus, what do you think: is it OK if fork randomly fails with very small
> probability or not?

I've never seen it, I've never heard it reported, and I _know_ that
vmalloc() causes slowdowns.

In short, I'm not switching to a vmalloc() fork.

Linus

2001-10-08 23:54:52

by Mikulas Patocka

[permalink] [raw]
Subject: Re: %u-order allocation failed

On Mon, 8 Oct 2001, Linus Torvalds wrote:

>
> On Tue, 9 Oct 2001, Mikulas Patocka wrote:
> >
> > Linus, what do you think: is it OK if fork randomly fails with very small
> > probability or not?
>
> I've never seen it, I've never heard it reported, and I _know_ that
> vmalloc() causes slowdowns.
>
> In short, I'm not switching to a vmalloc() fork.

The patch uses the buddy allocator by default and falls back to vmalloc
only if buddy fails. Slowdown is not an issue here.

Mikulas

2001-10-09 09:45:39

by Pavel Machek

[permalink] [raw]
Subject: Re: %u-order allocation failed

Hi!

> > > Linus, what do you think: is it OK if fork randomly fails with very small
> > > probability or not?
> >
> > Your code doesnt change that behaviour. Not one iota. Do the mathematics,
> > work out the failure probabilities for page pairs. Now remember that the
> > vmalloc one has guard pages too.
> >
> > You are trying to solve a non problem with a non solution
>
> I asked Linus, not you :-/
>
> It's up to him, if he wants "stability-based-on-probability" algorithms in
> Linux or not.

You ignored the comment about guard pages.
Pavel
--
Casualities in World Trade Center: 6453 dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

2001-10-09 11:48:19

by Rik van Riel

[permalink] [raw]
Subject: Re: %u-order allocation failed

On Mon, 8 Oct 2001, Linus Torvalds wrote:
> On Tue, 9 Oct 2001, Mikulas Patocka wrote:
> >
> > Linus, what do you think: is it OK if fork randomly fails with very small
> > probability or not?
>
> I've never seen it, I've never heard it reported, and I _know_ that
> vmalloc() causes slowdowns.

I've seen it happen during a stress test of an underpowered
test box. When that point is reached, the system usually
is already so far overloaded there's little point in
allowing extra processes to be started.

> In short, I'm not switching to a vmalloc() fork.

The only real use I could see would be to allow root to
start up some commands to save the box when it's going
down the drain. Probably not worth it since root could
have just used ulimit for the normal users ;)

regards,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ (volunteers needed)

http://www.surriel.com/ http://distro.conectiva.com/

2001-10-13 19:34:11

by Pavel Machek

[permalink] [raw]
Subject: Re: %u-order allocation failed

Hi!

> > The difference between memory and vmalloc space is this: you fill up the
> > whole memory with cache => memory fragments. You don't fill up the whole
> > vmalloc space with anything => vmalloc space doesn't fragment.
>
> vmalloc space fragments. You fragment address space rather than pages thats
> all. Same problem

vmalloc space tends to be empty while RAM tends to be full. That might be
important.
Pavel

--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.