2001-03-01 09:46:11

by Ivan Stepnikov

Subject: Kernel is unstable

Hello!

I tried to test Linux memory allocation. For the experiment I used the i386
architecture with a Pentium III processor, 512M RAM and 8G of swap space. For
memory allocation I tried libhoard. The kernel was Linux 2.4.2 with the patch
per-process-3.5G-IA32-no-PAE-1, from
/pub/linux/kernel/people/andrea/patches/v2.4/2.4.0-test11-pre5/ on
ftp.kernel.org (this patch should allow a single process to allocate up to
approximately 3.5G).

When I tried large blocks (about 256K - 1M) everything was OK. More than 3G
of memory was successfully allocated.

With small blocks the result was significantly worse. About 2.3 - 2.4G was
allocated and then the system hung. It was still possible to switch between
local consoles and to receive ping replies over the network, but those were
the only signs of life (the hard disk did not work and it was impossible to
reboot the machine by the usual methods). In /var/log/messages I found:

Feb 28 17:08:55 zetta kernel: <ed.
Feb 28 17:08:55 zetta kernel: __alloc_pages: 0-order allocation failed.
Feb 28 17:09:05 zetta last message repeated 363 times



The test program looks like this:



block = 1024;

for (i=1; ; ++i ){
        if(p==malloc(block)){
                perror("failed");
                fprintf(stderr,"Error! Size total:%uM pass: %u\n",size/1024/1024,i);
                return 0;
        }
        size += block;
        printf("Size total:%uM pass: %u\n",size/1024/1024,i);
}



I think this test shouldn't hang the system. The system may work slowly,
very slowly, but it shouldn't hang. If the system cannot allocate any more
pages, it should just return the appropriate error value.

And the main thing: it seems the system doesn't use all of the available
swap space.

I'm sure that if the system worked properly, the results would be different.

--
Regards,
Ivan Stepnikov.



2001-03-01 10:28:19

by Matti Aarnio

Subject: Re: Kernel is unstable

On Thu, Mar 01, 2001 at 12:16:08PM +0300, Ivan Stepnikov wrote:
> Hello!
>
> I tried to test Linux memory allocation. For the experiment I used the i386
> architecture with a Pentium III processor, 512M RAM and 8G of swap space. For
> memory allocation I tried libhoard. The kernel was Linux 2.4.2 with the patch
> per-process-3.5G-IA32-no-PAE-1, from
> /pub/linux/kernel/people/andrea/patches/v2.4/2.4.0-test11-pre5/ on
> ftp.kernel.org (this patch should allow a single process to allocate up to
> approximately 3.5G).

I tried this with stock 2.4.0.
(I am sluggish to change things at my workstation.)

> When I tried large blocks (about 256K - 1M) everything was OK. More than 3G
> of memory was successfully allocated.
>
> With small blocks the result was significantly worse. About 2.3 - 2.4G was
> allocated and then the system hung. It was still possible to switch between
> local consoles and to receive ping replies over the network, but those were
> the only signs of life (the hard disk did not work and it was impossible to
> reboot the machine by the usual methods). In /var/log/messages I found:

That sounds like the system ended up with way too many memory FRAGMENTS,
and could no longer track them with the amount of memory space
available to the kernel? (The kernel needs memory for its internal
databases, very few (none?) of which can be in PAE areas.)

...

Yes, LOTS of fragments with malloc(1024):
...
48459000-4845a000 rw-p 00159000 00:00 0
4845a000-4845b000 rw-p 0015a000 00:00 0
4845b000-4845c000 rw-p 0015b000 00:00 0
4845c000-4845d000 rw-p 0015c000 00:00 0
4845d000-4845e000 rw-p 0015d000 00:00 0
4845e000-4845f000 rw-p 0015e000 00:00 0
4845f000-48460000 rw-p 0015f000 00:00 0
48460000-48461000 rw-p 00160000 00:00 0
48461000-48462000 rw-p 00161000 00:00 0
48462000-48463000 rw-p 00162000 00:00 0
...


For a 1G allocation there are some 16400 lines of /proc/PID/maps data.

The segments are adjacent, but still NOT merged in the mapping.


With malloc(1M):

...
44089000-4418a000 rw-p 00000000 00:00 0
4418a000-4428b000 rw-p 00000000 00:00 0
4428b000-4438c000 rw-p 00000000 00:00 0
4438c000-4448d000 rw-p 00000000 00:00 0
4448d000-4458e000 rw-p 00000000 00:00 0
4458e000-4468f000 rw-p 00000000 00:00 0
4468f000-44790000 rw-p 00000000 00:00 0
44790000-44891000 rw-p 00000000 00:00 0

mmap() allocates them as 1M+64k sized segments.
(And a mere 970 segments for a 1G allocation.)
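
If you want to reproduce the count yourself, a rough sketch like the one below
will do (the block size and count are arbitrary, and how many mappings you end
up with obviously depends on the allocator, glibc malloc vs. libhoard etc.):

#include <stdio.h>
#include <stdlib.h>

/* Allocate COUNT blocks of BLOCK bytes, then count the lines in
 * /proc/self/maps to see how many mappings the process ends up with. */
#define BLOCK 1024
#define COUNT (256 * 1024)              /* 256M worth of small blocks */

int main(void)
{
        FILE *f;
        int c, lines = 0;
        long i;

        for (i = 0; i < COUNT; i++)
                if (malloc(BLOCK) == NULL) {
                        perror("malloc");
                        break;
                }

        f = fopen("/proc/self/maps", "r");
        if (!f) {
                perror("/proc/self/maps");
                return 1;
        }
        while ((c = getc(f)) != EOF)
                if (c == '\n')
                        lines++;
        fclose(f);

        printf("%d lines in /proc/self/maps\n", lines);
        return 0;
}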

...
> I think this test shouldn't hang the system. The system may work slowly,
> very slowly, but it shouldn't hang. If the system cannot allocate any more
> pages, it should just return the appropriate error value.

I agree, the system should not hang when it runs out of
memory while constructing mapping tables. The misbehaving
client should get a malloc/mmap failure.

> And the main thing: it seems the system doesn't use all of the available
> swap space.
>
> I'm sure that if the system worked properly, the results would be different.
>
> --
> Regards,
> Ivan Stepnikov.

/Matti Aarnio

2001-03-01 13:46:42

by Heusden, Folkert van

Subject: RE: Kernel is unstable

> memory area has to be accessed. In some memory management systems,
> the allocated area has to be actually written (demand zero paging).
> If you execute from a user account, not root, with ulimits enabled,
> you should be able to do:
> char *p;
> for(;;)
> {
>         if((p = (char *) malloc(WHATEVER)) == NULL)
>         {
>                 puts("Out of memory");
>                 exit(1);
>         }
>         *p = (char) 0x01;       /* Write to memory */
> }
> ... without hanging the system.

Allow me to nitpick :o)

int loop;
for (loop = 0; loop < WHATEVER; loop += PAGESIZE)
        p[loop] = 0x01;

Because: as far as I remember, only the pages that are actually touched are
allocated, not the whole memory block.

But, indeed, that's nitpicking. Sorry for that :o)
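
Putting the two points together, a combined version might look something like
this rough sketch (sysconf() is used here so the page size is not hard-coded;
WHATEVER is whatever block size you want to test with):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define WHATEVER (1024 * 1024)          /* test block size, pick anything */

int main(void)
{
        long pagesize = sysconf(_SC_PAGESIZE);
        char *p;
        long off;

        for (;;) {
                if ((p = malloc(WHATEVER)) == NULL) {
                        puts("Out of memory");
                        exit(1);
                }
                /* touch one byte per page so every page really gets allocated */
                for (off = 0; off < WHATEVER; off += pagesize)
                        p[off] = 0x01;
        }
}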

2001-03-01 13:41:12

by Richard B. Johnson

Subject: Re: Kernel is unstable

On Thu, 1 Mar 2001, Ivan Stepnikov wrote:

> Hello!
>
> I tried to test Linux memory allocation. For the experiment I used the i386
> architecture with a Pentium III processor, 512M RAM and 8G of swap space. For
> memory allocation I tried libhoard. The kernel was Linux 2.4.2 with the patch
> per-process-3.5G-IA32-no-PAE-1, from
> /pub/linux/kernel/people/andrea/patches/v2.4/2.4.0-test11-pre5/ on
> ftp.kernel.org (this patch should allow a single process to allocate up to
> approximately 3.5G).
>
> When I tried large blocks (about 256K - 1M) everything was OK. More than 3G
> of memory was successfully allocated.
>
> With small blocks the result was significantly worse. About 2.3 - 2.4G was
> allocated and then the system hung. It was still possible to switch between
> local consoles and to receive ping replies over the network, but those were
> the only signs of life (the hard disk did not work and it was impossible to
> reboot the machine by the usual methods). In /var/log/messages I found:
>
> Feb 28 17:08:55 zetta kernel: <ed.
> Feb 28 17:08:55 zetta kernel: __alloc_pages: 0-order allocation failed.
> Feb 28 17:09:05 zetta last message repeated 363 times
>
>
>
> The test program looks like this:
>
>
>
> block = 1024;
>
> for (i=1; ; ++i ){
>         if(p==malloc(block)){
>                 perror("failed");
>                 fprintf(stderr,"Error! Size total:%uM pass: %u\n",size/1024/1024,i);
>                 return 0;
>         }
>         size += block;
>         printf("Size total:%uM pass: %u\n",size/1024/1024,i);
> }
>
>

This program does not allocate memory. It only tests to see if the
value returned from malloc() was equal to whatever is in variable "p".

No assignment was made. Further, for allocation to occur, an allocated
memory area has to be accessed. In some memory management systems,
the allocated area has to be actually written (demand zero paging).

If you execute from a user account, not root, with ulimits enabled,
you should be able to do:

char *p;

for(;;)
{
        if((p = (char *) malloc(WHATEVER)) == NULL)
        {
                puts("Out of memory");
                exit(1);
        }
        *p = (char) 0x01;       /* Write to memory */
}

... without hanging the system.

If you insist upon running without ulimits, from the root account,
you get what you asked for.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2001-03-01 14:22:50

by Andrea Arcangeli

Subject: Re: Kernel is unstable

On Thu, Mar 01, 2001 at 12:16:08PM +0300, Ivan Stepnikov wrote:
> if(p==malloc(block)){

Side note: I guess here you meant:

if ((p = malloc(block)) {

Make sure you catch the case where malloc returns NULL because of out-of-memory
(the out-of-memory condition in your case happened in the NORMAL zone).

Still, the fact that it can deadlock the machine is a kernel bug in the memory
management. The per-task 3.5G patch probably makes it simpler to trigger
because it decreases the size of the normal zone to a few hundred megabytes.

Andrea

2001-03-01 14:33:51

by Andrea Arcangeli

Subject: Re: Kernel is unstable

On Thu, Mar 01, 2001 at 12:27:53PM +0200, Matti Aarnio wrote:
> With malloc(1M):
>
> ...
> 44089000-4418a000 rw-p 00000000 00:00 0
> 4418a000-4428b000 rw-p 00000000 00:00 0
> 4428b000-4438c000 rw-p 00000000 00:00 0
> 4438c000-4448d000 rw-p 00000000 00:00 0
> 4448d000-4458e000 rw-p 00000000 00:00 0
> 4458e000-4468f000 rw-p 00000000 00:00 0
> 4468f000-44790000 rw-p 00000000 00:00 0
> 44790000-44891000 rw-p 00000000 00:00 0

This is actually another bug, completely orthogonal to the VM deadlock with the
empty ZONE_NORMAL.

From the above it's pretty obvious the clever vma merging is broken in 2.4.

Andrea

2001-03-01 14:38:11

by Andrea Arcangeli

Subject: Re: Kernel is unstable

On Thu, Mar 01, 2001 at 03:35:12PM +0100, Andrea Arcangeli wrote:
> From the above it's pretty obvious the clever vma merging is broken in 2.4.

It's not broken, it's not there any longer as somebody dropped it between test7
and 2.4.2, may I ask why?

Andrea

2001-03-01 17:13:24

by Ingo Oeser

Subject: Re: Kernel is unstable

On Thu, Mar 01, 2001 at 03:24:09PM +0100, Andrea Arcangeli wrote:
> On Thu, Mar 01, 2001 at 12:16:08PM +0300, Ivan Stepnikov wrote:
> > if(p==malloc(block)){
>
> Side note: I guess here you meant:
>
> if ((p = malloc(block)) {

<nitpick>
Actually

if ((p = malloc(block))) {

would even be correct C ;-)
</nitpick>
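
For reference, a self-contained version of the original test with the
assignment and the NULL check spelled out could look something like this
(the declarations and headers are guesses; they were not part of the posted
fragment):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        unsigned int block = 1024;
        unsigned int size = 0;
        unsigned int i;
        void *p;

        for (i = 1; ; ++i) {
                if ((p = malloc(block)) == NULL) {
                        perror("failed");
                        fprintf(stderr, "Error! Size total:%uM pass: %u\n",
                                size / 1024 / 1024, i);
                        return 0;
                }
                size += block;
                printf("Size total:%uM pass: %u\n", size / 1024 / 1024, i);
        }
}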

Regards

Ingo Oeser
--
10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag>
<<<<<<<<<<<< come and join the fun >>>>>>>>>>>>

2001-03-01 18:18:06

by Alan

Subject: Re: Kernel is unstable

> It's not broken, it's not there any longer as somebody dropped it between test7
> and 2.4.2, may I ask why?

Linus took it out because it was breaking things.

2001-03-01 18:29:16

by Andrea Arcangeli

Subject: Re: Kernel is unstable

On Thu, Mar 01, 2001 at 06:20:49PM +0000, Alan Cox wrote:
> > It's not broken, it's not there any longer as somebody dropped it between test7
> > and 2.4.2, may I ask why?
>
> Linus took it out because it was breaking things.

Even if it happened to be buggy, it didn't look unfixable from a design standpoint,
and I think it was a very worthwhile feature: not just to save memory but also to
avoid growing the size of the AVL tree, which we would otherwise have to pay for
later at every page fault.

Andrea

2001-03-01 19:08:21

by David Miller

Subject: Re: Kernel is unstable


Andrea Arcangeli writes:
> Even if it happened to be buggy, it didn't look unfixable from a design standpoint,
> and I think it was a very worthwhile feature: not just to save memory but also to
> avoid growing the size of the AVL tree, which we would otherwise have to pay for
> later at every page fault.

Linus didn't find it to be such a gain, and in fact the one
place that does gain from such merging (sys_brk()) does the
merging by hand :-)

Later,
David S. Miller
[email protected]

2001-03-01 19:15:12

by Linus Torvalds

Subject: Re: Kernel is unstable

In article <[email protected]>,
Andrea Arcangeli <[email protected]> wrote:
>On Thu, Mar 01, 2001 at 06:20:49PM +0000, Alan Cox wrote:
>> > It's not broken, it's not there any longer as somebody dropped it between test7
>> > and 2.4.2, may I ask why?
>>
>> Linus took it out because it was breaking things.
>
>Even if it happened to be buggy, it didn't look unfixable from a design standpoint,
>and I think it was a very worthwhile feature: not just to save memory but also to
>avoid growing the size of the AVL tree, which we would otherwise have to pay for
>later at every page fault.

The locking order was rather nasty in it (mapping->i_shared_lock and
mm->page_table_lock), and made a lot of the code much less readable than
it should have been. And because none of the callers could know whether
the vma existed after being merged, they ended up doing strange things
that simply aren't necessary with the much simpler version.

This, coupled with the fact that many merges could be done trivially by
hand (much faster), made me drop it. There were a few places where it
was used where I couldn't make myself be sure that the locking was
right: I could not prove that it was buggy, but I couldn't convince
myself that it wasn't, either.

Note how do_brk() does the merging itself (see the comment "Can we just
expand an old anonymous mapping?"), and that it's basically free when
done that way, with no worries about locking etc. The same could be done
fairly trivially in mmap too, but I never saw any real usage patterns
that made it look all that worthwhile (*). Handling the mmap case the
same way do_brk() does it would fix the behaviour of this pathological
example too..
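
The idea is simple enough to sketch in a few lines of user-space C
(illustrative only, with made-up names; the real thing obviously deals with
vm_area_struct, the AVL tree and the locking):

#include <stdio.h>

struct vma {                            /* stand-in for the real vm_area_struct */
        unsigned long vm_start;
        unsigned long vm_end;
        unsigned long vm_flags;
        int anon;                       /* 1 = no backing file */
};

/* Before creating a new anonymous vma, check whether the previous vma ends
 * exactly where the new mapping starts and has the same flags; if so, just
 * extend it instead of allocating a new one. */
static int try_expand_anon(struct vma *prev, unsigned long addr,
                           unsigned long len, unsigned long flags)
{
        if (prev && prev->anon                  /* anonymous mapping    */
            && prev->vm_end == addr             /* directly adjacent    */
            && prev->vm_flags == flags) {       /* same protection etc. */
                prev->vm_end = addr + len;      /* merge: no new vma    */
                return 1;
        }
        return 0;                               /* caller allocates a new vma */
}

int main(void)
{
        struct vma heap = { 0x40000000UL, 0x40100000UL, 0x3UL, 1 };

        /* a new 1M anonymous request right after the existing vma just extends it */
        if (try_expand_anon(&heap, heap.vm_end, 0x100000UL, 0x3UL))
                printf("merged: %#lx-%#lx\n", heap.vm_start, heap.vm_end);
        return 0;
}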

Also note that the merging tests were not free, so at least under my set
of normal load the non-merging code is actually _faster_ than the clever
optimized merging. That was what clinched it for me: I absolutely hate
to see complexity that doesn't really buy you anything noticeable.

Linus

(*) The only "testing" I did was really running normal applications and
then checking how many merges could be done on /proc/*/maps. Under
normal load I did not see very many at all - I had something like six
missed merges while running my normal set of applications (X, KDE etc).
Others can obviously have very different usage patterns.

2001-03-01 20:44:36

by Andrea Arcangeli

Subject: Re: Kernel is unstable

On Thu, Mar 01, 2001 at 11:04:55AM -0800, David S. Miller wrote:
> Linus didn't find it to be such a gain, and in fact the one
> place that does gain from such merging (sys_brk()) does the
> merging by hand :-)

Somewhere, in either the kernel or glibc, we need to do the merging to avoid
allocating new vmas and growing the AVL tree, otherwise server environments will
be hurt. We're not going to need this feature to run KDE well, or to do
compiles, but people using the 3.5G per-task patch, or moving to a 64bit
userspace because 3.5G is too tight for them, are certainly going to need this
optimization.

I certainly prefer to do the merging at the kernel layer because it's generic
and it doesn't optimize only malloc users.

Andrea

2001-03-02 08:17:40

by Christoph Rohland

Subject: Re: Kernel is unstable

Hi Linus,

On 1 Mar 2001, Linus Torvalds wrote:
> Note how do_brk() does the merging itself (see the comment "Can we
> just expand an old anonymous mapping?"), and that it's basically
> free when done that way, with no worries about locking etc. The same
> could be done fairly trivially in mmap too, but I never saw any real
> usage patterns that made it look all that worthwhile (*). Handling
> the mmap case the same way do_brk() does it would fix the behaviour
> of this pathological example too..

Oh, there is at least one application which does trigger the merging
quite often: SAP R/3. We have a big memory area which is handled in 1M
blocks that get mmapped/munmapped/mprotected all the time. This now
leads to a really big AVL tree, which before was much smaller.

I am not sure that the merging is a gain, since it is itself an
overhead and we work on fixed blocks. I simply wanted to point out
that there are applications out there which trigger it.
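
Just to illustrate the kind of pattern (a made-up sketch, not our actual code;
the block count and protections are arbitrary):

#include <stdio.h>
#include <sys/mman.h>

#define NBLOCKS 1024
#define BLOCK   (1024 * 1024)

int main(void)
{
        void *blk[NBLOCKS];
        int i;

        /* map a pool of independent 1M anonymous blocks */
        for (i = 0; i < NBLOCKS; i++) {
                blk[i] = mmap(NULL, BLOCK, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (blk[i] == MAP_FAILED) {
                        perror("mmap");
                        return 1;
                }
        }

        /* make every other block read-only and unmap the rest: what is left
         * is a lot of small vmas that cannot be merged with their neighbours */
        for (i = 0; i < NBLOCKS; i += 2)
                mprotect(blk[i], BLOCK, PROT_READ);
        for (i = 1; i < NBLOCKS; i += 2)
                munmap(blk[i], BLOCK);

        return 0;
}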

Greetings
Christoph

2001-03-02 08:41:08

by David Howells

Subject: Re: Kernel is unstable


Linus Torvalds <[email protected]> wrote:
> Also note that the merging tests were not free, so at least under my set of
> normal load the non-merging code is actually _faster_ than the clever
> optimized merging. That was what clinched it for me: I absolutely hate to
> see complexity that doesn't really buy you anything noticeable.

Surely, doing the merge will always take longer than not doing the merge,
no matter how finely optimised the algorithm... But merging wouldn't be done
very often... only on memory allocation calls.

Or do you mean that use of the resultant VMA chain/tree is slower? This I find
surprising, since after merging the VMA tree should be at worst as complex as it
was before merging, and fairly likely simpler.

Perhaps it'd be reasonable to only do VMA merging on mmap calls and not brk
calls, and let brk manually extend an existing VMA if possible.

David

2001-03-02 14:01:10

by Andrea Arcangeli

Subject: Re: Kernel is unstable

On Fri, Mar 02, 2001 at 08:40:44AM +0000, David Howells wrote:
> no matter how finely optimised the algorithm... But merging wouldn't be done
> very often... only on memory allocation calls.

Correct, it would happen only at mmap time of course.

> Perhaps it'd be reasonable to only do VMA merging on mmap calls and not brk
> calls, and let brk manually extend an existing VMA if possible.

Yep.

Andrea

2001-03-02 17:53:13

by Linus Torvalds

Subject: Re: Kernel is unstable



On Fri, 2 Mar 2001, David Howells wrote:
>
> Surely, doing the merge will always take longer than not doing the merge,
> no matter how finely optimised the algorithm... But merging wouldn't be done
> very often... only on memory allocation calls.

Ehh.. If the merging doesn't actually happen, it's always a loss. We've
just spent CPU cycles on doing something useless. And in my tests, that
was the case a lot more than not.

Also, compared with the expense of taking a page fault, looking one or two levels
deeper in the AVL tree is pretty much not noticeable.

Show me numbers for real applications, and I might care. I saw barely
measurable speedups (and more importantly to me - real simplification) by
removing it.

Don't bother arguing with "it might.."

Linus

2001-03-05 23:07:09

by Chris Wedgwood

Subject: Re: Kernel is unstable

On Fri, Mar 02, 2001 at 09:52:35AM -0800, Linus Torvalds wrote:

> Show me numbers for real applications, and I might care. I saw
> barely measurable speedups (and more importantly to me - real
> simplification) by removing it.

Do large numbers of entries hurt? I have an application that, when
running for a while, ends up with over 2000 entries in /proc/$$/maps
because of the way it works... that is, paging is more efficient than
the application swapping data out itself, so it continually grows until it
can grow no longer, at which point it falls back to using its own poor
LRU; typically that means you get a single process of around 2.9GB.

Everything appears to work... but it does seem a bit clunky that
there are so many vma chunks.

Perhaps the argument here is that glibc or the application should be
fixed?




--cw