2006-11-22 07:51:10

by Aubrey Li

Subject: The VFS cache is not freed when there is not enough free memory to allocate

Hi all,

We are working on the Blackfin uClinux platform and we encountered the
following problem. The attached patch can work around this issue; I'm
posting it here to find a better solution.

Here is a test application:
---------------------------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#define N 8

int main(void)
{
	void *p[N];
	int i;

	printf("Alloc %d MB !\n", N);

	for (i = 0; i < N; i++) {
		p[i] = malloc(1024 * 1024);	/* one 1 MB block per iteration */
		if (p[i] == NULL)
			printf("alloc failed\n");
	}

	printf("alloc successful \n");
	for (i = 0; i < N; i++)
		free(p[i]);	/* free(NULL) is a no-op, so failed slots are safe */
	return 0;
}

When there is not enough free memory for the allocation:
==============================
root:/mnt> cat /proc/meminfo
MemTotal: 54196 kB
MemFree: 5520 kB <== only ~5 MB free
Buffers: 76 kB
Cached: 44696 kB <== cache eats ~44 MB
SwapCached: 0 kB
Active: 21092 kB
Inactive: 23680 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 54196 kB
LowFree: 5520 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 0 kB
Mapped: 0 kB
Slab: 3720 kB
PageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 27096 kB
Committed_AS: 0 kB
VmallocTotal: 0 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
==========================================

I run the test application and get the following message:
---------------------------------------
root:/mnt> ./t
Alloc 8 MB !
t: page allocation failure. order:9, mode:0x40d0
Hardware Trace:
0 Target : <0x00004de0> { _dump_stack + 0x0 }
Source : <0x0003054a> { ___alloc_pages + 0x17e }
1 Target : <0x0003054a> { ___alloc_pages + 0x17e }
Source : <0x0000dbc2> { _printk + 0x16 }
2 Target : <0x0000dbbe> { _printk + 0x12 }
Source : <0x0000da4e> { _vprintk + 0x1a2 }
3 Target : <0x0000da42> { _vprintk + 0x196 }
Source : <0xffa001ea> { __common_int_entry + 0xd8 }
4 Target : <0xffa00188> { __common_int_entry + 0x76 }
Source : <0x000089bc> { _return_from_int + 0x58 }
5 Target : <0x000089bc> { _return_from_int + 0x58 }
Source : <0x00008992> { _return_from_int + 0x2e }
6 Target : <0x00008964> { _return_from_int + 0x0 }
Source : <0xffa00184> { __common_int_entry + 0x72 }
7 Target : <0xffa00182> { __common_int_entry + 0x70 }
Source : <0x00012682> { __local_bh_enable + 0x56 }
8 Target : <0x0001266c> { __local_bh_enable + 0x40 }
Source : <0x0001265c> { __local_bh_enable + 0x30 }
9 Target : <0x00012654> { __local_bh_enable + 0x28 }
Source : <0x00012644> { __local_bh_enable + 0x18 }
10 Target : <0x0001262c> { __local_bh_enable + 0x0 }
Source : <0x000128e0> { ___do_softirq + 0x94 }
11 Target : <0x000128d8> { ___do_softirq + 0x8c }
Source : <0x000128b8> { ___do_softirq + 0x6c }
12 Target : <0x000128aa> { ___do_softirq + 0x5e }
Source : <0x0001666a> { _run_timer_softirq + 0x82 }
13 Target : <0x000165fc> { _run_timer_softirq + 0x14 }
Source : <0x00023eb8> { _hrtimer_run_queues + 0xe8 }
14 Target : <0x00023ea6> { _hrtimer_run_queues + 0xd6 }
Source : <0x00023e70> { _hrtimer_run_queues + 0xa0 }
15 Target : <0x00023e68> { _hrtimer_run_queues + 0x98 }
Source : <0x00023eae> { _hrtimer_run_queues + 0xde }
Stack from 015a7dcc:
00000001 0003054e 00000000 00000001 000040d0 0013c70c 00000009 000040d0
00000000 00000080 00000000 000240d0 00000000 015a6000 015a6000 015a6000
00000010 00000000 00000001 00036e12 00000000 0023f8e0 00000073 00191e40
00000020 0023e9a0 000040d0 015afea9 015afe94 00101fff 000040d0 0023e9a0
00000010 00101fff 000370de 00000000 0363d3e0 00000073 0000ffff 04000021
00000000 00101000 00187af0 00035b44 00000000 00035e40 00000000 00000000
Call Trace:
[<0000fffe>] _do_exit+0x12e/0x7cc
[<00004118>] _sys_mmap+0x54/0x98
[<00101000>] _fib_create_info+0x670/0x780
[<00008828>] _system_call+0x68/0xba
[<000040c4>] _sys_mmap+0x0/0x98
[<0000fffe>] _do_exit+0x12e/0x7cc
[<00008000>] _cplb_mgr+0x8/0x2e8
[<00101000>] _fib_create_info+0x670/0x780
[<00101000>] _fib_create_info+0x670/0x780

Mem-info:
DMA per-cpu:
cpu 0 hot: high 18, batch 3 used:5
cpu 0 cold: high 6, batch 1 used:5
DMA32 per-cpu: empty
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages: 21028kB (0kB HighMem)
Active:2549 inactive:3856 dirty:0 writeback:0 unstable:0 free:5257
slab:1833 mapped:0 pagetables:0
DMA free:21028kB min:948kB low:1184kB high:1420kB active:10196kB
inactive:15424kB present:56320kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB
inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 43*4kB 35*8kB 28*16kB 17*32kB 18*64kB 20*128kB 16*256kB 11*512kB
6*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB = 21028kB
DMA32: empty
Normal: empty
HighMem: empty
14080 pages of RAM
5285 free pages
531 reserved pages
11 pages shared
0 pages swap cached
Allocation of length 1052672 from process 57 failed
DMA per-cpu:
cpu 0 hot: high 18, batch 3 used:5
cpu 0 cold: high 6, batch 1 used:5
DMA32 per-cpu: empty
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages: 21028kB (0kB HighMem)
Active:2549 inactive:3856 dirty:0 writeback:0 unstable:0 free:5257
slab:1833 mapped:0 pagetables:0
DMA free:21028kB min:948kB low:1184kB high:1420kB active:10196kB
inactive:15424kB present:56320kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB
inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 43*4kB 35*8kB 28*16kB 17*32kB 18*64kB 20*128kB 16*256kB 11*512kB
6*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB = 21028kB
DMA32: empty
Normal: empty
HighMem: empty
-----------------------------

When there is not enough free memory, the kernel fails the allocation
instead of freeing the VFS cache, no matter how large
/proc/sys/vm/vfs_cache_pressure is set.

Here is my patch:
=====================================
From a8a03f1fed672cc310feb3f5fafdc9e0e7a6546f Mon Sep 17 00:00:00 2001
From: Aubrey.Li <[email protected]>
Date: Wed, 22 Nov 2006 15:10:18 +0800
Subject: [PATCH] Drop VFS cache when there is not enough free memory to allocate

Signed-off-by: Aubrey.Li <[email protected]>
---
mm/page_alloc.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bf2f6cf..62559fd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1039,6 +1039,11 @@ restart:
 	if (page)
 		goto got_pg;
 
+#if defined(CONFIG_EMBEDDED) && !defined(CONFIG_MMU)
+	drop_pagecache();
+	drop_slab();
+#endif
+
 	/* This allocation should allow future memory freeing. */
 
 	if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))
--
1.4.2
========================================

The patch drops the page cache and the slab caches and then gives the
allocator another chance to find free pages. With this patch applied, my
test application can allocate memory successfully, and the cache and
slab are dropped as well. See below:
================================
root:/mnt> ./t
Alloc 8 MB !
alloc successful
root:/mnt> cat /proc/meminfo
MemTotal: 54196 kB
MemFree: 43684 kB
Buffers: 36 kB
Cached: 440 kB
SwapCached: 0 kB
Active: 32 kB
Inactive: 432 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 54196 kB
LowFree: 43684 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 8 kB
Writeback: 0 kB
AnonPages: 0 kB
Mapped: 0 kB
Slab: 9812 kB
PageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 27096 kB
Committed_AS: 0 kB
VmallocTotal: 0 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
=============================

I know performance is important for Linux, and the VFS cache obviously
improves performance for file operations. But on an embedded system, we
will do our best to keep applications running rather than let the
system hang for the sake of performance.

Any suggestions and solutions are really appreciated!

Thanks,
-Aubrey



2006-11-22 08:47:58

by Peter Zijlstra

Subject: Re: The VFS cache is not freed when there is not enough free memory to allocate

On Wed, 2006-11-22 at 15:51 +0800, Aubrey wrote:
> Hi all,
>
> We are working on the Blackfin uClinux platform and we encountered the
> following problem. The attached patch can work around this issue; I'm
> posting it here to find a better solution.

> root:/mnt> ./t
> Alloc 8 MB !
> t: page allocation failure. order:9, mode:0x40d0
^^^^^^^
Such high-order allocations rarely succeed after bootup. The proposed
patch will hardly help more than lumpy reclaim would. Please see the
threads on Mel Gorman's Anti-Fragmentation and Linear/Lumpy reclaim in
the linux-mm archives.

> From: Aubrey.Li <[email protected]>
> Date: Wed, 22 Nov 2006 15:10:18 +0800
> Subject: [PATCH] Drop VFS cache when there is not enough free memory to allocate
>
> Signed-off-by: Aubrey.Li <[email protected]>
> ---
> mm/page_alloc.c | 5 +++++
> 1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bf2f6cf..62559fd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1039,6 +1039,11 @@ restart:
>  	if (page)
>  		goto got_pg;
>  
> +#if defined(CONFIG_EMBEDDED) && !defined(CONFIG_MMU)
> +	drop_pagecache();
> +	drop_slab();
> +#endif
> +
>  	/* This allocation should allow future memory freeing. */
>  
>  	if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))
> --


> The patch drops the page cache and the slab caches and then gives the
> allocator another chance to find free pages. With this patch applied, my
> test application can allocate memory successfully, and the cache and
> slab are dropped as well. See below:
> ================================
> root:/mnt> ./t
> Alloc 8 MB !
> alloc successful

Pure luck: there are workloads where there simply would not be any
freeable order-9 contiguous block (think of the case where every
order-9 block contains at least one active inode).

> I know performance is important for Linux, and the VFS cache obviously
> improves performance for file operations. But on an embedded system, we
> will do our best to keep applications running rather than let the
> system hang for the sake of performance.
>
> Any suggestions and solutions are really appreciated!

Try Mel's patches and wait for the next Lumpy reclaim posting.

The lack of an MMU on your system makes it very hard not to rely on
higher-order allocations, because even user-space allocations need to
be physically contiguous. But please take that into consideration when
writing software.

2006-11-22 10:02:33

by Aubrey Li

Subject: Re: The VFS cache is not freed when there is not enough free memory to allocate

On 11/22/06, Peter Zijlstra <[email protected]> wrote:
> Please see the
> threads on Mel Gorman's Anti-Fragmentation and Linear/Lumpy reclaim in
> the linux-mm archives.
>

Thanks for pointing this out. Is it already included in Linus' git tree?

> > The patch drops the page cache and the slab caches and then gives the
> > allocator another chance to find free pages. With this patch applied, my
> > test application can allocate memory successfully, and the cache and
> > slab are dropped as well. See below:
> > ================================
> > root:/mnt> ./t
> > Alloc 8 MB !
> > alloc successful
>
> Pure luck: there are workloads where there simply would not be any
> freeable order-9 contiguous block (think of the case where every
> order-9 block contains at least one active inode).
>
> I know performance is important for Linux, and the VFS cache obviously
> improves performance for file operations. But on an embedded system, we
> will do our best to keep applications running rather than let the
> system hang for the sake of performance.
> >
> > Any suggestions and solutions are really appreciated!
>
> Try Mel's patches and wait for the next Lumpy reclaim posting.
>
> The lack of an MMU on your system makes it very hard not to rely on
> higher-order allocations, because even user-space allocations need to
> be physically contiguous. But please take that into consideration when
> writing software.

Well, the test application just uses an exaggerated way to replicate the issue.

Actually, in real workloads, applications such as mplayer, asterisk,
etc. run into the above problem when they are run a second time. I
don't think I have any reason to modify those kinds of applications.

My patch lets the kernel drop the VFS cache in a low-memory situation
when an application requests more memory; I don't think it's luck. The
application just wants to allocate 8 one-megabyte blocks (order = 9),
and by releasing the VFS cache we can get almost 50 MB of free memory.
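
To double-check the order-9 number in the log: assuming 4 KB pages, and
assuming the extra 4 KB beyond 1 MB in the "Allocation of length
1052672" line is allocator overhead, the rounding works out like this
(a quick sketch):

#include <stdio.h>

int main(void)
{
	unsigned long len = 1052672;	/* from the failure message above */
	unsigned long pages = (len + 4095) / 4096;	/* = 257 pages */
	int order = 0;

	/* the buddy allocator hands out power-of-two numbers of pages */
	while ((1UL << order) < pages)
		order++;

	printf("%lu pages -> order %d (%lu kB contiguous)\n",
	       pages, order, (4096UL << order) / 1024);
	return 0;
}

This prints "257 pages -> order 9 (2048 kB contiguous)", so per the
log, each 1 MB malloc here actually needs an order-9 (2 MB) physically
contiguous block.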

The patch indeed made many previously failing test cases pass on our
side. But yes, I don't think it's the final solution. I'll try Mel's
patches and report the results.

Thanks,
-Aubrey

2006-11-22 10:46:58

by Peter Zijlstra

Subject: Re: The VFS cache is not freed when there is not enough free memory to allocate

On Wed, 2006-11-22 at 18:02 +0800, Aubrey wrote:
> On 11/22/06, Peter Zijlstra <[email protected]> wrote:
> > Please see the
> > threads on Mel Gorman's Anti-Fragmentation and Linear/Lumpy reclaim in
> > the linux-mm archives.
> >
>
> Thanks for pointing this out. Is it already included in Linus' git tree?

No it is not.

> Well, the test application just uses an exaggerated way to replicate the issue.
>
> Actually, in real workloads, applications such as mplayer, asterisk,
> etc. run into the above problem when they are run a second time. I
> don't think I have any reason to modify those kinds of applications.

It comes from the choice of architecture; I'd not run general-purpose
code like that on MMU-less hardware. But yeah, I see your point.

> My patch lets the kernel drop the VFS cache in a low-memory situation
> when an application requests more memory; I don't think it's luck. The
> application just wants to allocate 8 one-megabyte blocks (order = 9),
> and by releasing the VFS cache we can get almost 50 MB of free memory.

Yes it does that, but there is no guarantee that those 50MB have a
single 1M contiguous region amongst them.

> The patch indeed made many previously failing test cases pass on our
> side. But yes, I don't think it's the final solution. I'll try Mel's
> patches and report the results.

Mel's patches alone aren't quite enough; you also need some reclaim
modifications. I'll ping Andy to see how far along he is on that.

2006-11-22 11:09:44

by Aubrey Li

Subject: Re: The VFS cache is not freed when there is not enough free memory to allocate

On 11/22/06, Peter Zijlstra <[email protected]> wrote:
>
> Mel's patches alone aren't quite enough; you also need some reclaim
> modifications. I'll ping Andy to see how far along he is on that.
>

I think so. From a quick look at Mel's patches, I found they can't help
our case. The current situation is that the application needs 8 MB of
memory, but there is only 5 MB free, while cached memory eats almost
44 MB. When the application requests the memory, the kernel just
reports failure; it does not attempt to release the VFS cache and try
again.
==============================
root:/mnt> cat /proc/meminfo
MemTotal: 54196 kB
MemFree: 5520 kB <== only ~5 MB free
Buffers: 76 kB
Cached: 44696 kB <== cache eats ~44 MB
SwapCached: 0 kB
Active: 21092 kB
Inactive: 23680 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 54196 kB
LowFree: 5520 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 0 kB
Mapped: 0 kB
Slab: 3720 kB
PageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 27096 kB
Committed_AS: 0 kB
VmallocTotal: 0 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
==========================================

-Aubrey

2006-11-27 01:34:06

by Mike Frysinger

Subject: Re: The VFS cache is not freed when there is not enough free memory to allocate

On 11/22/06, Peter Zijlstra <[email protected]> wrote:
> Yes it does that, but there is no guarantee that those 50MB have a
> single 1M contiguous region amongst them.

right ... the testcase posted is more to quickly illustrate the
problem ... the requested size doesn't really matter, what does matter
is that we can't seem to reclaim memory from the VFS cache in scenarios
where the VFS cache is eating a ton of memory and we need some more

another scenario is where an application is constantly reading data
from a CD, re-encoding it to MP3, and then writing it to disk. the
VFS cache here quickly eats up the available memory.
-mike

2006-11-27 06:39:46

by Nick Piggin

Subject: Re: The VFS cache is not freed when there is not enough free memory to allocate

Aubrey wrote:
> On 11/22/06, Peter Zijlstra <[email protected]> wrote:

>> The lack of an MMU on your system makes it very hard not to rely on
>> higher-order allocations, because even user-space allocations need to
>> be physically contiguous. But please take that into consideration when
>> writing software.
>
>
> Well, the test application just uses an exaggerated way to replicate
> the issue.
>
> Actually, in real workloads, applications such as mplayer, asterisk,
> etc. run into the above problem when they are run a second time. I
> don't think I have any reason to modify those kinds of applications.

No, that's wrong. And your patch is just a hack that happens to mask the
issue in the case you tested; it will probably blow up in production
at some stage (consider the case where a VFS cache page is not freeable,
or where the page is being used for something else).

With the nommu kernel, you actually *do* have a huge reason to write
special code: large anonymous memory allocations have to use higher order
allocations!

I haven't actually written any nommu userspace code, but it is obvious
that you must try to keep malloc to <= PAGE_SIZE (although order 2 and
even 3 allocations seem to be reasonable, from process context)... Then
you would use something a bit more advanced than a linear array to store
data (a pagetable-like radix tree would be a nice, easy idea).
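
For instance (an untested sketch, with made-up names): instead of one
huge malloc(), keep a top-level table of page-sized chunks and index
into it, pagetable-style, so the data itself only ever needs order-0
allocations:

#include <stdlib.h>

#define CHUNK_SHIFT	12			/* 4 KB chunks */
#define CHUNK_SIZE	(1UL << CHUNK_SHIFT)

struct chunked_buf {
	size_t nchunks;
	void **chunk;		/* top-level pointer table */
};

/* Allocate a logical buffer of 'bytes' as many independent chunks,
 * so no physically contiguous high-order block is ever required. */
static struct chunked_buf *cbuf_alloc(size_t bytes)
{
	struct chunked_buf *b = malloc(sizeof(*b));
	size_t i;

	if (!b)
		return NULL;
	b->nchunks = (bytes + CHUNK_SIZE - 1) >> CHUNK_SHIFT;
	b->chunk = calloc(b->nchunks, sizeof(void *));
	if (!b->chunk) {
		free(b);
		return NULL;
	}
	for (i = 0; i < b->nchunks; i++) {
		b->chunk[i] = malloc(CHUNK_SIZE);	/* order 0 each */
		if (!b->chunk[i])
			goto fail;
	}
	return b;
fail:
	while (i--)
		free(b->chunk[i]);
	free(b->chunk);
	free(b);
	return NULL;
}

/* All byte access goes through the table, like a software pagetable. */
static char *cbuf_at(struct chunked_buf *b, size_t off)
{
	return (char *)b->chunk[off >> CHUNK_SHIFT] + (off & (CHUNK_SIZE - 1));
}

The pointer table itself is still a single allocation (8 KB for an 8 MB
buffer on a 32-bit system), but that is order 1, not order 9.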

You are of course free to put that patch into your product's kernel
(although I would advise against it, because it has a lot of deadlock
issues)... but the reality is that if you want a robust system, you
cannot just take a unix program and run it unmodified on a nommu kernel
AFAIKS.

--
SUSE Labs, Novell Inc.

2006-11-29 07:17:16

by Sonic Zhang

Subject: Re: The VFS cache is not freed when there is not enough free memory to allocate

Forward to the mailing list.

Sonic Zhang wrote:
> On 11/27/06, Nick Piggin <[email protected]> wrote:


>> I haven't actually written any nommu userspace code, but it is obvious
>> that you must try to keep malloc to <= PAGE_SIZE (although order 2 and
>> even 3 allocations seem to be reasonable, from process context)... Then
>> you would use something a bit more advanced than a linear array to store
>> data (a pagetable-like radix tree would be a nice, easy idea).
>>
>
> But even if we split the 8 MB of memory into 2048 x 4 KB blocks, we
> still face this failure. The key problem is that the available memory
> is smaller than 2048 x 4 KB, while there is still a lot of VFS cache.
> The VFS cache can be freed, but the kernel allocation function ignores
> it. See the new test application.


Which kernel allocation function? If you can provide more details I'd
like to get to the bottom of this.

Because the anonymous memory in mm/nommu.c is all allocated with
GFP_KERNEL from process context, and in that case the allocator should
not fail but should call into page reclaim, which in turn will free the
VFS caches.
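
For reference, the dentry and inode caches hook into that reclaim path
as slab shrinkers; quoting the registrations from memory, so treat the
exact names as approximate:

/* fs/dcache.c and fs/inode.c, 2.6.19-era: shrink_slab() walks the
 * shrinker list from the reclaim path, which is how a GFP_KERNEL
 * allocation that enters reclaim can free dentries and inodes. */
set_shrinker(DEFAULT_SEEKS, shrink_dcache_memory);	/* dcache_init() */
set_shrinker(DEFAULT_SEEKS, shrink_icache_memory);	/* inode_init() */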



> What's a better way to free the VFS cache in the memory allocator?


It should be freeing it for you, so I'm not quite sure what is going
on. Can you send over the kernel messages you see when the allocation
fails?

Also, do you happen to know of a reasonable toolchain + emulator setup
that I could test the nommu kernel with?

Thanks,
Nick

2006-11-29 09:27:56

by Aubrey Li

Subject: Re: The VFS cache is not freed when there is not enough free memory to allocate

On 11/29/06, Sonic Zhang <[email protected]> wrote:
> Forward to the mailing list.
>
> > On 11/27/06, Nick Piggin <[email protected]> wrote:
>
>
> >> I haven't actually written any nommu userspace code, but it is obvious
> >> that you must try to keep malloc to <= PAGE_SIZE (although order 2 and
> >> even 3 allocations seem to be reasonable, from process context)... Then
> >> you would use something a bit more advanced than a linear array to store
> >> data (a pagetable-like radix tree would be a nice, easy idea).
> >>
> >
> > But even if we split the 8 MB of memory into 2048 x 4 KB blocks, we
> > still face this failure. The key problem is that the available memory
> > is smaller than 2048 x 4 KB, while there is still a lot of VFS cache.
> > The VFS cache can be freed, but the kernel allocation function ignores
> > it. See the new test application.
>
>
> Which kernel allocation function? If you can provide more details I'd
> like to get to the bottom of this.

I posted it here; I think you missed it, so I forwarded it to you.

>
> Because the anonymous memory in mm/nommu.c is all allocated with
> GFP_KERNEL from process context, and in that case the allocator should
> not fail but should call into page reclaim, which in turn will free the
> VFS caches.
>
>
>
> > What's a better way to free the VFS cache in the memory allocator?
>
>
> It should be freeing it for you, so I'm not quite sure what is going
> on. Can you send over the kernel messages you see when the allocation
> fails?

I don't think so. The kernel doesn't attempt to free it. The log is
included in the mail I forwarded to you.

>
> Also, do you happen to know of a reasonable toolchain + emulator setup
> that I could test the nommu kernel with?

A project named skyeye.
http://www.skyeye.org/index.shtml

-Aubrey

2006-11-29 09:31:40

by Nick Piggin

Subject: Re: The VFS cache is not freed when there is not enough free memory to allocate

Aubrey wrote:
> On 11/29/06, Sonic Zhang <[email protected]> wrote:
>
>> Forward to the mailing list.
>>
>> > On 11/27/06, Nick Piggin <[email protected]> wrote:
>>
>>
>> >> I haven't actually written any nommu userspace code, but it is obvious
>> >> that you must try to keep malloc to <= PAGE_SIZE (although order 2 and
>> >> even 3 allocations seem to be reasonable, from process context)...
>> Then
>> >> you would use something a bit more advanced than a linear array to
>> store
>> >> data (a pagetable-like radix tree would be a nice, easy idea).
>> >>
>> >
>> > But even if we split the 8 MB of memory into 2048 x 4 KB blocks, we
>> > still face this failure. The key problem is that the available memory
>> > is smaller than 2048 x 4 KB, while there is still a lot of VFS cache.
>> > The VFS cache can be freed, but the kernel allocation function ignores
>> > it. See the new test application.
>>
>>
>> Which kernel allocation function? If you can provide more details I'd
>> like to get to the bottom of this.
>
>
> I posted it here; I think you missed it, so I forwarded it to you.

That was the order-9 allocation failure, which is not going to be
solved properly by just dropping caches.

But Sonic apparently saw failures with 4K allocations, where the
caches weren't getting shrunk properly. This would be more interesting
because it would indicate a real problem with the kernel.

>>
>> Also, do you happen to know of a reasonable toolchain + emulator setup
>> that I could test the nommu kernel with?
>
>
> A project named skyeye.
> http://www.skyeye.org/index.shtml

Thanks, I'll give that one a try.

Nick

--
SUSE Labs, Novell Inc.

2006-11-30 12:54:37

by Aubrey Li

Subject: Re: The VFS cache is not freed when there is not enough free memory to allocate

On 11/29/06, Nick Piggin <[email protected]> wrote:
> That was the order-9 allocation failure, which is not going to be
> solved properly by just dropping caches.
>
> But Sonic apparently saw failures with 4K allocations, where the
> caches weren't getting shrunk properly. This would be more interesting
> because it would indicate a real problem with the kernel.
>
I have run several test cases (a parameterized variant of the earlier
test program, sketched after the list below). When cat /proc/meminfo
shows MemFree < 8192 kB:

1) malloc(1024 * 4), 256 times = 8MB, allocation successful.
2) malloc(1024 * 16), 64 times = 8MB, allocation successful.
3) malloc(1024 * 64), 16 times = 8MB, allocation successful.
4) malloc(1024 * 128), 8 times = 8MB, allocation failed.
5) malloc(1024 * 256), 4 times = 8MB, allocation failed.
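
The harness sketch (the name t2 is only for illustration; e.g.
"./t2 131072 8" reproduces case 4):

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
	/* block size and count from the command line */
	size_t size = argc > 1 ? (size_t)atol(argv[1]) : 4096;
	int count = argc > 2 ? atoi(argv[2]) : 2048;
	void **p;
	int i;

	p = malloc(count * sizeof(void *));
	if (p == NULL)
		return 1;
	for (i = 0; i < count; i++) {
		p[i] = malloc(size);
		if (p[i] == NULL)
			printf("alloc %d of %lu bytes failed\n",
			       i, (unsigned long)size);
	}
	for (i = 0; i < count; i++)
		free(p[i]);	/* free(NULL) is safe */
	free(p);
	return 0;
}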

From those results we know that when the allocation is <= 64 KB, the
cache can be shrunk properly. That means the malloc size of an
application on nommu should be <= 64 KB. That's exactly our problem:
some video programs need a big block with contiguous physical
addresses. But yes, as you said, we must keep malloc from allocating
big blocks to make things work robustly on the current nommu kernel.

So my question is: can we improve this? Why is malloc(64K) OK but
malloc(128K) not? Is there an existing parameter controlling this? Why
doesn't the kernel attempt to shrink the cache no matter how big the
requested allocation is?

Any thoughts?

Thanks,
-Aubrey

2006-11-30 21:19:13

by Nick Piggin

Subject: Re: The VFS cache is not freed when there is not enough free memory to allocate

Aubrey wrote:
> On 11/29/06, Nick Piggin <[email protected]> wrote:
>
>> That was the order-9 allocation failure, which is not going to be
>> solved properly by just dropping caches.
>>
>> But Sonic apparently saw failures with 4K allocations, where the
>> caches weren't getting shrunk properly. This would be more interesting
>> because it would indicate a real problem with the kernel.
>>
> I have run several test cases. When cat /proc/meminfo shows MemFree <
> 8192 kB:
>
> 1) malloc(1024 * 4), 256 times = 8MB, allocation successful.
> 2) malloc(1024 * 16), 64 times = 8MB, allocation successful.
> 3) malloc(1024 * 64), 16 times = 8MB, allocation successful.
> 4) malloc(1024 * 128), 8 times = 8MB, allocation failed.
> 5) malloc(1024 * 256), 4 times = 8MB, allocation failed.
>
> From those results we know that when the allocation is <= 64 KB, the
> cache can be shrunk properly. That means the malloc size of an
> application on nommu should be <= 64 KB. That's exactly our problem:
> some video programs need a big block with contiguous physical
> addresses. But yes, as you said, we must keep malloc from allocating
> big blocks to make things work robustly on the current nommu kernel.
>
> So my question is: can we improve this? Why is malloc(64K) OK but
> malloc(128K) not? Is there an existing parameter controlling this? Why
> doesn't the kernel attempt to shrink the cache no matter how big the
> requested allocation is?
>
> Any thoughts?

The pattern you are seeing here is probably due to the page allocator
always retrying process-context allocations which are <= order 3 (64K
with 4K pages).

You might be able to increase this limit a bit for your system, but it
could easily cause problems, especially fragmentation on nommu systems
where the anonymous memory cannot be paged out.
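
From memory, the relevant logic at the end of __alloc_pages() in
mm/page_alloc.c looks roughly like this (approximate, from a 2.6.19-era
tree):

	/* After reclaim makes no progress, only order <= 3 allocations
	 * are retried by default; larger orders simply fail unless
	 * __GFP_REPEAT or __GFP_NOFAIL was passed in. */
	do_retry = 0;
	if (!(gfp_mask & __GFP_NORETRY)) {
		if ((order <= 3) || (gfp_mask & __GFP_REPEAT))
			do_retry = 1;
		if (gfp_mask & __GFP_NOFAIL)
			do_retry = 1;
	}
	if (do_retry) {
		blk_congestion_wait(WRITE, HZ/50);
		goto rebalance;
	}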

--
SUSE Labs, Novell Inc.

2006-12-01 10:00:56

by Aubrey Li

Subject: Re: The VFS cache is not freed when there is not enough free memory to allocate

On 12/1/06, Nick Piggin <[email protected]> wrote:
>
> The pattern you are seeing here is probably due to the page allocator
> always retrying process-context allocations which are <= order 3 (64K
> with 4K pages).
>
> You might be able to increase this limit a bit for your system, but it
> could easily cause problems, especially fragmentation on nommu systems
> where the anonymous memory cannot be paged out.

Thanks for the clue. I found that increasing this limit really helps my
test cases. When MemFree < 8 MB and the test case requests 1 MB eight
times, the allocation succeeds after 81 rebalance passes :). So far I
haven't found any issues.

If I make a patch that makes this parameter tunable via the proc
filesystem for the nommu case, would it be acceptable?
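
Something like the following is what I have in mind. A hypothetical
sketch only: the variable, the proc name, and the enum are made up,
modeled on the existing vm_table entries in kernel/sysctl.c;
__alloc_pages() would then test "order <= sysctl_nommu_retry_order"
instead of the hard-coded "order <= 3":

int sysctl_nommu_retry_order = 3;	/* current default behaviour */

static ctl_table vm_nommu_table[] = {
	{
		.ctl_name	= VM_NOMMU_RETRY_ORDER,	/* new enum needed */
		.procname	= "nommu_retry_order",
		.data		= &sysctl_nommu_retry_order,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= &proc_dointvec,
	},
	{ .ctl_name = 0 }
};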

Thanks,
-Aubrey