2003-01-24 12:40:12

by Roman Dementiev

Subject: buffer leakage in kernel?

Hello everyone,

I've run into the following problem (kernel 2.4.20-pre4):
I write and read sequentially to/from 8 files of 64 GB each (not a
mistake, 64 GB), each on a different disk. The files are opened with
the O_DIRECT flag. The machine has 1 GB of RAM and no swap.
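
For reference, here is a minimal sketch of what the per-file scan loop
looks like (illustrative only, not my actual program, but the 2 MB
request size is what I really use):

#define _GNU_SOURCE             /* needed for O_DIRECT */
#include <fcntl.h>
#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (2 * 1024 * 1024) /* 2 MB per read()/write() call */

int main(int argc, char **argv)
{
        int fd;
        char *buf;
        ssize_t n;

        if (argc < 2)
                return 1;

        fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        /* O_DIRECT wants a block-aligned user buffer and transfer size */
        buf = memalign(4096, CHUNK);
        if (!buf) { perror("memalign"); close(fd); return 1; }

        while ((n = read(fd, buf, CHUNK)) > 0)
                ;               /* just scan the file sequentially */
        if (n < 0)
                perror("read");

        free(buf);
        close(fd);
        return 0;
}
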
While this scan is running, the number of "buffers" reported by "free"
and in /proc/meminfo keeps growing, up to ~500 MB! When the program
exits normally, or when I interrupt it, the number of "buffers" does
not decrease, and it even increases if I do operations on other files.

This is not nice at all when other applications with memory
consumption > 500 MB are running: when my "scanner" approaches the
50 GB mark on each disk, I get numerous "Out of memory" kills :(.
Even 'ssh' to this machine gets killed :(

Could anyone explain why this happens? I suspect a memory leak in the
filesystem's buffer management.
Is it already fixed in any patch?


Bye,
Roman



2003-01-24 17:58:25

by Andreas Dilger

Subject: Re: buffer leakage in kernel?

On Jan 24, 2003 13:49 +0100, Roman Dementiev wrote:
> I've run into the following problem (kernel 2.4.20-pre4):
> I write and read sequentially to/from 8 files of 64 GB each (not a
> mistake, 64 GB), each on a different disk. The files are opened with
> the O_DIRECT flag. The machine has 1 GB of RAM and no swap.
> While this scan is running, the number of "buffers" reported by "free"
> and in /proc/meminfo keeps growing, up to ~500 MB! When the program
> exits normally, or when I interrupt it, the number of "buffers" does
> not decrease, and it even increases if I do operations on other files.
>
> This is not nice at all when other applications with memory
> consumption > 500 MB are running: when my "scanner" approaches the
> 50 GB mark on each disk, I get numerous "Out of memory" kills :(.
> Even 'ssh' to this machine gets killed :(

There was a bug in vanilla 2.4.18 related to O_DIRECT that is fixed in the
RH kernel, but I don't know when/if it was merged into the main kernel.
As always, testing with the most recent kernel (2.4.21-pre3 currently)
will tell you whether that bug has been fixed already or not.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2003-01-24 20:33:56

by Andrew Morton

Subject: Re: buffer leakage in kernel?

Roman Dementiev <[email protected]> wrote:
>
> Hello everyone,
>
> I've run into the following problem (kernel 2.4.20-pre4):
> I write and read sequentially to/from 8 files of 64 GB each (not a
> mistake, 64 GB), each on a different disk. The files are opened with
> the O_DIRECT flag. The machine has 1 GB of RAM and no swap.
> While this scan is running, the number of "buffers" reported by "free"
> and in /proc/meminfo keeps growing, up to ~500 MB!

You did not specify the filesystem type. I shall assume ext2.

This buffer growth is expected - ext2 uses one 4k indirect block to describe
the disk location of 4M of file data. Those indirect blocks are cached, and
appear as "buffers" in the memory accounting.

So after having read 500G of ext2 file with O_DIRECT, you would expect there
to be 500M of indirects in the block device pagecache. (We shouldn't be
calling this "buffers" any more. That is inaccurate, and confuses people
into thinking that Linux has a buffer cache).
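
(To put rough numbers on that, assuming 4 KB ext2 blocks and 4-byte block
pointers: an indirect block holds 4096 / 4 = 1024 pointers, so one 4 KB
indirect block maps 1024 x 4 KB = 4 MB of data. Scanning 8 x 64 GB =
512 GB therefore touches about 512 GB / 4 MB = 131072 indirect blocks,
i.e. roughly 131072 x 4 KB = 512 MB of cached indirects, which lines up
with the ~500 MB of "Buffers" you observed.)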

> When the program exits normally, or when I interrupt it, the number
> of "buffers" does not decrease, and it even increases if I do
> operations on other files.

O_DIRECT operations, yes. If you were to read one of these files without
O_DIRECT, you should see "Buffers" decreasing as they are reclaimed to make
way for the newly introduced file cache.

> This is not nice at all when other applications with memory
> consumption > 500 MB are running: when my "scanner" approaches the
> 50 GB mark on each disk, I get numerous "Out of memory" kills :(.
> Even 'ssh' to this machine gets killed :(
>
> Could anyone explain why this happens? I suspect a memory leak in
> the filesystem's buffer management.

Sounds like a bug.

Are you reading these large files via a single application, or via one
process per file?

How large is the buffer into which the application is performing the O_DIRECT
read?

Please perform this test:

1: Wait until you have 500M "Buffers"
2: cat 64_gig_file > /dev/null
3: Now see how large "Buffers" is. It should have reduced a lot.


2003-01-27 16:48:42

by Roman Dementiev

Subject: Re: buffer leakage in kernel?

Andrew Morton wrote:

> > While this scan is running, the number of "buffers" reported by "free"
> > and in /proc/meminfo keeps growing, up to ~500 MB!
>
> You did not specify the filesystem type. I shall assume ext2.

right

> > This is not nice at all when other applications with memory
> > consumption > 500 MB are running: when my "scanner" approaches the
> > 50 GB mark on each disk, I get numerous "Out of memory" kills :(.
> > Even 'ssh' to this machine gets killed :(
> >
> > Could anyone explain why this happens? I suspect a memory leak in
> > the filesystem's buffer management.

Thanks for the explanation.

> Sounds like a bug.
>
> Are you reading these large files via a single application, or via one
> process per file?

Via a single application with 8 I/O threads + 1 additional thread.

> How large is the buffer into which the application is performing the O_DIRECT
> read?

The application allocates 512 MB and never uses more.
2 MB buffers are used for each read() and write() call.
Each file has only one read or write request in flight at any time.
There are no other memory-hungry applications running.

> Please perform this test:
>
> 1: Wait until you have 500M "Buffers"
> 2: cat 64_gig_file > /dev/null
> 3: Now see how large "Buffers" is. It should have reduced a lot.

Yes, that worked; the "Buffers" figure went down.
Does this mean that cached indirect buffers can't be kicked out of memory
automatically, and that ONLY non-O_DIRECT access can do it? I would expect
them to be displaced by newly allocated indirect buffers and by user
memory allocations.


Roman

2003-01-27 20:25:12

by Andrew Morton

Subject: Re: buffer leakage in kernel?

Roman Dementiev <[email protected]> wrote:
>
> The application allocates 512 MB and never uses more.
> 2 MB buffers are used for each read() and write() call.
> Each file has only one read or write request in flight at any time.
> There are no other memory-hungry applications running.

OK.

> > Please perform this test:
> >
> > 1: Wait until you have 500M "Buffers"
> > 2: cat 64_gig_file > /dev/null
> > 3: Now see how large "Buffers" is. It should have reduced a lot.
>
> Yes, that worked; the "Buffers" figure went down.
> Does this mean that cached indirect buffers can't be kicked out of memory
> automatically, and that ONLY non-O_DIRECT access can do it? I would expect
> them to be displaced by newly allocated indirect buffers and by user
> memory allocations.

I suspect what is happening is that you've managed to find a code path in
which the kernel is allocating lots of memory in a mode in which it cannot
run effective page reclaim. This would be more likely to be true if you only
see the failures when writing.

It would help if you could change mm/vmscan.c:try_to_free_pages_zone()
thusly:

        /*
         * Hmm.. Cache shrink failed - time to kill something?
         * Mhwahahhaha! This is the part I really like. Giggle.
         */
+       show_stack(0);
        out_of_memory();
        return 0;
}

and pass the resulting log output through ksymoops.

And here's a protopatch to teach the kernel to make sure that there's a
decent amount of free memory before it goes and performs GFP_NOFS pagecache
allocations:

diff -puN fs/buffer.c~a fs/buffer.c
--- 24/fs/buffer.c~a 2003-01-27 12:28:02.000000000 -0800
+++ 24-akpm/fs/buffer.c 2003-01-27 12:28:23.000000000 -0800
@@ -2112,6 +2112,8 @@ int generic_direct_IO(int rw, struct ino
        for (i = 0; i < nr_blocks; i++, blocknr++) {
                struct buffer_head bh;

+               try_to_free_pages(GFP_KERNEL);
+
                bh.b_state = 0;
                bh.b_dev = inode->i_dev;
                bh.b_size = blocksize;


If that sheds no light, please send me the app and I'll see if I can
reproduce it. Thanks.


2003-02-04 12:49:15

by Roman Dementiev

Subject: Re: buffer leakage in kernel?



On Mon, 27 Jan 2003, Andrew Morton wrote:

>
> I suspect what is happening is that you've managed to find a code path in
> which the kernel is allocating lots of memory in a mode in which it cannot
> run effective page reclaim. This would be more likely to be true if you only
> see the failures when writing.
>
> It would help if you could change mm/vmscan.c:try_to_free_pages_zone()
> thusly:
>
>         /*
>          * Hmm.. Cache shrink failed - time to kill something?
>          * Mhwahahhaha! This is the part I really like. Giggle.
>          */
> +       show_stack(0);
>         out_of_memory();
>         return 0;
> }
>
> and pass the resulting log output through ksymoops.
>
> And here's a protopatch to teach the kernel to make sure that there's a
> decent amount of free memory before it goes and performs GFP_NOFS pagecache
> allocations:
>
> diff -puN fs/buffer.c~a fs/buffer.c
> --- 24/fs/buffer.c~a 2003-01-27 12:28:02.000000000 -0800
> +++ 24-akpm/fs/buffer.c 2003-01-27 12:28:23.000000000 -0800
> @@ -2112,6 +2112,8 @@ int generic_direct_IO(int rw, struct ino
>         for (i = 0; i < nr_blocks; i++, blocknr++) {
>                 struct buffer_head bh;
>
> +               try_to_free_pages(GFP_KERNEL);
> +
>                 bh.b_state = 0;
>                 bh.b_dev = inode->i_dev;
>                 bh.b_size = blocksize;
>
>
> If that sheds no light, please send me the app and I'll see if I can
> reproduce it. Thanks.
>
>

I have not tried that yet for lack of time. Instead, I installed
2.4.21-pre4; the problem seems to be fixed there.


Roman