2005-02-08 17:08:18

by jon ross

Subject: VM disk cache behavior.

I have an app with a small fixed memory footprint that does a lot of
random reads from a large file. I thought if I added more memory to
the machine the VM would do more caching of the disk, but added memory
does not seem to make any difference. I played with some of the params
in /proc/sys/vm and none of them seem to have any effect.

I tried both 2.4.20 and 2.6.10 kernels with no difference.

The machine is a Dell 2560. I tried memory configs of 512M, 1G, 4G and
the average read-times do not change.

Do I need to set/compile anything to allow the VM to use the memory?
If there was a way to tell how much memory the VM is using for a drive
cache I could at least tell if my kernel is misconfigured or my app
sucks.

Thanks,

-Jon


2005-02-08 17:23:50

by Robert Love

Subject: Re: VM disk cache behavior.

On Tue, 2005-02-08 at 12:06 -0500, jon ross wrote:
> I have an app with a small fixed memory footprint that does a lot of
> random reads from a large file. I thought if I added more memory to
> the machine the VM would do more caching of the disk, but added memory
> does not seem to make any difference. I played with some of the params
> in /proc/sys/vm and none of them seem to have any effect.
>
> I tried both 2.4.20 and 2.6.10 kernels with no difference.
>
> The machine is a Dell 2560. I tried memory configs of 512M, 1G, 4G and
> the average read-times do not change.
>
> Do I need to set/compile anything to allow the VM to use the memory?
> If there was a way to tell how much memory the VM is using for a drive
> cache I could at least tell if my kernel is misconfigured or my app
> sucks.

More memory will allow the kernel to keep more cache in memory. You can
see how much memory the kernel is using for cache with free(1).

That does not sound like your problem, though. It sounds like you want
the kernel to do more _read-ahead_, i.e. cache things _before_ you even
need them (and then you might want more memory to actually keep all of
the stuff alive in the cache, but that is a secondary problem).
Unfortunately, since you are doing random reads, it is very hard for the
kernel to do intelligent read-ahead.

What you can do is pre-fault the entire file into memory. This is not a
bad idea if you know you are going to ultimately read much of the file.

You can prefault the file automatically and asynchronously using
posix_fadvise(). Example:

    int err = posix_fadvise (fd, 0, 0, POSIX_FADV_WILLNEED);
    if (err)    /* returns the error number; errno is not set */
            fprintf (stderr, "posix_fadvise: %s\n", strerror (err));

See posix_fadvise(2) for more information.

It might also be faster to use mmap(2) over read(2). Then you can use
madvise().

Best,

Robert Love


2005-02-08 20:08:25

by Chris Wedgwood

Subject: Re: VM disk cache behavior.

On Tue, Feb 08, 2005 at 12:06:14PM -0500, jon ross wrote:

> I have an app with a small fixed memory footprint that does a lot of
> random reads from a large file. I thought if I added more memory to
> the machine the VM would do more caching of the disk, but added
> memory does not seem to make any difference. I played with some of
> the params in /proc/sys/vm and none of them seem to have any effect.

If the file is much larger than your RAM size and your accesses really
are random, you're probably SOL as you will be seek/IO bound most of
the time.

How large is the 'large file'?

2005-02-09 00:38:04

by Andy Isaacson

Subject: Re: VM disk cache behavior.

On Tue, Feb 08, 2005 at 12:06:14PM -0500, jon ross wrote:
> I have an app with a small fixed memory footprint that does a lot of
> random reads from a large file. I thought if I added more memory to
> the machine the VM would do more caching of the disk, but added memory
> does not seem to make any difference. I played with some of the params
> in /proc/sys/vm and none of them seem to have any effect.
>
> I tried both 2.4.20 and 2.6.10 kernels with no difference.
>
> The machine is a Dell 2560. I tried memory configs of 512M, 1G, 4G and
> the average read-times do not change.

Could we get some numbers here? How small is "small"? How large is
"large"? What are you measuring? What are the results? Does the app
re-use the same data, or is its use a one-time deal?

> Do I need to set/compile anything to allow the VM to use the memory?

No, the Linux VM system should automatically cache for you.

> If there was a way to tell how much memory the VM is using for a drive
> cache I could at least tell if my kernel is misconfigured or my app
> sucks.

Check out the commands "free", "vmstat 1", and "top", the contents of
/proc/meminfo, and the output of SysRq-M.
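
If you would rather check this from inside the app, a small sketch like the following reads the "Cached:" line of /proc/meminfo. The helper name cached_kb is hypothetical, and taking the path as a parameter is only a convenience for testing.

```c
#include <stdio.h>

/* Hypothetical helper: return the value of the "Cached:" line (in kB)
 * from a meminfo-style file, or -1 if the file or the line is missing. */
long cached_kb(const char *path)
{
    FILE *f = fopen(path, "r");
    char line[256];
    long kb = -1;

    if (f == NULL)
        return -1;

    while (fgets(line, sizeof line, f) != NULL) {
        /* Matches only lines that start with "Cached:", so "SwapCached:"
         * is skipped. kb is left untouched when the line does not match. */
        if (sscanf(line, "Cached: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}
```

Calling cached_kb("/proc/meminfo") before and after a run of the app shows whether the page cache is actually growing as the file is read.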

Most likely, your app isn't behaving in a cache-friendly way. If
your file will fit in memory, just fault it in sequentially (wc -l file)
and then your app should cook. If you're not going to fit in memory,
the vm caching will probably only help if you have some reuse; you could
develop a pre-faulter to get your IO started ahead of time, but that's
generally nontrivial.
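
The "fault it in sequentially" step can also be done from C at startup instead of running wc by hand. This is only a sketch; the function name warm_cache and the 64 KiB buffer size are arbitrary choices, not anything from the thread.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read `path` from start to end and throw the data
 * away, so the kernel brings the file into the page cache.
 * Returns the number of bytes read, or -1 on error. */
long long warm_cache(const char *path)
{
    static char buf[1 << 16];       /* 64 KiB scratch buffer (arbitrary) */
    long long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    /* Sequential reads let the kernel's read-ahead do its job. */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;

    close(fd);
    return n < 0 ? -1 : total;
}
```

This only helps when the file fits in memory; otherwise the pages read first are evicted before the app gets to use them.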

-andy

2005-02-09 05:03:56

by Kevin Puetz

Subject: Re: VM disk cache behavior.

Andy Isaacson wrote:

> Most likely, your app isn't behaving in a cache-friendly way.  If
> your file will fit in memory, just fault it in sequentially (wc -l file)
> and then your app should cook.  If you're not going to fit in memory,
> the vm caching will probably only help if you have some reuse; you could
> develop a pre-faulter to get your IO started ahead of time, but that's
> generally nontrivial.

Of course, what's non-trivial is predicting your upcoming I/O pattern
(unless it's not actually random at all, just messy). Calling madvise to
prefault it is pretty easy if you actually do know what you'll want in the
near future.