Subject: Some thoughts about cache and swap

One case where most current cache/swap implementations seem to fail
miserably is when the user - for some reason - processes large amounts
of data. This may be as simple as listening to a large MP3 collection
for a few hours: the working set becomes very large, and everything
else is pushed out of RAM and replaced with cached songs.

Incidentally, in such cases caching the content is usually a complete
waste, as it is unlikely that the same data will be needed again; and
even when it is, speed may not be any kind of issue.

In order to make better use of the limited cache space, the following
methods could be used:

1. it is highly preferable to cache small files only
* disk/net access has a big seek latency, while caching small files
uses little RAM
* applications with long loading times usually use a big number of tiny
files => caching those makes response times a lot better
* higher probability of getting more hits (per consumed cache space)

1.1. if caching large files anyway
* try to detect the access pattern (sequential, highly spatial or random)
* only cache the hottest parts of the file

2. only cache files where I/O is the bottleneck
* if applications typically don't need the data faster, having it in
cache isn't very useful either
* detecting whether I/O is a limiting factor is difficult

Additionally, for machines with more RAM than their typical working
set (like nearly all desktop/server computers):

3. never (or only rarely) swap out applications to gain more cache
* the application will eventually be restored to RAM, and the user will
notice major thrashing with long delays, and blame the OS
* applications take only a small portion of the RAM, and using that
as extra cache makes only a small difference in cache performance
* if an application or its data has been loaded into memory, there
normally is a reason for that (i.e. the data needs to be accessed quickly)

3.1. memory leaks are an exception (but fixing the bug would be the
correct solution, instead of obscuring the problem by swapping the
leaked pages out)

The definition of a large file changes over time, as technology evolves:
* the size of RAM (and thus the available cache space) keeps growing
* on a modern HDD, reading one megabyte and doing a single seek each
consume roughly the same time - about 13 ms (notice how evil seeking is!)
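To make the seek-vs-transfer tradeoff concrete, here is a
back-of-the-envelope cost model. The figures are illustrative
assumptions matching the ~13 ms quoted above, not measurements:

```python
# Rough cost model for a ca. 2004 HDD. Both constants are assumptions.
SEEK_MS = 13.0             # one random seek (incl. rotational latency)
TRANSFER_MS_PER_MB = 13.0  # streaming one megabyte off the platter

def read_cost_ms(num_files, total_mb):
    """Time to read total_mb of data spread across num_files files:
    one seek per file plus the streaming transfer time."""
    return num_files * SEEK_MS + total_mb * TRANSFER_MS_PER_MB

# Reading 10 MB as one big file vs. as 100 tiny files:
one_big = read_cost_ms(1, 10)       # 143 ms, dominated by transfer
many_small = read_cost_ms(100, 10)  # 1430 ms, dominated by seeks
```

The same 10 MB costs ten times as much when split into tiny files,
which is exactly why caching small files buys the most: their cost is
almost pure seek latency, while their RAM footprint is small.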

- Tronic -



2004-06-05 23:37:19

by Rik van Riel

Subject: Re: Some thoughts about cache and swap

On Sat, 5 Jun 2004, Lasse Kärkkäinen / Tronic wrote:

> In order to make better use of the limited cache space, the following
> methods could be used:

[snip magic piled on more magic]

I wonder if we should just bite the bullet and implement
LIRS, ARC or CART for Linux. These replacement algorithms
should pretty much detect by themselves which pages are
being used again (within a reasonable time) and which pages
aren't.

Especially CART looks interesting.
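The common idea behind these algorithms can be sketched in a few
lines. The following is a toy simplification in the spirit of
2Q/ARC, not any of the published algorithms: new pages enter a small
probationary segment and are promoted only on a second reference, so
a one-pass scan cannot displace the protected set.

```python
from collections import OrderedDict

class SegmentedCache:
    """Toy sketch of scan-resistance (simplified 2Q/ARC flavour).
    Pages referenced once live in a probationary segment; only a
    second reference promotes them to the protected segment, so
    use-once streams churn only the probationary part."""

    def __init__(self, protected_size, probation_size):
        self.protected = OrderedDict()  # LRU order: oldest first
        self.probation = OrderedDict()
        self.protected_size = protected_size
        self.probation_size = probation_size

    def access(self, page):
        if page in self.protected:       # hit: refresh LRU position
            self.protected.move_to_end(page)
            return True
        if page in self.probation:       # second touch: promote
            del self.probation[page]
            self.protected[page] = True
            if len(self.protected) > self.protected_size:
                self.protected.popitem(last=False)
            return True
        self.probation[page] = True      # first touch: probation only
        if len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)
        return False

# A hot working set survives a long sequential scan:
cache = SegmentedCache(protected_size=4, probation_size=4)
for p in "abcd":
    cache.access(p)
    cache.access(p)          # second touch promotes each hot page
for i in range(1000):
    cache.access(("scan", i))  # use-once scan, never promoted
```

After the scan, all four hot pages are still cached; under plain LRU
with the same total capacity they would all have been evicted.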

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

2004-06-06 07:00:47

by John Bradford

Subject: Re: Some thoughts about cache and swap

Quote from Rik van Riel <[email protected]>:
> On Sat, 5 Jun 2004, Lasse Kärkkäinen / Tronic wrote:
>
> > In order to make better use of the limited cache space, the following
> > methods could be used:
>
> [snip magic piled on more magic]
>
> I wonder if we should just bite the bullet and implement
> LIRS, ARC or CART for Linux. These replacement algorithms
> should pretty much detect by themselves which pages are
> being used again (within a reasonable time) and which pages
> aren't.

Is the current system really bad enough to make it worthwhile, though?

Is there really much performance to be gained from tuning the 'limited' cache
space, or will it just hurt as many or more systems than it helps?

John.

2004-06-06 08:40:07

by Christian Borntraeger

Subject: Re: Some thoughts about cache and swap

John Bradford wrote:
> Quote from Rik van Riel <[email protected]>:
> > I wonder if we should just bite the bullet and implement
> > LIRS, ARC or CART for Linux. These replacement algorithms
> > should pretty much detect by themselves which pages are
> > being used again (within a reasonable time) and which pages
> > aren't.
> Is there really much performance to be gained from tuning the 'limited'
> cache space, or will it just hurt as many or more systems than it helps?

That's a very good question.
Most of the time the current algorithm works quite well.
On the other hand, I definitely know what people mean when they complain
about caching and all this stuff. By just copying a big file that I don't
use afterwards, or watching a video, I have 2 wonderful scenarios. The
cache is filled with useless information and big parts of KDE are neither
in memory nor in cache. Applications could use madvise or other things to
indicate that they don't need this file a second time, but they usually
don't.

I think it might be interesting to have some kind of benchmark, similar
to Con's interactivity benchmark, that just triggers the workloads so
many people are complaining about. If I find the time, I will give it a
try in the next few days.
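For the record, an application can already drop its own use-once data
via posix_fadvise(). A hedged sketch (Linux-specific, and the hint is
advisory - the kernel is free to ignore it):

```python
import os

def copy_without_polluting_cache(src, dst, chunk=1 << 20):
    """Copy src to dst, telling the kernel after each chunk that the
    pages read so far will not be needed again. POSIX_FADV_DONTNEED
    is only a hint, but it lets a big one-shot copy avoid flushing
    everything else out of the page cache."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        offset = 0
        while True:
            data = fin.read(chunk)
            if not data:
                break
            fout.write(data)
            offset += len(data)
            # Advise: done with bytes [0, offset) of the source.
            os.posix_fadvise(fin.fileno(), 0, offset,
                             os.POSIX_FADV_DONTNEED)
```

As noted above, though, few applications bother to do this, which is
why a smarter default replacement policy is attractive.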

cheers

Christian

2004-06-09 18:13:46

by Matt Mackall

Subject: Re: Some thoughts about cache and swap

On Sun, Jun 06, 2004 at 10:38:25AM +0200, Christian Borntraeger wrote:
> John Bradford wrote:
> > Quote from Rik van Riel <[email protected]>:
> > > I wonder if we should just bite the bullet and implement
> > > LIRS, ARC or CART for Linux. These replacement algorithms
> > > should pretty much detect by themselves which pages are
> > > being used again (within a reasonable time) and which pages
> > > aren't.
> > Is there really much performance to be gained from tuning the 'limited'
> > cache space, or will it just hurt as many or more systems than it helps?
>
> Thats a very good question.
> Most of the time the current algorithm works quite well.
> On the other hand, I definitely know what people mean when they complain
> about cachingand all this stuff. By just copying a big file that I dont use
> afterwards or watching an video I have 2 wonderful scenarios.

Perhaps people should read about the referenced algorithms. LRU
(including the hybrid LRU that Linux uses) is vulnerable to
"scanning" of the sort you're describing, while the above algorithms
have varying degrees of scan-resistance. As lack of scan-resistance
seems to be "the big problem" in the current VM, this looks like an
interesting direction to go in.
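The scanning weakness of plain LRU is easy to demonstrate with a toy
simulation (this models textbook LRU, not the kernel's actual hybrid
two-list implementation):

```python
from collections import OrderedDict

def simulate_lru(capacity, trace):
    """Plain LRU over a page reference trace; returns the final
    cache contents as a set."""
    cache = OrderedDict()
    for page in trace:
        if page in cache:
            cache.move_to_end(page)
        else:
            cache[page] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return set(cache)

# A hot working set of 8 pages referenced 10 times each,
# followed by one sequential pass over 100 use-once cold pages.
hot = [f"hot{i}" for i in range(8)]
trace = hot * 10 + [f"cold{i}" for i in range(100)]
final = simulate_lru(16, trace)
# The single scan evicted every hot page, even though each hot page
# was used 10 times and every cold page exactly once.
```

Scan-resistant policies such as LIRS, ARC and CART are designed so
that this trace leaves the hot pages cached.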

--
Mathematics is the supreme nostalgia of our time.

2004-06-09 19:26:23

by John Bradford

Subject: Re: Some thoughts about cache and swap

Quote from Matt Mackall <[email protected]>:
> On Sun, Jun 06, 2004 at 10:38:25AM +0200, Christian Borntraeger wrote:
> > John Bradford wrote:
> > > Quote from Rik van Riel <[email protected]>:
> > > > I wonder if we should just bite the bullet and implement
> > > > LIRS, ARC or CART for Linux. These replacement algorithms
> > > > should pretty much detect by themselves which pages are
> > > > being used again (within a reasonable time) and which pages
> > > > aren't.
> > > Is there really much performance to be gained from tuning the 'limited'
> > > cache space, or will it just hurt as many or more systems than it helps?
> >
> > Thats a very good question.
> > Most of the time the current algorithm works quite well.
> > On the other hand, I definitely know what people mean when they complain
> > about cachingand all this stuff. By just copying a big file that I dont use
> > afterwards or watching an video I have 2 wonderful scenarios.
>
> Perhaps people should read about the referenced algorithms. LRU
> (including the hybrid LRU that Linux uses) is vulnerable to
> "scanning" of the sort you're describing, while the above algorithms
> have varying degrees of scan-resistance. As lack of scan-resistance
> seems to be "the big problem" in the current VM, this looks like an
> interesting direction to go in.

Does "the big problem" really exist though?

Despite all of this discussion about swap and memory management, I _never_
reproduce any of the problems mentioned in normal use. In my experience,
extreme VM problems almost always stem from mis-configured swap.

John.

2004-06-09 19:34:11

by Rik van Riel

Subject: Re: Some thoughts about cache and swap

On Wed, 9 Jun 2004, John Bradford wrote:

> Does "the big problem" really exist though?

It's all about corner cases (and corner case workloads).

> Despite all of this discussion about swap and memory management, I
> _never_ reproduce any of the problems mentioned in normal use.

Just like most people aren't seeing problems with the O(1)
scheduler and the 500 lines of kludges piled on top that
keep it working ok in 2.6 - in most cases.

Compare with Con's staircase scheduler, which removes those
500 lines of code, appears to work just as well in the common
situations, and deals with a few extra corner cases.

The VM is in a similar situation, with Too Much Magic(tm) all
over the place, just to keep the system working smoothly in
normal workloads.

It would be a minor miracle if the VM - with all the magic
tweaks - still worked fine for workloads that don't behave the
way the VM expects them to.

THAT is what replacing the current code with a self-learning
algorithm is all about, IMHO.

> In my experience, extreme VM problems almost always stem from
> mis-configured swap.

Haven't seen many of those, to be honest. The majority
of the VM problems I get to see are people running a
workload the kernel didn't expect - a workload the kernel
wasn't prepared to handle...

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

2004-06-09 19:43:08

by Bill Davidsen

Subject: Re: Some thoughts about cache and swap

John Bradford wrote:
> Quote from Rik van Riel <[email protected]>:
>
>>On Sat, 5 Jun 2004, Lasse Kärkkäinen / Tronic wrote:
>>
>>
>>>In order to make better use of the limited cache space, the following
>>>methods could be used:
>>
>> [snip magic piled on more magic]
>>
>>I wonder if we should just bite the bullet and implement
>>LIRS, ARC or CART for Linux. These replacement algorithms
>>should pretty much detect by themselves which pages are
>>being used again (within a reasonable time) and which pages
>>aren't.
>
>
> Is the current system really bad enough to make it worthwhile, though?

Yes! The current implementation just uses all the memory available, and
pushes any programs not actively running out to disk. Click the window
and go for coffee. On a small machine that's needed, but for almost any
typical usage, desktop or server, pushing out programs to have 3.5GB of
buffer instead of 3.0 doesn't help disk performance.

> Is there really much performance to be gained from tuning the 'limited' cache
> space, or will it just hurt as many or more systems than it helps?

I doubt it, but it would be nice to have a tuner the admin could use
instead of trying to guess what the priority of program response and i/o
response should be. So if I have a graphics program which might open an
image (small file) and decompress it into 1500MB of raw image, I can set
the buffer space down to a GB or so (I assume that I do this on a
machine fitted to such use) and get good response.

And even on a small machine with only 256MB or so, the overnight file
check shouldn't be able to push all but the last 10-12MB of programs
out of memory. That's the problem with the current system. As for
hurting other systems, that's why a tuner would be nice.
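2.6 does already expose one coarse knob of this sort,
/proc/sys/vm/swappiness, which biases reclaim between dropping cache
and swapping out anonymous pages. A small sketch of reading it (the
path is Linux-specific, so the sketch falls back to None elsewhere):

```python
def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness value, or None when the
    file is unavailable (non-Linux system, restricted container).
    Lower values bias the VM toward keeping program pages resident
    at the expense of page cache; higher values favour cache."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except OSError:
        return None
```

It is a single global dial rather than the per-workload tuner wished
for above, but it addresses the same tension.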

With various patches things are getting better; don't think it isn't
noticed and appreciated.

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2004-06-09 19:45:08

by Bill Davidsen

Subject: Re: Some thoughts about cache and swap

Christian Borntraeger wrote:
> John Bradford wrote:
>
>>Quote from Rik van Riel <[email protected]>:
>>
>>>I wonder if we should just bite the bullet and implement
>>>LIRS, ARC or CART for Linux. These replacement algorithms
>>>should pretty much detect by themselves which pages are
>>>being used again (within a reasonable time) and which pages
>>>aren't.
>>
>>Is there really much performance to be gained from tuning the 'limited'
>>cache space, or will it just hurt as many or more systems than it helps?
>
>
> Thats a very good question.
> Most of the time the current algorithm works quite well.
> On the other hand, I definitely know what people mean when they complain
> about cachingand all this stuff. By just copying a big file that I dont use
> afterwards or watching an video I have 2 wonderful scenarios. The cache is
> filled with useless information and big parts of KDE are neither in memory
> nor in cache. Applications could use madvice or other things to indicate
> that they dont need this file a second time, but they usually dont.
>
> I think it might be interesting to have some kind of benchmark, similiar to
> the interactive benchmark of Con that just triggers the workload so many
> people are complaining. If I find the time, I will give it a try in the
> next days.

You might find my response test program useful. My web server is in
three pieces at the moment, but http://pages.prodigy.net/davidsen has a
copy. You might want to do your own, but mine does show the results of
i/o pushing out programs.

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2004-06-10 07:46:45

by Buddy Lumpkin

Subject: RE: Some thoughts about cache and swap

>> Is the current system really bad enough to make it worthwhile, though?

> Yes! The current implementation just uses all the memory available, and
> pushes any programs not actively running out to disk. Click the window
> and go for coffee. On a small machine that's needed, but for almost any
> typical usage, desktop or server, pushing out programs to have 3.5GB of
> buffer instead of 3.0 doesn't help disk performance.

And for these larger systems, a simple list where pages are evicted from
the tail, and pages that are referenced are moved back to the head,
creates a very natural, low-overhead LRU mechanism for the page cache.

If this list can be treated as free memory, then pages_low would only be
reached in a true memory shortage. In that case, by all means walk all
pages in memory in a non-discriminating fashion looking for pages to
evict. When I say "true memory shortage", I am referring to cases like
Bill's excellent example above, where 500MB of executables and anonymous
memory are evicted from memory so that the pagecache can grow from 3GB
to 3.5GB. This is exactly how I see this issue, probably because I have
spent most of my time working on large systems that are sized such that
I never expect the working set to exceed physical RAM, and there is
plenty (many GB) of free memory for pagecache.

If the VM were implemented this way, so long as you are willing to waste
some memory, you can benefit from minimal performance impact to a large
system due to memory pressure from I/O.

For the people who just can't stand the thought of keeping old anonymous
pages resident when they could be evicted to save precious memory, add
a desktop knob that allows you to choose how much of the page cache is
considered "free memory". That way pages_low will be reached sooner, and
the desired effect of evicting old anonymous pages from memory will
occur as the more expensive and accurate LRU mechanism is awakened.
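The proposed knob reduces to simple accounting. A purely illustrative
sketch (not how the kernel computes its watermarks; names like
`pages_low_kb` and the fraction parameter are made up for the example):

```python
def effective_free_kb(free_kb, cache_kb, cache_free_fraction=1.0):
    """Count a tunable fraction of the page cache as free memory.
    With fraction 1.0 the whole cache is 'free', so the low-memory
    watermark is only crossed in a true shortage; a desktop could
    lower the fraction so that reclaim of old anonymous pages kicks
    in earlier. Illustrative sketch only."""
    return free_kb + int(cache_kb * cache_free_fraction)

def below_pages_low(free_kb, cache_kb, pages_low_kb, fraction=1.0):
    """Would the expensive whole-memory LRU walk be triggered?"""
    return effective_free_kb(free_kb, cache_kb, fraction) < pages_low_kb

# Server default: 4GB of cache keeps us far above the watermark.
# Desktop setting: counting only 10% of cache crosses it sooner.
```

So one tunable fraction selects between the two behaviours described
above: servers waste a little memory and never thrash applications,
desktops reclaim anonymous pages earlier.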

Lastly, it would be really nice if the parts of an executable address
space that are backed by a named file could be given preference, skipped
over, or whatever, to keep executables from being evicted only to be
faulted back in, but I'll keep my wish list short for now :)

--Buddy






2004-06-11 14:07:53

by Jörn Engel

Subject: Re: Some thoughts about cache and swap

On Wed, 9 June 2004 15:32:41 -0400, Rik van Riel wrote:
>
> Haven't seen many of those, to be honest. The majority
> of the VM problems I get to see are people running a
> workload the kernel didn't expect - a workload the kernel
> wasn't prepared to handle...

Is there a list of those different workloads somewhere? And I don't
mean in your head. ;)

All I notice in my personal use is the cache-flushing effect from
use-once data. If that was the whole list, it should be easy enough
to fix.

Jörn

--
"Security vulnerabilities are here to stay."
-- Scott Culp, Manager of the Microsoft Security Response Center, 2001

2004-06-12 01:50:24

by Rik van Riel

Subject: Re: Some thoughts about cache and swap

On Fri, 11 Jun 2004, Jörn Engel wrote:
> On Wed, 9 June 2004 15:32:41 -0400, Rik van Riel wrote:
> >
> > Haven't seen many of those, to be honest. The majority
> > of the VM problems I get to see are people running a
> > workload the kernel didn't expect - a workload the kernel
> > wasn't prepared to handle...
>
> Is there a list of those different workloads somewhere? And I don't
> mean in your head. ;)

That's the point. If there was such a list, we could put
appropriate kludges in place for all of them.

> All I notive in my personal use is the cache flushing effect from
> use-once data. If that was the whole list, it should be easy enough to
> fix.

I wish it were that easy. Users keep surprising us with
new and unexpected workloads, though.

Part of it is that every time Linux is improved, people are
encouraged to try out stranger, newer, heavier workloads ;)

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

2004-06-16 14:27:59

by jlnance

Subject: Re: Some thoughts about cache and swap

On Wed, Jun 09, 2004 at 03:43:29PM -0400, Bill Davidsen wrote:
> John Bradford wrote:
> >Quote from Rik van Riel <[email protected]>:

> >Is the current system really bad enough to make it worthwhile, though?
>
> Yes! The current implementation just uses all the memory available, and
> pushes any programs not actively running out to disk. Click the window
> and go for coffee. On a small machine that's needed, but for almost any
> typical usage, desktop or server, pushing out programs to have 3.5GB of
> buffer instead of 3.0 doesn't help disk performance.

Hi All,

It would be good to make the swap-out code smarter, but it occurred
to me this morning that the problem might really be the swap-in code.
If my 120M mozilla process gets paged out and then I click on the
window, what's the best-case time required to swap it in? Isn't it
something like 2 seconds? I don't think we currently get anywhere close
to 2 seconds, though I haven't checked. I can think of two reasons this
might be so, though I am no expert on the Linux VM code, so I would
appreciate comments.

First, it may be that we spread the pages of the executable across the
swap space rather than putting them near each other. This would introduce
a lot of seeks when we paged them back in, and this would certainly slow
us down.

Second, I believe the page-in process is fairly synchronous, something
like this:

A - app generates a page fault
B - kernel puts app to sleep and queues page-in
C - Page in happens and kernel wakes up app

This is good behavior for the normal case, where swapping is rare and
you want to drop unneeded pages from the working set. But it is going to
be slow for the case where we need to page a lot of stuff in. Does the
kernel try to recognize the case of a swapped-out application 'coming
back to life' and try to page large portions of it back in before the
app faults the pages?
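The gap between those two cases is easy to estimate. The disk figures
below are assumptions roughly appropriate for a 2004 drive, just to
show the order of magnitude behind the "2 seconds" estimate:

```python
# Assumed disk characteristics, ca. 2004 (illustrative, not measured).
PAGE_KB = 4
SEEK_MS = 10.0          # average seek + rotational latency
STREAM_MB_PER_S = 50.0  # sequential read throughput

def swapin_random_s(process_mb):
    """Worst case: pages scattered across swap, one seek per 4K page."""
    pages = process_mb * 1024 // PAGE_KB
    return pages * SEEK_MS / 1000.0

def swapin_sequential_s(process_mb):
    """Best case: one seek, then stream the contiguous image."""
    return SEEK_MS / 1000.0 + process_mb / STREAM_MB_PER_S

# For the 120 MB mozilla example: streaming takes a couple of seconds,
# while a fully seek-bound swap-in takes minutes.
```

Streaming the whole 120 MB image lands right around the 2-second
best case, while paying a seek per page is two orders of magnitude
slower, which is why both swap layout and batched readahead matter.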

Thanks,

Jim