2001-04-27 19:01:38

by Pete Zaitcev

Subject: Atrocious icache/dcache in 2.4.2

Hello:

My box here slows down dramatically after a while, and starts
behaving as if it had very little memory, e.g. programs page
each other out. It turns out that out of 40MB total, about
35MB is used for the dcache and icache, and the system basically
runs in 5MB of RAM.

When I tried to discuss it with riel, viro, and others,
I got an immediate and very strong knee-jerk reaction: "we fixed
it in 2.4.4-pre4!", "we gotta call prune_dcache more!".
That just does not sound persuasive to me.

After a little thinking it seems apparent to me that it may be a
good thing to have the VM take pages from the dentry and inode
pools directly. This sounds almost like what slab does, so let me
speculate about it (it is a bad idea, but it is interesting _why_).

Suppose that we do this: when an inode becomes clean (e.g. unlocked,
written to disk if it was changed), drop it into kmem_cache_free(),
but retain it on the hash (forget about poisoning for a moment).
Then, if memory is needed, the VM may ask slab, slab calls our
destructors, and the destructors take the inode off the hash. The
idea solves the problem, but has two marks against it. First, when
we look up an inode, we hit either a dirty one or a "clean" one,
and the clean one is actually free. Then we have to do
kmem_cache_alloc(), and that will return the wrong inode, which we
have to drop from the hash, then do a memcpy from the old "really
free" one, etc. It still saves disk I/O, but it is messy. The other
thing is fragmentation: suppose we have a bunch of slabs, every one
of which has a single dirty inode in it (tar xf -). Memory pressure
will be powerless to do anything about them.
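
Roughly, against the 2.4 slab interface, it would look like this
(inode_dtor() and inode_release_clean() are invented names, and
locking against concurrent hash lookups is waved away):

    static kmem_cache_t *inode_cachep;

    /* Slab calls the destructor when it tears a slab page down
     * under memory pressure; only then does the inode leave the
     * hash for real. */
    static void inode_dtor(void *obj, kmem_cache_t *cachep,
                           unsigned long flags)
    {
        struct inode *inode = obj;

        if (!list_empty(&inode->i_hash)) {
            list_del(&inode->i_hash);
            INIT_LIST_HEAD(&inode->i_hash);
        }
    }

    /* When an inode becomes clean: give the memory back to slab,
     * but leave the object on the hash. */
    static void inode_release_clean(struct inode *inode)
    {
        kmem_cache_free(inode_cachep, inode);    /* still hashed! */
    }

    /* at init time: */
    inode_cachep = kmem_cache_create("inode_cache",
                                     sizeof(struct inode), 0,
                                     SLAB_HWCACHE_ALIGN,
                                     init_once, inode_dtor);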

So, I have a better crackpot idea: create a fake filesystem, say
"inodefs". When inodes are needed, we pretend to read pages from
that filesystem, but in fact we just zero most of them and put the
inodes there; also, every page needs a "used" counter, like slab
has. When an inode is dirty, we mark its pages locked or dirty; if
it is only clean, we mark the pages dirty anyway. The VM will
automatically try to get pages back and will write out those that
are "dirty". At that moment, we have the option to check whether
any used (clean or dirty) inodes are inside the page. If there are,
we either move them into some other (fragmented) pages, or just
remove them from the hashes and pretend that the page was written.
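
To make it concrete, the writepage() method of inodefs could look
something like this (inode_slot_used(), the page layout, and the
relocation into other pages are invented or left out, and locking
is elided):

    #define INODES_PER_PAGE  (PAGE_SIZE / sizeof(struct inode))

    static int inodefs_writepage(struct page *page)
    {
        struct inode *slot = (struct inode *) page_address(page);
        int i, busy = 0;

        for (i = 0; i < INODES_PER_PAGE; i++, slot++) {
            if (!inode_slot_used(slot))      /* the "used" counter */
                continue;
            if (slot->i_state & (I_DIRTY | I_LOCK)) {
                busy++;                      /* page cannot go yet */
                continue;
            }
            /* Clean but cached: unhash it instead of writing. */
            list_del(&slot->i_hash);
            INIT_LIST_HEAD(&slot->i_hash);
        }
        if (!busy)
            ClearPageDirty(page);    /* pretend the write happened */
        UnlockPage(page);
        return 0;
    }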

The bad part is that the inode cache code and inodefs will have
part of the slab machinery replicated in them. Dunno if that is
bad enough to bury the thing.

If you have read to this point, let me know what you think.

-- Pete


2001-04-27 19:14:19

by Christoph Hellwig

Subject: Re: Atrocious icache/dcache in 2.4.2

Hi Pete,

In article <[email protected]> you wrote:
> After a little thinking it seems apparent to me that it may be a
> good thing to have the VM take pages from the dentry and inode
> pools directly. This sounds almost like what slab does, so let me
> speculate about it (it is a bad idea, but it is interesting _why_).
>
> Suppose that we do this: when an inode becomes clean (e.g. unlocked,
> written to disk if it was changed), drop it into kmem_cache_free(),
> but retain it on the hash (forget about poisoning for a moment).
> Then, if memory is needed, the VM may ask slab, slab calls our
> destructors, and the destructors take the inode off the hash. The
> idea solves the problem, but has two marks against it. First, when
> we look up an inode, we hit either a dirty one or a "clean" one,
> and the clean one is actually free. Then we have to do
> kmem_cache_alloc(), and that will return the wrong inode, which we
> have to drop from the hash, then do a memcpy from the old "really
> free" one, etc. It still saves disk I/O, but it is messy. The other
> thing is fragmentation: suppose we have a bunch of slabs, every one
> of which has a single dirty inode in it (tar xf -). Memory pressure
> will be powerless to do anything about them.

It looks like you want the SLAB cache ->reclaim method we seem
to have forgotten when cloning the Solaris SLAB interface nearly
1:1...
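
Something like this, say. kmem_cache_set_reclaim() is invented
here (Solaris passes the callback straight to kmem_cache_create()),
but the shape of the hook would be:

    /* The allocator would call this when it wants whole slab pages
     * back, instead of silently failing to reap slabs that are
     * full of cached-but-unused inodes. */
    static void inode_cache_reclaim(void *unused)
    {
        prune_icache(inodes_stat.nr_unused / 10);
    }

    /* invented registration: */
    kmem_cache_set_reclaim(inode_cachep, inode_cache_reclaim, NULL);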

Christoph

--
Of course it doesn't work. We've performed a software upgrade.

2001-04-27 19:40:22

by Alexander Viro

Subject: Re: Atrocious icache/dcache in 2.4.2



On Fri, 27 Apr 2001, Pete Zaitcev wrote:

> Hello:
>
> My box here slows down dramatically after a while, and starts
> behaving as if it had very little memory, e.g. programs page
> each other out. It turns out that out of 40MB total, about
> 35MB is used for the dcache and icache, and the system basically
> runs in 5MB of RAM.
>
> When I tried to discuss it with riel, viro, and others,
> I got an immediate and very strong knee-jerk reaction: "we fixed
> it in 2.4.4-pre4!", "we gotta call prune_dcache more!".
> That just does not sound persuasive to me.

[snip]
> written to disk if it was changed), drop it into kmem_cache_free(),
> but retain it on the hash (forget about poisoning for a moment).

What for?

I'm with you up to this point. But why bother keeping them
resurrectable? They are not referred to by dentries. They have no
I/O happening on them. Why retain them in the cache for long?

Notice that the icache sits behind the dcache, so you are looking
at second-order effects here. With the data you've shown on
#kernel, it looks like half of your icache is just sitting there
for no good reason and slows down hash lookups.

It makes sense to retain them for a while, but an inode sitting
there unreferenced by anything for minutes is dead weight and
nothing else.

Notice that the percentage of needlessly held inodes is actually
higher - 2.4.2 _really_ keeps stale stuff in the dcache, and that
means stale stuff in the icache. I.e. the only reference is from a
dentry that hasn't been touched by anything for a _long_ time.

IOW, we just need to make sure that unreferenced inodes get freed
once they are not dirty / not locked. Fast. No need to keep them
on the hash - just free them for real. Moreover, that will bring
fragmentation down.
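
Something like a more aggressive prune_icache(), simplified from
what fs/inode.c already does (stats, the CAN_UNUSE() checks and
the aging are glossed over here):

    static void free_unused_inodes(void)
    {
        LIST_HEAD(freeable);
        struct list_head *entry, *tmp;
        struct inode *inode;

        spin_lock(&inode_lock);
        entry = inode_unused.next;
        while (entry != &inode_unused) {
            tmp = entry;
            entry = entry->next;
            inode = list_entry(tmp, struct inode, i_list);
            if (atomic_read(&inode->i_count) ||
                (inode->i_state & (I_DIRTY | I_LOCK)))
                continue;
            list_del(tmp);
            /* off the hash for real, not resurrectable */
            list_del(&inode->i_hash);
            INIT_LIST_HEAD(&inode->i_hash);
            inode->i_state |= I_FREEING;
            list_add(tmp, &freeable);
        }
        spin_unlock(&inode_lock);

        while ((entry = freeable.next) != &freeable) {
            list_del(entry);
            inode = list_entry(entry, struct inode, i_list);
            if (inode->i_data.nrpages)
                truncate_inode_pages(&inode->i_data, 0);
            clear_inode(inode);
            kmem_cache_free(inode_cachep, inode);
        }
    }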