Hello,
Currently there is no way for modules to define dynamically sized caches that
shrink upon memory pressure. We need this for implementing Extended Attribute
caches on ext2, ext3, and ReiserFS. Other caches could also make use of the
same mechanism (e.g., nfsd's permission cache, dcache, icache, dqcache).
I propose this patch, which adds the register_cache() and unregister_cache()
functions. They allow modules to register a callback that is invoked under
memory pressure. The callback should then try to free some memory; the
parameters and semantics are similar to those of the other shrink functions
in mm/vmscan.c.
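Roughly, the interface has this shape (an illustrative sketch; the exact
structure layout, callback signature, and locking are simplified here, so
don't take this as the literal patch):

#include <linux/list.h>
#include <linux/spinlock.h>

/*
 * Sketch only: the field names, callback signature, and locking below
 * are illustrative, not necessarily what the patch does.
 */
struct cache_definition {
        struct list_head link;
        /*
         * Called under memory pressure, like the shrink_*_memory()
         * functions in mm/vmscan.c: try to free some entries, trying
         * harder for lower priority values.
         */
        void (*shrink)(int priority, unsigned int gfp_mask);
};

static LIST_HEAD(cache_definitions);
static spinlock_t cache_definitions_lock = SPIN_LOCK_UNLOCKED;

void register_cache(struct cache_definition *cache)
{
        spin_lock(&cache_definitions_lock);
        list_add(&cache->link, &cache_definitions);
        spin_unlock(&cache_definitions_lock);
}

void unregister_cache(struct cache_definition *cache)
{
        spin_lock(&cache_definitions_lock);
        list_del(&cache->link);
        spin_unlock(&cache_definitions_lock);
}

On memory pressure, the VM would then walk cache_definitions and invoke
each shrink callback with the current priority and gfp_mask.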
Regards,
Andreas.
------------------------------------------------------------------
Andreas Gruenbacher SuSE Linux AG
mailto:[email protected] Deutschherrnstr. 15-19
http://www.suse.de/ D-90429 Nuernberg, Germany
How do you ensure that caches have their (internal) aging hands pushed
at a speed that is proportional to their memory usage, or is your design
susceptible to all the usual complaints the unified memory manager crowd
has about separate caches?
Hans
--
Hans
On Sunday 04 August 2002 13:30, Hans Reiser wrote:
> How do you ensure that caches have their (internal) aging hands pushed
> at a speed that is proportional to their memory usage, or is your design
> susceptible to all the usual complaints the unified memory manager crowd
> has about separate caches?
That's a policy/optimization issue; it's not even desirable to shrink the
caches with priorities proportional to their size---they would all tend to
become equally large.
The patch shrinks all the caches equally often, with the same priorities. The
caches can then decide for themselves how to react, depending on their cache
size and entry size, replacement strategy, whether or not to take clustering
of entries within pages into account, etc.
The icache, dcache, and dqcache are shrunk using the same strategy (except the
priority is a constant for some of the caches, which could be coded in the
shrink function as well). This scheme has worked out pretty well so far,
right?
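For illustration, a registered callback could look roughly like this, in
the style of shrink_dcache_memory() (the ea_cache_* names and counters are
made up, not from the patch):

/*
 * Hypothetical example of a registered shrink callback; the
 * ea_cache_* names and statistics are invented for illustration.
 */
static void ea_cache_shrink(int priority, unsigned int gfp_mask)
{
        int count = 0;

        /*
         * Same convention as the shrink functions in mm/vmscan.c:
         * lower priority values mean higher memory pressure, so try
         * to free a larger fraction of the unused entries.
         */
        if (priority)
                count = ea_cache_nr_unused / priority;
        ea_cache_prune(count);  /* frees up to count entries, LRU first */
}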
For Extended Attributes we are currently using a very simple cache with LRU
entry replacement, and small entries. The cache doesn't grow very big,
either.
Regards,
Andreas.
------------------------------------------------------------------
Andreas Gruenbacher SuSE Linux AG
mailto:[email protected] Deutschherrnstr. 15-19
http://www.suse.de/ D-90429 Nuernberg, Germany
On Sun, 4 Aug 2002, Andreas Gruenbacher wrote:
> On Sunday 04 August 2002 13:30, Hans Reiser wrote:
> > How do you ensure that caches have their (internal) aging hands pushed
> > at a speed that is proportional to their memory usage, or is your design
> > susceptible to all the usual complaints the unified memory manager crowd
> > has about separate caches?
>
> That's a policy/optimization issue; it's not even desirable to shrink the
> caches with priorities proportional to their size---they would all tend to
> become equally large.
Nope, the idea is to push all caches according to size, but
often-used caches should shrink less than caches that are
hardly ever used.
> The icache, dcache, and dqcache are shrunk using the same strategy
> (except the priority is a constant for some of the caches, which could
> be coded in the shrink function as well). This scheme has worked out
> pretty well so far, right?
Not quite, we still have some bad problems balancing the size
of these caches versus the size of the other VM occupants.
However, your shrinking function is good enough for now and
can be used with something like Ed Tomlinson's approach later
on to make reclaiming better balanced.
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
Andreas Gruenbacher wrote:
>On Sunday 04 August 2002 13:30, Hans Reiser wrote:
>
>
>>How do you ensure that caches have their (internal) aging hands pushed
>>at a speed that is proportional to their memory usage, or is your design
>>susceptible to all the usual complaints the unified memory manager crowd
>>has about separate caches?
>>
>>
>
>That's a policy/optimization issue; it's not even desirable to shrink the
>caches with priorities proportional to their size---they would all tend to
>become equally large.
>
That is not what I said. I said move the aging hand at a speed
proportional to size; what effect the aging has depends on the usage of
the objects in the cache.
>
>The patch shrinks all the caches equally often, with the same priorities.
>
This is wrong, because frequently used objects should not be shrunk out
of their caches.
>The
>caches can then decide for themselves how to react, depending on their
>cache size and entry size, replacement strategy, whether or not to take
>clustering of entries within pages into account, etc.
>
How can they decide this without a pressure signal from the master
memory manager that is proportional to their size? Or is the idea that
they figure out their own size? I suppose that is equivalent....
>
>The icache, dcache, and dqcache are shrunk using the same strategy (except the
>priority is a constant for some of the caches, which could be coded in the
>shrink function as well).
>
What does priority mean in that case?
>This scheme has worked out pretty well so far,
>right?
>
Our cache management has deep algorithmic flaws that result in things
like one active dcache entry keeping an entire page of unused dcache
entries from expiring from the cache. Did you see Joshua MacDonald's
post on that topic?
>
>For Extended Attributes we are currently using a very simple cache with LRU
>entry replacement, and small entries. The cache doesn't grow very big,
>either.
>
You don't support EAs that are larger than a page? Is it LRU on each
EA, or is it LRU on the page containing the EA?
--
Hans
Rik van Riel wrote:
>Nope, the idea is to push all caches according to size, but
>often-used caches should shrink less than caches that are
>hardly ever used.
>
Do you let the subcache decide how to move the aging hand and track it?
Have I convinced you of that one yet? Or is it still page based?
--
Hans
On Sun, 4 Aug 2002, Hans Reiser wrote:
> Do you let the subcache decide how to move the aging hand and track it?
> Have I convinced you of that one yet? Or is it still page based?
Linus has indicated that he would like to have it page based,
but implementation issues point towards letting the subcache
manage its objects by itself ;)
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
On Sun, 4 Aug 2002, Andreas Gruenbacher wrote:
>
> Currently there is no way for modules to define dynamically sized caches that
> shrink upon memory pressure. We need this for implementing Extended Attribute
> caches on ext2, ext3, and ReiserFS. Other caches could also make use of the
> same mechanism (e.g., nfsd's permission cache, dcache, icache, dqcache).
This is what the slablru patches should do. Any slab user should be able
to get a callback when the LRU decides that the page seems to be unused.
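Something like the following, perhaps (a guess at the shape of such a
hook, for illustration only; this is not the actual slablru interface):

/*
 * Hypothetical slab-LRU reclaim hook; the real slablru patches may
 * look entirely different.
 */
typedef int (*kmem_reclaim_fn)(kmem_cache_t *cachep, struct page *page);

/*
 * The slab user registers a callback that the page-based LRU invokes
 * when a slab page looks unused; the callback tries to free the
 * objects on that page and returns nonzero if the page became free.
 */
int kmem_cache_set_reclaim(kmem_cache_t *cachep, kmem_reclaim_fn fn);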
Linus
On Sun, 4 Aug 2002, Rik van Riel wrote:
>
> Linus has indicated that he would like to have it page based,
> but implementation issues point towards letting the subcache
> manage its objects by itself ;)
The two are not mutually incompatible.
I think we've all seen that non-global shrinking just doesn't work very
well, because one cache that grows too large will end up asking everybody
else to shrink, even if a global shrinking policy would have realized that
the memory pressure is due to that one overly large cache. The resulting
balancing problems are "interesting".
Being purely page-based, together with support for the sub-caches knowing
about the page-based aging, should be fine.
In particular, it is useless for the sub-caches to try to maintain their
own LRU lists and their own accessed bits. But that doesn't mean that they
can't _act_ as if they updated their own accessed bits, while really just
telling the page-based thing that that page is active.
This is what the buffer cache has been doing for the last two years, i.e.,
"touch_buffer()" actually ends up touching the page, which seems to be
working quite well.
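For reference, touch_buffer() in 2.4-era kernels is approximately a
one-liner (the exact form varies between versions):

/*
 * Approximately the 2.4-era definition: buffer heads keep no accessed
 * bit of their own; touching a buffer just marks the backing page as
 * recently used.
 */
#define touch_buffer(bh)        mark_page_accessed((bh)->b_page)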
Linus
On Sun, 4 Aug 2002, Linus Torvalds wrote:
> The two are not mutually incompatible.
> In particular, it is useless for the sub-caches to try to maintain their
> own LRU lists and their own accessed bits. But that doesn't mean that
> they can't _act_ as if they updated their own accessed bits, while really
> just telling the page-based thing that that page is active.
I'm not sure I agree with this. For eg. the dcache you will want
to reclaim the less used entries on a page even if there are a few
very intensely used entries on that page.
This is because then new dcache allocations can come from the
empty space on this page and the dcache doesn't have to grow in
order to allocate new entries.
If we were to go fully page-based, it'd become impossible to evict
dcache entries from any page with at least one heavily used entry
and the dcache will use much more space than otherwise required.
In reality it'd be even worse because we cannot evict any dcache
entry which is pinned by eg. inodes or directory entries.
Please tell me if I've overlooked something, it would be nice if
the page based scheme could work out after all ;)
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
On Sun, 4 Aug 2002, Rik van Riel wrote:
>
> > In particular, it is useless for the sub-caches to try to maintain their
> > own LRU lists and their own accessed bits. But that doesn't mean that
> > they can't _act_ as if they updated their own accessed bits, while really
> > just telling the page-based thing that that page is active.
>
> I'm not sure I agree with this. For eg. the dcache you will want
> to reclaim the less used entries on a page even if there are a few
> very intensely used entries on that page.
True in theory, but I doubt you will see it very much in practice.
Most of the time when you want to free dentries, it is because you have a
_ton_ of them.
The fact that some will look cold even if they aren't should not matter
that much statistically.
Yah, it's a guess. We can test it.
Linus
Linus Torvalds wrote:
>True in theory, but I doubt you will see it very much in practice.
>
>Most of the time when you want to free dentries, it is because you have a
>_ton_ of them.
>
>The fact that some will look cold even if they aren't should not matter
>that much statistically.
>
>Yah, it's a guess. We can test it.
Josh tested it. He posted on it. I'll have him find his original post
and repost tomorrow, but in brief: the current dcache
shrinking/management approach was quite inefficient. Each active dcache
entry kept a whole lot of dead ones around.
--
Hans
Rik van Riel <[email protected]> writes:
> On Sun, 4 Aug 2002, Linus Torvalds wrote:
> > In particular, it is useless for the sub-caches to try to maintain their
> > own LRU lists and their own accessed bits. But that doesn't mean that
> > they can't _act_ as if they updated their own accessed bits, while really
> > just telling the page-based thing that that page is active.
>
> I'm not sure I agree with this. For eg. the dcache you will want
> to reclaim the less used entries on a page even if there are a few
> very intensely used entries on that page.
>
> This is because then new dcache allocations can come from the
> empty space on this page and the dcache doesn't have to grow in
> order to allocate new entries.
>
> If we were to go fully page-based, it'd become impossible to evict
> dcache entries from any page with at least one heavily used entry
> and the dcache will use much more space than otherwise required.
> In reality it'd be even worse because we cannot evict any dcache
> entry which is pinned by eg. inodes or directory entries.
>
> Please tell me if I've overlooked something, it would be nice if
> the page based scheme could work out after all ;)
Another angle worth taking into account is that any data that is purely
cached (that is, not currently pinned by a user) can be compacted. With
rmap this property even applies to pinned page cache pages.
For cases where internal fragmentation is likely, it may be worth
taking advantage of this property.
Eric
On Sun, Aug 04, 2002 at 11:17:03PM +0400, Hans Reiser wrote:
> Josh tested it. He posted on it. I'll have him find his original post
> and repost tomorrow, but in brief: the current dcache
> shrinking/management approach was quite inefficient. Each active dcache
> entry kept a whole lot of dead ones around.
It seems the rmap+slab-LRU patches address this problem, but even with
those patches applied Ed Tomlinson's test shows a ~50% utilization of dcache
slab pages under memory pressure. My original test on 2.4.18 showed a ~25%
utilization under memory pressure, so the slab-LRU patches definitely show an
improvement. In both theory and practice, the slab caches are inefficient at
using memory under pressure, so there is probably more work to be done. Ed's
posted results:
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=01041309360100.32757%40oscar
Here is the text of the message that Hans refers to:
http://groups.google.com/groups?q=g:thl2323088796d&dq=&hl=en&lr=&ie=UTF-8&selm=20020128091338.D6578%40helen.CS.Berkeley.EDU
--
When memory pressure becomes high, the Linux kswapd begins calling
shrink_caches() from try_to_free_pages() with an integer priority from
6 (the default, lowest priority) to 1 (high priority). Looking
specifically at the dcache, this results in calls to
shrink_dcache_memory() that attempt to free a fraction (1/priority) of
the inactive dcache entries. This ultimately leads to prune_dcache()
scanning the dcache in least-recently-used order, attempting to call
kmem_cache_free() on some number of dcache entries.
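For reference, the code is approximately as follows (paraphrased from
2.4-era fs/dcache.c; details vary between versions):

/*
 * Approximate 2.4-era fs/dcache.c: free 1/priority of the unused
 * dentries, then ask the slab allocator to give back any now-empty
 * slab pages.
 */
void shrink_dcache_memory(int priority, unsigned int gfp_mask)
{
        int count = 0;

        if (priority)
                count = dentry_stat.nr_unused / priority;
        prune_dcache(count);
        kmem_cache_shrink(dentry_cache);
}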
Dcache entries are allocated from the kmem_slab_cache, which manages
objects in page-size "slabs", but the kmem_slab_cache cannot free a
page until every object in a slab becomes unused. The problem is that
freeing dcache entries in LRU order effectively frees entries from
randomly-selected slabs, so applying shrink_caches() pressure to the
dcache has an undesired result: in the attempt to reduce its size, the
dcache must free objects scattered across many slabs before full pages
are actually released. The result is that under high memory pressure
the dcache utilization drops dramatically. The prune_dcache() mechanism
doesn't just reduce the page utilization as desired, it reduces the
intra-page utilization, which is bad.
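A back-of-the-envelope illustration of why (my own numbers, using the
~30 dentries per page implied by the slabinfo output below): if entries
are freed at random with respect to slab placement, freeing a fraction
f of all entries leaves a given page completely empty only with
probability about f^30, so nearly everything on a page must be freed
before the page itself can be returned. The survivors end up smeared
thinly across many partially-full pages, which matches the 20-35%
utilization measured below.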
In order to measure this effect (via /proc/slabinfo) I first populated
a large dcache and then ran a memory-hog to force swapping to occur.
The dcache utilization dropped to between 20% and 35%. For example, before
running the memory-hog my dcache reports:
dentry_cache 10170 10170 128 339 339 1 : 252 126
(i.e., 10170 active dentry objects, 10170 available dentry objects @
128 bytes each, 339 pages with at least one object, and 339 allocated
pages, an approximately 1.4MB dcache)
While running the memory-hog program to initiate swapping, the dcache
stands at:
dentry_cache 693 3150 128 105 105 1 : 252 126
Meaning, the randomly-applied cache pressure succeeded in freeing
234 (= 339-105) pages, leaving a 430KB dcache, but at the same time it
reduced the cache utilization to 22%: although it was able to free
nearly 1MB of space, 335KB are now wasted as a result of the high
memory-pressure condition.
So, it would seem that the dcache and kmem_slab_cache memory allocator
could benefit from a less random way of shrinking the dcache.
Any thoughts?