LinuxLists.cc - [PATCH] vmscan: fix slab vs lru balance

2015-11-24 12:47:44

Subject: [PATCH] vmscan: fix slab vs lru balance

The comment to shrink_slab states that the portion of kmem objects
scanned by it equals the portion of lru pages scanned by shrink_zone
over shrinker->seeks.

shrinker->seeks is supposed to be equal to the number of disk seeks
required to recreated an object. It is usually set to DEFAULT_SEEKS (2),
which is quite logical, because most kmem objects (e.g. dentry or inode)
require random IO to reread (seek to read and seek back).

That said, one would expect that dcache is scanned two times less
intensively than page cache, which sounds sane as dentries are generally
more costly to recreate.

However, the formula for distributing memory pressure between slab and
lru actually looks as follows (see do_shrink_slab):

lru_scanned
objs_to_scan = objs_total * --------------- * 4 / shrinker->seeks
lru_reclaimable

That is dcache, as well as most of other slab caches, is scanned two
times more aggressively than page cache.

Fix this by dropping '4' from the equation above.

Signed-off-by: Vladimir Davydov <[email protected]>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 97ba9e1cde09..9d553b07bb86 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -290,7 +290,7 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
nr = atomic_long_xchg(&shrinker->nr_deferred[nid], 0);

total_scan = nr;
- delta = (4 * nr_scanned) / shrinker->seeks;
+ delta = nr_scanned / shrinker->seeks;
delta *= freeable;
do_div(delta, nr_eligible + 1);
total_scan += delta;
--
2.1.4

2015-11-24 23:02:34

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH] vmscan: fix slab vs lru balance

On Tue, 24 Nov 2015 15:47:21 +0300 Vladimir Davydov <[email protected]> wrote:

> The comment to shrink_slab states that the portion of kmem objects
> scanned by it equals the portion of lru pages scanned by shrink_zone
> over shrinker->seeks.
>
> shrinker->seeks is supposed to be equal to the number of disk seeks
> required to recreated an object. It is usually set to DEFAULT_SEEKS (2),
> which is quite logical, because most kmem objects (e.g. dentry or inode)
> require random IO to reread (seek to read and seek back).
>
> That said, one would expect that dcache is scanned two times less
> intensively than page cache, which sounds sane as dentries are generally
> more costly to recreate.
>
> However, the formula for distributing memory pressure between slab and
> lru actually looks as follows (see do_shrink_slab):
>
> lru_scanned
> objs_to_scan = objs_total * --------------- * 4 / shrinker->seeks
> lru_reclaimable
>
> That is dcache, as well as most of other slab caches, is scanned two
> times more aggressively than page cache.
>
> Fix this by dropping '4' from the equation above.
>

oh geeze. Who wrote that crap?

commit c3f4656118a78c1c294e0b4d338ac946265a822b
Author: Andrew Morton <[email protected]>
Date: Mon Dec 29 23:48:44 2003 -0800

[PATCH] shrink_slab acounts for seeks incorrectly

wli points out that shrink_slab inverts the sense of shrinker->seeks: those
caches which require more seeks to reestablish an object are shrunk harder.
That's wrong - they should be shrunk less.

So fix that up, but scaling the result so that the patch is actually a no-op
at this time, because all caches use DEFAULT_SEEKS (2).

diff --git a/mm/vmscan.c b/mm/vmscan.c
index b859482..f2da3c9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -154,7 +154,7 @@ static int shrink_slab(long scanned, unsigned int gfp_mask)
list_for_each_entry(shrinker, &shrinker_list, list) {
unsigned long long delta;

- delta = scanned * shrinker->seeks;
+ delta = 4 * (scanned / shrinker->seeks);
delta *= (*shrinker->shrinker)(0, gfp_mask);
do_div(delta, pages + 1);
shrinker->nr += delta;

What a pathetic changelog.

The current code may be good, it may be bad, but I'm reluctant to
change it without a solid demonstration that the result is overall
superior.

2015-11-26 13:27:15

by Vladimir Davydov

[permalink] [raw]

Subject: Re: [PATCH] vmscan: fix slab vs lru balance

On Tue, Nov 24, 2015 at 03:02:27PM -0800, Andrew Morton wrote:
> On Tue, 24 Nov 2015 15:47:21 +0300 Vladimir Davydov <[email protected]> wrote:
>
> > The comment to shrink_slab states that the portion of kmem objects
> > scanned by it equals the portion of lru pages scanned by shrink_zone
> > over shrinker->seeks.
> >
> > shrinker->seeks is supposed to be equal to the number of disk seeks
> > required to recreated an object. It is usually set to DEFAULT_SEEKS (2),
> > which is quite logical, because most kmem objects (e.g. dentry or inode)
> > require random IO to reread (seek to read and seek back).
> >
> > That said, one would expect that dcache is scanned two times less
> > intensively than page cache, which sounds sane as dentries are generally
> > more costly to recreate.
> >
> > However, the formula for distributing memory pressure between slab and
> > lru actually looks as follows (see do_shrink_slab):
> >
> > lru_scanned
> > objs_to_scan = objs_total * --------------- * 4 / shrinker->seeks
> > lru_reclaimable
> >
> > That is dcache, as well as most of other slab caches, is scanned two
> > times more aggressively than page cache.
> >
> > Fix this by dropping '4' from the equation above.
> >
>
> oh geeze. Who wrote that crap?
>
>
> commit c3f4656118a78c1c294e0b4d338ac946265a822b
> Author: Andrew Morton <[email protected]>
> Date: Mon Dec 29 23:48:44 2003 -0800
>
> [PATCH] shrink_slab acounts for seeks incorrectly
>
> wli points out that shrink_slab inverts the sense of shrinker->seeks: those
> caches which require more seeks to reestablish an object are shrunk harder.
> That's wrong - they should be shrunk less.
>
> So fix that up, but scaling the result so that the patch is actually a no-op
> at this time, because all caches use DEFAULT_SEEKS (2).
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index b859482..f2da3c9 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -154,7 +154,7 @@ static int shrink_slab(long scanned, unsigned int gfp_mask)
> list_for_each_entry(shrinker, &shrinker_list, list) {
> unsigned long long delta;
>
> - delta = scanned * shrinker->seeks;
> + delta = 4 * (scanned / shrinker->seeks);
> delta *= (*shrinker->shrinker)(0, gfp_mask);
> do_div(delta, pages + 1);
> shrinker->nr += delta;
>
>
> What a pathetic changelog.
>
> The current code may be good, it may be bad, but I'm reluctant to
> change it without a solid demonstration that the result is overall
> superior.
>

Yep, that's understandable - we've been living with this (mis)behavior
for more than 10 years already and nobody seems to complain.

I don't have a solid proof at hand right now that the patch makes things
substantially better in most cases - it just comes from the speculation
that dropping dcache is really expensive, because (a) rereading it
requires random IO and (b) dropping an inode automatically results in
dropping page cache attached to it, so it shouldn't be scanned more
aggressively than unmapped page cache.

Anyway, I'll try to run various workloads with and w/o this patch and
report back if I find those which benefit from it.

Thanks,
Vladimir