2022-04-06 12:12:53

by Dave Chinner

Subject: Re: [RFC] mm/vmscan: add periodic slab shrinker

On Tue, Apr 05, 2022 at 10:21:53PM +0100, Matthew Wilcox wrote:
> On Tue, Apr 05, 2022 at 01:58:59PM -0700, Yang Shi wrote:
> > Yeah, I agree it actually doesn't make too much sense to return the
> > number of reclaimed objects. Other parts of vmscan return the number
> > of base pages; the sizes of slab objects vary and can be much
> > smaller than a page - a dentry, for example, may be 192 bytes.
>
> From the point of view of vmscan, it only cares about the number of pages
> freed because it's trying to free pages. But from the point of view of
> trying to keep the number of non-useful objects in check, the number of
> objects freed is more important, and it doesn't matter whether we ended
> up freeing any pages because we made memory available for this slab cache.

Yes and no. If the memory pressure is being placed on this cache,
then freeing any number of objects is a win-win situation - reclaim
makes progress and new allocations don't need to wait for reclaim.

However, if there is no pressure on this slab cache, then freeing
objects but no actual memory pages is largely wasted reclaim effort.
Freeing those objects does nothing to alleviate the memory shortage,
and the memory freed is not going to be consumed any time soon so
all we've done is fragment the slab cache and require the subsystem
to spend more resources re-populating it. That's a lose-lose.

We want to select the shrinkers that will result in the former
occurring, not the latter.
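For anyone not steeped in the API: a shrinker deals in object counts at
both ends, not pages. A minimal sketch for context (5.18-era signatures;
the my_cache_* names are hypothetical stand-ins, not real kernel code):

#include <linux/shrinker.h>

/* Hypothetical cache state, for illustration only. */
static atomic_long_t my_cache_nr_cached;
static bool my_cache_evict_one(void);

static unsigned long my_cache_count(struct shrinker *sh,
				    struct shrink_control *sc)
{
	/* Report how many objects could be freed, not pages. */
	return atomic_long_read(&my_cache_nr_cached);
}

static unsigned long my_cache_scan(struct shrinker *sh,
				   struct shrink_control *sc)
{
	unsigned long freed = 0;

	while (freed < sc->nr_to_scan && my_cache_evict_one())
		freed++;

	/*
	 * The return value is a count of objects freed. Whether any
	 * whole slab pages went back to the page allocator is invisible
	 * at this level - which is exactly the tension in this thread.
	 */
	return freed;
}

static struct shrinker my_shrinker = {
	.count_objects	= my_cache_count,
	.scan_objects	= my_cache_scan,
	.seeks		= DEFAULT_SEEKS,
};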

Cheers,

Dave.
--
Dave Chinner
[email protected]


2022-04-21 22:08:46

by Kent Overstreet

Subject: Re: [RFC] mm/vmscan: add periodic slab shrinker

On Wed, Apr 06, 2022 at 10:01:30AM +1000, Dave Chinner wrote:
> On Tue, Apr 05, 2022 at 10:21:53PM +0100, Matthew Wilcox wrote:
> > On Tue, Apr 05, 2022 at 01:58:59PM -0700, Yang Shi wrote:
> > > Yeah, I agree it actually doesn't make too much sense to return the
> > > number of reclaimed objects. Other parts of vmscan return the number
> > > of base pages; the sizes of slab objects vary and can be much
> > > smaller than a page - a dentry, for example, may be 192 bytes.
> >
> > From the point of view of vmscan, it only cares about the number of pages
> > freed because it's trying to free pages. But from the point of view of
> > trying to keep the number of non-useful objects in check, the number of
> > objects freed is more important, and it doesn't matter whether we ended
> > up freeing any pages because we made memory available for this slab cache.
>
> Yes and no. If the memory pressure is being placed on this cache,
> then freeing any number of objects is a win-win situation - reclaim
> makes progress and new allocations don't need to wait for reclaim.
>
> However, if there is no pressure on this slab cache, then freeing
> objects but no actual memory pages is largely wasted reclaim effort.
> Freeing those objects does nothing to alleviate the memory shortage,
> and the memory freed is not going to be consumed any time soon so
> all we've done is fragment the slab cache and require the subsystem
> to spend more resources re-populating it. That's a lose-lose.
>
> We want to select the shrinkers that will result in the former
> occurring, not the latter.

Do we have any existing shrinkers that preferentially free from mostly empty
slab pages though? And do we want them to?

You're talking about memory fragmentation, and I'm not sure that should be the
shrinker's concern (on the other hand, I'm not sure it shouldn't - just freeing
the objects on mostly empty slab pages is pretty reasonable for cached objects).

We could also plumb compaction down to the slab level, and just request the
subsystem move those objects. Might be easier than making that an additional
responsibility of the shrinkers, which really should be more concerned with
implementing cache replacement policy and whatnot - e.g. if shrinkers were
doing something more LFU-ish, that would also help with the one-off object
problem.
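To make that concrete, a purely hypothetical sketch - no such API exists
in mainline, this is just the shape such an interface might take:

/*
 * Hypothetical and illustrative only: a cache that can relocate its
 * objects registers callbacks, and slab compaction then asks it to
 * evacuate sparsely populated slab pages.
 */
struct kmem_cache_mobility_ops {
	/* Pin the objects so they can't be freed mid-migration. */
	void *(*isolate)(struct kmem_cache *s, void **objs, int nr);
	/* Re-home each pinned object, fixing up external references. */
	void (*migrate)(struct kmem_cache *s, void **objs, int nr,
			void *isolate_cookie);
};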

2022-04-22 22:23:34

by Dave Chinner

Subject: Re: [RFC] mm/vmscan: add periodic slab shrinker

On Thu, Apr 21, 2022 at 03:03:39PM -0400, Kent Overstreet wrote:
> On Wed, Apr 06, 2022 at 10:01:30AM +1000, Dave Chinner wrote:
> > On Tue, Apr 05, 2022 at 10:21:53PM +0100, Matthew Wilcox wrote:
> > > On Tue, Apr 05, 2022 at 01:58:59PM -0700, Yang Shi wrote:
> > > > Yeah, I agree it actually doesn't make too much sense to return the
> > > > number of reclaimed objects. Other parts of vmscan return the number
> > > > of base pages; the sizes of slab objects vary and can be much
> > > > smaller than a page - a dentry, for example, may be 192 bytes.
> > >
> > > From the point of view of vmscan, it only cares about the number of pages
> > > freed because it's trying to free pages. But from the point of view of
> > > trying to keep the number of non-useful objects in check, the number of
> > > objects freed is more important, and it doesn't matter whether we ended
> > > up freeing any pages because we made memory available for this slab cache.
> >
> > Yes and no. If the memory pressure is being placed on this cache,
> > then freeing any number of objects is a win-win situation - reclaim
> > makes progress and new allocations don't need to wait for reclaim.
> >
> > However, if there is no pressure on this slab cache, then freeing
> > objects but no actual memory pages is largely wasted reclaim effort.
> > Freeing those objects does nothing to alleviate the memory shortage,
> > and the memory freed is not going to be consumed any time soon so
> > all we've done is fragment the slab cache and require the subsystem
> > to spend more resources re-populating it. That's a lose-lose.
> >
> > We want to select the shrinkers that will result in the former
> > occurring, not the latter.
>
> Do we have any existing shrinkers that preferentially free from mostly empty
> slab pages though?

No, because shrinkers have no visibility into slab cache layout.

> And do we want them to?

There have been attempts in the past to do this, which started by
selecting partial slab pages and then trying to free or reclaim the
objects on those pages.

The problem these direct defrag attempts face is that partial slabs
can be a mix of referenced (in use) and unreferenced (reclaimable)
objects. Freeing or relocating an in-use object is largely impossible,
because all the (unknown) external references to the object would
need to be updated, and that's an intractable problem.

Hence attempts to directly target partial slab pages for reclaim
have largely ended up with poor results, because partial slab pages
containing only unreferenced objects are much rarer than partial
slab pages containing a mix of both....

> You're talking about memory fragmentation, and I'm not sure that should be the
> shrinker's concern (on the other hand, I'm not sure it shouldn't - just freeing
> the objects on mostly empty slab pages is pretty reasonable for cached objects).

Yeah, but as per above, having mostly empty slab pages does not mean
the remaining active objects on those pages can actually be
reclaimed...

> We could also plumb compaction down to the slab level, and just request the
> subsystem move those objects.

Been there, tried that, not feasible. How do you find all the
external active references to a dentry or inode at any given point
in time, and then prevent all of them from being actively used while
you switch all the external pointers from the old object to the new
object?

> Might be easier than making that an additional
> responsibility of the shrinkers, which really should be more concerned with
> implementing cache replacement policy and whatnot - e.g. if shrinkers were
> doing something more LFU-ish, that would also help with the one-off object
> problem.

*nod*

I've said this many times in the past when people have wanted to
hack special cases into inode and dentry cache reference bit
management. We need to make the list_lru rotation implementation
smarter, not hack special cases into individual shrinker algorithms.

One of the original goals of the list_lru was to unify all the LRU
mechanisms used by shrinkable slab caches so we could then do
something smarter with the list_lru than a plain LRU + reference bit
and everything would benefit. e.g. being able to bias the reclaim
priority of objects based on their known access patterns (which I
implemented for the xfs buffer cache via b_lru_ref, so it holds onto
tree roots and nodes harder than it does leaves), or moving the list
ordering towards LFU or the clock-pro style active/inactive lists
that page reclaim uses, so that single use objects can stream through
the cache instead of turning it over entirely. Those were all
potential improvements I was thinking of.
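As a concrete illustration of that b_lru_ref-style biasing, here is a
sketch of a list_lru isolate callback modelled on xfs_buftarg_isolate()
(5.18-era callback signature, which has since changed; cached_obj is a
hypothetical object type, not real kernel code):

#include <linux/list_lru.h>

struct cached_obj {			/* hypothetical cached object */
	struct list_head	lru;
	atomic_t		lru_ref;	/* cf. xfs_buf's b_lru_ref */
};

static enum lru_status
cached_obj_isolate(struct list_head *item, struct list_lru_one *lru,
		   spinlock_t *lru_lock, void *arg)
{
	struct cached_obj *obj = container_of(item, struct cached_obj, lru);
	struct list_head *dispose = arg;

	/*
	 * Objects marked as hot earn extra trips around the LRU before
	 * they are reclaimed, instead of being freed in strict LRU
	 * order.
	 */
	if (atomic_add_unless(&obj->lru_ref, -1, 0))
		return LRU_ROTATE;

	list_lru_isolate_move(lru, item, dispose);
	return LRU_REMOVED;
}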

So, yes, we should be looking at improving the list_lru
implementation so that it handles streaming single use objects a
whole lot better. Fix it once, fix it properly, and everyone who
uses list_lru and shrinkers benefits...

Cheers,

Dave.

--
Dave Chinner
[email protected]