Subject: Re: [PATCH -mm 8/8] slab: reap dead memcg caches aggressively

On Sat, 31 May 2014, Vladimir Davydov wrote:

> > You can use an approach similar to the one in SLUB. Reduce the size of
> > the per cpu array objects to zero. Then SLAB will always fall back to
> > its slow path in cache_flusharray() where you may be able to do
> > something with less of an impact on performance.
>
> In contrast to SLUB, for SLAB this will slow down kfree significantly.

But that is only when you want to destroy a cache. This is similar.

> Fast path for SLAB is just putting an object to a per cpu array, while
> the slow path requires taking a per node lock, which is much slower even
> with no contention. There still can be lots of objects in a dead memcg
> cache (e.g. hundreds of megabytes of dcache), so such performance
> degradation is not acceptable, IMO.

I am not sure that there is such a stark difference to SLUB. SLUB also
takes the per node lock if necessary to handle freeing especially if you
zap the per cpu partial slab pages.


2014-06-03 20:18:32

by Vladimir Davydov

Subject: Re: [PATCH -mm 8/8] slab: reap dead memcg caches aggressively

On Mon, Jun 02, 2014 at 10:24:09AM -0500, Christoph Lameter wrote:
> On Sat, 31 May 2014, Vladimir Davydov wrote:
>
> > > > You can use an approach similar to the one in SLUB. Reduce the size of
> > > > the per cpu array objects to zero. Then SLAB will always fall back to
> > > > its slow path in cache_flusharray() where you may be able to do
> > > > something with less of an impact on performance.
> >
> > In contrast to SLUB, for SLAB this will slow down kfree significantly.
>
> But that is only when you want to destroy a cache. This is similar.

When we want to destroy a memcg cache, a great many objects may still
be allocated from it, e.g. gigabytes of inodes and dentries. That's
why I think we should avoid any performance degradation if possible.

>
> > Fast path for SLAB is just putting an object to a per cpu array, while
> > the slow path requires taking a per node lock, which is much slower even
> > with no contention. There still can be lots of objects in a dead memcg
> > cache (e.g. hundreds of megabytes of dcache), so such performance
> > degradation is not acceptable, IMO.
>
> I am not sure that there is such a stark difference to SLUB. SLUB also
> takes the per node lock if necessary to handle freeing especially if you
> zap the per cpu partial slab pages.

Hmm, for SLUB we will only take the node lock when inserting a slab on
the partial list, while for SLAB disabling per-cpu arrays will result in
taking the lock on every object free. So if there are only a few
objects per slab, the difference won't be huge; otherwise the slowdown
will be noticeable for SLAB, but not for SLUB.
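
To put rough numbers on the comparison above, here is a toy userspace
model in Python. It is only a sketch, not kernel code: the function names
and parameters are made up for illustration, it looks at a single CPU, it
assumes every freed object comes from a slab that was full, that slabs are
emptied one after another, and it ignores the lock taken when an empty
slab is discarded.

```python
def slab_node_locks(num_frees, limit, batchcount):
    """Toy count of node-lock acquisitions for SLAB frees on one CPU.

    limit      -- capacity of the per-cpu array (0 models a disabled array)
    batchcount -- objects returned to the node per flush
                  (must satisfy 1 <= batchcount <= limit when limit > 0)
    """
    if limit == 0:
        # No per-cpu array: every free goes through the slow path.
        return num_frees
    if not 1 <= batchcount <= limit:
        raise ValueError("batchcount must be in 1..limit")
    locks = 0
    avail = 0  # objects currently sitting in the per-cpu array
    for _ in range(num_frees):
        if avail == limit:
            # Array full: one lock acquisition flushes a whole batch.
            locks += 1
            avail -= batchcount
        avail += 1
    return locks


def slub_node_locks(num_frees, objs_per_slab):
    """Toy count for SLUB: one lock per slab moved to the partial list."""
    # Only the first free into a full slab takes the node lock.
    return (num_frees + objs_per_slab - 1) // objs_per_slab
```

For 1200 frees, with limit=120 and batchcount=60 picked purely for
illustration, the model gives 18 lock acquisitions for SLAB with the
array enabled and 1200 with it disabled. SLUB takes 38 at 32 objects per
slab and 600 at 2 objects per slab, which is the "few objects per slab"
case where the gap to SLAB shrinks.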

I'm not that sure that we should prefer one way over another though. I
just think that if we already have periodic reaping for SLAB, why not
employ it for reaping dead memcg caches too, provided it won't obfuscate
the code? Anyway, if you think that we can neglect possible performance
degradation that will result from disabling per cpu caches for SLAB, I
can give it a try.

Thanks.