Linus,
please pull the latest slab changes from
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git tags/slab-for-5.19
======================================
- Conversion of slub_debug stack traces to stackdepot, allowing
more useful debugfs-based inspection for e.g. memory leak
debugging. Allocation and free debugfs info now includes full
traces and is sorted by the frequency of each unique trace.
The stackdepot conversion was already attempted last year but
reverted by ae14c63a9f20. The memory overhead (incurred even while
the tracking is not enabled on boot) has meanwhile been solved by
making the large stackdepot allocation dynamic. The xfstest issues
haven't been reproduced locally on the current kernel nor in -next,
so the slab cache layout changes that originally made that bug
manifest were probably not the root cause.
- Refactoring of dma-kmalloc caches creation.
- Trivial cleanups such as removal of unused parameters, fixes
and clarifications of comments.
- Hyeonggon Yoo joins as a reviewer.
Thanks,
Vlastimil
----------------------------------------------------------------
Andrey Konovalov (2):
mm: slab: fix comment for ARCH_KMALLOC_MINALIGN
mm: slab: fix comment for __assume_kmalloc_alignment
Hyeonggon Yoo (2):
mm/slub, kunit: Make slub_kunit unaffected by user specified flags
MAINTAINERS: add myself as reviewer for slab
JaeSang Yoo (2):
mm/slub: remove unused parameter in setup_object*()
mm/slub: remove meaningless node check in ___slab_alloc()
Jiyoup Kim (1):
mm/slub: remove duplicate flag in allocate_slab()
Miaohe Lin (3):
mm/slab: remove some unused functions
mm/slub: remove unneeded return value of slab_pad_check
mm/slub: remove unused kmem_cache_order_objects max
Ohhoon Kwon (1):
mm/slab_common: move dma-kmalloc caches creation into new_kmalloc_cache()
Oliver Glitta (4):
mm/slub: use stackdepot to save stack trace in objects
mm/slub: distinguish and print stack traces in debugfs files
mm/slub: sort debugfs output by frequency of stack traces
slab, documentation: add description of debugfs files for SLUB caches
Vlastimil Babka (3):
lib/stackdepot: allow requesting early initialization dynamically
mm/slub: move struct track init out of set_track()
Merge branches 'slab/for-5.19/stackdepot' and 'slab/for-5.19/refactor' into slab/for-linus
Yixuan Cao (1):
mm/slab.c: fix comments
Documentation/vm/slub.rst | 64 +++++++++++++++++
MAINTAINERS | 1 +
include/linux/slab.h | 15 ++--
include/linux/slub_def.h | 1 -
include/linux/stackdepot.h | 26 +++++--
init/Kconfig | 1 +
lib/Kconfig.debug | 1 +
lib/slub_kunit.c | 10 +--
lib/stackdepot.c | 67 +++++++++++------
mm/page_owner.c | 9 ++-
mm/slab.c | 29 +++-----
mm/slab.h | 5 +-
mm/slab_common.c | 23 +++---
mm/slub.c | 174 ++++++++++++++++++++++++++++-----------------
14 files changed, 283 insertions(+), 143 deletions(-)
The pull request you sent on Mon, 23 May 2022 11:54:08 +0200:
> git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git tags/slab-for-5.19
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/2e17ce1106e04a7f3a83796ec623881487f75dd3
Thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
On Wed, May 25, 2022 at 1:56 PM Vlastimil Babka <[email protected]> wrote:
>
> As I understand it's a tradeoff between memory overhead due to hash
> table size and cpu overhead due to length of collision chains.
... but why is that a build-time kernel config question? And when does
it actually end up triggering?
I didn't much care - and clearly small machines like m68k didn't
either - when the whole stackdepot thing was an "odd configuration
issue".
Because it used to be that stack depot users were fairly easy to
figure out. It was all about pretty special debug options that enable
a lot of very special code (KASAN and PAGE_OWNER - is there anything
else?).
Very clearly just debug builds.
But this slab change means that now it's basically *always* enabled as
an option, and that means that now it had better do sane things
because it's enabled by default and not something that can easily be
ignored any more.
The kernel config phase is something I'm very sensitive to, because
honestly, it's just about the worst part of building a kernel.
Everything else is pretty easy. The kernel config is painful.
That means that we should not ask insane questions - and asking what
static size you want for a data structure that 99% of all people have
absolutely _zero_ idea about or interest in is wrong.
So the problem here is two-fold:
(a) it introduces a new compile-time question that isn't sane to ask
a regular user, but is now exposed to regular users.
(b) this by default uses 1MB of memory for a feature that didn't in
the past, so now if you have small machines you need to make sure you
make a special kernel config for them.
And (a) happens unconditionally, while (b) is conditional. And it's
not clear exactly what all triggers the actual allocation in (b).
I _hope_ it's never triggered unless you actively enable slub debug,
but at least from a quick grep it wasn't obvious.
For example, if you happen to have rcutorture enabled, do you get the
allocation then just because rcutorture uses
kcp = kmem_cache_create("rcuscale", 136, 8, SLAB_STORE_USER, NULL);
even if you have nothing else that would enable slub debugging? It
*looked* that way to me from a quick grep, but I could easily have
missed something.
Linus
+Cc Geert and Alexander
On 5/25/22 20:29, Linus Torvalds wrote:
> On Mon, May 23, 2022 at 2:54 AM Vlastimil Babka <[email protected]> wrote:
>>
>> The stackdepot conversion was already attempted last year but
>> reverted by ae14c63a9f20. The memory overhead (while not actually
>> enabled on boot) has been meanwhile solved by making the large
>> stackdepot allocation dynamic.
>
> Why do I still see
>
> +config STACK_HASH_ORDER
> + int "stack depot hash size (12 => 4KB, 20 => 1024KB)"
> + range 12 20
> + default 20
>
> there then?
>
> All that seems to have happened is that it's not a static allocation
> any more, but it's still a big allocation very early at boot by
> default.
>
> The people who complained about this last time were on m68k machines
> iirc, and 1MB there is not insignificant.
My main concern was that configs that enable SLUB_DEBUG (quite common)
shouldn't pay the stackdepot memory overhead unless people actually
enable slub object tracking on boot because they are debugging
something. It's possible I misunderstood Geert's point, though.
> It's not at all clear to me why that allocation should be that kind of
> fixed number, and if it's a fixed number, why it should be the maximum
> one by default. That seems entirely broken.
As I understand it, it's a tradeoff between memory overhead due to hash
table size and CPU overhead due to the length of collision chains.
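
To make that tradeoff concrete, here is a rough, illustrative sketch
(plain userspace C, not kernel code): with a chained hash table, memory
grows linearly with the bucket count, while the average collision chain
length shrinks as buckets increase. The 50000 stored traces below are a
made-up figure, not a measurement of any real workload.

/* Illustrative only: how the hash order trades memory for chain length. */
#include <stdio.h>

int main(void)
{
        const double stored_traces = 50000.0;   /* hypothetical workload */
        unsigned int order;

        for (order = 12; order <= 20; order += 2) {
                unsigned long buckets = 1UL << order;

                /* Table memory is proportional to the bucket count;
                 * expected chain length is traces divided by buckets. */
                printf("order %2u: %7lu buckets, avg chain ~%.2f\n",
                       order, buckets, stored_traces / buckets);
        }
        return 0;
}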
> I've pulled this, but considering that it got reverted once, I'm
> really fed up with this kind of thing. This needs to be fixed.
Right, I'll try to convert stackdepot to rhashtable. If it turns out
infeasible for some reason, we could at least have an "auto" default
that autosizes the table according to how much memory the system has.
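
For the record, an "auto" default could look something like the sketch
below: pick the hash order from the amount of RAM, clamped to the
current Kconfig range of 12..20. This is plain userspace C just to
illustrate the idea; the RAM thresholds and the auto_hash_order() name
are made up, not taken from stackdepot.

/* Sketch of an "auto" hash order scaled by system memory (hypothetical). */
#include <stdio.h>

static unsigned int auto_hash_order(unsigned long ram_mib)
{
        unsigned int order = 12;        /* Kconfig minimum */

        /* One more bit of table for every doubling of RAM above 32 MiB;
         * the 32 MiB base is an arbitrary placeholder. */
        while (order < 20 && ram_mib > (32UL << (order - 12)))
                order++;
        return order;
}

int main(void)
{
        unsigned long sizes_mib[] = { 16, 64, 512, 4096, 65536 };
        unsigned int i;

        for (i = 0; i < sizeof(sizes_mib) / sizeof(sizes_mib[0]); i++)
                printf("%6lu MiB RAM -> order %u\n",
                       sizes_mib[i], auto_hash_order(sizes_mib[i]));
        return 0;
}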
> Because I'm _this_ close to just reverting it again, and saying "No,
> you tried this crap already, didn't learn from the last time, and then
> did the same thing all over again just in a different guise".
>
> Linus
On 5/25/22 23:36, Linus Torvalds wrote:
> Because it used to be that stack depot users were fairly easy to
> figure out. It was all about pretty special debug options that enable
> a lot of very special code (KASAN and PAGE_OWNER - is there anything
> else?).
Some parts of DRM debugging, and it looks like recently also networking
through ref_tracker, which also seems to be a debugging facility.
> Very clearly just debug builds.
>
> But this slab change means that now it's basically *always* enabled as
> an option, and that means that now it had better do sane things
> because it's enabled by default and not something that can easily be
> ignored any more.
>
> The kernel config phase is something I'm very sensitive to, because
> honestly, it's just about the worst part of building a kernel.
> Everything else is pretty easy. The kernel config is painful.
>
> That means that we should not ask insane questions - and asking what
> static size you want for a data structure that 99% of all people have
> absolutely _zero_ idea about or interest in is wrong.
Understood.
> So the problem here is two-fold:
>
> (a) it introduces a new compile-time question that isn't sane to ask
> a regular user, but is now exposed to regular users.
>
> (b) this by default uses 1MB of memory for a feature that didn't in
> the past, so now if you have small machines you need to make sure you
> make a special kernel config for them.
>
> And (a) happens unconditionally, while (b) is conditional. And it's
> not clear exactly what all triggers the actual allocation in (b).
>
> I _hope_ it's never triggered unless you actively enable slub debug,
> but at least from a quick grep it wasn't obvious.
>
> For example, if you happen to have rcutorture enabled, do you get the
> allocation then just because rcutorture uses
>
> kcp = kmem_cache_create("rcuscale", 136, 8, SLAB_STORE_USER, NULL);
>
> even if you have nothing else that would enable slub debugging? It
> *looked* that way to me from a quick grep, but I could easily have
> missed something.
Yes, running rcutorture will trigger that stackdepot allocation, but as
that's in the RCU debugging part of the config, I considered it to be
in the same category as enabling slub debugging.
But yeah, I realize it might become a problem if someone wants to run
rcutorture on m68k :/
> Linus
On Wed, May 25, 2022 at 3:01 PM Vlastimil Babka <[email protected]> wrote:
>
> Yes, running rcutorture will trigger that stackdepot allocation, but as
> that's in RCU debugging part of config, I considered it in the same
> category as enabling slub debugging.
Yeah, I don't think rcutorture is a problem per se, it was more an
example of a random interaction that doesn't actually seem to make
much sense.
As far as I can tell, there is nothing in rcutorture that actually
wants that stack trace; it seems to just be testing that the object
dumping works.
So it was more the oddity and randomness of it that made me go
"Whaah?" There might be others hiding elsewhere; that rcutorture use
just happened to use the flag explicitly.
Linus
On Mon, May 23, 2022 at 2:54 AM Vlastimil Babka <[email protected]> wrote:
>
> The stackdepot conversion was already attempted last year but
> reverted by ae14c63a9f20. The memory overhead (while not actually
> enabled on boot) has been meanwhile solved by making the large
> stackdepot allocation dynamic.
Why do I still see
+config STACK_HASH_ORDER
+ int "stack depot hash size (12 => 4KB, 20 => 1024KB)"
+ range 12 20
+ default 20
there then?
All that seems to have happened is that it's not a static allocation
any more, but it's still a big allocation very early at boot by
default.
The people who complained about this last time were on m68k machines
iirc, and 1MB there is not insignificant.
It's not at all clear to me why that allocation should be that kind of
fixed number, and if it's a fixed number, why it should be the maximum
one by default. That seems entirely broken.
I've pulled this, but considering that it got reverted once, I'm
really fed up with this kind of thing. This needs to be fixed.
Because I'm _this_ close to just reverting it again, and saying "No,
you tried this crap already, didn't learn from the last time, and then
did the same thing all over again just in a different guise".
Linus
On Wed, May 25, 2022 at 04:07:24PM -0700, Linus Torvalds wrote:
> On Wed, May 25, 2022 at 3:01 PM Vlastimil Babka <[email protected]> wrote:
> >
> > Yes, running rcutorture will trigger that stackdepot allocation, but as
> > that's in RCU debugging part of config, I considered it in the same
> > category as enabling slub debugging.
>
> Yeah, I don't think rcutorture is a problem per se, it was more an
> example of a random interaction that doesn't actually seem to make
> much sense.
Creating a cache with SLAB_STORE_USER is (currently) done in rcutorture.
SLAB_STORE_USER means "we're going to track every user of this slab
cache", so IMO it should not be done when not debugging.
Otherwise stackdepot will be used only when the user passes the
slub_debug boot parameter.
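
In other words, the gating is roughly the following (a simplified
sketch, not the actual mm/slub.c code; maybe_init_stack_depot() is a
made-up name). The slub_debug path works because the boot parameter
(e.g. slub_debug=U for user tracking) just adds SLAB_STORE_USER to the
cache flags before this kind of check:

#include <linux/slab.h>
#include <linux/stackdepot.h>

/* Sketch only: initialize stack depot just for caches that track users. */
static int maybe_init_stack_depot(slab_flags_t flags)
{
        if (flags & SLAB_STORE_USER)            /* alloc/free tracking requested */
                return stack_depot_init();      /* allocates the hash table */
        return 0;                               /* no tracking, no hash table */
}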
> As far as I can tell, there is nothing in rcutorture that actually
> wants that stack trace, and it seems to be doing just a test of the
> object dumping working.
Dumping an arbitrary object may call kmem_dump_obj(), which will dump
slab objects and thus print where they were allocated.
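
Something like the sketch below is essentially what that rcutorture use
boils down to (a hedged sketch, assuming the 5.19-era void-returning
kmem_dump_obj(); the "demo" cache and demo_dump() are made-up names):

#include <linux/slab.h>

/* Sketch: allocate from a user-tracking cache and dump the object. */
static void demo_dump(void)
{
        struct kmem_cache *cache;
        void *obj;

        cache = kmem_cache_create("demo", 136, 8, SLAB_STORE_USER, NULL);
        if (!cache)
                return;

        obj = kmem_cache_alloc(cache, GFP_KERNEL);
        if (obj) {
                kmem_dump_obj(obj);     /* prints slab info + allocation site */
                kmem_cache_free(cache, obj);
        }
        kmem_cache_destroy(cache);
}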
>
> So it was more the oddity and randomness of it that made me go
> "Whaah?" There might be others hiding elsewhere, that rcutorture use
> just happened to use the flag explicitly.
>
> Linus
AFAIK only 1) creating a cache with SLAB_STORE_USER or 2) passing
slub_debug makes the kernel allocate the hash table for stackdepot.
This does not look that odd to me.
But I agree that not asking the user for the hash table size, and
instead 1) using rhashtables or 2) deciding the size of the hash
table at runtime, would be better :)