2022-02-26 00:16:37

by Vlastimil Babka

Subject: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

Hi,

this series combines and revives patches from Oliver's bachelor's thesis from
last year (where I was the advisor) that make SLUB's debugfs files
alloc_traces and free_traces more useful.
The resubmission was blocked on stackdepot changes that are now merged,
as explained in patch 2.

Patch 1 is a new preparatory cleanup.

Patch 2, originally submitted here [1], was merged to mainline but reverted
due to stackdepot-related issues, as explained in the patch.

Patches 3-5 were originally submitted as an RFC here [2]. In this submission I
have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
be considered too intrusive, so I will postpone it for later. The docs
patch is adjusted accordingly.

Also available in git, based on v5.17-rc1:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1

I'd like to ask for some review before I add this to the slab tree.

[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/

Oliver Glitta (4):
mm/slub: use stackdepot to save stack trace in objects
mm/slub: aggregate and print stack traces in debugfs files
mm/slub: sort debugfs output by frequency of stack traces
slab, documentation: add description of debugfs files for SLUB caches

Vlastimil Babka (1):
mm/slub: move struct track init out of set_track()

Documentation/vm/slub.rst | 61 +++++++++++++++
init/Kconfig | 1 +
mm/slub.c | 152 +++++++++++++++++++++++++-------------
3 files changed, 162 insertions(+), 52 deletions(-)

--
2.35.1


2022-02-26 00:16:37

by Vlastimil Babka

Subject: [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects

From: Oliver Glitta <[email protected]>

Many stack traces are similar, so there are many similar arrays.
Stackdepot saves each unique stack only once.

Replace the field addrs in struct track with a depot_stack_handle_t handle.
Use stackdepot to save the stack trace.

The benefits are smaller memory overhead and the possibility to aggregate
per-cache statistics in the following patch using the stackdepot handle
instead of matching stacks manually.
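
To make the aggregation point concrete, here is an illustrative sketch (not
part of the patch; the helper name is hypothetical, assuming the struct track
layout introduced below). Because stackdepot deduplicates saved stacks, two
track records captured the same stack exactly when their handles are equal,
so aggregation becomes a single integer comparison instead of comparing whole
addrs[] arrays:

	/* hypothetical helper, for illustration only */
	static bool tracks_have_same_stack(const struct track *a,
					   const struct track *b)
	{
		/* a zero handle means no stack was saved for that track */
		return a->handle && a->handle == b->handle;
	}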

[ [email protected]: rebase to 5.17-rc1 and adjust accordingly ]

This was initially merged as commit 788691464c29 and reverted by commit
ae14c63a9f20 due to several issues that should now be fixed.
The problem of unconditional memory overhead by stackdepot has been
addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
and stack_table allocation by kvmalloc()"), so the dependency on
stackdepot will result in extra memory usage only when slab cache
tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
The build failures on some architectures were also addressed, and the
reported issue with the xfs/433 test did not reproduce on 5.17-rc1 with
this patch.

Signed-off-by: Oliver Glitta <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Joonsoo Kim <[email protected]>
---
init/Kconfig | 1 +
mm/slub.c | 88 +++++++++++++++++++++++++++++-----------------------
2 files changed, 50 insertions(+), 39 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index e9119bf54b1f..b21dd3a4a106 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1871,6 +1871,7 @@ config SLUB_DEBUG
default y
bool "Enable SLUB debugging support" if EXPERT
depends on SLUB && SYSFS
+ select STACKDEPOT if STACKTRACE_SUPPORT
help
SLUB has extensive debug support features. Disabling these can
result in significant savings in code size. This also disables
diff --git a/mm/slub.c b/mm/slub.c
index 1fc451f4fe62..3140f763e819 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -26,6 +26,7 @@
#include <linux/cpuset.h>
#include <linux/mempolicy.h>
#include <linux/ctype.h>
+#include <linux/stackdepot.h>
#include <linux/debugobjects.h>
#include <linux/kallsyms.h>
#include <linux/kfence.h>
@@ -264,8 +265,8 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
#define TRACK_ADDRS_COUNT 16
struct track {
unsigned long addr; /* Called from address */
-#ifdef CONFIG_STACKTRACE
- unsigned long addrs[TRACK_ADDRS_COUNT]; /* Called from address */
+#ifdef CONFIG_STACKDEPOT
+ depot_stack_handle_t handle;
#endif
int cpu; /* Was running on cpu */
int pid; /* Pid context */
@@ -724,22 +725,20 @@ static struct track *get_track(struct kmem_cache *s, void *object,
return kasan_reset_tag(p + alloc);
}

-static void set_track(struct kmem_cache *s, void *object,
- enum track_item alloc, unsigned long addr)
+static noinline void
+set_track(struct kmem_cache *s, void *object, enum track_item alloc,
+ unsigned long addr, gfp_t flags)
{
struct track *p = get_track(s, object, alloc);

-#ifdef CONFIG_STACKTRACE
+#ifdef CONFIG_STACKDEPOT
+ unsigned long entries[TRACK_ADDRS_COUNT];
unsigned int nr_entries;

- metadata_access_enable();
- nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
- TRACK_ADDRS_COUNT, 3);
- metadata_access_disable();
-
- if (nr_entries < TRACK_ADDRS_COUNT)
- p->addrs[nr_entries] = 0;
+ nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
+ p->handle = stack_depot_save(entries, nr_entries, flags);
#endif
+
p->addr = addr;
p->cpu = smp_processor_id();
p->pid = current->pid;
@@ -759,20 +758,19 @@ static void init_tracking(struct kmem_cache *s, void *object)

static void print_track(const char *s, struct track *t, unsigned long pr_time)
{
+ depot_stack_handle_t handle __maybe_unused;
+
if (!t->addr)
return;

pr_err("%s in %pS age=%lu cpu=%u pid=%d\n",
s, (void *)t->addr, pr_time - t->when, t->cpu, t->pid);
-#ifdef CONFIG_STACKTRACE
- {
- int i;
- for (i = 0; i < TRACK_ADDRS_COUNT; i++)
- if (t->addrs[i])
- pr_err("\t%pS\n", (void *)t->addrs[i]);
- else
- break;
- }
+#ifdef CONFIG_STACKDEPOT
+ handle = READ_ONCE(t->handle);
+ if (handle)
+ stack_depot_print(handle);
+ else
+ pr_err("object allocation/free stack trace missing\n");
#endif
}

@@ -1304,9 +1302,9 @@ static inline int alloc_consistency_checks(struct kmem_cache *s,
return 1;
}

-static noinline int alloc_debug_processing(struct kmem_cache *s,
- struct slab *slab,
- void *object, unsigned long addr)
+static noinline int
+alloc_debug_processing(struct kmem_cache *s, struct slab *slab, void *object,
+ unsigned long addr, gfp_t flags)
{
if (s->flags & SLAB_CONSISTENCY_CHECKS) {
if (!alloc_consistency_checks(s, slab, object))
@@ -1315,7 +1313,7 @@ static noinline int alloc_debug_processing(struct kmem_cache *s,

/* Success perform special debug activities for allocs */
if (s->flags & SLAB_STORE_USER)
- set_track(s, object, TRACK_ALLOC, addr);
+ set_track(s, object, TRACK_ALLOC, addr, flags);
trace(s, slab, object, 1);
init_object(s, object, SLUB_RED_ACTIVE);
return 1;
@@ -1395,7 +1393,7 @@ static noinline int free_debug_processing(
}

if (s->flags & SLAB_STORE_USER)
- set_track(s, object, TRACK_FREE, addr);
+ set_track(s, object, TRACK_FREE, addr, GFP_NOWAIT);
trace(s, slab, object, 0);
/* Freepointer not overwritten by init_object(), SLAB_POISON moved it */
init_object(s, object, SLUB_RED_INACTIVE);
@@ -1632,7 +1630,8 @@ static inline
void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}

static inline int alloc_debug_processing(struct kmem_cache *s,
- struct slab *slab, void *object, unsigned long addr) { return 0; }
+ struct slab *slab, void *object, unsigned long addr,
+ gfp_t flags) { return 0; }

static inline int free_debug_processing(
struct kmem_cache *s, struct slab *slab,
@@ -3033,7 +3032,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
check_new_slab:

if (kmem_cache_debug(s)) {
- if (!alloc_debug_processing(s, slab, freelist, addr)) {
+ if (!alloc_debug_processing(s, slab, freelist, addr, gfpflags)) {
/* Slab failed checks. Next slab needed */
goto new_slab;
} else {
@@ -4221,6 +4220,9 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
s->remote_node_defrag_ratio = 1000;
#endif

+ if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
+ stack_depot_init();
+
/* Initialize the pre-computed randomized freelist if slab is up */
if (slab_state >= UP) {
if (init_cache_random_seq(s))
@@ -4352,18 +4354,26 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
objp = fixup_red_left(s, objp);
trackp = get_track(s, objp, TRACK_ALLOC);
kpp->kp_ret = (void *)trackp->addr;
-#ifdef CONFIG_STACKTRACE
- for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
- kpp->kp_stack[i] = (void *)trackp->addrs[i];
- if (!kpp->kp_stack[i])
- break;
- }
+#ifdef CONFIG_STACKDEPOT
+ {
+ depot_stack_handle_t handle;
+ unsigned long *entries;
+ unsigned int nr_entries;
+
+ handle = READ_ONCE(trackp->handle);
+ if (handle) {
+ nr_entries = stack_depot_fetch(handle, &entries);
+ for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
+ kpp->kp_stack[i] = (void *)entries[i];
+ }

- trackp = get_track(s, objp, TRACK_FREE);
- for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
- kpp->kp_free_stack[i] = (void *)trackp->addrs[i];
- if (!kpp->kp_free_stack[i])
- break;
+ trackp = get_track(s, objp, TRACK_FREE);
+ handle = READ_ONCE(trackp->handle);
+ if (handle) {
+ nr_entries = stack_depot_fetch(handle, &entries);
+ for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
+ kpp->kp_free_stack[i] = (void *)entries[i];
+ }
}
#endif
#endif
--
2.35.1

2022-02-26 01:03:35

by Vlastimil Babka

Subject: [PATCH 1/5] mm/slub: move struct track init out of set_track()

set_track() either zeroes out the struct track or fills it, depending on
the addr parameter. This is unnecessary as there's only one place that
calls it for the initialization - init_tracking(). We can simply do the
zeroing there, with a single memset() that covers both TRACK_ALLOC and
TRACK_FREE as they are adjacent.

Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/slub.c | 32 +++++++++++++++-----------------
1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 261474092e43..1fc451f4fe62 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -729,34 +729,32 @@ static void set_track(struct kmem_cache *s, void *object,
{
struct track *p = get_track(s, object, alloc);

- if (addr) {
#ifdef CONFIG_STACKTRACE
- unsigned int nr_entries;
+ unsigned int nr_entries;

- metadata_access_enable();
- nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
- TRACK_ADDRS_COUNT, 3);
- metadata_access_disable();
+ metadata_access_enable();
+ nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
+ TRACK_ADDRS_COUNT, 3);
+ metadata_access_disable();

- if (nr_entries < TRACK_ADDRS_COUNT)
- p->addrs[nr_entries] = 0;
+ if (nr_entries < TRACK_ADDRS_COUNT)
+ p->addrs[nr_entries] = 0;
#endif
- p->addr = addr;
- p->cpu = smp_processor_id();
- p->pid = current->pid;
- p->when = jiffies;
- } else {
- memset(p, 0, sizeof(struct track));
- }
+ p->addr = addr;
+ p->cpu = smp_processor_id();
+ p->pid = current->pid;
+ p->when = jiffies;
}

static void init_tracking(struct kmem_cache *s, void *object)
{
+ struct track *p;
+
if (!(s->flags & SLAB_STORE_USER))
return;

- set_track(s, object, TRACK_FREE, 0UL);
- set_track(s, object, TRACK_ALLOC, 0UL);
+ p = get_track(s, object, TRACK_ALLOC);
+ memset(p, 0, 2*sizeof(struct track));
}

static void print_track(const char *s, struct track *t, unsigned long pr_time)
--
2.35.1

2022-02-26 01:58:38

by Vlastimil Babka

Subject: [PATCH 4/5] mm/slub: sort debugfs output by frequency of stack traces

From: Oliver Glitta <[email protected]>

Sort the output of debugfs alloc_traces and free_traces by the frequency
of allocation/freeing stack traces. The most frequently used stack traces
will be printed first, e.g. for easier memory leak debugging.

Signed-off-by: Oliver Glitta <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/slub.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index 06599db4faa3..a74afe59a403 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -38,6 +38,7 @@
#include <linux/memcontrol.h>
#include <linux/random.h>
#include <kunit/test.h>
+#include <linux/sort.h>

#include <linux/debugfs.h>
#include <trace/events/kmem.h>
@@ -6150,6 +6151,17 @@ static void *slab_debugfs_next(struct seq_file *seq, void *v, loff_t *ppos)
return NULL;
}

+static int cmp_loc_by_count(const void *a, const void *b, const void *data)
+{
+ struct location *loc1 = (struct location *)a;
+ struct location *loc2 = (struct location *)b;
+
+ if (loc1->count > loc2->count)
+ return -1;
+ else
+ return 1;
+}
+
static void *slab_debugfs_start(struct seq_file *seq, loff_t *ppos)
{
struct loc_track *t = seq->private;
@@ -6211,6 +6223,10 @@ static int slab_debug_trace_open(struct inode *inode, struct file *filep)
spin_unlock_irqrestore(&n->list_lock, flags);
}

+ /* Sort locations by count */
+ sort_r(t->loc, t->count, sizeof(struct location),
+ cmp_loc_by_count, NULL, NULL);
+
bitmap_free(obj_map);
return 0;
}
--
2.35.1

2022-02-26 02:10:12

by Vlastimil Babka

Subject: [PATCH 5/5] slab, documentation: add description of debugfs files for SLUB caches

From: Oliver Glitta <[email protected]>

Add description of debugfs files alloc_traces and free_traces
to SLUB cache documentation.

[ [email protected]: some rewording ]

Signed-off-by: Oliver Glitta <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: [email protected]
---
Documentation/vm/slub.rst | 61 +++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)

diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index d3028554b1e9..2b2b931e59fc 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -384,5 +384,66 @@ c) Execute ``slabinfo-gnuplot.sh`` in '-t' mode, passing all of the
40,60`` range will plot only samples collected between 40th and
60th seconds).

+
+DebugFS files for SLUB
+======================
+
+For more information about current state of SLUB caches with the user tracking
+debug option enabled, debugfs files are available, typically under
+/sys/kernel/debug/slab/<cache>/ (created only for caches with enabled user
+tracking). There are 2 types of these files with the following debug
+information:
+
+1. alloc_traces::
+
+ Prints information about unique allocation traces of the currently
+ allocated objects. The output is sorted by frequency of each trace.
+
+ Information in the output:
+ Number of objects, allocating function, minimal/average/maximal jiffies since alloc,
+ pid range of the allocating processes, cpu mask of allocating cpus, and stack trace.
+
+ Example:::
+
+ 1085 populate_error_injection_list+0x97/0x110 age=166678/166680/166682 pid=1 cpus=1::
+ __slab_alloc+0x6d/0x90
+ kmem_cache_alloc_trace+0x2eb/0x300
+ populate_error_injection_list+0x97/0x110
+ init_error_injection+0x1b/0x71
+ do_one_initcall+0x5f/0x2d0
+ kernel_init_freeable+0x26f/0x2d7
+ kernel_init+0xe/0x118
+ ret_from_fork+0x22/0x30
+
+
+2. free_traces::
+
+ Prints information about unique free traces of the currently free objects,
+ sorted by their frequency.
+
+ Information in the output:
+ Number of objects, freeing function, minimal/average/maximal jiffies since free,
+ pid range of the freeing processes, cpu mask of freeing cpus, and stack trace.
+
+ Example:::
+
+ 51 acpi_ut_update_ref_count+0x6a6/0x782 age=236886/237027/237772 pid=1 cpus=1
+ kfree+0x2db/0x420
+ acpi_ut_update_ref_count+0x6a6/0x782
+ acpi_ut_update_object_reference+0x1ad/0x234
+ acpi_ut_remove_reference+0x7d/0x84
+ acpi_rs_get_prt_method_data+0x97/0xd6
+ acpi_get_irq_routing_table+0x82/0xc4
+ acpi_pci_irq_find_prt_entry+0x8e/0x2e0
+ acpi_pci_irq_lookup+0x3a/0x1e0
+ acpi_pci_irq_enable+0x77/0x240
+ pcibios_enable_device+0x39/0x40
+ do_pci_enable_device.part.0+0x5d/0xe0
+ pci_enable_device_flags+0xfc/0x120
+ pci_enable_device+0x13/0x20
+ virtio_pci_probe+0x9e/0x170
+ local_pci_probe+0x48/0x80
+ pci_device_probe+0x105/0x1c0
+
Christoph Lameter, May 30, 2007
Sergey Senozhatsky, October 23, 2015
--
2.35.1

2022-02-26 07:27:37

by Hyeonggon Yoo

Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> Hi,
>
> this series combines and revives patches from Oliver's last year
> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> files alloc_traces and free_traces more useful.
> The resubmission was blocked on stackdepot changes that are now merged,
> as explained in patch 2.
>

Hello. I just started reviewing/testing this series.

It crashed on my system (arm64).

I ran with the boot parameter slub_debug=U, and without KASAN,
so CONFIG_STACKDEPOT_ALWAYS_INIT=n.

void * __init memblock_alloc_try_nid(
phys_addr_t size, phys_addr_t align,
phys_addr_t min_addr, phys_addr_t max_addr,
int nid)
{
void *ptr;

memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
__func__, (u64)size, (u64)align, nid, &min_addr,
&max_addr, (void *)_RET_IP_);
ptr = memblock_alloc_internal(size, align,
min_addr, max_addr, nid, false);
if (ptr)
memset(ptr, 0, size); <--- Crash Here

return ptr;
}

It crashed during create_boot_cache() -> stack_depot_init() ->
memblock_alloc().

I think that's because, in kmem_cache_init(), neither the slab allocator nor
memblock is available. (AFAIU memblock is not available after mem_init()
because of memblock_free_all(), right?)
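
For reference, the allocation choice in stack_depot_init() after commit
2dba5eb1c73b is roughly the following (simplified sketch of lib/stackdepot.c;
the same branches are visible in the stackdepot patch later in this thread):

	if (slab_is_available()) {
		/* still false while kmem_cache_init() creates the boot caches */
		stack_table = kvmalloc(size, GFP_KERNEL);
	} else {
		/* memblock has already been released by memblock_free_all() */
		stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
	}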

Thanks!

/*
* Set up kernel memory allocators
*/
static void __init mm_init(void)
{
/*
* page_ext requires contiguous pages,
* bigger than MAX_ORDER unless SPARSEMEM.
*/
page_ext_init_flatmem();
init_mem_debugging_and_hardening();
kfence_alloc_pool();
report_meminit();
stack_depot_early_init();
mem_init();
mem_init_print_info();
kmem_cache_init();
/*
* page_owner must be initialized after buddy is ready, and also after
* slab is ready so that stack_depot_init() works properly
*/

> Patch 1 is a new preparatory cleanup.
>
> Patch 2 originally submitted here [1], was merged to mainline but
> reverted for stackdepot related issues as explained in the patch.
>
> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> be considered too intrusive so I will postpone it for later. The docs
> patch is adjusted accordingly.
>
> Also available in git, based on v5.17-rc1:
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
>
> I'd like to ask for some review before I add this to the slab tree.
>
> [1] https://lore.kernel.org/all/[email protected]/
> [2] https://lore.kernel.org/all/[email protected]/
>
> Oliver Glitta (4):
> mm/slub: use stackdepot to save stack trace in objects
> mm/slub: aggregate and print stack traces in debugfs files
> mm/slub: sort debugfs output by frequency of stack traces
> slab, documentation: add description of debugfs files for SLUB caches
>
> Vlastimil Babka (1):
> mm/slub: move struct track init out of set_track()
>
> Documentation/vm/slub.rst | 61 +++++++++++++++
> init/Kconfig | 1 +
> mm/slub.c | 152 +++++++++++++++++++++++++-------------
> 3 files changed, 162 insertions(+), 52 deletions(-)
>
> --
> 2.35.1
>
>

--
Thank you, You are awesome!
Hyeonggon :-)

2022-02-26 10:49:05

by Hyeonggon Yoo

Subject: Re: [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects

On Fri, Feb 25, 2022 at 07:03:15PM +0100, Vlastimil Babka wrote:
> From: Oliver Glitta <[email protected]>
>
> Many stack traces are similar so there are many similar arrays.
> Stackdepot saves each unique stack only once.
>
> Replace field addrs in struct track with depot_stack_handle_t handle. Use
> stackdepot to save stack trace.
>
> The benefits are smaller memory overhead and possibility to aggregate
> per-cache statistics in the following patch using the stackdepot handle
> instead of matching stacks manually.
>
> [ [email protected]: rebase to 5.17-rc1 and adjust accordingly ]
>
> This was initially merged as commit 788691464c29 and reverted by commit
> ae14c63a9f20 due to several issues, that should now be fixed.
> The problem of unconditional memory overhead by stackdepot has been
> addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
> and stack_table allocation by kvmalloc()"), so the dependency on
> stackdepot will result in extra memory usage only when a slab cache
> tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
> The build failures on some architectures were also addressed, and the
> reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this
> patch.
>
> Signed-off-by: Oliver Glitta <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Pekka Enberg <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> ---
> init/Kconfig | 1 +
> mm/slub.c | 88 +++++++++++++++++++++++++++++-----------------------
> 2 files changed, 50 insertions(+), 39 deletions(-)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index e9119bf54b1f..b21dd3a4a106 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1871,6 +1871,7 @@ config SLUB_DEBUG
> default y
> bool "Enable SLUB debugging support" if EXPERT
> depends on SLUB && SYSFS
> + select STACKDEPOT if STACKTRACE_SUPPORT
> help
> SLUB has extensive debug support features. Disabling these can
> result in significant savings in code size. This also disables
> diff --git a/mm/slub.c b/mm/slub.c
> index 1fc451f4fe62..3140f763e819 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -26,6 +26,7 @@
> #include <linux/cpuset.h>
> #include <linux/mempolicy.h>
> #include <linux/ctype.h>
> +#include <linux/stackdepot.h>
> #include <linux/debugobjects.h>
> #include <linux/kallsyms.h>
> #include <linux/kfence.h>
> @@ -264,8 +265,8 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
> #define TRACK_ADDRS_COUNT 16
> struct track {
> unsigned long addr; /* Called from address */
> -#ifdef CONFIG_STACKTRACE
> - unsigned long addrs[TRACK_ADDRS_COUNT]; /* Called from address */
> +#ifdef CONFIG_STACKDEPOT
> + depot_stack_handle_t handle;
> #endif
> int cpu; /* Was running on cpu */
> int pid; /* Pid context */
> @@ -724,22 +725,20 @@ static struct track *get_track(struct kmem_cache *s, void *object,
> return kasan_reset_tag(p + alloc);
> }
>
> -static void set_track(struct kmem_cache *s, void *object,
> - enum track_item alloc, unsigned long addr)
> +static noinline void
> +set_track(struct kmem_cache *s, void *object, enum track_item alloc,
> + unsigned long addr, gfp_t flags)
> {
> struct track *p = get_track(s, object, alloc);
>
> -#ifdef CONFIG_STACKTRACE
> +#ifdef CONFIG_STACKDEPOT
> + unsigned long entries[TRACK_ADDRS_COUNT];
> unsigned int nr_entries;
>
> - metadata_access_enable();
> - nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
> - TRACK_ADDRS_COUNT, 3);
> - metadata_access_disable();
> -
> - if (nr_entries < TRACK_ADDRS_COUNT)
> - p->addrs[nr_entries] = 0;
> + nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
> + p->handle = stack_depot_save(entries, nr_entries, flags);
> #endif
> +
> p->addr = addr;
> p->cpu = smp_processor_id();
> p->pid = current->pid;
> @@ -759,20 +758,19 @@ static void init_tracking(struct kmem_cache *s, void *object)
>
> static void print_track(const char *s, struct track *t, unsigned long pr_time)
> {
> + depot_stack_handle_t handle __maybe_unused;
> +
> if (!t->addr)
> return;
>
> pr_err("%s in %pS age=%lu cpu=%u pid=%d\n",
> s, (void *)t->addr, pr_time - t->when, t->cpu, t->pid);
> -#ifdef CONFIG_STACKTRACE
> - {
> - int i;
> - for (i = 0; i < TRACK_ADDRS_COUNT; i++)
> - if (t->addrs[i])
> - pr_err("\t%pS\n", (void *)t->addrs[i]);
> - else
> - break;
> - }
> +#ifdef CONFIG_STACKDEPOT
> + handle = READ_ONCE(t->handle);
> + if (handle)
> + stack_depot_print(handle);
> + else
> + pr_err("object allocation/free stack trace missing\n");
> #endif
> }
>
> @@ -1304,9 +1302,9 @@ static inline int alloc_consistency_checks(struct kmem_cache *s,
> return 1;
> }
>
> -static noinline int alloc_debug_processing(struct kmem_cache *s,
> - struct slab *slab,
> - void *object, unsigned long addr)
> +static noinline int
> +alloc_debug_processing(struct kmem_cache *s, struct slab *slab, void *object,
> + unsigned long addr, gfp_t flags)
> {
> if (s->flags & SLAB_CONSISTENCY_CHECKS) {
> if (!alloc_consistency_checks(s, slab, object))
> @@ -1315,7 +1313,7 @@ static noinline int alloc_debug_processing(struct kmem_cache *s,
>
> /* Success perform special debug activities for allocs */
> if (s->flags & SLAB_STORE_USER)
> - set_track(s, object, TRACK_ALLOC, addr);
> + set_track(s, object, TRACK_ALLOC, addr, flags);

I see a warning because of this. We should not reuse the caller's flags here,
because alloc_debug_processing() can be called with preemption disabled while
the caller specified GFP_KERNEL.

[ 2.015902] BUG: sleeping function called from invalid context at mm/page_alloc.c:5164
[ 2.022052] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
[ 2.028357] preempt_count: 1, expected: 0
[ 2.031508] RCU nest depth: 0, expected: 0
[ 2.034722] 1 lock held by swapper/0/1:
[ 2.037905] #0: ffff00000488f4d0 (&sb->s_type->i_mutex_key#5){+.+.}-{4:4}, at: start_creating+0x58/0x130
[ 2.045393] Preemption disabled at:
[ 2.045400] [<ffff8000083bd008>] __slab_alloc.constprop.0+0x38/0xc0
[ 2.053039] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.17.0-rc5+ #105
[ 2.059365] Hardware name: linux,dummy-virt (DT)
[ 2.063160] Call trace:
[ 2.065217] dump_backtrace+0xf8/0x130
[ 2.068350] show_stack+0x24/0x80
[ 2.071104] dump_stack_lvl+0x9c/0xd8
[ 2.074140] dump_stack+0x18/0x34
[ 2.076894] __might_resched+0x1a0/0x280
[ 2.080146] __might_sleep+0x58/0x90
[ 2.083108] prepare_alloc_pages.constprop.0+0x1b4/0x1f0
[ 2.087468] __alloc_pages+0x88/0x1e0
[ 2.090502] alloc_page_interleave+0x24/0xb4
[ 2.094021] alloc_pages+0x10c/0x170
[ 2.096984] __stack_depot_save+0x3e0/0x4e0
[ 2.100446] stack_depot_save+0x14/0x20
[ 2.103617] set_track.isra.0+0x64/0xa4
[ 2.106787] alloc_debug_processing+0x11c/0x1e0
[ 2.110532] ___slab_alloc+0x3e8/0x750
[ 2.113643] __slab_alloc.constprop.0+0x64/0xc0
[ 2.117391] kmem_cache_alloc+0x304/0x350
[ 2.120702] security_inode_alloc+0x38/0xa4
[ 2.124169] inode_init_always+0xd0/0x264
[ 2.127501] alloc_inode+0x44/0xec
[ 2.130325] new_inode+0x28/0xc0
[ 2.133011] tracefs_create_file+0x74/0x1e0
[ 2.136459] init_tracer_tracefs+0x248/0x644
[ 2.140030] tracer_init_tracefs+0x9c/0x34c
[ 2.143483] do_one_initcall+0x44/0x170
[ 2.146654] do_initcalls+0x104/0x144
[ 2.149704] kernel_init_freeable+0x130/0x178

[...]
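
To illustrate why the check fires (a rough sketch using the generic gfp
helpers, not a proposed fix): GFP_KERNEL allows blocking, so the page
allocator asserts that it may sleep, but __slab_alloc() has already disabled
preemption on this path:

	/* roughly what might_alloc() checks in the page allocator */
	might_sleep_if(gfpflags_allow_blocking(gfp_mask)); /* true for GFP_KERNEL */
	/* ...and preempt_count is 1 here, hence the splat above */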

> trace(s, slab, object, 1);
> init_object(s, object, SLUB_RED_ACTIVE);
> return 1;
> @@ -1395,7 +1393,7 @@ static noinline int free_debug_processing(
> }
>
> if (s->flags & SLAB_STORE_USER)
> - set_track(s, object, TRACK_FREE, addr);
> + set_track(s, object, TRACK_FREE, addr, GFP_NOWAIT);
> trace(s, slab, object, 0);
> /* Freepointer not overwritten by init_object(), SLAB_POISON moved it */
> init_object(s, object, SLUB_RED_INACTIVE);
> @@ -1632,7 +1630,8 @@ static inline
> void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}
>
> static inline int alloc_debug_processing(struct kmem_cache *s,
> - struct slab *slab, void *object, unsigned long addr) { return 0; }
> + struct slab *slab, void *object, unsigned long addr,
> + gfp_t flags) { return 0; }
>
> static inline int free_debug_processing(
> struct kmem_cache *s, struct slab *slab,
> @@ -3033,7 +3032,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> check_new_slab:
>
> if (kmem_cache_debug(s)) {
> - if (!alloc_debug_processing(s, slab, freelist, addr)) {
> + if (!alloc_debug_processing(s, slab, freelist, addr, gfpflags)) {
> /* Slab failed checks. Next slab needed */
> goto new_slab;
> } else {
> @@ -4221,6 +4220,9 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
> s->remote_node_defrag_ratio = 1000;
> #endif
>
> + if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> + stack_depot_init();
> +

As mentioned in my report, it can crash the system when creating boot caches
with debugging enabled.

The rest looks fine!

> /* Initialize the pre-computed randomized freelist if slab is up */
> if (slab_state >= UP) {
> if (init_cache_random_seq(s))
> @@ -4352,18 +4354,26 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
> objp = fixup_red_left(s, objp);
> trackp = get_track(s, objp, TRACK_ALLOC);
> kpp->kp_ret = (void *)trackp->addr;
> -#ifdef CONFIG_STACKTRACE
> - for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
> - kpp->kp_stack[i] = (void *)trackp->addrs[i];
> - if (!kpp->kp_stack[i])
> - break;
> - }
> +#ifdef CONFIG_STACKDEPOT
> + {
> + depot_stack_handle_t handle;
> + unsigned long *entries;
> + unsigned int nr_entries;
> +
> + handle = READ_ONCE(trackp->handle);
> + if (handle) {
> + nr_entries = stack_depot_fetch(handle, &entries);
> + for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
> + kpp->kp_stack[i] = (void *)entries[i];
> + }
>
> - trackp = get_track(s, objp, TRACK_FREE);
> - for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
> - kpp->kp_free_stack[i] = (void *)trackp->addrs[i];
> - if (!kpp->kp_free_stack[i])
> - break;
> + trackp = get_track(s, objp, TRACK_FREE);
> + handle = READ_ONCE(trackp->handle);
> + if (handle) {
> + nr_entries = stack_depot_fetch(handle, &entries);
> + for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
> + kpp->kp_free_stack[i] = (void *)entries[i];
> + }
> }
> #endif
> #endif
> --
> 2.35.1
>
>

--
Thank you, You are awesome!
Hyeonggon :-)

2022-02-26 12:44:14

by Hyeonggon Yoo

Subject: Re: [PATCH 1/5] mm/slub: move struct track init out of set_track()

On Fri, Feb 25, 2022 at 07:03:14PM +0100, Vlastimil Babka wrote:
> set_track() either zeroes out the struct track or fills it, depending on
> the addr parameter. This is unnecessary as there's only one place that
> calls it for the initialization - init_tracking(). We can simply do the
> zeroing there, with a single memset() that covers both TRACK_ALLOC and
> TRACK_FREE as they are adjacent.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/slub.c | 32 +++++++++++++++-----------------
> 1 file changed, 15 insertions(+), 17 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 261474092e43..1fc451f4fe62 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -729,34 +729,32 @@ static void set_track(struct kmem_cache *s, void *object,
> {
> struct track *p = get_track(s, object, alloc);
>
> - if (addr) {
> #ifdef CONFIG_STACKTRACE
> - unsigned int nr_entries;
> + unsigned int nr_entries;
>
> - metadata_access_enable();
> - nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
> - TRACK_ADDRS_COUNT, 3);
> - metadata_access_disable();
> + metadata_access_enable();
> + nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
> + TRACK_ADDRS_COUNT, 3);
> + metadata_access_disable();
>
> - if (nr_entries < TRACK_ADDRS_COUNT)
> - p->addrs[nr_entries] = 0;
> + if (nr_entries < TRACK_ADDRS_COUNT)
> + p->addrs[nr_entries] = 0;
> #endif
> - p->addr = addr;
> - p->cpu = smp_processor_id();
> - p->pid = current->pid;
> - p->when = jiffies;
> - } else {
> - memset(p, 0, sizeof(struct track));
> - }
> + p->addr = addr;
> + p->cpu = smp_processor_id();
> + p->pid = current->pid;
> + p->when = jiffies;
> }
>
> static void init_tracking(struct kmem_cache *s, void *object)
> {
> + struct track *p;
> +
> if (!(s->flags & SLAB_STORE_USER))
> return;
>
> - set_track(s, object, TRACK_FREE, 0UL);
> - set_track(s, object, TRACK_ALLOC, 0UL);
> + p = get_track(s, object, TRACK_ALLOC);
> + memset(p, 0, 2*sizeof(struct track));
> }
>

Looks good.
Reviewed-by: Hyeonggon Yoo <[email protected]>

And works nicely.
Tested-by: Hyeonggon Yoo <[email protected]>

> static void print_track(const char *s, struct track *t, unsigned long pr_time)
> --
> 2.35.1
>
>

--
Thank you, You are awesome!
Hyeonggon :-)

2022-02-26 14:19:55

by Hyeonggon Yoo

Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> Hi,
>
> this series combines and revives patches from Oliver's last year
> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> files alloc_traces and free_traces more useful.
> The resubmission was blocked on stackdepot changes that are now merged,
> as explained in patch 2.
>
> Patch 1 is a new preparatory cleanup.
>
> Patch 2 originally submitted here [1], was merged to mainline but
> reverted for stackdepot related issues as explained in the patch.
>
> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> be considered too intrusive so I will postpone it for later. The docs
> patch is adjusted accordingly.
>

This problem is not caused by this patch series.
But I think it's worth mentioning...

It's really weird that some stack traces are not recorded
when CONFIG_KASAN=y.

I made sure that:
- Stack Depot did not reach its limit
- the free path happens on CONFIG_KASAN=y too.

I have no clue why this happens.

# cat dentry/free_traces (CONFIG_KASAN=y)
6585 <not-available> age=4294912647 pid=0 cpus=0

# cat dentry/free_traces (CONFIG_KASAN=n)
1246 <not-available> age=4294906877 pid=0 cpus=0
379 __d_free+0x20/0x2c age=33/14225/14353 pid=0-122 cpus=0-3
kmem_cache_free+0x1f4/0x21c
__d_free+0x20/0x2c
rcu_core+0x334/0x580
rcu_core_si+0x14/0x20
__do_softirq+0x12c/0x2a8

2 dentry_free+0x58/0xb0 age=14101/14101/14101 pid=158 cpus=0
kmem_cache_free+0x1f4/0x21c
dentry_free+0x58/0xb0
__dentry_kill+0x18c/0x1d0
dput+0x1c4/0x2fc
__fput+0xb0/0x230
____fput+0x14/0x20
task_work_run+0x84/0x17c
do_notify_resume+0x208/0x1330
el0_svc+0x6c/0x80
el0t_64_sync_handler+0xa8/0x130
el0t_64_sync+0x1a0/0x1a4

1 dentry_free+0x58/0xb0 age=7678 pid=190 cpus=1
kmem_cache_free+0x1f4/0x21c
dentry_free+0x58/0xb0
__dentry_kill+0x18c/0x1d0
dput+0x1c4/0x2fc
__fput+0xb0/0x230
____fput+0x14/0x20
task_work_run+0x84/0x17c
do_exit+0x2dc/0x8e0
do_group_exit+0x38/0xa4
__wake_up_parent+0x0/0x34
invoke_syscall+0x48/0x114
el0_svc_common.constprop.0+0x44/0xfc
do_el0_svc+0x2c/0x94
el0_svc+0x28/0x80
el0t_64_sync_handler+0xa8/0x130
el0t_64_sync+0x1a0/0x1a4
--
Thank you, You are awesome!
Hyeonggon :-)

2022-02-26 20:16:26

by Hyeonggon Yoo

Subject: Re: [PATCH 4/5] mm/slub: sort debugfs output by frequency of stack traces

On Fri, Feb 25, 2022 at 07:03:17PM +0100, Vlastimil Babka wrote:
> From: Oliver Glitta <[email protected]>
>
> Sort the output of debugfs alloc_traces and free_traces by the frequency
> of allocation/freeing stack traces. Most frequently used stack traces
> will be printed first, e.g. for easier memory leak debugging.
>
> Signed-off-by: Oliver Glitta <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/slub.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 06599db4faa3..a74afe59a403 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -38,6 +38,7 @@
> #include <linux/memcontrol.h>
> #include <linux/random.h>
> #include <kunit/test.h>
> +#include <linux/sort.h>
>
> #include <linux/debugfs.h>
> #include <trace/events/kmem.h>
> @@ -6150,6 +6151,17 @@ static void *slab_debugfs_next(struct seq_file *seq, void *v, loff_t *ppos)
> return NULL;
> }
>
> +static int cmp_loc_by_count(const void *a, const void *b, const void *data)
> +{
> + struct location *loc1 = (struct location *)a;
> + struct location *loc2 = (struct location *)b;
> +
> + if (loc1->count > loc2->count)
> + return -1;
> + else
> + return 1;
> +}
> +
> static void *slab_debugfs_start(struct seq_file *seq, loff_t *ppos)
> {
> struct loc_track *t = seq->private;
> @@ -6211,6 +6223,10 @@ static int slab_debug_trace_open(struct inode *inode, struct file *filep)
> spin_unlock_irqrestore(&n->list_lock, flags);
> }
>
> + /* Sort locations by count */
> + sort_r(t->loc, t->count, sizeof(struct location),
> + cmp_loc_by_count, NULL, NULL);
> +
> bitmap_free(obj_map);
> return 0;
> }

This is so cool!

Reviewed-by: Hyeonggon Yoo <[email protected]>
Tested-by: Hyeonggon Yoo <[email protected]>

> --
> 2.35.1
>
>

--
Thank you, You are awesome!
Hyeonggon :-)

2022-02-27 03:34:56

by Hyeonggon Yoo

Subject: [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable

After commit 2dba5eb1c73b ("lib/stackdepot: allow optional init and
stack_table allocation by kvmalloc()"), stack_depot_init() is called
later if CONFIG_STACKDEPOT_ALWAYS_INIT=n to remove unnecessary memory
usage. It allocates stack_table using memblock_alloc() or kvmalloc(),
depending on the availability of the slab allocator.

But when stack_depot_init() is called while creating boot slab caches,
neither the slab allocator nor memblock is available, so the kernel crashes.
Allocate stack_table from the page allocator when both the slab allocator
and memblock are unavailable.

Limit the size of stack_table when using the page allocator because vmalloc()
is also unavailable in kmem_cache_init(). It must not be larger than
(PAGE_SIZE << (MAX_ORDER - 1)).

This patch was tested on both CONFIG_STACKDEPOT_ALWAYS_INIT=y and n.

Fixes: 2dba5eb1c73b ("lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()")
Signed-off-by: Hyeonggon Yoo <[email protected]>
---
lib/stackdepot.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index bf5ba9af0500..606f80ae2bf7 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -73,6 +73,14 @@ static int next_slab_inited;
static size_t depot_offset;
static DEFINE_RAW_SPINLOCK(depot_lock);

+static unsigned int stack_hash_size = (1 << CONFIG_STACK_HASH_ORDER);
+static inline unsigned int stack_hash_mask(void)
+{
+ return stack_hash_size - 1;
+}
+
+#define STACK_HASH_SEED 0x9747b28c
+
static bool init_stack_slab(void **prealloc)
{
if (!*prealloc)
@@ -142,10 +150,6 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
return stack;
}

-#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER)
-#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
-#define STACK_HASH_SEED 0x9747b28c
-
static bool stack_depot_disable;
static struct stack_record **stack_table;

@@ -172,18 +176,28 @@ __ref int stack_depot_init(void)

mutex_lock(&stack_depot_init_mutex);
if (!stack_depot_disable && !stack_table) {
- size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
+ size_t size = (stack_hash_size * sizeof(struct stack_record *));
int i;

if (slab_is_available()) {
pr_info("Stack Depot allocating hash table with kvmalloc\n");
stack_table = kvmalloc(size, GFP_KERNEL);
+ } else if (totalram_pages() > 0) {
+ /* Reduce size because vmalloc may be unavailable */
+ size = min(size, PAGE_SIZE << (MAX_ORDER - 1));
+ stack_hash_size = size / sizeof(struct stack_record *);
+
+ pr_info("Stack Depot allocating hash table with __get_free_pages\n");
+ stack_table = (struct stack_record **)
+ __get_free_pages(GFP_KERNEL, get_order(size));
} else {
pr_info("Stack Depot allocating hash table with memblock_alloc\n");
stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
}
+
if (stack_table) {
- for (i = 0; i < STACK_HASH_SIZE; i++)
+ pr_info("Stack Depot hash table size=%u\n", stack_hash_size);
+ for (i = 0; i < stack_hash_size; i++)
stack_table[i] = NULL;
} else {
pr_err("Stack Depot hash table allocation failed, disabling\n");
@@ -363,7 +377,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
goto fast_exit;

hash = hash_stack(entries, nr_entries);
- bucket = &stack_table[hash & STACK_HASH_MASK];
+ bucket = &stack_table[hash & stack_hash_mask()];

/*
* Fast path: look the stack trace up without locking.
--
2.33.1

2022-02-27 07:38:01

by kernel test robot

Subject: Re: [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable

Hi Hyeonggon,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.17-rc5 next-20220225]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Hyeonggon-Yoo/lib-stackdepot-Use-page-allocator-if-both-slab-and-memblock-is-unavailable/20220227-111029
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 2293be58d6a18cab800e25e42081bacb75c05752
config: hexagon-randconfig-r005-20220227 (https://download.01.org/0day-ci/archive/20220227/[email protected]/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d271fc04d5b97b12e6b797c6067d3c96a8d7470e)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/fd37f88eccc357002cc03a6a5fac60fb42552bc7
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Hyeonggon-Yoo/lib-stackdepot-Use-page-allocator-if-both-slab-and-memblock-is-unavailable/20220227-111029
git checkout fd37f88eccc357002cc03a6a5fac60fb42552bc7
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

>> lib/stackdepot.c:187:11: warning: comparison of distinct pointer types ('typeof (size) *' (aka 'unsigned int *') and 'typeof ((1UL << 14) << (11 - 1)) *' (aka 'unsigned long *')) [-Wcompare-distinct-pointer-types]
size = min(size, PAGE_SIZE << (MAX_ORDER - 1));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/minmax.h:45:19: note: expanded from macro 'min'
#define min(x, y) __careful_cmp(x, y, <)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/minmax.h:36:24: note: expanded from macro '__careful_cmp'
__builtin_choose_expr(__safe_cmp(x, y), \
^~~~~~~~~~~~~~~~
include/linux/minmax.h:26:4: note: expanded from macro '__safe_cmp'
(__typecheck(x, y) && __no_side_effects(x, y))
^~~~~~~~~~~~~~~~~
include/linux/minmax.h:20:28: note: expanded from macro '__typecheck'
(!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
1 warning generated.


vim +187 lib/stackdepot.c

168
169 /*
170 * __ref because of memblock_alloc(), which will not be actually called after
171 * the __init code is gone, because at that point slab_is_available() is true
172 */
173 __ref int stack_depot_init(void)
174 {
175 static DEFINE_MUTEX(stack_depot_init_mutex);
176
177 mutex_lock(&stack_depot_init_mutex);
178 if (!stack_depot_disable && !stack_table) {
179 size_t size = (stack_hash_size * sizeof(struct stack_record *));
180 int i;
181
182 if (slab_is_available()) {
183 pr_info("Stack Depot allocating hash table with kvmalloc\n");
184 stack_table = kvmalloc(size, GFP_KERNEL);
185 } else if (totalram_pages() > 0) {
186 /* Reduce size because vmalloc may be unavailable */
> 187 size = min(size, PAGE_SIZE << (MAX_ORDER - 1));
188 stack_hash_size = size / sizeof(struct stack_record *);
189
190 pr_info("Stack Depot allocating hash table with __get_free_pages\n");
191 stack_table = (struct stack_record **)
192 __get_free_pages(GFP_KERNEL, get_order(size));
193 } else {
194 pr_info("Stack Depot allocating hash table with memblock_alloc\n");
195 stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
196 }
197
198 if (stack_table) {
199 pr_info("Stack Depot hash table size=%u\n", stack_hash_size);
200 for (i = 0; i < stack_hash_size; i++)
201 stack_table[i] = NULL;
202 } else {
203 pr_err("Stack Depot hash table allocation failed, disabling\n");
204 stack_depot_disable = true;
205 mutex_unlock(&stack_depot_init_mutex);
206 return -ENOMEM;
207 }
208 }
209 mutex_unlock(&stack_depot_init_mutex);
210 return 0;
211 }
212 EXPORT_SYMBOL_GPL(stack_depot_init);
213

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]

2022-02-27 09:37:07

by Hyeonggon Yoo

Subject: [PATCH v2] lib/stackdepot: Use page allocator if both slab and memblock is unavailable

After commit 2dba5eb1c73b ("lib/stackdepot: allow optional init and
stack_table allocation by kvmalloc()"), stack_depot_init() is called
later if CONFIG_STACKDEPOT_ALWAYS_INIT=n to remove unnecessary memory
usage. It allocates stack_table using memblock_alloc() or kvmalloc(),
depending on the availability of the slab allocator.

But when stack_depot_init() is called while creating boot slab caches,
neither the slab allocator nor memblock is available, so the kernel crashes.
Allocate stack_table from the page allocator when both the slab allocator
and memblock are unavailable.

Limit the size of stack_table when using the page allocator because vmalloc()
is also unavailable in kmem_cache_init(). It must not be larger than
(PAGE_SIZE << (MAX_ORDER - 1)).

This patch was tested on both CONFIG_STACKDEPOT_ALWAYS_INIT=y and n.

[ [email protected]: Fix W=1 build warning ]

Fixes: 2dba5eb1c73b ("lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()")
Signed-off-by: Hyeonggon Yoo <[email protected]>
---
lib/stackdepot.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index bf5ba9af0500..a96f8fd78c42 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -73,6 +73,14 @@ static int next_slab_inited;
static size_t depot_offset;
static DEFINE_RAW_SPINLOCK(depot_lock);

+static size_t stack_hash_size = (1 << CONFIG_STACK_HASH_ORDER);
+static inline size_t stack_hash_mask(void)
+{
+ return stack_hash_size - 1;
+}
+
+#define STACK_HASH_SEED 0x9747b28c
+
static bool init_stack_slab(void **prealloc)
{
if (!*prealloc)
@@ -142,10 +150,6 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
return stack;
}

-#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER)
-#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
-#define STACK_HASH_SEED 0x9747b28c
-
static bool stack_depot_disable;
static struct stack_record **stack_table;

@@ -172,18 +176,28 @@ __ref int stack_depot_init(void)

mutex_lock(&stack_depot_init_mutex);
if (!stack_depot_disable && !stack_table) {
- size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
+ size_t size = (stack_hash_size * sizeof(struct stack_record *));
int i;

if (slab_is_available()) {
pr_info("Stack Depot allocating hash table with kvmalloc\n");
stack_table = kvmalloc(size, GFP_KERNEL);
+ } else if (totalram_pages() > 0) {
+ /* Reduce size because vmalloc may be unavailable */
+ size = min_t(size_t, size, PAGE_SIZE << (MAX_ORDER - 1));
+ stack_hash_size = size / sizeof(struct stack_record *);
+
+ pr_info("Stack Depot allocating hash table with __get_free_pages\n");
+ stack_table = (struct stack_record **)
+ __get_free_pages(GFP_KERNEL, get_order(size));
} else {
pr_info("Stack Depot allocating hash table with memblock_alloc\n");
stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
}
+
if (stack_table) {
- for (i = 0; i < STACK_HASH_SIZE; i++)
+ pr_info("Stack Depot hash table size=%zu\n", stack_hash_size);
+ for (i = 0; i < stack_hash_size; i++)
stack_table[i] = NULL;
} else {
pr_err("Stack Depot hash table allocation failed, disabling\n");
@@ -363,7 +377,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
goto fast_exit;

hash = hash_stack(entries, nr_entries);
- bucket = &stack_table[hash & STACK_HASH_MASK];
+ bucket = &stack_table[hash & stack_hash_mask()];

/*
* Fast path: look the stack trace up without locking.
--
2.33.1

2022-02-27 10:45:42

by kernel test robot

Subject: Re: [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable

Hi Hyeonggon,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.17-rc5 next-20220225]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Hyeonggon-Yoo/lib-stackdepot-Use-page-allocator-if-both-slab-and-memblock-is-unavailable/20220227-111029
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 2293be58d6a18cab800e25e42081bacb75c05752
config: i386-randconfig-s002 (https://download.01.org/0day-ci/archive/20220227/[email protected]/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.4-dirty
# https://github.com/0day-ci/linux/commit/fd37f88eccc357002cc03a6a5fac60fb42552bc7
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Hyeonggon-Yoo/lib-stackdepot-Use-page-allocator-if-both-slab-and-memblock-is-unavailable/20220227-111029
git checkout fd37f88eccc357002cc03a6a5fac60fb42552bc7
# save the config file to linux build tree
mkdir build_dir
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>


sparse warnings: (new ones prefixed by >>)
>> lib/stackdepot.c:187:32: sparse: sparse: incompatible types in comparison expression (different type sizes):
>> lib/stackdepot.c:187:32: sparse: unsigned int *
>> lib/stackdepot.c:187:32: sparse: unsigned long *

vim +187 lib/stackdepot.c

168
169 /*
170 * __ref because of memblock_alloc(), which will not be actually called after
171 * the __init code is gone, because at that point slab_is_available() is true
172 */
173 __ref int stack_depot_init(void)
174 {
175 static DEFINE_MUTEX(stack_depot_init_mutex);
176
177 mutex_lock(&stack_depot_init_mutex);
178 if (!stack_depot_disable && !stack_table) {
179 size_t size = (stack_hash_size * sizeof(struct stack_record *));
180 int i;
181
182 if (slab_is_available()) {
183 pr_info("Stack Depot allocating hash table with kvmalloc\n");
184 stack_table = kvmalloc(size, GFP_KERNEL);
185 } else if (totalram_pages() > 0) {
186 /* Reduce size because vmalloc may be unavailable */
> 187 size = min(size, PAGE_SIZE << (MAX_ORDER - 1));
188 stack_hash_size = size / sizeof(struct stack_record *);
189
190 pr_info("Stack Depot allocating hash table with __get_free_pages\n");
191 stack_table = (struct stack_record **)
192 __get_free_pages(GFP_KERNEL, get_order(size));
193 } else {
194 pr_info("Stack Depot allocating hash table with memblock_alloc\n");
195 stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
196 }
197
198 if (stack_table) {
199 pr_info("Stack Depot hash table size=%u\n", stack_hash_size);
200 for (i = 0; i < stack_hash_size; i++)
201 stack_table[i] = NULL;
202 } else {
203 pr_err("Stack Depot hash table allocation failed, disabling\n");
204 stack_depot_disable = true;
205 mutex_unlock(&stack_depot_init_mutex);
206 return -ENOMEM;
207 }
208 }
209 mutex_unlock(&stack_depot_init_mutex);
210 return 0;
211 }
212 EXPORT_SYMBOL_GPL(stack_depot_init);
213

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]

2022-02-27 11:44:12

by Hyeonggon Yoo

Subject: Re: [PATCH 5/5] slab, documentation: add description of debugfs files for SLUB caches

On Fri, Feb 25, 2022 at 07:03:18PM +0100, Vlastimil Babka wrote:
> From: Oliver Glitta <[email protected]>
>
> Add description of debugfs files alloc_traces and free_traces
> to SLUB cache documentation.
>
> [ [email protected]: some rewording ]
>
> Signed-off-by: Oliver Glitta <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> Cc: Randy Dunlap <[email protected]>
> Cc: [email protected]
> ---
> Documentation/vm/slub.rst | 61 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 61 insertions(+)
>
> diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
> index d3028554b1e9..2b2b931e59fc 100644
> --- a/Documentation/vm/slub.rst
> +++ b/Documentation/vm/slub.rst
> @@ -384,5 +384,66 @@ c) Execute ``slabinfo-gnuplot.sh`` in '-t' mode, passing all of the
> 40,60`` range will plot only samples collected between 40th and
> 60th seconds).
>
> +
> +DebugFS files for SLUB
> +======================
> +
> +For more information about current state of SLUB caches with the user tracking
> +debug option enabled, debugfs files are available, typically under
> +/sys/kernel/debug/slab/<cache>/ (created only for caches with enabled user
> +tracking). There are 2 types of these files with the following debug
> +information:
> +
> +1. alloc_traces::
> +
> + Prints information about unique allocation traces of the currently
> + allocated objects. The output is sorted by frequency of each trace.
> +
> + Information in the output:
> + Number of objects, allocating function, minimal/average/maximal jiffies since alloc,
> + pid range of the allocating processes, cpu mask of allocating cpus, and stack trace.
> +
> + Example:::
> +
> + 1085 populate_error_injection_list+0x97/0x110 age=166678/166680/166682 pid=1 cpus=1::
> + __slab_alloc+0x6d/0x90
> + kmem_cache_alloc_trace+0x2eb/0x300
> + populate_error_injection_list+0x97/0x110
> + init_error_injection+0x1b/0x71
> + do_one_initcall+0x5f/0x2d0
> + kernel_init_freeable+0x26f/0x2d7
> + kernel_init+0xe/0x118
> + ret_from_fork+0x22/0x30
> +
> +
> +2. free_traces::
> +
> + Prints information about unique free traces of the currently free objects,
> + sorted by their frequency.
> +

I'm not sure that it's traces of the "currently free objects".

static int slab_debug_trace_open(struct inode *inode, struct file *filep)
{
[...]

obj_map = bitmap_alloc(oo_objects(s->oo), GFP_KERNEL);

[...]

for_each_kmem_cache_node(s, node, n) {
unsigned long flags;
struct slab *slab;

if (!atomic_long_read(&n->nr_slabs))
continue;

spin_lock_irqsave(&n->list_lock, flags);
list_for_each_entry(slab, &n->partial, slab_list)
process_slab(t, s, slab, alloc, obj_map);
list_for_each_entry(slab, &n->full, slab_list)
process_slab(t, s, slab, alloc, obj_map);
spin_unlock_irqrestore(&n->list_lock, flags);
}

[...]

}

static void __fill_map(unsigned long *obj_map, struct kmem_cache *s,
struct slab *slab)
{
void *addr = slab_address(slab);
void *p;

bitmap_zero(obj_map, slab->objects);

for (p = slab->freelist; p; p = get_freepointer(s, p))
set_bit(__obj_to_index(s, addr, p), obj_map);
}

static void process_slab(struct loc_track *t, struct kmem_cache *s,
struct slab *slab, enum track_item alloc,
unsigned long *obj_map)
{
void *addr = slab_address(slab);
void *p;

__fill_map(obj_map, s, slab);

for_each_object(p, s, addr, slab->objects)
if (!test_bit(__obj_to_index(s, addr, p), obj_map))
add_location(t, s, get_track(s, p, alloc));
}

I think it's not the traces of "currently free objects",
because the index bits of the free objects are set in the obj_map bitmap?

It's weird, but these are the traces of currently allocated objects that have
been freed at least once before (or <not available>).

I think we can fix either the code or the doc?

Please tell me if I'm missing something :)

> + Information in the output:
> + Number of objects, freeing function, minimal/average/maximal jiffies since free,
> + pid range of the freeing processes, cpu mask of freeing cpus, and stack trace.
> +
> + Example:::
> +
> + 51 acpi_ut_update_ref_count+0x6a6/0x782 age=236886/237027/237772 pid=1 cpus=1
> + kfree+0x2db/0x420
> + acpi_ut_update_ref_count+0x6a6/0x782
> + acpi_ut_update_object_reference+0x1ad/0x234
> + acpi_ut_remove_reference+0x7d/0x84
> + acpi_rs_get_prt_method_data+0x97/0xd6
> + acpi_get_irq_routing_table+0x82/0xc4
> + acpi_pci_irq_find_prt_entry+0x8e/0x2e0
> + acpi_pci_irq_lookup+0x3a/0x1e0
> + acpi_pci_irq_enable+0x77/0x240
> + pcibios_enable_device+0x39/0x40
> + do_pci_enable_device.part.0+0x5d/0xe0
> + pci_enable_device_flags+0xfc/0x120
> + pci_enable_device+0x13/0x20
> + virtio_pci_probe+0x9e/0x170
> + local_pci_probe+0x48/0x80
> + pci_device_probe+0x105/0x1c0
> +

Everything else looks nice!

> Christoph Lameter, May 30, 2007
> Sergey Senozhatsky, October 23, 2015
> --
> 2.35.1
>
>

--
Thank you, You are awesome!
Hyeonggon :-)

2022-02-27 12:27:03

by Hyeonggon Yoo

Subject: Re: [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects

On Fri, Feb 25, 2022 at 07:03:15PM +0100, Vlastimil Babka wrote:
> From: Oliver Glitta <[email protected]>
>
> Many stack traces are similar so there are many similar arrays.
> Stackdepot saves each unique stack only once.
>
> Replace field addrs in struct track with depot_stack_handle_t handle. Use
> stackdepot to save stack trace.
>

I think it's not a replacement?

> The benefits are smaller memory overhead and possibility to aggregate
> per-cache statistics in the following patch using the stackdepot handle
> instead of matching stacks manually.
>
> [ [email protected]: rebase to 5.17-rc1 and adjust accordingly ]
>
> This was initially merged as commit 788691464c29 and reverted by commit
> ae14c63a9f20 due to several issues, that should now be fixed.
> The problem of unconditional memory overhead by stackdepot has been
> addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
> and stack_table allocation by kvmalloc()"), so the dependency on
> stackdepot will result in extra memory usage only when a slab cache
> tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
> The build failures on some architectures were also addressed, and the
> reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this
> patch.

This is just an idea, and it goes beyond this patch.

After this patch we now have external storage that records the stack traces.

It's possible that some rare stack traces are still present in the stack
depot but are no longer reachable, because the track that referenced them
has been overwritten.

I think it's worth implementing a way to iterate over the stacks in the
stack depot?
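
Something like the sketch below, maybe? Completely untested, and no such
interface exists in lib/stackdepot.c today -- stack_table, STACK_HASH_SIZE
and struct stack_record are the existing internals, only the iterator is
made up. Locking against concurrent depot_alloc_stack() is ignored for
brevity:

typedef void (*stack_depot_visit_fn)(const unsigned long *entries,
                                     unsigned int nr_entries, void *private);

/* Walk every stack currently recorded in the depot's hash table. */
void stack_depot_for_each(stack_depot_visit_fn visit, void *private)
{
        int i;

        for (i = 0; i < STACK_HASH_SIZE; i++) {
                struct stack_record *rec;

                for (rec = stack_table[i]; rec; rec = rec->next)
                        visit(rec->entries, rec->size, private);
        }
}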

>
> Signed-off-by: Oliver Glitta <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Pekka Enberg <[email protected]>
> Cc: Joonsoo Kim <[email protected]>

--
Thank you, You are awesome!
Hyeonggon :-)

2022-02-28 12:20:54

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable

On Mon, 28 Feb 2022 at 11:05, Hyeonggon Yoo <[email protected]> wrote:
[...]
> > This is odd - who is calling stack_depot_init() while neither slab nor
> > memblock are available?
>
> It's not merged yet - but Oliver's patch (2/5) in his series [1] does:
> If user is debugging cache, it calls stack_depot_init() when creating
> cache.
>
> > @@ -4221,6 +4220,9 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
> > s->remote_node_defrag_ratio = 1000;
> > #endif
> >
> > + if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> > + stack_depot_init();
> > +
>
> Oliver's patch series enables stack depot when arch supports stacktrace,
> to store slab objects' stack traces. (as slub debugging feature.)
>
> Because slub debugging is turned on by default, the commit 2dba5eb1c73b
> ("lib/stackdepot: allow optional init and stack_table allocation by
> kvmalloc()") made stack_depot_init() can be called later.
>
> With Oliver's patch applied, stack_depot_init() can be called in
> contexts below:
>
> 1) only memblock available (for kasan)
> 2) only buddy available, vmalloc/memblock unavailable (for boot caches)
> 3) buddy/slab available, vmalloc/memblock unavailable (vmap_area cache)
> 4) buddy/slab/vmalloc available, memblock unavailable (other caches)
>
> SLUB supports enabling debugging for specific cache by passing
> slub_debug boot parameter. As slab caches can be created in
> various context, stack_depot_init() should consider all contexts above.
>
> Writing this, I realized my patch does not handle case 3).. I'll send v3.
>
> [1] https://lore.kernel.org/linux-mm/YhoakP7Kih%[email protected]/T/#t
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
>
> > Do you have a stacktrace?
>
> Yeah, here:
>
> You can reproduce this on vbabka's slab-stackdepot-v1 branch [2] with
> slub_debug=U, and CONFIG_STACKDEPOT_ALWAYS_INIT=n
>
[...]
> [ 0.000000] Call trace:
> [ 0.000000] __memset+0x16c/0x188
> [ 0.000000] stack_depot_init+0xc8/0x100
> [ 0.000000] __kmem_cache_create+0x454/0x570
> [ 0.000000] create_boot_cache+0xa0/0xe0

I think even before this point you have all the information required
to determine if stackdepot will be required. It's available after
setup_slub_debug().

So why can't you just call stack_depot_init() somewhere else and avoid
all this complexity?

> [ 0.000000] kmem_cache_init+0xf8/0x204
> [ 0.000000] start_kernel+0x3ec/0x668
> [ 0.000000] __primary_switched+0xc0/0xc8
> [ 0.000000] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428)
> [ 0.000000] ---[ end trace 0000000000000000 ]---
> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

2022-02-28 13:18:12

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable

On Sun, 27 Feb 2022 at 04:08, Hyeonggon Yoo <[email protected]> wrote:
>
> After commit 2dba5eb1c73b ("lib/stackdepot: allow optional init and
> stack_table allocation by kvmalloc()"), stack_depot_init() is called
> later if CONFIG_STACKDEPOT_ALWAYS_INIT=n to remove unnecessary memory
> usage. It allocates stack_table using memblock_alloc() or kvmalloc()
> depending on availability of slab allocator.
>
> But when stack_depot_init() is called while creating boot slab caches,
> both slab allocator and memblock is not available. So kernel crashes.
> Allocate stack_table from page allocator when both slab allocator and
> memblock is unavailable.

This is odd - who is calling stack_depot_init() while neither slab nor
memblock are available? Do you have a stacktrace?

> Limit size of stack_table when using page allocator because vmalloc()
> is also unavailable in kmem_cache_init(). it must not be larger than
> (PAGE_SIZE << (MAX_ORDER - 1)).
>
> This patch was tested on both CONFIG_STACKDEPOT_ALWAYS_INIT=y and n.
>
> Fixes: 2dba5eb1c73b ("lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()")
> Signed-off-by: Hyeonggon Yoo <[email protected]>
> ---
> lib/stackdepot.c | 28 +++++++++++++++++++++-------
> 1 file changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index bf5ba9af0500..606f80ae2bf7 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -73,6 +73,14 @@ static int next_slab_inited;
> static size_t depot_offset;
> static DEFINE_RAW_SPINLOCK(depot_lock);
>
> +static unsigned int stack_hash_size = (1 << CONFIG_STACK_HASH_ORDER);
> +static inline unsigned int stack_hash_mask(void)
> +{
> + return stack_hash_size - 1;
> +}
> +
> +#define STACK_HASH_SEED 0x9747b28c
> +
> static bool init_stack_slab(void **prealloc)
> {
> if (!*prealloc)
> @@ -142,10 +150,6 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
> return stack;
> }
>
> -#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER)
> -#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
> -#define STACK_HASH_SEED 0x9747b28c
> -
> static bool stack_depot_disable;
> static struct stack_record **stack_table;
>
> @@ -172,18 +176,28 @@ __ref int stack_depot_init(void)
>
> mutex_lock(&stack_depot_init_mutex);
> if (!stack_depot_disable && !stack_table) {
> - size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> + size_t size = (stack_hash_size * sizeof(struct stack_record *));
> int i;
>
> if (slab_is_available()) {
> pr_info("Stack Depot allocating hash table with kvmalloc\n");
> stack_table = kvmalloc(size, GFP_KERNEL);
> + } else if (totalram_pages() > 0) {
> + /* Reduce size because vmalloc may be unavailable */
> + size = min(size, PAGE_SIZE << (MAX_ORDER - 1));
> + stack_hash_size = size / sizeof(struct stack_record *);
> +
> + pr_info("Stack Depot allocating hash table with __get_free_pages\n");
> + stack_table = (struct stack_record **)
> + __get_free_pages(GFP_KERNEL, get_order(size));
> } else {
> pr_info("Stack Depot allocating hash table with memblock_alloc\n");
> stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
> }
> +
> if (stack_table) {
> - for (i = 0; i < STACK_HASH_SIZE; i++)
> + pr_info("Stack Depot hash table size=%u\n", stack_hash_size);
> + for (i = 0; i < stack_hash_size; i++)
> stack_table[i] = NULL;
> } else {
> pr_err("Stack Depot hash table allocation failed, disabling\n");
> @@ -363,7 +377,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
> goto fast_exit;
>
> hash = hash_stack(entries, nr_entries);
> - bucket = &stack_table[hash & STACK_HASH_MASK];
> + bucket = &stack_table[hash & stack_hash_mask()];
>
> /*
> * Fast path: look the stack trace up without locking.
> --
> 2.33.1

2022-02-28 14:11:48

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable

On Mon, Feb 28, 2022 at 08:00:00AM +0100, Marco Elver wrote:
> On Sun, 27 Feb 2022 at 04:08, Hyeonggon Yoo <[email protected]> wrote:
> >
> > After commit 2dba5eb1c73b ("lib/stackdepot: allow optional init and
> > stack_table allocation by kvmalloc()"), stack_depot_init() is called
> > later if CONFIG_STACKDEPOT_ALWAYS_INIT=n to remove unnecessary memory
> > usage. It allocates stack_table using memblock_alloc() or kvmalloc()
> > depending on availability of slab allocator.
> >
> > But when stack_depot_init() is called while creating boot slab caches,
> > both slab allocator and memblock is not available. So kernel crashes.
> > Allocate stack_table from page allocator when both slab allocator and
> > memblock is unavailable.
>
> This is odd - who is calling stack_depot_init() while neither slab nor
> memblock are available?

It's not merged yet - but Oliver's patch (2/5) in his series [1] does:
if the user is debugging a cache, it calls stack_depot_init() when creating
the cache.

> @@ -4221,6 +4220,9 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
> s->remote_node_defrag_ratio = 1000;
> #endif
>
> + if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> + stack_depot_init();
> +

Oliver's patch series enables stack depot when the arch supports stacktrace,
in order to store slab objects' stack traces (as a SLUB debugging feature).

Because SLUB debugging is turned on by default, commit 2dba5eb1c73b
("lib/stackdepot: allow optional init and stack_table allocation by
kvmalloc()") allowed stack_depot_init() to be called later.

With Oliver's patch applied, stack_depot_init() can be called in
contexts below:

1) only memblock available (for kasan)
2) only buddy available, vmalloc/memblock unavailable (for boot caches)
3) buddy/slab available, vmalloc/memblock unavailable (vmap_area cache)
4) buddy/slab/vmalloc available, memblock unavailable (other caches)

SLUB supports enabling debugging for a specific cache by passing the
slub_debug boot parameter. As slab caches can be created in various
contexts, stack_depot_init() should handle all of the contexts above.

While writing this, I realized my patch does not handle case 3). I'll send v3.

[1] https://lore.kernel.org/linux-mm/YhoakP7Kih%[email protected]/T/#t
[2] https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1

> Do you have a stacktrace?

Yeah, here:

You can reproduce this on vbabka's slab-stackdepot-v1 branch [2] with
slub_debug=U, and CONFIG_STACKDEPOT_ALWAYS_INIT=n

[ 0.000000] Stack Depot allocating hash table with memblock_alloc
[ 0.000000] Unable to handle kernel paging request at virtual address ffff000097400000
[ 0.000000] Mem abort info:
[ 0.000000] ESR = 0x96000047
[ 0.000000] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.000000] SET = 0, FnV = 0
[ 0.000000] EA = 0, S1PTW = 0
[ 0.000000] FSC = 0x07: level 3 translation fault
[ 0.000000] Data abort info:
[ 0.000000] ISV = 0, ISS = 0x00000047
[ 0.000000] CM = 0, WnR = 1
[ 0.000000] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041719000
[ 0.000000] [ffff000097400000] pgd=18000000dcff8003, p4d=18000000dcff8003, pud=18000000dcbfe003, pmd=18000000dcb43003, pte=00680000d7400706
[ 0.000000] Internal error: Oops: 96000047 [#1] PREEMPT SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-rc1-11918-gbf5d03166d75 #51
[ 0.000000] Hardware name: linux,dummy-virt (DT)
[ 0.000000] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.000000] pc : __memset+0x16c/0x188
[ 0.000000] lr : memblock_alloc_try_nid+0xcc/0xe4
[ 0.000000] sp : ffff800009a33cd0
[ 0.000000] x29: ffff800009a33cd0 x28: 0000000041720018 x27: ffff800009362640
[ 0.000000] x26: ffff800009362640 x25: 0000000000000000 x24: 0000000000000000
[ 0.000000] x23: 0000000000002000 x22: ffff80000932bb50 x21: 00000000ffffffff
[ 0.000000] x20: ffff000097400000 x19: 0000000000800000 x18: ffffffffffffffff
[ 0.000000] x17: 373578302f383278 x16: 302b657461657263 x15: 0000001000000000
[ 0.000000] x14: 0000000000000360 x13: 0000000000009f8c x12: 00000000dcb0c070
[ 0.000000] x11: 0000001000000000 x10: 00000000004ea000 x9 : 0000000000000000
[ 0.000000] x8 : ffff000097400000 x7 : 0000000000000000 x6 : 000000000000003f
[ 0.000000] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004
[ 0.000000] x2 : 00000000007fffc0 x1 : 0000000000000000 x0 : ffff000097400000
[ 0.000000] Call trace:
[ 0.000000] __memset+0x16c/0x188
[ 0.000000] stack_depot_init+0xc8/0x100
[ 0.000000] __kmem_cache_create+0x454/0x570
[ 0.000000] create_boot_cache+0xa0/0xe0
[ 0.000000] kmem_cache_init+0xf8/0x204
[ 0.000000] start_kernel+0x3ec/0x668
[ 0.000000] __primary_switched+0xc0/0xc8
[ 0.000000] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428)
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

Thanks!

> > Limit size of stack_table when using page allocator because vmalloc()
> > is also unavailable in kmem_cache_init(). it must not be larger than
> > (PAGE_SIZE << (MAX_ORDER - 1)).
> >
> > This patch was tested on both CONFIG_STACKDEPOT_ALWAYS_INIT=y and n.
> >
> > Fixes: 2dba5eb1c73b ("lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()")
> > Signed-off-by: Hyeonggon Yoo <[email protected]>
> > ---
> > lib/stackdepot.c | 28 +++++++++++++++++++++-------
> > 1 file changed, 21 insertions(+), 7 deletions(-)
> >
> > diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> > index bf5ba9af0500..606f80ae2bf7 100644
> > --- a/lib/stackdepot.c
> > +++ b/lib/stackdepot.c
> > @@ -73,6 +73,14 @@ static int next_slab_inited;
> > static size_t depot_offset;
> > static DEFINE_RAW_SPINLOCK(depot_lock);
> >
> > +static unsigned int stack_hash_size = (1 << CONFIG_STACK_HASH_ORDER);
> > +static inline unsigned int stack_hash_mask(void)
> > +{
> > + return stack_hash_size - 1;
> > +}
> > +
> > +#define STACK_HASH_SEED 0x9747b28c
> > +
> > static bool init_stack_slab(void **prealloc)
> > {
> > if (!*prealloc)
> > @@ -142,10 +150,6 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
> > return stack;
> > }
> >
> > -#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER)
> > -#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
> > -#define STACK_HASH_SEED 0x9747b28c
> > -
> > static bool stack_depot_disable;
> > static struct stack_record **stack_table;
> >
> > @@ -172,18 +176,28 @@ __ref int stack_depot_init(void)
> >
> > mutex_lock(&stack_depot_init_mutex);
> > if (!stack_depot_disable && !stack_table) {
> > - size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> > + size_t size = (stack_hash_size * sizeof(struct stack_record *));
> > int i;
> >
> > if (slab_is_available()) {
> > pr_info("Stack Depot allocating hash table with kvmalloc\n");
> > stack_table = kvmalloc(size, GFP_KERNEL);
> > + } else if (totalram_pages() > 0) {
> > + /* Reduce size because vmalloc may be unavailable */
> > + size = min(size, PAGE_SIZE << (MAX_ORDER - 1));
> > + stack_hash_size = size / sizeof(struct stack_record *);
> > +
> > + pr_info("Stack Depot allocating hash table with __get_free_pages\n");
> > + stack_table = (struct stack_record **)
> > + __get_free_pages(GFP_KERNEL, get_order(size));
> > } else {
> > pr_info("Stack Depot allocating hash table with memblock_alloc\n");
> > stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
> > }
> > +
> > if (stack_table) {
> > - for (i = 0; i < STACK_HASH_SIZE; i++)
> > + pr_info("Stack Depot hash table size=%u\n", stack_hash_size);
> > + for (i = 0; i < stack_hash_size; i++)
> > stack_table[i] = NULL;
> > } else {
> > pr_err("Stack Depot hash table allocation failed, disabling\n");
> > @@ -363,7 +377,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
> > goto fast_exit;
> >
> > hash = hash_stack(entries, nr_entries);
> > - bucket = &stack_table[hash & STACK_HASH_MASK];
> > + bucket = &stack_table[hash & stack_hash_mask()];
> >
> > /*
> > * Fast path: look the stack trace up without locking.
> > --
> > 2.33.1

--
Thank you, You are awesome!
Hyeonggon :-)

2022-02-28 17:12:30

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable

On Mon, Feb 28, 2022 at 11:50:49AM +0100, Marco Elver wrote:
> On Mon, 28 Feb 2022 at 11:05, Hyeonggon Yoo <[email protected]> wrote:
> [...]
> > > This is odd - who is calling stack_depot_init() while neither slab nor
> > > memblock are available?
> >
> > It's not merged yet - but Oliver's patch (2/5) in his series [1] does:
> > If user is debugging cache, it calls stack_depot_init() when creating
> > cache.
> >
> > > @@ -4221,6 +4220,9 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
> > > s->remote_node_defrag_ratio = 1000;
> > > #endif
> > >
> > > + if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> > > + stack_depot_init();
> > > +
> >
> > Oliver's patch series enables stack depot when arch supports stacktrace,
> > to store slab objects' stack traces. (as slub debugging feature.)
> >
> > Because slub debugging is turned on by default, the commit 2dba5eb1c73b
> > ("lib/stackdepot: allow optional init and stack_table allocation by
> > kvmalloc()") made stack_depot_init() can be called later.
> >
> > With Oliver's patch applied, stack_depot_init() can be called in
> > contexts below:
> >
> > 1) only memblock available (for kasan)
> > 2) only buddy available, vmalloc/memblock unavailable (for boot caches)
> > 3) buddy/slab available, vmalloc/memblock unavailable (vmap_area cache)
> > 4) buddy/slab/vmalloc available, memblock unavailable (other caches)
> >
> > SLUB supports enabling debugging for specific cache by passing
> > slub_debug boot parameter. As slab caches can be created in
> > various context, stack_depot_init() should consider all contexts above.
> >
> > Writing this, I realized my patch does not handle case 3).. I'll send v3.
> >
> > [1] https://lore.kernel.org/linux-mm/YhoakP7Kih%[email protected]/T/#t
> > [2] https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
> >
> > > Do you have a stacktrace?
> >
> > Yeah, here:
> >
> > You can reproduce this on vbabka's slab-stackdepot-v1 branch [2] with
> > slub_debug=U, and CONFIG_STACKDEPOT_ALWAYS_INIT=n
> >
> [...]
> > [ 0.000000] Call trace:
> > [ 0.000000] __memset+0x16c/0x188
> > [ 0.000000] stack_depot_init+0xc8/0x100
> > [ 0.000000] __kmem_cache_create+0x454/0x570
> > [ 0.000000] create_boot_cache+0xa0/0xe0
>
> I think even before this point you have all the information required
> to determine if stackdepot will be required. It's available after
> setup_slub_debug().
>
> So why can't you just call stack_depot_init() somewhere else and avoid
> all this complexity?
>

You are right. That is much simpler and sounds good, as SLUB does not
support enabling the SLAB_STORE_USER flag once the system is up.

I'll try this approach.
Thank you!

> > [ 0.000000] kmem_cache_init+0xf8/0x204
> > [ 0.000000] start_kernel+0x3ec/0x668
> > [ 0.000000] __primary_switched+0xc0/0xc8
> > [ 0.000000] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428)
> > [ 0.000000] ---[ end trace 0000000000000000 ]---
> > [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

--
Thank you, You are awesome!
Hyeonggon :-)

2022-02-28 17:28:10

by Hyeonggon Yoo

[permalink] [raw]
Subject: [PATCH] mm/slub: initialize stack depot in boot process

commit ba10d4b46655 ("mm/slub: use stackdepot to save stack trace in
objects") initializes stack depot while creating a cache if the
SLAB_STORE_USER flag is set.

This can make the kernel crash because a cache can be created in various
contexts. For example, if the user sets slub_debug=U, the kernel crashes
because create_boot_cache() calls stack_depot_init(), which tries to
allocate the hash table using memblock_alloc() when slab is not available.
But memblock is also not available at that time.

This patch solves the problem by initializing stack depot early in the
boot process if the SLAB_STORE_USER debug flag is set globally or is set
for at least one cache.

[ [email protected]: initialize stack depot depending on the slub_debug
parameter instead of allowing stack_depot_init() to be called from
kmem_cache_init(), for simplicity. ]

Link: https://lkml.org/lkml/2022/2/28/238
Fixes: ba10d4b46655 ("mm/slub: use stackdepot to save stack trace in objects")
Signed-off-by: Hyeonggon Yoo <[email protected]>
---
include/linux/slab.h | 1 +
init/main.c | 1 +
mm/slab.c | 4 ++++
mm/slob.c | 4 ++++
mm/slub.c | 28 +++++++++++++++++++++++++---
5 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 37bde99b74af..023f3f71ae35 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -139,6 +139,7 @@ struct mem_cgroup;
/*
* struct kmem_cache related prototypes
*/
+void __init kmem_cache_init_early(void);
void __init kmem_cache_init(void);
bool slab_is_available(void);

diff --git a/init/main.c b/init/main.c
index 65fa2e41a9c0..4fdb7975a085 100644
--- a/init/main.c
+++ b/init/main.c
@@ -835,6 +835,7 @@ static void __init mm_init(void)
kfence_alloc_pool();
report_meminit();
stack_depot_early_init();
+ kmem_cache_init_early();
mem_init();
mem_init_print_info();
kmem_cache_init();
diff --git a/mm/slab.c b/mm/slab.c
index ddf5737c63d9..80a6d01aab06 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1196,6 +1196,10 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
}
}

+void __init kmem_cache_init_early(void)
+{
+}
+
/*
* Initialisation. Called after the page allocator have been initialised and
* before smp_init().
diff --git a/mm/slob.c b/mm/slob.c
index 60c5842215f1..00e323af8be4 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -715,6 +715,10 @@ struct kmem_cache kmem_cache_boot = {
.align = ARCH_KMALLOC_MINALIGN,
};

+void __init kmem_cache_init_early(void)
+{
+}
+
void __init kmem_cache_init(void)
{
kmem_cache = &kmem_cache_boot;
diff --git a/mm/slub.c b/mm/slub.c
index a74afe59a403..40bcd18143b6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4221,9 +4221,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
s->remote_node_defrag_ratio = 1000;
#endif

- if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
- stack_depot_init();
-
/* Initialize the pre-computed randomized freelist if slab is up */
if (slab_state >= UP) {
if (init_cache_random_seq(s))
@@ -4810,6 +4807,31 @@ static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache)
return s;
}

+/* Initialize stack depot if needed */
+void __init kmem_cache_init_early(void)
+{
+#ifdef CONFIG_STACKDEPOT
+ slab_flags_t block_flags;
+ char *next_block;
+ char *slab_list;
+
+ if (slub_debug & SLAB_STORE_USER)
+ goto init_stack_depot;
+
+ next_block = slub_debug_string;
+ while (next_block) {
+ next_block = parse_slub_debug_flags(next_block, &block_flags, &slab_list, false);
+ if (block_flags & SLAB_STORE_USER)
+ goto init_stack_depot;
+ }
+
+ return;
+
+init_stack_depot:
+ stack_depot_init();
+#endif
+}
+
void __init kmem_cache_init(void)
{
static __initdata struct kmem_cache boot_kmem_cache,
--
2.33.1

2022-02-28 17:37:10

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH] mm/slub: initialize stack depot in boot process

On Mon, Feb 28, 2022 at 03:09PM +0000, Hyeonggon Yoo wrote:
> commit ba10d4b46655 ("mm/slub: use stackdepot to save stack trace in
> objects") initializes stack depot while creating cache if SLAB_STORE_USER
> flag is set.
>
> This can make kernel crash because a cache can be created in various
> contexts. For example if user sets slub_debug=U, kernel crashes
> because create_boot_cache() calls stack_depot_init(), which tries to
> allocate hash table using memblock_alloc() if slab is not available.
> But memblock is also not available at that time.
>
> This patch solves the problem by initializing stack depot early
> in boot process if SLAB_STORE_USER debug flag is set globally
> or the flag is set to at least one cache.
>
> [ [email protected]: initialize stack depot depending on slub_debug
> parameter instead of allowing stack_depot_init() can be called
> in kmem_cache_init() for simplicity. ]
>
> Link: https://lkml.org/lkml/2022/2/28/238

This would be a better permalink:
https://lore.kernel.org/all/YhyeaP8lrzKgKm5A@ip-172-31-19-208.ap-northeast-1.compute.internal/

> Fixes: ba10d4b46655 ("mm/slub: use stackdepot to save stack trace in objects")

This commit does not exist in -next.

I assume you intend that "lib/stackdepot: Use page allocator if both
slab and memblock is unavailable" should be dropped now.

> Signed-off-by: Hyeonggon Yoo <[email protected]>
> ---
> include/linux/slab.h | 1 +
> init/main.c | 1 +
> mm/slab.c | 4 ++++
> mm/slob.c | 4 ++++
> mm/slub.c | 28 +++++++++++++++++++++++++---
> 5 files changed, 35 insertions(+), 3 deletions(-)
[...]
>
> +/* Initialize stack depot if needed */
> +void __init kmem_cache_init_early(void)
> +{
> +#ifdef CONFIG_STACKDEPOT
> + slab_flags_t block_flags;
> + char *next_block;
> + char *slab_list;
> +
> + if (slub_debug & SLAB_STORE_USER)
> + goto init_stack_depot;
> +
> + next_block = slub_debug_string;
> + while (next_block) {
> + next_block = parse_slub_debug_flags(next_block, &block_flags, &slab_list, false);
> + if (block_flags & SLAB_STORE_USER)
> + goto init_stack_depot;
> + }
> +
> + return;
> +
> +init_stack_depot:
> + stack_depot_init();
> +#endif
> +}

You can simplify this function to avoid the goto:

/* Initialize stack depot if needed */
void __init kmem_cache_init_early(void)
{
#ifdef CONFIG_STACKDEPOT
        slab_flags_t flags = slub_debug;
        char *next_block = slub_debug_string;
        char *slab_list;

        for (;;) {
                if (flags & SLAB_STORE_USER) {
                        stack_depot_init();
                        break;
                }
                if (!next_block)
                        break;
                next_block = parse_slub_debug_flags(next_block, &flags, &slab_list, false);
        }
#endif
}

^^ with this version, it'd also be much easier and less confusing to add
other initialization logic unrelated to stackdepot later after the loop
(should it ever be required).

2022-02-28 19:11:01

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects

On 2/26/22 11:24, Hyeonggon Yoo wrote:
> On Fri, Feb 25, 2022 at 07:03:15PM +0100, Vlastimil Babka wrote:
>> From: Oliver Glitta <[email protected]>
>>
>> Many stack traces are similar so there are many similar arrays.
>> Stackdepot saves each unique stack only once.
>>
>> Replace field addrs in struct track with depot_stack_handle_t handle. Use
>> stackdepot to save stack trace.
>>
>> The benefits are smaller memory overhead and possibility to aggregate
>> per-cache statistics in the following patch using the stackdepot handle
>> instead of matching stacks manually.
>>
>> [ [email protected]: rebase to 5.17-rc1 and adjust accordingly ]
>>
>> This was initially merged as commit 788691464c29 and reverted by commit
>> ae14c63a9f20 due to several issues, that should now be fixed.
>> The problem of unconditional memory overhead by stackdepot has been
>> addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
>> and stack_table allocation by kvmalloc()"), so the dependency on
>> stackdepot will result in extra memory usage only when a slab cache
>> tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
>> The build failures on some architectures were also addressed, and the
>> reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this
>> patch.
>>
>> Signed-off-by: Oliver Glitta <[email protected]>
>> Signed-off-by: Vlastimil Babka <[email protected]>
>> Cc: David Rientjes <[email protected]>
>> Cc: Christoph Lameter <[email protected]>
>> Cc: Pekka Enberg <[email protected]>
>> Cc: Joonsoo Kim <[email protected]>
>> ---
>> init/Kconfig | 1 +
>> mm/slub.c | 88 +++++++++++++++++++++++++++++-----------------------
>> 2 files changed, 50 insertions(+), 39 deletions(-)
>>
>> diff --git a/init/Kconfig b/init/Kconfig
>> index e9119bf54b1f..b21dd3a4a106 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -1871,6 +1871,7 @@ config SLUB_DEBUG
>> default y
>> bool "Enable SLUB debugging support" if EXPERT
>> depends on SLUB && SYSFS
>> + select STACKDEPOT if STACKTRACE_SUPPORT
>> help
>> SLUB has extensive debug support features. Disabling these can
>> result in significant savings in code size. This also disables
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 1fc451f4fe62..3140f763e819 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -26,6 +26,7 @@
>> #include <linux/cpuset.h>
>> #include <linux/mempolicy.h>
>> #include <linux/ctype.h>
>> +#include <linux/stackdepot.h>
>> #include <linux/debugobjects.h>
>> #include <linux/kallsyms.h>
>> #include <linux/kfence.h>
>> @@ -264,8 +265,8 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
>> #define TRACK_ADDRS_COUNT 16
>> struct track {
>> unsigned long addr; /* Called from address */
>> -#ifdef CONFIG_STACKTRACE
>> - unsigned long addrs[TRACK_ADDRS_COUNT]; /* Called from address */
>> +#ifdef CONFIG_STACKDEPOT
>> + depot_stack_handle_t handle;
>> #endif
>> int cpu; /* Was running on cpu */
>> int pid; /* Pid context */
>> @@ -724,22 +725,20 @@ static struct track *get_track(struct kmem_cache *s, void *object,
>> return kasan_reset_tag(p + alloc);
>> }
>>
>> -static void set_track(struct kmem_cache *s, void *object,
>> - enum track_item alloc, unsigned long addr)
>> +static noinline void
>> +set_track(struct kmem_cache *s, void *object, enum track_item alloc,
>> + unsigned long addr, gfp_t flags)
>> {
>> struct track *p = get_track(s, object, alloc);
>>
>> -#ifdef CONFIG_STACKTRACE
>> +#ifdef CONFIG_STACKDEPOT
>> + unsigned long entries[TRACK_ADDRS_COUNT];
>> unsigned int nr_entries;
>>
>> - metadata_access_enable();
>> - nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
>> - TRACK_ADDRS_COUNT, 3);
>> - metadata_access_disable();
>> -
>> - if (nr_entries < TRACK_ADDRS_COUNT)
>> - p->addrs[nr_entries] = 0;
>> + nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
>> + p->handle = stack_depot_save(entries, nr_entries, flags);
>> #endif
>> +
>> p->addr = addr;
>> p->cpu = smp_processor_id();
>> p->pid = current->pid;
>> @@ -759,20 +758,19 @@ static void init_tracking(struct kmem_cache *s, void *object)
>>
>> static void print_track(const char *s, struct track *t, unsigned long pr_time)
>> {
>> + depot_stack_handle_t handle __maybe_unused;
>> +
>> if (!t->addr)
>> return;
>>
>> pr_err("%s in %pS age=%lu cpu=%u pid=%d\n",
>> s, (void *)t->addr, pr_time - t->when, t->cpu, t->pid);
>> -#ifdef CONFIG_STACKTRACE
>> - {
>> - int i;
>> - for (i = 0; i < TRACK_ADDRS_COUNT; i++)
>> - if (t->addrs[i])
>> - pr_err("\t%pS\n", (void *)t->addrs[i]);
>> - else
>> - break;
>> - }
>> +#ifdef CONFIG_STACKDEPOT
>> + handle = READ_ONCE(t->handle);
>> + if (handle)
>> + stack_depot_print(handle);
>> + else
>> + pr_err("object allocation/free stack trace missing\n");
>> #endif
>> }
>>
>> @@ -1304,9 +1302,9 @@ static inline int alloc_consistency_checks(struct kmem_cache *s,
>> return 1;
>> }
>>
>> -static noinline int alloc_debug_processing(struct kmem_cache *s,
>> - struct slab *slab,
>> - void *object, unsigned long addr)
>> +static noinline int
>> +alloc_debug_processing(struct kmem_cache *s, struct slab *slab, void *object,
>> + unsigned long addr, gfp_t flags)
>> {
>> if (s->flags & SLAB_CONSISTENCY_CHECKS) {
>> if (!alloc_consistency_checks(s, slab, object))
>> @@ -1315,7 +1313,7 @@ static noinline int alloc_debug_processing(struct kmem_cache *s,
>>
>> /* Success perform special debug activities for allocs */
>> if (s->flags & SLAB_STORE_USER)
>> - set_track(s, object, TRACK_ALLOC, addr);
>> + set_track(s, object, TRACK_ALLOC, addr, flags);
>
> I see warning because of this.
> We should not reuse flags here because alloc_debug_processing() can be
> called with preemption disabled, and caller specified GFP_KERNEL.

Ugh, thanks for catching this; it looks like I forgot to test with the
necessary config options. Indeed, the previous version of this patch,
commit 788691464c29, used GFP_NOWAIT. I took the idea of passing the
allocation gfpflags from Imran's version (another Cc I forgot, sorry):
https://lore.kernel.org/all/[email protected]/
...

> [ 2.015902] BUG: sleeping function called from invalid context at mm/page_alloc.c:5164
> [ 2.022052] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
> [ 2.028357] preempt_count: 1, expected: 0
> [ 2.031508] RCU nest depth: 0, expected: 0
> [ 2.034722] 1 lock held by swapper/0/1:
> [ 2.037905] #0: ffff00000488f4d0 (&sb->s_type->i_mutex_key#5){+.+.}-{4:4}, at: start_creating+0x58/0x130
> [ 2.045393] Preemption disabled at:
> [ 2.045400] [<ffff8000083bd008>] __slab_alloc.constprop.0+0x38/0xc0

... but indeed __slab_alloc() disables preemption so that won't work, and we
can only safely use GFP_NOWAIT. Will fix in v2, thanks.
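
Roughly something like this (untested sketch, not the actual v2):

static noinline void
set_track(struct kmem_cache *s, void *object, enum track_item alloc,
          unsigned long addr)
{
        struct track *p = get_track(s, object, alloc);

#ifdef CONFIG_STACKDEPOT
        unsigned long entries[TRACK_ADDRS_COUNT];
        unsigned int nr_entries;

        nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
        /*
         * Callers such as ___slab_alloc() run with preemption disabled, so
         * the caller's gfpflags (possibly GFP_KERNEL) must not reach the
         * page allocator from here; GFP_NOWAIT keeps stack_depot_save()
         * from sleeping.
         */
        p->handle = stack_depot_save(entries, nr_entries, GFP_NOWAIT);
#endif

        p->addr = addr;
        p->cpu = smp_processor_id();
        p->pid = current->pid;
        p->when = jiffies;
}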

> [ 2.053039] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.17.0-rc5+ #105
> [ 2.059365] Hardware name: linux,dummy-virt (DT)
> [ 2.063160] Call trace:
> [ 2.065217] dump_backtrace+0xf8/0x130
> [ 2.068350] show_stack+0x24/0x80
> [ 2.071104] dump_stack_lvl+0x9c/0xd8
> [ 2.074140] dump_stack+0x18/0x34
> [ 2.076894] __might_resched+0x1a0/0x280
> [ 2.080146] __might_sleep+0x58/0x90
> [ 2.083108] prepare_alloc_pages.constprop.0+0x1b4/0x1f0
> [ 2.087468] __alloc_pages+0x88/0x1e0
> [ 2.090502] alloc_page_interleave+0x24/0xb4
> [ 2.094021] alloc_pages+0x10c/0x170
> [ 2.096984] __stack_depot_save+0x3e0/0x4e0
> [ 2.100446] stack_depot_save+0x14/0x20
> [ 2.103617] set_track.isra.0+0x64/0xa4
> [ 2.106787] alloc_debug_processing+0x11c/0x1e0
> [ 2.110532] ___slab_alloc+0x3e8/0x750
> [ 2.113643] __slab_alloc.constprop.0+0x64/0xc0
> [ 2.117391] kmem_cache_alloc+0x304/0x350
> [ 2.120702] security_inode_alloc+0x38/0xa4
> [ 2.124169] inode_init_always+0xd0/0x264
> [ 2.127501] alloc_inode+0x44/0xec
> [ 2.130325] new_inode+0x28/0xc0
> [ 2.133011] tracefs_create_file+0x74/0x1e0
> [ 2.136459] init_tracer_tracefs+0x248/0x644
> [ 2.140030] tracer_init_tracefs+0x9c/0x34c
> [ 2.143483] do_one_initcall+0x44/0x170
> [ 2.146654] do_initcalls+0x104/0x144
> [ 2.149704] kernel_init_freeable+0x130/0x178
>
> [...]
>
>> trace(s, slab, object, 1);
>> init_object(s, object, SLUB_RED_ACTIVE);
>> return 1;
>> @@ -1395,7 +1393,7 @@ static noinline int free_debug_processing(
>> }
>>
>> if (s->flags & SLAB_STORE_USER)
>> - set_track(s, object, TRACK_FREE, addr);
>> + set_track(s, object, TRACK_FREE, addr, GFP_NOWAIT);
>> trace(s, slab, object, 0);
>> /* Freepointer not overwritten by init_object(), SLAB_POISON moved it */
>> init_object(s, object, SLUB_RED_INACTIVE);
>> @@ -1632,7 +1630,8 @@ static inline
>> void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}
>>
>> static inline int alloc_debug_processing(struct kmem_cache *s,
>> - struct slab *slab, void *object, unsigned long addr) { return 0; }
>> + struct slab *slab, void *object, unsigned long addr,
>> + gfp_t flags) { return 0; }
>>
>> static inline int free_debug_processing(
>> struct kmem_cache *s, struct slab *slab,
>> @@ -3033,7 +3032,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>> check_new_slab:
>>
>> if (kmem_cache_debug(s)) {
>> - if (!alloc_debug_processing(s, slab, freelist, addr)) {
>> + if (!alloc_debug_processing(s, slab, freelist, addr, gfpflags)) {
>> /* Slab failed checks. Next slab needed */
>> goto new_slab;
>> } else {
>> @@ -4221,6 +4220,9 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
>> s->remote_node_defrag_ratio = 1000;
>> #endif
>>
>> + if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
>> + stack_depot_init();
>> +
>
> As mentioned in my report, it can crash system when creating boot caches
> with debugging enabled.
>
> The rest looks fine!
>
>> /* Initialize the pre-computed randomized freelist if slab is up */
>> if (slab_state >= UP) {
>> if (init_cache_random_seq(s))
>> @@ -4352,18 +4354,26 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
>> objp = fixup_red_left(s, objp);
>> trackp = get_track(s, objp, TRACK_ALLOC);
>> kpp->kp_ret = (void *)trackp->addr;
>> -#ifdef CONFIG_STACKTRACE
>> - for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
>> - kpp->kp_stack[i] = (void *)trackp->addrs[i];
>> - if (!kpp->kp_stack[i])
>> - break;
>> - }
>> +#ifdef CONFIG_STACKDEPOT
>> + {
>> + depot_stack_handle_t handle;
>> + unsigned long *entries;
>> + unsigned int nr_entries;
>> +
>> + handle = READ_ONCE(trackp->handle);
>> + if (handle) {
>> + nr_entries = stack_depot_fetch(handle, &entries);
>> + for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
>> + kpp->kp_stack[i] = (void *)entries[i];
>> + }
>>
>> - trackp = get_track(s, objp, TRACK_FREE);
>> - for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
>> - kpp->kp_free_stack[i] = (void *)trackp->addrs[i];
>> - if (!kpp->kp_free_stack[i])
>> - break;
>> + trackp = get_track(s, objp, TRACK_FREE);
>> + handle = READ_ONCE(trackp->handle);
>> + if (handle) {
>> + nr_entries = stack_depot_fetch(handle, &entries);
>> + for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
>> + kpp->kp_free_stack[i] = (void *)entries[i];
>> + }
>> }
>> #endif
>> #endif
>> --
>> 2.35.1
>>
>>
>

2022-02-28 20:27:15

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

On 2/26/22 08:19, Hyeonggon Yoo wrote:
> On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
>> Hi,
>>
>> this series combines and revives patches from Oliver's last year
>> bachelor thesis (where I was the advisor) that make SLUB's debugfs
>> files alloc_traces and free_traces more useful.
>> The resubmission was blocked on stackdepot changes that are now merged,
>> as explained in patch 2.
>>
>
> Hello. I just started review/testing this series.
>
> it crashed on my system (arm64)

Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
from memblock. arm64 must have memblock freeing happen earlier or something.
(CCing memblock experts)

> I ran with boot parameter slub_debug=U, and without KASAN.
> So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
>
> void * __init memblock_alloc_try_nid(
> phys_addr_t size, phys_addr_t align,
> phys_addr_t min_addr, phys_addr_t max_addr,
> int nid)
> {
> void *ptr;
>
> memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> __func__, (u64)size, (u64)align, nid, &min_addr,
> &max_addr, (void *)_RET_IP_);
> ptr = memblock_alloc_internal(size, align,
> min_addr, max_addr, nid, false);
> if (ptr)
> memset(ptr, 0, size); <--- Crash Here
>
> return ptr;
> }
>
> It crashed during create_boot_cache() -> stack_depot_init() ->
> memblock_alloc().
>
> I think That's because, in kmem_cache_init(), both slab and memblock is not
> available. (AFAIU memblock is not available after mem_init() because of
> memblock_free_all(), right?)

Hm, yes, I see; even the x86_64 version of mem_init() calls memblock_free_all().
But then I would expect stack_depot_init() to detect that memblock_alloc()
returns NULL, print "Stack Depot hash table allocation failed, disabling"
and disable itself. Instead it seems memblock_alloc() returns something
that's potentially already used by somebody else? Sounds like a bug?

> Thanks!
>
> /*
> * Set up kernel memory allocators
> */
> static void __init mm_init(void)
> {
> /*
> * page_ext requires contiguous pages,
> * bigger than MAX_ORDER unless SPARSEMEM.
> */
> page_ext_init_flatmem();
> init_mem_debugging_and_hardening();
> kfence_alloc_pool();
> report_meminit();
> stack_depot_early_init();
> mem_init();
> mem_init_print_info();
> kmem_cache_init();
> /*
> * page_owner must be initialized after buddy is ready, and also after
> * slab is ready so that stack_depot_init() works properly
> */)
>
>> Patch 1 is a new preparatory cleanup.
>>
>> Patch 2 originally submitted here [1], was merged to mainline but
>> reverted for stackdepot related issues as explained in the patch.
>>
>> Patches 3-5 originally submitted as RFC here [2]. In this submission I
>> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
>> be considered too intrusive so I will postpone it for later. The docs
>> patch is adjusted accordingly.
>>
>> Also available in git, based on v5.17-rc1:
>> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
>>
>> I'd like to ask for some review before I add this to the slab tree.
>>
>> [1] https://lore.kernel.org/all/[email protected]/
>> [2] https://lore.kernel.org/all/[email protected]/
>>
>> Oliver Glitta (4):
>> mm/slub: use stackdepot to save stack trace in objects
>> mm/slub: aggregate and print stack traces in debugfs files
>> mm/slub: sort debugfs output by frequency of stack traces
>> slab, documentation: add description of debugfs files for SLUB caches
>>
>> Vlastimil Babka (1):
>> mm/slub: move struct track init out of set_track()
>>
>> Documentation/vm/slub.rst | 61 +++++++++++++++
>> init/Kconfig | 1 +
>> mm/slub.c | 152 +++++++++++++++++++++++++-------------
>> 3 files changed, 162 insertions(+), 52 deletions(-)
>>
>> --
>> 2.35.1
>>
>>
>

2022-02-28 20:41:24

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> >> Hi,
> >>
> >> this series combines and revives patches from Oliver's last year
> >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> >> files alloc_traces and free_traces more useful.
> >> The resubmission was blocked on stackdepot changes that are now merged,
> >> as explained in patch 2.
> >>
> >
> > Hello. I just started review/testing this series.
> >
> > it crashed on my system (arm64)
>
> Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> from memblock. arm64 must have memblock freeing happen earlier or something.
> (CCing memblock experts)
>
> > I ran with boot parameter slub_debug=U, and without KASAN.
> > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> >
> > void * __init memblock_alloc_try_nid(
> > phys_addr_t size, phys_addr_t align,
> > phys_addr_t min_addr, phys_addr_t max_addr,
> > int nid)
> > {
> > void *ptr;
> >
> > memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> > __func__, (u64)size, (u64)align, nid, &min_addr,
> > &max_addr, (void *)_RET_IP_);
> > ptr = memblock_alloc_internal(size, align,
> > min_addr, max_addr, nid, false);
> > if (ptr)
> > memset(ptr, 0, size); <--- Crash Here
> >
> > return ptr;
> > }
> >
> > It crashed during create_boot_cache() -> stack_depot_init() ->
> > memblock_alloc().
> >
> > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > available. (AFAIU memblock is not available after mem_init() because of
> > memblock_free_all(), right?)
>
> Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> But then, I would expect stack_depot_init() to detect that memblock_alloc()
> returns NULL, we print ""Stack Depot hash table allocation failed,
> disabling" and disable it. Instead it seems memblock_alloc() returns
> something that's already potentially used by somebody else? Sounds like a bug?

If stack_depot_init() is called from kmem_cache_init(), there will be
confusion about which allocator should be used, because both
stack_depot_init() and memblock use slab_is_available() to decide when to
stop using memblock and start using kmalloc() instead.

Hyeonggon, did you run your tests with panic-on-warn by any chance?
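
For reference, the confusion boils down to both sides doing roughly this
(trimmed from the 5.17-era lib/stackdepot.c and mm/memblock.c):

        /* lib/stackdepot.c, stack_depot_init() */
        if (slab_is_available())
                stack_table = kvmalloc(size, GFP_KERNEL);
        else
                stack_table = memblock_alloc(size, SMP_CACHE_BYTES);

        /* mm/memblock.c, memblock_alloc_internal() */
        if (WARN_ON_ONCE(slab_is_available()))
                return kzalloc(size, GFP_NOWAIT);

During kmem_cache_init() slab_is_available() is still false, so stackdepot
falls back to memblock_alloc() and memblock does not redirect to kzalloc()
either, even though memblock_free_all() has already handed its memory over
to the buddy allocator.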

> > Thanks!
> >
> > /*
> > * Set up kernel memory allocators
> > */
> > static void __init mm_init(void)
> > {
> > /*
> > * page_ext requires contiguous pages,
> > * bigger than MAX_ORDER unless SPARSEMEM.
> > */
> > page_ext_init_flatmem();
> > init_mem_debugging_and_hardening();
> > kfence_alloc_pool();
> > report_meminit();
> > stack_depot_early_init();
> > mem_init();
> > mem_init_print_info();
> > kmem_cache_init();
> > /*
> > * page_owner must be initialized after buddy is ready, and also after
> > * slab is ready so that stack_depot_init() works properly
> > */)
> >
> >> Patch 1 is a new preparatory cleanup.
> >>
> >> Patch 2 originally submitted here [1], was merged to mainline but
> >> reverted for stackdepot related issues as explained in the patch.
> >>
> >> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> >> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> >> be considered too intrusive so I will postpone it for later. The docs
> >> patch is adjusted accordingly.
> >>
> >> Also available in git, based on v5.17-rc1:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
> >>
> >> I'd like to ask for some review before I add this to the slab tree.
> >>
> >> [1] https://lore.kernel.org/all/[email protected]/
> >> [2] https://lore.kernel.org/all/[email protected]/
> >>
> >> Oliver Glitta (4):
> >> mm/slub: use stackdepot to save stack trace in objects
> >> mm/slub: aggregate and print stack traces in debugfs files
> >> mm/slub: sort debugfs output by frequency of stack traces
> >> slab, documentation: add description of debugfs files for SLUB caches
> >>
> >> Vlastimil Babka (1):
> >> mm/slub: move struct track init out of set_track()
> >>
> >> Documentation/vm/slub.rst | 61 +++++++++++++++
> >> init/Kconfig | 1 +
> >> mm/slub.c | 152 +++++++++++++++++++++++++-------------
> >> 3 files changed, 162 insertions(+), 52 deletions(-)
> >>
> >> --
> >> 2.35.1
> >>
> >>
> >
>

--
Sincerely yours,
Mike.

2022-02-28 21:29:16

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

On Mon, Feb 28, 2022 at 10:01:29PM +0200, Mike Rapoport wrote:
> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> > >> Hi,
> > >>
> > >> this series combines and revives patches from Oliver's last year
> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> > >> files alloc_traces and free_traces more useful.
> > >> The resubmission was blocked on stackdepot changes that are now merged,
> > >> as explained in patch 2.
> > >>
> > >
> > > Hello. I just started review/testing this series.
> > >
> > > it crashed on my system (arm64)
> >
> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> > from memblock. arm64 must have memblock freeing happen earlier or something.
> > (CCing memblock experts)
> >
> > > I ran with boot parameter slub_debug=U, and without KASAN.
> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> > >
> > > void * __init memblock_alloc_try_nid(
> > > phys_addr_t size, phys_addr_t align,
> > > phys_addr_t min_addr, phys_addr_t max_addr,
> > > int nid)
> > > {
> > > void *ptr;
> > >
> > > memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> > > __func__, (u64)size, (u64)align, nid, &min_addr,
> > > &max_addr, (void *)_RET_IP_);
> > > ptr = memblock_alloc_internal(size, align,
> > > min_addr, max_addr, nid, false);
> > > if (ptr)
> > > memset(ptr, 0, size); <--- Crash Here
> > >
> > > return ptr;
> > > }
> > >
> > > It crashed during create_boot_cache() -> stack_depot_init() ->
> > > memblock_alloc().
> > >
> > > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > > available. (AFAIU memblock is not available after mem_init() because of
> > > memblock_free_all(), right?)
> >
> > Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> > But then, I would expect stack_depot_init() to detect that memblock_alloc()
> > returns NULL, we print ""Stack Depot hash table allocation failed,
> > disabling" and disable it. Instead it seems memblock_alloc() returns
> > something that's already potentially used by somebody else? Sounds like a bug?
>

It's really weird, but memblock_alloc() did not fail after
memblock_free_all(); it just crashed while initializing the memory it
returned.

> If stack_depot_init() is called from kmem_cache_init(), there will be a
> confusion what allocator should be used because we use slab_is_available()
> to stop using memblock and start using kmalloc() instead in both
> stack_depot_init() and in memblock.
>
> Hyeonggon, did you run your tests with panic on warn at any chance?
>

Yeah, I think this stack trace would help:

[ 0.000000] Stack Depot allocating hash table with memblock_alloc
[ 0.000000] Unable to handle kernel paging request at virtual address ffff000097400000
[ 0.000000] Mem abort info:
[ 0.000000] ESR = 0x96000047
[ 0.000000] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.000000] SET = 0, FnV = 0
[ 0.000000] EA = 0, S1PTW = 0
[ 0.000000] FSC = 0x07: level 3 translation fault
[ 0.000000] Data abort info:
[ 0.000000] ISV = 0, ISS = 0x00000047
[ 0.000000] CM = 0, WnR = 1
[ 0.000000] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041719000
[ 0.000000] [ffff000097400000] pgd=18000000dcff8003, p4d=18000000dcff8003, pud=18000000dcbfe003, pmd=18000000dcb43003, pte=00680000d7400706
[ 0.000000] Internal error: Oops: 96000047 [#1] PREEMPT SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-rc1-11918-gbf5d03166d75 #51
[ 0.000000] Hardware name: linux,dummy-virt (DT)
[ 0.000000] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.000000] pc : __memset+0x16c/0x188
[ 0.000000] lr : memblock_alloc_try_nid+0xcc/0xe4
[ 0.000000] sp : ffff800009a33cd0
[ 0.000000] x29: ffff800009a33cd0 x28: 0000000041720018 x27: ffff800009362640
[ 0.000000] x26: ffff800009362640 x25: 0000000000000000 x24: 0000000000000000
[ 0.000000] x23: 0000000000002000 x22: ffff80000932bb50 x21: 00000000ffffffff
[ 0.000000] x20: ffff000097400000 x19: 0000000000800000 x18: ffffffffffffffff
[ 0.000000] x17: 373578302f383278 x16: 302b657461657263 x15: 0000001000000000
[ 0.000000] x14: 0000000000000360 x13: 0000000000009f8c x12: 00000000dcb0c070
[ 0.000000] x11: 0000001000000000 x10: 00000000004ea000 x9 : 0000000000000000
[ 0.000000] x8 : ffff000097400000 x7 : 0000000000000000 x6 : 000000000000003f
[ 0.000000] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004
[ 0.000000] x2 : 00000000007fffc0 x1 : 0000000000000000 x0 : ffff000097400000
[ 0.000000] Call trace:
[ 0.000000] __memset+0x16c/0x188
[ 0.000000] stack_depot_init+0xc8/0x100
[ 0.000000] __kmem_cache_create+0x454/0x570
[ 0.000000] create_boot_cache+0xa0/0xe0
[ 0.000000] kmem_cache_init+0xf8/0x204
[ 0.000000] start_kernel+0x3ec/0x668
[ 0.000000] __primary_switched+0xc0/0xc8
[ 0.000000] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428)
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---


Thanks!

> > > Thanks!
> > >
> > > /*
> > > * Set up kernel memory allocators
> > > */
> > > static void __init mm_init(void)
> > > {
> > > /*
> > > * page_ext requires contiguous pages,
> > > * bigger than MAX_ORDER unless SPARSEMEM.
> > > */
> > > page_ext_init_flatmem();
> > > init_mem_debugging_and_hardening();
> > > kfence_alloc_pool();
> > > report_meminit();
> > > stack_depot_early_init();
> > > mem_init();
> > > mem_init_print_info();
> > > kmem_cache_init();
> > > /*
> > > * page_owner must be initialized after buddy is ready, and also after
> > > * slab is ready so that stack_depot_init() works properly
> > > */)
> > >
> > >> Patch 1 is a new preparatory cleanup.
> > >>
> > >> Patch 2 originally submitted here [1], was merged to mainline but
> > >> reverted for stackdepot related issues as explained in the patch.
> > >>
> > >> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> > >> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> > >> be considered too intrusive so I will postpone it for later. The docs
> > >> patch is adjusted accordingly.
> > >>
> > >> Also available in git, based on v5.17-rc1:
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
> > >>
> > >> I'd like to ask for some review before I add this to the slab tree.
> > >>
> > >> [1] https://lore.kernel.org/all/[email protected]/
> > >> [2] https://lore.kernel.org/all/[email protected]/
> > >>
> > >> Oliver Glitta (4):
> > >> mm/slub: use stackdepot to save stack trace in objects
> > >> mm/slub: aggregate and print stack traces in debugfs files
> > >> mm/slub: sort debugfs output by frequency of stack traces
> > >> slab, documentation: add description of debugfs files for SLUB caches
> > >>
> > >> Vlastimil Babka (1):
> > >> mm/slub: move struct track init out of set_track()
> > >>
> > >> Documentation/vm/slub.rst | 61 +++++++++++++++
> > >> init/Kconfig | 1 +
> > >> mm/slub.c | 152 +++++++++++++++++++++++++-------------
> > >> 3 files changed, 162 insertions(+), 52 deletions(-)
> > >>
> > >> --
> > >> 2.35.1
> > >>
> > >>
> > >
> >
>
> --
> Sincerely yours,
> Mike.

--
Thank you, You are awesome!
Hyeonggon :-)

2022-02-28 21:31:06

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> >> Hi,
> >>
> >> this series combines and revives patches from Oliver's last year
> >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> >> files alloc_traces and free_traces more useful.
> >> The resubmission was blocked on stackdepot changes that are now merged,
> >> as explained in patch 2.
> >>
> >
> > Hello. I just started review/testing this series.
> >
> > it crashed on my system (arm64)
>
> Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> from memblock. arm64 must have memblock freeing happen earlier or something.
> (CCing memblock experts)
>
> > I ran with boot parameter slub_debug=U, and without KASAN.
> > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> >
> > void * __init memblock_alloc_try_nid(
> > phys_addr_t size, phys_addr_t align,
> > phys_addr_t min_addr, phys_addr_t max_addr,
> > int nid)
> > {
> > void *ptr;
> >
> > memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> > __func__, (u64)size, (u64)align, nid, &min_addr,
> > &max_addr, (void *)_RET_IP_);
> > ptr = memblock_alloc_internal(size, align,
> > min_addr, max_addr, nid, false);
> > if (ptr)
> > memset(ptr, 0, size); <--- Crash Here
> >
> > return ptr;
> > }
> >
> > It crashed during create_boot_cache() -> stack_depot_init() ->
> > memblock_alloc().
> >
> > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > available. (AFAIU memblock is not available after mem_init() because of
> > memblock_free_all(), right?)
>
> Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> But then, I would expect stack_depot_init() to detect that memblock_alloc()
> returns NULL, we print ""Stack Depot hash table allocation failed,
> disabling" and disable it. Instead it seems memblock_alloc() returns
> something that's already potentially used by somebody else? Sounds like a bug?


By the way, I fixed this by allowing stack_depot_init() to be called from
kmem_cache_init() too [1], and Marco suggested calling stack_depot_init()
depending on the slub_debug parameter instead, for simplicity [2].

I would prefer [2]. Would you take a look?

[1] https://lkml.org/lkml/2022/2/27/31

[2] https://lkml.org/lkml/2022/2/28/717

> > Thanks!
> >
> > /*
> > * Set up kernel memory allocators
> > */
> > static void __init mm_init(void)
> > {
> > /*
> > * page_ext requires contiguous pages,
> > * bigger than MAX_ORDER unless SPARSEMEM.
> > */
> > page_ext_init_flatmem();
> > init_mem_debugging_and_hardening();
> > kfence_alloc_pool();
> > report_meminit();
> > stack_depot_early_init();
> > mem_init();
> > mem_init_print_info();
> > kmem_cache_init();
> > /*
> > * page_owner must be initialized after buddy is ready, and also after
> > * slab is ready so that stack_depot_init() works properly
> > */)
> >

--
Thank you, You are awesome!
Hyeonggon :-)

2022-03-01 00:46:25

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH] mm/slub: initialize stack depot in boot process

On 2/28/22 16:09, Hyeonggon Yoo wrote:
> commit ba10d4b46655 ("mm/slub: use stackdepot to save stack trace in
> objects") initializes stack depot while creating cache if SLAB_STORE_USER
> flag is set.
>
> This can make kernel crash because a cache can be created in various
> contexts. For example if user sets slub_debug=U, kernel crashes
> because create_boot_cache() calls stack_depot_init(), which tries to
> allocate hash table using memblock_alloc() if slab is not available.
> But memblock is also not available at that time.
>
> This patch solves the problem by initializing stack depot early
> in boot process if SLAB_STORE_USER debug flag is set globally
> or the flag is set to at least one cache.
>
> [ [email protected]: initialize stack depot depending on slub_debug
> parameter instead of allowing stack_depot_init() can be called
> in kmem_cache_init() for simplicity. ]
>
> Link: https://lkml.org/lkml/2022/2/28/238
> Fixes: ba10d4b46655 ("mm/slub: use stackdepot to save stack trace in objects")
> Signed-off-by: Hyeonggon Yoo <[email protected]>

I think a much easier approach would be to do this checking in
setup_slub_debug(). There we may either detect SLAB_STORE_USER in
global_flags, or check the flags returned by parse_slub_debug_flags() in the
while (str) loop, in the 'else' case where slab_list is present. Both cases
would just set some variable that stack_depot_early_init() (the
!CONFIG_STACKDEPOT_ALWAYS_INIT version, or a newly consolidated one) would
check. So that would be another way to request stack_depot_init() at a
well-defined point of boot, similar to CONFIG_STACKDEPOT_ALWAYS_INIT. This
works because setup_slub_debug() is called via __setup, which is processed
from start_kernel() -> parse_args(), i.e. before mm_init() -> stack_depot_early_init().
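
Something like this (completely untested sketch; stack_depot_request_early_init()
and the flag name are just placeholders, not an existing API):

/* lib/stackdepot.c -- sketch only */
static bool stack_depot_needed_early __initdata;

void __init stack_depot_request_early_init(void)
{
	stack_depot_needed_early = true;
}

/* the consolidated stack_depot_early_init(), still called from mm_init() */
int __init stack_depot_early_init(void)
{
	if (!stack_depot_needed_early || stack_depot_disable || stack_table)
		return 0;

	stack_table = memblock_alloc(STACK_HASH_SIZE * sizeof(struct stack_record *),
				     SMP_CACHE_BYTES);
	if (!stack_table) {
		pr_err("Stack Depot hash table allocation failed, disabling\n");
		stack_depot_disable = true;
		return -ENOMEM;
	}
	return 0;
}

/* mm/slub.c, in setup_slub_debug(), once SLAB_STORE_USER has been seen
 * either in global_flags or in a per-cache block returned by
 * parse_slub_debug_flags() */
	if (IS_ENABLED(CONFIG_STACKDEPOT))
		stack_depot_request_early_init();

The CONFIG_STACKDEPOT_ALWAYS_INIT=y case could then simply set the same flag
unconditionally.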

> ---
> include/linux/slab.h | 1 +
> init/main.c | 1 +
> mm/slab.c | 4 ++++
> mm/slob.c | 4 ++++
> mm/slub.c | 28 +++++++++++++++++++++++++---
> 5 files changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 37bde99b74af..023f3f71ae35 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -139,6 +139,7 @@ struct mem_cgroup;
> /*
> * struct kmem_cache related prototypes
> */
> +void __init kmem_cache_init_early(void);
> void __init kmem_cache_init(void);
> bool slab_is_available(void);
>
> diff --git a/init/main.c b/init/main.c
> index 65fa2e41a9c0..4fdb7975a085 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -835,6 +835,7 @@ static void __init mm_init(void)
> kfence_alloc_pool();
> report_meminit();
> stack_depot_early_init();
> + kmem_cache_init_early();
> mem_init();
> mem_init_print_info();
> kmem_cache_init();
> diff --git a/mm/slab.c b/mm/slab.c
> index ddf5737c63d9..80a6d01aab06 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1196,6 +1196,10 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
> }
> }
>
> +void __init kmem_cache_init_early(void)
> +{
> +}
> +
> /*
> * Initialisation. Called after the page allocator have been initialised and
> * before smp_init().
> diff --git a/mm/slob.c b/mm/slob.c
> index 60c5842215f1..00e323af8be4 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -715,6 +715,10 @@ struct kmem_cache kmem_cache_boot = {
> .align = ARCH_KMALLOC_MINALIGN,
> };
>
> +void __init kmem_cache_init_early(void)
> +{
> +}
> +
> void __init kmem_cache_init(void)
> {
> kmem_cache = &kmem_cache_boot;
> diff --git a/mm/slub.c b/mm/slub.c
> index a74afe59a403..40bcd18143b6 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4221,9 +4221,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
> s->remote_node_defrag_ratio = 1000;
> #endif
>
> - if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> - stack_depot_init();
> -
> /* Initialize the pre-computed randomized freelist if slab is up */
> if (slab_state >= UP) {
> if (init_cache_random_seq(s))
> @@ -4810,6 +4807,31 @@ static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache)
> return s;
> }
>
> +/* Initialize stack depot if needed */
> +void __init kmem_cache_init_early(void)
> +{
> +#ifdef CONFIG_STACKDEPOT
> + slab_flags_t block_flags;
> + char *next_block;
> + char *slab_list;
> +
> + if (slub_debug & SLAB_STORE_USER)
> + goto init_stack_depot;
> +
> + next_block = slub_debug_string;
> + while (next_block) {
> + next_block = parse_slub_debug_flags(next_block, &block_flags, &slab_list, false);
> + if (block_flags & SLAB_STORE_USER)
> + goto init_stack_depot;
> + }
> +
> + return;
> +
> +init_stack_depot:
> + stack_depot_init();
> +#endif
> +}
> +
> void __init kmem_cache_init(void)
> {
> static __initdata struct kmem_cache boot_kmem_cache,

2022-03-01 04:08:19

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH] mm/slub: initialize stack depot in boot process

On Mon, Feb 28, 2022 at 05:28:17PM +0100, Marco Elver wrote:
> On Mon, Feb 28, 2022 at 03:09PM +0000, Hyeonggon Yoo wrote:
> > commit ba10d4b46655 ("mm/slub: use stackdepot to save stack trace in
> > objects") initializes stack depot while creating cache if SLAB_STORE_USER
> > flag is set.
> >
> > This can make kernel crash because a cache can be created in various
> > contexts. For example if user sets slub_debug=U, kernel crashes
> > because create_boot_cache() calls stack_depot_init(), which tries to
> > allocate hash table using memblock_alloc() if slab is not available.
> > But memblock is also not available at that time.
> >
> > This patch solves the problem by initializing stack depot early
> > in boot process if SLAB_STORE_USER debug flag is set globally
> > or the flag is set to at least one cache.
> >
> > [ [email protected]: initialize stack depot depending on slub_debug
> > parameter instead of allowing stack_depot_init() can be called
> > in kmem_cache_init() for simplicity. ]
> >
> > Link: https://lkml.org/lkml/2022/2/28/238
>
> This would be a better permalink:
> https://lore.kernel.org/all/YhyeaP8lrzKgKm5A@ip-172-31-19-208.ap-northeast-1.compute.internal/
>

Agreed.

> > Fixes: ba10d4b46655 ("mm/slub: use stackdepot to save stack trace in objects")
>
> This commit does not exist in -next.
>

It has not landed in -next yet.

> I assume you intend that "lib/stackdepot: Use page allocator if both
> slab and memblock is unavailable" should be dropped now.
>

I did not intend that, but I agree the patch you mentioned
should be dropped now.

> > Signed-off-by: Hyeonggon Yoo <[email protected]>
> > ---
> > include/linux/slab.h | 1 +
> > init/main.c | 1 +
> > mm/slab.c | 4 ++++
> > mm/slob.c | 4 ++++
> > mm/slub.c | 28 +++++++++++++++++++++++++---
> > 5 files changed, 35 insertions(+), 3 deletions(-)
> [...]
> >
> > +/* Initialize stack depot if needed */
> > +void __init kmem_cache_init_early(void)
> > +{
> > +#ifdef CONFIG_STACKDEPOT
> > + slab_flags_t block_flags;
> > + char *next_block;
> > + char *slab_list;
> > +
> > + if (slub_debug & SLAB_STORE_USER)
> > + goto init_stack_depot;
> > +
> > + next_block = slub_debug_string;
> > + while (next_block) {
> > + next_block = parse_slub_debug_flags(next_block, &block_flags, &slab_list, false);
> > + if (block_flags & SLAB_STORE_USER)
> > + goto init_stack_depot;
> > + }
> > +
> > + return;
> > +
> > +init_stack_depot:
> > + stack_depot_init();
> > +#endif
> > +}
>
> You can simplify this function to avoid the goto:
>
> /* Initialize stack depot if needed */
> void __init kmem_cache_init_early(void)
> {
> #ifdef CONFIG_STACKDEPOT
> slab_flags_t flags = slub_debug;
> char *next_block = slub_debug_string;
> char *slab_list;
>
> for (;;) {
> if (flags & SLAB_STORE_USER) {
> stack_depot_init();
> break;
> }
> if (!next_block)
> break;
> next_block = parse_slub_debug_flags(next_block, &flags, &slab_list, false);
> }
> #endif
> }
>
> ^^ with this version, it'd also be much easier and less confusing to add
> other initialization logic unrelated to stackdepot later after the loop
> (should it ever be required).

Thank you for the nice suggestion, but I want to try it in
setup_slub_debug() as Vlastimil suggested!

Thanks.

--
Thank you, You are awesome!
Hyeonggon :-)

2022-03-01 05:07:00

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

On 2/28/22 21:01, Mike Rapoport wrote:
> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
>> On 2/26/22 08:19, Hyeonggon Yoo wrote:
>> > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
>> >> Hi,
>> >>
>> >> this series combines and revives patches from Oliver's last year
>> >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
>> >> files alloc_traces and free_traces more useful.
>> >> The resubmission was blocked on stackdepot changes that are now merged,
>> >> as explained in patch 2.
>> >>
>> >
>> > Hello. I just started review/testing this series.
>> >
>> > it crashed on my system (arm64)
>>
>> Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
>> from memblock. arm64 must have memblock freeing happen earlier or something.
>> (CCing memblock experts)
>>
>> > I ran with boot parameter slub_debug=U, and without KASAN.
>> > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
>> >
>> > void * __init memblock_alloc_try_nid(
>> > phys_addr_t size, phys_addr_t align,
>> > phys_addr_t min_addr, phys_addr_t max_addr,
>> > int nid)
>> > {
>> > void *ptr;
>> >
>> > memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
>> > __func__, (u64)size, (u64)align, nid, &min_addr,
>> > &max_addr, (void *)_RET_IP_);
>> > ptr = memblock_alloc_internal(size, align,
>> > min_addr, max_addr, nid, false);
>> > if (ptr)
>> > memset(ptr, 0, size); <--- Crash Here
>> >
>> > return ptr;
>> > }
>> >
>> > It crashed during create_boot_cache() -> stack_depot_init() ->
>> > memblock_alloc().
>> >
>> > I think That's because, in kmem_cache_init(), both slab and memblock is not
>> > available. (AFAIU memblock is not available after mem_init() because of
>> > memblock_free_all(), right?)
>>
>> Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
>> But then, I would expect stack_depot_init() to detect that memblock_alloc()
>> returns NULL, we print ""Stack Depot hash table allocation failed,
>> disabling" and disable it. Instead it seems memblock_alloc() returns
>> something that's already potentially used by somebody else? Sounds like a bug?
>
> If stack_depot_init() is called from kmem_cache_init(), there will be a
> confusion what allocator should be used because we use slab_is_available()
> to stop using memblock and start using kmalloc() instead in both
> stack_depot_init() and in memblock.

I did check that stack_depot_init() is called from kmem_cache_init()
*before* we make slab_is_available() true, hence I assumed that memblock would
still be available at that point and expected no confusion. But it seems that
if memblock is already past memblock_free_all(), then it being still available
is just an illusion?

> Hyeonggon, did you run your tests with panic on warn at any chance?
>

2022-03-01 09:52:39

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

Hi,

On Mon, Feb 28, 2022 at 09:27:02PM +0000, Hyeonggon Yoo wrote:
> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> > >> Hi,
> > >>
> > >> this series combines and revives patches from Oliver's last year
> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> > >> files alloc_traces and free_traces more useful.
> > >> The resubmission was blocked on stackdepot changes that are now merged,
> > >> as explained in patch 2.
> > >>
> > >
> > > Hello. I just started review/testing this series.
> > >
> > > it crashed on my system (arm64)
> >
> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> > from memblock. arm64 must have memblock freeing happen earlier or something.
> > (CCing memblock experts)
> >
> > > I ran with boot parameter slub_debug=U, and without KASAN.
> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> > >
> > > void * __init memblock_alloc_try_nid(
> > > phys_addr_t size, phys_addr_t align,
> > > phys_addr_t min_addr, phys_addr_t max_addr,
> > > int nid)
> > > {
> > > void *ptr;
> > >
> > > memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> > > __func__, (u64)size, (u64)align, nid, &min_addr,
> > > &max_addr, (void *)_RET_IP_);
> > > ptr = memblock_alloc_internal(size, align,
> > > min_addr, max_addr, nid, false);
> > > if (ptr)
> > > memset(ptr, 0, size); <--- Crash Here
> > >
> > > return ptr;
> > > }
> > >
> > > It crashed during create_boot_cache() -> stack_depot_init() ->
> > > memblock_alloc().
> > >
> > > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > > available. (AFAIU memblock is not available after mem_init() because of
> > > memblock_free_all(), right?)
> >
> > Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> > But then, I would expect stack_depot_init() to detect that memblock_alloc()
> > returns NULL, we print ""Stack Depot hash table allocation failed,
> > disabling" and disable it. Instead it seems memblock_alloc() returns
> > something that's already potentially used by somebody else? Sounds like a bug?
>
>
> By the way, I fixed this by allowing stack_depot_init() to be called in
> kmem_cache_init() too [1] and Marco suggested that calling
> stack_depot_init() depending on slub_debug parameter for simplicity. [2]
>
> I would prefer [2], Would you take a look?
>
> [1] https://lkml.org/lkml/2022/2/27/31
>
> [2] https://lkml.org/lkml/2022/2/28/717

I'm still looking at stack_depot_init() callers, but I think it's possible
to make stack_depot_early_init() a proper function that calls
memblock_alloc() and only use stack_depot_init() when slab is actually
available.


--
Sincerely yours,
Mike.

2022-03-01 11:35:06

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

On Tue, Mar 01, 2022 at 12:38:11AM +0100, Vlastimil Babka wrote:
> On 2/28/22 21:01, Mike Rapoport wrote:
> > On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> >> On 2/26/22 08:19, Hyeonggon Yoo wrote:
> >> > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> >> >> Hi,
> >> >>
> >> >> this series combines and revives patches from Oliver's last year
> >> >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> >> >> files alloc_traces and free_traces more useful.
> >> >> The resubmission was blocked on stackdepot changes that are now merged,
> >> >> as explained in patch 2.
> >> >>
> >> >
> >> > Hello. I just started review/testing this series.
> >> >
> >> > it crashed on my system (arm64)
> >>
> >> Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> >> from memblock. arm64 must have memblock freeing happen earlier or something.
> >> (CCing memblock experts)
> >>
> >> > I ran with boot parameter slub_debug=U, and without KASAN.
> >> > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> >> >
> >> > void * __init memblock_alloc_try_nid(
> >> > phys_addr_t size, phys_addr_t align,
> >> > phys_addr_t min_addr, phys_addr_t max_addr,
> >> > int nid)
> >> > {
> >> > void *ptr;
> >> >
> >> > memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> >> > __func__, (u64)size, (u64)align, nid, &min_addr,
> >> > &max_addr, (void *)_RET_IP_);
> >> > ptr = memblock_alloc_internal(size, align,
> >> > min_addr, max_addr, nid, false);
> >> > if (ptr)
> >> > memset(ptr, 0, size); <--- Crash Here
> >> >
> >> > return ptr;
> >> > }
> >> >
> >> > It crashed during create_boot_cache() -> stack_depot_init() ->
> >> > memblock_alloc().
> >> >
> >> > I think That's because, in kmem_cache_init(), both slab and memblock is not
> >> > available. (AFAIU memblock is not available after mem_init() because of
> >> > memblock_free_all(), right?)
> >>
> >> Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> >> But then, I would expect stack_depot_init() to detect that memblock_alloc()
> >> returns NULL, we print ""Stack Depot hash table allocation failed,
> >> disabling" and disable it. Instead it seems memblock_alloc() returns
> >> something that's already potentially used by somebody else? Sounds like a bug?
> >
> > If stack_depot_init() is called from kmem_cache_init(), there will be a
> > confusion what allocator should be used because we use slab_is_available()
> > to stop using memblock and start using kmalloc() instead in both
> > stack_depot_init() and in memblock.
>
> I did check that stack_depot_init() is called from kmem_cache_init()
> *before* we make slab_is_available() true, hence assumed that memblock would
> be still available at that point and expected no confusion. But seems if
> memblock is already beyond memblock_free_all() then it being still available
> is just an illusion?

Yeah, it appears it is an illusion :)

I think we have to deal with allocations that happen between
memblock_free_all() and slab_is_available() at the memblock level, and then
figure out where to put stack_depot_init() and how to allocate memory
there.

I believe something like this (untested) patch below addresses the first
issue. As for stack_depot_init() I'm still trying to figure out the
possible call paths, but it seems we can use stack_depot_early_init() for
the SLUB debugging case. I'll try to come up with something Really Soon (tm).

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 50ad19662a32..4ea89d44d22a 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -90,6 +90,7 @@ struct memblock_type {
*/
struct memblock {
bool bottom_up; /* is bottom up direction? */
+ bool mem_freed;
phys_addr_t current_limit;
struct memblock_type memory;
struct memblock_type reserved;
diff --git a/mm/memblock.c b/mm/memblock.c
index b12a364f2766..60196dc4980e 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -120,6 +120,7 @@ struct memblock memblock __initdata_memblock = {
.reserved.name = "reserved",

.bottom_up = false,
+ .mem_freed = false,
.current_limit = MEMBLOCK_ALLOC_ANYWHERE,
};

@@ -1487,6 +1488,13 @@ static void * __init memblock_alloc_internal(
if (WARN_ON_ONCE(slab_is_available()))
return kzalloc_node(size, GFP_NOWAIT, nid);

+ if (memblock.mem_freed) {
+ unsigned int order = get_order(size);
+
+ pr_warn("memblock: allocating from buddy\n");
+ return __alloc_pages_node(nid, order, GFP_KERNEL);
+ }
+
if (max_addr > memblock.current_limit)
max_addr = memblock.current_limit;

@@ -2116,6 +2124,7 @@ void __init memblock_free_all(void)

pages = free_low_memory_core_early();
totalram_pages_add(pages);
+ memblock.mem_freed = true;
}

#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)


--
Sincerely yours,
Mike.

2022-03-01 19:56:36

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

On Tue, Mar 01, 2022 at 10:41:10AM +0100, Vlastimil Babka wrote:
> On 3/1/22 10:21, Mike Rapoport wrote:
> > On Tue, Mar 01, 2022 at 12:38:11AM +0100, Vlastimil Babka wrote:
> >> On 2/28/22 21:01, Mike Rapoport wrote:
> >> > On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> >> >
> >> > If stack_depot_init() is called from kmem_cache_init(), there will be a
> >> > confusion what allocator should be used because we use slab_is_available()
> >> > to stop using memblock and start using kmalloc() instead in both
> >> > stack_depot_init() and in memblock.
> >>
> >> I did check that stack_depot_init() is called from kmem_cache_init()
> >> *before* we make slab_is_available() true, hence assumed that memblock would
> >> be still available at that point and expected no confusion. But seems if
> >> memblock is already beyond memblock_free_all() then it being still available
> >> is just an illusion?
> >
> > Yeah, it appears it is an illusion :)
> >
> > I think we have to deal with allocations that happen between
> > memblock_free_all() and slab_is_available() at the memblock level and then
> > figure out the where to put stack_depot_init() and how to allocate memory
> > there.
> >
> > I believe something like this (untested) patch below addresses the first
> > issue. As for stack_depot_init() I'm still trying to figure out the
> > possible call paths, but it seems we can use stack_depot_early_init() for
> > SLUB debugging case. I'll try to come up with something Really Soon (tm).
>
> Yeah as you already noticed, we are pursuing an approach to decide on
> calling stack_depot_early_init(), which should be a good way to solve this
> given how special slab is in this case. For memblock I just wanted to point
> out that it could be more robust, your patch below seems to be on the right
> patch. Maybe it just doesn't have to fallback to buddy, which could be
> considered a layering violation, but just return NULL that can be
> immediately recognized as an error?

The layering violation is there anyway for the slab_is_available() case, so
adding a __alloc_pages() there would only be consistent.


--
Sincerely yours,
Mike.

2022-03-01 19:58:01

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

On 3/1/22 10:21, Mike Rapoport wrote:
> On Tue, Mar 01, 2022 at 12:38:11AM +0100, Vlastimil Babka wrote:
>> On 2/28/22 21:01, Mike Rapoport wrote:
>> > On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
>> >> On 2/26/22 08:19, Hyeonggon Yoo wrote:
>> >> > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
>> >> >> Hi,
>> >> >>
>> >> >> this series combines and revives patches from Oliver's last year
>> >> >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
>> >> >> files alloc_traces and free_traces more useful.
>> >> >> The resubmission was blocked on stackdepot changes that are now merged,
>> >> >> as explained in patch 2.
>> >> >>
>> >> >
>> >> > Hello. I just started review/testing this series.
>> >> >
>> >> > it crashed on my system (arm64)
>> >>
>> >> Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
>> >> from memblock. arm64 must have memblock freeing happen earlier or something.
>> >> (CCing memblock experts)
>> >>
>> >> > I ran with boot parameter slub_debug=U, and without KASAN.
>> >> > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
>> >> >
>> >> > void * __init memblock_alloc_try_nid(
>> >> > phys_addr_t size, phys_addr_t align,
>> >> > phys_addr_t min_addr, phys_addr_t max_addr,
>> >> > int nid)
>> >> > {
>> >> > void *ptr;
>> >> >
>> >> > memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
>> >> > __func__, (u64)size, (u64)align, nid, &min_addr,
>> >> > &max_addr, (void *)_RET_IP_);
>> >> > ptr = memblock_alloc_internal(size, align,
>> >> > min_addr, max_addr, nid, false);
>> >> > if (ptr)
>> >> > memset(ptr, 0, size); <--- Crash Here
>> >> >
>> >> > return ptr;
>> >> > }
>> >> >
>> >> > It crashed during create_boot_cache() -> stack_depot_init() ->
>> >> > memblock_alloc().
>> >> >
>> >> > I think That's because, in kmem_cache_init(), both slab and memblock is not
>> >> > available. (AFAIU memblock is not available after mem_init() because of
>> >> > memblock_free_all(), right?)
>> >>
>> >> Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
>> >> But then, I would expect stack_depot_init() to detect that memblock_alloc()
>> >> returns NULL, we print ""Stack Depot hash table allocation failed,
>> >> disabling" and disable it. Instead it seems memblock_alloc() returns
>> >> something that's already potentially used by somebody else? Sounds like a bug?
>> >
>> > If stack_depot_init() is called from kmem_cache_init(), there will be a
>> > confusion what allocator should be used because we use slab_is_available()
>> > to stop using memblock and start using kmalloc() instead in both
>> > stack_depot_init() and in memblock.
>>
>> I did check that stack_depot_init() is called from kmem_cache_init()
>> *before* we make slab_is_available() true, hence assumed that memblock would
>> be still available at that point and expected no confusion. But seems if
>> memblock is already beyond memblock_free_all() then it being still available
>> is just an illusion?
>
> Yeah, it appears it is an illusion :)
>
> I think we have to deal with allocations that happen between
> memblock_free_all() and slab_is_available() at the memblock level and then
> figure out the where to put stack_depot_init() and how to allocate memory
> there.
>
> I believe something like this (untested) patch below addresses the first
> issue. As for stack_depot_init() I'm still trying to figure out the
> possible call paths, but it seems we can use stack_depot_early_init() for
> SLUB debugging case. I'll try to come up with something Really Soon (tm).

Yeah, as you already noticed, we are pursuing an approach that decides on
calling stack_depot_early_init(), which should be a good way to solve this
given how special slab is in this case. For memblock I just wanted to point
out that it could be more robust; your patch below seems to be on the right
path. Maybe it just doesn't have to fall back to buddy, which could be
considered a layering violation, but could instead return NULL, which can be
immediately recognized as an error?
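
I.e. instead of the buddy fallback in the hunk below, something like this
(untested, just to illustrate, reusing the mem_freed flag from your patch):

	if (WARN_ON_ONCE(slab_is_available()))
		return kzalloc_node(size, GFP_NOWAIT, nid);

	/* memblock's memory was already released to buddy: refuse, and let
	 * the caller (e.g. stack_depot_init()) handle the failure */
	if (WARN_ON_ONCE(memblock.mem_freed))
		return NULL;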

> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 50ad19662a32..4ea89d44d22a 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -90,6 +90,7 @@ struct memblock_type {
> */
> struct memblock {
> bool bottom_up; /* is bottom up direction? */
> + bool mem_freed;
> phys_addr_t current_limit;
> struct memblock_type memory;
> struct memblock_type reserved;
> diff --git a/mm/memblock.c b/mm/memblock.c
> index b12a364f2766..60196dc4980e 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -120,6 +120,7 @@ struct memblock memblock __initdata_memblock = {
> .reserved.name = "reserved",
>
> .bottom_up = false,
> + .mem_freed = false,
> .current_limit = MEMBLOCK_ALLOC_ANYWHERE,
> };
>
> @@ -1487,6 +1488,13 @@ static void * __init memblock_alloc_internal(
> if (WARN_ON_ONCE(slab_is_available()))
> return kzalloc_node(size, GFP_NOWAIT, nid);
>
> + if (memblock.mem_freed) {
> + unsigned int order = get_order(size);
> +
> + pr_warn("memblock: allocating from buddy\n");
> + return __alloc_pages_node(nid, order, GFP_KERNEL);
> + }
> +
> if (max_addr > memblock.current_limit)
> max_addr = memblock.current_limit;
>
> @@ -2116,6 +2124,7 @@ void __init memblock_free_all(void)
>
> pages = free_low_memory_core_early();
> totalram_pages_add(pages);
> + memblock.mem_freed = true;
> }
>
> #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
>

2022-03-02 10:47:09

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

On 3/2/22 09:37, Mike Rapoport wrote:
> On Mon, Feb 28, 2022 at 09:27:02PM +0000, Hyeonggon Yoo wrote:
>> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
>> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
>> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
>> > >> Hi,
>> > >>
>> > >> this series combines and revives patches from Oliver's last year
>> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
>> > >> files alloc_traces and free_traces more useful.
>> > >> The resubmission was blocked on stackdepot changes that are now merged,
>> > >> as explained in patch 2.
>> > >>
>> > >
>> > > Hello. I just started review/testing this series.
>> > >
>> > > it crashed on my system (arm64)
>> >
>> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
>> > from memblock. arm64 must have memblock freeing happen earlier or something.
>> > (CCing memblock experts)
>> >
>> > > I ran with boot parameter slub_debug=U, and without KASAN.
>> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
>> > >
>> > > void * __init memblock_alloc_try_nid(
>> > > phys_addr_t size, phys_addr_t align,
>> > > phys_addr_t min_addr, phys_addr_t max_addr,
>> > > int nid)
>> > > {
>> > > void *ptr;
>> > >
>> > > memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
>> > > __func__, (u64)size, (u64)align, nid, &min_addr,
>> > > &max_addr, (void *)_RET_IP_);
>> > > ptr = memblock_alloc_internal(size, align,
>> > > min_addr, max_addr, nid, false);
>> > > if (ptr)
>> > > memset(ptr, 0, size); <--- Crash Here
>> > >
>> > > return ptr;
>> > > }
>> > >
>> > > It crashed during create_boot_cache() -> stack_depot_init() ->
>> > > memblock_alloc().
>> > >
>> > > I think That's because, in kmem_cache_init(), both slab and memblock is not
>> > > available. (AFAIU memblock is not available after mem_init() because of
>> > > memblock_free_all(), right?)
>> >
>> > Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
>> > But then, I would expect stack_depot_init() to detect that memblock_alloc()
>> > returns NULL, we print ""Stack Depot hash table allocation failed,
>> > disabling" and disable it. Instead it seems memblock_alloc() returns
>> > something that's already potentially used by somebody else? Sounds like a bug?
>>
>>
>> By the way, I fixed this by allowing stack_depot_init() to be called in
>> kmem_cache_init() too [1] and Marco suggested that calling
>> stack_depot_init() depending on slub_debug parameter for simplicity. [2]
>>
>> I would prefer [2], Would you take a look?
>>
>> [1] https://lkml.org/lkml/2022/2/27/31
>>
>> [2] https://lkml.org/lkml/2022/2/28/717
>
> I have the third version :)

While simple, it changes the timing of stack_depot_early_init(), which was
supposed to be called from a single callsite - now it's less predictable and
depends on e.g. kernel parameter ordering. Some arch/config combo could break,
dunno. Setting a variable that stack_depot_early_init() checks should be
more robust.

>
> diff --git a/mm/slub.c b/mm/slub.c
> index a74afe59a403..0c3ab2335b46 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1548,6 +1548,10 @@ static int __init setup_slub_debug(char *str)
> }
> out:
> slub_debug = global_flags;
> +
> + if (slub_flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> + stack_depot_early_init();
> +
> if (slub_debug != 0 || slub_debug_string)
> static_branch_enable(&slub_debug_enabled);
> else
> @@ -4221,9 +4225,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
> s->remote_node_defrag_ratio = 1000;
> #endif
>
> - if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> - stack_depot_init();
> -
> /* Initialize the pre-computed randomized freelist if slab is up */
> if (slab_state >= UP) {
> if (init_cache_random_seq(s))
>
>> --
>> Thank you, You are awesome!
>> Hyeonggon :-)
>

2022-03-02 11:14:42

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

On Mon, Feb 28, 2022 at 09:27:02PM +0000, Hyeonggon Yoo wrote:
> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> > >> Hi,
> > >>
> > >> this series combines and revives patches from Oliver's last year
> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> > >> files alloc_traces and free_traces more useful.
> > >> The resubmission was blocked on stackdepot changes that are now merged,
> > >> as explained in patch 2.
> > >>
> > >
> > > Hello. I just started review/testing this series.
> > >
> > > it crashed on my system (arm64)
> >
> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> > from memblock. arm64 must have memblock freeing happen earlier or something.
> > (CCing memblock experts)
> >
> > > I ran with boot parameter slub_debug=U, and without KASAN.
> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> > >
> > > void * __init memblock_alloc_try_nid(
> > > phys_addr_t size, phys_addr_t align,
> > > phys_addr_t min_addr, phys_addr_t max_addr,
> > > int nid)
> > > {
> > > void *ptr;
> > >
> > > memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> > > __func__, (u64)size, (u64)align, nid, &min_addr,
> > > &max_addr, (void *)_RET_IP_);
> > > ptr = memblock_alloc_internal(size, align,
> > > min_addr, max_addr, nid, false);
> > > if (ptr)
> > > memset(ptr, 0, size); <--- Crash Here
> > >
> > > return ptr;
> > > }
> > >
> > > It crashed during create_boot_cache() -> stack_depot_init() ->
> > > memblock_alloc().
> > >
> > > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > > available. (AFAIU memblock is not available after mem_init() because of
> > > memblock_free_all(), right?)
> >
> > Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> > But then, I would expect stack_depot_init() to detect that memblock_alloc()
> > returns NULL, we print ""Stack Depot hash table allocation failed,
> > disabling" and disable it. Instead it seems memblock_alloc() returns
> > something that's already potentially used by somebody else? Sounds like a bug?
>
>
> By the way, I fixed this by allowing stack_depot_init() to be called in
> kmem_cache_init() too [1] and Marco suggested that calling
> stack_depot_init() depending on slub_debug parameter for simplicity. [2]
>
> I would prefer [2], Would you take a look?
>
> [1] https://lkml.org/lkml/2022/2/27/31
>
> [2] https://lkml.org/lkml/2022/2/28/717

I have the third version :)

diff --git a/mm/slub.c b/mm/slub.c
index a74afe59a403..0c3ab2335b46 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1548,6 +1548,10 @@ static int __init setup_slub_debug(char *str)
}
out:
slub_debug = global_flags;
+
+ if (slub_flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
+ stack_depot_early_init();
+
if (slub_debug != 0 || slub_debug_string)
static_branch_enable(&slub_debug_enabled);
else
@@ -4221,9 +4225,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
s->remote_node_defrag_ratio = 1000;
#endif

- if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
- stack_depot_init();
-
/* Initialize the pre-computed randomized freelist if slab is up */
if (slab_state >= UP) {
if (init_cache_random_seq(s))

> --
> Thank you, You are awesome!
> Hyeonggon :-)

--
Sincerely yours,
Mike.

2022-03-02 17:53:20

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

On Wed, 2 Mar 2022 at 18:02, Hyeonggon Yoo <[email protected]> wrote:
[...]
> So IMO we have two solutions.
>
> First solution is only allowing early init and avoiding late init.
> (setting a global variable that is visible to stack depot would do this)
>
> And second solution is to make caller allocate and manage its own hash
> table. All of this complexity is because we're trying to make stack_table
> global.

I think this would be a mistake, because then we have to continuously
audit all users of stackdepot and make sure that allocation stack
traces don't end up in duplicate hash tables. It's global for a
reason.

> First solution looks ok if we have few users of stack depot.
> But I think we should use second approach if stack depot is growing
> more and more callers?

The problem here really is just that initialization of stackdepot and
slabs can have a cyclic dependency with the changes you're making. I
very much doubt there'll be other cases (beyond the allocator itself
used by stackdepot) which can introduce such a cyclic dependency.

The easiest way to break the cyclic dependency is to initialize
stackdepot earlier, assuming it can be determined it is required (in
this case it can because the command line is parsed before slab
creation). The suggestion with the stack_depot_needed_early variable
(like Mike's suggested code) would solve all that.

I don't understand the concern about multiple contexts. The problem is
just about a cyclic dependency during early init, and I doubt we'll
have more of that.

Thanks,
-- Marco

2022-03-02 18:38:06

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

On Wed, Mar 02, 2022 at 02:30:56PM +0200, Mike Rapoport wrote:
> On Wed, Mar 02, 2022 at 10:09:37AM +0100, Vlastimil Babka wrote:
> > On 3/2/22 09:37, Mike Rapoport wrote:
> > > I have the third version :)
> >
> > While simple, it changes the timing of stack_depot_early_init() that was
> > supposed to be at a single callsite - now it's less predictable and depends
> > on e.g. kernel parameter ordering. Some arch/config combo could break,
> > dunno. Setting a variable that stack_depot_early_init() checks should be
> > more robust.
>
> Not sure I follow.
> stack_depot_early_init() is a wrapper for stack_depot_init() which already
> checks
>
> if (!stack_depot_disable && !stack_table)
>
> So largely it can be at multiple call sites just like stack_depot_init...

In my opinion, allowing stack_depot_init() to be called in various contexts is
not a good idea. As another simple example, slub_debug=U,vmap_area can fool the
current code, because then stack_depot_init() gets called in a context where
slab is available but vmalloc is not, and it will try to allocate using
kvmalloc(). Late initialization adds too much complexity.

So IMO we have two solutions.

The first solution is to allow only early init and avoid late init
(setting a global variable that is visible to stack depot would do this).

The second solution is to make each caller allocate and manage its own hash
table. All of this complexity exists because we're trying to make stack_table
global.

The first solution looks OK if we have few users of stack depot.
But shouldn't we use the second approach if stack depot keeps gaining
more and more callers?

> Still, I understand your concern of having multiple call sites for
> stack_depot_early_init().
>
> The most robust way I can think of will be to make stack_depot_early_init()
> a proper function, move memblock_alloc() there and add a variable, say
> stack_depot_needed_early that will be set to 1 if
> CONFIG_STACKDEPOT_ALWAYS_INIT=y or by the callers that need to allocate the
> stack_table before kmalloc is up.
>
> E.g
>
> __init int stack_depot_early_init(void)
> {
>
> if (stack_depot_needed_early && !stack_table) {
> size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> int i;
>
> pr_info("Stack Depot allocating hash table with memblock_alloc\n");
> stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
>
> if (!stack_table) {
> pr_err("Stack Depot hash table allocation failed, disabling\n");
> stack_depot_disable = true;
> return -ENOMEM;
> }
> }
>
> return 0;
> }
>
> The mutex is not needed here because mm_init() -> stack_depot_early_init()
> happens before SMP and setting stack_table[i] to NULL is redundant with
> memblock_alloc(). (btw, kvmalloc case could use __GFP_ZERO as well).
>
> I'm not sure if the stack depot should be disabled for good if the early
> allocation failed, but that's another story.

--
Thank you, You are awesome!
Hyeonggon :-)

2022-03-02 21:00:28

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects

On Wed, Mar 02, 2022 at 05:51:32PM +0100, Vlastimil Babka wrote:
> On 2/27/22 10:44, Hyeonggon Yoo wrote:
> > On Fri, Feb 25, 2022 at 07:03:15PM +0100, Vlastimil Babka wrote:
> >> From: Oliver Glitta <[email protected]>
> >>
> >> Many stack traces are similar so there are many similar arrays.
> >> Stackdepot saves each unique stack only once.
> >>
> >> Replace field addrs in struct track with depot_stack_handle_t handle. Use
> >> stackdepot to save stack trace.
> >>
> >
> > I think it's not a replacement?
>
> It is, for the array 'addrs':
>
> -#ifdef CONFIG_STACKTRACE
> - unsigned long addrs[TRACK_ADDRS_COUNT]; /* Called from address */
> +#ifdef CONFIG_STACKDEPOT
> + depot_stack_handle_t handle;
>
> Not confuse with 'addr' which is the immediate caller and indeed stays
> for redundancy/kernels without stack trace enabled.
>

Oh, my fault. Right. I was confused.
I should read it again.

> >> The benefits are smaller memory overhead and possibility to aggregate
> >> per-cache statistics in the following patch using the stackdepot handle
> >> instead of matching stacks manually.
> >>
> >> [ [email protected]: rebase to 5.17-rc1 and adjust accordingly ]
> >>
> >> This was initially merged as commit 788691464c29 and reverted by commit
> >> ae14c63a9f20 due to several issues, that should now be fixed.
> >> The problem of unconditional memory overhead by stackdepot has been
> >> addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
> >> and stack_table allocation by kvmalloc()"), so the dependency on
> >> stackdepot will result in extra memory usage only when a slab cache
> >> tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
> >> The build failures on some architectures were also addressed, and the
> >> reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this
> >> patch.
> >
> > This is just an idea and beyond this patch.
> >
> > After this patch, now we have external storage that records stack traces.
>
> Well, we had it before this patch too.
>
> > It's possible that some rare stack traces are in stack depot, but
> > not reachable because the track is overwritten.
>
> Yes.
>
> > I think it's worth implementing a way to iterate through stacks in stack depot?
>
> The question is for what use case? We might not even know who stored
> them - could have been page_owner, or other stack depot users.

> But the point is usually not to learn about all existing traces, but to
> determine which ones cause an object lifetime bug, or memory leak.

Yeah, this is exactly what I misunderstood.
I thought the purpose of free_traces was to show all existing traces.
But I realized today that a free trace without an alloc trace is not useful.

I'll review v2 with these in mind.
Thank you.

> >>
> >> Signed-off-by: Oliver Glitta <[email protected]>
> >> Signed-off-by: Vlastimil Babka <[email protected]>
> >> Cc: David Rientjes <[email protected]>
> >> Cc: Christoph Lameter <[email protected]>
> >> Cc: Pekka Enberg <[email protected]>
> >> Cc: Joonsoo Kim <[email protected]>
> >
>

--
Thank you, You are awesome!
Hyeonggon :-)

2022-03-02 21:12:17

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects

On 2/27/22 10:44, Hyeonggon Yoo wrote:
> On Fri, Feb 25, 2022 at 07:03:15PM +0100, Vlastimil Babka wrote:
>> From: Oliver Glitta <[email protected]>
>>
>> Many stack traces are similar so there are many similar arrays.
>> Stackdepot saves each unique stack only once.
>>
>> Replace field addrs in struct track with depot_stack_handle_t handle. Use
>> stackdepot to save stack trace.
>>
>
> I think it's not a replacement?

It is, for the array 'addrs':

-#ifdef CONFIG_STACKTRACE
-	unsigned long addrs[TRACK_ADDRS_COUNT];	/* Called from address */
+#ifdef CONFIG_STACKDEPOT
+	depot_stack_handle_t handle;

Not to be confused with 'addr', which is the immediate caller and indeed
stays for redundancy and for kernels without stack traces enabled.
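
For reference, struct track then ends up looking roughly like this (a sketch
based on the hunk above; the comments are mine, the remaining fields are
unchanged from the existing code):

struct track {
	unsigned long addr;	/* immediate caller, kept unconditionally */
#ifdef CONFIG_STACKDEPOT
	depot_stack_handle_t handle;	/* deduplicated full stack trace */
#endif
	int cpu;		/* Was running on cpu */
	int pid;		/* Pid context */
	unsigned long when;	/* When did the operation occur */
};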

>> The benefits are smaller memory overhead and possibility to aggregate
>> per-cache statistics in the following patch using the stackdepot handle
>> instead of matching stacks manually.
>>
>> [ [email protected]: rebase to 5.17-rc1 and adjust accordingly ]
>>
>> This was initially merged as commit 788691464c29 and reverted by commit
>> ae14c63a9f20 due to several issues, that should now be fixed.
>> The problem of unconditional memory overhead by stackdepot has been
>> addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
>> and stack_table allocation by kvmalloc()"), so the dependency on
>> stackdepot will result in extra memory usage only when a slab cache
>> tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
>> The build failures on some architectures were also addressed, and the
>> reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this
>> patch.
>
> This is just an idea and beyond this patch.
>
> After this patch, now we have external storage that records stack traces.

Well, we had it before this patch too.

> It's possible that some rare stack traces are in stack depot, but
> not reachable because the track is overwritten.

Yes.

> I think it's worth implementing a way to iterate through stacks in stack depot?

The question is for what use case? We might not even know who stored
them - could have been page_owner, or other stack depot users. But the
point is usually not to learn about all existing traces, but to
determine which ones cause an object lifetime bug, or memory leak.

>>
>> Signed-off-by: Oliver Glitta <[email protected]>
>> Signed-off-by: Vlastimil Babka <[email protected]>
>> Cc: David Rientjes <[email protected]>
>> Cc: Christoph Lameter <[email protected]>
>> Cc: Pekka Enberg <[email protected]>
>> Cc: Joonsoo Kim <[email protected]>
>

2022-03-02 23:42:59

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

On Wed, Mar 02, 2022 at 10:09:37AM +0100, Vlastimil Babka wrote:
> On 3/2/22 09:37, Mike Rapoport wrote:
> > On Mon, Feb 28, 2022 at 09:27:02PM +0000, Hyeonggon Yoo wrote:
> >> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> >> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
> >> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> >> > >> Hi,
> >> > >>
> >> > >> this series combines and revives patches from Oliver's last year
> >> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> >> > >> files alloc_traces and free_traces more useful.
> >> > >> The resubmission was blocked on stackdepot changes that are now merged,
> >> > >> as explained in patch 2.
> >> > >>
> >> > >
> >> > > Hello. I just started review/testing this series.
> >> > >
> >> > > It crashed on my system (arm64).
> >> >
> >> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> >> > from memblock. arm64 must have memblock freeing happen earlier or something.
> >> > (CCing memblock experts)
> >> >
> >> > > I ran with boot parameter slub_debug=U, and without KASAN.
> >> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> >> > >
> >> > > void * __init memblock_alloc_try_nid(
> >> > > 			phys_addr_t size, phys_addr_t align,
> >> > > 			phys_addr_t min_addr, phys_addr_t max_addr,
> >> > > 			int nid)
> >> > > {
> >> > > 	void *ptr;
> >> > >
> >> > > 	memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> >> > > 		     __func__, (u64)size, (u64)align, nid, &min_addr,
> >> > > 		     &max_addr, (void *)_RET_IP_);
> >> > > 	ptr = memblock_alloc_internal(size, align,
> >> > > 				      min_addr, max_addr, nid, false);
> >> > > 	if (ptr)
> >> > > 		memset(ptr, 0, size);    <--- Crash Here
> >> > >
> >> > > 	return ptr;
> >> > > }
> >> > >
> >> > > It crashed during create_boot_cache() -> stack_depot_init() ->
> >> > > memblock_alloc().
> >> > >
> >> > > I think that's because, in kmem_cache_init(), neither slab nor memblock is
> >> > > available. (AFAIU memblock is not available after mem_init() because of
> >> > > memblock_free_all(), right?)
> >> >
> >> > Hm yes, I see, even the x86_64 version of mem_init() calls memblock_free_all().
> >> > But then I would expect stack_depot_init() to detect that memblock_alloc()
> >> > returns NULL, print "Stack Depot hash table allocation failed, disabling"
> >> > and disable it. Instead it seems memblock_alloc() returns something that's
> >> > potentially already used by somebody else? Sounds like a bug?
> >>
> >>
> >> By the way, I fixed this by allowing stack_depot_init() to be called in
> >> kmem_cache_init() too [1], and Marco suggested calling stack_depot_init()
> >> depending on the slub_debug parameter, for simplicity. [2]
> >>
> >> I would prefer [2]. Would you take a look?
> >>
> >> [1] https://lkml.org/lkml/2022/2/27/31
> >>
> >> [2] https://lkml.org/lkml/2022/2/28/717
> >
> > I have the third version :)
>
> While simple, it changes the timing of stack_depot_early_init(), which was
> supposed to be called from a single callsite - now it's less predictable and
> depends on e.g. kernel parameter ordering. Some arch/config combo could
> break, who knows. Setting a variable that stack_depot_early_init() checks
> should be more robust.

Not sure I follow.
stack_depot_early_init() is a wrapper for stack_depot_init(), which already
checks:

	if (!stack_depot_disable && !stack_table)

so largely it can be called from multiple call sites, just like
stack_depot_init()...

Still, I understand your concern about having multiple call sites for
stack_depot_early_init().

The most robust way I can think of would be to make stack_depot_early_init()
a proper function, move memblock_alloc() there, and add a variable, say
stack_depot_needed_early, that would be set to 1 if
CONFIG_STACKDEPOT_ALWAYS_INIT=y or by the callers that need to allocate the
stack_table before kmalloc is up.

E.g.:

__init int stack_depot_early_init(void)
{
	if (stack_depot_needed_early && !stack_table) {
		size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));

		pr_info("Stack Depot allocating hash table with memblock_alloc\n");
		/* memblock_alloc() returns zeroed memory, no explicit NULLing needed */
		stack_table = memblock_alloc(size, SMP_CACHE_BYTES);

		if (!stack_table) {
			pr_err("Stack Depot hash table allocation failed, disabling\n");
			stack_depot_disable = true;
			return -ENOMEM;
		}
	}

	return 0;
}

The mutex is not needed here because mm_init() -> stack_depot_early_init()
happens before SMP, and explicitly setting stack_table[i] to NULL would be
redundant since memblock_alloc() returns zeroed memory. (btw, the kvmalloc
case could use __GFP_ZERO as well.)
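
To make that concrete, the slab-available branch of stack_depot_init() could
then be reduced to roughly the following (just a sketch of the idea, not a
tested change):

	if (slab_is_available()) {
		pr_info("Stack Depot allocating hash table with kvmalloc\n");
		/* __GFP_ZERO makes the explicit NULL initialization loop unnecessary */
		stack_table = kvmalloc(size, GFP_KERNEL | __GFP_ZERO);
	}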

I'm not sure if the stack depot should be disabled for good if the early
allocation failed, but that's another story.
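
With such a flag, a caller like setup_slub_debug() would only need to set it
instead of calling into the stack depot allocation directly, e.g. (purely
illustrative, reusing the flag name from the sketch above and the
SLAB_STORE_USER check from the hunk below):

	/* runs while parsing the command line, before mm_init() -> stack_depot_early_init() */
	if (IS_ENABLED(CONFIG_STACKDEPOT) && (slub_flags & SLAB_STORE_USER))
		stack_depot_needed_early = 1;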

> > diff --git a/mm/slub.c b/mm/slub.c
> > index a74afe59a403..0c3ab2335b46 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1548,6 +1548,10 @@ static int __init setup_slub_debug(char *str)
> >  	}
> >  out:
> >  	slub_debug = global_flags;
> > +
> > +	if (slub_flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> > +		stack_depot_early_init();
> > +
> >  	if (slub_debug != 0 || slub_debug_string)
> >  		static_branch_enable(&slub_debug_enabled);
> >  	else
> > @@ -4221,9 +4225,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
> >  	s->remote_node_defrag_ratio = 1000;
> >  #endif
> >
> > -	if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> > -		stack_depot_init();
> > -
> >  	/* Initialize the pre-computed randomized freelist if slab is up */
> >  	if (slab_state >= UP) {
> >  		if (init_cache_random_seq(s))
> >
> >> --
> >> Thank you, You are awesome!
> >> Hyeonggon :-)
> >
>

--
Sincerely yours,
Mike.

2022-03-03 00:13:21

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 5/5] slab, documentation: add description of debugfs files for SLUB caches

On 2/27/22 04:49, Hyeonggon Yoo wrote:
> I think it's not traces of "currently free objects",
> because the index bits of free objects are set in the obj_map bitmap?

Hm right, thanks.

> It's weird, but it's traces of allocated objects that have been freed at
> least once (or <not available>).
>
> I think we should fix either the code or the doc?

For now I'll fix the doc. It's not entirely clear even to me what the best
use case for the free_traces file is. For alloc_traces it's clearly debugging
memory leaks. Freeing traces are most useful when a bug is detected and they
are dumped to dmesg; the debugfs file might just give a rough idea of where
freeing usually happens.

> Please tell me if I'm missing something :)
>
>> + Information in the output:
>> + Number of objects, freeing function, minimal/average/maximal jiffies since free,
>> + pid range of the freeing processes, cpu mask of freeing cpus, and stack trace.
>> +
>> + Example:::
>> +
>> + 51 acpi_ut_update_ref_count+0x6a6/0x782 age=236886/237027/237772 pid=1 cpus=1
>> +	kfree+0x2db/0x420
>> +	acpi_ut_update_ref_count+0x6a6/0x782
>> +	acpi_ut_update_object_reference+0x1ad/0x234
>> +	acpi_ut_remove_reference+0x7d/0x84
>> +	acpi_rs_get_prt_method_data+0x97/0xd6
>> +	acpi_get_irq_routing_table+0x82/0xc4
>> +	acpi_pci_irq_find_prt_entry+0x8e/0x2e0
>> +	acpi_pci_irq_lookup+0x3a/0x1e0
>> +	acpi_pci_irq_enable+0x77/0x240
>> +	pcibios_enable_device+0x39/0x40
>> +	do_pci_enable_device.part.0+0x5d/0xe0
>> +	pci_enable_device_flags+0xfc/0x120
>> +	pci_enable_device+0x13/0x20
>> +	virtio_pci_probe+0x9e/0x170
>> +	local_pci_probe+0x48/0x80
>> +	pci_device_probe+0x105/0x1c0
>> +
>
> Everything else looks nice!
>
>> Christoph Lameter, May 30, 2007
>> Sergey Senozhatsky, October 23, 2015
>> --
>> 2.35.1
>>
>>
>

2022-03-04 19:11:40

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

On 2/26/22 13:18, Hyeonggon Yoo wrote:
> On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
>> Hi,
>>
>> this series combines and revives patches from Oliver's last year
>> bachelor thesis (where I was the advisor) that make SLUB's debugfs
>> files alloc_traces and free_traces more useful.
>> The resubmission was blocked on stackdepot changes that are now merged,
>> as explained in patch 2.
>>
>> Patch 1 is a new preparatory cleanup.
>>
>> Patch 2 originally submitted here [1], was merged to mainline but
>> reverted for stackdepot related issues as explained in the patch.
>>
>> Patches 3-5 originally submitted as RFC here [2]. In this submission I
>> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
>> be considered too intrusive so I will postpone it for later. The docs
>> patch is adjusted accordingly.
>>
>
> This problem is not caused by this patch series.
> But I think it's worth mentioning...
>
> It's really weird that some stack traces are not recorded
> when CONFIG_KASAN=y.
>
> I made sure that:
> - Stack Depot did not reach its limit
> - the free path happens with CONFIG_KASAN=y too.
>
> I have no clue why this happens.
>
> # cat dentry/free_traces (CONFIG_KASAN=y)
> 6585 <not-available> age=4294912647 pid=0 cpus=0

I think it's some kind of KASAN quarantining of freed objects, so they
haven't been properly freed through the SLUB layer yet.

> # cat dentry/free_traces (CONFIG_KASAN=n)
> 1246 <not-available> age=4294906877 pid=0 cpus=0
> 379 __d_free+0x20/0x2c age=33/14225/14353 pid=0-122 cpus=0-3
> 	kmem_cache_free+0x1f4/0x21c
> 	__d_free+0x20/0x2c
> 	rcu_core+0x334/0x580
> 	rcu_core_si+0x14/0x20
> 	__do_softirq+0x12c/0x2a8
>
> 2 dentry_free+0x58/0xb0 age=14101/14101/14101 pid=158 cpus=0
> 	kmem_cache_free+0x1f4/0x21c
> 	dentry_free+0x58/0xb0
> 	__dentry_kill+0x18c/0x1d0
> 	dput+0x1c4/0x2fc
> 	__fput+0xb0/0x230
> 	____fput+0x14/0x20
> 	task_work_run+0x84/0x17c
> 	do_notify_resume+0x208/0x1330
> 	el0_svc+0x6c/0x80
> 	el0t_64_sync_handler+0xa8/0x130
> 	el0t_64_sync+0x1a0/0x1a4
>
> 1 dentry_free+0x58/0xb0 age=7678 pid=190 cpus=1
> 	kmem_cache_free+0x1f4/0x21c
> 	dentry_free+0x58/0xb0
> 	__dentry_kill+0x18c/0x1d0
> 	dput+0x1c4/0x2fc
> 	__fput+0xb0/0x230
> 	____fput+0x14/0x20
> 	task_work_run+0x84/0x17c
> 	do_exit+0x2dc/0x8e0
> 	do_group_exit+0x38/0xa4
> 	__wake_up_parent+0x0/0x34
> 	invoke_syscall+0x48/0x114
> 	el0_svc_common.constprop.0+0x44/0xfc
> 	do_el0_svc+0x2c/0x94
> 	el0_svc+0x28/0x80
> 	el0t_64_sync_handler+0xa8/0x130
> 	el0t_64_sync+0x1a0/0x1a4