2022-09-01 04:53:59

by Oscar Salvador

[permalink] [raw]
Subject: [PATCH 0/3] page_owner: print stacks and their counter

Hi,

page_owner is a great debug functionality tool that gets us to know
about all pages that have been allocated/freed and their stacktrace.
This comes very handy when e.g: debugging leaks, as with some scripting
we might be able to see those stacktraces that are allocating pages
but not freeing theme.

In my experience, that is one of the most useful cases, but it can get
really tedious to screen through all pages aand try to reconstruct the
stack <-> allocated/freed relationship. There is a lot of noise
to cancel off.

This patch aims to fix that by adding a new functionality into page_owner.
What this does is to create a new read-only file "page_owner_stacks",
which prints only the allocating stacktraces and their counting, being that
the times the stacktrace has allocated - the times it has freed.

So we have a clear overview of stacks <-> allocated/freed relationship
without the need to fiddle with pages and trying to match free stacktraces
with allocated stacktraces.

This is achieved by adding a new refcount_t field in the stack_record struct,
incrementing that refcount_t everytime the same stacktrace allocates,
and decrementing it when it frees a page. Details can be seen in the
respective patches.

We also create another file called "page_owner_threshold", which let us
specify a threshold, so when when reading from "page_owner_stacks",
we will only see those stacktraces which counting goes beyond the
threshold we specified.

A PoC can be found below:

# cat /sys/kernel/debug/page_owner_threshold
0
# cat /sys/kernel/debug/page_owner_stacks > stacks_full.txt
# head -32 stacks_full.txt
prep_new_page+0x10d/0x180
get_page_from_freelist+0x1bd6/0x1e10
__alloc_pages+0x194/0x360
alloc_page_interleave+0x13/0x90
new_slab+0x31d/0x530
___slab_alloc+0x5d7/0x720
__slab_alloc.isra.85+0x4a/0x90
kmem_cache_alloc+0x455/0x4a0
acpi_ps_alloc_op+0x57/0x8f
acpi_ps_create_scope_op+0x12/0x23
acpi_ps_execute_method+0x102/0x2c1
acpi_ns_evaluate+0x343/0x4da
acpi_evaluate_object+0x1cb/0x392
acpi_run_osc+0x135/0x260
acpi_init+0x165/0x4ed
do_one_initcall+0x3e/0x200
stack count: 2

free_pcp_prepare+0x287/0x5c0
free_unref_page+0x1c/0xd0
__mmdrop+0x50/0x160
finish_task_switch+0x249/0x2b0
__schedule+0x2c3/0x960
schedule+0x44/0xb0
futex_wait_queue+0x70/0xd0
futex_wait+0x160/0x250
do_futex+0x11c/0x1b0
__x64_sys_futex+0x5e/0x1d0
do_syscall_64+0x37/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 1



# echo 10000 > /sys/kernel/debug/page_owner_threshold
# cat /sys/kernel/debug/page_owner_stacks > stacks_10000.txt
# cat stacks_10000.txt
prep_new_page+0x10d/0x180
get_page_from_freelist+0x1bd6/0x1e10
__alloc_pages+0x194/0x360
folio_alloc+0x17/0x40
page_cache_ra_unbounded+0x96/0x170
filemap_get_pages+0x23d/0x5e0
filemap_read+0xbf/0x3a0
__kernel_read+0x136/0x2f0
kernel_read_file+0x197/0x2d0
kernel_read_file_from_fd+0x54/0x90
__do_sys_finit_module+0x89/0x120
do_syscall_64+0x37/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 36195

prep_new_page+0x10d/0x180
get_page_from_freelist+0x1bd6/0x1e10
__alloc_pages+0x194/0x360
folio_alloc+0x17/0x40
page_cache_ra_unbounded+0x96/0x170
filemap_get_pages+0x23d/0x5e0
filemap_read+0xbf/0x3a0
new_sync_read+0x106/0x180
vfs_read+0x16f/0x190
ksys_read+0xa5/0xe0
do_syscall_64+0x37/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 44484

prep_new_page+0x10d/0x180
get_page_from_freelist+0x1bd6/0x1e10
__alloc_pages+0x194/0x360
folio_alloc+0x17/0x40
page_cache_ra_unbounded+0x96/0x170
filemap_get_pages+0xdd/0x5e0
filemap_read+0xbf/0x3a0
new_sync_read+0x106/0x180
vfs_read+0x16f/0x190
ksys_read+0xa5/0xe0
do_syscall_64+0x37/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 17874


Oscar Salvador (3):
lib/stackdepot: Add a refcount field in stack_record
mm, page_owner: Add page_owner_stacks file to print out only stacks
and their counter
mm,page_owner: Filter out stacks by a threshold counter

include/linux/stackdepot.h | 16 ++++-
lib/stackdepot.c | 121 ++++++++++++++++++++++++++++++++-----
mm/kasan/common.c | 3 +-
mm/page_owner.c | 102 +++++++++++++++++++++++++++++--
4 files changed, 222 insertions(+), 20 deletions(-)

--
2.35.3


2022-09-01 04:58:52

by Oscar Salvador

[permalink] [raw]
Subject: [PATCH 1/3] lib/stackdepot: Add a refcount field in stack_record

We want to filter out page_owner output and print only those
stacks that have been repeated beyond a certain threshold.
This gives us the chance to get rid of a lot of noise.
In order to do that, we need to keep track of how many repeated stacks
(for allocation) do we have, so we add a new refcount_t field
in the stack_record struct.

Note that on __set_page_owner_handle(), page_owner->handle is set,
and on __reset_page_owner(), page_owner->free_handle is set.

We are interested in page_owner->handle, so when __set_page_owner()
gets called, we derive the stack_record struct from page_owner->handle,
and we increment its refcount_t field; and when __reset_page_owner()
gets called, we derive its stack_record from page_owner->handle()
and we decrement its refcount_t field.

This is a preparation for patch#2.

Signed-off-by: Oscar Salvador <[email protected]>
---
include/linux/stackdepot.h | 13 ++++++-
lib/stackdepot.c | 79 +++++++++++++++++++++++++++++++-------
mm/kasan/common.c | 3 +-
mm/page_owner.c | 13 +++++--
4 files changed, 88 insertions(+), 20 deletions(-)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index bc2797955de9..5ee0cf5be88f 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -15,9 +15,16 @@

typedef u32 depot_stack_handle_t;

+typedef enum stack_action {
+ STACK_ACTION_NONE,
+ STACK_ACTION_INC,
+}stack_action_t;
+
depot_stack_handle_t __stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
- gfp_t gfp_flags, bool can_alloc);
+ gfp_t gfp_flags, bool can_alloc,
+ stack_action_t action);
+void stack_depot_dec_count(depot_stack_handle_t handle);

/*
* Every user of stack depot has to call stack_depot_init() during its own init
@@ -55,6 +62,10 @@ static inline int stack_depot_early_init(void) { return 0; }

depot_stack_handle_t stack_depot_save(unsigned long *entries,
unsigned int nr_entries, gfp_t gfp_flags);
+depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
+ unsigned int nr_entries,
+ gfp_t gfp_flags,
+ stack_action_t action);

unsigned int stack_depot_fetch(depot_stack_handle_t handle,
unsigned long **entries);
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 5ca0d086ef4a..aeb59d3557e2 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -63,6 +63,7 @@ struct stack_record {
u32 hash; /* Hash in the hastable */
u32 size; /* Number of frames in the stack */
union handle_parts handle;
+ refcount_t count; /* Number of the same repeated stacks */
unsigned long entries[]; /* Variable-sized array of entries. */
};

@@ -139,6 +140,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
stack->handle.slabindex = depot_index;
stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
stack->handle.valid = 1;
+ refcount_set(&stack->count, 1);
memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
depot_offset += required_size;

@@ -302,6 +304,29 @@ void stack_depot_print(depot_stack_handle_t stack)
}
EXPORT_SYMBOL_GPL(stack_depot_print);

+static struct stack_record *stack_depot_getstack(depot_stack_handle_t handle)
+{
+ union handle_parts parts = { .handle = handle };
+ void *slab;
+ size_t offset = parts.offset << STACK_ALLOC_ALIGN;
+ struct stack_record *stack;
+
+ if(!handle)
+ return NULL;
+
+ if (parts.slabindex > depot_index) {
+ WARN(1, "slab index %d out of bounds (%d) for stack id %08x\n",
+ parts.slabindex, depot_index, handle);
+ return NULL;
+ }
+ slab = stack_slabs[parts.slabindex];
+ if (!slab)
+ return NULL;
+
+ stack = slab + offset;
+ return stack;
+}
+
/**
* stack_depot_fetch - Fetch stack entries from a depot
*
@@ -314,30 +339,42 @@ EXPORT_SYMBOL_GPL(stack_depot_print);
unsigned int stack_depot_fetch(depot_stack_handle_t handle,
unsigned long **entries)
{
- union handle_parts parts = { .handle = handle };
- void *slab;
- size_t offset = parts.offset << STACK_ALLOC_ALIGN;
struct stack_record *stack;

*entries = NULL;
if (!handle)
return 0;

- if (parts.slabindex > depot_index) {
- WARN(1, "slab index %d out of bounds (%d) for stack id %08x\n",
- parts.slabindex, depot_index, handle);
- return 0;
- }
- slab = stack_slabs[parts.slabindex];
- if (!slab)
+ stack = stack_depot_getstack(handle);
+ if (!stack)
return 0;
- stack = slab + offset;

*entries = stack->entries;
return stack->size;
}
EXPORT_SYMBOL_GPL(stack_depot_fetch);

+static void stack_depot_inc_count(struct stack_record *stack)
+{
+ refcount_inc(&stack->count);
+}
+
+void stack_depot_dec_count(depot_stack_handle_t handle)
+{
+ struct stack_record *stack = NULL;
+
+ stack = stack_depot_getstack(handle);
+ if (stack) {
+ /*
+ * page_owner creates some stacks via create_dummy_stack().
+ * We are not interested in those, so make sure we only decrement
+ * "valid" stacks.
+ */
+ if (refcount_read(&stack->count) > 1)
+ refcount_dec(&stack->count);
+ }
+}
+
/**
* __stack_depot_save - Save a stack trace from an array
*
@@ -363,7 +400,8 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
*/
depot_stack_handle_t __stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
- gfp_t alloc_flags, bool can_alloc)
+ gfp_t alloc_flags, bool can_alloc,
+ stack_action_t action)
{
struct stack_record *found = NULL, **bucket;
depot_stack_handle_t retval = 0;
@@ -449,8 +487,11 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
/* Nobody used this memory, ok to free it. */
free_pages((unsigned long)prealloc, STACK_ALLOC_ORDER);
}
- if (found)
+ if (found) {
retval = found->handle.handle;
+ if (action == STACK_ACTION_INC)
+ stack_depot_inc_count(found);
+ }
fast_exit:
return retval;
}
@@ -472,6 +513,16 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
gfp_t alloc_flags)
{
- return __stack_depot_save(entries, nr_entries, alloc_flags, true);
+ return __stack_depot_save(entries, nr_entries, alloc_flags, true,
+ STACK_ACTION_NONE);
}
EXPORT_SYMBOL_GPL(stack_depot_save);
+
+depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
+ unsigned int nr_entries,
+ gfp_t alloc_flags,
+ stack_action_t action)
+{
+ return __stack_depot_save(entries, nr_entries, alloc_flags, true, action);
+}
+EXPORT_SYMBOL_GPL(stack_depot_save_action);
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index c40c0e7b3b5f..f434994f3b0d 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -36,7 +36,8 @@ depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)
unsigned int nr_entries;

nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
- return __stack_depot_save(entries, nr_entries, flags, can_alloc);
+ return __stack_depot_save(entries, nr_entries, flags, can_alloc,
+ STACK_ACTION_NONE);
}

void kasan_set_track(struct kasan_track *track, gfp_t flags)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index e4c6f3f1695b..794f346d7520 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -106,7 +106,7 @@ static inline struct page_owner *get_page_owner(struct page_ext *page_ext)
return (void *)page_ext + page_owner_ops.offset;
}

-static noinline depot_stack_handle_t save_stack(gfp_t flags)
+static noinline depot_stack_handle_t save_stack(gfp_t flags, stack_action_t action)
{
unsigned long entries[PAGE_OWNER_STACK_DEPTH];
depot_stack_handle_t handle;
@@ -125,7 +125,7 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
current->in_page_owner = 1;

nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
- handle = stack_depot_save(entries, nr_entries, flags);
+ handle = stack_depot_save_action(entries, nr_entries, flags, action);
if (!handle)
handle = failure_handle;

@@ -138,6 +138,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
int i;
struct page_ext *page_ext;
depot_stack_handle_t handle;
+ depot_stack_handle_t alloc_handle;
struct page_owner *page_owner;
u64 free_ts_nsec = local_clock();

@@ -145,7 +146,10 @@ void __reset_page_owner(struct page *page, unsigned short order)
if (unlikely(!page_ext))
return;

- handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
+ page_owner = get_page_owner(page_ext);
+ alloc_handle = page_owner->handle;
+
+ handle = save_stack(GFP_NOWAIT | __GFP_NOWARN, STACK_ACTION_NONE);
for (i = 0; i < (1 << order); i++) {
__clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
page_owner = get_page_owner(page_ext);
@@ -153,6 +157,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
page_owner->free_ts_nsec = free_ts_nsec;
page_ext = page_ext_next(page_ext);
}
+ stack_depot_dec_count(alloc_handle);
}

static inline void __set_page_owner_handle(struct page_ext *page_ext,
@@ -189,7 +194,7 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
if (unlikely(!page_ext))
return;

- handle = save_stack(gfp_mask);
+ handle = save_stack(gfp_mask, STACK_ACTION_INC);
__set_page_owner_handle(page_ext, handle, order, gfp_mask);
}

--
2.35.3

2022-09-01 05:04:22

by Oscar Salvador

[permalink] [raw]
Subject: [PATCH 3/3] mm,page_owner: Filter out stacks by a threshold counter

We want to be able to filter out the output on a threshold basis,
in this way we can get rid of a lot of noise and focus only on those
stacks which have an allegedly high counter.

We can control the threshold value by a new file called
'page_owner_threshold', which is 0 by default.

Signed-off-by: Oscar Salvador <[email protected]>
---
include/linux/stackdepot.h | 3 +-
lib/stackdepot.c | 6 ++--
mm/page_owner.c | 61 +++++++++++++++++++++++++++++++++++++-
3 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 20f62039f23a..ee66be40a152 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -26,7 +26,8 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
stack_action_t action);
void stack_depot_dec_count(depot_stack_handle_t handle);
int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos,
- unsigned long *last_stack);
+ unsigned long *last_stack,
+ unsigned long threshold);

/*
* Every user of stack depot has to call stack_depot_init() during its own init
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 3090ae0f3958..b4a04f09a7b7 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -528,7 +528,8 @@ depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
EXPORT_SYMBOL_GPL(stack_depot_save_action);

int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos,
- unsigned long *last_stack)
+ unsigned long *last_stack,
+ unsigned long threshold)
{
struct stack_record *stack = NULL, *last;
struct stack_record **stacks;
@@ -547,7 +548,8 @@ int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos,
for (; stack; stack = stack->next) {
if (!stack->size || stack->size < 0 ||
stack->size > size || stack->handle.valid != 1 ||
- refcount_read(&stack->count) < 1)
+ refcount_read(&stack->count) < 1 ||
+ refcount_read(&stack->count) < threshold)
continue;

ret += stack_trace_snprint(buf, size, stack->entries, stack->size, 0);
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 8c67c7eb2451..ef10cf44aaec 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -44,6 +44,7 @@ static depot_stack_handle_t early_handle;
static void init_early_allocated_pages(void);

static unsigned long last_stack = 0;
+static unsigned long threshold_count = 0;

static int __init early_page_owner_param(char *buf)
{
@@ -676,7 +677,8 @@ static ssize_t read_page_owner_stacks(struct file *file, char __user *buf,
if (!kbuf)
return ENOMEM;

- ret += stack_depot_print_stacks_threshold(kbuf, count, pos, &last_stack);
+ ret += stack_depot_print_stacks_threshold(kbuf, count, pos, &last_stack,
+ threshold_count);
if (copy_to_user(buf, kbuf, ret))
ret = -EFAULT;

@@ -687,6 +689,61 @@ static ssize_t read_page_owner_stacks(struct file *file, char __user *buf,
return ret;
}

+static ssize_t read_page_owner_threshold(struct file *file, char __user *buf,
+ size_t count, loff_t *pos)
+{
+ char *kbuf;
+ int ret = 0;
+
+ count = min_t(size_t, count, PAGE_SIZE);
+
+ if (*pos >= count)
+ return 0;
+
+ kbuf = kmalloc(count, GFP_KERNEL);
+ if (!kbuf)
+ return ENOMEM;
+
+ ret = scnprintf(kbuf, count, "%lu\n", threshold_count);
+ if (copy_to_user(buf, kbuf, ret))
+ ret = -EFAULT;
+
+ *pos += count;
+ kfree(kbuf);
+
+ return ret;
+}
+
+static ssize_t write_page_owner_threshold(struct file *file, const char __user *buf,
+ size_t count, loff_t *pos)
+{
+ char *kbuf;
+ int ret = 0;
+
+ count = min_t(size_t, count, PAGE_SIZE);
+ kbuf = kmalloc(count, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ if (copy_from_user(kbuf, buf, count)) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ kbuf[count - 1] = '\0';
+
+ ret = kstrtoul(kbuf, 10, &threshold_count);
+
+out:
+ kfree(kbuf);
+ return ret ? ret : count;
+}
+
+static const struct file_operations proc_page_owner_threshold = {
+ .read = read_page_owner_threshold,
+ .write = write_page_owner_threshold,
+};
+
static const struct file_operations proc_page_owner_stacks = {
.read = read_page_owner_stacks,
};
@@ -706,6 +763,8 @@ static int __init pageowner_init(void)
&proc_page_owner_operations);
debugfs_create_file("page_owner_stacks", 0400, NULL, NULL,
&proc_page_owner_stacks);
+ debugfs_create_file("page_owner_threshold", 0600, NULL, NULL,
+ &proc_page_owner_threshold);

return 0;
}
--
2.35.3

2022-09-01 05:04:51

by Oscar Salvador

[permalink] [raw]
Subject: [PATCH 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter

We might be only interested in knowing about stacks <-> count
relationship, so instead of having to fiddle with page_owner
output and screen through pfns, let us add a new file called
'page_owner_stacks' that does just that.
By cating such file, we will get all the stacktraces followed by
its counter (allocated - freed times), so we can have a more specific
overview.

Signed-off-by: Oscar Salvador <[email protected]>
---
include/linux/stackdepot.h | 2 ++
lib/stackdepot.c | 40 ++++++++++++++++++++++++++++++++++++++
mm/page_owner.c | 30 ++++++++++++++++++++++++++++
3 files changed, 72 insertions(+)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 5ee0cf5be88f..20f62039f23a 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -25,6 +25,8 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
gfp_t gfp_flags, bool can_alloc,
stack_action_t action);
void stack_depot_dec_count(depot_stack_handle_t handle);
+int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos,
+ unsigned long *last_stack);

/*
* Every user of stack depot has to call stack_depot_init() during its own init
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index aeb59d3557e2..3090ae0f3958 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -526,3 +526,43 @@ depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
return __stack_depot_save(entries, nr_entries, alloc_flags, true, action);
}
EXPORT_SYMBOL_GPL(stack_depot_save_action);
+
+int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos,
+ unsigned long *last_stack)
+{
+ struct stack_record *stack = NULL, *last;
+ struct stack_record **stacks;
+ int i = *pos, ret = 0;
+
+ /* Continue from the last week if we have one */
+ if (*last_stack) {
+ last = (struct stack_record *)*last_stack;
+ stack = last->next;
+ } else {
+new_table:
+ stacks = &stack_table[i];
+ stack = (struct stack_record *)stacks;
+ }
+
+ for (; stack; stack = stack->next) {
+ if (!stack->size || stack->size < 0 ||
+ stack->size > size || stack->handle.valid != 1 ||
+ refcount_read(&stack->count) < 1)
+ continue;
+
+ ret += stack_trace_snprint(buf, size, stack->entries, stack->size, 0);
+ ret += scnprintf(buf + ret, size - ret, "stack count: %d\n\n",
+ refcount_read(&stack->count));
+ *last_stack = (unsigned long)stack;
+ return ret;
+ }
+
+ i++;
+ *pos = i;
+
+ /* Keep looking all tables for valid stacks */
+ if (i < STACK_HASH_SIZE)
+ goto new_table;
+
+ return 0;
+}
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 794f346d7520..8c67c7eb2451 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -43,6 +43,8 @@ static depot_stack_handle_t early_handle;

static void init_early_allocated_pages(void);

+static unsigned long last_stack = 0;
+
static int __init early_page_owner_param(char *buf)
{
int ret = kstrtobool(buf, &page_owner_enabled);
@@ -663,6 +665,32 @@ static void init_early_allocated_pages(void)
init_zones_in_node(pgdat);
}

+static ssize_t read_page_owner_stacks(struct file *file, char __user *buf,
+ size_t count, loff_t *pos)
+{
+ char *kbuf;
+ int ret = 0;
+
+ count = min_t(size_t, count, PAGE_SIZE);
+ kbuf = kmalloc(count, GFP_KERNEL);
+ if (!kbuf)
+ return ENOMEM;
+
+ ret += stack_depot_print_stacks_threshold(kbuf, count, pos, &last_stack);
+ if (copy_to_user(buf, kbuf, ret))
+ ret = -EFAULT;
+
+ if (!ret)
+ last_stack = 0;
+
+ kfree(kbuf);
+ return ret;
+}
+
+static const struct file_operations proc_page_owner_stacks = {
+ .read = read_page_owner_stacks,
+};
+
static const struct file_operations proc_page_owner_operations = {
.read = read_page_owner,
};
@@ -676,6 +704,8 @@ static int __init pageowner_init(void)

debugfs_create_file("page_owner", 0400, NULL, NULL,
&proc_page_owner_operations);
+ debugfs_create_file("page_owner_stacks", 0400, NULL, NULL,
+ &proc_page_owner_stacks);

return 0;
}
--
2.35.3

2022-09-01 08:41:08

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH 1/3] lib/stackdepot: Add a refcount field in stack_record

On Thu, Sep 01, 2022 at 06:42AM +0200, Oscar Salvador wrote:
> We want to filter out page_owner output and print only those
> stacks that have been repeated beyond a certain threshold.
> This gives us the chance to get rid of a lot of noise.
> In order to do that, we need to keep track of how many repeated stacks
> (for allocation) do we have, so we add a new refcount_t field
> in the stack_record struct.
>
> Note that on __set_page_owner_handle(), page_owner->handle is set,
> and on __reset_page_owner(), page_owner->free_handle is set.
>
> We are interested in page_owner->handle, so when __set_page_owner()
> gets called, we derive the stack_record struct from page_owner->handle,
> and we increment its refcount_t field; and when __reset_page_owner()
> gets called, we derive its stack_record from page_owner->handle()
> and we decrement its refcount_t field.
>
> This is a preparation for patch#2.
>
> Signed-off-by: Oscar Salvador <[email protected]>
> ---
> include/linux/stackdepot.h | 13 ++++++-
> lib/stackdepot.c | 79 +++++++++++++++++++++++++++++++-------
> mm/kasan/common.c | 3 +-

+Cc other kasan maintainers

> mm/page_owner.c | 13 +++++--
> 4 files changed, 88 insertions(+), 20 deletions(-)
>
> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
> index bc2797955de9..5ee0cf5be88f 100644
> --- a/include/linux/stackdepot.h
> +++ b/include/linux/stackdepot.h
> @@ -15,9 +15,16 @@
>
> typedef u32 depot_stack_handle_t;
>
> +typedef enum stack_action {
> + STACK_ACTION_NONE,
> + STACK_ACTION_INC,
> +}stack_action_t;
> +

missing space after '}'. But please no unnecessary typedef, just 'enum
stack_action' (and spelling out 'enum stack_action' elsewhere) is just
fine.

This is in the global namespace, so I'd call this
stack_depot_action+STACK_DEPOT_ACTION_*.

However, .._ACTION_INC doesn't really say what's incremented. As an
analog to stack_depot_dec_count(), perhaps .._ACTION_COUNT?

In general it'd be nicer if there was stack_depot_inc_count() instead of
this additional argument, but I see that for performance reasons you
might not like that?

> depot_stack_handle_t __stack_depot_save(unsigned long *entries,
> unsigned int nr_entries,
> - gfp_t gfp_flags, bool can_alloc);
> + gfp_t gfp_flags, bool can_alloc,
> + stack_action_t action);
> +void stack_depot_dec_count(depot_stack_handle_t handle);
>
> /*
> * Every user of stack depot has to call stack_depot_init() during its own init
> @@ -55,6 +62,10 @@ static inline int stack_depot_early_init(void) { return 0; }
>
> depot_stack_handle_t stack_depot_save(unsigned long *entries,
> unsigned int nr_entries, gfp_t gfp_flags);
> +depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
> + unsigned int nr_entries,
> + gfp_t gfp_flags,
> + stack_action_t action);
>
> unsigned int stack_depot_fetch(depot_stack_handle_t handle,
> unsigned long **entries);
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index 5ca0d086ef4a..aeb59d3557e2 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -63,6 +63,7 @@ struct stack_record {
> u32 hash; /* Hash in the hastable */
> u32 size; /* Number of frames in the stack */
> union handle_parts handle;
> + refcount_t count; /* Number of the same repeated stacks */

This will increase stack_record size for every user, even if they don't
care about the count.

Is there a way to store this out-of-line somewhere?

> unsigned long entries[]; /* Variable-sized array of entries. */
> };
>
> @@ -139,6 +140,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
> stack->handle.slabindex = depot_index;
> stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
> stack->handle.valid = 1;
> + refcount_set(&stack->count, 1);
> memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
> depot_offset += required_size;
>
> @@ -302,6 +304,29 @@ void stack_depot_print(depot_stack_handle_t stack)
> }
> EXPORT_SYMBOL_GPL(stack_depot_print);
>
> +static struct stack_record *stack_depot_getstack(depot_stack_handle_t handle)
> +{
> + union handle_parts parts = { .handle = handle };
> + void *slab;
> + size_t offset = parts.offset << STACK_ALLOC_ALIGN;
> + struct stack_record *stack;
> +
> + if(!handle)
> + return NULL;
> +
> + if (parts.slabindex > depot_index) {
> + WARN(1, "slab index %d out of bounds (%d) for stack id %08x\n",
> + parts.slabindex, depot_index, handle);
> + return NULL;
> + }
> + slab = stack_slabs[parts.slabindex];
> + if (!slab)
> + return NULL;
> +
> + stack = slab + offset;
> + return stack;
> +}
> +
> /**
> * stack_depot_fetch - Fetch stack entries from a depot
> *
> @@ -314,30 +339,42 @@ EXPORT_SYMBOL_GPL(stack_depot_print);
> unsigned int stack_depot_fetch(depot_stack_handle_t handle,
> unsigned long **entries)
> {
> - union handle_parts parts = { .handle = handle };
> - void *slab;
> - size_t offset = parts.offset << STACK_ALLOC_ALIGN;
> struct stack_record *stack;
>
> *entries = NULL;
> if (!handle)
> return 0;
>
> - if (parts.slabindex > depot_index) {
> - WARN(1, "slab index %d out of bounds (%d) for stack id %08x\n",
> - parts.slabindex, depot_index, handle);
> - return 0;
> - }
> - slab = stack_slabs[parts.slabindex];
> - if (!slab)
> + stack = stack_depot_getstack(handle);
> + if (!stack)
> return 0;
> - stack = slab + offset;
>
> *entries = stack->entries;
> return stack->size;
> }
> EXPORT_SYMBOL_GPL(stack_depot_fetch);
>
> +static void stack_depot_inc_count(struct stack_record *stack)
> +{
> + refcount_inc(&stack->count);
> +}
> +
> +void stack_depot_dec_count(depot_stack_handle_t handle)
> +{
> + struct stack_record *stack = NULL;
> +
> + stack = stack_depot_getstack(handle);
> + if (stack) {
> + /*
> + * page_owner creates some stacks via create_dummy_stack().
> + * We are not interested in those, so make sure we only decrement
> + * "valid" stacks.
> + */

Comment indent is wrong.

> + if (refcount_read(&stack->count) > 1)
> + refcount_dec(&stack->count);
> + }
> +}
> +
> /**
> * __stack_depot_save - Save a stack trace from an array
> *
> @@ -363,7 +400,8 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
> */
> depot_stack_handle_t __stack_depot_save(unsigned long *entries,
> unsigned int nr_entries,
> - gfp_t alloc_flags, bool can_alloc)
> + gfp_t alloc_flags, bool can_alloc,
> + stack_action_t action)
> {
> struct stack_record *found = NULL, **bucket;
> depot_stack_handle_t retval = 0;
> @@ -449,8 +487,11 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
> /* Nobody used this memory, ok to free it. */
> free_pages((unsigned long)prealloc, STACK_ALLOC_ORDER);
> }
> - if (found)
> + if (found) {
> retval = found->handle.handle;
> + if (action == STACK_ACTION_INC)
> + stack_depot_inc_count(found);
> + }
> fast_exit:
> return retval;
> }
> @@ -472,6 +513,16 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
> unsigned int nr_entries,
> gfp_t alloc_flags)
> {
> - return __stack_depot_save(entries, nr_entries, alloc_flags, true);
> + return __stack_depot_save(entries, nr_entries, alloc_flags, true,
> + STACK_ACTION_NONE);
> }
> EXPORT_SYMBOL_GPL(stack_depot_save);
> +
> +depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
> + unsigned int nr_entries,
> + gfp_t alloc_flags,
> + stack_action_t action)
> +{
> + return __stack_depot_save(entries, nr_entries, alloc_flags, true, action);
> +}
> +EXPORT_SYMBOL_GPL(stack_depot_save_action);
> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> index c40c0e7b3b5f..f434994f3b0d 100644
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -36,7 +36,8 @@ depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)
> unsigned int nr_entries;
>
> nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
> - return __stack_depot_save(entries, nr_entries, flags, can_alloc);
> + return __stack_depot_save(entries, nr_entries, flags, can_alloc,
> + STACK_ACTION_NONE);
> }
>
> void kasan_set_track(struct kasan_track *track, gfp_t flags)
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index e4c6f3f1695b..794f346d7520 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -106,7 +106,7 @@ static inline struct page_owner *get_page_owner(struct page_ext *page_ext)
> return (void *)page_ext + page_owner_ops.offset;
> }
>
> -static noinline depot_stack_handle_t save_stack(gfp_t flags)
> +static noinline depot_stack_handle_t save_stack(gfp_t flags, stack_action_t action)
> {
> unsigned long entries[PAGE_OWNER_STACK_DEPTH];
> depot_stack_handle_t handle;
> @@ -125,7 +125,7 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
> current->in_page_owner = 1;
>
> nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
> - handle = stack_depot_save(entries, nr_entries, flags);
> + handle = stack_depot_save_action(entries, nr_entries, flags, action);
> if (!handle)
> handle = failure_handle;
>
> @@ -138,6 +138,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
> int i;
> struct page_ext *page_ext;
> depot_stack_handle_t handle;
> + depot_stack_handle_t alloc_handle;
> struct page_owner *page_owner;
> u64 free_ts_nsec = local_clock();
>
> @@ -145,7 +146,10 @@ void __reset_page_owner(struct page *page, unsigned short order)
> if (unlikely(!page_ext))
> return;
>
> - handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
> + page_owner = get_page_owner(page_ext);
> + alloc_handle = page_owner->handle;
> +
> + handle = save_stack(GFP_NOWAIT | __GFP_NOWARN, STACK_ACTION_NONE);
> for (i = 0; i < (1 << order); i++) {
> __clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
> page_owner = get_page_owner(page_ext);
> @@ -153,6 +157,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
> page_owner->free_ts_nsec = free_ts_nsec;
> page_ext = page_ext_next(page_ext);
> }
> + stack_depot_dec_count(alloc_handle);
> }
>
> static inline void __set_page_owner_handle(struct page_ext *page_ext,
> @@ -189,7 +194,7 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
> if (unlikely(!page_ext))
> return;
>
> - handle = save_stack(gfp_mask);
> + handle = save_stack(gfp_mask, STACK_ACTION_INC);
> __set_page_owner_handle(page_ext, handle, order, gfp_mask);
> }
>
> --
> 2.35.3

2022-09-01 08:50:56

by Ammar Faizi

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter

On 9/1/22 11:42 AM, Oscar Salvador wrote:
> +static unsigned long last_stack = 0;

This @last_stack can just be a static local variable in the new
function you wrote, read_page_owner_stacks(), since no other
functions use it.

> +static ssize_t read_page_owner_stacks(struct file *file, char __user *buf,
> + size_t count, loff_t *pos)
> +{
> + char *kbuf;
> + int ret = 0;
> +
> + count = min_t(size_t, count, PAGE_SIZE);
> + kbuf = kmalloc(count, GFP_KERNEL);
> + if (!kbuf)
> + return ENOMEM;

Missing a negative sign, return -ENOMEM;

> + ret += stack_depot_print_stacks_threshold(kbuf, count, pos, &last_stack);
> + if (copy_to_user(buf, kbuf, ret))
> + ret = -EFAULT;
> +
> + if (!ret)
> + last_stack = 0;
> +
> + kfree(kbuf);
> + return ret;
> +}

--
Ammar Faizi

2022-09-01 08:54:59

by Ammar Faizi

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm,page_owner: Filter out stacks by a threshold counter

On 9/1/22 11:42 AM, Oscar Salvador wrote:> +static ssize_t read_page_owner_threshold(struct file *file, char __user *buf,
> + size_t count, loff_t *pos)
> +{
> + char *kbuf;
> + int ret = 0;
> +
> + count = min_t(size_t, count, PAGE_SIZE);
> +
> + if (*pos >= count)
> + return 0;
> +
> + kbuf = kmalloc(count, GFP_KERNEL);
> + if (!kbuf)
> + return ENOMEM;

Missing a negative sign, return -ENOMEM.

> + ret = scnprintf(kbuf, count, "%lu\n", threshold_count);
> + if (copy_to_user(buf, kbuf, ret))
> + ret = -EFAULT;
> +
> + *pos += count;
> + kfree(kbuf);
> +
> + return ret;
> +}
> +
> +static ssize_t write_page_owner_threshold(struct file *file, const char __user *buf,
> + size_t count, loff_t *pos)
> +{
> + char *kbuf;
> + int ret = 0;
> +
> + count = min_t(size_t, count, PAGE_SIZE);
> + kbuf = kmalloc(count, GFP_KERNEL);

This looks overestimating to me. For unsigned long, on a 64-bit system
has max val 18446744073709551615 (20 chars).

You can use stack a allocated local variable with length 21. No need
to use kmalloc(). The same way with the read() op.

> + if (!kbuf)
> + return -ENOMEM;
> +
> + if (copy_from_user(kbuf, buf, count)) {
> + ret = -EFAULT;
> + goto out;
> + }
> +
> + kbuf[count - 1] = '\0';
> +
> + ret = kstrtoul(kbuf, 10, &threshold_count);
> +
> +out:
> + kfree(kbuf);
> + return ret ? ret : count;
> +}
> +
> +static const struct file_operations proc_page_owner_threshold = {
> + .read = read_page_owner_threshold,
> + .write = write_page_owner_threshold,
> +};

--
Ammar Faizi

2022-09-01 08:55:18

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/3] lib/stackdepot: Add a refcount field in stack_record

On Thu 01-09-22 10:24:58, Marco Elver wrote:
> On Thu, Sep 01, 2022 at 06:42AM +0200, Oscar Salvador wrote:
[...]
> > diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> > index 5ca0d086ef4a..aeb59d3557e2 100644
> > --- a/lib/stackdepot.c
> > +++ b/lib/stackdepot.c
> > @@ -63,6 +63,7 @@ struct stack_record {
> > u32 hash; /* Hash in the hastable */
> > u32 size; /* Number of frames in the stack */
> > union handle_parts handle;
> > + refcount_t count; /* Number of the same repeated stacks */
>
> This will increase stack_record size for every user, even if they don't
> care about the count.

Couldn't this be used for garbage collection?
--
Michal Hocko
SUSE Labs

2022-09-01 09:34:35

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH 1/3] lib/stackdepot: Add a refcount field in stack_record

On Thu, 1 Sept 2022 at 10:38, Michal Hocko <[email protected]> wrote:
>
> On Thu 01-09-22 10:24:58, Marco Elver wrote:
> > On Thu, Sep 01, 2022 at 06:42AM +0200, Oscar Salvador wrote:
> [...]
> > > diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> > > index 5ca0d086ef4a..aeb59d3557e2 100644
> > > --- a/lib/stackdepot.c
> > > +++ b/lib/stackdepot.c
> > > @@ -63,6 +63,7 @@ struct stack_record {
> > > u32 hash; /* Hash in the hastable */
> > > u32 size; /* Number of frames in the stack */
> > > union handle_parts handle;
> > > + refcount_t count; /* Number of the same repeated stacks */
> >
> > This will increase stack_record size for every user, even if they don't
> > care about the count.
>
> Couldn't this be used for garbage collection?

Only if we can precisely figure out at which point a stack is no
longer going to be needed.

But more realistically, stack depot was designed to be simple. Right
now it can allocate new stacks (from an internal pool), but giving the
memory back to that pool isn't supported. Doing garbage collection
would effectively be a redesign of stack depot. And for the purpose
for which stack depot was designed (debugging tools), memory has never
been an issue (note that stack depot also has a fixed upper bound on
memory usage).

We had talked (in the context of KASAN) about bounded stack storage,
but the preferred solution is usually a cache-based design which
allows evictions (in the simplest case a ring buffer), because
figuring out (and relying on) where precisely a stack will
definitively no longer be required in bug reports is complex and does
not guarantee the required bound on memory usage. Andrey has done the
work on this for tag-based KASAN modes:
https://lore.kernel.org/all/[email protected]/

2022-09-01 09:35:06

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm,page_owner: Filter out stacks by a threshold counter

On Thu 01-09-22 06:42:49, Oscar Salvador wrote:
[...]
> +static ssize_t read_page_owner_threshold(struct file *file, char __user *buf,
> + size_t count, loff_t *pos)
> +{
> + char *kbuf;
> + int ret = 0;
> +
> + count = min_t(size_t, count, PAGE_SIZE);
> +
> + if (*pos >= count)
> + return 0;
> +
> + kbuf = kmalloc(count, GFP_KERNEL);

No, you do not want to trigger user defined allocation like that. I
would use seq_file.
--
Michal Hocko
SUSE Labs

2022-09-01 09:39:09

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 0/3] page_owner: print stacks and their counter

On Thu 01-09-22 06:42:46, Oscar Salvador wrote:
> Hi,
>
> page_owner is a great debug functionality tool that gets us to know
> about all pages that have been allocated/freed and their stacktrace.
> This comes very handy when e.g: debugging leaks, as with some scripting
> we might be able to see those stacktraces that are allocating pages
> but not freeing theme.
>
> In my experience, that is one of the most useful cases, but it can get
> really tedious to screen through all pages aand try to reconstruct the
> stack <-> allocated/freed relationship. There is a lot of noise
> to cancel off.
>
> This patch aims to fix that by adding a new functionality into page_owner.
> What this does is to create a new read-only file "page_owner_stacks",
> which prints only the allocating stacktraces and their counting, being that
> the times the stacktrace has allocated - the times it has freed.
>
> So we have a clear overview of stacks <-> allocated/freed relationship
> without the need to fiddle with pages and trying to match free stacktraces
> with allocated stacktraces.
>
> This is achieved by adding a new refcount_t field in the stack_record struct,
> incrementing that refcount_t everytime the same stacktrace allocates,
> and decrementing it when it frees a page. Details can be seen in the
> respective patches.
>
> We also create another file called "page_owner_threshold", which let us
> specify a threshold, so when when reading from "page_owner_stacks",
> we will only see those stacktraces which counting goes beyond the
> threshold we specified.
>
> A PoC can be found below:
>
> # cat /sys/kernel/debug/page_owner_threshold
> 0
> # cat /sys/kernel/debug/page_owner_stacks > stacks_full.txt
> # head -32 stacks_full.txt
> prep_new_page+0x10d/0x180
> get_page_from_freelist+0x1bd6/0x1e10
> __alloc_pages+0x194/0x360
> alloc_page_interleave+0x13/0x90
> new_slab+0x31d/0x530
> ___slab_alloc+0x5d7/0x720
> __slab_alloc.isra.85+0x4a/0x90
> kmem_cache_alloc+0x455/0x4a0
> acpi_ps_alloc_op+0x57/0x8f
> acpi_ps_create_scope_op+0x12/0x23
> acpi_ps_execute_method+0x102/0x2c1
> acpi_ns_evaluate+0x343/0x4da
> acpi_evaluate_object+0x1cb/0x392
> acpi_run_osc+0x135/0x260
> acpi_init+0x165/0x4ed
> do_one_initcall+0x3e/0x200
> stack count: 2

This is very nice and useful! I guess some people would prefer to have
Memory usage: XYZ kB
dumped instead but looking at the code this would require to track
number of pages rather than calls with stacks and that would be more code
and somehow alien to the concept as well. Practically speaking, when
looking into leakers high stack count should be indicative enough IMHO.

[...]
> Oscar Salvador (3):
> lib/stackdepot: Add a refcount field in stack_record
> mm, page_owner: Add page_owner_stacks file to print out only stacks
> and their counter
> mm,page_owner: Filter out stacks by a threshold counter
>
> include/linux/stackdepot.h | 16 ++++-
> lib/stackdepot.c | 121 ++++++++++++++++++++++++++++++++-----
> mm/kasan/common.c | 3 +-
> mm/page_owner.c | 102 +++++++++++++++++++++++++++++--
> 4 files changed, 222 insertions(+), 20 deletions(-)

The code footprint is also rather low. I am no expert in neither
stackdepot nor page owner but from a very quick glance nothing really
jumped at me.

Thanks!
--
Michal Hocko
SUSE Labs

2022-09-01 10:47:32

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/3] lib/stackdepot: Add a refcount field in stack_record

On Thu 01-09-22 11:18:19, Marco Elver wrote:
> On Thu, 1 Sept 2022 at 10:38, Michal Hocko <[email protected]> wrote:
> >
> > On Thu 01-09-22 10:24:58, Marco Elver wrote:
> > > On Thu, Sep 01, 2022 at 06:42AM +0200, Oscar Salvador wrote:
> > [...]
> > > > diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> > > > index 5ca0d086ef4a..aeb59d3557e2 100644
> > > > --- a/lib/stackdepot.c
> > > > +++ b/lib/stackdepot.c
> > > > @@ -63,6 +63,7 @@ struct stack_record {
> > > > u32 hash; /* Hash in the hastable */
> > > > u32 size; /* Number of frames in the stack */
> > > > union handle_parts handle;
> > > > + refcount_t count; /* Number of the same repeated stacks */
> > >
> > > This will increase stack_record size for every user, even if they don't
> > > care about the count.
> >
> > Couldn't this be used for garbage collection?
>
> Only if we can precisely figure out at which point a stack is no
> longer going to be needed.
>
> But more realistically, stack depot was designed to be simple. Right
> now it can allocate new stacks (from an internal pool), but giving the
> memory back to that pool isn't supported. Doing garbage collection
> would effectively be a redesign of stack depot.

Fair argument.

> And for the purpose
> for which stack depot was designed (debugging tools), memory has never
> been an issue (note that stack depot also has a fixed upper bound on
> memory usage).

Is the increased size really a blocker then? I see how it sucks to
maintain a counter when it is not used by anything but page_owner but
storing that counte externally would just add more complexity AFAICS
(more allocations, more tracking etc.).

Maybe the counter can be conditional on the page_owner which would add
some complexity as well (variable size structure) but at least the
external allocation stuff could be avoided.
--
Michal Hocko
SUSE Labs

2022-09-01 10:48:20

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH 1/3] lib/stackdepot: Add a refcount field in stack_record

On Thu, 1 Sept 2022 at 12:01, Michal Hocko <[email protected]> wrote:
>
> On Thu 01-09-22 11:18:19, Marco Elver wrote:
> > On Thu, 1 Sept 2022 at 10:38, Michal Hocko <[email protected]> wrote:
> > >
> > > On Thu 01-09-22 10:24:58, Marco Elver wrote:
> > > > On Thu, Sep 01, 2022 at 06:42AM +0200, Oscar Salvador wrote:
> > > [...]
> > > > > diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> > > > > index 5ca0d086ef4a..aeb59d3557e2 100644
> > > > > --- a/lib/stackdepot.c
> > > > > +++ b/lib/stackdepot.c
> > > > > @@ -63,6 +63,7 @@ struct stack_record {
> > > > > u32 hash; /* Hash in the hastable */
> > > > > u32 size; /* Number of frames in the stack */
> > > > > union handle_parts handle;
> > > > > + refcount_t count; /* Number of the same repeated stacks */
> > > >
> > > > This will increase stack_record size for every user, even if they don't
> > > > care about the count.
> > >
> > > Couldn't this be used for garbage collection?
> >
> > Only if we can precisely figure out at which point a stack is no
> > longer going to be needed.
> >
> > But more realistically, stack depot was designed to be simple. Right
> > now it can allocate new stacks (from an internal pool), but giving the
> > memory back to that pool isn't supported. Doing garbage collection
> > would effectively be a redesign of stack depot.
>
> Fair argument.
>
> > And for the purpose
> > for which stack depot was designed (debugging tools), memory has never
> > been an issue (note that stack depot also has a fixed upper bound on
> > memory usage).
>
> Is the increased size really a blocker then? I see how it sucks to
> maintain a counter when it is not used by anything but page_owner but
> storing that counte externally would just add more complexity AFAICS
> (more allocations, more tracking etc.).

Right, I think keeping it simple is better.

> Maybe the counter can be conditional on the page_owner which would add
> some complexity as well (variable size structure) but at least the
> external allocation stuff could be avoided.

Not sure it's needed - I just checked the size of stack_record on a
x86-64 build, and it's 24 bytes. Because 'handle_parts' is 4 bytes,
and refcount_t is 4 bytes, and the alignment of 'entries' being 8
bytes, even with the refcount_t, stack_record is still 24 bytes. :-)

And for me that's good enough. Maybe mentioning this in the commit
message is worthwhile. Of course 32-bit builds still suffer a little,
but I think we can live with that.

2022-09-01 19:34:05

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter

Hi Oscar,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.0-rc3]
[cannot apply to akpm-mm/mm-everything next-20220901]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Oscar-Salvador/page_owner-print-stacks-and-their-counter/20220901-124408
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git c5e4d5e99162ba8025d58a3af7ad103f155d2df7
config: s390-buildonly-randconfig-r001-20220901
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project c55b41d5199d2394dd6cdb8f52180d8b81d809d4)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install s390 cross compiling tool for clang build
# apt-get install binutils-s390x-linux-gnu
# https://github.com/intel-lab-lkp/linux/commit/18d3054fb57a70676be763adab8c8881a1baa504
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Oscar-Salvador/page_owner-print-stacks-and-their-counter/20220901-124408
git checkout 18d3054fb57a70676be763adab8c8881a1baa504
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

In file included from lib/stackdepot.c:34:
In file included from include/linux/memblock.h:13:
In file included from arch/s390/include/asm/dma.h:5:
In file included from arch/s390/include/asm/io.h:75:
include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __raw_readb(PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
^
include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
#define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
^
In file included from lib/stackdepot.c:34:
In file included from include/linux/memblock.h:13:
In file included from arch/s390/include/asm/dma.h:5:
In file included from arch/s390/include/asm/io.h:75:
include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
^
include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
#define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
^
In file included from lib/stackdepot.c:34:
In file included from include/linux/memblock.h:13:
In file included from arch/s390/include/asm/dma.h:5:
In file included from arch/s390/include/asm/io.h:75:
include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writeb(value, PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:692:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
readsb(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:700:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
readsw(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:708:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
readsl(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:717:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
writesb(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:726:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
writesw(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:735:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
writesl(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
>> lib/stackdepot.c:603:10: error: use of undeclared identifier 'STACK_HASH_SIZE'
if (i < STACK_HASH_SIZE)
^
12 warnings and 1 error generated.


vim +/STACK_HASH_SIZE +603 lib/stackdepot.c

568
569 int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos,
570 unsigned long *last_stack)
571 {
572 struct stack_record *stack = NULL, *last;
573 struct stack_record **stacks;
574 int i = *pos, ret = 0;
575
576 /* Continue from the last week if we have one */
577 if (*last_stack) {
578 last = (struct stack_record *)*last_stack;
579 stack = last->next;
580 } else {
581 new_table:
582 stacks = &stack_table[i];
583 stack = (struct stack_record *)stacks;
584 }
585
586 for (; stack; stack = stack->next) {
587 if (!stack->size || stack->size < 0 ||
588 stack->size > size || stack->handle.valid != 1 ||
589 refcount_read(&stack->count) < 1)
590 continue;
591
592 ret += stack_trace_snprint(buf, size, stack->entries, stack->size, 0);
593 ret += scnprintf(buf + ret, size - ret, "stack count: %d\n\n",
594 refcount_read(&stack->count));
595 *last_stack = (unsigned long)stack;
596 return ret;
597 }
598
599 i++;
600 *pos = i;
601
602 /* Keep looking all tables for valid stacks */
> 603 if (i < STACK_HASH_SIZE)

--
0-DAY CI Kernel Test Service
https://01.org/lkp


Attachments:
(No filename) (7.71 kB)
config (62.10 kB)
Download all attachments

2022-09-02 01:21:11

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter

Hi Oscar,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.0-rc3]
[cannot apply to akpm-mm/mm-everything next-20220901]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Oscar-Salvador/page_owner-print-stacks-and-their-counter/20220901-124408
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git c5e4d5e99162ba8025d58a3af7ad103f155d2df7
config: arc-randconfig-r043-20220901 (https://download.01.org/0day-ci/archive/20220902/[email protected]/config)
compiler: arc-elf-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/18d3054fb57a70676be763adab8c8881a1baa504
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Oscar-Salvador/page_owner-print-stacks-and-their-counter/20220901-124408
git checkout 18d3054fb57a70676be763adab8c8881a1baa504
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arc SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

lib/stackdepot.c: In function 'stack_depot_print_stacks_threshold':
>> lib/stackdepot.c:603:17: error: 'STACK_HASH_SIZE' undeclared (first use in this function); did you mean 'STACK_HASH_SEED'?
603 | if (i < STACK_HASH_SIZE)
| ^~~~~~~~~~~~~~~
| STACK_HASH_SEED
lib/stackdepot.c:603:17: note: each undeclared identifier is reported only once for each function it appears in


vim +603 lib/stackdepot.c

568
569 int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos,
570 unsigned long *last_stack)
571 {
572 struct stack_record *stack = NULL, *last;
573 struct stack_record **stacks;
574 int i = *pos, ret = 0;
575
576 /* Continue from the last week if we have one */
577 if (*last_stack) {
578 last = (struct stack_record *)*last_stack;
579 stack = last->next;
580 } else {
581 new_table:
582 stacks = &stack_table[i];
583 stack = (struct stack_record *)stacks;
584 }
585
586 for (; stack; stack = stack->next) {
587 if (!stack->size || stack->size < 0 ||
588 stack->size > size || stack->handle.valid != 1 ||
589 refcount_read(&stack->count) < 1)
590 continue;
591
592 ret += stack_trace_snprint(buf, size, stack->entries, stack->size, 0);
593 ret += scnprintf(buf + ret, size - ret, "stack count: %d\n\n",
594 refcount_read(&stack->count));
595 *last_stack = (unsigned long)stack;
596 return ret;
597 }
598
599 i++;
600 *pos = i;
601
602 /* Keep looking all tables for valid stacks */
> 603 if (i < STACK_HASH_SIZE)

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-09-02 03:45:40

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH 1/3] lib/stackdepot: Add a refcount field in stack_record

On Thu, Sep 01, 2022 at 10:24:58AM +0200, Marco Elver wrote:
> On Thu, Sep 01, 2022 at 06:42AM +0200, Oscar Salvador wrote:
> > include/linux/stackdepot.h | 13 ++++++-
> > lib/stackdepot.c | 79 +++++++++++++++++++++++++++++++-------
> > mm/kasan/common.c | 3 +-
>
> +Cc other kasan maintainers

Yeah, sorry about that, I should have CCed you guys.

> > +typedef enum stack_action {
> > + STACK_ACTION_NONE,
> > + STACK_ACTION_INC,
> > +}stack_action_t;
> > +
>
> missing space after '}'. But please no unnecessary typedef, just 'enum
> stack_action' (and spelling out 'enum stack_action' elsewhere) is just
> fine.

Sure, will re-name it.

>
> This is in the global namespace, so I'd call this
> stack_depot_action+STACK_DEPOT_ACTION_*.
>
> However, .._ACTION_INC doesn't really say what's incremented. As an
> analog to stack_depot_dec_count(), perhaps .._ACTION_COUNT?

I guess we can go "STACK_DEPOT_ACTION_COUNT", or "STACK_DEPOT_ACTION_REF_INC",
but the latter seems rather baroque for my taste.

> In general it'd be nicer if there was stack_depot_inc_count() instead of
> this additional argument, but I see that for performance reasons you
> might not like that?

Yes, the first prototypes didn't have this stack_action_t thing,
but that implied that we had to look for the stack twice
in the __set_page_owner() case.

This way we only do that in the __reset_page_owner() case.

So yes, it's a trade-off performance vs LOC.

> > --- a/lib/stackdepot.c
> > +++ b/lib/stackdepot.c
> > @@ -63,6 +63,7 @@ struct stack_record {
> > u32 hash; /* Hash in the hastable */
> > u32 size; /* Number of frames in the stack */
> > union handle_parts handle;
> > + refcount_t count; /* Number of the same repeated stacks */
>
> This will increase stack_record size for every user, even if they don't
> care about the count.
>
> Is there a way to store this out-of-line somewhere?

That would require having some kind of e.g: dynamic struct and allocating
new links to stacks as they were created and increase the refcount there.

But that would be too much of complexity, I think.

As I read in your other thread, we can probably live with that, but
it is worth spelling out in the changelog.

> > +void stack_depot_dec_count(depot_stack_handle_t handle)
> > +{
> > + struct stack_record *stack = NULL;
> > +
> > + stack = stack_depot_getstack(handle);
> > + if (stack) {
> > + /*
> > + * page_owner creates some stacks via create_dummy_stack().
> > + * We are not interested in those, so make sure we only decrement
> > + * "valid" stacks.
> > + */
>
> Comment indent is wrong.

Will fix it.

Thanks for taking the time to review the code Marco!


--
Oscar Salvador
SUSE Labs

2022-09-02 03:49:04

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm,page_owner: Filter out stacks by a threshold counter

On Thu, Sep 01, 2022 at 10:40:32AM +0200, Michal Hocko wrote:
> On Thu 01-09-22 06:42:49, Oscar Salvador wrote:
> [...]
> > +static ssize_t read_page_owner_threshold(struct file *file, char __user *buf,
> > + size_t count, loff_t *pos)
> > +{
> > + char *kbuf;
> > + int ret = 0;
> > +
> > + count = min_t(size_t, count, PAGE_SIZE);
> > +
> > + if (*pos >= count)
> > + return 0;
> > +
> > + kbuf = kmalloc(count, GFP_KERNEL);
>
> No, you do not want to trigger user defined allocation like that. I
> would use seq_file.

Sure, will use that.

Thanks!


--
Oscar Salvador
SUSE Labs

2022-09-02 04:17:12

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter

On Thu, Sep 01, 2022 at 03:16:44PM +0700, Ammar Faizi wrote:
> On 9/1/22 11:42 AM, Oscar Salvador wrote:
> > +static unsigned long last_stack = 0;
>
> This @last_stack can just be a static local variable in the new
> function you wrote, read_page_owner_stacks(), since no other
> functions use it.

We could certainly do that.

>
> > +static ssize_t read_page_owner_stacks(struct file *file, char __user *buf,
> > + size_t count, loff_t *pos)
> > +{
> > + char *kbuf;
> > + int ret = 0;
> > +
> > + count = min_t(size_t, count, PAGE_SIZE);
> > + kbuf = kmalloc(count, GFP_KERNEL);
> > + if (!kbuf)
> > + return ENOMEM;
>
> Missing a negative sign, return -ENOMEM;

Oh yes, I overlooked that.

Thanks!


--
Oscar Salvador
SUSE Labs

2022-09-02 04:17:15

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm,page_owner: Filter out stacks by a threshold counter

On Thu, Sep 01, 2022 at 03:31:51PM +0700, Ammar Faizi wrote:
> On 9/1/22 11:42 AM, Oscar Salvador wrote:> +static ssize_t read_page_owner_threshold(struct file *file, char __user *buf,
> > + kbuf = kmalloc(count, GFP_KERNEL);
> > + if (!kbuf)
> > + return ENOMEM;
>
> Missing a negative sign, return -ENOMEM.

Will fix.

> > +static ssize_t write_page_owner_threshold(struct file *file, const char __user *buf,
> > + size_t count, loff_t *pos)
> > +{
> > + char *kbuf;
> > + int ret = 0;
> > +
> > + count = min_t(size_t, count, PAGE_SIZE);
> > + kbuf = kmalloc(count, GFP_KERNEL);
>
> This looks overestimating to me. For unsigned long, on a 64-bit system
> has max val 18446744073709551615 (20 chars).
>
> You can use stack a allocated local variable with length 21. No need
> to use kmalloc(). The same way with the read() op.

Probably could do that, but I'll go with Michal's option and will use
seq_file.

Thanks!


--
Oscar Salvador
SUSE Labs

2022-09-05 21:07:22

by Andrey Konovalov

[permalink] [raw]
Subject: Re: [PATCH 1/3] lib/stackdepot: Add a refcount field in stack_record

On Thu, Sep 1, 2022 at 11:18 AM Marco Elver <[email protected]> wrote:
>
> On Thu, 1 Sept 2022 at 10:38, Michal Hocko <[email protected]> wrote:
> >
> > On Thu 01-09-22 10:24:58, Marco Elver wrote:
> > > On Thu, Sep 01, 2022 at 06:42AM +0200, Oscar Salvador wrote:
> > [...]
> > > > diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> > > > index 5ca0d086ef4a..aeb59d3557e2 100644
> > > > --- a/lib/stackdepot.c
> > > > +++ b/lib/stackdepot.c
> > > > @@ -63,6 +63,7 @@ struct stack_record {
> > > > u32 hash; /* Hash in the hastable */
> > > > u32 size; /* Number of frames in the stack */
> > > > union handle_parts handle;
> > > > + refcount_t count; /* Number of the same repeated stacks */
> > >
> > > This will increase stack_record size for every user, even if they don't
> > > care about the count.
> >
> > Couldn't this be used for garbage collection?
>
> Only if we can precisely figure out at which point a stack is no
> longer going to be needed.
>
> But more realistically, stack depot was designed to be simple. Right
> now it can allocate new stacks (from an internal pool), but giving the
> memory back to that pool isn't supported. Doing garbage collection
> would effectively be a redesign of stack depot. And for the purpose
> for which stack depot was designed (debugging tools), memory has never
> been an issue (note that stack depot also has a fixed upper bound on
> memory usage).
>
> We had talked (in the context of KASAN) about bounded stack storage,
> but the preferred solution is usually a cache-based design which
> allows evictions (in the simplest case a ring buffer), because
> figuring out (and relying on) where precisely a stack will
> definitively no longer be required in bug reports is complex and does
> not guarantee the required bound on memory usage. Andrey has done the
> work on this for tag-based KASAN modes:
> https://lore.kernel.org/all/[email protected]/

To be clear, the stack ring buffer implementation for the KASAN
tag-based modes still uses the stack depot as a back end to store
stack traces.

I plan to explore redesigning the stack depot implementation to allow
evicting unneeded stack traces as the next step. (The goal is to have
a memory-bounded stack depot that doesn't just stop collecting stack
traces once the memory limit is reached.) Having a refcount for each
saved stack trace will likely be a part of this redesign.