From: Andrey Konovalov <[email protected]>
Currently, the stack depot grows indefinitely until it reaches its
capacity. Once that happens, the stack depot stops saving new stack
traces.
This creates a problem for using the stack depot for in-field testing
and in production.
For such uses, an ideal stack trace storage should:
1. Allow saving fresh stack traces on systems with a large uptime while
limiting the amount of memory used to store the traces;
2. Have a low performance impact.
Implementing #1 in the stack depot is impossible with the current
keep-forever approach. This series targets to address that. Issue #2 is
left to be addressed in a future series.
This series changes the stack depot implementation to allow evicting
unneeded stack traces from the stack depot. The users of the stack depot
can do that via new stack_depot_save_flags(STACK_DEPOT_FLAG_GET) and
stack_depot_put APIs.
Internal changes to the stack depot code include:
1. Storing stack traces in fixed-frame-sized slots; the slot size is
controlled via CONFIG_STACKDEPOT_MAX_FRAMES (vs precisely-sized
slots in the current implementation);
2. Keeping available slots in a freelist (vs keeping an offset to the next
free slot);
3. Using a read/write lock for synchronization (vs a lock-free approach
combined with a spinlock).
This series also integrates the eviction functionality in the tag-based
KASAN modes.
Despite wasting some space on rounding up the size of each stack record,
with CONFIG_STACKDEPOT_MAX_FRAMES=32, the tag-based KASAN modes end up
consuming ~5% less memory in stack depot during boot (with the default
stack ring size of 32k entries). The reason for this is the eviction of
irrelevant stack traces from the stack depot, which frees up space for
other stack traces.
For other tools that heavily rely on the stack depot, like Generic KASAN
and KMSAN, this change leads to the stack depot capacity being reached
sooner than before. However, as these tools are mainly used in fuzzing
scenarios where the kernel is frequently rebooted, this outcome should
be acceptable.
There is no measurable boot time performance impact of these changes for
KASAN on x86-64. I haven't done any tests for arm64 modes (the stack
depot without performance optimizations is not suitable for intended use
of those anyway), but I expect a similar result. Obtaining and copying
stack trace frames when saving them into stack depot is what takes the
most time.
This series does not yet provide a way to configure the maximum size of
the stack depot externally (e.g. via a command-line parameter). This will
be added in a separate series, possibly together with the performance
improvement changes.
---
Changes v1->v2:
- Rework API to stack_depot_save_flags(STACK_DEPOT_FLAG_GET) +
stack_depot_put.
- Add CONFIG_STACKDEPOT_MAX_FRAMES Kconfig option.
- Switch stack depot to using list_head's.
- Assorted minor changes, see the commit message for each path.
Andrey Konovalov (19):
lib/stackdepot: check disabled flag when fetching
lib/stackdepot: simplify __stack_depot_save
lib/stackdepot: drop valid bit from handles
lib/stackdepot: add depot_fetch_stack helper
lib/stackdepot: use fixed-sized slots for stack records
lib/stackdepot: fix and clean-up atomic annotations
lib/stackdepot: rework helpers for depot_alloc_stack
lib/stackdepot: rename next_pool_required to new_pool_required
lib/stackdepot: store next pool pointer in new_pool
lib/stackdepot: store free stack records in a freelist
lib/stackdepot: use read/write lock
lib/stackdepot: use list_head for stack record links
kmsan: use stack_depot_save instead of __stack_depot_save
lib/stackdepot, kasan: add flags to __stack_depot_save and rename
lib/stackdepot: add refcount for records
lib/stackdepot: allow users to evict stack traces
kasan: remove atomic accesses to stack ring entries
kasan: check object_size in kasan_complete_mode_report_info
kasan: use stack_depot_put for tag-based modes
include/linux/stackdepot.h | 59 ++++--
lib/Kconfig | 10 +-
lib/stackdepot.c | 410 ++++++++++++++++++++++++-------------
mm/kasan/common.c | 7 +-
mm/kasan/generic.c | 9 +-
mm/kasan/kasan.h | 2 +-
mm/kasan/report_tags.c | 27 +--
mm/kasan/tags.c | 24 ++-
mm/kmsan/core.c | 7 +-
9 files changed, 356 insertions(+), 199 deletions(-)
--
2.25.1
From: Andrey Konovalov <[email protected]>
Add stack_depot_put, a function that decrements the reference counter
on a stack record and removes it from the stack depot once the counter
reaches 0.
Internally, when removing a stack record, the function unlinks it from
the hash table bucket and returns to the freelist.
With this change, the users of stack depot can call stack_depot_put
when keeping a stack trace in the stack depot is not needed anymore.
This allows avoiding polluting the stack depot with irrelevant stack
traces and thus have more space to store the relevant ones before the
stack depot reaches its capacity.
Signed-off-by: Andrey Konovalov <[email protected]>
---
Changes v1->v2:
- Comments fixes as suggested by Marco.
- Add lockdep_assert annotation.
- Adapt to using list_head's.
- Rename stack_depot_evict to stack_depot_put.
---
include/linux/stackdepot.h | 14 ++++++++++++++
lib/stackdepot.c | 35 +++++++++++++++++++++++++++++++++++
2 files changed, 49 insertions(+)
diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 611716702d73..a6796f178913 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -97,6 +97,8 @@ static inline int stack_depot_early_init(void) { return 0; }
*
* If STACK_DEPOT_FLAG_GET is set in @depot_flags, stack depot will increment
* the refcount on the saved stack trace if it already exists in stack depot.
+ * Users of this flag must also call stack_depot_put() when keeping the stack
+ * trace is no longer required to avoid overflowing the refcount.
*
* If the provided stack trace comes from the interrupt context, only the part
* up to the interrupt entry is saved.
@@ -162,6 +164,18 @@ void stack_depot_print(depot_stack_handle_t stack);
int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size,
int spaces);
+/**
+ * stack_depot_put - Drop a reference to a stack trace from stack depot
+ *
+ * @handle: Stack depot handle returned from stack_depot_save()
+ *
+ * The stack trace is evicted from stack depot once all references to it have
+ * been dropped (once the number of stack_depot_evict() calls matches the
+ * number of stack_depot_save_flags() calls with STACK_DEPOT_FLAG_GET set for
+ * this stack trace).
+ */
+void stack_depot_put(depot_stack_handle_t handle);
+
/**
* stack_depot_set_extra_bits - Set extra bits in a stack depot handle
*
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index e2c622054265..56f2abc03717 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -398,6 +398,14 @@ static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
return stack;
}
+/* Links stack into the freelist. */
+static void depot_free_stack(struct stack_record *stack)
+{
+ lockdep_assert_held_write(&pool_rwlock);
+
+ list_add(&stack->list, &free_stacks);
+}
+
/* Calculates the hash for a stack. */
static inline u32 hash_stack(unsigned long *entries, unsigned int size)
{
@@ -580,6 +588,33 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle,
}
EXPORT_SYMBOL_GPL(stack_depot_fetch);
+void stack_depot_put(depot_stack_handle_t handle)
+{
+ struct stack_record *stack;
+ unsigned long flags;
+
+ if (!handle || stack_depot_disabled)
+ return;
+
+ write_lock_irqsave(&pool_rwlock, flags);
+
+ stack = depot_fetch_stack(handle);
+ if (WARN_ON(!stack))
+ goto out;
+
+ if (refcount_dec_and_test(&stack->count)) {
+ /* Unlink stack from the hash table. */
+ list_del(&stack->list);
+
+ /* Free stack. */
+ depot_free_stack(stack);
+ }
+
+out:
+ write_unlock_irqrestore(&pool_rwlock, flags);
+}
+EXPORT_SYMBOL_GPL(stack_depot_put);
+
void stack_depot_print(depot_stack_handle_t stack)
{
unsigned long *entries;
--
2.25.1
From: Andrey Konovalov <[email protected]>
Rename next_pool_required to new_pool_required.
This a purely code readability change: the following patch will change
stack depot to store the pointer to the new pool in a separate variable,
and "new" seems like a more logical name.
Reviewed-by: Alexander Potapenko <[email protected]>
Signed-off-by: Andrey Konovalov <[email protected]>
---
lib/stackdepot.c | 45 ++++++++++++++++++++++-----------------------
1 file changed, 22 insertions(+), 23 deletions(-)
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index e85b658be050..e428f470faf6 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -93,12 +93,11 @@ static size_t pool_offset;
static DEFINE_RAW_SPINLOCK(pool_lock);
/*
* Stack depot tries to keep an extra pool allocated even before it runs out
- * of space in the currently used pool.
- * This flag marks that this next extra pool needs to be allocated and
- * initialized. It has the value 0 when either the next pool is not yet
- * initialized or the limit on the number of pools is reached.
+ * of space in the currently used pool. This flag marks whether this extra pool
+ * needs to be allocated. It has the value 0 when either an extra pool is not
+ * yet allocated or if the limit on the number of pools is reached.
*/
-static int next_pool_required = 1;
+static int new_pool_required = 1;
static int __init disable_stack_depot(char *str)
{
@@ -219,18 +218,18 @@ int stack_depot_init(void)
}
EXPORT_SYMBOL_GPL(stack_depot_init);
-/* Keeps the preallocated memory to be used for the next stack depot pool. */
-static void depot_keep_next_pool(void **prealloc)
+/* Keeps the preallocated memory to be used for a new stack depot pool. */
+static void depot_keep_new_pool(void **prealloc)
{
/*
- * If the next pool is already saved or the maximum number of
+ * If a new pool is already saved or the maximum number of
* pools is reached, do not use the preallocated memory.
*/
if (!next_pool_required)
return;
/*
- * Use the preallocated memory for the next pool
+ * Use the preallocated memory for the new pool
* as long as we do not exceed the maximum number of pools.
*/
if (pool_index + 1 < DEPOT_MAX_POOLS) {
@@ -239,12 +238,12 @@ static void depot_keep_next_pool(void **prealloc)
}
/*
- * At this point, either the next pool is kept or the maximum
+ * At this point, either a new pool is kept or the maximum
* number of pools is reached. In either case, take note that
* keeping another pool is not required.
* smp_store_release pairs with smp_load_acquire in stack_depot_save.
*/
- smp_store_release(&next_pool_required, 0);
+ smp_store_release(&new_pool_required, 0);
}
/* Updates refences to the current and the next stack depot pools. */
@@ -259,7 +258,7 @@ static bool depot_update_pools(size_t required_size, void **prealloc)
}
/*
- * Move on to the next pool.
+ * Move on to the new pool.
* WRITE_ONCE pairs with potential concurrent read in
* stack_depot_fetch.
*/
@@ -268,12 +267,12 @@ static bool depot_update_pools(size_t required_size, void **prealloc)
/*
* If the maximum number of pools is not reached, take note
- * that the next pool needs to be initialized.
+ * that yet another new pool needs to be allocated.
* smp_store_release pairs with smp_load_acquire in
* stack_depot_save.
*/
if (pool_index + 1 < DEPOT_MAX_POOLS)
- smp_store_release(&next_pool_required, 1);
+ smp_store_release(&new_pool_required, 1);
}
/* Check if the current pool is not yet allocated. */
@@ -284,9 +283,9 @@ static bool depot_update_pools(size_t required_size, void **prealloc)
return true;
}
- /* Otherwise, try using the preallocated memory for the next pool. */
+ /* Otherwise, try using the preallocated memory for a new pool. */
if (*prealloc)
- depot_keep_next_pool(prealloc);
+ depot_keep_new_pool(prealloc);
return true;
}
@@ -297,7 +296,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
struct stack_record *stack;
size_t required_size = DEPOT_STACK_RECORD_SIZE;
- /* Update current and next pools if required and possible. */
+ /* Update current and new pools if required and possible. */
if (!depot_update_pools(required_size, prealloc))
return NULL;
@@ -429,13 +428,13 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
goto exit;
/*
- * Check if another stack pool needs to be initialized. If so, allocate
- * the memory now - we won't be able to do that under the lock.
+ * Check if another stack pool needs to be allocated. If so, allocate
+ * the memory now: we won't be able to do that under the lock.
*
* smp_load_acquire pairs with smp_store_release in depot_update_pools
- * and depot_keep_next_pool.
+ * and depot_keep_new_pool.
*/
- if (unlikely(can_alloc && smp_load_acquire(&next_pool_required))) {
+ if (unlikely(can_alloc && smp_load_acquire(&new_pool_required))) {
/*
* Zero out zone modifiers, as we don't have specific zone
* requirements. Keep the flags related to allocation in atomic
@@ -468,9 +467,9 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
} else if (prealloc) {
/*
* Stack depot already contains this stack trace, but let's
- * keep the preallocated memory for the future.
+ * keep the preallocated memory for future.
*/
- depot_keep_next_pool(&prealloc);
+ depot_keep_new_pool(&prealloc);
}
raw_spin_unlock_irqrestore(&pool_lock, flags);
--
2.25.1
From: Andrey Konovalov <[email protected]>
Remove the atomic accesses to entry fields in save_stack_info and
kasan_complete_mode_report_info for tag-based KASAN modes.
These atomics are not required, as the read/write lock prevents the
entries from being read (in kasan_complete_mode_report_info) while being
written (in save_stack_info) and the try_cmpxchg prevents the same entry
from being rewritten (in save_stack_info) in the unlikely case of wrapping
during writing.
Signed-off-by: Andrey Konovalov <[email protected]>
---
Changes v1->v2:
- This is a new patch.
---
mm/kasan/report_tags.c | 25 +++++++------------------
mm/kasan/tags.c | 13 +++++--------
2 files changed, 12 insertions(+), 26 deletions(-)
diff --git a/mm/kasan/report_tags.c b/mm/kasan/report_tags.c
index 8b8bfdb3cfdb..78abdcde5da9 100644
--- a/mm/kasan/report_tags.c
+++ b/mm/kasan/report_tags.c
@@ -31,10 +31,6 @@ void kasan_complete_mode_report_info(struct kasan_report_info *info)
unsigned long flags;
u64 pos;
struct kasan_stack_ring_entry *entry;
- void *ptr;
- u32 pid;
- depot_stack_handle_t stack;
- bool is_free;
bool alloc_found = false, free_found = false;
if ((!info->cache || !info->object) && !info->bug_type) {
@@ -61,18 +57,11 @@ void kasan_complete_mode_report_info(struct kasan_report_info *info)
entry = &stack_ring.entries[i % stack_ring.size];
- /* Paired with smp_store_release() in save_stack_info(). */
- ptr = (void *)smp_load_acquire(&entry->ptr);
-
- if (kasan_reset_tag(ptr) != info->object ||
- get_tag(ptr) != get_tag(info->access_addr))
+ if (kasan_reset_tag(entry->ptr) != info->object ||
+ get_tag(entry->ptr) != get_tag(info->access_addr))
continue;
- pid = READ_ONCE(entry->pid);
- stack = READ_ONCE(entry->stack);
- is_free = READ_ONCE(entry->is_free);
-
- if (is_free) {
+ if (entry->is_free) {
/*
* Second free of the same object.
* Give up on trying to find the alloc entry.
@@ -80,8 +69,8 @@ void kasan_complete_mode_report_info(struct kasan_report_info *info)
if (free_found)
break;
- info->free_track.pid = pid;
- info->free_track.stack = stack;
+ info->free_track.pid = entry->pid;
+ info->free_track.stack = entry->stack;
free_found = true;
/*
@@ -95,8 +84,8 @@ void kasan_complete_mode_report_info(struct kasan_report_info *info)
if (alloc_found)
break;
- info->alloc_track.pid = pid;
- info->alloc_track.stack = stack;
+ info->alloc_track.pid = entry->pid;
+ info->alloc_track.stack = entry->stack;
alloc_found = true;
/*
diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c
index 4fd32121b0fd..b6c017e670d8 100644
--- a/mm/kasan/tags.c
+++ b/mm/kasan/tags.c
@@ -121,15 +121,12 @@ static void save_stack_info(struct kmem_cache *cache, void *object,
if (!try_cmpxchg(&entry->ptr, &old_ptr, STACK_RING_BUSY_PTR))
goto next; /* Busy slot. */
- WRITE_ONCE(entry->size, cache->object_size);
- WRITE_ONCE(entry->pid, current->pid);
- WRITE_ONCE(entry->stack, stack);
- WRITE_ONCE(entry->is_free, is_free);
+ entry->size = cache->object_size;
+ entry->pid = current->pid;
+ entry->stack = stack;
+ entry->is_free = is_free;
- /*
- * Paired with smp_load_acquire() in kasan_complete_mode_report_info().
- */
- smp_store_release(&entry->ptr, (s64)object);
+ entry->ptr = object;
read_unlock_irqrestore(&stack_ring.lock, flags);
}
--
2.25.1
From: Andrey Konovalov <[email protected]>
Instead of using the global pool_offset variable to find a free slot
when storing a new stack record, mainlain a freelist of free slots
within the allocated stack pools.
A global next_stack variable is used as the head of the freelist, and
the next field in the stack_record struct is reused as freelist link
(when the record is not in the freelist, this field is used as a link
in the hash table).
This is preparatory patch for implementing the eviction of stack records
from the stack depot.
Signed-off-by: Andrey Konovalov <[email protected]>
---
Changes v1->v2:
- Fix out-of-bounds when initializing a pool.
---
lib/stackdepot.c | 131 +++++++++++++++++++++++++++++------------------
1 file changed, 82 insertions(+), 49 deletions(-)
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 81d8733cdbed..ca8e6fee0cb4 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -54,8 +54,8 @@ union handle_parts {
};
struct stack_record {
- struct stack_record *next; /* Link in the hash table */
- u32 hash; /* Hash in the hash table */
+ struct stack_record *next; /* Link in hash table or freelist */
+ u32 hash; /* Hash in hash table */
u32 size; /* Number of stored frames */
union handle_parts handle;
unsigned long entries[CONFIG_STACKDEPOT_MAX_FRAMES]; /* Frames */
@@ -87,10 +87,10 @@ static unsigned int stack_hash_mask;
static void *stack_pools[DEPOT_MAX_POOLS];
/* Newly allocated pool that is not yet added to stack_pools. */
static void *new_pool;
-/* Currently used pool in stack_pools. */
-static int pool_index;
-/* Offset to the unused space in the currently used pool. */
-static size_t pool_offset;
+/* Number of pools in stack_pools. */
+static int pools_num;
+/* Next stack in the freelist of stack records within stack_pools. */
+static struct stack_record *next_stack;
/* Lock that protects the variables above. */
static DEFINE_RAW_SPINLOCK(pool_lock);
/*
@@ -220,6 +220,42 @@ int stack_depot_init(void)
}
EXPORT_SYMBOL_GPL(stack_depot_init);
+/* Initializes a stack depol pool. */
+static void depot_init_pool(void *pool)
+{
+ const int records_in_pool = DEPOT_POOL_SIZE / DEPOT_STACK_RECORD_SIZE;
+ int i, offset;
+
+ /* Initialize handles and link stack records to each other. */
+ for (i = 0, offset = 0;
+ offset <= DEPOT_POOL_SIZE - DEPOT_STACK_RECORD_SIZE;
+ i++, offset += DEPOT_STACK_RECORD_SIZE) {
+ struct stack_record *stack = pool + offset;
+
+ stack->handle.pool_index = pools_num;
+ stack->handle.offset = offset >> DEPOT_STACK_ALIGN;
+ stack->handle.extra = 0;
+
+ if (i < records_in_pool - 1)
+ stack->next = (void *)stack + DEPOT_STACK_RECORD_SIZE;
+ else
+ stack->next = NULL;
+ }
+
+ /* Link stack records into the freelist. */
+ WARN_ON(next_stack);
+ next_stack = pool;
+
+ /* Save reference to the pool to be used by depot_fetch_stack. */
+ stack_pools[pools_num] = pool;
+
+ /*
+ * WRITE_ONCE pairs with potential concurrent read in
+ * depot_fetch_stack.
+ */
+ WRITE_ONCE(pools_num, pools_num + 1);
+}
+
/* Keeps the preallocated memory to be used for a new stack depot pool. */
static void depot_keep_new_pool(void **prealloc)
{
@@ -234,7 +270,7 @@ static void depot_keep_new_pool(void **prealloc)
* Use the preallocated memory for the new pool
* as long as we do not exceed the maximum number of pools.
*/
- if (pool_index + 1 < DEPOT_MAX_POOLS) {
+ if (pools_num < DEPOT_MAX_POOLS) {
new_pool = *prealloc;
*prealloc = NULL;
}
@@ -249,45 +285,42 @@ static void depot_keep_new_pool(void **prealloc)
}
/* Updates refences to the current and the next stack depot pools. */
-static bool depot_update_pools(size_t required_size, void **prealloc)
+static bool depot_update_pools(void **prealloc)
{
- /* Check if there is not enough space in the current pool. */
- if (unlikely(pool_offset + required_size > DEPOT_POOL_SIZE)) {
- /* Bail out if we reached the pool limit. */
- if (unlikely(pool_index + 1 >= DEPOT_MAX_POOLS)) {
- WARN_ONCE(1, "Stack depot reached limit capacity");
- return false;
- }
+ /* Check if we still have objects in the freelist. */
+ if (next_stack)
+ goto out_keep_prealloc;
- /*
- * Move on to the new pool.
- * WRITE_ONCE pairs with potential concurrent read in
- * stack_depot_fetch.
- */
- WRITE_ONCE(pool_index, pool_index + 1);
- stack_pools[pool_index] = new_pool;
+ /* Check if we have a new pool saved and use it. */
+ if (new_pool) {
+ depot_init_pool(new_pool);
new_pool = NULL;
- pool_offset = 0;
- /*
- * If the maximum number of pools is not reached, take note
- * that yet another new pool needs to be allocated.
- * smp_store_release pairs with smp_load_acquire in
- * stack_depot_save.
- */
- if (pool_index + 1 < DEPOT_MAX_POOLS)
+ /* Take note that we might need a new new_pool. */
+ if (pools_num < DEPOT_MAX_POOLS)
smp_store_release(&new_pool_required, 1);
+
+ /* Try keeping the preallocated memory for new_pool. */
+ goto out_keep_prealloc;
+ }
+
+ /* Bail out if we reached the pool limit. */
+ if (unlikely(pools_num >= DEPOT_MAX_POOLS)) {
+ WARN_ONCE(1, "Stack depot reached limit capacity");
+ return false;
}
- /* Check if the current pool is not yet allocated. */
- if (*prealloc && stack_pools[pool_index] == NULL) {
- /* Use the preallocated memory for the current pool. */
- stack_pools[pool_index] = *prealloc;
+ /* Check if we have preallocated memory and use it. */
+ if (*prealloc) {
+ depot_init_pool(*prealloc);
*prealloc = NULL;
return true;
}
- /* Otherwise, try using the preallocated memory for a new pool. */
+ return false;
+
+out_keep_prealloc:
+ /* Keep the preallocated memory for a new pool if required. */
if (*prealloc)
depot_keep_new_pool(prealloc);
return true;
@@ -298,35 +331,35 @@ static struct stack_record *
depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
{
struct stack_record *stack;
- size_t required_size = DEPOT_STACK_RECORD_SIZE;
/* Update current and new pools if required and possible. */
- if (!depot_update_pools(required_size, prealloc))
+ if (!depot_update_pools(prealloc))
return NULL;
- /* Check if we have a pool to save the stack trace. */
- if (stack_pools[pool_index] == NULL)
+ /* Check if we have a stack record to save the stack trace. */
+ stack = next_stack;
+ if (!stack)
return NULL;
+ /* Advance the freelist. */
+ next_stack = stack->next;
+
/* Limit number of saved frames to CONFIG_STACKDEPOT_MAX_FRAMES. */
if (size > CONFIG_STACKDEPOT_MAX_FRAMES)
size = CONFIG_STACKDEPOT_MAX_FRAMES;
/* Save the stack trace. */
- stack = stack_pools[pool_index] + pool_offset;
+ stack->next = NULL;
stack->hash = hash;
stack->size = size;
- stack->handle.pool_index = pool_index;
- stack->handle.offset = pool_offset >> DEPOT_STACK_ALIGN;
- stack->handle.extra = 0;
+ /* stack->handle is already filled in by depot_init_pool. */
memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
- pool_offset += required_size;
/*
* Let KMSAN know the stored stack record is initialized. This shall
* prevent false positive reports if instrumented code accesses it.
*/
- kmsan_unpoison_memory(stack, required_size);
+ kmsan_unpoison_memory(stack, DEPOT_STACK_RECORD_SIZE);
return stack;
}
@@ -336,16 +369,16 @@ static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
union handle_parts parts = { .handle = handle };
/*
* READ_ONCE pairs with potential concurrent write in
- * depot_update_pools.
+ * depot_init_pool.
*/
- int pool_index_cached = READ_ONCE(pool_index);
+ int pools_num_cached = READ_ONCE(pools_num);
void *pool;
size_t offset = parts.offset << DEPOT_STACK_ALIGN;
struct stack_record *stack;
- if (parts.pool_index > pool_index_cached) {
+ if (parts.pool_index > pools_num_cached) {
WARN(1, "pool index %d out of bounds (%d) for stack id %08x\n",
- parts.pool_index, pool_index_cached, handle);
+ parts.pool_index, pools_num_cached, handle);
return NULL;
}
--
2.25.1
From: Andrey Konovalov <[email protected]>
Change the bool can_alloc argument of __stack_depot_save to a
u32 argument that accepts a set of flags.
The following patch will add another flag to stack_depot_save_flags
besides the existing STACK_DEPOT_FLAG_CAN_ALLOC.
Also rename the function to stack_depot_save_flags, as __stack_depot_save
is a cryptic name,
Signed-off-by: Andrey Konovalov <[email protected]>
---
Changes v1->v2:
- This is a new patch.
---
include/linux/stackdepot.h | 36 +++++++++++++++++++++++++-----------
lib/stackdepot.c | 16 +++++++++++-----
mm/kasan/common.c | 7 ++++---
mm/kasan/generic.c | 9 +++++----
mm/kasan/kasan.h | 2 +-
mm/kasan/tags.c | 3 ++-
6 files changed, 48 insertions(+), 25 deletions(-)
diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index e58306783d8e..0b262e14144e 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -32,6 +32,17 @@ typedef u32 depot_stack_handle_t;
*/
#define STACK_DEPOT_EXTRA_BITS 5
+typedef u32 depot_flags_t;
+
+/*
+ * Flags that can be passed to stack_depot_save_flags(); see the comment next
+ * to its declaration for more details.
+ */
+#define STACK_DEPOT_FLAG_CAN_ALLOC ((depot_flags_t)0x0001)
+
+#define STACK_DEPOT_FLAGS_NUM 1
+#define STACK_DEPOT_FLAGS_MASK ((depot_flags_t)((1 << STACK_DEPOT_FLAGS_NUM) - 1))
+
/*
* Using stack depot requires its initialization, which can be done in 3 ways:
*
@@ -69,31 +80,34 @@ static inline int stack_depot_early_init(void) { return 0; }
#endif
/**
- * __stack_depot_save - Save a stack trace to stack depot
+ * stack_depot_save_flags - Save a stack trace to stack depot
*
* @entries: Pointer to the stack trace
* @nr_entries: Number of frames in the stack
* @alloc_flags: Allocation GFP flags
- * @can_alloc: Allocate stack pools (increased chance of failure if false)
+ * @depot_flags: Stack depot flags
+ *
+ * Saves a stack trace from @entries array of size @nr_entries.
*
- * Saves a stack trace from @entries array of size @nr_entries. If @can_alloc is
- * %true, stack depot can replenish the stack pools in case no space is left
- * (allocates using GFP flags of @alloc_flags). If @can_alloc is %false, avoids
- * any allocations and fails if no space is left to store the stack trace.
+ * If STACK_DEPOT_FLAG_CAN_ALLOC is set in @depot_flags, stack depot can
+ * replenish the stack pools in case no space is left (allocates using GFP
+ * flags of @alloc_flags). Otherwise, stack depot avoids any allocations and
+ * fails if no space is left to store the stack trace.
*
* If the provided stack trace comes from the interrupt context, only the part
* up to the interrupt entry is saved.
*
- * Context: Any context, but setting @can_alloc to %false is required if
+ * Context: Any context, but setting STACK_DEPOT_FLAG_CAN_ALLOC is required if
* alloc_pages() cannot be used from the current context. Currently
* this is the case for contexts where neither %GFP_ATOMIC nor
* %GFP_NOWAIT can be used (NMI, raw_spin_lock).
*
* Return: Handle of the stack struct stored in depot, 0 on failure
*/
-depot_stack_handle_t __stack_depot_save(unsigned long *entries,
- unsigned int nr_entries,
- gfp_t gfp_flags, bool can_alloc);
+depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
+ unsigned int nr_entries,
+ gfp_t gfp_flags,
+ depot_flags_t depot_flags);
/**
* stack_depot_save - Save a stack trace to stack depot
@@ -103,7 +117,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
* @alloc_flags: Allocation GFP flags
*
* Context: Contexts where allocations via alloc_pages() are allowed.
- * See __stack_depot_save() for more details.
+ * See stack_depot_save_flags() for more details.
*
* Return: Handle of the stack trace stored in depot, 0 on failure
*/
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 1b08897ebd2b..e5121225f124 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -438,19 +438,24 @@ static inline struct stack_record *find_stack(struct list_head *bucket,
return NULL;
}
-depot_stack_handle_t __stack_depot_save(unsigned long *entries,
- unsigned int nr_entries,
- gfp_t alloc_flags, bool can_alloc)
+depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
+ unsigned int nr_entries,
+ gfp_t alloc_flags,
+ depot_flags_t depot_flags)
{
struct list_head *bucket;
struct stack_record *found = NULL;
depot_stack_handle_t handle = 0;
struct page *page = NULL;
void *prealloc = NULL;
+ bool can_alloc = depot_flags & STACK_DEPOT_FLAG_CAN_ALLOC;
bool need_alloc = false;
unsigned long flags;
u32 hash;
+ if (depot_flags & ~STACK_DEPOT_FLAGS_MASK)
+ return 0;
+
/*
* If this stack trace is from an interrupt, including anything before
* interrupt entry usually leads to unbounded stack depot growth.
@@ -529,13 +534,14 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
handle = found->handle.handle;
return handle;
}
-EXPORT_SYMBOL_GPL(__stack_depot_save);
+EXPORT_SYMBOL_GPL(stack_depot_save_flags);
depot_stack_handle_t stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
gfp_t alloc_flags)
{
- return __stack_depot_save(entries, nr_entries, alloc_flags, true);
+ return stack_depot_save_flags(entries, nr_entries, alloc_flags,
+ STACK_DEPOT_FLAG_CAN_ALLOC);
}
EXPORT_SYMBOL_GPL(stack_depot_save);
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 256930da578a..825a0240ec02 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -22,6 +22,7 @@
#include <linux/sched.h>
#include <linux/sched/task_stack.h>
#include <linux/slab.h>
+#include <linux/stackdepot.h>
#include <linux/stacktrace.h>
#include <linux/string.h>
#include <linux/types.h>
@@ -37,19 +38,19 @@ struct slab *kasan_addr_to_slab(const void *addr)
return NULL;
}
-depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)
+depot_stack_handle_t kasan_save_stack(gfp_t flags, depot_flags_t depot_flags)
{
unsigned long entries[KASAN_STACK_DEPTH];
unsigned int nr_entries;
nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
- return __stack_depot_save(entries, nr_entries, flags, can_alloc);
+ return stack_depot_save_flags(entries, nr_entries, flags, depot_flags);
}
void kasan_set_track(struct kasan_track *track, gfp_t flags)
{
track->pid = current->pid;
- track->stack = kasan_save_stack(flags, true);
+ track->stack = kasan_save_stack(flags, STACK_DEPOT_FLAG_CAN_ALLOC);
}
#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 4d837ab83f08..5d168c9afb32 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -25,6 +25,7 @@
#include <linux/sched.h>
#include <linux/sched/task_stack.h>
#include <linux/slab.h>
+#include <linux/stackdepot.h>
#include <linux/stacktrace.h>
#include <linux/string.h>
#include <linux/types.h>
@@ -472,7 +473,7 @@ size_t kasan_metadata_size(struct kmem_cache *cache, bool in_object)
sizeof(struct kasan_free_meta) : 0);
}
-static void __kasan_record_aux_stack(void *addr, bool can_alloc)
+static void __kasan_record_aux_stack(void *addr, depot_flags_t depot_flags)
{
struct slab *slab = kasan_addr_to_slab(addr);
struct kmem_cache *cache;
@@ -489,17 +490,17 @@ static void __kasan_record_aux_stack(void *addr, bool can_alloc)
return;
alloc_meta->aux_stack[1] = alloc_meta->aux_stack[0];
- alloc_meta->aux_stack[0] = kasan_save_stack(0, can_alloc);
+ alloc_meta->aux_stack[0] = kasan_save_stack(0, depot_flags);
}
void kasan_record_aux_stack(void *addr)
{
- return __kasan_record_aux_stack(addr, true);
+ return __kasan_record_aux_stack(addr, STACK_DEPOT_FLAG_CAN_ALLOC);
}
void kasan_record_aux_stack_noalloc(void *addr)
{
- return __kasan_record_aux_stack(addr, false);
+ return __kasan_record_aux_stack(addr, 0);
}
void kasan_save_alloc_info(struct kmem_cache *cache, void *object, gfp_t flags)
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index f70e3d7a602e..de3206e11888 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -370,7 +370,7 @@ static inline void kasan_init_cache_meta(struct kmem_cache *cache, unsigned int
static inline void kasan_init_object_meta(struct kmem_cache *cache, const void *object) { }
#endif
-depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc);
+depot_stack_handle_t kasan_save_stack(gfp_t flags, depot_flags_t depot_flags);
void kasan_set_track(struct kasan_track *track, gfp_t flags);
void kasan_save_alloc_info(struct kmem_cache *cache, void *object, gfp_t flags);
void kasan_save_free_info(struct kmem_cache *cache, void *object);
diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c
index 7dcfe341d48e..4fd32121b0fd 100644
--- a/mm/kasan/tags.c
+++ b/mm/kasan/tags.c
@@ -13,6 +13,7 @@
#include <linux/memblock.h>
#include <linux/memory.h>
#include <linux/mm.h>
+#include <linux/stackdepot.h>
#include <linux/static_key.h>
#include <linux/string.h>
#include <linux/types.h>
@@ -101,7 +102,7 @@ static void save_stack_info(struct kmem_cache *cache, void *object,
struct kasan_stack_ring_entry *entry;
void *old_ptr;
- stack = kasan_save_stack(gfp_flags, true);
+ stack = kasan_save_stack(gfp_flags, STACK_DEPOT_FLAG_CAN_ALLOC);
/*
* Prevent save_stack_info() from modifying stack ring
--
2.25.1
From: Andrey Konovalov <[email protected]>
Do not try fetching a stack trace from the stack depot if the
stack_depot_disabled flag is enabled.
Reviewed-by: Alexander Potapenko <[email protected]>
Signed-off-by: Andrey Konovalov <[email protected]>
---
lib/stackdepot.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 2f5aa851834e..3a945c7206f3 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -477,7 +477,7 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle,
*/
kmsan_unpoison_memory(entries, sizeof(*entries));
- if (!handle)
+ if (!handle || stack_depot_disabled)
return 0;
if (parts.pool_index > pool_index_cached) {
--
2.25.1
From: Andrey Konovalov <[email protected]>
Split code in depot_alloc_stack and depot_init_pool into 3 functions:
1. depot_keep_next_pool that keeps preallocated memory for the next pool
if required.
2. depot_update_pools that moves on to the next pool if there's no space
left in the current pool, uses preallocated memory for the new current
pool if required, and calls depot_keep_next_pool otherwise.
3. depot_alloc_stack that calls depot_update_pools and then allocates
a stack record as before.
This makes it somewhat easier to follow the logic of depot_alloc_stack
and also serves as a preparation for implementing the eviction of stack
records from the stack depot.
Signed-off-by: Andrey Konovalov <[email protected]>
---
lib/stackdepot.c | 87 +++++++++++++++++++++++++++---------------------
1 file changed, 49 insertions(+), 38 deletions(-)
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index babd453261f0..e85b658be050 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -219,54 +219,43 @@ int stack_depot_init(void)
}
EXPORT_SYMBOL_GPL(stack_depot_init);
-/* Uses preallocated memory to initialize a new stack depot pool. */
-static void depot_init_pool(void **prealloc)
+/* Keeps the preallocated memory to be used for the next stack depot pool. */
+static void depot_keep_next_pool(void **prealloc)
{
/*
- * If the next pool is already initialized or the maximum number of
+ * If the next pool is already saved or the maximum number of
* pools is reached, do not use the preallocated memory.
*/
if (!next_pool_required)
return;
- /* Check if the current pool is not yet allocated. */
- if (stack_pools[pool_index] == NULL) {
- /* Use the preallocated memory for the current pool. */
- stack_pools[pool_index] = *prealloc;
+ /*
+ * Use the preallocated memory for the next pool
+ * as long as we do not exceed the maximum number of pools.
+ */
+ if (pool_index + 1 < DEPOT_MAX_POOLS) {
+ stack_pools[pool_index + 1] = *prealloc;
*prealloc = NULL;
- } else {
- /*
- * Otherwise, use the preallocated memory for the next pool
- * as long as we do not exceed the maximum number of pools.
- */
- if (pool_index + 1 < DEPOT_MAX_POOLS) {
- stack_pools[pool_index + 1] = *prealloc;
- *prealloc = NULL;
- }
- /*
- * At this point, either the next pool is initialized or the
- * maximum number of pools is reached. In either case, take
- * note that initializing another pool is not required.
- * smp_store_release pairs with smp_load_acquire in
- * stack_depot_save.
- */
- smp_store_release(&next_pool_required, 0);
}
+
+ /*
+ * At this point, either the next pool is kept or the maximum
+ * number of pools is reached. In either case, take note that
+ * keeping another pool is not required.
+ * smp_store_release pairs with smp_load_acquire in stack_depot_save.
+ */
+ smp_store_release(&next_pool_required, 0);
}
-/* Allocates a new stack in a stack depot pool. */
-static struct stack_record *
-depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
+/* Updates refences to the current and the next stack depot pools. */
+static bool depot_update_pools(size_t required_size, void **prealloc)
{
- struct stack_record *stack;
- size_t required_size = DEPOT_STACK_RECORD_SIZE;
-
/* Check if there is not enough space in the current pool. */
if (unlikely(pool_offset + required_size > DEPOT_POOL_SIZE)) {
/* Bail out if we reached the pool limit. */
if (unlikely(pool_index + 1 >= DEPOT_MAX_POOLS)) {
WARN_ONCE(1, "Stack depot reached limit capacity");
- return NULL;
+ return false;
}
/*
@@ -276,9 +265,10 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
*/
WRITE_ONCE(pool_index, pool_index + 1);
pool_offset = 0;
+
/*
* If the maximum number of pools is not reached, take note
- * that the next pool needs to initialized.
+ * that the next pool needs to be initialized.
* smp_store_release pairs with smp_load_acquire in
* stack_depot_save.
*/
@@ -286,9 +276,30 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
smp_store_release(&next_pool_required, 1);
}
- /* Assign the preallocated memory to a pool if required. */
+ /* Check if the current pool is not yet allocated. */
+ if (*prealloc && stack_pools[pool_index] == NULL) {
+ /* Use the preallocated memory for the current pool. */
+ stack_pools[pool_index] = *prealloc;
+ *prealloc = NULL;
+ return true;
+ }
+
+ /* Otherwise, try using the preallocated memory for the next pool. */
if (*prealloc)
- depot_init_pool(prealloc);
+ depot_keep_next_pool(prealloc);
+ return true;
+}
+
+/* Allocates a new stack in a stack depot pool. */
+static struct stack_record *
+depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
+{
+ struct stack_record *stack;
+ size_t required_size = DEPOT_STACK_RECORD_SIZE;
+
+ /* Update current and next pools if required and possible. */
+ if (!depot_update_pools(required_size, prealloc))
+ return NULL;
/* Check if we have a pool to save the stack trace. */
if (stack_pools[pool_index] == NULL)
@@ -322,7 +333,7 @@ static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
union handle_parts parts = { .handle = handle };
/*
* READ_ONCE pairs with potential concurrent write in
- * depot_alloc_stack.
+ * depot_update_pools.
*/
int pool_index_cached = READ_ONCE(pool_index);
void *pool;
@@ -421,8 +432,8 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
* Check if another stack pool needs to be initialized. If so, allocate
* the memory now - we won't be able to do that under the lock.
*
- * smp_load_acquire pairs with smp_store_release in depot_alloc_stack
- * and depot_init_pool.
+ * smp_load_acquire pairs with smp_store_release in depot_update_pools
+ * and depot_keep_next_pool.
*/
if (unlikely(can_alloc && smp_load_acquire(&next_pool_required))) {
/*
@@ -459,7 +470,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
* Stack depot already contains this stack trace, but let's
* keep the preallocated memory for the future.
*/
- depot_init_pool(&prealloc);
+ depot_keep_next_pool(&prealloc);
}
raw_spin_unlock_irqrestore(&pool_lock, flags);
--
2.25.1
From: Andrey Konovalov <[email protected]>
Make KMSAN use stack_depot_save instead of __stack_depot_save,
as it always passes true to __stack_depot_save as the last argument.
Signed-off-by: Andrey Konovalov <[email protected]>
---
Changes v1->v2:
- This is a new patch.
---
mm/kmsan/core.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
index 3adb4c1d3b19..5d942f19d12a 100644
--- a/mm/kmsan/core.c
+++ b/mm/kmsan/core.c
@@ -76,7 +76,7 @@ depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,
/* Don't sleep. */
flags &= ~(__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM);
- handle = __stack_depot_save(entries, nr_entries, flags, true);
+ handle = stack_depot_save(entries, nr_entries, flags);
return stack_depot_set_extra_bits(handle, extra);
}
@@ -250,11 +250,10 @@ depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id)
/*
* @entries is a local var in non-instrumented code, so KMSAN does not
* know it is initialized. Explicitly unpoison it to avoid false
- * positives when __stack_depot_save() passes it to instrumented code.
+ * positives when stack_depot_save() passes it to instrumented code.
*/
kmsan_internal_unpoison_memory(entries, sizeof(entries), false);
- handle = __stack_depot_save(entries, ARRAY_SIZE(entries), __GFP_HIGH,
- true);
+ handle = stack_depot_save(entries, ARRAY_SIZE(entries), __GFP_HIGH);
return stack_depot_set_extra_bits(handle, extra_bits);
}
--
2.25.1
From: Andrey Konovalov <[email protected]>
Switch stack_record to use list_head for links in the hash table
and in the freelist.
This will allow removing entries from the hash table buckets.
This is preparatory patch for implementing the eviction of stack records
from the stack depot.
Signed-off-by: Andrey Konovalov <[email protected]>
---
Changes v1->v2:
- Use list_head instead of open-coding backward links.
---
lib/stackdepot.c | 77 ++++++++++++++++++++++++++----------------------
1 file changed, 42 insertions(+), 35 deletions(-)
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 0b4591475d4f..1b08897ebd2b 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -18,6 +18,7 @@
#include <linux/jhash.h>
#include <linux/kernel.h>
#include <linux/kmsan.h>
+#include <linux/list.h>
#include <linux/mm.h>
#include <linux/mutex.h>
#include <linux/percpu.h>
@@ -55,7 +56,7 @@ union handle_parts {
};
struct stack_record {
- struct stack_record *next; /* Link in hash table or freelist */
+ struct list_head list; /* Links in hash table or freelist */
u32 hash; /* Hash in hash table */
u32 size; /* Number of stored frames */
union handle_parts handle;
@@ -77,21 +78,21 @@ static bool __stack_depot_early_init_passed __initdata;
/* Initial seed for jhash2. */
#define STACK_HASH_SEED 0x9747b28c
-/* Hash table of pointers to stored stack traces. */
-static struct stack_record **stack_table;
+/* Hash table of stored stack records. */
+static struct list_head *stack_table;
/* Fixed order of the number of table buckets. Used when KASAN is enabled. */
static unsigned int stack_bucket_number_order;
/* Hash mask for indexing the table. */
static unsigned int stack_hash_mask;
-/* Array of memory regions that store stack traces. */
+/* Array of memory regions that store stack records. */
static void *stack_pools[DEPOT_MAX_POOLS];
/* Newly allocated pool that is not yet added to stack_pools. */
static void *new_pool;
/* Number of pools in stack_pools. */
static int pools_num;
-/* Next stack in the freelist of stack records within stack_pools. */
-static struct stack_record *next_stack;
+/* Freelist of stack records within stack_pools. */
+static LIST_HEAD(free_stacks);
/*
* Stack depot tries to keep an extra pool allocated even before it runs out
* of space in the currently used pool. This flag marks whether this extra pool
@@ -123,6 +124,15 @@ void __init stack_depot_request_early_init(void)
__stack_depot_early_init_requested = true;
}
+/* Initialize list_head's within the hash table. */
+static void init_stack_table(unsigned long entries)
+{
+ unsigned long i;
+
+ for (i = 0; i < entries; i++)
+ INIT_LIST_HEAD(&stack_table[i]);
+}
+
/* Allocates a hash table via memblock. Can only be used during early boot. */
int __init stack_depot_early_init(void)
{
@@ -152,10 +162,10 @@ int __init stack_depot_early_init(void)
entries = 1UL << stack_bucket_number_order;
pr_info("allocating hash table via alloc_large_system_hash\n");
stack_table = alloc_large_system_hash("stackdepot",
- sizeof(struct stack_record *),
+ sizeof(struct list_head),
entries,
STACK_HASH_TABLE_SCALE,
- HASH_EARLY | HASH_ZERO,
+ HASH_EARLY,
NULL,
&stack_hash_mask,
1UL << STACK_BUCKET_NUMBER_ORDER_MIN,
@@ -165,6 +175,7 @@ int __init stack_depot_early_init(void)
stack_depot_disabled = true;
return -ENOMEM;
}
+ init_stack_table(entries);
return 0;
}
@@ -205,7 +216,7 @@ int stack_depot_init(void)
entries = 1UL << STACK_BUCKET_NUMBER_ORDER_MAX;
pr_info("allocating hash table of %lu entries via kvcalloc\n", entries);
- stack_table = kvcalloc(entries, sizeof(struct stack_record *), GFP_KERNEL);
+ stack_table = kvcalloc(entries, sizeof(struct list_head), GFP_KERNEL);
if (!stack_table) {
pr_err("hash table allocation failed, disabling\n");
stack_depot_disabled = true;
@@ -213,6 +224,7 @@ int stack_depot_init(void)
goto out_unlock;
}
stack_hash_mask = entries - 1;
+ init_stack_table(entries);
out_unlock:
mutex_unlock(&stack_depot_init_mutex);
@@ -224,30 +236,24 @@ EXPORT_SYMBOL_GPL(stack_depot_init);
/* Initializes a stack depol pool. */
static void depot_init_pool(void *pool)
{
- const int records_in_pool = DEPOT_POOL_SIZE / DEPOT_STACK_RECORD_SIZE;
- int i, offset;
+ int offset;
lockdep_assert_held_write(&pool_rwlock);
- /* Initialize handles and link stack records to each other. */
- for (i = 0, offset = 0;
- offset <= DEPOT_POOL_SIZE - DEPOT_STACK_RECORD_SIZE;
- i++, offset += DEPOT_STACK_RECORD_SIZE) {
+ WARN_ON(!list_empty(&free_stacks));
+
+ /* Initialize handles and link stack records into the freelist. */
+ for (offset = 0; offset <= DEPOT_POOL_SIZE - DEPOT_STACK_RECORD_SIZE;
+ offset += DEPOT_STACK_RECORD_SIZE) {
struct stack_record *stack = pool + offset;
stack->handle.pool_index = pools_num;
stack->handle.offset = offset >> DEPOT_STACK_ALIGN;
stack->handle.extra = 0;
- if (i < records_in_pool - 1)
- stack->next = (void *)stack + DEPOT_STACK_RECORD_SIZE;
- else
- stack->next = NULL;
+ list_add(&stack->list, &free_stacks);
}
- /* Link stack records into the freelist. */
- WARN_ON(next_stack);
- next_stack = pool;
/* Save reference to the pool to be used by depot_fetch_stack. */
stack_pools[pools_num] = pool;
@@ -289,7 +295,7 @@ static bool depot_update_pools(void **prealloc)
lockdep_assert_held_write(&pool_rwlock);
/* Check if we still have objects in the freelist. */
- if (next_stack)
+ if (!list_empty(&free_stacks))
goto out_keep_prealloc;
/* Check if we have a new pool saved and use it. */
@@ -340,19 +346,18 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
return NULL;
/* Check if we have a stack record to save the stack trace. */
- stack = next_stack;
- if (!stack)
+ if (list_empty(&free_stacks))
return NULL;
- /* Advance the freelist. */
- next_stack = stack->next;
+ /* Get and unlink the first entry from the freelist. */
+ stack = list_first_entry(&free_stacks, struct stack_record, list);
+ list_del(&stack->list);
/* Limit number of saved frames to CONFIG_STACKDEPOT_MAX_FRAMES. */
if (size > CONFIG_STACKDEPOT_MAX_FRAMES)
size = CONFIG_STACKDEPOT_MAX_FRAMES;
/* Save the stack trace. */
- stack->next = NULL;
stack->hash = hash;
stack->size = size;
/* stack->handle is already filled in by depot_init_pool. */
@@ -414,15 +419,17 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2,
}
/* Finds a stack in a bucket of the hash table. */
-static inline struct stack_record *find_stack(struct stack_record *bucket,
+static inline struct stack_record *find_stack(struct list_head *bucket,
unsigned long *entries, int size,
u32 hash)
{
+ struct list_head *pos;
struct stack_record *found;
lockdep_assert_held(&pool_rwlock);
- for (found = bucket; found; found = found->next) {
+ list_for_each(pos, bucket) {
+ found = list_entry(pos, struct stack_record, list);
if (found->hash == hash &&
found->size == size &&
!stackdepot_memcmp(entries, found->entries, size))
@@ -435,7 +442,8 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
gfp_t alloc_flags, bool can_alloc)
{
- struct stack_record *found = NULL, **bucket;
+ struct list_head *bucket;
+ struct stack_record *found = NULL;
depot_stack_handle_t handle = 0;
struct page *page = NULL;
void *prealloc = NULL;
@@ -462,7 +470,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
read_lock_irqsave(&pool_rwlock, flags);
/* Fast path: look the stack trace up without full locking. */
- found = find_stack(*bucket, entries, nr_entries, hash);
+ found = find_stack(bucket, entries, nr_entries, hash);
if (found) {
read_unlock_irqrestore(&pool_rwlock, flags);
goto exit;
@@ -494,14 +502,13 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
write_lock_irqsave(&pool_rwlock, flags);
- found = find_stack(*bucket, entries, nr_entries, hash);
+ found = find_stack(bucket, entries, nr_entries, hash);
if (!found) {
struct stack_record *new =
depot_alloc_stack(entries, nr_entries, hash, &prealloc);
if (new) {
- new->next = *bucket;
- *bucket = new;
+ list_add(&new->list, bucket);
found = new;
}
} else if (prealloc) {
--
2.25.1
From: Andrey Konovalov <[email protected]>
Check the object size when looking up entries in the stack ring.
If the size of the object for which a report is being printed does not
match the size of the object for which a stack trace has been saved in
the stack ring, the saved stack trace is irrelevant.
Signed-off-by: Andrey Konovalov <[email protected]>
---
Changes v1->v2:
- This is a new patch.
---
mm/kasan/report_tags.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/kasan/report_tags.c b/mm/kasan/report_tags.c
index 78abdcde5da9..98c238ba3545 100644
--- a/mm/kasan/report_tags.c
+++ b/mm/kasan/report_tags.c
@@ -58,7 +58,8 @@ void kasan_complete_mode_report_info(struct kasan_report_info *info)
entry = &stack_ring.entries[i % stack_ring.size];
if (kasan_reset_tag(entry->ptr) != info->object ||
- get_tag(entry->ptr) != get_tag(info->access_addr))
+ get_tag(entry->ptr) != get_tag(info->access_addr) ||
+ info->cache->object_size != entry->size)
continue;
if (entry->is_free) {
--
2.25.1
From: Andrey Konovalov <[email protected]>
Add a reference counter for how many times a stack records has been added
to stack depot.
Add a new STACK_DEPOT_FLAG_GET flag to stack_depot_save_flags that
instructs the stack depot to increment the refcount.
Do not yet decrement the refcount; this is implemented in one of the
following patches.
Do not yet enable any users to use the flag to avoid overflowing the
refcount.
This is preparatory patch for implementing the eviction of stack records
from the stack depot.
Signed-off-by: Andrey Konovalov <[email protected]>
---
Changes v1->v2:
- Add forgotten refcount_inc() under write lock.
- Add STACK_DEPOT_FLAG_GET flag for stack_depot_save_flags.
---
include/linux/stackdepot.h | 13 ++++++++++---
lib/stackdepot.c | 12 ++++++++++--
2 files changed, 20 insertions(+), 5 deletions(-)
diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 0b262e14144e..611716702d73 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -39,8 +39,9 @@ typedef u32 depot_flags_t;
* to its declaration for more details.
*/
#define STACK_DEPOT_FLAG_CAN_ALLOC ((depot_flags_t)0x0001)
+#define STACK_DEPOT_FLAG_GET ((depot_flags_t)0x0002)
-#define STACK_DEPOT_FLAGS_NUM 1
+#define STACK_DEPOT_FLAGS_NUM 2
#define STACK_DEPOT_FLAGS_MASK ((depot_flags_t)((1 << STACK_DEPOT_FLAGS_NUM) - 1))
/*
@@ -94,6 +95,9 @@ static inline int stack_depot_early_init(void) { return 0; }
* flags of @alloc_flags). Otherwise, stack depot avoids any allocations and
* fails if no space is left to store the stack trace.
*
+ * If STACK_DEPOT_FLAG_GET is set in @depot_flags, stack depot will increment
+ * the refcount on the saved stack trace if it already exists in stack depot.
+ *
* If the provided stack trace comes from the interrupt context, only the part
* up to the interrupt entry is saved.
*
@@ -116,8 +120,11 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
* @nr_entries: Number of frames in the stack
* @alloc_flags: Allocation GFP flags
*
- * Context: Contexts where allocations via alloc_pages() are allowed.
- * See stack_depot_save_flags() for more details.
+ * Does not increment the refcount on the saved stack trace; see
+ * stack_depot_save_flags() for more details.
+ *
+ * Context: Contexts where allocations via alloc_pages() are allowed;
+ * see stack_depot_save_flags() for more details.
*
* Return: Handle of the stack trace stored in depot, 0 on failure
*/
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index e5121225f124..e2c622054265 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -23,6 +23,7 @@
#include <linux/mutex.h>
#include <linux/percpu.h>
#include <linux/printk.h>
+#include <linux/refcount.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/stacktrace.h>
@@ -60,6 +61,7 @@ struct stack_record {
u32 hash; /* Hash in hash table */
u32 size; /* Number of stored frames */
union handle_parts handle;
+ refcount_t count;
unsigned long entries[CONFIG_STACKDEPOT_MAX_FRAMES]; /* Frames */
};
@@ -361,6 +363,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
stack->hash = hash;
stack->size = size;
/* stack->handle is already filled in by depot_init_pool. */
+ refcount_set(&stack->count, 1);
memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
/*
@@ -477,6 +480,8 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
/* Fast path: look the stack trace up without full locking. */
found = find_stack(bucket, entries, nr_entries, hash);
if (found) {
+ if (depot_flags & STACK_DEPOT_FLAG_GET)
+ refcount_inc(&found->count);
read_unlock_irqrestore(&pool_rwlock, flags);
goto exit;
}
@@ -516,12 +521,15 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
list_add(&new->list, bucket);
found = new;
}
- } else if (prealloc) {
+ } else {
+ if (depot_flags & STACK_DEPOT_FLAG_GET)
+ refcount_inc(&found->count);
/*
* Stack depot already contains this stack trace, but let's
* keep the preallocated memory for future.
*/
- depot_keep_new_pool(&prealloc);
+ if (prealloc)
+ depot_keep_new_pool(&prealloc);
}
write_unlock_irqrestore(&pool_rwlock, flags);
--
2.25.1
From: Andrey Konovalov <[email protected]>
Simplify comments accompanying the use of atomic accesses in the
stack depot code.
Also drop smp_load_acquire from next_pool_required in depot_init_pool,
as both depot_init_pool and the all smp_store_release's to this variable
are executed under the stack depot lock.
Signed-off-by: Andrey Konovalov <[email protected]>
---
This patch is not strictly required, as the atomic accesses are fully
removed in one of the latter patches. However, I decided to keep the
patch just in case we end up needing these atomics in the following
iterations of this series.
Changes v1->v2:
- Minor comment fix as suggested by Marco.
- Drop READ_ONCE marking for next_pool_required.
---
lib/stackdepot.c | 27 ++++++++++++---------------
1 file changed, 12 insertions(+), 15 deletions(-)
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 128ece21afe9..babd453261f0 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -225,10 +225,8 @@ static void depot_init_pool(void **prealloc)
/*
* If the next pool is already initialized or the maximum number of
* pools is reached, do not use the preallocated memory.
- * smp_load_acquire() here pairs with smp_store_release() below and
- * in depot_alloc_stack().
*/
- if (!smp_load_acquire(&next_pool_required))
+ if (!next_pool_required)
return;
/* Check if the current pool is not yet allocated. */
@@ -249,8 +247,8 @@ static void depot_init_pool(void **prealloc)
* At this point, either the next pool is initialized or the
* maximum number of pools is reached. In either case, take
* note that initializing another pool is not required.
- * This smp_store_release pairs with smp_load_acquire() above
- * and in stack_depot_save().
+ * smp_store_release pairs with smp_load_acquire in
+ * stack_depot_save.
*/
smp_store_release(&next_pool_required, 0);
}
@@ -274,15 +272,15 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
/*
* Move on to the next pool.
* WRITE_ONCE pairs with potential concurrent read in
- * stack_depot_fetch().
+ * stack_depot_fetch.
*/
WRITE_ONCE(pool_index, pool_index + 1);
pool_offset = 0;
/*
* If the maximum number of pools is not reached, take note
* that the next pool needs to initialized.
- * smp_store_release() here pairs with smp_load_acquire() in
- * stack_depot_save() and depot_init_pool().
+ * smp_store_release pairs with smp_load_acquire in
+ * stack_depot_save.
*/
if (pool_index + 1 < DEPOT_MAX_POOLS)
smp_store_release(&next_pool_required, 1);
@@ -324,7 +322,7 @@ static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
union handle_parts parts = { .handle = handle };
/*
* READ_ONCE pairs with potential concurrent write in
- * depot_alloc_stack().
+ * depot_alloc_stack.
*/
int pool_index_cached = READ_ONCE(pool_index);
void *pool;
@@ -413,8 +411,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
/*
* Fast path: look the stack trace up without locking.
- * The smp_load_acquire() here pairs with smp_store_release() to
- * |bucket| below.
+ * smp_load_acquire pairs with smp_store_release to |bucket| below.
*/
found = find_stack(smp_load_acquire(bucket), entries, nr_entries, hash);
if (found)
@@ -424,8 +421,8 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
* Check if another stack pool needs to be initialized. If so, allocate
* the memory now - we won't be able to do that under the lock.
*
- * The smp_load_acquire() here pairs with smp_store_release() to
- * |next_pool_inited| in depot_alloc_stack() and depot_init_pool().
+ * smp_load_acquire pairs with smp_store_release in depot_alloc_stack
+ * and depot_init_pool.
*/
if (unlikely(can_alloc && smp_load_acquire(&next_pool_required))) {
/*
@@ -451,8 +448,8 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
if (new) {
new->next = *bucket;
/*
- * This smp_store_release() pairs with
- * smp_load_acquire() from |bucket| above.
+ * smp_store_release pairs with smp_load_acquire
+ * from |bucket| above.
*/
smp_store_release(bucket, new);
found = new;
--
2.25.1
From: Andrey Konovalov <[email protected]>
Stack depot doesn't use the valid bit in handles in any way, so drop it.
Reviewed-by: Alexander Potapenko <[email protected]>
Signed-off-by: Andrey Konovalov <[email protected]>
---
lib/stackdepot.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 0772125efe8a..482eac40791e 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -32,13 +32,12 @@
#define DEPOT_HANDLE_BITS (sizeof(depot_stack_handle_t) * 8)
-#define DEPOT_VALID_BITS 1
#define DEPOT_POOL_ORDER 2 /* Pool size order, 4 pages */
#define DEPOT_POOL_SIZE (1LL << (PAGE_SHIFT + DEPOT_POOL_ORDER))
#define DEPOT_STACK_ALIGN 4
#define DEPOT_OFFSET_BITS (DEPOT_POOL_ORDER + PAGE_SHIFT - DEPOT_STACK_ALIGN)
-#define DEPOT_POOL_INDEX_BITS (DEPOT_HANDLE_BITS - DEPOT_VALID_BITS - \
- DEPOT_OFFSET_BITS - STACK_DEPOT_EXTRA_BITS)
+#define DEPOT_POOL_INDEX_BITS (DEPOT_HANDLE_BITS - DEPOT_OFFSET_BITS - \
+ STACK_DEPOT_EXTRA_BITS)
#define DEPOT_POOLS_CAP 8192
#define DEPOT_MAX_POOLS \
(((1LL << (DEPOT_POOL_INDEX_BITS)) < DEPOT_POOLS_CAP) ? \
@@ -50,7 +49,6 @@ union handle_parts {
struct {
u32 pool_index : DEPOT_POOL_INDEX_BITS;
u32 offset : DEPOT_OFFSET_BITS;
- u32 valid : DEPOT_VALID_BITS;
u32 extra : STACK_DEPOT_EXTRA_BITS;
};
};
@@ -303,7 +301,6 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
stack->size = size;
stack->handle.pool_index = pool_index;
stack->handle.offset = pool_offset >> DEPOT_STACK_ALIGN;
- stack->handle.valid = 1;
stack->handle.extra = 0;
memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
pool_offset += required_size;
--
2.25.1
From: Andrey Konovalov <[email protected]>
Make tag-based KASAN modes to evict stack traces from the stack depot
once they are evicted from the stack ring.
Internally, pass STACK_DEPOT_FLAG_GET to stack_depot_save_flags (via
kasan_save_stack) to increment the refcount when saving a new entry
to stack ring and call stack_depot_put when removing an entry from
stack ring.
Signed-off-by: Andrey Konovalov <[email protected]>
---
Changes v1->v2:
- Adapt to the stack depot API change.
- Drop READ_ONCE when reading entry->stack.
---
mm/kasan/report_tags.c | 1 +
mm/kasan/tags.c | 10 ++++++++--
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/mm/kasan/report_tags.c b/mm/kasan/report_tags.c
index 98c238ba3545..55154743f915 100644
--- a/mm/kasan/report_tags.c
+++ b/mm/kasan/report_tags.c
@@ -7,6 +7,7 @@
#include <linux/atomic.h>
#include "kasan.h"
+#include "../slab.h"
extern struct kasan_stack_ring stack_ring;
diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c
index b6c017e670d8..739ae997463d 100644
--- a/mm/kasan/tags.c
+++ b/mm/kasan/tags.c
@@ -97,12 +97,13 @@ static void save_stack_info(struct kmem_cache *cache, void *object,
gfp_t gfp_flags, bool is_free)
{
unsigned long flags;
- depot_stack_handle_t stack;
+ depot_stack_handle_t stack, old_stack;
u64 pos;
struct kasan_stack_ring_entry *entry;
void *old_ptr;
- stack = kasan_save_stack(gfp_flags, STACK_DEPOT_FLAG_CAN_ALLOC);
+ stack = kasan_save_stack(gfp_flags,
+ STACK_DEPOT_FLAG_CAN_ALLOC | STACK_DEPOT_FLAG_GET);
/*
* Prevent save_stack_info() from modifying stack ring
@@ -121,6 +122,8 @@ static void save_stack_info(struct kmem_cache *cache, void *object,
if (!try_cmpxchg(&entry->ptr, &old_ptr, STACK_RING_BUSY_PTR))
goto next; /* Busy slot. */
+ old_stack = entry->stack;
+
entry->size = cache->object_size;
entry->pid = current->pid;
entry->stack = stack;
@@ -129,6 +132,9 @@ static void save_stack_info(struct kmem_cache *cache, void *object,
entry->ptr = object;
read_unlock_irqrestore(&stack_ring.lock, flags);
+
+ if (old_stack)
+ stack_depot_put(old_stack);
}
void kasan_save_alloc_info(struct kmem_cache *cache, void *object, gfp_t flags)
--
2.25.1
On Wed, 13 Sept 2023 at 19:17, <[email protected]> wrote:
>
> From: Andrey Konovalov <[email protected]>
>
> Change the bool can_alloc argument of __stack_depot_save to a
> u32 argument that accepts a set of flags.
>
> The following patch will add another flag to stack_depot_save_flags
> besides the existing STACK_DEPOT_FLAG_CAN_ALLOC.
>
> Also rename the function to stack_depot_save_flags, as __stack_depot_save
> is a cryptic name,
>
> Signed-off-by: Andrey Konovalov <[email protected]>
>
> ---
>
> Changes v1->v2:
> - This is a new patch.
> ---
> include/linux/stackdepot.h | 36 +++++++++++++++++++++++++-----------
> lib/stackdepot.c | 16 +++++++++++-----
> mm/kasan/common.c | 7 ++++---
> mm/kasan/generic.c | 9 +++++----
> mm/kasan/kasan.h | 2 +-
> mm/kasan/tags.c | 3 ++-
> 6 files changed, 48 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
> index e58306783d8e..0b262e14144e 100644
> --- a/include/linux/stackdepot.h
> +++ b/include/linux/stackdepot.h
> @@ -32,6 +32,17 @@ typedef u32 depot_stack_handle_t;
> */
> #define STACK_DEPOT_EXTRA_BITS 5
>
> +typedef u32 depot_flags_t;
> +
> +/*
> + * Flags that can be passed to stack_depot_save_flags(); see the comment next
> + * to its declaration for more details.
> + */
> +#define STACK_DEPOT_FLAG_CAN_ALLOC ((depot_flags_t)0x0001)
> +
> +#define STACK_DEPOT_FLAGS_NUM 1
> +#define STACK_DEPOT_FLAGS_MASK ((depot_flags_t)((1 << STACK_DEPOT_FLAGS_NUM) - 1))
> +
> /*
> * Using stack depot requires its initialization, which can be done in 3 ways:
> *
> @@ -69,31 +80,34 @@ static inline int stack_depot_early_init(void) { return 0; }
> #endif
>
> /**
> - * __stack_depot_save - Save a stack trace to stack depot
> + * stack_depot_save_flags - Save a stack trace to stack depot
> *
> * @entries: Pointer to the stack trace
> * @nr_entries: Number of frames in the stack
> * @alloc_flags: Allocation GFP flags
> - * @can_alloc: Allocate stack pools (increased chance of failure if false)
> + * @depot_flags: Stack depot flags
> + *
> + * Saves a stack trace from @entries array of size @nr_entries.
> *
> - * Saves a stack trace from @entries array of size @nr_entries. If @can_alloc is
> - * %true, stack depot can replenish the stack pools in case no space is left
> - * (allocates using GFP flags of @alloc_flags). If @can_alloc is %false, avoids
> - * any allocations and fails if no space is left to store the stack trace.
> + * If STACK_DEPOT_FLAG_CAN_ALLOC is set in @depot_flags, stack depot can
> + * replenish the stack pools in case no space is left (allocates using GFP
> + * flags of @alloc_flags). Otherwise, stack depot avoids any allocations and
> + * fails if no space is left to store the stack trace.
> *
> * If the provided stack trace comes from the interrupt context, only the part
> * up to the interrupt entry is saved.
> *
> - * Context: Any context, but setting @can_alloc to %false is required if
> + * Context: Any context, but setting STACK_DEPOT_FLAG_CAN_ALLOC is required if
> * alloc_pages() cannot be used from the current context. Currently
> * this is the case for contexts where neither %GFP_ATOMIC nor
> * %GFP_NOWAIT can be used (NMI, raw_spin_lock).
> *
> * Return: Handle of the stack struct stored in depot, 0 on failure
> */
> -depot_stack_handle_t __stack_depot_save(unsigned long *entries,
> - unsigned int nr_entries,
> - gfp_t gfp_flags, bool can_alloc);
> +depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
> + unsigned int nr_entries,
> + gfp_t gfp_flags,
> + depot_flags_t depot_flags);
>
> /**
> * stack_depot_save - Save a stack trace to stack depot
> @@ -103,7 +117,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
> * @alloc_flags: Allocation GFP flags
> *
> * Context: Contexts where allocations via alloc_pages() are allowed.
> - * See __stack_depot_save() for more details.
> + * See stack_depot_save_flags() for more details.
> *
> * Return: Handle of the stack trace stored in depot, 0 on failure
> */
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index 1b08897ebd2b..e5121225f124 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -438,19 +438,24 @@ static inline struct stack_record *find_stack(struct list_head *bucket,
> return NULL;
> }
>
> -depot_stack_handle_t __stack_depot_save(unsigned long *entries,
> - unsigned int nr_entries,
> - gfp_t alloc_flags, bool can_alloc)
> +depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
> + unsigned int nr_entries,
> + gfp_t alloc_flags,
> + depot_flags_t depot_flags)
> {
> struct list_head *bucket;
> struct stack_record *found = NULL;
> depot_stack_handle_t handle = 0;
> struct page *page = NULL;
> void *prealloc = NULL;
> + bool can_alloc = depot_flags & STACK_DEPOT_FLAG_CAN_ALLOC;
> bool need_alloc = false;
> unsigned long flags;
> u32 hash;
>
> + if (depot_flags & ~STACK_DEPOT_FLAGS_MASK)
> + return 0;
> +
Shouldn't this be a WARN due to invalid flags?
> /*
> * If this stack trace is from an interrupt, including anything before
> * interrupt entry usually leads to unbounded stack depot growth.
> @@ -529,13 +534,14 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
> handle = found->handle.handle;
> return handle;
> }
> -EXPORT_SYMBOL_GPL(__stack_depot_save);
> +EXPORT_SYMBOL_GPL(stack_depot_save_flags);
>
> depot_stack_handle_t stack_depot_save(unsigned long *entries,
> unsigned int nr_entries,
> gfp_t alloc_flags)
> {
> - return __stack_depot_save(entries, nr_entries, alloc_flags, true);
> + return stack_depot_save_flags(entries, nr_entries, alloc_flags,
> + STACK_DEPOT_FLAG_CAN_ALLOC);
> }
> EXPORT_SYMBOL_GPL(stack_depot_save);
>
> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> index 256930da578a..825a0240ec02 100644
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -22,6 +22,7 @@
> #include <linux/sched.h>
> #include <linux/sched/task_stack.h>
> #include <linux/slab.h>
> +#include <linux/stackdepot.h>
> #include <linux/stacktrace.h>
> #include <linux/string.h>
> #include <linux/types.h>
> @@ -37,19 +38,19 @@ struct slab *kasan_addr_to_slab(const void *addr)
> return NULL;
> }
>
> -depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)
> +depot_stack_handle_t kasan_save_stack(gfp_t flags, depot_flags_t depot_flags)
> {
> unsigned long entries[KASAN_STACK_DEPTH];
> unsigned int nr_entries;
>
> nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
> - return __stack_depot_save(entries, nr_entries, flags, can_alloc);
> + return stack_depot_save_flags(entries, nr_entries, flags, depot_flags);
> }
>
> void kasan_set_track(struct kasan_track *track, gfp_t flags)
> {
> track->pid = current->pid;
> - track->stack = kasan_save_stack(flags, true);
> + track->stack = kasan_save_stack(flags, STACK_DEPOT_FLAG_CAN_ALLOC);
> }
>
> #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
> diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
> index 4d837ab83f08..5d168c9afb32 100644
> --- a/mm/kasan/generic.c
> +++ b/mm/kasan/generic.c
> @@ -25,6 +25,7 @@
> #include <linux/sched.h>
> #include <linux/sched/task_stack.h>
> #include <linux/slab.h>
> +#include <linux/stackdepot.h>
> #include <linux/stacktrace.h>
> #include <linux/string.h>
> #include <linux/types.h>
> @@ -472,7 +473,7 @@ size_t kasan_metadata_size(struct kmem_cache *cache, bool in_object)
> sizeof(struct kasan_free_meta) : 0);
> }
>
> -static void __kasan_record_aux_stack(void *addr, bool can_alloc)
> +static void __kasan_record_aux_stack(void *addr, depot_flags_t depot_flags)
> {
> struct slab *slab = kasan_addr_to_slab(addr);
> struct kmem_cache *cache;
> @@ -489,17 +490,17 @@ static void __kasan_record_aux_stack(void *addr, bool can_alloc)
> return;
>
> alloc_meta->aux_stack[1] = alloc_meta->aux_stack[0];
> - alloc_meta->aux_stack[0] = kasan_save_stack(0, can_alloc);
> + alloc_meta->aux_stack[0] = kasan_save_stack(0, depot_flags);
> }
>
> void kasan_record_aux_stack(void *addr)
> {
> - return __kasan_record_aux_stack(addr, true);
> + return __kasan_record_aux_stack(addr, STACK_DEPOT_FLAG_CAN_ALLOC);
> }
>
> void kasan_record_aux_stack_noalloc(void *addr)
> {
> - return __kasan_record_aux_stack(addr, false);
> + return __kasan_record_aux_stack(addr, 0);
> }
>
> void kasan_save_alloc_info(struct kmem_cache *cache, void *object, gfp_t flags)
> diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
> index f70e3d7a602e..de3206e11888 100644
> --- a/mm/kasan/kasan.h
> +++ b/mm/kasan/kasan.h
> @@ -370,7 +370,7 @@ static inline void kasan_init_cache_meta(struct kmem_cache *cache, unsigned int
> static inline void kasan_init_object_meta(struct kmem_cache *cache, const void *object) { }
> #endif
>
> -depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc);
> +depot_stack_handle_t kasan_save_stack(gfp_t flags, depot_flags_t depot_flags);
> void kasan_set_track(struct kasan_track *track, gfp_t flags);
> void kasan_save_alloc_info(struct kmem_cache *cache, void *object, gfp_t flags);
> void kasan_save_free_info(struct kmem_cache *cache, void *object);
> diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c
> index 7dcfe341d48e..4fd32121b0fd 100644
> --- a/mm/kasan/tags.c
> +++ b/mm/kasan/tags.c
> @@ -13,6 +13,7 @@
> #include <linux/memblock.h>
> #include <linux/memory.h>
> #include <linux/mm.h>
> +#include <linux/stackdepot.h>
> #include <linux/static_key.h>
> #include <linux/string.h>
> #include <linux/types.h>
> @@ -101,7 +102,7 @@ static void save_stack_info(struct kmem_cache *cache, void *object,
> struct kasan_stack_ring_entry *entry;
> void *old_ptr;
>
> - stack = kasan_save_stack(gfp_flags, true);
> + stack = kasan_save_stack(gfp_flags, STACK_DEPOT_FLAG_CAN_ALLOC);
>
> /*
> * Prevent save_stack_info() from modifying stack ring
> --
> 2.25.1
>
On Fri, Sep 15, 2023 at 10:32 PM Marco Elver <[email protected]> wrote:
> > +depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
> > + unsigned int nr_entries,
> > + gfp_t alloc_flags,
> > + depot_flags_t depot_flags)
> > {
> > struct list_head *bucket;
> > struct stack_record *found = NULL;
> > depot_stack_handle_t handle = 0;
> > struct page *page = NULL;
> > void *prealloc = NULL;
> > + bool can_alloc = depot_flags & STACK_DEPOT_FLAG_CAN_ALLOC;
> > bool need_alloc = false;
> > unsigned long flags;
> > u32 hash;
> >
> > + if (depot_flags & ~STACK_DEPOT_FLAGS_MASK)
> > + return 0;
> > +
>
> Shouldn't this be a WARN due to invalid flags?
Good idea! Will fix. Thanks!
On Sat, 16 Sep 2023 19:43:35 +0200 Anders Roxell <[email protected]> wrote:
> On 2023-09-13 19:14, [email protected] wrote:
> > From: Andrey Konovalov <[email protected]>
> >
> > Switch stack_record to use list_head for links in the hash table
> > and in the freelist.
> >
> > This will allow removing entries from the hash table buckets.
> >
> > This is preparatory patch for implementing the eviction of stack records
> > from the stack depot.
> >
> > Signed-off-by: Andrey Konovalov <[email protected]>
> >
>
> Building on an arm64 kernel from linux-next tag next-20230915, and boot
> that in QEMU. I see the following kernel panic.
>
> ...
>
> The full log can be found [1] and the .config file [2]. I bisected down
> to this commit, see the bisect log [3].
>
> When reverted these two commits I managed to build and the kernel
> booted.
>
> 47590ecf1166 ("lib/stackdepot: use list_head for stack record links")
> 8729f3c26fc2 ("lib/stackdepot: allow users to evict stack traces")
Thanks, I have dropped this v2 series.
On 2023-09-13 19:14, [email protected] wrote:
> From: Andrey Konovalov <[email protected]>
>
> Switch stack_record to use list_head for links in the hash table
> and in the freelist.
>
> This will allow removing entries from the hash table buckets.
>
> This is preparatory patch for implementing the eviction of stack records
> from the stack depot.
>
> Signed-off-by: Andrey Konovalov <[email protected]>
>
Building on an arm64 kernel from linux-next tag next-20230915, and boot
that in QEMU. I see the following kernel panic.
[ 67.398850][ T1] Unable to handle kernel read from unreadable memory at virtual address 0000000000000010
[ 67.407996][ T1] Mem abort info:
[ 67.411023][ T1] ESR = 0x0000000096000004
[ 67.414757][ T1] EC = 0x25: DABT (current EL), IL = 32 bits
[ 67.419945][ T1] SET = 0, FnV = 0
[ 67.423172][ T1] EA = 0, S1PTW = 0
[ 67.426669][ T1] FSC = 0x04: level 0 translation fault
[ 67.431357][ T1] Data abort info:
[ 67.434593][ T1] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 67.439801][ T1] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 67.444948][ T1] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 67.449910][ T1] [0000000000000010] user address but active_mm is swapper
[ 67.456236][ T1] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 67.462181][ T1] Modules linked in:
[ 67.465435][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G T 6.6.0-rc1-next-20230915 #2 e95cf19845fbc1e6a5f0694214d59e527e463469
[ 67.477126][ T1] Hardware name: linux,dummy-virt (DT)
[ 67.481994][ T1] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 67.488454][ T1] pc : stack_depot_save_flags+0x2a8/0x780
[ 67.493348][ T1] lr : stack_depot_save_flags+0x2a8/0x780
[ 67.498339][ T1] sp : ffff80008000b870
[ 67.501670][ T1] x29: ffff80008000b870 x28: 00000000650dddc5 x27: 0000000000000000
[ 67.508658][ T1] x26: ffff80008470a000 x25: ffff80008000b9e8 x24: 0000000000000001
[ 67.515564][ T1] x23: 000000000000000e x22: ffff80008000b988 x21: 0000000000000001
[ 67.522430][ T1] x20: ffff00007b40d070 x19: 000000006ee80007 x18: ffff80008000d080
[ 67.529365][ T1] x17: 0000000000000000 x16: 0000000000000000 x15: 2030303178302f30
[ 67.536101][ T1] x14: 0000000000000000 x13: 205d315420202020 x12: 0000000000000000
[ 67.542985][ T1] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[ 67.549863][ T1] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[ 67.556764][ T1] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 67.563687][ T1] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[ 67.570500][ T1] Call trace:
[ 67.573275][ T1] stack_depot_save_flags+0x2a8/0x780
[ 67.577794][ T1] stack_depot_save+0x4c/0xc0
[ 67.582062][ T1] ref_tracker_alloc+0x354/0x480
[ 67.586273][ T1] sk_alloc+0x280/0x5f8
[ 67.590064][ T1] __netlink_create+0x84/0x200
[ 67.594009][ T1] __netlink_kernel_create+0x11c/0x500
[ 67.598816][ T1] rtnetlink_net_init+0xc4/0x180
[ 67.603052][ T1] ops_init+0x100/0x2c0
[ 67.606827][ T1] register_pernet_operations+0x228/0x480
[ 67.611568][ T1] register_pernet_subsys+0x5c/0xc0
[ 67.616282][ T1] rtnetlink_init+0x60/0xb00
[ 67.620086][ T1] netlink_proto_init+0x374/0x400
[ 67.624465][ T1] do_one_initcall+0x2c8/0x840
[ 67.628518][ T1] do_initcalls+0x21c/0x340
[ 67.632527][ T1] kernel_init_freeable+0x3b0/0x480
[ 67.636905][ T1] kernel_init+0x58/0x380
[ 67.640768][ T1] ret_from_fork+0x10/0x40
[ 67.644606][ T1] Code: eb1b029f 540008c0 91004360 97caa437 (b9401360)
[ 67.650293][ T1] ---[ end trace 0000000000000000 ]---
[ 67.654948][ T1] Kernel panic - not syncing: Oops: Fatal exception
[ 67.660229][ T1] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
The full log can be found [1] and the .config file [2]. I bisected down
to this commit, see the bisect log [3].
When reverted these two commits I managed to build and the kernel
booted.
47590ecf1166 ("lib/stackdepot: use list_head for stack record links")
8729f3c26fc2 ("lib/stackdepot: allow users to evict stack traces")
Cheers,
Anders
[1] http://ix.io/4GyE
[2] https://people.linaro.org/~anders.roxell/next-20230915.config
[3] http://ix.io/4GyG
On Wed, Sep 13, 2023 at 7:14 PM <[email protected]> wrote:
>
> From: Andrey Konovalov <[email protected]>
>
> Currently, the stack depot grows indefinitely until it reaches its
> capacity. Once that happens, the stack depot stops saving new stack
> traces.
>
> This creates a problem for using the stack depot for in-field testing
> and in production.
>
> For such uses, an ideal stack trace storage should:
>
> 1. Allow saving fresh stack traces on systems with a large uptime while
> limiting the amount of memory used to store the traces;
> 2. Have a low performance impact.
>
> Implementing #1 in the stack depot is impossible with the current
> keep-forever approach. This series targets to address that. Issue #2 is
> left to be addressed in a future series.
>
> This series changes the stack depot implementation to allow evicting
> unneeded stack traces from the stack depot. The users of the stack depot
> can do that via new stack_depot_save_flags(STACK_DEPOT_FLAG_GET) and
> stack_depot_put APIs.
>
> Internal changes to the stack depot code include:
>
> 1. Storing stack traces in fixed-frame-sized slots; the slot size is
> controlled via CONFIG_STACKDEPOT_MAX_FRAMES (vs precisely-sized
> slots in the current implementation);
> 2. Keeping available slots in a freelist (vs keeping an offset to the next
> free slot);
> 3. Using a read/write lock for synchronization (vs a lock-free approach
> combined with a spinlock).
>
> This series also integrates the eviction functionality in the tag-based
> KASAN modes.
>
> Despite wasting some space on rounding up the size of each stack record,
> with CONFIG_STACKDEPOT_MAX_FRAMES=32, the tag-based KASAN modes end up
> consuming ~5% less memory in stack depot during boot (with the default
> stack ring size of 32k entries). The reason for this is the eviction of
> irrelevant stack traces from the stack depot, which frees up space for
> other stack traces.
>
> For other tools that heavily rely on the stack depot, like Generic KASAN
> and KMSAN, this change leads to the stack depot capacity being reached
> sooner than before. However, as these tools are mainly used in fuzzing
> scenarios where the kernel is frequently rebooted, this outcome should
> be acceptable.
>
> There is no measurable boot time performance impact of these changes for
> KASAN on x86-64. I haven't done any tests for arm64 modes (the stack
> depot without performance optimizations is not suitable for intended use
> of those anyway), but I expect a similar result. Obtaining and copying
> stack trace frames when saving them into stack depot is what takes the
> most time.
>
> This series does not yet provide a way to configure the maximum size of
> the stack depot externally (e.g. via a command-line parameter). This will
> be added in a separate series, possibly together with the performance
> improvement changes.
Hi Marco and Alex,
Could you PTAL at the not-yet-reviewed patches in this series when you
get a chance?
Thanks!
On Wed, Sep 13, 2023 at 7:15 PM <[email protected]> wrote:
>
> From: Andrey Konovalov <[email protected]>
>
> Simplify comments accompanying the use of atomic accesses in the
> stack depot code.
>
> Also drop smp_load_acquire from next_pool_required in depot_init_pool,
> as both depot_init_pool and the all smp_store_release's to this variable
> are executed under the stack depot lock.
>
> Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
(but see below)
> * Move on to the next pool.
> * WRITE_ONCE pairs with potential concurrent read in
> - * stack_depot_fetch().
> + * stack_depot_fetch.
Why are you removing the parentheses here? kernel-doc uses them to
tell functions from non-functions, and having them in non-doc comments
sounds consistent.
On Fri, Oct 6, 2023 at 6:14 PM Alexander Potapenko <[email protected]> wrote:
>
> On Wed, Sep 13, 2023 at 7:15 PM <[email protected]> wrote:
> >
> > From: Andrey Konovalov <[email protected]>
> >
> > Simplify comments accompanying the use of atomic accesses in the
> > stack depot code.
> >
> > Also drop smp_load_acquire from next_pool_required in depot_init_pool,
> > as both depot_init_pool and the all smp_store_release's to this variable
> > are executed under the stack depot lock.
Maybe add this to the comment before "if (!next_pool_required)" ?
On Wed, Sep 13, 2023 at 7:15 PM <[email protected]> wrote:
>
> From: Andrey Konovalov <[email protected]>
>
> Split code in depot_alloc_stack and depot_init_pool into 3 functions:
>
> 1. depot_keep_next_pool that keeps preallocated memory for the next pool
> if required.
>
> 2. depot_update_pools that moves on to the next pool if there's no space
> left in the current pool, uses preallocated memory for the new current
> pool if required, and calls depot_keep_next_pool otherwise.
>
> 3. depot_alloc_stack that calls depot_update_pools and then allocates
> a stack record as before.
>
> This makes it somewhat easier to follow the logic of depot_alloc_stack
> and also serves as a preparation for implementing the eviction of stack
> records from the stack depot.
>
> Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
> +static void depot_keep_next_pool(void **prealloc)
> {
> /*
> - * If the next pool is already initialized or the maximum number of
> + * If the next pool is already saved or the maximum number of
> * pools is reached, do not use the preallocated memory.
> */
> if (!next_pool_required)
It's not mentioned at the top of the file that next_pool_required is
protected by pool_lock, but it is, correct?
Can you please update the comment to reflect that?
> +
> + /*
> + * At this point, either the next pool is kept or the maximum
> + * number of pools is reached. In either case, take note that
> + * keeping another pool is not required.
> + * smp_store_release pairs with smp_load_acquire in stack_depot_save.
As I wrote in the other patch review, I think we'd better keep
parentheses at the end of the function names in the comments (unless
there's a style guide telling us not to).
On Wed, Sep 13, 2023 at 7:15 PM <[email protected]> wrote:
>
> From: Andrey Konovalov <[email protected]>
>
> Instead of using the global pool_offset variable to find a free slot
> when storing a new stack record, mainlain a freelist of free slots
> within the allocated stack pools.
>
> A global next_stack variable is used as the head of the freelist, and
> the next field in the stack_record struct is reused as freelist link
> (when the record is not in the freelist, this field is used as a link
> in the hash table).
>
> This is preparatory patch for implementing the eviction of stack records
> from the stack depot.
>
> Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
On Mon, Oct 9, 2023 at 10:59 AM Alexander Potapenko <[email protected]> wrote:
>
> On Wed, Sep 13, 2023 at 7:15 PM <[email protected]> wrote:
> >
> > From: Andrey Konovalov <[email protected]>
> >
> > Split code in depot_alloc_stack and depot_init_pool into 3 functions:
> >
> > 1. depot_keep_next_pool that keeps preallocated memory for the next pool
> > if required.
> >
> > 2. depot_update_pools that moves on to the next pool if there's no space
> > left in the current pool, uses preallocated memory for the new current
> > pool if required, and calls depot_keep_next_pool otherwise.
> >
> > 3. depot_alloc_stack that calls depot_update_pools and then allocates
> > a stack record as before.
> >
> > This makes it somewhat easier to follow the logic of depot_alloc_stack
> > and also serves as a preparation for implementing the eviction of stack
> > records from the stack depot.
> >
> > Signed-off-by: Andrey Konovalov <[email protected]>
> Reviewed-by: Alexander Potapenko <[email protected]>
>
>
> > +static void depot_keep_next_pool(void **prealloc)
> > {
> > /*
> > - * If the next pool is already initialized or the maximum number of
> > + * If the next pool is already saved or the maximum number of
> > * pools is reached, do not use the preallocated memory.
> > */
> > if (!next_pool_required)
> It's not mentioned at the top of the file that next_pool_required is
> protected by pool_lock, but it is, correct?
> Can you please update the comment to reflect that?
You're adding lockdep annotations in patch 11, which are pretty
self-descriptive.
Feel free to ignore my comment above.
On Wed, Sep 13, 2023 at 7:17 PM <[email protected]> wrote:
>
> From: Andrey Konovalov <[email protected]>
>
> Make KMSAN use stack_depot_save instead of __stack_depot_save,
> as it always passes true to __stack_depot_save as the last argument.
>
> Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
On Wed, Sep 13, 2023 at 7:17 PM <[email protected]> wrote:
>
> From: Andrey Konovalov <[email protected]>
>
> Change the bool can_alloc argument of __stack_depot_save to a
> u32 argument that accepts a set of flags.
>
> The following patch will add another flag to stack_depot_save_flags
> besides the existing STACK_DEPOT_FLAG_CAN_ALLOC.
>
> Also rename the function to stack_depot_save_flags, as __stack_depot_save
> is a cryptic name,
>
> Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
(assuming you'll address Marco's comment)
...
> void kasan_record_aux_stack_noalloc(void *addr)
> {
> - return __kasan_record_aux_stack(addr, false);
> + return __kasan_record_aux_stack(addr, 0);
Maybe make the intent to not allocate more explicit by declaring some
STACK_DEPOT_FLAG_CAN_NOT_ALLOC = 0?
(Leaving this up to you)
On Wed, Sep 13, 2023 at 7:17 PM <[email protected]> wrote:
>
> From: Andrey Konovalov <[email protected]>
>
> Add a reference counter for how many times a stack records has been added
> to stack depot.
>
> Add a new STACK_DEPOT_FLAG_GET flag to stack_depot_save_flags that
> instructs the stack depot to increment the refcount.
>
> Do not yet decrement the refcount; this is implemented in one of the
> following patches.
>
> Do not yet enable any users to use the flag to avoid overflowing the
> refcount.
>
> This is preparatory patch for implementing the eviction of stack records
> from the stack depot.
>
> Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
On Wed, Sep 13, 2023 at 7:17 PM <[email protected]> wrote:
>
> From: Andrey Konovalov <[email protected]>
>
> Remove the atomic accesses to entry fields in save_stack_info and
> kasan_complete_mode_report_info for tag-based KASAN modes.
>
> These atomics are not required, as the read/write lock prevents the
> entries from being read (in kasan_complete_mode_report_info) while being
> written (in save_stack_info) and the try_cmpxchg prevents the same entry
> from being rewritten (in save_stack_info) in the unlikely case of wrapping
> during writing.
Given that you removed all atomic accesses, it should be fine to
remove the inclusion of atomic.h as well.
> Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
On Sat, Sep 16, 2023 at 10:04 PM Andrew Morton
<[email protected]> wrote:
>
> On Sat, 16 Sep 2023 19:43:35 +0200 Anders Roxell <[email protected]> wrote:
>
> > On 2023-09-13 19:14, [email protected] wrote:
> > > From: Andrey Konovalov <[email protected]>
> > >
> > > Switch stack_record to use list_head for links in the hash table
> > > and in the freelist.
> > >
> > > This will allow removing entries from the hash table buckets.
> > >
> > > This is preparatory patch for implementing the eviction of stack records
> > > from the stack depot.
> > >
> > > Signed-off-by: Andrey Konovalov <[email protected]>
> > >
> >
> > Building on an arm64 kernel from linux-next tag next-20230915, and boot
> > that in QEMU. I see the following kernel panic.
> >
> > ...
> >
> > The full log can be found [1] and the .config file [2]. I bisected down
> > to this commit, see the bisect log [3].
I am also seeing similar crashes on an x86 KMSAN build.
They are happening when in the following code:
list_for_each(pos, bucket) {
found = list_entry(pos, struct stack_record, list);
if (found->hash == hash &&
found->size == size &&
!stackdepot_memcmp(entries, found->entries, size))
return found;
}
`found` is NULL
@Andrey, could you please take a look?
On Wed, Sep 13, 2023 at 7:18 PM <[email protected]> wrote:
>
> From: Andrey Konovalov <[email protected]>
>
> Check the object size when looking up entries in the stack ring.
>
> If the size of the object for which a report is being printed does not
> match the size of the object for which a stack trace has been saved in
> the stack ring, the saved stack trace is irrelevant.
>
> Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
On Wed, Sep 13, 2023 at 7:18 PM <[email protected]> wrote:
>
> From: Andrey Konovalov <[email protected]>
>
> Make tag-based KASAN modes to evict stack traces from the stack depot
"Make tag-based KASAN modes evict stack traces from the stack depot"
(without "to")
> Internally, pass STACK_DEPOT_FLAG_GET to stack_depot_save_flags (via
> kasan_save_stack) to increment the refcount when saving a new entry
> to stack ring and call stack_depot_put when removing an entry from
> stack ring.
>
> Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
(but see the two other comments)
> --- a/mm/kasan/report_tags.c
> +++ b/mm/kasan/report_tags.c
> @@ -7,6 +7,7 @@
> #include <linux/atomic.h>
>
> #include "kasan.h"
> +#include "../slab.h"
Why?
On Thu, 5 Oct 2023 at 22:36, Andrey Konovalov <[email protected]> wrote:
>
> On Wed, Sep 13, 2023 at 7:14 PM <[email protected]> wrote:
> >
> > From: Andrey Konovalov <[email protected]>
> >
> > Currently, the stack depot grows indefinitely until it reaches its
> > capacity. Once that happens, the stack depot stops saving new stack
> > traces.
> >
> > This creates a problem for using the stack depot for in-field testing
> > and in production.
> >
> > For such uses, an ideal stack trace storage should:
> >
> > 1. Allow saving fresh stack traces on systems with a large uptime while
> > limiting the amount of memory used to store the traces;
> > 2. Have a low performance impact.
> >
> > Implementing #1 in the stack depot is impossible with the current
> > keep-forever approach. This series targets to address that. Issue #2 is
> > left to be addressed in a future series.
> >
> > This series changes the stack depot implementation to allow evicting
> > unneeded stack traces from the stack depot. The users of the stack depot
> > can do that via new stack_depot_save_flags(STACK_DEPOT_FLAG_GET) and
> > stack_depot_put APIs.
> >
> > Internal changes to the stack depot code include:
> >
> > 1. Storing stack traces in fixed-frame-sized slots; the slot size is
> > controlled via CONFIG_STACKDEPOT_MAX_FRAMES (vs precisely-sized
> > slots in the current implementation);
> > 2. Keeping available slots in a freelist (vs keeping an offset to the next
> > free slot);
> > 3. Using a read/write lock for synchronization (vs a lock-free approach
> > combined with a spinlock).
> >
> > This series also integrates the eviction functionality in the tag-based
> > KASAN modes.
> >
> > Despite wasting some space on rounding up the size of each stack record,
> > with CONFIG_STACKDEPOT_MAX_FRAMES=32, the tag-based KASAN modes end up
> > consuming ~5% less memory in stack depot during boot (with the default
> > stack ring size of 32k entries). The reason for this is the eviction of
> > irrelevant stack traces from the stack depot, which frees up space for
> > other stack traces.
> >
> > For other tools that heavily rely on the stack depot, like Generic KASAN
> > and KMSAN, this change leads to the stack depot capacity being reached
> > sooner than before. However, as these tools are mainly used in fuzzing
> > scenarios where the kernel is frequently rebooted, this outcome should
> > be acceptable.
> >
> > There is no measurable boot time performance impact of these changes for
> > KASAN on x86-64. I haven't done any tests for arm64 modes (the stack
> > depot without performance optimizations is not suitable for intended use
> > of those anyway), but I expect a similar result. Obtaining and copying
> > stack trace frames when saving them into stack depot is what takes the
> > most time.
> >
> > This series does not yet provide a way to configure the maximum size of
> > the stack depot externally (e.g. via a command-line parameter). This will
> > be added in a separate series, possibly together with the performance
> > improvement changes.
>
> Hi Marco and Alex,
>
> Could you PTAL at the not-yet-reviewed patches in this series when you
> get a chance?
There'll be a v3 with a few smaller still-pending fixes, right? I
think I looked at it a while back and the rest that I didn't comment
on looked fine, just waiting for v3.
Feel free to send a v3 by end of week. I'll try to have another look
today/tomorrow just in case I missed something, but if there are no
more comments please send v3 later in the week.
On Mon, Oct 9, 2023 at 2:35 PM Marco Elver <[email protected]> wrote:
>
> > Hi Marco and Alex,
> >
> > Could you PTAL at the not-yet-reviewed patches in this series when you
> > get a chance?
>
> There'll be a v3 with a few smaller still-pending fixes, right? I
> think I looked at it a while back and the rest that I didn't comment
> on looked fine, just waiting for v3.
>
> Feel free to send a v3 by end of week. I'll try to have another look
> today/tomorrow just in case I missed something, but if there are no
> more comments please send v3 later in the week.
Yes, definitely, there will be v3. I just wanted to collect more
feedback before spamming the list again.
I will send v3 that addresses all the issues and new Alexander's
comments (thanks!) next week (travelling for a conference right now).
Thank you!
On Fri, Oct 6, 2023 at 7:22 PM Alexander Potapenko <[email protected]> wrote:
>
> On Fri, Oct 6, 2023 at 6:14 PM Alexander Potapenko <[email protected]> wrote:
> >
> > On Wed, Sep 13, 2023 at 7:15 PM <[email protected]> wrote:
> > >
> > > From: Andrey Konovalov <[email protected]>
> > >
> > > Simplify comments accompanying the use of atomic accesses in the
> > > stack depot code.
> > >
> > > Also drop smp_load_acquire from next_pool_required in depot_init_pool,
> > > as both depot_init_pool and the all smp_store_release's to this variable
> > > are executed under the stack depot lock.
>
> Maybe add this to the comment before "if (!next_pool_required)" ?
Will do in v3.
Re removed parentheses: will restore them in v3.
Thanks!
On Mon, Oct 9, 2023 at 11:00 AM Alexander Potapenko <[email protected]> wrote:
>
> > +static void depot_keep_next_pool(void **prealloc)
> > {
> > /*
> > - * If the next pool is already initialized or the maximum number of
> > + * If the next pool is already saved or the maximum number of
> > * pools is reached, do not use the preallocated memory.
> > */
> > if (!next_pool_required)
> It's not mentioned at the top of the file that next_pool_required is
> protected by pool_lock, but it is, correct?
> Can you please update the comment to reflect that?
I'll add a clarifying comment above this access to the previous patch.
I don't think it's worth to describe all locking intricacies of the
existing code in other places, as atomic accesses are removed
altogether later in the series anyway.
Thanks!
On Mon, Oct 9, 2023 at 12:10 PM Alexander Potapenko <[email protected]> wrote:
>
> On Wed, Sep 13, 2023 at 7:17 PM <[email protected]> wrote:
> >
> > From: Andrey Konovalov <[email protected]>
> >
> > Change the bool can_alloc argument of __stack_depot_save to a
> > u32 argument that accepts a set of flags.
> >
> > The following patch will add another flag to stack_depot_save_flags
> > besides the existing STACK_DEPOT_FLAG_CAN_ALLOC.
> >
> > Also rename the function to stack_depot_save_flags, as __stack_depot_save
> > is a cryptic name,
> >
> > Signed-off-by: Andrey Konovalov <[email protected]>
> Reviewed-by: Alexander Potapenko <[email protected]>
> (assuming you'll address Marco's comment)
>
> ...
>
> > void kasan_record_aux_stack_noalloc(void *addr)
> > {
> > - return __kasan_record_aux_stack(addr, false);
> > + return __kasan_record_aux_stack(addr, 0);
>
> Maybe make the intent to not allocate more explicit by declaring some
> STACK_DEPOT_FLAG_CAN_NOT_ALLOC = 0?
> (Leaving this up to you)
The next patch adds another flag, so STACK_DEPOT_FLAG_CAN_NOT_ALLOC is
probably not the best name. I could add something like
STACK_DEPOT_FLAG_NONE, but I think this might create an impression
that there's some kind of NONE flag that affects the behavior of
stack_depot_save_flags in a special way.
I think we can just keep the value as 0, as it seems what the kernel
does in similar cases. E.g. for slab_flags_t, the kernel passes 0 to
kmem_cache_create when there are no special flags required.
Thanks!
On Mon, Oct 9, 2023 at 2:05 PM Alexander Potapenko <[email protected]> wrote:
>
> On Wed, Sep 13, 2023 at 7:17 PM <[email protected]> wrote:
> >
> > From: Andrey Konovalov <[email protected]>
> >
> > Remove the atomic accesses to entry fields in save_stack_info and
> > kasan_complete_mode_report_info for tag-based KASAN modes.
> >
> > These atomics are not required, as the read/write lock prevents the
> > entries from being read (in kasan_complete_mode_report_info) while being
> > written (in save_stack_info) and the try_cmpxchg prevents the same entry
> > from being rewritten (in save_stack_info) in the unlikely case of wrapping
> > during writing.
>
> Given that you removed all atomic accesses, it should be fine to
> remove the inclusion of atomic.h as well.
Not all of them are removed: stack_ring.pos is still accessed via atomics.
> > Signed-off-by: Andrey Konovalov <[email protected]>
> Reviewed-by: Alexander Potapenko <[email protected]>
Thanks!
On Mon, Oct 9, 2023 at 2:16 PM Alexander Potapenko <[email protected]> wrote:
>
> On Sat, Sep 16, 2023 at 10:04 PM Andrew Morton
> <[email protected]> wrote:
> >
> > On Sat, 16 Sep 2023 19:43:35 +0200 Anders Roxell <[email protected]> wrote:
> >
> > > On 2023-09-13 19:14, [email protected] wrote:
> > > > From: Andrey Konovalov <[email protected]>
> > > >
> > > > Switch stack_record to use list_head for links in the hash table
> > > > and in the freelist.
> > > >
> > > > This will allow removing entries from the hash table buckets.
> > > >
> > > > This is preparatory patch for implementing the eviction of stack records
> > > > from the stack depot.
> > > >
> > > > Signed-off-by: Andrey Konovalov <[email protected]>
> > > >
> > >
> > > Building on an arm64 kernel from linux-next tag next-20230915, and boot
> > > that in QEMU. I see the following kernel panic.
> > >
> > > ...
> > >
> > > The full log can be found [1] and the .config file [2]. I bisected down
> > > to this commit, see the bisect log [3].
>
> I am also seeing similar crashes on an x86 KMSAN build.
>
> They are happening when in the following code:
>
> list_for_each(pos, bucket) {
> found = list_entry(pos, struct stack_record, list);
> if (found->hash == hash &&
> found->size == size &&
> !stackdepot_memcmp(entries, found->entries, size))
> return found;
> }
>
> `found` is NULL
>
> @Andrey, could you please take a look?
Found a bug, will fix in v3. Thank you!
On Mon, Oct 9, 2023 at 2:24 PM Alexander Potapenko <[email protected]> wrote:
>
> On Wed, Sep 13, 2023 at 7:18 PM <[email protected]> wrote:
> >
> > From: Andrey Konovalov <[email protected]>
> >
> > Make tag-based KASAN modes to evict stack traces from the stack depot
> "Make tag-based KASAN modes evict stack traces from the stack depot"
> (without "to")
>
> > Internally, pass STACK_DEPOT_FLAG_GET to stack_depot_save_flags (via
> > kasan_save_stack) to increment the refcount when saving a new entry
> > to stack ring and call stack_depot_put when removing an entry from
> > stack ring.
> >
> > Signed-off-by: Andrey Konovalov <[email protected]>
> Reviewed-by: Alexander Potapenko <[email protected]>
>
> (but see the two other comments)
>
> > --- a/mm/kasan/report_tags.c
> > +++ b/mm/kasan/report_tags.c
> > @@ -7,6 +7,7 @@
> > #include <linux/atomic.h>
> >
> > #include "kasan.h"
> > +#include "../slab.h"
>
> Why?
This belongs to the previous patch, will fix in v3, thanks!