This patch set implements SLAB support for KASAN
Unlike SLUB, SLAB doesn't store allocation/deallocation stacks for heap
objects, therefore we reimplement this feature in mm/kasan/stackdepot.c.
The intention is to ultimately switch SLUB to use this implementation as
well, which will save a lot of memory (right now SLUB bloats each object
by 256 bytes to store the allocation/deallocation stacks).
Also neither SLUB nor SLAB delay the reuse of freed memory chunks, which
is necessary for better detection of use-after-free errors. We introduce
memory quarantine (mm/kasan/quarantine.c), which allows delayed reuse of
deallocated memory.
Alexander Potapenko (7):
kasan: Modify kmalloc_large_oob_right(), add
kmalloc_pagealloc_oob_right()
mm, kasan: SLAB support
mm, kasan: Added GFP flags to KASAN API
arch, ftrace: For KASAN put hard/soft IRQ entries into separate
sections
mm, kasan: Stackdepot implementation. Enable stackdepot for SLAB
kasan: Test fix: Warn if the UAF could not be detected in kmalloc_uaf2
mm: kasan: Initial memory quarantine implementation
---
v2: - merged two patches that touched kmalloc_large_oob_right
- moved stackdepot implementation to lib/
- moved IRQ definitions to include/linux/interrupt.h
v3: - minor description changes
- store deallocation info in the "mm, kasan: SLAB support" patch
v4: - fix kbuild error reports
v5: - SLAB allocator, stackdepot: adopted suggestions by Andrey Ryabinin
- IRQ: fixed kbuild warnings
v6: - stackdepot: fixed kbuild warnings, simplified kasan_track,
use vmalloc() for depot when possible
- quarantine: improved patch description, removed dead code
v7: - fix kbuild error reports
---
Documentation/kasan.txt | 5 +-
arch/arm/include/asm/exception.h | 2 +-
arch/arm/kernel/vmlinux.lds.S | 1 +
arch/arm64/include/asm/exception.h | 2 +-
arch/arm64/kernel/vmlinux.lds.S | 1 +
arch/blackfin/kernel/vmlinux.lds.S | 1 +
arch/c6x/kernel/vmlinux.lds.S | 1 +
arch/metag/kernel/vmlinux.lds.S | 1 +
arch/microblaze/kernel/vmlinux.lds.S | 1 +
arch/mips/kernel/vmlinux.lds.S | 1 +
arch/nios2/kernel/vmlinux.lds.S | 1 +
arch/openrisc/kernel/vmlinux.lds.S | 1 +
arch/parisc/kernel/vmlinux.lds.S | 1 +
arch/powerpc/kernel/vmlinux.lds.S | 1 +
arch/s390/kernel/vmlinux.lds.S | 1 +
arch/sh/kernel/vmlinux.lds.S | 1 +
arch/sparc/kernel/vmlinux.lds.S | 1 +
arch/tile/kernel/vmlinux.lds.S | 1 +
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/vmlinux.lds.S | 1 +
include/asm-generic/vmlinux.lds.h | 12 +-
include/linux/ftrace.h | 11 --
include/linux/interrupt.h | 20 +++
include/linux/kasan.h | 63 ++++++--
include/linux/slab.h | 10 +-
include/linux/slab_def.h | 14 ++
include/linux/slub_def.h | 11 ++
include/linux/stackdepot.h | 32 ++++
kernel/softirq.c | 2 +-
kernel/trace/trace_functions_graph.c | 1 +
lib/Kconfig | 4 +
lib/Kconfig.kasan | 5 +-
lib/Makefile | 3 +
lib/stackdepot.c | 301 +++++++++++++++++++++++++++++++++++
lib/test_kasan.c | 59 ++++++-
mm/Makefile | 1 +
mm/kasan/Makefile | 4 +
mm/kasan/kasan.c | 219 +++++++++++++++++++++++--
mm/kasan/kasan.h | 44 +++++
mm/kasan/quarantine.c | 289 +++++++++++++++++++++++++++++++++
mm/kasan/report.c | 63 ++++++--
mm/mempool.c | 23 +--
mm/page_alloc.c | 2 +-
mm/slab.c | 53 +++++-
mm/slab.h | 4 +-
mm/slab_common.c | 8 +-
mm/slub.c | 19 +--
47 files changed, 1211 insertions(+), 92 deletions(-)
create mode 100644 include/linux/stackdepot.h
create mode 100644 lib/stackdepot.c
create mode 100644 mm/kasan/quarantine.c
--
2.7.0.rc3.207.g0ac5344
Rename kmalloc_large_oob_right() to kmalloc_pagealloc_oob_right(), as the
test only checks the page allocator functionality.
Also reimplement kmalloc_large_oob_right() so that the test allocates a
large enough chunk of memory that still does not trigger the page
allocator fallback.
Signed-off-by: Alexander Potapenko <[email protected]>
---
v2: - Merged "kasan: Change the behavior of kmalloc_large_oob_right" and
"kasan: Changed kmalloc_large_oob_right, added kmalloc_pagealloc_oob_right"
from v1
v3: - Minor description changes
---
lib/test_kasan.c | 28 +++++++++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/lib/test_kasan.c b/lib/test_kasan.c
index c32f3b0..90ad74f 100644
--- a/lib/test_kasan.c
+++ b/lib/test_kasan.c
@@ -65,11 +65,34 @@ static noinline void __init kmalloc_node_oob_right(void)
kfree(ptr);
}
-static noinline void __init kmalloc_large_oob_right(void)
+#ifdef CONFIG_SLUB
+static noinline void __init kmalloc_pagealloc_oob_right(void)
{
char *ptr;
size_t size = KMALLOC_MAX_CACHE_SIZE + 10;
+ /* Allocate a chunk that does not fit into a SLUB cache to trigger
+ * the page allocator fallback.
+ */
+ pr_info("kmalloc pagealloc allocation: out-of-bounds to right\n");
+ ptr = kmalloc(size, GFP_KERNEL);
+ if (!ptr) {
+ pr_err("Allocation failed\n");
+ return;
+ }
+
+ ptr[size] = 0;
+ kfree(ptr);
+}
+#endif
+
+static noinline void __init kmalloc_large_oob_right(void)
+{
+ char *ptr;
+ size_t size = KMALLOC_MAX_CACHE_SIZE - 256;
+ /* Allocate a chunk that is large enough, but still fits into a slab
+ * and does not trigger the page allocator fallback in SLUB.
+ */
pr_info("kmalloc large allocation: out-of-bounds to right\n");
ptr = kmalloc(size, GFP_KERNEL);
if (!ptr) {
@@ -324,6 +347,9 @@ static int __init kmalloc_tests_init(void)
kmalloc_oob_right();
kmalloc_oob_left();
kmalloc_node_oob_right();
+#ifdef CONFIG_SLUB
+ kmalloc_pagealloc_oob_right();
+#endif
kmalloc_large_oob_right();
kmalloc_oob_krealloc_more();
kmalloc_oob_krealloc_less();
--
2.7.0.rc3.207.g0ac5344
Add KASAN hooks to SLAB allocator.
This patch is based on the "mm: kasan: unified support for SLUB and
SLAB allocators" patch originally prepared by Dmitry Chernenkov.
Signed-off-by: Alexander Potapenko <[email protected]>
---
v3: - minor description changes
- store deallocation info in kasan_slab_free()
v4: - fix kbuild compile-time warnings in print_track()
v5: - adopted suggestions by Andrey Ryabinin:
-- kasan_kmalloc() can handle NULL, no need to check for it
-- simplified error printing code, removed unnecessary #ifdefs
---
Documentation/kasan.txt | 5 +--
include/linux/kasan.h | 12 ++++++
include/linux/slab.h | 6 +++
include/linux/slab_def.h | 14 +++++++
include/linux/slub_def.h | 11 +++++
lib/Kconfig.kasan | 4 +-
mm/Makefile | 1 +
mm/kasan/kasan.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++
mm/kasan/kasan.h | 34 ++++++++++++++++
mm/kasan/report.c | 54 ++++++++++++++++++++-----
mm/slab.c | 43 +++++++++++++++++---
mm/slab_common.c | 2 +-
12 files changed, 266 insertions(+), 22 deletions(-)
diff --git a/Documentation/kasan.txt b/Documentation/kasan.txt
index aa1e0c9..7dd95b3 100644
--- a/Documentation/kasan.txt
+++ b/Documentation/kasan.txt
@@ -12,8 +12,7 @@ KASAN uses compile-time instrumentation for checking every memory access,
therefore you will need a GCC version 4.9.2 or later. GCC 5.0 or later is
required for detection of out-of-bounds accesses to stack or global variables.
-Currently KASAN is supported only for x86_64 architecture and requires the
-kernel to be built with the SLUB allocator.
+Currently KASAN is supported only for x86_64 architecture.
1. Usage
========
@@ -27,7 +26,7 @@ inline are compiler instrumentation types. The former produces smaller binary
the latter is 1.1 - 2 times faster. Inline instrumentation requires a GCC
version 5.0 or later.
-Currently KASAN works only with the SLUB memory allocator.
+KASAN works with both SLUB and SLAB memory allocators.
For better bug detection and nicer reporting, enable CONFIG_STACKTRACE.
To disable instrumentation for specific files or directories, add a line
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 4b9f85c..4405a35 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -46,6 +46,9 @@ void kasan_unpoison_shadow(const void *address, size_t size);
void kasan_alloc_pages(struct page *page, unsigned int order);
void kasan_free_pages(struct page *page, unsigned int order);
+void kasan_cache_create(struct kmem_cache *cache, size_t *size,
+ unsigned long *flags);
+
void kasan_poison_slab(struct page *page);
void kasan_unpoison_object_data(struct kmem_cache *cache, void *object);
void kasan_poison_object_data(struct kmem_cache *cache, void *object);
@@ -59,6 +62,11 @@ void kasan_krealloc(const void *object, size_t new_size);
void kasan_slab_alloc(struct kmem_cache *s, void *object);
void kasan_slab_free(struct kmem_cache *s, void *object);
+struct kasan_cache {
+ int alloc_meta_offset;
+ int free_meta_offset;
+};
+
int kasan_module_alloc(void *addr, size_t size);
void kasan_free_shadow(const struct vm_struct *vm);
@@ -72,6 +80,10 @@ static inline void kasan_disable_current(void) {}
static inline void kasan_alloc_pages(struct page *page, unsigned int order) {}
static inline void kasan_free_pages(struct page *page, unsigned int order) {}
+static inline void kasan_cache_create(struct kmem_cache *cache,
+ size_t *size,
+ unsigned long *flags) {}
+
static inline void kasan_poison_slab(struct page *page) {}
static inline void kasan_unpoison_object_data(struct kmem_cache *cache,
void *object) {}
diff --git a/include/linux/slab.h b/include/linux/slab.h
index e4b5687..aa61595 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -92,6 +92,12 @@
# define SLAB_ACCOUNT 0x00000000UL
#endif
+#ifdef CONFIG_KASAN
+#define SLAB_KASAN 0x08000000UL
+#else
+#define SLAB_KASAN 0x00000000UL
+#endif
+
/* The following flags affect the page allocator grouping pages by mobility */
#define SLAB_RECLAIM_ACCOUNT 0x00020000UL /* Objects are reclaimable */
#define SLAB_TEMPORARY SLAB_RECLAIM_ACCOUNT /* Objects are short-lived */
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index e878ba3..9edbbf3 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -76,8 +76,22 @@ struct kmem_cache {
#ifdef CONFIG_MEMCG
struct memcg_cache_params memcg_params;
#endif
+#ifdef CONFIG_KASAN
+ struct kasan_cache kasan_info;
+#endif
struct kmem_cache_node *node[MAX_NUMNODES];
};
+static inline void *nearest_obj(struct kmem_cache *cache, struct page *page,
+ void *x) {
+ void *object = x - (x - page->s_mem) % cache->size;
+ void *last_object = page->s_mem + (cache->num - 1) * cache->size;
+
+ if (unlikely(object > last_object))
+ return last_object;
+ else
+ return object;
+}
+
#endif /* _LINUX_SLAB_DEF_H */
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index a33869b..feb1dc9 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -130,4 +130,15 @@ static inline void *virt_to_obj(struct kmem_cache *s,
void object_err(struct kmem_cache *s, struct page *page,
u8 *object, char *reason);
+static inline void *nearest_obj(struct kmem_cache *cache, struct page *page,
+ void *x) {
+ void *object = x - (x - page_address(page)) % cache->size;
+ void *last_object = page_address(page) +
+ (page->objects - 1) * cache->size;
+ if (unlikely(object > last_object))
+ return last_object;
+ else
+ return object;
+}
+
#endif /* _LINUX_SLUB_DEF_H */
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index 0fee5ac..0e4d2b3 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -5,7 +5,7 @@ if HAVE_ARCH_KASAN
config KASAN
bool "KASan: runtime memory debugger"
- depends on SLUB_DEBUG
+ depends on SLUB_DEBUG || (SLAB && !DEBUG_SLAB)
select CONSTRUCTORS
help
Enables kernel address sanitizer - runtime memory debugger,
@@ -16,6 +16,8 @@ config KASAN
This feature consumes about 1/8 of available memory and brings about
~x3 performance slowdown.
For better error detection enable CONFIG_STACKTRACE.
+ Currently CONFIG_KASAN doesn't work with CONFIG_DEBUG_SLAB
+ (the resulting kernel does not boot).
choice
prompt "Instrumentation type"
diff --git a/mm/Makefile b/mm/Makefile
index 4f0f135..3ac70df 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -3,6 +3,7 @@
#
KASAN_SANITIZE_slab_common.o := n
+KASAN_SANITIZE_slab.o := n
KASAN_SANITIZE_slub.o := n
# These files are disabled because they produce non-interesting and/or
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index bc0a8d8..d26ffb4 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -314,6 +314,59 @@ void kasan_free_pages(struct page *page, unsigned int order)
KASAN_FREE_PAGE);
}
+#ifdef CONFIG_SLAB
+/*
+ * Adaptive redzone policy taken from the userspace AddressSanitizer runtime.
+ * For larger allocations larger redzones are used.
+ */
+static size_t optimal_redzone(size_t object_size)
+{
+ int rz =
+ object_size <= 64 - 16 ? 16 :
+ object_size <= 128 - 32 ? 32 :
+ object_size <= 512 - 64 ? 64 :
+ object_size <= 4096 - 128 ? 128 :
+ object_size <= (1 << 14) - 256 ? 256 :
+ object_size <= (1 << 15) - 512 ? 512 :
+ object_size <= (1 << 16) - 1024 ? 1024 : 2048;
+ return rz;
+}
+
+void kasan_cache_create(struct kmem_cache *cache, size_t *size,
+ unsigned long *flags)
+{
+ int redzone_adjust;
+ /* Make sure the adjusted size is still less than
+ * KMALLOC_MAX_CACHE_SIZE.
+ * TODO: this check is only useful for SLAB, but not SLUB. We'll need
+ * to skip it for SLUB when it starts using kasan_cache_create().
+ */
+ if (*size > KMALLOC_MAX_CACHE_SIZE -
+ sizeof(struct kasan_alloc_meta) -
+ sizeof(struct kasan_free_meta))
+ return;
+ *flags |= SLAB_KASAN;
+ /* Add alloc meta. */
+ cache->kasan_info.alloc_meta_offset = *size;
+ *size += sizeof(struct kasan_alloc_meta);
+
+ /* Add free meta. */
+ if (cache->flags & SLAB_DESTROY_BY_RCU || cache->ctor ||
+ cache->object_size < sizeof(struct kasan_free_meta)) {
+ cache->kasan_info.free_meta_offset = *size;
+ *size += sizeof(struct kasan_free_meta);
+ }
+ redzone_adjust = optimal_redzone(cache->object_size) -
+ (*size - cache->object_size);
+ if (redzone_adjust > 0)
+ *size += redzone_adjust;
+ *size = min(KMALLOC_MAX_CACHE_SIZE,
+ max(*size,
+ cache->object_size +
+ optimal_redzone(cache->object_size)));
+}
+#endif
+
void kasan_poison_slab(struct page *page)
{
kasan_poison_shadow(page_address(page),
@@ -331,8 +384,36 @@ void kasan_poison_object_data(struct kmem_cache *cache, void *object)
kasan_poison_shadow(object,
round_up(cache->object_size, KASAN_SHADOW_SCALE_SIZE),
KASAN_KMALLOC_REDZONE);
+#ifdef CONFIG_SLAB
+ if (cache->flags & SLAB_KASAN) {
+ struct kasan_alloc_meta *alloc_info =
+ get_alloc_info(cache, object);
+ alloc_info->state = KASAN_STATE_INIT;
+ }
+#endif
+}
+
+static inline void set_track(struct kasan_track *track)
+{
+ track->cpu = raw_smp_processor_id();
+ track->pid = current->pid;
+ track->when = jiffies;
}
+#ifdef CONFIG_SLAB
+struct kasan_alloc_meta *get_alloc_info(struct kmem_cache *cache,
+ const void *object)
+{
+ return (void *)object + cache->kasan_info.alloc_meta_offset;
+}
+
+struct kasan_free_meta *get_free_info(struct kmem_cache *cache,
+ const void *object)
+{
+ return (void *)object + cache->kasan_info.free_meta_offset;
+}
+#endif
+
void kasan_slab_alloc(struct kmem_cache *cache, void *object)
{
kasan_kmalloc(cache, object, cache->object_size);
@@ -347,6 +428,17 @@ void kasan_slab_free(struct kmem_cache *cache, void *object)
if (unlikely(cache->flags & SLAB_DESTROY_BY_RCU))
return;
+#ifdef CONFIG_SLAB
+ if (cache->flags & SLAB_KASAN) {
+ struct kasan_free_meta *free_info =
+ get_free_info(cache, object);
+ struct kasan_alloc_meta *alloc_info =
+ get_alloc_info(cache, object);
+ alloc_info->state = KASAN_STATE_FREE;
+ set_track(&free_info->track);
+ }
+#endif
+
kasan_poison_shadow(object, rounded_up_size, KASAN_KMALLOC_FREE);
}
@@ -366,6 +458,16 @@ void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size)
kasan_unpoison_shadow(object, size);
kasan_poison_shadow((void *)redzone_start, redzone_end - redzone_start,
KASAN_KMALLOC_REDZONE);
+#ifdef CONFIG_SLAB
+ if (cache->flags & SLAB_KASAN) {
+ struct kasan_alloc_meta *alloc_info =
+ get_alloc_info(cache, object);
+
+ alloc_info->state = KASAN_STATE_ALLOC;
+ alloc_info->alloc_size = size;
+ set_track(&alloc_info->track);
+ }
+#endif
}
EXPORT_SYMBOL(kasan_kmalloc);
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 4f6c62e..7b9e4ab9 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -54,6 +54,40 @@ struct kasan_global {
#endif
};
+/**
+ * Structures to keep alloc and free tracks *
+ */
+
+enum kasan_state {
+ KASAN_STATE_INIT,
+ KASAN_STATE_ALLOC,
+ KASAN_STATE_FREE
+};
+
+struct kasan_track {
+ u64 cpu : 6; /* for NR_CPUS = 64 */
+ u64 pid : 16; /* 65536 processes */
+ u64 when : 42; /* ~140 years */
+};
+
+struct kasan_alloc_meta {
+ u32 state : 2; /* enum kasan_state */
+ u32 alloc_size : 30;
+ struct kasan_track track;
+};
+
+struct kasan_free_meta {
+ /* Allocator freelist pointer, unused by KASAN. */
+ void **freelist;
+ struct kasan_track track;
+};
+
+struct kasan_alloc_meta *get_alloc_info(struct kmem_cache *cache,
+ const void *object);
+struct kasan_free_meta *get_free_info(struct kmem_cache *cache,
+ const void *object);
+
+
static inline const void *kasan_shadow_to_mem(const void *shadow_addr)
{
return (void *)(((unsigned long)shadow_addr - KASAN_SHADOW_OFFSET)
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 12f222d..0a4fde9 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -115,6 +115,46 @@ static inline bool init_task_stack_addr(const void *addr)
sizeof(init_thread_union.stack));
}
+#ifdef CONFIG_SLAB
+static void print_track(struct kasan_track *track)
+{
+ pr_err("PID = %u, CPU = %u, timestamp = %lu\n", track->pid,
+ track->cpu, (unsigned long)track->when);
+}
+
+static void object_err(struct kmem_cache *cache, struct page *page,
+ void *object, char *unused_reason)
+{
+ struct kasan_alloc_meta *alloc_info = get_alloc_info(cache, object);
+ struct kasan_free_meta *free_info;
+
+ dump_stack();
+ pr_err("Object at %p, in cache %s\n", object, cache->name);
+ if (!(cache->flags & SLAB_KASAN))
+ return;
+ switch (alloc_info->state) {
+ case KASAN_STATE_INIT:
+ pr_err("Object not allocated yet\n");
+ break;
+ case KASAN_STATE_ALLOC:
+ pr_err("Object allocated with size %u bytes.\n",
+ alloc_info->alloc_size);
+ pr_err("Allocation:\n");
+ print_track(&alloc_info->track);
+ break;
+ case KASAN_STATE_FREE:
+ pr_err("Object freed, allocated with size %u bytes\n",
+ alloc_info->alloc_size);
+ free_info = get_free_info(cache, object);
+ pr_err("Allocation:\n");
+ print_track(&alloc_info->track);
+ pr_err("Deallocation:\n");
+ print_track(&free_info->track);
+ break;
+ }
+}
+#endif
+
static void print_address_description(struct kasan_access_info *info)
{
const void *addr = info->access_addr;
@@ -126,17 +166,10 @@ static void print_address_description(struct kasan_access_info *info)
if (PageSlab(page)) {
void *object;
struct kmem_cache *cache = page->slab_cache;
- void *last_object;
-
- object = virt_to_obj(cache, page_address(page), addr);
- last_object = page_address(page) +
- page->objects * cache->size;
-
- if (unlikely(object > last_object))
- object = last_object; /* we hit into padding */
-
+ object = nearest_obj(cache, page,
+ (void *)info->access_addr);
object_err(cache, page, object,
- "kasan: bad access detected");
+ "kasan: bad access detected");
return;
}
dump_page(page, "kasan: bad access detected");
@@ -146,7 +179,6 @@ static void print_address_description(struct kasan_access_info *info)
if (!init_task_stack_addr(addr))
pr_err("Address belongs to variable %pS\n", addr);
}
-
dump_stack();
}
diff --git a/mm/slab.c b/mm/slab.c
index b9ee775..95dd196 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2091,6 +2091,8 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
}
#endif
+ kasan_cache_create(cachep, &size, &flags);
+
size = ALIGN(size, cachep->align);
/*
* We should restrict the number of objects in a slab to implement
@@ -2392,8 +2394,13 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
* cache which they are a constructor for. Otherwise, deadlock.
* They must also be threaded.
*/
- if (cachep->ctor && !(cachep->flags & SLAB_POISON))
+ if (cachep->ctor && !(cachep->flags & SLAB_POISON)) {
+ kasan_unpoison_object_data(cachep,
+ objp + obj_offset(cachep));
cachep->ctor(objp + obj_offset(cachep));
+ kasan_poison_object_data(
+ cachep, objp + obj_offset(cachep));
+ }
if (cachep->flags & SLAB_RED_ZONE) {
if (*dbg_redzone2(cachep, objp) != RED_INACTIVE)
@@ -2416,6 +2423,7 @@ static void cache_init_objs(struct kmem_cache *cachep,
struct page *page)
{
int i;
+ void *objp;
cache_init_objs_debug(cachep, page);
@@ -2426,8 +2434,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
for (i = 0; i < cachep->num; i++) {
/* constructor could break poison info */
- if (DEBUG == 0 && cachep->ctor)
- cachep->ctor(index_to_obj(cachep, page, i));
+ if (DEBUG == 0 && cachep->ctor) {
+ objp = index_to_obj(cachep, page, i);
+ kasan_unpoison_object_data(cachep, objp);
+ cachep->ctor(objp);
+ kasan_poison_object_data(cachep, objp);
+ }
set_free_obj(page, i, i);
}
@@ -2557,6 +2569,7 @@ static int cache_grow(struct kmem_cache *cachep,
slab_map_pages(cachep, page, freelist);
+ kasan_poison_slab(page);
cache_init_objs(cachep, page);
if (gfpflags_allow_blocking(local_flags))
@@ -3325,6 +3338,8 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp,
{
struct array_cache *ac = cpu_cache_get(cachep);
+ kasan_slab_free(cachep, objp);
+
check_irq_off();
kmemleak_free_recursive(objp, cachep->flags);
objp = cache_free_debugcheck(cachep, objp, caller);
@@ -3372,6 +3387,7 @@ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
void *ret = slab_alloc(cachep, flags, _RET_IP_);
+ kasan_slab_alloc(cachep, ret);
trace_kmem_cache_alloc(_RET_IP_, ret,
cachep->object_size, cachep->size, flags);
@@ -3437,6 +3453,7 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
ret = slab_alloc(cachep, flags, _RET_IP_);
+ kasan_kmalloc(cachep, ret, size);
trace_kmalloc(_RET_IP_, ret,
size, cachep->size, flags);
return ret;
@@ -3460,6 +3477,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
void *ret = slab_alloc_node(cachep, flags, nodeid, _RET_IP_);
+ kasan_slab_alloc(cachep, ret);
trace_kmem_cache_alloc_node(_RET_IP_, ret,
cachep->object_size, cachep->size,
flags, nodeid);
@@ -3477,7 +3495,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
void *ret;
ret = slab_alloc_node(cachep, flags, nodeid, _RET_IP_);
-
+ kasan_kmalloc(cachep, ret, size);
trace_kmalloc_node(_RET_IP_, ret,
size, cachep->size,
flags, nodeid);
@@ -3490,11 +3508,15 @@ static __always_inline void *
__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
{
struct kmem_cache *cachep;
+ void *ret;
cachep = kmalloc_slab(size, flags);
if (unlikely(ZERO_OR_NULL_PTR(cachep)))
return cachep;
- return kmem_cache_alloc_node_trace(cachep, flags, node, size);
+ ret = kmem_cache_alloc_node_trace(cachep, flags, node, size);
+ kasan_kmalloc(cachep, ret, size);
+
+ return ret;
}
void *__kmalloc_node(size_t size, gfp_t flags, int node)
@@ -3528,6 +3550,7 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
return cachep;
ret = slab_alloc(cachep, flags, caller);
+ kasan_kmalloc(cachep, ret, size);
trace_kmalloc(caller, ret,
size, cachep->size, flags);
@@ -4300,10 +4323,18 @@ module_init(slab_proc_init);
*/
size_t ksize(const void *objp)
{
+ size_t size;
+
BUG_ON(!objp);
if (unlikely(objp == ZERO_SIZE_PTR))
return 0;
- return virt_to_cache(objp)->object_size;
+ size = virt_to_cache(objp)->object_size;
+ /* We assume that ksize callers could use the whole allocated area,
+ * so we need to unpoison this area.
+ */
+ kasan_krealloc(objp, size);
+
+ return size;
}
EXPORT_SYMBOL(ksize);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 8addc3c..242e6fa 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -35,7 +35,7 @@ struct kmem_cache *kmem_cache;
*/
#define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
SLAB_TRACE | SLAB_DESTROY_BY_RCU | SLAB_NOLEAKTRACE | \
- SLAB_FAILSLAB)
+ SLAB_FAILSLAB | SLAB_KASAN)
#define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
SLAB_NOTRACK | SLAB_ACCOUNT)
--
2.7.0.rc3.207.g0ac5344
Implement the stack depot and provide CONFIG_STACKDEPOT.
Stack depot will allow KASAN store allocation/deallocation stack traces
for memory chunks. The stack traces are stored in a hash table and
referenced by handles which reside in the kasan_alloc_meta and
kasan_free_meta structures in the allocated memory chunks.
IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
duplication.
Right now stackdepot support is only enabled in SLAB allocator.
Once KASAN features in SLAB are on par with those in SLUB we can switch
SLUB to stackdepot as well, thus removing the dependency on SLUB stack
bookkeeping, which wastes a lot of memory.
This patch is based on the "mm: kasan: stack depots" patch originally
prepared by Dmitry Chernenkov.
Signed-off-by: Alexander Potapenko <[email protected]>
---
v2: - per request from Joonsoo Kim, moved the stackdepot implementation to
lib/, as there's a plan to use it for page owner
- added copyright comments
- added comments about smp_load_acquire()/smp_store_release()
v3: - minor description changes
v5: - decreased STACK_ALLOC_ORDER to reduce fragmentation
- declared CONFIG_STACKDEPOT
- replaced __memcpy() with memcpy()
- simplified GFP flags
v6: - renamed depot_stack_handle to depot_stack_handle_t
- made CONFIG_STACKDEPOT depend on CONFIG_STACKTRACE (fix kbuild errors)
- made depot_save_stack() allocate memory via vmalloc() when possible
- added a reentrancy flag to avoid saving stacks from recursive
depot_save_stack() calls
- simplified kasan_track (dropped CPU number and allocation time)
v7: - fixed kbuild errors (made several functions SLAB-only)
---
arch/x86/kernel/Makefile | 1 +
include/linux/stackdepot.h | 32 +++++
lib/Kconfig | 4 +
lib/Kconfig.kasan | 1 +
lib/Makefile | 3 +
lib/stackdepot.c | 301 +++++++++++++++++++++++++++++++++++++++++++++
mm/kasan/kasan.c | 55 ++++++++-
mm/kasan/kasan.h | 11 +-
mm/kasan/report.c | 12 +-
9 files changed, 408 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index edfa3ec..f656d6e 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -19,6 +19,7 @@ endif
KASAN_SANITIZE_head$(BITS).o := n
KASAN_SANITIZE_dumpstack.o := n
KASAN_SANITIZE_dumpstack_$(BITS).o := n
+KASAN_SANITIZE_stacktrace.o := n
OBJECT_FILES_NON_STANDARD_head_$(BITS).o := y
OBJECT_FILES_NON_STANDARD_relocate_kernel_$(BITS).o := y
diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
new file mode 100644
index 0000000..7978b3e
--- /dev/null
+++ b/include/linux/stackdepot.h
@@ -0,0 +1,32 @@
+/*
+ * A generic stack depot implementation
+ *
+ * Author: Alexander Potapenko <[email protected]>
+ * Copyright (C) 2016 Google, Inc.
+ *
+ * Based on code by Dmitry Chernenkov.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_STACKDEPOT_H
+#define _LINUX_STACKDEPOT_H
+
+typedef u32 depot_stack_handle_t;
+
+struct stack_trace;
+
+depot_stack_handle_t depot_save_stack(struct stack_trace *trace, gfp_t flags);
+
+void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace *trace);
+
+#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index ee38a3f..9d1cd1c 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -543,4 +543,8 @@ config ARCH_HAS_PMEM_API
config ARCH_HAS_MMIO_FLUSH
bool
+config STACKDEPOT
+ bool
+ select STACKTRACE
+
endmenu
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index 0e4d2b3..67d8c68 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -7,6 +7,7 @@ config KASAN
bool "KASan: runtime memory debugger"
depends on SLUB_DEBUG || (SLAB && !DEBUG_SLAB)
select CONSTRUCTORS
+ select STACKDEPOT if SLAB
help
Enables kernel address sanitizer - runtime memory debugger,
designed to find out-of-bounds accesses and use-after-free bugs.
diff --git a/lib/Makefile b/lib/Makefile
index 50b31e2..0123abc 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -182,6 +182,9 @@ obj-$(CONFIG_SG_SPLIT) += sg_split.o
obj-$(CONFIG_STMP_DEVICE) += stmp_device.o
obj-$(CONFIG_IRQ_POLL) += irq_poll.o
+obj-$(CONFIG_STACKDEPOT) += stackdepot.o
+KASAN_SANITIZE_stackdepot.o := n
+
libfdt_files = fdt.o fdt_ro.o fdt_wip.o fdt_rw.o fdt_sw.o fdt_strerror.o \
fdt_empty_tree.o
$(foreach file, $(libfdt_files), \
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
new file mode 100644
index 0000000..b0eb6db
--- /dev/null
+++ b/lib/stackdepot.c
@@ -0,0 +1,301 @@
+/*
+ * Generic stack depot for storing stack traces.
+ *
+ * Some debugging tools need to save stack traces of certain events which can
+ * be later presented to the user. For example, KASAN needs to safe alloc and
+ * free stacks for each object, but storing two stack traces per object
+ * requires too much memory (e.g. SLUB_DEBUG needs 256 bytes per object for
+ * that).
+ *
+ * Instead, stack depot maintains a hashtable of unique stacktraces. Since alloc
+ * and free stacks repeat a lot, we save about 100x space.
+ * Stacks are never removed from depot, so we store them contiguously one after
+ * another in a contiguos memory allocation.
+ *
+ * Author: Alexander Potapenko <[email protected]>
+ * Copyright (C) 2016 Google, Inc.
+ *
+ * Based on code by Dmitry Chernenkov.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ */
+
+#include <linux/gfp.h>
+#include <linux/jhash.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/percpu.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include <linux/stacktrace.h>
+#include <linux/stackdepot.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/vmalloc.h>
+
+#define DEPOT_STACK_BITS (sizeof(depot_stack_handle_t) * 8)
+
+#define STACK_ALLOC_ORDER 2 /* 'Slab' size order for stack depot, 4 pages */
+#define STACK_ALLOC_SIZE (1LL << (PAGE_SHIFT + STACK_ALLOC_ORDER))
+#define STACK_ALLOC_ALIGN 4
+#define STACK_ALLOC_OFFSET_BITS (STACK_ALLOC_ORDER + PAGE_SHIFT - \
+ STACK_ALLOC_ALIGN)
+#define STACK_ALLOC_INDEX_BITS (DEPOT_STACK_BITS - STACK_ALLOC_OFFSET_BITS)
+#define STACK_ALLOC_SLABS_CAP 1024
+#define STACK_ALLOC_MAX_SLABS \
+ (((1LL << (STACK_ALLOC_INDEX_BITS)) < STACK_ALLOC_SLABS_CAP) ? \
+ (1LL << (STACK_ALLOC_INDEX_BITS)) : STACK_ALLOC_SLABS_CAP)
+
+/* The compact structure to store the reference to stacks. */
+union handle_parts {
+ depot_stack_handle_t handle;
+ struct {
+ u32 slabindex : STACK_ALLOC_INDEX_BITS;
+ u32 offset : STACK_ALLOC_OFFSET_BITS;
+ };
+};
+
+struct stack_record {
+ struct stack_record *next; /* Link in the hashtable */
+ u32 hash; /* Hash in the hastable */
+ u32 size; /* Number of frames in the stack */
+ union handle_parts handle;
+ unsigned long entries[1]; /* Variable-sized array of entries. */
+};
+
+static void *stack_slabs[STACK_ALLOC_MAX_SLABS];
+
+static int depot_index;
+static int next_slab_inited;
+static size_t depot_offset;
+static DEFINE_SPINLOCK(depot_lock);
+
+static bool init_stack_slab(void **prealloc)
+{
+ if (!*prealloc)
+ return false;
+ /* This smp_load_acquire() pairs with smp_store_release() to
+ * |next_slab_inited| below and in depot_alloc_stack().
+ */
+ if (smp_load_acquire(&next_slab_inited))
+ return true;
+ if (stack_slabs[depot_index] == NULL) {
+ stack_slabs[depot_index] = *prealloc;
+ } else {
+ stack_slabs[depot_index + 1] = *prealloc;
+ /* This smp_store_release pairs with smp_load_acquire() from
+ * |next_slab_inited| above and in depot_save_stack().
+ */
+ smp_store_release(&next_slab_inited, 1);
+ }
+ *prealloc = NULL;
+ return true;
+}
+
+/* Allocation of a new stack in raw storage */
+static struct stack_record *depot_alloc_stack(unsigned long *entries, int size,
+ u32 hash, void **prealloc, gfp_t alloc_flags)
+{
+ int required_size = offsetof(struct stack_record, entries) +
+ sizeof(unsigned long) * size;
+ struct stack_record *stack;
+
+ required_size = ALIGN(required_size, 1 << STACK_ALLOC_ALIGN);
+
+ if (unlikely(depot_offset + required_size > STACK_ALLOC_SIZE)) {
+ if (unlikely(depot_index + 1 >= STACK_ALLOC_MAX_SLABS)) {
+ WARN_ONCE(1, "Stack depot reached limit capacity");
+ return NULL;
+ }
+ depot_index++;
+ depot_offset = 0;
+ /* smp_store_release() here pairs with smp_load_acquire() from
+ * |next_slab_inited| in depot_save_stack() and
+ * init_stack_slab().
+ */
+ if (depot_index + 1 < STACK_ALLOC_MAX_SLABS)
+ smp_store_release(&next_slab_inited, 0);
+ }
+ init_stack_slab(prealloc);
+ if (stack_slabs[depot_index] == NULL)
+ return NULL;
+
+ stack = stack_slabs[depot_index] + depot_offset;
+
+ stack->hash = hash;
+ stack->size = size;
+ stack->handle.slabindex = depot_index;
+ stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
+ memcpy(stack->entries, entries, size * sizeof(unsigned long));
+ depot_offset += required_size;
+
+ return stack;
+}
+
+#define STACK_HASH_ORDER 20
+#define STACK_HASH_SIZE (1L << STACK_HASH_ORDER)
+#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
+#define STACK_HASH_SEED 0x9747b28c
+
+static struct stack_record *stack_table[STACK_HASH_SIZE] = {
+ [0 ... STACK_HASH_SIZE - 1] = NULL
+};
+
+/* Calculate hash for a stack */
+static inline u32 hash_stack(unsigned long *entries, unsigned int size)
+{
+ return jhash2((u32 *)entries,
+ size * sizeof(unsigned long) / sizeof(u32),
+ STACK_HASH_SEED);
+}
+
+/* Find a stack that is equal to the one stored in entries in the hash */
+static inline struct stack_record *find_stack(struct stack_record *bucket,
+ unsigned long *entries, int size,
+ u32 hash)
+{
+ struct stack_record *found;
+
+ for (found = bucket; found; found = found->next) {
+ if (found->hash == hash &&
+ found->size == size &&
+ !memcmp(entries, found->entries,
+ size * sizeof(unsigned long))) {
+ return found;
+ }
+ }
+ return NULL;
+}
+
+void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace *trace)
+{
+ union handle_parts parts = { .handle = handle };
+ void *slab = stack_slabs[parts.slabindex];
+ size_t offset = parts.offset << STACK_ALLOC_ALIGN;
+ struct stack_record *stack = slab + offset;
+
+ trace->nr_entries = trace->max_entries = stack->size;
+ trace->entries = stack->entries;
+ trace->skip = 0;
+}
+
+static DEFINE_PER_CPU(bool, depot_recursion);
+/*
+ * depot_save_stack - save stack in a stack depot.
+ * @trace - the stacktrace to save.
+ * @alloc_flags - flags for allocating additional memory if required.
+ *
+ * Returns the handle of the stack struct stored in depot.
+ */
+depot_stack_handle_t depot_save_stack(struct stack_trace *trace,
+ gfp_t alloc_flags)
+{
+ u32 hash;
+ depot_stack_handle_t retval = 0;
+ struct stack_record *found = NULL, **bucket;
+ unsigned long flags;
+ struct page *page = NULL;
+ void *prealloc = NULL;
+ bool *rec;
+
+ if (unlikely(trace->nr_entries == 0))
+ goto fast_exit;
+
+ rec = this_cpu_ptr(&depot_recursion);
+ /* Don't store the stack if we've been called recursively. */
+ if (unlikely(*rec))
+ goto fast_exit;
+ *rec = true;
+
+ hash = hash_stack(trace->entries, trace->nr_entries);
+ /* Bad luck, we won't store this stack. */
+ if (hash == 0)
+ goto exit;
+
+ bucket = &stack_table[hash & STACK_HASH_MASK];
+
+ /* Fast path: look the stack trace up without locking.
+ *
+ * The smp_load_acquire() here pairs with smp_store_release() to
+ * |bucket| below.
+ */
+ found = find_stack(smp_load_acquire(bucket), trace->entries,
+ trace->nr_entries, hash);
+ if (found)
+ goto exit;
+
+ /* Check if the current or the next stack slab need to be initialized.
+ * If so, allocate the memory - we won't be able to do that under the
+ * lock.
+ *
+ * The smp_load_acquire() here pairs with smp_store_release() to
+ * |next_slab_inited| in depot_alloc_stack() and init_stack_slab().
+ */
+ if (unlikely(!smp_load_acquire(&next_slab_inited))) {
+ /* Zero out zone modifiers, as we don't have specific zone
+ * requirements. Keep the flags related to allocation in atomic
+ * contexts and I/O.
+ */
+ alloc_flags &= ~GFP_ZONEMASK;
+ alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);
+ /* When possible, allocate using vmalloc() to reduce physical
+ * address space fragmentation. vmalloc() doesn't work if
+ * kmalloc caches haven't been initialized or if it's being
+ * called from an interrupt handler.
+ */
+ if (kmalloc_caches[KMALLOC_SHIFT_HIGH] && !in_interrupt()) {
+ prealloc = __vmalloc(
+ STACK_ALLOC_SIZE, alloc_flags, PAGE_KERNEL);
+ } else {
+ page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
+ if (page)
+ prealloc = page_address(page);
+ }
+ }
+
+ spin_lock_irqsave(&depot_lock, flags);
+
+ found = find_stack(*bucket, trace->entries, trace->nr_entries, hash);
+ if (!found) {
+ struct stack_record *new =
+ depot_alloc_stack(trace->entries, trace->nr_entries,
+ hash, &prealloc, alloc_flags);
+ if (new) {
+ new->next = *bucket;
+ /* This smp_store_release() pairs with
+ * smp_load_acquire() from |bucket| above.
+ */
+ smp_store_release(bucket, new);
+ found = new;
+ }
+ } else if (prealloc) {
+ /*
+ * We didn't need to store this stack trace, but let's keep
+ * the preallocated memory for the future.
+ */
+ WARN_ON(!init_stack_slab(&prealloc));
+ }
+
+ spin_unlock_irqrestore(&depot_lock, flags);
+exit:
+ if (prealloc) {
+ /* Nobody used this memory, ok to free it. */
+ if (page)
+ free_pages((unsigned long)prealloc, STACK_ALLOC_ORDER);
+ else
+ vfree(prealloc);
+ }
+ if (found)
+ retval = found->handle.handle;
+ *rec = false;
+fast_exit:
+ return retval;
+}
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 95b2267..6c0de02 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -17,7 +17,9 @@
#define DISABLE_BRANCH_PROFILING
#include <linux/export.h>
+#include <linux/interrupt.h>
#include <linux/init.h>
+#include <linux/kasan.h>
#include <linux/kernel.h>
#include <linux/kmemleak.h>
#include <linux/memblock.h>
@@ -31,7 +33,6 @@
#include <linux/string.h>
#include <linux/types.h>
#include <linux/vmalloc.h>
-#include <linux/kasan.h>
#include "kasan.h"
#include "../slab.h"
@@ -393,23 +394,65 @@ void kasan_poison_object_data(struct kmem_cache *cache, void *object)
#endif
}
-static inline void set_track(struct kasan_track *track)
+#ifdef CONFIG_SLAB
+static inline int in_irqentry_text(unsigned long ptr)
+{
+ return (ptr >= (unsigned long)&__irqentry_text_start &&
+ ptr < (unsigned long)&__irqentry_text_end) ||
+ (ptr >= (unsigned long)&__softirqentry_text_start &&
+ ptr < (unsigned long)&__softirqentry_text_end);
+}
+
+static inline void filter_irq_stacks(struct stack_trace *trace)
+{
+ int i;
+
+ if (!trace->nr_entries)
+ return;
+ for (i = 0; i < trace->nr_entries; i++)
+ if (in_irqentry_text(trace->entries[i])) {
+ /* Include the irqentry function into the stack. */
+ trace->nr_entries = i + 1;
+ break;
+ }
+}
+
+static inline depot_stack_handle_t save_stack(gfp_t flags)
+{
+ unsigned long entries[KASAN_STACK_DEPTH];
+ struct stack_trace trace = {
+ .nr_entries = 0,
+ .entries = entries,
+ .max_entries = KASAN_STACK_DEPTH,
+ .skip = 0
+ };
+
+ save_stack_trace(&trace);
+ filter_irq_stacks(&trace);
+ if (trace.nr_entries != 0 &&
+ trace.entries[trace.nr_entries-1] == ULONG_MAX)
+ trace.nr_entries--;
+
+ return depot_save_stack(&trace, flags);
+}
+
+static inline void set_track(struct kasan_track *track, gfp_t flags)
{
- track->cpu = raw_smp_processor_id();
track->pid = current->pid;
- track->when = jiffies;
+ track->stack = save_stack(flags);
}
-#ifdef CONFIG_SLAB
struct kasan_alloc_meta *get_alloc_info(struct kmem_cache *cache,
const void *object)
{
+ BUILD_BUG_ON(sizeof(struct kasan_alloc_meta) > 32);
return (void *)object + cache->kasan_info.alloc_meta_offset;
}
struct kasan_free_meta *get_free_info(struct kmem_cache *cache,
const void *object)
{
+ BUILD_BUG_ON(sizeof(struct kasan_free_meta) > 32);
return (void *)object + cache->kasan_info.free_meta_offset;
}
#endif
@@ -466,7 +509,7 @@ void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size,
alloc_info->state = KASAN_STATE_ALLOC;
alloc_info->alloc_size = size;
- set_track(&alloc_info->track);
+ set_track(&alloc_info->track, flags);
}
#endif
}
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 7b9e4ab9..30a2f0b 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -2,6 +2,7 @@
#define __MM_KASAN_KASAN_H
#include <linux/kasan.h>
+#include <linux/stackdepot.h>
#define KASAN_SHADOW_SCALE_SIZE (1UL << KASAN_SHADOW_SCALE_SHIFT)
#define KASAN_SHADOW_MASK (KASAN_SHADOW_SCALE_SIZE - 1)
@@ -64,16 +65,18 @@ enum kasan_state {
KASAN_STATE_FREE
};
+#define KASAN_STACK_DEPTH 64
+
struct kasan_track {
- u64 cpu : 6; /* for NR_CPUS = 64 */
- u64 pid : 16; /* 65536 processes */
- u64 when : 42; /* ~140 years */
+ u32 pid;
+ depot_stack_handle_t stack;
};
struct kasan_alloc_meta {
+ struct kasan_track track;
u32 state : 2; /* enum kasan_state */
u32 alloc_size : 30;
- struct kasan_track track;
+ u32 reserved;
};
struct kasan_free_meta {
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 0a4fde9..8e58be0f 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -18,6 +18,7 @@
#include <linux/printk.h>
#include <linux/sched.h>
#include <linux/slab.h>
+#include <linux/stackdepot.h>
#include <linux/stacktrace.h>
#include <linux/string.h>
#include <linux/types.h>
@@ -118,8 +119,15 @@ static inline bool init_task_stack_addr(const void *addr)
#ifdef CONFIG_SLAB
static void print_track(struct kasan_track *track)
{
- pr_err("PID = %u, CPU = %u, timestamp = %lu\n", track->pid,
- track->cpu, (unsigned long)track->when);
+ pr_err("PID = %u\n", track->pid);
+ if (track->stack) {
+ struct stack_trace trace;
+
+ depot_fetch_stack(track->stack, &trace);
+ print_stack_trace(&trace, 0);
+ } else {
+ pr_err("(stack is not available)\n");
+ }
}
static void object_err(struct kmem_cache *cache, struct page *page,
--
2.7.0.rc3.207.g0ac5344
Signed-off-by: Alexander Potapenko <[email protected]>
Acked-by: Andrey Ryabinin <[email protected]>
---
lib/test_kasan.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lib/test_kasan.c b/lib/test_kasan.c
index 90ad74f..82169fb 100644
--- a/lib/test_kasan.c
+++ b/lib/test_kasan.c
@@ -294,6 +294,8 @@ static noinline void __init kmalloc_uaf2(void)
}
ptr1[40] = 'x';
+ if (ptr1 == ptr2)
+ pr_err("Could not detect use-after-free: ptr1 == ptr2\n");
kfree(ptr2);
}
--
2.7.0.rc3.207.g0ac5344
Quarantine isolates freed objects in a separate queue. The objects are
returned to the allocator later, which helps to detect use-after-free
errors.
Freed objects are first added to per-cpu quarantine queues.
When a cache is destroyed or memory shrinking is requested, the objects
are moved into the global quarantine queue. Whenever a kmalloc call
allows memory reclaiming, the oldest objects are popped out of the
global queue until the total size of objects in quarantine is less than
3/4 of the maximum quarantine size (which is a fraction of installed
physical memory).
As long as an object remains in the quarantine, KASAN is able to report
accesses to it, so the chance of reporting a use-after-free is increased.
Once the object leaves quarantine, the allocator may reuse it, in which
case the object is unpoisoned and KASAN can't detect incorrect accesses
to it.
Right now quarantine support is only enabled in SLAB allocator.
Unification of KASAN features in SLAB and SLUB will be done later.
This patch is based on the "mm: kasan: quarantine" patch originally
prepared by Dmitry Chernenkov.
Signed-off-by: Alexander Potapenko <[email protected]>
---
v2: - added copyright comments
- per request from Joonsoo Kim made __cache_free() more straightforward
- added comments for smp_load_acquire()/smp_store_release()
v3: - incorporate changes introduced by the "mm, kasan: SLAB support" patch
v4: - fix kbuild compile-time error (missing ___cache_free() declaration)
and a warning (wrong format specifier)
v6: - extended the patch description
- dropped the unused qlist_remove() function
---
include/linux/kasan.h | 30 ++++--
lib/test_kasan.c | 29 +++++
mm/kasan/Makefile | 4 +
mm/kasan/kasan.c | 71 +++++++++++--
mm/kasan/kasan.h | 11 +-
mm/kasan/quarantine.c | 289 ++++++++++++++++++++++++++++++++++++++++++++++++++
mm/kasan/report.c | 1 +
mm/mempool.c | 7 +-
mm/page_alloc.c | 2 +-
mm/slab.c | 15 ++-
mm/slab.h | 2 +
mm/slab_common.c | 2 +
mm/slub.c | 4 +-
13 files changed, 438 insertions(+), 29 deletions(-)
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index bf71ab0..355e722 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -44,24 +44,29 @@ static inline void kasan_disable_current(void)
void kasan_unpoison_shadow(const void *address, size_t size);
void kasan_alloc_pages(struct page *page, unsigned int order);
-void kasan_free_pages(struct page *page, unsigned int order);
+void kasan_poison_free_pages(struct page *page, unsigned int order);
void kasan_cache_create(struct kmem_cache *cache, size_t *size,
unsigned long *flags);
+void kasan_cache_shrink(struct kmem_cache *cache);
+void kasan_cache_destroy(struct kmem_cache *cache);
void kasan_poison_slab(struct page *page);
void kasan_unpoison_object_data(struct kmem_cache *cache, void *object);
void kasan_poison_object_data(struct kmem_cache *cache, void *object);
void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
-void kasan_kfree_large(const void *ptr);
-void kasan_kfree(void *ptr);
+void kasan_poison_kfree_large(const void *ptr);
+void kasan_poison_kfree(void *ptr);
void kasan_kmalloc(struct kmem_cache *s, const void *object, size_t size,
gfp_t flags);
void kasan_krealloc(const void *object, size_t new_size, gfp_t flags);
void kasan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags);
-void kasan_slab_free(struct kmem_cache *s, void *object);
+/* kasan_slab_free() returns true if the object has been put into quarantine.
+ */
+bool kasan_slab_free(struct kmem_cache *s, void *object);
+void kasan_poison_slab_free(struct kmem_cache *s, void *object);
struct kasan_cache {
int alloc_meta_offset;
@@ -79,11 +84,14 @@ static inline void kasan_enable_current(void) {}
static inline void kasan_disable_current(void) {}
static inline void kasan_alloc_pages(struct page *page, unsigned int order) {}
-static inline void kasan_free_pages(struct page *page, unsigned int order) {}
+static inline void kasan_poison_free_pages(struct page *page,
+ unsigned int order) {}
static inline void kasan_cache_create(struct kmem_cache *cache,
size_t *size,
unsigned long *flags) {}
+static inline void kasan_cache_shrink(struct kmem_cache *cache) {}
+static inline void kasan_cache_destroy(struct kmem_cache *cache) {}
static inline void kasan_poison_slab(struct page *page) {}
static inline void kasan_unpoison_object_data(struct kmem_cache *cache,
@@ -92,8 +100,8 @@ static inline void kasan_poison_object_data(struct kmem_cache *cache,
void *object) {}
static inline void kasan_kmalloc_large(void *ptr, size_t size, gfp_t flags) {}
-static inline void kasan_kfree_large(const void *ptr) {}
-static inline void kasan_kfree(void *ptr) {}
+static inline void kasan_poison_kfree_large(const void *ptr) {}
+static inline void kasan_poison_kfree(void *ptr) {}
static inline void kasan_kmalloc(struct kmem_cache *s, const void *object,
size_t size, gfp_t flags) {}
static inline void kasan_krealloc(const void *object, size_t new_size,
@@ -101,7 +109,13 @@ static inline void kasan_krealloc(const void *object, size_t new_size,
static inline void kasan_slab_alloc(struct kmem_cache *s, void *object,
gfp_t flags) {}
-static inline void kasan_slab_free(struct kmem_cache *s, void *object) {}
+/* kasan_slab_free() returns true if the object has been put into quarantine.
+ */
+static inline bool kasan_slab_free(struct kmem_cache *s, void *object)
+{
+ return false;
+}
+static inline void kasan_poison_slab_free(struct kmem_cache *s, void *object) {}
static inline int kasan_module_alloc(void *addr, size_t size) { return 0; }
static inline void kasan_free_shadow(const struct vm_struct *vm) {}
diff --git a/lib/test_kasan.c b/lib/test_kasan.c
index 82169fb..799c98e 100644
--- a/lib/test_kasan.c
+++ b/lib/test_kasan.c
@@ -344,6 +344,32 @@ static noinline void __init kasan_stack_oob(void)
*(volatile char *)p;
}
+#ifdef CONFIG_SLAB
+static noinline void __init kasan_quarantine_cache(void)
+{
+ struct kmem_cache *cache = kmem_cache_create(
+ "test", 137, 8, GFP_KERNEL, NULL);
+ int i;
+
+ for (i = 0; i < 100; i++) {
+ void *p = kmem_cache_alloc(cache, GFP_KERNEL);
+
+ kmem_cache_free(cache, p);
+ p = kmalloc(sizeof(u64), GFP_KERNEL);
+ kfree(p);
+ }
+ kmem_cache_shrink(cache);
+ for (i = 0; i < 100; i++) {
+ u64 *p = kmem_cache_alloc(cache, GFP_KERNEL);
+
+ kmem_cache_free(cache, p);
+ p = kmalloc(sizeof(u64), GFP_KERNEL);
+ kfree(p);
+ }
+ kmem_cache_destroy(cache);
+}
+#endif
+
static int __init kmalloc_tests_init(void)
{
kmalloc_oob_right();
@@ -367,6 +393,9 @@ static int __init kmalloc_tests_init(void)
kmem_cache_oob();
kasan_stack_oob();
kasan_global_oob();
+#ifdef CONFIG_SLAB
+ kasan_quarantine_cache();
+#endif
return -EAGAIN;
}
diff --git a/mm/kasan/Makefile b/mm/kasan/Makefile
index 131daad..63b54aa 100644
--- a/mm/kasan/Makefile
+++ b/mm/kasan/Makefile
@@ -8,3 +8,7 @@ CFLAGS_REMOVE_kasan.o = -pg
CFLAGS_kasan.o := $(call cc-option, -fno-conserve-stack -fno-stack-protector)
obj-y := kasan.o report.o kasan_init.o
+
+ifdef CONFIG_SLAB
+ obj-y += quarantine.o
+endif
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 6c0de02..24f3249 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -307,7 +307,7 @@ void kasan_alloc_pages(struct page *page, unsigned int order)
kasan_unpoison_shadow(page_address(page), PAGE_SIZE << order);
}
-void kasan_free_pages(struct page *page, unsigned int order)
+void kasan_poison_free_pages(struct page *page, unsigned int order)
{
if (likely(!PageHighMem(page)))
kasan_poison_shadow(page_address(page),
@@ -368,6 +368,20 @@ void kasan_cache_create(struct kmem_cache *cache, size_t *size,
}
#endif
+void kasan_cache_shrink(struct kmem_cache *cache)
+{
+#ifdef CONFIG_SLAB
+ quarantine_remove_cache(cache);
+#endif
+}
+
+void kasan_cache_destroy(struct kmem_cache *cache)
+{
+#ifdef CONFIG_SLAB
+ quarantine_remove_cache(cache);
+#endif
+}
+
void kasan_poison_slab(struct page *page)
{
kasan_poison_shadow(page_address(page),
@@ -462,7 +476,7 @@ void kasan_slab_alloc(struct kmem_cache *cache, void *object, gfp_t flags)
kasan_kmalloc(cache, object, cache->object_size, flags);
}
-void kasan_slab_free(struct kmem_cache *cache, void *object)
+void kasan_poison_slab_free(struct kmem_cache *cache, void *object)
{
unsigned long size = cache->object_size;
unsigned long rounded_up_size = round_up(size, KASAN_SHADOW_SCALE_SIZE);
@@ -471,18 +485,43 @@ void kasan_slab_free(struct kmem_cache *cache, void *object)
if (unlikely(cache->flags & SLAB_DESTROY_BY_RCU))
return;
+ kasan_poison_shadow(object, rounded_up_size, KASAN_KMALLOC_FREE);
+}
+
+bool kasan_slab_free(struct kmem_cache *cache, void *object)
+{
#ifdef CONFIG_SLAB
- if (cache->flags & SLAB_KASAN) {
- struct kasan_free_meta *free_info =
- get_free_info(cache, object);
+ /* RCU slabs could be legally used after free within the RCU period */
+ if (unlikely(cache->flags & SLAB_DESTROY_BY_RCU))
+ return false;
+
+ if (likely(cache->flags & SLAB_KASAN)) {
struct kasan_alloc_meta *alloc_info =
get_alloc_info(cache, object);
- alloc_info->state = KASAN_STATE_FREE;
- set_track(&free_info->track);
+ struct kasan_free_meta *free_info =
+ get_free_info(cache, object);
+
+ switch (alloc_info->state) {
+ case KASAN_STATE_ALLOC:
+ alloc_info->state = KASAN_STATE_QUARANTINE;
+ quarantine_put(free_info, cache);
+ set_track(&free_info->track, GFP_NOWAIT);
+ kasan_poison_slab_free(cache, object);
+ return true;
+ case KASAN_STATE_QUARANTINE:
+ case KASAN_STATE_FREE:
+ pr_err("Double free");
+ dump_stack();
+ break;
+ default:
+ break;
+ }
}
+ return false;
+#else
+ kasan_poison_slab_free(cache, object);
+ return false;
#endif
-
- kasan_poison_shadow(object, rounded_up_size, KASAN_KMALLOC_FREE);
}
void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size,
@@ -491,6 +530,11 @@ void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size,
unsigned long redzone_start;
unsigned long redzone_end;
+#ifdef CONFIG_SLAB
+ if (flags & __GFP_RECLAIM)
+ quarantine_reduce();
+#endif
+
if (unlikely(object == NULL))
return;
@@ -521,6 +565,11 @@ void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
unsigned long redzone_start;
unsigned long redzone_end;
+#ifdef CONFIG_SLAB
+ if (flags & __GFP_RECLAIM)
+ quarantine_reduce();
+#endif
+
if (unlikely(ptr == NULL))
return;
@@ -549,7 +598,7 @@ void kasan_krealloc(const void *object, size_t size, gfp_t flags)
kasan_kmalloc(page->slab_cache, object, size, flags);
}
-void kasan_kfree(void *ptr)
+void kasan_poison_kfree(void *ptr)
{
struct page *page;
@@ -562,7 +611,7 @@ void kasan_kfree(void *ptr)
kasan_slab_free(page->slab_cache, ptr);
}
-void kasan_kfree_large(const void *ptr)
+void kasan_poison_kfree_large(const void *ptr)
{
struct page *page = virt_to_page(ptr);
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 30a2f0b..7da78a6 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -62,6 +62,7 @@ struct kasan_global {
enum kasan_state {
KASAN_STATE_INIT,
KASAN_STATE_ALLOC,
+ KASAN_STATE_QUARANTINE,
KASAN_STATE_FREE
};
@@ -80,8 +81,10 @@ struct kasan_alloc_meta {
};
struct kasan_free_meta {
- /* Allocator freelist pointer, unused by KASAN. */
- void **freelist;
+ /* This field is used while the object is in the quarantine.
+ * Otherwise it might be used for the allocator freelist.
+ */
+ void **quarantine_link;
struct kasan_track track;
};
@@ -105,4 +108,8 @@ static inline bool kasan_report_enabled(void)
void kasan_report(unsigned long addr, size_t size,
bool is_write, unsigned long ip);
+void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache);
+void quarantine_reduce(void);
+void quarantine_remove_cache(struct kmem_cache *cache);
+
#endif
diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
new file mode 100644
index 0000000..40159a6
--- /dev/null
+++ b/mm/kasan/quarantine.c
@@ -0,0 +1,289 @@
+/*
+ * KASAN quarantine.
+ *
+ * Author: Alexander Potapenko <[email protected]>
+ * Copyright (C) 2016 Google, Inc.
+ *
+ * Based on code by Dmitry Chernenkov.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ */
+
+#include <linux/gfp.h>
+#include <linux/hash.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/percpu.h>
+#include <linux/printk.h>
+#include <linux/shrinker.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/types.h>
+
+#include "../slab.h"
+#include "kasan.h"
+
+/* Data structure and operations for quarantine queues. */
+
+/* Each queue is a signle-linked list, which also stores the total size of
+ * objects inside of it.
+ */
+struct qlist {
+ void **head;
+ void **tail;
+ size_t bytes;
+};
+
+#define QLIST_INIT { NULL, NULL, 0 }
+
+static bool qlist_empty(struct qlist *q)
+{
+ return !q->head;
+}
+
+static void qlist_init(struct qlist *q)
+{
+ q->head = q->tail = NULL;
+ q->bytes = 0;
+}
+
+static void qlist_put(struct qlist *q, void **qlink, size_t size)
+{
+ if (unlikely(qlist_empty(q)))
+ q->head = qlink;
+ else
+ *q->tail = qlink;
+ q->tail = qlink;
+ *qlink = NULL;
+ q->bytes += size;
+}
+
+static void qlist_move_all(struct qlist *from, struct qlist *to)
+{
+ if (unlikely(qlist_empty(from)))
+ return;
+
+ if (qlist_empty(to)) {
+ *to = *from;
+ qlist_init(from);
+ return;
+ }
+
+ *to->tail = from->head;
+ to->tail = from->tail;
+ to->bytes += from->bytes;
+
+ qlist_init(from);
+}
+
+static void qlist_move(struct qlist *from, void **last, struct qlist *to,
+ size_t size)
+{
+ if (unlikely(last == from->tail)) {
+ qlist_move_all(from, to);
+ return;
+ }
+ if (qlist_empty(to))
+ to->head = from->head;
+ else
+ *to->tail = from->head;
+ to->tail = last;
+ from->head = *last;
+ *last = NULL;
+ from->bytes -= size;
+ to->bytes += size;
+}
+
+
+/* The object quarantine consists of per-cpu queues and a global queue,
+ * guarded by quarantine_lock.
+ */
+static DEFINE_PER_CPU(struct qlist, cpu_quarantine);
+
+static struct qlist global_quarantine;
+static DEFINE_SPINLOCK(quarantine_lock);
+
+/* Maximum size of the global queue. */
+static unsigned long quarantine_size;
+
+/* The fraction of physical memory the quarantine is allowed to occupy.
+ * Quarantine doesn't support memory shrinker with SLAB allocator, so we keep
+ * the ratio low to avoid OOM.
+ */
+#define QUARANTINE_FRACTION 32
+
+/* smp_load_acquire() here pairs with smp_store_release() in
+ * quarantine_reduce().
+ */
+#define QUARANTINE_LOW_SIZE (smp_load_acquire(&quarantine_size) * 3 / 4)
+#define QUARANTINE_PERCPU_SIZE (1 << 20)
+
+static struct kmem_cache *qlink_to_cache(void **qlink)
+{
+ return virt_to_head_page(qlink)->slab_cache;
+}
+
+static void *qlink_to_object(void **qlink, struct kmem_cache *cache)
+{
+ struct kasan_free_meta *free_info =
+ container_of((void ***)qlink, struct kasan_free_meta,
+ quarantine_link);
+
+ return ((void *)free_info) - cache->kasan_info.free_meta_offset;
+}
+
+static void qlink_free(void **qlink, struct kmem_cache *cache)
+{
+ void *object = qlink_to_object(qlink, cache);
+ struct kasan_alloc_meta *alloc_info = get_alloc_info(cache, object);
+ unsigned long flags;
+
+ local_irq_save(flags);
+ alloc_info->state = KASAN_STATE_FREE;
+ ___cache_free(cache, object, _THIS_IP_);
+ local_irq_restore(flags);
+}
+
+static void qlist_free_all(struct qlist *q, struct kmem_cache *cache)
+{
+ void **qlink;
+
+ if (unlikely(qlist_empty(q)))
+ return;
+
+ qlink = q->head;
+ while (qlink) {
+ struct kmem_cache *obj_cache =
+ cache ? cache : qlink_to_cache(qlink);
+ void **next = *qlink;
+
+ qlink_free(qlink, obj_cache);
+ qlink = next;
+ }
+ qlist_init(q);
+}
+
+void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)
+{
+ unsigned long flags;
+ struct qlist *q;
+ struct qlist temp = QLIST_INIT;
+
+ local_irq_save(flags);
+
+ q = this_cpu_ptr(&cpu_quarantine);
+ qlist_put(q, (void **) &info->quarantine_link, cache->size);
+ if (unlikely(q->bytes > QUARANTINE_PERCPU_SIZE))
+ qlist_move_all(q, &temp);
+
+ local_irq_restore(flags);
+
+ if (unlikely(!qlist_empty(&temp))) {
+ spin_lock_irqsave(&quarantine_lock, flags);
+ qlist_move_all(&temp, &global_quarantine);
+ spin_unlock_irqrestore(&quarantine_lock, flags);
+ }
+}
+
+void quarantine_reduce(void)
+{
+ size_t new_quarantine_size;
+ unsigned long flags;
+ struct qlist to_free = QLIST_INIT;
+ size_t size_to_free = 0;
+ void **last;
+
+ /* smp_load_acquire() here pairs with smp_store_release() below. */
+ if (likely(ACCESS_ONCE(global_quarantine.bytes) <=
+ smp_load_acquire(&quarantine_size)))
+ return;
+
+ spin_lock_irqsave(&quarantine_lock, flags);
+
+ /* Update quarantine size in case of hotplug. Allocate a fraction of
+ * the installed memory to quarantine minus per-cpu queue limits.
+ */
+ new_quarantine_size = (ACCESS_ONCE(totalram_pages) << PAGE_SHIFT) /
+ QUARANTINE_FRACTION;
+ new_quarantine_size -= QUARANTINE_PERCPU_SIZE * num_online_cpus();
+ /* Pairs with smp_load_acquire() above and in QUARANTINE_LOW_SIZE. */
+ smp_store_release(&quarantine_size, new_quarantine_size);
+
+ last = global_quarantine.head;
+ while (last) {
+ struct kmem_cache *cache = qlink_to_cache(last);
+
+ size_to_free += cache->size;
+ if (!*last || size_to_free >
+ global_quarantine.bytes - QUARANTINE_LOW_SIZE)
+ break;
+ last = (void **) *last;
+ }
+ qlist_move(&global_quarantine, last, &to_free, size_to_free);
+
+ spin_unlock_irqrestore(&quarantine_lock, flags);
+
+ qlist_free_all(&to_free, NULL);
+}
+
+static void qlist_move_cache(struct qlist *from,
+ struct qlist *to,
+ struct kmem_cache *cache)
+{
+ void ***prev;
+
+ if (unlikely(qlist_empty(from)))
+ return;
+
+ prev = &from->head;
+ while (*prev) {
+ void **qlink = *prev;
+ struct kmem_cache *obj_cache = qlink_to_cache(qlink);
+
+ if (obj_cache == cache) {
+ if (unlikely(from->tail == qlink))
+ from->tail = (void **) prev;
+ *prev = (void **) *qlink;
+ from->bytes -= cache->size;
+ qlist_put(to, qlink, cache->size);
+ } else
+ prev = (void ***) *prev;
+ }
+}
+
+static void per_cpu_remove_cache(void *arg)
+{
+ struct kmem_cache *cache = arg;
+ struct qlist to_free = QLIST_INIT;
+ struct qlist *q;
+ unsigned long flags;
+
+ local_irq_save(flags);
+ q = this_cpu_ptr(&cpu_quarantine);
+ qlist_move_cache(q, &to_free, cache);
+ local_irq_restore(flags);
+
+ qlist_free_all(&to_free, cache);
+}
+
+void quarantine_remove_cache(struct kmem_cache *cache)
+{
+ unsigned long flags;
+ struct qlist to_free = QLIST_INIT;
+
+ on_each_cpu(per_cpu_remove_cache, cache, 1);
+
+ spin_lock_irqsave(&quarantine_lock, flags);
+ qlist_move_cache(&global_quarantine, &to_free, cache);
+ spin_unlock_irqrestore(&quarantine_lock, flags);
+
+ qlist_free_all(&to_free, cache);
+}
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 8e58be0f..bb27732 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -151,6 +151,7 @@ static void object_err(struct kmem_cache *cache, struct page *page,
print_track(&alloc_info->track);
break;
case KASAN_STATE_FREE:
+ case KASAN_STATE_QUARANTINE:
pr_err("Object freed, allocated with size %u bytes\n",
alloc_info->alloc_size);
free_info = get_free_info(cache, object);
diff --git a/mm/mempool.c b/mm/mempool.c
index 716efa8..9da9bef 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -105,11 +105,12 @@ static inline void poison_element(mempool_t *pool, void *element)
static void kasan_poison_element(mempool_t *pool, void *element)
{
if (pool->alloc == mempool_alloc_slab)
- kasan_slab_free(pool->pool_data, element);
+ kasan_poison_slab_free(pool->pool_data, element);
if (pool->alloc == mempool_kmalloc)
- kasan_kfree(element);
+ kasan_poison_kfree(element);
if (pool->alloc == mempool_alloc_pages)
- kasan_free_pages(element, (unsigned long)pool->pool_data);
+ kasan_poison_free_pages(element,
+ (unsigned long)pool->pool_data);
}
static void kasan_unpoison_element(mempool_t *pool, void *element, gfp_t flags)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1993894..0cadb5d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1005,7 +1005,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
trace_mm_page_free(page, order);
kmemcheck_free_shadow(page, order);
- kasan_free_pages(page, order);
+ kasan_poison_free_pages(page, order);
if (PageAnon(page))
page->mapping = NULL;
diff --git a/mm/slab.c b/mm/slab.c
index 7d27277..222a3bf 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3336,9 +3336,20 @@ free_done:
static inline void __cache_free(struct kmem_cache *cachep, void *objp,
unsigned long caller)
{
- struct array_cache *ac = cpu_cache_get(cachep);
+#ifdef CONFIG_KASAN
+ if (kasan_slab_free(cachep, objp))
+ /* The object has been put into the quarantine, don't touch it
+ * for now.
+ */
+ return;
+#endif
+ ___cache_free(cachep, objp, caller);
+}
- kasan_slab_free(cachep, objp);
+void ___cache_free(struct kmem_cache *cachep, void *objp,
+ unsigned long caller)
+{
+ struct array_cache *ac = cpu_cache_get(cachep);
check_irq_off();
kmemleak_free_recursive(objp, cachep->flags);
diff --git a/mm/slab.h b/mm/slab.h
index 5969769..dedb1a9 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -462,4 +462,6 @@ void *slab_next(struct seq_file *m, void *p, loff_t *pos);
void slab_stop(struct seq_file *m, void *p);
int memcg_slab_show(struct seq_file *m, void *p);
+void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
+
#endif /* MM_SLAB_H */
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 07690d3..b8502a2 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -715,6 +715,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
get_online_cpus();
get_online_mems();
+ kasan_cache_destroy(s);
mutex_lock(&slab_mutex);
s->refcount--;
@@ -753,6 +754,7 @@ int kmem_cache_shrink(struct kmem_cache *cachep)
get_online_cpus();
get_online_mems();
+ kasan_cache_shrink(cachep);
ret = __kmem_cache_shrink(cachep, false);
put_online_mems();
put_online_cpus();
diff --git a/mm/slub.c b/mm/slub.c
index 4e63f3b..c76fd2e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1324,7 +1324,7 @@ static inline void kmalloc_large_node_hook(void *ptr, size_t size, gfp_t flags)
static inline void kfree_hook(const void *x)
{
kmemleak_free(x);
- kasan_kfree_large(x);
+ kasan_poison_kfree_large(x);
}
static inline void slab_free_hook(struct kmem_cache *s, void *x)
@@ -1349,7 +1349,7 @@ static inline void slab_free_hook(struct kmem_cache *s, void *x)
if (!(s->flags & SLAB_DEBUG_OBJECTS))
debug_check_no_obj_freed(x, s->object_size);
- kasan_slab_free(s, x);
+ kasan_poison_slab_free(s, x);
}
static inline void slab_free_freelist_hook(struct kmem_cache *s,
--
2.7.0.rc3.207.g0ac5344
KASAN needs to know whether the allocation happens in an IRQ handler.
This lets us strip everything below the IRQ entry point to reduce the
number of unique stack traces needed to be stored.
Move the definition of __irq_entry to <linux/interrupt.h> so that the
users don't need to pull in <linux/ftrace.h>. Also introduce the
__softirq_entry macro which is similar to __irq_entry, but puts the
corresponding functions to the .softirqentry.text section.
Signed-off-by: Alexander Potapenko <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
---
v2: - per request from Steven Rostedt, moved the declarations of __softirq_entry
and __irq_entry to <linux/interrupt.h>
v3: - minor description changes
v5: - fixed kbuild warnings in files that define __exception_irq_entry
---
arch/arm/include/asm/exception.h | 2 +-
arch/arm/kernel/vmlinux.lds.S | 1 +
arch/arm64/include/asm/exception.h | 2 +-
arch/arm64/kernel/vmlinux.lds.S | 1 +
arch/blackfin/kernel/vmlinux.lds.S | 1 +
arch/c6x/kernel/vmlinux.lds.S | 1 +
arch/metag/kernel/vmlinux.lds.S | 1 +
arch/microblaze/kernel/vmlinux.lds.S | 1 +
arch/mips/kernel/vmlinux.lds.S | 1 +
arch/nios2/kernel/vmlinux.lds.S | 1 +
arch/openrisc/kernel/vmlinux.lds.S | 1 +
arch/parisc/kernel/vmlinux.lds.S | 1 +
arch/powerpc/kernel/vmlinux.lds.S | 1 +
arch/s390/kernel/vmlinux.lds.S | 1 +
arch/sh/kernel/vmlinux.lds.S | 1 +
arch/sparc/kernel/vmlinux.lds.S | 1 +
arch/tile/kernel/vmlinux.lds.S | 1 +
arch/x86/kernel/vmlinux.lds.S | 1 +
include/asm-generic/vmlinux.lds.h | 12 +++++++++++-
include/linux/ftrace.h | 11 -----------
include/linux/interrupt.h | 20 ++++++++++++++++++++
kernel/softirq.c | 2 +-
kernel/trace/trace_functions_graph.c | 1 +
23 files changed, 51 insertions(+), 15 deletions(-)
diff --git a/arch/arm/include/asm/exception.h b/arch/arm/include/asm/exception.h
index 5abaf5b..bf19912 100644
--- a/arch/arm/include/asm/exception.h
+++ b/arch/arm/include/asm/exception.h
@@ -7,7 +7,7 @@
#ifndef __ASM_ARM_EXCEPTION_H
#define __ASM_ARM_EXCEPTION_H
-#include <linux/ftrace.h>
+#include <linux/interrupt.h>
#define __exception __attribute__((section(".exception.text")))
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 1fab979..e2c6da0 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -108,6 +108,7 @@ SECTIONS
*(.exception.text)
__exception_text_end = .;
IRQENTRY_TEXT
+ SOFTIRQENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
LOCK_TEXT
diff --git a/arch/arm64/include/asm/exception.h b/arch/arm64/include/asm/exception.h
index 6cb7e1a..0c2eec4 100644
--- a/arch/arm64/include/asm/exception.h
+++ b/arch/arm64/include/asm/exception.h
@@ -18,7 +18,7 @@
#ifndef __ASM_EXCEPTION_H
#define __ASM_EXCEPTION_H
-#include <linux/ftrace.h>
+#include <linux/interrupt.h>
#define __exception __attribute__((section(".exception.text")))
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 37f624df..5a1939a 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -103,6 +103,7 @@ SECTIONS
*(.exception.text)
__exception_text_end = .;
IRQENTRY_TEXT
+ SOFTIRQENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
LOCK_TEXT
diff --git a/arch/blackfin/kernel/vmlinux.lds.S b/arch/blackfin/kernel/vmlinux.lds.S
index c9eec84..d920b95 100644
--- a/arch/blackfin/kernel/vmlinux.lds.S
+++ b/arch/blackfin/kernel/vmlinux.lds.S
@@ -35,6 +35,7 @@ SECTIONS
#endif
LOCK_TEXT
IRQENTRY_TEXT
+ SOFTIRQENTRY_TEXT
KPROBES_TEXT
#ifdef CONFIG_ROMKERNEL
__sinittext = .;
diff --git a/arch/c6x/kernel/vmlinux.lds.S b/arch/c6x/kernel/vmlinux.lds.S
index 5a6e141..50bc10f 100644
--- a/arch/c6x/kernel/vmlinux.lds.S
+++ b/arch/c6x/kernel/vmlinux.lds.S
@@ -72,6 +72,7 @@ SECTIONS
SCHED_TEXT
LOCK_TEXT
IRQENTRY_TEXT
+ SOFTIRQENTRY_TEXT
KPROBES_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/metag/kernel/vmlinux.lds.S b/arch/metag/kernel/vmlinux.lds.S
index e12055e..150ace9 100644
--- a/arch/metag/kernel/vmlinux.lds.S
+++ b/arch/metag/kernel/vmlinux.lds.S
@@ -24,6 +24,7 @@ SECTIONS
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
+ SOFTIRQENTRY_TEXT
*(.text.*)
*(.gnu.warning)
}
diff --git a/arch/microblaze/kernel/vmlinux.lds.S b/arch/microblaze/kernel/vmlinux.lds.S
index be9488d..0a47f04 100644
--- a/arch/microblaze/kernel/vmlinux.lds.S
+++ b/arch/microblaze/kernel/vmlinux.lds.S
@@ -36,6 +36,7 @@ SECTIONS {
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
+ SOFTIRQENTRY_TEXT
. = ALIGN (4) ;
_etext = . ;
}
diff --git a/arch/mips/kernel/vmlinux.lds.S b/arch/mips/kernel/vmlinux.lds.S
index 0a93e83..54d653e 100644
--- a/arch/mips/kernel/vmlinux.lds.S
+++ b/arch/mips/kernel/vmlinux.lds.S
@@ -58,6 +58,7 @@ SECTIONS
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
+ SOFTIRQENTRY_TEXT
*(.text.*)
*(.fixup)
*(.gnu.warning)
diff --git a/arch/nios2/kernel/vmlinux.lds.S b/arch/nios2/kernel/vmlinux.lds.S
index 326fab4..e23e895 100644
--- a/arch/nios2/kernel/vmlinux.lds.S
+++ b/arch/nios2/kernel/vmlinux.lds.S
@@ -39,6 +39,7 @@ SECTIONS
SCHED_TEXT
LOCK_TEXT
IRQENTRY_TEXT
+ SOFTIRQENTRY_TEXT
KPROBES_TEXT
} =0
_etext = .;
diff --git a/arch/openrisc/kernel/vmlinux.lds.S b/arch/openrisc/kernel/vmlinux.lds.S
index 2d69a85..d936de4 100644
--- a/arch/openrisc/kernel/vmlinux.lds.S
+++ b/arch/openrisc/kernel/vmlinux.lds.S
@@ -50,6 +50,7 @@ SECTIONS
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
+ SOFTIRQENTRY_TEXT
*(.fixup)
*(.text.__*)
_etext = .;
diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S
index 308f290..f3ead0b 100644
--- a/arch/parisc/kernel/vmlinux.lds.S
+++ b/arch/parisc/kernel/vmlinux.lds.S
@@ -72,6 +72,7 @@ SECTIONS
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
+ SOFTIRQENTRY_TEXT
*(.text.do_softirq)
*(.text.sys_exit)
*(.text.do_sigaltstack)
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index d41fd0a..2dd91f7 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -55,6 +55,7 @@ SECTIONS
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
+ SOFTIRQENTRY_TEXT
#ifdef CONFIG_PPC32
*(.got1)
diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
index 445657f..0f41a82 100644
--- a/arch/s390/kernel/vmlinux.lds.S
+++ b/arch/s390/kernel/vmlinux.lds.S
@@ -28,6 +28,7 @@ SECTIONS
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
+ SOFTIRQENTRY_TEXT
*(.fixup)
*(.gnu.warning)
} :text = 0x0700
diff --git a/arch/sh/kernel/vmlinux.lds.S b/arch/sh/kernel/vmlinux.lds.S
index db88cbf..235a410 100644
--- a/arch/sh/kernel/vmlinux.lds.S
+++ b/arch/sh/kernel/vmlinux.lds.S
@@ -39,6 +39,7 @@ SECTIONS
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
+ SOFTIRQENTRY_TEXT
*(.fixup)
*(.gnu.warning)
_etext = .; /* End of text section */
diff --git a/arch/sparc/kernel/vmlinux.lds.S b/arch/sparc/kernel/vmlinux.lds.S
index f1a2f68..aadd321 100644
--- a/arch/sparc/kernel/vmlinux.lds.S
+++ b/arch/sparc/kernel/vmlinux.lds.S
@@ -48,6 +48,7 @@ SECTIONS
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
+ SOFTIRQENTRY_TEXT
*(.gnu.warning)
} = 0
_etext = .;
diff --git a/arch/tile/kernel/vmlinux.lds.S b/arch/tile/kernel/vmlinux.lds.S
index 0e059a0..378f5d8 100644
--- a/arch/tile/kernel/vmlinux.lds.S
+++ b/arch/tile/kernel/vmlinux.lds.S
@@ -45,6 +45,7 @@ SECTIONS
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
+ SOFTIRQENTRY_TEXT
__fix_text_end = .; /* tile-cpack won't rearrange before this */
ALIGN_FUNCTION();
*(.hottext*)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 74adf67..02f14cf 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -101,6 +101,7 @@ SECTIONS
KPROBES_TEXT
ENTRY_TEXT
IRQENTRY_TEXT
+ SOFTIRQENTRY_TEXT
*(.fixup)
*(.gnu.warning)
/* End of text section */
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 8f5a12a..339125b 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -456,7 +456,7 @@
*(.entry.text) \
VMLINUX_SYMBOL(__entry_text_end) = .;
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+#if defined(CONFIG_FUNCTION_GRAPH_TRACER) || defined(CONFIG_KASAN)
#define IRQENTRY_TEXT \
ALIGN_FUNCTION(); \
VMLINUX_SYMBOL(__irqentry_text_start) = .; \
@@ -466,6 +466,16 @@
#define IRQENTRY_TEXT
#endif
+#if defined(CONFIG_FUNCTION_GRAPH_TRACER) || defined(CONFIG_KASAN)
+#define SOFTIRQENTRY_TEXT \
+ ALIGN_FUNCTION(); \
+ VMLINUX_SYMBOL(__softirqentry_text_start) = .; \
+ *(.softirqentry.text) \
+ VMLINUX_SYMBOL(__softirqentry_text_end) = .;
+#else
+#define SOFTIRQENTRY_TEXT
+#endif
+
/* Section used for early init (in .S files) */
#define HEAD_TEXT *(.head.text)
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 6d9df3f..dea12a6 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -811,16 +811,6 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
*/
#define __notrace_funcgraph notrace
-/*
- * We want to which function is an entrypoint of a hardirq.
- * That will help us to put a signal on output.
- */
-#define __irq_entry __attribute__((__section__(".irqentry.text")))
-
-/* Limits of hardirq entrypoints */
-extern char __irqentry_text_start[];
-extern char __irqentry_text_end[];
-
#define FTRACE_NOTRACE_DEPTH 65536
#define FTRACE_RETFUNC_DEPTH 50
#define FTRACE_RETSTACK_ALLOC_SIZE 32
@@ -857,7 +847,6 @@ static inline void unpause_graph_tracing(void)
#else /* !CONFIG_FUNCTION_GRAPH_TRACER */
#define __notrace_funcgraph
-#define __irq_entry
#define INIT_FTRACE_GRAPH
static inline void ftrace_graph_init_task(struct task_struct *t) { }
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 0e95fcc..1dcecaf 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -673,4 +673,24 @@ extern int early_irq_init(void);
extern int arch_probe_nr_irqs(void);
extern int arch_early_irq_init(void);
+#if defined(CONFIG_FUNCTION_GRAPH_TRACER) || defined(CONFIG_KASAN)
+/*
+ * We want to know which function is an entrypoint of a hardirq or a softirq.
+ */
+#define __irq_entry __attribute__((__section__(".irqentry.text")))
+#define __softirq_entry \
+ __attribute__((__section__(".softirqentry.text")))
+
+/* Limits of hardirq entrypoints */
+extern char __irqentry_text_start[];
+extern char __irqentry_text_end[];
+/* Limits of softirq entrypoints */
+extern char __softirqentry_text_start[];
+extern char __softirqentry_text_end[];
+
+#else
+#define __irq_entry
+#define __softirq_entry
+#endif
+
#endif
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 8aae49d..17caf4b 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -227,7 +227,7 @@ static inline bool lockdep_softirq_start(void) { return false; }
static inline void lockdep_softirq_end(bool in_hardirq) { }
#endif
-asmlinkage __visible void __do_softirq(void)
+asmlinkage __visible void __softirq_entry __do_softirq(void)
{
unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
unsigned long old_flags = current->flags;
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index a663cbb..3e6f7d4 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -8,6 +8,7 @@
*/
#include <linux/uaccess.h>
#include <linux/ftrace.h>
+#include <linux/interrupt.h>
#include <linux/slab.h>
#include <linux/fs.h>
--
2.7.0.rc3.207.g0ac5344
Add GFP flags to KASAN hooks for future patches to use.
This patch is based on the "mm: kasan: unified support for SLUB and
SLAB allocators" patch originally prepared by Dmitry Chernenkov.
Signed-off-by: Alexander Potapenko <[email protected]>
---
v4: - fix kbuild compilation error (missing parameter for kasan_kmalloc())
---
include/linux/kasan.h | 19 +++++++++++--------
include/linux/slab.h | 4 ++--
mm/kasan/kasan.c | 15 ++++++++-------
mm/mempool.c | 16 ++++++++--------
mm/slab.c | 15 ++++++++-------
mm/slab.h | 2 +-
mm/slab_common.c | 4 ++--
mm/slub.c | 15 ++++++++-------
8 files changed, 48 insertions(+), 42 deletions(-)
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 4405a35..bf71ab0 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -53,13 +53,14 @@ void kasan_poison_slab(struct page *page);
void kasan_unpoison_object_data(struct kmem_cache *cache, void *object);
void kasan_poison_object_data(struct kmem_cache *cache, void *object);
-void kasan_kmalloc_large(const void *ptr, size_t size);
+void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
void kasan_kfree_large(const void *ptr);
void kasan_kfree(void *ptr);
-void kasan_kmalloc(struct kmem_cache *s, const void *object, size_t size);
-void kasan_krealloc(const void *object, size_t new_size);
+void kasan_kmalloc(struct kmem_cache *s, const void *object, size_t size,
+ gfp_t flags);
+void kasan_krealloc(const void *object, size_t new_size, gfp_t flags);
-void kasan_slab_alloc(struct kmem_cache *s, void *object);
+void kasan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags);
void kasan_slab_free(struct kmem_cache *s, void *object);
struct kasan_cache {
@@ -90,14 +91,16 @@ static inline void kasan_unpoison_object_data(struct kmem_cache *cache,
static inline void kasan_poison_object_data(struct kmem_cache *cache,
void *object) {}
-static inline void kasan_kmalloc_large(void *ptr, size_t size) {}
+static inline void kasan_kmalloc_large(void *ptr, size_t size, gfp_t flags) {}
static inline void kasan_kfree_large(const void *ptr) {}
static inline void kasan_kfree(void *ptr) {}
static inline void kasan_kmalloc(struct kmem_cache *s, const void *object,
- size_t size) {}
-static inline void kasan_krealloc(const void *object, size_t new_size) {}
+ size_t size, gfp_t flags) {}
+static inline void kasan_krealloc(const void *object, size_t new_size,
+ gfp_t flags) {}
-static inline void kasan_slab_alloc(struct kmem_cache *s, void *object) {}
+static inline void kasan_slab_alloc(struct kmem_cache *s, void *object,
+ gfp_t flags) {}
static inline void kasan_slab_free(struct kmem_cache *s, void *object) {}
static inline int kasan_module_alloc(void *addr, size_t size) { return 0; }
diff --git a/include/linux/slab.h b/include/linux/slab.h
index aa61595..508bd82 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -376,7 +376,7 @@ static __always_inline void *kmem_cache_alloc_trace(struct kmem_cache *s,
{
void *ret = kmem_cache_alloc(s, flags);
- kasan_kmalloc(s, ret, size);
+ kasan_kmalloc(s, ret, size, flags);
return ret;
}
@@ -387,7 +387,7 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s,
{
void *ret = kmem_cache_alloc_node(s, gfpflags, node);
- kasan_kmalloc(s, ret, size);
+ kasan_kmalloc(s, ret, size, gfpflags);
return ret;
}
#endif /* CONFIG_TRACING */
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index d26ffb4..95b2267 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -414,9 +414,9 @@ struct kasan_free_meta *get_free_info(struct kmem_cache *cache,
}
#endif
-void kasan_slab_alloc(struct kmem_cache *cache, void *object)
+void kasan_slab_alloc(struct kmem_cache *cache, void *object, gfp_t flags)
{
- kasan_kmalloc(cache, object, cache->object_size);
+ kasan_kmalloc(cache, object, cache->object_size, flags);
}
void kasan_slab_free(struct kmem_cache *cache, void *object)
@@ -442,7 +442,8 @@ void kasan_slab_free(struct kmem_cache *cache, void *object)
kasan_poison_shadow(object, rounded_up_size, KASAN_KMALLOC_FREE);
}
-void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size)
+void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size,
+ gfp_t flags)
{
unsigned long redzone_start;
unsigned long redzone_end;
@@ -471,7 +472,7 @@ void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size)
}
EXPORT_SYMBOL(kasan_kmalloc);
-void kasan_kmalloc_large(const void *ptr, size_t size)
+void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
{
struct page *page;
unsigned long redzone_start;
@@ -490,7 +491,7 @@ void kasan_kmalloc_large(const void *ptr, size_t size)
KASAN_PAGE_REDZONE);
}
-void kasan_krealloc(const void *object, size_t size)
+void kasan_krealloc(const void *object, size_t size, gfp_t flags)
{
struct page *page;
@@ -500,9 +501,9 @@ void kasan_krealloc(const void *object, size_t size)
page = virt_to_head_page(object);
if (unlikely(!PageSlab(page)))
- kasan_kmalloc_large(object, size);
+ kasan_kmalloc_large(object, size, flags);
else
- kasan_kmalloc(page->slab_cache, object, size);
+ kasan_kmalloc(page->slab_cache, object, size, flags);
}
void kasan_kfree(void *ptr)
diff --git a/mm/mempool.c b/mm/mempool.c
index 5669f4d..716efa8 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -112,12 +112,12 @@ static void kasan_poison_element(mempool_t *pool, void *element)
kasan_free_pages(element, (unsigned long)pool->pool_data);
}
-static void kasan_unpoison_element(mempool_t *pool, void *element)
+static void kasan_unpoison_element(mempool_t *pool, void *element, gfp_t flags)
{
if (pool->alloc == mempool_alloc_slab)
- kasan_slab_alloc(pool->pool_data, element);
+ kasan_slab_alloc(pool->pool_data, element, flags);
if (pool->alloc == mempool_kmalloc)
- kasan_krealloc(element, (size_t)pool->pool_data);
+ kasan_krealloc(element, (size_t)pool->pool_data, flags);
if (pool->alloc == mempool_alloc_pages)
kasan_alloc_pages(element, (unsigned long)pool->pool_data);
}
@@ -130,13 +130,13 @@ static void add_element(mempool_t *pool, void *element)
pool->elements[pool->curr_nr++] = element;
}
-static void *remove_element(mempool_t *pool)
+static void *remove_element(mempool_t *pool, gfp_t flags)
{
void *element = pool->elements[--pool->curr_nr];
BUG_ON(pool->curr_nr < 0);
check_element(pool, element);
- kasan_unpoison_element(pool, element);
+ kasan_unpoison_element(pool, element, flags);
return element;
}
@@ -154,7 +154,7 @@ void mempool_destroy(mempool_t *pool)
return;
while (pool->curr_nr) {
- void *element = remove_element(pool);
+ void *element = remove_element(pool, GFP_KERNEL);
pool->free(element, pool->pool_data);
}
kfree(pool->elements);
@@ -250,7 +250,7 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
spin_lock_irqsave(&pool->lock, flags);
if (new_min_nr <= pool->min_nr) {
while (new_min_nr < pool->curr_nr) {
- element = remove_element(pool);
+ element = remove_element(pool, GFP_KERNEL);
spin_unlock_irqrestore(&pool->lock, flags);
pool->free(element, pool->pool_data);
spin_lock_irqsave(&pool->lock, flags);
@@ -347,7 +347,7 @@ repeat_alloc:
spin_lock_irqsave(&pool->lock, flags);
if (likely(pool->curr_nr)) {
- element = remove_element(pool);
+ element = remove_element(pool, gfp_temp);
spin_unlock_irqrestore(&pool->lock, flags);
/* paired with rmb in mempool_free(), read comment there */
smp_wmb();
diff --git a/mm/slab.c b/mm/slab.c
index 95dd196..7d27277 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3387,7 +3387,7 @@ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
void *ret = slab_alloc(cachep, flags, _RET_IP_);
- kasan_slab_alloc(cachep, ret);
+ kasan_slab_alloc(cachep, ret, flags);
trace_kmem_cache_alloc(_RET_IP_, ret,
cachep->object_size, cachep->size, flags);
@@ -3453,7 +3453,7 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
ret = slab_alloc(cachep, flags, _RET_IP_);
- kasan_kmalloc(cachep, ret, size);
+ kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc(_RET_IP_, ret,
size, cachep->size, flags);
return ret;
@@ -3477,7 +3477,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
void *ret = slab_alloc_node(cachep, flags, nodeid, _RET_IP_);
- kasan_slab_alloc(cachep, ret);
+ kasan_slab_alloc(cachep, ret, flags);
trace_kmem_cache_alloc_node(_RET_IP_, ret,
cachep->object_size, cachep->size,
flags, nodeid);
@@ -3495,7 +3495,8 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
void *ret;
ret = slab_alloc_node(cachep, flags, nodeid, _RET_IP_);
- kasan_kmalloc(cachep, ret, size);
+
+ kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc_node(_RET_IP_, ret,
size, cachep->size,
flags, nodeid);
@@ -3514,7 +3515,7 @@ __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
if (unlikely(ZERO_OR_NULL_PTR(cachep)))
return cachep;
ret = kmem_cache_alloc_node_trace(cachep, flags, node, size);
- kasan_kmalloc(cachep, ret, size);
+ kasan_kmalloc(cachep, ret, size, flags);
return ret;
}
@@ -3550,7 +3551,7 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
return cachep;
ret = slab_alloc(cachep, flags, caller);
- kasan_kmalloc(cachep, ret, size);
+ kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc(caller, ret,
size, cachep->size, flags);
@@ -4333,7 +4334,7 @@ size_t ksize(const void *objp)
/* We assume that ksize callers could use the whole allocated area,
* so we need to unpoison this area.
*/
- kasan_krealloc(objp, size);
+ kasan_krealloc(objp, size, GFP_NOWAIT);
return size;
}
diff --git a/mm/slab.h b/mm/slab.h
index ff39a8f..5969769 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -405,7 +405,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags,
kmemcheck_slab_alloc(s, flags, object, slab_ksize(s));
kmemleak_alloc_recursive(object, s->object_size, 1,
s->flags, flags);
- kasan_slab_alloc(s, object);
+ kasan_slab_alloc(s, object, flags);
}
memcg_kmem_put_cache(s);
}
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 242e6fa..07690d3 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1013,7 +1013,7 @@ void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
page = alloc_kmem_pages(flags, order);
ret = page ? page_address(page) : NULL;
kmemleak_alloc(ret, size, 1, flags);
- kasan_kmalloc_large(ret, size);
+ kasan_kmalloc_large(ret, size, flags);
return ret;
}
EXPORT_SYMBOL(kmalloc_order);
@@ -1194,7 +1194,7 @@ static __always_inline void *__do_krealloc(const void *p, size_t new_size,
ks = ksize(p);
if (ks >= new_size) {
- kasan_krealloc((void *)p, new_size);
+ kasan_krealloc((void *)p, new_size, flags);
return (void *)p;
}
diff --git a/mm/slub.c b/mm/slub.c
index d86720d..4e63f3b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1318,7 +1318,7 @@ static inline void dec_slabs_node(struct kmem_cache *s, int node,
static inline void kmalloc_large_node_hook(void *ptr, size_t size, gfp_t flags)
{
kmemleak_alloc(ptr, size, 1, flags);
- kasan_kmalloc_large(ptr, size);
+ kasan_kmalloc_large(ptr, size, flags);
}
static inline void kfree_hook(const void *x)
@@ -2608,7 +2608,7 @@ void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
void *ret = slab_alloc(s, gfpflags, _RET_IP_);
trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags);
- kasan_kmalloc(s, ret, size);
+ kasan_kmalloc(s, ret, size, gfpflags);
return ret;
}
EXPORT_SYMBOL(kmem_cache_alloc_trace);
@@ -2636,7 +2636,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
trace_kmalloc_node(_RET_IP_, ret,
size, s->size, gfpflags, node);
- kasan_kmalloc(s, ret, size);
+ kasan_kmalloc(s, ret, size, gfpflags);
return ret;
}
EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
@@ -3194,7 +3194,8 @@ static void early_kmem_cache_node_alloc(int node)
init_object(kmem_cache_node, n, SLUB_RED_ACTIVE);
init_tracking(kmem_cache_node, n);
#endif
- kasan_kmalloc(kmem_cache_node, n, sizeof(struct kmem_cache_node));
+ kasan_kmalloc(kmem_cache_node, n, sizeof(struct kmem_cache_node),
+ GFP_KERNEL);
init_kmem_cache_node(n);
inc_slabs_node(kmem_cache_node, node, page->objects);
@@ -3581,7 +3582,7 @@ void *__kmalloc(size_t size, gfp_t flags)
trace_kmalloc(_RET_IP_, ret, size, s->size, flags);
- kasan_kmalloc(s, ret, size);
+ kasan_kmalloc(s, ret, size, flags);
return ret;
}
@@ -3626,7 +3627,7 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
trace_kmalloc_node(_RET_IP_, ret, size, s->size, flags, node);
- kasan_kmalloc(s, ret, size);
+ kasan_kmalloc(s, ret, size, flags);
return ret;
}
@@ -3655,7 +3656,7 @@ size_t ksize(const void *object)
size_t size = __ksize(object);
/* We assume that ksize callers could use whole allocated area,
so we need unpoison this area. */
- kasan_krealloc(object, size);
+ kasan_krealloc(object, size, GFP_NOWAIT);
return size;
}
EXPORT_SYMBOL(ksize);
--
2.7.0.rc3.207.g0ac5344
2016-03-14 13:43 GMT+03:00 Alexander Potapenko <[email protected]>:
> +
> + rec = this_cpu_ptr(&depot_recursion);
> + /* Don't store the stack if we've been called recursively. */
> + if (unlikely(*rec))
> + goto fast_exit;
> + *rec = true;
This just can't work. As long as preemption enabled, task could
migrate on another cpu anytime.
You could use per-task flag, although it's possible to miss some
in-irq stacktraces:
depot_save_stack()
if (current->stackdeport_recursion)
goto fast_exit;
current->stackdepot_recursion++
<IRQ>
....
depot_save_stack()
if (current->stackdeport_recursion)
goto fast_exit;
> + if (unlikely(!smp_load_acquire(&next_slab_inited))) {
> + /* Zero out zone modifiers, as we don't have specific zone
> + * requirements. Keep the flags related to allocation in atomic
> + * contexts and I/O.
> + */
> + alloc_flags &= ~GFP_ZONEMASK;
> + alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);
> + /* When possible, allocate using vmalloc() to reduce physical
> + * address space fragmentation. vmalloc() doesn't work if
> + * kmalloc caches haven't been initialized or if it's being
> + * called from an interrupt handler.
> + */
> + if (kmalloc_caches[KMALLOC_SHIFT_HIGH] && !in_interrupt()) {
This is clearly a wrong way to check whether is slab available or not.
Besides you need to check
vmalloc() for availability, not slab.
Given that STAC_ALLOC_ORDER is 2 now, I think it should be fine to use
alloc_pages() all the time.
Or fix condition, up to you.
> + prealloc = __vmalloc(
> + STACK_ALLOC_SIZE, alloc_flags, PAGE_KERNEL);
> + } else {
> + page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
> + if (page)
> + prealloc = page_address(page);
> + }
> + }
> +
On Mon, Mar 14, 2016 at 5:56 PM, Andrey Ryabinin <[email protected]> wrote:
> 2016-03-14 13:43 GMT+03:00 Alexander Potapenko <[email protected]>:
>
>> +
>> + rec = this_cpu_ptr(&depot_recursion);
>> + /* Don't store the stack if we've been called recursively. */
>> + if (unlikely(*rec))
>> + goto fast_exit;
>> + *rec = true;
>
>
> This just can't work. As long as preemption enabled, task could
> migrate on another cpu anytime.
Ah, you're right.
Do you think disabling preemption around memory allocation is an option here?
> You could use per-task flag, although it's possible to miss some
> in-irq stacktraces:
>
> depot_save_stack()
> if (current->stackdeport_recursion)
> goto fast_exit;
> current->stackdepot_recursion++
> <IRQ>
> ....
> depot_save_stack()
> if (current->stackdeport_recursion)
> goto fast_exit;
>
>
>
>> + if (unlikely(!smp_load_acquire(&next_slab_inited))) {
>> + /* Zero out zone modifiers, as we don't have specific zone
>> + * requirements. Keep the flags related to allocation in atomic
>> + * contexts and I/O.
>> + */
>> + alloc_flags &= ~GFP_ZONEMASK;
>> + alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);
>> + /* When possible, allocate using vmalloc() to reduce physical
>> + * address space fragmentation. vmalloc() doesn't work if
>> + * kmalloc caches haven't been initialized or if it's being
>> + * called from an interrupt handler.
>> + */
>> + if (kmalloc_caches[KMALLOC_SHIFT_HIGH] && !in_interrupt()) {
>
> This is clearly a wrong way to check whether is slab available or not.
Well, I don't think either vmalloc() or kmalloc() provide any
interface to check if they are available.
> Besides you need to check
> vmalloc() for availability, not slab.
The problem was in kmalloc caches being unavailable, although I can
imagine other problems could have arose.
Perhaps we can drill a hole to get the value of vmap_initialized?
> Given that STAC_ALLOC_ORDER is 2 now, I think it should be fine to use
> alloc_pages() all the time.
> Or fix condition, up to you.
Ok, I'm going to drop vmalloc() for now, we can always implement this later.
Note that this also removes the necessity to check for recursion.
>> + prealloc = __vmalloc(
>> + STACK_ALLOC_SIZE, alloc_flags, PAGE_KERNEL);
>> + } else {
>> + page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
>> + if (page)
>> + prealloc = page_address(page);
>> + }
>> + }
>> +
--
Alexander Potapenko
Software Engineer
Google Germany GmbH
Erika-Mann-Straße, 33
80636 München
Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
2016-03-15 12:27 GMT+03:00 Alexander Potapenko <[email protected]>:
> On Mon, Mar 14, 2016 at 5:56 PM, Andrey Ryabinin <[email protected]> wrote:
>> 2016-03-14 13:43 GMT+03:00 Alexander Potapenko <[email protected]>:
>>
>>> +
>>> + rec = this_cpu_ptr(&depot_recursion);
>>> + /* Don't store the stack if we've been called recursively. */
>>> + if (unlikely(*rec))
>>> + goto fast_exit;
>>> + *rec = true;
>>
>>
>> This just can't work. As long as preemption enabled, task could
>> migrate on another cpu anytime.
> Ah, you're right.
> Do you think disabling preemption around memory allocation is an option here?
It's definitely not an option. Flag on current doesn't have any
disadvantage over per-cpu approach
and it doesn't require preemption safe context.
However, making the allocation in a separate context would be a better
way to eliminate recursion.
i.e. instead of allocating memory depot_save_stack() kicks a work
which allocates memory.
>> You could use per-task flag, although it's possible to miss some
>> in-irq stacktraces:
>>
>> depot_save_stack()
>> if (current->stackdeport_recursion)
>> goto fast_exit;
>> current->stackdepot_recursion++
>> <IRQ>
>> ....
>> depot_save_stack()
>> if (current->stackdeport_recursion)
>> goto fast_exit;
>>
>>
>>
>>> + if (unlikely(!smp_load_acquire(&next_slab_inited))) {
>>> + /* Zero out zone modifiers, as we don't have specific zone
>>> + * requirements. Keep the flags related to allocation in atomic
>>> + * contexts and I/O.
>>> + */
>>> + alloc_flags &= ~GFP_ZONEMASK;
>>> + alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);
>>> + /* When possible, allocate using vmalloc() to reduce physical
>>> + * address space fragmentation. vmalloc() doesn't work if
>>> + * kmalloc caches haven't been initialized or if it's being
>>> + * called from an interrupt handler.
>>> + */
>>> + if (kmalloc_caches[KMALLOC_SHIFT_HIGH] && !in_interrupt()) {
>>
>> This is clearly a wrong way to check whether is slab available or not.
> Well, I don't think either vmalloc() or kmalloc() provide any
> interface to check if they are available.
>
>> Besides you need to check
>> vmalloc() for availability, not slab.
> The problem was in kmalloc caches being unavailable, although I can
> imagine other problems could have arose.
> Perhaps we can drill a hole to get the value of vmap_initialized?
>> Given that STAC_ALLOC_ORDER is 2 now, I think it should be fine to use
>> alloc_pages() all the time.
>> Or fix condition, up to you.
> Ok, I'm going to drop vmalloc() for now, we can always implement this later.
> Note that this also removes the necessity to check for recursion.
>>> + prealloc = __vmalloc(
>>> + STACK_ALLOC_SIZE, alloc_flags, PAGE_KERNEL);
>>> + } else {
>>> + page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
>>> + if (page)
>>> + prealloc = page_address(page);
>>> + }
>>> + }
>>> +
>
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
On Tue, Mar 15, 2016 at 1:22 PM, Andrey Ryabinin <[email protected]> wrote:
> 2016-03-15 12:27 GMT+03:00 Alexander Potapenko <[email protected]>:
>> On Mon, Mar 14, 2016 at 5:56 PM, Andrey Ryabinin <[email protected]> wrote:
>>> 2016-03-14 13:43 GMT+03:00 Alexander Potapenko <[email protected]>:
>>>
>>>> +
>>>> + rec = this_cpu_ptr(&depot_recursion);
>>>> + /* Don't store the stack if we've been called recursively. */
>>>> + if (unlikely(*rec))
>>>> + goto fast_exit;
>>>> + *rec = true;
>>>
>>>
>>> This just can't work. As long as preemption enabled, task could
>>> migrate on another cpu anytime.
>> Ah, you're right.
>> Do you think disabling preemption around memory allocation is an option here?
>
> It's definitely not an option. Flag on current doesn't have any
> disadvantage over per-cpu approach
> and it doesn't require preemption safe context.
> However, making the allocation in a separate context would be a better
> way to eliminate recursion.
> i.e. instead of allocating memory depot_save_stack() kicks a work
> which allocates memory.
For the record, I've removed the vmalloc code and reinstated
alloc_pages(), so that there's no more recursion.
Making the allocation in a separate worker will remove the recursion,
but may complicate the synchronization.
I'd refrain from that since we don't have problems with recursion right now.
>
>>> You could use per-task flag, although it's possible to miss some
>>> in-irq stacktraces:
>>>
>>> depot_save_stack()
>>> if (current->stackdeport_recursion)
>>> goto fast_exit;
>>> current->stackdepot_recursion++
>>> <IRQ>
>>> ....
>>> depot_save_stack()
>>> if (current->stackdeport_recursion)
>>> goto fast_exit;
>>>
>>>
>>>
>>>> + if (unlikely(!smp_load_acquire(&next_slab_inited))) {
>>>> + /* Zero out zone modifiers, as we don't have specific zone
>>>> + * requirements. Keep the flags related to allocation in atomic
>>>> + * contexts and I/O.
>>>> + */
>>>> + alloc_flags &= ~GFP_ZONEMASK;
>>>> + alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);
>>>> + /* When possible, allocate using vmalloc() to reduce physical
>>>> + * address space fragmentation. vmalloc() doesn't work if
>>>> + * kmalloc caches haven't been initialized or if it's being
>>>> + * called from an interrupt handler.
>>>> + */
>>>> + if (kmalloc_caches[KMALLOC_SHIFT_HIGH] && !in_interrupt()) {
>>>
>>> This is clearly a wrong way to check whether is slab available or not.
>> Well, I don't think either vmalloc() or kmalloc() provide any
>> interface to check if they are available.
>>
>>> Besides you need to check
>>> vmalloc() for availability, not slab.
>> The problem was in kmalloc caches being unavailable, although I can
>> imagine other problems could have arose.
>> Perhaps we can drill a hole to get the value of vmap_initialized?
>>> Given that STAC_ALLOC_ORDER is 2 now, I think it should be fine to use
>>> alloc_pages() all the time.
>>> Or fix condition, up to you.
>> Ok, I'm going to drop vmalloc() for now, we can always implement this later.
>> Note that this also removes the necessity to check for recursion.
>>>> + prealloc = __vmalloc(
>>>> + STACK_ALLOC_SIZE, alloc_flags, PAGE_KERNEL);
>>>> + } else {
>>>> + page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
>>>> + if (page)
>>>> + prealloc = page_address(page);
>>>> + }
>>>> + }
>>>> +
>>
>>
>>
>> --
>> Alexander Potapenko
>> Software Engineer
>>
>> Google Germany GmbH
>> Erika-Mann-Straße, 33
>> 80636 München
>>
>> Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
>> Registergericht und -nummer: Hamburg, HRB 86891
>> Sitz der Gesellschaft: Hamburg
--
Alexander Potapenko
Software Engineer
Google Germany GmbH
Erika-Mann-Straße, 33
80636 München
Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
On Mon, Mar 14, 2016 at 11:43:43AM +0100, Alexander Potapenko wrote:
> +depot_stack_handle_t depot_save_stack(struct stack_trace *trace,
> + gfp_t alloc_flags)
> +{
> + u32 hash;
> + depot_stack_handle_t retval = 0;
> + struct stack_record *found = NULL, **bucket;
> + unsigned long flags;
> + struct page *page = NULL;
> + void *prealloc = NULL;
> + bool *rec;
> +
> + if (unlikely(trace->nr_entries == 0))
> + goto fast_exit;
> +
> + rec = this_cpu_ptr(&depot_recursion);
> + /* Don't store the stack if we've been called recursively. */
> + if (unlikely(*rec))
> + goto fast_exit;
> + *rec = true;
> +
> + hash = hash_stack(trace->entries, trace->nr_entries);
> + /* Bad luck, we won't store this stack. */
> + if (hash == 0)
> + goto exit;
Hello,
why is hash == 0 skipped?
Thanks.
On Mon, Apr 11, 2016 at 9:44 AM, Joonsoo Kim <[email protected]> wrote:
> On Mon, Mar 14, 2016 at 11:43:43AM +0100, Alexander Potapenko wrote:
>> +depot_stack_handle_t depot_save_stack(struct stack_trace *trace,
>> + gfp_t alloc_flags)
>> +{
>> + u32 hash;
>> + depot_stack_handle_t retval = 0;
>> + struct stack_record *found = NULL, **bucket;
>> + unsigned long flags;
>> + struct page *page = NULL;
>> + void *prealloc = NULL;
>> + bool *rec;
>> +
>> + if (unlikely(trace->nr_entries == 0))
>> + goto fast_exit;
>> +
>> + rec = this_cpu_ptr(&depot_recursion);
>> + /* Don't store the stack if we've been called recursively. */
>> + if (unlikely(*rec))
>> + goto fast_exit;
>> + *rec = true;
>> +
>> + hash = hash_stack(trace->entries, trace->nr_entries);
>> + /* Bad luck, we won't store this stack. */
>> + if (hash == 0)
>> + goto exit;
>
> Hello,
>
> why is hash == 0 skipped?
>
> Thanks.
We have to keep a special value to distinguish allocations for which
we don't have the stack trace for some reason.
Making 0 such a value seems natural.
--
Alexander Potapenko
Software Engineer
Google Germany GmbH
Erika-Mann-Straße, 33
80636 München
Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
On Mon, Apr 11, 2016 at 4:39 PM, Alexander Potapenko <[email protected]> wrote:
> On Mon, Apr 11, 2016 at 9:44 AM, Joonsoo Kim <[email protected]> wrote:
>> On Mon, Mar 14, 2016 at 11:43:43AM +0100, Alexander Potapenko wrote:
>>> +depot_stack_handle_t depot_save_stack(struct stack_trace *trace,
>>> + gfp_t alloc_flags)
>>> +{
>>> + u32 hash;
>>> + depot_stack_handle_t retval = 0;
>>> + struct stack_record *found = NULL, **bucket;
>>> + unsigned long flags;
>>> + struct page *page = NULL;
>>> + void *prealloc = NULL;
>>> + bool *rec;
>>> +
>>> + if (unlikely(trace->nr_entries == 0))
>>> + goto fast_exit;
>>> +
>>> + rec = this_cpu_ptr(&depot_recursion);
>>> + /* Don't store the stack if we've been called recursively. */
>>> + if (unlikely(*rec))
>>> + goto fast_exit;
>>> + *rec = true;
>>> +
>>> + hash = hash_stack(trace->entries, trace->nr_entries);
>>> + /* Bad luck, we won't store this stack. */
>>> + if (hash == 0)
>>> + goto exit;
>>
>> Hello,
>>
>> why is hash == 0 skipped?
>>
>> Thanks.
> We have to keep a special value to distinguish allocations for which
> we don't have the stack trace for some reason.
> Making 0 such a value seems natural.
Well, the above statement is false.
Because we only compare the hash to the records that are already in
the depot, there's no point in reserving this value.
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
--
Alexander Potapenko
Software Engineer
Google Germany GmbH
Erika-Mann-Straße, 33
80636 München
Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
On Mon, Apr 11, 2016 at 04:51:47PM +0200, Alexander Potapenko wrote:
> On Mon, Apr 11, 2016 at 4:39 PM, Alexander Potapenko <[email protected]> wrote:
> > On Mon, Apr 11, 2016 at 9:44 AM, Joonsoo Kim <[email protected]> wrote:
> >> On Mon, Mar 14, 2016 at 11:43:43AM +0100, Alexander Potapenko wrote:
> >>> +depot_stack_handle_t depot_save_stack(struct stack_trace *trace,
> >>> + gfp_t alloc_flags)
> >>> +{
> >>> + u32 hash;
> >>> + depot_stack_handle_t retval = 0;
> >>> + struct stack_record *found = NULL, **bucket;
> >>> + unsigned long flags;
> >>> + struct page *page = NULL;
> >>> + void *prealloc = NULL;
> >>> + bool *rec;
> >>> +
> >>> + if (unlikely(trace->nr_entries == 0))
> >>> + goto fast_exit;
> >>> +
> >>> + rec = this_cpu_ptr(&depot_recursion);
> >>> + /* Don't store the stack if we've been called recursively. */
> >>> + if (unlikely(*rec))
> >>> + goto fast_exit;
> >>> + *rec = true;
> >>> +
> >>> + hash = hash_stack(trace->entries, trace->nr_entries);
> >>> + /* Bad luck, we won't store this stack. */
> >>> + if (hash == 0)
> >>> + goto exit;
> >>
> >> Hello,
> >>
> >> why is hash == 0 skipped?
> >>
> >> Thanks.
> > We have to keep a special value to distinguish allocations for which
> > we don't have the stack trace for some reason.
> > Making 0 such a value seems natural.
> Well, the above statement is false.
> Because we only compare the hash to the records that are already in
> the depot, there's no point in reserving this value.
So, could you make a patch for it?
Thanks.