This series of patches represents version 0.13 of the kernel memory
leak detector. See the Documentation/kmemleak.txt file for a more
detailed description. The patches (available as a single patch or as a
broken-out series) can be downloaded from http://www.procode.org/kmemleak/
What's new in this version:
- updated to Linux 2.6.20-rc1
- fixed a bug noticeable with a large number of reported memory leaks
--
Catalin
Signed-off-by: Catalin Marinas <[email protected]>
---
Documentation/kmemleak.txt | 161 ++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 161 insertions(+), 0 deletions(-)
diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt
new file mode 100644
index 0000000..4689f85
--- /dev/null
+++ b/Documentation/kmemleak.txt
@@ -0,0 +1,161 @@
+Kernel Memory Leak Detector
+===========================
+
+
+Introduction
+------------
+
+Kmemleak provides a way of detecting possible kernel memory leaks in a
+way similar to a tracing garbage collector
+(http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Tracing_garbage_collectors),
+with the difference that the orphan objects are not freed but only
+reported via /sys/kernel/debug/memleak. A similar method is used by
+the Valgrind tool (memcheck --leak-check) to detect the memory leaks
+in user-space applications.
+
+
+Usage
+-----
+
+CONFIG_DEBUG_MEMLEAK has to be enabled. For additional config options,
+look in:
+
+ -> Kernel hacking
+ -> Kernel debugging
+ -> Debug slab memory allocations
+ -> Kernel memory leak detector
+
+To display the possible memory leaks:
+
+ # mount -t debugfs nodev /sys/kernel/debug/
+ # cat /sys/kernel/debug/memleak
+
+In order to reduce the run-time overhead, memory scanning is only
+performed when reading the /sys/kernel/debug/memleak file. Note that
+the orphan objects are listed in the order they were allocated and an
+object at the beginning of the list may cause subsequent objects to
+be reported as orphans.
+
+
+Basic Algorithm
+---------------
+
+The memory allocations via kmalloc, vmalloc, kmem_cache_alloc and
+friends are tracked and the pointers, together with additional
+information like size and stack trace, are stored in a hash table. The
+corresponding freeing function calls are tracked and the pointers
+removed from the hash table.
+
+An allocated block of memory is considered orphan if no pointer to its
+start address or to an alias (pointer aliases are explained later) can
+be found by scanning the memory (including saved registers). This
+means that there might be no way for the kernel to pass the address of
+the allocated block to a freeing function and therefore the block is
+considered a leak.
+
+The scanning algorithm steps:
+
+ 1. mark all objects as white (remaining white objects will later be
+ considered orphan)
+ 2. scan the memory starting with the data section and stacks,
+ checking the values against the addresses stored in the hash
+ table. If a pointer to a white object is found, the object is
+ added to the grey list
+  3. scan the grey objects for matching addresses (some white objects
+     can become grey and are added at the end of the grey list) until
+     the grey set is finished
+ 4. the remaining white objects are considered orphan and reported
+ via /sys/kernel/debug/memleak
+
+
+Improvements
+------------
+
+Because the Linux kernel calculates many pointers at run-time via the
+container_of macro (see the lists implementation), a lot of false
+positives would be reported. This tool re-writes the container_of
+macro so that the offset and type information is stored in the
+.init.memleak_offsets section. The memleak_init() function creates a
+radix tree with corresponding offsets for every encountered block
+type. The memory allocation hooks store the pointer address together
+with its aliases based on the type of the allocated block.
+
+While one level of offsets should be enough for most cases, a second
+level, i.e. container_of(container_of(...)), can be enabled via the
+configuration options (one false positive is the "struct socket_alloc"
+allocation in the sock_alloc_inode() function).
+
+Some allocated memory blocks have pointers stored in the kernel's
+internal data structures, so they are never detected as orphans. To
+avoid such false negatives, kmemleak can also store the number of
+values equal to the pointer (or its aliases) that need to be found so
+that the block is not considered a leak. One example is __vmalloc().
+
+
+Limitations and Drawbacks
+-------------------------
+
+The biggest drawback is the reduced performance of memory allocation
+and freeing. To avoid further penalties, the memory scanning is only
+performed when the /sys/kernel/debug/memleak file is read. However,
+this tool is intended for debugging purposes, where performance is
+not the most important requirement.
+
+Kmemleak currently approximates the type id using the sizeof()
+compiler operator. This is not accurate and can lead to false
+negatives. The aim is to gradually change the kernel and kmemleak to
+do more precise type identification.
+
+Another source of false negatives is non-pointer data that happens to
+match a pointer value. With more precise type identification, kmemleak
+could scan only the pointer members in the allocated structures.
+
+The tool can report false positives. These are cases where an
+allocated block doesn't need to be freed (some cases in the init_call
+functions), the pointer is calculated by methods other than the
+container_of macro, or the pointer is stored in a location not scanned
+by kmemleak. If the "member" argument in the offsetof(type, member)
+call is not constant, kmemleak considers the offset to be zero since
+it cannot be determined at compile time.
+
+Page allocations and ioremap are not tracked. Only the ARM and i386
+architectures are currently supported.
+
+
+Kmemleak API
+------------
+
+See the include/linux/memleak.h header for the function prototypes.
+
+memleak_init - initialize kmemleak
+memleak_alloc - notify of a memory block allocation
+memleak_free - notify of a memory block freeing
+memleak_padding - mark the boundaries of the data inside the block
+memleak_not_leak - mark an object as not a leak
+memleak_ignore - do not scan or report an object as leak
+memleak_scan_area - add scan areas inside a memory block
+memleak_insert_aliases - add aliases for a given type
+memleak_erase - erase an old value in a pointer variable
+memleak_typeid_raw - set the typeid for an allocated block
+memleak_container - statically declare a pointer alias
+memleak_typeid - set the typeid for an allocated block (takes
+ a type rather than typeid as argument)
+
+
+Dealing with false positives/negatives
+--------------------------------------
+
+To reduce the false negatives, kmemleak provides the memleak_ignore,
+memleak_scan_area and memleak_erase functions. The task stacks also
+increase the number of false negatives and their scanning is not
+enabled by default.
+
+To eliminate the false positives caused by code allocating a size
+different from that of the object (either for alignment or for extra
+memory after the end of the structure), kmemleak provides the
+memleak_padding and memleak_typeid functions.
+
+For objects known not to be leaks, kmemleak provides the
+memleak_not_leak function. The memleak_ignore function can also be
+used if the memory block is known not to contain other pointers; the
+block will then no longer be scanned.
This patch handles the kmemleak operations needed for modules loading so
that memory allocations from inside a module are properly tracked.
Signed-off-by: Catalin Marinas <[email protected]>
---
kernel/module.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 56 insertions(+), 0 deletions(-)
diff --git a/kernel/module.c b/kernel/module.c
index b565eae..f1ff3f5 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -345,6 +345,7 @@ static void *percpu_modalloc(unsigned lo
unsigned long extra;
unsigned int i;
void *ptr;
+ int cpu;
if (align > SMP_CACHE_BYTES) {
printk(KERN_WARNING "%s: per-cpu alignment %li > %i\n",
@@ -374,6 +375,10 @@ static void *percpu_modalloc(unsigned lo
if (!split_block(i, size))
return NULL;
+ /* add the per-cpu scanning areas */
+ for_each_possible_cpu(cpu)
+ memleak_alloc(ptr + per_cpu_offset(cpu), size, 0);
+
/* Mark allocated */
pcpu_size[i] = -pcpu_size[i];
return ptr;
@@ -388,6 +393,7 @@ static void percpu_modfree(void *freeme)
{
unsigned int i;
void *ptr = __per_cpu_start + block_size(pcpu_size[0]);
+ int cpu;
/* First entry is core kernel percpu data. */
for (i = 1; i < pcpu_num_used; ptr += block_size(pcpu_size[i]), i++) {
@@ -399,6 +405,10 @@ static void percpu_modfree(void *freeme)
BUG();
free:
+ /* remove the per-cpu scanning areas */
+ for_each_possible_cpu(cpu)
+ memleak_free(freeme + per_cpu_offset(cpu));
+
/* Merge with previous? */
if (pcpu_size[i-1] >= 0) {
pcpu_size[i-1] += pcpu_size[i];
@@ -1529,6 +1539,42 @@ static inline void add_kallsyms(struct m
}
#endif /* CONFIG_KALLSYMS */
+#ifdef CONFIG_DEBUG_MEMLEAK
+static void memleak_load_module(struct module *mod, Elf_Ehdr *hdr,
+ Elf_Shdr *sechdrs, char *secstrings)
+{
+ unsigned int mloffindex, i;
+
+ /* insert any new pointer aliases */
+ mloffindex = find_sec(hdr, sechdrs, secstrings, ".init.memleak_offsets");
+ if (mloffindex)
+ memleak_insert_aliases((void *)sechdrs[mloffindex].sh_addr,
+ (void *)sechdrs[mloffindex].sh_addr
+ + sechdrs[mloffindex].sh_size);
+
+ /* only scan the sections containing data */
+ memleak_scan_area(mod->module_core,
+ (unsigned long)mod - (unsigned long)mod->module_core,
+ sizeof(struct module));
+
+ for (i = 1; i < hdr->e_shnum; i++) {
+ if (!(sechdrs[i].sh_flags & SHF_ALLOC))
+ continue;
+ if (strncmp(secstrings + sechdrs[i].sh_name, ".data", 5) != 0
+ && strncmp(secstrings + sechdrs[i].sh_name, ".bss", 4) != 0)
+ continue;
+
+ memleak_scan_area(mod->module_core,
+ sechdrs[i].sh_addr - (unsigned long)mod->module_core,
+ sechdrs[i].sh_size);
+ }
+}
+#else
+static inline void memleak_load_module(struct module *mod, Elf_Ehdr *hdr,
+ Elf_Shdr *sechdrs, char *secstrings)
+{ }
+#endif
+
/* Allocate and load the module: note that size of section 0 is always
zero, and we rely on this for optional sections. */
static struct module *load_module(void __user *umod,
@@ -1724,6 +1770,10 @@ static struct module *load_module(void _
/* Do the allocs. */
ptr = module_alloc(mod->core_size);
+ /* the pointer to this block is stored in the module structure
+ * which is inside the block. Just mark it as not being a
+ * leak */
+ memleak_not_leak(ptr);
if (!ptr) {
err = -ENOMEM;
goto free_percpu;
@@ -1732,6 +1782,11 @@ static struct module *load_module(void _
mod->module_core = ptr;
ptr = module_alloc(mod->init_size);
+ /* the pointer to this block is stored in the module structure
+ * which is inside the block. This block doesn't need to be
+ * scanned as it contains data and code that will be freed
+ * after the module is initialized */
+ memleak_ignore(ptr);
if (!ptr && mod->init_size) {
err = -ENOMEM;
goto free_core;
@@ -1762,6 +1817,7 @@ static struct module *load_module(void _
}
/* Module has been moved. */
mod = (void *)sechdrs[modindex].sh_addr;
+ memleak_load_module(mod, hdr, sechdrs, secstrings);
/* Now we've moved module, initialize linked lists, etc. */
module_unload_init(mod);
This patch adds the base support for the kernel memory leak
detector. It traces the memory allocation/freeing in a way similar to
Boehm's conservative garbage collector, the difference being that
the unreferenced objects are not freed but only shown in
/sys/kernel/debug/memleak. Enabling this feature introduces an
overhead to memory allocations.
Signed-off-by: Catalin Marinas <[email protected]>
---
include/linux/kernel.h | 7
include/linux/memleak.h | 111 ++++
init/main.c | 3
lib/Kconfig.debug | 114 ++++
mm/Makefile | 1
mm/memleak.c | 1478 +++++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 1712 insertions(+), 2 deletions(-)
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index b0c4a05..2fca6bf 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -14,6 +14,7 @@
#include <linux/compiler.h>
#include <linux/bitops.h>
#include <linux/log2.h>
+#include <linux/memleak.h>
#include <asm/byteorder.h>
#include <asm/bug.h>
@@ -285,9 +286,13 @@ static inline int __attribute__ ((format
* @member: the name of the member within the struct.
*
*/
-#define container_of(ptr, type, member) ({ \
+#define __container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
+#define container_of(ptr, type, member) ({ \
+ DECLARE_MEMLEAK_OFFSET(container_of, type, member); \
+ __container_of(ptr, type, member); \
+})
/*
* Check at compile time that something is of a particular type.
diff --git a/include/linux/memleak.h b/include/linux/memleak.h
new file mode 100644
index 0000000..39669bf
--- /dev/null
+++ b/include/linux/memleak.h
@@ -0,0 +1,111 @@
+/*
+ * include/linux/memleak.h
+ *
+ * Copyright (C) 2006 ARM Limited
+ * Written by Catalin Marinas <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef __MEMLEAK_H
+#define __MEMLEAK_H
+
+#include <linux/stddef.h>
+
+struct memleak_offset {
+ unsigned long type_id;
+ unsigned long member_type_id;
+ unsigned long offset;
+};
+
+/* type id approximation */
+#define ml_guess_typeid(size) ((unsigned long)(size))
+#define ml_typeid(type) ml_guess_typeid(sizeof(type))
+#define ml_sizeof(typeid) ((size_t)(typeid))
+
+#ifdef CONFIG_DEBUG_MEMLEAK
+
+/* if offsetof(type, member) is not a constant known at compile time,
+ * just use 0 instead since we cannot add it to the
+ * .init.memleak_offsets section
+ */
+#define memleak_offsetof(type, member) \
+ (__builtin_constant_p(offsetof(type, member)) ? \
+ offsetof(type, member) : 0)
+
+#define DECLARE_MEMLEAK_OFFSET(name, type, member) \
+ static const struct memleak_offset \
+ __attribute__ ((__section__ (".init.memleak_offsets"))) \
+ __attribute_used__ __memleak_offset__##name = { \
+ ml_typeid(type), \
+ ml_typeid(typeof(((type *)0)->member)), \
+ memleak_offsetof(type, member) \
+ }
+
+extern void memleak_init(void);
+extern void memleak_alloc(const void *ptr, size_t size, int ref_count);
+extern void memleak_free(const void *ptr);
+extern void memleak_padding(const void *ptr, unsigned long offset, size_t size);
+extern void memleak_not_leak(const void *ptr);
+extern void memleak_ignore(const void *ptr);
+extern void memleak_scan_area(const void *ptr, unsigned long offset, size_t length);
+extern void memleak_insert_aliases(struct memleak_offset *ml_off_start,
+ struct memleak_offset *ml_off_end);
+
+static inline void memleak_erase(void **ptr)
+{
+ *ptr = NULL;
+}
+
+#define memleak_container(type, member) { \
+ DECLARE_MEMLEAK_OFFSET(container_of, type, member); \
+}
+
+extern void memleak_typeid_raw(const void *ptr, unsigned long type_id);
+#define memleak_typeid(ptr, type) \
+ memleak_typeid_raw(ptr, ml_typeid(type))
+
+#else
+
+#define DECLARE_MEMLEAK_OFFSET(name, type, member)
+
+static inline void memleak_init(void)
+{ }
+static inline void memleak_alloc(const void *ptr, size_t size, int ref_count)
+{ }
+static inline void memleak_free(const void *ptr)
+{ }
+static inline void memleak_padding(const void *ptr, unsigned long offset, size_t size)
+{ }
+static inline void memleak_not_leak(const void *ptr)
+{ }
+static inline void memleak_ignore(const void *ptr)
+{ }
+static inline void memleak_scan_area(const void *ptr, unsigned long offset, size_t length)
+{ }
+static inline void memleak_insert_aliases(struct memleak_offset *ml_off_start,
+ struct memleak_offset *ml_off_end)
+{ }
+static inline void memleak_erase(void **ptr)
+{ }
+
+#define memleak_container(type, member)
+
+static inline void memleak_typeid_raw(const void *ptr, unsigned long type_id)
+{ }
+#define memleak_typeid(ptr, type)
+
+#endif /* CONFIG_DEBUG_MEMLEAK */
+
+#endif /* __MEMLEAK_H */
diff --git a/init/main.c b/init/main.c
index e3f0bb2..c5cdc87 100644
--- a/init/main.c
+++ b/init/main.c
@@ -584,6 +584,8 @@ asmlinkage void __init start_kernel(void
cpuset_init_early();
mem_init();
kmem_cache_init();
+ radix_tree_init();
+ memleak_init();
setup_per_cpu_pageset();
numa_policy_init();
if (late_time_init)
@@ -604,7 +606,6 @@ asmlinkage void __init start_kernel(void
key_init();
security_init();
vfs_caches_init(num_physpages);
- radix_tree_init();
signals_init();
/* rootfs populating might need page-writeback */
page_writeback_init();
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 818e458..42f60fb 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -137,6 +137,120 @@ config DEBUG_SLAB_LEAK
bool "Memory leak debugging"
depends on DEBUG_SLAB
+menuconfig DEBUG_MEMLEAK
+ bool "Kernel memory leak detector"
+ default n
+ depends on EXPERIMENTAL && DEBUG_SLAB
+ select DEBUG_FS
+ select STACKTRACE
+ select FRAME_POINTER
+ select KALLSYMS
+ help
+ Say Y here if you want to enable the memory leak
+ detector. The memory allocation/freeing is traced in a way
	  similar to Boehm's conservative garbage collector, the
+ difference being that the orphan objects are not freed but
+ only shown in /sys/kernel/debug/memleak. Enabling this
+ feature will introduce an overhead to memory
+ allocations. See Documentation/kmemleak.txt for more
+ details.
+
+ In order to access the memleak file, debugfs needs to be
+ mounted (usually at /sys/kernel/debug).
+
+config DEBUG_MEMLEAK_HASH_BITS
+ int "Pointer hash bits"
+ default 16
+ depends on DEBUG_MEMLEAK
+ help
+ This option sets the number of bits used for the pointer
+ hash table. Higher values give better memory scanning
+ performance but also lead to bigger RAM usage. The size of
+ the allocated hash table is (sizeof(void*) * 2^hash_bits).
+
+ The minimum recommended value is 16. A maximum value of
+ around 20 should be sufficient.
+
+config DEBUG_MEMLEAK_TRACE_LENGTH
+ int "Stack trace length"
+ default 4
+ depends on DEBUG_MEMLEAK && FRAME_POINTER
+ help
+ This option sets the length of the stack trace for the
+ allocated objects tracked by kmemleak.
+
+config DEBUG_MEMLEAK_PREINIT_OBJECTS
+ int "Pre-init actions buffer size"
+ default 512
+ depends on DEBUG_MEMLEAK
+ help
+ This is the buffer for storing the memory allocation/freeing
+ calls before kmemleak is fully initialized. Each element in
+ the buffer takes 24 bytes on a 32 bit architecture. This
+ buffer will be freed once the system initialization is
+ completed.
+
+config DEBUG_MEMLEAK_SECONDARY_ALIASES
+ bool "Create secondary level pointer aliases"
+ default y
+ depends on DEBUG_MEMLEAK
+ help
+ This option creates aliases for container_of(container_of(member))
+ access to objects. Disabling this option reduces the chances of
+ false negatives but it can slightly increase the number of false
+ positives.
+
+config DEBUG_MEMLEAK_TASK_STACKS
+ bool "Scan task kernel stacks"
+ default y
+ depends on DEBUG_MEMLEAK
+ help
+ This option enables the scanning of the task kernel
+ stacks. This option can introduce false negatives because of
	  the randomness of the stack contents.
+
+ If unsure, say Y.
+
+config DEBUG_MEMLEAK_ORPHAN_FREEING
+ bool "Notify when freeing orphan objects"
+ default n
+ depends on DEBUG_MEMLEAK
+ help
+ This option enables the notification when objects
+ considered leaks are freed. The stack dump and the object
+ information displayed allow an easier identification of
+ false positives. Use this mainly for debugging kmemleak.
+
+ If unsure, say N.
+
+config DEBUG_MEMLEAK_REPORT_THLD
+ int "Unreferenced reporting threshold"
+ default 1
+ depends on DEBUG_MEMLEAK
+ help
+ This option sets the number of times an object needs to be
+ detected as unreferenced before being reported as a memory
+ leak. A value of 0 means that the object will be reported
+ the first time it is found as unreferenced. Other positive
+ values mean that the object needs to be found as
+ unreferenced a specified number of times prior to being
+ reported.
+
+ This is useful to avoid the reporting of transient false
+ positives where the pointers might be held in CPU registers,
+ especially on SMP systems, or on the stack when the stack
+ scanning option is disabled.
+
+config DEBUG_MEMLEAK_REPORTS_NR
+ int "Maximum number of reported leaks"
+ default 100
+ depends on DEBUG_MEMLEAK
+ help
	  This option sets the maximum number of leaks reported. If
	  this number is too large and there are leaks to be reported,
	  reading the /sys/kernel/debug/memleak file could lead to
	  soft lockups.
+
config DEBUG_PREEMPT
bool "Debug preemptible kernel"
depends on DEBUG_KERNEL && PREEMPT && TRACE_IRQFLAGS_SUPPORT
diff --git a/mm/Makefile b/mm/Makefile
index f3c077e..aafe03f 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -29,3 +29,4 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
obj-$(CONFIG_FS_XIP) += filemap_xip.o
obj-$(CONFIG_MIGRATION) += migrate.o
obj-$(CONFIG_SMP) += allocpercpu.o
+obj-$(CONFIG_DEBUG_MEMLEAK) += memleak.o
diff --git a/mm/memleak.c b/mm/memleak.c
new file mode 100644
index 0000000..41d5c95
--- /dev/null
+++ b/mm/memleak.c
@@ -0,0 +1,1478 @@
+/*
+ * mm/memleak.c
+ *
+ * Copyright (C) 2006 ARM Limited
+ * Written by Catalin Marinas <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ *
+ * Notes on locking
+ *
+ * Kmemleak needs to allocate/free memory for its own data structures:
+ * the memleak_object, the pointer hash and aliases radix trees. The
+ * memleak_free hook can be called from mm/slab.c with the list_lock
+ * held (i.e. when releasing off-slab management structures) and it
+ * will acquire the memleak_lock. To avoid deadlocks caused by
+ * locking dependency, the list_lock must not be acquired while
+ * memleak_lock is held. This is ensured by not allocating/freeing
+ * memory while any of the kmemleak locks are held.
+ *
+ * The kmemleak hooks cannot be called concurrently on the same
+ * memleak_object (this is due to the way they were inserted in the
+ * kernel).
+ *
+ * The following locks are present in kmemleak:
+ *
+ * - alias_tree_lock - rwlock for accessing the radix tree holding the
+ * objects type information
+ *
+ * - memleak_lock - global kmemleak lock; protects object_list,
+ * last_object, pointer_hash and memleak_object structures
+ *
+ * Locking dependencies:
+ *
+ * - alias_tree_lock --> l3->list_lock
+ * - l3->list_lock --> memleak_lock
+ */
+
+/* #define DEBUG */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/radix-tree.h>
+#include <linux/gfp.h>
+#include <linux/errno.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/kallsyms.h>
+#include <linux/mman.h>
+#include <linux/nodemask.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/cpumask.h>
+#include <linux/spinlock.h>
+#include <linux/rcupdate.h>
+#include <linux/hash.h>
+#include <linux/stacktrace.h>
+
+#include <asm/bitops.h>
+#include <asm/sections.h>
+#include <asm/percpu.h>
+#include <asm/processor.h>
+#include <asm/thread_info.h>
+#include <asm/atomic.h>
+
+#include <linux/memleak.h>
+
+#ifdef CONFIG_DEBUG_SPINLOCK
+#define BUG_ON_LOCKING(cond) BUG_ON(cond)
+#else
+#define BUG_ON_LOCKING(cond)
+#endif
+
+#define MAX_TRACE CONFIG_DEBUG_MEMLEAK_TRACE_LENGTH
+#define SCAN_BLOCK_SIZE 4096 /* maximum scan length with interrupts disabled */
+#define PREINIT_OBJECTS CONFIG_DEBUG_MEMLEAK_PREINIT_OBJECTS
+#define HASH_BITS CONFIG_DEBUG_MEMLEAK_HASH_BITS
+#define BYTES_PER_WORD sizeof(void *)
+
+extern struct memleak_offset __memleak_offsets_start[];
+extern struct memleak_offset __memleak_offsets_end[];
+
+struct memleak_alias {
+ struct hlist_node node;
+ unsigned long offset;
+};
+
+struct memleak_scan_area {
+ struct hlist_node node;
+ unsigned long offset;
+ size_t length;
+};
+
+struct memleak_object {
+ unsigned long flags;
+ struct list_head object_list;
+ struct list_head gray_list;
+ struct rcu_head rcu;
+ int use_count;
+ unsigned long pointer;
+ unsigned long offset; /* padding */
+ size_t size;
+ unsigned long type_id;
+ int ref_count; /* the minimum encounters of the value */
+	int count;		/* the encounters of the value */
+ int report_thld; /* the unreferenced reporting threshold */
+ struct hlist_head *alias_list;
+ struct hlist_head area_list; /* areas to be scanned (or empty for all) */
+ unsigned long trace[MAX_TRACE];
+ unsigned int trace_len;
+};
+
+struct hash_node {
+ struct hlist_node node;
+ unsigned long val;
+ void *object;
+};
+
+enum memleak_action {
+ MEMLEAK_ALLOC,
+ MEMLEAK_FREE,
+ MEMLEAK_PADDING,
+ MEMLEAK_NOT_LEAK,
+ MEMLEAK_IGNORE,
+ MEMLEAK_SCAN_AREA,
+ MEMLEAK_TYPEID
+};
+
+struct memleak_preinit_object {
+ enum memleak_action type;
+ const void *pointer;
+ unsigned long offset;
+ size_t size;
+ unsigned long type_id;
+ int ref_count;
+};
+
+/* Tree storing the pointer aliases indexed by size */
+static RADIX_TREE(alias_tree, GFP_ATOMIC);
+static DEFINE_RWLOCK(alias_tree_lock);
+/* Hash storing all the possible objects, indexed by the pointer value */
+static struct hlist_head *pointer_hash;
+/* The list of all allocated objects */
+static LIST_HEAD(object_list);
+/* The list of the gray objects */
+static LIST_HEAD(gray_list);
+
+static struct kmem_cache *object_cache;
+/* The main lock for protecting the object lists and radix trees */
+static DEFINE_SPINLOCK(memleak_lock);
+static cpumask_t memleak_cpu_mask = CPU_MASK_NONE;
+static atomic_t memleak_initialized = ATOMIC_INIT(0);
+static int __initdata preinit_pos;
+static struct memleak_preinit_object __initdata preinit_objects[PREINIT_OBJECTS];
+/* last allocated object (optimization); protected by memleak_lock */
+static struct memleak_object *last_object;
+static int reported_leaks;
+
+/* object flags */
+#define OBJECT_ALLOCATED 0x1
+#define OBJECT_TYPE_GUESSED 0x2
+
+/* Hash functions */
+static void hash_init(void)
+{
+ unsigned int i;
+ unsigned int hash_size = sizeof(*pointer_hash) * (1 << HASH_BITS);
+ unsigned int hash_order = fls(hash_size) - 1;
+
+ /* hash_size not a power of 2 */
+ if (hash_size & ((1 << hash_order) - 1))
+ hash_order += 1;
+ if (hash_order < PAGE_SHIFT)
+ hash_order = PAGE_SHIFT;
+
+ pointer_hash = (struct hlist_head *)
+ __get_free_pages(GFP_ATOMIC, hash_order - PAGE_SHIFT);
+ if (!pointer_hash)
+ panic("kmemleak: cannot allocate the pointer hash\n");
+
+ for (i = 0; i < (1 << HASH_BITS); i++)
+ INIT_HLIST_HEAD(&pointer_hash[i]);
+}
+
+static struct hash_node *__hash_lookup_node(unsigned long val)
+{
+ struct hlist_node *elem;
+ struct hash_node *hnode;
+ unsigned long index = hash_long(val, HASH_BITS);
+
+ hlist_for_each_entry(hnode, elem, &pointer_hash[index], node) {
+ if (hnode->val == val)
+ return hnode;
+ }
+ return NULL;
+}
+
+static int hash_insert(unsigned long val, void *object)
+{
+ unsigned long flags;
+ unsigned long index = hash_long(val, HASH_BITS);
+ struct hash_node *hnode = kmalloc(sizeof(*hnode), GFP_ATOMIC);
+
+ if (!hnode)
+ return -ENOMEM;
+ INIT_HLIST_NODE(&hnode->node);
+ hnode->val = val;
+ hnode->object = object;
+
+ spin_lock_irqsave(&memleak_lock, flags);
+ hlist_add_head(&hnode->node, &pointer_hash[index]);
+ spin_unlock_irqrestore(&memleak_lock, flags);
+
+ return 0;
+}
+
+static void *hash_delete(unsigned long val)
+{
+ unsigned long flags;
+ void *object = NULL;
+ struct hash_node *hnode;
+
+ spin_lock_irqsave(&memleak_lock, flags);
+ hnode = __hash_lookup_node(val);
+ if (hnode) {
+ object = hnode->object;
+ hlist_del(&hnode->node);
+ }
+ spin_unlock_irqrestore(&memleak_lock, flags);
+
+ kfree(hnode);
+ return object;
+}
+
+/* memleak_lock held by the calling function and interrupts disabled */
+static void *hash_lookup(unsigned long val)
+{
+ struct hash_node *hnode;
+
+ BUG_ON_LOCKING(!irqs_disabled());
+ BUG_ON_LOCKING(!spin_is_locked(&memleak_lock));
+
+ hnode = __hash_lookup_node(val);
+ if (hnode)
+ return hnode->object;
+ return NULL;
+}
+
+/* helper macros to avoid recursive calls. After disabling the
+ * interrupts, the only calls to this function on the same CPU should
+ * be from kmemleak itself and we can either ignore them or
+ * panic. Calls from other CPUs should be protected by spinlocks */
+#define recursive_enter(cpu_id, flags) ({ \
+ local_irq_save(flags); \
+ cpu_id = get_cpu(); \
+ cpu_test_and_set(cpu_id, memleak_cpu_mask); \
+})
+
+#define recursive_clear(cpu_id) do { \
+ cpu_clear(cpu_id, memleak_cpu_mask); \
+} while (0)
+
+#define recursive_exit(flags) do { \
+ put_cpu_no_resched(); \
+ local_irq_restore(flags); \
+} while (0)
+
+/* Object colors, encoded with count and ref_count:
+ * - white - orphan object, i.e. not enough references to it (ref_count >= 1)
+ * - gray - referenced at least once and therefore non-orphan (ref_count == 0)
+ * - black - ignore; it doesn't contain references (text section) (ref_count == -1) */
+static inline int color_white(const struct memleak_object *object)
+{
+ return object->count != -1 && object->count < object->ref_count;
+}
+
+static inline int color_gray(const struct memleak_object *object)
+{
+ return object->ref_count != -1 && object->count >= object->ref_count;
+}
+
+static inline int color_black(const struct memleak_object *object)
+{
+ return object->ref_count == -1;
+}
+
+#ifdef DEBUG
+static inline void dump_object_internals(struct memleak_object *object)
+{
+ struct memleak_alias *alias;
+ struct hlist_node *elem;
+
+	printk(KERN_NOTICE "  size = %zu\n", object->size);
+ printk(KERN_NOTICE " ref_count = %d\n", object->ref_count);
+ printk(KERN_NOTICE " count = %d\n", object->count);
+ printk(KERN_NOTICE " aliases:\n");
+ if (object->alias_list) {
+ hlist_for_each_entry(alias, elem, object->alias_list, node)
+ printk(KERN_NOTICE " 0x%lx\n", alias->offset);
+ }
+}
+#else
+static inline void dump_object_internals(struct memleak_object *object)
+{ }
+#endif
+
+static void dump_object_info(struct memleak_object *object)
+{
+ struct stack_trace trace;
+
+ trace.nr_entries = object->trace_len;
+ trace.entries = object->trace;
+
+ printk(KERN_NOTICE "kmemleak: object 0x%08lx:\n", object->pointer);
+ dump_object_internals(object);
+ printk(KERN_NOTICE " trace:\n");
+ print_stack_trace(&trace, 4);
+}
+
+/* Insert an element into the aliases radix tree.
+ * Return 0 on success. */
+static int insert_alias(unsigned long type_id, unsigned long offset)
+{
+ int ret = 0;
+ struct hlist_head *alias_list;
+ struct hlist_node *elem;
+ struct memleak_alias *alias;
+ unsigned long flags;
+ unsigned int cpu_id;
+
+ if (type_id == 0 || offset == 0 || offset >= ml_sizeof(type_id))
+ return -EINVAL;
+
+ if (recursive_enter(cpu_id, flags))
+ BUG();
+ write_lock(&alias_tree_lock);
+
+ offset &= ~(BYTES_PER_WORD - 1);
+
+ alias_list = radix_tree_lookup(&alias_tree, type_id);
+ if (!alias_list) {
+ /* no alias list for this type id. Allocate list_head
+ * and insert into the radix tree */
+ alias_list = kmalloc(sizeof(*alias_list), GFP_ATOMIC);
+ if (!alias_list)
+ panic("kmemleak: cannot allocate alias_list\n");
+ INIT_HLIST_HEAD(alias_list);
+
+ ret = radix_tree_insert(&alias_tree, type_id, alias_list);
+ if (ret)
+ panic("kmemleak: cannot insert into the aliases radix tree: %d\n", ret);
+ }
+
+ hlist_for_each_entry(alias, elem, alias_list, node) {
+ if (alias->offset == offset) {
+ ret = -EEXIST;
+ goto out;
+ }
+ }
+
+ alias = kmalloc(sizeof(*alias), GFP_ATOMIC);
+ if (!alias)
+ panic("kmemleak: cannot allocate initial memory\n");
+ INIT_HLIST_NODE(&alias->node);
+ alias->offset = offset;
+
+ hlist_add_head_rcu(&alias->node, alias_list);
+
+ out:
+ write_unlock(&alias_tree_lock);
+ recursive_clear(cpu_id);
+ recursive_exit(flags);
+
+ return ret;
+}
+
+/* Insert pointer aliases from the given array */
+void memleak_insert_aliases(struct memleak_offset *ml_off_start,
+ struct memleak_offset *ml_off_end)
+{
+ struct memleak_offset *ml_off;
+ int i = 0;
+#ifdef CONFIG_DEBUG_MEMLEAK_SECONDARY_ALIASES
+ unsigned long flags;
+#endif
+
+ pr_debug("%s(0x%p, 0x%p)\n", __FUNCTION__, ml_off_start, ml_off_end);
+
+ /* primary aliases - container_of(member) */
+ for (ml_off = ml_off_start; ml_off < ml_off_end; ml_off++)
+ if (!insert_alias(ml_off->type_id, ml_off->offset))
+ i++;
+ pr_debug("kmemleak: found %d primary alias(es)\n", i);
+
+#ifdef CONFIG_DEBUG_MEMLEAK_SECONDARY_ALIASES
+ /* secondary aliases - container_of(container_of(member)) */
+ for (ml_off = ml_off_start; ml_off < ml_off_end; ml_off++) {
+ struct hlist_head *alias_list;
+ struct memleak_alias *alias;
+ struct hlist_node *elem;
+
+ /* with imprecise type identification, if the member
+ * id is the same as the outer structure id, just
+ * ignore as any potential aliases are already in the
+ * tree */
+ if (ml_off->member_type_id == ml_off->type_id)
+ continue;
+
+ read_lock_irqsave(&alias_tree_lock, flags);
+ alias_list = radix_tree_lookup(&alias_tree, ml_off->member_type_id);
+ read_unlock_irqrestore(&alias_tree_lock, flags);
+ if (!alias_list)
+ continue;
+
+ rcu_read_lock();
+ hlist_for_each_entry_rcu(alias, elem, alias_list, node)
+ if (!insert_alias(ml_off->type_id, ml_off->offset + alias->offset))
+ i++;
+ rcu_read_unlock();
+ }
+ pr_debug("kmemleak: found %d alias(es)\n", i);
+#endif
+}
+EXPORT_SYMBOL_GPL(memleak_insert_aliases);
+
+/* called with interrupts disabled */
+static inline struct memleak_object *get_cached_object(unsigned long ptr)
+{
+ struct memleak_object *object;
+
+ BUG_ON_LOCKING(!irqs_disabled());
+
+ spin_lock(&memleak_lock);
+ if (!last_object || ptr != last_object->pointer)
+ last_object = hash_lookup(ptr);
+ object = last_object;
+ spin_unlock(&memleak_lock);
+
+ return object;
+}
+
+/* no need for atomic operations since memleak_lock is already held
+ * and interrupts disabled. Return 1 if successful or 0 otherwise */
+static inline int get_object(struct memleak_object *object)
+{
+ BUG_ON_LOCKING(!irqs_disabled());
+ BUG_ON_LOCKING(!spin_is_locked(&memleak_lock));
+
+ if (object->use_count != 0)
+ object->use_count++;
+ return object->use_count != 0;
+}
+
+static void free_object_rcu(struct rcu_head *rcu)
+{
+ unsigned long flags;
+ unsigned int cpu_id;
+ struct hlist_node *elem, *tmp;
+ struct memleak_scan_area *area;
+ struct memleak_object *object =
+ container_of(rcu, struct memleak_object, rcu);
+
+ if (recursive_enter(cpu_id, flags))
+ BUG();
+
+ /* once use_count is 0, there is no code accessing the object */
+ hlist_for_each_entry_safe(area, elem, tmp, &object->area_list, node) {
+ hlist_del(elem);
+ kfree(area);
+ }
+ kmem_cache_free(object_cache, object);
+
+ recursive_clear(cpu_id);
+ recursive_exit(flags);
+}
+
+/* called without memleak_lock held */
+static void put_object(struct memleak_object *object)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&memleak_lock, flags);
+
+ if (--object->use_count > 0)
+ goto out;
+
+ /* should only get here after delete_object was called */
+ BUG_ON(object->flags & OBJECT_ALLOCATED);
+
+ /* the last reference to this object */
+ list_del_rcu(&object->object_list);
+ call_rcu(&object->rcu, free_object_rcu);
+
+ out:
+ spin_unlock_irqrestore(&memleak_lock, flags);
+}
+
+/* called with interrupts disabled (no need to hold memleak_lock as
+ * the pointer alias functions cannot be called concurrently on the
+ * same object) */
+static void delete_pointer_aliases(struct memleak_object *object)
+{
+ struct memleak_alias *alias;
+ struct hlist_node *elem;
+
+ BUG_ON_LOCKING(!irqs_disabled());
+
+ if (object->offset)
+ hash_delete(object->pointer + object->offset);
+
+ if (object->alias_list) {
+ rcu_read_lock();
+ hlist_for_each_entry_rcu(alias, elem, object->alias_list, node)
+ hash_delete(object->pointer
+ + object->offset + alias->offset);
+ rcu_read_unlock();
+ object->alias_list = NULL;
+ }
+}
+
+/* called with interrupts disabled (see above for why memleak_lock
+ * doesn't need to be held) */
+static void create_pointer_aliases(struct memleak_object *object)
+{
+ struct memleak_alias *alias;
+ struct hlist_node *elem;
+ int err;
+
+ BUG_ON_LOCKING(!irqs_disabled());
+
+ if (object->offset) {
+ err = hash_insert(object->pointer + object->offset, object);
+ if (err) {
+ dump_stack();
+ panic("kmemleak: cannot insert offset into the pointer hash table: %d\n", err);
+ }
+ }
+
+ read_lock(&alias_tree_lock);
+ object->alias_list = radix_tree_lookup(&alias_tree, object->type_id);
+ read_unlock(&alias_tree_lock);
+
+ if (object->alias_list) {
+ rcu_read_lock();
+ hlist_for_each_entry_rcu(alias, elem, object->alias_list, node) {
+ err = hash_insert(object->pointer + object->offset
+ + alias->offset, object);
+ if (err) {
+ dump_stack();
+ panic("kmemleak: cannot insert alias into the pointer hash table: %d\n", err);
+ }
+ }
+ rcu_read_unlock();
+ }
+}
+
+/* Insert a pointer and its aliases into the pointer hash table */
+static inline void create_object(unsigned long ptr, size_t size, int ref_count)
+{
+ struct memleak_object *object;
+ int err;
+ struct stack_trace trace;
+
+ BUG_ON_LOCKING(!irqs_disabled());
+
+ object = kmem_cache_alloc(object_cache, GFP_ATOMIC);
+ if (!object)
+ panic("kmemleak: cannot allocate a memleak_object structure\n");
+
+ INIT_LIST_HEAD(&object->object_list);
+ INIT_LIST_HEAD(&object->gray_list);
+ INIT_HLIST_HEAD(&object->area_list);
+ object->flags = OBJECT_TYPE_GUESSED;
+ object->use_count = 1;
+ object->pointer = ptr;
+ object->offset = 0;
+ object->size = size;
+ object->type_id = ml_guess_typeid(size); /* type id approximation */
+ object->ref_count = ref_count;
+ object->count = -1;
+ object->report_thld = CONFIG_DEBUG_MEMLEAK_REPORT_THLD;
+ object->alias_list = NULL;
+
+ trace.max_entries = MAX_TRACE;
+ trace.nr_entries = 0;
+ trace.entries = object->trace;
+ trace.skip = 2;
+ trace.all_contexts = 0;
+ save_stack_trace(&trace, NULL);
+
+ object->trace_len = trace.nr_entries;
+
+ spin_lock(&memleak_lock);
+ /* object->use_count already set to 1 */
+ list_add_tail_rcu(&object->object_list, &object_list);
+ spin_unlock(&memleak_lock);
+
+ err = hash_insert(ptr, object);
+ if (err) {
+ dump_stack();
+ if (err == -EEXIST) {
+ printk(KERN_NOTICE "Existing pointer:\n");
+ spin_lock(&memleak_lock);
+ object = hash_lookup(ptr);
+ dump_object_info(object);
+ spin_unlock(&memleak_lock);
+ }
+ panic("kmemleak: cannot insert 0x%lx into the pointer hash table: %d\n",
+ ptr, err);
+ }
+
+ create_pointer_aliases(object);
+
+ /* everything completed fine, just mark the object as allocated */
+ spin_lock(&memleak_lock);
+ object->flags |= OBJECT_ALLOCATED;
+ last_object = object;
+ spin_unlock(&memleak_lock);
+}
+
+/* Remove a pointer and its aliases from the pointer hash table */
+static inline void delete_object(unsigned long ptr)
+{
+ struct memleak_object *object;
+
+ BUG_ON_LOCKING(!irqs_disabled());
+
+ object = hash_delete(ptr);
+ if (!object) {
+ dump_stack();
+ printk(KERN_WARNING "kmemleak: freeing unknown object at 0x%08lx\n", ptr);
+ return;
+ }
+
+ spin_lock(&memleak_lock);
+
+ if (object->pointer != ptr) {
+ dump_stack();
+ dump_object_info(object);
+ panic("kmemleak: freeing object by alias 0x%08lx\n", ptr);
+ }
+ BUG_ON(!(object->flags & OBJECT_ALLOCATED));
+
+ object->flags &= ~OBJECT_ALLOCATED;
+
+ /* invalidate the cached object if it is being deleted */
+ if (last_object && ptr == last_object->pointer)
+ last_object = NULL;
+
+#ifdef CONFIG_DEBUG_MEMLEAK_ORPHAN_FREEING
+ if (color_white(object)) {
+ dump_stack();
+ dump_object_info(object);
+ printk(KERN_WARNING "kmemleak: freeing orphan object 0x%08lx\n", ptr);
+ }
+#endif
+
+ spin_unlock(&memleak_lock);
+
+ delete_pointer_aliases(object);
+ object->pointer = 0;
+ put_object(object);
+}
+
+/* Re-create the pointer aliases according to the new size/offset
+ * information */
+static inline void unpad_object(unsigned long ptr, unsigned long offset,
+ size_t size)
+{
+ struct memleak_object *object;
+
+ BUG_ON_LOCKING(!irqs_disabled());
+
+ object = get_cached_object(ptr);
+ if (!object) {
+ dump_stack();
+ panic("kmemleak: resizing unknown object at 0x%08lx\n", ptr);
+ }
+ if (object->pointer != ptr) {
+ dump_stack();
+ dump_object_info(object);
+ panic("kmemleak: resizing object by alias 0x%08lx\n", ptr);
+ }
+ if (offset + size > object->size) {
+ dump_stack();
+ dump_object_info(object);
+ panic("kmemleak: new boundaries exceed object 0x%08lx\n", ptr);
+ }
+
+ /* nothing changed */
+ if (offset == object->offset && size == object->size)
+ return;
+
+ /* re-create the pointer aliases */
+ delete_pointer_aliases(object);
+
+ spin_lock(&memleak_lock);
+ object->offset = offset;
+ object->size = size;
+ if (object->flags & OBJECT_TYPE_GUESSED)
+ object->type_id = ml_guess_typeid(size);
+ spin_unlock(&memleak_lock);
+
+ create_pointer_aliases(object);
+}
+
+/* Make an object permanently gray (false positive) */
+static inline void make_gray_object(unsigned long ptr)
+{
+ struct memleak_object *object;
+
+ BUG_ON_LOCKING(!irqs_disabled());
+
+ object = get_cached_object(ptr);
+ if (!object) {
+ dump_stack();
+ panic("kmemleak: graying unknown object at 0x%08lx\n", ptr);
+ }
+ if (object->pointer != ptr) {
+ dump_stack();
+ dump_object_info(object);
+ panic("kmemleak: graying object by alias 0x%08lx\n", ptr);
+ }
+
+ spin_lock(&memleak_lock);
+ object->ref_count = 0;
+ spin_unlock(&memleak_lock);
+}
+
+/* Mark the object as black */
+static inline void make_black_object(unsigned long ptr)
+{
+ struct memleak_object *object;
+
+ BUG_ON_LOCKING(!irqs_disabled());
+
+ object = get_cached_object(ptr);
+ if (!object) {
+ dump_stack();
+ panic("kmemleak: blacking unknown object at 0x%08lx\n", ptr);
+ }
+ if (object->pointer != ptr) {
+ dump_stack();
+ dump_object_info(object);
+ panic("kmemleak: blacking object by alias 0x%08lx\n", ptr);
+ }
+
+ spin_lock(&memleak_lock);
+ object->ref_count = -1;
+ spin_unlock(&memleak_lock);
+}
+
+/* Add a scanning area to the object */
+static inline void add_scan_area(unsigned long ptr, unsigned long offset, size_t length)
+{
+ struct memleak_object *object;
+ struct memleak_scan_area *area;
+
+ BUG_ON_LOCKING(!irqs_disabled());
+
+ area = kmalloc(sizeof(*area), GFP_ATOMIC);
+ if (!area)
+ panic("kmemleak: cannot allocate a scan area\n");
+
+ INIT_HLIST_NODE(&area->node);
+ area->offset = offset;
+ area->length = length;
+
+ object = get_cached_object(ptr);
+ if (!object) {
+ dump_stack();
+ panic("kmemleak: adding scan area to unknown object at 0x%08lx\n", ptr);
+ }
+ if (object->pointer != ptr) {
+ dump_stack();
+ dump_object_info(object);
+ panic("kmemleak: adding scan area to object by alias 0x%08lx\n", ptr);
+ }
+ if (offset + length > object->size) {
+ dump_stack();
+ dump_object_info(object);
+ panic("kmemleak: scan area larger than object 0x%08lx\n", ptr);
+ }
+
+ spin_lock(&memleak_lock);
+ hlist_add_head(&area->node, &object->area_list);
+ spin_unlock(&memleak_lock);
+}
+
+/* Re-create the pointer aliases according to the new type id */
+static inline void change_type_id(unsigned long ptr, unsigned long type_id)
+{
+ struct memleak_object *object;
+
+ BUG_ON_LOCKING(!irqs_disabled());
+
+ object = get_cached_object(ptr);
+ if (!object) {
+ dump_stack();
+ panic("kmemleak: changing type of unknown object at 0x%08lx\n", ptr);
+ }
+ if (object->pointer != ptr) {
+ dump_stack();
+ dump_object_info(object);
+ panic("kmemleak: changing type of object by alias 0x%08lx\n", ptr);
+ }
+ if (ml_sizeof(type_id) > object->size) {
+ dump_stack();
+ dump_object_info(object);
+ panic("kmemleak: new type larger than object 0x%08lx\n", ptr);
+ }
+
+ spin_lock(&memleak_lock);
+ if (type_id == object->type_id) {
+ /* type unchanged; only clear the guessed flag */
+ object->flags &= ~OBJECT_TYPE_GUESSED;
+ spin_unlock(&memleak_lock);
+ return;
+ }
+ object->type_id = type_id;
+ object->flags &= ~OBJECT_TYPE_GUESSED;
+ spin_unlock(&memleak_lock);
+
+ delete_pointer_aliases(object);
+ create_pointer_aliases(object);
+}
+
+/* Allocation function hook */
+void memleak_alloc(const void *ptr, size_t size, int ref_count)
+{
+ unsigned long flags;
+ unsigned int cpu_id;
+
+ if (!ptr)
+ return;
+
+ if (recursive_enter(cpu_id, flags))
+ goto out;
+
+ pr_debug("%s(0x%p, %d, %d)\n", __FUNCTION__, ptr, size, ref_count);
+
+ if (!atomic_read(&memleak_initialized)) {
+ /* no need for SMP locking since this code runs
+ * before the other CPUs are started */
+ struct memleak_preinit_object *object;
+
+ BUG_ON(cpu_id != 0);
+
+ if (preinit_pos < PREINIT_OBJECTS) {
+ object = &preinit_objects[preinit_pos];
+
+ object->type = MEMLEAK_ALLOC;
+ object->pointer = ptr;
+ object->size = size;
+ object->ref_count = ref_count;
+ }
+ preinit_pos++;
+
+ goto clear;
+ }
+
+ create_object((unsigned long)ptr, size, ref_count);
+
+ clear:
+ recursive_clear(cpu_id);
+ out:
+ recursive_exit(flags);
+}
+EXPORT_SYMBOL_GPL(memleak_alloc);
+
+/* Freeing function hook */
+void memleak_free(const void *ptr)
+{
+ unsigned long flags;
+ unsigned int cpu_id;
+
+ if (!ptr)
+ return;
+
+ if (recursive_enter(cpu_id, flags))
+ goto out;
+
+ pr_debug("%s(0x%p)\n", __FUNCTION__, ptr);
+
+ if (!atomic_read(&memleak_initialized)) {
+ struct memleak_preinit_object *object;
+
+ BUG_ON(cpu_id != 0);
+
+ if (preinit_pos < PREINIT_OBJECTS) {
+ object = &preinit_objects[preinit_pos];
+
+ object->type = MEMLEAK_FREE;
+ object->pointer = ptr;
+ }
+ preinit_pos++;
+
+ goto clear;
+ }
+
+ delete_object((unsigned long)ptr);
+
+ clear:
+ recursive_clear(cpu_id);
+ out:
+ recursive_exit(flags);
+}
+EXPORT_SYMBOL_GPL(memleak_free);
+
+/* Change the size and location information of an allocated memory
+ * object (this is needed for allocations padding the object) */
+void memleak_padding(const void *ptr, unsigned long offset, size_t size)
+{
+ unsigned long flags;
+ unsigned int cpu_id;
+
+ if (!ptr)
+ return;
+
+ if (recursive_enter(cpu_id, flags))
+ goto out;
+
+ pr_debug("%s(0x%p, %d)\n", __FUNCTION__, ptr, size);
+
+ if (!atomic_read(&memleak_initialized)) {
+ struct memleak_preinit_object *object;
+
+ BUG_ON(cpu_id != 0);
+
+ if (preinit_pos < PREINIT_OBJECTS) {
+ object = &preinit_objects[preinit_pos];
+
+ object->type = MEMLEAK_PADDING;
+ object->pointer = ptr;
+ object->offset = offset;
+ object->size = size;
+ }
+ preinit_pos++;
+
+ goto clear;
+ }
+
+ unpad_object((unsigned long)ptr, offset, size);
+
+ clear:
+ recursive_clear(cpu_id);
+ out:
+ recursive_exit(flags);
+}
+EXPORT_SYMBOL(memleak_padding);
+
+/* Mark an object as a false positive */
+void memleak_not_leak(const void *ptr)
+{
+ unsigned long flags;
+ unsigned int cpu_id;
+
+ if (!ptr)
+ return;
+
+ if (recursive_enter(cpu_id, flags))
+ goto out;
+
+ pr_debug("%s(0x%p)\n", __FUNCTION__, ptr);
+
+ if (!atomic_read(&memleak_initialized)) {
+ struct memleak_preinit_object *object;
+
+ BUG_ON(cpu_id != 0);
+
+ if (preinit_pos < PREINIT_OBJECTS) {
+ object = &preinit_objects[preinit_pos];
+
+ object->type = MEMLEAK_NOT_LEAK;
+ object->pointer = ptr;
+ }
+ preinit_pos++;
+
+ goto clear;
+ }
+
+ make_gray_object((unsigned long)ptr);
+
+ clear:
+ recursive_clear(cpu_id);
+ out:
+ recursive_exit(flags);
+}
+EXPORT_SYMBOL(memleak_not_leak);
+
+/* Ignore this memory object */
+void memleak_ignore(const void *ptr)
+{
+ unsigned long flags;
+ unsigned int cpu_id;
+
+ if (!ptr)
+ return;
+
+ if (recursive_enter(cpu_id, flags))
+ goto out;
+
+ pr_debug("%s(0x%p)\n", __FUNCTION__, ptr);
+
+ if (!atomic_read(&memleak_initialized)) {
+ struct memleak_preinit_object *object;
+
+ BUG_ON(cpu_id != 0);
+
+ if (preinit_pos < PREINIT_OBJECTS) {
+ object = &preinit_objects[preinit_pos];
+
+ object->type = MEMLEAK_IGNORE;
+ object->pointer = ptr;
+ }
+ preinit_pos++;
+
+ goto clear;
+ }
+
+ make_black_object((unsigned long)ptr);
+
+ clear:
+ recursive_clear(cpu_id);
+ out:
+ recursive_exit(flags);
+}
+EXPORT_SYMBOL(memleak_ignore);
+
+/* Add a scanning area to an object */
+void memleak_scan_area(const void *ptr, unsigned long offset, size_t length)
+{
+ unsigned long flags;
+ unsigned int cpu_id;
+
+ if (!ptr)
+ return;
+
+ if (recursive_enter(cpu_id, flags))
+ goto out;
+
+ pr_debug("%s(0x%p)\n", __FUNCTION__, ptr);
+
+ if (!atomic_read(&memleak_initialized)) {
+ struct memleak_preinit_object *object;
+
+ BUG_ON(cpu_id != 0);
+
+ if (preinit_pos < PREINIT_OBJECTS) {
+ object = &preinit_objects[preinit_pos];
+
+ object->type = MEMLEAK_SCAN_AREA;
+ object->pointer = ptr;
+ object->offset = offset;
+ object->size = length;
+ }
+ preinit_pos++;
+
+ goto clear;
+ }
+
+ add_scan_area((unsigned long)ptr, offset, length);
+
+ clear:
+ recursive_clear(cpu_id);
+ out:
+ recursive_exit(flags);
+}
+EXPORT_SYMBOL(memleak_scan_area);
+
+/* Change the type id of an allocated memory object */
+void memleak_typeid_raw(const void *ptr, unsigned long type_id)
+{
+ unsigned long flags;
+ unsigned int cpu_id;
+
+ if (!ptr)
+ return;
+ if (!type_id)
+ return;
+
+ if (recursive_enter(cpu_id, flags))
+ goto out;
+
+ pr_debug("%s(0x%p, %ld)\n", __FUNCTION__, ptr, type_id);
+
+ if (!atomic_read(&memleak_initialized)) {
+ struct memleak_preinit_object *object;
+
+ BUG_ON(cpu_id != 0);
+
+ if (preinit_pos < PREINIT_OBJECTS) {
+ object = &preinit_objects[preinit_pos];
+
+ object->type = MEMLEAK_TYPEID;
+ object->pointer = ptr;
+ object->type_id = type_id;
+ }
+ preinit_pos++;
+
+ goto clear;
+ }
+
+ change_type_id((unsigned long)ptr, type_id);
+
+ clear:
+ recursive_clear(cpu_id);
+ out:
+ recursive_exit(flags);
+}
+EXPORT_SYMBOL(memleak_typeid_raw);
+
+/* Scan a block of memory (exclusive range) for pointers and move
+ * those found to the gray list. This function is called with
+ * memleak_lock held and interrupts disabled */
+static void __scan_block(void *_start, void *_end)
+{
+ unsigned long *ptr;
+ unsigned long *start = (unsigned long *)ALIGN((unsigned long)_start,
+ BYTES_PER_WORD);
+ unsigned long *end = _end;
+
+ BUG_ON_LOCKING(!irqs_disabled());
+ BUG_ON_LOCKING(!spin_is_locked(&memleak_lock));
+
+ for (ptr = start; ptr < end; ptr++) {
+ struct memleak_object *object =
+ hash_lookup((*ptr) & ~(BYTES_PER_WORD - 1));
+ if (!object)
+ continue;
+ if (!color_white(object))
+ continue;
+
+ object->count++;
+ /* this can also happen during the gray_list traversal */
+ if (color_gray(object)) {
+ /* found in the hash, get_object() returns 1 */
+ get_object(object);
+ object->report_thld++;
+ list_add_tail(&object->gray_list, &gray_list);
+ }
+ }
+}
+
+static void scan_block(void *start, void *end)
+{
+ unsigned long flags;
+ void *s, *e;
+
+ s = start;
+ while (s < end) {
+ e = s + SCAN_BLOCK_SIZE;
+
+ spin_lock_irqsave(&memleak_lock, flags);
+ __scan_block(s, e < end ? e : end);
+ spin_unlock_irqrestore(&memleak_lock, flags);
+
+ s = e;
+ }
+}
+
+/* Scan a memory block represented by a memleak_object */
+static inline void scan_object(struct memleak_object *object)
+{
+ struct memleak_scan_area *area;
+ struct hlist_node *elem;
+ unsigned long flags;
+
+ spin_lock_irqsave(&memleak_lock, flags);
+
+ /* freed object */
+ if (!(object->flags & OBJECT_ALLOCATED))
+ goto out;
+
+ if (hlist_empty(&object->area_list))
+ __scan_block((void *)(object->pointer + object->offset),
+ (void *)(object->pointer + object->offset
+ + object->size));
+ else
+ hlist_for_each_entry(area, elem, &object->area_list, node)
+ __scan_block((void *)(object->pointer + area->offset),
+ (void *)(object->pointer + area->offset
+ + area->length));
+
+ out:
+ spin_unlock_irqrestore(&memleak_lock, flags);
+}
+
+/* Scan the memory and print the orphan objects */
+static void memleak_scan(void)
+{
+ unsigned long flags;
+ struct memleak_object *object, *tmp;
+#ifdef CONFIG_DEBUG_MEMLEAK_TASK_STACKS
+ struct task_struct *task;
+#endif
+ int i;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(object, &object_list, object_list) {
+ spin_lock_irqsave(&memleak_lock, flags);
+
+ /* there should be a maximum of 1 reference to any
+ * object at this point */
+ BUG_ON(object->use_count > 1);
+
+ /* reset the reference count (whiten the object) */
+ object->count = 0;
+ if (color_gray(object) && get_object(object))
+ list_add_tail(&object->gray_list, &gray_list);
+ else
+ object->report_thld--;
+
+ spin_unlock_irqrestore(&memleak_lock, flags);
+ }
+ rcu_read_unlock();
+
+ /* data/bss scanning */
+ scan_block(_sdata, _edata);
+ scan_block(__bss_start, __bss_stop);
+
+#ifdef CONFIG_SMP
+ /* per-cpu scanning */
+ for_each_possible_cpu(i)
+ scan_block(__per_cpu_start + per_cpu_offset(i),
+ __per_cpu_end + per_cpu_offset(i));
+#endif
+
+ /* mem_map scanning */
+ for_each_online_node(i) {
+ struct page *page, *end;
+
+ page = NODE_MEM_MAP(i);
+ end = page + NODE_DATA(i)->node_spanned_pages;
+
+ scan_block(page, end);
+ }
+
+#ifdef CONFIG_DEBUG_MEMLEAK_TASK_STACKS
+ read_lock(&tasklist_lock);
+ for_each_process(task)
+ scan_block(task_stack_page(task),
+ task_stack_page(task) + THREAD_SIZE);
+ read_unlock(&tasklist_lock);
+#endif
+
+ /* scan the objects already referenced. More objects will be
+ * referenced and, if there are no memory leaks, all the
+ * objects will be scanned. The list traversal is safe for
+ * both tail additions and removals from inside the loop. The
+ * memleak objects cannot be freed from outside the loop
+ * because their use_count was increased */
+ object = list_entry(gray_list.next, typeof(*object), gray_list);
+ while (&object->gray_list != &gray_list) {
+ /* may add new objects to the list */
+ scan_object(object);
+
+ tmp = list_entry(object->gray_list.next, typeof(*object),
+ gray_list);
+
+ /* remove the object from the list and release it */
+ list_del(&object->gray_list);
+ put_object(object);
+
+ object = tmp;
+ }
+ BUG_ON(!list_empty(&gray_list));
+}
+
+static void *memleak_seq_start(struct seq_file *seq, loff_t *pos)
+{
+ struct memleak_object *object;
+ loff_t n = *pos;
+ unsigned long flags;
+
+ if (!n) {
+ memleak_scan();
+ reported_leaks = 0;
+ }
+ if (reported_leaks >= CONFIG_DEBUG_MEMLEAK_REPORTS_NR)
+ return NULL;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(object, &object_list, object_list) {
+ if (n-- > 0)
+ continue;
+
+ spin_lock_irqsave(&memleak_lock, flags);
+ if (get_object(object)) {
+ spin_unlock_irqrestore(&memleak_lock, flags);
+ goto out;
+ }
+ spin_unlock_irqrestore(&memleak_lock, flags);
+ }
+ object = NULL;
+ out:
+ rcu_read_unlock();
+ return object;
+}
+
+static void *memleak_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+ struct list_head *n;
+ struct memleak_object *next = NULL;
+ unsigned long flags;
+
+ ++(*pos);
+ if (reported_leaks >= CONFIG_DEBUG_MEMLEAK_REPORTS_NR)
+ goto out;
+
+ spin_lock_irqsave(&memleak_lock, flags);
+
+ n = ((struct memleak_object *)v)->object_list.next;
+ if (n != &object_list) {
+ next = list_entry(n, struct memleak_object, object_list);
+ /* still in the object_list, get_object() returns 1 */
+ get_object(next);
+ }
+
+ spin_unlock_irqrestore(&memleak_lock, flags);
+
+ out:
+ put_object(v);
+ return next;
+}
+
+static void memleak_seq_stop(struct seq_file *seq, void *v)
+{
+ if (v)
+ put_object(v);
+}
+
+static int memleak_seq_show(struct seq_file *seq, void *v)
+{
+ const struct memleak_object *object = v;
+ unsigned long flags;
+ char namebuf[KSYM_NAME_LEN + 1] = "";
+ char *modname;
+ unsigned long symsize;
+ unsigned long offset = 0;
+ int i;
+
+ spin_lock_irqsave(&memleak_lock, flags);
+
+ if (!color_white(object))
+ goto out;
+ /* freed in the meantime (false positive) or just allocated */
+ if (!(object->flags & OBJECT_ALLOCATED))
+ goto out;
+ if (object->report_thld >= 0)
+ goto out;
+
+ reported_leaks++;
+ seq_printf(seq, "unreferenced object 0x%08lx (size %d):\n",
+ object->pointer, object->size);
+
+ for (i = 0; i < object->trace_len; i++) {
+ unsigned long trace = object->trace[i];
+
+ kallsyms_lookup(trace, &symsize, &offset, &modname, namebuf);
+ seq_printf(seq, " [<%08lx>] %s\n", trace, namebuf);
+ }
+
+ out:
+ spin_unlock_irqrestore(&memleak_lock, flags);
+ return 0;
+}
+
+static struct seq_operations memleak_seq_ops = {
+ .start = memleak_seq_start,
+ .next = memleak_seq_next,
+ .stop = memleak_seq_stop,
+ .show = memleak_seq_show,
+};
+
+static int memleak_seq_open(struct inode *inode, struct file *file)
+{
+ return seq_open(file, &memleak_seq_ops);
+}
+
+static struct file_operations memleak_fops = {
+ .owner = THIS_MODULE,
+ .open = memleak_seq_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+/* KMemLeak initialization. Set up the radix tree for the pointer aliases */
+void __init memleak_init(void)
+{
+ int i;
+ unsigned long flags;
+
+ hash_init();
+
+ object_cache = kmem_cache_create("memleak_object_cache", sizeof(struct memleak_object),
+ 0, SLAB_PANIC, NULL, NULL);
+ if (!object_cache)
+ panic("kmemleak: cannot create the object cache\n");
+
+ memleak_insert_aliases(__memleak_offsets_start, __memleak_offsets_end);
+
+ /* no need to hold the spinlock as SMP is not initialized
+ * yet. Holding it here would lead to a deadlock */
+ local_irq_save(flags);
+
+ atomic_set(&memleak_initialized, 1);
+
+ if (preinit_pos >= PREINIT_OBJECTS)
+ panic("kmemleak: preinit objects buffer overflow: %d\n",
+ preinit_pos);
+
+ /* execute the buffered memleak actions */
+ pr_debug("kmemleak: %d preinit actions\n", preinit_pos);
+ for (i = 0; i < preinit_pos; i++) {
+ struct memleak_preinit_object *object = &preinit_objects[i];
+
+ switch (object->type) {
+ case MEMLEAK_ALLOC:
+ memleak_alloc(object->pointer, object->size,
+ object->ref_count);
+ break;
+ case MEMLEAK_FREE:
+ memleak_free(object->pointer);
+ break;
+ case MEMLEAK_PADDING:
+ memleak_padding(object->pointer, object->offset,
+ object->size);
+ break;
+ case MEMLEAK_NOT_LEAK:
+ memleak_not_leak(object->pointer);
+ break;
+ case MEMLEAK_IGNORE:
+ memleak_ignore(object->pointer);
+ break;
+ case MEMLEAK_SCAN_AREA:
+ memleak_scan_area(object->pointer,
+ object->offset, object->size);
+ break;
+ case MEMLEAK_TYPEID:
+ memleak_typeid_raw(object->pointer, object->type_id);
+ break;
+ default:
+ BUG();
+ }
+ }
+
+ local_irq_restore(flags);
+
+ printk(KERN_INFO "Kernel memory leak detector initialized\n");
+}
+
+/* Late initialization function */
+int __init memleak_late_init(void)
+{
+ struct dentry *dentry;
+
+ dentry = debugfs_create_file("memleak", S_IRUGO, NULL, NULL,
+ &memleak_fops);
+ if (!dentry)
+ return -ENOMEM;
+
+ pr_debug("kmemleak: late initialization completed\n");
+
+ return 0;
+}
+late_initcall(memleak_late_init);
Some allocations are not memory leaks even though kmemleak cannot find
their main pointer during scanning (false positives). This patch
annotates several of them. For more information on false positives,
see Documentation/kmemleak.txt.
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/i386/kernel/setup.c | 2 ++
drivers/base/platform.c | 3 +++
drivers/char/vt.c | 4 ++++
drivers/hwmon/w83627hf.c | 4 ++++
drivers/scsi/hosts.c | 3 +++
drivers/video/console/fbcon.c | 3 +++
fs/ext3/dir.c | 3 +++
include/linux/percpu.h | 5 +++++
ipc/util.c | 6 ++++++
kernel/params.c | 8 +++++++-
net/core/dev.c | 6 ++++++
net/core/skbuff.c | 3 +++
net/ipv4/netfilter/ip_conntrack_core.c | 5 +++++
net/sched/sch_generic.c | 5 +++++
14 files changed, 59 insertions(+), 1 deletions(-)
diff --git a/arch/i386/kernel/setup.c b/arch/i386/kernel/setup.c
index 79df6e6..9f159d9 100644
--- a/arch/i386/kernel/setup.c
+++ b/arch/i386/kernel/setup.c
@@ -658,6 +658,8 @@ static __init int add_pcspkr(void)
int ret;
pd = platform_device_alloc("pcspkr", -1);
+ /* mark it as not a leak since this device doesn't need to be freed */
+ memleak_not_leak(pd);
if (!pd)
return -ENOMEM;
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index f9c903b..0e03452 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -166,6 +166,9 @@ struct platform_device *platform_device_
struct platform_object *pa;
pa = kzalloc(sizeof(struct platform_object) + strlen(name), GFP_KERNEL);
+ /* kmemleak cannot guess the object type because the block
+ * size is different from the object size */
+ memleak_typeid(pa, struct platform_object);
if (pa) {
strcpy(pa->name, name);
pa->pdev.name = pa->name;
diff --git a/drivers/char/vt.c b/drivers/char/vt.c
index 06c32a3..1d74ffe 100644
--- a/drivers/char/vt.c
+++ b/drivers/char/vt.c
@@ -2640,6 +2640,10 @@ static int __init con_init(void)
*/
for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) {
vc_cons[currcons].d = vc = alloc_bootmem(sizeof(struct vc_data));
+ /* kmemleak does not track the memory allocated via
+ * alloc_bootmem() but this block contains pointers to
+ * other blocks allocated via kmalloc */
+ memleak_alloc(vc, sizeof(struct vc_data), 1);
visual_init(vc, currcons, 1);
vc->vc_screenbuf = (unsigned short *)alloc_bootmem(vc->vc_screenbuf_size);
vc->vc_kmalloced = 0;
diff --git a/drivers/hwmon/w83627hf.c b/drivers/hwmon/w83627hf.c
index dfdc29c..6f5c70f 100644
--- a/drivers/hwmon/w83627hf.c
+++ b/drivers/hwmon/w83627hf.c
@@ -1097,6 +1097,10 @@ static int w83627hf_detect(struct i2c_ad
err = -ENOMEM;
goto ERROR1;
}
+ /* the pointer to a member is stored but the code does not
+ * use container_of for access, so the alias needs to be
+ * explicitly declared here */
+ memleak_container(struct w83627hf_data, client);
new_client = &data->client;
i2c_set_clientdata(new_client, data);
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 38c3a29..965dd14 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -304,6 +304,9 @@ struct Scsi_Host *scsi_host_alloc(struct
shost = kzalloc(sizeof(struct Scsi_Host) + privsize, gfp_mask);
if (!shost)
return NULL;
+ /* kmemleak cannot guess the object type because the block
+ * size is different from the object size */
+ memleak_typeid(shost, struct Scsi_Host);
shost->host_lock = &shost->default_lock;
spin_lock_init(shost->host_lock);
diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
index 31f476a..1949f2c 100644
--- a/drivers/video/console/fbcon.c
+++ b/drivers/video/console/fbcon.c
@@ -2485,6 +2485,9 @@ static int fbcon_set_font(struct vc_data
size = h * pitch * charcount;
new_data = kmalloc(FONT_EXTRA_WORDS * sizeof(int) + size, GFP_USER);
+ /* the stored pointer is different from the address of the
+ * allocated block because of padding */
+ memleak_padding(new_data, FONT_EXTRA_WORDS * sizeof(int), size);
if (!new_data)
return -ENOMEM;
diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c
index 665adee..13d2888 100644
--- a/fs/ext3/dir.c
+++ b/fs/ext3/dir.c
@@ -352,6 +352,9 @@ int ext3_htree_store_dirent(struct file
new_fn = kzalloc(len, GFP_KERNEL);
if (!new_fn)
return -ENOMEM;
+ /* kmemleak cannot guess the object type because the block
+ * size is different from the object size */
+ memleak_typeid(new_fn, struct fname);
new_fn->hash = hash;
new_fn->minor_hash = minor_hash;
new_fn->inode = le32_to_cpu(dirent->inode);
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 600e3d3..899776c 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -30,7 +30,12 @@ struct percpu_data {
void *ptrs[NR_CPUS];
};
+/* pointer disguising breaks kmemleak's object tracking */
+#ifndef CONFIG_DEBUG_MEMLEAK
#define __percpu_disguise(pdata) (struct percpu_data *)~(unsigned long)(pdata)
+#else
+#define __percpu_disguise(pdata) (struct percpu_data *)(pdata)
+#endif
/*
* Use this to get to a cpu's version of the per-cpu object dynamically
* allocated. Non-atomic access to the current CPU's version should
diff --git a/ipc/util.c b/ipc/util.c
index a9b7a22..7088618 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -492,6 +492,9 @@ void* ipc_rcu_alloc(int size)
*/
if (rcu_use_vmalloc(size)) {
out = vmalloc(HDRLEN_VMALLOC + size);
+ /* the stored pointer is different from the address of
+ * the allocated block because of padding */
+ memleak_padding(out, HDRLEN_VMALLOC, size);
if (out) {
out += HDRLEN_VMALLOC;
container_of(out, struct ipc_rcu_hdr, data)->is_vmalloc = 1;
@@ -499,6 +502,9 @@ void* ipc_rcu_alloc(int size)
}
} else {
out = kmalloc(HDRLEN_KMALLOC + size, GFP_KERNEL);
+ /* the stored pointer is different from the address of
+ * the allocated block because of padding */
+ memleak_padding(out, HDRLEN_KMALLOC, size);
if (out) {
out += HDRLEN_KMALLOC;
container_of(out, struct ipc_rcu_hdr, data)->is_vmalloc = 0;
diff --git a/kernel/params.c b/kernel/params.c
index f406655..1510d89 100644
--- a/kernel/params.c
+++ b/kernel/params.c
@@ -548,6 +548,7 @@ static void __init kernel_param_sysfs_se
{
struct module_kobject *mk;
int ret;
+ struct module_param_attrs *mp;
mk = kzalloc(sizeof(struct module_kobject), GFP_KERNEL);
BUG_ON(!mk);
@@ -558,8 +559,13 @@ static void __init kernel_param_sysfs_se
ret = kobject_register(&mk->kobj);
BUG_ON(ret < 0);
+ mp = param_sysfs_setup(mk, kparam, num_params, name_skip);
+ /* this structure is not freed but the pointer is
+ * lost. However, there are other pointers to its members and
+ * the object has to be kept */
+ memleak_not_leak(mp);
/* no need to keep the kobject if no parameter is exported */
- if (!param_sysfs_setup(mk, kparam, num_params, name_skip)) {
+ if (!mp) {
kobject_unregister(&mk->kobj);
kfree(mk);
}
diff --git a/net/core/dev.c b/net/core/dev.c
index e660cb5..96fbe2c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3191,6 +3191,12 @@ struct net_device *alloc_netdev(int size
dev = (struct net_device *)
(((long)p + NETDEV_ALIGN_CONST) & ~NETDEV_ALIGN_CONST);
dev->padded = (char *)dev - (char *)p;
+ /* kmemleak cannot guess the object type because the block
+ * size is different from the object size. The stored pointer
+ * is also different from the address of the allocated block
+ * because of padding */
+ memleak_padding(p, dev->padded, alloc_size - dev->padded);
+ memleak_typeid(p, struct net_device);
if (sizeof_priv)
dev->priv = netdev_priv(dev);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index de7801d..14de9cb 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -153,6 +153,9 @@ struct sk_buff *__alloc_skb(unsigned int
/* Get the HEAD */
skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
+ /* the skbuff_fclone_cache contains objects larger than
+ * "struct sk_buff" and kmemleak cannot guess the type */
+ memleak_typeid(skb, struct sk_buff);
if (!skb)
goto out;
diff --git a/net/ipv4/netfilter/ip_conntrack_core.c b/net/ipv4/netfilter/ip_conntrack_core.c
index 8556a4f..0bcd3a5 100644
--- a/net/ipv4/netfilter/ip_conntrack_core.c
+++ b/net/ipv4/netfilter/ip_conntrack_core.c
@@ -639,6 +639,11 @@ struct ip_conntrack *ip_conntrack_alloc(
}
conntrack = kmem_cache_alloc(ip_conntrack_cachep, GFP_ATOMIC);
+ /* tuplehash_to_ctrack doesn't pass a constant argument to
+ * container_of and therefore the conntrack->tuplehash[].list
+ * aliases are ignored */
+ memleak_container(struct ip_conntrack, tuplehash[IP_CT_DIR_ORIGINAL]);
+ memleak_container(struct ip_conntrack, tuplehash[IP_CT_DIR_REPLY]);
if (!conntrack) {
DEBUGP("Can't allocate conntrack.\n");
atomic_dec(&ip_conntrack_count);
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index bc116bd..eef15c4 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -434,6 +434,11 @@ struct Qdisc *qdisc_alloc(struct net_dev
goto errout;
sch = (struct Qdisc *) QDISC_ALIGN((unsigned long) p);
sch->padded = (char *) sch - (char *) p;
+ /* kmemleak cannot guess the object type because the block
+ * size is different from the object size. The stored pointer
+ * is also different from the address of the allocated block
+ * because of padding */
+ memleak_padding(p, sch->padded, sizeof(struct Qdisc));
INIT_LIST_HEAD(&sch->list);
skb_queue_head_init(&sch->q);
This patch adds the callbacks to memleak_(alloc|free) functions from
kmalloc/kfree, kmem_cache_(alloc|free), vmalloc/vfree etc.
Signed-off-by: Catalin Marinas <[email protected]>
---
include/linux/slab_def.h | 6 ++++++
mm/page_alloc.c | 2 ++
mm/slab.c | 19 +++++++++++++++++--
mm/vmalloc.c | 22 ++++++++++++++++++++--
4 files changed, 45 insertions(+), 4 deletions(-)
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 4b463e6..30d4bd9 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -25,6 +25,7 @@ extern struct cache_sizes malloc_sizes[]
static inline void *kmalloc(size_t size, gfp_t flags)
{
+#ifndef CONFIG_DEBUG_MEMLEAK
if (__builtin_constant_p(size)) {
int i = 0;
#define CACHE(x) \
@@ -43,11 +44,13 @@ found:
malloc_sizes[i].cs_dmacachep :
malloc_sizes[i].cs_cachep, flags);
}
+#endif
return __kmalloc(size, flags);
}
static inline void *kzalloc(size_t size, gfp_t flags)
{
+#ifndef CONFIG_DEBUG_MEMLEAK
if (__builtin_constant_p(size)) {
int i = 0;
#define CACHE(x) \
@@ -66,6 +69,7 @@ found:
malloc_sizes[i].cs_dmacachep :
malloc_sizes[i].cs_cachep, flags);
}
+#endif
return __kzalloc(size, flags);
}
@@ -74,6 +78,7 @@ extern void *__kmalloc_node(size_t size,
static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
{
+#ifndef CONFIG_DEBUG_MEMLEAK
if (__builtin_constant_p(size)) {
int i = 0;
#define CACHE(x) \
@@ -92,6 +97,7 @@ found:
malloc_sizes[i].cs_dmacachep :
malloc_sizes[i].cs_cachep, flags, node);
}
+#endif
return __kmalloc_node(size, flags, node);
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8c1a116..816e909 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3363,6 +3363,8 @@ void *__init alloc_large_system_hash(con
if (_hash_mask)
*_hash_mask = (1 << log2qty) - 1;
+ memleak_alloc(table, size, 1);
+
return table;
}
diff --git a/mm/slab.c b/mm/slab.c
index 909975f..4db2029 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2574,6 +2574,9 @@ static struct slab *alloc_slabmgmt(struc
/* Slab management obj is off-slab. */
slabp = kmem_cache_alloc_node(cachep->slabp_cache,
local_flags & ~GFP_THISNODE, nodeid);
+ /* only scan the list member to avoid false negatives */
+ memleak_scan_area(slabp, offsetof(struct slab, list),
+ sizeof(struct list_head));
if (!slabp)
return NULL;
} else {
@@ -3194,6 +3197,8 @@ static inline void *____cache_alloc(stru
STATS_INC_ALLOCMISS(cachep);
objp = cache_alloc_refill(cachep, flags);
}
+ /* avoid false negatives */
+ memleak_erase(&ac->entry[ac->avail]);
return objp;
}
@@ -3222,6 +3227,7 @@ static __always_inline void *__cache_all
local_irq_restore(save_flags);
objp = cache_alloc_debugcheck_after(cachep, flags, objp,
caller);
+ memleak_alloc(objp, obj_size(cachep), 1);
prefetchw(objp);
return objp;
}
@@ -3492,6 +3498,7 @@ static inline void __cache_free(struct k
struct array_cache *ac = cpu_cache_get(cachep);
check_irq_off();
+ memleak_free(objp);
objp = cache_free_debugcheck(cachep, objp, __builtin_return_address(0));
if (cache_free_alien(cachep, objp))
@@ -3628,6 +3635,7 @@ __cache_alloc_node(struct kmem_cache *ca
local_irq_restore(save_flags);
ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
+ memleak_alloc(ptr, obj_size(cachep), 1);
return ptr;
}
@@ -3643,11 +3651,14 @@ static __always_inline void *
__do_kmalloc_node(size_t size, gfp_t flags, int node, void *caller)
{
struct kmem_cache *cachep;
+ void *ptr;
cachep = kmem_find_general_cachep(size, flags);
if (unlikely(cachep == NULL))
return NULL;
- return kmem_cache_alloc_node(cachep, flags, node);
+ ptr = kmem_cache_alloc_node(cachep, flags, node);
+ memleak_padding(ptr, 0, size);
+ return ptr;
}
#ifdef CONFIG_DEBUG_SLAB
@@ -3683,6 +3694,7 @@ static __always_inline void *__do_kmallo
void *caller)
{
struct kmem_cache *cachep;
+ void *ptr;
/* If you want to save a few bytes .text space: replace
* __ with kmem_.
@@ -3692,7 +3704,10 @@ static __always_inline void *__do_kmallo
cachep = __find_general_cachep(size, flags);
if (unlikely(cachep == NULL))
return NULL;
- return __cache_alloc(cachep, flags, caller);
+ ptr = __cache_alloc(cachep, flags, caller);
+ memleak_padding(ptr, 0, size);
+
+ return ptr;
}
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 86897ee..603aee9 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -365,6 +365,9 @@ void __vunmap(void *addr, int deallocate
void vfree(void *addr)
{
BUG_ON(in_interrupt());
+
+ memleak_free(addr);
+
__vunmap(addr, 1);
}
EXPORT_SYMBOL(vfree);
@@ -465,7 +468,14 @@ fail:
void *__vmalloc_area(struct vm_struct *area, gfp_t gfp_mask, pgprot_t prot)
{
- return __vmalloc_area_node(area, gfp_mask, prot, -1);
+ void *addr = __vmalloc_area_node(area, gfp_mask, prot, -1);
+
+ /* this needs ref_count = 2 since vm_struct also contains a
+ * pointer to this address. The guard page is also subtracted
+ * from the size */
+ memleak_alloc(addr, area->size - PAGE_SIZE, 2);
+
+ return addr;
}
/**
@@ -483,6 +493,8 @@ static void *__vmalloc_node(unsigned lon
int node)
{
struct vm_struct *area;
+ void *addr;
+ unsigned long real_size = size;
size = PAGE_ALIGN(size);
if (!size || (size >> PAGE_SHIFT) > num_physpages)
@@ -492,7 +504,13 @@ static void *__vmalloc_node(unsigned lon
if (!area)
return NULL;
- return __vmalloc_area_node(area, gfp_mask, prot, node);
+ addr = __vmalloc_area_node(area, gfp_mask, prot, node);
+
+ /* this needs ref_count = 2 since the vm_struct also contains
+ a pointer to this address */
+ memleak_alloc(addr, real_size, 2);
+
+ return addr;
}
void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)
This patch adds the kmemleak-related entries to the vmlinux.lds.S
linker script.
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/i386/kernel/vmlinux.lds.S | 6 ++++++
include/asm-i386/thread_info.h | 18 ++++++++++++++++--
2 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/arch/i386/kernel/vmlinux.lds.S b/arch/i386/kernel/vmlinux.lds.S
index a53c8b1..2850bed 100644
--- a/arch/i386/kernel/vmlinux.lds.S
+++ b/arch/i386/kernel/vmlinux.lds.S
@@ -69,6 +69,7 @@ SECTIONS
/* writeable */
. = ALIGN(4096);
+ _sdata = .; /* Start of data section */
.data : AT(ADDR(.data) - LOAD_OFFSET) { /* Data */
*(.data)
CONSTRUCTORS
@@ -193,6 +194,11 @@ SECTIONS
*(.data.percpu)
__per_cpu_end = .;
}
+ .init.memleak_offsets : AT(ADDR(.init.memleak_offsets) - LOAD_OFFSET) {
+ __memleak_offsets_start = .;
+ *(.init.memleak_offsets)
+ __memleak_offsets_end = .;
+ }
. = ALIGN(4096);
/* freed after init ends here */
diff --git a/include/asm-i386/thread_info.h b/include/asm-i386/thread_info.h
index 4b187bb..02f3457 100644
--- a/include/asm-i386/thread_info.h
+++ b/include/asm-i386/thread_info.h
@@ -95,9 +95,23 @@ static inline struct thread_info *curren
/* thread information allocation */
#ifdef CONFIG_DEBUG_STACK_USAGE
-#define alloc_thread_info(tsk) kzalloc(THREAD_SIZE, GFP_KERNEL)
+#define alloc_thread_info(tsk) \
+ ({ \
+ struct thread_info *ret; \
+ \
+ ret = kzalloc(THREAD_SIZE, GFP_KERNEL); \
+ memleak_ignore(ret); \
+ ret; \
+ })
#else
-#define alloc_thread_info(tsk) kmalloc(THREAD_SIZE, GFP_KERNEL)
+#define alloc_thread_info(tsk) \
+ ({ \
+ struct thread_info *ret; \
+ \
+ ret = kmalloc(THREAD_SIZE, GFP_KERNEL); \
+ memleak_ignore(ret); \
+ ret; \
+ })
#endif
#define free_thread_info(info) kfree(info)
This patch adds the kmemleak-related entries to the vmlinux.lds.S
linker script.
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm/kernel/vmlinux.lds.S | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index a8fa75e..7ec22ad 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -61,6 +61,11 @@ SECTIONS
__per_cpu_start = .;
*(.data.percpu)
__per_cpu_end = .;
+#ifdef CONFIG_DEBUG_MEMLEAK
+ __memleak_offsets_start = .;
+ *(.init.memleak_offsets)
+ __memleak_offsets_end = .;
+#endif
#ifndef CONFIG_XIP_KERNEL
__init_begin = _stext;
*(.init.data)
@@ -109,6 +114,7 @@ SECTIONS
.data : AT(__data_loc) {
__data_start = .; /* address in memory */
+ _sdata = .;
/*
* first, the init task union, aligned
@@ -159,6 +165,7 @@ SECTIONS
__bss_start = .; /* BSS */
*(.bss)
*(COMMON)
+ __bss_stop = .;
_end = .;
}
/* Stabs debugging sections. */
This patch adds a loadable module that deliberately leaks memory. It
is used for testing various memory leaking scenarios.
Signed-off-by: Catalin Marinas <[email protected]>
---
lib/Kconfig.debug | 11 ++++++
mm/Makefile | 1 +
mm/memleak-test.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 106 insertions(+), 0 deletions(-)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 9695af1..b430479 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -263,6 +263,17 @@ config DEBUG_KEEP_INIT
If unsure, say N.
+config DEBUG_MEMLEAK_TEST
+ tristate "Test the kernel memory leak detector"
+ default n
+ depends on DEBUG_MEMLEAK
+ help
+ Say Y or M here to build the test harness for the kernel
+ memory leak detector. At the moment, this option enables a
+ module that explicitly leaks memory.
+
+ If unsure, say N.
+
config DEBUG_PREEMPT
bool "Debug preemptible kernel"
depends on DEBUG_KERNEL && PREEMPT && TRACE_IRQFLAGS_SUPPORT
diff --git a/mm/Makefile b/mm/Makefile
index aafe03f..9f5d450 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -30,3 +30,4 @@ obj-$(CONFIG_FS_XIP) += filemap_xip.o
obj-$(CONFIG_MIGRATION) += migrate.o
obj-$(CONFIG_SMP) += allocpercpu.o
obj-$(CONFIG_DEBUG_MEMLEAK) += memleak.o
+obj-$(CONFIG_DEBUG_MEMLEAK_TEST) += memleak-test.o
diff --git a/mm/memleak-test.c b/mm/memleak-test.c
new file mode 100644
index 0000000..51320d8
--- /dev/null
+++ b/mm/memleak-test.c
@@ -0,0 +1,94 @@
+/*
+ * mm/memleak-test.c
+ *
+ * Copyright (C) 2006 ARM Limited
+ * Written by Catalin Marinas <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <linux/list.h>
+#include <linux/percpu.h>
+
+#include <linux/memleak.h>
+
+struct test_node {
+ long header[25];
+ struct list_head list;
+ long footer[25];
+};
+
+static LIST_HEAD(test_list);
+static DEFINE_PER_CPU(void *, test_pointer);
+
+/* Some very simple testing. This function needs to be extended for
+ * proper testing */
+static int __init memleak_test_init(void)
+{
+ struct test_node *elem;
+ int i;
+
+ printk(KERN_INFO "KMemLeak testing\n");
+
+ /* make some orphan objects */
+ kmalloc(32, GFP_KERNEL);
+ kmalloc(32, GFP_KERNEL);
+ kmalloc(1024, GFP_KERNEL);
+ kmalloc(1024, GFP_KERNEL);
+ kmalloc(2048, GFP_KERNEL);
+ kmalloc(2048, GFP_KERNEL);
+ kmalloc(4096, GFP_KERNEL);
+ kmalloc(4096, GFP_KERNEL);
+#ifndef CONFIG_MODULES
+ kmem_cache_alloc(files_cachep, GFP_KERNEL);
+ kmem_cache_alloc(files_cachep, GFP_KERNEL);
+#endif
+ vmalloc(64);
+ vmalloc(64);
+
+ /* add elements to a list. They should only appear as orphan
+ * after the module is removed */
+ for (i = 0; i < 10; i++) {
+ elem = kmalloc(sizeof(*elem), GFP_KERNEL);
+ if (!elem)
+ return -ENOMEM;
+ memset(elem, 0, sizeof(*elem));
+ INIT_LIST_HEAD(&elem->list);
+
+ list_add_tail(&elem->list, &test_list);
+ }
+
+ for_each_possible_cpu(i)
+ per_cpu(test_pointer, i) = kmalloc(129, GFP_KERNEL);
+
+ return 0;
+}
+module_init(memleak_test_init);
+
+static void __exit memleak_test_exit(void)
+{
+ struct test_node *elem, *tmp;
+
+ /* remove the list elements without actually freeing the memory */
+ list_for_each_entry_safe(elem, tmp, &test_list, list)
+ list_del(&elem->list);
+}
+module_exit(memleak_test_exit);
+
+MODULE_LICENSE("GPL");
Signed-off-by: Catalin Marinas <[email protected]>
---
MAINTAINERS | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index dea5b2a..fe186b4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1746,6 +1746,12 @@ L: [email protected]
W: http://www.kerneljanitors.org/
S: Maintained
+KERNEL MEMORY LEAK DETECTOR
+P: Catalin Marinas
+M: [email protected]
+W: http://www.procode.org/kmemleak/
+S: Maintained
+
KERNEL NFSD
P: Neil Brown
M: [email protected]
This patch adds the CONFIG_DEBUG_KEEP_INIT option which preserves the
.init.text section after initialization. Memory leaks happening during this
phase can be more easily tracked.
Signed-off-by: Catalin Marinas <[email protected]>
---
include/linux/init.h | 5 +++++
lib/Kconfig.debug | 12 ++++++++++++
2 files changed, 17 insertions(+), 0 deletions(-)
diff --git a/include/linux/init.h b/include/linux/init.h
index 5a593a1..eecdb17 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -40,8 +40,13 @@
/* These are for everybody (although not all archs will actually
discard it in modules) */
+#ifdef CONFIG_DEBUG_KEEP_INIT
+#define __init
+#define __initdata
+#else
#define __init __attribute__ ((__section__ (".init.text")))
#define __initdata __attribute__ ((__section__ (".init.data")))
+#endif
#define __exitdata __attribute__ ((__section__(".exit.data")))
#define __exit_call __attribute_used__ __attribute__ ((__section__ (".exitcall.exit")))
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 42f60fb..9695af1 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -251,6 +251,18 @@ config DEBUG_MEMLEAK_REPORTS_NR
reading the /sys/kernel/debug/memleak file could lead to
some soft-locks.
+config DEBUG_KEEP_INIT
+ bool "Do not free the __init code/data"
+ default n
+ depends on DEBUG_MEMLEAK
+ help
+ This option moves the __init code/data out of the
+ .init.text/.init.data sections. It is useful for identifying
+ memory leaks happening during the kernel or modules
+ initialization.
+
+ If unsure, say N.
+
config DEBUG_PREEMPT
bool "Debug preemptible kernel"
depends on DEBUG_KERNEL && PREEMPT && TRACE_IRQFLAGS_SUPPORT
* Catalin Marinas <[email protected]> wrote:
> This series of patches represent version 0.13 of the kernel memory
> leak detector. See the Documentation/kmemleak.txt file for a more
> detailed description. The patches are downloadable from (the whole
> patch or the broken-out series) http://www.procode.org/kmemleak/
nice stuff!
FYI, i'm working on integrating kmemleak into -rt. Firstly, i needed the
fixes below when applying it on top of 2.6.19-rt15.
Secondly, i'm wondering about the following recursion:
[<c045a7e1>] rt_spin_lock_slowlock+0x98/0x1dc
[<c045b16b>] rt_spin_lock+0x13/0x4b
[<c018f155>] kfree+0x3a/0xce
[<c0192e79>] hash_delete+0x58/0x5f
[<c019309b>] memleak_free+0xe9/0x1e6
[<c018ed2e>] __cache_free+0x27/0x414
[<c018f1d0>] kfree+0xb5/0xce
[<c02788dd>] acpi_ns_get_node+0xb1/0xbb
[<c02772fa>] acpi_ns_root_initialize+0x30f/0x31d
[<c0280194>] acpi_initialize_subsystem+0x58/0x87
[<c06a4641>] acpi_early_init+0x4f/0x12e
[<c06888bc>] start_kernel+0x41b/0x44b
kfree() within kfree() ... this probably works on the upstream SLAB
allocator but makes it pretty nasty to sort out SLAB locking in -rt.
Wouldn't it be better to just preallocate the hash nodes, like lockdep
does, to avoid conceptual nesting? Basically debugging infrastructure
should rely on other infrastructure as little as possible.
also, the number of knobs in the Kconfig is quite large:
CONFIG_DEBUG_MEMLEAK=y
CONFIG_DEBUG_MEMLEAK_HASH_BITS=16
CONFIG_DEBUG_MEMLEAK_TRACE_LENGTH=4
CONFIG_DEBUG_MEMLEAK_PREINIT_OBJECTS=512
CONFIG_DEBUG_MEMLEAK_SECONDARY_ALIASES=y
CONFIG_DEBUG_MEMLEAK_TASK_STACKS=y
i'd suggest simplifying the Kconfig choices to a binary thing: either
have kmemleak or not have it.
plus the Kconfig dependency on SLAB_DEBUG makes it less likely for
people to spontaneously try kmemleak. I'd suggest offering KMEMLEAK
unconditionally (when KERNEL_DEBUG is specified) and simply select
SLAB_DEBUG.
Ingo
---
fs/fcntl.c | 4 ++--
fs/mbcache.c | 2 +-
include/linux/list.h | 9 +++++++++
include/linux/pid.h | 7 +++++++
kernel/pid.c | 2 +-
5 files changed, 20 insertions(+), 4 deletions(-)
Index: linux/fs/fcntl.c
===================================================================
--- linux.orig/fs/fcntl.c
+++ linux/fs/fcntl.c
@@ -512,7 +512,7 @@ void send_sigio(struct fown_struct *fown
goto out_unlock_fown;
read_lock(&tasklist_lock);
- do_each_pid_task(pid, type, p) {
+ __do_each_pid_task(pid, type, p) {
send_sigio_to_task(p, fown, fd, band);
} while_each_pid_task(pid, type, p);
read_unlock(&tasklist_lock);
@@ -543,7 +543,7 @@ int send_sigurg(struct fown_struct *fown
ret = 1;
read_lock(&tasklist_lock);
- do_each_pid_task(pid, type, p) {
+ __do_each_pid_task(pid, type, p) {
send_sigurg_to_task(p, fown);
} while_each_pid_task(pid, type, p);
read_unlock(&tasklist_lock);
Index: linux/fs/mbcache.c
===================================================================
--- linux.orig/fs/mbcache.c
+++ linux/fs/mbcache.c
@@ -555,7 +555,7 @@ __mb_cache_entry_find(struct list_head *
{
while (l != head) {
struct mb_cache_entry *ce =
- list_entry(l, struct mb_cache_entry,
+ __list_entry(l, struct mb_cache_entry,
e_indexes[index].o_list);
if (ce->e_bdev == bdev && ce->e_indexes[index].o_key == key) {
DEFINE_WAIT(wait);
Index: linux/include/linux/list.h
===================================================================
--- linux.orig/include/linux/list.h
+++ linux/include/linux/list.h
@@ -367,6 +367,8 @@ static inline void list_splice_init(stru
*/
#define list_entry(ptr, type, member) \
container_of(ptr, type, member)
+#define __list_entry(ptr, type, member) \
+ __container_of(ptr, type, member)
/**
* list_for_each - iterate over a list
@@ -822,6 +824,7 @@ static inline void hlist_add_after_rcu(s
}
#define hlist_entry(ptr, type, member) container_of(ptr,type,member)
+#define __hlist_entry(ptr, type, member) __container_of(ptr,type,member)
#define hlist_for_each(pos, head) \
for (pos = (head)->first; pos && ({ prefetch(pos->next); 1; }); \
@@ -897,6 +900,12 @@ static inline void hlist_add_after_rcu(s
rcu_dereference(pos) && ({ prefetch(pos->next); 1;}) && \
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
pos = pos->next)
+#define __hlist_for_each_entry_rcu(tpos, pos, head, member) \
+ for (pos = (head)->first; \
+ rcu_dereference(pos) && ({ prefetch(pos->next); 1;}) && \
+ ({ tpos = __hlist_entry(pos, typeof(*tpos), member); 1;}); \
+ pos = pos->next)
+
#else
#warning "don't include kernel headers in userspace"
Index: linux/include/linux/pid.h
===================================================================
--- linux.orig/include/linux/pid.h
+++ linux/include/linux/pid.h
@@ -125,6 +125,13 @@ static inline pid_t pid_nr(struct pid *p
hlist_for_each_entry_rcu((task), pos___, \
&pid->tasks[type], pids[type].node) {
+#define __do_each_pid_task(pid, type, task) \
+ do { \
+ struct hlist_node *pos___; \
+ if (pid != NULL) \
+ __hlist_for_each_entry_rcu((task), pos___, \
+ &pid->tasks[type], pids[type].node) {
+
#define while_each_pid_task(pid, type, task) \
} \
} while (0)
Index: linux/kernel/pid.c
===================================================================
--- linux.orig/kernel/pid.c
+++ linux/kernel/pid.c
@@ -289,7 +289,7 @@ struct task_struct * fastcall pid_task(s
struct hlist_node *first;
first = rcu_dereference(pid->tasks[type].first);
if (first)
- result = hlist_entry(first, struct task_struct, pids[(type)].node);
+ result = __container_of(first, struct task_struct, pids[(type)].node);
}
return result;
}
Hi Ingo,
On 16/12/06, Ingo Molnar <[email protected]> wrote:
> FYI, i'm working on integrating kmemleak into -rt. Firstly, i needed the
> fixes below when applying it on top of 2.6.19-rt15.
Do you need these fixes to avoid a compiler error? If yes, this is
caused by a bug in gcc-4.x. The kmemleak container_of macro has
protection for non-constant offsets passed to container_of, but the
faulty gcc always returns true for __builtin_constant_p, even when this
is not the case. Earlier versions (3.4) and the latest 4.x gcc releases
don't have this bug.
I wouldn't extend kmemleak to work around a gcc bug which was already fixed.
> Secondly, i'm wondering about the following recursion:
>
> [<c045a7e1>] rt_spin_lock_slowlock+0x98/0x1dc
> [<c045b16b>] rt_spin_lock+0x13/0x4b
> [<c018f155>] kfree+0x3a/0xce
> [<c0192e79>] hash_delete+0x58/0x5f
> [<c019309b>] memleak_free+0xe9/0x1e6
> [<c018ed2e>] __cache_free+0x27/0x414
> [<c018f1d0>] kfree+0xb5/0xce
> [<c02788dd>] acpi_ns_get_node+0xb1/0xbb
> [<c02772fa>] acpi_ns_root_initialize+0x30f/0x31d
> [<c0280194>] acpi_initialize_subsystem+0x58/0x87
> [<c06a4641>] acpi_early_init+0x4f/0x12e
> [<c06888bc>] start_kernel+0x41b/0x44b
>
> kfree() within kfree() ... this probably works on the upstream SLAB
> allocator but makes it pretty nasty to sort out SLAB locking in -rt.
I test kmemleak with lockdep enabled but I eliminated all the
dependencies on the vanilla kernel. When kfree(hnode) is called (in
hash_delete), no kmemleak locks are held and hence no dependency on
the kmemleak locks (since kmemleak is protected against re-entrance).
My understanding is that slab __cache_free is re-entrant anyway
(noticed this when using radix-tree instead of hash in kmemleak and
got some lockdep reports on l3->list_lock and memleak_lock) and
calling it again from kmemleak doesn't seem to have any problem on the
vanilla kernel.
In the -rt kernel, is there any protection against a re-entrant
__cache_free (via cache_flusharray -> free_block -> slab_destroy) or
is this not needed?
> Wouldnt it be better to just preallocate the hash nodes, like lockdep
> does, to avoid conceptual nesting? Basically debugging infrastructure
> should rely on other infrastructure as little as possible.
It would indeed be better to avoid using the slab infrastructure (and
not worry about kmemleak re-entrance and lock dependencies). I'll have
a look at how this is done in lockdep since the preallocation size
isn't known. There are also the memleak_object structures that need to
be allocated/freed. To avoid any locking dependencies, I ended up
delaying the freeing of the memleak_object structures in an RCU manner. It
might work if I do the same with the hash nodes.
> also, the number of knobs in the Kconfig is quite large:
I had some reasons and couldn't find a unified solution, but probably
only for one or two of them:
> CONFIG_DEBUG_MEMLEAK=y
> CONFIG_DEBUG_MEMLEAK_HASH_BITS=16
For my limited configurations (an x86 laptop and several ARM embedded
platforms), 16 bits were enough. I'm not sure this is enough on a
server machine for example.
> CONFIG_DEBUG_MEMLEAK_TRACE_LENGTH=4
I thought this was a user preference. I could hard-code it to 16.
What's the trace length used by lockdep?
> CONFIG_DEBUG_MEMLEAK_PREINIT_OBJECTS=512
I can probably hard-code this as well. This is a buffer to temporarily
store memory allocations before kmemleak is fully initialised.
Kmemleak gets initialised quite early in the start_kernel function and
shouldn't be that different in other kernel configurations.
> CONFIG_DEBUG_MEMLEAK_SECONDARY_ALIASES=y
> CONFIG_DEBUG_MEMLEAK_TASK_STACKS=y
These could be eliminated as well.
There are also:
CONFIG_DEBUG_MEMLEAK_REPORT_THLD - can be removed
CONFIG_DEBUG_MEMLEAK_REPORTS_NR - can be removed
CONFIG_DEBUG_KEEP_INIT - this might be useful for other tools that
store the backtrace and display it at a later time. Could be made more
generic.
> plus the Kconfig dependency on SLAB_DEBUG makes it less likely for
> people to spontaneously try kmemleak. I'd suggest to offer KMEMLEAK
> unconditionally (when KERNEL_DEBUG is specified) and simply select
> SLAB_DEBUG.
I just followed the DEBUG_SLAB_LEAK configuration but I don't have any
problem with making it more visible.
Thanks for your comments.
--
Catalin
* Catalin Marinas <[email protected]> wrote:
> Hi Ingo,
>
> On 16/12/06, Ingo Molnar <[email protected]> wrote:
> >FYI, i'm working on integrating kmemleak into -rt. Firstly, i needed the
> >fixes below when applying it on top of 2.6.19-rt15.
>
> Do you need these fixes to avoid a compiler error? If yes, this is
> caused by a bug in gcc-4.x. The kmemleak container_of macro has
> protection for non-constant offsets passed to container_of but the
> faulty gcc always returns true for __builtin_constant_p, even when this
> is not the case. Previous versions (3.4) or one of the latest 4.x gcc
> don't have this bug.
>
> I wouldn't extend kmemleak to work around a gcc bug which was already
> fixed.
correct, i needed it for gcc 4.0.2. If you want this feature upstream,
this has to be solved - no way are we going to phase out portions of
gcc4. It's not hard as you can see it from my patch, non-static
container_of is very rare. We do alot of other hackery to keep older
compilers alive, and we only drop compiler support if some important
feature really, really needs new gcc and a sane workaround is not
possible.
> >Secondly, i'm wondering about the following recursion:
> >
> > [<c045a7e1>] rt_spin_lock_slowlock+0x98/0x1dc
> > [<c045b16b>] rt_spin_lock+0x13/0x4b
> > [<c018f155>] kfree+0x3a/0xce
> > [<c0192e79>] hash_delete+0x58/0x5f
> > [<c019309b>] memleak_free+0xe9/0x1e6
> > [<c018ed2e>] __cache_free+0x27/0x414
> > [<c018f1d0>] kfree+0xb5/0xce
> > [<c02788dd>] acpi_ns_get_node+0xb1/0xbb
> > [<c02772fa>] acpi_ns_root_initialize+0x30f/0x31d
> > [<c0280194>] acpi_initialize_subsystem+0x58/0x87
> > [<c06a4641>] acpi_early_init+0x4f/0x12e
> > [<c06888bc>] start_kernel+0x41b/0x44b
> >
> >kfree() within kfree() ... this probably works on the upstream SLAB
> >allocator but makes it pretty nasty to sort out SLAB locking in -rt.
>
> I test kmemleak with lockdep enabled but I eliminated all the
> dependencies on the vanilla kernel. When kfree(hnode) is called (in
> hash_delete), no kmemleak locks are held and hence no dependency on
> the kmemleak locks (since kmemleak is protected against re-entrance).
> My understanding is that slab __cache_free is re-entrant anyway
> (noticed this when using radix-tree instead of hash in kmemleak and
> got some lockdep reports on l3->list_lock and memleak_lock) and
> calling it again from kmemleak doesn't seem to have any problem on the
> vanilla kernel.
>
> In the -rt kernel, is there any protection against a re-entrant
> __cache_free (via cache_flusharray -> free_block -> slab_destroy) or
> is this not needed?
the problem on -rt is the per-CPU slab buffer handling. In vanilla they
are handled with irqs off/on - but inside is an unbound algorithm so for
-rt i converted the local_irq_disable() logic to per-CPU locks. So this
is an -rt only problem.
hm, even on vanilla you might run into problems in slab_destroy(), there
we hold the l3 lock.
> >Wouldnt it be better to just preallocate the hash nodes, like lockdep
> >does, to avoid conceptual nesting? Basically debugging infrastructure
> >should rely on other infrastructure as little as possible.
>
> It would indeed be better to avoid using the slab infrastructure (and
> not worry about kmemleak re-entrance and lock dependencies). I'll have
> a look at how this is done in lockdep since the preallocation size
> isn't known. [...]
lockdep just uses a large compile-time array to allocate from and never
grows that array.
> There are also the memleak_object structures that need to be
> allocated/freed. To avoid any locking dependencies, I ended up
> delaying the memleak_object structures freeing in an RCU manner. It
> might work if I do the same with the hash nodes.
yeah, delayed RCU freeing might work better.
> >also, the number of knobs in the Kconfig is quite large:
>
> I had some reasons and couldn't find a unified solution, [...]
i'm just saying that for upstream you have to make up your mind about
them and have to provide a simplified configuration API. Sometimes it's
hard, but it has to be done. Experience is that both users and
distributors /will/ misconfigure it.
> [...] but probably only for one or two of them:
>
> > CONFIG_DEBUG_MEMLEAK=y
> > CONFIG_DEBUG_MEMLEAK_HASH_BITS=16
>
> For my limited configurations (an x86 laptop and several ARM embedded
> platforms), 16 bits were enough. I'm not sure this is enough on a
> server machine for example.
we try to never expose something like this via the config
infrastructure. One solution here would be to use CONFIG_BASE_SMALL,
that's the standard way of expressing that the kernel is for a small
machine.
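The CONFIG_BASE_SMALL approach could look something like the sketch below in the kmemleak code. This is only an illustration: MEMLEAK_HASH_BITS and the 12/16 values are made up here, not taken from the actual patch.

```c
#include <assert.h>

/* Fallback so this sketch also builds outside the kernel tree; in the
 * kernel, CONFIG_BASE_SMALL is 0 or 1 from the configuration. */
#ifndef CONFIG_BASE_SMALL
#define CONFIG_BASE_SMALL 0
#endif

/* Size the hash from CONFIG_BASE_SMALL instead of exposing a numeric
 * Kconfig knob to the user (bit counts are illustrative). */
#define MEMLEAK_HASH_BITS (CONFIG_BASE_SMALL ? 12 : 16)
#define MEMLEAK_HASH_SIZE (1UL << MEMLEAK_HASH_BITS)
```

This mirrors how other kernel hashes are sized from CONFIG_BASE_SMALL rather than asking the user for a bit count.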
> > CONFIG_DEBUG_MEMLEAK_TRACE_LENGTH=4
>
> I thought this was a user preference. I could hard-code it to 16.
> What's the trace length used by lockdep?
lockdep saves all of them because stack traces are very useful in their
entirety. But lockdep really allocates everything internally and puts
out a warning and turns itself off if that pool is depleted. Also, the
amount of memory needed for lockdep is bound by the number of unique
lock acquire sites in the kernel, while kmemleak might need /a lot/ of
memory if there's lots of small allocations around - so the comparison
to lockdep in this area is not valid.
Ingo
the other thing that is pretty ugly is the recursive_enter/clear/exit
code. I'd suggest the lockdep model here: use a single flag in
task_struct to detect recursion. Also, never use BUG() when hitting a
bug in a debugging helper - that will crash the box. Use a flag to turn
off the whole thing if you encounter an internal bug and print a
warning. That way bugs can actually be reported, instead of users
wondering why the box locked up under X.
Ingo
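The lockdep model being suggested amounts to a single recursion flag plus a global enable flag. A minimal userspace sketch, with a thread-local variable standing in for the task_struct field and all names hypothetical:

```c
#include <assert.h>
#include <stdio.h>

static __thread int memleak_recursion; /* would live in task_struct */
static int memleak_enabled = 1;
static int recorded;                   /* objects tracked so far */

static void record_object(void);

static void memleak_alloc_hook(void)
{
	if (!memleak_enabled || memleak_recursion)
		return;                /* nested or disabled: ignore */
	memleak_recursion = 1;
	record_object();
	memleak_recursion = 0;
}

/* Recording may itself allocate memory and re-enter the hook; the
 * per-task flag turns the nested call into a no-op.  On an internal
 * bug, the tracker would disable itself and warn instead of BUG():
 *   memleak_enabled = 0; printf("kmemleak: disabled\n"); */
static void record_object(void)
{
	memleak_alloc_hook();          /* simulated re-entrance */
	recorded++;
}
```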
one more thing: after bootup i need to access the /debug/memleak file
twice to get any output from it - is that normal? The first 'cat
/debug/memleak' gives no output (but there's the usual scanning delay,
so memleak does do its work).
Ingo
it would also be nice to have more information than this:
unreferenced object 0xf76f5af8 (size 512):
[<c0191f23>] memleak_alloc
[<c018eeaa>] kmem_cache_zalloc
[<c03277a7>] probe_hwif
[<c032870c>] probe_hwif_init_with_fixup
[<c032aea1>] ide_setup_pci_device
[<c0312564>] amd74xx_probe
[<c069c4b4>] ide_scan_pcidev
[<c069c505>] ide_scan_pcibus
[<c069bdca>] ide_init
[<c0100532>] init
[<c0105da3>] kernel_thread_helper
[<ffffffff>]
it would be nice to record 1) the jiffies value at the time of
allocation, 2) the PID and the comm of the task that did the allocation.
The jiffies timestamp would be useful to see the age of the allocation,
and the PID/comm is useful for context.
plus it would be nice to have a kernel thread (running at nice 19) that
scans everything every 10 minutes or so, which would output some
overview 'delta' information into the syslog, along the lines of:
kmemleak: 2 new object leaks detected, see /debug/memleak for details
that way users could see (and report) new leaks in the dmesg, without
having to do anything. I'd like to enable kmemleak in the -rt yum
kernels, so it all has to be as automatic and as informative as
possible.
Ingo
plus a boot-time and a runtime switch to turn kmemleak off would be nice
as well. (turning it back on is probably not needed and is harder as
well)
Ingo
On Sun, 2006-12-17 at 09:58 +0100, Ingo Molnar wrote:
> * Catalin Marinas <[email protected]> wrote:
>
> > Hi Ingo,
> >
> > On 16/12/06, Ingo Molnar <[email protected]> wrote:
> > >FYI, i'm working on integrating kmemleak into -rt. Firstly, i needed the
> > >fixes below when applying it ontop of 2.6.19-rt15.
> >
> > Do you need these fixes to avoid a compiler error? If yes, this is
> > caused by a bug in gcc-4.x. The kmemleak container_of macro has
> > protection for non-constant offsets passed to container_of but the
> > faulty gcc always returns true for __builtin_constant_p, even when this
> > is not the case. Previous versions (3.4) or one of the latest 4.x gcc
> > don't have this bug.
> >
> > I wouldn't extend kmemleak to work around a gcc bug which was already
> > fixed.
>
> correct, i needed it for gcc 4.0.2. If you want this feature upstream,
> this has to be solved - no way are we going to phase out portions of
> gcc4. It's not hard as you can see it from my patch, non-static
> container_of is very rare. We do a lot of other hackery to keep older
> compilers alive, and we only drop compiler support if some important
> feature really, really needs new gcc and a sane workaround is not
> possible.
If that's because of things like the dinky testcase below,
int main (int argc, char *argv[])
{
	static int a[] = { __builtin_constant_p (argc) ? 1 : 0 };
	return a[0];
}
AFAIK, no SuSE compiler can handle it. I just build/tested their latest
version,
gcc version 4.1.2 20061129 (prerelease) (SUSE Linux)
and it still can't deal with that without gcc41-rh198849.patch being
applied.
-Mike
On 17/12/06, Ingo Molnar <[email protected]> wrote:
> one more thing: after bootup i need to access the /debug/memleak file
> twice to get any output from it - is that normal? The first 'cat
> /debug/memleak' gives no output (but there's the usual scanning delay,
> so memleak does do its work).
Yes, this is normal. Especially on SMP, I get some transient reports,
probably caused by pointers held in registers (even more visible on
ARM due to the bigger number of registers per CPU). Reporting a leak
only if it was seen at least once before greatly reduces the false
positives (this is configurable as well but I'll drop the
configuration option). Without this, you could see that, at every
scan, the reported pointers are different.
Some people testing kmemleak used to read the /debug/memleak file
periodically from a script and this wasn't noticeable. It would be
even better if, as you suggested, I schedule a periodic memory
scanning.
--
Catalin
On 17/12/06, Ingo Molnar <[email protected]> wrote:
>
> it would also be nice to have more information than this:
>
> unreferenced object 0xf76f5af8 (size 512):
> [<c0191f23>] memleak_alloc
> [<c018eeaa>] kmem_cache_zalloc
> [<c03277a7>] probe_hwif
> [<c032870c>] probe_hwif_init_with_fixup
> [<c032aea1>] ide_setup_pci_device
> [<c0312564>] amd74xx_probe
> [<c069c4b4>] ide_scan_pcidev
> [<c069c505>] ide_scan_pcibus
> [<c069bdca>] ide_init
> [<c0100532>] init
> [<c0105da3>] kernel_thread_helper
> [<ffffffff>]
BTW, I think there is a call to kzalloc in probe_hwif and it is
optimised to do a kmem_cache_zalloc in include/linux/slab_def.h. The
latest kmemleak-0.13 ifdef's out this optimisation because the size
information gets lost otherwise. The slab.h file was already patched
for this in 2.6.19 but its content was moved to slab_def.h in
2.6.20-rc1.
> it would be nice to record 1) the jiffies value at the time of
> allocation, 2) the PID and the comm of the task that did the allocation.
> The jiffies timestamp would be useful to see the age of the allocation,
> and the PID/comm is useful for context.
I'll add them. Thanks.
--
Catalin
* Catalin Marinas <[email protected]> wrote:
> On 17/12/06, Ingo Molnar <[email protected]> wrote:
> >one more thing: after bootup i need to access the /debug/memleak file
> >twice to get any output from it - is that normal? The first 'cat
> >/debug/memleak' gives no output (but there's the usual scanning
> >delay, so memleak does do its work).
>
> Yes, this is normal. Especially on SMP, I get some transient reports,
> probably caused by pointers held in registers (even more visible on
> ARM due to the bigger number of registers per CPU). Reporting a leak
> only if it was seen at least once before greatly reduces the false
> positives (this is configurable as well but I'll drop the
> configuration option). Without this, you could see that, at every
> scan, the reported pointers are different.
>
> Some people testing kmemleak used to read the /debug/memleak file
> periodically from a script and this wasn't noticeable. It would be
> even better if, as you suggested, I schedule a periodic memory
> scanning.
yeah. You could also use the allocation timestamp to exclude too young
entries. (if something really leaked then it will be a leak in 10
minutes too)
Ingo
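The age-based filtering could be as simple as the check below (a sketch with made-up names; unsigned arithmetic keeps it correct across jiffies wrap-around):

```c
#include <assert.h>

#define MIN_AGE 3000UL /* illustrative: ~30 seconds at HZ=100 */

struct object {
	unsigned long alloc_jiffies;   /* timestamp at allocation */
};

/* Only report objects that have survived at least MIN_AGE ticks;
 * anything younger may be a transient false positive. */
static int old_enough(const struct object *obj, unsigned long now)
{
	return now - obj->alloc_jiffies >= MIN_AGE;
}
```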
On 17/12/06, Ingo Molnar <[email protected]> wrote:
> * Catalin Marinas <[email protected]> wrote:
> > Do you need these fixes to avoid a compiler error? If yes, this is
> > caused by a bug in gcc-4.x. The kmemleak container_of macro has
> > protection for non-constant offsets passed to container_of but the
> > faulty gcc always returns true for __builtin_constant_p, even when this
> > is not the case. Previous versions (3.4) or one of the latest 4.x gcc
> > don't have this bug.
>
> correct, i needed it for gcc 4.0.2. If you want this feature upstream,
> this has to be solved - no way are we going to phase out portions of
> gcc4. It's not hard as you can see it from my patch, non-static
> container_of is very rare. We do a lot of other hackery to keep older
> compilers alive, and we only drop compiler support if some important
> feature really, really needs new gcc and a sane workaround is not
> possible.
There might be a simpler solution (haven't tried it), just before your
compilation error line:
#undef container_of
#define container_of(...) __container_of(__VA_ARGS__)
I could add some meaningful macros that eliminate the use of the
non-constant offset to make this more obvious.
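For reference, the construct under discussion is essentially the kernel's container_of; a minimal userspace version is below, without the __builtin_constant_p guard that kmemleak adds for non-constant offsets (the builtin the faulty gcc mishandles):

```c
#include <assert.h>
#include <stddef.h>

/* Recover a pointer to the enclosing structure from a pointer to one
 * of its members.  offsetof must be a compile-time constant here. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct hnode {
	unsigned long key;
	int value;
};
```

Given a pointer to hnode.value, `container_of(vp, struct hnode, value)` yields the enclosing hnode back.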
> > In the -rt kernel, is there any protection against a re-entrant
> > __cache_free (via cache_flusharray -> free_block -> slab_destroy) or
> > this is not needed?
>
> the problem on -rt is the per-CPU slab buffer handling. In vanilla they
> are handled with irqs off/on - but inside is an unbound algorithm so for
> -rt i converted the local_irq_disable() logic to per-CPU locks. So this
> is an -rt only problem.
>
> hm, even on vanilla you might run into problems in slab_destroy(), there
> we hold the l3 lock.
It seems that slab_destroy doesn't take the l3 lock again if it is
already held, otherwise it would fail without kmemleak. However, I
can't guarantee that even a minor change wouldn't break kmemleak.
> > There are also the memleak_object structures that need to be
> > allocated/freed. To avoid any locking dependencies, I ended up
> > delaying the memleak_object structures freeing in an RCU manner. It
> > might work if I do the same with the hash nodes.
>
> yeah, delayed RCU freeing might work better.
I could also use a simple allocator based on alloc_pages since
kmemleak doesn't track pages. It could be so simple that it would
never need to free any pages, just grow the size as required and reuse
the freed memleak objects from a list.
This would simplify the recursiveness and also work on any other slab
allocator (looking back at the amount of time I spent sorting out the
recursiveness and locking dependencies, I could've implemented a full
allocator).
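The proposed allocator could be as small as the sketch below. This is a userspace rendition under stated assumptions: malloc stands in for alloc_pages, and all names and sizes are illustrative rather than from kmemleak.

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_CHUNK 64	/* objects carved from each new block */

struct memleak_object {
	struct memleak_object *next;   /* free-list link when unused */
	/* ... tracking fields ... */
};

static struct memleak_object *free_list;

/* Grow-only pool: grab a block at a time, never give memory back,
 * recycle freed objects through the free list. */
static struct memleak_object *object_alloc(void)
{
	if (!free_list) {              /* grow: carve a new block */
		struct memleak_object *block =
			malloc(POOL_CHUNK * sizeof(*block));
		if (!block)
			return NULL;
		for (int i = 0; i < POOL_CHUNK; i++) {
			block[i].next = free_list;
			free_list = &block[i];
		}
	}
	struct memleak_object *obj = free_list;
	free_list = obj->next;
	return obj;
}

static void object_free(struct memleak_object *obj)
{
	obj->next = free_list;         /* recycle, never free pages */
	free_list = obj;
}
```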
I'll try to implement your other suggestions as well. Thanks.
--
Catalin
* Catalin Marinas <[email protected]> wrote:
> > hm, even on vanilla you might run into problems in slab_destroy(),
> > there we hold the l3 lock.
>
> It seems that slab_destroy doesn't take the l3 lock again if it is
> already held, otherwise it would fail without kmemleak. However, I
> can't guarantee that even a minor change wouldn't break kmemleak.
slab_destroy() is careful to not take the /same/ lock. But kmemleak does
an unconditional kfree() and i dont see what saves us from a rare but
possible deadlock here.
> > yeah, delayed RCU freeing might work better.
>
> I could also use a simple allocator based on alloc_pages since
> kmemleak doesn't track pages. [...]
actually, i'm quite sure we want to track pages later on too, any reason
why kmemleak shouldnt cover them?
But i agree that using a separate allocator would be the right choice
here. Once the page allocator will be added to the tracking mechanism we
can exclude those allocations by doing wrappers. I.e. you shouldnt just
add the memleak_*() functions to the page allocator directly - wrap the
existing calls cleanly, and thus preserve a non-tracked API variant. For
the SLAB this is hard because the SLAB has so many internal dependencies
- but the buddy is a 'core' allocator.
and those non-wrapped APIs would also be useful for the SLAB to call
into - that way we could exclude the tracking of SLAB allocations.
(unless someone wants to track/debug the SLAB implementation itself.)
> [...] It could be so simple that it would never need to free any
> pages, just grow the size as required and reuse the freed memleak
> objects from a list.
sounds good to me. Please make it a per-CPU pool. We'll have to fix the
locking too, to be per-CPU - memleak_lock is quite a scalability problem
right now. (Add a memleak_object->cpu pointer so that freeing can be
done on any other CPU as well.)
> This would simplify the recursiveness and also work on any other slab
> allocator (looking back at the amount of time I spent sorting out the
> recursiveness and locking dependencies, I could've implemented a full
> allocator).
yes.
Ingo
On 18/12/06, Ingo Molnar <[email protected]> wrote:
>
> * Catalin Marinas <[email protected]> wrote:
> > I could also use a simple allocator based on alloc_pages since
> > kmemleak doesn't track pages. [...]
>
> actually, i'm quite sure we want to track pages later on too, any reason
> why kmemleak shouldnt cover them?
It could track them but I'm not sure how efficient it would be since
there are different ways to refer to a page, like the pfn or struct
page. The pfn references would actually introduce a lot of false
negatives.
This needs some investigation and it can probably be done but I'll
first focus on the slab allocator.
> > [...] It could be so simple that it would never need to free any
> > pages, just grow the size as required and reuse the freed memleak
> > objects from a list.
>
> sounds good to me. Please make it a per-CPU pool.
Isn't there a risk for the pools to become imbalanced? A lot of
allocations would initially happen on the first CPU.
> [...] (Add a memleak_object->cpu pointer so that freeing can be
> done on any other CPU as well.)
We could add the freed objects to the CPU pool where they were freed
and not use a memleak_object->cpu pointer.
> We'll have to fix the
> locking too, to be per-CPU - memleak_lock is quite a scalability problem
> right now.
The memleak_lock is indeed too coarse (but it was easier to track the
locking dependencies). With a new allocator, however, I could do a
finer grain locking. It probably still needs a (rw)lock for the hash
table. Having per-CPU hash tables is inefficient as we would have to
look up all the tables at every freeing or scanning for the
corresponding memleak_object.
There is a global object_list as well covered by memleak_lock (only
for insertions/deletions as traversing is RCU). Since modifications to
this list happen at the same time with the hash table modifications,
the memleak_lock is held anyway so we probably don't need to do
per-CPU object lists (they would also need some locking if freeing
would happen on a different CPU). List insertion/deletion is very
small compared to the hash-table look-up and it wouldn't introduce a
scalability problem.
--
Catalin
* Catalin Marinas <[email protected]> wrote:
> >> [...] It could be so simple that it would never need to free any
> >> pages, just grow the size as required and reuse the freed memleak
> >> objects from a list.
> >
> >sounds good to me. Please make it a per-CPU pool.
>
> Isn't there a risk for the pools to become imbalanced? A lot of
> allocations would initially happen on the first CPU.
hm, what's the problem with imbalance? These are trees and imbalance
isnt a big issue.
> >[...] (Add a memleak_object->cpu pointer so that freeing can be done
> >on any other CPU as well.)
>
> We could add the freed objects to the CPU pool where they were freed
> and not use a memleak_object->cpu pointer.
i mean totally per-CPU locking and per-CPU radix trees, etc.
> > We'll have to fix the locking too, to be per-CPU - memleak_lock is
> > quite a scalability problem right now.
>
> The memleak_lock is indeed too coarse (but it was easier to track the
> locking dependencies). With a new allocator, however, I could do a
> finer grain locking. It probably still needs a (rw)lock for the hash
> table. Having per-CPU hash tables is inefficient as we would have to
> look up all the tables at every freeing or scanning for the
> corresponding memleak_object.
at freeing we only have to look up the tree belonging to object->cpu.
Scanning overhead does not matter in comparison to runtime tracking
overhead. (but i doubt it would be much different - scanning overhead
scales with size of tree)
> There is a global object_list as well covered by memleak_lock (only
> for insertions/deletions as traversing is RCU). [...]
yeah, that would have to become per-CPU too.
> [...] List insertion/deletion is very small compared to the hash-table
> look-up and it wouldn't introduce a scalability problem.
it's a common misconception to think that 'small' critical sections are
fine. That's not the issue. The pure fact of having globally modified
resource is the problem, the lock cacheline would ping-pong, etc.
Ingo
On 18/12/06, Ingo Molnar <[email protected]> wrote:
>
> * Catalin Marinas <[email protected]> wrote:
>
> > >> [...] It could be so simple that it would never need to free any
> > >> pages, just grow the size as required and reuse the freed memleak
> > >> objects from a list.
> > >
> > >sounds good to me. Please make it a per-CPU pool.
> >
> > Isn't there a risk for the pools to become imbalanced? A lot of
> > allocations would initially happen on the first CPU.
>
> hm, what's the problem with imbalance? These are trees and imbalance
> isnt a big issue.
There could simply be more available (freed) memleak_objects on one
CPU than on the others, using more memory. Not a big problem though.
> > > We'll have to fix the locking too, to be per-CPU - memleak_lock is
> > > quite a scalability problem right now.
> >
> > The memleak_lock is indeed too coarse (but it was easier to track the
> > locking dependencies). With a new allocator, however, I could do a
> > finer grain locking. It probably still needs a (rw)lock for the hash
> > table. Having per-CPU hash tables is inefficient as we would have to
> > look up all the tables at every freeing or scanning for the
> > corresponding memleak_object.
>
> at freeing we only have to look up the tree belonging to object->cpu.
At freeing, kmemleak only gets a pointer value which has to be looked
up in the hash table for the corresponding memleak_object. Only after
that, we can know memleak_object->cpu. That's why I think we only need
to have a global hash table. The hash table look-up can be RCU.
It would work with per-CPU hash tables but we still need to look-up
the other hash tables in case the freeing happened on a different CPU
(i.e. look-up the current hash table and, if it fails, look-up the
other per-CPU hashes). Freeing would need to remove the entry from the
hash table and acquire a lock but this would be per-CPU. I'm not sure
how often you get this scenario (allocation and freeing on different
CPUs) but it might introduce an overhead to the memory freeing.
Do you have a better solution here?
> > There is a global object_list as well covered by memleak_lock (only
> > for insertions/deletions as traversing is RCU). [...]
>
> yeah, that would have to become per-CPU too.
That's not that difficult but, as above, we need the hash table
look-up before we find which list it is on.
> > [...] List insertion/deletion is very small compared to the hash-table
> > look-up and it wouldn't introduce a scalability problem.
>
> it's a common misconception to think that 'small' critical sections are
> fine. That's not the issue. The pure fact of having globally modified
> resource is the problem, the lock cacheline would ping-pong, etc.
You are right but I didn't mean that small critical sections are
better, just that in case we need a critical section for the global
hash table look-up, extending this critical region with list
addition/deletion wouldn't make things any worse (than they are,
regarding scalability).
--
Catalin
* Catalin Marinas <[email protected]> wrote:
> >at freeing we only have to look up the tree belonging to object->cpu.
>
> At freeing, kmemleak only gets a pointer value which has to be looked
> up in the hash table for the corresponding memleak_object. Only after
> that, we can know memleak_object->cpu. That's why I think we only need
> to have a global hash table. The hash table look-up can be RCU.
>
> It would work with per-CPU hash tables but we still need to look-up
> the other hash tables in case the freeing happened on a different CPU
> (i.e. look-up the current hash table and, if it fails, look-up the
> other per-CPU hashes). Freeing would need to remove the entry from the
> hash table and acquire a lock but this would be per-CPU. I'm not sure
> how often you get this scenario (allocation and freeing on different
> CPUs) but it might introduce an overhead to the memory freeing.
>
> Do you have a better solution here?
hmm ... nasty. Would it be feasible to embed the memleak info into the
allocated object itself? That would remove quite some runtime overhead,
besides eliminating the global hash.
Ingo
On 18/12/06, Ingo Molnar <[email protected]> wrote:
>
> * Catalin Marinas <[email protected]> wrote:
>
> > >at freeing we only have to look up the tree belonging to object->cpu.
> >
> > At freeing, kmemleak only gets a pointer value which has to be looked
> > up in the hash table for the corresponding memleak_object. Only after
> > that, we can know memleak_object->cpu. That's why I think we only need
> > to have a global hash table. The hash table look-up can be RCU.
> > [...]
>
> hmm ... nasty. Would it be feasible to embed the memleak info into the
> allocated object itself? That would remove quite some runtime overhead,
> besides eliminating the global hash.
I thought about this but it would be more difficult for page
allocations and even impossible if we later want to track ioremap'ed
regions (or any other resource that has a unique id). The vmalloc'ed
regions are page aligned, though I'm not sure any code makes use of
this assumption. It would also make kmemleak too dependent on the
allocator itself (maybe not a big problem). Being less intrusive makes
it easier to maintain, especially if you want to use a different
allocator.
Another option would be to store kmemleak info in the slab management
structures, vm_area (for vmalloc) or page structures but this breaks
the kmemleak functions for marking the false positives (which is done
outside the allocator).
Is there any scalability/performance impact from just
acquiring/releasing a lock, or is this only an issue under contention?
If it's the latter, there could be a lock per hash-table entry to
drastically reduce the contention. The drawback is the
doubling of the hash table size, maybe with some impact on caches and
TLBs.
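The lock-per-bucket idea could look like the sketch below, with hypothetical names throughout; an atomic_flag spinlock stands in for the kernel's spinlock_t (and relies on static zero-initialisation being the clear state, as on gcc/glibc):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define HASH_BITS 4	/* illustrative; kmemleak defaults to 16 */
#define HASH_SIZE (1u << HASH_BITS)

struct hnode {
	unsigned long ptr;	/* tracked allocation address */
	struct hnode *next;
};

/* One lock per bucket: two CPUs only contend when they hash to the
 * same entry, at the cost of storing HASH_SIZE locks. */
struct bucket {
	atomic_flag lock;	/* zero-initialised == unlocked */
	struct hnode *head;
};

static struct bucket table[HASH_SIZE];

static struct bucket *hash_bucket(unsigned long ptr)
{
	return &table[(ptr >> 4) & (HASH_SIZE - 1)];
}

static void hash_insert(struct hnode *n)
{
	struct bucket *b = hash_bucket(n->ptr);

	while (atomic_flag_test_and_set(&b->lock))
		;		/* spin */
	n->next = b->head;
	b->head = n;
	atomic_flag_clear(&b->lock);
}

static struct hnode *hash_lookup(unsigned long ptr)
{
	struct bucket *b = hash_bucket(ptr);
	struct hnode *n;

	while (atomic_flag_test_and_set(&b->lock))
		;
	for (n = b->head; n; n = n->next)
		if (n->ptr == ptr)
			break;
	atomic_flag_clear(&b->lock);
	return n;
}
```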
--
Catalin
On 18/12/06, Ingo Molnar <[email protected]> wrote:
> * Catalin Marinas <[email protected]> wrote:
> > I could also use a simple allocator based on alloc_pages [...]
> > [...] It could be so simple that it would never need to free any
> > pages, just grow the size as required and reuse the freed memleak
> > objects from a list.
>
> sounds good to me. Please make it a per-CPU pool. We'll have to fix the
> locking too, to be per-CPU - memleak_lock is quite a scalability problem
> right now. (Add a memleak_object->cpu pointer so that freeing can be
> done on any other CPU as well.)
I did some simple statistics about allocations happening on one CPU
and freeing on a different one. On a 4-CPU ARM system (and without IRQ
balancing and without CONFIG_PREEMPT), these seem to happen in about
8-10% of the cases. Do you expect higher figures on other
systems/configurations?
As I mentioned in a different e-mail, a way to remove the global hash
table is to create per-cpu hashes. The only problem is that in these
8-10% of the cases, freeing would need to look up the other hashes.
This would become a problem with a high number of CPUs but I'm not
sure whether it would overtake the performance issues introduced by
cacheline ping-ponging in the single-hash case.
Any thoughts?
Thanks.
--
Catalin
* Catalin Marinas <[email protected]> wrote:
> On 18/12/06, Ingo Molnar <[email protected]> wrote:
> >* Catalin Marinas <[email protected]> wrote:
> >> I could also use a simple allocator based on alloc_pages [...]
> >> [...] It could be so simple that it would never need to free any
> >> pages, just grow the size as required and reuse the freed memleak
> >> objects from a list.
> >
> >sounds good to me. Please make it a per-CPU pool. We'll have to fix the
> >locking too, to be per-CPU - memleak_lock is quite a scalability problem
> >right now. (Add a memleak_object->cpu pointer so that freeing can be
> >done on any other CPU as well.)
>
> I did some simple statistics about allocations happening on one CPU
> and freeing on a different one. On a 4-CPU ARM system (and without IRQ
> balancing and without CONFIG_PREEMPT), these seem to happen in about
> 8-10% of the cases. Do you expect higher figures on other
> systems/configurations?
>
> As I mentioned in a different e-mail, a way to remove the global hash
> table is to create per-cpu hashes. The only problem is that in these
> 8-10% of the cases, freeing would need to look up the other hashes.
> This would become a problem with a high number of CPUs but I'm not
> sure whether it would overtake the performance issues introduced by
> cacheline ping-ponging in the single-hash case.
i dont think it's worth doing that. So we should either do the current
global lock & hash (bad for scalability), or a pure per-CPU design. The
pure per-CPU design would have to embed the CPU ID the object is
attached to into the allocated object. If that is not feasible then only
the global hash remains i think.
Ingo
On 17/12/06, Ingo Molnar <[email protected]> wrote:
> it would be nice to record 1) the jiffies value at the time of
> allocation, 2) the PID and the comm of the task that did the allocation.
> The jiffies timestamp would be useful to see the age of the allocation,
> and the PID/comm is useful for context.
Trying to copy the comm with get_task_comm, I get the lockdep report
below, caused by acquiring the task's alloc_lock. Any idea how to go
around this?
=================================
[ INFO: inconsistent lock state ]
2.6.20-rc2 #211
---------------------------------
inconsistent {hardirq-on-W} -> {in-hardirq-W} usage.
swapper/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
(init_task.alloc_lock){+-..}, at: [<c007d950>] get_task_comm+0x24/0x40
{hardirq-on-W} state was registered at:
[<c00493d0>] lock_acquire+0x80/0x94
[<c020cf1c>] _spin_lock+0x30/0x40
[<c0027c80>] copy_process+0x8b8/0x1208
[<c0028874>] do_fork+0xc0/0x1e0
[<c001497c>] kernel_thread+0x70/0x80
[<c000f864>] rest_init+0x1c/0x30
[<c000fa9c>] start_kernel+0x224/0x27c
[<00008078>] 0x8078
irq event stamp: 33758
hardirqs last enabled at (33757): [<c0014ef8>] default_idle+0x44/0x50
hardirqs last disabled at (33758): [<c00131d4>] __irq_svc+0x34/0xc0
softirqs last enabled at (33750): [<c002f97c>] __do_softirq+0x10c/0x124
softirqs last disabled at (33745): [<c002fde0>] irq_exit+0x58/0x60
other info that might help us debug this:
1 lock held by swapper/0:
#0: (&lp->lock){+...}, at: [<c0146b9c>] smc_interrupt+0x28/0x784
stack backtrace:
[<c0018cf4>] (dump_stack+0x0/0x14) from [<c00471b4>]
(print_usage_bug+0x11c/0x154)
[<c0047098>] (print_usage_bug+0x0/0x154) from [<c0047b60>]
(mark_lock+0xe0/0x4c8)
r8 = 00000001 r7 = C02A4348 r6 = 00000000 r5 = 00000002
r4 = C02A4638
[<c0047a80>] (mark_lock+0x0/0x4c8) from [<c0048b34>]
(__lock_acquire+0x424/0xc40)
[<c0048710>] (__lock_acquire+0x0/0xc40) from [<c00493d0>]
(lock_acquire+0x80/0x94)
[<c0049350>] (lock_acquire+0x0/0x94) from [<c020cf1c>] (_spin_lock+0x30/0x40)
[<c020ceec>] (_spin_lock+0x0/0x40) from [<c007d950>] (get_task_comm+0x24/0x40)
r4 = C02A4348
[<c007d92c>] (get_task_comm+0x0/0x40) from [<c0075570>]
(memleak_alloc+0x168/0x2b4)
r6 = C7596E60 r5 = A0000193 r4 = C7596E64
[<c0075408>] (memleak_alloc+0x0/0x2b4) from [<c0072800>]
(kmem_cache_alloc+0x110/0x124)
[<c00726f0>] (kmem_cache_alloc+0x0/0x124) from [<c0199228>]
(__alloc_skb+0x34/0x130)
[<c01991f4>] (__alloc_skb+0x0/0x130) from [<c0146f04>]
(smc_interrupt+0x390/0x784)
[<c0146b74>] (smc_interrupt+0x0/0x784) from [<c005322c>]
(handle_IRQ_event+0x2c/0x68)
[<c0053200>] (handle_IRQ_event+0x0/0x68) from [<c0054d94>]
(handle_level_irq+0xe4/0x150)
r7 = C02A1128 r6 = C7805C20 r5 = 00000029 r4 = C02A1100
[<c0054cb0>] (handle_level_irq+0x0/0x150) from [<c001469c>]
(asm_do_IRQ+0x70/0x9c)
r7 = 00008000 r6 = 00000029 r5 = 00000000 r4 = C029E000
[<c001462c>] (asm_do_IRQ+0x0/0x9c) from [<c00131d4>] (__irq_svc+0x34/0xc0)
r5 = FF000100 r4 = FFFFFFFF
[<c0014eb4>] (default_idle+0x0/0x50) from [<c0014cb0>] (cpu_idle+0x30/0x4c)
[<c0014c80>] (cpu_idle+0x0/0x4c) from [<c000f870>] (rest_init+0x28/0x30)
r5 = C02C3D30 r4 = C04DB2A4
[<c000f848>] (rest_init+0x0/0x30) from [<c000fa9c>] (start_kernel+0x224/0x27c)
[<c000f878>] (start_kernel+0x0/0x27c) from [<00008078>] (0x8078)
--
Catalin
On Wed, 2006-12-27 at 16:14 +0000, Catalin Marinas wrote:
> On 17/12/06, Ingo Molnar <[email protected]> wrote:
> > it would be nice to record 1) the jiffies value at the time of
> > allocation, 2) the PID and the comm of the task that did the allocation.
> > The jiffies timestamp would be useful to see the age of the allocation,
> > and the PID/comm is useful for context.
>
> Trying to copy the comm with get_task_comm, I get the lockdep report
> below, caused by acquiring the task's alloc_lock. Any idea how to go
> around this?
well you take the lock from irq context, which means it needs to use
_irqsave/restore everywhere. (and all locks taken inside it must be irq
safe as well)
maybe.. not use comm in irq context? it doesn't actually mean anything
there anyway...
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
On 27/12/06, Arjan van de Ven <[email protected]> wrote:
> On Wed, 2006-12-27 at 16:14 +0000, Catalin Marinas wrote:
> > On 17/12/06, Ingo Molnar <[email protected]> wrote:
> > > it would be nice to record 1) the jiffies value at the time of
> > > allocation, 2) the PID and the comm of the task that did the allocation.
> > > The jiffies timestamp would be useful to see the age of the allocation,
> > > and the PID/comm is useful for context.
> >
> > Trying to copy the comm with get_task_comm, I get the lockdep report
> > below, caused by acquiring the task's alloc_lock. Any idea how to go
> > around this?
>
> well you take the lock from irq context, which means it needs to use
> _irqsave/restore everywhere. (and all locks taken inside it must be irq
> safe as well)
>
> maybe.. not use comm in irq context? it doesn't actually mean anything
> there anyway...
Thanks. Done this and it is ok now.
--
Catalin
* Catalin Marinas <[email protected]> wrote:
> On 17/12/06, Ingo Molnar <[email protected]> wrote:
> >it would be nice to record 1) the jiffies value at the time of
> >allocation, 2) the PID and the comm of the task that did the allocation.
> >The jiffies timestamp would be useful to see the age of the allocation,
> >and the PID/comm is useful for context.
>
> Trying to copy the comm with get_task_comm, I get the lockdep report
> below, caused by acquiring the task's alloc_lock. Any idea how to go
> around this?
just memcpy p->comm without any locking. It's for the current task,
right? That does not need any locking.
Ingo
* Ingo Molnar <[email protected]> wrote:
> > Trying to copy the comm with get_task_comm, I get the lockdep report
> > below, caused by acquiring the task's alloc_lock. Any idea how to go
> > around this?
>
> just memcpy p->comm without any locking. It's for the current task,
> right? That does not need any locking.
Furthermore, I'd put "hardirq" into the comm copy if in_hardirq() is
true, and I'd put "softirq" into the comm copy if !in_hardirq() &&
in_softirq(). That way the allocation is not attributed to a process
that has no real connection to it.
Ingo
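Putting the two suggestions together, the allocation hook might record context roughly like this (a sketch only: record_comm() is a made-up name, and the kernel's in-interrupt tests are passed in as booleans for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TASK_COMM_LEN 16   /* same length as task_struct.comm */

/* Sketch: record the allocating context in 'dst'.  'task_comm' models
 * current->comm; 'hardirq'/'softirq' model the kernel's interrupt
 * context tests.  No lock is taken: for the current task a plain copy
 * of its own comm is safe. */
static void record_comm(char *dst, const char *task_comm,
                        bool hardirq, bool softirq)
{
    if (hardirq)
        strncpy(dst, "hardirq", TASK_COMM_LEN);
    else if (softirq)
        strncpy(dst, "softirq", TASK_COMM_LEN);
    else
        strncpy(dst, task_comm, TASK_COMM_LEN);
    dst[TASK_COMM_LEN - 1] = '\0';   /* guarantee termination */
}
```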
On 27/12/06, Ingo Molnar <[email protected]> wrote:
>
> * Catalin Marinas <[email protected]> wrote:
>
> > On 17/12/06, Ingo Molnar <[email protected]> wrote:
> > >it would be nice to record 1) the jiffies value at the time of
> > >allocation, 2) the PID and the comm of the task that did the allocation.
> > >The jiffies timestamp would be useful to see the age of the allocation,
> > >and the PID/comm is useful for context.
> >
> > Trying to copy the comm with get_task_comm, I get the lockdep report
> > below, caused by acquiring the task's alloc_lock. Any idea how to go
> > around this?
>
> just memcpy p->comm without any locking. It's for the current task,
> right? That does not need any locking.
Yes, it is for the current task. I also added an in_interrupt() test
since current->comm is not relevant in this case, as Arjan pointed
out.
--
Catalin
* Ingo Molnar <[email protected]> wrote:
> > As I mentioned in a different e-mail, a way to remove the global
> > hash table is to create per-cpu hashes. The only problem is that in
> > these 8-10% of the cases, freeing would need to look up the other
> > hashes. This would become a problem with a high number of CPUs but
> I'm not sure whether it would outweigh the performance issues
> > introduced by cacheline ping-ponging in the single-hash case.
>
> I don't think it's worth doing that. So we should either do the current
> global lock & hash (bad for scalability), or a pure per-CPU design.
> The pure per-CPU design would have to embed the CPU ID the object is
> attached to into the allocated object. If that is not feasible then
> only the global hash remains, I think.
embedding the info shouldn't be /that/ hard in the case of the SLAB: if the
memleak info is at a negative offset from the allocated pointer, i.e.
if kmalloc() returns 'ptr', the memleak info could be at
ptr-sizeof(memleak_info). That way you don't have to know the size of the
object beforehand and there's absolutely no need for a global hash of
any sort.
(it gets a bit more complex for page aligned allocations for the buddy
and for vmalloc - but that could be solved by adding one extra pointer
into struct page. That is a far more preferable cost than the
locking/cache overhead of a global hash.)
Ingo
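Ingo's layout can be sketched in plain C, with malloc() standing in for the slab allocator and illustrative field and function names (this is not the real kmemleak implementation):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the negative-offset layout.  Field names are made up. */
struct memleak_info {
    unsigned long trace[8];   /* allocation backtrace */
    unsigned long jiffies;    /* allocation timestamp */
    int pid;
    char comm[16];
};

/* Allocate 'size' usable bytes with the descriptor hidden in front;
 * the caller only ever sees the payload pointer. */
static void *leak_alloc(size_t size)
{
    struct memleak_info *info = malloc(sizeof(*info) + size);

    if (!info)
        return NULL;
    memset(info, 0, sizeof(*info));
    return info + 1;
}

/* Recover the descriptor from the pointer alone - no hash lookup. */
static struct memleak_info *leak_info(void *ptr)
{
    return (struct memleak_info *)ptr - 1;
}

static void leak_free(void *ptr)
{
    free(leak_info(ptr));
}
```

Note that a real implementation would have to round the header size up to the architecture's alignment requirement so the payload stays correctly aligned.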
On 27/12/06, Ingo Molnar <[email protected]> wrote:
> * Ingo Molnar <[email protected]> wrote:
> > The pure per-CPU design would have to embed the CPU ID the object is
> > attached to into the allocated object. If that is not feasible then
> > only the global hash remains, I think.
>
> embedding the info shouldn't be /that/ hard in the case of the SLAB: if the
> memleak info is at a negative offset from the allocated pointer, i.e.
> if kmalloc() returns 'ptr', the memleak info could be at
> ptr-sizeof(memleak_info). That way you don't have to know the size of the
> object beforehand and there's absolutely no need for a global hash of
> any sort.
It would probably need to be just a pointer embedded in the allocated
block. With the current design, the memleak objects have a lifetime
longer than the tracked block. This is mainly to avoid long locking
during memory scanning and reporting.
> (it gets a bit more complex for page aligned allocations for the buddy
> and for vmalloc - but that could be solved by adding one extra pointer
> into struct page. [...]
This still leaves the issue of marking objects as not being leaks or
being of a different type. This is done by calling memleak_* functions
at the allocation point (outside allocator) where only the pointer is
known. In the vmalloc case, it would need to call find_vm_area. This
might not be a big problem, except that memory resources would no longer
be treated in a unified way by kmemleak (and adding support for new
allocators might not be trivial).
> [...] That is a far more preferable cost than the
> locking/cache overhead of a global hash.)
A global hash would need to be re-built for every scan (and destroyed
afterwards), making this operation longer since the pointer values
together with their aliases (resulting from using container_of) are
added to the hash.
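The aliases arise because callers often keep a pointer to a member inside an allocated structure rather than to its start; container_of() is the reverse mapping, so both addresses refer to the same block. A minimal illustration (simplified structures, not kmemleak code):

```c
#include <assert.h>
#include <stddef.h>

/* The kernel's container_of(): recover the enclosing structure from a
 * pointer to one of its members.  In kmemleak terms, &buffer->node
 * below is an 'alias' of the buffer: both addresses refer to the same
 * allocated block. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified structures for illustration only. */
struct list_node {
    struct list_node *next, *prev;
};

struct buffer {
    int refcount;
    struct list_node node;   /* code often stores only &buffer->node */
};
```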
I understand the benefits but I personally favor simplicity over
performance, especially for code used as a debugging tool. DEBUG_SLAB
already introduces an overhead by poisoning the allocated blocks.
Generating the backtrace and filling in the memleak objects for every
allocation is another overhead. Global structures are indeed a
scalability problem but for a reasonable number of CPUs their overhead
might not be that big. Anyway, I can't be sure without some
benchmarking and this is probably highly dependent on the system
(caches, snoop control unit etc.).
--
Catalin
* Catalin Marinas <[email protected]> wrote:
> >memleak info is at a negative offset from the allocated pointer, i.e.
> >if kmalloc() returns 'ptr', the memleak info could be at
> >ptr-sizeof(memleak_info). That way you don't have to know the size of the
> >object beforehand and there's absolutely no need for a global hash of
> >any sort.
>
> It would probably need to be just a pointer embedded in the allocated
> block. With the current design, the memleak objects have a lifetime
> longer than the tracked block. This is mainly to avoid long locking
> during memory scanning and reporting.
this thing has to be /fast/ in the common path. I don't really see the
point in spreading out allocations like that for small objects. Access
to the object has to be refcounted /anyway/ due to scanning. Just move
that refcounting to the object freeing stage. Keep freed-but-used
buffers in some sort of per-CPU list that the scanning code flushes
after it's done. (Also maybe hold the cache_chain_mutex to prevent slab
caches from being destroyed during scanning.)
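A rough single-threaded model of the freed-but-used list idea (names are made up; the real version would be per-CPU and properly locked):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Single-threaded model: while a scan is in progress, frees are parked
 * on a list instead of being released; the scanner flushes the list
 * when it is done.  Names are illustrative. */
struct deferred {
    struct deferred *next;
    void *ptr;
};

static struct deferred *deferred_list;
static bool scanning;

static void tracked_free(void *ptr)
{
    struct deferred *d;

    if (!scanning) {
        free(ptr);
        return;
    }
    d = malloc(sizeof(*d));
    if (!d) {               /* model simplification: free immediately */
        free(ptr);
        return;
    }
    d->ptr = ptr;
    d->next = deferred_list;
    deferred_list = d;      /* parked until the scan finishes */
}

static void scan_done(void)
{
    scanning = false;
    while (deferred_list) { /* flush everything parked during the scan */
        struct deferred *d = deferred_list;
        deferred_list = d->next;
        free(d->ptr);
        free(d);
    }
}
```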
> > (it gets a bit more complex for page aligned allocations for the
> > buddy and for vmalloc - but that could be solved by adding one extra
> > pointer into struct page. [...]
>
> This still leaves the issue of marking objects as not being leaks or
> being of a different type. This is done by calling memleak_* functions
> at the allocation point (outside allocator) where only the pointer is
> known. [...]
I don't see the problem. By having the pointer we have access to the
memleak descriptor too.
> [...] In the vmalloc case, it would need to call find_vm_area. This
> might not be a big problem, only that memory resources are no longer
> treated in a unified way by kmemleak (and might not be trivial to add
> support for new allocators).
the pretty horrible locking dependencies in the current one are just as
bad. (which could be softened by a simplified allocator - but that
brings in other problems, problems that can only be solved via
allocator complexity ...)
If 'unification' means global locking and bad overhead and leak
descriptor maintenance complexity then yes, we very much don't want to
treat them in a unified way. Unless there's some killer counter-argument
against embedding the memleak descriptor in the object that we allocate
it is pretty much a must.
btw., you made the argument before that what matters most is the SLAB
allocator. (when I argued that we might want to extend this to other
allocators as well) You can't have it both ways :)
> > [...] That is a far more preferable cost than the locking/cache
> > overhead of a global hash.)
>
> A global hash would need to be re-built for every scan (and destroyed
> afterwards), making this operation longer since the pointer values
> together with their aliases (resulting from using container_of) are
> added to the hash.
by 'global hash' i mean the current code.
> I understand the benefits but I personally favor simplicity over
> performance, [...]
I think you are trying to cling to a bad design detail. You spent time
on it, and that time was still worthwhile, believe me. I often change
design details dozens of times before a patch hits lkml. It doesn't
matter, really - it's the path you walk that matters. This thing /has/
to be fast/scalable to be used in distributions.
> [...] Global structures are indeed a scalability problem but for a
> reasonable number of CPUs their overhead might not be that big.
Let's forget you ever said this, ok? ;-)
Ingo
On 28/12/06, Ingo Molnar <[email protected]> wrote:
>
> * Catalin Marinas <[email protected]> wrote:
>
> > >memleak info is at a negative offset from the allocated pointer, i.e.
> > >if kmalloc() returns 'ptr', the memleak info could be at
> > >ptr-sizeof(memleak_info). That way you don't have to know the size of the
> > >object beforehand and there's absolutely no need for a global hash of
> > >any sort.
> >
> > It would probably need to be just a pointer embedded in the allocated
> > block. With the current design, the memleak objects have a lifetime
> > longer than the tracked block. This is mainly to avoid long locking
> > during memory scanning and reporting.
>
> this thing has to be /fast/ in the common path. I don't really see the
> point in spreading out allocations like that for small objects.
This might have to be done for the vmalloc case, unless we embed the
memleak object in the vm_struct. A different approach would be to vmalloc
an extra page, but that wastes a bit of memory (at least on ARM, where
the vmalloc space is pretty small). The current size of the
memleak_object struct is 160 bytes.
Yet another implementation would be to embed the memleak object in the
slab object (since this is the most common case) and use some kind of
global hash for vmalloc or other allocators (see below for what other
objects kmemleak tracks).
> Access
> to the object has to be refcounted /anyway/ due to scanning. Just move
> that refcounting to the object freeing stage. Keep freed-but-used
> buffers in some sort of per-CPU list that the scanning code flushes
> after it's done. (Also maybe hold the cache_chain_mutex to prevent slab
> caches from being destroyed during scanning.)
I'm a bit worried about locking dependencies between l3->list_lock and
some lock for the freed-but-used lists. I have to investigate this as
I'm not familiar with the internals of the slab allocator.
Also, do you see any issues with using RCU freeing instead of a
freed-but-used list (rcu_data is already per-cpu)?
> > > (it gets a bit more complex for page aligned allocations for the
> > > buddy and for vmalloc - but that could be solved by adding one extra
> > > pointer into struct page. [...]
> >
> > This still leaves the issue of marking objects as not being leaks or
> > being of a different type. This is done by calling memleak_* functions
> > at the allocation point (outside allocator) where only the pointer is
> > known. [...]
>
> I don't see the problem. By having the pointer we have access to the
> memleak descriptor too.
Maybe not directly (i.e. without find_vm_area) in the vmalloc case but
it depends on where we store the memleak_object in this case.
> > [...] In the vmalloc case, it would need to call find_vm_area. This
> > might not be a big problem, only that memory resources are no longer
> > treated in a unified way by kmemleak (and might not be trivial to add
> > support for new allocators).
>
> the pretty horrible locking dependencies in the current one are just as
> bad.
Yes, I agree, and I'll implement a simplified allocator.
> If 'unification' means global locking and bad overhead and leak
> descriptor maintenance complexity then yes, we very much don't want to
> treat them in a unified way. Unless there's some killer counter-argument
> against embedding the memleak descriptor in the object that we allocate
> it is pretty much a must.
I don't have a strong counter-argument, I just try to find solutions
to the difficulties I encountered during the initial implementation.
> btw., you made the argument before that what matters most is the SLAB
> allocator. (when I argued that we might want to extend this to other
> allocators as well) You can't have it both ways :)
I forgot to mention but it needs to track the vmalloc allocations as
well since vmalloc'ed objects can have pointers to slab objects (one
clear example is the loadable modules).
There are also some memory areas that need to be scanned or tracked
(so that we don't get false positives) but are not allocated via slab
or vmalloc (like alloc_large_system_hash or percpu_modalloc). I
currently treat them in a /unified/ way, just by calling
memleak_alloc() on those pointers (as with slab and vmalloc). I can
change the API and add a memleak_add_area or similar and store them in
some internal memleak structure, global or per-CPU (it might not
matter much since these are called quite seldom but I should stop
expressing opinions on scalability :-)).
> > I understand the benefits but I personally favor simplicity over
> > performance, [...]
>
> I think you are trying to cling to a bad design detail. You spent time
> on it, and that time was still worthwhile, believe me. I often change
> design details dozens of times before a patch hits lkml. It doesn't
> matter, really - it's the path you walk that matters. This thing /has/
> to be fast/scalable to be used in distributions.
I don't have an issue with dumping the current design but, as I said,
I need to think about solutions to the problems I encountered in the
past. Maybe I should just start re-coding kmemleak and see whether it
works, but I want to make the transition as easy as possible since I do
this mainly in my spare time and would like to finish it in a reasonable
time.
> > [...] Global structures are indeed a scalability problem but for a
> > reasonable number of CPUs their overhead might not be that big.
>
> Let's forget you ever said this, ok? ;-)
Well, most of my Linux experience is on resource-limited embedded
systems (ARM only recently joined the multi-CPU world). Enlightenment
is welcome :-)
Thanks for your comments.
--
Catalin